- 20 Jul, 2011 1 commit
-
-
Committed by Al Viro
not used in the instances anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 13 Jul, 2011 1 commit
-
-
Committed by Ryusuke Konishi
The resize feature was added by commit 4e33f9ea, but the list of unsupported features in the nilfs2.txt file was not updated to reflect it. This updates the list to fix the discrepancy.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
- 08 Jul, 2011 1 commit
-
-
Committed by David Howells
Add an FS-Cache helper to bulk uncache pages on an inode. This will only work for the circumstance where the pages in the cache correspond 1:1 with the pages attached to an inode's page cache.

This is required for CIFS and NFS: When disabling the inode cookie, we were returning the cookie and setting cifsi->fscache to NULL but failed to invalidate any previously mapped pages. This resulted in "Bad page state" errors and manifested in other kinds of errors when running fsstress. Fix it by uncaching mapped pages when we disable the inode cookie.

This patch should fix the following oops and "Bad page state" errors seen during fsstress testing.

  ------------[ cut here ]------------
  kernel BUG at fs/cachefiles/namei.c:201!
  invalid opcode: 0000 [#1] SMP
  Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs
  RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  RSP: 0018:ffff88002ce6dd00 EFLAGS: 00010282
  RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282
  RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300
  R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840
  R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0
  FS: 00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0)
  Stack:
   0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00
   ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380
   ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56
  Call Trace:
   cachefiles_lookup_object+0x78/0xd4 [cachefiles]
   fscache_lookup_object+0x131/0x16d [fscache]
   fscache_object_work_func+0x1bc/0x669 [fscache]
   process_one_work+0x186/0x298
   worker_thread+0xda/0x15d
   kthread+0x84/0x8c
   kernel_thread_helper+0x4/0x10
  RIP cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  ---[ end trace 1d481c9af1804caa ]---

I tested the uncaching by the following means:

 (1) Create a big file on my NFS server (104857600 bytes).

 (2) Read the file into the cache with md5sum on the NFS client. Look in /proc/fs/fscache/stats:

	Pages : mrk=25601 unc=0

 (3) Open the file for read/write ("bash 5<>/warthog/bigfile"). Look in proc again:

	Pages : mrk=25601 unc=25601

Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 Jun, 2011 1 commit
-
-
Committed by Shaohua Li
Commit a26ac245 (rcu: move TREE_RCU from softirq to kthread) introduced a performance regression. In an AIM7 test, this commit degraded performance by about 40%.

The commit runs rcu callbacks in a kthread instead of softirq. We observed a high rate of context switches caused by this. Our test system has 64 CPUs and HZ is 1000, so we saw more than 64k context switches per second caused by RCU's per-CPU kthread. A trace showed that most of the time the RCU per-CPU kthread doesn't actually handle any callbacks, but instead just does a very small amount of work handling grace periods. This means that RCU's per-CPU kthreads are making the scheduler do quite a bit of work in order to allow a very small amount of RCU-related processing to be done.

Alex Shi's analysis determined that this slowdown is due to lock contention within the scheduler. Unfortunately, as Peter Zijlstra points out, the scheduler's real-time semantics require global action, which means that this contention is inherent in real-time scheduling. (Yes, perhaps someone will come up with a workaround -- otherwise, -rt is not going to do well on large SMP systems -- but this patch will work around this issue in the meantime. And "the meantime" might well be forever.)

This patch therefore re-introduces softirq processing to RCU, but only for core RCU work. RCU callbacks are still executed in kthread context, so that only a small amount of RCU work runs in softirq context in the common case. This should minimize ksoftirqd execution, allowing us to skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Tested-by: "Alex,Shi" <alex.shi@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 27 May, 2011 2 commits
-
-
Committed by Christoph Hellwig
Tell the filesystem if we just updated a timestamp (I_DIRTY_SYNC) or anything else, so that the filesystem can track internally if it needs to push out a transaction for fdatasync or not.

This is just the prototype change with no user for it yet. I plan to push large XFS changes for the next merge window, and getting this trivial infrastructure in this window would help a lot to avoid tree interdependencies.

Also remove incorrect comments that ->dirty_inode can't block. That was changed a long time ago, and many implementations rely on it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Jiri Slaby
When configfs_register_subsystem() fails, we unregister too many subsystems in configfs_example_init. Decrement i by one so that we do not unregister a subsystem that was never registered.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
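
The unwinding rule behind this fix can be sketched outside the kernel. The following is a minimal Python model, not the configfs code itself: `register_all`, `register`, `unregister` and `demo` are hypothetical names invented for illustration. When registration number i fails, only entries 0 to i-1 are unwound, because entry i was never registered.

```python
def register_all(subsystems, register, unregister):
    """Register every subsystem, undoing earlier registrations on failure."""
    registered = []
    for subsys in subsystems:
        try:
            register(subsys)
        except OSError:
            # The failing entry was never registered, so only the entries
            # before it are unwound, newest first.
            for done in reversed(registered):
                unregister(done)
            raise
        registered.append(subsys)
    return registered


def demo():
    """Fail on the third of four registrations and record what happens."""
    log = []

    def register(name):
        if name == "c":
            raise OSError("registration failed")
        log.append(("register", name))

    def unregister(name):
        log.append(("unregister", name))

    try:
        register_all(["a", "b", "c", "d"], register, unregister)
    except OSError:
        pass
    return log
```

In this model the failing entry "c" never appears in an unregister call, which is the behaviour the one-step decrement is meant to restore.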
-
- 25 May, 2011 3 commits
-
-
Committed by Mike Travis
Manually adjusting the smp_affinity for IRQs becomes unwieldy when the cpu count is large.

Setting smp affinity to cpus 256 to 263 would be:

	echo 000000ff,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > smp_affinity

instead of:

	echo 256-263 > smp_affinity_list

Think about what it looks like for cpus around, say, 4088 to 4095.

We already have many alternate "list" interfaces:

	/sys/devices/system/cpu/cpuX/indexY/shared_cpu_list
	/sys/devices/system/cpu/cpuX/topology/thread_siblings_list
	/sys/devices/system/cpu/cpuX/topology/core_siblings_list
	/sys/devices/system/node/nodeX/cpulist
	/sys/devices/pci***/***/local_cpulist

Add a companion interface, smp_affinity_list, to use cpu lists instead of cpu maps. This conforms to other companion interfaces where both a map and a list interface exist.

This required adding a bitmap_parselist_user() function in a manner similar to the bitmap_parse_user() function.

[akpm@linux-foundation.org: make __bitmap_parselist() static]
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
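
To make the map/list relationship concrete, here is a small Python sketch that converts a cpu list into a comma-separated mask of 32-bit hex words. It is an illustration only, not the kernel's bitmap_parselist() code: `parse_cpu_list` and `cpus_to_mask` are hypothetical names, and the sketch assumes the mask is printed most significant word first and only as wide as the highest cpu requires (a real kernel may pad the mask to its configured cpu count).

```python
def parse_cpu_list(text):
    """Parse a list such as "0-3,8,256-263" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
        else:
            first = last = int(part)
        if first > last:
            raise ValueError(f"bad range: {part!r}")
        cpus.update(range(first, last + 1))
    return cpus


def cpus_to_mask(cpus):
    """Format CPUs as comma-separated 32-bit hex words, highest word first."""
    value = sum(1 << cpu for cpu in cpus)
    nwords = max(cpus, default=0) // 32 + 1
    words = [(value >> (32 * i)) & 0xFFFFFFFF for i in range(nwords)]
    return ",".join(f"{word:08x}" for word in reversed(words))
```

Under these assumptions `cpus_to_mask(parse_cpu_list("0-3"))` gives `0000000f`. By the same arithmetic, cpus 256 to 263 fall in the ninth 32-bit word, so the sketch emits `000000ff` followed by eight zero words; an eight-word mask such as the one quoted in the message corresponds to cpus 224 to 231 in this model.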
-
Committed by Eric Van Hensbergen
Update documentation pointers to include the virtfs publication and the 9p RFC, as well as an updated list of servers and alternative clients.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
-
Committed by Christoph Hellwig
Now that we have reliable tracking of deleted extents in a transaction we can easily implement "online" discard support which calls blkdev_issue_discard once a transaction commits.

The actual discard is a two-stage operation as we first have to mark the busy extent as not available for reuse before we can start the actual discard. Note that we don't bother supporting discard for the non-delaylog mode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
-
- 24 May, 2011 1 commit
-
-
Committed by Tiger Yang
As ocfs2 supports relatime and strictatime, we need to update the relevant documentation. atime_quantum needs strictatime to work, so only show it in procfs when mounting with strictatime.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
-
- 23 May, 2011 1 commit
-
-
Committed by Artem Bityutskiy
Switch to debugging using dynamic printk (pr_debug()). There is no good reason to carry custom debugging prints when there is such a cool and powerful generic dynamic printk infrastructure, see Documentation/dynamic-debug-howto.txt. With dynamic printks we can switch individual prints on and off, per file, per function and per message format.

This means that instead of doing the old-fashioned

	echo 1 > /sys/module/ubifs/parameters/debug_msgs

to enable general messages, we can do:

	echo 'format "UBIFS DBG gen" +ptlf' > control

to enable general messages and additionally ask the dynamic printk infrastructure to print process ID, line number and function name. So there is no reason to keep UBIFS-specific crud if there is a more powerful generic thing.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
- 20 May, 2011 1 commit
-
-
Committed by Randy Dunlap
Move LSM-, credentials-, and keys-related files from Documentation/ to Documentation/security/, add Documentation/security/00-INDEX, and update all occurrences of Documentation/<moved_file> to Documentation/security/<moved_file>.
-
- 14 May, 2011 1 commit
-
-
Committed by Artem Bityutskiy
UBIFS can force itself to use the 'in-the-gaps' commit method - the last resort method which is normally invoked very, very rarely. Currently this "force in-the-gaps" debugging feature is a separate test mode. But it is a bit saner to make it the "general" self-test check instead.

This patch is just a clean-up which should make the debugging code look a bit nicer and easier to use - we have way too many debugging options.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
- 06 May, 2011 1 commit
-
-
Committed by Paul E. McKenney
If RCU priority boosting is to be meaningful, callback invocation must be boosted in addition to preempted RCU readers. Otherwise, in the presence of CPU real-time threads, the grace period ends, but the callbacks don't get invoked. If the callbacks don't get invoked, the associated memory doesn't get freed, so the system is still subject to OOM.

But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit moves the callback invocations to a kthread, which can be boosted easily.

Also add comments and properly synchronize all accesses to rcu_cpu_kthread_task, as suggested by Lai Jiangshan.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
- 02 May, 2011 1 commit
-
-
Committed by Theodore Ts'o
The block reservation code from ext3 was removed long ago...
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 31 Mar, 2011 1 commit
-
-
Committed by Lucas De Marchi
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
-
- 25 Mar, 2011 1 commit
-
-
Committed by Dave Chinner
Now that inode state changes are protected by the inode->i_lock and the inode LRU manipulations by the inode_lru_lock, we can remove the inode_lock from prune_icache and the initial part of iput_final().

Instead of using the inode_lock to protect the inode during iput_final, use the inode->i_lock. This protects the inode against new references being taken while we change the inode state to I_FREEING, as well as preventing prune_icache from grabbing the inode while we are manipulating it. Hence we no longer need the inode_lock in iput_final prior to setting I_FREEING on the inode.

For prune_icache, we no longer need the inode_lock to protect the LRU list, and the inodes themselves are protected against freeing races by the inode->i_lock. Hence we can lift the inode_lock from prune_icache as well.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 23 Mar, 2011 1 commit
-
-
Committed by Stuart Swales
ADFS (FileCore) storage complies with the RISC OS filetype specification (12 bits of file type information are stored in the file load address, rather than using a file extension). The existing driver largely ignores this information and does not present it to the end user.

It is desirable that stored filetypes be made visible to the end user to facilitate a precise copy of data and metadata from a hard disc (or image thereof) into a RISC OS emulator (such as RPCEmu) or to a network share which can be accessed by real Acorn systems.

This patch implements a per-mount filetype suffix option (use -o ftsuffix=1) to present any filetype as a ,xyz hexadecimal suffix on each file. This type suffix is compatible with that used by RISC OS systems that access network servers using NFS client software and by RPCEmu's host filing system.

Signed-off-by: Stuart Swales <stuart.swales.croftnuisk@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
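
The filetype suffix described above can be illustrated with a short Python sketch. It rests on an assumption not spelled out in the message: that a load address whose top 12 bits are all set carries the 12-bit filetype in bits 8 to 19, as in the usual RISC OS convention. `adfs_filetype` and `with_type_suffix` are hypothetical helper names, and leaving untyped files without a suffix is likewise a guess rather than documented driver behaviour.

```python
def adfs_filetype(load_address):
    """Return the 12-bit filetype, or None if the file is untyped.

    Assumed layout: 0xFFFtttdd, where ttt is the filetype.
    """
    if (load_address & 0xFFF00000) == 0xFFF00000:
        return (load_address >> 8) & 0xFFF
    return None


def with_type_suffix(name, load_address):
    """Append a ",xyz" hexadecimal suffix when the file has a filetype."""
    filetype = adfs_filetype(load_address)
    if filetype is None:
        # Assumption: untyped files keep their bare name.
        return name
    return f"{name},{filetype:03x}"
```

With this layout a file named ReadMe whose load address is 0xFFFFFF00 would be presented as `ReadMe,fff`.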
-
- 18 Mar, 2011 1 commit
-
-
Committed by Al Viro
it's always false...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 17 Mar, 2011 1 commit
-
-
Committed by Al Viro
This is an ex-parrot.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 15 Mar, 2011 1 commit
-
-
Committed by Boaz Harrosh
If /dev/osd* devices are shuffled because more devices were added and/or the login order has changed, it is hard to mount the FS you want.

Add an option to mount by osdname. osdname is any osd-device's osdname as specified to the mkfs.exofs command when formatting the osd-devices.

The new mount format is:

	OPT="osdname=$UUID0,pid=$PID,_netdev"
	mount -t exofs -o $OPT $DEV_OSD0 $MOUNTDIR

If "osdname=" is specified in the options above, $DEV_OSD0 is ignored and can be empty.

Also while at it: Removed some old unused Opt_* enums.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
-
- 12 Mar, 2011 1 commit
-
-
Committed by Fred Isaman
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 11 Mar, 2011 1 commit
-
-
Committed by Artem Bityutskiy
Change the default UBIFS behavior WRT data CRC checking. Currently, UBIFS checks data CRC when reading, which slows it down quite a bit, and this is the default option. However, it looks like the average user does not need this feature and would prefer faster read speed over extra reliability. And it seems to be the de-facto standard that file-systems do not check data CRC every time they read from the media.

Thus, make the UBIFS default behavior not check data CRC. This corresponds to the no_chk_data_crc mount option. Those users who need extra protection can always enable it using the chk_data_crc option.

Please read more information about this feature here:
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_checksumming

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
- 01 Mar, 2011 1 commit
-
-
Committed by Phillip Lougher
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
-
- 23 Feb, 2011 1 commit
-
-
Committed by Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
-
- 22 Feb, 2011 1 commit
-
-
Committed by Lukas Czerner
Add documentation for mount options and ioctls to Documentation/filesystem/ext4.txt, which has not been updated for some time. Also add documentation for ext4 sysfs tunables to the Documentation/ABI/testing/sysfs-fs-ext4 file, and fix a few typographical errors in that file.

https://bugzilla.kernel.org/show_bug.cgi?id=9423

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 18 Feb, 2011 1 commit
-
-
Committed by Alexander Kurz
Signed-off-by: Alexander Kurz <linux@kbdbabel.org>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
-
- 04 Feb, 2011 2 commits
-
-
Committed by Bart Van Assche
Since snprintf() may return a value that exceeds its second argument, show() methods should use scnprintf() instead of snprintf(). This patch updates the example in the sysfs documentation accordingly.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
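
The difference between the two return-value conventions can be modelled in a few lines. This is a Python model of the semantics only, not the C functions themselves: `snprintf_result` and `scnprintf_result` are hypothetical names, and each returns the string that would be stored in a buffer of `size` bytes (one byte being reserved for the terminating NUL) together with the value the C function would return.

```python
def snprintf_result(size, text):
    """Model snprintf(): returns (stored string, would-be length).

    The second value is the length the full output needs, so it can
    exceed size when the output is truncated.
    """
    stored = text[: max(size - 1, 0)]
    return stored, len(text)


def scnprintf_result(size, text):
    """Model scnprintf(): returns (stored string, characters stored).

    The second value never exceeds size - 1, and is 0 when size is 0.
    """
    stored = text[: max(size - 1, 0)]
    return stored, len(stored)
```

A show() method is expected to return the number of bytes it actually placed in the buffer, so in this model only the second convention is safe to return directly when the output may be truncated.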
-
Committed by Bart Van Assche
Some time ago the way sysfs stores a pointer to a kobject corresponding to a directory was modified. This patch brings the documentation back in sync with the implementation.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
- 31 Jan, 2011 1 commit
-
-
Committed by Anton Altaparmakov
In ntfs_mft_record_alloc(), when mapping the new extent mft record with map_extent_mft_record(), we overwrite @m with the return value. On error, we then try to use the old @m, but that is no longer there as @m now contains an error code instead, so we crash when dereferencing the error code as if it were a pointer.

The simple fix is to use a temporary variable to store the return value, thus preserving the original @m for later use. This is a backport from the commercial Tuxera-NTFS driver and is well tested...

Thanks go to Julia Lawall for pointing this out (whilst I had fixed it in the commercial driver I had failed to fix it in the Linux kernel).

Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 17 Jan, 2011 1 commit
-
-
Committed by Christoph Hellwig
Currently all filesystems except XFS implement fallocate asynchronously, while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC I/O we really want our allocation on disk, especially for the !KEEP_SIZE case where we actually grow the file with user-visible zeroes. On the other hand, always committing the transaction is a bad idea for fast-path uses of fallocate like for example in recent Samba versions.

Given that block allocation is a data plane operation anyway, change it from an inode operation to a file operation so that we have the file structure available that lets us check for O_SYNC.

This also includes moving the code around for a few of the filesystems, and removing the already unneeded S_ISDIR checks given that we only wire up fallocate for regular files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 16 Jan, 2011 4 commits
-
-
Committed by David Howells
Unexport do_add_mount() and make ->d_automount() return the vfsmount to be added rather than calling do_add_mount() itself. follow_automount() will then do the addition.

This slightly complicates things as ->d_automount() normally wants to add the new vfsmount to an expiration list and start an expiration timer. The problem with that is that the vfsmount will be deleted if it has a refcount of 1 and the timer will not repeat if the expiration list is empty.

To this end, we require the vfsmount to be returned from d_automount() with a refcount of (at least) 2. One of these refs will be dropped unconditionally. In addition, follow_automount() must get a 3rd ref around the call to do_add_mount() lest it eat a ref and return an error, leaving the mount we have open to being expired as we would otherwise have only 1 ref on it.

d_automount() should also add the vfsmount to the expiration list (by calling mnt_set_expiry()) and start the expiration timer before returning, if this mechanism is to be used. The vfsmount will be unlinked from the expiration list by follow_automount() if do_add_mount() fails.

This patch also fixes the call to do_add_mount() for AFS to propagate the mount flags from the parent vfsmount.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by David Howells
Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well as when it is in Ref-walk mode. This permits __follow_mount_rcu() to call d_manage() directly. d_manage() needs a parameter to indicate that it is in RCU-walk mode as it isn't allowed to sleep if in that mode (but should return -ECHILD instead).

autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon accesses it and otherwise request dropping back to ref-walk mode.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by David Howells
Add a dentry op (d_manage) to permit a filesystem to hold a process and make it sleep when it tries to transit away from one of that filesystem's directories during a pathwalk. The operation is keyed off a new dentry flag (DCACHE_MANAGE_TRANSIT).

The filesystem is allowed to be selective about which processes it holds and which it permits to continue on or prohibits from transiting from each flagged directory. This will allow autofs to hold up client processes whilst letting its userspace daemon through to maintain the directory or the stuff behind it or mounted upon it.

The ->d_manage() dentry operation:

	int (*d_manage)(struct path *path, bool mounting_here);

takes a pointer to the directory about to be transited away from and a flag indicating whether the transit is undertaken by do_add_mount() or do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

It should return 0 if successful and to let the process continue on its way; -EISDIR to prohibit the caller from skipping to overmounted filesystems or automounting, and to use this directory; or some other error code to return to the user.

->d_manage() is called with namespace_sem writelocked if mounting_here is true and no other locks held, so it may sleep. However, if mounting_here is true, it may not initiate or wait for a mount or unmount upon the parameter directory, even if the act is actually performed by userspace.

Within fs/namei.c, follow_managed() is extended to check with d_manage() first on each managed directory, before transiting away from it or attempting to automount upon it.

follow_down() is renamed follow_down_one() and should only be used where the filesystem deliberately intends to avoid management steps (e.g. autofs).

A new follow_down() is added that incorporates the loop done by all other callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS and CIFS do use it, their use is removed by converting them to use d_automount()).
The new follow_down() calls d_manage() as appropriate. It also takes an extra parameter to indicate if it is being called from mount code (with namespace_sem writelocked) which it passes to d_manage(). follow_down() ignores automount points so that it can be used to mount on them.

__follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have that determine whether to abort or not itself. That would allow the autofs daemon to continue on in rcu-walk mode.

Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't required as every transit from that directory will cause d_manage() to be invoked. It can always be set again when necessary.

==========================
WHAT THIS MEANS FOR AUTOFS
==========================

Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to trigger the automounting of indirect mounts, and both of these can be called with i_mutex held.

autofs knows that the i_mutex will be held by the caller in lookup(), and so can drop it before invoking the daemon - but this isn't so for d_revalidate(), since the lock is only held on _some_ of the code paths that call it. This means that autofs can't risk dropping i_mutex from its d_revalidate() function before it calls the daemon.
The bug could manifest itself as, for example, a process that's trying to validate an automount dentry that gets made to wait because that dentry is expired and needs cleaning up:

	mkdir S ffffffff8014e05a 0 32580 24956
	Call Trace:
	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4

versus the automount daemon which wants to remove that dentry, but can't because the normal process is holding the i_mutex lock:

	automount D ffffffff8014e05a 0 32581 1 32561
	Call Trace:
	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
	 [<ffffffff8005d229>] tracesys+0x71/0xe0
	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

which means that the system is deadlocked.

This patch allows autofs to hold up normal processes whilst the daemon goes ahead and does things to the dentry tree behind the automounter point without risking a deadlock as almost no locks are held in d_manage() and none in d_automount().

Signed-off-by: David Howells <dhowells@redhat.com>
Was-Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by David Howells
Add a dentry op (d_automount) to handle automounting directories rather than abusing the follow_link() inode operation. The operation is keyed off a new dentry flag (DCACHE_NEED_AUTOMOUNT).

This also makes it easier to add an AT_ flag to suppress terminal segment automount during pathwalk and removes the need for the kludge code in the pathwalk algorithm to handle directories with follow_link() semantics.

The ->d_automount() dentry operation:

	struct vfsmount *(*d_automount)(struct path *mountpoint);

takes a pointer to the directory to be mounted upon, which is expected to provide sufficient data to determine what should be mounted. If successful, it should return the vfsmount struct it creates (which it should also have added to the namespace using do_add_mount() or similar). If there's a collision with another automount attempt, NULL should be returned. If the directory specified by the parameter should be used directly rather than being mounted upon, -EISDIR should be returned. In any other case, an error code should be returned.

The ->d_automount() operation is called with no locks held and may sleep. At this point the pathwalk algorithm will be in ref-walk mode.

Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is added to handle mountpoints. It will return -EREMOTE if the automount flag was set, but no d_automount() op was supplied, -ELOOP if we've encountered too many symlinks or mountpoints, -EISDIR if the walk point should be used without mounting and 0 if successful. The path will be updated to point to the mounted filesystem if a successful automount took place.

__follow_mount() is replaced by follow_managed() which is more generic (especially with the patch that adds ->d_manage()). This handles transits from directories during pathwalk, including automounting and skipping over mountpoints (and holding processes with the next patch).
__follow_mount_rcu() will jump out of RCU-walk mode if it encounters an automount point with nothing mounted on it.

follow_dotdot*() does not handle automounts as you don't want to trigger them whilst following "..".

I've also extracted the mount/don't-mount logic from autofs4 and included it here. It makes the mount go ahead anyway if someone calls open() or creat(), tries to traverse the directory, tries to chdir/chroot/etc. into the directory, or sticks a '/' on the end of the pathname. If they do a stat(), however, they'll only trigger the automount if they didn't also say O_NOFOLLOW.

I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their inodes as automount points. This flag is automatically propagated to the dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate(). This saves NFS and could save AFS a private flag bit apiece, but is not strictly necessary. It would be preferable to do the propagation in d_set_d_op(), but that doesn't normally have access to the inode.

[AV: fixed breakage in case if __follow_mount_rcu() fails and nameidata_drop_rcu() succeeds in RCU case of do_lookup(); we need to fall through to non-RCU case after that, rather than just returning with ungrabbed *path]

Signed-off-by: David Howells <dhowells@redhat.com>
Was-Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 14 Jan, 2011 3 commits
-
-
Committed by Nick Piggin
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
-
Committed by Mandeep Singh Baines
We'd like to be able to oom_score_adj a process up/down as it enters/leaves the foreground. Currently, it is not possible to oom_adj down without CAP_SYS_RESOURCE. This patch allows a task to decrease its oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to or its inherited value at fork. Assuming the thread that has forked it has oom_score_adj of 0, each process could decrease it back from 0 upon activation unless a CAP_SYS_RESOURCE thread elevated it to something higher.

Alternatives considered:

* a setuid binary
* a daemon with CAP_SYS_RESOURCE

Since you don't want all processes to be able to reduce their oom_adj, a setuid or daemon implementation would be complex. The alternatives also have much higher overhead.

This patch was updated from the original patch based on feedback from David Rientjes.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
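
The permission rule described above can be sketched as a small Python model. It is a reading of the message rather than the kernel implementation: the `Task` class, its `oom_score_adj_min` floor and the `privileged` argument (standing in for CAP_SYS_RESOURCE) are hypothetical names chosen for illustration, and the -1000 to 1000 range is assumed to be the valid oom_score_adj range.

```python
import errno

OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


class Task:
    """Toy task that tracks oom_score_adj and the lowest value it may reach."""

    def __init__(self, oom_score_adj=0):
        self.oom_score_adj = oom_score_adj
        # At fork the floor is the inherited value.
        self.oom_score_adj_min = oom_score_adj

    def write_oom_score_adj(self, value, privileged=False):
        """Return 0 on success or a negative errno, kernel style."""
        if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
            return -errno.EINVAL
        # An unprivileged writer may raise the value freely, but may only
        # lower it as far as the recorded floor.
        if value < self.oom_score_adj_min and not privileged:
            return -errno.EACCES
        self.oom_score_adj = value
        if privileged:
            # A privileged write moves the floor.
            self.oom_score_adj_min = value
        return 0
```

In this model a task forked with a value of 0 can raise itself to 500 when it leaves the foreground and drop back to 0 on activation, but it cannot go below 0 without a privileged writer.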
-
Committed by Nikanth Karthikesan
Currently there is no way to find out whether a process has locked its pages in memory or not, or which of its memory regions are locked in memory.

Add a new field "Locked" to export this information via the smaps file.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
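
As an illustration of how the new field might be consumed, the following Python sketch sums the "Locked" values found in smaps-style text. The `locked_kb` helper and the `SAMPLE` text are made up for this example; the sketch assumes each field line has the form "Locked: <number> kB", like the other per-mapping fields in smaps.

```python
def locked_kb(smaps_text):
    """Sum the "Locked:" values (in kB) over all mappings in smaps text."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Locked:"):
            fields = line.split()
            total += int(fields[1])
    return total


# Made-up sample in the style of /proc/<pid>/smaps, trimmed to a few fields.
SAMPLE = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat
Size:                 44 kB
Rss:                  40 kB
Locked:               40 kB
7f0000000000-7f0000021000 rw-p 00000000 00:00 0
Size:                132 kB
Rss:                   8 kB
Locked:                0 kB
"""
```

On a system that exports the field, the same helper could be given the contents of a process's smaps file, for example the text read from /proc/self/smaps.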
-
- 13 Jan, 2011 2 commits
-
-
Committed by Josef Bacik
This patch simply adds documentation on how to handle the hole punching mode of fallocate for any filesystem wishing to use it.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Anton Altaparmakov
Fix writev() so that it moves on to subsequent segments instead of writing the first segment over and over again, and update the NTFS entry in MAINTAINERS to reflect that Tuxera Inc. now supports the NTFS driver.
Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-