- 16 1月, 2011 1 次提交
-
-
由 Al Viro 提交于
do_lookup() has a path leading from LOOKUP_RCU case to non-RCU crossing of mountpoints, which breaks things badly. If we hit need_revalidate: and do nothing in there, we need to come back into LOOKUP_RCU half of things, not to done: in non-RCU one. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 14 1月, 2011 4 次提交
-
-
由 Nick Piggin 提交于
J. R. Okajima noticed that ->put_link is being attempted on the wrong inode, and suggested the way to fix it. I changed it a bit according to Al's suggestion to keep an explicit link path around. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 J. R. Okajima 提交于
When open(2) without O_DIRECTORY opens an existing dir, it should return EISDIR. In do_last(), the variable 'error' is initialized EISDIR, but it is changed by d_revalidate() which returns any positive to represent 'the target dir is valid.' Should we keep and return the initialized 'error' in this case. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
As J. R. Okajima noted, force_reval_path passes in the same dentry to d_revalidate as the one in the nameidata structure (other callers pass in a child), so the locking breaks. This can oops with a chrooted nfs mount, for example. Similarly there can be other problems with revalidating a dentry which is already in nameidata of the path walk. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
d_revalidate can return in rcu-walk mode even when it returns 0. We can't just call any old dcache function on rcu-walk dentry (the dentry is unstable, so even through d_lock can safely be taken, the result may no longer be what we expect -- careful re-checks would be required). So just drop rcu in this case. (I missed this conversion when switching to the rcu-walk convention that Linus suggested) Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
- 13 1月, 2011 1 次提交
-
-
由 Jeff Layton 提交于
When a file is opened with O_TRUNC, the truncate processing is handled by handle_truncate(). This function however doesn't receive any info about the newly instantiated filp, and therefore can't pass that info along so that the setattr can use it. This makes NFSv4 misbehave. The client does an open and gets a valid stateid, and then doesn't use that stateid on the subsequent truncate. It uses the zero-stateid instead. Most servers ignore this fact and just do the truncate anyway, but some don't like it (notably, RHEL4). It seems more correct that since we have a fully instantiated file at the time that handle_truncate is called, that we pass that along so that the truncate operation can properly use it. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 10 1月, 2011 1 次提交
-
-
由 Randy Dunlap 提交于
Fix new kernel-doc notation warnings in fs/namei.c and spell ECHILD correctly. Warning(fs/namei.c:218): No description found for parameter 'flags' Warning(fs/namei.c:425): Excess function parameter 'Returns' description in 'nameidata_drop_rcu' Warning(fs/namei.c:478): Excess function parameter 'Returns' description in 'nameidata_dentry_drop_rcu' Warning(fs/namei.c:540): Excess function parameter 'Returns' description in 'nameidata_drop_rcu_last' Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Cc: Nick Piggin <npiggin@kernel.dk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 1月, 2011 9 次提交
-
-
由 Nick Piggin 提交于
The problem that this patch aims to fix is vfsmount refcounting scalability. We need to take a reference on the vfsmount for every successful path lookup, which often go to the same mount point. The fundamental difficulty is that a "simple" reference count can never be made scalable, because any time a reference is dropped, we must check whether that was the last reference. To do that requires communication with all other CPUs that may have taken a reference count. We can make refcounts more scalable in a couple of ways, involving keeping distributed counters, and checking for the global-zero condition less frequently. - check the global sum once every interval (this will delay zero detection for some interval, so it's probably a showstopper for vfsmounts). - keep a local count and only taking the global sum when local reaches 0 (this is difficult for vfsmounts, because we can't hold preempt off for the life of a reference, so a counter would need to be per-thread or tied strongly to a particular CPU which requires more locking). - keep a local difference of increments and decrements, which allows us to sum the total difference and hence find the refcount when summing all CPUs. Then, keep a single integer "long" refcount for slow and long lasting references, and only take the global sum of local counters when the long refcount is 0. This last scheme is what I implemented here. Attached mounts and process root and working directory references are "long" references, and everything else is a short reference. This allows scalable vfsmount references during path walking over mounted subtrees and unattached (lazy umounted) mounts with processes still running in them. This results in one fewer atomic op in the fastpath: mntget is now just a per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock and non-atomic decrement in the common case. However code is otherwise bigger and heavier, so single threaded performance is basically a wash. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
Require filesystems be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
Use a seqlock in the fs_struct to enable us to take an atomic copy of the complete cwd and root paths. Use this in the RCU lookup path to avoid a thread-shared spinlock in RCU lookup operations. Multi-threaded apps may now perform path lookups with scalability matching multi-process apps. Operations such as stat(2) become very scalable for multi-threaded workload. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
Perform common cases of path lookups without any stores or locking in the ancestor dentry elements. This is called rcu-walk, as opposed to the current algorithm which is a refcount based walk, or ref-walk. This results in far fewer atomic operations on every path element, significantly improving path lookup performance. It also avoids cacheline bouncing on common dentries, significantly improving scalability. The overall design is like this: * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk. * Take the RCU lock for the entire path walk, starting with the acquiring of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are not required for dentry persistence. * synchronize_rcu is called when unregistering a filesystem, so we can access d_ops and i_ops during rcu-walk. * Similarly take the vfsmount lock for the entire path walk. So now mnt refcounts are not required for persistence. Also we are free to perform mount lookups, and to assume dentry mount points and mount roots are stable up and down the path. * Have a per-dentry seqlock to protect the dentry name, parent, and inode, so we can load this tuple atomically, and also check whether any of its members have changed. * Dentry lookups (based on parent, candidate string tuple) recheck the parent sequence after the child is found in case anything changed in the parent during the path walk. * inode is also RCU protected so we can load d_inode and use the inode for limited things. * i_mode, i_uid, i_gid can be tested for exec permissions during path walk. * i_op can be loaded. When we reach the destination dentry, we lock it, recheck lookup sequence, and increment its refcount and mountpoint refcount. RCU and vfsmount locks are dropped. This is termed "dropping rcu-walk". If the dentry refcount does not match, we can not drop rcu-walk gracefully at the current point in the lokup, so instead return -ECHILD (for want of a better errno). This signals the path walking code to re-do the entire lookup with a ref-walk. Aside from the final dentry, there are other situations that may be encounted where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take a reference on the last good dentry) and continue with a ref-walk. Again, if we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup using ref-walk. But it is very important that we can continue with ref-walk for most cases, particularly to avoid the overhead of double lookups, and to gain the scalability advantages on common path elements (like cwd and root). The cases where rcu-walk cannot continue are: * NULL dentry (ie. any uncached path element) * parent with d_inode->i_op->permission or ACLs * dentries with d_revalidate * Following links In future patches, permission checks and d_revalidate become rcu-walk aware. It may be possible eventually to make following links rcu-walk aware. Uncached path elements will always require dropping to ref-walk mode, at the very least because i_mutex needs to be grabbed, and objects allocated. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
dcache_lock no longer protects anything. remove it. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
Change d_hash so it may be called from lock-free RCU lookups. See similar patch for d_compare for details. For in-tree filesystems, this is just a mechanical change. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
- 08 12月, 2010 1 次提交
-
-
由 Lino Sanfilippo 提交于
Unsetting FMODE_NONOTIFY in fsnotify_open() is too late, since fsnotify_perm() is called before. If FMODE_NONOTIFY is set fsnotify_perm() will skip permission checks, so a user can still disable permission checks by setting this flag in an open() call. This patch corrects this by unsetting the flag before fsnotify_perm is called. Signed-off-by: NLino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: NEric Paris <eparis@redhat.com>
-
- 29 10月, 2010 1 次提交
-
-
由 Al Viro 提交于
nameidata_to_filp() drops nd->path or transfers it to opened file. In the former case it's a Bad Idea(tm) to do mnt_drop_write() on nd->path.mnt, since we might race with umount and vfsmount in question might be gone already. Fix: don't drop it, then... IOW, have nameidata_to_filp() grab nd->path in case it transfers it to file and do path_drop() in callers. After they are through with accessing nd->path... Reported-by: NMiklos Szeredi <miklos@szeredi.hu> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 26 10月, 2010 2 次提交
-
-
由 Al Viro 提交于
Clones an existing reference to inode; caller must already hold one. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
The caller that didn't need it is gone. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 18 8月, 2010 4 次提交
-
-
由 Nick Piggin 提交于
fs: brlock vfsmount_lock Use a brlock for the vfsmount lock. It must be taken for write whenever modifying the mount hash or associated fields, and may be taken for read when performing mount hash lookups. A new lock is added for the mnt-id allocator, so it doesn't need to take the heavy vfsmount write-lock. The number of atomics should remain the same for fastpath rlock cases, though code would be slightly slower due to per-cpu access. Scalability is not not be much improved in common cases yet, due to other locks (ie. dcache_lock) getting in the way. However path lookups crossing mountpoints should be one case where scalability is improved (currently requiring the global lock). The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node Altix system (high latency to remote nodes), a simple umount microbenchmark (mount --bind mnt mnt2 ; umount mnt2 loop 1000 times), before this patch it took 6.8s, afterwards took 7.1s, about 5% slower. Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NNick Piggin <npiggin@kernel.dk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Nick Piggin 提交于
fs: remove extra lookup in __lookup_hash Optimize lookup for create operations, where no dentry should often be common-case. In cases where it is not, such as unlink, the added overhead is much smaller than the removed. Also, move comments about __d_lookup racyness to the __d_lookup call site. d_lookup is intuitive; __d_lookup is what needs commenting. So in that same vein, add kerneldoc comments to __d_lookup and clean up some of the comments: - We are interested in how the RCU lookup works here, particularly with renames. Make that explicit, and point to the document where it is explained in more detail. - RCU is pretty standard now, and macros make implementations pretty mindless. If we want to know about RCU barrier details, we look in RCU code. - Delete some boring legacy comments because we don't care much about how the code used to work, more about the interesting parts of how it works now. So comments about lazy LRU may be interesting, but would better be done in the LRU or refcount management code. Signed-off-by: NNick Piggin <npiggin@kernel.dk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Nick Piggin 提交于
fs: dentry allocation consolidation There are 2 duplicate copies of code in dentry allocation in path lookup. Consolidate them into a single function. Signed-off-by: NNick Piggin <npiggin@kernel.dk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Nick Piggin 提交于
fs: fix do_lookup false negative In do_lookup, if we initially find no dentry, we take the directory i_mutex and re-check the lookup. If we find a dentry there, then we revalidate it if needed. However if that revalidate asks for the dentry to be invalidated, we return -ENOENT from do_lookup. What should happen instead is an attempt to allocate and lookup a new dentry. This is probably not noticed because it is rare. It is only reached if a concurrent create races in first (in which case, the dentry probably won't be invalidated anyway), or if the racy __d_lookup has failed due to a false-negative (which is very rare). Fix this by removing code and have it use the normal reval path. Signed-off-by: NNick Piggin <npiggin@kernel.dk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 11 8月, 2010 1 次提交
-
-
由 Miklos Szeredi 提交于
Add three helpers that retrieve a refcounted copy of the root and cwd from the supplied fs_struct. get_fs_root() get_fs_pwd() get_fs_root_and_pwd() Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 02 8月, 2010 2 次提交
-
-
由 Eric Paris 提交于
SELinux needs to pass the MAY_ACCESS flag so it can handle auditting correctly. Presently the masking of MAY_* flags is done in the VFS. In order to allow LSMs to decide what flags they care about and what flags they don't just pass them all and the each LSM mask off what they don't need. This patch should contain no functional changes to either the VFS or any LSM. Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NStephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 Tetsuo Handa 提交于
When commit be6d3e56 "introduce new LSM hooks where vfsmount is available." was proposed, regarding security_path_truncate(), only "struct file *" argument (which AppArmor wanted to use) was removed. But length and time_attrs arguments are not used by TOMOYO nor AppArmor. Thus, let's remove these arguments. Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 28 7月, 2010 1 次提交
-
-
由 Eric Paris 提交于
fsnotify was using char * when it passed around the d_name.name string internally but it is actually an unsigned char *. This patch switches fsnotify to use unsigned and should silence some pointer signess warnings which have popped out of xfs. I do not add -Wpointer-sign to the fsnotify code as there are still issues with kstrdup and strlen which would pop out needless warnings. Signed-off-by: NEric Paris <eparis@redhat.com>
-
- 28 5月, 2010 1 次提交
-
-
由 Neil Brown 提交于
Commit 1f36f774 broke FS_REVAL_DOT semantics. In particular, before this patch, the command ls -l in an NFS mounted directory would always check if the directory on the server had changed and if so would flush and refill the pagecache for the dir. After this patch, the same "ls -l" will repeatedly return stale date until the cached attributes for the directory time out. The following patch fixes this by ensuring the d_revalidate is called by do_last when "." is being looked-up. link_path_walk has already called d_revalidate, but in that case LOOKUP_OPEN is not set so nfs_lookup_verify_inode chooses not to do any validation. The following patch restores the original behaviour. Cc: stable@kernel.org Signed-off-by: NNeilBrown <neilb@suse.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 22 5月, 2010 1 次提交
-
-
由 Huang Shijie 提交于
update the mnt of the path when it is not equal to the new one. Signed-off-by: NHuang Shijie <shijie8@gmail.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 15 5月, 2010 1 次提交
-
-
由 Al Viro 提交于
1) i_flags simply doesn't work for mount/unlink race prevention; we may have many links to file and rm on one of those obviously shouldn't prevent bind on top of another later on. To fix it right way we need to mark _dentry_ as unsuitable for mounting upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and i_mutex on the inode in question. Set it (with dont_mount(dentry)) in unlink/rmdir/etc., check (with cant_mount(dentry)) in places in namespace.c that used to check for S_DEAD. Setting S_DEAD is still needed in places where we used to set it (for directories getting killed), since we rely on it for readdir/rmdir race prevention. 2) rename()/mount() protection has another bogosity - we unhash the target before we'd checked that it's not a mountpoint. Fixed. 3) ancient bogosity in pivot_root() - we locked i_mutex on the right directory, but checked S_DEAD on the different (and wrong) one. Noticed and fixed. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 13 5月, 2010 1 次提交
-
-
由 Jan Kara 提交于
According to specification mkdir d; ln -s d a; open("a/", O_NOFOLLOW | O_RDONLY) should return success but currently it returns ELOOP. This is a regression caused by path lookup cleanup patch series. Fix the code to ignore O_NOFOLLOW in case the provided path has trailing slashes. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: NMarius Tolzmann <tolzmann@molgen.mpg.de> Acked-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 3月, 2010 1 次提交
-
-
由 Al Viro 提交于
Lose want_dir argument, while we are at it - since now nd->flags & LOOKUP_DIRECTORY is equivalent to it. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 07 3月, 2010 1 次提交
-
-
由 Al Viro 提交于
We managed to lose O_DIRECTORY testing due to a stupid typo in commit 1f36f774 ("Switch !O_CREAT case to use of do_last()") Reported-by: NWalter Sheets <w41ter@gmail.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 3月, 2010 6 次提交
-
-
由 Al Viro 提交于
... and now we have all intents crap well localized Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Now that nd->last stays around until ->put_link() is called, we can just postpone that ->put_link() in do_filp_open() a bit and don't bother with copying. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Don't bother with path_walk() (and its retry loop); link_path_walk() will do it. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
We set it to 1 iff we return NULL Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-