- 29 6月, 2013 2 次提交
-
-
由 Al Viro 提交于
the only remaining caller (in ncpfs) is guaranteed to return 0 - we only hit it if we'd just checked that there's no dentry with such name. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 5月, 2013 2 次提交
-
-
由 Wei Yongjun 提交于
Using list_move() instead of list_del() + list_add(). Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Yan, Zheng 提交于
When pruning a dentry, its ancestor dentry can also be pruned. But the ancestor dentry does not go through dput(), so it does not get put on the dentry LRU. Hence associating d_prune with removing the dentry from the LRU is the wrong. The fix is remove dentry_lru_prune(). Call file system's d_prune() callback directly when pruning dentries. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 01 5月, 2013 1 次提交
-
-
由 Greg Thelen 提交于
Call cond_resched() in shrink_dcache_parent() to maintain interactivity. Before this patch: void shrink_dcache_parent(struct dentry * parent) { while ((found = select_parent(parent, &dispose)) != 0) shrink_dentry_list(&dispose); } select_parent() populates the dispose list with dentries which shrink_dentry_list() then deletes. select_parent() carefully uses need_resched() to avoid doing too much work at once. But neither shrink_dcache_parent() nor its called functions call cond_resched(). So once need_resched() is set select_parent() will return single dentry dispose list which is then deleted by shrink_dentry_list(). This is inefficient when there are a lot of dentry to process. This can cause softlockup and hurts interactivity on non preemptable kernels. This change adds cond_resched() in shrink_dcache_parent(). The benefit of this is that need_resched() is quickly cleared so that future calls to select_parent() are able to efficiently return a big batch of dentry. These additional cond_resched() do not seem to impact performance, at least for the workload below. Here is a program which can cause soft lockup if other system activity sets need_resched(). int main() { struct rlimit rlim; int i; int f[100000]; char buf[20]; struct timeval t1, t2; double diff; /* cleanup past run */ system("rm -rf x"); /* boost nfile rlimit */ rlim.rlim_cur = 200000; rlim.rlim_max = 200000; if (setrlimit(RLIMIT_NOFILE, &rlim)) err(1, "setrlimit"); /* make directory for files */ if (mkdir("x", 0700)) err(1, "mkdir"); if (gettimeofday(&t1, NULL)) err(1, "gettimeofday"); /* populate directory with open files */ for (i = 0; i < 100000; i++) { snprintf(buf, sizeof(buf), "x/%d", i); f[i] = open(buf, O_CREAT); if (f[i] == -1) err(1, "open"); } /* close some of the files */ for (i = 0; i < 85000; i++) close(f[i]); /* unlink all files, even open ones */ system("rm -rf x"); if (gettimeofday(&t2, NULL)) err(1, "gettimeofday"); diff = (((double)t2.tv_sec * 1000000 + t2.tv_usec) - ((double)t1.tv_sec * 1000000 + t1.tv_usec)); printf("done: %g elapsed\n", diff/1e6); return 0; } Signed-off-by: NGreg Thelen <gthelen@google.com> Signed-off-by: NDave Chinner <david@fromorbit.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 3月, 2013 1 次提交
-
-
由 Al Viro 提交于
... lest we get livelocks between path_is_under() and d_path() and friends. The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks; it is possible to have thread B spin on attempt to take lock shared while thread A is already holding it shared, if B is on lower-numbered CPU than A and there's a thread C spinning on attempt to take the same lock exclusive. As the result, we need consistent ordering between vfsmount_lock (lglock) and rename_lock (seq_lock), even though everything that takes both is going to take vfsmount_lock only shared. Spotted-by: NBrad Spengler <spender@grsecurity.net> Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 28 2月, 2013 1 次提交
-
-
由 Sasha Levin 提交于
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NSasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 2月, 2013 2 次提交
-
-
由 Jeff Layton 提交于
The following set of operations on a NFS client and server will cause server# mkdir a client# cd a server# mv a a.bak client# sleep 30 # (or whatever the dir attrcache timeout is) client# stat . stat: cannot stat `.': Stale NFS file handle Obviously, we should not be getting an ESTALE error back there since the inode still exists on the server. The problem is that the lookup code will call d_revalidate on the dentry that "." refers to, because NFS has FS_REVAL_DOT set. nfs_lookup_revalidate will see that the parent directory has changed and will try to reverify the dentry by redoing a LOOKUP. That of course fails, so the lookup code returns ESTALE. The problem here is that d_revalidate is really a bad fit for this case. What we really want to know at this point is whether the inode is still good or not, but we don't really care what name it goes by or whether the dcache is still valid. Add a new d_op->d_weak_revalidate operation and have complete_walk call that instead of d_revalidate. The intent there is to allow for a "weaker" d_revalidate that just checks to see whether the inode is still good. This is also gives us an opportunity to kill off the FS_REVAL_DOT special casing. [AV: changed method name, added note in porting, fixed confusion re having it possibly called from RCU mode (it won't be)] Cc: NeilBrown <neilb@suse.de> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
* calling conventions change - ERR_PTR() is returned on ->d_hash() errors; NULL is just for dcache miss now. * exported, open-coded instances in ncpfs and cifs converted. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 23 2月, 2013 4 次提交
-
-
由 Al Viro 提交于
Its first argument is always non-root, while the second one is always root. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jeff Layton 提交于
The last caller was removed >2 years ago in commit 7b2a69ba. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 21 12月, 2012 2 次提交
-
-
由 NeilBrown 提交于
NFS appears to use d_obtain_alias() to create the root dentry rather than d_make_root. This can cause 'prepend_path()' to complain that the root has a weird name if an NFS filesystem is lazily unmounted. e.g. if "/mnt" is an NFS mount then { cd /mnt; umount -l /mnt ; ls -l /proc/self/cwd; } will cause a WARN message like WARNING: at /home/git/linux/fs/dcache.c:2624 prepend_path+0x1d7/0x1e0() ... Root dentry has weird name <> to appear in kernel logs. So change d_obtain_alias() to use "/" rather than "" as the anonymous name. Signed-off-by: NNeilBrown <neilb@suse.de> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jeff Layton 提交于
The code that relied on that flag was ripped out of btrfs quite some time ago, and never added back. Josef indicated that he was going to take a different approach to the problem in btrfs, and that we could just eliminate this flag. Cc: Josef Bacik <jbacik@fusionio.com> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 30 9月, 2012 1 次提交
-
-
由 Miklos Szeredi 提交于
IBM reported a deadlock in select_parent(). This was found to be caused by taking rename_lock when already locked when restarting the tree traversal. There are two cases when the traversal needs to be restarted: 1) concurrent d_move(); this can only happen when not already locked, since taking rename_lock protects against concurrent d_move(). 2) racing with final d_put() on child just at the moment of ascending to parent; rename_lock doesn't protect against this rare race, so it can happen when already locked. Because of case 2, we need to be able to handle restarting the traversal when rename_lock is already held. This patch fixes all three callers of try_to_ascend(). IBM reported that the deadlock is gone with this patch. [ I rewrote the patch to be smaller and just do the "goto again" if the lock was already held, but credit goes to Miklos for the real work. - Linus ] Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 9月, 2012 1 次提交
-
-
由 J. Bruce Fields 提交于
"Search list for X" sounds like you're trying to find X on a list. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 9月, 2012 1 次提交
-
-
由 Alan Cox 提交于
Each iteration of d_delete we reload inode from dentry->d_inode and then call S_ISDIR(inode-i_mode), so inode cannot possibly be NULL shortly afterwards unless something went horribly wrong. Signed-off-by: NAlan Cox <alan@linux.intel.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 19 9月, 2012 1 次提交
-
-
由 Miklos Szeredi 提交于
IBM reported a soft lockup after applying the fix for the rename_lock deadlock. Commit c83ce989 ("VFS: Fix the nfs sillyrename regression in kernel 2.6.38") was found to be the culprit. The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the dentry was killed. This flag can be set on non-killed dentries too, which results in infinite retries when trying to traverse the dentry tree. This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is only set in d_kill() and makes try_to_ascend() test only this flag. IBM reported successful test results with this patch. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 7月, 2012 3 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
it's enough to set ->mnt_ns of internal vfsmounts to something distinct from all struct mnt_namespace out there; then we can just use the check for ->mnt_ns != NULL in the fast path of mntput_no_expire() Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 09 6月, 2012 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit 7732a557 (and commit 3f50fff4, which was a follow-up cleanup). We're chasing an elusive bug that Dave Jones can apparently reproduce using his system call fuzzer tool, and that looks like some kind of locking ordering problem on the directory i_mutex chain. Our i_mutex locking is rather complex, and depends on the topological ordering of the directories, which is why we have been very wary of splicing directory entries around. Of course, we really don't want to ever see aliased unconnected directories anyway, so none of this should ever happen, but this revert aims to basically get us back to a known older state. Bruce points to some of the previous discussion at http://marc.info/?i=<20110310105821.GE22723@ZenIV.linux.org.uk> and in particular a long post from Neil: http://marc.info/?i=<20110311150749.2fa2be66@notabene.brown> It should be noted that it's possible that Dave's problems come from other changes altohgether, including possibly just the fact that Dave constantly is teachning his fuzzer new tricks. So what appears to be a new bug could in fact be an old one that just gets newly triggered, but reverting these patches as "still under heavy discussion" is the right thing regardless. Requested-by: NAl Viro <viro@zeniv.linux.org.uk> Acked-by: NJ. Bruce Fields <bfields@fieldses.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 31 5月, 2012 2 次提交
-
-
由 J. Bruce Fields 提交于
Nobody sets want_disconn any more. Reported-by: NPeng Tao <bergwolf@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 J. Bruce Fields 提交于
A directory should never have more than one dentry pointing to it. But d_splice_alias() will add one if it finds a directory with an already-existing non-DISCONNECTED dentry. I can't find an obvious reproducer, but I also can't see what prevents d_splice_alias() from encountering such a case. It therefore seems safest to allow d_splice_alias to use any dentry it finds. (Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0, this could cause an nfsd deadlock like this: - Somebody attempts to remove a non-empty directory. - The dentry_unhash() in vfs_rmdir() unhashes the dentry pointing to the non-empty directory. - ->rmdir() then fails with -ENOTEMPTY - Before the vfs_rmdir() caller reaches dput(), an nfsd process in rename looks up the directory by filehandle; at the end of that lookup, this dentry is found by d_alloc_anon(), and a reference is taken on it, preventing dput() from removing it. - A regular lookup of the directory calls d_splice_alias(), finds only an unhashed (not a DISCONNECTED) dentry, and insteads adds a new one, so the directory now has two dentries. - The nfsd process in rename, which was previously looking up the source directory of the rename, now looks up the target directory (which is the same), and gets the dentry newly created by the previous lookup. - The rename, seeing two different dentries, assumes this is a cross-directory rename and attempts to take the i_mutex on the directory twice. That reproducer no longer exists, but I don't think there was anything fundamentally incorrect about the vfs_rmdir() behavior there, so I think the real fault was here in d_splice_alias().) Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 30 5月, 2012 1 次提交
-
-
由 Andi Kleen 提交于
lglocks and brlocks are currently generated with some complicated macros in lglock.h. But there's no reason to not just use common utility functions and put all the data into a common data structure. In preparation, this patch changes the API to look more like normal function calls with pointers, not magic macros. The patch is rather large because I move over all users in one go to keep it bisectable. This impacts the VFS somewhat in terms of lines changed. But no actual behaviour change. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 24 5月, 2012 1 次提交
-
-
由 Tim Bird 提交于
UDP stack needs a minimum hash size value for proper operation and also uses alloc_large_system_hash() for proper NUMA distribution of its hash tables and automatic sizing depending on available system memory. On some low memory situations, udp_table_init() must ignore the alloc_large_system_hash() result and reallocs a bigger memory area. As we cannot easily free old hash table, we leak it and kmemleak can issue a warning. This patch adds a low limit parameter to alloc_large_system_hash() to solve this problem. We then specify UDP_HTABLE_SIZE_MIN for UDP/UDPLite hash table allocation. Reported-by: NMark Asselstine <mark.asselstine@windriver.com> Reported-by: NTim Bird <tim.bird@am.sony.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 5月, 2012 2 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit 8c01a529. It turns out the d_unhashed() check isn't unnecessary after all: while it's true that unhashing will increment the sequence numbers, that does not necessarily invalidate the RCU lookup, because it might have seen the dentry pointer (before it got unhashed), but by the time it loaded the sequence number, it could have seen the *new* sequence number (after it got unhashed). End result: we might look up an unhashed dentry that is about to be freed, with the sequence number never indicating anything bad about it. So checking that the dentry is still hashed (*after* reading the sequence number) is indeed the proper fix, and was never unnecessary. Reported-by: NDave Jones <davej@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Miklos Szeredi points out that we need to also worry about memory odering when doing the dentry name comparison asynchronously with RCU. In particular, doing a rename can do a memcpy() of one dentry name over another, and we want to make sure that any unlocked reader will always see the proper terminating NUL character, so that it won't ever run off the allocation. Rather than having to be extra careful with the name copy or at lookup time for each character, this resolves the issue by making sure that all names that are inlined in the dentry always have a NUL character at the end of the name allocation. If we do that at dentry allocation time, we know that no future name copy will ever change that final NUL to anything else, so there are no memory ordering issues. So even if a concurrent rename ends up overwriting the NUL character that terminates the original name, we always know that there is one final NUL at the end, and there is no worry about the lockless RCU lookup traversing the name too far. The out-of-line allocations are never copied over, so we can just make sure that we write the name (with terminating NULL) and do a write barrier before we expose the name to anything else by setting it in the dentry. Reported-by: NMiklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <npiggin@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 5月, 2012 4 次提交
-
-
由 Linus Torvalds 提交于
This allows comparing hash and len in one operation on 64-bit architectures. Right now only __d_lookup_rcu() takes advantage of this, since that is the case we care most about. The use of anonymous struct/unions hides the alternate 64-bit approach from most users, the exception being a few cases where we initialize a 'struct qstr' with a static initializer. This makes the problematic cases use a new QSTR_INIT() helper function for that (but initializing just the name pointer with a "{ .name = xyzzy }" initializer remains valid, as does just copying another qstr structure). Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
All callers do want to check the dentry length, but some of them can check the length and the hash together, so doing it in dentry_cmp() can be counter-productive. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Commit 12f8ad4b ("vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces") did the careful ACCESS_ONCE() of the dentry name only for the word-at-a-time case, even though the issue is generic. Admittedly I don't really see gcc ever reloading the value in the middle of the loop, so the ACCESS_ONCE() protects us from a fairly theoretical issue. But better safe than sorry. Also, this consolidates the common parts of the word-at-a-time and bytewise logic, which includes checking the length. We'll be changing that later. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
The check for d_unhashed() is not strictly incorrect, but at the same time it is also not sensible. The actual dentry removal from the dentry hash chains is totally asynchronous to the __d_lookup_rcu() logic, and we depend on __d_drop() updating the sequence number to invalidate any lookup of an unhashed dentry. So checking d_unhashed() is not incorrect, but it's not useful either: the code has to work correctly even without it. So just remove it. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 5月, 2012 1 次提交
-
-
由 Linus Torvalds 提交于
The calling conventions for __d_lookup_rcu() and dentry_cmp() are annoying in different ways, and there is actually one single underlying reason for both of the annoyances. The fundamental reason is that we do the returned dentry sequence number check inside __d_lookup_rcu() instead of doing it in the caller. This results in two annoyances: - __d_lookup_rcu() now not only needs to return the dentry and the sequence number that goes along with the lookup, it also needs to return the inode pointer that was validated by that sequence number check. - and because we did the sequence number check early (to validate the name pointer and length) we also couldn't just pass the dentry itself to dentry_cmp(), we had to pass the counted string that contained the name. So that sequence number decision caused two separate ugly calling conventions. Both of these problems would be solved if we just did the sequence number check in the caller instead. There's only one caller, and that caller already has to do the sequence number check for the parent anyway, so just do that. That allows us to stop returning the dentry->d_inode in that in-out argument (pointer-to-pointer-to-inode), so we can make the inode argument just a regular input inode pointer. The caller can just load the inode from dentry->d_inode, and then do the sequence number check after that to make sure that it's synchronized with the name we looked up. And it allows us to just pass in the dentry to dentry_cmp(), which is what all the callers really wanted. Sure, dentry_cmp() has to be a bit careful about the dentry (which is not stable during RCU lookup), but that's actually very simple. And now that dentry_cmp() can clearly see that the first string argument is a dentry, we can use the direct word access for that, instead of the careful unaligned zero-padding. The dentry name is always properly aligned, since it is a single path component that is either embedded into the dentry itself, or was allocated with kmalloc() (see __d_alloc). Finally, this also uninlines the nasty slow-case for dentry comparisons: that one *does* need to do a sequence number check, since it will call in to the low-level filesystems, and we want to give those a stable inode pointer and path component length/start arguments. Doing an extra sequence check for that slow case is not a problem, though. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 5月, 2012 1 次提交
-
-
由 Linus Torvalds 提交于
It turns out that there are more cases than CONFIG_DEBUG_PAGEALLOC that can have holes in the kernel address space: it seems to happen easily with Xen, and it looks like the AMD gart64 code will also punch holes dynamically. Actually hitting that case is still very unlikely, so just do the access, and take an exception and fix it up for the very unlikely case of it being a page-crosser with no next page. And hey, this abstraction might even help other architectures that have other issues with unaligned word accesses than the possible missing next page. IOW, this could do the byte order magic too. Peter Anvin fixed a thinko in the shifting for the exception case. Reported-and-tested-by: NJana Saout <jana@saout.de> Cc: Peter Anvin <hpa@zytor.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 3月, 2012 1 次提交
-
-
由 Michel Lespinasse 提交于
In d_materialise_unique() there are 3 subcases to the 'aliased dentry' case; in two subcases the inode i_lock is properly released but this does not occur in the -ELOOP subcase. This seems to have been introduced by commit 18367501 ("fix loop checks in d_materialise_unique()"). Signed-off-by: NMichel Lespinasse <walken@google.com> Cc: stable@vger.kernel.org # v3.0+ [ Added a comment, and moved the unlock to where we generate the -ELOOP, which seems to be more natural. You probably can't actually trigger this without a buggy network file server - d_materialize_unique() is for finding aliases on non-local filesystems, and the d_ancestor() case is for a hardlinked directory loop. But we should be robust in the case of such buggy servers anyway. ] Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 3月, 2012 1 次提交
-
-
由 Randy Dunlap 提交于
Fix kernel-doc warnings in fs/dcache.c: Warning(fs/dcache.c:1743): No description found for parameter 'seqp' Warning(fs/dcache.c:1743): Excess function parameter 'seq' description in '__d_lookup_rcu' Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 3月, 2012 1 次提交
-
-
由 Al Viro 提交于
all callers converted to d_make_root() by now Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 20 3月, 2012 1 次提交
-
-
由 Linus Torvalds 提交于
For some odd historical reason, the final mixing round for the dentry cache hash table lookup had an insane "xor with big constant" logic. In two places. The big constant that is being xor'ed is GOLDEN_RATIO_PRIME, which is a fairly random-looking number that is designed to be *multiplied* with so that the bits get spread out over a whole long-word. But xor'ing with it is insane. It doesn't really even change the hash - it really only shifts the hash around in the hash table. To make matters worse, the insane big constant is different on 32-bit and 64-bit builds, even though the name hash bits we use are always 32-bit (and the bits from the pointer we mix in effectively are too). It's all total voodoo programming, in other words. Now, some testing and analysis of the hash chains shows that the rest of the hash function seems to be fairly good. It does pick the right bits of the parent dentry pointer, for example, and while it's generally a bad idea to use an xor to mix down the upper bits (because if there is a repeating pattern, the xor can cause "destructive interference"), it seems to not have been a disaster. For example, replacing the hash with the normal "hash_long()" code (that uses the GOLDEN_RATIO_PRIME constant correctly, btw) actually just makes the hash worse. The hand-picked hash knew which bits of the pointer had the highest entropy, and hash_long() ends up mixing bits less optimally at least in some trivial tests. So the hash function overall seems fine, it just has that really odd "shift result around by a constant xor". So get rid of the silly xor, and replace the down-mixing of the bits with an add instead of an xor that tends to not have the same kind of destructive interference issues. Some stats on the resulting hash chains shows that they look statistically identical before and after, but the code is simpler and no longer makes you go "WTF?". Also, the incoming hash really is just "unsigned int", not a long, and there's no real point to worry about the high 26 bits of the dentry pointer for the 64-bit case, because they are all going to be identical anyway. So also change the hashing to be done in the more natural 'unsigned int' that is the real size of the actual hashed data anyway. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 3月, 2012 1 次提交
-
-
由 Linus Torvalds 提交于
Ok, this is hacky, and only works on little-endian machines with goo unaligned handling. And even then only with CONFIG_DEBUG_PAGEALLOC disabled, since it can access up to 7 bytes after the pathname. But it runs like a bat out of hell. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-