提交 · b161dfa6937ae46d50adce8a7c6b12233e96e7bd · xiphi1978 / linux

19 9月, 2012 1 次提交

vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill() · b161dfa6

由 Miklos Szeredi 提交于 9月 17, 2012

IBM reported a soft lockup after applying the fix for the rename_lock
deadlock.  Commit c83ce989 ("VFS: Fix the nfs sillyrename regression
in kernel 2.6.38") was found to be the culprit.

The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the
dentry was killed.  This flag can be set on non-killed dentries too,
which results in infinite retries when trying to traverse the dentry
tree.

This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is
only set in d_kill() and makes try_to_ascend() test only this flag.

IBM reported successful test results with this patch.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b161dfa6

14 7月, 2012 3 次提交

A
__d_unalias() should refuse to move mountpoints · ee3efa91
由 Al Viro 提交于 6月 08, 2012
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
ee3efa91
A
vfs: switch i_dentry/d_alias to hlist · b3d9b7a3
由 Al Viro 提交于 6月 09, 2012
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
b3d9b7a3

get rid of ->mnt_longterm · f7a99c5b

由 Al Viro 提交于 6月 09, 2012

it's enough to set ->mnt_ns of internal vfsmounts to something
distinct from all struct mnt_namespace out there; then we can
just use the check for ->mnt_ns != NULL in the fast path of
mntput_no_expire()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

f7a99c5b

09 6月, 2012 1 次提交

Revert "vfs: stop d_splice_alias creating directory aliases" · 32ba9c3f

由 Linus Torvalds 提交于 6月 08, 2012

This reverts commit 7732a557 (and commit
3f50fff4, which was a follow-up
cleanup).

We're chasing an elusive bug that Dave Jones can apparently reproduce
using his system call fuzzer tool, and that looks like some kind of
locking ordering problem on the directory i_mutex chain.  Our i_mutex
locking is rather complex, and depends on the topological ordering of
the directories, which is why we have been very wary of splicing
directory entries around.

Of course, we really don't want to ever see aliased unconnected
directories anyway, so none of this should ever happen, but this revert
aims to basically get us back to a known older state.

Bruce points to some of the previous discussion at

       http://marc.info/?i=<20110310105821.GE22723@ZenIV.linux.org.uk>

and in particular a long post from Neil:

       http://marc.info/?i=<20110311150749.2fa2be66@notabene.brown>

It should be noted that it's possible that Dave's problems come from
other changes altohgether, including possibly just the fact that Dave
constantly is teachning his fuzzer new tricks.  So what appears to be a
new bug could in fact be an old one that just gets newly triggered, but
reverting these patches as "still under heavy discussion" is the right
thing regardless.
Requested-by: NAl Viro <viro@zeniv.linux.org.uk>
Acked-by: NJ. Bruce Fields <bfields@fieldses.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

32ba9c3f

31 5月, 2012 2 次提交

vfs: remove unused __d_splice_alias argument · 3f50fff4

由 J. Bruce Fields 提交于 5月 09, 2012

Nobody sets want_disconn any more.
Reported-by: NPeng Tao <bergwolf@gmail.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

3f50fff4

vfs: stop d_splice_alias creating directory aliases · 7732a557

由 J. Bruce Fields 提交于 5月 09, 2012

A directory should never have more than one dentry pointing to it.

But d_splice_alias() will add one if it finds a directory with an
already-existing non-DISCONNECTED dentry.

I can't find an obvious reproducer, but I also can't see what prevents
d_splice_alias() from encountering such a case.

It therefore seems safest to allow d_splice_alias to use any dentry it
finds.

(Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0,
this could cause an nfsd deadlock like this:

	- Somebody attempts to remove a non-empty directory.
	- The dentry_unhash() in vfs_rmdir() unhashes the dentry
	  pointing to the non-empty directory.
	- ->rmdir() then fails with -ENOTEMPTY
	- Before the vfs_rmdir() caller reaches dput(), an nfsd process
	  in rename looks up the directory by filehandle; at the end of
	  that lookup, this dentry is found by d_alloc_anon(), and a
	  reference is taken on it, preventing dput() from removing it.
	- A regular lookup of the directory calls d_splice_alias(),
	  finds only an unhashed (not a DISCONNECTED) dentry, and
	  insteads adds a new one, so the directory now has two
	  dentries.
	- The nfsd process in rename, which was previously looking up
	  the source directory of the rename, now looks up the target
	  directory (which is the same), and gets the dentry newly
	  created by the previous lookup.
	- The rename, seeing two different dentries, assumes this is a
	  cross-directory rename and attempts to take the i_mutex on the
	  directory twice.

That reproducer no longer exists, but I don't think there was anything
fundamentally incorrect about the vfs_rmdir() behavior there, so I think
the real fault was here in d_splice_alias().)
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7732a557

30 5月, 2012 1 次提交

brlocks/lglocks: API cleanups · 962830df

由 Andi Kleen 提交于 5月 08, 2012

lglocks and brlocks are currently generated with some complicated macros
in lglock.h.  But there's no reason to not just use common utility
functions and put all the data into a common data structure.

In preparation, this patch changes the API to look more like normal
function calls with pointers, not magic macros.

The patch is rather large because I move over all users in one go to keep
it bisectable.  This impacts the VFS somewhat in terms of lines changed.
But no actual behaviour change.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

962830df

24 5月, 2012 1 次提交

mm: add a low limit to alloc_large_system_hash · 31fe62b9

由 Tim Bird 提交于 5月 23, 2012

UDP stack needs a minimum hash size value for proper operation and also
uses alloc_large_system_hash() for proper NUMA distribution of its hash
tables and automatic sizing depending on available system memory.

On some low memory situations, udp_table_init() must ignore the
alloc_large_system_hash() result and reallocs a bigger memory area.

As we cannot easily free old hash table, we leak it and kmemleak can
issue a warning.

This patch adds a low limit parameter to alloc_large_system_hash() to
solve this problem.

We then specify UDP_HTABLE_SIZE_MIN for UDP/UDPLite hash table
allocation.
Reported-by: NMark Asselstine <mark.asselstine@windriver.com>
Reported-by: NTim Bird <tim.bird@am.sony.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31fe62b9

22 5月, 2012 2 次提交

Revert "vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu" · 2e321806

由 Linus Torvalds 提交于 5月 21, 2012

This reverts commit 8c01a529.

It turns out the d_unhashed() check isn't unnecessary after all: while
it's true that unhashing will increment the sequence numbers, that does
not necessarily invalidate the RCU lookup, because it might have seen
the dentry pointer (before it got unhashed), but by the time it loaded
the sequence number, it could have seen the *new* sequence number (after
it got unhashed).

End result: we might look up an unhashed dentry that is about to be
freed, with the sequence number never indicating anything bad about it.
So checking that the dentry is still hashed (*after* reading the sequence
number) is indeed the proper fix, and was never unnecessary.
Reported-by: NDave Jones <davej@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2e321806

vfs: be even more careful about dentry RCU name lookups · 6326c71f

由 Linus Torvalds 提交于 5月 21, 2012

Miklos Szeredi points out that we need to also worry about memory
odering when doing the dentry name comparison asynchronously with RCU.

In particular, doing a rename can do a memcpy() of one dentry name over
another, and we want to make sure that any unlocked reader will always
see the proper terminating NUL character, so that it won't ever run off
the allocation.

Rather than having to be extra careful with the name copy or at lookup
time for each character, this resolves the issue by making sure that all
names that are inlined in the dentry always have a NUL character at the
end of the name allocation.  If we do that at dentry allocation time, we
know that no future name copy will ever change that final NUL to
anything else, so there are no memory ordering issues.

So even if a concurrent rename ends up overwriting the NUL character
that terminates the original name, we always know that there is one
final NUL at the end, and there is no worry about the lockless RCU
lookup traversing the name too far.

The out-of-line allocations are never copied over, so we can just make
sure that we write the name (with terminating NULL) and do a write
barrier before we expose the name to anything else by setting it in the
dentry.
Reported-by: NMiklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Piggin <npiggin@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6326c71f

11 5月, 2012 4 次提交

vfs: make it possible to access the dentry hash/len as one 64-bit entry · 26fe5750

由 Linus Torvalds 提交于 5月 10, 2012

This allows comparing hash and len in one operation on 64-bit
architectures.  Right now only __d_lookup_rcu() takes advantage of this,
since that is the case we care most about.

The use of anonymous struct/unions hides the alternate 64-bit approach
from most users, the exception being a few cases where we initialize a
'struct qstr' with a static initializer.  This makes the problematic
cases use a new QSTR_INIT() helper function for that (but initializing
just the name pointer with a "{ .name = xyzzy }" initializer remains
valid, as does just copying another qstr structure).
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

26fe5750

vfs: move dentry name length comparison from dentry_cmp() into callers · ee983e89

由 Linus Torvalds 提交于 5月 10, 2012

All callers do want to check the dentry length, but some of them can
check the length and the hash together, so doing it in dentry_cmp() can
be counter-productive.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ee983e89

vfs: do the careful dentry name access for all dentry_cmp cases · 94753db5

由 Linus Torvalds 提交于 5月 10, 2012

Commit 12f8ad4b ("vfs: clean up __d_lookup_rcu() and dentry_cmp()
interfaces") did the careful ACCESS_ONCE() of the dentry name only for
the word-at-a-time case, even though the issue is generic.

Admittedly I don't really see gcc ever reloading the value in the middle
of the loop, so the ACCESS_ONCE() protects us from a fairly theoretical
issue. But better safe than sorry.

Also, this consolidates the common parts of the word-at-a-time and
bytewise logic, which includes checking the length.  We'll be changing
that later.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

94753db5

vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu · 8c01a529

由 Linus Torvalds 提交于 5月 10, 2012

The check for d_unhashed() is not strictly incorrect, but at the same
time it is also not sensible. The actual dentry removal from the dentry
hash chains is totally asynchronous to the __d_lookup_rcu() logic, and
we depend on __d_drop() updating the sequence number to invalidate any
lookup of an unhashed dentry.

So checking d_unhashed() is not incorrect, but it's not useful either:
the code has to work correctly even without it. So just remove it.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8c01a529

05 5月, 2012 1 次提交

vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces · 12f8ad4b

由 Linus Torvalds 提交于 5月 04, 2012

The calling conventions for __d_lookup_rcu() and dentry_cmp() are
annoying in different ways, and there is actually one single underlying
reason for both of the annoyances.

The fundamental reason is that we do the returned dentry sequence number
check inside __d_lookup_rcu() instead of doing it in the caller.  This
results in two annoyances:

 - __d_lookup_rcu() now not only needs to return the dentry and the
   sequence number that goes along with the lookup, it also needs to
   return the inode pointer that was validated by that sequence number
   check.

 - and because we did the sequence number check early (to validate the
   name pointer and length) we also couldn't just pass the dentry itself
   to dentry_cmp(), we had to pass the counted string that contained the
   name.

So that sequence number decision caused two separate ugly calling
conventions.

Both of these problems would be solved if we just did the sequence
number check in the caller instead.  There's only one caller, and that
caller already has to do the sequence number check for the parent
anyway, so just do that.

That allows us to stop returning the dentry->d_inode in that in-out
argument (pointer-to-pointer-to-inode), so we can make the inode
argument just a regular input inode pointer.  The caller can just load
the inode from dentry->d_inode, and then do the sequence number check
after that to make sure that it's synchronized with the name we looked
up.

And it allows us to just pass in the dentry to dentry_cmp(), which is
what all the callers really wanted.  Sure, dentry_cmp() has to be a bit
careful about the dentry (which is not stable during RCU lookup), but
that's actually very simple.

And now that dentry_cmp() can clearly see that the first string argument
is a dentry, we can use the direct word access for that, instead of the
careful unaligned zero-padding.  The dentry name is always properly
aligned, since it is a single path component that is either embedded
into the dentry itself, or was allocated with kmalloc() (see __d_alloc).

Finally, this also uninlines the nasty slow-case for dentry comparisons:
that one *does* need to do a sequence number check, since it will call
in to the low-level filesystems, and we want to give those a stable
inode pointer and path component length/start arguments.  Doing an extra
sequence check for that slow case is not a problem, though.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

12f8ad4b

04 5月, 2012 1 次提交

vfs: make word-at-a-time accesses handle a non-existing page · e419b4cc

由 Linus Torvalds 提交于 5月 03, 2012

It turns out that there are more cases than CONFIG_DEBUG_PAGEALLOC that
can have holes in the kernel address space: it seems to happen easily
with Xen, and it looks like the AMD gart64 code will also punch holes
dynamically.

Actually hitting that case is still very unlikely, so just do the
access, and take an exception and fix it up for the very unlikely case
of it being a page-crosser with no next page.

And hey, this abstraction might even help other architectures that have
other issues with unaligned word accesses than the possible missing next
page.  IOW, this could do the byte order magic too.

Peter Anvin fixed a thinko in the shifting for the exception case.
Reported-and-tested-by: NJana Saout <jana@saout.de>
Cc:  Peter Anvin <hpa@zytor.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e419b4cc

29 3月, 2012 1 次提交

vfs: fix d_ancestor() case in d_materialize_unique · b18dafc8

由 Michel Lespinasse 提交于 3月 26, 2012

In d_materialise_unique() there are 3 subcases to the 'aliased dentry'
case; in two subcases the inode i_lock is properly released but this
does not occur in the -ELOOP subcase.

This seems to have been introduced by commit 18367501 ("fix loop
checks in d_materialise_unique()").
Signed-off-by: NMichel Lespinasse <walken@google.com>
Cc: stable@vger.kernel.org # v3.0+
[ Added a comment, and moved the unlock to where we generate the -ELOOP,
  which seems to be more natural.

  You probably can't actually trigger this without a buggy network file
  server - d_materialize_unique() is for finding aliases on non-local
  filesystems, and the d_ancestor() case is for a hardlinked directory
  loop.

  But we should be robust in the case of such buggy servers anyway. ]
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b18dafc8

23 3月, 2012 1 次提交

fs: fix kernel-doc warnings in dcache.c · 1f1e6e52

由 Randy Dunlap 提交于 3月 18, 2012

Fix kernel-doc warnings in fs/dcache.c:

Warning(fs/dcache.c:1743): No description found for parameter 'seqp'
Warning(fs/dcache.c:1743): Excess function parameter 'seq' description in '__d_lookup_rcu'
Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1f1e6e52

21 3月, 2012 1 次提交

vfs: d_alloc_root() gone · 32991ab3

由 Al Viro 提交于 2月 12, 2012

all callers converted to d_make_root() by now
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

32991ab3

20 3月, 2012 1 次提交

vfs: get rid of batshit-insane pointless dentry hash calculations · 6d7d1a0d

由 Linus Torvalds 提交于 3月 19, 2012

For some odd historical reason, the final mixing round for the dentry
cache hash table lookup had an insane "xor with big constant" logic. In
two places.

The big constant that is being xor'ed is GOLDEN_RATIO_PRIME, which is a
fairly random-looking number that is designed to be *multiplied* with so
that the bits get spread out over a whole long-word.

But xor'ing with it is insane. It doesn't really even change the hash -
it really only shifts the hash around in the hash table. To make
matters worse, the insane big constant is different on 32-bit and 64-bit
builds, even though the name hash bits we use are always 32-bit (and the
bits from the pointer we mix in effectively are too).

It's all total voodoo programming, in other words.

Now, some testing and analysis of the hash chains shows that the rest of
the hash function seems to be fairly good. It does pick the right bits
of the parent dentry pointer, for example, and while it's generally a
bad idea to use an xor to mix down the upper bits (because if there is a
repeating pattern, the xor can cause "destructive interference"), it
seems to not have been a disaster.

For example, replacing the hash with the normal "hash_long()" code (that
uses the GOLDEN_RATIO_PRIME constant correctly, btw) actually just makes
the hash worse. The hand-picked hash knew which bits of the pointer had
the highest entropy, and hash_long() ends up mixing bits less optimally
at least in some trivial tests.

So the hash function overall seems fine, it just has that really odd
"shift result around by a constant xor".

So get rid of the silly xor, and replace the down-mixing of the bits
with an add instead of an xor that tends to not have the same kind of
destructive interference issues. Some stats on the resulting hash
chains shows that they look statistically identical before and after,
but the code is simpler and no longer makes you go "WTF?".

Also, the incoming hash really is just "unsigned int", not a long, and
there's no real point to worry about the high 26 bits of the dentry
pointer for the 64-bit case, because they are all going to be identical
anyway.

So also change the hashing to be done in the more natural 'unsigned int'
that is the real size of the actual hashed data anyway.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6d7d1a0d

09 3月, 2012 1 次提交

vfs: use 'unsigned long' accesses for dcache name comparison and hashing · bfcfaa77

由 Linus Torvalds 提交于 3月 06, 2012

Ok, this is hacky, and only works on little-endian machines with goo
unaligned handling.  And even then only with CONFIG_DEBUG_PAGEALLOC
disabled, since it can access up to 7 bytes after the pathname.

But it runs like a bat out of hell.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bfcfaa77

05 3月, 2012 1 次提交

vfs: move dentry_cmp from <linux/dcache.h> to fs/dcache.c · 5483f18e

由 Linus Torvalds 提交于 3月 04, 2012

It's only used inside fs/dcache.c, and we're going to play games with it
for the word-at-a-time patches.  This time we really don't even want to
export it, because it really is an internal function to fs/dcache.c, and
has been since it was introduced.

Having it in that extremely hot header file (it's included in pretty
much everything, thanks to <linux/fs.h>) is a disaster for testing
different versions, and is utterly pointless.

We really should have some kind of header file diet thing, where we
figure out which parts of header files are really better off private and
only result in more expensive compiles.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5483f18e

03 3月, 2012 1 次提交

vfs: trivial __d_lookup_rcu() cleanups · 8966be90

由 Linus Torvalds 提交于 3月 02, 2012

These don't change any semantics, but they clean up the code a bit and
mark some arguments appropriately 'const'.

They came up as I was doing the word-at-a-time dcache name accessor
code, and cleaning this up now allows me to send out a smaller relevant
interesting patch for the experimental stuff.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8966be90

29 2月, 2012 1 次提交

fs: reduce the use of module.h wherever possible · 630d9c47

由 Paul Gortmaker 提交于 11月 16, 2011

For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include.  Fix up any implicit
include dependencies that were being masked by module.h along
the way.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

630d9c47

14 2月, 2012 1 次提交

vfs: fix panic in __d_lookup() with high dentry hashtable counts · 074b8517

由 Dimitri Sivanich 提交于 2月 08, 2012

When the number of dentry cache hash table entries gets too high
(2147483648 entries), as happens by default on a 16TB system, use of a
signed integer in the dcache_init() initialization loop prevents the
dentry_hashtable from getting initialized, causing a panic in
__d_lookup().  Fix this in dcache_init() and similar areas.
Signed-off-by: NDimitri Sivanich <sivanich@sgi.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

074b8517

13 1月, 2012 1 次提交

vfs: export symbol d_find_any_alias() · 46f72b34

由 Sage Weil 提交于 1月 10, 2012

Ceph needs this.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSage Weil <sage@newdream.net>

46f72b34

11 1月, 2012 1 次提交

fix shrink_dcache_parent() livelock · eaf5f907

由 Miklos Szeredi 提交于 1月 10, 2012

Two (or more) concurrent calls of shrink_dcache_parent() on the same dentry may
cause shrink_dcache_parent() to loop forever.

Here's what appears to happen:

1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1

2 - CPU1: select_parent(P) locks P->d_lock

3 - CPU0: shrink_dentry_list() locks C->d_lock
   dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock

4 - CPU1: select_parent(P) locks C->d_lock,
         moves C from dispose list being processed on CPU0 to the new
dispose list, returns 1

5 - CPU0: shrink_dentry_list() finds dispose list empty, returns

6 - Goto 2 with CPU0 and CPU1 switched

Basically select_parent() steals the dentry from shrink_dentry_list() and thinks
it found a new one, causing shrink_dentry_list() to think it's making progress
and loop over and over.

One way to trigger this is to make udev calls stat() on the sysfs file while it
is going away.

Having a file in /lib/udev/rules.d/ with only this one rule seems to the trick:

ATTR{vendor}=="0x8086", ATTR{device}=="0x10ca", ENV{PCI_SLOT_NAME}="%k", ENV{MATCHADDR}="$attr{address}", RUN+="/bin/true"

Then execute the following loop:

while true; do
        echo -bond0 > /sys/class/net/bonding_masters
        echo +bond0 > /sys/class/net/bonding_masters
        echo -bond1 > /sys/class/net/bonding_masters
        echo +bond1 > /sys/class/net/bonding_masters
done

One fix would be to check all callers and prevent concurrent calls to
shrink_dcache_parent().  But I think a better solution is to stop the
stealing behavior.

This patch adds a new dentry flag that is set when the dentry is added to the
dispose list.  The flag is cleared in dentry_lru_del() in case the dentry gets a
new reference just before being pruned.

If the dentry has this flag, select_parent() will skip it and let
shrink_dentry_list() retry pruning it.  With select_parent() skipping those
dentries there will not be the appearance of progress (new dentries found) when
there is none, hence shrink_dcache_parent() will not loop forever.

Set the flag is also set in prune_dcache_sb() for consistency as suggested by
Linus.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
CC: stable@vger.kernel.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

eaf5f907

10 1月, 2012 2 次提交

vfs: new helper - d_make_root() · adc0e91a

由 Al Viro 提交于 1月 08, 2012

d_alloc_root() with iput() in case of allocation failure...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

adc0e91a

dcache: use a dispose list in select_parent · b48f03b3

由 Dave Chinner 提交于 8月 23, 2011

select_parent currently abuses the dentry cache LRU to provide
cleanup features for child dentries that need to be freed. It moves
them to the tail of the LRU, then tells shrink_dcache_parent() to
calls __shrink_dcache_sb to unconditionally move them to a dispose
list (as DCACHE_REFERENCED is ignored). __shrink_dcache_sb() has to
relock the dentries to move them off the LRU onto the dispose list,
but otherwise does not touch the dentries that select_parent() moved
to the tail of the LRU. It then passses the dispose list to
shrink_dentry_list() which tries to free the dentries.

IOWs, the use of __shrink_dcache_sb() is superfluous - we can build
exactly the same list of dentries for disposal directly in
select_parent() and call shrink_dentry_list() instead of calling
__shrink_dcache_sb() to do that. This means that we avoid long holds
on the lru lock walking the LRU moving dentries to the dispose list
We also avoid the need to relock each dentry just to move it off the
LRU, reducing the numebr of times we lock each dentry to dispose of
them in shrink_dcache_parent() from 3 to 2 times.

Further, we remove one of the two callers of __shrink_dcache_sb().
This also means that __shrink_dcache_sb can be moved into back into
prune_dcache_sb() and we no longer have to handle referenced
dentries conditionally, simplifying the code.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

b48f03b3

04 1月, 2012 7 次提交

A
vfs: mnt_ns moved to struct mount · 143c8c91
由 Al Viro 提交于 11月 25, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
143c8c91
A
vfs: move mnt_mountpoint to struct mount · a73324da
由 Al Viro 提交于 11月 24, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
a73324da
A
vfs: now it can be done - make mnt_parent point to struct mount · 0714a533
由 Al Viro 提交于 11月 24, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
0714a533
A
vfs: mnt_parent moved to struct mount · 3376f34f
由 Al Viro 提交于 11月 24, 2011
```
the second victim...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
3376f34f
A
vfs: spread struct mount - mnt_has_parent · 676da58d
由 Al Viro 提交于 11月 24, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
676da58d

vfs: more mnt_parent cleanups · afac7cba

由 Al Viro 提交于 11月 23, 2011

a) mount --move is checking that ->mnt_parent is non-NULL before
looking if that parent happens to be shared; ->mnt_parent is never
NULL and it's not even an misspelled !mnt_has_parent()

b) pivot_root open-codes is_path_reachable(), poorly.

c) so does path_is_under(), while we are at it.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

afac7cba

vfs: new internal helper: mnt_has_parent(mnt) · b2dba1af

由 Al Viro 提交于 11月 23, 2011

vfsmounts have ->mnt_parent pointing either to a different vfsmount
or to itself; it's never NULL and termination condition in loops
traversing the tree towards root is mnt == mnt->mnt_parent.  At least
one place (see the next patch) is confused about what's going on;
let's add an explicit helper checking it right way and use it in
all places where we need it.  Not that there had been too many,
but...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

b2dba1af

07 12月, 2011 1 次提交

fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API · 02125a82

由 Al Viro 提交于 12月 05, 2011

__d_path() API is asking for trouble and in case of apparmor d_namespace_path()
getting just that.  The root cause is that when __d_path() misses the root
it had been told to look for, it stores the location of the most remote ancestor
in *root.  Without grabbing references.  Sure, at the moment of call it had
been pinned down by what we have in *path.  And if we raced with umount -l, we
could have very well stopped at vfsmount/dentry that got freed as soon as
prepend_path() dropped vfsmount_lock.

It is safe to compare these pointers with pre-existing (and known to be still
alive) vfsmount and dentry, as long as all we are asking is "is it the same
address?".  Dereferencing is not safe and apparmor ended up stepping into
that.  d_namespace_path() really wants to examine the place where we stopped,
even if it's not connected to our namespace.  As the result, it looked
at ->d_sb->s_magic of a dentry that might've been already freed by that point.
All other callers had been careful enough to avoid that, but it's really
a bad interface - it invites that kind of trouble.

The fix is fairly straightforward, even though it's bigger than I'd like:
	* prepend_path() root argument becomes const.
	* __d_path() is never called with NULL/NULL root.  It was a kludge
to start with.  Instead, we have an explicit function - d_absolute_root().
Same as __d_path(), except that it doesn't get root passed and stops where
it stops.  apparmor and tomoyo are using it.
	* __d_path() returns NULL on path outside of root.  The main
caller is show_mountinfo() and that's precisely what we pass root for - to
skip those outside chroot jail.  Those who don't want that can (and do)
use d_path().
	* __d_path() root argument becomes const.  Everyone agrees, I hope.
	* apparmor does *NOT* try to use __d_path() or any of its variants
when it sees that path->mnt is an internal vfsmount.  In that case it's
definitely not mounted anywhere and dentry_path() is exactly what we want
there.  Handling of sysctl()-triggered weirdness is moved to that place.
	* if apparmor is asked to do pathname relative to chroot jail
and __d_path() tells it we it's not in that jail, the sucker just calls
d_absolute_path() instead.  That's the other remaining caller of __d_path(),
BTW.
        * seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway -
the normal seq_file logics will take care of growing the buffer and redoing
the call of ->show() just fine).  However, if it gets path not reachable
from root, it returns SEQ_SKIP.  The only caller adjusted (i.e. stopped
ignoring the return value as it used to do).
Reviewed-by: NJohn Johansen <john.johansen@canonical.com>
ACKed-by: NJohn Johansen <john.johansen@canonical.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org

02125a82

21 11月, 2011 1 次提交

VFS: Log the fact that we've given ELOOP rather than creating a loop · dd179946

由 David Howells 提交于 8月 16, 2011

To prevent an NFS server from being used to create a directory loop in an NFS
superblock on the client, the following patch was committed:

commit 18367501
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Tue Jul 12 21:42:24 2011 -0400
Subject: fix loop checks in d_materialise_unique()

This causes ELOOP to be reported to anyone trying to access the dentry that
would otherwise cause the kernel to complete the loop.

However, no indication is given to the caller as to why an operation that ought
to work doesn't. The fault is with the kernel, which doesn't want to try and
solve the problem as it gets horrendously messy if there's another mountpoint
somewhere in the trees being spliced that can't be moved[*].

[*] The real problem is that we don't handle the excision of a subtree that
gets moved _out_ of what we can see. This can happen on the server where a
directory is merely moved between two other dirs on the same filesystem, but
where destination dir is not accessible by the client.

So, given the choice to return ELOOP rather than trying to reconfigure the
dentry tree, we should give the caller some indication of why they aren't being
allowed to make what should be a legitimate request and log a message.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NSachin Prabhu <sprabhu@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

dd179946

08 11月, 2011 1 次提交

vfs: d_invalidate() should leave mountpoints alone · 50e69630

由 Al Viro 提交于 11月 07, 2011

Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

50e69630