Commits · 9c8c10e262e0f62cb2530f1b076de979123183dd · openanolis / cloud-kernel

04 May, 2014 · 2 commits

more graceful recovery in umount_collect() · 9c8c10e2

Al Viro committed on May 02, 2014

Start with shrink_dcache_parent(), then scan what remains.

First of all, BUG() is very much an overkill here; we are holding
->s_umount, and hitting BUG() means that a lot of interesting stuff
will be hanging after that point (sync(2), for example).  Moreover,
in cases when there had been more than one leak, we'll be better
off reporting all of them.  And more than just the last component
of pathname - %pd is there for just such uses...

That was the last user of dentry_lru_del(), so kill it off...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


don't remove from shrink list in select_collect() · fe91522a

Al Viro committed on May 03, 2014

If we find something already on a shrink list, just increment
data->found and do nothing else.  Loops in shrink_dcache_parent() and
check_submounts_and_drop() will do the right thing - everything we
did put into our list will be evicted and if there had been nothing,
but data->found got non-zero, well, we have somebody else shrinking
those guys; just try again.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


01 May, 2014 · 5 commits

dentry_kill(): don't try to remove from shrink list · 41edf278

Al Viro committed on May 01, 2014

If the victim is on the shrink list, don't remove it from there.
If shrink_dentry_list() manages to remove it from the list before
we are done - fine, we'll just free it as usual.  If not - mark
it with new flag (DCACHE_MAY_FREE) and leave it there.

Eventually, shrink_dentry_list() will get to it, remove the sucker
from shrink list and call dentry_kill(dentry, 0).  Which is where
we'll deal with freeing.

Since now dentry_kill(dentry, 0) may happen after or during
dentry_kill(dentry, 1), we need to recognize that (by seeing
DCACHE_DENTRY_KILLED already set), unlock everything
and either free the sucker (in case DCACHE_MAY_FREE has been
set) or leave it for ongoing dentry_kill(dentry, 1) to deal with.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


expand the call of dentry_lru_del() in dentry_kill() · 01b60351

Al Viro committed on Apr 29, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

new helper: dentry_free() · b4f0354e

Al Viro committed on Apr 29, 2014

The part of old d_free() that dealt with actual freeing of dentry.
Taken out of dentry_kill() into a separate function.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


fold try_prune_one_dentry() · 5c47e6d0

Al Viro committed on Apr 29, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fold d_kill() and d_free() · 03b3b889

Al Viro committed on Apr 29, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

20 Apr, 2014 · 1 commit

fix races between __d_instantiate() and checks of dentry flags · 22213318

Al Viro committed on Apr 19, 2014

in non-lazy walk we need to be careful about dentry switching from
negative to positive - both ->d_flags and ->d_inode are updated,
and in some places we might see only one store.  The cases where
dentry has been obtained by dcache lookup with ->i_mutex held on
parent are safe - ->d_lock and ->i_mutex provide all the barriers
we need.  However, there are several places where we run into
trouble:
* do_last() fetches ->d_inode, then checks ->d_flags and
  assumes that inode won't be NULL unless d_is_negative() is true.
  Race with e.g. creat() - we might have fetched the old value of
  ->d_inode (still NULL) and new value of ->d_flags (already not
  DCACHE_MISS_TYPE).  Lin Ming has observed and reported the resulting
  oops.
* a bunch of places check ->d_inode for being non-NULL,
  then check ->d_flags for "is it a symlink".  Race with symlink(2)
  in case our CPU sees the ->d_inode update first - we see non-NULL
  there, but ->d_flags still contains DCACHE_MISS_TYPE instead of
  DCACHE_SYMLINK_TYPE.  Result: false negative on "should we follow
  link here?", with subsequent unpleasantness.

Cc: stable@vger.kernel.org # 3.13 and 3.14 need that one
Reported-and-tested-by: Lin Ming <minggr@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


01 Apr, 2014 · 1 commit

vfs: add cross-rename · da1ce067

Miklos Szeredi committed on Apr 01, 2014

If flags contain RENAME_EXCHANGE then exchange source and destination files.
There's no restriction on the type of the files; e.g. a directory can be
exchanged with a symlink.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>


23 Mar, 2014 · 1 commit

make prepend_name() work correctly when called with negative *buflen · e825196d

Al Viro committed on Mar 23, 2014

In all callchains leading to prepend_name(), the value left in *buflen
is eventually discarded unused if prepend_name() has returned a negative.
So we are free to do what prepend() does, and subtract from *buflen
*before* checking for underflow (which turns into checking the sign
of subtraction result, of course).

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


16 Mar, 2014 · 1 commit

drm: add pseudo filesystem for shared inodes · 31bbe16f

David Herrmann committed on Jan 03, 2014

Our current DRM design uses a single address_space for all users of the
same DRM device. However, there is no way to create an anonymous
address_space without an underlying inode. Therefore, we wait for the
first ->open() callback on a registered char-dev and take over the inode
of the char-dev. This worked well so far, but has several drawbacks:
 - We screw with FS internals and rely on some non-obvious invariants like
   inode->i_mapping being the same as inode->i_data for char-devs.
 - We don't have any address_space prior to the first ->open() from
   user-space. This leads to ugly fallback code and we cannot allocate
   global objects early.

As pointed out by Al Viro, fs/anon_inode.c is *not* supposed to be used by
drivers for anonymous inode-allocation. Therefore, this patch follows the
proposed alternative solution and adds a pseudo filesystem mount-point to
DRM. We can then allocate private inodes including a private address_space
for each DRM device at initialization time.

Note that we could use:
  sysfs_get_inode(sysfs_mnt->mnt_sb, drm_device->dev->kobj.sd);
to get access to the underlying sysfs-inode of a "struct device" object.
However, most of this information is currently hidden and it's not clear
whether this address_space is suitable for driver access. Thus, unless
Linux allows anonymous address_space objects or driver-core provides a
public inode per device, we're left with our own private internal mount
point.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>


27 Jan, 2014 · 1 commit

__dentry_path() fixes · f6500801

Al Viro committed on Jan 26, 2014

* we need to save the starting point for restarts
* reject pathologically short buffers outright
Spotted-by: Denys Vlasenko <dvlasenk@redhat.com>
Spotted-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


26 Jan, 2014 · 1 commit

vfs: Remove second variable named error in __dentry_path · a8323da0

Eric W. Biederman committed on Jan 20, 2014

In commit 232d2d60
Author: Waiman Long <Waiman.Long@hp.com>
Date:   Mon Sep 9 12:18:13 2013 -0400

    dcache: Translating dentry into pathname without taking rename_lock

The __dentry_path locking was changed and the variable error was
intended to be moved outside of the loop.  Unfortunately the inner
declaration of error was not removed, resulting in a version of
__dentry_path that will never return an error.

Remove the problematic inner declaration of error and allow
__dentry_path to return errors once again.

Cc: stable@vger.kernel.org
Cc: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


13 Dec, 2013 · 1 commit

dcache: allow word-at-a-time name hashing with big-endian CPUs · a5c21dce

Will Deacon committed on Dec 12, 2013

When explicitly hashing the end of a string with the word-at-a-time
interface, we have to be careful which end of the word we pick up.

On big-endian CPUs, the upper bits will contain the data we're after, so
ensure we generate our masks accordingly (and avoid hashing whatever
random junk may have been sitting after the string).

This patch adds a new dcache helper, bytemask_from_count, which creates
a mask appropriate for the CPU endianness.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


27 Nov, 2013 · 1 commit

vfs: In d_path don't call d_dname on a mount point · f48cfddc

Eric W. Biederman committed on Nov 08, 2013

Aditya Kali (adityakali@google.com) wrote:
> Commit bf056bfa:
> "proc: Fix the namespace inode permission checks." converted
> the namespace files into symlinks. The same commit changed
> the way namespace bind mounts appear in /proc/mounts:
>   $ mount --bind /proc/self/ns/ipc /mnt/ipc
> Originally:
>   $ cat /proc/mounts | grep ipc
>   proc /mnt/ipc proc rw,nosuid,nodev,noexec 0 0
>
> After commit bf056bfa:
>   $ cat /proc/mounts | grep ipc
>   proc ipc:[4026531839] proc rw,nosuid,nodev,noexec 0 0
>
> This breaks userspace which expects the 2nd field in
> /proc/mounts to be a valid path.

The symlinks /proc/<pid>/ns/{ipc,mnt,net,pid,user,uts} point to
dentries allocated with d_alloc_pseudo that we can mount, and
that have interesting names printed out with d_dname.

When these files are bind mounted, /proc/mounts does not currently
display the mount point correctly because d_dname is called instead
of just displaying the path where the file is mounted.

Solve this by adding an explicit check to distinguish mounted pseudo
inodes and unmounted pseudo inodes.  Unmounted pseudo inodes always
use the mount of their filesystem as the mnt_root in their path, making
these two cases easy to distinguish.

CC: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reported-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


16 Nov, 2013 · 3 commits

fold try_to_ascend() into the sole remaining caller · 31dec132

Al Viro committed on Oct 25, 2013

There used to be a bunch of tree-walkers in dcache.c, all alike.
try_to_ascend() had been introduced to abstract a piece of logic
duplicated in all of them. These days all these tree-walkers are
implemented via the same iterator (d_walk()), which is the only
remaining caller of try_to_ascend(), so let's fold it back...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


dcache.c: get rid of pointless macros · 482db906

Al Viro committed on Oct 25, 2013

D_HASH{MASK,BITS} are used once each, both in the same function (d_hash()).
At this point they are actively misguiding - they imply that values are
compile-time constants, which is no longer true.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


take read_seqbegin_or_lock() and friends to seqlock.h · 2bc74feb

Al Viro committed on Oct 25, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

13 Nov, 2013 · 2 commits

prepend_path() needs to reinitialize dentry/vfsmount/mnt on restarts · ede4cebc

Al Viro committed on Nov 13, 2013

... and equivalent is needed in 3.12; it's broken there as well
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fix unpaired rcu lock in prepend_path() · 4ec6c2ae

Li Zhong committed on Nov 13, 2013

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


09 Nov, 2013 · 7 commits

dcache: don't clear DCACHE_DISCONNECTED too early · f80de2cd

J. Bruce Fields committed on Jul 18, 2012

DCACHE_DISCONNECTED should not be cleared until we're sure the dentry is
connected all the way up to the root of the filesystem.  It *shouldn't*
be cleared as soon as the dentry is connected to a parent.  That will
cause bugs at least on exportable filesystems.
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


dcache: Don't set DISCONNECTED on "pseudo filesystem" dentries · e1a24bb0

J. Bruce Fields committed on Jun 29, 2012

I can't for the life of me see any reason why anyone should care whether
a dentry that is never hooked into the dentry cache would need
DCACHE_DISCONNECTED set.

This originates from 4b936885 "fs:
improve scalability of pseudo filesystems", which probably just made the
false assumption that DCACHE_DISCONNECTED was meant to be set on anything
not connected to a parent somehow.

So this is just confusing.  Ideally the only uses of DCACHE_DISCONNECTED
would be in the filehandle-lookup code, which needs it to ensure
dentries are connected into the dentry tree before use.

I left d_alloc_pseudo there even though it's now equivalent to
__d_alloc(), just on the theory the name is better documentation of its
intended use outside dcache.c.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


dcache: use IS_ROOT to decide where dentry is hashed · 7632e465

J. Bruce Fields committed on Jun 28, 2012

Every hashed dentry is either hashed in the dentry_hashtable, or on a
superblock's s_anon list.

__d_drop() assumes it can determine which is the case by checking
DCACHE_DISCONNECTED; this is not true.

It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
only hashed on dentry_hashtable, but is fully connected to its parents
back to the root.

But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
attempts to connect a directory (found by filehandle lookup) back to
root by ascending to parents and performing lookups one at a time.  It
does not clear DCACHE_DISCONNECTED until it's done, and that is not at
all an atomic process.

In particular, it is possible for DCACHE_DISCONNECTED to be set on a
dentry which is hashed on the dentry_hashtable.

Instead, use IS_ROOT() to check which hash chain a dentry is on.  This
*does* work:

Dentries are hashed only by:

	- d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.

	- __d_rehash, called by _d_rehash: hashes to the dentry's
	  parent, and all callers of _d_rehash appear to have d_parent
	  set to a "real" parent.
	- __d_rehash, called by __d_move: rehashes the moved dentry to
	  hash chain determined by target, and assigns target's d_parent
	  to its d_parent, before dropping the dentry's d_lock.

Therefore I believe it's safe for a holder of a dentry's d_lock to
assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
true.

I believe the incorrect assumption about DCACHE_DISCONNECTED was
originally introduced by ceb5bdc2 "fs: dcache per-bucket dcache hash
locking".

Also add a comment while we're here.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


VFS: Put a small type field into struct dentry::d_flags · b18825a7

David Howells committed on Sep 12, 2013

Put a type field into struct dentry::d_flags to indicate if the dentry is one
of the following types that relate particularly to pathwalk:

	Miss (negative dentry)
	Directory
	"Automount" directory (defective - no i_op->lookup())
	Symlink
	Other (regular, socket, fifo, device)

The type field is set to one of the first five types on a dentry by calls to
__d_instantiate() and d_obtain_alias() from information in the inode (if one is
given).

The type is cleared by dentry_unlink_inode() when it reconstitutes an existing
dentry as a negative dentry.

Accessors provided are:

	d_set_type(dentry, type)
	d_is_directory(dentry)
	d_is_autodir(dentry)
	d_is_symlink(dentry)
	d_is_file(dentry)
	d_is_negative(dentry)
	d_is_positive(dentry)

A bunch of checks in pathname resolution switched to those.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


fold __d_shrink() into its only remaining caller · b61625d2

Al Viro committed on Oct 04, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

RCU'd vfsmounts · 48a066e7

Al Viro committed on Sep 29, 2013

* RCU-delayed freeing of vfsmounts
* vfsmount_lock replaced with a seqlock (mount_lock)
* sequence number from mount_lock is stored in nameidata->m_seq and
used when we exit RCU mode
* new vfsmount flag - MNT_SYNC_UMOUNT.  Set by umount_tree() when its
caller knows that vfsmount will have no surviving references.
* synchronize_rcu() done between unlocking namespace_sem in namespace_unlock()
and doing pending mntput().
* new helper: legitimize_mnt(mnt, seq).  Checks the mount_lock sequence
number against seq, then grabs reference to mnt.  Then it rechecks mount_lock
again to close the race and either returns success or drops the reference it
has acquired.  The subtle point is that in case of MNT_SYNC_UMOUNT we can
simply decrement the refcount and sod off - aforementioned synchronize_rcu()
makes sure that final mntput() won't come until we leave RCU mode.  We need
that, since we don't want to end up with some lazy pathwalk racing with
umount() and stealing the final mntput() from it - caller of umount() may
expect it to return only once the fs is shut down and we don't want to break
that.  In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do
full-blown mntput() in case of mount_lock sequence number mismatch happening
just as we'd grabbed the reference, but in those cases we won't be stealing
the final mntput() from anything that would care.
* mntput_no_expire() doesn't lock anything on the fast path now.  Incidentally,
SMP and UP cases are handled the same way - no ifdefs there.
* normal pathname resolution does *not* do any writes to mount_lock.  It does,
of course, bump the refcounts of vfsmount and dentry in the very end, but that's
it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


switch shrink_dcache_for_umount() to use of d_walk() · 42c32608

Al Viro committed on Nov 08, 2013

we have too many iterators in fs/dcache.c...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

06 Nov, 2013 · 1 commit

seqcount: Add lockdep functionality to seqcount/seqlock structures · 1ca7d67c

John Stultz committed on Oct 07, 2013

Currently seqlocks and seqcounts don't support lockdep.

After running across a seqcount related deadlock in the timekeeping
code, I used a less-refined and more focused variant of this patch
to narrow down the cause of the issue.

This is a first-pass attempt to properly enable lockdep functionality
on seqlocks and seqcounts.

Since seqcounts are used in the vdso gettimeofday code, I've provided
non-lockdep accessors for those needs.

I've also handled one case where there were nested seqlock writers
and there may be more edge cases.

Comments and feedback would be appreciated!
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


01 Nov, 2013 · 1 commit

vfs: decrapify dput(), fix cache behavior under normal load · 358eec18

Linus Torvalds committed on Oct 31, 2013

We do not want to dirty the dentry->d_flags cacheline in dput() just to
set the DCACHE_REFERENCED flag when it is already set in the common case
anyway.  This way the first cacheline of the dentry (which contains the
RCU lookup information etc) can stay shared among multiple CPUs.

This finishes off some of the details of all the scalability patches
merged during the merge window.

Also don't mark dentry_kill() for inlining, since it's the uncommon path
and inlining it just makes the common path slower due to extra function
entry/exit overhead.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


25 Oct, 2013 · 2 commits

vfs: introduce d_instantiate_no_diralias() · b70a80e7

Miklos Szeredi committed on Oct 01, 2013

...which just returns -EBUSY if a directory alias would be created.

This is to be used by fuse mkdir to make sure that a buggy or malicious
userspace filesystem doesn't do anything nasty.  Previously fuse used a
private mutex for this purpose, which can now go away.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>


move taking vfsmount_lock down into prepend_path() · 94e92a6e

Al Viro committed on Oct 01, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

22 Oct, 2013 · 1 commit

vfs: fix new kernel-doc warnings · 69c88dc7

Randy Dunlap committed on Oct 19, 2013

Move kernel-doc notation to immediately before its function to eliminate
kernel-doc warnings introduced by commit db14fc3a ("vfs: add
d_walk()")

  Warning(fs/dcache.c:1343): No description found for parameter 'data'
  Warning(fs/dcache.c:1343): No description found for parameter 'dentry'
  Warning(fs/dcache.c:1343): Excess function parameter 'parent' description in 'check_mount'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


15 Sep, 2013 · 1 commit

vfs: fix typo in comment in recent dentry work · 05a8252b

Linus Torvalds committed on Sep 15, 2013

Sedat points out that I transposed some letters in "LRU" and wrote "RLU"
instead in one of the new comments explaining the flow.  Let's just fix
it.
Reported-by: Sedat Dilek <sedat.dilek@jpberlin.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


14 Sep, 2013 · 1 commit

vfs: fix dentry LRU list handling and nr_dentry_unused accounting · 89dc77bc

Linus Torvalds committed on Sep 13, 2013

The LRU list changes interacted badly with our nr_dentry_unused
accounting, and even worse with the new DCACHE_LRU_LIST bit logic.

This introduces helper functions to make sure everything follows the
proper dcache d_lru list rules: the dentry cache is complicated by the
fact that some of the hotpaths don't even want to look at the LRU list
at all, and the fact that we use the same list entry in the dentry for
both the LRU list and for our temporary shrinking lists when removing
things from the LRU.

The helper functions temporarily have some extra sanity checking for the
flag bits that have to match the current LRU state of the dentry.  We'll
remove that before the final 3.12 release, but considering how easy it
is to get wrong, this first cleanup version has some very particular
sanity checking.
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


13 Sep, 2013 · 6 commits

vfs: make d_path() get the root path under RCU · 68f0d9d9

Linus Torvalds committed on Sep 12, 2013

This avoids the spinlocks and refcounts in the d_path() sequence too
(used by /proc and various other entities).  See commit 8b19e341 for
the equivalent getcwd() system call path.

And unlike getcwd(), d_path() doesn't copy the result to user space, so
I don't need to fear _that_ particular bug happening again.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


vfs: use __getname/__putname for getcwd() system call · 3272c544

Linus Torvalds committed on Sep 12, 2013

It's a pathname.  It should use the pathname allocators and
deallocators, and PATH_MAX instead of PAGE_SIZE.  Never mind that the
two are commonly the same.

With this, the allocations scale up nicely too, and I can do getcwd()
system calls at a rate of about 300M/s, with no lock contention
anywhere.

Of course, nobody sane does that, especially since getcwd() is
traditionally a very slow operation in Unix.  But this was also the
simplest way to benchmark the prepend_path() improvements by Waiman, and
once I saw the profiles I couldn't leave it well enough alone.

But apart from being a performance improvement (from using per-cpu slab
allocators instead of the raw page allocator), it's actually a valid and
real cleanup.
Signed-off-by: Linus "OCD" Torvalds <torvalds@linux-foundation.org>


vfs: don't copy things to user space holding the rcu readlock · ff812d72

Linus Torvalds committed on Sep 12, 2013

Oops.  That wasn't very smart.  We don't actually need the RCU lock any
more by the time we copy the cwd string to user space, but I had
stupidly surrounded the whole thing with it.

Introduced by commit 8b19e341 ("vfs: make getcwd() get the root and
pwd path under rcu")

Is-a-big-hairy-idiot: Linus Torvalds <torvalds@linux-foundation.org>


vfs: make getcwd() get the root and pwd path under rcu · 8b19e341

Linus Torvalds committed on Sep 12, 2013

This allows us to skip all the crazy spinlocks and reference count
updates, and instead use the fs sequence read-lock to get an atomic
snapshot of the root and cwd information.

We might want to make the rule that "prepend_path()" is always called
with the RCU lock held, but the RCU lock nests fine and this is the
minimal fix.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


vfs: move get_fs_root_and_pwd() to single caller · 5762482f

Linus Torvalds committed on Sep 12, 2013

Let's not pollute the include files with inline functions that are only
used in a single place. Especially not if we decide we might want to
change the semantics of said function to make it more efficient.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


dcache: get/release read lock in read_seqbegin_or_lock() & friend · 18129977

Waiman Long committed on Sep 12, 2013

This patch modifies read_seqbegin_or_lock() and need_seqretry() to use
newly introduced read_seqlock_excl() and read_sequnlock_excl()
primitives so that they won't change the sequence number even if they
fall back to take the lock.  This is OK as no change to the protected
data structure is being made.

It will prevent one fallback to taking the lock from cascading into a
series of further lock acquisitions (which reduces performance) because
of the sequence number change.  It will also allow other sequence readers
to go forward while an exclusive reader lock is taken.

This patch also updates some of the inaccurate comments in the code.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

