Commits · 55db2fd9361424a6a5815e7796bcf03b19df437c · openeuler / raspberrypi-kernel

03 May, 2016 10 commits

atomic_open(): massage the create_error logics a bit · 55db2fd9
Committed by Al Viro on Apr 27, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
55db2fd9

atomic_open(): consolidate "overridden ENOENT" in open-yourself cases · 9d0728e1
Committed by Al Viro on Apr 27, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
9d0728e1

atomic_open(): don't bother with EEXIST check - it's done in do_last() · 5249e411
Committed by Al Viro on Apr 27, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
5249e411

lookup_open(): expand the call of vfs_create() · ce8644fc

Committed by Al Viro on Apr 26, 2016

Lift IS_DEADDIR handling up into the part common with atomic_open(),
remove it from the latter. Collapse permission checks into the
call of may_o_create(), getting it closer to atomic_open() case.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ce8644fc

path_openat(): take O_PATH handling out of do_last() · 6ac08709

Committed by Al Viro on Apr 26, 2016

do_last() and lookup_open() are simpler that way and so is O_PATH
itself.  As it bloody well should: we find what the pathname
resolves to, same way as in stat() et al., and associate it with
FMODE_PATH struct file.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6ac08709

parallel lookups: actual switch to rwsem · 9902af79

Committed by Al Viro on Apr 15, 2016

ta-da!

The main issue is the lack of down_write_killable(), so the places
like readdir.c switched to plain inode_lock(); once killable
variants of rwsem primitives appear, that'll be dealt with.

lockdep side also might need more work
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9902af79

parallel lookups machinery, part 4 (and last) · d9171b93

Committed by Al Viro on Apr 15, 2016

If we *do* run into an in-lookup match, we need to wait for it to
cease being in-lookup.  Fortunately, we do have unused space in
in-lookup dentries - d_lru is never looked at until it stops being
in-lookup.

So we can stash a pointer to wait_queue_head from stack frame of
the caller of ->lookup().  Some precautions are needed while
waiting, but it's not that hard - we do hold a reference to dentry
we are waiting for, so it can't go away.  If it's found to be
in-lookup the wait_queue_head is still alive and will remain so
at least while ->d_lock is held.  Moreover, the condition we
are waiting for becomes true at the same point where everything
on that wq gets woken up, so we can just add ourselves to the
queue once.

d_alloc_parallel() gets a pointer to wait_queue_head_t from its
caller; lookup_slow() adjusted, d_add_ci() taught to use
d_alloc_parallel() if the dentry passed to it happens to be
in-lookup one (i.e. if it's been called from the parallel lookup).

That's pretty much it - all that remains is to switch ->i_mutex
to rwsem and have lookup_slow() take it shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d9171b93

parallel lookups machinery, part 3 · 94bdd655

Committed by Al Viro on Apr 15, 2016

We will need to be able to check if there is an in-lookup
dentry with matching parent/name.  Right now it's impossible,
but as soon as we start locking directories shared such beasts
will appear.

Add a secondary hash for locating those.  Hash chains go through
the same space where d_alias will be once it's not in-lookup anymore.
Search is done under the same bitlock we use for modifications -
with the primary hash we can rely on d_rehash() into the wrong
chain being the worst that could happen, but here the pointers are
buggered once it's removed from the chain.  On the other hand,
the chains are not going to be long and normally we'll end up
adding to the chain anyway.  That allows us to avoid bothering with
->d_lock when doing the comparisons - everything is stable until
removed from chain.

New helper: d_alloc_parallel().  Right now it allocates, verifies
that no hashed and in-lookup matches exist and adds to in-lookup
hash.

Returns ERR_PTR() for error, hashed match (in the unlikely case it's
been found) or new dentry.  In-lookup matches trigger BUG() for
now; that will change in the next commit when we introduce waiting
for ongoing lookup to finish.  Note that in-lookup matches won't be
possible until we actually go for shared locking.

lookup_slow() switched to use of d_alloc_parallel().

Again, these commits are separated only for making it easier to
review.  All this machinery will start doing something useful only
when we go for shared locking; it's just that the combination is
too large for my taste.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

94bdd655

beginning of transition to parallel lookups - marking in-lookup dentries · 85c7f810

Committed by Al Viro on Apr 14, 2016

marked as such when (would be) parallel lookup is about to pass them
to actual ->lookup(); unmarked when
	* __d_add() is about to make it hashed, positive or not.
	* __d_move() (from d_splice_alias(), directly or via
	  __d_unalias()) puts a preexisting dentry in its place
	* in caller of ->lookup() if it has escaped all of the
	  above.  Bug (WARN_ON, actually) if it reaches the final dput()
	  or d_instantiate() while still marked such.

As the result, we are guaranteed that for as long as the flag is
set, dentry will
	* remain negative unhashed with positive refcount
	* never have its ->d_alias looked at
	* never have its ->d_lru looked at
	* never have its ->d_parent and ->d_name changed

Right now we have at most one such for any given parent directory.
With parallel lookups that restriction will weaken to
	* only exist when parent is locked shared
	* at most one with given (parent,name) pair (comparison of
	  names is according to ->d_compare())
	* only exist when there's no hashed dentry with the same
	  (parent,name)

Transition will take the next several commits; unfortunately, we'll
only be able to switch to rwsem at the end of this series.  The
reason for not making it a single patch is to simplify review.

New primitives: d_in_lookup() (a predicate checking if dentry is in
the in-lookup state) and d_lookup_done() (tells the system that
we are done with lookup and if it's still marked as in-lookup, it
should cease to be such).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

85c7f810
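
As a rough illustration of the two primitives named in this commit message, the userspace C sketch below models the in-lookup mark as a single flag bit. The struct, the flag value and the helper demo_start_lookup() are assumptions made for the example; the kernel versions operate on struct dentry and take ->d_lock where needed.

```c
#include <stdbool.h>

/* Hypothetical flag bit; the value is chosen only for this sketch. */
#define DEMO_PAR_LOOKUP 0x1u

/* Minimal stand-in for a dentry: only the flags word matters here. */
struct demo_dentry {
	unsigned int d_flags;
};

/* Mark a dentry just before it would be handed to ->lookup(). */
static inline void demo_start_lookup(struct demo_dentry *dentry)
{
	dentry->d_flags |= DEMO_PAR_LOOKUP;
}

/* Predicate: is the dentry still in the in-lookup state? */
static inline bool d_in_lookup(const struct demo_dentry *dentry)
{
	return (dentry->d_flags & DEMO_PAR_LOOKUP) != 0;
}

/* Tell the system we are done; clears the mark only if it is still set. */
static inline void d_lookup_done(struct demo_dentry *dentry)
{
	if (d_in_lookup(dentry))
		dentry->d_flags &= ~DEMO_PAR_LOOKUP;
}
```

Calling d_lookup_done() on a dentry that has already left the in-lookup state is a no-op in this sketch, which matches the "if it's still marked" wording of the message.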

lookup_slow(): bugger off on IS_DEADDIR() from the very beginning · 1936386e
Committed by Al Viro on Apr 14, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
1936386e

01 May, 2016 1 commit

atomic_open(): fix the handling of create_error · 10c64cea

Committed by Al Viro on Apr 27, 2016

* if we have a hashed negative dentry and either CREAT|EXCL on
r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL
with failing may_o_create(), we should fail with EROFS or the
error may_o_create() has returned, but not ENOENT.  Which is what
the current code ends up returning.

* if we have CREAT|TRUNC hitting a regular file on a read-only
filesystem, we can't fail with EROFS here.  At the very least,
not until we'd done follow_managed() - we might have a writable
file (or a device, for that matter) bound on top of that one.
Moreover, the code downstream will see that O_TRUNC and attempt
to grab the write access (*after* following possible mount), so
if we really should fail with EROFS, it will happen.  No need
to do that inside atomic_open().

The real logics is much simpler than what the current code is
trying to do - if we decided to go for simple lookup, ended
up with a negative dentry *and* had create_error set, fail with
create_error.  No matter whether we'd got that negative dentry
from lookup_real() or had found it in dcache.

Cc: stable@vger.kernel.org # v3.6+
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

10c64cea
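
The rule stated in the last paragraph of this message can be written down in a few lines. The sketch below is only a model of that decision, assuming a hypothetical helper name and plain parameters in place of the kernel's dentry and nameidata state.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Decide the outcome of the "simple lookup" path.
 * create_error is 0 or a negative errno recorded earlier
 * (for example -EROFS or whatever may_o_create() returned).
 */
int demo_finish_simple_lookup(bool dentry_is_negative, int create_error)
{
	/* Negative dentry plus a pending creation error: report that error. */
	if (dentry_is_negative && create_error)
		return create_error;
	/* Otherwise let the caller continue with the dentry it has. */
	return 0;
}
```

Where the negative dentry came from (a fresh lookup or the dcache) does not appear in the sketch at all, which is the point the message makes.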

11 Apr, 2016 1 commit

don't bother with ->d_inode->i_sb - it's always equal to ->d_sb · fc64005c
Committed by Al Viro on Apr 10, 2016
```
... and neither can ever be NULL
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
fc64005c

31 Mar, 2016 2 commits

posix_acl: Inode acl caching fixes · b8a7a3a6

Committed by Andreas Gruenbacher on Mar 24, 2016

When get_acl() is called for an inode whose ACL is not cached yet, the
get_acl inode operation is called to fetch the ACL from the filesystem.
The inode operation is responsible for updating the cached acl with
set_cached_acl(). This is done without locking at the VFS level, so
another task can call set_cached_acl() or forget_cached_acl() before the
get_acl inode operation gets to calling set_cached_acl(), and then
get_acl's call to set_cached_acl() results in caching an outdated ACL.

Prevent this from happening by setting the cached ACL pointer to a
task-specific sentinel value before calling the get_acl inode operation.
Move the responsibility for updating the cached ACL from the get_acl
inode operations to get_acl(). There, only set the cached ACL if the
sentinel value hasn't changed.

The sentinel values are chosen to have odd values. Likewise, the value
of ACL_NOT_CACHED is odd. In contrast, ACL object pointers always have
an even value (ACLs are aligned in memory). This makes it possible to
distinguish uncached ACL values from ACL objects.

In addition, switch from guarding inode->i_acl and inode->i_default_acl
updates by the inode->i_lock spinlock to using xchg() and cmpxchg().

Filesystems that do not want ACLs returned from their get_acl inode
operations to be cached must call forget_cached_acl() to prevent the VFS
from doing so.

(Patch written by Al Viro and Andreas Gruenbacher.)
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b8a7a3a6
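
The sentinel scheme described in this message can be sketched in userspace with C11 atomics. Everything below is an illustration under assumptions: the type and function names (struct acl, get_acl_sketch(), task_sentinel(), the demo fetchers) are hypothetical, reference counting is left out, and atomic_compare_exchange_strong() stands in for the kernel's cmpxchg().

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct acl { int dummy; };	/* stand-in for an ACL object */

/* Odd pointer values can never be real, aligned ACL objects. */
#define ACL_NOT_CACHED ((struct acl *)(uintptr_t)-1)

static inline int is_uncached(struct acl *p)
{
	return ((uintptr_t)p & 1) != 0;
}

/* Per-task sentinel: an (even) task address with the low bit set. */
static inline struct acl *task_sentinel(void *task)
{
	return (struct acl *)((uintptr_t)task | 1);
}

typedef struct acl *(*fetch_fn)(void *ctx);

/* Invalidate the cache slot (what a concurrent task might do). */
static inline void forget_cached(_Atomic(struct acl *) *slot)
{
	atomic_store(slot, ACL_NOT_CACHED);
}

struct acl *get_acl_sketch(_Atomic(struct acl *) *slot, void *task,
			   fetch_fn fetch, void *ctx)
{
	struct acl *cur = atomic_load(slot);
	struct acl *sentinel = task_sentinel(task);
	struct acl *expected = ACL_NOT_CACHED;
	struct acl *acl;

	if (!is_uncached(cur))
		return cur;	/* already cached */

	/* Claim the slot; on failure we still fetch but will not cache. */
	atomic_compare_exchange_strong(slot, &expected, sentinel);

	acl = fetch(ctx);	/* may race with other updaters */

	/* Publish only if our sentinel is still in place. */
	expected = sentinel;
	atomic_compare_exchange_strong(slot, &expected, acl);
	return acl;
}

/* Demo fetchers used to exercise the two outcomes. */
static struct acl demo_acl;

static struct acl *fetch_plain(void *ctx)
{
	(void)ctx;
	return &demo_acl;
}

static struct acl *fetch_with_race(void *ctx)
{
	forget_cached(ctx);	/* simulate another task invalidating mid-fetch */
	return &demo_acl;
}
```

With fetch_plain() the slot ends up holding the fetched object; with fetch_with_race() the caller still gets its ACL, but the slot stays at ACL_NOT_CACHED because the sentinel was replaced before the fetch finished.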

fix the braino in "namei: massage lookup_slow() to be usable by lookup_one_len_unlocked()" · 7500c38a

Committed by Al Viro on Mar 31, 2016

We should try to trigger automount *before* bailing out on negative dentry.
Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Reported-by: Arend van Spriel <arend@broadcom.com>
Tested-by: Arend van Spriel <arend@broadcom.com>
Tested-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7500c38a

14 Mar, 2016 7 commits

kill dentry_unhash() · 9d95afd5

Committed by Al Viro on Mar 01, 2016

the last user is gone
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9d95afd5

namei: teach lookup_slow() to skip revalidate · 949a852e

Committed by Al Viro on Mar 06, 2016

... and make mountpoint_last() use it.  That makes all
candidates for lookup with parent locked shared go
through lookup_slow().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

949a852e

namei: massage lookup_slow() to be usable by lookup_one_len_unlocked() · e3c13928

Committed by Al Viro on Mar 06, 2016

Return dentry and don't pass nameidata or path; lift crossing mountpoints
into the caller.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e3c13928

lookup_one_len_unlocked(): use lookup_dcache() · d6d95ded

Committed by Al Viro on Mar 05, 2016

No need to lock parent just because of ->d_revalidate() on child;
contrary to the stale comment, lookup_dcache() *can* be used without
locking the parent. Result can be moved as soon as we return, of
course, but the same is true for lookup_one_len_unlocked() itself.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d6d95ded

namei: simplify invalidation logics in lookup_dcache() · 74ff0ffc
Committed by Al Viro on Mar 05, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
74ff0ffc

namei: change calling conventions for lookup_{fast,slow} and follow_managed() · e9742b53

Committed by Al Viro on Mar 05, 2016

Have lookup_fast() return 1 on success and 0 on "need to fall back";
lookup_slow() and follow_managed() return positive (1) on success.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e9742b53
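
A caller written against these return conventions might look roughly like the sketch below. The function-pointer parameters and the demo_* helpers are hypothetical stand-ins for lookup_fast() and lookup_slow(), and negative values are assumed to be errno-style errors.

```c
#include <errno.h>

/* Hypothetical stand-ins: each returns <0 on error, 0 or 1 as described. */
typedef int (*step_fn)(void);

/* Returns 1 on success or a negative errno, mirroring the convention. */
int demo_walk_component(step_fn fast, step_fn slow)
{
	int err = fast();

	if (err < 0)
		return err;		/* hard failure in the fast path */
	if (err == 0) {			/* "need to fall back" */
		err = slow();
		if (err < 0)
			return err;
	}
	return 1;			/* positive means success */
}

/* Demo steps used to exercise each branch. */
static int demo_calls;
static int demo_fast_hit(void)  { return 1; }
static int demo_fast_miss(void) { return 0; }
static int demo_fast_fail(void) { return -ECHILD; }
static int demo_slow_ok(void)   { demo_calls++; return 1; }
```

The slow step runs only when the fast step reports 0, so a fast-path hit or a fast-path error never touches it.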

namei: untanlge lookup_fast() · 5d0f49c1
Committed by Al Viro on Mar 05, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
5d0f49c1

06 Mar, 2016 2 commits

lookup_dcache(): lift d_alloc() into callers · 6c51e513
Committed by Al Viro on Mar 05, 2016
```
... and kill need_lookup thing
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
6c51e513

do_last(): reorder and simplify a bit · 6583fe22

Committed by Al Viro on Mar 05, 2016

bugger off on negatives a bit earlier, simplify the tests
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6583fe22

28 Feb, 2016 4 commits

do_last(): ELOOP failure exit should be done after leaving RCU mode · 5129fa48

Committed by Al Viro on Feb 27, 2016

... or we risk seeing a bogus value of d_is_symlink() there.

Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5129fa48

should_follow_link(): validate ->d_seq after having decided to follow · a7f77542

Committed by Al Viro on Feb 27, 2016

... otherwise d_is_symlink() above might have nothing to do with
the inode value we've got.

Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a7f77542

namei: ->d_inode of a pinned dentry is stable only for positives · d4565649

Committed by Al Viro on Feb 27, 2016

both do_last() and walk_component() risk picking a NULL inode out
of dentry about to become positive, *then* checking its flags and
seeing that it's not negative anymore and using (already stale by
then) value they'd fetched earlier.  Usually ends up oopsing soon
after that...

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d4565649

do_last(): don't let a bogus return value from ->open() et.al. to confuse us · c80567c8

Committed by Al Viro on Feb 27, 2016

... into returning a positive to path_openat(), which would interpret that
as "symlink had been encountered" and proceed to corrupt memory, etc.
It can only happen due to a bug in some ->open() instance or in some LSM
hook, etc., so we report any such event *and* make sure it doesn't trick
us into further unpleasantness.

Cc: stable@vger.kernel.org # v3.6+, at least
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c80567c8
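
The defensive check described in this message amounts to refusing to pass a positive status upward. The sketch below models that in userspace; the function name, the stderr report and the choice of -EINVAL as the replacement error are assumptions for illustration only.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Sanitize a status coming back from an ->open()-like callback:
 * 0 and negative errno values pass through, anything positive is
 * reported and turned into an error so the caller cannot mistake
 * it for "symlink encountered".
 */
int demo_sanitize_open_result(int error)
{
	if (error > 0) {
		fprintf(stderr, "bogus positive return value: %d\n", error);
		error = -EINVAL;	/* assumed replacement error for this sketch */
	}
	return error;
}
```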

23 Jan, 2016 1 commit

wrappers for ->i_mutex access · 5955102c

Committed by Al Viro on Jan 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5955102c
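
The value of such wrappers is that callers stop naming the lock type, so the type can change later without touching them. The userspace sketch below shows the pattern with a toy lock built on atomic_flag; the toy_mutex type and the demo_inode struct are assumptions for the example, and only the lock, unlock and trylock wrappers are modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock standing in for the kernel mutex; spins instead of sleeping. */
struct toy_mutex {
	atomic_flag held;
};

static inline bool toy_mutex_trylock(struct toy_mutex *m)
{
	return !atomic_flag_test_and_set(&m->held);
}

static inline void toy_mutex_lock(struct toy_mutex *m)
{
	while (!toy_mutex_trylock(m))
		;	/* busy-wait; good enough for a sketch */
}

static inline void toy_mutex_unlock(struct toy_mutex *m)
{
	atomic_flag_clear(&m->held);
}

/* Minimal stand-in for an inode carrying the lock. */
struct demo_inode {
	struct toy_mutex i_mutex;
};

/* Wrappers: inode_foo(inode) is toy_mutex_foo(&inode->i_mutex). */
static inline void inode_lock(struct demo_inode *inode)
{
	toy_mutex_lock(&inode->i_mutex);
}

static inline void inode_unlock(struct demo_inode *inode)
{
	toy_mutex_unlock(&inode->i_mutex);
}

static inline bool inode_trylock(struct demo_inode *inode)
{
	return toy_mutex_trylock(&inode->i_mutex);
}
```

Code that only ever calls inode_lock() and friends would keep compiling if i_mutex were replaced by a different lock type, provided the three wrappers are updated.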

09 Jan, 2016 1 commit

nfsd: don't hold i_mutex over userspace upcalls · bbddca8e

Committed by NeilBrown on Jan 07, 2016

We need information about exports when crossing mountpoints during
lookup or NFSv4 readdir.  If we don't already have that information
cached, we may have to ask (and wait for) rpc.mountd.

In both cases we currently hold the i_mutex on the parent of the
directory we're asking rpc.mountd about.  We've seen situations where
rpc.mountd performs some operation on that directory that tries to take
the i_mutex again, resulting in deadlock.

With some care, we may be able to avoid that in rpc.mountd.  But it
seems better just to avoid holding a mutex while waiting on userspace.

It appears that lookup_one_len is pretty much the only operation that
needs the i_mutex.  So we could just drop the i_mutex elsewhere and do
something like

	mutex_lock()
	lookup_one_len()
	mutex_unlock()

In many cases though the lookup would have been cached and not required
the i_mutex, so it's more efficient to create a lookup_one_len() variant
that only takes the i_mutex when necessary.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bbddca8e

04 Jan, 2016 1 commit

don't carry MAY_OPEN in op->acc_mode · 62fb4a15
Committed by Al Viro on Dec 26, 2015
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
62fb4a15

31 Dec, 2015 1 commit

switch ->get_link() to delayed_call, kill ->put_link() · fceef393
Committed by Al Viro on Dec 29, 2015
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
fceef393

09 Dec, 2015 3 commits

teach page_get_link() to work in RCU mode · d3883d4f

Committed by Al Viro on Nov 17, 2015

more or less along the lines of Neil's patchset, sans the insanity
around kmap().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d3883d4f

replace ->follow_link() with new method that could stay in RCU mode · 6b255391

Committed by Al Viro on Nov 17, 2015

new method: ->get_link(); replacement of ->follow_link().  The differences
are:
	* inode and dentry are passed separately
	* might be called both in RCU and non-RCU mode;
	  the former is indicated by passing it a NULL dentry.
	* when called that way it isn't allowed to block
	  and should return ERR_PTR(-ECHILD) if it needs to be called
	  in non-RCU mode.

It's a flagday change - the old method is gone, all in-tree instances
converted.  Conversion isn't hard; that said, so far very few instances
do not immediately bail out when called in RCU mode.  That'll change
in the next commits.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6b255391
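
The calling convention listed in this message (NULL dentry means RCU mode, no blocking, ERR_PTR(-ECHILD) to request a retry) can be modelled in userspace as follows. This is a simplified sketch: the ERR_PTR helpers are re-implemented locally, the demo_* types and demo_get_link() are hypothetical, and the cookie argument of the real method is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local userspace imitation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long err)
{
	return (void *)(intptr_t)err;
}

static inline long PTR_ERR(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct demo_dentry { int unused; };
struct demo_inode { const char *cached_link; };

/* A NULL dentry means "called in RCU mode": blocking is not allowed. */
const char *demo_get_link(struct demo_dentry *dentry, struct demo_inode *inode)
{
	/* Fast path: body already in memory, fine in either mode. */
	if (inode->cached_link)
		return inode->cached_link;

	/* RCU mode and we would have to block: ask to be called again. */
	if (!dentry)
		return ERR_PTR(-ECHILD);

	/* Non-RCU mode: pretend we did blocking I/O to read the body. */
	inode->cached_link = "/slow/path/target";
	return inode->cached_link;
}
```

A caller would try the RCU-mode call first and, on -ECHILD, leave RCU mode and call again with a real dentry.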

don't put symlink bodies in pagecache into highmem · 21fc61c7

Committed by Al Viro on Nov 17, 2015

kmap() in page_follow_link_light() needed to go - allowing to hold
an arbitrary number of kmaps for long is a great way to deadlocking
the system.

new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlinks inodes; done for all in-tree cases.  page_follow_link_light()
instrumented to yell about anything missed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

21fc61c7

07 Dec, 2015 6 commits

restore_nameidata(): no need to clear now->stack · e1a63bbc

Committed by Al Viro on Dec 05, 2015

microoptimization: in all callers *now is in the frame we are about to leave.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e1a63bbc

namei.c: take "jump to root" into a new helper · 248fb5b9

Committed by Al Viro on Dec 05, 2015

... and use it both in path_init() (for absolute pathnames) and
get_link() (for absolute symlinks).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

248fb5b9

path_init(): set nd->inode earlier in cwd-relative case · ef55d917

Committed by Al Viro on Dec 05, 2015

that allows us to kill the recheck of nd->seq on the way out in
this case, and this check on the way out is left only for
absolute pathnames.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ef55d917

namei.c: fold set_root_rcu() into set_root() · 9e6697e2
Committed by Al Viro on Dec 05, 2015
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
9e6697e2

typo in fs/namei.c comment · 57e3715c
Committed by Mike Marshall on Nov 30, 2015
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
57e3715c

namei: page_getlink() and page_follow_link_light() are the same thing · aa80deab
Committed by Al Viro on Nov 16, 2015
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
aa80deab