Commits · d9171b9345261e0d941d92fdda5672b5db67f968 · openeuler / Kernel

03 May, 2016 4 commits

parallel lookups machinery, part 4 (and last) · d9171b93

Committed by Al Viro on Apr 15, 2016

If we *do* run into an in-lookup match, we need to wait for it to
cease being in-lookup.  Fortunately, we do have unused space in
in-lookup dentries - d_lru is never looked at until it stops being
in-lookup.

So we can stash a pointer to wait_queue_head from stack frame of
the caller of ->lookup().  Some precautions are needed while
waiting, but it's not that hard - we do hold a reference to dentry
we are waiting for, so it can't go away.  If it's found to be
in-lookup the wait_queue_head is still alive and will remain so
at least while ->d_lock is held.  Moreover, the condition we
are waiting for becomes true at the same point where everything
on that wq gets woken up, so we can just add ourselves to the
queue once.

d_alloc_parallel() gets a pointer to wait_queue_head_t from its
caller; lookup_slow() adjusted, d_add_ci() taught to use
d_alloc_parallel() if the dentry passed to it happens to be
in-lookup one (i.e. if it's been called from the parallel lookup).

That's pretty much it - all that remains is to switch ->i_mutex
to rwsem and have lookup_slow() take it shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

parallel lookups machinery, part 3 · 94bdd655

Committed by Al Viro on Apr 15, 2016

We will need to be able to check if there is an in-lookup
dentry with matching parent/name.  Right now it's impossible,
but as soon as we start locking directories shared such beasts
will appear.

Add a secondary hash for locating those.  Hash chains go through
the same space where d_alias will be once it's not in-lookup anymore.
Search is done under the same bitlock we use for modifications -
with the primary hash we can rely on d_rehash() into the wrong
chain being the worst that could happen, but here the pointers are
buggered once it's removed from the chain.  On the other hand,
the chains are not going to be long and normally we'll end up
adding to the chain anyway.  That allows us to avoid bothering with
->d_lock when doing the comparisons - everything is stable until
removed from chain.

New helper: d_alloc_parallel().  Right now it allocates, verifies
that no hashed and in-lookup matches exist and adds to in-lookup
hash.

Returns ERR_PTR() for error, hashed match (in the unlikely case it's
been found) or new dentry.  In-lookup matches trigger BUG() for
now; that will change in the next commit when we introduce waiting
for ongoing lookup to finish.  Note that in-lookup matches won't be
possible until we actually go for shared locking.

lookup_slow() switched to use of d_alloc_parallel().

Again, these commits are separated only for making it easier to
review.  All this machinery will start doing something useful only
when we go for shared locking; it's just that the combination is
too large for my taste.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

beginning of transition to parallel lookups - marking in-lookup dentries · 85c7f810

Committed by Al Viro on Apr 14, 2016

marked as such when (would be) parallel lookup is about to pass them
to actual ->lookup(); unmarked when
	* __d_add() is about to make it hashed, positive or not.
	* __d_move() (from d_splice_alias(), directly or via
	  __d_unalias()) puts a preexisting dentry in its place
	* in caller of ->lookup() if it has escaped all of the
	  above.
Bug (WARN_ON, actually) if it reaches the final dput()
or d_instantiate() while still marked such.

As the result, we are guaranteed that for as long as the flag is
set, dentry will
	* remain negative unhashed with positive refcount
	* never have its ->d_alias looked at
	* never have its ->d_lru looked at
	* never have its ->d_parent and ->d_name changed

Right now we have at most one such for any given parent directory.
With parallel lookups that restriction will weaken to
	* only exist when parent is locked shared
	* at most one with given (parent,name) pair (comparison of
	  names is according to ->d_compare())
	* only exist when there's no hashed dentry with the same
	  (parent,name)

Transition will take the next several commits; unfortunately, we'll
only be able to switch to rwsem at the end of this series.  The
reason for not making it a single patch is to simplify review.

New primitives: d_in_lookup() (a predicate checking if dentry is in
the in-lookup state) and d_lookup_done() (tells the system that
we are done with lookup and if it's still marked as in-lookup, it
should cease to be such).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

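The life cycle described in this commit can be pictured with a small userspace model. This is only a sketch: `struct toy_dentry`, the `PAR_LOOKUP` bit and every `toy_*` function are hypothetical stand-ins, not the kernel's definitions. It shows the two primitives (a predicate and an idempotent "done" call) and the rule that the mark is dropped before the dentry becomes hashed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct dentry; only the fields the sketch needs. */
#define PAR_LOOKUP 0x1u              /* assumed bit, modelled on an in-lookup flag */

struct toy_dentry {
    unsigned int flags;
    bool hashed;
};

/* Predicate: is the dentry still in the in-lookup state? */
static bool toy_in_lookup(const struct toy_dentry *d)
{
    return (d->flags & PAR_LOOKUP) != 0;
}

/* Mark the dentry just before it would be handed to ->lookup(). */
static void toy_start_lookup(struct toy_dentry *d)
{
    assert(!d->hashed);              /* in-lookup dentries stay unhashed */
    d->flags |= PAR_LOOKUP;
}

/* Idempotent: clears the mark only if it is still set. */
static void toy_lookup_done(struct toy_dentry *d)
{
    if (toy_in_lookup(d))
        d->flags &= ~PAR_LOOKUP;
}

/* Models the "about to make it hashed" case: unmark first, then hash. */
static void toy_add(struct toy_dentry *d)
{
    toy_lookup_done(d);
    d->hashed = true;
}
```

A caller that gets the dentry back from the lookup without it having been hashed would still call `toy_lookup_done()`, which is why the helper tolerates being called on an already unmarked dentry.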

lookup_slow(): bugger off on IS_DEADDIR() from the very beginning · 1936386e

Committed by Al Viro on Apr 14, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

11 Apr, 2016 1 commit

don't bother with ->d_inode->i_sb - it's always equal to ->d_sb · fc64005c

Committed by Al Viro on Apr 10, 2016

... and neither can ever be NULL
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

31 Mar, 2016 2 commits

posix_acl: Inode acl caching fixes · b8a7a3a6

Committed by Andreas Gruenbacher on Mar 24, 2016

When get_acl() is called for an inode whose ACL is not cached yet, the
get_acl inode operation is called to fetch the ACL from the filesystem.
The inode operation is responsible for updating the cached acl with
set_cached_acl(). This is done without locking at the VFS level, so
another task can call set_cached_acl() or forget_cached_acl() before the
get_acl inode operation gets to calling set_cached_acl(), and then
get_acl's call to set_cached_acl() results in caching an outdated ACL.

Prevent this from happening by setting the cached ACL pointer to a
task-specific sentinel value before calling the get_acl inode operation.
Move the responsibility for updating the cached ACL from the get_acl
inode operations to get_acl(). There, only set the cached ACL if the
sentinel value hasn't changed.

The sentinel values are chosen to have odd values. Likewise, the value
of ACL_NOT_CACHED is odd. In contrast, ACL object pointers always have
an even value (ACLs are aligned in memory). This makes it possible to
distinguish uncached ACL values from ACL objects.

In addition, switch from guarding inode->i_acl and inode->i_default_acl
updates by the inode->i_lock spinlock to using xchg() and cmpxchg().

Filesystems that do not want ACLs returned from their get_acl inode
operations to be cached must call forget_cached_acl() to prevent the VFS
from doing so.

(Patch written by Al Viro and Andreas Gruenbacher.)
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

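The sentinel scheme in this commit can be sketched in userspace with C11 atomics standing in for the kernel's cmpxchg(). Everything named `toy_*` or `TOY_*` is hypothetical, and deriving the sentinel by setting the low bit of a per-task address is an assumption made for the sketch. The point it illustrates: the fetched ACL is published only if the task's own sentinel is still in the slot, so a concurrent forget or set wins over a possibly stale fetch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct toy_acl { int perms; };             /* hypothetical ACL object */

#define TOY_NOT_CACHED ((uintptr_t)-1)     /* odd marker for "nothing cached" */

/* Real ACL pointers are aligned, hence even; odd values are markers. */
static int toy_is_uncached(uintptr_t v) { return (int)(v & 1); }

/* Task-specific odd sentinel (assumption: low bit of an aligned address). */
static uintptr_t toy_sentinel(const void *task) { return (uintptr_t)task | 1; }

static struct toy_acl fresh = { 0644 };

/* A well-behaved filesystem callback. */
static struct toy_acl *fs_get_plain(_Atomic uintptr_t *slot)
{
    (void)slot;
    return &fresh;
}

/* Simulates another task forgetting the cached ACL during the fetch. */
static struct toy_acl *fs_get_raced(_Atomic uintptr_t *slot)
{
    atomic_store(slot, TOY_NOT_CACHED);
    return &fresh;
}

static struct toy_acl *toy_get_acl(_Atomic uintptr_t *slot, const void *task,
                                   struct toy_acl *(*fs_get)(_Atomic uintptr_t *))
{
    uintptr_t cur = atomic_load(slot);
    if (!toy_is_uncached(cur))
        return (struct toy_acl *)cur;      /* cache hit */

    uintptr_t sentinel = toy_sentinel(task);
    uintptr_t expected = TOY_NOT_CACHED;
    /* Claim the slot; on failure someone else is active and we won't cache. */
    atomic_compare_exchange_strong(slot, &expected, sentinel);

    struct toy_acl *acl = fs_get(slot);
    assert(!toy_is_uncached((uintptr_t)acl));   /* objects must be even */

    expected = sentinel;
    /* Publish only if our sentinel survived the fetch. */
    atomic_compare_exchange_strong(slot, &expected, (uintptr_t)acl);
    return acl;
}
```

With `fs_get_raced` the slot ends up uncached again, because the sentinel was overwritten while the fetch was in flight; with `fs_get_plain` the result is cached.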

fix the braino in "namei: massage lookup_slow() to be usable by lookup_one_len_unlocked()" · 7500c38a

Committed by Al Viro on Mar 31, 2016

We should try to trigger automount *before* bailing out on negative dentry.
Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Reported-by: Arend van Spriel <arend@broadcom.com>
Tested-by: Arend van Spriel <arend@broadcom.com>
Tested-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

14 Mar, 2016 7 commits

kill dentry_unhash() · 9d95afd5

Committed by Al Viro on Mar 01, 2016

the last user is gone
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

namei: teach lookup_slow() to skip revalidate · 949a852e

Committed by Al Viro on Mar 06, 2016

... and make mountpoint_last() use it.  That makes all
candidates for lookup with parent locked shared go
through lookup_slow().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

namei: massage lookup_slow() to be usable by lookup_one_len_unlocked() · e3c13928

Committed by Al Viro on Mar 06, 2016

Return dentry and don't pass nameidata or path; lift crossing mountpoints
into the caller.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

lookup_one_len_unlocked(): use lookup_dcache() · d6d95ded

Committed by Al Viro on Mar 05, 2016

No need to lock parent just because of ->d_revalidate() on child;
contrary to the stale comment, lookup_dcache() *can* be used without
locking the parent. Result can be moved as soon as we return, of
course, but the same is true for lookup_one_len_unlocked() itself.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

namei: simplify invalidation logics in lookup_dcache() · 74ff0ffc

Committed by Al Viro on Mar 05, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

namei: change calling conventions for lookup_{fast,slow} and follow_managed() · e9742b53

Committed by Al Viro on Mar 05, 2016

Have lookup_fast() return 1 on success and 0 on "need to fall back";
lookup_slow() and follow_managed() return positive (1) on success.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

namei: untanlge lookup_fast() · 5d0f49c1

Committed by Al Viro on Mar 05, 2016

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

06 Mar, 2016 2 commits

lookup_dcache(): lift d_alloc() into callers · 6c51e513

Committed by Al Viro on Mar 05, 2016

... and kill need_lookup thing
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

do_last(): reorder and simplify a bit · 6583fe22

Committed by Al Viro on Mar 05, 2016

bugger off on negatives a bit earlier, simplify the tests
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

28 Feb, 2016 4 commits

do_last(): ELOOP failure exit should be done after leaving RCU mode · 5129fa48

Committed by Al Viro on Feb 27, 2016

... or we risk seeing a bogus value of d_is_symlink() there.

Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

should_follow_link(): validate ->d_seq after having decided to follow · a7f77542

Committed by Al Viro on Feb 27, 2016

... otherwise d_is_symlink() above might have nothing to do with
the inode value we've got.

Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

namei: ->d_inode of a pinned dentry is stable only for positives · d4565649

Committed by Al Viro on Feb 27, 2016

both do_last() and walk_component() risk picking a NULL inode out
of dentry about to become positive, *then* checking its flags and
seeing that it's not negative anymore and using (already stale by
then) value they'd fetched earlier.  Usually ends up oopsing soon
after that...

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

do_last(): don't let a bogus return value from ->open() et.al. to confuse us · c80567c8

Committed by Al Viro on Feb 27, 2016

... into returning a positive to path_openat(), which would interpret that
as "symlink had been encountered" and proceed to corrupt memory, etc.
It can only happen due to a bug in some ->open() instance or in some LSM
hook, etc., so we report any such event *and* make sure it doesn't trick
us into further unpleasantness.

Cc: stable@vger.kernel.org # v3.6+, at least
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

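The defensive pattern this commit describes is to report a positive return and convert it into an error, so that it can never be mistaken for "symlink encountered". A rough illustration follows; `toy_sanitize_open_result()` is a hypothetical helper, and the choice of -EINVAL and of stderr for the report are assumptions of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Assumed convention for the sketch: <0 is an error, 0 is success and
 * >0 is reserved by the caller to mean "symlink encountered".
 */
static int toy_sanitize_open_result(int error)
{
    if (error > 0) {
        /* Only a buggy callee can get us here: report it, then neutralise it. */
        fprintf(stderr, "bogus positive return %d from open hook\n", error);
        return -EINVAL;
    }
    return error;
}
```

Legitimate results (zero or a negative errno) pass through unchanged; only the impossible positive case is rewritten.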

23 Jan, 2016 1 commit

wrappers for ->i_mutex access · 5955102c

Committed by Al Viro on Jan 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

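The value of such wrappers is that callers stop naming the lock type, so the type can change later without touching them. A userspace sketch of the shape, with a toy lock built on a C11 `atomic_flag` standing in for the kernel mutex (all `toy_*` names are hypothetical, and the toy lock spins rather than sleeps):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock; a real mutex would sleep instead of spinning. */
struct toy_mutex { atomic_flag held; };

static inline void toy_mutex_lock(struct toy_mutex *m)
{
    while (atomic_flag_test_and_set(&m->held))
        ;                            /* spin until the holder releases it */
}

static inline void toy_mutex_unlock(struct toy_mutex *m)
{
    atomic_flag_clear(&m->held);
}

/* Returns true if the lock was taken. */
static inline bool toy_mutex_trylock(struct toy_mutex *m)
{
    return !atomic_flag_test_and_set(&m->held);
}

struct toy_inode { struct toy_mutex i_mutex; };

/* inode_foo(inode) is simply mutex_foo(&inode->i_mutex). */
static inline void toy_inode_lock(struct toy_inode *inode)
{
    toy_mutex_lock(&inode->i_mutex);
}

static inline void toy_inode_unlock(struct toy_inode *inode)
{
    toy_mutex_unlock(&inode->i_mutex);
}

static inline bool toy_inode_trylock(struct toy_inode *inode)
{
    return toy_mutex_trylock(&inode->i_mutex);
}
```

If `i_mutex` were later replaced by a reader/writer lock, only the three `toy_inode_*` bodies would need to change.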

09 Jan, 2016 1 commit

nfsd: don't hold i_mutex over userspace upcalls · bbddca8e

Committed by NeilBrown on Jan 07, 2016

We need information about exports when crossing mountpoints during
lookup or NFSv4 readdir.  If we don't already have that information
cached, we may have to ask (and wait for) rpc.mountd.

In both cases we currently hold the i_mutex on the parent of the
directory we're asking rpc.mountd about.  We've seen situations where
rpc.mountd performs some operation on that directory that tries to take
the i_mutex again, resulting in deadlock.

With some care, we may be able to avoid that in rpc.mountd.  But it
seems better just to avoid holding a mutex while waiting on userspace.

It appears that lookup_one_len is pretty much the only operation that
needs the i_mutex.  So we could just drop the i_mutex elsewhere and do
something like

	mutex_lock()
	lookup_one_len()
	mutex_unlock()

In many cases though the lookup would have been cached and not required
the i_mutex, so it's more efficient to create a lookup_one_len() variant
that only takes the i_mutex when necessary.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

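The "only take the lock when necessary" variant can be sketched as a cache probe followed by a locked slow path. The model below is a simplification: the directory lock is represented only by a counter so that the effect is observable, the recheck under the lock is an assumption of the sketch, and all `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_SLOTS 4

struct toy_dir {
    const char *cached[TOY_SLOTS];   /* names already in the toy cache */
    int lock_count;                  /* how often the dir lock was taken */
};

/* Fast path, no lock: returns the slot index or -1 on a miss. */
static int toy_lookup_cached(const struct toy_dir *dir, const char *name)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        if (dir->cached[i] && strcmp(dir->cached[i], name) == 0)
            return i;
    return -1;
}

/* Slow path, to be called with the lock held: fills a free slot. */
static int toy_lookup_slow(struct toy_dir *dir, const char *name)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        if (!dir->cached[i]) {
            dir->cached[i] = name;
            return i;
        }
    return -1;                       /* cache full */
}

static int toy_lookup_unlocked(struct toy_dir *dir, const char *name)
{
    int slot = toy_lookup_cached(dir, name);
    if (slot >= 0)
        return slot;                 /* hit: the lock is never touched */

    dir->lock_count++;               /* stands in for taking the dir lock */
    slot = toy_lookup_cached(dir, name);    /* recheck now that we hold it */
    if (slot < 0)
        slot = toy_lookup_slow(dir, name);
    /* the matching unlock would go here */
    return slot;
}
```

Repeated lookups of the same name leave `lock_count` unchanged, which is the efficiency argument made in the commit message.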

04 Jan, 2016 1 commit

don't carry MAY_OPEN in op->acc_mode · 62fb4a15

Committed by Al Viro on Dec 26, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

31 Dec, 2015 1 commit

switch ->get_link() to delayed_call, kill ->put_link() · fceef393

Committed by Al Viro on Dec 29, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

09 Dec, 2015 3 commits

teach page_get_link() to work in RCU mode · d3883d4f

Committed by Al Viro on Nov 17, 2015

more or less along the lines of Neil's patchset, sans the insanity
around kmap().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

replace ->follow_link() with new method that could stay in RCU mode · 6b255391

Committed by Al Viro on Nov 17, 2015

new method: ->get_link(); replacement of ->follow_link().  The differences
are:
	* inode and dentry are passed separately
	* might be called both in RCU and non-RCU mode;
	  the former is indicated by passing it a NULL dentry.
	* when called that way it isn't allowed to block
	  and should return ERR_PTR(-ECHILD) if it needs to be called
	  in non-RCU mode.

It's a flagday change - the old method is gone, all in-tree instances
converted.  Conversion isn't hard; that said, so far very few instances
do not immediately bail out when called in RCU mode.  That'll change
in the next commits.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

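The calling convention listed above (a NULL dentry means RCU mode, and -ECHILD means "call me again in non-RCU mode") can be sketched in userspace as follows. The `toy_*` helpers imitate the kernel's error-pointer idiom but are hypothetical, as are the structures and the placeholder link target.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace imitation of the error-pointer idiom. */
#define TOY_MAX_ERRNO 4095
static inline void *toy_err_ptr(long err) { return (void *)err; }
static inline long toy_ptr_err(const void *p) { return (long)p; }
static inline int toy_is_err(const void *p)
{
    return (unsigned long)p >= (unsigned long)-TOY_MAX_ERRNO;
}

struct toy_dentry { int unused; };
struct toy_inode {
    const char *link;                /* symlink body, if already in memory */
};

/* dentry == NULL signals RCU mode, where blocking is not allowed. */
static const char *toy_get_link(struct toy_dentry *dentry,
                                struct toy_inode *inode)
{
    if (!dentry) {
        if (!inode->link)
            return toy_err_ptr(-ECHILD);   /* ask for a non-RCU retry */
        return inode->link;
    }
    if (!inode->link)
        inode->link = "/slow/target";      /* pretend we blocked to read it */
    return inode->link;
}
```

A caller would first try with a NULL dentry and, on -ECHILD, leave RCU mode and call again with the real dentry.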

don't put symlink bodies in pagecache into highmem · 21fc61c7

Committed by Al Viro on Nov 17, 2015

kmap() in page_follow_link_light() needed to go - allowing an
arbitrary number of kmaps to be held for long is a great way to
deadlock the system.

new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlink inodes; done for all in-tree cases.  page_follow_link_light()
instrumented to yell about anything missed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

07 Dec, 2015 7 commits

restore_nameidata(): no need to clear now->stack · e1a63bbc

Committed by Al Viro on Dec 05, 2015

microoptimization: in all callers *now is in the frame we are about to leave.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

namei.c: take "jump to root" into a new helper · 248fb5b9

Committed by Al Viro on Dec 05, 2015

... and use it both in path_init() (for absolute pathnames) and
get_link() (for absolute symlinks).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

path_init(): set nd->inode earlier in cwd-relative case · ef55d917

Committed by Al Viro on Dec 05, 2015

that allows us to kill the recheck of nd->seq on the way out in
this case, and this check on the way out is left only for
absolute pathnames.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

namei.c: fold set_root_rcu() into set_root() · 9e6697e2

Committed by Al Viro on Dec 05, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

typo in fs/namei.c comment · 57e3715c

Committed by Mike Marshall on Nov 30, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

namei: page_getlink() and page_follow_link_light() are the same thing · aa80deab

Committed by Al Viro on Nov 16, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Don't reset ->total_link_count on nested calls of vfs_path_lookup() · 2788cc47

Committed by Al Viro on Dec 06, 2015

we already zero it on outermost set_nameidata(), so initialization in
path_init() is pointless and wrong.  The same DoS exists on pre-4.2
kernels, but there a slightly different fix will be needed.

Cc: stable@vger.kernel.org # v4.2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

07 Nov, 2015 1 commit

mm, fs: introduce mapping_gfp_constraint() · c62d2555

Committed by Michal Hocko on Nov 06, 2015

There are many places which use mapping_gfp_mask to restrict a more
generic gfp mask which would be used for allocations which are not
directly related to the page cache but they are performed in the same
context.

Let's introduce a helper function which makes the restriction explicit and
easier to track.  This patch doesn't introduce any functional changes.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

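The helper amounts to intersecting a caller-supplied mask with the mapping's own mask. A minimal sketch of that idea follows; the flag values and every `toy_*`/`TOY_*` name are made up for illustration, and real GFP flags have different values.

```c
#include <assert.h>

typedef unsigned int toy_gfp_t;

/* Hypothetical flag bits for the sketch. */
#define TOY_GFP_IO     0x1u
#define TOY_GFP_FS     0x2u
#define TOY_GFP_WAIT   0x4u
#define TOY_GFP_KERNEL (TOY_GFP_IO | TOY_GFP_FS | TOY_GFP_WAIT)

struct toy_mapping { toy_gfp_t gfp_mask; };

static inline toy_gfp_t toy_mapping_gfp_mask(const struct toy_mapping *m)
{
    return m->gfp_mask;
}

/* Restrict a generic mask to what this mapping allows. */
static inline toy_gfp_t toy_mapping_gfp_constraint(const struct toy_mapping *m,
                                                   toy_gfp_t gfp_mask)
{
    return toy_mapping_gfp_mask(m) & gfp_mask;
}
```

Compared with an open-coded `mask & toy_mapping_gfp_mask(m)` at each call site, the named helper makes the intent (a restriction, not a replacement) searchable.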

28 Oct, 2015 1 commit

namei: permit linking with CAP_FOWNER in userns · f2ca3796

Committed by Dirk Steinmetz on Oct 20, 2015

Attempting to hardlink to an unsafe file (e.g. a setuid binary) from
within an unprivileged user namespace fails, even if CAP_FOWNER is held
within the namespace. This may cause various failures, such as a gentoo
installation within a lxc container failing to build and install specific
packages.

This change permits hardlinking of files owned by mapped uids, if
CAP_FOWNER is held for that namespace. Furthermore, it improves consistency
by using the existing inode_owner_or_capable(), which is aware of
namespaced capabilities as of 23adbe12 ("fs,userns: Change
inode_capable to capable_wrt_inode_uidgid").
Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com>

This is hitting us in Ubuntu during some dpkg upgrades in containers.
When upgrading a file dpkg creates a hard link to the old file to back
it up before overwriting it. When packages upgrade suid files owned by a
non-root user the link isn't permitted, and the package upgrade fails.
This patch fixes our problem.
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

11 Oct, 2015 1 commit

namei: results of d_is_negative() should be checked after dentry revalidation · daf3761c

Committed by Trond Myklebust on Oct 09, 2015

Leandro Awa writes:
 "After switching to version 4.1.6, our parallelized and distributed
  workflows now fail consistently with errors of the form:

  T34: ./regex.c:39:22: error: config.h: No such file or directory

  From our 'git bisect' testing, the following commit appears to be the
  possible cause of the behavior we've been seeing: commit 766c4cbf"

Al Viro says:
 "What happens is that 766c4cbf got the things subtly wrong.

  We used to treat d_is_negative() after lookup_fast() as "fall with
  ENOENT".  That was wrong - checking ->d_flags outside of ->d_seq
  protection is unreliable and failing with hard error on what should've
  fallen back to non-RCU pathname resolution is a bug.

  Unfortunately, we'd pulled the test too far up and ran afoul of
  another kind of staleness.  The dentry might have been absolutely
  stable from the RCU point of view (and we might be on UP, etc), but
  stale from the remote fs point of view.  If ->d_revalidate() returns
  "it's actually stale", dentry gets thrown away and the original code
  wouldn't even have looked at its ->d_flags.

  What we need is to check ->d_flags where 766c4cbf does (prior to
  ->d_seq validation) but only use the result in cases where we do not
  discard this dentry outright"
Reported-by: Leandro Awa <lawa@nvidia.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=104911
Fixes: 766c4cbf ("namei: d_is_negative() should be checked...")
Tested-by: Leandro Awa <lawa@nvidia.com>
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

29 Sep, 2015 1 commit

fs: Drop unlikely before IS_ERR(_OR_NULL) · a1c83681

Committed by Viresh Kumar on Aug 12, 2015

IS_ERR(_OR_NULL) already contains an 'unlikely' compiler flag and there
is no need to do that again from its callers. Drop it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve French <smfrench@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

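To see why the extra annotation at call sites is redundant, here is a userspace approximation in which the branch hint lives inside the helper itself. The macro bodies are a sketch written for this note, not copied from kernel headers; `__builtin_expect` is a GCC/Clang builtin, and all `toy_*`/`TOY_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define toy_unlikely(x) __builtin_expect(!!(x), 0)

#define TOY_MAX_ERRNO 4095
#define TOY_IS_ERR_VALUE(x) \
    toy_unlikely((unsigned long)(x) >= (unsigned long)-TOY_MAX_ERRNO)

static inline int toy_is_err(const void *ptr)
{
    return TOY_IS_ERR_VALUE((unsigned long)ptr);
}

static inline int toy_is_err_or_null(const void *ptr)
{
    return toy_unlikely(!ptr) || toy_is_err(ptr);
}

/* Call site after the cleanup: no second unlikely() around the helper. */
static int toy_check(const void *p)
{
    if (toy_is_err(p))               /* the hint is supplied by the helper */
        return -1;
    return 0;
}
```

Wrapping `toy_is_err(p)` in another `toy_unlikely()` would compile to the same thing, which is the whole argument for dropping it.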

11 Sep, 2015 1 commit

namei: fix warning while make xmldocs caused by namei.c · 2a78b857

Committed by Masanari Iida on Sep 09, 2015

Fix the following warnings:

Warning(.//fs/namei.c:2422): No description found for parameter 'nd'
Warning(.//fs/namei.c:2422): Excess function parameter 'nameidata'
description in 'path_mountpoint'
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

21 Aug, 2015 1 commit

vfs: Test for and handle paths that are unreachable from their mnt_root · 397d425d

Committed by Eric W. Biederman on Aug 15, 2015

In rare cases a directory can be renamed out from under a bind mount.
In those cases without special handling it becomes possible to walk up
the directory tree to the root dentry of the filesystem and down
from the root dentry to every other file or directory on the filesystem.

Like division by zero .. from an unconnected path can not be given
a useful semantic as there is no predicting at which path component
the code will realize it is unconnected.  We certainly can not match
the current behavior as the current behavior is a security hole.

Therefore when encountering .. when following an unconnected path
return -ENOENT.

- Add a function path_connected to verify path->dentry is reachable
  from path->mnt.mnt_root.  AKA to validate that rename did not do
  something nasty to the bind mount.

  To avoid races path_connected must be called after following a path
  component to its next path component.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

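The connectivity test can be pictured as "walk the parent pointers and see whether the mount's root is ever reached". Below is a toy model of that check and of a ".." step that refuses to land outside the mount. The `toy_*` names are hypothetical, a filesystem root is modelled as a node that is its own parent, and staying put at the mount root is a simplification made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_dentry { struct toy_dentry *parent; };  /* fs root: parent == self */

/* True if d is root itself or lies somewhere below it. */
static bool toy_is_subdir(const struct toy_dentry *d,
                          const struct toy_dentry *root)
{
    for (;;) {
        if (d == root)
            return true;
        if (d->parent == d)
            return false;            /* hit the filesystem root first */
        d = d->parent;
    }
}

/* Step to "..", then verify the new position is still under mnt_root. */
static int toy_follow_dotdot(struct toy_dentry **pos,
                             const struct toy_dentry *mnt_root)
{
    if (*pos == mnt_root)
        return 0;                    /* simplification: stay at the mount root */
    *pos = (*pos)->parent;
    if (!toy_is_subdir(*pos, mnt_root))
        return -ENOENT;              /* directory was renamed out of the mount */
    return 0;
}
```

The check runs after the step, matching the ordering requirement stated in the commit message: validating before moving would leave a window in which a rename goes unnoticed.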
