Commits · 80b5dce8c59b0de1ed6e403b8298e02dcb4db64b · openeuler / raspberrypi-kernel

09 Oct, 2014 4 commits

vfs: Add a function to lazily unmount all mounts from any dentry. · 80b5dce8

Authored by Eric W. Biederman on Oct 03, 2013

The new function detach_mounts comes in two pieces.  The first piece
is a static inline test of d_mountpoint that returns immediately
without taking any locks if d_mountpoint is not set.  In the common
case when mountpoints are absent this allows the vfs to continue
running with its same cacheline footprint.

The second piece of detach_mounts, __detach_mounts, actually does the
work.  It assumes that a mountpoint is present, so it is slow: it
takes namespace_sem for write, and then locks the mount hash (aka
mount_lock) after a struct mountpoint has been found.

With those two locks held each entry on the list of mounts on a
mountpoint is selected and lazily unmounted until all of the mounts
have been lazily unmounted.

v7: Wrote a proper change description and removed the changelog
    documenting deleted wrong turns.
Signed-off-by: Eric W. Biederman <ebiederman@twitter.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

80b5dce8
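The two-piece structure described in the detach_mounts message can be illustrated in user-space C. This is a minimal sketch under stated assumptions, not the kernel code: `struct dentry` and `struct mount` here are hypothetical stand-ins, the `mounted` field models the d_mountpoint test, `slow_path_calls` exists only to make the fast path observable, and the locking (namespace_sem, mount_lock) is indicated only in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mount {
    struct mount *next_on_mp;   /* next mount on the same mountpoint */
    bool detached;              /* set once the mount is "lazily unmounted" */
};

struct dentry {
    bool mounted;               /* models the d_mountpoint() test */
    struct mount *mounts;       /* models the mountpoint's list of mounts */
};

static int slow_path_calls;     /* counts how often the slow path runs */

/* Slow path: the real code would take namespace_sem for write and then
 * lock the mount hash before walking the list. */
static void __detach_mounts(struct dentry *d)
{
    slow_path_calls++;
    while (d->mounts) {
        struct mount *m = d->mounts;
        d->mounts = m->next_on_mp;
        m->next_on_mp = NULL;
        m->detached = true;     /* stands in for a lazy unmount */
    }
    d->mounted = false;
}

/* Fast path: one flag test, no locks, when nothing is mounted here. */
static inline void detach_mounts(struct dentry *d)
{
    if (!d->mounted)
        return;
    __detach_mounts(d);
}
```

Calling `detach_mounts()` on a dentry whose `mounted` field is false never reaches the slow path, which is the property the message relies on for the common case.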

vfs: Keep a list of mounts on a mount point · 0a5eb7c8

Authored by Eric W. Biederman on Sep 22, 2013

To spot any possible problems call BUG if a mountpoint
is put when its list of mounts is not empty.

AV: use hlist instead of list_head
Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Eric W. Biederman <ebiederman@twitter.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0a5eb7c8

vfs: Don't allow overwriting mounts in the current mount namespace · 7af1364f

Authored by Eric W. Biederman on Oct 04, 2013

In preparation for allowing mountpoints to be renamed and unlinked
in remote filesystems and in other mount namespaces test if on a dentry
there is a mount in the local mount namespace before allowing it to
be renamed or unlinked.

The primary motivation here is old versions of fusermount, whose unmount
is not safe if a path can be renamed or unlinked while it is
verifying the mount is safe to unmount.  More recent versions are simpler
and safer by simply using UMOUNT_NOFOLLOW when unmounting a mount
in a directory owned by an arbitrary user.

Miklos Szeredi <miklos@szeredi.hu> reports this approach is good
enough to remove concerns about new kernels mixed with old versions
of fusermount.

A secondary motivation for restrictions here is that removing empty
directories that have non-empty mount points on them appears to
violate the rule that rmdir cannot remove non-empty directories.  As
Linus Torvalds pointed out this is useful for programs (like git) that
test if a directory is empty with rmdir.

Therefore this patch arranges to enforce the existing mount point
semantics for the local mount namespace.

v2: Rewrote the test to be a drop in replacement for d_mountpoint
v3: Use bool instead of int as the return type of is_local_mountpoint
Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7af1364f

delayed mntput · 9ea459e1

Authored by Al Viro on Aug 08, 2014

On final mntput() we want fs shutdown to happen before return to
userland; however, the only case where we want it to happen right
there (i.e. where task_work_add won't do) is MNT_INTERNAL victim.
Those have to be fully synchronous - failure halfway through module
init might count on having vfsmount killed right there.  Fortunately,
final mntput on MNT_INTERNAL vfsmounts happens on shallow stack.
So we handle those synchronously and do an analog of delayed fput
logic for everything else.

As a result, we are guaranteed that fs shutdown will always happen
on shallow stack.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9ea459e1
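The split between synchronous and deferred shutdown described in the delayed mntput message can be modelled roughly as below. This is a user-space sketch under stated assumptions: `struct mnt`, `MNT_INTERNAL_FLAG`, `final_mntput()` and `run_pending_work()` are hypothetical names, and the pending list merely stands in for whatever deferral mechanism the kernel uses (the message mentions task_work_add).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MNT_INTERNAL_FLAG 0x1   /* hypothetical flag value for this sketch */

struct mnt {
    int flags;
    bool shut_down;             /* set once the "fs shutdown" has run */
    struct mnt *next_pending;   /* link in the deferred-work list */
};

static struct mnt *pending;     /* work to run before "returning to userland" */

static void cleanup_mnt(struct mnt *m)
{
    m->shut_down = true;        /* stands in for the real shutdown work */
}

/* Final reference drop: internal mounts are cleaned up right here,
 * everything else is queued and handled later on a shallow stack. */
static void final_mntput(struct mnt *m)
{
    if (m->flags & MNT_INTERNAL_FLAG) {
        cleanup_mnt(m);
        return;
    }
    m->next_pending = pending;
    pending = m;
}

/* Models the point where deferred work is run. */
static void run_pending_work(void)
{
    while (pending) {
        struct mnt *m = pending;
        pending = m->next_pending;
        m->next_pending = NULL;
        cleanup_mnt(m);
    }
}
```

In this model an internal mount is already shut down when `final_mntput()` returns, while any other mount is shut down only once `run_pending_work()` has been called.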

08 Aug, 2014 2 commits

death to mnt_pinned · 3064c356

Authored by Al Viro on Aug 07, 2014

Rather than playing silly buggers with vfsmount refcounts, just have
acct_on() ask fs/namespace.c for internal clone of file->f_path.mnt
and replace it with said clone.  Then attach the pin to original
vfsmount.  Voila - the clone will be alive until the file gets closed,
making sure that underlying superblock remains active, etc., and
we can drop the original vfsmount, so that it's not kept busy.
If the file lives until the final mntput of the original vfsmount,
we'll notice that there's an fs_pin (one in bsd_acct_struct that
holds that file) and mnt_pin_kill() will take it out.  Since
->kill() is synchronous, we won't proceed past that point until
these files are closed (and private clones of our vfsmount are
gone), so we get the same ordering warranties we used to get.

mnt_pin()/mnt_unpin()/->mnt_pinned is gone now, and good riddance -
it never became usable outside of kernel/acct.c (and racy wrt
umount even there).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3064c356

acct: get rid of acct_list · 215752fc

Authored by Al Viro on Aug 07, 2014

Put these suckers on per-vfsmount and per-superblock lists instead.
Note: right now it's still acct_lock for everything, but that's
going to change.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

215752fc

02 Apr, 2014 1 commit

reduce m_start() cost... · c7999c36

Authored by Al Viro on Feb 27, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c7999c36

31 Mar, 2014 2 commits

switch mnt_hash to hlist · 38129a13

Authored by Al Viro on Mar 20, 2014

fixes RCU bug - walking through hlist is safe in face of element moves,
since it's self-terminating.  Cyclic lists are not - if we end up jumping
to another hash chain, we'll loop infinitely without ever hitting the
original list head.

[fix for dumb braino folded]

Spotted by: Max Kellermann <mk@cm4all.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

38129a13
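The termination argument in the hlist message can be checked with a small model. The sketch below is an illustration under stated assumptions, not kernel code: `struct node` and `walk_terminates()` are hypothetical, and a step limit stands in for "loops infinitely". A walk ends when it reaches `stop`, which is NULL for a self-terminating list and the original list head for a cyclic one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
    struct node *next;
};

/* Follow ->next from 'start' until 'stop' is reached.  Returns true if the
 * walk ends within 'limit' steps, false if it is still running after that
 * (which models an endless loop). */
static bool walk_terminates(const struct node *start,
                            const struct node *stop, int limit)
{
    const struct node *p = start;

    for (int steps = 0; steps < limit; steps++) {
        if (p == stop)
            return true;
        p = p->next;
    }
    return false;
}
```

If a reader is standing on an element that gets moved to another chain, a cyclic walk keeps circling the new chain and never meets the head it is waiting for, while a NULL-terminated walk simply runs off the end of the new chain.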

resizable namespace.c hashes · 0818bf27

Authored by Al Viro on Feb 28, 2014

* switch allocation to alloc_large_system_hash()
* make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
* switch mountpoint_hashtable from list_head to hlist_head

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0818bf27

26 Jan, 2014 1 commit

vfs: Is mounted should be testing mnt_ns for NULL or error. · 260a459d

Authored by Eric W. Biederman on Jan 20, 2014

A bug was introduced with the is_mounted helper function in
commit f7a99c5b
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat Jun 9 00:59:08 2012 -0400

    get rid of ->mnt_longterm

    it's enough to set ->mnt_ns of internal vfsmounts to something
    distinct from all struct mnt_namespace out there; then we can
    just use the check for ->mnt_ns != NULL in the fast path of
    mntput_no_expire()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

The intent was to test if the real_mount(vfsmount)->mnt_ns was
NULL_OR_ERR but the code is actually testing real_mount(vfsmount)
and always returning true.

The result is d_absolute_path returning paths it should be hiding.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

260a459d
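The bug described in the is_mounted message comes down to which pointer is tested. The following is a user-space sketch, assuming simplified stand-ins for the kernel types: `is_err_or_null()` imitates the kernel's error-pointer convention (the top 4095 address values encode errors), and `is_mounted_buggy()` / `is_mounted_fixed()` are hypothetical names chosen for the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Imitates the kernel convention: NULL or one of the top MAX_ERRNO
 * address values counts as "no valid object". */
static inline bool is_err_or_null(const void *p)
{
    return p == NULL || (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct mnt_namespace { int unused; };
struct mount { struct mnt_namespace *mnt_ns; };

/* Buggy variant: tests the mount pointer itself, which is always a
 * valid pointer here, so the result is always true. */
static bool is_mounted_buggy(const struct mount *m)
{
    return !is_err_or_null(m);
}

/* Intended variant: tests the namespace pointer stored in the mount. */
static bool is_mounted_fixed(const struct mount *m)
{
    return !is_err_or_null(m->mnt_ns);
}
```

With a mount whose `mnt_ns` is NULL or an error-encoded value, only the second variant reports "not mounted".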

09 Nov, 2013 1 commit

RCU'd vfsmounts · 48a066e7

Authored by Al Viro on Sep 29, 2013

* RCU-delayed freeing of vfsmounts
* vfsmount_lock replaced with a seqlock (mount_lock)
* sequence number from mount_lock is stored in nameidata->m_seq and
used when we exit RCU mode
* new vfsmount flag - MNT_SYNC_UMOUNT.  Set by umount_tree() when its
caller knows that vfsmount will have no surviving references.
* synchronize_rcu() done between unlocking namespace_sem in namespace_unlock()
and doing pending mntput().
* new helper: legitimize_mnt(mnt, seq).  Checks the mount_lock sequence
number against seq, then grabs reference to mnt.  Then it rechecks mount_lock
again to close the race and either returns success or drops the reference it
has acquired.  The subtle point is that in case of MNT_SYNC_UMOUNT we can
simply decrement the refcount and sod off - aforementioned synchronize_rcu()
makes sure that final mntput() won't come until we leave RCU mode.  We need
that, since we don't want to end up with some lazy pathwalk racing with
umount() and stealing the final mntput() from it - caller of umount() may
expect it to return only once the fs is shut down and we don't want to break
that.  In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do
full-blown mntput() in case of mount_lock sequence number mismatch happening
just as we'd grabbed the reference, but in those cases we won't be stealing
the final mntput() from anything that would care.
* mntput_no_expire() doesn't lock anything on the fast path now.  Incidentally,
SMP and UP cases are handled the same way - no ifdefs there.
* normal pathname resolution does *not* do any writes to mount_lock.  It does,
of course, bump the refcounts of vfsmount and dentry in the very end, but that's
it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

48a066e7
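The check, grab, recheck sequence described for legitimize_mnt can be followed in a single-threaded model. This is only a sketch built from the description above, not the kernel implementation: `mount_seq` models the mount_lock sequence number, `race_after_grab` is a hypothetical knob that simulates a writer slipping in between the two checks, and `full_mntput_calls` counts where a full mntput() would be needed.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned mount_seq;      /* models the mount_lock sequence number */
static bool race_after_grab;    /* test knob: simulate a racing writer */

struct mnt {
    int count;                  /* reference count */
    bool sync_umount;           /* models MNT_SYNC_UMOUNT */
    int full_mntput_calls;      /* times a full mntput() would be done */
};

static bool seq_changed(unsigned seq)
{
    return mount_seq != seq;
}

static bool legitimize_mnt(struct mnt *m, unsigned seq)
{
    if (seq_changed(seq))       /* first check: stale before we start */
        return false;

    m->count++;                 /* grab a reference */

    if (race_after_grab)
        mount_seq += 2;         /* a writer changed the mount tree here */

    if (!seq_changed(seq))      /* recheck closes the race */
        return true;

    if (m->sync_umount) {
        m->count--;             /* plain decrement is enough in this case */
        return false;
    }

    m->count--;                 /* otherwise a full mntput() is required */
    m->full_mntput_calls++;
    return false;
}
```

The two failure branches differ only in how the reference is dropped, which mirrors the distinction the message draws between the MNT_SYNC_UMOUNT case and all others.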

25 Oct, 2013 3 commits

split __lookup_mnt() in two functions · 474279dc

Authored by Al Viro on Oct 01, 2013

Instead of passing the direction as argument (and checking it on every
step through the hash chain), just have separate __lookup_mnt() and
__lookup_mnt_last().  And use the standard iterators...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

474279dc

new helpers: lock_mount_hash/unlock_mount_hash · 719ea2fb

Authored by Al Viro on Sep 29, 2013

aka br_write_{lock,unlock} of vfsmount_lock.  Inlines in fs/mount.h,
vfsmount_lock extern moved over there as well.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

719ea2fb

namespace.c: get rid of mnt_ghosts · aba809cf

Authored by Al Viro on Sep 28, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

aba809cf

10 Apr, 2013 1 commit

get rid of full-hash scan on detaching vfsmounts · 84d17192

Authored by Al Viro on Mar 15, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

84d17192

20 Nov, 2012 1 commit

proc: Usable inode numbers for the namespace file descriptors. · 98f842e6

Authored by Eric W. Biederman on Jun 15, 2011

Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.

A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.

This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.

We still don't have per superblock inode numbers for proc, which
appears necessary for application unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors)
but that is now allowed by the design if it becomes important.

I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

98f842e6
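The userspace test mentioned in this message (checking whether two processes share a namespace) can be sketched with stat(2). The example assumes the namespace files are exposed under paths such as /proc/<pid>/ns/mnt on the running system; `same_namespace()` is a hypothetical helper name. It compares the device ID together with the inode number, since an inode number is only meaningful within one device.

```c
#include <assert.h>
#include <sys/stat.h>

/* Returns 1 if both paths refer to the same namespace object,
 * 0 if they refer to different ones, and -1 if either stat() fails. */
static int same_namespace(const char *ns_path_a, const char *ns_path_b)
{
    struct stat a, b;

    if (stat(ns_path_a, &a) != 0 || stat(ns_path_b, &b) != 0)
        return -1;

    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A caller would pass, for example, "/proc/self/ns/mnt" and the corresponding path for another process ID, and treat a return value of -1 as "could not tell" (the file may be missing or unreadable).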

19 Nov, 2012 2 commits

vfs: Add a user namespace reference from struct mnt_namespace · 771b1371

Authored by Eric W. Biederman on Jul 26, 2012

This will allow for support for unprivileged mounts in a new user namespace.
Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

771b1371

vfs: Add setns support for the mount namespace · 8823c079

Authored by Eric W. Biederman on Mar 07, 2010

setns support for the mount namespace is a little tricky as an
arbitrary decision must be made about what to set fs->root and
fs->pwd to, as there is no expectation of a relationship between
the two mount namespaces.  Therefore I arbitrarily find the root
mount point, and follow every mount on top of it to find the top
of the mount stack.  Then I set fs->root and fs->pwd to that
location.  The topmost root of the mount stack seems like a
reasonable place to be.

Bind mount support for the mount namespace inodes has the
possibility of creating circular dependencies between mount
namespaces.  Circular dependencies can result in loops that
prevent mount namespaces from ever being freed.  I avoid
creating those circular dependencies by adding a sequence number
to the mount namespace and require all bind mounts be of a
younger mount namespace into an older mount namespace.

Add a helper function proc_ns_inode so it is possible to
detect when we are attempting to bind mount a namespace inode.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

8823c079
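The sequence-number rule used to avoid circular dependencies can be stated as a one-line comparison. The sketch below is a simplified model built from the description above, with hypothetical names (`struct mnt_ns`, `new_mnt_ns()`, `may_bind_ns_file()`): each namespace gets a strictly increasing number at creation, and a namespace file may be bind mounted only into a namespace that is older than the one the file refers to.

```c
#include <assert.h>
#include <stdbool.h>

struct mnt_ns {
    unsigned long long seq;     /* creation order; larger means younger */
};

static unsigned long long next_seq = 1;

/* Every new namespace receives the next sequence number. */
static struct mnt_ns new_mnt_ns(void)
{
    struct mnt_ns ns = { next_seq++ };
    return ns;
}

/* A file referring to 'source' may be bind mounted inside 'target'
 * only if 'source' is strictly younger than 'target'. */
static bool may_bind_ns_file(const struct mnt_ns *source,
                             const struct mnt_ns *target)
{
    return source->seq > target->seq;
}
```

Because every permitted reference points from an older namespace to a strictly younger one, following references can never lead back to the starting namespace, so no cycle can form.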

14 Jul, 2012 2 commits

get rid of magic in proc_namespace.c · 6ce6e24e

Authored by Al Viro on Jun 09, 2012

don't rely on proc_mounts->m being the first field; container_of()
is there for a purpose.  No need to bother with ->private, while
we are at it - the same container_of will do nicely.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6ce6e24e
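The point about container_of() can be shown with a struct whose embedded member is deliberately not the first field. This is a user-space sketch: the macro is a simplified form built on offsetof() without the kernel's type checking, and `struct proc_mounts_sketch` and `to_proc_mounts()` are hypothetical stand-ins rather than the real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): step back from a member pointer to the
 * start of the enclosing structure. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct seq_file {
    int pos;
};

struct proc_mounts_sketch {
    int ns_id;                  /* placed first on purpose */
    struct seq_file m;          /* embedded member, not at offset 0 */
};

/* Recover the enclosing structure from a pointer to its 'm' member,
 * wherever 'm' happens to sit in the layout. */
static struct proc_mounts_sketch *to_proc_mounts(struct seq_file *m)
{
    return container_of(m, struct proc_mounts_sketch, m);
}
```

A plain cast from `struct seq_file *` to the outer type would only work while `m` stays the first field; the offset-based conversion keeps working if the layout changes.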

get rid of ->mnt_longterm · f7a99c5b

Authored by Al Viro on Jun 09, 2012

it's enough to set ->mnt_ns of internal vfsmounts to something
distinct from all struct mnt_namespace out there; then we can
just use the check for ->mnt_ns != NULL in the fast path of
mntput_no_expire()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f7a99c5b

07 Jan, 2012 1 commit

vfs: keep list of mounts for each superblock · 39f7c4db

Authored by Miklos Szeredi on Nov 21, 2011

Keep track of vfsmounts belonging to a superblock.  List is protected
by vfsmount_lock.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

39f7c4db

04 Jan, 2012 19 commits

switch mnt_namespace ->root to struct mount · be08d6d2

Authored by Al Viro on Dec 06, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

be08d6d2

vfs: take /proc/*/mounts and friends to fs/proc_namespace.c · 0226f492

Authored by Al Viro on Dec 06, 2011

rationale: that stuff is far tighter bound to fs/namespace.c than to
the guts of procfs proper.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0226f492

vfs: move fsnotify junk to struct mount · c63181e6

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c63181e6

vfs: move mnt_devname · 52ba1621

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

52ba1621

vfs: move mnt_list to struct mount · 1a4eeaf2

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1a4eeaf2

vfs: move the rest of int fields to struct mount · 863d684f

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

863d684f

vfs: mnt_id/mnt_group_id moved · 15169fe7

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

15169fe7

vfs: mnt_ns moved to struct mount · 143c8c91

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

143c8c91

vfs: take mnt_share/mnt_slave/mnt_slave_list and mnt_expire to struct mount · 6776db3d

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6776db3d

vfs: and now we can make ->mnt_master point to struct mount · 32301920

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

32301920

vfs: take mnt_master to struct mount · d10e8def

Authored by Al Viro on Nov 25, 2011

make IS_MNT_SLAVE take struct mount * at the same time
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d10e8def

vfs: take mnt_child/mnt_mounts to struct mount · 6b41d536

Authored by Al Viro on Nov 24, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6b41d536

vfs: all counters taken to struct mount · 68e8a9fe

Authored by Al Viro on Nov 24, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

68e8a9fe

vfs: move mnt_mountpoint to struct mount · a73324da

Authored by Al Viro on Nov 24, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a73324da

vfs: now it can be done - make mnt_parent point to struct mount · 0714a533

Authored by Al Viro on Nov 24, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0714a533

vfs: mnt_parent moved to struct mount · 3376f34f

Authored by Al Viro on Nov 24, 2011

the second victim...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3376f34f

vfs: spread struct mount - mnt_has_parent · 676da58d

Authored by Al Viro on Nov 24, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

676da58d

vfs: the first spoils - mnt_hash moved · 1b8e5564

Authored by Al Viro on Nov 24, 2011

taken out of struct vfsmount into struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1b8e5564

vfs: spread struct mount - __lookup_mnt() result · c7105365

Authored by Al Viro on Nov 24, 2011

switch __lookup_mnt() to returning struct mount *; callers adjusted.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c7105365