Commits · f6b742d8697ae0aeacb025e6e0ab3c61a6918846 · openeuler / raspberrypi-kernel

25 Oct, 2013 9 commits

mnt_set_expiry() doesn't need vfsmount_lock · f6b742d8

Authored by Al Viro on Sep 28, 2013

->mnt_expire is protected by namespace_sem
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f6b742d8

finish_automount() doesn't need vfsmount_lock for removal from expiry list · 22a79192

Authored by Al Viro on Sep 28, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

22a79192

fs/namespace.c: bury long-dead define · 085e83ff

Authored by Al Viro on Sep 28, 2013

MNT_WRITER_UNDERFLOW_LIMIT was missed 4 years ago when it became unused.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

085e83ff

fold mntfree() into mntput_no_expire() · 649a795a

Authored by Al Viro on Sep 28, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

649a795a

do_remount(): pull touch_mnt_namespace() up · 6339dab8

Authored by Al Viro on Sep 16, 2013

... and don't bother with dropping and regaining vfsmount_lock
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6339dab8

dup_mnt_ns(): get rid of pointless grabbing of vfsmount_lock · aa7a574d

Authored by Al Viro on Sep 16, 2013

mnt_list is protected by namespace_sem, not vfsmount_lock
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

aa7a574d

fs_is_visible only needs namespace_sem held shared · 44bb4385

Authored by Al Viro on Sep 16, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

44bb4385

initialize namespace_sem statically · 59aa0da8

Authored by Al Viro on Sep 16, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

59aa0da8

put_mnt_ns(): use drop_collected_mounts() · 7b00ed6f

Authored by Al Viro on Sep 16, 2013

... rather than open-coding it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7b00ed6f

12 Sep, 2013 1 commit

initmpfs: move rootfs code from fs/ramfs/ to init/ · 57f150a5

Authored by Rob Landley on Sep 11, 2013

When the rootfs code was a wrapper around ramfs, having them in the same
file made sense.  Now that it can wrap another filesystem type, move it in
with the init code instead.

This also allows a subsequent patch to access rootfstype= command line
arg.
Signed-off-by: Rob Landley <rob@landley.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

57f150a5

09 Sep, 2013 1 commit

rename user_path_umountat() to user_path_mountpoint_at() · 197df04c

Authored by Al Viro on Sep 08, 2013

... and move the extern from linux/namei.h to fs/internal.h,
along with that of vfs_path_lookup().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

197df04c

06 Sep, 2013 1 commit

vfs: check unlinked ancestors before mount · eed81007

Authored by Miklos Szeredi on Sep 05, 2013

We check submounts before doing d_drop() on a non-empty directory dentry in
NFS (have_submounts()), but we do not exclude a racing mount. Nor do we
prevent mounts from being added to the disconnected subtree using relative
paths after the d_drop().

This patch fixes these issues by checking for unlinked (unhashed, non-root)
ancestors before proceeding with the mount. This is done with rename
seqlock taken for write and with ->d_lock grabbed on each ancestor in turn,
including our dentry itself. This ensures that only one of
check_submounts_and_drop() or has_unlinked_ancestor() can succeed.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

eed81007

04 Sep, 2013 1 commit

vfs: allow umount to handle mountpoints without revalidating them · 8033426e

Authored by Jeff Layton on Jul 26, 2013

Christopher reported a regression where he was unable to unmount an NFS
filesystem where the root had gone stale. The problem is that
d_revalidate handles the root of the filesystem differently from other
dentries, but d_weak_revalidate does not. We could simply fix this by
making d_weak_revalidate return success on IS_ROOT dentries, but there
are cases where we do want to revalidate the root of the fs.

A umount is really a special case. We generally aren't interested in
anything but the dentry and vfsmount that's attached at that point. If
the inode turns out to be stale we just don't care since the intent is
to stop using it anyway.

Try to handle this situation better by treating umount as a special
case in the lookup code. Have it resolve the parent using normal
means, and then do a lookup of the final dentry without revalidating
it. In most cases, the final lookup will come out of the dcache, but
the case where there's a trailing symlink or !LAST_NORM entry on the
end complicates things a bit.

Cc: Neil Brown <neilb@suse.de>
Reported-by: Christopher T Vogan <cvogan@us.ibm.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8033426e

31 Aug, 2013 1 commit

userns: Kill nsown_capable it makes the wrong thing easy · c7b96acf

Authored by Eric W. Biederman on Mar 20, 2013

nsown_capable is a special case of ns_capable essentially for just CAP_SETUID and
CAP_SETGID. For the existing users it doesn't noticeably simplify things and
from the suggested patches I have seen it encourages people to do the wrong
thing. So remove nsown_capable.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

c7b96acf

27 Aug, 2013 2 commits

userns: Better restrictions on when proc and sysfs can be mounted · e51db735

Authored by Eric W. Biederman on Mar 30, 2013

Rely on the fact that another flavor of the filesystem is already
mounted and do not rely on state in the user namespace.

Verify that the mounted filesystem is not covered in any significant
way.  I would love to verify that the previously mounted filesystem
has no mounts on top but there are at least the directories
/proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
for other filesystems to mount on top of.

Refactor the test into a function named fs_fully_visible and call that
function from the mount routines of proc and sysfs.  This makes this
test local to the filesystems involved and keeps the results current as of
when the mounts take place, removing a weird threading of the user
namespace, the mount namespace and the filesystems themselves.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

e51db735

vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces · 4ce5d2b1

Authored by Eric W. Biederman on Mar 30, 2013

Don't copy bind mounts of /proc/<pid>/ns/mnt between namespaces.
These files hold references to a mount namespace and copying them
between namespaces could result in a reference counting loop.

The current mnt_ns_loop test prevents loops on the assumption that
mounts don't cross between namespaces. Unfortunately unsharing a
mount namespace and shared subtrees can both cause mounts to
propagate between mount namespaces.

Two flags, CL_COPY_UNBINDABLE and CL_COPY_MNT_NS_FILE, are added to
control this behavior, and CL_COPY_ALL is redefined as both of them.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

4ce5d2b1

25 Aug, 2013 1 commit

VFS: collect_mounts() should return an ERR_PTR · 52e220d3

Authored by Dan Carpenter on Aug 14, 2013

This should actually be returning an ERR_PTR on error instead of NULL.
That was how it was designed and all the callers expect it.

[AV: actually, that's what "VFS: Make clone_mnt()/copy_tree()/collect_mounts()
return errors" missed - originally collect_mounts() was expected to return
NULL on failure]

Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

52e220d3
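
The distinction matters because a caller that tests the result with IS_ERR() will not treat NULL as a failure. Below is a minimal userspace sketch of the error-pointer convention. The names `err_ptr`, `ptr_err`, `is_err`, and `collect_mounts_sketch` are illustrative stand-ins and are not the kernel's own definitions; the 4095 limit is assumed to mirror the kernel's reserved errno range.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: the top 4095 addresses are reserved for encoded errno values. */
#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value as a pointer. */
static void *err_ptr(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO_SKETCH);
	return (void *)(intptr_t)error;
}

/* Decode the errno value carried by an error pointer. */
static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointers in the reserved error range; NULL is not in it. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

static int dummy_tree;	/* hypothetical stand-in for a collected mount tree */

/* Hypothetical collector: reports failure as an error pointer, never NULL. */
static void *collect_mounts_sketch(int fail)
{
	if (fail)
		return err_ptr(-ENOMEM);
	return &dummy_tree;
}
```

Because `is_err()` is false for a null pointer, a function that returned NULL on failure would slip past callers written against this convention.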

25 Jul, 2013 1 commit

vfs: Lock in place mounts from more privileged users · 5ff9d8a6

Authored by Eric W. Biederman on Mar 29, 2013

When creating a less privileged mount namespace or propagating mounts
from a more privileged to a less privileged mount namespace, lock the
submounts so they may not be unmounted individually in the child mount
namespace, revealing what is under them.

This enforces the reasonable expectation that it is not possible to
see under a mount point.  Most of the time mounts are on empty
directories and revealing that does not matter; however, I have seen an
occasional sloppy configuration where there were interesting things
concealed under a mount point that probably should not be revealed.

Expirable submounts are not locked because they will eventually
unmount automatically so whatever is under them already needs
to be safe for unprivileged users to access.

From a practical standpoint these restrictions do not appear to be
significant for unprivileged users of the mount namespace.  Recursive
bind mounts and pivot_root continue to work, and mounts that are
created in a mount namespace may be unmounted there.  All of which
means that the common idiom of keeping a directory of interesting
files and using pivot_root to throw everything else away continues to
work just fine.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

5ff9d8a6

05 May, 2013 2 commits

create_mnt_ns: unidiomatic use of list_add() · b1983cd8

Authored by Al Viro on May 04, 2013

while list_add(A, B) and list_add(B, A) are equivalent when both A and B
are guaranteed to be empty, the usual idiom is list_add(what, where),
not the other way round...  Not a bug per se, but only by accident and
it makes RTFS harder for no good reason.
Spotted-by: Rajat Sharma <fs.rajat@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b1983cd8
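
To make the equivalence concrete, here is a small userspace sketch of a circular doubly linked list in the style of the kernel's `struct list_head`. The helper names `init_list_head` and `list_add_sketch` are illustrative and simplified; they are assumed to behave like the kernel helpers but are not copied from them.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Make a list head point at itself, i.e. an empty list. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert 'what' immediately after 'where': the list_add(what, where) order. */
static void list_add_sketch(struct list_head *what, struct list_head *where)
{
	assert(what != NULL && where != NULL);
	what->next = where->next;
	what->prev = where;
	where->next->prev = what;
	where->next = what;
}
```

With two empty heads, either argument order produces the same two-element ring, which is why the swapped call was harmless. Once `where` already has entries, the order decides which list the new element joins, so the usual idiom is worth keeping.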

do_mount(): fix a leak introduced in 3.9 ("mount: consolidate permission checks") · 0d5cadb8

Authored by Al Viro on May 04, 2013

Cc: stable@vger.kernel.org
Bisected-by: Michael Leun <lkml20130126@newton.leun.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0d5cadb8

02 May, 2013 1 commit

proc: Split the namespace stuff out into linux/proc_ns.h · 0bb80f24

Authored by David Howells on Apr 12, 2013

Split the proc namespace stuff out into linux/proc_ns.h.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: netdev@vger.kernel.org
cc: Serge E. Hallyn <serge.hallyn@ubuntu.com>
cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0bb80f24

10 Apr, 2013 7 commits

fold release_mounts() into namespace_unlock() · 97216be0

Authored by Al Viro on Mar 16, 2013

... and provide namespace_lock() as a trivial wrapper;
switch to those two consistently.

Result is patterned after rtnl_lock/rtnl_unlock pair.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

97216be0
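
The shape of that lock/unlock pair can be sketched in userspace as follows. This is only an illustration under stated assumptions: a plain flag stands in for namespace_sem, a singly linked list stands in for the list of mounts awaiting release, and every name ending in `_sketch` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct mount_sketch {
	struct mount_sketch *next;
};

static int sem_held;				/* stand-in for namespace_sem */
static struct mount_sketch *unmounted;		/* pending releases, guarded by the lock */
static unsigned long released;			/* entries freed so far */

static void namespace_lock_sketch(void)
{
	assert(!sem_held);
	sem_held = 1;
}

/* Queue a mount for release; only legal while the lock is held. */
static void queue_for_release_sketch(struct mount_sketch *m)
{
	assert(sem_held && m != NULL);
	m->next = unmounted;
	unmounted = m;
}

/* Detach the pending list under the lock, drop the lock, then release. */
static void namespace_unlock_sketch(void)
{
	struct mount_sketch *head;

	assert(sem_held);
	head = unmounted;
	unmounted = NULL;
	sem_held = 0;		/* the slow release work runs unlocked */

	while (head != NULL) {
		struct mount_sketch *next = head->next;

		free(head);
		released++;
		head = next;
	}
}
```

Folding the release step into the unlock helper means callers cannot forget it, and the potentially slow cleanup never runs with the lock held.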

switch unlock_mount() to namespace_unlock(), convert all umount_tree() callers · 328e6d90

Authored by Al Viro on Mar 16, 2013

which allows killing the last argument of umount_tree() and making release_mounts()
static.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

328e6d90

more conversions to namespace_unlock() · 3ab6abee

Authored by Al Viro on Mar 16, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3ab6abee

get rid of the second argument of shrink_submounts() · b54b9be7

Authored by Al Viro on Mar 16, 2013

... it's always &unmounted.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b54b9be7

saner umount_tree()/release_mounts(), part 1 · e3197d83

Authored by Al Viro on Mar 16, 2013

global list of release_mounts() fodder, protected by namespace_sem;
eventually, all umount_tree() callers will use it as kill list.
Helper picking the contents of that list, releasing namespace_sem
and doing release_mounts() on what it got.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e3197d83

get rid of full-hash scan on detaching vfsmounts · 84d17192

Authored by Al Viro on Mar 15, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

84d17192

mnt: release locks on error path in do_loopback · e9c5d8a5

Authored by Andrey Vagin on Apr 09, 2013

do_loopback calls lock_mount(path) and forgets to call unlock_mount
if clone_mnt or copy_mnt fails.

[   77.661566] ================================================
[   77.662939] [ BUG: lock held when returning to user space! ]
[   77.664104] 3.9.0-rc5+ #17 Not tainted
[   77.664982] ------------------------------------------------
[   77.666488] mount/514 is leaving the kernel with locks still held!
[   77.668027] 2 locks held by mount/514:
[   77.668817]  #0:  (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffff811cca22>] lock_mount+0x32/0xe0
[   77.671755]  #1:  (&namespace_sem){+++++.}, at: [<ffffffff811cca3a>] lock_mount+0x4a/0xe0
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e9c5d8a5
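
The class of bug fixed here is an early return that skips the unlock. A minimal sketch of the usual remedy, a single exit label that every path passes through, is shown below. All names are hypothetical: a counter stands in for the locks taken by lock_mount(), and `copy_sketch` stands in for the clone/copy step that can fail.

```c
#include <assert.h>
#include <errno.h>

static int locks_held;	/* stand-in for the locks taken by lock_mount() */

static void lock_mount_sketch(void)
{
	locks_held++;
}

static void unlock_mount_sketch(void)
{
	assert(locks_held > 0);
	locks_held--;
}

/* Hypothetical stand-in for the clone/copy step; fails on request. */
static int copy_sketch(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

static int do_loopback_sketch(int copy_should_fail)
{
	int err;

	lock_mount_sketch();
	err = copy_sketch(copy_should_fail);
	if (err)
		goto out_unlock;	/* never return directly while locked */

	/* ... the copied mount would be grafted here ... */

out_unlock:
	unlock_mount_sketch();
	return err;
}
```

Both the success path and the failure path leave `locks_held` at zero, which is the property the lockdep report above shows being violated.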

27 Mar, 2013 4 commits

userns: Restrict when proc and sysfs can be mounted · 87a8ebd6

Authored by Eric W. Biederman on Mar 24, 2013

Only allow unprivileged mounts of proc and sysfs if they are already
mounted when the user namespace is created.

proc and sysfs are interesting because they have content that is
per namespace, and so fresh mounts are needed when new namespaces
are created while at the same time proc and sysfs have content that
is shared between every instance.

Respect the policy of who may see the shared content of proc and sysfs
by only allowing new mounts if there was an existing mount at the time
the user namespace was created.

In practice there are only two interesting cases: either proc and sysfs are
mounted at their usual places, or proc and sysfs are not mounted at all
(some form of mount namespace jail).

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

87a8ebd6

vfs: Carefully propogate mounts across user namespaces · 132c94e3

Authored by Eric W. Biederman on Mar 22, 2013

As a matter of policy MNT_READONLY should not be changeable if the
original mounter had more privileges than the creator of the mount
namespace.

Add the flag CL_UNPRIVILEGED to note when we are copying a mount from
a mount namespace that requires more privileges to a mount namespace
that requires fewer privileges.

When the CL_UNPRIVILEGED flag is set, cause clone_mnt to set MNT_NO_REMOUNT
if any of the mnt flags that should never be changed are set.

This protects both mount propagation and the initial creation of a less
privileged mount namespace.

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

132c94e3

vfs: Add a mount flag to lock read only bind mounts · 90563b19

Authored by Eric W. Biederman on Mar 22, 2013

When a read-only bind mount is copied from a mount namespace in a higher
privileged user namespace to a mount namespace in a lesser privileged
user namespace, it should not be possible to remove the read-only
restriction.

Add a MNT_LOCK_READONLY mount flag to indicate that a mount must
remain read-only.

CC: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

90563b19

userns: Don't allow creation if the user is chrooted · 3151527e

Authored by Eric W. Biederman on Mar 15, 2013

Guarantee that the policy of which files may be accessed that is
established by setting the root directory will not be violated
by user namespaces by verifying that the root directory points
to the root of the mount namespace at the time of user namespace
creation.

Changing the root is a privileged operation, and as a matter of policy
it serves to limit unprivileged processes to files below the current
root directory.

For reasons of simplicity and comprehensibility the privilege to
change the root directory is gated solely on the CAP_SYS_CHROOT
capability in the user namespace.  Therefore when creating a user
namespace we must ensure that the policy of which files may be accessed
can not be violated by changing the root directory.

Anyone who runs a process in a chroot and would like to use user
namespaces can set up the same view of filesystems with a mount
namespace instead.  As a result, this is not a practical
limitation for using user namespaces.

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

3151527e

23 Feb, 2013 3 commits

new helper: file_inode(file) · 496ad9aa

Authored by Al Viro on Jan 23, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

496ad9aa

mount: consolidate permission checks · 57eccb83

Authored by Al Viro on Feb 22, 2013

... and ask for global CAP_SYS_ADMIN only for superblock-level remounts
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

57eccb83

get rid of unprotected dereferencing of mnt->mnt_ns · 9b40bc90

Authored by Al Viro on Feb 22, 2013

It's safe only under namespace_sem or vfsmount_lock; all places
in fs/namespace.c that want mnt->mnt_ns->user_ns actually want to use
current->nsproxy->mnt_ns->user_ns (note the calls of check_mnt() in
there).

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9b40bc90

21 Dec, 2012 1 commit

vfs, freeze: use ACCESS_ONCE() to guard access to ->mnt_flags · 1e75529e

Authored by Miao Xie on Nov 16, 2012

The compiler may optimize the while loop and make the check just be done once,
so we should use ACCESS_ONCE() to guard access to ->mnt_flags.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1e75529e

15 Dec, 2012 1 commit

userns: Require CAP_SYS_ADMIN for most uses of setns. · 5e4a0847

Authored by Eric W. Biederman on Dec 14, 2012

Andy Lutomirski <luto@amacapital.net> found a nasty little bug in
the permissions of setns.  With unprivileged user namespaces it
became possible to create new namespaces without privilege.

However the setns calls were relaxed to only require CAP_SYS_ADMIN in
the user namespace of the target namespace.

Which made the following nasty sequence possible.

pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
if (pid == 0) { /* child: privileged inside its own new namespaces */
	system("mount --bind /home/me/passwd /etc/passwd");
}
else if (pid != 0) { /* parent: joins the child's mount namespace */
	char path[PATH_MAX];
	snprintf(path, sizeof(path), "/proc/%u/ns/mnt", pid);
	fd = open(path, O_RDONLY);
	setns(fd, 0);
	system("su -");
}

Prevent this possibility by requiring CAP_SYS_ADMIN
in the current user namespace when joining all but the user namespace.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
5e4a0847

20 Nov, 2012 1 commit

proc: Usable inode numbers for the namespace file descriptors. · 98f842e6

Authored by Eric W. Biederman on Jun 15, 2011

Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.

A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.

This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.

We still don't have per superblock inode numbers for proc, which
appears necessary for application unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors)
but that is now allowed by the design if it becomes important.

I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

98f842e6
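
From userspace, the test this enables can be sketched as a comparison of the identity of two namespace files. The helper name `same_namespace` is illustrative. Comparing the device number alongside the inode number is a conservative choice made for this sketch; the commit message itself only speaks of inode numbers.

```c
#include <assert.h>
#include <sys/stat.h>

/*
 * Returns 1 if both paths refer to the same namespace, 0 if they do
 * not, and -1 if either path cannot be examined.
 */
static int same_namespace(const char *ns_a, const char *ns_b)
{
	struct stat a, b;

	assert(ns_a && ns_b);
	if (stat(ns_a, &a) != 0 || stat(ns_b, &b) != 0)
		return -1;
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A typical call would compare `/proc/<pid>/ns/mnt` for two different process IDs; reading another process's entry may require suitable permissions.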

19 Nov, 2012 2 commits

userns: fix return value on mntns_install() failure · ae11e0f1

Authored by Zhao Hongjiang on Sep 13, 2012

Change return value from -EINVAL to -EPERM when the permission check fails.
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
ae11e0f1

vfs: Allow unprivileged manipulation of the mount namespace. · 0c55cfc4

Authored by Eric W. Biederman on Jul 26, 2012

- Add a filesystem flag to mark filesystems that are safe to mount as
  an unprivileged user.

- Add a filesystem flag to mark filesystems that don't need MNT_NODEV
  when mounted by an unprivileged user.

- Relax the permission checks to allow unprivileged users that have
  CAP_SYS_ADMIN permissions in the user namespace referred to by the
  current mount namespace to be allowed to mount, unmount, and move
  filesystems.
Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

0c55cfc4