Commits · e8f2b548de7ae65e17ee911e54712a3f26f69c60 · xiphi1978 / linux

10 Apr, 2013 1 commit

mnt: release locks on error path in do_loopback · e9c5d8a5

Authored by Andrey Vagin on Apr 09, 2013

do_loopback calls lock_mount(path) and forgets to call unlock_mount
if clone_mnt or copy_mnt fails.

[   77.661566] ================================================
[   77.662939] [ BUG: lock held when returning to user space! ]
[   77.664104] 3.9.0-rc5+ #17 Not tainted
[   77.664982] ------------------------------------------------
[   77.666488] mount/514 is leaving the kernel with locks still held!
[   77.668027] 2 locks held by mount/514:
[   77.668817]  #0:  (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffff811cca22>] lock_mount+0x32/0xe0
[   77.671755]  #1:  (&namespace_sem){+++++.}, at: [<ffffffff811cca3a>] lock_mount+0x4a/0xe0
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e9c5d8a5
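
The bug class fixed above (a lock taken at the top of a function and not released on an early error return) is usually avoided by routing every exit through a single cleanup label. The following is a minimal userspace sketch of that pattern, not the kernel code: `demo_lock`, `demo_unlock`, `demo_copy` and `demo_loopback` are hypothetical stand-ins, and a counter replaces the real locks.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the lock pair; a counter replaces real locks. */
static int locks_held;
static void demo_lock(void)   { locks_held++; }
static void demo_unlock(void) { locks_held--; }

/* Hypothetical stand-in for the copy step; fails on request. */
static int demo_copy(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/* Every exit taken after demo_lock() passes through the "out" label. */
static int demo_loopback(bool fail_copy)
{
	int err;

	demo_lock();
	err = demo_copy(fail_copy);
	if (err)
		goto out;	/* a plain "return err" here would leak the lock */
	/* ... attach the copied mount here ... */
out:
	demo_unlock();
	return err;
}
```

With this shape, `locks_held` is back at zero after both the success path and the failure path.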

27 Mar, 2013 4 commits

userns: Restrict when proc and sysfs can be mounted · 87a8ebd6

Authored by Eric W. Biederman on Mar 24, 2013

Only allow unprivileged mounts of proc and sysfs if they are already
mounted when the user namespace is created.

proc and sysfs are interesting because they have content that is
per namespace, and so fresh mounts are needed when new namespaces
are created while at the same time proc and sysfs have content that
is shared between every instance.

Respect the policy of who may see the shared content of proc and sysfs
by only allowing new mounts if there was an existing mount at the time
the user namespace was created.

In practice there are only two interesting cases: proc and sysfs are
mounted at their usual places, or proc and sysfs are not mounted at all
(some form of mount namespace jail).

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

87a8ebd6

vfs: Carefully propogate mounts across user namespaces · 132c94e3

Authored by Eric W. Biederman on Mar 22, 2013

As a matter of policy MNT_READONLY should not be changeable if the
original mounter had more privileges than the creator of the mount
namespace.

Add the flag CL_UNPRIVILEGED to note when we are copying a mount from
a mount namespace that requires more privileges to a mount namespace
that requires fewer privileges.

When the CL_UNPRIVILEGED flag is set cause clone_mnt to set MNT_NO_REMOUNT
if any of the mnt flags that should never be changed are set.

This protects both mount propagation and the initial creation of a less
privileged mount namespace.

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

132c94e3

vfs: Add a mount flag to lock read only bind mounts · 90563b19

Authored by Eric W. Biederman on Mar 22, 2013

When a read-only bind mount is copied from a mount namespace in a higher
privileged user namespace to a mount namespace in a lesser privileged
user namespace, it should not be possible to remove the read-only
restriction.

Add a MNT_LOCK_READONLY mount flag to indicate that a mount must
remain read-only.

CC: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

90563b19

userns: Don't allow creation if the user is chrooted · 3151527e

Authored by Eric W. Biederman on Mar 15, 2013

Guarantee that the policy of which files may be accessed, which is
established by setting the root directory, will not be violated
by user namespaces by verifying that the root directory points
to the root of the mount namespace at the time of user namespace
creation.

Changing the root is a privileged operation, and as a matter of policy
it serves to limit unprivileged processes to files below the current
root directory.

For reasons of simplicity and comprehensibility the privilege to
change the root directory is gated solely on the CAP_SYS_CHROOT
capability in the user namespace.  Therefore when creating a user
namespace we must ensure that the policy of which files may be accessed
cannot be violated by changing the root directory.

Anyone who runs a process in a chroot and would like to use user
namespaces can set up the same view of filesystems with a mount
namespace instead.  As a result, this is not a practical
limitation for using user namespaces.

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

3151527e

23 Feb, 2013 3 commits

new helper: file_inode(file) · 496ad9aa

Authored by Al Viro on Jan 23, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

496ad9aa

mount: consolidate permission checks · 57eccb83

Authored by Al Viro on Feb 22, 2013

... and ask for global CAP_SYS_ADMIN only for superblock-level remounts
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

57eccb83

get rid of unprotected dereferencing of mnt->mnt_ns · 9b40bc90

Authored by Al Viro on Feb 22, 2013

It's safe only under namespace_sem or vfsmount_lock; all places
in fs/namespace.c that want mnt->mnt_ns->user_ns actually want to use
current->nsproxy->mnt_ns->user_ns (note the calls of check_mnt() in
there).

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9b40bc90

21 Dec, 2012 1 commit

vfs, freeze: use ACCESS_ONCE() to guard access to ->mnt_flags · 1e75529e

Authored by Miao Xie on Nov 16, 2012

The compiler may optimize the while loop and perform the check only once,
so we should use ACCESS_ONCE() to guard access to ->mnt_flags.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1e75529e
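
To illustrate the optimization problem described above: without a volatile access, a compiler is free to load a field once and then spin on the cached value. Below is a minimal sketch, assuming GCC or Clang (for `__typeof__`). `DEMO_ACCESS_ONCE` mirrors the classic shape of the kernel macro, while `struct demo_mount`, `DEMO_WRITE_HOLD` and `demo_wait_for_release` are hypothetical names for illustration only. The spin limit exists so the example terminates and is not part of the technique.

```c
/* Force a fresh load of x on every evaluation (GCC/Clang extension). */
#define DEMO_ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

#define DEMO_WRITE_HOLD 0x200	/* hypothetical flag bit */

struct demo_mount {
	int mnt_flags;
};

/*
 * Spin while the hold bit is set, re-reading mnt_flags on each pass.
 * Returns the number of spins, or -1 if max_spins was reached.
 */
static int demo_wait_for_release(struct demo_mount *m, int max_spins)
{
	int spins = 0;

	while (DEMO_ACCESS_ONCE(m->mnt_flags) & DEMO_WRITE_HOLD) {
		if (++spins >= max_spins)
			return -1;
	}
	return spins;
}
```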

15 Dec, 2012 1 commit

userns: Require CAP_SYS_ADMIN for most uses of setns. · 5e4a0847

Authored by Eric W. Biederman on Dec 14, 2012

Andy Lutomirski <luto@amacapital.net> found a nasty little bug in
the permissions of setns.  With unprivileged user namespaces it
became possible to create new namespaces without privilege.

However the setns calls were relaxed to only require CAP_SYS_ADMIN in
the user namespace of the targeted namespace.

Which made the following nasty sequence possible.

pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
if (pid == 0) { /* child */
	system("mount --bind /home/me/passwd /etc/passwd");
}
else if (pid != 0) { /* parent */
	char path[PATH_MAX];
	snprintf(path, sizeof(path), "/proc/%u/ns/mnt", pid);
	fd = open(path, O_RDONLY);
	setns(fd, 0);
	system("su -");
}

Prevent this possibility by requiring CAP_SYS_ADMIN
in the current user namespace when joining all but the user namespace.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

5e4a0847

20 Nov, 2012 1 commit

proc: Usable inode numbers for the namespace file descriptors. · 98f842e6

Authored by Eric W. Biederman on Jun 15, 2011

Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.

A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.

This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.

We still don't have per superblock inode numbers for proc, which
appears necessary for application unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors)
but that is now allowed by the design if it becomes important.

I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

98f842e6
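
The userspace test mentioned above (checking whether two processes are in the same namespace) can be sketched by comparing the device and inode numbers of two namespace files. This is an illustrative sketch, assuming a Linux system with /proc mounted and permission to stat both files. `demo_same_ns_file` is a hypothetical helper name, not an existing API.

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Compare two namespace files such as /proc/<pid>/ns/mnt.
 * Returns 1 if both refer to the same namespace, 0 if they differ,
 * and -1 if either file cannot be examined.
 */
static int demo_same_ns_file(const char *path_a, const char *path_b)
{
	struct stat sa, sb;

	if (stat(path_a, &sa) != 0 || stat(path_b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

A caller would typically build the two paths from process IDs, for example "/proc/1234/ns/mnt" and "/proc/self/ns/mnt".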

19 Nov, 2012 5 commits

userns: fix return value on mntns_install() failure · ae11e0f1

Authored by Zhao Hongjiang on Sep 13, 2012

Change return value from -EINVAL to -EPERM when the permission check fails.
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

ae11e0f1

vfs: Allow unprivileged manipulation of the mount namespace. · 0c55cfc4

Authored by Eric W. Biederman on Jul 26, 2012

- Add a filesystem flag to mark filesystems that are safe to mount as
  an unprivileged user.

- Add a filesystem flag to mark filesystems that don't need MNT_NODEV
  when mounted by an unprivileged user.

- Relax the permission checks to allow unprivileged users that have
  CAP_SYS_ADMIN permissions in the user namespace referred to by the
  current mount namespace to be allowed to mount, unmount, and move
  filesystems.
Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

0c55cfc4

vfs: Only support slave subtrees across different user namespaces · 7a472ef4

Authored by Eric W. Biederman on Jul 31, 2012

Sharing mount subtrees with mount namespaces created by unprivileged
users allows unprivileged mounts created by unprivileged users to
propagate to mount namespaces controlled by privileged users.

Prevent nasty consequences by changing shared subtrees to slave
subtrees when an unprivileged user creates a new mount namespace.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

7a472ef4

vfs: Add a user namespace reference from struct mnt_namespace · 771b1371

Authored by Eric W. Biederman on Jul 26, 2012

This will allow for support for unprivileged mounts in a new user namespace.
Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

771b1371

vfs: Add setns support for the mount namespace · 8823c079

Authored by Eric W. Biederman on Mar 07, 2010

setns support for the mount namespace is a little tricky as an
arbitrary decision must be made about what to set fs->root and
fs->pwd to, as there is no expectation of a relationship between
the two mount namespaces.  Therefore I arbitrarily find the root
mount point, and follow every mount on top of it to find the top
of the mount stack.  Then I set fs->root and fs->pwd to that
location.  The topmost root of the mount stack seems like a
reasonable place to be.

Bind mount support for the mount namespace inodes has the
possibility of creating circular dependencies between mount
namespaces.  Circular dependencies can result in loops that
prevent mount namespaces from ever being freed.  I avoid
creating those circular dependencies by adding a sequence number
to the mount namespace and require all bind mounts be of a
younger mount namespace into an older mount namespace.

Add a helper function proc_ns_inode so it is possible to
detect when we are attempting to bind mount a namespace inode.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

8823c079
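
The sequence-number rule described above can be sketched as a simple ordering check: if a namespace file may only be bind mounted into a strictly older namespace, reference cycles cannot form. This is a minimal sketch under that reading of the commit message. `struct demo_mnt_ns`, `demo_ns_init` and `demo_may_bind` are hypothetical names, and the real kernel check may differ in detail.

```c
/* Hypothetical mount namespace carrying a creation sequence number. */
struct demo_mnt_ns {
	unsigned long long seq;
};

static unsigned long long demo_next_seq = 1;

/* Each new namespace gets a larger number than every earlier one. */
static void demo_ns_init(struct demo_mnt_ns *ns)
{
	ns->seq = demo_next_seq++;
}

/*
 * Allow bind mounting the file for "target" inside "host" only when
 * target is younger (larger seq) than host. A namespace can then never
 * hold a reference, directly or indirectly, to itself.
 */
static int demo_may_bind(const struct demo_mnt_ns *host,
			 const struct demo_mnt_ns *target)
{
	return target->seq > host->seq;
}
```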

13 Oct, 2012 1 commit

vfs: define struct filename and have getname() return it · 91a27b2a

Authored by Jeff Layton on Oct 10, 2012

getname() is intended to copy pathname strings from userspace into a
kernel buffer. The result is just a string in kernel space. It would
however be quite helpful to be able to attach some ancillary info to
the string.

For instance, we could attach some audit-related info to reduce the
amount of audit-related processing needed. When auditing is enabled,
we could also call getname() on the string more than once and not
need to recopy it from userspace.

This patchset converts the getname()/putname() interfaces to return
a struct instead of a string. For now, the struct just tracks the
string in kernel space and the original userland pointer for it.

Later, we'll add other information to the struct as it becomes
convenient.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

91a27b2a

12 Oct, 2012 1 commit

consitify do_mount() arguments · 808d4e3c

Authored by Al Viro on Oct 11, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

808d4e3c

23 Sep, 2012 1 commit

do_add_mount()/umount -l races · 156cacb1

Authored by Al Viro on Sep 21, 2012

normally we deal with lock_mount()/umount races by checking that
mountpoint to be is still in our namespace after lock_mount() has
been done.  However, do_add_mount() skips that check when called
with MNT_SHRINKABLE in flags (i.e. from finish_automount()).  The
reason is that ->mnt_ns may be a temporary namespace created exactly
to contain automounts a-la NFS4 referral handling.  It's not the
namespace of the caller, though, so check_mnt() would fail here.
We still need to check that ->mnt_ns is non-NULL in that case,
though.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

156cacb1

31 Jul, 2012 1 commit

fs: Add freezing handling to mnt_want_write() / mnt_drop_write() · eb04c282

Authored by Jan Kara on Jun 12, 2012

Most of the places where we want freeze protection coincide with the places where
we also have remount-ro protection. So make mnt_want_write() and
mnt_drop_write() (and their _file alternative) prevent freezing as well.
For the few cases that are really interested only in remount-ro protection
provide new function variants.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

eb04c282

14 Jul, 2012 4 commits

VFS: Comment mount following code · f015f126

Authored by David Howells on Jun 25, 2012

Add comments describing what the directions "up" and "down" mean and ref count
handling to the VFS mount following family of functions.

Signed-off-by: Valerie Aurora <vaurora@redhat.com> (Original author)
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f015f126

VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors · be34d1a3

Authored by David Howells on Jun 25, 2012

copy_tree() can theoretically fail in a case other than ENOMEM, but always
returns NULL which is interpreted by callers as -ENOMEM.  Change it to return
an explicit error.

Also change clone_mnt() for consistency and because union mounts will add new
error cases.

Thanks to Andreas Gruenbacher <agruen@suse.de> for a bug fix.
[AV: folded braino fix by Dan Carpenter]

Original-author: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Valerie Aurora <valerie.aurora@gmail.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

be34d1a3
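
Returning an explicit error instead of NULL from a pointer-returning function is commonly done by encoding a negative errno value in the pointer itself. Below is a userspace sketch of that idiom, loosely modelled on the kernel's ERR_PTR helpers. All `demo_` names are hypothetical, and the sketch assumes a platform where `long` and pointers have the same width and the top 4095 addresses are never valid objects.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer, and decode it again. */
static inline void *demo_err_ptr(long err) { return (void *)err; }
static inline long demo_ptr_err(const void *p) { return (long)p; }

/* True when p lies in the last DEMO_MAX_ERRNO bytes of the address space. */
static inline int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_mnt {
	int id;
};

/* Hypothetical clone step that can fail for more than one reason. */
static struct demo_mnt *demo_clone(int refuse)
{
	struct demo_mnt *m;

	if (refuse)
		return demo_err_ptr(-EINVAL);	/* distinct from -ENOMEM */
	m = malloc(sizeof(*m));
	if (!m)
		return demo_err_ptr(-ENOMEM);
	m->id = 1;
	return m;
}
```

Callers check the result with `demo_is_err()` and recover the specific error with `demo_ptr_err()` instead of assuming that every failure means out of memory.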

get rid of magic in proc_namespace.c · 6ce6e24e

Authored by Al Viro on Jun 09, 2012

don't rely on proc_mounts->m being the first field; container_of()
is there for a purpose.  No need to bother with ->private, while
we are at it - the same container_of will do nicely.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6ce6e24e
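
For readers unfamiliar with the idiom, container_of() recovers a pointer to an enclosing structure from a pointer to one of its members, so the member does not have to come first. Here is a minimal userspace sketch. `demo_container_of` is a simplified version without the kernel's type checking, and the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Step back from a member pointer to the start of the enclosing struct. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_seq_file {
	int count;
};

struct demo_proc_mounts {
	int ns_id;			/* deliberately placed before m */
	struct demo_seq_file m;		/* embedded member, not first */
};

/* Works regardless of where m sits inside struct demo_proc_mounts. */
static struct demo_proc_mounts *demo_from_seq(struct demo_seq_file *seq)
{
	return demo_container_of(seq, struct demo_proc_mounts, m);
}
```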

get rid of ->mnt_longterm · f7a99c5b

Authored by Al Viro on Jun 09, 2012

it's enough to set ->mnt_ns of internal vfsmounts to something
distinct from all struct mnt_namespace out there; then we can
just use the check for ->mnt_ns != NULL in the fast path of
mntput_no_expire()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f7a99c5b

31 May, 2012 1 commit

vfs: umount_tree() might be called on subtree that had never made it · 63d37a84

Authored by Al Viro on May 29, 2012

__mnt_make_shortterm() in there undoes the effect of __mnt_make_longterm()
we'd done back when we set ->mnt_ns non-NULL; it should not be done to
vfsmounts that had never gone through commit_tree() and friends.  Kudos to
lczerner for catching that one...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

63d37a84

30 May, 2012 1 commit

brlocks/lglocks: API cleanups · 962830df

Authored by Andi Kleen on May 08, 2012

lglocks and brlocks are currently generated with some complicated macros
in lglock.h.  But there's no reason to not just use common utility
functions and put all the data into a common data structure.

In preparation, this patch changes the API to look more like normal
function calls with pointers, not magic macros.

The patch is rather large because I move over all users in one go to keep
it bisectable.  This impacts the VFS somewhat in terms of lines changed.
But no actual behaviour change.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

962830df

07 Jan, 2012 4 commits

vfs: prevent remount read-only if pending removes · 8e8b8796

Authored by Miklos Szeredi on Nov 21, 2011

If there are any inodes on the super block that have been unlinked
(i_nlink == 0) but have not yet been deleted, then prevent
remounting the super block read-only.
Reported-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8e8b8796

vfs: protect remounting superblock read-only · 4ed5e82f

Authored by Miklos Szeredi on Nov 21, 2011

Currently remounting superblock read-only is racy in a major way.

With the per mount read-only infrastructure it is now possible to
prevent most races, which this patch attempts.

Before starting the remount read-only, iterate through all mounts
belonging to the superblock and if none of them have any pending
writes, set sb->s_readonly_remount.  This indicates that remount is in
progress and no further write requests are allowed.  If the remount
succeeds set MS_RDONLY and reset s_readonly_remount.

If the remounting is unsuccessful just reset s_readonly_remount.
This can result in transient EROFS errors, despite the fact the
remount failed.  Unfortunately holding off writes is difficult as
remount itself may touch the filesystem (e.g. through load_nls())
which would deadlock.

A later patch deals with delayed writes due to nlink going to zero.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4ed5e82f
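
The remount sequence described above can be sketched as a small single-threaded state machine. This is only an illustration of the ordering of the steps: it omits all locking and memory barriers that the real code needs. Every `demo_` name is hypothetical, and the choice of -EBUSY for the "pending writers" case is an assumption rather than something stated in the commit message.

```c
#include <errno.h>
#include <stddef.h>

struct demo_mount {
	int writers;			/* pending write requests */
};

struct demo_sb {
	struct demo_mount *mounts;	/* all mounts of this superblock */
	size_t nr_mounts;
	int readonly_remount;		/* remount to read-only in progress */
	int rdonly;			/* superblock is read-only */
};

/* New writes are refused while a remount is in progress or after it succeeds. */
static int demo_want_write(struct demo_sb *sb, struct demo_mount *m)
{
	if (sb->rdonly || sb->readonly_remount)
		return -EROFS;
	m->writers++;
	return 0;
}

static void demo_drop_write(struct demo_mount *m)
{
	m->writers--;
}

/* Step 1: refuse if any mount has pending writes, else mark in progress. */
static int demo_begin_remount_ro(struct demo_sb *sb)
{
	for (size_t i = 0; i < sb->nr_mounts; i++)
		if (sb->mounts[i].writers > 0)
			return -EBUSY;	/* assumed error code */
	sb->readonly_remount = 1;
	return 0;
}

/* Step 2: on success set read-only; in both cases clear the marker. */
static void demo_end_remount_ro(struct demo_sb *sb, int fs_err)
{
	if (!fs_err)
		sb->rdonly = 1;
	sb->readonly_remount = 0;
}
```

The window between the two steps is where a writer can see a transient -EROFS even though the remount later fails.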

vfs: keep list of mounts for each superblock · 39f7c4db

Authored by Miklos Szeredi on Nov 21, 2011

Keep track of vfsmounts belonging to a superblock.  List is protected
by vfsmount_lock.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

39f7c4db

vfs: switch ->show_options() to struct dentry * · 34c80b1d

Authored by Al Viro on Dec 08, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

34c80b1d

04 Jan, 2012 10 commits

vfs: trim includes a bit · d10577a8

Authored by Al Viro on Dec 07, 2011

[folded fix for missing magic.h from Tetsuo Handa]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d10577a8

switch mnt_namespace ->root to struct mount · be08d6d2

Authored by Al Viro on Dec 06, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

be08d6d2

vfs: take /proc/*/mounts and friends to fs/proc_namespace.c · 0226f492

Authored by Al Viro on Dec 06, 2011

rationale: that stuff is far tighter bound to fs/namespace.c than to
the guts of procfs proper.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0226f492

vfs: opencode mntget() mnt_set_mountpoint() · 3a2393d7

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3a2393d7

vfs: spread struct mount - remaining argument of next_mnt() · 909b0a88

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

909b0a88

vfs: move fsnotify junk to struct mount · c63181e6

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c63181e6

vfs: move mnt_devname · 52ba1621

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

52ba1621

vfs: move mnt_list to struct mount · 1a4eeaf2

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1a4eeaf2

vfs: switch pnode.h macros to struct mount * · fc7be130

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fc7be130

vfs: move the rest of int fields to struct mount · 863d684f

Authored by Al Viro on Nov 25, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

863d684f