- 18 7月, 2018 3 次提交
-
-
由 Miklos Szeredi 提交于
The underlying real file used by overlayfs still contains the overlay path. This results in mnt_want_write_file() calls by the filesystem getting freeze protection on the wrong inode (the overlayfs one instead of the real one). Fix by using file_inode(file)->i_sb instead of file->f_path.mnt->mnt_sb. Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Miklos Szeredi 提交于
This reverts commit 7c6893e3. Overlayfs no longer relies on the vfs for checking writability of files. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
This reverts commit 954c736f. Overlayfs no longer relies on the vfs for checking writability of files. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 25 5月, 2018 1 次提交
-
-
由 Eric W. Biederman 提交于
Superblock level remounts are currently restricted to global CAP_SYS_ADMIN, as is the path for changing the root mount to read only on umount. Loosen both of these permission checks to also allow CAP_SYS_ADMIN in any namespace which is privileged towards the userns which originally mounted the filesystem. Signed-off-by: NSeth Forshee <seth.forshee@canonical.com> Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com> Acked-by: NSerge Hallyn <serge@hallyn.com> Acked-by: NChristian Brauner <christian@brauner.io> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 21 4月, 2018 1 次提交
-
-
由 David Howells 提交于
In do_mount() when the MS_* flags are being converted to MNT_* flags, MS_RDONLY got accidentally convered to SB_RDONLY. Undo this change. Fixes: e462ec50 ("VFS: Differentiate mount flags (MS_*) from internal superblock flags") Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 4月, 2018 1 次提交
-
-
由 Al Viro 提交于
We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for their copies. As it is, creating a deep stack of bindings of /proc/*/ns/* somewhere in a new namespace and exiting yields a stack overflow. Cc: stable@kernel.org Reported-by: NAlexander Aring <aring@mojatatu.com> Bisected-by: NKirill Tkhai <ktkhai@virtuozzo.com> Tested-by: NKirill Tkhai <ktkhai@virtuozzo.com> Tested-by: NAlexander Aring <aring@mojatatu.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 03 4月, 2018 2 次提交
-
-
由 Dominik Brodowski 提交于
Using this helper allows us to avoid the in-kernel call to the sys_umount() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as ksys_umount(). In the near future, the only fs-external caller of ksys_umount() should be converted to call do_umount() directly. Then, ksys_umount() can be moved within sys_umount() again. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
Using this helper allows us to avoid the in-kernel calls to the sys_mount() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_mount(). In the near future, all callers of ksys_mount() should be converted to call do_mount() directly. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
- 10 12月, 2017 1 次提交
-
-
由 Markus Trippelsdorf 提交于
Since commit e462ec50 ("VFS: Differentiate mount flags (MS_*) from internal superblock flags") the lazytime mount option doesn't get passed on anymore. Fix the issue by handling the option in do_mount(). Reviewed-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 09 11月, 2017 1 次提交
-
-
由 Mimi Zohar 提交于
The mount i_version flag is not enabled in the new sb_flags. This patch adds the missing SB_I_VERSION flag. Fixes: e462ec50 "VFS: Differentiate mount flags (MS_*) from internal superblock flags" Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
-
- 25 10月, 2017 1 次提交
-
-
由 Mark Rutland 提交于
locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script: ---- // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and // WRITE_ONCE() // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 17 10月, 2017 1 次提交
-
-
由 Mimi Zohar 提交于
The mount i_version flag is not enabled in the new sb_flags. This patch adds the missing SB_I_VERSION flag. Fixes: e462ec50 "VFS: Differentiate mount flags (MS_*) from internal superblock flags" Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 10月, 2017 1 次提交
-
-
由 Amir Goldstein 提交于
Overlayfs directory file_inode() is the overlay inode whether the real inode is upper or lower. This fixes a regression in xfstest generic/158. Fixes: 7c6893e3 ("ovl: don't allow writing ioctl on lower layer") Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 05 9月, 2017 1 次提交
-
-
由 Miklos Szeredi 提交于
Problem with ioctl() is that it's a file operation, yet often used as an inode operation (i.e. modify the inode despite the file being opened for read-only). mnt_want_write_file() is used by filesystems in such cases to get write access on an arbitrary open file. Since overlayfs lets filesystems do all file operations, including ioctl, this can lead to mnt_want_write_file() returning OK for a lower file and modification of that lower file. This patch prevents modification by checking if the file is from an overlayfs lower layer and returning EPERM in that case. Need to introduce a mnt_want_write_file_path() variant that still does the old thing for inode operations that can do the copy up + modification correctly in such cases (fchown, fsetxattr, fremovexattr). This does not address the correctness of such ioctls on overlayfs (the correct way would be to copy up and attempt to perform ioctl on upper file). In theory this could be a regression. We very much hope that nobody is relying on such a hack in any sane setup. While this patch meddles in VFS code, it has no effect on non-overlayfs filesystems. Reported-by: N"zhangyi (F)" <yi.zhang@huawei.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 28 8月, 2017 1 次提交
-
-
由 Byungchul Park 提交于
Although llist provides proper APIs, they are not used. Make them used. Signed-off-by: NByungchul Park <byungchul.park@lge.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 17 7月, 2017 2 次提交
-
-
由 David Howells 提交于
Differentiate the MS_* flags passed to mount(2) from the internal flags set in the super_block's s_flags. s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. In this patch, just the headers are altered and some kernel code where blind automated conversion isn't necessarily correct. Note that this shows up some interesting issues: (1) Some MS_* flags get translated to MNT_* flags (such as MS_NODEV -> MNT_NODEV) without passing this on to the filesystem, but some filesystems set such flags anyway. (2) The ->remount_fs() methods of some filesystems adjust the *flags argument by setting MS_* flags in it, such as MS_NOATIME - but these flags are then scrubbed by do_remount_sb() (only the occupants of MS_RMT_MASK are permitted: MS_RDONLY, MS_SYNCHRONOUS, MS_MANDLOCK, MS_I_VERSION and MS_LAZYTIME) I'm not sure what's the best way to solve all these cases. Suggested-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Firstly by applying the following with coccinelle's spatch: @@ expression SB; @@ -SB->s_flags & MS_RDONLY +sb_rdonly(SB) to effect the conversion to sb_rdonly(sb), then by applying: @@ expression A, SB; @@ ( -(!sb_rdonly(SB)) && A +!sb_rdonly(SB) && A | -A != (sb_rdonly(SB)) +A != sb_rdonly(SB) | -A == (sb_rdonly(SB)) +A == sb_rdonly(SB) | -!(sb_rdonly(SB)) +!sb_rdonly(SB) | -A && (sb_rdonly(SB)) +A && sb_rdonly(SB) | -A || (sb_rdonly(SB)) +A || sb_rdonly(SB) | -(sb_rdonly(SB)) != A +sb_rdonly(SB) != A | -(sb_rdonly(SB)) == A +sb_rdonly(SB) == A | -(sb_rdonly(SB)) && A +sb_rdonly(SB) && A | -(sb_rdonly(SB)) || A +sb_rdonly(SB) || A ) @@ expression A, B, SB; @@ ( -(sb_rdonly(SB)) ? 1 : 0 +sb_rdonly(SB) | -(sb_rdonly(SB)) ? A : B +sb_rdonly(SB) ? A : B ) to remove left over excess bracketage and finally by applying: @@ expression A, SB; @@ ( -(A & MS_RDONLY) != sb_rdonly(SB) +(bool)(A & MS_RDONLY) != sb_rdonly(SB) | -(A & MS_RDONLY) == sb_rdonly(SB) +(bool)(A & MS_RDONLY) == sb_rdonly(SB) ) to make comparisons against the result of sb_rdonly() (which is a bool) work correctly. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
- 11 7月, 2017 1 次提交
-
-
由 David Howells 提交于
Kill off s_options, save/replace_mount_options() and generic_show_options() as all filesystems now implement ->show_options() for themselves. This should make it easier to implement a context-based mount where the mount options can be passed individually over a file descriptor. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 07 7月, 2017 1 次提交
-
-
由 Pavel Tatashin 提交于
Update dcache, inode, pid, mountpoint, and mount hash tables to use HASH_ZERO, and remove initialization after allocations. In case of places where HASH_EARLY was used such as in __pv_init_lock_hash the zeroed hash table was already assumed, because memblock zeroes the memory. CPU: SPARC M6, Memory: 7T Before fix: Dentry cache hash table entries: 1073741824 Inode-cache hash table entries: 536870912 Mount-cache hash table entries: 16777216 Mountpoint-cache hash table entries: 16777216 ftrace: allocating 20414 entries in 40 pages Total time: 11.798s After fix: Dentry cache hash table entries: 1073741824 Inode-cache hash table entries: 536870912 Mount-cache hash table entries: 16777216 Mountpoint-cache hash table entries: 16777216 ftrace: allocating 20414 entries in 40 pages Total time: 3.198s CPU: Intel Xeon E5-2630, Memory: 2.2T: Before fix: Dentry cache hash table entries: 536870912 Inode-cache hash table entries: 268435456 Mount-cache hash table entries: 8388608 Mountpoint-cache hash table entries: 8388608 CPU: Physical Processor ID: 0 Total time: 3.245s After fix: Dentry cache hash table entries: 536870912 Inode-cache hash table entries: 268435456 Mount-cache hash table entries: 8388608 Mountpoint-cache hash table entries: 8388608 CPU: Physical Processor ID: 0 Total time: 3.244s Link: http://lkml.kernel.org/r/1488432825-92126-4-git-send-email-pasha.tatashin@oracle.comSigned-off-by: NPavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: NBabu Moger <babu.moger@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 7月, 2017 1 次提交
-
-
由 David Howells 提交于
Clean up line terminal whitespace in fs/namespace.c and fs/super.c. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 15 6月, 2017 1 次提交
-
-
由 Andrei Vagin 提交于
Fixes: 4f757f3c ("make sure that mntns_install() doesn't end up with referral for root") Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrei Vagin <avagin@openvz.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 23 5月, 2017 2 次提交
-
-
由 Eric W. Biederman 提交于
While investigating some poor umount performance I realized that in the case of overlapping mount trees where some of the mounts are locked the code has been failing to unmount all of the mounts it should have been unmounting. This failure to unmount all of the necessary mounts can be reproduced with: $ cat locked_mounts_test.sh mount -t tmpfs test-base /mnt mount --make-shared /mnt mkdir -p /mnt/b mount -t tmpfs test1 /mnt/b mount --make-shared /mnt/b mkdir -p /mnt/b/10 mount -t tmpfs test2 /mnt/b/10 mount --make-shared /mnt/b/10 mkdir -p /mnt/b/10/20 mount --rbind /mnt/b /mnt/b/10/20 unshare -Urm --propagation unchaged /bin/sh -c 'sleep 5; if [ $(grep test /proc/self/mountinfo | wc -l) -eq 1 ] ; then echo SUCCESS ; else echo FAILURE ; fi' sleep 1 umount -l /mnt/b wait %% $ unshare -Urm ./locked_mounts_test.sh This failure is corrected by removing the prepass that marks mounts that may be umounted. A first pass is added that umounts mounts if possible and if not sets mount mark if they could be unmounted if they weren't locked and adds them to a list to umount possibilities. This first pass reconsiders the mounts parent if it is on the list of umount possibilities, ensuring that information of umoutability will pass from child to mount parent. A second pass then walks through all mounts that are umounted and processes their children unmounting them or marking them for reparenting. A last pass cleans up the state on the mounts that could not be umounted and if applicable reparents them to their first parent that remained mounted. While a bit longer than the old code this code is much more robust as it allows information to flow up from the leaves and down from the trunk making the order in which mounts are encountered in the umount propgation tree irrelevant. Cc: stable@vger.kernel.org Fixes: 0c56fe31 ("mnt: Don't propagate unmounts to locked mounts") Reviewed-by: NAndrei Vagin <avagin@virtuozzo.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
It was observed that in some pathlogical cases that the current code does not unmount everything it should. After investigation it was determined that the issue is that mnt_change_mntpoint can can change which mounts are available to be unmounted during mount propagation which is wrong. The trivial reproducer is: $ cat ./pathological.sh mount -t tmpfs test-base /mnt cd /mnt mkdir 1 2 1/1 mount --bind 1 1 mount --make-shared 1 mount --bind 1 2 mount --bind 1/1 1/1 mount --bind 1/1 1/1 echo grep test-base /proc/self/mountinfo umount 1/1 echo grep test-base /proc/self/mountinfo $ unshare -Urm ./pathological.sh The expected output looks like: 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 The output without the fix looks like: 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 That last mount in the output was in the propgation tree to be unmounted but was missed because the mnt_change_mountpoint changed it's parent before the walk through the mount propagation tree observed it. Cc: stable@vger.kernel.org Fixes: 1064f874 ("mnt: Tuck mounts under others instead of creating shadow/side mounts.") Acked-by: NAndrei Vagin <avagin@virtuozzo.com> Reviewed-by: NRam Pai <linuxram@us.ibm.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 22 4月, 2017 1 次提交
-
-
由 Al Viro 提交于
new flag: LOOKUP_DOWN. If the starting point is overmounted, cross into whatever's mounted on top, triggering referrals et.al. Use that instead of follow_down_one() loop in mntns_install(), handle errors properly. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 10 4月, 2017 2 次提交
-
-
由 Jan Kara 提交于
Currently we free fsnotify_mark_connector structure only when inode / vfsmount is getting freed. This can however impose noticeable memory overhead when marks get attached to inodes only temporarily. So free the connector structure once the last mark is detached from the object. Since notification infrastructure can be working with the connector under the protection of fsnotify_mark_srcu, we have to be careful and free the fsnotify_mark_connector only after SRCU period passes. Reviewed-by: NMiklos Szeredi <mszeredi@redhat.com> Reviewed-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
Currently notification marks are attached to object (inode or vfsmnt) by a hlist_head in the object. The list is also protected by a spinlock in the object. So while there is any mark attached to the list of marks, the object must be pinned in memory (and thus e.g. last iput() deleting inode cannot happen). Also for list iteration in fsnotify() to work, we must hold fsnotify_mark_srcu lock so that mark itself and mark->obj_list.next cannot get freed. Thus we are required to wait for response to fanotify events from userspace process with fsnotify_mark_srcu lock held. That causes issues when userspace process is buggy and does not reply to some event - basically the whole notification subsystem gets eventually stuck. So to be able to drop fsnotify_mark_srcu lock while waiting for response, we have to pin the mark in memory and make sure it stays in the object list (as removing the mark waiting for response could lead to lost notification events for groups later in the list). However we don't want inode reclaim to block on such mark as that would lead to system just locking up elsewhere. This commit is the first in the series that paves way towards solving these conflicting lifetime needs. Instead of anchoring the list of marks directly in the object, we anchor it in a dedicated structure (fsnotify_mark_connector) and just point to that structure from the object. The following commits will also add spinlock protecting the list and object pointer to the structure. Reviewed-by: NMiklos Szeredi <mszeredi@redhat.com> Reviewed-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 02 3月, 2017 2 次提交
-
-
由 Ingo Molnar 提交于
sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h> Update all usage sites first. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h doing that for them. Note that even if the count where we need to add extra headers seems high, it's still a net win, because <linux/sched.h> is included in over 2,200 files ... Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 2月, 2017 1 次提交
-
-
由 Eric W. Biederman 提交于
Ever since mount propagation was introduced in cases where a mount in propagated to parent mount mountpoint pair that is already in use the code has placed the new mount behind the old mount in the mount hash table. This implementation detail is problematic as it allows creating arbitrary length mount hash chains. Furthermore it invalidates the constraint maintained elsewhere in the mount code that a parent mount and a mountpoint pair will have exactly one mount upon them. Making it hard to deal with and to talk about this special case in the mount code. Modify mount propagation to notice when there is already a mount at the parent mount and mountpoint where a new mount is propagating to and place that preexisting mount on top of the new mount. Modify unmount propagation to notice when a mount that is being unmounted has another mount on top of it (and no other children), and to replace the unmounted mount with the mount on top of it. Move the MNT_UMUONT test from __lookup_mnt_last into __propagate_umount as that is the only call of __lookup_mnt_last where MNT_UMOUNT may be set on any mount visible in the mount hash table. These modifications allow: - __lookup_mnt_last to be removed. - attach_shadows to be renamed __attach_mnt and its shadow handling to be removed. - commit_tree to be simplified - copy_tree to be simplified The result is an easier to understand tree of mounts that does not allow creation of arbitrary length hash chains in the mount hash table. The result is also a very slight userspace visible difference in semantics. The following two cases now behave identically, where before order mattered: case 1: (explicit user action) B is a slave of A mount something on A/a , it will propagate to B/a and than mount something on B/a case 2: (tucked mount) B is a slave of A mount something on B/a and than mount something on A/a Histroically umount A/a would fail in case 1 and succeed in case 2. Now umount A/a succeeds in both configurations. This very small change in semantics appears if anything to be a bug fix to me and my survey of userspace leads me to believe that no programs will notice or care of this subtle semantic change. v2: Updated to mnt_change_mountpoint to not call dput or mntput and instead to decrement the counts directly. It is guaranteed that there will be other references when mnt_change_mountpoint is called so this is safe. v3: Moved put_mountpoint under mount_lock in attach_recursive_mnt As the locking in fs/namespace.c changed between v2 and v3. v4: Reworked the logic in propagate_mount_busy and __propagate_umount that detects when a mount completely covers another mount. v5: Removed unnecessary tests whose result is alwasy true in find_topper and attach_recursive_mnt. v6: Document the user space visible semantic difference. Cc: stable@vger.kernel.org Fixes: b90fa9ae ("[PATCH] shared mount handling: bind and rbind") Tested-by: NAndrei Vagin <avagin@virtuozzo.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 01 2月, 2017 1 次提交
-
-
由 Eric W. Biederman 提交于
To support unprivileged users mounting filesystems two permission checks have to be performed: a test to see if the user allowed to create a mount in the mount namespace, and a test to see if the user is allowed to access the specified filesystem. The automount case is special in that mounting the original filesystem grants permission to mount the sub-filesystems, to any user who happens to stumble across the their mountpoint and satisfies the ordinary filesystem permission checks. Attempting to handle the automount case by using override_creds almost works. It preserves the idea that permission to mount the original filesystem is permission to mount the sub-filesystem. Unfortunately using override_creds messes up the filesystems ordinary permission checks. Solve this by being explicit that a mount is a submount by introducing vfs_submount, and using it where appropriate. vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let sget and friends know that a mount is a submount so they can take appropriate action. sget and sget_userns are modified to not perform any permission checks on submounts. follow_automount is modified to stop using override_creds as that has proven problemantic. do_mount is modified to always remove the new MS_SUBMOUNT flag so that we know userspace will never by able to specify it. autofs4 is modified to stop using current_real_cred that was put in there to handle the previous version of submount permission checking. cifs is modified to pass the mountpoint all of the way down to vfs_submount. debugfs is modified to pass the mountpoint all of the way down to trace_automount by adding a new parameter. To make this change easier a new typedef debugfs_automount_t is introduced to capture the type of the debugfs automount function. Cc: stable@vger.kernel.org Fixes: 069d5ac9 ("autofs: Fix automounts by using current_real_cred()->uid") Fixes: aeaa4a79 ("fs: Call d_automount with the filesystems creds") Reviewed-by: NTrond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: NSeth Forshee <seth.forshee@canonical.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 10 1月, 2017 1 次提交
-
-
由 Eric W. Biederman 提交于
Protecting the mountpoint hashtable with namespace_sem was sufficient until a call to umount_mnt was added to mntput_no_expire. At which point it became possible for multiple calls of put_mountpoint on the same hash chain to happen on the same time. Kristen Johansen <kjlx@templeofstupid.com> reported: > This can cause a panic when simultaneous callers of put_mountpoint > attempt to free the same mountpoint. This occurs because some callers > hold the mount_hash_lock, while others hold the namespace lock. Some > even hold both. > > In this submitter's case, the panic manifested itself as a GP fault in > put_mountpoint() when it called hlist_del() and attempted to dereference > a m_hash.pprev that had been poisioned by another thread. Al Viro observed that the simple fix is to switch from using the namespace_sem to the mount_lock to protect the mountpoint hash table. I have taken Al's suggested patch moved put_mountpoint in pivot_root (instead of taking mount_lock an additional time), and have replaced new_mountpoint with get_mountpoint a function that does the hash table lookup and addition under the mount_lock. The introduction of get_mounptoint ensures that only the mount_lock is needed to manipulate the mountpoint hashtable. d_set_mounted is modified to only set DCACHE_MOUNTED if it is not already set. This allows get_mountpoint to use the setting of DCACHE_MOUNTED to ensure adding a struct mountpoint for a dentry happens exactly once. Cc: stable@vger.kernel.org Fixes: ce07d891 ("mnt: Honor MNT_LOCKED when detaching mounts") Reported-by: NKrister Johansen <kjlx@templeofstupid.com> Suggested-by: NAl Viro <viro@ZenIV.linux.org.uk> Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 17 12月, 2016 3 次提交
-
-
由 Al Viro 提交于
Make sure that clone_mnt() never returns a mount with MNT_SHARED in flags, but without a valid ->mnt_group_id. That allows to demystify do_make_slave() quite a bit, among other things. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
not for CL_PRIVATE clone_mnt() Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Hadn't been true for quite a while Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 06 12月, 2016 2 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Mickaël Salaün 提交于
The function path_is_under() doesn't modify the paths pointed by its arguments but only browse them. Constifying this pointers make a cleaner interface to be used by (future) code which may only have access to const struct path pointers (e.g. LSM hooks). Signed-off-by: NMickaël Salaün <mic@digikod.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 04 12月, 2016 1 次提交
-
-
由 Ian Kent 提交于
d_mountpoint() can only be used reliably to establish if a dentry is not mounted in any namespace. It isn't aware of the possibility there may be multiple mounts using a given dentry that may be in a different namespace. Add helper functions, path_is_mountpoint(), that checks if a struct path is a mountpoint for this case. Link: http://lkml.kernel.org/r/20161011053358.27645.9729.stgit@pluto.themaw.netSigned-off-by: NIan Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 11 10月, 2016 1 次提交
-
-
由 Emese Revfy 提交于
The __latent_entropy gcc attribute can be used only on functions and variables. If it is on a function then the plugin will instrument it for gathering control-flow entropy. If the attribute is on a variable then the plugin will initialize it with random contents. The variable must be an integer, an integer array type or a structure with integer fields. These specific functions have been selected because they are init functions (to help gather boot-time entropy), are called at unpredictable times, or they have variable loops, each of which provide some level of latent entropy. Signed-off-by: NEmese Revfy <re.emese@gmail.com> [kees: expanded commit message] Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 01 10月, 2016 1 次提交
-
-
由 Eric W. Biederman 提交于
CAI Qian <caiqian@redhat.com> pointed out that the semantics of shared subtrees make it possible to create an exponentially increasing number of mounts in a mount namespace. mkdir /tmp/1 /tmp/2 mount --make-rshared / for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done Will create create 2^20 or 1048576 mounts, which is a practical problem as some people have managed to hit this by accident. As such CVE-2016-6213 was assigned. Ian Kent <raven@themaw.net> described the situation for autofs users as follows: > The number of mounts for direct mount maps is usually not very large because of > the way they are implemented, large direct mount maps can have performance > problems. There can be anywhere from a few (likely case a few hundred) to less > than 10000, plus mounts that have been triggered and not yet expired. > > Indirect mounts have one autofs mount at the root plus the number of mounts that > have been triggered and not yet expired. > > The number of autofs indirect map entries can range from a few to the common > case of several thousand and in rare cases up to between 30000 and 50000. I've > not heard of people with maps larger than 50000 entries. > > The larger the number of map entries the greater the possibility for a large > number of active mounts so it's not hard to expect cases of a 1000 or somewhat > more active mounts. So I am setting the default number of mounts allowed per mount namespace at 100,000. This is more than enough for any use case I know of, but small enough to quickly stop an exponential increase in mounts. Which should be perfect to catch misconfigurations and malfunctioning programs. For anyone who needs a higher limit this can be changed by writing to the new /proc/sys/fs/mount-max sysctl. Tested-by: NCAI Qian <caiqian@redhat.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 23 9月, 2016 1 次提交
-
-
由 Andrey Vagin 提交于
Return -EPERM if an owning user namespace is outside of a process current user namespace. v2: In a first version ns_get_owner returned ENOENT for init_user_ns. This special cases was removed from this version. There is nothing outside of init_user_ns, so we can return EPERM. v3: rename ns->get_owner() to ns->owner(). get_* usually means that it grabs a reference. Acked-by: NSerge Hallyn <serge@hallyn.com> Signed-off-by: NAndrei Vagin <avagin@openvz.org> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-