- 05 July 2017, 19 commits
-
-
Submitted by Amir Goldstein

Bad index entries are entries whose name does not match the origin file handle stored in the trusted.overlay.origin xattr. Bad index entries could be a result of a system power off in the middle of copy up. Stale index entries are entries whose origin file handle is stale. Stale index entries could be a result of copying layers or removing lower entries while the overlay is not mounted. The case of copying layers should be detected earlier by the verification of upper root dir origin and index dir origin. Both bad and stale index entries are detected and removed on mount.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Amir Goldstein

When the inodes index feature is enabled, lookup in indexdir for the index entry of a lower real inode or copy up origin inode. The index entry name is the hex representation of the lower inode file handle.

If the index dentry is negative, then either no lower aliases have been copied up yet, or aliases have been copied up in older kernels and are not indexed.

If the index dentry for a copy up origin inode is positive, but points to an inode different than the upper inode, then either the upper inode has been copied up and not indexed, or it was indexed but the index dir has since been cleared. Either way, that index cannot be used to identify the overlay inode.

If a positive dentry that matches the upper inode was found, then it is safe to use the copy up origin st_ino for upper hardlinks, because all indexed upper hardlinks are represented by the same overlay inode as the copy up origin. Set the INDEX type flag on an indexed upper dentry. A non-upper dentry may also have a positive index from copy up of another lower hardlink. This situation will be handled by following patches.

Index lookup is going to be used to prevent breaking hardlinks on copy up.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
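As a rough userspace analogy for what those hex names are derived from, the sketch below fetches a file handle with name_to_handle_at(2) and prints it in hex. This is an illustration only: the real index names are built from overlayfs' own file handle encoding (with its own header), so the exact bytes differ.

    /* Userspace analogy: print a file's handle as hex, roughly the way the
     * index entry names are derived from the lower inode's file handle.
     * Illustration only - overlayfs uses its own handle encoding. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
    	struct file_handle *fh;
    	int mount_id;
    	unsigned int i;

    	if (argc != 2) {
    		fprintf(stderr, "usage: %s <path>\n", argv[0]);
    		return 1;
    	}

    	fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    	if (!fh)
    		return 1;
    	fh->handle_bytes = MAX_HANDLE_SZ;

    	if (name_to_handle_at(AT_FDCWD, argv[1], fh, &mount_id, 0) < 0) {
    		perror("name_to_handle_at");
    		return 1;
    	}

    	for (i = 0; i < fh->handle_bytes; i++)
    		printf("%02x", fh->f_handle[i]);
    	printf("\n");
    	return 0;
    }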
-
Submitted by Amir Goldstein

An index dir contains persistent hardlinks to files in upper dir. Therefore, we must never mount an existing index dir with a different upper dir. Store the upper root dir file handle in the index dir inode when the index dir is created and verify the file handle before using an existing index dir on mount. Add an 'is_upper' flag to the overlay file handle encoding and set it when encoding the upper root file handle. This is not critical for index dir verification, but it is good practice towards a standard overlayfs file handle format for NFS export.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Amir Goldstein

When the inodes index feature is enabled, verify that the file handle stored in the upper root dir matches the lower root dir, or fail to mount. If the upper root dir has no stored file handle, encode and store the lower root dir file handle in the overlay.origin xattr.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Amir Goldstein

Create the index dir on mount. The index dir will contain hardlinks to upper inodes, named after the hex representation of their origin lower inodes. The index dir is going to be used to prevent breaking lower hardlinks on copy up and to implement overlayfs NFS export. Because the feature is not fully backward compatible, enabling it is opt-in via a config/module/mount option.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
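A minimal userspace sketch of turning the opt-in feature on at mount time, assuming the mount option ends up being spelled "index=on" (all paths below are placeholders):

    /* Sketch: request the inodes index feature when mounting an overlay.
     * Assumes the option is spelled "index=on"; paths are placeholders. */
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
    	const char *opts = "lowerdir=/lower,upperdir=/upper,"
    			   "workdir=/work,index=on";

    	if (mount("overlay", "/merged", "overlay", 0, opts) < 0) {
    		perror("mount");
    		return 1;
    	}
    	printf("overlay mounted with index dir enabled\n");
    	return 0;
    }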
-
Submitted by Amir Goldstein

Pass in the subdir name to create and specify whether the subdir is persistent or should be cleaned up on every mount. Move the fallback to read-only mount on failure to create the dir, and the printing of the error message, into the helper. This function is going to be used for creating the persistent 'index' dir under workbasedir.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Amir Goldstein

For the case of all layers not being on the same fs, try to decode the copy up origin file handle on any of the lower layers.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Amir Goldstein

Bad things can happen if several concurrent overlay mounts try to use the same upperdir/workdir path. Try to get the 'inuse' advisory lock on upperdir and workdir. Fail the mount if another overlay mount instance or another user holds the 'inuse' lock on these directories. Note that this provides no protection for concurrent overlay mounts that use overlapping (i.e. descendant) upper/work dirs.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Amir Goldstein

Add an i_state flag I_INUSE and helpers to set/clear/test the bit. The 'inuse' lock is an 'advisory' inode lock that can be used to extend exclusive create protection beyond the parent->i_mutex lock among cooperating users. This is going to be used by overlayfs to get exclusive ownership of upper and work dirs among overlayfs mounts.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
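A toy userspace sketch of the idea: an advisory "in use" bit behind a lock, with trylock/unlock helpers. The kernel version operates on inode->i_state under inode->i_lock; all names and values below are stand-ins, not the actual kernel interface.

    /* Illustrative sketch of an advisory "inuse" trylock on a shared object.
     * The real code sets a bit in inode->i_state under inode->i_lock;
     * everything here (names, types, flag value) is a userspace stand-in. */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct fake_inode {
    	pthread_mutex_t lock;	/* stands in for inode->i_lock */
    	unsigned long state;	/* stands in for inode->i_state */
    };

    #define I_INUSE (1UL << 0)	/* illustrative flag value */

    static bool inode_inuse_trylock(struct fake_inode *inode)
    {
    	bool locked = false;

    	pthread_mutex_lock(&inode->lock);
    	if (!(inode->state & I_INUSE)) {
    		inode->state |= I_INUSE;
    		locked = true;
    	}
    	pthread_mutex_unlock(&inode->lock);
    	return locked;
    }

    static void inode_inuse_unlock(struct fake_inode *inode)
    {
    	pthread_mutex_lock(&inode->lock);
    	inode->state &= ~I_INUSE;
    	pthread_mutex_unlock(&inode->lock);
    }

    int main(void)
    {
    	struct fake_inode upperdir = { .lock = PTHREAD_MUTEX_INITIALIZER };

    	/* First "mount" wins the lock, a second one is refused. */
    	printf("first mount:  %s\n",
    	       inode_inuse_trylock(&upperdir) ? "got inuse lock" : "busy");
    	printf("second mount: %s\n",
    	       inode_inuse_trylock(&upperdir) ? "got inuse lock" : "busy");
    	inode_inuse_unlock(&upperdir);
    	return 0;
    }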
-
Submitted by Miklos Szeredi

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Amir Goldstein

Use the new ovl_inode mutex to synchronize concurrent copy up instead of the super block copy up workqueue. Moving the synchronization object from the overlay dentry to the overlay inode is needed for synchronizing concurrent copy up of lower hardlinks to the same upper inode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Miklos Szeredi

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Miklos Szeredi

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Miklos Szeredi

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Miklos Szeredi

When checking for consistency in directory operations (unlink, rename, etc.), match inodes, not dentries.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Miklos Szeredi

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Miklos Szeredi

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Amir Goldstein

We need some more space to store overlay inode data in memory, so allocate overlay inodes from a slab of struct ovl_inode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
Submitted by Amir Goldstein

This patch fixes an overlay inode nlink leak in the case where ovl_rename() renames over a non-dir. This is not critical, because the overlay inode doesn't rely on nlink dropping to zero for inode deletion.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
- 28 June 2017, 2 commits
-
-
Submitted by Miklos Szeredi

When copying up a file that has multiple hard links, we need to break any association with the origin file. This makes copy-up essentially an atomic replace. The new file has nothing to do with the old one (except having the same data and metadata initially), so don't set the overlay.origin attribute. We can relax this in the future when we are able to index upper objects by origin.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 3a1e819b ("ovl: store file handle of lower inode on copy up")
-
Submitted by Miklos Szeredi

Nothing prevents mischief on the upper layer while we are busy copying up the data. Move the lookup right before the looked up dentry is actually used.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 01ad3eb8 ("ovl: concurrent copy up of regular files")
Cc: <stable@vger.kernel.org> # v4.11
-
- 24 June 2017, 4 commits
-
-
Submitted by Kees Cook

When limiting the argv/envp strings during exec to 1/4 of the stack limit, the storage of the pointers to the strings was not included. This means that an exec with a huge number of tiny strings could eat 1/4 of the stack limit in strings, and then additional space would later be used by the pointers to the strings.

For example, on 32-bit with an 8MB stack rlimit, an exec with 1677721 single-byte strings would consume less than 2MB of stack, the max (8MB / 4) amount allowed, but the pointers to the strings would consume the remaining additional stack space (1677721 * 4 == 6710884). The result (1677721 + 6710884 == 8388605) would exhaust stack space entirely. Controlling this stack exhaustion could result in pathological behavior in setuid binaries (CVE-2017-1000365).

[akpm@linux-foundation.org: additional commenting from Kees]
Fixes: b6a2fea3 ("mm: variable length argument support")
Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Qualys Security Advisory <qsa@qualys.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
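The arithmetic from the example above is easy to re-check; the snippet below simply redoes it with the numbers quoted in the commit message (32-bit pointers assumed):

    /* Recheck the 32-bit example: 1-byte strings stay under the 8MB/4 cap,
     * but the unaccounted argv pointers eat the rest of the stack. */
    #include <stdio.h>

    int main(void)
    {
    	unsigned long rlim_stack = 8UL << 20;	/* 8MB stack rlimit */
    	unsigned long cap = rlim_stack / 4;	/* old limit: strings only */
    	unsigned long nstrings = 1677721;	/* single-byte strings */

    	unsigned long string_bytes = nstrings * 1;	/* under the cap */
    	unsigned long pointer_bytes = nstrings * 4;	/* 32-bit pointers */

    	printf("strings : %lu bytes (cap %lu)\n", string_bytes, cap);
    	printf("pointers: %lu bytes (unaccounted)\n", pointer_bytes);
    	printf("total   : %lu of %lu stack bytes\n",
    	       string_bytes + pointer_bytes, rlim_stack);
    	return 0;
    }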
-
Submitted by Eric Ren

Another deadlock path caused by recursive locking has been reported. This kind of issue was introduced by commit 743b5f14 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths have already been fixed by commit b891fa50 ("ocfs2: fix deadlock issue when taking inode lock at vfs entry points"). Yes, we intend to fix this kind of case in an incremental way, because it's hard to find out all possible paths at once.

This one can be reproduced like this: on node1, cp a large file from the home directory to the ocfs2 mountpoint; meanwhile on node2, run setfacl/getfacl. Both nodes will hang up there. The backtraces:

On node1:
__ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
ocfs2_write_begin+0x43/0x1a0 [ocfs2]
generic_perform_write+0xa9/0x180
__generic_file_write_iter+0x1aa/0x1d0
ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
__vfs_write+0xc3/0x130
vfs_write+0xb1/0x1a0
SyS_write+0x46/0xa0

On node2:
__ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
ocfs2_set_acl+0x22d/0x260 [ocfs2]
ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
set_posix_acl+0x75/0xb0
posix_acl_xattr_set+0x49/0xa0
__vfs_setxattr+0x69/0x80
__vfs_setxattr_noperm+0x72/0x1a0
vfs_setxattr+0xa7/0xb0
setxattr+0x12d/0x190
path_setxattr+0x9f/0xb0
SyS_setxattr+0x14/0x20

Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is exported by commit 439a36b8 ("ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock").

Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com
Fixes: 743b5f14 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
Signed-off-by: Eric Ren <zren@suse.com>
Reported-by: Thomas Voegtle <tv@lio96.de>
Tested-by: Thomas Voegtle <tv@lio96.de>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Jan Kara

dax_writeback_mapping_range() fails to update the iteration index when searching the radix tree for entries needing cache flushing. Thus each pagevec worth of entries is searched starting from the start, which is inefficient and prone to livelocks. Update the index properly.

Link: http://lkml.kernel.org/r/20170619124531.21491-1-jack@suse.cz
Fixes: 9973c98e ("dax: add support for fsync/sync")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
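A simplified sketch of the loop pattern at issue (a toy model, not the actual DAX code): if the search start index is not advanced past the entries just processed, every batch restarts the scan from the beginning.

    #include <stdio.h>

    #define NR_DIRTY 10
    #define BATCH 4

    /* Toy model: return up to BATCH dirty indices at or after 'start'. */
    static int find_dirty(int start, int *out)
    {
    	int found = 0;

    	while (start < NR_DIRTY && found < BATCH)
    		out[found++] = start++;
    	return found;
    }

    int main(void)
    {
    	int start = 0, batch[BATCH], n, i;

    	while ((n = find_dirty(start, batch)) > 0) {
    		for (i = 0; i < n; i++)
    			printf("flush entry %d\n", batch[i]);
    		/*
    		 * The fix: advance the search index past the last entry
    		 * just processed.  Without this, every iteration rescans
    		 * from the same start -- inefficient at best, a livelock
    		 * if new dirty entries keep appearing below it.
    		 */
    		start = batch[n - 1] + 1;
    	}
    	return 0;
    }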
-
Submitted by NeilBrown

If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL ioctl, autofs4_d_automount() will return ERR_PTR(status) with that status to follow_automount(), which will then dereference an invalid pointer. So treat a positive status the same as zero, and map it to ENOENT. See the comment in systemd src/core/automount.c::automount_send_ready().

Link: http://lkml.kernel.org/r/871sqwczx5.fsf@notabene.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: Ian Kent <raven@themaw.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
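Why a positive status is dangerous can be seen from the ERR_PTR()/IS_ERR() round trip; the macros are re-declared below purely for illustration (in the kernel they live in include/linux/err.h):

    /* Only values in [-MAX_ERRNO, -1] are recognised by IS_ERR(), so a
     * positive status comes back as an ordinary-looking -- but bogus --
     * pointer that callers will happily dereference. */
    #include <stdio.h>

    #define MAX_ERRNO	4095
    #define ERR_PTR(x)	((void *)(long)(x))
    #define IS_ERR(p)	((unsigned long)(p) >= (unsigned long)-MAX_ERRNO)

    int main(void)
    {
    	void *ok = ERR_PTR(-2);		/* -ENOENT: caught by IS_ERR() */
    	void *bad = ERR_PTR(2);		/* positive status: not caught */

    	printf("ERR_PTR(-2): %p IS_ERR=%d\n", ok, IS_ERR(ok));
    	printf("ERR_PTR(2):  %p IS_ERR=%d\n", bad, IS_ERR(bad));
    	return 0;
    }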
-
- 22 June 2017, 1 commit
-
-
Submitted by Darrick J. Wong

bmap returns a dumb LBA address but not the block device that goes with that LBA. Swapfiles don't care about this and will blindly assume that the data volume is the correct blockdev, which is totally bogus for files on the rt subvolume. This results in the swap code doing IOs to arbitrary locations on the data device(!) if the passed-in mapping is a realtime file, so just turn off bmap for rt files.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
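The interface in question is what the FIBMAP ioctl exposes to userspace: a bare block number with no hint of which device it belongs to. The toy below (needs root / CAP_SYS_RAWIO) makes that limitation visible, which is exactly why it is unsafe for realtime files.

    /* Query the first block of a file via FIBMAP: the answer is just a
     * block number on *some* device -- nothing says whether it is the data
     * device or the realtime device. */
    #include <fcntl.h>
    #include <linux/fs.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
    	int block = 0;		/* in: file block 0, out: disk block */
    	int fd;

    	if (argc != 2) {
    		fprintf(stderr, "usage: %s <file>\n", argv[0]);
    		return 1;
    	}
    	fd = open(argv[1], O_RDONLY);
    	if (fd < 0) {
    		perror("open");
    		return 1;
    	}
    	if (ioctl(fd, FIBMAP, &block) < 0) {
    		perror("ioctl(FIBMAP)");
    		close(fd);
    		return 1;
    	}
    	printf("file block 0 is at device block %d (device unknown!)\n", block);
    	close(fd);
    	return 0;
    }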
-
- 21 June 2017, 5 commits
-
-
Submitted by Christophe Jaillet

'rc' is known to be 0 at this point. So if 'init_sg' or 'kzalloc' fails, we should return -ENOMEM instead. Also remove a useless 'rc' in a debug message, as it is meaningless here.

Fixes: 026e93dc ("CIFS: Encrypt SMB3 requests before sending")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
-
Submitted by Colin Ian King

There is a redundant return in function cifs_creation_time_get that appears to be old vestigial code that can be removed. So remove it.

Detected by CoverityScan, CID#1361924 ("Structurally dead code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <smfrench@gmail.com>
-
Submitted by Pavel Shilovsky

Downgrade the loglevel for SMB2 to prevent filling the log with messages if e.g. readdir was interrupted. Also make the SMB2 and SMB1 codepaths do the same logging during readdir.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
-
Submitted by Colin Ian King

pages is being allocated, however a null check on bv is being used to see if the allocation failed. Fix this by checking if pages is null.

Detected by CoverityScan, CID#1432974 ("Logically dead code")

Fixes: ccf7f408 ("CIFS: Add asynchronous context to support kernel AIO")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
-
Submitted by Dan Carpenter

The current code causes a static checker warning because ITER_IOVEC is zero, so the condition is never true.

Fixes: 6685c5e2 ("CIFS: Add asynchronous read support through kernel AIO")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <smfrench@gmail.com>
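The underlying pitfall is generic: checking an enumerated "type" with a bitwise AND silently fails when one of the type values is zero. The constants below are made up to mirror the shape of the problem, not the cifs code itself.

    /* ITER_IOVEC-style pitfall with made-up constants: when a "type" value
     * is 0, `flags & TYPE` can never be true; compare the masked type for
     * equality instead. */
    #include <stdbool.h>
    #include <stdio.h>

    #define ITER_TYPE_IOVEC  0	/* the zero-valued type */
    #define ITER_TYPE_KVEC   1
    #define ITER_TYPE_BVEC   2
    #define ITER_TYPE_MASK   3

    static bool is_iovec_buggy(unsigned int flags)
    {
    	return flags & ITER_TYPE_IOVEC;	/* always false: AND with 0 */
    }

    static bool is_iovec_fixed(unsigned int flags)
    {
    	return (flags & ITER_TYPE_MASK) == ITER_TYPE_IOVEC;
    }

    int main(void)
    {
    	unsigned int flags = ITER_TYPE_IOVEC | 0x100;	/* other bits set */

    	printf("buggy check: %d\n", is_iovec_buggy(flags));
    	printf("fixed check: %d\n", is_iovec_fixed(flags));
    	return 0;
    }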
-
- 19 June 2017, 1 commit
-
-
Submitted by Hugh Dickins

The stack guard page is a useful feature to reduce the risk of the stack smashing into a different mapping. We have been using a single page gap, which is sufficient to prevent having the stack adjacent to a different mapping. But this seems to be insufficient in the light of the stack usage in userspace. E.g. glibc uses alloca() as large as 64kB in many commonly used functions. Others use constructs like gid_t buffer[NGROUPS_MAX], which is 256kB, or stack strings with MAX_ARG_STRLEN.

This will become especially dangerous for suid binaries and the default no-limit stack size limit, because those applications can be tricked into consuming a large portion of the stack and a single glibc call could jump over the guard page. These attacks are not theoretical, unfortunately.

Make those attacks less probable by increasing the stack guard gap to 1MB (on systems with 4k pages; but make it depend on the page size, because systems with larger base pages might cap stack allocations in PAGE_SIZE units), which should cover larger alloca() and VLA stack allocations. It is obviously not a full fix because the problem is somehow inherent, but it should reduce the attack space a lot.

One could argue that the gap size should be configurable from userspace, but that can be done later when somebody finds that the new 1MB is wrong for some special case applications. For now, add a kernel command line option (stack_guard_gap) to specify the stack gap size (in page units).

Implementation wise, first delete all the old code for the stack guard page: because although we could get away with accounting one extra page in a stack vma, accounting a larger gap can break userspace - case in point, a program run with "ulimit -S -v 20000" failed when the 1MB gap was counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK and strict non-overcommit mode.

Instead of keeping the gap inside the stack vma, maintain the stack guard gap as a gap between vmas: using vm_start_gap() in place of vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few places which need to respect the gap - mainly arch_get_unmapped_area(), and the vma tree's subtree_gap support for that.

Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
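Reading the 1MB-on-4K-pages wording as a default of 256 pages (an assumption; the stack_guard_gap= parameter is given in page units), the gap scales with the base page size like this:

    /* Assumption: default gap of 256 pages, i.e. 1MB with 4K pages,
     * scaling with the base page size. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
    	long page = sysconf(_SC_PAGESIZE);
    	long gap_pages = 256;

    	printf("page size %ld bytes -> guard gap %ld KB\n",
    	       page, gap_pages * page / 1024);
    	return 0;
    }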
-
- 18 June 2017, 3 commits
-
-
Submitted by Al Viro

* The original hysteresis loop got broken by a typo back in 2002; now it never switches out of the OPTTIME state. Fixed.
* The critical levels for switching from OPTTIME to OPTSPACE and back ought to be calculated once, at mount time.
* We should use mul_u64_u32_div() for those calculations, now that ->s_dsize is 64bit.
* To quote Kirk McKusick (in a 1995 FreeBSD commit message): The threshold for switching from time-space and space-time is too small when minfree is 5%... so make it stay at space in this case.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Submitted by Al Viro

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Submitted by Al Viro

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 17 June 2017, 1 commit
-
-
Submitted by Andrea Arcangeli

Anon and hugetlbfs handle FOLL_DUMP set by get_dump_page() internally to __get_user_pages(). shmem, by contrast, has no special FOLL_DUMP handling there, so handle_mm_fault() is invoked without mmap_sem and ends up calling handle_userfault(), which isn't expecting to be invoked without mmap_sem held.

This makes handle_userfault() fail immediately if invoked through shmem_vm_ops->fault during coredumping and solves the problem. The side effect is a BUG_ON with no lock held triggered by the coredumping process which exits.

Only 4.11 is affected; pre-4.11 anon memory holes are skipped in __get_user_pages by checking FOLL_DUMP explicitly against empty pagetables (mm/gup.c:no_page_table()). It's zero cost as we already had a check for current->flags to prevent futex from triggering userfaults during exit (PF_EXITING).

Link: http://lkml.kernel.org/r/20170615214838.27429-1-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: <stable@vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 16 June 2017, 1 commit
-
-
Submitted by Christoph Hellwig

Fixes: 793b80ef ("vfs: pass a flags argument to vfs_readv/vfs_writev")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 June 2017, 3 commits
-
-
Submitted by Andrei Vagin

Fixes: 4f757f3c ("make sure that mntns_install() doesn't end up with referral for root")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Submitted by Al Viro

It's not hard to trigger a bunch of d_invalidate() calls on the same dentry in parallel. They end up fighting each other - any dentry picked for removal by one will be skipped by the rest, and we'll go for the next iteration through the entire subtree, even if everything is being skipped. Moreover, we immediately go back to scanning the subtree. The only thing we really need is to dissolve all mounts in the subtree, and as soon as we've nothing left to do, we can just unhash the dentry and bugger off.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Submitted by Al Viro

The logic for deciding whether we need to do anything with direct blocks is broken when the new size is within the last direct block. It's better to find the path to the last byte _not_ to be removed and use that, instead of the path to the beginning of the first block to be freed...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-