- 05 7月, 2017 2 次提交
-
-
由 Miklos Szeredi 提交于
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 29 5月, 2017 1 次提交
-
-
由 Miklos Szeredi 提交于
Filesystems filter out extended attributes in the "trusted." domain for unprivlieged callers. Overlay calls underlying filesystem's method with elevated privs, so need to do the filtering in overlayfs too. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 05 5月, 2017 2 次提交
-
-
由 Miklos Szeredi 提交于
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Amir Goldstein 提交于
When all layers are on the same underlying filesystem, let stat(2) return st_dev/st_ino values of the copy up origin inode if it is known. This results in constant st_ino/st_dev representation of files in an overlay mount before and after copy up. When the underlying filesystem support NFS exportfs, the result is also persistent st_ino/st_dev representation before and after mount cycle. Lower hardlinks are broken on copy up to different upper files, so we cannot use the lower origin st_ino for those different files, even for the same fs case. When all overlay layers are on the same fs, use overlay st_dev for non-dirs to get the correct result from du -x. Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 08 3月, 2017 1 次提交
-
-
由 Amir Goldstein 提交于
An overlayfs instance can be the lower layer of another overlayfs instance. This setup triggers a lockdep splat of possible recursive locking of sb->s_type->i_mutex_key in iterate_dir(). Trimmed snip: [ INFO: possible recursive locking detected ] bash/2468 is trying to acquire lock: &sb->s_type->i_mutex_key#14, at: iterate_dir+0x7d/0x15c but task is already holding lock: &sb->s_type->i_mutex_key#14, at: iterate_dir+0x7d/0x15c One problem observed with this splat is that ovl_new_inode() does not call lockdep_annotate_inode_mutex_key() to annotate the dir inode lock as &sb->s_type->i_mutex_dir_key like other fs do. The other problem is that the 2 nested levels of overlayfs inode lock are annotated using the same key, which is the cause of the false positive lockdep warning. Fix this by annotating overlayfs inode lock in ovl_fill_inode() according to stack level of the super block instance and use different key for dir vs. non-dir like other fs do. Here is an edited snip from /proc/lockdep_chains after iterate_dir() of nested overlayfs: [...] &ovl_i_mutex_dir_key[depth] (stack_depth=2) [...] &ovl_i_mutex_dir_key[depth]#2 (stack_depth=1) [...] &type->i_mutex_dir_key (stack_depth=0) Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 03 3月, 2017 1 次提交
-
-
由 David Howells 提交于
Add a system call to make extended file information available, including file creation and some attribute flags where available through the underlying filesystem. The getattr inode operation is altered to take two additional arguments: a u32 request_mask and an unsigned int flags that indicate the synchronisation mode. This change is propagated to the vfs_getattr*() function. Functions like vfs_stat() are now inline wrappers around new functions vfs_statx() and vfs_statx_fd() to reduce stack usage. ======== OVERVIEW ======== The idea was initially proposed as a set of xattrs that could be retrieved with getxattr(), but the general preference proved to be for a new syscall with an extended stat structure. A number of requests were gathered for features to be included. The following have been included: (1) Make the fields a consistent size on all arches and make them large. (2) Spare space, request flags and information flags are provided for future expansion. (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an __s64). (4) Creation time: The SMB protocol carries the creation time, which could be exported by Samba, which will in turn help CIFS make use of FS-Cache as that can be used for coherency data (stx_btime). This is also specified in NFSv4 as a recommended attribute and could be exported by NFSD [Steve French]. (5) Lightweight stat: Ask for just those details of interest, and allow a netfs (such as NFS) to approximate anything not of interest, possibly without going to the server [Trond Myklebust, Ulrich Drepper, Andreas Dilger] (AT_STATX_DONT_SYNC). (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks its cached attributes are up to date [Trond Myklebust] (AT_STATX_FORCE_SYNC). And the following have been left out for future extension: (7) Data version number: Could be used by userspace NFS servers [Aneesh Kumar]. Can also be used to modify fill_post_wcc() in NFSD which retrieves i_version directly, but has just called vfs_getattr(). It could get it from the kstat struct if it used vfs_xgetattr() instead. (There's disagreement on the exact semantics of a single field, since not all filesystems do this the same way). (8) BSD stat compatibility: Including more fields from the BSD stat such as creation time (st_btime) and inode generation number (st_gen) [Jeremy Allison, Bernd Schubert]. (9) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd Schubert]. (This was asked for but later deemed unnecessary with the open-by-handle capability available and caused disagreement as to whether it's a security hole or not). (10) Extra coherency data may be useful in making backups [Andreas Dilger]. (No particular data were offered, but things like last backup timestamp, the data version number and the DOS archive bit would come into this category). (11) Allow the filesystem to indicate what it can/cannot provide: A filesystem can now say it doesn't support a standard stat feature if that isn't available, so if, for instance, inode numbers or UIDs don't exist or are fabricated locally... (This requires a separate system call - I have an fsinfo() call idea for this). (12) Store a 16-byte volume ID in the superblock that can be returned in struct xstat [Steve French]. (Deferred to fsinfo). (13) Include granularity fields in the time data to indicate the granularity of each of the times (NFSv4 time_delta) [Steve French]. (Deferred to fsinfo). (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags. Note that the Linux IOC flags are a mess and filesystems such as Ext4 define flags that aren't in linux/fs.h, so translation in the kernel may be a necessity (or, possibly, we provide the filesystem type too). (Some attributes are made available in stx_attributes, but the general feeling was that the IOC flags were to ext[234]-specific and shouldn't be exposed through statx this way). (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer, Michael Kerrisk]. (Deferred, probably to fsinfo. Finding out if there's an ACL or seclabal might require extra filesystem operations). (16) Femtosecond-resolution timestamps [Dave Chinner]. (A __reserved field has been left in the statx_timestamp struct for this - if there proves to be a need). (17) A set multiple attributes syscall to go with this. =============== NEW SYSTEM CALL =============== The new system call is: int ret = statx(int dfd, const char *filename, unsigned int flags, unsigned int mask, struct statx *buffer); The dfd, filename and flags parameters indicate the file to query, in a similar way to fstatat(). There is no equivalent of lstat() as that can be emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is also no equivalent of fstat() as that can be emulated by passing a NULL filename to statx() with the fd of interest in dfd. Whether or not statx() synchronises the attributes with the backing store can be controlled by OR'ing a value into the flags argument (this typically only affects network filesystems): (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this respect. (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise its attributes with the server - which might require data writeback to occur to get the timestamps correct. (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a network filesystem. The resulting values should be considered approximate. mask is a bitmask indicating the fields in struct statx that are of interest to the caller. The user should set this to STATX_BASIC_STATS to get the basic set returned by stat(). It should be noted that asking for more information may entail extra I/O operations. buffer points to the destination for the data. This must be 256 bytes in size. ====================== MAIN ATTRIBUTES RECORD ====================== The following structures are defined in which to return the main attribute set: struct statx_timestamp { __s64 tv_sec; __s32 tv_nsec; __s32 __reserved; }; struct statx { __u32 stx_mask; __u32 stx_blksize; __u64 stx_attributes; __u32 stx_nlink; __u32 stx_uid; __u32 stx_gid; __u16 stx_mode; __u16 __spare0[1]; __u64 stx_ino; __u64 stx_size; __u64 stx_blocks; __u64 __spare1[1]; struct statx_timestamp stx_atime; struct statx_timestamp stx_btime; struct statx_timestamp stx_ctime; struct statx_timestamp stx_mtime; __u32 stx_rdev_major; __u32 stx_rdev_minor; __u32 stx_dev_major; __u32 stx_dev_minor; __u64 __spare2[14]; }; The defined bits in request_mask and stx_mask are: STATX_TYPE Want/got stx_mode & S_IFMT STATX_MODE Want/got stx_mode & ~S_IFMT STATX_NLINK Want/got stx_nlink STATX_UID Want/got stx_uid STATX_GID Want/got stx_gid STATX_ATIME Want/got stx_atime{,_ns} STATX_MTIME Want/got stx_mtime{,_ns} STATX_CTIME Want/got stx_ctime{,_ns} STATX_INO Want/got stx_ino STATX_SIZE Want/got stx_size STATX_BLOCKS Want/got stx_blocks STATX_BASIC_STATS [The stuff in the normal stat struct] STATX_BTIME Want/got stx_btime{,_ns} STATX_ALL [All currently available stuff] stx_btime is the file creation time, stx_mask is a bitmask indicating the data provided and __spares*[] are where as-yet undefined fields can be placed. Time fields are structures with separate seconds and nanoseconds fields plus a reserved field in case we want to add even finer resolution. Note that times will be negative if before 1970; in such a case, the nanosecond fields will also be negative if not zero. The bits defined in the stx_attributes field convey information about a file, how it is accessed, where it is and what it does. The following attributes map to FS_*_FL flags and are the same numerical value: STATX_ATTR_COMPRESSED File is compressed by the fs STATX_ATTR_IMMUTABLE File is marked immutable STATX_ATTR_APPEND File is append-only STATX_ATTR_NODUMP File is not to be dumped STATX_ATTR_ENCRYPTED File requires key to decrypt in fs Within the kernel, the supported flags are listed by: KSTAT_ATTR_FS_IOC_FLAGS [Are any other IOC flags of sufficient general interest to be exposed through this interface?] New flags include: STATX_ATTR_AUTOMOUNT Object is an automount trigger These are for the use of GUI tools that might want to mark files specially, depending on what they are. Fields in struct statx come in a number of classes: (0) stx_dev_*, stx_blksize. These are local system information and are always available. (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino, stx_size, stx_blocks. These will be returned whether the caller asks for them or not. The corresponding bits in stx_mask will be set to indicate whether they actually have valid values. If the caller didn't ask for them, then they may be approximated. For example, NFS won't waste any time updating them from the server, unless as a byproduct of updating something requested. If the values don't actually exist for the underlying object (such as UID or GID on a DOS file), then the bit won't be set in the stx_mask, even if the caller asked for the value. In such a case, the returned value will be a fabrication. Note that there are instances where the type might not be valid, for instance Windows reparse points. (2) stx_rdev_*. This will be set only if stx_mode indicates we're looking at a blockdev or a chardev, otherwise will be 0. (3) stx_btime. Similar to (1), except this will be set to 0 if it doesn't exist. ======= TESTING ======= The following test program can be used to test the statx system call: samples/statx/test-statx.c Just compile and run, passing it paths to the files you want to examine. The file is built automatically if CONFIG_SAMPLES is enabled. Here's some example output. Firstly, an NFS directory that crosses to another FSID. Note that the AUTOMOUNT attribute is set because transiting this directory will cause d_automount to be invoked by the VFS. [root@andromeda ~]# /tmp/test-statx -A /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:26 Inode: 1703937 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------) Secondly, the result of automounting on that directory. [root@andromeda ~]# /tmp/test-statx /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:27 Inode: 2 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 02 3月, 2017 1 次提交
-
-
由 Ingo Molnar 提交于
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h doing that for them. Note that even if the count where we need to add extra headers seems high, it's still a net win, because <linux/sched.h> is included in over 2,200 files ... Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 12月, 2016 3 次提交
-
-
由 Amir Goldstein 提交于
This removes code duplication. Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
No sense in opening special files on the underlying layers, they work just as well if opened on the overlay. Side effect is that it's no longer possible to connect one side of a pipe opened on overlayfs with the other side opened on the underlying layer. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
This reverts commit 03bea604. Commit 4d0c5ba2 ("vfs: do get_write_access() on upper layer of overlayfs") makes the writecount checks inside overlayfs superfluous, the file is already copied up and write access acquired on the upper inode when ovl_setattr is called with ATTR_SIZE. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 09 12月, 2016 1 次提交
-
-
由 Miklos Szeredi 提交于
If .readlink == NULL implies generic_readlink(). Generated by: to_del="\.readlink.*=.*generic_readlink" for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; done Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 31 10月, 2016 1 次提交
-
-
由 Miklos Szeredi 提交于
tmpfs doesn't have ->get_acl() because it only uses cached acls. This fixes the acl tests in pjdfstest when tmpfs is used as the upper layer of the overlay. Reported-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 39a25b2b ("ovl: define ->get_acl() for overlay inodes") Cc: <stable@vger.kernel.org> # v4.8
-
- 14 10月, 2016 2 次提交
-
-
由 Miklos Szeredi 提交于
Resulting in a complete removal of a function basically implementing the inverse of vfs_readlink(). As a bonus, now the proper security hook is also called. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
All filesystems that are backers for overlayfs would also use generic_readlink(). Move this logic to the overlay itself, which is a nice cleanup. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 08 10月, 2016 1 次提交
-
-
由 Andreas Gruenbacher 提交于
These inode operations are no longer used; remove them. Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 22 9月, 2016 1 次提交
-
-
由 Jan Kara 提交于
inode_change_ok() will be resposible for clearing capabilities and IMA extended attributes and as such will need dentry. Give it as an argument to inode_change_ok() instead of an inode. Also rename inode_change_ok() to setattr_prepare() to better relect that it does also some modifications in addition to checks. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 19 9月, 2016 1 次提交
-
-
由 Vivek Goyal 提交于
Now, we have the notion that copy up of a file is done with the creds of mounter of overlay filesystem (as opposed to task). Right now before we switch creds, we do some vfs_getattr() operations in the context of task and that itself can fail. We should do that getattr() using the creds of mounter instead. So this patch switches to mounter's creds early during copy up process so that even vfs_getattr() is done with mounter's creds. Do not call revert_creds() unless we have already called ovl_override_creds(). [Reported by Arnd Bergmann] Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 01 9月, 2016 6 次提交
-
-
由 Miklos Szeredi 提交于
Be defensive about what underlying fs provides us in the returned xattr list buffer. If it's not properly null terminated, bail out with a warning insead of BUG. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
-
由 Andreas Gruenbacher 提交于
Now that overlayfs has xattr handlers for iop->{set,remove}xattr, use those same handlers for iop->getxattr as well. Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Andreas Gruenbacher 提交于
Commit d837a49b ("ovl: fix POSIX ACL setting") switches from iop->setxattr from ovl_setxattr to generic_setxattr, so switch from ovl_removexattr to generic_removexattr as well. As far as permission checking goes, the same rules should apply in either case. While doing that, rename ovl_setxattr to ovl_xattr_set to indicate that this is not an iop->setxattr implementation and remove the unused inode argument. Move ovl_other_xattr_set above ovl_own_xattr_set so that they match the order of handlers in ovl_xattr_handlers. Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com> Fixes: d837a49b ("ovl: fix POSIX ACL setting") Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Andreas Gruenbacher 提交于
Make sure ovl_own_xattr_handler only matches attribute names starting with "overlay.", not "overlayXXX". Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com> Fixes: d837a49b ("ovl: fix POSIX ACL setting") Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
Some operations (setxattr/chmod) can make the cached acl stale. We either need to clear overlay's acl cache for the affected inode or prevent acl caching on the overlay altogether. Preventing caching has the following advantages: - no double caching, less memory used - overlay cache doesn't go stale when fs clears it's own cache Possible disadvantage is performance loss. If that becomes a problem get_acl() can be optimized for overlayfs. This patch disables caching by pre setting i_*acl to a value that - has bit 0 set, so is_uncached_acl() will return true - is not equal to ACL_NOT_CACHED, so get_acl() will not overwrite it The constant -3 was chosen for this purpose. Fixes: 39a25b2b ("ovl: define ->get_acl() for overlay inodes") Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
Instead of calling ->get_acl() directly, use get_acl() to get the cached value. We will have the acl cached on the underlying inode anyway, because we do permission checking on the both the overlay and the underlying fs. So, since we already have double caching, this improves performance without any cost. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 08 8月, 2016 1 次提交
-
-
由 Miklos Szeredi 提交于
When a copy up of a directory occurs which has the opaque xattr set, the xattr remains in the upper directory. The immediate behavior with overlayfs is that the upper directory is not treated as opaque, however after a remount the opaque flag is used and upper directory is treated as opaque. This causes files created in the lower layer to be hidden when using multiple lower directories. Fix by not copying up the opaque flag. To reproduce: ----8<---------8<---------8<---------8<---------8<---------8<---- mkdir -p l/d/s u v w mnt mount -t overlay overlay -olowerdir=l,upperdir=u,workdir=w mnt rm -rf mnt/d/ mkdir -p mnt/d/n umount mnt mount -t overlay overlay -olowerdir=u:l,upperdir=v,workdir=w mnt touch mnt/d/foo umount mnt mount -t overlay overlay -olowerdir=u:l,upperdir=v,workdir=w mnt ls mnt/d ----8<---------8<---------8<---------8<---------8<---------8<---- output should be: "foo n" Reported-by: NDerek McGowan <dmcg@drizz.net> Link: https://bugzilla.kernel.org/show_bug.cgi?id=151291Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
-
- 29 7月, 2016 13 次提交
-
-
由 Vivek Goyal 提交于
Right now we remove MAY_WRITE/MAY_APPEND bits from mask if realfile is on lower/. This is done as files on lower will never be written and will be copied up. But to copy up a file, mounter should have MAY_READ permission otherwise copy up will fail. So set MAY_READ in mask when MAY_WRITE is reset. Dan Walsh noticed this when he did access(lowerfile, W_OK) and it returned True (context mounts) but when he tried to actually write to file, it failed as mounter did not have permission on lower file. [SzM] don't set MAY_READ if only MAY_APPEND is set without MAY_WRITE; this won't trigger a copy-up. Reported-by: NDan Walsh <dwalsh@redhat.com> Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Vivek Goyal 提交于
Right now if file is on lower/, we remove MAY_WRITE/MAY_APPEND bits from mask as lower/ will never be written and file will be copied up. But this is not true for special files. These files are not copied up and are opened in place. So don't dilute the checks for these types of files. Reported-by: NDan Walsh <dwalsh@redhat.com> Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
Setting POSIX ACL needs special handling: 1) Some permission checks are done by ->setxattr() which now uses mounter's creds ("ovl: do operations on underlying file system in mounter's context"). These permission checks need to be done with current cred as well. 2) Setting ACL can fail for various reasons. We do not need to copy up in these cases. In the mean time switch to using generic_setxattr. [Arnd Bergmann] Fix link error without POSIX ACL. posix_acl_from_xattr() doesn't have a 'static inline' implementation when CONFIG_FS_POSIX_ACL is disabled, and I could not come up with an obvious way to do it. This instead avoids the link error by defining two sets of ACL operations and letting the compiler drop one of the two at compile time depending on CONFIG_FS_POSIX_ACL. This avoids all references to the ACL code, also leading to smaller code. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
Inode attributes are copied up to overlay inode (uid, gid, mode, atime, mtime, ctime) so generic code using these fields works correcty. If a hard link is created in overlayfs separate inodes are allocated for each link. If chmod/chown/etc. is performed on one of the links then the inode belonging to the other ones won't be updated. This patch attempts to fix this by sharing inodes for hard links. Use inode hash (with real inode pointer as a key) to make sure overlay inodes are shared for hard links on upper. Hard links on lower are still split (which is not user observable until the copy-up happens, see Documentation/filesystems/overlayfs.txt under "Non-standard behavior"). The inode is only inserted in the hash if it is non-directoy and upper. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
To get from overlay inode to real inode we currently use 'struct ovl_entry', which has lifetime connected to overlay dentry. This is okay, since each overlay dentry had a new overlay inode allocated. Following patch will break that assumption, so need to leave out ovl_entry. This patch stores the real inode directly in i_private, with the lowest bit used to indicate whether the inode is upper or lower. Lifetime rules remain, using ovl_inode_real() must only be done while caller holds ref on overlay dentry (and hence on real dentry), or within RCU protected regions. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
The error is due to RCU and is temporary. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
Fix atime update logic in overlayfs. This patch adds an i_op->update_time() handler to overlayfs inodes. This forwards atime updates to the upper layer only. No atime updates are done on lower layers. Remove implicit atime updates to underlying files and directories with O_NOATIME. Remove explicit atime update in ovl_readlink(). Clear atime related mnt flags from cloned upper mount. This means atime updates are controlled purely by overlayfs mount options. Reported-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
The fact that we always do permission checking on the overlay inode and clear MAY_WRITE for checking access to the lower inode allows cruft to be removed from ovl_permission(). 1) "default_permissions" option effectively did generic_permission() on the overlay inode with i_mode, i_uid and i_gid updated from underlying filesystem. This is what we do by default now. It did the update using vfs_getattr() but that's only needed if the underlying filesystem can change (which is not allowed). We may later introduce a "paranoia_mode" that verifies that mode/uid/gid are not changed. 2) splitting out the IS_RDONLY() check from inode_permission() also becomes unnecessary once we remove the MAY_WRITE from the lower inode check. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Vivek Goyal 提交于
Now we have two levels of checks in ovl_permission(). overlay inode is checked with the creds of task while underlying inode is checked with the creds of mounter. Looks like mounter does not have to have WRITE access to files on lower/. So remove the MAY_WRITE from access mask for checks on underlying lower inode. This means task should still have the MAY_WRITE permission on lower inode and mounter is not required to have MAY_WRITE. It also solves the problem of read only NFS mounts being used as lower. If __inode_permission(lower_inode, MAY_WRITE) is called on read only NFS, it fails. By resetting MAY_WRITE, check succeeds and case of read only NFS shold work with overlay without having to specify any special mount options (default permission). Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Vivek Goyal 提交于
Given we are now doing checks both on overlay inode as well underlying inode, we should be able to do checks and operations on underlying file system using mounter's context. So modify all operations to do checks/operations on underlying dentry/inode in the context of mounter. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Vivek Goyal 提交于
Right now ovl_permission() calls __inode_permission(realinode), to do permission checks on real inode and no checks are done on overlay inode. Modify it to do checks both on overlay inode as well as underlying inode. Checks on overlay inode will be done with the creds of calling task while checks on underlying inode will be done with the creds of mounter. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Vivek Goyal 提交于
Now we are planning to do DAC permission checks on overlay inode itself. And to make it work, we will need to make sure we can get acls from underlying inode. So define ->get_acl() for overlay inodes and this in turn calls into underlying filesystem to get acls, if any. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Andreas Gruenbacher 提交于
Previously this was only done for directory inodes. Doing so for all inodes makes for a nice cleanup in ovl_permission at zero cost. Inodes are not shared for hard links on the overlay, so this works fine. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 04 7月, 2016 2 次提交
-
-
由 Vivek Goyal 提交于
Right now when a new overlay inode is created, we initialize overlay inode's ->i_mode from underlying inode ->i_mode but we retain only file type bits (S_IFMT) and discard permission bits. This patch changes it and retains permission bits too. This should allow overlay to do permission checks on overlay inode itself in task context. [SzM] It also fixes clearing suid/sgid bits on write. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Reported-by: NEryu Guan <eguan@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org>
-
由 Miklos Szeredi 提交于
Before 4bacc9c9 ("overlayfs: Make f_path...") file->f_path pointed to the underlying file, hence suid/sgid removal on write worked fine. After that patch file->f_path pointed to the overlay file, and the file mode bits weren't copied to overlay_inode->i_mode. So the suid/sgid removal simply stopped working. The fix is to copy the mode bits, but then ovl_setattr() needs to clear ATTR_MODE to avoid the BUG() in notify_change(). So do this first, then in the next patch copy the mode. Reported-by: NEryu Guan <eguan@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org>
-