1. 29 7月, 2016 8 次提交
    • M
      ovl: permission: return ECHILD instead of ENOENT · a999d7e1
      Miklos Szeredi 提交于
      The error is due to RCU and is temporary.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      a999d7e1
    • M
      ovl: update atime on upper · d719e8f2
      Miklos Szeredi 提交于
      Fix atime update logic in overlayfs.
      
      This patch adds an i_op->update_time() handler to overlayfs inodes.  This
      forwards atime updates to the upper layer only.  No atime updates are done
      on lower layers.
      
      Remove implicit atime updates to underlying files and directories with
      O_NOATIME.  Remove explicit atime update in ovl_readlink().
      
      Clear atime related mnt flags from cloned upper mount.  This means atime
      updates are controlled purely by overlayfs mount options.
      
      Reported-by: Konstantin Khlebnikov <koct9i@gmail.com> 
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      d719e8f2
    • M
      ovl: simplify permission checking · 9c630ebe
      Miklos Szeredi 提交于
      The fact that we always do permission checking on the overlay inode and
      clear MAY_WRITE for checking access to the lower inode allows cruft to be
      removed from ovl_permission().
      
      1) "default_permissions" option effectively did generic_permission() on the
      overlay inode with i_mode, i_uid and i_gid updated from underlying
      filesystem.  This is what we do by default now.  It did the update using
      vfs_getattr() but that's only needed if the underlying filesystem can
      change (which is not allowed).  We may later introduce a "paranoia_mode"
      that verifies that mode/uid/gid are not changed.
      
      2) splitting out the IS_RDONLY() check from inode_permission() also becomes
      unnecessary once we remove the MAY_WRITE from the lower inode check.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      9c630ebe
    • V
      ovl: do not require mounter to have MAY_WRITE on lower · 754f8cb7
      Vivek Goyal 提交于
      Now we have two levels of checks in ovl_permission(). overlay inode
      is checked with the creds of task while underlying inode is checked
      with the creds of mounter.
      
      Looks like mounter does not have to have WRITE access to files on lower/.
      So remove the MAY_WRITE from access mask for checks on underlying
      lower inode.
      
      This means task should still have the MAY_WRITE permission on lower
      inode and mounter is not required to have MAY_WRITE.
      
      It also solves the problem of read only NFS mounts being used as lower.
      If __inode_permission(lower_inode, MAY_WRITE) is called on read only
      NFS, it fails. By resetting MAY_WRITE, check succeeds and case of
      read only NFS shold work with overlay without having to specify any
      special mount options (default permission).
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      754f8cb7
    • V
      ovl: do operations on underlying file system in mounter's context · 1175b6b8
      Vivek Goyal 提交于
      Given we are now doing checks both on overlay inode as well underlying
      inode, we should be able to do checks and operations on underlying file
      system using mounter's context.
      
      So modify all operations to do checks/operations on underlying dentry/inode
      in the context of mounter.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      1175b6b8
    • V
      ovl: modify ovl_permission() to do checks on two inodes · c0ca3d70
      Vivek Goyal 提交于
      Right now ovl_permission() calls __inode_permission(realinode), to do
      permission checks on real inode and no checks are done on overlay inode.
      
      Modify it to do checks both on overlay inode as well as underlying inode.
      Checks on overlay inode will be done with the creds of calling task while
      checks on underlying inode will be done with the creds of mounter.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      c0ca3d70
    • V
      ovl: define ->get_acl() for overlay inodes · 39a25b2b
      Vivek Goyal 提交于
      Now we are planning to do DAC permission checks on overlay inode
      itself. And to make it work, we will need to make sure we can get acls from
      underlying inode. So define ->get_acl() for overlay inodes and this in turn
      calls into underlying filesystem to get acls, if any.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      39a25b2b
    • A
      ovl: store ovl_entry in inode->i_private for all inodes · 58ed4e70
      Andreas Gruenbacher 提交于
      Previously this was only done for directory inodes.  Doing so for all
      inodes makes for a nice cleanup in ovl_permission at zero cost.
      
      Inodes are not shared for hard links on the overlay, so this works fine.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      58ed4e70
  2. 04 7月, 2016 2 次提交
    • V
      ovl: Copy up underlying inode's ->i_mode to overlay inode · 07a2daab
      Vivek Goyal 提交于
      Right now when a new overlay inode is created, we initialize overlay
      inode's ->i_mode from underlying inode ->i_mode but we retain only
      file type bits (S_IFMT) and discard permission bits.
      
      This patch changes it and retains permission bits too. This should allow
      overlay to do permission checks on overlay inode itself in task context.
      
      [SzM] It also fixes clearing suid/sgid bits on write.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Reported-by: NEryu Guan <eguan@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
      Cc: <stable@vger.kernel.org>
      07a2daab
    • M
      ovl: handle ATTR_KILL* · b99c2d91
      Miklos Szeredi 提交于
      Before 4bacc9c9 ("overlayfs: Make f_path...") file->f_path pointed to
      the underlying file, hence suid/sgid removal on write worked fine.
      
      After that patch file->f_path pointed to the overlay file, and the file
      mode bits weren't copied to overlay_inode->i_mode.  So the suid/sgid
      removal simply stopped working.
      
      The fix is to copy the mode bits, but then ovl_setattr() needs to clear
      ATTR_MODE to avoid the BUG() in notify_change().  So do this first, then in
      the next patch copy the mode.
      Reported-by: NEryu Guan <eguan@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
      Cc: <stable@vger.kernel.org>
      b99c2d91
  3. 30 6月, 2016 1 次提交
    • M
      vfs: merge .d_select_inode() into .d_real() · 2d902671
      Miklos Szeredi 提交于
      The two methods essentially do the same: find the real dentry/inode
      belonging to an overlay dentry.  The difference is in the usage:
      
      vfs_open() uses ->d_select_inode() and expects the function to perform
      copy-up if necessary based on the open flags argument.
      
      file_dentry() uses ->d_real() passing in the overlay dentry as well as the
      underlying inode.
      
      vfs_rename() uses ->d_select_inode() but passes zero flags.  ->d_real()
      with a zero inode would have worked just as well here.
      
      This patch merges the functionality of ->d_select_inode() into ->d_real()
      by adding an 'open_flags' argument to the latter.
      
      [Al Viro] Make the signature of d_real() match that of ->d_real() again.
      And constify the inode argument, while we are at it.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      2d902671
  4. 29 6月, 2016 2 次提交
    • M
      ovl: get_write_access() in truncate · 03bea604
      Miklos Szeredi 提交于
      When truncating a file we should check write access on the underlying
      inode.  And we should do so on the lower file as well (before copy-up) for
      consistency.
      
      Original patch and test case by Aihua Zhang.
      
       - - >o >o - - test.c - - >o >o - -
      #include <stdio.h>
      #include <errno.h>
      #include <unistd.h>
      
      int main(int argc, char *argv[])
      {
      	int ret;
      
      	ret = truncate(argv[0], 4096);
      	if (ret != -1) {
      		fprintf(stderr, "truncate(argv[0]) should have failed\n");
      		return 1;
      	}
      	if (errno != ETXTBSY) {
      		perror("truncate(argv[0])");
      		return 1;
      	}
      
      	return 0;
      }
       - - >o >o - - >o >o - - >o >o - -
      Reported-by: NAihua Zhang <zhangaihua1@huawei.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Cc: <stable@vger.kernel.org>
      03bea604
    • M
      ovl: fix dentry leak for default_permissions · a4859d75
      Miklos Szeredi 提交于
      When using the 'default_permissions' mount option, ovl_permission() on
      non-directories was missing a dput(alias), resulting in "BUG Dentry still
      in use".
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 8d3095f4 ("ovl: default permissions")
      Cc: <stable@vger.kernel.org> # v4.5+
      a4859d75
  5. 06 6月, 2016 1 次提交
    • M
      ovl: xattr filter fix · b581755b
      Miklos Szeredi 提交于
      a) ovl_need_xattr_filter() is wrong, we can have multiple lower layers
      overlaid, all of which (except the lowest one) honouring the
      "trusted.overlay.opaque" xattr.  So need to filter everything except the
      bottom and the pure-upper layer.
      
      b) we no longer can assume that inode is attached to dentry in
      get/setxattr.
      
      This patch unconditionally filters private xattrs to fix both of the above.
      Performance impact for get/removexattrs is likely in the noise.
      
      For listxattrs it might be measurable in pathological cases, but I very
      much hope nobody cares.  If they do, we'll fix it then.
      Reported-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Fixes: b9680917 ("security_d_instantiate(): move to the point prior to attaching dentry to inode")
      b581755b
  6. 28 5月, 2016 1 次提交
  7. 11 4月, 2016 1 次提交
  8. 04 3月, 2016 1 次提交
  9. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  10. 31 12月, 2015 1 次提交
  11. 11 12月, 2015 1 次提交
  12. 09 12月, 2015 1 次提交
    • A
      replace ->follow_link() with new method that could stay in RCU mode · 6b255391
      Al Viro 提交于
      new method: ->get_link(); replacement of ->follow_link().  The differences
      are:
      	* inode and dentry are passed separately
      	* might be called both in RCU and non-RCU mode;
      the former is indicated by passing it a NULL dentry.
      	* when called that way it isn't allowed to block
      and should return ERR_PTR(-ECHILD) if it needs to be called
      in non-RCU mode.
      
      It's a flagday change - the old method is gone, all in-tree instances
      converted.  Conversion isn't hard; said that, so far very few instances
      do not immediately bail out when called in RCU mode.  That'll change
      in the next commits.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6b255391
  13. 07 12月, 2015 2 次提交
  14. 12 10月, 2015 2 次提交
    • M
      ovl: default permissions · 8d3095f4
      Miklos Szeredi 提交于
      Add mount option "default_permissions" to alter the way permissions are
      calculated.
      
      Without this option and prior to this patch permissions were calculated by
      underlying lower or upper filesystem.
      
      With this option the permissions are calculated by overlayfs based on the
      file owner, group and mode bits.
      
      This has significance for example when a read-only exported NFS filesystem
      is used as a lower layer.  In this case the underlying NFS filesystem will
      reply with EROFS, in which case all we know is that the filesystem is
      read-only.  But that's not what we are interested in, we are interested in
      whether the access would be allowed if the filesystem wasn't read-only; the
      server doesn't tell us that, and would need updating at various levels,
      which doesn't seem practicable.
      Signed-off-by: NMiklos Szeredi <miklos@szeredi.hu>
      8d3095f4
    • M
      ovl: fix open in stacked overlay · 1c8a47df
      Miklos Szeredi 提交于
      If two overlayfs filesystems are stacked on top of each other, then we need
      recursion in ovl_d_select_inode().
      
      I guess d_backing_inode() is supposed to do that.  But currently it doesn't
      and that functionality is open coded in vfs_open().  This is now copied
      into ovl_d_select_inode() to fix this regression.
      Reported-by: NAlban Crequy <alban.crequy@gmail.com>
      Signed-off-by: NMiklos Szeredi <miklos@szeredi.hu>
      Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay...")
      Cc: David Howells <dhowells@redhat.com>
      Cc: <stable@vger.kernel.org> # v4.2+
      1c8a47df
  15. 12 7月, 2015 1 次提交
  16. 19 6月, 2015 2 次提交
    • D
      overlayfs: Make f_path always point to the overlay and f_inode to the underlay · 4bacc9c9
      David Howells 提交于
      Make file->f_path always point to the overlay dentry so that the path in
      /proc/pid/fd is correct and to ensure that label-based LSMs have access to the
      overlay as well as the underlay (path-based LSMs probably don't need it).
      
      Using my union testsuite to set things up, before the patch I see:
      
      	[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
      	[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
      	...
      	lr-x------. 1 root root 64 Jun  5 14:38 5 -> /a/foo107
      	[root@andromeda union-testsuite]# stat /mnt/a/foo107
      	...
      	Device: 23h/35d Inode: 13381       Links: 1
      	...
      	[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
      	...
      	Device: 23h/35d Inode: 13381       Links: 1
      	...
      
      After the patch:
      
      	[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
      	[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
      	...
      	lr-x------. 1 root root 64 Jun  5 14:22 5 -> /mnt/a/foo107
      	[root@andromeda union-testsuite]# stat /mnt/a/foo107
      	...
      	Device: 23h/35d Inode: 40346       Links: 1
      	...
      	[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
      	...
      	Device: 23h/35d Inode: 40346       Links: 1
      	...
      
      Note the change in where /proc/$$/fd/5 points to in the ls command.  It was
      pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
      (which is correct).
      
      The inode accessed, however, is the lower layer.  The union layer is on device
      25h/37d and the upper layer on 24h/36d.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4bacc9c9
    • D
      overlay: Call ovl_drop_write() earlier in ovl_dentry_open() · f25801ee
      David Howells 提交于
      Call ovl_drop_write() earlier in ovl_dentry_open() before we call vfs_open()
      as we've done the copy up for which we needed the freeze-write lock by that
      point.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      f25801ee
  17. 11 5月, 2015 4 次提交
    • A
      switch ->put_link() from dentry to inode · 5f2c4179
      Al Viro 提交于
      only one instance looks at that argument at all; that sole
      exception wants inode rather than dentry.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5f2c4179
    • A
      don't pass nameidata to ->follow_link() · 6e77137b
      Al Viro 提交于
      its only use is getting passed to nd_jump_link(), which can obtain
      it from current->nameidata
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6e77137b
    • A
      new ->follow_link() and ->put_link() calling conventions · 680baacb
      Al Viro 提交于
      a) instead of storing the symlink body (via nd_set_link()) and returning
      an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
      that opaque pointer (into void * passed by address by caller) and returns
      the symlink body.  Returning ERR_PTR() on error, NULL on jump (procfs magic
      symlinks) and pointer to symlink body for normal symlinks.  Stored pointer
      is ignored in all cases except the last one.
      
      Storing NULL for opaque pointer (or not storing it at all) means no call
      of ->put_link().
      
      b) the body used to be passed to ->put_link() implicitly (via nameidata).
      Now only the opaque pointer is.  In the cases when we used the symlink body
      to free stuff, ->follow_link() now should store it as opaque pointer in addition
      to returning it.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      680baacb
    • N
      ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link · 3188b295
      NeilBrown 提交于
      ovl_follow_link current calls ->put_link on an error path.
      However ->put_link is about to change in a way that it will be
      impossible to call it from ovl_follow_link.
      
      So rearrange the code to avoid the need for that error path.
      Specifically: move the kmalloc() call before the ->follow_link()
      call to the subordinate filesystem.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      3188b295
  18. 13 12月, 2014 3 次提交
  19. 20 11月, 2014 1 次提交
    • M
      ovl: fix race in private xattr checks · 52148463
      Miklos Szeredi 提交于
      Xattr operations can race with copy up.  This does not matter as long as
      we consistently fiter out "trunsted.overlay.opaque" attribute on upper
      directories.
      
      Previously we checked parent against OVL_PATH_MERGE.  This is too general,
      and prone to race with copy-up.  I.e. we found the parent to be on the
      lower layer but ovl_dentry_real() would return the copied-up dentry,
      possibly with the "opaque" attribute.
      
      So instead use ovl_path_real() and decide to filter the attributes based on
      the actual type of the dentry we'll use.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      52148463
  20. 24 10月, 2014 1 次提交
    • M
      overlay filesystem · e9be9d5e
      Miklos Szeredi 提交于
      Overlayfs allows one, usually read-write, directory tree to be
      overlaid onto another, read-only directory tree.  All modifications
      go to the upper, writable layer.
      
      This type of mechanism is most often used for live CDs but there's a
      wide variety of other uses.
      
      The implementation differs from other "union filesystem"
      implementations in that after a file is opened all operations go
      directly to the underlying, lower or upper, filesystems.  This
      simplifies the implementation and allows native performance in these
      cases.
      
      The dentry tree is duplicated from the underlying filesystems, this
      enables fast cached lookups without adding special support into the
      VFS.  This uses slightly more memory than union mounts, but dentries
      are relatively small.
      
      Currently inodes are duplicated as well, but it is a possible
      optimization to share inodes for non-directories.
      
      Opening non directories results in the open forwarded to the
      underlying filesystem.  This makes the behavior very similar to union
      mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file
      descriptors).
      
      Usage:
      
        mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper/upper,workdir=/upper/work /overlay
      
      The following cotributions have been folded into this patch:
      
      Neil Brown <neilb@suse.de>:
       - minimal remount support
       - use correct seek function for directories
       - initialise is_real before use
       - rename ovl_fill_cache to ovl_dir_read
      
      Felix Fietkau <nbd@openwrt.org>:
       - fix a deadlock in ovl_dir_read_merged
       - fix a deadlock in ovl_remove_whiteouts
      
      Erez Zadok <ezk@fsl.cs.sunysb.edu>
       - fix cleanup after WARN_ON
      
      Sedat Dilek <sedat.dilek@googlemail.com>
       - fix up permission to confirm to new API
      
      Robin Dong <hao.bigrat@gmail.com>
       - fix possible leak in ovl_new_inode
       - create new inode in ovl_link
      
      Andy Whitcroft <apw@canonical.com>
       - switch to __inode_permission()
       - copy up i_uid/i_gid from the underlying inode
      
      AV:
       - ovl_copy_up_locked() - dput(ERR_PTR(...)) on two failure exits
       - ovl_clear_empty() - one failure exit forgetting to do unlock_rename(),
         lack of check for udir being the parent of upper, dropping and regaining
         the lock on udir (which would require _another_ check for parent being
         right).
       - bogus d_drop() in copyup and rename [fix from your mail]
       - copyup/remove and copyup/rename races [fix from your mail]
       - ovl_dir_fsync() leaving ERR_PTR() in ->realfile
       - ovl_entry_free() is pointless - it's just a kfree_rcu()
       - fold ovl_do_lookup() into ovl_lookup()
       - manually assigning ->d_op is wrong.  Just use ->s_d_op.
       [patches picked from Miklos]:
       * copyup/remove and copyup/rename races
       * bogus d_drop() in copyup and rename
      
      Also thanks to the following people for testing and reporting bugs:
      
        Jordi Pujol <jordipujolp@gmail.com>
        Andy Whitcroft <apw@canonical.com>
        Michal Suchanek <hramrach@centrum.cz>
        Felix Fietkau <nbd@openwrt.org>
        Erez Zadok <ezk@fsl.cs.sunysb.edu>
        Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      e9be9d5e