1. 31 1月, 2019 3 次提交
    • D
      convert do_remount_sb() to fs_context · 8d0347f6
      David Howells 提交于
      Replace do_remount_sb() with a function, reconfigure_super(), that's
      fs_context aware.  The fs_context is expected to be parameterised already
      and have ->root pointing to the superblock to be reconfigured.
      
      A legacy wrapper is provided that is intended to be called from the
      fs_context ops when those appear, but for now is called directly from
      reconfigure_super().  This wrapper invokes the ->remount_fs() superblock op
      for the moment.  It is intended that the remount_fs() op will be phased
      out.
      
      The fs_context->purpose is set to FS_CONTEXT_FOR_RECONFIGURE to indicate
      that the context is being used for reconfiguration.
      
      do_umount_root() is provided to consolidate remount-to-R/O for umount and
      emergency remount by creating a context and invoking reconfiguration.
      
      do_remount(), do_umount() and do_emergency_remount_callback() are switched
      to use the new process.
      
      [AV -- fold UMOUNT and EMERGENCY_REMOUNT in; fixes the
      umount / bug, gets rid of pointless complexity]
      [AV -- set ->net_ns in all cases; nfs remount will need that]
      [AV -- shift security_sb_remount() call into reconfigure_super(); the callers
      that didn't do security_sb_remount() have NULL fc->security anyway, so it's
      a no-op for them]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Co-developed-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8d0347f6
    • A
      vfs_get_tree(): evict the call of security_sb_kern_mount() · c9ce29ed
      Al Viro 提交于
      Right now vfs_get_tree() calls security_sb_kern_mount() (i.e.
      mount MAC) unless it gets MS_KERNMOUNT or MS_SUBMOUNT in flags.
      Doing it that way is both clumsy and imprecise.
      
      Consider the callers' tree of vfs_get_tree():
      vfs_get_tree()
              <- do_new_mount()
      	<- vfs_kern_mount()
      		<- simple_pin_fs()
      		<- vfs_submount()
      		<- kern_mount_data()
      		<- init_mount_tree()
      		<- btrfs_mount()
      			<- vfs_get_tree()
      		<- nfs_do_root_mount()
      			<- nfs4_try_mount()
      				<- nfs_fs_mount()
      					<- vfs_get_tree()
      			<- nfs4_referral_mount()
      
      do_new_mount() always does need MAC (we are guaranteed that neither
      MS_KERNMOUNT nor MS_SUBMOUNT will be passed there).
      
      simple_pin_fs(), vfs_submount() and kern_mount_data() pass explicit
      flags inhibiting that check.  So does nfs4_referral_mount() (the
      flags there are ulimately coming from vfs_submount()).
      
      init_mount_tree() is called too early for anything LSM-related; it
      doesn't matter whether we attempt those checks, they'll do nothing.
      
      Finally, in case of btrfs_mount() and nfs_fs_mount(), doing MAC
      is pointless - either the caller will do it, or the flags are
      such that we wouldn't have done it either.
      
      In other words, the one and only case when we want that check
      done is when we are called from do_new_mount(), and there we
      want it unconditionally.
      
      So let's simply move it there.  The superblock is still locked,
      so nobody is going to get access to it (via ustat(2), etc.)
      until we get a chance to apply the checks - we are free to
      move them to any point up to where we drop ->s_umount (in
      do_new_mount_fc()).
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      c9ce29ed
    • D
      vfs: Introduce fs_context, switch vfs_kern_mount() to it. · 9bc61ab1
      David Howells 提交于
      Introduce a filesystem context concept to be used during superblock
      creation for mount and superblock reconfiguration for remount.  This is
      allocated at the beginning of the mount procedure and into it is placed:
      
       (1) Filesystem type.
      
       (2) Namespaces.
      
       (3) Source/Device names (there may be multiple).
      
       (4) Superblock flags (SB_*).
      
       (5) Security details.
      
       (6) Filesystem-specific data, as set by the mount options.
      
      Accessor functions are then provided to set up a context, parameterise it
      from monolithic mount data (the data page passed to mount(2)) and tear it
      down again.
      
      A legacy wrapper is provided that implements what will be the basic
      operations, wrapping access to filesystems that aren't yet aware of the
      fs_context.
      
      Finally, vfs_kern_mount() is changed to make use of the fs_context and
      mount_fs() is replaced by vfs_get_tree(), called from vfs_kern_mount().
      [AV -- add missing kstrdup()]
      [AV -- put_cred() can be unconditional - fc->cred can't be NULL]
      [AV -- take legacy_validate() contents into legacy_parse_monolithic()]
      [AV -- merge KERNEL_MOUNT and USER_MOUNT]
      [AV -- don't unlock superblock on success return from vfs_get_tree()]
      [AV -- kill 'reference' argument of init_fs_context()]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Co-developed-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9bc61ab1
  2. 18 7月, 2018 4 次提交
  3. 12 7月, 2018 4 次提交
  4. 11 7月, 2018 1 次提交
  5. 20 6月, 2018 1 次提交
  6. 04 6月, 2018 1 次提交
    • A
      Revert "fs: fold open_check_o_direct into do_dentry_open" · af04fadc
      Al Viro 提交于
      This reverts commit cab64df1.
      
      Having vfs_open() in some cases drop the reference to
      struct file combined with
      
      	error = vfs_open(path, f, cred);
      	if (error) {
      		put_filp(f);
      		return ERR_PTR(error);
      	}
      	return f;
      
      is flat-out wrong.  It used to be
      
      		error = vfs_open(path, f, cred);
      		if (!error) {
      			/* from now on we need fput() to dispose of f */
      			error = open_check_o_direct(f);
      			if (error) {
      				fput(f);
      				f = ERR_PTR(error);
      			}
      		} else {
      			put_filp(f);
      			f = ERR_PTR(error);
      		}
      
      and sure, having that open_check_o_direct() boilerplate gotten rid of is
      nice, but not that way...
      
      Worse, another call chain (via finish_open()) is FUBAR now wrt
      FILE_OPENED handling - in that case we get error returned, with file
      already hit by fput() *AND* FILE_OPENED not set.  Guess what happens in
      path_openat(), when it hits
      
      	if (!(opened & FILE_OPENED)) {
      		BUG_ON(!error);
      		put_filp(file);
      	}
      
      The root cause of all that crap is that the callers of do_dentry_open()
      have no way to tell which way did it fail; while that could be fixed up
      (by passing something like int *opened to do_dentry_open() and have it
      marked if we'd called ->open()), it's probably much too late in the
      cycle to do so right now.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af04fadc
  7. 03 4月, 2018 9 次提交
  8. 28 3月, 2018 1 次提交
  9. 10 11月, 2017 1 次提交
  10. 05 9月, 2017 1 次提交
    • M
      ovl: don't allow writing ioctl on lower layer · 7c6893e3
      Miklos Szeredi 提交于
      Problem with ioctl() is that it's a file operation, yet often used as an
      inode operation (i.e. modify the inode despite the file being opened for
      read-only).
      
      mnt_want_write_file() is used by filesystems in such cases to get write
      access on an arbitrary open file.
      
      Since overlayfs lets filesystems do all file operations, including ioctl,
      this can lead to mnt_want_write_file() returning OK for a lower file and
      modification of that lower file.
      
      This patch prevents modification by checking if the file is from an
      overlayfs lower layer and returning EPERM in that case.
      
      Need to introduce a mnt_want_write_file_path() variant that still does the
      old thing for inode operations that can do the copy up + modification
      correctly in such cases (fchown, fsetxattr, fremovexattr).
      
      This does not address the correctness of such ioctls on overlayfs (the
      correct way would be to copy up and attempt to perform ioctl on upper
      file).
      
      In theory this could be a regression.  We very much hope that nobody is
      relying on such a hack in any sane setup.
      
      While this patch meddles in VFS code, it has no effect on non-overlayfs
      filesystems.
      Reported-by: N"zhangyi (F)" <yi.zhang@huawei.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      7c6893e3
  11. 02 9月, 2017 1 次提交
    • D
      xfs: evict all inodes involved with log redo item · 799ea9e9
      Darrick J. Wong 提交于
      When we introduced the bmap redo log items, we set MS_ACTIVE on the
      mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
      from being truncated prematurely during log recovery.  This also had the
      effect of putting linked inodes on the lru instead of evicting them.
      
      Unfortunately, we neglected to find all those unreferenced lru inodes
      and evict them after finishing log recovery, which means that we leak
      them if anything goes wrong in the rest of xfs_mountfs, because the lru
      is only cleaned out on unmount.
      
      Therefore, evict unreferenced inodes in the lru list immediately
      after clearing MS_ACTIVE.
      
      Fixes: 17c12bcd ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Cc: viro@ZenIV.linux.org.uk
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      799ea9e9
  12. 30 4月, 2017 1 次提交
  13. 18 4月, 2017 1 次提交
  14. 31 1月, 2017 1 次提交
  15. 06 12月, 2016 1 次提交
  16. 30 11月, 2016 1 次提交
  17. 28 9月, 2016 1 次提交
  18. 19 9月, 2016 1 次提交
  19. 16 9月, 2016 1 次提交
    • M
      vfs: update ovl inode before relatime check · 598e3c8f
      Miklos Szeredi 提交于
      On overlayfs relatime_need_update() needs inode times to be correct on
      overlay inode.  But i_mtime and i_ctime are updated by filesystem code on
      underlying inode only, so they will be out-of-date on the overlay inode.
      
      This patch copies the times from the underlying inode if needed.  This
      can't be done if called from RCU lookup (link following) but link m/ctime
      are not updated by fs, so this is all right.
      
      This patch doesn't change functionality for anything but overlayfs.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      598e3c8f
  20. 03 8月, 2016 1 次提交
  21. 21 6月, 2016 1 次提交
    • C
      fs: introduce iomap infrastructure · ae259a9c
      Christoph Hellwig 提交于
      Add infrastructure for multipage buffered writes.  This is implemented
      using an main iterator that applies an actor function to a range that
      can be written.
      
      This infrastucture is used to implement a buffered write helper, one
      to zero file ranges and one to implement the ->page_mkwrite VM
      operations.  All of them borrow a fair amount of code from fs/buffers.
      for now by using an internal version of __block_write_begin that
      gets passed an iomap and builds the corresponding buffer head.
      
      The file system is gets a set of paired ->iomap_begin and ->iomap_end
      calls which allow it to map/reserve a range and get a notification
      once the write code is finished with it.
      
      Based on earlier code from Dave Chinner.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      ae259a9c
  22. 10 6月, 2016 1 次提交
    • A
      much milder d_walk() race · ba65dc5e
      Al Viro 提交于
      d_walk() relies upon the tree not getting rearranged under it without
      rename_lock being touched.  And we do grab rename_lock around the
      places that change the tree topology.  Unfortunately, branch reordering
      is just as bad from d_walk() POV and we have two places that do it
      without touching rename_lock - one in handling of cursors (for ramfs-style
      directories) and another in autofs.  autofs one is a separate story; this
      commit deals with the cursors.
      	* mark cursor dentries explicitly at allocation time
      	* make __dentry_kill() leave ->d_child.next pointing to the next
      non-cursor sibling, making sure that it won't be moved around unnoticed
      before the parent is relocked on ascend-to-parent path in d_walk().
      	* make d_walk() skip cursors explicitly; strictly speaking it's
      not necessary (all callbacks we pass to d_walk() are no-ops on cursors),
      but it makes analysis easier.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ba65dc5e
  23. 31 3月, 2016 1 次提交
  24. 09 1月, 2016 1 次提交