1. 09 Oct, 2014 4 commits
    • vfs: Add a function to lazily unmount all mounts from any dentry. · 80b5dce8
      Eric W. Biederman committed
      The new function detach_mounts comes in two pieces.  The first piece
      is a static inline test of d_mountpoint that returns immediately
      without taking any locks if d_mountpoint is not set.  In the common
      case when mountpoints are absent this allows the vfs to continue
      running with its same cacheline footprint.
      
      The second piece of detach_mounts, __detach_mounts, actually does the
      work.  It assumes that a mountpoint is present, so it is slow: it
      takes namespace_sem for write, and then locks the mount hash (aka
      mount_lock) after a struct mountpoint has been found.
      
      With those two locks held each entry on the list of mounts on a
      mountpoint is selected and lazily unmounted until all of the mounts
      have been lazily unmounted.
      
      v7: Wrote a proper change description and removed the changelog
          documenting deleted wrong turns.
      Signed-off-by: Eric W. Biederman <ebiederman@twitter.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
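The two-piece structure described in this commit can be sketched in userspace C. This is a minimal model, not the kernel code: the `struct dentry` and `struct mount` layouts, the `mounted` flag, and the `slow_path_calls` counter are hypothetical stand-ins, and the locking that the real slow path performs appears only as a comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct mount {
    struct mount *next_on_mp;   /* next mount on the same mountpoint */
    bool attached;
};

struct dentry {
    bool mounted;               /* models the flag tested by d_mountpoint() */
    struct mount *mounts;       /* mounts sitting on this dentry */
};

static int slow_path_calls;     /* counts how often the slow piece runs */

/* Slow piece: the kernel takes namespace_sem, then mount_lock, here. */
static void __detach_mounts(struct dentry *d)
{
    slow_path_calls++;
    while (d->mounts) {
        struct mount *m = d->mounts;
        d->mounts = m->next_on_mp;
        m->next_on_mp = NULL;
        m->attached = false;    /* stands in for a lazy unmount */
    }
    d->mounted = false;
}

/* Fast piece: returns without touching any lock in the common case. */
static inline void detach_mounts(struct dentry *d)
{
    if (!d->mounted)
        return;
    __detach_mounts(d);
}
```

Calling `detach_mounts()` on a dentry whose flag is clear never reaches the slow piece, which is the property the commit relies on to keep the common path cheap.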
    • vfs: Keep a list of mounts on a mount point · 0a5eb7c8
      Eric W. Biederman committed
      To spot any possible problems call BUG if a mountpoint
      is put when its list of mounts is not empty.
      
      AV: use hlist instead of list_head
      Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: Eric W. Biederman <ebiederman@twitter.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Don't allow overwriting mounts in the current mount namespace · 7af1364f
      Eric W. Biederman committed
      In preparation for allowing mountpoints to be renamed and unlinked
      in remote filesystems and in other mount namespaces, test whether a
      dentry has a mount in the local mount namespace before allowing it
      to be renamed or unlinked.
      
      The primary motivation here is old versions of fusermount, whose
      unmount is not safe if a path can be renamed or unlinked while it is
      verifying the mount is safe to unmount.  More recent versions are simpler
      and safer, simply using UMOUNT_NOFOLLOW when unmounting a mount
      in a directory owned by an arbitrary user.
      
      Miklos Szeredi <miklos@szeredi.hu> reports that this approach is good
      enough to remove concerns about new kernels mixed with old versions
      of fusermount.
      
      A secondary motivation for restrictions here is that removing empty
      directories that have non-empty mount points on them appears to
      violate the rule that rmdir cannot remove non-empty directories.  As
      Linus Torvalds pointed out, that rule is useful for programs (like git)
      that test if a directory is empty with rmdir.
      
      Therefore this patch arranges to enforce the existing mount point
      semantics for the local mount namespace.
      
      v2: Rewrote the test to be a drop in replacement for d_mountpoint
      v3: Use bool instead of int as the return type of is_local_mountpoint
      Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
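The check this commit describes, refusing to unlink or rename a dentry that is a mountpoint in the caller's own mount namespace, might look roughly like the sketch below. The structures, the two-argument form of `is_local_mountpoint`, the `may_unlink` helper, and the choice of `-EBUSY` are assumptions made for illustration; the commit message only states that the test returns `bool` and is a drop-in replacement for `d_mountpoint`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified structures. */
struct dentry { const char *name; };

struct mount {
    const struct dentry *mountpoint;  /* dentry this mount sits on */
    struct mount *next;               /* next mount in the same namespace */
};

struct mnt_namespace { struct mount *mounts; };

/* True if some mount in namespace 'ns' is mounted on dentry 'd'. */
static bool is_local_mountpoint(const struct mnt_namespace *ns,
                                const struct dentry *d)
{
    for (const struct mount *m = ns->mounts; m; m = m->next)
        if (m->mountpoint == d)
            return true;
    return false;
}

/* Hypothetical caller: refuse only when the mount is visible locally. */
static int may_unlink(const struct mnt_namespace *current_ns,
                      const struct dentry *d)
{
    return is_local_mountpoint(current_ns, d) ? -EBUSY : 0;
}
```

In this model a dentry that is a mountpoint only in some other namespace can still be unlinked, which is the behaviour the later commits in this series build on.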
    • delayed mntput · 9ea459e1
      Al Viro committed
      On final mntput() we want fs shutdown to happen before return to
      userland; however, the only case where we want it to happen right
      there (i.e. where task_work_add won't do) is MNT_INTERNAL victim.
      Those have to be fully synchronous - failure halfway through module
      init might count on having vfsmount killed right there.  Fortunately,
      final mntput on MNT_INTERNAL vfsmounts happens on shallow stack.
      So we handle those synchronously and do an analog of delayed fput
      logic for everything else.
      
      As a result, we are guaranteed that fs shutdown will always happen
      on shallow stack.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
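The split between synchronous and deferred shutdown can be modelled as follows. This is a userspace sketch under stated assumptions: the `deferred` list stands in for the work queued with task_work_add, `run_deferred_work()` stands in for the point just before return to userland, and the flag value is illustrative rather than the kernel's.

```c
#include <stdbool.h>
#include <stddef.h>

#define MNT_INTERNAL 0x1        /* illustrative value only */

struct mount {
    unsigned flags;
    bool shut_down;
    struct mount *next_deferred;
};

static struct mount *deferred;  /* stand-in for the per-task work list */

static void cleanup_mnt(struct mount *m)
{
    m->shut_down = true;        /* stands in for the fs shutdown */
}

/* Called when the last reference to 'm' is dropped. */
static void final_mntput(struct mount *m)
{
    if (m->flags & MNT_INTERNAL) {
        cleanup_mnt(m);         /* must be fully synchronous */
        return;
    }
    m->next_deferred = deferred;    /* everything else is queued */
    deferred = m;
}

/* Models the hook that runs before returning to userland. */
static void run_deferred_work(void)
{
    while (deferred) {
        struct mount *m = deferred;
        deferred = m->next_deferred;
        cleanup_mnt(m);
    }
}
```

Because the queued cleanup runs from a fixed, shallow call site rather than from wherever the last reference happened to be dropped, the shutdown never runs on a deep stack in this model.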
  2. 08 Aug, 2014 2 commits
    • death to mnt_pinned · 3064c356
      Al Viro committed
      Rather than playing silly buggers with vfsmount refcounts, just have
      acct_on() ask fs/namespace.c for internal clone of file->f_path.mnt
      and replace it with said clone.  Then attach the pin to original
      vfsmount.  Voila - the clone will be alive until the file gets closed,
      making sure that underlying superblock remains active, etc., and
      we can drop the original vfsmount, so that it's not kept busy.
      If the file lives until the final mntput of the original vfsmount,
      we'll notice that there's an fs_pin (one in bsd_acct_struct that
      holds that file) and mnt_pin_kill() will take it out.  Since
      ->kill() is synchronous, we won't proceed past that point until
      these files are closed (and private clones of our vfsmount are
      gone), so we get the same ordering warranties we used to get.
      
      mnt_pin()/mnt_unpin()/->mnt_pinned is gone now, and good riddance -
      it never became usable outside of kernel/acct.c (and racy wrt
      umount even there).
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • acct: get rid of acct_list · 215752fc
      Al Viro committed
      Put these suckers on per-vfsmount and per-superblock lists instead.
      Note: right now it's still acct_lock for everything, but that's
      going to change.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 02 Apr, 2014 1 commit
  4. 31 Mar, 2014 2 commits
    • switch mnt_hash to hlist · 38129a13
      Al Viro committed
      fixes RCU bug - walking through hlist is safe in face of element moves,
      since it's self-terminating.  Cyclic lists are not - if we end up jumping
      to another hash chain, we'll loop infinitely without ever hitting the
      original list head.
      
      [fix for dumb braino folded]
      
      Spotted by: Max Kellermann <mk@cm4all.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
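The difference between the two list shapes can be demonstrated with a small model. It is a sketch, not kernel code: `struct node`, both walkers, and the `max_steps` bound (used so the "infinite" case can be observed) are hypothetical. The scenario is a lockless walker that is standing on an element when that element is moved to another hash chain.

```c
#include <stddef.h>

struct node { struct node *next; };

/*
 * Walk a NULL-terminated chain starting at 'pos'.  Whatever chain
 * 'pos' ends up on, the walk stops at that chain's NULL.
 * Returns the number of steps taken, or -1 if max_steps was hit.
 */
static int walk_hlist(const struct node *pos, int max_steps)
{
    int steps = 0;

    while (pos && steps < max_steps) {
        pos = pos->next;
        steps++;
    }
    return pos ? -1 : steps;
}

/*
 * Walk a circular list until we come back to 'head'.  If 'pos' has
 * been moved to a different ring, 'head' is never seen again.
 * Returns the number of steps taken, or -1 if max_steps was hit.
 */
static int walk_cyclic(const struct node *head, const struct node *pos,
                       int max_steps)
{
    int steps = 0;

    while (pos != head && steps < max_steps) {
        pos = pos->next;
        steps++;
    }
    return pos == head ? steps : -1;
}
```

If the walker started from one list head and its current element now lives on a ring belonging to another head, `walk_cyclic` reports -1, while `walk_hlist` simply terminates at the end of whichever chain it landed on.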
    • resizable namespace.c hashes · 0818bf27
      Al Viro committed
      * switch allocation to alloc_large_system_hash()
      * make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
      * switch mountpoint_hashtable from list_head to hlist_head
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  5. 26 Jan, 2014 1 commit
  6. 09 Nov, 2013 1 commit
    • RCU'd vfsmounts · 48a066e7
      Al Viro committed
      * RCU-delayed freeing of vfsmounts
      * vfsmount_lock replaced with a seqlock (mount_lock)
      * sequence number from mount_lock is stored in nameidata->m_seq and
      used when we exit RCU mode
      * new vfsmount flag - MNT_SYNC_UMOUNT.  Set by umount_tree() when its
      caller knows that vfsmount will have no surviving references.
      * synchronize_rcu() done between unlocking namespace_sem in namespace_unlock()
      and doing pending mntput().
      * new helper: legitimize_mnt(mnt, seq).  Checks the mount_lock sequence
      number against seq, then grabs reference to mnt.  Then it rechecks mount_lock
      again to close the race and either returns success or drops the reference it
      has acquired.  The subtle point is that in case of MNT_SYNC_UMOUNT we can
      simply decrement the refcount and sod off - aforementioned synchronize_rcu()
      makes sure that final mntput() won't come until we leave RCU mode.  We need
      that, since we don't want to end up with some lazy pathwalk racing with
      umount() and stealing the final mntput() from it - caller of umount() may
      expect it to return only once the fs is shut down and we don't want to break
      that.  In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do
      full-blown mntput() in case of mount_lock sequence number mismatch happening
      just as we'd grabbed the reference, but in those cases we won't be stealing
      the final mntput() from anything that would care.
      * mntput_no_expire() doesn't lock anything on the fast path now.  Incidentally,
      SMP and UP cases are handled the same way - no ifdefs there.
      * normal pathname resolution does *not* do any writes to mount_lock.  It does,
      of course, bump the refcounts of vfsmount and dentry in the very end, but that's
      it.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
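The check, grab, recheck sequence described for legitimize_mnt(mnt, seq) can be modelled in single-threaded C. Everything here is a simplification: `mount_seq` is a plain counter standing in for the mount_lock sequence number, `simulate_race` is a hypothetical hook that injects a concurrent umount between the grab and the recheck, the flag value is illustrative, and the real code needs proper memory ordering that this sketch ignores.

```c
#include <stdbool.h>

#define MNT_SYNC_UMOUNT 0x1     /* illustrative value only */

struct mount {
    int count;                  /* reference count */
    unsigned flags;
};

static unsigned mount_seq;      /* stand-in for the mount_lock sequence */
static int full_mntputs;        /* how many full mntput() calls were made */
static bool simulate_race;      /* hypothetical hook: umount hits mid-call */

static bool seq_changed(unsigned seq)
{
    return mount_seq != seq;
}

static void mntput(struct mount *m)
{
    m->count--;
    full_mntputs++;             /* the real one may shut the fs down */
}

/* Returns true if the caller now holds a valid reference to 'm'. */
static bool legitimize_mnt(struct mount *m, unsigned seq)
{
    if (seq_changed(seq))
        return false;           /* stale before we even started */
    m->count++;                 /* grab the reference */
    if (simulate_race)
        mount_seq++;            /* a umount slipped in right here */
    if (!seq_changed(seq))
        return true;            /* recheck passed, reference is good */
    if (m->flags & MNT_SYNC_UMOUNT)
        m->count--;             /* plain decrement, umount keeps the final put */
    else
        mntput(m);              /* full-blown put */
    return false;
}
```

The MNT_SYNC_UMOUNT branch is the subtle point from the commit message: the failed walker backs out with a bare decrement so that it never performs the final mntput() on behalf of the umount() caller.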
  7. 25 Oct, 2013 3 commits
  8. 10 Apr, 2013 1 commit
  9. 20 Nov, 2012 1 commit
    • proc: Usable inode numbers for the namespace file descriptors. · 98f842e6
      Eric W. Biederman committed
      Assign a unique proc inode to each namespace, and use that
      inode number to ensure we only allocate at most one proc
      inode for every namespace in proc.
      
      A single proc inode per namespace allows userspace to test
      to see if two processes are in the same namespace.
      
      This has been a long requested feature and only blocked because
      a naive implementation would put the id in a global space and
      would ultimately require having a namespace for the names of
      namespaces, making migration and certain virtualization tricks
      impossible.
      
      We still don't have per superblock inode numbers for proc, which
      appears necessary for application unaware checkpoint/restart and
      migrations (if the application is using namespace file descriptors)
      but that is now allowed by the design if it becomes important.
      
      I have preallocated the ipc and uts initial proc inode numbers so
      their structures can be statically initialized.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
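From userspace, the "same namespace" test this commit enables amounts to comparing the identity of two namespace files. The helper below is a sketch: the function name is hypothetical, and comparing `st_dev` in addition to `st_ino` is a conservative assumption rather than something the commit message states.

```c
#include <sys/stat.h>

/*
 * Returns 1 if both paths resolve to the same inode on the same
 * device, 0 if they differ, and -1 if either stat() call fails.
 */
static int same_namespace(const char *path_a, const char *path_b)
{
    struct stat a, b;

    if (stat(path_a, &a) != 0 || stat(path_b, &b) != 0)
        return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A caller would pass two namespace paths for the processes of interest, for example `/proc/self/ns/uts` and the corresponding file under another process's `/proc/<pid>/ns/` directory, subject to the usual permission checks on that directory.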
  10. 19 Nov, 2012 2 commits
    • vfs: Add a user namespace reference from struct mnt_namespace · 771b1371
      Eric W. Biederman committed
      This will allow for support for unprivileged mounts in a new user namespace.
      Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • vfs: Add setns support for the mount namespace · 8823c079
      Eric W. Biederman committed
      setns support for the mount namespace is a little tricky as an
      arbitrary decision must be made about what to set fs->root and
      fs->pwd to, as there is no expectation of a relationship between
      the two mount namespaces.  Therefore I arbitrarily find the root
      mount point, and follow every mount on top of it to find the top
      of the mount stack.  Then I set fs->root and fs->pwd to that
      location.  The topmost root of the mount stack seems like a
      reasonable place to be.
      
      Bind mount support for the mount namespace inodes has the
      possibility of creating circular dependencies between mount
      namespaces.  Circular dependencies can result in loops that
      prevent mount namespaces from ever being freed.  I avoid
      creating those circular dependencies by adding a sequence number
      to the mount namespace and require all bind mounts be of a
      younger mount namespace into an older mount namespace.
      
      Add a helper function proc_ns_inode so it is possible to
      detect when we are attempting to bind mount a namespace inode.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
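The ordering rule that rules out circular dependencies can be shown with a few lines of C. This is a sketch with hypothetical names: `next_seq`, `init_mnt_ns`, and `may_bind_ns_into` are illustrative, and only the idea (a per-namespace sequence number, with bind mounts allowed only from a younger namespace into an older one) comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

struct mnt_namespace { uint64_t seq; };

static uint64_t next_seq = 1;   /* grows with every namespace created */

static void init_mnt_ns(struct mnt_namespace *ns)
{
    ns->seq = next_seq++;       /* later namespaces get larger numbers */
}

/*
 * May the inode of namespace 'target' be bind mounted inside
 * namespace 'host'?  Only if 'target' is strictly younger.
 */
static bool may_bind_ns_into(const struct mnt_namespace *target,
                             const struct mnt_namespace *host)
{
    return target->seq > host->seq;
}
```

Since the comparison is strict, any chain of such bind mounts has strictly increasing sequence numbers and therefore cannot return to its starting namespace, so no reference cycle can form.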
  11. 14 Jul, 2012 2 commits
  12. 07 Jan, 2012 1 commit
  13. 04 Jan, 2012 19 commits