1. 31 3月, 2014 3 次提交
  2. 30 11月, 2013 1 次提交
    • T
      sysfs, kernfs: prepare mount path for kernfs · 4b93dc9b
      Tejun Heo 提交于
      We're in the process of separating out core sysfs functionality into
      kernfs which will deal with sysfs_dirents directly.  This patch
      rearranges mount path so that the kernfs and sysfs parts are separate.
      
      * As sysfs_super_info won't be visible outside kernfs proper,
        kernfs_super_ns() is added to allow kernfs users to access a
        super_block's namespace tag.
      
      * Generic mount operation is separated out into kernfs_mount_ns().
        sysfs_mount() now just performs sysfs-specific permission check,
        acquires namespace tag, and invokes kernfs_mount_ns().
      
      * Generic superblock release is separated out into kernfs_kill_sb()
        which can be used directly as file_system_type->kill_sb().  As sysfs
        needs to put the namespace tag, sysfs_kill_sb() wraps
        kernfs_kill_sb() with ns tag put.
      
      * sysfs_dir_cachep init and sysfs_inode_init() are separated out into
        kernfs_init().  kernfs_init() uses only small amount of memory and
        trying to handle and propagate kernfs_init() failure doesn't make
        much sense.  Use SLAB_PANIC for sysfs_dir_cachep and make
        sysfs_inode_init() panic on failure.
      
        After this change, kernfs_init() should be called before
        sysfs_init(), fs/namespace.c::mnt_init() modified accordingly.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4b93dc9b
  3. 27 11月, 2013 1 次提交
    • E
      vfs: Fix a regression in mounting proc · 41301ae7
      Eric W. Biederman 提交于
      Gao feng <gaofeng@cn.fujitsu.com> reported that commit
      e51db735
      userns: Better restrictions on when proc and sysfs can be mounted
      caused a regression on mounting a new instance of proc in a mount
      namespace created with user namespace privileges, when binfmt_misc
      is mounted on /proc/sys/fs/binfmt_misc.
      
      This is an unintended regression caused by the absolutely bogus empty
      directory check in fs_fully_visible.  The check fs_fully_visible replaced
      didn't even bother to attempt to verify proc was fully visible and
      hiding proc files with any kind of mount is rare.  So for now fix
      the userspace regression by allowing directory with nlink == 1
      as /proc/sys/fs/binfmt_misc has.
      
      I will have a better patch but it is not stable material, or
      last minute kernel material.  So it will have to wait.
      
      Cc: stable@vger.kernel.org
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NGao feng <gaofeng@cn.fujitsu.com>
      Tested-by: NGao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      41301ae7
  4. 09 11月, 2013 1 次提交
    • A
      RCU'd vfsmounts · 48a066e7
      Al Viro 提交于
      * RCU-delayed freeing of vfsmounts
      * vfsmount_lock replaced with a seqlock (mount_lock)
      * sequence number from mount_lock is stored in nameidata->m_seq and
      used when we exit RCU mode
      * new vfsmount flag - MNT_SYNC_UMOUNT.  Set by umount_tree() when its
      caller knows that vfsmount will have no surviving references.
      * synchronize_rcu() done between unlocking namespace_sem in namespace_unlock()
      and doing pending mntput().
      * new helper: legitimize_mnt(mnt, seq).  Checks the mount_lock sequence
      number against seq, then grabs reference to mnt.  Then it rechecks mount_lock
      again to close the race and either returns success or drops the reference it
      has acquired.  The subtle point is that in case of MNT_SYNC_UMOUNT we can
      simply decrement the refcount and sod off - aforementioned synchronize_rcu()
      makes sure that final mntput() won't come until we leave RCU mode.  We need
      that, since we don't want to end up with some lazy pathwalk racing with
      umount() and stealing the final mntput() from it - caller of umount() may
      expect it to return only once the fs is shut down and we don't want to break
      that.  In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do
      full-blown mntput() in case of mount_lock sequence number mismatch happening
      just as we'd grabbed the reference, but in those cases we won't be stealing
      the final mntput() from anything that would care.
      * mntput_no_expire() doesn't lock anything on the fast path now.  Incidentally,
      SMP and UP cases are handled the same way - no ifdefs there.
      * normal pathname resolution does *not* do any writes to mount_lock.  It does,
      of course, bump the refcounts of vfsmount and dentry in the very end, but that's
      it.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      48a066e7
  5. 25 10月, 2013 13 次提交
  6. 12 9月, 2013 1 次提交
    • R
      initmpfs: move rootfs code from fs/ramfs/ to init/ · 57f150a5
      Rob Landley 提交于
      When the rootfs code was a wrapper around ramfs, having them in the same
      file made sense.  Now that it can wrap another filesystem type, move it in
      with the init code instead.
      
      This also allows a subsequent patch to access rootfstype= command line
      arg.
      Signed-off-by: NRob Landley <rob@landley.net>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Stephen Warren <swarren@nvidia.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jim Cromie <jim.cromie@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      57f150a5
  7. 09 9月, 2013 1 次提交
  8. 06 9月, 2013 1 次提交
    • M
      vfs: check unlinked ancestors before mount · eed81007
      Miklos Szeredi 提交于
      We check submounts before doing d_drop() on a non-empty directory dentry in
      NFS (have_submounts()), but we do not exclude a racing mount.  Nor do we
      prevent mounts to be added to the disconnected subtree using relative paths
      after the d_drop().
      
      This patch fixes these issues by checking for unlinked (unhashed, non-root)
      ancestors before proceeding with the mount.  This is done with rename
      seqlock taken for write and with ->d_lock grabbed on each ancestor in turn,
      including our dentry itself.  This ensures that the only one of
      check_submounts_and_drop() or has_unlinked_ancestor() can succeed.
      Signed-off-by: NMiklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      eed81007
  9. 04 9月, 2013 1 次提交
    • J
      vfs: allow umount to handle mountpoints without revalidating them · 8033426e
      Jeff Layton 提交于
      Christopher reported a regression where he was unable to unmount a NFS
      filesystem where the root had gone stale. The problem is that
      d_revalidate handles the root of the filesystem differently from other
      dentries, but d_weak_revalidate does not. We could simply fix this by
      making d_weak_revalidate return success on IS_ROOT dentries, but there
      are cases where we do want to revalidate the root of the fs.
      
      A umount is really a special case. We generally aren't interested in
      anything but the dentry and vfsmount that's attached at that point. If
      the inode turns out to be stale we just don't care since the intent is
      to stop using it anyway.
      
      Try to handle this situation better by treating umount as a special
      case in the lookup code. Have it resolve the parent using normal
      means, and then do a lookup of the final dentry without revalidating
      it. In most cases, the final lookup will come out of the dcache, but
      the case where there's a trailing symlink or !LAST_NORM entry on the
      end complicates things a bit.
      
      Cc: Neil Brown <neilb@suse.de>
      Reported-by: NChristopher T Vogan <cvogan@us.ibm.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8033426e
  10. 31 8月, 2013 1 次提交
  11. 27 8月, 2013 2 次提交
    • E
      userns: Better restrictions on when proc and sysfs can be mounted · e51db735
      Eric W. Biederman 提交于
      Rely on the fact that another flavor of the filesystem is already
      mounted and do not rely on state in the user namespace.
      
      Verify that the mounted filesystem is not covered in any significant
      way.  I would love to verify that the previously mounted filesystem
      has no mounts on top but there are at least the directories
      /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
      for other filesystems to mount on top of.
      
      Refactor the test into a function named fs_fully_visible and call that
      function from the mount routines of proc and sysfs.  This makes this
      test local to the filesystems involved and the results current of when
      the mounts take place, removing a weird threading of the user
      namespace, the mount namespace and the filesystems themselves.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      e51db735
    • E
      vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces · 4ce5d2b1
      Eric W. Biederman 提交于
      Don't copy bind mounts of /proc/<pid>/ns/mnt between namespaces.
      These files hold references to a mount namespace and copying them
      between namespaces could result in a reference counting loop.
      
      The current mnt_ns_loop test prevents loops on the assumption that
      mounts don't cross between namespaces.  Unfortunately unsharing a
      mount namespace and shared substrees can both cause mounts to
      propogate between mount namespaces.
      
      Add two flags CL_COPY_UNBINDABLE and CL_COPY_MNT_NS_FILE are added to
      control this behavior, and CL_COPY_ALL is redefined as both of them.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      4ce5d2b1
  12. 25 8月, 2013 1 次提交
  13. 25 7月, 2013 1 次提交
    • E
      vfs: Lock in place mounts from more privileged users · 5ff9d8a6
      Eric W. Biederman 提交于
      When creating a less privileged mount namespace or propogating mounts
      from a more privileged to a less privileged mount namespace lock the
      submounts so they may not be unmounted individually in the child mount
      namespace revealing what is under them.
      
      This enforces the reasonable expectation that it is not possible to
      see under a mount point.  Most of the time mounts are on empty
      directories and revealing that does not matter, however I have seen an
      occassionaly sloppy configuration where there were interesting things
      concealed under a mount point that probably should not be revealed.
      
      Expirable submounts are not locked because they will eventually
      unmount automatically so whatever is under them already needs
      to be safe for unprivileged users to access.
      
      From a practical standpoint these restrictions do not appear to be
      significant for unprivileged users of the mount namespace.  Recursive
      bind mounts and pivot_root continues to work, and mounts that are
      created in a mount namespace may be unmounted there.  All of which
      means that the common idiom of keeping a directory of interesting
      files and using pivot_root to throw everything else away continues to
      work just fine.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      5ff9d8a6
  14. 05 5月, 2013 2 次提交
  15. 02 5月, 2013 1 次提交
  16. 10 4月, 2013 7 次提交
  17. 27 3月, 2013 2 次提交
    • E
      userns: Restrict when proc and sysfs can be mounted · 87a8ebd6
      Eric W. Biederman 提交于
      Only allow unprivileged mounts of proc and sysfs if they are already
      mounted when the user namespace is created.
      
      proc and sysfs are interesting because they have content that is
      per namespace, and so fresh mounts are needed when new namespaces
      are created while at the same time proc and sysfs have content that
      is shared between every instance.
      
      Respect the policy of who may see the shared content of proc and sysfs
      by only allowing new mounts if there was an existing mount at the time
      the user namespace was created.
      
      In practice there are only two interesting cases: proc and sysfs are
      mounted at their usual places, proc and sysfs are not mounted at all
      (some form of mount namespace jail).
      
      Cc: stable@vger.kernel.org
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      87a8ebd6
    • E
      vfs: Carefully propogate mounts across user namespaces · 132c94e3
      Eric W. Biederman 提交于
      As a matter of policy MNT_READONLY should not be changable if the
      original mounter had more privileges than creator of the mount
      namespace.
      
      Add the flag CL_UNPRIVILEGED to note when we are copying a mount from
      a mount namespace that requires more privileges to a mount namespace
      that requires fewer privileges.
      
      When the CL_UNPRIVILEGED flag is set cause clone_mnt to set MNT_NO_REMOUNT
      if any of the mnt flags that should never be changed are set.
      
      This protects both mount propagation and the initial creation of a less
      privileged mount namespace.
      
      Cc: stable@vger.kernel.org
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Reported-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      132c94e3