1. 09 Sep, 2013 1 commit
  2. 06 Sep, 2013 1 commit
    • M
      vfs: check unlinked ancestors before mount · eed81007
      Miklos Szeredi committed
      We check submounts before doing d_drop() on a non-empty directory dentry in
      NFS (have_submounts()), but we do not exclude a racing mount.  Nor do we
      prevent mounts from being added to the disconnected subtree using relative
      paths after the d_drop().
      
      This patch fixes these issues by checking for unlinked (unhashed, non-root)
      ancestors before proceeding with the mount.  This is done with rename
      seqlock taken for write and with ->d_lock grabbed on each ancestor in turn,
      including our dentry itself.  This ensures that only one of
      check_submounts_and_drop() or has_unlinked_ancestor() can succeed.
      Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      eed81007
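The ancestor walk described in eed81007 can be illustrated with a small userspace model. This is only a sketch: `struct toy_dentry`, `toy_is_root()` and `toy_has_unlinked_ancestor()` are hypothetical names, and the rename seqlock and ->d_lock handling that the commit relies on are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct dentry (hypothetical model). */
struct toy_dentry {
    struct toy_dentry *parent; /* a root dentry points to itself */
    bool hashed;               /* false once the dentry has been dropped */
};

static bool toy_is_root(const struct toy_dentry *d)
{
    return d->parent == d;
}

/*
 * Walk from d up to the root, d itself included.
 * Returns true if any dentry on the way is unhashed and not a root.
 * The real code holds locks during this walk; none are modelled here.
 */
static bool toy_has_unlinked_ancestor(const struct toy_dentry *d)
{
    for (;; d = d->parent) {
        if (toy_is_root(d))
            return false;   /* reached the root without finding one */
        if (!d->hashed)
            return true;    /* unhashed, non-root: unlinked */
    }
}
```

In this model a mount attempt would be refused whenever `toy_has_unlinked_ancestor()` returns true for the intended mountpoint.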
  3. 04 Sep, 2013 1 commit
    • J
      vfs: allow umount to handle mountpoints without revalidating them · 8033426e
      Jeff Layton committed
      Christopher reported a regression where he was unable to unmount an NFS
      filesystem where the root had gone stale. The problem is that
      d_revalidate handles the root of the filesystem differently from other
      dentries, but d_weak_revalidate does not. We could simply fix this by
      making d_weak_revalidate return success on IS_ROOT dentries, but there
      are cases where we do want to revalidate the root of the fs.
      
      A umount is really a special case. We generally aren't interested in
      anything but the dentry and vfsmount that's attached at that point. If
      the inode turns out to be stale we just don't care since the intent is
      to stop using it anyway.
      
      Try to handle this situation better by treating umount as a special
      case in the lookup code. Have it resolve the parent using normal
      means, and then do a lookup of the final dentry without revalidating
      it. In most cases, the final lookup will come out of the dcache, but
      the case where there's a trailing symlink or !LAST_NORM entry on the
      end complicates things a bit.
      
      Cc: Neil Brown <neilb@suse.de>
      Reported-by: Christopher T Vogan <cvogan@us.ibm.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      8033426e
  4. 31 Aug, 2013 1 commit
  5. 27 Aug, 2013 2 commits
    • E
      userns: Better restrictions on when proc and sysfs can be mounted · e51db735
      Eric W. Biederman committed
      Rely on the fact that another flavor of the filesystem is already
      mounted and do not rely on state in the user namespace.
      
      Verify that the mounted filesystem is not covered in any significant
      way.  I would love to verify that the previously mounted filesystem
      has no mounts on top but there are at least the directories
      /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
      for other filesystems to mount on top of.
      
      Refactor the test into a function named fs_fully_visible and call that
      function from the mount routines of proc and sysfs.  This makes this
      test local to the filesystems involved and the results current as of when
      the mounts take place, removing a weird threading of the user
      namespace, the mount namespace and the filesystems themselves.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      e51db735
    • E
      vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces · 4ce5d2b1
      Eric W. Biederman committed
      Don't copy bind mounts of /proc/<pid>/ns/mnt between namespaces.
      These files hold references to a mount namespace and copying them
      between namespaces could result in a reference counting loop.
      
      The current mnt_ns_loop test prevents loops on the assumption that
      mounts don't cross between namespaces.  Unfortunately unsharing a
      mount namespace and shared subtrees can both cause mounts to
      propagate between mount namespaces.
      
      Two flags, CL_COPY_UNBINDABLE and CL_COPY_MNT_NS_FILE, are added to
      control this behavior, and CL_COPY_ALL is redefined as both of them.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      4ce5d2b1
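The flag arrangement described in 4ce5d2b1 can be sketched as follows. The flag names come from the commit message; the bit values, `struct toy_mount` and `toy_may_copy()` are assumptions made for illustration and are not taken from the kernel source.

```c
#include <assert.h>

/* Bit values are illustrative assumptions, not the kernel's definitions. */
#define CL_COPY_UNBINDABLE  0x01u
#define CL_COPY_MNT_NS_FILE 0x02u
/* CL_COPY_ALL is simply both copy flags together. */
#define CL_COPY_ALL (CL_COPY_UNBINDABLE | CL_COPY_MNT_NS_FILE)

/* Hypothetical model of the two mount properties the flags control. */
struct toy_mount {
    int unbindable;     /* mount is marked unbindable */
    int is_mnt_ns_file; /* mount is a bind of a /proc/<pid>/ns/mnt file */
};

/* Returns 1 if the copy may proceed under the given flags, 0 otherwise. */
static int toy_may_copy(const struct toy_mount *m, unsigned int flags)
{
    if (m->unbindable && !(flags & CL_COPY_UNBINDABLE))
        return 0;
    if (m->is_mnt_ns_file && !(flags & CL_COPY_MNT_NS_FILE))
        return 0;
    return 1;
}
```

A caller copying a tree into a different mount namespace would, in this model, leave out `CL_COPY_MNT_NS_FILE` so that namespace-file bind mounts are skipped.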
  6. 25 Aug, 2013 1 commit
  7. 25 Jul, 2013 1 commit
    • E
      vfs: Lock in place mounts from more privileged users · 5ff9d8a6
      Eric W. Biederman committed
      When creating a less privileged mount namespace or propagating mounts
      from a more privileged to a less privileged mount namespace, lock the
      submounts so they may not be unmounted individually in the child mount
      namespace, revealing what is under them.
      
      This enforces the reasonable expectation that it is not possible to
      see under a mount point.  Most of the time mounts are on empty
      directories and revealing that does not matter; however, I have seen an
      occasional sloppy configuration where there were interesting things
      concealed under a mount point that probably should not be revealed.
      
      Expirable submounts are not locked because they will eventually
      unmount automatically so whatever is under them already needs
      to be safe for unprivileged users to access.
      
      From a practical standpoint these restrictions do not appear to be
      significant for unprivileged users of the mount namespace.  Recursive
      bind mounts and pivot_root continue to work, and mounts that are
      created in a mount namespace may be unmounted there.  All of which
      means that the common idiom of keeping a directory of interesting
      files and using pivot_root to throw everything else away continues to
      work just fine.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      5ff9d8a6
  8. 05 May, 2013 2 commits
  9. 02 May, 2013 1 commit
  10. 10 Apr, 2013 7 commits
  11. 27 Mar, 2013 4 commits
    • E
      userns: Restrict when proc and sysfs can be mounted · 87a8ebd6
      Eric W. Biederman committed
      Only allow unprivileged mounts of proc and sysfs if they are already
      mounted when the user namespace is created.
      
      proc and sysfs are interesting because they have content that is
      per namespace, and so fresh mounts are needed when new namespaces
      are created while at the same time proc and sysfs have content that
      is shared between every instance.
      
      Respect the policy of who may see the shared content of proc and sysfs
      by only allowing new mounts if there was an existing mount at the time
      the user namespace was created.
      
      In practice there are only two interesting cases: proc and sysfs are
      mounted at their usual places, or proc and sysfs are not mounted at all
      (some form of mount namespace jail).
      
      Cc: stable@vger.kernel.org
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      87a8ebd6
    • E
      vfs: Carefully propogate mounts across user namespaces · 132c94e3
      Eric W. Biederman committed
      As a matter of policy MNT_READONLY should not be changeable if the
      original mounter had more privileges than the creator of the mount
      namespace.
      
      Add the flag CL_UNPRIVILEGED to note when we are copying a mount from
      a mount namespace that requires more privileges to a mount namespace
      that requires fewer privileges.
      
      When the CL_UNPRIVILEGED flag is set, cause clone_mnt to set MNT_NO_REMOUNT
      if any of the mnt flags that should never be changed are set.
      
      This protects both mount propagation and the initial creation of a less
      privileged mount namespace.
      
      Cc: stable@vger.kernel.org
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      132c94e3
    • E
      vfs: Add a mount flag to lock read only bind mounts · 90563b19
      Eric W. Biederman committed
      When a read-only bind mount is copied from a mount namespace in a higher
      privileged user namespace to a mount namespace in a lesser privileged
      user namespace, it should not be possible to remove the read-only
      restriction.
      
      Add a MNT_LOCK_READONLY mount flag to indicate that a mount must
      remain read-only.
      
      Cc: stable@vger.kernel.org
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      90563b19
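How CL_UNPRIVILEGED and MNT_LOCK_READONLY might fit together can be sketched in a userspace model. The flag names MNT_READONLY, MNT_LOCK_READONLY and CL_UNPRIVILEGED appear in the commit messages above; the bit values and the `toy_*` helpers are assumptions. The 132c94e3 message names MNT_NO_REMOUNT as the flag being set, so using MNT_LOCK_READONLY as the lock bit is itself an assumption of this sketch.

```c
#include <assert.h>

/* Bit values are illustrative assumptions, not the kernel's definitions. */
#define MNT_READONLY      0x01u
#define MNT_LOCK_READONLY 0x02u
#define CL_UNPRIVILEGED   0x01u

/*
 * Hypothetical model of the flag handling when a mount is cloned:
 * a read-only mount copied into a less privileged mount namespace
 * gets its read-only state locked.
 */
static unsigned int toy_clone_mnt_flags(unsigned int mnt_flags,
                                        unsigned int clone_flags)
{
    if ((clone_flags & CL_UNPRIVILEGED) && (mnt_flags & MNT_READONLY))
        mnt_flags |= MNT_LOCK_READONLY;
    return mnt_flags;
}

/*
 * Hypothetical remount helper: clearing MNT_READONLY is refused
 * (returns -1) when the read-only state is locked, otherwise the
 * new flags are stored and 0 is returned.
 */
static int toy_remount(unsigned int *mnt_flags, int want_readonly)
{
    if ((*mnt_flags & MNT_LOCK_READONLY) && !want_readonly)
        return -1;
    if (want_readonly)
        *mnt_flags |= MNT_READONLY;
    else
        *mnt_flags &= ~MNT_READONLY;
    return 0;
}
```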
    • E
      userns: Don't allow creation if the user is chrooted · 3151527e
      Eric W. Biederman committed
      Guarantee that the policy of which files may be accessed that is
      established by setting the root directory will not be violated
      by user namespaces by verifying that the root directory points
      to the root of the mount namespace at the time of user namespace
      creation.
      
      Changing the root is a privileged operation, and as a matter of policy
      it serves to limit unprivileged processes to files below the current
      root directory.
      
      For reasons of simplicity and comprehensibility the privilege to
      change the root directory is gated solely on the CAP_SYS_CHROOT
      capability in the user namespace.  Therefore when creating a user
      namespace we must ensure that the policy of which files may be accessed
      cannot be violated by changing the root directory.
      
      Anyone who runs a process in a chroot and would like to use user
      namespaces can set up the same view of filesystems with a mount
      namespace instead.  As a result, this is not a practical
      limitation for using user namespaces.
      
      Cc: stable@vger.kernel.org
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      3151527e
  12. 23 Feb, 2013 3 commits
  13. 21 Dec, 2012 1 commit
  14. 15 Dec, 2012 1 commit
    • E
      userns: Require CAP_SYS_ADMIN for most uses of setns. · 5e4a0847
      Eric W. Biederman committed
      Andy Lutomirski <luto@amacapital.net> found a nasty little bug in
      the permissions of setns.  With unprivileged user namespaces it
      became possible to create new namespaces without privilege.
      
      However the setns calls were relaxed to only require CAP_SYS_ADMIN in
      the user namespace of the target namespace.
      
      Which made the following nasty sequence possible.
      
      pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
      if (pid == 0) { /* child */
      	system("mount --bind /home/me/passwd /etc/passwd");
      }
      else if (pid != 0) { /* parent */
      	char path[PATH_MAX];
      	snprintf(path, sizeof(path), "/proc/%d/ns/mnt", (int)pid);
      	fd = open(path, O_RDONLY);
      	setns(fd, 0);
      	system("su -");
      }
      
      Prevent this possibility by requiring CAP_SYS_ADMIN
      in the current user namespace when joining all but the user namespace.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      5e4a0847
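The permission rule from 5e4a0847 can be modelled in a few lines. This is a sketch under stated assumptions: `enum toy_ns_kind`, `struct toy_caps` and `toy_setns_allowed()` are hypothetical, and the two booleans stand in for the kernel's capability checks against the target and the current user namespace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the namespace being joined. */
enum toy_ns_kind { TOY_NS_USER, TOY_NS_MNT, TOY_NS_OTHER };

/* Hypothetical summary of the caller's CAP_SYS_ADMIN status. */
struct toy_caps {
    bool admin_in_target_userns;  /* in the user ns owning the target ns */
    bool admin_in_current_userns; /* in the caller's own user ns */
};

/*
 * Rule after the fix: joining any namespace other than a user namespace
 * also requires CAP_SYS_ADMIN in the caller's current user namespace.
 */
static bool toy_setns_allowed(enum toy_ns_kind kind, const struct toy_caps *c)
{
    if (!c->admin_in_target_userns)
        return false;
    if (kind == TOY_NS_USER)
        return true;
    return c->admin_in_current_userns;
}
```

In the sequence from the commit message, the parent has CAP_SYS_ADMIN only in the child's new user namespace, so in this model the attempt to join the child's mount namespace is refused.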
  15. 20 Nov, 2012 1 commit
    • E
      proc: Usable inode numbers for the namespace file descriptors. · 98f842e6
      Eric W. Biederman committed
      Assign a unique proc inode to each namespace, and use that
      inode number to ensure we only allocate at most one proc
      inode for every namespace in proc.
      
      A single proc inode per namespace allows userspace to test
      to see if two processes are in the same namespace.
      
      This has been a long requested feature and only blocked because
      a naive implementation would put the id in a global space and
      would ultimately require having a namespace for the names of
      namespaces, making migration and certain virtualization tricks
      impossible.
      
      We still don't have per superblock inode numbers for proc, which
      appears necessary for application unaware checkpoint/restart and
      migrations (if the application is using namespace file descriptors)
      but that is now allowed by the design if it becomes important.
      
      I have preallocated the ipc and uts initial proc inode numbers so
      their structures can be statically initialized.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      98f842e6
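With one proc inode per namespace, the userspace test mentioned in 98f842e6 comes down to comparing the device and inode numbers of two namespace files. The sketch below uses only POSIX stat(); the helper names `same_file_identity()` and `same_namespace()` are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Two stat results name the same object if device and inode both match. */
static bool same_file_identity(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/*
 * Compare two paths, e.g. two /proc/<pid>/ns/<type> files.
 * Returns 1 if they refer to the same object, 0 if they differ,
 * and -1 if either path cannot be stat()ed.
 */
static int same_namespace(const char *path_a, const char *path_b)
{
    struct stat a, b;

    if (stat(path_a, &a) != 0 || stat(path_b, &b) != 0)
        return -1;
    return same_file_identity(&a, &b) ? 1 : 0;
}
```

A caller might compare `/proc/self/ns/net` with the same file under another process's /proc directory; reading another process's namespace files can fail for permission reasons, which this helper reports as -1.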
  16. 19 Nov, 2012 5 commits
  17. 13 Oct, 2012 1 commit
    • J
      vfs: define struct filename and have getname() return it · 91a27b2a
      Jeff Layton committed
      getname() is intended to copy pathname strings from userspace into a
      kernel buffer. The result is just a string in kernel space. It would
      however be quite helpful to be able to attach some ancillary info to
      the string.
      
      For instance, we could attach some audit-related info to reduce the
      amount of audit-related processing needed. When auditing is enabled,
      we could also call getname() on the string more than once and not
      need to recopy it from userspace.
      
      This patchset converts the getname()/putname() interfaces to return
      a struct instead of a string. For now, the struct just tracks the
      string in kernel space and the original userland pointer for it.
      
      Later, we'll add other information to the struct as it becomes
      convenient.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      91a27b2a
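The idea in 91a27b2a, returning a small struct that carries both the copied string and the original pointer, can be shown with a userspace analogue. Everything here is hypothetical: `struct toy_filename`, `toy_getname()` and `toy_putname()` only mimic the shape described in the message, and an ordinary pointer stands in for the userland pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace analogue of the struct described in the commit message. */
struct toy_filename {
    char *name;       /* private copy, stands in for the kernel-space string */
    const char *uptr; /* caller's original pointer, stands in for userland */
};

/* Copy the path and remember where it came from. Returns NULL on failure. */
static struct toy_filename *toy_getname(const char *user_path)
{
    struct toy_filename *f = malloc(sizeof(*f));
    size_t len = strlen(user_path) + 1;

    if (!f)
        return NULL;
    f->name = malloc(len);
    if (!f->name) {
        free(f);
        return NULL;
    }
    memcpy(f->name, user_path, len);
    f->uptr = user_path;
    return f;
}

/* Release both the copy and the struct; NULL is accepted. */
static void toy_putname(struct toy_filename *f)
{
    if (!f)
        return;
    free(f->name);
    free(f);
}
```

Keeping the original pointer next to the copy is what would let later code recognise a path it has already copied, which is the reuse the message anticipates for auditing.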
  18. 12 Oct, 2012 1 commit
  19. 23 Sep, 2012 1 commit
    • A
      do_add_mount()/umount -l races · 156cacb1
      Al Viro committed
      Normally we deal with lock_mount()/umount races by checking that the
      mountpoint-to-be is still in our namespace after lock_mount() has
      been done.  However, do_add_mount() skips that check when called
      with MNT_SHRINKABLE in flags (i.e. from finish_automount()).  The
      reason is that ->mnt_ns may be a temporary namespace created exactly
      to contain automounts a-la NFS4 referral handling.  It's not the
      namespace of the caller, though, so check_mnt() would fail here.
      We still need to check that ->mnt_ns is non-NULL in that case,
      though.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      156cacb1
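The check described in 156cacb1 can be sketched with a toy model. The names MNT_SHRINKABLE, check_mnt() and ->mnt_ns come from the commit message; the bit value, the structs and `toy_may_add_mount()` are assumptions, and no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit value, assumed for this sketch. */
#define MNT_SHRINKABLE 0x100u

struct toy_ns { int id; };
struct toy_mnt { struct toy_ns *mnt_ns; /* NULL once detached */ };

/* Stand-in for check_mnt(): is the mount in the caller's namespace? */
static bool toy_check_mnt(const struct toy_mnt *m, const struct toy_ns *caller)
{
    return m->mnt_ns == caller;
}

/*
 * Returns 0 if the mount may be added on top of parent, -1 otherwise.
 * Automounts (MNT_SHRINKABLE) skip the namespace comparison but must
 * still see a non-NULL ->mnt_ns, i.e. a parent that was not unmounted.
 */
static int toy_may_add_mount(const struct toy_mnt *parent,
                             const struct toy_ns *caller,
                             unsigned int flags)
{
    if (flags & MNT_SHRINKABLE) {
        if (parent->mnt_ns == NULL)
            return -1;
    } else if (!toy_check_mnt(parent, caller)) {
        return -1;
    }
    return 0;
}
```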
  20. 31 Jul, 2012 1 commit
  21. 14 Jul, 2012 3 commits