1. 27 3月, 2013 4 次提交
    • E
      userns: Restrict when proc and sysfs can be mounted · 87a8ebd6
      Eric W. Biederman 提交于
      Only allow unprivileged mounts of proc and sysfs if they are already
      mounted when the user namespace is created.
      
      proc and sysfs are interesting because they have content that is
      per namespace, and so fresh mounts are needed when new namespaces
      are created while at the same time proc and sysfs have content that
      is shared between every instance.
      
      Respect the policy of who may see the shared content of proc and sysfs
      by only allowing new mounts if there was an existing mount at the time
      the user namespace was created.
      
      In practice there are only two interesting cases: proc and sysfs are
      mounted at their usual places, proc and sysfs are not mounted at all
      (some form of mount namespace jail).
      
      Cc: stable@vger.kernel.org
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      87a8ebd6
    • E
      vfs: Carefully propogate mounts across user namespaces · 132c94e3
      Eric W. Biederman 提交于
      As a matter of policy MNT_READONLY should not be changable if the
      original mounter had more privileges than creator of the mount
      namespace.
      
      Add the flag CL_UNPRIVILEGED to note when we are copying a mount from
      a mount namespace that requires more privileges to a mount namespace
      that requires fewer privileges.
      
      When the CL_UNPRIVILEGED flag is set cause clone_mnt to set MNT_NO_REMOUNT
      if any of the mnt flags that should never be changed are set.
      
      This protects both mount propagation and the initial creation of a less
      privileged mount namespace.
      
      Cc: stable@vger.kernel.org
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Reported-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      132c94e3
    • E
      vfs: Add a mount flag to lock read only bind mounts · 90563b19
      Eric W. Biederman 提交于
      When a read-only bind mount is copied from mount namespace in a higher
      privileged user namespace to a mount namespace in a lesser privileged
      user namespace, it should not be possible to remove the the read-only
      restriction.
      
      Add a MNT_LOCK_READONLY mount flag to indicate that a mount must
      remain read-only.
      
      CC: stable@vger.kernel.org
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      90563b19
    • E
      userns: Don't allow creation if the user is chrooted · 3151527e
      Eric W. Biederman 提交于
      Guarantee that the policy of which files may be access that is
      established by setting the root directory will not be violated
      by user namespaces by verifying that the root directory points
      to the root of the mount namespace at the time of user namespace
      creation.
      
      Changing the root is a privileged operation, and as a matter of policy
      it serves to limit unprivileged processes to files below the current
      root directory.
      
      For reasons of simplicity and comprehensibility the privilege to
      change the root directory is gated solely on the CAP_SYS_CHROOT
      capability in the user namespace.  Therefore when creating a user
      namespace we must ensure that the policy of which files may be access
      can not be violated by changing the root directory.
      
      Anyone who runs a processes in a chroot and would like to use user
      namespace can setup the same view of filesystems with a mount
      namespace instead.  With this result that this is not a practical
      limitation for using user namespaces.
      
      Cc: stable@vger.kernel.org
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Reported-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      3151527e
  2. 16 3月, 2013 1 次提交
    • L
      Btrfs: fix warning of free_extent_map · 3b277594
      Liu Bo 提交于
      Users report that an extent map's list is still linked when it's actually
      going to be freed from cache.
      
      The story is that
      
      a) when we're going to drop an extent map and may split this large one into
      smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means
      that it's on the list to be logged, then the smaller ems split from it will also
      be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected.
      
      b) we'll keep ems from unlinking the list and freeing when they are flagged with
      EXTENT_FLAG_LOGGING, because the log code holds one reference.
      
      The end result is the warning, but the truth is that we set the flag
      EXTENT_FLAG_LOGGING only during fsync.
      
      So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one.
      Reported-by: NJohannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Reported-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      3b277594
  3. 15 3月, 2013 6 次提交
  4. 14 3月, 2013 2 次提交
  5. 13 3月, 2013 3 次提交
    • J
      ext2: Fix BUG_ON in evict() on inode deletion · c288d296
      Jan Kara 提交于
      Commit 8e3dffc6 introduced a regression where deleting inode with
      large extended attributes leads to triggering
        BUG_ON(inode->i_state != (I_FREEING | I_CLEAR))
      in fs/inode.c:evict(). That happens because freeing of xattr block
      dirtied the inode and it happened after clear_inode() has been called.
      
      Fix the issue by moving removal of xattr block into ext2_evict_inode()
      before clear_inode() call close to a place where data blocks are
      truncated. That is also more logical place and removes surprising
      requirement that ext2_free_blocks() mustn't dirty the inode.
      Reported-by: NTyler Hicks <tyhicks@canonical.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      c288d296
    • E
      fs: Readd the fs module aliases. · fa7614dd
      Eric W. Biederman 提交于
      I had assumed that the only use of module aliases for filesystems
      prior to "fs: Limit sys_mount to only request filesystem modules."
      was in request_module.  It turns out I was wrong.  At least mkinitcpio
      in Arch linux uses these aliases.
      
      So readd the preexising aliases, to keep from breaking userspace.
      
      Userspace eventually will have to follow and use the same aliases the
      kernel does.  So at some point we may be delete these aliases without
      problems.  However that day is not today.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      fa7614dd
    • M
      Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys · 8aec0f5d
      Mathieu Desnoyers 提交于
      Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to
      compat_process_vm_rw() shows that the compatibility code requires an
      explicit "access_ok()" check before calling
      compat_rw_copy_check_uvector(). The same difference seems to appear when
      we compare fs/read_write.c:do_readv_writev() to
      fs/compat.c:compat_do_readv_writev().
      
      This subtle difference between the compat and non-compat requirements
      should probably be debated, as it seems to be error-prone. In fact,
      there are two others sites that use this function in the Linux kernel,
      and they both seem to get it wrong:
      
      Now shifting our attention to fs/aio.c, we see that aio_setup_iocb()
      also ends up calling compat_rw_copy_check_uvector() through
      aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to
      be missing. Same situation for
      security/keys/compat.c:compat_keyctl_instantiate_key_iov().
      
      I propose that we add the access_ok() check directly into
      compat_rw_copy_check_uvector(), so callers don't have to worry about it,
      and it therefore makes the compat call code similar to its non-compat
      counterpart. Place the access_ok() check in the same location where
      copy_from_user() can trigger a -EFAULT error in the non-compat code, so
      the ABI behaviors are alike on both compat and non-compat.
      
      While we are here, fix compat_do_readv_writev() so it checks for
      compat_rw_copy_check_uvector() negative return values.
      
      And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error
      handling.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8aec0f5d
  6. 12 3月, 2013 4 次提交
  7. 11 3月, 2013 2 次提交
  8. 09 3月, 2013 2 次提交
  9. 07 3月, 2013 8 次提交
  10. 06 3月, 2013 1 次提交
  11. 05 3月, 2013 7 次提交