1. 12 Feb, 2013: 3 commits
    • ceph: Translate inode uid and gid attributes to/from kuids and kgids. · ab871b90
      Eric W. Biederman committed
      - In fill_inode(), translate uids and gids in the initial user namespace
        into kuids and kgids stored in inode->i_uid and inode->i_gid.
      
      - In ceph_setattr() if they have changed convert inode->i_uid and
        inode->i_gid into initial user namespace uids and gids for
        transmission.
      
      Cc: Sage Weil <sage@inktank.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
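      A minimal userspace sketch of the translation these ceph commits perform. In the kernel the conversions are done with make_kuid(&init_user_ns, ...) and from_kuid(&init_user_ns, ...). The struct uid_extent, kuid_model_t, map_down() and map_up() below are hypothetical stand-ins written for illustration and are not the kernel's implementation. The sketch assumes the initial user namespace uses the identity map "0 0 4294967295".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one uid_map extent: ids [first, first + count)
 * inside a namespace correspond to [lower_first, lower_first + count)
 * in the kernel-internal id space. */
struct uid_extent {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

/* Wrapper struct in the spirit of kuid_t: a raw uid cannot be passed
 * where an internal id is expected without an explicit conversion. */
typedef struct { uint32_t val; } kuid_model_t;

#define INVALID_ID ((uint32_t)-1)

/* Namespace uid -> internal id (plays the role of make_kuid()). */
static kuid_model_t map_down(const struct uid_extent *map, size_t n, uint32_t id)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Unsigned subtraction keeps the bounds check overflow-free. */
        if (id >= map[i].first && id - map[i].first < map[i].count)
            return (kuid_model_t){ map[i].lower_first + (id - map[i].first) };
    }
    return (kuid_model_t){ INVALID_ID };
}

/* Internal id -> namespace uid (plays the role of from_kuid()). */
static uint32_t map_up(const struct uid_extent *map, size_t n, kuid_model_t kuid)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (kuid.val >= map[i].lower_first &&
            kuid.val - map[i].lower_first < map[i].count)
            return map[i].first + (kuid.val - map[i].lower_first);
    }
    return INVALID_ID;
}
```

      With the identity map { 0, 0, 4294967295 }, map_down() turns uid 1000 into an internal id of 1000, and map_up() turns it back, so for the initial namespace the values sent to the MDS are unchanged. An id outside every extent comes back as INVALID_ID instead of being silently passed through.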
    • ceph: Translate between uid and gids in cap messages and kuids and kgids · 05cb11c1
      Eric W. Biederman committed
      - Make the uid and gid arguments of send_cap_msg() used to compose
        ceph_mds_caps messages of type kuid_t and kgid_t.
      
      - Pass inode->i_uid and inode->i_gid in __send_cap to send_cap_msg()
        through variables of type kuid_t and kgid_t.
      
      - Modify struct ceph_cap_snap to store uids and gids in types kuid_t
        and kgid_t.  This allows capturing inode->i_uid and inode->i_gid in
        ceph_queue_cap_snap() without loss and passing them to
        __ceph_flush_snaps() where they are removed from struct
        ceph_cap_snap and passed to send_cap_msg().
      
      - In handle_cap_grant(), translate uids and gids in the initial user
        namespace stored in struct ceph_mds_caps into kuids and kgids
        before setting inode->i_uid and inode->i_gid.
      
      Cc: Sage Weil <sage@inktank.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • ceph: Only allow mounts in the initial network namespace · eea553c2
      Eric W. Biederman committed
      Today ceph opens TCP sockets from a delayed work callback. Delayed
      work runs in kernel threads, which are always in the initial
      network namespace. Therefore fail early if someone attempts
      to mount a ceph filesystem from any network namespace other than
      the initial one.
      
      Cc: Sage Weil <sage@inktank.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  2. 28 Jan, 2013: 1 commit
  3. 27 Jan, 2013: 6 commits
    • userns: Allow the userns root to mount tmpfs. · 2b8576cb
      Eric W. Biederman committed
      There is no backing store to tmpfs and file creation rules are the
      same as for any other filesystem so it is semantically safe to allow
      unprivileged users to mount it.  ramfs is safe for the same reasons so
      allow either flavor of tmpfs to be mounted by a user namespace root
      user.
      
      The memory control group successfully limits how much memory tmpfs can
      consume; on any system that cares about a user namespace root using
      tmpfs to exhaust memory, the memory control group can be deployed.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • userns: Allow the userns root to mount ramfs. · b3c6761d
      Eric W. Biederman committed
      There is no backing store to ramfs and file creation
      rules are the same as for any other filesystem so
      it is semantically safe to allow unprivileged users
      to mount it.
      
      The memory control group successfully limits how much
      memory ramfs can consume; on any system that cares about
      a user namespace root using ramfs to exhaust memory,
      the memory control group can be deployed.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • userns: Allow the userns root to mount devpts · ec2aa8e8
      Eric W. Biederman committed
      - The context in which devpts is mounted has no effect on the creation
        of ptys as the /dev/ptmx interface has been used by unprivileged
        users for many years.
      
      - Only support unprivileged mounts in combination with the newinstance
        option to ensure that mounting of /dev/pts in a user namespace will
        not allow the options of an existing mount of devpts to be modified.
      
      - Create /dev/pts/ptmx as the root user in the user namespace that
        mounts devpts so that its permissions can be changed.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • userns: Recommend use of memory control groups. · e11f0ae3
      Eric W. Biederman committed
      In the help text describing user namespaces, recommend the use of memory
      control groups. In many cases memory control groups are the only
      mechanism available to limit how much memory a user who can create
      user namespaces can use.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • userns: Allow any uid or gid mappings that don't overlap. · 0bd14b4f
      Eric W. Biederman committed
      When I initially wrote the code for /proc/<pid>/uid_map, I was lazy
      and avoided duplicate mappings by the simple expedient of ensuring the
      first number in a new extent was greater than any number in the
      previous extent.
      
      Unfortunately that precludes a number of valid mappings, and someone
      noticed and complained.  So use a simple check to ensure that ranges
      in the mapping extents don't overlap.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
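      A minimal sketch of a non-overlap check like the one this commit describes, written as standalone C. It assumes each extent is a half-open range [start, start + count) and that a new extent is accepted only if neither its namespace-side range nor its parent-side range overlaps an extent already accepted. The names struct id_extent, ranges_overlap() and extent_is_acceptable() are hypothetical, and rejecting empty extents is an extra assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct id_extent {
    uint32_t first;        /* start of the range inside the namespace */
    uint32_t lower_first;  /* start of the range in the parent namespace */
    uint32_t count;        /* number of ids in the range */
};

/* [a, a + alen) and [b, b + blen) overlap iff each starts before the other
 * ends. 64-bit ends avoid overflow when start + count exceeds UINT32_MAX. */
static bool ranges_overlap(uint32_t a, uint32_t alen, uint32_t b, uint32_t blen)
{
    uint64_t a_end = (uint64_t)a + alen;
    uint64_t b_end = (uint64_t)b + blen;
    return (uint64_t)a < b_end && (uint64_t)b < a_end;
}

/* Returns true if new_ext may be added to the n extents already accepted.
 * Unlike a "must be greater than the previous extent" rule, extents may
 * appear in any order as long as no ranges overlap. */
static bool extent_is_acceptable(const struct id_extent *exts, size_t n,
                                 const struct id_extent *new_ext)
{
    assert(new_ext != NULL);
    if (new_ext->count == 0)   /* assumption of this sketch: no empty extents */
        return false;
    for (size_t i = 0; i < n; i++) {
        if (ranges_overlap(exts[i].first, exts[i].count,
                           new_ext->first, new_ext->count))
            return false;
        if (ranges_overlap(exts[i].lower_first, exts[i].count,
                           new_ext->lower_first, new_ext->count))
            return false;
    }
    return true;
}
```

      For example, with an existing extent { 100, 2000, 10 }, the extent { 0, 1000, 10 } is accepted even though its first id is lower, which is exactly the kind of valid mapping the old ordering rule rejected.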
    • userns: Avoid recursion in put_user_ns · c61a2810
      Eric W. Biederman committed
      When freeing a deeply nested user namespace, free_user_ns calls
      put_user_ns on its parent, which may in turn call free_user_ns again.
      When -fno-optimize-sibling-calls is passed to gcc one stack frame per
      user namespace is left on the stack, potentially overflowing the
      kernel stack.  CONFIG_FRAME_POINTER forces -fno-optimize-sibling-calls
      so we can't count on gcc to optimize this code.
      
      Remove struct kref and use a plain atomic_t, making the code more
      flexible and easier to comprehend. Make the loop in free_user_ns
      explicit to guarantee that the stack does not overflow with
      CONFIG_FRAME_POINTER enabled.
      
      I have tested this fix with a simple program that uses unshare to
      create a deeply nested user namespace structure and then calls exit.
      With 1000 nested user namespaces, running my test program before this
      change causes the kernel to die a horrible death. With 10,000,000
      nested user namespaces, my test program runs to completion after this
      change and causes no harm.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Pointed-out-by: Vasily Kulikov <segoon@openwall.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
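      A userspace sketch of the idea behind this fix: a plain atomic counter plus an explicit loop, so freeing a long chain of nested objects uses constant stack depth instead of one frame per level. struct ns, ns_create(), put_ns() and free_ns_chain() are hypothetical stand-ins for struct user_namespace, put_user_ns() and free_user_ns(), and the parent-reference convention (a child owns one reference on its parent) is an assumption of this model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a user namespace: each one holds a reference
 * on its parent. */
struct ns {
    atomic_int count;
    struct ns *parent;
};

static int freed_count; /* number of objects freed, for checking */

/* The new child takes over the caller's reference on parent. */
static struct ns *ns_create(struct ns *parent)
{
    struct ns *n = malloc(sizeof(*n));
    if (!n)
        return NULL;
    atomic_init(&n->count, 1);
    n->parent = parent;
    return n;
}

/* Free n, then walk up iteratively: dropping a child drops one reference
 * on its parent, and if that was the parent's last reference the loop
 * frees the parent too. No recursion, so depth does not grow the stack. */
static void free_ns_chain(struct ns *n)
{
    assert(n != NULL);
    do {
        struct ns *parent = n->parent;
        free(n);
        freed_count++;
        n = parent;
    } while (n && atomic_fetch_sub(&n->count, 1) == 1);
}

static void put_ns(struct ns *n)
{
    /* atomic_fetch_sub returns the old value: 1 means this was the last ref. */
    if (n && atomic_fetch_sub(&n->count, 1) == 1)
        free_ns_chain(n);
}
```

      Dropping the only reference to the innermost object of a 100,000-deep chain frees every level in one loop; a recursive version would need roughly one stack frame per level when the compiler does not turn the call into a sibling call.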
  4. 27 Dec, 2012: 1 commit
  5. 26 Dec, 2012: 3 commits
    • f2fs: Don't assign e_id in f2fs_acl_from_disk · 48c6d121
      Eric W. Biederman committed
      With user namespaces enabled building f2fs fails with:
      
       CC      fs/f2fs/acl.o
      fs/f2fs/acl.c: In function ‘f2fs_acl_from_disk’:
      fs/f2fs/acl.c:85:21: error: ‘struct posix_acl_entry’ has no member named ‘e_id’
      make[2]: *** [fs/f2fs/acl.o] Error 1
      make[2]: Target `__build' not remade because of errors.
      
      e_id is a backwards compatibility field used only by file systems
      that haven't been converted to use kuids and kgids. When the POSIX
      ACL tag field is neither ACL_USER nor ACL_GROUP, assigning e_id is
      unnecessary. Remove the assignment so f2fs will build with user
      namespaces enabled.
      
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Amit Sahrawat <a.sahrawat@samsung.com>
      Acked-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • proc: Allow proc_free_inum to be called from any context · dfb2ea45
      Eric W. Biederman committed
      While testing the pid namespace code I hit this nasty warning.
      
      [  176.262617] ------------[ cut here ]------------
      [  176.263388] WARNING: at /home/eric/projects/linux/linux-userns-devel/kernel/softirq.c:160 local_bh_enable_ip+0x7a/0xa0()
      [  176.265145] Hardware name: Bochs
      [  176.265677] Modules linked in:
      [  176.266341] Pid: 742, comm: bash Not tainted 3.7.0userns+ #18
      [  176.266564] Call Trace:
      [  176.266564]  [<ffffffff810a539f>] warn_slowpath_common+0x7f/0xc0
      [  176.266564]  [<ffffffff810a53fa>] warn_slowpath_null+0x1a/0x20
      [  176.266564]  [<ffffffff810ad9ea>] local_bh_enable_ip+0x7a/0xa0
      [  176.266564]  [<ffffffff819308c9>] _raw_spin_unlock_bh+0x19/0x20
      [  176.266564]  [<ffffffff8123dbda>] proc_free_inum+0x3a/0x50
      [  176.266564]  [<ffffffff8111d0dc>] free_pid_ns+0x1c/0x80
      [  176.266564]  [<ffffffff8111d195>] put_pid_ns+0x35/0x50
      [  176.266564]  [<ffffffff810c608a>] put_pid+0x4a/0x60
      [  176.266564]  [<ffffffff8146b177>] tty_ioctl+0x717/0xc10
      [  176.266564]  [<ffffffff810aa4d5>] ? wait_consider_task+0x855/0xb90
      [  176.266564]  [<ffffffff81086bf9>] ? default_spin_lock_flags+0x9/0x10
      [  176.266564]  [<ffffffff810cab0a>] ? remove_wait_queue+0x5a/0x70
      [  176.266564]  [<ffffffff811e37e8>] do_vfs_ioctl+0x98/0x550
      [  176.266564]  [<ffffffff810b8a0f>] ? recalc_sigpending+0x1f/0x60
      [  176.266564]  [<ffffffff810b9127>] ? __set_task_blocked+0x37/0x80
      [  176.266564]  [<ffffffff810ab95b>] ? sys_wait4+0xab/0xf0
      [  176.266564]  [<ffffffff811e3d31>] sys_ioctl+0x91/0xb0
      [  176.266564]  [<ffffffff810a95f0>] ? task_stopped_code+0x50/0x50
      [  176.266564]  [<ffffffff81939199>] system_call_fastpath+0x16/0x1b
      [  176.266564] ---[ end trace 387af88219ad6143 ]---
      
      It turns out that spin_unlock_bh(proc_inum_lock) is not safe when
      put_pid is called with another spinlock held and irqs disabled.
      
      For now, take the easy path and use spin_lock_irqsave(proc_inum_lock)
      in proc_free_inum and spin_lock_irq(proc_inum_lock) in proc_alloc_inum.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • pidns: Stop pid allocation when init dies · c876ad76
      Eric W. Biederman committed
      Oleg pointed out that in a pid namespace the following sequence:
      - pid 1 becomes a zombie
      - setns(thepidns), fork, ...
      - pid 1 is reaped
      - the injected processes exit
      
      can lead to processes attempting to access their child reaper and
      instead following a stale pointer.
      
      It is also unfortunate that waitpid for init can return before all of
      the processes in the pid namespace have exited.
      
      Avoid these problems by disabling the allocation of new pids in a pid
      namespace when init dies, instead of when the last process in a pid
      namespace is reaped.
      Pointed-out-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  6. 25 Dec, 2012: 1 commit
  7. 22 Dec, 2012: 25 commits