1. 01 4月, 2009 1 次提交
  2. 25 11月, 2008 1 次提交
    • S
      User namespaces: set of cleanups (v2) · 18b6e041
      Serge Hallyn 提交于
      The user_ns is moved from nsproxy to user_struct, so that a struct
      cred by itself is sufficient to determine access (which it otherwise
      would not be).  Corresponding ecryptfs fixes (by David Howells) are
      here as well.
      
      Fix refcounting.  The following rules now apply:
              1. The task pins the user struct.
              2. The user struct pins its user namespace.
              3. The user namespace pins the struct user which created it.
      
      User namespaces are cloned during copy_creds().  Unsharing a new user_ns
      is no longer possible.  (We could re-add that, but it'll cause code
      duplication and doesn't seem useful if PAM doesn't need to clone user
      namespaces).
      
      When a user namespace is created, its first user (uid 0) gets empty
      keyrings and a clean group_info.
      
      This incorporates a previous patch by David Howells.  Here
      is his original patch description:
      
      >I suggest adding the attached incremental patch.  It makes the following
      >changes:
      >
      > (1) Provides a current_user_ns() macro to wrap accesses to current's user
      >     namespace.
      >
      > (2) Fixes eCryptFS.
      >
      > (3) Renames create_new_userns() to create_user_ns() to be more consistent
      >     with the other associated functions and because the 'new' in the name is
      >     superfluous.
      >
      > (4) Moves the argument and permission checks made for CLONE_NEWUSER to the
      >     beginning of do_fork() so that they're done prior to making any attempts
      >     at allocation.
      >
      > (5) Calls create_user_ns() after prepare_creds(), and gives it the new creds
      >     to fill in rather than have it return the new root user.  I don't imagine
      >     the new root user being used for anything other than filling in a cred
      >     struct.
      >
      >     This also permits me to get rid of a get_uid() and a free_uid(), as the
      >     reference the creds were holding on the old user_struct can just be
      >     transferred to the new namespace's creator pointer.
      >
      > (6) Makes create_user_ns() reset the UIDs and GIDs of the creds under
      >     preparation rather than doing it in copy_creds().
      >
      >David
      
      >Signed-off-by: David Howells <dhowells@redhat.com>
      
      Changelog:
      	Oct 20: integrate dhowells comments
      		1. leave thread_keyring alone
      		2. use current_user_ns() in set_user()
      Signed-off-by: NSerge Hallyn <serue@us.ibm.com>
      18b6e041
  3. 26 7月, 2008 1 次提交
    • S
      cgroup_clone: use pid of newly created task for new cgroup · e885dcde
      Serge E. Hallyn 提交于
      cgroup_clone creates a new cgroup with the pid of the task.  This works
      correctly for unshare, but for clone cgroup_clone is called from
      copy_namespaces inside copy_process, which happens before the new pid is
      created.  As a result, the new cgroup was created with current's pid.
      This patch:
      
      	1. Moves the call inside copy_process to after the new pid
      	   is created
      	2. Passes the struct pid into ns_cgroup_clone (as it is not
      	   yet attached to the task)
      	3. Passes a name from ns_cgroup_clone() into cgroup_clone()
      	   so as to keep cgroup_clone() itself simpler
      	4. Uses pid_vnr() to get the process id value, so that the
      	   pid used to name the new cgroup is always the pid as it
      	   would be known to the task which did the cloning or
      	   unsharing.  I think that is the most intuitive thing to
      	   do.  This way, task t1 does clone(CLONE_NEWPID) to get
      	   t2, which does clone(CLONE_NEWPID) to get t3, then the
      	   cgroup for t3 will be named for the pid by which t2 knows
      	   t3.
      
      (Thanks to Dan Smith for finding the main bug)
      
      Changelog:
      	June 11: Incorporate Paul Menage's feedback:  don't pass
      	         NULL to ns_cgroup_clone from unshare, and reduce
      		 patch size by using 'nodename' in cgroup_clone.
      	June 10: Original version
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NSerge Hallyn <serge@us.ibm.com>
      Acked-by: NPaul Menage <menage@google.com>
      Tested-by: NDan Smith <danms@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e885dcde
  4. 20 10月, 2007 2 次提交
    • P
      Make access to task's nsproxy lighter · cf7b708c
      Pavel Emelyanov 提交于
      When someone wants to deal with some other taks's namespaces it has to lock
      the task and then to get the desired namespace if the one exists.  This is
      slow on read-only paths and may be impossible in some cases.
      
      E.g.  Oleg recently noticed a race between unshare() and the (sent for
      review in cgroups) pid namespaces - when the task notifies the parent it
      has to know the parent's namespace, but taking the task_lock() is
      impossible there - the code is under write locked tasklist lock.
      
      On the other hand switching the namespace on task (daemonize) and releasing
      the namespace (after the last task exit) is rather rare operation and we
      can sacrifice its speed to solve the issues above.
      
      The access to other task namespaces is proposed to be performed
      like this:
      
           rcu_read_lock();
           nsproxy = task_nsproxy(tsk);
           if (nsproxy != NULL) {
                   / *
                     * work with the namespaces here
                     * e.g. get the reference on one of them
                     * /
           } / *
               * NULL task_nsproxy() means that this task is
               * almost dead (zombie)
               * /
           rcu_read_unlock();
      
      This patch has passed the review by Eric and Oleg :) and,
      of course, tested.
      
      [clg@fr.ibm.com: fix unshare()]
      [ebiederm@xmission.com: Update get_net_ns_by_pid]
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cf7b708c
    • S
      cgroups: implement namespace tracking subsystem · 858d72ea
      Serge E. Hallyn 提交于
      When a task enters a new namespace via a clone() or unshare(), a new cgroup
      is created and the task moves into it.
      
      This version names cgroups which are automatically created using
      cgroup_clone() as "node_<pid>" where pid is the pid of the unsharing or
      cloned process.  (Thanks Pavel for the idea) This is safe because if the
      process unshares again, it will create
      
      	/cgroups/(...)/node_<pid>/node_<pid>
      
      The only possibilities (AFAICT) for a -EEXIST on unshare are
      
      	1. pid wraparound
      	2. a process fails an unshare, then tries again.
      
      Case 1 is unlikely enough that I ignore it (at least for now).  In case 2, the
      node_<pid> will be empty and can be rmdir'ed to make the subsequent unshare()
      succeed.
      
      Changelog:
      	Name cloned cgroups as "node_<pid>".
      
      [clg@fr.ibm.com: fix order of cgroup subsystems in init/Kconfig]
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      858d72ea
  5. 17 10月, 2007 1 次提交
  6. 11 10月, 2007 1 次提交
  7. 17 7月, 2007 2 次提交
  8. 09 5月, 2007 1 次提交
    • B
      Merge sys_clone()/sys_unshare() nsproxy and namespace handling · e3222c4e
      Badari Pulavarty 提交于
      sys_clone() and sys_unshare() both makes copies of nsproxy and its associated
      namespaces.  But they have different code paths.
      
      This patch merges all the nsproxy and its associated namespace copy/clone
      handling (as much as possible).  Posted on container list earlier for
      feedback.
      
      - Create a new nsproxy and its associated namespaces and pass it back to
        caller to attach it to right process.
      
      - Changed all copy_*_ns() routines to return a new copy of namespace
        instead of attaching it to task->nsproxy.
      
      - Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines.
      
      - Removed unnessary !ns checks from copy_*_ns() and added BUG_ON()
        just incase.
      
      - Get rid of all individual unshare_*_ns() routines and make use of
        copy_*_ns() instead.
      
      [akpm@osdl.org: cleanups, warning fix]
      [clg@fr.ibm.com: remove dup_namespaces() declaration]
      [serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval]
      [akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n]
      Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <containers@lists.osdl.org>
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e3222c4e
  9. 31 1月, 2007 2 次提交
  10. 14 12月, 2006 1 次提交
  11. 09 12月, 2006 3 次提交
  12. 26 11月, 2006 1 次提交
    • V
      [PATCH] mounstats NULL pointer dereference · 701e054e
      Vasily Tarasov 提交于
      OpenVZ developers team has encountered the following problem in 2.6.19-rc6
      kernel. After some seconds of running script
      
      while [[ 1 ]]
      do
      	find  /proc -name mountstats | xargs cat
      done
      
      this Oops appears:
      
      BUG: unable to handle kernel NULL pointer dereference at virtual address
      00000010
       printing eip:
      c01a6b70
      *pde = 00000000
      Oops: 0000 [#1]
      SMP
      Modules linked in: xt_length ipt_ttl xt_tcpmss ipt_TCPMSS iptable_mangle
      iptable_filter xt_multiport xt_limit ipt_tos ipt_REJECT ip_tables x_tables
      parport_pc lp parport sunrpc af_packet thermal processor fan button battery
      asus_acpi ac ohci_hcd ehci_hcd usbcore i2c_nforce2 i2c_core tg3 floppy
      pata_amd
      ide_cd cdrom sata_nv libata
      CPU:    1
      EIP:    0060:[<c01a6b70>]    Not tainted VLI
      EFLAGS: 00010246   (2.6.19-rc6 #2)
      EIP is at mountstats_open+0x70/0xf0
      eax: 00000000   ebx: e6247030   ecx: e62470f8   edx: 00000000
      esi: 00000000   edi: c01a6b00   ebp: c33b83c0   esp: f4105eb4
      ds: 007b   es: 007b   ss: 0068
      Process cat (pid: 6044, ti=f4105000 task=f4104a70 task.ti=f4105000)
      Stack: c33b83c0 c04ee940 f46a4a80 c33b83c0 e4df31b4 c01a6b00 f4105000 c0169231
             e4df31b4 c33b83c0 c33b83c0 f4105f20 00000003 f4105000 c0169445 f2503cf0
             f7f8c4c0 00008000 c33b83c0 00000000 00008000 c0169350 f4105f20 00008000
      Call Trace:
       [<c01a6b00>] mountstats_open+0x0/0xf0
       [<c0169231>] __dentry_open+0x181/0x250
       [<c0169445>] nameidata_to_filp+0x35/0x50
       [<c0169350>] do_filp_open+0x50/0x60
       [<c01873d6>] seq_read+0xc6/0x300
       [<c0169511>] get_unused_fd+0x31/0xc0
       [<c01696d3>] do_sys_open+0x63/0x110
       [<c01697a7>] sys_open+0x27/0x30
       [<c01030bd>] sysenter_past_esp+0x56/0x79
       =======================
      Code: 45 74 8b 54 24 20 89 44 24 08 8b 42 f0 31 d2 e8 47 cb f8 ff 85 c0 89 c3
      74 51 8d 80 a0 04 00 00 e8 46 06 2c 00 8b 83 48 04 00 00 <8b> 78 10 85 ff 74
      03
      f0 ff 07 b0 01 86 83 a0 04 00 00 f0 ff 4b
      EIP: [<c01a6b70>] mountstats_open+0x70/0xf0 SS:ESP 0068:f4105eb4
      
      The problem is that task->nsproxy can be equal NULL for some time during
      task exit. This patch fixes the BUG.
      Signed-off-by: NVasily Tarasov <vtaras@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: "Serge E. Hallyn" <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      701e054e
  13. 02 10月, 2006 4 次提交