1. 20 Oct, 2007 9 commits
    • P
      pid namespaces: allow cloning of new namespace · 30e49c26
      Pavel Emelyanov committed
      When clone() is invoked with CLONE_NEWPID, create a new pid namespace and then
      create a new struct pid for the new process.  Allocate pid_t's for the new
      process in the new pid namespace and all ancestor pid namespaces.  Make the
      newly cloned process the session and process group leader.
      
      Since the active pid namespace is special and expected to be the first entry
      in pid->upid_list, preserve the order of pid namespaces.
      
      The size of 'struct pid' depends on the number of pid namespaces the
      process exists in, so we use multiple pid caches.  Only one pid cache is
      created during system startup, and it is used by processes that exist only
      in init_pid_ns.
      
      When a process clones its pid namespace, we create additional pid caches as
      necessary and use the pid cache to allocate 'struct pids' for that depth.
      
      Note that with this patch the newly created namespace won't work, since the
      rest of the kernel still uses global pids, but this is to be fixed soon.  Init
      pid namespace still works.
      
      [oleg@tv-sign.ru: merge fix]
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      30e49c26
    • P
      pid namespaces: miscellaneous preparations for pid namespaces · b461cc03
      Pavel Emelyanov committed
      * remove pid.h from pid_namespaces.h;
      * rework is_(cgroup|global)_init;
      * optimize (get|put)_pid_ns for init_pid_ns;
      * declare task_child_reaper to return actual reaper.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b461cc03
    • P
      pid namespaces: helpers to find the task by its numerical ids · 198fe21b
      Pavel Emelyanov committed
      When searching for a task by its numerical id, one may need to find it using
      its global pid (as is done now in the kernel) or by its virtual id.  E.g. when
      sending a signal to a task from one namespace, the sender will specify the
      task's virtual id and we should find the task by this value.
      
      [akpm@linux-foundation.org: fix gfs2 linkage]
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      198fe21b
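
The two kinds of lookup described in commit 198fe21b can be illustrated with a toy userspace model in Python. This is only a sketch, not kernel code: the class names, the `(nr, namespace)` dictionary key and the helper names `find_task_by_pid` and `find_task_by_vpid` are assumptions made for illustration, since the commit message does not name the helpers.

```python
class PidNamespace:
    """Toy pid namespace: only the parent link and nesting level."""
    def __init__(self, parent=None):
        self.parent = parent
        self.level = 0 if parent is None else parent.level + 1


class Task:
    def __init__(self, comm, ns, numbers):
        self.comm = comm
        self.ns = ns            # the namespace the task lives in
        self.numbers = numbers  # numbers[i] is the id seen at namespace level i


class PidTable:
    """Hypothetical lookup table keyed by (numerical id, namespace)."""
    def __init__(self):
        self._by_id = {}

    def attach(self, task):
        # Register the task under its id in its own and every ancestor namespace.
        ns = task.ns
        while ns is not None:
            self._by_id[(task.numbers[ns.level], ns)] = task
            ns = ns.parent

    def find_by_pid_ns(self, nr, ns):
        return self._by_id.get((nr, ns))


def find_task_by_pid(table, init_ns, nr):
    """Lookup by global id: the id as seen from the initial namespace."""
    return table.find_by_pid_ns(nr, init_ns)


def find_task_by_vpid(table, caller, nr):
    """Lookup by virtual id: the id as seen from the caller's namespace."""
    return table.find_by_pid_ns(nr, caller.ns)


init_ns = PidNamespace()
container_ns = PidNamespace(init_ns)
table = PidTable()
host_init = Task("init", init_ns, [1])
container_init = Task("container-init", container_ns, [500, 1])
worker = Task("worker", container_ns, [501, 2])
for t in (host_init, container_init, worker):
    table.attach(t)
```

In this model a sender inside the container that signals id 2 reaches `worker`, while the same task is id 501 when looked up globally.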
    • P
      pid namespaces: helpers to obtain pid numbers · 7af57294
      Pavel Emelyanov committed
      When showing a pid to the user or getting the pid's numerical id for
      in-kernel use, the value of this id may differ depending on the namespace.
      
      This set of helpers is used to get the global pid nr, the virtual nr (i.e. the
      one seen by the task in its own namespace) and the nr as it is seen from a
      specified namespace.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7af57294
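
The three views of a pid number described in commit 7af57294 (global, virtual, and as seen from a given namespace) can be modelled in a few lines of Python. This is a hedged sketch, not the kernel implementation: the function names `pid_nr`, `pid_vnr` and `pid_nr_ns`, the per-level `numbers` list, and the choice to return 0 for "not visible from that namespace" are assumptions made for illustration.

```python
class PidNamespace:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.level = 0 if parent is None else parent.level + 1


class Pid:
    """numbers[i] is the id of this pid as seen from the namespace at level i."""
    def __init__(self, ns, numbers):
        assert len(numbers) == ns.level + 1
        self.ns = ns            # the deepest namespace the pid lives in
        self.numbers = numbers


def ns_at_level(ns, level):
    """Walk up the parent chain until the requested level is reached."""
    while ns.level > level:
        ns = ns.parent
    return ns


def pid_nr(pid):
    """Global id: the number in the initial (level 0) namespace."""
    return pid.numbers[0]


def pid_vnr(pid):
    """Virtual id: the number the task sees in its own namespace."""
    return pid.numbers[pid.ns.level]


def pid_nr_ns(pid, ns):
    """Id as seen from ns, or 0 if the pid is not visible from ns."""
    if ns.level <= pid.ns.level and ns_at_level(pid.ns, ns.level) is ns:
        return pid.numbers[ns.level]
    return 0


init_ns = PidNamespace("init")
ns_a = PidNamespace("a", init_ns)
ns_b = PidNamespace("b", init_ns)   # sibling of ns_a
pid = Pid(ns_a, [4321, 1])          # global id 4321, id 1 inside ns_a
```

A pid is visible from its own namespace and from every ancestor, but not from a sibling namespace, which is why the lookup against `ns_b` yields 0 in this model.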
    • P
      pid namespaces: make alloc_pid(), free_pid() and put_pid() work with struct upid · 8ef047aa
      Pavel Emelyanov committed
      Each struct upid element of struct pid has to be initialized properly, i.e.
      its nr must be allocated from the appropriate pidmap and its ns set to the
      appropriate namespace.
      
      When allocating a new pid, we need to know the namespace this pid will live
      in, so an additional argument is added to alloc_pid().
      
      On the other hand, the rest of the kernel still uses the pid->nr and
      pid->pid_chain fields, so these ones are still initialized, but this will be
      removed soon.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8ef047aa
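
The allocation rule in commit 8ef047aa (one struct upid per namespace, each with its nr taken from that namespace's pidmap) can be sketched as a toy Python model. This is not kernel code: the counter standing in for the pidmap, the class names, and the layout where `numbers[i]` belongs to the namespace at level `i` are all assumptions made for illustration.

```python
from itertools import count


class PidNamespace:
    def __init__(self, parent=None):
        self.parent = parent
        self.level = 0 if parent is None else parent.level + 1
        self._next_nr = count(1)   # stand-in for the per-namespace pidmap

    def alloc_nr(self):
        return next(self._next_nr)


class Upid:
    """One (number, namespace) pair, mirroring the role of struct upid."""
    def __init__(self, nr, ns):
        self.nr = nr
        self.ns = ns


class Pid:
    def __init__(self, level, numbers):
        self.level = level
        self.numbers = numbers     # numbers[i] is the Upid for namespace level i


def alloc_pid(ns):
    """Allocate an id in ns and in every ancestor namespace of ns."""
    numbers = [None] * (ns.level + 1)
    cur = ns
    while cur is not None:
        numbers[cur.level] = Upid(cur.alloc_nr(), cur)
        cur = cur.parent
    return Pid(ns.level, numbers)


init_ns = PidNamespace()
for _ in range(3):
    alloc_pid(init_ns)             # ids 1..3 are taken in the initial namespace
child_ns = PidNamespace(init_ns)
first_in_child = alloc_pid(child_ns)
```

Here the first pid allocated in `child_ns` gets number 1 inside that namespace and number 4 in the initial namespace, because each namespace hands out numbers from its own map.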
    • P
      pid namespaces: add support for pid namespaces hierarchy · faacbfd3
      Pavel Emelyanov committed
      Each namespace has a parent and is characterized by its "level".  The level is
      the generation number of the namespace.  E.g. the init namespace has level 0,
      a namespace cloned from it has level 1, the next one has level 2, and so on.
      This level is not explicitly limited.
      
      A true hierarchy must have some way to find each namespace's children, but
      that is not used in the patches, so this ability is not added (yet).
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      faacbfd3
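
The parent and level bookkeeping from commit faacbfd3 is small enough to model directly. The following Python sketch is a toy, not the kernel structure: `clone_pid_ns` and `ancestors` are hypothetical helper names. Note that, as in the commit, a namespace knows its parent but keeps no list of children.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class PidNamespace:
    """Toy pid namespace: a parent link and a generation number."""
    parent: Optional["PidNamespace"] = None
    level: int = 0


def clone_pid_ns(parent: PidNamespace) -> PidNamespace:
    # A cloned namespace sits one generation below its parent.
    return PidNamespace(parent=parent, level=parent.level + 1)


def ancestors(ns: Optional[PidNamespace]) -> list:
    """Return ns followed by each ancestor, ending at the init namespace."""
    chain = []
    while ns is not None:
        chain.append(ns)
        ns = ns.parent
    return chain


init_pid_ns = PidNamespace()
ns1 = clone_pid_ns(init_pid_ns)
ns2 = clone_pid_ns(ns1)
```

Walking upwards from any namespace is cheap with only the parent pointer. Walking downwards is not possible in this model, which matches the limitation the commit message states.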
    • S
      pid namespaces: define is_global_init() and is_container_init() · b460cbc5
      Serge E. Hallyn committed
      is_init() is an ambiguous name for the pid==1 check.  Split it into
      is_global_init() and is_container_init().
      
      A cgroup init has its tsk->pid == 1.
      
      A global init also has its tsk->pid == 1 and its active pid namespace
      is the init_pid_ns.  But rather than check the active pid namespace,
      compare the task structure with 'init_pid_ns.child_reaper', which is
      initialized during boot to the /sbin/init process and never changes.
      
      Changelog:
      
      	2.6.22-rc4-mm2-pidns1:
      	- Use 'init_pid_ns.child_reaper' to determine if a given task is the
      	  global init (/sbin/init) process. This would improve performance
      	  and remove dependence on the task_pid().
      
      	2.6.21-mm2-pidns2:
      
      	- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
      	  ppc,avr32}/traps.c for the _exception() call to is_global_init().
      	  This way, we kill only the cgroup if the cgroup's init has a
      	  bug rather than force a kernel panic.
      
      [akpm@linux-foundation.org: fix comment]
      [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
      [bunk@stusta.de: kernel/pid.c: remove unused exports]
      [sukadev@us.ibm.com: Fix capability.c to work with threaded init]
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Acked-by: Pavel Emelianov <xemul@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Herbert Poetzel <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b460cbc5
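
The distinction drawn in commit b460cbc5 can be illustrated with a toy Python model. It is only a sketch of the two checks as the commit message describes them (a pid == 1 test for the container init, an identity comparison against `init_pid_ns.child_reaper` for the global init); the `Task` and `PidNamespace` classes are simplified stand-ins, not the kernel structures.

```python
class PidNamespace:
    def __init__(self):
        self.child_reaper = None   # set once at "boot" in this model


class Task:
    def __init__(self, comm, pid):
        self.comm = comm
        self.pid = pid             # the id the task has in its own view


init_pid_ns = PidNamespace()


def is_container_init(tsk):
    """True for any task that is pid 1 in its own view."""
    return tsk.pid == 1


def is_global_init(tsk):
    """True only for the task recorded as the init namespace's reaper."""
    return tsk is init_pid_ns.child_reaper


sbin_init = Task("/sbin/init", 1)
init_pid_ns.child_reaper = sbin_init      # assigned during boot, never changed
container_init = Task("container-init", 1)
shell = Task("sh", 42)
```

Comparing object identity rather than a pid number is what lets the model tell the two pid-1 tasks apart: both pass `is_container_init`, but only `sbin_init` passes `is_global_init`.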
    • S
      pid namespaces: define and use task_active_pid_ns() wrapper · 2894d650
      Sukadev Bhattiprolu committed
      With multiple pid namespaces, a process is known by some pid_t in every
      ancestor pid namespace.  Every time the process forks, the child process also
      gets a pid_t in every ancestor pid namespace.
      
      While a process is visible in >=1 pid namespaces, it can see pid_t's in only
      one pid namespace.  We call this pid namespace its "active pid namespace",
      and it is always the youngest pid namespace in which the process is known.
      
      This patch defines and uses a wrapper to find the active pid namespace of a
      process.  The implementation of the wrapper will be changed when support
      for multiple pid namespaces is added.
      
      Changelog:
      	2.6.22-rc4-mm2-pidns1:
      	- [Pavel Emelianov, Alexey Dobriyan] Back out the change to use
      	  task_active_pid_ns() in child_reaper() since task->nsproxy
      	  can be NULL during task exit (so child_reaper() continues to
      	  use init_pid_ns.child_reaper).
      
      	2.6.21-rc6-mm1:
      	- Rename task_pid_ns() to task_active_pid_ns() to reflect that a
      	  process can have multiple pid namespaces.
      Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Acked-by: Pavel Emelianov <xemul@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Herbert Poetzel <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2894d650
    • P
      pid namespaces: dynamic kmem cache allocator for pid namespaces · baf8f0f8
      Pavel Emelianov committed
      Add kmem_cache to pid_namespace to allocate pids from.
      
      Since both implementations expand the struct pid to carry more numerical
      values, each namespace should have a separate cache to store pids of
      different sizes.
      
      Each kmem cache is named "pid_<NR>", where <NR> is the number of numerical
      ids on the pid.  Different namespaces with the same level of nesting will
      share the same cache.
      
      This patch has two FIXMEs that are to be fixed after we reach the consensus
      about the struct pid itself.
      
      The first one is that the namespace to free the pid from in free_pid() must be
      taken from pid.  Now the init_pid_ns is used.
      
      The second FIXME is about the cache allocation.  When we do know how long the
      object will be then we'll have to calculate this size in create_pid_cachep.
      Right now the sizeof(struct pid) value is used.
      
      [akpm@linux-foundation.org: coding-style repair]
      Signed-off-by: Pavel Emelianov <xemul@openvz.org>
      Acked-by: Cedric Le Goater <clg@fr.ibm.com>
      Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      baf8f0f8
  2. 17 Jul, 2007 1 commit
  3. 11 May, 2007 2 commits
  4. 09 May, 2007 1 commit
    • B
      Merge sys_clone()/sys_unshare() nsproxy and namespace handling · e3222c4e
      Badari Pulavarty committed
      sys_clone() and sys_unshare() both make copies of nsproxy and its associated
      namespaces.  But they have different code paths.
      
      This patch merges all the nsproxy and its associated namespace copy/clone
      handling (as much as possible).  Posted on container list earlier for
      feedback.
      
      - Create a new nsproxy and its associated namespaces and pass it back to
        caller to attach it to right process.
      
      - Changed all copy_*_ns() routines to return a new copy of namespace
        instead of attaching it to task->nsproxy.
      
      - Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines.
      
      - Removed unnecessary !ns checks from copy_*_ns() and added BUG_ON()
        just in case.
      
      - Get rid of all individual unshare_*_ns() routines and make use of
        copy_*_ns() instead.
      
      [akpm@osdl.org: cleanups, warning fix]
      [clg@fr.ibm.com: remove dup_namespaces() declaration]
      [serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval]
      [akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n]
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <containers@lists.osdl.org>
      Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e3222c4e
  5. 08 May, 2007 1 commit
  6. 31 Jan, 2007 1 commit
  7. 09 Dec, 2006 4 commits
  8. 08 Dec, 2006 1 commit
  9. 02 Oct, 2006 8 commits
  10. 27 Sep, 2006 2 commits
  11. 04 Jul, 2006 1 commit
  12. 01 Apr, 2006 1 commit
    • E
      [PATCH] pidhash: Refactor the pid hash table · 92476d7f
      Eric W. Biederman committed
      Simplifies the code, reduces the need for 4 pid hash tables, and makes the
      code more capable.
      
      In the discussions I had with Oleg it was felt that to a large extent the
      cleanup itself justified the work.  Having struct pid dynamically
      allocated meant we could create the hash table entry when the pid was
      allocated and free the hash table entry when the pid was freed, instead of
      playing with the hash lists whenever a process would attach to or detach
      from a process.
      
      For myself the fact that it gave what my previous task_ref patch gave for free
      with simpler code was a big win.  The problem is that if you hold a reference
      to struct task_struct you lock in 10K of low memory.  If you do that in a user
      controllable way like /proc does, with an unprivileged but hostile user space
      application with typical resource limits of 1000 fds and 100 processes I can
      trigger the OOM killer by consuming all of low memory with task structs, on a
      machine with 1GB of low memory.
      
      If I instead hold a reference to struct pid which holds a pointer to my
      task_struct, I don't suffer from that problem because struct pid is 2 orders
      of magnitude smaller.  In fact struct pid is small enough that most other
      kernel data structures dwarf it, so simply limiting the number of referring
      data structures is enough to prevent exhaustion of low memory.
      
      This splits the current struct pid into two structures, struct pid and struct
      pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one.
      struct pid_link is the per process linkage into the hash tables and lives in
      struct task_struct.  struct pid is given an independent lifetime, and holds
      pointers to each of the pid types.
      
      The independent life of struct pid simplifies attach_pid and detach_pid,
      because we are always manipulating the list of pids and not the hash table.
      In addition, giving struct pid an independent life makes the concept much
      more powerful.
      
      Kernel data structures can now embed a struct pid * instead of a pid_t and
      not suffer from pid wrap around problems or from keeping unnecessarily
      large amounts of memory allocated.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      92476d7f
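
The wrap-around argument in commit 92476d7f (holding a reference to a small pid object instead of a raw pid_t) can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the kernel code: the single dictionary stands in for the one remaining hash table, and the `attach`, `detach` and `find_task` methods are hypothetical names that only loosely mirror attach_pid and detach_pid.

```python
class Task:
    def __init__(self, comm):
        self.comm = comm


class Pid:
    """Small object with its own lifetime; outlives the task it pointed at."""
    def __init__(self, nr):
        self.nr = nr
        self.task = None


class PidTable:
    def __init__(self):
        self._hash = {}            # the single hash table: nr -> Pid

    def attach(self, nr, task):
        pid = Pid(nr)
        pid.task = task
        self._hash[nr] = pid
        return pid

    def detach(self, pid):
        pid.task = None            # holders of the Pid now see "no task"
        if self._hash.get(pid.nr) is pid:
            del self._hash[pid.nr]

    def find_task(self, nr):
        pid = self._hash.get(nr)
        return pid.task if pid is not None else None


table = PidTable()
old_pid = table.attach(1234, Task("old-daemon"))
saved_nr = old_pid.nr              # what a holder of a raw pid_t would keep
saved_ref = old_pid                # what a holder of a pid reference keeps

table.detach(old_pid)              # the original task exits
table.attach(1234, Task("unrelated"))   # the number is reused after wrap-around
```

After the number is reused, the saved raw number resolves to the unrelated task, while the saved `Pid` reference correctly reports that its task is gone. The reference also pins only the small `Pid` object, not the whole task.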
  13. 29 Mar, 2006 2 commits
    • O
      [PATCH] pidhash: don't count idle threads · 73b9ebfe
      Oleg Nesterov committed
      fork_idle() does unhash_process() just after copy_process().  In contrast,
      boot_cpu's idle thread explicitly registers itself for each pid_type with nr
      = 0.
      
      copy_process() already checks p->pid != 0 before process_counts++, I think we
      can just skip attach_pid() calls and job control inits for idle threads and
      kill unhash_process().  We don't need to cleanup ->proc_dentry in fork_idle()
      because with this patch idle threads are never hashed in
      kernel/pid.c:pid_hash[].
      
      We don't need to hash pid == 0 in pidmap_init().  free_pidmap() is never
      called with pid == 0 arg, so it will never be reused.  So it is still possible
      to use pid == 0 in any PIDTYPE_xxx namespace from kernel/pid.c's POV.
      
      However, with this patch we don't hash pid == 0 for the PIDTYPE_PID case.  We
      still have PIDTYPE_PGID/PIDTYPE_SID entries with pid == 0: /sbin/init and
      kernel threads which don't call daemonize().
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      73b9ebfe
    • E
      [PATCH] pidhash: kill switch_exec_pids · d73d6529
      Eric W. Biederman committed
      switch_exec_pids is only called from de_thread by way of exec, and it is
      only called when we are exec'ing from a non thread group leader.
      
      Currently switch_exec_pids gives the leader the pid of the thread and
      unhashes and rehashes all of the process groups.  The leader is already in
      the EXIT_DEAD state so no one cares about its pids.  The only concern for
      the leader is that __unhash_process called from release_task will function
      correctly.  If we don't touch the leader at all we know that
      __unhash_process will work fine so there is no need to touch the leader.
      
      For the task becoming the thread group leader, we just need to give it the
      pid of the old thread group leader, add it to the task list, and attach it
      to the session and the process group of the thread group.
      
      Currently de_thread is also adding the task to the task list which is just
      silly.
      
      Currently the only caller of __detach_pid besides detach_pid is
      switch_exec_pids because of the ugly extra work that was being
      performed.
      
      So this patch removes switch_exec_pids because it is doing too much, it is
      creating an unnecessary special case in pid.c, doing work duplicated in
      de_thread, and generally obscuring what is going on.
      
      The necessary work is added to de_thread, and it seems to be a little
      clearer there what is going on.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d73d6529
  14. 09 Jan, 2006 1 commit
  15. 17 Apr, 2005 1 commit
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds committed
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4