  1. 04 Jul, 2013 1 commit
    • kernel/fork.c:copy_process(): don't add the uninitialized child to thread/task/pid lists · 81907739
      Oleg Nesterov committed
      copy_process() adds the new child to thread_group/init_task.tasks list and
      then does attach_pid(child, PIDTYPE_PID).  This means that the lockless
      next_thread() or next_task() can see this thread with the wrong pid.  Say,
      "ls /proc/pid/task" can list the same inode twice.
      
      We could move attach_pid(child, PIDTYPE_PID) up, but in this case
      find_task_by_vpid() can find the new thread before it was fully
      initialized.
      
      And this is already true for PIDTYPE_PGID/PIDTYPE_SID.  With this patch
      copy_process() initializes child->pids[*].pid first, then calls
      attach_pid() to insert the task into the pid->tasks list.
      
      attach_pid() no longer needs the "struct pid*" argument; it is always
      called after pid_link->pid was already set.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Sergey Dyasly <dserrg@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 28 Feb, 2013 1 commit
    • hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin committed
      I'm not sure why, but the hlist for-each-entry iterators were conceived
      differently from the list ones.  The list iterator is:
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      do they not really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
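
To make the difference concrete, here is a minimal userspace sketch of a three-argument hlist iterator. It is not the kernel's implementation: `hlist_node` is simplified to a single `next` pointer, `hlist_entry_safe` evaluates its argument twice, and `struct item`, `sum_items` and `demo_sum` are hypothetical names used only for illustration. It assumes a compiler that supports `__typeof__` (GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's hlist types. */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Map a possibly-NULL node pointer to its containing entry, or NULL. */
#define hlist_entry_safe(ptr, type, member) \
	((ptr) ? container_of(ptr, type, member) : NULL)

/* Three-argument form: the entry pointer is the only cursor. */
#define hlist_for_each_entry(pos, head, member) \
	for (pos = hlist_entry_safe((head)->first, __typeof__(*(pos)), member); \
	     pos; \
	     pos = hlist_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

/* Hypothetical payload type that embeds a list node. */
struct item {
	int value;
	struct hlist_node link;
};

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	h->first = n;
}

/* Sum every item on the list; no separate hlist_node cursor is declared. */
int sum_items(struct hlist_head *head)
{
	struct item *it;
	int sum = 0;

	hlist_for_each_entry(it, head, link)
		sum += it->value;
	return sum;
}

/* Build a three-element list on the stack and sum it. */
int demo_sum(void)
{
	struct hlist_head head = { NULL };
	struct item a = { 1, { NULL } };
	struct item b = { 2, { NULL } };
	struct item c = { 3, { NULL } };

	hlist_add_head(&a.link, &head);
	hlist_add_head(&b.link, &head);
	hlist_add_head(&c.link, &head);
	return sum_items(&head);
}
```

With the old four-argument form, `sum_items` would also have needed a `struct hlist_node *` local that the loop body never uses.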
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter; these
       were modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foundation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 26 Dec, 2012 1 commit
    • pidns: Stop pid allocation when init dies · c876ad76
      Eric W. Biederman committed
      Oleg pointed out that in a pid namespace the sequence:
      - pid 1 becomes a zombie
      - setns(thepidns), fork,...
      - reaping pid 1.
      - The injected processes exiting.
      
      can lead to processes attempting to access their child reaper and
      instead following a stale pointer.
      
      That waitpid for init can return before all of the processes in
      the pid namespace have exited is also unfortunate.
      
      Avoid these problems by disabling the allocation of new pids in a pid
      namespace when init dies, instead of when the last process in a pid
      namespace is reaped.
      Pointed-out-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  4. 27 May, 2011 1 commit
  5. 19 Apr, 2011 1 commit
    • next_pidmap: fix overflow condition · c78193e9
      Linus Torvalds committed
      next_pidmap() just quietly accepted whatever 'last' pid was passed
      in, which is not all that safe when one of the users is /proc.
      
      Admittedly the proc code should do some sanity checking on the range
      (and that will be the next commit), but that doesn't mean that the
      helper functions should just do that pidmap pointer arithmetic without
      checking the range of its arguments.
      
      So clamp 'last' to PID_MAX_LIMIT.  The fact that we then do "last+1"
      doesn't really matter, the for-loop does check against the end of the
      pidmap array properly (it's only the actual pointer arithmetic overflow
      case we need to worry about, and going one bit beyond isn't going to
      overflow).
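
The idea of checking the range before doing any bitmap indexing can be sketched in userspace as follows. This is only an illustration under stated assumptions: `PID_LIMIT` is a small hypothetical stand-in for PID_MAX_LIMIT, the bitmap is a flat array rather than the kernel's paged pidmap, and the sketch rejects an out-of-range `last` with -1, which may differ in detail from what the patch does.

```c
#include <assert.h>
#include <limits.h>

#define PID_LIMIT  1024u  /* hypothetical stand-in for PID_MAX_LIMIT */
#define WORD_BITS  (sizeof(unsigned long) * CHAR_BIT)

/* One bit per pid; a set bit means the pid is in use. */
static unsigned long pidmap[PID_LIMIT / WORD_BITS];

void mark_pid(unsigned int pid)
{
	pidmap[pid / WORD_BITS] |= 1UL << (pid % WORD_BITS);
}

/* Return the next in-use pid above 'last', or -1 if there is none. */
int next_used_pid(unsigned int last)
{
	unsigned int pid;

	/*
	 * Validate the caller-supplied value first. Without this check,
	 * last == UINT_MAX would wrap to 0 on "last + 1", and other large
	 * values would index far outside the bitmap.
	 */
	if (last >= PID_LIMIT)
		return -1;

	for (pid = last + 1; pid < PID_LIMIT; pid++)
		if (pidmap[pid / WORD_BITS] & (1UL << (pid % WORD_BITS)))
			return (int)pid;
	return -1;
}
```

Note that `last + 1` is still computed after the check; as the message says, that is harmless because the loop condition bounds every index that is actually dereferenced.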
      
      [ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ]
      Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com>
      Analyzed-by: Robert Święcki <robert@swiecki.net>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 31 Mar, 2011 1 commit
  7. 24 Mar, 2011 1 commit
  8. 09 Jan, 2009 1 commit
    • pid: implement ns_of_pid · f9fb860f
      Eric W. Biederman committed
      A current problem with the pid namespace is that it is easy to do pid
      related work after exit_task_namespaces which drops the nsproxy pointer.
      
      However if we are doing pid namespace related work we are always operating
      on some struct pid which retains the pid_namespace pointer of the pid
      namespace it was allocated in.
      
      So provide ns_of_pid which allows us to find the pid namespace a pid was
      allocated in.
      
      Using this we have the needed infrastructure to do pid namespace related
      work at any time we have a struct pid, removing the chance of accidentally
      having a NULL pointer dereference when accessing current->nsproxy.
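
A minimal userspace sketch of such a helper is shown below. It assumes a simplified `struct pid` that stores one (number, namespace) pair per nesting level, with the entry at index `level` belonging to the namespace the pid was allocated in. The struct layouts, `MAX_LEVELS` and `demo_ns_of_pid` are assumptions made for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVELS 4  /* hypothetical bound on namespace nesting */

struct pid_namespace {
	int level;  /* depth of this namespace; 0 is the root */
};

/* One pid number as seen from one namespace. */
struct upid {
	int nr;
	struct pid_namespace *ns;
};

struct pid {
	unsigned int level;              /* depth of the allocating namespace */
	struct upid numbers[MAX_LEVELS]; /* numbers[i] is the view from depth i */
};

/*
 * Find the namespace a pid was allocated in using only the struct pid,
 * so the caller never has to go through current->nsproxy.
 */
struct pid_namespace *ns_of_pid(const struct pid *pid)
{
	return pid ? pid->numbers[pid->level].ns : NULL;
}

/* Returns 1 if the lookup behaves as described above. */
int demo_ns_of_pid(void)
{
	struct pid_namespace root = { 0 };
	struct pid_namespace child = { 1 };
	/* A task that is pid 100 in the root namespace and pid 1 in the child. */
	struct pid p = { 1, { { 100, &root }, { 1, &child } } };

	return ns_of_pid(&p) == &child && ns_of_pid(NULL) == NULL;
}
```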
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Bastian Blank <bastian@waldi.eu.org>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Nadia Derbey <Nadia.Derbey@bull.net>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 04 Dec, 2008 1 commit
  10. 21 Aug, 2008 1 commit
    • fix setpriority(PRIO_PGRP) thread iterator breakage · 2d70b68d
      Ken Chen committed
      When a user calls sys_setpriority(PRIO_PGRP ...) on an NPTL style multi-LWP
      process, only the task leader of the process is affected; all other
      sibling LWP threads don't receive the setting.  The problem was that the
      iterator used in sys_setpriority() only iterates over one task for each
      process, ignoring all other sibling threads.
      
      Introduce a new macro do_each_pid_thread / while_each_pid_thread to walk
      each thread of a process.  Convert 4 call sites in {set/get}priority and
      ioprio_{set/get}.
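
The difference between the two walks can be modeled in userspace as below. This is a sketch only: `struct task`, its two link fields and all function names are hypothetical, and plain loops stand in for the kernel's do_each_pid_thread / while_each_pid_thread macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified task model. */
struct task {
	int nice;
	struct task *next_thread;  /* circular ring of threads in one process */
	struct task *next_in_pgrp; /* next thread-group leader in the group */
};

/* Old behaviour: only the leader of each process is visited. */
int set_nice_leaders_only(struct task *pgrp_head, int nice)
{
	int visited = 0;

	for (struct task *leader = pgrp_head; leader; leader = leader->next_in_pgrp) {
		leader->nice = nice;
		visited++;
	}
	return visited;
}

/* New behaviour: for each leader, walk the whole thread ring. */
int set_nice_all_threads(struct task *pgrp_head, int nice)
{
	int visited = 0;

	for (struct task *leader = pgrp_head; leader; leader = leader->next_in_pgrp) {
		struct task *t = leader;

		do {
			t->nice = nice;
			visited++;
			t = t->next_thread;
		} while (t != leader);
	}
	return visited;
}

/*
 * Process group with process A (3 threads) and process B (1 thread).
 * Returns how many tasks received the new nice value.
 */
int demo_counts(int all_threads)
{
	struct task a, a1, a2, b;

	a.nice = 0;  a.next_thread = &a1;  a.next_in_pgrp = &b;
	a1.nice = 0; a1.next_thread = &a2; a1.next_in_pgrp = NULL;
	a2.nice = 0; a2.next_thread = &a;  a2.next_in_pgrp = NULL;
	b.nice = 0;  b.next_thread = &b;   b.next_in_pgrp = NULL;

	return all_threads ? set_nice_all_threads(&a, 5)
			   : set_nice_leaders_only(&a, 5);
}
```

In this model the leaders-only walk touches two tasks and leaves the two sibling threads of process A at their old priority, which is the symptom the message describes.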
      Signed-off-by: Ken Chen <kenchen@google.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 26 Jul, 2008 3 commits
  12. 30 Apr, 2008 2 commits
  13. 14 Feb, 2008 1 commit
  14. 09 Feb, 2008 3 commits
  15. 20 Oct, 2007 7 commits
  16. 11 May, 2007 2 commits
  17. 13 Feb, 2007 1 commit
  18. 09 Dec, 2006 1 commit
  19. 03 Oct, 2006 1 commit
  20. 02 Oct, 2006 5 commits
    • [PATCH] introduce get_task_pid() to fix unsafe get_pid() · 1a657f78
      Oleg Nesterov committed
      proc_pid_make_inode:
      
      	ei->pid = get_pid(task_pid(task));
      
      I think this is not safe.  get_pid() can be preempted after checking "pid
      != NULL".  Then the task exits, does detach_pid(), and RCU frees the pid.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pid: simplify pid iterators · d387cae0
      Oleg Nesterov committed
      I think it is hardly possible to read the current do_each_task_pid().  The
      new version is much simpler and makes the code smaller.
      
      Only the do_each_task_pid change is tested, the do_each_pid_task isn't.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pid: implement pid_nr · 5feb8f5f
      Eric W. Biederman committed
      As we stop storing pid_t's and move to storing struct pid *, we need a way to
      get the pid_t from the struct pid to report to user space what we have stored.
      
      Having a clean well defined way to do this is especially important as we move
      to multiple pid spaces, as we may need to report a different value to the caller
      depending on which pid space the caller is in.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pid: add do_each_pid_task · 558cb325
      Eric W. Biederman committed
      To avoid pid rollover confusion the kernel needs to work with struct pid *
      instead of pid_t.  Currently there is no iterator that walks through all
      of the tasks of a given pid type starting with a struct pid.  This prevents us
      from replacing some pid_t instances with struct pid.  So this patch adds
      do_each_pid_task, which walks through the set of tasks for a given pid type
      starting with a struct pid.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: readdir race fix (take 3) · 0804ef4b
      Eric W. Biederman committed
      The problem: An opendir, readdir, closedir sequence can fail to report
      process ids that are continually in use throughout the sequence of system
      calls.  For this race to trigger the process that proc_pid_readdir stops at
      must exit before readdir is called again.
      
      This can cause ps to fail to report processes, and it is in violation of
      posix guarantees and normal application expectations with respect to
      readdir.
      
      Currently there is no way to work around this problem in user space short
      of providing a gargantuan buffer to user space so the directory read all
      happens in one system call.
      
      This patch implements the normal directory semantics for proc, which
      guarantee that a directory entry that is neither created nor destroyed
      while reading the directory will be returned.  Directory entries that are
      either created or destroyed during the readdir may or may not be seen.
      Furthermore you may seek to a directory offset you have previously seen.
      
      These are the guarantees that ext[23] provide and that posix requires, and
      more importantly that user space expects.  Plus it is a simple semantic on
      which to implement reliable service.  It is just a matter of calling readdir
      a second time if you are wondering if something new has shown up.
      
      These better semantics are implemented by scanning through the pids in
      numerical order and by making the file offset a pid plus a fixed offset.
      
      The pid scan happens on the pid bitmap, which when you look at it is
      remarkably efficient for a brute force algorithm.  A typical cache line
      is 64 bytes and thus covers space for 64*8 == 512 (0x200) pids, so there
      are only 64 (0x40) cache lines for the entire 32K pid space.  A typical
      system will have 100 pids or more so this is actually fewer cache lines
      than we would touch scanning a linked list, and the worst case of having to scan the
      entire pid bitmap is pretty reasonable.
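
The arithmetic and the scan itself can be sketched in userspace as below. In decimal, a 64-byte cache line covers 64*8 = 512 pids (0x200 in hexadecimal) and a 32K pid space spans 64 such lines (0x40). The flat bitmap, the constants and all function names here are assumptions for illustration; the kernel's pidmap is organized in pages and uses its own bit-search helpers.

```c
#include <assert.h>
#include <limits.h>

#define PID_SPACE   32768u /* the 32K pid space from the text */
#define CACHE_LINE  64u    /* bytes; "typical" per the text */
#define WORD_BITS   (sizeof(unsigned long) * CHAR_BIT)

/* One bit per pid; a set bit means the pid is in use. */
static unsigned long pid_bitmap[PID_SPACE / WORD_BITS];

unsigned int pids_per_cache_line(void)
{
	return CACHE_LINE * CHAR_BIT;
}

unsigned int cache_lines_for_pid_space(void)
{
	return PID_SPACE / pids_per_cache_line();
}

void set_pid(unsigned int pid)
{
	pid_bitmap[pid / WORD_BITS] |= 1UL << (pid % WORD_BITS);
}

/*
 * Brute-force scan in numerical order: return the first in-use pid that is
 * >= start, or -1 if there is none. Because the result depends only on the
 * numeric position, a reader can resume from "last pid seen + 1" and will
 * not miss any pid that stayed allocated for the whole walk.
 */
int next_pid_at_or_after(unsigned int start)
{
	unsigned int pid;

	for (pid = start; pid < PID_SPACE; pid++) {
		unsigned long word = pid_bitmap[pid / WORD_BITS];

		if (word == 0) {
			/* Empty word: jump to its last bit, the loop ++ moves on. */
			pid |= (unsigned int)(WORD_BITS - 1);
			continue;
		}
		if (word & (1UL << (pid % WORD_BITS)))
			return (int)pid;
	}
	return -1;
}
```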
      
      If we need something more efficient we can go to a more efficient data
      structure for indexing the pids, but for now what we have should be
      sufficient.
      
      In addition this takes no additional locks and is actually less code than
      what we are doing now.
      
      Also another very subtle bug in this area has been fixed.  It is possible
      to catch a task in the middle of de_thread where a thread is assuming the
      identity of its thread group leader.  This patch carefully handles that case
      so if we hit it we don't fail to return the pid that is undergoing the
      de_thread dance.
      
      Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
      providing the first fix, pointing this out and working on it.
      
      [oleg@tv-sign.ru: fix it]
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Jean Delvare <jdelvare@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  21. 27 Sep, 2006 1 commit
    • [PATCH] pid: Implement transfer_pid and use it to simplify de_thread · c18258c6
      Eric W. Biederman committed
      In de_thread we move pids from one process to another, a rather ugly case.
      The function transfer_pid makes it clear what we are doing, and makes the
      action atomic.  This is useful if we ever want to atomically traverse the
      process group and session lists, in a rcu safe manner.
      
      Even without the atomic properties this change should be a win as transfer_pid
      should be less code to execute than executing both attach_pid and
      detach_pid, and this should make de_thread slightly smaller as only a
      single function call needs to be emitted.  The only downside is that the
      code might be slower to execute as the odds are against transfer_pid being
      in cache.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  22. 01 Apr, 2006 1 commit
    • [PATCH] pidhash: Refactor the pid hash table · 92476d7f
      Eric W. Biederman committed
      Simplifies the code, reduces the need for 4 pid hash tables, and makes the
      code more capable.
      
      In the discussions I had with Oleg it was felt that to a large extent the
      cleanup itself justified the work.  Struct pid being dynamically
      allocated meant we could create the hash table entry when the pid was
      allocated and free the hash table entry when the pid was freed, instead of
      playing with the hash lists whenever a process would attach or detach to a
      process.
      
      For myself the fact that it gave what my previous task_ref patch gave for free
      with simpler code was a big win.  The problem is that if you hold a reference
      to struct task_struct you lock in 10K of low memory.  If you do that in a user
      controllable way like /proc does, with an unprivileged but hostile user space
      application with typical resource limits of 1000 fds and 100 processes I can
      trigger the OOM killer by consuming all of low memory with task structs, on a
      machine with 1GB of low memory.
      
      If I instead hold a reference to struct pid which holds a pointer to my
      task_struct, I don't suffer from that problem because struct pid is 2 orders
      of magnitude smaller.  In fact struct pid is small enough that most other
      kernel data structures dwarf it, so simply limiting the number of referring
      data structures is enough to prevent exhaustion of low memory.
      
      This splits the current struct pid into two structures, struct pid and struct
      pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one.
      struct pid_link is the per process linkage into the hash tables and lives in
      struct task_struct.  struct pid is given an independent lifetime, and holds
      pointers to each of the pid types.
      
      The independent life of struct pid simplifies attach_pid, and detach_pid,
      because we are always manipulating the list of pids and not the hash table.
      In addition, giving struct pid an independent life makes the concept much
      more powerful.
      
      Kernel data structures can now embed a struct pid * instead of a pid_t and
      not suffer from pid wrap around problems or from keeping unnecessarily
      large amounts of memory allocated.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  23. 29 Mar, 2006 2 commits
    • [PATCH] pids: kill PIDTYPE_TGID · 47e65328
      Oleg Nesterov committed
      This patch kills PIDTYPE_TGID pid_type thus saving one hash table in
      kernel/pid.c and speeding up subthreads create/destroy a bit.  It is also a
      preparation for the further tref/pids rework.
      
      This patch adds 'struct list_head thread_group' to 'struct task_struct'
      instead.
      
      We don't detach group leader from PIDTYPE_PID namespace until another
      thread inherits its ->pid == ->tgid, so we are safe wrt premature
      free_pidmap(->tgid) call.
      
      Currently there are no users of find_task_by_pid_type(PIDTYPE_TGID).
      Should the need arise, we can use find_task_by_pid()->group_leader.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Acked-By: Eric Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pidhash: kill switch_exec_pids · d73d6529
      Eric W. Biederman committed
      switch_exec_pids is only called from de_thread by way of exec, and it is
      only called when we are exec'ing from a non thread group leader.
      
      Currently switch_exec_pids gives the leader the pid of the thread and
      unhashes and rehashes all of the process groups.  The leader is already in
      the EXIT_DEAD state so no one cares about its pids.  The only concern for
      the leader is that __unhash_process called from release_task will function
      correctly.  If we don't touch the leader at all we know that
      __unhash_process will work fine so there is no need to touch the leader.
      
      For the task becoming the thread group leader, we just need to give it the
      pid of the old thread group leader, add it to the task list, and attach it
      to the session and the process group of the thread group.
      
      Currently de_thread is also adding the task to the task list which is just
      silly.
      
      Currently the only caller of __detach_pid besides detach_pid is
      switch_exec_pids because of the ugly extra work that was being
      performed.
      
      So this patch removes switch_exec_pids because it is doing too much, it is
      creating an unnecessary special case in pid.c, doing work duplicated in
      de_thread, and generally obscuring what is going on.
      
      The necessary work is added to de_thread, and it seems to be a little
      clearer there what is going on.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>