1. 30 1月, 2009 2 次提交
  2. 09 1月, 2009 14 次提交
  3. 07 1月, 2009 1 次提交
  4. 06 1月, 2009 1 次提交
  5. 05 1月, 2009 1 次提交
    • L
      cgroups: fix a race between cgroup_clone and umount · 7b574b7b
      Li Zefan 提交于
      The race is calling cgroup_clone() while umounting the ns cgroup subsys,
      and thus cgroup_clone() might access invalid cgroup_fs, or kill_sb() is
      called after cgroup_clone() created a new dir in it.
      
      The BUG I triggered is BUG_ON(root->number_of_cgroups != 1);
      
        ------------[ cut here ]------------
        kernel BUG at kernel/cgroup.c:1093!
        invalid opcode: 0000 [#1] SMP
        ...
        Process umount (pid: 5177, ti=e411e000 task=e40c4670 task.ti=e411e000)
        ...
        Call Trace:
         [<c0493df7>] ? deactivate_super+0x3f/0x51
         [<c04a3600>] ? mntput_no_expire+0xb3/0xdd
         [<c04a3ab2>] ? sys_umount+0x265/0x2ac
         [<c04a3b06>] ? sys_oldumount+0xd/0xf
         [<c0403911>] ? sysenter_do_call+0x12/0x31
        ...
        EIP: [<c0456e76>] cgroup_kill_sb+0x23/0xe0 SS:ESP 0068:e411ef2c
        ---[ end trace c766c1be3bf944ac ]---
      
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Serge E. Hallyn" <serue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b574b7b
  6. 24 12月, 2008 2 次提交
  7. 16 12月, 2008 1 次提交
    • P
      cgroups: fix a race between rmdir and remount · 307257cf
      Paul Menage 提交于
      When a cgroup is removed, it's unlinked from its parent's children list,
      but not actually freed until the last dentry on it is released (at which
      point cgrp->root->number_of_cgroups is decremented).
      
      Currently rebind_subsystems checks for the top cgroup's child list being
      empty in order to rebind subsystems into or out of a hierarchy - this can
      result in the set of subsystems bound to a hierarchy being
      removed-but-not-freed cgroup.
      
      The simplest fix for this is to forbid remounts that change the set of
      subsystems on a hierarchy that has removed-but-not-freed cgroups.  This
      bug can be reproduced via:
      
      mkdir /mnt/cg
      mount -t cgroup -o ns,freezer cgroup /mnt/cg
      mkdir /mnt/cg/foo
      sleep 1h < /mnt/cg/foo &
      rmdir /mnt/cg/foo
      mount -t cgroup -o remount,ns,devices,freezer cgroup /mnt/cg
      kill $!
      
      Though the above will cause oops in -mm only but not mainline, but the bug
      can cause memory leak in mainline (and even oops)
      Signed-off-by: NPaul Menage <menage@google.com>
      Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      307257cf
  8. 20 11月, 2008 2 次提交
  9. 14 11月, 2008 3 次提交
  10. 07 11月, 2008 1 次提交
    • L
      cgroups: fix invalid cgrp->dentry before cgroup has been completely removed · 24eb0899
      Li Zefan 提交于
      This fixes an oops when reading /proc/sched_debug.
      
      A cgroup won't be removed completely until finishing cgroup_diput(), so we
      shouldn't invalidate cgrp->dentry in cgroup_rmdir().  Otherwise, when a
      group is being removed while cgroup_path() gets called, we may trigger
      NULL dereference BUG.
      
      The bug can be reproduced:
      
       # cat test.sh
       #!/bin/sh
       mount -t cgroup -o cpu xxx /mnt
       for (( ; ; ))
       {
      	mkdir /mnt/sub
      	rmdir /mnt/sub
       }
       # ./test.sh &
       # cat /proc/sched_debug
      
      BUG: unable to handle kernel NULL pointer dereference at 00000038
      IP: [<c045a47f>] cgroup_path+0x39/0x90
      ...
      Call Trace:
       [<c0420344>] ? print_cfs_rq+0x6e/0x75d
       [<c0421160>] ? sched_debug_show+0x72d/0xc1e
      ...
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: NPaul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>		[2.6.26.x, 2.6.27.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      24eb0899
  11. 27 10月, 2008 1 次提交
  12. 20 10月, 2008 2 次提交
    • P
      cgroups: convert tasks file to use a seq_file with shared pid array · cc31edce
      Paul Menage 提交于
      Rather than pre-generating the entire text for the "tasks" file each
      time the file is opened, we instead just generate/update the array of
      process ids and use a seq_file to report these to userspace.  All open
      file handles on the same "tasks" file can share a pid array, which may
      be updated any time that no thread is actively reading the array.  By
      sharing the array, the potential for userspace to DoS the system by
      opening many handles on the same "tasks" file is removed.
      
      [Based on a patch by Lai Jiangshan, extended to use seq_file]
      Signed-off-by: NPaul Menage <menage@google.com>
      Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cc31edce
    • L
      cgroups: fix probable race with put_css_set[_taskexit] and find_css_set · 146aa1bd
      Lai Jiangshan 提交于
      put_css_set_taskexit may be called when find_css_set is called on other
      cpu.  And the race will occur:
      
      put_css_set_taskexit side                    find_css_set side
      
                                              |
      atomic_dec_and_test(&kref->refcount)    |
          /* kref->refcount = 0 */            |
      ....................................................................
                                              |  read_lock(&css_set_lock)
                                              |  find_existing_css_set
                                              |  get_css_set
                                              |  read_unlock(&css_set_lock);
      ....................................................................
      __release_css_set                       |
      ....................................................................
                                              | /* use a released css_set */
                                              |
      
      [put_css_set is the same. But in the current code, all put_css_set are
      put into cgroup mutex critical region as the same as find_css_set.]
      
      [akpm@linux-foundation.org: repair comments]
      [menage@google.com: eliminate race in css_set refcounting]
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NPaul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      146aa1bd
  13. 17 10月, 2008 1 次提交
  14. 29 9月, 2008 1 次提交
    • B
      mm owner: fix race between swapoff and exit · 31a78f23
      Balbir Singh 提交于
      There's a race between mm->owner assignment and swapoff, more easily
      seen when task slab poisoning is turned on.  The condition occurs when
      try_to_unuse() runs in parallel with an exiting task.  A similar race
      can occur with callers of get_task_mm(), such as /proc/<pid>/<mmstats>
      or ptrace or page migration.
      
      CPU0                                    CPU1
                                              try_to_unuse
                                              looks at mm = task0->mm
                                              increments mm->mm_users
      task 0 exits
      mm->owner needs to be updated, but no
      new owner is found (mm_users > 1, but
      no other task has task->mm = task0->mm)
      mm_update_next_owner() leaves
                                              mmput(mm) decrements mm->mm_users
      task0 freed
                                              dereferencing mm->owner fails
      
      The fix is to notify the subsystem via mm_owner_changed callback(),
      if no new owner is found, by specifying the new task as NULL.
      
      Jiri Slaby:
      mm->owner was set to NULL prior to calling cgroup_mm_owner_callbacks(), but
      must be set after that, so as not to pass NULL as old owner causing oops.
      
      Daisuke Nishimura:
      mm_update_next_owner() may set mm->owner to NULL, but mem_cgroup_from_task()
      and its callers need to take account of this situation to avoid oops.
      
      Hugh Dickins:
      Lockdep warning and hang below exec_mmap() when testing these patches.
      exit_mm() up_reads mmap_sem before calling mm_update_next_owner(),
      so exec_mmap() now needs to do the same.  And with that repositioning,
      there's now no point in mm_need_new_owner() allowing for NULL mm.
      Reported-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: NJiri Slaby <jirislaby@gmail.com>
      Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      31a78f23
  15. 31 7月, 2008 3 次提交
  16. 27 7月, 2008 2 次提交
  17. 26 7月, 2008 2 次提交
    • S
      cgroup_clone: use pid of newly created task for new cgroup · e885dcde
      Serge E. Hallyn 提交于
      cgroup_clone creates a new cgroup with the pid of the task.  This works
      correctly for unshare, but for clone cgroup_clone is called from
      copy_namespaces inside copy_process, which happens before the new pid is
      created.  As a result, the new cgroup was created with current's pid.
      This patch:
      
      	1. Moves the call inside copy_process to after the new pid
      	   is created
      	2. Passes the struct pid into ns_cgroup_clone (as it is not
      	   yet attached to the task)
      	3. Passes a name from ns_cgroup_clone() into cgroup_clone()
      	   so as to keep cgroup_clone() itself simpler
      	4. Uses pid_vnr() to get the process id value, so that the
      	   pid used to name the new cgroup is always the pid as it
      	   would be known to the task which did the cloning or
      	   unsharing.  I think that is the most intuitive thing to
      	   do.  This way, task t1 does clone(CLONE_NEWPID) to get
      	   t2, which does clone(CLONE_NEWPID) to get t3, then the
      	   cgroup for t3 will be named for the pid by which t2 knows
      	   t3.
      
      (Thanks to Dan Smith for finding the main bug)
      
      Changelog:
      	June 11: Incorporate Paul Menage's feedback:  don't pass
      	         NULL to ns_cgroup_clone from unshare, and reduce
      		 patch size by using 'nodename' in cgroup_clone.
      	June 10: Original version
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NSerge Hallyn <serge@us.ibm.com>
      Acked-by: NPaul Menage <menage@google.com>
      Tested-by: NDan Smith <danms@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e885dcde
    • P
      cgroup files: turn attach_task_by_pid directly into a cgroup write handler · af351026
      Paul Menage 提交于
      This patch changes attach_task_by_pid() to take a u64 rather than a
      string; as a result it can be called directly as a control groups
      write_u64 handler, and cgroup_common_file_write() can be removed.
      Signed-off-by: NPaul Menage <menage@google.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af351026