1. 03 4月, 2009 2 次提交
    • K
      cgroup: CSS ID support · 38460b48
      KAMEZAWA Hiroyuki 提交于
      Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
      
      This patch attaches unique ID to each css and provides following.
      
       - css_lookup(subsys, id)
         returns pointer to struct cgroup_subysys_state of id.
       - css_get_next(subsys, id, rootid, depth, foundid)
         returns the next css under "root" by scanning
      
      When cgroup_subsys->use_id is set, an id for css is maintained.
      
      The cgroup framework only parepares
      	- css_id of root css for subsys
      	- id is automatically attached at creation of css.
      	- id is *not* freed automatically. Because the cgroup framework
      	  don't know lifetime of cgroup_subsys_state.
      	  free_css_id() function is provided. This must be called by subsys.
      
      There are several reasons to develop this.
      	- Saving space .... For example, memcg's swap_cgroup is array of
      	  pointers to cgroup. But it is not necessary to be very fast.
      	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
      	  reduce much amount of memory usage.
      
      	- Scanning without lock.
      	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
      	  css under root can be written without locks.
      	  ex)
      	  do {
      		rcu_read_lock();
      		next = cgroup_get_next(subsys, id, root, &found);
      		/* check sanity of next here */
      		css_tryget();
      		rcu_read_unlock();
      		id = found + 1
      	 } while(...)
      
      Characteristics:
      	- Each css has unique ID under subsys.
      	- Lifetime of ID is controlled by subsys.
      	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
      	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
      
      Design Choices:
      	- scan-by-ID v.s. scan-by-tree-walk.
      	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
      	  by following kind of routine.
      	  scan -> rest a while(release a lock) -> conitunue from interrupted
      	  memcg's hierarchical reclaim does this.
      
      	- When subsys->use_id is set, # of css in the system is limited to
      	  65535.
      
      [bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NPaul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NBharata B Rao <bharata@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      38460b48
    • G
      cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups · 313e924c
      Grzegorz Nosek 提交于
      The ns_proxy cgroup allows moving processes to child cgroups only one
      level deep at a time.  This commit relaxes this restriction and makes it
      possible to attach tasks directly to grandchild cgroups, e.g.:
      
      ($pid is in the root cgroup)
      echo $pid > /cgroup/CG1/CG2/tasks
      
      Previously this operation would fail with -EPERM and would have to be
      performed as two steps:
      echo $pid > /cgroup/CG1/tasks
      echo $pid > /cgroup/CG1/CG2/tasks
      
      Also, the target cgroup no longer needs to be empty to move a task there.
      Signed-off-by: NGrzegorz Nosek <root@localdomain.pl>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      313e924c
  2. 28 3月, 2009 2 次提交
  3. 19 2月, 2009 1 次提交
  4. 12 2月, 2009 1 次提交
  5. 30 1月, 2009 4 次提交
  6. 09 1月, 2009 14 次提交
  7. 07 1月, 2009 1 次提交
  8. 06 1月, 2009 1 次提交
  9. 05 1月, 2009 1 次提交
    • L
      cgroups: fix a race between cgroup_clone and umount · 7b574b7b
      Li Zefan 提交于
      The race is calling cgroup_clone() while umounting the ns cgroup subsys,
      and thus cgroup_clone() might access invalid cgroup_fs, or kill_sb() is
      called after cgroup_clone() created a new dir in it.
      
      The BUG I triggered is BUG_ON(root->number_of_cgroups != 1);
      
        ------------[ cut here ]------------
        kernel BUG at kernel/cgroup.c:1093!
        invalid opcode: 0000 [#1] SMP
        ...
        Process umount (pid: 5177, ti=e411e000 task=e40c4670 task.ti=e411e000)
        ...
        Call Trace:
         [<c0493df7>] ? deactivate_super+0x3f/0x51
         [<c04a3600>] ? mntput_no_expire+0xb3/0xdd
         [<c04a3ab2>] ? sys_umount+0x265/0x2ac
         [<c04a3b06>] ? sys_oldumount+0xd/0xf
         [<c0403911>] ? sysenter_do_call+0x12/0x31
        ...
        EIP: [<c0456e76>] cgroup_kill_sb+0x23/0xe0 SS:ESP 0068:e411ef2c
        ---[ end trace c766c1be3bf944ac ]---
      
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Serge E. Hallyn" <serue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b574b7b
  10. 24 12月, 2008 2 次提交
  11. 16 12月, 2008 1 次提交
    • P
      cgroups: fix a race between rmdir and remount · 307257cf
      Paul Menage 提交于
      When a cgroup is removed, it's unlinked from its parent's children list,
      but not actually freed until the last dentry on it is released (at which
      point cgrp->root->number_of_cgroups is decremented).
      
      Currently rebind_subsystems checks for the top cgroup's child list being
      empty in order to rebind subsystems into or out of a hierarchy - this can
      result in the set of subsystems bound to a hierarchy being
      removed-but-not-freed cgroup.
      
      The simplest fix for this is to forbid remounts that change the set of
      subsystems on a hierarchy that has removed-but-not-freed cgroups.  This
      bug can be reproduced via:
      
      mkdir /mnt/cg
      mount -t cgroup -o ns,freezer cgroup /mnt/cg
      mkdir /mnt/cg/foo
      sleep 1h < /mnt/cg/foo &
      rmdir /mnt/cg/foo
      mount -t cgroup -o remount,ns,devices,freezer cgroup /mnt/cg
      kill $!
      
      Though the above will cause oops in -mm only but not mainline, but the bug
      can cause memory leak in mainline (and even oops)
      Signed-off-by: NPaul Menage <menage@google.com>
      Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      307257cf
  12. 20 11月, 2008 2 次提交
  13. 14 11月, 2008 3 次提交
  14. 07 11月, 2008 1 次提交
    • L
      cgroups: fix invalid cgrp->dentry before cgroup has been completely removed · 24eb0899
      Li Zefan 提交于
      This fixes an oops when reading /proc/sched_debug.
      
      A cgroup won't be removed completely until finishing cgroup_diput(), so we
      shouldn't invalidate cgrp->dentry in cgroup_rmdir().  Otherwise, when a
      group is being removed while cgroup_path() gets called, we may trigger
      NULL dereference BUG.
      
      The bug can be reproduced:
      
       # cat test.sh
       #!/bin/sh
       mount -t cgroup -o cpu xxx /mnt
       for (( ; ; ))
       {
      	mkdir /mnt/sub
      	rmdir /mnt/sub
       }
       # ./test.sh &
       # cat /proc/sched_debug
      
      BUG: unable to handle kernel NULL pointer dereference at 00000038
      IP: [<c045a47f>] cgroup_path+0x39/0x90
      ...
      Call Trace:
       [<c0420344>] ? print_cfs_rq+0x6e/0x75d
       [<c0421160>] ? sched_debug_show+0x72d/0xc1e
      ...
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: NPaul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>		[2.6.26.x, 2.6.27.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      24eb0899
  15. 27 10月, 2008 1 次提交
  16. 20 10月, 2008 2 次提交
    • P
      cgroups: convert tasks file to use a seq_file with shared pid array · cc31edce
      Paul Menage 提交于
      Rather than pre-generating the entire text for the "tasks" file each
      time the file is opened, we instead just generate/update the array of
      process ids and use a seq_file to report these to userspace.  All open
      file handles on the same "tasks" file can share a pid array, which may
      be updated any time that no thread is actively reading the array.  By
      sharing the array, the potential for userspace to DoS the system by
      opening many handles on the same "tasks" file is removed.
      
      [Based on a patch by Lai Jiangshan, extended to use seq_file]
      Signed-off-by: NPaul Menage <menage@google.com>
      Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cc31edce
    • L
      cgroups: fix probable race with put_css_set[_taskexit] and find_css_set · 146aa1bd
      Lai Jiangshan 提交于
      put_css_set_taskexit may be called when find_css_set is called on other
      cpu.  And the race will occur:
      
      put_css_set_taskexit side                    find_css_set side
      
                                              |
      atomic_dec_and_test(&kref->refcount)    |
          /* kref->refcount = 0 */            |
      ....................................................................
                                              |  read_lock(&css_set_lock)
                                              |  find_existing_css_set
                                              |  get_css_set
                                              |  read_unlock(&css_set_lock);
      ....................................................................
      __release_css_set                       |
      ....................................................................
                                              | /* use a released css_set */
                                              |
      
      [put_css_set is the same. But in the current code, all put_css_set are
      put into cgroup mutex critical region as the same as find_css_set.]
      
      [akpm@linux-foundation.org: repair comments]
      [menage@google.com: eliminate race in css_set refcounting]
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NPaul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      146aa1bd
  17. 17 10月, 2008 1 次提交