1. 03 4月, 2009 4 次提交
    • J
      kernel/cgroup.c: kfree(NULL) is legal · 66bdc9cf
      Jesper Juhl 提交于
      Reduces object file size a bit:
      
      Before:
      $ size kernel/cgroup.o
         text    data     bss     dec     hex filename
        21593    7804    4924   34321    8611 kernel/cgroup.o
      After:
      $ size kernel/cgroup.o
         text    data     bss     dec     hex filename
        21537    7744    4924   34205    859d kernel/cgroup.o
      Signed-off-by: NJesper Juhl <jj@chaosbits.net>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      66bdc9cf
    • K
      cgroup: fix frequent -EBUSY at rmdir · ec64f515
      KAMEZAWA Hiroyuki 提交于
      In following situation, with memory subsystem,
      
      	/groupA use_hierarchy==1
      		/01 some tasks
      		/02 some tasks
      		/03 some tasks
      		/04 empty
      
      When tasks under 01/02/03 hit limit on /groupA, hierarchical reclaim
      is triggered and the kernel walks tree under groupA. In this case,
      rmdir /groupA/04 fails with -EBUSY frequently because of temporal
      refcnt from the kernel.
      
      In general. cgroup can be rmdir'd if there are no children groups and
      no tasks. Frequent fails of rmdir() is not useful to users.
      (And the reason for -EBUSY is unknown to users.....in most cases)
      
      This patch tries to modify above behavior, by
      	- retries if css_refcnt is got by someone.
      	- add "return value" to pre_destroy() and allows subsystem to
      	  say "we're really busy!"
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ec64f515
    • K
      cgroup: CSS ID support · 38460b48
      KAMEZAWA Hiroyuki 提交于
      Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
      
      This patch attaches unique ID to each css and provides following.
      
       - css_lookup(subsys, id)
         returns pointer to struct cgroup_subysys_state of id.
       - css_get_next(subsys, id, rootid, depth, foundid)
         returns the next css under "root" by scanning
      
      When cgroup_subsys->use_id is set, an id for css is maintained.
      
      The cgroup framework only parepares
      	- css_id of root css for subsys
      	- id is automatically attached at creation of css.
      	- id is *not* freed automatically. Because the cgroup framework
      	  don't know lifetime of cgroup_subsys_state.
      	  free_css_id() function is provided. This must be called by subsys.
      
      There are several reasons to develop this.
      	- Saving space .... For example, memcg's swap_cgroup is array of
      	  pointers to cgroup. But it is not necessary to be very fast.
      	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
      	  reduce much amount of memory usage.
      
      	- Scanning without lock.
      	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
      	  css under root can be written without locks.
      	  ex)
      	  do {
      		rcu_read_lock();
      		next = cgroup_get_next(subsys, id, root, &found);
      		/* check sanity of next here */
      		css_tryget();
      		rcu_read_unlock();
      		id = found + 1
      	 } while(...)
      
      Characteristics:
      	- Each css has unique ID under subsys.
      	- Lifetime of ID is controlled by subsys.
      	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
      	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
      
      Design Choices:
      	- scan-by-ID v.s. scan-by-tree-walk.
      	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
      	  by following kind of routine.
      	  scan -> rest a while(release a lock) -> conitunue from interrupted
      	  memcg's hierarchical reclaim does this.
      
      	- When subsys->use_id is set, # of css in the system is limited to
      	  65535.
      
      [bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NPaul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NBharata B Rao <bharata@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      38460b48
    • G
      cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups · 313e924c
      Grzegorz Nosek 提交于
      The ns_proxy cgroup allows moving processes to child cgroups only one
      level deep at a time.  This commit relaxes this restriction and makes it
      possible to attach tasks directly to grandchild cgroups, e.g.:
      
      ($pid is in the root cgroup)
      echo $pid > /cgroup/CG1/CG2/tasks
      
      Previously this operation would fail with -EPERM and would have to be
      performed as two steps:
      echo $pid > /cgroup/CG1/tasks
      echo $pid > /cgroup/CG1/CG2/tasks
      
      Also, the target cgroup no longer needs to be empty to move a task there.
      Signed-off-by: NGrzegorz Nosek <root@localdomain.pl>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      313e924c
  2. 28 3月, 2009 2 次提交
  3. 19 2月, 2009 1 次提交
  4. 12 2月, 2009 1 次提交
  5. 30 1月, 2009 4 次提交
  6. 09 1月, 2009 14 次提交
  7. 07 1月, 2009 1 次提交
  8. 06 1月, 2009 1 次提交
  9. 05 1月, 2009 1 次提交
    • L
      cgroups: fix a race between cgroup_clone and umount · 7b574b7b
      Li Zefan 提交于
      The race is calling cgroup_clone() while umounting the ns cgroup subsys,
      and thus cgroup_clone() might access invalid cgroup_fs, or kill_sb() is
      called after cgroup_clone() created a new dir in it.
      
      The BUG I triggered is BUG_ON(root->number_of_cgroups != 1);
      
        ------------[ cut here ]------------
        kernel BUG at kernel/cgroup.c:1093!
        invalid opcode: 0000 [#1] SMP
        ...
        Process umount (pid: 5177, ti=e411e000 task=e40c4670 task.ti=e411e000)
        ...
        Call Trace:
         [<c0493df7>] ? deactivate_super+0x3f/0x51
         [<c04a3600>] ? mntput_no_expire+0xb3/0xdd
         [<c04a3ab2>] ? sys_umount+0x265/0x2ac
         [<c04a3b06>] ? sys_oldumount+0xd/0xf
         [<c0403911>] ? sysenter_do_call+0x12/0x31
        ...
        EIP: [<c0456e76>] cgroup_kill_sb+0x23/0xe0 SS:ESP 0068:e411ef2c
        ---[ end trace c766c1be3bf944ac ]---
      
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Serge E. Hallyn" <serue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b574b7b
  10. 24 12月, 2008 2 次提交
  11. 16 12月, 2008 1 次提交
    • P
      cgroups: fix a race between rmdir and remount · 307257cf
      Paul Menage 提交于
      When a cgroup is removed, it's unlinked from its parent's children list,
      but not actually freed until the last dentry on it is released (at which
      point cgrp->root->number_of_cgroups is decremented).
      
      Currently rebind_subsystems checks for the top cgroup's child list being
      empty in order to rebind subsystems into or out of a hierarchy - this can
      result in the set of subsystems bound to a hierarchy being
      removed-but-not-freed cgroup.
      
      The simplest fix for this is to forbid remounts that change the set of
      subsystems on a hierarchy that has removed-but-not-freed cgroups.  This
      bug can be reproduced via:
      
      mkdir /mnt/cg
      mount -t cgroup -o ns,freezer cgroup /mnt/cg
      mkdir /mnt/cg/foo
      sleep 1h < /mnt/cg/foo &
      rmdir /mnt/cg/foo
      mount -t cgroup -o remount,ns,devices,freezer cgroup /mnt/cg
      kill $!
      
      Though the above will cause oops in -mm only but not mainline, but the bug
      can cause memory leak in mainline (and even oops)
      Signed-off-by: NPaul Menage <menage@google.com>
      Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      307257cf
  12. 20 11月, 2008 2 次提交
  13. 14 11月, 2008 3 次提交
  14. 07 11月, 2008 1 次提交
    • L
      cgroups: fix invalid cgrp->dentry before cgroup has been completely removed · 24eb0899
      Li Zefan 提交于
      This fixes an oops when reading /proc/sched_debug.
      
      A cgroup won't be removed completely until finishing cgroup_diput(), so we
      shouldn't invalidate cgrp->dentry in cgroup_rmdir().  Otherwise, when a
      group is being removed while cgroup_path() gets called, we may trigger
      NULL dereference BUG.
      
      The bug can be reproduced:
      
       # cat test.sh
       #!/bin/sh
       mount -t cgroup -o cpu xxx /mnt
       for (( ; ; ))
       {
      	mkdir /mnt/sub
      	rmdir /mnt/sub
       }
       # ./test.sh &
       # cat /proc/sched_debug
      
      BUG: unable to handle kernel NULL pointer dereference at 00000038
      IP: [<c045a47f>] cgroup_path+0x39/0x90
      ...
      Call Trace:
       [<c0420344>] ? print_cfs_rq+0x6e/0x75d
       [<c0421160>] ? sched_debug_show+0x72d/0xc1e
      ...
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: NPaul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>		[2.6.26.x, 2.6.27.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      24eb0899
  15. 27 10月, 2008 1 次提交
  16. 20 10月, 2008 1 次提交