1. 02 Oct, 2009 1 commit
  2. 24 Sep, 2009 11 commits
  3. 23 Sep, 2009 1 commit
  4. 22 Sep, 2009 2 commits
  5. 11 Sep, 2009 1 commit
  6. 30 Jul, 2009 2 commits
  7. 19 Jun, 2009 1 commit
  8. 12 Jun, 2009 1 commit
  9. 09 May, 2009 1 commit
  10. 03 Apr, 2009 7 commits
    • memcg: fix OOM killer under memcg · 0b7f569e
      KAMEZAWA Hiroyuki committed
      This patch tries to fix OOM killer problems caused by hierarchy.
      Currently, memcg has its own OOM kill function (in oom_kill.c) that tries
      to kill a task in the memcg.
      
      But when hierarchy is used, it is broken and the correct task cannot
      be killed. For example, in the following cgroup
      
      	/groupA/	hierarchy=1, limit=1G,
      		01	nolimit
      		02	nolimit
      All tasks' memory usage under /groupA, /groupA/01 and /groupA/02 is limited
      to groupA's 1 Gbyte, but the OOM killer only kills tasks in groupA.
      
      This patch makes the bad process be selected from all tasks under the
      hierarchy. BTW, currently, oom_jiffies is updated only against groupA
      in the above case. oom_jiffies of the whole tree should be updated.
      
      To see how oom_jiffies is used, please check mem_cgroup_oom_called()
      callers.
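
The selection rule described above can be sketched in userspace C. This is a minimal illustration, not the kernel code: `struct group`, `struct task`, `group_is_under()` and `select_bad_task()` are hypothetical names, and the "badness" score is simply the task's memory usage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's data structures. */
struct group {
	const struct group *parent;   /* NULL at the top of the tree */
};

struct task {
	const struct group *grp;      /* group the task is attached to */
	unsigned long usage;          /* toy badness score: memory in use */
};

/* Return 1 if grp is root itself or any descendant of root. */
int group_is_under(const struct group *grp, const struct group *root)
{
	for (; grp != NULL; grp = grp->parent)
		if (grp == root)
			return 1;
	return 0;
}

/* Pick the highest-scoring task among all tasks under root's hierarchy. */
const struct task *select_bad_task(const struct task *tasks, size_t n,
				   const struct group *root)
{
	const struct task *worst = NULL;
	size_t i;

	assert(root != NULL);
	for (i = 0; i < n; i++) {
		/* Skip tasks that live outside the hierarchy rooted at root. */
		if (!group_is_under(tasks[i].grp, root))
			continue;
		if (worst == NULL || tasks[i].usage > worst->usage)
			worst = &tasks[i];
	}
	return worst;
}
```

With groupA as root, a task in /groupA/01 becomes a candidate even though it is not attached to groupA directly, which is the behavior the patch aims for.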
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: const fix]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: don't change release_agent when remount failed · 0670e08b
      Li Zefan committed
      Remount can fail in either of two cases:
        - wrong mount options are specified, or option 'noprefix' is changed.
        - a to-be-added subsys is already mounted/active.
      
      When using remount to change 'release_agent', in the former failure
      case remount returns an errno with release_agent unchanged, but in the
      latter case remount returns EBUSY with release_agent changed, which is
      unexpected I think:
      
       # mount -t cgroup -o cpu xxx /cgrp1
       # mount -t cgroup -o cpuset,release_agent=agent1 yyy /cgrp2
       # cat /cgrp2/release_agent
       agent1
       # mount -t cgroup -o remount,cpuset,noprefix,release_agent=agent2 yyy /cgrp2
       mount: /cgrp2 not mounted already, or bad option
       # cat /cgrp2/release_agent
       agent1     <-- ok
       # mount -t cgroup -o remount,cpu,cpuset,release_agent=agent2 yyy /cgrp2
       mount: /cgrp2 is busy
       # cat /cgrp2/release_agent
       agent2     <-- unexpected!
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: show correct file mode · 099fca32
      Li Zefan committed
      We have some read-only files and write-only files, but currently they are
      all set to 0644, which is counter-intuitive and causes trouble for some
      cgroup tools like libcgroup.
      
      This patch adds 'mode' to struct cftype to allow a cgroup subsys to set its
      own files' file mode. In most cases cft->mode can default to 0 and
      cgroup will figure out the proper mode.
      Acked-by: Paul Menage <menage@google.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/cgroup.c: kfree(NULL) is legal · 66bdc9cf
      Jesper Juhl committed
      Reduces object file size a bit:
      
      Before:
      $ size kernel/cgroup.o
         text    data     bss     dec     hex filename
        21593    7804    4924   34321    8611 kernel/cgroup.o
      After:
      $ size kernel/cgroup.o
         text    data     bss     dec     hex filename
        21537    7744    4924   34205    859d kernel/cgroup.o
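
The message does not show the change itself. As a userspace analogy only: standard C guarantees that free(NULL) is a no-op, just as kfree(NULL) is legal in the kernel, so a NULL guard before the call is redundant. `struct buf` and both functions below are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical owner of one heap allocation. */
struct buf {
	char *data;
};

/* Before: the NULL test adds a branch but no safety. */
void buf_release_guarded(struct buf *b)
{
	assert(b != NULL);
	if (b->data)
		free(b->data);
	b->data = NULL;
}

/* After: free(NULL) does nothing, so the guard can go. */
void buf_release(struct buf *b)
{
	assert(b != NULL);
	free(b->data);
	b->data = NULL;
}
```

Dropping such guards removes a compare and branch per call site, which is consistent with the small text-size reduction reported above.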
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroup: fix frequent -EBUSY at rmdir · ec64f515
      KAMEZAWA Hiroyuki committed
      In the following situation, with the memory subsystem,
      
      	/groupA use_hierarchy==1
      		/01 some tasks
      		/02 some tasks
      		/03 some tasks
      		/04 empty
      
      When tasks under 01/02/03 hit the limit on /groupA, hierarchical reclaim
      is triggered and the kernel walks the tree under groupA. In this case,
      rmdir /groupA/04 frequently fails with -EBUSY because of a temporary
      refcnt taken by the kernel.
      
      In general, a cgroup can be rmdir'd if there are no child groups and
      no tasks. Frequent failures of rmdir() are not useful to users.
      (And in most cases the reason for -EBUSY is unknown to users.)
      
      This patch tries to modify the above behavior by
      	- retrying if css_refcnt is held by someone.
      	- adding a "return value" to pre_destroy(), which allows a subsystem
      	  to say "we're really busy!"
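
The two changes listed above can be modelled with a toy in userspace C. Everything here is an assumption made for illustration: `struct toy_cgroup`, `toy_pre_destroy()` and `toy_rmdir()` are hypothetical, and the bounded retry loop merely stands in for however the kernel actually waits for transient references to go away.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a cgroup that someone is trying to remove. */
struct toy_cgroup {
	int temp_refs;     /* transient references, e.g. held by reclaim */
	int really_busy;   /* subsystem state that must block removal */
	int removed;
};

/* pre_destroy() with a return value: the subsystem may veto removal. */
int toy_pre_destroy(const struct toy_cgroup *cg)
{
	return cg->really_busy ? -EBUSY : 0;
}

/* Retry while only transient references keep the group pinned. */
int toy_rmdir(struct toy_cgroup *cg, int max_retries)
{
	int attempt;
	int ret;

	assert(cg != NULL);
	ret = toy_pre_destroy(cg);
	if (ret)
		return ret;              /* "we're really busy!" */

	for (attempt = 0; attempt <= max_retries; attempt++) {
		if (cg->temp_refs == 0) {
			cg->removed = 1;
			return 0;
		}
		/* Toy assumption: one transient holder lets go per retry. */
		cg->temp_refs--;
	}
	return -EBUSY;
}
```

The point of the sketch is the split between the two failure reasons: a veto from the subsystem is reported immediately, while a short-lived reference only delays the removal.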
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroup: CSS ID support · 38460b48
      KAMEZAWA Hiroyuki committed
      Patch for per-CSS (Cgroup Subsys State) ID and private hierarchy code.
      
      This patch attaches a unique ID to each css and provides the following.
      
       - css_lookup(subsys, id)
         returns a pointer to the struct cgroup_subsys_state of id.
       - css_get_next(subsys, id, rootid, depth, foundid)
         returns the next css under "root" by scanning
      
      When cgroup_subsys->use_id is set, an id for css is maintained.
      
      The cgroup framework only prepares
      	- css_id of root css for subsys
      	- id is automatically attached at creation of css.
      	- id is *not* freed automatically, because the cgroup framework
      	  doesn't know the lifetime of cgroup_subsys_state.
      	  free_css_id() function is provided. This must be called by subsys.
      
      There are several reasons to develop this.
      	- Saving space .... For example, memcg's swap_cgroup is an array of
      	  pointers to cgroup. But it does not need to be very fast.
      	  By replacing pointers (8 bytes per entry) with IDs (2 bytes per
      	  entry), we can reduce memory usage considerably.
      
      	- Scanning without lock.
      	  CSS_ID provides a "scan id under this ROOT" function. With this,
      	  scanning css under root can be written without locks.
      	  ex)
      	  do {
      		rcu_read_lock();
      		/* find the next css at or after id under root */
      		next = css_get_next(subsys, id, root, &found);
      		/* check sanity of next here */
      		css_tryget(next);
      		rcu_read_unlock();
      		/* resume the scan just past the css found */
      		id = found + 1;
      	 } while(...)
      
      Characteristics:
      	- Each css has unique ID under subsys.
      	- Lifetime of ID is controlled by subsys.
      	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
      	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
      
      Design Choices:
      	- scan-by-ID vs. scan-by-tree-walk.
      	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
      	  by the following kind of routine.
      	  scan -> rest a while (release a lock) -> continue from interrupted
      	  memcg's hierarchical reclaim does this.
      
      	- When subsys->use_id is set, # of css in the system is limited to
      	  65535.
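
To make the "ID, depth and stack of hierarchy" idea concrete, here is a small userspace model. It is a sketch under stated assumptions, not the kernel implementation: all `toy_` names, the fixed-size table and the tiny limits are hypothetical, and no locking or RCU is modelled.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ID    16   /* the commit allows 1..65535; kept tiny here */
#define TOY_MAX_DEPTH 4

/* ID record: own id, depth, and the ids of all ancestors by depth. */
struct toy_css_id {
	unsigned short id;                     /* 0 is the unused id */
	unsigned short depth;                  /* 0 for the hierarchy root */
	unsigned short stack[TOY_MAX_DEPTH];   /* stack[depth] == id */
};

static struct toy_css_id *toy_table[TOY_MAX_ID + 1];

/* Attach id to c and record the path from the root down to c. */
void toy_register(struct toy_css_id *c, unsigned short id,
		  const struct toy_css_id *parent)
{
	unsigned short d;

	assert(id >= 1 && id <= TOY_MAX_ID && toy_table[id] == NULL);
	c->id = id;
	c->depth = parent ? (unsigned short)(parent->depth + 1) : 0;
	assert(c->depth < TOY_MAX_DEPTH);
	for (d = 0; d < c->depth; d++)
		c->stack[d] = parent->stack[d];
	c->stack[c->depth] = id;
	toy_table[id] = c;
}

/* Lookup by id: a plain table index. */
struct toy_css_id *toy_lookup(unsigned short id)
{
	return id <= TOY_MAX_ID ? toy_table[id] : NULL;
}

/* c is under root if root's id sits at root's depth in c's stack. */
int toy_is_under(const struct toy_css_id *c, const struct toy_css_id *root)
{
	return c->depth >= root->depth && c->stack[root->depth] == root->id;
}

/* Return the first entry with id >= start that lies under root. */
struct toy_css_id *toy_get_next(unsigned short start,
				const struct toy_css_id *root,
				unsigned short *found)
{
	unsigned int id;

	for (id = start ? start : 1; id <= TOY_MAX_ID; id++) {
		struct toy_css_id *c = toy_table[id];

		if (c && toy_is_under(c, root)) {
			*found = (unsigned short)id;
			return c;
		}
	}
	return NULL;
}
```

Because a scan is resumed from `found + 1` rather than from a saved tree position, it can be interrupted and continued later without holding a reference to the place where it stopped.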
      
      [bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups · 313e924c
      Grzegorz Nosek committed
      The ns_proxy cgroup allows moving processes to child cgroups only one
      level deep at a time.  This commit relaxes this restriction and makes it
      possible to attach tasks directly to grandchild cgroups, e.g.:
      
      ($pid is in the root cgroup)
      echo $pid > /cgroup/CG1/CG2/tasks
      
      Previously this operation would fail with -EPERM and would have to be
      performed as two steps:
      echo $pid > /cgroup/CG1/tasks
      echo $pid > /cgroup/CG1/CG2/tasks
      
      Also, the target cgroup no longer needs to be empty to move a task there.
      Signed-off-by: Grzegorz Nosek <root@localdomain.pl>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 28 Mar, 2009 2 commits
  12. 19 Feb, 2009 1 commit
  13. 12 Feb, 2009 1 commit
  14. 30 Jan, 2009 4 commits
  15. 09 Jan, 2009 4 commits
    • cgroups: add css_tryget() · e7c5ec91
      Paul Menage committed
      Add css_tryget(), that obtains a counted reference on a CSS.  It is used
      in situations where the caller has a "weak" reference to the CSS, i.e.
      one that does not protect the cgroup from removal via a reference count,
      but would instead be cleaned up by a destroy() callback.
      
      css_tryget() will return true on success, or false if the cgroup is being
      removed.
      
      This is similar to Kamezawa Hiroyuki's patch from a week or two ago, but
      with the difference that in the event of css_tryget() racing with a
      cgroup_rmdir(), css_tryget() will only return false if the cgroup really
      does get removed.
      
      This implementation is done by biasing css->refcnt, so that a refcnt of 1
      means "releasable" and 0 means "released or releasing".  In the event of a
      race, css_tryget() distinguishes between "released" and "releasing" by
      checking for the CSS_REMOVED flag in css->flags.
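
The biased-refcount scheme can be sketched with C11 atomics in userspace. This is an illustration of the idea as described above, not the kernel code: `struct toy_css`, `toy_inc_not_zero()` and `toy_css_tryget()` are hypothetical, and a boolean stands in for the CSS_REMOVED flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: refcnt 1 = releasable, 0 = released or releasing. */
struct toy_css {
	atomic_int refcnt;
	atomic_bool removed;   /* stands in for CSS_REMOVED in css->flags */
};

/* Increment *v unless it is zero; report whether the increment happened. */
bool toy_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Take a reference unless the css has really been removed. */
bool toy_css_tryget(struct toy_css *css)
{
	while (!toy_inc_not_zero(&css->refcnt)) {
		/* refcnt is 0: either removed for good, or removal in flight. */
		if (atomic_load(&css->removed))
			return false;
		/* Removal may still fail and restore the count, so retry. */
	}
	return true;
}
```

The retry branch is what avoids spurious failures: a caller that races with an rmdir attempt keeps spinning until that attempt either completes, in which case the removed flag is seen, or backs out, in which case the count becomes non-zero again.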
      Signed-off-by: Paul Menage <menage@google.com>
      Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: add a per-subsystem hierarchy_mutex · 999cd8a4
      Paul Menage committed
      These patches introduce new locking/refcount support for cgroups to
      reduce the need for subsystems to call cgroup_lock(). This will
      ultimately allow the atomicity of cgroup_rmdir() (which was removed
      recently) to be restored.
      
      These three patches give:
      
      1/3 - introduce a per-subsystem hierarchy_mutex which a subsystem can
           use to prevent changes to its own cgroup tree
      
      2/3 - use hierarchy_mutex in place of calling cgroup_lock() in the
           memory controller
      
      3/3 - introduce a css_tryget() function similar to the one recently
            proposed by Kamezawa, but avoiding spurious refcount failures in
            the event of a race between a css_tryget() and an unsuccessful
            cgroup_rmdir()
      
      Future patches will likely involve:
      
      - using hierarchy mutex in place of cgroup_lock() in more subsystems
       where appropriate
      
      - restoring the atomicity of cgroup_rmdir() with respect to cgroup_create()
      
      This patch:
      
      Add a hierarchy_mutex to the cgroup_subsys object that protects changes to
      the hierarchy observed by that subsystem.  It is taken by the cgroup
      subsystem (in addition to cgroup_mutex) for the following operations:
      
      - linking a cgroup into that subsystem's cgroup tree
      - unlinking a cgroup from that subsystem's cgroup tree
      - moving the subsystem to/from a hierarchy (including across the
        bind() callback)
      
      Thus if the subsystem holds its own hierarchy_mutex, it can safely
      traverse its own hierarchy.
      Signed-off-by: Paul Menage <menage@google.com>
      Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: make cgroup_path() RCU-safe · a47295e6
      Paul Menage committed
      Fix races with /proc/sched_debug by freeing cgroup objects via an RCU
      callback.  Thus any cgroup reference obtained from an RCU-safe source will
      remain valid during the RCU section.  Since dentries are also RCU-safe,
      this allows us to traverse up the tree safely.
      
      Additionally, make cgroup_path() check for a NULL cgrp->dentry to avoid
      trying to report a path for a partially-created cgroup.
      
      [lizf@cn.fujitsu.com: call deactivate_super() in cgroup_diput()]
      Signed-off-by: Paul Menage <menage@google.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Tested-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: skip processes from other namespaces when listing a cgroup · e7b80bb6
      Gowrishankar M committed
      When tasks from the system namespace are placed in a cgroup, listing the
      cgroup's tasks from inside a container shows each task from another
      namespace as 0.
      
      Though this is expected behaviour from the container's point of view,
      there is no use in showing unwanted 0s.
      
      In this patch, we check whether a process is in the same namespace before
      loading it into the pid array.
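
The filtering step can be sketched in userspace C. This is a simplified model, not the kernel code: `struct toy_task` and `toy_collect_pids()` are hypothetical, and a task outside the viewer's namespace is represented by a pid of 0, matching the 0s described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task record as seen from the listing namespace. */
struct toy_task {
	int pid_in_viewer_ns;   /* 0 means "not in the viewer's namespace" */
};

/* Copy only visible pids into out[]; return how many were written. */
size_t toy_collect_pids(const struct toy_task *tasks, size_t n,
			int *out, size_t cap)
{
	size_t i;
	size_t written = 0;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n && written < cap; i++) {
		/* Skip tasks from other namespaces instead of emitting 0. */
		if (tasks[i].pid_in_viewer_ns <= 0)
			continue;
		out[written++] = tasks[i].pid_in_viewer_ns;
	}
	return written;
}
```

Filtering before the array is filled also means the array never has to hold entries that would be discarded later.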
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Gowrishankar M <gowrishankar.m@in.ibm.com>
      Acked-by: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>