1. 30 7月, 2009 1 次提交
  2. 03 4月, 2009 7 次提交
    • L
      cgroups: add 'data' field to struct cgroup_scanner · bd1a8ab7
      Li Zefan 提交于
      We need to pass some data to test_task() or process_task() in some cases.
      Will be used later.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bd1a8ab7
    • K
      memcg: fix OOM killer under memcg · 0b7f569e
      KAMEZAWA Hiroyuki 提交于
      This patch tries to fix OOM Killer problems caused by hierarchy.
      Now, memcg itself has OOM KILL function (in oom_kill.c) and tries to
      kill a task in memcg.
      
      But, when hierarchy is used, it's broken and correct task cannot
      be killed. For example, in following cgroup
      
      	/groupA/	hierarchy=1, limit=1G,
      		01	nolimit
      		02	nolimit
      All tasks' memory usage under /groupA, /groupA/01, groupA/02 is limited to
      groupA's 1Gbytes but OOM Killer just kills tasks in groupA.
      
      This patch provides makes the bad process be selected from all tasks
      under hierarchy. BTW, currently, oom_jiffies is updated against groupA
      in above case. oom_jiffies of tree should be updated.
      
      To see how oom_jiffies is used, please check mem_cgroup_oom_called()
      callers.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: const fix]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0b7f569e
    • L
      cgroups: show correct file mode · 099fca32
      Li Zefan 提交于
      We have some read-only files and write-only files, but currently they are
      all set to 0644, which is counter-intuitive and cause trouble for some
      cgroup tools like libcgroup.
      
      This patch adds 'mode' to struct cftype to allow cgroup subsys to set it's
      own files' file mode, and for the most cases cft->mode can be default to 0
      and cgroup will figure out proper mode.
      Acked-by: NPaul Menage <menage@google.com>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      099fca32
    • K
      cgroup: fix frequent -EBUSY at rmdir · ec64f515
      KAMEZAWA Hiroyuki 提交于
      In following situation, with memory subsystem,
      
      	/groupA use_hierarchy==1
      		/01 some tasks
      		/02 some tasks
      		/03 some tasks
      		/04 empty
      
      When tasks under 01/02/03 hit limit on /groupA, hierarchical reclaim
      is triggered and the kernel walks tree under groupA. In this case,
      rmdir /groupA/04 fails with -EBUSY frequently because of temporal
      refcnt from the kernel.
      
      In general. cgroup can be rmdir'd if there are no children groups and
      no tasks. Frequent fails of rmdir() is not useful to users.
      (And the reason for -EBUSY is unknown to users.....in most cases)
      
      This patch tries to modify above behavior, by
      	- retries if css_refcnt is got by someone.
      	- add "return value" to pre_destroy() and allows subsystem to
      	  say "we're really busy!"
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ec64f515
    • K
      cgroup: CSS ID support · 38460b48
      KAMEZAWA Hiroyuki 提交于
      Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
      
      This patch attaches unique ID to each css and provides following.
      
       - css_lookup(subsys, id)
         returns pointer to struct cgroup_subysys_state of id.
       - css_get_next(subsys, id, rootid, depth, foundid)
         returns the next css under "root" by scanning
      
      When cgroup_subsys->use_id is set, an id for css is maintained.
      
      The cgroup framework only parepares
      	- css_id of root css for subsys
      	- id is automatically attached at creation of css.
      	- id is *not* freed automatically. Because the cgroup framework
      	  don't know lifetime of cgroup_subsys_state.
      	  free_css_id() function is provided. This must be called by subsys.
      
      There are several reasons to develop this.
      	- Saving space .... For example, memcg's swap_cgroup is array of
      	  pointers to cgroup. But it is not necessary to be very fast.
      	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
      	  reduce much amount of memory usage.
      
      	- Scanning without lock.
      	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
      	  css under root can be written without locks.
      	  ex)
      	  do {
      		rcu_read_lock();
      		next = cgroup_get_next(subsys, id, root, &found);
      		/* check sanity of next here */
      		css_tryget();
      		rcu_read_unlock();
      		id = found + 1
      	 } while(...)
      
      Characteristics:
      	- Each css has unique ID under subsys.
      	- Lifetime of ID is controlled by subsys.
      	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
      	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
      
      Design Choices:
      	- scan-by-ID v.s. scan-by-tree-walk.
      	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
      	  by following kind of routine.
      	  scan -> rest a while(release a lock) -> conitunue from interrupted
      	  memcg's hierarchical reclaim does this.
      
      	- When subsys->use_id is set, # of css in the system is limited to
      	  65535.
      
      [bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NPaul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NBharata B Rao <bharata@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      38460b48
    • G
      cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups · 313e924c
      Grzegorz Nosek 提交于
      The ns_proxy cgroup allows moving processes to child cgroups only one
      level deep at a time.  This commit relaxes this restriction and makes it
      possible to attach tasks directly to grandchild cgroups, e.g.:
      
      ($pid is in the root cgroup)
      echo $pid > /cgroup/CG1/CG2/tasks
      
      Previously this operation would fail with -EPERM and would have to be
      performed as two steps:
      echo $pid > /cgroup/CG1/tasks
      echo $pid > /cgroup/CG1/CG2/tasks
      
      Also, the target cgroup no longer needs to be empty to move a task there.
      Signed-off-by: NGrzegorz Nosek <root@localdomain.pl>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      313e924c
    • P
      cgroups: fix cgroup.h comments · d20a390a
      Paul Menage 提交于
      Fix the style of some multi-line comments in cgroup.h to match
      Documentation/CodingStyle
      Signed-off-by: NPaul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d20a390a
  3. 30 3月, 2009 1 次提交
  4. 12 2月, 2009 1 次提交
  5. 30 1月, 2009 1 次提交
  6. 09 1月, 2009 4 次提交
    • P
      cgroups: add css_tryget() · e7c5ec91
      Paul Menage 提交于
      Add css_tryget(), that obtains a counted reference on a CSS.  It is used
      in situations where the caller has a "weak" reference to the CSS, i.e.
      one that does not protect the cgroup from removal via a reference count,
      but would instead be cleaned up by a destroy() callback.
      
      css_tryget() will return true on success, or false if the cgroup is being
      removed.
      
      This is similar to Kamezawa Hiroyuki's patch from a week or two ago, but
      with the difference that in the event of css_tryget() racing with a
      cgroup_rmdir(), css_tryget() will only return false if the cgroup really
      does get removed.
      
      This implementation is done by biasing css->refcnt, so that a refcnt of 1
      means "releasable" and 0 means "released or releasing".  In the event of a
      race, css_tryget() distinguishes between "released" and "releasing" by
      checking for the CSS_REMOVED flag in css->flags.
      Signed-off-by: NPaul Menage <menage@google.com>
      Tested-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e7c5ec91
    • P
      cgroups: add a per-subsystem hierarchy_mutex · 999cd8a4
      Paul Menage 提交于
      These patches introduce new locking/refcount support for cgroups to
      reduce the need for subsystems to call cgroup_lock(). This will
      ultimately allow the atomicity of cgroup_rmdir() (which was removed
      recently) to be restored.
      
      These three patches give:
      
      1/3 - introduce a per-subsystem hierarchy_mutex which a subsystem can
           use to prevent changes to its own cgroup tree
      
      2/3 - use hierarchy_mutex in place of calling cgroup_lock() in the
           memory controller
      
      3/3 - introduce a css_tryget() function similar to the one recently
            proposed by Kamezawa, but avoiding spurious refcount failures in
            the event of a race between a css_tryget() and an unsuccessful
            cgroup_rmdir()
      
      Future patches will likely involve:
      
      - using hierarchy mutex in place of cgroup_lock() in more subsystems
       where appropriate
      
      - restoring the atomicity of cgroup_rmdir() with respect to cgroup_create()
      
      This patch:
      
      Add a hierarchy_mutex to the cgroup_subsys object that protects changes to
      the hierarchy observed by that subsystem.  It is taken by the cgroup
      subsystem (in addition to cgroup_mutex) for the following operations:
      
      - linking a cgroup into that subsystem's cgroup tree
      - unlinking a cgroup from that subsystem's cgroup tree
      - moving the subsystem to/from a hierarchy (including across the
        bind() callback)
      
      Thus if the subsystem holds its own hierarchy_mutex, it can safely
      traverse its own hierarchy.
      Signed-off-by: NPaul Menage <menage@google.com>
      Tested-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      999cd8a4
    • P
      cgroups: make cgroup_path() RCU-safe · a47295e6
      Paul Menage 提交于
      Fix races between /proc/sched_debug by freeing cgroup objects via an RCU
      callback.  Thus any cgroup reference obtained from an RCU-safe source will
      remain valid during the RCU section.  Since dentries are also RCU-safe,
      this allows us to traverse up the tree safely.
      
      Additionally, make cgroup_path() check for a NULL cgrp->dentry to avoid
      trying to report a path for a partially-created cgroup.
      
      [lizf@cn.fujitsu.com: call deactive_super() in cgroup_diput()]
      Signed-off-by: NPaul Menage <menage@google.com>
      Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com>
      Tested-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a47295e6
    • L
      cgroups: don't put struct cgroupfs_root protected by RCU · b2aa30f7
      Lai Jiangshan 提交于
      We don't access struct cgroupfs_root in fast path, so we should not put
      struct cgroupfs_root protected by RCU
      
      But the comment in struct cgroup_subsys.root confuse us.
      
      struct cgroup_subsys.root is used in these places:
      
      1 find_css_set(): if (ss->root->subsys_list.next == &ss->sibling)
      2 rebind_subsystems(): if (ss->root != &rootnode)
                             rcu_assign_pointer(ss->root, root);
                             rcu_assign_pointer(subsys[i]->root, &rootnode);
      3 cgroup_has_css_refs(): if (ss->root != cgrp->root)
      4 cgroup_init_subsys(): ss->root = &rootnode;
      5 proc_cgroupstats_show(): ss->name, ss->root->subsys_bits,
                                 ss->root->number_of_cgroups, !ss->disabled);
      6 cgroup_clone(): root = subsys->root;
                        if ((root != subsys->root) ||
      
      All these place we have held cgroup_lock() or we don't dereference to
      struct cgroupfs_root.  It's means wo don't need RCU when use struct
      cgroup_subsys.root, and we should not put struct cgroupfs_root protected
      by RCU.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Reviewed-by: NPaul Menage <menage@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b2aa30f7
  7. 07 1月, 2009 1 次提交
  8. 31 10月, 2008 1 次提交
  9. 20 10月, 2008 3 次提交
  10. 17 10月, 2008 1 次提交
  11. 26 7月, 2008 5 次提交
  12. 29 4月, 2008 9 次提交
  13. 05 4月, 2008 1 次提交
    • P
      cgroups: add cgroup support for enabling controllers at boot time · 8bab8dde
      Paul Menage 提交于
      The effects of cgroup_disable=foo are:
      
      - foo isn't auto-mounted if you mount all cgroups in a single hierarchy
      - foo isn't visible as an individually mountable subsystem
      
      As a result there will only ever be one call to foo->create(), at init time;
      all processes will stay in this group, and the group will never be mounted on
      a visible hierarchy.  Any additional effects (e.g.  not allocating metadata)
      are up to the foo subsystem.
      
      This doesn't handle early_init subsystems (their "disabled" bit isn't set be,
      but it could easily be extended to do so if any of the early_init systems
      wanted it - I think it would just involve some nastier parameter processing
      since it would occur before the command-line argument parser had been run.
      
      Hugh said:
      
        Ballpark figures, I'm trying to get this question out rather than
        processing the exact numbers: CONFIG_CGROUP_MEM_RES_CTLR adds 15% overhead
        to the affected paths, booting with cgroup_disable=memory cuts that back to
        1% overhead (due to slightly bigger struct page).
      
        I'm no expert on distros, they may have no interest whatever in
        CONFIG_CGROUP_MEM_RES_CTLR=y; and the rest of us can easily build with or
        without it, or apply the cgroup_disable=memory patches.
      
      Unix bench's execl test result on x86_64 was
      
      == just after boot without mounting any cgroup fs.==
      mem_cgorup=off : Execl Throughput       43.0     3150.1      732.6
      mem_cgroup=on  : Execl Throughput       43.0     2932.6      682.0
      ==
      
      [lizf@cn.fujitsu.com: fix boot option parsing]
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8bab8dde
  14. 24 2月, 2008 2 次提交
  15. 08 2月, 2008 2 次提交
    • C
      hotplug cpu: move tasks in empty cpusets to parent · 956db3ca
      Cliff Wickman 提交于
      This patch corrects a situation that occurs when one disables all the cpus in
      a cpuset.
      
      Currently, the disabled (cpu-less) cpuset inherits the cpus of its parent,
      which is incorrect because it may then overlap its cpu-exclusive sibling.
      
      Tasks of an empty cpuset should be moved to the cpuset which is the parent of
      their current cpuset.  Or if the parent cpuset has no cpus, to its parent,
      etc.
      
      And the empty cpuset should be released (if it is flagged notify_on_release).
      
      Depends on the cgroup_scan_tasks() function (proposed by David Rientjes) to
      iterate through all tasks in the cpu-less cpuset.  We are deliberately
      avoiding a walk of the tasklist.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NCliff Wickman <cpw@sgi.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      956db3ca
    • C
      cgroups: mechanism to process each task in a cgroup · 31a7df01
      Cliff Wickman 提交于
      Provide cgroup_scan_tasks(), which iterates through every task in a cgroup,
      calling a test function and a process function for each.  And call the process
      function without holding the css_set_lock lock.
      
      The idea is David Rientjes', predicting that such a function will make it much
      easier in the future to extend things that require access to each task in a
      cgroup without holding the lock,
      
      [akpm@linux-foundation.org: cleanup]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NCliff Wickman <cpw@sgi.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Paul Jackson <pj@sgi.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      31a7df01