1. 13 Mar, 2010 1 commit
  2. 25 Feb, 2010 1 commit
    • sched: Use lockdep-based checking on rcu_dereference() · d11c563d
      Committed by Paul E. McKenney
      Update the rcu_dereference() usages to take advantage of the new
      lockdep-based checking.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-6-git-send-email-paulmck@linux.vnet.ibm.com>
      [ -v2: fix allmodconfig missing symbol export build failure on x86 ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 02 Oct, 2009 1 commit
  4. 24 Sep, 2009 5 commits
    • cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time · be367d09
      Committed by Ben Blum
      Alter the ss->can_attach and ss->attach functions to be able to deal with
      a whole threadgroup at a time, for use in cgroup_attach_proc.  (This is a
      pre-patch to cgroup-procs-writable.patch.)
      
      Currently, the new mode of the attach function can only tell the subsystem
      about the old cgroup of the threadgroup leader.  No subsystem currently
      needs that information for each thread that's being moved, but if one were
      to be added (for example, one that counts tasks within a group), this part
      would need to be reworked to tell the subsystem the right information.
      
      [hidave.darkstar@gmail.com: fix build]
      Signed-off-by: Ben Blum <bblum@google.com>
      Signed-off-by: Paul Menage <menage@google.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Reviewed-by: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Dave Young <hidave.darkstar@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: change css_set freeing mechanism to be under RCU · c378369d
      Committed by Ben Blum
      Changes css_set freeing mechanism to be under RCU
      
      This is a prepatch for making the procs file writable. In order to free the
      old css_sets for each task to be moved as they're being moved, the freeing
      mechanism must be RCU-protected, or else we would have to have a call to
      synchronize_rcu() for each task before freeing its old css_set.
      Signed-off-by: Ben Blum <bblum@google.com>
      Signed-off-by: Paul Menage <menage@google.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: ensure correct concurrent opening/reading of pidlists across pid namespaces · 72a8cb30
      Committed by Ben Blum
      Previously there was the problem in which two processes from different pid
      namespaces reading the tasks or procs file could result in one process
      seeing results from the other's namespace.  Rather than one pidlist for
      each file in a cgroup, we now keep a list of pidlists keyed by namespace
      and file type (tasks versus procs) in which entries are placed on demand.
      Each pidlist has its own lock, and because the pidlists themselves are
      passed around in the seq_file's private pointer, we don't have to touch the
      cgroup or its master list except when creating and destroying entries.
      Signed-off-by: Ben Blum <bblum@google.com>
      Signed-off-by: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
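The on-demand list of pidlists keyed by namespace and file type, as described in the message above, might look roughly like the following userspace sketch. All names here (`pidlist_find`, `cgroup_sketch`, the integer `ns_id`) are hypothetical stand-ins rather than the kernel's, and the per-pidlist locking mentioned in the message is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names are taken from the kernel. */
enum filetype { FILE_TASKS, FILE_PROCS };

struct pidlist {
    int ns_id;              /* stands in for a pid namespace pointer */
    enum filetype type;     /* tasks versus procs */
    struct pidlist *next;   /* link in the per-cgroup master list */
};

struct cgroup_sketch {
    struct pidlist *pidlists;   /* master list, empty until first use */
};

/* Return the entry for (ns_id, type), allocating it on first use. */
struct pidlist *pidlist_find(struct cgroup_sketch *cg, int ns_id,
                             enum filetype type)
{
    struct pidlist *l;

    assert(cg != NULL);
    for (l = cg->pidlists; l != NULL; l = l->next)
        if (l->ns_id == ns_id && l->type == type)
            return l;

    /* Not found: create the entry and link it at the head of the list. */
    l = calloc(1, sizeof(*l));
    if (l == NULL)
        return NULL;
    l->ns_id = ns_id;
    l->type = type;
    l->next = cg->pidlists;
    cg->pidlists = l;
    return l;
}
```

Two readers in different namespaces, or reading different files, get distinct entries, while a repeated lookup with the same key returns the existing one.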
    • cgroups: add a read-only "procs" file similar to "tasks" that shows only unique tgids · 102a775e
      Committed by Ben Blum
      struct cgroup used to have a bunch of fields for keeping track of the
      pidlist for the tasks file.  Those are now separated into a new struct
      cgroup_pidlist, of which there are two, one for procs and one for tasks.
      The way the seq_file operations are set up is changed so that just the
      pidlist struct gets passed around as the private data.
      
      Interface example: Suppose a multithreaded process has pid 1000 and other
      threads with ids 1001, 1002, 1003:
      $ cat tasks
      1000
      1001
      1002
      1003
      $ cat cgroup.procs
      1000
      $
      Signed-off-by: Ben Blum <bblum@google.com>
      Signed-off-by: Paul Menage <menage@google.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
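One way to reduce a per-thread list to unique tgids, as the procs file in the message above reports, is to sort the tgids and collapse adjacent duplicates. The helper below is a minimal sketch under that assumption. The name `uniq_sorted` is hypothetical, and the commit message does not say how the kernel does this.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Collapse runs of equal values in an already sorted array, in place,
 * and return the new length.
 */
size_t uniq_sorted(int *list, size_t len)
{
    size_t src, dst;

    assert(list != NULL || len == 0);
    if (len == 0)
        return 0;

    dst = 1;    /* list[0] is always kept */
    for (src = 1; src < len; src++) {
        if (list[src] != list[dst - 1])
            list[dst++] = list[src];
    }
    return dst;
}
```

For the interface example in the message, the four threads 1000 to 1003 all share tgid 1000, so the list {1000, 1000, 1000, 1000} collapses to the single entry shown by `cat cgroup.procs`.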
    • cgroups: revert "cgroups: fix pid namespace bug" · 8f3ff208
      Committed by Paul Menage
      The following series adds a "cgroup.procs" file to each cgroup that
      reports unique tgids rather than pids, and allows all threads in a
      threadgroup to be atomically moved to a new cgroup.
      
      The subsystem "attach" interface is modified to support attaching whole
      threadgroups at a time, which could introduce potential problems if any
      subsystem were to need to access the old cgroup of every thread being
      moved.  The attach interface may need to be revised if this becomes the
      case.
      
      Also added is functionality for read/write locking all CLONE_THREAD
      fork()ing within a threadgroup, by means of an rwsem that lives in the
      sighand_struct, for per-threadgroup-ness and also for sharing a cacheline
      with the sighand's atomic count.  This scheme should introduce no extra
      overhead in the fork path when there's no contention.
      
      The final patch reveals potential for a race when forking before a
      subsystem's attach function is called - one potential solution in case any
      subsystem has this problem is to hang on to the group's fork mutex through
      the attach() calls, though no subsystem yet demonstrates need for an
      extended critical section.
      
      This patch:
      
      Revert
      
      commit 096b7fe0
      Author:     Li Zefan <lizf@cn.fujitsu.com>
      AuthorDate: Wed Jul 29 15:04:04 2009 -0700
      Commit:     Linus Torvalds <torvalds@linux-foundation.org>
      CommitDate: Wed Jul 29 19:10:35 2009 -0700
      
          cgroups: fix pid namespace bug
      
      This is in preparation for some clashing cgroups changes that subsume the
      original commit's functionality.
      
      The original commit fixed a pid namespace bug which Ben Blum fixed
      independently (in the same way, but with different code) as part of a
      series of patches.  I played around with trying to reconcile Ben's patch
      series with Li's patch, but concluded that it was simpler to just revert
      Li's, given that Ben's patch series contained essentially the same fix.
      Signed-off-by: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 30 Jul, 2009 2 commits
  6. 03 Apr, 2009 7 commits
    • cgroups: add 'data' field to struct cgroup_scanner · bd1a8ab7
      Committed by Li Zefan
      We need to pass some data to test_task() or process_task() in some cases.
      Will be used later.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
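To illustrate why a `data` field is useful, here is a hedged userspace sketch of a scanner whose test_task()/process_task() callbacks read their parameters from an opaque pointer. The struct layout, `scan_tasks`, and the priority-counting callbacks are assumptions made for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task. */
struct task_sketch { int pid; int prio; };

struct scanner_sketch {
    int  (*test_task)(const struct task_sketch *t, struct scanner_sketch *scan);
    void (*process_task)(const struct task_sketch *t, struct scanner_sketch *scan);
    void *data;     /* caller-owned context handed to both callbacks */
};

/* Run process_task() on every task that passes test_task(). */
void scan_tasks(const struct task_sketch *tasks, size_t n,
                struct scanner_sketch *scan)
{
    size_t i;

    assert(scan != NULL && scan->process_task != NULL);
    for (i = 0; i < n; i++) {
        if (scan->test_task == NULL || scan->test_task(&tasks[i], scan))
            scan->process_task(&tasks[i], scan);
    }
}

/* Example context: count tasks whose prio exceeds a threshold. */
struct prio_count { int threshold; int count; };

int prio_above(const struct task_sketch *t, struct scanner_sketch *scan)
{
    const struct prio_count *pc = scan->data;
    return t->prio > pc->threshold;
}

void count_task(const struct task_sketch *t, struct scanner_sketch *scan)
{
    struct prio_count *pc = scan->data;
    (void)t;    /* only the count matters in this example */
    pc->count++;
}
```

Without the `data` pointer, the threshold and the counter would have to live in globals or in a wrapper struct recovered from the scanner pointer.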
    • memcg: fix OOM killer under memcg · 0b7f569e
      Committed by KAMEZAWA Hiroyuki
      This patch tries to fix OOM killer problems caused by hierarchy.
      Now, memcg itself has an OOM kill function (in oom_kill.c) and tries to
      kill a task in the memcg.
      
      But when hierarchy is used, this is broken and the correct task cannot
      be killed. For example, in the following cgroup
      
      	/groupA/	hierarchy=1, limit=1G,
      		01	nolimit
      		02	nolimit
      All tasks' memory usage under /groupA, /groupA/01, /groupA/02 is limited to
      groupA's 1 Gbyte, but the OOM killer only kills tasks in groupA.
      
      This patch makes the bad process be selected from all tasks under the
      hierarchy. BTW, currently, oom_jiffies is updated against groupA in the
      above case. oom_jiffies of the whole tree should be updated.
      
      To see how oom_jiffies is used, please check mem_cgroup_oom_called()
      callers.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: const fix]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
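The idea of selecting the victim from all tasks under the hierarchy, rather than only those attached to the top group, can be sketched as a recursive walk. This is a simplified userspace illustration: `struct memgroup` and `worst_task_in_subtree` are hypothetical, and raw page counts stand in for whatever badness score the kernel actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup. */
struct memgroup {
    const long *task_pages;             /* usage of directly attached tasks */
    size_t ntasks;
    struct memgroup *const *children;   /* child groups in the hierarchy */
    size_t nchildren;
};

/*
 * Return the largest per-task usage found anywhere in the subtree
 * rooted at g, or -1 if the subtree holds no tasks at all.
 */
long worst_task_in_subtree(const struct memgroup *g)
{
    long worst = -1;
    size_t i;

    assert(g != NULL);
    for (i = 0; i < g->ntasks; i++)
        if (g->task_pages[i] > worst)
            worst = g->task_pages[i];

    for (i = 0; i < g->nchildren; i++) {
        long w = worst_task_in_subtree(g->children[i]);
        if (w > worst)
            worst = w;
    }
    return worst;
}
```

In the /groupA example from the message, a walk that stops at groupA's own tasks would miss a heavy task living in /groupA/01 or /groupA/02, which is the failure the patch describes.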
    • cgroups: show correct file mode · 099fca32
      Committed by Li Zefan
      We have some read-only files and write-only files, but currently they are
      all set to 0644, which is counter-intuitive and causes trouble for some
      cgroup tools like libcgroup.
      
      This patch adds 'mode' to struct cftype to allow a cgroup subsys to set its
      own files' file mode. In most cases cft->mode can default to 0, and
      cgroup will figure out the proper mode.
      Acked-by: Paul Menage <menage@google.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroup: fix frequent -EBUSY at rmdir · ec64f515
      Committed by KAMEZAWA Hiroyuki
      In the following situation, with the memory subsystem,
      
      	/groupA use_hierarchy==1
      		/01 some tasks
      		/02 some tasks
      		/03 some tasks
      		/04 empty
      
      When tasks under 01/02/03 hit the limit on /groupA, hierarchical reclaim
      is triggered and the kernel walks the tree under groupA. In this case,
      rmdir /groupA/04 frequently fails with -EBUSY because of a temporary
      refcnt held by the kernel.
      
      In general, a cgroup can be rmdir'd if there are no children groups and
      no tasks. Frequent failures of rmdir() are not useful to users.
      (And in most cases the reason for -EBUSY is unknown to users.)
      
      This patch tries to modify the above behavior by
      	- retrying if css_refcnt is held by someone.
      	- adding a "return value" to pre_destroy(), which allows a subsystem to
      	  say "we're really busy!"
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroup: CSS ID support · 38460b48
      Committed by KAMEZAWA Hiroyuki
      Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
      
      This patch attaches a unique ID to each css and provides the following.
      
       - css_lookup(subsys, id)
         returns a pointer to the struct cgroup_subsys_state of id.
       - css_get_next(subsys, id, rootid, depth, foundid)
         returns the next css under "root" by scanning
      
      When cgroup_subsys->use_id is set, an id for css is maintained.
      
      The cgroup framework only prepares
      	- css_id of root css for subsys
      	- id is automatically attached at creation of css.
      	- id is *not* freed automatically, because the cgroup framework
      	  doesn't know the lifetime of cgroup_subsys_state.
      	  free_css_id() function is provided. This must be called by subsys.
      
      There are several reasons to develop this.
      	- Saving space .... For example, memcg's swap_cgroup is an array of
      	  pointers to cgroup. But it does not need to be very fast.
      	  By replacing pointers (8 bytes per entry) with IDs (2 bytes per
      	  entry), we can greatly reduce memory usage.
      
      	- Scanning without lock.
      	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
      	  css under root can be written without locks.
      	  ex)
      	  do {
      		rcu_read_lock();
      		next = css_get_next(subsys, id, root, &found);
      		/* check sanity of next here */
      		css_tryget(next);
      		rcu_read_unlock();
      		id = found + 1;
      	  } while (...);
      
      Characteristics:
      	- Each css has unique ID under subsys.
      	- Lifetime of ID is controlled by subsys.
      	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
      	- Allowed ID is 1-65535, ID 0 is UNUSED ID.
      
      Design Choices:
      	- scan-by-ID vs. scan-by-tree-walk.
      	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
      	  by the following kind of routine:
      	  scan -> rest a while (release a lock) -> continue from where it was interrupted.
      	  memcg's hierarchical reclaim does this.
      
      	- When subsys->use_id is set, # of css in the system is limited to
      	  65535.
      
      [bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
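The scan-by-ID pattern from the message (find the next live ID at or after a cursor, then resume from found + 1) can be sketched in userspace as below. The table, `get_next_by_id`, and `count_alive` are hypothetical. The sketch ignores the root/depth filtering, RCU, and reference counting that the commit describes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ID 16   /* the commit allows IDs 1-65535; kept tiny here */

/* Hypothetical stand-in for a cgroup_subsys_state. */
struct css_sketch { int alive; };

/* Slot 0 is never used, mirroring "ID 0 is UNUSED ID". */
struct css_sketch *id_table[MAX_ID + 1];

/*
 * Return the first registered entry whose ID is >= id and store that
 * ID in *found, or return NULL when no such entry exists.
 */
struct css_sketch *get_next_by_id(int id, int *found)
{
    assert(found != NULL);
    for (; id >= 1 && id <= MAX_ID; id++) {
        if (id_table[id] != NULL) {
            *found = id;
            return id_table[id];
        }
    }
    return NULL;
}

/* Walk every registered entry, resuming from found + 1 after each step. */
int count_alive(void)
{
    int id = 1, found = 0, n = 0;
    struct css_sketch *css;

    while ((css = get_next_by_id(id, &found)) != NULL) {
        if (css->alive)
            n++;
        id = found + 1;     /* the cursor survives between steps */
    }
    return n;
}
```

Because the cursor is just an integer, the walk can stop, release whatever lock it holds, and continue later even if entries were added or removed in between. That is the robustness argument made under "Design Choices".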
    • cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups · 313e924c
      Committed by Grzegorz Nosek
      The ns_proxy cgroup allows moving processes to child cgroups only one
      level deep at a time.  This commit relaxes this restriction and makes it
      possible to attach tasks directly to grandchild cgroups, e.g.:
      
      ($pid is in the root cgroup)
      echo $pid > /cgroup/CG1/CG2/tasks
      
      Previously this operation would fail with -EPERM and would have to be
      performed as two steps:
      echo $pid > /cgroup/CG1/tasks
      echo $pid > /cgroup/CG1/CG2/tasks
      
      Also, the target cgroup no longer needs to be empty to move a task there.
      Signed-off-by: Grzegorz Nosek <root@localdomain.pl>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
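Relaxing "one level deep" to "any depth below the current cgroup" amounts to replacing a direct-parent test with an ancestor walk. The following is a minimal sketch of such a walk. `cg_node` and `is_descendant` are hypothetical names, and the real ns_can_attach check may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup with a parent pointer. */
struct cg_node { struct cg_node *parent; };

/* Return 1 if target is cur itself or lies anywhere below it, else 0. */
int is_descendant(const struct cg_node *target, const struct cg_node *cur)
{
    assert(cur != NULL);
    while (target != NULL) {
        if (target == cur)
            return 1;
        target = target->parent;    /* climb one level toward the root */
    }
    return 0;
}
```

With the layout from the message, /cgroup/CG1/CG2 is a descendant of the root cgroup, so a task in the root could be attached there in a single step.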
    • cgroups: fix cgroup.h comments · d20a390a
      Committed by Paul Menage
      Fix the style of some multi-line comments in cgroup.h to match
      Documentation/CodingStyle
      Signed-off-by: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 30 Mar, 2009 1 commit
  8. 12 Feb, 2009 1 commit
  9. 30 Jan, 2009 1 commit
  10. 09 Jan, 2009 4 commits
    • cgroups: add css_tryget() · e7c5ec91
      Committed by Paul Menage
      Add css_tryget(), that obtains a counted reference on a CSS.  It is used
      in situations where the caller has a "weak" reference to the CSS, i.e.
      one that does not protect the cgroup from removal via a reference count,
      but would instead be cleaned up by a destroy() callback.
      
      css_tryget() will return true on success, or false if the cgroup is being
      removed.
      
      This is similar to Kamezawa Hiroyuki's patch from a week or two ago, but
      with the difference that in the event of css_tryget() racing with a
      cgroup_rmdir(), css_tryget() will only return false if the cgroup really
      does get removed.
      
      This implementation is done by biasing css->refcnt, so that a refcnt of 1
      means "releasable" and 0 means "released or releasing".  In the event of a
      race, css_tryget() distinguishes between "released" and "releasing" by
      checking for the CSS_REMOVED flag in css->flags.
      Signed-off-by: Paul Menage <menage@google.com>
      Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
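The biased reference count described above (1 means "releasable", 0 means "released or releasing", with a removed flag to tell the two zero cases apart) can be approximated in userspace with C11 atomics. This is a sketch under those stated rules only. `css_ref`, `tryget`, `put`, and `inc_not_zero` are hypothetical names, and the kernel's flag handling and memory ordering are not reproduced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct css_ref {
    atomic_int refcnt;      /* 1 = releasable, 0 = released or releasing */
    atomic_bool removed;    /* stands in for the CSS_REMOVED flag */
};

/* Increment *v unless it is zero; return false if it was zero. */
static bool inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

/* Take a counted reference, failing only if removal really happened. */
bool tryget(struct css_ref *css)
{
    while (!inc_not_zero(&css->refcnt)) {
        if (atomic_load(&css->removed))
            return false;   /* released: the object is gone */
        /* releasing: a removal is in flight and may still fail; retry */
    }
    return true;
}

/* Drop a reference obtained with tryget(). */
void put(struct css_ref *css)
{
    atomic_fetch_sub(&css->refcnt, 1);
}
```

The retry loop is what avoids the spurious failures mentioned in the message: a zero count alone does not make `tryget()` give up, only a zero count together with the removed flag does.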
    • cgroups: add a per-subsystem hierarchy_mutex · 999cd8a4
      Committed by Paul Menage
      These patches introduce new locking/refcount support for cgroups to
      reduce the need for subsystems to call cgroup_lock(). This will
      ultimately allow the atomicity of cgroup_rmdir() (which was removed
      recently) to be restored.
      
      These three patches give:
      
      1/3 - introduce a per-subsystem hierarchy_mutex which a subsystem can
           use to prevent changes to its own cgroup tree
      
      2/3 - use hierarchy_mutex in place of calling cgroup_lock() in the
           memory controller
      
      3/3 - introduce a css_tryget() function similar to the one recently
            proposed by Kamezawa, but avoiding spurious refcount failures in
            the event of a race between a css_tryget() and an unsuccessful
            cgroup_rmdir()
      
      Future patches will likely involve:
      
      - using hierarchy mutex in place of cgroup_lock() in more subsystems
       where appropriate
      
      - restoring the atomicity of cgroup_rmdir() with respect to cgroup_create()
      
      This patch:
      
      Add a hierarchy_mutex to the cgroup_subsys object that protects changes to
      the hierarchy observed by that subsystem.  It is taken by the cgroup
      subsystem (in addition to cgroup_mutex) for the following operations:
      
      - linking a cgroup into that subsystem's cgroup tree
      - unlinking a cgroup from that subsystem's cgroup tree
      - moving the subsystem to/from a hierarchy (including across the
        bind() callback)
      
      Thus if the subsystem holds its own hierarchy_mutex, it can safely
      traverse its own hierarchy.
      Signed-off-by: Paul Menage <menage@google.com>
      Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: make cgroup_path() RCU-safe · a47295e6
      Committed by Paul Menage
      Fix races with /proc/sched_debug by freeing cgroup objects via an RCU
      callback.  Thus any cgroup reference obtained from an RCU-safe source will
      remain valid during the RCU section.  Since dentries are also RCU-safe,
      this allows us to traverse up the tree safely.
      
      Additionally, make cgroup_path() check for a NULL cgrp->dentry to avoid
      trying to report a path for a partially-created cgroup.
      
      [lizf@cn.fujitsu.com: call deactivate_super() in cgroup_diput()]
      Signed-off-by: Paul Menage <menage@google.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Tested-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: don't put struct cgroupfs_root protected by RCU · b2aa30f7
      Committed by Lai Jiangshan
      We don't access struct cgroupfs_root in the fast path, so struct
      cgroupfs_root should not be protected by RCU.
      
      But the comment on struct cgroup_subsys.root is confusing.
      
      struct cgroup_subsys.root is used in these places:
      
      1 find_css_set(): if (ss->root->subsys_list.next == &ss->sibling)
      2 rebind_subsystems(): if (ss->root != &rootnode)
                             rcu_assign_pointer(ss->root, root);
                             rcu_assign_pointer(subsys[i]->root, &rootnode);
      3 cgroup_has_css_refs(): if (ss->root != cgrp->root)
      4 cgroup_init_subsys(): ss->root = &rootnode;
      5 proc_cgroupstats_show(): ss->name, ss->root->subsys_bits,
                                 ss->root->number_of_cgroups, !ss->disabled);
      6 cgroup_clone(): root = subsys->root;
                        if ((root != subsys->root) ||
      
      In all these places we either hold cgroup_lock() or don't dereference
      struct cgroupfs_root.  This means we don't need RCU when using struct
      cgroup_subsys.root, and struct cgroupfs_root should not be protected
      by RCU.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Reviewed-by: Paul Menage <menage@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 07 Jan, 2009 1 commit
  12. 31 Oct, 2008 1 commit
  13. 20 Oct, 2008 3 commits
  14. 17 Oct, 2008 1 commit
  15. 26 Jul, 2008 5 commits
  16. 29 Apr, 2008 5 commits