Commits · d3daf28da16a30af95bfb303189a634a87606725 · openeuler / raspberrypi-kernel

14 Jun, 2013 7 commits

cgroup: use percpu refcnt for cgroup_subsys_states · d3daf28d

Authored by Tejun Heo on Jun 13, 2013

A css (cgroup_subsys_state) is how each cgroup is represented to a
controller.  As such, it can be used in hot paths across the various
subsystems different controllers are associated with.

One of the common operations is reference counting, which up until now
has been implemented using a global atomic counter and can have
significant adverse impact on scalability.  For example, css refcnt
can be gotten and put multiple times by blkcg for each IO request.
For high-ops configurations which try to do as much per-cpu as
possible, the global frequent refcnting can be very expensive.

In general, given the various and hugely diverse paths css's end up
being used from, we need to make it cheap and highly scalable.  In its
usage, css refcnting isn't very different from module refcnting.

This patch converts css refcnting to use the recently added
percpu_ref.  css_get/tryget/put() directly maps to the matching
percpu_ref operations and the deactivation logic is no longer
necessary as percpu_ref already has refcnt killing.

The only complication is that as the refcnt is per-cpu,
percpu_ref_kill() in itself doesn't ensure that further tryget
operations will fail, which we need to guarantee before invoking
->css_offline()'s.  This is resolved by collecting kill confirmation
using percpu_ref_kill_and_confirm() and initiating the offline phase
of destruction after all css refcnt's are confirmed to be seen as
killed on all CPUs.  The previous patches already split destruction
into two phases, so percpu_ref_kill_and_confirm() can be hooked up
easily.

This patch removes css_refcnt() which is used for rcu dereference
sanity check in css_id().  While we can add a percpu refcnt API to ask
the same question, css_id() itself is scheduled to be removed fairly
soon, so let's not bother with it.  Just drop the sanity check and use
rcu_dereference_raw() instead.

v2: - init_cgroup_css() was calling percpu_ref_init() without checking
      the return value.  This causes two problems - the obvious lack
      of error handling and percpu_ref_init() being called from
      cgroup_init_subsys() before the allocators are up, which
      triggers warnings but doesn't cause actual problems as the
      refcnt isn't used for roots anyway.  Fix both by moving
      percpu_ref_init() to cgroup_create().

    - The base references were put too early by
      percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
      refs one extra time.  This wasn't noticeable because css's go
      through another RCU grace period before being freed.  Update
      cgroup_destroy_locked() to grab an extra reference before
      killing the refcnts.  This problem was noticed by Kent.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <koverstreet@google.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Alasdair G. Kergon" <agk@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>

d3daf28d
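
The kill-and-confirm ordering described in the commit above can be illustrated with a small single-threaded model. This is only a sketch, not the kernel's percpu_ref code: `struct model_ref`, `MODEL_NR_CPUS` and every `model_*` function are hypothetical names, and the confirmation runs synchronously here, whereas the commit message says it is collected only after the kill is seen on all CPUs. The sketch also assumes that the kill path drops the base reference after confirming.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for this model */

struct model_ref {
	long pcpu[MODEL_NR_CPUS];	/* per-CPU deltas; one slot may go negative */
	long shared;			/* shared count, authoritative after kill */
	bool dead;			/* once set, tryget fails */
	void (*confirm)(struct model_ref *ref);	/* kill is visible everywhere */
	void (*release)(struct model_ref *ref);	/* last reference dropped */
};

/* Counters so that callers can observe the callbacks. */
int model_confirmed;
int model_released;

void model_count_confirm(struct model_ref *ref) { (void)ref; model_confirmed++; }
void model_count_release(struct model_ref *ref) { (void)ref; model_released++; }

void model_ref_init(struct model_ref *ref,
		    void (*release)(struct model_ref *),
		    void (*confirm)(struct model_ref *))
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		ref->pcpu[cpu] = 0;
	ref->shared = 1;	/* base reference */
	ref->dead = false;
	ref->release = release;
	ref->confirm = confirm;
}

bool model_ref_tryget(struct model_ref *ref, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (ref->dead)
		return false;
	ref->pcpu[cpu]++;	/* fast path: touches only this CPU's slot */
	return true;
}

void model_ref_put(struct model_ref *ref, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (!ref->dead) {
		ref->pcpu[cpu]--;
		return;
	}
	if (--ref->shared == 0 && ref->release)
		ref->release(ref);
}

void model_ref_kill_and_confirm(struct model_ref *ref)
{
	ref->dead = true;
	/* Fold the per-CPU deltas into the shared count. */
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		ref->shared += ref->pcpu[cpu];
		ref->pcpu[cpu] = 0;
	}
	if (ref->confirm)
		ref->confirm(ref);	/* safe point to start the offline phase */
	model_ref_put(ref, 0);		/* drop the base reference */
}
```

In this model a caller that wants the object to outlive the kill has to hold its own reference first, which mirrors the second v2 note about grabbing an extra reference before killing the refcnts.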

cgroup: split cgroup destruction into two steps · ea15f8cc

Authored by Tejun Heo on Jun 13, 2013

Split cgroup_destroy_locked() into two steps and put the latter half
into cgroup_offline_fn() which is executed from a work item.  The
latter half is responsible for offlining the css's, removing the
cgroup from internal lists, and propagating release notification to
the parent.  The separation is to allow using percpu refcnt for css.

Note that this allows for other cgroup operations to happen between
the first and second halves of destruction, including creating a new
cgroup with the same name.  As the target cgroup is marked DEAD in the
first half and cgroup internals don't care about the names of cgroups,
this should be fine.  A comment explaining this will be added by the
next patch which implements the actual percpu refcnting.

As RCU freeing is guaranteed to happen after the second step of
destruction, we can use the same work item for both.  This patch
renames cgroup->free_work to ->destroy_work and uses it for both
purposes.  INIT_WORK() is now performed right before queueing the work
item.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

ea15f8cc

cgroup: remove cgroup->count and use · 6f3d828f

Authored by Tejun Heo on Jun 12, 2013

cgroup->count tracks the number of css_sets associated with the cgroup
and is used only to verify that no css_set is associated when the cgroup
is being destroyed.  It's superfluous as the destruction path can
simply check whether cgroup->cset_links is empty instead.

Drop cgroup->count and check ->cset_links directly from
cgroup_destroy_locked().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

6f3d828f

cgroup: rename CGRP_REMOVED to CGRP_DEAD · 54766d4a

Authored by Tejun Heo on Jun 12, 2013

We will add another flag indicating that the cgroup is in the process
of being killed.  REMOVING / REMOVED is more difficult to distinguish
and cgroup_is_removing()/cgroup_is_removed() are a bit awkward.  Also,
later percpu_ref usage will involve "kill"ing the refcnt.

 s/CGRP_REMOVED/CGRP_DEAD/
 s/cgroup_is_removed()/cgroup_is_dead()/

This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

54766d4a

cgroup: clean up css_[try]get() and css_put() · 5de0107e

Authored by Tejun Heo on Jun 12, 2013

* __css_get() isn't used by anyone.  Fold it into css_get().

* Add proper function comments to all css reference functions.

This patch is purely cosmetic.

v2: Typo fix as per Li.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

5de0107e

cgroup: bring some sanity to naming around cg_cgroup_link · 69d0206c

Authored by Tejun Heo on Jun 12, 2013

cgroups and css_sets are mapped M:N and this M:N mapping is
represented by struct cg_cgroup_link which forms linked lists on both
sides.  The naming around this mapping is already confusing and struct
cg_cgroup_link exacerbates the situation quite a bit.

From cgroup side, it starts off ->css_sets and runs through
->cgrp_link_list.  From css_set side, it starts off ->cg_links and
runs through ->cg_link_list.  This is rather reversed as
cgrp_link_list is used to iterate css_sets and cg_link_list cgroups.
Also, this is the only place which is still using the confusing "cg"
for css_sets.  This patch cleans it up a bit.

* s/cgroup->css_sets/cgroup->cset_links/
  s/css_set->cg_links/css_set->cgrp_links/
  s/cgroup_iter->cg_link/cgroup_iter->cset_link/

* s/cg_cgroup_link/cgrp_cset_link/

* s/cgrp_cset_link->cg/cgrp_cset_link->cset/
  s/cgrp_cset_link->cgrp_link_list/cgrp_cset_link->cset_link/
  s/cgrp_cset_link->cg_link_list/cgrp_cset_link->cgrp_link/

* s/init_css_set_link/init_cgrp_cset_link/
  s/free_cg_links/free_cgrp_cset_links/
  s/allocate_cg_links/allocate_cgrp_cset_links/

* s/cgl[12]/link[12]/ in compare_css_sets()

* s/saved_link/tmp_link/ s/tmp/tmp_links/ and a couple similar
  adjustments.

* Comment and whiteline adjustments.

After the changes, we have

	list_for_each_entry(link, &cont->cset_links, cset_link) {
		struct css_set *cset = link->cset;

instead of

	list_for_each_entry(link, &cont->css_sets, cgrp_link_list) {
		struct css_set *cset = link->cg;

This patch is purely cosmetic.

v2: Fix broken sentences in the patch description.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

69d0206c

cgroup: remove now unused css_depth() · 3fc3db9a

Authored by Tejun Heo on Jun 12, 2013

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

3fc3db9a

24 May, 2013 4 commits

cgroup: update iterators to use cgroup_next_sibling() · 75501a6d

Authored by Tejun Heo on May 24, 2013

This patch converts cgroup_for_each_child(),
cgroup_next_descendant_pre/post() and thus
cgroup_for_each_descendant_pre/post() to use cgroup_next_sibling()
instead of manually dereferencing ->sibling.next.

The only reason the iterators couldn't allow dropping RCU read lock
while iteration is in progress was because they couldn't determine the
next sibling safely once RCU read lock is dropped.  Using
cgroup_next_sibling() removes that problem and enables all iterators
to allow dropping RCU read lock in the middle.  Comments are updated
accordingly.

This makes the iterators easier to use and will simplify controllers.

Note that @cgroup argument is renamed to @cgrp in
cgroup_for_each_child() because it conflicts with "struct cgroup" used
in the new macro body.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>

75501a6d

cgroup: add cgroup->serial_nr and implement cgroup_next_sibling() · 53fa5261

Authored by Tejun Heo on May 24, 2013

Currently, there's no easy way to find out the next sibling cgroup
unless it's known that the current cgroup is accessed from the
parent's children list in a single RCU critical section.  This in turn
forces all iterators to require whole iteration to be enclosed in a
single RCU critical section, which sometimes is too restrictive.  This
patch implements cgroup_next_sibling() which can reliably determine
the next sibling regardless of the state of the current cgroup as long
as it's accessible.

It currently is impossible to determine the next sibling after
dropping RCU read lock because the cgroup being iterated could be
removed anytime and if RCU read lock is dropped, nothing guarantees
its ->sibling.next pointer is accessible.  A removed cgroup would
continue to point to its next sibling for RCU accesses but stop
receiving updates from the sibling.  IOW, the next sibling could be
removed and then complete its grace period while RCU read lock is
dropped, making it unsafe to dereference ->sibling.next after dropping
and re-acquiring RCU read lock.

This can be solved by adding a way to traverse to the next sibling
without dereferencing ->sibling.next.  This patch adds a monotonically
increasing cgroup serial number, cgroup->serial_nr, which guarantees
that all cgroup->children lists are kept in increasing serial_nr
order.  A new function, cgroup_next_sibling(), is implemented, which,
if CGRP_REMOVED is not set on the current cgroup, follows
->sibling.next; otherwise, traverses the parent's ->children list
until it sees a sibling with higher ->serial_nr.

This allows the function to always return the next sibling regardless
of the state of the current cgroup without adding overhead in the fast
path.

Further patches will update the iterators to use cgroup_next_sibling()
so that they allow dropping RCU read lock and blocking while iteration
is in progress which in turn will be used to simplify controllers.

v2: Typo fix as per Serge.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

53fa5261
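
The serial-number lookup described in the commit above can be sketched in a small userspace model. This is not the kernel implementation: it uses a plain singly linked list with no RCU, and `struct model_cgroup`, `model_add_child()`, `model_remove()` and `model_next_sibling()` are hypothetical names. It assumes only what the message states: children lists stay in increasing serial_nr order, and a removed cgroup keeps its stale ->next pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_cgroup {
	unsigned long serial_nr;		/* monotonically increasing */
	bool removed;				/* stands in for CGRP_REMOVED */
	struct model_cgroup *parent;
	struct model_cgroup *first_child;
	struct model_cgroup *next;		/* next sibling; stale once removed */
};

unsigned long model_next_serial;

/* Append at the tail so the children list stays sorted by serial_nr. */
void model_add_child(struct model_cgroup *parent, struct model_cgroup *child)
{
	struct model_cgroup **link = &parent->first_child;

	child->serial_nr = ++model_next_serial;
	child->removed = false;
	child->parent = parent;
	child->first_child = NULL;
	child->next = NULL;
	while (*link)
		link = &(*link)->next;
	*link = child;
}

/* Unlink from the parent but leave child->next alone, as an RCU list would. */
void model_remove(struct model_cgroup *child)
{
	struct model_cgroup **link;

	assert(child->parent != NULL);
	link = &child->parent->first_child;
	while (*link && *link != child)
		link = &(*link)->next;
	if (*link)
		*link = child->next;
	child->removed = true;
}

struct model_cgroup *model_next_sibling(struct model_cgroup *pos)
{
	struct model_cgroup *next;

	/* Fast path: a live cgroup's ->next is kept up to date. */
	if (!pos->removed)
		return pos->next;

	/* Slow path: rescan the parent's list for the first higher serial_nr. */
	for (next = pos->parent->first_child; next; next = next->next)
		if (next->serial_nr > pos->serial_nr)
			return next;
	return NULL;
}
```

With siblings a, b, c created in that order, removing a and then b leaves a's ->next pointing at the removed b, yet `model_next_sibling(&a)` still yields c because the slow path never follows the stale pointer.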

cgroup: make cgroup_is_removed() static · bdc7119f

Authored by Tejun Heo on May 24, 2013

cgroup_is_removed() no longer has external users and it shouldn't grow
any - controllers should deal with cgroup_subsys_state on/offline
state instead of cgroup removal state.  Make it static.

While at it, make it return bool.
Signed-off-by: Tejun Heo <tj@kernel.org>

bdc7119f

cgroup: fix a subtle bug in descendant pre-order walk · 7805d000

Authored by Tejun Heo on May 24, 2013

When cgroup_next_descendant_pre() initiates a walk, it checks whether
the subtree root has any children and, if it has none, returns NULL.
Later code assumes that the subtree isn't empty.  This is broken
because the subtree may become empty in between, which can lead to the
traversal escaping the subtree by walking to the sibling of the
subtree root.

There's no reason to have the early exit path.  Remove it along with
the later assumption that the subtree isn't empty.  This simplifies
the code a bit and fixes the subtle bug.

While at it, fix the comment of cgroup_for_each_descendant_pre() which
was incorrectly referring to ->css_offline() instead of
->css_online().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org

7805d000
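
A pre-order step function without the early exit might look like the sketch below. It is an illustration under assumptions, not the kernel code: `struct model_node` and `model_next_descendant_pre()` are hypothetical names, and locking and RCU are left out entirely. The loop condition `pos != root` is what keeps the walk inside the subtree even when the subtree is empty.

```c
#include <assert.h>
#include <stddef.h>

struct model_node {
	struct model_node *parent;
	struct model_node *first_child;
	struct model_node *next_sibling;
};

/*
 * Return the node visited after @pos in a pre-order walk of @root's
 * descendants, or NULL when the walk is done.  Pass NULL as @pos to
 * start; the first step then behaves as if @root had just been visited.
 */
struct model_node *model_next_descendant_pre(struct model_node *pos,
					     struct model_node *root)
{
	assert(root != NULL);

	if (!pos)
		pos = root;

	/* Visit the first child if there is one. */
	if (pos->first_child)
		return pos->first_child;

	/* Otherwise climb until some node below @root has a next sibling. */
	while (pos != root) {
		if (pos->next_sibling)
			return pos->next_sibling;
		pos = pos->parent;
	}
	return NULL;
}
```

Because the empty-subtree case falls straight through to the `while` loop, which does not run when `pos == root`, no separate emptiness check is needed and the sibling of the subtree root is never returned.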

15 May, 2013 2 commits

cgroup.h: remove some functions that are now gone · 23958e72

Authored by Greg KH on May 03, 2013

cgroup_lock() and cgroup_unlock() are now no longer exported, so fix
cgroup.h to not declare them if CONFIG_CGROUPS is not enabled.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

23958e72

cgroup: implement task_cgroup_path_from_hierarchy() · 857a2beb

Authored by Tejun Heo on Apr 14, 2013

kdbus folks want a sane way to determine the cgroup path that a given
task belongs to on a given hierarchy, which is a reasonable thing to
expect from cgroup core.

Implement task_cgroup_path_from_hierarchy().

v2: Dropped unnecessary NULL check on the return value of
    task_cgroup_from_root() as suggested by Li Zefan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <greg@kroah.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <daniel@zonque.org>

857a2beb

08 May, 2013 1 commit

aio: don't include aio.h in sched.h · a27bb332

Authored by Kent Overstreet on May 07, 2013

Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a27bb332

02 May, 2013 1 commit

take cgroup_open() and cpuset_open() to fs/proc/base.c · 8d8b97ba

Authored by Al Viro on Apr 19, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8d8b97ba

30 Apr, 2013 1 commit

cgroup: remove css_get_next · 6d2488f6

Authored by Michal Hocko on Apr 29, 2013

Now that we have generic and well ordered cgroup tree walkers there is
no need to keep css_get_next in place.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ying Han <yinghan@google.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6d2488f6

19 Apr, 2013 1 commit

cgroup: fix broken file xattrs · 712317ad

Authored by Li Zefan on Apr 18, 2013

We should store file xattrs in struct cfent instead of struct cftype,
because cftype is a type while cfent is an object instance of cftype.

For example each cgroup has a tasks file, and each tasks file is
associated with a unique cfent, but all those files share the same
struct cftype.

Alexey Kodanev reported a crash, which can be reproduced:

  # mount -t cgroup -o xattr /sys/fs/cgroup
  # mkdir /sys/fs/cgroup/test
  # setfattr -n trusted.value -v test_value /sys/fs/cgroup/tasks
  # rmdir /sys/fs/cgroup/test
  # umount /sys/fs/cgroup
  oops!

In this case, simple_xattrs_free() will free the same struct simple_xattrs
twice.

tj: Dropped unused local variable @cft from cgroup_diput().

Cc: <stable@vger.kernel.org> # 3.8.x
Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

712317ad

16 Apr, 2013 1 commit

memcg: force use_hierarchy if sane_behavior · f00baae7

Authored by Tejun Heo on Apr 15, 2013

Turn on use_hierarchy by default if sane_behavior is specified and
don't create .use_hierarchy file.

It is debatable whether to remove .use_hierarchy file or make it ro as
the former could make transition easier in certain cases; however, the
behavior changes which will be gated by sane_behavior are extensive,
including changing the basic meaning of certain control knobs in a few
controllers and I don't really think keeping this piece would make
things easier in any noticeable way, so let's remove it.

v2: Explain that mem_cgroup_bind() doesn't have to worry about
    children as suggested by Michal Hocko.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

f00baae7

15 Apr, 2013 3 commits

cgroup: remove cgrp->top_cgroup · 05fb22ec

Authored by Li Zefan on Apr 15, 2013

It's not used, and it can be retrieved via cgrp->root->top_cgroup.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

05fb22ec

cgroup: introduce sane_behavior mount option · 873fe09e

Authored by Tejun Heo on Apr 14, 2013

It's a sad fact that at this point various cgroup controllers are
carrying so many idiosyncrasies and pure insanities that it simply
isn't possible to reach any sort of sane consistent behavior while
staying fully compatible with what already has been
exposed to userland.

As we can't break exposed userland interface, transitioning to sane
behaviors can only be done in steps while maintaining backwards
compatibility.  This patch introduces a new mount option -
__DEVEL__sane_behavior - which disables crazy features and enforces
consistent behaviors in cgroup core proper and various controllers.
As exactly which behaviors it changes are still being determined, the
mount option, at this point, is useful only for development of the new
behaviors.  As such, the mount option is prefixed with __DEVEL__ and
generates a warning message when used.

Eventually, once we get to the point where all controllers' behaviors
are consistent enough to implement unified hierarchy, the __DEVEL__
prefix will be dropped, and more importantly, unified-hierarchy will
enforce sane_behavior by default.  Maybe we'll be able to completely drop
the crazy stuff after a while, maybe not, but we at least have a
strategy to move on to saner behaviors.

This patch introduces the mount option and changes the following
behaviors in cgroup core.

* Mount options "noprefix" and "clone_children" are disallowed.  Also,
  cgroupfs file cgroup.clone_children is not created.

* When mounting an existing superblock, mount options should match.
  This is currently pretty crazy.  If one mounts a cgroup, creates a
  subdirectory, unmounts it and then mounts it again with different
  options, it looks like the new options are applied but they aren't.

* Remount is disallowed.

The behavior changes are documented in the comment above
CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
controllers are converted and planned improvements progress.

v2: Dropped unnecessary explicit file permission setting for the
    sane_behavior cftype entry as suggested by Li Zefan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>

873fe09e

move cgroupfs_root to include/linux/cgroup.h · 25a7e684

Authored by Tejun Heo on Apr 14, 2013

While controllers shouldn't be accessing cgroupfs_root directly, it
being hidden inside kernel/cgroup.c makes some things pretty silly.  This
makes routing hierarchy-wide settings which need to be visible to
controllers cumbersome.

We're gonna add another hierarchy-wide setting which needs to be
accessed from controllers.  Move cgroupfs_root and its flags to the
header file so that we can access root settings with inline helpers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>

25a7e684

13 Apr, 2013 1 commit

Revert "cgroup: remove bind() method from cgroup_subsys." · 26d5bbe5

Authored by Tejun Heo on Apr 12, 2013

This reverts commit 84cfb6ab.  There
are scheduled changes which make use of the removed callback.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rami Rosen <ramirose@gmail.com>
Cc: Li Zefan <lizefan@huawei.com>

26d5bbe5

11 Apr, 2013 2 commits

cgroup: implement cgroup_is_descendant() · 78574cf9

Authored by Li Zefan on Apr 08, 2013

A couple controllers want to determine whether two cgroups are in
ancestor/descendant relationship.  As it's more likely that the
descendant is the primary subject of interest and there are other
operations focusing on the descendants, let's ask is_descendant rather
than is_ancestor.

Implementation is trivial as the previous patch guarantees that all
ancestors of a cgroup stay accessible as long as the cgroup is
accessible.

tj: Removed depth optimization, renamed from cgroup_is_ancestor(),
    rewrote descriptions.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

78574cf9
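
The ancestry test described above reduces to following parent pointers, which is safe only because ancestors are guaranteed to stay accessible. The sketch below uses hypothetical names (`struct model_cg`, `model_is_descendant()`) and assumes that a cgroup counts as its own descendant; the commit message does not spell that detail out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_cg {
	struct model_cg *parent;	/* NULL for the root */
};

/* Walk up from @cgrp; true if @ancestor is @cgrp itself or is met on the way. */
bool model_is_descendant(const struct model_cg *cgrp,
			 const struct model_cg *ancestor)
{
	assert(ancestor != NULL);

	while (cgrp) {
		if (cgrp == ancestor)
			return true;
		cgrp = cgrp->parent;
	}
	return false;
}
```

The cost is proportional to the depth of `cgrp`, with no extra per-cgroup state.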

cgroup: remove bind() method from cgroup_subsys. · 84cfb6ab

Authored by Rami Rosen on Apr 10, 2013

The bind() method of cgroup_subsys is not used in any of the
controllers (cpuset, freezer, blkio, net_cls, memcg, net_prio,
devices, perf, hugetlb, cpu and cpuacct).

tj: Removed the entry on ->bind() from
    Documentation/cgroups/cgroups.txt.  Also updated a couple
    paragraphs which were suggesting that dynamic re-binding may be
    implemented.  It's not gonna.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

84cfb6ab

10 Apr, 2013 1 commit

cgroup: Kill subsys.active flag · 479f6141

Authored by Li Zefan on Mar 29, 2013

The only user was cpuacct.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5155385A.4040207@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

479f6141

08 Apr, 2013 3 commits

cgroup: remove cgroup_lock_is_held() · 2219449a

Authored by Tejun Heo on Apr 07, 2013

We don't want controllers to assume that the information is officially
available and do funky things with it.

The only user is task_subsys_state_check() which uses it to verify RCU
access context.  We can move cgroup_lock_is_held() inside
CONFIG_PROVE_RCU but that doesn't add meaningful protection compared
to conditionally exposing cgroup_mutex.

Remove cgroup_lock_is_held(), export cgroup_mutex iff CONFIG_PROVE_RCU
and use lockdep_is_held() directly on the mutex in
task_subsys_state_check().

While at it, add parentheses around macro arguments in
task_subsys_state_check().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

2219449a

cgroup: unexport locking interface and cgroup_attach_task() · b9777cf8

Authored by Tejun Heo on Apr 07, 2013

Now that all external cgroup_lock() users are gone, we can finally
unexport the locking interface and prevent future abuse of
cgroup_mutex.

Make cgroup_[un]lock() and cgroup_lock_live_group() static.  Also,
cgroup_attach_task() doesn't have any user left and can't be used
without locking interface anyway.  Make it static too.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

b9777cf8

cgroup, cpuset: replace move_member_tasks_to_cpuset() with cgroup_transfer_tasks() · 8cc99345

Authored by Tejun Heo on Apr 07, 2013

When a cpuset becomes empty (no CPU or memory), its tasks are
transferred to the nearest ancestor with execution resources.  This
is implemented using cgroup_scan_tasks() with a callback which grabs
cgroup_mutex and invokes cgroup_attach_task() on each task.

Both cgroup_mutex and cgroup_attach_task() are scheduled to be
unexported.  Implement cgroup_transfer_tasks() in cgroup proper which
is essentially the same as move_member_tasks_to_cpuset() except that
it takes cgroups instead of cpusets and @to comes before @from like
normal functions with those arguments, and replace
move_member_tasks_to_cpuset() with it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

8cc99345

20 Mar, 2013 1 commit

cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc() · 081aa458

Authored by Li Zefan on Mar 13, 2013

These two functions share most of the code.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

081aa458

13 Mar, 2013 1 commit

cgroup: remove cgroup_is_descendant() · e7b2dcc5

Authored by Li Zefan on Mar 12, 2013

It was used by ns cgroup, and ns cgroup was removed long ago.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

e7b2dcc5

06 Mar, 2013 1 commit

cgroup: avoid accessing modular cgroup subsys structure without locking · 7d8e0bf5

Authored by Li Zefan on Mar 05, 2013

subsys[i] is set to NULL in cgroup_unload_subsys() at module unload,
and that's protected by cgroup_mutex, and then the memory in which
*subsys[i] resides will be freed.

So this is unsafe without any locking:

  if (!ss || ss->module)
  ...

v2:
- add a comment for enum cgroup_subsys_id
- simplify the comment in cgroup_exit()
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

7d8e0bf5

05 Mar, 2013 1 commit

cgroup: fix cgroup_path() vs rename() race · 65dff759

Authored by Li Zefan on Mar 01, 2013

rename() will change dentry->d_name. The result of this race can
be worse than seeing a partially rewritten name: we might access
a stale pointer because rename() will re-allocate memory to hold
a longer name.

As accessing dentry->d_name must be protected by dentry->d_lock or
parent inode's i_mutex, while on the other hand cgroup_path() can
be called with some irq-safe spinlocks held, we can't generate
cgroup path using dentry->d_name.

Instead, we make a copy of dentry->d_name and save it in
cgrp->name when a cgroup is created, and update cgrp->name at
rename().

v5: use flexible array instead of zero-size array.
v4: - allocate root_cgroup_name and all root_cgroup->name points to it.
    - add cgroup_name() wrapper.
v3: use kfree_rcu() instead of synchronize_rcu() in user-visible path.
v2: make cgrp->name RCU safe.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

65dff759
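
The copy-and-replace scheme from the commit above can be pictured with a userspace sketch. It is a model under assumptions, not the kernel code: `struct model_name`, `model_name_alloc()` and `model_name_replace()` are hypothetical names, and plain `free()` stands in for the deferred freeing (kfree_rcu() per the v3 note) that keeps concurrent readers safe in the kernel.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-cgroup name object. */
struct model_name {
	size_t len;
	char name[];	/* flexible array member, as in the v5 note */
};

struct model_name *model_name_alloc(const char *s)
{
	size_t len = strlen(s);
	struct model_name *n = malloc(sizeof(*n) + len + 1);

	if (!n)
		return NULL;
	n->len = len;
	memcpy(n->name, s, len + 1);	/* copy including the terminating NUL */
	return n;
}

/*
 * Swap in a fully built name.  Returns the old object so the caller can
 * free it once no reader can still hold it.  Returns NULL and leaves
 * *slot untouched if the allocation fails.
 */
struct model_name *model_name_replace(struct model_name **slot, const char *s)
{
	struct model_name *new_name;
	struct model_name *old;

	assert(slot != NULL && *slot != NULL);

	new_name = model_name_alloc(s);
	if (!new_name)
		return NULL;
	old = *slot;
	*slot = new_name;
	return old;
}
```

A reader that picked up the old pointer before the swap still sees a complete, unmodified string, which is the property the original d_name-based code lacked.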

25 Jan, 2013 1 commit

cgroup: remove synchronize_rcu() from cgroup_diput() · be445626

Authored by Li Zefan on Jan 24, 2013

Free cgroup via call_rcu(). The actual work is done through
workqueue.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

be445626

08 Jan, 2013 2 commits

cgroup: implement cgroup_rightmost_descendant() · 12a9d2fe

Authored by Tejun Heo on Jan 07, 2013

Implement cgroup_rightmost_descendant() which returns the right most
descendant of the specified cgroup.  This can be used to skip the
cgroup's subtree while iterating with
cgroup_for_each_descendant_pre().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Li Zefan <lizefan@huawei.com>

12a9d2fe
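
Finding the rightmost descendant amounts to repeatedly stepping to the last child. The sketch below is an illustration with hypothetical names (`struct model_tnode`, `model_rightmost_descendant()`) and no RCU or locking; it assumes that a node without children is returned as its own rightmost descendant.

```c
#include <assert.h>
#include <stddef.h>

struct model_tnode {
	struct model_tnode *first_child;
	struct model_tnode *next_sibling;
};

/* Descend through last children until a leaf is reached. */
struct model_tnode *model_rightmost_descendant(struct model_tnode *pos)
{
	struct model_tnode *last, *tmp;

	assert(pos != NULL);

	do {
		last = pos;
		/* After this loop @pos is @last's last child, or NULL. */
		pos = NULL;
		for (tmp = last->first_child; tmp; tmp = tmp->next_sibling)
			pos = tmp;
	} while (pos);

	return last;
}
```

In a pre-order walk, moving the cursor to the rightmost descendant of the current node makes the next step land on the first node outside that node's subtree, which is how the subtree gets skipped.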

cgroup: remove unused dummy cgroup_fork_callbacks() · d5b1fe68

Authored by Tejun Heo on Dec 28, 2012

5edee61e ("cgroup: cgroup_subsys->fork() should be called after the
task is added to css_set") removed cgroup_fork_callbacks() but forgot
to remove its dummy version for !CONFIG_CGROUPS. Remove it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>

d5b1fe68

20 Nov, 2012 5 commits

cgroup: remove obsolete guarantee from cgroup_task_migrate. · d0b2fdd2

Authored by Tao Ma on Nov 20, 2012

'guarantee' is already removed from cgroup_task_migrate, so remove
the corresponding comments. Some other typos in cgroup are also
changed.

Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

d0b2fdd2

cgroup: add cgroup->id · 0a950f65

Authored by Tejun Heo on Nov 19, 2012

With the introduction of generic cgroup hierarchy iterators, css_id is
being phased out.  It was unnecessarily complex, id'ing the wrong
thing (cgroups need IDs, not CSSes) and has other oddities like not
being available at ->css_alloc().

This patch adds cgroup->id, which is a simple per-hierarchy
ida-allocated ID which is assigned before ->css_alloc() and released
after ->css_free().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>

0a950f65

cgroup, cpuset: remove cgroup_subsys->post_clone() · 033fa1c5

Authored by Tejun Heo on Nov 19, 2012

Currently CGRP_CPUSET_CLONE_CHILDREN triggers ->post_clone().  Now
that clone_children is cpuset specific, there's no reason to have this
rather odd option activation mechanism in cgroup core.  cpuset can
check the flag from its ->css_alloc() and take the necessary
action.

Move cpuset_post_clone() logic to the end of cpuset_css_alloc() and
remove cgroup_subsys->post_clone().

Loosely based on Glauber's "generalize post_clone into post_create"
patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Original-patch-by: Glauber Costa <glommer@parallels.com>
Original-patch: <1351686554-22592-2-git-send-email-glommer@parallels.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>

033fa1c5

cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/ · 2260e7fc

Authored by Tejun Heo on Nov 19, 2012

clone_children is only meaningful for cpuset and will stay that way.
Rename the flag to reflect that and update documentation.  Also, drop
clone_children() wrapper in cgroup.c.  The thin wrapper is used only a
few times and one of them will go away soon.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>

2260e7fc

cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free() · 92fb9748

Authored by Tejun Heo on Nov 19, 2012

Rename cgroup_subsys css lifetime related callbacks to better describe
what their roles are.  Also, update documentation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

92fb9748