提交 · 2bb566cb68dfafad328af666ebadf0e49accd6ca · openanolis / cloud-kernel

09 8月, 2013 5 次提交

cgroup: add subsys backlink pointer to cftype · 2bb566cb

由 Tejun Heo 提交于 8月 08, 2013

cgroup is transitioning to using css (cgroup_subsys_state) instead of
cgroup as the primary subsystem handle.  The cgroupfs file interface
will be converted to use css's which requires finding out the
subsystem from cftype so that the matching css can be determined from
the cgroup.

This patch adds cftype->ss which points to the subsystem the file
belongs to.  The field is initialized while a cftype is being
registered.  This makes it unnecessary to explicitly specify the
subsystem for other cftype handling functions.  @ss argument dropped
from various cftype handling functions.

This patch shouldn't introduce any behavior differences.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>

2bb566cb

cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methods · eb95419b

由 Tejun Heo 提交于 8月 08, 2013

cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup *
in subsystem implementations for the following reasons.

* With unified hierarchy, subsystems will be dynamically bound and
  unbound from cgroups and thus css's (cgroup_subsys_state) may be
  created and destroyed dynamically over the lifetime of a cgroup,
  which is different from the current state where all css's are
  allocated and destroyed together with the associated cgroup.  This
  in turn means that cgroup_css() should be synchronized and may
  return NULL, making it more cumbersome to use.

* Differing levels of per-subsystem granularity in the unified
  hierarchy means that the task and descendant iterators should behave
  differently depending on the specific subsystem the iteration is
  being performed for.

* In majority of the cases, subsystems only care about its part in the
  cgroup hierarchy - ie. the hierarchy of css's.  Subsystem methods
  often obtain the matching css pointer from the cgroup and don't
  bother with the cgroup pointer itself.  Passing around css fits
  much better.

This patch converts all cgroup_subsys methods to take @css instead of
@cgroup.  The conversions are mostly straight-forward.  A few
noteworthy changes are

* ->css_alloc() now takes css of the parent cgroup rather than the
  pointer to the new cgroup as the css for the new cgroup doesn't
  exist yet.  Knowing the parent css is enough for all the existing
  subsystems.

* In kernel/cgroup.c::offline_css(), unnecessary open coded css
  dereference is replaced with local variable access.

This patch shouldn't cause any behavior differences.

v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced
    with local variable @css as suggested by Li Zefan.

    Rebased on top of new for-3.12 which includes for-3.11-fixes so
    that ->css_free() invocation added by da0a12ca ("cgroup: fix a
    leak when percpu_ref_init() fails") is converted too.  Suggested
    by Li Zefan.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Acked-by: NAristeu Rozanski <aris@redhat.com>
Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>

eb95419b

cgroup: add css_parent() · 63876986

由 Tejun Heo 提交于 8月 08, 2013

Currently, controllers have to explicitly follow the cgroup hierarchy
to find the parent of a given css.  cgroup is moving towards using
cgroup_subsys_state as the main controller interface construct, so
let's provide a way to climb the hierarchy using just csses.

This patch implements css_parent() which, given a css, returns its
parent.  The function is guarnateed to valid non-NULL parent css as
long as the target css is not at the top of the hierarchy.

freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
are converted to use css_parent() instead of accessing cgroup->parent
directly.

* __parent_ca() is dropped from cpuacct and its usage is replaced with
  parent_ca().  The only difference between the two was NULL test on
  cgroup->parent which is now embedded in css_parent() making the
  distinction moot.  Note that eventually a css->parent field will be
  added to css and the NULL check in css_parent() will go away.

This patch shouldn't cause any behavior differences.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

63876986

cgroup: add subsystem pointer to cgroup_subsys_state · 72c97e54

由 Tejun Heo 提交于 8月 08, 2013

Currently, given a cgroup_subsys_state, there's no way to find out
which subsystem the css is for, which we'll need to convert the cgroup
controller API to primarily use @css instead of @cgroup.  This patch
adds cgroup_subsys_state->ss which points to the subsystem the @css
belongs to.

While at it, remove the comment about accessing @css->cgroup to
determine the hierarchy.  cgroup core will provide API to traverse
hierarchy of css'es and we don't want subsystems to directly walk
cgroup hierarchies anymore.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

72c97e54

cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/ · 8af01f56

由 Tejun Heo 提交于 8月 08, 2013

The names of the two struct cgroup_subsys_state accessors -
cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
The former clashes with the type name and the latter doesn't even
indicate it's somehow related to cgroup.

We're about to revamp large portion of cgroup API, so, let's rename
them so that they're less awkward.  Most per-controller usages of the
accessors are localized in accessor wrappers and given the amount of
scheduled changes, this isn't gonna add any noticeable headache.

Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
to task_css().  This patch is pure rename.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

8af01f56

31 7月, 2013 4 次提交

cgroup: implement cgroup_from_id() · e14880f7

由 Li Zefan 提交于 7月 31, 2013

This will be used as a replacement for css_lookup().

There's a difference with cgroup id and css id. cgroup id starts with 0,
while css id starts with 1.

v4:
- also check if cggroup_mutex is held.
- make it an inline function.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Signed-off-by: NTejun Heo <tj@kernel.org>

e14880f7

cgroup: document how cgroup IDs are assigned · b414dc09

由 Li Zefan 提交于 7月 31, 2013

As cgroup id has been used in netprio cgroup and will be used in memcg,
it's important to make it clear how a cgroup id is allocated.

For example, in netprio cgroup, the id is used as index of anarray.
Signed-off-by: NLi Zefan <lizefan@huwei.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Signed-off-by: NTejun Heo <tj@kernel.org>

b414dc09

cgroup: convert cgroup_ida to cgroup_idr · 4e96ee8e

由 Li Zefan 提交于 7月 31, 2013

This enables us to lookup a cgroup by its id.

v4:
- add a comment for idr_remove() in cgroup_offline_fn().

v3:
- on success, idr_alloc() returns the id but not 0, so fix the BUG_ON()
  in cgroup_init().
- pass the right value to idr_alloc() so that the id for dummy cgroup is 0.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Signed-off-by: NTejun Heo <tj@kernel.org>

4e96ee8e

cgroup: more naming cleanups · 6f4b7e63

由 Li Zefan 提交于 7月 31, 2013

Constantly use @cset for css_set variables and use @cgrp as cgroup
variables.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

6f4b7e63

13 7月, 2013 1 次提交

cgroup: replace task_cgroup_path_from_hierarchy() with task_cgroup_path() · 913ffdb5

由 Tejun Heo 提交于 7月 11, 2013

task_cgroup_path_from_hierarchy() was added for the planned new users
and none of the currently planned users wants to know about multiple
hierarchies.  This patch drops the multiple hierarchy part and makes
it always return the path in the first non-dummy hierarchy.

As unified hierarchy will always have id 1, this is guaranteed to
return the path for the unified hierarchy if mounted; otherwise, it
will return the path from the hierarchy which happens to occupy the
lowest hierarchy id, which will usually be the first hierarchy mounted
after boot.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Jan Kaluža <jkaluza@redhat.com>

913ffdb5

28 6月, 2013 1 次提交

cgroup: CGRP_ROOT_SUBSYS_BOUND should be ignored when comparing mount options · 0ce6cba3

由 Tejun Heo 提交于 6月 27, 2013

1672d040 ("cgroup: fix cgroupfs_root early destruction path")
introduced CGRP_ROOT_SUBSYS_BOUND which is used to mark completion of
subsys binding on a new root; however, this broke remounts.
cgroup_remount() doesn't allow changing root options via remount and
CGRP_ROOT_SUBSYS_BOUND, which is set on all fully initialized roots,
makes the function reject all remounts.

Fix it by putting the options part in the lower 16 bits of root->flags
and masking the comparions.  While at it, make cgroup_remount() emit
an error message explaining why it's rejecting a remount request, so
that it's less of a mystery.
Signed-off-by: NTejun Heo <tj@kernel.org>

0ce6cba3

27 6月, 2013 2 次提交

cgroup: fix RCU accesses to task->cgroups · 14611e51

由 Tejun Heo 提交于 6月 25, 2013

task->cgroups is a RCU pointer pointing to struct css_set.  A task
switches to a different css_set on cgroup migration but a css_set
doesn't change once created and its pointers to cgroup_subsys_states
aren't RCU protected.

task_subsys_state[_check]() is the macro to acquire css given a task
and subsys_id pair.  It RCU-dereferences task->cgroups->subsys[] not
task->cgroups, so the RCU pointer task->cgroups ends up being
dereferenced without read_barrier_depends() after it.  It's broken.

Fix it by introducing task_css_set[_check]() which does
RCU-dereference on task->cgroups.  task_subsys_state[_check]() is
reimplemented to directly dereference ->subsys[] of the css_set
returned from task_css_set[_check]().

This removes some of sparse RCU warnings in cgroup.

v2: Fixed unbalanced parenthsis and there's no need to use
    rcu_dereference_raw() when !CONFIG_PROVE_RCU.  Both spotted by Li.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org

14611e51

cgroup: fix cgroupfs_root early destruction path · 1672d040

由 Tejun Heo 提交于 6月 25, 2013

cgroupfs_root used to have ->actual_subsys_mask in addition to
->subsys_mask.  a8a648c4 ("cgroup: remove
cgroup->actual_subsys_mask") removed it noting that the subsys_mask is
essentially temporary and doesn't belong in cgroupfs_root; however,
the patch made it impossible to tell whether a cgroupfs_root actually
has the subsystems bound or just have the bits set leading to the
following BUG when trying to mount with subsystems which are already
mounted elsewhere.

 kernel BUG at kernel/cgroup.c:1038!
 invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 ...
 CPU: 1 PID: 7973 Comm: mount Tainted: G        W    3.10.0-rc7-next-20130625-sasha-00011-g1c1dc0e #1105
 task: ffff880fc0ae8000 ti: ffff880fc0b9a000 task.ti: ffff880fc0b9a000
 RIP: 0010:[<ffffffff81249b29>]  [<ffffffff81249b29>] rebind_subsystems+0x409/0x5f0
 ...
 Call Trace:
  [<ffffffff8124bd4f>] cgroup_kill_sb+0xff/0x210
  [<ffffffff813d21af>] deactivate_locked_super+0x4f/0x90
  [<ffffffff8124f3b3>] cgroup_mount+0x673/0x6e0
  [<ffffffff81257169>] cpuset_mount+0xd9/0x110
  [<ffffffff813d2580>] mount_fs+0xb0/0x2d0
  [<ffffffff81404afd>] vfs_kern_mount+0xbd/0x180
  [<ffffffff814070b5>] do_new_mount+0x145/0x2c0
  [<ffffffff814085d6>] do_mount+0x356/0x3c0
  [<ffffffff8140873d>] SyS_mount+0xfd/0x140
  [<ffffffff854eb600>] tracesys+0xdd/0xe2

We still want rebind_subsystems() to take added/removed masks, so
let's fix it by marking whether a cgroupfs_root has finished binding
or not.  Also, document what's going on around ->subsys_mask
initialization so that similar mistakes aren't repeated.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Acked-by: NLi Zefan <lizefan@huawei.com>

1672d040

25 6月, 2013 2 次提交

cgroup: remove cgroup->actual_subsys_mask · a8a648c4

由 Tejun Heo 提交于 6月 24, 2013

cgroup curiously has two subsystem masks, ->subsys_mask and
->actual_subsys_mask.  The latter only exists because the new target
subsys_mask is passed into rebind_subsystems() via @root>subsys_mask.
rebind_subsystems() needs to know what the current mask is to decide
how to reach the target mask so ->actual_subsys_mask is used as the
temp location to remember the current state.

Adding a temporary field to a permanent data structure is rather silly
and can be misleading.  Update rebind_subsystems() to take @added_mask
and @removed_mask instead and remove @root->actual_subsys_mask.

This patch shouldn't introduce any behavior changes.

v2: Comment and description updated as suggested by Li.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

a8a648c4

cgroup: convert CFTYPE_* flags to enums · 02c402d9

由 Tejun Heo 提交于 6月 24, 2013

Purely cosmetic.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

02c402d9

19 6月, 2013 2 次提交

cgroup: rename cont to cgrp · 03c78cbe

由 Li Zefan 提交于 6月 14, 2013

Cont is short for container. control group was named process container
at first, but then people found container already has a meaning in
linux kernel.

Clean up the leftover variable name @cont.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

03c78cbe

cgroup: convert cgroup_cft_commit() to use cgroup_for_each_descendant_pre() · e8c82d20

由 Li Zefan 提交于 6月 18, 2013

We used root->allcg_list to iterate cgroup hierarchy because at that time
cgroup_for_each_descendant_pre() hasn't been invented.

tj: In cgroup_cfts_commit(), s/@serial_nr/@update_upto/, move the
    assignment right above releasing cgroup_mutex and explain what's
    going on there.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

e8c82d20

18 6月, 2013 1 次提交

cgroup: disallow rename(2) if sane_behavior · 6db8e85c

由 Tejun Heo 提交于 6月 14, 2013

cgroup's rename(2) isn't a proper migration implementation - it can't
move the cgroup to a different parent in the hierarchy.  All it can do
is swapping the name string for that cgroup.  This isn't useful and
can mislead users to think that cgroup supports proper cgroup-level
migration.  Disallow rename(2) if sane_behavior.

v2: Fail with -EPERM instead of -EINVAL so that it matches the vfs
    return value when ->rename is not implemented as suggested by Li.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

6db8e85c

14 6月, 2013 10 次提交

cgroup: update sane_behavior documentation · f63674fd

由 Tejun Heo 提交于 6月 13, 2013

f12dc020 ("cgroup: mark "tasks" cgroup file as insane") and
cc5943a7 ("cgroup: mark "notify_on_release" and "release_agent"
cgroup files insane") forgot to update the changed behavior
documentation in cgroup.h.  Update it.
Signed-off-by: NTejun Heo <tj@kernel.org>

f63674fd

cgroup: use percpu refcnt for cgroup_subsys_states · d3daf28d

由 Tejun Heo 提交于 6月 13, 2013

A css (cgroup_subsys_state) is how each cgroup is represented to a
controller.  As such, it can be used in hot paths across the various
subsystems different controllers are associated with.

One of the common operations is reference counting, which up until now
has been implemented using a global atomic counter and can have
significant adverse impact on scalability.  For example, css refcnt
can be gotten and put multiple times by blkcg for each IO request.
For highops configurations which try to do as much per-cpu as
possible, the global frequent refcnting can be very expensive.

In general, given the various and hugely diverse paths css's end up
being used from, we need to make it cheap and highly scalable.  In its
usage, css refcnting isn't very different from module refcnting.

This patch converts css refcnting to use the recently added
percpu_ref.  css_get/tryget/put() directly maps to the matching
percpu_ref operations and the deactivation logic is no longer
necessary as percpu_ref already has refcnt killing.

The only complication is that as the refcnt is per-cpu,
percpu_ref_kill() in itself doesn't ensure that further tryget
operations will fail, which we need to guarantee before invoking
->css_offline()'s.  This is resolved collecting kill confirmation
using percpu_ref_kill_and_confirm() and initiating the offline phase
of destruction after all css refcnt's are confirmed to be seen as
killed on all CPUs.  The previous patches already splitted destruction
into two phases, so percpu_ref_kill_and_confirm() can be hooked up
easily.

This patch removes css_refcnt() which is used for rcu dereference
sanity check in css_id().  While we can add a percpu refcnt API to ask
the same question, css_id() itself is scheduled to be removed fairly
soon, so let's not bother with it.  Just drop the sanity check and use
rcu_dereference_raw() instead.

v2: - init_cgroup_css() was calling percpu_ref_init() without checking
      the return value.  This causes two problems - the obvious lack
      of error handling and percpu_ref_init() being called from
      cgroup_init_subsys() before the allocators are up, which
      triggers warnings but doesn't cause actual problems as the
      refcnt isn't used for roots anyway.  Fix both by moving
      percpu_ref_init() to cgroup_create().

    - The base references were put too early by
      percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
      refs one extra time.  This wasn't noticeable because css's go
      through another RCU grace period before being freed.  Update
      cgroup_destroy_locked() to grab an extra reference before
      killing the refcnts.  This problem was noticed by Kent.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NKent Overstreet <koverstreet@google.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Alasdair G. Kergon" <agk@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>

d3daf28d

cgroup: split cgroup destruction into two steps · ea15f8cc

由 Tejun Heo 提交于 6月 13, 2013

Split cgroup_destroy_locked() into two steps and put the latter half
into cgroup_offline_fn() which is executed from a work item.  The
latter half is responsible for offlining the css's, removing the
cgroup from internal lists, and propagating release notification to
the parent.  The separation is to allow using percpu refcnt for css.

Note that this allows for other cgroup operations to happen between
the first and second halves of destruction, including creating a new
cgroup with the same name.  As the target cgroup is marked DEAD in the
first half and cgroup internals don't care about the names of cgroups,
this should be fine.  A comment explaining this will be added by the
next patch which implements the actual percpu refcnting.

As RCU freeing is guaranteed to happen after the second step of
destruction, we can use the same work item for both.  This patch
renames cgroup->free_work to ->destroy_work and uses it for both
purposes.  INIT_WORK() is now performed right before queueing the work
item.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

ea15f8cc

cgroup: remove cgroup->count and use · 6f3d828f

由 Tejun Heo 提交于 6月 12, 2013

cgroup->count tracks the number of css_sets associated with the cgroup
and used only to verify that no css_set is associated when the cgroup
is being destroyed.  It's superflous as the destruction path can
simply check whether cgroup->cset_links is empty instead.

Drop cgroup->count and check ->cset_links directly from
cgroup_destroy_locked().
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

6f3d828f

cgroup: rename CGRP_REMOVED to CGRP_DEAD · 54766d4a

由 Tejun Heo 提交于 6月 12, 2013

We will add another flag indicating that the cgroup is in the process
of being killed.  REMOVING / REMOVED is more difficult to distinguish
and cgroup_is_removing()/cgroup_is_removed() are a bit awkward.  Also,
later percpu_ref usage will involve "kill"ing the refcnt.

 s/CGRP_REMOVED/CGRP_DEAD/
 s/cgroup_is_removed()/cgroup_is_dead()

This patch is purely cosmetic.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

54766d4a

cgroup: clean up css_[try]get() and css_put() · 5de0107e

由 Tejun Heo 提交于 6月 12, 2013

* __css_get() isn't used by anyone.  Fold it into css_get().

* Add proper function comments to all css reference functions.

This patch is purely cosmetic.

v2: Typo fix as per Li.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

5de0107e

cgroup: bring some sanity to naming around cg_cgroup_link · 69d0206c

由 Tejun Heo 提交于 6月 12, 2013

cgroups and css_sets are mapped M:N and this M:N mapping is
represented by struct cg_cgroup_link which forms linked lists on both
sides.  The naming around this mapping is already confusing and struct
cg_cgroup_link exacerbates the situation quite a bit.

>From cgroup side, it starts off ->css_sets and runs through
->cgrp_link_list.  From css_set side, it starts off ->cg_links and
runs through ->cg_link_list.  This is rather reversed as
cgrp_link_list is used to iterate css_sets and cg_link_list cgroups.
Also, this is the only place which is still using the confusing "cg"
for css_sets.  This patch cleans it up a bit.

* s/cgroup->css_sets/cgroup->cset_links/
  s/css_set->cg_links/css_set->cgrp_links/
  s/cgroup_iter->cg_link/cgroup_iter->cset_link/

* s/cg_cgroup_link/cgrp_cset_link/

* s/cgrp_cset_link->cg/cgrp_cset_link->cset/
  s/cgrp_cset_link->cgrp_link_list/cgrp_cset_link->cset_link/
  s/cgrp_cset_link->cg_link_list/cgrp_cset_link->cgrp_link/

* s/init_css_set_link/init_cgrp_cset_link/
  s/free_cg_links/free_cgrp_cset_links/
  s/allocate_cg_links/allocate_cgrp_cset_links/

* s/cgl[12]/link[12]/ in compare_css_sets()

* s/saved_link/tmp_link/ s/tmp/tmp_links/ and a couple similar
  adustments.

* Comment and whiteline adjustments.

After the changes, we have

	list_for_each_entry(link, &cont->cset_links, cset_link) {
		struct css_set *cset = link->cset;

instead of

	list_for_each_entry(link, &cont->css_sets, cgrp_link_list) {
		struct css_set *cset = link->cg;

This patch is purely cosmetic.

v2: Fix broken sentences in the patch description.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

69d0206c

T
cgroup: remove now unused css_depth() · 3fc3db9a
由 Tejun Heo 提交于 6月 12, 2013
```
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>
```
3fc3db9a

cpuset: allow to move tasks to empty cpusets · 88fa523b

由 Li Zefan 提交于 6月 09, 2013

Currently some cpuset behaviors are not friendly when cpuset is co-mounted
with other cgroup controllers.

Now with this patchset if cpuset is mounted with sane_behavior option,
it behaves differently:

- Tasks will be kept in empty cpusets when hotplug happens and take
  masks of ancestors with non-empty cpus/mems, instead of being moved to
  an ancestor.

- A task can be moved into an empty cpuset, and again it takes masks of
  ancestors, so the user can drop a task into a newly created cgroup without
  having to do anything for it.

As tasks can reside in empy cpusets, here're some rules:

- They can be moved to another cpuset, regardless it's empty or not.

- Though it takes masks from ancestors, it takes other configs from the
  empty cpuset.

- If the ancestors' masks are changed, those tasks will also be updated
  to take new masks.

v2: add documentation in include/linux/cgroup.h
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

88fa523b

cpuset: allow to keep tasks in empty cpusets · 5c5cc623

由 Li Zefan 提交于 6月 09, 2013

To achieve this:

- We call update_tasks_cpumask/nodemask() for empty cpusets when
hotplug happens, instead of moving tasks out of them.

- When a cpuset's masks are changed by writing cpuset.cpus/mems,
we also update tasks in child cpusets which are empty.

v3:
- do propagation work in one place for both hotplug and unplug

v2:
- drop rcu_read_lock before calling update_task_nodemask() and
  update_task_cpumask(), instead of using workqueue.
- add documentation in include/linux/cgroup.h
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

5c5cc623

24 5月, 2013 4 次提交

cgroup: update iterators to use cgroup_next_sibling() · 75501a6d

由 Tejun Heo 提交于 5月 24, 2013

This patch converts cgroup_for_each_child(),
cgroup_next_descendant_pre/post() and thus
cgroup_for_each_descendant_pre/post() to use cgroup_next_sibling()
instead of manually dereferencing ->sibling.next.

The only reason the iterators couldn't allow dropping RCU read lock
while iteration is in progress was because they couldn't determine the
next sibling safely once RCU read lock is dropped.  Using
cgroup_next_sibling() removes that problem and enables all iterators
to allow dropping RCU read lock in the middle.  Comments are updated
accordingly.

This makes the iterators easier to use and will simplify controllers.

Note that @cgroup argument is renamed to @cgrp in
cgroup_for_each_child() because it conflicts with "struct cgroup" used
in the new macro body.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>

75501a6d

cgroup: add cgroup->serial_nr and implement cgroup_next_sibling() · 53fa5261

由 Tejun Heo 提交于 5月 24, 2013

Currently, there's no easy way to find out the next sibling cgroup
unless it's known that the current cgroup is accessed from the
parent's children list in a single RCU critical section.  This in turn
forces all iterators to require whole iteration to be enclosed in a
single RCU critical section, which sometimes is too restrictive.  This
patch implements cgroup_next_sibling() which can reliably determine
the next sibling regardless of the state of the current cgroup as long
as it's accessible.

It currently is impossible to determine the next sibling after
dropping RCU read lock because the cgroup being iterated could be
removed anytime and if RCU read lock is dropped, nothing guarantess
its ->sibling.next pointer is accessible.  A removed cgroup would
continue to point to its next sibling for RCU accesses but stop
receiving updates from the sibling.  IOW, the next sibling could be
removed and then complete its grace period while RCU read lock is
dropped, making it unsafe to dereference ->sibling.next after dropping
and re-acquiring RCU read lock.

This can be solved by adding a way to traverse to the next sibling
without dereferencing ->sibling.next.  This patch adds a monotonically
increasing cgroup serial number, cgroup->serial_nr, which guarantees
that all cgroup->children lists are kept in increasing serial_nr
order.  A new function, cgroup_next_sibling(), is implemented, which,
if CGRP_REMOVED is not set on the current cgroup, follows
->sibling.next; otherwise, traverses the parent's ->children list
until it sees a sibling with higher ->serial_nr.

This allows the function to always return the next sibling regardless
of the state of the current cgroup without adding overhead in the fast
path.

Further patches will update the iterators to use cgroup_next_sibling()
so that they allow dropping RCU read lock and blocking while iteration
is in progress which in turn will be used to simplify controllers.

v2: Typo fix as per Serge.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>

53fa5261

cgroup: make cgroup_is_removed() static · bdc7119f

由 Tejun Heo 提交于 5月 24, 2013

cgroup_is_removed() no longer has external users and it shouldn't grow
any - controllers should deal with cgroup_subsys_state on/offline
state instead of cgroup removal state.  Make it static.

While at it, make it return bool.
Signed-off-by: NTejun Heo <tj@kernel.org>

bdc7119f

cgroup: fix a subtle bug in descendant pre-order walk · 7805d000

由 Tejun Heo 提交于 5月 24, 2013

When cgroup_next_descendant_pre() initiates a walk, it checks whether
the subtree root doesn't have any children and if not returns NULL.
Later code assumes that the subtree isn't empty.  This is broken
because the subtree may become empty inbetween, which can lead to the
traversal escaping the subtree by walking to the sibling of the
subtree root.

There's no reason to have the early exit path.  Remove it along with
the later assumption that the subtree isn't empty.  This simplifies
the code a bit and fixes the subtle bug.

While at it, fix the comment of cgroup_for_each_descendant_pre() which
was incorrectly referring to ->css_offline() instead of
->css_online().
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org

7805d000

15 5月, 2013 3 次提交

blk-throttle: implement proper hierarchy support · 9138125b

由 Tejun Heo 提交于 5月 14, 2013

With the recent updates, blk-throttle is finally ready for proper
hierarchy support.  Dispatching now honors service_queue->parent_sq
and propagates correctly.  The only thing missing is setting
->parent_sq correctly so that throtl_grp hierarchy matches the cgroup
hierarchy.

This patch updates throtl_pd_init() such that service_queues form the
same hierarchy as the cgroup hierarchy if sane_behavior is enabled.
As this concludes proper hierarchy support for blkcg, the shameful
.broken_hierarchy tag is removed from blkio_subsys.

v2: Updated blkio-controller.txt as suggested by Vivek.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>

9138125b

cgroup.h: remove some functions that are now gone · 23958e72

由 Greg KH 提交于 5月 03, 2013

cgroup_lock() and cgroup_unlock() are now no longer exported, so fix
cgroup.h to not declare them if CONFIG_CGROUPS is not enabled.
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

23958e72

cgroup: implement task_cgroup_path_from_hierarchy() · 857a2beb

由 Tejun Heo 提交于 4月 14, 2013

kdbus folks want a sane way to determine the cgroup path that a given
task belongs to on a given hierarchy, which is a reasonble thing to
expect from cgroup core.

Implement task_cgroup_path_from_hierarchy().

v2: Dropped unnecessary NULL check on the return value of
    task_cgroup_from_root() as suggested by Li Zefan.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NGreg Kroah-Hartman <greg@kroah.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <daniel@zonque.org>

857a2beb

08 5月, 2013 1 次提交

aio: don't include aio.h in sched.h · a27bb332

由 Kent Overstreet 提交于 5月 07, 2013

Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a27bb332

02 5月, 2013 1 次提交
- A
  take cgroup_open() and cpuset_open() to fs/proc/base.c · 8d8b97ba
  由 Al Viro 提交于 4月 19, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  8d8b97ba
30 4月, 2013 1 次提交

cgroup: remove css_get_next · 6d2488f6

由 Michal Hocko 提交于 4月 29, 2013

Now that we have generic and well ordered cgroup tree walkers there is
no need to keep css_get_next in the place.
Signed-off-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ying Han <yinghan@google.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6d2488f6

19 4月, 2013 1 次提交

cgroup: fix broken file xattrs · 712317ad

由 Li Zefan 提交于 4月 18, 2013

We should store file xattrs in struct cfent instead of struct cftype,
because cftype is a type while cfent is object instance of cftype.

For example each cgroup has a tasks file, and each tasks file is
associated with a uniq cfent, but all those files share the same
struct cftype.

Alexey Kodanev reported a crash, which can be reproduced:

  # mount -t cgroup -o xattr /sys/fs/cgroup
  # mkdir /sys/fs/cgroup/test
  # setfattr -n trusted.value -v test_value /sys/fs/cgroup/tasks
  # rmdir /sys/fs/cgroup/test
  # umount /sys/fs/cgroup
  oops!

In this case, simple_xattrs_free() will free the same struct simple_xattrs
twice.

tj: Dropped unused local variable @cft from cgroup_diput().

Cc: <stable@vger.kernel.org> # 3.8.x
Reported-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

712317ad

16 4月, 2013 1 次提交

memcg: force use_hierarchy if sane_behavior · f00baae7

由 Tejun Heo 提交于 4月 15, 2013

Turn on use_hierarchy by default if sane_behavior is specified and
don't create .use_hierarchy file.

It is debatable whether to remove .use_hierarchy file or make it ro as
the former could make transition easier in certain cases; however, the
behavior changes which will be gated by sane_behavior are intensive
including changing basic meaning of certain control knobs in a few
controllers and I don't really think keeping this piece would make
things easier in any noticeable way, so let's remove it.

v2: Explain that mem_cgroup_bind() doesn't have to worry about
    children as suggested by Michal Hocko.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

f00baae7

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功