1. 13 8月, 2013 3 次提交
    • T
      cgroup: add __rcu modifier to cgroup->subsys[] · 73e80ed8
      Tejun Heo 提交于
      For the planned unified hierarchy, each css (cgroup_subsys_state) will
      be RCU protected so that it can be created and destroyed individually
      while allowing RCU accesses.  Previous changes ensured that all
      cgroup->subsys[] accesses use the cgroup_css() accessor.  This patch
      adds __rcu modifier to cgroup->subsys[], add matching RCU dereference
      in cgroup_css() and convert all assignments to either
      rcu_assign_pointer() or RCU_INIT_POINTER().
      
      This change prepares for the actual RCUfication of css's and doesn't
      introduce any visible behavior change.  The conversion is verified
      with sparse and all accesses are properly RCU annotated.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      73e80ed8
    • T
      cgroup: add cgroup_subsys_state->parent · 0ae78e0b
      Tejun Heo 提交于
      With the planned unified hierarchy, css's (cgroup_subsys_state) will
      be RCU protected and allowed to be attached and detached dynamically
      over the course of a cgroup's lifetime.  This means that css's will
      stay accessible after being detached from its cgroup - the matching
      pointer in cgroup->subsys[] cleared - for ref draining and RCU grace
      period.
      
      cgroup core still wants to guarantee that the parent css is never
      destroyed before its children and css_parent() always returns the
      parent regardless of the state of the child css as long as it's
      accessible.
      
      This patch makes css's hold onto their parents and adds css->parent so
      that the parent css is never detroyed before its children and can be
      determined without consulting the cgroups.
      
      cgroup->dummy_css is also updated to point to the parent dummy_css;
      however, it doesn't need to worry about object lifetime as the parent
      cgroup is already pinned by the child.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      0ae78e0b
    • T
      cgroup: rename cgroup_subsys_state->dput_work and its callback function · 35ef10da
      Tejun Heo 提交于
      css (cgroup_subsys_state) will become RCU protected and there will be
      two stages which require punting to work item during release.  To
      prepare for using the work item for multiple times, rename
      css->dput_work to css->destroy_work and css_dput_fn() to
      css_free_work_fn() and move work item initialization from css init to
      right before the actual usage.
      
      This reorganization doesn't introduce any behavior change.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      35ef10da
  2. 09 8月, 2013 18 次提交
    • T
      cgroup: make css_for_each_descendant() and friends include the origin css in the iteration · bd8815a6
      Tejun Heo 提交于
      Previously, all css descendant iterators didn't include the origin
      (root of subtree) css in the iteration.  The reasons were maintaining
      consistency with css_for_each_child() and that at the time of
      introduction more use cases needed skipping the origin anyway;
      however, given that css_is_descendant() considers self to be a
      descendant, omitting the origin css has become more confusing and
      looking at the accumulated use cases rather clearly indicates that
      including origin would result in simpler code overall.
      
      While this is a change which can easily lead to subtle bugs, cgroup
      API including the iterators has recently gone through major
      restructuring and no out-of-tree changes will be applicable without
      adjustments making this a relatively acceptable opportunity for this
      type of change.
      
      The conversions are mostly straight-forward.  If the iteration block
      had explicit origin handling before or after, it's moved inside the
      iteration.  If not, if (pos == origin) continue; is added.  Some
      conversions add extra reference get/put around origin handling by
      consolidating origin handling and the rest.  While the extra ref
      operations aren't strictly necessary, this shouldn't cause any
      noticeable difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      bd8815a6
    • T
      cgroup: unexport cgroup_css() · 95109b62
      Tejun Heo 提交于
      cgroup_css() no longer has any user left outside cgroup.c proper and
      we don't want subsystems to grow new usages of the function.  cgroup
      core should always provide the css to use to the subsystems, which
      will make dynamic creation and destruction of css's across the
      lifetime of a cgroup much more manageable than exposing the cgroup
      directly to subsystems and let them dereference css's from it.
      
      Make cgroup_css() a static function in cgroup.c.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      95109b62
    • T
      cgroup: make cgroup_taskset deal with cgroup_subsys_state instead of cgroup · d99c8727
      Tejun Heo 提交于
      cgroup is in the process of converting to css (cgroup_subsys_state)
      from cgroup as the principal subsystem interface handle.  This is
      mostly to prepare for the unified hierarchy support where css's will
      be created and destroyed dynamically but also helps cleaning up
      subsystem implementations as css is usually what they are interested
      in anyway.
      
      cgroup_taskset which is used by the subsystem attach methods is the
      last cgroup subsystem API which isn't using css as the handle.  Update
      cgroup_taskset_cur_cgroup() to cgroup_taskset_cur_css() and
      cgroup_taskset_for_each() to take @skip_css instead of @skip_cgrp.
      
      The conversions are pretty mechanical.  One exception is
      cpuset::cgroup_cs(), which lost its last user and got removed.
      
      This patch shouldn't introduce any functional changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      d99c8727
    • T
      cgroup: make cftype->[un]register_event() deal with cgroup_subsys_state instead of cgroup · 81eeaf04
      Tejun Heo 提交于
      cgroup is in the process of converting to css (cgroup_subsys_state)
      from cgroup as the principal subsystem interface handle.  This is
      mostly to prepare for the unified hierarchy support where css's will
      be created and destroyed dynamically but also helps cleaning up
      subsystem implementations as css is usually what they are interested
      in anyway.
      
      cftype->[un]register_event() is among the remaining couple interfaces
      which still use struct cgroup.  Convert it to cgroup_subsys_state.
      The conversion is mostly mechanical and removes the last users of
      mem_cgroup_from_cont() and cg_to_vmpressure(), which are removed.
      
      v2: indentation update as suggested by Li Zefan.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      81eeaf04
    • T
      cgroup: make task iterators deal with cgroup_subsys_state instead of cgroup · 72ec7029
      Tejun Heo 提交于
      cgroup is in the process of converting to css (cgroup_subsys_state)
      from cgroup as the principal subsystem interface handle.  This is
      mostly to prepare for the unified hierarchy support where css's will
      be created and destroyed dynamically but also helps cleaning up
      subsystem implementations as css is usually what they are interested
      in anyway.
      
      This patch converts task iterators to deal with css instead of cgroup.
      Note that under unified hierarchy, different sets of tasks will be
      considered belonging to a given cgroup depending on the subsystem in
      question and making the iterators deal with css instead cgroup
      provides them with enough information about the iteration.
      
      While at it, fix several function comment formats in cpuset.c.
      
      This patch doesn't introduce any behavior differences.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      72ec7029
    • T
      cgroup: remove struct cgroup_scanner · e535837b
      Tejun Heo 提交于
      cgroup_scan_tasks() takes a pointer to struct cgroup_scanner as its
      sole argument and the only function of that struct is packing the
      arguments of the function call which are consisted of five fields.
      It's not too unusual to pack parameters into a struct when the number
      of arguments gets excessive or the whole set needs to be passed around
      a lot, but neither holds here making it just weird.
      
      Drop struct cgroup_scanner and pass the params directly to
      cgroup_scan_tasks().  Note that struct cpuset_change_nodemask_arg was
      added to cpuset.c to pass both ->cs and ->newmems pointer to
      cpuset_change_nodemask() using single data pointer.
      
      This doesn't make any functional differences.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      e535837b
    • T
      cgroup: make cgroup_task_iter remember the cgroup being iterated · c59cd3d8
      Tejun Heo 提交于
      Currently all cgroup_task_iter functions require @cgrp to be passed
      in, which is superflous and increases chance of usage error.  Make
      cgroup_task_iter remember the cgroup being iterated and drop @cgrp
      argument from next and end functions.
      
      This patch doesn't introduce any behavior differences.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      c59cd3d8
    • T
      cgroup: rename cgroup_iter to cgroup_task_iter · 0942eeee
      Tejun Heo 提交于
      cgroup now has multiple iterators and it's quite confusing to have
      something which walks over tasks of a single cgroup named cgroup_iter.
      Let's rename it to cgroup_task_iter.
      
      While at it, reformat / update comments and replace the overview
      comment above the interface function decls with proper function
      comments.  Such overview can be useful but function comments should be
      more than enough here.
      
      This is pure rename and doesn't introduce any functional changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      0942eeee
    • T
      cgroup: make hierarchy iterators deal with cgroup_subsys_state instead of cgroup · 492eb21b
      Tejun Heo 提交于
      cgroup is currently in the process of transitioning to using css
      (cgroup_subsys_state) as the primary handle instead of cgroup in
      subsystem API.  For hierarchy iterators, this is beneficial because
      
      * In most cases, css is the only thing subsystems care about anyway.
      
      * On the planned unified hierarchy, iterations for different
        subsystems will need to skip over different subtrees of the
        hierarchy depending on which subsystems are enabled on each cgroup.
        Passing around css makes it unnecessary to explicitly specify the
        subsystem in question as css is intersection between cgroup and
        subsystem
      
      * For the planned unified hierarchy, css's would need to be created
        and destroyed dynamically independent from cgroup hierarchy.  Having
        cgroup core manage css iteration makes enforcing deref rules a lot
        easier.
      
      Most subsystem conversions are straight-forward.  Noteworthy changes
      are
      
      * blkio: cgroup_to_blkcg() is no longer used.  Removed.
      
      * freezer: cgroup_freezer() is no longer used.  Removed.
      
      * devices: cgroup_to_devcgroup() is no longer used.  Removed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      492eb21b
    • T
      cgroup: always use cgroup_next_child() to walk the children list · f48e3924
      Tejun Heo 提交于
      There are several places where the children list is accessed directly.
      This patch converts those places to use cgroup_next_child().  This
      will help updating the hierarchy iterators to use @css instead of
      @cgrp.
      
      While cgroup_next_child() can be heavy in pathological cases - e.g. a
      lot of dead children, this shouldn't cause any noticeable behavior
      differences.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      f48e3924
    • T
      cgroup: convert cgroup_next_sibling() to cgroup_next_child() · 3b287a50
      Tejun Heo 提交于
      cgroup is transitioning to using css (cgroup_subsys_state) as the main
      subsys interface handle instead of cgroup and the iterators will be
      updated to use css too.  The iterators need to walk the cgroup
      hierarchy and return the css's matching the origin css, which is a bit
      cumbersome to open code.
      
      This patch converts cgroup_next_sibling() to cgroup_next_child() so
      that it can handle all steps of direct child iteration.  This will be
      used to update iterators to take @css instead of @cgrp.  In addition
      to the new iteration init handling, cgroup_next_child() is
      restructured so that the different branches share the end of iteration
      condition check.
      
      This patch doesn't change any behavior.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      3b287a50
    • T
      cgroup: pass around cgroup_subsys_state instead of cgroup in file methods · 182446d0
      Tejun Heo 提交于
      cgroup is currently in the process of transitioning to using struct
      cgroup_subsys_state * as the primary handle instead of struct cgroup.
      Please see the previous commit which converts the subsystem methods
      for rationale.
      
      This patch converts all cftype file operations to take @css instead of
      @cgroup.  cftypes for the cgroup core files don't have their subsytem
      pointer set.  These will automatically use the dummy_css added by the
      previous patch and can be converted the same way.
      
      Most subsystem conversions are straight forwards but there are some
      interesting ones.
      
      * freezer: update_if_frozen() is also converted to take @css instead
        of @cgroup for consistency.  This will make the code look simpler
        too once iterators are converted to use css.
      
      * memory/vmpressure: mem_cgroup_from_css() needs to be exported to
        vmpressure while mem_cgroup_from_cont() can be made static.
        Updated accordingly.
      
      * cpu: cgroup_tg() doesn't have any user left.  Removed.
      
      * cpuacct: cgroup_ca() doesn't have any user left.  Removed.
      
      * hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left.
        Removed.
      
      * net_cls: cgrp_cls_state() doesn't have any user left.  Removed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      182446d0
    • T
      cgroup: add cgroup->dummy_css · 67f4c36f
      Tejun Heo 提交于
      cgroup subsystem API is being converted to use css
      (cgroup_subsys_state) as the main handle, which makes things a bit
      awkward for subsystem agnostic core features - the "cgroup.*"
      interface files and various iterations - a bit awkward as they don't
      have a css to use.
      
      This patch adds cgroup->dummy_css which has NULL ->ss and whose only
      role is pointing back to the cgroup.  This will be used to support
      subsystem agnostic features on the coming css based API.
      
      css_parent() is updated to handle dummy_css's.  Note that css will
      soon grow its own ->parent field and css_parent() will be made
      trivial.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      67f4c36f
    • T
      cgroup: add subsys backlink pointer to cftype · 2bb566cb
      Tejun Heo 提交于
      cgroup is transitioning to using css (cgroup_subsys_state) instead of
      cgroup as the primary subsystem handle.  The cgroupfs file interface
      will be converted to use css's which requires finding out the
      subsystem from cftype so that the matching css can be determined from
      the cgroup.
      
      This patch adds cftype->ss which points to the subsystem the file
      belongs to.  The field is initialized while a cftype is being
      registered.  This makes it unnecessary to explicitly specify the
      subsystem for other cftype handling functions.  @ss argument dropped
      from various cftype handling functions.
      
      This patch shouldn't introduce any behavior differences.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      2bb566cb
    • T
      cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methods · eb95419b
      Tejun Heo 提交于
      cgroup is currently in the process of transitioning to using struct
      cgroup_subsys_state * as the primary handle instead of struct cgroup *
      in subsystem implementations for the following reasons.
      
      * With unified hierarchy, subsystems will be dynamically bound and
        unbound from cgroups and thus css's (cgroup_subsys_state) may be
        created and destroyed dynamically over the lifetime of a cgroup,
        which is different from the current state where all css's are
        allocated and destroyed together with the associated cgroup.  This
        in turn means that cgroup_css() should be synchronized and may
        return NULL, making it more cumbersome to use.
      
      * Differing levels of per-subsystem granularity in the unified
        hierarchy means that the task and descendant iterators should behave
        differently depending on the specific subsystem the iteration is
        being performed for.
      
      * In majority of the cases, subsystems only care about its part in the
        cgroup hierarchy - ie. the hierarchy of css's.  Subsystem methods
        often obtain the matching css pointer from the cgroup and don't
        bother with the cgroup pointer itself.  Passing around css fits
        much better.
      
      This patch converts all cgroup_subsys methods to take @css instead of
      @cgroup.  The conversions are mostly straight-forward.  A few
      noteworthy changes are
      
      * ->css_alloc() now takes css of the parent cgroup rather than the
        pointer to the new cgroup as the css for the new cgroup doesn't
        exist yet.  Knowing the parent css is enough for all the existing
        subsystems.
      
      * In kernel/cgroup.c::offline_css(), unnecessary open coded css
        dereference is replaced with local variable access.
      
      This patch shouldn't cause any behavior differences.
      
      v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced
          with local variable @css as suggested by Li Zefan.
      
          Rebased on top of new for-3.12 which includes for-3.11-fixes so
          that ->css_free() invocation added by da0a12ca ("cgroup: fix a
          leak when percpu_ref_init() fails") is converted too.  Suggested
          by Li Zefan.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      eb95419b
    • T
      cgroup: add css_parent() · 63876986
      Tejun Heo 提交于
      Currently, controllers have to explicitly follow the cgroup hierarchy
      to find the parent of a given css.  cgroup is moving towards using
      cgroup_subsys_state as the main controller interface construct, so
      let's provide a way to climb the hierarchy using just csses.
      
      This patch implements css_parent() which, given a css, returns its
      parent.  The function is guarnateed to valid non-NULL parent css as
      long as the target css is not at the top of the hierarchy.
      
      freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
      are converted to use css_parent() instead of accessing cgroup->parent
      directly.
      
      * __parent_ca() is dropped from cpuacct and its usage is replaced with
        parent_ca().  The only difference between the two was NULL test on
        cgroup->parent which is now embedded in css_parent() making the
        distinction moot.  Note that eventually a css->parent field will be
        added to css and the NULL check in css_parent() will go away.
      
      This patch shouldn't cause any behavior differences.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      63876986
    • T
      cgroup: add subsystem pointer to cgroup_subsys_state · 72c97e54
      Tejun Heo 提交于
      Currently, given a cgroup_subsys_state, there's no way to find out
      which subsystem the css is for, which we'll need to convert the cgroup
      controller API to primarily use @css instead of @cgroup.  This patch
      adds cgroup_subsys_state->ss which points to the subsystem the @css
      belongs to.
      
      While at it, remove the comment about accessing @css->cgroup to
      determine the hierarchy.  cgroup core will provide API to traverse
      hierarchy of css'es and we don't want subsystems to directly walk
      cgroup hierarchies anymore.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      72c97e54
    • T
      cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/ · 8af01f56
      Tejun Heo 提交于
      The names of the two struct cgroup_subsys_state accessors -
      cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
      The former clashes with the type name and the latter doesn't even
      indicate it's somehow related to cgroup.
      
      We're about to revamp large portion of cgroup API, so, let's rename
      them so that they're less awkward.  Most per-controller usages of the
      accessors are localized in accessor wrappers and given the amount of
      scheduled changes, this isn't gonna add any noticeable headache.
      
      Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
      to task_css().  This patch is pure rename.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      8af01f56
  3. 31 7月, 2013 4 次提交
  4. 13 7月, 2013 1 次提交
    • T
      cgroup: replace task_cgroup_path_from_hierarchy() with task_cgroup_path() · 913ffdb5
      Tejun Heo 提交于
      task_cgroup_path_from_hierarchy() was added for the planned new users
      and none of the currently planned users wants to know about multiple
      hierarchies.  This patch drops the multiple hierarchy part and makes
      it always return the path in the first non-dummy hierarchy.
      
      As unified hierarchy will always have id 1, this is guaranteed to
      return the path for the unified hierarchy if mounted; otherwise, it
      will return the path from the hierarchy which happens to occupy the
      lowest hierarchy id, which will usually be the first hierarchy mounted
      after boot.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Jan Kaluža <jkaluza@redhat.com>
      913ffdb5
  5. 28 6月, 2013 1 次提交
    • T
      cgroup: CGRP_ROOT_SUBSYS_BOUND should be ignored when comparing mount options · 0ce6cba3
      Tejun Heo 提交于
      1672d040 ("cgroup: fix cgroupfs_root early destruction path")
      introduced CGRP_ROOT_SUBSYS_BOUND which is used to mark completion of
      subsys binding on a new root; however, this broke remounts.
      cgroup_remount() doesn't allow changing root options via remount and
      CGRP_ROOT_SUBSYS_BOUND, which is set on all fully initialized roots,
      makes the function reject all remounts.
      
      Fix it by putting the options part in the lower 16 bits of root->flags
      and masking the comparions.  While at it, make cgroup_remount() emit
      an error message explaining why it's rejecting a remount request, so
      that it's less of a mystery.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      0ce6cba3
  6. 27 6月, 2013 2 次提交
    • T
      cgroup: fix RCU accesses to task->cgroups · 14611e51
      Tejun Heo 提交于
      task->cgroups is a RCU pointer pointing to struct css_set.  A task
      switches to a different css_set on cgroup migration but a css_set
      doesn't change once created and its pointers to cgroup_subsys_states
      aren't RCU protected.
      
      task_subsys_state[_check]() is the macro to acquire css given a task
      and subsys_id pair.  It RCU-dereferences task->cgroups->subsys[] not
      task->cgroups, so the RCU pointer task->cgroups ends up being
      dereferenced without read_barrier_depends() after it.  It's broken.
      
      Fix it by introducing task_css_set[_check]() which does
      RCU-dereference on task->cgroups.  task_subsys_state[_check]() is
      reimplemented to directly dereference ->subsys[] of the css_set
      returned from task_css_set[_check]().
      
      This removes some of sparse RCU warnings in cgroup.
      
      v2: Fixed unbalanced parenthsis and there's no need to use
          rcu_dereference_raw() when !CONFIG_PROVE_RCU.  Both spotted by Li.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: stable@vger.kernel.org
      14611e51
    • T
      cgroup: fix cgroupfs_root early destruction path · 1672d040
      Tejun Heo 提交于
      cgroupfs_root used to have ->actual_subsys_mask in addition to
      ->subsys_mask.  a8a648c4 ("cgroup: remove
      cgroup->actual_subsys_mask") removed it noting that the subsys_mask is
      essentially temporary and doesn't belong in cgroupfs_root; however,
      the patch made it impossible to tell whether a cgroupfs_root actually
      has the subsystems bound or just have the bits set leading to the
      following BUG when trying to mount with subsystems which are already
      mounted elsewhere.
      
       kernel BUG at kernel/cgroup.c:1038!
       invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
       ...
       CPU: 1 PID: 7973 Comm: mount Tainted: G        W    3.10.0-rc7-next-20130625-sasha-00011-g1c1dc0e #1105
       task: ffff880fc0ae8000 ti: ffff880fc0b9a000 task.ti: ffff880fc0b9a000
       RIP: 0010:[<ffffffff81249b29>]  [<ffffffff81249b29>] rebind_subsystems+0x409/0x5f0
       ...
       Call Trace:
        [<ffffffff8124bd4f>] cgroup_kill_sb+0xff/0x210
        [<ffffffff813d21af>] deactivate_locked_super+0x4f/0x90
        [<ffffffff8124f3b3>] cgroup_mount+0x673/0x6e0
        [<ffffffff81257169>] cpuset_mount+0xd9/0x110
        [<ffffffff813d2580>] mount_fs+0xb0/0x2d0
        [<ffffffff81404afd>] vfs_kern_mount+0xbd/0x180
        [<ffffffff814070b5>] do_new_mount+0x145/0x2c0
        [<ffffffff814085d6>] do_mount+0x356/0x3c0
        [<ffffffff8140873d>] SyS_mount+0xfd/0x140
        [<ffffffff854eb600>] tracesys+0xdd/0xe2
      
      We still want rebind_subsystems() to take added/removed masks, so
      let's fix it by marking whether a cgroupfs_root has finished binding
      or not.  Also, document what's going on around ->subsys_mask
      initialization so that similar mistakes aren't repeated.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      1672d040
  7. 25 6月, 2013 2 次提交
    • T
      cgroup: remove cgroup->actual_subsys_mask · a8a648c4
      Tejun Heo 提交于
      cgroup curiously has two subsystem masks, ->subsys_mask and
      ->actual_subsys_mask.  The latter only exists because the new target
      subsys_mask is passed into rebind_subsystems() via @root>subsys_mask.
      rebind_subsystems() needs to know what the current mask is to decide
      how to reach the target mask so ->actual_subsys_mask is used as the
      temp location to remember the current state.
      
      Adding a temporary field to a permanent data structure is rather silly
      and can be misleading.  Update rebind_subsystems() to take @added_mask
      and @removed_mask instead and remove @root->actual_subsys_mask.
      
      This patch shouldn't introduce any behavior changes.
      
      v2: Comment and description updated as suggested by Li.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      a8a648c4
    • T
      cgroup: convert CFTYPE_* flags to enums · 02c402d9
      Tejun Heo 提交于
      Purely cosmetic.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      02c402d9
  8. 19 6月, 2013 2 次提交
  9. 18 6月, 2013 1 次提交
    • T
      cgroup: disallow rename(2) if sane_behavior · 6db8e85c
      Tejun Heo 提交于
      cgroup's rename(2) isn't a proper migration implementation - it can't
      move the cgroup to a different parent in the hierarchy.  All it can do
      is swapping the name string for that cgroup.  This isn't useful and
      can mislead users to think that cgroup supports proper cgroup-level
      migration.  Disallow rename(2) if sane_behavior.
      
      v2: Fail with -EPERM instead of -EINVAL so that it matches the vfs
          return value when ->rename is not implemented as suggested by Li.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      6db8e85c
  10. 14 6月, 2013 6 次提交
    • T
      cgroup: update sane_behavior documentation · f63674fd
      Tejun Heo 提交于
      f12dc020 ("cgroup: mark "tasks" cgroup file as insane") and
      cc5943a7 ("cgroup: mark "notify_on_release" and "release_agent"
      cgroup files insane") forgot to update the changed behavior
      documentation in cgroup.h.  Update it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      f63674fd
    • T
      cgroup: use percpu refcnt for cgroup_subsys_states · d3daf28d
      Tejun Heo 提交于
      A css (cgroup_subsys_state) is how each cgroup is represented to a
      controller.  As such, it can be used in hot paths across the various
      subsystems different controllers are associated with.
      
      One of the common operations is reference counting, which up until now
      has been implemented using a global atomic counter and can have
      significant adverse impact on scalability.  For example, css refcnt
      can be gotten and put multiple times by blkcg for each IO request.
      For highops configurations which try to do as much per-cpu as
      possible, the global frequent refcnting can be very expensive.
      
      In general, given the various and hugely diverse paths css's end up
      being used from, we need to make it cheap and highly scalable.  In its
      usage, css refcnting isn't very different from module refcnting.
      
      This patch converts css refcnting to use the recently added
      percpu_ref.  css_get/tryget/put() directly maps to the matching
      percpu_ref operations and the deactivation logic is no longer
      necessary as percpu_ref already has refcnt killing.
      
      The only complication is that as the refcnt is per-cpu,
      percpu_ref_kill() in itself doesn't ensure that further tryget
      operations will fail, which we need to guarantee before invoking
      ->css_offline()'s.  This is resolved collecting kill confirmation
      using percpu_ref_kill_and_confirm() and initiating the offline phase
      of destruction after all css refcnt's are confirmed to be seen as
      killed on all CPUs.  The previous patches already splitted destruction
      into two phases, so percpu_ref_kill_and_confirm() can be hooked up
      easily.
      
      This patch removes css_refcnt() which is used for rcu dereference
      sanity check in css_id().  While we can add a percpu refcnt API to ask
      the same question, css_id() itself is scheduled to be removed fairly
      soon, so let's not bother with it.  Just drop the sanity check and use
      rcu_dereference_raw() instead.
      
      v2: - init_cgroup_css() was calling percpu_ref_init() without checking
            the return value.  This causes two problems - the obvious lack
            of error handling and percpu_ref_init() being called from
            cgroup_init_subsys() before the allocators are up, which
            triggers warnings but doesn't cause actual problems as the
            refcnt isn't used for roots anyway.  Fix both by moving
            percpu_ref_init() to cgroup_create().
      
          - The base references were put too early by
            percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
            refs one extra time.  This wasn't noticeable because css's go
            through another RCU grace period before being freed.  Update
            cgroup_destroy_locked() to grab an extra reference before
            killing the refcnts.  This problem was noticed by Kent.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NKent Overstreet <koverstreet@google.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: "Alasdair G. Kergon" <agk@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Glauber Costa <glommer@gmail.com>
      d3daf28d
    • T
      cgroup: split cgroup destruction into two steps · ea15f8cc
      Tejun Heo 提交于
      Split cgroup_destroy_locked() into two steps and put the latter half
      into cgroup_offline_fn() which is executed from a work item.  The
      latter half is responsible for offlining the css's, removing the
      cgroup from internal lists, and propagating release notification to
      the parent.  The separation is to allow using percpu refcnt for css.
      
      Note that this allows for other cgroup operations to happen between
      the first and second halves of destruction, including creating a new
      cgroup with the same name.  As the target cgroup is marked DEAD in the
      first half and cgroup internals don't care about the names of cgroups,
      this should be fine.  A comment explaining this will be added by the
      next patch which implements the actual percpu refcnting.
      
      As RCU freeing is guaranteed to happen after the second step of
      destruction, we can use the same work item for both.  This patch
      renames cgroup->free_work to ->destroy_work and uses it for both
      purposes.  INIT_WORK() is now performed right before queueing the work
      item.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      ea15f8cc
    • T
      cgroup: remove cgroup->count and use · 6f3d828f
      Tejun Heo 提交于
      cgroup->count tracks the number of css_sets associated with the cgroup
      and used only to verify that no css_set is associated when the cgroup
      is being destroyed.  It's superflous as the destruction path can
      simply check whether cgroup->cset_links is empty instead.
      
      Drop cgroup->count and check ->cset_links directly from
      cgroup_destroy_locked().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      6f3d828f
    • T
      cgroup: rename CGRP_REMOVED to CGRP_DEAD · 54766d4a
      Tejun Heo 提交于
      We will add another flag indicating that the cgroup is in the process
      of being killed.  REMOVING / REMOVED is more difficult to distinguish
      and cgroup_is_removing()/cgroup_is_removed() are a bit awkward.  Also,
      later percpu_ref usage will involve "kill"ing the refcnt.
      
       s/CGRP_REMOVED/CGRP_DEAD/
       s/cgroup_is_removed()/cgroup_is_dead()
      
      This patch is purely cosmetic.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      54766d4a
    • T
      cgroup: clean up css_[try]get() and css_put() · 5de0107e
      Tejun Heo 提交于
      * __css_get() isn't used by anyone.  Fold it into css_get().
      
      * Add proper function comments to all css reference functions.
      
      This patch is purely cosmetic.
      
      v2: Typo fix as per Li.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      5de0107e