1. 09 Jul, 2014 5 commits
    • cgroup: remove CGRP_ROOT_OPTION_MASK · 7450e90b
      Committed by Tejun Heo
      cgroup_root->flags only contains CGRP_ROOT_* flags and there's no
      reason to mask the flags.  Remove CGRP_ROOT_OPTION_MASK.
      
      This doesn't cause any behavior differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: implement cgroup_subsys->depends_on · af0ba678
      Committed by Tejun Heo
      Currently, the blkio subsystem attributes all of writeback IOs to the
      root.  One of the issues is that there's no way to tell who originated
      a writeback IO from block layer.  Those IOs are usually issued
      asynchronously from a task which didn't have anything to do with
      actually generating the dirty pages.  The memory subsystem, when
      enabled, already keeps track of the ownership of each dirty page and
      it's desirable for blkio to piggyback instead of adding its own
      per-page tag.
      
      blkio piggybacking on memory is an implementation detail which
      preferably should be handled automatically without requiring explicit
      userland action.  To achieve that, this patch implements
      cgroup_subsys->depends_on which contains the mask of subsystems which
      should be enabled together when the subsystem is enabled.
      
      The previous patches already implemented the support for enabled but
      invisible subsystems and cgroup_subsys->depends_on can be easily
      implemented by updating cgroup_refresh_child_subsys_mask() so that it
      calculates cgroup->child_subsys_mask considering
      cgroup_subsys->depends_on of the explicitly enabled subsystems.
      
      Documentation/cgroups/unified-hierarchy.txt is updated to explain that
      subsystems may not become immediately available after being unused
      from userland and that dependency could be a factor in it.  As
      subsystems may already keep residual references, this doesn't
      significantly change how subsystem rebinding can be used.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
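The mask calculation described in the commit above can be modelled in userspace. The following is a minimal sketch, not the kernel code: the subsystem IDs, the `depends_on` table and `refresh_child_subsys_mask()` are hypothetical stand-ins, and it assumes dependencies are followed repeatedly until the mask stops growing.

```c
#include <stddef.h>

/* Hypothetical subsystem IDs, for illustration only. */
enum { SS_MEMORY, SS_BLKIO, SS_CPU, SS_COUNT };

/* depends_on[i] is the mask of subsystems that must be enabled with i. */
static const unsigned int depends_on[SS_COUNT] = {
    [SS_MEMORY] = 0,
    [SS_BLKIO]  = 1u << SS_MEMORY,   /* blkio piggybacks on memory */
    [SS_CPU]    = 0,
};

/*
 * Expand the explicitly requested mask (subtree_control) with the
 * dependencies of every enabled subsystem, repeating until the mask
 * no longer changes.  The result models child_subsys_mask.
 */
unsigned int refresh_child_subsys_mask(unsigned int subtree_control)
{
    unsigned int cur = subtree_control;

    for (;;) {
        unsigned int next = cur;

        for (size_t i = 0; i < SS_COUNT; i++)
            if (cur & (1u << i))
                next |= depends_on[i];
        if (next == cur)
            return cur;
        cur = next;
    }
}
```

In this model, requesting only blkio yields a mask that also contains memory, while the requested mask itself stays untouched, so the caller can still tell explicit from implicit subsystems.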
    • cgroup: implement cgroup_subsys->css_reset() · b4536f0c
      Committed by Tejun Heo
      cgroup is implementing support for subsystem dependency which would
      require a way to enable a subsystem even when it's not directly
      configured through "cgroup.subtree_control".
      
      The previous patches added support for explicitly and implicitly
      enabled subsystems and showing/hiding their interface files.  An
      explicitly enabled subsystem may become implicitly enabled if it's
      turned off through "cgroup.subtree_control" but there are subsystems
      depending on it.  In such cases, the subsystem, as it's turned off
      when seen from userland, shouldn't enforce any resource control.
      Also, the subsystem may be explicitly turned on later again and its
      interface files should be as close to the initial state as possible.
      
      This patch adds cgroup_subsys->css_reset() which is invoked when a css
      is hidden.  The callback should disable resource control and reset the
      state to the vanilla state.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    • cgroup: make interface files visible iff enabled on cgroup->subtree_control · f63070d3
      Committed by Tejun Heo
      cgroup is implementing support for subsystem dependency which would
      require a way to enable a subsystem even when it's not directly
      configured through "cgroup.subtree_control".
      
      The preceding patch distinguished cgroup->subtree_control and
      ->child_subsys_mask, where the former is the set of subsystems
      explicitly configured by userland and the latter is the set of all
      enabled subsystems, which currently equals the former but will
      include subsystems implicitly enabled through dependency.
      
      Subsystems which are enabled due to dependency shouldn't be visible to
      userland.  This patch updates cgroup_subtree_control_write() and
      create_css() such that interface files are not created for implicitly
      enabled subsystems.
      
      * A @visible parameter is added to create_css().  Interface files are
        created only when it is true.
      
      * If an already implicitly enabled subsystem is turned on through
        "cgroup.subtree_control", the existing css should be used.  css
        draining is skipped.
      
      * cgroup_subtree_control_write() computes the new target
        cgroup->child_subsys_mask and create/kill or show/hide csses
        accordingly.
      
      As the two subsystem masks are still kept identical, this patch
      doesn't introduce any behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
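How the two masks determine what userland sees for a given subsystem can be sketched as below. This is an illustrative model only; the enum and the function `css_visibility_for()` are hypothetical names and do not appear in the patch.

```c
/* Hypothetical classification of one subsystem on one cgroup. */
enum css_visibility { SKETCH_DISABLED, SKETCH_HIDDEN, SKETCH_VISIBLE };

enum css_visibility css_visibility_for(unsigned int subtree_control,
                                       unsigned int child_subsys_mask,
                                       unsigned int ssid)
{
    unsigned int bit = 1u << ssid;

    if (!(child_subsys_mask & bit))
        return SKETCH_DISABLED;   /* not enabled at all: no css */
    if (subtree_control & bit)
        return SKETCH_VISIBLE;    /* explicitly enabled: files are created */
    return SKETCH_HIDDEN;         /* implicitly enabled: css without files */
}
```

Moving a subsystem from hidden to visible corresponds to the case where the existing css is reused and only its interface files are shown.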
    • cgroup: introduce cgroup->subtree_control · 667c2491
      Committed by Tejun Heo
      cgroup is implementing support for subsystem dependency which would
      require a way to enable a subsystem even when it's not directly
      configured through "cgroup.subtree_control".
      
      Previously, cgroup->child_subsys_mask directly reflected
      "cgroup.subtree_control" and the enabled subsystems in the child
      cgroups.  This patch adds cgroup->subtree_control which
      "cgroup.subtree_control" operates on.  cgroup->child_subsys_mask is
      now calculated from cgroup->subtree_control by
      cgroup_refresh_child_subsys_mask(), which sets it identical to
      cgroup->subtree_control for now.
      
      This will allow using cgroup->child_subsys_mask for all the enabled
      subsystems including the implicit ones and ->subtree_control for
      tracking the explicitly requested ones.  This patch keeps the two
      masks identical and doesn't introduce any behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
  2. 20 May, 2014 1 commit
    • cgroup: disallow debug controller on the default hierarchy · 5533e011
      Committed by Tejun Heo
      The debug controller, as its name suggests, exposes cgroup core
      internals to userland to aid debugging.  Unfortunately, except for the
      name, there's no provision to prevent its usage in production
      configurations and the controller is widely enabled and mounted
      leaking internal details to userland.  Like most other debug
      information, the information exposed by debug isn't interesting even
      for debugging itself once the related parts are working reliably.
      
      This controller has no reason for existing.  This patch implements
      cgrp_dfl_root_inhibit_ss_mask which can suppress specific subsystems
      on the default hierarchy and adds the debug subsystem to it so that it
      can be gradually deprecated as usages move towards the unified
      hierarchy.
      Signed-off-by: Tejun Heo <tj@kernel.org>
  3. 17 May, 2014 10 commits
    • cgroup: implement css_tryget() · 6f4524d3
      Committed by Tejun Heo
      Implement css_tryget() which tries to grab a cgroup_subsys_state's
      reference as long as it hasn't already reached zero.  Combined with
      the recent css iterator changes to include offline && !released csses
      during traversal, this can be used to access csses regardless of their
      online state.
      
      v2: Take the new flag CSS_NO_REF into account.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
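The "tryget unless already zero" semantics can be illustrated with a plain atomic counter. This is a sketch under stated assumptions: the kernel uses percpu_ref rather than a single atomic, and `struct css_sketch`, `SKETCH_NO_REF` and `css_tryget_sketch()` are hypothetical names standing in for the real structures and the CSS_NO_REF flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_NO_REF (1u << 0)   /* stand-in for CSS_NO_REF */

struct css_sketch {
    unsigned int flags;
    atomic_uint refcnt;
};

/* Take a reference unless the count has already dropped to zero. */
bool css_tryget_sketch(struct css_sketch *css)
{
    unsigned int old;

    if (css->flags & SKETCH_NO_REF)
        return true;              /* refcounting is inhibited on this css */

    old = atomic_load(&css->refcnt);
    while (old != 0) {
        /* On failure, old is updated to the current counter value. */
        if (atomic_compare_exchange_weak(&css->refcnt, &old, old + 1))
            return true;
    }
    return false;                 /* already at zero: must not be revived */
}
```

The compare-and-exchange loop is what distinguishes this from an unconditional get: a counter that has reached zero is never incremented again.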
    • cgroup: convert cgroup_has_live_children() into css_has_online_children() · f3d46500
      Committed by Tejun Heo
      Now that cgroup liveliness and css onliness are the same state,
      convert cgroup_has_live_children() into css_has_online_children() so
      that it can be used for actual csses too.  The function now uses
      css_for_each_child() for iteration and is published.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: use CSS_ONLINE instead of CGRP_DEAD · 184faf32
      Committed by Tejun Heo
      Use CSS_ONLINE on the self css to indicate whether a cgroup has been
      killed instead of CGRP_DEAD.  This will allow re-using css online test
      for cgroup liveliness test.  This doesn't introduce any functional
      change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: iterate cgroup_subsys_states directly · c2931b70
      Committed by Tejun Heo
      Currently, css_next_child() is implemented as finding the next child
      cgroup which has the css enabled, which used to be the only way to do
      it as only cgroups participated in sibling lists and thus could be
      iterated.  This works as long as what's required during iteration is
      not missing online csses; however, it turns out that there are use
      cases where offlined but not yet released csses need to be iterated.
      This is difficult to implement through cgroup iteration on the unified
      hierarchy as there may be multiple dying csses for the same subsystem
      associated with a single cgroup.
      
      After the recent changes, the cgroup self and regular csses behave
      identically in how they're linked and unlinked from the sibling lists
      including assertion of CSS_RELEASED and css_next_child() can simply
      switch to iterating csses directly.  This both simplifies the logic
      and ensures that all visible non-released csses are included in the
      iteration whether there are multiple dying csses for a subsystem or
      not.
      
      As all other iterators depend on css_next_child() for sibling
      iteration, this changes behaviors of all css iterators.  Add and
      update explanations on the css states which are included in traversal
      to all iterators.
      
      As css iteration could always contain offlined csses, this shouldn't
      break any of the current users and new usages which need iteration of
      all on and offline csses can make use of the new semantics.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
    • cgroup: introduce CSS_RELEASED and reduce css iteration fallback window · de3f0341
      Committed by Tejun Heo
      css iterations allow the caller to drop RCU read lock.  As long as the
      caller keeps the current position accessible, it can simply re-grab
      RCU read lock later and continue iteration.  This is achieved by using
      CGRP_DEAD to detect whether the current position's next pointer is safe
      to dereference and, if not, re-iterating from the beginning to the next
      position using ->serial_nr.
      
      CGRP_DEAD is used as the marker to invalidate the next pointer and the
      only requirement is that the marker is set before the next sibling
      starts its RCU grace period.  Because CGRP_DEAD is set at the end of
      cgroup_destroy_locked() but the cgroup is unlinked when the reference
      count reaches zero, we currently have a rather large window where this
      fallback re-iteration logic can be triggered.
      
      This patch introduces CSS_RELEASED which is set when a css is unlinked
      from its sibling list.  This still keeps the re-iteration logic
      working while drastically reducing the window of its activation.
      While at it, rewrite the comment in css_next_child() to reflect the
      new flag and better explain the synchronization.
      
      This will also enable iterating csses directly instead of through
      cgroups.
      
      v2: CSS_RELEASED now assigned to 1 << 2 as 1 << 0 is used by
          CSS_NO_REF.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
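The fallback re-iteration can be sketched on a simple singly linked sibling list. This is a simplified model, not the kernel implementation: it ignores RCU entirely, `struct sibling`, `SKETCH_RELEASED` and `next_child_sketch()` are hypothetical names, and it assumes siblings are kept in ascending serial number order.

```c
#include <stddef.h>

#define SKETCH_RELEASED (1u << 2)   /* stand-in for CSS_RELEASED */

struct sibling {
    unsigned int flags;
    unsigned long long serial_nr;   /* monotonically increasing */
    struct sibling *next;           /* next sibling, NULL at the end */
};

/*
 * Return the sibling after pos, or the first one when pos is NULL.
 * If pos was unlinked in the meantime, its next pointer is not
 * trusted; rescan from the head using the serial numbers instead.
 */
struct sibling *next_child_sketch(struct sibling *pos, struct sibling *first)
{
    struct sibling *n;

    if (!pos)
        return first;
    if (!(pos->flags & SKETCH_RELEASED))
        return pos->next;           /* still linked: next pointer is safe */

    for (n = first; n; n = n->next)
        if (n->serial_nr > pos->serial_nr)
            break;                  /* first sibling created after pos */
    return n;
}
```

Because the marker is set at unlink time in this model, the slower rescan path runs only for positions that really left the list.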
    • cgroup: move cgroup->serial_nr into cgroup_subsys_state · 0cb51d71
      Committed by Tejun Heo
      We're moving towards using cgroup_subsys_states as the fundamental
      structural blocks.  All csses including the cgroup->self and actual
      ones now form trees through css->children and ->sibling which follow
      the same rules as what cgroup->children and ->sibling followed.  This
      patch moves cgroup->serial_nr which is used to implement css iteration
      into css.
      
      Note that all csses, regardless of their types, allocate their serial
      numbers from the same monotonically increasing counter.  This doesn't
      affect the ordering needed by css iteration or cause any other
      material behavior changes.  This will be used to update css iteration.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: move cgroup->sibling and ->children into cgroup_subsys_state · d5c419b6
      Committed by Tejun Heo
      We're moving towards using cgroup_subsys_states as the fundamental
      structural blocks.  Let's move cgroup->sibling and ->children into
      cgroup_subsys_state.  This is a pure move without functional change and
      only cgroup->self's fields are actually used.  Other csses will make
      use of the fields later.
      
      While at it, update init_and_link_css() so that it zeroes the whole
      css before initializing it and remove explicit zeroing of ->flags.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: remove cgroup->parent · d51f39b0
      Committed by Tejun Heo
      cgroup->parent is redundant as cgroup->self.parent can also be used to
      determine the parent cgroup and we're moving towards using
      cgroup_subsys_states as the fundamental structural blocks.  This patch
      introduces cgroup_parent() which follows cgroup->self.parent and
      removes cgroup->parent.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
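Deriving the parent cgroup from the embedded self css can be sketched with a container_of-style macro. The structures below are heavily reduced, hypothetical stand-ins for `struct cgroup` and `struct cgroup_subsys_state`, and the macro is a userspace approximation of the kernel's `container_of()`.

```c
#include <stddef.h>

struct state_sketch {
    struct state_sketch *parent;    /* parent css, NULL for the root */
};

struct cgroup_sketch {
    struct state_sketch self;       /* embedded self css */
    int id;
};

/* Userspace approximation of the kernel's container_of(). */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Follow self.parent and map it back to the containing cgroup. */
struct cgroup_sketch *cgroup_parent_sketch(struct cgroup_sketch *cgrp)
{
    struct state_sketch *parent_css = cgrp->self.parent;

    if (parent_css)
        return container_of_sketch(parent_css, struct cgroup_sketch, self);
    return NULL;
}
```

Since the parent link lives in the css, a separate parent pointer in the cgroup would only duplicate information that can be recovered this way.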
    • cgroup: remove css_parent() · 5c9d535b
      Committed by Tejun Heo
      cgroup in general is moving towards using cgroup_subsys_state as the
      fundamental structural component and css_parent() was introduced to
      convert from using cgroup->parent to css->parent.  It was quite some
      time ago and we're moving forward with making css more prominent.
      
      This patch drops the trivial wrapper css_parent() and lets the users
      dereference css->parent.  While at it, explicitly mark fields of css
      which are public and immutable.
      
      v2: New usage from device_cgroup.c converted.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: "David S. Miller" <davem@davemloft.net>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
    • cgroup: skip refcnting on normal root csses and cgrp_dfl_root self css · 3b514d24
      Committed by Tejun Heo
      9395a450 ("cgroup: enable refcnting for root csses") enabled
      reference counting for root csses (cgroup_subsys_states) so that
      cgroup's self csses can be used to manage the lifetime of the
      containing cgroups.
      
      Unfortunately, this change was incorrect.  During early init,
      cgrp_dfl_root self css refcnt is used.  percpu_ref can't be initialized
      during early init and its initialization is deferred till
      cgroup_init() time.  This means that cpu was using percpu_ref which
      wasn't properly initialized.  Due to the way percpu variables are laid
      out on x86, this didn't blow up immediately on x86 but ended up
      incrementing and decrementing the percpu variable at offset zero,
      whatever it may be; however, on other archs, this caused a fault and
      early boot failure.
      
      As cgroup self csses for root cgroups of non-dfl hierarchies need
      working refcounting, we can't revert 9395a450.  This patch adds
      CSS_NO_REF which explicitly inhibits reference counting on the css and
      sets it on all normal (non-self) csses and cgroup_dfl_root self css.
      
      v2: cgrp_dfl_root.self is the offending one.  Set the flag on it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Stephen Warren <swarren@nvidia.com>
      Tested-by: Stephen Warren <swarren@nvidia.com>
      Fixes: 9395a450 ("cgroup: enable refcnting for root csses")
  4. 14 May, 2014 9 commits
    • cgroup: use cgroup->self.refcnt for cgroup refcnting · 9d755d33
      Committed by Tejun Heo
      Currently cgroup implements refcnting separately using atomic_t
      cgroup->refcnt.  The destruction paths of cgroup and css are rather
      complex and bear a lot of similarities including the use of RCU and
      bouncing to a work item.
      
      This patch makes cgroup use the refcnt of self css for refcnting
      instead of using its own.  This makes cgroup refcnting use css's
      percpu refcnt and share the destruction mechanism.
      
      * css_release_work_fn() and css_free_work_fn() are updated to handle
        both csses and cgroups.  This is a bit messy but should do until we
        can make cgroup->self a full css, which currently can't be done
        thanks to multiple hierarchies.
      
      * cgroup_destroy_locked() now performs
        percpu_ref_kill(&cgrp->self.refcnt) instead of cgroup_put(cgrp).
      
      * Negative refcnt sanity check in cgroup_get() is no longer necessary
        as percpu_ref already handles it.
      
      * Similarly, as a cgroup which hasn't been killed will never be
        released regardless of its refcnt value and percpu_ref has sanity
        check on kill, cgroup_is_dead() sanity check in cgroup_put() is no
        longer necessary.
      
      * As whether a refcnt reached zero or not can only be decided after
        the reference count is killed, cgroup_root->cgrp's refcnting can no
        longer be used to decide whether to kill the root or not.  Let's
        make cgroup_kill_sb() explicitly initiate destruction if the root
        doesn't have any children.  This makes sense anyway as unmounted
        cgroup hierarchy without any children should be destroyed.
      
      While this is a bit messy, this will allow pushing more bookkeeping
      towards cgroup->self and thus handling cgroups and csses in a more
      uniform way.  In the very long term, it should be possible to
      introduce a base subsystem and convert the self css to a proper one,
      making things a whole lot simpler and unified.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: enable refcnting for root csses · 9395a450
      Committed by Tejun Heo
      Currently, css_get(), css_tryget() and css_tryget_online() are noops
      for root csses as an optimization; however, we're planning to use css
      refcnts to keep track of cgroup lifetime too and root cgroups also need to
      be reference counted.  Since css has been converted to percpu_refcnt,
      the overhead of refcnting is minuscule and this optimization isn't too
      meaningful anymore.  Furthermore, controllers which optimize the root
      cgroup often never even invoke these functions in their hot paths.
      
      This patch enables refcnting for root csses too.  This makes CSS_ROOT
      flag unused and removes it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: remove cgroup_destory_css_killed() · 249f3468
      Committed by Tejun Heo
      cgroup_destroy_css_killed() is the cgroup destruction stage which happens
      after all csses are offlined.  After the recent updates, it no longer
      does anything other than putting the base reference.  This patch
      removes the function and makes cgroup_destroy_locked() put the base
      ref at the end instead.
      
      This also makes cgroup->nr_css unnecessary.  Removed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: rename cgroup->dummy_css to ->self and move it to the top · 9d800df1
      Committed by Tejun Heo
      cgroup->dummy_css is used as the placeholder css when performing
      css-oriented operations on the cgroup.  We're going to shift more cgroup
      management to this css.  Let's rename it to ->self and move it to the
      top.
      
      This is a pure rename and field relocation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: remove cgroup->control_kn · b7fc5ad2
      Committed by Tejun Heo
      Now that cgroup_subtree_control_write() has access to the associated
      kernfs_open_file and thus the kernfs_node, there's no need to cache it
      in cgroup->control_kn on creation.  Remove cgroup->control_kn and use
      @of->kn directly.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: replace cftype->trigger() with cftype->write() · 6770c64e
      Committed by Tejun Heo
      cftype->trigger() is pointless.  It's trivial to ignore the input
      buffer from a regular ->write() operation.  Convert all ->trigger()
      users to ->write() and remove ->trigger().
      
      This patch doesn't introduce any visible behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
    • cgroup: replace cftype->write_string() with cftype->write() · 451af504
      Committed by Tejun Heo
      Convert all cftype->write_string() users to the new cftype->write()
      which maps directly to kernfs write operation and has full access to
      kernfs and cgroup contexts.  The conversions are mostly mechanical.
      
      * @css and @cft are accessed using of_css() and of_cft() accessors
        respectively instead of being specified as arguments.
      
      * Should return @nbytes on success instead of 0.
      
      * @buf is not trimmed automatically.  Trim if necessary.  Note that
        blkcg and netprio don't need this as the parsers already handle
        whitespaces.
      
      cftype->write_string() has no users left after the conversions and is
      removed.
      
      While at it, remove unnecessary local variable @p in
      cgroup_subtree_control_write() and stale comment about
      CGROUP_LOCAL_BUFFER_SIZE in cgroup_freezer.c.
      
      This patch doesn't introduce any visible behavior changes.
      
      v2: netprio was missing from conversion.  Converted.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Aristeu Rozanski <arozansk@redhat.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: "David S. Miller" <davem@davemloft.net>
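The conversion rules listed in the commit above (trim the buffer yourself, return the byte count on success) can be sketched in userspace. This is only an illustration: a real cftype->write() handler also receives a kernfs_open_file and an offset, which are omitted here, and `demo_write()`, `strip()` and `stored_value` are hypothetical names. The buffer is assumed to be NUL-terminated.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

long stored_value;   /* hypothetical state updated by the handler */

/* Userspace stand-in for trimming leading and trailing whitespace. */
static char *strip(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Parse a decimal value; return nbytes on success, -EINVAL on error. */
ssize_t demo_write(char *buf, size_t nbytes)
{
    char *end;
    long val;

    buf = strip(buf);            /* the buffer is not trimmed for us */
    errno = 0;
    val = strtol(buf, &end, 10);
    if (errno || end == buf || *end != '\0')
        return -EINVAL;

    stored_value = val;
    return (ssize_t)nbytes;      /* success reports the bytes consumed, not 0 */
}
```

Returning the full `nbytes` rather than the trimmed length tells the caller that the whole write was consumed, including any trailing newline.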
    • cgroup: implement cftype->write() · b4168640
      Committed by Tejun Heo
      During the recent conversion to kernfs, cftype's seq_file operations
      are updated so that they are directly mapped to kernfs operations and
      thus can fully access the associated kernfs and cgroup contexts;
      however, write path hasn't seen similar updates and none of the
      existing write operations has access to, for example, the associated
      kernfs_open_file.
      
      Let's introduce a new operation cftype->write() which maps directly to
      the kernfs write operation and has access to all the arguments and
      contexts.  This will replace ->write_string() and ->trigger() and ease
      manipulation of kernfs active protection from cgroup file operations.
      
      Two accessors - of_cft() and of_css() - are introduced to enable
      accessing the associated cgroup context from cftype->write() which
      only takes kernfs_open_file for the context information.  The
      accessors for seq_file operations - seq_cft() and seq_css() - are
      rewritten to wrap the of_ accessors.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: rename css_tryget*() to css_tryget_online*() · ec903c0c
      Committed by Tejun Heo
      Unlike the more usual refcnting, what css_tryget() provides is the
      distinction between online and offline csses instead of protection
      against upping a refcnt which already reached zero.  cgroup is
      planning to provide actual tryget which fails if the refcnt already
      reached zero.  Let's rename the existing trygets so that they clearly
      indicate that they're about onliness.
      
      I thought about keeping the existing names as-are and introducing new
      names for the planned actual tryget; however, given that each
      controller participates in the synchronization of the online state, it
      seems worthwhile to make it explicit that these functions are about
      on/offline state.
      
      Rename css_tryget() to css_tryget_online() and css_tryget_from_dir()
      to css_tryget_online_from_dir().  This is pure rename.
      
      v2: cgroup_freezer grew new usages of css_tryget().  Update
          accordingly.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
  5. 13 May, 2014 1 commit
    • cgroup: introduce task_css_is_root() · 5024ae29
      Committed by Tejun Heo
      Determining the css of a task usually requires RCU read lock as that's
      the only thing which keeps the returned css accessible till its
      reference is acquired; however, testing whether a task belongs to the
      root can be performed without dereferencing the returned css by
      comparing the returned pointer against the root one in init_css_set[]
      which never changes.
      
      Implement task_css_is_root() which can be invoked in any context.
      This will be used by the scheduled cgroup_freezer change.
      
      v2: cgroup no longer supports modular controllers.  No need to export
          init_css_set.  Pointed out by Li.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
  6. 10 May, 2014 1 commit
  7. 07 May, 2014 1 commit
  8. 05 May, 2014 3 commits
    • cgroup, memcg: implement css->id and convert css_from_id() to use it · 15a4c835
      Committed by Tejun Heo
      Until now, cgroup->id has been used to identify all the associated
      csses and css_from_id() takes cgroup ID and returns the matching css
      by looking up the cgroup and then dereferencing the css associated
      with it; however, now that the lifetimes of cgroup and css are
      separate, this is incorrect and breaks on the unified hierarchy when a
      controller is disabled and enabled back again before the previous
      instance is released.
      
      This patch adds css->id which is a subsystem-unique ID and converts
      css_from_id() to look up by the new css->id instead.  memcg is the
      only user of css_from_id() and also converted to use css->id instead.
      
      For traditional hierarchies, this shouldn't make any functional
      difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Jianyu Zhan <nasa4836@gmail.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup, memcg: allocate cgroup ID from 1 · 7d699ddb
      Committed by Tejun Heo
      Currently, cgroup->id is allocated from 0, which is always assigned to
      the root cgroup; unfortunately, memcg wants to use ID 0 to indicate
      invalid IDs and ends up incrementing all IDs by one.
      
      It's reasonable to reserve 0 for special purposes.  This patch updates
      cgroup core so that ID 0 is not used and the root cgroups get ID 1.
      The ID incrementing is removed from memcg.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: make flags and subsys_masks unsigned int · 69dfa00c
      Committed by Tejun Heo
      There's no reason to use atomic bitops for cgroup_subsys_state->flags,
      cgroup_root->flags and various subsys_masks.  This patch updates those
      to use bitwise and/or operations instead and converts them from
      unsigned long to unsigned int.
      
      This makes the fields occupy (marginally) smaller space and makes it
      clear that they don't require atomicity.
      
      This patch doesn't cause any behavior difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
  9. 26 Apr, 2014 1 commit
    • cgroup: implement cgroup.populated for the default hierarchy · 842b597e
      Committed by Tejun Heo
      cgroup users often need a way to determine when a cgroup's
      subhierarchy becomes empty so that it can be cleaned up.  cgroup
      currently provides release_agent for it; unfortunately, this mechanism
      is riddled with issues.
      
      * It delivers events by forking and execing a userland binary
        specified as the release_agent.  This is a long deprecated method of
        notification delivery.  It's extremely heavy, slow and cumbersome to
        integrate with larger infrastructure.
      
      * There is a single monitoring point at the root.  There's no way to
        delegate management of a subtree.
      
      * The event isn't recursive.  It triggers when a cgroup doesn't have
        any tasks or child cgroups.  Events for internal nodes trigger only
        after all children are removed.  This again makes it impossible to
        delegate management of a subtree.
      
      * Events are filtered from the kernel side.  "notify_on_release" file
        is used to subscribe to or suppress release event.  This is
        unnecessarily complicated and probably done this way because event
        delivery itself was expensive.
      
      This patch implements interface file "cgroup.populated" which can be
      used to monitor whether the cgroup's subhierarchy has tasks in it or
      not.  Its value is 0 if there is no task in the cgroup and its
      descendants; otherwise, 1, and kernfs_notify() notificaiton is
      triggers when the value changes, which can be monitored through poll
      and [di]notify.
      
      This is a lot lighter and simpler and trivially allows delegating
      management of a subhierarchy - subhierarchy monitoring can block
      further propagation simply by putting itself or another process in
      the root of the subhierarchy and monitoring events that it's
      interested in from there without interfering with monitoring higher
      in the tree.
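
      The populated semantics described above can be sketched as a small
      userland model.  This is only an illustration under stated
      assumptions, not the kernel implementation: the Cgroup class, its
      populated_cnt counter and the events list (standing in for
      kernfs_notify() wakeups) are hypothetical names introduced here.

```python
class Cgroup:
    """Toy stand-in for a cgroup node; not the kernel's struct cgroup."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.nr_tasks = 0        # tasks directly in this cgroup
        self.populated_cnt = 0   # own-tasks flag + number of populated children
        self.events = []         # values a watcher of "cgroup.populated" would see

    @property
    def populated(self):
        """Value that reading "cgroup.populated" would return in this model."""
        return 1 if self.populated_cnt > 0 else 0

    def _propagate(self, delta):
        # Walk towards the root, stopping as soon as a node's value is unchanged.
        node = self
        while node is not None:
            before = node.populated
            node.populated_cnt += delta
            if node.populated == before:
                break
            node.events.append(node.populated)  # stands in for kernfs_notify()
            node = node.parent

    def add_task(self):
        self.nr_tasks += 1
        if self.nr_tasks == 1:
            self._propagate(+1)

    def remove_task(self):
        if self.nr_tasks == 0:
            raise ValueError("no task to remove")
        self.nr_tasks -= 1
        if self.nr_tasks == 0:
            self._propagate(-1)
```

      In this model a notification is recorded only on the cgroups whose
      value actually flips, so a watcher placed at the root of a delegated
      subhierarchy sees its own transitions regardless of what is being
      monitored higher in the tree.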
      
      v2: Patch description updated as per Serge.
      
      v3: "cgroup.subtree_populated" renamed to "cgroup.populated".  The
          subtree_ prefix was a bit confusing because
          "cgroup.subtree_control" uses it to denote the tree rooted at the
          cgroup sans the cgroup itself while the populated state includes
          the cgroup itself.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Lennart Poettering <lennart@poettering.net>
      842b597e
  10. 23 Apr, 2014 6 commits
    • T
      cgroup: implement dynamic subtree controller enable/disable on the default hierarchy · f8f22e53
      Tejun Heo authored
      cgroup is switching away from multiple hierarchies and will use one
      unified default hierarchy where controllers can be dynamically enabled
      and disabled per subtree.  The default hierarchy will serve as the
      unified hierarchy to which all controllers are attached and a css on
      the default hierarchy would need to also serve the tasks of descendant
      cgroups which don't have the controller enabled - i.e. the tree may be
      collapsed from leaf towards root when viewed from specific
      controllers.  This has been implemented through effective css in the
      previous patches.
      
      This patch finally implements dynamic subtree controller
      enable/disable on the default hierarchy via a new knob -
      "cgroup.subtree_control" which controls which controllers are enabled
      on the child cgroups.  Let's assume a hierarchy like the following.
      
        root - A - B - C
                     \ D
      
      root's "cgroup.subtree_control" determines which controllers are
      enabled on A.  A's on B.  B's on C and D.  This coincides with the
      fact that controllers on the immediate sub-level are used to
      distribute the resources of the parent.  In fact, it's natural to
      assume that resource control knobs of a child belong to its parent.
      Enabling a controller in "cgroup.subtree_control" declares that
      distribution of the respective resources of the cgroup will be
      controlled.  Note that this means that controller enable states are
      shared among siblings.
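
      As a rough illustration of the rule above, the following sketch
      computes which controllers are enabled on a cgroup from its parent's
      "cgroup.subtree_control".  The PARENT table mirrors the example
      hierarchy; enabled_on() and the example contents are hypothetical and
      only model the description, not kernel code.

```python
# Parent links for the hierarchy root - A - B - {C, D} from the description.
PARENT = {"root": None, "A": "root", "B": "A", "C": "B", "D": "B"}


def enabled_on(cgroup, subtree_control, parent=PARENT):
    """Controllers enabled on `cgroup`: whatever its parent's subtree_control lists."""
    p = parent[cgroup]
    if p is None:
        raise ValueError("the root has no parent; it is not covered by this rule")
    return set(subtree_control.get(p, ()))


# Hypothetical contents of each cgroup's "cgroup.subtree_control".
example = {"root": {"memory", "blkio"}, "A": {"memory"}, "B": {"memory"}}
```

      With these contents, enabled_on("C", example) and
      enabled_on("D", example) are necessarily the same set, which is the
      sense in which enable states are shared among siblings.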
      
      The default hierarchy has an extra restriction - only cgroups which
      don't contain any task may have controllers enabled in
      "cgroup.subtree_control".  Combined with the other properties of the
      default hierarchy, this guarantees that, from the view point of
      controllers, tasks are only on the leaf cgroups.  In other words, only
      leaf csses may contain tasks.  This rules out situations where child
      cgroups compete against internal tasks of the parent, which is a
      competition between two different types of entities without any clear
      way to determine resource distribution between the two.  Different
      controllers handle it differently and all the implemented behaviors
      are ambiguous, ad-hoc, cumbersome and/or just wrong.  Having these
      structural constraints imposed from cgroup core removes the burden
      from controller implementations and enables showing one consistent
      behavior across all controllers.
      
      When a controller is enabled or disabled, css associations for the
      controller in the subtrees of each child should be updated.  After
      enabling, the whole subtree of a child should point to the new css of
      the child.  After disabling, the whole subtree of a child should point
      to the cgroup's css.  This is implemented by first updating cgroup
      states such that cgroup_e_css() result points to the appropriate css
      and then invoking cgroup_update_dfl_csses() which migrates all tasks
      in the affected subtrees to the self cgroup on the default hierarchy.
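
      The effect on css associations can be pictured with a toy lookup
      that finds, for a cgroup and controller, the nearest cgroup (itself
      or an ancestor) on which the controller is enabled.  This is a
      sketch of the idea only; effective_css() and the dictionaries are
      hypothetical and are not the kernel's cgroup_e_css().

```python
# Parent links for the hierarchy root - A - B - {C, D} from the description.
PARENT = {"root": None, "A": "root", "B": "A", "C": "B", "D": "B"}


def effective_css(cgroup, controller, subtree_control, parent=PARENT):
    """Name of the cgroup whose css serves `cgroup` for `controller`.

    A controller is enabled on a cgroup when the parent's subtree_control
    lists it; the root is assumed to always have a css of its own.
    """
    node = cgroup
    while parent[node] is not None:
        if controller in subtree_control.get(parent[node], ()):
            return node
        node = parent[node]
    return node
```

      For example, with only the root listing "memory", B, C and D all
      resolve to A.  Once A's subtree_control also lists "memory", B's
      whole subtree resolves to B, and it falls back to A again after the
      controller is disabled there.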
      
      * When read, "cgroup.subtree_control" lists all the currently enabled
        controllers on the children of the cgroup.
      
      * A white-space separated list of controller names prefixed with
        either '+' or '-' can be written to "cgroup.subtree_control".  The
        ones prefixed with '+' are enabled and the ones prefixed with '-'
        are disabled.
      
      * A controller can be enabled iff the parent's
        "cgroup.subtree_control" enables it and disabled iff no child's
        "cgroup.subtree_control" has it enabled.
      
      * If a cgroup has tasks, no controller can be enabled via
        "cgroup.subtree_control".  Likewise, if "cgroup.subtree_control" has
        some controllers enabled, tasks can't be migrated into the cgroup.
      
      * All controllers which aren't bound on other hierarchies are
        automatically associated with the root cgroup of the default
        hierarchy.  All the controllers which are bound to the default
        hierarchy are listed in the read-only file "cgroup.controllers" in
        the root directory.
      
      * "cgroup.controllers" in all non-root cgroups is a read-only file whose
        content is equal to that of "cgroup.subtree_control" of the parent.
        This indicates which controllers can be used in the cgroup's
        "cgroup.subtree_control".
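
      The interface rules listed above can be exercised with a small
      model.  It is a sketch under assumptions: the Cgroup class, its
      method names and the use of ValueError for every rejected operation
      are hypothetical, and the rules are applied literally as written
      here, including to the root.

```python
class Cgroup:
    """Toy model of the default-hierarchy rules; not kernel code."""

    def __init__(self, name, parent=None, bound=()):
        self.name = name
        self.parent = parent
        self.children = []
        self.nr_tasks = 0
        self.subtree_control = set()
        self._bound = set(bound)  # root only: controllers bound to this hierarchy
        if parent is not None:
            parent.children.append(self)

    def controllers(self):
        """Model of reading "cgroup.controllers"."""
        if self.parent is None:
            return set(self._bound)
        return set(self.parent.subtree_control)

    def read_subtree_control(self):
        """Model of reading "cgroup.subtree_control"."""
        return " ".join(sorted(self.subtree_control))

    def write_subtree_control(self, text):
        """Model of writing e.g. "+memory -blkio"; validates before applying."""
        enable, disable = set(), set()
        for tok in text.split():
            sign, name = tok[0], tok[1:]
            if sign not in "+-" or not name:
                raise ValueError(f"bad token: {tok!r}")
            if sign == "+":
                enable.add(name)
                disable.discard(name)
            else:
                disable.add(name)
                enable.discard(name)
        for name in enable:
            if name not in self.controllers():
                raise ValueError(f"{name} is not available in {self.name}")
        if self.nr_tasks and enable - self.subtree_control:
            raise ValueError(f"{self.name} has tasks")
        for name in disable:
            if any(name in c.subtree_control for c in self.children):
                raise ValueError(f"{name} is still enabled by a child")
        self.subtree_control |= enable
        self.subtree_control -= disable

    def add_task(self):
        """Model of migrating a task into this cgroup."""
        if self.subtree_control:
            raise ValueError(f"{self.name} has controllers enabled")
        self.nr_tasks += 1
```

      All checks run before any state is changed, so a rejected write
      leaves "cgroup.subtree_control" as it was in this model.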
      
      This is still experimental and there are some holes, one of which is
      that ->can_attach() failure during cgroup_update_dfl_csses() may leave
      the cgroups in an undefined state.  The issues will be addressed by
      future patches.
      
      v2: Non-root cgroups now also have "cgroup.controllers".
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      f8f22e53
    • T
      cgroup: add css_set->dfl_cgrp · 6803c006
      Tejun Heo authored
      To implement the unified hierarchy behavior, we'll need to be able to
      determine the associated cgroup on the default hierarchy from css_set.
      Let's add css_set->dfl_cgrp so that it can be accessed conveniently
      and efficiently.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      6803c006
    • T
      cgroup: teach css_task_iter about effective csses · 3ebb2b6e
      Tejun Heo authored
      Currently, css_task_iter iterates tasks associated with a css by
      visiting each css_set associated with the owning cgroup and walking
      tasks of each of them.  This works fine for !unified hierarchies as
      each cgroup has its own css for each associated subsystem on the
      hierarchy; however, on the planned unified hierarchy, a cgroup may not
      have csses associated and its tasks would be considered associated
      with the matching css of the nearest ancestor which has the subsystem
      enabled.
      
      This means that on the default unified hierarchy, just walking all
      tasks associated with a cgroup isn't enough to walk all tasks which
      are associated with the specified css.  If any of its children doesn't
      have the matching css enabled, task iteration should also include all
      tasks from the subtree.  We already added cgroup->e_csets[] to list
      all css_sets effectively associated with a given css and can walk
      css_sets on that list instead to achieve such iteration.
      
      This patch updates css_task_iter iteration such that it walks css_sets
      on cgroup->e_csets[] instead of cgroup->cset_links if iteration is
      requested on a non-dummy css.  Thanks to the previous iteration
      update, this change can be achieved with the addition of
      css_task_iter->ss and minimal updates to css_advance_task_iter() and
      css_task_iter_start().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      3ebb2b6e
    • T
      cgroup: reorganize css_task_iter · 0f0a2b4f
      Tejun Heo authored
      This patch reorganizes css_task_iter so that adding effective css
      support is easier.
      
      * s/->cset_link/->cset_pos/ and s/->task/->task_pos/ for consistency
      
      * ->origin_css is used to determine whether the iteration reached the
        last css_set.  Replace it with explicit ->cset_head so that
        css_advance_task_iter() doesn't have to know the termination
        condition directly.
      
      * css_task_iter_next() currently assumes that it's walking a list of
        cgrp_cset_link and reaches into the current cset through the current
        link to determine the termination conditions for task walking.  As
        this won't always be true for effective css walking, add
        ->tasks_head and ->mg_tasks_head and use them to control task
        walking so that css_task_iter_next() doesn't have to know how
        css_sets are being walked.
      
      This patch doesn't make any behavior changes.  The iteration logic
      stays unchanged after the patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      0f0a2b4f
    • T
      cgroup: implement cgroup->e_csets[] · 2d8f243a
      Tejun Heo authored
      On the default unified hierarchy, a cgroup may be associated with
      csses of its ancestors, which means that a css of a given cgroup may
      be associated with css_sets of descendant cgroups.  This means that we
      can't walk all tasks associated with a css by iterating the css_sets
      associated with the cgroup as there are css_sets which are pointing to
      the css but linked on the descendants.
      
      This patch adds per-subsystem list heads cgroup->e_csets[].  Any
      css_set which is pointing to a css is linked to
      css->cgroup->e_csets[$SUBSYS_ID] through
      css_set->e_cset_node[$SUBSYS_ID].  The lists are protected by
      css_set_rwsem and will allow us to walk all css_sets associated with a
      given css so that we can find out all associated tasks.
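
      A minimal sketch of the indexing idea, assuming a simplified css_set
      that records, per controller, the cgroup owning the css it points
      to.  CssSet, ECsets and their fields are hypothetical stand-ins for
      the kernel structures and ignore locking entirely.

```python
class CssSet:
    """Toy css_set: a group of tasks sharing the same set of csses."""

    def __init__(self, dfl_cgrp, tasks, subsys):
        self.dfl_cgrp = dfl_cgrp    # cgroup the tasks actually live in
        self.tasks = list(tasks)
        self.subsys = dict(subsys)  # controller -> cgroup owning the css used


class ECsets:
    """Per-(cgroup, controller) lists of css_sets pointing at that css."""

    def __init__(self):
        self.lists = {}

    def link(self, cset):
        # One list membership per controller, like e_cset_node[$SUBSYS_ID].
        for controller, cgrp in cset.subsys.items():
            self.lists.setdefault((cgrp, controller), []).append(cset)

    def tasks_of(self, cgrp, controller):
        """All tasks served by the css of `controller` on `cgrp`."""
        csets = self.lists.get((cgrp, controller), [])
        return [task for cset in csets for task in cset.tasks]
```

      In this model, a css_set living in C whose "memory" css belongs to A
      shows up on A's list for "memory", so walking that single list
      reaches tasks that are linked on descendant cgroups.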
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      2d8f243a
    • T
      cgroup: update cgroup->subsys_mask to ->child_subsys_mask and restore cgroup_root->subsys_mask · f392e51c
      Tejun Heo authored
      94419627 ("cgroup: move ->subsys_mask from cgroupfs_root to
      cgroup") moved ->subsys_mask from cgroup_root to cgroup to prepare for
      the unified hierarchy; however, it turns out that carrying the
      subsys_mask of the children in the parent, instead of itself, is a lot
      more natural.  This patch restores cgroup_root->subsys_mask and morphs
      cgroup->subsys_mask into cgroup->child_subsys_mask.
      
      * Uses of root->cgrp.subsys_mask are restored to root->subsys_mask.
      
      * Remove automatic setting and clearing of cgrp->subsys_mask and
        instead just inherit ->child_subsys_mask from the parent during
        cgroup creation.  Note that this doesn't affect any current
        behaviors.
      
      * Undo __kill_css() separation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      f392e51c
  11. 29 Mar, 2014 1 commit
  12. 19 Mar, 2014 1 commit