1. 26 Sep 2014 (1 commit)
    • Revert "cgroup: remove redundant variable in cgroup_mount()" · e756c7b6
      Committed by Zefan Li
      This reverts commit 0c7bf3e8.
      
      If there are child cgroups in the cgroupfs and then we umount it,
      the superblock will be destroyed but the cgroup_root will be kept
      around. When we mount it again, cgroup_mount() will find this
      cgroup_root and allocate a new sb for it.
      
      So with this commit we will be trapped in a dead loop in the case
      described above, because kernfs_pin_sb() keeps returning NULL.
      
      Currently I don't see how we can avoid using both pinned_sb and
      new_sb, so just revert it.
      
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Reported-by: Andrey Wagin <avagin@gmail.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  2. 25 Sep 2014 (1 commit)
    • percpu_ref: add PERCPU_REF_INIT_* flags · 2aad2a86
      Committed by Tejun Heo
      With the recent addition of percpu_ref_reinit(), percpu_ref now can be
      used as a persistent switch which can be turned on and off repeatedly
      where turning off maps to killing the ref and waiting for it to drain;
      however, there currently isn't a way to initialize a percpu_ref in its
      off (killed and drained) state, which can be inconvenient for certain
      persistent switch use cases.
      
      Similarly, percpu_ref_switch_to_atomic/percpu() allow dynamic
      selection of operation mode; however, currently a newly initialized
      percpu_ref is always in percpu mode making it impossible to avoid the
      latency overhead of switching to atomic mode.
      
      This patch adds @flags to percpu_ref_init() and implements the
      following flags.
      
      * PERCPU_REF_INIT_ATOMIC	: start ref in atomic mode
      * PERCPU_REF_INIT_DEAD		: start ref killed and drained
      
      These flags should be able to serve the above two use cases.
      
      v2: target_core_tpg.c conversion was missing.  Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
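The two flags can be pictured with a small single-threaded model of the init/kill/reinit life cycle. This is a sketch under stated assumptions. `struct model_ref`, `MODEL_REF_INIT_*` and the `model_ref_*()` helpers are hypothetical names. The model ignores the per-CPU counters, the mode-switching machinery and all concurrency of the real percpu_ref. It only shows what "start in atomic mode" and "start killed and drained" mean for the initial state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags that mirror PERCPU_REF_INIT_ATOMIC / _DEAD in spirit only. */
enum {
	MODEL_REF_INIT_ATOMIC = 1 << 0,	/* start in atomic mode */
	MODEL_REF_INIT_DEAD   = 1 << 1,	/* start killed and drained */
};

enum model_mode { MODE_PERCPU, MODE_ATOMIC };

struct model_ref {
	long count;		/* stands in for the per-CPU + atomic count */
	enum model_mode mode;
	bool dead;		/* true once killed */
};

static void model_ref_init(struct model_ref *ref, unsigned int flags)
{
	ref->mode = MODE_PERCPU;
	ref->dead = false;
	ref->count = 1;		/* the initial reference */

	/* In this model a ref that starts dead is also put in atomic mode. */
	if (flags & (MODEL_REF_INIT_ATOMIC | MODEL_REF_INIT_DEAD))
		ref->mode = MODE_ATOMIC;

	if (flags & MODEL_REF_INIT_DEAD) {
		ref->dead = true;
		ref->count = 0;	/* killed and drained: no references left */
	}
}

/* Take a reference only while the ref is live (the "switch" is on). */
static bool model_ref_tryget_live(struct model_ref *ref)
{
	if (ref->dead)
		return false;
	ref->count++;
	return true;
}

/* Turn the switch back on; only valid once the ref has drained to zero. */
static void model_ref_reinit(struct model_ref *ref)
{
	assert(ref->dead && ref->count == 0);
	ref->dead = false;
	ref->count = 1;
	ref->mode = MODE_PERCPU;
}
```

With MODEL_REF_INIT_DEAD the ref starts at zero. A user of the persistent-switch pattern can therefore begin in the "off" state and turn it on with a reinit, instead of creating the ref live and killing it immediately.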
  3. 21 Sep 2014 (2 commits)
  4. 19 Sep 2014 (4 commits)
    • cgroup: remove CGRP_RELEASABLE flag · a25eb52e
      Committed by Zefan Li
      We call put_css_set() after setting CGRP_RELEASABLE flag in
      cgroup_task_migrate(), but in other places we call it without setting
      the flag. I don't see the necessity of this flag.
      
      Moreover once the flag is set, it will never be cleared, unless writing
      to the notify_on_release control file, so it can be quite confusing
      if we look at the output of debug.releasable.
      
        # mount -t cgroup -o debug xxx /cgroup
        # mkdir /cgroup/child
        # cat /cgroup/child/debug.releasable
        0   <-- shows 0 though the cgroup is empty
        # echo $$ > /cgroup/child/tasks
        # cat /cgroup/child/debug.releasable
        0
        # echo $$ > /cgroup/tasks && echo $$ > /cgroup/child/tasks
        # cat /cgroup/child/debug.releasable
        1   <-- shows 1 though the cgroup is not empty
      
      This patch removes the flag, and now debug.releasable shows if the
      cgroup is empty or not.
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
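The difference between the old sticky flag and the new derived answer can be replayed in a few lines of C. The model below is hypothetical. `struct model_cgroup` tracks only a task count and a child count. The "new" check assumes that "empty" means no tasks and no child cgroups. That is how this sketch reads the commit message; it is not a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cgroup reduced to what the debug.releasable example needs. */
struct model_cgroup {
	int nr_tasks;		/* tasks currently attached */
	int nr_children;	/* child cgroups */
	bool releasable_flag;	/* old scheme: sticky CGRP_RELEASABLE-like bit */
};

static void model_task_in(struct model_cgroup *cg)
{
	cg->nr_tasks++;
}

/* Old scheme: migrating a task out sets the flag, and nothing clears it. */
static void model_task_out(struct model_cgroup *cg)
{
	cg->nr_tasks--;
	cg->releasable_flag = true;
}

static bool old_releasable(const struct model_cgroup *cg)
{
	return cg->releasable_flag;
}

/* New scheme (as read from the commit): report whether the cgroup is empty. */
static bool new_releasable(const struct model_cgroup *cg)
{
	return cg->nr_tasks == 0 && cg->nr_children == 0;
}
```

Replaying the shell session from the commit shows the mismatch. The shell enters the child cgroup, leaves it, then enters it again. After that sequence the old flag reports 1 for a non-empty cgroup, while the derived check reports 0.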
    • cgroup: simplify proc_cgroup_show() · 006f4ac4
      Committed by Zefan Li
      Use the ONE macro instead of REG, and we can simplify proc_cgroup_show().
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: use a per-cgroup work for release agent · 971ff493
      Committed by Zefan Li
      Instead of using a global work to schedule release agent on removable
      cgroups, we change to use a per-cgroup work to do this, which makes
      the code much simpler.
      
      v2: use a dedicated work instead of reusing css->destroy_work. (Tejun)
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: fix unbalanced locking · eb4aec84
      Committed by Zefan Li
      cgroup_pidlist_start() holds cgrp->pidlist_mutex and then calls
      pidlist_array_load(), and cgroup_pidlist_stop() releases the mutex.
      
      It is wrong that we release the mutex in the failure path in
      pidlist_array_load(), because cgroup_pidlist_stop() will be called
      no matter if cgroup_pidlist_start() returns errno or not.
      
      Fixes: 4bac00d1
      Cc: <stable@vger.kernel.org> # 3.14+
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
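The rule the fix restores is that the unlock belongs to the one callback that always runs. Below is a minimal sketch of that rule. The `model_start()`/`model_stop()` callbacks are hypothetical and follow the shape of a seq_file ->start()/->stop() pair. A plain flag stands in for cgrp->pidlist_mutex, so a double unlock trips an assert instead of being undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for cgrp->pidlist_mutex; the asserts catch double lock/unlock. */
static int lock_held;

static void model_lock(void)   { assert(!lock_held); lock_held = 1; }
static void model_unlock(void) { assert(lock_held);  lock_held = 0; }

/* Called with the lock held.  On failure it must leave the lock alone:
 * the buggy version also called model_unlock() here, so the later
 * ->stop() would unlock a second time. */
static int model_array_load(int fail)
{
	return fail ? -1 : 0;
}

/* ->start(): takes the lock and keeps it even when loading fails. */
static void *model_start(int fail)
{
	static int cursor;

	model_lock();
	if (model_array_load(fail) < 0)
		return NULL;
	return &cursor;
}

/* ->stop(): called whatever ->start() returned, so it always unlocks. */
static void model_stop(void)
{
	model_unlock();
}
```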
  5. 18 Sep 2014 (3 commits)
  6. 08 Sep 2014 (1 commit)
    • percpu-refcount: add @gfp to percpu_ref_init() · a34375ef
      Committed by Tejun Heo
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_ref_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_refs too.
      
      This patch doesn't make any functional difference.
      
      v2: blk-mq conversion was missing.  Updated.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Cc: Jens Axboe <axboe@kernel.dk>
  7. 05 Sep 2014 (2 commits)
    • cgroup: check cgroup liveliness before unbreaking kernfs · aa32362f
      Committed by Li Zefan
      When cgroup_kn_lock_live() is called through some kernfs operation and
      another thread is calling cgroup_rmdir(), we'll trigger the warning in
      cgroup_get().
      
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 1228 at kernel/cgroup.c:1034 cgroup_get+0x89/0xa0()
      ...
      Call Trace:
       [<c16ee73d>] dump_stack+0x41/0x52
       [<c10468ef>] warn_slowpath_common+0x7f/0xa0
       [<c104692d>] warn_slowpath_null+0x1d/0x20
       [<c10bb999>] cgroup_get+0x89/0xa0
       [<c10bbe58>] cgroup_kn_lock_live+0x28/0x70
       [<c10be3c1>] __cgroup_procs_write.isra.26+0x51/0x230
       [<c10be5b2>] cgroup_tasks_write+0x12/0x20
       [<c10bb7b0>] cgroup_file_write+0x40/0x130
       [<c11aee71>] kernfs_fop_write+0xd1/0x160
       [<c1148e58>] vfs_write+0x98/0x1e0
       [<c114934d>] SyS_write+0x4d/0xa0
       [<c16f656b>] sysenter_do_call+0x12/0x12
      ---[ end trace 6f2e0c38c2108a74 ]---
      
      Fix this by calling css_tryget() instead of cgroup_get().
      
      v2:
      - move cgroup_tryget() right below cgroup_get() definition. (Tejun)
      
      Cc: <stable@vger.kernel.org> # 3.15+
      Reported-by: Toralf Förster <toralf.foerster@gmx.de>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
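A "tryget" differs from a plain get because it refuses to revive an object whose count has already dropped to zero. The sketch below shows that generic increment-unless-zero pattern with C11 atomics. `struct model_obj`, `model_get()` and `model_tryget()` are hypothetical. In the kernel, cgroup_get() warns when the cgroup is already marked dead, and the css reference is a percpu_ref. This therefore illustrates the idea of a conditional reference rather than the actual implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object; a count of 0 means it is already being torn down. */
struct model_obj {
	atomic_long refcnt;
};

/* Plain get: only correct if the caller already knows the object is alive. */
static void model_get(struct model_obj *obj)
{
	long old = atomic_fetch_add(&obj->refcnt, 1);

	assert(old > 0);	/* models the warning the commit hit */
}

/* Conditional get: succeed only while the count is non-zero. */
static bool model_tryget(struct model_obj *obj)
{
	long cur = atomic_load(&obj->refcnt);

	while (cur > 0) {
		/* On failure the CAS reloads cur, and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&obj->refcnt, &cur, cur + 1))
			return true;
	}
	return false;
}
```

A caller that can race with removal uses the tryget form and treats failure as "the object is gone". cgroup_kn_lock_live() needs exactly that behaviour.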
    • cgroup: delay the clearing of cgrp->kn->priv · a4189487
      Committed by Li Zefan
      Run these two scripts concurrently:
      
          for ((; ;))
          {
              mkdir /cgroup/sub
              rmdir /cgroup/sub
          }
      
          for ((; ;))
          {
              echo $$ > /cgroup/sub/cgroup.procs
              echo $$ > /cgroup/cgroup.procs
          }
      
      A kernel bug will be triggered:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000038
      IP: [<c10bbd69>] cgroup_put+0x9/0x80
      ...
      Call Trace:
       [<c10bbe19>] cgroup_kn_unlock+0x39/0x50
       [<c10bbe91>] cgroup_kn_lock_live+0x61/0x70
       [<c10be3c1>] __cgroup_procs_write.isra.26+0x51/0x230
       [<c10be5b2>] cgroup_tasks_write+0x12/0x20
       [<c10bb7b0>] cgroup_file_write+0x40/0x130
       [<c11aee71>] kernfs_fop_write+0xd1/0x160
       [<c1148e58>] vfs_write+0x98/0x1e0
       [<c114934d>] SyS_write+0x4d/0xa0
       [<c16f656b>] sysenter_do_call+0x12/0x12
      
      We clear cgrp->kn->priv at the end of cgroup_rmdir(), but another
      concurrent thread can access kn->priv after the clearing.
      
      We should move the clearing to css_release_work_fn(). At that time
      no one is holding reference to the cgroup and no one can gain a new
      reference to access it.
      
      v2:
      - move RCU_INIT_POINTER() into the else block. (Tejun)
      - remove the cgroup_parent() check. (Tejun)
      - update the comment in css_tryget_online_from_dir().
      
      Cc: <stable@vger.kernel.org> # 3.15+
      Reported-by: Toralf Förster <toralf.foerster@gmx.de>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  8. 25 Aug 2014 (1 commit)
  9. 23 Aug 2014 (1 commit)
    • cgroup: Display legacy cgroup files on default hierarchy · fa8137be
      Committed by Vivek Goyal
      Kernel command line parameter cgroup__DEVEL__legacy_files_on_dfl forces
      legacy cgroup files to show up on the default hierarchy if a subsystem
      does not have any files defined for the default hierarchy.
      
      But this seems to be working only if legacy files are defined in
      ss->legacy_cftypes. If one adds some cftypes later using
      cgroup_add_legacy_cftypes(), these files don't show up on default
      hierarchy.  Update the function accordingly so that the dynamically
      added legacy files also show up in the default hierarchy if the target
      subsystem is also using the base legacy files for the default
      hierarchy.
      
      tj: Patch description and comment updates.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  10. 18 Aug 2014 (1 commit)
  11. 15 Jul 2014 (6 commits)
    • cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test · 5de4fa13
      Committed by Tejun Heo
      cgrp_dfl_root_inhibit_ss_mask determines which subsystems are not
      supported on the default hierarchy and is currently initialized
      statically and just includes the debug subsystem.  Now that there's
      cgroup_subsys->dfl_files, we can easily tell which subsystems support
      the default hierarchy or not.
      
      Let's initialize cgrp_dfl_root_inhibit_ss_mask by testing whether
      cgroup_subsys->dfl_files is NULL.  After all, subsystems with NULL
      ->dfl_files aren't usable on the default hierarchy anyway.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
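Deriving the mask instead of hard-coding it takes one loop at init time. Below is a sketch with hypothetical, trimmed-down types: `struct model_subsys` carries only a name and a `dfl_files` pointer. Bit i of the result corresponds to subsystem i.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down descriptors. */
struct model_cftype {
	const char *name;
};

struct model_subsys {
	const char *name;
	const struct model_cftype *dfl_files;	/* NULL: no default-hierarchy interface */
};

/* Bit i is set when subsystem i cannot be used on the default hierarchy. */
static unsigned long model_dfl_inhibit_mask(const struct model_subsys *ss, size_t n)
{
	unsigned long mask = 0;

	for (size_t i = 0; i < n; i++)
		if (!ss[i].dfl_files)
			mask |= 1UL << i;
	return mask;
}
```

Under this scheme a debug-like subsystem that defines no default-hierarchy files ends up inhibited automatically, without a special case.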
    • cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core · 05ebb6e6
      Committed by Tejun Heo
      cgroup now distinguishes cftypes for the default and legacy
      hierarchies more explicitly by using separate arrays and
      CFTYPE_ONLY_ON_DFL and CFTYPE_INSANE should be and are used only
      inside cgroup core proper.  Let's make it clear that the flags are
      internal by prefixing them with double underscores.
      
      CFTYPE_INSANE is renamed to __CFTYPE_NOT_ON_DFL for consistency.  The
      two flags are also collected and assigned bits >= 16 so that they
      aren't mixed with the published flags.
      
      v2: Convert the extra ones in cgroup_exit_cftypes() which are added by
          revision to the previous patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
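Keeping the internal flags in their own bit range makes it cheap for the core to reject or strip them. The values below are illustrative placeholders for two published flags and the two internal ones described above, with internal bits starting at 16. The `MODEL_*` names and both helpers are hypothetical.

```c
#include <assert.h>

/* Illustrative layout: published flags in the low bits, internal ones from bit 16. */
enum {
	MODEL_CFTYPE_ONLY_ON_ROOT         = 1u << 0,	/* published */
	MODEL_CFTYPE_NOT_ON_ROOT          = 1u << 1,	/* published */
	MODEL_CFTYPE_ONLY_ON_DFL_INTERNAL = 1u << 16,	/* core-internal */
	MODEL_CFTYPE_NOT_ON_DFL_INTERNAL  = 1u << 17,	/* core-internal */
};

#define MODEL_CFTYPE_INTERNAL_MASK (~0u << 16)

/* A subsystem-supplied cftype should carry no internal bits. */
static int model_flags_are_public(unsigned int flags)
{
	return (flags & MODEL_CFTYPE_INTERNAL_MASK) == 0;
}

/* The core strips its own bits again, e.g. when cftypes are removed. */
static unsigned int model_clear_internal(unsigned int flags)
{
	return flags & ~MODEL_CFTYPE_INTERNAL_MASK;
}
```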
    • cgroup: distinguish the default and legacy hierarchies when handling cftypes · a8ddc821
      Committed by Tejun Heo
      Until now, cftype arrays carried files for both the default and legacy
      hierarchies and the files which needed to be used on only one of them
      were flagged with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE.  This
      gets confusing very quickly and we may end up exposing interface files
      to the default hierarchy without thinking it through.
      
      This patch makes cgroup core provide separate sets of interfaces for
      cftype handling so that the cftypes for the default and legacy
      hierarchies are clearly distinguished.  The previous two patches
      renamed the existing ones so that they clearly indicate that they're
      for the legacy hierarchies.  This patch adds the interfaces for the
      default hierarchy and applies them selectively depending on the
      hierarchy type.
      
      * cftypes added through cgroup_subsys->dfl_cftypes and
        cgroup_add_dfl_cftypes() only show up on the default hierarchy.
      
      * cftypes added through cgroup_subsys->legacy_cftypes and
        cgroup_add_legacy_cftypes() only show up on the legacy hierarchies.
      
      * cgroup_subsys->dfl_cftypes and ->legacy_cftypes can point to the
        same array for the cases where the interface files are identical on
        both types of hierarchies.
      
      * This makes all the existing subsystem interface files legacy-only by
        default and all subsystems will have no interface file created when
        enabled on the default hierarchy.  Each subsystem should explicitly
        review and compose the interface for the default hierarchy.
      
      * A boot param "cgroup__DEVEL__legacy_files_on_dfl" is added which
        makes subsystems which haven't decided the interface files for the
        default hierarchy to present the legacy files on the default
        hierarchy so that its behavior on the default hierarchy can be
        tested.  As the awkward name suggests, this is for development only.
      
      * memcg's CFTYPE_INSANE on "use_hierarchy" is noop now as the whole
        array isn't used on the default hierarchy.  The flag is removed.
      
      v2: Updated documentation for cgroup__DEVEL__legacy_files_on_dfl.
      
      v3: Clear CFTYPE_ONLY_ON_DFL and CFTYPE_INSANE when cfts are removed
          as suggested by Li.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Aristeu Rozanski <aris@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
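The file-selection rules listed above reduce to a small decision function. This is a sketch with hypothetical names. `struct model_subsys` carries only the two cftype pointers, and the boolean `model_legacy_files_on_dfl` stands in for the cgroup__DEVEL__legacy_files_on_dfl boot parameter. It models the outcome of the rules, not the kernel's code path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_cftype {
	const char *name;
};

struct model_subsys {
	const struct model_cftype *dfl_cftypes;		/* default hierarchy only */
	const struct model_cftype *legacy_cftypes;	/* legacy hierarchies only */
};

/* Hypothetical stand-in for the development-only boot parameter. */
static bool model_legacy_files_on_dfl;

/* Which file set does this subsystem expose on the given hierarchy type? */
static const struct model_cftype *
model_pick_cftypes(const struct model_subsys *ss, bool on_dfl)
{
	if (!on_dfl)
		return ss->legacy_cftypes;
	if (ss->dfl_cftypes)
		return ss->dfl_cftypes;
	/* No default-hierarchy interface decided yet: no files at all,
	 * unless the boot parameter asks for the legacy ones. */
	return model_legacy_files_on_dfl ? ss->legacy_cftypes : NULL;
}
```

A subsystem whose interface is identical on both hierarchy types simply points both fields at the same array, so both branches return it.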
    • cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes() · 2cf669a5
      Committed by Tejun Heo
      Currently, cftypes added by cgroup_add_cftypes() are used for both the
      unified default hierarchy and legacy ones and subsystems can mark each
      file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to
      appear only on one of them.  This is quite hairy and error-prone.
      Also, we may end up exposing interface files to the default hierarchy
      without thinking it through.
      
      cgroup_subsys will grow two separate cftype addition functions and
      apply each only on the hierarchies of the matching type.  This will
      allow organizing cftypes in a lot clearer way and encourage subsystems
      to scrutinize the interface which is being exposed in the new default
      hierarchy.
      
      In preparation, this patch adds cgroup_add_legacy_cftypes() which
      currently is a simple wrapper around cgroup_add_cftypes() and replaces
      all cgroup_add_cftypes() usages with it.
      
      While at it, this patch drops a completely spurious return from
      __hugetlb_cgroup_file_init().
      
      This patch doesn't introduce any functional differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes · 5577964e
      Committed by Tejun Heo
      Currently, cgroup_subsys->base_cftypes is used for both the unified
      default hierarchy and legacy ones and subsystems can mark each file
      with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear
      only on one of them.  This is quite hairy and error-prone.  Also, we
      may end up exposing interface files to the default hierarchy without
      thinking it through.
      
      cgroup_subsys will grow two separate cftype arrays and apply each only
      on the hierarchies of the matching type.  This will allow organizing
      cftypes in a lot clearer way and encourage subsystems to scrutinize
      the interface which is being exposed in the new default hierarchy.
      
      In preparation, this patch renames cgroup_subsys->base_cftypes to
      cgroup_subsys->legacy_cftypes.  This patch is pure rename.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Aristeu Rozanski <aris@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • cgroup: split cgroup_base_files[] into cgroup_{dfl|legacy}_base_files[] · a14c6874
      Committed by Tejun Heo
      Currently cgroup_base_files[] contains the cgroup core interface files
      for both legacy and default hierarchies with each file tagged with
      CFTYPE_INSANE and CFTYPE_ONLY_ON_DFL.  This is difficult to read.
      
      Let's separate it out to two separate tables, cgroup_dfl_base_files[]
      and cgroup_legacy_base_files[], and use the appropriate one in
      cgroup_mkdir() depending on the hierarchy type.  This makes tagging
      each file unnecessary.
      
      This patch doesn't introduce any behavior changes.
      
      v2: cgroup_dfl_base_files[] was missing the termination entry
          triggering WARN in cgroup_init_cftypes() for 0day kernel testing
          robot.  Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Jet Chen <jet.chen@intel.com>
  12. 09 Jul 2014 (9 commits)
    • cgroup: clean up sane_behavior handling · 7b9a6ba5
      Committed by Tejun Heo
      After the previous patch to remove sane_behavior support from
      non-default hierarchies, CGRP_ROOT_SANE_BEHAVIOR is used only to
      indicate the default hierarchy while parsing mount options.  This
      patch makes the following cleanups around it.
      
      * Don't show it in the mount option.  Eventually the default hierarchy
        will be assigned a different filesystem type.
      
      * As sane_behavior is no longer effective on non-default hierarchies
        and the default hierarchy doesn't accept any mount options,
        parse_cgroupfs_options() can consider sane_behavior mount option as
        indicating the default hierarchy and fail if any other options are
        specified with it.  While at it, remove one of the double blank
        lines in the function.
      
      * cgroup_mount() can now simply test CGRP_ROOT_SANE_BEHAVIOR to tell
        whether to mount the default hierarchy or not.
      
      * As CGRP_ROOT_SANE_BEHAVIOR's only role now is indicating whether
        to select the default hierarchy or not during mount, it doesn't need
        to be set in the default hierarchy itself.  cgroup_init_early()
        updated accordingly.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
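The stricter option handling can be sketched as follows: `__DEVEL__sane_behavior` selects the default hierarchy and must appear alone. `struct model_opts` and `model_parse_options()` are hypothetical. The real parse_cgroupfs_options() handles many more options and returns kernel error codes rather than -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical parser result. */
struct model_opts {
	bool sane_behavior;	/* selects the default hierarchy */
	int nr_other;		/* count of any other options seen */
};

/* Returns 0 on success, -1 if sane_behavior is combined with anything else.
 * Note: strtok() modifies @data in place. */
static int model_parse_options(char *data, struct model_opts *opts)
{
	memset(opts, 0, sizeof(*opts));
	for (char *tok = strtok(data, ","); tok; tok = strtok(NULL, ",")) {
		if (strcmp(tok, "__DEVEL__sane_behavior") == 0)
			opts->sane_behavior = true;
		else
			opts->nr_other++;
	}
	/* The default hierarchy accepts no other mount options. */
	if (opts->sane_behavior && opts->nr_other)
		return -1;
	return 0;
}
```

After parsing, the mount path only needs to test `sane_behavior` to decide whether to mount the default hierarchy.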
    • cgroup: remove sane_behavior support on non-default hierarchies · aa6ec29b
      Committed by Tejun Heo
      sane_behavior has been used as a development vehicle for the default
      unified hierarchy.  Now that the default hierarchy is in place, the
      flag became redundant and confusing as its usage is allowed on all
      hierarchies.  There are gonna be either the default hierarchy or
      legacy ones.  Let's make that clear by removing sane_behavior support
      on non-default hierarchies.
      
      This patch replaces cgroup_sane_behavior() with cgroup_on_dfl().  The
      comment on top of CGRP_ROOT_SANE_BEHAVIOR is moved to on top of
      cgroup_on_dfl() with sane_behavior specific part dropped.
      
      On the default and legacy hierarchies w/o sane_behavior, this
      shouldn't cause any behavior differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
    • cgroup: make interface file "cgroup.sane_behavior" legacy-only · c1d5d42e
      Committed by Tejun Heo
      "cgroup.sane_behavior" was added to help distinguish whether
      sane_behavior is in effect or not.  We now have the default hierarchy
      where the flag is always in effect and are planning to remove
      supporting sane behavior on the legacy hierarchies making this file on
      the default hierarchy rather pointless.  Let's make it legacy only and
      thus always zero.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: remove CGRP_ROOT_OPTION_MASK · 7450e90b
      Committed by Tejun Heo
      cgroup_root->flags only contains CGRP_ROOT_* flags and there's no
      reason to mask the flags.  Remove CGRP_ROOT_OPTION_MASK.
      
      This doesn't cause any behavior differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: implement cgroup_subsys->depends_on · af0ba678
      Committed by Tejun Heo
      Currently, the blkio subsystem attributes all of writeback IOs to the
      root.  One of the issues is that there's no way to tell who originated
      a writeback IO from block layer.  Those IOs are usually issued
      asynchronously from a task which didn't have anything to do with
      actually generating the dirty pages.  The memory subsystem, when
      enabled, already keeps track of the ownership of each dirty page and
      it's desirable for blkio to piggyback instead of adding its own
      per-page tag.
      
      blkio piggybacking on memory is an implementation detail which
      preferably should be handled automatically without requiring explicit
      userland action.  To achieve that, this patch implements
      cgroup_subsys->depends_on which contains the mask of subsystems which
      should be enabled together when the subsystem is enabled.
      
      The previous patches already implemented the support for enabled but
      invisible subsystems and cgroup_subsys->depends_on can be easily
      implemented by updating cgroup_refresh_child_subsys_mask() so that it
      calculates cgroup->child_subsys_mask considering
      cgroup_subsys->depends_on of the explicitly enabled subsystems.
      
      Documentation/cgroups/unified-hierarchy.txt is updated to explain that
      subsystems may not become immediately available after being unused
      from userland and that dependency could be a factor in it.  As
      subsystems may already keep residual references, this doesn't
      significantly change how subsystem rebinding can be used.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
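Computing child_subsys_mask from the explicitly enabled mask and each subsystem's depends_on is a transitive closure over a bitmask. The sketch below uses a hypothetical four-entry `depends_on[]` table. It repeats the expansion until the mask stops growing, so chains of dependencies are followed. Treat it as an illustration of the idea, not as the body of cgroup_refresh_child_subsys_mask().

```c
#include <assert.h>

#define MODEL_NR_SUBSYS 4

/* Hypothetical table: depends_on[i] is the mask of subsystems that must
 * be enabled whenever subsystem i is enabled. */
static unsigned int depends_on[MODEL_NR_SUBSYS];

/* Expand the explicitly requested mask until it is closed under depends_on. */
static unsigned int model_child_subsys_mask(unsigned int subtree_control)
{
	unsigned int mask = subtree_control;
	unsigned int old;

	do {
		old = mask;
		for (int i = 0; i < MODEL_NR_SUBSYS; i++)
			if (old & (1u << i))
				mask |= depends_on[i];
	} while (mask != old);	/* repeat to pick up transitive dependencies */

	return mask;
}
```

With index 1 standing for memory and index 2 for blkio, setting `depends_on[2] = 1u << 1` means that enabling only blkio yields a mask containing both. This is the implicit enabling the commit describes.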
    • cgroup: implement cgroup_subsys->css_reset() · b4536f0c
      Committed by Tejun Heo
      cgroup is implementing support for subsystem dependency which would
      require a way to enable a subsystem even when it's not directly
      configured through "cgroup.subtree_control".
      
      The previous patches added support for explicitly and implicitly
      enabled subsystems and showing/hiding their interface files.  An
      explicitly enabled subsystem may become implicitly enabled if it's
      turned off through "cgroup.subtree_control" but there are subsystems
      depending on it.  In such cases, the subsystem, as it's turned off
      when seen from userland, shouldn't enforce any resource control.
      Also, the subsystem may be explicitly turned on later again and its
      interface files should be as close to the initial state as possible.
      
      This patch adds cgroup_subsys->css_reset() which is invoked when a css
      is hidden.  The callback should disable resource control and reset the
      state to the vanilla state.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    • cgroup: make interface files visible iff enabled on cgroup->subtree_control · f63070d3
      Committed by Tejun Heo
      cgroup is implementing support for subsystem dependency which would
      require a way to enable a subsystem even when it's not directly
      configured through "cgroup.subtree_control".
      
      The preceding patch distinguished cgroup->subtree_control and
      ->child_subsys_mask where the former is the subsystems explicitly
      configured by the userland and the latter is all enabled subsystems
      currently is equal to the former but will include subsystems
      implicitly enabled through dependency.
      
      Subsystems which are enabled due to dependency shouldn't be visible to
      userland.  This patch updates cgroup_subtree_control_write() and
      create_css() such that interface files are not created for implicitly
      enabled subsystems.
      
      * @visible parameter is added to create_css().  Interface files are
        created only when true.
      
      * If an already implicitly enabled subsystem is turned on through
        "cgroup.subtree_control", the existing css should be used.  css
        draining is skipped.
      
      * cgroup_subtree_control_write() computes the new target
        cgroup->child_subsys_mask and create/kill or show/hide csses
        accordingly.
      
      As the two subsystem masks are still kept identical, this patch
      doesn't introduce any behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    • cgroup: introduce cgroup->subtree_control · 667c2491
      Committed by Tejun Heo
      cgroup is implementing support for subsystem dependency which would
      require a way to enable a subsystem even when it's not directly
      configured through "cgroup.subtree_control".
      
      Previously, cgroup->child_subsys_mask directly reflected
      "cgroup.subtree_control" and the enabled subsystems in the child
      cgroups.  This patch adds cgroup->subtree_control which
      "cgroup.subtree_control" operates on.  cgroup->child_subsys_mask is
      now calculated from cgroup->subtree_control by
      cgroup_refresh_child_subsys_mask(), which sets it identical to
      cgroup->subtree_control for now.
      
      This will allow using cgroup->child_subsys_mask for all the enabled
      subsystems including the implicit ones and ->subtree_control for
      tracking the explicitly requested ones.  This patch keeps the two
      masks identical and doesn't introduce any behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    • cgroup: reorganize cgroup_subtree_control_write() · c29adf24
      Committed by Tejun Heo
      Make the following two reorganizations to
      cgroup_subtree_control_write().  These are to prepare for future
      changes and shouldn't cause any functional difference.
      
      * Move availability test above css offlining wait.
      
      * Move cgrp->child_subsys_mask update above new css creation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
  13. 30 Jun 2014 (2 commits)
    • cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() · 3a32bd72
      Committed by Li Zefan
      We've converted cgroup to kernfs so cgroup won't be intertwined with
      vfs objects and locking, but there are dark areas.
      
      Run two instances of this script concurrently:
      
          for ((; ;))
          {
          	mount -t cgroup -o cpuacct xxx /cgroup
          	umount /cgroup
          }
      
      After a while, I saw two mount processes were stuck at retrying, because
      they were waiting for a subsystem to become free, but the root associated
      with this subsystem never got freed.
      
      This can happen, if thread A is in the process of killing superblock but
      hasn't called percpu_ref_kill(), and at this time thread B is mounting
      the same cgroup root and finds the root in the root list and performs
      percpu_ref_try_get().
      
      To fix this, we try to increase both the refcnt of the superblock and the
      percpu refcnt of cgroup root.
      
      v2:
      - we should try to get both the superblock refcnt and cgroup_root refcnt,
        because cgroup_root may have no superblock associated with it.
      - adjust/add comments.
      
      tj: Updated comments.  Renamed @sb to @pinned_sb.
      
      Cc: <stable@vger.kernel.org> # 3.15
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: fix mount failure in a corner case · 970317aa
      Committed by Li Zefan
        # cat test.sh
        #! /bin/bash
      
        mount -t cgroup -o cpu xxx /cgroup
        umount /cgroup
      
        mount -t cgroup -o cpu,cpuacct xxx /cgroup
        umount /cgroup
        # ./test.sh
        mount: xxx already mounted or /cgroup busy
        mount: according to mtab, xxx is already mounted on /cgroup
      
      It's because the cgroupfs_root of the first mount was under destruction
      asynchronously.
      
      Fix this by delaying and then retrying mount for this case.
      
      v3:
      - put the refcnt immediately after getting it. (Tejun)
      
      v2:
      - use percpu_ref_tryget_live() rather than introducing
        percpu_ref_alive(). (Tejun)
      - adjust comment.
      
      tj: Updated the comment a bit.
      
      Cc: <stable@vger.kernel.org> # 3.15
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  14. 28 Jun 2014 (1 commit)
    • percpu-refcount: require percpu_ref to be exited explicitly · 9a1049da
      Committed by Tejun Heo
      Currently, a percpu_ref undoes percpu_ref_init() automatically by
      freeing the allocated percpu area when the percpu_ref is killed.
      While seemingly convenient, this has the following niggles.
      
      * It's impossible to re-init a released reference counter without
        going through re-allocation.
      
      * In a similar vein, it's impossible to initialize a percpu_ref
        count with static percpu variables.
      
      * We need and have an explicit destructor anyway for failure paths -
        percpu_ref_cancel_init().
      
      This patch removes the automatic percpu counter freeing in
      percpu_ref_kill_rcu() and repurposes percpu_ref_cancel_init() into a
      generic destructor now named percpu_ref_exit().  percpu_ref_destroy()
      is considered but it gets confusing with percpu_ref_kill() while
      "exit" clearly indicates that it's the counterpart of
      percpu_ref_init().
      
      All percpu_ref_cancel_init() users are updated to invoke
      percpu_ref_exit() instead and explicit percpu_ref_exit() calls are
      added to the destruction path of all percpu_ref users.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Cc: Li Zefan <lizefan@huawei.com>
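The new division of labour, where kill stops the ref and exit frees it, can be modelled with an ordinary heap allocation standing in for the percpu area. `struct model_pref` and the `model_pref_*()` helpers are hypothetical and single-threaded. They only show two consequences of kill no longer freeing memory: a killed ref can be re-armed without re-allocation, and it must be released explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the life cycle after this change. */
struct model_pref {
	long *counters;		/* stands in for the percpu area */
	bool dead;
};

static int model_pref_init(struct model_pref *ref)
{
	ref->counters = calloc(4, sizeof(*ref->counters));	/* pretend: 4 CPUs */
	ref->dead = false;
	return ref->counters ? 0 : -1;
}

/* Kill no longer frees anything; the counter area stays allocated. */
static void model_pref_kill(struct model_pref *ref)
{
	ref->dead = true;
}

/* Re-arming a killed ref needs no re-allocation. */
static void model_pref_reinit(struct model_pref *ref)
{
	assert(ref->dead && ref->counters);
	ref->dead = false;
}

/* Explicit destructor, also usable on a failure path right after init. */
static void model_pref_exit(struct model_pref *ref)
{
	free(ref->counters);
	ref->counters = NULL;
}
```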
  15. 18 Jun 2014 (1 commit)
  16. 05 Jun 2014 (2 commits)
    • cgroup: disallow disabled controllers on the default hierarchy · c731ae1d
      Committed by Li Zefan
      After booting with cgroup_disable=memory, I still saw memcg files
      in the default hierarchy, and I can write to them, though it won't
      take effect.
      
        # dmesg
        ...
        Disabling memory control group subsystem
        ...
        # mount -t cgroup -o __DEVEL__sane_behavior xxx /cgroup
        # ls /cgroup
        ...
        memory.failcnt                   memory.move_charge_at_immigrate
        memory.force_empty               memory.numa_stat
        memory.limit_in_bytes            memory.oom_control
        ...
        # cat /cgroup/memory.usage_in_bytes
        0
      
      tj: Minor comment update.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: don't destroy the default root · 1f779fb2
      Committed by Li Zefan
      The default root is allocated and initialized at boot phase, so we
      shouldn't destroy the default root when it's umounted, otherwise
      it will lead to disaster.
      
      Just try mount and then umount the default root, and the kernel will
      crash immediately.
      
      v2:
      - No need to check for CSS_NO_REF in cgroup_get/put(). (Tejun)
      - Better call cgroup_put() for the default root in kill_sb(). (Tejun)
      - Add a comment.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  17. 03 Jun 2014 (1 commit)
  18. 28 May 2014 (1 commit)