1. 04 Mar, 2016: 1 commit
  2. 03 Mar, 2016: 19 commits
    • cgroup: update css iteration in cgroup_update_dfl_csses() · 54962604
      Committed by Tejun Heo
      The existing sequences of operations ensure that the offlining csses
      are drained before cgroup_update_dfl_csses(), so even though
      cgroup_update_dfl_csses() uses css_for_each_descendant_pre() to walk
      the target cgroups, it doesn't end up operating on dead cgroups.
      Also, the function explicitly excludes the subtree root from
      operation.
      
      This is fragile and inconsistent with the rest of css update
      operations.  This patch updates cgroup_update_dfl_csses() to use
      cgroup_for_each_live_descendant_pre() instead and include the subtree
      root.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
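      The sketch below shows a live pre-order walk that includes the subtree root. It assumes a hypothetical first-child/next-sibling `struct node` whose `dying` flag stands in for an offlining cgroup. It is not the kernel's cgroup_for_each_live_descendant_pre() macro. It only illustrates the traversal order that the text describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cgroup node (first-child/next-sibling tree). */
struct node {
	int id;
	bool dying;                 /* stands in for an offlining cgroup */
	struct node *parent;
	struct node *first_child;
	struct node *next_sibling;
};

/* Pre-order successor of @pos, never leaving the subtree rooted at @root. */
static struct node *next_pre(struct node *pos, struct node *root)
{
	if (pos->first_child)
		return pos->first_child;
	while (pos != root) {
		if (pos->next_sibling)
			return pos->next_sibling;
		pos = pos->parent;
	}
	return NULL;
}

/*
 * Visit @root itself and then its descendants in pre-order, skipping dying
 * nodes. Stores up to @max visited ids in @out and returns how many were
 * visited.
 */
static size_t walk_live_pre(struct node *root, int *out, size_t max)
{
	size_t n = 0;

	for (struct node *pos = root; pos; pos = next_pre(pos, root)) {
		if (pos->dying)
			continue;
		if (n < max)
			out[n] = pos->id;
		n++;
	}
	return n;
}
```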
    • cgroup: allocate 2x cgrp_cset_links when setting up a new root · 04313591
      Committed by Tejun Heo
      During prep, cgroup_setup_root() allocates cgrp_cset_links matching
      the number of existing css_sets to later link the new root.  This is
      fine for now as the only operation which can happen in between is
      rebind_subsystems() and rebinding of empty subsystems doesn't create
      new css_sets.
      
      However, while not yet allowed, with the recent reimplementation,
      rebind_subsystems() can rebind subsystems with descendant csses and
      thus can create new css_sets.  This patch makes cgroup_setup_root()
      allocate twice as many cgrp_cset_links as there are existing css_sets
      so that later use of live subsystem rebinding doesn't blow up.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
    • cgroup: make cgroup_calc_subtree_ss_mask() take @this_ss_mask · 5ced2518
      Committed by Tejun Heo
      cgroup_calc_subtree_ss_mask() currently takes @cgrp and
      @subtree_control.  @cgrp is used for two purposes: to decide whether
      the calculation is for the default hierarchy, and to determine the
      mask of available subsystems.  The former doesn't matter as the
      results are the same regardless.  The latter can be specified directly
      through a subsystem mask.
      
      This patch makes cgroup_calc_subtree_ss_mask() perform the same
      calculations for both default and legacy hierarchies and take
      @this_ss_mask for available subsystems.  @cgrp is no longer used and
      dropped.  This is to allow using the function in contexts where
      available controllers can't be decided from the cgroup.
      
      v2: cgroup_refresh_subtree_ss_mask() is removed by a previous patch.
          Updated accordingly.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
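      The sketch below shows the kind of fixed-point calculation that a function like cgroup_calc_subtree_ss_mask() can perform once it takes an explicit @this_ss_mask. It starts from @subtree_control and repeatedly adds the dependencies of every enabled subsystem. It then masks the result with the available subsystems and stops once the value no longer changes. The four-entry depends_on[] table is a hypothetical assumption for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SUBSYS 4

/* Hypothetical dependency table: subsystems that subsystem i needs. */
static const uint16_t depends_on[NR_SUBSYS] = {
	1u << 1,   /* ss0 depends on ss1 */
	1u << 2,   /* ss1 depends on ss2 */
	0,         /* ss2 has no dependencies */
	0,         /* ss3 has no dependencies */
};

/* No @cgrp argument: availability comes solely from @this_ss_mask. */
static uint16_t calc_subtree_ss_mask(uint16_t subtree_control,
				     uint16_t this_ss_mask)
{
	uint16_t cur = subtree_control;

	for (;;) {
		uint16_t next = cur;

		/* Pull in dependencies of everything currently enabled. */
		for (int ssid = 0; ssid < NR_SUBSYS; ssid++)
			if (cur & (1u << ssid))
				next |= depends_on[ssid];

		/* Only subsystems available here can end up enabled. */
		next &= this_ss_mask;
		if (next == cur)
			return cur;
		cur = next;
	}
}
```

      After the first pass the mask only grows within @this_ss_mask, so the loop is guaranteed to terminate.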
    • cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends · 334c3679
      Committed by Tejun Heo
      rebind_subsystems() open-codes quite a bit of css and interface file
      manipulations.  It tries to be fail-safe but doesn't quite achieve it.
      It can be greatly simplified by using the new css management helpers.
      This patch reimplements rebind_subsystems() using
      cgroup_apply_control() and friends.
      
      * The half-baked rollback on file creation failure is dropped.  It is
        an extremely cold path, failure isn't critical, and, aside from
        kernel bugs, the only reason it can fail is memory allocation
        failure which pretty much doesn't happen for small allocations.
      
      * As cgroup_apply_control_disable() is now used to clean up root
        cgroup on rebind, make sure that it doesn't end up killing root
        csses.
      
      * All callers of rebind_subsystems() are updated to use
        cgroup_lock_and_drain_offline() as the apply_control functions
        require a drained subtree.
      
      * This leaves cgroup_refresh_subtree_ss_mask() without any user.
        Removed.
      
      * css_populate_dir() and css_clear_dir() no longer need the
        @cgrp_override parameter.  Dropped.
      
      * While at it, add WARN_ON() to rebind_subsystems() calls which are
        expected to always succeed, just in case.
      
      While the rules visible to userland aren't changed, this
      reimplementation not only simplifies rebind_subsystems() but also
      allows it to disable and enable csses recursively.  This can be used
      to implement more flexible rebinding.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
    • cgroup: use cgroup_apply_enable_control() in cgroup creation path · 03970d3c
      Committed by Tejun Heo
      cgroup_create() manually updates control masks and creates child csses
      which cgroup_mkdir() then manually populates.  Both can be simplified
      by using cgroup_apply_enable_control() and friends.  The only catch is
      that this calls css_populate_dir() with a NULL cgroup->kn during
      cgroup_create().  This is worked around by making css_populate_dir()
      a noop when kn is NULL.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
    • cgroup: combine cgroup_mutex locking and offline css draining · 945ba199
      Committed by Tejun Heo
      cgroup_drain_offline() is used to wait for csses being offlined to
      uninstall themselves from the cgroup->subsys[] array so that new csses can be
      installed.  The function's only user, cgroup_subtree_control_write(),
      calls it after performing some checks and restarts the whole process
      via restart_syscall() if draining has to release cgroup_mutex to wait.
      
      This can be simplified by draining before other synchronized
      operations so that there's nothing to restart.  This patch converts
      cgroup_drain_offline() to cgroup_lock_and_drain_offline() which
      performs both locking and draining, and updates cgroup_kn_lock_live()
      to use it instead of plain cgroup_mutex locking if requested.  The
      combined locking and draining operation is easier to use and less
      error-prone.
      
      While at it, add WARNs in the control_apply functions which trigger if
      the subtree isn't properly drained.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
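      The sketch below is a single-threaded model of the lock-and-drain pattern and rests on several assumptions. A bool stands in for cgroup_mutex, and a flat array of slots stands in for the subtree. The hypothetical finish_offline() callback simulates an offlining css that finally uninstalls itself. The loop scans under the lock. If it finds a dying css, it drops the lock, waits and rescans from the top. As a result the caller only ever regains control with the lock held and nothing left to drain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical css slot: a dying css occupies it until offlining ends. */
struct css_slot {
	bool dying;
};

struct model {
	bool locked;                              /* stands in for cgroup_mutex */
	struct css_slot *slots;                   /* stands in for the subtree */
	size_t nr;
	void (*wait_offline)(struct css_slot *);  /* called with lock dropped */
	int restarts;
};

/* Simulated completion of offlining: the css uninstalls itself. */
static void finish_offline(struct css_slot *slot)
{
	slot->dying = false;
}

/* Return with the lock held and no dying css left in the subtree. */
static void lock_and_drain_offline(struct model *m)
{
restart:
	m->locked = true;
	for (size_t i = 0; i < m->nr; i++) {
		if (!m->slots[i].dying)
			continue;
		/* Never wait with the lock held: drop it, wait, rescan. */
		m->locked = false;
		m->wait_offline(&m->slots[i]);
		m->restarts++;
		goto restart;
	}
}
```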
    • cgroup: factor out cgroup_{apply|finalize}_control() from cgroup_subtree_control_write() · f7b2814b
      Committed by Tejun Heo
      Factor out cgroup_{apply|finalize}_control() so that control mask
      update can be done in several simple steps.  This patch doesn't
      introduce behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
    • cgroup: introduce cgroup_{save|propagate|restore}_control() · 15a27c36
      Committed by Tejun Heo
      While controllers are being enabled and disabled in
      cgroup_subtree_control_write(), the original subsystem masks are
      stashed in local variables so that they can be restored if the
      operation fails in the middle.
      
      This patch adds dedicated fields to struct cgroup to be used instead
      of the local variables and implements functions to stash the current
      values, propagate the changes and restore them recursively.  Combined
      with the previous changes, this makes subsystem management operations
      fully recursive and modularized.  This will be used to expand cgroup
      core functionalities.
      
      While at it, remove now unused @css_enable and @css_disable from
      cgroup_subtree_control_write().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
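      The sketch below shows the save/propagate/restore sequence on a hypothetical, simplified `struct cg` that carries only subtree_control. It assumes that a child's controllers are limited by its parent's subtree_control, and it leaves out subtree_ss_mask entirely. It illustrates the recursive stash-then-roll-back idea and is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical, simplified cgroup carrying only subtree_control. */
struct cg {
	unsigned subtree_control;
	unsigned old_subtree_control;
	struct cg *parent;
	struct cg *children[MAX_CHILDREN];
	size_t nr_children;
};

/* Stash the current mask of @c and every descendant. */
static void save_control(struct cg *c)
{
	c->old_subtree_control = c->subtree_control;
	for (size_t i = 0; i < c->nr_children; i++)
		save_control(c->children[i]);
}

/* Pre-order: trim each cgroup to what its (already updated) parent allows. */
static void propagate_control(struct cg *c)
{
	if (c->parent)
		c->subtree_control &= c->parent->subtree_control;
	for (size_t i = 0; i < c->nr_children; i++)
		propagate_control(c->children[i]);
}

/* Roll @c and every descendant back to the stashed masks. */
static void restore_control(struct cg *c)
{
	for (size_t i = 0; i < c->nr_children; i++)
		restore_control(c->children[i]);
	c->subtree_control = c->old_subtree_control;
}
```

      A caller would save the masks, modify them, propagate the change and attempt to apply it. If any step fails, it restores the saved masks.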
    • cgroup: make cgroup_drain_offline() and cgroup_apply_control_{disable|enable}() recursive · ce3f1d9d
      Committed by Tejun Heo
      The three factored out css management operations -
      cgroup_drain_offline() and cgroup_apply_control_{disable|enable}() -
      depend only on the current state of the target cgroups and are
      idempotent, and thus can easily be made to operate on the subtree
      instead of the immediate children.
      
      This patch introduces the iterators which walk live subtree and
      converts the three functions to operate on the subtree including self
      instead of the children.  While this leads to some spurious walking and
      is slightly more expensive, it will allow them to be used for a wider
      scope of operations.
      
      Note that cgroup_drain_offline() now tests for whether a css is dying
      before trying to drain it.  This is to avoid trying to drain live
      csses, as a subtree, unlike the children of a single parent, can
      contain a mix of live and dying csses.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
    • cgroup: factor out cgroup_apply_control_enable() from cgroup_subtree_control_write() · bdb53bd7
      Committed by Tejun Heo
      Factor out css enabling and showing into cgroup_apply_control_enable().
      
      * Nest subsystem walk inside child walk.  The child walk will later be
        converted to subtree walk which is a bit more expensive.
      
      * Instead of operating on the differential masks @css_enable, simply
        enable or show csses according to the current cgroup_control() and
        cgroup_ss_mask().  This leads to the same result and is simpler and
        more robust.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
    • cgroup: factor out cgroup_apply_control_disable() from cgroup_subtree_control_write() · 12b3bb6a
      Committed by Tejun Heo
      Factor out css disabling and hiding into cgroup_apply_control_disable().
      
      * Nest subsystem walk inside child walk.  The child walk will later be
        converted to subtree walk which is a bit more expensive.
      
      * Instead of operating on the differential masks @css_enable and
        @css_disable, simply disable or hide csses according to the current
        cgroup_control() and cgroup_ss_mask().  This leads to the same
        result and is simpler and more robust.
      
      * This allows the error handling path to share the same code.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
    • cgroup: factor out cgroup_drain_offline() from cgroup_subtree_control_write() · 1b9b96a1
      Committed by Tejun Heo
      Factor out async css offline draining into cgroup_drain_offline().
      
      * Nest subsystem walk inside child walk.  The child walk will later be
        converted to subtree walk which is a bit more expensive.
      
      * Relocate the draining above subsystem mask preparation, which
        doesn't create any behavior differences but helps further
        refactoring.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
    • cgroup: introduce cgroup_control() and cgroup_ss_mask() · 5531dc91
      Committed by Tejun Heo
      Whether a controller is enabled and visible on a non-root cgroup is
      determined by the subtree_control and subtree_ss_mask of the parent
      cgroup.  For a root cgroup, it is determined by the type of the
      hierarchy and by which controllers are attached to it.  Deciding the
      above on each usage is fragile and unnecessarily complicates the users.
      
      This patch introduces cgroup_control() and cgroup_ss_mask() which
      calculate and return the [visibly] enabled subsystem mask for the
      specified cgroup, and converts the existing usages.
      
      * cgroup_e_css() is restructured for simplicity.
      
      * cgroup_calc_subtree_ss_mask() and cgroup_subtree_control_write() no
        longer need to distinguish root and non-root cases.
      
      * With cgroup_control(), cgroup_controllers_show() can now handle both
        root and non-root cases.  cgroup_root_controllers_show() is removed.
      
      v2: cgroup_control() updated to yield the correct result on v1
          hierarchies too.  cgroup_subtree_control_write() converted.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
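      The sketch below shows, with hypothetical structures, how the two helpers split the work. A non-root cgroup takes its masks from its parent, and a root cgroup takes them from the controllers attached to its hierarchy. It ignores the hierarchy-type distinction mentioned above, so read it as an illustration rather than as the actual cgroup_control() and cgroup_ss_mask().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified structures. */
struct hierarchy {
	uint16_t subsys_mask;      /* controllers attached to this hierarchy */
};

struct cg {
	struct cg *parent;         /* NULL for the root cgroup */
	struct hierarchy *root;
	uint16_t subtree_control;  /* visibly enabled for the children */
	uint16_t subtree_ss_mask;  /* also includes implicitly enabled ones */
};

/* Controllers visibly enabled on @c. */
static uint16_t cg_control(const struct cg *c)
{
	if (c->parent)
		return c->parent->subtree_control;
	return c->root->subsys_mask;
}

/* All controllers enabled on @c, visible or not. */
static uint16_t cg_ss_mask(const struct cg *c)
{
	if (c->parent)
		return c->parent->subtree_ss_mask;
	return c->root->subsys_mask;
}
```

      Callers can then ask for "the mask of @c" without branching on whether @c is a root.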
    • cgroup: factor out cgroup_create() out of cgroup_mkdir() · a5bca215
      Committed by Tejun Heo
      We're in the process of refactoring cgroup and css management paths to
      separate them out to eventually allow cgroups which aren't visible
      through cgroup fs.  This patch factors cgroup_create() out of
      cgroup_mkdir().  cgroup_create() contains all internal object creation
      and initialization.  cgroup_mkdir() uses cgroup_create() to create the
      internal cgroup and adds interface directory and file creation.
      
      This patch doesn't cause any behavior differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
    • cgroup: reorder operations in cgroup_mkdir() · 195e9b6c
      Committed by Tejun Heo
      Currently, operations to initialize internal objects and create
      interface directory and files are intermixed in cgroup_mkdir().  We're
      in the process of refactoring cgroup and css management paths to
      separate them out to eventually allow cgroups which aren't visible
      through cgroup fs.
      
      This patch reorders operations inside cgroup_mkdir() so that interface
      directory and file handling comes after internal object
      initialization.  This will enable further refactoring.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
    • cgroup: explicitly track whether a cgroup_subsys_state is visible to userland · 88cb04b9
      Committed by Tejun Heo
      Currently, whether a css (cgroup_subsys_state) has its interface files
      created is not tracked and assumed to change together with the owning
      cgroup's lifecycle.  cgroup directory and interface creation is being
      separated out from internal object creation to help refactoring and
      eventually allow cgroups which are not visible through cgroupfs.
      
      This patch adds CSS_VISIBLE to track whether a css has its interface
      files created and to perform management operations only when
      necessary, which helps decouple interface file handling from internal
      object lifecycle.  After this patch, all css interface file management
      functions can be called regardless of the current state and will
      achieve the expected result.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
    • cgroup: separate out interface file creation from css creation · 6cd0f5bb
      Committed by Tejun Heo
      Currently, interface files are created when a css is created depending
      on whether @visible is set.  This patch separates out the two into
      separate steps to help code refactoring and eventually allow cgroups
      which aren't visible through cgroup fs.
      
      Move css_populate_dir() out of create_css() and drop @visible.  While
      at it, rename the function to css_create() for consistency.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
    • cgroup: suppress spurious de-populated events · 20b454a6
      Committed by Tejun Heo
      During task migration, tasks may transfer between two css_sets which
      are associated with the same cgroup.  If those tasks are the only
      tasks in the cgroup, this currently triggers a spurious de-populated
      event on the cgroup.
      
      Fix it by bumping up the populated count before bumping it down during
      migration to ensure that it doesn't spuriously reach zero.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
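      The sketch below shows why the ordering matters. It assumes a hypothetical per-cgroup counter that fires an event whenever the counter moves between zero and non-zero. Migrating a task between two css_sets of the same cgroup is modelled here by passing the same cgroup as both source and destination.

```c
#include <assert.h>

/* Hypothetical cgroup: populated counter plus a count of emitted events. */
struct cg {
	int nr_populated;
	int nr_events;
};

/* Apply @delta and emit an event on every zero <-> non-zero transition. */
static void update_populated(struct cg *c, int delta)
{
	int was = c->nr_populated != 0;

	c->nr_populated += delta;
	if (was != (c->nr_populated != 0))
		c->nr_events++;
}

/* Old order: the source goes down first and may briefly hit zero. */
static void migrate_unfixed(struct cg *from, struct cg *to)
{
	update_populated(from, -1);
	update_populated(to, +1);
}

/* Fixed order: bump the destination up before bumping the source down. */
static void migrate_fixed(struct cg *from, struct cg *to)
{
	update_populated(to, +1);
	update_populated(from, -1);
}
```

      With the old order, a cgroup holding a single task reports a de-populated event followed by a populated event. With the fixed order it reports nothing.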
    • cgroup: re-hash init_css_set after subsystems are initialized · 2378d8b8
      Committed by Tejun Heo
      css_sets are hashed by their subsys[] contents.  In cgroup_init(),
      init_css_set is hashed early, before subsystem inits, when all entries
      in its subsys[] are NULL, so that cgroup_dfl_root initialization can
      find and link to it.  As subsystems are initialized,
      init_css_set.subsys[] is filled up but the hashing is never updated,
      leaving init_css_set hashed in the wrong place.  While incorrect, this
      doesn't cause a critical failure as css_set management code would
      create an identical css_set dynamically.
      
      Fix it by rehashing init_css_set after subsystems are initialized.
      While at it, drop unnecessary @key local variable.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
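      The sketch below illustrates the underlying hazard. It uses a hypothetical `struct css_set` whose subsys[] entries are modelled as small integers, together with a toy hash function. Once the contents change, the bucket the object was filed under no longer matches a lookup by contents, and it stays that way until the object is rehashed. A real hash table would also unlink the entry from its old bucket.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SUBSYS 3
#define NR_BUCKETS 16

/* Hypothetical css_set; 0 in subsys[] means "no css yet". */
struct css_set {
	int subsys[NR_SUBSYS];
	unsigned bucket;            /* bucket it is currently filed under */
};

/* Toy hash over the subsys[] contents. */
static unsigned css_set_hash(const int *subsys)
{
	unsigned key = 0;

	for (int i = 0; i < NR_SUBSYS; i++)
		key = key * 31 + (unsigned)subsys[i];
	return key % NR_BUCKETS;
}

/* File (or re-file) @cset under the bucket matching its contents. */
static void css_set_rehash(struct css_set *cset)
{
	cset->bucket = css_set_hash(cset->subsys);
}

/* Would a lookup by contents find @cset where it is filed? */
static int css_set_findable(const struct css_set *cset)
{
	return cset->bucket == css_set_hash(cset->subsys);
}
```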
  3. 02 Mar, 2016: 1 commit
    • cgroup: reset css on destruction · fa06235b
      Committed by Vladimir Davydov
      An associated css can be around for quite a while after a cgroup
      directory has been removed. In general, it makes sense to reset it to
      defaults so as not to worry about any remnants. For instance, memory
      cgroup needs to reset memory.low, otherwise pages charged to a dead
      cgroup might never get reclaimed. There's a ->css_reset callback, which
      would fit perfectly for the purpose. Currently, it's only called when a
      subsystem is disabled in the unified hierarchy and there are other
      subsystems dependent on it. Let's call it on css destruction as well.
      Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  4. 27 Feb, 2016: 1 commit
  5. 23 Feb, 2016: 10 commits
  6. 17 Feb, 2016: 1 commit
  7. 13 Feb, 2016: 1 commit
    • cgroup: provide cgroup_nov1= to disable controllers in v1 mounts · 223ffb29
      Committed by Johannes Weiner
      Testing cgroup2 can be painful with system software automatically
      mounting and populating all cgroup controllers in v1 mode. Sometimes
      they can be unmounted from rc.local, sometimes even that is too late.
      
      Provide a command-line option to disable certain controllers in v1
      mounts, so that they remain available for cgroup2 mounts.
      
      Example use:
      
      cgroup_no_v1=memory,cpu
      cgroup_no_v1=all
      
      Disabling will be confirmed at boot-time as such:
      
      [    0.013770] Disabling cpu control group subsystem in v1 mounts
      [    0.016004] Disabling memory control group subsystem in v1 mounts
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
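      The sketch below gives a rough idea of how a value such as "memory,cpu" or "all" can be turned into a subsystem mask. The controller table and parse_no_v1() are hypothetical. This is not the kernel's boot-parameter handler, and unknown or empty names are simply ignored here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical controller table; the bit index is the array index. */
static const char *const subsys_names[] = { "cpu", "memory", "io", "pids" };
#define NR_SUBSYS (sizeof(subsys_names) / sizeof(subsys_names[0]))

/* Parse a comma-separated list into a mask of controllers to hide from v1. */
static uint16_t parse_no_v1(const char *str)
{
	uint16_t mask = 0;

	while (*str) {
		size_t len = strcspn(str, ",");   /* length of current token */

		if (len == 3 && strncmp(str, "all", 3) == 0)
			return UINT16_MAX;

		for (size_t i = 0; i < NR_SUBSYS; i++)
			if (strlen(subsys_names[i]) == len &&
			    strncmp(str, subsys_names[i], len) == 0)
				mask |= (uint16_t)(1u << i);

		str += len;
		if (*str == ',')
			str++;                    /* skip the separator */
	}
	return mask;
}
```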
  8. 31 Jan, 2016: 1 commit
  9. 29 Jan, 2016: 1 commit
  10. 22 Jan, 2016: 4 commits
    • cgroup: make sure a parent css isn't freed before its children · 8bb5ef79
      Committed by Tejun Heo
      There are three subsystem callbacks in css shutdown path -
      css_offline(), css_released() and css_free().  Except for
      css_released(), cgroup core didn't guarantee the order of invocation.
      css_offline() or css_free() could be called on a parent css before its
      children.  This behavior is unexpected and led to bugs in the cpu and
      memory controllers.
      
      The previous patch updated ordering for css_offline() which fixes the
      cpu controller issue.  While there currently isn't a known bug caused
      by misordering of css_free() invocations, let's fix it too for
      consistency.
      
      css_free() ordering can be trivially fixed by moving putting of the
      parent css below css_free() invocation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
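      The sketch below shows the ordering fix using a hypothetical refcounted `struct css` in which each child holds a reference on its parent. The parent reference is dropped only after the child's own free step has run. Because of that, the parent's free step can never run first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted css; a child holds one reference on its parent. */
struct css {
	struct css *parent;
	int refcnt;
	int freed_seq;      /* order in which the free step ran, 0 if not yet */
};

static int free_seq;

static void css_put(struct css *css);

static void css_free(struct css *css)
{
	struct css *parent = css->parent;

	css->freed_seq = ++free_seq;   /* the subsystem free callback goes here */
	/* Put the parent only after this css has been freed. */
	if (parent)
		css_put(parent);
}

static void css_put(struct css *css)
{
	if (--css->refcnt == 0)
		css_free(css);
}
```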
    • cgroup: make sure a parent css isn't offlined before its children · aa226ff4
      Committed by Tejun Heo
      There are three subsystem callbacks in css shutdown path -
      css_offline(), css_released() and css_free().  Except for
      css_released(), cgroup core didn't guarantee the order of invocation.
      css_offline() or css_free() could be called on a parent css before its
      children.  This behavior is unexpected and led to bugs in the cpu and
      memory controllers.
      
      This patch updates offline path so that a parent css is never offlined
      before its children.  Each css keeps online_cnt which reaches zero iff
      itself and all its children are offline and offline_css() is invoked
      only after online_cnt reaches zero.
      
      This fixes the memory controller bug and allows the fix for the cpu
      controller.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reported-by: Brian Christiansen <brian.o.christiansen@gmail.com>
      Link: http://lkml.kernel.org/g/5698A023.9070703@de.ibm.com
      Link: http://lkml.kernel.org/g/CAKB58ikDkzc8REt31WBkD99+hxNzjK4+FBmhkgS+NVrC9vjMSg@mail.gmail.com
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
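      The sketch below models the counting scheme described above using a hypothetical `struct css`. online_cnt holds one reference for the css itself plus one for each online child. The offline step walks upward only while the counts drop to zero. Locking and any asynchronous execution are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical css: online_cnt = 1 for itself + 1 per online child. */
struct css {
	struct css *parent;
	int online_cnt;
	int offlined;       /* set once the offline step has run */
	int offline_seq;    /* order in which it was offlined */
};

static int seq;

static void css_online(struct css *css)
{
	css->online_cnt++;                /* reference for itself */
	if (css->parent)
		css->parent->online_cnt++;    /* reference held by this child */
}

static void offline_css(struct css *css)
{
	css->offlined = 1;
	css->offline_seq = ++seq;
}

/* Drop @css's own reference; offline it and walk up while counts hit zero. */
static void css_kill(struct css *css)
{
	if (--css->online_cnt)
		return;                       /* children are still online */
	do {
		offline_css(css);
		css = css->parent;
	} while (css && --css->online_cnt == 0);
}
```

      Killing the parent first only drops its self reference. The parent is offlined once its last online child goes away.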
    • cpuset: make mm migration asynchronous · e93ad19d
      Committed by Tejun Heo
      If "cpuset.memory_migrate" is set, when a process is moved from one
      cpuset to another with a different memory node mask, pages in use by
      the process are migrated to the new set of nodes.  This was performed
      synchronously in the ->attach() callback, which is synchronized
      against process management.  Recently, the synchronization was changed
      from per-process rwsem to global percpu rwsem for simplicity and
      optimization.
      
      Combined with the synchronous mm migration, this led to deadlocks
      because mm migration could schedule a work item which may in turn try
      to create a new worker blocking on the process management lock held
      by the cgroup process migration path.
      
      Such a heavy operation shouldn't be performed synchronously from that
      deep inside cgroup migration in the first place.  This patch punts the
      actual migration to an ordered workqueue and updates cgroup process
      migration and cpuset config update paths to flush the workqueue after
      all locks are released.  This way, the operations still seem
      synchronous to userland without entangling mm migration with process
      management synchronization.  CPU hotplug can also invoke mm migration
      but there's no reason for it to wait for mm migrations and thus
      doesn't synchronize against their completions.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: stable@vger.kernel.org # v4.4+
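      The sketch below is a single-threaded model of the "queue under the lock, flush after unlocking" structure. It assumes a bool in place of the process management lock and a fixed-size FIFO in place of the ordered workqueue, and all names are hypothetical. It only demonstrates that the heavy step never runs with the lock held, while the caller still sees the step as completed when the call returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK 8

static bool procs_locked;              /* stands in for the process lock */
static int pending[MAX_WORK];          /* stands in for the ordered queue */
static size_t nr_pending;
static int migrated[MAX_WORK];
static size_t nr_migrated;

/* Attach path, lock held: only queue the migration. */
static void queue_mm_migration(int pid)
{
	assert(procs_locked && nr_pending < MAX_WORK);
	pending[nr_pending++] = pid;
}

/* The heavy operation; it must never run under the lock. */
static void do_mm_migration(int pid)
{
	assert(!procs_locked && nr_migrated < MAX_WORK);
	migrated[nr_migrated++] = pid;
}

/* Run queued work in FIFO order, like flushing an ordered workqueue. */
static void flush_mm_migrations(void)
{
	for (size_t i = 0; i < nr_pending; i++)
		do_mm_migration(pending[i]);
	nr_pending = 0;
}

/* Queue under the lock, then flush once every lock is released. */
static void attach_processes(const int *pids, size_t n)
{
	procs_locked = true;
	for (size_t i = 0; i < n; i++)
		queue_mm_migration(pids[i]);
	procs_locked = false;
	flush_mm_migrations();
}
```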
    • Merge branch 'for-4.5/nvme' of git://git.kernel.dk/linux-block · 3e1e21c7
      Committed by Linus Torvalds
      Pull NVMe updates from Jens Axboe:
       "Last branch for this series is the nvme changes.  It's in a separate
        branch to avoid splitting too much between core and NVMe changes,
        since NVMe is still helping drive some blk-mq changes.  That said, not
        a huge amount of core changes in here.  The grunt of the work is the
        continued split of the code"
      
      * 'for-4.5/nvme' of git://git.kernel.dk/linux-block: (67 commits)
        uapi: update install list after nvme.h rename
        NVMe: Export NVMe attributes to sysfs group
        NVMe: Shutdown controller only for power-off
        NVMe: IO queue deletion re-write
        NVMe: Remove queue freezing on resets
        NVMe: Use a retryable error code on reset
        NVMe: Fix admin queue ring wrap
        nvme: make SG_IO support optional
        nvme: fixes for NVME_IOCTL_IO_CMD on the char device
        nvme: synchronize access to ctrl->namespaces
        nvme: Move nvme_freeze/unfreeze_queues to nvme core
        PCI/AER: include header file
        NVMe: Export namespace attributes to sysfs
        NVMe: Add pci error handlers
        block: remove REQ_NO_TIMEOUT flag
        nvme: merge iod and cmd_info
        nvme: meta_sg doesn't have to be an array
        nvme: properly free resources for cancelled command
        nvme: simplify completion handling
        nvme: special case AEN requests
        ...