1. 18 8月, 2017 2 次提交
  2. 12 8月, 2017 1 次提交
  3. 11 8月, 2017 1 次提交
    • T
      cgroup: misc changes · 3e48930c
      Tejun Heo 提交于
      Misc trivial changes to prepare for future changes.  No functional
      difference.
      
      * Expose cgroup_get(), cgroup_tryget() and cgroup_parent().
      
      * Implement task_dfl_cgroup() which dereferences css_set->dfl_cgrp.
      
      * Rename cgroup_stats_show() to cgroup_stat_show() for consistency
        with the file name.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      3e48930c
  4. 03 8月, 2017 5 次提交
    • T
      cgroup: short-circuit cset_cgroup_from_root() on the default hierarchy · 13d82fb7
      Tejun Heo 提交于
      Each css_set directly points to the default cgroup it belongs to, so
      there's no reason to walk the cgrp_links list on the default
      hierarchy.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      13d82fb7
    • R
      cgroup: re-use the parent pointer in cgroup_destroy_locked() · 5a621e6c
      Roman Gushchin 提交于
      As we already have a pointer to the parent cgroup in
      cgroup_destroy_locked(), we don't need to calculate it again
      to pass as an argument for cgroup1_check_for_release().
      Signed-off-by: NRoman Gushchin <guro@fb.com>
      Suggested-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: kernel-team@fb.com
      Cc: linux-kernel@vger.kernel.org
      5a621e6c
    • R
      cgroup: add cgroup.stat interface with basic hierarchy stats · ec39225c
      Roman Gushchin 提交于
      A cgroup can consume resources even after being deleted by a user.
      For example, writing back dirty pages should be accounted and
      limited, despite the corresponding cgroup might contain no processes
      and being deleted by a user.
      
      In the current implementation a cgroup can remain in such "dying" state
      for an undefined amount of time. For instance, if a memory cgroup
      contains a pge, mlocked by a process belonging to an other cgroup.
      
      Although the lifecycle of a dying cgroup is out of user's control,
      it's important to have some insight of what's going on under the hood.
      
      In particular, it's handy to have a counter which will allow
      to detect css leaks.
      
      To solve this problem, add a cgroup.stat interface to
      the base cgroup control files with the following metrics:
      
      nr_descendants		total number of visible descendant cgroups
      nr_dying_descendants	total number of dying descendant cgroups
      Signed-off-by: NRoman Gushchin <guro@fb.com>
      Suggested-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: kernel-team@fb.com
      Cc: cgroups@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      ec39225c
    • R
      cgroup: implement hierarchy limits · 1a926e0b
      Roman Gushchin 提交于
      Creating cgroup hierearchies of unreasonable size can affect
      overall system performance. A user might want to limit the
      size of cgroup hierarchy. This is especially important if a user
      is delegating some cgroup sub-tree.
      
      To address this issue, introduce an ability to control
      the size of cgroup hierarchy.
      
      The cgroup.max.descendants control file allows to set the maximum
      allowed number of descendant cgroups.
      The cgroup.max.depth file controls the maximum depth of the cgroup
      tree. Both are single value r/w files, with "max" default value.
      
      The control files exist on each hierarchy level (including root).
      When a new cgroup is created, we check the total descendants
      and depth limits on each level, and if none of them are exceeded,
      a new cgroup is created.
      
      Only alive cgroups are counted, removed (dying) cgroups are
      ignored.
      Signed-off-by: NRoman Gushchin <guro@fb.com>
      Suggested-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: kernel-team@fb.com
      Cc: cgroups@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      1a926e0b
    • R
      cgroup: keep track of number of descent cgroups · 0679dee0
      Roman Gushchin 提交于
      Keep track of the number of online and dying descent cgroups.
      
      This data will be used later to add an ability to control cgroup
      hierarchy (limit the depth and the number of descent cgroups)
      and display hierarchy stats.
      Signed-off-by: NRoman Gushchin <guro@fb.com>
      Suggested-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: kernel-team@fb.com
      Cc: cgroups@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      0679dee0
  5. 26 7月, 2017 2 次提交
  6. 21 7月, 2017 6 次提交
    • W
      cgroup: update debug controller to print out thread mode information · 7a0cf0e7
      Waiman Long 提交于
      Update debug controller so that it prints out debug info about thread
      mode.
      
       1) The relationship between proc_cset and threaded_csets are displayed.
       2) The status of being a thread root or threaded cgroup is displayed.
      
      This patch is extracted from Waiman's larger patch.
      
      v2: - Removed [thread root] / [threaded] from debug.cgroup_css_links
            file as the same information is available from cgroup.type.
            Suggested by Waiman.
          - Threaded marking is moved to the previous patch.
      Patch-originally-by: NWaiman Long <longman@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      7a0cf0e7
    • T
      cgroup: implement cgroup v2 thread support · 8cfd8147
      Tejun Heo 提交于
      This patch implements cgroup v2 thread support.  The goal of the
      thread mode is supporting hierarchical accounting and control at
      thread granularity while staying inside the resource domain model
      which allows coordination across different resource controllers and
      handling of anonymous resource consumptions.
      
      A cgroup is always created as a domain and can be made threaded by
      writing to the "cgroup.type" file.  When a cgroup becomes threaded, it
      becomes a member of a threaded subtree which is anchored at the
      closest ancestor which isn't threaded.
      
      The threads of the processes which are in a threaded subtree can be
      placed anywhere without being restricted by process granularity or
      no-internal-process constraint.  Note that the threads aren't allowed
      to escape to a different threaded subtree.  To be used inside a
      threaded subtree, a controller should explicitly support threaded mode
      and be able to handle internal competition in the way which is
      appropriate for the resource.
      
      The root of a threaded subtree, the nearest ancestor which isn't
      threaded, is called the threaded domain and serves as the resource
      domain for the whole subtree.  This is the last cgroup where domain
      controllers are operational and where all the domain-level resource
      consumptions in the subtree are accounted.  This allows threaded
      controllers to operate at thread granularity when requested while
      staying inside the scope of system-level resource distribution.
      
      As the root cgroup is exempt from the no-internal-process constraint,
      it can serve as both a threaded domain and a parent to normal cgroups,
      so, unlike non-root cgroups, the root cgroup can have both domain and
      threaded children.
      
      Internally, in a threaded subtree, each css_set has its ->dom_cset
      pointing to a matching css_set which belongs to the threaded domain.
      This ensures that thread root level cgroup_subsys_state for all
      threaded controllers are readily accessible for domain-level
      operations.
      
      This patch enables threaded mode for the pids and perf_events
      controllers.  Neither has to worry about domain-level resource
      consumptions and it's enough to simply set the flag.
      
      For more details on the interface and behavior of the thread mode,
      please refer to the section 2-2-2 in Documentation/cgroup-v2.txt added
      by this patch.
      
      v5: - Dropped silly no-op ->dom_cgrp init from cgroup_create().
            Spotted by Waiman.
          - Documentation updated as suggested by Waiman.
          - cgroup.type content slightly reformatted.
          - Mark the debug controller threaded.
      
      v4: - Updated to the general idea of marking specific cgroups
            domain/threaded as suggested by PeterZ.
      
      v3: - Dropped "join" and always make mixed children join the parent's
            threaded subtree.
      
      v2: - After discussions with Waiman, support for mixed thread mode is
            added.  This should address the issue that Peter pointed out
            where any nesting should be avoided for thread subtrees while
            coexisting with other domain cgroups.
          - Enabling / disabling thread mode now piggy backs on the existing
            control mask update mechanism.
          - Bug fixes and cleanup.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      8cfd8147
    • T
      cgroup: implement CSS_TASK_ITER_THREADED · 450ee0c1
      Tejun Heo 提交于
      cgroup v2 is in the process of growing thread granularity support.
      Once thread mode is enabled, the root cgroup of the subtree serves as
      the dom_cgrp to which the processes of the subtree conceptually belong
      and domain-level resource consumptions not tied to any specific task
      are charged.  In the subtree, threads won't be subject to process
      granularity or no-internal-task constraint and can be distributed
      arbitrarily across the subtree.
      
      This patch implements a new task iterator flag CSS_TASK_ITER_THREADED,
      which, when used on a dom_cgrp, makes the iteration include the tasks
      on all the associated threaded css_sets.  "cgroup.procs" read path is
      updated to use it so that reading the file on a proc_cgrp lists all
      processes.  This will also be used by controller implementations which
      need to walk processes or tasks at the resource domain level.
      
      Task iteration is implemented nested in css_set iteration.  If
      CSS_TASK_ITER_THREADED is specified, after walking tasks of each
      !threaded css_set, all the associated threaded css_sets are visited
      before moving onto the next !threaded css_set.
      
      v2: ->cur_pcset renamed to ->cur_dcset.  Updated for the new
          enable-threaded-per-cgroup behavior.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      450ee0c1
    • T
      cgroup: introduce cgroup->dom_cgrp and threaded css_set handling · 454000ad
      Tejun Heo 提交于
      cgroup v2 is in the process of growing thread granularity support.  A
      threaded subtree is composed of a thread root and threaded cgroups
      which are proper members of the subtree.
      
      The root cgroup of the subtree serves as the domain cgroup to which
      the processes (as opposed to threads / tasks) of the subtree
      conceptually belong and domain-level resource consumptions not tied to
      any specific task are charged.  Inside the subtree, threads won't be
      subject to process granularity or no-internal-task constraint and can
      be distributed arbitrarily across the subtree.
      
      This patch introduces cgroup->dom_cgrp along with threaded css_set
      handling.
      
      * cgroup->dom_cgrp points to self for normal and thread roots.  For
        proper thread subtree members, points to the dom_cgrp (the thread
        root).
      
      * css_set->dom_cset points to self if for normal and thread roots.  If
        threaded, points to the css_set which belongs to the cgrp->dom_cgrp.
        The dom_cgrp serves as the resource domain and keeps the matching
        csses available.  The dom_cset holds those csses and makes them
        easily accessible.
      
      * All threaded csets are linked on their dom_csets to enable iteration
        of all threaded tasks.
      
      * cgroup->nr_threaded_children keeps track of the number of threaded
        children.
      
      This patch adds the above but doesn't actually use them yet.  The
      following patches will build on top.
      
      v4: ->nr_threaded_children added.
      
      v3: ->proc_cgrp/cset renamed to ->dom_cgrp/cset.  Updated for the new
          enable-threaded-per-cgroup behavior.
      
      v2: Added cgroup_is_threaded() helper.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      454000ad
    • T
      cgroup: add @flags to css_task_iter_start() and implement CSS_TASK_ITER_PROCS · bc2fb7ed
      Tejun Heo 提交于
      css_task_iter currently always walks all tasks.  With the scheduled
      cgroup v2 thread support, the iterator would need to handle multiple
      types of iteration.  As a preparation, add @flags to
      css_task_iter_start() and implement CSS_TASK_ITER_PROCS.  If the flag
      is not specified, it walks all tasks as before.  When asserted, the
      iterator only walks the group leaders.
      
      For now, the only user of the flag is cgroup v2 "cgroup.procs" file
      which no longer needs to skip non-leader tasks in cgroup_procs_next().
      Note that cgroup v1 "cgroup.procs" can't use the group leader walk as
      v1 "cgroup.procs" doesn't mean "list all thread group leaders in the
      cgroup" but "list all thread group id's with any threads in the
      cgroup".
      
      While at it, update cgroup_procs_show() to use task_pid_vnr() instead
      of task_tgid_vnr().  As the iteration guarantees that the function
      only sees group leaders, this doesn't change the output and will allow
      sharing the function for thread iteration.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      bc2fb7ed
    • T
      cgroup: reorganize cgroup.procs / task write path · 715c809d
      Tejun Heo 提交于
      Currently, writes "cgroup.procs" and "cgroup.tasks" files are all
      handled by __cgroup_procs_write() on both v1 and v2.  This patch
      reoragnizes the write path so that there are common helper functions
      that different write paths use.
      
      While this somewhat increases LOC, the different paths are no longer
      intertwined and each path has more flexibility to implement different
      behaviors which will be necessary for the planned v2 thread support.
      
      v3: - Restructured so that cgroup_procs_write_permission() takes
            @src_cgrp and @dst_cgrp.
      
      v2: - Rolled in Waiman's task reference count fix.
          - Updated on top of nsdelegate changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      715c809d
  7. 17 7月, 2017 3 次提交
  8. 07 7月, 2017 2 次提交
    • V
      mm, cpuset: always use seqlock when changing task's nodemask · 5f155f27
      Vlastimil Babka 提交于
      When updating task's mems_allowed and rebinding its mempolicy due to
      cpuset's mems being changed, we currently only take the seqlock for
      writing when either the task has a mempolicy, or the new mems has no
      intersection with the old mems.
      
      This should be enough to prevent a parallel allocation seeing no
      available nodes, but the optimization is IMHO unnecessary (cpuset
      updates should not be frequent), and we still potentially risk issues if
      the intersection of new and old nodes has limited amount of
      free/reclaimable memory.
      
      Let's just use the seqlock for all tasks.
      
      Link: http://lkml.kernel.org/r/20170517081140.30654-6-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5f155f27
    • V
      mm, mempolicy: simplify rebinding mempolicies when updating cpusets · 213980c0
      Vlastimil Babka 提交于
      Commit c0ff7453 ("cpuset,mm: fix no node to alloc memory when
      changing cpuset's mems") has introduced a two-step protocol when
      rebinding task's mempolicy due to cpuset update, in order to avoid a
      parallel allocation seeing an empty effective nodemask and failing.
      
      Later, commit cc9a6c87 ("cpuset: mm: reduce large amounts of memory
      barrier related damage v3") introduced a seqlock protection and removed
      the synchronization point between the two update steps.  At that point
      (or perhaps later), the two-step rebinding became unnecessary.
      
      Currently it only makes sure that the update first adds new nodes in
      step 1 and then removes nodes in step 2.  Without memory barriers the
      effects are questionable, and even then this cannot prevent a parallel
      zonelist iteration checking the nodemask at each step to observe all
      nodes as unusable for allocation.  We now fully rely on the seqlock to
      prevent premature OOMs and allocation failures.
      
      We can thus remove the two-step update parts and simplify the code.
      
      Link: http://lkml.kernel.org/r/20170517081140.30654-5-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      213980c0
  9. 29 6月, 2017 2 次提交
    • T
      cgroup: implement "nsdelegate" mount option · 5136f636
      Tejun Heo 提交于
      Currently, cgroup only supports delegation to !root users and cgroup
      namespaces don't get any special treatments.  This limits the
      usefulness of cgroup namespaces as they by themselves can't be safe
      delegation boundaries.  A process inside a cgroup can change the
      resource control knobs of the parent in the namespace root and may
      move processes in and out of the namespace if cgroups outside its
      namespace are visible somehow.
      
      This patch adds a new mount option "nsdelegate" which makes cgroup
      namespaces delegation boundaries.  If set, cgroup behaves as if write
      permission based delegation took place at namespace boundaries -
      writes to the resource control knobs from the namespace root are
      denied and migration crossing the namespace boundary aren't allowed
      from inside the namespace.
      
      This allows cgroup namespace to function as a delegation boundary by
      itself.
      
      v2: Silently ignore nsdelegate specified on !init mounts.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Aravind Anbudurai <aru7@fb.com>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      5136f636
    • T
      cgroup: restructure cgroup_procs_write_permission() · 824ecbe0
      Tejun Heo 提交于
      Restructure cgroup_procs_write_permission() to make extending
      permission logic easier.
      
      This patch doesn't cause any functional changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      824ecbe0
  10. 15 6月, 2017 7 次提交
    • T
      cgroup: fix lockdep warning in debug controller · b6053d40
      Tejun Heo 提交于
      The debug controller grabs cgroup_mutex from interface file show
      functions which can deadlock and triggers lockdep warnings.  Fix it by
      using cgroup_kn_lock_live()/cgroup_kn_unlock() instead.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      b6053d40
    • T
      cgroup: refactor cgroup_masks_read() in the debug controller · 2866c0b4
      Tejun Heo 提交于
      Factor out cgroup_masks_read_one() out of cgroup_masks_read() for
      simplicity.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      2866c0b4
    • T
      cgroup: make debug an implicit controller on cgroup2 · 8cc38fa7
      Tejun Heo 提交于
      Make debug an implicit controller on cgroup2 which is enabled by
      "cgroup_debug" boot param.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      8cc38fa7
    • W
      cgroup: Make debug cgroup support v2 and thread mode · 575313f4
      Waiman Long 提交于
      Besides supporting cgroup v2 and thread mode, the following changes
      are also made:
       1) current_* cgroup files now resides only at the root as we don't
          need duplicated files of the same function all over the cgroup
          hierarchy.
       2) The cgroup_css_links_read() function is modified to report
          the number of tasks that are skipped because of overflow.
       3) The number of extra unaccounted references are displayed.
       4) The current_css_set_read() function now prints out the addresses of
          the css'es associated with the current css_set.
       5) A new cgroup_subsys_states file is added to display the css objects
          associated with a cgroup.
       6) A new cgroup_masks file is added to display the various controller
          bit masks in the cgroup.
      
      tj: Dropped thread mode related information for now so that debug
          controller changes aren't blocked on the thread mode.
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      575313f4
    • W
      cgroup: Make Kconfig prompt of debug cgroup more accurate · 23b0be48
      Waiman Long 提交于
      The Kconfig prompt and description of the debug cgroup controller
      more accurate by saying that it is for debug purpose only and its
      interfaces are unstable.
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      23b0be48
    • W
      cgroup: Move debug cgroup to its own file · a28f8f5e
      Waiman Long 提交于
      The debug cgroup currently resides within cgroup-v1.c and is enabled
      only for v1 cgroup. To enable the debug cgroup also for v2, it makes
      sense to put the code into its own file as it will no longer be v1
      specific. There is no change to the debug cgroup specific code.
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      a28f8f5e
    • W
      cgroup: Keep accurate count of tasks in each css_set · 73a7242a
      Waiman Long 提交于
      The reference count in the css_set data structure was used as a
      proxy of the number of tasks attached to that css_set. However, that
      count is actually not an accurate measure especially with thread mode
      support. So a new variable nr_tasks is added to the css_set to keep
      track of the actual task count. This new variable is protected by
      the css_set_lock. Functions that require the actual task count are
      updated to use the new variable.
      
      tj: s/task_count/nr_tasks/ for consistency with cgroup_root->nr_cgrps.
          Refreshed on top of cgroup/for-v4.13 which dropped on
          css_set_populated() -> nr_tasks conversion.
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      73a7242a
  11. 25 5月, 2017 1 次提交
    • T
      cpuset: consider dying css as offline · 41c25707
      Tejun Heo 提交于
      In most cases, a cgroup controller don't care about the liftimes of
      cgroups.  For the controller, a css becomes online when ->css_online()
      is called on it and offline when ->css_offline() is called.
      
      However, cpuset is special in that the user interface it exposes cares
      whether certain cgroups exist or not.  Combined with the RCU delay
      between cgroup removal and css offlining, this can lead to user
      visible behavior oddities where operations which should succeed after
      cgroup removals fail for some time period.  The effects of cgroup
      removals are delayed when seen from userland.
      
      This patch adds css_is_dying() which tests whether offline is pending
      and updates is_cpuset_online() so that the function returns false also
      while offline is pending.  This gets rid of the userland visible
      delays.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
      Link: http://lkml.kernel.org/r/327ca1f5-7957-fbb9-9e5f-9ba149d40ba2@oracle.com
      Cc: stable@vger.kernel.org
      Signed-off-by: NTejun Heo <tj@kernel.org>
      41c25707
  12. 18 5月, 2017 1 次提交
  13. 02 5月, 2017 1 次提交
  14. 29 4月, 2017 2 次提交
    • Z
      cgroup: avoid attaching a cgroup root to two different superblocks, take 2 · 9732adc5
      Zefan Li 提交于
      Commit bfb0b80d ("cgroup: avoid attaching a cgroup root to two
      different superblocks") is broken.  Now we try to fix the race by
      delaying the initialization of cgroup root refcnt until a superblock
      has been allocated.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Reported-by: NAndrei Vagin <avagin@virtuozzo.com>
      Tested-by: NAndrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: NZefan Li <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      9732adc5
    • T
      cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc() · a590b90d
      Tejun Heo 提交于
      cgroup_get() expected to be called only on live cgroups and triggers
      warning on a dead cgroup; however, cgroup_sk_alloc() may be called
      while cloning a socket which is left in an empty and removed cgroup
      and thus may legitimately duplicate its reference on a dead cgroup.
      This currently triggers the following warning spuriously.
      
       WARNING: CPU: 14 PID: 0 at kernel/cgroup.c:490 cgroup_get+0x55/0x60
       ...
        [<ffffffff8107e123>] __warn+0xd3/0xf0
        [<ffffffff8107e20e>] warn_slowpath_null+0x1e/0x20
        [<ffffffff810ff465>] cgroup_get+0x55/0x60
        [<ffffffff81106061>] cgroup_sk_alloc+0x51/0xe0
        [<ffffffff81761beb>] sk_clone_lock+0x2db/0x390
        [<ffffffff817cce06>] inet_csk_clone_lock+0x16/0xc0
        [<ffffffff817e8173>] tcp_create_openreq_child+0x23/0x4b0
        [<ffffffff818601a1>] tcp_v6_syn_recv_sock+0x91/0x670
        [<ffffffff817e8b16>] tcp_check_req+0x3a6/0x4e0
        [<ffffffff81861ba3>] tcp_v6_rcv+0x693/0xa00
        [<ffffffff81837429>] ip6_input_finish+0x59/0x3e0
        [<ffffffff81837cb2>] ip6_input+0x32/0xb0
        [<ffffffff81837387>] ip6_rcv_finish+0x57/0xa0
        [<ffffffff81837ac8>] ipv6_rcv+0x318/0x4d0
        [<ffffffff817778c7>] __netif_receive_skb_core+0x2d7/0x9a0
        [<ffffffff81777fa6>] __netif_receive_skb+0x16/0x70
        [<ffffffff81778023>] netif_receive_skb_internal+0x23/0x80
        [<ffffffff817787d8>] napi_gro_frags+0x208/0x270
        [<ffffffff8168a9ec>] mlx4_en_process_rx_cq+0x74c/0xf40
        [<ffffffff8168b270>] mlx4_en_poll_rx_cq+0x30/0x90
        [<ffffffff81778b30>] net_rx_action+0x210/0x350
        [<ffffffff8188c426>] __do_softirq+0x106/0x2c7
        [<ffffffff81082bad>] irq_exit+0x9d/0xa0 [<ffffffff8188c0e4>] do_IRQ+0x54/0xd0
        [<ffffffff8188a63f>] common_interrupt+0x7f/0x7f <EOI>
        [<ffffffff8173d7e7>] cpuidle_enter+0x17/0x20
        [<ffffffff810bdfd9>] cpu_startup_entry+0x2a9/0x2f0
        [<ffffffff8103edd1>] start_secondary+0xf1/0x100
      
      This patch renames the existing cgroup_get() with the dead cgroup
      warning to cgroup_get_live() after cgroup_kn_lock_live() and
      introduces the new cgroup_get() which doesn't check whether the cgroup
      is live or dead.
      
      All existing cgroup_get() users except for cgroup_sk_alloc() are
      converted to use cgroup_get_live().
      
      Fixes: d979a39d ("cgroup: duplicate cgroup reference when cloning sockets")
      Cc: stable@vger.kernel.org # v4.5+
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: NChris Mason <clm@fb.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      a590b90d
  15. 16 4月, 2017 1 次提交
  16. 11 4月, 2017 2 次提交
    • Z
      cgroup: avoid attaching a cgroup root to two different superblocks · bfb0b80d
      Zefan Li 提交于
      Run this:
      
          touch file0
          for ((; ;))
          {
              mount -t cpuset xxx file0
          }
      
      And this concurrently:
      
          touch file1
          for ((; ;))
          {
              mount -t cpuset xxx file1
          }
      
      We'll trigger a warning like this:
      
       ------------[ cut here ]------------
       WARNING: CPU: 1 PID: 4675 at lib/percpu-refcount.c:317 percpu_ref_kill_and_confirm+0x92/0xb0
       percpu_ref_kill_and_confirm called more than once on css_release!
       CPU: 1 PID: 4675 Comm: mount Not tainted 4.11.0-rc5+ #5
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
       Call Trace:
        dump_stack+0x63/0x84
        __warn+0xd1/0xf0
        warn_slowpath_fmt+0x5f/0x80
        percpu_ref_kill_and_confirm+0x92/0xb0
        cgroup_kill_sb+0x95/0xb0
        deactivate_locked_super+0x43/0x70
        deactivate_super+0x46/0x60
       ...
       ---[ end trace a79f61c2a2633700 ]---
      
      Here's a race:
      
        Thread A				Thread B
      
        cgroup1_mount()
          # alloc a new cgroup root
          cgroup_setup_root()
      					cgroup1_mount()
      					  # no sb yet, returns NULL
      					  kernfs_pin_sb()
      
      					  # but succeeds in getting the refcnt,
      					  # so re-use cgroup root
      					  percpu_ref_tryget_live()
          # alloc sb with cgroup root
          cgroup_do_mount()
      
        cgroup_kill_sb()
      					  # alloc another sb with same root
      					  cgroup_do_mount()
      
      					cgroup_kill_sb()
      
      We end up using the same cgroup root for two different superblocks,
      so percpu_ref_kill() will be called twice on the same root when the
      two superblocks are destroyed.
      
      We should fix to make sure the superblock pinning is really successful.
      
      Cc: stable@vger.kernel.org # 3.16+
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NZefan Li <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      bfb0b80d
    • R
      cpuset: Remove cpuset_update_active_cpus()'s parameter. · 30e03acd
      Rakib Mullick 提交于
      In cpuset_update_active_cpus(), cpu_online isn't used anymore. Remove
      it.
      
      Signed-off-by: Rakib Mullick<rakib.mullick@gmail.com>
      Acked-by: NZefan Li <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      30e03acd
  17. 28 3月, 2017 1 次提交