1. 28 10月, 2014 2 次提交
  2. 25 9月, 2014 1 次提交
    • Z
      cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags · 2ad654bc
      Zefan Li 提交于
      When we change cpuset.memory_spread_{page,slab}, cpuset will flip
      PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset.
      This should be done using atomic bitops, but currently we don't,
      which is broken.
      
      Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
      when one thread tried to clear PF_USED_MATH while at the same time another
      thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
      the same task.
      
      Here's the full report:
      https://lkml.org/lkml/2014/9/19/230
      
      To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.
      
      v4:
      - updated mm/slab.c. (Fengguang Wu)
      - updated Documentation.
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miao Xie <miaox@cn.fujitsu.com>
      Cc: Kees Cook <keescook@chromium.org>
      Fixes: 950592f7 ("cpusets: update tasks' page/slab spread flags in time")
      Cc: <stable@vger.kernel.org> # 2.6.31+
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NZefan Li <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      2ad654bc
  3. 19 9月, 2014 1 次提交
  4. 30 7月, 2014 1 次提交
  5. 15 7月, 2014 1 次提交
    • T
      cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes · 5577964e
      Tejun Heo 提交于
      Currently, cgroup_subsys->base_cftypes is used for both the unified
      default hierarchy and legacy ones and subsystems can mark each file
      with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear
      only on one of them.  This is quite hairy and error-prone.  Also, we
      may end up exposing interface files to the default hierarchy without
      thinking it through.
      
      cgroup_subsys will grow two separate cftype arrays and apply each only
      on the hierarchies of the matching type.  This will allow organizing
      cftypes in a lot clearer way and encourage subsystems to scrutinize
      the interface which is being exposed in the new default hierarchy.
      
      In preparation, this patch renames cgroup_subsys->base_cftypes to
      cgroup_subsys->legacy_cftypes.  This patch is pure rename.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Aristeu Rozanski <aris@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      5577964e
  6. 10 7月, 2014 12 次提交
    • L
      cpuset: export effective masks to userspace · afd1a8b3
      Li Zefan 提交于
      cpuset.cpus and cpuset.mems are the configured masks, and we need
      to export effective masks to userspace, so users know the real
      cpus_allowed and mems_allowed that apply to the tasks in a cpuset.
      
      v2:
      - export those masks unconditionally, suggested by Tejun.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      afd1a8b3
    • L
      cpuset: allow writing offlined masks to cpuset.cpus/mems · 5d8ba82c
      Li Zefan 提交于
      As the configured masks won't be limited by its parent, and the top
      cpuset's masks won't change when hotplug happens, it's natural to
      allow writing offlined masks to the configured masks.
      
      If on default hierarchy:
      
      	# echo 0 > /sys/devices/system/cpu/cpu1/online
      	# mkdir /cpuset/sub
      	# echo 1 > /cpuset/sub/cpuset.cpus
      	# cat /cpuset/sub/cpuset.cpus
      	1
      
      If on legacy hierarchy:
      
      	# echo 0 > /sys/devices/system/cpu/cpu1/online
      	# mkdir /cpuset/sub
      	# echo 1 > /cpuset/sub/cpuset.cpus
      	-bash: echo: write error: Invalid argument
      
      Note the checks don't need to be gated by cgroup_on_dfl, because we've
      initialized top_cpuset.{cpus,mems}_allowed accordingly in cpuset_bind().
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      5d8ba82c
    • L
      cpuset: enable onlined cpu/node in effective masks · be4c9dd7
      Li Zefan 提交于
      Firstly offline cpu1:
      
        # echo 0-1 > cpuset.cpus
        # echo 0 > /sys/devices/system/cpu/cpu1/online
        # cat cpuset.cpus
        0-1
        # cat cpuset.effective_cpus
        0
      
      Then online it:
      
        # echo 1 > /sys/devices/system/cpu/cpu1/online
        # cat cpuset.cpus
        0-1
        # cat cpuset.effective_cpus
        0-1
      
      And cpuset will bring it back to the effective mask.
      
      The implementation is quite straightforward. Instead of calculating the
      offlined cpus/mems and do updates, we just set the new effective_mask
      to online_mask & congifured_mask.
      
      This is a behavior change for default hierarchy, so legacy hierarchy
      won't be affected.
      
      v2:
      - make refactoring of cpuset_hotplug_update_tasks() as seperate patch,
        suggested by Tejun.
      - make hotplug_update_tasks_insane() use @new_cpus and @new_mems as
        hotplug_update_tasks_sane() does.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      be4c9dd7
    • L
      cpuset: refactor cpuset_hotplug_update_tasks() · 390a36aa
      Li Zefan 提交于
      We mix the handling for both default hierarchy and legacy hierarchy in
      the same function, and it's quite messy, so split into two functions.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      390a36aa
    • L
      cpuset: make cs->{cpus, mems}_allowed as user-configured masks · 7e88291b
      Li Zefan 提交于
      Now we've used effective cpumasks to enforce hierarchical manner,
      we can use cs->{cpus,mems}_allowed as configured masks.
      
      Configured masks can be changed by writing cpuset.cpus and cpuset.mems
      only. The new behaviors are:
      
      - They won't be changed by hotplug anymore.
      - They won't be limited by its parent's masks.
      
      This ia a behavior change, but won't take effect unless mount with
      sane_behavior.
      
      v2:
      - Add comments to explain the differences between configured masks and
      effective masks.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      7e88291b
    • L
      cpuset: apply cs->effective_{cpus,mems} · ae1c8023
      Li Zefan 提交于
      Now we can use cs->effective_{cpus,mems} as effective masks. It's
      used whenever:
      
      - we update tasks' cpus_allowed/mems_allowed,
      - we want to retrieve tasks_cs(tsk)'s cpus_allowed/mems_allowed.
      
      They actually replace effective_{cpu,node}mask_cpuset().
      
      effective_mask == configured_mask & parent effective_mask except when
      the reault is empty, in which case it inherits parent effective_mask.
      The result equals the mask computed from effective_{cpu,node}mask_cpuset().
      
      This won't affect the original legacy hierarchy, because in this case we
      make sure the effective masks are always the same with user-configured
      masks.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      ae1c8023
    • L
      cpuset: initialize top_cpuset's configured masks at mount · 39bd0d15
      Li Zefan 提交于
      We now have to support different behaviors for default hierachy and
      legacy hiearchy, top_cpuset's configured masks need to be initialized
      accordingly.
      
      Suppose we've offlined cpu1.
      
      On default hierarchy:
      
      	# mount -t cgroup -o __DEVEL__sane_behavior xxx /cpuset
      	# cat /cpuset/cpuset.cpus
      	0-15
      
      On legacy hierarchy:
      
      	# mount -t cgroup xxx /cpuset
      	# cat /cpuset/cpuset.cpus
      	0,2-15
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      39bd0d15
    • L
      cpuset: use effective cpumask to build sched domains · 8b5f1c52
      Li Zefan 提交于
      We're going to have separate user-configured masks and effective ones.
      
      Eventually configured masks can only be changed by writing cpuset.cpus
      and cpuset.mems, and they won't be restricted by parent cpuset. While
      effective masks reflect cpu/memory hotplug and hierachical restriction,
      and these are the real masks that apply to the tasks in the cpuset.
      
      We calculate effective mask this way:
        - top cpuset's effective_mask == online_mask, otherwise
        - cpuset's effective_mask == configured_mask & parent effective_mask,
          if the result is empty, it inherits parent effective mask.
      
      Those behavior changes are for default hierarchy only. For legacy
      hierarchy, effective_mask and configured_mask are the same, so we won't
      break old interfaces.
      
      We should partition sched domains according to effective_cpus, which
      is the real cpulist that takes effects on tasks in the cpuset.
      
      This won't introduce behavior change.
      
      v2:
      - Add a comment for the call of rebuild_sched_domains(), suggested
      by Tejun.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      8b5f1c52
    • L
      cpuset: inherit ancestor's masks if effective_{cpus, mems} becomes empty · 554b0d1c
      Li Zefan 提交于
      We're going to have separate user-configured masks and effective ones.
      
      Eventually configured masks can only be changed by writing cpuset.cpus
      and cpuset.mems, and they won't be restricted by parent cpuset. While
      effective masks reflect cpu/memory hotplug and hierachical restriction,
      and these are the real masks that apply to the tasks in the cpuset.
      
      We calculate effective mask this way:
        - top cpuset's effective_mask == online_mask, otherwise
        - cpuset's effective_mask == configured_mask & parent effective_mask,
          if the result is empty, it inherits parent effective mask.
      
      Those behavior changes are for default hierarchy only. For legacy
      hierarchy, effective_mask and configured_mask are the same, so we won't
      break old interfaces.
      
      To make cs->effective_{cpus,mems} to be effective masks, we need to
        - update the effective masks at hotplug
        - update the effective masks at config change
        - take on ancestor's mask when the effective mask is empty
      
      The last item is done here.
      
      This won't introduce behavior change.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      554b0d1c
    • L
      cpuset: update cs->effective_{cpus, mems} when config changes · 734d4513
      Li Zefan 提交于
      We're going to have separate user-configured masks and effective ones.
      
      Eventually configured masks can only be changed by writing cpuset.cpus
      and cpuset.mems, and they won't be restricted by parent cpuset. While
      effective masks reflect cpu/memory hotplug and hierachical restriction,
      and these are the real masks that apply to the tasks in the cpuset.
      
      We calculate effective mask this way:
        - top cpuset's effective_mask == online_mask, otherwise
        - cpuset's effective_mask == configured_mask & parent effective_mask,
          if the result is empty, it inherits parent effective mask.
      
      Those behavior changes are for default hierarchy only. For legacy
      hierarchy, effective_mask and configured_mask are the same, so we won't
      break old interfaces.
      
      To make cs->effective_{cpus,mems} to be effective masks, we need to
        - update the effective masks at hotplug
        - update the effective masks at config change
        - take on ancestor's mask when the effective mask is empty
      
      The second item is done here. We don't need to treat root_cs specially
      in update_cpumasks_hier().
      
      This won't introduce behavior change.
      
      v3:
      - add a WARN_ON() to check if effective masks are the same with configured
        masks on legacy hierarchy.
      - pass trialcs->cpus_allowed to update_cpumasks_hier() and add a comment for
        it. Similar change for update_nodemasks_hier(). Suggested by Tejun.
      
      v2:
      - revise the comment in update_{cpu,node}masks_hier(), suggested by Tejun.
      - fix to use @CP instead of @cs in these two functions.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      734d4513
    • L
      cpuset: update cpuset->effective_{cpus,mems} at hotplug · 1344ab9c
      Li Zefan 提交于
      We're going to have separate user-configured masks and effective ones.
      
      Eventually configured masks can only be changed by writing cpuset.cpus
      and cpuset.mems, and they won't be restricted by parent cpuset. While
      effective masks reflect cpu/memory hotplug and hierachical restriction,
      and these are the real masks that apply to the tasks in the cpuset.
      
      We calculate effective mask this way:
        - top cpuset's effective_mask == online_mask, otherwise
        - cpuset's effective_mask == configured_mask & parent effective_mask,
          if the result is empty, it inherits parent effective mask.
      
      Those behavior changes are for default hierarchy only. For legacy
      hierarchy, effective_mask and configured_mask are the same, so we won't
      break old interfaces.
      
      To make cs->effective_{cpus,mems} to be effective masks, we need to
        - update the effective masks at hotplug
        - update the effective masks at config change
        - take on ancestor's mask when the effective mask is empty
      
      The first item is done here.
      
      This won't introduce behavior change.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      1344ab9c
    • L
      cpuset: add cs->effective_cpus and cs->effective_mems · e2b9a3d7
      Li Zefan 提交于
      We're going to have separate user-configured masks and effective ones.
      
      Eventually configured masks can only be changed by writing cpuset.cpus
      and cpuset.mems, and they won't be restricted by parent cpuset. While
      effective masks reflect cpu/memory hotplug and hierachical restriction,
      and these are the real masks that apply to the tasks in the cpuset.
      
      We calculate effective mask this way:
        - top cpuset's effective_mask == online_mask, otherwise
        - cpuset's effective_mask == configured_mask & parent effective_mask,
          if the result is empty, it inherits parent effective mask.
      
      Those behavior changes are for default hierarchy only. For legacy
      hierachy, effective_mask and configured_mask are the same, so we won't
      break old interfaces.
      
      This patch adds the effective masks to struct cpuset and initializes
      them. The effective masks of the top cpuset is the same with configured
      masks, and a child cpuset inherits its parent's effective masks.
      
      This won't introduce behavior change.
      
      v2:
      - s/real_{mems,cpus}_allowed/effective_{mems,cpus}, suggested by Tejun.
      - don't init effective masks in cpuset_css_online() if !cgroup_on_dfl.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      e2b9a3d7
  7. 09 7月, 2014 1 次提交
    • T
      cgroup: remove sane_behavior support on non-default hierarchies · aa6ec29b
      Tejun Heo 提交于
      sane_behavior has been used as a development vehicle for the default
      unified hierarchy.  Now that the default hierarchy is in place, the
      flag became redundant and confusing as its usage is allowed on all
      hierarchies.  There are gonna be either the default hierarchy or
      legacy ones.  Let's make that clear by removing sane_behavior support
      on non-default hierarchies.
      
      This patch replaces cgroup_sane_behavior() with cgroup_on_dfl().  The
      comment on top of CGRP_ROOT_SANE_BEHAVIOR is moved to on top of
      cgroup_on_dfl() with sane_behavior specific part dropped.
      
      On the default and legacy hierarchies w/o sane_behavior, this
      shouldn't cause any behavior differences.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      aa6ec29b
  8. 02 7月, 2014 1 次提交
    • T
      cpuset: break kernfs active protection in cpuset_write_resmask() · 76bb5ab8
      Tejun Heo 提交于
      Writing to either "cpuset.cpus" or "cpuset.mems" file flushes
      cpuset_hotplug_work so that cpu or memory hotunplug doesn't end up
      migrating tasks off a cpuset after new resources are added to it.
      
      As cpuset_hotplug_work calls into cgroup core via
      cgroup_transfer_tasks(), this flushing adds the dependency to cgroup
      core locking from cpuset_write_resmak().  This used to be okay because
      cgroup interface files were protected by a different mutex; however,
      8353da1f ("cgroup: remove cgroup_tree_mutex") simplified the
      cgroup core locking and this dependency became a deadlock hazard -
      cgroup file removal performed under cgroup core lock tries to drain
      on-going file operation which is trying to flush cpuset_hotplug_work
      blocked on the same cgroup core lock.
      
      The locking simplification was done because kernfs added an a lot
      easier way to deal with circular dependencies involving kernfs active
      protection.  Let's use the same strategy in cpuset and break active
      protection in cpuset_write_resmask().  While it isn't the prettiest,
      this is a very rare, likely unique, situation which also goes away on
      the unified hierarchy.
      
      The commands to trigger the deadlock warning without the patch and the
      lockdep output follow.
      
       localhost:/ # mount -t cgroup -o cpuset xxx /cpuset
       localhost:/ # mkdir /cpuset/tmp
       localhost:/ # echo 1 > /cpuset/tmp/cpuset.cpus
       localhost:/ # echo 0 > cpuset/tmp/cpuset.mems
       localhost:/ # echo $$ > /cpuset/tmp/tasks
       localhost:/ # echo 0 > /sys/devices/system/cpu/cpu1/online
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        3.16.0-rc1-0.1-default+ #7 Not tainted
        -------------------------------------------------------
        kworker/1:0/32649 is trying to acquire lock:
         (cgroup_mutex){+.+.+.}, at: [<ffffffff8110e3d7>] cgroup_transfer_tasks+0x37/0x150
      
        but task is already holding lock:
         (cpuset_hotplug_work){+.+...}, at: [<ffffffff81085412>] process_one_work+0x192/0x520
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (cpuset_hotplug_work){+.+...}:
        ...
        -> #1 (s_active#175){++++.+}:
        ...
        -> #0 (cgroup_mutex){+.+.+.}:
        ...
      
        other info that might help us debug this:
      
        Chain exists of:
          cgroup_mutex --> s_active#175 --> cpuset_hotplug_work
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(cpuset_hotplug_work);
      				 lock(s_active#175);
      				 lock(cpuset_hotplug_work);
          lock(cgroup_mutex);
      
         *** DEADLOCK ***
      
        2 locks held by kworker/1:0/32649:
         #0:  ("events"){.+.+.+}, at: [<ffffffff81085412>] process_one_work+0x192/0x520
         #1:  (cpuset_hotplug_work){+.+...}, at: [<ffffffff81085412>] process_one_work+0x192/0x520
      
        stack backtrace:
        CPU: 1 PID: 32649 Comm: kworker/1:0 Not tainted 3.16.0-rc1-0.1-default+ #7
       ...
        Call Trace:
         [<ffffffff815a5f78>] dump_stack+0x72/0x8a
         [<ffffffff810c263f>] print_circular_bug+0x10f/0x120
         [<ffffffff810c481e>] check_prev_add+0x43e/0x4b0
         [<ffffffff810c4ee6>] validate_chain+0x656/0x7c0
         [<ffffffff810c53d2>] __lock_acquire+0x382/0x660
         [<ffffffff810c57a9>] lock_acquire+0xf9/0x170
         [<ffffffff815aa13f>] mutex_lock_nested+0x6f/0x380
         [<ffffffff8110e3d7>] cgroup_transfer_tasks+0x37/0x150
         [<ffffffff811129c0>] hotplug_update_tasks_insane+0x110/0x1d0
         [<ffffffff81112bbd>] cpuset_hotplug_update_tasks+0x13d/0x180
         [<ffffffff811148ec>] cpuset_hotplug_workfn+0x18c/0x630
         [<ffffffff810854d4>] process_one_work+0x254/0x520
         [<ffffffff810875dd>] worker_thread+0x13d/0x3d0
         [<ffffffff8108e0c8>] kthread+0xf8/0x100
         [<ffffffff815acaec>] ret_from_fork+0x7c/0xb0
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NLi Zefan <lizefan@huawei.com>
      Tested-by: NLi Zefan <lizefan@huawei.com>
      76bb5ab8
  9. 25 6月, 2014 1 次提交
    • G
      cpuset,mempolicy: fix sleeping function called from invalid context · 391acf97
      Gu Zheng 提交于
      When runing with the kernel(3.15-rc7+), the follow bug occurs:
      [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
      [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
      [ 9969.441175] INFO: lockdep is turned off.
      [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
      [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
      [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
      [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
      [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
      [ 9969.974071] Call Trace:
      [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
      [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
      [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
      [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
      [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
      [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
      [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
      [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
      [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
      [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
      [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
      [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
      [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
      [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
      [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
      [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
      [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
      
      The cause is that cpuset_mems_allowed() try to take
      mutex_lock(&callback_mutex) under the rcu_read_lock(which was hold in
      __mpol_dup()). And in cpuset_mems_allowed(), the access to cpuset is
      under rcu_read_lock, so in __mpol_dup, we can reduce the rcu_read_lock
      protection region to protect the access to cpuset only in
      current_cpuset_is_being_rebound(). So that we can avoid this bug.
      
      This patch is a temporary solution that just addresses the bug
      mentioned above, can not fix the long-standing issue about cpuset.mems
      rebinding on fork():
      
      "When the forker's task_struct is duplicated (which includes
       ->mems_allowed) and it races with an update to cpuset_being_rebound
       in update_tasks_nodemask() then the task's mems_allowed doesn't get
       updated. And the child task's mems_allowed can be wrong if the
       cpuset's nodemask changes before the child has been added to the
       cgroup's tasklist."
      Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable <stable@vger.kernel.org>
      391acf97
  10. 05 6月, 2014 1 次提交
  11. 17 5月, 2014 1 次提交
    • T
      cgroup: remove css_parent() · 5c9d535b
      Tejun Heo 提交于
      cgroup in general is moving towards using cgroup_subsys_state as the
      fundamental structural component and css_parent() was introduced to
      convert from using cgroup->parent to css->parent.  It was quite some
      time ago and we're moving forward with making css more prominent.
      
      This patch drops the trivial wrapper css_parent() and let the users
      dereference css->parent.  While at it, explicitly mark fields of css
      which are public and immutable.
      
      v2: New usage from device_cgroup.c converted.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: N"David S. Miller" <davem@davemloft.net>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      5c9d535b
  12. 14 5月, 2014 2 次提交
    • T
      cgroup: replace cftype->write_string() with cftype->write() · 451af504
      Tejun Heo 提交于
      Convert all cftype->write_string() users to the new cftype->write()
      which maps directly to kernfs write operation and has full access to
      kernfs and cgroup contexts.  The conversions are mostly mechanical.
      
      * @css and @cft are accessed using of_css() and of_cft() accessors
        respectively instead of being specified as arguments.
      
      * Should return @nbytes on success instead of 0.
      
      * @buf is not trimmed automatically.  Trim if necessary.  Note that
        blkcg and netprio don't need this as the parsers already handle
        whitespaces.
      
      cftype->write_string() has no user left after the conversions and
      removed.
      
      While at it, remove unnecessary local variable @p in
      cgroup_subtree_control_write() and stale comment about
      CGROUP_LOCAL_BUFFER_SIZE in cgroup_freezer.c.
      
      This patch doesn't introduce any visible behavior changes.
      
      v2: netprio was missing from conversion.  Converted.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NAristeu Rozanski <arozansk@redhat.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      451af504
    • T
      cgroup: rename css_tryget*() to css_tryget_online*() · ec903c0c
      Tejun Heo 提交于
      Unlike the more usual refcnting, what css_tryget() provides is the
      distinction between online and offline csses instead of protection
      against upping a refcnt which already reached zero.  cgroup is
      planning to provide actual tryget which fails if the refcnt already
      reached zero.  Let's rename the existing trygets so that they clearly
      indicate that they're onliness.
      
      I thought about keeping the existing names as-are and introducing new
      names for the planned actual tryget; however, given that each
      controller participates in the synchronization of the online state, it
      seems worthwhile to make it explicit that these functions are about
      on/offline state.
      
      Rename css_tryget() to css_tryget_online() and css_tryget_from_dir()
      to css_tryget_online_from_dir().  This is pure rename.
      
      v2: cgroup_freezer grew new usages of css_tryget().  Update
          accordingly.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      ec903c0c
  13. 06 5月, 2014 2 次提交
  14. 04 4月, 2014 1 次提交
  15. 19 3月, 2014 1 次提交
    • T
      cgroup: drop const from @buffer of cftype->write_string() · 4d3bb511
      Tejun Heo 提交于
      cftype->write_string() just passes on the writeable buffer from kernfs
      and there's no reason to add const restriction on the buffer.  The
      only thing const achieves is unnecessarily complicating parsing of the
      buffer.  Drop const from @buffer.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>                                           
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      4d3bb511
  16. 04 3月, 2014 1 次提交
  17. 27 2月, 2014 2 次提交
    • L
      cpuset: fix a race condition in __cpuset_node_allowed_softwall() · 99afb0fd
      Li Zefan 提交于
      It's not safe to access task's cpuset after releasing task_lock().
      Holding callback_mutex won't help.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      99afb0fd
    • L
      cpuset: fix a locking issue in cpuset_migrate_mm() · 47295830
      Li Zefan 提交于
      I can trigger a lockdep warning:
      
        # mount -t cgroup -o cpuset xxx /cgroup
        # mkdir /cgroup/cpuset
        # mkdir /cgroup/tmp
        # echo 0 > /cgroup/tmp/cpuset.cpus
        # echo 0 > /cgroup/tmp/cpuset.mems
        # echo 1 > /cgroup/tmp/cpuset.memory_migrate
        # echo $$ > /cgroup/tmp/tasks
        # echo 1 > /cgruop/tmp/cpuset.mems
      
        ===============================
        [ INFO: suspicious RCU usage. ]
        3.14.0-rc1-0.1-default+ #32 Not tainted
        -------------------------------
        include/linux/cgroup.h:682 suspicious rcu_dereference_check() usage!
        ...
          [<ffffffff81582174>] dump_stack+0x72/0x86
          [<ffffffff810b8f01>] lockdep_rcu_suspicious+0x101/0x140
          [<ffffffff81105ba1>] cpuset_migrate_mm+0xb1/0xe0
        ...
      
      We used to hold cgroup_mutex when calling cpuset_migrate_mm(), but now
      we hold cpuset_mutex, which causes task_css() to complain.
      
      This is not a false-positive but a real issue.
      
      Holding cpuset_mutex won't prevent a task from migrating to another
      cpuset, and it won't prevent the original task->cgroup from destroying
      during this change.
      
      Fixes: 5d21cc2d (cpuset: replace cgroup_mutex locking with cpuset internal locking)
      Cc: <stable@vger.kernel.org> # 3.9+
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Sigend-off-by: NTejun Heo <tj@kernel.org>
      47295830
  18. 13 2月, 2014 4 次提交
    • T
      cpuset: don't use cgroup_taskset_cur_css() · 57fce0a6
      Tejun Heo 提交于
      cgroup_taskset_cur_css() will be removed during the planned
      resturcturing of migration path.  The only use of
      cgroup_taskset_cur_css() is finding out the old cgroup_subsys_state of
      the leader in cpuset_attach().  This usage can easily be removed by
      remembering the old value from cpuset_can_attach().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      57fce0a6
    • T
      cgroup: drop @skip_css from cgroup_taskset_for_each() · 924f0d9a
      Tejun Heo 提交于
      If !NULL, @skip_css makes cgroup_taskset_for_each() skip the matching
      css.  The intention of the interface is to make it easy to skip css's
      (cgroup_subsys_states) which already match the migration target;
      however, this is entirely unnecessary as migration taskset doesn't
      include tasks which are already in the target cgroup.  Drop @skip_css
      from cgroup_taskset_for_each().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      924f0d9a
    • T
      cpuset: use css_task_iter_start/next/end() instead of css_scan_tasks() · d66393e5
      Tejun Heo 提交于
      Now that css_task_iter_start/next_end() supports blocking while
      iterating, there's no reason to use css_scan_tasks() which is more
      cumbersome to use and scheduled to be removed.
      
      Convert all css_scan_tasks() usages in cpuset to
      css_task_iter_start/next/end().  This simplifies the code by removing
      heap allocation and callbacks.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      d66393e5
    • T
      cgroup: implement cgroup_has_tasks() and unexport cgroup_task_count() · 07bc356e
      Tejun Heo 提交于
      cgroup_task_count() read-locks css_set_lock and walks all tasks to
      count them and then returns the result.  The only thing all the users
      want is determining whether the cgroup is empty or not.  This patch
      implements cgroup_has_tasks() which tests whether cgroup->cset_links
      is empty, replaces all cgroup_task_count() usages and unexports it.
      
      Note that the test isn't synchronized.  This is the same as before.
      The test has always been racy.
      
      This will help planned css_set locking update.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      07bc356e
  19. 12 2月, 2014 1 次提交
    • T
      cgroup: remove cgroup->name · e61734c5
      Tejun Heo 提交于
      cgroup->name handling became quite complicated over time involving
      dedicated struct cgroup_name for RCU protection.  Now that cgroup is
      on kernfs, we can drop all of it and simply use kernfs_name/path() and
      friends.  Replace cgroup->name and all related code with kernfs
      name/path constructs.
      
      * Reimplement cgroup_name() and cgroup_path() as thin wrappers on top
        of kernfs counterparts, which involves semantic changes.
        pr_cont_cgroup_name() and pr_cont_cgroup_path() added.
      
      * cgroup->name handling dropped from cgroup_rename().
      
      * All users of cgroup_name/path() updated to the new semantics.  Users
        which were formatting the string just to printk them are converted
        to use pr_cont_cgroup_name/path() instead, which simplifies things
        quite a bit.  As cgroup_name() no longer requires RCU read lock
        around it, RCU lockings which were protecting only cgroup_name() are
        removed.
      
      v2: Comment above oom_info_lock updated as suggested by Michal.
      
      v3: dummy_top doesn't have a kn associated and
          pr_cont_cgroup_name/path() ended up calling the matching kernfs
          functions with NULL kn leading to oops.  Test for NULL kn and
          print "/" if so.  This issue was reported by Fengguang Wu.
      
      v4: Rebased on top of 0ab02ca8 ("cgroup: protect modifications to
          cgroup_idr with cgroup_mutex").
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      e61734c5
  20. 08 2月, 2014 1 次提交
    • T
      cgroup: clean up cgroup_subsys names and initialization · 073219e9
      Tejun Heo 提交于
      cgroup_subsys is a bit messier than it needs to be.
      
      * The name of a subsys can be different from its internal identifier
        defined in cgroup_subsys.h.  Most subsystems use the matching name
        but three - cpu, memory and perf_event - use different ones.
      
      * cgroup_subsys_id enums are postfixed with _subsys_id and each
        cgroup_subsys is postfixed with _subsys.  cgroup.h is widely
        included throughout various subsystems, it doesn't and shouldn't
        have claim on such generic names which don't have any qualifier
        indicating that they belong to cgroup.
      
      * cgroup_subsys->subsys_id should always equal the matching
        cgroup_subsys_id enum; however, we require each controller to
        initialize it and then BUG if they don't match, which is a bit
        silly.
      
      This patch cleans up cgroup_subsys names and initialization by doing
      the followings.
      
      * cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
        cgroup_subsys with _cgrp_subsys.
      
      * With the above, renaming subsys identifiers to match the userland
        visible names doesn't cause any naming conflicts.  All non-matching
        identifiers are renamed to match the official names.
      
        cpu_cgroup -> cpu
        mem_cgroup -> memory
        perf -> perf_event
      
      * controllers no longer need to initialize ->subsys_id and ->name.
        They're generated in cgroup core and set automatically during boot.
      
      * Redundant cgroup_subsys declarations removed.
      
      * While updating BUG_ON()s in cgroup_init_early(), convert them to
        WARN()s.  BUGging that early during boot is stupid - the kernel
        can't print anything, even through serial console and the trap
        handler doesn't even link stack frame properly for back-tracing.
      
      This patch doesn't introduce any behavior changes.
      
      v2: Rebased on top of fe1217c4 ("net: net_cls: move cgroupfs
          classid handling into core").
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: N"David S. Miller" <davem@davemloft.net>
      Acked-by: N"Rafael J. Wysocki" <rjw@rjwysocki.net>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Acked-by: NIngo Molnar <mingo@redhat.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      073219e9
  21. 06 12月, 2013 2 次提交
    • T
      cgroup: replace cftype->read_seq_string() with cftype->seq_show() · 2da8ca82
      Tejun Heo 提交于
      In preparation of conversion to kernfs, cgroup file handling is
      updated so that it can be easily mapped to kernfs.  This patch
      replaces cftype->read_seq_string() with cftype->seq_show() which is
      not limited to single_open() operation and will map directcly to
      kernfs seq_file interface.
      
      The conversions are mechanical.  As ->seq_show() doesn't have @css and
      @cft, the functions which make use of them are converted to use
      seq_css() and seq_cft() respectively.  In several occassions, e.f. if
      it has seq_string in its name, the function name is updated to fit the
      new method better.
      
      This patch does not introduce any behavior changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NAristeu Rozanski <arozansk@redhat.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      2da8ca82
    • T
      cpuset: convert away from cftype->read() · 51ffe411
      Tejun Heo 提交于
      In preparation of conversion to kernfs, cgroup file handling is being
      consolidated so that it can be easily mapped to the seq_file based
      interface of kernfs.
      
      All users of cftype->read() can be easily served, usually better, by
      seq_file and other methods.  Rename cpuset_common_file_read() to
      cpuset_common_read_seq_string() and convert it to use
      read_seq_string() interface instead.  This not only simplifies the
      code but also makes it more versatile.  Before, the file couldn't
      output if the result is longer than PAGE_SIZE.  After the conversion,
      seq_file automatically grows the buffer until the output can fit.
      
      This patch doesn't make any visible behavior changes except for being
      able to handle output larger than PAGE_SIZE.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      51ffe411