1. 06 Jun 2013, 2 commits
  2. 02 May 2013, 1 commit
  3. 30 Apr 2013, 1 commit
  4. 28 Apr 2013, 1 commit
  5. 27 Apr 2013, 2 commits
    • cpuset: fix cpu hotplug vs rebuild_sched_domains() race · 5b16c2a4
      Committed by Li Zefan
      rebuild_sched_domains() might pass doms with offlined cpu to
      partition_sched_domains(), which results in an oops:
      
      general protection fault: 0000 [#1] SMP
      ...
      RIP: 0010:[<ffffffff81077a1e>]  [<ffffffff81077a1e>] get_group+0x6e/0x90
      ...
      Call Trace:
       [<ffffffff8107f07c>] build_sched_domains+0x70c/0xcb0
       [<ffffffff8107f2a7>] ? build_sched_domains+0x937/0xcb0
       [<ffffffff81173f64>] ? kfree+0xe4/0x1b0
       [<ffffffff8107f6e0>] ? partition_sched_domains+0xc0/0x470
       [<ffffffff8107f905>] partition_sched_domains+0x2e5/0x470
       [<ffffffff8107f6e0>] ? partition_sched_domains+0xc0/0x470
       [<ffffffff810c9007>] ? generate_sched_domains+0xc7/0x530
       [<ffffffff810c94a8>] rebuild_sched_domains_locked+0x38/0x70
       [<ffffffff810cb4a4>] cpuset_write_resmask+0x1a4/0x500
       [<ffffffff810c8700>] ? cpuset_mount+0xe0/0xe0
       [<ffffffff810c7f50>] ? cpuset_read_u64+0x100/0x100
       [<ffffffff810be890>] ? cgroup_iter_next+0x90/0x90
       [<ffffffff810cb300>] ? cpuset_css_offline+0x70/0x70
       [<ffffffff810c1a73>] cgroup_file_write+0x133/0x2e0
       [<ffffffff8118995b>] vfs_write+0xcb/0x130
       [<ffffffff8118a174>] sys_write+0x64/0xa0
      Reported-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: use rebuild_sched_domains() in cpuset_hotplug_workfn() · e0e80a02
      Committed by Li Zhong
      In cpuset_hotplug_workfn(), partition_sched_domains() is called without
      the hotplug lock held, even though it is required (as stated in the
      function header of partition_sched_domains()).
      
      This patch tries to use rebuild_sched_domains() to solve the above
      issue, and makes the code look a little simpler.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  6. 08 Apr 2013, 1 commit
    • cgroup, cpuset: replace move_member_tasks_to_cpuset() with cgroup_transfer_tasks() · 8cc99345
      Committed by Tejun Heo
      When a cpuset becomes empty (no CPU or memory), its tasks are
      transferred to the nearest ancestor with execution resources.  This
      is implemented using cgroup_scan_tasks() with a callback which grabs
      cgroup_mutex and invokes cgroup_attach_task() on each task.
      
      Both cgroup_mutex and cgroup_attach_task() are scheduled to be
      unexported.  Implement cgroup_transfer_tasks() in cgroup proper which
      is essentially the same as move_member_tasks_to_cpuset() except that
      it takes cgroups instead of cpusets and @to comes before @from like
      normal functions with those arguments, and replace
      move_member_tasks_to_cpuset() with it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
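The "nearest ancestor with execution resources" rule described above is easy to model. Below is a minimal userspace C sketch, not the kernel code: all structure and function names (`toy_cpuset`, `toy_nearest_ancestor`) are hypothetical stand-ins.

```c
#include <stddef.h>

/* Toy stand-in for a cpuset; masks are illustrative bitmasks, 0 == none. */
struct toy_cpuset {
    struct toy_cpuset *parent;
    unsigned long cpus;   /* toy cpumask  */
    unsigned long mems;   /* toy nodemask */
};

/* Walk up from an emptied cpuset to the nearest ancestor that still has
 * both CPUs and memory -- the target that tasks get transferred to. */
struct toy_cpuset *toy_nearest_ancestor(struct toy_cpuset *cs)
{
    struct toy_cpuset *p;

    for (p = cs->parent; p; p = p->parent)
        if (p->cpus && p->mems)
            return p;
    return NULL;  /* only possible if even the root is empty */
}
```

The real transfer then just moves each task to that ancestor's cgroup, which is what cgroup_transfer_tasks() encapsulates.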
  7. 20 Mar 2013, 2 commits
    • cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc() · 081aa458
      Committed by Li Zefan
      These two functions share most of the code.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • sched: replace PF_THREAD_BOUND with PF_NO_SETAFFINITY · 14a40ffc
      Committed by Tejun Heo
      PF_THREAD_BOUND was originally used to mark kernel threads which were
      bound to a specific CPU using kthread_bind() and a task with the flag
      set allows cpus_allowed modifications only to itself.  Workqueue is
      currently abusing it to prevent userland from meddling with
      cpus_allowed of workqueue workers.
      
      What we need is a flag to prevent userland from messing with
      cpus_allowed of certain kernel tasks.  In the kernel, anyone can
      (incorrectly) squash the flag, and, for worker-type usages,
      restricting cpus_allowed modification to the task itself doesn't
      provide meaningful extra protection as other tasks can inject work
      items to the task anyway.
      
      This patch replaces PF_THREAD_BOUND with PF_NO_SETAFFINITY.
      sched_setaffinity() checks the flag and returns -EINVAL if set.
      set_cpus_allowed_ptr() is no longer affected by the flag.
      
      This will allow simplifying workqueue worker CPU affinity management.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
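The asymmetry this commit creates (userland path rejects flagged tasks, in-kernel path ignores the flag) can be sketched as a userspace toy model. This is not the kernel implementation; `fake_task`, the flag value, and both function names are hypothetical.

```c
#include <errno.h>

/* Illustrative stand-in for the real PF_NO_SETAFFINITY bit. */
#define TOY_PF_NO_SETAFFINITY 0x1u

struct fake_task {
    unsigned int  flags;
    unsigned long cpus_allowed;   /* toy cpumask */
};

/* Model of the check the patch adds to sched_setaffinity():
 * userland requests on a flagged task fail with -EINVAL. */
int fake_sched_setaffinity(struct fake_task *p, unsigned long mask)
{
    if (p->flags & TOY_PF_NO_SETAFFINITY)
        return -EINVAL;
    p->cpus_allowed = mask;
    return 0;
}

/* The in-kernel path, modeled after set_cpus_allowed_ptr(), is no
 * longer affected by the flag. */
int fake_set_cpus_allowed_ptr(struct fake_task *p, unsigned long mask)
{
    p->cpus_allowed = mask;
    return 0;
}
```

This split is exactly what lets workqueue keep workers bound while userland's sched_setaffinity() is cleanly refused.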
  8. 13 Mar 2013, 1 commit
  9. 05 Mar 2013, 1 commit
  10. 19 Feb 2013, 1 commit
  11. 16 Jan 2013, 2 commits
    • cpuset: drop spurious retval assignment in proc_cpuset_show() · d127027b
      Committed by Li Zefan
      proc_cpuset_show() has a spurious -EINVAL assignment which does
      nothing.  Remove it.
      
      This patch doesn't make any functional difference.
      
      tj: Rewrote patch description.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: fix RCU lockdep splat · 27e89ae5
      Committed by Li Zefan
      5d21cc2d ("cpuset: replace
      cgroup_mutex locking with cpuset internal locking") incorrectly
      converted proc_cpuset_show() from cgroup_lock() to cpuset_mutex.
      proc_cpuset_show() is accessing cgroup hierarchy proper to determine
      cgroup path which can't be protected by cpuset_mutex.  This triggered
      the following RCU warning.
      
       ===============================
       [ INFO: suspicious RCU usage. ]
       3.8.0-rc3-next-20130114-sasha-00016-ga107525-dirty #262 Tainted: G        W
       -------------------------------
       include/linux/cgroup.h:534 suspicious rcu_dereference_check() usage!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 1, debug_locks = 1
       2 locks held by trinity/7514:
        #0:  (&p->lock){+.+.+.}, at: [<ffffffff812b06aa>] seq_read+0x3a/0x3e0
        #1:  (cpuset_mutex){+.+...}, at: [<ffffffff811abae4>] proc_cpuset_show+0x84/0x190
      
       stack backtrace:
        Pid: 7514, comm: trinity Tainted: G        W    3.8.0-rc3-next-20130114-sasha-00016-ga107525-dirty #262
       Call Trace:
        [<ffffffff81182cab>] lockdep_rcu_suspicious+0x10b/0x120
        [<ffffffff811abb71>] proc_cpuset_show+0x111/0x190
        [<ffffffff812b0827>] seq_read+0x1b7/0x3e0
        [<ffffffff812b0670>] ? seq_lseek+0x110/0x110
        [<ffffffff8128b4fb>] do_loop_readv_writev+0x4b/0x90
        [<ffffffff8128b776>] do_readv_writev+0xf6/0x1d0
        [<ffffffff8128b8ee>] vfs_readv+0x3e/0x60
        [<ffffffff8128b960>] sys_readv+0x50/0xd0
        [<ffffffff83d33d18>] tracesys+0xe1/0xe6
      
      The operation can be performed under RCU read lock.  Replace
      cpuset_mutex locking with RCU read locking.
      
      tj: Rewrote patch description.
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  12. 08 Jan 2013, 15 commits
    • cpuset: remove cpuset->parent · c431069f
      Committed by Tejun Heo
      cgroup already tracks the hierarchy.  Follow cgroup->parent to find
      the parent and drop cpuset->parent.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre() · fc560a26
      Committed by Tejun Heo
      Implement cpuset_for_each_descendant_pre() and replace the
      cpuset-specific tree walking using cpuset->stack_list with it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Li Zefan <lizefan@huawei.com>
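A pre-order descendant iterator of this kind is typically built on first-child / next-sibling links: descend when possible, otherwise climb until a sibling is found. The sketch below is a toy userspace model (all names hypothetical), not the cgroup implementation:

```c
#include <stddef.h>

/* Hypothetical node with the classic first-child/next-sibling shape
 * behind pre-order iterators like cpuset_for_each_descendant_pre(). */
struct toy_node {
    struct toy_node *parent, *child, *sibling;
};

/* Return the descendant of @root that follows @pos in pre-order,
 * or NULL when the walk is done.  Pass pos == root to start. */
struct toy_node *toy_next_descendant_pre(struct toy_node *pos,
                                         struct toy_node *root)
{
    if (pos->child)               /* descend first */
        return pos->child;
    while (pos != root) {         /* otherwise climb until a sibling */
        if (pos->sibling)
            return pos->sibling;
        pos = pos->parent;
    }
    return NULL;
}
```

Because the iterator carries its position in `pos`, no explicit stack (the old cpuset->stack_list) is needed.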
    • cpuset: replace cgroup_mutex locking with cpuset internal locking · 5d21cc2d
      Committed by Tejun Heo
      Supposedly for historical reasons, cpuset depends on cgroup core for
      locking.  It depends on cgroup_mutex in cgroup callbacks and grabs
      cgroup_mutex from other places where it wants to be synchronized.
      This is majorly messy and highly prone to introducing circular locking
      dependency especially because cgroup_mutex is supposed to be one of
      the outermost locks.
      
      As previous patches already plugged possible races which may happen by
      decoupling from cgroup_mutex, replacing cgroup_mutex with cpuset
      specific cpuset_mutex is mostly straight-forward.  Introduce
      cpuset_mutex, replace all occurrences of cgroup_mutex with it, and add
      cpuset_mutex locking to places which inherited cgroup_mutex from
      cgroup core.
      
      The only complication is from cpuset wanting to initiate task
      migration when a cpuset loses all cpus or memory nodes.  Task
      migration may go through full cgroup and all subsystem locking and
      should be initiated without holding any cpuset specific lock; however,
      a previous patch already made hotplug handled asynchronously and
      moving the task migration part outside other locks is easy.
      cpuset_propagate_hotplug_workfn() now invokes
      remove_tasks_in_empty_cpuset() without holding any lock.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: schedule hotplug propagation from cpuset_attach() if the cpuset is empty · 02bb5863
      Committed by Tejun Heo
      cpuset is scheduled to be decoupled from cgroup_lock which will make
      hotplug handling race with task migration.  cpus or mems will be
      allowed to go offline between ->can_attach() and ->attach().  If
      hotplug takes down all cpus or mems of a cpuset while attach is in
      progress, ->attach() may end up putting tasks into an empty cpuset.
      
      This patch makes ->attach() schedule hotplug propagation if the
      cpuset is empty after attaching is complete.  This will move the tasks
      to the nearest ancestor which can execute them, and the end result
      would be as if hotplug handling happened after the tasks finished
      attaching.
      
      cpuset_write_resmask() now also flushes cpuset_propagate_hotplug_wq to
      wait for propagations scheduled directly by cpuset_attach().
      
      This currently doesn't make any functional difference as everything is
      protected by cgroup_mutex but enables decoupling the locking.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: pin down cpus and mems while a task is being attached · 452477fa
      Committed by Tejun Heo
      cpuset is scheduled to be decoupled from cgroup_lock which will make
      configuration updates race with task migration.  Any config update
      will be allowed to happen between ->can_attach() and ->attach().  If
      such config update removes either all cpus or mems, by the time
      ->attach() is called, the condition verified by ->can_attach(), that
      the cpuset is capable of hosting the tasks, is no longer true.
      
      This patch adds cpuset->attach_in_progress which is incremented from
      ->can_attach() and decremented when the attach operation finishes
      either successfully or not.  validate_change() treats cpusets w/
      non-zero ->attach_in_progress like cpusets w/ tasks and refuses to
      remove all cpus or mems from it.
      
      This currently doesn't make any functional difference as everything is
      protected by cgroup_mutex but enables decoupling the locking.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
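The attach_in_progress mechanism is a small counter protocol: ->can_attach() pins the resources, the end of the attach unpins them, and config validation treats a pinned cpuset like one with tasks. A toy userspace model (names and error values are illustrative, not the kernel's):

```c
/* Toy cpuset; masks are illustrative bitmasks, 0 == none. */
struct toy_cpuset {
    unsigned long cpus;          /* toy cpumask */
    int task_count;
    int attach_in_progress;      /* pins cpus/mems during attach */
};

/* ->can_attach(): refuse empty cpusets, otherwise pin resources. */
int toy_can_attach(struct toy_cpuset *cs)
{
    if (!cs->cpus)
        return -1;               /* can't host the tasks */
    cs->attach_in_progress++;
    return 0;
}

/* Called when the attach finishes, successfully or not. */
void toy_attach_done(struct toy_cpuset *cs)
{
    cs->attach_in_progress--;
}

/* Model of validate_change(): a cpuset with tasks *or* an attach in
 * flight may not lose all of its cpus. */
int toy_validate_change(struct toy_cpuset *cs, unsigned long new_cpus)
{
    if (!new_cpus && (cs->task_count || cs->attach_in_progress))
        return -1;
    return 0;
}
```

The counter closes the window between ->can_attach() and ->attach() during which a config update could otherwise empty the cpuset.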
    • cpuset: make CPU / memory hotplug propagation asynchronous · 8d033948
      Committed by Tejun Heo
      cpuset_hotplug_workfn() has been invoking cpuset_propagate_hotplug()
      directly to propagate hotplug updates to !root cpusets; however, this
      has the following problems.
      
      * cpuset locking is scheduled to be decoupled from cgroup_mutex,
        cgroup_mutex will be unexported, and cgroup_attach_task() will do
        cgroup locking internally, so propagation can't synchronously move
        tasks to a parent cgroup while walking the hierarchy.
      
      * We can't use cgroup generic tree iterator because propagation to
        each cpuset may sleep.  With propagation done asynchronously, we can
        lose the rather ugly cpuset specific iteration.
      
      Convert cpuset_propagate_hotplug() to
      cpuset_propagate_hotplug_workfn() and execute it from newly added
      cpuset->hotplug_work.  The work items are run on an ordered workqueue,
      so the propagation order is preserved.  cpuset_hotplug_workfn()
      schedules all propagations while holding cgroup_mutex and waits for
      completion without cgroup_mutex.  Each in-flight propagation holds a
      reference to the cpuset->css.
      
      This patch doesn't cause any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: drop async_rebuild_sched_domains() · 699140ba
      Committed by Tejun Heo
      In general, we want to make cgroup_mutex one of the outermost locks
      and be able to use get_online_cpus() and friends from cgroup methods.
      With cpuset hotplug made async, get_online_cpus() can now be nested
      inside cgroup_mutex.
      
      Currently, cpuset avoids nesting get_online_cpus() inside cgroup_mutex
      by bouncing sched_domain rebuilding to a work item.  As such nesting
      is allowed now, remove the workqueue bouncing code and always rebuild
      sched_domains synchronously.  This also nests sched_domains_mutex
      inside cgroup_mutex, which is intended and should be okay.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: don't nest cgroup_mutex inside get_online_cpus() · 3a5a6d0c
      Committed by Tejun Heo
      CPU / memory hotplug path currently grabs cgroup_mutex from hotplug
      event notifications.  We want to separate cpuset locking from cgroup
      core and make cgroup_mutex outer to hotplug synchronization so that,
      among other things, mechanisms which depend on get_online_cpus() can
      be used from cgroup callbacks.  In general, we want to keep
      cgroup_mutex the outermost lock to minimize locking interactions among
      different controllers.
      
      Convert cpuset_handle_hotplug() to cpuset_hotplug_workfn() and
      schedule it from the hotplug notifications.  As the function can
      already handle multiple mixed events without any input, converting it
      to a work function is mostly trivial; however, one complication is
      that cpuset_update_active_cpus() needs to update sched domains
      synchronously to reflect an offlined cpu to avoid confusing the
      scheduler.  This is worked around by falling back to the default
      single sched domain synchronously before scheduling the actual hotplug
      work.  This makes the sched domains get rebuilt twice per CPU hotplug
      event, but the operation isn't that heavy and much of the second
      rebuild would be a noop on systems w/ a single sched domain, which is
      the common case.
      
      This decouples cpuset hotplug handling from the notification callbacks
      and there can be an arbitrary delay between the actual event and
      updates to cpusets.  Scheduler and mm can handle it fine but moving
      tasks out of an empty cpuset may race against writes to the cpuset
      restoring execution resources which can lead to confusing behavior.
      Flush hotplug work item from cpuset_write_resmask() to avoid such
      confusions.
      
      v2: Synchronous sched domain rebuilding using the fallback sched
          domain added.  This fixes various issues caused by confused
          scheduler putting tasks on a dead CPU, including the one reported
          by Li Zefan.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: reorganize CPU / memory hotplug handling · deb7aa30
      Committed by Tejun Heo
      Reorganize hotplug path to prepare for async hotplug handling.
      
      * Both CPU and memory hotplug handling is collected into a single
        function - cpuset_handle_hotplug().  It doesn't take any argument
        but compares the current settings of top_cpuset against what's
        actually available to determine what happened.  This function
        directly updates top_cpuset.  If there are CPUs or memory nodes
        which are taken down, cpuset_propagate_hotplug() is invoked on all
        !root cpusets.
      
      * cpuset_propagate_hotplug() is responsible for updating the specified
        cpuset so that it doesn't include any resource which isn't available
        to top_cpuset.  If no CPU or memory is left after update, all tasks
        are moved to the nearest ancestor with both resources.
      
      * update_tasks_cpumask() and update_tasks_nodemask() are now always
        called after cpus or mems masks are updated even if the cpuset
        doesn't have any task.  This is for brevity and not expected to have
        any measurable effect.
      
      * cpu_active_mask and N_HIGH_MEMORY are read exactly once per
        cpuset_handle_hotplug() invocation, all cpusets share the same view
        of what resources are available, and cpuset_handle_hotplug() can
        handle multiple resources going up and down.  These properties will
        allow async operation.
      
      The reorganization, while drastic, is equivalent and shouldn't cause
      any behavior difference.  This will enable making hotplug handling
      async and remove get_online_cpus() -> cgroup_mutex nesting.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: cleanup cpuset[_can]_attach() · 4e4c9a14
      Committed by Tejun Heo
      cpuset_can_attach() prepares the global variables cpus_attach and
      cpuset_attach_nodemask_{to|from}, which are used by cpuset_attach().
      There is no reason to prepare them in cpuset_can_attach(); the same
      information can be accessed from cpuset_attach().
      
      Move the preparation logic from cpuset_can_attach() to cpuset_attach()
      and make the global variables static ones inside cpuset_attach().
      
      With this change, there's no reason to keep
      cpuset_attach_nodemask_{from|to} global.  Move them inside
      cpuset_attach().  Unfortunately, we need to keep cpus_attach global as
      it can't be allocated from cpuset_attach().
      
      v2: cpus_attach not converted to cpumask_t as per Li Zefan and Rusty
          Russell.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
    • cpuset: introduce cpuset_for_each_child() · ae8086ce
      Committed by Tejun Heo
      Instead of iterating cgroup->children directly, introduce and use
      cpuset_for_each_child() which wraps cgroup_for_each_child() and
      performs an online check.  As it uses the generic iterator, it requires
      RCU read locking too.
      
      As cpuset is currently protected by cgroup_mutex, non-online cpusets
      aren't visible to all the iterations and this patch currently doesn't
      make any functional difference.  This will be used to de-couple cpuset
      locking from cgroup core.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: introduce CS_ONLINE · efeb77b2
      Committed by Tejun Heo
      Add CS_ONLINE which is set from css_online() and cleared from
      css_offline().  This will enable using generic cgroup iterator while
      allowing decoupling cpuset from cgroup internal locking.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
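The pattern of the two commits above (set an ONLINE flag from css_online(), clear it from css_offline(), and have the iterator wrapper skip anything not flagged) can be sketched in a few lines of toy C; all names here are hypothetical stand-ins:

```c
/* Illustrative stand-in for the real CS_ONLINE bit. */
#define TOY_CS_ONLINE 0x1u

struct toy_css {
    unsigned int flags;
};

/* Set from css_online(): the cpuset becomes visible to iterators. */
void toy_css_online(struct toy_css *cs)  { cs->flags |= TOY_CS_ONLINE; }

/* Cleared from css_offline(): iterators skip it from now on. */
void toy_css_offline(struct toy_css *cs) { cs->flags &= ~TOY_CS_ONLINE; }

/* The check a wrapper like cpuset_for_each_child() would perform on
 * each candidate produced by the generic cgroup iterator. */
int toy_is_online(const struct toy_css *cs)
{
    return (cs->flags & TOY_CS_ONLINE) != 0;
}
```

Filtering in the wrapper, rather than relying on cgroup internal locking to hide half-initialized children, is what allows cpuset to use the generic iterator while its locking is being decoupled.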
    • cpuset: introduce ->css_on/offline() · c8f699bb
      Committed by Tejun Heo
      Add cpuset_css_on/offline() and rearrange css init/exit such that,
      
      * Allocation and clearing to the default values happen in css_alloc().
        Allocation now uses kzalloc().
      
      * Config inheritance and registration happen in css_online().
      
      * css_offline() undoes what css_online() did.
      
      * css_free() frees.
      
      This doesn't introduce any visible behavior changes.  This will help
      cleaning up locking.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: remove fast exit path from remove_tasks_in_empty_cpuset() · 0772324a
      Committed by Tejun Heo
      The function isn't that hot, the overhead of missing the fast exit is
      low, the test itself depends heavily on cgroup internals, and it's
      gonna be a hindrance when trying to decouple cpuset locking from
      cgroup core.  Remove the fast exit path.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: remove unused cpuset_unlock() · 01c889cf
      Committed by Tejun Heo
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
  13. 13 Dec 2012, 1 commit
  14. 20 Nov 2012, 2 commits
  15. 24 Jul 2012, 4 commits
  16. 02 Apr 2012, 1 commit
    • cgroup: convert all non-memcg controllers to the new cftype interface · 4baf6e33
      Committed by Tejun Heo
      Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio,
      net_cls and device controllers to use the new cftype based interface.
      Termination entry is added to cftype arrays and populate callbacks are
      replaced with cgroup_subsys->base_cftypes initializations.
      
      This is a functionally identical transformation.  There shouldn't be any
      visible behavior change.
      
      memcg is rather special and will be converted separately.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <paul@paulmenage.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Vivek Goyal <vgoyal@redhat.com>
  17. 29 Mar 2012, 1 commit
  18. 28 Mar 2012, 1 commit