1. 08 Jan 2013 (10 commits)
    • cpuset: drop async_rebuild_sched_domains() · 699140ba
      Committed by Tejun Heo
      In general, we want to make cgroup_mutex one of the outermost locks
      and be able to use get_online_cpus() and friends from cgroup methods.
      With cpuset hotplug made async, get_online_cpus() can now be nested
      inside cgroup_mutex.
      
      Currently, cpuset avoids nesting get_online_cpus() inside cgroup_mutex
      by bouncing sched_domain rebuilding to a work item.  As such nesting
      is allowed now, remove the workqueue bouncing code and always rebuild
      sched_domains synchronously.  This also nests sched_domains_mutex
      inside cgroup_mutex, which is intended and should be okay.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: don't nest cgroup_mutex inside get_online_cpus() · 3a5a6d0c
      Committed by Tejun Heo
      CPU / memory hotplug path currently grabs cgroup_mutex from hotplug
      event notifications.  We want to separate cpuset locking from cgroup
      core and make cgroup_mutex outer to hotplug synchronization so that,
      among other things, mechanisms which depend on get_online_cpus() can
      be used from cgroup callbacks.  In general, we want to keep
      cgroup_mutex the outermost lock to minimize locking interactions among
      different controllers.
      
      Convert cpuset_handle_hotplug() to cpuset_hotplug_workfn() and
      schedule it from the hotplug notifications.  As the function can
      already handle multiple mixed events without any input, converting it
      to a work function is mostly trivial; however, one complication is
      that cpuset_update_active_cpus() needs to update sched domains
      synchronously to reflect an offlined cpu to avoid confusing the
      scheduler.  This is worked around by falling back to the default
      single sched domain synchronously before scheduling the actual
      hotplug work.  This makes sched domains get rebuilt twice per CPU
      hotplug event, but the operation isn't that heavy and much of the
      second rebuild would be a noop for systems w/ a single sched
      domain, which is the common case.
      
      This decouples cpuset hotplug handling from the notification callbacks
      and there can be an arbitrary delay between the actual event and
      updates to cpusets.  Scheduler and mm can handle it fine but moving
      tasks out of an empty cpuset may race against writes to the cpuset
      restoring execution resources which can lead to confusing behavior.
      Flush hotplug work item from cpuset_write_resmask() to avoid such
      confusions.
      
      v2: Synchronous sched domain rebuilding using the fallback sched
          domain added.  This fixes various issues caused by confused
          scheduler putting tasks on a dead CPU, including the one reported
          by Li Zefan.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: reorganize CPU / memory hotplug handling · deb7aa30
      Committed by Tejun Heo
      Reorganize hotplug path to prepare for async hotplug handling.
      
      * Both CPU and memory hotplug handlings are collected into a single
        function - cpuset_handle_hotplug().  It doesn't take any argument
        but compares the current settings of top_cpuset against what's
        actually available to determine what happened.  This function
        directly updates top_cpuset.  If there are CPUs or memory nodes
        which are taken down, cpuset_propagate_hotplug() is invoked on all
        !root cpusets.
      
      * cpuset_propagate_hotplug() is responsible for updating the specified
        cpuset so that it doesn't include any resource which isn't available
        to top_cpuset.  If no CPU or memory is left after update, all tasks
        are moved to the nearest ancestor with both resources.
      
      * update_tasks_cpumask() and update_tasks_nodemask() are now always
        called after cpus or mems masks are updated even if the cpuset
        doesn't have any task.  This is for brevity and not expected to have
        any measurable effect.
      
      * cpu_active_mask and N_HIGH_MEMORY are read exactly once per
        cpuset_handle_hotplug() invocation, all cpusets share the same view
        of what resources are available, and cpuset_handle_hotplug() can
        handle multiple resources going up and down.  These properties will
        allow async operation.
      
      The reorganization, while drastic, is equivalent and shouldn't cause
      any behavior difference.  This will enable making hotplug handling
      async and remove get_online_cpus() -> cgroup_mutex nesting.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: cleanup cpuset[_can]_attach() · 4e4c9a14
      Committed by Tejun Heo
      cpuset_can_attach() prepares the global variables cpus_attach and
      cpuset_attach_nodemask_{to|from}, which are used by cpuset_attach().
      There is no reason to do this preparation in cpuset_can_attach();
      the same information can be accessed from cpuset_attach().
      
      Move the preparation logic from cpuset_can_attach() to
      cpuset_attach() and make the formerly global variables static
      inside cpuset_attach().
      
      With this change, there's no reason to keep
      cpuset_attach_nodemask_{from|to} global.  Move them inside
      cpuset_attach().  Unfortunately, we need to keep cpus_attach global as
      it can't be allocated from cpuset_attach().
      
      v2: cpus_attach not converted to cpumask_t as per Li Zefan and Rusty
          Russell.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
    • cpuset: introduce cpuset_for_each_child() · ae8086ce
      Committed by Tejun Heo
      Instead of iterating cgroup->children directly, introduce and use
      cpuset_for_each_child() which wraps cgroup_for_each_child() and
      performs online check.  As it uses the generic iterator, it requires
      RCU read locking too.
      
      As cpuset is currently protected by cgroup_mutex, non-online cpusets
      aren't visible to all the iterations and this patch currently doesn't
      make any functional difference.  This will be used to de-couple cpuset
      locking from cgroup core.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: introduce CS_ONLINE · efeb77b2
      Committed by Tejun Heo
      Add CS_ONLINE which is set from css_online() and cleared from
      css_offline().  This will enable using generic cgroup iterator while
      allowing decoupling cpuset from cgroup internal locking.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: introduce ->css_on/offline() · c8f699bb
      Committed by Tejun Heo
      Add cpuset_css_on/offline() and rearrange css init/exit such that,
      
      * Allocation and clearing to the default values happen in css_alloc().
        Allocation now uses kzalloc().
      
      * Config inheritance and registration happen in css_online().
      
      * css_offline() undoes what css_online() did.
      
      * css_free() frees.
      
      This doesn't introduce any visible behavior changes.  This will
      help clean up locking.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: remove fast exit path from remove_tasks_in_empty_cpuset() · 0772324a
      Committed by Tejun Heo
      The function isn't that hot, the overhead of missing the fast exit is
      low, the test itself depends heavily on cgroup internals, and it's
      gonna be a hindrance when trying to decouple cpuset locking from
      cgroup core.  Remove the fast exit path.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: remove unused cpuset_unlock() · 01c889cf
      Committed by Tejun Heo
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: implement cgroup_rightmost_descendant() · 12a9d2fe
      Committed by Tejun Heo
      Implement cgroup_rightmost_descendant(), which returns the rightmost
      descendant of the specified cgroup.  This can be used to skip the
      cgroup's subtree while iterating with
      cgroup_for_each_descendant_pre().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Li Zefan <lizefan@huawei.com>
  2. 26 Dec 2012 (1 commit)
    • pidns: Stop pid allocation when init dies · c876ad76
      Committed by Eric W. Biederman
      Oleg pointed out that in a pid namespace the sequence:
      - pid 1 becomes a zombie
      - setns(thepidns), fork,...
      - reaping pid 1.
      - The injected processes exiting.
      
      can lead to processes attempting to access their child reaper and
      instead following a stale pointer.
      
      That waitpid for init can return before all of the processes in
      the pid namespace have exited is also unfortunate.
      
      Avoid these problems by disabling the allocation of new pids in a pid
      namespace when init dies, instead of when the last process in a pid
      namespace is reaped.
      Pointed-out-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  3. 25 Dec 2012 (1 commit)
  4. 21 Dec 2012 (2 commits)
  5. 20 Dec 2012 (7 commits)
  6. 19 Dec 2012 (3 commits)
  7. 18 Dec 2012 (9 commits)
  8. 17 Dec 2012 (1 commit)
  9. 15 Dec 2012 (3 commits)
  10. 14 Dec 2012 (3 commits)