1. 27 Nov 2013, 1 commit
  2. 20 Nov 2013, 2 commits
  3. 13 Nov 2013, 1 commit
  4. 06 Nov 2013, 3 commits
  5. 29 Oct 2013, 1 commit
  6. 28 Oct 2013, 1 commit
  7. 16 Oct 2013, 2 commits
    • sched: Remove get_online_cpus() usage · 6acce3ef
      Committed by Peter Zijlstra
      Remove get_online_cpus() usage from the scheduler; there are four
      sites that use it:
      
       - sched_init_smp(); where it's completely superfluous, since we're in
         'early' boot and there simply cannot be any hotplugging.
      
       - sched_getaffinity(); we already take a raw spinlock to protect the
         task's cpus_allowed mask; this disables preemption and therefore
         also stabilizes cpu_online_mask, as that mask is modified using
         stop_machine. However, switch to the active mask for symmetry with
         sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
         mask stability by inserting sync_rcu/sched() into _cpu_down.
      
       - sched_setaffinity(); we don't appear to need get_online_cpus()
         either; there are two sites where hotplug appears relevant:
          * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
            for the cpuset case we hold task_lock, which is a spinlock and
            thus for mainline disables preemption (might cause pain on RT).
          * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
            preemption properly disabled; also it already deals with hotplug
            races explicitly where it releases them.
      
       - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
         us with a little trickery. By adding a sync_sched/rcu() after the
         CPU_DOWN_PREPARE notifier we can provide preempt/RCU guarantees for
         cpu_active_mask. Use these to validate that both our CPUs are active
         when queueing the stop work, before we queue the stop_machine works
         for take_cpu_down() (see the sketch below).
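      
      For illustration, a minimal sketch of the kind of check this enables
      (a hypothetical helper, not the actual patch): once a sync_sched() has
      run after CPU_DOWN_PREPARE, disabling preemption is enough to keep
      cpu_active_mask stable while we test it.
      
          #include <linux/cpumask.h>
          #include <linux/preempt.h>
          #include <linux/types.h>
          
          /*
           * Hypothetical helper: verify that both CPUs are still active
           * before queueing stop work. Preemption being disabled pins
           * cpu_active_mask, given the sync_sched() after CPU_DOWN_PREPARE.
           */
          static bool both_cpus_active(int src_cpu, int dst_cpu)
          {
                  bool ret;
          
                  preempt_disable();
                  ret = cpu_active(src_cpu) && cpu_active(dst_cpu);
                  preempt_enable();
          
                  return ret;
          }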
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Fix race in migrate_swap_stop() · 74602315
      Committed by Peter Zijlstra
      There is a subtle race in migrate_swap, when task P, on CPU A, decides to swap
      places with task T, on CPU B.
      
      Task P:
        - call migrate_swap
      Task T:
        - go to sleep, removing itself from the runqueue
      Task P:
        - double lock the runqueues on CPU A & B
      Task T:
        - get woken up, place itself on the runqueue of CPU C
      Task P:
        - see that task T is on a runqueue, and pretend to remove it
          from the runqueue on CPU B
      
      Now CPUs B & C both have corrupted scheduler data structures.
      
      This patch fixes it, by holding the pi_lock for both of the tasks
      involved in the migrate swap. This prevents task T from waking up,
      and placing itself onto another runqueue, until after migrate_swap
      has released all locks.
      
      This means that, when migrate_swap checks, task T will be either
      on the runqueue where it was originally seen, or not on any
      runqueue at all. migrate_swap deals correctly with both of those cases.
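      
      As a rough illustration of that locking rule (an invented helper, not
      the verbatim patch), both pi_locks are taken in a stable order before
      the runqueue locks, so neither task can get through try_to_wake_up()
      while the swap is being examined:
      
          #include <linux/sched.h>
          #include <linux/spinlock.h>
          #include <linux/lockdep.h>
          
          /* Take both tasks' pi_lock in address order to avoid ABBA deadlock. */
          static void pi_lock_pair(struct task_struct *p1, struct task_struct *p2)
          {
                  if (p1 < p2) {
                          raw_spin_lock_irq(&p1->pi_lock);
                          raw_spin_lock_nested(&p2->pi_lock, SINGLE_DEPTH_NESTING);
                  } else {
                          raw_spin_lock_irq(&p2->pi_lock);
                          raw_spin_lock_nested(&p1->pi_lock, SINGLE_DEPTH_NESTING);
                  }
                  /* The runqueue locks are taken after this; wakeups of p1/p2
                   * now block until everything is released again. */
          }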
      Tested-by: Joe Mario <jmario@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: hannes@cmpxchg.org
      Cc: aarcange@redhat.com
      Cc: srikar@linux.vnet.ibm.com
      Cc: tglx@linutronix.de
      Cc: hpa@zytor.com
      Link: http://lkml.kernel.org/r/20131010181722.GO13848@laptop.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 09 Oct 2013, 14 commits
  9. 25 Sep 2013, 7 commits
  10. 20 Sep 2013, 2 commits
  11. 02 Sep 2013, 1 commit
    • sched/fair: Fix the sd_parent_degenerate() code · 10866e62
      Committed by Peter Zijlstra
      I found that on my WSM box I had a redundant domain:
      
      [    0.949769] CPU0 attaching sched-domain:
      [    0.953765]  domain 0: span 0,12 level SIBLING
      [    0.958335]   groups: 0 (cpu_power = 587) 12 (cpu_power = 588)
      [    0.964548]   domain 1: span 0-5,12-17 level MC
      [    0.969206]    groups: 0,12 (cpu_power = 1175) 1,13 (cpu_power = 1176) 2,14 (cpu_power = 1176) 3,15 (cpu_power = 1176) 4,16 (cpu_power = 1176) 5,17 (cpu_power = 1176)
      [    0.984993]    domain 2: span 0-5,12-17 level CPU
      [    0.989822]     groups: 0-5,12-17 (cpu_power = 7055)
      [    0.995049]     domain 3: span 0-23 level NUMA
      [    0.999620]      groups: 0-5,12-17 (cpu_power = 7055) 6-11,18-23 (cpu_power = 7056)
      
      Note how domain 2 has only a single group and spans the same CPUs as
      domain 1. We should not keep such domains and do in fact have code to
      prune these.
      
      It turns out that the 'new' SD_PREFER_SIBLING flag causes this: it
      makes sd_parent_degenerate() fail on the CPU domain. We can easily
      fix this by 'ignoring' the SD_PREFER_SIBLING bit and transferring it
      to whatever domain ends up covering the span (sketched after the new
      domain layout below).
      
      With this patch the domains now look like this:
      
      [    0.950419] CPU0 attaching sched-domain:
      [    0.954454]  domain 0: span 0,12 level SIBLING
      [    0.959039]   groups: 0 (cpu_power = 587) 12 (cpu_power = 588)
      [    0.965271]   domain 1: span 0-5,12-17 level MC
      [    0.969936]    groups: 0,12 (cpu_power = 1175) 1,13 (cpu_power = 1176) 2,14 (cpu_power = 1176) 3,15 (cpu_power = 1176) 4,16 (cpu_power = 1176) 5,17 (cpu_power = 1176)
      [    0.985737]    domain 2: span 0-23 level NUMA
      [    0.990231]     groups: 0-5,12-17 (cpu_power = 7055) 6-11,18-23 (cpu_power = 7056)
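      
      Roughly, the transfer happens in the domain degeneration pass (an
      abridged sketch, paraphrasing the attach path rather than quoting the
      patch): when the degenerate parent is pruned, its SD_PREFER_SIBLING
      bit is pushed down to the child that now covers the same span instead
      of being lost with the domain.
      
          /* Abridged: inside the loop that prunes degenerate parents. */
          if (sd_parent_degenerate(tmp, parent)) {
                  tmp->parent = parent->parent;
                  if (parent->parent)
                          parent->parent->child = tmp;
                  /*
                   * Transfer SD_PREFER_SIBLING down in case of a degenerate
                   * parent; the spans match, so the property transfers.
                   */
                  if (parent->flags & SD_PREFER_SIBLING)
                          tmp->flags |= SD_PREFER_SIBLING;
          }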
      Reviewed-by: Paul Turner <pjt@google.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-ys201g4jwukj0h8xcamakxq1@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 16 Aug 2013, 1 commit
  13. 13 Aug 2013, 2 commits
    • sched: fix the theoretical signal_wake_up() vs schedule() race · e0acd0a6
      Committed by Oleg Nesterov
      This is only theoretical, but after try_to_wake_up(p) was changed
      to check p->state under p->pi_lock the code like
      
      	__set_current_state(TASK_INTERRUPTIBLE);
      	schedule();
      
      can miss a signal. This is the special case of wait-for-condition:
      it relies on the try_to_wake_up()/schedule() interaction and thus does
      not need an mb() between __set_current_state() and the
      if (signal_pending) check.
      
      However, this __set_current_state() can move into the critical
      section protected by rq->lock; now that try_to_wake_up() takes
      another lock, we need to ensure that it can't be reordered with
      the "if (signal_pending(current))" check inside that section.
      
      The patch is actually a one-liner: it simply adds smp_wmb() before
      spin_lock_irq(rq->lock). This is what try_to_wake_up() already
      does, for the same reason.
      
      We turn this wmb() into the new helper, smp_mb__before_spinlock(),
      for better documentation and to allow the architectures to change
      the default implementation.
      
      While at it, kill smp_mb__after_lock(), it has no callers.
      
      Perhaps we can also add smp_mb__before/after_spinunlock() for
      prepare_to_wait().
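      
      For reference, the shape of the change itself (abridged; the comments
      here are paraphrased, not quoted from the patch) is a new
      arch-overridable helper plus one call site in the scheduler, right
      before rq->lock is taken:
      
          /* Default definition; architectures may override it. The default
           * assumes that keeping the earlier STORE out of the critical
           * section is enough, since spin_lock() already keeps the LOAD
           * from escaping it. */
          #ifndef smp_mb__before_spinlock
          #define smp_mb__before_spinlock()       smp_wmb()
          #endif
          
          /* In __schedule(), abridged: order the caller's p->state store
           * against the signal_pending check done under rq->lock. */
          smp_mb__before_spinlock();
          raw_spin_lock_irq(&rq->lock);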
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sched: Consolidate open coded preemptible() checks · fbb00b56
      Committed by Frederic Weisbecker
      preempt_schedule() and preempt_schedule_context() open-code their
      preemptibility checks.
      
      Use the standard API instead for consolidation.
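      
      Representative before/after fragments (not the verbatim diff) of what
      the consolidation looks like inside preempt_schedule():
      
          /* Before: open-coded check. */
          struct thread_info *ti = current_thread_info();
          
          if (likely(ti->preempt_count || irqs_disabled()))
                  return;
          
          /* After: the standard preemptible() helper, i.e.
           * preempt_count() == 0 && !irqs_disabled(). */
          if (likely(!preemptible()))
                  return;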
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Alex Shi <alex.shi@intel.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
  14. 09 Aug 2013, 2 commits
    • cgroup: make cgroup_taskset deal with cgroup_subsys_state instead of cgroup · d99c8727
      Committed by Tejun Heo
      cgroup is in the process of converting to css (cgroup_subsys_state)
      from cgroup as the principal subsystem interface handle.  This is
      mostly to prepare for the unified hierarchy support where css's will
      be created and destroyed dynamically but also helps cleaning up
      subsystem implementations as css is usually what they are interested
      in anyway.
      
      cgroup_taskset, which is used by the subsystem attach methods, is the
      last cgroup subsystem API that isn't using css as the handle.  Update
      cgroup_taskset_cur_cgroup() to cgroup_taskset_cur_css() and
      cgroup_taskset_for_each() to take @skip_css instead of @skip_cgrp.
      
      The conversions are pretty mechanical.  One exception is
      cpuset::cgroup_cs(), which lost its last user and got removed.
      
      This patch shouldn't introduce any functional changes.
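      
      As a hedged illustration (the method name below is invented; the shape
      follows the converted API), a subsystem attach callback now receives a
      css and passes it as the iterator's skip argument:
      
          #include <linux/cgroup.h>
          
          /* Example attach method after the conversion: both the handle and
           * the skip argument of the iterator are a cgroup_subsys_state. */
          static void example_attach(struct cgroup_subsys_state *css,
                                     struct cgroup_taskset *tset)
          {
                  struct task_struct *task;
          
                  cgroup_taskset_for_each(task, css, tset) {
                          /* per-task attach work goes here */
                  }
          }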
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
    • cgroup: pass around cgroup_subsys_state instead of cgroup in file methods · 182446d0
      Committed by Tejun Heo
      cgroup is currently in the process of transitioning to using struct
      cgroup_subsys_state * as the primary handle instead of struct cgroup.
      Please see the previous commit which converts the subsystem methods
      for rationale.
      
      This patch converts all cftype file operations to take @css instead of
      @cgroup.  cftypes for the cgroup core files don't have their subsystem
      pointer set.  These will automatically use the dummy_css added by the
      previous patch and can be converted the same way.
      
      Most subsystem conversions are straightforward, but there are some
      interesting ones (a representative before/after sketch follows the
      list below).
      
      * freezer: update_if_frozen() is also converted to take @css instead
        of @cgroup for consistency.  This will make the code look simpler
        too once iterators are converted to use css.
      
      * memory/vmpressure: mem_cgroup_from_css() needs to be exported to
        vmpressure while mem_cgroup_from_cont() can be made static.
        Updated accordingly.
      
      * cpu: cgroup_tg() doesn't have any user left.  Removed.
      
      * cpuacct: cgroup_ca() doesn't have any user left.  Removed.
      
      * hugetlb: hugetlb_cgroup_from_cgroup() doesn't have any user left.
        Removed.
      
      * net_cls: cgrp_cls_state() doesn't have any user left.  Removed.
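      
      A representative before/after of a cftype read handler under this
      conversion (the handler and accessor names here are illustrative, not
      taken from the patch):
      
          /* Before: file methods took the cgroup itself. */
          static u64 example_read_u64(struct cgroup *cgrp, struct cftype *cft)
          {
                  /* example_state_from_cgroup() is a hypothetical accessor */
                  return example_state_from_cgroup(cgrp)->some_value;
          }
          
          /* After: file methods take the subsystem state directly. */
          static u64 example_read_u64(struct cgroup_subsys_state *css,
                                      struct cftype *cft)
          {
                  /* example_state_from_css() is a hypothetical accessor */
                  return example_state_from_css(css)->some_value;
          }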
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Steven Rostedt <rostedt@goodmis.org>