1. 13 1月, 2014 1 次提交
    • D
      sched: Add new scheduler syscalls to support an extended scheduling parameters ABI · d50dde5a
      Dario Faggioli 提交于
      Add the syscalls needed for supporting scheduling algorithms
      with extended scheduling parameters (e.g., SCHED_DEADLINE).
      
      In general, it makes possible to specify a periodic/sporadic task,
      that executes for a given amount of runtime at each instance, and is
      scheduled according to the urgency of their own timing constraints,
      i.e.:
      
       - a (maximum/typical) instance execution time,
       - a minimum interval between consecutive instances,
       - a time constraint by which each instance must be completed.
      
      Thus, both the data structure that holds the scheduling parameters of
      the tasks and the system calls dealing with it must be extended.
      Unfortunately, modifying the existing struct sched_param would break
      the ABI and result in potentially serious compatibility issues with
      legacy binaries.
      
      For these reasons, this patch:
      
       - defines the new struct sched_attr, containing all the fields
         that are necessary for specifying a task in the computational
         model described above;
      
       - defines and implements the new scheduling related syscalls that
         manipulate it, i.e., sched_setattr() and sched_getattr().
      
      Syscalls are introduced for x86 (32 and 64 bits) and ARM only, as a
      proof of concept and for developing and testing purposes. Making them
      available on other architectures is straightforward.
      
      Since no "user" for these new parameters is introduced in this patch,
      the implementation of the new system calls is just identical to their
      already existing counterpart. Future patches that implement scheduling
      policies able to exploit the new data structure must also take care of
      modifying the sched_*attr() calls accordingly with their own purposes.
      Signed-off-by: NDario Faggioli <raistlin@linux.it>
      [ Rewrote to use sched_attr. ]
      Signed-off-by: NJuri Lelli <juri.lelli@gmail.com>
      [ Removed sched_setscheduler2() for now. ]
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1383831828-15501-3-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d50dde5a
  2. 12 1月, 2014 1 次提交
    • R
      sched: Calculate effective load even if local weight is 0 · 9722c2da
      Rik van Riel 提交于
      Thomas Hellstrom bisected a regression where erratic 3D performance is
      experienced on virtual machines as measured by glxgears. It identified
      commit 58d081b5 ("sched/numa: Avoid overloading CPUs on a preferred NUMA
      node") as the problem which had modified the behaviour of effective_load.
      
      Effective load calculates the difference to the system-wide load if a
      scheduling entity was moved to another CPU. The task group is not heavier
      as a result of the move but overall system load can increase/decrease as a
      result of the change. Commit 58d081b5 ("sched/numa: Avoid overloading CPUs
      on a preferred NUMA node") changed effective_load to make it suitable for
      calculating if a particular NUMA node was compute overloaded. To reduce
      the cost of the function, it assumed that a current sched entity weight
      of 0 was uninteresting but that is not the case.
      
      wake_affine() uses a weight of 0 for sync wakeups on the grounds that it
      is assuming the waking task will sleep and not contribute to load in the
      near future. In this case, we still want to calculate the effective load
      of the sched entity hierarchy. As effective_load is no longer used by
      task_numa_compare since commit fb13c7ee (sched/numa: Use a system-wide
      search to find swap/migration candidates), this patch simply restores the
      historical behaviour.
      Reported-and-tested-by: NThomas Hellstrom <thellstrom@vmware.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      [ Wrote changelog]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140106113912.GC6178@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9722c2da
  3. 19 12月, 2013 1 次提交
  4. 17 12月, 2013 6 次提交
  5. 11 12月, 2013 2 次提交
    • P
      sched/fair: Rework sched_fair time accounting · 9dbdb155
      Peter Zijlstra 提交于
      Christian suffers from a bad BIOS that wrecks his i5's TSC sync. This
      results in him occasionally seeing time going backwards - which
      crashes the scheduler ...
      
      Most of our time accounting can actually handle that except the most
      common one; the tick time update of sched_fair.
      
      There is a further problem with that code; previously we assumed that
      because we get a tick every TICK_NSEC our time delta could never
      exceed 32bits and math was simpler.
      
      However, ever since Frederic managed to get NO_HZ_FULL merged; this is
      no longer the case since now a task can run for a long time indeed
      without getting a tick. It only takes about ~4.2 seconds to overflow
      our u32 in nanoseconds.
      
      This means we not only need to better deal with time going backwards;
      but also means we need to be able to deal with large deltas.
      
      This patch reworks the entire code and uses mul_u64_u32_shr() as
      proposed by Andy a long while ago.
      
      We express our virtual time scale factor in a u32 multiplier and shift
      right and the 32bit mul_u64_u32_shr() implementation reduces to a
      single 32x32->64 multiply if the time delta is still short (common
      case).
      
      For 64bit a 64x64->128 multiply can be used if ARCH_SUPPORTS_INT128.
      Reported-and-Tested-by: NChristian Engelmayer <cengelma@gmx.at>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: fweisbec@gmail.com
      Cc: Paul Turner <pjt@google.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20131118172706.GI3866@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9dbdb155
    • P
      sched: Initialize power_orig for overlapping groups · 8e8339a3
      Peter Zijlstra 提交于
      Yinghai reported that he saw a /0 in sg_capacity on his EX parts.
      Make sure to always initialize power_orig now that we actually use it.
      
      Ideally build_sched_domains() -> init_sched_groups_power() would also
      initialize this; but for some yet unexplained reason some setups seem
      to miss updates there.
      Reported-by: NYinghai Lu <yinghai@kernel.org>
      Tested-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-l8ng2m9uml6fhibln8wqpom7@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8e8339a3
  6. 05 12月, 2013 1 次提交
  7. 27 11月, 2013 6 次提交
  8. 20 11月, 2013 3 次提交
  9. 13 11月, 2013 4 次提交
  10. 06 11月, 2013 5 次提交
  11. 29 10月, 2013 5 次提交
  12. 28 10月, 2013 1 次提交
  13. 26 10月, 2013 1 次提交
  14. 16 10月, 2013 3 次提交
    • P
      sched: Remove get_online_cpus() usage · 6acce3ef
      Peter Zijlstra 提交于
      Remove get_online_cpus() usage from the scheduler; there's 4 sites that
      use it:
      
       - sched_init_smp(); where its completely superfluous since we're in
         'early' boot and there simply cannot be any hotplugging.
      
       - sched_getaffinity(); we already take a raw spinlock to protect the
         task cpus_allowed mask, this disables preemption and therefore
         also stabilizes cpu_online_mask as that's modified using
         stop_machine. However switch to active mask for symmetry with
         sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
         mask stability by inserting sync_rcu/sched() into _cpu_down.
      
       - sched_setaffinity(); we don't appear to need get_online_cpus()
         either, there's two sites where hotplug appears relevant:
          * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
            for the cpuset case we hold task_lock, which is a spinlock and
            thus for mainline disables preemption (might cause pain on RT).
          * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
            preemption properly disabled; also it already deals with hotplug
            races explicitly where it releases them.
      
       - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
         us with a little trickery. By adding a sync_sched/rcu() after the
         CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for
         cpu_active_mask. Use these to validate that both our cpus are active
         when queueing the stop work before we queue the stop_machine works
         for take_cpu_down().
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6acce3ef
    • P
      sched: Fix race in migrate_swap_stop() · 74602315
      Peter Zijlstra 提交于
      There is a subtle race in migrate_swap, when task P, on CPU A, decides to swap
      places with task T, on CPU B.
      
      Task P:
        - call migrate_swap
      Task T:
        - go to sleep, removing itself from the runqueue
      Task P:
        - double lock the runqueues on CPU A & B
      Task T:
        - get woken up, place itself on the runqueue of CPU C
      Task P:
        - see that task T is on a runqueue, and pretend to remove it
          from the runqueue on CPU B
      
      Now CPUs B & C both have corrupted scheduler data structures.
      
      This patch fixes it, by holding the pi_lock for both of the tasks
      involved in the migrate swap. This prevents task T from waking up,
      and placing itself onto another runqueue, until after migrate_swap
      has released all locks.
      
      This means that, when migrate_swap checks, task T will be either
      on the runqueue where it was originally seen, or not on any
      runqueue at all. Migrate_swap deals correctly with of those cases.
      Tested-by: NJoe Mario <jmario@redhat.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: hannes@cmpxchg.org
      Cc: aarcange@redhat.com
      Cc: srikar@linux.vnet.ibm.com
      Cc: tglx@linutronix.de
      Cc: hpa@zytor.com
      Link: http://lkml.kernel.org/r/20131010181722.GO13848@laptop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      74602315
    • P
      sched/rt: Add missing rmb() · 7c3f2ab7
      Peter Zijlstra 提交于
      While discussing the proposed SCHED_DEADLINE patches which in parts
      mimic the existing FIFO code it was noticed that the wmb in
      rt_set_overloaded() didn't have a matching barrier.
      
      The only site using rt_overloaded() to test the rto_count is
      pull_rt_task() and we should issue a matching rmb before then assuming
      there's an rto_mask bit set.
      
      Without that smp_rmb() in there we could actually miss seeing the
      rto_mask bit.
      
      Also, change to using smp_[wr]mb(), even though this is SMP only code;
      memory barriers without smp_ always make me think they're against
      hardware of some sort.
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: vincent.guittot@linaro.org
      Cc: luca.abeni@unitn.it
      Cc: bruce.ashfield@windriver.com
      Cc: dhaval.giani@gmail.com
      Cc: rostedt@goodmis.org
      Cc: hgu1972@gmail.com
      Cc: oleg@redhat.com
      Cc: fweisbec@gmail.com
      Cc: darren@dvhart.com
      Cc: johan.eker@ericsson.com
      Cc: p.faure@akatech.ch
      Cc: paulmck@linux.vnet.ibm.com
      Cc: raistlin@linux.it
      Cc: claudio@evidence.eu.com
      Cc: insop.song@gmail.com
      Cc: michael@amarulasolutions.com
      Cc: liming.wang@windriver.com
      Cc: fchecconi@gmail.com
      Cc: jkacur@redhat.com
      Cc: tommaso.cucinotta@sssup.it
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: harald.gustafsson@ericsson.com
      Cc: nicola.manica@disi.unitn.it
      Cc: tglx@linutronix.de
      Link: http://lkml.kernel.org/r/20131015103507.GF10651@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7c3f2ab7