1. 04 5月, 2013 1 次提交
    • F
      sched: Keep at least 1 tick per second for active dynticks tasks · 265f22a9
      Frederic Weisbecker 提交于
      The scheduler doesn't yet fully support environments
      with a single task running without a periodic tick.
      
      In order to ensure we still maintain the duties of scheduler_tick(),
      keep at least 1 tick per second.
      
      This makes sure that we keep the progression of various scheduler
      accounting and background maintainance even with a very low granularity.
      Examples include cpu load, sched average, CFS entity vruntime,
      avenrun and events such as load balancing, amongst other details
      handled in sched_class::task_tick().
      
      This limitation will be removed in the future once we get
      these individual items to work in full dynticks CPUs.
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      265f22a9
  2. 26 4月, 2013 1 次提交
    • V
      sched: Fix init NOHZ_IDLE flag · 25f55d9d
      Vincent Guittot 提交于
      On my SMP platform which is made of 5 cores in 2 clusters, I
      have the nr_busy_cpu field of sched_group_power struct that is
      not null when the platform is fully idle - which makes the
      scheduler unhappy.
      
      The root cause is:
      
      During the boot sequence, some CPUs reach the idle loop and set
      their NOHZ_IDLE flag while waiting for others CPUs to boot. But
      the nr_busy_cpus field is initialized later with the assumption
      that all CPUs are in the busy state whereas some CPUs have
      already set their NOHZ_IDLE flag.
      
      More generally, the NOHZ_IDLE flag must be initialized when new
      sched_domains are created in order to ensure that NOHZ_IDLE and
      nr_busy_cpus are aligned.
      
      This condition can be ensured by adding a synchronize_rcu()
      between the destruction of old sched_domains and the creation of
      new ones so the NOHZ_IDLE flag will not be updated with old
      sched_domain once it has been initialized. But this solution
      introduces a additionnal latency in the rebuild sequence that is
      called during cpu hotplug.
      
      As suggested by Frederic Weisbecker, another solution is to have
      the same rcu lifecycle for both NOHZ_IDLE and sched_domain
      struct. A new nohz_idle field is added to sched_domain so both
      status and sched_domain will share the same RCU lifecycle and
      will be always synchronized. In addition, there is no more need
      to protect nohz_idle against concurrent access as it is only
      modified by 2 exclusive functions called by local cpu.
      
      This solution has been prefered to the creation of a new struct
      with an extra pointer indirection for sched_domain.
      
      The synchronization is done at the cost of :
      
       - An additional indirection and a rcu_dereference for accessing nohz_idle.
       - We use only the nohz_idle field of the top sched_domain.
      Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: linaro-kernel@lists.linaro.org
      Cc: peterz@infradead.org
      Cc: fweisbec@gmail.com
      Cc: pjt@google.com
      Cc: rostedt@goodmis.org
      Cc: efault@gmx.de
      Link: http://lkml.kernel.org/r/1366729142-14662-1-git-send-email-vincent.guittot@linaro.org
      [ Fixed !NO_HZ build bug. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      25f55d9d
  3. 23 4月, 2013 1 次提交
    • F
      sched: Kick full dynticks CPU that have more than one task enqueued. · 9f3660c2
      Frederic Weisbecker 提交于
      Kick the tick on full dynticks CPUs when they get more
      than one task running on their queue. This makes sure that
      local fairness is maintained by the tick on the destination.
      
      This is done regardless of these tasks' class. We should
      be able to be more clever in the future depending on these. eg:
      a CPU that runs a SCHED_FIFO task doesn't need to maintain
      fairness against local pending tasks of the fair class.
      
      But keep things simple for now.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      9f3660c2
  4. 21 4月, 2013 1 次提交
    • V
      sched: Fix wrong rq's runnable_avg update with rt tasks · 642dbc39
      Vincent Guittot 提交于
      The current update of the rq's load can be erroneous when RT
      tasks are involved.
      
      The update of the load of a rq that becomes idle, is done only
      if the avg_idle is less than sysctl_sched_migration_cost. If RT
      tasks and short idle duration alternate, the runnable_avg will
      not be updated correctly and the time will be accounted as idle
      time when a CFS task wakes up.
      
      A new idle_enter function is called when the next task is the
      idle function so the elapsed time will be accounted as run time
      in the load of the rq, whatever the average idle time is. The
      function update_rq_runnable_avg is removed from idle_balance.
      
      When a RT task is scheduled on an idle CPU, the update of the
      rq's load is not done when the rq exit idle state because CFS's
      functions are not called. Then, the idle_balance, which is
      called just before entering the idle function, updates the rq's
      load and makes the assumption that the elapsed time since the
      last update, was only running time.
      
      As a consequence, the rq's load of a CPU that only runs a
      periodic RT task, is close to LOAD_AVG_MAX whatever the running
      duration of the RT task is.
      
      A new idle_exit function is called when the prev task is the
      idle function so the elapsed time will be accounted as idle time
      in the rq's load.
      Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: linaro-kernel@lists.linaro.org
      Cc: peterz@infradead.org
      Cc: pjt@google.com
      Cc: fweisbec@gmail.com
      Cc: efault@gmx.de
      Link: http://lkml.kernel.org/r/1366302867-5055-1-git-send-email-vincent.guittot@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      642dbc39
  5. 10 4月, 2013 1 次提交
  6. 03 4月, 2013 1 次提交
    • F
      nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON · 3451d024
      Frederic Weisbecker 提交于
      We are planning to convert the dynticks Kconfig options layout
      into a choice menu. The user must be able to easily pick
      any of the following implementations: constant periodic tick,
      idle dynticks, full dynticks.
      
      As this implies a mutual exclusion, the two dynticks implementions
      need to converge on the selection of a common Kconfig option in order
      to ease the sharing of a common infrastructure.
      
      It would thus seem pretty natural to reuse CONFIG_NO_HZ to
      that end. It already implements all the idle dynticks code
      and the full dynticks depends on all that code for now.
      So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
      CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.
      
      On the other hand we want to stay backward compatible: if
      CONFIG_NO_HZ is set in an older config file, we want to
      enable CONFIG_NO_HZ_IDLE by default.
      
      But we can't afford both at the same time or we run into
      a circular dependency:
      
      1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
         CONFIG_NO_HZ
      2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE
      
      We might be able to support that from Kconfig/Kbuild but it
      may not be wise to introduce such a confusing behaviour.
      
      So to solve this, create a new CONFIG_NO_HZ_COMMON option
      which gathers the common code between idle and full dynticks
      (that common code for now is simply the idle dynticks code)
      and select it from their referring Kconfig.
      
      Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
      to it for backward compatibility.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      3451d024
  7. 11 3月, 2013 1 次提交
  8. 06 3月, 2013 6 次提交
  9. 08 2月, 2013 2 次提交
  10. 11 12月, 2012 2 次提交
  11. 24 10月, 2012 9 次提交
  12. 13 9月, 2012 1 次提交
  13. 04 9月, 2012 1 次提交
  14. 20 8月, 2012 1 次提交
    • F
      sched: Move cputime code to its own file · 73fbec60
      Frederic Weisbecker 提交于
      Extract cputime code from the giant sched/core.c and
      put it in its own file. This make it easier to deal with
      this particular area and de-bloat a bit more core.c
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      73fbec60
  15. 14 8月, 2012 2 次提交
  16. 24 7月, 2012 1 次提交
  17. 06 7月, 2012 1 次提交
  18. 06 6月, 2012 1 次提交
  19. 14 5月, 2012 1 次提交
    • P
      sched/nohz: Fix rq->cpu_load[] calculations · 556061b0
      Peter Zijlstra 提交于
      While investigating why the load-balancer did funny I found that the
      rq->cpu_load[] tables were completely screwy.. a bit more digging
      revealed that the updates that got through were missing ticks followed
      by a catchup of 2 ticks.
      
      The catchup assumes the cpu was idle during that time (since only nohz
      can cause missed ticks and the machine is idle etc..) this means that
      esp. the higher indices were significantly lower than they ought to
      be.
      
      The reason for this is that its not correct to compare against jiffies
      on every jiffy on any other cpu than the cpu that updates jiffies.
      
      This patch cludges around it by only doing the catch-up stuff from
      nohz_idle_balance() and doing the regular stuff unconditionally from
      the tick.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: pjt@google.com
      Cc: Venkatesh Pallipadi <venki@google.com>
      Link: http://lkml.kernel.org/n/tip-tp4kj18xdd5aj4vvj0qg55s2@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      556061b0
  20. 09 5月, 2012 1 次提交
  21. 13 3月, 2012 1 次提交
  22. 01 3月, 2012 1 次提交
  23. 24 2月, 2012 1 次提交
    • I
      static keys: Introduce 'struct static_key', static_key_true()/false() and... · c5905afb
      Ingo Molnar 提交于
      static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]()
      
      So here's a boot tested patch on top of Jason's series that does
      all the cleanups I talked about and turns jump labels into a
      more intuitive to use facility. It should also address the
      various misconceptions and confusions that surround jump labels.
      
      Typical usage scenarios:
      
              #include <linux/static_key.h>
      
              struct static_key key = STATIC_KEY_INIT_TRUE;
      
              if (static_key_false(&key))
                      do unlikely code
              else
                      do likely code
      
      Or:
      
              if (static_key_true(&key))
                      do likely code
              else
                      do unlikely code
      
      The static key is modified via:
      
              static_key_slow_inc(&key);
              ...
              static_key_slow_dec(&key);
      
      The 'slow' prefix makes it abundantly clear that this is an
      expensive operation.
      
      I've updated all in-kernel code to use this everywhere. Note
      that I (intentionally) have not pushed through the rename
      blindly through to the lowest levels: the actual jump-label
      patching arch facility should be named like that, so we want to
      decouple jump labels from the static-key facility a bit.
      
      On non-jump-label enabled architectures static keys default to
      likely()/unlikely() branches.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NJason Baron <jbaron@redhat.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: a.p.zijlstra@chello.nl
      Cc: mathieu.desnoyers@efficios.com
      Cc: davem@davemloft.net
      Cc: ddaney.cavm@gmail.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.huSigned-off-by: NIngo Molnar <mingo@elte.hu>
      c5905afb
  24. 22 2月, 2012 1 次提交