1. 23 4月, 2013 4 次提交
    • F
      sched: Kick full dynticks CPU that have more than one task enqueued. · 9f3660c2
      Frederic Weisbecker 提交于
      Kick the tick on full dynticks CPUs when they get more
      than one task running on their queue. This makes sure that
      local fairness is maintained by the tick on the destination.
      
      This is done regardless of these tasks' class. We should
      be able to be more clever in the future depending on these. eg:
      a CPU that runs a SCHED_FIFO task doesn't need to maintain
      fairness against local pending tasks of the fair class.
      
      But keep things simple for now.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      9f3660c2
    • F
      perf: New helper to prevent full dynticks CPUs from stopping tick · 026249ef
      Frederic Weisbecker 提交于
      Provide a new helper that help full dynticks CPUs to prevent
      from stopping their tick in case there are events in the local
      rotation list.
      
      This way we make sure that perf_event_task_tick() is serviced
      on demand.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      026249ef
    • F
      perf: Kick full dynticks CPU if events rotation is needed · 12351ef8
      Frederic Weisbecker 提交于
      Kick the current CPU's tick by sending it a self IPI when
      an event is queued on the rotation list and it is the first
      element inserted. This makes sure that perf_event_task_tick()
      works on full dynticks CPUs.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      12351ef8
    • F
      posix_timers: Fix pre-condition to stop the tick on full dynticks · 6ac29178
      Frederic Weisbecker 提交于
      The test that checks if a CPU can stop its tick from posix CPU
      timers angle was mistakenly inverted.
      
      What we want is to prevent the tick from being stopped as long
      as the current CPU's task runs a posix CPU timer.
      
      Fix this.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      6ac29178
  2. 21 4月, 2013 2 次提交
  3. 19 4月, 2013 6 次提交
    • F
      posix_timers: New API to prevent from stopping the tick when timers are running · 555347f6
      Frederic Weisbecker 提交于
      Bring a new helper that the full dynticks infrastructure can
      call in order to know if it can safely stop the tick from
      the posix cpu timers subsystem point of view.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      555347f6
    • F
      posix_timers: Kick full dynticks CPUs when a posix cpu timer is armed · a8572160
      Frederic Weisbecker 提交于
      Kick the full dynticks CPUs when a posix cpu timer is enqueued by
      way of a standard call to posix_cpu_timer_set() or set_process_cpu_timer().
      This also include rescheduled firing timers.
      
      This way they can re-evaluate the state of (and possibly restart) their
      tick against the new expiry modification.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      a8572160
    • F
      nohz: New option to default all CPUs in full dynticks range · f98823ac
      Frederic Weisbecker 提交于
      Provide a new kernel config that defaults all CPUs to be part
      of the full dynticks range, except the boot one for timekeeping.
      
      This default setting is overriden by the nohz_full= boot option
      if passed by the user.
      
      This is helpful for those who don't need a finegrained range
      of full dynticks CPU and also for automated testing.
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      f98823ac
    • F
      nohz: Ensure full dynticks CPUs are RCU nocbs · d1e43fa5
      Frederic Weisbecker 提交于
      We need full dynticks CPU to also be RCU nocb so
      that we don't have to keep the tick to handle RCU
      callbacks.
      
      Make sure the range passed to nohz_full= boot
      parameter is a subset of rcu_nocbs=
      
      The CPUs that fail to meet this requirement will be
      excluded from the nohz_full range. This is checked
      early in boot time, before any CPU has the opportunity
      to stop its tick.
      Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      d1e43fa5
    • F
      nohz: Force boot CPU outside full dynticks range · 0453b435
      Frederic Weisbecker 提交于
      The timekeeping job must be able to run early on boot
      because there may be some pre-SMP (and thus pre-initcalls )
      components that rely on it. The IO-APIC is one such users
      as it tests the timer health by watching jiffies progression.
      
      Given that it happens before we know the initial online
      set, we can't rely on it to select a timekeeper. We need
      one before SMP time otherwise we simply crash on boot.
      
      To fix this and keep things simple for now, force the boot CPU
      outside of the full dynticks range in any case and do this early
      on kernel parameter parsing time.
      
      We might want a trickier solution later, expecially for aSMP
      architectures that need to assign housekeeping tasks to arbitrary
      low power CPUs.
      
      But it's still first pass KISS time for now.
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      0453b435
    • F
      nohz: New APIs to re-evaluate the tick on full dynticks CPUs · 76c24fb0
      Frederic Weisbecker 提交于
      Provide two new helpers in order to notify the full dynticks CPUs about
      some internal system changes against which they may reconsider the state
      of their tick. Some practical examples include: posix cpu timers, perf tick
      and sched clock tick.
      
      For now the notifying handler, implemented through IPIs, is a stub
      that will be implemented when we get the tick stop/restart infrastructure
      in.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      76c24fb0
  4. 16 4月, 2013 5 次提交
    • P
      rcu: Kick adaptive-ticks CPUs that are holding up RCU grace periods · 65d798f0
      Paul E. McKenney 提交于
      Adaptive-ticks CPUs inform RCU when they enter kernel mode, but they do
      not necessarily turn the scheduler-clock tick back on.  This state of
      affairs could result in RCU waiting on an adaptive-ticks CPU running
      for an extended period in kernel mode.  Such a CPU will never run the
      RCU state machine, and could therefore indefinitely extend the RCU state
      machine, sooner or later resulting in an OOM condition.
      
      This patch, inspired by an earlier patch by Frederic Weisbecker, therefore
      causes RCU's force-quiescent-state processing to check for this condition
      and to send an IPI to CPUs that remain in that state for too long.
      "Too long" currently means about three jiffies by default, which is
      quite some time for a CPU to remain in the kernel without blocking.
      The rcu_tree.jiffies_till_first_fqs and rcutree.jiffies_till_next_fqs
      sysfs variables may be used to tune "too long" if needed.
      Reported-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      65d798f0
    • F
      nohz: Improve a bit the full dynticks Kconfig documentation · fae30dd6
      Frederic Weisbecker 提交于
      Remove the "single task" statement from CONFIG_NO_HZ_FULL
      title. The constraint can be invalidated when tasks from
      other sched classes than SCHED_FAIR are running. Moreover
      it's possible that hrtick join the party in the future.
      
      Also add a line about the dependency on SMP.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      fae30dd6
    • F
      nohz: Align periodic tick Kconfig with other choices' naming convention · 5b533f4f
      Frederic Weisbecker 提交于
      Rename CONFIG_PERIODIC_HZ to CONFIG_HZ_PERIODIC in
      order to stay consistent with other tick implementation
      entries:
      
      	CONFIG_HZ_PERIODIC
      	CONFIG_NO_HZ_IDLE
      	CONFIG_NO_HZ_FULL
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      5b533f4f
    • F
      nohz: Switch from "extended nohz" to "full nohz" based naming · c5bfece2
      Frederic Weisbecker 提交于
      "Extended nohz" was used as a naming base for the full dynticks
      API and Kconfig symbols. It reflects the fact the system tries
      to stop the tick in more places than just idle.
      
      But that "extended" name is a bit opaque and vague. Rename it to
      "full" makes it clearer what the system tries to do under this
      config: try to shutdown the tick anytime it can. The various
      constraints that prevent that to happen shouldn't be considered
      as fundamental properties of this feature but rather technical
      issues that may be solved in the future.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      c5bfece2
    • F
      nohz: Fix old dynticks idle Kconfig backward compatibility · 0644ca5c
      Frederic Weisbecker 提交于
      In order to enforce backward compatibility with older
      config files, we want the new dynticks-idle Kconfig entry
      to default its value to the one of the old CONFIG_NO_HZ symbol
      if present.
      
      Namely we want:
      
      	config NO_HZ # old obsolete dynticks idle symbol
      		bool
      
      	config NO_HZ_IDLE # new dynticks idle symbol
      		default NO_HZ
      
      However Kconfig prevents this to work if the old symbol
      is not visible. And this is currently the case because
      NO_HZ lacks a title in order to show it in make oldconfig
      and alike.
      
      To fix this, bring a minimal title and help text to the
      obsolete Kconfig entry that explains its purpose. This
      makes the "defaulting" to work.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      0644ca5c
  5. 03 4月, 2013 4 次提交
    • F
      nohz: Print final full dynticks CPUs range on boot · 1034fc2f
      Frederic Weisbecker 提交于
      Given that we apply a few restrictions on the full dynticks
      CPUs range (keep an online timekeeper oustide the range,
      then in the future have the range be an RCU nocb CPUs subset),
      let's print the final resulting range of full dynticks CPUs to
      the user so that he knows what's really going to run.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      1034fc2f
    • F
      nohz: Pack nohz Kconfig option in a menu of choices · 3ca277e4
      Frederic Weisbecker 提交于
      Now the user has the choice between three implementations of
      the timer tick:
      
      * Static periodic tick
      * Idle dynticks
      * Full dynticks
      
      At least for now, these are mutually exclusive choices, so
      let's rely on the proper Kconfig feature to display these
      to the user.
      
      A new entry CONFIG_NO_HZ_IDLE is created and the old
      CONFIG_NO_HZ maps to it for config file backward compatibility.
      The old name was too general now that we have more
      granular dynticks implementations.
      
      While at it, add some explanation to help the user on
      his decision between the 3 entries.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      3ca277e4
    • F
      nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON · 3451d024
      Frederic Weisbecker 提交于
      We are planning to convert the dynticks Kconfig options layout
      into a choice menu. The user must be able to easily pick
      any of the following implementations: constant periodic tick,
      idle dynticks, full dynticks.
      
      As this implies a mutual exclusion, the two dynticks implementions
      need to converge on the selection of a common Kconfig option in order
      to ease the sharing of a common infrastructure.
      
      It would thus seem pretty natural to reuse CONFIG_NO_HZ to
      that end. It already implements all the idle dynticks code
      and the full dynticks depends on all that code for now.
      So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
      CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.
      
      On the other hand we want to stay backward compatible: if
      CONFIG_NO_HZ is set in an older config file, we want to
      enable CONFIG_NO_HZ_IDLE by default.
      
      But we can't afford both at the same time or we run into
      a circular dependency:
      
      1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
         CONFIG_NO_HZ
      2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE
      
      We might be able to support that from Kconfig/Kbuild but it
      may not be wise to introduce such a confusing behaviour.
      
      So to solve this, create a new CONFIG_NO_HZ_COMMON option
      which gathers the common code between idle and full dynticks
      (that common code for now is simply the idle dynticks code)
      and select it from their referring Kconfig.
      
      Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
      to it for backward compatibility.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      3451d024
    • F
      nohz: Unhide full dynticks feature from its dependencies · ab71d36d
      Frederic Weisbecker 提交于
      The full dynticks feature only shows up when all its
      Kconfig dependencies are met (RCU nocbs, RCU user mode, ...)
      
      This is far from being user friendly as those who want to
      activate this feature need to look into the Kconfig files
      and iterate through each dependency then activate these
      by hand in order to show and select the full dynticks
      Kconfig option.
      
      So process the other way around: show up the Kconfig option
      if the minimal low level dependencies are met and activate
      the high level ones when we enable the feature.
      
      Note there is one exception in the picture:
      CONFIG_VIRT_CPU_ACCOUNTING_GEN is part of a Kconfig choice
      menu and it appears we can't select it from another Kconfig
      selection when it's under such layout. So for now this
      particular item stays as a passive dependency.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      ab71d36d
  6. 21 3月, 2013 3 次提交
    • F
      nohz: Wake up full dynticks CPUs when a timer gets enqueued · 1c20091e
      Frederic Weisbecker 提交于
      Wake up a CPU when a timer list timer is enqueued there and
      the target is part of the full dynticks range. Sending an IPI
      to it makes it reconsidering the next timer to program on top
      of recent updates.
      
      This may later be improved by checking if the tick is really
      stopped on the target. This would need some careful
      synchronization though. So deal with such optimization later
      and start simple.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      1c20091e
    • F
      nohz: Assign timekeeping duty to a CPU outside the full dynticks range · a382bf93
      Frederic Weisbecker 提交于
      This way the full nohz CPUs can safely run with the tick
      stopped with a guarantee that somebody else is taking
      care of the jiffies and GTOD progression.
      
      Once the duty is attributed to a CPU, it won't change. Also that
      CPU can't enter into dyntick idle mode or be hot unplugged.
      
      This may later be improved from a power consumption POV. At
      least we should be able to share the duty amongst all CPUs
      outside the full dynticks range. Then the duty could even be
      shared with full dynticks CPUs when those can't stop their
      tick for any reason.
      
      But let's start with that very simple approach first.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      [fix have_nohz_full_mask offcase]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a382bf93
    • F
      nohz: Basic full dynticks interface · a831881b
      Frederic Weisbecker 提交于
      For extreme usecases such as Real Time or HPC, having
      the ability to shutdown the tick when a single task runs
      on a CPU is a desired feature:
      
      * Reducing the amount of interrupts improves throughput
      for CPU-bound tasks. The CPU is less distracted from its
      real job, from an execution time and from the cache point
      of views.
      
      * This also improve latency response as we have less critical
      sections.
      
      Start with introducing a very simple interface to define
      full dynticks CPU: use a boot time option defined cpumask
      through the "nohz_extended=" kernel parameter. CPUs that
      are part of this range will have their tick shutdown
      whenever possible: provided they run a single task and
      they don't do kernel activity that require the periodic
      tick. These details will be later documented in
      Documentation/*
      
      An online CPU must be kept outside this range to handle the
      timekeeping.
      Suggested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      a831881b
  7. 18 3月, 2013 2 次提交
  8. 14 3月, 2013 3 次提交
    • A
      sched: Fix variable name misnomer, add comments · 1bf08230
      Andrei Epure 提交于
      The min_vruntime variable actually stores the maximum value.
      The added comment was taken from place_entity function.
      Signed-off-by: NAndrei Epure <epure.andrei@gmail.com>
      Cc: peterz@infradead.org
      Link: http://lkml.kernel.org/r/1363115544-1964-1-git-send-email-epure.andrei@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1bf08230
    • F
      sched: Lower chances of cputime scaling overflow · d9a3c982
      Frederic Weisbecker 提交于
      Some users have reported that after running a process with
      hundreds of threads on intensive CPU-bound loads, the cputime
      of the group started to freeze after a few days.
      
      This is due to how we scale the tick-based cputime against
      the scheduler precise execution time value.
      
      We add the values of all threads in the group and we multiply
      that against the sum of the scheduler exec runtime of the whole
      group.
      
      This easily overflows after a few days/weeks of execution.
      
      A proposed solution to solve this was to compute that multiplication
      on stime instead of utime:
         62188451
         ("cputime: Avoid multiplication overflow on utime scaling")
      
      The rationale behind that was that it's easy for a thread to
      spend most of its time in userspace under intensive CPU-bound workload
      but it's much harder to do CPU-bound intensive long run in the kernel.
      
      This postulate got defeated when a user recently reported he was still
      seeing cputime freezes after the above patch. The workload that
      triggers this issue relates to intensive networking workloads where
      most of the cputime is consumed in the kernel.
      
      To reduce much more the opportunities for multiplication overflow,
      lets reduce the multiplication factors to the remainders of the division
      between sched exec runtime and cputime. Assuming the difference between
      these shouldn't ever be that large, it could work on many situations.
      
      This gets the same results as in the upstream scaling code except for
      a small difference: the upstream code always rounds the results to
      the nearest integer not greater to what would be the precise result.
      The new code rounds to the nearest integer either greater or not
      greater. In practice this difference probably shouldn't matter but
      it's worth mentioning.
      
      If this solution appears not to be enough in the end, we'll
      need to partly revert back to the behaviour prior to commit
           0cf55e1e
           ("sched, cputime: Introduce thread_group_times()")
      
      Back then, the scaling was done on exit() time before adding the cputime
      of an exiting thread to the signal struct. And then we'll need to
      scale one-by-one the live threads cputime in thread_group_cputime(). The
      drawback may be a slightly slower code on exit time.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      d9a3c982
    • F
      math64: New div64_u64_rem helper · f7926850
      Frederic Weisbecker 提交于
      Provide an extended version of div64_u64() that
      also returns the remainder of the division.
      
      We are going to need this to refine the cputime
      scaling code.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      f7926850
  9. 11 3月, 2013 2 次提交
  10. 08 3月, 2013 6 次提交
    • I
      Merge branch 'sched/cputime' of... · 4e3da467
      Ingo Molnar 提交于
      Merge branch 'sched/cputime' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core
      
      Pull cputime changes from Frederic Weisbecker:
      
        * Generalize exception handling
      
        * Fix race in context tracking state restore on return from exception
          and irq exit kernel preemption
      
        * Fix cputime scaling in full dynticks accounting dynamic off-case
      
        * Fix default Kconfig value
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4e3da467
    • F
      context_tracking: Enable probes by default for selftesting · 8b438766
      Frederic Weisbecker 提交于
      Until we provide the nohz_mask boot parameter, keeping
      the context tracking probes disabled by default is pointless
      since what we want is to runtime test this code anyway.
      
      It's furthermore confusing for the users which don't expect
      the probes to be off when they select RCU user mode or full
      dynticks cputime accounting.
      
      Let's enable these probes selftests by default for now.
      
      Suggested: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Mats Liljegren <mats.liljegren@enea.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      8b438766
    • F
      cputime: Dynamically scale cputime for full dynticks accounting · 9fbc42ea
      Frederic Weisbecker 提交于
      The full dynticks cputime accounting is able to account either
      using the tick or the context tracking subsystem. This way
      the housekeeping CPU can keep the low overhead tick based
      solution.
      
      This latter mode has a low jiffies resolution granularity and
      need to be scaled against CFS precise runtime accounting to
      improve its result. We are doing this for CONFIG_TICK_CPU_ACCOUNTING,
      now we also need to expand it to full dynticks accounting dynamic
      off-case as well.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Mats Liljegren <mats.liljegren@enea.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      9fbc42ea
    • F
      context_tracking: Restore preempted context state after preempt_schedule_irq() · b22366cd
      Frederic Weisbecker 提交于
      From the context tracking POV, preempt_schedule_irq() behaves pretty much
      like an exception: It can be called anytime and schedule another task.
      
      But currently it doesn't restore the context tracking state of the preempted
      code on preempt_schedule_irq() return.
      
      As a result, if preempt_schedule_irq() is called in the tiny frame between
      user_enter() and the actual return to userspace, we resume userspace with
      the wrong context tracking state.
      
      Fix this by using exception_enter/exit() which are a perfect fit for this
      kind of issue.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Mats Liljegren <mats.liljegren@enea.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b22366cd
    • F
      context_tracking: Restore correct previous context state on exception exit · 6c1e0256
      Frederic Weisbecker 提交于
      On exception exit, we restore the previous context tracking state based on
      the regs of the interrupted frame. Iff that frame is in user mode as
      stated by user_mode() helper, we restore the context tracking user mode.
      
      However there is a tiny chunck of low level arch code after we pass through
      user_enter() and until the CPU eventually resumes userspace.
      If an exception happens in this tiny area, exception_enter() correctly
      exits the context tracking user mode but exception_exit() won't restore
      it because of the value returned by user_mode(regs).
      
      As a result we may return to userspace with the wrong context tracking
      state.
      
      To fix this, change exception_enter() to return the context tracking state
      prior to its call and pass this saved state to exception_exit(). This restores
      the real context tracking state of the interrupted frame.
      
      (May be this patch was suggested to me, I don't recall exactly. If so,
      sorry for the missing credit).
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Mats Liljegren <mats.liljegren@enea.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      6c1e0256
    • F
      context_tracking: Move exception handling to generic code · 56dd9470
      Frederic Weisbecker 提交于
      Exceptions handling on context tracking should share common
      treatment: on entry we exit user mode if the exception triggered
      in that context. Then on exception exit we return to that previous
      context.
      
      Generalize this to avoid duplication across archs.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Mats Liljegren <mats.liljegren@enea.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      56dd9470
  11. 06 3月, 2013 3 次提交