1. 16 4月, 2013 2 次提交
    • F
      nohz: Switch from "extended nohz" to "full nohz" based naming · c5bfece2
      Frederic Weisbecker 提交于
      "Extended nohz" was used as a naming base for the full dynticks
      API and Kconfig symbols. It reflects the fact the system tries
      to stop the tick in more places than just idle.
      
      But that "extended" name is a bit opaque and vague. Rename it to
      "full" makes it clearer what the system tries to do under this
      config: try to shutdown the tick anytime it can. The various
      constraints that prevent that to happen shouldn't be considered
      as fundamental properties of this feature but rather technical
      issues that may be solved in the future.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      c5bfece2
    • F
      nohz: Fix old dynticks idle Kconfig backward compatibility · 0644ca5c
      Frederic Weisbecker 提交于
      In order to enforce backward compatibility with older
      config files, we want the new dynticks-idle Kconfig entry
      to default its value to the one of the old CONFIG_NO_HZ symbol
      if present.
      
      Namely we want:
      
      	config NO_HZ # old obsolete dynticks idle symbol
      		bool
      
      	config NO_HZ_IDLE # new dynticks idle symbol
      		default NO_HZ
      
      However Kconfig prevents this to work if the old symbol
      is not visible. And this is currently the case because
      NO_HZ lacks a title in order to show it in make oldconfig
      and alike.
      
      To fix this, bring a minimal title and help text to the
      obsolete Kconfig entry that explains its purpose. This
      makes the "defaulting" to work.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      0644ca5c
  2. 03 4月, 2013 4 次提交
    • F
      nohz: Print final full dynticks CPUs range on boot · 1034fc2f
      Frederic Weisbecker 提交于
      Given that we apply a few restrictions on the full dynticks
      CPUs range (keep an online timekeeper oustide the range,
      then in the future have the range be an RCU nocb CPUs subset),
      let's print the final resulting range of full dynticks CPUs to
      the user so that he knows what's really going to run.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      1034fc2f
    • F
      nohz: Pack nohz Kconfig option in a menu of choices · 3ca277e4
      Frederic Weisbecker 提交于
      Now the user has the choice between three implementations of
      the timer tick:
      
      * Static periodic tick
      * Idle dynticks
      * Full dynticks
      
      At least for now, these are mutually exclusive choices, so
      let's rely on the proper Kconfig feature to display these
      to the user.
      
      A new entry CONFIG_NO_HZ_IDLE is created and the old
      CONFIG_NO_HZ maps to it for config file backward compatibility.
      The old name was too general now that we have more
      granular dynticks implementations.
      
      While at it, add some explanation to help the user on
      his decision between the 3 entries.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      3ca277e4
    • F
      nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON · 3451d024
      Frederic Weisbecker 提交于
      We are planning to convert the dynticks Kconfig options layout
      into a choice menu. The user must be able to easily pick
      any of the following implementations: constant periodic tick,
      idle dynticks, full dynticks.
      
      As this implies a mutual exclusion, the two dynticks implementions
      need to converge on the selection of a common Kconfig option in order
      to ease the sharing of a common infrastructure.
      
      It would thus seem pretty natural to reuse CONFIG_NO_HZ to
      that end. It already implements all the idle dynticks code
      and the full dynticks depends on all that code for now.
      So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
      CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.
      
      On the other hand we want to stay backward compatible: if
      CONFIG_NO_HZ is set in an older config file, we want to
      enable CONFIG_NO_HZ_IDLE by default.
      
      But we can't afford both at the same time or we run into
      a circular dependency:
      
      1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
         CONFIG_NO_HZ
      2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE
      
      We might be able to support that from Kconfig/Kbuild but it
      may not be wise to introduce such a confusing behaviour.
      
      So to solve this, create a new CONFIG_NO_HZ_COMMON option
      which gathers the common code between idle and full dynticks
      (that common code for now is simply the idle dynticks code)
      and select it from their referring Kconfig.
      
      Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
      to it for backward compatibility.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      3451d024
    • F
      nohz: Unhide full dynticks feature from its dependencies · ab71d36d
      Frederic Weisbecker 提交于
      The full dynticks feature only shows up when all its
      Kconfig dependencies are met (RCU nocbs, RCU user mode, ...)
      
      This is far from being user friendly as those who want to
      activate this feature need to look into the Kconfig files
      and iterate through each dependency then activate these
      by hand in order to show and select the full dynticks
      Kconfig option.
      
      So process the other way around: show up the Kconfig option
      if the minimal low level dependencies are met and activate
      the high level ones when we enable the feature.
      
      Note there is one exception in the picture:
      CONFIG_VIRT_CPU_ACCOUNTING_GEN is part of a Kconfig choice
      menu and it appears we can't select it from another Kconfig
      selection when it's under such layout. So for now this
      particular item stays as a passive dependency.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      ab71d36d
  3. 21 3月, 2013 3 次提交
    • F
      nohz: Wake up full dynticks CPUs when a timer gets enqueued · 1c20091e
      Frederic Weisbecker 提交于
      Wake up a CPU when a timer list timer is enqueued there and
      the target is part of the full dynticks range. Sending an IPI
      to it makes it reconsidering the next timer to program on top
      of recent updates.
      
      This may later be improved by checking if the tick is really
      stopped on the target. This would need some careful
      synchronization though. So deal with such optimization later
      and start simple.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      1c20091e
    • F
      nohz: Assign timekeeping duty to a CPU outside the full dynticks range · a382bf93
      Frederic Weisbecker 提交于
      This way the full nohz CPUs can safely run with the tick
      stopped with a guarantee that somebody else is taking
      care of the jiffies and GTOD progression.
      
      Once the duty is attributed to a CPU, it won't change. Also that
      CPU can't enter into dyntick idle mode or be hot unplugged.
      
      This may later be improved from a power consumption POV. At
      least we should be able to share the duty amongst all CPUs
      outside the full dynticks range. Then the duty could even be
      shared with full dynticks CPUs when those can't stop their
      tick for any reason.
      
      But let's start with that very simple approach first.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      [fix have_nohz_full_mask offcase]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a382bf93
    • F
      nohz: Basic full dynticks interface · a831881b
      Frederic Weisbecker 提交于
      For extreme usecases such as Real Time or HPC, having
      the ability to shutdown the tick when a single task runs
      on a CPU is a desired feature:
      
      * Reducing the amount of interrupts improves throughput
      for CPU-bound tasks. The CPU is less distracted from its
      real job, from an execution time and from the cache point
      of views.
      
      * This also improve latency response as we have less critical
      sections.
      
      Start with introducing a very simple interface to define
      full dynticks CPU: use a boot time option defined cpumask
      through the "nohz_extended=" kernel parameter. CPUs that
      are part of this range will have their tick shutdown
      whenever possible: provided they run a single task and
      they don't do kernel activity that require the periodic
      tick. These details will be later documented in
      Documentation/*
      
      An online CPU must be kept outside this range to handle the
      timekeeping.
      Suggested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      a831881b
  4. 18 3月, 2013 1 次提交
  5. 14 3月, 2013 2 次提交
    • A
      sched: Fix variable name misnomer, add comments · 1bf08230
      Andrei Epure 提交于
      The min_vruntime variable actually stores the maximum value.
      The added comment was taken from place_entity function.
      Signed-off-by: NAndrei Epure <epure.andrei@gmail.com>
      Cc: peterz@infradead.org
      Link: http://lkml.kernel.org/r/1363115544-1964-1-git-send-email-epure.andrei@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1bf08230
    • F
      sched: Lower chances of cputime scaling overflow · d9a3c982
      Frederic Weisbecker 提交于
      Some users have reported that after running a process with
      hundreds of threads on intensive CPU-bound loads, the cputime
      of the group started to freeze after a few days.
      
      This is due to how we scale the tick-based cputime against
      the scheduler precise execution time value.
      
      We add the values of all threads in the group and we multiply
      that against the sum of the scheduler exec runtime of the whole
      group.
      
      This easily overflows after a few days/weeks of execution.
      
      A proposed solution to solve this was to compute that multiplication
      on stime instead of utime:
         62188451
         ("cputime: Avoid multiplication overflow on utime scaling")
      
      The rationale behind that was that it's easy for a thread to
      spend most of its time in userspace under intensive CPU-bound workload
      but it's much harder to do CPU-bound intensive long run in the kernel.
      
      This postulate got defeated when a user recently reported he was still
      seeing cputime freezes after the above patch. The workload that
      triggers this issue relates to intensive networking workloads where
      most of the cputime is consumed in the kernel.
      
      To reduce much more the opportunities for multiplication overflow,
      lets reduce the multiplication factors to the remainders of the division
      between sched exec runtime and cputime. Assuming the difference between
      these shouldn't ever be that large, it could work on many situations.
      
      This gets the same results as in the upstream scaling code except for
      a small difference: the upstream code always rounds the results to
      the nearest integer not greater to what would be the precise result.
      The new code rounds to the nearest integer either greater or not
      greater. In practice this difference probably shouldn't matter but
      it's worth mentioning.
      
      If this solution appears not to be enough in the end, we'll
      need to partly revert back to the behaviour prior to commit
           0cf55e1e
           ("sched, cputime: Introduce thread_group_times()")
      
      Back then, the scaling was done on exit() time before adding the cputime
      of an exiting thread to the signal struct. And then we'll need to
      scale one-by-one the live threads cputime in thread_group_cputime(). The
      drawback may be a slightly slower code on exit time.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      d9a3c982
  6. 11 3月, 2013 2 次提交
  7. 08 3月, 2013 2 次提交
    • F
      cputime: Dynamically scale cputime for full dynticks accounting · 9fbc42ea
      Frederic Weisbecker 提交于
      The full dynticks cputime accounting is able to account either
      using the tick or the context tracking subsystem. This way
      the housekeeping CPU can keep the low overhead tick based
      solution.
      
      This latter mode has a low jiffies resolution granularity and
      need to be scaled against CFS precise runtime accounting to
      improve its result. We are doing this for CONFIG_TICK_CPU_ACCOUNTING,
      now we also need to expand it to full dynticks accounting dynamic
      off-case as well.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Mats Liljegren <mats.liljegren@enea.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      9fbc42ea
    • F
      context_tracking: Restore preempted context state after preempt_schedule_irq() · b22366cd
      Frederic Weisbecker 提交于
      From the context tracking POV, preempt_schedule_irq() behaves pretty much
      like an exception: It can be called anytime and schedule another task.
      
      But currently it doesn't restore the context tracking state of the preempted
      code on preempt_schedule_irq() return.
      
      As a result, if preempt_schedule_irq() is called in the tiny frame between
      user_enter() and the actual return to userspace, we resume userspace with
      the wrong context tracking state.
      
      Fix this by using exception_enter/exit() which are a perfect fit for this
      kind of issue.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Mats Liljegren <mats.liljegren@enea.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b22366cd
  8. 06 3月, 2013 7 次提交
  9. 03 3月, 2013 2 次提交
    • A
      fix compat_sys_rt_sigprocmask() · db61ec29
      Al Viro 提交于
      Converting bitmask to 32bit granularity is fine, but we'd better
      _do_ something with the result.  Such as "copy it to userland"...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      db61ec29
    • J
      trace/ring_buffer: handle 64bit aligned structs · 649508f6
      James Hogan 提交于
      Some 32 bit architectures require 64 bit values to be aligned (for
      example Meta which has 64 bit read/write instructions). These require 8
      byte alignment of event data too, so use
      !CONFIG_HAVE_64BIT_ALIGNED_ACCESS instead of !CONFIG_64BIT ||
      CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to decide alignment, and align
      buffer_data_page::data accordingly.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org> (previous version subtly different)
      649508f6
  10. 02 3月, 2013 8 次提交
  11. 28 2月, 2013 7 次提交