1. 10 7月, 2014 1 次提交
    • P
      rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs · c0f489d2
      Paul E. McKenney 提交于
      Binding the grace-period kthreads to the timekeeping CPU resulted in
      significant performance decreases for some workloads.  For more detail,
      see:
      
      https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
      
      https://lkml.org/lkml/2014/6/4/218 for CPU statistics
      
      It turns out that it is necessary to bind the grace-period kthreads
      to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
      on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
      In other cases, it suffices to bind the grace-period kthreads to the
      set of non-nohz_full CPUs.
      
      This commit therefore creates a tick_nohz_not_full_mask that is the
      complement of tick_nohz_full_mask, and then binds the grace-period
      kthread to the set of CPUs indicated by this new mask, which covers
      the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
      case still binds the grace-period kthreads to the timekeeping CPU.
      This commit also includes the tick_nohz_full_enabled() check suggested
      by Frederic Weisbecker.
      Reported-by: NJet Chen <jet.chen@intel.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Created housekeeping_affine() and housekeeping_mask per
        fweisbec feedback. ]
      c0f489d2
  2. 16 6月, 2014 1 次提交
    • F
      nohz: Support nohz full remote kick · 3d36aebc
      Frederic Weisbecker 提交于
      Remotely kicking a full nohz CPU in order to make it re-evaluate its
      next tick is currently implemented using the scheduler IPI.
      
      However this bloats a scheduler fast path with an off-topic feature.
      The scheduler tick was abused here for its cool "callable
      anywhere/anytime" properties.
      
      But now that the irq work subsystem can queue remote callbacks, it's
      a perfect fit to safely queue IPIs when interrupts are disabled
      without worrying about concurrent callers.
      
      So lets implement remote kick on top of irq work. This is going to
      be used when a new event requires the next tick to be recalculated:
      more than 1 task competing on the CPU, timer armed, ...
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      3d36aebc
  3. 16 4月, 2014 2 次提交
  4. 16 1月, 2014 3 次提交
  5. 13 1月, 2014 1 次提交
    • P
      sched/clock, x86: Use a static_key for sched_clock_stable · 35af99e6
      Peter Zijlstra 提交于
      In order to avoid the runtime condition and variable load turn
      sched_clock_stable into a static_key.
      
      Also provide a shorter implementation of local_clock() and
      cpu_clock(int) when sched_clock_stable==1.
      
                              MAINLINE   PRE       POST
      
          sched_clock_stable: 1          1         1
          (cold) sched_clock: 329841     221876    215295
          (cold) local_clock: 301773     234692    220773
          (warm) sched_clock: 38375      25602     25659
          (warm) local_clock: 100371     33265     27242
          (warm) rdtsc:       27340      24214     24208
          sched_clock_stable: 0          0         0
          (cold) sched_clock: 382634     235941    237019
          (cold) local_clock: 396890     297017    294819
          (warm) sched_clock: 38194      25233     25609
          (warm) local_clock: 143452     71234     71232
          (warm) rdtsc:       27345      24245     24243
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      35af99e6
  6. 24 12月, 2013 1 次提交
    • J
      tick/timekeeping: Call update_wall_time outside the jiffies lock · 47a1b796
      John Stultz 提交于
      Since the xtime lock was split into the timekeeping lock and
      the jiffies lock, we no longer need to call update_wall_time()
      while holding the jiffies lock.
      
      Thus, this patch splits update_wall_time() out from do_timer().
      
      This allows us to get away from calling clock_was_set_delayed()
      in update_wall_time() and instead use the standard clock_was_set()
      call that previously would deadlock, as it causes the jiffies lock
      to be acquired.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      47a1b796
  7. 03 12月, 2013 1 次提交
    • F
      nohz: Convert a few places to use local per cpu accesses · e8fcaa5c
      Frederic Weisbecker 提交于
      A few functions use remote per CPU access APIs when they
      deal with local values.
      
      Just do the right conversion to improve performance, code
      readability and debug checks.
      
      While at it, lets extend some of these function names with *_this_cpu()
      suffix in order to display their purpose more clearly.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      e8fcaa5c
  8. 29 11月, 2013 1 次提交
  9. 19 11月, 2013 1 次提交
  10. 16 8月, 2013 1 次提交
  11. 14 8月, 2013 3 次提交
    • F
      nohz: Optimize full dynticks's sched hooks with static keys · d13508f9
      Frederic Weisbecker 提交于
      Scheduler IPIs and task context switches are serious fast path.
      Let's try to hide as much as we can the impact of full
      dynticks APIs' off case that are called on these sites
      through the use of static keys.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      d13508f9
    • F
      nohz: Optimize full dynticks state checks with static keys · 460775df
      Frederic Weisbecker 提交于
      These APIs are frequenctly accessed and priority is given
      to optimize the full dynticks off-case in order to let
      distros enable this feature without suffering from
      significant performance regressions.
      
      Let's inline these APIs and optimize them with static keys.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      460775df
    • F
      nohz: Rename a few state variables · 73867dcd
      Frederic Weisbecker 提交于
      Rename the full dynticks's cpumask and cpumask state variables
      to some more exportable names.
      
      These will be used later from global headers to optimize
      the main full dynticks APIs in conjunction with static keys.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      73867dcd
  12. 13 8月, 2013 1 次提交
    • F
      nohz: Only enable context tracking on full dynticks CPUs · 2e709338
      Frederic Weisbecker 提交于
      The context tracking subsystem has the ability to selectively
      enable the tracking on any defined subset of CPU. This means that
      we can define a CPU range that doesn't run the context tracking
      and another range that does.
      
      Now what we want in practice is to enable the tracking on full
      dynticks CPUs only. In order to perform this, we just need to pass
      our full dynticks CPU range selection from the full dynticks
      subsystem to the context tracking.
      
      This way we can spare the overhead of RCU user extended quiescent
      state and vtime maintainance on the CPUs that are outside the
      full dynticks range. Just keep in mind the raw context tracking
      itself is still necessary everywhere.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      2e709338
  13. 29 7月, 2013 1 次提交
    • R
      Revert "cpuidle: Quickly notice prediction failure for repeat mode" · 14851912
      Rafael J. Wysocki 提交于
      Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
      repeat mode), because it has been identified as the source of a
      significant performance regression in v3.8 and later as explained by
      Jeremy Eder:
      
        We believe we've identified a particular commit to the cpuidle code
        that seems to be impacting performance of variety of workloads.
        The simplest way to reproduce is using netperf TCP_RR test, so
        we're using that, on a pair of Sandy Bridge based servers.  We also
        have data from a large database setup where performance is also
        measurably/positively impacted, though that test data isn't easily
        share-able.
      
        Included below are test results from 3 test kernels:
      
        kernel       reverts
        -----------------------------------------------------------
        1) vanilla   upstream (no reverts)
      
        2) perfteam2 reverts e11538d1
      
        3) test      reverts 69a37bea
                             e11538d1
      
        In summary, netperf TCP_RR numbers improve by approximately 4%
        after reverting 69a37bea.  When
        69a37bea is included, C0 residency
        never seems to get above 40%.  Taking that patch out gets C0 near
        100% quite often, and performance increases.
      
        The below data are histograms representing the %c0 residency @
        1-second sample rates (using turbostat), while under netperf test.
      
        - If you look at the first 4 histograms, you can see %c0 residency
          almost entirely in the 30,40% bin.
        - The last pair, which reverts 69a37bea,
          shows %c0 in the 80,90,100% bins.
      
        Below each kernel name are netperf TCP_RR trans/s numbers for the
        particular kernel that can be disclosed publicly, comparing the 3
        test kernels.  We ran a 4th test with the vanilla kernel where
        we've also set /dev/cpu_dma_latency=0 to show overall impact
        boosting single-threaded TCP_RR performance over 11% above
        baseline.
      
        3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
        TCP_RR trans/s 54323.78
      
        -----------------------------------------------------------
        3.10-rc2 vanilla RX (no reverts)
        TCP_RR trans/s 48192.47
      
        Receiver %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    59]:
        ***********************************************************
           40.0000 -    50.0000 [     1]: *
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        Sender %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    11]: ***********
           40.0000 -    50.0000 [    49]:
        *************************************************
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        -----------------------------------------------------------
        3.10-rc2 perfteam2 RX (reverts commit
        e11538d1)
        TCP_RR trans/s 49698.69
      
        Receiver %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     1]: *
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    59]:
        ***********************************************************
           40.0000 -    50.0000 [     0]:
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        Sender %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [     2]: **
           40.0000 -    50.0000 [    58]:
        **********************************************************
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        -----------------------------------------------------------
        3.10-rc2 test RX (reverts 69a37bea
        and e11538d1)
        TCP_RR trans/s 47766.95
      
        Receiver %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     1]: *
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    27]: ***************************
           40.0000 -    50.0000 [     2]: **
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     2]: **
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [    28]: ****************************
      
        Sender:
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    11]: ***********
           40.0000 -    50.0000 [     0]:
           50.0000 -    60.0000 [     1]: *
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     3]: ***
           80.0000 -    90.0000 [     7]: *******
           90.0000 -   100.0000 [    38]: **************************************
      
        These results demonstrate gaining back the tendency of the CPU to
        stay in more responsive, performant C-states (and thus yield
        measurably better performance), by reverting commit
        69a37bea.
      Requested-by: NJeremy Eder <jeder@redhat.com>
      Tested-by: NLen Brown <len.brown@intel.com>
      Cc: 3.8+ <stable@vger.kernel.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      14851912
  14. 25 7月, 2013 2 次提交
    • L
      nohz: fix compile warning in tick_nohz_init() · ca06416b
      Li Zhong 提交于
      cpu is not used after commit 5b8621a6Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      ca06416b
    • S
      nohz: Do not warn about unstable tsc unless user uses nohz_full · 543487c7
      Steven Rostedt 提交于
      If the user enables CONFIG_NO_HZ_FULL and runs the kernel on a machine
      with an unstable TSC, it will produce a WARN_ON dump as well as taint
      the kernel. This is a bit extreme for a kernel that just enables a
      feature but doesn't use it.
      
      The warning should only happen if the user tries to use the feature by
      either adding nohz_full to the kernel command line, or by enabling
      CONFIG_NO_HZ_FULL_ALL that makes nohz used on all CPUs at boot up. Note,
      this second feature should not (yet) be used by distros or anyone that
      doesn't care if NO_HZ is used or not.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      543487c7
  15. 15 7月, 2013 1 次提交
    • P
      kernel: delete __cpuinit usage from all core kernel files · 0db0628d
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      This removes all the uses of the __cpuinit macros from C files in
      the core kernel directories (kernel, init, lib, mm, and include)
      that don't really have a specific maintainer.
      
      [1] https://lkml.org/lkml/2013/5/20/589Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      0db0628d
  16. 20 6月, 2013 2 次提交
    • F
      nohz: Remove obsolete check for full dynticks CPUs to be RCU nocbs · 5b8621a6
      Frederic Weisbecker 提交于
      Building full dynticks now implies that all CPUs are forced
      into RCU nocb mode through CONFIG_RCU_NOCB_CPU_ALL.
      
      The dynamic check has become useless.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      5b8621a6
    • S
      nohz: Warn if the machine can not perform nohz_full · e12d0271
      Steven Rostedt 提交于
      If the user configures NO_HZ_FULL and defines nohz_full=XXX on the
      kernel command line, or enables NO_HZ_FULL_ALL, but nohz fails
      due to the machine having a unstable clock, warn about it.
      
      We do not want users thinking that they are getting the benefit
      of nohz when their machine can not support it.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      e12d0271
  17. 31 5月, 2013 1 次提交
  18. 14 5月, 2013 1 次提交
  19. 12 5月, 2013 1 次提交
  20. 04 5月, 2013 1 次提交
    • F
      sched: Keep at least 1 tick per second for active dynticks tasks · 265f22a9
      Frederic Weisbecker 提交于
      The scheduler doesn't yet fully support environments
      with a single task running without a periodic tick.
      
      In order to ensure we still maintain the duties of scheduler_tick(),
      keep at least 1 tick per second.
      
      This makes sure that we keep the progression of various scheduler
      accounting and background maintainance even with a very low granularity.
      Examples include cpu load, sched average, CFS entity vruntime,
      avenrun and events such as load balancing, amongst other details
      handled in sched_class::task_tick().
      
      This limitation will be removed in the future once we get
      these individual items to work in full dynticks CPUs.
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      265f22a9
  21. 29 4月, 2013 1 次提交
    • L
      nohz: Protect smp_processor_id() in tick_nohz_task_switch() · 6296ace4
      Li Zhong 提交于
      I saw following error when testing the latest nohz code on
      Power:
      
      [   85.295384] BUG: using smp_processor_id() in preemptible [00000000] code: rsyslogd/3493
      [   85.295396] caller is .tick_nohz_task_switch+0x1c/0xb8
      [   85.295402] Call Trace:
      [   85.295408] [c0000001fababab0] [c000000000012dc4] .show_stack+0x110/0x25c (unreliable)
      [   85.295420] [c0000001fababba0] [c0000000007c4b54] .dump_stack+0x20/0x30
      [   85.295430] [c0000001fababc10] [c00000000044eb74] .debug_smp_processor_id+0xf4/0x124
      [   85.295438] [c0000001fababca0] [c0000000000d7594] .tick_nohz_task_switch+0x1c/0xb8
      [   85.295447] [c0000001fababd20] [c0000000000b9748] .finish_task_switch+0x13c/0x160
      [   85.295455] [c0000001fababdb0] [c0000000000bbe50] .schedule_tail+0x50/0x124
      [   85.295463] [c0000001fababe30] [c000000000009dc8] .ret_from_fork+0x4/0x54
      
      The code below moves the test into local_irq_save/restore
      section to avoid the above complaint.
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1367119558.6391.34.camel@ThinkPad-T5421.cn.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6296ace4
  22. 26 4月, 2013 1 次提交
    • I
      nohz: Reduce overhead under high-freq idling patterns · 47aa8b6c
      Ingo Molnar 提交于
      One testbox of mine (Intel Nehalem, 16-way) uses MWAIT for its idle routine,
      which apparently can break out of its idle loop rather frequently, with
      high frequency.
      
      In that case NO_HZ_FULL=y kernels show high ksoftirqd overhead and constant
      context switching, because tick_nohz_stop_sched_tick() will, if
      delta_jiffies == 0, mis-identify this as a timer event - activating the
      TIMER_SOFTIRQ, which wakes up ksoftirqd.
      
      Fix this by treating delta_jiffies == 0 the same way we treat other short
      wakeups, delta_jiffies == 1.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      47aa8b6c
  23. 23 4月, 2013 5 次提交
    • F
      nohz: Add basic tracing · cb41a290
      Frederic Weisbecker 提交于
      It's not obvious to find out why the full dynticks subsystem
      doesn't always stop the tick: whether this is due to kthreads,
      posix timers, perf events, etc...
      
      These new tracepoints are here to help the user diagnose
      the failures and test this feature.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      cb41a290
    • F
      nohz: Re-evaluate the tick for the new task after a context switch · 99e5ada9
      Frederic Weisbecker 提交于
      When a task is scheduled in, it may have some properties
      of its own that could make the CPU reconsider the need for
      the tick: posix cpu timers, perf events, ...
      
      So notify the full dynticks subsystem when a task gets
      scheduled in and re-check the tick dependency at this
      stage. This is done through a self IPI to avoid messing
      up with any current lock scenario.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      99e5ada9
    • F
      nohz: Prepare to stop the tick on irq exit · 5811d996
      Frederic Weisbecker 提交于
      Interrupt exit is a natural place to stop the tick: it happens
      after all events happening before and during the irq which
      are liable to update the dependency on the tick occured. Also
      it makes sure that any check on tick dependency is well ordered
      against dynticks kick IPIs.
      
      Bring in the infrastructure that performs the tick dependency
      checks on irq exit and shut it down if these checks show that we
      can do it safely.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      5811d996
    • F
      nohz: Implement full dynticks kick · 9014c45d
      Frederic Weisbecker 提交于
      Implement the full dynticks kick that is performed from
      IPIs sent by various subsystems (scheduler, posix timers, ...)
      when they want to notify about a new event that may
      reconsider the dependency on the tick.
      
      Most of the time, such an event end up restarting the tick.
      
      (Part of the design with subsystems providing *_can_stop_tick()
      helpers suggested by Peter Zijlstra a while ago).
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      9014c45d
    • F
      nohz: Re-evaluate the tick from the scheduler IPI · ff442c51
      Frederic Weisbecker 提交于
      The scheduler IPI is used by the scheduler to kick
      full dynticks CPUs asynchronously when more than one
      task are running or when a new timer list timer is
      enqueued. This way the destination CPU can decide
      to restart the tick to handle this new situation.
      
      Now let's call that kick in the scheduler IPI.
      
      (Reusing the scheduler IPI rather than implementing
      a new IPI was suggested by Peter Zijlstra a while ago)
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      ff442c51
  24. 19 4月, 2013 4 次提交
    • F
      nohz: New option to default all CPUs in full dynticks range · f98823ac
      Frederic Weisbecker 提交于
      Provide a new kernel config that defaults all CPUs to be part
      of the full dynticks range, except the boot one for timekeeping.
      
      This default setting is overriden by the nohz_full= boot option
      if passed by the user.
      
      This is helpful for those who don't need a finegrained range
      of full dynticks CPU and also for automated testing.
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      f98823ac
    • F
      nohz: Ensure full dynticks CPUs are RCU nocbs · d1e43fa5
      Frederic Weisbecker 提交于
      We need full dynticks CPU to also be RCU nocb so
      that we don't have to keep the tick to handle RCU
      callbacks.
      
      Make sure the range passed to nohz_full= boot
      parameter is a subset of rcu_nocbs=
      
      The CPUs that fail to meet this requirement will be
      excluded from the nohz_full range. This is checked
      early in boot time, before any CPU has the opportunity
      to stop its tick.
      Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      d1e43fa5
    • F
      nohz: Force boot CPU outside full dynticks range · 0453b435
      Frederic Weisbecker 提交于
      The timekeeping job must be able to run early on boot
      because there may be some pre-SMP (and thus pre-initcalls )
      components that rely on it. The IO-APIC is one such users
      as it tests the timer health by watching jiffies progression.
      
      Given that it happens before we know the initial online
      set, we can't rely on it to select a timekeeper. We need
      one before SMP time otherwise we simply crash on boot.
      
      To fix this and keep things simple for now, force the boot CPU
      outside of the full dynticks range in any case and do this early
      on kernel parameter parsing time.
      
      We might want a trickier solution later, expecially for aSMP
      architectures that need to assign housekeeping tasks to arbitrary
      low power CPUs.
      
      But it's still first pass KISS time for now.
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      0453b435
    • F
      nohz: New APIs to re-evaluate the tick on full dynticks CPUs · 76c24fb0
      Frederic Weisbecker 提交于
      Provide two new helpers in order to notify the full dynticks CPUs about
      some internal system changes against which they may reconsider the state
      of their tick. Some practical examples include: posix cpu timers, perf tick
      and sched clock tick.
      
      For now the notifying handler, implemented through IPIs, is a stub
      that will be implemented when we get the tick stop/restart infrastructure
      in.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      76c24fb0
  25. 16 4月, 2013 1 次提交
    • F
      nohz: Switch from "extended nohz" to "full nohz" based naming · c5bfece2
      Frederic Weisbecker 提交于
      "Extended nohz" was used as a naming base for the full dynticks
      API and Kconfig symbols. It reflects the fact the system tries
      to stop the tick in more places than just idle.
      
      But that "extended" name is a bit opaque and vague. Rename it to
      "full" makes it clearer what the system tries to do under this
      config: try to shutdown the tick anytime it can. The various
      constraints that prevent that to happen shouldn't be considered
      as fundamental properties of this feature but rather technical
      issues that may be solved in the future.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      c5bfece2
  26. 03 4月, 2013 1 次提交
    • F
      nohz: Print final full dynticks CPUs range on boot · 1034fc2f
      Frederic Weisbecker 提交于
      Given that we apply a few restrictions on the full dynticks
      CPUs range (keep an online timekeeper oustide the range,
      then in the future have the range be an RCU nocb CPUs subset),
      let's print the final resulting range of full dynticks CPUs to
      the user so that he knows what's really going to run.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      1034fc2f