1. 09 Oct, 2014 1 commit
  2. 14 Sep, 2014 2 commits
    •
      nohz: nohz full depends on irq work self IPI support · 9b01f5bf
      Committed by Frederic Weisbecker
      The nohz full functionality depends on IRQ work to trigger its own
      interrupts. As it's used to restart the tick, we can't rely on the tick
      fallback for irq work callbacks, ie: we can't use the tick to restart
      the tick itself.
      
      Let's reject the full dynticks initialization if that arch support isn't
      available.
      
      As a side effect, this makes sure that nohz kick is never called from
      the tick. That otherwise would result in illegal hrtimer self-cancellation
      and lockup.
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      9b01f5bf
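The dependency described in the commit above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `nohz_full_init_model()`, `struct nohz_full_state` and the `arch_has_irq_work_ipi` parameter are hypothetical names standing in for whatever capability check the kernel performs; they are not the kernel's API, and the message text is made up.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the full dynticks init decision. */
struct nohz_full_state {
    bool running;   /* true once full dynticks has been enabled */
};

/*
 * Refuse to enable full dynticks when the architecture cannot raise an
 * irq work interrupt on the local CPU. Without that self-IPI, irq work
 * falls back to running from the tick, and the tick cannot be used to
 * restart itself.
 */
static bool nohz_full_init_model(struct nohz_full_state *st,
                                 bool arch_has_irq_work_ipi)
{
    if (!arch_has_irq_work_ipi) {
        /* Illustrative message only, not the kernel's wording. */
        fprintf(stderr, "full dynticks rejected: no irq work self-IPI\n");
        st->running = false;
        return false;
    }
    st->running = true;
    return true;
}
```

Calling `nohz_full_init_model(&st, false)` leaves `st.running` false, which mirrors the "reject the initialization" behaviour the commit message describes.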
    •
      nohz: Consolidate nohz full init code · 4327b15f
      Committed by Frederic Weisbecker
      The support code for CONFIG_NO_HZ_FULL_ALL=y and for the nohz_full= kernel
      parameter each have their own way to do the same thing: allocate
      full dynticks cpumasks, fill them and initialize some state variables.

      Let's consolidate all of that in the same place.

      While at it, convert some regular printk messages to warnings when
      fundamental allocations fail.
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      4327b15f
  3. 05 Sep, 2014 1 commit
    •
      nohz: Restore NMI safe local irq work for local nohz kick · 40bea039
      Committed by Frederic Weisbecker
      The local nohz kick is currently used by perf, which needs it to be
      NMI-safe. A recent commit (7d1311b9), though,
      changed its implementation to fire the local kick using the remote kick
      API. That was convenient for making the code more generic, but the remote
      kick isn't NMI-safe.
      
      As a result:
      
      	WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140()
      	CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34
      	0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37
      	0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180
      	0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000
      	Call Trace:
      	<NMI>  [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a
      	[<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0
      	[<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20
      	[<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140
      	[<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90
      	[<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350
      	[<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0
      	[<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
      	[<ffffffff9a187934>] perf_event_overflow+0x14/0x20
      	[<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410
      	[<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130
      	[<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50
      	[<ffffffff9a007b72>] nmi_handle+0xd2/0x390
      	[<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390
      	[<ffffffff9a0d131b>] ? lock_release+0xab/0x330
      	[<ffffffff9a008062>] default_do_nmi+0x72/0x1c0
      	[<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200
      	[<ffffffff9a008268>] do_nmi+0xb8/0x100
      
      Let's fix this by restoring the use of local irq work for the nohz local
      kick.
      Reported-by: Catalin Iacob <iacobcatalin@gmail.com>
      Reported-and-tested-by: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      40bea039
  4. 27 Aug, 2014 2 commits
  5. 23 Aug, 2014 2 commits
    •
      nohz: Avoid tick's double reprogramming in highres mode · 2a16fc93
      Committed by Viresh Kumar
      In highres mode, the tick reschedules itself unconditionally to the
      next jiffy.
      
      However while this clock reprogramming is relevant when the tick is
      in periodic mode, it's not that interesting when we run in dynticks mode
      because irq exit is likely going to overwrite the next tick to some
      randomly deferred future.
      
      So let's just get rid of this tick self-rescheduling in dynticks mode.
      This way we can avoid some double writes to the clockevent device in
      favourable scenarios, like when we stop the tick completely in idle while
      no other hrtimer is pending.
      Suggested-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      2a16fc93
    •
      nohz: Fix spurious periodic tick behaviour in low-res dynticks mode · b5e995e6
      Committed by Viresh Kumar
      When we reach the end of the tick handler, we unconditionally reschedule
      the next tick to the next jiffy. Then on irq exit, the nohz code
      overrides that setting if needed and defers the next tick as far away in
      the future as possible.
      
      Now in the best dynticks case, when we actually don't need any tick in
      the future (ie: expires == KTIME_MAX), low-res and high-res behave
      differently. What we want in this case is to cancel the next tick
      programmed by the previous one. That's what we do in high-res mode. OTOH
      we lack a low-res mode equivalent of hrtimer_cancel() so we simply don't
      do anything in this case and the next tick remains scheduled to jiffies + 1.
      
      As a result, in low-res mode, when the dynticks code determines that no
      tick is needed in the future, we can recursively get a spurious tick
      every jiffy because then the next tick is always reprogrammed from the
      tick handler and is never cancelled. And this can happen indefinitely
      until some subsystem actually needs a precise tick in the future and only
      then we eventually overwrite the previous tick handler setting to defer
      the next tick.
      
      We are fixing this by introducing the ONESHOT_STOPPED mode which will
      let us pause a clockevent when no further interrupt is needed. Meanwhile
      we can't expect all drivers to support this new mode.
      
      So let's reduce much of the symptoms by skipping the nohz-blind tick
      rescheduling from the tick-handler when the CPU is in dynticks mode.
      That tick rescheduling wrongly assumed periodicity and the low-res
      dynticks code can't cancel such decision. This breaks the recursive (and
      thus the worst) part of the problem. In the worst case now, we'll get
      only one extra tick due to uncancelled tick scheduled before we entered
      dynticks mode.
      
      This also removes a needless clockevent write on idle ticks. Since such
      clock writes are usually considered to be slow, it's a general win.
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      b5e995e6
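The recursion described in the commit above can be reproduced with a toy model. This is a sketch, not kernel code: `count_ticks_model()` and `NO_EVENT` are hypothetical names, and the model assumes a low-res clockevent that fires exactly at the programmed jiffy and cannot be cancelled once programmed.

```c
#include <stdbool.h>

#define NO_EVENT (-1L)

/*
 * Toy model of a low-res clockevent that fires once per programmed jiffy.
 * The CPU enters dynticks mode at jiffy 0 with a tick already programmed
 * for jiffy 1, and no subsystem needs a tick afterwards
 * (expires == KTIME_MAX), so the irq-exit path never touches the device.
 * Returns how many ticks fire during jiffies 1..nr_jiffies.
 */
static long count_ticks_model(bool skip_resched_when_stopped, long nr_jiffies)
{
    long next_event = 1;            /* stale tick from before dynticks */
    const bool tick_stopped = true; /* the CPU stays in dynticks mode */
    long fired = 0;

    for (long j = 1; j <= nr_jiffies; j++) {
        if (next_event != j)
            continue;
        fired++;
        /* End of the tick handler: reschedule to the next jiffy, or not. */
        if (skip_resched_when_stopped && tick_stopped)
            next_event = NO_EVENT;
        else
            next_event = j + 1;
        /* irq exit: nothing to program, and low-res mode cannot cancel. */
    }
    return fired;
}
```

With `skip_resched_when_stopped` set to false the model ticks on every jiffy; with it set to true only the single stale tick fires, which matches the "only one extra tick" worst case the message mentions.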
  6. 10 Jul, 2014 1 commit
    •
      rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs · c0f489d2
      Committed by Paul E. McKenney
      Binding the grace-period kthreads to the timekeeping CPU resulted in
      significant performance decreases for some workloads.  For more detail,
      see:
      
      https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
      
      https://lkml.org/lkml/2014/6/4/218 for CPU statistics
      
      It turns out that it is necessary to bind the grace-period kthreads
      to the timekeeping CPU only when all CPUs but CPU 0 are nohz_full CPUs
      or when CONFIG_NO_HZ_FULL_SYSIDLE=y.
      In other cases, it suffices to bind the grace-period kthreads to the
      set of non-nohz_full CPUs.
      
      This commit therefore creates a tick_nohz_not_full_mask that is the
      complement of tick_nohz_full_mask, and then binds the grace-period
      kthread to the set of CPUs indicated by this new mask, which covers
      the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
      case still binds the grace-period kthreads to the timekeeping CPU.
      This commit also includes the tick_nohz_full_enabled() check suggested
      by Frederic Weisbecker.
      Reported-by: Jet Chen <jet.chen@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Created housekeeping_affine() and housekeeping_mask per
        fweisbec feedback. ]
      c0f489d2
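The "complement of tick_nohz_full_mask" idea from the commit above can be sketched with plain bit operations. This is an illustration only: a 64-bit word stands in for a cpumask, and `housekeeping_mask_model()` and `housekeeping_affine_model()` are hypothetical helpers that merely compute masks; they do not change any thread's affinity the way the kernel helpers named in the commit message would.

```c
#include <stdint.h>

/*
 * Model a cpumask as a 64-bit word (bit n set = CPU n is in the mask).
 * The housekeeping set is every possible CPU that is not nohz_full.
 */
static uint64_t housekeeping_mask_model(uint64_t possible_mask,
                                        uint64_t nohz_full_mask)
{
    return possible_mask & ~nohz_full_mask;
}

/* Restrict a thread's allowed CPUs to the housekeeping set. */
static uint64_t housekeeping_affine_model(uint64_t allowed_mask,
                                          uint64_t possible_mask,
                                          uint64_t nohz_full_mask)
{
    return allowed_mask & housekeeping_mask_model(possible_mask,
                                                  nohz_full_mask);
}
```

For example, on an 8-CPU system where CPUs 1 to 7 are nohz_full (`nohz_full_mask == 0xFE`), the housekeeping mask reduces to CPU 0 alone, which is the case where binding to the timekeeping CPU and binding to the non-nohz_full set coincide.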
  7. 16 Jun, 2014 1 commit
    •
      nohz: Support nohz full remote kick · 3d36aebc
      Committed by Frederic Weisbecker
      Remotely kicking a full nohz CPU in order to make it re-evaluate its
      next tick is currently implemented using the scheduler IPI.
      
      However this bloats a scheduler fast path with an off-topic feature.
      The scheduler tick was abused here for its cool "callable
      anywhere/anytime" properties.
      
      But now that the irq work subsystem can queue remote callbacks, it's
      a perfect fit to safely queue IPIs when interrupts are disabled
      without worrying about concurrent callers.
      
      So let's implement remote kick on top of irq work. This is going to
      be used when a new event requires the next tick to be recalculated:
      more than 1 task competing on the CPU, timer armed, ...
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      3d36aebc
  8. 16 Apr, 2014 2 commits
  9. 16 Jan, 2014 3 commits
  10. 13 Jan, 2014 1 commit
    •
      sched/clock, x86: Use a static_key for sched_clock_stable · 35af99e6
      Committed by Peter Zijlstra
      In order to avoid the runtime condition and variable load, turn
      sched_clock_stable into a static_key.
      
      Also provide a shorter implementation of local_clock() and
      cpu_clock(int) when sched_clock_stable==1.
      
                              MAINLINE   PRE       POST
      
          sched_clock_stable: 1          1         1
          (cold) sched_clock: 329841     221876    215295
          (cold) local_clock: 301773     234692    220773
          (warm) sched_clock: 38375      25602     25659
          (warm) local_clock: 100371     33265     27242
          (warm) rdtsc:       27340      24214     24208
          sched_clock_stable: 0          0         0
          (cold) sched_clock: 382634     235941    237019
          (cold) local_clock: 396890     297017    294819
          (warm) sched_clock: 38194      25233     25609
          (warm) local_clock: 143452     71234     71232
          (warm) rdtsc:       27345      24245     24243
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      35af99e6
  11. 24 Dec, 2013 1 commit
    •
      tick/timekeeping: Call update_wall_time outside the jiffies lock · 47a1b796
      Committed by John Stultz
      Since the xtime lock was split into the timekeeping lock and
      the jiffies lock, we no longer need to call update_wall_time()
      while holding the jiffies lock.
      
      Thus, this patch splits update_wall_time() out from do_timer().
      
      This allows us to get away from calling clock_was_set_delayed()
      in update_wall_time() and instead use the standard clock_was_set()
      call that previously would deadlock, as it causes the jiffies lock
      to be acquired.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      47a1b796
  12. 03 Dec, 2013 1 commit
    •
      nohz: Convert a few places to use local per cpu accesses · e8fcaa5c
      Committed by Frederic Weisbecker
      A few functions use remote per CPU access APIs when they
      deal with local values.
      
      Just do the right conversion to improve performance, code
      readability and debug checks.
      
      While at it, let's extend some of these function names with *_this_cpu()
      suffix in order to display their purpose more clearly.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      e8fcaa5c
  13. 29 Nov, 2013 1 commit
  14. 19 Nov, 2013 1 commit
  15. 16 Aug, 2013 1 commit
  16. 14 Aug, 2013 3 commits
    •
      nohz: Optimize full dynticks's sched hooks with static keys · d13508f9
      Committed by Frederic Weisbecker
      Scheduler IPIs and task context switches are serious fast paths.
      Let's use static keys to hide, as much as we can, the off-case
      cost of the full dynticks APIs that are called from these
      sites.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      d13508f9
    •
      nohz: Optimize full dynticks state checks with static keys · 460775df
      Committed by Frederic Weisbecker
      These APIs are frequently accessed and priority is given
      to optimize the full dynticks off-case in order to let
      distros enable this feature without suffering from
      significant performance regressions.
      
      Let's inline these APIs and optimize them with static keys.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      460775df
    •
      nohz: Rename a few state variables · 73867dcd
      Committed by Frederic Weisbecker
      Rename the full dynticks's cpumask and cpumask state variables
      to some more exportable names.
      
      These will be used later from global headers to optimize
      the main full dynticks APIs in conjunction with static keys.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      73867dcd
  17. 13 Aug, 2013 1 commit
    •
      nohz: Only enable context tracking on full dynticks CPUs · 2e709338
      Committed by Frederic Weisbecker
      The context tracking subsystem has the ability to selectively
      enable the tracking on any defined subset of CPUs. This means that
      we can define a CPU range that doesn't run the context tracking
      and another range that does.
      
      Now what we want in practice is to enable the tracking on full
      dynticks CPUs only. In order to perform this, we just need to pass
      our full dynticks CPU range selection from the full dynticks
      subsystem to the context tracking.
      
      This way we can spare the overhead of RCU user extended quiescent
      state and vtime maintenance on the CPUs that are outside the
      full dynticks range. Just keep in mind the raw context tracking
      itself is still necessary everywhere.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      2e709338
  18. 29 Jul, 2013 1 commit
    •
      Revert "cpuidle: Quickly notice prediction failure for repeat mode" · 14851912
      Committed by Rafael J. Wysocki
      Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
      repeat mode), because it has been identified as the source of a
      significant performance regression in v3.8 and later as explained by
      Jeremy Eder:
      
        We believe we've identified a particular commit to the cpuidle code
        that seems to be impacting performance of a variety of workloads.
        The simplest way to reproduce is using netperf TCP_RR test, so
        we're using that, on a pair of Sandy Bridge based servers.  We also
        have data from a large database setup where performance is also
        measurably/positively impacted, though that test data isn't easily
        share-able.
      
        Included below are test results from 3 test kernels:
      
        kernel       reverts
        -----------------------------------------------------------
        1) vanilla   upstream (no reverts)
      
        2) perfteam2 reverts e11538d1
      
        3) test      reverts 69a37bea
                             e11538d1
      
        In summary, netperf TCP_RR numbers improve by approximately 4%
        after reverting 69a37bea.  When
        69a37bea is included, C0 residency
        never seems to get above 40%.  Taking that patch out gets C0 near
        100% quite often, and performance increases.
      
        The below data are histograms representing the %c0 residency @
        1-second sample rates (using turbostat), while under netperf test.
      
        - If you look at the first 4 histograms, you can see %c0 residency
          almost entirely in the 30,40% bin.
        - The last pair, which reverts 69a37bea,
          shows %c0 in the 80,90,100% bins.
      
        Below each kernel name are netperf TCP_RR trans/s numbers for the
        particular kernel that can be disclosed publicly, comparing the 3
        test kernels.  We ran a 4th test with the vanilla kernel where
        we've also set /dev/cpu_dma_latency=0 to show overall impact
        boosting single-threaded TCP_RR performance over 11% above
        baseline.
      
        3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
        TCP_RR trans/s 54323.78
      
        -----------------------------------------------------------
        3.10-rc2 vanilla RX (no reverts)
        TCP_RR trans/s 48192.47
      
        Receiver %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    59]: ***********************************************************
           40.0000 -    50.0000 [     1]: *
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        Sender %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    11]: ***********
           40.0000 -    50.0000 [    49]: *************************************************
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        -----------------------------------------------------------
        3.10-rc2 perfteam2 RX (reverts commit e11538d1)
        TCP_RR trans/s 49698.69
      
        Receiver %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     1]: *
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    59]: ***********************************************************
           40.0000 -    50.0000 [     0]:
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        Sender %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [     2]: **
           40.0000 -    50.0000 [    58]: **********************************************************
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        -----------------------------------------------------------
        3.10-rc2 test RX (reverts 69a37bea and e11538d1)
        TCP_RR trans/s 47766.95
      
        Receiver %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     1]: *
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    27]: ***************************
           40.0000 -    50.0000 [     2]: **
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     2]: **
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [    28]: ****************************
      
        Sender %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    11]: ***********
           40.0000 -    50.0000 [     0]:
           50.0000 -    60.0000 [     1]: *
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     3]: ***
           80.0000 -    90.0000 [     7]: *******
           90.0000 -   100.0000 [    38]: **************************************
      
        These results demonstrate gaining back the tendency of the CPU to
        stay in more responsive, performant C-states (and thus yield
        measurably better performance), by reverting commit
        69a37bea.
      Requested-by: Jeremy Eder <jeder@redhat.com>
      Tested-by: Len Brown <len.brown@intel.com>
      Cc: 3.8+ <stable@vger.kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      14851912
  19. 25 Jul, 2013 2 commits
    •
      nohz: fix compile warning in tick_nohz_init() · ca06416b
      Committed by Li Zhong
      cpu is not used after commit 5b8621a6.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      ca06416b
    •
      nohz: Do not warn about unstable tsc unless user uses nohz_full · 543487c7
      Committed by Steven Rostedt
      If the user enables CONFIG_NO_HZ_FULL and runs the kernel on a machine
      with an unstable TSC, it will produce a WARN_ON dump as well as taint
      the kernel. This is a bit extreme for a kernel that just enables a
      feature but doesn't use it.
      
      The warning should only happen if the user tries to use the feature by
      either adding nohz_full to the kernel command line, or by enabling
      CONFIG_NO_HZ_FULL_ALL that makes nohz used on all CPUs at boot up. Note,
      this second feature should not (yet) be used by distros or anyone that
      doesn't care if NO_HZ is used or not.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      543487c7
  20. 15 Jul, 2013 1 commit
    •
      kernel: delete __cpuinit usage from all core kernel files · 0db0628d
      Committed by Paul Gortmaker
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      This removes all the uses of the __cpuinit macros from C files in
      the core kernel directories (kernel, init, lib, mm, and include)
      that don't really have a specific maintainer.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      0db0628d
  21. 20 Jun, 2013 2 commits
    •
      nohz: Remove obsolete check for full dynticks CPUs to be RCU nocbs · 5b8621a6
      Committed by Frederic Weisbecker
      Building full dynticks now implies that all CPUs are forced
      into RCU nocb mode through CONFIG_RCU_NOCB_CPU_ALL.
      
      The dynamic check has become useless.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      5b8621a6
    •
      nohz: Warn if the machine can not perform nohz_full · e12d0271
      Committed by Steven Rostedt
      If the user configures NO_HZ_FULL and defines nohz_full=XXX on the
      kernel command line, or enables NO_HZ_FULL_ALL, but nohz fails
      due to the machine having an unstable clock, warn about it.
      
      We do not want users thinking that they are getting the benefit
      of nohz when their machine can not support it.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      e12d0271
  22. 31 May, 2013 1 commit
  23. 14 May, 2013 1 commit
  24. 12 May, 2013 1 commit
  25. 04 May, 2013 1 commit
    •
      sched: Keep at least 1 tick per second for active dynticks tasks · 265f22a9
      Committed by Frederic Weisbecker
      The scheduler doesn't yet fully support environments
      with a single task running without a periodic tick.
      
      In order to ensure we still maintain the duties of scheduler_tick(),
      keep at least 1 tick per second.
      
      This makes sure that we keep the progression of various scheduler
      accounting and background maintenance even with a very low granularity.
      Examples include cpu load, sched average, CFS entity vruntime,
      avenrun and events such as load balancing, amongst other details
      handled in sched_class::task_tick().
      
      This limitation will be removed in the future once we get
      these individual items to work in full dynticks CPUs.
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      265f22a9
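The "at least 1 tick per second" rule from the commit above amounts to capping how far the next tick may be deferred. The following is a minimal sketch, assuming the cap is expressed in jiffies and applies only to CPUs that are running a task; `clamp_tick_deferment_model()` and its parameters are hypothetical and do not reproduce the kernel's actual helper.

```c
#include <stdbool.h>

/*
 * Clamp how far the next tick may be deferred, in jiffies.
 * hz is the number of jiffies per second. An idle CPU has no running
 * task to account for, so its deferment is left alone in this model.
 */
static unsigned long clamp_tick_deferment_model(unsigned long requested,
                                                unsigned long hz,
                                                bool cpu_is_idle)
{
    if (!cpu_is_idle && requested > hz)
        return hz;      /* keep at least one tick per second */
    return requested;
}
```

With `hz` equal to 1000, a busy CPU that asks to defer its tick by 5000 jiffies is brought back to 1000, while shorter deferments pass through unchanged.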
  26. 29 Apr, 2013 1 commit
    •
      nohz: Protect smp_processor_id() in tick_nohz_task_switch() · 6296ace4
      Committed by Li Zhong
      I saw the following error when testing the latest nohz code on
      Power:
      
      [   85.295384] BUG: using smp_processor_id() in preemptible [00000000] code: rsyslogd/3493
      [   85.295396] caller is .tick_nohz_task_switch+0x1c/0xb8
      [   85.295402] Call Trace:
      [   85.295408] [c0000001fababab0] [c000000000012dc4] .show_stack+0x110/0x25c (unreliable)
      [   85.295420] [c0000001fababba0] [c0000000007c4b54] .dump_stack+0x20/0x30
      [   85.295430] [c0000001fababc10] [c00000000044eb74] .debug_smp_processor_id+0xf4/0x124
      [   85.295438] [c0000001fababca0] [c0000000000d7594] .tick_nohz_task_switch+0x1c/0xb8
      [   85.295447] [c0000001fababd20] [c0000000000b9748] .finish_task_switch+0x13c/0x160
      [   85.295455] [c0000001fababdb0] [c0000000000bbe50] .schedule_tail+0x50/0x124
      [   85.295463] [c0000001fababe30] [c000000000009dc8] .ret_from_fork+0x4/0x54
      
      The code below moves the test into local_irq_save/restore
      section to avoid the above complaint.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1367119558.6391.34.camel@ThinkPad-T5421.cn.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6296ace4
  27. 26 Apr, 2013 1 commit
    •
      nohz: Reduce overhead under high-freq idling patterns · 47aa8b6c
      Committed by Ingo Molnar
      One testbox of mine (Intel Nehalem, 16-way) uses MWAIT for its idle routine,
      which apparently can break out of its idle loop at a rather high
      frequency.
      
      In that case NO_HZ_FULL=y kernels show high ksoftirqd overhead and constant
      context switching, because tick_nohz_stop_sched_tick() will, if
      delta_jiffies == 0, mis-identify this as a timer event - activating the
      TIMER_SOFTIRQ, which wakes up ksoftirqd.
      
      Fix this by treating delta_jiffies == 0 the same way we treat other short
      wakeups, delta_jiffies == 1.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      47aa8b6c
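As a rough illustration of the change described above, the following userspace sketch models the decision before and after the fix. It is a deliberate simplification under stated assumptions: the enum and both function names are hypothetical, and the real `tick_nohz_stop_sched_tick()` takes many more conditions into account.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the tick-stop decision in this model. */
enum tick_action {
    KEEP_TICK,            /* leave the periodic tick running */
    RAISE_TIMER_SOFTIRQ,  /* treat the wakeup as an expired timer event */
    STOP_TICK             /* program a later event and stop the tick */
};

/* Before the fix: only delta_jiffies == 1 counts as a short wakeup. */
static enum tick_action decide_before(bool tick_stopped,
                                      unsigned long delta_jiffies)
{
    if (!tick_stopped && delta_jiffies == 1)
        return KEEP_TICK;
    if (delta_jiffies == 0)
        return RAISE_TIMER_SOFTIRQ;  /* mis-identified as a timer event */
    return STOP_TICK;
}

/* After the fix: delta_jiffies == 0 is handled like delta_jiffies == 1. */
static enum tick_action decide_after(bool tick_stopped,
                                     unsigned long delta_jiffies)
{
    if (!tick_stopped && delta_jiffies <= 1)
        return KEEP_TICK;
    if (delta_jiffies == 0)
        return RAISE_TIMER_SOFTIRQ;
    return STOP_TICK;
}
```

With the tick still running, `decide_before(false, 0)` takes the softirq path, while `decide_after(false, 0)` simply keeps the tick, so ksoftirqd is not woken for these very short idle periods.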
  28. 23 Apr, 2013 3 commits
    • F
      nohz: Add basic tracing · cb41a290
      Frederic Weisbecker committed
      It's not easy to find out why the full dynticks subsystem
      doesn't always stop the tick: whether this is due to kthreads,
      posix timers, perf events, etc...
      
      These new tracepoints are here to help the user diagnose
      the failures and test this feature.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      cb41a290
    • F
      nohz: Re-evaluate the tick for the new task after a context switch · 99e5ada9
      Frederic Weisbecker committed
      When a task is scheduled in, it may have some properties
      of its own that could make the CPU reconsider the need for
      the tick: posix cpu timers, perf events, ...
      
      So notify the full dynticks subsystem when a task gets
      scheduled in and re-check the tick dependency at this
      stage. This is done through a self IPI to avoid interfering
      with any current locking scenario.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      99e5ada9
    • F
      nohz: Prepare to stop the tick on irq exit · 5811d996
      Frederic Weisbecker committed
      Interrupt exit is a natural place to stop the tick: it happens
      after all events, before and during the irq, that are liable
      to update the dependency on the tick have occurred. It also
      makes sure that any check on tick dependency is well ordered
      against dynticks kick IPIs.
      
      Bring in the infrastructure that performs the tick dependency
      checks on irq exit and shut it down if these checks show that we
      can do it safely.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      5811d996
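The irq-exit dependency check described above, together with the reasons the tracing commit wants to expose, can be sketched as a small userspace model. Everything here is an assumption made for illustration: `struct cpu_model`, its fields, `tick_dependency()` and `model_irq_exit()` are hypothetical names, and the kernel's actual checks are more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state for this model. */
struct cpu_model {
    int nr_running;               /* runnable tasks on this CPU */
    bool posix_cpu_timers_armed;  /* current task has posix cpu timers */
    bool perf_events_need_tick;   /* perf events rely on the tick */
    bool tick_stopped;            /* whether the tick is currently off */
};

/*
 * Return a human-readable reason why the tick must stay on, or NULL
 * when nothing depends on it. A tracepoint could report this string.
 */
static const char *tick_dependency(const struct cpu_model *cpu)
{
    if (cpu->nr_running > 1)
        return "more than one task in runqueue";
    if (cpu->posix_cpu_timers_armed)
        return "posix cpu timers running";
    if (cpu->perf_events_need_tick)
        return "perf events running";
    return NULL;
}

/* Model of the irq-exit hook: stop the tick only when it is safe. */
static void model_irq_exit(struct cpu_model *cpu)
{
    if (!cpu->tick_stopped && tick_dependency(cpu) == NULL)
        cpu->tick_stopped = true;
}
```

Returning the reason as a string, rather than a plain boolean, is a design choice of this sketch so that the same check can feed both the stop decision and a diagnostic message.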