1. 27 Oct 2017, 2 commits
  2. 23 Mar 2017, 1 commit
    • cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely · b7eaf1aa
      Authored by Rafael J. Wysocki
      The way the schedutil governor uses the PELT metric causes it to
      underestimate the CPU utilization in some cases.
      
      That can be easily demonstrated by running kernel compilation on
      a Sandy Bridge Intel processor, running turbostat in parallel with
      it and looking at the values written to the MSR_IA32_PERF_CTL
      register.  Namely, the expected result would be that when all CPUs
      were 100% busy, all of them would be requested to run in the maximum
      P-state, but observation shows that this clearly isn't the case.
      The CPUs run in the maximum P-state for a while and then are
      requested to run slower and go back to the maximum P-state after
      a while again.  That causes the actual frequency of the processor to
      visibly oscillate below the sustainable maximum in a jittery fashion
      which clearly is not desirable.
      
      That has been attributed to CPU utilization metric updates on task
      migration that cause the total utilization value for the CPU to be
      reduced by the utilization of the migrated task.  If that happens,
      the schedutil governor may see a CPU utilization reduction and will
      attempt to reduce the CPU frequency accordingly right away.  That
      may be premature, though, for example if the system is generally
      busy and there are other runnable tasks waiting to be run on that
      CPU already.
      
      This is unlikely to be an issue on systems where cpufreq policies are
      shared between multiple CPUs, because in those cases the policy
      utilization is computed as the maximum of the CPU utilization values
      over the whole policy and if that turns out to be low, reducing the
      frequency for the policy most likely is a good idea anyway.  On
      systems with one CPU per policy, however, it may affect performance
      adversely and even lead to increased energy consumption in some cases.
      
      On those systems it may be addressed by taking another utilization
      metric into consideration, like whether or not the CPU whose
      frequency is about to be reduced has been idle recently, because if
      that's not the case, the CPU is likely to be busy in the near future
      and its frequency should not be reduced.
      
      To that end, use the counter of idle calls in the timekeeping code.
      Namely, make the schedutil governor look at that counter for the
      current CPU every time before its frequency is about to be reduced.
      If the counter has not changed since the previous iteration of the
      governor computations for that CPU, the CPU has been busy for all
      that time and its frequency should not be decreased, so if the new
      frequency would be lower than the one set previously, the governor
      will skip the frequency update.
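      The skip logic described above can be sketched in plain C. This is a minimal userspace model, assuming a hypothetical `sugov_cpu_sketch` state holder; the real governor keeps its state in `struct sugov_cpu` and reads the counter from the timekeeping code.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical per-CPU governor state; names are illustrative. */
      struct sugov_cpu_sketch {
          uint64_t saved_idle_calls; /* idle-call count seen last iteration */
          unsigned int cur_freq;     /* frequency currently requested */
      };

      /* If the idle-call counter has not advanced since the previous
       * iteration, the CPU never went idle: treat it as busy and refuse
       * a downward frequency step. */
      static bool sugov_should_skip_update(struct sugov_cpu_sketch *sg,
                                           uint64_t idle_calls_now,
                                           unsigned int next_freq)
      {
          bool busy = (idle_calls_now == sg->saved_idle_calls);

          sg->saved_idle_calls = idle_calls_now;
          return busy && next_freq < sg->cur_freq;
      }
      ```

      Upward frequency requests are never skipped, so a genuinely loaded CPU can still ramp up immediately.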
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Joel Fernandes <joelaf@google.com>
      b7eaf1aa
  3. 26 Dec 2016, 1 commit
    • ktime: Get rid of the union · 2456e855
      Authored by Thomas Gleixner
      ktime is a union because the initial implementation stored the time in
      scalar nanoseconds on 64-bit machines and in an endianness-optimized
      timespec variant for 32-bit machines. The Y2038 cleanup removed the
      timespec variant and switched everything to scalar nanoseconds. The
      union remained, but became completely pointless.
      
      Get rid of the union and just keep ktime_t as simple typedef of type s64.
      
      The conversion was done with coccinelle and some manual mopping up.
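      The end state can be shown with a small self-contained sketch, modeling the kernel's `s64` with `int64_t`; the `ktime_add_ns` helper is included only to illustrate that, with the union gone, such helpers reduce to plain integer arithmetic rather than touching a `.tv64` member.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Sketch of the post-cleanup definition: a plain scalar typedef. */
      typedef int64_t s64;
      typedef s64 ktime_t;

      /* Helpers become ordinary integer operations. */
      static inline ktime_t ktime_add_ns(ktime_t kt, uint64_t ns)
      {
          return kt + (s64)ns;
      }
      ```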
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      2456e855
  4. 18 Mar 2016, 1 commit
  5. 02 Mar 2016, 4 commits
    • posix-cpu-timers: Migrate to use new tick dependency mask model · b7878300
      Authored by Frederic Weisbecker
      Instead of providing asynchronous checks for the nohz subsystem to verify
      posix cpu timers tick dependency, migrate the latter to the new mask.
      
      In order to keep track of the running timers and expose the tick
      dependency accordingly, we must probe the timers queuing and dequeuing
      on threads and process lists.
      
      Unfortunately this implies both task- and signal-level dependencies. We
      should be able to further optimize this and merge everything into the
      task-level dependency, at the cost of a bit of complexity and maybe
      some overhead.
      Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      b7878300
    • perf: Migrate perf to use new tick dependency mask model · 555e0c1e
      Authored by Frederic Weisbecker
      Instead of providing asynchronous checks for the nohz subsystem to verify
      perf event tick dependency, migrate perf to the new mask.
      
      Perf needs the tick for two situations:
      
      1) Freq events. We could set the tick dependency when those are
      installed on a CPU context. But setting a global dependency on top of
      the global freq events accounting is much easier. If people want that
      to be optimized, we can still refine it at the per-CPU tick dependency
      level. This patch doesn't change the current behaviour anyway.
      
      2) Throttled events: this is a per-cpu dependency.
      Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      555e0c1e
    • nohz: Use enum code for tick stop failure tracing message · e6e6cc22
      Authored by Frederic Weisbecker
      It makes nohz tracing more lightweight, standard and easier to parse.
      
      Examples:
      
             user_loop-2904  [007] d..1   517.701126: tick_stop: success=1 dependency=NONE
             user_loop-2904  [007] dn.1   518.021181: tick_stop: success=0 dependency=SCHED
          posix_timers-6142  [007] d..1  1739.027400: tick_stop: success=0 dependency=POSIX_TIMER
             user_loop-5463  [007] dN.1  1185.931939: tick_stop: success=0 dependency=PERF_EVENTS
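      The enum-coded dependency value behind these messages can be modeled with a small sketch; the identifiers below are illustrative, not the exact kernel names, but show the pattern of storing a compact enum in the trace record and rendering it through a string table.

      ```c
      #include <assert.h>
      #include <string.h>

      /* Illustrative stop-failure reasons, matching the trace output above. */
      enum tick_dep_reason { DEP_NONE, DEP_SCHED, DEP_POSIX_TIMER, DEP_PERF_EVENTS };

      /* Render an enum value at trace-parse time. */
      static const char *tick_dep_name(enum tick_dep_reason r)
      {
          static const char * const names[] = {
              "NONE", "SCHED", "POSIX_TIMER", "PERF_EVENTS",
          };
          return names[r];
      }
      ```

      Recording the enum rather than a string keeps the tracepoint payload small and makes the messages trivially machine-parseable.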
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      e6e6cc22
    • nohz: New tick dependency mask · d027d45d
      Authored by Frederic Weisbecker
      The tick dependency is evaluated on every IRQ and context switch. This
      consists of a batch of checks which determine whether it is safe to
      stop the tick or not. These checks are often split into many details:
      posix cpu timers, scheduler, sched clock, perf events... each of which
      is made of smaller details: posix cpu timers involve checking
      process-wide timers then thread-wide timers; perf involves checking
      freq events then more per-CPU details.
      
      Checking this information asynchronously every time we update the full
      dynticks state brings avoidable overhead and a messy layout.
      
      Let's introduce instead tick dependency masks: one for system wide
      dependency (unstable sched clock, freq based perf events), one for CPU
      wide dependency (sched, throttling perf events), and task/signal level
      dependencies (posix cpu timers). The subsystems are responsible
      for setting and clearing their dependency through a set of APIs that will
      take care of concurrent dependency mask modifications and kick targets
      to restart the relevant CPU tick whenever needed.
      
      This new dependency engine stays beside the old one until all subsystems
      having a tick dependency are converted to it.
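      A userspace model of the mask engine, using C11 atomics in place of the kernel's atomic ops; the bit names and helper names are illustrative. The key property is that the transition from an empty mask to a non-empty one is the moment a kick to restart the tick would be sent.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stdbool.h>

      /* Each subsystem owns one bit (illustrative names). */
      enum {
          TICK_DEP_POSIX_TIMER = 1u << 0,
          TICK_DEP_PERF        = 1u << 1,
          TICK_DEP_SCHED       = 1u << 2,
      };

      static atomic_uint tick_dep_mask;

      /* Set a dependency bit; returns true when the caller should kick the
       * target CPU to restart its tick (mask went non-empty). */
      static bool tick_dep_set(unsigned int bit)
      {
          return atomic_fetch_or(&tick_dep_mask, bit) == 0;
      }

      static void tick_dep_clear(unsigned int bit)
      {
          atomic_fetch_and(&tick_dep_mask, ~bit);
      }

      /* The tick-stop path reduces to a single mask test. */
      static bool tick_can_stop(void)
      {
          return atomic_load(&tick_dep_mask) == 0;
      }
      ```

      Using atomic read-modify-write ops makes concurrent set/clear from different subsystems safe without a lock.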
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      d027d45d
  6. 16 Jan 2016, 1 commit
  7. 02 Sep 2015, 1 commit
  8. 29 Jul 2015, 3 commits
    • nohz: Remove useless argument on tick_nohz_task_switch() · de734f89
      Authored by Frederic Weisbecker
      Leftover from early code.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      de734f89
    • nohz: Restart nohz full tick from irq exit · 73738a95
      Authored by Frederic Weisbecker
      Restart the tick when necessary from the irq exit path. This makes nohz
      full more flexible, simplifies the related IPIs and doesn't add
      significant overhead to irq exit.
      
      In the longer term, it will allow us to piggyback the nohz kick on the
      scheduler IPI instead of sending a dedicated IPI that often doubles the
      scheduler IPI on task wakeup. This will require more changes though,
      including a careful review of resched_curr() callers to include nohz
      full needs.
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      73738a95
    • nohz: Prevent tilegx network driver interrupts · 03f6199a
      Authored by Chris Metcalf
      Normally the tilegx networking shim sends irqs to all the cores
      to distribute the load of processing incoming-packet interrupts,
      so that you can reach multiple Gb/s of inbound traffic.
      
      However, in nohz_full mode we don't want to interrupt the
      nohz_full cores by default, so we limit the set of cores we use
      to only the online housekeeping cores.
      
      To make client code easier to read, we introduce a new nohz_full
      accessor, housekeeping_cpumask(), which returns a pointer to the
      housekeeping_mask if nohz_full is enabled, and otherwise returns
      the cpu_possible_mask.
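      The accessor's contract can be modeled with plain bitmasks standing in for `cpumask`s; all of the `*_sk` names and mask values below are illustrative.

      ```c
      #include <assert.h>

      static unsigned long cpu_possible_mask_sk = 0xffUL; /* CPUs 0-7 */
      static unsigned long housekeeping_mask_sk = 0x03UL; /* CPUs 0-1 keep the tick */
      static int tick_nohz_full_enabled_sk = 1;

      /* With nohz_full active, callers get only the housekeeping CPUs;
       * otherwise they get every possible CPU. */
      static unsigned long housekeeping_cpumask_sk(void)
      {
          return tick_nohz_full_enabled_sk ? housekeeping_mask_sk
                                           : cpu_possible_mask_sk;
      }
      ```

      Client code like the tilegx driver then iterates the returned mask without caring whether nohz_full is configured.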
      Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      03f6199a
  9. 08 Jul 2015, 2 commits
    • tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build · 37b64a42
      Authored by Thomas Gleixner
      Making tick_broadcast_oneshot_control() independent from
      CONFIG_GENERIC_CLOCKEVENTS_BROADCAST broke the build for
      CONFIG_GENERIC_CLOCKEVENTS=n because the function is not defined
      there.
      
      Provide a proper stub inline.
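      The stub pattern behind the fix looks like this. This is only a sketch: the config macro is modeled as a plain `#define`, and the real function takes an `enum tick_broadcast_state` rather than a bare `int`.

      ```c
      #include <assert.h>

      /* #define CONFIG_GENERIC_CLOCKEVENTS */

      #ifdef CONFIG_GENERIC_CLOCKEVENTS
      /* Real out-of-line implementation lives in the clockevents core. */
      int tick_broadcast_oneshot_control(int state);
      #else
      /* Stub: with no clockevents core there is no broadcast handling to
       * do, so the call trivially succeeds. */
      static inline int tick_broadcast_oneshot_control(int state)
      {
          (void)state;
          return 0;
      }
      #endif
      ```

      The stub keeps callers compiling unchanged whichever way the option is set.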
      
      Fixes: f32dd117 'tick/broadcast: Make idle check independent from mode and config'
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      37b64a42
    • tick/broadcast: Make idle check independent from mode and config · f32dd117
      Authored by Thomas Gleixner
      Currently the broadcast busy check, which prevents the idle code from
      going into deep idle, works only in one shot mode.
      
      If NOHZ and HIGHRES are off (config or command line) there is no
      sanity check at all, so under certain conditions CPUs are allowed to
      go into deep idle, where the local timer stops, and are not woken up
      again because there is no broadcast timer installed or a hrtimer-based
      broadcast device is not evaluated.
      
      Move tick_broadcast_oneshot_control() into the common code and provide
      proper subfunctions for the various config combinations.
      
      The common check in tick_broadcast_oneshot_control() is for the C3STOP
      misfeature flag of the local clock event device. If it's not set, idle
      can proceed. If set, further checks are necessary.
      
      Provide checks for the trivial cases:
      
       - If broadcast is disabled in the config, then return busy
      
       - If oneshot mode (NOHZ/HIGHRES) is disabled in the config, return
         busy if the broadcast device is hrtimer based.
      
       - If oneshot mode is enabled in the config, call the original
         tick_broadcast_oneshot_control() function. That function needs
         extra checks which will be implemented in separate patches.
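      The checks above can be condensed into a decision sketch, with booleans standing in for the config options and device flags; the function name is illustrative. A true return means "busy": deep idle must be refused.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      static bool broadcast_idle_busy(bool c3stop, bool broadcast_enabled,
                                      bool oneshot_enabled, bool hrtimer_broadcast)
      {
          if (!c3stop)
              return false;             /* local timer survives deep idle */
          if (!broadcast_enabled)
              return true;              /* nothing would wake the CPU up */
          if (!oneshot_enabled)
              return hrtimer_broadcast; /* hrtimer device can't fire ticks */
          return false;                 /* defer to the oneshot-mode checks */
      }
      ```

      The final branch stands in for the call into the original `tick_broadcast_oneshot_control()` path, whose extra checks are out of scope here.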
      
      [ Split out from a larger combo patch ]
      Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
      Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
      Cc: Catalin Marinas <Catalin.Marinas@arm.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
      f32dd117
  10. 19 May 2015, 1 commit
  11. 07 May 2015, 1 commit
  12. 03 Apr 2015, 4 commits
  13. 02 Apr 2015, 1 commit
  14. 01 Apr 2015, 4 commits
  15. 16 Feb 2015, 1 commit
    • PM / sleep: Make it possible to quiesce timers during suspend-to-idle · 124cf911
      Authored by Rafael J. Wysocki
      The efficiency of suspend-to-idle depends on being able to keep CPUs
      in the deepest available idle states for as much time as possible.
      Ideally, they should only be brought out of idle by system wakeup
      interrupts.
      
      However, timer interrupts occurring periodically prevent that from
      happening and it is not practical to chase all of the "misbehaving"
      timers in a whack-a-mole fashion.  A much more effective approach is
      to suspend the local ticks for all CPUs and the entire timekeeping
      along the lines of what is done during full suspend, which also
      helps to keep suspend-to-idle and full suspend reasonably similar.
      
      The idea is to suspend the local tick on each CPU executing
      cpuidle_enter_freeze() and to make the last of them suspend the
      entire timekeeping.  That should prevent timer interrupts from
      triggering until an IO interrupt wakes up one of the CPUs.  It
      needs to be done with interrupts disabled on all of the CPUs,
      though, because otherwise the suspended clocksource might be
      accessed by an interrupt handler which might lead to fatal
      consequences.
      
      Unfortunately, the existing ->enter callbacks provided by cpuidle
      drivers generally cannot be used for implementing that, because some
      of them re-enable interrupts temporarily and some idle entry methods
      cause interrupts to be re-enabled automatically on exit.  Also some
      of these callbacks manipulate local clock event devices of the CPUs
      which really shouldn't be done after suspending their ticks.
      
      To overcome that difficulty, introduce a new cpuidle state callback,
      ->enter_freeze, that will be guaranteed (1) to keep interrupts
      disabled all the time (and return with interrupts disabled) and (2)
      not to touch the CPU timer devices.  Modify cpuidle_enter_freeze() to
      look for the deepest available idle state with ->enter_freeze present
      and to make the CPU execute that callback with suspended tick (and the
      last of the online CPUs to execute it with suspended timekeeping).
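      The state scan can be sketched as follows; the struct layout and names are illustrative, and states are assumed ordered shallowest to deepest, as in cpuidle driver tables.

      ```c
      #include <assert.h>
      #include <stddef.h>

      struct cpuidle_state_sk {
          int (*enter)(void);
          int (*enter_freeze)(void); /* IRQs stay off, timers untouched */
      };

      /* Dummy callback, for demonstration only. */
      static int freeze_stub(void) { return 0; }

      /* Return the index of the deepest state offering ->enter_freeze,
       * or -1 when no state supports the freeze path. */
      static int find_deepest_freeze_state(const struct cpuidle_state_sk *s, int n)
      {
          int i, deepest = -1;

          for (i = 0; i < n; i++)
              if (s[i].enter_freeze)
                  deepest = i;
          return deepest;
      }
      ```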
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      124cf911
  16. 09 Oct 2014, 1 commit
  17. 14 Sep 2014, 1 commit
    • nohz: Move nohz full init call to tick init · a80e49e2
      Authored by Frederic Weisbecker
      This way we unbloat main.c a bit and, more importantly, we initialize
      nohz full after init_IRQ(). This dependency will be needed in further
      patches because nohz full needs irq work to raise its own IRQ.
      Information about the support for this ability on ARM64 is obtained in
      init_IRQ(), which initializes the pointer to __smp_call_function.
      
      Since tick_init() is called right after init_IRQ(), this is a good place
      to call tick_nohz_init() and prepare for that dependency.
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      a80e49e2
  18. 05 Sep 2014, 1 commit
    • nohz: Restore NMI safe local irq work for local nohz kick · 40bea039
      Authored by Frederic Weisbecker
      The local nohz kick is currently used by perf, which needs it to be
      NMI-safe. A recent commit (7d1311b9), though,
      changed its implementation to fire the local kick using the remote
      kick API. That was convenient for making the code more generic, but
      the remote kick isn't NMI-safe.
      
      As a result:
      
      	WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140()
      	CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34
      	0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37
      	0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180
      	0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000
      	Call Trace:
      	<NMI>  [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a
      	[<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0
      	[<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20
      	[<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140
      	[<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90
      	[<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350
      	[<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0
      	[<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
      	[<ffffffff9a187934>] perf_event_overflow+0x14/0x20
      	[<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410
      	[<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130
      	[<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50
      	[<ffffffff9a007b72>] nmi_handle+0xd2/0x390
      	[<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390
      	[<ffffffff9a0d131b>] ? lock_release+0xab/0x330
      	[<ffffffff9a008062>] default_do_nmi+0x72/0x1c0
      	[<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200
      	[<ffffffff9a008268>] do_nmi+0xb8/0x100
      
      Let's fix this by restoring the use of local irq work for the nohz
      local kick.
      Reported-by: Catalin Iacob <iacobcatalin@gmail.com>
      Reported-and-tested-by: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      40bea039
  19. 10 Jul 2014, 1 commit
    • rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs · c0f489d2
      Authored by Paul E. McKenney
      Binding the grace-period kthreads to the timekeeping CPU resulted in
      significant performance decreases for some workloads.  For more detail,
      see:
      
      https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
      
      https://lkml.org/lkml/2014/6/4/218 for CPU statistics
      
      It turns out that it is necessary to bind the grace-period kthreads
      to the timekeeping CPU only when all CPUs but CPU 0 are nohz_full CPUs
      on the one hand, or when CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
      In other cases, it suffices to bind the grace-period kthreads to the
      set of non-nohz_full CPUs.
      
      This commit therefore creates a tick_nohz_not_full_mask that is the
      complement of tick_nohz_full_mask, and then binds the grace-period
      kthread to the set of CPUs indicated by this new mask, which covers
      the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
      case still binds the grace-period kthreads to the timekeeping CPU.
      This commit also includes the tick_nohz_full_enabled() check suggested
      by Frederic Weisbecker.
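      Modeling CPU sets as bitmasks, the new mask is simply the complement of the nohz_full set within the possible CPUs. This is a sketch; the kernel builds the mask with `cpumask` operations.

      ```c
      #include <assert.h>

      /* tick_nohz_not_full_mask = cpu_possible_mask & ~tick_nohz_full_mask */
      static unsigned long tick_nohz_not_full_mask_sk(unsigned long possible,
                                                      unsigned long full)
      {
          return possible & ~full;
      }
      ```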
      Reported-by: Jet Chen <jet.chen@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Created housekeeping_affine() and housekeeping_mask per
        fweisbec feedback. ]
      c0f489d2
  20. 16 Jun 2014, 1 commit
    • nohz: Support nohz full remote kick · 3d36aebc
      Authored by Frederic Weisbecker
      Remotely kicking a full nohz CPU in order to make it re-evaluate its
      next tick is currently implemented using the scheduler IPI.
      
      However this bloats a scheduler fast path with an off-topic feature.
      The scheduler IPI was abused here for its cool "callable
      anywhere/anytime" properties.
      
      But now that the irq work subsystem can queue remote callbacks, it's
      a perfect fit to safely queue IPIs when interrupts are disabled
      without worrying about concurrent callers.
      
      So let's implement the remote kick on top of irq work. This is going
      to be used when a new event requires the next tick to be recalculated:
      more than one task competing for the CPU, a timer being armed, ...
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      3d36aebc
  21. 16 Jan 2014, 1 commit
    • tick: Rename tick_check_idle() to tick_irq_enter() · 5acac1be
      Authored by Frederic Weisbecker
      This makes the code more symmetric with the existing tick functions
      called on irq exit: tick_irq_exit() and tick_nohz_irq_exit().
      These functions are also symmetric in that they mirror each other's
      actions: we start to account idle time on irq exit and stop this
      accounting on irq entry. Also the tick is stopped on irq exit and
      timekeeping catches up with the tickless time elapsed until we reach
      irq entry.
      
      This rename was suggested by Peter Zijlstra a long while ago but it
      got forgotten along the way.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Link: http://lkml.kernel.org/r/1387320692-28460-2-git-send-email-fweisbec@gmail.com
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      5acac1be
  22. 03 Dec 2013, 2 commits
  23. 14 Aug 2013, 2 commits
    • nohz: Optimize full dynticks's sched hooks with static keys · d13508f9
      Authored by Frederic Weisbecker
      Scheduler IPIs and task context switches are serious fast paths.
      Let's try to hide, as much as we can, the impact of the full
      dynticks APIs' off case at these call sites through the use of
      static keys.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      d13508f9
    • nohz: Optimize full dynticks state checks with static keys · 460775df
      Authored by Frederic Weisbecker
      These APIs are frequently accessed, and priority is given to
      optimizing the full dynticks off case in order to let distros enable
      this feature without suffering from significant performance
      regressions.
      
      Let's inline these APIs and optimize them with static keys.
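      A userspace approximation of the off-case optimization: in the kernel, static keys patch the disabled branch out of the instruction stream entirely; here a plain flag plus `__builtin_expect` stands in for that, and all names are illustrative.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      static bool tick_nohz_full_key; /* stands in for the static key, default off */

      static inline bool tick_nohz_full_enabled_sk(void)
      {
          /* Hint the compiler that the feature is usually off. */
          return __builtin_expect(tick_nohz_full_key, 0);
      }

      static inline bool tick_nohz_full_cpu_sk(int cpu, unsigned long full_mask)
      {
          if (!tick_nohz_full_enabled_sk())
              return false; /* fast path: feature off, skip the mask lookup */
          return (full_mask >> cpu) & 1;
      }
      ```

      The point of the pattern is that distro kernels with the feature compiled in but unused pay only for a predictable (or, with real static keys, nonexistent) branch.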
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      460775df
  24. 29 Jul 2013, 1 commit
    • Revert "cpuidle: Quickly notice prediction failure for repeat mode" · 14851912
      Authored by Rafael J. Wysocki
      Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
      repeat mode), because it has been identified as the source of a
      significant performance regression in v3.8 and later as explained by
      Jeremy Eder:
      
        We believe we've identified a particular commit to the cpuidle code
        that seems to be impacting performance of variety of workloads.
        The simplest way to reproduce is using netperf TCP_RR test, so
        we're using that, on a pair of Sandy Bridge based servers.  We also
        have data from a large database setup where performance is also
        measurably/positively impacted, though that test data isn't easily
        shareable.
      
        Included below are test results from 3 test kernels:
      
        kernel       reverts
        -----------------------------------------------------------
        1) vanilla   upstream (no reverts)
      
        2) perfteam2 reverts e11538d1
      
        3) test      reverts 69a37bea
                             e11538d1
      
        In summary, netperf TCP_RR numbers improve by approximately 4%
        after reverting 69a37bea.  When
        69a37bea is included, C0 residency
        never seems to get above 40%.  Taking that patch out gets C0 near
        100% quite often, and performance increases.
      
        The below data are histograms representing the %c0 residency @
        1-second sample rates (using turbostat), while under netperf test.
      
        - If you look at the first 4 histograms, you can see %c0 residency
          almost entirely in the 30,40% bin.
        - The last pair, which reverts 69a37bea,
          shows %c0 in the 80,90,100% bins.
      
        Below each kernel name are netperf TCP_RR trans/s numbers for the
        particular kernel that can be disclosed publicly, comparing the 3
        test kernels.  We ran a 4th test with the vanilla kernel where
        we've also set /dev/cpu_dma_latency=0 to show overall impact
        boosting single-threaded TCP_RR performance over 11% above
        baseline.
      
        3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
        TCP_RR trans/s 54323.78
      
        -----------------------------------------------------------
        3.10-rc2 vanilla RX (no reverts)
        TCP_RR trans/s 48192.47
      
        Receiver %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    59]:
        ***********************************************************
           40.0000 -    50.0000 [     1]: *
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        Sender %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    11]: ***********
           40.0000 -    50.0000 [    49]:
        *************************************************
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        -----------------------------------------------------------
        3.10-rc2 perfteam2 RX (reverts commit
        e11538d1)
        TCP_RR trans/s 49698.69
      
        Receiver %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     1]: *
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    59]:
        ***********************************************************
           40.0000 -    50.0000 [     0]:
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        Sender %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [     2]: **
           40.0000 -    50.0000 [    58]:
        **********************************************************
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        -----------------------------------------------------------
        3.10-rc2 test RX (reverts 69a37bea
        and e11538d1)
        TCP_RR trans/s 47766.95
      
        Receiver %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     1]: *
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    27]: ***************************
           40.0000 -    50.0000 [     2]: **
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     2]: **
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [    28]: ****************************
      
        Sender:
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    11]: ***********
           40.0000 -    50.0000 [     0]:
           50.0000 -    60.0000 [     1]: *
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     3]: ***
           80.0000 -    90.0000 [     7]: *******
           90.0000 -   100.0000 [    38]: **************************************
      
        These results demonstrate gaining back the tendency of the CPU to
        stay in more responsive, performant C-states (and thus yield
        measurably better performance), by reverting commit
        69a37bea.
      Requested-by: Jeremy Eder <jeder@redhat.com>
      Tested-by: Len Brown <len.brown@intel.com>
      Cc: 3.8+ <stable@vger.kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      14851912
  25. 23 Apr 2013, 1 commit
    • nohz: Re-evaluate the tick for the new task after a context switch · 99e5ada9
      Authored by Frederic Weisbecker
      When a task is scheduled in, it may have some properties
      of its own that could make the CPU reconsider the need for
      the tick: posix cpu timers, perf events, ...
      
      So notify the full dynticks subsystem when a task gets
      scheduled in and re-check the tick dependency at this
      stage. This is done through a self-IPI to avoid messing
      with any current locking scenario.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      99e5ada9