1. 27 Aug, 2013 (2 commits)
  2. 14 Aug, 2013 (2 commits)
  3. 23 Apr, 2013 (1 commit)
  4. 09 Apr, 2013 (1 commit)
  5. 27 Nov, 2012 (1 commit)
    • cpuidle: Measure idle state durations with monotonic clock · a474a515
      Committed by Julius Werner
      Many cpuidle drivers measure their time spent in an idle state by
      reading the wallclock time before and after idling and calculating the
      difference. This leads to erroneous results when the wallclock time gets
      updated by another processor in the meantime, adding that clock
      adjustment to the idle state's time counter.
      
      If the clock adjustment was negative, the result is even worse due to an
      erroneous cast of the last_residency variable from int to unsigned
      long long. The negative 32-bit integer will zero-extend and result in
      a forward time jump of roughly four billion microseconds (about 1.2
      hours) on the idle state residency counter.
      
      This patch changes all affected cpuidle drivers to either use the
      monotonic clock for their measurements or make use of the generic time
      measurement wrapper in cpuidle.c, which was already working correctly.
      Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
      should always already be disabled before entering the idle function, and
      not get reenabled until the generic wrapper has performed its second
      measurement). It also removes the erroneous cast, making sure that
      negative residency values are applied correctly even though they should
      not appear anymore.
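
      For illustration, here is a minimal sketch of the monotonic
      measurement pattern the generic wrapper follows (simplified and
      hedged; not the exact upstream code):

          ktime_t time_start, time_end;
          s64 diff;

          time_start = ktime_get();  /* monotonic, immune to wallclock updates */
          entered_state = target_state->enter(dev, drv, index);
          time_end = ktime_get();

          /* Compute the residency in microseconds and clamp it so the
           * signed int assignment below cannot overflow. */
          diff = ktime_to_us(ktime_sub(time_end, time_start));
          if (diff > INT_MAX)
                  diff = INT_MAX;
          dev->last_residency = (int)diff;  /* stays signed; no u64 cast */
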
      Signed-off-by: Julius Werner <jwerner@chromium.org>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  6. 18 Oct, 2012 (3 commits)
    • cpuidle/powerpc: Fix snooze state problem in the cpuidle design on pseries. · 83dac594
      Committed by Deepthi Dharwar
      Earlier, without the cpuidle framework on pseries, the native arch
      idle routine comprised both snooze and nap states. The
      smt_snooze_delay variable was used to delay the idle process's entry
      to a deeper idle state like nap.
      With the coming of cpuidle, this arch-specific idle code was replaced
      by two different idle routines, one supporting snooze and the other
      nap. This enabled the addition of more low-level idle states on
      pseries in the future.
      
      On adopting the generic cpuidle framework for POWER systems, the
      decision of which idle state to choose, given a predicted idle time,
      is taken by the menu governor based on the target_residency and
      exit_latency of the idle states. target_residency is the minimum time
      to be resident in an idle state; exit_latency is the time taken to
      exit it. The deeper the idle state, the higher both its target
      residency and exit latency will be.
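
      For illustration, a hedged sketch of how these two parameters appear
      in a cpuidle state table (the field names follow struct
      cpuidle_state; the numbers and enter callbacks here are invented):

          static struct cpuidle_state pseries_states[] = {
                  {       /* polling state: cheap to leave, instantly worthwhile */
                          .name             = "snooze",
                          .exit_latency     = 0,    /* us */
                          .target_residency = 0,    /* us */
                          .enter            = snooze_loop,
                  },
                  {       /* deeper state: costlier exit, needs a longer stay */
                          .name             = "CEDE",
                          .exit_latency     = 10,   /* us */
                          .target_residency = 100,  /* us */
                          .enter            = dedicated_cede_loop,
                  },
          };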
      
      In the current design, smt_snooze_delay is used as the
      target_residency of the snooze state, which is incorrect: it is not
      the minimum but the maximum duration to be spent in the snooze state.
      This results in the governor making bad decisions, since at present
      the target_residency of nap is lower than the target_residency of
      snooze in spite of nap being the deeper idle state.
      
      This patch fixes the problem by replacing the smt_snooze_delay loop
      in the snooze state with a need_resched() loop (see the sketch after
      this paragraph), since the governor is aware of the entry and exit of
      the various idle transitions, on which its next idle time prediction
      is based.
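
      A minimal sketch of the reworked snooze loop, assuming the usual
      powerpc SMT priority hints (simplified; not the exact patch):

          static int snooze_loop(struct cpuidle_device *dev,
                                 struct cpuidle_driver *drv, int index)
          {
                  local_irq_enable();
                  set_thread_flag(TIF_POLLING_NRFLAG);

                  /* Spin at low SMT priority until work arrives; no
                   * smt_snooze_delay timeout any more, since the governor
                   * now decides when nap is worth entering. */
                  while (!need_resched()) {
                          HMT_low();
                          HMT_very_low();
                  }

                  HMT_medium();
                  clear_thread_flag(TIF_POLLING_NRFLAG);
                  smp_mb();
                  return index;
          }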
      
      The governor is intelligent enough to determine which idle state
      needs to be transitioned to, and maintains a whole set of heuristics
      for this purpose, including I/O load and predictions from previous
      idle states, on which the state entry decision is based.
      
      With this fix, the target_residency of snooze is set to 0 and that of
      nap to smt_snooze_delay. If the predicted idle time is less than
      smt_snooze_delay (the target_residency of nap), the governor will
      pick the snooze state, else nap. This adheres to the previous native
      idle design.
      Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • cpuidle/powerpc: Fix smt_snooze_delay functionality. · 8ea959a1
      Committed by Deepthi Dharwar
      smt_snooze_delay was designed to delay the idle loop's nap entry in
      the native idle code, before it was ported over for use as part of
      the cpuidle framework.

      A negative value assigned to smt_snooze_delay should result in busy
      looping, in other words disabling entry to the nap state.
      
      	- https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-May/082450.html
      
      This particular functionality can currently be achieved by
      echo 1 > /sys/devices/system/cpu/cpu*/state1/disable
      but it is broken when one assigns a negative value to the
      smt_snooze_delay variable, either via the sysfs entry or the
      ppc64_cpu utility.

      This patch fixes this by disabling the nap state when the
      smt_snooze_delay variable is set to a negative value (a sketch of the
      check follows).
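
      A hedged sketch of the check, assuming the per-cpu smt_snooze_delay
      variable and the per-device state disable flag (the surrounding
      registration code is omitted; index 1 is assumed to be nap):

          /* Disable the nap state on this cpu's device when
           * smt_snooze_delay has been set to a negative value. */
          if (per_cpu(smt_snooze_delay, dev->cpu) < 0)
                  dev->states_usage[1].disable = 1;
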
      Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • cpuidle/powerpc: Fix target residency initialisation in pseries cpuidle · 817deb05
      Committed by Deepthi Dharwar
      Remove the redundant target residency initialisation in
      pseries_cpuidle_driver_init(). It currently overwrites the residency
      times set up as part of the static state table, resulting in all the
      idle states having the same, incorrect target residency of 100us.
      This may result in the menu governor making wrong state decisions.
      A sketch of the offending pattern follows.
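
      For illustration, a hedged reconstruction of the kind of redundant
      loop being removed (not the exact code):

          /* Clobbers the per-state values already present in the static
           * table with one flat residency for every state. */
          for (i = 0; i < drv->state_count; i++)
                  drv->states[i].target_residency = 100;  /* us */
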
      Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  7. 11 Jul, 2012 (1 commit)
    • powerpc/cpuidle: Fixes for pseries_idle hotplug notifier · 852d8cb1
      Committed by Deepthi Dharwar
      Currently the call to pseries_notify_cpuidle_add_cpu(), which takes
      action on the cpuidle front when a cpu is added or removed, is made
      from smp_xics_setup_cpu(). This caused the lockdep issues reported at
      https://lkml.org/lkml/2012/5/17/2

      On the addition of each cpu, resources were cleared and re-allocated
      every time, all in a critical section as part of the
      start_secondary() call, where interrupts are disabled. To resolve
      this issue, the pseries_notify_cpuidle_add_cpu() call is replaced by
      a hotplug notifier, which prevents cpuidle resources from being
      released and allocated each time a cpu is onlined in the critical
      code path. It was fixed in https://lkml.org/lkml/2012/5/18/174.
      
      Also, it is essential to call cpuidle_enable/disable_device between
      cpuidle_pause_and_lock() and cpuidle_resume_and_unlock() when they
      are used externally, to avoid race conditions. Add support for
      CPU_ONLINE_FROZEN and CPU_DEAD_FROZEN as part of the hotplug notify
      events for pseries_idle, and unregister the hotplug notifier on exit.
      The above-mentioned issues are fixed as part of this patch; a sketch
      of the notifier follows.
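
      A hedged sketch of such a hotplug notifier (simplified; the function
      name and the pseries_cpuidle_devices per-cpu pointer are
      illustrative):

          static int pseries_cpuidle_add_cpu_notifier(struct notifier_block *n,
                          unsigned long action, void *hcpu)
          {
                  int hotcpu = (unsigned long)hcpu;
                  struct cpuidle_device *dev =
                          per_cpu_ptr(pseries_cpuidle_devices, hotcpu);

                  switch (action) {
                  case CPU_ONLINE:
                  case CPU_ONLINE_FROZEN:
                          /* Bracket enable with pause/resume to avoid
                           * racing against the governor. */
                          cpuidle_pause_and_lock();
                          cpuidle_enable_device(dev);
                          cpuidle_resume_and_unlock();
                          break;
                  case CPU_DEAD:
                  case CPU_DEAD_FROZEN:
                          cpuidle_pause_and_lock();
                          cpuidle_disable_device(dev);
                          cpuidle_resume_and_unlock();
                          break;
                  default:
                          return NOTIFY_DONE;
                  }
                  return NOTIFY_OK;
          }
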
      Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  8. 10 Jul, 2012 (1 commit)
    • powerpc: More fixes for lazy IRQ vs. idle · be2cf20a
      Committed by Benjamin Herrenschmidt
      Looks like we still have issues with pSeries and Cell idle code
      vs. the lazy irq state. In fact, the reset fixes that went upstream
      are exposing the problem more by causing BUG_ON() to trigger (which
      this patch turns into a WARN_ON instead).
      
      We need to be careful when using a variant of low power state that
      has the side effect of turning interrupts back on: all the SW and
      lazy state must be set up to look as if everything were enabled
      before we enter the low power state with MSR:EE off, since we will
      return with MSR:EE on. If not, we have a discrepancy of state which
      can cause things to go very wrong later on.
      
      This patch moves the logic into a helper and uses it from the pseries
      and cell idle code (a sketch of the helper follows). The power4/970
      idle code already got things right (in assembly, even!) so I'm not
      touching it. The power7 "bare metal" idle code is subtly different
      and correct. That leaves PA6T and some hypervisor-based Cell
      platforms, which have questionable code in there, but they are mostly
      dead platforms so I'll fix them when I manage to get final answers
      from the respective maintainers about how the low power state
      actually works on them.
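
      A hedged sketch of what such a helper can look like, reconstructed
      from the description above (the bookkeeping details of the real
      lazy-irq code are simplified):

          bool prep_irq_for_idle(void)
          {
                  /* Hard disable first so nothing can slip in between
                   * the check below and entering the low power state. */
                  hard_irq_disable();

                  /* If an interrupt is already pending, tell the caller
                   * not to enter the low power state at all. */
                  if (lazy_irq_pending())
                          return false;

                  /* Make the SW state look fully enabled, since the low
                   * power state will return with MSR:EE on. */
                  local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
                  local_paca->soft_enabled = 1;
                  return true;
          }
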
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: stable@vger.kernel.org [v3.4]
  9. 03 Jul, 2012 (1 commit)
  10. 29 Jun, 2012 (1 commit)
    • powerpc: check_and_cede_processor() never cedes · 0b17ba72
      Committed by Anton Blanchard
      Commit f948501b ("Make hard_irq_disable() actually hard-disable
      interrupts") caused check_and_cede_processor() to stop working.
      ->irq_happened will never be zero right after a hard_irq_disable(),
      so the compiler removes the call to cede_processor() completely.
      
      The bug was introduced back in the lazy interrupt handling rework
      of 3.4 but was hidden until recently because hard_irq_disable did
      nothing.
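
      A hedged sketch of the fixed helper, with the pending-interrupt test
      factored out so the compiler can no longer prove it always true
      (simplified from the described fix; lazy_irq_pending() is assumed to
      ignore the hard-disable bit itself):

          static inline void check_and_cede_processor(void)
          {
                  /* Hard disable first, then cede only if nothing is
                   * pending besides the hard-disable bit we just set. */
                  hard_irq_disable();
                  if (!lazy_irq_pending())
                          cede_processor();
          }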
      
      This issue will eventually appear in 3.4 stable since the
      hard_irq_disable fix is marked stable, so mark this one for stable
      too.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  11. 29 Mar, 2012 (1 commit)
  12. 09 Mar, 2012 (1 commit)
    • powerpc: Rework lazy-interrupt handling · 7230c564
      Committed by Benjamin Herrenschmidt
      The current implementation of lazy interrupt handling has some issues
      that this tries to address.

      We don't do the various workarounds we need when re-enabling
      interrupts in some cases, such as when returning from an interrupt,
      and thus we may still lose, or get delayed, decrementer or doorbell
      interrupts.
      
      The current scheme also makes it much harder to handle the external
      "edge" interrupts provided by some BookE processors when using the
      EPR facility (External Proxy) and the Freescale Hypervisor.
      
      Additionally, we tend to keep interrupts hard disabled in a number
      of cases, such as decrementer interrupts, external interrupts, or
      when a masked decrementer interrupt is pending. This is sub-optimal.
      
      This is an attempt at fixing it all in one go by reworking the way
      we do the lazy interrupt disabling from the ground up.
      
      The base idea is to replace the "hard_enabled" field with an
      "irq_happened" field in which we store a bit mask of which interrupts
      occurred while soft-disabled.

      When re-enabling, either via arch_local_irq_restore() or when
      returning from an interrupt, we can now decide what to do by testing
      bits in that field (a sketch of this scheme appears below).
      
      We then implement replaying of the missed interrupts either by
      re-using the existing exception frame (in exception exit case) or via
      the creation of a new one from an assembly trampoline (in the
      arch_local_irq_enable case).
      
      This removes the need to play with the decrementer to try to create
      fake interrupts, among others.
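
      As an illustration, a hedged sketch of the scheme (the PACA_IRQ_*
      bit values are illustrative, and the replay helpers are invented
      stand-ins for the real exception-frame replay):

          #define PACA_IRQ_HARD_DIS  0x01  /* hard-disabled since last enable */
          #define PACA_IRQ_DBELL     0x02  /* doorbell happened */
          #define PACA_IRQ_EE        0x04  /* external interrupt happened */
          #define PACA_IRQ_DEC       0x08  /* decrementer happened */

          void arch_local_irq_restore(unsigned long en)
          {
                  local_paca->soft_enabled = en;
                  if (!en)
                          return;

                  /* Replay whatever we masked while soft-disabled. */
                  if (local_paca->irq_happened & PACA_IRQ_DEC)
                          replay_decrementer();
                  if (local_paca->irq_happened & PACA_IRQ_EE)
                          replay_external();
                  if (local_paca->irq_happened & PACA_IRQ_DBELL)
                          replay_doorbell();

                  local_paca->irq_happened = 0;
                  __hard_irq_enable();
          }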
      
      In addition, this adds a few refinements:
      
       - We no longer hard disable decrementer interrupts that occur
      while soft-disabled. We now simply bump the decrementer back to max
      (on BookS) or leave it stopped (on BookE) and continue with hard
      interrupts enabled, which means that we'll potentially get better
      sample quality from performance monitor interrupts.
      
       - Timer, decrementer and doorbell interrupts now hard-enable
      shortly after removing the source of the interrupt, which means
      they no longer run entirely hard disabled. Again, this will improve
      perf sample quality.
      
       - On Book3E 64-bit, we now make the performance monitor interrupt
      act as an NMI like on Book3S (the necessary C code for that to work
      appears to already be present in the FSL perf code, notably calling
      nmi_enter instead of irq_enter). (This also fixes a bug where BookE
      perfmon interrupts could clobber r14 ... oops)
      
       - We could make "masked" decrementer interrupts act as NMIs when doing
      timer-based perf sampling to improve the sample quality.
      
      Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2:
      
      - Add hard-enable to decrementer, timer and doorbells
      - Fix CR clobber in masked irq handling on BookE
      - Make embedded perf interrupt act as an NMI
      - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
        to retrigger an interrupt without preventing hard-enable
      
      v3:
      
       - Fix or vs. ori bug on Book3E
       - Fix enabling of interrupts for some exceptions on Book3E
      
      v4:
      
       - Fix resend of doorbells on return from interrupt on Book3E
      
      v5:
      
       - Rebased on top of my latest series, which involves some significant
      rework of some aspects of the patch.
      
      v6:
       - 32-bit compile fix
       - more compile fixes with various .config combos
       - factor out the asm code to soft-disable interrupts
       - remove the C wrapper around preempt_schedule_irq
      
      v7:
       - Fix a bug with hard irq state tracking on native power7
  13. 08 12月, 2011 2 次提交