1. 31 Mar, 2018: 2 commits
  2. 23 Mar, 2018: 1 commit
    • powerpc/powernv: Provide a way to force a core into SMT4 mode · 7672691a
      Paul Mackerras committed
      POWER9 processors up to and including "Nimbus" v2.2 have hardware
      bugs relating to transactional memory and thread reconfiguration.
      One of these bugs has a workaround which is to get the core into
      SMT4 state temporarily.  This workaround is only needed when
      running bare-metal.
      
      This patch provides a function which gets the core into SMT4 mode
      by preventing threads from going to a stop state, and waking up
      those which are already in a stop state.  Once at least 3 threads
      are not in a stop state, the core will be in SMT4 and we can
      continue.
      
      To do this, we add a "dont_stop" flag to the paca to tell the
      thread not to go into a stop state.  If this flag is set,
      power9_idle_stop() just returns immediately with a return value
      of 0.  The pnv_power9_force_smt4_catch() function does the following:
      
      1. Set the dont_stop flag for each thread in the core, except
         ourselves (in fact we use an atomic_inc() in case more than
         one thread is calling this function concurrently).
      2. See how many threads are awake, indicated by their
         requested_psscr field in the paca being 0.  If this is at
         least 3, the core is already in SMT4 and we can return.
      3. Send a doorbell interrupt to each thread that was seen as
         being in a stop state in step 2.
      4. Until at least 3 threads are awake, scan the threads to which
         we sent a doorbell interrupt and check if they are awake now.
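
      The four steps can be modelled in plain C. This is a minimal
      single-threaded sketch, not the kernel's code: struct thread_state,
      core[], send_doorbell() and THREADS_PER_CORE are hypothetical stand-ins
      for the paca fields, the doorbell and the core geometry, and the
      simulated doorbell wakes its target synchronously. Default
      (sequentially consistent) C11 atomics stand in for explicit barriers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical geometry: one SMT4 POWER9 core. */
#define THREADS_PER_CORE 4
#define NEED_AWAKE (THREADS_PER_CORE / 2 + 1) /* 3 */

/* Hypothetical stand-in for the two paca fields the algorithm uses. */
struct thread_state {
	atomic_int dont_stop;         /* > 0: thread must not enter stop */
	atomic_ulong requested_psscr; /* 0: thread is awake */
};

struct thread_state core[THREADS_PER_CORE];

/* Hypothetical doorbell. In this simulation the target sees dont_stop
 * set and immediately leaves stop, clearing requested_psscr. */
void send_doorbell(int t)
{
	atomic_store(&core[t].requested_psscr, 0);
}

int count_awake(void)
{
	int n = 0;

	for (int t = 0; t < THREADS_PER_CORE; t++)
		if (atomic_load(&core[t].requested_psscr) == 0)
			n++;
	return n;
}

void force_smt4_catch(int self)
{
	bool poke[THREADS_PER_CORE] = { false };
	int awake = 0;

	/* Step 1: keep every sibling out of stop. An increment rather than
	 * a plain store, since several callers may run concurrently. */
	for (int t = 0; t < THREADS_PER_CORE; t++)
		if (t != self)
			atomic_fetch_add(&core[t].dont_stop, 1);

	/* Step 2: count awake threads and remember the ones in stop. */
	for (int t = 0; t < THREADS_PER_CORE; t++) {
		if (atomic_load(&core[t].requested_psscr) == 0)
			awake++;
		else
			poke[t] = true;
	}
	if (awake >= NEED_AWAKE)
		return; /* core is already in SMT4 */

	/* Step 3: send a doorbell to every thread seen in stop. */
	for (int t = 0; t < THREADS_PER_CORE; t++)
		if (poke[t])
			send_doorbell(t);

	/* Step 4: poll the poked threads until enough are awake. */
	while (awake < NEED_AWAKE) {
		for (int t = 0; t < THREADS_PER_CORE; t++) {
			if (poke[t] && atomic_load(&core[t].requested_psscr) == 0) {
				poke[t] = false;
				awake++;
			}
		}
	}
}

/* Undo step 1 so the siblings may enter stop again. */
void force_smt4_release(int self)
{
	for (int t = 0; t < THREADS_PER_CORE; t++)
		if (t != self)
			atomic_fetch_sub(&core[t].dont_stop, 1);
}
```

      The model leaves out the idle-entry side, where a thread publishes
      requested_psscr and then re-checks dont_stop before executing stop.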
      
      This relies on the following properties:
      
      - Once dont_stop is non-zero, requested_psscr can't go from zero to
        non-zero, except transiently (and without the thread doing stop).
      - requested_psscr being zero guarantees that the thread isn't in
        a state-losing stop state where thread reconfiguration could occur.
      - Doing stop with a PSSCR value of 0 won't be a state-losing stop
        and thus won't allow thread reconfiguration.
      - Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core
        must be in SMT4 mode, since SMT modes are powers of 2.
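
      The last property is simple arithmetic: with n threads awake, the core
      must be in an SMT mode at least as large as the smallest power of two
      that is >= n. A small sketch; min_smt_mode() is a hypothetical helper.

```c
/* Smallest SMT mode (a power of two) that can host n running threads.
 * Hypothetical helper, used only to illustrate the argument. */
unsigned int min_smt_mode(unsigned int n)
{
	unsigned int mode = 1;

	while (mode < n)
		mode <<= 1;
	return mode;
}
```

      With 4 threads per core, 4/2 + 1 = 3 awake threads give
      min_smt_mode(3) == 4, i.e. SMT4, whereas 2 awake threads would still
      fit in SMT2.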
      
      This does add a sync to power9_idle_stop(), which is necessary to
      provide the correct ordering between setting requested_psscr and
      checking dont_stop.  The overhead of the sync should be unnoticeable
      compared to the latency of going into and out of a stop state.
      
      Because some objected to incurring this extra latency on systems where
      the XER[SO] bug is not relevant, I have put the test in
      power9_idle_stop inside a feature section.  This means that
      pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems
      without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will
      probably hang the system.
      
      In order to cater for uses where the caller has an operation that
      has to be done while the core is in SMT4, the core continues to be
      kept in SMT4 after the pnv_power9_force_smt4_catch() function returns,
      until the pnv_power9_force_smt4_release() function is called.
      That function undoes the effect of step 1 above and allows the other
      threads to go into a stop state.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 15 Nov, 2017: 1 commit
    • powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature · 3ffa9d9e
      Michael Ellerman committed
      Recently we added a CPU feature for Power9 DD2.0, to capture the fact
      that some workarounds are required only on Power9 DD1 and DD2.0 but
      not DD2.1 or later.
      
      Then in commit 9d2f510a ("powerpc/64s/idle: avoid POWER9 DD1 and
      DD2.0 ERAT workaround on DD2.1") and commit e3646330
      ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on
      DD2.1") we changed CPU_FTR_SECTIONs to check for DD1 or DD20, e.g.:
      
        BEGIN_FTR_SECTION
                PPC_INVALIDATE_ERAT
        END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1 | CPU_FTR_POWER9_DD20)
      
      Unfortunately, although this reads as "if set DD1 or DD2.0", the "or"
      is a bitwise OR and actually generates a mask of both bits. The code
      that does the feature patching then checks that the CPU features,
      masked with that mask, are equal to the mask.
      
      So the end result is that we're checking for both DD1 and DD20 being
      set, which never happens. Yes, the API is terrible.
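
      The check can be written out in C. This is a sketch: the FTR_* bit
      positions are hypothetical, and the helpers only mirror the rule
      described here, where a section is kept when (features & mask) == value
      and END_FTR_SECTION_IFSET(msk) supplies msk as both mask and value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the kernel's CPU_FTR_* values differ. */
#define FTR_POWER9_DD1  (UINT64_C(1) << 0)
#define FTR_POWER9_DD20 (UINT64_C(1) << 1)

/* Generic rule: the section is kept iff (features & mask) == value. */
bool ftr_section_kept(uint64_t features, uint64_t mask, uint64_t value)
{
	return (features & mask) == value;
}

/* IFSET(msk): mask and value are both msk, so ALL bits must be set. */
bool ftr_ifset(uint64_t features, uint64_t msk)
{
	return ftr_section_kept(features, msk, msk);
}

/* What "DD1 | DD20" was meant to express: ANY of the bits set. */
bool ftr_any_set(uint64_t features, uint64_t msk)
{
	return (features & msk) != 0;
}
```

      A DD2.0 CPU carries only the DD20 bit, so ftr_ifset() rejects it and
      the workaround is patched out, while ftr_any_set() would have kept it.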
      
      Removing the ERAT workaround on DD2.0 results in random SEGVs: the
      system tends to boot, but things randomly die, sometimes including
      dhclient, udev, etc.
      
      To fix the problem and hopefully avoid it in future, we remove the
      DD2.0 CPU feature and instead add a DD2.1 (or later) feature. This
      allows us to easily express that the workarounds are required if DD2.1
      is not set.
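
      A sketch of the fixed scheme, assuming a hypothetical bit position and
      revision enum (DD1's own feature bit is left out): only DD2.1 and later
      set the new bit, and the workaround section uses the "if clear" form,
      which keeps a section when (features & mask) == 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position for the new DD2.1-or-later feature. */
#define FTR_POWER9_DD2_1 (UINT64_C(1) << 2)

/* Hypothetical revision list, oldest first. */
enum p9_rev { P9_DD1, P9_DD20, P9_DD21, P9_DD22 };

/* Only DD2.1 and later advertise the new feature bit. */
uint64_t p9_features(enum p9_rev rev)
{
	return rev >= P9_DD21 ? FTR_POWER9_DD2_1 : 0;
}

/* IFCLR(msk): mask is msk and value is 0, so the section is kept
 * only when none of the bits in msk are set. */
bool ftr_ifclr(uint64_t features, uint64_t msk)
{
	return (features & msk) == 0;
}

/* Hypothetical: the workaround is needed exactly when DD2.1 is not set. */
bool needs_erat_workaround(enum p9_rev rev)
{
	return ftr_ifclr(p9_features(rev), FTR_POWER9_DD2_1);
}
```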
      
      At some point we will drop the DD1 workarounds entirely and some of
      this can be cleaned up.
      
      Fixes: 9d2f510a ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 ERAT workaround on DD2.1")
      Fixes: e3646330 ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on DD2.1")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  4. 06 Nov, 2017: 2 commits
  5. 19 Oct, 2017: 1 commit
  6. 29 Aug, 2017: 5 commits
  7. 04 Aug, 2017: 1 commit
  8. 01 Aug, 2017: 1 commit
    • powerpc/powernv: Save/Restore additional SPRs for stop4 cpuidle · e1c1cfed
      Gautham R. Shenoy committed
      The stop4 idle state on POWER9 is a deep idle state which loses
      hypervisor resources, but whose latency is low enough that it can be
      exposed via cpuidle.
      
      Until now, the deep idle states which lose hypervisor resources (e.g.
      winkle) were only exposed via CPU-Hotplug. Hence, on wakeup from such
      states, apart from a few SPRs which need to be restored to their
      previous values, the rest of the SPRs are reinitialized to their
      boot-time values.
      
      When stop4 is used in the context of cpuidle, we want these additional
      SPRs to be restored to their previous values, to ensure that the context
      on the CPU coming back from idle is the same as it was before going idle.
      
      In this patch, we define an SPR save area in the PACA (since we have
      used up the volatile register space in the stack) and, on POWER9, we
      restore SPRN_PID, SPRN_LDBAR, SPRN_FSCR, SPRN_HFSCR, SPRN_MMCRA,
      SPRN_MMCR1 and SPRN_MMCR2 to the values they had before entering stop.
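
      Conceptually, the save area is a per-CPU record filled before stop and
      read back on wakeup. A minimal sketch over a simulated SPR file: the
      enum, struct stop_spr_area, spr_file[] and the helper names are all
      hypothetical, and lose_spr_state() simply models reinitialization to
      boot-time values.

```c
#include <stdint.h>

/* Hypothetical indices into a simulated SPR file (not SPR numbers). */
enum saved_spr {
	S_PID, S_LDBAR, S_FSCR, S_HFSCR, S_MMCRA, S_MMCR1, S_MMCR2,
	NR_SAVED_SPRS
};

/* Hypothetical stand-in for the save area kept in the PACA. */
struct stop_spr_area {
	uint64_t val[NR_SAVED_SPRS];
};

uint64_t spr_file[NR_SAVED_SPRS]; /* simulated hardware registers */

/* Before entering a state-losing stop: snapshot the SPRs. */
void save_stop_sprs(struct stop_spr_area *area)
{
	for (int i = 0; i < NR_SAVED_SPRS; i++)
		area->val[i] = spr_file[i];
}

/* Model of a deep stop: registers come back at boot-time values. */
void lose_spr_state(uint64_t boot_value)
{
	for (int i = 0; i < NR_SAVED_SPRS; i++)
		spr_file[i] = boot_value;
}

/* After wakeup: put back the values the CPU had before stop. */
void restore_stop_sprs(const struct stop_spr_area *area)
{
	for (int i = 0; i < NR_SAVED_SPRS; i++)
		spr_file[i] = area->val[i];
}
```
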
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  9. 18 Jul, 2017: 1 commit
  10. 28 Jun, 2017: 1 commit
    • powerpc/powernv/idle: Clear r12 on wakeup from stop lite · 4d0d7c02
      Akshay Adiga committed
      pnv_wakeup_noloss() expects r12 to contain the SRR1 value, which
      CHECK_HMI_INTERRUPT uses to determine whether the wakeup reason is an HMI.
      
      When we wake up with ESL=0, SRR1 will not contain the wakeup reason, so
      there is no point setting r12 to SRR1.
      
      However, we don't set r12 at all, so r12 contains garbage (likely a kernel
      pointer) and is still used to check for an HMI as if it contained SRR1.
      This causes the OPAL msglog to be filled with the following print:
      
        HMI: Received HMI interrupt: HMER = 0x0040000000000000
      
      This patch clears r12 after waking up from stop with ESL=EC=0, so that we don't
      accidentally enter the HMI handler in pnv_wakeup_noloss() if the value of
      r12[42:45] corresponds to HMI as wakeup reason.
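
      The false positive is easy to reproduce in C. SRR1 bits 42:45 use IBM
      numbering (bit 0 is the MSB), so in a 64-bit value they sit under mask
      0x3c0000, and an HMI is reason 0b1010. The constants reflect our reading
      of the kernel's SRR1 wake-reason definitions and should be treated as
      assumptions; wakeup_reason_is_hmi() is a hypothetical stand-in for the
      test done in CHECK_HMI_INTERRUPT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: SRR1[42:45] wake reason, HMI = 0b1010. */
#define SRR1_WAKEMASK UINT64_C(0x00000000003c0000)
#define SRR1_WAKEHMI  UINT64_C(0x0000000000280000)

/* True if the value in r12 looks like an HMI wakeup reason. */
bool wakeup_reason_is_hmi(uint64_t r12)
{
	return (r12 & SRR1_WAKEMASK) == SRR1_WAKEHMI;
}
```

      A stale kernel pointer such as 0xc000000000280000 passes this test,
      while a cleared r12 never does.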
      
      Prior to commit 9d292501 ("powerpc/64s/idle: Avoid SRR usage in idle
      sleep/wake paths") this bug existed, in that we would incorrectly look at
      SRR1 to check for an HMI when SRR1 didn't contain a wakeup reason. However,
      the SRR1 value just happened to never have bits 42:45 set.
      
      Fixes: 9d292501 ("powerpc/64s/idle: Avoid SRR usage in idle sleep/wake paths")
      Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Change log and comment massaging]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  11. 27 Jun, 2017: 1 commit
  12. 19 Jun, 2017: 3 commits
  13. 30 May, 2017: 3 commits
    • powerpc/powernv/idle: Use Requested Level for restoring state on P9 DD1 · 22c6663d
      Gautham R. Shenoy committed
      On Power9 DD1 due to a hardware bug the Power-Saving Level Status
      field (PLS) of the PSSCR for a thread waking up from a deep state can
      under-report if some other thread in the core is in a shallow stop
      state. The scenario in which this can manifest is as follows:
      
         1) All the threads of the core are in deep stop.
         2) One of the threads is woken up. The PLS for this thread will
            correctly reflect that it is waking up from deep stop.
         3) The thread that has woken up now executes a shallow stop.
         4) When some other thread in the core is woken, its PLS will reflect
            the shallow stop state.
      
      Thus, the subsequent thread for which the PLS is under-reporting the
      wakeup state will not restore the hypervisor resources.
      
      Hence, on DD1 systems, use the Requested Level (RL) field as a
      workaround to restore the contents of the hypervisor resources on
      wakeup from the stop state.
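
      The choice between the two fields can be sketched as follows. PLS
      occupies PSSCR bits 0:3 and RL bits 60:63 (IBM numbering);
      FIRST_DEEP_STOP is a hypothetical fixed threshold standing in for the
      first stop level that loses hypervisor resources, and the helper names
      are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSSCR_PLS_SHIFT 60
#define PSSCR_PLS_MASK  UINT64_C(0xf000000000000000) /* PSSCR[0:3] */
#define PSSCR_RL_MASK   UINT64_C(0x000000000000000f) /* PSSCR[60:63] */

/* Hypothetical: first stop level that loses hypervisor resources. */
#define FIRST_DEEP_STOP 4u

/* Level to trust on wakeup: RL on DD1 (workaround), PLS otherwise. */
unsigned int wakeup_level(uint64_t psscr, bool dd1)
{
	if (dd1)
		return (unsigned int)(psscr & PSSCR_RL_MASK);
	return (unsigned int)((psscr & PSSCR_PLS_MASK) >> PSSCR_PLS_SHIFT);
}

bool must_restore_hv(uint64_t psscr, bool dd1)
{
	return wakeup_level(psscr, dd1) >= FIRST_DEEP_STOP;
}
```

      With PLS under-reporting level 1 while RL says level 4 was requested,
      only the DD1 path restores the hypervisor state.
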
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/idle: Restore LPCR on wakeup from deep-stop · cb0be7ec
      Gautham R. Shenoy committed
      On wakeup from a deep stop state which is supposed to lose the
      hypervisor state, we don't restore the LPCR to the old value but set
      it to a "sane" value via cur_cpu_spec->cpu_restore().
      
      The problem is that the "sane" value doesn't include UPRT and the HR
      bits which are required to run correctly in Radix mode.
      
      Fix this on POWER9 onwards by restoring the LPCR to whatever value it
      had before executing the stop instruction.
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/idle: Decouple Timebase restore & Per-core SPRs restore · ec486735
      Gautham R. Shenoy committed
      On POWER8, in case of
         -  nap: both timebase and hypervisor state are retained.
         -  fast-sleep: timebase is lost, but the hypervisor state is retained.
         -  winkle: both timebase and hypervisor state are lost.
      
      Hence, the current code for handling exit from an idle state assumes
      that if the timebase value is retained, then so is the hypervisor
      state. Thus, the current code doesn't restore per-core hypervisor
      state in such cases.
      
      But that is no longer the case on POWER9 where we do have stop states
      in which timebase value is retained, but the hypervisor state is
      lost. So we have to ensure that the per-core hypervisor state gets
      restored in such cases.
      
      Fix this by ensuring that even in the case when timebase is retained,
      we explicitly check if we are waking up from a deep stop that loses
      per-core hypervisor state (indicated by cr4 being eq or gt), and if
      this is the case, we restore the per-core hypervisor state.
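
      The change in the decision can be captured with a small table-driven
      model. This is a simplification: struct idle_state, its flags and the
      POWER9 entry are hypothetical, and hv_lost plays the role of cr4 being
      eq or gt.

```c
#include <stdbool.h>

/* Hypothetical description of what an idle state loses. */
struct idle_state {
	const char *name;
	bool tb_lost; /* timebase must be resynced */
	bool hv_lost; /* per-core hypervisor state lost (cr4 eq/gt) */
};

const struct idle_state p8_nap    = { "nap", false, false };
const struct idle_state p8_sleep  = { "fast-sleep", true, false };
const struct idle_state p8_winkle = { "winkle", true, true };
/* Illustrative POWER9 deep stop that keeps the timebase. */
const struct idle_state p9_deep   = { "deep stop", false, true };

/* Old logic: per-core HV state is only considered on the
 * timebase-lost path, so "timebase kept" implies "HV kept". */
bool old_restores_hv(const struct idle_state *s)
{
	if (!s->tb_lost)
		return false;
	return s->hv_lost;
}

/* Fixed logic: check HV loss independently of the timebase. */
bool new_restores_hv(const struct idle_state *s)
{
	return s->hv_lost;
}
```
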
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  14. 16 May, 2017: 1 commit
  15. 23 Apr, 2017: 8 commits
  16. 11 Apr, 2017: 1 commit
    • powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1 · 17ed4c8f
      Gautham R. Shenoy committed
      POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up
      from stop 0,1,2 with ESL=1 can end up being misplaced in the core. Thus
      the HSPRG0 of a thread waking up from stop can contain the paca pointer
      of its sibling.
      
      This patch implements a context recovery framework within threads of a
      core, by provisioning space in paca_struct for saving every sibling
      thread's paca pointer. Basically, we should be able to arrive at the
      right paca pointer from any sibling thread's paca pointer.
      
      At bootup, during powernv idle-init, we save the paca address of every
      CPU in each of its siblings' paca_structs, in the slot corresponding to
      this CPU's index in the core.
      
      On wakeup from a stop, the thread will determine its index in the core
      from the TIR register and recover its PACA pointer by indexing into
      the correct slot in the provisioned space in the current PACA.
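
      The recovery scheme amounts to a per-thread table indexed by thread
      number. A minimal sketch with hypothetical names: struct paca_sim
      stands in for paca_struct, hsprg0 for the possibly misplaced pointer
      read from HSPRG0, and tir for the thread index read from TIR.

```c
/* Hypothetical geometry: one SMT4 core. */
#define THREADS_PER_CORE 4

/* Hypothetical stand-in for paca_struct. */
struct paca_sim {
	int cpu;
	struct paca_sim *sibling_pacas[THREADS_PER_CORE];
};

/* Boot time: store every CPU's paca in each sibling's slot for that
 * CPU's index within the core. */
void init_sibling_pacas(struct paca_sim core[THREADS_PER_CORE])
{
	for (int cpu = 0; cpu < THREADS_PER_CORE; cpu++)
		for (int sib = 0; sib < THREADS_PER_CORE; sib++)
			core[sib].sibling_pacas[cpu] = &core[cpu];
}

/* Wakeup: hsprg0 may point at any sibling's paca; the thread's own
 * index (from TIR) selects the correct one. */
struct paca_sim *recover_paca(struct paca_sim *hsprg0, unsigned int tir)
{
	return hsprg0->sibling_pacas[tir];
}
```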
      
      Furthermore, ensure that the NVGPRs are restored from the stack on the
      way out by setting NAPSTATELOST in the paca.
      
      [Changelog written with inputs from svaidy@linux.vnet.ibm.com]
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Call it a bug]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  17. 20 Mar, 2017: 1 commit
  18. 03 Mar, 2017: 1 commit
    • powerpc/powernv: Fix bug due to labeling ambiguity in power_enter_stop · 424f8acd
      Gautham R. Shenoy committed
      Commit 09206b60 ("powernv: Pass PSSCR value and mask to
      power9_idle_stop") added additional code in power_enter_stop() to
      distinguish between stop requests whose PSSCR had ESL=EC=1 from those
      which did not. When ESL=EC=1, we do a forward-jump to a location
      labelled by "1", which had the code to handle the ESL=EC=1 case.
      
      Unfortunately, just a couple of instructions before this label is the
      macro IDLE_STATE_ENTER_SEQ(), which also has a label "1" in its
      expansion.
      
      As a result, the current code can end up directly executing the stop
      instruction for deep stop requests with PSSCR ESL=EC=1, without saving
      the hypervisor state.
      
      Fix this BUG by labeling the location that handles ESL=EC=1 case with
      a more descriptive label ".Lhandle_esl_ec_set" (local label suggestion
      a la .Lxx from Anton Blanchard).
      
      While at it, rename the label "2", which marks the code handling entry
      into deep stop states, to ".Lhandle_deep_stop".
      
      For good measure, change the label in the IDLE_STATE_ENTER_SEQ() macro
      to a less commonly used value in order to avoid similar mishaps in
      the future.
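
      The trap is that a numeric reference such as "1f" binds to the nearest
      following definition of label "1", wherever that definition came from.
      The C sketch below models only that lookup rule; the label table and
      resolve_forward() are hypothetical and do not mirror the real
      instruction sequence.

```c
#include <stddef.h>
#include <string.h>

/* labels[i] is the label defined at position i, or NULL.
 * Returns the position that a forward reference "<name>f" made at
 * position ref resolves to, or -1 if there is none. */
int resolve_forward(const char *const labels[], int n, int ref,
		    const char *name)
{
	for (int i = ref + 1; i < n; i++)
		if (labels[i] != NULL && strcmp(labels[i], name) == 0)
			return i;
	return -1;
}
```

      With the macro's "1:" sitting between the branch and the intended
      "1:", the lookup returns the macro's label; a unique named label
      removes the ambiguity.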
      
      Fixes: 09206b60 ("powernv: Pass PSSCR value and mask to power9_idle_stop")
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  19. 07 Feb, 2017: 1 commit
  20. 31 Jan, 2017: 2 commits
    • powernv: Pass PSSCR value and mask to power9_idle_stop · 09206b60
      Gautham R. Shenoy committed
      The power9_idle_stop method currently takes only the requested stop
      level as a parameter and picks up the rest of the PSSCR bits from a
      hand-coded macro. This is not a very flexible design, especially when
      the firmware has the capability to communicate the psscr value and the
      mask associated with a particular stop state via device tree.
      
      This patch modifies the power9_idle_stop API to take as parameters the
      PSSCR value and the PSSCR mask corresponding to the stop state that
      needs to be set. These PSSCR value and mask are respectively obtained
      by parsing the "ibm,cpu-idle-state-psscr" and
      "ibm,cpu-idle-state-psscr-mask" fields from the device tree.
      
      In addition to this, the patch adds support for handling stop states
      for which the ESL and EC bits in the PSSCR are zero. As per the
      architecture, a wakeup from these stop states resumes execution at
      the subsequent instruction, as opposed to waking up at the System
      Reset vector.
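
      In C terms, the new interface merges the firmware-provided value into
      the current PSSCR under the mask (clear the masked bits, then OR in the
      value), and the ESL/EC bits decide where execution resumes. A sketch
      under stated assumptions: the PSSCR_ESL/PSSCR_EC values reflect our
      reading of the kernel's definitions, the helper names are hypothetical,
      and the value is additionally masked here for safety.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values for Enable State Loss and Exit Criterion. */
#define PSSCR_ESL UINT64_C(0x0000000000200000)
#define PSSCR_EC  UINT64_C(0x0000000000100000)

/* Replace the bits covered by mask with the requested value. */
uint64_t psscr_merge(uint64_t cur, uint64_t val, uint64_t mask)
{
	return (cur & ~mask) | (val & mask);
}

/* ESL = EC = 0: wakeup resumes at the instruction after stop,
 * rather than at the System Reset vector. */
bool resumes_after_stop(uint64_t psscr)
{
	return (psscr & (PSSCR_ESL | PSSCR_EC)) == 0;
}
```

      A mask of 0xf, as exposed by older firmware, touches only the RL field.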
      
      The older firmware sets only the Requested Level (RL) field in the
      psscr and psscr-mask exposed in the device tree. For older firmware,
      where psscr-mask=0xf, this patch sets sane default values for the
      remaining PSSCR fields (i.e. PSLL, MTL, ESL, EC, and TR). For the new
      firmware, the patch validates that the firmware maintains the
      invariants required by the ISA for the psscr values.
      
      The skiboot patch that exports fully populated PSSCR values and the
      mask for all the stop states can be found here:
      https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html
      
      [Optimize the number of instructions before entering STOP with
      ESL=EC=0, and validate that the PSSCR values provided by the firmware
      maintain the invariants required by the ISA, as suggested by Balbir
      Singh]
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powernv:idle: Add IDLE_STATE_ENTER_SEQ_NORET macro · 823b7bd5
      Gautham R. Shenoy committed
      Currently all the low-power idle states are expected to wake up
      at reset vector 0x100. This is why the macro IDLE_STATE_ENTER_SEQ
      puts the CPU into an idle state and never returns.
      
      On ISA v3.0, when the ESL and EC bits in the PSSCR are zero, the CPU
      is expected to wake up at the instruction following the idle
      instruction.
      
      This patch adds a new macro named IDLE_STATE_ENTER_SEQ_NORET for the
      no-return variant and reuses the name IDLE_STATE_ENTER_SEQ
      for a variant that allows resuming operation at the instruction
      following the idle instruction.
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  21. 24 Oct, 2016: 2 commits
    • powerpc/64: Fix race condition in setting lock bit in idle/wakeup code · 09b7e37b
      Paul Mackerras committed
      This fixes a race condition where one thread that is entering or
      leaving a power-saving state can inadvertently ignore the lock bit
      that was set by another thread, and potentially also clear it.
      The core_idle_lock_held function is called when the lock bit is
      seen to be set.  It polls the lock bit until it is clear, then
      does a lwarx to load the word containing the lock bit and thread
      idle bits so it can be updated.  However, it is possible that the
      value loaded with the lwarx has the lock bit set, even though an
      immediately preceding lwz loaded a value with the lock bit clear.
      If this happens then we go ahead and update the word despite the
      lock bit being set, and when called from pnv_enter_arch207_idle_mode,
      we will subsequently clear the lock bit.
      
      No identifiable misbehaviour has been attributed to this race.
      
      This fixes it by checking the lock bit in the value loaded by the
      lwarx.  If it is set then we just go back and keep on polling.
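
      The essence of the fix is that the lock bit must be tested in the same
      value the atomic update is based on. A C11 sketch, assuming a
      compare-and-swap loop as a stand-in for the lwarx/stwcx. pair;
      CORE_IDLE_LOCK_BIT, the bit layout and lock_and_set_bits() are
      hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: one lock bit plus per-thread idle bits. */
#define CORE_IDLE_LOCK_BIT UINT32_C(0x10000000)

/* Take the lock and set `bits`, returning the previous word. */
uint32_t lock_and_set_bits(_Atomic uint32_t *word, uint32_t bits)
{
	uint32_t old = atomic_load(word);

	for (;;) {
		/* Check the lock bit in the value the update is based on,
		 * not in an earlier, separate load. */
		if (old & CORE_IDLE_LOCK_BIT) {
			old = atomic_load(word); /* keep polling */
			continue;
		}
		/* Succeeds only if *word still equals old; on failure old
		 * is refreshed and the lock bit is checked again. */
		if (atomic_compare_exchange_weak(word, &old,
						 old | bits | CORE_IDLE_LOCK_BIT))
			return old;
	}
}
```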
      
      Fixes: b32aadc1 ("powerpc/powernv: Fix race in updating core_idle_state")
      Cc: stable@vger.kernel.org # v4.2+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Re-fix race condition between going idle and entering guest · 56c46222
      Paul Mackerras committed
      Commit 8117ac6a ("powerpc/powernv: Switch off MMU before entering
      nap/sleep/rvwinkle mode", 2014-12-10) fixed a race condition where one
      thread entering a KVM guest could switch the MMU context to the guest
      while another thread was still in host kernel context with the MMU on.
      That commit moved the point where a thread entering a power-saving
      mode set its kvm_hstate.hwthread_state field in its PACA to
      KVM_HWTHREAD_IN_IDLE from a point where the MMU was on to after the
      MMU had been switched off.  That commit also added a comment
      explaining that we have to switch to real mode before setting
      hwthread_state to avoid this race.
      
      Nevertheless, commit 4eae2c9a ("powerpc/powernv: Make
      pnv_powersave_common more generic", 2016-07-08) subsequently moved
      the setting of hwthread_state back to a point where the MMU is on,
      thus reintroducing the race, even though the comment saying that this
      should not be done was included in full in the context lines of the
      patch that did it.
      
      This fixes the race again and adds a bigger and shoutier comment
      explaining the potential race condition.
      
      Fixes: 4eae2c9a ("powerpc/powernv: Make pnv_powersave_common more generic")
      Cc: stable@vger.kernel.org # v4.8+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Reviewed-by: Shreyas B. Prabhu <shreyasbp@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>