1. 07 May 2018, 1 commit
    • powerpc/64: Remove unused paca->soft_enabled · c4ec1f03
      Authored by Michael Ellerman
      In commit 4e26bc4a ("powerpc/64: Rename soft_enabled to
      irq_soft_mask") we renamed paca->soft_enabled. But then in commit
      8e0b634b ("powerpc/64s: Do not allocate lppaca if we are not
      virtualized") we added it back. Oops. This happened because the two
      patches were in flight at the same time and rebased vs each other
      multiple times, and we missed it in review.
      
      Fixes: 8e0b634b ("powerpc/64s: Do not allocate lppaca if we are not virtualized")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c4ec1f03
  2. 30 Mar 2018, 3 commits
  3. 23 Mar 2018, 1 commit
    • powerpc/powernv: Provide a way to force a core into SMT4 mode · 7672691a
      Authored by Paul Mackerras
      POWER9 processors up to and including "Nimbus" v2.2 have hardware
      bugs relating to transactional memory and thread reconfiguration.
      One of these bugs has a workaround which is to get the core into
      SMT4 state temporarily.  This workaround is only needed when
      running bare-metal.
      
      This patch provides a function which gets the core into SMT4 mode
      by preventing threads from going to a stop state, and waking up
      those which are already in a stop state.  Once at least 3 threads
      are not in a stop state, the core will be in SMT4 and we can
      continue.
      
      To do this, we add a "dont_stop" flag to the paca to tell the
      thread not to go into a stop state.  If this flag is set,
      power9_idle_stop() just returns immediately with a return value
      of 0.  The pnv_power9_force_smt4_catch() function does the following:
      
      1. Set the dont_stop flag for each thread in the core, except
         ourselves (in fact we use an atomic_inc() in case more than
         one thread is calling this function concurrently).
      2. See how many threads are awake, indicated by their
         requested_psscr field in the paca being 0.  If this is at
         least 3, skip to step 5.
      3. Send a doorbell interrupt to each thread that was seen as
         being in a stop state in step 2.
      4. Until at least 3 threads are awake, scan the threads to which
         we sent a doorbell interrupt and check if they are awake now.
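
      The sequence above can be sketched in plain C. This is only an
      illustrative user-space model of the algorithm, not the kernel code;
      the structure, field names and thread count below are assumptions.

      /* Illustrative model of the catch/release sequence described above. */
      #include <stdatomic.h>

      #define NR_THREADS 4                 /* threads per core assumed here */

      struct smt4_paca_model {
          atomic_int  dont_stop;           /* non-zero: thread must not enter stop */
          atomic_long requested_psscr;     /* 0 means the thread is awake */
      };

      static struct smt4_paca_model core[NR_THREADS];

      /* Steps 1-4: hold the core in SMT4 until the matching release call. */
      static void force_smt4_catch(int me)
      {
          int t, awake;

          /* 1. Tell every other thread in the core not to enter stop. */
          for (t = 0; t < NR_THREADS; t++)
              if (t != me)
                  atomic_fetch_add(&core[t].dont_stop, 1);

          do {
              /* 2./4. Count the threads that are currently awake. */
              awake = 0;
              for (t = 0; t < NR_THREADS; t++)
                  if (atomic_load(&core[t].requested_psscr) == 0)
                      awake++;
              /* 3. The real code sends a doorbell to each stopped thread. */
          } while (awake < NR_THREADS / 2 + 1);     /* >= 3 threads => SMT4 */
      }

      /* Undo step 1 so the other threads may enter stop again. */
      static void force_smt4_release(int me)
      {
          int t;

          for (t = 0; t < NR_THREADS; t++)
              if (t != me)
                  atomic_fetch_sub(&core[t].dont_stop, 1);
      }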
      
      This relies on the following properties:
      
      - Once dont_stop is non-zero, requested_psscr can't go from zero to
        non-zero, except transiently (and without the thread doing stop).
      - requested_psscr being zero guarantees that the thread isn't in
        a state-losing stop state where thread reconfiguration could occur.
      - Doing stop with a PSSCR value of 0 won't be a state-losing stop
        and thus won't allow thread reconfiguration.
      - Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core
        must be in SMT4 mode, since SMT modes are powers of 2.
      
      This does add a sync to power9_idle_stop(), which is necessary to
      provide the correct ordering between setting requested_psscr and
      checking dont_stop.  The overhead of the sync should be unnoticeable
      compared to the latency of going into and out of a stop state.
      
      Because some objected to incurring this extra latency on systems where
      the XER[SO] bug is not relevant, I have put the test in
      power9_idle_stop inside a feature section.  This means that
      pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems
      without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will
      probably hang the system.
      
      In order to cater for uses where the caller has an operation that
      has to be done while the core is in SMT4, the core continues to be
      kept in SMT4 after the pnv_power9_force_smt4_catch() function returns,
      until the pnv_power9_force_smt4_release() function is called.
      It undoes the effect of step 1 above and allows the other threads
      to go into a stop state.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7672691a
  4. 06 Mar 2018, 1 commit
  5. 23 Jan 2018, 1 commit
    • powerpc/64s: Improve RFI L1-D cache flush fallback · bdcb1aef
      Authored by Nicholas Piggin
      The fallback RFI flush is used when firmware does not provide a way
      to flush the cache. It's a "displacement flush" that evicts useful
      data by displacing it with an uninteresting buffer.
      
      The flush has to take care to work with implementation-specific cache
      replacement policies, so the recipe has been in flux. The initial,
      slow but conservative approach was to touch all lines of a congruence
      class, with dependencies between each load. It has since been
      determined that a linear pattern of loads without dependencies is
      sufficient, and is significantly faster.
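
      As a rough illustration of the fallback (not the actual assembly
      recipe), the linear pattern amounts to one dependency-free load per
      cache line across a flush buffer; the buffer, cache size and line size
      below are assumptions for this sketch.

      /* Illustrative displacement flush: dependency-free linear loads, one per
       * cache line, across a buffer the size of the L1-D. */
      #include <stdint.h>

      #define L1D_SIZE        (64 * 1024)    /* assumed 64KB L1-D */
      #define L1D_LINE_SIZE   128            /* assumed 128-byte cache lines */

      static char flush_buf[L1D_SIZE];

      static void displacement_flush(void)
      {
          volatile const char *p = flush_buf;
          uintptr_t off;

          /* Each load displaces one line of potentially sensitive data. */
          for (off = 0; off < L1D_SIZE; off += L1D_LINE_SIZE)
              (void)p[off];
      }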
      
      Measuring the speed of a null syscall with RFI fallback flush enabled
      gives the relative improvement:
      
      P8 - 1.83x
      P9 - 1.75x
      
      The flush also becomes simpler and more adaptable to different cache
      geometries.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      bdcb1aef
  6. 19 Jan 2018, 1 commit
  7. 10 Jan 2018, 1 commit
    • powerpc/64s: Add support for RFI flush of L1-D cache · aa8a5e00
      Authored by Michael Ellerman
      On some CPUs we can prevent the Meltdown vulnerability by flushing the
      L1-D cache on exit from kernel to user mode, and from hypervisor to
      guest.
      
      This is known to be the case on at least Power7, Power8 and Power9. At
      this time we do not know the status of the vulnerability on other CPUs
      such as the 970 (Apple G5), pasemi CPUs (AmigaOne X1000) or Freescale
      CPUs. As more information comes to light we can enable this, or other
      mechanisms on those CPUs.
      
      The vulnerability occurs when the load of an architecturally
      inaccessible memory region (e.g. a userspace load of kernel memory) is
      speculatively executed to the point where its result can influence the
      address of a subsequent speculatively executed load.
      
      In order for that to happen, the first load must hit in the L1,
      because before the load is sent to the L2 the permission check is
      performed. Therefore if no kernel addresses hit in the L1 the
      vulnerability can not occur. We can ensure that is the case by
      flushing the L1 whenever we return to userspace. Similarly for
      hypervisor vs guest.
      
      In order to flush the L1-D cache on exit, we add a section of nops at
      each (h)rfi location that returns to a lower privileged context, and
      patch that section with the appropriate flush sequence. Newer firmwares
      are able to advertise to us that there is a special nop instruction that
      flushes the L1-D. If we do not see that advertised, we fall back to
      doing a displacement flush in software.
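
      A minimal sketch of that selection logic, with invented helper names;
      the real kernel code patches alternative instruction sections rather
      than calling helpers like these.

      /* Hypothetical sketch of deciding which flush sequence to patch into
       * the nop section at each exit-to-lower-privilege (h)rfi site. */
      enum flush_type { FLUSH_SPECIAL_INSN, FLUSH_FALLBACK_DISPLACEMENT };

      static int fw_advertises_flush_instr(void) { return 0; }      /* stub: firmware property check */
      static void patch_exit_sites(enum flush_type t) { (void)t; }  /* stub: rewrite the nop sections */

      static void setup_rfi_flush(void)
      {
          /* Prefer the firmware-advertised L1-D flush instruction; otherwise
           * fall back to the software displacement flush. */
          if (fw_advertises_flush_instr())
              patch_exit_sites(FLUSH_SPECIAL_INSN);
          else
              patch_exit_sites(FLUSH_FALLBACK_DISPLACEMENT);
      }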
      
      For guest kernels we support migration between some CPU versions, and
      different CPUs may use different flush instructions. So that we are
      prepared to migrate to a machine with a different flush instruction
      activated, we may have to patch more than one flush instruction at
      boot if the hypervisor tells us to.
      
      In the end this patch is mostly the work of Nicholas Piggin and
      Michael Ellerman. However a cast of thousands contributed to analysis
      of the issue, earlier versions of the patch, back ports testing etc.
      Many thanks to all of them.
      Tested-by: Jon Masters <jcm@redhat.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      aa8a5e00
  8. 13 Nov 2017, 1 commit
  9. 06 Nov 2017, 1 commit
    • powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64 · 4e003747
      Authored by Michael Ellerman
      CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU
      on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU
      found in "server" processors, from IBM mainly.
      
      Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's
      annoying to have two symbols that always have the same value, it's not
      quite annoying enough to bother removing one.
      
      However with the arrival of Power9, we now have the situation where
      CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the
      Radix MMU - *not* the "standard" MMU. So it is now actively confusing
      to use it, because it implies that code is disabled or inactive when
      the Radix MMU is in use; however, that is not necessarily true.
      
      So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor
      formatting updates of some of the affected lines.
      
      This will be a pain for backports, but c'est la vie.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      4e003747
  10. 27 Sep 2017, 1 commit
    • powerpc/64s: Add workaround for P9 vector CI load issue · 5080332c
      Authored by Michael Neuling
      POWER9 DD2.1 and earlier have an issue where some cache-inhibited
      vector loads will return bad data. The workaround has two parts: a
      firmware/microcode part that triggers HMI interrupts when such loads
      are hit, and this patch, which then emulates the instructions in
      Linux.
      
      The affected instructions are limited to lxvd2x, lxvw4x, lxvb16x and
      lxvh8x.
      
      When an instruction triggers the HMI, all threads in the core will be
      sent to the HMI handler, not just the one running the vector load.
      
      In general, these spurious HMIs are detected by the emulation code and
      we just return to the running process. Unfortunately, if a spurious
      interrupt occurs on a vector load to normal memory, we have no way to
      detect that it is spurious (unless we walk the page tables, which is
      very expensive). In this case we emulate the load, but we need to do so
      using a vector load itself to ensure 128-bit atomicity is preserved.
      
      Some debugfs counters for the emulated instructions are also added.
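
      An illustrative model of the emulation path, with placeholder names and
      a memcpy standing in for the real vector load:

      /* Illustrative model only: opcode names, the enum and the 16-byte memcpy
       * (which the kernel replaces with an actual vector load to keep the
       * 128-bit access atomic) are placeholders. */
      #include <string.h>

      enum vec_load { LXVD2X, LXVW4X, LXVB16X, LXVH8X, NR_VEC_LOADS };

      static unsigned long emul_count[NR_VEC_LOADS];   /* debugfs counters in the real code */

      static int emulate_ci_vector_load(enum vec_load op, void *vsr, const void *ea)
      {
          if (op >= NR_VEC_LOADS)
              return -1;                /* not one of the affected instructions */

          emul_count[op]++;
          memcpy(vsr, ea, 16);          /* the kernel does this with a vector load */
          return 0;
      }
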
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [mpe: Switch CONFIG_PPC_BOOK3S_64 to CONFIG_VSX to unbreak the build]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      5080332c
  11. 01 Aug 2017, 1 commit
    • powerpc/powernv: Save/Restore additional SPRs for stop4 cpuidle · e1c1cfed
      Authored by Gautham R. Shenoy
      The stop4 idle state on POWER9 is a deep idle state which loses
      hypervisor resources, but whose latency is low enough that it can be
      exposed via cpuidle.
      
      Until now, the deep idle states which lose hypervisor resources (e.g.
      winkle) were only exposed via CPU hotplug. Hence, on wakeup from such
      states, apart from a few SPRs which need to be restored to their
      previous values, the rest of the SPRs are reinitialized to their
      boot-time values.
      
      When stop4 is used in the context of cpuidle, we want these additional
      SPRs to be restored to their previous values, to ensure that the
      context on the CPU coming back from idle is the same as it was before
      going idle.
      
      In this patch, we define an SPR save area in the PACA (since we have
      used up the volatile register space in the stack) and on POWER9, we restore
      SPRN_PID, SPRN_LDBAR, SPRN_FSCR, SPRN_HFSCR, SPRN_MMCRA, SPRN_MMCR1,
      SPRN_MMCR2 to the values they had before entering stop.
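
      A rough sketch of such a per-CPU save area, with illustrative names
      only; the real field lives in the kernel's paca_struct and differs in
      detail.

      /* Illustrative names only. */
      #include <stdint.h>

      struct stop_sprs_model {
          uint64_t pid;
          uint64_t ldbar;
          uint64_t fscr;
          uint64_t hfscr;
          uint64_t mmcra;
          uint64_t mmcr1;
          uint64_t mmcr2;
      };

      /* Saved before entering stop4 and restored on wakeup, so the context on
       * the CPU coming out of idle matches the context before going idle. */
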
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      e1c1cfed
  12. 20 Jun 2017, 1 commit
  13. 30 May 2017, 1 commit
    • powerpc/powernv/idle: Use Requested Level for restoring state on P9 DD1 · 22c6663d
      Authored by Gautham R. Shenoy
      On POWER9 DD1, due to a hardware bug, the Power-Saving Level Status
      (PLS) field of the PSSCR for a thread waking up from a deep state can
      under-report the wakeup level if some other thread in the core is in a
      shallow stop state. The scenario in which this can manifest is as
      follows:
      
         1) All the threads of the core are in deep stop.
         2) One of the threads is woken up. The PLS for this thread will
            correctly reflect that it is waking up from deep stop.
         3) The thread that has woken up now executes a shallow stop.
         4) When some other thread in the core is woken, its PLS will reflect
            the shallow stop state.
      
      Thus, the subsequent thread for which the PLS is under-reporting the
      wakeup state will not restore the hypervisor resources.
      
      Hence, on DD1 systems, use the Requested Level (RL) field as a
      workaround to restore the contents of the hypervisor resources on the
      wakeup from the stop state.
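
      For illustration, the workaround amounts to reading the wakeup level
      from RL rather than PLS; the bit positions below are assumptions for
      this sketch, not taken from the ISA.

      /* Assumed layout for the sketch: PLS in the top nibble of the PSSCR,
       * RL in the bottom nibble. */
      #include <stdint.h>

      #define PSSCR_PLS_SHIFT  60
      #define PSSCR_RL_MASK    0xFULL

      static unsigned int wakeup_level(uint64_t psscr, int is_p9_dd1)
      {
          if (is_p9_dd1)
              return psscr & PSSCR_RL_MASK;           /* DD1: trust the Requested Level */
          return (psscr >> PSSCR_PLS_SHIFT) & 0xF;    /* otherwise use the reported PLS */
      }
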
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      22c6663d
  14. 28 Apr 2017, 3 commits
    • powerpc/64s: Dedicated system reset interrupt stack · b1ee8a3d
      Authored by Nicholas Piggin
      The system reset interrupt is used for crash/debug situations, so it is
      desirable to have as little impact on the normal state of the system as
      possible.
      
      Currently it uses the current kernel stack to process the exception.
      This stores into the stack which may be involved with the crash. The
      stack pointer may be corrupted, or it may have overflowed.
      
      Avoid or minimise these problems by creating a dedicated NMI stack for
      the system reset interrupt to use.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b1ee8a3d
    • powerpc/64s: Disallow system reset vs system reset reentrancy · c4f3b52c
      Authored by Nicholas Piggin
      In preparation for using a dedicated stack for system reset interrupts,
      prevent a nested system reset from recovering, in order to simplify
      code that is called in crash/debug path. This allows a system reset
      interrupt to just use the base stack pointer.
      
      Keep an in_nmi nesting counter, similar to the in_mce counter. Consider
      the interrupt non-recoverable if it is taken inside another system
      reset.
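
      A simplified model of the nesting check; field and function names are
      illustrative, not the kernel's.

      struct nmi_paca_model {
          int in_nmi;     /* nesting depth of system reset interrupts */
          int in_mce;     /* the existing machine check nesting counter */
      };

      /* Returns non-zero if this system reset arrived inside another one and
       * must therefore be treated as non-recoverable. */
      static int system_reset_enter(struct nmi_paca_model *p)
      {
          return ++p->in_nmi > 1;
      }

      static void system_reset_exit(struct nmi_paca_model *p)
      {
          p->in_nmi--;
      }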
      
      Interrupt nesting could be allowed similarly to MCE, but system reset
      is a special case that's not for normal operation, so simplicity wins
      until there is requirement for nested system reset interrupts.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c4f3b52c
    • powerpc/64s: Fix system reset vs general interrupt reentrancy · a3d96f70
      Authored by Nicholas Piggin
      The system reset interrupt can occur when MSR_EE=0, and it currently
      uses the PACA_EXGEN save area.
      
      Some PACA_EXGEN interrupts have a window where MSR_RI=1 and MSR_EE=0
      when the save area is still in use. A system reset interrupt in this
      window can lead to undetected corruption when the save area gets
      overwritten.
      
      This patch introduces PACA_EXNMI save area for system reset exceptions,
      which closes this corruption window. It's also helpful to retain the
      EXGEN state for debugging situations, even if not considering the
      recoverability aspect.
      
      This patch also moves the PACA_EXMC area down to a less frequently used
      part of the paca with the new save area.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a3d96f70
  15. 11 Apr 2017, 1 commit
    • powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1 · 17ed4c8f
      Authored by Gautham R. Shenoy
      POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up
      from stop 0,1,2 with ESL=1 can end up being misplaced in the core. Thus
      the HSPRG0 of a thread waking up from stop can contain the paca pointer
      of its sibling.
      
      This patch implements a context recovery framework within the threads
      of a core, by provisioning space in paca_struct for saving every
      sibling thread's paca pointer. Basically, we should be able to arrive
      at the right paca pointer from any of the threads' existing paca
      pointers.
      
      At bootup, during powernv idle init, we save the paca address of every
      CPU in each of its siblings' paca_struct, in the slot corresponding to
      this CPU's index in the core.
      
      On wakeup from a stop, the thread will determine its index in the core
      from the TIR register and recover its PACA pointer by indexing into
      the correct slot in the provisioned space in the current PACA.
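
      An illustrative model of the recovery step, with assumed names and
      thread count; the real paca layout differs.

      /* Each thread's paca holds the paca addresses of all its core siblings,
       * indexed by thread position in the core, so even a thread that woke up
       * with a sibling's paca in HSPRG0 can index its way back to its own. */
      #define THREADS_PER_CORE_MODEL 4

      struct paca_sibling_model {
          struct paca_sibling_model *thread_sibling_pacas[THREADS_PER_CORE_MODEL];
          /* ... other paca fields ... */
      };

      /* 'tir' is this thread's index in the core, read from the TIR register. */
      static struct paca_sibling_model *
      recover_paca(struct paca_sibling_model *current_paca, unsigned int tir)
      {
          return current_paca->thread_sibling_pacas[tir % THREADS_PER_CORE_MODEL];
      }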
      
      Furthermore, ensure that the NVGPRs are restored from the stack on the
      way out by setting the NAPSTATELOST in paca.
      
      [Changelog written with inputs from svaidy@linux.vnet.ibm.com]
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Call it a bug]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      17ed4c8f
  16. 01 Apr 2017, 1 commit
  17. 31 Mar 2017, 1 commit
  18. 14 Jan 2017, 1 commit
  19. 09 Sep 2016, 1 commit
    • powerpc: move hmi.c to arch/powerpc/kvm/ · 3f257774
      Authored by Paolo Bonzini
      hmi.c functions are unused unless sibling_subcore_state is nonzero, and
      that in turn happens only if KVM is in use.  So move the code to
      arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE
      rather than CONFIG_PPC_BOOK3S_64.  The sibling_subcore_state is also
      included in struct paca_struct only if KVM is supported by the kernel.
      
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: kvm-ppc@vger.kernel.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      3f257774
  20. 22 Aug 2016, 1 commit
    • powerpc: move hmi.c to arch/powerpc/kvm/ · 7c379526
      Authored by Paolo Bonzini
      hmi.c functions are unused unless sibling_subcore_state is nonzero, and
      that in turn happens only if KVM is in use.  So move the code to
      arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE
      rather than CONFIG_PPC_BOOK3S_64.  The sibling_subcore_state is also
      included in struct paca_struct only if KVM is supported by the kernel.
      
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: kvm-ppc@vger.kernel.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      7c379526
  21. 09 Jul 2016, 1 commit
  22. 20 Jun 2016, 1 commit
    • KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt · fd7bacbc
      Authored by Mahesh Salgaonkar
      When a guest is assigned to a core, the host Timebase (TB) is converted
      into the guest TB by adding the guest timebase offset before entering
      the guest. During guest exit the guest TB is restored to the host TB.
      This means that under certain conditions (e.g. guest migration) the
      host TB and guest TB can differ.
      
      When we get an HMI for TB-related issues, the OPAL HMI handler tries to
      fix the errors and restore the correct host TB value. With no guest
      running there are no issues, but with a guest running on the core we
      run into TB corruption issues.
      
      If we get an HMI while in the guest, the current HMI handler invokes
      the OPAL HMI handler before forcing the guest to exit. The guest exit
      path then subtracts the guest TB offset from the current TB value,
      which may already have been restored to the host value by the OPAL HMI
      handler. This leads to incorrect host and guest TB values.
      
      With split-core, things become more complex: the TB also gets split and
      each subcore gets its own TB register. When the HMI handler fixes a TB
      error and restores the TB value, it affects the TB values of all
      sibling subcores on the same core. On a TB error, all threads in the
      core get an HMI. With the existing code, the individual threads call
      the OPAL HMI handler independently, which can easily throw the TB out
      of sync if a guest is running on the subcores. Hence we need to
      coordinate with all the threads before making the OPAL HMI handler
      call, followed by a TB resync.
      
      This patch introduces a sibling subcore state structure (shared by all
      threads in the core) in the paca, which holds information about whether
      the sibling subcores are in guest mode or host mode. An array
      in_guest[] of size MAX_SUBCORE_PER_CORE (4) is used to maintain the
      state of each subcore; the subcore id is used as the index into the
      in_guest[] array. Only the primary thread entering/exiting the guest is
      responsible for setting/clearing its designated array element.
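
      A rough model of that shared structure (the real kernel definition
      differs in detail):

      /* Illustrative model of the shared per-core state described above. */
      #define MAX_SUBCORES_MODEL 4

      struct sibling_subcore_state_model {
          unsigned long flags;                        /* bit 63: TB resync claimed */
          unsigned char in_guest[MAX_SUBCORES_MODEL]; /* per-subcore guest/host state */
      };

      /* Every thread's paca points at the single instance shared by the core;
       * only a subcore's primary thread sets/clears its in_guest[] slot. */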
      
      On a TB error, we get an HMI interrupt on every thread of the core.
      Upon HMI, this patch now forces the guest to vacate the core/subcore.
      The primary thread of each subcore then turns off its respective bit
      in the above bitmap during the guest exit path, just after the
      guest->host partition switch is complete.
      
      All other threads that have just exited the guest, or were already in
      the host, will wait until all subcores have cleared their respective
      bits. Once all the subcores have turned off their respective bits, all
      threads make the call to the OPAL HMI handler.
      
      The OPAL HMI handler does not necessarily resync the TB value for every
      HMI interrupt; it does so only for HMIs caused by TB errors, and
      otherwise leaves the TB untouched. Hence, to keep things simple, the
      primary thread calls TB resync explicitly once per core immediately
      after the OPAL HMI handler, instead of subtracting the guest offset
      from the TB. The TB resync call restores the TB to the host value, so
      we can be sure about the TB state.
      
      One of the primary threads exiting the guest takes up the
      responsibility of calling TB resync. It uses one of the top bits
      (bit 63) of the subcore state flags bitmap to make the decision. The
      first primary thread (among the subcores) that is able to set the bit
      calls the TB resync; all other threads wait until the TB resync is
      complete. Once the TB resync is complete, all threads then proceed.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      fd7bacbc
  23. 09 Jan 2016, 1 commit
  24. 27 Dec 2015, 1 commit
  25. 19 Dec 2015, 1 commit
  26. 01 Apr 2015, 1 commit
    • powerpc: book3e_64: fix the align size for paca_struct · 016f8cf0
      Authored by Kevin Hao
      The cache line size of all current book3e 64-bit SoCs is 64 bytes, so
      we should use this size to align the members of paca_struct. This only
      changes the paca_struct members which are private to book3e CPUs, and
      should not have any effect on book3s ones. With this, we save 192
      bytes. Also change the annotation to __aligned(size), since it is
      preferred over __attribute__((aligned(size))).
      
      Before:
      	/* size: 1920, cachelines: 30, members: 46 */
      	/* sum members: 1667, holes: 6, sum holes: 141 */
      	/* padding: 112 */
      
      After:
      	/* size: 1728, cachelines: 27, members: 46 */
      	/* sum members: 1667, holes: 4, sum holes: 13 */
      	/* padding: 48 */
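
      For illustration, the shorthand form on a struct member looks like this
      (field names invented):

      #define __aligned(x) __attribute__((aligned(x)))   /* as in the kernel's compiler headers */

      struct example_paca {
          unsigned long generic_field;
          unsigned long book3e_private[4] __aligned(64); /* starts on its own 64-byte line */
      };
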
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      016f8cf0
  27. 15 Dec 2014, 2 commits
    • powernv/powerpc: Add winkle support for offline cpus · 77b54e9f
      Authored by Shreyas B. Prabhu
      Winkle is a deep idle state supported on POWER8 chips. A core enters
      winkle when all the threads of the core enter winkle. In this state the
      power supply to the entire chiplet, i.e. the core, private L2 and
      private L3, is turned off. As a result it gives higher power savings
      compared to sleep.
      
      But entering winkle results in a total hypervisor state loss. Hence the
      hypervisor context has to be preserved before entering winkle and
      restored upon wake up.
      
      The Power-on Reset Engine (PORE) is a dedicated engine which is
      responsible for powering on the chiplet during wake up. It can be
      programmed to restore the contents of a few specific registers. This
      patch uses the PORE to restore register state wherever possible, and
      uses the stack to save and restore the rest of the necessary registers.
      
      With hypervisor state restore, things fall under three categories:
      per-core state, per-subcore state and per-thread state. To manage this,
      extend the infrastructure introduced for sleep. Mainly we add a paca
      variable subcore_sibling_mask. Using this and the core_idle_state we
      can distinguish the first thread in the core and in the subcore.
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      77b54e9f
    • powernv/cpuidle: Redesign idle states management · 7cba160a
      Authored by Shreyas B. Prabhu
      Deep idle states like sleep and winkle are per-core idle states. A core
      enters these states only when all the threads enter either the
      particular idle state or a deeper one. There are tasks, like the
      fastsleep hardware bug workaround and hypervisor core state save, which
      have to be done only by the last thread of the core entering a deep
      idle state, and similarly tasks like timebase resync and hypervisor
      core register restore that have to be done only by the first thread
      waking up from these states.
      
      The current idle state management does not have a way to distinguish
      the first/last thread of the core waking/entering idle states. Tasks
      like timebase resync are done by all the threads. This is not only
      suboptimal, but can cause functional issues when subcores and KVM are
      involved.
      
      This patch adds the necessary infrastructure to track idle states of
      threads in a per-core structure. It uses this info to perform tasks like
      fastsleep workaround and timebase resync only once per core.
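
      A simplified model of the per-core bookkeeping this enables; the names
      and bit encoding are assumptions for this sketch.

      /* A shared word tracks which threads are awake, so the last thread
       * entering idle and the first thread waking can be identified and do
       * the once-per-core work. */
      #include <stdatomic.h>

      #define CORE_THREADS 8
      #define THREAD_MASK  ((1u << CORE_THREADS) - 1)

      static atomic_uint core_idle_state = THREAD_MASK;   /* bit set = thread awake */

      /* Returns 1 if we are the last thread of the core to go idle. */
      static int note_thread_entering_idle(unsigned int thread)
      {
          unsigned int prev = atomic_fetch_and(&core_idle_state, ~(1u << thread));
          return (prev & THREAD_MASK) == (1u << thread);
      }

      /* Returns 1 if we are the first thread of the core to wake up. */
      static int note_thread_waking(unsigned int thread)
      {
          unsigned int prev = atomic_fetch_or(&core_idle_state, 1u << thread);
          return (prev & THREAD_MASK) == 0;
      }
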
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: linux-pm@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7cba160a
  28. 02 Dec 2014, 1 commit
  29. 05 Aug 2014, 1 commit
  30. 28 Jul 2014, 1 commit
  31. 28 May 2014, 1 commit
    • powerpc: Fix regression of per-CPU DSCR setting · 1739ea9e
      Authored by Sam Bobroff
      Since commit "efcac658 powerpc: Per process DSCR + some fixes (try#4)"
      it is no longer possible to set the DSCR on a per-CPU basis.
      
      The old behaviour was to manipulate the DSCR SPR directly, but this is
      no longer sufficient: the value is quickly overwritten by context
      switching.
      
      This patch stores the per-CPU DSCR value in a kernel variable rather than
      directly in the SPR and it is used whenever a process has not set the DSCR
      itself. The sysfs interface (/sys/devices/system/cpu/cpuN/dscr) is unchanged.
      
      Writes to the old global default (/sys/devices/system/cpu/dscr_default)
      now set all of the per-CPU values and reads return the last written value.
      
      The new per-CPU default is added to the paca_struct and is used everywhere
      outside of sysfs.c instead of the old global default.
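
      A simplified model of the resulting behaviour, with illustrative names
      only:

      /* Each CPU has its own default DSCR, a task that has set DSCR itself
       * overrides it, and writing the global default fans out to every CPU. */
      #define NR_CPUS_MODEL 4

      static unsigned long cpu_dscr_default[NR_CPUS_MODEL];  /* per-CPU default */

      struct task_model {
          int dscr_inherit;          /* non-zero once the task set DSCR itself */
          unsigned long dscr;
      };

      /* Value loaded into the DSCR SPR when 'tsk' runs on 'cpu'. */
      static unsigned long dscr_for(const struct task_model *tsk, int cpu)
      {
          return tsk->dscr_inherit ? tsk->dscr : cpu_dscr_default[cpu];
      }

      /* Writing /sys/devices/system/cpu/dscr_default updates every CPU. */
      static void set_global_dscr_default(unsigned long val)
      {
          for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
              cpu_dscr_default[cpu] = val;
      }
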
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      1739ea9e
  32. 20 Mar 2014, 2 commits
    • powerpc/booke64: Critical and machine check exception support · 609af38f
      Authored by Scott Wood
      Add special state saving for critical and machine check exceptions.
      
      Most of this code could be used to handle debug exceptions taken from
      kernel space, but actually doing so is outside the scope of this patch.
      
      The various critical and machine check exceptions now point to their
      real handlers, rather than hanging the kernel.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      609af38f
    • powerpc/booke64: Use SPRG7 for VDSO · 9d378dfa
      Authored by Scott Wood
      Previously SPRG3 was marked for use by both VDSO and critical
      interrupts (though critical interrupts were not fully implemented).
      
      In commit 8b64a9df ("powerpc/booke64:
      Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman
      made an attempt to resolve this conflict by restoring the VDSO value
      early in the critical interrupt, but this has some issues:
      
       - It's incompatible with EXCEPTION_COMMON which restores r13 from the
         by-then-overwritten scratch (this cost me some debugging time).
       - It forces critical exceptions to be a special case handled
         differently from even machine check and debug level exceptions.
       - It didn't occur to me that it was possible to make this work at all
         (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after
         I made (most of) this patch. :-)
      
      It might be worth investigating using a load rather than SPRG on return
      from all exceptions (except TLB misses where the scratch never leaves
      the SPRG) -- it could save a few cycles.  Until then, let's stick with
      SPRG for all exceptions.
      
      Since we cannot use SPRG4-7 for scratch without corrupting the state of
      a KVM guest, move VDSO to SPRG7 on book3e.  Since neither SPRG4-7 nor
      critical interrupts exist on book3s, SPRG3 is still used for VDSO
      there.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Mihai Caraman <mihai.caraman@freescale.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: kvm-ppc@vger.kernel.org
      9d378dfa
  33. 15 Jan 2014, 1 commit
  34. 10 Jan 2014, 1 commit
    • powerpc/e6500: TLB miss handler with hardware tablewalk support · 28efc35f
      Authored by Scott Wood
      There are a few things that make the existing hw tablewalk handlers
      unsuitable for e6500:
      
       - Indirect entries go in TLB1 (though the resulting direct entries go in
         TLB0).
      
       - It has threads, but no "tlbsrx." -- so we need a spinlock and
         a normal "tlbsx".  Because we need this lock, hardware tablewalk
         is mandatory on e6500 unless we want to add spinlock+tlbsx to
         the normal bolted TLB miss handler.
      
       - TLB1 has no HES (nor next-victim hint) so we need software round robin
         (TODO: integrate this round robin data with hugetlb/KVM)
      
       - The existing tablewalk handlers map half of a page table at a time,
         because IBM hardware has a fixed 1MiB indirect page size.  e6500
         has variable size indirect entries, with a minimum of 2MiB.
         So we can't do the half-page indirect mapping, and even if we
         could it would be less efficient than mapping the full page.
      
       - Like on e5500, the linear mapping is bolted, so we don't need the
         overhead of supporting nested tlb misses.
      
      Note that hardware tablewalk does not work in rev1 of e6500.
      We do not expect to support e6500 rev1 in mainline Linux.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Mihai Caraman <mihai.caraman@freescale.com>
      28efc35f