1. 29 Dec 2014, 1 commit
  2. 17 Dec 2014, 3 commits
    • KVM: PPC: Book3S HV: Improve H_CONFER implementation · 90fd09f8
      Authored by Sam Bobroff
      Currently the H_CONFER hcall is implemented in kernel virtual mode,
      meaning that whenever a guest thread does an H_CONFER, all the threads
      in that virtual core have to exit the guest.  This is bad for
      performance because it interrupts the other threads even if they
      are doing useful work.
      
      The H_CONFER hcall is called by a guest VCPU when it is spinning on a
      spinlock and it detects that the spinlock is held by a guest VCPU that
      is currently not running on a physical CPU.  The idea is to give this
      VCPU's time slice to the holder VCPU so that it can make progress
      towards releasing the lock.
      
      To avoid having the other threads exit the guest unnecessarily,
      we add a real-mode implementation of H_CONFER that checks whether
      the other threads are doing anything.  If all the other threads
      are idle (i.e. in H_CEDE) or trying to confer (i.e. in H_CONFER),
      it returns H_TOO_HARD which causes a guest exit and allows the
      H_CONFER to be handled in virtual mode.
      
      Otherwise it spins for a short time (up to 10 microseconds) to give
      other threads the chance to observe that this thread is trying to
      confer.  The spin loop also terminates when any thread exits the guest
      or when all other threads are idle or trying to confer.  If the
      timeout is reached, the H_CONFER returns H_SUCCESS.  In this case the
      guest VCPU will recheck the spinlock word and most likely call
      H_CONFER again.
      
      This also improves the implementation of the H_CONFER virtual mode
      handler.  If the VCPU is part of a virtual core (vcore) which is
      runnable, there will be a 'runner' VCPU which has taken responsibility
      for running the vcore.  In this case we yield to the runner VCPU
      rather than the target VCPU.
      
      We also introduce a check on the target VCPU's yield count: if it
      differs from the yield count passed to H_CONFER, the target VCPU
      has run since H_CONFER was called and may have already released
      the lock.  This check is required by PAPR.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
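      A minimal C sketch of the real-mode decision logic described above;
      the structure, helper names, time source and the H_TOO_HARD value are
      illustrative, not the kernel's actual code:

          #include <stdint.h>

          #define H_SUCCESS  0
          #define H_TOO_HARD 9999                /* illustrative value */

          struct vcore_state {                   /* illustrative layout */
              int nthreads;
              uint8_t ceded[8];                  /* idle in H_CEDE */
              uint8_t conferring[8];             /* spinning in H_CONFER */
              uint8_t exited[8];                 /* has left the guest */
          };

          static int others_idle_or_conferring(struct vcore_state *vc, int self)
          {
              for (int i = 0; i < vc->nthreads; i++)
                  if (i != self && !vc->ceded[i] && !vc->conferring[i])
                      return 0;
              return 1;
          }

          long rm_h_confer(struct vcore_state *vc, int self,
                           uint64_t (*now_ns)(void))
          {
              /* Everyone else idle/conferring: punt to virtual mode. */
              if (others_idle_or_conferring(vc, self))
                  return H_TOO_HARD;

              /* Spin for up to 10 microseconds, leaving early if a thread
               * exits the guest or all others become idle/conferring. */
              vc->conferring[self] = 1;
              for (uint64_t end = now_ns() + 10000; now_ns() < end; ) {
                  int any_exit = 0;
                  for (int i = 0; i < vc->nthreads; i++)
                      any_exit |= vc->exited[i];
                  if (any_exit || others_idle_or_conferring(vc, self))
                      break;
              }
              vc->conferring[self] = 0;
              return H_SUCCESS;                  /* guest rechecks the lock */
          }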
    • KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register · 4a157d61
      Authored by Paul Mackerras
      There are two ways in which a guest instruction can be obtained from
      the guest in the guest exit code in book3s_hv_rmhandlers.S.  If the
      exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal
      instruction), the offending instruction is in the HEIR register
      (Hypervisor Emulation Instruction Register).  If the exit was caused
      by a load or store to an emulated MMIO device, we load the instruction
      from the guest by turning data relocation on and loading the instruction
      with an lwz instruction.
      
      Unfortunately, in the case where the guest has opposite endianness to
      the host, these two methods give results of different endianness, but
      both get put into vcpu->arch.last_inst.  The HEIR value has been loaded
      using guest endianness, whereas the lwz will load the instruction using
      host endianness.  The rest of the code that uses vcpu->arch.last_inst
      assumes it was loaded using host endianness.
      
      To fix this, we define a new vcpu field to store the HEIR value.  Then,
      in kvmppc_handle_exit_hv(), we transfer the value from this new field to
      vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness
      differ.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
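      The endianness fix itself reduces to a conditional byte swap; a
      self-contained sketch (helper and field names are illustrative):

          #include <stdint.h>

          static inline uint32_t swab32(uint32_t x)
          {
              return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
                     ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
          }

          /* Transfer the saved HEIR image (guest byte order) into
           * last_inst, swapping when guest and host endianness differ. */
          uint32_t heir_to_last_inst(uint32_t heir, int guest_le, int host_le)
          {
              return (guest_le != host_le) ? swab32(heir) : heir;
          }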
    • KVM: PPC: Book3S HV: Remove code for PPC970 processors · c17b98cf
      Authored by Paul Mackerras
      This removes the code that was added to enable HV KVM to work
      on PPC970 processors.  The PPC970 is an old CPU that doesn't
      support virtualizing guest memory.  Removing PPC970 support also
      lets us remove the code for allocating and managing contiguous
      real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
      case, the code for pinning pages of guest memory when first
      accessed and keeping track of which pages have been pinned, and
      the code for handling H_ENTER hypercalls in virtual mode.
      
      Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
      The KVM_CAP_PPC_RMA capability now always returns 0.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  3. 08 Dec 2014, 1 commit
    • powerpc/powernv: Return to cpu offline loop when finished in KVM guest · 56548fc0
      Authored by Paul Mackerras
      When a secondary hardware thread has finished running a KVM guest, we
      currently put that thread into nap mode using a nap instruction in
      the KVM code.  This changes the code so that, instead of executing a nap
      instruction directly, we cause the call to power7_nap() that put the
      thread into nap mode to return.  The reason for doing this is to avoid
      the KVM code having to know what low-power mode to put the thread into.
      
      In the case of a secondary thread used to run a KVM guest, the thread
      will be offline from the point of view of the host kernel, and the
      relevant power7_nap() call is the one in pnv_smp_cpu_disable().
      In this case we don't want to clear pending IPIs in the offline loop
      in that function, since that might cause us to miss the wakeup for
      the next time the thread needs to run a guest.  To tell whether or
      not to clear the interrupt, we use the SRR1 value returned from
      power7_nap(), and check if it indicates an external interrupt.  We
      arrange that the return from power7_nap() when we have finished running
      a guest returns 0, so pending interrupts don't get flushed in that
      case.
      
      Note that it is important that a secondary thread that has finished
      executing in the guest, or that didn't have a guest to run, should
      not return to power7_nap's caller while the kvm_hstate.hwthread_req
      flag in the PACA is non-zero, because the return from power7_nap
      will reenable the MMU, and the MMU might still be in guest context.
      In this situation we spin at low priority in real mode waiting for
      hwthread_req to become zero.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
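      The offline-loop decision this enables might look like the sketch
      below; the SRR1 wake-reason bit values follow the POWER7 layout but
      should be treated as illustrative:

          #define SRR1_WAKEMASK 0x00380000ul   /* wake reason field */
          #define SRR1_WAKEEE   0x00200000ul   /* external interrupt */

          /* Only flush a pending IPI if the SRR1 value returned by
           * power7_nap() says an external interrupt woke us; the
           * "finished running a guest" path returns 0, so nothing is
           * flushed and the next guest wakeup is not lost. */
          int should_clear_pending_ipi(unsigned long srr1)
          {
              return (srr1 & SRR1_WAKEMASK) == SRR1_WAKEEE;
          }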
  4. 22 Sep 2014, 1 commit
  5. 05 Aug 2014, 1 commit
  6. 28 Jul 2014, 7 commits
    • KVM: PPC: Book3S HV: Fix ABIv2 on LE · 9bf163f8
      Authored by Alexander Graf
      For code that doesn't live in modules we can just branch to the real function
      names, giving us compatibility with ABIv1 and ABIv2.
      
      Do this for the compiled-in code of HV KVM.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Access XICS in BE · 76d072fb
      Authored by Alexander Graf
      On the exit path from the guest we check what type of interrupt we
      received, if we received one.  This means we do a hardware access to the
      XICS interrupt controller.
      
      However, when running on a little endian system, this access is byte reversed.
      
      So let's make sure to swizzle the bytes back again and virtually make XICS
      accesses big endian.
      Signed-off-by: Alexander Graf <agraf@suse.de>
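      The idea reduces to treating XICS MMIO as big-endian and swapping
      raw loads on a little-endian host; a sketch (the register pointer
      is illustrative, and the helper mirrors be32_to_cpu()):

          #include <stdint.h>

          static inline uint32_t be32_to_host(uint32_t be)
          {
          #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
              return __builtin_bswap32(be);   /* LE host: byte-reverse */
          #else
              return be;                      /* BE host: already in order */
          #endif
          }

          /* Read the big-endian XICS XIRR register on either host. */
          uint32_t read_xirr(volatile const uint32_t *xirr_mmio)
          {
              return be32_to_host(*xirr_mmio);
          }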
    • KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE · 0865a583
      Authored by Alexander Graf
      Some data structures are always stored in big endian. Among those are the LPPACA
      fields as well as the shadow slb. These structures might be shared with a
      hypervisor.
      
      So whenever we access those fields, make sure we do so in big endian byte order.
      Signed-off-by: Alexander Graf <agraf@suse.de>
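      For example, bumping the big-endian yield_count field of the lppaca
      on a little-endian host looks like this sketch (structure reduced
      to one field; the kernel uses be32_to_cpu()/cpu_to_be32()):

          #include <stdint.h>

          struct lppaca_sketch {
              uint32_t yield_count_be;   /* big-endian, shared with the hv */
          };

          void bump_yield_count(struct lppaca_sketch *lp)
          {
          #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
              uint32_t v = __builtin_bswap32(lp->yield_count_be); /* BE -> host */
              lp->yield_count_be = __builtin_bswap32(v + 1);      /* host -> BE */
          #else
              lp->yield_count_be++;      /* BE host: no swap needed */
          #endif
          }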
    • KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled · ae2113a4
      Authored by Paul Mackerras
      This adds code to check, when the KVM_CAP_PPC_ENABLE_HCALL
      capability is used to enable or disable in-kernel handling of an
      hcall, that the hcall is actually implemented by the kernel.
      If not, an EINVAL error is returned.
      
      This also checks the default-enabled list of hcalls and prints a
      warning if any hcall there is not actually implemented.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling · 699a0ea0
      Authored by Paul Mackerras
      This provides a way for userspace to control which sPAPR hcalls get
      handled in the kernel.  Each hcall can be individually enabled or
      disabled for in-kernel handling, except for H_RTAS.  The exception
      for H_RTAS is because userspace can already control whether
      individual RTAS functions are handled in-kernel or not via the
      KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for
      H_RTAS is out of the normal sequence of hcall numbers.
      
      Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the
      KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM.
      The args field of the struct kvm_enable_cap specifies the hcall number
      in args[0] and the enable/disable flag in args[1]; 0 means disable
      in-kernel handling (so that the hcall will always cause an exit to
      userspace) and 1 means enable.  Enabling or disabling in-kernel
      handling of an hcall is effective across the whole VM.
      
      The ability for KVM_ENABLE_CAP to be used on a VM file descriptor
      on PowerPC is new, added by this commit.  The KVM_CAP_ENABLE_CAP_VM
      capability advertises that this ability exists.
      
      When a VM is created, an initial set of hcalls is enabled for
      in-kernel handling: those that have an in-kernel implementation at
      this point.  Any new hcall
      implementations from this point onwards should not be added to the
      default set without a good reason.
      
      No distinction is made between real-mode and virtual-mode hcall
      implementations; the one setting controls them both.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
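      From userspace the control looks like the following sketch, which
      disables in-kernel handling of one hcall (H_CEDE, 0xE0, chosen
      purely as an example; requires a kernel and headers that expose
      KVM_CAP_PPC_ENABLE_HCALL):

          #include <string.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          /* vm_fd is the VM file descriptor; returns 0 on success. */
          int set_hcall_enabled(int vm_fd, unsigned long hcall, int enable)
          {
              struct kvm_enable_cap cap;

              memset(&cap, 0, sizeof(cap));
              cap.cap     = KVM_CAP_PPC_ENABLE_HCALL;
              cap.args[0] = hcall;    /* hcall number, e.g. 0xE0 = H_CEDE */
              cap.args[1] = enable;   /* 0 = always exit to userspace */
              return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
          }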
    • KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC() · ad7d4584
      Authored by Anton Blanchard
      Both kvmppc_hv_entry_trampoline and kvmppc_entry_trampoline are
      assembly functions that are exported to modules and also require
      a valid r2.
      
      As such we need to use _GLOBAL_TOC so we provide a global entry
      point that establishes the TOC (r2).
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue · 05a308c7
      Authored by Anton Blanchard
      To establish addressability quickly, ABIv2 requires the target
      address of the function being called to be in r12.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  7. 07 Jul 2014, 1 commit
  8. 11 Jun 2014, 1 commit
    • powerpc/book3s: Fix guest MC delivery mechanism to avoid soft lockups in guest. · 74845bc2
      Authored by Mahesh Salgaonkar
      Currently we forward recovered MCEs to the guest, while for unhandled
      errors we do not deliver the MCE to the guest at all.  It looks like,
      with no FWNMI support in qemu, the guest just panics whenever we
      deliver the recovered MCEs to it.  Also, the existing code used to
      return to the host for unhandled errors, which was causing the guest
      to hang with soft lockups and made it difficult to recover the guest
      instance.

      This patch now forwards all fatal MCEs to the guest, causing the guest
      to crash/panic.  And for recovered errors we just go back to normal
      functioning of the guest instead of returning to the host, which fixes
      the soft lockup issues.  This patch also fixes an issue where guest
      MCE events were not logged to the host console.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  9. 30 May 2014, 2 commits
    • KVM: PPC: Book3S HV: Fix machine check delivery to guest · 000a25dd
      Authored by Paul Mackerras
      The code that delivered a machine check to the guest after handling
      it in real mode failed to load up r11 before calling kvmppc_msr_interrupt,
      which needs the old MSR value in r11 so it can see the transactional
      state there.  This adds the missing load.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs · 9bc01a9b
      Authored by Paul Mackerras
      This adds workarounds for two hardware bugs in the POWER8 performance
      monitor unit (PMU), both related to interrupt generation.  The effect
      of these bugs is that PMU interrupts can get lost, leading to tools
      such as perf reporting fewer counts and samples than they should.
      
      The first bug relates to the PMAO (perf. mon. alert occurred) bit in
      MMCR0; setting it should cause an interrupt, but doesn't.  The other
      bug relates to the PMAE (perf. mon. alert enable) bit in MMCR0.
      Setting PMAE when a counter is negative and counter negative
      conditions are enabled to cause alerts should cause an alert, but
      doesn't.
      
      The workaround for the first bug is to create conditions where a
      counter will overflow, whenever we are about to restore a MMCR0
      value that has PMAO set (and PMAO_SYNC clear).  The workaround for
      the second bug is to freeze all counters using MMCR2 before reading
      MMCR0.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  10. 28 May 2014, 1 commit
    • powerpc: Fix regression of per-CPU DSCR setting · 1739ea9e
      Authored by Sam Bobroff
      Since commit "efcac658 powerpc: Per process DSCR + some fixes (try#4)"
      it is no longer possible to set the DSCR on a per-CPU basis.
      
      The old behaviour was to manipulate the DSCR SPR directly, but this is no
      longer sufficient: the value is quickly overwritten by context switching.
      
      This patch stores the per-CPU DSCR value in a kernel variable rather than
      directly in the SPR and it is used whenever a process has not set the DSCR
      itself. The sysfs interface (/sys/devices/system/cpu/cpuN/dscr) is unchanged.
      
      Writes to the old global default (/sys/devices/system/cpu/dscr_default)
      now set all of the per-CPU values and reads return the last written value.
      
      The new per-CPU default is added to the paca_struct and is used everywhere
      outside of sysfs.c instead of the old global default.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  11. 28 Apr 2014, 3 commits
  12. 23 Apr 2014, 2 commits
  13. 29 Mar 2014, 3 commits
    • KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8 · 72cde5a8
      Authored by Paul Mackerras
      Currently we save the host PMU configuration, counter values, etc.,
      when entering a guest, and restore it on return from the guest.
      (We have to do this because the guest has control of the PMU while
      it is executing.)  However, we missed saving/restoring the SIAR and
      SDAR registers, as well as the registers which are new on POWER8,
      namely SIER and MMCR2.
      
      This adds code to save the values of these registers when entering
      the guest and restore them on exit.  This also works around the bug
      in POWER8 where setting PMAE with a counter already negative doesn't
      generate an interrupt.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Scott Wood <scottwood@freescale.com>
    • KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset · c5fb80d3
      Authored by Paul Mackerras
      Commit c7699822bc21 ("KVM: PPC: Book3S HV: Make physical thread 0 do
      the MMU switching") reordered the guest entry/exit code so that most
      of the guest register save/restore code happened in guest MMU context.
      A side effect of that is that the timebase still contains the guest
      timebase value at the point where we compute and use vcpu->arch.dec_expires,
      and therefore that is now a guest timebase value rather than a host
      timebase value.  That in turn means that the timeouts computed in
      kvmppc_set_timer() are wrong if the timebase offset for the guest is
      non-zero.  The consequence of that is things such as "sleep 1" in a
      guest after migration may sleep for much longer than they should.
      
      This fixes the problem by converting between guest and host timebase
      values as necessary, by adding or subtracting the timebase offset.
      This also fixes an incorrect comment.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Scott Wood <scottwood@freescale.com>
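      The conversion is just the timebase offset applied in each
      direction, as in this sketch (tb_offset being the guest's offset):

          #include <stdint.h>

          /* guest_tb = host_tb + tb_offset, so a deadline captured while
           * the timebase held the guest value converts back to host time
           * by subtracting the offset. */
          uint64_t guest_tb_to_host(uint64_t guest_tb, uint64_t tb_offset)
          {
              return guest_tb - tb_offset;
          }

          uint64_t host_tb_to_guest(uint64_t host_tb, uint64_t tb_offset)
          {
              return host_tb + tb_offset;
          }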
    • KVM: PPC: Book3S HV: Add transactional memory support · e4e38121
      Authored by Michael Neuling
      This adds saving of the transactional memory (TM) checkpointed state
      on guest entry and exit.  We only do this if we see that the guest has
      an active transaction.
      
      It also adds emulation of the TM state changes when delivering IRQs
      into the guest.  According to the architecture, if we are
      transactional when an IRQ occurs, the TM state is changed to
      suspended, otherwise it's left unchanged.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Scott Wood <scottwood@freescale.com>
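      The architected MSR[TS] transition on interrupt delivery can be
      sketched as follows (bit positions follow the 64-bit MSR layout but
      treat them as illustrative):

          #include <stdint.h>

          #define MSR_TS_S (1ull << 33)   /* TM state: suspended */
          #define MSR_TS_T (1ull << 34)   /* TM state: transactional */

          /* Delivering an IRQ while transactional moves the guest to
           * suspended; any other TM state is left unchanged. */
          uint64_t msr_on_interrupt(uint64_t guest_msr)
          {
              if (guest_msr & MSR_TS_T)
                  return (guest_msr & ~MSR_TS_T) | MSR_TS_S;
              return guest_msr;
          }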
  14. 26 Mar 2014, 1 commit
  15. 20 Mar 2014, 1 commit
    • powerpc/booke64: Use SPRG7 for VDSO · 9d378dfa
      Authored by Scott Wood
      Previously SPRG3 was marked for use by both VDSO and critical
      interrupts (though critical interrupts were not fully implemented).
      
      In commit 8b64a9df ("powerpc/booke64:
      Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman
      made an attempt to resolve this conflict by restoring the VDSO value
      early in the critical interrupt, but this has some issues:
      
       - It's incompatible with EXCEPTION_COMMON which restores r13 from the
         by-then-overwritten scratch (this cost me some debugging time).
       - It forces critical exceptions to be a special case handled
         differently from even machine check and debug level exceptions.
       - It didn't occur to me that it was possible to make this work at all
         (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after
         I made (most of) this patch. :-)
      
      It might be worth investigating using a load rather than SPRG on return
      from all exceptions (except TLB misses where the scratch never leaves
      the SPRG) -- it could save a few cycles.  Until then, let's stick with
      SPRG for all exceptions.
      
      Since we cannot use SPRG4-7 for scratch without corrupting the state of
      a KVM guest, move VDSO to SPRG7 on book3e.  Since neither SPRG4-7 nor
      critical interrupts exist on book3s, SPRG3 is still used for VDSO
      there.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Mihai Caraman <mihai.caraman@freescale.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: kvm-ppc@vger.kernel.org
  16. 13 Mar 2014, 2 commits
  17. 27 Jan 2014, 9 commits
    • KVM: PPC: Book3S HV: Add new state for transactional memory · 7b490411
      Authored by Michael Neuling
      Add new state for transactional memory (TM) to kvm_vcpu_arch.  Also add
      asm-offset bits that are going to be required.
      
      This also moves the existing TFHAR, TFIAR and TEXASR SPRs into a
      CONFIG_PPC_TRANSACTIONAL_MEM section.  This requires some code changes to
      ensure we still compile with CONFIG_PPC_TRANSACTIONAL_MEM=N.  Many of the
      added #ifdefs are removed in a later patch when the bulk of the TM code
      is added.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: fix merge conflict]
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Basic little-endian guest support · d682916a
      Authored by Anton Blanchard
      We create a guest MSR from scratch when delivering exceptions in
      a few places.  Instead of extracting LPCR[ILE] and inserting it
      into MSR_LE each time, we simply create a new variable intr_msr which
      contains the entire MSR to use.  For a little-endian guest, userspace
      needs to set the ILE (interrupt little-endian) bit in the LPCR for
      each vcpu (or at least one vcpu in each virtual core).
      
      [paulus@samba.org - removed H_SET_MODE implementation from original
      version of the patch, and made kvmppc_set_lpcr update vcpu->arch.intr_msr.]
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
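      Building the ready-made interrupt MSR from LPCR[ILE] might look
      like this sketch (the bit values are illustrative):

          #include <stdint.h>

          #define MSR_SF   (1ull << 63)   /* 64-bit mode */
          #define MSR_LE   (1ull << 0)    /* little-endian */
          #define LPCR_ILE (1ull << 25)   /* interrupts little-endian */

          /* Computed once, and again whenever the LPCR is updated,
           * instead of on every exception delivery. */
          uint64_t make_intr_msr(uint64_t lpcr)
          {
              return MSR_SF | ((lpcr & LPCR_ILE) ? MSR_LE : 0);
          }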
    • KVM: PPC: Book3S HV: Add support for DABRX register on POWER7 · 8563bf52
      Authored by Paul Mackerras
      The DABRX (DABR extension) register on POWER7 processors provides finer
      control over which accesses cause a data breakpoint interrupt.  It
      contains 3 bits which indicate whether to enable accesses in user,
      kernel and hypervisor modes respectively to cause data breakpoint
      interrupts, plus one bit that enables both real mode and virtual mode
      accesses to cause interrupts.  Currently, KVM sets DABRX to allow
      both kernel and user accesses to cause interrupts while in the guest.
      
      This adds support for the guest to specify other values for DABRX.
      PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR
      and DABRX with one call.  This adds a real-mode implementation of
      H_SET_XDABR, which shares most of its code with the existing H_SET_DABR
      implementation.  To support this, we add a per-vcpu field to store the
      DABRX value plus code to get and set it via the ONE_REG interface.
      
      For Linux guests to use this new hcall, userspace needs to add
      "hcall-xdabr" to the set of strings in the /chosen/hypertas-functions
      property in the device tree.  If userspace does this and then migrates
      the guest to a host where the kernel doesn't include this patch, then
      userspace will need to implement H_SET_XDABR by writing the specified
      DABR value to the DABR using the ONE_REG interface.  In that case, the
      old kernel will set DABRX to DABRX_USER | DABRX_KERNEL.  That should
      still work correctly, at least for Linux guests, since Linux guests
      cope with getting data breakpoint interrupts in modes that weren't
      requested by just ignoring the interrupt, and Linux guests never set
      DABRX_BTI.
      
      The other thing this does is to make H_SET_DABR and H_SET_XDABR work
      on POWER8, which has the DAWR and DAWRX instead of DABR/X.  Guests that
      know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but
      guests running in POWER7 compatibility mode will still use H_SET_[X]DABR.
      For them, this adds the logic to convert DABR/X values into DAWR/X values
      on POWER8.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells · 5d00f66b
      Authored by Paul Mackerras
      POWER8 has support for hypervisor doorbell interrupts.  Though the
      kernel doesn't use them for IPIs on the powernv platform yet, it
      probably will in future, so this makes KVM cope gracefully if a
      hypervisor doorbell interrupt arrives while in a guest.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs · aa31e843
      Authored by Paul Mackerras
      * SRR1 wake reason field for system reset interrupt on wakeup from nap
        is now a 4-bit field on P8, compared to 3 bits on P7.
      
      * Set PECEDP in LPCR when napping because of H_CEDE so guest doorbells
        will wake us up.
      
      * Waking up from nap because of a guest doorbell interrupt is not a
        reason to exit the guest.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Consolidate code that checks reason for wake from nap · e3bbbbfa
      Authored by Paul Mackerras
      Currently in book3s_hv_rmhandlers.S we have three places where we
      have woken up from nap mode and we check the reason field in SRR1
      to see what event woke us up.  This consolidates them into a new
      function, kvmppc_check_wake_reason.  It looks at the wake reason
      field in SRR1, and if it indicates that an external interrupt caused
      the wakeup, calls kvmppc_read_intr to check what sort of interrupt
      it was.
      
      This also consolidates the two places where we synthesize an external
      interrupt (0x500 vector) for the guest.  Now, if the guest exit code
      finds that there was an external interrupt which has been handled
      (i.e. it was an IPI indicating that there is now an interrupt pending
      for the guest), it jumps to deliver_guest_interrupt, which is in the
      last part of the guest entry code, where we synthesize guest external
      and decrementer interrupts.  That code has been streamlined a little
      and now clears LPCR[MER] when appropriate as well as setting it.
      
      The extra clearing of any pending IPI on a secondary, offline CPU
      thread before going back to nap mode has been removed.  It is no longer
      necessary now that we have code to read and acknowledge IPIs in the
      guest exit path.
      
      This fixes a minor bug in the H_CEDE real-mode handling - previously,
      if we found that other threads were already exiting the guest when we
      were about to go to nap mode, we would branch to the cede wakeup path
      and end up looking in SRR1 for a wakeup reason.  Now we branch to a
      point after we have checked the wakeup reason.
      
      This also fixes a minor bug in kvmppc_read_intr - previously it could
      return 0xff rather than 1, in the case where we find that a host IPI
      is pending after we have cleared the IPI.  Now it returns 1.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
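      The shape of the consolidated check might be sketched as below
      (3-bit POWER7 field values, noting that POWER8 widens the field to
      4 bits; the return convention is simplified and read_intr() stands
      in for kvmppc_read_intr):

          #define SRR1_WAKEMASK 0x00380000ul   /* wake reason field */
          #define SRR1_WAKEEE   0x00200000ul   /* external interrupt */

          /* Returns 0 to re-enter the guest, nonzero to exit to the
           * host.  read_intr() now returns 1, never 0xff, when a host
           * IPI is found pending after the IPI has been cleared. */
          int check_wake_reason(unsigned long srr1, int (*read_intr)(void))
          {
              if ((srr1 & SRR1_WAKEMASK) == SRR1_WAKEEE)
                  return read_intr();   /* what kind of interrupt? */
              return 0;                 /* decrementer etc.: resume guest */
          }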
    • KVM: PPC: Book3S HV: Flush the correct number of TLB sets on POWER8 · ca252055
      Authored by Paul Mackerras
      POWER8 has 512 sets in the TLB, compared to 128 for POWER7, so we need
      to do more tlbiel instructions when flushing the TLB on POWER8.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
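      The flush is one tlbiel per TLB congruence class, so the loop count
      is the only POWER7/POWER8 difference.  A PPC64-only sketch; the
      0x800 IS field and 0x1000 set stride mirror the kernel's loop, but
      treat the encoding details as illustrative:

          void flush_guest_tlb(int nsets)   /* 128 on P7, 512 on P8 */
          {
              unsigned long rb = 0x800;     /* IS field = 0b10 */

              __asm__ volatile("ptesync" ::: "memory");
              for (int i = 0; i < nsets; i++) {
                  __asm__ volatile("tlbiel %0" : : "r"(rb) : "memory");
                  rb += 0x1000;             /* next TLB set */
              }
              __asm__ volatile("ptesync" ::: "memory");
          }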
    • KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs · b005255e
      Authored by Michael Neuling
      This adds fields to the struct kvm_vcpu_arch to store the new
      guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
      functions to allow userspace to access this state, and adds code to
      the guest entry and exit to context-switch these SPRs between host
      and guest.
      
      Note that DPDES (Directed Privileged Doorbell Exception State) is
      shared between threads on a core; hence we store it in struct
      kvmppc_vcore and have the master thread save and restore it.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers · e0b7ec05
      Authored by Paul Mackerras
      On a threaded processor such as POWER7, we group VCPUs into virtual
      cores and arrange that the VCPUs in a virtual core run on the same
      physical core.  Currently we don't enforce any correspondence between
      virtual thread numbers within a virtual core and physical thread
      numbers.  Physical threads are allocated starting at 0 on a first-come
      first-served basis to runnable virtual threads (VCPUs).
      
      POWER8 implements a new "msgsndp" instruction which guest kernels can
      use to interrupt other threads in the same core or sub-core.  Since
      the instruction takes the destination physical thread ID as a parameter,
      it becomes necessary to align the physical thread IDs with the virtual
      thread IDs, that is, to make sure virtual thread N within a virtual
      core always runs on physical thread N.
      
      This means that it's possible that thread 0, which is where we call
      __kvmppc_vcore_entry, may end up running some other vcpu than the
      one whose task called kvmppc_run_core(), or it may end up running
      no vcpu at all, if for example thread 0 of the virtual core is
      currently executing in userspace.  However, we do need thread 0
      to be responsible for switching the MMU -- a previous version of
      this patch that had other threads switching the MMU was found to
      be responsible for occasional memory corruption and machine check
      interrupts in the guest on POWER7 machines.
      
      To accommodate this, we no longer pass the vcpu pointer to
      __kvmppc_vcore_entry, but instead let the assembly code load it from
      the PACA.  Since the assembly code will need to know the kvm pointer
      and the thread ID for threads which don't have a vcpu, we move the
      thread ID into the PACA and we add a kvm pointer to the virtual core
      structure.
      
      In the case where thread 0 has no vcpu to run, it still calls into
      kvmppc_hv_entry in order to do the MMU switch, and then naps until
      either its vcpu is ready to run in the guest, or some other thread
      needs to exit the guest.  In the latter case, thread 0 jumps to the
      code that switches the MMU back to the host.  This control flow means
      that now we switch the MMU before loading any guest vcpu state.
      Similarly, on guest exit we now save all the guest vcpu state before
      switching the MMU back to the host.  This has required substantial
      code movement, making the diff rather large.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>