1. 09 1月, 2014 2 次提交
  2. 21 11月, 2013 1 次提交
  3. 17 10月, 2013 11 次提交
    • A
      kvm: powerpc: book3s: Cleanup interrupt handling code · dd96b2c2
      Aneesh Kumar K.V 提交于
      With this patch if HV is included, interrupts come in to the HV version
      of the kvmppc_interrupt code, which then jumps to the PR handler,
      renamed to kvmppc_interrupt_pr, if the guest is a PR guest. This helps
      in enabling both HV and PR, which we do in later patch
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      dd96b2c2
    • P
      KVM: PPC: Book3S HV: Better handling of exceptions that happen in real mode · 44a3add8
      Paul Mackerras 提交于
      When an interrupt or exception happens in the guest that comes to the
      host, the CPU goes to hypervisor real mode (MMU off) to handle the
      exception but doesn't change the MMU context.  After saving a few
      registers, we then clear the "in guest" flag.  If, for any reason,
      we get an exception in the real-mode code, that then gets handled
      by the normal kernel exception handlers, which turn the MMU on.  This
      is disastrous if the MMU is still set to the guest context, since we
      end up executing instructions from random places in the guest kernel
      with hypervisor privilege.
      
      In order to catch this situation, we define a new value for the "in guest"
      flag, KVM_GUEST_MODE_HOST_HV, to indicate that we are in hypervisor real
      mode with guest MMU context.  If the "in guest" flag is set to this value,
      we branch off to an emergency handler.  For the moment, this just does
      a branch to self to stop the CPU from doing anything further.
      
      While we're here, we define another new flag value to indicate that we
      are in a HV guest, as distinct from a PR guest.  This will be useful
      when we have a kernel that can support both PR and HV guests concurrently.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      44a3add8
    • P
      KVM: PPC: Book3S: Move skip-interrupt handlers to common code · 4f6c11db
      Paul Mackerras 提交于
      Both PR and HV KVM have separate, identical copies of the
      kvmppc_skip_interrupt and kvmppc_skip_Hinterrupt handlers that are
      used for the situation where an interrupt happens when loading the
      instruction that caused an exit from the guest.  To eliminate this
      duplication and make it easier to compile in both PR and HV KVM,
      this moves this code to arch/powerpc/kernel/exceptions-64s.S along
      with other kernel interrupt handler code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      4f6c11db
    • P
      KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7 · 388cc6e1
      Paul Mackerras 提交于
      This enables us to use the Processor Compatibility Register (PCR) on
      POWER7 to put the processor into architecture 2.05 compatibility mode
      when running a guest.  In this mode the new instructions and registers
      that were introduced on POWER7 are disabled in user mode.  This
      includes all the VSX facilities plus several other instructions such
      as ldbrx, stdbrx, popcntw, popcntd, etc.
      
      To select this mode, we have a new register accessible through the
      set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT.  Setting
      this to zero gives the full set of capabilities of the processor.
      Setting it to one of the "logical" PVR values defined in PAPR puts
      the vcpu into the compatibility mode for the corresponding
      architecture level.  The supported values are:
      
      0x0f000002	Architecture 2.05 (POWER6)
      0x0f000003	Architecture 2.06 (POWER7)
      0x0f100003	Architecture 2.06+ (POWER7+)
      
      Since the PCR is per-core, the architecture compatibility level and
      the corresponding PCR value are stored in the struct kvmppc_vcore, and
      are therefore shared between all vcpus in a virtual core.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: squash in fix to add missing break statements and documentation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      388cc6e1
    • P
      KVM: PPC: Book3S HV: Add support for guest Program Priority Register · 4b8473c9
      Paul Mackerras 提交于
      POWER7 and later IBM server processors have a register called the
      Program Priority Register (PPR), which controls the priority of
      each hardware CPU SMT thread, and affects how fast it runs compared
      to other SMT threads.  This priority can be controlled by writing to
      the PPR or by use of a set of instructions of the form or rN,rN,rN
      which are otherwise no-ops but have been defined to set the priority
      to particular levels.
      
      This adds code to context switch the PPR when entering and exiting
      guests and to make the PPR value accessible through the SET/GET_ONE_REG
      interface.  When entering the guest, we set the PPR as late as
      possible, because if we are setting a low thread priority it will
      make the code run slowly from that point on.  Similarly, the
      first-level interrupt handlers save the PPR value in the PACA very
      early on, and set the thread priority to the medium level, so that
      the interrupt handling code runs at a reasonable speed.
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      4b8473c9
    • P
      KVM: PPC: Book3S HV: Store LPCR value for each virtual core · a0144e2a
      Paul Mackerras 提交于
      This adds the ability to have a separate LPCR (Logical Partitioning
      Control Register) value relating to a guest for each virtual core,
      rather than only having a single value for the whole VM.  This
      corresponds to what real POWER hardware does, where there is a LPCR
      per CPU thread but most of the fields are required to have the same
      value on all active threads in a core.
      
      The per-virtual-core LPCR can be read and written using the
      GET/SET_ONE_REG interface.  Userspace can can only modify the
      following fields of the LPCR value:
      
      DPFD	Default prefetch depth
      ILE	Interrupt little-endian
      TC	Translation control (secondary HPT hash group search disable)
      
      We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which
      contains bits relating to memory management, i.e. the Virtualized
      Partition Memory (VPM) bits and the bits relating to guest real mode.
      When this default value is updated, the update needs to be propagated
      to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do
      that.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix whitespace]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a0144e2a
    • P
      KVM: PPC: Book3S HV: Avoid unbalanced increments of VPA yield count · 8c2dbb79
      Paul Mackerras 提交于
      The yield count in the VPA is supposed to be incremented every time
      we enter the guest, and every time we exit the guest, so that its
      value is even when the vcpu is running in the guest and odd when it
      isn't.  However, it's currently possible that we increment the yield
      count on the way into the guest but then find that other CPU threads
      are already exiting the guest, so we go back to nap mode via the
      secondary_too_late label.  In this situation we don't increment the
      yield count again, breaking the relationship between the LSB of the
      count and whether the vcpu is in the guest.
      
      To fix this, we move the increment of the yield count to a point
      after we have checked whether other CPU threads are exiting.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      8c2dbb79
    • P
      KVM: PPC: Book3S HV: Pull out interrupt-reading code into a subroutine · c934243c
      Paul Mackerras 提交于
      This moves the code in book3s_hv_rmhandlers.S that reads any pending
      interrupt from the XICS interrupt controller, and works out whether
      it is an IPI for the guest, an IPI for the host, or a device interrupt,
      into a new function called kvmppc_read_intr.  Later patches will
      need this.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c934243c
    • P
      KVM: PPC: Book3S HV: Restructure kvmppc_hv_entry to be a subroutine · 218309b7
      Paul Mackerras 提交于
      We have two paths into and out of the low-level guest entry and exit
      code: from a vcpu task via kvmppc_hv_entry_trampoline, and from the
      system reset vector for an offline secondary thread on POWER7 via
      kvm_start_guest.  Currently both just branch to kvmppc_hv_entry to
      enter the guest, and on guest exit, we test the vcpu physical thread
      ID to detect which way we came in and thus whether we should return
      to the vcpu task or go back to nap mode.
      
      In order to make the code flow clearer, and to keep the code relating
      to each flow together, this turns kvmppc_hv_entry into a subroutine
      that follows the normal conventions for call and return.  This means
      that kvmppc_hv_entry_trampoline() and kvmppc_hv_entry() now establish
      normal stack frames, and we use the normal stack slots for saving
      return addresses rather than local_paca->kvm_hstate.vmhandler.  Apart
      from that this is mostly moving code around unchanged.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      218309b7
    • P
      KVM: PPC: Book3S HV: Implement timebase offset for guests · 93b0f4dc
      Paul Mackerras 提交于
      This allows guests to have a different timebase origin from the host.
      This is needed for migration, where a guest can migrate from one host
      to another and the two hosts might have a different timebase origin.
      However, the timebase seen by the guest must not go backwards, and
      should go forwards only by a small amount corresponding to the time
      taken for the migration.
      
      Therefore this provides a new per-vcpu value accessed via the one_reg
      interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
      defaults to 0 and is not modified by KVM.  On entering the guest, this
      value is added onto the timebase, and on exiting the guest, it is
      subtracted from the timebase.
      
      This is only supported for recent POWER hardware which has the TBU40
      (timebase upper 40 bits) register.  Writing to the TBU40 register only
      alters the upper 40 bits of the timebase, leaving the lower 24 bits
      unchanged.  This provides a way to modify the timebase for guest
      migration without disturbing the synchronization of the timebase
      registers across CPU cores.  The kernel rounds up the value given
      to a multiple of 2^24.
      
      Timebase values stored in KVM structures (struct kvm_vcpu, struct
      kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
      values in the dispatch trace log need to be guest timebase values,
      however, since that is read directly by the guest.  This moves the
      setting of vcpu->arch.dec_expires on guest exit to a point after we
      have restored the host timebase so that vcpu->arch.dec_expires is a
      host timebase value.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      93b0f4dc
    • P
      KVM: PPC: Book3S HV: Save/restore SIAR and SDAR along with other PMU registers · 14941789
      Paul Mackerras 提交于
      Currently we are not saving and restoring the SIAR and SDAR registers in
      the PMU (performance monitor unit) on guest entry and exit.  The result
      is that performance monitoring tools in the guest could get false
      information about where a program was executing and what data it was
      accessing at the time of a performance monitor interrupt.  This fixes
      it by saving and restoring these registers along with the other PMU
      registers on guest entry/exit.
      
      This also provides a way for userspace to access these values for a
      vcpu via the one_reg interface.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      14941789
  4. 10 10月, 2013 1 次提交
  5. 14 8月, 2013 2 次提交
  6. 10 7月, 2013 1 次提交
  7. 27 4月, 2013 4 次提交
    • P
      KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts · 4619ac88
      Paul Mackerras 提交于
      This streamlines our handling of external interrupts that come in
      while we're in the guest.  First, when waking up a hardware thread
      that was napping, we split off the "napping due to H_CEDE" case
      earlier, and use the code that handles an external interrupt (0x500)
      in the guest to handle that too.  Secondly, the code that handles
      those external interrupts now checks if any other thread is exiting
      to the host before bouncing an external interrupt to the guest, and
      also checks that there is actually an external interrupt pending for
      the guest before setting the LPCR MER bit (mediated external request).
      
      This also makes sure that we clear the "ceded" flag when we handle a
      wakeup from cede in real mode, and fixes a potential infinite loop
      in kvmppc_run_vcpu() which can occur if we ever end up with the ceded
      flag set but MSR[EE] off.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      4619ac88
    • B
      KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation · e7d26f28
      Benjamin Herrenschmidt 提交于
      This adds an implementation of the XICS hypercalls in real mode for HV
      KVM, which allows us to avoid exiting the guest MMU context on all
      threads for a variety of operations such as fetching a pending
      interrupt, EOI of messages, IPIs, etc.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      e7d26f28
    • B
      KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM · 54695c30
      Benjamin Herrenschmidt 提交于
      Currently, we wake up a CPU by sending a host IPI with
      smp_send_reschedule() to thread 0 of that core, which will take all
      threads out of the guest, and cause them to re-evaluate their
      interrupt status on the way back in.
      
      This adds a mechanism to differentiate real host IPIs from IPIs sent
      by KVM for guest threads to poke each other, in order to target the
      guest threads precisely when possible and avoid that global switch of
      the core to host state.
      
      We then use this new facility in the in-kernel XICS code.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      54695c30
    • P
      KVM: PPC: Book3S HV: Report VPA and DTL modifications in dirty map · c35635ef
      Paul Mackerras 提交于
      At present, the KVM_GET_DIRTY_LOG ioctl doesn't report modifications
      done by the host to the virtual processor areas (VPAs) and dispatch
      trace logs (DTLs) registered by the guest.  This is because those
      modifications are done either in real mode or in the host kernel
      context, and in neither case does the access go through the guest's
      HPT, and thus no change (C) bit gets set in the guest's HPT.
      
      However, the changes done by the host do need to be tracked so that
      the modified pages get transferred when doing live migration.  In
      order to track these modifications, this adds a dirty flag to the
      struct representing the VPA/DTL areas, and arranges to set the flag
      when the VPA/DTL gets modified by the host.  Then, when we are
      collecting the dirty log, we also check the dirty flags for the
      VPA and DTL for each vcpu and set the relevant bit in the dirty log
      if necessary.  Doing this also means we now need to keep track of
      the guest physical address of the VPA/DTL areas.
      
      So as not to lose track of modifications to a VPA/DTL area when it gets
      unregistered, or when a new area gets registered in its place, we need
      to transfer the dirty state to the rmap chain.  This adds code to
      kvmppc_unpin_guest_page() to do that if the area was dirty.  To simplify
      that code, we now require that all VPA, DTL and SLB shadow buffer areas
      fit within a single host page.  Guests already comply with this
      requirement because pHyp requires that these areas not cross a 4k
      boundary.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c35635ef
  8. 15 2月, 2013 1 次提交
  9. 06 12月, 2012 2 次提交
    • P
      KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking · b4072df4
      Paul Mackerras 提交于
      Currently, if a machine check interrupt happens while we are in the
      guest, we exit the guest and call the host's machine check handler,
      which tends to cause the host to panic.  Some machine checks can be
      triggered by the guest; for example, if the guest creates two entries
      in the SLB that map the same effective address, and then accesses that
      effective address, the CPU will take a machine check interrupt.
      
      To handle this better, when a machine check happens inside the guest,
      we call a new function, kvmppc_realmode_machine_check(), while still in
      real mode before exiting the guest.  On POWER7, it handles the cases
      that the guest can trigger, either by flushing and reloading the SLB,
      or by flushing the TLB, and then it delivers the machine check interrupt
      directly to the guest without going back to the host.  On POWER7, the
      OPAL firmware patches the machine check interrupt vector so that it
      gets control first, and it leaves behind its analysis of the situation
      in a structure pointed to by the opal_mc_evt field of the paca.  The
      kvmppc_realmode_machine_check() function looks at this, and if OPAL
      reports that there was no error, or that it has handled the error, we
      also go straight back to the guest with a machine check.  We have to
      deliver a machine check to the guest since the machine check interrupt
      might have trashed valid values in SRR0/1.
      
      If the machine check is one we can't handle in real mode, and one that
      OPAL hasn't already handled, or on PPC970, we exit the guest and call
      the host's machine check handler.  We do this by jumping to the
      machine_check_fwnmi label, rather than absolute address 0x200, because
      we don't want to re-execute OPAL's handler on POWER7.  On PPC970, the
      two are equivalent because address 0x200 just contains a branch.
      
      Then, if the host machine check handler decides that the system can
      continue executing, kvmppc_handle_exit() delivers a machine check
      interrupt to the guest -- once again to let the guest know that SRR0/1
      have been modified.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix checkpatch warnings]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b4072df4
    • P
      KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations · 1b400ba0
      Paul Mackerras 提交于
      When we change or remove a HPT (hashed page table) entry, we can do
      either a global TLB invalidation (tlbie) that works across the whole
      machine, or a local invalidation (tlbiel) that only affects this core.
      Currently we do local invalidations if the VM has only one vcpu or if
      the guest requests it with the H_LOCAL flag, though the guest Linux
      kernel currently doesn't ever use H_LOCAL.  Then, to cope with the
      possibility that vcpus moving around to different physical cores might
      expose stale TLB entries, there is some code in kvmppc_hv_entry to
      flush the whole TLB of entries for this VM if either this vcpu is now
      running on a different physical core from where it last ran, or if this
      physical core last ran a different vcpu.
      
      There are a number of problems on POWER7 with this as it stands:
      
      - The TLB invalidation is done per thread, whereas it only needs to be
        done per core, since the TLB is shared between the threads.
      - With the possibility of the host paging out guest pages, the use of
        H_LOCAL by an SMP guest is dangerous since the guest could possibly
        retain and use a stale TLB entry pointing to a page that had been
        removed from the guest.
      - The TLB invalidations that we do when a vcpu moves from one physical
        core to another are unnecessary in the case of an SMP guest that isn't
        using H_LOCAL.
      - The optimization of using local invalidations rather than global should
        apply to guests with one virtual core, not just one vcpu.
      
      (None of this applies on PPC970, since there we always have to
      invalidate the whole TLB when entering and leaving the guest, and we
      can't support paging out guest memory.)
      
      To fix these problems and simplify the code, we now maintain a simple
      cpumask of which cpus need to flush the TLB on entry to the guest.
      (This is indexed by cpu, though we only ever use the bits for thread
      0 of each core.)  Whenever we do a local TLB invalidation, we set the
      bits for every cpu except the bit for thread 0 of the core that we're
      currently running on.  Whenever we enter a guest, we test and clear the
      bit for our core, and flush the TLB if it was set.
      
      On initial startup of the VM, and when resetting the HPT, we set all the
      bits in the need_tlb_flush cpumask, since any core could potentially have
      stale TLB entries from the previous VM to use the same LPID, or the
      previous contents of the HPT.
      
      Then, we maintain a count of the number of online virtual cores, and use
      that when deciding whether to use a local invalidation rather than the
      number of online vcpus.  The code to make that decision is extracted out
      into a new function, global_invalidates().  For multi-core guests on
      POWER7 (i.e. when we are using mmu notifiers), we now never do local
      invalidations regardless of the H_LOCAL flag.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1b400ba0
  10. 30 10月, 2012 1 次提交
    • P
      KVM: PPC: Book3S HV: Fix some races in starting secondary threads · 7b444c67
      Paul Mackerras 提交于
      Subsequent patches implementing in-kernel XICS emulation will make it
      possible for IPIs to arrive at secondary threads at arbitrary times.
      This fixes some races in how we start the secondary threads, which
      if not fixed could lead to occasional crashes of the host kernel.
      
      This makes sure that (a) we have grabbed all the secondary threads,
      and verified that they are no longer in the kernel, before we start
      any thread, (b) that the secondary thread loads its vcpu pointer
      after clearing the IPI that woke it up (so we don't miss a wakeup),
      and (c) that the secondary thread clears its vcpu pointer before
      incrementing the nap count.  It also removes unnecessary setting
      of the vcpu and vcore pointers in the paca in kvmppc_core_vcpu_load.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      7b444c67
  11. 07 9月, 2012 1 次提交
  12. 16 8月, 2012 1 次提交
    • P
      KVM: PPC: Book3S HV: Fix incorrect branch in H_CEDE code · 04f995a5
      Paul Mackerras 提交于
      In handling the H_CEDE hypercall, if this vcpu has already been
      prodded (with the H_PROD hypercall, which Linux guests don't in fact
      use), we branch to a numeric label '1f'.  Unfortunately there is
      another '1:' label before the one that we want to jump to.  This fixes
      the problem by using a textual label, 'kvm_cede_prodded'.  It also
      changes the label for another longish branch from '2:' to
      'kvm_cede_exit' to avoid a possible future problem if code modifications
      add another numeric '2:' label in between.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      04f995a5
  13. 11 7月, 2012 1 次提交
    • A
      powerpc: Add VDSO version of getcpu · 18ad51dd
      Anton Blanchard 提交于
      We have a request for a fast method of getting CPU and NUMA node IDs
      from userspace. This patch implements a getcpu VDSO function,
      similar to x86.
      
      Ben suggested we use SPRG3 which is userspace readable. SPRG3 can be
      modified by a KVM guest, so we save the SPRG3 value in the paca and
      restore it when transitioning from the guest to the host.
      
      I have a glibc patch that implements sched_getcpu on top of this.
      Testing on a POWER7:
      
      baseline: 538 cycles
      vdso:      30 cycles
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      18ad51dd
  14. 10 7月, 2012 2 次提交
  15. 02 7月, 2012 1 次提交
  16. 08 4月, 2012 2 次提交
    • P
      KVM: PPC: Work around POWER7 DABR corruption problem · 8943633c
      Paul Mackerras 提交于
      It turns out that on POWER7, writing to the DABR can cause a corrupted
      value to be written if the PMU is active and updating SDAR in continuous
      sampling mode.  To work around this, we make sure that the PMU is inactive
      and SDAR updates are disabled (via MMCRA) when we are context-switching
      DABR.
      
      When the guest sets DABR via the H_SET_DABR hypercall, we use a slightly
      different workaround, which is to read back the DABR and write it again
      if it got corrupted.
      
      While we are at it, make it consistent that the saving and restoring
      of the guest's non-volatile GPRs and the FPRs are done with the guest
      setup of the PMU active.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      8943633c
    • P
      KVM: PPC: Book3S HV: Make secondary threads more robust against stray IPIs · f0888f70
      Paul Mackerras 提交于
      Currently on POWER7, if we are running the guest on a core and we don't
      need all the hardware threads, we do nothing to ensure that the unused
      threads aren't executing in the kernel (other than checking that they
      are offline).  We just assume they're napping and we don't do anything
      to stop them trying to enter the kernel while the guest is running.
      This means that a stray IPI can wake up the hardware thread and it will
      then try to enter the kernel, but since the core is in guest context,
      it will execute code from the guest in hypervisor mode once it turns the
      MMU on, which tends to lead to crashes or hangs in the host.
      
      This fixes the problem by adding two new one-byte flags in the
      kvmppc_host_state structure in the PACA which are used to interlock
      between the primary thread and the unused secondary threads when entering
      the guest.  With these flags, the primary thread can ensure that the
      unused secondaries are not already in kernel mode (i.e. handling a stray
      IPI) and then indicate that they should not try to enter the kernel
      if they do get woken for any reason.  Instead they will go into KVM code,
      find that there is no vcpu to run, acknowledge and clear the IPI and go
      back to nap mode.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      f0888f70
  17. 05 3月, 2012 3 次提交
    • P
      KVM: PPC: Allow for read-only pages backing a Book3S HV guest · 4cf302bc
      Paul Mackerras 提交于
      With this, if a guest does an H_ENTER with a read/write HPTE on a page
      which is currently read-only, we make the actual HPTE inserted be a
      read-only version of the HPTE.  We now intercept protection faults as
      well as HPTE not found faults, and for a protection fault we work out
      whether it should be reflected to the guest (e.g. because the guest HPTE
      didn't allow write access to usermode) or handled by switching to
      kernel context and calling kvmppc_book3s_hv_page_fault, which will then
      request write access to the page and update the actual HPTE.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      4cf302bc
    • P
      KVM: PPC: Implement MMU notifiers for Book3S HV guests · 342d3db7
      Paul Mackerras 提交于
      This adds the infrastructure to enable us to page out pages underneath
      a Book3S HV guest, on processors that support virtualized partition
      memory, that is, POWER7.  Instead of pinning all the guest's pages,
      we now look in the host userspace Linux page tables to find the
      mapping for a given guest page.  Then, if the userspace Linux PTE
      gets invalidated, kvm_unmap_hva() gets called for that address, and
      we replace all the guest HPTEs that refer to that page with absent
      HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit
      set, which will cause an HDSI when the guest tries to access them.
      Finally, the page fault handler is extended to reinstantiate the
      guest HPTE when the guest tries to access a page which has been paged
      out.
      
      Since we can't intercept the guest DSI and ISI interrupts on PPC970,
      we still have to pin all the guest pages on PPC970.  We have a new flag,
      kvm->arch.using_mmu_notifiers, that indicates whether we can page
      guest pages out.  If it is not set, the MMU notifier callbacks do
      nothing and everything operates as before.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      342d3db7
    • P
      KVM: PPC: Implement MMIO emulation support for Book3S HV guests · 697d3899
      Paul Mackerras 提交于
      This provides the low-level support for MMIO emulation in Book3S HV
      guests.  When the guest tries to map a page which is not covered by
      any memslot, that page is taken to be an MMIO emulation page.  Instead
      of inserting a valid HPTE, we insert an HPTE that has the valid bit
      clear but another hypervisor software-use bit set, which we call
      HPTE_V_ABSENT, to indicate that this is an absent page.  An
      absent page is treated much like a valid page as far as guest hcalls
      (H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
      an absent HPTE doesn't need to be invalidated with tlbie since it
      was never valid as far as the hardware is concerned.
      
      When the guest accesses a page for which there is an absent HPTE, it
      will take a hypervisor data storage interrupt (HDSI) since we now set
      the VPM1 bit in the LPCR.  Our HDSI handler for HPTE-not-present faults
      looks up the hash table and if it finds an absent HPTE mapping the
      requested virtual address, will switch to kernel mode and handle the
      fault in kvmppc_book3s_hv_page_fault(), which at present just calls
      kvmppc_hv_emulate_mmio() to set up the MMIO emulation.
      
      This is based on an earlier patch by Benjamin Herrenschmidt, but since
      heavily reworked.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      697d3899
  18. 08 12月, 2011 1 次提交
    • P
      powerpc: Provide a way for KVM to indicate that NV GPR values are lost · 2fde6d20
      Paul Mackerras 提交于
      This fixes a problem where a CPU thread coming out of nap mode can
      think it has valid values in the nonvolatile GPRs (r14 - r31) as saved
      away in power7_idle, but in fact the values have been trashed because
      the thread was used for KVM in the mean time.  The result is that the
      thread crashes because code that called power7_idle (e.g.,
      pnv_smp_cpu_kill_self()) goes to use values in registers that have
      been trashed.
      
      The bit field in SRR1 that tells whether state was lost only reflects
      the most recent nap, which may not have been the nap instruction in
      power7_idle.  So we need an extra PACA field to indicate that state
      has been lost even if SRR1 indicates that the most recent nap didn't
      lose state.  We clear this field when saving the state in power7_idle,
      we set it to a non-zero value when we use the thread for KVM, and we
      test it in power7_wakeup_noloss.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      2fde6d20
  19. 08 11月, 2011 1 次提交
  20. 26 9月, 2011 1 次提交
    • P
      KVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code · 19ccb76a
      Paul Mackerras 提交于
      With a KVM guest operating in SMT4 mode (i.e. 4 hardware threads per
      core), whenever a CPU goes idle, we have to pull all the other
      hardware threads in the core out of the guest, because the H_CEDE
      hcall is handled in the kernel.  This is inefficient.
      
      This adds code to book3s_hv_rmhandlers.S to handle the H_CEDE hcall
      in real mode.  When a guest vcpu does an H_CEDE hcall, we now only
      exit to the kernel if all the other vcpus in the same core are also
      idle.  Otherwise we mark this vcpu as napping, save state that could
      be lost in nap mode (mainly GPRs and FPRs), and execute the nap
      instruction.  When the thread wakes up, because of a decrementer or
      external interrupt, we come back in at kvm_start_guest (from the
      system reset interrupt vector), find the `napping' flag set in the
      paca, and go to the resume path.
      
      This has some other ramifications.  First, when starting a core, we
      now start all the threads, both those that are immediately runnable and
      those that are idle.  This is so that we don't have to pull all the
      threads out of the guest when an idle thread gets a decrementer interrupt
      and wants to start running.  In fact the idle threads will all start
      with the H_CEDE hcall returning; being idle they will just do another
      H_CEDE immediately and go to nap mode.
      
      This required some changes to kvmppc_run_core() and kvmppc_run_vcpu().
      These functions have been restructured to make them simpler and clearer.
      We introduce a level of indirection in the wait queue that gets woken
      when external and decrementer interrupts get generated for a vcpu, so
      that we can have the 4 vcpus in a vcore using the same wait queue.
      We need this because the 4 vcpus are being handled by one thread.
      
      Secondly, when we need to exit from the guest to the kernel, we now
      have to generate an IPI for any napping threads, because an HDEC
      interrupt doesn't wake up a napping thread.
      
      Thirdly, we now need to be able to handle virtual external interrupts
      and decrementer interrupts becoming pending while a thread is napping,
      and deliver those interrupts to the guest when the thread wakes.
      This is done in kvmppc_cede_reentry, just before fast_guest_return.
      
      Finally, since we are not using the generic kvm_vcpu_block for book3s_hv,
      and hence not calling kvm_arch_vcpu_runnable, we can remove the #ifdef
      from kvm_arch_vcpu_runnable.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      19ccb76a