1. 31 Jan, 2017 16 commits
    • KVM: PPC: Book3S HV: Gather HPT related variables into sub-structure · 3f9d4f5a
      David Gibson committed
      Currently, the powerpc kvm_arch structure contains a number of variables
      tracking the state of the guest's hashed page table (HPT) in KVM HV.  This
      patch gathers them all together into a single kvm_hpt_info substructure.
      This makes life more convenient for the upcoming HPT resizing
      implementation.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Rename kvm_alloc_hpt() for clarity · db9a290d
      David Gibson committed
      The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at
      all obvious from the name.  In practice kvmppc_alloc_hpt() allocates an HPT
      by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate
      it with CMA only.
      
      To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma().
      Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma().
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Enable radix guest support · 8cf4ecc0
      Paul Mackerras committed
      This adds a few last pieces of the support for radix guests:
      
      * Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and
        KVM_PPC_GET_RMMU_INFO ioctls for radix guests
      
      * On POWER9, allow secondary threads to be on/off-lined while guests
        are running.
      
      * Set up LPCR and the partition table entry for radix guests.
      
      * Don't allocate the rmap array in the kvm_memory_slot structure
        on radix.
      
      * Don't try to initialize the HPT for radix guests, since they don't
        have an HPT.
      
      * Take out the code that prevents the HV KVM module from
        initializing on radix hosts.
      
      At this stage, we only support radix guests if the host is running
      in radix mode, and only support HPT guests if the host is running in
      HPT mode.  Thus a guest cannot switch from one mode to the other,
      which enables some simplifications.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Invalidate ERAT on guest entry/exit for POWER9 DD1 · f11f6f79
      Paul Mackerras committed
      On POWER9 DD1, we need to invalidate the ERAT (effective to real
      address translation cache) when changing the PIDR register, which
      we do as part of guest entry and exit.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Allow guest exit path to have MMU on · 53af3ba2
      Paul Mackerras committed
      If we allow LPCR[AIL] to be set for radix guests, then interrupts from
      the guest to the host can be delivered by the hardware with relocation
      on, and thus the code path starting at kvmppc_interrupt_hv can be
      executed in virtual mode (MMU on) for radix guests (previously it was
      only ever executed in real mode).
      
      Most of the code is indifferent to whether the MMU is on or off, but
      the calls to OPAL that use the real-mode OPAL entry code need to
      be switched to use the virtual-mode code instead.  The affected
      calls are the calls to the OPAL XICS emulation functions in
      kvmppc_read_one_intr() and related functions.  We test the MSR[IR]
      bit to detect whether we are in real or virtual mode, and call the
      opal_rm_* or opal_* function as appropriate.
      
      The other place that depends on the MMU being off is the optimization
      where the guest exit code jumps to the external interrupt vector or
      hypervisor doorbell interrupt vector, or returns to its caller (which
      is __kvmppc_vcore_entry).  If the MMU is on and we are returning to
      the caller, then we don't need to use an rfid instruction since the
      MMU is already on; a simple blr suffices.  If there is an external
      or hypervisor doorbell interrupt to handle, we branch to the
      relocation-on version of the interrupt vector.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
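The MSR[IR] test described in this commit message can be pictured with a minimal C sketch. It is not the kernel code: the enum and the function name are hypothetical, and only the idea (instruction relocation on means virtual mode, so the virtual-mode OPAL entry is used) comes from the message. The bit value 0x20 for MSR[IR] is stated here as an assumption of the sketch.

```c
#include <stdint.h>

/* Assumed value of the MSR instruction-relocate bit for this sketch. */
#define MSR_IR_SKETCH 0x20ULL

/* Hypothetical tags for the two families of OPAL entry points. */
enum opal_entry_sketch {
    OPAL_ENTRY_REAL,   /* stands in for the opal_rm_* functions */
    OPAL_ENTRY_VIRT    /* stands in for the opal_* functions */
};

/* Pick the entry family from the current MSR value. */
static enum opal_entry_sketch pick_opal_entry(uint64_t msr)
{
    return (msr & MSR_IR_SKETCH) ? OPAL_ENTRY_VIRT : OPAL_ENTRY_REAL;
}
```

A caller would read the MSR once on the exit path and dispatch on the returned tag.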
    • KVM: PPC: Book3S HV: Invalidate TLB on radix guest vcpu movement · a29ebeaf
      Paul Mackerras committed
      With radix, the guest can do TLB invalidations itself using the tlbie
      (global) and tlbiel (local) TLB invalidation instructions.  Linux guests
      use local TLB invalidations for translations that have only ever been
      accessed on one vcpu.  However, that doesn't mean that the translations
      have only been accessed on one physical cpu (pcpu) since vcpus can move
      around from one pcpu to another.  Thus a tlbiel might leave behind stale
      TLB entries on a pcpu where the vcpu previously ran, and if that task
      then moves back to that previous pcpu, it could see those stale TLB
      entries and thus access memory incorrectly.  The usual symptom of this
      is random segfaults in userspace programs in the guest.
      
      To cope with this, we detect when a vcpu is about to start executing on
      a thread in a core that is a different core from the last time it
      executed.  If that is the case, then we mark the core as needing a
      TLB flush and then send an interrupt to any thread in the core that is
      currently running a vcpu from the same guest.  This will get those vcpus
      out of the guest, and the first one to re-enter the guest will do the
      TLB flush.  The reason for interrupting the vcpus executing on the old
      core is to cope with the following scenario:
      
      CPU 0                   CPU 1                   CPU 4
      (core 0)                (core 0)                (core 1)

      VCPU 0 runs task X      VCPU 1 runs
      core 0 TLB gets
      entries from task X
      VCPU 0 moves to CPU 4
                                                      VCPU 0 runs task X
                                                      Unmap pages of task X
                                                      tlbiel

                              (still VCPU 1)          task X moves to VCPU 1
                              task X runs
                              task X sees stale TLB
                              entries
      
      That is, as soon as the VCPU starts executing on the new core, it
      could unmap and tlbiel some page table entries, and then the task
      could migrate to one of the VCPUs running on the old core and
      potentially see stale TLB entries.
      
      Since the TLB is shared between all the threads in a core, we only
      use the bit of kvm->arch.need_tlb_flush corresponding to the first
      thread in the core.  To ensure that we don't have a window where we
      can miss a flush, this moves the clearing of the bit from before the
      actual flush to after it.  This way, two threads might both do the
      flush, but we prevent the situation where one thread can enter the
      guest before the flush is finished.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
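The flush bookkeeping described above can be sketched in plain C. This is a simplified model, not the kernel implementation: the struct, the function names, the fixed four threads per core, and the flush counter are all assumptions made for illustration. The sketch marks the old core (as the scenario in the message suggests) and clears the bit only after the simulated flush; the interrupt sent to threads on the old core is not modelled.

```c
#include <stdint.h>

#define THREADS_PER_CORE_SKETCH 4   /* assumption for this sketch */

/* Hypothetical stand-in for the per-guest state: one bit per CPU thread. */
struct guest_state {
    uint64_t need_tlb_flush;
    unsigned int flush_count;       /* counts simulated TLB flushes */
};

/* Number of the first thread in the core that contains 'cpu'. */
static int core_first_thread(int cpu)
{
    return cpu - (cpu % THREADS_PER_CORE_SKETCH);
}

/* A vcpu that last ran on prev_cpu is about to run on new_cpu. */
static void vcpu_about_to_run(struct guest_state *g, int prev_cpu, int new_cpu)
{
    if (prev_cpu >= 0 &&
        core_first_thread(prev_cpu) != core_first_thread(new_cpu))
        g->need_tlb_flush |= 1ULL << core_first_thread(prev_cpu);
}

/* Guest entry on 'cpu': flush first, then clear the core's bit. */
static void guest_entry(struct guest_state *g, int cpu)
{
    uint64_t bit = 1ULL << core_first_thread(cpu);

    if (g->need_tlb_flush & bit) {
        g->flush_count++;               /* stands in for the real flush */
        g->need_tlb_flush &= ~bit;      /* cleared only after the flush */
    }
}
```

Because the bit is cleared after the flush, two threads entering at the same time may both flush, which is redundant but safe.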
    • KVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix mode · 65dae540
      Paul Mackerras committed
      If the guest is in radix mode, then it doesn't have a hashed page
      table (HPT), so all of the hypercalls that manipulate the HPT can't
      work and should return an error.  This adds checks to make them
      return H_FUNCTION ("function not supported").
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
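The kind of guard this commit adds might look like the following sketch. The struct and handler name are hypothetical, and the numeric values used for H_SUCCESS and H_FUNCTION are assumptions of the sketch rather than something stated in the message.

```c
/* Assumed hypercall return codes for this sketch. */
#define H_SUCCESS_SKETCH   0
#define H_FUNCTION_SKETCH  (-2)

/* Hypothetical stand-in for the per-VM state. */
struct kvm_sketch {
    int radix;   /* non-zero when the guest uses the radix MMU */
};

/* A hypothetical HPT-manipulating hypercall handler. */
static long h_hpt_op_sketch(const struct kvm_sketch *kvm)
{
    if (kvm->radix)
        return H_FUNCTION_SKETCH;   /* no HPT exists in radix mode */

    /* ... the HPT update would happen here ... */
    return H_SUCCESS_SKETCH;
}
```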
    • KVM: PPC: Book3S HV: Implement dirty page logging for radix guests · 8f7b79b8
      Paul Mackerras committed
      This adds code to keep track of dirty pages when requested (that is,
      when memslot->dirty_bitmap is non-NULL) for radix guests.  We use the
      dirty bits in the PTEs in the second-level (partition-scoped) page
      tables, together with a bitmap of pages that were dirty when their
      PTE was invalidated (e.g., when the page was paged out).  This bitmap
      is stored in the first half of the memslot->dirty_bitmap area, and
      kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the
      bitmap that gets returned to userspace.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
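The two-halves layout of the dirty bitmap can be illustrated with a small sketch. It assumes the area holds 2 * nwords words, that PTE dirty bits have already been harvested into a separate array (pte_dirty, a hypothetical input), and that the saved half is cleared once reported; none of these details are confirmed by the message.

```c
#include <stddef.h>

/*
 * dirty_bitmap: first nwords words hold pages that were dirty when their
 * PTE was invalidated; the second nwords words receive the result that
 * would be returned to userspace.
 * pte_dirty: hypothetical array of dirty bits harvested from the PTEs.
 */
static unsigned long *collect_dirty_log(unsigned long *dirty_bitmap,
                                        size_t nwords,
                                        const unsigned long *pte_dirty)
{
    unsigned long *saved = dirty_bitmap;
    unsigned long *out = dirty_bitmap + nwords;
    size_t i;

    for (i = 0; i < nwords; i++) {
        out[i] = saved[i] | pte_dirty[i];   /* merge both sources */
        saved[i] = 0;                       /* assumed: reset after reporting */
    }
    return out;
}
```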
    • KVM: PPC: Book3S HV: MMU notifier callbacks for radix guests · 01756099
      Paul Mackerras committed
      This adapts our implementations of the MMU notifier callbacks
      (unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva)
      to call radix functions when the guest is using radix.  These
      implementations are much simpler than for HPT guests because we
      have only one PTE to deal with, so we don't need to traverse
      rmap chains.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Page table construction and page faults for radix guests · 5a319350
      Paul Mackerras committed
      This adds the code to construct the second-level ("partition-scoped" in
      architecturese) page tables for guests using the radix MMU.  Apart from
      the PGD level, which is allocated when the guest is created, the rest
      of the tree is all constructed in response to hypervisor page faults.
      
      As well as hypervisor page faults for missing pages, we also get faults
      for reference/change (RC) bits needing to be set, as well as various
      other error conditions.  For now, we only set the R or C bit in the
      guest page table if the same bit is set in the host PTE for the
      backing page.
      
      This code can take advantage of the guest being backed with either
      transparent or ordinary 2MB huge pages, and insert 2MB page entries
      into the guest page tables.  There is no support for 1GB huge pages
      yet.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Modify guest entry/exit paths to handle radix guests · f4c51f84
      Paul Mackerras committed
      This adds code to branch around the parts that radix guests don't
      need - clearing and loading the SLB with the guest SLB contents,
      saving the guest SLB contents on exit, and restoring the host SLB
      contents.
      
      Since the host is now using radix, we need to save and restore the
      host value for the PID register.
      
      On hypervisor data/instruction storage interrupts, we don't do the
      guest HPT lookup on radix, but just save the guest physical address
      for the fault (from the ASDR register) in the vcpu struct.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Add basic infrastructure for radix guests · 9e04ba69
      Paul Mackerras committed
      This adds a field in struct kvm_arch and an inline helper to
      indicate whether a guest is a radix guest or not, plus a new file
      to contain the radix MMU code, which currently contains just a
      translate function which knows how to traverse the guest page
      tables to translate an address.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Use ASDR for HPT guests on POWER9 · ef8c640c
      Paul Mackerras committed
      POWER9 adds a register called ASDR (Access Segment Descriptor
      Register), which is set by hypervisor data/instruction storage
      interrupts to contain the segment descriptor for the address
      being accessed, assuming the guest is using HPT translation.
      (For radix guests, it contains the guest real address of the
      access.)
      
      Thus, for HPT guests on POWER9, we can use this register rather
      than looking up the SLB with the slbfee. instruction.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9 · 468808bd
      Paul Mackerras committed
      This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl
      for HPT guests on POWER9.  With this, we can return 1 for the
      KVM_CAP_PPC_MMU_HASH_V3 capability.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU · c9270132
      Paul Mackerras committed
      This adds two capabilities and two ioctls to allow userspace to
      find out about and configure the POWER9 MMU in a guest.  The two
      capabilities tell userspace whether KVM can support a guest using
      the radix MMU, or using the hashed page table (HPT) MMU with a
      process table and segment tables.  (Note that the MMUs in the
      POWER9 processor cores do not use the process and segment tables
      when in HPT mode, but the nest MMU does).
      
      The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
      whether a guest will use the radix MMU or the HPT MMU, and to
      specify the size and location (in guest space) of the process
      table.
      
      The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
      the radix MMU.  It returns a list of supported radix tree geometries
      (base page size and number of bits indexed at each level of the
      radix tree) and the encoding used to specify the various page
      sizes for the TLB invalidate entry instruction.
      
      Initially, both capabilities return 0 and the ioctls return -EINVAL,
      until the necessary infrastructure for them to operate correctly
      is added.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S: 64-bit CONFIG_RELOCATABLE support for interrupts · a97a65d5
      Nicholas Piggin committed
      64-bit Book3S exception handlers must find the dynamic kernel base
      to add to the target address when branching beyond __end_interrupts,
      in order to support a kernel running at a non-0 physical address.
      
      Support this in KVM by branching with CTR, similarly to regular
      interrupt handlers. The guest CTR is saved in HSTATE_SCRATCH1 and
      restored after the branch.
      
      Without this, the host kernel hangs and crashes randomly when it is
      running at a non-0 address and a KVM guest is started.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 27 Jan, 2017 9 commits
    • KVM: PPC: Book3S PR: Refactor program interrupt related code into separate function · fcd4f3c6
      Thomas Huth committed
      The function kvmppc_handle_exit_pr() is quite huge and thus hard to read,
      and even contains a "spaghetti-code"-like goto between the different case
      labels of the big switch statement. This can be made much more readable
      by moving the code related to injecting program interrupts / instruction
      emulation into a separate function instead.
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Fix H_PROD to actually wake the target vcpu · 8464c884
      Paul Mackerras committed
      The H_PROD hypercall is supposed to wake up an idle vcpu.  We have
      an implementation, but because Linux doesn't use it except when
      doing cpu hotplug, it was never tested properly.  AIX does use it,
      and reported it broken.  It turns out we were waking the wrong
      vcpu (the one doing H_PROD, not the target of the prod) and we
      weren't handling the case where the target needs an IPI to wake
      it.  Fix it by using the existing kvmppc_fast_vcpu_kick_hv()
      function, which is intended for this kind of thing, and by using
      the target vcpu not the current vcpu.
      
      We were also not looking at the prodded flag when checking whether a
      ceded vcpu should wake up, so this adds checks for the prodded flag
      alongside the checks for pending exceptions.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S: Change interrupt call to reduce scratch space use on HV · d3918e7f
      Nicholas Piggin committed
      Change the calling convention to put the trap number together with
      CR in two halves of r12, which frees up HSTATE_SCRATCH2 in the HV
      handler.
      
      The 64-bit PR handler entry translates the calling convention back
      to match the previous call convention (i.e., shared with 32-bit), for
      simplicity.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
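Packing two 32-bit values into one 64-bit register, as this calling convention does with r12, can be sketched as follows. The message only says the trap number and CR share the two halves of r12; placing CR in the upper half and the trap number in the lower half is an assumption of this sketch, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: CR in the upper 32 bits, trap number in the lower 32. */
static inline uint64_t pack_r12(uint32_t cr, uint32_t trap)
{
    return ((uint64_t)cr << 32) | trap;
}

/* Recover the trap number from the combined value. */
static inline uint32_t r12_trap(uint64_t r12)
{
    return (uint32_t)r12;
}

/* Recover the saved CR from the combined value. */
static inline uint32_t r12_cr(uint64_t r12)
{
    return (uint32_t)(r12 >> 32);
}
```

Keeping both values in one register is what frees the second scratch slot: the handler no longer needs a separate location to hold the trap number.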
    • KVM: PPC: Book 3S: XICS: Don't lock twice when checking for resend · 21acd0e4
      Li Zhong committed
      The existing code takes the lock twice: once to check the resend flag
      and once to do the actual resending. This patch checks the resend flag
      locklessly instead, and adds a boolean parameter check_resend to
      icp_[rm_]deliver_irq() so that the flag can be rechecked under the lock
      when doing the delivery.
      
      When we clear the ICS's bit in the ICP's resend_map, we must not miss
      the resend flag of the irqs that set the bit. The ordering is provided
      by the barrier in test_and_clear_bit() and by a newly added wmb between
      setting the irq's resend flag and setting the bit in the ICP's resend_map.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book 3S: XICS: Implement ICS P/Q states · 17d48610
      Li Zhong committed
      This patch implements P(Presented)/Q(Queued) states for ICS irqs.
      
      When the interrupt is presented, set P. Present if P was not set.
      If P is already set, don't present again, set Q.
      When the interrupt is EOI'ed, move Q into P (and clear Q). If it is
      set, re-present.
      
      The asserted flag used by LSI is also incorporated into the P bit.
      
      When the irq state is saved, the P/Q bits are saved as well; QEMU needs
      some modifications to recognize them and pass them around so that they
      can be restored. A saved KVM_XICS_PENDING bit should imply that
      KVM_XICS_PRESENTED is set and saved too. However, some old code may not
      have or recognize the P bit, so on restore we also set P when the
      PENDING bit is set.
      
      The idea and much of the code come from Ben.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
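The P/Q rules stated in this commit message amount to a two-bit state machine, sketched below. The struct and function names are hypothetical; each function returns whether the interrupt should be presented as a result of the event. LSI handling and save/restore are left out.

```c
#include <stdbool.h>

/* Hypothetical per-irq state holding the Presented and Queued bits. */
struct irq_pq {
    bool p;
    bool q;
};

/* The source raises the interrupt. Returns true if it should be presented. */
static bool irq_raise(struct irq_pq *s)
{
    if (!s->p) {
        s->p = true;    /* first occurrence: present it */
        return true;
    }
    s->q = true;        /* already presented: just queue it */
    return false;
}

/* The guest EOIs the interrupt. Returns true if it must be re-presented. */
static bool irq_eoi(struct irq_pq *s)
{
    s->p = s->q;        /* move Q into P */
    s->q = false;       /* and clear Q */
    return s->p;
}
```

With this scheme an interrupt that fires several times before the EOI is presented at most once more afterwards.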
    • KVM: PPC: Book 3S: XICS: Fix potential issue with duplicate IRQ resends · bf5a71d5
      Li Zhong committed
      It is possible that in the following order, one irq is resent twice:
      
              CPU 1                                   CPU 2
      
      ics_check_resend()
        lock ics_lock
          see resend set
        unlock ics_lock
                                             /* change affinity of the irq */
                                             kvmppc_xics_set_xive()
                                               write_xive()
                                                 lock ics_lock
                                                   see resend set
                                                 unlock ics_lock
      
                                               icp_deliver_irq() /* resend */
      
        icp_deliver_irq() /* resend again */
      
      It doesn't have any user-visible effect at present, but needs to be avoided
      when the following patch implementing the P/Q stuff is applied.
      
      This patch clears the resend flag before releasing the ics lock, when we
      know we will do a re-delivery after checking the flag, or setting the flag.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book 3S: XICS: correct the real mode ICP rejecting counter · 37451bc9
      Li Zhong committed
      Commit 6e0365b7 ("KVM: PPC: Book3S HV: Add ICP real mode counters")
      added some counters to provide performance statistics for determining
      whether the real mode functions need further optimization.
      
      The n_reject counter counts how many times the ICP rejects an irq because
      of priority in real mode. The redelivery of an LSI that is still asserted
      after EOI doesn't fall into this category, so the increment there is
      removed.
      
      Also, the counter needs to be incremented in icp_rm_deliver_irq() if it
      rejects another irq.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book 3S: XICS cleanup: remove XICS_RM_REJECT · 5efa6605
      Li Zhong committed
      Commit b0221556 ("KVM: PPC: Book3S HV: Move virtual mode ICP functions
      to real-mode") removed the setting of the XICS_RM_REJECT flag. Since
      that commit, nothing else sets the flag any more, so we can remove
      the flag and the remaining code that handles it, including the counter
      that counts how many times it gets set.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Don't try to signal cpu -1 · 3deda5e5
      Paul Mackerras committed
      If the target vcpu for kvmppc_fast_vcpu_kick_hv() is not running on
      any CPU, then we will have vcpu->arch.thread_cpu == -1, and as it
      happens, kvmppc_fast_vcpu_kick_hv will call kvmppc_ipi_thread with
      -1 as the cpu argument.  Although this is not meaningful, in the past,
      before commit 1704a81c ("KVM: PPC: Book3S HV: Use msgsnd for IPIs
      to other cores on POWER9", 2016-11-18), it was harmless because CPU
      -1 is not in the same core as any real CPU thread.  On a POWER9,
      however, we don't do the "same core" check, so we were trying to
      do a msgsnd to thread -1, which is invalid.  To avoid this, we add
      a check to see that vcpu->arch.thread_cpu is >= 0 before calling
      kvmppc_ipi_thread() with it.  Since vcpu->arch.thread_cpu can change
      asynchronously, we use READ_ONCE to ensure that the value we check is
      the same value that we use as the argument to kvmppc_ipi_thread().
      
      Fixes: 1704a81c ("KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9")
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
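The check-then-use pattern described here, reading a field that may change asynchronously exactly once, can be sketched in user-space C. The macro below imitates the effect of the kernel's READ_ONCE with a volatile access; the struct, the recording variable, and the function names are hypothetical stand-ins for illustration.

```c
/* User-space imitation of a single, non-repeated read of an int. */
#define READ_ONCE_INT(x) (*(const volatile int *)&(x))

/* Hypothetical stand-in for the vcpu state. */
struct vcpu_sketch {
    int thread_cpu;     /* -1 when the vcpu is not running anywhere */
};

/* Records the last CPU that was signalled; -2 means "none yet". */
static int last_ipi_cpu = -2;

/* Stand-in for the IPI-sending function. */
static void ipi_thread_sketch(int cpu)
{
    last_ipi_cpu = cpu;
}

/* Kick the vcpu only if it is running on a valid CPU. Returns 1 if kicked. */
static int kick_vcpu_sketch(struct vcpu_sketch *v)
{
    int cpu = READ_ONCE_INT(v->thread_cpu);   /* read once, then reuse */

    if (cpu >= 0) {
        ipi_thread_sketch(cpu);               /* same value that was checked */
        return 1;
    }
    return 0;
}
```

Copying the field into a local first guarantees that the value tested against 0 is the value passed on, even if the field changes in between.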
  3. 26 Dec, 2016 1 commit
    • ktime: Cleanup ktime_set() usage · 8b0e1953
      Thomas Gleixner committed
      ktime_set(S,N) was required for the timespec storage type and is still
      useful for situations where a Seconds and Nanoseconds part of a time value
      needs to be converted. For anything where the Seconds argument is 0, this
      is pointless and can be replaced with a simple assignment.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
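Why a zero seconds argument makes the call pointless can be seen from a simplified model of the conversion. The type and function below are sketch names, not the kernel's definitions, and the model assumes a time value stored as a signed 64-bit nanosecond count with no overflow handling.

```c
#include <stdint.h>

/* Simplified model: a time value is a signed 64-bit nanosecond count. */
typedef int64_t ktime_sketch_t;

#define NSEC_PER_SEC_SKETCH 1000000000LL

/* Combine a seconds part and a nanoseconds part into one value. */
static inline ktime_sketch_t ktime_set_sketch(int64_t secs, unsigned long nsecs)
{
    return secs * NSEC_PER_SEC_SKETCH + (int64_t)nsecs;
}
```

With secs equal to 0 the result is just nsecs, so a plain assignment of the nanosecond value does the same job.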
  4. 25 Dec, 2016 1 commit
  5. 02 Dec, 2016 1 commit
  6. 01 Dec, 2016 1 commit
  7. 28 Nov, 2016 3 commits
  8. 24 Nov, 2016 8 commits
    • KVM: PPC: Correctly report KVM_CAP_PPC_ALLOC_HTAB · a8acaece
      David Gibson committed
      At present KVM on powerpc always reports KVM_CAP_PPC_ALLOC_HTAB as enabled.
      However, the ioctl() it advertises (KVM_PPC_ALLOCATE_HTAB) only actually
      works on KVM HV.  On KVM PR it will fail with ENOTTY.
      
      QEMU already has a workaround for this, so it's not breaking things in
      practice, but it would be better to advertise this correctly.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Fix compilation with unusual configurations · e2702871
      Paul Mackerras committed
      This adds the "again" parameter to the dummy version of
      kvmppc_check_passthru(), so that it matches the real version.
      This fixes compilation with CONFIG_BOOK3S_64_HV set but
      CONFIG_KVM_XICS=n.
      
      This includes asm/smp.h in book3s_hv_builtin.c to fix compilation
      with CONFIG_SMP=n.  The explicit inclusion is necessary to provide
      definitions of hard_smp_processor_id() and get_hard_smp_processor_id()
      in UP configs.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Update kvmppc_set_arch_compat() for ISA v3.00 · 2ee13be3
      Suraj Jitindar Singh committed
      The function kvmppc_set_arch_compat() is used to determine the value of the
      processor compatibility register (PCR) for a guest running in a given
      compatibility mode. There is currently no support for v3.00 of the ISA.
      
      Add support for v3.00 of the ISA, which adds an ISA v2.07 compatibility mode
      to the PCR.
      
      We also add a check to ensure the processor we are running on is capable of
      emulating the chosen processor (for example a POWER7 cannot emulate a
      POWER8, similarly with a POWER8 and a POWER9).
      
      Based on work by: Paul Mackerras <paulus@ozlabs.org>
      
      [paulus@ozlabs.org - moved dummy PCR_ARCH_300 definition here; set
       guest_pcr_bit when arch_compat == 0, added comment.]
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Treat POWER9 CPU threads as independent subcores · 45c940ba
      Paul Mackerras committed
      With POWER9, each CPU thread has its own MMU context and can be
      in the host or a guest independently of the other threads; there is
      still however a restriction that all threads must use the same type
      of address translation, either radix tree or hashed page table (HPT).
      
      Since we only support HPT guests on a HPT host at this point, we
      can treat the threads as being independent, and avoid all of the
      work of coordinating the CPU threads.  To make this simpler, we
      introduce a new threads_per_vcore() function that returns 1 on
      POWER9 and threads_per_subcore on POWER7/8, and use that instead
      of threads_per_subcore or threads_per_core in various places.
      
      This also changes the value of the KVM_CAP_PPC_SMT capability on
      POWER9 systems from 4 to 1, so that userspace will not try to
      create VMs with multiple vcpus per vcore.  (If userspace did create
      a VM that thought it was in an SMT mode, the VM might try to use
      the msgsndp instruction, which will not work as expected.  In
      future it may be possible to trap and emulate msgsndp in order to
      allow VMs to think they are in an SMT mode, if only for the purpose
      of allowing migration from POWER8 systems.)
      
      With all this, we can now run guests on POWER9 as long as the host
      is running with HPT translation.  Since userspace currently has no
      way to request radix tree translation for the guest, the guest has
      no choice but to use HPT translation.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
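The helper described in this message reduces to a single conditional. The sketch below passes the CPU generation and the subcore thread count in as parameters so that it is self-contained; the function name and that parameterisation are assumptions of the sketch, not the kernel's signature.

```c
#include <stdbool.h>

/*
 * Number of threads that make up one virtual core.
 * is_power9 and threads_per_subcore are explicit inputs here purely
 * for illustration.
 */
static int threads_per_vcore_sketch(bool is_power9, int threads_per_subcore)
{
    return is_power9 ? 1 : threads_per_subcore;
}
```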
    • KVM: PPC: Book3S HV: Enable hypervisor virtualization interrupts while in guest · 84f7139c
      Paul Mackerras committed
      The new XIVE interrupt controller on POWER9 can direct external
      interrupts to the hypervisor or the guest.  The interrupts directed to
      the hypervisor are controlled by an LPCR bit called LPCR_HVICE, and
      come in as a "hypervisor virtualization interrupt".  This sets the
      LPCR bit so that hypervisor virtualization interrupts can occur while
      we are in the guest.  We then also need to cope with exiting the guest
      because of a hypervisor virtualization interrupt.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Use stop instruction rather than nap on POWER9 · bf53c88e
      Paul Mackerras committed
      POWER9 replaces the various power-saving mode instructions on POWER8
      (doze, nap, sleep and rvwinkle) with a single "stop" instruction, plus
      a register, PSSCR, which controls the depth of the power-saving mode.
      This replaces the use of the nap instruction when threads are idle
      during guest execution with the stop instruction, and adds code to
      set PSSCR to a value which will allow an SMT mode switch while the
      thread is idle (given that the core as a whole won't be idle in these
      cases).
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9 · f725758b
      Paul Mackerras committed
      POWER9 includes a new interrupt controller, called XIVE, which is
      quite different from the XICS interrupt controller on POWER7 and
      POWER8 machines.  KVM-HV accesses the XICS directly in several places
      in order to send and clear IPIs and handle interrupts from PCI
      devices being passed through to the guest.
      
      In order to make the transition to XIVE easier, OPAL firmware will
      include an emulation of XICS on top of XIVE.  Access to the emulated
      XICS is via OPAL calls.  The one complication is that the EOI
      (end-of-interrupt) function can now return a value indicating that
      another interrupt is pending; in this case, the XIVE will not signal
      an interrupt in hardware to the CPU, and software is supposed to
      acknowledge the new interrupt without waiting for another interrupt
      to be delivered in hardware.
      
      This adapts KVM-HV to use the OPAL calls on machines where there is
      no XICS hardware.  When there is no XICS, we look for a device-tree
      node with "ibm,opal-intc" in its compatible property, which is how
      OPAL indicates that it provides XICS emulation.
      
      In order to handle the EOI return value, kvmppc_read_intr() has
      become kvmppc_read_one_intr(), with a boolean variable passed by
      reference which can be set by the EOI functions to indicate that
      another interrupt is pending.  The new kvmppc_read_intr() keeps
      calling kvmppc_read_one_intr() until there are no more interrupts
      to process.  The return value from kvmppc_read_intr() is the
      largest non-zero value of the returns from kvmppc_read_one_intr().
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
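The loop structure described in the last paragraph of this message can be sketched as follows. The callback type, the function names, and the fake interrupt source are hypothetical; only the behaviour (repeat while the "again" flag is set, return the largest non-zero result) is taken from the message.

```c
#include <stdbool.h>

/* Hypothetical signature for a function that handles one interrupt. */
typedef long (*read_one_fn)(bool *again);

/* Keep reading interrupts until none is pending; return the largest non-zero result. */
static long read_intr_sketch(read_one_fn read_one)
{
    long ret = 0;
    bool again;

    do {
        long rc;

        again = false;
        rc = read_one(&again);
        if (rc && (ret == 0 || rc > ret))
            ret = rc;
    } while (again);

    return ret;
}

/* Fake interrupt source for illustration: yields 1, 2, 0 and then stops. */
static const long fake_results[] = { 1, 2, 0 };
static int fake_pos;

static long fake_read_one(bool *again)
{
    long rc = fake_results[fake_pos];

    fake_pos++;
    *again = fake_pos < 3;  /* models an EOI reporting another pending interrupt */
    return rc;
}
```

Calling read_intr_sketch(fake_read_one) drains all three fake interrupts in one pass instead of waiting for a new hardware interrupt after each EOI.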
    • KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9 · 1704a81c
      Paul Mackerras committed
      On POWER9, the msgsnd instruction is able to send interrupts to
      other cores, as well as other threads on the local core.  Since
      msgsnd is generally simpler and faster than sending an IPI via the
      XICS, we use msgsnd for all IPIs sent by KVM on POWER9.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>