1. 05 Apr 2018 (1 commit)
    • Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown" · 2c151b25
      Authored by Sean Christopherson
      The bug that led to commit 95e057e2
      was a benign warning (no adverse effects other than the warning
      itself) that was detected by syzkaller.  Further inspection shows
      that the WARN_ON in question, in handle_ept_misconfig(), is
      unnecessary and flawed (this was also briefly discussed in the
      original patch: https://patchwork.kernel.org/patch/10204649).
      
        * The WARN_ON is unnecessary as kvm_mmu_page_fault() will WARN
          if reserved bits are set in the SPTEs, i.e. it covers the case
          where an EPT misconfig occurred because of a KVM bug.
      
        * The WARN_ON is flawed because it will fire on any system error
          code that is hit while handling the fault, e.g. -ENOMEM can be
          returned by mmu_topup_memory_caches() while handling a legitimate
          MMIO EPT misconfig.
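The flaw in the second bullet can be reduced to a plain C sketch (a user-space model, not the actual KVM code; all names and values here are illustrative stand-ins):

```c
#include <assert.h>

#define RET_PF_RETRY 0
#define ENOMEM_ERR   12   /* stand-in for -ENOMEM */

/* hypothetical stand-in for the fault path: it can fail benignly
 * with -ENOMEM when the memory caches cannot be topped up, even
 * while handling a perfectly legitimate MMIO EPT misconfig */
static int handle_mmio_fault(int caches_exhausted)
{
    if (caches_exhausted)
        return -ENOMEM_ERR;   /* legitimate transient failure */
    return RET_PF_RETRY;      /* fault handled, retry the access */
}

/* the flawed predicate: it fires on *any* error code, so the
 * benign -ENOMEM above trips the WARN_ON just like a real bug */
static int flawed_warn_fires(int ret)
{
    return ret < 0;
}
```

The sketch shows why the warning carries no signal: a transient allocation failure and a genuine KVM bug are indistinguishable to a blanket `ret < 0` check.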
      
      The original behavior of returning -EFAULT when userspace munmaps
      an HVA without first removing the memslot is correct and desirable,
      i.e. KVM is letting userspace know it has generated a bad address.
      Returning RET_PF_EMULATE masks the WARN_ON in the EPT misconfig path,
      but does not fix the underlying bug, i.e. the WARN_ON is bogus.
      
      Furthermore, returning RET_PF_EMULATE has the unwanted side effect of
      causing KVM to attempt to emulate an instruction on any page fault
      with an invalid HVA translation, e.g. a not-present EPT violation
      on a VM_PFNMAP VMA whose fault handler failed to insert a PFN.
      
        * There is no guarantee that the fault is directly related to the
          instruction, i.e. the fault could have been triggered by a side
          effect memory access in the guest, e.g. while vectoring a #DB or
          writing a tracing record.  This could cause KVM to effectively
          mask the fault if KVM doesn't model the behavior leading to the
          fault, i.e. emulation could succeed and resume the guest.
      
        * If emulation does fail, KVM will return EMULATION_FAILED instead
          of -EFAULT, which is a red herring as the user will either debug
          a bogus emulation attempt or scratch their head wondering why we
          were attempting emulation in the first place.
      
      TL;DR: revert to returning -EFAULT and remove the bogus WARN_ON in
      handle_ept_misconfig in a future patch.
      
      This reverts commit 95e057e2.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 04 Apr 2018 (2 commits)
    • kvm: Add emulation for movups/movupd · 29916968
      Authored by Stefan Fritsch
      This is very similar to the aligned versions movaps/movapd.
      
      We have seen the corresponding emulation failures with openbsd as guest
      and with Windows 10 with intel HD graphics pass through.
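The relationship to the aligned forms can be sketched as follows (user-space model; the dispatch helper is hypothetical, though 0F 10 / 0F 28 are the real SSE opcode bytes for movups/movaps):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* both instructions move 16 bytes; only movaps demands alignment */
static void mov128(uint8_t *dst, const uint8_t *src)
{
    memcpy(dst, src, 16);
}

/* returns 0 on success, -1 for an unhandled opcode,
 * -2 for an alignment fault (#GP) */
static int emulate_mov128(uint8_t opcode, uint8_t *dst, const uint8_t *src)
{
    switch (opcode) {
    case 0x28:                      /* movaps: aligned operand only */
        if ((uintptr_t)src & 15)
            return -2;
        /* fall through: same data movement as movups */
    case 0x10:                      /* movups: no alignment check */
        mov128(dst, src);
        return 0;
    default:
        return -1;
    }
}
```

The design point is that the unaligned variants reuse the aligned handlers wholesale, minus the alignment trap.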
      Signed-off-by: Christian Ehrhardt <christian_ehrhardt@genua.de>
      Signed-off-by: Stefan Fritsch <sf@sfritsch.de>
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: raise internal error for exception during invalid protected mode state · add5ff7a
      Authored by Sean Christopherson
      Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
      an exception in Protected Mode while emulating the guest due to invalid
      guest state.  Unlike Big RM, KVM doesn't support emulating exceptions
      in PM, i.e. PM exceptions are always injected via the VMCS.  Because
      we will never do VMRESUME due to emulation_required, the exception is
      never realized and we'll keep emulating the faulting instruction over
      and over until we receive a signal.
      
      Exit to userspace iff there is a pending exception, i.e. don't exit
      simply on a requested event. The purpose of this check and exit is to
      aid in debugging a guest that is in all likelihood already doomed.
      Invalid guest state in PM is extremely limited in normal operation,
      e.g. it generally only occurs for a few instructions early in BIOS,
      and any exception at this time is all but guaranteed to be fatal.
      Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
      handled/emulated, while checking for vectored interrupts, e.g. INTR
      and NMI, without hitting false positives would add a fair amount of
      complexity for almost no benefit (getting hit by lightning seems
      more likely than encountering this specific scenario).
      
      Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
      exception via the VMCS and emulation_required is true.
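The decision logic amounts to something like this (user-space sketch; the struct and helper are hypothetical, only KVM_INTERNAL_ERROR_EMULATION is a real exit sub-reason):

```c
#include <assert.h>

#define KVM_INTERNAL_ERROR_EMULATION 1   /* real KVM exit sub-reason */

struct vcpu_model {
    int emulation_required;   /* guest state invalid: must emulate */
    int exception_pending;    /* a real pending exception, not just
                               * any requested event */
};

/* exit to userspace iff an exception is pending while we are stuck
 * emulating: the exception can never be realized via the VMCS, so
 * without this we would emulate the faulting instruction forever */
static int should_exit_internal_error(const struct vcpu_model *v)
{
    return v->emulation_required && v->exception_pending;
}
```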
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  3. 30 Mar 2018 (1 commit)
  4. 29 Mar 2018 (22 commits)
  5. 28 Mar 2018 (5 commits)
    • KVM: x86: Fix perf timer mode IP reporting · dd60d217
      Authored by Andi Kleen
      KVM and perf have a special backdoor mechanism to report the IP for interrupts
      re-executed after vm exit. This works for the NMIs that perf normally uses.
      
      However, when perf is in timer mode it doesn't work because the timer interrupt
      doesn't get this special treatment. This is common when KVM is running
      nested in another hypervisor which may not implement the PMU, so only
      timer mode is available.
      
      Call the functions to set up the backdoor IP also for non-NMI interrupts.
      
      I renamed the functions to set up the backdoor IP reporting to be more
      appropriate for their new use.  The SVM change is only compile tested.
      
      v2: Moved the functions inline.
      For the normal interrupt case the before/after functions are now
      called from x86.c, not arch specific code.
      For the NMI case we still need to call it in the architecture
      specific code, because it's already needed in the low level *_run
      functions.
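The shape of the change can be sketched as follows (simplified user-space model; kvm_before_interrupt/kvm_after_interrupt echo the renamed helpers described above, but the bodies here are illustrative only):

```c
#include <assert.h>

/* stand-in for the per-cpu state that perf consults */
static int handling_guest_intr;

/* formerly NMI-only; now brackets any host interrupt re-executed
 * after vm exit, including the timer interrupt */
static inline void kvm_before_interrupt(void) { handling_guest_intr = 1; }
static inline void kvm_after_interrupt(void)  { handling_guest_intr = 0; }

/* perf uses this to decide whether to report the guest IP */
static int perf_sees_guest(void) { return handling_guest_intr; }
```

Making the helpers inline keeps the bracketing cheap enough to apply to every interrupt, not just NMIs.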
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      [Removed unnecessary calls from arch handle_external_intr. - Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • Merge tag 'kvm-arm-for-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm · abe7a458
      Authored by Radim Krčmář
      KVM/ARM updates for v4.17
      
      - VHE optimizations
      - EL2 address space randomization
      - Variant 3a mitigation for Cortex-A57 and A72
      - The usual vgic fixes
      - Various minor tidying-up
    • arm64: Add temporary ERRATA_MIDR_ALL_VERSIONS compatibility macro · dc6ed61d
      Authored by Marc Zyngier
      MIDR_ALL_VERSIONS is changing, and won't have the same meaning
      in 4.17, and the right thing to use will be ERRATA_MIDR_ALL_VERSIONS.
      
      In order to cope with the merge window, let's add a compatibility
      macro that will allow a relatively smooth transition, and that
      can be removed post 4.17-rc1.
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • Revert "arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening" · adc91ab7
      Authored by Marc Zyngier
      Creates far too many conflicts with arm64/for-next/core, to be
      resent post -rc1.
      
      This reverts commit f9f5dc19.
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot() in page fault handler · 31c8b0d0
      Authored by Paul Mackerras
      This changes the hypervisor page fault handler for radix guests to use
      the generic KVM __gfn_to_pfn_memslot() function instead of using
      get_user_pages_fast() and then handling the case of VM_PFNMAP vmas
      specially.  The old code missed the case of VM_IO vmas; with this
      change, VM_IO vmas will now be handled correctly by code within
      __gfn_to_pfn_memslot.
      
      Currently, __gfn_to_pfn_memslot calls hva_to_pfn, which only uses
      __get_user_pages_fast for the initial lookup in the cases where
      either atomic or async is set.  Since we are not setting either
      atomic or async, we do our own __get_user_pages_fast first, for now.
      
      This also adds code to check for the KVM_MEM_READONLY flag on the
      memslot.  If it is set and this is a write access, we synthesize a
      data storage interrupt for the guest.
      
      In the case where the page is not normal RAM (i.e. page == NULL) in
      kvmppc_book3s_radix_page_fault(), we read the PTE from the Linux page
      tables because we need the mapping attribute bits as well as the PFN.
      (The mapping attribute bits indicate whether accesses have to be
      non-cacheable and/or guarded.)
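The new read-only check amounts to something like this (sketch; KVM_MEM_READONLY is the real memslot flag, the helper name is hypothetical):

```c
#include <assert.h>

#define KVM_MEM_READONLY (1u << 1)   /* real KVM memslot flag */

/* a write to a read-only memslot must not be mapped through;
 * instead, a data storage interrupt is synthesized for the guest */
static int must_synthesize_dsi(unsigned int slot_flags, int writing)
{
    return writing && (slot_flags & KVM_MEM_READONLY);
}
```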
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  6. 26 Mar 2018 (2 commits)
    • KVM: arm/arm64: vgic-its: Fix potential overrun in vgic_copy_lpi_list · 7d8b44c5
      Authored by Marc Zyngier
      vgic_copy_lpi_list() parses the LPI list and picks LPIs targeting
      a given vcpu. We allocate the array containing the intids before taking
      the lpi_list_lock, which means we can have an array size that is not
      equal to the number of LPIs.
      
      This is particularly obvious when looking at the path coming from
      vgic_enable_lpis, which is not a command, and thus can run in parallel
      with commands:
      
      vcpu 0:                                        vcpu 1:
      vgic_enable_lpis
        its_sync_lpi_pending_table
          vgic_copy_lpi_list
            intids = kmalloc_array(irq_count)
                                                     MAPI(lpi targeting vcpu 0)
            list_for_each_entry(lpi_list_head)
              intids[i++] = irq->intid;
      
      At that stage, we will happily overrun the intids array. Boo. An easy
      fix is to break once the array is full. The MAPI command will update
      the config anyway, and we won't miss a thing. We also make sure that
      lpi_list_count is read exactly once, so that further updates of that
      value will not affect the array bound check.
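A minimal model of the fix (user-space sketch; names are hypothetical): the count is snapshotted once before allocation, and the walk stops as soon as the preallocated array is full:

```c
#include <assert.h>

/* irq_count is the snapshot of lpi_list_count taken before the
 * array was allocated; the live list may have grown since then
 * (e.g. a MAPI racing with vgic_enable_lpis) */
static int copy_lpi_list(int *dst, int irq_count,
                         const int *live, int live_count)
{
    int i = 0, j;

    for (j = 0; j < live_count; j++) {
        if (i == irq_count)   /* array full: bail, don't overrun */
            break;
        dst[i++] = live[j];
    }
    return i;                 /* number of intids actually copied */
}
```

Any LPI that raced in after the snapshot is simply dropped here; as the message notes, the MAPI command updates the configuration anyway, so nothing is lost.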
      
      Cc: stable@vger.kernel.org
      Fixes: ccb1d791 ("KVM: arm64: vgic-its: Fix pending table sync")
      Reviewed-by: Andre Przywara <andre.przywara@arm.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • KVM: arm/arm64: vgic: Disallow Active+Pending for level interrupts · 67b5b673
      Authored by Marc Zyngier
      It was recently reported that VFIO mediated devices, and anything
      that VFIO exposes as level interrupts, do not strictly follow the
      expected logic of such interrupts: the input line is only lowered
      when the guest has EOIed the interrupt at the GIC level, rather
      than when it has acked the interrupt at the device level.
      
      The GIC's Active+Pending state is fundamentally incompatible with
      this behaviour, as it prevents KVM from observing the EOI, and in
      turn results in VFIO never dropping the line. This results in an
      interrupt storm in the guest, which it really never expected.
      
      As we cannot really change VFIO to follow the strict rules of level
      signalling, let's forbid the A+P state altogether, as it is in the
      end only an optimization. It ensures that we will transition via
      an invalid state, which we can use to notify VFIO of the EOI.
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Tested-by: Eric Auger <eric.auger@redhat.com>
      Tested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  7. 24 Mar 2018 (5 commits)
  8. 21 Mar 2018 (2 commits)
    • KVM: nVMX: fix vmentry failure code when L2 state would require emulation · 3184a995
      Authored by Paolo Bonzini
      Commit 2bb8cafe ("KVM: vVMX: signal failure for nested VMEntry if
      emulation_required", 2018-03-12) introduces a new error path which does
      not set *entry_failure_code.  Fix that to avoid a leak of L0 stack to L1.
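The bug pattern, reduced to a sketch (hypothetical names and value; the point is that every early-return path must write the out-parameter before returning):

```c
#include <assert.h>

#define ENTRY_FAIL_DEFAULT 0u   /* stand-in for the generic failure code */

/* returns nonzero on vmentry failure; before the fix, the
 * emulation_required path returned without ever writing
 * *entry_failure_code, so the caller copied whatever happened
 * to be on the L0 stack back to L1 */
static int check_vmentry_state(int emulation_required,
                               unsigned int *entry_failure_code)
{
    *entry_failure_code = ENTRY_FAIL_DEFAULT;  /* always initialized */
    if (emulation_required)
        return 1;
    return 0;
}
```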
      Reported-by: Radim Krčmář <rkrcmar@redhat.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Do not load EOI-exitmap while running L2 · e40ff1d6
      Authored by Liran Alon
      When the L1 IOAPIC redirection table is written, a request of
      KVM_REQ_SCAN_IOAPIC is set on all vCPUs. This is done such that
      all vCPUs will now recalc their IOAPIC handled vectors and load
      it to their EOI-exitmap.
      
      However, it could be that one of the vCPUs is currently running
      L2. In this case, load_eoi_exitmap() will be called which would
      write to vmcs02->eoi_exit_bitmap, which is wrong because
      vmcs02->eoi_exit_bitmap should always be equal to
      vmcs12->eoi_exit_bitmap. Furthermore, at this point
      KVM_REQ_SCAN_IOAPIC was already consumed and therefore we will
      never update vmcs01->eoi_exit_bitmap. This could lead to remote_irr
      of some IOAPIC level-triggered entry to remain set forever.
      
      Fix this issue by delaying the load of EOI-exitmap to when vCPU
      is running L1.
      
      One may wonder why not just delay the entire KVM_REQ_SCAN_IOAPIC
      processing to when the vCPU is running L1. This is done in order to
      handle correctly the case where L1's LAPIC and IO-APIC are passed
      through to L2. In this case, vmcs12->virtual_interrupt_delivery
      should be 0. In the current nVMX implementation, that results in
      vmcs02->virtual_interrupt_delivery also being 0, so
      vmcs02->eoi_exit_bitmap is not used. Therefore, every L2 EOI causes
      a #VMExit into L0 (either on a MSR_WRITE to an x2APIC MSR or an
      APIC_ACCESS/APIC_WRITE/EPT_MISCONFIG on the APIC MMIO page).
      In order for such an L2 EOI to be broadcast, if needed, from the
      LAPIC to the IO-APIC, vcpu->arch.ioapic_handled_vectors must be
      updated while L2 is running. Therefore, the patch makes sure to
      delay only the loading of the EOI-exitmap, but not the update of
      vcpu->arch.ioapic_handled_vectors.
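The deferral can be modelled as follows (user-space sketch; the struct and field names are hypothetical):

```c
#include <assert.h>

struct fake_vcpu {
    int in_l2;                       /* currently running L2 */
    int eoi_load_pending;            /* deferred-load request flag */
    unsigned long pending_bitmap;    /* bitmap to apply on L1 entry */
    unsigned long eoi_exit_bitmap;   /* models vmcs01's bitmap */
};

/* called when KVM_REQ_SCAN_IOAPIC has recomputed handled vectors */
static void load_eoi_exitmap(struct fake_vcpu *v, unsigned long bitmap)
{
    if (v->in_l2) {                  /* vmcs02 active: don't touch it */
        v->eoi_load_pending = 1;
        v->pending_bitmap = bitmap;
        return;
    }
    v->eoi_exit_bitmap = bitmap;     /* vmcs01 active: safe to load */
}

/* called on the switch from L2 back to L1 */
static void on_l1_entry(struct fake_vcpu *v)
{
    v->in_l2 = 0;
    if (v->eoi_load_pending) {
        v->eoi_load_pending = 0;
        v->eoi_exit_bitmap = v->pending_bitmap;
    }
}
```

The pending flag replaces the consumed KVM_REQ_SCAN_IOAPIC, so the bitmap update is never lost even though the request was already serviced while L2 ran.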
      Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
      Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>