1. 08 Dec, 2021 (20 commits)
  2. 02 Dec, 2021 (1 commit)
  3. 30 Nov, 2021 (1 commit)
    • KVM: x86: check PIR even for vCPUs with disabled APICv · 37c4dbf3
      Authored by Paolo Bonzini
      The IRTE for an assigned device can trigger a POSTED_INTR_VECTOR even
      if APICv is disabled on the vCPU that receives it.  In that case, the
      interrupt will just cause a vmexit and leave the ON bit set together
      with the PIR bit corresponding to the interrupt.
      
      Right now, the interrupt would not be delivered until APICv is re-enabled.
      However, fixing this is just a matter of always doing the PIR->IRR
      synchronization, even if the vCPU has temporarily disabled APICv.
      
      This is not a problem for performance; if anything, it is an
      improvement.  First, in the common case where vcpu->arch.apicv_active is
      true, one fewer check has to be performed.  Second, static_call_cond will
      elide the function call if APICv is not present or disabled.  Finally,
      in the case of AMD hardware we can remove the sync_pir_to_irr callback:
      it is only needed for apic_has_interrupt_for_ppr, and that function
      already has a fallback for !APICv.
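
      As a rough sketch of the shape of the change (the exact call sites,
      e.g. in vcpu_enter_guest(), are assumptions here):

          /* Before: the PIR->IRR sync was skipped while APICv was disabled. */
          if (vcpu->arch.apicv_active)
                  static_call(kvm_x86_sync_pir_to_irr)(vcpu);

          /* After: always attempt the sync; static_call_cond() elides the
           * call entirely when the vendor module does not install the hook. */
          static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);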
      
      Cc: stable@vger.kernel.org
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Message-Id: <20211123004311.2954158-4-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      37c4dbf3
  4. 26 Nov, 2021 (4 commits)
  5. 18 Nov, 2021 (3 commits)
    • KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS · 2845e735
      Authored by Vitaly Kuznetsov
      It doesn't make sense for the recommended maximum number of vCPUs to
      exceed the maximum possible number of vCPUs.
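
      A minimal sketch of the capped value, assuming it lives in the
      KVM_CAP_NR_VCPUS case of kvm_vm_ioctl_check_extension():

          case KVM_CAP_NR_VCPUS:
                  /* the recommendation can never exceed the hard maximum */
                  r = min_t(unsigned int, num_online_cpus(), KVM_MAX_VCPUS);
                  break;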
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211116163443.88707-7-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2845e735
    • KVM: x86: Assume a 64-bit hypercall for guests with protected state · b5aead00
      Authored by Tom Lendacky
      When processing a hypercall for a guest with protected state, currently
      SEV-ES guests, the guest CS segment register can't be checked to
      determine if the guest is in 64-bit mode. For an SEV-ES guest, it is
      expected that communication between the guest and the hypervisor is
      performed through shared memory using the GHCB. In order to use the GHCB,
      the guest must have been in long mode; otherwise writes by the guest to
      the GHCB would be encrypted and could not be interpreted by the
      hypervisor.
      
      Create a new helper function, is_64_bit_hypercall(), that returns true
      when the guest has protected state (assuming 64-bit mode in that case)
      and otherwise invokes is_64_bit_mode() to determine the mode. Update the
      hypercall-related routines to use is_64_bit_hypercall() instead of
      is_64_bit_mode().
      
      Add a WARN_ON_ONCE() to is_64_bit_mode() to catch occurrences of calls to
      this helper function for a guest running with protected state.
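
      A sketch of the helper as described above (the guest_state_protected
      field is assumed to be the protected-state indicator):

          static inline bool is_64_bit_hypercall(struct kvm_vcpu *vcpu)
          {
                  /*
                   * Protected-state guests (e.g. SEV-ES) communicate through
                   * the GHCB, which is only usable from long mode, so assume
                   * the guest is in 64-bit mode.
                   */
                  return vcpu->arch.guest_state_protected || is_64_bit_mode(vcpu);
          }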
      
      Fixes: f1c6366e ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
      Reported-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <e0b20c770c9d0d1403f23d83e785385104211f74.1621878537.git.thomas.lendacky@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b5aead00
    • KVM: Fix steal time asm constraints · 964b7aa0
      Authored by David Woodhouse
      In 64-bit mode, x86 instruction encoding allows us to use the low 8 bits
      of any GPR as an 8-bit operand. In 32-bit mode, however, we can only use
      the [abcd] registers. For those, GCC has the "q" constraint instead of
      the less restrictive "r".
      
      Also fix st->preempted, which is an input/output operand rather than an
      input.
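
      A rough sketch of the corrected asm (operand names follow the
      surrounding steal-time code):

          asm volatile("1: xchgb %0, %2\n"
                       "xor %1, %1\n"
                       "2:\n"
                       _ASM_EXTABLE_UA(1b, 2b)
                       : "+q" (st_preempted), /* "q": only a/b/c/d have 8-bit forms on i386 */
                         "+&r" (err),
                         "+m" (st->preempted)); /* read-write memory operand, not input-only */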
      
      Fixes: 7e2175eb ("KVM: x86: Fix recording of guest steal time / preempted status")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <89bf72db1b859990355f9c40713a34e0d2d86c98.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      964b7aa0
  6. 16 Nov, 2021 (1 commit)
    • KVM: x86: Fix uninitialized eoi_exit_bitmap usage in vcpu_load_eoi_exitmap() · c5adbb3a
      Authored by Huang Le
      In vcpu_load_eoi_exitmap(), the on-stack eoi_exit_bitmap[4] array is
      currently initialized only when a Hyper-V context is available; in the
      other path the uninitialized array is passed to
      kvm_x86_ops.load_eoi_exitmap() as-is, which causes unexpected interrupt
      delivery/handling issues. For example, an *old* Linux kernel that relies
      on the PIT for clock calibration on KVM might randomly fail to boot.
      
      Fix it by passing ioapic_handled_vectors to load_eoi_exitmap() when Hyper-V
      context is not available.
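
      A sketch of the fixed function, assuming the non-Hyper-V path can hand
      the ioapic_handled_vectors bitmap to the vendor hook directly:

          static void vcpu_load_eoi_exitmap(struct kvm_vcpu *vcpu)
          {
                  if (to_hv_vcpu(vcpu)) {
                          u64 eoi_exit_bitmap[4];

                          bitmap_or((ulong *)eoi_exit_bitmap,
                                    vcpu->arch.ioapic_handled_vectors,
                                    to_hv_synic(vcpu)->vec_bitmap, 256);
                          static_call(kvm_x86_load_eoi_exitmap)(vcpu, eoi_exit_bitmap);
                          return;
                  }

                  /* No Hyper-V context: don't pass the uninitialized stack array. */
                  static_call(kvm_x86_load_eoi_exitmap)(
                          vcpu, (u64 *)vcpu->arch.ioapic_handled_vectors);
          }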
      
      Fixes: f2bc14b6 ("KVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context")
      Cc: stable@vger.kernel.org
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Huang Le <huangle1@jd.com>
      Message-Id: <62115b277dab49ea97da5633f8522daf@jd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c5adbb3a
  7. 12 Nov, 2021 (1 commit)
  8. 11 Nov, 2021 (8 commits)
    • KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS · da1bfd52
      Authored by Vitaly Kuznetsov
      KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of
      VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports
      either num_online_cpus() or num_present_cpus(), s390 has multiple
      constants depending on hardware features. On x86, KVM reports an
      arbitrary value of '710', which is supposed to be the maximum tested
      value, but it is possible to test all KVM_MAX_VCPUS even when there are
      fewer physical CPUs available.
      
      Drop the arbitrary '710' value and return num_online_cpus() on x86 as
      well. The recommendation will match other architectures and will mean
      'no CPU overcommit'.
      
      For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning
      when the requested vCPU number exceeds it. The static limit of '710'
      is quite odd, as smaller systems with just a few physical CPUs should
      certainly "recommend" fewer.
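
      In other words, the change is essentially (assuming the KVM_CAP_NR_VCPUS
      case of kvm_vm_ioctl_check_extension()):

          case KVM_CAP_NR_VCPUS:
                  r = num_online_cpus(); /* was the arbitrary KVM_SOFT_MAX_VCPUS (710) */
                  break;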
      Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211111134733.86601-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      da1bfd52
    • KVM: Move INVPCID type check from vmx and svm to the common kvm_handle_invpcid() · 796c83c5
      Authored by Vipin Sharma
      Handle #GP on INVPCID due to an invalid type in the common switch
      statement instead of relying on the callers (VMX and SVM) to manually
      validate the type.
      
      Unlike INVVPID and INVEPT, INVPCID is not explicitly documented to check
      the type before reading the operand from memory, so deferring the
      type validity check until after that point is architecturally allowed.
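
      A rough sketch of the shared check, assuming the INVPCID_TYPE_*
      encodings used by kvm_handle_invpcid():

          switch (type) {
          case INVPCID_TYPE_INDIV_ADDR:
          case INVPCID_TYPE_SINGLE_CTXT:
          case INVPCID_TYPE_ALL_INCL_GLOBAL:
          case INVPCID_TYPE_ALL_NON_GLOBAL:
                  /* valid encodings, handled as before */
                  break;
          default:
                  /* invalid type: inject #GP here instead of in each caller */
                  kvm_inject_gp(vcpu, 0);
                  return 1;
          }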
      Signed-off-by: Vipin Sharma <vipinsh@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211109174426.2350547-3-vipinsh@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      796c83c5
    • KVM: x86: Rename kvm_lapic_enable_pv_eoi() · 77c3323f
      Authored by Vitaly Kuznetsov
      kvm_lapic_enable_pv_eoi() is a misnomer as the function is also
      used to disable PV EOI. Rename it to kvm_lapic_set_pv_eoi().
      
      No functional change intended.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211108152819.12485-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      77c3323f
    • KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active · cae72dcc
      Authored by Maxim Levitsky
      KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected via KVM's
      standard inject_pending_event path, and not via APICv/AVIC.
      
      Since this is a debug feature, just inhibit APICv/AVIC while
      KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.
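
      A rough sketch of the inhibit logic (the helper and lock names are
      approximations of the APICv-inhibit API):

          bool inhibit = false;
          struct kvm_vcpu *v;
          int i;

          down_write(&kvm->arch.apicv_update_lock);
          kvm_for_each_vcpu(i, v, kvm) {
                  if (v->guest_debug & KVM_GUESTDBG_BLOCKIRQ) {
                          inhibit = true;
                          break;
                  }
          }
          __kvm_request_apicv_update(kvm, !inhibit, APICV_INHIBIT_REASON_BLOCKIRQ);
          up_write(&kvm->arch.apicv_update_lock);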
      
      Fixes: 61e5f69e ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")
      Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cae72dcc
    • kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool · e6cd31f1
      Authored by Jim Mattson
      These function names sound like predicates, and they have siblings,
      *is_valid_msr(), which _are_ predicates. Moreover, there are comments
      that essentially warn that these functions behave unexpectedly.
      
      Flip the polarity of the return values, so that they become
      predicates, and convert the boolean result to a success/failure code
      at the outer call site.
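
      A sketch of the resulting shape (the return-value mapping at the outer
      call site is illustrative):

          /* the helper is now a true/false predicate ... */
          bool kvm_pmu_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx);

          /* ... and the outer call site maps it back to the 0 = ok / 1 = fault
           * convention expected by the emulator: */
          return kvm_pmu_is_valid_rdpmc_ecx(vcpu, idx) ? 0 : 1;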
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211105202058.1048757-1-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e6cd31f1
    • KVM: x86: Fix recording of guest steal time / preempted status · 7e2175eb
      Authored by David Woodhouse
      In commit b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
      not missed") we switched to using a gfn_to_pfn_cache for accessing the
      guest steal time structure in order to allow for an atomic xchg of the
      preempted field. This has a couple of problems.
      
      Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
      atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
      guest vCPU using an IOMEM page for its steal time would never have its
      preempted field set.
      
      Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
      should have been. There are two stages to the GFN->PFN conversion;
      first the GFN is converted to a userspace HVA, and then that HVA is
      looked up in the process page tables to find the underlying host PFN.
      Correct invalidation of the latter would require being hooked up to the
      MMU notifiers, but that doesn't happen---so it just keeps mapping and
      unmapping the *wrong* PFN after the userspace page tables change.
      
      In the !IOMEM case at least the stale page *is* pinned all the time it's
      cached, so it won't be freed and reused by anyone else while still
      receiving the steal time updates. The map/unmap dance only takes care
      of the KVM administrivia such as marking the page dirty.
      
      Until the gfn_to_pfn cache handles the remapping automatically by
      integrating with the MMU notifiers, we might as well not get a
      kernel mapping of it, and use the perfectly serviceable userspace HVA
      that we already have.  We just need to implement the atomic xchg on
      the userspace address with appropriate exception handling, which is
      fairly trivial.
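
      A rough sketch of the resulting flow in kvm_steal_time_set_preempted()
      (field names such as vcpu->arch.st.cache are assumptions):

          struct gfn_to_hva_cache *ghc = &vcpu->arch.st.cache;
          struct kvm_steal_time __user *st =
                  (struct kvm_steal_time __user *)ghc->hva;

          if (!user_access_begin(st, sizeof(*st)))
                  return;

          /* atomic xchgb of st->preempted with an _ASM_EXTABLE_UA() fixup, so
           * a faulting user address is tolerated (see the asm-constraints fix
           * above) */

          user_access_end();

          mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));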
      
      Cc: stable@vger.kernel.org
      Fixes: b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org>
      [I didn't entirely agree with David's assessment of the
       usefulness of the gfn_to_pfn cache, and integrated the outcome
       of the discussion in the above commit message. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7e2175eb
    • KVM: SEV: Add support for SEV intra host migration · b5663931
      Authored by Peter Gonda
      For SEV to work with intra host migration, contents of the SEV info struct
      such as the ASID (used to index the encryption key in the AMD SP) and
      the list of memory regions need to be transferred to the target VM.
      This change adds a command for a target VMM to get a source SEV VM's SEV
      info.
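
      A condensed sketch of the transfer (the struct kvm_sev_info field names
      are assumptions based on the existing SEV code):

          static void sev_migrate_from(struct kvm_sev_info *dst,
                                       struct kvm_sev_info *src)
          {
                  dst->active = true;
                  dst->asid = src->asid;     /* key slot in the AMD SP */
                  dst->handle = src->handle;
                  dst->pages_locked = src->pages_locked;

                  src->asid = 0;
                  src->active = false;
                  src->handle = 0;
                  src->pages_locked = 0;

                  /* hand over the registered/pinned memory regions */
                  list_cut_before(&dst->regions_list, &src->regions_list,
                                  &src->regions_list);
          }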
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Marc Orr <marcorr@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Message-Id: <20211021174303.385706-3-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b5663931
    • KVM: generalize "bugged" VM to "dead" VM · f4d31653
      Authored by Paolo Bonzini
      Generalize KVM_REQ_VM_BUGGED so that it can be called even in cases
      where it is by design that the VM cannot be operated upon.  In this
      case any KVM_BUG_ON should still warn, so introduce a new flag
      kvm->vm_dead that is separate from kvm->vm_bugged.
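
      A sketch of the resulting helpers in kvm_host.h (the request name is
      assumed to follow the existing KVM_REQ_* convention):

          static inline void kvm_vm_dead(struct kvm *kvm)
          {
                  kvm->vm_dead = true;
                  kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);
          }

          static inline void kvm_vm_bugged(struct kvm *kvm)
          {
                  kvm->vm_bugged = true;
                  kvm_vm_dead(kvm);
          }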
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f4d31653
  9. 28 Oct, 2021 (1 commit)
    • KVM: x86: Take srcu lock in post_kvm_run_save() · f3d1436d
      Authored by David Woodhouse
      The Xen interrupt injection for event channels relies on accessing the
      guest's vcpu_info structure in __kvm_xen_has_interrupt(), through a
      gfn_to_hva_cache.
      
      This requires the srcu lock to be held, which is mostly the case except
      for this code path:
      
      [   11.822877] WARNING: suspicious RCU usage
      [   11.822965] -----------------------------
      [   11.823013] include/linux/kvm_host.h:664 suspicious rcu_dereference_check() usage!
      [   11.823131]
      [   11.823131] other info that might help us debug this:
      [   11.823131]
      [   11.823196]
      [   11.823196] rcu_scheduler_active = 2, debug_locks = 1
      [   11.823253] 1 lock held by dom:0/90:
      [   11.823292]  #0: ffff998956ec8118 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x680
      [   11.823379]
      [   11.823379] stack backtrace:
      [   11.823428] CPU: 2 PID: 90 Comm: dom:0 Kdump: loaded Not tainted 5.4.34+ #5
      [   11.823496] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [   11.823612] Call Trace:
      [   11.823645]  dump_stack+0x7a/0xa5
      [   11.823681]  lockdep_rcu_suspicious+0xc5/0x100
      [   11.823726]  __kvm_xen_has_interrupt+0x179/0x190
      [   11.823773]  kvm_cpu_has_extint+0x6d/0x90
      [   11.823813]  kvm_cpu_accept_dm_intr+0xd/0x40
      [   11.823853]  kvm_vcpu_ready_for_interrupt_injection+0x20/0x30
                    < post_kvm_run_save() inlined here >
      [   11.823906]  kvm_arch_vcpu_ioctl_run+0x135/0x6a0
      [   11.823947]  kvm_vcpu_ioctl+0x263/0x680
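
      A rough sketch of the fix in kvm_arch_vcpu_ioctl_run() (the exact
      placement of the lock is an assumption):

          vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);

          if (kvm_run->kvm_valid_regs)
                  store_regs(vcpu);
          post_kvm_run_save(vcpu);

          srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);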
      
      Fixes: 40da8ccd ("KVM: x86/xen: Add event channel interrupt vector upcall")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Cc: stable@vger.kernel.org
      Message-Id: <606aaaf29fca3850a63aa4499826104e77a72346.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f3d1436d