1. 10 Dec, 2021 2 commits
  2. 02 Dec, 2021 1 commit
  3. 30 Nov, 2021 1 commit
    • KVM: x86: check PIR even for vCPUs with disabled APICv · 37c4dbf3
      Committed by Paolo Bonzini
      The IRTE for an assigned device can trigger a POSTED_INTR_VECTOR even
      if APICv is disabled on the vCPU that receives it.  In that case, the
      interrupt will just cause a vmexit and leave the ON bit set together
      with the PIR bit corresponding to the interrupt.
      
      Right now, the interrupt would not be delivered until APICv is re-enabled.
      However, fixing this is just a matter of always doing the PIR->IRR
      synchronization, even if the vCPU has temporarily disabled APICv.
      
      This is not a problem for performance, or if anything it is an
      improvement.  First, in the common case where vcpu->arch.apicv_active is
      true, one fewer check has to be performed.  Second, static_call_cond will
      elide the function call if APICv is not present or disabled.  Finally,
      in the case for AMD hardware we can remove the sync_pir_to_irr callback:
      it is only needed for apic_has_interrupt_for_ppr, and that function
      already has a fallback for !APICv.
      
      Cc: stable@vger.kernel.org
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Message-Id: <20211123004311.2954158-4-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
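The PIR->IRR synchronization described in 37c4dbf3 can be sketched in userspace C. This is a minimal model under stated assumptions, not the kernel code: `struct pi_desc_model`, `struct vcpu_model`, `irr_highest()` and this `sync_pir_to_irr()` are hypothetical stand-ins, and C11 atomics replace the kernel's bit operations. The point it illustrates is that the drain happens whenever the ON bit is set, without consulting `apicv_active`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_WORDS 8 /* 256 vectors, 32 per word */

/* Hypothetical stand-in for a posted-interrupt descriptor. */
struct pi_desc_model {
    _Atomic uint32_t pir[NR_WORDS]; /* posted interrupt requests */
    atomic_bool on;                 /* outstanding notification bit */
};

/* Hypothetical stand-in for the per-vCPU state touched here. */
struct vcpu_model {
    struct pi_desc_model pi;
    uint32_t irr[NR_WORDS];
    bool apicv_active;
};

/* Highest set vector in the IRR, or -1 if none is pending. */
static int irr_highest(const uint32_t *irr)
{
    for (int w = NR_WORDS - 1; w >= 0; w--)
        for (int b = 31; b >= 0; b--)
            if (irr[w] & (UINT32_C(1) << b))
                return w * 32 + b;
    return -1;
}

/*
 * Drain PIR into IRR whenever ON is set. apicv_active is deliberately
 * not consulted: a device may have posted an interrupt while APICv was
 * temporarily disabled on this vCPU.
 */
static int sync_pir_to_irr(struct vcpu_model *v)
{
    assert(v);
    if (atomic_exchange(&v->pi.on, false)) {
        for (int w = 0; w < NR_WORDS; w++)
            v->irr[w] |= atomic_exchange(&v->pi.pir[w], 0);
    }
    return irr_highest(v->irr);
}
```

In this model a vector posted while `apicv_active` is false still becomes visible in `irr` on the next call, which is the behavior the commit message asks for.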
  4. 26 Nov, 2021 4 commits
  5. 18 Nov, 2021 3 commits
    • KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS · 2845e735
      Committed by Vitaly Kuznetsov
      It doesn't make sense to return the recommended maximum number of
      vCPUs which exceeds the maximum possible number of vCPUs.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211116163443.88707-7-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
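The capping in 2845e735 amounts to taking a minimum. A small sketch, assuming a hypothetical `recommended_vcpus()` helper and an illustrative `MAX_VCPUS_MODEL` constant (the real limit is a kernel build-time value that this listing does not state):

```c
#include <assert.h>

/* Illustrative stand-in; the real maximum is a kernel constant. */
#define MAX_VCPUS_MODEL 1024

/* Recommended vCPU count: online CPUs, never above the hard maximum. */
static int recommended_vcpus(int online_cpus)
{
    assert(online_cpus > 0);
    return online_cpus < MAX_VCPUS_MODEL ? online_cpus : MAX_VCPUS_MODEL;
}
```

On a host with more online CPUs than the maximum, the recommendation is clamped so it can never exceed what can actually be created.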
    • KVM: x86: Assume a 64-bit hypercall for guests with protected state · b5aead00
      Committed by Tom Lendacky
      When processing a hypercall for a guest with protected state, currently
      SEV-ES guests, the guest CS segment register can't be checked to
      determine if the guest is in 64-bit mode. For an SEV-ES guest, it is
      expected that communication between the guest and the hypervisor is
      performed to shared memory using the GHCB. In order to use the GHCB, the
      guest must have been in long mode, otherwise writes by the guest to the
      GHCB would be encrypted and not be able to be comprehended by the
      hypervisor.
      
      Create a new helper function, is_64_bit_hypercall(), that assumes the
      guest is in 64-bit mode when the guest has protected state, and returns
      true, otherwise invoking is_64_bit_mode() to determine the mode. Update
      the hypercall related routines to use is_64_bit_hypercall() instead of
      is_64_bit_mode().
      
      Add a WARN_ON_ONCE() to is_64_bit_mode() to catch occurrences of calls to
      this helper function for a guest running with protected state.
      
      Fixes: f1c6366e ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
      Reported-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <e0b20c770c9d0d1403f23d83e785385104211f74.1621878537.git.thomas.lendacky@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
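The helper described in b5aead00 can be modeled as follows. This is a sketch, not the kernel implementation: `struct vcpu_model` and its fields are hypothetical, and `assert()` stands in for the `WARN_ON_ONCE()` mentioned in the commit message (the real warning does not abort).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the vCPU state consulted by the helpers. */
struct vcpu_model {
    bool guest_state_protected; /* e.g. an SEV-ES guest */
    bool long_mode;             /* long mode active */
    bool cs_l;                  /* CS.L, unreadable when state is protected */
};

static bool is_64_bit_mode(const struct vcpu_model *v)
{
    /* Stand-in for the WARN_ON_ONCE(): CS must not be read here. */
    assert(!v->guest_state_protected);
    return v->long_mode && v->cs_l;
}

/* Protected-state guests are assumed to be in 64-bit mode. */
static bool is_64_bit_hypercall(const struct vcpu_model *v)
{
    return v->guest_state_protected || is_64_bit_mode(v);
}
```

The short-circuit `||` matters: for a protected guest, `is_64_bit_mode()` is never reached, so the segment register is never inspected.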
    • KVM: Fix steal time asm constraints · 964b7aa0
      Committed by David Woodhouse
      In 64-bit mode, x86 instruction encoding allows us to use the low 8 bits
      of any GPR as an 8-bit operand. In 32-bit mode, however, we can only use
      the [abcd] registers, for which GCC has the "q" constraint instead of
      the less restrictive "r".
      
      Also fix st->preempted, which is an input/output operand rather than an
      input.
      
      Fixes: 7e2175eb ("KVM: x86: Fix recording of guest steal time / preempted status")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <89bf72db1b859990355f9c40713a34e0d2d86c98.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
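The two constraint points in 964b7aa0 can be seen in a plain byte exchange. This is a userspace sketch with a hypothetical `xchg_u8()` helper; the kernel version additionally handles faults on the user address, which is omitted here. The non-x86 fallback via `__atomic_exchange_n` is an addition for portability, not something the commit mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Atomically swap *ptr with val and return the previous *ptr. */
static inline uint8_t xchg_u8(uint8_t *ptr, uint8_t val)
{
    assert(ptr);
#if defined(__i386__) || defined(__x86_64__)
    /*
     * "+q": val lives in a byte-addressable register (a/b/c/d on 32-bit)
     *       and is both read and written by the instruction.
     * "+m": the memory operand is also read and written, so it must be
     *       an input/output operand rather than a plain input.
     */
    __asm__ volatile("xchgb %0, %1"
                     : "+q"(val), "+m"(*ptr)
                     :
                     : "memory");
    return val;
#else
    return __atomic_exchange_n(ptr, val, __ATOMIC_SEQ_CST);
#endif
}
```

With "r" in place of "q", a 32-bit build may pick a register such as `%esi` that has no 8-bit form, which is the kind of build failure the kernel test robot reported.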
  6. 16 Nov, 2021 1 commit
    • KVM: x86: Fix uninitialized eoi_exit_bitmap usage in vcpu_load_eoi_exitmap() · c5adbb3a
      Committed by 黄乐 (Huang Le)
      In vcpu_load_eoi_exitmap(), the eoi_exit_bitmap[4] array is currently
      initialized only when the Hyper-V context is available; on the other path
      it is passed to kvm_x86_ops.load_eoi_exitmap() straight from the stack,
      uninitialized, which can cause unexpected interrupt delivery/handling
      issues, e.g. an *old* Linux kernel that relies on the PIT to do clock
      calibration on KVM might randomly fail to boot.
      
      Fix it by passing ioapic_handled_vectors to load_eoi_exitmap() when Hyper-V
      context is not available.
      
      Fixes: f2bc14b6 ("KVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context")
      Cc: stable@vger.kernel.org
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Huang Le <huangle1@jd.com>
      Message-Id: <62115b277dab49ea97da5633f8522daf@jd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
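The shape of the fix in c5adbb3a can be sketched like this. It is a simplified model: `struct vcpu_model`, its fields, and the `load_eoi_exitmap()` callback body are hypothetical, and the OR with a second bitmap is an assumed stand-in for what the Hyper-V path does with the local array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the state involved. */
struct vcpu_model {
    bool has_hv_context;
    uint64_t ioapic_handled_vectors[4];
    uint64_t synic_vec_bitmap[4]; /* only meaningful with Hyper-V context */
    uint64_t loaded[4];           /* what the vendor callback received */
};

/* Stand-in for the vendor callback: records the bitmap it was given. */
static void load_eoi_exitmap(struct vcpu_model *v, const uint64_t *bitmap)
{
    memcpy(v->loaded, bitmap, sizeof(v->loaded));
}

static void vcpu_load_eoi_exitmap(struct vcpu_model *v)
{
    uint64_t eoi_exit_bitmap[4];

    assert(v);
    if (!v->has_hv_context) {
        /* The fix: never pass the uninitialized stack array. */
        load_eoi_exitmap(v, v->ioapic_handled_vectors);
        return;
    }
    for (int i = 0; i < 4; i++)
        eoi_exit_bitmap[i] = v->ioapic_handled_vectors[i] |
                             v->synic_vec_bitmap[i];
    load_eoi_exitmap(v, eoi_exit_bitmap);
}
```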
  7. 12 Nov, 2021 1 commit
  8. 11 Nov, 2021 8 commits
    • KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS · da1bfd52
      Committed by Vitaly Kuznetsov
      KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of
      VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports
      either num_online_cpus() or num_present_cpus(), s390 has multiple
      constants depending on hardware features. On x86, KVM reports an
      arbitrary value of '710' which is supposed to be the maximum tested
      value but it's possible to test all KVM_MAX_VCPUS even when there are
      fewer physical CPUs available.
      
      Drop the arbitrary '710' value and return num_online_cpus() on x86 as
      well. The recommendation will match other architectures and will mean
      'no CPU overcommit'.
      
      For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning
      when the requested vCPU number exceeds it. The static limit of '710'
      is quite weird as smaller systems with just a few physical CPUs should
      certainly "recommend" less.
      Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211111134733.86601-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Move INVPCID type check from vmx and svm to the common kvm_handle_invpcid() · 796c83c5
      Committed by Vipin Sharma
      Handle #GP on INVPCID due to an invalid type in the common switch
      statement instead of relying on the callers (VMX and SVM) to manually
      validate the type.
      
      Unlike INVVPID and INVEPT, INVPCID is not explicitly documented to check
      the type before reading the operand from memory, so deferring the
      type validity check until after that point is architecturally allowed.
      Signed-off-by: Vipin Sharma <vipinsh@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211109174426.2350547-3-vipinsh@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
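Folding the type check into the common switch, as 796c83c5 describes, might look like the sketch below. The type values follow the architectural INVPCID encoding (0 to 3); `enum invpcid_result` and this `handle_invpcid()` are hypothetical stand-ins, and the actual flushes and #GP injection are reduced to return codes.

```c
#include <assert.h>

enum {
    INVPCID_TYPE_INDIV_ADDR      = 0,
    INVPCID_TYPE_SINGLE_CTXT     = 1,
    INVPCID_TYPE_ALL_INCL_GLOBAL = 2,
    INVPCID_TYPE_ALL_NON_GLOBAL  = 3,
};

/* Hypothetical outcome codes for this model. */
enum invpcid_result { INVPCID_DONE, INVPCID_INJECT_GP };

/* Common handler: an invalid type falls through to the default case. */
static enum invpcid_result handle_invpcid(unsigned long type)
{
    switch (type) {
    case INVPCID_TYPE_INDIV_ADDR:
    case INVPCID_TYPE_SINGLE_CTXT:
    case INVPCID_TYPE_ALL_INCL_GLOBAL:
    case INVPCID_TYPE_ALL_NON_GLOBAL:
        /* The real handler performs the matching flush here. */
        return INVPCID_DONE;
    default:
        /* Invalid type: the guest gets #GP, with no vendor pre-check. */
        return INVPCID_INJECT_GP;
    }
}
```

Because the default case covers every invalid value, the vendor-specific callers no longer need their own range check before calling the common handler.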
    • KVM: x86: Rename kvm_lapic_enable_pv_eoi() · 77c3323f
      Committed by Vitaly Kuznetsov
      kvm_lapic_enable_pv_eoi() is a misnomer as the function is also
      used to disable PV EOI. Rename it to kvm_lapic_set_pv_eoi().
      
      No functional change intended.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211108152819.12485-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active · cae72dcc
      Committed by Maxim Levitsky
      KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using
      standard kvm's inject_pending_event, and not via APICv/AVIC.
      
      Since this is a debug feature, just inhibit APICv/AVIC while
      KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.
      
      Fixes: 61e5f69e ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")
      Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool · e6cd31f1
      Committed by Jim Mattson
      These function names sound like predicates, and they have siblings,
      *is_valid_msr(), which _are_ predicates. Moreover, there are comments
      that essentially warn that these functions behave unexpectedly.
      
      Flip the polarity of the return values, so that they become
      predicates, and convert the boolean result to a success/failure code
      at the outer call site.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211105202058.1048757-1-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
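The polarity flip in e6cd31f1 can be illustrated with a toy predicate. Everything here is hypothetical: `NR_COUNTERS_MODEL`, the simple range test, and the choice of `-EINVAL` as the failure code are assumptions made for the sketch, not details taken from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative counter count; the real check is PMU-specific. */
#define NR_COUNTERS_MODEL 4u

/* A true predicate: returns true when the index is valid. */
static bool pmu_is_valid_rdpmc_ecx(unsigned int idx)
{
    return idx < NR_COUNTERS_MODEL;
}

/* Outer call site: convert the boolean into a success/failure code. */
static int check_pmc(unsigned int idx)
{
    return pmu_is_valid_rdpmc_ecx(idx) ? 0 : -EINVAL;
}
```

Keeping the 0/negative convention at the outermost call site lets every `*is_valid_*()` function read naturally inside an `if`.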
    • KVM: x86: Fix recording of guest steal time / preempted status · 7e2175eb
      Committed by David Woodhouse
      In commit b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
      not missed") we switched to using a gfn_to_pfn_cache for accessing the
      guest steal time structure in order to allow for an atomic xchg of the
      preempted field. This has a couple of problems.
      
      Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
      atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
      guest vCPU using an IOMEM page for its steal time would never have its
      preempted field set.
      
      Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
      should have been. There are two stages to the GFN->PFN conversion;
      first the GFN is converted to a userspace HVA, and then that HVA is
      looked up in the process page tables to find the underlying host PFN.
      Correct invalidation of the latter would require being hooked up to the
      MMU notifiers, but that doesn't happen---so it just keeps mapping and
      unmapping the *wrong* PFN after the userspace page tables change.
      
      In the !IOMEM case at least the stale page *is* pinned all the time it's
      cached, so it won't be freed and reused by anyone else while still
      receiving the steal time updates. The map/unmap dance only takes care
      of the KVM administrivia such as marking the page dirty.
      
      Until the gfn_to_pfn cache handles the remapping automatically by
      integrating with the MMU notifiers, we might as well not get a
      kernel mapping of it, and use the perfectly serviceable userspace HVA
      that we already have.  We just need to implement the atomic xchg on
      the userspace address with appropriate exception handling, which is
      fairly trivial.
      
      Cc: stable@vger.kernel.org
      Fixes: b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org>
      [I didn't entirely agree with David's assessment of the
       usefulness of the gfn_to_pfn cache, and integrated the outcome
       of the discussion in the above commit message. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: Add support for SEV intra host migration · b5663931
      Committed by Peter Gonda
      For SEV to work with intra host migration, contents of the SEV info struct
      such as the ASID (used to index the encryption key in the AMD SP) and
      the list of memory regions need to be transferred to the target VM.
      This change adds a command for a target VMM to get a source SEV VM's SEV
      info.
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Marc Orr <marcorr@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Message-Id: <20211021174303.385706-3-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: generalize "bugged" VM to "dead" VM · f4d31653
      Committed by Paolo Bonzini
      Generalize KVM_REQ_VM_BUGGED so that it can be called even in cases
      where it is by design that the VM cannot be operated upon.  In this
      case any KVM_BUG_ON should still warn, so introduce a new flag
      kvm->vm_dead that is separate from kvm->vm_bugged.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
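The separation of the two flags in f4d31653 can be modeled as below. This is a sketch under assumptions: `struct vm_model`, the helper names, the warning counter, and the `-EIO` return from the hypothetical `vm_ioctl()` are illustrative, not quoted from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the two flags described above. */
struct vm_model {
    bool vm_bugged;        /* a KVM_BUG_ON-style check has fired */
    bool vm_dead;          /* the VM may no longer be operated on */
    unsigned int warnings; /* counts warnings in this model */
};

/* Deliberate "dead" state: no warning is implied. */
static void vm_mark_dead(struct vm_model *vm)
{
    vm->vm_dead = true;
}

/* A bug also kills the VM, but is tracked separately. */
static void vm_mark_bugged(struct vm_model *vm)
{
    vm->vm_bugged = true;
    vm_mark_dead(vm);
}

/* Warn once per VM, even if the VM was already dead by design. */
static bool vm_bug_on(struct vm_model *vm, bool cond)
{
    if (cond && !vm->vm_bugged) {
        vm->warnings++;
        vm_mark_bugged(vm);
    }
    return cond;
}

/* Any further operation on a dead VM is refused. */
static int vm_ioctl(const struct vm_model *vm)
{
    return vm->vm_dead ? -EIO : 0;
}
```

Because the warning is gated on `vm_bugged` rather than `vm_dead`, a VM that was killed on purpose still reports the first genuine bug.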
  9. 28 Oct, 2021 1 commit
    • KVM: x86: Take srcu lock in post_kvm_run_save() · f3d1436d
      Committed by David Woodhouse
      The Xen interrupt injection for event channels relies on accessing the
      guest's vcpu_info structure in __kvm_xen_has_interrupt(), through a
      gfn_to_hva_cache.
      
      This requires the srcu lock to be held, which is mostly the case except
      for this code path:
      
      [   11.822877] WARNING: suspicious RCU usage
      [   11.822965] -----------------------------
      [   11.823013] include/linux/kvm_host.h:664 suspicious rcu_dereference_check() usage!
      [   11.823131]
      [   11.823131] other info that might help us debug this:
      [   11.823131]
      [   11.823196]
      [   11.823196] rcu_scheduler_active = 2, debug_locks = 1
      [   11.823253] 1 lock held by dom:0/90:
      [   11.823292]  #0: ffff998956ec8118 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x680
      [   11.823379]
      [   11.823379] stack backtrace:
      [   11.823428] CPU: 2 PID: 90 Comm: dom:0 Kdump: loaded Not tainted 5.4.34+ #5
      [   11.823496] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [   11.823612] Call Trace:
      [   11.823645]  dump_stack+0x7a/0xa5
      [   11.823681]  lockdep_rcu_suspicious+0xc5/0x100
      [   11.823726]  __kvm_xen_has_interrupt+0x179/0x190
      [   11.823773]  kvm_cpu_has_extint+0x6d/0x90
      [   11.823813]  kvm_cpu_accept_dm_intr+0xd/0x40
      [   11.823853]  kvm_vcpu_ready_for_interrupt_injection+0x20/0x30
                    < post_kvm_run_save() inlined here >
      [   11.823906]  kvm_arch_vcpu_ioctl_run+0x135/0x6a0
      [   11.823947]  kvm_vcpu_ioctl+0x263/0x680
      
      Fixes: 40da8ccd ("KVM: x86/xen: Add event channel interrupt vector upcall")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Cc: stable@vger.kernel.org
      Message-Id: <606aaaf29fca3850a63aa4499826104e77a72346.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  10. 25 Oct, 2021 2 commits
    • KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock · 8228c77d
      Committed by David Woodhouse
      On the preemption path when updating a Xen guest's runstate times, this
      lock is taken inside the scheduler rq->lock, which is a raw spinlock.
      This was shown in a lockdep warning:
      
      [   89.138354] =============================
      [   89.138356] [ BUG: Invalid wait context ]
      [   89.138358] 5.15.0-rc5+ #834 Tainted: G S        I E
      [   89.138360] -----------------------------
      [   89.138361] xen_shinfo_test/2575 is trying to lock:
      [   89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm]
      [   89.138442] other info that might help us debug this:
      [   89.138444] context-{5:5}
      [   89.138445] 4 locks held by xen_shinfo_test/2575:
      [   89.138447]  #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm]
      [   89.138483]  #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm]
      [   89.138526]  #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0
      [   89.138534]  #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm]
      ...
      [   89.138695]  get_kvmclock_ns+0x1f/0x130 [kvm]
      [   89.138734]  kvm_xen_update_runstate+0x14/0x90 [kvm]
      [   89.138783]  kvm_xen_update_runstate_guest+0x15/0xd0 [kvm]
      [   89.138830]  kvm_arch_vcpu_put+0xe6/0x170 [kvm]
      [   89.138870]  kvm_sched_out+0x2f/0x40 [kvm]
      [   89.138900]  __schedule+0x5de/0xbd0
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
      Fixes: 30b5c851 ("KVM: x86/xen: Add support for vCPU runstate information")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <1b02a06421c17993df337493a68ba923f3bd5c0f.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: On emulation failure, convey the exit reason, etc. to userspace · e615e355
      Committed by David Edmondson
      Should instruction emulation fail, include the VM exit reason, etc. in
      the emulation_failure data passed to userspace, in order that the VMM
      can report it as a debugging aid when describing the failure.
      Suggested-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: David Edmondson <david.edmondson@oracle.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210920103737.2696756-4-david.edmondson@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  11. 23 Oct, 2021 1 commit
  12. 22 Oct, 2021 15 commits
    • KVM: x86: Use rw_semaphore for APICv lock to allow vCPU parallelism · 187c8833
      Committed by Sean Christopherson
      Use a rw_semaphore instead of a mutex to coordinate APICv updates so that
      vCPUs responding to requests can take the lock for read and run in
      parallel.  Using a mutex forces serialization of vCPUs even though
      kvm_vcpu_update_apicv() only touches data local to that vCPU or is
      protected by a different lock, e.g. SVM's ir_list_lock.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211022004927.1448382-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Move SVM's APICv sanity check to common x86 · ee49a893
      Committed by Sean Christopherson
      Move SVM's assertion that vCPU's APICv state is consistent with its VM's
      state out of svm_vcpu_run() and into x86's common inner run loop.  The
      assertion and underlying logic are not unique to SVM, it's just that SVM
      has more inhibiting conditions and thus is more likely to run headfirst
      into any KVM bugs.
      
      Add relevant comments to document exactly why the update path has unusual
      ordering between the update and the kick, why said ordering is safe, and
      also the basic rules behind the assertion in the run loop.
      
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211022004927.1448382-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV-ES: go over the sev_pio_data buffer in multiple passes if needed · 95e16b47
      Committed by Paolo Bonzini
      The PIO scratch buffer is larger than a single page, and therefore
      it is not possible to copy it in a single step to vcpu->arch.pio_data.
      Bound each call to emulator_pio_in/out to a single page; keep
      track of how many I/O operations are left in vcpu->arch.sev_pio_count,
      so that the operation can be restarted in the complete_userspace_io
      callback.
      
      For OUT, this means that the previous kvm_sev_es_outs implementation
      becomes an iterator of the loop, and we can consume the sev_pio_data
      buffer before leaving to userspace.
      
      For IN, instead, consuming the buffer and decreasing sev_pio_count
      is always done in the complete_userspace_io callback, because that
      is when the memcpy is done into sev_pio_data.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ed9abfe ("KVM: SVM: Support string IO operations for an SEV-ES guest")
      Reported-by: Felix Wilhelm <fwilhelm@google.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
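The multi-pass idea in 95e16b47 can be sketched for the OUT direction. This is a userspace model with hypothetical names (`pio_chunk()`, `pio_out_all()`, and the `device` destination buffer); exits to userspace and the `complete_userspace_io` restart are reduced to a plain loop, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_MODEL 4096u /* assumed size of the one-page bounce buffer */

/* Items that fit in one pass: bounded by one page and by what is left. */
static unsigned int pio_chunk(unsigned int size, unsigned int remaining)
{
    unsigned int max = PAGE_SIZE_MODEL / size;

    return remaining < max ? remaining : max;
}

/*
 * Push count items of the given size from the large scratch buffer to
 * the device through the page-sized bounce buffer. Returns the number
 * of passes that were needed.
 */
static unsigned int pio_out_all(const uint8_t *sev_pio_data, unsigned int size,
                                unsigned int count, uint8_t *device)
{
    uint8_t pio_data[PAGE_SIZE_MODEL];
    unsigned int passes = 0;

    assert(size == 1 || size == 2 || size == 4);
    while (count) {
        unsigned int n = pio_chunk(size, count);
        size_t bytes = (size_t)n * size;

        memcpy(pio_data, sev_pio_data, bytes); /* consume scratch buffer */
        memcpy(device, pio_data, bytes);       /* one bounded PIO call */
        sev_pio_data += bytes;
        device += bytes;
        count -= n;                            /* tracks what is left */
        passes++;
    }
    return passes;
}
```

For example, 2500 four-byte items (10000 bytes) take three passes of 1024, 1024, and 452 items, none of which overruns the single-page buffer.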
    • KVM: SEV-ES: keep INS functions together · 4fa4b38d
      Committed by Paolo Bonzini
      Make the diff a little nicer when we actually get to fixing
      the bug.  No functional change intended.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ed9abfe ("KVM: SVM: Support string IO operations for an SEV-ES guest")
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: remove unnecessary arguments from complete_emulator_pio_in · 6b5efc93
      Committed by Paolo Bonzini
      complete_emulator_pio_in can expect that vcpu->arch.pio has been filled in,
      and therefore does not need the size and count arguments.  This makes things
      nicer when the function is called directly from a complete_userspace_io
      callback.
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ed9abfe ("KVM: SVM: Support string IO operations for an SEV-ES guest")
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: split the two parts of emulator_pio_in · 3b27de27
      Committed by Paolo Bonzini
      emulator_pio_in handles both the case where the data is pending in
      vcpu->arch.pio.count, and the case where I/O has to be done via either
      an in-kernel device or a userspace exit.  For SEV-ES we would like
      to split these, to identify clearly the moment at which the
      sev_pio_data is consumed.  To this end, create two different
      functions: __emulator_pio_in fills in vcpu->arch.pio.count, while
      complete_emulator_pio_in clears it and releases vcpu->arch.pio.data.
      
      Because this patch has to be backported, things are left a bit messy.
      kernel_pio() operates on vcpu->arch.pio, which leads to emulator_pio_in()
      having two calls to complete_emulator_pio_in().  It will be fixed
      in the next release.
      
      While at it, remove the unused void* val argument of emulator_pio_in_out.
      The function currently hardcodes vcpu->arch.pio_data as the
      source/destination buffer, which sucks but will be fixed after the more
      severe SEV-ES buffer overflow.
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ed9abfe ("KVM: SVM: Support string IO operations for an SEV-ES guest")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV-ES: clean up kvm_sev_es_ins/outs · ea724ea4
      Committed by Paolo Bonzini
      A few very small cleanups to the functions, smushed together because
      the patch is already very small like this:
      
      - inline emulator_pio_in_emulated and emulator_pio_out_emulated,
        since we already have the vCPU
      
      - remove the data argument and pull setting vcpu->arch.sev_pio_data into
        the caller
      
      - remove unnecessary clearing of vcpu->arch.pio.count when
        emulation is done by the kernel (and therefore vcpu->arch.pio.count
        is already clear on exit from emulator_pio_in and emulator_pio_out).
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ed9abfe ("KVM: SVM: Support string IO operations for an SEV-ES guest")
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: leave vcpu->arch.pio.count alone in emulator_pio_in_out · 0d33b1ba
      Committed by Paolo Bonzini
      Currently emulator_pio_in clears vcpu->arch.pio.count twice if
      emulator_pio_in_out performs kernel PIO.  Move the clear into
      emulator_pio_out where it is actually necessary.
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ed9abfe ("KVM: SVM: Support string IO operations for an SEV-ES guest")
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV-ES: rename guest_ins_data to sev_pio_data · b5998402
      Committed by Paolo Bonzini
      We will be using this field for OUTS emulation as well, in case the
      data that is pushed via OUTS spans more than one page.  In that case,
      there will be a need to save the data pointer across exits to userspace.
      
      So, change the name to something that refers to any kind of PIO.
      Also spell out what it is used for, namely SEV-ES.
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ed9abfe ("KVM: SVM: Support string IO operations for an SEV-ES guest")
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Don't unload MMU in kvm_vcpu_flush_tlb_guest() · 61b05a9f
      Committed by Lai Jiangshan
      kvm_mmu_unload() destroys all the PGD caches.  Use the lighter
      kvm_mmu_sync_roots() and kvm_mmu_sync_prev_roots() instead.
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20211019110154.4091-5-jiangshanlai@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Cache CR3 in prev_roots when PCID is disabled · 509bfe3d
      Committed by Lai Jiangshan
      Commit 21823fbd ("KVM: x86: Invalidate all PGDs for the
      current PCID on MOV CR3 w/ flush") invalidates all PGDs for the specific
      PCID. When PCID is disabled, that includes all PGDs in prev_roots, so
      the commit left prev_roots completely unused in this case.
      
      Not using prev_roots fixes a problem when CR4.PCIDE is changed 0 -> 1
      before the said commit:
      
      	(CR4.PCIDE=0, CR4.PGE=1; CR3=cr3_a; the page for the guest
      	 RIP is global; cr3_b is cached in prev_roots)
      
      	modify page tables under cr3_b
      		the shadow root of cr3_b is unsync in kvm
      	INVPCID single context
      		the guest expects the TLB is clean for PCID=0
      	change CR4.PCIDE 0 -> 1
      	switch to cr3_b with PCID=0,NOFLUSH=1
      		No sync in kvm, cr3_b is still unsync in kvm
      	jump to the page that was modified in step 1
      		shadow page tables point to the wrong page
      
      It is a very unlikely case, but it shows that stale prev_roots can be
      a problem after CR4.PCIDE changes from 0 to 1.  However, to fix this
      case, the commit disabled caching CR3 in prev_roots altogether when PCID
      is disabled.  Not all CPUs have PCID; especially the PCID support
      for AMD CPUs is kind of recent.  To restore the prev_roots optimization
      for CR4.PCIDE=0, flush the whole MMU (including all prev_roots) when
      CR4.PCIDE changes.
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20211019110154.4091-3-jiangshanlai@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Fix tlb flush for tdp in kvm_invalidate_pcid() · e45e9e39
      Committed by Lai Jiangshan
      KVM doesn't know whether any TLB entries for a specific PCID are cached
      in the CPU when TDP is enabled.  So it is better to flush all the guest
      TLB when invalidating any single PCID context.
      
      The case is very rare or even impossible since KVM generally doesn't
      intercept CR3 write or INVPCID instructions when tdp is enabled, so the
      fix is mostly for the sake of overall robustness.
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20211019110154.4091-2-jiangshanlai@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE · a91a7c70
      Committed by Lai Jiangshan
      X86_CR4_PGE doesn't participate in kvm_mmu_role, so the mmu context
      doesn't need to be reset.  It is only required to flush all the guest
      tlb.
      
      It is also inconsistent that X86_CR4_PGE is in KVM_MMU_CR4_ROLE_BITS
      while kvm_mmu_role doesn't use X86_CR4_PGE.  So X86_CR4_PGE is also
      removed from KVM_MMU_CR4_ROLE_BITS.
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210919024246.89230-3-jiangshanlai@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Don't reset mmu context when X86_CR4_PCIDE 1->0 · 55261738
      Committed by Lai Jiangshan
      X86_CR4_PCIDE doesn't participate in kvm_mmu_role, so the mmu context
      doesn't need to be reset.  It is only required to flush all the guest
      tlb.
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210919024246.89230-2-jiangshanlai@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
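The decision described in a91a7c70 and 55261738 (flush the guest TLB instead of resetting the MMU context) can be sketched as a pure function of the old and new CR4 values. The CR4 bit positions are architectural; `MMU_CR4_ROLE_BITS`, the action flags, and `post_set_cr4_actions()` are hypothetical, and the exact set of role bits is an assumption made for illustration. The 0->1 PCIDE transition handled by 509bfe3d is not modeled.

```c
#include <assert.h>

#define X86_CR4_PSE   (1ul << 4)
#define X86_CR4_PAE   (1ul << 5)
#define X86_CR4_PGE   (1ul << 7)
#define X86_CR4_LA57  (1ul << 12)
#define X86_CR4_PCIDE (1ul << 17)
#define X86_CR4_SMEP  (1ul << 20)
#define X86_CR4_SMAP  (1ul << 21)
#define X86_CR4_PKE   (1ul << 22)

/* Assumed role bits for this model; PGE and PCIDE are not among them. */
#define MMU_CR4_ROLE_BITS (X86_CR4_PSE | X86_CR4_PAE | X86_CR4_LA57 | \
                           X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE)

#define ACT_RESET_MMU       0x1u
#define ACT_FLUSH_GUEST_TLB 0x2u

/* Which actions a CR4 write requires in this simplified model. */
static unsigned int post_set_cr4_actions(unsigned long old_cr4,
                                         unsigned long cr4)
{
    unsigned long changed = old_cr4 ^ cr4;
    unsigned int actions = 0;

    /* Only bits that feed the MMU role force a context reset. */
    if (changed & MMU_CR4_ROLE_BITS)
        actions |= ACT_RESET_MMU;

    /* Toggling PGE, or clearing PCIDE, only needs a guest TLB flush. */
    if ((changed & X86_CR4_PGE) ||
        ((old_cr4 & X86_CR4_PCIDE) && !(cr4 & X86_CR4_PCIDE)))
        actions |= ACT_FLUSH_GUEST_TLB;

    return actions;
}
```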
    • KVM: x86: Add vendor name to kvm_x86_ops, use it for error messages · 9dadfc4a
      Committed by Sean Christopherson
      Paul pointed out the error messages when KVM fails to load are unhelpful
      in understanding exactly what went wrong if userspace probes the "wrong"
      module.
      
      Add a mandatory kvm_x86_ops field to track vendor module names, kvm_intel
      and kvm_amd, and use the name in the relevant error messages when KVM fails
      to load so that the user knows which module failed to load.
      
      Opportunistically tweak the "disabled by bios" error message to clarify
      that _support_ was disabled, not that the module itself was magically
      disabled by BIOS.
      Suggested-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211018183929.897461-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>