1. 01 Mar 2022 (5 commits)
  2. 25 Feb 2022 (5 commits)
  3. 19 Feb 2022 (5 commits)
  4. 12 Feb 2022 (1 commit)
    • KVM: SVM: fix race between interrupt delivery and AVIC inhibition · 66fa226c
      Committed by Maxim Levitsky
      If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
      inhibited, it might read a stale value of vcpu->arch.apicv_active
      which can lead to the target vCPU not noticing the interrupt.
      
      To fix this use load-acquire/store-release so that, if the target vCPU
      is IN_GUEST_MODE, we're guaranteed to see a previous disabling of the
      AVIC.  If AVIC has been disabled in the meantime, proceed with the
      KVM_REQ_EVENT-based delivery.
      
      Incomplete IPI vmexit has the same races as svm_deliver_avic_intr, and
      in fact it can be handled in exactly the same way; the only difference
      lies in who has set IRR, whether svm_deliver_interrupt or the processor.
      Therefore, svm_complete_interrupt_delivery can be used to fix incomplete
      IPI vmexits as well.
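
      A minimal sketch of the acquire/release pairing described above (the
      function signature is simplified and the exact lines are recalled from
      the commit description, not verified upstream code):

        /* vCPU entry path (common x86 code) publishes the mode with a
         * store-release:
         *     smp_store_release(&vcpu->mode, IN_GUEST_MODE);
         */
        static void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu)
        {
                /* Pairs with the store-release of vcpu->mode on guest entry;
                 * apicv_active must be read after vcpu->mode. */
                bool in_guest_mode =
                        (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);

                if (!READ_ONCE(vcpu->arch.apicv_active)) {
                        /* AVIC was inhibited: fall back to KVM_REQ_EVENT. */
                        kvm_make_request(KVM_REQ_EVENT, vcpu);
                        kvm_vcpu_kick(vcpu);
                        return;
                }

                if (in_guest_mode)
                        avic_ring_doorbell(vcpu); /* hardware injects the IRQ */
                else
                        kvm_vcpu_wake_up(vcpu);
        }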
      Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 11 Feb 2022 (13 commits)
    • KVM: x86/mmu: Split huge pages mapped by the TDP MMU during KVM_CLEAR_DIRTY_LOG · cb00a70b
      Committed by David Matlack
      When using KVM_DIRTY_LOG_INITIALLY_SET, huge pages are not
      write-protected when dirty logging is enabled on the memslot. Instead
      they are write-protected once userspace invokes KVM_CLEAR_DIRTY_LOG for
      the first time and only for the specific sub-region being cleared.
      
      Enhance KVM_CLEAR_DIRTY_LOG to also try to split huge pages prior to
      write-protecting to avoid causing write-protection faults on vCPU
      threads. This also allows userspace to smear the cost of huge page
      splitting across multiple ioctls, rather than splitting the entire
      memslot as is the case when initially-all-set is not used.
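
      A hedged userspace sketch of the ioctl this affects (KVM_CLEAR_DIRTY_LOG
      and struct kvm_clear_dirty_log are real UAPI; the wrapper function is
      illustrative and assumes KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is enabled):

        #include <linux/kvm.h>
        #include <sys/ioctl.h>
        #include <stdint.h>

        /* Clear dirty state for num_pages pages starting at first_page in
         * memslot `slot`; with this commit, KVM also eagerly splits any huge
         * pages in the range before write-protecting them. */
        static int clear_dirty_range(int vm_fd, uint32_t slot,
                                     uint64_t first_page, uint32_t num_pages,
                                     void *bitmap)
        {
                struct kvm_clear_dirty_log clr = {
                        .slot = slot,
                        .first_page = first_page,
                        .num_pages = num_pages,
                        .dirty_bitmap = bitmap,
                };

                return ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clr);
        }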
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220119230739.2234394-17-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Split huge pages mapped by the TDP MMU when dirty logging is enabled · a3fe5dbd
      Committed by David Matlack
      When dirty logging is enabled without initially-all-set, try to split
      all huge pages in the memslot down to 4KB pages so that vCPUs do not
      have to take expensive write-protection faults to split huge pages.
      
      Eager page splitting is best-effort only. This commit only adds support
      for the TDP MMU, and even there splitting may fail due to out-of-memory
      conditions. Failure to split a huge page is fine from a correctness
      standpoint because KVM will always follow up splitting by
      write-protecting any remaining huge pages.
      
      Eager page splitting moves the cost of splitting huge pages off of the
      vCPU threads and onto the thread enabling dirty logging on the memslot.
      This is useful because:
      
       1. Splitting on the vCPU thread interrupts vCPU execution and is
          disruptive to customers, whereas splitting on VM ioctl threads can
          run in parallel with vCPU execution.
      
       2. Splitting all huge pages at once is more efficient because it does
          not require performing VM-exit handling or walking the page table for
          every 4KiB page in the memslot, and greatly reduces the amount of
          contention on the mmu_lock.
      
      For example, when running dirty_log_perf_test with 96 virtual CPUs, 1GiB
      per vCPU, and 1GiB HugeTLB memory, the time it takes vCPUs to write to
      all of their memory after dirty logging is enabled decreased by 95% from
      2.94s to 0.14s.
      
      Eager Page Splitting is over 100x more efficient than the current
      implementation of splitting on fault under the read lock. For example,
      taking the same workload as above, Eager Page Splitting reduced the CPU
      required to split all huge pages from ~270 CPU-seconds ((2.94s - 0.14s)
      * 96 vCPU threads) to only 1.55 CPU-seconds.
      
      Eager page splitting does increase the amount of time it takes to enable
      dirty logging, since it has to split all huge pages. For example, the time
      it took to enable dirty logging in the 96GiB region of the
      aforementioned test increased from 0.001s to 1.55s.
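
      A hedged sketch of where that one-time cost lands: dirty logging is
      enabled by (re-)registering the memslot with KVM_MEM_LOG_DIRTY_PAGES,
      and with eager splitting this ioctl is where huge pages get split (the
      wrapper is illustrative; the ioctl and flag are real UAPI):

        #include <linux/kvm.h>
        #include <sys/ioctl.h>

        /* Turn on dirty logging for an existing memslot; vCPUs keep running
         * while the ioctl thread pays the splitting cost. */
        static int enable_dirty_logging(int vm_fd,
                                        struct kvm_userspace_memory_region *r)
        {
                r->flags |= KVM_MEM_LOG_DIRTY_PAGES;
                return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, r);
        }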
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220119230739.2234394-16-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Use more verbose names for mem encrypt kvm_x86_ops hooks · 03d004cd
      Committed by Sean Christopherson
      Use slightly more verbose names for the so-called "memory encrypt",
      a.k.a. "mem enc", kvm_x86_ops hooks to bridge the gap between the current
      super short kvm_x86_ops names and SVM's more verbose, but non-conforming
      names.  This is a step toward using kvm-x86-ops.h with KVM_X86_CVM_OP()
      to fill svm_x86_ops.
      
      Opportunistically rename mem_enc_op() to mem_enc_ioctl() to better
      reflect its true nature, as it really is a full-fledged ioctl() of its
      own.  Ideally, the hook would be named confidential_vm_ioctl() or so, as
      the ioctl() is a gateway to more than just memory encryption, and because
      its underlying purpose is to support Confidential VMs, which can be
      provided without memory encryption, e.g. if the TCB of the guest includes
      the host kernel but not host userspace, or by isolation in hardware
      without encrypting memory.  But, diverging from KVM_MEMORY_ENCRYPT_OP
      even further is undesirable, and short of creating aliases for all
      related ioctl()s, which introduces a different flavor of divergence, KVM
      is stuck with the nomenclature.
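
      As a rough sketch, the renamed hooks end up with shapes along these
      lines (abridged; the exact signatures are recalled, not verified
      against the tree):

        struct kvm_x86_ops {
                /* ... */
                int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp);
                int (*mem_enc_register_region)(struct kvm *kvm,
                                               struct kvm_enc_region *argp);
                int (*mem_enc_unregister_region)(struct kvm *kvm,
                                                 struct kvm_enc_region *argp);
                /* ... */
        };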
      
      Defer renaming SVM's functions to a future commit as there are additional
      changes needed to make SVM fully conforming and to match reality (looking
      at you, svm_vm_copy_asid_from()).
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220128005208.4008533-20-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Move get_cs_db_l_bits() helper to SVM · 872e0c53
      Committed by Sean Christopherson
      Move kvm_get_cs_db_l_bits() to SVM and rename it appropriately so that
      its svm_x86_ops entry can be filled via kvm-x86-ops, and to eliminate a
      superfluous export from KVM x86.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220128005208.4008533-16-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Use static_call() for copy/move encryption context ioctls() · 7ad02ef0
      Committed by Sean Christopherson
      Define and use static_call()s for .vm_{copy,move}_enc_context_from(),
      mostly so that the op is defined in kvm-x86-ops.h.  This will allow using
      KVM_X86_OP in vendor code to wire up the implementation.  Any performance
      gains eked out by using static_call() are a happy bonus and not the
      primary motivation.
      
      Opportunistically refactor the code to reduce indentation and keep line
      lengths reasonable, and to be consistent when wrapping versus running
      a bit over the 80 char soft limit.
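
      A hedged sketch of the resulting pattern (the op declaration and the
      call-site shape follow the kvm-x86-ops.h convention; the exact lines
      are illustrative):

        /* arch/x86/include/asm/kvm-x86-ops.h: declared as optional ops */
        KVM_X86_OP_NULL(vm_copy_enc_context_from)
        KVM_X86_OP_NULL(vm_move_enc_context_from)

        /* call site: a static_call() instead of a raw pointer dereference */
        r = static_call(kvm_x86_vm_copy_enc_context_from)(kvm, cap->args[0]);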
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220128005208.4008533-12-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Unexport kvm_x86_ops · dfc4e6ca
      Committed by Sean Christopherson
      Drop the export of kvm_x86_ops now that it is no longer referenced by SVM or
      VMX.  Disallowing access to kvm_x86_ops is very desirable as it prevents
      vendor code from incorrectly modifying hooks after they have been set by
      kvm_arch_hardware_setup(), and more importantly after each function's
      associated static_call key has been updated.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220128005208.4008533-11-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names · e27bc044
      Committed by Sean Christopherson
      Rename a variety of kvm_x86_op function pointers so that preferred name
      for vendor implementations follows the pattern <vendor>_<function>, e.g.
      rename .run() to .vcpu_run() to match {svm,vmx}_vcpu_run().  This will
      allow vendor implementations to be wired up via the KVM_X86_OP macro.
      
      In many cases, VMX and SVM "disagree" on the preferred name, though in
      reality it's VMX and x86 that disagree as SVM blindly prepended _svm to
      the kvm_x86_ops name.  Justification for using the VMX nomenclature:
      
        - set_{irq,nmi} => inject_{irq,nmi} because the helper is injecting an
          event that has already been "set" in e.g. the vIRR.  SVM's relevant
          VMCB field is even named event_inj, and KVM's stat is irq_injections.
      
        - prepare_guest_switch => prepare_switch_to_guest because the former is
          ambiguous, e.g. it could mean switching between multiple guests,
          switching from the guest to host, etc...
      
        - update_pi_irte => pi_update_irte to match the rest of VMX's posted
          interrupt naming scheme, which is vmx_pi_<blah>().
      
        - start_assignment => pi_start_assignment to again follow VMX's posted
          interrupt naming scheme, and to provide context for what bit of code
          might care about an otherwise undescribed "assignment".
      
      The "tlb_flush" => "flush_tlb" creates an inconsistency with respect to
      Hyper-V's "tlb_remote_flush" hooks, but Hyper-V really is the one that's
      wrong.  x86, VMX, and SVM all use flush_tlb, and even common KVM is on a
      variant of the bandwagon with "kvm_flush_remote_tlbs", e.g. a more
      appropriate name for the Hyper-V hooks would be flush_remote_tlbs.  Leave
      that change for another time as the Hyper-V hooks always start as NULL,
      i.e. the name doesn't matter for using kvm-x86-ops.h, and changing all
      names requires an astounding amount of churn.
      
      VMX and SVM function names are intentionally left as is to minimize the
      diff.  Both VMX and SVM will need to rename even more functions in order
      to fully utilize KVM_X86_OPS, i.e. an additional patch for each is
      inevitable.
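
      For context, a hedged sketch of the macro machinery that makes the
      naming matter (shapes recalled from the kvm-x86-ops.h mechanism, not
      verified line-for-line):

        /* arch/x86/include/asm/kvm_host.h: one static_call key per hook */
        #define KVM_X86_OP(func) \
                DECLARE_STATIC_CALL(kvm_x86_##func, \
                                    *(((struct kvm_x86_ops *)0)->func));
        #include <asm/kvm-x86-ops.h>

        /* arch/x86/kvm/x86.c: point each key at the vendor implementation */
        #define __KVM_X86_OP(func) \
                static_call_update(kvm_x86_##func, kvm_x86_ops.func);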
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220128005208.4008533-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Drop export for .tlb_flush_current() static_call key · feee3d9d
      Committed by Sean Christopherson
      Remove the export of kvm_x86_tlb_flush_current() as there are no longer
      any users outside of common x86 code.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220128005208.4008533-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Remove unused "flags" of kvm_pv_kick_cpu_op() · 9d68c6f6
      Committed by Jinrong Liang
      The "unsigned long flags" parameter of  kvm_pv_kick_cpu_op() is not used,
      so remove it. No functional change intended.
      Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
      Message-Id: <20220125095909.38122-20-cloudliang@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Remove unused "vcpu" of kvm_scale_tsc() · 62711e5a
      Committed by Jinrong Liang
      The "struct kvm_vcpu *vcpu" parameter of kvm_scale_tsc() is not used,
      so remove it. No functional change intended.
      Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
      Message-Id: <20220125095909.38122-18-cloudliang@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Skip APICv update if APICv is disabled at the module level · f1575642
      Committed by Sean Christopherson
      Bail from the APICv update paths _before_ taking apicv_update_lock if
      APICv is disabled at the module level.  kvm_request_apicv_update() in
      particular is invoked from multiple paths that can be reached without
      APICv being enabled, e.g. svm_enable_irq_window(), and taking the
      rw_sem for write when APICv is disabled may introduce unnecessary
      contention and stalls.
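
      A hedged sketch of the early bail (function and lock names are taken
      from the commit text; the body is abridged):

        void kvm_request_apicv_update(struct kvm *kvm, bool activate,
                                      ulong bit)
        {
                if (!enable_apicv)
                        return; /* module-level disable: nothing to do */

                down_write(&kvm->arch.apicv_update_lock);
                __kvm_request_apicv_update(kvm, activate, bit);
                up_write(&kvm->arch.apicv_update_lock);
        }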
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211208015236.1616697-25-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Drop NULL check on kvm_x86_ops.check_apicv_inhibit_reasons · 7446cfeb
      Committed by Sean Christopherson
      Drop the useless NULL check on kvm_x86_ops.check_apicv_inhibit_reasons
      when handling an APICv update; both VMX and SVM unconditionally implement
      the helper and leave it non-NULL even if APICv is disabled at the module
      level.  The latter is a moot point now that __kvm_request_apicv_update()
      is called if and only if enable_apicv is true.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211208015236.1616697-26-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Unexport __kvm_request_apicv_update() · cf9e2555
      Committed by Sean Christopherson
      Unexport __kvm_request_apicv_update(); it's not used by vendor code and
      should never be used by vendor code.  The only reason it's exposed at all
      is because Hyper-V's SynIC needs to track how many auto-EOIs are in use,
      and it's convenient to use apicv_update_lock to guard that tracking.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211208015236.1616697-27-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 04 Feb 2022 (1 commit)
    • KVM: x86: Use ERR_PTR_USR() to return -EFAULT as a __user pointer · 6e37ec88
      Committed by Sean Christopherson
      Use ERR_PTR_USR() when returning -EFAULT from kvm_get_attr_addr();
      otherwise sparse complains about implicitly casting the kernel pointer
      from ERR_PTR() into a __user pointer:
      
      >> arch/x86/kvm/x86.c:4342:31: sparse: sparse: incorrect type in return expression
         (different address spaces) @@     expected void [noderef] __user * @@     got void * @@
         arch/x86/kvm/x86.c:4342:31: sparse:     expected void [noderef] __user *
         arch/x86/kvm/x86.c:4342:31: sparse:     got void *
      
      No functional change intended.
      
      Fixes: 56f289a8 ("KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220202005157.2545816-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 01 Feb 2022 (1 commit)
    • kvm/x86: rework guest entry logic · b2d2af7e
      Committed by Mark Rutland
      For consistency and clarity, migrate x86 over to the generic helpers for
      guest timing and lockdep/RCU/tracing management, and remove the
      x86-specific helpers.
      
      Prior to this patch, the guest timing was entered in
      kvm_guest_enter_irqoff() (called by svm_vcpu_enter_exit() and
      vmx_vcpu_enter_exit()), and was exited by the call to
      vtime_account_guest_exit() within vcpu_enter_guest().
      
      To minimize duplication and to more clearly balance entry and exit, both
      entry and exit of guest timing are placed in vcpu_enter_guest(), using
      the new guest_timing_{enter,exit}_irqoff() helpers. When context
      tracking is used a small amount of additional time will be accounted
      towards guests; tick-based accounting is unaffected as IRQs are
      disabled at this point and not enabled until after the return from the
      guest.
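
      A hedged sketch of the resulting shape in vcpu_enter_guest() (the
      guest_timing_{enter,exit}_irqoff() helpers are from this series; the
      loop is heavily abridged):

        guest_timing_enter_irqoff();

        for (;;) {
                /* vendor code handles lockdep/RCU/tracing transitions via
                 * guest_state_{enter,exit}_irqoff() internally */
                exit_fastpath = static_call(kvm_x86_vcpu_run)(vcpu);
                if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
                        break;
                /* fastpath re-entry stays inside the guest-timing section */
        }

        guest_timing_exit_irqoff();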
      
      This also corrects (benign) mis-balanced context tracking accounting
      introduced in commits:
      
        ae95f566 ("KVM: X86: TSCDEADLINE MSR emulation fastpath")
        26efe2fd ("KVM: VMX: Handle preemption timer fastpath")
      
      Where KVM can enter a guest multiple times, calling vtime_guest_enter()
      without a corresponding call to vtime_account_guest_exit(), and with
      vtime_account_system() called when vtime_account_guest() should be used.
      As account_system_time() checks PF_VCPU and calls account_guest_time(),
      this doesn't result in any functional problem, but is unnecessarily
      confusing.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <20220201132926.3301912-4-mark.rutland@arm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 28 Jan 2022 (2 commits)
    • KVM: x86: add system attribute to retrieve full set of supported xsave states · dd6e6312
      Committed by Paolo Bonzini
      Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded
      VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that
      have not been enabled.  Probing those, for example so that they can be
      passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl.
      The latter is in fact worse, even though that is what the rest of the
      API uses, because it would require supported_xcr0 to be moved from the
      KVM module to the kernel just for this use.  In addition, the value
      would be nonsensical (or an error would have to be returned) until
      the KVM module is loaded.
      
      Therefore, to limit the growth of system ioctls, add a /dev/kvm
      variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86
      with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP).
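
      A hedged userspace sketch of the new query (ioctl, group, and attribute
      names per the commit text; the wrapper is illustrative):

        #include <linux/kvm.h>
        #include <sys/ioctl.h>
        #include <stdint.h>

        /* Ask /dev/kvm for the full set of supported xsave states, e.g. to
         * feed into arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM, ...). */
        static int get_guest_xcomp_supp(int kvm_fd, uint64_t *mask)
        {
                struct kvm_device_attr attr = {
                        .group = 0,
                        .attr  = KVM_X86_XCOMP_GUEST_SUPP,
                        .addr  = (uintptr_t)mask,
                };

                return ioctl(kvm_fd, KVM_GET_DEVICE_ATTR, &attr);
        }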
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr · 56f289a8
      Committed by Sean Christopherson
      Add a helper to handle converting the u64 userspace address embedded in
      struct kvm_device_attr into a userspace pointer; it's all too easy to
      forget the intermediate "unsigned long" cast as well as the truncation
      check.
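
      A hedged sketch of the helper (shown with the __user-safe error return
      from the ERR_PTR_USR() fixup listed above; details recalled, not
      verified):

        static void __user *kvm_get_attr_addr(struct kvm_device_attr *attr)
        {
                void __user *uaddr = (void __user *)(unsigned long)attr->addr;

                /* Catch truncation when unsigned long is 32 bits. */
                if ((u64)(unsigned long)uaddr != attr->addr)
                        return ERR_PTR_USR(-EFAULT);

                return uaddr;
        }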
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 27 Jan 2022 (5 commits)
    • KVM: x86: Sync the states size with the XCR0/IA32_XSS at any time · 05a9e065
      Committed by Like Xu
      XCR0 is reset to 1 by RESET but not INIT, and IA32_XSS is zeroed by
      both RESET and INIT. kvm_set_msr_common()'s handling of MSR_IA32_XSS
      also needs to call kvm_update_cpuid_runtime(). In the above cases, the
      size in bytes of the XSAVE area containing all states enabled by XCR0 or
      (XCR0 | IA32_XSS) needs to be updated.
      
      For simplicity and consistency, existing helpers are used to write values
      and call kvm_update_cpuid_runtime(), and it's not exactly a fast path.
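
      A hedged sketch of the runtime CPUID.0xD size update those helpers
      trigger (the shape of kvm_update_cpuid_runtime(); recalled and
      abridged, not verified):

        best = kvm_find_cpuid_entry(vcpu, 0xD, 0);
        if (best)
                best->ebx = xstate_required_size(vcpu->arch.xcr0, false);

        best = kvm_find_cpuid_entry(vcpu, 0xD, 1);
        if (best)
                best->ebx = xstate_required_size(vcpu->arch.xcr0 |
                                                 vcpu->arch.ia32_xss, true);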
      
      Fixes: a554d207 ("KVM: X86: Processor States following Reset or INIT")
      Cc: stable@vger.kernel.org
      Signed-off-by: Like Xu <likexu@tencent.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220126172226.2298529-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS · 4c282e51
      Committed by Like Xu
      Do a runtime CPUID update for a vCPU if MSR_IA32_XSS is written, as the
      size in bytes of the XSAVE area is affected by the states enabled in XSS.
      
      Fixes: 20300099 ("kvm: vmx: add MSR logic for XSAVES")
      Cc: stable@vger.kernel.org
      Signed-off-by: Like Xu <likexu@tencent.com>
      [sean: split out as a separate patch, adjust Fixes tag]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220126172226.2298529-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Keep MSR_IA32_XSS unchanged for INIT · be4f3b3f
      Committed by Xiaoyao Li
      As of SDM version 075, the architecture was corrected: MSR_IA32_XSS is
      reset to zero on power-up and RESET but is left unchanged on INIT.
      
      Fixes: a554d207 ("KVM: X86: Processor States following Reset or INIT")
      Cc: stable@vger.kernel.org
      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220126172226.2298529-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Forcibly leave nested virt when SMM state is toggled · f7e57078
      Committed by Sean Christopherson
      Forcibly leave nested virtualization operation if userspace toggles SMM
      state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS.  If userspace
      forces the vCPU out of SMM while it's post-VMXON and then injects an SMI,
      vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both
      vmxon=false and smm.vmxon=false, but all other nVMX state allocated.
      
      Don't attempt to gracefully handle the transition as (a) most transitions
      are nonsensical, e.g. forcing SMM while L2 is running, (b) there isn't
      sufficient information to handle all transitions, e.g. SVM wants access
      to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede
      KVM_SET_NESTED_STATE during state restore as the latter disallows putting
      the vCPU into L2 if SMM is active, and disallows tagging the vCPU as
      being post-VMXON in SMM if SMM is not active.
      
      Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX
      due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond
      just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU
      in an architecturally impossible state.
      
        WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
        WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
        Modules linked in:
        CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
        RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
        Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00
        Call Trace:
         <TASK>
         kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123
         kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
         kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460
         kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline]
         kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676
         kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline]
         kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250
         kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273
         __fput+0x286/0x9f0 fs/file_table.c:311
         task_work_run+0xdd/0x1a0 kernel/task_work.c:164
         exit_task_work include/linux/task_work.h:32 [inline]
         do_exit+0xb29/0x2a30 kernel/exit.c:806
         do_group_exit+0xd2/0x2f0 kernel/exit.c:935
         get_signal+0x4b0/0x28c0 kernel/signal.c:2862
         arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
         handle_signal_work kernel/entry/common.c:148 [inline]
         exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
         exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
         __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
         syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
         do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
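
      A hedged sketch of the fix in the SMM-toggle path (names recalled from
      the fix, not verified; the surrounding code is abridged):

        /* kvm_vcpu_ioctl_x86_set_vcpu_events(), abridged */
        if (events->flags & KVM_VCPUEVENT_VALID_SMM) {
                if (!!(vcpu->arch.hflags & HF_SMM_MASK) != events->smi.smm) {
                        /* Toggling SMM: forcibly drop all nested state. */
                        kvm_x86_ops.nested_ops->leave_nested(vcpu);
                        kvm_smm_changed(vcpu, events->smi.smm);
                }
                /* ... */
        }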
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220125220358.2091737-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Pass emulation type to can_emulate_instruction() · 4d31d9ef
      Committed by Sean Christopherson
      Pass the emulation type to kvm_x86_ops.can_emulate_instruction() so that
      a future commit can harden KVM's SEV support to WARN on emulation
      scenarios that should never happen.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-6-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  10. 25 Jan 2022 (1 commit)
  11. 20 Jan 2022 (1 commit)
    • KVM: VMX: Don't do full kick when triggering posted interrupt "fails" · 0f65a9d3
      Committed by Sean Christopherson
      Replace the full "kick" with just the "wake" in the fallback path when
      triggering a virtual interrupt via a posted interrupt fails because the
      guest is not IN_GUEST_MODE.  If the vCPU transitions into guest mode
      between the check and the wake, then it's guaranteed to see the pending
      interrupt, as KVM syncs the PIR to the IRR (and onto GUEST_RVI) after
      setting IN_GUEST_MODE.  Kicking the vCPU in this case is nothing more
      than an unnecessary VM-Exit (and host IRQ).
      
      Opportunistically update comments to explain the various ordering rules
      and barriers at play.
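
      A hedged sketch of the fallback (the shape of VMX's posted-interrupt
      send path; names recalled, not verified):

        /* vmx_deliver_posted_interrupt() fallback, abridged */
        if (vcpu->mode == IN_GUEST_MODE) {
                /* Hardware delivers: send the notification IPI to the pCPU
                 * currently running the vCPU. */
                apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
                                    POSTED_INTR_VECTOR);
        } else {
                /* Not in guest mode: a wake suffices, since the entry path
                 * syncs PIR to IRR after setting IN_GUEST_MODE; a full kick
                 * would just force a pointless VM-Exit. */
                kvm_vcpu_wake_up(vcpu);
        }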
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211208015236.1616697-17-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>