1. 30 September 2021 (10 commits)
    • KVM: Drop 'except' parameter from kvm_make_vcpus_request_mask() · 381cecc5
      Vitaly Kuznetsov authored
      Both remaining callers of kvm_make_vcpus_request_mask() pass 'NULL' for
      the 'except' parameter, so it can just be dropped.
      
      No functional change intended.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210903075141.403071-6-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Optimize kvm_make_vcpus_request_mask() a bit · ae0946cd
      Vitaly Kuznetsov authored
      Iterating over set bits in 'vcpu_bitmap' should be faster than going
      through all vCPUs, especially when just a few bits are set.
      
      Drop the kvm_make_vcpus_request_mask() call from kvm_make_all_cpus_request_except()
      to avoid handling the special case where 'vcpu_bitmap' is NULL; move that
      code into kvm_make_all_cpus_request_except() itself.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210903075141.403071-5-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
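      A minimal sketch of the resulting loop, assuming the series' factored-out
      per-vCPU helper (named kvm_make_vcpu_request() here; treat the helper name
      as illustrative rather than the verbatim upstream code):

        struct kvm_vcpu *vcpu;
        unsigned int i;

        /* Walk only the set bits instead of scanning every online vCPU. */
        for_each_set_bit(i, vcpu_bitmap, KVM_MAX_VCPUS) {
                vcpu = kvm_get_vcpu(kvm, i);
                if (!vcpu)
                        continue;
                kvm_make_vcpu_request(kvm, vcpu, req, tmp, me);
        }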
    • KVM: x86: hyper-v: Avoid calling kvm_make_vcpus_request_mask() with vcpu_mask==NULL · 6470accc
      Vitaly Kuznetsov authored
      In preparation for making kvm_make_vcpus_request_mask() use for_each_set_bit(),
      switch kvm_hv_flush_tlb() to calling kvm_make_all_cpus_request() for the
      'all cpus' case.
      
      Note: kvm_make_all_cpus_request() (unlike kvm_make_vcpus_request_mask())
      currently dynamically allocates cpumask on each call and this is suboptimal.
      Both kvm_make_all_cpus_request() and kvm_make_vcpus_request_mask() are
      going to be switched to using pre-allocated per-cpu masks.
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210903075141.403071-4-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: use vma_pages() helper · 11476d27
      Yang Li authored
      Use the vma_pages() helper on the vma object instead of open-coding the computation.
      
      Fix the following coccicheck warning:
      ./virt/kvm/kvm_main.c:3526:29-35: WARNING: Consider using vma_pages
      helper on vma
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
      Message-Id: <1632900526-119643-1-git-send-email-yang.lee@linux.alibaba.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
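      For reference, the helper already exists in include/linux/mm.h, so the
      change is a pure readability cleanup; the before/after pair is a sketch of
      the warning site, not the exact diff:

        /* include/linux/mm.h */
        static inline unsigned long vma_pages(struct vm_area_struct *vma)
        {
                return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
        }

        /* Before: */ pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
        /* After:  */ pages = vma_pages(vma);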
    • KVM: nVMX: Reset vmxon_ptr upon VMXOFF emulation. · feb3162f
      Vitaly Kuznetsov authored
      Currently, 'vmx->nested.vmxon_ptr' is not reset upon VMXOFF
      emulation. This is not a problem per se as we never access
      it when !vmx->nested.vmxon. But this should be done to avoid
      any issue in the future.
      
      Also, initialize the vmxon_ptr when vcpu is created.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
      Message-Id: <20210929175154.11396-3-yu.c.zhang@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Use INVALID_GPA for pointers used in nVMX. · 64c78508
      Yu Zhang authored
      Clean up nested.c and vmx.c by using INVALID_GPA instead of "-1ull" to
      denote an invalid address in nested VMX. Affected addresses are those of
      the VMXON region, the current VMCS, the VMCS link pointer, the virtual-APIC
      page, the ENCLS-exiting bitmap, the I/O bitmaps, etc.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
      Message-Id: <20210929175154.11396-2-yu.c.zhang@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
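      A sketch of the pattern the two nVMX patches above converge on, assuming
      x86's INVALID_GPA sentinel (defined as ~(gpa_t)0); the field names follow
      struct nested_vmx, but this is not the verbatim diff:

        /* VMXOFF emulation (and vCPU creation) mark nested pointers invalid. */
        vmx->nested.vmxon_ptr = INVALID_GPA;
        vmx->nested.current_vmptr = INVALID_GPA;

        /* Checks compare against the named sentinel, not a bare -1ull. */
        if (vmx->nested.current_vmptr == INVALID_GPA)
                return nested_vmx_failInvalid(vcpu);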
    • KVM: selftests: Ensure all migrations are performed when test is affined · 7b0035ea
      Sean Christopherson authored
      Rework the CPU selection in the migration worker to ensure the specified
      number of migrations are performed when the test itself is affined to a
      subset of CPUs.  The existing logic skips iterations if the target CPU is
      not in the original set of possible CPUs, which causes the test to fail
      if too many iterations are skipped.
      
        ==== Test Assertion Failure ====
        rseq_test.c:228: i > (NR_TASK_MIGRATIONS / 2)
        pid=10127 tid=10127 errno=4 - Interrupted system call
           1  0x00000000004018e5: main at rseq_test.c:227
           2  0x00007fcc8fc66bf6: ?? ??:0
           3  0x0000000000401959: _start at ??:?
        Only performed 4 KVM_RUNs, task stalled too much?
      
      Calculate the min/max possible CPUs as a cheap "best effort" to avoid
      high runtimes when the test is affined to a small percentage of CPUs.
      Alternatively, a list or xarray of the possible CPUs could be used, but
      even in a horrendously inefficient setup, such optimizations are not
      needed because the runtime is completely dominated by the cost of
      migrating the task, and the absolute runtime is well under a minute in
      even truly absurd setups, e.g. running on a subset of vCPUs in a VM that
      is heavily overcommitted (16 vCPUs per pCPU).
      
      Fixes: 61e52f16 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs")
      Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210929234112.1862848-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
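      A sketch of the wrap-around selection under the min/max scheme, assuming
      the test captured its affinity in 'allowed_mask' via sched_getaffinity()
      and precomputed min_cpu/max_cpu from it (paraphrased from the commit
      description, not the exact diff):

        /* Wrap within [min_cpu, max_cpu] so an affined run keeps migrating
         * instead of skipping iterations whose target CPU isn't usable.
         */
        cpu++;
        if (cpu > max_cpu)
                cpu = min_cpu;
        if (!CPU_ISSET(cpu, &allowed_mask))
                continue;       /* a hole inside the range; cheap to skip */

        CPU_ZERO(&cpuset);
        CPU_SET(cpu, &cpuset);
        r = sched_setaffinity(0, sizeof(cpuset), &cpuset);
        TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)",
                    errno, strerror(errno));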
    • KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checks · e8a747d0
      Sean Christopherson authored
      Check whether a CPUID entry's index is significant before checking for a
      matching index to hack-a-fix an undefined behavior bug due to consuming
      uninitialized data.  RESET/INIT emulation uses kvm_cpuid() to retrieve
      CPUID.0x1, which does _not_ have a significant index, and fails to
      initialize the dummy variable that doubles as EBX/ECX/EDX output _and_
      ECX, a.k.a. index, input.
      
      Practically speaking, it's _extremely_ unlikely any compiler will yield
      code that causes problems, as the compiler would need to inline the
      kvm_cpuid() call to detect the uninitialized data, and intentionally hose
      the kernel, e.g. insert ud2, instead of simply ignoring the result of
      the index comparison.
      
      Although the sketchy "dummy" pattern was introduced in SVM by commit
      66f7b72e ("KVM: x86: Make register state after reset conform to
      specification"), it wasn't actually broken until commit 7ff6c035
      ("KVM: x86: Remove stateful CPUID handling") arbitrarily swapped the
      order of operations such that "index" was checked before the significant
      flag.
      
      Avoid consuming uninitialized data by reverting to checking the flag
      before the index purely so that the fix can be easily backported; the
      offending RESET/INIT code has been refactored, moved, and consolidated
      from vendor code to common x86 since the bug was introduced.  A future
      patch will directly address the bad RESET/INIT behavior.
      
      The undefined behavior was detected by syzbot + KernelMemorySanitizer.
      
        BUG: KMSAN: uninit-value in cpuid_entry2_find arch/x86/kvm/cpuid.c:68
        BUG: KMSAN: uninit-value in kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103
        BUG: KMSAN: uninit-value in kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183
         cpuid_entry2_find arch/x86/kvm/cpuid.c:68 [inline]
         kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103 [inline]
         kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183
         kvm_vcpu_reset+0x13fb/0x1c20 arch/x86/kvm/x86.c:10885
         kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923
         vcpu_enter_guest+0xfd2/0x6d80 arch/x86/kvm/x86.c:9534
         vcpu_run+0x7f5/0x18d0 arch/x86/kvm/x86.c:9788
         kvm_arch_vcpu_ioctl_run+0x245b/0x2d10 arch/x86/kvm/x86.c:10020
      
        Local variable ----dummy@kvm_vcpu_reset created at:
         kvm_vcpu_reset+0x1fb/0x1c20 arch/x86/kvm/x86.c:10812
         kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923
      
      Reported-by: syzbot+f3985126b746b3d59c9d@syzkaller.appspotmail.com
      Reported-by: Alexander Potapenko <glider@google.com>
      Fixes: 2a24be79 ("KVM: VMX: Set EDX at INIT with CPUID.0x1, Family-Model-Stepping")
      Fixes: 7ff6c035 ("KVM: x86: Remove stateful CPUID handling")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Message-Id: <20210929222426.1855730-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
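      The shape of the restored ordering in cpuid_entry2_find(): evaluate the
      significant-index flag before ever reading 'index' (the flag really is
      spelled KVM_CPUID_FLAG_SIGNIFCANT_INDEX in-tree); a sketch, not the exact
      diff:

        /* Short-circuit on the flag so an insignificant, possibly
         * uninitialized index is never consumed.
         */
        if (e->function == function &&
            (!(e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) || e->index == index))
                return e;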
    • ptp: Fix ptp_kvm_getcrosststamp issue for x86 ptp_kvm · 773e89ab
      Zelin Deng authored
      hv_clock is preallocated to have only HVC_BOOT_ARRAY_SIZE (64) elements;
      if the PTP_SYS_OFFSET_PRECISE ioctl is executed on vCPUs whose index is
      64 or higher, retrieving the struct pvclock_vcpu_time_info pointer with
      "src = &hv_clock[cpu].pvti" will result in an out-of-bounds access and
      a wild pointer.  Change it to "this_cpu_pvti()", which is guaranteed to
      be valid.
      
      Fixes: 95a3d445 ("Switch kvmclock data to a PER_CPU variable")
      Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
      Cc: <stable@vger.kernel.org>
      Message-Id: <1632892429-101194-3-git-send-email-zelin.deng@linux.alibaba.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/kvmclock: Move this_cpu_pvti into kvmclock.h · ad9af930
      Zelin Deng authored
      Other modules such as ptp_kvm may use the hv_clock_per_cpu variable, so
      move it into kvmclock.h and export the symbol to make it visible to
      other modules.
      Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
      Cc: <stable@vger.kernel.org>
      Message-Id: <1632892429-101194-2-git-send-email-zelin.deng@linux.alibaba.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
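      A sketch of what the two kvmclock patches above add up to, assuming the
      accessor lands in arch/x86/include/asm/kvmclock.h as described (condensed
      from the upstream shape, not the verbatim diff):

        /* arch/x86/include/asm/kvmclock.h */
        DECLARE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);

        static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void)
        {
                return &this_cpu_read(hv_clock_per_cpu)->pvti;
        }

        /* drivers/ptp/ptp_kvm_x86.c: valid for any vCPU index, unlike the
         * out-of-bounds &hv_clock[cpu].pvti.
         */
        src = this_cpu_pvti();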
  2. 28 September 2021 (1 commit)
  3. 27 September 2021 (1 commit)
    • KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issue · 5c49d185
      Zhenzhong Duan authored
      When updating the host's mask for its MSR_IA32_TSX_CTRL user return entry,
      clear the mask in the found uret MSR instead of vmx->guest_uret_msrs[i].
      Modifying guest_uret_msrs directly is completely broken as 'i' does not
      point at the MSR_IA32_TSX_CTRL entry.  In fact, it's guaranteed to be an
      out-of-bounds access as 'i' is always set to kvm_nr_uret_msrs in a prior
      loop. By sheer dumb luck, the fallout is limited to "only" failing to
      preserve the host's TSX_CTRL_CPUID_CLEAR.  The out-of-bounds access is
      benign as it's guaranteed to clear a bit in a guest MSR value, which is
      always zero at vCPU creation on both x86-64 and i386.
      
      Cc: stable@vger.kernel.org
      Fixes: 8ea8b8d6 ("KVM: VMX: Use common x86's uret MSR list as the one true list")
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210926015545.281083-1-zhenzhong.duan@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
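      A sketch of the fix, using vmx_find_uret_msr() (the existing lookup helper
      in vmx.c) so the mask is cleared on the entry that was actually found:

        struct vmx_uret_msr *tsx_ctrl;

        /* Look the MSR up; never index guest_uret_msrs[] with the stale 'i'. */
        tsx_ctrl = vmx_find_uret_msr(vmx, MSR_IA32_TSX_CTRL);
        if (tsx_ctrl)
                tsx_ctrl->mask = ~(u64)TSX_CTRL_CPUID_CLEAR;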
  4. 24 September 2021 (3 commits)
  5. 23 September 2021 (7 commits)
  6. 22 September 2021 (18 commits)
    • kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[] · e1fc1553
      Fares Mehanna authored
      Intel PMU MSRs are in msrs_to_save_all[], so add the AMD PMU MSRs to get
      consistent behavior between Intel and AMD when using KVM_GET_MSRS,
      KVM_SET_MSRS, or KVM_GET_MSR_INDEX_LIST.
      
      We have to add both legacy and new MSRs to handle guests running without
      X86_FEATURE_PERFCTR_CORE.
      Signed-off-by: Fares Mehanna <faresx@amazon.de>
      Message-Id: <20210915133951.22389-1-faresx@amazon.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
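      A sketch of the additions to msrs_to_save_all[] in arch/x86/kvm/x86.c,
      covering the legacy K7 MSRs plus the PERFCTR_CORE (F15h) ones; check the
      patch itself for the exact set:

        static const u32 msrs_to_save_all[] = {
                /* ... existing entries, including the Intel PMU MSRs ... */
                MSR_K7_EVNTSEL0, MSR_K7_EVNTSEL1, MSR_K7_EVNTSEL2, MSR_K7_EVNTSEL3,
                MSR_K7_PERFCTR0, MSR_K7_PERFCTR1, MSR_K7_PERFCTR2, MSR_K7_PERFCTR3,
                MSR_F15H_PERF_CTL0, MSR_F15H_PERF_CTL1, MSR_F15H_PERF_CTL2,
                MSR_F15H_PERF_CTL3, MSR_F15H_PERF_CTL4, MSR_F15H_PERF_CTL5,
                MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
                MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
        };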
    • KVM: x86: nVMX: re-evaluate emulation_required on nested VM exit · dbab610a
      Maxim Levitsky authored
      If L1 had invalid state on VM entry (which can happen on SMM transactions
      when we enter from real mode straight into a nested guest), then after we
      load 'host' state from VMCS12, the state has to become valid again. But
      since we load the segment registers with __vmx_set_segment(), we weren't
      always updating emulation_required.
      
      Update emulation_required explicitly at the end of load_vmcs12_host_state().
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210913140954.165665-8-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry · c8607e4a
      Maxim Levitsky authored
      It is possible that when non-root mode is entered via a special entry
      (!from_vmentry), that is, from SMM or from loading nested state, the L2
      state is invalid with respect to non-unrestricted guest mode, but later
      becomes valid (for example, when RSM emulation restores the segment
      registers from SMRAM).
      
      Thus delay the check to VM entry, where we will check this and fail.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210913140954.165665-7-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: VMX: synthesize invalid VM exit when emulating invalid guest state · c42dec14
      Maxim Levitsky authored
      Since no actual VM entry happened, the VM exit information is stale.
      To avoid this, synthesize an invalid VM guest state VM exit.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210913140954.165665-6-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: nSVM: refactor svm_leave_smm and smm_enter_smm · 136a55c0
      Maxim Levitsky authored
      Use return statements instead of nested ifs, and fix the error
      path to free all the maps that were allocated.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210913140954.165665-2-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: SVM: call KVM_REQ_GET_NESTED_STATE_PAGES on exit from SMM mode · e85d3e7b
      Maxim Levitsky authored
      Currently KVM_REQ_GET_NESTED_STATE_PAGES on SVM only reloads the PDPTRs
      and the MSR bitmap. The former is not really needed for SMM, as the SMM
      exit code reloads them again from SMRAM's CR3, and the latter happens to
      work since the MSR bitmap isn't modified while in SMM.
      
      Still, it is better to be consistent with VMX.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210913140954.165665-5-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: reset pdptrs_from_userspace when exiting smm · 37687c40
      Maxim Levitsky authored
      When exiting SMM, the PDPTRs are loaded again from guest memory.
      
      This fixes a theoretical bug: an exit from SMM can trigger entry to a
      nested guest, which reuses some of the migration code that uses this
      flag as a workaround for legacy userspace.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210913140954.165665-4-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: nSVM: restore the L1 host state prior to resuming nested guest on SMM exit · e2e6e449
      Maxim Levitsky authored
      Otherwise the guest entry code might see incorrect L1 state (e.g. paging state).
      
      Fixes: 37be407b ("KVM: nSVM: Fix L1 state corruption upon return from SMM")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210913140954.165665-3-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Filter out all unsupported controls when eVMCS was activated · 8d68bad6
      Vitaly Kuznetsov authored
      Windows Server 2022 with Hyper-V role enabled failed to boot on KVM when
      enlightened VMCS is advertised. Debugging revealed there are two exposed
      secondary controls it is not happy with: SECONDARY_EXEC_ENABLE_VMFUNC and
      SECONDARY_EXEC_SHADOW_VMCS. These controls are known to be unsupported,
      as there are no corresponding fields in eVMCSv1 (see the comment above
      EVMCS1_UNSUPPORTED_2NDEXEC definition).
      
      Previously, commit 31de3d25 ("x86/kvm/hyper-v: move VMX controls
      sanitization out of nested_enable_evmcs()") introduced the required
      filtering mechanism for VMX MSRs, but for some reason covered only the
      controls known to be problematic, not the full EVMCS1_UNSUPPORTED_*
      lists.
      
      Note, Windows Server 2022 seems to have gained some sanity checks for VMX
      MSRs: it doesn't even try to launch a guest when there's something it
      doesn't like, so the nested_evmcs_check_controls() mechanism can't catch
      the problem.
      
      Let's be bold this time and instead of playing whack-a-mole just filter out
      all unsupported controls from VMX MSRs.
      
      Fixes: 31de3d25 ("x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210907163530.110066-1-vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
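      A condensed sketch of the resulting filter, masking with the full
      EVMCS1_UNSUPPORTED_* lists from evmcs.h rather than a hand-picked subset
      (two of the MSR cases shown; the real function covers them all):

        void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
        {
                u32 ctl_low = (u32)*pdata;
                u32 ctl_high = (u32)(*pdata >> 32);

                switch (msr_index) {
                case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
                case MSR_IA32_VMX_PINBASED_CTLS:
                        ctl_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
                        break;
                case MSR_IA32_VMX_PROCBASED_CTLS2:
                        ctl_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
                        break;
                }

                *pdata = ctl_low | ((u64)ctl_high << 32);
        }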
    • KVM: KVM: Use cpumask_available() to check for NULL cpumask when kicking vCPUs · 0bbc2ca8
      Sean Christopherson authored
      Check for a NULL cpumask_var_t when kicking multiple vCPUs via
      cpumask_available(), which performs a !NULL check if and only if cpumasks
      are configured to be allocated off-stack.  This is a meaningless
      optimization, e.g. avoids a TEST+Jcc and TEST+CMOV on x86, but more
      importantly helps document that the NULL check is necessary even though
      all callers pass in a local variable.
      
      No functional change intended.
      
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210827092516.1027264-3-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
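      A before/after sketch (the surrounding condition is illustrative):
      cpumask_available() compiles to a real !NULL test only with
      CONFIG_CPUMASK_OFFSTACK=y, and to 'true' otherwise, which is exactly what
      an on-stack cpumask_var_t needs:

        /* Before: reads as though 'tmp' could be NULL in every config. */
        if (tmp != NULL && cpu != -1 && cpu != me)
                __cpumask_set_cpu(cpu, tmp);

        /* After: documents that NULL is possible only off-stack. */
        if (cpumask_available(tmp) && cpu != -1 && cpu != me)
                __cpumask_set_cpu(cpu, tmp);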
    • KVM: Clean up benign vcpu->cpu data races when kicking vCPUs · 85b64045
      Sean Christopherson authored
      Fix a benign data race reported by syzbot+KCSAN[*] by ensuring vcpu->cpu
      is read exactly once, and by ensuring the vCPU is booted from guest mode
      if kvm_arch_vcpu_should_kick() returns true.  Fix a similar race in
      kvm_make_vcpus_request_mask() by ensuring the vCPU is interrupted if
      kvm_request_needs_ipi() returns true.
      
      Reading vcpu->cpu before vcpu->mode (via kvm_arch_vcpu_should_kick() or
      kvm_request_needs_ipi()) means the target vCPU could get migrated (change
      vcpu->cpu) and enter !OUTSIDE_GUEST_MODE between reading vcpu->cpu and
      reading vcpu->mode.  If that happens, the kick/IPI will be sent to the
      old pCPU, not the new pCPU that is now running the vCPU or reading SPTEs.
      
      Although failing to kick the vCPU is not exactly ideal, practically
      speaking it cannot cause a functional issue unless there is also a bug in
      the caller, and any such bug would exist regardless of kvm_vcpu_kick()'s
      behavior.
      
      The purpose of sending an IPI is purely to get a vCPU into the host (or
      out of reading SPTEs) so that the vCPU can recognize a change in state,
      e.g. a KVM_REQ_* request.  If vCPU's handling of the state change is
      required for correctness, KVM must ensure either the vCPU sees the change
      before entering the guest, or that the sender sees the vCPU as running in
      guest mode.  All architectures handle this by (a) sending the request
      before calling kvm_vcpu_kick() and (b) checking for requests _after_
      setting vcpu->mode.
      
      x86's READING_SHADOW_PAGE_TABLES has similar requirements; KVM needs to
      ensure it kicks and waits for vCPUs that started reading SPTEs _before_
      MMU changes were finalized, but any vCPU that starts reading after MMU
      changes were finalized will see the new state and can continue on
      uninterrupted.
      
      For uses of kvm_vcpu_kick() that are not paired with a KVM_REQ_*, e.g.
      x86's kvm_arch_sync_dirty_log(), the order of the kick must not be relied
      upon for functional correctness, e.g. in the dirty log case, userspace
      cannot assume it has a 100% complete log if vCPUs are still running.
      
      All that said, eliminate the benign race since the cost of doing so is an
      "extra" atomic cmpxchg() in the case where the target vCPU is loaded by
      the current pCPU or is not loaded at all.  I.e. the kick will be skipped
      due to kvm_vcpu_exiting_guest_mode() seeing a compatible vcpu->mode as
      opposed to the kick being skipped because of the cpu checks.
      
      Keep the "cpu != me" checks even though they appear useless/impossible at
      first glance.  x86 processes guest IPI writes in a fast path that runs in
      IN_GUEST_MODE, i.e. can call kvm_vcpu_kick() from IN_GUEST_MODE.  And
      calling kvm_vm_bugged()->kvm_make_vcpus_request_mask() from IN_GUEST or
      READING_SHADOW_PAGE_TABLES is perfectly reasonable.
      
      Note, a race with the cpu_online() check in kvm_vcpu_kick() likely
      persists, e.g. the vCPU could exit guest mode and get offlined between
      the cpu_online() check and the sending of smp_send_reschedule().  But,
      the online check appears to exist only to avoid a WARN in x86's
      native_smp_send_reschedule() that fires if the target CPU is not online.
      The reschedule WARN exists because CPU offlining takes the CPU out of the
      scheduling pool, i.e. the WARN is intended to detect the case where the
      kernel attempts to schedule a task on an offline CPU.  The actual sending
      of the IPI is a non-issue as at worst it will simply be dropped on the
      floor.  In other words, KVM's usurping of the reschedule IPI could
      theoretically trigger a WARN if the stars align, but there will be no
      loss of functionality.
      
      [*] https://syzkaller.appspot.com/bug?extid=cd4154e502f43f10808a
      
      Cc: Venkatesh Srinivas <venkateshs@google.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Fixes: 97222cc8 ("KVM: Emulate local APIC in kernel")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210827092516.1027264-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
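      A sketch of kvm_vcpu_kick() after the change, per the description above:
      vcpu->cpu is read exactly once, and only after kvm_arch_vcpu_should_kick()'s
      cmpxchg has observed IN_GUEST_MODE:

        void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
        {
                int me, cpu;

                if (kvm_vcpu_wake_up(vcpu))
                        return;

                me = get_cpu();
                /* Read vcpu->cpu only after should_kick(); a stale value at
                 * worst IPIs the old pCPU, which is benign (see above).
                 */
                if (kvm_arch_vcpu_should_kick(vcpu)) {
                        cpu = READ_ONCE(vcpu->cpu);
                        if (cpu != me && (unsigned int)cpu < nr_cpu_ids &&
                            cpu_online(cpu))
                                smp_send_reschedule(cpu);
                }
                put_cpu();
        }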
    • KVM: x86: Fix stack-out-of-bounds memory access from ioapic_write_indirect() · 2f9b68f5
      Vitaly Kuznetsov authored
      KASAN reports the following issue:
      
       BUG: KASAN: stack-out-of-bounds in kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
       Read of size 8 at addr ffffc9001364f638 by task qemu-kvm/4798
      
       CPU: 0 PID: 4798 Comm: qemu-kvm Tainted: G               X --------- ---
       Hardware name: AMD Corporation DAYTONA_X/DAYTONA_X, BIOS RYM0081C 07/13/2020
       Call Trace:
        dump_stack+0xa5/0xe6
        print_address_description.constprop.0+0x18/0x130
        ? kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
        __kasan_report.cold+0x7f/0x114
        ? kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
        kasan_report+0x38/0x50
        kasan_check_range+0xf5/0x1d0
        kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
        kvm_make_scan_ioapic_request_mask+0x84/0xc0 [kvm]
        ? kvm_arch_exit+0x110/0x110 [kvm]
        ? sched_clock+0x5/0x10
        ioapic_write_indirect+0x59f/0x9e0 [kvm]
        ? static_obj+0xc0/0xc0
        ? __lock_acquired+0x1d2/0x8c0
        ? kvm_ioapic_eoi_inject_work+0x120/0x120 [kvm]
      
      The problem appears to be that 'vcpu_bitmap' is allocated as a single long
      on the stack when it should really be KVM_MAX_VCPUS bits long. We also seem
      to clear the lower 16 bits of it with bitmap_zero() for no particular
      reason (my guess would be that the 'bitmap' and 'vcpu_bitmap' variables in
      kvm_bitmap_or_dest_vcpus() caused the confusion: while the former is indeed
      16 bits long, the latter should accommodate all possible vCPUs).
      
      Fixes: 7ee30bc1 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
      Fixes: 9a2ae9f6 ("KVM: x86: Zero the IOAPIC scan request dest vCPUs bitmap")
      Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210827092516.1027264-7-vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
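      The shape of the fix in ioapic_write_indirect(), sketched:

        /* Before: one long (64 bits) on the stack, cleared as if 16 bits. */
        unsigned long vcpu_bitmap;
        bitmap_zero(&vcpu_bitmap, 16);

        /* After: sized for every possible vCPU. */
        DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
        bitmap_zero(vcpu_bitmap, KVM_MAX_VCPUS);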
    • KVM: selftests: Create a separate dirty bitmap per slot · 7c236b81
      David Matlack authored
      The calculation to get the per-slot dirty bitmap was incorrect, leading
      to a buffer overrun. Fix it by splitting out the dirty bitmap into a
      separate bitmap per slot.
      
      Fixes: 609e6202 ("KVM: selftests: Support multiple slots in dirty_log_perf_test")
      Signed-off-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Message-Id: <20210917173657.44011-4-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Refactor help message for -s backing_src · 9f2fc555
      David Matlack authored
      All selftests that support the backing_src option were printing their
      own description of the flag and then calling backing_src_help() to dump
      the list of available backing sources. Consolidate the flag printing in
      backing_src_help() to align indentation, reduce duplicated strings, and
      improve consistency across tests.
      
      Note: Passing "-s" to backing_src_help is unnecessary since every test
      uses the same flag. However I decided to keep it for code readability
      at the call sites.
      
      While here, opportunistically fix the incorrectly interleaved printing of
      the -x help message and the list of backing source types in
      dirty_log_perf_test.
      
      Fixes: 609e6202 ("KVM: selftests: Support multiple slots in dirty_log_perf_test")
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210917173657.44011-3-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Change backing_src flag to -s in demand_paging_test · a1e638da
      David Matlack authored
      Every other KVM selftest uses -s for the backing_src, so switch
      demand_paging_test to match.
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210917173657.44011-2-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: Allow some commands for mirror VM · 5b92b6ca
      Peter Gonda authored
      A mirrored SEV-ES VM will need to call KVM_SEV_LAUNCH_UPDATE_VMSA to set
      up its vCPUs and have them measured and their VMSAs encrypted. Without
      this change, it is impossible to have mirror VMs as part of SEV-ES VMs.
      
      Also allow the guest status check and debugging commands, since they do
      not change any guest state.
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Nathan Tempelman <natet@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Steve Rutherford <srutherford@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 54526d1f ("KVM: x86: Support KVM VMs sharing SEV context", 2021-04-21)
      Message-Id: <20210921150345.2221634-3-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
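      A sketch of the allow-list the description implies; the predicate name is
      illustrative, while the KVM_SEV_* command IDs are the real UAPI values:

        static bool is_cmd_allowed_from_mirror(u32 cmd_id)
        {
                /* LAUNCH_UPDATE_VMSA lets a mirror measure/encrypt its vCPU
                 * state (SEV-ES); the status and debug commands change nothing.
                 */
                return cmd_id == KVM_SEV_LAUNCH_UPDATE_VMSA ||
                       cmd_id == KVM_SEV_GUEST_STATUS ||
                       cmd_id == KVM_SEV_DBG_DECRYPT ||
                       cmd_id == KVM_SEV_DBG_ENCRYPT;
        }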
    • KVM: SEV: Update svm_vm_copy_asid_from for SEV-ES · f43c887c
      Peter Gonda authored
      For mirroring SEV-ES, the mirror VM will need more than just the ASID.
      The FD and the handle are required to allow the mirror to call PSP
      commands. The mirror VM will need to call KVM_SEV_LAUNCH_UPDATE_VMSA to
      set up its vCPUs' VMSAs for SEV-ES.
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Nathan Tempelman <natet@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Steve Rutherford <srutherford@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 54526d1f ("KVM: x86: Support KVM VMs sharing SEV context", 2021-04-21)
      Message-Id: <20210921150345.2221634-2-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Fix nested bus lock VM exit · 24a996ad
      Chenyi Qiang authored
      Nested bus lock VM exits are not supported yet. If L2 triggers a bus lock
      VM exit, it will be directed to the L1 VMM, which would cause unexpected
      behavior. Therefore, handle L2's bus lock VM exits in L0 directly.
      
      Fixes: fe6b6bc8 ("KVM: VMX: Enable bus lock VM exit")
      Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Message-Id: <20210914095041.29764-1-chenyi.qiang@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
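      The fix amounts to claiming the exit for L0 in nested_vmx_l0_wants_exit();
      a sketch of the added case, assuming that is where the handling lands:

        case EXIT_REASON_BUS_LOCK:
                /* Bus lock VM exits are never exposed to L1; handle L2's bus
                 * locks in L0 directly.
                 */
                return true;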