1. 11 Nov 2021, 4 commits
    • KVM: nVMX: Handle dynamic MSR intercept toggling · 67f4b996
      Sean Christopherson committed
      Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2,
      and always update the relevant bits in vmcs02.  This fixes two distinct,
      but intertwined bugs related to dynamic MSR bitmap modifications.
      
      The first issue is that KVM fails to enable MSR interception in vmcs02
      for the FS/GS base MSRs if L1 first runs L2 with interception disabled,
      and later enables interception.
      
      The second issue is that KVM fails to honor userspace MSR filtering when
      preparing vmcs02.
      
      Fix both issues simultaneously, as fixing only one of the issues (doesn't
      matter which) would create a mess that no one should have to bisect.
      Fixing only the first bug would exacerbate the MSR filtering issue as
      userspace would see inconsistent behavior depending on the whims of L1.
      Fixing only the second bug (MSR filtering) effectively requires fixing
      the first, as the nVMX code only knows how to transition vmcs02's
      bitmap from 1->0.
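      
      A minimal sketch of the merge rule described above, assuming a toy
      64-slot bitmap in which a set bit means "intercept writes to this MSR".
      The type and helper names are hypothetical, not KVM's helpers, and the
      real 4 KiB VMX MSR bitmap (separate read/write halves for low and high
      MSR ranges) is not modeled. The point is that the vmcs02 bit is always
      rewritten from both vmcs01 (L0, including userspace filtering) and vmcs12
      (L1), so it can move 0->1 as well as 1->0.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>
      
      /* Hypothetical toy bitmap: bit i = 1 means writes to MSR slot i are intercepted. */
      typedef struct {
          uint64_t bits;
      } msr_bitmap;
      
      static bool msr_bitmap_test(const msr_bitmap *map, unsigned int slot)
      {
          assert(slot < 64);
          return (map->bits >> slot) & 1;
      }
      
      static void msr_bitmap_assign(msr_bitmap *map, unsigned int slot, bool intercept)
      {
          assert(slot < 64);
          if (intercept)
              map->bits |= (uint64_t)1 << slot;
          else
              map->bits &= ~((uint64_t)1 << slot);
      }
      
      /*
       * Fixed behavior: intercept in vmcs02 if L0 (vmcs01) or L1 (vmcs12)
       * intercepts, and always write the result so the bit can be set again.
       */
      static void merge_msr_intercept(msr_bitmap *vmcs02, const msr_bitmap *vmcs01,
                                      const msr_bitmap *vmcs12, unsigned int slot)
      {
          bool intercept = msr_bitmap_test(vmcs01, slot) ||
                           msr_bitmap_test(vmcs12, slot);
      
          msr_bitmap_assign(vmcs02, slot, intercept);
      }
      
      /* Simplified stand-in for the old pattern: vmcs02 bits only go 1 -> 0. */
      static void merge_msr_intercept_clear_only(msr_bitmap *vmcs02,
                                                 const msr_bitmap *vmcs12,
                                                 unsigned int slot)
      {
          if (!msr_bitmap_test(vmcs12, slot))
              msr_bitmap_assign(vmcs02, slot, false);
      }
      ```
      
      In the toggle scenario from the first bug (L1 disables interception, runs
      L2, then re-enables it), only the always-write version ends up
      intercepting again.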
      
      Move the various accessor/mutators that are currently buried in vmx.c
      into vmx.h so that they can be shared by the nested code.
      
      Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
      Fixes: d69129b4 ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible")
      Cc: stable@vger.kernel.org
      Cc: Alexander Graf <graf@amazon.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211109013047.2041518-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use · 7dfbc624
      Sean Christopherson committed
      Check the current VMCS controls to determine if an MSR write will be
      intercepted due to MSR bitmaps being disabled.  In the nested VMX case,
      KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or
      if KVM can't map L1's bitmaps for whatever reason.
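      
      The check can be sketched as follows, assuming a toy VMCS model.
      CPU_BASED_USE_MSR_BITMAPS is bit 28 of the primary processor-based
      VM-execution controls (the value Linux uses); the structs and the
      function are illustrative only. When MSR bitmaps are off in the VMCS
      that is actually loaded (vmcs02 while L2 runs), every WRMSR exits
      regardless of what any bitmap says.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>
      
      /* Primary processor-based VM-execution control, bit 28: "use MSR bitmaps". */
      #define CPU_BASED_USE_MSR_BITMAPS 0x10000000u
      
      /* Hypothetical toy VMCS: exec controls plus a 64-slot write-intercept bitmap. */
      struct toy_vmcs {
          uint32_t exec_controls;
          uint64_t msr_write_bitmap; /* bit i = 1: intercept writes to MSR slot i */
      };
      
      /* Hypothetical vCPU: points at vmcs01 while L1 runs, vmcs02 while L2 runs. */
      struct toy_vcpu {
          const struct toy_vmcs *loaded_vmcs;
      };
      
      static bool msr_write_intercepted(const struct toy_vcpu *vcpu, unsigned int slot)
      {
          const struct toy_vmcs *cur = vcpu->loaded_vmcs;
      
          assert(slot < 64);
          /* Bitmaps disabled in the *current* VMCS: every WRMSR causes a VM-Exit. */
          if (!(cur->exec_controls & CPU_BASED_USE_MSR_BITMAPS))
              return true;
          return (cur->msr_write_bitmap >> slot) & 1;
      }
      ```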
      
      Note, the bad behavior is relatively benign in the current code base as
      KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and
      only if L0 KVM also disables interception of an MSR, and only uses the
      buggy helper for MSR_IA32_SPEC_CTRL.  Because KVM explicitly tests WRMSR
      before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check
      will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it
      isn't strictly necessary.
      
      Tag the fix for stable in case a future fix wants to use
      msr_write_intercepted(), in which case a buggy implementation in older
      kernels could prove subtly problematic.
      
      Fixes: d28b387f ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211109013047.2041518-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active · cae72dcc
      Maxim Levitsky committed
      KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using KVM's
      standard inject_pending_event(), not via APICv/AVIC.
      
      Since this is a debug feature, just inhibit APICv/AVIC while
      KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.
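      
      A sketch of the "at least one vCPU" rule, assuming a flat array of vCPUs.
      TOY_GUESTDBG_BLOCKIRQ and the struct are hypothetical stand-ins, not the
      real uapi flag value or KVM's APICv inhibit machinery: APICv/AVIC stays
      inhibited while any vCPU has the flag set and can be re-enabled once none
      do.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      
      /* Hypothetical stand-in for KVM_GUESTDBG_BLOCKIRQ (not the real uapi value). */
      #define TOY_GUESTDBG_BLOCKIRQ 0x1u
      
      struct toy_vcpu {
          unsigned int guest_debug; /* flags set via a KVM_SET_GUEST_DEBUG-like call */
      };
      
      /* Inhibit APICv/AVIC for the whole VM while any vCPU uses BLOCKIRQ. */
      static bool apicv_blockirq_inhibited(const struct toy_vcpu *vcpus, size_t nr_vcpus)
      {
          assert(vcpus != NULL || nr_vcpus == 0);
          for (size_t i = 0; i < nr_vcpus; i++) {
              if (vcpus[i].guest_debug & TOY_GUESTDBG_BLOCKIRQ)
                  return true;
          }
          return false;
      }
      ```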
      
      Fixes: 61e5f69e ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")
      Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool · e6cd31f1
      Jim Mattson committed
      These function names sound like predicates, and they have siblings,
      *is_valid_msr(), which _are_ predicates. Moreover, there are comments
      that essentially warn that these functions behave unexpectedly.
      
      Flip the polarity of the return values, so that they become
      predicates, and convert the boolean result to a success/failure code
      at the outer call site.
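      
      A sketch of the resulting shape under simplified assumptions: the PMU
      model only tracks a count of general-purpose counters, the validity check
      ignores fixed counters and any flag bits in ECX, and all names are
      hypothetical. The predicate returns true for a valid ECX, and only the
      outermost caller maps that to a 0/1 success/failure code.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>
      
      /* Hypothetical PMU model: only the number of general-purpose counters. */
      struct toy_pmu {
          unsigned int nr_gp_counters;
      };
      
      /* Predicate: true means ECX selects an existing counter (simplified check). */
      static bool toy_is_valid_rdpmc_ecx(const struct toy_pmu *pmu, uint32_t ecx)
      {
          assert(pmu != NULL);
          return ecx < pmu->nr_gp_counters;
      }
      
      /* Outer call site: convert the boolean into 0 (success) / 1 (failure). */
      static int toy_check_rdpmc(const struct toy_pmu *pmu, uint32_t ecx)
      {
          return toy_is_valid_rdpmc_ecx(pmu, ecx) ? 0 : 1;
      }
      ```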
      
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211105202058.1048757-1-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 25 Oct 2021, 3 commits
  3. 23 Oct 2021, 1 commit
  4. 22 Oct 2021, 6 commits
  5. 01 Oct 2021, 1 commit
  6. 30 Sep 2021, 5 commits
  7. 27 Sep 2021, 1 commit
    • KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issue · 5c49d185
      Zhenzhong Duan committed
      When updating the host's mask for its MSR_IA32_TSX_CTRL user return entry,
      clear the mask in the found uret MSR instead of vmx->guest_uret_msrs[i].
      Modifying guest_uret_msrs directly is completely broken as 'i' does not
      point at the MSR_IA32_TSX_CTRL entry.  In fact, it's guaranteed to be an
      out-of-bounds access, as 'i' is always set to kvm_nr_uret_msrs in a prior
      loop. By sheer dumb luck, the fallout is limited to "only" failing to
      preserve the host's TSX_CTRL_CPUID_CLEAR.  The out-of-bounds access is
      benign as it's guaranteed to clear a bit in a guest MSR value, which is
      always zero at vCPU creation on both x86-64 and i386.
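      
      The fixed update can be sketched as below. The MSR index (0x122) and the
      TSX_CTRL_CPUID_CLEAR bit (bit 1) match Linux's definitions; the struct
      layout and helper names are hypothetical simplifications of the
      user-return MSR list. The fix is to write through the entry returned by
      the lookup rather than through a loop index left over from an earlier
      loop.
      
      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>
      
      #define MSR_IA32_TSX_CTRL    0x00000122u
      #define TSX_CTRL_CPUID_CLEAR (1ull << 1)
      
      /* Hypothetical, simplified user-return MSR entry. */
      struct toy_uret_msr {
          uint32_t index;
          uint64_t data;
          uint64_t mask; /* bits taken from the guest value; others keep the host value */
      };
      
      static struct toy_uret_msr *toy_find_uret_msr(struct toy_uret_msr *msrs, size_t nr,
                                                    uint32_t index)
      {
          assert(msrs != NULL || nr == 0);
          for (size_t i = 0; i < nr; i++) {
              if (msrs[i].index == index)
                  return &msrs[i];
          }
          return NULL;
      }
      
      /* Fixed pattern: update the entry the lookup found, not msrs[i] with a stale i. */
      static void toy_preserve_host_cpuid_clear(struct toy_uret_msr *msrs, size_t nr)
      {
          struct toy_uret_msr *tsx = toy_find_uret_msr(msrs, nr, MSR_IA32_TSX_CTRL);
      
          if (tsx)
              tsx->mask = ~TSX_CTRL_CPUID_CLEAR;
      }
      ```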
      
      Cc: stable@vger.kernel.org
      Fixes: 8ea8b8d6 ("KVM: VMX: Use common x86's uret MSR list as the one true list")
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210926015545.281083-1-zhenzhong.duan@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 22 Sep 2021, 7 commits
  9. 06 Sep 2021, 1 commit
  10. 21 Aug 2021, 1 commit
  11. 13 Aug 2021, 10 commits
    • KVM: nVMX: Unconditionally clear nested.pi_pending on nested VM-Enter · f7782bb8
      Sean Christopherson committed
      Clear nested.pi_pending on nested VM-Enter even if L2 will run without
      posted interrupts enabled.  If nested.pi_pending is left set from a
      previous L2, vmx_complete_nested_posted_interrupt() will pick up the
      stale flag and exit to userspace with an "internal emulation error" due to
      the new L2 not having a valid nested.pi_desc.
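      
      A sketch of why the flag must be cleared unconditionally, using a
      hypothetical two-field model of the nested state; the function names only
      mirror the roles described above and are not KVM's code. If the clear sat
      behind a "posted interrupts enabled" check, a stale pi_pending from a
      previous L2 would reach the completion step with a null descriptor.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      
      struct toy_pi_desc {
          unsigned int pir_bits; /* placeholder contents */
      };
      
      /* Hypothetical subset of nested VMX state. */
      struct toy_nested {
          bool pi_pending;
          struct toy_pi_desc *pi_desc;
      };
      
      /* Nested VM-Enter: clear pi_pending first, whether or not L2 uses posted interrupts. */
      static void toy_nested_vmenter(struct toy_nested *n, struct toy_pi_desc *desc)
      {
          assert(n != NULL);
          n->pi_pending = false;
          n->pi_desc = desc; /* NULL when posted interrupts are disabled for L2 */
      }
      
      /* Returns 0 on success, -1 for the "internal emulation error" path. */
      static int toy_complete_nested_posted_interrupt(const struct toy_nested *n)
      {
          if (!n->pi_pending)
              return 0;
          if (n->pi_desc == NULL)
              return -1;
          /* A real implementation would process the descriptor here. */
          return 0;
      }
      ```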
      
      Arguably, vmx_complete_nested_posted_interrupt() should first check for
      posted interrupts being enabled, but it's also completely reasonable that
      KVM wouldn't screw up a fundamental flag.  Not to mention that the mere
      existence of nested.pi_pending is a long-standing bug as KVM shouldn't
      move the posted interrupt out of the IRR until it's actually processed,
      e.g. KVM effectively drops an interrupt when it performs a nested VM-Exit
      with a "pending" posted interrupt.  Fixing the mess is a future problem.
      
      Prior to vmx_complete_nested_posted_interrupt() interpreting a null PI
      descriptor as an error, this was a benign bug as the null PI descriptor
      effectively served as a check on PI not being enabled.  Even then, the
      new flow did not become problematic until KVM started checking the result
      of kvm_check_nested_events().
      
      Fixes: 705699a1 ("KVM: nVMX: Enable nested posted interrupt processing")
      Fixes: 966eefb8 ("KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable")
      Fixes: 47d3530f86c0 ("KVM: x86: Exit to userspace when kvm_check_nested_events fails")
      Cc: stable@vger.kernel.org
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210810144526.2662272-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Clean up redundant ROL16(val, n) macro definition · c1a527a1
      Like Xu committed
      The ROL16(val, n) macro is repeatedly defined in several vmcs-related
      files, and it has never been used outside the KVM context.
      
      Let's move it to vmcs.h without any intended functional changes.
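      
      For reference, a 16-bit rotate-left macro of this kind can be written as
      follows. This is a self-contained sketch with a local u16 typedef, so
      treat it as illustrative rather than a verbatim copy of vmcs.h. The shifts
      operate on the value promoted to int, and the outer cast truncates the
      result back to 16 bits.
      
      ```c
      #include <assert.h>
      #include <stdint.h>
      
      typedef uint16_t u16;
      
      /* Rotate a 16-bit value left by n bits, for 0 <= n < 16. */
      #define ROL16(val, n) ((u16)(((u16)(val) << (n)) | ((u16)(val) >> (16 - (n)))))
      
      /* The top bit wraps around to bit 0. */
      static_assert(ROL16(0x8000, 1) == 0x0001, "ROL16 must wrap bit 15 into bit 0");
      ```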
      
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20210809093410.59304-4-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Move declaration of kvm_spurious_fault() to x86.h · 65297341
      Uros Bizjak committed
      Move the declaration of kvm_spurious_fault() to KVM's "private" x86.h,
      as it should never be called by anything other than low-level KVM code.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      [sean: rebased to a series without __ex()/__kvm_handle_fault_on_reboot()]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210809173955.1710866-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Kill off __ex() and __kvm_handle_fault_on_reboot() · ad0577c3
      Sean Christopherson committed
      Remove the __kvm_handle_fault_on_reboot() and __ex() macros now that all
      VMX and SVM instructions use asm goto to handle the fault (or in the
      case of VMREAD, completely custom logic).  Drop kvm_spurious_fault()'s
      asmlinkage annotation as __kvm_handle_fault_on_reboot() was the only
      flow that invoked it from assembly code.
      
      Cc: Uros Bizjak <ubizjak@gmail.com>
      Cc: Like Xu <like.xu.linux@gmail.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210809173955.1710866-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Hide VMCS control calculators in vmx.c · 2fba4fc1
      Sean Christopherson committed
      Now that nested VMX pulls KVM's desired VMCS controls from vmcs01 instead
      of re-calculating on the fly, bury the helpers that do the calculations
      in vmx.c.
      
      No functional change intended.
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210810171952.2758100-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Drop caching of KVM's desired sec exec controls for vmcs01 · b6247686
      Sean Christopherson committed
      Remove the secondary execution controls cache now that it's effectively
      dead code; it is only read immediately after it is written.
      
      No functional change intended.
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210810171952.2758100-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Pull KVM L0's desired controls directly from vmcs01 · 389ab252
      Sean Christopherson committed
      When preparing controls for vmcs02, grab KVM's desired controls from
      vmcs01's shadow state instead of recalculating the controls from scratch,
      or, in the case of the secondary execution controls, instead of using the dedicated
      cache.  Calculating secondary exec controls is eye-poppingly expensive
      due to the guest CPUID checks, hence the dedicated cache, but the other
      calculations aren't exactly free either.
      
      Explicitly clear several bits (x2APIC, DESC exiting, and load EFER on
      exit) as appropriate, since they may be set in vmcs01, whereas the previous
      implementation relied on dynamic bits being cleared in the calculator.
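      
      A simplified sketch of starting from vmcs01's controls and explicitly
      clearing bits that must not carry over. The bit values follow the Intel
      SDM layout as used by Linux (secondary exec: descriptor-table exiting
      bit 2, virtualize x2APIC mode bit 4; VM-exit: load IA32_PERF_GLOBAL_CTRL
      bit 12, load IA32_EFER bit 21). The function names and the plain OR with
      vmcs12 are assumptions; the real code filters L1's requests against what
      KVM supports.
      
      ```c
      #include <assert.h>
      #include <stdint.h>
      
      /* Secondary processor-based VM-execution controls. */
      #define SECONDARY_EXEC_DESC                   0x00000004u /* bit 2 */
      #define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE 0x00000010u /* bit 4 */
      
      /* VM-exit controls. */
      #define VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL    0x00001000u /* bit 12 */
      #define VM_EXIT_LOAD_IA32_EFER                0x00200000u /* bit 21 */
      
      static_assert((VM_EXIT_LOAD_IA32_EFER & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) == 0,
                    "distinct control bits");
      
      /* Hypothetical: vmcs02 secondary controls = vmcs01's minus L0-only bits, plus L1's. */
      static uint32_t toy_vmcs02_secondary_exec(uint32_t vmcs01, uint32_t vmcs12)
      {
          uint32_t exec = vmcs01;
      
          exec &= ~(SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE | SECONDARY_EXEC_DESC);
          return exec | vmcs12; /* simplification: real code filters L1's request */
      }
      
      /* Hypothetical: keep PERF_GLOBAL_CTRL from vmcs01, drop load-EFER (set later if needed). */
      static uint32_t toy_vmcs02_exit_controls(uint32_t vmcs01)
      {
          return vmcs01 & ~VM_EXIT_LOAD_IA32_EFER;
      }
      ```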
      
      Intentionally propagate VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL from
      vmcs01 to vmcs02.  Whether or not PERF_GLOBAL_CTRL is loaded depends on
      whether or not perf itself is active, so unless perf stops between the
      exit from L1 and entry to L2, vmcs01 will hold the desired value.  This
      is purely an optimization as atomic_switch_perf_msrs() will set/clear
      the control as needed at VM-Enter, i.e. it avoids two extra VMWRITEs in
      the case where perf is active (versus starting with the bits clear in
      vmcs02, which was the previous behavior).
      
      Cc: Zeng Guang <guang.zeng@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210810171952.2758100-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Reset DR6 only when KVM_DEBUGREG_WONT_EXIT · 1ccb6f98
      Paolo Bonzini committed
      Commit efdab992 ("KVM: x86: fix escape of guest dr6 to the host")
      fixed a bug by resetting DR6 unconditionally when the vCPU is being scheduled out.
      
      But writing to debug registers is slow, and it can be visible in perf results
      sometimes, even if neither the host nor the guest activates breakpoints.
      
      Since KVM_DEBUGREG_WONT_EXIT on Intel processors is the only case
      where DR6 gets the guest value, and it never happens at all on SVM,
      the register can be cleared in vmx.c right after reading it.
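      
      A sketch of the new placement, modeling hardware DR6 as a plain variable
      and counting writes to it (the expensive part). TOY_DR6_RESET uses DR6's
      architectural reset value, 0xFFFF0FF0; the structs and function names are
      hypothetical. DR6 is written only on the path that actually exposed the
      guest value, instead of on every vCPU put.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>
      
      #define TOY_DR6_RESET 0xFFFF0FF0ull /* architectural DR6 value after reset */
      
      /* Hypothetical model of the physical CPU's DR6 and how often it is written. */
      struct toy_cpu {
          uint64_t dr6;
          unsigned int dr6_writes;
      };
      
      struct toy_vcpu {
          bool debugreg_wont_exit; /* guest ran with DR accesses not intercepted */
          uint64_t guest_dr6;
      };
      
      static void toy_write_dr6(struct toy_cpu *cpu, uint64_t val)
      {
          cpu->dr6 = val;
          cpu->dr6_writes++;
      }
      
      /* VMX-side sync: read the guest's DR6, then clear hardware DR6 right away. */
      static void toy_vmx_sync_dirty_debug_regs(struct toy_cpu *cpu, struct toy_vcpu *vcpu)
      {
          assert(vcpu->debugreg_wont_exit);
          vcpu->guest_dr6 = cpu->dr6;
          toy_write_dr6(cpu, TOY_DR6_RESET);
      }
      
      /* Exit path: no unconditional DR6 reset; only the WONT_EXIT case touches DR6. */
      static void toy_vcpu_exit_path(struct toy_cpu *cpu, struct toy_vcpu *vcpu)
      {
          if (vcpu->debugreg_wont_exit)
              toy_vmx_sync_dirty_debug_regs(cpu, vcpu);
      }
      ```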
      
      Reported-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Set host DR6 only on VMX and for KVM_DEBUGREG_WONT_EXIT · 375e28ff
      Paolo Bonzini committed
      Commit c77fb5fe ("KVM: x86: Allow the guest to run with dirty debug
      registers") allows the guest to access DRs without exiting when
      KVM_DEBUGREG_WONT_EXIT is set, and we need to ensure that they are synchronized
      on entry to the guest, including DR6, which was not synced before that commit.
      
      But the commit sets the hardware DR6 not only when KVM_DEBUGREG_WONT_EXIT,
      but also when KVM_DEBUGREG_BP_ENABLED.  The second case is unnecessary
      and just leads to one more case that leaks stale DR6 to the host, which has
      to be resolved by unconditionally resetting DR6 in kvm_arch_vcpu_put().
      
      Even when KVM_DEBUGREG_WONT_EXIT is set, however, setting the host DR6 only matters
      on VMX because SVM always uses the DR6 value from the VMCB.  So move this
      line to vmx.c and make it conditional on KVM_DEBUGREG_WONT_EXIT.
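      
      The resulting condition can be sketched as a small predicate. The flag
      names and values are hypothetical stand-ins for KVM_DEBUGREG_BP_ENABLED
      and KVM_DEBUGREG_WONT_EXIT, and the vendor enum is illustrative. The logic
      follows the text: only VMX, and only when DR accesses are not
      intercepted, needs the guest's DR6 in hardware before VM-Enter.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      
      enum toy_vendor { TOY_VMX, TOY_SVM };
      
      /* Hypothetical stand-ins for the KVM_DEBUGREG_* flags. */
      #define TOY_DEBUGREG_BP_ENABLED 0x1u
      #define TOY_DEBUGREG_WONT_EXIT  0x2u
      
      static_assert((TOY_DEBUGREG_BP_ENABLED & TOY_DEBUGREG_WONT_EXIT) == 0,
                    "flags must be distinct bits");
      
      /* Should the guest's DR6 be loaded into hardware DR6 before entering the guest? */
      static bool toy_load_guest_dr6(enum toy_vendor vendor, unsigned int switch_db_regs)
      {
          /* SVM takes DR6 from the VMCB, so hardware DR6 never needs the guest value. */
          if (vendor != TOY_VMX)
              return false;
          /* BP_ENABLED alone no longer triggers the load; only WONT_EXIT does. */
          return (switch_db_regs & TOY_DEBUGREG_WONT_EXIT) != 0;
      }
      ```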
      
      Reported-by: Lai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF · 18712c13
      Sean Christopherson committed
      Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF
      in L2 or if the VM-Exit should be forwarded to L1.  The current logic fails
      to account for the case where #PF is intercepted to handle
      guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into
      L1.  At best, L1 will complain and inject the #PF back into L2.  At
      worst, L1 will eat the unexpected fault and cause L2 to hang on infinite
      page faults.
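      
      A simplified model of the decision, assuming the helper reduces to "EPT
      disabled, or guest MAXPHYADDR smaller than the host's while that mode is
      allowed". The struct and field names are hypothetical, and other inputs
      to the real decision (such as asynchronous page faults) are ignored. The
      old check only looked at EPT, so with EPT on and a smaller guest
      MAXPHYADDR, L0 claimed no interest in #PF even though it was the one
      intercepting them.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      
      /* Hypothetical snapshot of the inputs that matter for this decision. */
      struct toy_pf_cfg {
          bool enable_ept;
          bool allow_smaller_maxphyaddr;
          unsigned int guest_maxphyaddr;
          unsigned int host_maxphyaddr;
      };
      
      /* Simplified model of vmx_need_pf_intercept(): does L0 itself need #PF exits? */
      static bool toy_need_pf_intercept(const struct toy_pf_cfg *c)
      {
          assert(c != NULL);
          if (!c->enable_ept)
              return true;
          return c->allow_smaller_maxphyaddr &&
                 c->guest_maxphyaddr < c->host_maxphyaddr;
      }
      
      /* Old check: L0 wanted an L2 #PF only when EPT was disabled. */
      static bool toy_l0_wants_pf_old(const struct toy_pf_cfg *c)
      {
          return !c->enable_ept;
      }
      
      /* Fixed check: reuse the same predicate that decided to intercept #PF. */
      static bool toy_l0_wants_pf(const struct toy_pf_cfg *c)
      {
          return toy_need_pf_intercept(c);
      }
      ```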
      
      Note, while the bug was technically introduced by the commit that added
      support for the MAXPHYADDR madness, the shame is all on commit
      a0c13434 ("KVM: VMX: introduce vmx_need_pf_intercept").
      
      Fixes: 1dbf5d68 ("KVM: VMX: Add guest physical address check in EPT violation and misconfig")
      Cc: stable@vger.kernel.org
      Cc: Peter Shier <pshier@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210812045615.3167686-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>