1. 11 11月, 2021 11 次提交
    • S
      KVM: nVMX: Handle dynamic MSR intercept toggling · 67f4b996
      Sean Christopherson 提交于
      Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2,
      and always update the relevant bits in vmcs02.  This fixes two distinct,
      but intertwined bugs related to dynamic MSR bitmap modifications.
      
      The first issue is that KVM fails to enable MSR interception in vmcs02
      for the FS/GS base MSRs if L1 first runs L2 with interception disabled,
      and later enables interception.
      
      The second issue is that KVM fails to honor userspace MSR filtering when
      preparing vmcs02.
      
      Fix both issues simultaneous as fixing only one of the issues (doesn't
      matter which) would create a mess that no one should have to bisect.
      Fixing only the first bug would exacerbate the MSR filtering issue as
      userspace would see inconsistent behavior depending on the whims of L1.
      Fixing only the second bug (MSR filtering) effectively requires fixing
      the first, as the nVMX code only knows how to transition vmcs02's
      bitmap from 1->0.
      
      Move the various accessor/mutators that are currently buried in vmx.c
      into vmx.h so that they can be shared by the nested code.
      
      Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
      Fixes: d69129b4 ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible")
      Cc: stable@vger.kernel.org
      Cc: Alexander Graf <graf@amazon.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20211109013047.2041518-3-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      67f4b996
    • S
      KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use · 7dfbc624
      Sean Christopherson 提交于
      Check the current VMCS controls to determine if an MSR write will be
      intercepted due to MSR bitmaps being disabled.  In the nested VMX case,
      KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or
      if KVM can't map L1's bitmaps for whatever reason.
      
      Note, the bad behavior is relatively benign in the current code base as
      KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and
      only if L0 KVM also disables interception of an MSR, and only uses the
      buggy helper for MSR_IA32_SPEC_CTRL.  Because KVM explicitly tests WRMSR
      before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check
      will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it
      isn't strictly necessary.
      
      Tag the fix for stable in case a future fix wants to use
      msr_write_intercepted(), in which case a buggy implementation in older
      kernels could prove subtly problematic.
      
      Fixes: d28b387f ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20211109013047.2041518-2-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7dfbc624
    • V
      KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was... · afd67ee3
      Vitaly Kuznetsov 提交于
      KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was written to MSR_KVM_PV_EOI_EN
      
      When kvm_gfn_to_hva_cache_init() call from kvm_lapic_set_pv_eoi() fails,
      MSR write to MSR_KVM_PV_EOI_EN results in #GP so it is reasonable to
      expect that the value we keep internally in KVM wasn't updated.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211108152819.12485-3-vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      afd67ee3
    • V
      KVM: x86: Rename kvm_lapic_enable_pv_eoi() · 77c3323f
      Vitaly Kuznetsov 提交于
      kvm_lapic_enable_pv_eoi() is a misnomer as the function is also
      used to disable PV EOI. Rename it to kvm_lapic_set_pv_eoi().
      
      No functional change intended.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211108152819.12485-2-vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      77c3323f
    • P
      KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES · 760849b1
      Paul Durrant 提交于
      Currently when kvm_update_cpuid_runtime() runs, it assumes that the
      KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true,
      however, if Hyper-V support is enabled. In this case the KVM leaves will
      be offset.
      
      This patch introdues as new 'kvm_cpuid_base' field into struct
      kvm_vcpu_arch to track the location of the KVM leaves and function
      kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the
      leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a
      definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence now
      target the correct leaf.
      
      NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is intoduced
            into processor.h to avoid having duplicate code for the iteration
            over possible hypervisor base leaves.
      Signed-off-by: NPaul Durrant <pdurrant@amazon.com>
      Message-Id: <20211105095101.5384-3-pdurrant@amazon.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      760849b1
    • S
      KVM: x86: Add helper to consolidate core logic of SET_CPUID{2} flows · 8b44b174
      Sean Christopherson 提交于
      Move the core logic of SET_CPUID and SET_CPUID2 to a common helper, the
      only difference between the two ioctls() is the format of the userspace
      struct.  A future fix will add yet more code to the core logic.
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20211105095101.5384-2-pdurrant@amazon.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8b44b174
    • J
      kvm: mmu: Use fast PF path for access tracking of huge pages when possible · 10c30de0
      Junaid Shahid 提交于
      The fast page fault path bails out on write faults to huge pages in
      order to accommodate dirty logging. This change adds a check to do that
      only when dirty logging is actually enabled, so that access tracking for
      huge pages can still use the fast path for write faults in the common
      case.
      Signed-off-by: NJunaid Shahid <junaids@google.com>
      Reviewed-by: NBen Gardon <bgardon@google.com>
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20211104003359.2201967-1-junaids@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      10c30de0
    • S
      KVM: x86/mmu: Properly dereference rcu-protected TDP MMU sptep iterator · c435d4b7
      Sean Christopherson 提交于
      Wrap the read of iter->sptep in tdp_mmu_map_handle_target_level() with
      rcu_dereference().  Shadow pages in the TDP MMU, and thus their SPTEs,
      are protected by rcu.
      
      This fixes a Sparse warning at tdp_mmu.c:900:51:
        warning: incorrect type in argument 1 (different address spaces)
        expected unsigned long long [usertype] *sptep
        got unsigned long long [noderef] [usertype] __rcu *[usertype] sptep
      
      Fixes: 7158bee4 ("KVM: MMU: pass kvm_mmu_page struct to make_spte")
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20211103161833.3769487-1-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c435d4b7
    • M
      KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active · cae72dcc
      Maxim Levitsky 提交于
      KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using
      standard kvm's inject_pending_event, and not via APICv/AVIC.
      
      Since this is a debug feature, just inhibit APICv/AVIC while
      KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.
      
      Fixes: 61e5f69e ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")
      Reported-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Tested-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      cae72dcc
    • J
      kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool · e6cd31f1
      Jim Mattson 提交于
      These function names sound like predicates, and they have siblings,
      *is_valid_msr(), which _are_ predicates. Moreover, there are comments
      that essentially warn that these functions behave unexpectedly.
      
      Flip the polarity of the return values, so that they become
      predicates, and convert the boolean result to a success/failure code
      at the outer call site.
      Suggested-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20211105202058.1048757-1-jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e6cd31f1
    • D
      KVM: x86: Fix recording of guest steal time / preempted status · 7e2175eb
      David Woodhouse 提交于
      In commit b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
      not missed") we switched to using a gfn_to_pfn_cache for accessing the
      guest steal time structure in order to allow for an atomic xchg of the
      preempted field. This has a couple of problems.
      
      Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
      atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
      guest vCPU using an IOMEM page for its steal time would never have its
      preempted field set.
      
      Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
      should have been. There are two stages to the GFN->PFN conversion;
      first the GFN is converted to a userspace HVA, and then that HVA is
      looked up in the process page tables to find the underlying host PFN.
      Correct invalidation of the latter would require being hooked up to the
      MMU notifiers, but that doesn't happen---so it just keeps mapping and
      unmapping the *wrong* PFN after the userspace page tables change.
      
      In the !IOMEM case at least the stale page *is* pinned all the time it's
      cached, so it won't be freed and reused by anyone else while still
      receiving the steal time updates. The map/unmap dance only takes care
      of the KVM administrivia such as marking the page dirty.
      
      Until the gfn_to_pfn cache handles the remapping automatically by
      integrating with the MMU notifiers, we might as well not get a
      kernel mapping of it, and use the perfectly serviceable userspace HVA
      that we already have.  We just need to implement the atomic xchg on
      the userspace address with appropriate exception handling, which is
      fairly trivial.
      
      Cc: stable@vger.kernel.org
      Fixes: b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org>
      [I didn't entirely agree with David's assessment of the
       usefulness of the gfn_to_pfn cache, and integrated the outcome
       of the discussion in the above commit message. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7e2175eb
  2. 25 10月, 2021 4 次提交
  3. 23 10月, 2021 2 次提交
  4. 22 10月, 2021 22 次提交
  5. 19 10月, 2021 1 次提交
    • O
      KVM: x86: Expose TSC offset controls to userspace · 828ca896
      Oliver Upton 提交于
      To date, VMM-directed TSC synchronization and migration has been a bit
      messy. KVM has some baked-in heuristics around TSC writes to infer if
      the VMM is attempting to synchronize. This is problematic, as it depends
      on host userspace writing to the guest's TSC within 1 second of the last
      write.
      
      A much cleaner approach to configuring the guest's views of the TSC is to
      simply migrate the TSC offset for every vCPU. Offsets are idempotent,
      and thus not subject to change depending on when the VMM actually
      reads/writes values from/to KVM. The VMM can then read the TSC once with
      KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
      the guest is paused.
      
      Cc: David Matlack <dmatlack@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: NOliver Upton <oupton@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210916181538.968978-8-oupton@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      828ca896