1. 14 May 2020 (6 commits)
    • KVM: VMX: Add proper cache tracking for CR0 · bd31fe49
      Committed by Sean Christopherson
      Move CR0 caching into the standard register caching mechanism in order
      to take advantage of the availability checks provided by regs_avail.
      This avoids multiple VMREADs in the (uncommon) case where kvm_read_cr0()
      is called multiple times in a single VM-Exit, and more importantly
      eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading
      CR0, and squashes the confusing naming discrepancy of "cache_reg" vs.
      "decache_cr0_guest_bits".
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200502043234.12481-8-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
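
      A minimal userspace sketch of the lazy caching pattern this commit (and the
      CR4 commit that follows) moves the control registers onto: a regs_avail-style
      bitmap marks which cached values are current, so the expensive VMREAD-like
      read happens at most once per VM-Exit.  All struct, enum, and function names
      here are illustrative stand-ins, not the actual KVM code.

      #include <stdio.h>

      enum reg { REG_CR0, REG_CR4, NR_REGS };

      struct vcpu {
              unsigned long regs[NR_REGS];
              unsigned long regs_avail;        /* bit n set => regs[n] is up to date */
      };

      /* Stand-in for the expensive VMREAD that the cache is meant to avoid. */
      static unsigned long emulate_vmread(enum reg r)
      {
              printf("  (expensive VMREAD for reg %d)\n", r);
              return 0x80050033ul + r;
      }

      static unsigned long read_reg_cached(struct vcpu *v, enum reg r)
      {
              if (!(v->regs_avail & (1ul << r))) {
                      v->regs[r] = emulate_vmread(r);   /* fill the cache on first use */
                      v->regs_avail |= 1ul << r;
              }
              return v->regs[r];
      }

      int main(void)
      {
              struct vcpu v = { .regs_avail = 0 };

              read_reg_cached(&v, REG_CR0);    /* reads "hardware" */
              read_reg_cached(&v, REG_CR0);    /* served from the cache */
              v.regs_avail = 0;                /* a new VM-Exit invalidates everything */
              read_reg_cached(&v, REG_CR0);    /* reads "hardware" again */
              return 0;
      }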
    • KVM: VMX: Add proper cache tracking for CR4 · f98c1e77
      Committed by Sean Christopherson
      Move CR4 caching into the standard register caching mechanism in order
      to take advantage of the availability checks provided by regs_avail.
      This avoids multiple VMREADs and retpolines (when configured) during
      nested VMX transitions as kvm_read_cr4_bits() is invoked multiple times
      on each transition, e.g. when stuffing CR0 and CR3.
      
      As an added bonus, this eliminates a kvm_x86_ops hook, saves a retpoline
      on SVM when reading CR4, and squashes the confusing naming discrepancy
      of "cache_reg" vs. "decache_cr4_guest_bits".
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200502043234.12481-7-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Save L1 TSC offset in 'struct kvm_vcpu_arch' · 56ba77a4
      Committed by Sean Christopherson
      Save L1's TSC offset in 'struct kvm_vcpu_arch' and drop the kvm_x86_ops
      hook read_l1_tsc_offset().  This avoids a retpoline (when configured)
      when reading L1's effective TSC, which is done at least once on every
      VM-Exit.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200502043234.12481-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Replace late check_nested_events() hack with more precise fix · c300ab9f
      Committed by Paolo Bonzini
      Add an argument to interrupt_allowed and nmi_allowed to check whether
      interrupt injection is blocked.  Use the hook to handle the case where
      an interrupt arrives between check_nested_events() and the injection
      logic.  Drop the retry of check_nested_events() that hack-a-fixed the
      same condition.
      
      Blocking injection is also a bit of a hack, e.g. KVM should do exiting
      and non-exiting interrupt processing in a single pass, but it's a more
      precise hack.  The old comment is also misleading: KVM_REQ_EVENT is
      purely an optimization; setting it on every run loop (which KVM doesn't
      do) would not affect functionality, only performance.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200423022550.15113-13-sean.j.christopherson@intel.com>
      [Extend to SVM, add SMI and NMI.  Even though NMI and SMI cannot come
       asynchronously right now, making the fix generic is easy and removes a
       special case. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
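
      A hedged sketch of the hook shape described above: the extra argument
      distinguishes "may an interrupt be delivered at all" from "may KVM inject
      one right now", so an interrupt that would instead trigger a nested VM-Exit
      does not count as injectable.  The struct layout and names are simplified
      models, not the actual KVM prototypes.

      #include <stdbool.h>
      #include <stdio.h>

      struct vcpu {
              bool irq_window_open;         /* guest can take an interrupt */
              bool nested_exit_on_intr;     /* L1 would intercept this interrupt */
      };

      /* The "for_injection" flag asks whether injection itself is possible,
       * not merely whether an interrupt may be delivered. */
      static bool interrupt_allowed(struct vcpu *v, bool for_injection)
      {
              if (for_injection && v->nested_exit_on_intr)
                      return false;         /* would cause a VM-Exit, not an injection */
              return v->irq_window_open;
      }

      int main(void)
      {
              struct vcpu v = { .irq_window_open = true, .nested_exit_on_intr = true };

              printf("deliverable: %d, injectable: %d\n",
                     interrupt_allowed(&v, false), interrupt_allowed(&v, true));
              return 0;
      }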
    • KVM: x86: Make return for {interrupt_nmi,smi}_allowed() a bool instead of int · 88c604b6
      Committed by Sean Christopherson
      Return an actual bool for kvm_x86_ops' {interrupt_nmi}_allowed() hook to
      better reflect the return semantics, and to avoid creating an even
      bigger mess when the related VMX code is refactored in upcoming patches.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200423022550.15113-5-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Open a window for pending nested VMX preemption timer · d2060bd4
      Committed by Sean Christopherson
      Add a kvm_x86_ops hook to detect a nested pending "hypervisor timer" and
      use it to effectively open a window for servicing the expired timer.
      Like pending SMIs on VMX, opening a window simply means requesting an
      immediate exit.
      
      This fixes a bug where an expired VMX preemption timer (for L2) will be
      delayed and/or lost if a pending exception is injected into L2.  The
      pending exception is rightly prioritized by vmx_check_nested_events()
      and injected into L2, with the preemption timer left pending.  Because
      no window opened, L2 is free to run uninterrupted.
      
      Fixes: f4124500 ("KVM: nVMX: Fully emulate preemption timer")
      Reported-by: Jim Mattson <jmattson@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Peter Shier <pshier@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200423022550.15113-3-sean.j.christopherson@intel.com>
      [Check it in kvm_vcpu_has_events too, to ensure that the preemption
       timer is serviced promptly even if the vCPU is halted and L1 is not
       intercepting HLT. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
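
      A simplified model, under assumed names, of the window-opening logic: if a
      higher-priority event (here, a pending exception) is injected while the
      nested preemption timer has already fired, request an immediate exit so the
      timer VM-Exit is delivered right after the next VM-Enter instead of being
      delayed or lost.

      #include <stdbool.h>
      #include <stdio.h>

      struct vcpu {
              bool in_guest_mode;
              bool preemption_timer_expired;
              bool req_immediate_exit;
      };

      /* Stand-in for the new "pending hypervisor timer" hook. */
      static bool nested_hv_timer_pending(struct vcpu *v)
      {
              return v->in_guest_mode && v->preemption_timer_expired;
      }

      static void inject_pending_events(struct vcpu *v, bool exception_pending)
      {
              if (exception_pending) {
                      /* The exception wins priority; open a window for the timer
                       * instead of dropping it. */
                      if (nested_hv_timer_pending(v))
                              v->req_immediate_exit = true;
                      return;
              }
              /* ... otherwise the timer VM-Exit would be delivered here ... */
      }

      int main(void)
      {
              struct vcpu v = { true, true, false };

              inject_pending_events(&v, true);
              printf("immediate exit requested: %d\n", v.req_immediate_exit);
              return 0;
      }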
  2. 13 May 2020 (1 commit)
    • KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c · 37486135
      Committed by Babu Moger
      Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU
      resource isn't. It can be read with XSAVE and written with XRSTOR.
      So, if we don't set the guest PKRU value here (kvm_load_guest_xsave_state),
      the guest can read the host value.
      
      In the case of kvm_load_host_xsave_state, a guest with CR4.PKE clear could
      potentially use XRSTOR to change the host PKRU value.
      
      While at it, move pkru state save/restore to common code and the
      host_pkru field to kvm_vcpu_arch.  This will let SVM support protection keys.
      
      Cc: stable@vger.kernel.org
      Reported-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Babu Moger <babu.moger@amd.com>
      Message-Id: <158932794619.44260.14508381096663848853.stgit@naples-babu.amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
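
      A rough userspace model of the PKRU swap described above; the register
      access is stubbed out, and the function and field names are illustrative
      rather than the real kvm_load_{guest,host}_xsave_state code.  The key point
      the commit makes is that the swap must happen whenever the guest can touch
      PKRU through XSAVE/XRSTOR, not only when CR4.PKE is set.

      #include <stdbool.h>
      #include <stdio.h>

      static unsigned int hw_pkru;                       /* stand-in for the real register */
      static unsigned int rdpkru(void)           { return hw_pkru; }
      static void         wrpkru(unsigned int v) { hw_pkru = v; }

      struct vcpu_arch {
              unsigned int pkru;                /* guest PKRU value */
              unsigned int host_pkru;           /* host PKRU value saved at vcpu load */
              bool guest_can_use_pkru;          /* CR4.PKE set OR PKRU enabled in guest XCR0 */
      };

      static void load_guest_xsave_state(struct vcpu_arch *a)
      {
              if (a->guest_can_use_pkru && a->pkru != a->host_pkru)
                      wrpkru(a->pkru);          /* don't leak the host value to the guest */
      }

      static void load_host_xsave_state(struct vcpu_arch *a)
      {
              if (a->guest_can_use_pkru) {
                      a->pkru = rdpkru();       /* capture whatever the guest wrote via XRSTOR */
                      if (a->pkru != a->host_pkru)
                              wrpkru(a->host_pkru);
              }
      }

      int main(void)
      {
              struct vcpu_arch a = { .pkru = 0x4, .host_pkru = 0x55555554,
                                     .guest_can_use_pkru = true };
              hw_pkru = a.host_pkru;

              load_guest_xsave_state(&a);       /* VM-Entry path */
              load_host_xsave_state(&a);        /* VM-Exit path */
              printf("host PKRU restored: 0x%x\n", hw_pkru);
              return 0;
      }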
  3. 08 May 2020 (2 commits)
    • KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6 · d67668e9
      Committed by Paolo Bonzini
      There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the
      different handling of DR6 on intercepted #DB exceptions on Intel and AMD.
      
      On Intel, #DB exceptions transmit the DR6 value via the exit qualification
      field of the VMCS, and the exit qualification only contains the description
      of the precise event that caused a vmexit.
      
      On AMD, instead the DR6 field of the VMCB is filled in as if the #DB exception
      was to be injected into the guest.  This has two effects when guest debugging
      is in use:
      
      * the guest DR6 is clobbered
      
      * the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather
      than just the last one that happened (the testcase in the next patch covers
      this issue).
      
      This patch fixes both issues by emulating, so to speak, the Intel behavior
      on AMD processors.  The important observation is that (after the previous
      patches) the VMCB value of DR6 is only ever observable from the guest if
      KVM_DEBUGREG_WONT_EXIT is set.  Therefore we can actually set vmcb->save.dr6
      to any value we want as long as KVM_DEBUGREG_WONT_EXIT is clear, which it
      will be if guest debugging is enabled.
      
      Therefore it is possible to enter the guest with an all-zero DR6,
      reconstruct the #DB payload from the DR6 we get at exit time, and let
      kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6.
      Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT
      is set, but this is harmless.
      
      This may not be the most optimized way to deal with this, but it is
      simple and, being confined within SVM code, it gets rid of the set_dr6
      callback and kvm_update_dr6.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
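
      One way to picture the scheme, as a hedged sketch with illustrative
      constants and helper names: enter L2 with a clean DR6 in the VMCB, treat
      whatever bits are set at exit time as the payload of the #DB that just
      happened, and merge them into the guest-visible DR6 the way
      kvm_deliver_exception_payload would, instead of letting hardware clobber it.

      #include <stdio.h>

      #define DR6_FIXED_1   0xfffe0ff0ul        /* illustrative reserved-to-1 mask */

      struct vcpu { unsigned long dr6; };

      /* Entered the guest with an all-zero DR6, so the bits set at exit time are
       * exactly the payload of the #DB that just happened. */
      static unsigned long db_payload_from_exit(unsigned long vmcb_dr6)
      {
              return vmcb_dr6 & ~DR6_FIXED_1;
      }

      /* Stand-in for kvm_deliver_exception_payload(): merge the newly set bits
       * into the guest-visible DR6 rather than overwriting it. */
      static void deliver_db_payload(struct vcpu *v, unsigned long payload)
      {
              v->dr6 |= payload;
              v->dr6 |= DR6_FIXED_1;
      }

      int main(void)
      {
              struct vcpu v = { .dr6 = DR6_FIXED_1 };
              unsigned long vmcb_dr6_at_exit = DR6_FIXED_1 | (1ul << 0);  /* B0 hit */

              deliver_db_payload(&v, db_payload_from_exit(vmcb_dr6_at_exit));
              printf("vcpu->arch.dr6 = 0x%lx\n", v.dr6);
              return 0;
      }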
    • KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6 · 5679b803
      Committed by Paolo Bonzini
      kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the
      second argument.  Ensure that the VMCB value is synchronized to
      vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so
      that the current value of DR6 is always available in vcpu->arch.dr6.
      The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 07 May 2020 (1 commit)
  5. 05 May 2020 (1 commit)
  6. 23 April 2020 (1 commit)
    • KVM: x86: move nested-related kvm_x86_ops to a separate struct · 33b22172
      Committed by Paolo Bonzini
      Clean up some of the patching of kvm_x86_ops, by moving kvm_x86_ops related to
      nested virtualization into a separate struct.
      
      As a result, these ops will always be non-NULL on VMX.  This is not a problem:
      
      * check_nested_events is only called if is_guest_mode(vcpu) returns true
      
      * get_nested_state treats VMXOFF state the same as nested being disabled
      
      * set_nested_state fails if you attempt to set nested state while
        nesting is disabled
      
      * nested_enable_evmcs could already be called on a CPU without VMX enabled
        in CPUID.
      
      * nested_get_evmcs_version was fixed in the previous patch
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
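
      A sketch of the resulting split; the member names below are illustrative,
      not the exact fields of the real struct.  Nested-virtualization callbacks
      live in their own table behind one pointer, which the VMX-like backend
      always populates, so common code can call them unconditionally once the
      guards listed above hold.

      #include <stdio.h>

      struct kvm_vcpu;                          /* opaque for this sketch */

      struct kvm_nested_ops {
              int  (*check_events)(struct kvm_vcpu *vcpu);
              int  (*get_state)(struct kvm_vcpu *vcpu, void *buf, unsigned int size);
              int  (*set_state)(struct kvm_vcpu *vcpu, const void *buf, unsigned int size);
              int  (*enable_evmcs)(struct kvm_vcpu *vcpu, unsigned int *version);
      };

      struct kvm_x86_ops {
              void (*vcpu_run)(struct kvm_vcpu *vcpu);
              struct kvm_nested_ops *nested;    /* non-NULL whenever nesting is compiled in */
      };

      int main(void)
      {
              printf("nested callbacks grouped behind one pointer (%zu bytes of ops)\n",
                     sizeof(struct kvm_nested_ops));
              return 0;
      }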
  7. 21 April 2020 (12 commits)
    • KVM: X86: Improve latency for single target IPI fastpath · a9ab13ff
      Committed by Wanpeng Li
      In cloud environments, IPI and timer writes are observed to account for
      most MSR-write vmexits, so optimize virtual IPI latency more aggressively
      and inject the target IPI as soon as possible.

      Running the kvm-unit-tests/vmexit.flat IPI test on an SKX server, with the
      adaptive advance lapic timer and adaptive halt-polling disabled to avoid
      interference, this patch gives another 7% improvement.
      
      w/o fastpath   -> x86.c fastpath      4238 -> 3543  16.4%
      x86.c fastpath -> vmx.c fastpath      3543 -> 3293     7%
      w/o fastpath   -> vmx.c fastpath      4238 -> 3293  22.3%
      
      Cc: Haiwei Li <lihaiwei@tencent.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200410174703.1138-3-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags · 87915858
      Committed by Sean Christopherson
      Introduce a new "extended register" type, EXIT_INFO_2 (to pair with the
      nomenclature in .get_exit_info()), and use it to cache VMX's
      vmcs.EXIT_INTR_INFO.  Drop a comment in vmx_recover_nmi_blocking() that
      is obsoleted by the generic caching mechanism.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200415203454.8296-6-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Cache vmcs.EXIT_QUALIFICATION using arch avail_reg flags · 5addc235
      Committed by Sean Christopherson
      Introduce a new "extended register" type, EXIT_INFO_1 (to pair with the
      nomenclature in .get_exit_info()), and use it to cache VMX's
      vmcs.EXIT_QUALIFICATION.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200415203454.8296-5-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Replace "cr3" with "pgd" in "new cr3/pgd" related code · be01e8e2
      Committed by Sean Christopherson
      Rename functions and variables in kvm_mmu_new_cr3() and related code to
      replace "cr3" with "pgd", i.e. continue the work started by commit
      727a7e27 ("KVM: x86: rename set_cr3 callback and related flags to
      load_mmu_pgd").  kvm_mmu_new_cr3() and company are not always loading a
      new CR3, e.g. when nested EPT is enabled "cr3" is actually an EPTP.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-37-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Add separate override for MMU sync during fast CR3 switch · 4a632ac6
      Committed by Sean Christopherson
      Add a separate "skip" override for MMU sync; a future change to avoid
      TLB flushes on nested VMX transitions may need to sync the MMU even if
      the TLB flush is unnecessary.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-32-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Retrieve APIC access page HPA only when necessary · a4148b7c
      Committed by Sean Christopherson
      Move the retrieval of the HPA associated with L1's APIC access page into
      VMX code to avoid unnecessarily calling gfn_to_page(), e.g. when the
      vCPU is in guest mode (L2).  Alternatively, the optimization logic in
      VMX could be mirrored into the common x86 code, but that will get ugly
      fast when further optimizations are introduced.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-29-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Introduce KVM_REQ_TLB_FLUSH_CURRENT to flush current ASID · eeeb4f67
      Committed by Sean Christopherson
      Add KVM_REQ_TLB_FLUSH_CURRENT to allow optimized TLB flushing of VMX's
      EPTP/VPID contexts[*] from the KVM MMU and/or in a deferred manner, e.g.
      to flush L2's context during nested VM-Enter.
      
      Convert KVM_REQ_TLB_FLUSH to KVM_REQ_TLB_FLUSH_CURRENT in flows where
      the flush is directly associated with vCPU-scoped instruction emulation,
      i.e. MOV CR3 and INVPCID.
      
      Add a comment in vmx_vcpu_load_vmcs() above its KVM_REQ_TLB_FLUSH to
      make it clear that it deliberately requests a flush of all contexts.
      
      Service any pending flush request on nested VM-Exit as it's possible a
      nested VM-Exit could occur after requesting a flush for L2.  Add the
      same logic for nested VM-Enter even though it's _extremely_ unlikely for a
      flush to be pending on nested VM-Enter; it is theoretically possible (in
      the future) due to RSM (SMM) emulation.
      
      [*] Intel also has an Address Space Identifier (ASID) concept, e.g.
          EPTP+VPID+PCID == ASID, it's just not documented in the SDM because
          the rules of invalidation are different based on which piece of the
          ASID is being changed, i.e. whether the EPTP, VPID, or PCID context
          must be invalidated.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-25-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
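
      A small model of the two request flavors, with simplified request plumbing
      and invented constant names: the broader "flush everything" request subsumes
      the cheaper "flush current context" one, so it is checked first and clears
      the narrower bit.  This is a sketch of the idea, not the KVM request API.

      #include <stdbool.h>
      #include <stdio.h>

      #define REQ_TLB_FLUSH_ALL      (1u << 0)  /* every ASID/EPTP/VPID context */
      #define REQ_TLB_FLUSH_CURRENT  (1u << 1)  /* only the context currently in use */

      struct vcpu { unsigned int requests; };

      static void make_request(struct vcpu *v, unsigned int req)  { v->requests |= req; }
      static bool check_request(struct vcpu *v, unsigned int req)
      {
              if (!(v->requests & req))
                      return false;
              v->requests &= ~req;
              return true;
      }

      /* Serviced just before (re)entering the guest. */
      static void service_tlb_requests(struct vcpu *v)
      {
              if (check_request(v, REQ_TLB_FLUSH_ALL)) {
                      v->requests &= ~REQ_TLB_FLUSH_CURRENT;  /* subsumed by the full flush */
                      puts("flush all contexts (e.g. global INVEPT/INVVPID)");
              } else if (check_request(v, REQ_TLB_FLUSH_CURRENT)) {
                      puts("flush only the current context");
              }
      }

      int main(void)
      {
              struct vcpu v = { 0 };

              make_request(&v, REQ_TLB_FLUSH_CURRENT);  /* e.g. emulated MOV to CR3 */
              service_tlb_requests(&v);
              return 0;
      }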
    • KVM: x86: Rename ->tlb_flush() to ->tlb_flush_all() · 7780938c
      Committed by Sean Christopherson
      Rename ->tlb_flush() to ->tlb_flush_all() in preparation for adding a
      new hook to flush only the current ASID/context.
      
      Opportunistically replace the comment in vmx_flush_tlb() that explains
      why it flushes all EPTP/VPID contexts with a comment explaining why it
      unconditionally uses INVEPT when EPT is enabled.  I.e. rely on the "all"
      part of the name to clarify why it does global INVEPT/INVVPID.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-23-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Drop @invalidate_gpa param from kvm_x86_ops' tlb_flush() · f55ac304
      Committed by Sean Christopherson
      Drop @invalidate_gpa from ->tlb_flush() and kvm_vcpu_flush_tlb() now
      that all callers pass %true for said param, or ignore the param (SVM has
      an internal call to svm_flush_tlb() in svm_flush_tlb_guest that somewhat
      arbitrarily passes %false).
      
      Remove __vmx_flush_tlb() as it is no longer used.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-17-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest() · 0baedd79
      Committed by Vitaly Kuznetsov
      The Hyper-V PV TLB flush mechanism flushes the TLB on behalf of the guest,
      so doing tlb_flush_all() is overkill; switch to using tlb_flush_guest()
      (just like the KVM PV TLB flush mechanism) instead.  Introduce
      KVM_REQ_HV_TLB_FLUSH to support the change.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook · e64419d9
      Committed by Sean Christopherson
      Add a dedicated hook to handle flushing TLB entries on behalf of the
      guest, i.e. for a paravirtualized TLB flush, and use it directly instead
      of bouncing through kvm_vcpu_flush_tlb().
      
      For VMX, change the effective implementation to never do INVEPT and to
      flush only the current context, i.e. to always flush via
      INVVPID(SINGLE_CONTEXT).  The INVEPT performed by __vmx_flush_tlb() when
      @invalidate_gpa=false and enable_vpid=0 is unnecessary, as it will only
      flush guest-physical mappings; linear and combined mappings are flushed
      by VM-Enter when VPID is disabled, and changes in the guest page tables
      do not affect guest-physical mappings.
      
      When EPT and VPID are enabled, doing INVVPID is not required (by Intel's
      architecture) to invalidate guest-physical mappings, i.e. TLB entries
      that cache guest-physical mappings can live across INVVPID as the
      mappings are associated with an EPTP, not a VPID.  The intent of
      @invalidate_gpa is to inform vmx_flush_tlb() that it must "invalidate
      gpa mappings", i.e. do INVEPT and not simply INVVPID.  Other than nested
      VPID handling, which now calls vpid_sync_context() directly, the only
      scenario where KVM can safely do INVVPID instead of INVEPT (when EPT is
      enabled) is if KVM is flushing TLB entries from the guest's perspective,
      i.e. is only required to invalidate linear mappings.
      
      For SVM, flushing TLB entries from the guest's perspective can be done
      by flushing the current ASID, as changes to the guest's page tables are
      associated only with the current ASID.
      
      Adding a dedicated ->tlb_flush_guest() paves the way toward removing
      @invalidate_gpa, which is a potentially dangerous control flag as its
      meaning is not exactly crystal clear, even for those who are familiar
      with the subtleties of what mappings Intel CPUs are/aren't allowed to
      keep across various invalidation scenarios.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-15-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
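
      A minimal sketch of the hook layout the commit describes, with illustrative
      function and member names: three flush flavors behind an ops table, where a
      paravirtual TLB flush only needs the guest-visible (linear) mappings gone
      and therefore calls the dedicated hook instead of the heavyweight full flush.

      #include <stdio.h>

      struct vcpu;                               /* opaque for this sketch */

      struct tlb_ops {
              void (*flush_all)(struct vcpu *v);      /* all ASIDs/contexts */
              void (*flush_current)(struct vcpu *v);  /* only the active context */
              void (*flush_guest)(struct vcpu *v);    /* linear mappings, on the guest's behalf */
      };

      static void flush_all_ept(struct vcpu *v)     { (void)v; puts("global INVEPT"); }
      static void flush_current_ept(struct vcpu *v) { (void)v; puts("single-context INVEPT"); }
      static void flush_guest_vpid(struct vcpu *v)  { (void)v; puts("single-context INVVPID"); }

      static const struct tlb_ops ops = {
              .flush_all     = flush_all_ept,
              .flush_current = flush_current_ept,
              .flush_guest   = flush_guest_vpid,
      };

      /* A PV TLB flush flushes on the guest's behalf, so the narrow hook suffices. */
      static void handle_pv_tlb_flush(struct vcpu *v)
      {
              ops.flush_guest(v);
      }

      int main(void)
      {
              handle_pv_tlb_flush(NULL);
              return 0;
      }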
    • KVM: x86: introduce kvm_mmu_invalidate_gva · 5efac074
      Committed by Paolo Bonzini
      Wrap the combination of mmu->invlpg and kvm_x86_ops->tlb_flush_gva
      into a new function.  This function also lets us specify the host PGD to
      invalidate as well as the MMU, both of which will be useful in fixing and
      simplifying kvm_inject_emulated_page_fault.
      
      A nested guest's MMU however has g_context->invlpg == NULL.  Instead of
      setting it to nonpaging_invlpg, make kvm_mmu_invalidate_gva the only
      entry point to mmu->invlpg and make a NULL invlpg pointer equivalent
      to nonpaging_invlpg, saving a retpoline.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
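
      A rough sketch of the wrapper, under assumed signatures and names: it is the
      single entry point that invokes mmu->invlpg (treating a NULL pointer as the
      nonpaging no-op, which avoids the retpoline) and then performs the
      hardware-assisted per-GVA TLB flush.

      #include <stdio.h>

      typedef unsigned long gva_t;
      struct kvm_vcpu;                          /* opaque for this sketch */

      struct kvm_mmu {
              /* NULL means "nonpaging": there are no page tables to touch. */
              void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, unsigned long root);
              unsigned long root;
      };

      static void hw_flush_gva(struct kvm_vcpu *vcpu, gva_t gva)
      {
              (void)vcpu;
              printf("hardware-assisted flush of GVA 0x%lx\n", gva);
      }

      /* Single entry point combining mmu->invlpg with the per-GVA TLB flush. */
      static void mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
                                     gva_t gva, unsigned long root)
      {
              if (mmu->invlpg)                  /* NULL acts as nonpaging_invlpg, a no-op */
                      mmu->invlpg(vcpu, gva, root);
              hw_flush_gva(vcpu, gva);
      }

      int main(void)
      {
              struct kvm_mmu nested_mmu = { .invlpg = NULL, .root = 0 };

              /* A nested guest's MMU may legitimately have invlpg == NULL. */
              mmu_invalidate_gva(NULL, &nested_mmu, 0x7f0000001000ul, nested_mmu.root);
              return 0;
      }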
  8. 16 April 2020 (1 commit)
  9. 31 March 2020 (3 commits)
  10. 17 March 2020 (12 commits)