1. 09 1月, 2020 2 次提交
  2. 04 12月, 2019 1 次提交
  3. 23 11月, 2019 1 次提交
    • J
      kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints · 85c9aae9
      Jim Mattson 提交于
      Commit 37e4c997 ("KVM: VMX: validate individual bits of guest
      MSR_IA32_FEATURE_CONTROL") broke the KVM_SET_MSRS ABI by instituting
      new constraints on the data values that kvm would accept for the guest
      MSR, IA32_FEATURE_CONTROL. Perhaps these constraints should have been
      opt-in via a new KVM capability, but they were applied
      indiscriminately, breaking at least one existing hypervisor.
      
      Relax the constraints to allow either or both of
      FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX and
      FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX to be set when nVMX is
      enabled. This change is sufficient to fix the aforementioned breakage.
      
      Fixes: 37e4c997 ("KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL")
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Reviewed-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      85c9aae9
  4. 21 11月, 2019 4 次提交
    • L
      KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page · 0155b2b9
      Liran Alon 提交于
      According to Intel SDM section 28.3.3.3/28.3.3.4 Guidelines for Use
      of the INVVPID/INVEPT Instruction, the hypervisor needs to execute
      INVVPID/INVEPT X in case CPU executes VMEntry with VPID/EPTP X and
      either: "Virtualize APIC accesses" VM-execution control was changed
      from 0 to 1, OR the value of apic_access_page was changed.
      
      In the nested case, the burden falls on L1, unless L0 enables EPT in
      vmcs02 but L1 enables neither EPT nor VPID in vmcs12.  For this reason
      prepare_vmcs02() and load_vmcs12_host_state() have special code to
      request a TLB flush in case L1 does not use EPT but it uses
      "virtualize APIC accesses".
      
      This special case however is not necessary. On a nested vmentry the
      physical TLB will already be flushed except if all the following apply:
      
      * L0 uses VPID
      
      * L1 uses VPID
      
      * L0 can guarantee TLB entries populated while running L1 are tagged
      differently than TLB entries populated while running L2.
      
      If the first condition is false, the processor will flush the TLB
      on vmentry to L2.  If the second or third condition are false,
      prepare_vmcs02() will request KVM_REQ_TLB_FLUSH.  However, even
      if both are true, no extra TLB flush is needed to handle the APIC
      access page:
      
      * if L1 doesn't use VPID, the second condition doesn't hold and the
      TLB will be flushed anyway.
      
      * if L1 uses VPID, it has to flush the TLB itself with INVVPID and
      section 28.3.3.3 doesn't apply to L0.
      
      * even INVEPT is not needed because, if L0 uses EPT, it uses different
      EPTP when running L2 than L1 (because guest_mode is part of mmu-role).
      In this case SDM section 28.3.3.4 doesn't apply.
      
      Similarly, examining nested_vmx_vmexit()->load_vmcs12_host_state(),
      one could note that L0 won't flush TLB only in cases where SDM sections
      28.3.3.3 and 28.3.3.4 don't apply.  In particular, if L0 uses different
      VPIDs for L1 and L2 (i.e. vmx->vpid != vmx->nested.vpid02), section
      28.3.3.3 doesn't apply.
      
      Thus, remove this flush from prepare_vmcs02() and nested_vmx_vmexit().
      
      Side-note: This patch can be viewed as removing parts of commit
      fb6c8198 ("kvm: vmx: Flush TLB when the APIC-access address changes”)
      that is not relevant anymore since commit
      1313cc2b ("kvm: mmu: Add guest_mode to kvm_mmu_page_role”).
      i.e. The first commit assumes that if L0 use EPT and L1 doesn’t use EPT,
      then L0 will use same EPTP for both L0 and L1. Which indeed required
      L0 to execute INVEPT before entering L2 guest. This assumption is
      not true anymore since when guest_mode was added to mmu-role.
      Reviewed-by: NJoao Martins <joao.m.martins@oracle.com>
      Signed-off-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0155b2b9
    • L
      KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning · b11494bc
      Liran Alon 提交于
      vmcs->apic_access_page is simply a token that the hypervisor puts into
      the PFN of a 4KB EPTE (or PTE if using shadow-paging) that triggers
      APIC-access VMExit or APIC virtualization logic whenever a CPU running
      in VMX non-root mode read/write from/to this PFN.
      
      As every write either triggers an APIC-access VMExit or write is
      performed on vmcs->virtual_apic_page, the PFN pointed to by
      vmcs->apic_access_page should never actually be touched by CPU.
      
      Therefore, there is no need to mark vmcs02->apic_access_page as dirty
      after unpin it on L2->L1 emulated VMExit or when L1 exit VMX operation.
      Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: NJoao Martins <joao.m.martins@oracle.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b11494bc
    • P
      KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it · b07a5c53
      Paolo Bonzini 提交于
      If X86_FEATURE_RTM is disabled, the guest should not be able to access
      MSR_IA32_TSX_CTRL.  We can therefore use it in KVM to force all
      transactions from the guest to abort.
      Tested-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b07a5c53
    • P
      KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality · c11f83e0
      Paolo Bonzini 提交于
      The current guest mitigation of TAA is both too heavy and not really
      sufficient.  It is too heavy because it will cause some affected CPUs
      (those that have MDS_NO but lack TAA_NO) to fall back to VERW and
      get the corresponding slowdown.  It is not really sufficient because
      it will cause the MDS_NO bit to disappear upon microcode update, so
      that VMs started before the microcode update will not be runnable
      anymore afterwards, even with tsx=on.
      
      Instead, if tsx=on on the host, we can emulate MSR_IA32_TSX_CTRL for
      the guest and let it run without the VERW mitigation.  Even though
      MSR_IA32_TSX_CTRL is quite heavyweight, and we do not want to write
      it on every vmentry, we can use the shared MSR functionality because
      the host kernel need not protect itself from TSX-based side-channels.
      Tested-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c11f83e0
  5. 20 11月, 2019 3 次提交
  6. 16 11月, 2019 1 次提交
    • T
      x86/tss: Fix and move VMX BUILD_BUG_ON() · 6b546e1c
      Thomas Gleixner 提交于
      The BUILD_BUG_ON(IO_BITMAP_OFFSET - 1 == 0x67) in the VMX code is bogus in
      two aspects:
      
      1) This wants to be in generic x86 code simply to catch issues even when
         VMX is disabled in Kconfig.
      
      2) The IO_BITMAP_OFFSET is not the right thing to check because it makes
         asssumptions about the layout of tss_struct. Nothing requires that the
         I/O bitmap is placed right after x86_tss, which is the hardware mandated
         tss structure. It pointlessly makes restrictions on the struct
         tss_struct layout.
      
      The proper thing to check is:
      
          - Offset of x86_tss in tss_struct is 0
          - Size of x86_tss == 0x68
      
      Move it to the other build time TSS checks and make it do the right thing.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      6b546e1c
  7. 15 11月, 2019 20 次提交
  8. 12 11月, 2019 4 次提交
  9. 31 10月, 2019 1 次提交
    • P
      KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active · 9167ab79
      Paolo Bonzini 提交于
      VMX already does so if the host has SMEP, in order to support the combination of
      CR0.WP=1 and CR4.SMEP=1.  However, it is perfectly safe to always do so, and in
      fact VMX already ends up running with EFER.NXE=1 on old processors that lack the
      "load EFER" controls, because it may help avoiding a slow MSR write.  Removing
      all the conditionals simplifies the code.
      
      SVM does not have similar code, but it should since recent AMD processors do
      support SMEP.  So this patch also makes the code for the two vendors more similar
      while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors.
      
      Cc: stable@vger.kernel.org
      Cc: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9167ab79
  10. 23 10月, 2019 1 次提交
    • J
      KVM: nVMX: Don't leak L1 MMIO regions to L2 · 671ddc70
      Jim Mattson 提交于
      If the "virtualize APIC accesses" VM-execution control is set in the
      VMCS, the APIC virtualization hardware is triggered when a page walk
      in VMX non-root mode terminates at a PTE wherein the address of the 4k
      page frame matches the APIC-access address specified in the VMCS. On
      hardware, the APIC-access address may be any valid 4k-aligned physical
      address.
      
      KVM's nVMX implementation enforces the additional constraint that the
      APIC-access address specified in the vmcs12 must be backed by
      a "struct page" in L1. If not, L0 will simply clear the "virtualize
      APIC accesses" VM-execution control in the vmcs02.
      
      The problem with this approach is that the L1 guest has arranged the
      vmcs12 EPT tables--or shadow page tables, if the "enable EPT"
      VM-execution control is clear in the vmcs12--so that the L2 guest
      physical address(es)--or L2 guest linear address(es)--that reference
      the L2 APIC map to the APIC-access address specified in the
      vmcs12. Without the "virtualize APIC accesses" VM-execution control in
      the vmcs02, the APIC accesses in the L2 guest will directly access the
      APIC-access page in L1.
      
      When there is no mapping whatsoever for the APIC-access address in L1,
      the L2 VM just loses the intended APIC virtualization. However, when
      the APIC-access address is mapped to an MMIO region in L1, the L2
      guest gets direct access to the L1 MMIO device. For example, if the
      APIC-access address specified in the vmcs12 is 0xfee00000, then L2
      gets direct access to L1's APIC.
      
      Since this vmcs12 configuration is something that KVM cannot
      faithfully emulate, the appropriate response is to exit to userspace
      with KVM_INTERNAL_ERROR_EMULATION.
      
      Fixes: fe3ef05c ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12")
      Reported-by: NDan Cross <dcross@google.com>
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Reviewed-by: NPeter Shier <pshier@google.com>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      671ddc70
  11. 22 10月, 2019 2 次提交