提交 · 9dadc2f918df26e64aa04794cdb4d8667c934f47 · openeuler / Kernel

09 1月, 2020 2 次提交

KVM: VMX: Rename INTERRUPT_PENDING to INTERRUPT_WINDOW · 9dadc2f9

由 Xiaoyao Li 提交于 12月 06, 2019

Rename interrupt-windown exiting related definitions to match the
latest Intel SDM. No functional changes.
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9dadc2f9

KVM: vmx: remove unreachable statement in vmx_get_msr_feature() · 4fb7b452

由 Miaohe Lin 提交于 12月 05, 2019

We have no way to reach the final statement, remove it.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4fb7b452

04 12月, 2019 1 次提交

kvm: vmx: Stop wasting a page for guest_msrs · 7d73710d

由 Jim Mattson 提交于 12月 03, 2019

We will never need more guest_msrs than there are indices in
vmx_msr_index. Thus, at present, the guest_msrs array will not exceed
168 bytes.
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7d73710d

23 11月, 2019 1 次提交

kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints · 85c9aae9

由 Jim Mattson 提交于 11月 22, 2019

Commit 37e4c997 ("KVM: VMX: validate individual bits of guest
MSR_IA32_FEATURE_CONTROL") broke the KVM_SET_MSRS ABI by instituting
new constraints on the data values that kvm would accept for the guest
MSR, IA32_FEATURE_CONTROL. Perhaps these constraints should have been
opt-in via a new KVM capability, but they were applied
indiscriminately, breaking at least one existing hypervisor.

Relax the constraints to allow either or both of
FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX and
FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX to be set when nVMX is
enabled. This change is sufficient to fix the aforementioned breakage.

Fixes: 37e4c997 ("KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL")
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

85c9aae9

21 11月, 2019 4 次提交

KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page · 0155b2b9

由 Liran Alon 提交于 11月 20, 2019

According to Intel SDM section 28.3.3.3/28.3.3.4 Guidelines for Use
of the INVVPID/INVEPT Instruction, the hypervisor needs to execute
INVVPID/INVEPT X in case CPU executes VMEntry with VPID/EPTP X and
either: "Virtualize APIC accesses" VM-execution control was changed
from 0 to 1, OR the value of apic_access_page was changed.

In the nested case, the burden falls on L1, unless L0 enables EPT in
vmcs02 but L1 enables neither EPT nor VPID in vmcs12.  For this reason
prepare_vmcs02() and load_vmcs12_host_state() have special code to
request a TLB flush in case L1 does not use EPT but it uses
"virtualize APIC accesses".

This special case however is not necessary. On a nested vmentry the
physical TLB will already be flushed except if all the following apply:

* L0 uses VPID

* L1 uses VPID

* L0 can guarantee TLB entries populated while running L1 are tagged
differently than TLB entries populated while running L2.

If the first condition is false, the processor will flush the TLB
on vmentry to L2.  If the second or third condition are false,
prepare_vmcs02() will request KVM_REQ_TLB_FLUSH.  However, even
if both are true, no extra TLB flush is needed to handle the APIC
access page:

* if L1 doesn't use VPID, the second condition doesn't hold and the
TLB will be flushed anyway.

* if L1 uses VPID, it has to flush the TLB itself with INVVPID and
section 28.3.3.3 doesn't apply to L0.

* even INVEPT is not needed because, if L0 uses EPT, it uses different
EPTP when running L2 than L1 (because guest_mode is part of mmu-role).
In this case SDM section 28.3.3.4 doesn't apply.

Similarly, examining nested_vmx_vmexit()->load_vmcs12_host_state(),
one could note that L0 won't flush TLB only in cases where SDM sections
28.3.3.3 and 28.3.3.4 don't apply.  In particular, if L0 uses different
VPIDs for L1 and L2 (i.e. vmx->vpid != vmx->nested.vpid02), section
28.3.3.3 doesn't apply.

Thus, remove this flush from prepare_vmcs02() and nested_vmx_vmexit().

Side-note: This patch can be viewed as removing parts of commit
fb6c8198 ("kvm: vmx: Flush TLB when the APIC-access address changes”)
that is not relevant anymore since commit
1313cc2b ("kvm: mmu: Add guest_mode to kvm_mmu_page_role”).
i.e. The first commit assumes that if L0 use EPT and L1 doesn’t use EPT,
then L0 will use same EPTP for both L0 and L1. Which indeed required
L0 to execute INVEPT before entering L2 guest. This assumption is
not true anymore since when guest_mode was added to mmu-role.
Reviewed-by: NJoao Martins <joao.m.martins@oracle.com>
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0155b2b9

KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning · b11494bc

由 Liran Alon 提交于 11月 21, 2019

vmcs->apic_access_page is simply a token that the hypervisor puts into
the PFN of a 4KB EPTE (or PTE if using shadow-paging) that triggers
APIC-access VMExit or APIC virtualization logic whenever a CPU running
in VMX non-root mode read/write from/to this PFN.

As every write either triggers an APIC-access VMExit or write is
performed on vmcs->virtual_apic_page, the PFN pointed to by
vmcs->apic_access_page should never actually be touched by CPU.

Therefore, there is no need to mark vmcs02->apic_access_page as dirty
after unpin it on L2->L1 emulated VMExit or when L1 exit VMX operation.
Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: NJoao Martins <joao.m.martins@oracle.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b11494bc

KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it · b07a5c53

由 Paolo Bonzini 提交于 11月 18, 2019

If X86_FEATURE_RTM is disabled, the guest should not be able to access
MSR_IA32_TSX_CTRL.  We can therefore use it in KVM to force all
transactions from the guest to abort.
Tested-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b07a5c53

KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality · c11f83e0

由 Paolo Bonzini 提交于 11月 18, 2019

The current guest mitigation of TAA is both too heavy and not really
sufficient.  It is too heavy because it will cause some affected CPUs
(those that have MDS_NO but lack TAA_NO) to fall back to VERW and
get the corresponding slowdown.  It is not really sufficient because
it will cause the MDS_NO bit to disappear upon microcode update, so
that VMs started before the microcode update will not be runnable
anymore afterwards, even with tsx=on.

Instead, if tsx=on on the host, we can emulate MSR_IA32_TSX_CTRL for
the guest and let it run without the VERW mitigation.  Even though
MSR_IA32_TSX_CTRL is quite heavyweight, and we do not want to write
it on every vmentry, we can use the shared MSR functionality because
the host kernel need not protect itself from TSX-based side-channels.
Tested-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c11f83e0

20 11月, 2019 3 次提交

KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT · 992edeae

由 Liran Alon 提交于 11月 20, 2019

Since commit 1313cc2b ("kvm: mmu: Add guest_mode to kvm_mmu_page_role"),
guest_mode was added to mmu-role and therefore if L0 use EPT, it will
always run L1 and L2 with different EPTP. i.e. EPTP01!=EPTP02.

Because TLB entries are tagged with EP4TA, KVM can assume
TLB entries populated while running L2 are tagged differently
than TLB entries populated while running L1.

Therefore, update nested_has_guest_tlb_tag() to consider if
L0 use EPT instead of if L1 use EPT.
Reviewed-by: NJoao Martins <joao.m.martins@oracle.com>
Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

992edeae

KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1 · c79eb775

由 Chenyi Qiang 提交于 11月 19, 2019

When L1 guest uses 5-level paging, it fails vm-entry to L2 due to
invalid host-state. It needs to add CR4_LA57 bit to nested CR4_FIXED1
MSR.
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c79eb775

KVM: nVMX: Use semi-colon instead of comma for exit-handlers initialization · cc877670

由 Liran Alon 提交于 11月 18, 2019

Reviewed-by: NMark Kanda <mark.kanda@oracle.com>
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cc877670

16 11月, 2019 1 次提交

x86/tss: Fix and move VMX BUILD_BUG_ON() · 6b546e1c

由 Thomas Gleixner 提交于 11月 11, 2019

The BUILD_BUG_ON(IO_BITMAP_OFFSET - 1 == 0x67) in the VMX code is bogus in
two aspects:

1) This wants to be in generic x86 code simply to catch issues even when
   VMX is disabled in Kconfig.

2) The IO_BITMAP_OFFSET is not the right thing to check because it makes
   asssumptions about the layout of tss_struct. Nothing requires that the
   I/O bitmap is placed right after x86_tss, which is the hardware mandated
   tss structure. It pointlessly makes restrictions on the struct
   tss_struct layout.

The proper thing to check is:

    - Offset of x86_tss in tss_struct is 0
    - Size of x86_tss == 0x68

Move it to the other build time TSS checks and make it do the right thing.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Acked-by: NAndy Lutomirski <luto@kernel.org>

6b546e1c

15 11月, 2019 20 次提交

KVM: nVMX: Add support for capturing highest observable L2 TSC · 662f1d1d

由 Aaron Lewis 提交于 11月 07, 2019

The L1 hypervisor may include the IA32_TIME_STAMP_COUNTER MSR in the
vmcs12 MSR VM-exit MSR-store area as a way of determining the highest
TSC value that might have been observed by L2 prior to VM-exit. The
current implementation does not capture a very tight bound on this
value.  To tighten the bound, add the IA32_TIME_STAMP_COUNTER MSR to the
vmcs02 VM-exit MSR-store area whenever it appears in the vmcs12 VM-exit
MSR-store area.  When L0 processes the vmcs12 VM-exit MSR-store area
during the emulation of an L2->L1 VM-exit, special-case the
IA32_TIME_STAMP_COUNTER MSR, using the value stored in the vmcs02
VM-exit MSR-store area to derive the value to be stored in the vmcs12
VM-exit MSR-store area.
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NAaron Lewis <aaronlewis@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

662f1d1d

kvm: vmx: Rename function find_msr() to vmx_find_msr_index() · ef0fbcac

由 Aaron Lewis 提交于 11月 07, 2019

Rename function find_msr() to vmx_find_msr_index() in preparation for an
upcoming patch where we export it and use it in nested.c.
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NAaron Lewis <aaronlewis@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ef0fbcac

kvm: vmx: Rename NR_AUTOLOAD_MSRS to NR_LOADSTORE_MSRS · 7cfe0526

由 Aaron Lewis 提交于 11月 07, 2019

Rename NR_AUTOLOAD_MSRS to NR_LOADSTORE_MSRS.  This needs to be done
due to the addition of the MSR-autostore area that will be added in a
future patch.  After that the name AUTOLOAD will no longer make sense.
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NAaron Lewis <aaronlewis@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7cfe0526

kvm: nested: Introduce read_and_check_msr_entry() · 365d3d55

由 Aaron Lewis 提交于 11月 07, 2019

Add the function read_and_check_msr_entry() which just pulls some code
out of nested_vmx_store_msr().  This will be useful as reusable code in
upcoming patches.
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NAaron Lewis <aaronlewis@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

365d3d55

KVM: nVMX: mark functions in the header as "static inline" · d4069dbe

由 Paolo Bonzini 提交于 11月 15, 2019

Correct a small inaccuracy in the shattering of vmx.c, which becomes
visible now that pmu_intel.c includes nested.h.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d4069dbe

KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control · 03a8871a