提交 · b614c6027896ff9ad6757122e84760d938cab15e · openeuler / Kernel

10 7月, 2019 1 次提交

KVM: Properly check if "page" is valid in kvm_vcpu_unmap · b614c602

由 KarimAllah Ahmed 提交于 7月 10, 2019

The field "page" is initialized to KVM_UNMAPPED_PAGE when it is not used
(i.e. when the memory lives outside kernel control). So this check will
always end up using kunmap even for memremap regions.

Fixes: e45adf66 ("KVM: Introduce a new guest mapping API")
Cc: stable@vger.kernel.org
Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b614c602

06 7月, 2019 1 次提交

KVM: LAPIC: Retry tune per-vCPU timer_advance_ns if adaptive tuning goes insane · 548f7fb2

由 Wanpeng Li 提交于 7月 05, 2019

Retry tune per-vCPU timer_advance_ns if adaptive tuning goes insane which
can happen sporadically in product environment.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

548f7fb2

05 7月, 2019 14 次提交

kvm: LAPIC: write down valid APIC registers · 01402cf8

由 Paolo Bonzini 提交于 7月 05, 2019

Replace a magic 64-bit mask with a list of valid registers, computing
the same mask in the end.
Suggested-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

01402cf8

KVM: LAPIC: ARBPRI is a reserved register for x2APIC · 101628de

由 Paolo Bonzini 提交于 7月 05, 2019

kvm-unit-tests were adjusted to match bare metal behavior, but KVM
itself was not doing what bare metal does; fix that.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

101628de

KVM nVMX: Check Host Segment Registers and Descriptor Tables on vmentry of nested guests · 1ef23e1f

由 Krish Sadhukhan 提交于 7月 03, 2019

According to section "Checks on Host Segment and Descriptor-Table
Registers" in Intel SDM vol 3C, the following checks are performed on
vmentry of nested guests:

   - In the selector field for each of CS, SS, DS, ES, FS, GS and TR, the
     RPL (bits 1:0) and the TI flag (bit 2) must be 0.
   - The selector fields for CS and TR cannot be 0000H.
   - The selector field for SS cannot be 0000H if the "host address-space
     size" VM-exit control is 0.
   - On processors that support Intel 64 architecture, the base-address
     fields for FS, GS and TR must contain canonical addresses.
Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: NKarl Heubaum <karl.heubaum@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1ef23e1f

KVM: nVMX: Stash L1's CR3 in vmcs01.GUEST_CR3 on nested entry w/o EPT · f087a029

由 Sean Christopherson 提交于 6月 07, 2019

KVM does not have 100% coverage of VMX consistency checks, i.e. some
checks that cause VM-Fail may only be detected by hardware during a
nested VM-Entry.  In such a case, KVM must restore L1's state to the
pre-VM-Enter state as L2's state has already been loaded into KVM's
software model.

L1's CR3 and PDPTRs in particular are loaded from vmcs01.GUEST_*.  But
when EPT is disabled, the associated fields hold KVM's shadow values,
not L1's "real" values.  Fortunately, when EPT is disabled the PDPTRs
come from memory, i.e. are not cached in the VMCS.  Which leaves CR3
as the sole anomaly.

A previously applied workaround to handle CR3 was to force nested early
checks if EPT is disabled:

  commit 2b27924b ("KVM: nVMX: always use early vmcs check when EPT
                         is disabled")

Forcing nested early checks is undesirable as doing so adds hundreds of
cycles to every nested VM-Entry.  Rather than take this performance hit,
handle CR3 by overwriting vmcs01.GUEST_CR3 with L1's CR3 during nested
VM-Entry when EPT is disabled *and* nested early checks are disabled.
By stuffing vmcs01.GUEST_CR3, nested_vmx_restore_host_state() will
naturally restore the correct vcpu->arch.cr3 from vmcs01.GUEST_CR3.

These shenanigans work because nested_vmx_restore_host_state() does a
full kvm_mmu_reset_context(), i.e. unloads the current MMU, which
guarantees vmcs01.GUEST_CR3 will be rewritten with a new shadow CR3
prior to re-entering L1.

vcpu->arch.root_mmu.root_hpa is set to INVALID_PAGE via:

    nested_vmx_restore_host_state() ->
        kvm_mmu_reset_context() ->
            kvm_mmu_unload() ->
                kvm_mmu_free_roots()

kvm_mmu_unload() has WARN_ON(root_hpa != INVALID_PAGE), i.e. we can bank
on 'root_hpa == INVALID_PAGE' unless the implementation of
kvm_mmu_reset_context() is changed.

On the way into L1, VMCS.GUEST_CR3 is guaranteed to be written (on a
successful entry) via:

    vcpu_enter_guest() ->
        kvm_mmu_reload() ->
            kvm_mmu_load() ->
                kvm_mmu_load_cr3() ->
                    vmx_set_cr3()

Stuff vmcs01.GUEST_CR3 if and only if nested early checks are disabled
as a "late" VM-Fail should never happen win that case (KVM WARNs), and
the conditional write avoids the need to restore the correct GUEST_CR3
when nested_vmx_check_vmentry_hw() fails.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20190607185534.24368-1-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f087a029

P
KVM: x86: add tracepoints around __direct_map and FNAME(fetch) · 335e192a
由 Paolo Bonzini 提交于 7月 01, 2019
```
These are useful in debugging shadow paging.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
```
335e192a

KVM: x86: change kvm_mmu_page_get_gfn BUG_ON to WARN_ON · e9f2a760

由 Paolo Bonzini 提交于 6月 30, 2019

Note that in such a case it is quite likely that KVM will BUG_ON
in __pte_list_remove when the VM is closed.  However, there is no
immediate risk of memory corruption in the host so a WARN_ON is
enough and it lets you gather traces for debugging.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e9f2a760

KVM: x86: remove now unneeded hugepage gfn adjustment · d679b326

由 Paolo Bonzini 提交于 6月 23, 2019

After the previous patch, the low bits of the gfn are masked in
both FNAME(fetch) and __direct_map, so we do not need to clear them
in transparent_hugepage_adjust.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d679b326

KVM: x86: make FNAME(fetch) and __direct_map more similar · 3fcf2d1b

由 Paolo Bonzini 提交于 6月 24, 2019

These two functions are basically doing the same thing through
kvm_mmu_get_page, link_shadow_page and mmu_set_spte; yet, for historical
reasons, their code looks very different.  This patch tries to take the
best of each and make them very similar, so that it is easy to understand
changes that apply to both of them.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3fcf2d1b

kvm: x86: Do not release the page inside mmu_set_spte() · 43fdcda9

由 Junaid Shahid 提交于 1月 03, 2019

Release the page at the call-site where it was originally acquired.
This makes the exit code cleaner for most call sites, since they
do not need to duplicate code between success and the failure
label.
Signed-off-by: NJunaid Shahid <junaids@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

43fdcda9

KVM: cpuid: remove has_leaf_count from struct kvm_cpuid_param · 60cec433

由 Paolo Bonzini 提交于 6月 24, 2019

The has_leaf_count member was originally added for KVM's paravirtualization
CPUID leaves. However, since then the leaf count _has_ been added to those
leaves as well, so we can drop that special case.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

60cec433

KVM: cpuid: rename do_cpuid_1_ent · 50a9e1a4

由 Paolo Bonzini 提交于 6月 24, 2019

do_cpuid_1_ent does not do the entire processing for a CPUID entry, it
only retrieves the host's values. Rename it to match reality.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

50a9e1a4

KVM: cpuid: set struct kvm_cpuid_entry2 flags in do_cpuid_1_ent · d9aadaf6

由 Paolo Bonzini 提交于 7月 04, 2019

do_cpuid_1_ent is typically called in two places by __do_cpuid_func
for CPUID functions that have subleafs.  Both places have to set
the KVM_CPUID_FLAG_SIGNIFCANT_INDEX.  Set that flag, and
KVM_CPUID_FLAG_STATEFUL_FUNC as well, directly in do_cpuid_1_ent.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d9aadaf6

KVM: cpuid: extract do_cpuid_7_mask and support multiple subleafs · 54d360d4

由 Paolo Bonzini 提交于 7月 04, 2019

CPUID function 7 has multiple subleafs.  Instead of having nested
switch statements, move the logic to filter supported features to
a separate function, and call it for each subleaf.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

54d360d4

KVM: cpuid: do_cpuid_ent works on a whole CPUID function · ab8bcf64

由 Paolo Bonzini 提交于 6月 24, 2019

Rename it as well as __do_cpuid_ent and __do_cpuid_ent_emulated to have
"func" in its name, and drop the index parameter which is always 0.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ab8bcf64

03 7月, 2019 10 次提交

KVM: LAPIC: remove the trailing newline used in the fmt parameter of TP_printk · 7be373b6

由 Wanpeng Li 提交于 6月 14, 2019

The trailing newlines will lead to extra newlines in the trace file
which looks like the following output, so remove it.

qemu-system-x86-15695 [002] ...1 15774.839240: kvm_hv_timer_state: vcpu_id 0 hv_timer 1

qemu-system-x86-15695 [002] ...1 15774.839309: kvm_hv_timer_state: vcpu_id 0 hv_timer 1

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7be373b6

KVM: svm: add nrips module parameter · d647eb63

由 Paolo Bonzini 提交于 6月 20, 2019

Allow testing code for old processors that lack the next RIP save
feature, by disabling usage of the next_rip field.

Nested hypervisors however get the feature unconditionally.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d647eb63

kvm: x86: Pass through AMD_STIBP_ALWAYS_ON in GET_SUPPORTED_CPUID · c550505b

由 Jim Mattson 提交于 6月 27, 2019

This bit is purely advisory. Passing it through to the guest indicates
that the virtual processor, like the physical processor, prefers that
STIBP is only set once during boot and not changed.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c550505b

kvm: nVMX: Remove unnecessary sync_roots from handle_invept · b1190198

由 Jim Mattson 提交于 6月 13, 2019

When L0 is executing handle_invept(), the TDP MMU is active. Emulating
an L1 INVEPT does require synchronizing the appropriate shadow EPT
root(s), but a call to kvm_mmu_sync_roots in this context won't do
that. Similarly, the hardware TLB and paging-structure-cache entries
associated with the appropriate shadow EPT root(s) must be flushed,
but requesting a TLB_FLUSH from this context won't do that either.

How did this ever work? KVM always does a sync_roots and TLB flush (in
the correct context) when transitioning from L1 to L2. That isn't the
best choice for nested VM performance, but it effectively papers over
the mistakes here.

Remove the unnecessary operations and leave a comment to try to do
better in the future.
Reported-by: NJunaid Shahid <junaids@google.com>
Fixes: bfd0a56b ("nEPT: Nested INVEPT")
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Nadav Har'El <nyh@il.ibm.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Xinhao Xu <xinhao.xu@intel.com>
Cc: Yang Zhang <yang.z.zhang@Intel.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by Peter Shier <pshier@google.com>
Reviewed-by: NJunaid Shahid <junaids@google.com>
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b1190198

P
Documentation: kvm: document CPUID bit for MSR_KVM_POLL_CONTROL · 9824c83f
由 Paolo Bonzini 提交于 7月 02, 2019
```
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
```
9824c83f

KVM: X86: Expose PV_SCHED_YIELD CPUID feature bit to guest · 32b72ecc

由 Wanpeng Li 提交于 6月 11, 2019

Expose PV_SCHED_YIELD feature bit to guest, the guest can check this
feature bit before using paravirtualized sched yield.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

32b72ecc

KVM: X86: Implement PV sched yield hypercall · 71506297

由 Wanpeng Li 提交于 6月 11, 2019

The target vCPUs are in runnable state after vcpu_kick and suitable
as a yield target. This patch implements the sched yield hypercall.

17% performance increasement of ebizzy benchmark can be observed in an
over-subscribe environment. (w/ kvm-pv-tlb disabled, testing TLB flush
call-function IPI-many since call-function is not easy to be trigged
by userspace workload).

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

71506297

KVM: X86: Yield to IPI target if necessary · f85f6e7b

由 Wanpeng Li 提交于 6月 11, 2019

When sending a call-function IPI-many to vCPUs, yield if any of
the IPI target vCPUs was preempted, we just select the first
preempted target vCPU which we found since the state of target
vCPUs can change underneath and to avoid race conditions.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f85f6e7b

x86/kvm/nVMX: fix VMCLEAR when Enlightened VMCS is in use · 11e34914

由 Vitaly Kuznetsov 提交于 6月 28, 2019

When Enlightened VMCS is in use, it is valid to do VMCLEAR and,
according to TLFS, this should "transition an enlightened VMCS from the
active to the non-active state". It is, however, wrong to assume that
it is only valid to do VMCLEAR for the eVMCS which is currently active
on the vCPU performing VMCLEAR.

Currently, the logic in handle_vmclear() is broken: in case, there is no
active eVMCS on the vCPU doing VMCLEAR we treat the argument as a 'normal'
VMCS and kvm_vcpu_write_guest() to the 'launch_state' field irreversibly
corrupts the memory area.

So, in case the VMCLEAR argument is not the current active eVMCS on the
vCPU, how can we know if the area it is pointing to is a normal or an
enlightened VMCS?
Thanks to the bug in Hyper-V (see commit 72aeb60c ("KVM: nVMX: Verify
eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD")) we can
not, the revision can't be used to distinguish between them. So let's
assume it is always enlightened in case enlightened vmentry is enabled in
the assist page. Also, check if vmx->nested.enlightened_vmcs_enabled to
minimize the impact for 'unenlightened' workloads.

Fixes: b8bbab92 ("KVM: nVMX: implement enlightened VMPTRLD and VMCLEAR")
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

11e34914

x86/KVM/nVMX: don't use clean fields data on enlightened VMLAUNCH · a21a39c2

由 Vitaly Kuznetsov 提交于 6月 28, 2019

Apparently, Windows doesn't maintain clean fields data after it does
VMCLEAR for an enlightened VMCS so we can only use it on VMRESUME.
The issue went unnoticed because currently we do nested_release_evmcs()
in handle_vmclear() and the consecutive enlightened VMPTRLD invalidates
clean fields when a new eVMCS is mapped but we're going to change the
logic.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a21a39c2

02 7月, 2019 3 次提交

KVM: nVMX: list VMX MSRs in KVM_GET_MSR_INDEX_LIST · 95c5c7c7

由 Paolo Bonzini 提交于 7月 02, 2019

This allows userspace to know which MSRs are supported by the hypervisor.
Unfortunately userspace must resort to tricks for everything except
MSR_IA32_VMX_VMFUNC (which was just added in the previous patch).
One possibility is to use the feature control MSR, which is tied to nested
VMX as well and is present on all KVM versions that support feature MSRs.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

95c5c7c7

KVM: nVMX: allow setting the VMFUNC controls MSR · e8a70bd4

由 Paolo Bonzini 提交于 7月 02, 2019

Allow userspace to set a custom value for the VMFUNC controls MSR, as long
as the capabilities it advertises do not exceed those of the host.

Fixes: 27c42a1b ("KVM: nVMX: Enable VMFUNC for the L1 hypervisor", 2017-08-03)
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e8a70bd4

KVM: nVMX: include conditional controls in /dev/kvm KVM_GET_MSRS · 6defc591

由 Paolo Bonzini 提交于 7月 02, 2019

Some secondary controls are automatically enabled/disabled based on the CPUID
values that are set for the guest.  However, they are still available at a
global level and therefore should be present when KVM_GET_MSRS is sent to
/dev/kvm.

Fixes: 1389309c ("KVM: nVMX: expose VMX capabilities for nested hypervisors to userspace", 2018-02-26)
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6defc591

20 6月, 2019 2 次提交

KVM: x86: Fix apic dangling pointer in vcpu · a251fb90

由 Saar Amar 提交于 5月 06, 2019

The function kvm_create_lapic() attempts to allocate the apic structure
and sets a pointer to it in the virtual processor structure. However, if
get_zeroed_page() failed, the function frees the apic chunk, but forgets
to set the pointer in the vcpu to NULL. It's not a security issue since
there isn't a use of that pointer if kvm_create_lapic() returns error,
but it's more accurate that way.
Signed-off-by: NSaar Amar <saaramar@microsoft.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a251fb90

KVM: VMX: check CPUID before allowing read/write of IA32_XSS · 4d763b16

由 Wanpeng Li 提交于 6月 20, 2019

Raise #GP when guest read/write IA32_XSS, but the CPUID bits
say that it shouldn't exist.

Fixes: 20300099 (kvm: vmx: add MSR logic for XSAVES)
Reported-by: NXiaoyao Li <xiaoyao.li@linux.intel.com>
Reported-by: NTao Xu <tao3.xu@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4d763b16

18 6月, 2019 9 次提交

KVM: nVMX: shadow pin based execution controls · eceb9973

由 Paolo Bonzini 提交于 6月 13, 2019

The VMX_PREEMPTION_TIMER flag may be toggled frequently, though not
*very* frequently.  Since it does not affect KVM's dirty logic, e.g.
the preemption timer value is loaded from vmcs12 even if vmcs12 is
"clean", there is no need to mark vmcs12 dirty when L1 writes pin
controls, and shadowing the field achieves that.
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

eceb9973

KVM: VMX: Leave preemption timer running when it's disabled · 804939ea

由 Sean Christopherson 提交于 5月 07, 2019

VMWRITEs to the major VMCS controls, pin controls included, are
deceptively expensive. CPUs with VMCS caching (Westmere and later) also
optimize away consistency checks on VM-Entry, i.e. skip consistency
checks if the relevant fields have not changed since the last successful
VM-Entry (of the cached VMCS). Because uops are a precious commodity,
uCode's dirty VMCS field tracking isn't as precise as software would
prefer. Notably, writing any of the major VMCS fields effectively marks
the entire VMCS dirty, i.e. causes the next VM-Entry to perform all
consistency checks, which consumes several hundred cycles.

As it pertains to KVM, toggling PIN_BASED_VMX_PREEMPTION_TIMER more than
doubles the latency of the next VM-Entry (and again when/if the flag is
toggled back). In a non-nested scenario, running a "standard" guest
with the preemption timer enabled, toggling the timer flag is uncommon
but not rare, e.g. roughly 1 in 10 entries. Disabling the preemption
timer can change these numbers due to its use for "immediate exits",
even when explicitly disabled by userspace.

Nested virtualization in particular is painful, as the timer flag is set
for the majority of VM-Enters, but prepare_vmcs02() initializes vmcs02's
pin controls to *clear* the flag since its the timer's final state isn't
known until vmx_vcpu_run(). I.e. the majority of nested VM-Enters end
up unnecessarily writing pin controls *twice*.

Rather than toggle the timer flag in pin controls, set the timer value
itself to the largest allowed value to put it into a "soft disabled"
state, and ignore any spurious preemption timer exits.

Sadly, the timer is a 32-bit value and so theoretically it can fire
before the head death of the universe, i.e. spurious exits are possible.
But because KVM does *not* save the timer value on VM-Exit and because
the timer runs at a slower rate than the TSC, the maximuma timer value
is still sufficiently large for KVM's purposes. E.g. on a modern CPU
with a timer that runs at 1/32 the frequency of a 2.4ghz constant-rate
TSC, the timer will fire after ~55 seconds of *uninterrupted* guest
execution. In other words, spurious VM-Exits are effectively only
possible if the host is completely tickless on the logical CPU, the
guest is not using the preemption timer, and the guest is not generating
VM-Exits for any other reason.

To be safe from bad/weird hardware, disable the preemption timer if its
maximum delay is less than ten seconds. Ten seconds is mostly arbitrary
and was selected in no small part because it's a nice round number.
For simplicity and paranoia, fall back to __kvm_request_immediate_exit()
if the preemption timer is disabled by KVM or userspace. Previously
KVM continued to use the preemption timer to force immediate exits even
when the timer was disabled by userspace. Now that KVM leaves the timer
running instead of truly disabling it, allow userspace to kill it
entirely in the unlikely event the timer (or KVM) malfunctions.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

804939ea

KVM: VMX: Drop hv_timer_armed from 'struct loaded_vmcs' · 9d99cc49

由 Sean Christopherson 提交于 5月 07, 2019

... now that it is fully redundant with the pin controls shadow.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9d99cc49

KVM: nVMX: Preset *DT exiting in vmcs02 when emulating UMIP · 469debdb

由 Sean Christopherson 提交于 5月 07, 2019

KVM dynamically toggles SECONDARY_EXEC_DESC to intercept (a subset of)
instructions that are subject to User-Mode Instruction Prevention, i.e.
VMCS.SECONDARY_EXEC_DESC == CR4.UMIP when emulating UMIP. Preset the
VMCS control when preparing vmcs02 to avoid unnecessarily VMWRITEs,
e.g. KVM will clear VMCS.SECONDARY_EXEC_DESC in prepare_vmcs02_early()
and then set it in vmx_set_cr4().
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

469debdb

KVM: nVMX: Preserve last USE_MSR_BITMAPS when preparing vmcs02 · de0286b7

由 Sean Christopherson 提交于 5月 07, 2019

KVM dynamically toggles the CPU_BASED_USE_MSR_BITMAPS execution control
for nested guests based on whether or not both L0 and L1 want to pass
through the same MSRs to L2.  Preserve the last used value from vmcs02
so as to avoid multiple VMWRITEs to (re)set/(re)clear the bit on nested
VM-Entry.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

de0286b7

KVM: VMX: Explicitly initialize controls shadow at VMCS allocation · 3af80fec

由 Sean Christopherson 提交于 5月 07, 2019

Or: Don't re-initialize vmcs02's controls on every nested VM-Entry.

VMWRITEs to the major VMCS controls are deceptively expensive. Intel
CPUs with VMCS caching (Westmere and later) also optimize away
consistency checks on VM-Entry, i.e. skip consistency checks if the
relevant fields have not changed since the last successful VM-Entry (of
the cached VMCS). Because uops are a precious commodity, uCode's dirty
VMCS field tracking isn't as precise as software would prefer. Notably,
writing any of the major VMCS fields effectively marks the entire VMCS
dirty, i.e. causes the next VM-Entry to perform all consistency checks,
which consumes several hundred cycles.

Zero out the controls' shadow copies during VMCS allocation and use the
optimized setter when "initializing" controls. While this technically
affects both non-nested and nested virtualization, nested virtualization
is the primary beneficiary as avoid VMWRITEs when prepare vmcs02 allows
hardware to optimizie away consistency checks.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3af80fec

KVM: nVMX: Don't reset VMCS controls shadow on VMCS switch · ae81d089

由 Sean Christopherson 提交于 5月 07, 2019

... now that the shadow copies are per-VMCS.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ae81d089

KVM: nVMX: Shadow VMCS controls on a per-VMCS basis · 09e226cf

由 Sean Christopherson 提交于 5月 07, 2019

... to pave the way for not preserving the shadow copies across switches
between vmcs01 and vmcs02, and eventually to avoid VMWRITEs to vmcs02
when the desired value is unchanged across nested VM-Enters.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

09e226cf

KVM: VMX: Shadow VMCS secondary execution controls · fe7f895d

由 Sean Christopherson 提交于 5月 07, 2019

Prepare to shadow all major control fields on a per-VMCS basis, which
allows KVM to avoid costly VMWRITEs when switching between vmcs01 and
vmcs02.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fe7f895d

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功