Commits · d7ee039e2bab6341cdabf42d3036063cf91b40ea · openanolis / cloud-kernel

06 Aug, 2018 40 commits

KVM: vmx: move struct host_state usage to struct loaded_vmcs · d7ee039e

Authored by Sean Christopherson on Jul 23, 2018

Make host_state a property of a loaded_vmcs so that it can be
used as a cache of the VMCS fields, e.g. to lazily VMWRITE the
corresponding VMCS field.  Treating host_state as a cache does
not work if it's not VMCS specific as the cache would become
incoherent when switching between vmcs01 and vmcs02.

Move vmcs_host_cr3 and vmcs_host_cr4 into host_state.

Explicitly zero out host_state when allocating a new VMCS for a
loaded_vmcs.  Unlike the pre-existing vmcs_host_cr{3,4} usage,
the segment information is not guaranteed to be (re)initialized
when running a new nested VMCS, e.g. HOST_FS_BASE is not written
in vmx_set_constant_host_state().
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
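
A minimal sketch of the per-VMCS cache idea described in this commit, assuming simplified stand-in types. The names `loaded_vmcs_sketch`, `host_state_sketch` and `set_host_cr3_lazy()` are hypothetical, and the `vmwrites` counter merely stands in for a real VMWRITE. Because the cache lives inside each VMCS wrapper, switching between two of them (think vmcs01 and vmcs02) cannot leave a stale cached value behind.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
struct host_state_sketch {
    uint64_t cr3;                 /* last value written to the simulated HOST_CR3 */
};

struct loaded_vmcs_sketch {
    struct host_state_sketch host_state; /* cache owned by this VMCS */
    unsigned int vmwrites;               /* counts simulated VMWRITEs */
};

/* A new VMCS starts with a zeroed cache, mirroring the explicit zeroing above. */
static void alloc_loaded_vmcs_sketch(struct loaded_vmcs_sketch *v)
{
    memset(v, 0, sizeof(*v));
}

/* Write the field only when the cached value for this VMCS is stale. */
static void set_host_cr3_lazy(struct loaded_vmcs_sketch *v, uint64_t cr3)
{
    if (v->host_state.cr3 == cr3)
        return;                   /* cache hit: skip the write */
    v->host_state.cr3 = cr3;
    v->vmwrites++;                /* stands in for the actual VMWRITE */
}
```

Writing the same value twice to one `loaded_vmcs_sketch` costs one simulated VMWRITE, while a second VMCS still performs its own first write.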

KVM: vmx: compute need to reload FS/GS/LDT on demand · e920de85

Authored by Sean Christopherson on Jul 23, 2018

Remove fs_reload_needed and gs_ldt_reload_needed from host_state
and instead compute whether we need to reload various state at
the time we actually do the reload.  The state that is tracked
by the *_reload_needed variables is not any more volatile than
the trackers themselves.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
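
As a rough illustration of computing the reload decision on demand, the sketch below derives it from the saved selectors instead of from separate flags. The struct and function names are hypothetical, and the exact condition (low three selector bits set, or a non-zero LDT selector) is an assumption about what the removed trackers encoded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the host selectors saved before entering the guest. */
struct host_segs_sketch {
    uint16_t fs_sel;
    uint16_t gs_sel;
    uint16_t ldt_sel;
};

/*
 * Assumption: a selector with any of its low three bits (RPL/TI) set cannot
 * be restored through the VMCS host-state area and must be reloaded by hand.
 */
static bool fs_needs_reload(const struct host_segs_sketch *h)
{
    return (h->fs_sel & 7) != 0;
}

/* GS and the LDT are reloaded together in this sketch. */
static bool gs_ldt_needs_reload(const struct host_segs_sketch *h)
{
    return h->ldt_sel != 0 || (h->gs_sel & 7) != 0;
}
```

Since the answer is a pure function of the saved selectors, there is no second piece of state that could drift out of sync with them.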

KVM: nVMX: remove a misleading comment regarding vmcs02 fields · fd1ec772

Authored by Sean Christopherson on Jul 23, 2018

prepare_vmcs02() has an odd comment that says certain fields are
"not in vmcs02".  AFAICT the intent of the comment is to document
that various VMCS fields are not handled by prepare_vmcs02(),
e.g. HOST_{FS,GS}_{BASE,SELECTOR}.  While technically true, the
comment is misleading, e.g. it can lead the reader to think that
KVM never writes those fields to vmcs02.

Remove the comment altogether as the handling of FS and GS is
not specific to nested VMX, and GUEST_PML_INDEX has been written
by prepare_vmcs02() since commit "4e59516a (kvm: vmx: ensure
VMCS is current while enabling PML)"
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: vmx: rename __vmx_load_host_state() and vmx_save_host_state() · 6d6095bd

Authored by Sean Christopherson on Jul 23, 2018

Now that the vmx_load_host_state() wrapper is gone, i.e. the only
time we call the core functions is when we're actually about to
switch between guest/host, rename the functions that handle lazy
state switching to vmx_prepare_switch_to_{guest,host}_state() to
better document the full extent of their functionality.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: vmx: add dedicated utility to access guest's kernel_gs_base · 678e315e

Authored by Sean Christopherson on Jul 23, 2018

When lazy save/restore of MSR_KERNEL_GS_BASE was introduced[1], the
MSR was intercepted in all modes and was only restored for the host
when the guest is in 64-bit mode.  So at the time, going through the
full host restore prior to accessing MSR_KERNEL_GS_BASE was necessary
to load host state and was not a significant waste of cycles.

Later, MSR_KERNEL_GS_BASE interception was disabled for a 64-bit
guest[2], and then unconditionally saved/restored for the host[3].
As a result, loading full host state is overkill for accesses to
MSR_KERNEL_GS_BASE, and completely unnecessary when the guest is
not in 64-bit mode.

Add a dedicated utility to read/write the guest's MSR_KERNEL_GS_BASE
(outside of the save/restore flow) to minimize the overhead incurred
when accessing the MSR.  When setting EFER, only decache the MSR if
the new EFER will disable long mode.

Removing out-of-band usage of vmx_load_host_state() also eliminates,
or at least reduces, potential corner cases in its usage, which in
turn will (hopefully) make it easier to reason about future changes
to the save/restore flow, e.g. optimization of saving host state.

[1] commit 44ea2b17 ("KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx
    autoload msr area")
[2] commit 5897297b ("KVM: VMX: Don't intercept MSR_KERNEL_GS_BASE")
[3] commit c8770e7b ("KVM: VMX: Fix host userspace gsbase corruption")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: vmx: track host_state.loaded using a loaded_vmcs pointer · bd9966de

Authored by Sean Christopherson on Jul 23, 2018

Using 'struct loaded_vmcs*' to track whether the CPU registers
contain host or guest state kills two birds with one stone.

  1. The (effective) boolean host_state.loaded is poorly named.
     It does not track whether or not host state is loaded into
     the CPU registers (which most readers would expect), but
     rather tracks if host state has been saved AND guest state
     is loaded.

  2. Using a loaded_vmcs pointer provides a more robust framework
     for the optimized guest/host state switching, especially when
     considering per-VMCS enhancements.  To that end, WARN_ONCE
     if we try to switch to host state with a different VMCS than
     was last used to save host state.

Resolve an occurrence of the new WARN by setting loaded_vmcs after
the call to vmx_vcpu_put() in vmx_switch_vmcs().
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
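
The pointer-based tracking described in this commit can be sketched roughly as follows. All type, field and function names here are hypothetical simplifications, and the `warnings` counter stands in for WARN_ONCE. A non-NULL `loaded_cpu_state` means guest state is in the CPU registers, and it also records which VMCS was used when host state was saved.

```c
#include <stddef.h>

struct loaded_vmcs_sketch { int id; };

struct vcpu_sketch {
    struct loaded_vmcs_sketch *loaded_vmcs;      /* VMCS currently in use */
    struct loaded_vmcs_sketch *loaded_cpu_state; /* non-NULL while guest state is loaded */
    unsigned int warnings;                       /* stands in for WARN_ONCE */
};

static void prepare_switch_to_guest(struct vcpu_sketch *v)
{
    if (v->loaded_cpu_state)
        return;                       /* guest state already loaded */
    /* ...host state would be saved here... */
    v->loaded_cpu_state = v->loaded_vmcs;
}

static void prepare_switch_to_host(struct vcpu_sketch *v)
{
    if (!v->loaded_cpu_state)
        return;                       /* host state already loaded */
    if (v->loaded_cpu_state != v->loaded_vmcs)
        v->warnings++;                /* VMCS changed while guest state was loaded */
    /* ...host state would be restored here... */
    v->loaded_cpu_state = NULL;
}
```

In this model, changing `loaded_vmcs` only after the switch back to host state keeps the warning silent, which is the ordering fix the last paragraph of the commit message describes.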

KVM: vmx: refactor segmentation code in vmx_save_host_state() · e368b875

Authored by Sean Christopherson on Jul 23, 2018

Use local variables in vmx_save_host_state() to temporarily track
the selector and base values for FS and GS, and reorganize the
code so that the 64-bit vs 32-bit portions are contained within
a single #ifdef.  This refactoring paves the way for future patches
to modify the updating of VMCS state with minimal changes to the
code, and (hopefully) simplifies resolving a likely conflict with
another in-flight patch[1] by being the whipping boy for future
patches.

[1] https://www.spinics.net/lists/kvm/msg171647.html
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: nVMX: Fix fault priority for VMX operations · e49fcb8b

Authored by Jim Mattson on Jul 27, 2018

When checking emulated VMX instructions for faults, the #UD for "IF
(not in VMX operation)" should take precedence over the #GP for "ELSIF
CPL > 0."
Suggested-by: Eric Northup <digitaleric@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
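
The ordering described here can be shown with a tiny sketch. The enum and function names are hypothetical and the two inputs are simplified to a flag and a privilege level; the only point is which check comes first.

```c
#include <stdbool.h>

enum vmx_fault_sketch { FAULT_NONE, FAULT_UD, FAULT_GP };

/* Returns the fault an emulated VMX instruction would raise in this model. */
static enum vmx_fault_sketch check_vmx_insn_sketch(bool in_vmx_operation, int cpl)
{
    if (!in_vmx_operation)
        return FAULT_UD;   /* checked first, so it takes precedence */
    if (cpl > 0)
        return FAULT_GP;   /* privilege level violation */
    return FAULT_NONE;
}
```

With this order, an instruction issued at CPL 3 outside VMX operation yields #UD rather than #GP.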

kvm: nVMX: Fix fault vector for VMX operation at CPL > 0 · 36090bf4

Authored by Jim Mattson on Jul 27, 2018

The fault that should be raised for a privilege level violation is #GP
rather than #UD.

Fixes: 727ba748 ("kvm: nVMX: Enforce cpl=0 for VMX instructions")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: try __get_user_pages_fast even if not in atomic context · b9b33da2

Authored by Paolo Bonzini on Jul 27, 2018

We are currently cutting hva_to_pfn_fast short if we do not want an
immediate exit, which is represented by !async && !atomic.  However,
this is unnecessary, and __get_user_pages_fast is *much* faster
because the regular get_user_pages takes pmd_lock/pte_lock.
In fact, when many CPUs take a nested vmexit at the same time
the contention on those locks is visible, and this patch removes
about 25% (compared to 4.18) from vmexit.flat on a 16 vCPU
nested guest.
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: vmx: Add tlb_remote_flush callback support · 877ad952

Authored by Tianyu Lan on Jul 19, 2018

Register the tlb_remote_flush callback for VMX when the Hyper-V
capability for nested guest mapping flush is detected. The interface
helps reduce the overhead of flushing EPT tables among vCPUs for a
nested VM. The traditional way is to send IPIs to all affected vCPUs
and execute INVEPT on each vCPU, which triggers several VM exits for
IPI and INVEPT emulation. Hyper-V provides a hypercall that performs
the flush for all vCPUs; the hypercall is used when all EPT table
pointers of a single VM are the same.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
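
A minimal sketch of the decision this commit describes: use the single hypercall only when every vCPU shares one EPT pointer, and fall back to the per-vCPU path otherwise. The function and enum names are hypothetical, and the EPT pointers are modelled as a plain array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum flush_method_sketch { FLUSH_BY_HYPERCALL, FLUSH_BY_IPI };

/* True when every vCPU in the array uses the same EPT pointer. */
static bool ept_pointers_match(const uint64_t *eptp, size_t nr_vcpus)
{
    for (size_t i = 1; i < nr_vcpus; i++)
        if (eptp[i] != eptp[0])
            return false;
    return true;
}

/* Pick the hypercall only if it is available and the pointers agree. */
static enum flush_method_sketch pick_flush_method(bool hv_flush_available,
                                                  const uint64_t *eptp,
                                                  size_t nr_vcpus)
{
    if (hv_flush_available && ept_pointers_match(eptp, nr_vcpus))
        return FLUSH_BY_HYPERCALL;
    return FLUSH_BY_IPI;
}
```

The fallback keeps the behaviour correct when vCPUs run with different EPT pointers, at the cost of the IPI and INVEPT overhead the commit sets out to avoid.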

KVM: x86: Add tlb remote flush callback in kvm_x86_ops. · b08660e5

Authored by Tianyu Lan on Jul 19, 2018

Provide a way for platforms to register an hv tlb remote flush
callback; this helps optimize TLB flushing among vCPUs in the
nested virtualization case.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

X86/Hyper-V: Add hyperv_nested_flush_guest_mapping ftrace support · 60cfce4c

Authored by Tianyu Lan on Jul 19, 2018

Add hyperv_nested_flush_guest_mapping ftrace support to trace the
HvFlushGuestPhysicalAddressSpace hypercall.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

X86/Hyper-V: Add flush HvFlushGuestPhysicalAddressSpace hypercall support · eb914cfe

Authored by Tianyu Lan on Jul 19, 2018

Hyper-V supports a PV hypercall, HvFlushGuestPhysicalAddressSpace, that
flushes the nested VM address space mapping in the L1 hypervisor in
order to reduce the overhead of flushing the EPT TLB among vCPUs. This
patch implements support for it.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

x86/kvm: Don't use pvqspinlock code if only 1 vCPU · 3553ae56

Authored by Waiman Long on Jul 17, 2018

On a VM with only 1 vCPU, the locking fast path will always be
successful. In this case, there is no need to use the PV qspinlock
code which has higher overhead on the unlock side than the native
qspinlock code.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM/MMU: Simplify __kvm_sync_page() function · 450917b6

Authored by Tianyu Lan on Jul 18, 2018

Merge the checks of "sp->role.cr4_pae != !!is_pae(vcpu)" and
"vcpu->arch.mmu.sync_page(vcpu, sp) == 0". kvm_mmu_prepare_zap_page()
is called under both of these conditions.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
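
The merged check can be pictured with the following sketch, which uses a hypothetical, heavily simplified shadow-page struct. The `sync_result` field stands in for the return value of the sync_page() callback and the `zapped` flag stands in for the call to kvm_mmu_prepare_zap_page().

```c
#include <stdbool.h>

/* Hypothetical simplified shadow page; not the kernel's struct kvm_mmu_page. */
struct sp_sketch {
    bool cr4_pae;     /* role bit recorded when the page was created */
    int sync_result;  /* what the sync_page() callback would return */
    bool zapped;      /* set when the page is queued for zapping */
};

/* Both failure conditions share one branch, since both lead to the same zap. */
static bool sync_page_check_sketch(struct sp_sketch *sp, bool vcpu_is_pae)
{
    if (sp->cr4_pae != vcpu_is_pae || sp->sync_result == 0) {
        sp->zapped = true;
        return false;
    }
    return true;
}
```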

kvm: x86: Remove CR3_PCID_INVD flag · 208320ba

Authored by Junaid Shahid on Jun 27, 2018

It is a duplicate of X86_CR3_PCID_NOFLUSH. So just use that instead.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: Add multi-entry LRU cache for previous CR3s · b94742c9

Authored by Junaid Shahid on Jun 27, 2018

Adds support for storing multiple previous CR3/root_hpa pairs maintained
as an LRU cache, so that the lockless CR3 switch path can be used when
switching back to any of them.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
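
One way to picture a small LRU cache of previous CR3/root_hpa pairs is the sketch below. It is an illustration under stated assumptions rather than the patch's code: the types, the cache size `NUM_PREV_ROOTS`, the `INVALID_HPA` marker and `switch_root_sketch()` are all hypothetical. Each lookup pushes the outgoing root to the front and shifts older entries back until a match is found.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PREV_ROOTS 3            /* hypothetical cache size */
#define INVALID_HPA    UINT64_MAX   /* marks an empty slot or missing root */

struct root_sketch { uint64_t cr3; uint64_t hpa; };

struct mmu_sketch {
    struct root_sketch cur;                  /* active root */
    struct root_sketch prev[NUM_PREV_ROOTS]; /* most recently used first */
};

/*
 * Try to reuse a cached root for new_cr3. Returns true on a hit. On a miss
 * the oldest entry is dropped and cur.hpa is left invalid, so the caller
 * would have to build a new root on the slow path.
 */
static bool switch_root_sketch(struct mmu_sketch *mmu, uint64_t new_cr3)
{
    struct root_sketch root = mmu->cur;
    int i;

    for (i = 0; i < NUM_PREV_ROOTS; i++) {
        struct root_sketch tmp = mmu->prev[i];

        mmu->prev[i] = root;  /* shift the newer entry into this slot */
        root = tmp;
        if (root.cr3 == new_cr3 && root.hpa != INVALID_HPA)
            break;            /* hit: root now holds the cached pair */
    }

    if (i < NUM_PREV_ROOTS) {
        mmu->cur = root;
        return true;
    }
    mmu->cur.cr3 = new_cr3;
    mmu->cur.hpa = INVALID_HPA;
    return false;
}
```

Switching from root A to B and back to A misses once (B is new) and then hits, with B left at the front of the cache.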

kvm: x86: Flush only affected TLB entries in kvm_mmu_invlpg* · faff8758

Authored by Junaid Shahid on Jun 29, 2018

This needs a minor bug fix. The updated patch is as follows.

Thanks,
Junaid

------------------------------------------------------------------------------

kvm_mmu_invlpg() and kvm_mmu_invpcid_gva() only need to flush the TLB
entries for the specific guest virtual address, instead of flushing all
TLB entries associated with the VM.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: Skip shadow page resync on CR3 switch when indicated by guest · 956bf353

Authored by Junaid Shahid on Jun 27, 2018

When the guest indicates that the TLB doesn't need to be flushed in a
CR3 switch, we can also skip resyncing the shadow page tables since an
out-of-sync shadow page table is equivalent to an out-of-sync TLB.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: Support selectively freeing either current or previous MMU root · 08fb59d8

Authored by Junaid Shahid on Jun 27, 2018

kvm_mmu_free_roots() now takes a mask specifying which roots to free, so
that either one of the roots (active/previous) can be individually freed
when needed.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: Add a root_hpa parameter to kvm_mmu->invlpg() · 7eb77e9f

Authored by Junaid Shahid on Jun 27, 2018

This allows invlpg() to be called using either the active root_hpa
or the prev_root_hpa.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: Skip TLB flush on fast CR3 switch when indicated by guest · ade61e28

Authored by Junaid Shahid on Jun 27, 2018

When PCIDs are enabled, the most significant bit (MSb) of the source
operand for a MOV-to-CR3 instruction indicates that the TLB doesn't
need to be flushed.

This change enables this optimization for MOV-to-CR3s in the guest
that have been intercepted by KVM for shadow paging and are handled
within the fast CR3 switch path.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
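
A small sketch of how the no-flush hint might be separated from the CR3 value. The macro and function names are hypothetical; bit 63 is the most significant bit of the 64-bit source operand, and in this model the bit is only interpreted and stripped when PCIDs are enabled.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR3_NOFLUSH_BIT (1ULL << 63) /* MSb of the MOV-to-CR3 source operand */

/*
 * Returns the CR3 value to load and reports through skip_tlb_flush whether
 * the guest asked to keep its TLB entries.
 */
static uint64_t decode_cr3_operand(uint64_t operand, bool pcid_enabled,
                                   bool *skip_tlb_flush)
{
    *skip_tlb_flush = false;
    if (pcid_enabled) {
        *skip_tlb_flush = (operand & CR3_NOFLUSH_BIT) != 0;
        operand &= ~CR3_NOFLUSH_BIT; /* the hint is not part of the address */
    }
    return operand;
}
```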

kvm: vmx: Support INVPCID in shadow paging mode · eb4b248e

Authored by Junaid Shahid on Jun 27, 2018

Implement support for INVPCID in shadow paging mode as well.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: Propagate guest PCIDs to host PCIDs · c9470a2e

Authored by Junaid Shahid on Jun 27, 2018

When using shadow paging mode, propagate the guest's PCID value to
the shadow CR3 in the host instead of always using PCID 0.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
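
Propagating the PCID can be sketched as combining the shadow root address with the low 12 bits of the guest CR3, which hold the PCID when PCIDs are enabled. The names below are hypothetical, and the sketch assumes `root_hpa` is page aligned so its low 12 bits are zero.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR3_PCID_MASK 0xFFFULL /* PCID occupies bits 11:0 of CR3 */

/* Build the CR3 value for the shadow root, carrying over the guest's PCID. */
static uint64_t make_shadow_cr3(uint64_t root_hpa, uint64_t guest_cr3,
                                bool pcid_enabled)
{
    uint64_t pcid = pcid_enabled ? (guest_cr3 & CR3_PCID_MASK) : 0;

    return root_hpa | pcid;
}
```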

kvm: x86: Add ability to skip TLB flush when switching CR3 · afe828d1

Authored by Junaid Shahid on Jun 27, 2018

Remove the implicit flush from the set_cr3 handlers, so that the
callers are able to decide whether to flush the TLB or not.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: Use fast CR3 switch for nested VMX · 50c28f21

Authored by Junaid Shahid on Jun 27, 2018

Use the fast CR3 switch mechanism to locklessly change the MMU root
page when switching between L1 and L2. The switch from L2 to L1 should
always go through the fast path, while the switch from L1 to L2 should
go through the fast path if L1's CR3/EPTP for L2 hasn't changed
since the last time.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: Support resetting the MMU context without resetting roots · 1c53da3f

Authored by Junaid Shahid on Jun 27, 2018

This adds support for re-initializing the MMU context in a different
mode while preserving the active root_hpa and the prev_root.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: Add support for fast CR3 switch across different MMU modes · 0aab33e4

Authored by Junaid Shahid on Jun 27, 2018

This generalizes the lockless CR3 switch path to be able to work
across different MMU modes (e.g. nested vs non-nested) by checking
that the expected page role of the new root page matches the page role
of the previously stored root page in addition to checking that the new
CR3 matches the previous CR3. Furthermore, instead of loading the
hardware CR3 in fast_cr3_switch(), it is now done in vcpu_enter_guest(),
as by that time the MMU context would be up-to-date with the VCPU mode.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: Introduce KVM_REQ_LOAD_CR3 · 6e42782f

Authored by Junaid Shahid on Jun 27, 2018

The KVM_REQ_LOAD_CR3 request loads the hardware CR3 using the
current root_hpa.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: Introduce kvm_mmu_calc_root_page_role() · 9fa72119

Authored by Junaid Shahid on Jun 27, 2018

These functions factor out the base role calculation from the
corresponding kvm_init_*_mmu() functions. The new functions return
what would be the role assigned to a root page in the current VCPU
state. This can be masked with mmu_base_role_mask to derive the base
role.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: Add fast CR3 switch code path · 7c390d35

Authored by Junaid Shahid on Jun 27, 2018

When using shadow paging, a CR3 switch in the guest results in a VM Exit.
In the common case, that VM exit doesn't require much processing by KVM.
However, it does acquire the MMU lock, which can start showing signs of
contention under some workloads even on a 2 VCPU VM when the guest is
using KPTI. Therefore, we add a fast path that avoids acquiring the MMU
lock in the most common cases e.g. when switching back and forth between
the kernel and user mode CR3s used by KPTI with no guest page table
changes in between.

For now, this fast path is implemented only for 64-bit guests and hosts
to avoid the handling of PDPTEs, but it can be extended later to 32-bit
guests and/or hosts as well.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: Avoid taking MMU lock in kvm_mmu_sync_roots if no sync is needed · 578e1c4d

Authored by Junaid Shahid on Jun 27, 2018

kvm_mmu_sync_roots() can locklessly check whether a sync is needed and just
bail out if it isn't.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
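
The check-before-lock pattern can be sketched with C11 atomics as below. This is an illustration with hypothetical names, not the kernel's implementation: the `lock_acquisitions` counter stands in for taking the MMU lock, and acquire loads are used on the assumption that the flags are published by another thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct root_page_sketch {
    atomic_bool unsync;             /* page itself is out of sync */
    atomic_bool unsync_children;    /* some child page is out of sync */
    unsigned int lock_acquisitions; /* counts simulated lock acquisitions */
};

static void root_page_init(struct root_page_sketch *r, bool unsync,
                           bool unsync_children)
{
    atomic_init(&r->unsync, unsync);
    atomic_init(&r->unsync_children, unsync_children);
    r->lock_acquisitions = 0;
}

static void sync_roots_sketch(struct root_page_sketch *r)
{
    /* Lockless check: bail out early when nothing needs syncing. */
    if (!atomic_load_explicit(&r->unsync, memory_order_acquire) &&
        !atomic_load_explicit(&r->unsync_children, memory_order_acquire))
        return;

    r->lock_acquisitions++;         /* stands in for taking the MMU lock */
    atomic_store(&r->unsync, false);
    atomic_store(&r->unsync_children, false);
}
```

Calling `sync_roots_sketch()` on a root that is already in sync leaves the counter untouched, which is the lock traffic the commit avoids.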

kvm: x86: Make sync_page() flush remote TLBs once only · 5ce4786f

Authored by Junaid Shahid on Jun 27, 2018

sync_page() calls set_spte() from a loop across a page table. It would
work better if set_spte() left the TLB flushing to its callers, so that
sync_page() can aggregate into a single call.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
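
The aggregation idea can be sketched as follows: the per-entry helper only reports whether a flush is needed, and the loop issues at most one flush at the end. The function names and the flush condition (an existing non-zero entry being changed) are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Update one entry and report whether the change requires a TLB flush. */
static bool set_spte_sketch(uint64_t *spte, uint64_t new_spte)
{
    bool need_flush = (*spte != 0 && *spte != new_spte);

    *spte = new_spte;
    return need_flush;
}

/* Returns the number of flushes issued for the whole table: 0 or 1. */
static unsigned int sync_page_sketch(uint64_t *sptes, const uint64_t *new_sptes,
                                     size_t n)
{
    bool flush = false;

    for (size_t i = 0; i < n; i++)
        flush |= set_spte_sketch(&sptes[i], new_sptes[i]);

    return flush ? 1 : 0; /* one aggregated flush instead of one per entry */
}
```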

KVM: MMU: drop vcpu param in gpte_access · 42522d08

Authored by Peter Xu on Jul 18, 2018

It's never used.  Drop it.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: Separate logic allocating shadow vmcs to a function · abfc52c6

Authored by Liran Alon on Jun 23, 2018

No functionality change.
This is done as a preparation for VMCS shadowing virtualization.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: VMX: Mark vmcs header as shadow in case alloc_vmcs_cpu() allocate shadow vmcs · 491a6038

Authored by Liran Alon on Jun 23, 2018

No functionality change.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: Expose VMCS shadowing to L1 guest · 32c7acf0

Authored by Liran Alon on Jun 23, 2018

Expose VMCS shadowing to L1 as a VMX capability of the virtual CPU,
whether or not VMCS shadowing is supported by the physical CPU.
(VMCS shadowing emulation)

Shadowed VMREADs and VMWRITEs from L2 are handled by L0, without a
VM-exit to L1.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by... · a7cde481

Authored by Liran Alon on Jun 23, 2018

KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps

This is done as a preparation for VMCS shadowing emulation.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
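
A sketch of how a bitmap lookup could decide whether a VMREAD or VMWRITE exit is forwarded to L1. The function name is hypothetical and the bitmap is modelled as an in-memory 4 KiB array. The sketch assumes that the low 15 bits of the field encoding index the bitmap, that a set bit means the access is reflected to L1, and that encodings outside that range always are.

```c
#include <stdbool.h>
#include <stdint.h>

#define VMCS_BITMAP_BYTES 4096 /* 4 KiB bitmap, one bit per field encoding */

/* True when the access to this field should cause an exit to L1. */
static bool access_exits_to_l1(const uint8_t bitmap[VMCS_BITMAP_BYTES],
                               uint64_t field)
{
    if (field >> 15)
        return true;  /* out of range for the bitmap: always forwarded */

    return (bitmap[field / 8] >> (field & 7)) & 1;
}
```

A clear bit means the access can be handled without involving L1, which is the case the commit title refers to.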

KVM: nVMX: vmread/vmwrite: Use shadow vmcs12 if running L2 · 6d894f49

Authored by Liran Alon on Jun 23, 2018

This is done as a preparation for VMCS shadowing emulation.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
