Commits · 50c28f21d045dde8c52548f8482d456b3f0956f5 · openanolis / cloud-kernel

06 Aug, 2018 18 commits

kvm: x86: Use fast CR3 switch for nested VMX · 50c28f21

Committed by Junaid Shahid on Jun 27, 2018

Use the fast CR3 switch mechanism to locklessly change the MMU root
page when switching between L1 and L2. The switch from L2 to L1 should
always go through the fast path, while the switch from L1 to L2 should
go through the fast path if L1's CR3/EPTP for L2 hasn't changed
since the last time.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

50c28f21
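
As a rough illustration of the fast-path idea described in this commit message (not the kernel's actual code), the switch can be modelled as a one-entry cache of the previously used root: if the requested CR3/EPTP matches the cached entry, the two roots are swapped; otherwise a new root has to be built. All names here (struct demo_root, struct demo_mmu, demo_fast_cr3_switch) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an MMU root: not a kernel structure. */
struct demo_root {
    uint64_t cr3;   /* guest CR3/EPTP value the root was built for */
    uint64_t hpa;   /* address of the root page; 0 means "not built" */
};

struct demo_mmu {
    struct demo_root cur;   /* root currently in use */
    struct demo_root prev;  /* single cached previous root */
};

/*
 * Returns true when the switch is served from the cached root (fast path).
 * On a miss, the root being left is remembered and the caller is expected
 * to build a new root for new_cr3 (slow path).
 */
bool demo_fast_cr3_switch(struct demo_mmu *mmu, uint64_t new_cr3)
{
    struct demo_root old = mmu->cur;

    if (mmu->prev.hpa != 0 && mmu->prev.cr3 == new_cr3) {
        mmu->cur = mmu->prev;   /* reuse the cached root */
        mmu->prev = old;
        return true;
    }

    mmu->prev = old;            /* remember the root we are leaving */
    mmu->cur.cr3 = new_cr3;
    mmu->cur.hpa = 0;           /* must be rebuilt by the slow path */
    return false;
}
```

In this model, switching L1 -> L2 -> L1 -> L2 with an unchanged L2 root misses only on the first transition; every later transition hits the cache until L1 changes the CR3/EPTP it uses for L2.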

KVM: nVMX: Separate logic allocating shadow vmcs to a function · abfc52c6

Committed by Liran Alon on Jun 23, 2018

No functionality change.
This is done as a preparation for VMCS shadowing virtualization.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

abfc52c6

KVM: VMX: Mark vmcs header as shadow in case alloc_vmcs_cpu() allocate shadow vmcs · 491a6038

Committed by Liran Alon on Jun 23, 2018

No functionality change.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

491a6038

KVM: nVMX: Expose VMCS shadowing to L1 guest · 32c7acf0

Committed by Liran Alon on Jun 23, 2018

Expose VMCS shadowing to L1 as a VMX capability of the virtual CPU,
whether or not VMCS shadowing is supported by the physical CPU.
(VMCS shadowing emulation)

Shadowed VMREADs and VMWRITEs from L2 are handled by L0, without a
VM-exit to L1.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

32c7acf0

KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by... · a7cde481

Committed by Liran Alon on Jun 23, 2018

KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps

This is done as a preparation for VMCS shadowing emulation.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a7cde481

KVM: nVMX: vmread/vmwrite: Use shadow vmcs12 if running L2 · 6d894f49

Committed by Liran Alon on Jun 23, 2018

This is done as a preparation for VMCS shadowing emulation.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6d894f49

KVM: nVMX: include shadow vmcs12 in nested state · fa58a9fa

Committed by Paolo Bonzini on Jul 18, 2018

The shadow vmcs12 cannot be flushed on KVM_GET_NESTED_STATE,
because at that point guest memory is assumed by userspace to
be immutable.  Capture the cache in vmx_get_nested_state, adding
another page at the end if there is an active shadow vmcs12.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fa58a9fa

KVM: nVMX: Cache shadow vmcs12 on VMEntry and flush to memory on VMExit · 61ada748

Committed by Liran Alon on Jun 23, 2018

This is done as a preparation for VMCS shadowing emulation.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

61ada748

KVM: nVMX: Verify VMCS shadowing VMCS link pointer · f145d90d

Committed by Liran Alon on Jun 23, 2018

Intel SDM considers these checks to be part of
"Checks on Guest Non-Register State".

Note that it is legal for vmcs->vmcs_link_pointer to be -1ull
when VMCS shadowing is enabled. In this case, any VMREAD/VMWRITE to
a shadowed field sets the ALU flags for VMfailInvalid (i.e. CF=1).
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f145d90d
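
To make the "ALU flags for VMfailInvalid" remark concrete, here is a small user-space sketch of the flag convention: VMfailInvalid sets CF and clears the other five arithmetic flags, while VMsucceed clears all six. The bit positions are the architectural RFLAGS positions; the helper names (demo_vmfail_invalid, demo_vmsucceed, demo_insn_failed_invalid) are hypothetical and are not KVM functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural RFLAGS bit positions. */
#define X86_EFLAGS_CF (1ULL << 0)
#define X86_EFLAGS_PF (1ULL << 2)
#define X86_EFLAGS_AF (1ULL << 4)
#define X86_EFLAGS_ZF (1ULL << 6)
#define X86_EFLAGS_SF (1ULL << 7)
#define X86_EFLAGS_OF (1ULL << 11)

#define DEMO_ARITH_FLAGS (X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF | \
                          X86_EFLAGS_ZF | X86_EFLAGS_SF | X86_EFLAGS_OF)

/* VMfailInvalid: CF = 1, the other arithmetic flags are cleared. */
uint64_t demo_vmfail_invalid(uint64_t rflags)
{
    return (rflags & ~DEMO_ARITH_FLAGS) | X86_EFLAGS_CF;
}

/* VMsucceed: all six arithmetic flags are cleared. */
uint64_t demo_vmsucceed(uint64_t rflags)
{
    return rflags & ~DEMO_ARITH_FLAGS;
}

/* What a caller has to test after a VMX instruction to notice VMfailInvalid. */
bool demo_insn_failed_invalid(uint64_t rflags)
{
    return (rflags & X86_EFLAGS_CF) != 0;
}
```

A caller that never tests CF after a VMX instruction cannot tell the two outcomes apart.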

KVM: nVMX: Verify VMCS shadowing controls · a8a7c02b

Committed by Liran Alon on Jun 23, 2018

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a8a7c02b

KVM: nVMX: Introduce nested_cpu_has_shadow_vmcs() · f792d274

Committed by Liran Alon on Jun 23, 2018

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f792d274

KVM: nVMX: Fail VMLAUNCH and VMRESUME on shadow VMCS · a6192d40

Committed by Liran Alon on Jun 23, 2018

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a6192d40

KVM: nVMX: Allow VMPTRLD for shadow VMCS if vCPU supports VMCS shadowing · fa97d7db

Committed by Liran Alon on Jul 18, 2018

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fa97d7db

KVM: VMX: Change vmcs12_{read,write}_any() to receive vmcs12 as parameter · e2536742

Committed by Liran Alon on Jun 23, 2018

No functionality change.
This is done as a preparation for VMCS shadowing emulation.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e2536742

KVM: VMX: Create struct for VMCS header · 392b2f25

Committed by Liran Alon on Jun 23, 2018

No functionality change.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

392b2f25

kvm: nVMX: Introduce KVM_CAP_NESTED_STATE · 8fcc4b59

Committed by Jim Mattson on Jul 10, 2018

For nested virtualization, L0 KVM manages a bit of state for L2 guests
that cannot be captured through the currently available IOCTLs. In
fact, the state captured through all of these IOCTLs is usually a mix of L1
and L2 state. It also depends on whether the L2 guest was running at
the moment when the process was interrupted to save its state.

With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
that is in VMX operation.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jim Mattson <jmattson@google.com>
[karahmed@ - rename structs and functions and make them ready for AMD and
             address previous comments.
           - handle nested.smm state.
           - rebase & a bit of refactoring.
           - Merge 7/8 and 8/8 into one patch. ]
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8fcc4b59

KVM: x86: do not load vmcs12 pages while still in SMM · 7f7f1ba3

Committed by Paolo Bonzini on Jul 18, 2018

If the vCPU enters system management mode while running a nested guest,
RSM starts processing the vmentry while still in SMM.  In that case,
however, the pages pointed to by the vmcs12 might be incorrectly
loaded from SMRAM.  To avoid this, delay the handling of the pages
until just before the next vmentry.  This is done with a new request
and a new entry in kvm_x86_ops, which we will be able to reuse for
nested VMX state migration.

Extracted from a patch by Jim Mattson and KarimAllah Ahmed.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7f7f1ba3

KVM: vmx: remove save/restore of host BNDCFGS MSR · cf81a7e5

Committed by Sean Christopherson on Jul 11, 2018

Linux does not support Memory Protection Extensions (MPX) in the
kernel itself, thus the BNDCFGS (Bound Config Supervisor) MSR will
always be zero in the KVM host, i.e. RDMSR in vmx_save_host_state()
is superfluous.  KVM unconditionally sets VM_EXIT_CLEAR_BNDCFGS,
i.e. BNDCFGS will always be zero after VMEXIT, thus manually loading
BNDCFGS is also superfluous.

And in the event the MPX kernel support is added (unlikely given
that MPX for userspace is in its death throes[1]), BNDCFGS will
likely be common across all CPUs[2], and at the least shouldn't
change on a regular basis, i.e. saving the MSR on every VMENTRY is
completely unnecessary.

WARN_ONCE in hardware_setup() if the host's BNDCFGS is non-zero to
document that KVM does not preserve BNDCFGS and to serve as a hint
as to how BNDCFGS likely should be handled if MPX is used in the
kernel, e.g. BNDCFGS should be saved once during KVM setup.

[1] https://lkml.org/lkml/2018/4/27/1046
[2] http://www.openwall.com/lists/kernel-hardening/2017/07/24/28
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cf81a7e5

18 Jul, 2018 1 commit

KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled · 2307af1c

Committed by Liran Alon on Jun 29, 2018

When eVMCS is enabled, all VMCS allocated to be used by KVM are marked
with revision_id of KVM_EVMCS_VERSION instead of revision_id reported
by MSR_IA32_VMX_BASIC.

However, even though not explicitly documented by TLFS, VMXArea passed
as VMXON argument should still be marked with revision_id reported by
physical CPU.

This issue was found by the following setup:
* L0 = KVM which exposes eVMCS to its L1 guest.
* L1 = KVM which consumes eVMCS reported by L0.
This setup caused the following to occur:
1) L1 executes hardware_enable().
2) hardware_enable() calls kvm_cpu_vmxon() to execute VMXON.
3) L0 intercepts L1 VMXON and executes handle_vmon() which notes
vmxarea->revision_id != VMCS12_REVISION and therefore fails with
nested_vmx_failInvalid() which sets RFLAGS.CF.
4) L1 kvm_cpu_vmxon() doesn't check RFLAGS.CF for failure and therefore
hardware_enable() continues as usual.
5) L1 hardware_enable() then calls ept_sync_global() which executes
INVEPT.
6) L0 intercepts INVEPT and executes handle_invept() which notes
!vmx->nested.vmxon and thus raises a #UD to L1.
7) Raised #UD caused L1 to panic.
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 773e8a04
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2307af1c

15 Jul, 2018 2 commits

kvm: nVMX: Restore exit qual for VM-entry failure due to MSR loading · 0b88abdc

Committed by Jim Mattson on May 30, 2018

This exit qualification was inadvertently dropped when the two
VM-entry failure blocks were coalesced.

Fixes: e79f245d ("X86/KVM: Properly update 'tsc_offset' to represent the running guest")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0b88abdc

x86/kvm/vmx: don't read current->thread.{fs,gs}base of legacy tasks · b062b794

Committed by Vitaly Kuznetsov on Jul 11, 2018

When we switched from doing rdmsr() to reading FS/GS base values from
current->thread we completely forgot about legacy 32-bit userspaces which
we still support in KVM (why?). task->thread.{fsbase,gsbase} are only
synced for 64-bit processes, calling save_fsgs_for_kvm() and using
its result from current is illegal for legacy processes.

There's no ARCH_SET_FS/GS prctls for legacy applications. Base MSRs are,
however, not always equal to zero. Intel's manual says (3.4.4 Segment
Loading Instructions in IA-32e Mode):

"In order to set up compatibility mode for an application, segment-load
instructions (MOV to Sreg, POP Sreg) work normally in 64-bit mode. An
entry is read from the system descriptor table (GDT or LDT) and is loaded
in the hidden portion of the segment register.
...
The hidden descriptor register fields for FS.base and GS.base are
physically mapped to MSRs in order to load all address bits supported by
a 64-bit implementation.
"

The issue was found by strace test suite where 32-bit ioctl_kvm_run test
started segfaulting.
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Bisected-by: Masatake YAMATO <yamato@redhat.com>
Fixes: 42b933b5 ("x86/kvm/vmx: read MSR_{FS,KERNEL_GS}_BASE from current->thread")
Cc: stable@vger.kernel.org
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b062b794

22 Jun, 2018 1 commit

kvm: vmx: Nested VM-entry prereqs for event inj. · 0447378a

Committed by Marc Orr on Jun 20, 2018

This patch extends the checks done prior to a nested VM entry.
Specifically, it extends the check_vmentry_prereqs function with checks
for fields relevant to the VM-entry event injection information, as
described in the Intel SDM, volume 3.

This patch is motivated by a syzkaller bug, where a bad VM-entry
interruption information field is generated in the VMCS02, which causes
the nested VM launch to fail. Then, KVM fails to resume L1.

While KVM should be improved to correctly resume L1 execution after a
failed nested launch, this change is justified because the existing code
to resume L1 is flaky/ad-hoc and the test coverage for resuming L1 is
sparse.
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Marc Orr <marcorr@google.com>
[Removed comment whose parts were describing previous revisions and the
 rest was obvious from function/variable naming. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

0447378a

15 Jun, 2018 1 commit

KVM: x86: VMX: redo fix for link error without CONFIG_HYPERV · 1f008e11

Committed by Arnd Bergmann on May 25, 2018

Arnd had sent this patch to the KVM mailing list, but it slipped through
the cracks of maintainers hand-off, and therefore wasn't included in
the pull request.

The same issue had been fixed by Linus in commit dbee3d02 ("KVM: x86:
VMX: fix build without hyper-v", 2018-06-12) as a self-described
"quick-and-hacky build fix".  However, checking the compile-time
configuration symbol with IS_ENABLED is cleaner and it is enough to
avoid the link error, so switch to Arnd's solution.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[Rewritten commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1f008e11
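
For readers unfamiliar with IS_ENABLED, the following user-space sketch reproduces the preprocessor trick behind it in simplified form: the macro expands to 1 when a configuration symbol is defined to 1 and to 0 when it is not defined at all, so the check can sit in an ordinary `if`. The DEMO_ names and the two CONFIG_DEMO_ symbols are hypothetical, and the kernel's IS_ENABLED additionally handles the =m (module) case, which is omitted here.

```c
#include <assert.h>

#define CONFIG_DEMO_FEATURE 1   /* hypothetical option that is enabled */
/* CONFIG_DEMO_MISSING is deliberately left undefined. */

/*
 * Simplified version of the kernel's kconfig.h trick: pasting the option's
 * value onto DEMO_PLACEHOLDER_ yields "0," only when the value is 1, which
 * shifts the literal 1 into the second argument position.
 */
#define DEMO_PLACEHOLDER_1 0,
#define DEMO_SECOND_ARG(ignored, val, ...) val
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_(x)
#define DEMO_IS_DEFINED_(val) DEMO_IS_DEFINED__(DEMO_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED__(arg1_or_junk) DEMO_SECOND_ARG(arg1_or_junk 1, 0)

#define DEMO_IS_ENABLED(option) DEMO_IS_DEFINED(option)

/* The condition is a compile-time constant, yet both branches are parsed. */
int demo_feature_level(void)
{
    if (DEMO_IS_ENABLED(CONFIG_DEMO_MISSING))
        return 2;   /* constant-false branch */
    if (DEMO_IS_ENABLED(CONFIG_DEMO_FEATURE))
        return 1;
    return 0;
}
```

Because the condition is a constant, an optimizing compiler can drop the guarded branch together with any symbol references inside it.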

13 Jun, 2018 1 commit

KVM: x86: VMX: fix build without hyper-v · dbee3d02

Committed by Linus Torvalds on Jun 12, 2018

Commit ceef7d10 ("KVM: x86: VMX: hyper-v: Enlightened MSR-Bitmap
support") broke the build with Hyper-V disabled, because it accesses
ms_hyperv.nested_features without checking if that exists.

This is the quick-and-hacky build fix.

I suspect the proper fix is to replace the

    static_branch_unlikely(&enable_evmcs)

tests with an inline helper function that also checks that CONFIG_HYPERV
is enabled, since without that, enable_evmcs makes no sense.

But I want a working build environment first and foremost, and I'm upset
this slipped through in the first place.  My primary build tests missed
it because I tend to build with everything enabled, but it should have
been caught in the kvm tree.

Fixes: ceef7d10 ("KVM: x86: VMX: hyper-v: Enlightened MSR-Bitmap support")
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dbee3d02

12 Jun, 2018 2 commits

KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system · ce14e868

Committed by Paolo Bonzini on Jun 06, 2018

In the next patch the emulator's .read_std and .write_std callbacks will
grow another argument, which is not needed in kvm_read_guest_virt and
kvm_write_guest_virt_system's callers. Since we have to make separate
functions, let's give the currently existing names a nicer interface, too.

Fixes: 129a72a0 ("KVM: x86: Introduce segmented_write_std", 2017-01-12)
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ce14e868

kvm: nVMX: Enforce cpl=0 for VMX instructions · 727ba748

Committed by Felix Wilhelm on Jun 11, 2018

VMX instructions executed inside an L1 VM will always trigger a VM exit
even when executed with cpl 3. This means we must perform the
privilege check in software.

Fixes: 70f3aac9 ("kvm: nVMX: Remove superfluous VMX instruction fault checks")
Cc: stable@vger.kernel.org
Signed-off-by: Felix Wilhelm <fwilhelm@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

727ba748
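
The software privilege check described above can be sketched as a guard placed in front of every emulated VMX instruction. This is only a model under stated assumptions: struct demo_vcpu, its fields, and demo_vmx_insn_permitted are hypothetical, and which exception the hypervisor injects is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vCPU model: only the fields needed for the check. */
struct demo_vcpu {
    int cpl;             /* current privilege level of the L1 guest, 0..3 */
    bool fault_pending;  /* set when the emulator decides to inject a fault */
};

/*
 * Every VMX instruction executed by L1 exits to L0 regardless of privilege
 * level, so L0 must reject CPL > 0 itself before emulating the instruction.
 * Returns true when emulation may proceed.
 */
bool demo_vmx_insn_permitted(struct demo_vcpu *vcpu)
{
    if (vcpu->cpl != 0) {
        vcpu->fault_pending = true;   /* do not emulate; fault the guest */
        return false;
    }
    return true;
}
```

In this model, an instruction handler would call the guard first and return early when it reports false.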

04 Jun, 2018 3 commits

kvm: nVMX: Add support for "VMWRITE to any supported field" · f4160e45

Committed by Jim Mattson on May 29, 2018

Add support for "VMWRITE to any supported field in the VMCS" and
enable this feature by default in L1's IA32_VMX_MISC MSR. If userspace
clears the VMX capability bit, the old behavior will be restored.

Note that this feature is a prerequisite for kvm in L1 to use VMCS
shadowing, once that feature is available.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f4160e45

kvm: nVMX: Restrict VMX capability MSR changes · a943ac50

Committed by Jim Mattson on May 29, 2018

Disallow changes to the VMX capability MSRs while the vCPU is in VMX
operation. Although this does break the existing API, it helps to
avoid some potentially tricky situations for which there is no
architected behavior.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a943ac50

KVM: VMX: Optimize tscdeadline timer latency · c5ce8235

Committed by Wanpeng Li on May 29, 2018

Commit d0659d94 ("KVM: x86: add option to advance tscdeadline
hrtimer expiration") advances the tscdeadline expiration (when the timer
is emulated by an hrtimer) so that the latency incurred by the hypervisor
(apic_timer_fn -> vmentry) can be avoided. This patch adds the same
advancement for the case where the tscdeadline timer is emulated by the
VMX preemption timer, to reduce the hypervisor latency
(handle_preemption_timer -> vmentry). The guest can also
set an expiration that is very small (for example in Linux if an
hrtimer feeds an expiration in the past); in that case we set delta_tsc
to 0, leading to an immediate vmexit when delta_tsc is not bigger than
advance ns.

This patch reduces latency by ~63% (from ~4450 cycles to ~1660 cycles on
a haswell desktop) for kvm-unit-tests/tscdeadline_latency when testing
busy waits.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c5ce8235
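
The delta_tsc handling described in this commit message can be sketched as a small pure function: the time left until the guest's deadline is shortened by the advance amount, and clamped to 0 when the deadline is already past or closer than the advance. The function name and its parameters are hypothetical, and the advance is assumed to have been converted from nanoseconds to TSC cycles already.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the number of TSC cycles to program into the (modelled)
 * preemption timer. A result of 0 means "exit immediately".
 */
uint64_t demo_preemption_delta(uint64_t tsc_now, uint64_t tsc_deadline,
                               uint64_t advance_cycles)
{
    /* A deadline in the past leaves nothing to wait for. */
    uint64_t delta_tsc = tsc_deadline > tsc_now ? tsc_deadline - tsc_now : 0;

    /* Fire early by advance_cycles, but never let the value wrap below 0. */
    if (delta_tsc > advance_cycles)
        return delta_tsc - advance_cycles;
    return 0;
}
```

For example, with a deadline 4000 cycles away and an advance of 1500 cycles the timer is armed for 2500 cycles, while a deadline only 1000 cycles away yields 0.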

02 Jun, 2018 1 commit

kvm: Make VM ioctl do valloc for some archs · d1e5b0e9

Committed by Marc Orr on May 15, 2018

The kvm struct has been bloating. For example, it's tens of kilobytes
for x86, which turns out to be a large amount of memory to allocate
contiguously via kzalloc. Thus, this patch does the following:
1. Uses architecture-specific routines to allocate the kvm struct via
   vzalloc for x86.
2. Switches arm to __KVM_HAVE_ARCH_VM_ALLOC so that it can use vzalloc
   when has_vhe() is true.

Other architectures continue to default to kzalloc, as they have a
dependency on kzalloc or have a small-enough struct kvm.
Signed-off-by: Marc Orr <marcorr@google.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d1e5b0e9

25 May, 2018 3 commits

KVM: nVMX: Emulate L1 individual-address invvpid by L0 individual-address invvpid · cd9a491f

Committed by Liran Alon on May 22, 2018

When vmcs12 uses VPID, all TLB entries populated by L2 are tagged with
vmx->nested.vpid02. Currently, INVVPID executed by L1 is emulated by L0
by using INVVPID single/global-context to flush all TLB entries
tagged with vmx->nested.vpid02 regardless of INVVPID type executed by
L1.

However, we can easily optimize the case of L1 INVVPID on an
individual-address. Just INVVPID given individual-address tagged with
vmx->nested.vpid02.
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
[Squashed with a preparatory patch that added the !operand.vpid line.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

cd9a491f

KVM: nVMX: Don't flush TLB when vmcs12 uses VPID · 6f1e03bc

Committed by Liran Alon on May 22, 2018

Since commit 5c614b35 ("KVM: nVMX: nested VPID emulation"),
vmcs01 and vmcs02 don't share the same VPID. vmcs01 uses vmx->vpid
while vmcs02 uses vmx->nested.vpid02. This was done such that TLB
flush could be avoided when switching between L1 and L2.

However, the above mentioned commit only changed L2 VMEntry logic to
not flush TLB when switching from L1 to L2. It forgot to also remove
the TLB flush which is done when simulating a VMExit from L2 to L1.

To fix this issue, on VMExit from L2 to L1 we flush TLB only in case
vmcs01 enables VPID and vmcs01->vpid==vmcs02->vpid. This happens when
vmcs01 enables VPID and vmcs12 does not.

Fixes: 5c614b35 ("KVM: nVMX: nested VPID emulation")
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

6f1e03bc
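
The flush condition stated in this commit message can be written down as a predicate. The sketch below is only a model of that sentence, not the kernel's code: struct demo_vpid_state and demo_need_flush_on_l2_exit are hypothetical names, and vpid02 is assumed to equal vpid01 whenever vmcs12 does not use VPID, as the message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the VPID configuration at L2 -> L1 exit time. */
struct demo_vpid_state {
    bool vmcs01_vpid_enabled;  /* L0 runs L1 with VPID enabled */
    uint16_t vpid01;           /* tag used while running L1 */
    uint16_t vpid02;           /* tag used while running L2 */
};

/*
 * Flush only when L1 and L2 ran under the same tag: in that case the TLB
 * entries populated by L2 are indistinguishable from L1's entries.
 */
bool demo_need_flush_on_l2_exit(const struct demo_vpid_state *s)
{
    return s->vmcs01_vpid_enabled && s->vpid01 == s->vpid02;
}
```

When L2 has its own tag, the predicate is false and the emulated VMExit can skip the flush.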

KVM: nVMX: Use vmx local var for referencing vpid02 · 6bce30c7

Committed by Liran Alon on May 22, 2018

Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

6bce30c7

23 May, 2018 3 commits

KVM: nVMX: Ensure that VMCS12 field offsets do not change · 21ebf53b

Committed by Jim Mattson on May 01, 2018

Enforce the invariant that existing VMCS12 field offsets must not
change. Experience has shown that without strict enforcement, this
invariant will not be maintained.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Changed the code to use BUILD_BUG_ON_MSG instead of the better
 _Static_assert, which requires GCC 4.6. - Radim.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

21ebf53b
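
A compile-time offset check of this kind can be sketched in user-space C as follows. Note that this sketch uses C11 _Static_assert rather than the BUILD_BUG_ON_MSG macro the commit settled on, and that struct demo_vmcs12, its fields, and the offsets are hypothetical stand-ins, not the real VMCS12 layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a structure whose layout is part of an ABI. */
struct demo_vmcs12 {
    uint32_t revision_id;
    uint32_t abort;
    uint64_t io_bitmap_a;
    uint64_t io_bitmap_b;
};

/* Fail the build if a field ever moves away from its recorded offset. */
#define DEMO_CHECK_OFFSET(field, loc)                                  \
    _Static_assert(offsetof(struct demo_vmcs12, field) == (loc),       \
                   "Offset of " #field " in struct demo_vmcs12 has changed.")

DEMO_CHECK_OFFSET(revision_id, 0);
DEMO_CHECK_OFFSET(abort, 4);
DEMO_CHECK_OFFSET(io_bitmap_a, 8);
DEMO_CHECK_OFFSET(io_bitmap_b, 16);
```

Inserting a new member anywhere but at the end of the structure would then break the build instead of silently changing the saved-state format.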

KVM: nVMX: Restore the VMCS12 offsets for v4.0 fields · b348e793

Committed by Jim Mattson on May 01, 2018

Changing the VMCS12 layout will break save/restore compatibility with
older kvm releases once the KVM_{GET,SET}_NESTED_STATE ioctls are
accepted upstream. Google has already been using these ioctls for some
time, and we implore the community not to disturb the existing layout.

Move the four most recently added fields to preserve the offsets of
the previously defined fields and reserve locations for the vmread and
vmwrite bitmaps, which will be used in the virtualization of VMCS
shadowing (to improve the performance of double-nesting).
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Kept the SDM order in vmcs_field_to_offset_table. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

b348e793

kvm: nVMX: Use nested_run_pending rather than from_vmentry · 6514dc38

Committed by Jim Mattson on Apr 26, 2018

When saving a vCPU's nested state, the vmcs02 is discarded. Only the
shadow vmcs12 is saved. The shadow vmcs12 contains all of the
information needed to reconstruct an equivalent vmcs02 on restore, but
we have to be able to deal with two contexts:

1. The nested state was saved immediately after an emulated VM-entry,
   before the vmcs02 was ever launched.

2. The nested state was saved some time after the first successful
   launch of the vmcs02.

Though it's an implementation detail rather than an architected bit,
vmx->nested_run_pending serves to distinguish between these two
cases. Hence, we save it as part of the vCPU's nested state. (Yes,
this is ugly.)

Even when restoring from a checkpoint, it may be necessary to build
the vmcs02 as if prepare_vmcs02 was called from nested_vmx_run. So,
the 'from_vmentry' argument should be dropped, and
vmx->nested_run_pending should be consulted instead. The nested state
restoration code then has to set vmx->nested_run_pending prior to
calling prepare_vmcs02. It's important that the restoration code set
vmx->nested_run_pending anyway, since the flag impacts things like
interrupt delivery as well.

Fixes: cf8b84f4 ("kvm: nVMX: Prepare for checkpointing L2 state")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

6514dc38

17 May, 2018 3 commits

KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD · bc226f07

Committed by Tom Lendacky on May 10, 2018

Expose the new virtualized architectural mechanism, VIRT_SSBD, for using
speculative store bypass disable (SSBD) under SVM.  This will allow guests
to use SSBD on hardware that uses non-architectural mechanisms for enabling
SSBD.

[ tglx: Folded the migration fixup from Paolo Bonzini ]
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

bc226f07

x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL · ccbcd267

Committed by Thomas Gleixner on May 09, 2018

AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store
Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care
about the bit position of the SSBD bit and thus facilitate migration.
Also, the sibling coordination on Family 17H CPUs can only be done on
the host.

Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an
extra argument for the VIRT_SPEC_CTRL MSR.

Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU
data structure which is going to be used in later patches for the actual
implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

ccbcd267

x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP · e7c587da

Committed by Borislav Petkov on May 02, 2018

Intel and AMD have different CPUID bits, hence for those use synthetic bits
which get set for the respective vendor in init_speculation_control(), so
that debacles like the one described in the commit message of

  c65732e4 ("x86/cpu: Restore CPUID_8000_0008_EBX reload")

don't happen anymore.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Jörg Otte <jrg.otte@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20180504161815.GG9257@pd.tnic

e7c587da

15 May, 2018 1 commit

kvm: nVMX: Eliminate APIC access page sharing between L1 and L2 · ab5df31c

Committed by Jim Mattson on May 09, 2018

It is only possible to share the APIC access page between L1 and L2 if
they also share the virtual-APIC page.  If L2 has its own virtual-APIC
page, then MMIO accesses to L1's TPR from L2 will access L2's TPR
instead.  Moreover, L1's local APIC has to be in xAPIC mode, which is
another condition that hasn't been checked.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ab5df31c

openanolis / cloud-kernel synced successfully about 1 year ago
