提交 · 7313c698050387a11c21afb0c6b4c61f21f7c042 · openanolis / cloud-kernel

02 8月, 2017 1 次提交

KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12 · 7313c698

由 Paolo Bonzini 提交于 7月 27, 2017

Do this in the caller of nested_vmx_vmexit instead.

nested_vmx_check_exception was doing a vmwrite to the vmcs02's
VM_EXIT_INTR_ERROR_CODE field, so that prepare_vmcs12 would move
the field to vmcs12->vm_exit_intr_error_code.  However that isn't
possible on pre-Haswell machines.  Moving the vmcs12 write to the
callers fixes it.
Reported-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
[Changed nested_vmx_reflect_vmexit() return type to (int)1 from (bool)1,
 thanks to fengguang.wu@intel.com]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

7313c698

27 7月, 2017 2 次提交

KVM: nVMX: Fix loss of L2's NMI blocking state · 2d6144e3

由 Wanpeng Li 提交于 7月 25, 2017

Run kvm-unit-tests/eventinj.flat in L1 w/ ept=0 on both L0 and L1:

Before NMI IRET test
Sending NMI to self
NMI isr running stack 0x461000
Sending nested NMI to self
After nested NMI to self
Nested NMI isr running rip=40038e
After iret
After NMI to self
FAIL: NMI

Commit 4c4a6f79 (KVM: nVMX: track NMI blocking state separately
for each VMCS) tracks NMI blocking state separately for vmcs01 and
vmcs02. However it is not enough:

 - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault
   on IRET, so the L2 can generate #PF which can be intercepted by L0.
 - L0 walks L1's guest page table and sees the mapping is invalid, it
   resumes the L1 guest and injects the #PF into L1.  At this point the
   vmcs02 has nmi_known_unmasked=true.
 - L1 sets set bit 3 (blocking by NMI) in the interruptibility-state field
   of vmcs12 (and fixes the shadow page table) before resuming L2 guest.
 - L1 executes VMRESUME to resume L2, causing a vmexit to L0
 - during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the
   interruptibility-state field of vmcs02, but nmi_known_unmasked is
   still true.
 - L2 immediately exits to L0 with another page fault, because L0 still has
   not updated the NGVA->HPA page tables.  However, nmi_known_unmasked is
   true so vmx_recover_nmi_blocking does not do anything.

The fix is to update nmi_known_unmasked when preparing vmcs02 from vmcs12.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2d6144e3

KVM: nVMX: Fix posted intr delivery when vcpu is in guest mode · 06a5524f

由 Wincy Van 提交于 4月 28, 2017

The PI vector for L0 and L1 must be different. If dest vcpu0
is in guest mode while vcpu1 is delivering a non-nested PI to
vcpu0, there wont't be any vmexit so that the non-nested interrupt
will be delayed.
Signed-off-by: NWincy Van <fanwenyi0529@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

06a5524f

24 7月, 2017 1 次提交
- P
  KVM: VMX: remove unused field · fa19871a
  由 Paolo Bonzini 提交于 7月 14, 2017
```
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
```
  fa19871a
20 7月, 2017 1 次提交

KVM: VMX: Fix invalid guest state detection after task-switch emulation · f244deed

由 Wanpeng Li 提交于 7月 20, 2017

This can be reproduced by EPT=1, unrestricted_guest=N, emulate_invalid_state=Y
or EPT=0, the trace of kvm-unit-tests/taskswitch2.flat is like below, it tries
to emulate invalid guest state task-switch:

kvm_exit: reason TASK_SWITCH rip 0x0 info 40000058 0
kvm_emulate_insn: 42000:0:0f 0b (0x2)
kvm_emulate_insn: 42000:0:0f 0b (0x2) failed
kvm_inj_exception: #UD (0x0)
kvm_entry: vcpu 0
kvm_exit: reason TASK_SWITCH rip 0x0 info 40000058 0
kvm_emulate_insn: 42000:0:0f 0b (0x2)
kvm_emulate_insn: 42000:0:0f 0b (0x2) failed
kvm_inj_exception: #UD (0x0)
......................

It appears that the task-switch emulation updates rflags (and vm86
flag) only after the segments are loaded, causing vmx->emulation_required
to be set, when in fact invalid guest state emulation is not needed.

This patch fixes it by updating vmx->emulation_required after the
rflags (and vm86 flag) is updated in task-switch emulation.

Thanks Radim for moving the update to vmx__set_flags and adding Paolo's
suggestion for the check.
Suggested-by: NNadav Amit <nadav.amit@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

f244deed

19 7月, 2017 2 次提交

KVM: nVMX: Disallow VM-entry in MOV-SS shadow · b3f1dfb6

由 Jim Mattson 提交于 7月 17, 2017

Immediately following MOV-to-SS/POP-to-SS, VM-entry is
disallowed. This check comes after the check for a valid VMCS. When
this check fails, the instruction pointer should fall through to the
next instruction, the ALU flags should be set to indicate VMfailValid,
and the VM-instruction error should be set to 26 ("VM entry with
events blocked by MOV SS").
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

b3f1dfb6

KVM: nVMX: track NMI blocking state separately for each VMCS · 4c4a6f79

由 Paolo Bonzini 提交于 7月 14, 2017

vmx_recover_nmi_blocking is using a cached value of the guest
interruptibility info, which is stored in vmx->nmi_known_unmasked.
vmx_recover_nmi_blocking is run for both normal and nested guests,
so the cached value must be per-VMCS.

This fixes eventinj.flat in a nested non-EPT environment.  With EPT it
works, because the EPT violation handler doesn't have the
vmx->nmi_known_unmasked optimization (it is unnecessary because, unlike
vmx_recover_nmi_blocking, it can just look at the exit qualification).

Thanks to Wanpeng Li for debugging the testcase and providing an initial
patch.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

4c4a6f79

14 7月, 2017 4 次提交

KVM: async_pf: Let guest support delivery of async_pf from guest mode · 52a5c155

由 Wanpeng Li 提交于 7月 13, 2017

Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1,
async page faults are delivered to L1 as #PF vmexits; if bit 2 is 0,
kvm_can_do_async_pf returns 0 if in guest mode.

This is similar to what svm.c wanted to do all along, but it is only
enabled for Linux as L1 hypervisor.  Foreign hypervisors must never
receive async page faults as vmexits, because they'd probably be very
confused about that.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

52a5c155

KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf · adfe20fb

由 Wanpeng Li 提交于 7月 13, 2017

Add an nested_apf field to vcpu->arch.exception to identify an async page
fault, and constructs the expected vm-exit information fields. Force a
nested VM exit from nested_vmx_check_exception() if the injected #PF is
async page fault.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

adfe20fb

KVM: async_pf: Add L1 guest async_pf #PF vmexit handler · 1261bfa3

由 Wanpeng Li 提交于 7月 13, 2017

This patch adds the L1 guest async page fault #PF vmexit handler, such
by L1 similar to ordinary async page fault.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
[Passed insn parameters to kvm_mmu_page_fault().]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

1261bfa3

KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list · cfcd20e5

由 Wanpeng Li 提交于 7月 13, 2017

This patch removes all arguments except the first in
kvm_x86_ops->queue_exception since they can extract the arguments from
vcpu->arch.exception themselves.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

cfcd20e5

13 7月, 2017 5 次提交

kvm: vmx: Properly handle machine check during VM-entry · 48ae0fb4

由 Jim Mattson 提交于 5月 22, 2017

vmx_complete_atomic_exit should call kvm_machine_check for any
VM-entry failure due to a machine-check event. Such an exit should be
recognized solely by its basic exit reason (i.e. the low 16 bits of
the VMCS exit reason field). None of the other VMCS exit information
fields contain valid information when the VM-exit is due to "VM-entry
failure due to machine-check event".
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NXiao Guangrong <xiaoguangrong@tencent.com>
[Changed VM_EXIT_INTR_INFO condition to better describe its reason.]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

48ae0fb4

kvm: nVMX: Shadow "high" parts of shadowed 64-bit VMCS fields · 85fd514e

由 Jim Mattson 提交于 7月 07, 2017

Inconsistencies result from shadowing only accesses to the full
64-bits of a 64-bit VMCS field, but not shadowing accesses to the high
32-bits of the field. The "high" part of a 64-bit field should be
shadowed whenever the full 64-bit field is shadowed.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

85fd514e

kvm: nVMX: Fix nested_vmx_check_msr_bitmap_controls · 5fa99cbe

由 Jim Mattson 提交于 7月 06, 2017

Allow the L1 guest to specify the last page of addressable guest
physical memory for an L2 MSR permission bitmap. Also remove the
vmcs12_read_any() check that should never fail.

Fixes: 3af18d9c ("KVM: nVMX: Prepare for using hardware MSR bitmap")
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5fa99cbe

kvm: nVMX: Validate the I/O bitmaps on nested VM-entry · 56a20510

由 Jim Mattson 提交于 7月 06, 2017

According to the SDM, if the "use I/O bitmaps" VM-execution control is
1, bits 11:0 of each I/O-bitmap address must be 0. Neither address
should set any bits beyond the processor's physical-address width.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

56a20510

kvm: nVMX: Don't set vmcs12 to "launched" when VMLAUNCH fails · 7cdc2d62

由 Jim Mattson 提交于 7月 06, 2017

The VMCS launch state is not set to "launched" unless the VMLAUNCH
actually succeeds. VMLAUNCH failure includes VM-exits with bit 31 set.

Note that this change does not address the general problem that a
failure to launch/resume vmcs02 (i.e. vmx->fail) is not handled
correctly.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7cdc2d62

10 7月, 2017 1 次提交

KVM: vmx: expose more information for KVM_INTERNAL_ERROR_DELIVERY_EV exits · 70bcd708

由 Paolo Bonzini 提交于 7月 05, 2017

This exit ended up being reported, but the currently exposed data does not provide
much of a starting point for debugging. In the reported case, the vmexit was
an EPT misconfiguration (MMIO access). Let userspace report ethe exit qualification
and, if relevant, the GPA.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

70bcd708

04 7月, 2017 1 次提交

kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS · 691bd434

由 Haozhong Zhang 提交于 7月 04, 2017

It's easier for host applications, such as QEMU, if they can always
access guest MSR_IA32_BNDCFGS in VMCS, even though MPX is disabled in
guest cpuid.

Cc: stable@vger.kernel.org
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

691bd434

03 7月, 2017 2 次提交

x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12 · 995f00a6

由 Peter Feiner 提交于 6月 30, 2017

EPT A/D was enabled in the vmcs02 EPTP regardless of the vmcs12's EPTP
value. The problem is that enabling A/D changes the behavior of L2's
x86 page table walks as seen by L1. With A/D enabled, x86 page table
walks are always treated as EPT writes.

Commit ae1e2d10 ("kvm: nVMX: support EPT accessed/dirty bits",
2017-03-30) tried to work around this problem by clearing the write
bit in the exit qualification for EPT violations triggered by page
walks.  However, that fixup introduced the opposite bug: page-table walks
that actually set x86 A/D bits were *missing* the write bit in the exit
qualification.

This patch fixes the problem by disabling EPT A/D in the shadow MMU
when EPT A/D is disabled in vmcs12's EPTP.
Signed-off-by: NPeter Feiner <pfeiner@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

995f00a6

x86: kvm: mmu: make spte mmio mask more explicit · dcdca5fe

由 Peter Feiner 提交于 6月 30, 2017

Specify both a mask (i.e., bits to consider) and a value (i.e.,
pattern of bits that indicates a special PTE) for mmio SPTEs. On
Intel, this lets us pack even more information into the
(SPTE_SPECIAL_MASK | EPT_VMX_RWX_MASK) mask we use for access
tracking liberating all (SPTE_SPECIAL_MASK | (non-misconfigured-RWX))
values.
Signed-off-by: NPeter Feiner <pfeiner@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

dcdca5fe

30 6月, 2017 2 次提交

objtool, x86: Add several functions and files to the objtool whitelist · c207aee4

由 Josh Poimboeuf 提交于 6月 28, 2017

In preparation for an objtool rewrite which will have broader checks,
whitelist functions and files which cause problems because they do
unusual things with the stack.

These whitelists serve as a TODO list for which functions and files
don't yet have undwarf unwinder coverage.  Eventually most of the
whitelists can be removed in favor of manual CFI hint annotations or
objtool improvements.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

c207aee4

KVM: LAPIC: Fix lapic timer injection delay · c8533544

由 Wanpeng Li 提交于 6月 29, 2017

If the TSC deadline timer is programmed really close to the deadline or
even in the past, the computation in vmx_set_hv_timer will program the
absolute target tsc value to vmcs preemption timer field w/ delta == 0,
then plays a vmentry and an upcoming vmx preemption timer fire vmexit
dance, the lapic timer injection is delayed due to this duration. Actually
the lapic timer which is emulated by hrtimer can handle this correctly.

This patch fixes it by firing the lapic timer and injecting a timer interrupt
immediately during the next vmentry if the TSC deadline timer is programmed
really close to the deadline or even in the past. This saves ~300 cycles on
the tsc_deadline_timer test of apic.flat.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c8533544

29 6月, 2017 1 次提交

kvm: nVMX: Check memory operand to INVVPID · 40352605

由 Jim Mattson 提交于 6月 28, 2017

The memory operand fetched for INVVPID is 128 bits. Bits 63:16 are
reserved and must be zero.  Otherwise, the instruction fails with
VMfail(Invalid operand to INVEPT/INVVPID).  If the INVVPID_TYPE is 0
(individual address invalidation), then bits 127:64 must be in
canonical form, or the instruction fails with VMfail(Invalid operand
to INVEPT/INVVPID).
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

40352605

13 6月, 2017 1 次提交

x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3() · 6c690ee1

由 Andy Lutomirski 提交于 6月 12, 2017

The kernel has several code paths that read CR3.  Most of them assume that
CR3 contains the PGD's physical address, whereas some of them awkwardly
use PHYSICAL_PAGE_MASK to mask off low bits.

Add explicit mask macros for CR3 and convert all of the CR3 readers.
This will keep them from breaking when PCID is enabled.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

6c690ee1

07 6月, 2017 5 次提交

KVM: nVMX: Update vmcs12->guest_linear_address on nested VM-exit · d281e13b

由 Jim Mattson 提交于 6月 01, 2017

The guest-linear address field is set for VM exits due to attempts to
execute LMSW with a memory operand and VM exits due to attempts to
execute INS or OUTS for which the relevant segment is usable,
regardless of whether or not EPT is in use.

Fixes: 119a9c01 ("KVM: nVMX: pass valid guest linear-address to the L1")
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

d281e13b

KVM: nVMX: Don't update vmcs12->xss_exit_bitmap on nested VM-exit · d923fcf6

由 Jim Mattson 提交于 6月 01, 2017

The XSS-exiting bitmap is a VMCS control field that does not change
while the CPU is in non-root mode. Transferring the unchanged value
from vmcs02 to vmcs12 is unnecessary.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

d923fcf6

kvm: vmx: Check value written to IA32_BNDCFGS · 4531662d

由 Jim Mattson 提交于 5月 23, 2017

Bits 11:2 must be zero and the linear addess in bits 63:12 must be
canonical. Otherwise, WRMSR(BNDCFGS) should raise #GP.

Fixes: 0dd376e7 ("KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save")
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

4531662d

kvm: x86: Guest BNDCFGS requires guest MPX support · 4439af9f

由 Jim Mattson 提交于 5月 24, 2017

The BNDCFGS MSR should only be exposed to the guest if the guest
supports MPX. (cf. the TSC_AUX MSR and RDTSCP.)

Fixes: 0dd376e7 ("KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save")
Change-Id: I3ad7c01bda616715137ceac878f3fa7e66b6b387
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

4439af9f

kvm: vmx: Do not disable intercepts for BNDCFGS · a8b6fda3

由 Jim Mattson 提交于 5月 23, 2017

The MSR permission bitmaps are shared by all VMs. However, some VMs
may not be configured to support MPX, even when the host does. If the
host supports VMX and the guest does not, we should intercept accesses
to the BNDCFGS MSR, so that we can synthesize a #GP
fault. Furthermore, if the host does not support MPX and the
"ignore_msrs" kvm kernel parameter is set, then we should intercept
accesses to the BNDCFGS MSR, so that we can skip over the rdmsr/wrmsr
without raising a #GP fault.

Fixes: da8999d3 ("KVM: x86: Intel MPX vmx and msr handle")
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

a8b6fda3

06 6月, 2017 1 次提交

KVM: nVMX: Fix exception injection · d4912215

由 Wanpeng Li 提交于 6月 05, 2017

 WARNING: CPU: 3 PID: 2840 at arch/x86/kvm/vmx.c:10966 nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel]
 CPU: 3 PID: 2840 Comm: qemu-system-x86 Tainted: G           OE   4.12.0-rc3+ #23
 RIP: 0010:nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel]
 Call Trace:
  ? kvm_check_async_pf_completion+0xef/0x120 [kvm]
  ? rcu_read_lock_sched_held+0x79/0x80
  vmx_queue_exception+0x104/0x160 [kvm_intel]
  ? vmx_queue_exception+0x104/0x160 [kvm_intel]
  kvm_arch_vcpu_ioctl_run+0x1171/0x1ce0 [kvm]
  ? kvm_arch_vcpu_load+0x47/0x240 [kvm]
  ? kvm_arch_vcpu_load+0x62/0x240 [kvm]
  kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
  ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
  ? __fget+0xf3/0x210
  do_vfs_ioctl+0xa4/0x700
  ? __fget+0x114/0x210
  SyS_ioctl+0x79/0x90
  do_syscall_64+0x81/0x220
  entry_SYSCALL64_slow_path+0x25/0x25

This is triggered occasionally by running both win7 and win2016 in L2, in
addition, EPT is disabled on both L1 and L2. It can't be reproduced easily.

Commit 0b6ac343 (KVM: nVMX: Correct handling of exception injection) mentioned
that "KVM wants to inject page-faults which it got to the guest. This function
assumes it is called with the exit reason in vmcs02 being a #PF exception".
Commit e011c663 (KVM: nVMX: Check all exceptions for intercept during delivery to
L2) allows to check all exceptions for intercept during delivery to L2. However,
there is no guarantee the exit reason is exception currently, when there is an
external interrupt occurred on host, maybe a time interrupt for host which should
not be injected to guest, and somewhere queues an exception, then the function
nested_vmx_check_exception() will be called and the vmexit emulation codes will
try to emulate the "Acknowledge interrupt on exit" behavior, the warning is
triggered.

Reusing the exit reason from the L2->L0 vmexit is wrong in this case,
the reason must always be EXCEPTION_NMI when injecting an exception into
L1 as a nested vmexit.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Fixes: e011c663 ("KVM: nVMX: Check all exceptions for intercept during delivery to L2")
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

d4912215

05 6月, 2017 1 次提交

x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant · d6e41f11

由 Andy Lutomirski 提交于 5月 28, 2017

When PCID is enabled, CR3's PCID bits can change during context
switches, so KVM won't be able to treat CR3 as a per-mm constant any
more.

I structured this like the existing CR4 handling.  Under ordinary
circumstances (PCID disabled or if the current PCID and the value
that's already in the VMCS match), then we won't do an extra VMCS
write, and we'll never do an extra direct CR3 read.  The overhead
should be minimal.

I disallowed using the new helper in non-atomic context because
PCID support will cause CR3 to stop being constant in non-atomic
process context.

(Frankly, it also scares me a bit that KVM ever treated CR3 as
constant, but it looks like it was okay before.)
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

d6e41f11

01 6月, 2017 1 次提交

KVM: white space cleanup in nested_vmx_setup_ctls_msrs() · 7461fbc4

由 Dan Carpenter 提交于 5月 18, 2017

This should have been indented one more character over and it should use
tabs.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

7461fbc4

30 5月, 2017 1 次提交

KVM: nVMX: fix nested_vmx_check_vmptr failure paths under debugging · cbf71279

由 Radim Krčmář 提交于 5月 19, 2017

kvm_skip_emulated_instruction() will return 0 if userspace is
single-stepping the guest.

kvm_skip_emulated_instruction() uses return status convention of exit
handler: 0 means "exit to userspace" and 1 means "continue vm entries".
The problem is that nested_vmx_check_vmptr() return status means
something else: 0 is ok, 1 is error.

This means we would continue executing after a failure.  Static checker
noticed it because vmptr was not initialized.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Fixes: 6affcbed ("KVM: x86: Add kvm_skip_emulated_instruction and use it.")
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cbf71279

26 5月, 2017 1 次提交

KVM: nVMX: Fix handling of lmsw instruction · e1d39b17

由 Jan H. Schönherr 提交于 5月 20, 2017

The decision whether or not to exit from L2 to L1 on an lmsw instruction is
based on bogus values: instead of using the information encoded within the
exit qualification, it uses the data also used for the mov-to-cr
instruction, which boils down to using whatever is in %eax at that point.

Use the correct values instead.

Without this fix, an L1 may not get notified when a 32-bit Linux L2
switches its secondary CPUs to protected mode; the L1 is only notified on
the next modification of CR0. This short time window poses a problem, when
there is some other reason to exit to L1 in between. Then, L2 will be
resumed in real mode and chaos ensues.
Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e1d39b17

15 5月, 2017 2 次提交

KVM: VMX: Don't enable EPT A/D feature if EPT feature is disabled · fce6ac4c

由 Wanpeng Li 提交于 5月 11, 2017

We can observe eptad kvm_intel module parameter is still Y
even if ept is disabled which is weird. This patch will
not enable EPT A/D feature if EPT feature is disabled.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

fce6ac4c

kvm: nVMX: off by one in vmx_write_pml_buffer() · 4769886b

由 Dan Carpenter 提交于 5月 10, 2017

There are PML_ENTITY_NUM elements in the pml_address[] array so the >
should be >= or we write beyond the end of the array when we do:

	pml_address[vmcs12->guest_pml_index--] = gpa;

Fixes: c5f983f6 ("nVMX: Implement emulated Page Modification Logging")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

4769886b

09 5月, 2017 3 次提交

nVMX: Advertise PML to L1 hypervisor · 03efce6f

由 Bandan Das 提交于 5月 05, 2017

Advertise the PML bit in vmcs12 but don't try to enable
it in hardware when running L2 since L0 is emulating it. Also,
preserve L0's settings for PML since it may still
want to log writes.
Signed-off-by: NBandan Das <bsd@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

03efce6f

nVMX: Implement emulated Page Modification Logging · c5f983f6

由 Bandan Das 提交于 5月 05, 2017

With EPT A/D enabled, processor access to L2 guest
paging structures will result in a write violation.
When this happens, write the GUEST_PHYSICAL_ADDRESS
to the pml buffer provided by L1 if the access is
write and the dirty bit is being set.

This patch also adds necessary checks during VMEntry if L1
has enabled PML. If the PML index overflows, we change the
exit reason and run L1 to simulate a PML full event.
Signed-off-by: NBandan Das <bsd@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c5f983f6

kvm: nVMX: Validate CR3 target count on nested VM-entry · c7c2c709

由 Jim Mattson 提交于 5月 05, 2017

According to the SDM, the CR3-target count must not be greater than
4. Future processors may support a different number of CR3-target
values. Software should read the VMX capability MSR IA32_VMX_MISC to
determine the number of values supported.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c7c2c709

05 5月, 2017 1 次提交

kvm: nVMX: Don't validate disabled secondary controls · 2e5b0bd9

由 Jim Mattson 提交于 5月 04, 2017

According to the SDM, if the "activate secondary controls" primary
processor-based VM-execution control is 0, no checks are performed on
the secondary processor-based VM-execution controls.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2e5b0bd9

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功