- 12 Aug 2017, 1 commit
-
By David Hildenbrand
Let's reuse the function introduced with EPTP switching. We don't explicitly have to check against enable_ept_ad_bits, as this is implicitly done when checking against nested_vmx_ept_caps in valid_ept_address().

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
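For reference, a standalone sketch of the kind of EPTP validation valid_ept_address() performs, following the SDM's bit layout; the function name, constants, and the ept_caps parameter here are illustrative, not the kernel's actual code:

    #include <stdbool.h>
    #include <stdint.h>

    /* EPTP layout per the SDM: bits 2:0 memory type, bits 5:3 page-walk
     * length minus one, bit 6 enables accessed/dirty flags. */
    static bool eptp_is_valid(uint64_t eptp, uint64_t ept_caps, int maxphyaddr)
    {
        uint64_t memtype = eptp & 0x7;

        /* Only UC (0) and WB (6) are valid EPT memory types. */
        if (memtype != 0 && memtype != 6)
            return false;

        /* Only a 4-level page walk (encoded as 3) is supported. */
        if (((eptp >> 3) & 0x7) != 3)
            return false;

        /* Reserved bits 11:7 and bits above MAXPHYADDR must be clear. */
        if ((eptp >> 7) & 0x1f)
            return false;
        if (eptp >> maxphyaddr)
            return false;

        /* If A/D enable (bit 6) is set, the exposed EPT capabilities must
         * advertise A/D support (bit 21 of the VPID/EPT cap MSR); checking
         * the exposed caps here is what makes a separate
         * enable_ept_ad_bits test unnecessary, as the commit notes. */
        if ((eptp & (1ull << 6)) && !(ept_caps & (1ull << 21)))
            return false;

        return true;
    }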
-
- 10 Aug 2017, 2 commits
-
By Paolo Bonzini
This is the same as commit 14727754 ("kvm: svm: Add support for additional SVM NPF error codes", 2016-11-23), but for Intel processors. In this case, the exit qualification field's bit 8 says whether the EPT violation occurred while translating the guest's final physical address or rather while translating the guest page tables.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
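A standalone sketch of the decoding this enables; the PFERR_* names mirror KVM's NPF error-code bits but are defined locally for illustration:

    #include <stdint.h>

    #define PFERR_GUEST_FINAL (1ull << 32) /* fault on the final GPA */
    #define PFERR_GUEST_PAGE  (1ull << 33) /* fault during a page-table walk */

    static uint64_t npf_bits_from_exit_qual(uint64_t exit_qualification)
    {
        /* Bit 8 set: the guest linear address translated fine, so the
         * violation was on the final physical access; clear: it happened
         * while the CPU walked the guest's page tables. */
        return (exit_qualification & (1ull << 8)) ? PFERR_GUEST_FINAL
                                                  : PFERR_GUEST_PAGE;
    }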
-
By Wanpeng Li
Reported by syzkaller, with kvm-intel.unrestricted_guest=0:

    WARNING: CPU: 5 PID: 1014 at /home/kernel/data/kvm/arch/x86/kvm//x86.c:7227 kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
    CPU: 5 PID: 1014 Comm: warn_test Tainted: G W OE 4.13.0-rc3+ #8
    RIP: 0010:kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
    Call Trace:
    ? put_pid+0x3a/0x50
    ? rcu_read_lock_sched_held+0x79/0x80
    ? kmem_cache_free+0x2f2/0x350
    kvm_vcpu_ioctl+0x340/0x700 [kvm]
    ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
    ? __fget+0xfc/0x210
    do_vfs_ioctl+0xa4/0x6a0
    ? __fget+0x11d/0x210
    SyS_ioctl+0x79/0x90
    entry_SYSCALL_64_fastpath+0x23/0xc2
    ? __this_cpu_preempt_check+0x13/0x20

The syzkaller folks reported a residual mmio emulation request to userspace: vm86 fails to emulate the injection of a real-mode interrupt (it fails to read CS) and incurs a triple fault. The vCPU returns to userspace with vcpu->mmio_needed == true and a KVM_EXIT_SHUTDOWN exit reason. However, the syzkaller testcase constructs several threads that launch the same vCPU; a thread that launches the vCPU after another thread has already returned with vcpu->mmio_needed == true and KVM_EXIT_SHUTDOWN will trigger the warning. The reproducer (reformatted, with the missing sys/ioctl.h include added and a duplicate stdio.h dropped):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/mman.h>
    #include <sys/ioctl.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <linux/kvm.h>

    int kvmcpu;
    struct kvm_run *run;

    void *thr(void *arg)
    {
        int res;
        res = ioctl(kvmcpu, KVM_RUN, 0);
        printf("ret1=%d exit_reason=%d suberror=%d\n",
               res, run->exit_reason, run->internal.suberror);
        return 0;
    }

    void test()
    {
        int i, kvm, kvmvm;
        pthread_t th[4];

        kvm = open("/dev/kvm", O_RDWR);
        kvmvm = ioctl(kvm, KVM_CREATE_VM, 0);
        kvmcpu = ioctl(kvmvm, KVM_CREATE_VCPU, 0);
        run = (struct kvm_run *)mmap(0, 4096, PROT_READ|PROT_WRITE,
                                     MAP_SHARED, kvmcpu, 0);
        srand(getpid());
        for (i = 0; i < 4; i++) {
            pthread_create(&th[i], 0, thr, 0);
            usleep(rand() % 10000);
        }
        for (i = 0; i < 4; i++)
            pthread_join(th[i], 0);
    }

    int main()
    {
        for (;;) {
            int pid = fork();
            if (pid < 0)
                exit(1);
            if (pid == 0) {
                test();
                exit(0);
            }
            int status;
            while (waitpid(pid, &status, __WALL) != pid) {}
        }
        return 0;
    }

This patch fixes it by resetting vcpu->mmio_needed once we receive the triple fault, to avoid the residue.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
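A fragment sketching where the fix lands, assuming the triple-fault request handling in vcpu_enter_guest(); the surrounding structure is abridged:

    if (kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu)) {
        vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
        vcpu->mmio_needed = 0; /* the fix: drop the stale mmio request */
        r = 0;
        goto out;
    }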
-
- 08 Aug 2017, 2 commits
-
By Longpeng(Mike)
get_cpl requires vcpu_load, so we must cache the result (whether the vcpu was preempted when its cpl=0) in kvm_vcpu_arch.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Longpeng(Mike)
If a vcpu exits because it is spinning on a user-mode spinlock, the lock holder may have been preempted in either user mode or kernel mode. (Note that not all architectures trap spin loops in user mode; only AMD x86 and ARM/ARM64 currently do.) But if a vcpu exits in kernel mode, the holder must have been preempted in kernel mode, so we should choose a vcpu in kernel mode as the more likely candidate for the lock holder.

This introduces kvm_arch_vcpu_in_kernel() to decide whether a vcpu was in kernel mode when it was preempted; kvm_vcpu_on_spin's new argument says the same of the spinning vCPU.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
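A fragment sketching the resulting search loop in kvm_vcpu_on_spin(), with the boosting bookkeeping omitted; on x86, kvm_arch_vcpu_in_kernel() would return the cpl-at-preemption flag cached by the previous commit:

    kvm_for_each_vcpu(i, vcpu, kvm) {
        if (vcpu == me)
            continue;
        if (!READ_ONCE(vcpu->preempted))
            continue;
        /* The new filter: a vcpu preempted in user mode cannot hold a
         * kernel-mode lock, so skip it when the spinner was in the
         * kernel. */
        if (yield_to_kernel_mode && !kvm_arch_vcpu_in_kernel(vcpu))
            continue;
        if (kvm_vcpu_yield_to(vcpu) > 0)
            break;
    }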
-
- 07 Aug 2017, 8 commits
-
By Radim Krčmář
Add guest_cpuid_clear() and use it instead of kvm_find_cpuid_entry(). Also replace some uses of kvm_find_cpuid_entry() with guest_cpuid_has().

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Radim Krčmář
This patch turns guest_cpuid_has_XYZ(cpuid) into guest_cpuid_has(cpuid, X86_FEATURE_XYZ), which gets rid of many very similar helpers. Given an X86_FEATURE_*, we can tell which CPUID leaf and register it belongs to, but this information isn't available in common code, so we recreate it for KVM; a sketch of the mapping follows. Some BUILD_BUG_ONs make sure the mapping is used consistently.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
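A condensed sketch of how such a reverse mapping can work; the table entries and helper shown here are illustrative, not KVM's exact code:

    struct cpuid_reg {
        u32 function;
        u32 index;
        int reg; /* which of EAX/EBX/ECX/EDX the feature word came from */
    };

    static const struct cpuid_reg reverse_cpuid[] = {
        [CPUID_1_EDX]   = { 1, 0, CPUID_EDX },
        [CPUID_1_ECX]   = { 1, 0, CPUID_ECX },
        [CPUID_7_0_EBX] = { 7, 0, CPUID_EBX },
        /* ... one entry per feature word KVM exposes ... */
    };

    static bool guest_cpuid_has(struct kvm_vcpu *vcpu, unsigned int x86_feature)
    {
        /* X86_FEATURE_* encodes word * 32 + bit. */
        const struct cpuid_reg c = reverse_cpuid[x86_feature / 32];
        struct kvm_cpuid_entry2 *entry;
        u32 *reg;

        entry = kvm_find_cpuid_entry(vcpu, c.function, c.index);
        if (!entry)
            return false;
        reg = &entry->eax + c.reg; /* eax..edx are laid out contiguously */
        return *reg & (1u << (x86_feature % 32));
    }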
-
By Bandan Das
When L2 uses VMFUNC, L0 utilizes the associated vmexit to emulate the switching of the EPT pointer by reloading the guest MMU.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
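A sketch of the emulation path described above, assuming the vmcs12 field names used elsewhere in this log; error handling and the accessed/dirty details are omitted:

    static int emulate_eptp_switch(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
    {
        u32 index = kvm_register_read(vcpu, VCPU_REGS_RCX);
        u64 new_eptp;

        if (index >= 512) /* the EPTP list page holds 512 8-byte entries */
            return 1;     /* failure is reflected to L1 */

        /* Fetch the selected pointer from L1's EPTP list page. */
        if (kvm_vcpu_read_guest_page(vcpu,
                                     vmcs12->eptp_list_address >> PAGE_SHIFT,
                                     &new_eptp, index * 8, 8))
            return 1;

        if (vmcs12->ept_pointer != new_eptp) {
            if (!valid_ept_address(vcpu, new_eptp))
                return 1;
            /* Reload the guest MMU so shadow state follows the new
             * EPT root. */
            kvm_mmu_unload(vcpu);
            vmcs12->ept_pointer = new_eptp;
            kvm_mmu_reload(vcpu);
        }
        return 0;
    }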
-
By Bandan Das
Expose VMFUNC in MSRs and VMCS fields. No actual VMFUNCs are enabled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
By Bandan Das
Enable VMFUNC in the secondary execution controls. This simplifies the changes necessary to expose it to nested hypervisors. VMFUNCs still cause a #UD when invoked.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
By David Hildenbrand
Let's also just use the underlying functions directly here.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
[Rebased on top of 9f744c59 ("KVM: nVMX: do not pin the VMCS12")]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
By David Hildenbrand
nested_get_page() just sounds confusing. All we want is a page from G1 (the L1 guest); this isn't even related to nesting. Let's introduce kvm_vcpu_gpa_to_page() so we don't get overly long lines.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[Squash pasto fix from Wanpeng Li. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
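The new helper is a thin wrapper over the existing gfn-based API; a sketch consistent with the commit text:

    static inline struct page *kvm_vcpu_gpa_to_page(struct kvm_vcpu *vcpu,
                                                    gpa_t gpa)
    {
        return kvm_vcpu_gfn_to_page(vcpu, gpa_to_gfn(gpa));
    }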
-
By Paolo Bonzini
Expose the "Enable INVPCID" secondary execution control to the guest and properly reflect the exit reason. In addition, before this patch the guest was always running with INVPCID enabled, causing pcid.flat's "Test on INVPCID when disabled" test to fail.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 03 Aug 2017, 4 commits
-
By Wanpeng Li
    ------------[ cut here ]------------
    WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
    CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
    RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
    Call Trace:
    vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
    ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
    kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
    ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
    ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
    kvm_vcpu_ioctl+0x340/0x700 [kvm]
    ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
    ? __fget+0xfc/0x210
    do_vfs_ioctl+0xa4/0x6a0
    ? __fget+0x11d/0x210
    SyS_ioctl+0x79/0x90
    do_syscall_64+0x8f/0x750
    ? trace_hardirqs_on_thunk+0x1a/0x1c
    entry_SYSCALL64_slow_path+0x25/0x25

This can be reproduced by booting the L1 guest with the 'noapic' grub parameter, which tells the kernel not to make use of any IOAPICs that may be present in the system.

The external_intr variable in nested_vmx_vmexit() is actually the req_int_win variable passed from vcpu_enter_guest(), which means that L0's userspace has requested an irq window. I observed the scenario (!kvm_cpu_has_interrupt(vcpu) && L0's userspace requests an irq window) to be true, so there is no interrupt that L1 needs to inject into L2; we should not attempt to emulate "Acknowledge interrupt on exit" merely to satisfy the irq window request in this scenario.

This patch fixes it by not attempting to emulate "Acknowledge interrupt on exit" if there is no interrupt that L1 needs to inject into L2.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[Added a code comment to make it obvious that the behavior is not correct. We should do a userspace exit with an open interrupt window instead of the nested VM exit. This patch still improves the behavior, so it was accepted as a (temporary) workaround.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
By David Matlack
The host physical addresses of L1's Virtual APIC Page and Posted Interrupt descriptor are loaded into the VMCS02. The CPU may write to these pages via their host physical address while L2 is running, bypassing address-translation-based dirty tracking (e.g. EPT write protection). Mark them dirty on every exit from L2 to prevent them from getting out of sync with dirty tracking.

Also mark the virtual APIC page and the posted interrupt descriptor dirty when KVM is virtualizing posted interrupt processing.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
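A sketch of the exit-time hook this describes, using KVM's gfn-based dirty-tracking API; the function name and control checks are assumptions based on the text:

    static void nested_mark_vmcs12_pages_dirty(struct kvm_vcpu *vcpu)
    {
        struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
        gfn_t gfn;

        /* The APIC-access page is never written by the CPU, so only the
         * virtual-APIC page and the PI descriptor need this. */
        if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) {
            gfn = vmcs12->virtual_apic_page_addr >> PAGE_SHIFT;
            kvm_vcpu_mark_page_dirty(vcpu, gfn);
        }

        if (nested_cpu_has_posted_intr(vmcs12)) {
            gfn = vmcs12->posted_intr_desc_addr >> PAGE_SHIFT;
            kvm_vcpu_mark_page_dirty(vcpu, gfn);
        }
    }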
-
By David Matlack
According to the Intel SDM, software cannot rely on the current VMCS to be coherent after a VMXOFF or shutdown. So this is a valid way to handle VMCS12 flushes.

    24.11.1 Software Use of Virtual-Machine Control Structures
    ...
    If a logical processor leaves VMX operation, any VMCSs active on
    that logical processor may be corrupted (see below). To prevent
    such corruption of a VMCS that may be used either after a return
    to VMX operation or on another logical processor, software should
    execute VMCLEAR for that VMCS before executing the VMXOFF
    instruction or removing power from the processor (e.g., as part
    of a transition to the S3 and S4 power states).
    ...

This fixes a "suspicious rcu_dereference_check() usage!" warning during kvm_vm_release() because nested_release_vmcs12() calls kvm_vcpu_write_guest_page() without holding kvm->srcu.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
By Paolo Bonzini
Since the current implementation of VMCS12 does a memcpy in and out of guest memory, we do not need current_vmcs12 and current_vmcs12_page anymore; current_vmptr is enough to read and write the VMCS12.

And David Matlack noted: this patch also fixes dirty tracking (memslot->dirty_bitmap) of the VMCS12 page by using kvm_write_guest; nested_release_page() only marks the struct page dirty.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[Added David Matlack's note and nested_release_page_clean() fix.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
- 02 Aug 2017, 2 commits
-
By Paolo Bonzini
There are three issues in nested_vmx_check_exception:

1) it is not taking PFEC_MATCH/PFEC_MASK into account, as reported by Wanpeng Li;

2) it should rebuild the interruption info and exit qualification fields from scratch, as reported by Jim Mattson, because the values from the L2->L0 vmexit may be invalid (e.g. if an emulated instruction causes a page fault, the EPT misconfig's exit qualification is incorrect);

3) CR2 and DR6 should not be written for exception intercept vmexits (CR2 only for AMD).

This patch fixes the first two and adds a comment about the last, outlining the fix.

Cc: Jim Mattson <jmattson@google.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Paolo Bonzini
Do this in the caller of nested_vmx_vmexit instead. nested_vmx_check_exception was doing a vmwrite to the vmcs02's VM_EXIT_INTR_ERROR_CODE field, so that prepare_vmcs12 would move the field to vmcs12->vm_exit_intr_error_code. However, that isn't possible on pre-Haswell machines. Moving the vmcs12 write to the callers fixes it.

Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Changed nested_vmx_reflect_vmexit() return type to (int)1 from (bool)1, thanks to fengguang.wu@intel.com]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
- 27 Jul 2017, 2 commits
-
By Wanpeng Li
Run kvm-unit-tests/eventinj.flat in L1 with ept=0 on both L0 and L1:

    Before NMI IRET test
    Sending NMI to self
    NMI isr running stack 0x461000
    Sending nested NMI to self
    After nested NMI to self
    Nested NMI isr running rip=40038e
    After iret
    After NMI to self
    FAIL: NMI

Commit 4c4a6f79 ("KVM: nVMX: track NMI blocking state separately for each VMCS") tracks NMI blocking state separately for vmcs01 and vmcs02. However, it is not enough:

- The L2 guest (kvm-unit-tests/eventinj.flat) generates an NMI that will fault on IRET, so the L2 can generate a #PF which can be intercepted by L0.
- L0 walks L1's guest page table and sees that the mapping is invalid; it resumes the L1 guest and injects the #PF into L1. At this point the vmcs02 has nmi_known_unmasked=true.
- L1 sets bit 3 (blocking by NMI) in the interruptibility-state field of vmcs12 (and fixes the shadow page table) before resuming the L2 guest.
- L1 executes VMRESUME to resume L2, causing a vmexit to L0.
- During VMRESUME emulation, prepare_vmcs02 sets bit 3 in the interruptibility-state field of vmcs02, but nmi_known_unmasked is still true.
- L2 immediately exits to L0 with another page fault, because L0 still has not updated the NGVA->HPA page tables. However, nmi_known_unmasked is true, so vmx_recover_nmi_blocking does not do anything.

The fix is to update nmi_known_unmasked when preparing vmcs02 from vmcs12.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Wincy Van
The PI vectors for L0 and L1 must be different. If the destination vcpu0 is in guest mode while vcpu1 is delivering a non-nested PI to vcpu0, there won't be any vmexit, so the non-nested interrupt will be delayed.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 24 Jul 2017, 1 commit
-
By Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 20 Jul 2017, 1 commit
-
By Wanpeng Li
This can be reproduced with EPT=1, unrestricted_guest=N and emulate_invalid_state=Y, or with EPT=0. The trace of kvm-unit-tests/taskswitch2.flat looks like the following while KVM tries to emulate an invalid-guest-state task switch:

    kvm_exit: reason TASK_SWITCH rip 0x0 info 40000058 0
    kvm_emulate_insn: 42000:0:0f 0b (0x2)
    kvm_emulate_insn: 42000:0:0f 0b (0x2) failed
    kvm_inj_exception: #UD (0x0)
    kvm_entry: vcpu 0
    kvm_exit: reason TASK_SWITCH rip 0x0 info 40000058 0
    kvm_emulate_insn: 42000:0:0f 0b (0x2)
    kvm_emulate_insn: 42000:0:0f 0b (0x2) failed
    kvm_inj_exception: #UD (0x0)
    ......................

It appears that the task-switch emulation updates rflags (and the vm86 flag) only after the segments are loaded, causing vmx->emulation_required to be set, when in fact invalid-guest-state emulation is not needed. This patch fixes it by updating vmx->emulation_required after rflags (and the vm86 flag) is updated in task-switch emulation. Thanks to Radim for moving the update into vmx_set_rflags() and adding Paolo's suggestion for the check.

Suggested-by: Nadav Amit <nadav.amit@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
- 19 Jul 2017, 2 commits
-
By Jim Mattson
Immediately following MOV-to-SS/POP-to-SS, VM entry is disallowed. This check comes after the check for a valid VMCS. When this check fails, the instruction pointer should fall through to the next instruction, the ALU flags should be set to indicate VMfailValid, and the VM-instruction error should be set to 26 ("VM entry with events blocked by MOV SS").

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
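A fragment sketching the added check, assuming KVM's usual helpers for the interruptibility state and for signaling VMfailValid; its placement after the valid-VMCS check follows the text above:

    if (vmx_get_interrupt_shadow(vcpu) & GUEST_INTR_STATE_MOV_SS) {
        nested_vmx_failValid(vcpu,
                             VMXERR_ENTRY_EVENTS_BLOCKED_BY_MOV_SS);
        return kvm_skip_emulated_instruction(vcpu);
    }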
-
By Paolo Bonzini
vmx_recover_nmi_blocking is using a cached value of the guest interruptibility info, which is stored in vmx->nmi_known_unmasked. vmx_recover_nmi_blocking is run for both normal and nested guests, so the cached value must be per-VMCS.

This fixes eventinj.flat in a nested non-EPT environment. With EPT it works, because the EPT violation handler doesn't have the vmx->nmi_known_unmasked optimization (it is unnecessary because, unlike vmx_recover_nmi_blocking, it can just look at the exit qualification).

Thanks to Wanpeng Li for debugging the testcase and providing an initial patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
- 14 Jul 2017, 4 commits
-
By Wanpeng Li
Add another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1, async page faults are delivered to L1 as #PF vmexits; if bit 2 is 0, kvm_can_do_async_pf returns 0 when in guest mode.

This is similar to what svm.c wanted to do all along, but it is only enabled for Linux as the L1 hypervisor. Foreign hypervisors must never receive async page faults as vmexits, because they'd probably be very confused about that.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
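A sketch of the guest-side opt-in this enables; the flag name follows the series, while the registration code around it is illustrative and omits the feature/CPUID gating a real guest would want:

    #define MSR_KVM_ASYNC_PF_EN                0x4b564d02
    #define KVM_ASYNC_PF_ENABLED               (1 << 0)
    #define KVM_ASYNC_PF_SEND_ALWAYS           (1 << 1)
    #define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT (1 << 2)

    u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));

    /* A Linux L1 that can handle #PF vmexits for async PFs sets bit 2;
     * a hypervisor that does not set it keeps the old behavior. */
    pa |= KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
    wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);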
-
By Wanpeng Li
Add a nested_apf field to vcpu->arch.exception to identify an async page fault, and construct the expected vm-exit information fields. Force a nested VM exit from nested_vmx_check_exception() if the injected #PF is an async page fault.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
By Wanpeng Li
This patch adds the L1 guest async page fault #PF vmexit handler: such a #PF is delivered to L1 as a vmexit from L2, and L1 then handles it like an ordinary async page fault.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[Passed insn parameters to kvm_mmu_page_fault().]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
By Wanpeng Li
This patch removes all arguments except the first from kvm_x86_ops->queue_exception, since the callbacks can extract the arguments from vcpu->arch.exception themselves.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
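A sketch of the interface change; the before/after prototypes follow the text, and the VMX callback body is abridged:

    /* before */
    void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
                            bool has_error_code, u32 error_code,
                            bool reinject);
    /* after */
    void (*queue_exception)(struct kvm_vcpu *vcpu);

    static void vmx_queue_exception(struct kvm_vcpu *vcpu)
    {
        /* The callback now pulls everything from the pending state. */
        unsigned nr = vcpu->arch.exception.nr;
        bool has_error_code = vcpu->arch.exception.has_error_code;
        u32 error_code = vcpu->arch.exception.error_code;
        /* ... assemble VM_ENTRY_INTR_INFO from these as before ... */
    }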
-
- 13 Jul 2017, 5 commits
-
By Jim Mattson
vmx_complete_atomic_exit should call kvm_machine_check for any VM-entry failure due to a machine-check event. Such an exit should be recognized solely by its basic exit reason (i.e. the low 16 bits of the VMCS exit-reason field): none of the other VMCS exit-information fields contain valid information when the VM exit is due to "VM-entry failure due to machine-check event".

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@tencent.com>
[Changed VM_EXIT_INTR_INFO condition to better describe its reason.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
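A fragment sketching the recognition rule: only the low 16 bits identify the machine-check exit, so it is caught even when bit 31 (VM-entry failure) is set:

    u16 basic_exit_reason = (u16)vmx->exit_reason;

    /* Valid even when VM entry failed (bit 31 of exit_reason set);
     * no other exit-information field may be consulted in that case. */
    if (basic_exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
        kvm_machine_check();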
-
By Jim Mattson
Inconsistencies result from shadowing only accesses to the full 64 bits of a 64-bit VMCS field while not shadowing accesses to the high 32 bits of that field. The "high" half of a 64-bit field should be shadowed whenever the full 64-bit field is shadowed.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Jim Mattson
Allow the L1 guest to specify the last page of addressable guest physical memory for an L2 MSR permission bitmap. Also remove the vmcs12_read_any() check that should never fail.

Fixes: 3af18d9c ("KVM: nVMX: Prepare for using hardware MSR bitmap")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
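Both this check and the I/O-bitmap check in the next commit reduce to the same page-address rule; a sketch of such a helper (the page_address_valid name follows KVM's style but is an assumption here):

    static bool page_address_valid(struct kvm_vcpu *vcpu, gpa_t gpa)
    {
        /* Page-aligned and within the guest's physical-address width;
         * this accepts the last addressable page, which the old
         * comparison wrongly rejected. */
        return PAGE_ALIGNED(gpa) && !(gpa >> cpuid_maxphyaddr(vcpu));
    }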
-
By Jim Mattson
According to the SDM, if the "use I/O bitmaps" VM-execution control is 1, bits 11:0 of each I/O-bitmap address must be 0, and neither address should set any bits beyond the processor's physical-address width.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Jim Mattson
The VMCS launch state is not set to "launched" unless the VMLAUNCH actually succeeds. VMLAUNCH failure includes VM exits with bit 31 set.

Note that this change does not address the general problem that a failure to launch/resume vmcs02 (i.e. vmx->fail) is not handled correctly.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 10 Jul 2017, 1 commit
-
By Paolo Bonzini
This exit ended up being reported, but the currently exposed data does not provide much of a starting point for debugging. In the reported case, the vmexit was an EPT misconfiguration (MMIO access). Let userspace report the exit qualification and, if relevant, the GPA.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 04 Jul 2017, 1 commit
-
By Haozhong Zhang
It's easier for host applications, such as QEMU, if they can always access guest MSR_IA32_BNDCFGS in the VMCS, even though MPX is disabled in the guest's cpuid.

Cc: stable@vger.kernel.org
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 03 Jul 2017, 2 commits
-
By Peter Feiner
EPT A/D was enabled in the vmcs02 EPTP regardless of the vmcs12's EPTP value. The problem is that enabling A/D changes the behavior of L2's x86 page-table walks as seen by L1: with A/D enabled, x86 page-table walks are always treated as EPT writes.

Commit ae1e2d10 ("kvm: nVMX: support EPT accessed/dirty bits", 2017-03-30) tried to work around this problem by clearing the write bit in the exit qualification for EPT violations triggered by page walks. However, that fixup introduced the opposite bug: page-table walks that actually set x86 A/D bits were *missing* the write bit in the exit qualification.

This patch fixes the problem by disabling EPT A/D in the shadow MMU when EPT A/D is disabled in vmcs12's EPTP.

Signed-off-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Peter Feiner
Specify both a mask (i.e., bits to consider) and a value (i.e., the pattern of bits that indicates a special PTE) for MMIO SPTEs. On Intel, this lets us pack even more information into the (SPTE_SPECIAL_MASK | EPT_VMX_RWX_MASK) mask we use for access tracking, liberating all (SPTE_SPECIAL_MASK | (non-misconfigured-RWX)) values.

Signed-off-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
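A sketch of the mask/value split for recognizing MMIO SPTEs; the shadow_mmio_* names follow the MMU's convention, and the setter's argument order is illustrative:

    static u64 shadow_mmio_mask;  /* which bits to consider */
    static u64 shadow_mmio_value; /* the pattern marking an MMIO SPTE */

    void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value)
    {
        shadow_mmio_mask = mmio_mask;
        shadow_mmio_value = mmio_value;
    }

    static bool is_mmio_spte(u64 spte)
    {
        /* Match a pattern instead of requiring every mask bit set. */
        return (spte & shadow_mmio_mask) == shadow_mmio_value;
    }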
-
- 30 Jun 2017, 2 commits
-
By Josh Poimboeuf
In preparation for an objtool rewrite which will have broader checks, whitelist functions and files which cause problems because they do unusual things with the stack. These whitelists serve as a TODO list for which functions and files don't yet have undwarf unwinder coverage. Eventually most of the whitelists can be removed in favor of manual CFI hint annotations or objtool improvements.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Wanpeng Li
If the TSC deadline timer is programmed really close to the deadline, or even in the past, the computation in vmx_set_hv_timer will program the absolute target TSC value into the VMCS preemption-timer field with delta == 0, then go through a pointless dance of a vmentry followed by an immediate VMX-preemption-timer vmexit; the lapic timer injection is delayed for that whole duration. The lapic timer emulated by hrtimer can actually handle this case correctly.

This patch fixes it by firing the lapic timer and injecting a timer interrupt immediately during the next vmentry if the TSC deadline timer is programmed really close to the deadline or even in the past. This saves ~300 cycles on the tsc_deadline_timer test of apic.flat.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-