1. 14 May 2020 (6 commits)
    • KVM: nVMX: Open a window for pending nested VMX preemption timer · d2060bd4
      By Sean Christopherson
      Add a kvm_x86_ops hook to detect a nested pending "hypervisor timer" and
      use it to effectively open a window for servicing the expired timer.
      Like pending SMIs on VMX, opening a window simply means requesting an
      immediate exit.
      
      This fixes a bug where an expired VMX preemption timer (for L2) will be
      delayed and/or lost if a pending exception is injected into L2.  The
      pending exception is rightly prioritized by vmx_check_nested_events()
      and injected into L2, with the preemption timer left pending.  Because
      no window opened, L2 is free to run uninterrupted.
      
      Fixes: f4124500 ("KVM: nVMX: Fully emulate preemption timer")
      Reported-by: Jim Mattson <jmattson@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Peter Shier <pshier@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200423022550.15113-3-sean.j.christopherson@intel.com>
      [Check it in kvm_vcpu_has_events too, to ensure that the preemption
       timer is serviced promptly even if the vCPU is halted and L1 is not
       intercepting HLT. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Preserve exception priority irrespective of exiting behavior · 6ce347af
      By Sean Christopherson
      Short circuit vmx_check_nested_events() if an exception is pending and
      needs to be injected into L2, as priority between coincident events is
      not dependent on exiting behavior.  This fixes a bug where a single-step
      #DB that is not intercepted by L1 is incorrectly dropped due to servicing
      a VMX Preemption Timer VM-Exit.
      
      Injected exceptions also need to be blocked if nested VM-Enter is
      pending or an exception was already injected, otherwise injecting the
      exception could overwrite an existing event injection from L1.
      Technically, this scenario should be impossible, i.e. KVM shouldn't
      inject its own exception during nested VM-Enter.  This will be addressed
      in a future patch.
      
      Note, event priority between SMI, NMI and INTR is incorrect for L2, e.g.
      SMI should take priority over VM-Exit on NMI/INTR, and NMI that is
      injected into L2 should take priority over VM-Exit INTR.  This will also
      be addressed in a future patch.
      
      Fixes: b6b8a145 ("KVM: nVMX: Rework interception of IRQs and NMIs")
      Reported-by: Jim Mattson <jmattson@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Peter Shier <pshier@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200423022550.15113-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Implement check_nested_events for NMI · 9c3d370a
      By Cathy Avery
      Migrate nested guest NMI intercept processing to the new
      check_nested_events mechanism.
      Signed-off-by: Cathy Avery <cavery@redhat.com>
      Message-Id: <20200414201107.22952-2-cavery@redhat.com>
      [Reorder clauses as NMIs have higher priority than IRQs; inject
       immediate vmexit as is now done for IRQ vmexits. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: immediately inject INTR vmexit · 6e085cbf
      By Paolo Bonzini
      We can immediately leave SVM guest mode in svm_check_nested_events
      now that we have the nested_run_pending mechanism.  This makes
      things easier because we can run the rest of inject_pending_event
      with GIF=0, and KVM will naturally end up requesting the next
      interrupt window.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: leave halted state on vmexit · 38c0b192
      By Paolo Bonzini
      Similar to VMX, we need to leave the halted state when performing a vmexit.
      Failure to do so will cause a hang after vmexit.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: introduce nested_run_pending · f74f9414
      By Paolo Bonzini
      We want to inject vmexits immediately from svm_check_nested_events,
      so that the interrupt/NMI window requests happen in inject_pending_event
      right after it returns.
      
      This, however, has the same issue as vmx_check_nested_events, so
      introduce a nested_run_pending flag with the same purpose of delaying
      vmexit injection until after the vmentry.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 13 May 2020 (1 commit)
    • KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c · 37486135
      By Babu Moger
      Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU
      resource isn't: it can be read with XSAVE and written with XRSTOR.
      So, if we don't set the guest PKRU value in kvm_load_guest_xsave_state,
      the guest can read the host value.
      
      Conversely, in kvm_load_host_xsave_state, a guest with CR4.PKE clear
      could potentially use XRSTOR to change the host PKRU value.
      
      While at it, move pkru state save/restore to common code and the
      host_pkru field to kvm_vcpu_arch.  This will let SVM support protection keys.
      
      Cc: stable@vger.kernel.org
      Reported-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Babu Moger <babu.moger@amd.com>
      Message-Id: <158932794619.44260.14508381096663848853.stgit@naples-babu.amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 08 May 2020 (5 commits)
    • KVM: SVM: Disable AVIC before setting V_IRQ · 7d611233
      By Suravee Suthikulpanit
      Commit 64b5bd27 ("KVM: nSVM: ignore L1 interrupt window
      while running L2 with V_INTR_MASKING=1") introduced a WARN_ON
      that checks whether AVIC is enabled when V_IRQ is set
      in the VMCB to open an interrupt window.
      
      The warning below is triggered because the vCPU requesting
      AVIC deactivation does not get to process the APICv update request
      for itself until the next #vmexit.
      
      WARNING: CPU: 0 PID: 118232 at arch/x86/kvm/svm/svm.c:1372 enable_irq_window+0x6a/0xa0 [kvm_amd]
       RIP: 0010:enable_irq_window+0x6a/0xa0 [kvm_amd]
       Call Trace:
        kvm_arch_vcpu_ioctl_run+0x6e3/0x1b50 [kvm]
        ? kvm_vm_ioctl_irq_line+0x27/0x40 [kvm]
        ? _copy_to_user+0x26/0x30
        ? kvm_vm_ioctl+0xb3e/0xd90 [kvm]
        ? set_next_entity+0x78/0xc0
        kvm_vcpu_ioctl+0x236/0x610 [kvm]
        ksys_ioctl+0x8a/0xc0
        __x64_sys_ioctl+0x1a/0x20
        do_syscall_64+0x58/0x210
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fix this by sending the APICv update request to all other vCPUs, and
      updating APICv immediately for the requesting vCPU itself.
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Link: https://lkml.org/lkml/2020/5/2/167
      Fixes: 64b5bd27 ("KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1")
      Message-Id: <1588818939-54264-1-git-send-email-suravee.suthikulpanit@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Introduce kvm_make_all_cpus_request_except() · 54163a34
      By Suravee Suthikulpanit
      This allows making a request to all vCPUs except the one
      specified as a parameter.
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Message-Id: <1588771076-73790-2-git-send-email-suravee.suthikulpanit@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: pass correct DR6 for GD userspace exit · 45981ded
      By Paolo Bonzini
      When KVM_EXIT_DEBUG is raised for the disabled-breakpoints case (DR7.GD),
      DR6 was incorrectly copied from the value in the VM.  Instead,
      DR6.BD should be set in order to catch this case.
      
      On AMD this does not need any special code because the processor triggers
      a #DB exception that is intercepted.  However, the testcase would fail
      without the previous patch because both DR6.BS and DR6.BD would be set.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6 · d67668e9
      By Paolo Bonzini
      There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the
      different handling of DR6 on intercepted #DB exceptions on Intel and AMD.
      
      On Intel, #DB exceptions transmit the DR6 value via the exit qualification
      field of the VMCS, and the exit qualification only contains the description
      of the precise event that caused a vmexit.
      
      On AMD, instead, the DR6 field of the VMCB is filled in as if the #DB
      exception were to be injected into the guest.  This has two effects
      when guest debugging is in use:
      
      * the guest DR6 is clobbered
      
      * the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather
      than just the last one that happened (the testcase in the next patch covers
      this issue).
      
      This patch fixes both issues by emulating, so to speak, the Intel behavior
      on AMD processors.  The important observation is that (after the previous
      patches) the VMCB value of DR6 is only ever observable from the guest if
      KVM_DEBUGREG_WONT_EXIT is set.  Therefore we can actually set vmcb->save.dr6
      to any value we want as long as KVM_DEBUGREG_WONT_EXIT is clear, which it
      will be if guest debugging is enabled.
      
      Therefore it is possible to enter the guest with an all-zero DR6,
      reconstruct the #DB payload from the DR6 we get at exit time, and let
      kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6.
      Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT
      is set, but this is harmless.
      
      This may not be the most optimized way to deal with this, but it is
      simple and, being confined within SVM code, it gets rid of the set_dr6
      callback and kvm_update_dr6.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6 · 5679b803
      By Paolo Bonzini
      kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the
      second argument.  Ensure that the VMCB value is synchronized to
      vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so
      that the current value of DR6 is always available in vcpu->arch.dr6.
      The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 07 May 2020 (5 commits)
  5. 06 May 2020 (2 commits)
  6. 05 May 2020 (2 commits)
    • kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts · 8be8f932
      By Paolo Bonzini
      Commit f458d039 ("kvm: ioapic: Lazy update IOAPIC EOI") introduces
      the following infinite loop:
      
      BUG: stack guard page was hit at 000000008f595917 \
      (stack is 00000000bdefe5a4..00000000ae2b06f5)
      kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
      RIP: 0010:kvm_set_irq+0x51/0x160 [kvm]
      Call Trace:
       irqfd_resampler_ack+0x32/0x90 [kvm]
       kvm_notify_acked_irq+0x62/0xd0 [kvm]
       kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
       ioapic_set_irq+0x20e/0x240 [kvm]
       kvm_ioapic_set_irq+0x5c/0x80 [kvm]
       kvm_set_irq+0xbb/0x160 [kvm]
       ? kvm_hv_set_sint+0x20/0x20 [kvm]
       irqfd_resampler_ack+0x32/0x90 [kvm]
       kvm_notify_acked_irq+0x62/0xd0 [kvm]
       kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
       ioapic_set_irq+0x20e/0x240 [kvm]
       kvm_ioapic_set_irq+0x5c/0x80 [kvm]
       kvm_set_irq+0xbb/0x160 [kvm]
       ? kvm_hv_set_sint+0x20/0x20 [kvm]
      ....
      
      The re-entrancy happens because the irq state is the OR of
      the interrupt state and the resamplefd state.  That is, we don't
      want to show the state as 0 until we've had a chance to set the
      resamplefd.  But if the interrupt has _not_ gone low then
      ioapic_set_irq is invoked again, causing an infinite loop.
      
      This can only happen for a level-triggered interrupt, otherwise
      irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID high
      and then low.  Fortunately, in the case of level-triggered interrupts
      the VMEXIT already happens because TMR is set.  Thus, fix the bug by
      restricting the lazy invocation of the ack notifier to edge-triggered
      interrupts, the only ones that need it.
      Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Reported-by: borisvk@bstnet.org
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://www.spinics.net/lists/kvm/msg213512.html
      Fixes: f458d039 ("kvm: ioapic: Lazy update IOAPIC EOI")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207489
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Fixes posted interrupt check for IRQs delivery modes · 637543a8
      By Suravee Suthikulpanit
      Current logic incorrectly uses the enum ioapic_irq_destination_types
      to check the posted interrupt destination types. However, the value was
      set using APIC_DM_XXX macros, which are left-shifted by 8 bits.
      
      Fix this by using APIC_DM_FIXED and APIC_DM_LOWEST instead.
      
      Fixes: fdcf7562 ("KVM: x86: Disable posted interrupts for non-standard IRQs delivery modes")
      Cc: Alexander Graf <graf@amazon.com>
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Message-Id: <1586239989-58305-1-git-send-email-suravee.suthikulpanit@amd.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 04 May 2020 (2 commits)
  8. 01 May 2020 (1 commit)
    • KVM: arm64: Fix 32bit PC wrap-around · 0225fd5e
      By Marc Zyngier
      In the unlikely event that a 32bit vcpu traps into the hypervisor
      on an instruction that is located right at the end of the 32bit
      range, the emulation of that instruction is going to increment
      PC past the 32bit range. This isn't great, as userspace can then
      observe this value and get a bit confused.
      
      Conversely, userspace can do things like (in the context of a 64bit
      guest that is capable of 32bit EL0) setting PSTATE to AArch64-EL0,
      set PC to a 64bit value, change PSTATE to AArch32-USR, and observe
      that PC hasn't been truncated. More confusion.
      
      Fix both by:
      - truncating PC increments for 32bit guests
      - sanitizing all 32bit regs every time a core reg is changed by
        userspace, and that PSTATE indicates a 32bit mode.
      
      Cc: stable@vger.kernel.org
      Acked-by: Will Deacon <will@kernel.org>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
  9. 30 April 2020 (2 commits)
  10. 25 April 2020 (2 commits)
  11. 24 April 2020 (1 commit)
  12. 23 April 2020 (3 commits)
  13. 21 April 2020 (8 commits)
    • KVM: SVM: avoid infinite loop on NPF from bad address · e72436bc
      By Paolo Bonzini
      When a nested page fault is taken from an address that does not have
      a memslot associated with it, kvm_mmu_do_page_fault returns RET_PF_EMULATE
      (via mmu_set_spte) and kvm_mmu_page_fault then invokes svm_need_emulation_on_page_fault.
      
      The default answer there is to return false, but in this case this just
      causes the page fault to be retried ad libitum.  Since this is not a
      fast path, and the only other case where it is taken is an erratum,
      just stick a kvm_vcpu_gfn_to_memslot check in there to detect the
      common case where the erratum is not happening.
      
      This fixes an infinite loop in the new set_memory_region_test.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Remove redundant argument to kvm_arch_vcpu_ioctl_run · 1b94f6f8
      By Tianjia Zhang
      In earlier versions of kvm, 'kvm_run' was an independent structure
      and was not included in the vcpu structure. At present, 'kvm_run'
      is already included in the vcpu structure, so the parameter
      'kvm_run' is redundant.
      
      This patch simplifies the function definition, removes the extra
      'kvm_run' parameter, and extracts it from the 'kvm_vcpu' structure
      if necessary.
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Message-Id: <20200416051057.26526-1-tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Check for CR0.CD and CR0.NW on VMRUN of nested guests · 4f233371
      By Krish Sadhukhan
      According to section "Canonicalization and Consistency Checks" in APM vol. 2,
      the following guest state combination is illegal:
      
      	"CR0.CD is zero and CR0.NW is set"
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Message-Id: <20200409205035.16830-2-krish.sadhukhan@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Improve latency for single target IPI fastpath · a9ab13ff
      By Wanpeng Li
      In our cloud environment we observe that IPIs and the timer account
      for most MSR-write vmexits, so optimize virtual IPI latency more
      aggressively to inject the target IPI as soon as possible.
      
      Running the kvm-unit-tests/vmexit.flat IPI test on an SKX server,
      with the adaptive advance lapic timer and adaptive halt-polling
      disabled to avoid interference, this patch gives another 7%
      improvement:
      
      w/o fastpath   -> x86.c fastpath      4238 -> 3543  16.4%
      x86.c fastpath -> vmx.c fastpath      3543 -> 3293     7%
      w/o fastpath   -> vmx.c fastpath      4238 -> 3293  22.3%
      
      Cc: Haiwei Li <lihaiwei@tencent.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200410174703.1138-3-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Optimize handling of VM-Entry failures in vmx_vcpu_run() · 873e1da1
      By Sean Christopherson
      Mark the VM-Fail, VM-Exit on VM-Enter, and #MC on VM-Enter paths as
      'unlikely' so as to improve code generation so that it favors successful
      VM-Enter.  The performance of successful VM-Enter is far more important,
      irrespective of whether or not success is actually likely.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200410174703.1138-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Remove non-functional "support" for CR3 target values · b8d295f9
      By Sean Christopherson
      Remove all references to cr3_target_value[0-3] and replace the fields
      in vmcs12 with "dead_space" to preserve the vmcs12 layout.  KVM doesn't
      support emulating CR3-target values, despite a variety of code that
      implies otherwise, as KVM unconditionally reports '0' for the number of
      supported CR3-target values.
      
      This technically fixes a bug where KVM would incorrectly allow VMREAD
      and VMWRITE to nonexistent fields, i.e. cr3_target_value[0-3].  Per
      Intel's SDM, the number of supported CR3-target values reported in
      VMX_MISC also enumerates the existence of the associated VMCS fields:
      
        If a future implementation supports more than 4 CR3-target values, they
        will be encoded consecutively following the 4 encodings given here.
      
      Alternatively, the "bug" could be fixed by actually advertising support
      for 4 CR3-target values, but that'd likely just enable kvm-unit-tests
      given that no one has complained about lack of support for going on ten
      years, e.g. KVM, Xen and HyperV don't use CR3-target values.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200416000739.9012-1-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Avoid an extra memslot lookup in try_async_pf() for L2 · c36b7150
      By Paolo Bonzini
      Create a new function kvm_is_visible_memslot() and use it from
      kvm_is_visible_gfn(); use the new function in try_async_pf() too,
      to avoid an extra memslot lookup.
      
      Opportunistically squish a multi-line comment into a single-line comment.
      
      Note, the end result, KVM_PFN_NOSLOT, is unchanged.
      
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Set @writable to false for non-visible accesses by L2 · c583eed6
      By Sean Christopherson
      Explicitly set @writable to false in try_async_pf() if the GFN->PFN
      translation is short-circuited due to the requested GFN not being
      visible to L2.
      
      Leaving @writable ('map_writable' in the callers) uninitialized is ok
      in that it's never actually consumed, but one has to track it all the
      way through set_spte() being short-circuited by set_mmio_spte() to
      understand that the uninitialized variable is benign, and relying on
      @writable being ignored is an unnecessary risk.  Explicitly setting
      @writable also aligns try_async_pf() with __gfn_to_pfn_memslot().
      
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200415214414.10194-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>