提交 · a05950009f50ca971a1d616655d01628177bd2e3 · openeuler / Kernel

13 9月, 2017 2 次提交

KVM: x86: Fix handling of pending signal on uninitialized AP · a0595000

由 Jan H. Schönherr 提交于 9月 06, 2017

KVM API says that KVM_RUN will return with -EINTR when a signal is
pending. However, if a vCPU is in KVM_MP_STATE_UNINITIALIZED, then
the return value is unconditionally -EAGAIN.

Copy over some code from vcpu_run(), so that the case of a pending
signal results in the expected return value.
Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

a0595000

KVM: SVM: Add a missing 'break' statement · 49a8afca

由 Jan H. Schönherr 提交于 9月 05, 2017

Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
Fixes: f6511935 ("KVM: SVM: Add checks for IO instructions")
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

49a8afca

01 9月, 2017 1 次提交

KVM: update to new mmu_notifier semantic v2 · fb1522e0

由 Jérôme Glisse 提交于 8月 31, 2017

Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()

Remove now useless invalidate_page callback.

Changed since v1 (Linus Torvalds)
    - remove now useless kvm_arch_mmu_notifier_invalidate_page()
Signed-off-by: NJérôme Glisse <jglisse@redhat.com>
Tested-by: NMike Galbraith <efault@gmx.de>
Tested-by: NAdam Borowski <kilobyte@angband.pl>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fb1522e0

29 8月, 2017 1 次提交

x86/idt: Unify gate_struct handling for 32/64-bit kernels · 64b163fa

由 Thomas Gleixner 提交于 8月 28, 2017

The first 32 bits of gate struct are the same for 32 and 64 bit kernels.

The 32-bit version uses desc_struct and no designated data structure,
so we need different accessors for 32 and 64 bit kernels.

Aside of that the macros which are necessary to build the 32-bit
gate descriptor are horrible to read.

Unify the gate structs and switch all code fiddling with it over.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.861974317@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

64b163fa

26 8月, 2017 1 次提交

kvm/x86: Avoid clearing the C-bit in rsvd_bits() · ea2800dd

由 Brijesh Singh 提交于 8月 25, 2017

The following commit:

  d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")

uses __sme_clr() to remove the C-bit in rsvd_bits(). rsvd_bits() is
just a simple function to return some 1 bits. Applying a mask based
on properties of the host MMU is incorrect. Additionally, the masks
computed by __reset_rsvds_bits_mask also apply to guest page tables,
where the C bit is reserved since we don't emulate SME.

The fix is to clear the C-bit from rsvd_bits_mask array after it has been
populated from __reset_rsvds_bits_mask()
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: paolo.bonzini@gmail.com
Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Link: http://lkml.kernel.org/r/20170825205540.123531-1-brijesh.singh@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

ea2800dd

25 8月, 2017 14 次提交

kvm: nVMX: Validate the virtual-APIC address on nested VM-entry · 712b12d7

由 Jim Mattson 提交于 8月 24, 2017

According to the SDM, if the "use TPR shadow" VM-execution control is
1, bits 11:0 of the virtual-APIC address must be 0 and the address
should set any bits beyond the processor's physical-address width.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

712b12d7

KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state · 38cfd5e3

由 Paolo Bonzini 提交于 8月 23, 2017

The host pkru is restored right after vcpu exit (commit 1be0e61c), so
KVM_GET_XSAVE will return the host PKRU value instead.  Fix this by
using the guest PKRU explicitly in fill_xsave and load_xsave.  This
part is based on a patch by Junkang Fu.

The host PKRU data may also not match the value in vcpu->arch.guest_fpu.state,
because it could have been changed by userspace since the last time
it was saved, so skip loading it in kvm_load_guest_fpu.
Reported-by: NJunkang Fu <junkang.fjk@alibaba-inc.com>
Cc: Yang Zhang <zy107165@alibaba-inc.com>
Fixes: 1be0e61c
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

38cfd5e3

KVM: x86: simplify handling of PKRU · b9dd21e1

由 Paolo Bonzini 提交于 8月 23, 2017

Move it to struct kvm_arch_vcpu, replacing guest_pkru_valid with a
simple comparison against the host value of the register.  The write of
PKRU in addition can be skipped if the guest has not enabled the feature.
Once we do this, we need not test OSPKE in the host anymore, because
guest_CR4.PKE=1 implies host_CR4.PKE=1.

The static PKU test is kept to elide the code on older CPUs.
Suggested-by: NYang Zhang <zy107165@alibaba-inc.com>
Fixes: 1be0e61c
Cc: stable@vger.kernel.org
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b9dd21e1

KVM: x86: block guest protection keys unless the host has them enabled · c469268c

由 Paolo Bonzini 提交于 8月 24, 2017

If the host has protection keys disabled, we cannot read and write the
guest PKRU---RDPKRU and WRPKRU fail with #GP(0) if CR4.PKE=0.  Block
the PKU cpuid bit in that case.

This ensures that guest_CR4.PKE=1 implies host_CR4.PKE=1.

Fixes: 1be0e61c
Cc: stable@vger.kernel.org
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c469268c

KVM: nVMX: Fix trying to cancel vmlauch/vmresume · bfcf83b1

由 Wanpeng Li 提交于 8月 24, 2017

------------[ cut here ]------------
WARNING: CPU: 7 PID: 3861 at /home/kernel/ssd/kvm/arch/x86/kvm//vmx.c:11299 nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
CPU: 7 PID: 3861 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0-rc4+ #11
RIP: 0010:nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
Call Trace:
 ? kvm_multiple_exception+0x149/0x170 [kvm]
 ? handle_emulation_failure+0x79/0x230 [kvm]
 ? load_vmcs12_host_state+0xa80/0xa80 [kvm_intel]
 ? check_chain_key+0x137/0x1e0
 ? reexecute_instruction.part.168+0x130/0x130 [kvm]
 nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
 ? nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
 vmx_queue_exception+0x197/0x300 [kvm_intel]
 kvm_arch_vcpu_ioctl_run+0x1b0c/0x2c90 [kvm]
 ? kvm_arch_vcpu_runnable+0x220/0x220 [kvm]
 ? preempt_count_sub+0x18/0xc0
 ? restart_apic_timer+0x17d/0x300 [kvm]
 ? kvm_lapic_restart_hv_timer+0x37/0x50 [kvm]
 ? kvm_arch_vcpu_load+0x1d8/0x350 [kvm]
 kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
 ? kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
 ? kvm_dev_ioctl+0xbe0/0xbe0 [kvm]

The flag "nested_run_pending", which can override the decision of which should run
next, L1 or L2. nested_run_pending=1 means that we *must* run L2 next, not L1. This
is necessary in particular when L1 did a VMLAUNCH of L2 and therefore expects L2 to
be run (and perhaps be injected with an event it specified, etc.). Nested_run_pending
is especially intended to avoid switching  to L1 in the injection decision-point.

This can be handled just like the other cases in vmx_check_nested_events, instead of
having a special case in vmx_queue_exception.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bfcf83b1

KVM: X86: Fix loss of exception which has not yet been injected · 664f8e26

由 Wanpeng Li 提交于 8月 24, 2017

vmx_complete_interrupts() assumes that the exception is always injected,
so it can be dropped by kvm_clear_exception_queue().  However,
an exception cannot be injected immediately if it is: 1) originally
destined to a nested guest; 2) trapped to cause a vmexit; 3) happening
right after VMLAUNCH/VMRESUME, i.e. when nested_run_pending is true.

This patch applies to exceptions the same algorithm that is used for
NMIs, replacing exception.reinject with "exception.injected" (equivalent
to nmi_injected).

exception.pending now represents an exception that is queued and whose
side effects (e.g., update RFLAGS.RF or DR7) have not been applied yet.
If exception.pending is true, the exception might result in a nested
vmexit instead, too (in which case the side effects must not be applied).

exception.injected instead represents an exception that is going to be
injected into the guest at the next vmentry.
Reported-by: NRadim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

664f8e26

KVM: VMX: use kvm_event_needs_reinjection · 274bba52

由 Wanpeng Li 提交于 8月 24, 2017

Use kvm_event_needs_reinjection() encapsulation.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

274bba52

KVM: MMU: speedup update_permission_bitmask · 09f037aa

由 Paolo Bonzini 提交于 8月 24, 2017

update_permission_bitmask currently does a 128-iteration loop to,
essentially, compute a constant array.  Computing the 8 bits in parallel
reduces it to 16 iterations, and is enough to speed it up substantially
because many boolean operations in the inner loop become constants or
simplify noticeably.

Because update_permission_bitmask is actually the top item in the profile
for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
clock cycles, or up to 30%:

                                         before     after
   cpuid                                 35173      25954
   vmcall                                35122      27079
   inl_from_pmtimer                      52635      42675
   inl_from_qemu                         53604      44599
   inl_from_kernel                       38498      30798
   outl_to_kernel                        34508      28816
   wr_tsc_adjust_msr                     34185      26818
   rd_tsc_adjust_msr                     37409      27049
   mmio-no-eventfd:pci-mem               50563      45276
   mmio-wildcard-eventfd:pci-mem         34495      30823
   mmio-datamatch-eventfd:pci-mem        35612      31071
   portio-no-eventfd:pci-io              44925      40661
   portio-wildcard-eventfd:pci-io        29708      27269
   portio-datamatch-eventfd:pci-io       31135      27164

(I wrote a small C program to compare the tables for all values of CR0.WP,
CR4.SMAP and CR4.SMEP, and they match).
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

09f037aa

KVM: MMU: Expose the LA57 feature to VM. · fd8cb433

由 Yu Zhang 提交于 8月 24, 2017

This patch exposes 5 level page table feature to the VM.
At the same time, the canonical virtual address checking is
extended to support both 48-bits and 57-bits address width.
Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fd8cb433

KVM: MMU: Add 5 level EPT & Shadow page table support. · 855feb67

由 Yu Zhang 提交于 8月 24, 2017

Extends the shadow paging code, so that 5 level shadow page
table can be constructed if VM is running in 5 level paging
mode.

Also extends the ept code, so that 5 level ept table can be
constructed if maxphysaddr of VM exceeds 48 bits. Unlike the
shadow logic, KVM should still use 4 level ept table for a VM
whose physical address width is less than 48 bits, even when
the VM is running in 5 level paging mode.
Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com>
[Unconditionally reset the MMU context in kvm_cpuid_update.
 Changing MAXPHYADDR invalidates the reserved bit bitmasks.
 - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

855feb67

KVM: MMU: Rename PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL. · 2a7266a8

由 Yu Zhang 提交于 8月 24, 2017

Now we have 4 level page table and 5 level page table in 64 bits
long mode, let's rename the PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL,
then we can use PT64_ROOT_5LEVEL for 5 level page table, it's
helpful to make the code more clear.

Also PT64_ROOT_MAX_LEVEL is defined as 4, so that we can just
redefine it to 5 whenever a replacement is needed for 5 level
paging.
Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2a7266a8

KVM: MMU: check guest CR3 reserved bits based on its physical address width. · d1cd3ce9

由 Yu Zhang 提交于 8月 24, 2017

Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the
reserved bits in CR3. Yet the length of reserved bits in
guest CR3 should be based on the physical address width
exposed to the VM. This patch changes CR3 check logic to
calculate the reserved bits at runtime.
Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d1cd3ce9

KVM: x86: Add return value to kvm_cpuid(). · e911eb3b

由 Yu Zhang 提交于 8月 24, 2017

Return false in kvm_cpuid() when it fails to find the cpuid
entry. Also, this routine(and its caller) is optimized with
a new argument - check_limit, so that the check_cpuid_limit()
fall back can be avoided.
Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e911eb3b

kvm: vmx: Raise #UD on unsupported XSAVES/XRSTORS · 3db13480

由 Paolo Bonzini 提交于 8月 24, 2017

A guest may not be configured to support XSAVES/XRSTORS, even when the host
does. If the guest does not support XSAVES/XRSTORS, clear the secondary
execution control so that the processor will raise #UD.

Also clear the "allowed-1" bit for XSAVES/XRSTORS exiting in the
IA32_VMX_PROCBASED_CTLS2 MSR, and pass through VMCS12's control in
the VMCS02.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3db13480

24 8月, 2017 5 次提交

kvm: vmx: Raise #UD on unsupported RDSEED · 75f4fc8d

由 Jim Mattson 提交于 8月 23, 2017

A guest may not be configured to support RDSEED, even when the host
does. If the guest does not support RDSEED, intercept the instruction
and synthesize #UD. Also clear the "allowed-1" bit for RDSEED exiting
in the IA32_VMX_PROCBASED_CTLS2 MSR.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

75f4fc8d

kvm: vmx: Raise #UD on unsupported RDRAND · 45ec368c

由 Jim Mattson 提交于 8月 23, 2017

A guest may not be configured to support RDRAND, even when the host
does. If the guest does not support RDRAND, intercept the instruction
and synthesize #UD. Also clear the "allowed-1" bit for RDRAND exiting
in the IA32_VMX_PROCBASED_CTLS2 MSR.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

45ec368c

KVM: VMX: cache secondary exec controls · 80154d77

由 Paolo Bonzini 提交于 8月 24, 2017

Currently, secondary execution controls are divided in three groups:

- static, depending mostly on the module arguments or the processor
  (vmx_secondary_exec_control)

- static, depending on CPUID (vmx_cpuid_update)

- dynamic, depending on nested VMX or local APIC state

Because walking CPUID is expensive, prepare_vmcs02 is using only
the first group.  This however is unnecessarily complicated.  Just
cache the static secondary execution controls, and then prepare_vmcs02
does not need to compute them every time.  Computation of all static
secondary execution controls is now kept in a single function,
vmx_compute_secondary_exec_control.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

80154d77

x86/lguest: Remove lguest support · ecda85e7

由 Juergen Gross 提交于 8月 16, 2017

Lguest seems to be rather unused these days. It has seen only patches
ensuring it still builds the last two years and its official state is
"Odd Fixes".

Remove it in order to be able to clean up the paravirt code.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Acked-by: NRusty Russell <rusty@rustcorp.com.au>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: boris.ostrovsky@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: rusty@rustcorp.com.au
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170816173157.8633-3-jgross@suse.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

ecda85e7

KVM: SVM: Enable Virtual GIF feature · 640bd6e5

由 Janakarajan Natarajan 提交于 8月 23, 2017

Enable the Virtual GIF feature. This is done by setting bit 25 at position
60h in the vmcb.

With this feature enabled, the processor uses bit 9 at position 60h as the
virtual GIF when executing STGI/CLGI instructions.

Since the execution of STGI by the L1 hypervisor does not cause a return to
the outermost (L0) hypervisor, the enable_irq_window and enable_nmi_window
are modified.

The IRQ window will be opened even if GIF is not set, under the assumption
that on resuming the L1 hypervisor the IRQ will be held pending until the
processor executes the STGI instruction.

For the NMI window, the STGI intercept is set. This will assist in opening
the window only when GIF=1.
Signed-off-by: NJanakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

640bd6e5

18 8月, 2017 6 次提交

KVM: VMX: always require WB memory type for EPT · 42aa53b4

由 David Hildenbrand 提交于 8月 10, 2017

We already always set that type but don't check if it is supported. Also
for nVMX, we only support WB for now. Let's just require it.
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

42aa53b4

KVM: VMX: cleanup EPTP definitions · bb97a016

由 David Hildenbrand 提交于 8月 10, 2017

Don't use shifts, tag them correctly as EPTP and use better matching
names (PWL vs. GAW).
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

bb97a016

KVM: SVM: delete avic_vm_id_bitmap (2 megabyte static array) · 3f0d4db7

由 Denys Vlasenko 提交于 8月 11, 2017

With lightly tweaked defconfig:

    text    data     bss      dec     hex filename
11259661 5109408 2981888 19350957 12745ad vmlinux.before
11259661 5109408  884736 17253805 10745ad vmlinux.after

Only compile-tested.
Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

3f0d4db7

KVM: x86: fix use of L1 MMIO areas in nested guests · 9034e6e8

由 Paolo Bonzini 提交于 8月 17, 2017

There is currently some confusion between nested and L1 GPAs.  The
assignment to "direct" in kvm_mmu_page_fault tries to fix that, but
it is not enough.  What this patch does is fence off the MMIO cache
completely when using shadow nested page tables, since we have neither
a GVA nor an L1 GPA to put in the cache.  This also allows some
simplifications in kvm_mmu_page_fault and FNAME(page_fault).

The EPT misconfig likewise does not have an L1 GPA to pass to
kvm_io_bus_write, so that must be skipped for guest mode.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
[Changed comment to say "GPAs" instead of "L1's physical addresses", as
 per David's review. - Radim]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

9034e6e8

KVM: x86: Avoid guest page table walk when gpa_available is set · 618232e2

由 Brijesh Singh 提交于 8月 17, 2017

When a guest causes a page fault which requires emulation, the
vcpu->arch.gpa_available flag is set to indicate that cr2 contains a
valid GPA.

Currently, emulator_read_write_onepage() makes use of gpa_available flag
to avoid a guest page walk for a known MMIO regions. Lets not limit
the gpa_available optimization to just MMIO region. The patch extends
the check to avoid page walk whenever gpa_available flag is set.
Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
[Fix EPT=0 according to Wanpeng Li's fix, plus ensure VMX also uses the
 new code. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
[Moved "ret < 0" to the else brach, as per David's review. - Radim]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

618232e2

KVM: x86: simplify ept_misconfig · e08d26f0

由 Paolo Bonzini 提交于 8月 17, 2017

Calling handle_mmio_page_fault() has been unnecessary since commit
e9ee956e ("KVM: x86: MMU: Move handle_mmio_page_fault() call to
kvm_mmu_page_fault()", 2016-02-22).

handle_mmio_page_fault() can now be made static.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

e08d26f0

12 8月, 2017 4 次提交

kvm: x86: Disallow illegal IA32_APIC_BASE MSR values · d3802286

由 Jim Mattson 提交于 8月 10, 2017

Host-initiated writes to the IA32_APIC_BASE MSR do not have to follow
local APIC state transition constraints, but the value written must be
valid.
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d3802286

KVM: MMU: Bail out immediately if there is no available mmu page · 26eeb53c

由 Wanpeng Li 提交于 8月 10, 2017

Bailing out immediately if there is no available mmu page to alloc.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

26eeb53c

KVM: MMU: Fix softlockup due to mmu_lock is held too long · 42bcbebf

由 Wanpeng Li 提交于 8月 10, 2017

watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [warn_test:3089]
 irq event stamp: 20532
 hardirqs last  enabled at (20531): [<ffffffff8e9b6908>] restore_regs_and_iret+0x0/0x1d
 hardirqs last disabled at (20532): [<ffffffff8e9b7ae8>] apic_timer_interrupt+0x98/0xb0
 softirqs last  enabled at (8266): [<ffffffff8e9badc6>] __do_softirq+0x206/0x4c1
 softirqs last disabled at (8253): [<ffffffff8e083918>] irq_exit+0xf8/0x100
 CPU: 5 PID: 3089 Comm: warn_test Tainted: G           OE   4.13.0-rc3+ #8
 RIP: 0010:kvm_mmu_prepare_zap_page+0x72/0x4b0 [kvm]
 Call Trace:
  make_mmu_pages_available.isra.120+0x71/0xc0 [kvm]
  kvm_mmu_load+0x1cf/0x410 [kvm]
  kvm_arch_vcpu_ioctl_run+0x1316/0x1bf0 [kvm]
  kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x23/0xc2
  ? __this_cpu_preempt_check+0x13/0x20

This can be reproduced readily by ept=N and running syzkaller tests since
many syzkaller testcases don't setup any memory regions. However, if ept=Y
rmode identity map will be created, then kvm_mmu_calculate_mmu_pages() will
extend the number of VM's mmu pages to at least KVM_MIN_ALLOC_MMU_PAGES
which just hide the issue.

I saw the scenario kvm->arch.n_max_mmu_pages == 0 && kvm->arch.n_used_mmu_pages == 1,
so there is one active mmu page on the list, kvm_mmu_prepare_zap_page() fails
to zap any pages, however prepare_zap_oldest_mmu_page() always returns true.
It incurs infinite loop in make_mmu_pages_available() which causes mmu->lock
softlockup.

This patch fixes it by setting the return value of prepare_zap_oldest_mmu_page()
according to whether or not there is mmu page zapped.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

42bcbebf

KVM: nVMX: validate eptp pointer · a057e0e2

由 David Hildenbrand 提交于 8月 10, 2017

Let's reuse the function introduced with eptp switching.

We don't explicitly have to check against enable_ept_ad_bits, as this
is implicitly done when checking against nested_vmx_ept_caps in
valid_ept_address().
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a057e0e2

11 8月, 2017 1 次提交

x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag · 5442c269

由 Borislav Petkov 提交于 8月 01, 2017

"virtual_vmload_vmsave" is what is going to land in /proc/cpuinfo now
as per v4.13-rc4, for a single feature bit which is clearly too long.

So rename it to what it is called in the processor manual.
"v_vmsave_vmload" is a bit shorter, after all.

We could go more aggressively here but having it the same as in the
processor manual is advantageous.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NRadim Krčmář <rkrcmar@redhat.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm-ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170801185552.GA3743@nazgul.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>

5442c269

10 8月, 2017 3 次提交

kvm: nVMX: Add support for fast unprotection of nested guest page tables · eebed243

由 Paolo Bonzini 提交于 11月 28, 2016

This is the same as commit 14727754 ("kvm: svm: Add support for
additional SVM NPF error codes", 2016-11-23), but for Intel processors.
In this case, the exit qualification field's bit 8 says whether the
EPT violation occurred while translating the guest's final physical
address or rather while translating the guest page tables.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

eebed243

KVM: SVM: Limit PFERR_NESTED_GUEST_PAGE error_code check to L1 guest · 64531a3b

由 Brijesh Singh 提交于 8月 07, 2017

Commit 14727754 ("kvm: svm: Add support for additional SVM NPF error
codes", 2016-11-23) added a new error code to aid nested page fault
handling.  The commit unprotects (kvm_mmu_unprotect_page) the page when
we get a NPF due to guest page table walk where the page was marked RO.

However, if an L0->L2 shadow nested page table can also be marked read-only
when a page is read only in L1's nested page table.  If such a page
is accessed by L2 while walking page tables it can cause a nested
page fault (page table walks are write accesses).  However, after
kvm_mmu_unprotect_page we may get another page fault, and again in an
endless stream.

To cover this use case, we qualify the new error_code check with
vcpu->arch.mmu_direct_map so that the error_code check would run on L1
guest, and not the L2 guest.  This avoids hitting the above scenario.

Fixes: 14727754
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

64531a3b

KVM: X86: Fix residual mmio emulation request to userspace · bbeac283

由 Wanpeng Li 提交于 8月 09, 2017

Reported by syzkaller:

The kvm-intel.unrestricted_guest=0

   WARNING: CPU: 5 PID: 1014 at /home/kernel/data/kvm/arch/x86/kvm//x86.c:7227 kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
   CPU: 5 PID: 1014 Comm: warn_test Tainted: G        W  OE   4.13.0-rc3+ #8
   RIP: 0010:kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
   Call Trace:
    ? put_pid+0x3a/0x50
    ? rcu_read_lock_sched_held+0x79/0x80
    ? kmem_cache_free+0x2f2/0x350
    kvm_vcpu_ioctl+0x340/0x700 [kvm]
    ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
    ? __fget+0xfc/0x210
    do_vfs_ioctl+0xa4/0x6a0
    ? __fget+0x11d/0x210
    SyS_ioctl+0x79/0x90
    entry_SYSCALL_64_fastpath+0x23/0xc2
    ? __this_cpu_preempt_check+0x13/0x20

The syszkaller folks reported a residual mmio emulation request to userspace
due to vm86 fails to emulate inject real mode interrupt(fails to read CS) and
incurs a triple fault. The vCPU returns to userspace with vcpu->mmio_needed == true
and KVM_EXIT_SHUTDOWN exit reason. However, the syszkaller testcase constructs
several threads to launch the same vCPU, the thread which lauch this vCPU after
the thread whichs get the vcpu->mmio_needed == true and KVM_EXIT_SHUTDOWN will
trigger the warning.

   #define _GNU_SOURCE
   #include <pthread.h>
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <sys/wait.h>
   #include <sys/types.h>
   #include <sys/stat.h>
   #include <sys/mman.h>
   #include <fcntl.h>
   #include <unistd.h>
   #include <linux/kvm.h>
   #include <stdio.h>

   int kvmcpu;
   struct kvm_run *run;

   void* thr(void* arg)
   {
     int res;
     res = ioctl(kvmcpu, KVM_RUN, 0);
     printf("ret1=%d exit_reason=%d suberror=%d\n",
         res, run->exit_reason, run->internal.suberror);
     return 0;
   }

   void test()
   {
     int i, kvm, kvmvm;
     pthread_t th[4];

     kvm = open("/dev/kvm", O_RDWR);
     kvmvm = ioctl(kvm, KVM_CREATE_VM, 0);
     kvmcpu = ioctl(kvmvm, KVM_CREATE_VCPU, 0);
     run = (struct kvm_run*)mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, kvmcpu, 0);
     srand(getpid());
     for (i = 0; i < 4; i++) {
       pthread_create(&th[i], 0, thr, 0);
       usleep(rand() % 10000);
     }
     for (i = 0; i < 4; i++)
       pthread_join(th[i], 0);
   }

   int main()
   {
     for (;;) {
       int pid = fork();
       if (pid < 0)
         exit(1);
       if (pid == 0) {
         test();
         exit(0);
       }
       int status;
       while (waitpid(pid, &status, __WALL) != pid) {}
     }
     return 0;
   }

This patch fixes it by resetting the vcpu->mmio_needed once we receive
the triple fault to avoid the residue.
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Tested-by: NDmitry Vyukov <dvyukov@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bbeac283

08 8月, 2017 2 次提交

KVM: X86: implement the logic for spinlock optimization · de63ad4c

由 Longpeng(Mike) 提交于 8月 08, 2017

get_cpl requires vcpu_load, so we must cache the result (whether the
vcpu was preempted when its cpl=0) in kvm_vcpu_arch.
Signed-off-by: NLongpeng(Mike) <longpeng2@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

de63ad4c

KVM: add spinlock optimization framework · 199b5763

由 Longpeng(Mike) 提交于 8月 08, 2017

If a vcpu exits due to request a user mode spinlock, then
the spinlock-holder may be preempted in user mode or kernel mode.
(Note that not all architectures trap spin loops in user mode,
only AMD x86 and ARM/ARM64 currently do).

But if a vcpu exits in kernel mode, then the holder must be
preempted in kernel mode, so we should choose a vcpu in kernel mode
as a more likely candidate for the lock holder.

This introduces kvm_arch_vcpu_in_kernel() to decide whether the
vcpu is in kernel-mode when it's preempted.  kvm_vcpu_on_spin's
new argument says the same of the spinning VCPU.
Signed-off-by: NLongpeng(Mike) <longpeng2@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

199b5763

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功