1. 16 January 2018 (6 commits)
  2. 14 December 2017 (18 commits)
  3. 06 December 2017 (3 commits)
    • KVM: x86: fix APIC page invalidation · b1394e74
      Committed by Radim Krčmář
      Implementation of the unpinned APIC page didn't update the VMCS address
      cache when invalidation was done through range mmu notifiers.
      This became a problem when the page notifier was removed.
      
      Re-introduce the arch-specific helper and call it from ...range_start.
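      A hedged sketch of the shape such a helper can take, close to what the
      commit adds (treat details as illustrative): if the invalidated HVA
      range covers the APIC access page, every vCPU is asked to reload the
      address cached in the VMCS.

          /* Arch helper invoked from kvm_mmu_notifier_invalidate_range_start():
           * if the invalidated HVA range covers the APIC access page, ask all
           * vCPUs to reload the physical address cached in the VMCS.
           */
          void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
                                                      unsigned long start,
                                                      unsigned long end)
          {
                  unsigned long apic_address;

                  apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
                  if (start <= apic_address && apic_address < end)
                          kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
          }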
      Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
      Fixes: 38b99173 ("kvm: vmx: Implement set_apic_access_page_addr")
      Fixes: 369ea824 ("mm/rmap: update to new mmu_notifier semantic v2")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Tested-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Tested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      b1394e74
    • x86,kvm: remove KVM emulator get_fpu / put_fpu · 6ab0b9fe
      Committed by Rik van Riel
      Now that get_fpu and put_fpu do nothing, because the scheduler will
      automatically load and restore the guest FPU context for us while we
      are in this code (deep inside the vcpu_run main loop), we can get rid
      of the get_fpu and put_fpu hooks.
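      For reference, a sketch of the two hooks dropped from the emulator ops
      table (signatures as in struct x86_emulate_ops; the surrounding struct
      is abbreviated):

          struct x86_emulate_ops {
                  /* ... other callbacks unchanged ... */

                  /* Removed: with the guest FPU now kept loaded across the
                   * whole KVM_RUN ioctl, these had become empty wrappers.
                   */
                  void (*get_fpu)(struct x86_emulate_ctxt *ctxt);
                  void (*put_fpu)(struct x86_emulate_ctxt *ctxt);
          };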
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Suggested-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6ab0b9fe
    • x86,kvm: move qemu/guest FPU switching out to vcpu_run · f775b13e
      Committed by Rik van Riel
      Currently, every time a VCPU is scheduled out, the host kernel will
      first save the guest FPU/xstate context, then load the qemu userspace
      FPU context, only to then immediately save the qemu userspace FPU
      context back to memory. When scheduling in a VCPU, the same extraneous
      FPU loads and saves are done.
      
      This could be avoided by moving from a model where the guest FPU is
      loaded and stored with preemption disabled, to a model where the
      qemu userspace FPU is swapped out for the guest FPU context for
      the duration of the KVM_RUN ioctl.
      
      This is done under the VCPU mutex, which is also taken when other
      tasks inspect the VCPU FPU context, so the code should already be
      safe for this change. That should come as no surprise, given that
      s390 already has this optimization.
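      A hedged sketch of the two swap points, modeled on the upstream
      kvm_load_guest_fpu()/kvm_put_guest_fpu() helpers (simplified; tracing
      and statistics are omitted):

          /* Swap qemu's user FPU context for the guest FPU context, and back.
           * Called once per KVM_RUN ioctl instead of on every preemption.
           */
          static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
          {
                  preempt_disable();
                  copy_fpregs_to_fpstate(&vcpu->arch.user_fpu); /* stash qemu state */
                  /* PKRU is separately restored in kvm_x86_ops->run. */
                  __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
                                          ~XFEATURE_MASK_PKRU);
                  preempt_enable();
          }

          static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
          {
                  preempt_disable();
                  copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu); /* save guest state */
                  copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state); /* restore qemu */
                  preempt_enable();
          }

          /* In kvm_arch_vcpu_ioctl_run(), roughly:
           *
           *        kvm_load_guest_fpu(vcpu);
           *        r = vcpu_run(vcpu);
           *        kvm_put_guest_fpu(vcpu);
           */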
      
      This can fix a bug where KVM calls get_user_pages while owning the
      FPU, and the file system ends up requesting the FPU again:
      
          [258270.527947]  __warn+0xcb/0xf0
          [258270.527948]  warn_slowpath_null+0x1d/0x20
          [258270.527951]  kernel_fpu_disable+0x3f/0x50
          [258270.527953]  __kernel_fpu_begin+0x49/0x100
          [258270.527955]  kernel_fpu_begin+0xe/0x10
          [258270.527958]  crc32c_pcl_intel_update+0x84/0xb0
          [258270.527961]  crypto_shash_update+0x3f/0x110
          [258270.527968]  crc32c+0x63/0x8a [libcrc32c]
          [258270.527975]  dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
          [258270.527978]  node_prepare_for_write+0x44/0x70 [dm_persistent_data]
          [258270.527985]  dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
          [258270.527988]  submit_io+0x170/0x1b0 [dm_bufio]
          [258270.527992]  __write_dirty_buffer+0x89/0x90 [dm_bufio]
          [258270.527994]  __make_buffer_clean+0x4f/0x80 [dm_bufio]
          [258270.527996]  __try_evict_buffer+0x42/0x60 [dm_bufio]
          [258270.527998]  dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
          [258270.528002]  shrink_slab.part.40+0x1f5/0x420
          [258270.528004]  shrink_node+0x22c/0x320
          [258270.528006]  do_try_to_free_pages+0xf5/0x330
          [258270.528008]  try_to_free_pages+0xe9/0x190
          [258270.528009]  __alloc_pages_slowpath+0x40f/0xba0
          [258270.528011]  __alloc_pages_nodemask+0x209/0x260
          [258270.528014]  alloc_pages_vma+0x1f1/0x250
          [258270.528017]  do_huge_pmd_anonymous_page+0x123/0x660
          [258270.528021]  handle_mm_fault+0xfd3/0x1330
          [258270.528025]  __get_user_pages+0x113/0x640
          [258270.528027]  get_user_pages+0x4f/0x60
          [258270.528063]  __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
          [258270.528108]  try_async_pf+0x66/0x230 [kvm]
          [258270.528135]  tdp_page_fault+0x130/0x280 [kvm]
          [258270.528149]  kvm_mmu_page_fault+0x60/0x120 [kvm]
          [258270.528158]  handle_ept_violation+0x91/0x170 [kvm_intel]
          [258270.528162]  vmx_handle_exit+0x1ca/0x1400 [kvm_intel]
      
      No performance changes were detected in quick ping-pong tests on
      my 4-socket system, which is expected since an FPU+xstate load is
      on the order of 0.1us, while ping-ponging between CPUs is on the
      order of 20us, and somewhat noisy.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [Fixed a bug where reset_vcpu called put_fpu without a preceding load_fpu,
       which happened from inside the KVM_CREATE_VCPU ioctl. - Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      f775b13e
  4. 05 December 2017 (2 commits)
    • KVM: Introduce KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctl · 69eaedee
      Committed by Brijesh Singh
      If the hardware supports memory encryption, the KVM_MEMORY_ENCRYPT_REG_REGION
      and KVM_MEMORY_ENCRYPT_UNREG_REGION ioctls can be used by userspace to
      register/unregister guest memory regions which may contain encrypted
      data (e.g. guest RAM, PCI BARs, SMRAM, etc.).
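      A minimal userspace sketch of registering a region, using struct
      kvm_enc_region from <linux/kvm.h> (vm_fd is assumed to be an open VM
      file descriptor; error handling elided):

          #include <stdint.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          /* Register a range of guest memory (by its userspace address) as
           * potentially containing encrypted data.
           */
          static int register_enc_region(int vm_fd, void *hva, uint64_t size)
          {
                  struct kvm_enc_region region = {
                          .addr = (uintptr_t)hva, /* userspace address of the region */
                          .size = size,           /* length in bytes */
                  };

                  return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_REG_REGION, &region);
          }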
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Improvements-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      69eaedee
    • KVM: Introduce KVM_MEMORY_ENCRYPT_OP ioctl · 5acc5c06
      Committed by Brijesh Singh
      If the hardware supports memory encryption, the KVM_MEMORY_ENCRYPT_OP
      ioctl can be used by qemu to issue platform-specific memory encryption
      commands.
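      The ioctl argument is an opaque pointer at this point; a hedged sketch
      of how qemu might invoke it, using the struct kvm_sev_cmd container that
      the later SEV patches define (illustrative only):

          #include <stdint.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          /* Issue one platform-specific encryption command. The command set
           * and the kvm_sev_cmd container come from later SEV patches; shown
           * here only to illustrate the calling convention.
           */
          static int memory_encrypt_op(int vm_fd, uint32_t cmd_id,
                                       void *data, int sev_fd)
          {
                  struct kvm_sev_cmd cmd = {
                          .id = cmd_id,            /* platform command, e.g. a SEV launch step */
                          .data = (uintptr_t)data, /* command-specific payload */
                          .sev_fd = sev_fd,        /* fd of the /dev/sev device */
                  };

                  return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
          }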
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      5acc5c06
  5. 28 November 2017 (2 commits)
    • KVM: Let KVM_SET_SIGNAL_MASK work as advertised · 20b7035c
      Committed by Jan H. Schönherr
      The KVM API documentation says of the signal mask set via
      KVM_SET_SIGNAL_MASK that "any unblocked signal received [...] will cause
      KVM_RUN to return with -EINTR" and that "the signal will only be
      delivered if not blocked by the original signal mask".

      This, however, is only true when the calling task has a signal handler
      registered for the signal. If not, signal evaluation is short-circuited
      for SIG_IGN and SIG_DFL, and the signal is either ignored without
      KVM_RUN returning, or the whole process is terminated.
      
      Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
      to that in do_sigtimedwait() to avoid short-circuiting of signals.
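      For context, a sketch of the userspace pattern this fixes: the VMM
      blocks a signal in the thread's own mask but hands it to
      KVM_SET_SIGNAL_MASK, expecting its arrival to make KVM_RUN return with
      -EINTR (error handling mostly elided):

          #include <signal.h>
          #include <stdlib.h>
          #include <string.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          /* Hand the vCPU thread's intended runtime sigmask to KVM so that an
           * unblocked-for-KVM_RUN signal interrupts the guest with -EINTR.
           * KVM expects the kernel's 8-byte sigset, not glibc's sizeof(sigset_t).
           */
          static int set_kvm_signal_mask(int vcpu_fd, const sigset_t *sigset)
          {
                  struct kvm_signal_mask *kmask;
                  int r;

                  kmask = malloc(sizeof(*kmask) + 8);
                  if (!kmask)
                          return -1;
                  kmask->len = 8;
                  memcpy(kmask->sigset, sigset, 8);
                  r = ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, kmask);
                  free(kmask);
                  return r;
          }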
      Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      20b7035c
    • KVM: X86: Fix softlockup when get the current kvmclock · e70b57a6
      Committed by Wanpeng Li
       watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [qemu-system-x86:10185]
       CPU: 6 PID: 10185 Comm: qemu-system-x86 Tainted: G           OE   4.14.0-rc4+ #4
       RIP: 0010:kvm_get_time_scale+0x4e/0xa0 [kvm]
       Call Trace:
        get_time_ref_counter+0x5a/0x80 [kvm]
        kvm_hv_process_stimers+0x120/0x5f0 [kvm]
        kvm_arch_vcpu_ioctl_run+0x4b4/0x1690 [kvm]
        kvm_vcpu_ioctl+0x33a/0x620 [kvm]
        do_vfs_ioctl+0xa1/0x5d0
        SyS_ioctl+0x79/0x90
        entry_SYSCALL_64_fastpath+0x1e/0xa9
      
      This can be reproduced by running kvm-unit-tests/hyperv_stimer.flat and
      a cpu-hotplug stress test simultaneously. __this_cpu_read(cpu_tsc_khz)
      returns 0 (set in kvmclock_cpu_down_prep()) when the pCPU is
      hot-unplugged, which results in kvm_get_time_scale() getting stuck in an
      infinite loop.

      This patch fixes it by treating a hot-unplugged pCPU as not using the
      master clock.
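      A hedged sketch of the fixed path in get_kvmclock_ns(), assuming the
      upstream shape (variables as in that function):

          /* If this pCPU was hot-unplugged, cpu_tsc_khz reads 0 and
           * kvm_get_time_scale() would loop forever; fall back to the boot
           * clock plus the kvmclock offset instead of the master clock path.
           */
          get_cpu();      /* __this_cpu_read() and rdtsc() must hit the same CPU */
          if (__this_cpu_read(cpu_tsc_khz)) {
                  kvm_get_time_scale(NSEC_PER_SEC,
                                     __this_cpu_read(cpu_tsc_khz) * 1000LL,
                                     &hv_clock.tsc_shift,
                                     &hv_clock.tsc_to_system_mul);
                  ret = __pvclock_read_cycles(&hv_clock, rdtsc());
          } else {
                  ret = ktime_get_boot_ns() + ka->kvmclock_offset;
          }
          put_cpu();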
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e70b57a6
  6. 17 November 2017 (4 commits)
  7. 21 October 2017 (1 commit)
  8. 19 October 2017 (1 commit)
  9. 12 October 2017 (3 commits)
    • KVM: nSVM: fix SMI injection in guest mode · 05cade71
      Committed by Ladi Prosek
      Entering SMM while running in guest mode wasn't working very well because several
      pieces of the vcpu state were left set up for nested operation.
      
      Some of the issues observed:
      
      * L1 was getting unexpected VM exits (using L1 interception controls but
        running in the SMM execution environment)
      * MMU was confused (walk_mmu was still set to nested_mmu)
      * INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)
      
      The Intel SDM actually prescribes that the logical processor "leave VMX
      operation" upon entering SMM (34.14.1, Default Treatment of SMI Delivery).
      AMD doesn't seem to document this, but they provide fields in the SMM
      state-save area to stash the current state of SVM. What we need to do is
      basically get out of guest mode for the duration of SMM. All of this is
      completely transparent to L1, i.e. L1 is not given control and no
      L1-observable state changes.
      
      To avoid code duplication, this commit takes advantage of the existing
      nested vmexit and run functionality, perhaps at the cost of efficiency.
      To get out of guest mode, nested_svm_vmexit is called, unchanged.
      Re-entering is performed using enter_svm_guest_mode.
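      A hedged sketch of the SVM side, assuming the pre_enter_smm hook
      introduced by the commits below (state-save offsets per AMD's SMM area;
      illustrative, and the real code also syncs RAX/RSP/RIP into the VMCB
      first):

          static int svm_pre_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
          {
                  struct vcpu_svm *svm = to_svm(vcpu);

                  if (is_guest_mode(vcpu)) {
                          /* Stash an "in SVM guest" flag and the nested VMCB
                           * address in the SMM state-save area so that RSM
                           * can restore them later.
                           */
                          put_smstate(u64, smstate, 0x7ed8, 1);
                          put_smstate(u64, smstate, 0x7ee0, svm->nested.vmcb);

                          /* Leave guest mode, reusing the ordinary nested #VMEXIT. */
                          return nested_svm_vmexit(svm);
                  }
                  return 0;
          }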
      
      This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with
      OVMF firmware (OVMF_CODE-need-smm.fd).
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      05cade71
    • KVM: x86: introduce ISA specific smi_allowed callback · 72d7b374
      Committed by Ladi Prosek
      As with NMIs, there may be ISA-specific reasons why an SMI cannot be
      injected into the guest. This commit adds a new smi_allowed callback, to
      be implemented in following commits.
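      A sketch of the new hook and, roughly, its generic call site (the exact
      guard conditions are illustrative):

          /* New kvm_x86_ops member: ISA-specific veto on injecting an SMI. */
          int (*smi_allowed)(struct kvm_vcpu *vcpu);

          /* The generic injection path then asks the vendor module, roughly:
           *
           *        if (vcpu->arch.smi_pending && !is_smm(vcpu) &&
           *            kvm_x86_ops->smi_allowed(vcpu))
           *                enter_smm(vcpu);
           */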
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      72d7b374
    • KVM: x86: introduce ISA specific SMM entry/exit callbacks · 0234bf88
      Committed by Ladi Prosek
      Entering and exiting SMM may require ISA-specific handling under certain
      circumstances. This commit adds two new callbacks with empty
      implementations; actual functionality will be added in following commits
      (a sketch of the new hooks follows the list below).
      
      * pre_enter_smm() is to be called when injecting an SMM, before any
        SMM-related vcpu state has been changed
      * pre_leave_smm() is to be called when emulating the RSM instruction,
        when the vcpu is in real mode and before any SMM-related vcpu state
        has been restored
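      A sketch of the new kvm_x86_ops members and, e.g., the VMX stubs that
      stay empty until later commits (signatures as introduced here):

          /* New kvm_x86_ops members: */
          int (*pre_enter_smm)(struct kvm_vcpu *vcpu, char *smstate);
          int (*pre_leave_smm)(struct kvm_vcpu *vcpu, u64 smbase);

          /* VMX stubs, empty for now: */
          static int vmx_pre_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
          {
                  return 0;       /* filled in by later commits */
          }

          static int vmx_pre_leave_smm(struct kvm_vcpu *vcpu, u64 smbase)
          {
                  return 0;       /* filled in by later commits */
          }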
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0234bf88