提交 · efdab992813fb2ed825745625b83c05032e9cda2 · openeuler / Kernel

16 1月, 2018 5 次提交

KVM: x86: fix escape of guest dr6 to the host · efdab992

由 Wanpeng Li 提交于 12月 13, 2017

syzkaller reported:

   WARNING: CPU: 0 PID: 12927 at arch/x86/kernel/traps.c:780 do_debug+0x222/0x250
   CPU: 0 PID: 12927 Comm: syz-executor Tainted: G           OE    4.15.0-rc2+ #16
   RIP: 0010:do_debug+0x222/0x250
   Call Trace:
    <#DB>
    debug+0x3e/0x70
   RIP: 0010:copy_user_enhanced_fast_string+0x10/0x20
    </#DB>
    _copy_from_user+0x5b/0x90
    SyS_timer_create+0x33/0x80
    entry_SYSCALL_64_fastpath+0x23/0x9a

The testcase sets a watchpoint (with perf_event_open) on a buffer that is
passed to timer_create() as the struct sigevent argument.  In timer_create(),
copy_from_user()'s rep movsb triggers the BP.  The testcase also sets
the debug registers for the guest.

However, KVM only restores host debug registers when the host has active
watchpoints, which triggers a race condition when running the testcase with
multiple threads.  The guest's DR6.BS bit can escape to the host before
another thread invokes timer_create(), and do_debug() complains.

The fix is to respect do_debug()'s dr6 invariant when leaving KVM.
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

efdab992

KVM: X86: support paravirtualized help for TLB shootdowns · f38a7b75

由 Wanpeng Li 提交于 12月 12, 2017

When running on a virtual machine, IPIs are expensive when the target
CPU is sleeping.  Thus, it is nice to be able to avoid them for TLB
shootdowns.  KVM can just do the flush via INVVPID on the guest's behalf
the next time the CPU is scheduled.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
[Use "&" to test the bit instead of "==". - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

f38a7b75

KVM: X86: introduce invalidate_gpa argument to tlb flush · c2ba05cc

由 Wanpeng Li 提交于 12月 12, 2017

Introduce a new bool invalidate_gpa argument to kvm_x86_ops->tlb_flush,
it will be used by later patches to just flush guest tlb.

For VMX, this will use INVVPID instead of INVEPT, which will invalidate
combined mappings while keeping guest-physical mappings.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

c2ba05cc

KVM: X86: use paravirtualized TLB Shootdown · 858a43aa

由 Wanpeng Li 提交于 12月 12, 2017

Remote TLB flush does a busy wait which is fine in bare-metal
scenario. But with-in the guest, the vcpus might have been pre-empted or
blocked. In this scenario, the initator vcpu would end up busy-waiting
for a long amount of time; it also consumes CPU unnecessarily to wake
up the target of the shootdown.

This patch set adds support for KVM's new paravirtualized TLB flush;
remote TLB flush does not wait for vcpus that are sleeping, instead
KVM will flush the TLB as soon as the vCPU starts running again.

The improvement is clearly visible when the host is overcommitted; in this
case, the PV TLB flush (in addition to avoiding the wait on the main CPU)
prevents preempted vCPUs from stealing precious execution time from the
running ones.

Testing on a Xeon Gold 6142 2.6GHz 2 sockets, 32 cores, 64 threads,
so 64 pCPUs, and each VM is 64 vCPUs.

ebizzy -M
              vanilla    optimized     boost
1VM            46799       48670         4%
2VM            23962       42691        78%
3VM            16152       37539       132%

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

858a43aa

KVM: X86: Add KVM_VCPU_PREEMPTED · fa55eedd

由 Wanpeng Li 提交于 12月 12, 2017

The next patch will add another bit to the preempted field in
kvm_steal_time.  Define a constant for bit 0 (the only one that is
currently used).

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

fa55eedd

14 12月, 2017 33 次提交

KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl · 9b062471

由 Christoffer Dall 提交于 12月 04, 2017

Move the calls to vcpu_load() and vcpu_put() in to the architecture
specific implementations of kvm_arch_vcpu_ioctl() which dispatches
further architecture-specific ioctls on to other functions.

Some architectures support asynchronous vcpu ioctls which cannot call
vcpu_load() or take the vcpu->mutex, because that would prevent
concurrent execution with a running VCPU, which is the intended purpose
of these ioctls, for example because they inject interrupts.

We repeat the separate checks for these specifics in the architecture
code for MIPS, S390 and PPC, and avoid taking the vcpu->mutex and
calling vcpu_load for these ioctls.
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9b062471

KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_fpu · 6a96bc7f

由 Christoffer Dall 提交于 12月 04, 2017

Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_set_fpu().
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6a96bc7f

KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_get_fpu · 1393123e

由 Christoffer Dall 提交于 12月 04, 2017

Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_get_fpu().
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1393123e

KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_guest_debug · 66b56562

由 Christoffer Dall 提交于 12月 04, 2017

Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_set_guest_debug().
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

66b56562

KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_translate · 1da5b61d

由 Christoffer Dall 提交于 12月 04, 2017

Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_translate().
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1da5b61d

KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_mpstate · e83dff5e

由 Christoffer Dall 提交于 12月 04, 2017

Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_set_mpstate().
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e83dff5e

KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_get_mpstate · fd232561

由 Christoffer Dall 提交于 12月 04, 2017

Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_get_mpstate().
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fd232561

KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_sregs · b4ef9d4e

由 Christoffer Dall 提交于 12月 04, 2017

Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_set_sregs().
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b4ef9d4e

KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_get_sregs · bcdec41c

由 Christoffer Dall 提交于 12月 04, 2017

Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_get_sregs().
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bcdec41c

KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_regs · 875656fe

由 Christoffer Dall 提交于 12月 04, 2017

Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_set_regs().
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

875656fe

KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_get_regs · 1fc9b76b

由 Christoffer Dall 提交于 12月 04, 2017

Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_get_regs().
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1fc9b76b

KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_run · accb757d

由 Christoffer Dall 提交于 12月 04, 2017

Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_run().
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390 parts
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
[Rebased. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

accb757d

KVM: Take vcpu->mutex outside vcpu_load · ec7660cc

由 Christoffer Dall 提交于 12月 04, 2017

As we're about to call vcpu_load() from architecture-specific
implementations of the KVM vcpu ioctls, but yet we access data
structures protected by the vcpu->mutex in the generic code, factor
this logic out from vcpu_load().

x86 is the only architecture which calls vcpu_load() outside of the main
vcpu ioctl function, and these calls will no longer take the vcpu mutex
following this patch.  However, with the exception of
kvm_arch_vcpu_postcreate (see below), the callers are either in the
creation or destruction path of the VCPU, which means there cannot be
any concurrent access to the data structure, because the file descriptor
is not yet accessible, or is already gone.

kvm_arch_vcpu_postcreate makes the newly created vcpu potentially
accessible by other in-kernel threads through the kvm->vcpus array, and
we therefore take the vcpu mutex in this case directly.
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ec7660cc

KVM: VMX: drop I/O permission bitmaps · 8eb73e2d

由 Quan Xu 提交于 12月 12, 2017

Since KVM removes the only I/O port 0x80 bypass on Intel hosts,
clear CPU_BASED_USE_IO_BITMAPS and set CPU_BASED_UNCOND_IO_EXITING
bit. Then these I/O permission bitmaps are not used at all, so
drop I/O permission bitmaps.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NRadim KrÄmÃ¡Å™ <rkrcmar@redhat.com>
Signed-off-by: NQuan Xu <quan.xu0@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8eb73e2d

KVM: X86: Reduce the overhead when lapic_timer_advance is disabled · 9c48d517

由 Wanpeng Li 提交于 12月 01, 2017

When I run ebizzy in a 32 vCPUs guest on a 32 pCPUs Xeon box, I can observe
~8000 kvm_wait_lapic_expire CurAvg/s through kvm_stat tool even if the advance
tscdeadline hrtimer expiration is disabled. Each call to wait_lapic_expire()
will consume ~70 cycles when a timer fires since apic_timer_expire() will
set expired_tscdeadline and then wait_lapic_expire() will do some caculation
before bailing out. So total ~175us per second is lost on this 3.2Ghz machine.
This patch reduces the overhead by skipping the function wait_lapic_expire()
when lapic_timer_advance is disabled.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9c48d517

KVM: VMX: Cache IA32_DEBUGCTL in memory · 74c55931

由 Wanpeng Li 提交于 11月 29, 2017

MSR_IA32_DEBUGCTLMSR is zeroed on VMEXIT, so it is saved/restored each
time during world switch.  This patch caches the host IA32_DEBUGCTL MSR
and saves/restores the host IA32_DEBUGCTL msr when guest/host switches
to avoid to save/restore each time during world switch.  This saves
about 100 clock cycles per vmexit.
Suggested-by: NJim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

74c55931

KVM: nVMX: Add a WARN for freeing a loaded VMCS02 · 276c796c

由 Mark Kanda 提交于 11月 27, 2017

When attempting to free a loaded VMCS02, add a WARN and avoid
freeing it (to avoid use-after-free situations).
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMark Kanda <mark.kanda@oracle.com>
Reviewed-by: NAmeya More <ameya.more@oracle.com>
Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

276c796c

KVM: nVMX: Eliminate vmcs02 pool · 00647b44

由 Jim Mattson 提交于 11月 27, 2017

The potential performance advantages of a vmcs02 pool have never been
realized. To simplify the code, eliminate the pool. Instead, a single
vmcs02 is allocated per VCPU when the VCPU enters VMX operation.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NMark Kanda <mark.kanda@oracle.com>
Reviewed-by: NAmeya More <ameya.more@oracle.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

00647b44

KVM: Expose new cpu features to guest · 80fef315

由 Yang Zhong 提交于 11月 22, 2017

Intel IceLake cpu has added new cpu features,AVX512_VBMI2/GFNI/
VAES/VPCLMULQDQ/AVX512_VNNI/AVX512_BITALG. Those new cpu features
need expose to guest VM.

The bit definition:
CPUID.(EAX=7,ECX=0):ECX[bit 06] AVX512_VBMI2
CPUID.(EAX=7,ECX=0):ECX[bit 08] GFNI
CPUID.(EAX=7,ECX=0):ECX[bit 09] VAES
CPUID.(EAX=7,ECX=0):ECX[bit 10] VPCLMULQDQ
CPUID.(EAX=7,ECX=0):ECX[bit 11] AVX512_VNNI
CPUID.(EAX=7,ECX=0):ECX[bit 12] AVX512_BITALG

The release document ref below link:
https://software.intel.com/sites/default/files/managed/c5/15/\
architecture-instruction-set-extensions-programming-reference.pdf

The kernel dependency commit in kvm.git:
(c128dbfa)
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

80fef315

KVM: x86: Add emulation of MSR_SMI_COUNT · 52797bf9

由 Liran Alon 提交于 11月 15, 2017

This MSR returns the number of #SMIs that occurred on CPU since
boot.

It was seen to be used frequently by ESXi guest.

Patch adds a new vcpu-arch specific var called smi_count to
save the number of #SMIs which occurred on CPU since boot.
It is exposed as a read-only MSR to guest (causing #GP
on wrmsr) in RDMSR/WRMSR emulation code.
MSR_SMI_COUNT is also added to emulated_msrs[] to make sure
user-space can save/restore it for migration purposes.
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NNikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: NBhavesh Davda <bhavesh.davda@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

52797bf9

KVM: x86: simplify kvm_mwait_in_guest() · 431f5d44

由 Radim Krčmář 提交于 11月 29, 2017

If Intel/AMD implements MWAIT, we expect that it works well and only
reject known bugs;  no reason to do it the other way around for minor
vendors.  (Not that they are relevant ATM.)

This allows further simplification of kvm_mwait_in_guest().
And use boot_cpu_has() instead of "cpu_has(&boot_cpu_data," while at it.
Reviewed-by: NAlexander Graf <agraf@suse.de>
Acked-by: NBorislav Petkov <bp@suse.de>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

431f5d44

KVM: x86: drop bogus MWAIT check · 346f48fa

由 Radim Krčmář 提交于 11月 29, 2017

The check was added in some iteration while trying to fix a reported OS
X on Core 2 bug, but that bug is elsewhere.

The comment is misleading because the guest can call MWAIT with ECX = 0
even if we enforce CPUID5_ECX_INTERRUPT_BREAK;  the call would have the
exactly the same effect as if the host didn't have the feature.

A problem is that a QEMU feature exposes CPUID5_ECX_INTERRUPT_BREAK on
CPUs that do not support it.  Removing the check changes behavior on
last Pentium 4 lines (Presler, Dempsey, and Tulsa, which had VMX and
MONITOR while missing INTERRUPT_BREAK) when running a guest OS that uses
MWAIT without checking for its presence (QEMU doesn't expose MONITOR).

The only known OS that ignores the MONITOR flag is old Mac OS X and we
allowed it to bug on Core 2 (MWAIT used to throw #UD and only that OS
noticed), so we can save another 20 lines letting it bug on even older
CPUs.  Alternatively, we can return MWAIT exiting by default and let
userspace toggle it.
Reviewed-by: NAlexander Graf <agraf@suse.de>
Acked-by: NBorislav Petkov <bp@suse.de>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

346f48fa

KVM: x86: prevent MWAIT in guest with buggy MONITOR · 2a140f3b

由 Radim Krčmář 提交于 11月 29, 2017

The bug prevents MWAIT from waking up after a write to the monitored
cache line.
KVM might emulate a CPU model that shouldn't have the bug, so the guest
would not employ a workaround and possibly miss wakeups.
Better to avoid the situation.
Reviewed-by: NAlexander Graf <agraf@suse.de>
Acked-by: NBorislav Petkov <bp@suse.de>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2a140f3b

KVM: x86: MMU: make array audit_point_name static · 0d37d26f

由 Colin Ian King 提交于 11月 30, 2017

The array audit_point_name is local to the source and does not need to
be in global scope, so make it static.

Cleans up sparse warning:
arch/x86/kvm/mmu_audit.c:22:12: warning: symbol 'audit_point_name' was
not declared. Should it be static?
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0d37d26f

x86: kvm: mmu: make kvm_mmu_clear_all_pte_masks static · 858ac87f

由 Gimcuan Hui 提交于 11月 03, 2017

The kvm_mmu_clear_all_pte_masks interface is only used by kvm_mmu_module_init
locally, and does not need to be called by other module, make it static.

This patch cleans up sparse warning:
symbol 'kvm_mmu_clear_all_pte_masks' was not declared. Should it be static?
Signed-off-by: NGimcuan Hui <gimcuan@gmail.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

858ac87f

KVM: x86: emulate RDPID · fb6d4d34

由 Paolo Bonzini 提交于 7月 12, 2016

This is encoded as F3 0F C7 /7 with a register argument.  The register
argument is the second array in the group9 GroupDual, while F3 is the
fourth element of a Prefix.
Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fb6d4d34

KVM: vmx: add support for emulating UMIP · 0367f205

由 Paolo Bonzini 提交于 7月 12, 2016

UMIP can be emulated almost perfectly on Intel processor by enabling
descriptor-table exits. SMSW does not cause a vmexit and hence it
cannot be changed into a #GP fault, but all in all it's the most
"innocuous" of the unprivileged instructions that UMIP blocks.

In fact, Linux is _also_ emulating SMSW instructions on behalf of the
program that executes them, because some 16-bit programs expect to use
SMSW to detect vm86 mode, so this is an even smaller issue.
Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0367f205

KVM: x86: add support for emulating UMIP · 66336cab

由 Paolo Bonzini 提交于 7月 12, 2016

The User-Mode Instruction Prevention feature present in recent Intel
processor prevents a group of instructions (sgdt, sidt, sldt, smsw, and
str) from being executed with CPL > 0. Otherwise, a general protection
fault is issued.

UMIP instructions in general are also able to trigger vmexits, so we can
actually emulate UMIP on older processors.  This commit sets up the
infrastructure so that kvm-intel.ko and kvm-amd.ko can set the UMIP
feature bit for CPUID even if the feature is not actually available
in hardware.
Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

66336cab

KVM: x86: emulate sldt and str · dd307d01

由 Paolo Bonzini 提交于 7月 12, 2016

These are needed to handle the descriptor table vmexits when emulating
UMIP.
Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

dd307d01

KVM: x86: add support for UMIP · ae3e61e1

由 Paolo Bonzini 提交于 7月 12, 2016

Add the CPUID bits, make the CR4.UMIP bit not reserved anymore, and
add UMIP support for instructions that are already emulated by KVM.
Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ae3e61e1

kvm: x86: fix WARN due to uninitialized guest FPU state · 5663d8f9

由 Peter Xu 提交于 12月 12, 2017

------------[ cut here ]------------
 Bad FPU state detected at kvm_put_guest_fpu+0xd8/0x2d0 [kvm], reinitializing FPU registers.
 WARNING: CPU: 1 PID: 4594 at arch/x86/mm/extable.c:103 ex_handler_fprestore+0x88/0x90
 CPU: 1 PID: 4594 Comm: qemu-system-x86 Tainted: G    B      OE    4.15.0-rc2+ #10
 RIP: 0010:ex_handler_fprestore+0x88/0x90
 Call Trace:
  fixup_exception+0x4e/0x60
  do_general_protection+0xff/0x270
  general_protection+0x22/0x30
 RIP: 0010:kvm_put_guest_fpu+0xd8/0x2d0 [kvm]
 RSP: 0018:ffff8803d5627810 EFLAGS: 00010246
  kvm_vcpu_reset+0x3b4/0x3c0 [kvm]
  kvm_apic_accept_events+0x1c0/0x240 [kvm]
  kvm_arch_vcpu_ioctl_run+0x1658/0x2fb0 [kvm]
  kvm_vcpu_ioctl+0x479/0x880 [kvm]
  do_vfs_ioctl+0x142/0x9a0
  SyS_ioctl+0x74/0x80
  do_syscall_64+0x15f/0x600

where kvm_put_guest_fpu is called without a prior kvm_load_guest_fpu.
To fix it, move kvm_load_guest_fpu to the very beginning of
kvm_arch_vcpu_ioctl_run.

Cc: stable@vger.kernel.org
Fixes: f775b13eSigned-off-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5663d8f9

KVM: X86: Fix load RFLAGS w/o the fixed bit · d73235d1

由 Wanpeng Li 提交于 12月 07, 2017

 *** Guest State ***
 CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
 CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffe871
 CR3 = 0x00000000fffbc000
 RSP = 0x0000000000000000  RIP = 0x0000000000000000
 RFLAGS=0x00000000         DR7 = 0x0000000000000400
        ^^^^^^^^^^

The failed vmentry is triggered by the following testcase when ept=Y:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[5];
    int main()
    {
    	r[2] = open("/dev/kvm", O_RDONLY);
    	r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
    	r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
    	struct kvm_regs regs = {
    		.rflags = 0,
    	};
    	ioctl(r[4], KVM_SET_REGS, &regs);
    	ioctl(r[4], KVM_RUN, 0);
    }

X86 RFLAGS bit 1 is fixed set, userspace can simply clearing bit 1
of RFLAGS with KVM_SET_REGS ioctl which results in vmentry fails.
This patch fixes it by oring X86_EFLAGS_FIXED during ioctl.

Cc: stable@vger.kernel.org
Suggested-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NQuan Xu <quan.xu0@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d73235d1

KVM: MMU: Fix infinite loop when there is no available mmu page · ed52870f

由 Wanpeng Li 提交于 12月 04, 2017

The below test case can cause infinite loop in kvm when ept=0.

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[5];
    int main()
    {
    	r[2] = open("/dev/kvm", O_RDONLY);
    	r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
    	r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
    	ioctl(r[4], KVM_RUN, 0);
    }

It doesn't setup the memory regions, mmu_alloc_shadow/direct_roots() in
kvm return 1 when kvm fails to allocate root page table which can result
in beblow infinite loop:

    vcpu_run() {
    	for (;;) {
	    	r = vcpu_enter_guest()::kvm_mmu_reload() returns 1
	    	if (r <= 0)
	    		break;
	    	if (need_resched())
	    		cond_resched();
      }
    }

This patch fixes it by returning -ENOSPC when there is no available kvm mmu
page for root page table.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 26eeb53c (KVM: MMU: Bail out immediately if there is no available mmu page)
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ed52870f

09 12月, 2017 1 次提交

kmemcheck: rip it out for real · f335195a

由 Michal Hocko 提交于 12月 06, 2017

Commit 4675ff05 ("kmemcheck: rip it out") has removed the code but
for some reason SPDX header stayed in place.  This looks like a rebase
mistake in the mmotm tree or the merge mistake.  Let's drop those
leftovers as well.
Signed-off-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f335195a

07 12月, 2017 1 次提交

x86/vdso: Change time() prototype to match __vdso_time() · 88edb57d

由 Arnd Bergmann 提交于 12月 04, 2017

gcc-8 warns that time() is an alias for __vdso_time() but the two
have different prototypes:

  arch/x86/entry/vdso/vclock_gettime.c:327:5: error: 'time' alias between functions of incompatible types 'int(time_t *)' {aka 'int(long int *)'} and 'time_t(time_t *)' {aka 'long int(long int *)'} [-Werror=attribute-alias]
   int time(time_t *t)
       ^~~~
  arch/x86/entry/vdso/vclock_gettime.c:318:16: note: aliased declaration here

I could not figure out whether this is intentional, but I see that
changing it to return time_t avoids the warning.

Returning 'int' from time() is also a bit questionable, as it causes an
overflow in y2038 even on 64-bit architectures that use a 64-bit time_t
type. On 32-bit architecture with 64-bit time_t, time() should always
be implement by the C library by calling a (to be added) clock_gettime()
variant that takes a sufficiently wide argument.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: http://lkml.kernel.org/r/20171204150203.852959-1-arnd@arndb.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

88edb57d

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功