1. 19 May 2016, 1 commit
  2. 29 Apr 2016, 1 commit
    • KVM: x86: fix ordering of cr0 initialization code in vmx_cpu_reset · f2463247
      Authored by Bruce Rogers
      Commit d28bc9dd reversed the order of two lines which initialize cr0,
      allowing the current (old) cr0 value to mess up vcpu initialization.
      This was observed in the checks for the X86_CR0_WP bit of cr0 in the
      context of kvm_mmu_reset_context().  Moreover, setting vcpu->arch.cr0
      after vmx_set_cr0()
      is completely redundant. Change the order back to ensure proper vcpu
      initialization.
      
      The combination of booting with ovmf firmware when guest vcpus > 1 and kvm's
      ept=N option being set results in a VM-entry failure. This patch fixes that.
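
      A minimal user-space sketch of the ordering issue (the `vcpu_t` struct and the
      helpers below are illustrative, not KVM's actual API): state derived from cr0
      during reset must see the new value, so the field has to be assigned before
      anything consults it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

#define X86_CR0_WP (1u << 16)

/* toy stand-in for struct kvm_vcpu; names are illustrative */
typedef struct {
    uint32_t cr0;   /* vcpu->arch.cr0 */
    bool mmu_wp;    /* MMU state derived from CR0.WP, as in kvm_mmu_reset_context() */
} vcpu_t;

/* buggy order (after commit d28bc9dd): MMU state is derived from the
 * stale cr0 left over from before the reset */
static bool reset_buggy(vcpu_t *vcpu, uint32_t init_cr0)
{
    vcpu->mmu_wp = vcpu->cr0 & X86_CR0_WP;  /* reads the OLD value */
    vcpu->cr0 = init_cr0;
    return vcpu->mmu_wp;
}

/* fixed order: assign vcpu->arch.cr0 first, then derive dependent state */
static bool reset_fixed(vcpu_t *vcpu, uint32_t init_cr0)
{
    vcpu->cr0 = init_cr0;
    vcpu->mmu_wp = vcpu->cr0 & X86_CR0_WP;  /* sees the NEW value */
    return vcpu->mmu_wp;
}
```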
      
      Fixes: d28bc9dd ("KVM: x86: INIT and reset sequences are different")
      Cc: stable@vger.kernel.org
      Signed-off-by: Bruce Rogers <brogers@suse.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  3. 22 Mar 2016, 6 commits
  4. 10 Mar 2016, 1 commit
    • KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo · 844a5fe2
      Authored by Paolo Bonzini
      Yes, all of these are needed. :) This is admittedly a bit odd, but
      kvm-unit-tests access.flat tests this if you run it with "-cpu host"
      and of course ept=0.
      
      KVM runs the guest with CR0.WP=1, so it must handle supervisor writes
      specially when pte.u=1/pte.w=0/CR0.WP=0.  Such writes cause a fault
      when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0.
      When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and
      restarts execution.  This will still cause a user write to fault, while
      supervisor writes will succeed.  User reads will fault spuriously now,
      and KVM will then flip U and W again in the SPTE (U=1, W=0).  User reads
      will be enabled and supervisor writes disabled, going back to the
      original situation where supervisor writes fault spuriously.
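
      The flip described above can be sketched as a tiny state machine (a user-space
      illustration; `spte_t` and the helpers are not KVM's real shadow-MMU code):

```c
#include <assert.h>
#include <stdbool.h>

/* toy shadow PTE holding just the user and writable bits */
typedef struct { bool u, w; } spte_t;

/* hardware check: does an access fault against this SPTE? */
static bool faults(const spte_t *s, bool user, bool write)
{
    if (user && !s->u)
        return true;   /* user access to a supervisor page */
    if (write && !s->w)
        return true;   /* write to a read-only page (KVM runs with CR0.WP=1) */
    return false;
}

/* KVM's handler for pte.u=1/pte.w=0/guest CR0.WP=0: the guest allows
 * supervisor writes and user reads, so flip U/W on demand */
static void handle_fault(spte_t *s, bool user, bool write)
{
    if (!user && write) {        /* supervisor write must succeed */
        s->u = false;
        s->w = true;
    } else if (user && !write) { /* user read must succeed */
        s->u = true;
        s->w = false;
    }
}
```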
      
      When SMEP is in effect, however, U=0 will enable kernel execution of
      this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
      with U=0.  If the guest has not enabled NX, the result is a continuous
      stream of page faults due to the NX bit being reserved.
      
      The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
      switch.  (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry
      control, so they do not use user-return notifiers for EFER---if they did,
      EFER.NX would be forced to the same value as the host).
      
      There is another bug in the reserved bit check, which I've split to a
      separate patch for easier application to stable kernels.
      
      Cc: stable@vger.kernel.org
      Cc: Andy Lutomirski <luto@amacapital.net>
      Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Fixes: f6577a5f
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 09 Mar 2016, 1 commit
    • KVM: x86: disable MPX if host did not enable MPX XSAVE features · a87036ad
      Authored by Paolo Bonzini
      When eager FPU is disabled, KVM will still see the MPX bit in CPUID and
      presumably the MPX vmentry and vmexit controls.  However, it will not
      be able to expose the MPX XSAVE features to the guest, because the guest's
      accessible XSAVE features are always a subset of host_xcr0.
      
      In this case, we should disable the MPX CPUID bit, the BNDCFGS MSR,
      and the MPX vmentry and vmexit controls for nested virtualization.
      It is then unnecessary to enable guest eager FPU if the guest has the
      MPX CPUID bit set.
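
      A sketch of that gating (the bit positions are the architectural ones; the
      helper itself is an illustration, not KVM's cpuid code):

```c
#include <assert.h>
#include <stdint.h>

/* MPX needs two XSAVE states: BNDREGS (bit 3) and BNDCSR (bit 4) */
#define XFEATURE_MASK_BNDREGS (1ull << 3)
#define XFEATURE_MASK_BNDCSR  (1ull << 4)
#define XFEATURE_MASK_MPX     (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)

/* CPUID.(EAX=7,ECX=0):EBX bit 14 advertises MPX */
#define CPUID_7_EBX_MPX       (1u << 14)

/* guest-visible XSAVE features are a subset of host_xcr0, so hide the
 * MPX CPUID bit when the host did not enable both MPX states */
static uint32_t guest_cpuid_7_ebx(uint32_t ebx, uint64_t host_xcr0)
{
    if ((host_xcr0 & XFEATURE_MASK_MPX) != XFEATURE_MASK_MPX)
        ebx &= ~CPUID_7_EBX_MPX;
    return ebx;
}
```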
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 08 Mar 2016, 1 commit
    • KVM: VMX: disable PEBS before a guest entry · 7099e2e1
      Authored by Radim Krčmář
      Linux guests on Haswell (and also SandyBridge and Broadwell, at least)
      would crash if you decided to run a host command that uses PEBS, like
        perf record -e 'cpu/mem-stores/pp' -a
      
      This happens because KVM is using VMX MSR switching to disable PEBS, but
      SDM [2015-12] 18.4.4.4 Re-configuring PEBS Facilities explains why it
      isn't safe:
        When software needs to reconfigure PEBS facilities, it should allow a
        quiescent period between stopping the prior event counting and setting
        up a new PEBS event. The quiescent period is to allow any latent
        residual PEBS records to complete its capture at their previously
        specified buffer address (provided by IA32_DS_AREA).
      
      There might not be a quiescent period after the MSR switch, so a CPU
      ends up using host's MSR_IA32_DS_AREA to access an area in guest's
      memory.  (Or MSR switching is just buggy on some models.)
      
      The guest can learn something about the host this way: if the guest
      doesn't map the address pointed to by MSR_IA32_DS_AREA, the access
      results in a #PF that leaks the host's MSR_IA32_DS_AREA value through CR2.
      
      After that, a malicious guest can map and configure memory where
      MSR_IA32_DS_AREA is pointing and can therefore get an output from
      host's tracing.
      
      This is not a critical leak, as the host must initiate PEBS tracing
      first, and I have not been able to get a record from more than one
      instruction before the vmentry in vmx_vcpu_run() (by that point most
      registers have already been overwritten with the guest's).
      
      We could disable PEBS just a few instructions before vmentry, but
      disabling it earlier shouldn't affect host tracing too much.
      We also don't need to switch MSR_IA32_PEBS_ENABLE on VMENTRY, but that
      optimization isn't worth its code, IMO.
      
      (If you are implementing PEBS for guests, be sure to handle the case
       where both host and guest enable PEBS, because this patch doesn't.)
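
      The ordering the patch enforces can be sketched like this (a user-space
      simulation; `cpu_t` and the helpers are illustrative, not KVM code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

#define MSR_IA32_PEBS_ENABLE 0x3f1

/* toy CPU model: the PEBS enable MSR and whether we entered the guest */
typedef struct {
    uint64_t pebs_enable;
    bool entered_guest;
} cpu_t;

/* stand-in for wrmsr(): an explicit write, well before VM entry, gives
 * latent PEBS records time to drain via the host's IA32_DS_AREA */
static void disable_pebs(cpu_t *c)
{
    c->pebs_enable = 0;
}

/* stand-in for the vmentry path: with the fix, PEBS is already off here
 * instead of being switched atomically via the VMCS MSR-load list */
static bool vmentry(cpu_t *c)
{
    if (c->pebs_enable)
        return false;   /* would risk a stray PEBS record mid-transition */
    c->entered_guest = true;
    return true;
}
```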
      
      Fixes: 26a4f3c0 ("perf/x86: disable PEBS on a guest entry.")
      Cc: <stable@vger.kernel.org>
      Reported-by: Jiří Olša <jolsa@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 04 Mar 2016, 1 commit
  8. 02 Mar 2016, 1 commit
  9. 24 Feb 2016, 2 commits
  10. 23 Feb 2016, 1 commit
  11. 17 Feb 2016, 3 commits
  12. 09 Feb 2016, 2 commits
  13. 16 Jan 2016, 1 commit
    • kvm: rename pfn_t to kvm_pfn_t · ba049e93
      Authored by Dan Williams
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      This need for "struct page" for persistent memory in turn requires
      memory capacity to store the memmap array.  Given that the memmap array
      for a large pool of persistent memory may exhaust available DRAM,
      introduce a mechanism to allocate the memmap from persistent memory
      itself.  The new "struct vmem_altmap *" parameter to
      devm_memremap_pages() enables arch_add_memory() to use reserved pmem
      capacity rather than the page allocator.
      
      This patch (of 18):
      
      The core has developed a need for a "pfn_t" type [1].  Move the existing
      pfn_t in KVM to kvm_pfn_t [2].
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 12 Jan 2016, 1 commit
  15. 17 Dec 2015, 4 commits
    • KVM: vmx: detect mismatched size in VMCS read/write · 8a86aea9
      Authored by Paolo Bonzini
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ---
      	I am sending this as RFC because the error messages it produces are
      	very ugly.  Because of inlining, the original line is lost.  The
      	alternative is to change vmcs_read/write/checkXX into macros, but
      	then you need to have a single huge BUILD_BUG_ON or BUILD_BUG_ON_MSG
      	because multiple BUILD_BUG_ON* with the same __LINE__ are not
      	supported well.
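
      The check relies on a VMCS field encoding carrying its access width in
      bits 14:13 (0 = 16-bit, 1 = 64-bit, 2 = 32-bit, 3 = natural width, per the
      Intel SDM).  A user-space sketch, with the kernel's BUILD_BUG_ON modeled
      as an ordinary runtime check:

```c
#include <assert.h>
#include <stdint.h>

/* bits 14:13 of a VMCS field encoding give its access width (Intel SDM) */
enum vmcs_field_width {
    VMCS_FIELD_WIDTH_U16     = 0,
    VMCS_FIELD_WIDTH_U64     = 1,
    VMCS_FIELD_WIDTH_U32     = 2,
    VMCS_FIELD_WIDTH_NATURAL = 3,
};

static int vmcs_field_width(uint32_t field)
{
    return (field >> 13) & 3;
}

/* runtime stand-in for the BUILD_BUG_ON inside vmcs_read16()/write16():
 * nonzero when a 16-bit accessor may touch this field */
static int vmcs_check16(uint32_t field)
{
    return vmcs_field_width(field) == VMCS_FIELD_WIDTH_U16;
}
```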
    • KVM: VMX: fix read/write sizes of VMCS fields in dump_vmcs · 845c5b40
      Authored by Paolo Bonzini
      This was not printing the high parts of several 64-bit fields on
      32-bit kernels.  Separate from the previous one to make the patches
      easier to review.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: fix read/write sizes of VMCS fields · f3531054
      Authored by Paolo Bonzini
      In theory this should have broken EPT on 32-bit kernels (due to
      reading the high part of the natural-width field GUEST_CR3).  It is
      not clear whether no one noticed or the processor behaves differently
      from the documentation.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: fix the writing of POSTED_INTR_NV · 0bcf261c
      Authored by Li RongQing
      POSTED_INTR_NV is a 16-bit field and should not be written with the
      64-bit write function:
      
      [ 5311.676074] vmwrite error: reg 3 value 0 (err 12)
        [ 5311.680001] CPU: 49 PID: 4240 Comm: qemu-system-i38 Tainted: G I 4.1.13-WR8.0.0.0_standard #1
        [ 5311.689343] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
        [ 5311.699550] 00000000 00000000 e69a7e1c c1950de1 00000000 e69a7e38 fafcff45 fafebd24
        [ 5311.706924] 00000003 00000000 0000000c b6a06dfa e69a7e40 fafcff79 e69a7eb0 fafd5f57
        [ 5311.714296] e69a7ec0 c1080600 00000000 00000001 c0e18018 000001be 00000000 00000b43
        [ 5311.721651] Call Trace:
        [ 5311.722942] [<c1950de1>] dump_stack+0x4b/0x75
        [ 5311.726467] [<fafcff45>] vmwrite_error+0x35/0x40 [kvm_intel]
        [ 5311.731444] [<fafcff79>] vmcs_writel+0x29/0x30 [kvm_intel]
        [ 5311.736228] [<fafd5f57>] vmx_create_vcpu+0x337/0xb90 [kvm_intel]
        [ 5311.741600] [<c1080600>] ? dequeue_task_fair+0x2e0/0xf60
        [ 5311.746197] [<faf3b9ca>] kvm_arch_vcpu_create+0x3a/0x70 [kvm]
        [ 5311.751278] [<faf29e9d>] kvm_vm_ioctl+0x14d/0x640 [kvm]
        [ 5311.755771] [<c1129d44>] ? free_pages_prepare+0x1a4/0x2d0
        [ 5311.760455] [<c13e2842>] ? debug_smp_processor_id+0x12/0x20
        [ 5311.765333] [<c10793be>] ? sched_move_task+0xbe/0x170
        [ 5311.769621] [<c11752b3>] ? kmem_cache_free+0x213/0x230
        [ 5311.774016] [<faf29d50>] ? kvm_set_memory_region+0x60/0x60 [kvm]
        [ 5311.779379] [<c1199fa2>] do_vfs_ioctl+0x2e2/0x500
        [ 5311.783285] [<c11752b3>] ? kmem_cache_free+0x213/0x230
        [ 5311.787677] [<c104dc73>] ? __mmdrop+0x63/0xd0
        [ 5311.791196] [<c104dc73>] ? __mmdrop+0x63/0xd0
        [ 5311.794712] [<c104dc73>] ? __mmdrop+0x63/0xd0
        [ 5311.798234] [<c11a2ed7>] ? __fget+0x57/0x90
        [ 5311.801559] [<c11a2f72>] ? __fget_light+0x22/0x50
        [ 5311.805464] [<c119a240>] SyS_ioctl+0x80/0x90
        [ 5311.808885] [<c1957d30>] sysenter_do_call+0x12/0x12
        [ 5312.059280] kvm: zapping shadow pages for mmio generation wraparound
        [ 5313.678415] kvm [4231]: vcpu0 disabled perfctr wrmsr: 0xc2 data 0xffff
        [ 5313.726518] kvm [4231]: vcpu0 unhandled rdmsr: 0x570
      Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  16. 14 Dec 2015, 1 commit
  17. 11 Dec 2015, 1 commit
  18. 26 Nov 2015, 2 commits
    • kvm/x86: per-vcpu apicv deactivation support · d62caabb
      Authored by Andrey Smetanin
      The decision on whether to use hardware APIC virtualization used to be
      taken globally, based on the availability of the feature in the CPU
      and the value of a module parameter.
      
      However, under certain circumstances we want to control it on a
      per-vcpu basis.  In particular, when userspace activates the Hyper-V
      synthetic interrupt controller (SynIC), APICv has to be disabled as
      it is incompatible with SynIC's auto-EOI behavior.
      
      To achieve that, introduce 'apicv_active' flag on struct
      kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv
      off.  The flag is initialized based on the module parameter and CPU
      capability, and consulted whenever an APICv-specific action is
      performed.
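
      A sketch of the flag's life cycle (structure and helper names are
      simplified from the description above, not copied from the kernel):

```c
#include <assert.h>
#include <stdbool.h>

/* module parameter combined with the CPU capability check */
static bool enable_apicv = true;

/* per-vcpu slice of struct kvm_vcpu_arch relevant here */
struct vcpu_arch {
    bool apicv_active;
};

static void vcpu_init(struct vcpu_arch *arch)
{
    arch->apicv_active = enable_apicv;  /* inherit the global default */
}

/* mirrors kvm_vcpu_deactivate_apicv(): called e.g. when userspace
 * enables SynIC, whose auto-EOI is incompatible with APICv */
static void vcpu_deactivate_apicv(struct vcpu_arch *arch)
{
    arch->apicv_active = false;
}
```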
      Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm/x86: split ioapic-handled and EOI exit bitmaps · 6308630b
      Authored by Andrey Smetanin
      The function that determines whether a vector is handled by the ioapic
      used to rely on the fact that only ioapic-handled vectors were set up
      to cause vmexits when the virtual apic was in use.
      
      We're going to break this assumption when introducing Hyper-V
      synthetic interrupts: they may need to cause vmexits too.
      
      To achieve that, introduce a new bitmap dedicated specifically for
      ioapic-handled vectors, and populate EOI exit bitmap from it for now.
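
      Sketched in user space (256 vectors in four 64-bit words; the helper
      names are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VEC_WORDS 4   /* 256 interrupt vectors / 64 bits per word */

static void set_vector(uint64_t *bm, int vec)
{
    bm[vec / 64] |= 1ull << (vec % 64);
}

static int test_vector(const uint64_t *bm, int vec)
{
    return (bm[vec / 64] >> (vec % 64)) & 1;
}

/* for now the EOI-exit bitmap is populated straight from the dedicated
 * ioapic-handled bitmap; Hyper-V SynIC vectors can later be OR-ed in */
static void load_eoi_exitmap(uint64_t *eoi, const uint64_t *ioapic)
{
    memcpy(eoi, ioapic, VEC_WORDS * sizeof(*eoi));
}
```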
      Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  19. 25 Nov 2015, 1 commit
    • KVM: nVMX: remove incorrect vpid check in nested invvpid emulation · b2467e74
      Authored by Haozhong Zhang
      This patch removes the vpid check when emulating nested invvpid
      instruction of type all-contexts invalidation. The existing code is
      incorrect because:
       (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate
           Translations Based on VPID", invvpid instruction does not check
           vpid in the invvpid descriptor when its type is all-contexts
           invalidation.
       (2) According to the same document, invvpid of type all-contexts
           invalidation does not require an active VMCS, so get_vmcs12() in
           the existing code may result in a NULL-pointer dereference.  In
           practice, this can crash both KVM itself and L1 hypervisors that
           use invvpid (e.g. Xen).
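
      The corrected flow can be sketched as follows (the extent-type constant
      matches the SDM; the handler itself is illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/* INVVPID types per the Intel SDM: 0 = individual-address,
 * 1 = single-context, 2 = all-contexts, 3 = single-context-retaining-globals */
#define VMX_VPID_EXTENT_ALL_CONTEXT 2

/* returns true when the emulated instruction succeeds */
static bool emulate_invvpid(uint32_t type, uint16_t desc_vpid, bool has_vmcs12)
{
    if (type == VMX_VPID_EXTENT_ALL_CONTEXT)
        return true;           /* neither the descriptor vpid nor an active
                                  VMCS may be checked for this type */
    if (!has_vmcs12)
        return false;          /* other types do consult vmcs12 state */
    if (desc_vpid == 0)
        return false;          /* and a zero vpid is invalid for them */
    return true;
}
```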
      Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  20. 10 Nov 2015, 8 commits