Commits · 67198ac3f37ffb150f1c95fae16b597339eabc9d · openanolis / cloud-kernel

20 Sep, 2016 2 commits

KVM: x86: initialize kvmclock_offset · 67198ac3

Committed by Paolo Bonzini on Sep 01, 2016

Make the guest's kvmclock count up from zero, not from the host boot
time.  The guest cannot rely on that anyway because it changes on
migration; the numbers are easier on the eye; and finally it matches the
desired semantics of the Hyper-V time reference counter.
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

67198ac3
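
As a rough illustration of the change described above, the sketch below models a guest clock as the host's time since boot plus a per-VM offset, with the offset chosen at VM creation so that the guest starts counting from zero. The struct and function names (`vm_clock`, `vm_clock_init`, `vm_clock_read`) are hypothetical and are not the kernel's; only the `kvmclock_offset` name comes from the commit title.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the kernel's data structures. */
struct vm_clock {
    int64_t kvmclock_offset; /* added to the host boot-based clock, in ns */
};

/* At VM creation, pick the offset so that the guest clock starts at zero. */
static void vm_clock_init(struct vm_clock *c, uint64_t host_boot_ns_now)
{
    assert(host_boot_ns_now <= INT64_MAX);
    c->kvmclock_offset = -(int64_t)host_boot_ns_now;
}

/* Guest-visible time = host time since boot + per-VM offset. */
static uint64_t vm_clock_read(const struct vm_clock *c, uint64_t host_boot_ns_now)
{
    return host_boot_ns_now + (uint64_t)c->kvmclock_offset;
}
```

With this model, a guest created when the host has been up for five seconds reads zero at creation time, and a migration target only needs to carry the offset over for the guest's view to stay continuous.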

KVM: x86: always fill in vcpu->arch.hv_clock · 0d6dd2ff

Committed by Paolo Bonzini on Sep 01, 2016

We will use it in the next patches for KVM_GET_CLOCK and as a basis for the
contents of the Hyper-V TSC page.  Get the values from the Linux
timekeeper even if kvmclock is not enabled.
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0d6dd2ff

16 Sep, 2016 4 commits

kvm: x86: export TSC information to user-space · 4f5758fc

Committed by Luiz Capitulino on Sep 16, 2016

This commit exports the following information to
user-space via the newly created per-vcpu debugfs
directory:

 - TSC offset (as a signed number)
 - TSC scaling ratio
 - TSC scaling ratio fractional bits

The original intention of this commit was to
export only the TSC offset, but the TSC scaling
information is exported for completeness.

We need to retrieve the TSC offset from user-space
in order to support the merging of host and guest
traces in trace-cmd. Today, we use the kvm_write_tsc_offset
tracepoint, but it has a number of problems (mainly,
it requires a running VM to be rebooted, ftrace setup,
and also tracepoints are not supposed to be ABIs).

The merging of host and guest traces is explained
in more detail in this thread:

 [Qemu-devel] [RFC] host and guest kernel trace merging
 https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html

This commit creates the following files in debugfs:

/sys/kernel/debug/kvm/66828-10/vcpu0/tsc-offset
/sys/kernel/debug/kvm/66828-10/vcpu0/tsc-scaling-ratio
/sys/kernel/debug/kvm/66828-10/vcpu0/tsc-scaling-ratio-frac-bits

The last two are only created if TSC scaling is supported.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4f5758fc
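
A user-space tool that reads these three values could, under the usual fixed-point interpretation, convert a host TSC reading into the guest's view roughly as follows. This is a minimal sketch, not code from the commit: the function name `guest_tsc` is hypothetical, the formula (scale, shift, then add the offset) is an assumption about how the exported values combine, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed relationship between the exported values:
 *   guest_tsc = ((host_tsc * ratio) >> frac_bits) + offset
 * The offset is signed, which is why the debugfs file exports it as a
 * signed number.
 */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t ratio,
                          unsigned frac_bits, int64_t offset)
{
    assert(frac_bits < 64);
    /* 128-bit intermediate so the multiplication cannot overflow. */
    unsigned __int128 scaled = (unsigned __int128)host_tsc * ratio;
    return (uint64_t)(scaled >> frac_bits) + (uint64_t)offset;
}
```

When TSC scaling is not supported, the last two files are absent; a caller of this sketch could then pass a ratio of `1ULL << frac_bits`, which leaves the host value unscaled.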

kvm: add stubs for arch specific debugfs support · 235539b4

Committed by Luiz Capitulino on Sep 07, 2016

Two stubs are added:

 o kvm_arch_has_vcpu_debugfs(): must return true if the arch
   supports creating debugfs entries in the vcpu debugfs dir
   (which will be implemented by the next commit)

 o kvm_arch_create_vcpu_debugfs(): code that creates debugfs
   entries in the vcpu debugfs dir

For x86, this commit introduces a new file to avoid growing
arch/x86/kvm/x86.c even more.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

235539b4

kvm: x86: drop read_tsc_offset() · 3e3f5026

Committed by Luiz Capitulino on Sep 07, 2016

The TSC offset can now be read directly from struct kvm_vcpu_arch.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3e3f5026

kvm: x86: add tsc_offset field to struct kvm_vcpu_arch · a545ab6a

Committed by Luiz Capitulino on Sep 07, 2016

A future commit will want to easily read a vCPU's TSC offset,
so we store it in struct kvm_vcpu_arch for easy access.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a545ab6a

08 Sep, 2016 10 commits

svm: Implements update_pi_irte hook to setup posted interrupt · 411b44ba

Committed by Suravee Suthikulpanit on Aug 23, 2016

This patch implements the update_pi_irte function hook to allow SVM
to communicate to the IOMMU driver how to set up the IRTE for handling
posted interrupts.

When AVIC is enabled, during vcpu_load/unload, SVM needs to update the
IOMMU IRTE with the appropriate host physical APIC ID. Also, on
vcpu_blocking/unblocking, SVM needs to update the is-running bit in
the IOMMU IRTE. Both are achieved by calling amd_iommu_update_ga().

However, if GA mode is not enabled for the pass-through device, the
IOMMU driver simply returns when amd_iommu_update_ga() is called.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

411b44ba

svm: Introduce AMD IOMMU avic_ga_log_notifier · 5881f737

Committed by Suravee Suthikulpanit on Aug 23, 2016

This patch introduces avic_ga_log_notifier, which will be called
by the IOMMU driver whenever it handles a Guest vAPIC (GA) log entry.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5881f737

svm: Introduces AVIC per-VM ID · 5ea11f2b

Committed by Suravee Suthikulpanit on Aug 23, 2016

Introduces a per-VM AVIC ID and helper functions to manage the IDs.
Currently, the ID will be used to implement the 32-bit AVIC IOMMU GA tag.

The ID is a 24-bit, one-based index value, and is managed via helper
functions to get the next ID, or to free an ID once a VM is destroyed.
There should be no ID conflict between any active VMs.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5ea11f2b
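
One way a 24-bit VM ID could be combined with a vCPU ID to form a 32-bit tag is sketched below. The layout (VM ID in the upper 24 bits, vCPU ID in the lower 8) and all macro and function names are assumptions made for illustration; the commit message only states that the VM ID is 24 bits wide, one-based, and used for a 32-bit GA tag.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed split of the 32-bit tag: 24 bits of VM ID, 8 bits of vCPU ID. */
#define VM_ID_BITS   24u
#define VCPU_ID_BITS 8u
#define VM_ID_MASK   ((1u << VM_ID_BITS) - 1)
#define VCPU_ID_MASK ((1u << VCPU_ID_BITS) - 1)

/* Pack a one-based VM ID and a vCPU ID into a single 32-bit tag. */
static uint32_t ga_tag_pack(uint32_t vm_id, uint32_t vcpu_id)
{
    assert(vm_id >= 1 && vm_id <= VM_ID_MASK); /* IDs are one-based */
    assert(vcpu_id <= VCPU_ID_MASK);
    return (vm_id << VCPU_ID_BITS) | vcpu_id;
}

/* Recover the two fields from a tag, e.g. when handling a log entry. */
static uint32_t ga_tag_vm_id(uint32_t tag)
{
    return (tag >> VCPU_ID_BITS) & VM_ID_MASK;
}

static uint32_t ga_tag_vcpu_id(uint32_t tag)
{
    return tag & VCPU_ID_MASK;
}
```

Because the ID is one-based, a VM ID of zero never appears in a valid tag under this layout, so it can serve as a "no VM" marker.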

KVM: nVMX: expose INS/OUTS information support · 9ac7e3e8

Committed by Jan Dakinevich on Sep 04, 2016

Expose the feature to the L1 hypervisor if the host CPU supports it, since
certain hypervisors require it for their own purposes.

According to Intel SDM A.1, if the CPU supports the feature, the
VMX_INSTRUCTION_INFO field of the VMCS will contain detailed information
about INS/OUTS instruction handling. This field is already copied to
VMCS12 for the L1 hypervisor (see the prepare_vmcs12 routine) independently
of the feature's presence.
Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9ac7e3e8

KVM: VMX: not use vmcs_config in setup_vmcs_config · 16cb0255

Committed by Paolo Bonzini on Sep 05, 2016

setup_vmcs_config takes a pointer to the vmcs_config global.  The
indirection is somewhat pointless, but just keep things consistent
for now.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

16cb0255

KVM: x86: remove stale comments · 1a698235

Committed by Paolo Bonzini on Aug 19, 2016

handle_external_intr does not enable interrupts anymore; vcpu_enter_guest
does it after calling guest_exit_irqoff.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1a698235

KVM: x86: ratelimit and decrease severity for guest-triggered printk · bbe41b95

Committed by Paolo Bonzini on Aug 19, 2016

These are mostly related to nested VMX.  They needn't have
a loglevel as high as KERN_WARNING, and mustn't be allowed to
pollute the host logs.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

bbe41b95

KVM: nVMX: pass valid guest linear-address to the L1 · 119a9c01

Committed by Jan Dakinevich on Sep 04, 2016

If EPT support is exposed to the L1 hypervisor, the guest linear-address
field of the VMCS should contain the L2 GVA whose access caused the EPT
violation.
Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

119a9c01

KVM: nVMX: make emulated nested preemption timer pinned · f15a75ee

Committed by Wanpeng Li on Aug 30, 2016

Commit 61abdbe0 ("kvm: x86: make lapic hrtimer pinned") pins the emulated
lapic timer. This patch does the same for the emulated nested preemption
timer, to avoid a vmexit on an unrelated vCPU and the latency of kicking
another vCPU with an IPI.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f15a75ee

vmx: refine validity check for guest linear address · 72e0ae58

Committed by Liang Li on Aug 18, 2016

The validity check for the guest linear address is inefficient;
check for the invalid value instead of enumerating the valid ones.
Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

72e0ae58

19 Aug, 2016 3 commits

KVM: x86: Expose more Intel AVX512 feature to guest · 8e3562f6

Committed by Luwei Kang on Aug 02, 2016

Expose the AVX512DQ, AVX512BW and AVX512VL features to the guest.
The spec can be found at:
https://software.intel.com/sites/default/files/managed/b4/3a/319433-024.pdf
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
[Resolved a trivial conflict with removed F(PCOMMIT).]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

8e3562f6

mmu: don't pass *kvm to spte_write_protect and spte_*_dirty · c4f138b4

Committed by Bandan Das on Aug 02, 2016

That parameter isn't used in these functions;
it's probably a historical artifact.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c4f138b4

KVM: lapic: don't recalculate apic map table twice when enabling LAPIC · 187ca84b

Committed by Wanpeng Li on Aug 03, 2016

The APIC map table is recalculated when the APIC ID is reset to its
initial value while enabling the LAPIC. This patch moves the
recalculate_apic_map() call to the next branch, since the current code
does not need to recalculate the APIC map twice.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

187ca84b

18 Aug, 2016 3 commits

kvm: nVMX: fix nested tsc scaling · c95ba92a

Committed by Peter Feiner on Aug 17, 2016

When the host supported TSC scaling, L2 would use a TSC multiplier of
0, which causes a VM entry failure. Now L2's TSC uses the same
multiplier as L1.
Signed-off-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c95ba92a

KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write · dccbfcf5

Committed by Radim Krčmář on Aug 08, 2016

If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
write with vmcs02 as the current VMCS.
This will incorrectly apply modifications intended for vmcs01 to vmcs02,
and L2 can use it to gain access to L0's x2APIC registers by disabling
virtualized x2APIC while using an MSR bitmap that assumes it is enabled.

Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
current VMCS.  An alternative solution would temporarily make vmcs01 the
current VMCS, but it requires more care.

Fixes: 8d14695f ("x86, apicv: add virtual x2apic support")
Reported-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

dccbfcf5

KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC · d048c098

Committed by Radim Krčmář on Aug 08, 2016

msr bitmap can be used to avoid a VM exit (interception) on guest MSR
accesses.  In some configurations of VMX controls, the guest can even
directly access host's x2APIC MSRs.  See SDM 29.5 VIRTUALIZING MSR-BASED
APIC ACCESSES.

L2 could read all L0's x2APIC MSRs and write TPR, EOI, and SELF_IPI.
To do so, L1 would first trick KVM to disable all possible interceptions
by enabling APICv features and then would turn those features off;
nested_vmx_merge_msr_bitmap() only disabled interceptions, so VMX would
not intercept previously enabled MSRs even though they were not safe
with the new configuration.

Correctly re-enabling interceptions is not enough as a second bug would
still allow L1+L2 to access host's MSRs: msr bitmap was shared for all
VMCSs, so L1 could trigger a race to get the desired combination of msr
bitmap and VMX controls.

This fix allocates a msr bitmap for every L1 VCPU, allows only safe
x2APIC MSRs from L1's msr bitmap, and disables msr bitmaps if they would
have to intercept everything anyway.

Fixes: 3af18d9c ("KVM: nVMX: Prepare for using hardware MSR bitmap")
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Wincy Van <fanwenyi0529@gmail.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

d048c098

04 Aug, 2016 3 commits

nvmx: mark ept single context invalidation as supported · 45e11817

Committed by Bandan Das on Aug 02, 2016

Commit 4b855078 ("KVM: nVMX: Don't advertise single
context invalidation for invept") removed advertising
single context invalidation since the spec does not mandate it.
However, some hypervisors (such as ESX) require it to be present
before they are willing to use EPT in a nested environment. Advertise it
and fall back to the global case.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

45e11817

nvmx: remove comment about missing nested vpid support · 03331b4b

Committed by Bandan Das on Aug 02, 2016

Nested vpid is already supported and both single/global
modes are advertised to the guest.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

03331b4b

KVM: lapic: fix access preemption timer stuff even if kernel_irqchip=off · 91005300

Committed by Wanpeng Li on Aug 03, 2016

BUG: unable to handle kernel NULL pointer dereference at 000000000000008c
IP: [<ffffffffc04e0180>] kvm_lapic_hv_timer_in_use+0x10/0x20 [kvm]
PGD 0
Oops: 0000 [#1] SMP
Call Trace:
 kvm_arch_vcpu_load+0x86/0x260 [kvm]
 vcpu_load+0x46/0x60 [kvm]
 kvm_vcpu_ioctl+0x79/0x7c0 [kvm]
 ? __lock_is_held+0x54/0x70
 do_vfs_ioctl+0x96/0x6a0
 ? __fget_light+0x2a/0x90
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x7c/0x1e0
 entry_SYSCALL64_slow_path+0x25/0x25
RIP  [<ffffffffc04e0180>] kvm_lapic_hv_timer_in_use+0x10/0x20 [kvm]
 RSP <ffff8800db1f3d70>
CR2: 000000000000008c
---[ end trace a55fb79d2b3b4ee8 ]---

This can be reproduced reliably with kernel_irqchip=off.

We should not access preemption timer state if the lapic is emulated in userspace.
This patch fixes it by avoiding access to preemption timer state when kernel_irqchip=off.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

91005300

01 Aug, 2016 2 commits

KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD · b80c76ec

Committed by Jim Mattson on Jul 29, 2016

Kexec needs to know the addresses of all VMCSs that are active on
each CPU, so that it can flush them from the VMCS caches. It is
safe to record superfluous addresses that are not associated with
an active VMCS, but it is not safe to omit an address associated
with an active VMCS.

After a call to vmcs_load, the VMCS that was loaded is active on
the CPU. The VMCS should be added to the CPU's list of active
VMCSs before it is loaded.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

b80c76ec

kvm: x86: nVMX: maintain internal copy of current VMCS · 4f2777bc

Committed by David Matlack on Jul 13, 2016

KVM maintains L1's current VMCS in guest memory, at the guest physical
page identified by the argument to VMPTRLD. This makes hairy
time-of-check to time-of-use bugs possible, as VCPUs can be writing
the VMCS page in memory while KVM is emulating VMLAUNCH and
VMRESUME.

The spec documents that writing to the VMCS page while it is loaded is
"undefined". Therefore it is reasonable to load the entire VMCS into
an internal cache during VMPTRLD and ignore writes to the VMCS page
-- the guest should be using VMREAD and VMWRITE to access the current
VMCS.

To adhere to the spec, KVM should flush the current VMCS during VMPTRLD,
and the target VMCS during VMCLEAR (as given by the operand to VMCLEAR).
Since this implementation of VMCS caching only maintains the current
VMCS, VMCLEAR will only do a flush if the operand to VMCLEAR is the
current VMCS pointer.

KVM will also flush during VMXOFF, which is not mandated by the spec,
but also not in conflict with the spec.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4f2777bc
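
The snapshot idea in this commit can be sketched as follows: copy the guest's VMCS page into private memory once, at load time, and let all later emulation read only the copy, so that concurrent guest writes to the page can no longer change values between check and use. The type and function names (`vmcs_cache`, `cache_load`, `cache_flush`) and the 4096-byte size are hypothetical stand-ins, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VMCS_PAGE_SIZE 4096u

/* Hypothetical stand-in for a per-vCPU cache of the current VMCS. */
struct vmcs_cache {
    bool valid;
    uint8_t data[VMCS_PAGE_SIZE];
};

/* On load (VMPTRLD in the commit): snapshot the guest page exactly once. */
static void cache_load(struct vmcs_cache *c, const uint8_t *guest_page)
{
    memcpy(c->data, guest_page, VMCS_PAGE_SIZE);
    c->valid = true;
}

/* On flush: write the snapshot back to guest memory and drop it. */
static void cache_flush(struct vmcs_cache *c, uint8_t *guest_page)
{
    assert(c->valid);
    memcpy(guest_page, c->data, VMCS_PAGE_SIZE);
    c->valid = false;
}
```

In this model, a write to the guest page after `cache_load()` has no effect on what the emulator sees, and it is overwritten when the cache is flushed, which matches the "undefined" behavior the spec allows for such writes.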

24 Jul, 2016 1 commit

Revert "KVM: x86: add pcommit support" · dfa169bb

Committed by Dan Williams on Jun 02, 2016

This reverts commit 8b3e34e4.

Given the deprecation of the pcommit instruction, the relevant VMX
features and CPUID bits are not going to be rolled into the SDM.  Remove
their usage from KVM.

Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

dfa169bb

23 Jul, 2016 1 commit

KVM: Move kvm_setup_default/empty_irq_routing declaration in arch specific header · d9565a73

Committed by Eric Auger on Jul 22, 2016

kvm_setup_default_irq_routing and kvm_setup_empty_irq_routing are
not used by generic code. So let's move the declarations into the x86
irq.h header instead of kvm_host.h.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

d9565a73

16 Jul, 2016 2 commits

KVM: VMX: handle PML full VMEXIT that occurs during event delivery · b244c9fc

Committed by Cao, Lei on Jul 15, 2016

With PML enabled, the guest will shut down if a PML full VMEXIT occurs during
event delivery. According to Intel SDM 27.2.3, a PML full VMEXIT can occur while
an event is being delivered through the IDT, so KVM should not exit to user space
with an error. Instead, it should let EXIT_REASON_PML_FULL go through, and the
event will be re-injected on the next VMENTRY.
Signed-off-by: Lei Cao <lei.cao@stratus.com>
Cc: stable@vger.kernel.org
Fixes: 843e4330 ("KVM: VMX: Add PML support in VMX")
[Shortened the summary and Cc'd stable.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

b244c9fc

Revert "KVM: SVM: fix trashing of MSR_TSC_AUX" · 6a907cd0

Committed by Radim Krčmář on Jul 15, 2016

This reverts commit 9770404a.

The reverted patch is not needed as only userspace uses RDTSCP and
MSR_TSC_AUX is in host_save_user_msrs[] and therefore properly saved in
svm_vcpu_load() and restored in svm_vcpu_put() before every switch to
userspace.

The reverted patch did not allow the kernel to use RDTSCP in the future,
because of missed trashing in svm_set_msr() and 64-bit ifdef.

This reverts commit 2b23c3a6.

2b23c3a6 ("KVM: SVM: do not set MSR_TSC_AUX on 32-bit builds") is a
build fix for 9770404a and reverting them separately would only
break more bisections.

Cc: stable@vger.kernel.org

6a907cd0

15 Jul, 2016 6 commits

x86/kvm/kvmclock: Convert to hotplug state machine · 251a5fd6

Committed by Sebastian Andrzej Siewior on Jul 13, 2016

Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

We assumed that the priority ordering was meant to invoke the online
callback as the last step. In the original code this also invoked the
down prepare callback as the last step. With the symmetric state
machine the down prepare callback is now the first step.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153335.542880859@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

251a5fd6

KVM/x86: Remove superfluous SMP function call · 162e52a1

Committed by Anna-Maria Gleixner on Jul 13, 2016

Since the following commit:

  1cf4f629 ("cpu/hotplug: Move online calls to hotplugged cpu")

... the CPU_ONLINE and CPU_DOWN_PREPARE notifiers are always run on the hot
plugged CPU, and as of commit:

  3b9d6da6 ("cpu/hotplug: Fix rollback during error-out in __cpu_disable()")

the CPU_DOWN_FAILED notifier also runs on the hot plugged CPU.  This patch
converts the SMP function calls into direct calls.

smp_call_function_single() executes the function with interrupts
disabled. This calling convention is not preserved because there
is no reason to do so.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153335.452527104@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

162e52a1

KVM: SVM: do not set MSR_TSC_AUX on 32-bit builds · 2b23c3a6

Committed by Paolo Bonzini on Jul 14, 2016

This is unnecessary---and besides, __getcpu() is not even
available on 32-bit builds.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2b23c3a6

KVM: nVMX: Fix memory corruption when using VMCS shadowing · 2f1fe811

Committed by Jim Mattson on Jul 08, 2016

When freeing the nested resources of a vcpu, there is an assumption that
the vcpu's vmcs01 is the current VMCS on the CPU that executes
nested_release_vmcs12(). If this assumption is violated, the vcpu's
vmcs01 may be made active on multiple CPUs at the same time, in
violation of Intel's specification. Moreover, since the vcpu's vmcs01 is
not VMCLEARed on every CPU on which it is active, it can linger in a
CPU's VMCS cache after it has been freed and potentially
repurposed. Subsequent eviction from the CPU's VMCS cache on a capacity
miss can result in memory corruption.

It is not sufficient for vmx_free_vcpu() to call vmx_load_vmcs01(). If
the vcpu in question was last loaded on a different CPU, it must be
migrated to the current CPU before calling vmx_load_vmcs01().
Signed-off-by: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2f1fe811

kvm: vmx: ensure VMCS is current while enabling PML · 4e59516a

Committed by Peter Feiner on Jul 07, 2016

Between loading the new VMCS and enabling PML, the CPU was unpinned.
If the vCPU thread were migrated to another CPU in the interim (e.g.,
due to preemption or sleeping alloc_page), then the VMWRITEs to enable
PML would target the wrong VMCS -- or no VMCS at all:

  [ 2087.266950] vmwrite error: reg 200e value 3fe1d52000 (err -506126336)
  [ 2087.267062] vmwrite error: reg 812 value 1ff (err 511)
  [ 2087.267125] vmwrite error: reg 401e value 12229c00 (err 304258048)

This patch ensures that the VMCS remains current while enabling PML by
doing the VMWRITEs while the CPU is pinned. Allocation of the PML buffer
is hoisted out of the critical section.
Signed-off-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4e59516a

KVM: SVM: fix trashing of MSR_TSC_AUX · 9770404a

Committed by Paolo Bonzini on Jul 06, 2016

I don't know what I was thinking when I wrote commit 46896c73 ("KVM:
svm: add support for RDTSCP", 2015-11-12); I missed write_rdtscp_aux which
obviously uses MSR_TSC_AUX.

Therefore we do need to save/restore MSR_TSC_AUX in svm_vcpu_run.

Cc: stable@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Fixes: 46896c73 ("KVM: svm: add support for RDTSCP")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9770404a

14 Jul, 2016 3 commits

x86/kvm: Audit and remove any unnecessary uses of module.h · 1767e931

Committed by Paul Gortmaker on Jul 13, 2016

Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends.  That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.  In the case of
kvm where it is modular, we can extend that to also include files
that are building basic support functionality but not related
to loading or registering the final module; such files also have
no need whatsoever for module.h

The advantage in removing such instances is that module.h itself
sources about 15 other headers; adding significantly to what we feed
cpp, and it can obscure what headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each instance for the
presence of either and replace as needed.

Several instances got replaced with moduleparam.h since that was
really all that was required for those particular files.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/20160714001901.31603-8-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

1767e931

KVM: x86: bump KVM_MAX_VCPU_ID to 1023 · af1bae54

Committed by Radim Krčmář on Jul 12, 2016

kzalloc was replaced with kvm_kvzalloc to allow non-contiguous areas and
rcu had to be modified to cope with it.

The practical limit for KVM_MAX_VCPU_ID right now is INT_MAX, but a lower
value was chosen in case there were bugs.  1023 is a sufficient maximum
APIC ID for 288 VCPUs.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

af1bae54

KVM: x86: add a flag to disable KVM x2apic broadcast quirk · c519265f

Committed by Radim Krčmář on Jul 12, 2016

Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to
KVM_CAP_X2APIC_API.

The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode.
The enableable capability is needed in order to support standard x2APIC and
remain backward compatible.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[Expand kvm_apic_mda comment. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c519265f
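
The effect of the quirk and of the new flag can be modeled with a small predicate. This is a simplified sketch under stated assumptions, not KVM's destination-matching code: the function name `is_broadcast` and its parameters are hypothetical, while the broadcast IDs themselves (0xff in xAPIC mode, 0xffffffff in x2APIC mode) are the architectural values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XAPIC_BROADCAST  0xffu
#define X2APIC_BROADCAST 0xffffffffu

/*
 * Decide whether a destination ID addresses all CPUs.
 * quirk_disabled models userspace having enabled
 * KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK.
 */
static bool is_broadcast(uint32_t dest, bool x2apic_mode, bool quirk_disabled)
{
    if (!x2apic_mode) {
        assert(dest <= 0xffu); /* xAPIC destinations are 8 bits wide */
        return dest == XAPIC_BROADCAST;
    }
    if (dest == X2APIC_BROADCAST)
        return true;
    /* Legacy quirk: 0xff is also treated as broadcast in x2APIC mode. */
    return !quirk_disabled && dest == XAPIC_BROADCAST;
}
```

With the quirk left on, APIC ID 255 cannot be addressed individually in x2APIC mode, which is why the flag matters once more than 255 IDs are in use.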
