提交 · f7819512996361280b86259222456fcf15aad926 · openeuler / raspberrypi-kernel

06 2月, 2015 1 次提交

kvm: add halt_poll_ns module parameter · f7819512

由 Paolo Bonzini 提交于 2月 04, 2015

This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.

This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.

With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.

Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.

The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.

While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.

The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f7819512

30 1月, 2015 1 次提交

KVM: VMX: Add PML support in VMX · 843e4330

由 Kai Huang 提交于 1月 28, 2015

This patch adds PML support in VMX. A new module parameter 'enable_pml' is added
to allow user to enable/disable it manually.
Signed-off-by: NKai Huang <kai.huang@linux.intel.com>
Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

843e4330

29 1月, 2015 2 次提交

KVM: x86: Add new dirty logging kvm_x86_ops for PML · 88178fd4

由 Kai Huang 提交于 1月 28, 2015

This patch adds new kvm_x86_ops dirty logging hooks to enable/disable dirty
logging for particular memory slot, and to flush potentially logged dirty GPAs
before reporting slot->dirty_bitmap to userspace.

kvm x86 common code calls these hooks when they are available so PML logic can
be hidden to VMX specific. SVM won't be impacted as these hooks remain NULL
there.
Signed-off-by: NKai Huang <kai.huang@linux.intel.com>
Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

88178fd4

KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access · 1c91cad4

由 Kai Huang 提交于 1月 28, 2015

This patch changes the second parameter of kvm_mmu_slot_remove_write_access from
'slot id' to 'struct kvm_memory_slot *' to align with kvm_x86_ops dirty logging
hooks, which will be introduced in further patch.

Better way is to change second parameter of kvm_arch_commit_memory_region from
'struct kvm_userspace_memory_region *' to 'struct kvm_memory_slot * new', but it
requires changes on other non-x86 ARCH too, so avoid it now.
Signed-off-by: NKai Huang <kai.huang@linux.intel.com>
Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1c91cad4

26 1月, 2015 2 次提交

KVM: x86: 32-bit wraparound read/write not emulated correctly · bac15531

由 Nadav Amit 提交于 1月 26, 2015

If we got a wraparound of 32-bit operand, and the limit is 0xffffffff, read and
writes should be successful. It just needs to be done in two segments.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bac15531

KVM: x86: IRET emulation does not clear NMI masking · 801806d9

由 Nadav Amit 提交于 1月 26, 2015

The IRET instruction should clear NMI masking, but the current implementation
does not do so.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

801806d9

23 1月, 2015 1 次提交

KVM: remove unneeded return value of vcpu_postcreate · 31928aa5

由 Dominik Dingel 提交于 12月 04, 2014

The return value of kvm_arch_vcpu_postcreate is not checked in its
caller.  This is okay, because only x86 provides vcpu_postcreate right
now and it could only fail if vcpu_load failed.  But that is not
possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so
just get rid of the unchecked return value.
Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

31928aa5

21 1月, 2015 2 次提交

KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue · 54750f2c

由 Marcelo Tosatti 提交于 1月 20, 2015

SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp
and rdtsc is larger than a given threshold:

 * If we get more than the below threshold into the future, we rerequest
 * the real time from the host again which has only little offset then
 * that we need to adjust using the TSC.
 *
 * For now that threshold is 1/5th of a jiffie. That should be good
 * enough accuracy for completely broken systems, but also give us swing
 * to not call out to the host all the time.
 */
#define PVCLOCK_DELTA_MAX ((1000000000ULL / HZ) / 5)

Disable masterclock support (which increases said delta) in case the
boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW.

Upstreams kernels which support pvclock vsyscalls (and therefore make
use of PVCLOCK_STABLE_BIT) use MSR_KVM_SYSTEM_TIME_NEW.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

54750f2c

KVM: fix "Should it be static?" warnings from sparse · 69b0049a

由 Fengguang Wu 提交于 1月 19, 2015

arch/x86/kvm/x86.c:495:5: sparse: symbol 'kvm_read_nested_guest_page' was not declared. Should it be static?
arch/x86/kvm/x86.c:646:5: sparse: symbol '__kvm_set_xcr' was not declared. Should it be static?
arch/x86/kvm/x86.c:1183:15: sparse: symbol 'max_tsc_khz' was not declared. Should it be static?
arch/x86/kvm/x86.c:1237:6: sparse: symbol 'kvm_track_tsc_matching' was not declared. Should it be static?
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

69b0049a

16 1月, 2015 1 次提交

KVM: x86: switch to kvm_get_dirty_log_protect · e108ff2f

由 Paolo Bonzini 提交于 1月 15, 2015

We now have a generic function that does most of the work of
kvm_vm_ioctl_get_dirty_log, now use it.
Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NMario Smarduch <m.smarduch@samsung.com>

e108ff2f

09 1月, 2015 4 次提交

KVM: x86: allow TSC deadline timer on all hosts · defcf51f

由 Radim Krčmář 提交于 1月 08, 2015

Emulation does not utilize the feature.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

defcf51f

kvm: x86: Remove kvm_make_request from lapic.c · bab5bb39

由 Nicholas Krause 提交于 1月 01, 2015

Adds a function kvm_vcpu_set_pending_timer instead of calling
kvm_make_request in lapic.c.
Signed-off-by: NNicholas Krause <xerofoify@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bab5bb39

KVM: x86: add option to advance tscdeadline hrtimer expiration · d0659d94

由 Marcelo Tosatti 提交于 12月 16, 2014

For the hrtimer which emulates the tscdeadline timer in the guest,
add an option to advance expiration, and busy spin on VM-entry waiting
for the actual expiration time to elapse.

This allows achieving low latencies in cyclictest (or any scenario
which requires strict timing regarding timer expiration).

Reduces average cyclictest latency from 12us to 8us
on Core i5 desktop.

Note: this option requires tuning to find the appropriate value
for a particular hardware/guest combination. One method is to measure the
average delay between apic_timer_fn and VM-entry.
Another method is to start with 1000ns, and increase the value
in say 500ns increments until avg cyclictest numbers stop decreasing.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d0659d94

KVM: nVMX: Add nested msr load/restore algorithm · ff651cb6

由 Wincy Van 提交于 12月 11, 2014

Several hypervisors need MSR auto load/restore feature.
We read MSRs from VM-entry MSR load area which specified by L1,
and load them via kvm_set_msr in the nested entry.
When nested exit occurs, we get MSRs via kvm_get_msr, writing
them to L1`s MSR store area. After this, we read MSRs from VM-exit
MSR load area, and load them via kvm_set_msr.
Signed-off-by: NWincy Van <fanwenyi0529@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ff651cb6

05 12月, 2014 1 次提交

KVM: x86: support XSAVES usage in the host · df1daba7

由 Paolo Bonzini 提交于 11月 21, 2014

Userspace is expecting non-compacted format for KVM_GET_XSAVE, but
struct xsave_struct might be using the compacted format.  Convert
in order to preserve userspace ABI.

Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE
but the kernel will pass it to XRSTORS, and we need to convert back.

Fixes: f31a9f7c
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: stable@vger.kernel.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: NNadav Amit <namit@cs.technion.ac.il>
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

df1daba7

24 11月, 2014 3 次提交

kvm: x86: avoid warning about potential shift wrapping bug · 2b4a273b

由 Paolo Bonzini 提交于 11月 24, 2014

cs.base is declared as a __u64 variable and vector is a u32 so this
causes a static checker warning.  The user indeed can set "sipi_vector"
to any u32 value in kvm_vcpu_ioctl_x86_set_vcpu_events(), but the
value should really have 8-bit precision only.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2b4a273b

KVM: x86: move device assignment out of kvm_host.h · c9eab58f

由 Paolo Bonzini 提交于 11月 24, 2014

Create a new header, and hide the device assignment functions there.
Move struct kvm_assigned_dev_kernel to assigned-dev.c by modifying
arch/x86/kvm/iommu.c to take a PCI device struct.

Based on a patch by Radim Krcmar <rkrcmark@redhat.com>.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c9eab58f

kvm: x86: move assigned-dev.c and iommu.c to arch/x86/ · c274e03a

由 Radim Krčmář 提交于 11月 21, 2014

Now that ia64 is gone, we can hide deprecated device assignment in x86.

Notable changes:
 - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl()

The easy parts were removed from generic kvm code, remaining
 - kvm_iommu_(un)map_pages() would require new code to be moved
 - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c274e03a

22 11月, 2014 1 次提交

kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/ · 6ef768fa

由 Paolo Bonzini 提交于 11月 20, 2014

ia64 does not need them anymore.  Ack notifiers become x86-specific
too.
Suggested-by: NGleb Natapov <gleb@kernel.org>
Reviewed-by: NRadim Krcmar <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6ef768fa

13 11月, 2014 1 次提交

kvm: svm: move WARN_ON in svm_adjust_tsc_offset · d913b904

由 Chris J Arges 提交于 11月 12, 2014

When running the tsc_adjust kvm-unit-test on an AMD processor with the
IA32_TSC_ADJUST feature enabled, the WARN_ON in svm_adjust_tsc_offset can be
triggered. This WARN_ON checks for a negative adjustment in case __scale_tsc
is called; however it may trigger unnecessary warnings.

This patch moves the WARN_ON to trigger only if __scale_tsc will actually be
called from svm_adjust_tsc_offset. In addition make adj in kvm_set_msr_common
s64 since this can have signed values.
Signed-off-by: NChris J Arges <chris.j.arges@canonical.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d913b904

10 11月, 2014 1 次提交

KVM: x86: fix warning on 32-bit compilation · ac146235

由 Paolo Bonzini 提交于 11月 10, 2014

PCIDs are only supported in 64-bit mode.  No need to clear bit 63
of CR3 unless the host is 64-bit.

Reported by Fengguang Wu's autobuilder.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ac146235

08 11月, 2014 3 次提交

kvm: x86: add trace event for pvclock updates · ce1a5e60

由 David Matlack 提交于 11月 05, 2014

The new trace event records:
  * the id of vcpu being updated
  * the pvclock_vcpu_time_info struct being written to guest memory

This is useful for debugging pvclock bugs, such as the bug fixed by
"[PATCH] kvm: x86: Fix kvm clock versioning.".
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ce1a5e60

kvm: x86: Fix kvm clock versioning. · 09a0c3f1

由 Owen Hofmann 提交于 11月 03, 2014

kvm updates the version number for the guest paravirt clock structure by
incrementing the version of its private copy. It does not read the guest
version, so will write version = 2 in the first update for every new VM,
including after restoring a saved state. If guest state is saved during
reading the clock, it could read and accept struct fields and guest TSC
from two different updates. This changes the code to increment the guest
version and write it back.
Signed-off-by: NOwen Hofmann <osh@google.com>
Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

09a0c3f1

KVM: x86: update masterclock values on TSC writes · 7f187922

由 Marcelo Tosatti 提交于 11月 04, 2014

When the guest writes to the TSC, the masterclock TSC copy must be
updated as well along with the TSC_OFFSET update, otherwise a negative
tsc_timestamp is calculated at kvm_guest_time_update.

Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to
"if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)"
becomes redundant, so remove the do_request boolean and collapse
everything into a single condition.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7f187922

07 11月, 2014 4 次提交

KVM: x86: Do not update EFLAGS on faulting emulation · 38827dbd

由 Nadav Amit 提交于 11月 02, 2014

If the emulation ends in fault, eflags should not be updated. However, several
instruction emulations (actually all the fastops) currently update eflags, if
the fault was detected afterwards (e.g., #PF during writeback).
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

38827dbd

KVM: x86: MOV to CR3 can set bit 63 · 9d88fca7

由 Nadav Amit 提交于 11月 02, 2014

Although Intel SDM mentions bit 63 is reserved, MOV to CR3 can have bit 63 set.
As Intel SDM states in section 4.10.4 "Invalidation of TLBs and
Paging-Structure Caches": " MOV to CR3. ... If CR4.PCIDE = 1 and bit 63 of the
instructionâ€™s source operand is 0 ..."

In other words, bit 63 is not reserved. KVM emulator currently consider bit 63
as reserved. Fix it.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9d88fca7

KVM: x86: Breakpoints do not consider CS.base · 82b32774

由 Nadav Amit 提交于 11月 02, 2014

x86 debug registers hold a linear address. Therefore, breakpoints detection
should consider CS.base, and check whether instruction linear address equals
(CS.base + RIP). This patch introduces a function to evaluate RIP linear
address and uses it for breakpoints detection.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

82b32774

KVM: x86: No error-code on real-mode exceptions · 3ffb2468

由 Nadav Amit 提交于 11月 02, 2014

Real-mode exceptions do not deliver error code. As can be seen in Intel SDM
volume 2, real-mode exceptions do not have parentheses, which indicate
error-code.  To avoid significant changes of the code, the error code is
"removed" during exception queueing.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3ffb2468

03 11月, 2014 3 次提交

KVM: x86: Enable Intel AVX-512 for guest · 612263b3

由 Chao Peng 提交于 10月 22, 2014

Expose Intel AVX-512 feature bits to guest. Also add checks for
xcr0 AVX512 related bits according to spec:
http://download-software.intel.com/sites/default/files/managed/71/2e/319433-017.pdfSigned-off-by: NChao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

612263b3

KVM: vmx: Unavailable DR4/5 is checked before CPL · 16f8a6f9

由 Nadav Amit 提交于 10月 03, 2014

If DR4/5 is accessed when it is unavailable (since CR4.DE is set), then #UD
should be generated even if CPL>0. This is according to Intel SDM Table 6-2:
"Priority Among Simultaneous Exceptions and Interrupts".

Note, that this may happen on the first DR access, even if the host does not
sets debug breakpoints. Obviously, it occurs when the host debugs the guest.

This patch moves the DR4/5 checks from __kvm_set_dr/_kvm_get_dr to handle_dr.
The emulator already checks DR4/5 availability in check_dr_read. Nested
virutalization related calls to kvm_set_dr/kvm_get_dr would not like to inject
exceptions to the guest.

As for SVM, the patch follows the previous logic as much as possible. Anyhow,
it appears the DR interception code might be buggy - even if the DR access
may cause an exception, the instruction is skipped.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

16f8a6f9

KVM: x86: DR7.GD should be cleared upon any #DB exception · 6bdf0662

由 Nadav Amit 提交于 9月 30, 2014

Intel SDM 17.2.4 (Debug Control Register (DR7)) says: "The processor clears the
GD flag upon entering to the debug exception handler." This sentence may be
misunderstood as if it happens only on #DB due to debug-register protection,
but it happens regardless to the cause of the #DB.

Fix the behavior to match both real hardware and Bochs.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6bdf0662

24 10月, 2014 2 次提交

KVM: x86: Prevent host from panicking on shared MSR writes. · 8b3c3104

由 Andy Honig 提交于 8月 27, 2014

The previous patch blocked invalid writes directly when the MSR
is written.  As a precaution, prevent future similar mistakes by
gracefulling handle GPs caused by writes to shared MSRs.

Cc: stable@vger.kernel.org
Signed-off-by: NAndrew Honig <ahonig@google.com>
[Remove parts obsoleted by Nadav's patch. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8b3c3104

KVM: x86: Check non-canonical addresses upon WRMSR · 854e8bb1

由 Nadav Amit 提交于 9月 16, 2014

Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is
written to certain MSRs. The behavior is "almost" identical for AMD and Intel
(ignoring MSRs that are not implemented in either architecture since they would
anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
non-canonical address is written on Intel but not on AMD (which ignores the top
32-bits).

Accordingly, this patch injects a #GP on the MSRs which behave identically on
Intel and AMD. To eliminate the differences between the architecutres, the
value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to
canonical value before writing instead of injecting a #GP.

Some references from Intel and AMD manuals:

According to Intel SDM description of WRMSR instruction #GP is expected on
WRMSR "If the source register contains a non-canonical address and ECX
specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE,
IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP."

According to AMD manual instruction manual:
LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the
LSTAR and CSTAR registers. If an RIP written by WRMSR is not in canonical
form, a general-protection exception (#GP) occurs."
IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the
base field must be in canonical form or a #GP fault will occur."
IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must
be in canonical form."

This patch fixes CVE-2014-3610.

Cc: stable@vger.kernel.org
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

854e8bb1

03 10月, 2014 1 次提交

kvm: do not handle APIC access page if in-kernel irqchip is not in use · f439ed27

由 Paolo Bonzini 提交于 10月 02, 2014

This fixes the following OOPS:

   loaded kvm module (v3.17-rc1-168-gcec26bc3)
   BUG: unable to handle kernel paging request at fffffffffffffffe
   IP: [<ffffffff81168449>] put_page+0x9/0x30
   PGD 1e15067 PUD 1e17067 PMD 0
   Oops: 0000 [#1] PREEMPT SMP
    [<ffffffffa063271d>] ? kvm_vcpu_reload_apic_access_page+0x5d/0x70 [kvm]
    [<ffffffffa013b6db>] vmx_vcpu_reset+0x21b/0x470 [kvm_intel]
    [<ffffffffa0658816>] ? kvm_pmu_reset+0x76/0xb0 [kvm]
    [<ffffffffa064032a>] kvm_vcpu_reset+0x15a/0x1b0 [kvm]
    [<ffffffffa06403ac>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
    [<ffffffffa062e540>] kvm_vm_ioctl+0x200/0x780 [kvm]
    [<ffffffff81212170>] do_vfs_ioctl+0x2d0/0x4b0
    [<ffffffff8108bd99>] ? __mmdrop+0x69/0xb0
    [<ffffffff812123d1>] SyS_ioctl+0x81/0xa0
    [<ffffffff8112a6f6>] ? __audit_syscall_exit+0x1f6/0x2a0
    [<ffffffff817229e9>] system_call_fastpath+0x16/0x1b
   Code: c6 78 ce a3 81 4c 89 e7 e8 d9 80 ff ff 0f 0b 4c 89 e7 e8 8f f6 ff ff e9 fa fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 1e 8b 47 1c 85 c0 74 27 f0
   RIP  [<ffffffff81193045>] put_page+0x5/0x50

when not using the in-kernel irqchip ("-machine kernel_irqchip=off"
with QEMU).  The fix is to make the same check in
kvm_vcpu_reload_apic_access_page that we already have
in vmx.c's vm_need_virtualize_apic_accesses().
Reported-by: NJan Kiszka <jan.kiszka@siemens.com>
Tested-by: NJan Kiszka <jan.kiszka@siemens.com>
Fixes: 4256f43fSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f439ed27

24 9月, 2014 6 次提交

kvm: x86: Unpin and remove kvm_arch->apic_access_page · c24ae0dc

由 Tang Chen 提交于 9月 24, 2014

In order to make the APIC access page migratable, stop pinning it in
memory.

And because the APIC access page is not pinned in memory, we can
remove kvm_arch->apic_access_page.  When we need to write its
physical address into vmcs, we use gfn_to_page() to get its page
struct, which is needed to call page_to_phys(); the page is then
immediately unpinned.
Suggested-by: NGleb Natapov <gleb@kernel.org>
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c24ae0dc

kvm: x86: Add request bit to reload APIC access page address · 4256f43f

由 Tang Chen 提交于 9月 24, 2014

Currently, the APIC access page is pinned by KVM for the entire life
of the guest.  We want to make it migratable in order to make memory
hot-unplug available for machines that run KVM.

This patch prepares to handle this in generic code, through a new
request bit (that will be set by the MMU notifier) and a new hook
that is called whenever the request bit is processed.
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4256f43f

kvm: Add arch specific mmu notifier for page invalidation · fe71557a

由 Tang Chen 提交于 9月 24, 2014

This will be used to let the guest run while the APIC access page is
not pinned.  Because subsequent patches will fill in the function
for x86, place the (still empty) x86 implementation in the x86.c file
instead of adding an inline function in kvm_host.h.
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fe71557a

kvm: x86: use macros to compute bank MSRs · 81760dcc

由 Chen Yucong 提交于 9月 23, 2014

Avoid open coded calculations for bank MSRs by using well-defined
macros that hide the index of higher bank MSRs.

No semantic changes.
Signed-off-by: NChen Yucong <slaoub@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

81760dcc

KVM: vmx: Inject #GP on invalid PAT CR · 4566654b

由 Nadav Amit 提交于 9月 18, 2014

Guest which sets the PAT CR to invalid value should get a #GP.  Currently, if
vmx supports loading PAT CR during entry, then the value is not checked.  This
patch makes the required check in that case.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4566654b

KVM: x86: directly use kvm_make_request again · 77c3913b

由 Liang Chen 提交于 9月 18, 2014

A one-line wrapper around kvm_make_request is not particularly
useful. Replace kvm_mmu_flush_tlb() with kvm_make_request().
Signed-off-by: NLiang Chen <liangchen.linux@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

77c3913b