提交 · d913b904355721e0aabb28dc8be2a3121de2e794 · FengXiao2002 / Linux Imx

07 11月, 2014 2 次提交

KVM: x86: MOV to CR3 can set bit 63 · 9d88fca7

由 Nadav Amit 提交于 11月 02, 2014

Although Intel SDM mentions bit 63 is reserved, MOV to CR3 can have bit 63 set.
As Intel SDM states in section 4.10.4 "Invalidation of TLBs and
Paging-Structure Caches": " MOV to CR3. ... If CR4.PCIDE = 1 and bit 63 of the
instructionâ€™s source operand is 0 ..."

In other words, bit 63 is not reserved. KVM emulator currently consider bit 63
as reserved. Fix it.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9d88fca7

KVM: x86: Breakpoints do not consider CS.base · 82b32774

由 Nadav Amit 提交于 11月 02, 2014

x86 debug registers hold a linear address. Therefore, breakpoints detection
should consider CS.base, and check whether instruction linear address equals
(CS.base + RIP). This patch introduces a function to evaluate RIP linear
address and uses it for breakpoints detection.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

82b32774

03 11月, 2014 2 次提交

KVM: vmx: Unavailable DR4/5 is checked before CPL · 16f8a6f9

由 Nadav Amit 提交于 10月 03, 2014

If DR4/5 is accessed when it is unavailable (since CR4.DE is set), then #UD
should be generated even if CPL>0. This is according to Intel SDM Table 6-2:
"Priority Among Simultaneous Exceptions and Interrupts".

Note, that this may happen on the first DR access, even if the host does not
sets debug breakpoints. Obviously, it occurs when the host debugs the guest.

This patch moves the DR4/5 checks from __kvm_set_dr/_kvm_get_dr to handle_dr.
The emulator already checks DR4/5 availability in check_dr_read. Nested
virutalization related calls to kvm_set_dr/kvm_get_dr would not like to inject
exceptions to the guest.

As for SVM, the patch follows the previous logic as much as possible. Anyhow,
it appears the DR interception code might be buggy - even if the DR access
may cause an exception, the instruction is skipped.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

16f8a6f9

KVM: x86: some apic broadcast modes does not work · 394457a9

由 Nadav Amit 提交于 10月 03, 2014

KVM does not deliver x2APIC broadcast messages with physical mode.  Intel SDM
(10.12.9 ICR Operation in x2APIC Mode) states: "A destination ID value of
FFFF_FFFFH is used for broadcast of interrupts in both logical destination and
physical destination modes."

In addition, the local-apic enables cluster mode broadcast. As Intel SDM
10.6.2.2 says: "Broadcast to all local APICs is achieved by setting all
destination bits to one." This patch enables cluster mode broadcast.

The fix tries to combine broadcast in different modes through a unified code.

One rare case occurs when the source of IPI has its APIC disabled.  In such
case, the source can still issue IPIs, but since the source is not obliged to
have the same LAPIC mode as the enabled ones, we cannot rely on it.
Since it is a rare case, it is unoptimized and done on the slow-path.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
Reviewed-by: NWanpeng Li <wanpeng.li@linux.intel.com>
[As per Radim's review, use unsigned int for X2APIC_BROADCAST, return bool from
 kvm_apic_broadcast. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

394457a9

24 10月, 2014 2 次提交

KVM: x86: Prevent host from panicking on shared MSR writes. · 8b3c3104

由 Andy Honig 提交于 8月 27, 2014

The previous patch blocked invalid writes directly when the MSR
is written.  As a precaution, prevent future similar mistakes by
gracefulling handle GPs caused by writes to shared MSRs.

Cc: stable@vger.kernel.org
Signed-off-by: NAndrew Honig <ahonig@google.com>
[Remove parts obsoleted by Nadav's patch. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8b3c3104

KVM: x86: Check non-canonical addresses upon WRMSR · 854e8bb1

由 Nadav Amit 提交于 9月 16, 2014

Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is
written to certain MSRs. The behavior is "almost" identical for AMD and Intel
(ignoring MSRs that are not implemented in either architecture since they would
anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
non-canonical address is written on Intel but not on AMD (which ignores the top
32-bits).

Accordingly, this patch injects a #GP on the MSRs which behave identically on
Intel and AMD. To eliminate the differences between the architecutres, the
value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to
canonical value before writing instead of injecting a #GP.

Some references from Intel and AMD manuals:

According to Intel SDM description of WRMSR instruction #GP is expected on
WRMSR "If the source register contains a non-canonical address and ECX
specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE,
IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP."

According to AMD manual instruction manual:
LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the
LSTAR and CSTAR registers. If an RIP written by WRMSR is not in canonical
form, a general-protection exception (#GP) occurs."
IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the
base field must be in canonical form or a #GP fault will occur."
IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must
be in canonical form."

This patch fixes CVE-2014-3610.

Cc: stable@vger.kernel.org
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

854e8bb1

24 9月, 2014 5 次提交

kvm: x86: Unpin and remove kvm_arch->apic_access_page · c24ae0dc

由 Tang Chen 提交于 9月 24, 2014

In order to make the APIC access page migratable, stop pinning it in
memory.

And because the APIC access page is not pinned in memory, we can
remove kvm_arch->apic_access_page.  When we need to write its
physical address into vmcs, we use gfn_to_page() to get its page
struct, which is needed to call page_to_phys(); the page is then
immediately unpinned.
Suggested-by: NGleb Natapov <gleb@kernel.org>
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c24ae0dc

kvm: x86: Add request bit to reload APIC access page address · 4256f43f

由 Tang Chen 提交于 9月 24, 2014

Currently, the APIC access page is pinned by KVM for the entire life
of the guest.  We want to make it migratable in order to make memory
hot-unplug available for machines that run KVM.

This patch prepares to handle this in generic code, through a new
request bit (that will be set by the MMU notifier) and a new hook
that is called whenever the request bit is processed.
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4256f43f

kvm: Add arch specific mmu notifier for page invalidation · fe71557a

由 Tang Chen 提交于 9月 24, 2014

This will be used to let the guest run while the APIC access page is
not pinned.  Because subsequent patches will fill in the function
for x86, place the (still empty) x86 implementation in the x86.c file
instead of adding an inline function in kvm_host.h.
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fe71557a

kvm: Fix page ageing bugs · 57128468

由 Andres Lagar-Cavilla 提交于 9月 22, 2014

1. We were calling clear_flush_young_notify in unmap_one, but we are
within an mmu notifier invalidate range scope. The spte exists no more
(due to range_start) and the accessed bit info has already been
propagated (due to kvm_pfn_set_accessed). Simply call
clear_flush_young.

2. We clear_flush_young on a primary MMU PMD, but this may be mapped
as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
This required expanding the interface of the clear_flush_young mmu
notifier, so a lot of code has been trivially touched.

3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
the access bit by blowing the spte. This requires proper synchronizing
with MMU notifier consumers, like every other removal of spte's does.
Signed-off-by: NAndres Lagar-Cavilla <andreslc@google.com>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

57128468

KVM: x86: directly use kvm_make_request again · 77c3913b

由 Liang Chen 提交于 9月 18, 2014

A one-line wrapper around kvm_make_request is not particularly
useful. Replace kvm_mmu_flush_tlb() with kvm_make_request().
Signed-off-by: NLiang Chen <liangchen.linux@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

77c3913b

17 9月, 2014 1 次提交

kvm: Remove ept_identity_pagetable from struct kvm_arch. · a255d479

由 Tang Chen 提交于 9月 16, 2014

kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memroy.
As a result, it cannot be migrated/hot-removed. After this patch, since
kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page
is no longer pinned in memory. And it can be migrated/hot-removed.
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: NGleb Natapov <gleb@kernel.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a255d479

05 9月, 2014 2 次提交

KVM: x86: propagate exception from permission checks on the nested page fault · 54987b7a

由 Paolo Bonzini 提交于 9月 02, 2014

Currently, if a permission error happens during the translation of
the final GPA to HPA, walk_addr_generic returns 0 but does not fill
in walker->fault.  To avoid this, add an x86_exception* argument
to the translate_gpa function, and let it fill in walker->fault.
The nested_page_fault field will be true, since the walk_mmu is the
nested_mmu and translate_gpu instead operates on the "outer" (NPT)
instance.
Reported-by: NValentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

54987b7a

KVM: x86: skip writeback on injection of nested exception · ef54bcfe

由 Paolo Bonzini 提交于 9月 04, 2014

If a nested page fault happens during emulation, we will inject a vmexit,
not a page fault.  However because writeback happens after the injection,
we will write ctxt->eip from L2 into the L1 EIP.  We do not write back
if an instruction caused an interception vmexit---do the same for page
faults.
Suggested-by: NGleb Natapov <gleb@kernel.org>
Reviewed-by: NGleb Natapov <gleb@kernel.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ef54bcfe

03 9月, 2014 1 次提交

kvm: x86: fix stale mmio cache bug · 56f17dd3

由 David Matlack 提交于 8月 18, 2014

The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
up to userspace:

(1) Guest accesses gpa X without a memory slot. The gfn is cached in
struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
the SPTE write-execute-noread so that future accesses cause
EPT_MISCONFIGs.

(2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
covering the page just accessed.

(3) Guest attempts to read or write to gpa X again. On Intel, this
generates an EPT_MISCONFIG. The memory slot generation number that
was incremented in (2) would normally take care of this but we fast
path mmio faults through quickly_check_mmio_pf(), which only checks
the per-vcpu mmio cache. Since we hit the cache, KVM passes a
KVM_EXIT_MMIO up to userspace.

This patch fixes the issue by using the memslot generation number
to validate the mmio cache.

Cc: stable@vger.kernel.org
Signed-off-by: NDavid Matlack <dmatlack@google.com>
[xiaoguangrong: adjust the code to make it simpler for stable-tree fix.]
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: NDavid Matlack <dmatlack@google.com>
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Tested-by: NDavid Matlack <dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

56f17dd3

29 8月, 2014 2 次提交

KVM: remove garbage arg to *hardware_{en,dis}able · 13a34e06

由 Radim Krčmář 提交于 8月 28, 2014

In the beggining was on_each_cpu(), which required an unused argument to
kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.

Remove unnecessary arguments that stem from this.
Signed-off-by: NRadim KrÄmÃ¡Å™ <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

13a34e06

KVM: forward declare structs in kvm_types.h · 65647300

由 Paolo Bonzini 提交于 8月 29, 2014

Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid
"'struct foo' declared inside parameter list" warnings (and consequent
breakage due to conflicting types).

Move them from individual files to a generic place in linux/kvm_types.h.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

65647300

22 8月, 2014 1 次提交

KVM: x86: introduce sched_in to kvm_x86_ops · ae97a3b8

由 Radim Krčmář 提交于 8月 21, 2014

sched_in preempt notifier is available for x86, allow its use in
specific virtualization technlogies as well.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ae97a3b8

19 8月, 2014 2 次提交

Revert "KVM: x86: Increase the number of fixed MTRR regs to 10" · 0d234daf

由 Paolo Bonzini 提交于 8月 18, 2014

This reverts commit 682367c4,
which causes 32-bit SMP Windows 7 guests to panic.

SeaBIOS has a limit on the number of MTRRs that it can handle,
and this patch exceeded the limit.  Better revert it.
Thanks to Nadav Amit for debugging the cause.

Cc: stable@nongnu.org
Reported-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0d234daf

KVM: x86: drop fpu_activate hook · 4473b570

由 Wanpeng Li 提交于 8月 18, 2014

The only user of the fpu_activate hook was dropped in commit
2d04a05b (KVM: x86 emulator: emulate CLTS internally, 2011-04-20).
vmx_fpu_activate and svm_fpu_activate are still called on #NM (and for
Intel CLTS), but never from common code; hence, there's no need for
a hook.
Reviewed-by: NYang Zhang <yang.z.zhang@intel.com>
Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4473b570

21 7月, 2014 1 次提交

KVM: x86: DR6/7.RTM cannot be written · 6f43ed01

由 Nadav Amit 提交于 7月 15, 2014

Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is
not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the
current KVM implementation. This bug is apparent only if the MOV-DR instruction
is emulated or the host also debugs the guest.

This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and
set respectively. It also sets DR6.RTM upon every debug exception. Obviously,
it is not a complete fix, as debugging of RTM is still unsupported.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6f43ed01

11 7月, 2014 1 次提交

KVM: x86: return all bits from get_interrupt_shadow · 37ccdcbe

由 Paolo Bonzini 提交于 5月 20, 2014

For the next patch we will need to know the full state of the
interrupt shadow; we will then set KVM_REQ_EVENT when one bit
is cleared.

However, right now get_interrupt_shadow only returns the one
corresponding to the emulated instruction, or an unconditional
0 if the emulated instruction does not have an interrupt shadow.
This is confusing and does not allow us to check for cleared
bits as mentioned above.

Clean the callback up, and modify toggle_interruptibility to
match the comment above the call.  As a small result, the
call to set_interrupt_shadow will be skipped in the common
case where int_shadow == 0 && mask == 0.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

37ccdcbe

10 7月, 2014 1 次提交

KVM: x86: fix TSC matching · 0d3da0d2

由 Tomasz Grabiec 提交于 6月 24, 2014

I've observed kvmclock being marked as unstable on a modern
single-socket system with a stable TSC and qemu-1.6.2 or qemu-2.0.0.

The culprit was failure in TSC matching because of overflow of
kvm_arch::nr_vcpus_matched_tsc in case there were multiple TSC writes
in a single synchronization cycle.

Turns out that qemu does multiple TSC writes during init, below is the
evidence of that (qemu-2.0.0):

The first one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa04cfd6b : kvm_arch_vcpu_postcreate+0x4b/0x80 [kvm]
 0xffffffffa04b8188 : kvm_vm_ioctl+0x418/0x750 [kvm]

The second one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]

 #0  kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
 #1  kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
 #2  kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
 #3  kvm_cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1641
 #4  cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:330
 #5  cpu_synchronize_all_post_init () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:521
 #6  main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4390

The third one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]

 #0  kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
 #1  kvm_put_msrs  at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
 #2  kvm_arch_put_registers  at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
 #3  kvm_cpu_synchronize_post_reset  at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1635
 #4  cpu_synchronize_post_reset  at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:323
 #5  cpu_synchronize_all_post_reset () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:512
 #6  main  at /build/buildd/qemu-2.0.0+dfsg/vl.c:4482

The fix is to count each vCPU only once when matched, so that
nr_vcpus_matched_tsc holds the size of the matched set. This is
achieved by reusing generation counters. Every vCPU with
this_tsc_generation == cur_tsc_generation is in the matched set. The
match set is cleared by setting cur_tsc_generation to a value which no
other vCPU is set to (by incrementing it).

I needed to bump up the counter size form u8 to u64 to ensure it never
overflows. Otherwise in cases TSC is not written the same number of
times on each vCPU the counter could overflow and incorrectly indicate
some vCPUs as being in the matched set. This scenario seems unlikely
but I'm not sure if it can be disregarded.
Signed-off-by: NTomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0d3da0d2

19 6月, 2014 2 次提交

KVM: x86: preserve the high 32-bits of the PAT register · 7cb060a9

由 Paolo Bonzini 提交于 6月 19, 2014

KVM does not really do much with the PAT, so this went unnoticed for a
long time.  It is exposed however if you try to do rdmsr on the PAT
register.
Reported-by: NValentine Sinitsyn <valentine.sinitsyn@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7cb060a9

KVM: x86: Increase the number of fixed MTRR regs to 10 · 682367c4

由 Nadav Amit 提交于 6月 18, 2014

Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
sometime make assumptions on CPUs while they ignore capability MSRs, it is
better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
actually supported has no functional implications.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

682367c4

18 6月, 2014 1 次提交

KVM: x86: rdpmc emulation checks the counter incorrectly · 67f4d428

由 Nadav Amit 提交于 6月 02, 2014

The rdpmc emulation checks that the counter (ECX) is not higher than 2, without
taking into considerations bits 30:31 role (e.g., bit 30 marks whether the
counter is fixed). The fix uses the pmu information for checking the validity
of the pmu counter.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

67f4d428

22 5月, 2014 1 次提交

KVM: x86: get CPL from SS.DPL · ae9fedc7

由 Paolo Bonzini 提交于 5月 14, 2014

CS.RPL is not equal to the CPL in the few instructions between
setting CR0.PE and reloading CS.  And CS.DPL is also not equal
to the CPL for conforming code segments.

However, SS.DPL *is* always equal to the CPL except for the weird
case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
value in the STAR MSR, but force CPL=3 (Intel instead forces
SS.DPL=SS.RPL=CPL=3).

So this patch:

- modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
the above case with SYSRET is not broken further, and the way
to fix it would be to pass the CPL to userspace and back

- modifies VMX to always return the CPL from SS.DPL (except
forcing it to 0 if we are emulating real mode via vm86 mode;
in vm86 mode all DPLs have to be 3, but real mode does allow
privileged instructions).  It also removes the CPL cache,
which becomes a duplicate of the SS access rights cache.

This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
CR0.PE=1 but before CS has been reloaded.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ae9fedc7

24 4月, 2014 1 次提交

KVM: x86: Fix CR3 reserved bits · 346874c9

由 Nadav Amit 提交于 4月 18, 2014

According to Intel specifications, PAE and non-PAE does not have any reserved
bits.  In long-mode, regardless to PCIDE, only the high bits (above the
physical address) are reserved.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

346874c9

15 4月, 2014 1 次提交

KVM: Remove SMAP bit from CR4_RESERVED_BITS · 56d6efc2

由 Feng Wu 提交于 4月 01, 2014

This patch removes SMAP bit from CR4_RESERVED_BITS.
Signed-off-by: NFeng Wu <feng.wu@intel.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

56d6efc2

11 3月, 2014 4 次提交

KVM: x86: Allow the guest to run with dirty debug registers · c77fb5fe

由 Paolo Bonzini 提交于 2月 21, 2014

When not running in guest-debug mode, the guest controls the debug
registers and having to take an exit for each DR access is a waste
of time. If the guest gets into a state where each context switch
causes DR to be saved and restored, this can take away as much as 40%
of the execution time from the guest.

After this patch, VMX- and SVM-specific code can set a flag in
switch_db_regs, telling vcpu_enter_guest that on the next exit the debug
registers might be dirty and need to be reloaded (syncing will be taken
care of by a new callback in kvm_x86_ops). This flag can be set on the
first access to a debug registers, so that multiple accesses to the
debug registers only cause one vmexit.

Note that since the guest will be able to read debug registers and
enable breakpoints in DR7, we need to ensure that they are synchronized
on entry to the guest---including DR6 that was not synced before.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c77fb5fe

KVM: x86: change vcpu->arch.switch_db_regs to a bit mask · 360b948d

由 Paolo Bonzini 提交于 2月 21, 2014

The next patch will add another bit that we can test with the
same "if".
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

360b948d

KVM: x86: Remove return code from enable_irq/nmi_window · c9a7953f

由 Jan Kiszka 提交于 3月 07, 2014

It's no longer possible to enter enable_irq_window in guest mode when
L1 intercepts external interrupts and we are entering L2. This is now
caught in vcpu_enter_guest. So we can remove the check from the VMX
version of enable_irq_window, thus the need to return an error code from
both enable_irq_window and enable_nmi_window.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c9a7953f

KVM: nVMX: Rework interception of IRQs and NMIs · b6b8a145

由 Jan Kiszka 提交于 3月 07, 2014

Move the check for leaving L2 on pending and intercepted IRQs or NMIs
from the *_allowed handler into a dedicated callback. Invoke this
callback at the relevant points before KVM checks if IRQs/NMIs can be
injected. The callback has the task to switch from L2 to L1 if needed
and inject the proper vmexit events.

The rework fixes L2 wakeups from HLT and provides the foundation for
preemption timer emulation.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b6b8a145

04 3月, 2014 2 次提交

x86: kvm: introduce periodic global clock updates · 332967a3

由 Andrew Jones 提交于 2月 28, 2014

commit 0061d53d introduced a mechanism to execute a global clock
update for a vm. We can apply this periodically in order to propagate
host NTP corrections. Also, if all vcpus of a vm are pinned, then
without an additional trigger, no guest NTP corrections can propagate
either, as the current trigger is only vcpu cpu migration.
Signed-off-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

332967a3

x86: kvm: rate-limit global clock updates · 7e44e449

由 Andrew Jones 提交于 2月 28, 2014

When we update a vcpu's local clock it may pick up an NTP correction.
We can't wait an indeterminate amount of time for other vcpus to pick
up that correction, so commit 0061d53d introduced a global clock
update. However, we can't request a global clock update on every vcpu
load either (which is what happens if the tsc is marked as unstable).
The solution is to rate-limit the global clock updates. Marcelo
calculated that we should delay the global clock updates no more
than 0.1s as follows:

Assume an NTP correction c is applied to one vcpu, but not the other,
then in n seconds the delta of the vcpu system_timestamps will be
c * n. If we assume a correction of 500ppm (worst-case), then the two
vcpus will diverge 50us in 0.1s, which is a considerable amount.
Signed-off-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7e44e449

24 2月, 2014 1 次提交

KVM: x86: Intel MPX vmx and msr handle · da8999d3

由 Liu, Jinsong 提交于 2月 24, 2014

From caddc009a6d2019034af8f2346b2fd37a81608d0 Mon Sep 17 00:00:00 2001
From: Liu Jinsong <jinsong.liu@intel.com>
Date: Mon, 24 Feb 2014 18:11:11 +0800
Subject: [PATCH v5 1/3] KVM: x86: Intel MPX vmx and msr handle

This patch handle vmx and msr of Intel MPX feature.
Signed-off-by: NXudong Hao <xudong.hao@intel.com>
Signed-off-by: NLiu Jinsong <jinsong.liu@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

da8999d3

04 2月, 2014 1 次提交

KVM: x86: remove unused last_kernel_ns variable · 4f34d683

由 Marcelo Tosatti 提交于 1月 29, 2014

Remove unused last_kernel_ns variable.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4f34d683

17 1月, 2014 2 次提交

KVM: SVM: Fix reading of DR6 · 73aaf249

由 Jan Kiszka 提交于 1月 04, 2014

In contrast to VMX, SVM dose not automatically transfer DR6 into the
VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor
hook to obtain the current value. And as SVM now picks the DR6 state
from its VMCB, we also need a set callback in order to write updates of
DR6 back.

Fixes a regression of 020df079.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

73aaf249

add support for Hyper-V reference time counter · e984097b

由 Vadim Rozenfeld 提交于 1月 16, 2014

Signed-off: Peter Lieven <pl@kamp.de>
Signed-off: Gleb Natapov
Signed-off: Vadim Rozenfeld <vrozenfe@redhat.com>

After some consideration I decided to submit only Hyper-V reference
counters support this time. I will submit iTSC support as a separate
patch as soon as it is ready.

v1 -> v2
1. mark TSC page dirty as suggested by
    Eric Northup <digitaleric@google.com> and Gleb
2. disable local irq when calling get_kernel_ns,
    as it was done by Peter Lieven <pl@amp.de>
3. move check for TSC page enable from second patch
    to this one.

v3 -> v4
    Get rid of ref counter offset.

v4 -> v5
    replace __copy_to_user with kvm_write_guest
    when updateing iTSC page.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e984097b

06 11月, 2013 1 次提交

kvm: Delete prototype for non-existent function kvm_check_iopl · a890b6fe

由 Josh Triplett 提交于 10月 20, 2013

The prototype for kvm_check_iopl appeared in commit
f850e2e6 ("KVM: x86 emulator: Check IOPL
level during io instruction emulation"), but the function never actually
existed.  Remove the prototype.
Signed-off-by: NJosh Triplett <josh@joshtriplett.org>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

a890b6fe