提交 · 52ce3c21aec30d9dd99a89662ae87c657636787b · openanolis / cloud-kernel

03 11月, 2014 6 次提交

x86,kvm,vmx: Don't trap writes to CR4.TSD · 52ce3c21

由 Andy Lutomirski 提交于 10月 07, 2014

CR4.TSD is guest-owned; don't trap writes to it in VMX guests.  This
avoids a VM exit on context switches into or out of a PR_TSC_SIGSEGV
task.

I think that this fixes an unintentional side-effect of:
    4c38609a KVM: VMX: Make guest cr4 mask more conservative
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

52ce3c21

KVM: x86: Sysexit emulation does not mask RIP/RSP · bf0b682c

由 Nadav Amit 提交于 9月 18, 2014

If the operand size is not 64-bit, then the sysexit instruction should assign
ECX to RSP and EDX to RIP.  The current code assigns the full 64-bits.

Fix it by masking.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bf0b682c

KVM: x86: Distinguish between stack operation and near branches · 58b7075d

由 Nadav Amit 提交于 10月 24, 2014

In 64-bit, stack operations default to 64-bits, but can be overriden (to
16-bit) using opsize override prefix. In contrast, near-branches are always
64-bit. This patch distinguish between the different behaviors.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

58b7075d

KVM: x86: Getting rid of grp45 in emulator · f7784046

由 Nadav Amit 提交于 9月 18, 2014

Breaking grp45 to the relevant functions to speed up the emulation and simplify
the code. In addition, it is necassary the next patch will distinguish between
far and near branches according to the flags.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f7784046

KVM: x86: Use new is_noncanonical_address in _linearize · 4be4de7e

由 Nadav Amit 提交于 9月 18, 2014

Replace the current canonical address check with the new function which is
identical.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4be4de7e

KVM: emulator: always inline __linearize · d09155d2

由 Paolo Bonzini 提交于 10月 27, 2014

The two callers have a lot of constant arguments that can be
optimized out.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d09155d2

02 11月, 2014 3 次提交

KVM: vmx: defer load of APIC access page address during reset · a73896cb

由 Paolo Bonzini 提交于 11月 02, 2014

Most call paths to vmx_vcpu_reset do not hold the SRCU lock.  Defer loading
the APIC access page to the next vmentry.

This avoids the following lockdep splat:

[ INFO: suspicious RCU usage. ]
3.18.0-rc2-test2+ #70 Not tainted
-------------------------------
include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/2371:
 #0:  (&vcpu->mutex){+.+...}, at: [<ffffffffa037d800>] vcpu_load+0x20/0xd0 [kvm]

stack backtrace:
CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70
Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
 0000000000000001 ffff880209983ca8 ffffffff816f514f 0000000000000000
 ffff8802099b8990 ffff880209983cd8 ffffffff810bd687 00000000000fee00
 ffff880208a2c000 ffff880208a10000 ffff88020ef50040 ffff880209983d08
Call Trace:
 [<ffffffff816f514f>] dump_stack+0x4e/0x71
 [<ffffffff810bd687>] lockdep_rcu_suspicious+0xe7/0x120
 [<ffffffffa037d055>] gfn_to_memslot+0xd5/0xe0 [kvm]
 [<ffffffffa03807d3>] __gfn_to_pfn+0x33/0x60 [kvm]
 [<ffffffffa0380885>] gfn_to_page+0x25/0x90 [kvm]
 [<ffffffffa038aeec>] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm]
 [<ffffffffa08f0a9c>] vmx_vcpu_reset+0x20c/0x460 [kvm_intel]
 [<ffffffffa039ab8e>] kvm_vcpu_reset+0x15e/0x1b0 [kvm]
 [<ffffffffa039ac0c>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
 [<ffffffffa037f7e0>] kvm_vm_ioctl+0x1d0/0x780 [kvm]
 [<ffffffff810bc664>] ? __lock_is_held+0x54/0x80
 [<ffffffff812231f0>] do_vfs_ioctl+0x300/0x520
 [<ffffffff8122ee45>] ? __fget+0x5/0x250
 [<ffffffff8122f0fa>] ? __fget_light+0x2a/0xe0
 [<ffffffff81223491>] SyS_ioctl+0x81/0xa0
 [<ffffffff816fed6d>] system_call_fastpath+0x16/0x1b
Reported-by: NTakashi Iwai <tiwai@suse.de>
Reported-by: NAlexei Starovoitov <alexei.starovoitov@gmail.com>
Reviewed-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Fixes: 38b99173Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a73896cb

KVM: nVMX: Disable preemption while reading from shadow VMCS · 282da870

由 Jan Kiszka 提交于 10月 08, 2014

In order to access the shadow VMCS, we need to load it. At this point,
vmx->loaded_vmcs->vmcs and the actually loaded one start to differ. If
we now get preempted by Linux, vmx_vcpu_put and, on return, the
vmx_vcpu_load will work against the wrong vmcs. That can cause
copy_shadow_to_vmcs12 to corrupt the vmcs12 state.

Fix the issue by disabling preemption during the copy operation.
copy_vmcs12_to_shadow is safe from this issue as it is executed by
vmx_vcpu_run when preemption is already disabled before vmentry.

This bug is exposed by running Jailhouse within KVM on CPUs with
shadow VMCS support. Jailhouse never expects an interrupt pending
vmexit, but the bug can cause it if, after copy_shadow_to_vmcs12
is preempted, the active VMCS happens to have the virtual interrupt
pending flag set in the CPU-based execution controls.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

282da870

KVM: x86: Fix far-jump to non-canonical check · 7e46dddd

由 Nadav Amit 提交于 10月 28, 2014

Commit d1442d85 ("KVM: x86: Handle errors when RIP is set during far
jumps") introduced a bug that caused the fix to be incomplete.  Due to
incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit
segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may
not trigger #GP.  As we know, this imposes a security problem.

In addition, the condition for two warnings was incorrect.

Fixes: d1442d85Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
[Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7e46dddd

29 10月, 2014 4 次提交

KVM: nVMX: Disable preemption while reading from shadow VMCS · 41e7ed64

由 Jan Kiszka 提交于 10月 08, 2014

Fix the issue by disabling preemption during the copy operation.
copy_vmcs12_to_shadow is safe from this issue as it is executed by
vmx_vcpu_run when preemption is already disabled before vmentry.

41e7ed64

KVM: x86: Fix far-jump to non-canonical check · cd9b8e2c

由 Nadav Amit 提交于 10月 28, 2014

Commit d1442d85 ("KVM: x86: Handle errors when RIP is set during far
jumps") introduced a bug that caused the fix to be incomplete.  Due to
incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit
segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may
not trigger #GP.  As we know, this imposes a security problem.

In addition, the condition for two warnings was incorrect.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
[Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cd9b8e2c

KVM: emulator: fix execution close to the segment limit · fd56e154

由 Paolo Bonzini 提交于 10月 27, 2014

Emulation of code that is 14 bytes to the segment limit or closer
(e.g. RIP = 0xFFFFFFF2 after reset) is broken because we try to read as
many as 15 bytes from the beginning of the instruction, and __linearize
fails when the passed (address, size) pair reaches out of the segment.

To fix this, let __linearize return the maximum accessible size (clamped
to 2^32-1) for usage in __do_insn_fetch_bytes, and avoid the limit check
by passing zero for the desired size.

For expand-down segments, __linearize is performing a redundant check.
(u32)(addr.ea + size - 1) <= lim can only happen if addr.ea is close
to 4GB; in this case, addr.ea + size - 1 will also fail the check against
the upper bound of the segment (which is provided by the D/B bit).
After eliminating the redundant check, it is simple to compute
the *max_size for expand-down segments too.

Now that the limit check is done in __do_insn_fetch_bytes, we want
to inject a general protection fault there if size < op_size (like
__linearize would have done), instead of just aborting.

This fixes booting Tiano Core from emulated flash with EPT disabled.

Cc: stable@vger.kernel.org
Fixes: 719d5a9bReported-by: NBorislav Petkov <bp@suse.de>
Tested-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fd56e154

KVM: emulator: fix error code for __linearize · 3606189f

由 Paolo Bonzini 提交于 10月 27, 2014

The error code for #GP and #SS is zero when the segment is used to
access an operand or an instruction.  It is only non-zero when
a segment register is being loaded; for limit checks this means
cases such as:

* for #GP, when RIP is beyond the limit on a far call (before the first
instruction is executed).  We do not implement this check, but it
would be in em_jmp_far/em_call_far.

* for #SS, if the new stack overflows during an inter-privilege-level
call to a non-conforming code segment.  We do not implement stack
switching at all.

So use an error code of zero.
Reviewed-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3606189f

24 10月, 2014 13 次提交

KVM: x86: Wrong assertion on paging_tmpl.h · 1715d0dc

由 Nadav Amit 提交于 9月 30, 2014

Even after the recent fix, the assertion on paging_tmpl.h is triggered.
Apparently, the assertion wants to check that the PAE is always set on
long-mode, but does it in incorrect way. Note that the assertion is not
enabled unless the code is debugged by defining MMU_DEBUG.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1715d0dc

KVM: x86: PREFETCH and HINT_NOP should have SrcMem flag · 3f6f1480

由 Nadav Amit 提交于 10月 13, 2014

The decode phase of the x86 emulator assumes that every instruction with the
ModRM flag, and which can be used with RIP-relative addressing, has either
SrcMem or DstMem.  This is not the case for several instructions - prefetch,
hint-nop and clflush.

Adding SrcMem|NoAccess for prefetch and hint-nop and SrcMem for clflush.

This fixes CVE-2014-8480.

Fixes: 41061cdb
Cc: stable@vger.kernel.org
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3f6f1480

KVM: x86: Emulator does not decode clflush well · 13e457e0

由 Nadav Amit 提交于 10月 13, 2014

Currently, all group15 instructions are decoded as clflush (e.g., mfence,
xsave).  In addition, the clflush instruction requires no prefix (66/f2/f3)
would exist. If prefix exists it may encode a different instruction (e.g.,
clflushopt).

Creating a group for clflush, and different group for each prefix.

This has been the case forever, but the next patch needs the cflush group
in order to fix a bug introduced in 3.17.

Fixes: 41061cdb
Cc: stable@vger.kernel.org
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

13e457e0

KVM: emulate: avoid accessing NULL ctxt->memopp · a430c916

由 Paolo Bonzini 提交于 10月 23, 2014

A failure to decode the instruction can cause a NULL pointer access.
This is fixed simply by moving the "done" label as close as possible
to the return.

This fixes CVE-2014-8481.
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Cc: stable@vger.kernel.org
Fixes: 41061cdbSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a430c916

KVM: x86: Decoding guest instructions which cross page boundary may fail · 08da44ae

由 Nadav Amit 提交于 10月 03, 2014

Once an instruction crosses a page boundary, the size read from the second page
disregards the common case that part of the operand resides on the first page.
As a result, fetch of long insturctions may fail, and thereby cause the
decoding to fail as well.

Cc: stable@vger.kernel.org
Fixes: 5cfc7e0fSigned-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

08da44ae

kvm: x86: don't kill guest on unknown exit reason · 2bc19dc3

由 Michael S. Tsirkin 提交于 9月 18, 2014

KVM_EXIT_UNKNOWN is a kvm bug, we don't really know whether it was
triggered by a priveledged application.  Let's not kill the guest: WARN
and inject #UD instead.

Cc: stable@vger.kernel.org
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2bc19dc3

kvm: vmx: handle invvpid vm exit gracefully · a642fc30

由 Petr Matousek 提交于 9月 23, 2014

On systems with invvpid instruction support (corresponding bit in
IA32_VMX_EPT_VPID_CAP MSR is set) guest invocation of invvpid
causes vm exit, which is currently not handled and results in
propagation of unknown exit to userspace.

Fix this by installing an invvpid vm exit handler.

This is CVE-2014-3646.

Cc: stable@vger.kernel.org
Signed-off-by: NPetr Matousek <pmatouse@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a642fc30

KVM: x86: Handle errors when RIP is set during far jumps · d1442d85

由 Nadav Amit 提交于 9月 18, 2014

Far jmp/call/ret may fault while loading a new RIP.  Currently KVM does not
handle this case, and may result in failed vm-entry once the assignment is
done.  The tricky part of doing so is that loading the new CS affects the
VMCS/VMCB state, so if we fail during loading the new RIP, we are left in
unconsistent state.  Therefore, this patch saves on 64-bit the old CS
descriptor and restores it if loading RIP failed.

This fixes CVE-2014-3647.

Cc: stable@vger.kernel.org
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d1442d85

KVM: x86: Emulator fixes for eip canonical checks on near branches · 234f3ce4

由 Nadav Amit 提交于 9月 18, 2014

Before changing rip (during jmp, call, ret, etc.) the target should be asserted
to be canonical one, as real CPUs do.  During sysret, both target rsp and rip
should be canonical. If any of these values is noncanonical, a #GP exception
should occur.  The exception to this rule are syscall and sysenter instructions
in which the assigned rip is checked during the assignment to the relevant
MSRs.

This patch fixes the emulator to behave as real CPUs do for near branches.
Far branches are handled by the next patch.

This fixes CVE-2014-3647.

Cc: stable@vger.kernel.org
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

234f3ce4

KVM: x86: Fix wrong masking on relative jump/call · 05c83ec9

由 Nadav Amit 提交于 9月 18, 2014

Relative jumps and calls do the masking according to the operand size, and not
according to the address size as the KVM emulator does today.

This patch fixes KVM behavior.

Cc: stable@vger.kernel.org
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

05c83ec9

KVM: x86: Improve thread safety in pit · 2febc839

由 Andy Honig 提交于 8月 27, 2014

There's a race condition in the PIT emulation code in KVM.  In
__kvm_migrate_pit_timer the pit_timer object is accessed without
synchronization.  If the race condition occurs at the wrong time this
can crash the host kernel.

This fixes CVE-2014-3611.

Cc: stable@vger.kernel.org
Signed-off-by: NAndrew Honig <ahonig@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2febc839

KVM: x86: Prevent host from panicking on shared MSR writes. · 8b3c3104

由 Andy Honig 提交于 8月 27, 2014

The previous patch blocked invalid writes directly when the MSR
is written.  As a precaution, prevent future similar mistakes by
gracefulling handle GPs caused by writes to shared MSRs.

Cc: stable@vger.kernel.org
Signed-off-by: NAndrew Honig <ahonig@google.com>
[Remove parts obsoleted by Nadav's patch. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8b3c3104

KVM: x86: Check non-canonical addresses upon WRMSR · 854e8bb1

由 Nadav Amit 提交于 9月 16, 2014

Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is
written to certain MSRs. The behavior is "almost" identical for AMD and Intel
(ignoring MSRs that are not implemented in either architecture since they would
anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
non-canonical address is written on Intel but not on AMD (which ignores the top
32-bits).

Accordingly, this patch injects a #GP on the MSRs which behave identically on
Intel and AMD. To eliminate the differences between the architecutres, the
value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to
canonical value before writing instead of injecting a #GP.

Some references from Intel and AMD manuals:

According to Intel SDM description of WRMSR instruction #GP is expected on
WRMSR "If the source register contains a non-canonical address and ECX
specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE,
IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP."

According to AMD manual instruction manual:
LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the
LSTAR and CSTAR registers. If an RIP written by WRMSR is not in canonical
form, a general-protection exception (#GP) occurs."
IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the
base field must be in canonical form or a #GP fault will occur."
IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must
be in canonical form."

This patch fixes CVE-2014-3610.

Cc: stable@vger.kernel.org
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

854e8bb1

19 10月, 2014 1 次提交

x86,kvm,vmx: Preserve CR4 across VM entry · d974baa3

由 Andy Lutomirski 提交于 10月 08, 2014

CR4 isn't constant; at least the TSD and PCE bits can vary.

TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks
like it's correct.

This adds a branch and a read from cr4 to each vm entry.  Because it is
extremely likely that consecutive entries into the same vcpu will have
the same host cr4 value, this fixes up the vmcs instead of restoring cr4
after the fact.  A subsequent patch will add a kernel-wide cr4 shadow,
reducing the overhead in the common case to just two memory reads and a
branch.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d974baa3

03 10月, 2014 1 次提交

kvm: do not handle APIC access page if in-kernel irqchip is not in use · f439ed27

由 Paolo Bonzini 提交于 10月 02, 2014

This fixes the following OOPS:

   loaded kvm module (v3.17-rc1-168-gcec26bc3)
   BUG: unable to handle kernel paging request at fffffffffffffffe
   IP: [<ffffffff81168449>] put_page+0x9/0x30
   PGD 1e15067 PUD 1e17067 PMD 0
   Oops: 0000 [#1] PREEMPT SMP
    [<ffffffffa063271d>] ? kvm_vcpu_reload_apic_access_page+0x5d/0x70 [kvm]
    [<ffffffffa013b6db>] vmx_vcpu_reset+0x21b/0x470 [kvm_intel]
    [<ffffffffa0658816>] ? kvm_pmu_reset+0x76/0xb0 [kvm]
    [<ffffffffa064032a>] kvm_vcpu_reset+0x15a/0x1b0 [kvm]
    [<ffffffffa06403ac>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
    [<ffffffffa062e540>] kvm_vm_ioctl+0x200/0x780 [kvm]
    [<ffffffff81212170>] do_vfs_ioctl+0x2d0/0x4b0
    [<ffffffff8108bd99>] ? __mmdrop+0x69/0xb0
    [<ffffffff812123d1>] SyS_ioctl+0x81/0xa0
    [<ffffffff8112a6f6>] ? __audit_syscall_exit+0x1f6/0x2a0
    [<ffffffff817229e9>] system_call_fastpath+0x16/0x1b
   Code: c6 78 ce a3 81 4c 89 e7 e8 d9 80 ff ff 0f 0b 4c 89 e7 e8 8f f6 ff ff e9 fa fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 1e 8b 47 1c 85 c0 74 27 f0
   RIP  [<ffffffff81193045>] put_page+0x5/0x50

when not using the in-kernel irqchip ("-machine kernel_irqchip=off"
with QEMU).  The fix is to make the same check in
kvm_vcpu_reload_apic_access_page that we already have
in vmx.c's vm_need_virtualize_apic_accesses().
Reported-by: NJan Kiszka <jan.kiszka@siemens.com>
Tested-by: NJan Kiszka <jan.kiszka@siemens.com>
Fixes: 4256f43fSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f439ed27

24 9月, 2014 12 次提交

kvm: x86: Unpin and remove kvm_arch->apic_access_page · c24ae0dc

由 Tang Chen 提交于 9月 24, 2014

In order to make the APIC access page migratable, stop pinning it in
memory.

And because the APIC access page is not pinned in memory, we can
remove kvm_arch->apic_access_page.  When we need to write its
physical address into vmcs, we use gfn_to_page() to get its page
struct, which is needed to call page_to_phys(); the page is then
immediately unpinned.
Suggested-by: NGleb Natapov <gleb@kernel.org>
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c24ae0dc

kvm: vmx: Implement set_apic_access_page_addr · 38b99173

由 Tang Chen 提交于 9月 24, 2014

Currently, the APIC access page is pinned by KVM for the entire life
of the guest. We want to make it migratable in order to make memory
hot-unplug available for machines that run KVM.

This patch prepares to handle this for the case where there is no nested
virtualization, or where the nested guest does not have an APIC page of
its own. All accesses to kvm->arch.apic_access_page are changed to go
through kvm_vcpu_reload_apic_access_page.

If the APIC access page is invalidated when the host is running, we update
the VMCS in the next guest entry.

If it is invalidated when the guest is running, the MMU notifier will force
an exit, after which we will handle everything as in the previous case.

If it is invalidated when a nested guest is running, the request will update
either the VMCS01 or the VMCS02. Updating the VMCS01 is done at the
next L2->L1 exit, while updating the VMCS02 is done in prepare_vmcs02.
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

38b99173

kvm: x86: Add request bit to reload APIC access page address · 4256f43f

由 Tang Chen 提交于 9月 24, 2014

Currently, the APIC access page is pinned by KVM for the entire life
of the guest.  We want to make it migratable in order to make memory
hot-unplug available for machines that run KVM.

This patch prepares to handle this in generic code, through a new
request bit (that will be set by the MMU notifier) and a new hook
that is called whenever the request bit is processed.
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4256f43f

kvm: Add arch specific mmu notifier for page invalidation · fe71557a

由 Tang Chen 提交于 9月 24, 2014

This will be used to let the guest run while the APIC access page is
not pinned.  Because subsequent patches will fill in the function
for x86, place the (still empty) x86 implementation in the x86.c file
instead of adding an inline function in kvm_host.h.
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fe71557a

kvm: Fix page ageing bugs · 57128468

由 Andres Lagar-Cavilla 提交于 9月 22, 2014

1. We were calling clear_flush_young_notify in unmap_one, but we are
within an mmu notifier invalidate range scope. The spte exists no more
(due to range_start) and the accessed bit info has already been
propagated (due to kvm_pfn_set_accessed). Simply call
clear_flush_young.

2. We clear_flush_young on a primary MMU PMD, but this may be mapped
as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
This required expanding the interface of the clear_flush_young mmu
notifier, so a lot of code has been trivially touched.

3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
the access bit by blowing the spte. This requires proper synchronizing
with MMU notifier consumers, like every other removal of spte's does.
Signed-off-by: NAndres Lagar-Cavilla <andreslc@google.com>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

57128468

kvm/x86/mmu: Pass gfn and level to rmapp callback. · 8a9522d2

由 Andres Lagar-Cavilla 提交于 9月 23, 2014

Callbacks don't have to do extra computation to learn what the caller
(lvm_handle_hva_range()) knows very well. Useful for
debugging/tracing/printk/future.
Signed-off-by: NAndres Lagar-Cavilla <andreslc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8a9522d2

kvm: x86: use macros to compute bank MSRs · 81760dcc

由 Chen Yucong 提交于 9月 23, 2014

Avoid open coded calculations for bank MSRs by using well-defined
macros that hide the index of higher bank MSRs.

No semantic changes.
Signed-off-by: NChen Yucong <slaoub@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

81760dcc

KVM: x86: Remove debug assertion of non-PAE reserved bits · d5262739

由 Nadav Amit 提交于 9月 23, 2014

Commit 346874c9 ("KVM: x86: Fix CR3 reserved bits") removed non-PAE
reserved bits which were not according to Intel SDM. However, residue was left
in a debug assertion (CR3_NONPAE_RESERVED_BITS). Remove it.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d5262739

kvm: x86: fix two typos in comment · b4619660

由 Tiejun Chen 提交于 9月 22, 2014

s/drity/dirty and s/vmsc01/vmcs01
Signed-off-by: NTiejun Chen <tiejun.chen@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b4619660

KVM: vmx: Inject #GP on invalid PAT CR · 4566654b

由 Nadav Amit 提交于 9月 18, 2014

Guest which sets the PAT CR to invalid value should get a #GP.  Currently, if
vmx supports loading PAT CR during entry, then the value is not checked.  This
patch makes the required check in that case.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4566654b

KVM: x86: emulating descriptor load misses long-mode case · 040c8dc8

由 Nadav Amit 提交于 9月 18, 2014

In 64-bit mode a #GP should be delivered to the guest "if the code segment
descriptor pointed to by the selector in the 64-bit gate doesn't have the L-bit
set and the D-bit clear." - Intel SDM "Interrupt 13â€”General Protection
Exception (#GP)".

This patch fixes the behavior of CS loading emulation code. Although the
comment says that segment loading is not supported in long mode, this function
is executed in long mode, so the fix is necassary.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

040c8dc8

KVM: x86: directly use kvm_make_request again · 77c3913b

由 Liang Chen 提交于 9月 18, 2014

A one-line wrapper around kvm_make_request is not particularly
useful. Replace kvm_mmu_flush_tlb() with kvm_make_request().
Signed-off-by: NLiang Chen <liangchen.linux@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

77c3913b

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功