提交 · cb7cb2864e758a1b040040bc55e404c677c911cb · openeuler / raspberrypi-kernel

27 11月, 2012 1 次提交

x86, kvm: Remove incorrect redundant assembly constraint · cb7cb286

由 H. Peter Anvin 提交于 11月 21, 2012

In __emulate_1op_rax_rdx, we use "+a" and "+d" which are input/output
constraints, and *then* use "a" and "d" as input constraints. This is
incorrect, but happens to work on some versions of gcc.

However, it breaks gcc with -O0 and icc, and may break on future
versions of gcc.
Reported-and-tested-by: NMelanie Blower <melanie.blower@intel.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/B3584E72CFEBED439A3ECA9BCE67A4EF1B17AF90@FMSMSX107.amr.corp.intel.comReviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Acked-by: NMarcelo Tosatti <mtosatti@redhat.com>

cb7cb286

17 11月, 2012 1 次提交

KVM: x86: Fix invalid secondary exec controls in vmx_cpuid_update() · 29282fde

由 Takashi Iwai 提交于 11月 09, 2012

The commit [ad756a16: KVM: VMX: Implement PCID/INVPCID for guests with
EPT] introduced the unconditional access to SECONDARY_VM_EXEC_CONTROL,
and this triggers kernel warnings like below on old CPUs:

    vmwrite error: reg 401e value a0568000 (err 12)
    Pid: 13649, comm: qemu-kvm Not tainted 3.7.0-rc4-test2+ #154
    Call Trace:
     [<ffffffffa0558d86>] vmwrite_error+0x27/0x29 [kvm_intel]
     [<ffffffffa054e8cb>] vmcs_writel+0x1b/0x20 [kvm_intel]
     [<ffffffffa054f114>] vmx_cpuid_update+0x74/0x170 [kvm_intel]
     [<ffffffffa03629b6>] kvm_vcpu_ioctl_set_cpuid2+0x76/0x90 [kvm]
     [<ffffffffa0341c67>] kvm_arch_vcpu_ioctl+0xc37/0xed0 [kvm]
     [<ffffffff81143f7c>] ? __vunmap+0x9c/0x110
     [<ffffffffa0551489>] ? vmx_vcpu_load+0x39/0x1a0 [kvm_intel]
     [<ffffffffa0340ee2>] ? kvm_arch_vcpu_load+0x52/0x1a0 [kvm]
     [<ffffffffa032dcd4>] ? vcpu_load+0x74/0xd0 [kvm]
     [<ffffffffa032deb0>] kvm_vcpu_ioctl+0x110/0x5e0 [kvm]
     [<ffffffffa032e93d>] ? kvm_dev_ioctl+0x4d/0x4a0 [kvm]
     [<ffffffff8117dc6f>] do_vfs_ioctl+0x8f/0x530
     [<ffffffff81139d76>] ? remove_vma+0x56/0x60
     [<ffffffff8113b708>] ? do_munmap+0x328/0x400
     [<ffffffff81187c8c>] ? fget_light+0x4c/0x100
     [<ffffffff8117e1a1>] sys_ioctl+0x91/0xb0
     [<ffffffff815a942d>] system_call_fastpath+0x1a/0x1f

This patch adds a check for the availability of secondary exec
control to avoid these warnings.

Cc: <stable@vger.kernel.org> [v3.6+]
Signed-off-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

29282fde

13 11月, 2012 1 次提交

KVM: x86: invalid opcode oops on SET_SREGS with OSXSAVE bit set (CVE-2012-4461) · 6d1068b3

由 Petr Matousek 提交于 11月 06, 2012

On hosts without the XSAVE support unprivileged local user can trigger
oops similar to the one below by setting X86_CR4_OSXSAVE bit in guest
cr4 register using KVM_SET_SREGS ioctl and later issuing KVM_RUN
ioctl.

invalid opcode: 0000 [#2] SMP
Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables
...
Pid: 24935, comm: zoog_kvm_monito Tainted: G      D      3.2.0-3-686-pae
EIP: 0060:[<f8b9550c>] EFLAGS: 00210246 CPU: 0
EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm]
EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0
task.ti=d7c62000)
Stack:
 00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000
 ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0
 c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80
Call Trace:
 [<f8b940a9>] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm]
...
 [<c12bfb44>] ? syscall_call+0x7/0xb
Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74
1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01
d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89
EIP: [<f8b9550c>] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP
0068:d7c63e70

QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID
and then sets them later. So guest's X86_FEATURE_XSAVE should be masked
out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with
X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with
X86_FEATURE_XSAVE even on hosts that do not support it, might be
susceptible to this attack from inside the guest as well.

Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support.
Signed-off-by: NPetr Matousek <pmatouse@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

6d1068b3

01 11月, 2012 1 次提交

KVM: x86: fix vcpu->mmio_fragments overflow · 87da7e66

由 Xiao Guangrong 提交于 10月 24, 2012

After commit b3356bf0 (KVM: emulator: optimize "rep ins" handling),
the pieces of io data can be collected and write them to the guest memory
or MMIO together

Unfortunately, kvm splits the mmio access into 8 bytes and store them to
vcpu->mmio_fragments. If the guest uses "rep ins" to move large data, it
will cause vcpu->mmio_fragments overflow

The bug can be exposed by isapc (-M isapc):

[23154.818733] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ ......]
[23154.858083] Call Trace:
[23154.859874]  [<ffffffffa04f0e17>] kvm_get_cr8+0x1d/0x28 [kvm]
[23154.861677]  [<ffffffffa04fa6d4>] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 [kvm]
[23154.863604]  [<ffffffffa04f5a1a>] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]

Actually, we can use one mmio_fragment to store a large mmio access then
split it when we pass the mmio-exit-info to userspace. After that, we only
need two entries to store mmio info for the cross-mmio pages access
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

87da7e66

23 10月, 2012 2 次提交

KVM: apic: fix LDR calculation in x2apic mode · 7f46ddbd

由 Gleb Natapov 提交于 10月 14, 2012

Signed-off-by: NGleb Natapov <gleb@redhat.com>
Reviewed-by: NChegu Vinod  <chegu_vinod@hp.com>
Tested-by: NChegu Vinod <chegu_vinod@hp.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

7f46ddbd

KVM: MMU: fix release noslot pfn · f3ac1a4b

由 Xiao Guangrong 提交于 10月 16, 2012

We can not directly call kvm_release_pfn_clean to release the pfn
since we can meet noslot pfn which is used to cache mmio info into
spte
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: NAvi Kivity <avi@redhat.com>

f3ac1a4b

23 9月, 2012 2 次提交

KVM: x86: Fix guest debug across vcpu INIT reset · c8639010

由 Jan Kiszka 提交于 9月 21, 2012

If we reset a vcpu on INIT, we so far overwrote dr7 as provided by
KVM_SET_GUEST_DEBUG, and we also cleared switch_db_regs unconditionally.

Fix this by saving the dr7 used for guest debugging and calculating the
effective register value as well as switch_db_regs on any potential
change. This will change to focus of the set_guest_debug vendor op to
update_dp_bp_intercept.

Found while trying to stop on start_secondary.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

c8639010

KVM: Add resampling irqfds for level triggered interrupts · 7a84428a

由 Alex Williamson 提交于 9月 21, 2012

To emulate level triggered interrupts, add a resample option to
KVM_IRQFD.  When specified, a new resamplefd is provided that notifies
the user when the irqchip has been resampled by the VM.  This may, for
instance, indicate an EOI.  Also in this mode, posting of an interrupt
through an irqfd only asserts the interrupt.  On resampling, the
interrupt is automatically de-asserted prior to user notification.
This enables level triggered interrupts to be posted and re-enabled
from vfio with no userspace intervention.

All resampling irqfds can make use of a single irq source ID, so we
reserve a new one for this interface.
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

7a84428a

22 9月, 2012 1 次提交

x86, kvm: fix kvm's usage of kernel_fpu_begin/end() · b1a74bf8

由 Suresh Siddha 提交于 9月 20, 2012

Preemption is disabled between kernel_fpu_begin/end() and as such
it is not a good idea to use these routines in kvm_load/put_guest_fpu()
which can be very far apart.

kvm_load/put_guest_fpu() routines are already called with
preemption disabled and KVM already uses the preempt notifier to save
the guest fpu state using kvm_put_guest_fpu().

So introduce __kernel_fpu_begin/end() routines which don't touch
preemption and use them instead of kernel_fpu_begin/end()
for KVM's use model of saving/restoring guest FPU state.

Also with this change (and with eagerFPU model), fix the host cr0.TS vm-exit
state in the case of VMX. For eagerFPU case, host cr0.TS is always clear.
So no need to worry about it. For the traditional lazyFPU restore case,
change the cr0.TS bit for the host state during vm-exit to be always clear
and cr0.TS bit is set in the __vmx_load_host_state() when the FPU
(guest FPU or the host task's FPU) state is not active. This ensures
that the host/guest FPU state is properly saved, restored
during context-switch and with interrupts (using irq_fpu_usable()) not
stomping on the active FPU state.
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1348164109.26695.338.camel@sbsiddha-desk.sc.intel.com
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

b1a74bf8

21 9月, 2012 1 次提交

KVM: x86: Export svm/vmx exit code and vector code to userspace · 26bf264e

由 Xiao Guangrong 提交于 9月 17, 2012

Exporting KVM exit information to userspace to be consumed by perf.
Signed-off-by: NDong Hao <haodong@linux.vnet.ibm.com>
[ Dong Hao <haodong@linux.vnet.ibm.com>: rebase it on acme's git tree ]
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Acked-by: NMarcelo Tosatti <mtosatti@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1347870675-31495-2-git-send-email-haodong@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

26bf264e

20 9月, 2012 11 次提交

KVM: optimize apic interrupt delivery · 1e08ec4a

由 Gleb Natapov 提交于 9月 13, 2012

Most interrupt are delivered to only one vcpu. Use pre-build tables to
find interrupt destination instead of looping through all vcpus. In case
of logical mode loop only through vcpus in a logical cluster irq is sent
to.
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

1e08ec4a

KVM: MMU: Eliminate pointless temporary 'ac' · c5421519

由 Avi Kivity 提交于 9月 19, 2012

'ac' essentially reconstructs the 'access' variable we already
have, except for the PFERR_PRESENT_MASK and PFERR_RSVD_MASK.  As
these are not used by callees, just use 'access' directly.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

c5421519

KVM: MMU: Avoid access/dirty update loop if all is well · b514c30f

由 Avi Kivity 提交于 9月 16, 2012

Keep track of accessed/dirty bits; if they are all set, do not
enter the accessed/dirty update loop.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

b514c30f

KVM: MMU: Eliminate eperm temporary · 71331a1d

由 Avi Kivity 提交于 9月 16, 2012

'eperm' is no longer used in the walker loop, so we can eliminate it.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

71331a1d

KVM: MMU: Optimize is_last_gpte() · 6fd01b71

由 Avi Kivity 提交于 9月 12, 2012

Instead of branchy code depending on level, gpte.ps, and mmu configuration,
prepare everything in a bitmap during mode changes and look it up during
runtime.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

6fd01b71

KVM: MMU: Simplify walk_addr_generic() loop · 13d22b6a

由 Avi Kivity 提交于 9月 12, 2012

The page table walk is coded as an infinite loop, with a special
case on the last pte.

Code it as an ordinary loop with a termination condition on the last
pte (large page or walk length exhausted), and put the last pte handling
code after the loop where it belongs.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

13d22b6a

KVM: MMU: Optimize pte permission checks · 97d64b78

由 Avi Kivity 提交于 9月 12, 2012

walk_addr_generic() permission checks are a maze of branchy code, which is
performed four times per lookup.  It depends on the type of access, efer.nxe,
cr0.wp, cr4.smep, and in the near future, cr4.smap.

Optimize this away by precalculating all variants and storing them in a
bitmap.  The bitmap is recalculated when rarely-changing variables change
(cr0, cr4) and is indexed by the often-changing variables (page fault error
code, pte access permissions).

The permission check is moved to the end of the loop, otherwise an SMEP
fault could be reported as a false positive, when PDE.U=1 but PTE.U=0.
Noted by Xiao Guangrong.

The result is short, branch-free code.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

97d64b78

KVM: MMU: Update accessed and dirty bits after guest pagetable walk · 8cbc7069

由 Avi Kivity 提交于 9月 16, 2012

While unspecified, the behaviour of Intel processors is to first
perform the page table walk, then, if the walk was successful, to
atomically update the accessed and dirty bits of walked paging elements.

While we are not required to follow this exactly, doing so will allow us
to perform the access permissions check after the walk is complete, rather
than after each walk step.

(the tricky case is SMEP: a zero in any pte's U bit makes the referenced
page a supervisor page, so we can't fault on a one bit during the walk
itself).
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

8cbc7069

KVM: MMU: Move gpte_access() out of paging_tmpl.h · 3d34adec

由 Avi Kivity 提交于 9月 12, 2012

We no longer rely on paging_tmpl.h defines; so we can move the function
to mmu.c.

Rely on zero extension to 64 bits to get the correct nx behaviour.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

3d34adec

KVM: MMU: Optimize gpte_access() slightly · edc2ae84

由 Avi Kivity 提交于 9月 12, 2012

If nx is disabled, then is gpte[63] is set we will hit a reserved
bit set fault before checking permissions; so we can ignore the
setting of efer.nxe.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

edc2ae84

KVM: MMU: Push clean gpte write protection out of gpte_access() · 8ea667f2

由 Avi Kivity 提交于 9月 12, 2012

gpte_access() computes the access permissions of a guest pte and also
write-protects clean gptes.  This is wrong when we are servicing a
write fault (since we'll be setting the dirty bit momentarily) but
correct when instantiating a speculative spte, or when servicing a
read fault (since we'll want to trap a following write in order to
set the dirty bit).

It doesn't seem to hurt in practice, but in order to make the code
readable, push the write protection out of gpte_access() and into
a new protect_clean_gpte() which is called explicitly when needed.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

8ea667f2

19 9月, 2012 2 次提交

arch/x86: Remove unecessary semicolons · 4b8073e4

由 Peter Senna Tschudin 提交于 9月 18, 2012

Found by http://coccinelle.lip6.fr/Signed-off-by: NPeter Senna Tschudin <peter.senna@gmail.com>
Cc: avi@redhat.com
Cc: mtosatti@redhat.com
Cc: a.p.zijlstra@chello.nl
Cc: rusty@rustcorp.com.au
Cc: masami.hiramatsu.pt@hitachi.com
Cc: suresh.b.siddha@intel.com
Cc: joerg.roedel@amd.com
Cc: agordeev@redhat.com
Cc: yinghai@kernel.org
Cc: bhelgaas@google.com
Cc: liuj97@gmail.com
Link: http://lkml.kernel.org/r/1347986174-30287-7-git-send-email-peter.senna@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

4b8073e4

x86, kvm: use kernel_fpu_begin/end() in kvm_load/put_guest_fpu() · 9c1c3fac

由 Suresh Siddha 提交于 8月 24, 2012

kvm's guest fpu save/restore should be wrapped around
kernel_fpu_begin/end(). This will avoid for example taking a DNA
in kvm_load_guest_fpu() when it tries to load the fpu immediately
after doing unlazy_fpu() on the host side.

More importantly this will prevent the host process fpu from being
corrupted.
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1345842782-24175-4-git-send-email-suresh.b.siddha@intel.com
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

9c1c3fac

18 9月, 2012 1 次提交

KVM: make processes waiting on vcpu mutex killable · 9fc77441

由 Michael S. Tsirkin 提交于 9月 16, 2012

vcpu mutex can be held for unlimited time so
taking it with mutex_lock on an ioctl is wrong:
one process could be passed a vcpu fd and
call this ioctl on the vcpu used by another process,
it will then be unkillable until the owner exits.

Call mutex_lock_killable instead and return status.
Note: mutex_lock_interruptible would be even nicer,
but I am not sure all users are prepared to handle EINTR
from these ioctls. They might misinterpret it as an error.

Cleanup paths expect a vcpu that can't be used by
any userspace so this will always succeed - catch bugs
by calling BUG_ON.

Catch callers that don't check return state by adding
__must_check.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

9fc77441

17 9月, 2012 3 次提交

KVM: SVM: Make use of asm.h · 7454766f

由 Avi Kivity 提交于 9月 16, 2012

Use macros for bitness-insensitive register names, instead of
rolling our own.
Signed-off-by: NAvi Kivity <avi@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

7454766f

KVM: VMX: Make use of asm.h · b188c81f

由 Avi Kivity 提交于 9月 16, 2012

Use macros for bitness-insensitive register names, instead of
rolling our own.
Signed-off-by: NAvi Kivity <avi@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

b188c81f

KVM: VMX: Make lto-friendly · 83287ea4

由 Avi Kivity 提交于 9月 16, 2012

LTO (link-time optimization) doesn't like local labels to be referred to
from a different function, since the two functions may be built in separate
compilation units.  Use an external variable instead.
Signed-off-by: NAvi Kivity <avi@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

83287ea4

13 9月, 2012 1 次提交

KVM: x86: lapic: Clean up find_highest_vector() and count_vectors() · ecba9a52

由 Takuya Yoshikawa 提交于 9月 05, 2012

find_highest_vector() and count_vectors():
 - Instead of using magic values, define and use proper macros.

find_highest_vector():
 - Remove likely() which is there only for historical reasons and not
   doing correct branch predictions anymore.  Using such heuristics
   to optimize this function is not worth it now.  Let CPUs predict
   things instead.

 - Stop checking word[0] separately.  This was only needed for doing
   likely() optimization.

 - Use for loop, not while, to iterate over the register array to make
   the code clearer.

Note that we actually confirmed that the likely() did wrong predictions
by inserting debug code.
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

ecba9a52

10 9月, 2012 3 次提交

KVM: fix error paths for failed gfn_to_page() calls · 4484141a

由 Xiao Guangrong 提交于 9月 07, 2012

This bug was triggered:
[ 4220.198458] BUG: unable to handle kernel paging request at fffffffffffffffe
[ 4220.203907] IP: [<ffffffff81104d85>] put_page+0xf/0x34
......
[ 4220.237326] Call Trace:
[ 4220.237361]  [<ffffffffa03830d0>] kvm_arch_destroy_vm+0xf9/0x101 [kvm]
[ 4220.237382]  [<ffffffffa036fe53>] kvm_put_kvm+0xcc/0x127 [kvm]
[ 4220.237401]  [<ffffffffa03702bc>] kvm_vcpu_release+0x18/0x1c [kvm]
[ 4220.237407]  [<ffffffff81145425>] __fput+0x111/0x1ed
[ 4220.237411]  [<ffffffff8114550f>] ____fput+0xe/0x10
[ 4220.237418]  [<ffffffff81063511>] task_work_run+0x5d/0x88
[ 4220.237424]  [<ffffffff8104c3f7>] do_exit+0x2bf/0x7ca

The test case:

	printf(fmt, ##args);		\
	exit(-1);} while (0)

static int create_vm(void)
{
	int sys_fd, vm_fd;

	sys_fd = open("/dev/kvm", O_RDWR);
	if (sys_fd < 0)
		die("open /dev/kvm fail.\n");

	vm_fd = ioctl(sys_fd, KVM_CREATE_VM, 0);
	if (vm_fd < 0)
		die("KVM_CREATE_VM fail.\n");

	return vm_fd;
}

static int create_vcpu(int vm_fd)
{
	int vcpu_fd;

	vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0);
	if (vcpu_fd < 0)
		die("KVM_CREATE_VCPU ioctl.\n");
	printf("Create vcpu.\n");
	return vcpu_fd;
}

static void *vcpu_thread(void *arg)
{
	int vm_fd = (int)(long)arg;

	create_vcpu(vm_fd);
	return NULL;
}

int main(int argc, char *argv[])
{
	pthread_t thread;
	int vm_fd;

	(void)argc;
	(void)argv;

	vm_fd = create_vm();
	pthread_create(&thread, NULL, vcpu_thread, (void *)(long)vm_fd);
	printf("Exit.\n");
	return 0;
}

It caused by release kvm->arch.ept_identity_map_addr which is the
error page.

The parent thread can send KILL signal to the vcpu thread when it was
exiting which stops faulting pages and potentially allocating memory.
So gfn_to_pfn/gfn_to_page may fail at this time

Fixed by checking the page before it is used
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

4484141a

KVM: MMU: remove unnecessary check · 7de5bdc9

由 Xiao Guangrong 提交于 9月 07, 2012

Checking the return of kvm_mmu_get_page is unnecessary since it is
guaranteed by memory cache
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

7de5bdc9

KVM: Depend on HIGH_RES_TIMERS · 92b5265d

由 Liu, Jinsong 提交于 9月 10, 2012

KVM lapic timer and tsc deadline timer based on hrtimer,
setting a leftmost node to rb tree and then do hrtimer reprogram.
If hrtimer not configured as high resolution, hrtimer_enqueue_reprogram
do nothing and then make kvm lapic timer and tsc deadline timer fail.
Signed-off-by: NLiu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

92b5265d

09 9月, 2012 1 次提交

KVM: x86: Check INVPCID feature bit in EBX of leaf 7 · 4f977045

由 Ren, Yongjie 提交于 9月 07, 2012

Checks and operations on the INVPCID feature bit should use EBX
of CPUID leaf 7 instead of ECX.
Signed-off-by: NJunjie Mao <junjie.mao@intel.com>
Signed-off-by: NYongjie Ren <yongjien.ren@intel.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

4f977045

06 9月, 2012 7 次提交

KVM: use symbolic constant for nr interrupts · a50abc3b

由 Michael S. Tsirkin 提交于 9月 05, 2012

interrupt_bitmap is KVM_NR_INTERRUPTS bits in size,
so just use that instead of hard-coded constants
and math.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

a50abc3b

KVM: emulator: optimize "rep ins" handling · b3356bf0

由 Gleb Natapov 提交于 9月 03, 2012

Optimize "rep ins" by allowing emulator to write back more than one
datum at a time. Introduce new operand type OP_MEM_STR which tells
writeback() that dst contains pointer to an array that should be written
back as opposite to just one data element.
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

b3356bf0

KVM: emulator: string_addr_inc() cleanup · f3bd64c6

由 Gleb Natapov 提交于 9月 03, 2012

Remove unneeded segment argument. Address structure already has correct
segment which was put there during decode.
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

f3bd64c6

G
KVM: emulator: make x86 emulation modes enum instead of defines · 9d1b39a9
由 Gleb Natapov 提交于 9月 03, 2012
```
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>
```
9d1b39a9

KVM: Provide userspace IO exit completion callback · 716d51ab

由 Gleb Natapov 提交于 9月 03, 2012

Current code assumes that IO exit was due to instruction emulation
and handles execution back to emulator directly. This patch adds new
userspace IO exit completion callback that can be set by any other code
that caused IO exit to userspace.
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

716d51ab

KVM: move postcommit flush to x86, as mmio sptes are x86 specific · 3b4dc3a0

由 Marcelo Tosatti 提交于 8月 28, 2012

Other arches do not need this.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

v2: fix incorrect deletion of mmio sptes on gpa move (noticed by Takuya)
Signed-off-by: NAvi Kivity <avi@redhat.com>

3b4dc3a0

KVM: split kvm_arch_flush_shadow · 2df72e9b

由 Marcelo Tosatti 提交于 8月 24, 2012

Introducing kvm_arch_flush_shadow_memslot, to invalidate the
translations of a single memory slot.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

2df72e9b

05 9月, 2012 1 次提交

KVM: SVM: constify lookup tables · 09941fbb

由 Mathias Krause 提交于 8月 30, 2012

We never modify direct_access_msrs[], msrpm_ranges[],
svm_exit_handlers[] or x86_intercept_map[] at runtime.
Mark them r/o.
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

09941fbb