Commits · 1b7fd45c32fcc170246bf10ba8d33871840319b8 · OpenHarmony / kernel_linux

18 Mar, 2011 29 commits

KVM: MMU: set spte accessed bit properly · 1b7fd45c

Committed by Xiao Guangrong on Mar 04, 2011

Set the spte accessed bit only if guest_initiated == 1, which means the page
was really accessed.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: fix kvm_mmu_slot_remove_write_access dropping intermediate W bits · da8dc75f

Committed by Xiao Guangrong on Mar 04, 2011

Only remove write access in the last sptes.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: better readability of efer_reserved_bits · 1260edbe

Committed by Lai Jiangshan on Feb 21, 2011

Use EFER_SCE, EFER_LME and EFER_LMA instead of magic numbers.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
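
A minimal sketch of the idea, not the kernel source: the reserved-bit mask is built from named EFER bits instead of a literal. The bit positions (0, 8, 10) are the architectural ones; the helper `efer_has_reserved_bits()` is a hypothetical name introduced only for illustration.

```c
#include <stdint.h>

/* Architectural bit positions in the IA32_EFER MSR. */
#define EFER_SCE (1ULL << 0)   /* SYSCALL/SYSRET enable */
#define EFER_LME (1ULL << 8)   /* long mode enable */
#define EFER_LMA (1ULL << 10)  /* long mode active */

/* Every bit that is not explicitly listed is treated as reserved. */
static const uint64_t efer_reserved_bits =
	~(uint64_t)(EFER_SCE | EFER_LME | EFER_LMA);

/* Hypothetical helper: nonzero if an EFER value touches a reserved bit. */
static int efer_has_reserved_bits(uint64_t efer)
{
	return (efer & efer_reserved_bits) != 0;
}
```

With these three bits the mask evaluates to 0xfffffffffffffafe, which is exactly the kind of literal that is hard to audit by eye.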

KVM: Clear async page fault hash after switching to real mode · d170c419

Committed by Lai Jiangshan on Feb 21, 2011

The hash array of async gfns may still contain leftover gfns after
kvm_clear_async_pf_completion_queue() is called; they need to be cleared.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Initialize vm86 TSS only once. · 93ea5388

Committed by Gleb Natapov on Feb 21, 2011

Currently vm86 task is initialized on each real mode entry and vcpu
reset. Initialization is done by zeroing TSS and updating relevant
fields. But since all vcpus are using the same TSS there is a race where
one vcpu may use TSS while other vcpu is initializing it, so the vcpu
that uses TSS will see wrong TSS content and will behave incorrectly.
Fix that by initializing TSS only once.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: update live TR selector if it changes in real mode · a8ba6c26

Committed by Gleb Natapov on Feb 21, 2011

When rmode.vm86 is active TR descriptor is updated with vm86 task values,
but selector is left intact. vmx_set_segment() makes sure that if TR
register is written into while vm86 is active the new values are saved
for use after vm86 is deactivated, but since selector is not updated on
vm86 activation/deactivation new value is lost. Fix this by writing new
selector into vmcs immediately.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: add the __noclone attribute to vmx_vcpu_run · a3b5ba49

Committed by Lai Jiangshan on Feb 11, 2011

The changelog of 104f226b said "adds the __noclone attribute",
but it was missing in its patch. I think it is still needed.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Convert tsc_write_lock to raw_spinlock · 038f8c11

Committed by Jan Kiszka on Feb 04, 2011

Code under this lock requires non-preemptibility. Ensure this also
over -rt by converting it to a raw spinlock.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: remove isr_ack logic from PIC · 7049467b

Committed by Gleb Natapov on Feb 09, 2011

isr_ack logic was added by e4825800 to avoid unnecessary IPIs. Back
then it made sense, but now the code checks that vcpu is ready to accept
interrupt before sending IPI, so this logic is no longer needed. The
patch removes it.

Fixes a regression with Debian/Hurd.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Reported-and-tested-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: fix detection of BIOS disabling VMX · 23f3e991

Committed by Joseph Cihula on Feb 08, 2011

This patch fixes the logic used to detect whether BIOS has disabled VMX, for
the case where VMX is enabled only under SMX, but tboot is not active.
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Convert kvm_lock to raw_spinlock · e935b837

Committed by Jan Kiszka on Feb 08, 2011

Code under this lock requires non-preemptibility. Ensure this also
over -rt by converting it to a raw spinlock.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: check for progress after IRET interception · bd3d1ec3

Committed by Avi Kivity on Feb 03, 2011

When we enable an NMI window, we ask for an IRET intercept, since
the IRET re-enables NMIs. However, the IRET intercept happens before
the instruction executes, while the NMI window architecturally opens
afterwards.

To compensate for this mismatch, we only open the NMI window in the
following exit, assuming that the IRET has by then executed; however,
this assumption is not always correct; we may exit due to a host interrupt
or page fault, without having executed the instruction.

Fix by checking for forward progress by recording and comparing the IRET's
rip. This is somewhat of a hack, since an unchanging rip does not mean that
no forward progress has been made, but is the simplest fix for now.
Signed-off-by: Avi Kivity <avi@redhat.com>
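
The forward-progress check described above can be sketched as follows. This is an illustrative model under stated assumptions, not the SVM code: `struct iret_vcpu` and both function names are hypothetical, and the guest rip is passed in as a plain integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-vcpu state; not the kernel's structure. */
struct iret_vcpu {
	bool     iret_pending;     /* IRET intercept seen, window not yet open */
	uint64_t iret_rip;         /* guest rip recorded at the IRET intercept */
	bool     nmi_window_open;
};

/* Called when the IRET intercept fires, i.e. before the IRET executes. */
static void on_iret_intercept(struct iret_vcpu *v, uint64_t rip)
{
	v->iret_pending = true;
	v->iret_rip = rip;
}

/* Called on every later exit, whatever caused it. */
static void on_exit(struct iret_vcpu *v, uint64_t rip)
{
	/* Same rip: assume the IRET has not executed yet, keep waiting. */
	if (v->iret_pending && rip != v->iret_rip) {
		v->iret_pending = false;
		v->nmi_window_open = true;
	}
}
```

As the commit message notes, an unchanged rip can also occur after real progress (for example an IRET that returns to its own address), so the comparison is conservative rather than exact.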

KVM: Fix race between nmi injection and enabling nmi window · f8636849

Committed by Avi Kivity on Feb 03, 2011

The interrupt injection logic looks something like

  if an nmi is pending, and nmi injection allowed
    inject nmi
  if an nmi is pending
    request exit on nmi window

The problem is that "nmi is pending" can be set asynchronously by
the PIT; if it happens to fire between the two if statements, we
will request an nmi window even though nmi injection is allowed.  On
SVM, this has disastrous results, since it causes eflags.TF to be
set in random guest code.

The fix is simple; make nmi_pending synchronous using the standard
vcpu->requests mechanism; this ensures the code above is completely
synchronous wrt nmi_pending.
Signed-off-by: Avi Kivity <avi@redhat.com>
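
A rough userspace model of the fix, assuming C11 atomics in place of the kernel's request bits. All names here (`REQ_NMI`, `struct nmi_vcpu`, `raise_nmi`, `inject_pending_event`) are hypothetical. The point it illustrates: the asynchronous source only sets a request bit, and the vcpu thread folds that bit into `nmi_pending` once, before either of the two tests runs.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_NMI (1u << 0)        /* hypothetical request bit */

struct nmi_vcpu {
	atomic_uint requests;        /* set asynchronously, e.g. by a timer */
	bool nmi_pending;            /* touched only by the vcpu thread */
	bool nmi_allowed;
	bool nmi_injected;
	bool nmi_window_requested;
};

/* Asynchronous side: never writes nmi_pending directly. */
static void raise_nmi(struct nmi_vcpu *v)
{
	atomic_fetch_or(&v->requests, REQ_NMI);
}

/* vcpu thread, once per guest entry. */
static void inject_pending_event(struct nmi_vcpu *v)
{
	/* Consume the request exactly once, before both tests below. */
	if (atomic_fetch_and(&v->requests, ~REQ_NMI) & REQ_NMI)
		v->nmi_pending = true;

	if (v->nmi_pending && v->nmi_allowed) {
		v->nmi_injected = true;
		v->nmi_pending = false;  /* assumption: injection consumes the NMI */
	}
	if (v->nmi_pending)
		v->nmi_window_requested = true;
}
```

A request raised while `inject_pending_event()` is running stays in `requests` and is picked up on the next pass, so both tests always see the same value of `nmi_pending`.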

KVM: Drop ad-hoc vendor specific instruction restriction · 4005996e

Committed by Avi Kivity on Feb 01, 2011

Use the new support in the emulator, and drop the ad-hoc code in x86.c.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: vendor specific instructions · d867162c

Committed by Avi Kivity on Feb 01, 2011

Mark some instructions as vendor specific, and allow the caller to request
emulation only of vendor specific instructions.  This is useful in some
circumstances (responding to a #UD fault).
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Drop bogus x86_decode_insn() error check · 3e909439

Committed by Avi Kivity on Feb 01, 2011

x86_decode_insn() doesn't return X86EMUL_* values, so the check
for X86EMUL_PROPOGATE_FAULT will always fail.  There is a proper
check later on, so there is no need for a replacement for this
code.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Drop obsolete warning about INIT on runnable VCPU · 0bb88659

Committed by Jan Kiszka on Feb 01, 2011

This warning was once used for debugging QEMU user space. Though
uncommon, it is actually possible to send an INIT request to a running
VCPU. So better drop this warning before someone misuses it to flood
kernel logs this way.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: release kvmclock page on reset · 12f9a48f

Committed by Glauber Costa on Feb 01, 2011

Currently, when a vcpu is reset, the kvmclock page keeps being written to.
This is wrong and inconsistent: a cpu reset should take it to its
initial state.
Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: handle guest access to BBL_CR_CTL3 MSR · 91c9c3ed

Committed by john cooper on Jan 21, 2011

A correction to Intel cpu model CPUID data (patch queued)
caused winxp to BSOD when booted with a Penryn model.
This was traced to the CPUID "model" field correction from
6 -> 23 (as is proper for a Penryn class of cpu).  Only in
this case does the problem surface.

The cause for this failure is winxp accessing the BBL_CR_CTL3
MSR which is unsupported by current kvm, appears to be a
legacy MSR not fully characterized yet existing in current
silicon, and is apparently carried forward in MSR space to
accommodate vintage code as here.  It is not yet conclusive
whether this MSR implements any of its legacy functionality
or is just an ornamental dud for compatibility.  While I
found no silicon version specific documentation link to
this MSR, a general description exists in Intel's developer's
reference which agrees with the functional behavior of
other bootloader/kernel code I've examined accessing
BBL_CR_CTL3.  Regrettably winxp appears to be setting bit #19
called out as "reserved" in the above document.

So to minimally accommodate this MSR, kvm msr get will provide
the equivalent mock data and kvm msr write will simply toss the
guest passed data without interpretation.  While this treatment
of BBL_CR_CTL3 addresses the immediate problem, the approach may
be modified pending clarification from Intel.
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
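
The "mock read, discarded write" treatment can be sketched like this. It is only an illustration of the approach the message describes: the MSR index and the mock value below are assumed placeholders and should be checked against the kernel headers, and the function names and return convention are hypothetical.

```c
#include <stdint.h>

/* Assumed placeholder values for illustration only. */
#define SKETCH_MSR_BBL_CR_CTL3   0x11eu
#define SKETCH_BBL_CR_CTL3_MOCK  0xbe702111ull

/* Return 0 if the MSR was handled, 1 if it is unknown to this sketch. */
static int sketch_get_msr(uint32_t msr, uint64_t *data)
{
	switch (msr) {
	case SKETCH_MSR_BBL_CR_CTL3:
		*data = SKETCH_BBL_CR_CTL3_MOCK;  /* fixed mock data */
		return 0;
	default:
		return 1;
	}
}

static int sketch_set_msr(uint32_t msr, uint64_t data)
{
	switch (msr) {
	case SKETCH_MSR_BBL_CR_CTL3:
		(void)data;  /* tossed without interpretation, reserved bit 19 included */
		return 0;
	default:
		return 1;
	}
}
```

Accepting the write silently is what lets a guest that sets the reserved bit #19 continue booting, at the cost of the written value never being observable on a later read.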

KVM: Add "exiting guest mode" state · 6b7e2d09

Committed by Xiao Guangrong on Jan 12, 2011

Currently we keep track of only two states: guest mode and host
mode.  This patch adds an "exiting guest mode" state that tells
us that an IPI will happen soon, so unless we need to wait for the
IPI, we can avoid it completely.

Also:
1: No need to read/write ->mode atomically in the vcpu's thread

2: Reorganize struct kvm_vcpu to make ->mode and ->requests
   share the same cache line explicitly
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
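
A simplified model of the three-state idea, using C11 atomics rather than kernel primitives. The enum names, `struct mode_vcpu`, and `kick_needs_ipi()` are hypothetical, and the sketch covers only the "kick" case where a redundant IPI can be skipped, not callers that must wait for the IPI to be acknowledged.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { OUTSIDE_GUEST_MODE, IN_GUEST_MODE, EXITING_GUEST_MODE };

struct mode_vcpu {
	atomic_int mode;
};

/* Returns true if the caller should send an IPI to kick the vcpu. */
static bool kick_needs_ipi(struct mode_vcpu *v)
{
	int expected = IN_GUEST_MODE;

	/*
	 * Only the first kicker moves IN_GUEST_MODE -> EXITING_GUEST_MODE.
	 * Later kickers see EXITING_GUEST_MODE (an IPI is already on its
	 * way) or OUTSIDE_GUEST_MODE (nothing to interrupt) and skip it.
	 */
	return atomic_compare_exchange_strong(&v->mode, &expected,
					      EXITING_GUEST_MODE);
}
```

With only two states, every kicker that observed guest mode would have to send its own IPI; the intermediate state records that one is already pending.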

KVM: x86: Remove user space triggerable MCE error message · 9ca52318

Committed by Jan Kiszka on Jan 15, 2011

This case is a pure user space error we do not need to record. Moreover,
it can be misused to flood the kernel log. Remove it.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: fix rcu usage warning in kvm_arch_vcpu_ioctl_set_sregs() · 63f42e02

Committed by Xiao Guangrong on Jan 12, 2011

Fix:

[ 1001.499596] ===================================================
[ 1001.499599] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 1001.499601] ---------------------------------------------------
[ 1001.499604] include/linux/kvm_host.h:301 invoked rcu_dereference_check() without protection!
	......
[ 1001.499636] Pid: 6035, comm: qemu-system-x86 Not tainted 2.6.37-rc6+ #62
[ 1001.499638] Call Trace:
[ 1001.499644]  [] lockdep_rcu_dereference+0x9d/0xa5
[ 1001.499653]  [] gfn_to_memslot+0x8d/0xc8 [kvm]
[ 1001.499661]  [] gfn_to_hva+0x16/0x3f [kvm]
[ 1001.499669]  [] kvm_read_guest_page+0x1e/0x5e [kvm]
[ 1001.499681]  [] kvm_read_guest_page_mmu+0x53/0x5e [kvm]
[ 1001.499699]  [] load_pdptrs+0x3f/0x9c [kvm]
[ 1001.499705]  [] ? vmx_set_cr0+0x507/0x517 [kvm_intel]
[ 1001.499717]  [] kvm_arch_vcpu_ioctl_set_sregs+0x1f3/0x3c0 [kvm]
[ 1001.499727]  [] kvm_vcpu_ioctl+0x6a5/0xbc5 [kvm]
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Avoid atomic operation in vmx_vcpu_run · 40712fae

Committed by Avi Kivity on Jan 06, 2011

Instead of exchanging the guest and host rcx, have separate storage
for each.  This allows us to avoid using the xchg instruction, which
is a little slower than normal operations.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Simplify saving guest rcx in vmx_vcpu_run · 1c696d0e

Committed by Avi Kivity on Jan 06, 2011

Change

  push top-of-stack
  pop guest-rcx
  pop dummy

to

  pop guest-rcx

which is the same thing, only simpler.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: increase ple_gap default to 128 · 00c25bce

Committed by Rik van Riel on Jan 04, 2011

On some CPUs, a ple_gap of 41 is simply insufficient to ever trigger
PLE exits, even with the minimalistic PLE test from kvm-unit-tests.

http://git.kernel.org/?p=virt/kvm/kvm-unit-tests.git;a=commitdiff;h=eda71b28fa122203e316483b35f37aaacd42f545

For example, the Xeon X5670 CPU needs a ple_gap of at least 48 in
order to get pause loop exits:

# modprobe kvm_intel ple_gap=47
# taskset 1 /usr/local/bin/qemu-system-x86_64 \
  -device testdev,chardev=log -chardev stdio,id=log \
  -kernel x86/vmexit.flat -append ple-round-robin -smp 2
VNC server running on `::1:5900'
enabling apic
enabling apic
ple-round-robin 58298446
# rmmod kvm_intel
# modprobe kvm_intel ple_gap=48
# taskset 1 /usr/local/bin/qemu-system-x86_64 \
   -device testdev,chardev=log -chardev stdio,id=log \
   -kernel x86/vmexit.flat -append ple-round-robin -smp 2
VNC server running on `::1:5900'
enabling apic
enabling apic
ple-round-robin 36616

Increase the ple_gap to 128 to be on the safe side.
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Zhai, Edwin <edwin.zhai@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Add support for perf-kvm · 3781c01c

Committed by Joerg Roedel on Jan 14, 2011

This patch adds the necessary code to run perf-kvm on AMD
machines.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Avoid leaking fake realmode state to userspace · a9179499

Committed by Avi Kivity on Jan 03, 2011

When emulating real mode, we fake some state:

 - tr.base points to a fake vm86 tss
 - segment registers are made to conform to vm86 restrictions

Change vmx_get_segment() not to expose this fake state to userspace;
instead, return the original state.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Save and restore tr selector across mode switches · d0ba64f9

Committed by Avi Kivity on Jan 03, 2011

When emulating real mode we play with tr hidden state, but leave
tr.selector alone.  That works well, except for save/restore, since
loading TR writes it to the hidden state in vmx->rmode.

Fix by also saving and restoring the tr selector; this makes things
more consistent and allows migration to work during the early
boot stages of Windows XP.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: Don't flush shadow when enabling dirty tracking · 8234b22e

Committed by Avi Kivity on Dec 27, 2010

Instead, drop large mappings, which were the reason we dropped shadow.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

10 Mar, 2011 1 commit

tracing: Fix event alignment: kvm:kvm_hv_hypercall · d5bf2ff0

Committed by David Sharp on Dec 03, 2010

Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: David Sharp <dhsharp@google.com>
LKML-Reference: <1291421609-14665-8-git-send-email-dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

22 Feb, 2011 1 commit

KVM: SVM: Advance instruction pointer in dr_intercept · 2c46d2ae

Committed by Joerg Roedel on Feb 09, 2011

In the dr_intercept function a new cpu-feature called
decode-assists is implemented and used when available. This
code-path does not advance the guest-rip causing the guest
to dead-loop over mov-dr instructions. This is fixed by this
patch.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

10 Feb, 2011 1 commit

KVM: SVM: Make sure KERNEL_GS_BASE is valid when loading gs_index · 893a5ab6

Committed by Joerg Roedel on Jan 14, 2011

The gs_index loading code uses the swapgs instruction to
switch to the user gs_base temporarily. This is unsafe in a
lightweight exit-path in KVM on AMD because the
KERNEL_GS_BASE MSR is switched lazily. An NMI happening in
the critical path of load_gs_index may then use the wrong GS_BASE
value, leading to unpredictable behavior, e.g. a
triple-fault.

This patch fixes the issue by making sure that load_gs_index
is called only with a valid KERNEL_GS_BASE value loaded in
KVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

14 Jan, 2011 2 commits

thp: mmu_notifier_test_young · 8ee53820

Committed by Andrea Arcangeli on Jan 13, 2011

For GRU and EPT, we need gup-fast to set referenced bit too (this is why
it's correct to return 0 when shadow_access_mask is zero, it requires
gup-fast to set the referenced bit).  qemu-kvm access already sets the
young bit in the pte if it isn't zero-copy, if it's zero copy or a shadow
paging EPT minor fault we rely on gup-fast to signal the page is in
use...

We also need to check the young bits on the secondary pagetables for NPT
and not nested shadow mmu as the data may never get accessed again by the
primary pte.

Without this closer accuracy, we'd have to remove the heuristic that
avoids collapsing hugepages in hugepage virtual regions that have not even
a single subpage in use.

->test_young is fully backwards compatible with GRU and other usages that
don't have young bits in pagetables set by the hardware and that should
nuke the secondary mmu mappings when ->clear_flush_young runs just like
EPT does.

Removing the heuristic that checks the young bit in
khugepaged/collapse_huge_page completely isn't so bad either probably but
I thought it was worth it and this makes it reliable.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

thp: kvm mmu transparent hugepage support · 936a5fe6

Committed by Andrea Arcangeli on Jan 13, 2011

This should work for both hugetlbfs and transparent hugepages.

[akpm@linux-foundation.org: bring forward PageTransCompound() addition for bisectability]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 Jan, 2011 6 commits

KVM: Initialize fpu state in preemptible context · e5c30142

Committed by Avi Kivity on Jan 11, 2011

init_fpu() (which is indirectly called by the fpu switching code) assumes
it is in process context.  Rather than making init_fpu() use an atomic
allocation, which can cause a task to be killed, make sure the fpu is
already initialized when we enter the run loop.

KVM-Stable-Tag.
Reported-and-tested-by: Kirill A. Shutemov <kas@openvz.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: when entering real mode align segment base to 16 bytes · 444e863d

Committed by Gleb Natapov on Dec 27, 2010

VMX checks that the segment base equals the segment selector shifted left by
4 bits; otherwise guest entry fails.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
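
The constraint is simple arithmetic: in real mode the base must be exactly selector << 4, so its low four bits are zero. Below is one possible way to force that relationship, given as a sketch; the helper name `align_rmode_segment()` is hypothetical and the actual patch may round differently.

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive a real-mode selector from an arbitrary base,
 * then recompute the base so that base == selector << 4 holds exactly.
 */
static void align_rmode_segment(uint32_t base_in,
				uint16_t *selector, uint32_t *base)
{
	*selector = (uint16_t)(base_in >> 4);
	*base = (uint32_t)*selector << 4;   /* low 4 bits are now zero */
}
```

For example, an input base of 0x12345 yields selector 0x1234 and base 0x12340, which satisfies the check.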

KVM: MMU: handle 'map_writable' in set_spte() function · f8e453b0

Committed by Xiao Guangrong on Dec 23, 2010

Move the operation of 'writable' to set_spte() to clean up code

[avi: remove unneeded booleanification]
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: audit: allow audit more guests at the same time · b034cf01

Committed by Xiao Guangrong on Dec 23, 2010

Currently only one guest in the system can be audited, since:
- 'audit_point' is a global variable
- mmu_audit_disable() is called in kvm_mmu_destroy(), so audit is disabled
  after a guest exits

This patch fixes those issues and allows auditing multiple guests at the same time.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Fetch guest cr3 from hardware on demand · aff48baa

Committed by Avi Kivity on Dec 05, 2010

Instead of syncing the guest cr3 every exit, which is expensive on vmx
with ept enabled, sync it only on demand.

[sheng: fix incorrect cr3 seen by Windows XP]
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Replace reads of vcpu->arch.cr3 by an accessor · 9f8fe504

Committed by Avi Kivity on Dec 05, 2010

This allows us to keep cr3 in the VMCS, later on.
Signed-off-by: Avi Kivity <avi@redhat.com>

OpenHarmony / kernel_linux last synced about 4 years ago