Commits · 1aa8ceef0312a6aae7dd863a120a55f1637b361d · openanolis / cloud-kernel

18 Mar, 2011 (40 commits)

KVM: fix kvmclock regression due to missing clock update · 1aa8ceef

Authored by Nikola Ciprich on Mar 09, 2011

Commit 387b9f97750444728962b236987fbe8ee8cc4f8c moved kvm_request_guest_time_update(vcpu),
breaking 32-bit SMP guests using kvm-clock. Fix this by moving the (new) clock update function
to the proper place.
Signed-off-by: Nikola Ciprich <nikola.ciprich@linuxbox.cz>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: emulator: Fix permission checking in io permission bitmap · 399a40c9

Authored by Gleb Natapov on Mar 07, 2011

Currently, if io port + len crosses an 8-bit boundary in the io permission bitmap, the
check may allow IO that should otherwise not be allowed. The patch fixes that.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
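
To illustrate the boundary problem this commit describes, here is a minimal user-space sketch in C. It is not the kernel patch: the function names, the flat `bitmap` array, and the one-byte variant shown for contrast are assumptions made for illustration. The point is that a port range starting near the end of one bitmap byte spills into the next byte, so the check has to look at two consecutive bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only. Returns true when every port in [port, port + len) is
 * permitted, i.e. all corresponding bits in the bitmap are 0.
 * `bitmap` must hold at least port / 8 + 2 bytes (hypothetical layout
 * modelled on the x86 TSS I/O permission bitmap).
 */
bool io_bitmap_allows(const uint8_t *bitmap, uint16_t port, unsigned len)
{
    /* Read two consecutive bytes so a range crossing a byte boundary is covered. */
    unsigned perm = bitmap[port / 8] | ((unsigned)bitmap[port / 8 + 1] << 8);
    unsigned bit_idx = port & 7u;
    unsigned mask = (1u << len) - 1u;   /* len is 1, 2 or 4 for port I/O */

    return ((perm >> bit_idx) & mask) == 0;
}

/* Contrast case (assumed shape of the bug): only one byte is inspected. */
bool io_bitmap_allows_one_byte(const uint8_t *bitmap, uint16_t port, unsigned len)
{
    unsigned perm = bitmap[port / 8];
    unsigned bit_idx = port & 7u;
    unsigned mask = (1u << len) - 1u;

    /* Bits belonging to the next byte are silently read as 0 (allowed). */
    return ((perm >> bit_idx) & mask) == 0;
}
```

With a bitmap in which only port 8 is denied, a 4-byte access starting at port 6 is rejected by the first function but accepted by the one-byte variant.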

KVM: emulator: Fix io permission checking for 64bit guest · 5601d05b

Authored by Gleb Natapov on Mar 07, 2011

The current implementation truncates the upper 32 bits of the TR base address during the IO
permission bitmap check. The patch fixes this.
Reported-and-tested-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Load %gs earlier if CONFIG_X86_32_LAZY_GS=n · 831ca609

Authored by Avi Kivity on Mar 08, 2011

With CONFIG_CC_STACKPROTECTOR, we need a valid %gs at all times, so disable
lazy reload and do an eager reload immediately after the vmexit.
Reported-by: IVAN ANGELOV <ivangotoy@gmail.com>
Acked-By: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Remove useless regs_page pointer from kvm_lapic · afc20184

Authored by Takuya Yoshikawa on Mar 05, 2011

Access to this page is mostly done through the regs member, which holds
the address of this page. The exceptions are in vmx_vcpu_reset() and
kvm_free_lapic(), and both can easily be converted to use regs.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: improve comment on rcu use in irqfd_deassign · c8ce057e

Authored by Michael S. Tsirkin on Mar 06, 2011

The RCU use in kvm_irqfd_deassign is tricky: we have rcu_assign_pointer
but no synchronize_rcu: synchronize_rcu is done by kvm_irq_routing_update
which we share a spinlock with.

Fix up a comment in an attempt to make this clearer.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: remove unused macros · 676646ee

Authored by Xiao Guangrong on Mar 04, 2011

These macros are not used, so remove them.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: cleanup page alloc and free · 842f22ed

Authored by Xiao Guangrong on Mar 04, 2011

Use __get_free_page instead of alloc_page and page_address, and
use free_page instead of __free_page and virt_to_page.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: do not record gfn in kvm_mmu_pte_write · 49b26e26

Authored by Xiao Guangrong on Mar 04, 2011

There is no need to record the gfn to verify that the pte has the same mode as
the current vcpu, because we only speculatively update the pte
if the pte and vcpu have the same mode.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: move mmu pages calculated out of mmu lock · 48c0e4e9

Authored by Xiao Guangrong on Mar 04, 2011

kvm_mmu_calculate_mmu_pages needs to walk all memslots, and that walk is protected by
kvm->slots_lock, so move it out of the mmu spinlock.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: set spte accessed bit properly · 1b7fd45c

Authored by Xiao Guangrong on Mar 04, 2011

Set the spte accessed bit only if guest_initiated == 1, which means the page
was really accessed.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: fix kvm_mmu_slot_remove_write_access dropping intermediate W bits · da8dc75f

Authored by Xiao Guangrong on Mar 04, 2011

Only remove write access in the last sptes.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Start lock documentation · 38a778aa

Authored by Jan Kiszka on Feb 09, 2011

The goal of this document shall be to
- give an overview of all locks used in KVM core
- provide details on the scope of each lock
- explain the lock type, specifically of raw spin locks
- provide a lock ordering guide
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: better readability of efer_reserved_bits · 1260edbe

Authored by Lai Jiangshan on Feb 21, 2011

Use EFER_SCE, EFER_LME and EFER_LMA instead of magic numbers.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
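
As a small illustration of why named bits read better than a literal, the sketch below rebuilds a reserved-bits mask from the three EFER flags the commit mentions. The bit positions are the architectural ones for the EFER MSR (SCE is bit 0, LME is bit 8, LMA is bit 10); the function name is hypothetical and the snippet is a user-space sketch, not the kernel change itself.

```c
#include <stdint.h>

/* Architectural bit positions in the x86 EFER MSR. */
#define EFER_SCE (1ULL << 0)    /* SYSCALL enable */
#define EFER_LME (1ULL << 8)    /* long mode enable */
#define EFER_LMA (1ULL << 10)   /* long mode active */

/* Hypothetical helper: every bit except the three named ones is reserved. */
uint64_t efer_reserved_bits_named(void)
{
    return ~(EFER_SCE | EFER_LME | EFER_LMA);
}
```

The expression evaluates to 0xfffffffffffffafe, which is the kind of opaque literal the named form replaces.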

KVM: Clear async page fault hash after switching to real mode · d170c419

Authored by Lai Jiangshan on Feb 21, 2011

The hash array of async gfns may still contain some leftover gfns after
kvm_clear_async_pf_completion_queue() is called; they need to be cleared.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Initialize vm86 TSS only once. · 93ea5388

Authored by Gleb Natapov on Feb 21, 2011

Currently the vm86 task is initialized on each real mode entry and vcpu
reset. Initialization is done by zeroing the TSS and updating the relevant
fields. But since all vcpus use the same TSS, there is a race where
one vcpu may use the TSS while another vcpu is initializing it, so the vcpu
that uses the TSS will see wrong TSS content and behave incorrectly.
Fix that by initializing the TSS only once.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: update live TR selector if it changes in real mode · a8ba6c26

Authored by Gleb Natapov on Feb 21, 2011

When rmode.vm86 is active, the TR descriptor is updated with vm86 task values,
but the selector is left intact. vmx_set_segment() makes sure that if the TR
register is written while vm86 is active, the new values are saved
for use after vm86 is deactivated, but since the selector is not updated on
vm86 activation/deactivation, the new value is lost. Fix this by writing the new
selector into the vmcs immediately.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: add the __noclone attribute to vmx_vcpu_run · a3b5ba49

Authored by Lai Jiangshan on Feb 11, 2011

The changelog of 104f226b said "adds the __noclone attribute",
but it was missing in its patch. I think it is still needed.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Convert tsc_write_lock to raw_spinlock · 038f8c11

Authored by Jan Kiszka on Feb 04, 2011

Code under this lock requires non-preemptibility. Ensure this also over -rt
by converting it to raw spinlock.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: remove isr_ack logic from PIC · 7049467b

Authored by Gleb Natapov on Feb 09, 2011

isr_ack logic was added by e4825800 to avoid unnecessary IPIs. Back
then it made sense, but now the code checks that the vcpu is ready to accept
an interrupt before sending an IPI, so this logic is no longer needed. The
patch removes it.

Fixes a regression with Debian/Hurd.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Reported-and-tested-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: fix detection of BIOS disabling VMX · 23f3e991

Authored by Joseph Cihula on Feb 08, 2011

This patch fixes the logic used to detect whether BIOS has disabled VMX, for
the case where VMX is enabled only under SMX, but tboot is not active.
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Convert kvm_lock to raw_spinlock · e935b837

Authored by Jan Kiszka on Feb 08, 2011

Code under this lock requires non-preemptibility. Ensure this also over -rt
by converting it to raw spinlock.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: check for progress after IRET interception · bd3d1ec3

Authored by Avi Kivity on Feb 03, 2011

When we enable an NMI window, we ask for an IRET intercept, since
the IRET re-enables NMIs. However, the IRET intercept happens before
the instruction executes, while the NMI window architecturally opens
afterwards.

To compensate for this mismatch, we only open the NMI window in the
following exit, assuming that the IRET has by then executed; however,
this assumption is not always correct; we may exit due to a host interrupt
or page fault, without having executed the instruction.

Fix by checking for forward progress by recording and comparing the IRET's
rip. This is somewhat of a hack, since an unchanging rip does not mean that
no forward progress has been made, but is the simplest fix for now.
Signed-off-by: Avi Kivity <avi@redhat.com>
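
The forward-progress check described above can be sketched in a few lines of C. This is a user-space model under stated assumptions, not the SVM code: the struct, its field names, and both function names are hypothetical. It records the guest rip at the IRET intercept and only treats NMIs as unmasked on a later exit whose rip differs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vcpu bookkeeping; field names are illustrative. */
struct nmi_window_state {
    bool     iret_seen;    /* an IRET intercept fired while NMIs were masked */
    uint64_t iret_rip;     /* guest rip recorded at that intercept */
    bool     nmi_masked;   /* NMIs still blocked from the guest's view */
};

/* IRET intercept: the instruction has not executed yet, so only record rip. */
void on_iret_intercept(struct nmi_window_state *s, uint64_t guest_rip)
{
    s->iret_seen = true;
    s->iret_rip = guest_rip;
}

/* Any later exit: open the NMI window only if rip moved past the IRET. */
bool on_later_exit(struct nmi_window_state *s, uint64_t guest_rip)
{
    if (s->iret_seen && guest_rip != s->iret_rip) {
        s->iret_seen = false;
        s->nmi_masked = false;
        return true;
    }
    return false;   /* e.g. host interrupt before the IRET ran */
}
```

As the commit message notes, comparing rip is a heuristic: an IRET that returns to its own address would look like no progress in this model as well.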

KVM: Fix race between nmi injection and enabling nmi window · f8636849

Authored by Avi Kivity on Feb 03, 2011

The interrupt injection logic looks something like

  if an nmi is pending, and nmi injection allowed
    inject nmi
  if an nmi is pending
    request exit on nmi window

The problem is that "nmi is pending" can be set asynchronously by
the PIT; if it happens to fire between the two if statements, we
will request an nmi window even though nmi injection is allowed.  On
SVM, this has disastrous results, since it causes eflags.TF to be
set in random guest code.

The fix is simple; make nmi_pending synchronous using the standard
vcpu->requests mechanism; this ensures the code above is completely
synchronous wrt nmi_pending.
Signed-off-by: Avi Kivity <avi@redhat.com>
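
The idea of funnelling an asynchronous event through a request bit can be sketched with C11 atomics. Everything here is an assumption for illustration: the struct, the `REQ_NMI` bit, and the function names are hypothetical, and clearing `nmi_pending` after injection is my reading of the pseudocode rather than something the commit message states. The asynchronous side only sets a bit; the vcpu thread samples it once and then runs both `if` statements against a value nobody else can change.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_NMI (1u << 0)   /* hypothetical request bit */

struct vcpu_sketch {
    atomic_uint requests;        /* may be set from any thread, e.g. a timer */
    bool nmi_pending;            /* only touched by the vcpu thread */
    bool nmi_allowed;
    bool nmi_injected;
    bool nmi_window_requested;
};

/* Asynchronous side: only raises a request bit. */
void queue_nmi(struct vcpu_sketch *v)
{
    atomic_fetch_or(&v->requests, REQ_NMI);
}

/* vcpu thread: fold the request into nmi_pending once, then decide. */
void inject_pending(struct vcpu_sketch *v)
{
    if (atomic_fetch_and(&v->requests, ~REQ_NMI) & REQ_NMI)
        v->nmi_pending = true;

    /* From here on nmi_pending cannot change underneath us. */
    if (v->nmi_pending && v->nmi_allowed) {
        v->nmi_injected = true;
        v->nmi_pending = false;
    }
    if (v->nmi_pending)
        v->nmi_window_requested = true;
}
```

An NMI queued after the sample point simply stays in `requests` and is picked up on the next pass, so the two checks can no longer disagree.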

KVM: use yield_to instead of sleep in kvm_vcpu_on_spin · 217ece61

Authored by Rik van Riel on Feb 01, 2011

Instead of sleeping in kvm_vcpu_on_spin, which can cause gigantic
slowdowns of certain workloads, we instead use yield_to to get
another VCPU in the same KVM guest to run sooner.

This seems to give a 10-15% speedup in certain workloads.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: keep track of which task is running a KVM vcpu · 34bb10b7

Authored by Rik van Riel on Feb 01, 2011

Keep track of which task is running a KVM vcpu.  This helps us
figure out later what task to wake up if we want to boost a
vcpu that got preempted.

Unfortunately there are no guarantees that the same task
always keeps the same vcpu, so we can only track the task
across a single "run" of the vcpu.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

export pid symbols needed for kvm_vcpu_on_spin · 77c100c8

Authored by Rik van Riel on Feb 01, 2011

Export the symbols required for a race-free kvm_vcpu_on_spin.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Drop ad-hoc vendor specific instruction restriction · 4005996e

Authored by Avi Kivity on Feb 01, 2011

Use the new support in the emulator, and drop the ad-hoc code in x86.c.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: vendor specific instructions · d867162c

Authored by Avi Kivity on Feb 01, 2011

Mark some instructions as vendor specific, and allow the caller to request
emulation only of vendor specific instructions.  This is useful in some
circumstances (responding to a #UD fault).
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Drop bogus x86_decode_insn() error check · 3e909439

Authored by Avi Kivity on Feb 01, 2011

x86_decode_insn() doesn't return X86EMUL_* values, so the check
for X86EMUL_PROPOGATE_FAULT will always fail.  There is a proper
check later on, so there is no need for a replacement for this
code.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Drop obsolete warning about INIT on runnable VCPU · 0bb88659

Authored by Jan Kiszka on Feb 01, 2011

This warning was once used for debugging QEMU user space. Though
uncommon, it is actually possible to send an INIT request to a running
VCPU. So better drop this warning before someone misuses it to flood
kernel logs this way.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: release kvmclock page on reset · 12f9a48f

Authored by Glauber Costa on Feb 01, 2011

When a vcpu is reset, the kvmclock page keeps being written to these days.
This is wrong and inconsistent: a cpu reset should take it to its
initial state.
Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

mm: remove is_hwpoison_address · f58c9df7

Authored by Huang Ying on Jan 30, 2011

Unused.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Replace is_hwpoison_address with __get_user_pages · fafc3dba

Authored by Huang Ying on Jan 30, 2011

is_hwpoison_address only checks whether the page table entry is
hwpoisoned, regardless of the memory page mapped, while __get_user_pages
checks both.

QEMU will clear the poisoned page table entry (via unmap/map) to make
it possible to allocate a new memory page for the virtual address
across guest rebooting.  But it is also possible that the underlying
memory page is kept poisoned even after the corresponding page table
entry is cleared, that is, a new memory page cannot be allocated.
__get_user_pages can catch these situations.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

mm: make __get_user_pages return -EHWPOISON for HWPOISON page optionally · 69ebb83e

Authored by Huang Ying on Jan 30, 2011

Make __get_user_pages return -EHWPOISON for a HWPOISON page only if
FOLL_HWPOISON is specified.  With this patch, the interested callers
can distinguish HWPOISON pages from general FAULT pages, while other
callers will still get -EFAULT for all these pages, so the user space
interface need not be changed.

This feature is needed by KVM, where UCR MCE should be relayed to
the guest for a HWPOISON page, while instruction emulation and MMIO will be
tried for a general FAULT page.

The idea comes from Andrew Morton.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
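
The opt-in behaviour described in this commit can be modelled with a tiny decision function. This is a user-space sketch, not `__get_user_pages`: the function name and the `SKETCH_` constants are hypothetical, and their numeric values are assumptions chosen for the example rather than values quoted from kernel headers.

```c
#include <stdbool.h>

/* Illustrative constants; the values are assumptions for this sketch. */
#define SKETCH_FOLL_HWPOISON 0x100u
#define SKETCH_EFAULT        14
#define SKETCH_EHWPOISON     133

/*
 * Error a caller would see for a page that cannot be returned.
 * Only callers that pass the opt-in flag learn that the page is poisoned;
 * everyone else keeps getting the generic fault error.
 */
int fault_error_for(bool page_is_hwpoisoned, unsigned foll_flags)
{
    if (page_is_hwpoisoned && (foll_flags & SKETCH_FOLL_HWPOISON))
        return -SKETCH_EHWPOISON;
    return -SKETCH_EFAULT;
}
```

A caller such as KVM would pass the flag and branch on the distinct error, while existing callers see no change in behaviour.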

mm: export __get_user_pages · 0014bd99

Authored by Huang Ying on Jan 30, 2011

In most cases, get_user_pages and get_user_pages_fast should be used
to pin user pages in memory.  But sometimes special flags other than
FOLL_GET, FOLL_WRITE and FOLL_FORCE are needed; for example, in the
following patch, KVM needs FOLL_HWPOISON.  To support these users,
__get_user_pages is exported directly.

There are some symbol name conflicts in the infiniband driver; fix them too.
Signed-off-by: Huang Ying <ying.huang@intel.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Michel Lespinasse <walken@google.com>
CC: Roland Dreier <roland@kernel.org>
CC: Ralph Campbell <infinipath@qlogic.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: handle guest access to BBL_CR_CTL3 MSR · 91c9c3ed

Authored by john cooper on Jan 21, 2011

A correction to Intel cpu model CPUID data (patch queued)
caused winxp to BSOD when booted with a Penryn model.
This was traced to the CPUID "model" field correction from
6 -> 23 (as is proper for a Penryn class of cpu).  Only in
this case does the problem surface.

The cause of this failure is winxp accessing the BBL_CR_CTL3
MSR, which is unsupported by current kvm, appears to be a
legacy MSR not fully characterized yet existing in current
silicon, and is apparently carried forward in MSR space to
accommodate vintage code as here.  It is not yet conclusive
whether this MSR implements any of its legacy functionality
or is just an ornamental dud for compatibility.  While I
found no silicon version specific documentation link to
this MSR, a general description exists in Intel's developer's
reference which agrees with the functional behavior of
other bootloader/kernel code I've examined accessing
BBL_CR_CTL3.  Regrettably winxp appears to be setting bit #19,
called out as "reserved" in the above document.

So to minimally accommodate this MSR, kvm msr get will provide
the equivalent mock data and kvm msr write will simply toss the
guest passed data without interpretation.  While this treatment
of BBL_CR_CTL3 addresses the immediate problem, the approach may
be modified pending clarification from Intel.
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
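
The "mock read, discard write" treatment described above might look roughly like the following user-space sketch. The function names are hypothetical, the MSR index is an assumption on my part, and the mock value is a deliberate placeholder because the listing does not show the data the patch returns.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MSR_BBL_CR_CTL3   0x11eu   /* assumed MSR index */
#define SKETCH_BBL_CR_CTL3_MOCK  0x0ULL   /* placeholder, not the patch's value */

/* Read path: hand back fixed mock data. Returns false for unknown MSRs. */
bool sketch_get_msr(uint32_t msr, uint64_t *data)
{
    switch (msr) {
    case SKETCH_MSR_BBL_CR_CTL3:
        *data = SKETCH_BBL_CR_CTL3_MOCK;
        return true;
    default:
        return false;
    }
}

/* Write path: accept and discard whatever the guest passes. */
bool sketch_set_msr(uint32_t msr, uint64_t data)
{
    switch (msr) {
    case SKETCH_MSR_BBL_CR_CTL3:
        (void)data;   /* tossed without interpretation, reserved bits included */
        return true;
    default:
        return false;
    }
}
```

In this model a guest write that sets bit 19 succeeds silently, and a later read still returns the mock value.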

KVM: make make_all_cpus_request() lockless · 3cba4130

Authored by Xiao Guangrong on Jan 12, 2011

Now we have 'vcpu->mode' to judge whether we need to send an IPI to other
cpus. This is precise, so checking the request bit is needless,
and we can drop the spinlock and make the function lockless.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Add "exiting guest mode" state · 6b7e2d09

Authored by Xiao Guangrong on Jan 12, 2011

Currently we keep track of only two states: guest mode and host
mode.  This patch adds an "exiting guest mode" state that tells
us that an IPI will happen soon, so unless we need to wait for the
IPI, we can avoid it completely.

Also
1: No need to read/write ->mode atomically in the vcpu's thread

2: reorganize struct kvm_vcpu to make ->mode and ->requests
   in the same cache line explicitly
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
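
A three-state mode with a compare-and-exchange transition is one way to picture the "exiting guest mode" idea. The sketch below uses C11 atomics in user space; the enum names and the function name are assumptions for illustration and should not be read as the patch's exact identifiers. Only the first kicker that observes the vcpu in guest mode gets to flip the state, so only that kicker needs to send an IPI.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative state names for the three modes. */
enum vcpu_mode { OUTSIDE_GUEST_MODE, IN_GUEST_MODE, EXITING_GUEST_MODE };

/* Returns true if the caller should send an IPI to kick the vcpu. */
bool kick_needs_ipi(atomic_int *mode)
{
    int expected = IN_GUEST_MODE;

    /*
     * Succeeds only for the first kicker that sees IN_GUEST_MODE.
     * A vcpu already exiting, or not in guest mode at all, needs no IPI.
     */
    return atomic_compare_exchange_strong(mode, &expected, EXITING_GUEST_MODE);
}
```

A second kick arriving while the vcpu is still exiting returns false, which is the IPI the new state lets the caller skip.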

KVM: fix build warning within __kvm_set_memory_region() on s390 · d48ead8b

Authored by Heiko Carstens on Jan 17, 2011

Get rid of this warning:

  CC      arch/s390/kvm/../../../virt/kvm/kvm_main.o
arch/s390/kvm/../../../virt/kvm/kvm_main.c:596:12: warning: 'kvm_create_dirty_bitmap' defined but not used

The only caller of the function is within a !CONFIG_S390 section, so add the
same ifdef around kvm_create_dirty_bitmap() as well.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
