Commits · 893a5ab6ee7d51b231ed45aa844f8088642cb6bf · openeuler / Kernel

10 Feb, 2011 1 commit

KVM: SVM: Make sure KERNEL_GS_BASE is valid when loading gs_index · 893a5ab6

Committed by Joerg Roedel on Jan 14, 2011

The gs_index loading code uses the swapgs instruction to
switch to the user gs_base temporarily. This is unsafe in a
lightweight exit-path in KVM on AMD because the
KERNEL_GS_BASE MSR is switched lazily. An NMI happening in
the critical path of load_gs_index may then use the wrong GS_BASE
value, leading to unpredictable behavior, e.g. a
triple-fault.

This patch fixes the issue by making sure that load_gs_index
is called only with a valid KERNEL_GS_BASE value loaded in
KVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

893a5ab6

14 Jan, 2011 2 commits

thp: mmu_notifier_test_young · 8ee53820

Committed by Andrea Arcangeli on Jan 13, 2011

For GRU and EPT, we need gup-fast to set referenced bit too (this is why
it's correct to return 0 when shadow_access_mask is zero, it requires
gup-fast to set the referenced bit).  qemu-kvm access already sets the
young bit in the pte if it isn't zero-copy, if it's zero copy or a shadow
paging EPT minor fault we rely on gup-fast to signal the page is in
use...

We also need to check the young bits on the secondary pagetables for NPT
and not nested shadow mmu as the data may never get accessed again by the
primary pte.

Without this closer accuracy, we'd have to remove the heuristic that
avoids collapsing hugepages in hugepage virtual regions that have not even
a single subpage in use.

->test_young is fully backwards compatible with GRU and other usages that
don't have young bits in pagetables set by the hardware and that should
nuke the secondary mmu mappings when ->clear_flush_young runs just like
EPT does.

Removing the heuristic that checks the young bit in
khugepaged/collapse_huge_page completely isn't so bad either probably but
I thought it was worth it and this makes it reliable.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8ee53820

thp: kvm mmu transparent hugepage support · 936a5fe6

Committed by Andrea Arcangeli on Jan 13, 2011

This should work for both hugetlbfs and transparent hugepages.

[akpm@linux-foundation.org: bring forward PageTransCompound() addition for bisectability]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

936a5fe6

12 Jan, 2011 37 commits

KVM: Initialize fpu state in preemptible context · e5c30142

Committed by Avi Kivity on Jan 11, 2011

init_fpu() (which is indirectly called by the fpu switching code) assumes
it is in process context.  Rather than making init_fpu() use an atomic
allocation, which can cause a task to be killed, make sure the fpu is
already initialized when we enter the run loop.

KVM-Stable-Tag.
Reported-and-tested-by: Kirill A. Shutemov <kas@openvz.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e5c30142

KVM: VMX: when entering real mode align segment base to 16 bytes · 444e863d

Committed by Gleb Natapov on Dec 27, 2010

VMX checks that the segment base equals the segment selector shifted left
by 4 bits. Otherwise guest entry fails.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

444e863d
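
The real-mode rule this commit relies on can be illustrated with a minimal sketch. The helper names below are hypothetical and are not taken from the KVM sources; the sketch only shows the arithmetic (base = selector << 4) and why a base must be 16-byte aligned before a selector can represent it.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode rule: a segment's base is its 16-bit selector shifted left by 4. */
static uint32_t rmode_base_from_selector(uint16_t selector)
{
    return (uint32_t)selector << 4;
}

/* Round a base down to a 16-byte boundary so that a selector can represent it.
 * Hypothetical helper; the actual KVM change may handle this differently. */
static uint32_t rmode_align_base(uint32_t base)
{
    return base & ~UINT32_C(0xf);
}

/* Derive the selector for a base; the base must be aligned and below 1 MiB. */
static uint16_t rmode_selector_from_base(uint32_t base)
{
    assert((base & 0xf) == 0 && base < (UINT32_C(1) << 20));
    return (uint16_t)(base >> 4);
}
```

With an aligned base, converting to a selector and back is lossless, which is the consistency that the VMX guest-entry check expects.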

KVM: MMU: handle 'map_writable' in set_spte() function · f8e453b0

Committed by Xiao Guangrong on Dec 23, 2010

Move the operation of 'writable' to set_spte() to clean up the code.

[avi: remove unneeded booleanification]
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

f8e453b0

KVM: MMU: audit: allow audit more guests at the same time · b034cf01

Committed by Xiao Guangrong on Dec 23, 2010

Currently only one guest in the system can be audited, because:
- 'audit_point' is a global variable
- mmu_audit_disable() is called in kvm_mmu_destroy(), so audit is disabled
  after a guest exits

This patch fixes those issues and allows auditing multiple guests at the same time.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

b034cf01

KVM: Fetch guest cr3 from hardware on demand · aff48baa

Committed by Avi Kivity on Dec 05, 2010

Instead of syncing the guest cr3 on every exit, which is expensive on vmx
with ept enabled, sync it only on demand.

[sheng: fix incorrect cr3 seen by Windows XP]
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

aff48baa
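
The on-demand idea can be sketched as a cached accessor. Everything below is a hypothetical stand-in (the struct, field, and function names are assumptions made for illustration, not the kernel's struct kvm_vcpu or its register cache); it only shows the pattern of invalidating on exit and reading the hardware copy lazily.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vcpu; not the kernel's struct kvm_vcpu. */
struct demo_vcpu {
    uint64_t hw_cr3;      /* value held by "hardware", e.g. in the VMCS */
    uint64_t cached_cr3;  /* software copy, valid only if cr3_valid is set */
    bool cr3_valid;
    unsigned hw_reads;    /* counts the expensive hardware reads */
};

/* On guest exit, only invalidate the cached copy instead of syncing cr3. */
static void demo_on_exit(struct demo_vcpu *v)
{
    v->cr3_valid = false;
}

/* Accessor: read from hardware only when the cached copy is stale. */
static uint64_t demo_read_cr3(struct demo_vcpu *v)
{
    assert(v != NULL);
    if (!v->cr3_valid) {
        v->cached_cr3 = v->hw_cr3;
        v->cr3_valid = true;
        v->hw_reads++;
    }
    return v->cached_cr3;
}
```

Exits that never look at cr3 pay nothing, and repeated reads after one exit cost a single hardware access. Routing every read through an accessor is what makes this kind of caching possible.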

KVM: Replace reads of vcpu->arch.cr3 by an accessor · 9f8fe504

Committed by Avi Kivity on Dec 05, 2010

This allows us to keep cr3 in the VMCS, later on.
Signed-off-by: Avi Kivity <avi@redhat.com>

9f8fe504

KVM: MMU: only write protect mappings at pagetable level · e49146dc

Committed by Marcelo Tosatti on Dec 22, 2010

If a pagetable contains a writeable large spte, all of its sptes will be
write protected, including non-leaf ones, leading to endless pagefaults.

Do not write protect pages above PT_PAGE_TABLE_LEVEL, as the spte fault
paths assume non-leaf sptes are writable.
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e49146dc

KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear() · 16d8f72f

Committed by Avi Kivity on Dec 21, 2010

'error' is byte sized, so use a byte register constraint.
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

16d8f72f

KVM: MMU: Initialize base_role for tdp mmus · c445f8ef

Committed by Avi Kivity on Dec 21, 2010

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

c445f8ef

KVM: VMX: Optimize atomic EFER load · 110312c8

Committed by Avi Kivity on Dec 21, 2010

When NX is enabled on the host but not on the guest, we use the entry/exit
msr load facility, which is slow.  Optimize it to use entry/exit efer load,
which is ~1200 cycles faster.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

110312c8

KVM: SVM: copy instruction bytes from VMCB · dc25e89e

Committed by Andre Przywara on Dec 21, 2010

In case of a nested page fault or an intercepted #PF, newer SVM
implementations provide a copy of the faulting instruction bytes
in the VMCB.
Use these bytes to feed the instruction emulator and avoid the costly
guest instruction fetch in this case.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

dc25e89e

KVM: SVM: implement enhanced INVLPG intercept · df4f3108

Committed by Andre Przywara on Dec 21, 2010

When the DecodeAssist feature is available, the linear address
is provided in the VMCB on INVLPG intercepts. Use it directly to
avoid any decoding and emulation.
This is only useful for shadow paging, though.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

df4f3108

KVM: SVM: enhance mov DR intercept handler · cae3797a

Committed by Andre Przywara on Dec 21, 2010

Newer SVM implementations provide the GPR number in the VMCB, so
that the emulation path is no longer necessary to handle debug
register access intercepts. Implement the handling in svm.c and
use it when the info is provided.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

cae3797a

KVM: SVM: enhance MOV CR intercept handler · 7ff76d58

Committed by Andre Przywara on Dec 21, 2010

Newer SVM implementations provide the GPR number in the VMCB, so
that the emulation path is no longer necessary to handle CR
register access intercepts. Implement the handling in svm.c and
use it when the info is provided.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

7ff76d58

KVM: SVM: add new SVM feature bit names · ddce97aa

Committed by Andre Przywara on Dec 21, 2010

The recent APM Vol.2 and the recent AMD CPUID specification describe
new CPUID feature bits for SVM. Name them here for later usage.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ddce97aa

KVM: cleanup emulate_instruction · 51d8b661

Committed by Andre Przywara on Dec 21, 2010

emulate_instruction had many callers, but only one used all
parameters. One parameter was unused, another one is now
hidden by a wrapper function (required for a future addition
anyway), so most callers now use a shorter parameter list.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

51d8b661

KVM: move complete_insn_gp() into x86.c · db8fcefa

Committed by Andre Przywara on Dec 21, 2010

Move the complete_insn_gp() helper function out of the VMX part
into the generic x86 part to make it usable by SVM.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

db8fcefa

KVM: x86: fix CR8 handling · eea1cff9

Committed by Andre Przywara on Dec 21, 2010

The handling of CR8 writes in KVM is currently somewhat cumbersome.
This patch makes it look like the other CR register handlers
and fixes a possible issue in VMX, where the RIP would be incremented
despite an injected #GP.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

eea1cff9

KVM: Take missing slots_lock for kvm_io_bus_unregister_dev() · 175504cd

Committed by Takuya Yoshikawa on Dec 16, 2010

In KVM_CREATE_IRQCHIP, kvm_io_bus_unregister_dev() is called without taking
slots_lock in the error handling path.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

175504cd

KVM: return true when user space query KVM_CAP_USER_NMI extension · a355c85c

Committed by Lai Jiangshan on Dec 14, 2010

Userspace may check this extension at runtime.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

a355c85c

KVM: Correct kvm_pio tracepoint count field · 61cfab2e

Committed by Avi Kivity on Dec 13, 2010

Currently, we record '1' for count regardless of the real count.  Fix.
Signed-off-by: Avi Kivity <avi@redhat.com>

61cfab2e

KVM: MMU: Fix incorrect direct page write protection due to ro host page · d3c422bd

Committed by Avi Kivity on Dec 12, 2010

If KVM sees a read-only host page, it will map it as read-only to prevent
breaking a COW. However, if the page was part of a large guest page, KVM
incorrectly extends the write protection to the entire large page frame
instead of limiting it to the normal host page.

This results in the instantiation of a new shadow page with read-only access.

If this happens for a MOVS instruction that moves memory between two normal
pages, within a single large page frame, and mapped within the guest as a
large page, and if, in addition, the source operand is not writeable in the
host (perhaps due to KSM), then KVM will instantiate a read-only direct
shadow page, instantiate an spte for the source operand, then instantiate
a new read/write direct shadow page and instantiate an spte for the
destination operand. Since these two sptes are in different shadow pages,
MOVS will never see them at the same time and the guest will not make
progress.

Fix by mapping the direct shadow page read/write, and only marking the
host page read-only.
Signed-off-by: Avi Kivity <avi@redhat.com>

d3c422bd

KVM: SVM: Add xsetbv intercept · 81dd35d4

Committed by Joerg Roedel on Dec 07, 2010

This patch implements the xsetbv intercept for the AMD part
of KVM. This makes AVX usable in a safe way for the guest on
AVX capable AMD hardware.

The patch is tested by using AVX in the guest and host in
parallel and checking for data corruption. I also used the
KVM xsave unit-tests and they all pass.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

81dd35d4

KVM: MMU: Make the way of accessing lpage_info more generic · d4dbf470

Committed by Takuya Yoshikawa on Dec 07, 2010

Large page information has two elements but one of them, write_count, alone
is accessed by a helper function.

This patch replaces this helper function with a more generic one which returns
the newly named kvm_lpage_info structure, and uses it to access the other element,
rmap_pde.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

d4dbf470

KVM: VMX: add module parameter to avoid trapping HLT instructions (v5) · 443381a8

Committed by Anthony Liguori on Dec 06, 2010

In certain use-cases, we want to allocate guests fixed time slices where idle
guest cycles leave the machine idling. There are many approaches to achieve
this, but the most direct is to simply avoid trapping the HLT instruction, which
lets the guest directly execute the instruction, putting the processor to sleep.

Introduce this as a module-level option for kvm-vmx.ko since if you do this
for one guest, you probably want to do it for all.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

443381a8

KVM: SVM: Implement Flush-By-Asid feature · 38e5e92f

Committed by Joerg Roedel on Dec 03, 2010

This patch adds support for the new flush-by-asid feature of upcoming AMD
processors to the KVM-AMD module.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

38e5e92f

KVM: SVM: Use svm_flush_tlb instead of force_new_asid · f40f6a45

Committed by Joerg Roedel on Dec 03, 2010

This patch replaces all calls to force_new_asid which are
intended to flush the guest-tlb by the more appropriate
function svm_flush_tlb. As a side-effect the force_new_asid
function is removed.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

f40f6a45

KVM: SVM: Remove flush_guest_tlb function · fa22a8d6

Committed by Joerg Roedel on Dec 03, 2010

This function is unused and there is svm_flush_tlb which
does the same. So this function can be removed.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

fa22a8d6

KVM: MMU: retry #PF for softmmu · fb67e14f

Committed by Xiao Guangrong on Dec 07, 2010

Retry #PF for softmmu only when the current vcpu has the same cr3 as at the
time the #PF occurred.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

fb67e14f
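
The retry condition can be sketched as a small predicate. The struct and function names are hypothetical, and the assumption that a direct-mapped (non-softmmu) configuration retries unconditionally is mine, not stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a deferred page fault; not a kernel structure. */
struct demo_deferred_pf {
    uint64_t cr3_at_fault;  /* guest cr3 when the #PF occurred */
    uint64_t gva;           /* faulting guest virtual address */
};

/*
 * Decide whether to retry (prefault) a deferred #PF.
 * For softmmu the shadow page tables depend on the guest cr3, so the retry
 * is only meaningful if the guest is still running on the same cr3.
 * Assumption: with a direct map the guest cr3 does not matter.
 */
static bool demo_should_retry_pf(const struct demo_deferred_pf *pf,
                                 uint64_t current_cr3, bool direct_map)
{
    assert(pf != NULL);
    if (direct_map)
        return true;
    return pf->cr3_at_fault == current_cr3;
}
```

If the guest has switched address spaces in the meantime, the same virtual address may map to something else, so skipping the retry avoids installing a mapping into the wrong shadow page tables.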

KVM: MMU: fix accessed bit set on prefault path · 2ec4739d

Committed by Xiao Guangrong on Dec 07, 2010

Retry #PF is the speculative path, so don't set the accessed bit.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2ec4739d

KVM: MMU: rename 'no_apf' to 'prefault' · 78b2c54a

Committed by Xiao Guangrong on Dec 07, 2010

It's the speculative path if 'no_apf = 1', and we will handle this
speculative path specially in a later patch, so 'prefault' better fits the meaning.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

78b2c54a

KVM: SVM: Add clean-bit for LBR state · b53ba3f9

Committed by Joerg Roedel on Dec 03, 2010

This patch implements the clean-bit for all LBR related
state. This includes the debugctl, br_from, br_to,
last_excp_from, and last_excp_to msrs.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

b53ba3f9

KVM: SVM: Add clean-bit for CR2 register · 0574dec0

Committed by Joerg Roedel on Dec 03, 2010

This patch implements the clean-bit for the cr2 register in
the vmcb.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

0574dec0

KVM: SVM: Add clean-bit for Segements and CPL · 060d0c9a

Committed by Joerg Roedel on Dec 03, 2010

This patch implements the clean-bit defined for the cs, ds,
ss, and es segments and the current cpl saved in the vmcb.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

060d0c9a

KVM: SVM: Add clean-bit for GDT and IDT · 17a703cb

Committed by Joerg Roedel on Dec 03, 2010

This patch implements the clean-bit for the base and limit
of the gdt and idt in the vmcb.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

17a703cb

KVM: SVM: Add clean-bit for DR6 and DR7 · 72214b96

Committed by Joerg Roedel on Dec 03, 2010

This patch implements the clean-bit for the dr6 and dr7
debug registers in the vmcb.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

72214b96

KVM: SVM: Add clean-bit for control registers · dcca1a65

Committed by Joerg Roedel on Dec 03, 2010

This patch implements the CRx clean-bit for the vmcb. This
bit covers cr0, cr3, cr4, and efer.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

dcca1a65
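
The clean-bit commits in this series share one pattern: each bit tells the hardware that a group of VMCB fields is unchanged since the last run, and software clears the bit whenever it writes a field in that group. The following is a minimal sketch of that pattern; the type names, bit numbering, and helper names are illustrative assumptions, not the definitions used in svm.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative groups; the bit numbering here is an assumption. */
enum demo_clean_bit {
    DEMO_CLEAN_CR,   /* cr0, cr3, cr4, efer */
    DEMO_CLEAN_DR,   /* dr6, dr7 */
    DEMO_CLEAN_DT,   /* gdt and idt base/limit */
    DEMO_CLEAN_SEG,  /* cs, ds, ss, es, cpl */
    DEMO_CLEAN_CR2,  /* cr2 */
    DEMO_CLEAN_LBR,  /* debugctl and last-branch msrs */
    DEMO_CLEAN_COUNT
};

/* Hypothetical, heavily reduced stand-in for a VMCB. */
struct demo_vmcb {
    uint32_t clean;  /* bit set: the group is unchanged and may be cached */
    uint64_t cr3;
};

/* After a guest run, every group is in sync with the hardware copy. */
static void demo_mark_all_clean(struct demo_vmcb *vmcb)
{
    vmcb->clean = (1u << DEMO_CLEAN_COUNT) - 1;
}

/* Any software write to a group must clear that group's clean bit. */
static void demo_mark_dirty(struct demo_vmcb *vmcb, enum demo_clean_bit bit)
{
    assert(bit < DEMO_CLEAN_COUNT);
    vmcb->clean &= ~(1u << bit);
}

static bool demo_is_clean(const struct demo_vmcb *vmcb, enum demo_clean_bit bit)
{
    return (vmcb->clean >> bit) & 1u;
}

/* Example writer: updating cr3 dirties the control-register group. */
static void demo_set_cr3(struct demo_vmcb *vmcb, uint64_t cr3)
{
    vmcb->cr3 = cr3;
    demo_mark_dirty(vmcb, DEMO_CLEAN_CR);
}
```

Keeping the dirty marking inside the writer helpers, rather than at call sites, makes it harder to forget a clean-bit update when a new code path modifies the same group of fields.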

openeuler / Kernel synced successfully more than 1 year ago
