- 26 Sep 2011, 2 commits
-
-
Committed by Stefan Hajnoczi
The vmexit tracepoints format the exit_reason to make it human-readable. Since the exit_reason depends on the instruction set (vmx or svm), formatting is handled with ftrace_print_symbols_seq() by referring to the appropriate exit reason table. However, the ftrace_print_symbols_seq() function is not meant to be used directly in tracepoints since it does not export the formatting table which userspace tools like trace-cmd and perf use to format traces. In practice perf dies when formatting vmexit-related events and trace-cmd falls back to printing the numeric value (with extra formatting code in the kvm plugin to paper over this limitation). Other userspace consumers of vmexit-related tracepoints would be in similar trouble.

To avoid significant changes to the kvm_exit tracepoint, this patch moves the vmx and svm exit reason tables into arch/x86/kvm/trace.h and selects the right table with __print_symbolic() depending on the instruction set. Note that __print_symbolic() is designed for exporting the formatting table to userspace and allows trace-cmd and perf to work.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
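To make the mechanism concrete, here is a minimal sketch of how an exit-reason table is wired into a tracepoint with __print_symbolic(). The event name, table entries, and fields below are illustrative only, not the patch's literal code (the real tables in arch/x86/kvm/trace.h cover every vmx/svm exit reason), and the TRACE_EVENT machinery assumes the usual ftrace trace-header setup.

    /* Sketch: exit-reason values mapped to names for __print_symbolic().
     * Entries are examples only. */
    #define VMX_EXIT_REASONS_EXAMPLE               \
            { 0,  "EXCEPTION_NMI" },               \
            { 1,  "EXTERNAL_INTERRUPT" },          \
            { 12, "HLT" },                         \
            { 30, "IO_INSTRUCTION" }

    TRACE_EVENT(kvm_exit_example,   /* hypothetical event, for illustration */
            TP_PROTO(unsigned int exit_reason, unsigned long guest_rip),
            TP_ARGS(exit_reason, guest_rip),

            TP_STRUCT__entry(
                    __field(unsigned int,  exit_reason)
                    __field(unsigned long, guest_rip)
            ),

            TP_fast_assign(
                    __entry->exit_reason = exit_reason;
                    __entry->guest_rip   = guest_rip;
            ),

            /* Unlike ftrace_print_symbols_seq(), __print_symbolic() exports
             * the value->name table through the event's format file, so
             * perf and trace-cmd can decode the reason in userspace. */
            TP_printk("reason %s rip 0x%lx",
                      __print_symbolic(__entry->exit_reason,
                                       VMX_EXIT_REASONS_EXAMPLE),
                      __entry->guest_rip)
    );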
-
Committed by Sasha Levin

The patch raises the hard limit of the VCPU count to 254. This will allow developers to easily work on scalability and will allow users to test high-VCPU setups easily without patching the kernel. To prevent possible issues with current setups, KVM_CAP_NR_VCPUS now returns the recommended VCPU limit (which is still 64) - this should be a safe value for everybody - while a new KVM_CAP_MAX_VCPUS returns the hard limit, which is now 254.

Cc: Avi Kivity <avi@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Suggested-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
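The split between the two capabilities is easy to observe from userspace. The small program below uses the standard KVM_CHECK_EXTENSION ioctl on /dev/kvm (a real, documented interface) to print both values; on kernels without this patch the new capability simply reports 0.

    /* Query the recommended and hard VCPU limits from /dev/kvm. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
            int kvm = open("/dev/kvm", O_RDWR);
            if (kvm < 0) {
                    perror("open /dev/kvm");
                    return 1;
            }

            int soft = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_NR_VCPUS);
            int hard = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);

            printf("recommended vcpus: %d\n", soft);
            /* Older kernels return 0 for the new capability; fall back to
             * the recommended value in that case. */
            printf("maximum vcpus:     %d\n", hard > 0 ? hard : soft);
            return 0;
    }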
-
- 24 Jul 2011, 3 commits
-
-
Committed by Xiao Guangrong

Use RCU to protect shadow page tables that are queued to be freed, so that we can walk them safely; the walk stays fast and is needed by the mmio page fault path.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong

The idea is from Avi:

| Maybe it's time to kill off bypass_guest_pf=1. It's not as effective as
| it used to be, since unsync pages always use shadow_trap_nonpresent_pte,
| and since we convert between the two nonpresent_ptes during sync and unsync.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong

If the page fault is caused by mmio, we can cache the mmio info; later we do not need to walk the guest page table and can quickly recognize an mmio fault while emulating the mmio instruction.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
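A hedged sketch of the caching idea follows; the structure, field, and helper names are illustrative, not the patch's exact identifiers.

    #include <stdbool.h>
    #include <stdint.h>

    /* Remember the last mmio translation so the emulator can skip the
     * guest page-table walk on the immediately following access. */
    struct mmio_cache {
            uint64_t gva;      /* guest virtual address of the faulting access */
            uint64_t gfn;      /* guest frame number it resolved to            */
            uint32_t access;   /* access bits (read/write) the walk validated  */
            bool     valid;
    };

    static struct mmio_cache mmio_cache;

    static void cache_mmio_info(uint64_t gva, uint64_t gfn, uint32_t access)
    {
            mmio_cache.gva = gva;
            mmio_cache.gfn = gfn;
            mmio_cache.access = access;
            mmio_cache.valid = true;
    }

    /* During emulation: if the address and access match the cached fault,
     * report "mmio" immediately instead of re-walking the guest tables. */
    static bool is_cached_mmio(uint64_t gva, uint32_t access)
    {
            return mmio_cache.valid &&
                   mmio_cache.gva == gva &&
                   (mmio_cache.access & access) == access;
    }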
-
- 14 Jul 2011, 1 commit
-
-
Committed by Glauber Costa

To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM, while the vcpu had meaningful work to do - halt time does not count. This information is acquired through the run_delay field of the delayacct/schedstats infrastructure, which counts time spent in a runqueue but not running.

Steal time is per-cpu information, so the traditional MSR-based infrastructure is used. A new MSR, KVM_MSR_STEAL_TIME, holds the address of the memory area containing the steal time information.

This patch contains the hypervisor part of the steal time infrastructure, and can be backported independently of the guest portion.

[avi, yongjie: export delayacct_on, to avoid build failures in some configs]

Signed-off-by: Glauber Costa <glommer@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Rik van Riel <riel@redhat.com>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Yongjie Ren <yongjie.ren@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
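For orientation, the guest/host shared record looks roughly like the kvm_steal_time structure described in the KVM documentation. The layout and the seqcount-style read loop below are a sketch; field names and sizes should be checked against the headers of the kernel actually in use.

    #include <stdint.h>

    /* Shared record the host updates with accumulated steal time. */
    struct kvm_steal_time_example {
            uint64_t steal;      /* ns spent runnable but not running (run_delay) */
            uint32_t version;    /* even = stable, odd = host is mid-update       */
            uint32_t flags;
            uint32_t pad[12];    /* pads the record to 64 bytes                   */
    };

    /* Guest-side read sketch: retry while the host is updating, the same
     * version-counter protocol used by kvmclock-style shared pages. */
    static uint64_t read_steal_ns(volatile struct kvm_steal_time_example *st)
    {
            uint32_t v;
            uint64_t steal;

            do {
                    v = st->version;
                    steal = st->steal;
            } while ((v & 1) || v != st->version);

            return steal;
    }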
-
- 12 Jul 2011, 8 commits
-
-
Committed by Avi Kivity

When CR0.WP=0, we sometimes map user pages as kernel pages (to allow the kernel to write to them). Unfortunately this also allows the kernel to fetch from these pages, even if CR4.SMEP is set. Adjust for this by also setting NX on the spte in these circumstances.

Signed-off-by: Avi Kivity <avi@redhat.com>
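The rule itself is small; a self-contained sketch (not the patch's literal code, with an illustrative helper name) is:

    #include <stdbool.h>
    #include <stdint.h>

    /* When CR0.WP=0 a guest user page may be mapped as a kernel page so the
     * kernel can write to it; strip execute permission in that case so
     * CR4.SMEP is still honoured. */
    static uint64_t adjust_spte_for_smep(uint64_t spte, bool guest_cr0_wp,
                                         bool guest_cr4_smep,
                                         bool user_page_mapped_as_kernel)
    {
            const uint64_t SPTE_NX = 1ULL << 63;   /* execute-disable bit */

            if (!guest_cr0_wp && guest_cr4_smep && user_page_mapped_as_kernel)
                    spte |= SPTE_NX;

            return spte;
    }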
-
Committed by Yang, Wei

This patch removes the RDWRGSFS bit from CR4_RESERVED_BITS.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Yang, Wei Y

This patch removes the SMEP bit from CR4_RESERVED_BITS.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Nadav Har'El

This patch allows the guest to enable the VMXE bit in CR4, which is a prerequisite to running VMXON.

Whether to allow setting the VMXE bit now depends on the architecture (svm or vmx), so its checking has moved to kvm_x86_ops->set_cr4(). This function now returns an int: if kvm_x86_ops->set_cr4() returns 1, __kvm_set_cr4() will also return 1, and this will cause kvm_set_cr4() to throw a #GP.

Turning on the VMXE bit is allowed only when the nested VMX feature is enabled, and turning it off is forbidden after a vmxon.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
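A hedged sketch of that control flow follows; the structure and helper names are illustrative, not the kernel's exact code.

    #include <stdbool.h>

    struct vcpu_example { bool nested_vmx_enabled; bool vmxon_done; };

    #define CR4_VMXE (1ul << 13)

    /* vmx-specific check: returns 1 to reject the new CR4 value. */
    static int arch_set_cr4(struct vcpu_example *v, unsigned long cr4)
    {
            if (cr4 & CR4_VMXE) {
                    if (!v->nested_vmx_enabled)
                            return 1;          /* VMXE not allowed */
            } else if (v->vmxon_done) {
                    return 1;                  /* can't clear VMXE after vmxon */
            }
            return 0;
    }

    static int kvm_set_cr4_example(struct vcpu_example *v, unsigned long cr4)
    {
            if (arch_set_cr4(v, cr4))
                    return 1;                  /* caller injects #GP */
            /* ... commit the new CR4 value ... */
            return 0;
    }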
-
Committed by Xiao Guangrong

Parent pte rmap and page rmap are very similar, so use the same logic for both.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Xiao Guangrong

Abstract the rmap operations into spte_list, so that a later patch can use them for the reverse mapping of parent ptes.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Xiao Guangrong

Return early from the kvm_mmu_pte_write path if no shadow page is write-protected, so we avoid walking all shadow pages and taking the mmu lock.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
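The fast path amounts to a counter check before any locking. A sketch with an illustrative structure (the counter plays the role of the kernel's count of write-protecting shadow pages):

    /* Bail out of the pte-write handler when nothing is write-protected. */
    struct kvm_mmu_example {
            unsigned long indirect_shadow_pages;   /* maintained on sp alloc/free */
    };

    static void mmu_pte_write_example(struct kvm_mmu_example *mmu)
    {
            /* Nothing is write-protected, so this write cannot hit a shadowed
             * guest page table; skip the expensive walk and the mmu lock. */
            if (!mmu->indirect_shadow_pages)
                    return;

            /* ... take mmu lock, walk shadow pages, update or zap sptes ... */
    }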
-
Committed by Avi Kivity
We clean up a failed VMREAD by clearing the output register. Do it in the exception handler instead of unconditionally. This is worthwhile since there are more than a hundred call sites.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
- 22 May 2011, 6 commits
-
-
Committed by Avi Kivity

Since the emulator now checks segment limits and access rights, it generates a lot more accesses to the vmcs segment fields. Undo some of the performance hit by caching those fields in a read-only cache (the entire cache is invalidated on any write, or on guest exit).

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Gleb Natapov

Remove the unused variable mmio_fault_cr2.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity

Artificial, but needed to remove direct calls to KVM.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity

Removing direct calls to KVM.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity

Avoid using ctxt->vcpu; we can do everything with ->get_cr() and ->set_cr(). A side effect is that we no longer activate the fpu on emulated CLTS, but that should be very rare.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity

Replace direct calls to realmode_lgdt() and realmode_lidt().

Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 11 May 2011, 12 commits
-
-
Committed by Joerg Roedel

Since tsc-scaling was merged, last_guest_tsc is used in vcpu_load to adjust the tsc_offset, so last_guest_tsc needs to be updated in vcpu_put instead of last_host_tsc. This patch fixes that.

Reported-by: Jan Kiszka <jan.kiszka@web.de>
Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Gleb Natapov

Currently we sync registers back and forth before/after exiting to userspace for IO, but during IO the device model shouldn't need to read/write the registers, so we may as well skip those sync points. The only exception is the broken vmware backdoor interface. The new code syncs register content during IO only if the registers are read from/written to by userspace in the middle of the IO operation, and this almost never happens in practice.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Joerg Roedel

This patch implements two new vm-ioctls to get and set the virtual_tsc_khz if the machine supports tsc-scaling. Setting the tsc frequency is only possible before userspace creates any vcpu.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
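A hedged userspace sketch of how such an interface is typically exercised: check the TSC-control capability, then get/set the frequency in kHz. The capability and ioctl names exist in linux/kvm.h, but which file descriptor accepts them (vm vs. vcpu) varies by kernel version and should be checked against the KVM API documentation.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* kvm_fd is the /dev/kvm fd; tsc_fd is whichever fd the running kernel
     * accepts the tsc ioctls on (the commit describes them as vm-ioctls,
     * other kernel versions expose them per-vcpu). */
    static void show_and_set_tsc_khz(int kvm_fd, int tsc_fd, unsigned int new_khz)
    {
            if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_TSC_CONTROL) <= 0) {
                    fprintf(stderr, "tsc scaling not supported\n");
                    return;
            }

            int cur = ioctl(tsc_fd, KVM_GET_TSC_KHZ, 0);     /* returns kHz */
            printf("current virtual tsc: %d kHz\n", cur);

            /* Must be set before any vcpu starts running with the old rate. */
            if (ioctl(tsc_fd, KVM_SET_TSC_KHZ, new_khz) < 0)
                    perror("KVM_SET_TSC_KHZ");
    }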
-
Committed by Joerg Roedel

With TSC scaling in SVM the tsc-offset needs to be calculated differently. This patch propagates this calculation into the architecture-specific modules so that this complexity can be handled there.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Joerg Roedel

This patch implements a callback into the architecture code to allow the propagation of changes to the virtual tsc_khz of the vcpu. On SVM it updates the tsc_ratio variable; on VMX it does nothing.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Joerg Roedel

This patch changes the kvm_guest_time_update function to use the TSC frequency the guest actually has when updating its clock.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong

The mmu_seq verification can be removed since we get the pfn under the protection of mmu_lock.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Joerg Roedel

This patch adds all necessary intercept checks for instructions that access the crX registers.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Joerg Roedel

This patch adds a callback into kvm_x86_ops so that svm and vmx code can do intercept checks on emulated instructions.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity
Since sse instructions can issue 16-byte mmios, we need to support them. We can't increase the kvm_run mmio buffer size to 16 bytes without breaking compatibility, so instead we break the large mmios into two smaller 8-byte ones. Since the bus is 64-bit we aren't breaking any atomicity guarantees.

Signed-off-by: Avi Kivity <avi@redhat.com>
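A hedged sketch of the splitting idea: a 16-byte access is carried out as two consecutive 8-byte accesses, so the fixed-size exit buffer never has to grow. emit_mmio() below merely stands in for handing one fragment to the real exit/completion path.

    #include <stdint.h>

    #define MMIO_FRAGMENT_MAX 8

    static void emit_mmio(uint64_t gpa, const uint8_t *data, unsigned int len)
    {
            (void)gpa; (void)data; (void)len;   /* stand-in only */
    }

    static void split_mmio_write(uint64_t gpa, const uint8_t *data,
                                 unsigned int len)
    {
            while (len) {
                    unsigned int now = len > MMIO_FRAGMENT_MAX ?
                                       MMIO_FRAGMENT_MAX : len;

                    emit_mmio(gpa, data, now);  /* each piece fits the buffer */
                    gpa  += now;
                    data += now;
                    len  -= now;
            }
    }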
-
Committed by Avi Kivity

We may read the cpl quite often in the same vmexit (instruction privilege check, memory access checks for instruction and operands), so we gain a bit if we cache the value.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity

If called several times within the same exit, return cached results.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 18 Mar 2011, 4 commits
-
-
Committed by Xiao Guangrong
This patch does:

- call vcpu->arch.mmu.update_pte directly
- use gfn_to_pfn_atomic in update_pte path

The suggestion is from Avi.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong

There is no need to record the gfn to verify that the pte has the same mode as the current vcpu, because we speculatively update the pte only if the pte and the vcpu have the same mode.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Jan Kiszka

Code under this lock requires non-preemptibility. Ensure this also holds on -rt by converting it to a raw spinlock.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Jan Kiszka

Code under this lock requires non-preemptibility. Ensure this also holds on -rt by converting it to a raw spinlock.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
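A minimal illustration of the conversion these two patches perform (a generic example, not the specific locks they touch): on PREEMPT_RT a normal spinlock_t becomes a sleeping, preemptible lock, while raw_spinlock_t keeps its non-preemptible busy-waiting semantics.

    #include <linux/spinlock.h>

    /* Before: spinlock_t - turns into a sleeping lock on -rt. */
    static DEFINE_SPINLOCK(example_lock_old);

    /* After: raw_spinlock_t - remains a true spinning lock even on -rt,
     * so the critical section is guaranteed not to be preempted. */
    static DEFINE_RAW_SPINLOCK(example_lock);

    static void example_critical_section(void)
    {
            raw_spin_lock(&example_lock);
            /* ... code that must run without being preempted ... */
            raw_spin_unlock(&example_lock);
    }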
-
- 14 Jan 2011, 1 commit
-
-
Committed by Andrea Arcangeli

For GRU and EPT, we need gup-fast to set the referenced bit too (this is why it's correct to return 0 when shadow_access_mask is zero: it requires gup-fast to set the referenced bit). qemu-kvm access already sets the young bit in the pte if it isn't zero-copy; if it is zero-copy or a shadow-paging/EPT minor fault, we rely on gup-fast to signal that the page is in use.

We also need to check the young bits on the secondary page tables for NPT and not nested shadow mmu, as the data may never get accessed again by the primary pte. Without this closer accuracy, we'd have to remove the heuristic that avoids collapsing hugepages in hugepage virtual regions that have not even a single subpage in use.

->test_young is fully backwards compatible with GRU and other usages that don't have young bits in pagetables set by the hardware and that should nuke the secondary mmu mappings when ->clear_flush_young runs, just like EPT does. Removing the heuristic that checks the young bit in khugepaged/collapse_huge_page completely probably isn't so bad either, but I thought it was worth it and this makes it reliable.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 12 Jan 2011, 3 commits
-
-
Committed by Xiao Guangrong

Currently only one guest in the system can be audited, since:

- 'audit_point' is a global variable
- mmu_audit_disable() is called in kvm_mmu_destroy(), so auditing is disabled after a guest exits

This patch fixes those issues and allows auditing more guests at the same time.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity

Instead of syncing the guest cr3 on every exit, which is expensive on vmx with ept enabled, sync it only on demand.

[sheng: fix incorrect cr3 seen by Windows XP]

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
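A hedged sketch of on-demand syncing: instead of reading guest CR3 out of the VMCS on every exit, mark it stale at exit time and fetch it only when something actually asks for it. The names below are illustrative, and the stub stands in for the expensive VMREAD.

    #include <stdint.h>

    #define REG_CR3_AVAIL (1u << 0)

    struct cached_regs {
            uint64_t cr3;
            uint32_t avail;     /* registers already read back from hardware */
    };

    static uint64_t vmcs_read_guest_cr3_stub(void)
    {
            return 0;           /* stands in for the expensive VMREAD */
    }

    static void on_vmexit(struct cached_regs *c)
    {
            c->avail &= ~REG_CR3_AVAIL;            /* cheap: just invalidate */
    }

    static uint64_t get_guest_cr3(struct cached_regs *c)
    {
            if (!(c->avail & REG_CR3_AVAIL)) {
                    c->cr3 = vmcs_read_guest_cr3_stub();   /* now rare */
                    c->avail |= REG_CR3_AVAIL;
            }
            return c->cr3;
    }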
-
Committed by Andre Przywara

In case of a nested page fault or an intercepted #PF, newer SVM implementations provide a copy of the faulting instruction bytes in the VMCB. Use these bytes to feed the instruction emulator and avoid the costly guest instruction fetch in this case.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-