Commits · 22bd035868b06a614debf7352c09fb3efdc7c269 · openanolis / cloud-kernel

12 Jul, 2011 11 commits

KVM: nVMX: Add VMCS fields to the vmcs12 · 22bd0358

Committed by Nadav Har'El on May 25, 2011

In this patch we add to vmcs12 (the VMCS that L1 keeps for L2) all the
standard VMCS fields.

Later patches will enable L1 to read and write these fields using VMREAD/
VMWRITE, and they will be used during a VMLAUNCH/VMRESUME in preparing vmcs02,
a hardware VMCS for running L2.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: nVMX: Introduce vmcs02: VMCS used to run L2 · ff2f6fe9

Committed by Nadav Har'El on May 25, 2011

We saw in a previous patch that L1 controls its L2 guest with a vmcs12.
L0 needs to create a real VMCS for running L2. We call that "vmcs02".
A later patch will contain the code, prepare_vmcs02(), for filling the vmcs02
fields. This patch only contains code for allocating vmcs02.

In this version, prepare_vmcs02() sets *all* of vmcs02's fields each time we
enter from L1 to L2, so keeping just one vmcs02 for the vcpu is enough: It can
be reused even when L1 runs multiple L2 guests. However, in future versions
we'll probably want to add an optimization where vmcs02 fields that rarely
change will not be set each time. For that, we may want to keep around several
vmcs02s of L2 guests that have recently run, so that potentially we could run
these L2s again more quickly because fewer vmwrites to vmcs02 will be needed.

This patch adds to each vcpu a vmcs02 pool, vmx->nested.vmcs02_pool,
which remembers the vmcs02s last used to run up to VMCS02_POOL_SIZE L2s.
As explained above, in the current version we choose VMCS02_POOL_SIZE=1,
i.e., one vmcs02 is allocated (and loaded onto the processor), and it is
reused to enter any L2 guest. In the future, when prepare_vmcs02() is
optimized not to set all fields every time, VMCS02_POOL_SIZE should be
increased.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
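
The pool behaviour described in this commit message can be illustrated with a small user-space sketch. It is not the kernel code: `struct vmcs02_slot`, `struct vmcs02_pool`, `vmcs02_pool_get()` and the least-recently-used recycling policy are assumptions made for illustration; only the name `VMCS02_POOL_SIZE` and its value of 1 come from the message.

```c
#include <stddef.h>
#include <stdint.h>

#define VMCS02_POOL_SIZE 1 /* value chosen in the commit message */

/* Hypothetical stand-in for one cached vmcs02. */
struct vmcs02_slot {
    uint64_t vmcs12_addr; /* guest-physical address of the vmcs12 it serves */
    uint64_t last_used;   /* logical timestamp used for recycling */
    int in_use;
};

struct vmcs02_pool {
    struct vmcs02_slot slots[VMCS02_POOL_SIZE];
    uint64_t clock;
};

/*
 * Return the slot already associated with vmcs12_addr, or claim one:
 * a free slot if any, otherwise the least recently used slot (assumed policy).
 */
static struct vmcs02_slot *vmcs02_pool_get(struct vmcs02_pool *pool,
                                           uint64_t vmcs12_addr)
{
    struct vmcs02_slot *victim = &pool->slots[0];
    size_t i;

    /* First pass: is there a vmcs02 that last ran this L2? */
    for (i = 0; i < VMCS02_POOL_SIZE; i++) {
        struct vmcs02_slot *s = &pool->slots[i];

        if (s->in_use && s->vmcs12_addr == vmcs12_addr) {
            s->last_used = ++pool->clock;
            return s;
        }
    }

    /* Second pass: pick a free slot, else the oldest one. */
    for (i = 0; i < VMCS02_POOL_SIZE; i++) {
        struct vmcs02_slot *s = &pool->slots[i];

        if (!s->in_use) {
            victim = s;
            break;
        }
        if (s->last_used < victim->last_used)
            victim = s;
    }

    victim->in_use = 1;
    victim->vmcs12_addr = vmcs12_addr;
    victim->last_used = ++pool->clock;
    return victim;
}
```

With a pool size of 1, every request returns the same slot, which matches the statement that a single vmcs02 is reused to enter any L2 guest. Raising the constant would let recently run L2 guests keep their own slot.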

KVM: nVMX: Decoding memory operands of VMX instructions · 064aea77

Committed by Nadav Har'El on May 25, 2011

This patch includes a utility function for decoding pointer operands of VMX
instructions issued by L1 (a guest hypervisor).

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: nVMX: Implement reading and writing of VMX MSRs · b87a51ae

Committed by Nadav Har'El on May 25, 2011

When the guest can use VMX instructions (when the "nested" module option is
on), it should also be able to read and write VMX MSRs, e.g., to query about
VMX capabilities. This patch adds this support.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: nVMX: Introduce vmcs12: a VMCS structure for L1 · a9d30f33

Committed by Nadav Har'El on May 25, 2011

An implementation of VMX needs to define a VMCS structure. This structure
is kept in guest memory, but is opaque to the guest (who can only read or
write it with VMX instructions).

This patch starts to define the VMCS structure which our nested VMX
implementation will present to L1. We call it "vmcs12", as it is the VMCS
that L1 keeps for its L2 guest. We will add more content to this structure
in later patches.

This patch also adds the notion (as required by the VMX spec) of L1's "current
VMCS", and finally includes utility functions for mapping the guest-allocated
VMCSs in host memory.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: nVMX: Allow setting the VMXE bit in CR4 · 5e1746d6

Committed by Nadav Har'El on May 25, 2011

This patch allows the guest to enable the VMXE bit in CR4, which is a
prerequisite to running VMXON.

Whether to allow setting the VMXE bit now depends on the architecture (svm
or vmx), so its checking has moved to kvm_x86_ops->set_cr4(). This function
now returns an int: If kvm_x86_ops->set_cr4() returns 1, __kvm_set_cr4()
will also return 1, and this will cause kvm_set_cr4() to throw a #GP.

Turning on the VMXE bit is allowed only when the nested VMX feature is
enabled, and turning it off is forbidden after a vmxon.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
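
The return-value chain described above can be sketched in plain C as follows. This is a reduced user-space illustration, not the kernel code: `struct sketch_vcpu` and the `sketch_*` functions are hypothetical stand-ins for the vcpu state, `kvm_x86_ops->set_cr4()`, `__kvm_set_cr4()` and `kvm_set_cr4()`. Only the rules (VMXE may be set only when nested VMX is enabled, and may not be cleared after VMXON) are taken from the message; bit 13 is the architectural position of CR4.VMXE.

```c
#define X86_CR4_VMXE (1UL << 13) /* architectural position of CR4.VMXE */

/* Hypothetical, heavily reduced vcpu state for illustration only. */
struct sketch_vcpu {
    unsigned long cr4;
    int nested_allowed; /* stands in for the "nested" module option */
    int vmxon;          /* set once the guest has executed VMXON */
    int gp_injected;    /* records that a #GP would have been queued */
};

/* Arch hook: returns 1 to reject the new CR4 value, 0 to accept it. */
static int sketch_vmx_set_cr4(struct sketch_vcpu *vcpu, unsigned long cr4)
{
    if (cr4 & X86_CR4_VMXE) {
        if (!vcpu->nested_allowed)
            return 1; /* setting VMXE requires nested VMX */
    } else if (vcpu->vmxon) {
        return 1;     /* clearing VMXE is forbidden after VMXON */
    }
    vcpu->cr4 = cr4;
    return 0;
}

/* Generic layer: simply propagates the arch hook's verdict. */
static int sketch_set_cr4_checked(struct sketch_vcpu *vcpu, unsigned long cr4)
{
    return sketch_vmx_set_cr4(vcpu, cr4);
}

/* Outermost entry point: a non-zero result becomes a #GP for the guest. */
static void sketch_kvm_set_cr4(struct sketch_vcpu *vcpu, unsigned long cr4)
{
    if (sketch_set_cr4_checked(vcpu, cr4))
        vcpu->gp_injected = 1;
}
```

Returning an int from the arch hook keeps the policy decision in the vendor-specific code while the generic code only decides how to report the failure.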

KVM: nVMX: Implement VMXON and VMXOFF · ec378aee

Committed by Nadav Har'El on May 25, 2011

This patch allows a guest to use the VMXON and VMXOFF instructions, and
emulates them accordingly. Basically this amounts to checking some
prerequisites, and then remembering whether the guest has enabled or disabled
VMX operation.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: nVMX: Add "nested" module option to kvm_intel · 801d3424

Committed by Nadav Har'El on May 25, 2011

This patch adds to kvm_intel a module option "nested". This option controls
whether the guest can use VMX instructions, i.e., whether we allow nested
virtualization. A similar, but separate, option already exists for the
SVM module.

This option currently defaults to 0, meaning that nested VMX must be
explicitly enabled by giving nested=1. When nested VMX matures, the default
should probably be changed to enable nested VMX by default - just like
nested SVM is currently enabled by default.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Keep list of loaded VMCSs, instead of vcpus · d462b819

Committed by Nadav Har'El on May 24, 2011

In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU.

The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
L2s), and each of those may have been last loaded on a different cpu.

So instead of linking the vcpus, we link the VMCSs, using a new structure
loaded_vmcs. This structure contains the VMCS, and the information pertaining
to its loading on a specific cpu (namely, the cpu number, and whether it
was already launched on this cpu once). In nested we will also use the same
structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the
currently active VMCS.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
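
The bookkeeping described above, where VMCSs rather than vcpus are linked per CPU, might look roughly like the following user-space sketch. The field set (the VMCS, the cpu number, the launched flag) follows the message; the singly linked list, `NR_CPUS_SKETCH`, and the function names are assumptions for illustration, and the real VMCLEAR instruction is only modelled by resetting the flags.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4 /* hypothetical CPU count for the sketch */

struct loaded_vmcs {
    void *vmcs;               /* placeholder for the VMCS region */
    int cpu;                  /* CPU it was last loaded on, -1 if none */
    int launched;             /* already launched on that CPU? */
    struct loaded_vmcs *next; /* link in the per-CPU list (assumed layout) */
};

/* One list head per CPU: every VMCS currently loaded on that CPU. */
static struct loaded_vmcs *loaded_on_cpu[NR_CPUS_SKETCH];

/* Model of VMCLEAR: unlink from the owning CPU and forget launch state. */
static void loaded_vmcs_clear(struct loaded_vmcs *lv)
{
    struct loaded_vmcs **pp;

    if (lv->cpu < 0)
        return;
    for (pp = &loaded_on_cpu[lv->cpu]; *pp; pp = &(*pp)->next) {
        if (*pp == lv) {
            *pp = lv->next;
            break;
        }
    }
    lv->next = NULL;
    lv->cpu = -1;
    lv->launched = 0;
}

/* Load a VMCS on a CPU, clearing it from its previous CPU first. */
static void loaded_vmcs_load(struct loaded_vmcs *lv, int cpu)
{
    if (lv->cpu == cpu)
        return;
    loaded_vmcs_clear(lv);
    lv->cpu = cpu;
    lv->next = loaded_on_cpu[cpu];
    loaded_on_cpu[cpu] = lv;
}

/* Before a CPU goes down: clear every VMCS still loaded on it. */
static void cpu_clear_all_vmcss(int cpu)
{
    while (loaded_on_cpu[cpu])
        loaded_vmcs_clear(loaded_on_cpu[cpu]);
}
```

Because each VMCS carries its own cpu and launched state, one vcpu can own several entries (one for L1, others for L2s) that sit on different per-CPU lists at the same time.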

KVM: VMX: always_inline VMREADs · 96304217

Committed by Avi Kivity on May 15, 2011

vmcs_readl() and friends are really short, but gcc thinks they are long because of
the out-of-line exception handlers.  Mark them always_inline to clear the
misunderstanding.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Move VMREAD cleanup to exception handler · 5e520e62

Committed by Avi Kivity on May 15, 2011

We clean up a failed VMREAD by clearing the output register.  Do
it in the exception handler instead of unconditionally.  This is
worthwhile since there are more than a hundred call sites.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

20 Jun, 2011 1 commit

KVM: VMX: do not overwrite uptodate vcpu->arch.cr3 on KVM_SET_SREGS · 5233dd51

Committed by Marcelo Tosatti on Jun 06, 2011

Only decache guest CR3 value if vcpu->arch.cr3 is stale.
Fixes loadvm with live guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-by: Markus Schade <markus.schade@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

22 May, 2011 2 commits

KVM: VMX: Cache vmcs segment fields · 2fb92db1

Committed by Avi Kivity on Apr 27, 2011

Since the emulator now checks segment limits and access rights, it
generates a lot more accesses to the vmcs segment fields.  Undo some
of the performance hit by caching those fields in a read-only cache
(the entire cache is invalidated on any write, or on guest exit).

Signed-off-by: Avi Kivity <avi@redhat.com>
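
A read-only cache that is dropped wholesale on any write can be sketched with one validity bit per (segment, field) pair. The sketch below is a user-space model under stated assumptions: `fake_vmcs`, `vmread_count`, `struct seg_cache` and the `seg_*` helpers are hypothetical names, and the array lookup stands in for a real VMREAD.

```c
#include <stdint.h>

enum { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS, SEG_TR, SEG_LDTR, NR_SEGS };
enum { F_SELECTOR, F_BASE, F_LIMIT, F_AR, NR_FIELDS };

/* Stand-in for the hardware VMCS; reading it models an expensive VMREAD. */
static uint64_t fake_vmcs[NR_SEGS][NR_FIELDS];
static unsigned vmread_count;

struct seg_cache {
    uint32_t valid;                   /* one bit per (segment, field) pair */
    uint64_t val[NR_SEGS][NR_FIELDS]; /* cached values */
};

/* Read a segment field, going to the "VMCS" only on a cache miss. */
static uint64_t seg_read(struct seg_cache *c, int seg, int field)
{
    uint32_t bit = UINT32_C(1) << (seg * NR_FIELDS + field);

    if (!(c->valid & bit)) {
        vmread_count++;
        c->val[seg][field] = fake_vmcs[seg][field];
        c->valid |= bit;
    }
    return c->val[seg][field];
}

/* Invalidate everything; also what a guest exit would trigger. */
static void seg_cache_clear(struct seg_cache *c)
{
    c->valid = 0;
}

/* Writes go straight to the "VMCS" and drop the entire cache. */
static void seg_write(struct seg_cache *c, int seg, int field, uint64_t v)
{
    fake_vmcs[seg][field] = v;
    seg_cache_clear(c);
}
```

Clearing the whole bitmask on a write is coarser than tracking individual fields, but it keeps the write path trivial, which fits a workload where reads from the emulator dominate.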

KVM: VMX: Avoid reading %rip unnecessarily when handling exceptions · 0a434bb2

Committed by Avi Kivity on Apr 28, 2011

Avoids a VMREAD.

Signed-off-by: Avi Kivity <avi@redhat.com>

11 May, 2011 15 commits

KVM: fix push of wrong eip when doing softint · 71f9833b

Committed by Serge E. Hallyn on Apr 13, 2011

When doing a soft int, we need to bump eip before pushing it to
the stack.  Otherwise we'll do the int a second time.

[apw@canonical.com: merged eip update as per Jan's recommendation.]

Signed-off-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Ensure that vmx_create_vcpu always returns proper error · be6d05cf

Committed by Jan Kiszka on Apr 13, 2011

In case certain allocations fail, vmx_create_vcpu may return 0 as error
instead of a negative value encoded via ERR_PTR. This causes a NULL
pointer dereference later on in kvm_vm_ioctl_vcpu_create.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
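
Why an error code of 0 is dangerous with pointer-encoded errors can be shown with a user-space imitation of the kernel's ERR_PTR convention. The helpers `err_ptr()`, `ptr_err()` and `is_err()` below are simplified re-implementations written for this sketch, and `sketch_create_vcpu()` is a hypothetical function, not the kernel's `vmx_create_vcpu()`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Simplified user-space imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *err_ptr(intptr_t error)
{
    return (void *)error;
}

static inline intptr_t ptr_err(const void *p)
{
    return (intptr_t)p;
}

static inline int is_err(const void *p)
{
    /* Only the top MAX_ERRNO addresses are treated as encoded errors. */
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct sketch_vcpu {
    int id;
};

/*
 * Hypothetical creation routine. Every failure path must set a negative
 * error before jumping to the common exit; leaving err at 0 would make
 * err_ptr(err) a NULL pointer that is_err() does not recognise.
 */
static struct sketch_vcpu *sketch_create_vcpu(int simulate_alloc_failure)
{
    struct sketch_vcpu *v;
    int err;

    v = simulate_alloc_failure ? NULL : malloc(sizeof(*v));
    if (!v) {
        err = -ENOMEM;
        goto fail;
    }
    v->id = 0;
    return v;

fail:
    return err_ptr(err);
}
```

A caller that checks only `is_err()` treats `err_ptr(0)` as a valid pointer and dereferences NULL, which is the failure mode the commit message describes.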

KVM: X86: Delegate tsc-offset calculation to architecture code · 857e4099

Committed by Joerg Roedel on Mar 25, 2011

With TSC scaling in SVM the tsc-offset needs to be
calculated differently. This patch propagates this
calculation into the architecture specific modules so that
this complexity can be handled there.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: X86: Implement call-back to propagate virtual_tsc_khz · 4051b188

Committed by Joerg Roedel on Mar 25, 2011

This patch implements a call-back into the architecture code
to allow the propagation of changes to the virtual tsc_khz
of the vcpu.
On SVM it updates the tsc_ratio variable, on VMX it does
nothing.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Add x86 callback for intercept check · 8a76d7f2

Committed by Joerg Roedel on Apr 04, 2011

This patch adds a callback into kvm_x86_ops so that svm and
vmx code can do intercept checks on emulated instructions.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: simplify NMI mask management · 654f06fc

Committed by Avi Kivity on Mar 23, 2011

Use vmx_set_nmi_mask() instead of open-coding management of
the hardware bit and the software hint (nmi_known_unmasked).

There's a slight change of behaviour when running without
hardware virtual NMI support - we now clear the NMI mask if
NMI delivery faulted in that case as well.  This improves
emulation accuracy.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Use cached VM_EXIT_INTR_INFO in handle_exception · 88786475

Committed by Avi Kivity on Mar 07, 2011

vmx_complete_atomic_exit() cached it for us, so we can use it here.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Don't VMREAD VM_EXIT_INTR_INFO unconditionally · c5ca8e57

Committed by Avi Kivity on Mar 07, 2011

Only read it if we're going to use it later.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Refactor vmx_complete_atomic_exit() · 00eba012

Committed by Avi Kivity on Mar 07, 2011

Move the exit reason checks to the front of the function, for early
exit in the common case.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Qualify check for host NMI · f9902069

Committed by Avi Kivity on Mar 07, 2011

Check for the exit reason first; this allows us, later,
to avoid a VMREAD for VM_EXIT_INTR_INFO_FIELD.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Avoid vmx_recover_nmi_blocking() when unneeded · 9d58b931

Committed by Avi Kivity on Mar 07, 2011

When we haven't injected an interrupt, we don't need to recover
the nmi blocking state (since the guest can't set it by itself).
This allows us to avoid a VMREAD later on.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Cache cpl · 69c73028

Committed by Avi Kivity on Mar 07, 2011

We may read the cpl quite often in the same vmexit (instruction privilege
check, memory access checks for instruction and operands), so we gain
a bit if we cache the value.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Optimize vmx_get_cpl() · f4c63e5d

Committed by Avi Kivity on Mar 07, 2011

In long mode, vm86 mode is disallowed, so we need not check for
it.  Reading rflags.vm may require a VMREAD, so it is expensive.

Signed-off-by: Avi Kivity <avi@redhat.com>
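
The short-circuit this message describes can be sketched as below. It is an illustrative user-space model, not the kernel function: `struct sketch_guest`, `sketch_read_rflags()` and `sketch_get_cpl()` are hypothetical, and deriving the CPL from the low two bits of the CS selector (and returning 0 outside protected mode) is an assumption of the sketch. Bit 17 is the architectural position of RFLAGS.VM.

```c
#define X86_EFLAGS_VM (1UL << 17) /* architectural virtual-8086 mode flag */

/* Hypothetical reduced guest state for illustration only. */
struct sketch_guest {
    int protected_mode;
    int long_mode;
    unsigned long rflags;
    unsigned short cs_selector;
    unsigned rflags_reads; /* counts the modelled expensive reads */
};

/* Models an rflags read that may cost a VMREAD. */
static unsigned long sketch_read_rflags(struct sketch_guest *g)
{
    g->rflags_reads++;
    return g->rflags;
}

static int sketch_get_cpl(struct sketch_guest *g)
{
    if (!g->protected_mode)
        return 0;
    /*
     * vm86 cannot be active in long mode, so the long_mode test comes
     * first and the && short-circuit skips the rflags read entirely.
     */
    if (!g->long_mode && (sketch_read_rflags(g) & X86_EFLAGS_VM))
        return 3;
    return g->cs_selector & 3;
}
```

The ordering of the two conditions is the whole optimization: for a 64-bit guest the counter stays at zero no matter how often the CPL is queried.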

KVM: VMX: Optimize vmx_get_rflags() · 6de12732

Committed by Avi Kivity on Mar 07, 2011

If called several times within the same exit, return cached results.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Use kvm_get_rflags() and kvm_set_rflags() instead of the raw versions · f6e78475

Committed by Avi Kivity on Aug 02, 2010

Some rflags bits are owned by the host, not guest, so we need to use
kvm_get_rflags() to strip those bits away or kvm_set_rflags() to add them
back.

Signed-off-by: Avi Kivity <avi@redhat.com>

18 Mar, 2011 11 commits

KVM: unbreak userspace that does not sets tss address · 776e58ea

Committed by Gleb Natapov on Mar 13, 2011

Commit 6440e5967bc broke old userspaces that do not set tss address
before entering vcpu. Unbreak it by setting tss address to a safe
value on the first vcpu entry. New userspaces should set tss address,
so print a warning if one does not.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: fix rcu usage in init_rmode_* functions · 40dcaa9f

Committed by Xiao Guangrong on Mar 09, 2011

fix:
[ 3494.671786] stack backtrace:
[ 3494.671789] Pid: 10527, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23
[ 3494.671790] Call Trace:
[ 3494.671796]  [] ? lockdep_rcu_dereference+0x9d/0xa5
[ 3494.671826]  [] ? kvm_memslots+0x6b/0x73 [kvm]
[ 3494.671834]  [] ? gfn_to_memslot+0x16/0x4f [kvm]
[ 3494.671843]  [] ? gfn_to_hva+0x16/0x27 [kvm]
[ 3494.671851]  [] ? kvm_write_guest_page+0x31/0x83 [kvm]
[ 3494.671861]  [] ? kvm_clear_guest_page+0x1a/0x1c [kvm]
[ 3494.671867]  [] ? vmx_set_tss_addr+0x83/0x122 [kvm_intel]

and:
[ 8328.789599] stack backtrace:
[ 8328.789601] Pid: 18736, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23
[ 8328.789603] Call Trace:
[ 8328.789609]  [] ? lockdep_rcu_dereference+0x9d/0xa5
[ 8328.789621]  [] ? kvm_memslots+0x6b/0x73 [kvm]
[ 8328.789628]  [] ? gfn_to_memslot+0x16/0x4f [kvm]
[ 8328.789635]  [] ? gfn_to_hva+0x16/0x27 [kvm]
[ 8328.789643]  [] ? kvm_write_guest_page+0x31/0x83 [kvm]
[ 8328.789699]  [] ? kvm_clear_guest_page+0x1a/0x1c [kvm]
[ 8328.789713]  [] ? vmx_create_vcpu+0x316/0x3c8 [kvm_intel]

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Remove useless regs_page pointer from kvm_lapic · afc20184

Committed by Takuya Yoshikawa on Mar 05, 2011

Access to this page is mostly done through the regs member which holds
the address to this page. The exceptions are in vmx_vcpu_reset() and
kvm_free_lapic() and these both can easily be converted to using regs.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Initialize vm86 TSS only once. · 93ea5388

Committed by Gleb Natapov on Feb 21, 2011

Currently vm86 task is initialized on each real mode entry and vcpu
reset. Initialization is done by zeroing TSS and updating relevant
fields. But since all vcpus are using the same TSS there is a race where
one vcpu may use TSS while other vcpu is initializing it, so the vcpu
that uses TSS will see wrong TSS content and will behave incorrectly.
Fix that by initializing TSS only once.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: update live TR selector if it changes in real mode · a8ba6c26

Committed by Gleb Natapov on Feb 21, 2011

When rmode.vm86 is active TR descriptor is updated with vm86 task values,
but selector is left intact. vmx_set_segment() makes sure that if TR
register is written into while vm86 is active the new values are saved
for use after vm86 is deactivated, but since selector is not updated on
vm86 activation/deactivation new value is lost. Fix this by writing new
selector into vmcs immediately.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: add the __noclone attribute to vmx_vcpu_run · a3b5ba49

Committed by Lai Jiangshan on Feb 11, 2011

The changelog of 104f226b said "adds the __noclone attribute",
but it was missing in its patch. I think it is still needed.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: fix detection of BIOS disabling VMX · 23f3e991

Committed by Joseph Cihula on Feb 08, 2011

This patch fixes the logic used to detect whether BIOS has disabled VMX, for
the case where VMX is enabled only under SMX, but tboot is not active.

Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Avoid atomic operation in vmx_vcpu_run · 40712fae

Committed by Avi Kivity on Jan 06, 2011

Instead of exchanging the guest and host rcx, have separate storage
for each.  This allows us to avoid using the xchg instruction, which
is a little slower than normal operations.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Simplify saving guest rcx in vmx_vcpu_run · 1c696d0e

Committed by Avi Kivity on Jan 06, 2011

Change

  push top-of-stack
  pop guest-rcx
  pop dummy

to

  pop guest-rcx

which is the same thing, only simpler.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: increase ple_gap default to 128 · 00c25bce

Committed by Rik van Riel on Jan 04, 2011

On some CPUs, a ple_gap of 41 is simply insufficient to ever trigger
PLE exits, even with the minimalistic PLE test from kvm-unit-tests.

http://git.kernel.org/?p=virt/kvm/kvm-unit-tests.git;a=commitdiff;h=eda71b28fa122203e316483b35f37aaacd42f545

For example, the Xeon X5670 CPU needs a ple_gap of at least 48 in
order to get pause loop exits:

# modprobe kvm_intel ple_gap=47
# taskset 1 /usr/local/bin/qemu-system-x86_64 \
  -device testdev,chardev=log -chardev stdio,id=log \
  -kernel x86/vmexit.flat -append ple-round-robin -smp 2
VNC server running on `::1:5900'
enabling apic
enabling apic
ple-round-robin 58298446
# rmmod kvm_intel
# modprobe kvm_intel ple_gap=48
# taskset 1 /usr/local/bin/qemu-system-x86_64 \
   -device testdev,chardev=log -chardev stdio,id=log \
   -kernel x86/vmexit.flat -append ple-round-robin -smp 2
VNC server running on `::1:5900'
enabling apic
enabling apic
ple-round-robin 36616

Increase the ple_gap to 128 to be on the safe side.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Zhai, Edwin <edwin.zhai@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Avoid leaking fake realmode state to userspace · a9179499

Committed by Avi Kivity on Jan 03, 2011

When emulating real mode, we fake some state:

 - tr.base points to a fake vm86 tss
 - segment registers are made to conform to vm86 restrictions

Change vmx_get_segment() not to expose this fake state to userspace;
instead, return the original state.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

openanolis / cloud-kernel · synced successfully about 1 year ago