Commits · 012f83cb2f8d7b9b7ad3b65e7e53a9365a357014 · openanolis / cloud-kernel

22 Apr, 2013 9 commits

KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs · 012f83cb

Committed by Abel Gordon on Apr 18, 2013

Synchronize between the VMCS12 software controlled structure and the
processor-specific shadow vmcs.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs · c3114420

Committed by Abel Gordon on Apr 18, 2013

Introduce a function used to copy fields from the software controlled VMCS12
to the processor-specific shadow vmcs.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12 · 16f5b903

Committed by Abel Gordon on Apr 18, 2013

Introduce a function used to copy fields from the processor-specific shadow
vmcs to the software controlled VMCS12.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: nVMX: Release shadow vmcs · e7953d7f

Committed by Abel Gordon on Apr 18, 2013

Unmap vmcs12 and release the corresponding shadow vmcs.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: nVMX: Allocate shadow vmcs · 8de48833

Committed by Abel Gordon on Apr 18, 2013

Allocate a shadow vmcs used by the processor to shadow part of the fields
stored in the software defined VMCS12 (let L1 access fields without causing
exits). Note we keep a shadow vmcs only for the current vmcs12.  Once a vmcs12
becomes non-current, its shadow vmcs is released.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: nVMX: Fix VMXON emulation · 145c28dd

Committed by Abel Gordon on Apr 18, 2013

handle_vmon doesn't check if L1 is already in root mode (VMXON
was previously called). This patch adds this missing check and calls
nested_vmx_failValid if VMX is already ON.
We need this check because L0 will allocate the shadow vmcs when L1
executes VMXON and we want to avoid host leaks (due to shadow vmcs
allocation) if L1 executes VMXON repeatedly.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
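
Illustrative sketch (not the patch itself) of the check described above: a second VMXON is refused before anything is allocated, so repeated VMXON cannot leak host memory. The struct, enum, and function names are hypothetical stand-ins, and the counter only models the shadow vmcs allocation.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-vcpu nested state. */
struct nested_state {
    bool vmxon;              /* true once L1 has executed VMXON */
    int  shadow_allocations; /* models shadow vmcs allocations  */
};

enum vmxon_result { VMXON_OK, VMXON_FAIL_VALID };

/* Emulate VMXON: fail a repeated VMXON instead of allocating again. */
static enum vmxon_result emulate_vmxon(struct nested_state *n)
{
    if (n->vmxon)
        return VMXON_FAIL_VALID;  /* L1 is already in VMX root operation */

    n->shadow_allocations++;      /* stands in for the real allocation */
    n->vmxon = true;
    return VMXON_OK;
}
```

Calling emulate_vmxon() twice on the same state returns VMXON_OK and then VMXON_FAIL_VALID, with exactly one allocation recorded.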

KVM: nVMX: Refactor handle_vmwrite · 20b97fea

Committed by Abel Gordon on Apr 18, 2013

Refactor existing code so we re-use vmcs12_write_any to copy fields from the
shadow vmcs specified by the link pointer (used by the processor,
implementation-specific) to the VMCS12 software format used by L0 to hold
the fields in L1 memory address space.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: nVMX: Introduce vmread and vmwrite bitmaps · 4607c2d7

Committed by Abel Gordon on Apr 18, 2013

Prepare vmread and vmwrite bitmaps according to a pre-specified list of fields.
These lists are intended to specify the most frequently accessed fields so we
can minimize the number of fields that are copied from/to the software
controlled VMCS12 format to/from the processor-specific shadow vmcs. The lists
were built by measuring the VMCS field access rate after L2 Ubuntu 12.04 booted
when it was running on top of L1 KVM, also Ubuntu 12.04. Note that during boot
there were additional fields which were frequently modified but they were not
added to these lists because after boot these fields were no longer accessed
by L1.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
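
Illustrative sketch (not the patch itself) of how such a bitmap can be prepared. It relies on the Intel SDM rule that a set bit in the VMREAD/VMWRITE bitmap makes the access exit, with bits 14:0 of the field encoding selecting the bit. The helper names and the three example fields are assumptions; the field list measured for the commit is not reproduced here.

```c
#include <stdint.h>
#include <string.h>

#define VMCS_BITMAP_BYTES 4096  /* one 4 KiB page, 32768 bits */

/* Field encodings from the Intel SDM; the selection is only an example. */
#define GUEST_RSP    0x681c
#define GUEST_RIP    0x681e
#define GUEST_RFLAGS 0x6820

/* Build a bitmap in which only the listed fields are shadowed.
 * Assumes every encoding is below 0x8000 so it indexes inside the page. */
static void bitmap_init(uint8_t *bitmap, const uint16_t *fields, size_t n)
{
    /* All bits set: by default every VMREAD/VMWRITE exits to L0. */
    memset(bitmap, 0xff, VMCS_BITMAP_BYTES);

    /* Clear the bit of each shadowed field so its access does not exit. */
    for (size_t i = 0; i < n; i++)
        bitmap[fields[i] / 8] &= (uint8_t)~(1u << (fields[i] % 8));
}

/* Returns 1 if an access to 'field' would exit, 0 if it is shadowed. */
static int access_exits(const uint8_t *bitmap, uint16_t field)
{
    return (bitmap[field / 8] >> (field % 8)) & 1;
}
```

Keeping the default at "exit" means a field that is missing from the list is still handled correctly by L0, only more slowly.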

KVM: nVMX: Detect shadow-vmcs capability · abc4fc58

Committed by Abel Gordon on Apr 18, 2013

Add logic required to detect if shadow-vmcs is supported by the
processor. Introduce a new kernel module parameter to specify if L0 should use
shadow vmcs (or not) to run L1.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
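
Illustrative sketch (not the patch itself) of the detection logic. Per the Intel SDM, VMCS shadowing is bit 14 of the secondary processor-based controls, and the high 32 bits of IA32_VMX_PROCBASED_CTLS2 report which controls may be set to 1. The function names and the boolean standing in for the module parameter are hypothetical; reading the MSR itself is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECONDARY_EXEC_SHADOW_VMCS 0x00004000u  /* bit 14 */

/* msr_value: raw contents of IA32_VMX_PROCBASED_CTLS2. */
static bool cpu_has_shadow_vmcs(uint64_t msr_value)
{
    uint32_t allowed1 = (uint32_t)(msr_value >> 32);  /* allowed-1 settings */
    return (allowed1 & SECONDARY_EXEC_SHADOW_VMCS) != 0;
}

/* Use the feature only if the CPU offers it and the admin did not disable it. */
static bool use_shadow_vmcs(uint64_t msr_value, bool module_param_enabled)
{
    return module_param_enabled && cpu_has_shadow_vmcs(msr_value);
}
```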

18 Apr, 2013 1 commit

KVM: x86: Fix posted interrupt with CONFIG_SMP=n · 6ffbbbba

Committed by Zhang, Yang Z on Apr 17, 2013

->send_IPI_mask is not defined on UP.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

17 Apr, 2013 7 commits

KVM: VMX: Fix check guest state validity if a guest is in VM86 mode · f13882d8

Committed by Gleb Natapov on Apr 14, 2013

If the guest vcpu is in VM86 mode, the vcpu state should be checked as if in
real mode.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: nVMX: check vmcs12 for valid activity state · 26539bd0

Committed by Paolo Bonzini on Apr 15, 2013

KVM does not use the activity state VMCS field, and does not support
it in nested VMX either (the corresponding bits in the misc VMX feature
MSR are zero).  Fail entry if the activity state is set to anything but
"active".

Since the value will always be the same for L1 and L2, we do not need
to read and write the corresponding VMCS field on L1/L2 transitions,
either.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
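
Illustrative sketch (not the patch itself) of the entry check. The Intel SDM encodes the guest activity state as 0 = active, 1 = HLT, 2 = shutdown, 3 = wait-for-SIPI; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define GUEST_ACTIVITY_ACTIVE 0u  /* the only state accepted here */

/* Returns true if nested entry may proceed with this activity state. */
static bool activity_state_valid(uint32_t guest_activity_state)
{
    return guest_activity_state == GUEST_ACTIVITY_ACTIVE;
}
```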

KVM: VMX: Use posted interrupt to deliver virtual interrupt · 5a71785d

Committed by Yang Zhang on Apr 11, 2013

If posted interrupt is available, then use it to inject virtual
interrupts into the guest.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Add the deliver posted interrupt algorithm · a20ed54d

Committed by Yang Zhang on Apr 11, 2013

Only deliver the posted interrupt when the target vcpu is running
and there is no previous interrupt pending in pir.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
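
Illustrative sketch (not the patch itself) of the two conditions named above, written with C11 atomics. The descriptor layout, the enum, and the function name are hypothetical, and the fallback to kicking the vcpu when no notification is sent is an assumption about how the remaining cases are handled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified posted-interrupt descriptor. */
struct pi_desc {
    atomic_uint pir[8];  /* 256 posted-interrupt request bits */
    atomic_uint on;      /* outstanding-notification flag     */
};

enum pi_action { PI_ALREADY_PENDING, PI_NOTIFY, PI_KICK };

/* Record 'vector' and decide how the target vcpu should be told. */
static enum pi_action post_interrupt(struct pi_desc *pi, uint8_t vector,
                                     bool vcpu_running)
{
    unsigned int bit = 1u << (vector % 32);

    /* The vector is already pending in pir: nothing more to do. */
    if (atomic_fetch_or(&pi->pir[vector / 32], bit) & bit)
        return PI_ALREADY_PENDING;

    /* Notify only if no notification is outstanding and the vcpu runs. */
    bool was_on = atomic_exchange(&pi->on, 1u) != 0;
    if (!was_on && vcpu_running)
        return PI_NOTIFY;

    return PI_KICK;  /* assumed fallback: wake the vcpu the slow way */
}
```

The atomic read-modify-write on pir is what lets several senders post to the same vcpu without a lock.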

KVM: Call common update function when ioapic entry changed. · 3d81bc7e

Committed by Yang Zhang on Apr 11, 2013

Both the TMR and the EOI exit bitmap need to be updated when the ioapic changes
or a vcpu's id/ldr/dfr changes. So use a common function instead of the EOI
exit bitmap specific function.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Check the posted interrupt capability · 01e439be

Committed by Yang Zhang on Apr 11, 2013

Detect the posted interrupt feature. If it exists, then set it in vmcs_config.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Enable acknowledge interupt on vmexit · a547c6db

Committed by Yang Zhang on Apr 11, 2013

The "acknowledge interrupt on exit" feature controls processor behavior
for external interrupt acknowledgement. When this control is set, the
processor acknowledges the interrupt controller to acquire the
interrupt vector on VM exit.

After enabling this feature, an interrupt which arrives while the target cpu is
running in vmx non-root mode will be handled by the vmx handler instead of the
handler in the idt. Currently, the vmx handler only fakes an interrupt stack and
jumps to the idt table to let the real handler handle it. In the future, we will
recognize the interrupt and only deliver interrupts which do not belong to the
current vcpu through the idt table. Interrupts which belong to the current vcpu
will be handled inside the vmx handler. This will reduce the interrupt handling
cost of KVM.

Also, the interrupt enable logic is changed if this feature is turned on:
Before this patch, the hypervisor called local_irq_enable() to enable interrupts
directly. Now the IF bit is set on the interrupt stack frame, and interrupts
will be enabled on return from the interrupt handler if an external interrupt
exists. If there is no external interrupt, still call local_irq_enable() to
enable them.

Refer to Intel SDM volume 3, chapter 33.2.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

14 Apr, 2013 5 commits

KVM: nVMX: Avoid reading VM_EXIT_INTR_ERROR_CODE needlessly on nested exits · c0d1c770

Committed by Jan Kiszka on Apr 14, 2013

We only need to update vm_exit_intr_error_code if there is a valid exit
interruption information and it comes with a valid error code.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
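
Illustrative sketch (not the patch itself) of the condition. In the VM-exit interruption information field, bit 31 marks the field as valid and bit 11 marks the error code as valid (Intel SDM); the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define INTR_INFO_DELIVER_CODE_MASK 0x00000800u  /* bit 11: error code valid */
#define INTR_INFO_VALID_MASK        0x80000000u  /* bit 31: info valid       */

/* True only if the error code field is worth reading at all. */
static bool need_intr_error_code(uint32_t exit_intr_info)
{
    const uint32_t both = INTR_INFO_VALID_MASK | INTR_INFO_DELIVER_CODE_MASK;
    return (exit_intr_info & both) == both;
}
```

For example, 0x80000b0e (valid, hardware exception, error code delivered, vector 14) qualifies, while the same value with bit 31 clear does not.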

KVM: nVMX: Fix conditions for interrupt injection · e8457c67

Committed by Jan Kiszka on Apr 14, 2013

If we are entering guest mode, we do not want L0 to interrupt this
vmentry with all its side effects on the vmcs. Therefore, injection
shall be disallowed during L1->L2 transitions, as in the previous
version. However, this check is conceptually independent of
nested_exit_on_intr, so decouple it.

If L1 traps external interrupts, we can kick the guest from L2 to L1,
also just like the previous code worked. But we no longer need to
consider L1's idt_vectoring_info_field. It will always be empty at this
point. Instead, if L2 has pending events, those are now found in the
architectural queues and will, thus, prevent vmx_interrupt_allowed from
being called at all.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: nVMX: Rework event injection and recovery · 5f3d5799

Committed by Jan Kiszka on Apr 14, 2013

The basic idea is to always transfer the pending event injection on
vmexit into the architectural state of the VCPU and then drop it from
there if it turns out that we left L2 to enter L1, i.e. if we enter
prepare_vmcs12.

vmcs12_save_pending_events takes care to transfer pending L0 events into
the queue of L1. That is mandatory as L1 may decide to switch the guest
state completely, invalidating or preserving the pending events for
later injection (including on a different node, once we support
migration).

This concept is based on the rule that a pending vmlaunch/vmresume is
not canceled. Otherwise, we would risk losing injected events or leaking
them into the wrong queues. Encode this rule via a WARN_ON_ONCE at the
entry of nested_vmx_vmexit.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to L1 · 3b656cf7

Committed by Jan Kiszka on Apr 14, 2013

Check if the interrupt or NMI window exit is for L1 by testing if it has
the corresponding controls enabled. This is required when we allow
direct injection from L0 to L2.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: VMX: do not try to reexecute failed instruction while emulating invalid guest state · 991eebf9

Committed by Gleb Natapov on Apr 11, 2013

During invalid guest state emulation the vcpu cannot enter guest mode to try
to reexecute an instruction that the emulator failed to emulate, so emulation
will happen again and again. Prevent that by telling the emulator that
instruction reexecution should not be attempted.
Signed-off-by: Gleb Natapov <gleb@redhat.com>

08 Apr, 2013 2 commits

KVM: VMX: Add missing braces to avoid redundant error check · a63cb560

Committed by Jan Kiszka on Apr 08, 2013

The code was already properly aligned, now also add the braces so that
err is not checked if alloc_apic_access_page didn't run and change
it. Found via Coccinelle by Fengguang Wu.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: x86: fix memory leak in vmx_init · 458f212e

Committed by Yang Zhang on Apr 08, 2013

Free vmx_msr_bitmap_longmode_x2apic and vmx_msr_bitmap_longmode if
kvm_init() fails.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

07 Apr, 2013 1 commit

KVM: nVMX: Check exit control for VM_EXIT_SAVE_IA32_PAT, not entry controls · b8c07d55

Committed by Jan Kiszka on Apr 06, 2013

Obviously a copy&paste mistake: prepare_vmcs12 has to check L1's exit
controls for VM_EXIT_SAVE_IA32_PAT.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

21 Mar, 2013 1 commit

KVM: x86: correctly initialize the CS base on reset · 04b66839

Committed by Paolo Bonzini on Mar 19, 2013

The CS base was initialized to 0 on VMX (wrong, but usually overridden
by userspace before starting) or 0xf0000 on SVM.  The correct value is
0xffff0000, and VMX is able to emulate it now, so use it.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
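
Illustrative arithmetic (not the patch itself) showing why the base matters. At reset the architecture defines CS selector 0xf000, CS base 0xffff0000, and instruction pointer 0xfff0, so the first fetch is at 0xfffffff0. A base derived the usual real-mode way (selector << 4) gives the 0xf0000 value mentioned above instead. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define RESET_CS_SELECTOR 0xf000u
#define RESET_CS_BASE     0xffff0000u  /* hidden base set by reset */
#define RESET_IP          0x0000fff0u

/* Address of the first instruction fetched after reset. */
static uint32_t reset_vector(void)
{
    return RESET_CS_BASE + RESET_IP;
}

/* Base a real-mode segment load would produce for the same selector. */
static uint32_t real_mode_base(uint16_t selector)
{
    return (uint32_t)selector << 4;
}
```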

19 Mar, 2013 1 commit

KVM: VMX: Require KVM_SET_TSS_ADDR being called prior to running a VCPU · 4918c6ca

Committed by Jan Kiszka on Mar 15, 2013

Very old user space (namely qemu-kvm before kvm-49) didn't set the TSS
base before running the VCPU. We always warned about this bug, but no
reports about users actually seeing this are known. Time to finally
remove the workaround that effectively prevented calling vmx_vcpu_reset
while already holding the KVM srcu lock.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

14 Mar, 2013 2 commits

KVM: nVMX: Add preemption timer support · 0238ea91

Committed by Jan Kiszka on Mar 13, 2013

Provided the host has this feature, it's straightforward to offer it to
the guest as well. We just need to load the timer value on L2 entry if
the feature was enabled by L1 and watch out for the corresponding exit
reason.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: nVMX: Provide EFER.LMA saving support · c18911a2

Committed by Jan Kiszka on Mar 13, 2013

We will need EFER.LMA saving to provide unrestricted guest mode. All
that is missing for this is picking up EFER.LMA from VM_ENTRY_CONTROLS
on L2->L1 switches. If the host does not support EFER.LMA saving,
no change is performed, otherwise we properly emulate for L1 what the
hardware does for L0. Advertise the support, depending on the host
feature.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
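
Illustrative sketch (not the patch itself) of picking up the saved bit. The "IA-32e mode guest" VM-entry control is bit 9 (Intel SDM) and mirrors EFER.LMA. The function name and parameters are hypothetical: one value stands for the entry controls L1 stored in vmcs12, the other for the controls read back from hardware after the L2 exit.

```c
#include <stdint.h>

#define VM_ENTRY_IA32E_MODE 0x00000200u  /* bit 9: IA-32e mode guest */

/* Replace only the IA-32e mode bit in L1's view with the hardware value. */
static uint32_t save_ia32e_mode(uint32_t vmcs12_entry_controls,
                                uint32_t hw_entry_controls)
{
    return (vmcs12_entry_controls & ~VM_ENTRY_IA32E_MODE) |
           (hw_entry_controls & VM_ENTRY_IA32E_MODE);
}
```

All other control bits chosen by L1 are left untouched, so only the long-mode state is updated.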

13 Mar, 2013 2 commits

KVM: nVMX: Clean up and fix pin-based execution controls · eabeaacc

Committed by Jan Kiszka on Mar 13, 2013

Only interrupt and NMI exiting are mandatory for KVM to work, thus can
be exposed to the guest unconditionally, virtual NMI exiting is
optional. So we must not advertise it unless the host supports it.

Introduce the symbolic constant PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR on
this occasion.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: x86: Rework INIT and SIPI handling · 66450a21

Committed by Jan Kiszka on Mar 13, 2013

A VCPU sending INIT or SIPI to some other VCPU races for setting the
remote VCPU's mp_state. When we were unlucky, KVM_MP_STATE_INIT_RECEIVED
was overwritten by kvm_emulate_halt and, thus, got lost.

This introduces APIC events for those two signals, keeping them in
kvm_apic until kvm_apic_accept_events is run over the target vcpu
context. kvm_apic_has_events reports to kvm_arch_vcpu_runnable if there
are pending events, thus if vcpu blocking should end.

The patch comes with the side effect of effectively obsoleting
KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but
immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI.
The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED
state. That also means we no longer exit to user space after receiving a
SIPI event.

Furthermore, we already reset the VCPU on INIT, only fixing up the code
segment later on when SIPI arrives. Moreover, we fix INIT handling for
the BSP: it never enters wait-for-SIPI but directly starts over on INIT.
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

12 Mar, 2013 1 commit

KVM: x86: Drop unused return code from VCPU reset callback · 57f252f2

Committed by Jan Kiszka on Mar 12, 2013

Neither vmx nor svm nor the common part may generate an error on
kvm_vcpu_reset. So drop the return code.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

11 Mar, 2013 1 commit

kvm: remove cast for kmalloc return value · 0fa24ce3

Committed by Ioan Orghici on Mar 10, 2013

Signed-off-by: Ioan Orghici <ioan.orghici@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

08 Mar, 2013 2 commits

KVM: nVMX: Fix setting of CR0 and CR4 in guest mode · 1a0d74e6

Committed by Jan Kiszka on Mar 07, 2013

The logic for calculating the value with which we call kvm_set_cr0/4 was
broken (will definitely be visible with nested unrestricted guest mode
support). Also, we performed the check regarding CR0_ALWAYSON too early
when in guest mode.

What really needs to be done on both CR0 and CR4 is to mask out L1-owned
bits and merge them in from L1's guest_cr0/4. In contrast, arch.cr0/4
and arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and,
thus, are not suited as input.

For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and
refuse the update if it fails. To be fully consistent, we implement this
check now also for CR4. For CR4, we move the check into vmx_set_cr4
while we keep it in handle_set_cr0. This is because the CR0 checks for
vmxon vs. guest mode will diverge soon when adding unrestricted guest
mode support.

Finally, we have to set the shadow to the value L2 wanted to write
originally.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
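
Illustrative sketch (not the patch itself) of the mask-and-merge step followed by the always-on check. The function names are hypothetical, and the always-on mask is passed in as a parameter rather than hard-coding VMXON_CRx_ALWAYSON.

```c
#include <stdbool.h>

/* Take L1-owned bits from L1's guest_cr0/4, everything else from L2's write. */
static unsigned long merge_l1_owned_bits(unsigned long l2_val,
                                         unsigned long guest_host_mask,
                                         unsigned long l1_guest_cr)
{
    return (l2_val & ~guest_host_mask) | (l1_guest_cr & guest_host_mask);
}

/* The update is refused unless every always-on bit is set in the result. */
static bool cr_update_allowed(unsigned long merged, unsigned long always_on)
{
    return (merged & always_on) == always_on;
}
```

With l2_val = 0x31, guest_host_mask = 0x21 and l1_guest_cr = 0x01, the merged value is 0x11: bits 0 and 5 come from L1, bit 4 from L2's write.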

KVM: nVMX: Fix content of MSR_IA32_VMX_ENTRY/EXIT_CTLS · 33fb20c3

Committed by Jan Kiszka on Mar 06, 2013

Properly set those bits to 1 that the spec demands in case bit 55 of
VMX_BASIC is 0 - like in our case.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

06 Mar, 2013 1 commit

KVM: nVMX: Reset RFLAGS on VM-exit · c4627c72

Committed by Jan Kiszka on Mar 03, 2013

Ouch, how could this work so well that far? We need to clear RFLAGS to
the reset value as specified by the SDM. Particularly, IF must be off
after VM-exit!
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

05 Mar, 2013 2 commits

KVM: nVMX: Fix switching of debug state · 503cd0c5

Committed by Jan Kiszka on Mar 03, 2013

First of all, do not blindly overwrite GUEST_DR7 on L2 entry. The host
may have guest debugging enabled. Then properly reset DR7 and DEBUG_CTL
on L2->L1 switch as specified in the SDM.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: set_memory_region: Drop user_alloc from set_memory_region() · 47ae31e2

Committed by Takuya Yoshikawa on Feb 27, 2013

Except for ia64's stale code, KVM_SET_MEMORY_REGION support, this is only
used for sanity checks in __kvm_set_memory_region() which can easily
be changed to use slot id instead.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

28 Feb, 2013 2 commits

KVM: VMX: Pass vcpu to __vmx_complete_interrupts · 3ab66e8a

Committed by Jan Kiszka on Feb 20, 2013

Cleanup: __vmx_complete_interrupts has no use for the vmx structure.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: nVMX: Avoid one redundant vmcs_read in prepare_vmcs12 · 44ceb9d6

Committed by Jan Kiszka on Feb 20, 2013

IDT_VECTORING_INFO_FIELD was already read right after vmexit.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
