- 07 8月, 2013 2 次提交
-
-
由 Nadav Har'El 提交于
Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577 switch the EFER MSR when EPT is used and the host and guest have different NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2) and want to be able to run recent KVM as L1, we need to allow L1 to use this EFER switching feature. To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available, and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds support for the former (the latter is still unsupported). Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state, respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all that's left to do in this patch is to properly advertise this feature to L1. Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using vmx_set_efer (which itself sets one of several vmcs02 fields), so we always support this feature, regardless of whether the host supports it. Reviewed-by: NOrit Wasserman <owasserm@redhat.com> Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NXinhao Xu <xinhao.xu@intel.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Gleb Natapov 提交于
After commit 21feb4eb tr base is zeroed during vmexit. Set it to L1's HOST_TR_BASE. This should fix https://bugzilla.kernel.org/show_bug.cgi?id=60679Reported-by: NYongjie Ren <yongjie.ren@intel.com> Reviewed-by: NArthur Chunqi Li <yzt356@gmail.com> Tested-by: NYongjie Ren <yongjie.ren@intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 29 7月, 2013 2 次提交
-
-
由 Gleb Natapov 提交于
During nested vmentry into vm86 mode a vcpu state is found to be incorrect because rflags does not have VM flag set since it is read from the cache and has L1's value instead of L2's. If emulate_invalid_guest_state=1 L0 KVM tries to emulate it, but emulation does not work for nVMX and it never should happen anyway. Fix that by using vmx_set_rflags() to set rflags during nested vmentry which takes care of updating register cache. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The next patch will reuse it for other userspace exits than MMIO, namely debug events. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 18 7月, 2013 5 次提交
-
-
由 Arthur Chunqi Li 提交于
When L2 exits to L1, segment infomations of L1 are not set correctly. According to Intel SDM 27.5.2(Loading Host Segment and Descriptor Table Registers), segment base/limit/access right of L1 should be set to some designed value when L2 exits to L1. This patch fixes this. Signed-off-by: NArthur Chunqi Li <yzt356@gmail.com> Reviewed-by: NGleb Natapov <gnatapov@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Nadav Har'El 提交于
Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment. This patch simulate this MSR in nested_vmx and the default value is 0x0. BIOS should set it to 0x5 before VMXON. After setting the lock bit, write to it will cause #GP(0). Another QEMU patch is also needed to handle emulation of reset and migration. Reset to vCPU should clear this MSR and migration should reserve value of it. This patch is based on Nadav's previous commit. http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/88478Signed-off-by: NNadav Har'El <nyh@math.technion.ac.il> Signed-off-by: NArthur Chunqi Li <yzt356@gmail.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Mathias Krause 提交于
Use a const pointer type instead of casting away the const qualifier from const arrays. Keep the pointer array on the stack, nonetheless. Making it static just increases the object size. Signed-off-by: NMathias Krause <minipli@googlemail.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Arthur Chunqi Li 提交于
Set rflags after successfully emulateing VMXON/VMXOFF in VMX. Signed-off-by: NArthur Chunqi Li <yzt356@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Arthur Chunqi Li 提交于
Move nested_vmx_succeed/nested_vmx_failInvalid/nested_vmx_failValid ahead of handle_vmon to eliminate double declaration in the same file Signed-off-by: NArthur Chunqi Li <yzt356@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 04 7月, 2013 1 次提交
-
-
由 Gleb Natapov 提交于
Some userspaces do not preserve unusable property. Since usable segment has to be present according to VMX spec we can use present property to amend userspace bug by making unusable segment always nonpresent. vmx_segment_access_rights() already marks nonpresent segment as unusable. Cc: stable@vger.kernel.org # 3.9+ Reported-by: NStefan Pietsch <stefan.pietsch@lsexperts.de> Tested-by: NStefan Pietsch <stefan.pietsch@lsexperts.de> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 27 6月, 2013 3 次提交
-
-
由 Yoshihiro YUNOMAE 提交于
Add a tracepoint write_tsc_offset for tracing TSC offset change. We want to merge ftrace's trace data of guest OSs and the host OS using TSC for timestamp in chronological order. We need "TSC offset" values for each guest when merge those because the TSC value on a guest is always the host TSC plus guest's TSC offset. If we get the TSC offset values, we can calculate the host TSC value for each guest events from the TSC offset and the event TSC value. The host TSC values of the guest events are used when we want to merge trace data of guests and the host in chronological order. (Note: the trace_clock of both the host and the guest must be set x86-tsc in this case) This tracepoint also records vcpu_id which can be used to merge trace data for SMP guests. A merge tool will read TSC offset for each vcpu, then the tool converts guest TSC values to host TSC values for each vcpu. TSC offset is stored in the VMCS by vmx_write_tsc_offset() or vmx_adjust_tsc_offset(). KVM executes the former function when a guest boots. The latter function is executed when kvm clock is updated. Only host can read TSC offset value from VMCS, so a host needs to output TSC offset value when TSC offset is changed. Since the TSC offset is not often changed, it could be overwritten by other frequent events while tracing. To avoid that, I recommend to use a special instance for getting this event: 1. set a instance before booting a guest # cd /sys/kernel/debug/tracing/instances # mkdir tsc_offset # cd tsc_offset # echo x86-tsc > trace_clock # echo 1 > events/kvm/kvm_write_tsc_offset/enable 2. boot a guest Signed-off-by: NYoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Xiao Guangrong 提交于
This patch tries to introduce a very simple and scale way to invalidate all mmio sptes - it need not walk any shadow pages and hold mmu-lock KVM maintains a global mmio valid generation-number which is stored in kvm->memslots.generation and every mmio spte stores the current global generation-number into his available bits when it is created When KVM need zap all mmio sptes, it just simply increase the global generation-number. When guests do mmio access, KVM intercepts a MMIO #PF then it walks the shadow page table and get the mmio spte. If the generation-number on the spte does not equal the global generation-number, it will go to the normal #PF handler to update the mmio spte Since 19 bits are used to store generation-number on mmio spte, we zap all mmio sptes when the number is round Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: NGleb Natapov <gleb@redhat.com> Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiao Guangrong 提交于
Define some meaningful names instead of raw code Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: NGleb Natapov <gleb@redhat.com> Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 26 6月, 2013 1 次提交
-
-
由 H. Peter Anvin 提交于
Bit 1 in the x86 EFLAGS is always set. Name the macro something that actually tries to explain what it is all about, rather than being a tautology. Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Gleb Natapov <gleb@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/n/tip-f10rx5vjjm6tfnt8o1wseb3v@git.kernel.org
-
- 21 6月, 2013 1 次提交
-
-
由 Xiao Guangrong 提交于
Let mmio spte only use bit62 and bit63 on upper 32 bits, then bit 52 ~ bit 61 can be used for other purposes Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: NGleb Natapov <gleb@redhat.com> Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 5月, 2013 1 次提交
-
-
由 Gleb Natapov 提交于
The invalid guest state emulation loop does not check halt_request which causes 100% cpu loop while guest is in halt and in invalid state, but more serious issue is that this leaves halt_request set, so random instruction emulated by vm86 #GP exit can be interpreted as halt which causes guest hang. Fix both problems by handling halt_request in emulation loop. Reported-by: NTomas Papan <tomas.papan@gmail.com> Tested-by: NTomas Papan <tomas.papan@gmail.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> CC: stable@vger.kernel.org Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
- 03 5月, 2013 1 次提交
-
-
由 Jan Kiszka 提交于
With VMX, enable_irq_window can now return -EBUSY, in which case an immediate exit shall be requested before entering the guest. Account for this also in enable_nmi_window which uses enable_irq_window in absence of vnmi support, e.g. Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 28 4月, 2013 3 次提交
-
-
由 Jan Kiszka 提交于
While a nested run is pending, vmx_queue_exception is only called to requeue exceptions that were previously picked up via vmx_cancel_injection. Therefore, we must not check for PF interception by L1, possibly causing a bogus nested vmexit. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Jan Kiszka 提交于
The VMX implementation of enable_irq_window raised KVM_REQ_IMMEDIATE_EXIT after we checked it in vcpu_enter_guest. This caused infinite loops on vmentry. Fix it by letting enable_irq_window signal the need for an immediate exit via its return value and drop KVM_REQ_IMMEDIATE_EXIT. This issue only affects nested VMX scenarios. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Jan Kiszka 提交于
Slipped in while copy&pasting from the SDM. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
- 22 4月, 2013 14 次提交
-
-
由 Jan Kiszka 提交于
If we load the complete EFER MSR on entry or exit, EFER.LMA (and LME) loading is skipped. Their consistency is already checked now before starting the transition. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Jan Kiszka 提交于
As we may emulate the loading of EFER on VM-entry and VM-exit, implement the checks that VMX performs on the guest and host values on vmlaunch/ vmresume. Factor out kvm_valid_efer for this purpose which checks for set reserved bits. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Jan Kiszka 提交于
The logic for checking if interrupts can be injected has to be applied also on NMIs. The difference is that if NMI interception is on these events are consumed and blocked by the VM exit. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Jan Kiszka 提交于
vmx_set_nmi_mask will soon be used by vmx_nmi_allowed. No functional changes. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Abel Gordon 提交于
Once L1 loads VMCS12 we enable shadow-vmcs capability and copy all the VMCS12 shadowed fields to the shadow vmcs. When we release the VMCS12, we also disable shadow-vmcs capability. Signed-off-by: NAbel Gordon <abelg@il.ibm.com> Reviewed-by: NOrit Wasserman <owasserm@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Abel Gordon 提交于
Synchronize between the VMCS12 software controlled structure and the processor-specific shadow vmcs Signed-off-by: NAbel Gordon <abelg@il.ibm.com> Reviewed-by: NOrit Wasserman <owasserm@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Abel Gordon 提交于
Introduce a function used to copy fields from the software controlled VMCS12 to the processor-specific shadow vmcs Signed-off-by: NAbel Gordon <abelg@il.ibm.com> Reviewed-by: NOrit Wasserman <owasserm@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Abel Gordon 提交于
Introduce a function used to copy fields from the processor-specific shadow vmcs to the software controlled VMCS12 Signed-off-by: NAbel Gordon <abelg@il.ibm.com> Reviewed-by: NOrit Wasserman <owasserm@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Abel Gordon 提交于
Unmap vmcs12 and release the corresponding shadow vmcs Signed-off-by: NAbel Gordon <abelg@il.ibm.com> Reviewed-by: NOrit Wasserman <owasserm@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Abel Gordon 提交于
Allocate a shadow vmcs used by the processor to shadow part of the fields stored in the software defined VMCS12 (let L1 access fields without causing exits). Note we keep a shadow vmcs only for the current vmcs12. Once a vmcs12 becomes non-current, its shadow vmcs is released. Signed-off-by: NAbel Gordon <abelg@il.ibm.com> Reviewed-by: NOrit Wasserman <owasserm@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Abel Gordon 提交于
handle_vmon doesn't check if L1 is already in root mode (VMXON was previously called). This patch adds this missing check and calls nested_vmx_failValid if VMX is already ON. We need this check because L0 will allocate the shadow vmcs when L1 executes VMXON and we want to avoid host leaks (due to shadow vmcs allocation) if L1 executes VMXON repeatedly. Signed-off-by: NAbel Gordon <abelg@il.ibm.com> Reviewed-by: NOrit Wasserman <owasserm@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Abel Gordon 提交于
Refactor existent code so we re-use vmcs12_write_any to copy fields from the shadow vmcs specified by the link pointer (used by the processor, implementation-specific) to the VMCS12 software format used by L0 to hold the fields in L1 memory address space. Signed-off-by: NAbel Gordon <abelg@il.ibm.com> Reviewed-by: NOrit Wasserman <owasserm@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Abel Gordon 提交于
Prepare vmread and vmwrite bitmaps according to a pre-specified list of fields. These lists are intended to specifiy most frequent accessed fields so we can minimize the number of fields that are copied from/to the software controlled VMCS12 format to/from to processor-specific shadow vmcs. The lists were built measuring the VMCS fields access rate after L2 Ubuntu 12.04 booted when it was running on top of L1 KVM, also Ubuntu 12.04. Note that during boot there were additional fields which were frequently modified but they were not added to these lists because after boot these fields were not longer accessed by L1. Signed-off-by: NAbel Gordon <abelg@il.ibm.com> Reviewed-by: NOrit Wasserman <owasserm@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Abel Gordon 提交于
Add logic required to detect if shadow-vmcs is supported by the processor. Introduce a new kernel module parameter to specify if L0 should use shadow vmcs (or not) to run L1. Signed-off-by: NAbel Gordon <abelg@il.ibm.com> Reviewed-by: NOrit Wasserman <owasserm@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
- 18 4月, 2013 1 次提交
-
-
由 Zhang, Yang Z 提交于
->send_IPI_mask is not defined on UP. Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 17 4月, 2013 5 次提交
-
-
由 Gleb Natapov 提交于
If guest vcpu is in VM86 mode the vcpu state should be checked as if in real mode. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Paolo Bonzini 提交于
KVM does not use the activity state VMCS field, and does not support it in nested VMX either (the corresponding bits in the misc VMX feature MSR are zero). Fail entry if the activity state is set to anything but "active". Since the value will always be the same for L1 and L2, we do not need to read and write the corresponding VMCS field on L1/L2 transitions, either. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NGleb Natapov <gleb@redhat.com> Reviewed-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Yang Zhang 提交于
If posted interrupt is avaliable, then uses it to inject virtual interrupt to guest. Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Reviewed-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Yang Zhang 提交于
Only deliver the posted interrupt when target vcpu is running and there is no previous interrupt pending in pir. Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Reviewed-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Yang Zhang 提交于
Both TMR and EOI exit bitmap need to be updated when ioapic changed or vcpu's id/ldr/dfr changed. So use common function instead eoi exit bitmap specific function. Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Reviewed-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-