- 07 8月, 2013 13 次提交
-
-
由 Nadav Har'El 提交于
KVM's existing shadow MMU code already supports nested TDP. To use it, we need to set up a new "MMU context" for nested EPT, and create a few callbacks for it (nested_ept_*()). This context should also use the EPT versions of the page table access functions (defined in the previous patch). Then, we need to switch back and forth between this nested context and the regular MMU context when switching between L1 and L2 (when L1 runs this L2 with EPT). Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NXinhao Xu <xinhao.xu@intel.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yang Zhang 提交于
Inject nEPT fault to L1 guest. This patch is original from Xinhao. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NXinhao Xu <xinhao.xu@intel.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Gleb Natapov 提交于
need_remote_flush() assumes that shadow page is in PT64 format, but with addition of nested EPT this is no longer always true. Fix it by bits definitions that depend on host shadow page type. Reported-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yang Zhang 提交于
Since nEPT doesn't support A/D bit, so we should not set those bit when build shadow page table. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Nadav Har'El 提交于
This is the first patch in a series which adds nested EPT support to KVM's nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest to set its own cr3 and take its own page faults without either of L0 or L1 getting involved. This often significanlty improves L2's performance over the previous two alternatives (shadow page tables over EPT, and shadow page tables over shadow page tables). This patch adds EPT support to paging_tmpl.h. paging_tmpl.h contains the code for reading and writing page tables. The code for 32-bit and 64-bit tables is very similar, but not identical, so paging_tmpl.h is #include'd twice in mmu.c, once with PTTTYPE=32 and once with PTTYPE=64, and this generates the two sets of similar functions. There are subtle but important differences between the format of EPT tables and that of ordinary x86 64-bit page tables, so for nested EPT we need a third set of functions to read the guest EPT table and to write the shadow EPT table. So this patch adds third PTTYPE, PTTYPE_EPT, which creates functions (prefixed with "EPT") which correctly read and write EPT tables. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NXinhao Xu <xinhao.xu@intel.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Gleb Natapov 提交于
Some guest paging modes do not support A/D bits. Add support for such modes in shadow page code. For such modes PT_GUEST_DIRTY_MASK, PT_GUEST_ACCESSED_MASK, PT_GUEST_DIRTY_SHIFT and PT_GUEST_ACCESSED_SHIFT should be set to zero. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Gleb Natapov 提交于
This patch makes guest A/D bits definition to be dependable on paging mode, so when EPT support will be added it will be able to define them differently. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Nadav Har'El 提交于
For preparation, we just move gpte_access(), prefetch_invalid_gpte(), s_rsvd_bits_set(), protect_clean_gpte() and is_dirty_gpte() from mmu.c to paging_tmpl.h. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NXinhao Xu <xinhao.xu@intel.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Nadav Har'El 提交于
kvm_set_cr3() attempts to check if the new cr3 is a valid guest physical address. The problem is that with nested EPT, cr3 is an *L2* physical address, not an L1 physical address as this test expects. As the comment above this test explains, it isn't necessary, and doesn't correspond to anything a real processor would do. So this patch removes it. Note that this wrong test could have also theoretically caused problems in nested NPT, not just in nested EPT. However, in practice, the problem was avoided: nested_svm_vmexit()/vmrun() do not call kvm_set_cr3 in the nested NPT case, and instead set the vmcb (and arch.cr3) directly, thus circumventing the problem. Additional potential calls to the buggy function are avoided in that we don't trap cr3 modifications when nested NPT is enabled. However, because in nested VMX we did want to use kvm_set_cr3() (as requested in Avi Kivity's review of the original nested VMX patches), we can't avoid this problem and need to fix it. Reviewed-by: NOrit Wasserman <owasserm@redhat.com> Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NXinhao Xu <xinhao.xu@intel.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Nadav Har'El 提交于
The existing code for handling cr3 and related VMCS fields during nested exit and entry wasn't correct in all cases: If L2 is allowed to control cr3 (and this is indeed the case in nested EPT), during nested exit we must copy the modified cr3 from vmcs02 to vmcs12, and we forgot to do so. This patch adds this copy. If L0 isn't controlling cr3 when running L2 (i.e., L0 is using EPT), and whoever does control cr3 (L1 or L2) is using PAE, the processor might have saved PDPTEs and we should also save them in vmcs12 (and restore later). Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: NOrit Wasserman <owasserm@redhat.com> Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NXinhao Xu <xinhao.xu@intel.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Nadav Har'El 提交于
Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577 switch the EFER MSR when EPT is used and the host and guest have different NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2) and want to be able to run recent KVM as L1, we need to allow L1 to use this EFER switching feature. To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available, and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds support for the former (the latter is still unsupported). Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state, respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all that's left to do in this patch is to properly advertise this feature to L1. Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using vmx_set_efer (which itself sets one of several vmcs02 fields), so we always support this feature, regardless of whether the host supports it. Reviewed-by: NOrit Wasserman <owasserm@redhat.com> Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NXinhao Xu <xinhao.xu@intel.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiao Guangrong 提交于
Current code always uses arch.mmu to check the reserved bits on guest gpte which is valid only for L1 guest, we should use arch.nested_mmu instead when we translate gva to gpa for the L2 guest Fix it by using @mmu instead since it is adapted to the current mmu mode automatically The bug can be triggered when nested npt is used and L1 guest and L2 guest use different mmu mode Reported-by: NJan Kiszka <jan.kiszka@siemens.com> Reviewed-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Gleb Natapov 提交于
After commit 21feb4eb tr base is zeroed during vmexit. Set it to L1's HOST_TR_BASE. This should fix https://bugzilla.kernel.org/show_bug.cgi?id=60679Reported-by: NYongjie Ren <yongjie.ren@intel.com> Reviewed-by: NArthur Chunqi Li <yzt356@gmail.com> Tested-by: NYongjie Ren <yongjie.ren@intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 29 7月, 2013 11 次提交
-
-
由 Gleb Natapov 提交于
During nested vmentry into vm86 mode a vcpu state is found to be incorrect because rflags does not have VM flag set since it is read from the cache and has L1's value instead of L2's. If emulate_invalid_guest_state=1 L0 KVM tries to emulate it, but emulation does not work for nVMX and it never should happen anyway. Fix that by using vmx_set_rflags() to set rflags during nested vmentry which takes care of updating register cache. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Dominik Dingel 提交于
Current common code uses PAGE_OFFSET to indicate a bad host virtual address. As this check won't work on architectures that don't map kernel and user memory into the same address space (e.g. s390), such architectures can now provide their own KVM_HVA_ERR_BAD defines. Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Thomas Huth 提交于
Introduced a helper function for setting the CC in the guest PSW to improve the readability of the code. Signed-off-by: NThomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Thomas Huth 提交于
sparse complained about the missing UL postfix for long constants. Signed-off-by: NThomas Huth <thuth@linux.vnet.ibm.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Michael Mueller 提交于
The patch renames the array holding the HW facility bitmaps. This allows to interprete the variable as set of virtual machine specific "virtual" facilities. The basic idea is to make virtual facilities externally managable in future. An availability test for virtual facilites has been added as well. Signed-off-by: NMichael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Martin Schwidefsky 提交于
The gmap_map_segment function uses PGDIR_SIZE in the check for the maximum address in the tasks address space. This incorrectly limits the amount of memory usable for a kvm guest to 4TB. The correct limit is (1UL << 53). As the TASK_SIZE has different values (4TB vs 8PB) dependent on the existance of the fourth page table level, create a new define 'TASK_MAX_SIZE' for (1UL << 53). Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Martin Schwidefsky 提交于
Improve the code to upgrade the standard 2K page tables to 4K page tables with PGSTEs to allow the operation to happen when the program is already multi-threaded. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This lets debugging work better during emulation of invalid guest state. This time the check is done after emulation, but before writeback of the flags; we need to check the flags *before* execution of the instruction, we cannot check singlestep_rip because the CS base may have already been modified. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Conflicts: arch/x86/kvm/x86.c
-
由 Paolo Bonzini 提交于
This lets debugging work better during emulation of invalid guest state. The check is done before emulating the instruction, and (in the case of guest debugging) reuses EMULATE_DO_MMIO to exit with KVM_EXIT_DEBUG. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The next patch will reuse it for other userspace exits than MMIO, namely debug events. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
kvm_io_bus_sort_cmp is used also directly, not just as a callback for sort and bsearch. In these cases, it is handy to have a type-safe variant. This patch introduces such a variant, __kvm_io_bus_sort_cmp, and uses it throughout kvm_main.c. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 25 7月, 2013 2 次提交
-
-
由 Jan Kiszka 提交于
Both have no users anymore. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Jan Kiszka 提交于
If posted interrupts are enabled, we can no longer track if an IRQ was coalesced based on IRR. So drop this logic also from the classic software path and simplify apic_test_and_set_irr to apic_set_irr. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
- 20 7月, 2013 1 次提交
-
-
由 Andi Kleen 提交于
[KVM maintainers: The underlying support for this is in perf/core now. So please merge this patch into the KVM tree.] This is not arch perfmon, but older CPUs will just ignore it. This makes it possible to do at least some TSX measurements from a KVM guest v2: Various fixes to address review feedback v3: Ignore the bits when no CPUID. No #GP. Force raw events with TSX bits. v4: Use reserved bits for #GP v5: Remove obsolete argument Acked-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 18 7月, 2013 12 次提交
-
-
由 Arthur Chunqi Li 提交于
When L2 exits to L1, segment infomations of L1 are not set correctly. According to Intel SDM 27.5.2(Loading Host Segment and Descriptor Table Registers), segment base/limit/access right of L1 should be set to some designed value when L2 exits to L1. This patch fixes this. Signed-off-by: NArthur Chunqi Li <yzt356@gmail.com> Reviewed-by: NGleb Natapov <gnatapov@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Marcelo Tosatti 提交于
Linux as a guest on KVM hypervisor, the only user of the pvclock vsyscall interface, does not require notification on task migration because: 1. cpu ID number maps 1:1 to per-CPU pvclock time info. 2. per-CPU pvclock time info is updated if the underlying CPU changes. 3. that version is increased whenever underlying CPU changes. Which is sufficient to guarantee nanoseconds counter is calculated properly. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Nadav Har'El 提交于
Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment. This patch simulate this MSR in nested_vmx and the default value is 0x0. BIOS should set it to 0x5 before VMXON. After setting the lock bit, write to it will cause #GP(0). Another QEMU patch is also needed to handle emulation of reset and migration. Reset to vCPU should clear this MSR and migration should reserve value of it. This patch is based on Nadav's previous commit. http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/88478Signed-off-by: NNadav Har'El <nyh@math.technion.ac.il> Signed-off-by: NArthur Chunqi Li <yzt356@gmail.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Mathias Krause 提交于
Void pointers don't need no casting, drop it. Signed-off-by: NMathias Krause <minipli@googlemail.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Mathias Krause 提交于
Use a const pointer type instead of casting away the const qualifier from const arrays. Keep the pointer array on the stack, nonetheless. Making it static just increases the object size. Signed-off-by: NMathias Krause <minipli@googlemail.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Arthur Chunqi Li 提交于
Set rflags after successfully emulateing VMXON/VMXOFF in VMX. Signed-off-by: NArthur Chunqi Li <yzt356@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Arthur Chunqi Li 提交于
Move nested_vmx_succeed/nested_vmx_failInvalid/nested_vmx_failValid ahead of handle_vmon to eliminate double declaration in the same file Signed-off-by: NArthur Chunqi Li <yzt356@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Takuya Yoshikawa 提交于
Now that kvm_arch_memslots_updated() catches every increment of the memslots->generation, checking if the mmio generation has reached its maximum value is enough. Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Takuya Yoshikawa 提交于
This is called right after the memslots is updated, i.e. when the result of update_memslots() gets installed in install_new_memslots(). Since the memslots needs to be updated twice when we delete or move a memslot, kvm_arch_commit_memory_region() does not correspond to this exactly. In the following patch, x86 will use this new API to check if the mmio generation has reached its maximum value, in which case mmio sptes need to be flushed out. Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Acked-by: NAlexander Graf <agraf@suse.de> Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Cornelia Huck 提交于
Make use of cookies for the virtio ccw notification hypercall to speed up lookup of devices on the io bus. Signed-off-by: NCornelia Huck <cornelia.huck@de.ibm.com> [Small fix to a comment. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Cornelia Huck 提交于
Add new functions kvm_io_bus_{read,write}_cookie() that allows users of the kvm io infrastructure to use a cookie value to speed up lookup of a device on an io bus. Signed-off-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Xiao Guangrong 提交于
Currently, fast page fault incorrectly tries to fix mmio page fault when the generation number is invalid (spte.gen != kvm.gen). It then returns to guest to retry the fault since it sees the last spte is nonpresent. This causes an infinite loop. Since fast page fault only works for direct mmu, the issue exists when 1) tdp is enabled. It is only triggered only on AMD host since on Intel host the mmio page fault is recognized as ept-misconfig whose handler call fault-page path with error_code = 0 2) guest paging is disabled. Under this case, the issue is hardly discovered since paging disable is short-lived and the sptes will be invalid after memslot changed for 150 times Fix it by filtering out MMIO page faults in page_fault_can_be_fast. Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Tested-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 7月, 2013 1 次提交
-
-
由 Linus Torvalds 提交于
-