- 07 8月, 2013 7 次提交
-
-
由 Yang Zhang 提交于
Inject nEPT fault to L1 guest. This patch is original from Xinhao. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NXinhao Xu <xinhao.xu@intel.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yang Zhang 提交于
Since nEPT doesn't support A/D bit, so we should not set those bit when build shadow page table. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Nadav Har'El 提交于
This is the first patch in a series which adds nested EPT support to KVM's nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest to set its own cr3 and take its own page faults without either of L0 or L1 getting involved. This often significanlty improves L2's performance over the previous two alternatives (shadow page tables over EPT, and shadow page tables over shadow page tables). This patch adds EPT support to paging_tmpl.h. paging_tmpl.h contains the code for reading and writing page tables. The code for 32-bit and 64-bit tables is very similar, but not identical, so paging_tmpl.h is #include'd twice in mmu.c, once with PTTTYPE=32 and once with PTTYPE=64, and this generates the two sets of similar functions. There are subtle but important differences between the format of EPT tables and that of ordinary x86 64-bit page tables, so for nested EPT we need a third set of functions to read the guest EPT table and to write the shadow EPT table. So this patch adds third PTTYPE, PTTYPE_EPT, which creates functions (prefixed with "EPT") which correctly read and write EPT tables. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NXinhao Xu <xinhao.xu@intel.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Gleb Natapov 提交于
Some guest paging modes do not support A/D bits. Add support for such modes in shadow page code. For such modes PT_GUEST_DIRTY_MASK, PT_GUEST_ACCESSED_MASK, PT_GUEST_DIRTY_SHIFT and PT_GUEST_ACCESSED_SHIFT should be set to zero. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Gleb Natapov 提交于
This patch makes guest A/D bits definition to be dependable on paging mode, so when EPT support will be added it will be able to define them differently. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Nadav Har'El 提交于
For preparation, we just move gpte_access(), prefetch_invalid_gpte(), s_rsvd_bits_set(), protect_clean_gpte() and is_dirty_gpte() from mmu.c to paging_tmpl.h. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NXinhao Xu <xinhao.xu@intel.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiao Guangrong 提交于
Current code always uses arch.mmu to check the reserved bits on guest gpte which is valid only for L1 guest, we should use arch.nested_mmu instead when we translate gva to gpa for the L2 guest Fix it by using @mmu instead since it is adapted to the current mmu mode automatically The bug can be triggered when nested npt is used and L1 guest and L2 guest use different mmu mode Reported-by: NJan Kiszka <jan.kiszka@siemens.com> Reviewed-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 27 6月, 2013 2 次提交
-
-
由 Xiao Guangrong 提交于
This patch tries to introduce a very simple and scale way to invalidate all mmio sptes - it need not walk any shadow pages and hold mmu-lock KVM maintains a global mmio valid generation-number which is stored in kvm->memslots.generation and every mmio spte stores the current global generation-number into his available bits when it is created When KVM need zap all mmio sptes, it just simply increase the global generation-number. When guests do mmio access, KVM intercepts a MMIO #PF then it walks the shadow page table and get the mmio spte. If the generation-number on the spte does not equal the global generation-number, it will go to the normal #PF handler to update the mmio spte Since 19 bits are used to store generation-number on mmio spte, we zap all mmio sptes when the number is round Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: NGleb Natapov <gleb@redhat.com> Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiao Guangrong 提交于
Store the generation-number into bit3 ~ bit11 and bit52 ~ bit61, totally 19 bits can be used, it should be enough for nearly all most common cases In this patch, the generation-number is always 0, it will be changed in the later patch [Gleb: masking generation bits from spte in get_mmio_spte_gfn() and get_mmio_spte_access()] Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: NGleb Natapov <gleb@redhat.com> Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 07 4月, 2013 1 次提交
-
-
由 Takuya Yoshikawa 提交于
With the following commit, shadow pages can be zapped at random during a shadow page talbe walk: KVM: MMU: Move kvm_mmu_free_some_pages() into kvm_mmu_alloc_page() 7ddca7e4 This patch reverts it and fixes __direct_map() and FNAME(fetch)(). Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
- 22 3月, 2013 1 次提交
-
-
由 Takuya Yoshikawa 提交于
What this function is doing is to ensure that the number of shadow pages does not exceed the maximum limit stored in n_max_mmu_pages: so this is placed at every code path that can reach kvm_mmu_alloc_page(). Although it might have some sense to spread this function in each such code path when it could be called before taking mmu_lock, the rule was changed not to do so. Taking this background into account, this patch moves it into kvm_mmu_alloc_page() and simplifies the code. Note: the unlikely hint in kvm_mmu_free_some_pages() guarantees that the overhead of this function is almost zero except when we actually need to allocate some shadow pages, so we do not need to care about calling it multiple times in one path by doing kvm_mmu_get_page() a few times. Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 07 2月, 2013 1 次提交
-
-
由 Xiao Guangrong 提交于
It is only used in debug code, so drop it Reviewed-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 05 2月, 2013 1 次提交
-
-
由 Gleb Natapov 提交于
Gust page walker puts only present ptes into ptes[] array. No need to check it again. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 22 1月, 2013 1 次提交
-
-
由 Xiao Guangrong 提交于
The current reexecute_instruction can not well detect the failed instruction emulation. It allows guest to retry all the instructions except it accesses on error pfn For example, some cases are nested-write-protect - if the page we want to write is used as PDE but it chains to itself. Under this case, we should stop the emulation and report the case to userspace Reviewed-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 11 1月, 2013 2 次提交
-
-
由 Xiao Guangrong 提交于
We have two issues in current code: - if target gfn is used as its page table, guest will refault then kvm will use small page size to map it. We need two #PF to fix its shadow page table - sometimes, say a exception is triggered during vm-exit caused by #PF (see handle_exception() in vmx.c), we remove all the shadow pages shadowed by the target gfn before go into page fault path, it will cause infinite loop: delete shadow pages shadowed by the gfn -> try to use large page size to map the gfn -> retry the access ->... To fix these, we can adjust page size early if the target gfn is used as page table Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Xiao Guangrong 提交于
If the write-fault access is from supervisor and CR0.WP is not set on the vcpu, kvm will fix it by adjusting pte access - it sets the W bit on pte and clears U bit. This is the chance that kvm can change pte access from readonly to writable Unfortunately, the pte access is the access of 'direct' shadow page table, means direct sp.role.access = pte_access, then we will create a writable spte entry on the readonly shadow page table. It will cause Dirty bit is not tracked when two guest ptes point to the same large page. Note, it does not have other impact except Dirty bit since cr0.wp is encoded into sp.role It can be fixed by adjusting pte access before establishing shadow page table. Also, after that, no mmu specified code exists in the common function and drop two parameters in set_spte Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 09 1月, 2013 1 次提交
-
-
由 Gleb Natapov 提交于
Fix compilation warning. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 08 1月, 2013 1 次提交
-
-
由 Gleb Natapov 提交于
MMU code tries to avoid if()s HW is not able to predict reliably by using bitwise operation to streamline code execution, but in case of a dirty bit folding this gives us nothing since write_fault is checked right before the folding code. Lets just piggyback onto the if() to make code more clear. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 30 10月, 2012 1 次提交
-
-
由 Xiao Guangrong 提交于
This patch filters noslot pfn out from error pfns based on Marcelo comment: noslot pfn is not a error pfn After this patch, - is_noslot_pfn indicates that the gfn is not in slot - is_error_pfn indicates that the gfn is in slot but the error is occurred when translate the gfn to pfn - is_error_noslot_pfn indicates that the pfn either it is error pfns or it is noslot pfn And is_invalid_pfn can be removed, it makes the code more clean Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 23 10月, 2012 1 次提交
-
-
由 Christoffer Dall 提交于
The mmu_notifier_retry is not specific to any vcpu (and never will be) so only take struct kvm as a parameter. The motivation is the ARM mmu code that needs to call this from somewhere where we long let go of the vcpu pointer. Signed-off-by: NChristoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 17 10月, 2012 4 次提交
-
-
由 Xiao Guangrong 提交于
The only difference between FNAME(update_pte) and FNAME(pte_prefetch) is that the former is allowed to prefetch gfn from dirty logged slot, so introduce a common function to prefetch spte Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
The function does not depend on guest mmu mode, move it out from paging_tmpl.h Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
Let it return emulate state instead of spte like __direct_map Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
Remove mmu_is_invalid and use is_invalid_pfn instead Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 20 9月, 2012 10 次提交
-
-
由 Avi Kivity 提交于
'ac' essentially reconstructs the 'access' variable we already have, except for the PFERR_PRESENT_MASK and PFERR_RSVD_MASK. As these are not used by callees, just use 'access' directly. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
Keep track of accessed/dirty bits; if they are all set, do not enter the accessed/dirty update loop. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
'eperm' is no longer used in the walker loop, so we can eliminate it. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
Instead of branchy code depending on level, gpte.ps, and mmu configuration, prepare everything in a bitmap during mode changes and look it up during runtime. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
The page table walk is coded as an infinite loop, with a special case on the last pte. Code it as an ordinary loop with a termination condition on the last pte (large page or walk length exhausted), and put the last pte handling code after the loop where it belongs. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
walk_addr_generic() permission checks are a maze of branchy code, which is performed four times per lookup. It depends on the type of access, efer.nxe, cr0.wp, cr4.smep, and in the near future, cr4.smap. Optimize this away by precalculating all variants and storing them in a bitmap. The bitmap is recalculated when rarely-changing variables change (cr0, cr4) and is indexed by the often-changing variables (page fault error code, pte access permissions). The permission check is moved to the end of the loop, otherwise an SMEP fault could be reported as a false positive, when PDE.U=1 but PTE.U=0. Noted by Xiao Guangrong. The result is short, branch-free code. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
While unspecified, the behaviour of Intel processors is to first perform the page table walk, then, if the walk was successful, to atomically update the accessed and dirty bits of walked paging elements. While we are not required to follow this exactly, doing so will allow us to perform the access permissions check after the walk is complete, rather than after each walk step. (the tricky case is SMEP: a zero in any pte's U bit makes the referenced page a supervisor page, so we can't fault on a one bit during the walk itself). Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
We no longer rely on paging_tmpl.h defines; so we can move the function to mmu.c. Rely on zero extension to 64 bits to get the correct nx behaviour. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
If nx is disabled, then is gpte[63] is set we will hit a reserved bit set fault before checking permissions; so we can ignore the setting of efer.nxe. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
gpte_access() computes the access permissions of a guest pte and also write-protects clean gptes. This is wrong when we are servicing a write fault (since we'll be setting the dirty bit momentarily) but correct when instantiating a speculative spte, or when servicing a read fault (since we'll want to trap a following write in order to set the dirty bit). It doesn't seem to hurt in practice, but in order to make the code readable, push the write protection out of gpte_access() and into a new protect_clean_gpte() which is called explicitly when needed. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 06 8月, 2012 1 次提交
-
-
由 Xiao Guangrong 提交于
After commit a2766325, the error pfn is replaced by the error code, it need not be released anymore [ The patch has been compiling tested for powerpc ] Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 11 7月, 2012 1 次提交
-
-
由 Xiao Guangrong 提交于
The P bit of page fault error code is missed in this tracepoint, fix it by passing the full error code Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 19 4月, 2012 1 次提交
-
-
由 Davidlohr Bueso 提交于
Its much cleaner to use PT_PAGE_TABLE_LEVEL than its numeric value. Signed-off-by: NDavidlohr Bueso <dave@gnu.org> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 20 3月, 2012 1 次提交
-
-
由 Cong Wang 提交于
Acked-by: NAvi Kivity <avi@redhat.com> Acked-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NCong Wang <amwang@redhat.com>
-
- 27 12月, 2011 2 次提交
-
-
由 Xiao Guangrong 提交于
The tracepoint is only used to audit mmu code, it should not be exposed to user, let us replace it with jump-label. Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
Detecting write-flooding does not work well, when we handle page written, if the last speculative spte is not accessed, we treat the page is write-flooding, however, we can speculative spte on many path, such as pte prefetch, page synced, that means the last speculative spte may be not point to the written page and the written page can be accessed via other sptes, so depends on the Accessed bit of the last speculative spte is not enough Instead of detected page accessed, we can detect whether the spte is accessed after it is written, if the spte is not accessed but it is written frequently, we treat is not a page table or it not used for a long time Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-