1. 20 Sep 2012 (3 commits)
    • KVM: MMU: Optimize pte permission checks · 97d64b78
      Committed by Avi Kivity
      walk_addr_generic() permission checks are a maze of branchy code, which is
      performed four times per lookup.  It depends on the type of access, efer.nxe,
      cr0.wp, cr4.smep, and in the near future, cr4.smap.
      
      Optimize this away by precalculating all variants and storing them in a
      bitmap.  The bitmap is recalculated when rarely-changing variables change
      (cr0, cr4) and is indexed by the often-changing variables (page fault error
      code, pte access permissions).
      
      The permission check is moved to the end of the loop, otherwise an SMEP
      fault could be reported as a false positive, when PDE.U=1 but PTE.U=0.
      Noted by Xiao Guangrong.
      
      The result is short, branch-free code.
      Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: MMU: Move gpte_access() out of paging_tmpl.h · 3d34adec
      Committed by Avi Kivity
      We no longer rely on paging_tmpl.h defines, so the function can move
      to mmu.c.
      
      Rely on zero extension to 64 bits to get the correct nx behaviour.
      Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: MMU: Push clean gpte write protection out of gpte_access() · 8ea667f2
      Committed by Avi Kivity
      gpte_access() computes the access permissions of a guest pte and also
      write-protects clean gptes.  This is wrong when we are servicing a
      write fault (since we'll be setting the dirty bit momentarily) but
      correct when instantiating a speculative spte, or when servicing a
      read fault (since we'll want to trap a following write in order to
      set the dirty bit).
      
      It doesn't seem to hurt in practice, but in order to make the code
      readable, push the write protection out of gpte_access() and into
      a new protect_clean_gpte() which is called explicitly when needed.
      Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  2. 10 Sep 2012 (1 commit)
  3. 22 Aug 2012 (3 commits)
  4. 06 Aug 2012 (5 commits)
  5. 26 Jul 2012 (1 commit)
  6. 20 Jul 2012 (3 commits)
  7. 19 Jul 2012 (8 commits)
  8. 11 Jul 2012 (7 commits)
  9. 09 Jul 2012 (1 commit)
    • KVM: MMU: Force cr3 reload with two dimensional paging on mov cr3 emulation · e676505a
      Committed by Avi Kivity
      Currently the MMU's ->new_cr3() callback does nothing when guest paging
      is disabled or when two-dimensional paging (e.g. EPT on Intel) is active.
      This means that an emulated write to cr3 can be lost; kvm_set_cr3() will
      write vcpu->arch.cr3, but the GUEST_CR3 field in the VMCS will retain its
      old value and this is what the guest sees.
      
      This bug did not have any effect until now because:
      - with unrestricted guest, or with svm, we never emulate a mov cr3 instruction
      - without unrestricted guest, and with paging enabled, we also never emulate a
        mov cr3 instruction
      - without unrestricted guest, but with paging disabled, the guest's cr3 is
        ignored until the guest enables paging; at this point the value from arch.cr3
        is loaded correctly by the mov cr0 instruction which turns on paging
      
      However, the patchset that enables big real mode causes us to emulate mov cr3
      instructions in protected mode sometimes (when guest state is not virtualizable
      by vmx); this mov cr3 is effectively ignored and will crash the guest.
      
      The fix is to make nonpaging_new_cr3() call mmu_free_roots() to force a cr3
      reload.  This is awkward because now all the new_cr3 callbacks do the same
      thing, and because mmu_free_roots() is somewhat of an overkill; but fixing
      that is more complicated and will be done after this minimal fix.
      
      Observed in the Windows XP 32-bit installer while bringing up secondary vcpus.
      Signed-off-by: Avi Kivity <avi@redhat.com>
  10. 04 Jul 2012 (1 commit)
    • KVM: MMU: fix shrinking page from the empty mmu · 85b70591
      Committed by Xiao Guangrong
      Fix:
      
       [ 3190.059226] BUG: unable to handle kernel NULL pointer dereference at           (null)
       [ 3190.062224] IP: [<ffffffffa02aac66>] mmu_page_zap_pte+0x10/0xa7 [kvm]
       [ 3190.063760] PGD 104f50067 PUD 112bea067 PMD 0
       [ 3190.065309] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
       [ 3190.066860] CPU 1
      [ ...... ]
       [ 3190.109629] Call Trace:
       [ 3190.111342]  [<ffffffffa02aada6>] kvm_mmu_prepare_zap_page+0xa9/0x1fc [kvm]
       [ 3190.113091]  [<ffffffffa02ab2f5>] mmu_shrink+0x11f/0x1f3 [kvm]
       [ 3190.114844]  [<ffffffffa02ab25d>] ? mmu_shrink+0x87/0x1f3 [kvm]
       [ 3190.116598]  [<ffffffff81150c9d>] ? prune_super+0x142/0x154
       [ 3190.118333]  [<ffffffff8110a4f4>] ? shrink_slab+0x39/0x31e
       [ 3190.120043]  [<ffffffff8110a687>] shrink_slab+0x1cc/0x31e
       [ 3190.121718]  [<ffffffff8110ca1d>] do_try_to_free_pages
      
      This is caused by shrinking pages from an empty mmu: although we checked
      n_used_mmu_pages, the check is useless because it is done outside mmu_lock.
      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  11. 14 Jun 2012 (1 commit)
  12. 12 Jun 2012 (1 commit)
  13. 06 Jun 2012 (1 commit)
    • KVM: disable uninitialized var warning · 79f702a6
      Committed by Michael S. Tsirkin
      I see this in 3.5-rc1:
      
      arch/x86/kvm/mmu.c: In function ‘kvm_test_age_rmapp’:
      arch/x86/kvm/mmu.c:1271: warning: ‘iter.desc’ may be used uninitialized in this function
      
      The line in question was introduced by commit 1e3f42f0:
      
       static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
                                    unsigned long data)
       {
      -       u64 *spte;
      +       u64 *sptep;
      +       struct rmap_iterator iter;   <- line 1271
              int young = 0;
      
              /*
      
      The reason, I think, is that the compiler assumes that
      the rmap value could be 0, so
      
       static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
      {
              if (!rmap)
                      return NULL;
      
              if (!(rmap & 1)) {
                      iter->desc = NULL;
                      return (u64 *)rmap;
              }
      
              iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
              iter->pos = 0;
              return iter->desc->sptes[iter->pos];
      }
      
      will not initialize iter.desc, but the compiler isn't
      smart enough to see that
      
              for (sptep = rmap_get_first(*rmapp, &iter); sptep;
                   sptep = rmap_get_next(&iter)) {
      
      will immediately exit in this case.
      I checked by adding
              if (!*rmapp)
                      goto out;
      on top which is clearly equivalent but disables the warning.
      
      This patch uses uninitialized_var to disable the warning without
      increasing code size.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  14. 05 Jun 2012 (2 commits)
  15. 28 May 2012 (1 commit)
  16. 17 May 2012 (1 commit)
    • KVM: MMU: Don't use RCU for lockless shadow walking · c142786c
      Committed by Avi Kivity
      Using RCU for lockless shadow walking can increase the amount of memory
      in use by the system, since RCU grace periods are unpredictable.  We also
      have an unconditional write to a shared variable (reader_counter), which
      isn't good for scaling.
      
      Replace that with a scheme similar to x86's get_user_pages_fast(): disable
      interrupts during lockless shadow walk to force the freer
      (kvm_mmu_commit_zap_page()) to wait for the TLB flush IPI to find the
      processor with interrupts enabled.
      
      We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
      kvm_flush_remote_tlbs() from avoiding the IPI.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>