1. 11 Jul, 2012 1 commit
  2. 09 Jul, 2012 1 commit
    •
      KVM: MMU: Force cr3 reload with two dimensional paging on mov cr3 emulation · e676505a
      Avi Kivity committed
      Currently the MMU's ->new_cr3() callback does nothing when guest paging
      is disabled or when two-dimensional paging (e.g. EPT on Intel) is active.
      This means that an emulated write to cr3 can be lost; kvm_set_cr3() will
      write vcpu->arch.cr3, but the GUEST_CR3 field in the VMCS will retain its
      old value and this is what the guest sees.
      
      This bug did not have any effect until now because:
      - with unrestricted guest, or with svm, we never emulate a mov cr3 instruction
      - without unrestricted guest, and with paging enabled, we also never emulate a
        mov cr3 instruction
      - without unrestricted guest, but with paging disabled, the guest's cr3 is
        ignored until the guest enables paging; at this point the value from arch.cr3
        is loaded correctly by the mov cr0 instruction which turns on paging
      
      However, the patchset that enables big real mode causes us to emulate mov cr3
      instructions in protected mode sometimes (when guest state is not virtualizable
      by vmx); this mov cr3 is effectively ignored and will crash the guest.
      
      The fix is to make nonpaging_new_cr3() call mmu_free_roots() to force a cr3
      reload.  This is awkward because now all the new_cr3 callbacks do the same
      thing, and because mmu_free_roots() is somewhat of an overkill; but fixing
      that is more complicated and will be done after this minimal fix.
      
      Observed in the Windows XP 32-bit installer while bringing up secondary vcpus.
      Signed-off-by: Avi Kivity <avi@redhat.com>
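The lost cr3 write described in this entry can be illustrated with a toy user-space model. This is only a sketch and not KVM code: every name below (toy_vcpu, toy_set_cr3, toy_enter_guest and so on) is hypothetical, and the assumption is that dropping the roots makes the next guest entry copy the software cr3 into the hardware field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in KVM. */
struct toy_vcpu {
	unsigned long arch_cr3;      /* software copy, like vcpu->arch.cr3 */
	unsigned long hw_guest_cr3;  /* stands in for GUEST_CR3 in the VMCS */
	bool roots_valid;            /* false forces a reload on next entry */
	void (*new_cr3)(struct toy_vcpu *vcpu);
};

/* Old behaviour: the callback does nothing. */
static void toy_new_cr3_noop(struct toy_vcpu *vcpu)
{
	(void)vcpu;
}

/* Fixed behaviour: drop the roots so the next entry reloads cr3. */
static void toy_new_cr3_free_roots(struct toy_vcpu *vcpu)
{
	vcpu->roots_valid = false;
}

/* Emulated mov cr3: update the software copy, then notify the MMU. */
static void toy_set_cr3(struct toy_vcpu *vcpu, unsigned long cr3)
{
	assert(vcpu->new_cr3 != NULL);
	vcpu->arch_cr3 = cr3;
	vcpu->new_cr3(vcpu);
}

/* Guest entry: the hardware field is rewritten only when roots are invalid. */
static void toy_enter_guest(struct toy_vcpu *vcpu)
{
	if (!vcpu->roots_valid) {
		vcpu->hw_guest_cr3 = vcpu->arch_cr3;
		vcpu->roots_valid = true;
	}
}

/* Returns the cr3 value the guest would see after an emulated mov cr3. */
static unsigned long toy_emulated_mov_cr3(void (*cb)(struct toy_vcpu *),
					  unsigned long new_cr3)
{
	struct toy_vcpu vcpu = { 0x1000, 0x1000, true, cb };

	toy_set_cr3(&vcpu, new_cr3);
	toy_enter_guest(&vcpu);
	return vcpu.hw_guest_cr3;
}
```

With the no-op callback the model keeps the stale value; with the callback that frees the roots, the new value reaches the hardware field on the next entry.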
  3. 14 Jun, 2012 1 commit
  4. 12 Jun, 2012 1 commit
  5. 06 Jun, 2012 1 commit
    •
      KVM: disable uninitialized var warning · 79f702a6
      Michael S. Tsirkin committed
      I see this in 3.5-rc1:
      
      arch/x86/kvm/mmu.c: In function ‘kvm_test_age_rmapp’:
      arch/x86/kvm/mmu.c:1271: warning: ‘iter.desc’ may be used uninitialized in this function
      
      The line in question was introduced by commit
      1e3f42f0
      
       static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
                                    unsigned long data)
       {
      -       u64 *spte;
      +       u64 *sptep;
      +       struct rmap_iterator iter;   <- line 1271
              int young = 0;
      
              /*
      
      The reason, I think, is that the compiler assumes that
      the rmap value could be 0, so
      
      static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
      {
              if (!rmap)
                      return NULL;
      
              if (!(rmap & 1)) {
                      iter->desc = NULL;
                      return (u64 *)rmap;
              }
      
              iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
              iter->pos = 0;
              return iter->desc->sptes[iter->pos];
      }
      
      will not initialize iter.desc, but the compiler isn't
      smart enough to see that
      
              for (sptep = rmap_get_first(*rmapp, &iter); sptep;
                   sptep = rmap_get_next(&iter)) {
      
      will immediately exit in this case.
      I checked by adding
              if (!*rmapp)
                      goto out;
      on top which is clearly equivalent but disables the warning.
      
      This patch uses uninitialized_var to disable the warning without
      increasing code size.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  6. 05 Jun, 2012 2 commits
  7. 28 May, 2012 1 commit
  8. 17 May, 2012 1 commit
    •
      KVM: MMU: Don't use RCU for lockless shadow walking · c142786c
      Avi Kivity committed
      Using RCU for lockless shadow walking can increase the amount of memory
      in use by the system, since RCU grace periods are unpredictable.  We also
      have an unconditional write to a shared variable (reader_counter), which
      isn't good for scaling.
      
      Replace that with a scheme similar to x86's get_user_pages_fast(): disable
      interrupts during lockless shadow walk to force the freer
      (kvm_mmu_commit_zap_page()) to wait for the TLB flush IPI to find the
      processor with interrupts enabled.
      
      We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
      kvm_flush_remote_tlbs() from avoiding the IPI.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  9. 19 Apr, 2012 1 commit
  10. 08 Apr, 2012 4 commits
    •
      KVM: MMU: Improve iteration through sptes from rmap · 1e3f42f0
      Takuya Yoshikawa committed
      Iteration using rmap_next(), whose actual body is pte_list_next(), is
      inefficient: every time we call it, we start by checking whether rmap
      holds a single spte or points to a descriptor which links more sptes.
      
      In the case of shadow paging, this quadratic total iteration cost is a
      problem.  Even for two dimensional paging, with EPT/NPT on, in which we
      almost always have a single mapping, the extra checks at the end of the
      iteration should be eliminated.
      
      This patch fixes this by introducing rmap_iterator which keeps the
      iteration context for the next search.  Furthermore the implementation
      of rmap_next() is split into two functions, rmap_get_first() and
      rmap_get_next(), to avoid repeatedly checking whether the rmap being
      iterated on has only one spte.
      
      Although there seemed to be only a slight change for EPT/NPT, the actual
      improvement was significant: we observed that GET_DIRTY_LOG for 1GB
      dirty memory became 15% faster than before.  This is probably because
      the new code is easier on the branch predictor.
      
      Note: we just remove pte_list_next() because we can think of parent_ptes
      as a reverse mapping.
      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: Avi Kivity <avi@redhat.com>
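The iterator split described in this entry can be sketched in user-space C. rmap_get_first() mirrors the function quoted in the 79f702a6 entry; everything else is an assumption made for illustration: the struct layouts, PTE_LIST_EXT being 3, the body of rmap_get_next(), and the count_sptes() helper. It is a minimal sketch, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_LIST_EXT 3  /* assumed value, following the 220f773a entry */

/* Assumed layout: up to PTE_LIST_EXT sptes plus a link to the next descriptor. */
struct pte_list_desc {
	uint64_t *sptes[PTE_LIST_EXT];
	struct pte_list_desc *more;
};

/* Iteration context kept between calls, so the rmap is decoded only once. */
struct rmap_iterator {
	struct pte_list_desc *desc;  /* current descriptor, NULL for a single spte */
	int pos;                     /* index inside the current descriptor */
};

/* Bit 0 clear: rmap is the single spte pointer. Bit 0 set: rmap points to a descriptor. */
static uint64_t *rmap_get_first(uintptr_t rmap, struct rmap_iterator *iter)
{
	if (!rmap)
		return NULL;

	if (!(rmap & 1)) {
		iter->desc = NULL;
		return (uint64_t *)rmap;
	}

	iter->desc = (struct pte_list_desc *)(rmap & ~(uintptr_t)1);
	iter->pos = 0;
	return iter->desc->sptes[iter->pos];
}

/* Hypothetical continuation: advance within the descriptor, then follow the link. */
static uint64_t *rmap_get_next(struct rmap_iterator *iter)
{
	if (iter->desc) {
		if (iter->pos < PTE_LIST_EXT - 1) {
			uint64_t *sptep = iter->desc->sptes[++iter->pos];

			if (sptep)
				return sptep;
		}

		iter->desc = iter->desc->more;
		if (iter->desc) {
			iter->pos = 0;
			/* Assumption: a linked descriptor always holds at least one spte. */
			assert(iter->desc->sptes[0] != NULL);
			return iter->desc->sptes[0];
		}
	}

	return NULL;
}

/* Illustrative helper: walk the whole reverse mapping and count its sptes. */
static int count_sptes(uintptr_t rmap)
{
	struct rmap_iterator iter = { NULL, 0 };
	uint64_t *sptep;
	int n = 0;

	for (sptep = rmap_get_first(rmap, &iter); sptep;
	     sptep = rmap_get_next(&iter))
		n++;

	return n;
}
```

In the single-spte case rmap_get_next() sees a NULL descriptor and returns immediately, which is where the extra per-step checks disappear.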
    •
      KVM: MMU: Make pte_list_desc fit cache lines well · 220f773a
      Takuya Yoshikawa committed
      We have PTE_LIST_EXT + 1 pointers in this structure and these 40/20
      bytes do not fit cache lines well.  Furthermore, some allocators may
      use 64/32-byte objects for the pte_list_desc cache.
      
      This patch solves this problem by changing PTE_LIST_EXT from 4 to 3.
      
      For shadow paging, the new size is still large enough to hold both the
      kernel and process mappings for usual anonymous pages.  For file
      mappings, there may be a slight change in the cache usage.
      
      Note: with EPT/NPT we almost always have a single spte in each reverse
      mapping and we will not see any change by this.
      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: Avi Kivity <avi@redhat.com>
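The size arithmetic in this entry can be checked with a small sketch. The struct layout (an array of PTE_LIST_EXT spte pointers followed by one link pointer) is inferred from the "PTE_LIST_EXT + 1 pointers" wording; the names desc_ext4, desc_ext3 and packs_evenly() are hypothetical, and the sketch assumes all object pointers have the same size, as on common 32-bit and 64-bit targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Declares a descriptor with `ext` spte slots and one link pointer. */
#define DEFINE_DESC(name, ext)      \
	struct name {               \
		uint64_t *sptes[ext]; \
		struct name *more;    \
	}

DEFINE_DESC(desc_ext4, 4);  /* old layout: 5 pointers, 40 or 20 bytes */
DEFINE_DESC(desc_ext3, 3);  /* new layout: 4 pointers, 32 or 16 bytes */

_Static_assert(sizeof(struct desc_ext4) == 5 * sizeof(void *),
	       "old descriptor is five pointers wide");
_Static_assert(sizeof(struct desc_ext3) == 4 * sizeof(void *),
	       "new descriptor is four pointers wide");

/* True when objects of obj_size tile a line of line_size with no remainder. */
static int packs_evenly(size_t obj_size, size_t line_size)
{
	assert(obj_size > 0);
	return line_size % obj_size == 0;
}
```

With a 64-byte line, the four-pointer layout divides the line exactly on both pointer widths, while the five-pointer layout leaves a remainder.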
    •
      KVM: Avoid checking huge page mappings in get_dirty_log() · 5dc99b23
      Takuya Yoshikawa committed
      We dropped such mappings when we enabled dirty logging, and we will never
      create new ones until we stop the logging.
      
      For this we introduce a new function which can be used to write protect
      a range of PT level pages: although we do not need to care about a range
      of pages at this point, the following patch will need this feature to
      optimize the write protection of many pages.
      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    •
      KVM: MMU: Split the main body of rmap_write_protect() off from others · a0ed4607
      Takuya Yoshikawa committed
      We will use this in the following patch to implement another function
      which needs to write protect pages using the rmap information.
      
      Note that there is a small change in debug printing for large pages:
      we do not differentiate them from others to avoid duplicating code.
      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  11. 08 Mar, 2012 3 commits
  12. 05 Mar, 2012 6 commits
  13. 13 Jan, 2012 1 commit
  14. 27 Dec, 2011 16 commits