1. 09 Jul, 2012 1 commit
    • A
      KVM: MMU: Force cr3 reload with two dimensional paging on mov cr3 emulation · e676505a
      Avi Kivity committed
      Currently the MMU's ->new_cr3() callback does nothing when guest paging
      is disabled or when two-dimensional paging (e.g. EPT on Intel) is active.
      This means that an emulated write to cr3 can be lost; kvm_set_cr3() will
      write vcpu->arch.cr3, but the GUEST_CR3 field in the VMCS will retain its
      old value and this is what the guest sees.
      
      This bug did not have any effect until now because:
      - with unrestricted guest, or with svm, we never emulate a mov cr3 instruction
      - without unrestricted guest, and with paging enabled, we also never emulate a
        mov cr3 instruction
      - without unrestricted guest, but with paging disabled, the guest's cr3 is
        ignored until the guest enables paging; at this point the value from arch.cr3
        is loaded correctly by the mov cr0 instruction which turns on paging
      
      However, the patchset that enables big real mode causes us to emulate mov cr3
      instructions in protected mode sometimes (when guest state is not virtualizable
      by vmx); this mov cr3 is effectively ignored and will crash the guest.
      
      The fix is to make nonpaging_new_cr3() call mmu_free_roots() to force a cr3
      reload.  This is awkward because now all the new_cr3 callbacks do the same
      thing, and because mmu_free_roots() is somewhat of an overkill; but fixing
      that is more complicated and will be done after this minimal fix.
      
      Observed in the Windows XP 32-bit installer while bringing up secondary vcpus.
      Signed-off-by: Avi Kivity <avi@redhat.com>
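The lost cr3 write described in this commit can be illustrated with a small userspace model. This is only a sketch: `toy_vcpu`, `toy_set_cr3()`, `toy_enter_guest()` and the `roots_valid` flag are hypothetical stand-ins, not KVM code. It assumes that dropping the roots is what causes the hardware-visible cr3 to be rewritten on the next guest entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the state named in the commit message; not kernel types. */
struct toy_vcpu {
    uint64_t arch_cr3;   /* software copy, written by the emulated mov cr3 */
    uint64_t guest_cr3;  /* models the GUEST_CR3 field the guest actually sees */
    bool roots_valid;    /* false means "reload before the next guest entry" */
};

/* Old behaviour: the callback does nothing, so guest_cr3 is never refreshed. */
static void toy_new_cr3_noop(struct toy_vcpu *v)
{
    (void)v;
}

/* Fixed behaviour: drop the roots so the next entry reloads cr3. */
static void toy_new_cr3_free_roots(struct toy_vcpu *v)
{
    v->roots_valid = false;
}

/* Models an emulated write to cr3 followed by the MMU callback. */
static void toy_set_cr3(struct toy_vcpu *v, uint64_t cr3,
                        void (*new_cr3)(struct toy_vcpu *))
{
    v->arch_cr3 = cr3;
    new_cr3(v);
}

/* Models the reload that happens on guest entry when the roots are invalid. */
static void toy_enter_guest(struct toy_vcpu *v)
{
    if (!v->roots_valid) {
        v->guest_cr3 = v->arch_cr3;
        v->roots_valid = true;
    }
}
```

With the no-op callback the guest keeps running on the stale value; with the root-freeing callback the new value reaches the modelled GUEST_CR3 on the next entry.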
  2. 04 Jul, 2012 1 commit
  3. 25 Jun, 2012 5 commits
  4. 19 Jun, 2012 1 commit
  5. 14 Jun, 2012 1 commit
  6. 12 Jun, 2012 1 commit
  7. 06 Jun, 2012 2 commits
    • M
      KVM: disable uninitialized var warning · 79f702a6
      Michael S. Tsirkin committed
      I see this in 3.5-rc1:
      
      arch/x86/kvm/mmu.c: In function ‘kvm_test_age_rmapp’:
      arch/x86/kvm/mmu.c:1271: warning: ‘iter.desc’ may be used uninitialized in this function
      
      The line in question was introduced by commit
      1e3f42f0
      
       static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
                                    unsigned long data)
       {
      -       u64 *spte;
      +       u64 *sptep;
      +       struct rmap_iterator iter;   <- line 1271
              int young = 0;
      
              /*
      
      The reason I think is that the compiler assumes that
      the rmap value could be 0, so
      
      static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
      {
              if (!rmap)
                      return NULL;
      
              if (!(rmap & 1)) {
                      iter->desc = NULL;
                      return (u64 *)rmap;
              }
      
              iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
              iter->pos = 0;
              return iter->desc->sptes[iter->pos];
      }
      
      will not initialize iter.desc, but the compiler isn't
      smart enough to see that
      
              for (sptep = rmap_get_first(*rmapp, &iter); sptep;
                   sptep = rmap_get_next(&iter)) {
      
      will immediately exit in this case.
      I checked by adding
              if (!*rmapp)
                      goto out;
      on top which is clearly equivalent but disables the warning.
      
      This patch uses uninitialized_var to disable the warning without
      increasing code size.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
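The pattern discussed in this commit can be reproduced outside the kernel. The sketch below is a simplified userspace model: the `toy_` names, the descriptor size and `toy_rmap_get_next()` are assumptions made for illustration, and the macro is written in the self-assignment form that GCC treats as a warning suppression. It shows why the loop never reads `iter` when the rmap value is 0, even though the compiler cannot prove it.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC idiom: self-assignment silences "may be used uninitialized". */
#define uninitialized_var(x) x = x

#define TOY_SPTES_PER_DESC 3  /* assumed size for this sketch */

struct toy_pte_list_desc {
    uint64_t *sptes[TOY_SPTES_PER_DESC];
};

struct toy_rmap_iterator {
    struct toy_pte_list_desc *desc;
    int pos;
};

static uint64_t *toy_rmap_get_first(uintptr_t rmap, struct toy_rmap_iterator *iter)
{
    if (!rmap)
        return NULL;  /* iter is left untouched on this path */

    if (!(rmap & 1)) {
        iter->desc = NULL;  /* single spte stored directly in rmap */
        return (uint64_t *)rmap;
    }

    iter->desc = (struct toy_pte_list_desc *)(rmap & ~(uintptr_t)1);
    iter->pos = 0;
    return iter->desc->sptes[iter->pos];
}

/* Simplified: walks one descriptor only, stopping at the first NULL slot. */
static uint64_t *toy_rmap_get_next(struct toy_rmap_iterator *iter)
{
    if (iter->desc && ++iter->pos < TOY_SPTES_PER_DESC)
        return iter->desc->sptes[iter->pos];
    return NULL;
}

static int toy_count_sptes(uintptr_t rmap)
{
    struct toy_rmap_iterator uninitialized_var(iter);
    uint64_t *sptep;
    int n = 0;

    /* When rmap == 0 the loop exits before toy_rmap_get_next() reads iter. */
    for (sptep = toy_rmap_get_first(rmap, &iter); sptep;
         sptep = toy_rmap_get_next(&iter))
        n++;
    return n;
}
```

The macro adds no instructions with GCC, which matches the commit's goal of silencing the warning without increasing code size; other compilers may emit their own diagnostic for the self-assignment.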
    • C
      KVM: Cleanup the kvm_print functions and introduce pr_XX wrappers · a737f256
      Christoffer Dall committed
      Introduces a couple of print functions, which are essentially wrappers
      around standard printk functions, with a KVM: prefix.
      
      Functions introduced or modified are:
       - kvm_err(fmt, ...)
       - kvm_info(fmt, ...)
       - kvm_debug(fmt, ...)
       - kvm_pr_unimpl(fmt, ...)
       - pr_unimpl(vcpu, fmt, ...) -> vcpu_unimpl(vcpu, fmt, ...)
      Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
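The wrapper approach can be sketched in userspace as follows. This is not the kernel implementation: `toy_printk()`, the level strings and the exact `kvm: ` prefix text are assumptions standing in for printk and its log levels. Only the idea of thin macros that add a common prefix is taken from the commit message.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical sink standing in for printk(); keeps a copy of the last line. */
static char toy_last_line[256];

static int toy_printk(const char *level, const char *fmt, ...)
{
    char msg[200];
    va_list ap;
    int n;

    va_start(ap, fmt);
    vsnprintf(msg, sizeof(msg), fmt, ap);
    va_end(ap);

    /* Every message gets the same subsystem prefix. */
    n = snprintf(toy_last_line, sizeof(toy_last_line), "%s kvm: %s", level, msg);
    fputs(toy_last_line, stderr);
    return n;
}

/* Thin wrappers: callers pass only the format string and its arguments. */
#define kvm_err(...)   toy_printk("[err]", __VA_ARGS__)
#define kvm_info(...)  toy_printk("[info]", __VA_ARGS__)
#define kvm_debug(...) toy_printk("[debug]", __VA_ARGS__)
```

A call such as `kvm_err("x=%d\n", 5)` then produces a line that starts with the level tag and the shared prefix, so call sites no longer spell out the prefix themselves.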
  8. 05 Jun, 2012 6 commits
  9. 28 May, 2012 1 commit
  10. 17 May, 2012 4 commits
    • A
      KVM: Fix mmu_reload() clash with nested vmx event injection · d8368af8
      Avi Kivity committed
      Currently the inject_pending_event() call during guest entry happens after
      kvm_mmu_reload().  This is for historical reasons - we used to
      inject_pending_event() in atomic context, while kvm_mmu_reload() needs task
      context.
      
      A problem is that nested vmx can cause the mmu context to be reset, if event
      injection is intercepted and causes a #VMEXIT instead (the #VMEXIT resets
      CR0/CR3/CR4).  If this happens, we end up with invalid root_hpa, and since
      kvm_mmu_reload() has already run, no one will fix it and we end up entering
      the guest this way.
      
      Fix by reordering event injection to be before kvm_mmu_reload().  Use
      ->cancel_injection() to undo if kvm_mmu_reload() fails.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=42980
      Reported-by: Luke-Jr <luke-jr+linuxbugs@utopios.org>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
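The ordering problem can be modelled with a few flags. The sketch below is a toy, not KVM code: `toy_entry` and all `toy_` functions are hypothetical, and the nested case is reduced to "injection invalidates the root". It contrasts the old order (reload, then inject) with the new one (inject, then reload, cancel on failure).

```c
#include <stdbool.h>

/* Toy model of the guest-entry state discussed above; not KVM's structures. */
struct toy_entry {
    bool event_pending;      /* an event is waiting to be injected */
    bool event_injected;     /* injection has been armed for this entry */
    bool root_valid;         /* stands in for a valid root_hpa */
    bool nested_intercepts;  /* injection is intercepted and resets the MMU */
    bool reload_fails;       /* lets the sketch exercise the cancel path */
};

static void toy_inject_pending_event(struct toy_entry *e)
{
    if (!e->event_pending)
        return;
    e->event_pending = false;
    if (e->nested_intercepts)
        e->root_valid = false;  /* the nested exit resets CR0/CR3/CR4 */
    else
        e->event_injected = true;
}

static int toy_mmu_reload(struct toy_entry *e)
{
    if (e->reload_fails)
        return -1;
    e->root_valid = true;
    return 0;
}

static void toy_cancel_injection(struct toy_entry *e)
{
    if (e->event_injected) {
        e->event_injected = false;
        e->event_pending = true;  /* requeue for the next attempt */
    }
}

/* Old order: reload, then inject. Returns true if the root is valid at entry. */
static bool toy_enter_old(struct toy_entry *e)
{
    if (toy_mmu_reload(e))
        return false;
    toy_inject_pending_event(e);
    return e->root_valid;
}

/* New order: inject, then reload; undo the injection if the reload fails. */
static bool toy_enter_new(struct toy_entry *e)
{
    toy_inject_pending_event(e);
    if (toy_mmu_reload(e)) {
        toy_cancel_injection(e);
        return false;
    }
    return e->root_valid;
}
```

In the model, the old order can reach guest entry with an invalid root when injection is intercepted, while the new order always reloads after any reset and leaves the event pending if the reload fails.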
    • A
      KVM: MMU: Don't use RCU for lockless shadow walking · c142786c
      Avi Kivity committed
      Using RCU for lockless shadow walking can increase the amount of memory
      in use by the system, since RCU grace periods are unpredictable.  We also
      have an unconditional write to a shared variable (reader_counter), which
      isn't good for scaling.
      
      Replace that with a scheme similar to x86's get_user_pages_fast(): disable
      interrupts during lockless shadow walk to force the freer
      (kvm_mmu_commit_zap_page()) to wait for the TLB flush IPI to find the
      processor with interrupts enabled.
      
      We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
      kvm_flush_remote_tlbs() from avoiding the IPI.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • A
      KVM: VMX: Optimize %ds, %es reload · b2da15ac
      Avi Kivity committed
      On x86_64, we can defer %ds and %es reload to the heavyweight context switch,
      since nothing in the lightweight paths uses the host %ds or %es (they are
      ignored by the processor).  Furthermore we can avoid the load if the segments
      are null, by letting the hardware load the null segments for us.  This is the
      expected case.
      
      On i386, we could avoid the reload entirely, since the entry.S paths take care
      of reload, except for the SYSEXIT path which leaves %ds and %es set to __USER_DS.
      So we set them to the same values as well.
      
      Saves about 70 cycles out of 1600 (around 4%; noisy measurements).
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • A
      KVM: VMX: Fix %ds/%es clobber · 512d5649
      Avi Kivity committed
      The vmx exit code unconditionally restores %ds and %es to __USER_DS.  This
      can override the user's values, since %ds and %es are not saved and restored
      in x86_64 syscalls.  In practice, this isn't dangerous since nobody uses
      segment registers in long mode, least of all programs that use KVM.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  11. 14 May, 2012 2 commits
  12. 06 May, 2012 4 commits
  13. 28 Apr, 2012 1 commit
    • J
      KVM: x86: Run PIT work in own kthread · b6ddf05f
      Jan Kiszka committed
      We can't run PIT IRQ injection work in the interrupt context of the host
      timer. This would allow the user to influence the handler complexity by
      asking for a broadcast to a large number of VCPUs. Therefore, this work
      was pushed into workqueue context in 9d244caf2e. However, this prevents
      prioritizing the PIT injection over other tasks, as workqueues share
      kernel threads.
      
      This replaces the workqueue with a kthread worker and gives that thread
      a name in the format "kvm-pit/<owner-process-pid>". That makes it possible
      to identify the kthread and adjust its priority according to the VM process
      parameters.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  14. 24 Apr, 2012 3 commits
  15. 21 Apr, 2012 3 commits
  16. 20 Apr, 2012 1 commit
    • A
      KVM: Fix page-crossing MMIO · f78146b0
      Avi Kivity committed
      MMIO accesses that are split across a page boundary are currently broken -
      the code does not expect to be aborted by the exit to userspace for the
      first MMIO fragment.
      
      This patch fixes the problem by generalizing the current code for handling
      16-byte MMIOs to handle a number of "fragments", and changes the MMIO
      code to create those fragments.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
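The fragment idea can be illustrated by splitting an access at the page boundary. This is a minimal sketch under assumptions: a 4096-byte page, at most two fragments per access, and hypothetical names (`toy_mmio_fragment`, `toy_split_mmio()`); it does not reproduce the kernel's actual data structures or any further splitting the real code may perform.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE     4096u  /* assumed page size */
#define TOY_MAX_FRAGMENTS 2      /* enough for an access of up to one page */

struct toy_mmio_fragment {
    uint64_t gpa;  /* guest physical address where the fragment starts */
    unsigned len;  /* number of bytes in this fragment */
};

/* Splits [gpa, gpa + len) so that no fragment crosses a page boundary.
 * Returns the number of fragments written to frags[]. */
static int toy_split_mmio(uint64_t gpa, unsigned len,
                          struct toy_mmio_fragment frags[TOY_MAX_FRAGMENTS])
{
    int n = 0;

    assert(len > 0 && len <= TOY_PAGE_SIZE);

    while (len) {
        /* Bytes left before the end of the current page. */
        unsigned room = TOY_PAGE_SIZE - (unsigned)(gpa & (TOY_PAGE_SIZE - 1));
        unsigned now = len < room ? len : room;

        frags[n].gpa = gpa;
        frags[n].len = now;
        n++;

        gpa += now;
        len -= now;
    }
    return n;
}
```

Each fragment can then be completed by a separate exit to userspace, which is the situation the original single-access code did not expect.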
  17. 19 Apr, 2012 2 commits
  18. 17 Apr, 2012 1 commit
    • M
      KVM: dont clear TMR on EOI · a0c9a822
      Michael S. Tsirkin committed
      Intel spec says that TMR needs to be set/cleared
      when IRR is set, but kvm also clears it on EOI.
      
      I did some tests on a real (AMD based) system,
      and I see the same TMR values both before
      and after EOI, so I think it's a minor bug in kvm.
      
      This patch fixes TMR to be set/cleared on IRR set
      only as per spec.
      
      And now that we don't clear TMR, we can save
      an atomic read of TMR on EOI that's not propagated
      to ioapic, by checking whether ioapic needs
      a specific vector first and calculating
      the mode afterwards.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
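The behaviour this commit aims for can be sketched with a toy local APIC. The structure and the `toy_` helpers below are hypothetical and heavily simplified; the only points taken from the commit message are that TMR is updated when IRR is set and left alone on EOI. Using the TMR bit at EOI time to decide whether the EOI concerns a level-triggered interrupt is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy local APIC with 256-bit IRR/ISR/TMR registers; not the KVM layout. */
struct toy_lapic {
    uint32_t irr[8];
    uint32_t isr[8];
    uint32_t tmr[8];
};

static void toy_set_bit(uint32_t *reg, uint8_t vec)
{
    reg[vec / 32] |= 1u << (vec % 32);
}

static void toy_clear_bit(uint32_t *reg, uint8_t vec)
{
    reg[vec / 32] &= ~(1u << (vec % 32));
}

static bool toy_test_bit(const uint32_t *reg, uint8_t vec)
{
    return (reg[vec / 32] >> (vec % 32)) & 1u;
}

/* Accepting an interrupt: TMR follows the trigger mode when IRR is set. */
static void toy_accept_irq(struct toy_lapic *a, uint8_t vec, bool level)
{
    if (level)
        toy_set_bit(a->tmr, vec);
    else
        toy_clear_bit(a->tmr, vec);
    toy_set_bit(a->irr, vec);
}

/* Delivery moves the vector from IRR to ISR. */
static void toy_deliver(struct toy_lapic *a, uint8_t vec)
{
    toy_clear_bit(a->irr, vec);
    toy_set_bit(a->isr, vec);
}

/* EOI clears ISR only; TMR is left untouched.
 * Returns true if the vector was level-triggered. */
static bool toy_eoi(struct toy_lapic *a, uint8_t vec)
{
    toy_clear_bit(a->isr, vec);
    return toy_test_bit(a->tmr, vec);
}
```

Because TMR survives the EOI in this model, it only changes when the same vector is next accepted, for example when an edge-triggered interrupt clears it.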