1. 09 Jul 2012 (8 commits)
    • KVM: VMX: Relax check on unusable segment · f0495f9b
      Authored by Avi Kivity
      Some userspace (e.g. QEMU 1.1) munges the d and g bits of segment
      descriptors, causing us not to recognize them as unusable segments
      with emulate_invalid_guest_state=1.  Relax the check by testing for
      segment not present (a non-present segment cannot be usable).
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: fix LIDT/LGDT in long mode · 510425ff
      Authored by Avi Kivity
      The operand size for these instructions is 8 bytes in long mode, even without
      a REX prefix.  Set it explicitly.
      
      Triggered while booting Linux with emulate_invalid_guest_state=1.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: allow loading null SS in long mode · 79d5b4c3
      Authored by Avi Kivity
      Null SS is valid in long mode; allow loading it.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: emulate cpuid · 6d6eede4
      Authored by Avi Kivity
      Opcode 0F A2.
      
      Used by Linux during the mode change trampoline while in a state that is
      not virtualizable on vmx without unrestricted_guest, so we need to emulate
      it if emulate_invalid_guest_state=1.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: change ->get_cpuid() accessor to use the x86 semantics · 0017f93a
      Authored by Avi Kivity
      Instead of requiring an exact leaf, follow the spec and fall back to the
      last main leaf.  This lets us easily emulate the cpuid instruction in the
      emulator.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Split cpuid register access from computation · 62046e5a
      Authored by Avi Kivity
      Introduce kvm_cpuid() to perform the leaf limit check and calculate
      register values, and let kvm_emulate_cpuid() just handle reading and
      writing the registers from/to the vcpu.  This allows us to reuse
      kvm_cpuid() in a context where directly reading and writing registers
      is not desired.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: VMX: Return correct CPL during transition to protected mode · d881e6f6
      Authored by Avi Kivity
      In protected mode, the CPL is defined as the lower two bits of CS, as set by
      the last far jump.  But during the transition to protected mode, there is no
      last far jump, so we need to return zero (the inherited real mode CPL).
      
      Fix by reading CPL from the cache during the transition.  This isn't 100%
      correct since we don't set the CPL cache on a far jump, but since protected
      mode transition will always jump to a segment with RPL=0, it will always
      work.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: MMU: Force cr3 reload with two dimensional paging on mov cr3 emulation · e676505a
      Authored by Avi Kivity
      Currently the MMU's ->new_cr3() callback does nothing when guest paging
      is disabled or when two-dimensional paging (e.g. EPT on Intel) is active.
      This means that an emulated write to cr3 can be lost; kvm_set_cr3() will
      write vcpu->arch.cr3, but the GUEST_CR3 field in the VMCS will retain its
      old value and this is what the guest sees.
      
      This bug did not have any effect until now because:
      - with unrestricted guest, or with svm, we never emulate a mov cr3 instruction
      - without unrestricted guest, and with paging enabled, we also never emulate a
        mov cr3 instruction
      - without unrestricted guest, but with paging disabled, the guest's cr3 is
        ignored until the guest enables paging; at this point the value from arch.cr3
        is loaded correctly by the mov cr0 instruction which turns on paging
      
      However, the patchset that enables big real mode causes us to emulate mov cr3
      instructions in protected mode sometimes (when guest state is not virtualizable
      by vmx); this mov cr3 is effectively ignored and will crash the guest.
      
      The fix is to make nonpaging_new_cr3() call mmu_free_roots() to force a cr3
      reload.  This is awkward because now all the new_cr3 callbacks do the same
      thing, and because mmu_free_roots() is somewhat of an overkill; but fixing
      that is more complicated and will be done after this minimal fix.
      
      Observed in the Windows XP 32-bit installer while bringing up secondary vcpus.
      Signed-off-by: Avi Kivity <avi@redhat.com>
  2. 04 Jul 2012 (1 commit)
  3. 25 Jun 2012 (5 commits)
  4. 19 Jun 2012 (1 commit)
  5. 14 Jun 2012 (1 commit)
  6. 12 Jun 2012 (1 commit)
  7. 06 Jun 2012 (2 commits)
    • KVM: disable uninitialized var warning · 79f702a6
      Authored by Michael S. Tsirkin
      I see this in 3.5-rc1:
      
      arch/x86/kvm/mmu.c: In function ‘kvm_test_age_rmapp’:
      arch/x86/kvm/mmu.c:1271: warning: ‘iter.desc’ may be used uninitialized in this function
      
      The line in question was introduced by commit 1e3f42f0:
      
       static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
                                    unsigned long data)
       {
      -       u64 *spte;
      +       u64 *sptep;
      +       struct rmap_iterator iter;   <- line 1271
              int young = 0;
      
              /*
      
      The reason I think is that the compiler assumes that
      the rmap value could be 0, so
      
      static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
      {
              if (!rmap)
                      return NULL;
      
              if (!(rmap & 1)) {
                      iter->desc = NULL;
                      return (u64 *)rmap;
              }
      
              iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
              iter->pos = 0;
              return iter->desc->sptes[iter->pos];
      }
      
      will not initialize iter.desc, but the compiler isn't
      smart enough to see that
      
              for (sptep = rmap_get_first(*rmapp, &iter); sptep;
                   sptep = rmap_get_next(&iter)) {
      
      will immediately exit in this case.
      I checked by adding
              if (!*rmapp)
                      goto out;
      on top which is clearly equivalent but disables the warning.
      
      This patch uses uninitialized_var to disable the warning without
      increasing code size.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Cleanup the kvm_print functions and introduce pr_XX wrappers · a737f256
      Authored by Christoffer Dall
      Introduces a couple of print functions, which are essentially wrappers
      around standard printk functions, with a KVM: prefix.
      
      Functions introduced or modified are:
       - kvm_err(fmt, ...)
       - kvm_info(fmt, ...)
       - kvm_debug(fmt, ...)
       - kvm_pr_unimpl(fmt, ...)
       - pr_unimpl(vcpu, fmt, ...) -> vcpu_unimpl(vcpu, fmt, ...)
      Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  8. 05 Jun 2012 (6 commits)
  9. 28 May 2012 (1 commit)
  10. 17 May 2012 (4 commits)
    • KVM: Fix mmu_reload() clash with nested vmx event injection · d8368af8
      Authored by Avi Kivity
      Currently the inject_pending_event() call during guest entry happens after
      kvm_mmu_reload().  This is for historical reasons - we used to
      inject_pending_event() in atomic context, while kvm_mmu_reload() needs task
      context.
      
      A problem is that nested vmx can cause the mmu context to be reset, if event
      injection is intercepted and causes a #VMEXIT instead (the #VMEXIT resets
      CR0/CR3/CR4).  If this happens, we end up with invalid root_hpa, and since
      kvm_mmu_reload() has already run, no one will fix it and we end up entering
      the guest this way.
      
      Fix by reordering event injection to be before kvm_mmu_reload().  Use
      ->cancel_injection() to undo if kvm_mmu_reload() fails.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=42980
      Reported-by: Luke-Jr <luke-jr+linuxbugs@utopios.org>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: MMU: Don't use RCU for lockless shadow walking · c142786c
      Authored by Avi Kivity
      Using RCU for lockless shadow walking can increase the amount of memory
      in use by the system, since RCU grace periods are unpredictable.  We also
      have an unconditional write to a shared variable (reader_counter), which
      isn't good for scaling.
      
      Replace that with a scheme similar to x86's get_user_pages_fast(): disable
      interrupts during lockless shadow walk to force the freer
      (kvm_mmu_commit_zap_page()) to wait for the TLB flush IPI to find the
      processor with interrupts enabled.
      
      We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
      kvm_flush_remote_tlbs() from avoiding the IPI.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: VMX: Optimize %ds, %es reload · b2da15ac
      Authored by Avi Kivity
      On x86_64, we can defer %ds and %es reload to the heavyweight context switch,
      since nothing in the lightweight paths uses the host %ds or %es (they are
      ignored by the processor).  Furthermore we can avoid the load if the segments
      are null, by letting the hardware load the null segments for us.  This is the
      expected case.
      
      On i386, we could avoid the reload entirely, since the entry.S paths take care
      of reload, except for the SYSEXIT path which leaves %ds and %es set to __USER_DS.
      So we set them to the same values as well.
      
      Saves about 70 cycles out of 1600 (around 4%; noisy measurements).
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: VMX: Fix %ds/%es clobber · 512d5649
      Authored by Avi Kivity
      The vmx exit code unconditionally restores %ds and %es to __USER_DS.  This
      can override the user's values, since %ds and %es are not saved and restored
      in x86_64 syscalls.  In practice, this isn't dangerous since nobody uses
      segment registers in long mode, least of all programs that use KVM.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  11. 14 May 2012 (2 commits)
  12. 06 May 2012 (4 commits)
  13. 28 Apr 2012 (1 commit)
    • KVM: x86: Run PIT work in own kthread · b6ddf05f
      Authored by Jan Kiszka
      We can't run PIT IRQ injection work in the interrupt context of the host
      timer. This would allow the user to influence the handler complexity by
      asking for a broadcast to a large number of VCPUs. Therefore, this work
      was pushed into workqueue context in 9d244caf2e. However, this prevents
      prioritizing the PIT injection over other tasks as workqueues share
      kernel threads.
      
      This replaces the workqueue with a kthread worker and gives that thread
      a name in the format "kvm-pit/<owner-process-pid>". That makes it possible to
      identify and adjust the kthread priority according to the VM process
      parameters.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  14. 24 Apr 2012 (3 commits)