1. 06 Jun 2012, 1 commit
  2. 05 Jun 2012, 6 commits
  3. 28 May 2012, 1 commit
  4. 17 May 2012, 4 commits
    • KVM: Fix mmu_reload() clash with nested vmx event injection · d8368af8
      Avi Kivity authored
      Currently the inject_pending_event() call during guest entry happens after
      kvm_mmu_reload().  This is for historical reasons: we used to call
      inject_pending_event() in atomic context, while kvm_mmu_reload() needs task
      context.
      
      A problem is that nested vmx can cause the mmu context to be reset, if event
      injection is intercepted and causes a #VMEXIT instead (the #VMEXIT resets
      CR0/CR3/CR4).  If this happens, we end up with an invalid root_hpa, and since
      kvm_mmu_reload() has already run, nothing will fix it and we end up entering
      the guest this way.
      
      Fix by moving event injection before kvm_mmu_reload().  Use
      ->cancel_injection() to undo it if kvm_mmu_reload() fails.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=42980
      Reported-by: Luke-Jr <luke-jr+linuxbugs@utopios.org>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
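
      A minimal sketch of the reordered entry path described above, assuming
      simplified forms of the named helpers (the real logic lives in
      vcpu_enter_guest() in arch/x86/kvm/x86.c; this is illustrative, not the
      merged code):

          static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
          {
              int r;

              /* Inject first: a nested #VMEXIT taken here may reset
               * CR0/CR3/CR4 and with them the mmu context. */
              inject_pending_event(vcpu);

              /* Reload after injection, so a reset root_hpa is repaired
               * before we enter the guest. */
              r = kvm_mmu_reload(vcpu);
              if (unlikely(r)) {
                  /* Entry failed: undo the injection so the event
                   * is not lost. */
                  kvm_x86_ops->cancel_injection(vcpu);
                  return r;
              }

              /* ... proceed to the actual VM entry ... */
              return 0;
          }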
    • KVM: MMU: Don't use RCU for lockless shadow walking · c142786c
      Avi Kivity authored
      Using RCU for lockless shadow walking can increase the amount of memory
      in use by the system, since RCU grace periods are unpredictable.  We also
      have an unconditional write to a shared variable (reader_counter), which
      isn't good for scaling.
      
      Replace that with a scheme similar to x86's get_user_pages_fast(): disable
      interrupts during the lockless shadow walk, forcing the freer
      (kvm_mmu_commit_zap_page()) to wait until the TLB flush IPI finds the
      processor with interrupts enabled again.
      
      We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
      kvm_flush_remote_tlbs() from avoiding the IPI.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
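
      A hedged sketch of the resulting begin/end pair (the names follow the
      message; the barrier placement is illustrative):

          static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
          {
              /* Make the freer (kvm_mmu_commit_zap_page()) wait in its
               * TLB flush IPI until we re-enable interrupts. */
              local_irq_disable();
              vcpu->mode = READING_SHADOW_PAGE_TABLES;
              /* Don't let spte reads get reordered ahead of the mode write,
               * which is what kvm_flush_remote_tlbs() keys off. */
              smp_mb();
          }

          static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
          {
              smp_mb();
              vcpu->mode = OUTSIDE_GUEST_MODE;
              local_irq_enable();
          }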
    • KVM: VMX: Optimize %ds, %es reload · b2da15ac
      Avi Kivity authored
      On x86_64, we can defer the %ds and %es reload to the heavyweight context
      switch, since nothing in the lightweight paths uses the host %ds or %es (they
      are ignored by the processor).  Furthermore, we can avoid the load if the
      segments are null, by letting the hardware load the null segments for us.
      This is the expected case.
      
      On i386, we could avoid the reload entirely, since the entry.S paths take care
      of the reload, except for the SYSEXIT path, which leaves %ds and %es set to
      __USER_DS.  So we set them to the same values as well.
      
      Saves about 70 cycles out of 1600 (around 4%; noisy measurements).
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
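
      A hedged sketch of the deferred reload; the host_state.ds_sel/es_sel
      field names are illustrative stand-ins for the selectors saved on the
      lightweight path:

          static void __vmx_load_host_state(struct vcpu_vmx *vmx)
          {
          #ifdef CONFIG_X86_64
              /* Null selectors are reloaded by the hardware for free;
               * only a non-null selector needs a real reload, and it can
               * wait for this heavyweight path. */
              if (vmx->host_state.ds_sel)
                  loadsegment(ds, vmx->host_state.ds_sel);
              if (vmx->host_state.es_sel)
                  loadsegment(es, vmx->host_state.es_sel);
          #else
              /* i386: entry.S restores %ds/%es except on the SYSEXIT path,
               * which leaves __USER_DS, so mirror those values. */
              loadsegment(ds, __USER_DS);
              loadsegment(es, __USER_DS);
          #endif
          }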
    • KVM: VMX: Fix %ds/%es clobber · 512d5649
      Avi Kivity authored
      The vmx exit code unconditionally restores %ds and %es to __USER_DS.  This
      can clobber userspace's values, since %ds and %es are not saved and restored
      across x86_64 syscalls.  In practice, this isn't dangerous since nobody uses
      segment registers in long mode, least of all programs that use KVM.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
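
      A hedged sketch of the fix: save the live selectors around the guest run
      instead of assuming __USER_DS (savesegment()/loadsegment() are the
      standard x86 helpers; the surrounding function is heavily simplified):

          static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
          {
              u16 ds_sel, es_sel;

              savesegment(ds, ds_sel);    /* may hold a userspace value */
              savesegment(es, es_sel);

              /* ... VMLAUNCH/VMRESUME and exit handling ... */

              loadsegment(ds, ds_sel);    /* restore, don't clobber */
              loadsegment(es, es_sel);
          }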
  5. 14 May 2012, 2 commits
  6. 06 May 2012, 4 commits
  7. 28 Apr 2012, 1 commit
    • KVM: x86: Run PIT work in own kthread · b6ddf05f
      Jan Kiszka authored
      We can't run the PIT IRQ injection work in the interrupt context of the host
      timer.  This would allow the user to influence the handler complexity by
      asking for a broadcast to a large number of VCPUs.  Therefore, this work
      was pushed into workqueue context in 9d244caf2e.  However, this prevents
      prioritizing the PIT injection over other tasks, as workqueues share
      kernel threads.
      
      This replaces the workqueue with a kthread worker and gives that thread
      a name in the format "kvm-pit/<owner-process-pid>".  That allows identifying
      the kthread and adjusting its priority according to the VM process
      parameters.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
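
      A hedged sketch of the setup, using the kthread worker API as it was
      spelled at the time (init_kthread_worker() and friends were later
      renamed); the pit->worker/worker_task/expired fields and pit_do_work()
      stand in for the PIT state and the injection function:

          static int pit_setup_worker(struct kvm_pit *pit, pid_t owner_pid)
          {
              init_kthread_worker(&pit->worker);
              pit->worker_task = kthread_run(kthread_worker_fn, &pit->worker,
                                             "kvm-pit/%d", owner_pid);
              if (IS_ERR(pit->worker_task))
                  return PTR_ERR(pit->worker_task);

              init_kthread_work(&pit->expired, pit_do_work);
              return 0;
          }

          /* From the host timer callback, instead of queueing to a shared
           * workqueue: */
          queue_kthread_work(&pit->worker, &pit->expired);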
  8. 24 Apr 2012, 3 commits
  9. 21 Apr 2012, 3 commits
  10. 20 Apr 2012, 1 commit
    • KVM: Fix page-crossing MMIO · f78146b0
      Avi Kivity authored
      MMIO requests that are split across a page boundary are currently broken:
      the code does not expect to be aborted by the exit to userspace for the
      first MMIO fragment.
      
      This patch fixes the problem by generalizing the current code for handling
      16-byte MMIOs to handle a number of "fragments", and changes the MMIO
      code to create those fragments.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
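
      A hedged sketch of the generalization: one guest access is split into
      per-page fragments that are completed one at a time, so an exit to
      userspace after any fragment is safe (the field names are illustrative):

          struct mmio_fragment {
              gpa_t gpa;
              void *data;
              unsigned len;
          };

          static void build_mmio_fragments(struct kvm_vcpu *vcpu, gpa_t gpa,
                                           void *data, unsigned len)
          {
              unsigned n = 0;

              while (len) {
                  /* Never let a single fragment cross a page boundary. */
                  unsigned now = min(len, (unsigned)(PAGE_SIZE -
                                                     offset_in_page(gpa)));

                  vcpu->mmio_fragments[n].gpa = gpa;
                  vcpu->mmio_fragments[n].data = data;
                  vcpu->mmio_fragments[n].len = now;
                  n++;
                  gpa += now;
                  data += now;
                  len -= now;
              }
              vcpu->mmio_nr_fragments = n;
          }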
  11. 19 Apr 2012, 2 commits
  12. 17 Apr 2012, 7 commits
  13. 10 Apr 2012, 1 commit
  14. 08 Apr 2012, 4 commits
    • KVM: MMU: Improve iteration through sptes from rmap · 1e3f42f0
      Takuya Yoshikawa authored
      Iteration using rmap_next(), whose actual body is pte_list_next(), is
      inefficient: every call starts over by checking whether the rmap holds a
      single spte or points to a descriptor which links more sptes.
      
      In the case of shadow paging, this quadratic total iteration cost is a
      problem.  Even for two-dimensional paging, with EPT/NPT on, where we
      almost always have a single mapping, the extra checks at the end of the
      iteration should be eliminated.
      
      This patch fixes this by introducing rmap_iterator, which keeps the
      iteration context for the next search.  Furthermore, the implementation
      of rmap_next() is split into two functions, rmap_get_first() and
      rmap_get_next(), to avoid repeatedly checking whether the rmap being
      iterated on has only one spte.
      
      Although there seemed to be only a slight change for EPT/NPT, the actual
      improvement was significant: we observed that GET_DIRTY_LOG for 1GB of
      dirty memory became 15% faster than before.  This is probably because
      the new code is friendlier to branch prediction.
      
      Note: we just remove pte_list_next() because we can think of parent_ptes
      as a reverse mapping.
      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: Avi Kivity <avi@redhat.com>
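
      A hedged sketch of the iterator and the resulting loop shape (close to,
      but not necessarily identical with, the merged code):

          struct rmap_iterator {
              struct pte_list_desc *desc; /* NULL if rmap held a single spte */
              int pos;                    /* index within the descriptor */
          };

          u64 *sptep;
          struct rmap_iterator iter;

          for (sptep = rmap_get_first(*rmapp, &iter); sptep;
               sptep = rmap_get_next(&iter)) {
              /* handle *sptep; the single-spte-vs-descriptor check was
               * done once, in rmap_get_first() */
          }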
    • KVM: MMU: Make pte_list_desc fit cache lines well · 220f773a
      Takuya Yoshikawa authored
      We have PTE_LIST_EXT + 1 pointers in this structure, and these 40/20
      bytes (on 64-bit/32-bit) do not fit cache lines well.  Furthermore, some
      allocators may use 64/32-byte objects for the pte_list_desc cache.
      
      This patch solves this problem by changing PTE_LIST_EXT from 4 to 3.
      
      For shadow paging, the new size is still large enough to hold both the
      kernel and process mappings for usual anonymous pages.  For file
      mappings, there may be a slight change in the cache usage.
      
      Note: with EPT/NPT we almost always have a single spte in each reverse
      mapping, so we will not see any change from this.
      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: Avi Kivity <avi@redhat.com>
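
      A hedged sketch of the resulting layout: four pointers total, i.e.
      32 bytes on 64-bit and 16 bytes on 32-bit, which fits 64/32-byte slab
      objects and cache lines cleanly:

          #define PTE_LIST_EXT 3          /* was 4 */

          struct pte_list_desc {
              u64 *sptes[PTE_LIST_EXT];   /* up to three sptes per node */
              struct pte_list_desc *more; /* chains further descriptors */
          };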
    • KVM: x86: add paging gcc optimization · c36fc04e
      Davidlohr Bueso authored
      Since most guests will have paging enabled for memory management, add a
      likely() optimization around CR0.PG checks.
      Signed-off-by: Davidlohr Bueso <dave@gnu.org>
      Signed-off-by: Avi Kivity <avi@redhat.com>
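
      A sketch of the pattern, assuming is_paging() remains the helper that
      tests CR0.PG via kvm_read_cr0_bits():

          static inline bool is_paging(struct kvm_vcpu *vcpu)
          {
              /* Most guests run with paging enabled: hint the common case. */
              return likely(kvm_read_cr0_bits(vcpu, X86_CR0_PG));
          }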
    • KVM: VMX: Auto-load on CPUs with VMX · e9bda3b3
      Josh Triplett authored
      Enable x86 feature-based autoloading for the kvm-intel module on CPUs
      with X86_FEATURE_VMX.
      Signed-off-by: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Kay Sievers <kay@vrfy.org>
      Signed-off-by: Avi Kivity <avi@redhat.com>
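
      A sketch of the feature-based autoload hook, using the cpu-feature
      module-alias helper as spelled in that era (X86_FEATURE_MATCH()):

          static const struct x86_cpu_id vmx_cpu_id[] = {
              X86_FEATURE_MATCH(X86_FEATURE_VMX),
              {}
          };
          MODULE_DEVICE_TABLE(x86cpu, vmx_cpu_id);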