1. 20 9月, 2012 2 次提交
    • A
      KVM: MMU: Optimize is_last_gpte() · 6fd01b71
      Avi Kivity 提交于
      Instead of branchy code depending on level, gpte.ps, and mmu configuration,
      prepare everything in a bitmap during mode changes and look it up during
      runtime.
      Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      6fd01b71
    • A
      KVM: MMU: Optimize pte permission checks · 97d64b78
      Avi Kivity 提交于
      walk_addr_generic() permission checks are a maze of branchy code, which is
      performed four times per lookup.  It depends on the type of access, efer.nxe,
      cr0.wp, cr4.smep, and in the near future, cr4.smap.
      
      Optimize this away by precalculating all variants and storing them in a
      bitmap.  The bitmap is recalculated when rarely-changing variables change
      (cr0, cr4) and is indexed by the often-changing variables (page fault error
      code, pte access permissions).
      
      The permission check is moved to the end of the loop, otherwise an SMEP
      fault could be reported as a false positive, when PDE.U=1 but PTE.U=0.
      Noted by Xiao Guangrong.
      
      The result is short, branch-free code.
      Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      97d64b78
  2. 06 9月, 2012 1 次提交
  3. 14 8月, 2012 1 次提交
  4. 06 8月, 2012 1 次提交
  5. 21 7月, 2012 1 次提交
  6. 19 7月, 2012 2 次提交
  7. 12 7月, 2012 1 次提交
    • M
      KVM: VMX: Implement PCID/INVPCID for guests with EPT · ad756a16
      Mao, Junjie 提交于
      This patch handles PCID/INVPCID for guests.
      
      Process-context identifiers (PCIDs) are a facility by which a logical processor
      may cache information for multiple linear-address spaces so that the processor
      may retain cached information when software switches to a different linear
      address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual
      Volume 3A for details.
      
      For guests with EPT, the PCID feature is enabled and INVPCID behaves as running
      natively.
      For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD.
      Signed-off-by: NJunjie Mao <junjie.mao@intel.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      ad756a16
  8. 06 7月, 2012 1 次提交
  9. 25 6月, 2012 1 次提交
    • M
      KVM: host side for eoi optimization · ae7a2a3f
      Michael S. Tsirkin 提交于
      Implementation of PV EOI using shared memory.
      This reduces the number of exits an interrupt
      causes as much as by half.
      
      The idea is simple: there's a bit, per APIC, in guest memory,
      that tells the guest that it does not need EOI.
      We set it before injecting an interrupt and clear
      before injecting a nested one. Guest tests it using
      a test and clear operation - this is necessary
      so that host can detect interrupt nesting -
      and if set, it can skip the EOI MSR.
      
      There's a new MSR to set the address of said register
      in guest memory. Otherwise not much changed:
      - Guest EOI is not required
      - Register is tested & ISR is automatically cleared on exit
      
      For testing results see description of previous patch
      'kvm_para: guest side for eoi avoidance'.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      ae7a2a3f
  10. 17 5月, 2012 1 次提交
    • A
      KVM: MMU: Don't use RCU for lockless shadow walking · c142786c
      Avi Kivity 提交于
      Using RCU for lockless shadow walking can increase the amount of memory
      in use by the system, since RCU grace periods are unpredictable.  We also
      have an unconditional write to a shared variable (reader_counter), which
      isn't good for scaling.
      
      Replace that with a scheme similar to x86's get_user_pages_fast(): disable
      interrupts during lockless shadow walk to force the freer
      (kvm_mmu_commit_zap_page()) to wait for the TLB flush IPI to find the
      processor with interrupts enabled.
      
      We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
      kvm_flush_remote_tlbs() from avoiding the IPI.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      c142786c
  11. 24 4月, 2012 1 次提交
  12. 21 4月, 2012 1 次提交
  13. 08 4月, 2012 1 次提交
  14. 08 3月, 2012 8 次提交
    • K
      KVM: x86 emulator: Fix task switch privilege checks · 7f3d35fd
      Kevin Wolf 提交于
      Currently, all task switches check privileges against the DPL of the
      TSS. This is only correct for jmp/call to a TSS. If a task gate is used,
      the DPL of this take gate is used for the check instead. Exceptions,
      external interrupts and iret shouldn't perform any check.
      
      [avi: kill kvm-kmod remnants]
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      7f3d35fd
    • T
      KVM: Introduce kvm_memory_slot::arch and move lpage_info into it · db3fe4eb
      Takuya Yoshikawa 提交于
      Some members of kvm_memory_slot are not used by every architecture.
      
      This patch is the first step to make this difference clear by
      introducing kvm_memory_slot::arch;  lpage_info is moved into it.
      Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      db3fe4eb
    • Z
      KVM: Track TSC synchronization in generations · e26101b1
      Zachary Amsden 提交于
      This allows us to track the original nanosecond and counter values
      at each phase of TSC writing by the guest.  This gets us perfect
      offset matching for stable TSC systems, and perfect software
      computed TSC matching for machines with unstable TSC.
      Signed-off-by: NZachary Amsden <zamsden@gmail.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      e26101b1
    • Z
      KVM: Dont mark TSC unstable due to S4 suspend · 0dd6a6ed
      Zachary Amsden 提交于
      During a host suspend, TSC may go backwards, which KVM interprets
      as an unstable TSC.  Technically, KVM should not be marking the
      TSC unstable, which causes the TSC clocksource to go bad, but we
      need to be adjusting the TSC offsets in such a case.
      
      Dealing with this issue is a little tricky as the only place we
      can reliably do it is before much of the timekeeping infrastructure
      is up and running.  On top of this, we are not in a KVM thread
      context, so we may not be able to safely access VCPU fields.
      Instead, we compute our best known hardware offset at power-up and
      stash it to be applied to all VCPUs when they actually start running.
      Signed-off-by: NZachary Amsden <zamsden@gmail.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      0dd6a6ed
    • M
      KVM: Allow adjust_tsc_offset to be in host or guest cycles · f1e2b260
      Marcelo Tosatti 提交于
      Redefine the API to take a parameter indicating whether an
      adjustment is in host or guest cycles.
      Signed-off-by: NZachary Amsden <zamsden@gmail.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      f1e2b260
    • Z
      KVM: Add last_host_tsc tracking back to KVM · 6f526ec5
      Zachary Amsden 提交于
      The variable last_host_tsc was removed from upstream code.  I am adding
      it back for two reasons.  First, it is unnecessary to use guest TSC
      computation to conclude information about the host TSC.  The guest may
      set the TSC backwards (this case handled by the previous patch), but
      the computation of guest TSC (and fetching an MSR) is significanlty more
      work and complexity than simply reading the hardware counter.  In addition,
      we don't actually need the guest TSC for any part of the computation,
      by always recomputing the offset, we can eliminate the need to deal with
      the current offset and any scaling factors that may apply.
      
      The second reason is that later on, we are going to be using the host
      TSC value to restore TSC offsets after a host S4 suspend, so we need to
      be reading the host values, not the guest values here.
      Signed-off-by: NZachary Amsden <zamsden@gmail.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      6f526ec5
    • Z
      KVM: Improve TSC offset matching · 5d3cb0f6
      Zachary Amsden 提交于
      There are a few improvements that can be made to the TSC offset
      matching code.  First, we don't need to call the 128-bit multiply
      (especially on a constant number), the code works much nicer to
      do computation in nanosecond units.
      
      Second, the way everything is setup with software TSC rate scaling,
      we currently have per-cpu rates.  Obviously this isn't too desirable
      to use in practice, but if for some reason we do change the rate of
      all VCPUs at runtime, then reset the TSCs, we will only want to
      match offsets for VCPUs running at the same rate.
      
      Finally, for the case where we have an unstable host TSC, but
      rate scaling is being done in hardware, we should call the platform
      code to compute the TSC offset, so the math is reorganized to recompute
      the base instead, then transform the base into an offset using the
      existing API.
      
      [avi: fix 64-bit division on i386]
      Signed-off-by: NZachary Amsden <zamsden@gmail.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      
      KVM: Fix 64-bit division in kvm_write_tsc()
      
      Breaks i386 build.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      5d3cb0f6
    • Z
      KVM: Infrastructure for software and hardware based TSC rate scaling · cc578287
      Zachary Amsden 提交于
      This requires some restructuring; rather than use 'virtual_tsc_khz'
      to indicate whether hardware rate scaling is in effect, we consider
      each VCPU to always have a virtual TSC rate.  Instead, there is new
      logic above the vendor-specific hardware scaling that decides whether
      it is even necessary to use and updates all rate variables used by
      common code.  This means we can simply query the virtual rate at
      any point, which is needed for software rate scaling.
      
      There is also now a threshold added to the TSC rate scaling; minor
      differences and variations of measured TSC rate can accidentally
      provoke rate scaling to be used when it is not needed.  Instead,
      we have a tolerance variable called tsc_tolerance_ppm, which is
      the maximum variation from user requested rate at which scaling
      will be used.  The default is 250ppm, which is the half the
      threshold for NTP adjustment, allowing for some hardware variation.
      
      In the event that hardware rate scaling is not available, we can
      kludge a bit by forcing TSC catchup to turn on when a faster than
      hardware speed has been requested, but there is nothing available
      yet for the reverse case; this requires a trap and emulate software
      implementation for RDTSC, which is still forthcoming.
      
      [avi: fix 64-bit division on i386]
      Signed-off-by: NZachary Amsden <zamsden@gmail.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      cc578287
  15. 05 3月, 2012 3 次提交
  16. 27 12月, 2011 10 次提交
  17. 05 10月, 2011 1 次提交
  18. 26 9月, 2011 3 次提交
    • A
      KVM: Fix simultaneous NMIs · 7460fb4a
      Avi Kivity 提交于
      If simultaneous NMIs happen, we're supposed to queue the second
      and next (collapsing them), but currently we sometimes collapse
      the second into the first.
      
      Fix by using a counter for pending NMIs instead of a bool; since
      the counter limit depends on whether the processor is currently
      in an NMI handler, which can only be checked in vcpu context
      (via the NMI mask), we add a new KVM_REQ_NMI to request recalculation
      of the counter.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      7460fb4a
    • N
      KVM: L1 TSC handling · d5c1785d
      Nadav Har'El 提交于
      KVM assumed in several places that reading the TSC MSR returns the value for
      L1. This is incorrect, because when L2 is running, the correct TSC read exit
      emulation is to return L2's value.
      
      We therefore add a new x86_ops function, read_l1_tsc, to use in places that
      specifically need to read the L1 TSC, NOT the TSC of the current level of
      guest.
      
      Note that one change, of one line in kvm_arch_vcpu_load, is made redundant
      by a different patch sent by Zachary Amsden (and not yet applied):
      kvm_arch_vcpu_load() should not read the guest TSC, and if it didn't, of
      course we didn't have to change the call of kvm_get_msr() to read_l1_tsc().
      
      [avi: moved callback to kvm_x86_ops tsc block]
      Signed-off-by: NNadav Har'El <nyh@il.ibm.com>
      Acked-by: NZachary Amsdem <zamsden@gmail.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      d5c1785d
    • A
      KVM: MMU: Do not unconditionally read PDPTE from guest memory · e4e517b4
      Avi Kivity 提交于
      Architecturally, PDPTEs are cached in the PDPTRs when CR3 is reloaded.
      On SVM, it is not possible to implement this, but on VMX this is possible
      and was indeed implemented until nested SVM changed this to unconditionally
      read PDPTEs dynamically.  This has noticable impact when running PAE guests.
      
      Fix by changing the MMU to read PDPTRs from the cache, falling back to
      reading from memory for the nested MMU.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Tested-by: NJoerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      e4e517b4