1. 14 12月, 2012 2 次提交
  2. 01 12月, 2012 2 次提交
    • W
      KVM: x86: Emulate IA32_TSC_ADJUST MSR · ba904635
      Will Auld 提交于
      CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
      
      Basic design is to emulate the MSR by allowing reads and writes to a guest
      vcpu specific location to store the value of the emulated MSR while adding
      the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will
      be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This
      is of course as long as the "use TSC counter offsetting" VM-execution control
      is enabled as well as the IA32_TSC_ADJUST control.
      
      However, because hardware will only return the TSC + IA32_TSC_ADJUST +
      vmsc tsc_offset for a guest process when it does and rdtsc (with the correct
      settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one
      of these three locations. The argument against storing it in the actual MSR
      is performance. This is likely to be seldom used while the save/restore is
      required on every transition. IA32_TSC_ADJUST was created as a way to solve
      some issues with writing TSC itself so that is not an option either.
      
      The remaining option, defined above as our solution has the problem of
      returning incorrect vmcs tsc_offset values (unless we intercept and fix, not
      done here) as mentioned above. However, more problematic is that storing the
      data in vmcs tsc_offset will have a different semantic effect on the system
      than does using the actual MSR. This is illustrated in the following example:
      
      The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest
      process performs a rdtsc. In this case the guest process will get
      TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including
      IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics
      as seen by the guest do not and hence this will not cause a problem.
      Signed-off-by: NWill Auld <will.auld@intel.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      ba904635
    • W
      KVM: x86: Add code to track call origin for msr assignment · 8fe8ab46
      Will Auld 提交于
      In order to track who initiated the call (host or guest) to modify an msr
      value I have changed function call parameters along the call path. The
      specific change is to add a struct pointer parameter that points to (index,
      data, caller) information rather than having this information passed as
      individual parameters.
      
      The initial use for this capability is for updating the IA32_TSC_ADJUST msr
      while setting the tsc value. It is anticipated that this capability is
      useful for other tasks.
      Signed-off-by: NWill Auld <will.auld@intel.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      8fe8ab46
  3. 28 11月, 2012 3 次提交
    • M
      KVM: x86: require matched TSC offsets for master clock · b48aa97e
      Marcelo Tosatti 提交于
      With master clock, a pvclock clock read calculates:
      
      ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ]
      
      Where 'rdtsc' is the host TSC.
      
      system_timestamp and tsc_timestamp are unique, one tuple
      per VM: the "master clock".
      
      Given a host with synchronized TSCs, its obvious that
      guest TSC must be matched for the above to guarantee monotonicity.
      
      Allow master clock usage only if guest TSCs are synchronized.
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      b48aa97e
    • M
      KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag · d828199e
      Marcelo Tosatti 提交于
      KVM added a global variable to guarantee monotonicity in the guest.
      One of the reasons for that is that the time between
      
      	1. ktime_get_ts(&timespec);
      	2. rdtscll(tsc);
      
      Is variable. That is, given a host with stable TSC, suppose that
      two VCPUs read the same time via ktime_get_ts() above.
      
      The time required to execute 2. is not the same on those two instances
      executing in different VCPUS (cache misses, interrupts...).
      
      If the TSC value that is used by the host to interpolate when
      calculating the monotonic time is the same value used to calculate
      the tsc_timestamp value stored in the pvclock data structure, and
      a single <system_timestamp, tsc_timestamp> tuple is visible to all
      vcpus simultaneously, this problem disappears. See comment on top
      of pvclock_update_vm_gtod_copy for details.
      
      Monotonicity is then guaranteed by synchronicity of the host TSCs
      and guest TSCs.
      
      Set TSC stable pvclock flag in that case, allowing the guest to read
      clock from userspace.
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      d828199e
    • M
      KVM: x86: pass host_tsc to read_l1_tsc · 886b470c
      Marcelo Tosatti 提交于
      Allow the caller to pass host tsc value to kvm_x86_ops->read_l1_tsc().
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      886b470c
  4. 23 9月, 2012 1 次提交
    • J
      KVM: x86: Fix guest debug across vcpu INIT reset · c8639010
      Jan Kiszka 提交于
      If we reset a vcpu on INIT, we so far overwrote dr7 as provided by
      KVM_SET_GUEST_DEBUG, and we also cleared switch_db_regs unconditionally.
      
      Fix this by saving the dr7 used for guest debugging and calculating the
      effective register value as well as switch_db_regs on any potential
      change. This will change to focus of the set_guest_debug vendor op to
      update_dp_bp_intercept.
      
      Found while trying to stop on start_secondary.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      c8639010
  5. 21 9月, 2012 1 次提交
  6. 20 9月, 2012 3 次提交
  7. 06 9月, 2012 1 次提交
  8. 14 8月, 2012 1 次提交
  9. 06 8月, 2012 1 次提交
  10. 21 7月, 2012 1 次提交
  11. 19 7月, 2012 2 次提交
  12. 12 7月, 2012 1 次提交
    • M
      KVM: VMX: Implement PCID/INVPCID for guests with EPT · ad756a16
      Mao, Junjie 提交于
      This patch handles PCID/INVPCID for guests.
      
      Process-context identifiers (PCIDs) are a facility by which a logical processor
      may cache information for multiple linear-address spaces so that the processor
      may retain cached information when software switches to a different linear
      address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual
      Volume 3A for details.
      
      For guests with EPT, the PCID feature is enabled and INVPCID behaves as running
      natively.
      For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD.
      Signed-off-by: NJunjie Mao <junjie.mao@intel.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      ad756a16
  13. 06 7月, 2012 1 次提交
  14. 25 6月, 2012 1 次提交
    • M
      KVM: host side for eoi optimization · ae7a2a3f
      Michael S. Tsirkin 提交于
      Implementation of PV EOI using shared memory.
      This reduces the number of exits an interrupt
      causes as much as by half.
      
      The idea is simple: there's a bit, per APIC, in guest memory,
      that tells the guest that it does not need EOI.
      We set it before injecting an interrupt and clear
      before injecting a nested one. Guest tests it using
      a test and clear operation - this is necessary
      so that host can detect interrupt nesting -
      and if set, it can skip the EOI MSR.
      
      There's a new MSR to set the address of said register
      in guest memory. Otherwise not much changed:
      - Guest EOI is not required
      - Register is tested & ISR is automatically cleared on exit
      
      For testing results see description of previous patch
      'kvm_para: guest side for eoi avoidance'.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      ae7a2a3f
  15. 17 5月, 2012 1 次提交
    • A
      KVM: MMU: Don't use RCU for lockless shadow walking · c142786c
      Avi Kivity 提交于
      Using RCU for lockless shadow walking can increase the amount of memory
      in use by the system, since RCU grace periods are unpredictable.  We also
      have an unconditional write to a shared variable (reader_counter), which
      isn't good for scaling.
      
      Replace that with a scheme similar to x86's get_user_pages_fast(): disable
      interrupts during lockless shadow walk to force the freer
      (kvm_mmu_commit_zap_page()) to wait for the TLB flush IPI to find the
      processor with interrupts enabled.
      
      We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
      kvm_flush_remote_tlbs() from avoiding the IPI.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      c142786c
  16. 24 4月, 2012 1 次提交
  17. 21 4月, 2012 1 次提交
  18. 08 4月, 2012 1 次提交
  19. 08 3月, 2012 8 次提交
    • K
      KVM: x86 emulator: Fix task switch privilege checks · 7f3d35fd
      Kevin Wolf 提交于
      Currently, all task switches check privileges against the DPL of the
      TSS. This is only correct for jmp/call to a TSS. If a task gate is used,
      the DPL of this take gate is used for the check instead. Exceptions,
      external interrupts and iret shouldn't perform any check.
      
      [avi: kill kvm-kmod remnants]
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      7f3d35fd
    • T
      KVM: Introduce kvm_memory_slot::arch and move lpage_info into it · db3fe4eb
      Takuya Yoshikawa 提交于
      Some members of kvm_memory_slot are not used by every architecture.
      
      This patch is the first step to make this difference clear by
      introducing kvm_memory_slot::arch;  lpage_info is moved into it.
      Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      db3fe4eb
    • Z
      KVM: Track TSC synchronization in generations · e26101b1
      Zachary Amsden 提交于
      This allows us to track the original nanosecond and counter values
      at each phase of TSC writing by the guest.  This gets us perfect
      offset matching for stable TSC systems, and perfect software
      computed TSC matching for machines with unstable TSC.
      Signed-off-by: NZachary Amsden <zamsden@gmail.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      e26101b1
    • Z
      KVM: Dont mark TSC unstable due to S4 suspend · 0dd6a6ed
      Zachary Amsden 提交于
      During a host suspend, TSC may go backwards, which KVM interprets
      as an unstable TSC.  Technically, KVM should not be marking the
      TSC unstable, which causes the TSC clocksource to go bad, but we
      need to be adjusting the TSC offsets in such a case.
      
      Dealing with this issue is a little tricky as the only place we
      can reliably do it is before much of the timekeeping infrastructure
      is up and running.  On top of this, we are not in a KVM thread
      context, so we may not be able to safely access VCPU fields.
      Instead, we compute our best known hardware offset at power-up and
      stash it to be applied to all VCPUs when they actually start running.
      Signed-off-by: NZachary Amsden <zamsden@gmail.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      0dd6a6ed
    • M
      KVM: Allow adjust_tsc_offset to be in host or guest cycles · f1e2b260
      Marcelo Tosatti 提交于
      Redefine the API to take a parameter indicating whether an
      adjustment is in host or guest cycles.
      Signed-off-by: NZachary Amsden <zamsden@gmail.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      f1e2b260
    • Z
      KVM: Add last_host_tsc tracking back to KVM · 6f526ec5
      Zachary Amsden 提交于
      The variable last_host_tsc was removed from upstream code.  I am adding
      it back for two reasons.  First, it is unnecessary to use guest TSC
      computation to conclude information about the host TSC.  The guest may
      set the TSC backwards (this case handled by the previous patch), but
      the computation of guest TSC (and fetching an MSR) is significanlty more
      work and complexity than simply reading the hardware counter.  In addition,
      we don't actually need the guest TSC for any part of the computation,
      by always recomputing the offset, we can eliminate the need to deal with
      the current offset and any scaling factors that may apply.
      
      The second reason is that later on, we are going to be using the host
      TSC value to restore TSC offsets after a host S4 suspend, so we need to
      be reading the host values, not the guest values here.
      Signed-off-by: NZachary Amsden <zamsden@gmail.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      6f526ec5
    • Z
      KVM: Improve TSC offset matching · 5d3cb0f6
      Zachary Amsden 提交于
      There are a few improvements that can be made to the TSC offset
      matching code.  First, we don't need to call the 128-bit multiply
      (especially on a constant number), the code works much nicer to
      do computation in nanosecond units.
      
      Second, the way everything is setup with software TSC rate scaling,
      we currently have per-cpu rates.  Obviously this isn't too desirable
      to use in practice, but if for some reason we do change the rate of
      all VCPUs at runtime, then reset the TSCs, we will only want to
      match offsets for VCPUs running at the same rate.
      
      Finally, for the case where we have an unstable host TSC, but
      rate scaling is being done in hardware, we should call the platform
      code to compute the TSC offset, so the math is reorganized to recompute
      the base instead, then transform the base into an offset using the
      existing API.
      
      [avi: fix 64-bit division on i386]
      Signed-off-by: NZachary Amsden <zamsden@gmail.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      
      KVM: Fix 64-bit division in kvm_write_tsc()
      
      Breaks i386 build.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      5d3cb0f6
    • Z
      KVM: Infrastructure for software and hardware based TSC rate scaling · cc578287
      Zachary Amsden 提交于
      This requires some restructuring; rather than use 'virtual_tsc_khz'
      to indicate whether hardware rate scaling is in effect, we consider
      each VCPU to always have a virtual TSC rate.  Instead, there is new
      logic above the vendor-specific hardware scaling that decides whether
      it is even necessary to use and updates all rate variables used by
      common code.  This means we can simply query the virtual rate at
      any point, which is needed for software rate scaling.
      
      There is also now a threshold added to the TSC rate scaling; minor
      differences and variations of measured TSC rate can accidentally
      provoke rate scaling to be used when it is not needed.  Instead,
      we have a tolerance variable called tsc_tolerance_ppm, which is
      the maximum variation from user requested rate at which scaling
      will be used.  The default is 250ppm, which is the half the
      threshold for NTP adjustment, allowing for some hardware variation.
      
      In the event that hardware rate scaling is not available, we can
      kludge a bit by forcing TSC catchup to turn on when a faster than
      hardware speed has been requested, but there is nothing available
      yet for the reverse case; this requires a trap and emulate software
      implementation for RDTSC, which is still forthcoming.
      
      [avi: fix 64-bit division on i386]
      Signed-off-by: NZachary Amsden <zamsden@gmail.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      cc578287
  20. 05 3月, 2012 3 次提交
  21. 27 12月, 2011 4 次提交