1. 05 June 2013 (2 commits)
    • KVM: MMU: reclaim the zapped-obsolete page first · 365c8868
      Committed by Xiao Guangrong
      As Marcelo pointed out:
      | "(retention of large number of pages while zapping)
      | can be fatal, it can lead to OOM and host crash"

      We introduce a list, kvm->arch.zapped_obsolete_pages, to link all
      the pages that have been deleted from the mmu cache but not yet
      freed. When page reclaiming is needed, we always zap these pages
      first (see the sketch after this entry).
      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      365c8868
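      A minimal sketch of the reclaim preference described above, assuming the
      usual KVM mmu_lock and list primitives; kvm_mmu_commit_zap_page and
      zapped_obsolete_pages follow the commit text, while mmu_shrink_one_vm and
      zap_some_live_pages are hypothetical stand-ins for the real shrinker path:

        static void mmu_shrink_one_vm(struct kvm *kvm)      /* hypothetical wrapper */
        {
            spin_lock(&kvm->mmu_lock);
            if (!list_empty(&kvm->arch.zapped_obsolete_pages)) {
                /* Pages already unlinked by a fast-invalidate pass:
                 * releasing them is cheap, so free them first. */
                kvm_mmu_commit_zap_page(kvm,
                        &kvm->arch.zapped_obsolete_pages);
            } else {
                /* Otherwise fall back to zapping live shadow pages. */
                zap_some_live_pages(kvm);                    /* hypothetical helper */
            }
            spin_unlock(&kvm->mmu_lock);
        }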
    • KVM: MMU: fast invalidate all pages · 5304b8d3
      Committed by Xiao Guangrong
      The current kvm_mmu_zap_all is really slow: it holds mmu-lock while it
      walks and zaps all shadow pages one by one, and it also has to zap every
      guest page's rmap and every shadow page's parent spte list. Things get
      worse as the guest uses more memory or vcpus, so it does not scale.

      This patch introduces a faster way to invalidate all shadow pages.
      KVM maintains a global mmu invalid generation number, stored in
      kvm->arch.mmu_valid_gen, and every shadow page records the current
      global generation number in sp->mmu_valid_gen when it is created.

      When KVM needs to zap all shadow page sptes, it simply increases the
      global generation number and then reloads the root shadow pages on all
      vcpus. Each vcpu then builds a new shadow page table against the current
      generation number, which ensures the old pages are no longer used.
      The obsolete pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen)
      are afterwards zapped using a lock-break technique (see the sketch
      after this entry).
      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      5304b8d3
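      A rough sketch of the generation-number scheme; mmu_valid_gen and
      is_obsolete_sp follow the commit text, kvm_reload_remote_mmus is the
      existing helper that forces every vcpu to reload its root, and the
      lock-break zap loop over obsolete pages is elided:

        static void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
        {
            spin_lock(&kvm->mmu_lock);
            /* Every shadow page created from now on carries the new
             * generation; all existing pages instantly become obsolete. */
            kvm->arch.mmu_valid_gen++;
            /* Make all vcpus drop their roots and rebuild their shadow
             * page tables against the new generation. */
            kvm_reload_remote_mmus(kvm);
            spin_unlock(&kvm->mmu_lock);
        }

        static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
        {
            return sp->mmu_valid_gen != kvm->arch.mmu_valid_gen;
        }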
  2. 03 May 2013 (1 commit)
  3. 28 April 2013 (2 commits)
  4. 27 April 2013 (1 commit)
  5. 22 April 2013 (1 commit)
  6. 17 April 2013 (2 commits)
    • KVM: VMX: Add the deliver posted interrupt algorithm · a20ed54d
      Committed by Yang Zhang
      Only deliver the posted interrupt when the target vcpu is running
      and there is no previous interrupt pending in the PIR (see the
      sketch after this entry).
      Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
      Reviewed-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      a20ed54d
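      A condensed sketch of the delivery path along the lines of the commit;
      pi_test_and_set_pir, pi_desc and POSTED_INTR_VECTOR come from the VMX
      posted-interrupt code, though the exact checks in the real function
      differ slightly:

        static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
        {
            struct vcpu_vmx *vmx = to_vmx(vcpu);

            /* Latch the vector in the posted-interrupt request bitmap;
             * if it was already pending there is nothing more to do. */
            if (pi_test_and_set_pir(vector, &vmx->pi_desc))
                return;

            if (vcpu->mode == IN_GUEST_MODE)
                /* Target vcpu is in guest mode: send the notification
                 * IPI so the CPU injects the interrupt directly. */
                apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
                                    POSTED_INTR_VECTOR);
            else
                /* Otherwise just kick it; the pending PIR bit is
                 * picked up on the next VM entry. */
                kvm_vcpu_kick(vcpu);
        }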
    • KVM: VMX: Enable acknowledge interrupt on vmexit · a547c6db
      Committed by Yang Zhang
      The "acknowledge interrupt on exit" feature controls processor behavior
      for external interrupt acknowledgement. When this control is set, the
      processor acknowledges the interrupt controller to acquire the
      interrupt vector on VM exit.
      
      With this feature enabled, an interrupt that arrives while the target cpu
      is running in VMX non-root mode is handled by the VMX exit handler instead
      of the handler in the IDT. For now, the VMX handler only fakes an interrupt
      stack frame and jumps into the IDT table to let the real handler deal with
      it (see the sketch after this entry). Later, we will recognize the interrupt
      and deliver only interrupts that do not belong to the current vcpu through
      the IDT table; interrupts that belong to the current vcpu will be handled
      inside the VMX handler. This reduces KVM's interrupt-handling cost.

      The interrupt-enable logic also changes when this feature is turned on.
      Before this patch, the hypervisor called local_irq_enable() to enable
      interrupts directly. Now the IF bit is set in the faked interrupt stack
      frame, so interrupts are re-enabled on the return from the interrupt
      handler when an external interrupt exists; if there is no external
      interrupt, local_irq_enable() is still called.

      Refer to Intel SDM volume 3, chapter 33.2.
      Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
      Reviewed-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      a547c6db
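      A very loose sketch of the exit-side handling, assuming the VMCS field
      names VM_EXIT_INTR_INFO and INTR_INFO_VECTOR_MASK; building the fake
      interrupt frame is done in assembly in the real patch, so the two helpers
      marked hypothetical only stand in for that step:

        static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
        {
            u32 intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
            unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;

            /* Look up the host IDT entry for the acknowledged vector. */
            unsigned long handler = host_idt_handler_for(vector);    /* hypothetical */

            /* Push a fake interrupt frame with IF set in the saved RFLAGS,
             * then call the handler; when the handler returns, interrupts
             * are re-enabled without calling local_irq_enable(). */
            call_on_fake_irq_frame(handler);                          /* hypothetical */
        }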
  7. 14 April 2013 (1 commit)
  8. 08 April 2013 (2 commits)
  9. 02 April 2013 (1 commit)
    • pmu: prepare for migration support · afd80d85
      Committed by Paolo Bonzini
      In order to migrate the PMU state correctly, we need to restore the
      values of MSR_CORE_PERF_GLOBAL_STATUS (a read-only register) and
      MSR_CORE_PERF_GLOBAL_OVF_CTRL (which has side effects when written).
      We also need to write the full 40-bit value of the performance counter,
      which would only be possible with a v3 architectural PMU's full-width
      counter MSRs.

      To distinguish host-initiated writes from the guest's, pass the
      full struct msr_data to kvm_pmu_set_msr (see the sketch after this
      entry).
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      afd80d85
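      An abbreviated sketch of how the host_initiated flag carried in
      struct msr_data lets kvm_pmu_set_msr accept restores of the otherwise
      read-only status register; the real function handles many more MSRs:

        int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
        {
            struct kvm_pmu *pmu = &vcpu->arch.pmu;

            switch (msr_info->index) {
            case MSR_CORE_PERF_GLOBAL_STATUS:
                /* Read-only for the guest; only accept the write when it
                 * comes from the host, i.e. a migration restore. */
                if (!msr_info->host_initiated)
                    return 1;        /* reject, caller injects #GP */
                pmu->global_status = msr_info->data;
                return 0;
            default:
                return 1;
            }
        }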
  10. 20 March 2013 (1 commit)
  11. 14 March 2013 (2 commits)
  12. 13 March 2013 (1 commit)
    • KVM: x86: Rework INIT and SIPI handling · 66450a21
      Committed by Jan Kiszka
      A VCPU sending INIT or SIPI to some other VCPU races for setting the
      remote VCPU's mp_state. When we were unlucky, KVM_MP_STATE_INIT_RECEIVED
      was overwritten by kvm_emulate_halt and, thus, got lost.

      This introduces APIC events for those two signals, keeping them in
      kvm_apic until kvm_apic_accept_events is run in the target vcpu
      context (see the sketch after this entry). kvm_apic_has_events reports
      to kvm_arch_vcpu_runnable whether there are pending events, and thus
      whether vcpu blocking should end.

      The patch comes with the side effect of effectively obsoleting
      KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but
      immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI.
      The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED
      state. That also means we no longer exit to user space after receiving a
      SIPI event.

      Furthermore, we already reset the VCPU on INIT, only fixing up the code
      segment later on when SIPI arrives. Moreover, we fix INIT handling for
      the BSP: it never enters wait-for-SIPI but directly starts over on INIT.
      Tested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      66450a21
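      An abridged sketch of the event-acceptance path described above;
      KVM_APIC_INIT, KVM_APIC_SIPI, pending_events and kvm_apic_accept_events
      follow the commit text, while the reset and SIPI-delivery details are
      elided:

        void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
        {
            struct kvm_lapic *apic = vcpu->arch.apic;

            if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
                kvm_vcpu_reset(vcpu);
                /* The BSP restarts immediately; APs wait for SIPI. */
                if (kvm_vcpu_is_bsp(vcpu))
                    vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
                else
                    vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
            }
            if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
                vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
                /* SIPI only takes effect in wait-for-SIPI state. */
                kvm_vcpu_deliver_sipi_vector(vcpu, apic->sipi_vector);
                vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
            }
        }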
  13. 12 March 2013 (1 commit)
  14. 29 January 2013 (2 commits)
  15. 22 January 2013 (1 commit)
  16. 14 January 2013 (1 commit)
  17. 14 December 2012 (3 commits)
  18. 01 December 2012 (2 commits)
    • KVM: x86: Emulate IA32_TSC_ADJUST MSR · ba904635
      Committed by Will Auld
      CPUID.7.0.EBX[1]=1 indicates that the IA32_TSC_ADJUST MSR 0x3b is supported.

      The basic design is to emulate the MSR by allowing reads and writes to a
      guest-vcpu-specific location that stores the value of the emulated MSR,
      while also adding that value to the vmcs tsc_offset. In this way the
      IA32_TSC_ADJUST value will be included in all reads of the TSC MSR,
      whether through rdmsr or rdtsc, as long as the "use TSC counter
      offsetting" VM-execution control is enabled, as well as the
      IA32_TSC_ADJUST control (see the sketch after this entry).

      However, because hardware will only return TSC + IA32_TSC_ADJUST +
      vmcs tsc_offset for a guest process when it does a rdtsc (with the correct
      settings), the value of our virtualized IA32_TSC_ADJUST must be stored in
      one of these three locations. The argument against storing it in the actual
      MSR is performance: it is likely to be seldom used, while the save/restore
      would be required on every transition. IA32_TSC_ADJUST was created as a way
      to solve some issues with writing the TSC itself, so that is not an option
      either.

      The remaining option, defined above as our solution, has the problem of
      returning incorrect vmcs tsc_offset values (unless we intercept and fix
      them, which is not done here) as mentioned above. More problematic, however,
      is that storing the data in vmcs tsc_offset has a different semantic effect
      on the system than using the actual MSR. This is illustrated in the
      following example:

      The hypervisor sets IA32_TSC_ADJUST, then the guest sets it, and a guest
      process performs a rdtsc. In this case the guest process will get
      TSC + IA32_TSC_ADJUST_hypervisor + vmcs tsc_offset including
      IA32_TSC_ADJUST_guest. While the total system semantics change, the
      semantics as seen by the guest do not, and hence this will not cause a
      problem.
      Signed-off-by: Will Auld <will.auld@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      ba904635
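      A condensed sketch of the write side, assuming the per-vcpu field and the
      adjust_tsc_offset vendor op named here; the real logic lives inside
      kvm_set_msr_common and first checks the TSC_ADJUST CPUID bit, and the
      wrapper function name below is hypothetical:

        static int set_msr_tsc_adjust(struct kvm_vcpu *vcpu,
                                      struct msr_data *msr_info)
        {
            u64 data = msr_info->data;

            if (!msr_info->host_initiated) {
                /* A guest write shifts the effective TSC by the delta
                 * between the new and the old adjust value, applied to
                 * the vmcs tsc_offset. */
                s64 adj = data - vcpu->arch.ia32_tsc_adjust_msr;
                kvm_x86_ops->adjust_tsc_offset(vcpu, adj, true);
            }
            /* The emulated MSR value itself lives in a per-vcpu field so
             * rdmsr can return it without touching tsc_offset again. */
            vcpu->arch.ia32_tsc_adjust_msr = data;
            return 0;
        }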
    • KVM: x86: Add code to track call origin for msr assignment · 8fe8ab46
      Committed by Will Auld
      In order to track who initiated the call (host or guest) to modify an msr
      value, I have changed the function call parameters along the call path.
      The specific change is to add a struct pointer parameter that points to
      (index, data, caller) information, rather than passing this information as
      individual parameters (see the sketch after this entry).

      The initial use of this capability is updating the IA32_TSC_ADJUST msr
      while setting the tsc value. It is anticipated that the capability will
      be useful for other tasks as well.
      Signed-off-by: Will Auld <will.auld@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      8fe8ab46
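      The struct that carries the (index, data, caller) triple, essentially as
      introduced by the commit, plus the shape of a setter that now takes it:

        struct msr_data {
            bool host_initiated;     /* write came from userspace/host, not the guest */
            u32  index;              /* MSR number */
            u64  data;               /* value to write */
        };

        int kvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);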
  19. 28 November 2012 (3 commits)
    • KVM: x86: require matched TSC offsets for master clock · b48aa97e
      Committed by Marcelo Tosatti
      With the master clock, a pvclock clock read calculates:

      ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ]

      where 'rdtsc' is the host TSC.

      system_timestamp and tsc_timestamp are unique, one tuple
      per VM: the "master clock".

      Given a host with synchronized TSCs, it is obvious that the
      guest TSCs must be matched for the above to guarantee monotonicity.

      Allow master clock usage only if the guest TSCs are synchronized
      (see the sketch after this entry).
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      b48aa97e
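      A small sketch of the check this implies. nr_vcpus_matched_tsc and
      use_master_clock are the bookkeeping fields this series adds to kvm->arch;
      host_tsc_is_stable and the wrapper function name are assumed stand-ins for
      the host-clocksource check:

        static void update_master_clock_allowed(struct kvm *kvm)    /* hypothetical */
        {
            struct kvm_arch *ka = &kvm->arch;

            /* All online vcpus must have been given matched TSC offsets. */
            bool vcpus_matched = (ka->nr_vcpus_matched_tsc ==
                                  atomic_read(&kvm->online_vcpus));

            /* The master clock is only used when the host TSC is a stable
             * clocksource and every guest TSC is synchronized. */
            ka->use_master_clock = vcpus_matched && ka->host_tsc_is_stable;
        }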
    • KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag · d828199e
      Committed by Marcelo Tosatti
      KVM added a global variable to guarantee monotonicity in the guest.
      One of the reasons is that the time between

      	1. ktime_get_ts(&timespec);
      	2. rdtscll(tsc);

      is variable. That is, given a host with a stable TSC, suppose that
      two VCPUs read the same time via ktime_get_ts() above.

      The time required to execute 2. is not the same on those two instances
      executing on different VCPUs (cache misses, interrupts...).

      If the TSC value that the host uses to interpolate when calculating
      the monotonic time is the same value used to calculate the
      tsc_timestamp value stored in the pvclock data structure, and
      a single <system_timestamp, tsc_timestamp> tuple is visible to all
      vcpus simultaneously, this problem disappears. See the comment on top
      of pvclock_update_vm_gtod_copy for details.

      Monotonicity is then guaranteed by the synchronicity of the host TSCs
      and guest TSCs.

      Set the TSC-stable pvclock flag in that case, allowing the guest to read
      the clock from userspace (see the sketch after this entry).
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      d828199e
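      A guest-side sketch of why the flag matters: with PVCLOCK_TSC_STABLE_BIT
      set, the per-vcpu tuple is globally consistent, so the read needs no
      cross-vcpu clamping. pvclock_scale_delta and the pvclock_vcpu_time_info
      fields are the existing guest pvclock ABI; last_value_seen is a
      hypothetical stand-in for the global-variable fallback:

        static u64 pvclock_read_ns(struct pvclock_vcpu_time_info *src)
        {
            /* Cycles elapsed since the host sampled the tuple... */
            u64 delta = rdtsc() - src->tsc_timestamp;
            /* ...converted to nanoseconds and added to the host timestamp. */
            u64 ret = src->system_time +
                      pvclock_scale_delta(delta, src->tsc_to_system_mul,
                                          src->tsc_shift);

            if (!(src->flags & PVCLOCK_TSC_STABLE_BIT))
                /* Without the flag, monotonicity across vcpus has to be
                 * enforced with a shared last-seen value. */
                ret = max(ret, last_value_seen());    /* hypothetical */

            return ret;
        }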
    • KVM: x86: pass host_tsc to read_l1_tsc · 886b470c
      Committed by Marcelo Tosatti
      Allow the caller to pass a host tsc value to kvm_x86_ops->read_l1_tsc()
      (see the sketch after this entry).
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      886b470c
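      Roughly, the callback's shape after the change; the point is that one
      host TSC sample can feed both the pvclock tuple and the guest TSC view.
      The signature follows the commit message, the usage line is illustrative:

        /* vendor op: convert an already-sampled host TSC into the guest's
         * L1 TSC view instead of reading the TSC again inside the callback */
        u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu, u64 host_tsc);

        /* e.g. in the clock-update path: */
        tsc_timestamp = kvm_x86_ops->read_l1_tsc(vcpu, host_tsc);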
  20. 23 September 2012 (1 commit)
    • KVM: x86: Fix guest debug across vcpu INIT reset · c8639010
      Committed by Jan Kiszka
      If we reset a vcpu on INIT, we have so far overwritten dr7 as provided by
      KVM_SET_GUEST_DEBUG, and we also cleared switch_db_regs unconditionally.

      Fix this by saving the dr7 used for guest debugging and calculating the
      effective register value, as well as switch_db_regs, on any potential
      change (see the sketch after this entry). This shifts the focus of the
      set_guest_debug vendor op to update_db_bp_intercept.

      Found while trying to stop on start_secondary.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      c8639010
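      A sketch of the recomputation described above: guest_debug_dr7 holds the
      dr7 supplied via KVM_SET_GUEST_DEBUG so it survives an INIT reset, while
      arch.dr7 is what the guest itself programmed (field and helper names
      follow the commit's intent, slightly simplified):

        static void kvm_update_dr7(struct kvm_vcpu *vcpu)
        {
            unsigned long dr7;

            if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
                /* Host-side debugging wins while it is active. */
                dr7 = vcpu->arch.guest_debug_dr7;
            else
                dr7 = vcpu->arch.dr7;

            kvm_x86_ops->set_dr7(vcpu, dr7);
            /* Only switch debug registers on entry/exit when some
             * breakpoint is actually enabled in the effective dr7. */
            vcpu->arch.switch_db_regs = (dr7 & DR7_BP_EN_MASK);
        }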
  21. 21 September 2012 (1 commit)
  22. 20 September 2012 (3 commits)
  23. 06 September 2012 (1 commit)
  24. 14 August 2012 (1 commit)
  25. 06 August 2012 (1 commit)
  26. 21 July 2012 (1 commit)
  27. 19 July 2012 (1 commit)