1. 28 Apr, 2013 1 commit
  2. 17 Apr, 2013 2 commits
    • KVM: VMX: Add the deliver posted interrupt algorithm · a20ed54d
      Yang Zhang committed
      Only deliver the posted interrupt when target vcpu is running
      and there is no previous interrupt pending in pir.
      Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
      Reviewed-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
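The delivery rule described in this commit can be sketched as a small, non-atomic model. This is only an illustration under stated assumptions: the names `pi_desc_model`, `deliver_posted_interrupt`, and the `pi_action` values are hypothetical, the "outstanding notification" flag is an assumed way of tracking whether an earlier interrupt is still pending, and real code would need atomic bit operations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of one delivery attempt. */
enum pi_action {
    PI_ALREADY_PENDING,    /* vector was already set in the PIR, nothing to do */
    PI_SEND_NOTIFICATION,  /* vcpu is running: send the notification IPI */
    PI_KICK_VCPU           /* otherwise fall back to waking/kicking the vcpu */
};

/* Simplified posted-interrupt descriptor (assumption: no atomics). */
struct pi_desc_model {
    uint32_t pir[8];  /* 256 posted-interrupt request bits, one per vector */
    bool on;          /* a notification is already outstanding */
};

/* Set the vector's PIR bit and report whether it was set before. */
static bool test_and_set_pir(struct pi_desc_model *pi, uint8_t vector)
{
    uint32_t mask = UINT32_C(1) << (vector % 32);
    bool was_set = (pi->pir[vector / 32] & mask) != 0;

    pi->pir[vector / 32] |= mask;
    return was_set;
}

/* Post the vector; notify only a running vcpu with nothing outstanding. */
enum pi_action deliver_posted_interrupt(struct pi_desc_model *pi,
                                        uint8_t vector, bool vcpu_running)
{
    bool outstanding;

    if (test_and_set_pir(pi, vector))
        return PI_ALREADY_PENDING;

    outstanding = pi->on;
    pi->on = true;

    if (!outstanding && vcpu_running)
        return PI_SEND_NOTIFICATION;
    return PI_KICK_VCPU;
}
```

In this model a second interrupt posted before the first notification is consumed does not trigger another notification; it only records the vector.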
    • KVM: VMX: Enable acknowledge interupt on vmexit · a547c6db
      Yang Zhang committed
      The "acknowledge interrupt on exit" feature controls processor behavior
      for external interrupt acknowledgement. When this control is set, the
      processor acknowledges the interrupt controller to acquire the
      interrupt vector on VM exit.
      
      After enabling this feature, an interrupt that arrives while the target cpu is
      running in vmx non-root mode is handled by the vmx handler instead of the handler
      in the idt. Currently, the vmx handler only fakes an interrupt stack and jumps to
      the idt to let the real handler handle it. In the future, we will recognize the
      interrupt and deliver through the idt only those interrupts that do not belong to
      the current vcpu. Interrupts that belong to the current vcpu will be handled inside
      the vmx handler. This will reduce the interrupt handling cost of KVM.

      Also, the interrupt enable logic changes when this feature is turned on:
      Before this patch, the hypervisor called local_irq_enable() to enable interrupts
      directly. Now the IF bit is set in the interrupt stack frame, so interrupts are
      enabled on return from the interrupt handler if an external interrupt exists. If
      there is no external interrupt, local_irq_enable() is still called.

      Refer to Intel SDM volume 3, chapter 33.2.
      Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
      Reviewed-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  3. 21 Mar, 2013 1 commit
  4. 13 Mar, 2013 1 commit
    • KVM: x86: Rework INIT and SIPI handling · 66450a21
      Jan Kiszka committed
      A VCPU sending INIT or SIPI to some other VCPU races for setting the
      remote VCPU's mp_state. When we were unlucky, KVM_MP_STATE_INIT_RECEIVED
      was overwritten by kvm_emulate_halt and, thus, got lost.
      
      This introduces APIC events for those two signals, keeping them in
      kvm_apic until kvm_apic_accept_events is run over the target vcpu
      context. kvm_apic_has_events reports to kvm_arch_vcpu_runnable if there
      are pending events, thus if vcpu blocking should end.
      
      The patch comes with the side effect of effectively obsoleting
      KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but
      immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI.
      The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED
      state. That also means we no longer exit to user space after receiving a
      SIPI event.
      
      Furthermore, we already reset the VCPU on INIT, only fixing up the code
      segment later on when SIPI arrives. Moreover, we fix INIT handling for
      the BSP: it never enters wait-for-SIPI but directly starts over on INIT.
      Tested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
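The pending-event scheme from this commit can be illustrated with a small model. It is a sketch under assumptions, not the kernel code: the struct, the function names, and the event bit values are hypothetical, a SIPI that arrives outside wait-for-SIPI is assumed to be dropped, and the real implementation uses atomic bit operations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the multiprocessing states. */
enum mp_state_model { MP_RUNNABLE, MP_INIT_RECEIVED };

#define EV_INIT (1u << 0)  /* assumed bit positions for the two events */
#define EV_SIPI (1u << 1)

struct apic_vcpu_model {
    bool is_bsp;
    enum mp_state_model mp_state;
    unsigned int pending_events;  /* events latched by sender vcpus */
    uint8_t sipi_vector;
    uint16_t cs_selector;
};

/* Called from the sending vcpu: only latch the event, never touch mp_state. */
void post_init(struct apic_vcpu_model *v)
{
    v->pending_events |= EV_INIT;
}

void post_sipi(struct apic_vcpu_model *v, uint8_t vector)
{
    v->sipi_vector = vector;
    v->pending_events |= EV_SIPI;
}

/* Reported to the "is this vcpu runnable" check so that blocking ends. */
bool has_pending_events(const struct apic_vcpu_model *v)
{
    return v->pending_events != 0;
}

/* Called in the target vcpu's own context, so mp_state has one writer. */
void accept_events(struct apic_vcpu_model *v)
{
    if (v->pending_events & EV_INIT) {
        v->pending_events &= ~EV_INIT;
        v->cs_selector = 0xf000;  /* reset value; fixed up on SIPI */
        /* The BSP starts over directly; APs wait for a SIPI. */
        v->mp_state = v->is_bsp ? MP_RUNNABLE : MP_INIT_RECEIVED;
    }
    if (v->pending_events & EV_SIPI) {
        v->pending_events &= ~EV_SIPI;
        if (v->mp_state == MP_INIT_RECEIVED) {
            v->cs_selector = (uint16_t)(v->sipi_vector << 8);
            v->mp_state = MP_RUNNABLE;
        }
    }
}
```

Because senders only set bits and the target alone changes `mp_state`, the overwrite race described above cannot occur in this model.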
  5. 12 Mar, 2013 1 commit
  6. 29 Jan, 2013 2 commits
  7. 06 Dec, 2012 1 commit
  8. 01 Dec, 2012 2 commits
    • KVM: x86: Emulate IA32_TSC_ADJUST MSR · ba904635
      Will Auld committed
      CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
      
      Basic design is to emulate the MSR by allowing reads and writes to a guest
      vcpu specific location to store the value of the emulated MSR while adding
      the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will
      be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This
      is of course as long as the "use TSC counter offsetting" VM-execution control
      is enabled as well as the IA32_TSC_ADJUST control.
      
      However, because hardware will only return the TSC + IA32_TSC_ADJUST +
      vmcs tsc_offset for a guest process when it does a rdtsc (with the correct
      settings), the value of our virtualized IA32_TSC_ADJUST must be stored in one
      of these three locations. The argument against storing it in the actual MSR
      is performance. This is likely to be seldom used while the save/restore is
      required on every transition. IA32_TSC_ADJUST was created as a way to solve
      some issues with writing TSC itself so that is not an option either.
      
      The remaining option, defined above as our solution has the problem of
      returning incorrect vmcs tsc_offset values (unless we intercept and fix, not
      done here) as mentioned above. However, more problematic is that storing the
      data in vmcs tsc_offset will have a different semantic effect on the system
      than does using the actual MSR. This is illustrated in the following example:
      
      The hypervisor sets the IA32_TSC_ADJUST, then the guest sets it and a guest
      process performs a rdtsc. In this case the guest process will get
      TSC + IA32_TSC_ADJUST_hypervisor + vmcs tsc_offset including
      IA32_TSC_ADJUST_guest. While the total system semantics changed, the semantics
      as seen by the guest do not, and hence this will not cause a problem.
      Signed-off-by: Will Auld <will.auld@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
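The basic design above, folding the emulated MSR value into the vmcs tsc_offset, can be sketched as follows. This is a simplified model with hypothetical names (`vcpu_tsc_model`, `guest_write_tsc_adjust`, `guest_rdtsc`), and the host TSC is passed in as a parameter so the arithmetic is easy to follow.

```c
#include <stdint.h>

/* Hypothetical per-vcpu state: the hardware offset and the emulated MSR. */
struct vcpu_tsc_model {
    uint64_t tsc_offset;  /* stands in for the vmcs tsc_offset field */
    uint64_t tsc_adjust;  /* vcpu-specific storage for IA32_TSC_ADJUST */
};

/* A guest read just returns the stored value. */
uint64_t guest_read_tsc_adjust(const struct vcpu_tsc_model *v)
{
    return v->tsc_adjust;
}

/* A guest write moves tsc_offset by the change in the MSR value.
 * Unsigned wraparound makes negative adjustments work as well. */
void guest_write_tsc_adjust(struct vcpu_tsc_model *v, uint64_t value)
{
    v->tsc_offset += value - v->tsc_adjust;
    v->tsc_adjust = value;
}

/* What the guest observes on rdtsc when TSC offsetting is enabled. */
uint64_t guest_rdtsc(const struct vcpu_tsc_model *v, uint64_t host_tsc)
{
    return host_tsc + v->tsc_offset;
}
```

With an initial offset of 1000, writing 50 to the emulated MSR makes a host TSC of 5000 read as 6050 in the guest; writing 20 afterwards brings it back to 6020.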
    • KVM: x86: Add code to track call origin for msr assignment · 8fe8ab46
      Will Auld committed
      In order to track who initiated the call (host or guest) to modify an msr
      value I have changed function call parameters along the call path. The
      specific change is to add a struct pointer parameter that points to (index,
      data, caller) information rather than having this information passed as
      individual parameters.
      
      The initial use for this capability is for updating the IA32_TSC_ADJUST msr
      while setting the tsc value. It is anticipated that this capability is
      useful for other tasks.
      Signed-off-by: Will Auld <will.auld@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
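A minimal sketch of the parameter change, assuming a struct that bundles (index, data, caller) as described. The names `msr_data_model`, `msr_vcpu_model`, and `set_msr_model` are hypothetical, and the rule that only guest-initiated TSC writes move IA32_TSC_ADJUST is an assumption drawn from the stated initial use.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_TSC 0x10u

/* One struct instead of separate (index, data) arguments, plus the origin. */
struct msr_data_model {
    uint32_t index;
    uint64_t data;
    bool host_initiated;  /* true when the host, not the guest, is the caller */
};

struct msr_vcpu_model {
    uint64_t tsc_offset;
    uint64_t tsc_adjust;
};

/* Returns 0 on success and 1 for an MSR this model does not handle.
 * host_tsc is passed in to keep the sketch deterministic. */
int set_msr_model(struct msr_vcpu_model *v, const struct msr_data_model *msr,
                  uint64_t host_tsc)
{
    switch (msr->index) {
    case MSR_IA32_TSC: {
        uint64_t new_offset = msr->data - host_tsc;

        /* Assumption: a guest write to TSC is also visible in TSC_ADJUST,
         * while a host write (e.g. state restore) leaves it alone. */
        if (!msr->host_initiated)
            v->tsc_adjust += new_offset - v->tsc_offset;
        v->tsc_offset = new_offset;
        return 0;
    }
    default:
        return 1;
    }
}
```

Passing a pointer to one struct means new fields such as the caller can be added later without touching every function along the call path again.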
  9. 28 Nov, 2012 2 commits
  10. 22 Oct, 2012 1 commit
  11. 23 Sep, 2012 1 commit
    • KVM: x86: Fix guest debug across vcpu INIT reset · c8639010
      Jan Kiszka committed
      If we reset a vcpu on INIT, we so far overwrote dr7 as provided by
      KVM_SET_GUEST_DEBUG, and we also cleared switch_db_regs unconditionally.
      
      Fix this by saving the dr7 used for guest debugging and calculating the
      effective register value as well as switch_db_regs on any potential
      change. This will change the focus of the set_guest_debug vendor op to
      update_db_bp_intercept.
      
      Found while trying to stop on start_secondary.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
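The fix can be pictured as keeping two dr7 values and recomputing the effective one on every potential change. The sketch below is a model under assumptions: the struct and function names are hypothetical, and deriving switch_db_regs from the low eight enable bits of dr7 is an assumption about how "breakpoints are active" is decided.

```c
#include <stdbool.h>
#include <stdint.h>

#define DR7_FIXED_1    UINT64_C(0x400)  /* bit 10 of dr7 always reads as 1 */
#define DR7_BP_EN_MASK UINT64_C(0xff)   /* L0/G0 .. L3/G3 enable bits */

struct debug_model {
    bool use_hw_bp;          /* host debugger owns the hardware breakpoints */
    uint64_t guest_dr7;      /* dr7 as the guest itself programmed it */
    uint64_t debug_dr7;      /* dr7 saved from the guest-debug request */
    uint64_t effective_dr7;  /* value actually loaded for the guest */
    bool switch_db_regs;
};

/* Recompute derived state; call after any change to the inputs. */
void update_dr7(struct debug_model *d)
{
    d->effective_dr7 = d->use_hw_bp ? d->debug_dr7 : d->guest_dr7;
    d->switch_db_regs = (d->effective_dr7 & DR7_BP_EN_MASK) != 0;
}

/* INIT resets only the guest's own dr7; the debugger's copy survives. */
void init_reset(struct debug_model *d)
{
    d->guest_dr7 = DR7_FIXED_1;
    update_dr7(d);
}
```

After `init_reset()`, a debugger breakpoint stays armed because the effective value is recomputed from the saved debug dr7 rather than being overwritten.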
  12. 17 Sep, 2012 1 commit
  13. 05 Sep, 2012 1 commit
  14. 06 Aug, 2012 1 commit
  15. 21 Jul, 2012 1 commit
  16. 12 Jul, 2012 1 commit
    • KVM: VMX: Implement PCID/INVPCID for guests with EPT · ad756a16
      Mao, Junjie committed
      This patch handles PCID/INVPCID for guests.
      
      Process-context identifiers (PCIDs) are a facility by which a logical processor
      may cache information for multiple linear-address spaces so that the processor
      may retain cached information when software switches to a different linear
      address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual
      Volume 3A for details.
      
      For guests with EPT, the PCID feature is enabled and INVPCID behaves as it does
      when running natively.
      For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD.
      Signed-off-by: Junjie Mao <junjie.mao@intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  17. 06 Jun, 2012 1 commit
  18. 17 Apr, 2012 1 commit
  19. 08 Apr, 2012 1 commit
  20. 08 Mar, 2012 5 commits
  21. 05 Mar, 2012 2 commits
  22. 02 Mar, 2012 1 commit
  23. 27 Dec, 2011 1 commit
  24. 30 Oct, 2011 1 commit
    • KVM: SVM: Keep intercepting task switching with NPT enabled · f1c1da2b
      Jan Kiszka committed
      AMD processors apparently have a bug in the hardware task switching
      support when NPT is enabled. If the task switch triggers an NPF, we can
      get wrong EXITINTINFO along with that fault. On resume, spurious
      exceptions may then be injected into the guest.

      We were able to reproduce this bug when our guest triggered #SS and the
      handler was supposed to run over a separate task with not yet touched
      stack pages.
      
      Work around the issue by continuing to emulate task switches even in
      NPT mode.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  25. 26 Sep, 2011 6 commits
    • KVM: x86: Move kvm_trace_exit into atomic vmexit section · 1e2b1dd7
      Jan Kiszka committed
      This prevents events that cause the vmexit from being recorded before the
      actual exit reason.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: SVM: Fix TSC MSR read in nested SVM · 45133eca
      Nadav Har'El committed
      When the TSC MSR is read by an L2 guest (when L1 allowed this MSR to be
      read without exit), we need to return L2's notion of the TSC, not L1's.
      
      The current code incorrectly returned L1 TSC, because svm_get_msr() was also
      used in x86.c where this was assumed, but now that these places call the new
      svm_read_l1_tsc(), the MSR read can be fixed.
      Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
      Tested-by: Joerg Roedel <joerg.roedel@amd.com>
      Acked-by: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: L1 TSC handling · d5c1785d
      Nadav Har'El committed
      KVM assumed in several places that reading the TSC MSR returns the value for
      L1. This is incorrect, because when L2 is running, the correct TSC read exit
      emulation is to return L2's value.
      
      We therefore add a new x86_ops function, read_l1_tsc, to use in places that
      specifically need to read the L1 TSC, NOT the TSC of the current level of
      guest.
      
      Note that one change, of one line in kvm_arch_vcpu_load, is made redundant
      by a different patch sent by Zachary Amsden (and not yet applied):
      kvm_arch_vcpu_load() should not read the guest TSC, and if it didn't, of
      course we wouldn't have to change the call of kvm_get_msr() to read_l1_tsc().
      
      [avi: moved callback to kvm_x86_ops tsc block]
      Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
      Acked-by: Zachary Amsden <zamsden@gmail.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
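The distinction between the two reads can be sketched with a simple offset model. The names below are hypothetical, and treating L2's TSC as L1's TSC plus an extra offset chosen by L1 is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

struct nested_tsc_model {
    uint64_t l1_tsc_offset;  /* offset the host applies for L1 */
    uint64_t l2_tsc_offset;  /* extra offset L1 configured for L2 (assumed) */
    bool in_l2;              /* true while the nested guest is running */
};

/* For callers that specifically need L1's notion of time. */
uint64_t read_l1_tsc_model(const struct nested_tsc_model *m, uint64_t host_tsc)
{
    return host_tsc + m->l1_tsc_offset;
}

/* For emulating a TSC read by whichever guest level is currently running. */
uint64_t read_current_tsc_model(const struct nested_tsc_model *m,
                                uint64_t host_tsc)
{
    uint64_t tsc = read_l1_tsc_model(m, host_tsc);

    if (m->in_l2)
        tsc += m->l2_tsc_offset;
    return tsc;
}
```

The two functions return the same value while L1 runs and differ by the L2 offset while L2 runs, which is why callers have to pick the one they mean.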
    • KVM: MMU: Do not unconditionally read PDPTE from guest memory · e4e517b4
      Avi Kivity committed
      Architecturally, PDPTEs are cached in the PDPTRs when CR3 is reloaded.
      On SVM, it is not possible to implement this, but on VMX this is possible
      and was indeed implemented until nested SVM changed this to unconditionally
      read PDPTEs dynamically. This has a noticeable impact when running PAE guests.
      
      Fix by changing the MMU to read PDPTRs from the cache, falling back to
      reading from memory for the nested MMU.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Tested-by: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: Use __print_symbolic() for vmexit tracepoints · 0d460ffc
      Stefan Hajnoczi committed
      The vmexit tracepoints format the exit_reason to make it human-readable.
      Since the exit_reason depends on the instruction set (vmx or svm),
      formatting is handled with ftrace_print_symbols_seq() by referring to
      the appropriate exit reason table.
      
      However, the ftrace_print_symbols_seq() function is not meant to be used
      directly in tracepoints since it does not export the formatting table
      which userspace tools like trace-cmd and perf use to format traces.
      
      In practice perf dies when formatting vmexit-related events and
      trace-cmd falls back to printing the numeric value (with extra
      formatting code in the kvm plugin to paper over this limitation).  Other
      userspace consumers of vmexit-related tracepoints would be in similar
      trouble.
      
      To avoid significant changes to the kvm_exit tracepoint, this patch
      moves the vmx and svm exit reason tables into arch/x86/kvm/trace.h and
      selects the right table with __print_symbolic() depending on the
      instruction set.  Note that __print_symbolic() is designed for exporting
      the formatting table to userspace and allows trace-cmd and perf to work.
      Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
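The table selection described above can be illustrated in plain user-space C. This is not the kernel's __print_symbolic() macro, only a sketch of the idea: the type and function names are hypothetical, the tables hold just a few entries, and the printed names are illustrative.

```c
#include <stddef.h>

/* Hypothetical ISA tags recorded with each exit event. */
enum isa_model { ISA_VMX = 1, ISA_SVM = 2 };

struct exit_symbol {
    unsigned long value;
    const char *name;
};

/* Small excerpts of the two exit reason tables. */
static const struct exit_symbol vmx_exit_reasons[] = {
    { 0,  "EXCEPTION_NMI" },
    { 1,  "EXTERNAL_INTERRUPT" },
    { 10, "CPUID" },
    { 12, "HLT" },
};

static const struct exit_symbol svm_exit_reasons[] = {
    { 0x072, "cpuid" },
    { 0x078, "hlt" },
    { 0x400, "npf" },
};

/* Pick the table by ISA, then map the numeric reason to its name.
 * Returns NULL for unknown values so the caller can print the number. */
const char *exit_reason_name(enum isa_model isa, unsigned long reason)
{
    const struct exit_symbol *table;
    size_t count, i;

    if (isa == ISA_VMX) {
        table = vmx_exit_reasons;
        count = sizeof(vmx_exit_reasons) / sizeof(vmx_exit_reasons[0]);
    } else {
        table = svm_exit_reasons;
        count = sizeof(svm_exit_reasons) / sizeof(svm_exit_reasons[0]);
    }

    for (i = 0; i < count; i++) {
        if (table[i].value == reason)
            return table[i].name;
    }
    return NULL;
}
```

Because the same number means different things on vmx and svm, the lookup is only meaningful when the ISA is recorded next to the exit reason.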
    • KVM: Record instruction set in all vmexit tracepoints · e097e5ff
      Stefan Hajnoczi committed
      The kvm_exit tracepoint recently added the isa argument to aid decoding
      exit_reason.  The semantics of exit_reason depend on the instruction set
      (vmx or svm) and the isa argument allows traces to be analyzed on other
      machines.
      
      Add the isa argument to kvm_nested_vmexit and kvm_nested_vmexit_inject
      so these tracepoints can also be self-describing.
      Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  26. 12 Jul, 2011 1 commit
    • KVM: nVMX: Allow setting the VMXE bit in CR4 · 5e1746d6
      Nadav Har'El committed
      This patch allows the guest to enable the VMXE bit in CR4, which is a
      prerequisite to running VMXON.
      
      Whether to allow setting the VMXE bit now depends on the architecture (svm
      or vmx), so its checking has moved to kvm_x86_ops->set_cr4(). This function
      now returns an int: If kvm_x86_ops->set_cr4() returns 1, __kvm_set_cr4()
      will also return 1, and this will cause kvm_set_cr4() to inject a #GP.
      
      Turning on the VMXE bit is allowed only when the nested VMX feature is
      enabled, and turning it off is forbidden after a vmxon.
      Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
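The rules in this commit message can be sketched as a vendor-specific check that returns an int. The struct and function names are hypothetical, and the assumption that the svm side always rejects VMXE is an illustration of "depends on the architecture", not a statement from the message.

```c
#include <stdbool.h>

#define CR4_VMXE (1UL << 13)  /* CR4.VMXE is bit 13 */

enum vendor_model { VENDOR_VMX, VENDOR_SVM };

struct cr4_model {
    enum vendor_model vendor;
    bool nested_enabled;  /* nested VMX feature is enabled */
    bool vmxon_done;      /* the guest has executed VMXON */
    unsigned long cr4;
};

/* Vendor hook: returns 0 on success, 1 if the new value is not allowed. */
int vendor_set_cr4(struct cr4_model *m, unsigned long cr4)
{
    if (m->vendor == VENDOR_SVM) {
        if (cr4 & CR4_VMXE)
            return 1;  /* assumption: never allowed on svm */
    } else if (cr4 & CR4_VMXE) {
        if (!m->nested_enabled)
            return 1;  /* setting VMXE requires nested VMX */
    } else if (m->vmxon_done) {
        return 1;      /* clearing VMXE is forbidden after vmxon */
    }

    m->cr4 = cr4;
    return 0;
}

/* Generic wrapper: true means a #GP would be injected into the guest. */
bool set_cr4_injects_gp(struct cr4_model *m, unsigned long cr4)
{
    return vendor_set_cr4(m, cr4) != 0;
}
```

Returning an int from the hook lets the generic code stay free of vendor knowledge and simply turn a nonzero result into a fault.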