1. 12 Jun 2018 (2 commits)
  2. 04 Jun 2018 (3 commits)
    • kvm: nVMX: Add support for "VMWRITE to any supported field" · f4160e45
      Authored by Jim Mattson
      Add support for "VMWRITE to any supported field in the VMCS" and
      enable this feature by default in L1's IA32_VMX_MISC MSR. If userspace
      clears the VMX capability bit, the old behavior will be restored.
      
      Note that this feature is a prerequisite for kvm in L1 to use VMCS
      shadowing, once that feature is available.
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f4160e45
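      A minimal sketch of the shape of this change, assuming the kernel's
      MSR bit name; the container fields and function boundaries here are
      illustrative rather than the verbatim patch, and
      nested_cpu_has_vmwrite_any_field() is the predicate the patch itself
      introduces:

          /* IA32_VMX_MISC bit 29: VMWRITE to any supported VMCS field. */
          #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS 0x20000000

          /* Advertise the feature by default in L1's IA32_VMX_MISC. */
          static void nested_vmx_setup_misc(struct vcpu_vmx *vmx)
          {
                  vmx->nested.nested_vmx_misc_low |=
                          MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS;
          }

          /* In the VMWRITE handler: reject read-only fields only when
           * userspace has cleared the capability bit. */
          static bool vmwrite_field_allowed(struct kvm_vcpu *vcpu,
                                            unsigned long field)
          {
                  return !vmcs_field_readonly(field) ||
                         nested_cpu_has_vmwrite_any_field(vcpu);
          }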
    • kvm: nVMX: Restrict VMX capability MSR changes · a943ac50
      Authored by Jim Mattson
      Disallow changes to the VMX capability MSRs while the vCPU is in VMX
      operation. Although this does break the existing API, it helps to
      avoid some potentially tricky situations for which there is no
      architected behavior.
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a943ac50
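      A sketch of the new guard, assuming the nested.vmxon flag tracks VMX
      operation (the per-MSR dispatch below the check is elided):

          static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index,
                                     u64 data)
          {
                  struct vcpu_vmx *vmx = to_vmx(vcpu);

                  /*
                   * Once the vCPU is in VMX operation, capability changes
                   * have no architected behavior; refuse them outright.
                   */
                  if (vmx->nested.vmxon)
                          return -EBUSY;

                  /* ... existing per-MSR handling ... */
                  return 0;
          }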
    • KVM: VMX: Optimize tscdeadline timer latency · c5ce8235
      Authored by Wanpeng Li
      Commit d0659d94 ("KVM: x86: add option to advance tscdeadline
      hrtimer expiration") advances the tscdeadline expiration (the timer is
      emulated by an hrtimer) so that the latency incurred in the hypervisor
      (apic_timer_fn -> vmentry) can be hidden. This patch adds the same
      advance-expiration support for the case in which the tscdeadline timer
      is emulated by the VMX preemption timer, reducing the hypervisor
      latency (handle_preemption_timer -> vmentry). The guest can also set
      an expiration that is very small or already in the past (for example,
      in Linux an hrtimer can feed an expiration in the past); in that case
      we set delta_tsc to 0, leading to an immediate vmexit when delta_tsc
      is not larger than the advance in nanoseconds.
      
      This patch reduces latency by ~63% (from ~4450 cycles to ~1660 cycles
      on a Haswell desktop) for kvm-unit-tests/tscdeadline_latency when
      testing busy waits.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c5ce8235
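      Roughly, the preemption-timer path now mirrors the hrtimer advance
      logic. A sketch, assuming lapic_timer_advance_ns and nsec_to_cycles()
      are visible at this point:

          static int vmx_set_hv_timer(struct kvm_vcpu *vcpu,
                                      u64 guest_deadline_tsc)
          {
                  u64 tscl = rdtsc();
                  u64 guest_tscl = kvm_read_l1_tsc(vcpu, tscl);
                  u64 delta_tsc = max(guest_deadline_tsc, guest_tscl) -
                                  guest_tscl;
                  u64 advance = nsec_to_cycles(vcpu, lapic_timer_advance_ns);

                  /*
                   * Fire early to hide handle_preemption_timer -> vmentry.
                   * A deadline at or before "now + advance" yields
                   * delta_tsc = 0, i.e. an immediate vmexit after vmentry.
                   */
                  if (delta_tsc > advance)
                          delta_tsc -= advance;
                  else
                          delta_tsc = 0;

                  /* ... program the VMX preemption timer with delta_tsc ... */
                  return 0;
          }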
  3. 02 Jun 2018 (1 commit)
    • kvm: Make VM ioctl do valloc for some archs · d1e5b0e9
      Authored by Marc Orr
      The kvm struct has been bloating. For example, it is tens of kilobytes
      on x86, which turns out to be a large amount of memory to allocate
      contiguously via kzalloc. Thus, this patch does the following:
      1. Uses architecture-specific routines to allocate the kvm struct via
         vzalloc for x86.
      2. Switches arm to __KVM_HAVE_ARCH_VM_ALLOC so that it can use vzalloc
         when has_vhe() is true.
      
      Other architectures continue to default to kzalloc, as they either
      depend on kzalloc or have a small enough struct kvm (the resulting
      hooks are sketched after this entry).
      Signed-off-by: Marc Orr <marcorr@google.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d1e5b0e9
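      A sketch of the resulting hooks; the generic default stays kzalloc,
      and the exact home of the x86 override is an assumption here:

          /* Generic default: struct kvm is small, contiguous memory is fine. */
          #ifndef __KVM_HAVE_ARCH_VM_ALLOC
          static inline struct kvm *kvm_arch_alloc_vm(void)
          {
                  return kzalloc(sizeof(struct kvm), GFP_KERNEL);
          }

          static inline void kvm_arch_free_vm(struct kvm *kvm)
          {
                  kfree(kvm);
          }
          #endif

          /* x86 override: struct kvm is tens of kilobytes, so don't demand
           * physically contiguous memory; use vzalloc/vfree instead. */
          #define __KVM_HAVE_ARCH_VM_ALLOC
          static inline struct kvm *kvm_arch_alloc_vm(void)
          {
                  return vzalloc(sizeof(struct kvm));
          }

          static inline void kvm_arch_free_vm(struct kvm *kvm)
          {
                  vfree(kvm);
          }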
  4. 25 May 2018 (3 commits)
  5. 23 May 2018 (3 commits)
    • KVM: nVMX: Ensure that VMCS12 field offsets do not change · 21ebf53b
      Authored by Jim Mattson
      Enforce the invariant that existing VMCS12 field offsets must not
      change. Experience has shown that without strict enforcement, this
      invariant will not be maintained.
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [Changed the code to use BUILD_BUG_ON_MSG instead of the nicer
       _Static_assert, which would require GCC 4.6. - Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      21ebf53b
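      The enforcement is purely compile-time. A sketch of the pattern (the
      listed fields and offsets are illustrative):

          #define CHECK_OFFSET(field, loc)                                  \
                  BUILD_BUG_ON_MSG(offsetof(struct vmcs12, field) != (loc), \
                          "Offset of " #field " in struct vmcs12 has changed.")

          static inline void vmx_check_vmcs12_offsets(void)
          {
                  CHECK_OFFSET(revision_id, 0);
                  CHECK_OFFSET(abort, 4);
                  CHECK_OFFSET(launch_state, 8);
                  /* ... one CHECK_OFFSET per VMCS12 field ... */
          }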
    • KVM: nVMX: Restore the VMCS12 offsets for v4.0 fields · b348e793
      Authored by Jim Mattson
      Changing the VMCS12 layout will break save/restore compatibility with
      older kvm releases once the KVM_{GET,SET}_NESTED_STATE ioctls are
      accepted upstream. Google has already been using these ioctls for some
      time, and we implore the community not to disturb the existing layout.
      
      Move the four most recently added fields to preserve the offsets of
      the previously defined fields and reserve locations for the vmread and
      vmwrite bitmaps, which will be used in the virtualization of VMCS
      shadowing (to improve the performance of double-nesting).
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [Kept the SDM order in vmcs_field_to_offset_table. - Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      b348e793
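      Illustratively, the layout now pins the v4.0 offsets and reserves the
      bitmap slots; apart from vmread_bitmap/vmwrite_bitmap, the field names
      are elided:

          struct vmcs12 {
                  /* ... fields whose offsets the v4.0 layout fixes ... */
                  u64 vmread_bitmap;      /* reserved for VMCS shadowing */
                  u64 vmwrite_bitmap;     /* reserved for VMCS shadowing */
                  /* ... post-v4.0 fields are appended, never inserted ... */
          };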
    • kvm: nVMX: Use nested_run_pending rather than from_vmentry · 6514dc38
      Authored by Jim Mattson
      When saving a vCPU's nested state, the vmcs02 is discarded. Only the
      shadow vmcs12 is saved. The shadow vmcs12 contains all of the
      information needed to reconstruct an equivalent vmcs02 on restore, but
      we have to be able to deal with two contexts:
      
      1. The nested state was saved immediately after an emulated VM-entry,
         before the vmcs02 was ever launched.
      
      2. The nested state was saved some time after the first successful
         launch of the vmcs02.
      
      Though it's an implementation detail rather than an architected bit,
      vmx->nested_run_pending serves to distinguish between these two
      cases. Hence, we save it as part of the vCPU's nested state. (Yes,
      this is ugly.)
      
      Even when restoring from a checkpoint, it may be necessary to build
      the vmcs02 as if prepare_vmcs02 was called from nested_vmx_run. So,
      the 'from_vmentry' argument should be dropped, and
      vmx->nested_run_pending should be consulted instead. The nested state
      restoration code then has to set vmx->nested_run_pending prior to
      calling prepare_vmcs02. It's important that the restoration code set
      vmx->nested_run_pending anyway, since the flag impacts things like
      interrupt delivery as well.
      
      Fixes: cf8b84f4 ("kvm: nVMX: Prepare for checkpointing L2 state")
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      6514dc38
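      A sketch of the switch-over, using the debug-controls load as the
      example clause (prepare_vmcs02 loses its from_vmentry parameter and
      reads the flag instead):

          static void prepare_vmcs02(struct kvm_vcpu *vcpu,
                                     struct vmcs12 *vmcs12)
          {
                  struct vcpu_vmx *vmx = to_vmx(vcpu);

                  /* Formerly: if (from_vmentry && ...) */
                  if (vmx->nested.nested_run_pending &&
                      (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
                          kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);

                  /* ... */
          }

      The restore path then only has to set vmx->nested.nested_run_pending
      from the saved nested state before calling prepare_vmcs02.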
  6. 15 May 2018 (3 commits)
  7. 11 May 2018 (1 commit)
  8. 27 Apr 2018 (1 commit)
  9. 16 Apr 2018 (2 commits)
  10. 13 Apr 2018 (1 commit)
  11. 11 Apr 2018 (1 commit)
  12. 10 Apr 2018 (1 commit)
    • X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted · 386c6ddb
      Authored by KarimAllah Ahmed
      The VMX-preemption timer is used by KVM as a way to set deadlines for
      the guest (i.e. timer emulation). That was safe until very recently,
      when the KVM_X86_DISABLE_EXITS_MWAIT capability to disable intercepting
      MWAIT was introduced. According to Intel SDM 25.5.1:
      
      """
      The VMX-preemption timer operates in the C-states C0, C1, and C2; it also
      operates in the shutdown and wait-for-SIPI states. If the timer counts down
      to zero in any state other than the wait-for SIPI state, the logical
      processor transitions to the C0 C-state and causes a VM exit; the timer
      does not cause a VM exit if it counts down to zero in the wait-for-SIPI
      state. The timer is not decremented in C-states deeper than C2.
      """
      
      Now once the guest issues an MWAIT targeting a C-state deeper than C2,
      the preemption timer will never wake it up again, since it has stopped
      ticking! Usually this is compensated for by other activity in the
      system that wakes the core from the deep C-state (and causes a VMExit),
      for example the host itself ticking or receiving interrupts.
      
      So disable the VMX-preemption timer if MWAIT is exposed to the guest.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Fixes: 4d5422ce
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      386c6ddb
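      The fix boils down to one extra condition. A sketch with a
      hypothetical helper name; kvm_mwait_in_guest() is the predicate
      introduced with KVM_X86_DISABLE_EXITS_MWAIT:

          static bool vmx_hv_timer_usable(struct kvm_vcpu *vcpu)
          {
                  /*
                   * If MWAIT executes in the guest, the core can be parked
                   * below C2, where the preemption timer stops counting,
                   * so the timer cannot be relied upon for deadlines.
                   */
                  return cpu_has_vmx_preemption_timer() &&
                         !kvm_mwait_in_guest(vcpu->kvm);
          }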
  13. 07 Apr 2018 (1 commit)
  14. 05 Apr 2018 (4 commits)
  15. 04 Apr 2018 (1 commit)
    • KVM: VMX: raise internal error for exception during invalid protected mode state · add5ff7a
      Authored by Sean Christopherson
      Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
      an exception in Protected Mode while emulating the guest due to
      invalid guest state.  Unlike Big RM, KVM doesn't support emulating
      exceptions in PM, i.e. PM exceptions are always injected via the VMCS.
      Because we will never do VMRESUME due to emulation_required, the
      exception is never realized and we'll keep emulating the faulting
      instruction over and over until we receive a signal.
      
      Exit to userspace iff there is a pending exception, i.e. don't exit
      simply on a requested event. The purpose of this check and exit is to
      aid in debugging a guest that is in all likelihood already doomed.
      Invalid guest state in PM is extremely limited in normal operation,
      e.g. it generally only occurs for a few instructions early in BIOS,
      and any exception at this time is all but guaranteed to be fatal.
      Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
      handled/emulated, while checking for vectored interrupts, e.g. INTR
      and NMI, without hitting false positives would add a fair amount of
      complexity for almost no benefit (getting hit by lightning seems
      more likely than encountering this specific scenario).
      
      Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
      exception via the VMCS and emulation_required is true.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      add5ff7a
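      Two touch points, sketched; the first helper is hypothetical (in the
      patch the check sits inline in the emulation loop):

          /* In the invalid-guest-state loop: a pending exception can never
           * be injected, so give up and report it to userspace. */
          static int report_pending_exception(struct kvm_vcpu *vcpu)
          {
                  if (vcpu->arch.exception.pending) {
                          vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
                          vcpu->run->internal.suberror =
                                  KVM_INTERNAL_ERROR_EMULATION;
                          vcpu->run->internal.ndata = 0;
                          return 0;       /* exit to userspace */
                  }
                  return 1;               /* keep emulating */
          }

          static void vmx_queue_exception(struct kvm_vcpu *vcpu)
          {
                  /* Injection via the VMCS can never be realized while
                   * emulation_required, since VMRESUME won't be executed. */
                  WARN_ON_ONCE(to_vmx(vcpu)->emulation_required);
                  /* ... write the VM-entry interruption-info field ... */
          }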
  16. 29 Mar 2018 (6 commits)
  17. 28 Mar 2018 (1 commit)
    • KVM: x86: Fix perf timer mode IP reporting · dd60d217
      Authored by Andi Kleen
      KVM and perf have a special backdoor mechanism to report the IP for
      interrupts re-executed after VM exit. This works for the NMIs that
      perf normally uses.
      
      However, when perf is in timer mode, it doesn't work, because the
      timer interrupt doesn't get this special treatment. This is common
      when KVM is running nested in another hypervisor which may not
      implement the PMU, so only timer mode is available.
      
      Call the functions to set up the backdoor IP also for non-NMI
      interrupts.
      
      I renamed the functions that set up the backdoor IP reporting to be
      more appropriate for their new use.  The SVM change is only compile
      tested.
      
      v2: Moved the functions inline.
      For the normal interrupt case, the before/after functions are now
      called from x86.c, not arch-specific code.
      For the NMI case, we still need to call them from the
      architecture-specific code, because they're already needed in the
      low-level *_run functions.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      [Removed unnecessary calls from arch handle_external_intr. - Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      dd60d217
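      The renamed helpers and the generic call site, roughly as described;
      current_vcpu is the per-CPU variable that the perf callbacks consult
      when resolving the IP:

          static inline void kvm_before_interrupt(struct kvm_vcpu *vcpu)
          {
                  __this_cpu_write(current_vcpu, vcpu);
          }

          static inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
          {
                  __this_cpu_write(current_vcpu, NULL);
          }

          /* Normal-interrupt case, now bracketed in x86.c rather than in
           * the arch-specific handlers: */
          kvm_before_interrupt(vcpu);
          kvm_x86_ops->handle_external_intr(vcpu);
          kvm_after_interrupt(vcpu);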
  18. 24 Mar 2018 (4 commits)
  19. 21 Mar 2018 (1 commit)