1. 23 5月, 2018 3 次提交
    • J
      KVM: nVMX: Ensure that VMCS12 field offsets do not change · 21ebf53b
      Jim Mattson 提交于
      Enforce the invariant that existing VMCS12 field offsets must not
      change. Experience has shown that without strict enforcement, this
      invariant will not be maintained.
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [Changed the code to use BUILD_BUG_ON_MSG instead of better, but GCC 4.6
       requiring _Static_assert. - Radim.]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      21ebf53b
    • J
      KVM: nVMX: Restore the VMCS12 offsets for v4.0 fields · b348e793
      Jim Mattson 提交于
      Changing the VMCS12 layout will break save/restore compatibility with
      older kvm releases once the KVM_{GET,SET}_NESTED_STATE ioctls are
      accepted upstream. Google has already been using these ioctls for some
      time, and we implore the community not to disturb the existing layout.
      
      Move the four most recently added fields to preserve the offsets of
      the previously defined fields and reserve locations for the vmread and
      vmwrite bitmaps, which will be used in the virtualization of VMCS
      shadowing (to improve the performance of double-nesting).
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [Kept the SDM order in vmcs_field_to_offset_table. - Radim]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      b348e793
    • J
      kvm: nVMX: Use nested_run_pending rather than from_vmentry · 6514dc38
      Jim Mattson 提交于
      When saving a vCPU's nested state, the vmcs02 is discarded. Only the
      shadow vmcs12 is saved. The shadow vmcs12 contains all of the
      information needed to reconstruct an equivalent vmcs02 on restore, but
      we have to be able to deal with two contexts:
      
      1. The nested state was saved immediately after an emulated VM-entry,
         before the vmcs02 was ever launched.
      
      2. The nested state was saved some time after the first successful
         launch of the vmcs02.
      
      Though it's an implementation detail rather than an architected bit,
      vmx->nested_run_pending serves to distinguish between these two
      cases. Hence, we save it as part of the vCPU's nested state. (Yes,
      this is ugly.)
      
      Even when restoring from a checkpoint, it may be necessary to build
      the vmcs02 as if prepare_vmcs02 was called from nested_vmx_run. So,
      the 'from_vmentry' argument should be dropped, and
      vmx->nested_run_pending should be consulted instead. The nested state
      restoration code then has to set vmx->nested_run_pending prior to
      calling prepare_vmcs02. It's important that the restoration code set
      vmx->nested_run_pending anyway, since the flag impacts things like
      interrupt delivery as well.
      
      Fixes: cf8b84f4 ("kvm: nVMX: Prepare for checkpointing L2 state")
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      6514dc38
  2. 15 5月, 2018 3 次提交
  3. 11 5月, 2018 1 次提交
  4. 27 4月, 2018 1 次提交
  5. 16 4月, 2018 2 次提交
  6. 13 4月, 2018 1 次提交
  7. 11 4月, 2018 1 次提交
  8. 10 4月, 2018 1 次提交
    • K
      X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted · 386c6ddb
      KarimAllah Ahmed 提交于
      The VMX-preemption timer is used by KVM as a way to set deadlines for the
      guest (i.e. timer emulation). That was safe till very recently when
      capability KVM_X86_DISABLE_EXITS_MWAIT to disable intercepting MWAIT was
      introduced. According to Intel SDM 25.5.1:
      
      """
      The VMX-preemption timer operates in the C-states C0, C1, and C2; it also
      operates in the shutdown and wait-for-SIPI states. If the timer counts down
      to zero in any state other than the wait-for SIPI state, the logical
      processor transitions to the C0 C-state and causes a VM exit; the timer
      does not cause a VM exit if it counts down to zero in the wait-for-SIPI
      state. The timer is not decremented in C-states deeper than C2.
      """
      
      Now once the guest issues the MWAIT with a c-state deeper than
      C2 the preemption timer will never wake it up again since it stopped
      ticking! Usually this is compensated by other activities in the system that
      would wake the core from the deep C-state (and cause a VMExit). For
      example, if the host itself is ticking or it received interrupts, etc!
      
      So disable the VMX-preemption timer if MWAIT is exposed to the guest!
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de>
      Fixes: 4d5422ceSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      386c6ddb
  9. 07 4月, 2018 1 次提交
  10. 05 4月, 2018 4 次提交
  11. 04 4月, 2018 1 次提交
    • S
      KVM: VMX: raise internal error for exception during invalid protected mode state · add5ff7a
      Sean Christopherson 提交于
      Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
      an exception in Protected Mode while emulating guest due to invalid
      guest state.  Unlike Big RM, KVM doesn't support emulating exceptions
      in PM, i.e. PM exceptions are always injected via the VMCS.  Because
      we will never do VMRESUME due to emulation_required, the exception is
      never realized and we'll keep emulating the faulting instruction over
      and over until we receive a signal.
      
      Exit to userspace iff there is a pending exception, i.e. don't exit
      simply on a requested event. The purpose of this check and exit is to
      aid in debugging a guest that is in all likelihood already doomed.
      Invalid guest state in PM is extremely limited in normal operation,
      e.g. it generally only occurs for a few instructions early in BIOS,
      and any exception at this time is all but guaranteed to be fatal.
      Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
      handled/emulated, while checking for vectored interrupts, e.g. INTR
      and NMI, without hitting false positives would add a fair amount of
      complexity for almost no benefit (getting hit by lightning seems
      more likely than encountering this specific scenario).
      
      Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
      exception via the VMCS and emulation_required is true.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      add5ff7a
  12. 29 3月, 2018 6 次提交
  13. 28 3月, 2018 1 次提交
    • A
      KVM: x86: Fix perf timer mode IP reporting · dd60d217
      Andi Kleen 提交于
      KVM and perf have a special backdoor mechanism to report the IP for interrupts
      re-executed after vm exit. This works for the NMIs that perf normally uses.
      
      However when perf is in timer mode it doesn't work because the timer interrupt
      doesn't get this special treatment. This is common when KVM is running
      nested in another hypervisor which may not implement the PMU, so only
      timer mode is available.
      
      Call the functions to set up the backdoor IP also for non NMI interrupts.
      
      I renamed the functions to set up the backdoor IP reporting to be more
      appropiate for their new use.  The SVM change is only compile tested.
      
      v2: Moved the functions inline.
      For the normal interrupt case the before/after functions are now
      called from x86.c, not arch specific code.
      For the NMI case we still need to call it in the architecture
      specific code, because it's already needed in the low level *_run
      functions.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      [Removed unnecessary calls from arch handle_external_intr. - Radim]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      dd60d217
  14. 24 3月, 2018 4 次提交
  15. 21 3月, 2018 2 次提交
    • P
      KVM: nVMX: fix vmentry failure code when L2 state would require emulation · 3184a995
      Paolo Bonzini 提交于
      Commit 2bb8cafe ("KVM: vVMX: signal failure for nested VMEntry if
      emulation_required", 2018-03-12) introduces a new error path which does
      not set *entry_failure_code.  Fix that to avoid a leak of L0 stack to L1.
      Reported-by: NRadim Krčmář <rkrcmar@redhat.com>
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3184a995
    • L
      kvm/x86: fix icebp instruction handling · 32d43cd3
      Linus Torvalds 提交于
      The undocumented 'icebp' instruction (aka 'int1') works pretty much like
      'int3' in the absense of in-circuit probing equipment (except,
      obviously, that it raises #DB instead of raising #BP), and is used by
      some validation test-suites as such.
      
      But Andy Lutomirski noticed that his test suite acted differently in kvm
      than on bare hardware.
      
      The reason is that kvm used an inexact test for the icebp instruction:
      it just assumed that an all-zero VM exit qualification value meant that
      the VM exit was due to icebp.
      
      That is not unlike the guess that do_debug() does for the actual
      exception handling case, but it's purely a heuristic, not an absolute
      rule.  do_debug() does it because it wants to ascribe _some_ reasons to
      the #DB that happened, and an empty %dr6 value means that 'icebp' is the
      most likely casue and we have no better information.
      
      But kvm can just do it right, because unlike the do_debug() case, kvm
      actually sees the real reason for the #DB in the VM-exit interruption
      information field.
      
      So instead of relying on an inexact heuristic, just use the actual VM
      exit information that says "it was 'icebp'".
      
      Right now the 'icebp' instruction isn't technically documented by Intel,
      but that will hopefully change.  The special "privileged software
      exception" information _is_ actually mentioned in the Intel SDM, even
      though the cause of it isn't enumerated.
      Reported-by: NAndy Lutomirski <luto@kernel.org>
      Tested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      32d43cd3
  16. 17 3月, 2018 8 次提交