1. 23 May 2018, 4 commits
    • KVM: nVMX: Ensure that VMCS12 field offsets do not change · 21ebf53b
      Jim Mattson committed
      Enforce the invariant that existing VMCS12 field offsets must not
      change. Experience has shown that without strict enforcement, this
      invariant will not be maintained.
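      As an aside, a minimal userspace sketch of the kind of compile-time offset
      check this enforces (the struct, field names and offsets below are invented
      for illustration, not the real vmcs12 layout; the kernel uses
      BUILD_BUG_ON_MSG, for which C11 _Static_assert is the closest userspace
      equivalent):

          #include <stddef.h>

          /* Hypothetical stand-in for the real struct vmcs12. */
          struct demo_vmcs12 {
                  unsigned long long io_bitmap_a;   /* expected at offset 0  */
                  unsigned long long io_bitmap_b;   /* expected at offset 8  */
                  unsigned long long msr_bitmap;    /* expected at offset 16 */
          };

          /* Fails the build if a field ever moves. */
          #define CHECK_OFFSET(field, off) \
                  _Static_assert(offsetof(struct demo_vmcs12, field) == (off), \
                                 "vmcs12 offset of " #field " changed")

          CHECK_OFFSET(io_bitmap_a, 0);
          CHECK_OFFSET(io_bitmap_b, 8);
          CHECK_OFFSET(msr_bitmap, 16);

          int main(void) { return 0; }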
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [Changed the code to use BUILD_BUG_ON_MSG instead of the better, but GCC
       4.6-requiring, _Static_assert. - Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      21ebf53b
    • KVM: nVMX: Restore the VMCS12 offsets for v4.0 fields · b348e793
      Jim Mattson committed
      Changing the VMCS12 layout will break save/restore compatibility with
      older kvm releases once the KVM_{GET,SET}_NESTED_STATE ioctls are
      accepted upstream. Google has already been using these ioctls for some
      time, and we implore the community not to disturb the existing layout.
      
      Move the four most recently added fields to preserve the offsets of
      the previously defined fields and reserve locations for the vmread and
      vmwrite bitmaps, which will be used in the virtualization of VMCS
      shadowing (to improve the performance of double-nesting).
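      A hedged illustration of the layout rule (the field names are invented, not
      the real vmcs12 definition): new fields are appended after reserved slots so
      that offsets already relied upon by older kvm releases never move:

          /* Illustration only; not the actual vmcs12 layout. */
          struct demo_vmcs12_tail {
                  unsigned long long established_field;       /* offset frozen by released kvm versions */
                  unsigned long long reserved_vmread_bitmap;  /* reserved for VMCS-shadowing support    */
                  unsigned long long reserved_vmwrite_bitmap; /* reserved for VMCS-shadowing support    */
                  unsigned long long recently_added_field;    /* moved fields land here, after the
                                                                 reserved slots                         */
          };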
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [Kept the SDM order in vmcs_field_to_offset_table. - Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      b348e793
    • KVM: x86: use timespec64 for KVM_HC_CLOCK_PAIRING · 899a31f5
      Arnd Bergmann committed
      The hypercall was added using a struct timespec based implementation,
      but we should not use timespec in new code.
      
      This changes it to timespec64. There is no functional change
      here since the implementation is only used in 64-bit kernels
      that use the same definition for timespec and timespec64.
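      For context, a small userspace sketch of the distinction (the types here are
      local stand-ins defined for illustration, not the kernel's definitions):
      timespec64 always carries 64-bit seconds, so it remains valid past 2038 even
      where the legacy timespec's time_t is only 32 bits wide.

          #include <stdint.h>
          #include <stdio.h>

          /* Local stand-ins for the kernel types, for illustration only. */
          struct demo_timespec   { int32_t tv_sec; long tv_nsec; };  /* legacy 32-bit seconds */
          struct demo_timespec64 { int64_t tv_sec; long tv_nsec; };  /* always 64-bit seconds */

          int main(void)
          {
                  /* A timestamp in the year 2100: it overflows a signed 32-bit
                   * tv_sec but is represented fine by the 64-bit variant. */
                  struct demo_timespec64 ts = { .tv_sec = 4102444800LL, .tv_nsec = 0 };

                  printf("%lld.%09ld\n", (long long)ts.tv_sec, ts.tv_nsec);
                  return 0;
          }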
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      899a31f5
    • kvm: nVMX: Use nested_run_pending rather than from_vmentry · 6514dc38
      Jim Mattson committed
      When saving a vCPU's nested state, the vmcs02 is discarded. Only the
      shadow vmcs12 is saved. The shadow vmcs12 contains all of the
      information needed to reconstruct an equivalent vmcs02 on restore, but
      we have to be able to deal with two contexts:
      
      1. The nested state was saved immediately after an emulated VM-entry,
         before the vmcs02 was ever launched.
      
      2. The nested state was saved some time after the first successful
         launch of the vmcs02.
      
      Though it's an implementation detail rather than an architected bit,
      vmx->nested_run_pending serves to distinguish between these two
      cases. Hence, we save it as part of the vCPU's nested state. (Yes,
      this is ugly.)
      
      Even when restoring from a checkpoint, it may be necessary to build
      the vmcs02 as if prepare_vmcs02 was called from nested_vmx_run. So,
      the 'from_vmentry' argument should be dropped, and
      vmx->nested_run_pending should be consulted instead. The nested state
      restoration code then has to set vmx->nested_run_pending prior to
      calling prepare_vmcs02. It's important that the restoration code set
      vmx->nested_run_pending anyway, since the flag impacts things like
      interrupt delivery as well.
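      A minimal sketch of the control flow described above (the function and field
      names follow the commit text, but the bodies are simplified illustrations,
      not the real KVM code):

          #include <stdbool.h>
          #include <stdio.h>

          struct demo_vmx { bool nested_run_pending; };

          /* Consults the flag instead of taking a 'from_vmentry' argument. */
          static void prepare_vmcs02(struct demo_vmx *vmx)
          {
                  if (vmx->nested_run_pending)
                          printf("building vmcs02 as if called from nested_vmx_run\n");
                  else
                          printf("building vmcs02 for an already-launched L2\n");
          }

          static void nested_vmx_run(struct demo_vmx *vmx)
          {
                  vmx->nested_run_pending = true;   /* emulated VM-entry in progress */
                  prepare_vmcs02(vmx);
          }

          static void restore_nested_state(struct demo_vmx *vmx, bool saved_pending)
          {
                  /* Restore the flag from the saved nested state before rebuilding vmcs02. */
                  vmx->nested_run_pending = saved_pending;
                  prepare_vmcs02(vmx);
          }

          int main(void)
          {
                  struct demo_vmx vmx = { .nested_run_pending = false };

                  nested_vmx_run(&vmx);             /* case 1: state saved right after VM-entry */
                  restore_nested_state(&vmx, true); /* restore must set the flag itself         */
                  return 0;
          }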
      
      Fixes: cf8b84f4 ("kvm: nVMX: Prepare for checkpointing L2 state")
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      6514dc38
  2. 15 May 2018, 8 commits
  3. 11 May 2018, 4 commits
  4. 06 May 2018, 1 commit
    • KVM: x86: remove APIC Timer periodic/oneshot spikes · ecf08dad
      Anthoine Bourgeois committed
      Since the commit "8003c9ae: add APIC Timer periodic/oneshot mode VMX
      preemption timer support", a Windows 10 guest has some erratic timer
      spikes.
      
      Here are the results for 150000 iterations of a 1ms timer without any load:
                   Before 8003c9ae | After 8003c9ae
      Max                  1834us  |       86000us
      Mean                 1100us  |        1021us
      Deviation              59us  |         149us
      Here are the results for 150000 iterations of a 1ms timer with a cpu-z stress test:
                   Before 8003c9ae | After 8003c9ae
      Max                 32000us  |      140000us
      Mean                 1006us  |        1997us
      Deviation             140us  |       11095us
      
      The root cause of the problem is that starting an hrtimer with an expiry
      time already in the past can take more than 20 milliseconds to trigger the
      timer function.  It can be solved by forwarding such past timers
      immediately, rather than submitting them to hrtimer_start().
      In case the timer is periodic, update the target expiration and call
      hrtimer_start() with it.
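      A hedged userspace sketch of the idea (the kernel deals in hrtimers and
      apic_timer_expired(); here clock_gettime() and a plain callback stand in for
      them):

          #include <stdio.h>
          #include <time.h>

          static void timer_fn(void) { printf("timer fired\n"); }

          /* If the requested expiry is already in the past, run the callback now
           * instead of arming a timer for a time that has already gone by. */
          static void start_timer(struct timespec expiry)
          {
                  struct timespec now;

                  clock_gettime(CLOCK_MONOTONIC, &now);
                  if (expiry.tv_sec < now.tv_sec ||
                      (expiry.tv_sec == now.tv_sec && expiry.tv_nsec <= now.tv_nsec)) {
                          timer_fn();        /* expire immediately                      */
                          return;            /* a periodic timer would also advance its */
                                             /* target expiration here and re-arm       */
                  }
                  printf("timer armed\n");   /* otherwise arm the real timer            */
          }

          int main(void)
          {
                  struct timespec past = { 0, 0 };

                  start_timer(past);
                  return 0;
          }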
      
      v2: Check if the tsc deadline is already expired. Thank you Mika.
      v3: Execute the past timers immediately rather than submitting them to
      hrtimer_start().
      v4: Rearm the periodic timer with advance_periodic_target_expiration(), a
      simpler version of set_target_expiration(). Thank you Paolo.
      
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com>
      Fixes: 8003c9ae ("KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer support")
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      ecf08dad
  5. 28 Apr 2018, 1 commit
  6. 27 Apr 2018, 1 commit
  7. 16 Apr 2018, 2 commits
  8. 13 Apr 2018, 1 commit
  9. 11 Apr 2018, 2 commits
  10. 10 Apr 2018, 1 commit
    • X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted · 386c6ddb
      KarimAllah Ahmed committed
      The VMX-preemption timer is used by KVM as a way to set deadlines for the
      guest (i.e. timer emulation). That was safe until very recently, when the
      KVM_X86_DISABLE_EXITS_MWAIT capability to disable intercepting MWAIT was
      introduced. According to Intel SDM 25.5.1:
      
      """
      The VMX-preemption timer operates in the C-states C0, C1, and C2; it also
      operates in the shutdown and wait-for-SIPI states. If the timer counts down
      to zero in any state other than the wait-for SIPI state, the logical
      processor transitions to the C0 C-state and causes a VM exit; the timer
      does not cause a VM exit if it counts down to zero in the wait-for-SIPI
      state. The timer is not decremented in C-states deeper than C2.
      """
      
      Now, once the guest issues MWAIT with a C-state deeper than C2, the
      preemption timer will never wake it up again since it has stopped ticking!
      Usually this is compensated for by other activity in the system that would
      wake the core from the deep C-state (and cause a VMExit), for example if
      the host itself is ticking or receives interrupts, etc.
      
      So disable the VMX-preemption timer if MWAIT is exposed to the guest!
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Fixes: 4d5422ce
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      386c6ddb
  11. 07 Apr 2018, 1 commit
  12. 05 Apr 2018, 7 commits
    • kvm: x86: fix a compile warning · 3140c156
      Peng Hao committed
      fix a "warning: no previous prototype".
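      For reference, a hedged example of the class of warning being silenced (the
      function name is made up): a non-static function with no prior declaration
      trips -Wmissing-prototypes, and the usual fixes are to add a prototype in a
      shared header or to mark the function static.

          /* example.c -- compile with: gcc -Wmissing-prototypes -c example.c */

          int demo_helper(int x);   /* the prototype silences the warning; alternatively,
                                       mark the function static if it is file-local */

          int demo_helper(int x)
          {
                  return x + 1;
          }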
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3140c156
    • KVM: X86: Add Force Emulation Prefix for "emulate the next instruction" · 6c86eedc
      Wanpeng Li committed
      There is no easy way to force KVM to run an instruction through the emulator
      (by design, as that would expose the x86 emulator as a significant attack surface).
      However, we do wish to expose the x86 emulator when testing it
      (e.g. via kvm-unit-tests). Therefore, this patch adds a "force emulation prefix"
      that is designed to raise #UD, which KVM will trap; its #UD exit handler then
      matches the "force emulation prefix" and runs the instruction following the
      prefix through the x86 emulator.
      To avoid exposing the x86 emulator by default, we add a module parameter that
      is off by default.
      
      A simple testcase here:
      
          #include <stdio.h>
          #include <string.h>
      
          #define HYPERVISOR_INFO 0x40000000
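
          /* The "ud2a; .ascii "kvm"" sequence in the CPUID macro below is the force
           * emulation prefix: with the module parameter enabled, KVM traps the
           * resulting #UD and emulates the cpuid that follows instead of letting
           * it execute natively. */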
      
          #define CPUID(idx, eax, ebx, ecx, edx) \
              asm volatile (\
              "ud2a; .ascii \"kvm\"; cpuid" \
              :"=b" (*ebx), "=a" (*eax), "=c" (*ecx), "=d" (*edx) \
                  :"0"(idx) );
      
          void main()
          {
              unsigned int eax, ebx, ecx, edx;
              char string[13];
      
              CPUID(HYPERVISOR_INFO, &eax, &ebx, &ecx, &edx);
              *(unsigned int *)(string + 0) = ebx;
              *(unsigned int *)(string + 4) = ecx;
              *(unsigned int *)(string + 8) = edx;
      
              string[12] = 0;
              if (strncmp(string, "KVMKVMKVM\0\0\0", 12) == 0)
                  printf("kvm guest\n");
              else
                  printf("bare hardware\n");
          }
      Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      [Correctly handle usermode exits. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6c86eedc
    • KVM: X86: Introduce handle_ud() · 082d06ed
      Wanpeng Li committed
      Introduce handle_ud() to handle invalid opcodes; this function will be
      used by later patches.
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      082d06ed
    • KVM: vmx: unify adjacent #ifdefs · 4fde8d57
      Paolo Bonzini committed
      vmx_save_host_state has multiple ifdefs for CONFIG_X86_64 that have
      no other code between them.  Simplify by reducing them to a single
      conditional.
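      An illustrative sketch of the cleanup (placeholder statements, not the real
      vmx_save_host_state() code):

          /* sketch.c -- before the cleanup, each statement sat in its own
           * back-to-back #ifdef CONFIG_X86_64 block; afterwards one conditional
           * covers both. */
          #define CONFIG_X86_64 1        /* pretend this is a 64-bit build */

          static int fs_saved, gs_saved;

          static void save_host_state(void)
          {
          #ifdef CONFIG_X86_64
                  fs_saved = 1;
                  gs_saved = 1;
          #endif
          }

          int main(void)
          {
                  save_host_state();
                  return (fs_saved && gs_saved) ? 0 : 1;
          }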
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4fde8d57
    • x86: kvm: hide the unused 'cpu' variable · 51e8a8cc
      Arnd Bergmann committed
      The local variable was newly introduced, but it is only accessed in one
      place, which is built on x86_64 but not on 32-bit:
      
      arch/x86/kvm/vmx.c: In function 'vmx_save_host_state':
      arch/x86/kvm/vmx.c:2175:6: error: unused variable 'cpu' [-Werror=unused-variable]
      
      This puts it into another #ifdef.
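      A hedged sketch of the pattern (names simplified, not the actual vmx.c
      code): declaring the variable under the same #ifdef as its only user keeps
      32-bit builds free of -Wunused-variable.

          #define CONFIG_X86_64 1        /* drop this line to mimic a 32-bit build */

          static int last_cpu = -1;

          static void save_host_state(void)
          {
          #ifdef CONFIG_X86_64
                  int cpu = 0;           /* only referenced by 64-bit-only code,      */
                  last_cpu = cpu;        /* so the declaration moves under the #ifdef */
          #endif
          }

          int main(void)
          {
                  save_host_state();
                  return 0;
          }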
      
      Fixes: 35060ed6 ("x86/kvm/vmx: avoid expensive rdmsr for MSR_GS_BASE")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      51e8a8cc
    • KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig · c75d0edc
      Sean Christopherson committed
      Remove the WARN_ON in handle_ept_misconfig() as it is unnecessary
      and causes false positives.  Return the unmodified result of
      kvm_mmu_page_fault() instead of converting a system error code to
      KVM_EXIT_UNKNOWN so that userspace sees the error code of the
      actual failure, not a generic "we don't know what went wrong".
      
        * kvm_mmu_page_fault() will WARN if reserved bits are set in the
          SPTEs, i.e. it covers the case where an EPT misconfig occurred
          because of a KVM bug.
      
        * The WARN_ON will fire on any system error code that is hit while
          handling the fault, e.g. -ENOMEM from mmu_topup_memory_caches()
          while handling a legitimate MMIO EPT misconfig or -EFAULT from
          kvm_handle_bad_page() if the corresponding HVA is invalid.  In
          either case, userspace should receive the original error code
          and firing a warning is incorrect behavior as KVM is operating
          as designed.
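      A hedged sketch of the resulting control flow (heavily simplified; the stubs
      below only mimic the "pass the result through unmodified" part, not the real
      MMU code):

          #include <stdio.h>

          /* Pretend MMU fault handler: may return a positive action code or a
           * negative errno such as -ENOMEM or -EFAULT. */
          static int fake_mmu_page_fault(int outcome) { return outcome; }

          static int fake_handle_ept_misconfig(int outcome)
          {
                  /* No WARN_ON and no conversion to a generic "unknown exit":
                   * userspace sees the real error code of the actual failure. */
                  return fake_mmu_page_fault(outcome);
          }

          int main(void)
          {
                  printf("%d\n", fake_handle_ept_misconfig(1));    /* e.g. "emulate" */
                  printf("%d\n", fake_handle_ept_misconfig(-12));  /* e.g. -ENOMEM   */
                  return 0;
          }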
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c75d0edc
    • Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown" · 2c151b25
      Sean Christopherson committed
      The bug that led to commit 95e057e2
      was a benign warning (no adverse effects other than the warning
      itself) that was detected by syzkaller.  Further inspection shows
      that the WARN_ON in question, in handle_ept_misconfig(), is
      unnecessary and flawed (this was also briefly discussed in the
      original patch: https://patchwork.kernel.org/patch/10204649).
      
        * The WARN_ON is unnecessary as kvm_mmu_page_fault() will WARN
          if reserved bits are set in the SPTEs, i.e. it covers the case
          where an EPT misconfig occurred because of a KVM bug.
      
        * The WARN_ON is flawed because it will fire on any system error
          code that is hit while handling the fault, e.g. -ENOMEM can be
          returned by mmu_topup_memory_caches() while handling a legitimate
          MMIO EPT misconfig.
      
      The original behavior of returning -EFAULT when userspace munmaps
      an HVA without first removing the memslot is correct and desirable,
      i.e. KVM is letting userspace know it has generated a bad address.
      Returning RET_PF_EMULATE masks the WARN_ON in the EPT misconfig path,
      but does not fix the underlying bug, i.e. the WARN_ON is bogus.
      
      Furthermore, returning RET_PF_EMULATE has the unwanted side effect of
      causing KVM to attempt to emulate an instruction on any page fault
      with an invalid HVA translation, e.g. a not-present EPT violation
      on a VM_PFNMAP VMA whose fault handler failed to insert a PFN.
      
        * There is no guarantee that the fault is directly related to the
          instruction, i.e. the fault could have been triggered by a side
          effect memory access in the guest, e.g. while vectoring a #DB or
          writing a tracing record.  This could cause KVM to effectively
          mask the fault if KVM doesn't model the behavior leading to the
          fault, i.e. emulation could succeed and resume the guest.
      
        * If emulation does fail, KVM will return EMULATION_FAILED instead
          of -EFAULT, which is a red herring as the user will either debug
          a bogus emulation attempt or scratch their head wondering why we
          were attempting emulation in the first place.
      
      TL;DR: revert to returning -EFAULT and remove the bogus WARN_ON in
      handle_ept_misconfig in a future patch.
      
      This reverts commit 95e057e2.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2c151b25
  13. 04 Apr 2018, 2 commits
    • kvm: Add emulation for movups/movupd · 29916968
      Stefan Fritsch committed
      This is very similar to the aligned versions movaps/movapd.
      
      We have seen the corresponding emulation failures with OpenBSD as a guest
      and with Windows 10 with Intel HD Graphics passthrough.
      Signed-off-by: Christian Ehrhardt <christian_ehrhardt@genua.de>
      Signed-off-by: Stefan Fritsch <sf@sfritsch.de>
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      29916968
    • KVM: VMX: raise internal error for exception during invalid protected mode state · add5ff7a
      Sean Christopherson committed
      Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
      an exception in Protected Mode while emulating guest due to invalid
      guest state.  Unlike Big RM, KVM doesn't support emulating exceptions
      in PM, i.e. PM exceptions are always injected via the VMCS.  Because
      we will never do VMRESUME due to emulation_required, the exception is
      never realized and we'll keep emulating the faulting instruction over
      and over until we receive a signal.
      
      Exit to userspace iff there is a pending exception, i.e. don't exit
      simply on a requested event. The purpose of this check and exit is to
      aid in debugging a guest that is in all likelihood already doomed.
      Invalid guest state in PM is extremely limited in normal operation,
      e.g. it generally only occurs for a few instructions early in BIOS,
      and any exception at this time is all but guaranteed to be fatal.
      Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
      handled/emulated, while checking for vectored interrupts, e.g. INTR
      and NMI, without hitting false positives would add a fair amount of
      complexity for almost no benefit (getting hit by lightning seems
      more likely than encountering this specific scenario).
      
      Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
      exception via the VMCS and emulation_required is true.
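      A simplified sketch of the check being added (names follow the commit text;
      the bodies and constants are illustrative only):

          #include <stdbool.h>
          #include <stdio.h>

          #define DEMO_INTERNAL_ERROR_EMULATION 1   /* illustrative constant */

          struct demo_vcpu {
                  bool emulation_required;          /* emulating due to invalid guest state */
                  bool exception_pending;           /* a vectored exception is queued       */
          };

          /* Emulation loop: bail out to userspace instead of re-emulating the same
           * faulting instruction forever when its exception can never be injected. */
          static int handle_invalid_guest_state(struct demo_vcpu *vcpu)
          {
                  while (vcpu->emulation_required) {
                          if (vcpu->exception_pending)
                                  return DEMO_INTERNAL_ERROR_EMULATION;  /* exit to userspace */
                          /* emulate one instruction; this illustration simply stops */
                          vcpu->emulation_required = false;
                  }
                  return 0;
          }

          int main(void)
          {
                  struct demo_vcpu v = { .emulation_required = true, .exception_pending = true };

                  printf("%d\n", handle_invalid_guest_state(&v));
                  return 0;
          }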
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      add5ff7a
  14. 29 Mar 2018, 5 commits
    • KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending · f497b6c2
      Liran Alon committed
      When a vCPU runs L2 and there is a pending event that requires an exit
      from L2 to L1 and nested_run_pending=1, vcpu_enter_guest() will request
      an immediate-exit from L2 (see req_immediate_exit).
      
      Since now handling of req_immediate_exit also makes sure to set
      KVM_REQ_EVENT, there is no need to also set it on vmx_vcpu_run() when
      nested_run_pending=1.
      
      This optimizes cases where VMRESUME was executed by L1 to enter L2 and
      there are no pending events that require an exit from L2 to L1.
      Previously, this would have set KVM_REQ_EVENT unnecessarily.
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      f497b6c2
    • KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending · 1a680e35
      Liran Alon committed
      In case of an L2 VMExit to L0 during event delivery, VMCS02 is filled with
      the IDT-vectoring info, which vmx_complete_interrupts() makes sure to
      reinject before the next resume of L2.
      
      While handling the VMExit in L0, an IPI could be sent by another L1 vCPU
      to the L1 vCPU which currently runs L2 and exited to L0.
      
      When L0 reaches vcpu_enter_guest() and calls inject_pending_event(),
      it will note that a previous event was re-injected to L2 (by
      IDT-vectoring-info) and therefore won't check if there are pending L1
      events which require an exit from L2 to L1. Thus, L0 enters L2 without
      an immediate VMExit even though there are pending L1 events!
      
      This commit fixes the issue by making sure to check for pending L1
      events even if a previous event was reinjected to L2, and by bailing out
      of inject_pending_event() before evaluating a new pending event in
      case an event was already reinjected.
      
      The bug was observed by the following setup:
      * L0 is a 64CPU machine which runs KVM.
      * L1 is a 16CPU machine which runs KVM.
      * L0 & L1 runs with APICv disabled.
      (Also reproduced with APICv enabled, but the info below is easier to
      analyze with APICv disabled)
      * L1 runs a 16CPU L2 Windows Server 2012 R2 guest.
      During L2 boot, L1 hangs completely and analyzing the hang reveals that
      one L1 vCPU is holding KVM's mmu_lock and is waiting forever on an IPI
      that it has sent to another L1 vCPU. All the other L1 vCPUs are
      currently attempting to grab mmu_lock. Therefore, all L1 vCPUs are stuck
      forever (as L1 runs with kernel preemption disabled).
      
      Observing /sys/kernel/debug/tracing/trace_pipe reveals the following
      series of events:
      (1) qemu-system-x86-19066 [030] kvm_nested_vmexit: rip:
      0xfffff802c5dca82f reason: EPT_VIOLATION ext_inf1: 0x0000000000000182
      ext_inf2: 0x00000000800000d2 ext_int: 0x00000000 ext_int_err: 0x00000000
      (2) qemu-system-x86-19054 [028] kvm_apic_accept_irq: apicid f
      vec 252 (Fixed|edge)
      (3) qemu-system-x86-19066 [030] kvm_inj_virq: irq 210
      (4) qemu-system-x86-19066 [030] kvm_entry: vcpu 15
      (5) qemu-system-x86-19066 [030] kvm_exit: reason EPT_VIOLATION
      rip 0xffffe00069202690 info 83 0
      (6) qemu-system-x86-19066 [030] kvm_nested_vmexit: rip:
      0xffffe00069202690 reason: EPT_VIOLATION ext_inf1: 0x0000000000000083
      ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
      (7) qemu-system-x86-19066 [030] kvm_nested_vmexit_inject: reason:
      EPT_VIOLATION ext_inf1: 0x0000000000000083 ext_inf2: 0x0000000000000000
      ext_int: 0x00000000 ext_int_err: 0x00000000
      (8) qemu-system-x86-19066 [030] kvm_entry: vcpu 15
      
      Which can be analyzed as follows:
      (1) L2 VMExit to L0 on EPT_VIOLATION during delivery of vector 0xd2.
      Therefore, vmx_complete_interrupts() will set KVM_REQ_EVENT and reinject
      a pending-interrupt of 0xd2.
      (2) L1 sends an IPI of vector 0xfc (CALL_FUNCTION_VECTOR) to destination
      vCPU 15. This will set relevant bit in LAPIC's IRR and set KVM_REQ_EVENT.
      (3) L0 reaches vcpu_enter_guest(), which calls inject_pending_event(),
      which notes that interrupt 0xd2 was reinjected and therefore calls
      vmx_inject_irq() and returns, without checking for pending L1 events!
      Note that at this point, KVM_REQ_EVENT was cleared by vcpu_enter_guest()
      before calling inject_pending_event().
      (4) L0 resumes L2 without immediate-exit even though there is a pending
      L1 event (The IPI pending in LAPIC's IRR).
      
      We have already reached the buggy scenario, but the events can be
      analyzed further:
      (5+6) L2 VMExit to L0 on EPT_VIOLATION.  This time not during
      event-delivery.
      (7) L0 decides to forward the VMExit to L1 for further handling.
      (8) L0 resumes into L1. Note that because KVM_REQ_EVENT is cleared, the
      LAPIC's IRR is not examined and therefore the IPI is still not delivered
      into L1!
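      A hedged sketch of the fix described above (heavily simplified; the real
      inject_pending_event() and the nested-exit checks are far more involved):

          #include <stdbool.h>
          #include <stdio.h>

          struct demo_vcpu {
                  bool reinject_event;          /* event from IDT-vectoring info to re-deliver */
                  bool l1_event_pending;        /* e.g. an IPI sitting in the L1 LAPIC IRR     */
                  bool request_immediate_exit;
          };

          static void inject_pending_event(struct demo_vcpu *vcpu)
          {
                  /* Fix: even when re-injecting an event into L2, first check whether
                   * a pending L1 event requires an exit from L2 to L1 ... */
                  if (vcpu->l1_event_pending)
                          vcpu->request_immediate_exit = true;

                  /* ... and bail out after the re-injection rather than evaluating a
                   * brand-new pending event on top of it. */
                  if (vcpu->reinject_event)
                          return;

                  /* (evaluation of new pending events would go here) */
          }

          int main(void)
          {
                  struct demo_vcpu v = { .reinject_event = true, .l1_event_pending = true };

                  inject_pending_event(&v);
                  printf("immediate exit requested: %d\n", v.request_immediate_exit);
                  return 0;
          }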
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      1a680e35
    • KVM: x86: Fix misleading comments on handling pending exceptions · a042c26f
      Liran Alon committed
      The reason that exception.pending should block re-injection of an
      NMI/interrupt is not described correctly in the comment in the code.
      Instead, it describes why a pending exception should be injected
      before a pending NMI/interrupt.
      
      Therefore, move the currently present comment to the code block that
      evaluates a new pending event, where it explains why exception.pending
      is evaluated first.
      In addition, create a new comment describing that exception.pending
      blocks re-injection of an NMI/interrupt because the exception was
      queued while handling the vmexit that was due to NMI/interrupt delivery.
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@orcle.com>
      [Used a comment from Sean J <sean.j.christopherson@intel.com>. - Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      a042c26f
    • KVM: x86: Rename interrupt.pending to interrupt.injected · 04140b41
      Liran Alon committed
      For exception and NMI events, KVM code uses the following
      coding convention:
      *) "pending" represents an event that should be injected into the guest
      at some point, but its side effects have not yet occurred.
      *) "injected" represents an event whose side effects have already
      occurred.
      
      However, interrupts don't conform to this coding convention.
      All current code flows mark interrupt.pending when its side effects
      have already taken place (for example, the bit has moved from the LAPIC
      IRR to the ISR). Therefore, it makes sense to just rename
      interrupt.pending to interrupt.injected.
      
      This change follows the logic of previous commit 664f8e26 ("KVM: X86:
      Fix loss of exception which has not yet been injected"), which changed
      exceptions to follow this coding convention as well.
      
      It is important to note that in case !lapic_in_kernel(vcpu),
      interrupt.pending usage was and still is incorrect.
      In this case, interrupt.pending can only be set using one of the
      following ioctls: KVM_INTERRUPT, KVM_SET_VCPU_EVENTS and
      KVM_SET_SREGS. Looking at how QEMU uses these ioctls, one can see that
      QEMU uses them either to re-set an "interrupt.pending" state it has
      received from KVM (via KVM_GET_VCPU_EVENTS interrupt.pending or
      via KVM_GET_SREGS interrupt_bitmap) or to dispatch a new interrupt
      from QEMU's emulated LAPIC, which resets the bit in the IRR and sets the
      bit in the ISR before sending the ioctl to KVM. So it seems that
      "interrupt.pending" in this case is indeed also supposed to represent
      "interrupt.injected".
      However, kvm_cpu_has_interrupt() & kvm_cpu_has_injectable_intr()
      misuse the (now named) interrupt.injected in order to return whether
      there is a pending interrupt.
      This leads to nVMX/nSVM not being able to distinguish whether it should
      exit from L2 to L1 on EXTERNAL_INTERRUPT because of a pending interrupt
      or re-inject an injected interrupt.
      Therefore, add a FIXME to these functions for handling this issue.
      
      This patch introduces no semantic change.
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      04140b41
    • KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt · 7c5a6a59
      Liran Alon committed
      kvm_inject_realmode_interrupt() is called from one of the injection
      functions which writes event-injection to VMCS: vmx_queue_exception(),
      vmx_inject_irq() and vmx_inject_nmi().
      
      All these functions are called just to cause an event injection into the
      guest. They are not responsible for manipulating the event-pending
      flag. The only purpose of kvm_inject_realmode_interrupt() should be
      to emulate real-mode interrupt injection.
      
      This was also incorrect when called from vmx_queue_exception().
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      7c5a6a59