1. 06 8月, 2018 18 次提交
  2. 18 7月, 2018 1 次提交
    • L
      KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled · 2307af1c
      Liran Alon 提交于
      When eVMCS is enabled, all VMCS allocated to be used by KVM are marked
      with revision_id of KVM_EVMCS_VERSION instead of revision_id reported
      by MSR_IA32_VMX_BASIC.
      
      However, even though not explictly documented by TLFS, VMXArea passed
      as VMXON argument should still be marked with revision_id reported by
      physical CPU.
      
      This issue was found by the following setup:
      * L0 = KVM which expose eVMCS to it's L1 guest.
      * L1 = KVM which consume eVMCS reported by L0.
      This setup caused the following to occur:
      1) L1 execute hardware_enable().
      2) hardware_enable() calls kvm_cpu_vmxon() to execute VMXON.
      3) L0 intercept L1 VMXON and execute handle_vmon() which notes
      vmxarea->revision_id != VMCS12_REVISION and therefore fails with
      nested_vmx_failInvalid() which sets RFLAGS.CF.
      4) L1 kvm_cpu_vmxon() don't check RFLAGS.CF for failure and therefore
      hardware_enable() continues as usual.
      5) L1 hardware_enable() then calls ept_sync_global() which executes
      INVEPT.
      6) L0 intercept INVEPT and execute handle_invept() which notes
      !vmx->nested.vmxon and thus raise a #UD to L1.
      7) Raised #UD caused L1 to panic.
      Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Cc: stable@vger.kernel.org
      Fixes: 773e8a04Signed-off-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2307af1c
  3. 15 7月, 2018 2 次提交
  4. 22 6月, 2018 1 次提交
    • M
      kvm: vmx: Nested VM-entry prereqs for event inj. · 0447378a
      Marc Orr 提交于
      This patch extends the checks done prior to a nested VM entry.
      Specifically, it extends the check_vmentry_prereqs function with checks
      for fields relevant to the VM-entry event injection information, as
      described in the Intel SDM, volume 3.
      
      This patch is motivated by a syzkaller bug, where a bad VM-entry
      interruption information field is generated in the VMCS02, which causes
      the nested VM launch to fail. Then, KVM fails to resume L1.
      
      While KVM should be improved to correctly resume L1 execution after a
      failed nested launch, this change is justified because the existing code
      to resume L1 is flaky/ad-hoc and the test coverage for resuming L1 is
      sparse.
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NMarc Orr <marcorr@google.com>
      [Removed comment whose parts were describing previous revisions and the
       rest was obvious from function/variable naming. - Radim]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      0447378a
  5. 15 6月, 2018 1 次提交
    • A
      KVM: x86: VMX: redo fix for link error without CONFIG_HYPERV · 1f008e11
      Arnd Bergmann 提交于
      Arnd had sent this patch to the KVM mailing list, but it slipped through
      the cracks of maintainers hand-off, and therefore wasn't included in
      the pull request.
      
      The same issue had been fixed by Linus in commit dbee3d02 ("KVM: x86:
      VMX: fix build without hyper-v", 2018-06-12) as a self-described
      "quick-and-hacky build fix".  However, checking the compile-time
      configuration symbol with IS_ENABLED is cleaner and it is enough to
      avoid the link error, so switch to Arnd's solution.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      [Rewritten commit message. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1f008e11
  6. 13 6月, 2018 1 次提交
    • L
      KVM: x86: VMX: fix build without hyper-v · dbee3d02
      Linus Torvalds 提交于
      Commit ceef7d10 ("KVM: x86: VMX: hyper-v: Enlightened MSR-Bitmap
      support") broke the build with Hyper-V disabled, because it accesses
      ms_hyperv.nested_features without checking if that exists.
      
      This is the quick-and-hacky build fix.
      
      I suspect the proper fix is to replace the
      
          static_branch_unlikely(&enable_evmcs)
      
      tests with an inline helper function that also checks that CONFIG_HYPERV
      is enabled, since without that, enable_evmcs makes no sense.
      
      But I want a working build environment first and foremost, and I'm upset
      this slipped through in the first place.  My primary build tests missed
      it because I tend to build with everything enabled, but it should have
      been caught in the kvm tree.
      
      Fixes: ceef7d10 ("KVM: x86: VMX: hyper-v: Enlightened MSR-Bitmap support")
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dbee3d02
  7. 12 6月, 2018 2 次提交
  8. 04 6月, 2018 3 次提交
    • J
      kvm: nVMX: Add support for "VMWRITE to any supported field" · f4160e45
      Jim Mattson 提交于
      Add support for "VMWRITE to any supported field in the VMCS" and
      enable this feature by default in L1's IA32_VMX_MISC MSR. If userspace
      clears the VMX capability bit, the old behavior will be restored.
      
      Note that this feature is a prerequisite for kvm in L1 to use VMCS
      shadowing, once that feature is available.
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f4160e45
    • J
      kvm: nVMX: Restrict VMX capability MSR changes · a943ac50
      Jim Mattson 提交于
      Disallow changes to the VMX capability MSRs while the vCPU is in VMX
      operation. Although this does break the existing API, it helps to
      avoid some potentially tricky situations for which there is no
      architected behavior.
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a943ac50
    • W
      KVM: VMX: Optimize tscdeadline timer latency · c5ce8235
      Wanpeng Li 提交于
      'Commit d0659d94 ("KVM: x86: add option to advance tscdeadline
      hrtimer expiration")' advances the tscdeadline (the timer is emulated
      by hrtimer) expiration in order that the latency which is incurred
      by hypervisor (apic_timer_fn -> vmentry) can be avoided. This patch
      adds the advance tscdeadline expiration support to which the tscdeadline
      timer is emulated by VMX preemption timer to reduce the hypervisor
      lantency (handle_preemption_timer -> vmentry). The guest can also
      set an expiration that is very small (for example in Linux if an
      hrtimer feeds a expiration in the past); in that case we set delta_tsc
      to 0, leading to an immediately vmexit when delta_tsc is not bigger than
      advance ns.
      
      This patch can reduce ~63% latency (~4450 cycles to ~1660 cycles on
      a haswell desktop) for kvm-unit-tests/tscdeadline_latency when testing
      busy waits.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c5ce8235
  9. 02 6月, 2018 1 次提交
    • M
      kvm: Make VM ioctl do valloc for some archs · d1e5b0e9
      Marc Orr 提交于
      The kvm struct has been bloating. For example, it's tens of kilo-bytes
      for x86, which turns out to be a large amount of memory to allocate
      contiguously via kzalloc. Thus, this patch does the following:
      1. Uses architecture-specific routines to allocate the kvm struct via
         vzalloc for x86.
      2. Switches arm to __KVM_HAVE_ARCH_VM_ALLOC so that it can use vzalloc
         when has_vhe() is true.
      
      Other architectures continue to default to kalloc, as they have a
      dependency on kalloc or have a small-enough struct kvm.
      Signed-off-by: NMarc Orr <marcorr@google.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d1e5b0e9
  10. 25 5月, 2018 3 次提交
  11. 23 5月, 2018 3 次提交
    • J
      KVM: nVMX: Ensure that VMCS12 field offsets do not change · 21ebf53b
      Jim Mattson 提交于
      Enforce the invariant that existing VMCS12 field offsets must not
      change. Experience has shown that without strict enforcement, this
      invariant will not be maintained.
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [Changed the code to use BUILD_BUG_ON_MSG instead of better, but GCC 4.6
       requiring _Static_assert. - Radim.]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      21ebf53b
    • J
      KVM: nVMX: Restore the VMCS12 offsets for v4.0 fields · b348e793
      Jim Mattson 提交于
      Changing the VMCS12 layout will break save/restore compatibility with
      older kvm releases once the KVM_{GET,SET}_NESTED_STATE ioctls are
      accepted upstream. Google has already been using these ioctls for some
      time, and we implore the community not to disturb the existing layout.
      
      Move the four most recently added fields to preserve the offsets of
      the previously defined fields and reserve locations for the vmread and
      vmwrite bitmaps, which will be used in the virtualization of VMCS
      shadowing (to improve the performance of double-nesting).
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [Kept the SDM order in vmcs_field_to_offset_table. - Radim]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      b348e793
    • J
      kvm: nVMX: Use nested_run_pending rather than from_vmentry · 6514dc38
      Jim Mattson 提交于
      When saving a vCPU's nested state, the vmcs02 is discarded. Only the
      shadow vmcs12 is saved. The shadow vmcs12 contains all of the
      information needed to reconstruct an equivalent vmcs02 on restore, but
      we have to be able to deal with two contexts:
      
      1. The nested state was saved immediately after an emulated VM-entry,
         before the vmcs02 was ever launched.
      
      2. The nested state was saved some time after the first successful
         launch of the vmcs02.
      
      Though it's an implementation detail rather than an architected bit,
      vmx->nested_run_pending serves to distinguish between these two
      cases. Hence, we save it as part of the vCPU's nested state. (Yes,
      this is ugly.)
      
      Even when restoring from a checkpoint, it may be necessary to build
      the vmcs02 as if prepare_vmcs02 was called from nested_vmx_run. So,
      the 'from_vmentry' argument should be dropped, and
      vmx->nested_run_pending should be consulted instead. The nested state
      restoration code then has to set vmx->nested_run_pending prior to
      calling prepare_vmcs02. It's important that the restoration code set
      vmx->nested_run_pending anyway, since the flag impacts things like
      interrupt delivery as well.
      
      Fixes: cf8b84f4 ("kvm: nVMX: Prepare for checkpointing L2 state")
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      6514dc38
  12. 17 5月, 2018 3 次提交
  13. 15 5月, 2018 1 次提交