1. 15 2月, 2017 13 次提交
    • J
      kvm: nVMX: Refactor nested_vmx_run() · 858e25c0
      Jim Mattson 提交于
      Nested_vmx_run is split into two parts: the part that handles the
      VMLAUNCH/VMRESUME instruction, and the part that modifies the vcpu state
      to transition from VMX root mode to VMX non-root mode. The latter will
      be used when restoring the checkpointed state of a vCPU that was in VMX
      operation when a snapshot was taken.
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      858e25c0
    • J
      kvm: nVMX: Split VMCS checks from nested_vmx_run() · ca0bde28
      Jim Mattson 提交于
      The checks performed on the contents of the vmcs12 are extracted from
      nested_vmx_run so that they can be used to validate a vmcs12 that has
      been restored from a checkpoint.
      Signed-off-by: NJim Mattson <jmattson@google.com>
      [Change prepare_vmcs02 and nested_vmx_load_cr3's last argument to u32,
       to match check_vmentry_postreqs.  Update comments for singlestep
       handling. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ca0bde28
    • J
      kvm: nVMX: Refactor nested_get_vmcs12_pages() · 6beb7bd5
      Jim Mattson 提交于
      Perform the checks on vmcs12 state early, but defer the gpa->hpa lookups
      until after prepare_vmcs02. Later, when we restore the checkpointed
      state of a vCPU in guest mode, we will not be able to do the gpa->hpa
      lookups when the restore is done.
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6beb7bd5
    • J
      kvm: nVMX: Refactor handle_vmptrld() · a8bc284e
      Jim Mattson 提交于
      Handle_vmptrld is split into two parts: the part that handles the
      VMPTRLD instruction, and the part that establishes the current VMCS
      pointer.  The latter will be used when restoring the checkpointed state
      of a vCPU that had a valid VMCS pointer when a snapshot was taken.
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a8bc284e
    • J
      kvm: nVMX: Refactor handle_vmon() · e29acc55
      Jim Mattson 提交于
      Handle_vmon is split into two parts: the part that handles the VMXON
      instruction, and the part that modifies the vcpu state to transition
      from legacy mode to VMX operation. The latter will be used when
      restoring the checkpointed state of a vCPU that was in VMX operation
      when a snapshot was taken.
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e29acc55
    • J
      kvm: nVMX: Prepare for checkpointing L2 state · cf8b84f4
      Jim Mattson 提交于
      Split prepare_vmcs12 into two parts: the part that stores the current L2
      guest state and the part that sets up the exit information fields. The
      former will be used when checkpointing the vCPU's VMX state.
      
      Modify prepare_vmcs02 so that it can construct a vmcs02 midway through
      L2 execution, using the checkpointed L2 guest state saved into the
      cached vmcs12 above.
      Signed-off-by: NJim Mattson <jmattson@google.com>
      [Rebasing: add from_vmentry argument to prepare_vmcs02 instead of using
       vmx->nested.nested_run_pending, because it is no longer 1 at the
       point prepare_vmcs02 is called. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      cf8b84f4
    • P
      kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection · b95234c8
      Paolo Bonzini 提交于
      Since bf9f6ac8 ("KVM: Update Posted-Interrupts Descriptor when vCPU
      is blocked", 2015-09-18) the posted interrupt descriptor is checked
      unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
      trigger the scan and, if NMIs or SMIs are not involved, we can avoid
      the complicated event injection path.
      
      Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
      there since APICv was introduced.
      
      However, without the KVM_REQ_EVENT safety net KVM needs to be much
      more careful about races between vmx_deliver_posted_interrupt and
      vcpu_enter_guest.  First, the IPI for posted interrupts may be issued
      between setting vcpu->mode = IN_GUEST_MODE and disabling interrupts.
      If that happens, kvm_trigger_posted_interrupt returns true, but
      smp_kvm_posted_intr_ipi doesn't do anything about it.  The guest is
      entered with PIR.ON, but the posted interrupt IPI has not been sent
      and the interrupt is only delivered to the guest on the next vmentry
      (if any).  To fix this, disable interrupts before setting vcpu->mode.
      This ensures that the IPI is delayed until the guest enters non-root mode;
      it is then trapped by the processor causing the interrupt to be injected.
      
      Second, the IPI may be issued between kvm_x86_ops->sync_pir_to_irr(vcpu)
      and vcpu->mode = IN_GUEST_MODE.  In this case, kvm_vcpu_kick is called
      but it (correctly) doesn't do anything because it sees vcpu->mode ==
      OUTSIDE_GUEST_MODE.  Again, the guest is entered with PIR.ON but no
      posted interrupt IPI is pending; this time, the fix for this is to move
      the RVI update after IN_GUEST_MODE.
      
      Both issues were mostly masked by the liberal usage of KVM_REQ_EVENT,
      though the second could actually happen with VT-d posted interrupts.
      In both race scenarios KVM_REQ_EVENT would cancel guest entry, resulting
      in another vmentry which would inject the interrupt.
      
      This saves about 300 cycles on the self_ipi_* tests of vmexit.flat.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b95234c8
    • P
      KVM: x86: do not scan IRR twice on APICv vmentry · 76dfafd5
      Paolo Bonzini 提交于
      Calls to apic_find_highest_irr are scanning IRR twice, once
      in vmx_sync_pir_from_irr and once in apic_search_irr.  Change
      sync_pir_from_irr to get the new maximum IRR from kvm_apic_update_irr;
      now that it does the computation, it can also do the RVI write.
      
      In order to avoid complications in svm.c, make the callback optional.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      76dfafd5
    • P
    • P
      KVM: x86: preparatory changes for APICv cleanups · 810e6def
      Paolo Bonzini 提交于
      Add return value to __kvm_apic_update_irr/kvm_apic_update_irr.
      Move vmx_sync_pir_to_irr around.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      810e6def
    • P
      kvm: nVMX: move nested events check to kvm_vcpu_running · 0ad3bed6
      Paolo Bonzini 提交于
      vcpu_run calls kvm_vcpu_running, not kvm_arch_vcpu_runnable,
      and the former does not call check_nested_events.
      
      Once KVM_REQ_EVENT is removed from the APICv interrupt injection
      path, however, this would leave no place to trigger a vmexit
      from L2 to L1, causing a missed interrupt delivery while in guest
      mode.  This is caught by the "ack interrupt on exit" test in
      vmx.flat.
      
      [This does not change the calls to check_nested_events in
       inject_pending_event.  That is material for a separate cleanup.]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0ad3bed6
    • P
      KVM: vmx: clear pending interrupts on KVM_SET_LAPIC · 967235d3
      Paolo Bonzini 提交于
      Pending interrupts might be in the PI descriptor when the
      LAPIC is restored from an external state; we do not want
      them to be injected.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      967235d3
    • P
      kvm: vmx: Use the hardware provided GPA instead of page walk · db1c056c
      Paolo Bonzini 提交于
      As in the SVM patch, the guest physical address is passed by
      VMX to x86_emulate_instruction already, so mark the GPA as available
      in vcpu->arch.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      db1c056c
  2. 09 2月, 2017 1 次提交
    • A
      KVM: x86: hide KVM_HC_CLOCK_PAIRING on 32 bit · 8ef81a9a
      Arnd Bergmann 提交于
      The newly added hypercall doesn't work on x86-32:
      
      arch/x86/kvm/x86.c: In function 'kvm_pv_clock_pairing':
      arch/x86/kvm/x86.c:6163:6: error: implicit declaration of function 'kvm_get_walltime_and_clockread';did you mean 'kvm_get_time_scale'? [-Werror=implicit-function-declaration]
      
      This adds an #ifdef around it, matching the one around the related
      functions that are also only implemented on 64-bit systems.
      
      Fixes: 55dd00a7 ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8ef81a9a
  3. 08 2月, 2017 4 次提交
  4. 27 1月, 2017 5 次提交
  5. 21 1月, 2017 1 次提交
  6. 18 1月, 2017 1 次提交
    • P
      kvm: x86: Expose Intel VPOPCNTDQ feature to guest · a17f3227
      Piotr Luc 提交于
      Vector population count instructions for dwords and qwords are to be
      used in future Intel Xeon & Xeon Phi processors. The bit 14 of
      CPUID[level:0x07, ECX] indicates that the new instructions are
      supported by a processor.
      
      The spec can be found in the Intel Software Developer Manual (SDM)
      or in the Instruction Set Extensions Programming Reference (ISE).
      Signed-off-by: NPiotr Luc <piotr.luc@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      a17f3227
  7. 17 1月, 2017 1 次提交
    • D
      KVM: x86: fix fixing of hypercalls · ce2e852e
      Dmitry Vyukov 提交于
      emulator_fix_hypercall() replaces hypercall with vmcall instruction,
      but it does not handle GP exception properly when writes the new instruction.
      It can return X86EMUL_PROPAGATE_FAULT without setting exception information.
      This leads to incorrect emulation and triggers
      WARN_ON(ctxt->exception.vector > 0x1f) in x86_emulate_insn()
      as discovered by syzkaller fuzzer:
      
      WARNING: CPU: 2 PID: 18646 at arch/x86/kvm/emulate.c:5558
      Call Trace:
       warn_slowpath_null+0x2c/0x40 kernel/panic.c:582
       x86_emulate_insn+0x16a5/0x4090 arch/x86/kvm/emulate.c:5572
       x86_emulate_instruction+0x403/0x1cc0 arch/x86/kvm/x86.c:5618
       emulate_instruction arch/x86/include/asm/kvm_host.h:1127 [inline]
       handle_exception+0x594/0xfd0 arch/x86/kvm/vmx.c:5762
       vmx_handle_exit+0x2b7/0x38b0 arch/x86/kvm/vmx.c:8625
       vcpu_enter_guest arch/x86/kvm/x86.c:6888 [inline]
       vcpu_run arch/x86/kvm/x86.c:6947 [inline]
      
      Set exception information when write in emulator_fix_hypercall() fails.
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: kvm@vger.kernel.org
      Cc: syzkaller@googlegroups.com
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      ce2e852e
  8. 12 1月, 2017 4 次提交
    • P
      KVM: x86: fix emulation of "MOV SS, null selector" · 33ab9110
      Paolo Bonzini 提交于
      This is CVE-2017-2583.  On Intel this causes a failed vmentry because
      SS's type is neither 3 nor 7 (even though the manual says this check is
      only done for usable SS, and the dmesg splat says that SS is unusable!).
      On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.
      
      The fix fabricates a data segment descriptor when SS is set to a null
      selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
      Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
      this in turn ensures CPL < 3 because RPL must be equal to CPL.
      
      Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
      the bug and deciphering the manuals.
      Reported-by: NXiaohan Zhang <zhangxiaohan1@huawei.com>
      Fixes: 79d5b4c3
      Cc: stable@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      33ab9110
    • W
      KVM: x86: fix NULL deref in vcpu_scan_ioapic · 546d87e5
      Wanpeng Li 提交于
      Reported by syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0
          IP: _raw_spin_lock+0xc/0x30
          PGD 3e28eb067
          PUD 3f0ac6067
          PMD 0
          Oops: 0002 [#1] SMP
          CPU: 0 PID: 2431 Comm: test Tainted: G           OE   4.10.0-rc1+ #3
          Call Trace:
           ? kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
           kvm_arch_vcpu_ioctl_run+0x10a8/0x15f0 [kvm]
           ? pick_next_task_fair+0xe1/0x4e0
           ? kvm_arch_vcpu_load+0xea/0x260 [kvm]
           kvm_vcpu_ioctl+0x33a/0x600 [kvm]
           ? hrtimer_try_to_cancel+0x29/0x130
           ? do_nanosleep+0x97/0xf0
           do_vfs_ioctl+0xa1/0x5d0
           ? __hrtimer_init+0x90/0x90
           ? do_nanosleep+0x5b/0xf0
           SyS_ioctl+0x79/0x90
           do_syscall_64+0x6e/0x180
           entry_SYSCALL64_slow_path+0x25/0x25
          RIP: _raw_spin_lock+0xc/0x30 RSP: ffffa43688973cc0
      
      The syzkaller folks reported a NULL pointer dereference due to
      ENABLE_CAP succeeding even without an irqchip.  The Hyper-V
      synthetic interrupt controller is activated, resulting in a
      wrong request to rescan the ioapic and a NULL pointer dereference.
      
          #include <sys/ioctl.h>
          #include <sys/mman.h>
          #include <sys/types.h>
          #include <linux/kvm.h>
          #include <pthread.h>
          #include <stddef.h>
          #include <stdint.h>
          #include <stdlib.h>
          #include <string.h>
          #include <unistd.h>
      
          #ifndef KVM_CAP_HYPERV_SYNIC
          #define KVM_CAP_HYPERV_SYNIC 123
          #endif
      
          void* thr(void* arg)
          {
      	struct kvm_enable_cap cap;
      	cap.flags = 0;
      	cap.cap = KVM_CAP_HYPERV_SYNIC;
      	ioctl((long)arg, KVM_ENABLE_CAP, &cap);
      	return 0;
          }
      
          int main()
          {
      	void *host_mem = mmap(0, 0x1000, PROT_READ|PROT_WRITE,
      			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
      	int kvmfd = open("/dev/kvm", 0);
      	int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
      	struct kvm_userspace_memory_region memreg;
      	memreg.slot = 0;
      	memreg.flags = 0;
      	memreg.guest_phys_addr = 0;
      	memreg.memory_size = 0x1000;
      	memreg.userspace_addr = (unsigned long)host_mem;
      	host_mem[0] = 0xf4;
      	ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &memreg);
      	int cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
      	struct kvm_sregs sregs;
      	ioctl(cpufd, KVM_GET_SREGS, &sregs);
      	sregs.cr0 = 0;
      	sregs.cr4 = 0;
      	sregs.efer = 0;
      	sregs.cs.selector = 0;
      	sregs.cs.base = 0;
      	ioctl(cpufd, KVM_SET_SREGS, &sregs);
      	struct kvm_regs regs = { .rflags = 2 };
      	ioctl(cpufd, KVM_SET_REGS, &regs);
      	ioctl(vmfd, KVM_CREATE_IRQCHIP, 0);
      	pthread_t th;
      	pthread_create(&th, 0, thr, (void*)(long)cpufd);
      	usleep(rand() % 10000);
      	ioctl(cpufd, KVM_RUN, 0);
      	pthread_join(th, 0);
      	return 0;
          }
      
      This patch fixes it by failing ENABLE_CAP if without an irqchip.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Fixes: 5c919412 (kvm/x86: Hyper-V synthetic interrupt controller)
      Cc: stable@vger.kernel.org # 4.5+
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      546d87e5
    • S
      KVM: x86: Introduce segmented_write_std · 129a72a0
      Steve Rutherford 提交于
      Introduces segemented_write_std.
      
      Switches from emulated reads/writes to standard read/writes in fxsave,
      fxrstor, sgdt, and sidt.  This fixes CVE-2017-2584, a longstanding
      kernel memory leak.
      
      Since commit 283c95d0 ("KVM: x86: emulate FXSAVE and FXRSTOR",
      2016-11-09), which is luckily not yet in any final release, this would
      also be an exploitable kernel memory *write*!
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Fixes: 96051572
      Fixes: 283c95d0Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSteve Rutherford <srutherford@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      129a72a0
    • D
      KVM: x86: flush pending lapic jump label updates on module unload · cef84c30
      David Matlack 提交于
      KVM's lapic emulation uses static_key_deferred (apic_{hw,sw}_disabled).
      These are implemented with delayed_work structs which can still be
      pending when the KVM module is unloaded. We've seen this cause kernel
      panics when the kvm_intel module is quickly reloaded.
      
      Use the new static_key_deferred_flush() API to flush pending updates on
      module unload.
      Signed-off-by: NDavid Matlack <dmatlack@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      cef84c30
  9. 09 1月, 2017 10 次提交