1. 09 Jan, 2017 2 commits
  2. 05 Jan, 2017 1 commit
  3. 22 Dec, 2016 1 commit
  4. 19 Dec, 2016 1 commit
  5. 15 Dec, 2016 1 commit
  6. 08 Dec, 2016 14 commits
    • J
      KVM: nVMX: invvpid handling improvements · 16c2aec6
      Jan Dakinevich committed
       - Expose all invalidation types to L1
      
       - Reject the invvpid instruction if L1 passes a zero vpid value to
         single-context invalidations
      Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      16c2aec6
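The two rules in this commit message can be sketched as a small validation routine. This is a minimal illustration, not the kernel code: the enum and `invvpid_check` are hypothetical names, the type encodings 0 to 3 are assumed from the Intel SDM description of INVVPID, and treating the individual-address type like the single-context types with respect to VPID 0 is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* INVVPID type encodings (assumed from the Intel SDM). */
enum invvpid_type {
    INVVPID_INDIVIDUAL_ADDR       = 0,
    INVVPID_SINGLE_CONTEXT        = 1,
    INVVPID_ALL_CONTEXT           = 2,
    INVVPID_SINGLE_RETAIN_GLOBALS = 3,
};

/*
 * Hypothetical helper: returns true if the emulated INVVPID may proceed,
 * false if the instruction should fail back to L1.
 */
static inline bool invvpid_check(uint64_t type, uint16_t vpid)
{
    switch (type) {
    case INVVPID_INDIVIDUAL_ADDR:
    case INVVPID_SINGLE_CONTEXT:
    case INVVPID_SINGLE_RETAIN_GLOBALS:
        /* Invalidations that target one VPID are rejected for VPID 0. */
        return vpid != 0;
    case INVVPID_ALL_CONTEXT:
        /* The VPID operand is not used for a global invalidation. */
        return true;
    default:
        /* Unknown invalidation type. */
        return false;
    }
}
```

For example, `invvpid_check(INVVPID_SINGLE_CONTEXT, 0)` is rejected, while `invvpid_check(INVVPID_ALL_CONTEXT, 0)` is accepted.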
    • L
      KVM: nVMX: check host CR3 on vmentry and vmexit · 1dc35dac
      Ladi Prosek committed
      This commit adds missing host CR3 checks. Before entering guest mode, the value
      of CR3 is checked for reserved bits. After returning, nested_vmx_load_cr3 is
      called to set the new CR3 value and check and load PDPTRs.
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      1dc35dac
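A reserved-bits check on CR3 of the kind this commit describes could look roughly like the sketch below. It assumes that "reserved" means every bit at or above the CPU's physical-address width (MAXPHYADDR); the function name and the way the width is passed in are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a CR3 value passes if no bit at or above the
 * physical-address width (maxphyaddr) is set.
 */
static inline bool cr3_reserved_bits_clear(uint64_t cr3, unsigned int maxphyaddr)
{
    /* Shifting a 64-bit value by 64 or more is undefined in C. */
    assert(maxphyaddr > 0 && maxphyaddr < 64);

    uint64_t reserved_mask = ~UINT64_C(0) << maxphyaddr;

    return (cr3 & reserved_mask) == 0;
}
```

With a 36-bit physical-address width, a CR3 value with bit 40 set fails the check, while the same value passes with a 46-bit width.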
    • L
      KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry · 9ed38ffa
      Ladi Prosek committed
      Loading CR3 as part of emulating vmentry is different from regular CR3 loads,
      as implemented in kvm_set_cr3, in several ways.
      
      * different rules are followed to check CR3 and it is desirable for the caller
      to distinguish between the possible failures
      * PDPTRs are not loaded if PAE paging and nested EPT are both enabled
      * many MMU operations are not necessary
      
      This patch introduces nested_vmx_load_cr3 suitable for CR3 loads as part of
      nested vmentry and vmexit, and makes use of it on the nested vmentry path.
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      9ed38ffa
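The three differences listed in this commit message can be modelled with a toy function. This is only a sketch under stated assumptions: `struct toy_cr3_vcpu`, its fields, and `toy_load_cr3` are hypothetical, the CR3 validity rule is simplified to a physical-address-width check, and `pdptrs_loadable` stands in for actually reading the PDPTEs from guest memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Distinct failures, so the caller can report the right one. */
enum cr3_load_result {
    CR3_LOAD_OK,
    CR3_LOAD_INVALID_CR3,   /* CR3 failed the validity check */
    CR3_LOAD_INVALID_PDPTE, /* PDPTEs could not be loaded */
};

struct toy_cr3_vcpu {
    uint64_t cr3;
    unsigned int maxphyaddr; /* physical-address width, must be < 64 */
    bool pae_paging;         /* guest uses PAE paging */
    bool nested_ept;         /* L1 enabled EPT for L2 */
    bool pdptrs_loadable;    /* stand-in for reading PDPTEs from memory */
    bool pdptrs_loaded;
};

static inline enum cr3_load_result toy_load_cr3(struct toy_cr3_vcpu *v,
                                                uint64_t cr3)
{
    assert(v && v->maxphyaddr < 64);

    if (cr3 >> v->maxphyaddr)
        return CR3_LOAD_INVALID_CR3;

    /* With PAE paging and nested EPT, CR3 must not be dereferenced. */
    if (v->pae_paging && !v->nested_ept) {
        if (!v->pdptrs_loadable)
            return CR3_LOAD_INVALID_PDPTE;
        v->pdptrs_loaded = true;
    }

    /* A plain register write; no further MMU work is modelled here. */
    v->cr3 = cr3;
    return CR3_LOAD_OK;
}
```

Returning an enum rather than a boolean is one way to let the caller distinguish between the possible failures, which the commit message calls out as desirable.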
    • L
      KVM: nVMX: propagate errors from prepare_vmcs02 · ee146c1c
      Ladi Prosek committed
      It is possible that prepare_vmcs02 fails to load the guest state. This
      patch adds the proper error handling for such a case. L1 will receive
      an INVALID_STATE vmexit with the appropriate exit qualification if it
      happens.
      
      A failure to set guest CR3 is the only error propagated from prepare_vmcs02
      at the moment.
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      ee146c1c
    • L
      KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT · 7ca29de2
      Ladi Prosek committed
      KVM does not correctly handle L1 hypervisors that emulate L2 real mode with
      PAE and EPT, such as Hyper-V. In this mode, the L1 hypervisor populates guest
      PDPTE VMCS fields and leaves guest CR3 uninitialized because it is not used
      (see 26.3.2.4 Loading Page-Directory-Pointer-Table Entries). KVM always
      dereferences CR3 and tries to load PDPTEs if PAE is on. This leads to two
      related issues:
      
      1) On the first nested vmentry, the guest PDPTEs, as populated by L1, are
      overwritten in ept_load_pdptrs because the registers are believed to have
      been loaded in load_pdptrs as part of kvm_set_cr3. This is incorrect. L2 is
      running with PAE enabled but PDPTRs have been set up by L1.
      
      2) When L2 is about to enable paging and loads its CR3, we, again, attempt
      to load PDPTEs in load_pdptrs called from kvm_set_cr3. There are no guarantees
      that this will succeed (it's just a CR3 load, paging is not enabled yet) and
      if it doesn't, kvm_set_cr3 returns early without persisting the CR3 which is
      then lost and L2 crashes right after it enables paging.
      
      This patch replaces the kvm_set_cr3 call with a simple register write if PAE
      and EPT are both on. CR3 is not to be interpreted in this case.
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      7ca29de2
    • D
      KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry · 5a6a9748
      David Matlack committed
      vmx_set_cr0() modifies GUEST_EFER and "IA-32e mode guest" in the current
      VMCS. Call vmx_set_efer() after vmx_set_cr0() so that emulated VM-entry
      is more faithful to VMCS12.
      
      This patch correctly causes VM-entry to fail when "IA-32e mode guest" is
      1 and GUEST_CR0.PG is 0. Previously this configuration would succeed and
      "IA-32e mode guest" would silently be disabled by KVM.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      5a6a9748
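The failing configuration named in this commit message ("IA-32e mode guest" set while GUEST_CR0.PG is clear) can be expressed as a one-line consistency check. The sketch below is illustrative only: the function name is hypothetical, and the bit positions (bit 9 of the VM-entry controls, bit 31 of CR0) are assumed from the Intel SDM. It covers just this one rule, not the full set of VM-entry checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PG          (UINT64_C(1) << 31) /* CR0 paging enable */
#define VM_ENTRY_IA32E_MODE (UINT32_C(1) << 9)  /* "IA-32e mode guest" */

/*
 * Hypothetical helper: VM-entry must fail if "IA-32e mode guest" is
 * requested while paging is disabled in the guest CR0.
 */
static inline bool ia32e_entry_consistent(uint32_t entry_controls,
                                          uint64_t guest_cr0)
{
    if ((entry_controls & VM_ENTRY_IA32E_MODE) && !(guest_cr0 & X86_CR0_PG))
        return false;
    return true;
}
```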
    • D
      KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID · 8322ebbb
      David Matlack committed
      MSR_IA32_CR{0,4}_FIXED1 define which bits in CR0 and CR4 are allowed to
      be 1 during VMX operation. Since the set of allowed-1 bits is the same
      in and out of VMX operation, we can generate these MSRs entirely from
      the guest's CPUID. This lets userspace avoid having to save/restore
      these MSRs.
      
      This patch also initializes MSR_IA32_CR{0,4}_FIXED1 from the CPU's MSRs
      by default. This is saner than the current default of -1ull, which
      includes bits that the host CPU does not support.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      8322ebbb
    • D
      KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation · 3899152c
      David Matlack committed
      KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning
      all CR0 and CR4 bits are allowed to be 1 during VMX operation.
      
      This does not match real hardware, which disallows the high 32 bits of
      CR0 to be 1, and disallows reserved bits of CR4 to be 1 (including bits
      which are defined in the SDM but missing according to CPUID). A guest
      can induce a VM-entry failure by setting these bits in GUEST_CR0 and
      GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are
      valid.
      
      Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing
      checks on these registers do not verify must-be-0 bits. Fix these checks
      to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1.
      
      This patch should introduce no change in behavior in KVM, since these
      MSRs are still -1ULL.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      3899152c
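A check that covers both must-be-1 and must-be-0 bits, as this commit message describes, can be written as a single expression. The following is a sketch rather than the patch itself: it assumes the FIXED0 MSR gives the bits that must be 1 and the FIXED1 MSR gives the bits that may be 1, and the function name is illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * fixed0: bits that must be 1.  fixed1: bits that are allowed to be 1
 * (so any bit clear in fixed1 must be 0 in val).
 */
static inline bool fixed_bits_valid(uint64_t val, uint64_t fixed0,
                                    uint64_t fixed1)
{
    /* A must-be-1 bit that is not allowed to be 1 would be contradictory. */
    assert((fixed0 & ~fixed1) == 0);

    /*
     * Masking with fixed1 drops any must-be-0 bit that was set, and
     * OR-ing fixed0 adds any must-be-1 bit that was missing. The value
     * is valid only if neither step changes it.
     */
    return ((val & fixed1) | fixed0) == val;
}
```

When `fixed1` is all ones (the -1ULL case mentioned in the message), the mask step never changes `val`, so only the must-be-1 half of the check has any effect.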
    • D
      KVM: nVMX: support restore of VMX capability MSRs · 62cc6b9d
      David Matlack committed
      The VMX capability MSRs advertise the set of features the KVM virtual
      CPU can support. This set of features varies across different host CPUs
      and KVM versions. This patch aims to address both sources of
      differences, allowing VMs to be migrated across CPUs and KVM versions
      without guest-visible changes to these MSRs. Note that cross-KVM-
      version migration is only supported from this point forward.
      
      When the VMX capability MSRs are restored, they are audited to check
      that the set of features advertised is a subset of what KVM and the
      CPU support.
      
      Since the VMX capability MSRs are read-only, they do not need to be on
      the default MSR save/restore lists. The userspace hypervisor can set
      the values of these MSRs or read them from KVM at VCPU creation time,
      and restore the same value after every save/restore.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      62cc6b9d
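The subset audit described in this commit message might look like the sketch below for one kind of capability MSR. It assumes the VMX control-MSR layout from the Intel SDM, where the low 32 bits mark controls that must be 1 and the high 32 bits mark controls that may be 1. The function `control_msr_restore_ok` and its parameters are hypothetical; other capability MSRs would need different rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if every bit of `subset` within `mask` is also set in `superset`. */
static inline bool is_bitwise_subset(uint64_t superset, uint64_t subset,
                                     uint64_t mask)
{
    superset &= mask;
    subset &= mask;
    return (superset | subset) == superset;
}

/*
 * Hypothetical audit of a value that userspace wants to restore into a
 * VMX control capability MSR, against what KVM and the CPU support.
 */
static inline bool control_msr_restore_ok(uint64_t supported,
                                          uint64_t requested)
{
    const uint64_t low = UINT64_C(0xffffffff);
    const uint64_t high = low << 32;

    /* Controls that must be 1 on this host must stay must-be-1. */
    if (!is_bitwise_subset(requested, supported, low))
        return false;

    /* Controls that cannot be 1 on this host must stay unavailable. */
    if (!is_bitwise_subset(supported, requested, high))
        return false;

    return true;
}
```

Restoring a value that advertises fewer optional controls than the host supports passes, while a value that advertises an extra control, or drops a must-be-1 control, is rejected.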
    • D
      KVM: nVMX: generate non-true VMX MSRs based on true versions · 0115f9cb
      David Matlack committed
      The "non-true" VMX capability MSRs can be generated from their "true"
      counterparts, by OR-ing the default1 bits. The default1 bits are fixed
      and defined in the SDM.
      
      Since we can generate the non-true VMX MSRs from the true versions,
      there's no need to store both in struct nested_vmx. This also lets
      userspace avoid having to restore the non-true MSRs.
      
      Note this does not preclude emulating MSR_IA32_VMX_BASIC[55]=0. To do so,
      we simply need to set all the default1 bits in the true MSRs (such that
      the true MSRs and the generated non-true MSRs are equal).
      Signed-off-by: David Matlack <dmatlack@google.com>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      0115f9cb
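The derivation in this commit message (non-true MSR = true MSR with the default1 bits OR-ed in) is small enough to show directly. This is a sketch with assumptions: the function name is hypothetical, and the example default1 value 0x16 for the pin-based controls is quoted from memory of the SDM and should be checked against it before being relied on.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default1 bits of the pin-based controls (verify against the SDM). */
#define PIN_BASED_DEFAULT1 UINT64_C(0x16)

/*
 * Hypothetical helper: derive the "non-true" control MSR from its "true"
 * counterpart by forcing the default1 controls to must-be-1. The
 * must-be-1 settings live in the low 32 bits of the MSR.
 */
static inline uint64_t non_true_ctl_msr(uint64_t true_msr, uint64_t default1)
{
    assert((default1 >> 32) == 0);

    return true_msr | default1;
}
```

If the true MSR already has every default1 bit set, the result equals the input, which matches the note above about emulating MSR_IA32_VMX_BASIC[55]=0.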
    • K
      KVM: x86: Add kvm_skip_emulated_instruction and use it. · 6affcbed
      Kyle Huey committed
      kvm_skip_emulated_instruction calls both
      kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep,
      skipping the emulated instruction and generating a trap if necessary.
      
      Replacing skip_emulated_instruction calls with
      kvm_skip_emulated_instruction is straightforward, except for:
      
      - ICEBP, which is already inside a trap, so avoid triggering another trap.
      - Instructions that can trigger exits to userspace, such as the IO insns,
        MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a
        KVM_GUESTDBG_SINGLESTEP exit, and the handling code for
        IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will
        take precedence. The singlestep will be triggered again on the next
        instruction, which is the current behavior.
      - Task switch instructions which would require additional handling (e.g.
        the task switch bit) and are instead left alone.
      - Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction,
        which do not trigger singlestep traps as mentioned previously.
      Signed-off-by: Kyle Huey <khuey@kylehuey.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      6affcbed
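The behaviour described for kvm_skip_emulated_instruction (skip, then check for single-stepping and report whether to exit to userspace) can be modelled outside the kernel. The sketch below is a toy model under stated assumptions: `struct toy_step_vcpu` and all of its fields are hypothetical, `userspace_singlestep` stands in for KVM_GUESTDBG_SINGLESTEP, and the return convention (1 = continue, 0 = exit to userspace) is an assumption modelled on the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_TF (UINT64_C(1) << 8) /* trap flag */

struct toy_step_vcpu {
    uint64_t rip;
    uint64_t rflags;
    unsigned int insn_len;     /* length of the emulated instruction */
    bool userspace_singlestep; /* stand-in for KVM_GUESTDBG_SINGLESTEP */
    bool db_trap_queued;       /* a debug trap is pending for the guest */
    bool exit_to_userspace;    /* a debug exit was requested */
};

/* Returns 1 to keep running the guest, 0 to exit to userspace. */
static inline int toy_skip_emulated_instruction(struct toy_step_vcpu *v)
{
    assert(v);

    /* Sample the trap flag before the instruction is skipped. */
    uint64_t rflags = v->rflags;

    v->rip += v->insn_len;

    if (rflags & X86_EFLAGS_TF) {
        if (v->userspace_singlestep) {
            /* The debugger in userspace owns the single-step. */
            v->exit_to_userspace = true;
            return 0;
        }
        /* The guest itself is single-stepping: deliver a trap to it. */
        v->db_trap_queued = true;
    }
    return 1;
}
```

A caller that also has its own reason to exit to userspace (the IN/OUT, MOV to CR8 and HALT cases listed above) would need to combine the two results, which is why those call sites are singled out in the message.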
    • K
      KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12 · eb277562
      Kyle Huey committed
      We can't return both the pass/fail boolean for the vmcs and the upcoming
      continue/exit-to-userspace boolean for skip_emulated_instruction out of
      nested_vmx_check_vmcs12, so move skip_emulated_instruction out of it instead.
      
      Additionally, VMLAUNCH/VMRESUME only trigger singlestep exceptions when
      they advance the IP to the following instruction, not when they a) succeed,
      b) fail MSR validation or c) throw an exception. Add a separate call to
      skip_emulated_instruction that will later not be converted to the variant
      that checks the singlestep flag.
      Signed-off-by: Kyle Huey <khuey@kylehuey.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      eb277562
    • K
      KVM: VMX: Reorder some skip_emulated_instruction calls · 09ca3f20
      Kyle Huey committed
      The functions being moved ahead of skip_emulated_instruction here don't
      need updated IPs, and skipping the emulated instruction at the end will
      make it easier to return its value.
      Signed-off-by: Kyle Huey <khuey@kylehuey.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      09ca3f20
    • K
      KVM: x86: Add a return value to kvm_emulate_cpuid · 6a908b62
      Kyle Huey committed
      Once skipping the emulated instruction can potentially trigger an exit to
      userspace (via KVM_GUESTDBG_SINGLESTEP), kvm_emulate_cpuid will need to
      propagate a return value.
      Signed-off-by: Kyle Huey <khuey@kylehuey.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      6a908b62
  7. 23 Nov, 2016 1 commit
  8. 17 Nov, 2016 1 commit
  9. 03 Nov, 2016 11 commits
  10. 01 Nov, 2016 1 commit
    • A
      x86/fpu, kvm: Remove host CR0.TS manipulation · 04ac88ab
      Andy Lutomirski committed
      Now that x86 always uses eager FPU switching on the host, there's no
      need for KVM to manipulate the host's CR0.TS.
      
      This should be both simpler and faster.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm list <kvm@vger.kernel.org>
      Link: http://lkml.kernel.org/r/b212064922537c05d0c81d931fc4dbe769127ce7.1477951965.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      04ac88ab
  11. 27 Oct, 2016 1 commit
  12. 23 Sep, 2016 3 commits
    • W
      KVM: nVMX: Fix the NMI IDT-vectoring handling · c5a6d5f7
      Wanpeng Li committed
      Running kvm-unit-tests/eventinj.flat in L1 fails:
      
      Sending NMI to self
      After NMI to self
      FAIL: NMI
      
      This scenario tests whether the VMM handles NMI IDT-vectoring info correctly.
      
      At the beginning, L2 writes to the LAPIC to send itself an NMI. The EPT page tables
      on both L1 and L0 are empty, so:
      
      - L2's memory access generates an EPT violation, which is intercepted by L0.
      
        The EPT violation vmexit occurs during delivery of this NMI, and the NMI info is
        recorded in vmcs02's IDT-vectoring info.
      
      - L0 walks L1's EPT12, sees that the mapping is invalid, and injects the EPT
        violation into L1.
      
        vmcs02's IDT-vectoring info is reflected into vmcs12's IDT-vectoring info, since
        this is a nested vmexit.
      
      - L1 receives the EPT violation, then fixes its EPT12.
      - L1 executes VMRESUME to resume L2, which generates a vmexit and causes L1 to exit to L0.
      - L0 emulates the VMRESUME issued by L1, then returns to L2.
      
        L0 merges in vmcs12's IDT-vectoring info and injects the event into L2 through
        vmcs02.
      
      - L2 re-executes the faulting instruction and causes an EPT violation again.
      - Since L1's EPT12 is now valid, L0 can fix its EPT02.
      - L0 resumes L2.
      
        The EPT violation vmexit occurs during delivery of this NMI again, and the NMI info
        is recorded in vmcs02's IDT-vectoring info. L0 should inject the NMI through vmentry
        event injection, since the exit was caused by an EPT violation on EPT02.
      
      However, vmx_inject_nmi() refuses to inject an NMI from IDT-vectoring info if the vCPU
      is in guest mode. This patch fixes that by permitting NMI injection from IDT-vectoring
      info when it is L0's responsibility to inject the NMI into L2.
      
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Bandan Das <bsd@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      c5a6d5f7
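The IDT-vectoring info that this commit message refers to throughout is a 32-bit VMCS field. A minimal sketch of how an interrupted NMI delivery could be recognised in it is shown below, assuming the field layout from the Intel SDM (vector in bits 7:0, type in bits 10:8 with 2 meaning NMI, valid flag in bit 31). The macro and function names are illustrative and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout assumed from the Intel SDM. */
#define VECTORING_INFO_VECTOR_MASK UINT32_C(0x000000ff)
#define VECTORING_INFO_TYPE_MASK   UINT32_C(0x00000700)
#define VECTORING_INFO_VALID       UINT32_C(0x80000000)
#define VECTORING_TYPE_NMI         (UINT32_C(2) << 8)

/*
 * Hypothetical helper: true if the vmexit interrupted the delivery of
 * an NMI, in which case the NMI has to be injected again on re-entry.
 */
static inline bool idt_vectoring_is_nmi(uint32_t info)
{
    if (!(info & VECTORING_INFO_VALID))
        return false;
    return (info & VECTORING_INFO_TYPE_MASK) == VECTORING_TYPE_NMI;
}
```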
    • W
      KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactive · f6e90f9e
      Wanpeng Li committed
      While testing the tpr case of kvm-unit-tests/eventinj.flat on my Haswell
      desktop (with flexpriority, without APICv), I observed that kvmvapic
      (intended to optimize flexpriority=N or AMD) is used to speed up TPR
      access. Commit 8d14695f ("x86, apicv: add virtual x2apic support")
      disables virtual x2apic mode completely without APICv, and its author
      told me that Windows guests could not enter x2apic mode when he developed
      the APICv feature several years ago. That is no longer true: Interrupt
      Remapping and vIOMMU have been added to QEMU, and developers from Intel
      recently verified that Windows 8 works in x2apic mode with Interrupt
      Remapping enabled.
      
      This patch enables the TPR shadow for virtual x2apic mode to speed up
      Windows guests in x2apic mode even without APICv.
      
      Passes kvm-unit-tests.
      Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
      Suggested-by: Wincy Van <fanwenyi0529@gmail.com>
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wincy Van <fanwenyi0529@gmail.com>
      Cc: Yang Zhang <yang.zhang.wz@gmail.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      f6e90f9e
    • W
      KVM: nVMX: Fix reload apic access page warning · c83b6d15
      Wanpeng Li committed
      WARNING: CPU: 1 PID: 4230 at kernel/sched/core.c:7564 __might_sleep+0x7e/0x80
      do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8d0de7f9>] prepare_to_swait+0x39/0xa0
      CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
      Call Trace:
       dump_stack+0x99/0xd0
       __warn+0xd1/0xf0
       warn_slowpath_fmt+0x4f/0x60
       ? prepare_to_swait+0x39/0xa0
       ? prepare_to_swait+0x39/0xa0
       __might_sleep+0x7e/0x80
       __gfn_to_pfn_memslot+0x156/0x480 [kvm]
       gfn_to_pfn+0x2a/0x30 [kvm]
       gfn_to_page+0xe/0x20 [kvm]
       kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
       nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
       ? _raw_spin_unlock_irqrestore+0x36/0x80
       vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
       kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
       kvm_vcpu_check_block+0x12/0x60 [kvm]
       kvm_vcpu_block+0x94/0x4c0 [kvm]
       kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
       ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
       kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      4.8.0-rc5+ #47 Not tainted
      -------------------------------
      ./include/linux/kvm_host.h:535 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 0
      1 lock held by qemu-system-x86/4230:
       #0:  (&vcpu->mutex){+.+.+.}, at: [<ffffffffc062975c>] vcpu_load+0x1c/0x60 [kvm]
      
      stack backtrace:
      CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
      Call Trace:
       dump_stack+0x99/0xd0
       lockdep_rcu_suspicious+0xe7/0x120
       gfn_to_memslot+0x12a/0x140 [kvm]
       gfn_to_pfn+0x12/0x30 [kvm]
       gfn_to_page+0xe/0x20 [kvm]
       kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
       nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
       ? _raw_spin_unlock_irqrestore+0x36/0x80
       vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
       kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
       kvm_vcpu_check_block+0x12/0x60 [kvm]
       kvm_vcpu_block+0x94/0x4c0 [kvm]
       kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
       ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
       kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
       ? __fget+0xfd/0x210
       ? __lock_is_held+0x54/0x70
       do_vfs_ioctl+0x96/0x6a0
       ? __fget+0x11c/0x210
       ? __fget+0x5/0x210
       SyS_ioctl+0x79/0x90
       do_syscall_64+0x81/0x220
       entry_SYSCALL64_slow_path+0x25/0x25
      
      These can be triggered by running kvm-unit-tests: ./x86-run x86/vmx.flat
      
      The nested preemption timer is based on an hrtimer which is started on L2
      entry, stopped on L2 exit, and evaluated via the new check_nested_events
      hook. The current logic adds the vCPU to a simple waitqueue
      (TASK_INTERRUPTIBLE) when it needs to yield the pCPU, and accesses memslots
      without holding the srcu read lock. Both can happen in the nested preemption
      timer evaluation path, which results in the warnings above.
      
      This patch fixes it by using a request bit to reload the APIC access page
      asynchronously before vmentry, instead of reloading it directly during
      nested preemption timer evaluation. This is safe because vmcs01 is loaded
      and a nested vmexit is in progress.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Yunhong Jiang <yunhong.jiang@intel.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      c83b6d15
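The fix described in this commit message follows a general pattern: instead of doing work that may sleep in a context where sleeping is not allowed, set a request bit and service it later from a safe point. The sketch below shows that pattern in portable C11. It is not KVM's request mechanism: `struct toy_req_vcpu`, `toy_make_request`, `toy_service_requests` and the request bit are all hypothetical, and the counter stands in for the actual page reload.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_REQ_APIC_PAGE_RELOAD (1u << 0)

struct toy_req_vcpu {
    atomic_uint requests; /* pending request bits */
    unsigned int reloads; /* how many reloads were actually performed */
};

/* Safe to call where blocking is forbidden: it only sets a bit. */
static inline void toy_make_request(struct toy_req_vcpu *v, unsigned int req)
{
    assert(v);
    atomic_fetch_or(&v->requests, req);
}

/* Called just before guest entry, where blocking work is allowed. */
static inline void toy_service_requests(struct toy_req_vcpu *v)
{
    assert(v);

    /* Take and clear all pending requests in one step. */
    unsigned int pending = atomic_exchange(&v->requests, 0);

    if (pending & TOY_REQ_APIC_PAGE_RELOAD)
        v->reloads++; /* stand-in for the possibly blocking reload */
}
```

Requests raised several times before the next entry collapse into a single reload, which is usually acceptable for idempotent work of this kind.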
  13. 16 Sep, 2016 1 commit
  14. 08 Sep, 2016 1 commit