1. 17 Mar, 2018 6 commits
    • x86/kvm/hyper-v: add reenlightenment MSRs support · a2e164e7
      Vitaly Kuznetsov committed
      Nested Hyper-V/Windows guest running on top of KVM will use TSC page
      clocksource in two cases:
      - L0 exposes invariant TSC (CPUID.80000007H:EDX[8]).
      - L0 provides Hyper-V Reenlightenment support (CPUID.40000003H:EAX[13]).
      
      Exposing invariant TSC effectively blocks migration to hosts with different
      TSC frequencies, so reenlightenment support will be needed when we
      start migrating nested workloads.
      
      Implement rudimentary support for reenlightenment MSRs. For now, these are
      just read/write MSRs with no effect.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
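The "read/write MSRs with no effect" behaviour described in the commit above can be sketched in user-space C as plain storage behind a get/set pair. This is only an illustration, not the KVM code: the struct and function names are hypothetical, and the MSR indices are quoted from the Hyper-V TLFS as I recall them, so treat them as assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* MSR indices as recalled from the Hyper-V TLFS; assumptions, verify before use. */
#define HV_X64_MSR_REENLIGHTENMENT_CONTROL 0x40000106u
#define HV_X64_MSR_TSC_EMULATION_CONTROL   0x40000107u
#define HV_X64_MSR_TSC_EMULATION_STATUS    0x40000108u

/* Hypothetical per-VM storage; not the actual KVM structure. */
struct hv_reenlightenment_state {
    uint64_t reenlightenment_control;
    uint64_t tsc_emulation_control;
    uint64_t tsc_emulation_status;
};

/* Return the field backing an MSR, or NULL if the MSR is not handled here. */
static uint64_t *hv_msr_slot(struct hv_reenlightenment_state *s, uint32_t msr)
{
    switch (msr) {
    case HV_X64_MSR_REENLIGHTENMENT_CONTROL:
        return &s->reenlightenment_control;
    case HV_X64_MSR_TSC_EMULATION_CONTROL:
        return &s->tsc_emulation_control;
    case HV_X64_MSR_TSC_EMULATION_STATUS:
        return &s->tsc_emulation_status;
    default:
        return NULL;
    }
}

/* Store the value and do nothing else. Returns 0 on success, 1 if unhandled. */
static int hv_set_msr(struct hv_reenlightenment_state *s, uint32_t msr, uint64_t data)
{
    uint64_t *slot = hv_msr_slot(s, msr);

    if (!slot)
        return 1;
    *slot = data;
    return 0;
}

/* Read back whatever was last written. Returns 0 on success, 1 if unhandled. */
static int hv_get_msr(struct hv_reenlightenment_state *s, uint32_t msr, uint64_t *data)
{
    uint64_t *slot = hv_msr_slot(s, msr);

    if (!slot)
        return 1;
    *data = *slot;
    return 0;
}
```

A guest write followed by a read of the same MSR returns the written value; nothing else in this sketch reacts to the stored values.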
    • KVM: x86: Update the exit_qualification access bits while walking an address · ddd6f0e9
      KarimAllah Ahmed committed
      ... to avoid having a stale value when handling an EPT misconfig for MMIO
      regions.
      
      MMIO regions that are not passed-through to the guest are handled through
      EPT misconfigs. The first time a certain MMIO page is touched it causes an
      EPT violation, then KVM marks the EPT entry to cause an EPT misconfig
      instead. Any subsequent accesses to the entry will generate an EPT
      misconfig.
      
      Things get slightly complicated with nested guest handling for MMIO
      regions that are not passed through from L0 (i.e. emulated by L0
      user-space).
      
      An EPT violation for one of these MMIO regions from L2 exits to the L0
      hypervisor. L0 would then look at the EPT12 mapping for the L1 hypervisor and
      realize it is not present (or not sufficient to serve the request). Then L0
      injects an EPT violation to L1. L1 would then update its EPT mappings. The
      EXIT_QUALIFICATION value for L1 would come from the exit_qualification variable
      in "struct vcpu". The problem is that this variable is only updated on EPT
      violation and not on EPT misconfig. So if an EPT violation because of a
      read happened first, and then an EPT misconfig because of a write happened
      afterwards, the L0 hypervisor would still hold the exit_qualification value
      from the previous read instead of the write and end up injecting an EPT
      violation to the L1 hypervisor with an out-of-date EXIT_QUALIFICATION.
      
      The EPT violation that is injected from L0 to L1 needs to have the correct
      EXIT_QUALIFICATION, especially for the access bits, because the individual
      access bits for MMIO EPTs are updated only on an actual access of that
      specific type. So for the example above, the L1 hypervisor will keep
      updating only the read bit in the EPT and then resume the L2 guest. The L2
      guest would end up causing another exit, where L0 will *again* inject
      another EPT violation to the L1 hypervisor with *again* an out-of-date
      exit_qualification which indicates a read and not a write. This
      ping-pong just keeps happening without making any forward progress.
      
      The behavior of mapping MMIO regions changed in:
      
         commit a340b3e2 ("kvm: Map PFN-type memory regions as writable (if possible)")
      
      ... where an EPT violation for a read would also fix up the write bits to
      avoid another EPT violation, which by accident would fix the bug mentioned
      above.
      
      This commit fixes this situation and ensures that the access bits for the
      exit_qualification are up to date. That ensures that even an L1 hypervisor
      running with a KVM version before the commit mentioned above would still
      work.
      
      ( The description above assumes EPT to be available and used by L1
        hypervisor + the L1 hypervisor is passing through the MMIO region to the L2
        guest while this MMIO region is emulated by the L0 user-space ).
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
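The core idea of the commit above, replacing stale access bits with the bits of the access currently being handled, can be sketched as follows. This is a stand-alone illustration, not the KVM patch: the enum and function names are hypothetical, and the bit layout (bit 0 read, bit 1 write, bit 2 instruction fetch) follows the EPT violation exit qualification format in the Intel SDM.

```c
#include <stdint.h>

/* Low three bits of the EPT violation exit qualification (Intel SDM). */
#define EPT_VIOLATION_ACC_READ  (1ull << 0)
#define EPT_VIOLATION_ACC_WRITE (1ull << 1)
#define EPT_VIOLATION_ACC_INSTR (1ull << 2)
#define EPT_VIOLATION_ACC_MASK \
    (EPT_VIOLATION_ACC_READ | EPT_VIOLATION_ACC_WRITE | EPT_VIOLATION_ACC_INSTR)

/* Hypothetical description of the access being walked. */
enum access_kind { ACCESS_READ, ACCESS_WRITE, ACCESS_FETCH };

/*
 * Clear the access bits left over from an earlier exit and set the one
 * matching the current access. All other bits are preserved.
 */
static uint64_t refresh_access_bits(uint64_t exit_qualification, enum access_kind kind)
{
    exit_qualification &= ~EPT_VIOLATION_ACC_MASK;

    switch (kind) {
    case ACCESS_READ:
        exit_qualification |= EPT_VIOLATION_ACC_READ;
        break;
    case ACCESS_WRITE:
        exit_qualification |= EPT_VIOLATION_ACC_WRITE;
        break;
    case ACCESS_FETCH:
        exit_qualification |= EPT_VIOLATION_ACC_INSTR;
        break;
    }
    return exit_qualification;
}
```

With a stale value recording a read, refreshing for a write yields a value that reports a write, which is what L1 needs in order to update the right EPT permission and make forward progress.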
    • KVM: x86: Make enum conversion explicit in kvm_pdptr_read() · 1df372f4
      Matthias Kaehlcke committed
      The type 'enum kvm_reg_ex' is an extension of 'enum kvm_reg', however
      the extension is only semantical and the compiler doesn't know about the
      relationship between the two types. In kvm_pdptr_read() a value of the
      extended type is passed to kvm_x86_ops->cache_reg(), which expects a
      value of the base type. Clang raises the following warning about the
      type mismatch:
      
      arch/x86/kvm/kvm_cache_regs.h:44:32: warning: implicit conversion from
        enumeration type 'enum kvm_reg_ex' to different enumeration type
        'enum kvm_reg' [-Wenum-conversion]
          kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_PDPTR);
      
      Cast VCPU_EXREG_PDPTR to 'enum kvm_reg' to make the compiler happy.
      Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
      Reviewed-by: Guenter Roeck <groeck@chromium.org>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
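The pattern behind the warning above can be reproduced in a few lines. The sketch below uses simplified, hypothetical enums (not the kernel's definitions): the second enum continues numbering where the first ends, and the explicit cast tells the compiler that the conversion is intended.

```c
/* Hypothetical base register set. */
enum reg {
    REG_RAX,
    REG_RBX,
    NR_REGS
};

/* Hypothetical extension: values continue after the base set. */
enum reg_ex {
    EXREG_PDPTR = NR_REGS,
    EXREG_CR3
};

/* Records the last register passed in, so the effect can be observed. */
static int last_cached = -1;

/* The callee only knows about the base type. */
static void cache_reg(enum reg r)
{
    last_cached = (int)r;
}

static void pdptr_read(void)
{
    /*
     * Without the cast, clang's -Wenum-conversion warns about passing an
     * 'enum reg_ex' value where 'enum reg' is expected.
     */
    cache_reg((enum reg)EXREG_PDPTR);
}
```

The cast does not change the value that is passed; it only makes the relationship between the two enum types explicit to the compiler.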
    • KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use · 0bcc3fb9
      Vitaly Kuznetsov committed
      Devices which use level-triggered interrupts under Windows 2016 with
      Hyper-V role enabled don't work: Windows disables EOI broadcast in SPIV
      unconditionally. Our in-kernel IOAPIC implementation emulates an old IOAPIC
      version which has no EOI register so EOI never happens.
      
      The issue was discovered and discussed a while ago:
      https://www.spinics.net/lists/kvm/msg148098.html
      
      While this is a guest OS bug (it should check that IOAPIC has the required
      capabilities before disabling EOI broadcast) we can work around it in KVM:
      advertising DIRECTED_EOI with in-kernel IOAPIC makes little sense anyway.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: x86: Add support for AMD Core Perf Extension in guest · c51eb52b
      Janakarajan Natarajan committed
      Add support for AMD Core Performance counters in the guest. The base
      event select and counter MSRs are changed. In addition, with the core
      extension, there are 2 extra counters available for performance
      measurements for a total of 6.
      
      With the new MSRs, the logic to map them to the gp_counters[] is changed.
      New functions are added to check the validity of the get/set MSRs.
      
      If the guest has the X86_FEATURE_PERFCTR_CORE cpuid flag set, the number
      of counters available to the vcpu is set to 6. If the flag is not set,
      then it is 4.
      Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      [Squashed "Expose AMD Core Perf Extension flag to guests" - Radim.]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
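A rough sketch of the counter selection and MSR-to-counter mapping described above, written as stand-alone C. The function names are hypothetical and the logic is not taken from the patch. The MSR base addresses are the ones I recall from the kernel's msr-index.h, and the interleaved CTL/CTR layout of the extension MSRs is likewise an assumption to verify against the AMD documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legacy K7 counter MSRs: four contiguous counters (assumed addresses). */
#define MSR_K7_PERFCTR0   0xc0010004u
/* Core extension MSRs: CTL0, CTR0, CTL1, CTR1, ... interleaved (assumed). */
#define MSR_F15H_PERF_CTL 0xc0010200u
#define MSR_F15H_PERF_CTR 0xc0010201u

#define AMD_NUM_COUNTERS_LEGACY 4u
#define AMD_NUM_COUNTERS_CORE   6u

/* Number of general-purpose counters exposed to the vCPU. */
static unsigned int amd_num_counters(bool has_perfctr_core)
{
    return has_perfctr_core ? AMD_NUM_COUNTERS_CORE : AMD_NUM_COUNTERS_LEGACY;
}

/* Map a counter MSR to its index, or return -1 if it is not a counter MSR. */
static int amd_counter_index(uint32_t msr, bool has_perfctr_core)
{
    if (has_perfctr_core) {
        uint32_t last = MSR_F15H_PERF_CTR + 2 * (AMD_NUM_COUNTERS_CORE - 1);

        if (msr < MSR_F15H_PERF_CTR || msr > last)
            return -1;
        /* Odd offsets from the first CTR are the interleaved CTL MSRs. */
        if ((msr - MSR_F15H_PERF_CTR) % 2)
            return -1;
        return (int)((msr - MSR_F15H_PERF_CTR) / 2);
    }

    if (msr < MSR_K7_PERFCTR0 || msr >= MSR_K7_PERFCTR0 + AMD_NUM_COUNTERS_LEGACY)
        return -1;
    return (int)(msr - MSR_K7_PERFCTR0);
}
```

The result of amd_counter_index() would be used to index an array such as gp_counters[], with amd_num_counters() bounding the valid range.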
    • x86/msr: Add AMD Core Perf Extension MSRs · e84b7119
      Janakarajan Natarajan committed
      Add the EventSelect and Counter MSRs for AMD Core Perf Extension.
      Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  2. 08 Mar, 2018 1 commit
  3. 07 Mar, 2018 8 commits
  4. 02 Mar, 2018 5 commits
  5. 01 Mar, 2018 3 commits
  6. 28 Feb, 2018 4 commits
  7. 24 Feb, 2018 12 commits
    • KVM: SVM: Fix SEV LAUNCH_SECRET command · 9c5e0afa
      Brijesh Singh committed
      The SEV LAUNCH_SECRET command fails with error code 'invalid param'
      because we missed filling the guest and header system physical address
      while issuing the command.
      
      Fixes: 9f5b5b95 (KVM: SVM: Add support for SEV LAUNCH_SECRET command)
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Joerg Roedel <joro@8bytes.org>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: install RSM intercept · 7607b717
      Brijesh Singh committed
      The RSM instruction is used by the SMM handler to return from SMM mode.
      Currently, rsm causes a #UD, which results in instruction fetch, decode,
      and emulate. By installing the RSM intercept we can avoid the instruction
      fetch since we know that the #VMEXIT was due to rsm.
      
      The patch is required for SEV guests, because in the case of an SEV guest,
      memory is encrypted with a guest-specific key and the hypervisor will not
      be able to fetch the instruction bytes from the guest memory.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: no need to call access_ok() in LAUNCH_MEASURE command · 3e233385
      Brijesh Singh committed
      Using access_ok() to validate the input before issuing the SEV
      command does not buy us anything in this case. If userland is
      giving us a garbage pointer, then copy_to_user() will catch it when we try
      to return the measurement.
      Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
      Fixes: 0d0736f7 (KVM: SVM: Add support for KVM_SEV_LAUNCH_MEASURE ...)
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Joerg Roedel <joro@8bytes.org>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Avoid traversing all the cpus for pv tlb flush when steal time is disabled · 4f2f61fc
      Wanpeng Li committed
      Avoid traversing all the cpus for pv tlb flush when steal time
      is disabled since pv tlb flush depends on the field in steal time
      for shared data.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/kvm: Make parse_no_xxx __init for kvm · afdc3f58
      Dou Liyang committed
      early_param() is only called during kernel initialization, so Linux
      marks its handler functions with the __init macro to save memory.
      
      But parse_no_kvmapf/stealacc/kvmclock_vsyscall were not marked,
      so make them __init as well.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: rkrcmar@redhat.com
      Cc: kvm@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: x86@kernel.org
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: fix backward migration with async_PF · fe2a3027
      Radim Krčmář committed
      Guests on new hypervisors might set the KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT
      bit when enabling async_PF, but this bit is reserved on old hypervisors,
      which results in a failure upon migration.
      
      To avoid breaking different cases, we are checking for CPUID feature bit
      before enabling the feature and nothing else.
      
      Fixes: 52a5c155 ("KVM: async_pf: Let guest support delivery of async_pf from guest mode")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
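The guest-side rule described above, setting the delivery bit only when the hypervisor advertises the matching CPUID feature, might look like the following stand-alone sketch. The function name and parameters are hypothetical; the constant values are the ones I recall from the KVM paravirt UAPI header and should be treated as assumptions.

```c
#include <stdint.h>

/* Values as recalled from the KVM paravirt UAPI header; assumptions. */
#define KVM_ASYNC_PF_ENABLED               (1u << 0)
#define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT (1u << 2)
#define KVM_FEATURE_ASYNC_PF_VMEXIT        10

/*
 * Build the value a guest would write to enable async_PF.
 * 'pa' is the (aligned) physical address of the shared area and
 * 'kvm_features' is the feature bitmap the hypervisor reports via CPUID.
 */
static uint64_t async_pf_enable_value(uint64_t pa, uint32_t kvm_features)
{
    uint64_t val = pa | KVM_ASYNC_PF_ENABLED;

    /* Only use the bit if the hypervisor says it understands it. */
    if (kvm_features & (1u << KVM_FEATURE_ASYNC_PF_VMEXIT))
        val |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;

    return val;
}
```

On a hypervisor that does not advertise the feature, the reserved bit stays clear, so the value remains acceptable after migrating to an older host.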
    • kvm: fix warning for non-x86 builds · f75e4924
      Sebastian Ott committed
      Fix the following sparse warning by moving the prototype
      of kvm_arch_mmu_notifier_invalidate_range() to linux/kvm_host.h.
      
        CHECK   arch/s390/kvm/../../../virt/kvm/kvm_main.c
      arch/s390/kvm/../../../virt/kvm/kvm_main.c:138:13: warning: symbol 'kvm_arch_mmu_notifier_invalidate_range' was not declared. Should it be static?
      Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Fix SMRAM accessing even if VM is shutdown · 95e057e2
      Wanpeng Li committed
      Reported by syzkaller:
      
         WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
         CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4
         RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
         Call Trace:
          vmx_handle_exit+0xbd/0xe20 [kvm_intel]
          kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm]
          kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
          do_vfs_ioctl+0xa4/0x6a0
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x25/0x9c
      
      The testcase creates a first thread to issue KVM_SMI ioctl, and then creates
      a second thread to mmap and operate on the same vCPU.  This triggers a race
      condition when running the testcase with multiple threads. Sometimes one thread
      exits with a triple fault while another thread mmaps and operates on the same
      vCPU.  Because CS=0x3000/IP=0x8000 is not mapped, accessing the SMI handler
      results in an EPT misconfig. This patch fixes it by returning RET_PF_EMULATE
      in kvm_handle_bad_page(), which will go on to cause an emulation failure and an
      exit with KVM_EXIT_INTERNAL_ERROR.
      
      Reported-by: syzbot+c1d9517cab094dae65e446c0c5b4de6c40f4dc58@syzkaller.appspotmail.com
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Don't halt vcpu when L1 is injecting events to L2 · 135a06c3
      Chao Gao committed
      Although L2 is in halt state, it will be in the active state after
      VM entry if the VM entry is vectoring, according to SDM 26.6.2 Activity
      State. Halting the vcpu here means the event won't be injected to L2
      and this decision isn't reported to L1. Thus L0 drops an event that
      should be injected to L2.
      
      Cc: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Chao Gao <chao.gao@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
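The decision described above can be expressed as a small predicate: halt only when the requested activity state is HLT and the VM entry is not injecting an event. The sketch below is a stand-alone illustration with a hypothetical function name; the two constants follow the VMX encoding in the Intel SDM (activity state 1 is HLT, bit 31 of the VM-entry interruption information is the valid bit).

```c
#include <stdbool.h>
#include <stdint.h>

#define GUEST_ACTIVITY_HLT   1u          /* VMX guest activity state: HLT */
#define INTR_INFO_VALID_MASK 0x80000000u /* bit 31: event injection is valid */

/*
 * Returns true if the vCPU should be halted after a nested VM entry.
 * A vectoring VM entry (valid interruption info) leaves the guest active,
 * so halting would drop the event that L1 asked to inject.
 */
static bool should_halt_after_entry(uint32_t activity_state, uint32_t vm_entry_intr_info)
{
    return activity_state == GUEST_ACTIVITY_HLT &&
           !(vm_entry_intr_info & INTR_INFO_VALID_MASK);
}
```

With an event pending for injection the predicate returns false even if L1 requested the HLT activity state, so the event reaches L2.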
    • KVM/x86: remove WARN_ON() for when vm_munmap() fails · 103c763c
      Eric Biggers committed
      On x86, special KVM memslots such as the TSS region have anonymous
      memory mappings created on behalf of userspace, and these mappings are
      removed when the VM is destroyed.
      
      It is however possible for removing these mappings via vm_munmap() to
      fail.  This can most easily happen if the thread receives SIGKILL while
      it's waiting to acquire ->mmap_sem.   This triggers the 'WARN_ON(r < 0)'
      in __x86_set_memory_region().  syzkaller was able to hit this, using
      'exit()' to send the SIGKILL.  Note that while the vm_munmap() failure
      results in the mapping not being removed immediately, it is not leaked
      forever but rather will be freed when the process exits.
      
      It's not really possible to handle this failure properly, so almost
      every other caller of vm_munmap() doesn't check the return value.  It's
      a limitation of having the kernel manage these mappings rather than
      userspace.
      
      So just remove the WARN_ON() so that users can't spam the kernel log
      with this warning.
      
      Fixes: f0d648bd ("KVM: x86: map/unmap private slots in __x86_set_memory_region")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: nVMX: preserve SECONDARY_EXEC_DESC without UMIP · 99158246
      Radim Krčmář committed
      L1 might want to use SECONDARY_EXEC_DESC, so we must not clear the VMCS
      bit if UMIP is not being emulated.
      
      We must still set the bit when emulating UMIP, as the feature can be
      passed to L2, where L0 will do the emulation; and because L2 can change
      CR4 without a VM exit, we should clear the bit if UMIP is disabled.
      
      Fixes: 0367f205 ("KVM: vmx: add support for emulating UMIP")
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: x86: move LAPIC initialization after VMCS creation · 0b2e9904
      Paolo Bonzini committed
      The initial reset of the local APIC is performed before the VMCS has been
      created, but it tries to do a vmwrite:
      
       vmwrite error: reg 810 value 4a00 (err 18944)
       CPU: 54 PID: 38652 Comm: qemu-kvm Tainted: G        W I      4.16.0-0.rc2.git0.1.fc28.x86_64 #1
       Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0003.090520141303 09/05/2014
       Call Trace:
        vmx_set_rvi [kvm_intel]
        vmx_hwapic_irr_update [kvm_intel]
        kvm_lapic_reset [kvm]
        kvm_create_lapic [kvm]
        kvm_arch_vcpu_init [kvm]
        kvm_vcpu_init [kvm]
        vmx_create_vcpu [kvm_intel]
        kvm_vm_ioctl [kvm]
      
      Move it later, after the VMCS has been created.
      
      Fixes: 4191db26 ("KVM: x86: Update APICv on APIC reset")
      Cc: stable@vger.kernel.org
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 23 Feb, 2018 1 commit