1. 16 January 2018 (6 commits)
  2. 14 December 2017 (18 commits)
  3. 06 December 2017 (3 commits)
    • KVM: x86: fix APIC page invalidation · b1394e74
      Committed by Radim Krčmář
      Implementation of the unpinned APIC page didn't update the VMCS address
      cache when invalidation was done through range mmu notifiers.
      This became a problem when the page notifier was removed.
      
      Re-introduce the arch-specific helper and call it from ...range_start.
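      A hedged sketch of the shape such a helper can take, close to what the
      commit adds (treat details as illustrative): if the invalidated HVA
      range covers the APIC access page, every vCPU is asked to reload the
      address cached in the VMCS.

          /* Arch helper invoked from kvm_mmu_notifier_invalidate_range_start():
           * if the invalidated HVA range covers the APIC access page, ask all
           * vCPUs to reload the physical address cached in the VMCS.
           */
          void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
                                                      unsigned long start,
                                                      unsigned long end)
          {
                  unsigned long apic_address;

                  apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
                  if (start <= apic_address && apic_address < end)
                          kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
          }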
      Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
      Fixes: 38b99173 ("kvm: vmx: Implement set_apic_access_page_addr")
      Fixes: 369ea824 ("mm/rmap: update to new mmu_notifier semantic v2")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Tested-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Tested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      b1394e74
    • x86,kvm: remove KVM emulator get_fpu / put_fpu · 6ab0b9fe
      Committed by Rik van Riel
      Now that get_fpu and put_fpu do nothing, because the scheduler will
      automatically load and restore the guest FPU context for us while we
      are in this code (deep inside the vcpu_run main loop), we can get rid
      of the get_fpu and put_fpu hooks.
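      For reference, a sketch of the two hooks dropped from the emulator ops
      table (signatures as in struct x86_emulate_ops; the surrounding struct
      is abbreviated):

          struct x86_emulate_ops {
                  /* ... other callbacks unchanged ... */

                  /* Removed: with the guest FPU now kept loaded across the
                   * whole KVM_RUN ioctl, these had become empty wrappers.
                   */
                  void (*get_fpu)(struct x86_emulate_ctxt *ctxt);
                  void (*put_fpu)(struct x86_emulate_ctxt *ctxt);
          };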
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Suggested-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6ab0b9fe
    • x86,kvm: move qemu/guest FPU switching out to vcpu_run · f775b13e
      Committed by Rik van Riel
      Currently, every time a VCPU is scheduled out, the host kernel will
      first save the guest FPU/xstate context, then load the qemu userspace
      FPU context, only to then immediately save the qemu userspace FPU
      context back to memory. When scheduling in a VCPU, the same extraneous
      FPU loads and saves are done.
      
      This could be avoided by moving from a model where the guest FPU is
      loaded and stored with preemption disabled, to a model where the
      qemu userspace FPU is swapped out for the guest FPU context for
      the duration of the KVM_RUN ioctl.
      
      This is done under the VCPU mutex, which is also taken when other
      tasks inspect the VCPU FPU context, so the code should already be
      safe for this change. That should come as no surprise, given that
      s390 already has this optimization.
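      A hedged sketch of the two swap points, modeled on the upstream
      kvm_load_guest_fpu()/kvm_put_guest_fpu() helpers (simplified; tracing
      and statistics are omitted):

          /* Swap qemu's user FPU context for the guest FPU context, and back.
           * Called once per KVM_RUN ioctl instead of on every preemption.
           */
          static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
          {
                  preempt_disable();
                  copy_fpregs_to_fpstate(&vcpu->arch.user_fpu); /* stash qemu state */
                  /* PKRU is separately restored in kvm_x86_ops->run. */
                  __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
                                          ~XFEATURE_MASK_PKRU);
                  preempt_enable();
          }

          static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
          {
                  preempt_disable();
                  copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu); /* save guest state */
                  copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state); /* restore qemu */
                  preempt_enable();
          }

          /* In kvm_arch_vcpu_ioctl_run(), roughly:
           *
           *        kvm_load_guest_fpu(vcpu);
           *        r = vcpu_run(vcpu);
           *        kvm_put_guest_fpu(vcpu);
           */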
      
      This can fix a bug where KVM calls get_user_pages while owning the
      FPU, and the file system ends up requesting the FPU again:
      
          [258270.527947]  __warn+0xcb/0xf0
          [258270.527948]  warn_slowpath_null+0x1d/0x20
          [258270.527951]  kernel_fpu_disable+0x3f/0x50
          [258270.527953]  __kernel_fpu_begin+0x49/0x100
          [258270.527955]  kernel_fpu_begin+0xe/0x10
          [258270.527958]  crc32c_pcl_intel_update+0x84/0xb0
          [258270.527961]  crypto_shash_update+0x3f/0x110
          [258270.527968]  crc32c+0x63/0x8a [libcrc32c]
          [258270.527975]  dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
          [258270.527978]  node_prepare_for_write+0x44/0x70 [dm_persistent_data]
          [258270.527985]  dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
          [258270.527988]  submit_io+0x170/0x1b0 [dm_bufio]
          [258270.527992]  __write_dirty_buffer+0x89/0x90 [dm_bufio]
          [258270.527994]  __make_buffer_clean+0x4f/0x80 [dm_bufio]
          [258270.527996]  __try_evict_buffer+0x42/0x60 [dm_bufio]
          [258270.527998]  dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
          [258270.528002]  shrink_slab.part.40+0x1f5/0x420
          [258270.528004]  shrink_node+0x22c/0x320
          [258270.528006]  do_try_to_free_pages+0xf5/0x330
          [258270.528008]  try_to_free_pages+0xe9/0x190
          [258270.528009]  __alloc_pages_slowpath+0x40f/0xba0
          [258270.528011]  __alloc_pages_nodemask+0x209/0x260
          [258270.528014]  alloc_pages_vma+0x1f1/0x250
          [258270.528017]  do_huge_pmd_anonymous_page+0x123/0x660
          [258270.528021]  handle_mm_fault+0xfd3/0x1330
          [258270.528025]  __get_user_pages+0x113/0x640
          [258270.528027]  get_user_pages+0x4f/0x60
          [258270.528063]  __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
          [258270.528108]  try_async_pf+0x66/0x230 [kvm]
          [258270.528135]  tdp_page_fault+0x130/0x280 [kvm]
          [258270.528149]  kvm_mmu_page_fault+0x60/0x120 [kvm]
          [258270.528158]  handle_ept_violation+0x91/0x170 [kvm_intel]
          [258270.528162]  vmx_handle_exit+0x1ca/0x1400 [kvm_intel]
      
      No performance changes were detected in quick ping-pong tests on
      my 4-socket system, which is expected since an FPU+xstate load is
      on the order of 0.1us, while ping-ponging between CPUs is on the
      order of 20us, and somewhat noisy.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [Fixed a bug where reset_vcpu called put_fpu without a preceding load_fpu,
       which happened from inside the KVM_CREATE_VCPU ioctl. - Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      f775b13e
  4. 05 December 2017 (2 commits)
    • KVM: Introduce KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctl · 69eaedee
      Committed by Brijesh Singh
      If the hardware supports memory encryption, the KVM_MEMORY_ENCRYPT_REG_REGION
      and KVM_MEMORY_ENCRYPT_UNREG_REGION ioctls can be used by userspace to
      register/unregister guest memory regions which may contain encrypted
      data (e.g. guest RAM, PCI BARs, SMRAM, etc.).
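      A minimal userspace sketch of registering a region, using struct
      kvm_enc_region from <linux/kvm.h> (vm_fd is assumed to be an open VM
      file descriptor; error handling elided):

          #include <stdint.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          /* Register a range of guest memory (by its userspace address) as
           * potentially containing encrypted data.
           */
          static int register_enc_region(int vm_fd, void *hva, uint64_t size)
          {
                  struct kvm_enc_region region = {
                          .addr = (uintptr_t)hva, /* userspace address of the region */
                          .size = size,           /* length in bytes */
                  };

                  return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_REG_REGION, &region);
          }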
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Improvements-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      69eaedee
    • KVM: Introduce KVM_MEMORY_ENCRYPT_OP ioctl · 5acc5c06
      Committed by Brijesh Singh
      If the hardware supports memory encryption, the KVM_MEMORY_ENCRYPT_OP
      ioctl can be used by qemu to issue platform-specific memory encryption
      commands.
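      The ioctl argument is an opaque pointer at this point; a hedged sketch
      of how qemu might invoke it, using the struct kvm_sev_cmd container that
      the later SEV patches define (illustrative only):

          #include <stdint.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          /* Issue one platform-specific encryption command. The command set
           * and the kvm_sev_cmd container come from later SEV patches; shown
           * here only to illustrate the calling convention.
           */
          static int memory_encrypt_op(int vm_fd, uint32_t cmd_id,
                                       void *data, int sev_fd)
          {
                  struct kvm_sev_cmd cmd = {
                          .id = cmd_id,            /* platform command, e.g. a SEV launch step */
                          .data = (uintptr_t)data, /* command-specific payload */
                          .sev_fd = sev_fd,        /* fd of the /dev/sev device */
                  };

                  return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
          }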
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      5acc5c06
  5. 28 November 2017 (2 commits)
    • KVM: Let KVM_SET_SIGNAL_MASK work as advertised · 20b7035c
      Committed by Jan H. Schönherr
      The KVM API documentation says of the signal mask set via
      KVM_SET_SIGNAL_MASK that "any unblocked signal received [...] will cause
      KVM_RUN to return with -EINTR" and that "the signal will only be
      delivered if not blocked by the original signal mask".

      This, however, is only true when the calling task has a signal handler
      registered for the signal. If not, signal evaluation is short-circuited
      for SIG_IGN and SIG_DFL, and the signal is either ignored without
      KVM_RUN returning, or the whole process is terminated.
      
      Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
      to that in do_sigtimedwait() to avoid short-circuiting of signals.
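      For context, a sketch of the userspace pattern this fixes: the VMM
      blocks a signal in the thread's own mask but hands it to
      KVM_SET_SIGNAL_MASK, expecting its arrival to make KVM_RUN return with
      -EINTR (error handling mostly elided):

          #include <signal.h>
          #include <stdlib.h>
          #include <string.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          /* Hand the vCPU thread's intended runtime sigmask to KVM so that an
           * unblocked-for-KVM_RUN signal interrupts the guest with -EINTR.
           * KVM expects the kernel's 8-byte sigset, not glibc's sizeof(sigset_t).
           */
          static int set_kvm_signal_mask(int vcpu_fd, const sigset_t *sigset)
          {
                  struct kvm_signal_mask *kmask;
                  int r;

                  kmask = malloc(sizeof(*kmask) + 8);
                  if (!kmask)
                          return -1;
                  kmask->len = 8;
                  memcpy(kmask->sigset, sigset, 8);
                  r = ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, kmask);
                  free(kmask);
                  return r;
          }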
      Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      20b7035c
    • KVM: X86: Fix softlockup when get the current kvmclock · e70b57a6
      Committed by Wanpeng Li
       watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [qemu-system-x86:10185]
       CPU: 6 PID: 10185 Comm: qemu-system-x86 Tainted: G           OE   4.14.0-rc4+ #4
       RIP: 0010:kvm_get_time_scale+0x4e/0xa0 [kvm]
       Call Trace:
        get_time_ref_counter+0x5a/0x80 [kvm]
        kvm_hv_process_stimers+0x120/0x5f0 [kvm]
        kvm_arch_vcpu_ioctl_run+0x4b4/0x1690 [kvm]
        kvm_vcpu_ioctl+0x33a/0x620 [kvm]
        do_vfs_ioctl+0xa1/0x5d0
        SyS_ioctl+0x79/0x90
        entry_SYSCALL_64_fastpath+0x1e/0xa9
      
      This can be reproduced by running kvm-unit-tests/hyperv_stimer.flat and
      a cpu-hotplug stress test simultaneously. __this_cpu_read(cpu_tsc_khz)
      returns 0 (set in kvmclock_cpu_down_prep()) when the pCPU is
      hot-unplugged, which results in kvm_get_time_scale() getting stuck in an
      infinite loop.

      This patch fixes it by treating a hot-unplugged pCPU as not using the
      master clock.
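      A hedged sketch of the fixed path in get_kvmclock_ns(), assuming the
      upstream shape (variables as in that function):

          /* If this pCPU was hot-unplugged, cpu_tsc_khz reads 0 and
           * kvm_get_time_scale() would loop forever; fall back to the boot
           * clock plus the kvmclock offset instead of the master clock path.
           */
          get_cpu();      /* __this_cpu_read() and rdtsc() must hit the same CPU */
          if (__this_cpu_read(cpu_tsc_khz)) {
                  kvm_get_time_scale(NSEC_PER_SEC,
                                     __this_cpu_read(cpu_tsc_khz) * 1000LL,
                                     &hv_clock.tsc_shift,
                                     &hv_clock.tsc_to_system_mul);
                  ret = __pvclock_read_cycles(&hv_clock, rdtsc());
          } else {
                  ret = ktime_get_boot_ns() + ka->kvmclock_offset;
          }
          put_cpu();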
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e70b57a6
  6. 17 November 2017 (4 commits)
  7. 21 October 2017 (1 commit)
  8. 19 October 2017 (1 commit)
  9. 12 October 2017 (3 commits)
    • KVM: nSVM: fix SMI injection in guest mode · 05cade71
      Committed by Ladi Prosek
      Entering SMM while running in guest mode wasn't working very well because several
      pieces of the vcpu state were left set up for nested operation.
      
      Some of the issues observed:
      
      * L1 was getting unexpected VM exits (using L1 interception controls but
        running in the SMM execution environment)
      * MMU was confused (walk_mmu was still set to nested_mmu)
      * INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)
      
      The Intel SDM actually prescribes that the logical processor "leave VMX
      operation" upon entering SMM (34.14.1, Default Treatment of SMI Delivery).
      AMD doesn't seem to document this, but they provide fields in the SMM
      state-save area to stash the current state of SVM. What we need to do is
      basically get out of guest mode for the duration of SMM. All of this is
      completely transparent to L1, i.e. L1 is not given control and no
      L1-observable state changes.
      
      To avoid code duplication, this commit takes advantage of the existing
      nested vmexit and run functionality, perhaps at the cost of efficiency.
      To get out of guest mode, nested_svm_vmexit is called, unchanged.
      Re-entering is performed using enter_svm_guest_mode.
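      A hedged sketch of the SVM side, assuming the pre_enter_smm hook
      introduced by the commits below (state-save offsets per AMD's SMM area;
      illustrative, and the real code also syncs RAX/RSP/RIP into the VMCB
      first):

          static int svm_pre_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
          {
                  struct vcpu_svm *svm = to_svm(vcpu);

                  if (is_guest_mode(vcpu)) {
                          /* Stash an "in SVM guest" flag and the nested VMCB
                           * address in the SMM state-save area so that RSM
                           * can restore them later.
                           */
                          put_smstate(u64, smstate, 0x7ed8, 1);
                          put_smstate(u64, smstate, 0x7ee0, svm->nested.vmcb);

                          /* Leave guest mode, reusing the ordinary nested #VMEXIT. */
                          return nested_svm_vmexit(svm);
                  }
                  return 0;
          }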
      
      This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with
      OVMF firmware (OVMF_CODE-need-smm.fd).
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      05cade71
    • KVM: x86: introduce ISA specific smi_allowed callback · 72d7b374
      Committed by Ladi Prosek
      As with NMIs, there may be ISA-specific reasons why an SMI cannot be
      injected into the guest. This commit adds a new smi_allowed callback, to
      be implemented in following commits.
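      A sketch of the new hook and, roughly, its generic call site (the exact
      guard conditions are illustrative):

          /* New kvm_x86_ops member: ISA-specific veto on injecting an SMI. */
          int (*smi_allowed)(struct kvm_vcpu *vcpu);

          /* The generic injection path then asks the vendor module, roughly:
           *
           *        if (vcpu->arch.smi_pending && !is_smm(vcpu) &&
           *            kvm_x86_ops->smi_allowed(vcpu))
           *                enter_smm(vcpu);
           */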
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      72d7b374
    • KVM: x86: introduce ISA specific SMM entry/exit callbacks · 0234bf88
      Committed by Ladi Prosek
      Entering and exiting SMM may require ISA-specific handling under certain
      circumstances. This commit adds two new callbacks with empty
      implementations; actual functionality will be added in following commits
      (a sketch of the new hooks follows the list below).
      
      * pre_enter_smm() is to be called when injecting an SMM, before any
        SMM-related vcpu state has been changed
      * pre_leave_smm() is to be called when emulating the RSM instruction,
        when the vcpu is in real mode and before any SMM-related vcpu state
        has been restored
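      A sketch of the new kvm_x86_ops members and, e.g., the VMX stubs that
      stay empty until later commits (signatures as introduced here):

          /* New kvm_x86_ops members: */
          int (*pre_enter_smm)(struct kvm_vcpu *vcpu, char *smstate);
          int (*pre_leave_smm)(struct kvm_vcpu *vcpu, u64 smbase);

          /* VMX stubs, empty for now: */
          static int vmx_pre_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
          {
                  return 0;       /* filled in by later commits */
          }

          static int vmx_pre_leave_smm(struct kvm_vcpu *vcpu, u64 smbase)
          {
                  return 0;       /* filled in by later commits */
          }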
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0234bf88