1. 11 Nov 2019, 1 commit
    • KVM: Fix NULL-ptr deref after kvm_create_vm fails · 8a44119a
      Authored by Paolo Bonzini
      Reported by syzkaller:
      
          kasan: CONFIG_KASAN_INLINE enabled
          kasan: GPF could be caused by NULL-ptr deref or user memory access
          general protection fault: 0000 [#1] PREEMPT SMP KASAN
          CPU: 0 PID: 14727 Comm: syz-executor.3 Not tainted 5.4.0-rc4+ #0
          RIP: 0010:kvm_coalesced_mmio_init+0x5d/0x110 arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:121
          Call Trace:
           kvm_dev_ioctl_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:3446 [inline]
           kvm_dev_ioctl+0x781/0x1490 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3494
           vfs_ioctl fs/ioctl.c:46 [inline]
           file_ioctl fs/ioctl.c:509 [inline]
           do_vfs_ioctl+0x196/0x1150 fs/ioctl.c:696
           ksys_ioctl+0x62/0x90 fs/ioctl.c:713
           __do_sys_ioctl fs/ioctl.c:720 [inline]
           __se_sys_ioctl fs/ioctl.c:718 [inline]
           __x64_sys_ioctl+0x6e/0xb0 fs/ioctl.c:718
           do_syscall_64+0xca/0x5d0 arch/x86/entry/common.c:290
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Commit 9121923c ("kvm: Allocate memslots and buses before calling kvm_arch_init_vm")
      moved the memslot and bus allocations around; however, if initialization of kvm->srcu
      or kvm->irq_srcu fails, kvm_create_vm() returns NULL instead of an error code. The NULL
      is not intercepted in kvm_dev_ioctl_create_vm() and is then dereferenced by
      kvm_coalesced_mmio_init(). This patch fixes that.
      
      Moving the initialization earlier is required anyway, to avoid an incorrect
      synchronize_srcu that was also reported by syzkaller:
      
       wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
       __synchronize_srcu+0x197/0x250 kernel/rcu/srcutree.c:921
       synchronize_srcu_expedited kernel/rcu/srcutree.c:946 [inline]
       synchronize_srcu+0x239/0x3e8 kernel/rcu/srcutree.c:997
       kvm_page_track_unregister_notifier+0xe7/0x130 arch/x86/kvm/page_track.c:212
       kvm_mmu_uninit_vm+0x1e/0x30 arch/x86/kvm/mmu.c:5828
       kvm_arch_destroy_vm+0x4a2/0x5f0 arch/x86/kvm/x86.c:9579
       kvm_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:702 [inline]
      
      so do it.
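      A minimal sketch of the resulting error path (simplified; the label names and
      the elided allocation code are illustrative, not the verbatim patch):

          static struct kvm *kvm_create_vm(unsigned long type)
          {
              struct kvm *kvm = kvm_arch_alloc_vm();
              int r = -ENOMEM;

              if (!kvm)
                  return ERR_PTR(-ENOMEM);

              /* ... memslot and bus allocation (unchanged by this fix) ... */

              if (init_srcu_struct(&kvm->srcu))
                  goto out_err_no_srcu;          /* was: return NULL */
              if (init_srcu_struct(&kvm->irq_srcu))
                  goto out_err_no_irq_srcu;      /* was: return NULL */

              r = kvm_arch_init_vm(kvm, type);
              if (r)
                  goto out_err;
              /* ... */
              return kvm;

          out_err:
              cleanup_srcu_struct(&kvm->irq_srcu);
          out_err_no_irq_srcu:
              cleanup_srcu_struct(&kvm->srcu);
          out_err_no_srcu:
              kvm_arch_free_vm(kvm);
              return ERR_PTR(r);    /* failure is always an ERR_PTR, never NULL */
          }

      With failures funneled through ERR_PTR(), the IS_ERR() check in
      kvm_dev_ioctl_create_vm() fires and the NULL can no longer reach
      kvm_coalesced_mmio_init().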
      
      Reported-by: syzbot+89a8060879fa0bd2db4f@syzkaller.appspotmail.com
      Reported-by: syzbot+e27e7027eb2b80e44225@syzkaller.appspotmail.com
      Fixes: 9121923c ("kvm: Allocate memslots and buses before calling kvm_arch_init_vm")
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 31 Oct 2019, 1 commit
  3. 25 Oct 2019, 1 commit
  4. 22 Oct 2019, 1 commit
  5. 01 Oct 2019, 1 commit
  6. 19 Aug 2019, 1 commit
    • KVM: Call kvm_arch_vcpu_blocking early into the blocking sequence · 07ab0f8d
      Authored by Marc Zyngier
      When a vcpu is about to block by calling kvm_vcpu_block, we call
      back into the arch code to allow any form of synchronization that
      may be required at this point (SVM stops the AVIC, ARM synchronises
      the VMCR and enables GICv4 doorbells). But this synchronization
      comes in quite late, as we've potentially waited for halt_poll_ns
      to expire.
      
      Instead, let's move kvm_arch_vcpu_blocking() to the beginning of
      kvm_vcpu_block(), which on ARM has several benefits:
      
      - VMCR gets synchronised early, meaning that any interrupt delivered
        during the polling window will be evaluated with the correct guest
        PMR
      - GICv4 doorbells are enabled, which means that any guest interrupt
        directly injected during that window will be immediately recognised
      
      Tang Nianyao ran some tests on a GICv4 machine to evaluate this
      change, and reported up to a 10% improvement for netperf:
      
      <quote>
      	netperf result:
      	D06 as server, Intel 8180 server as client
      	with change:
      	packet 512 bytes - 5500 Mbits/s
      	packet 64 bytes - 760 Mbits/s
      	without change:
      	packet 512 bytes - 5000 Mbits/s
      	packet 64 bytes - 710 Mbits/s
      </quote>
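      The shape of the change, as a rough sketch (halt-polling details elided):

          void kvm_vcpu_block(struct kvm_vcpu *vcpu)
          {
              /* moved up: runs before the halt_poll_ns polling window */
              kvm_arch_vcpu_blocking(vcpu);

              /* ... polling window: VMCR already synchronised,
               * GICv4 doorbells already enabled ... */

              /* ... sleep until the vCPU becomes runnable ... */

              kvm_arch_vcpu_unblocking(vcpu);
          }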
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
  7. 09 Aug 2019, 1 commit
  8. 05 Aug 2019, 4 commits
    • KVM: no need to check return value of debugfs_create functions · 3e7093d0
      Authored by Greg KH
      When calling debugfs functions, there is no need to ever check the
      return value.  The function can work or not, but the code logic should
      never do something different based on this.
      
      Also, when doing this, change kvm_arch_create_vcpu_debugfs() to return
      void instead of an integer, as we should not care at all about whether
      this function actually does anything or not.
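
      The resulting pattern, sketched for the x86 hook (the file name and fops
      here are assumptions for illustration, not part of this commit):

          void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu)    /* was: int */
          {
              debugfs_create_file("tsc-offset", 0444, vcpu->debugfs_dentry,
                                  vcpu, &vcpu_tsc_offset_fops);
              /* no return-value check: a debugfs failure changes nothing
               * for the caller */
          }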
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <x86@kernel.org>
      Cc: <kvm@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: remove kvm_arch_has_vcpu_debugfs() · 741cbbae
      Authored by Paolo Bonzini
      There is no need for this function, as all arches have to implement
      kvm_arch_create_vcpu_debugfs() no matter what. A #define symbol
      lets us actually simplify the code.
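
      A sketch of the shape this takes (symbol name as used on x86; a sketch,
      not the full diff):

          /* in the arch header, e.g. arch/x86/include/asm/kvm_host.h: */
          #define __KVM_HAVE_ARCH_VCPU_DEBUGFS

          /* in generic code, compile the call only where the arch provides it: */
          #ifdef __KVM_HAVE_ARCH_VCPU_DEBUGFS
              kvm_arch_create_vcpu_debugfs(vcpu);
          #endif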
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Fix leak vCPU's VMCS value into other pCPU · 17e433b5
      Authored by Wanpeng Li
      After commit d73eb57b (KVM: Boost vCPUs that are delivering interrupts), a
      five-year-old bug was exposed. Running the ebizzy benchmark in three 80-vCPU
      VMs on one 80-pCPU Skylake server produced a flood of rcu_sched stall
      warnings in the VMs under stress testing:
      
       INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
       Call Trace:
         flush_tlb_mm_range+0x68/0x140
         tlb_flush_mmu.part.75+0x37/0xe0
         tlb_finish_mmu+0x55/0x60
         zap_page_range+0x142/0x190
         SyS_madvise+0x3cd/0x9c0
         system_call_fastpath+0x1c/0x21
      
      swait_active() remains true until finish_swait() is called in
      kvm_vcpu_block(). Because voluntarily preempted vCPUs are taken into account
      by the kvm_vcpu_on_spin() loop, the probability that
      kvm_arch_vcpu_runnable(vcpu) is checked and found true is greatly increased.
      When APICv is enabled, the yield-candidate vCPU's VMCS RVI field then leaks
      (via vmx_sync_pir_to_irr()) into the current VMCS of the vCPU that is
      spinning on a taken lock.
      
      This patch fixes it by conservatively checking only a subset of events, as
      sketched below.
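
      The conservative x86-side check, roughly (a simplified subset; the actual
      patch may test further events):

          bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
          {
              /* none of these reads touch another pCPU's loaded VMCS */
              if (READ_ONCE(vcpu->arch.pv.pv_unhalted))
                  return true;

              if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
                  kvm_test_request(KVM_REQ_SMI, vcpu) ||
                  kvm_test_request(KVM_REQ_EVENT, vcpu))
                  return true;

              return false;
          }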
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Marc Zyngier <Marc.Zyngier@arm.com>
      Cc: stable@vger.kernel.org
      Fixes: 98f4a146 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop)
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Check preempted_in_kernel for involuntary preemption · 046ddeed
      Authored by Wanpeng Li
      preempted_in_kernel is updated in the preempt notifier when involuntary
      preemption occurs, so it can be stale when voluntarily preempted vCPUs are
      taken into account by the kvm_vcpu_on_spin() loop. This patch makes the loop
      consult preempted_in_kernel only for involuntary preemption.
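
      The adjusted filter inside kvm_vcpu_on_spin()'s candidate loop, sketched
      from the description above (condition shape assumed, not the verbatim diff):

          /* was: if (yield_to_kernel_mode && !kvm_arch_vcpu_in_kernel(vcpu)) */
          if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
              !kvm_arch_vcpu_in_kernel(vcpu))
              continue;    /* trust preempted_in_kernel only for
                            * involuntary preemption */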
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 20 Jul 2019, 1 commit
    • KVM: Boost vCPUs that are delivering interrupts · d73eb57b
      Authored by Wanpeng Li
      Inspired by commit 9cac38dd (KVM/s390: Set preempted flag during
      vcpu wakeup and interrupt delivery), we want to boost not just
      lock holders but also vCPUs that are delivering interrupts. Most
      smp_call_function_many calls are synchronous, so the IPI target vCPUs
      are also good yield candidates. This patch introduces vcpu->ready to
      boost vCPUs during wakeup and interrupt delivery time; unlike s390 we do
      not reuse vcpu->preempted, so that voluntarily preempted vCPUs are taken
      into account by kvm_vcpu_on_spin while vmx_vcpu_pi_put is not affected
      (VT-d PI handles voluntary preemption separately, in pi_pre_block).
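
      A sketch of the wakeup side (simplified; kvm_vcpu_on_spin() then consults
      READ_ONCE(vcpu->ready) when filtering yield candidates):

          bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
          {
              struct swait_queue_head *wqp = kvm_arch_vcpu_wq(vcpu);

              if (swait_active(wqp)) {
                  swake_up_one(wqp);
                  /* mark as a good yield candidate, without touching
                   * vcpu->preempted */
                  WRITE_ONCE(vcpu->ready, true);
                  return true;
              }
              return false;
          }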
      
      Testing on an 80-HT, 2-socket Xeon Skylake server, with 80-vCPU, 80GB-RAM VMs:
      ebizzy -M
      
                  vanilla     boosting    improved
      1VM          21443       23520         9%
      2VM           2800        8000       180%
      3VM           1800        3100        72%
      
      Testing on my 8-HT Haswell desktop, with two 8-vCPU, 8GB-RAM VMs,
      one running ebizzy -M, the other running 'stress --cpu 2':
      
      w/ boosting + w/o pv sched yield (vanilla)
      
                  vanilla     boosting   improved
                    1570         4000      155%
      
      w/ boosting + w/ pv sched yield (vanilla)
      
                  vanilla     boosting   improved
                    1844         5157      179%
      
      w/o boosting, perf top in VM:
      
       72.33%  [kernel]       [k] smp_call_function_many
        4.22%  [kernel]       [k] call_function_interrupt
        3.71%  [kernel]       [k] async_page_fault
      
      w/ boosting, perf top in VM:
      
       38.43%  [kernel]       [k] smp_call_function_many
        6.31%  [kernel]       [k] async_page_fault
        6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
        4.88%  [kernel]       [k] call_function_interrupt
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  10. 10 Jul 2019, 1 commit
  11. 19 Jun 2019, 1 commit
  12. 05 Jun 2019, 2 commits
  13. 28 May 2019, 2 commits
  14. 25 May 2019, 2 commits
    • kvm: fix compilation on s390 · d30b214d
      Authored by Paolo Bonzini
      s390 does not have memremap, even though in this particular case it
      would be useful.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Fix spinlock taken warning during host resume · 2eb06c30
      Authored by Wanpeng Li
       WARNING: CPU: 0 PID: 13554 at kvm/arch/x86/kvm//../../../virt/kvm/kvm_main.c:4183 kvm_resume+0x3c/0x40 [kvm]
        CPU: 0 PID: 13554 Comm: step_after_susp Tainted: G           OE     5.1.0-rc4+ #1
        RIP: 0010:kvm_resume+0x3c/0x40 [kvm]
        Call Trace:
         syscore_resume+0x63/0x2d0
         suspend_devices_and_enter+0x9d1/0xa40
         pm_suspend+0x33a/0x3b0
         state_store+0x82/0xf0
         kobj_attr_store+0x12/0x20
         sysfs_kf_write+0x4b/0x60
         kernfs_fop_write+0x120/0x1a0
         __vfs_write+0x1b/0x40
         vfs_write+0xcd/0x1d0
         ksys_write+0x5f/0xe0
         __x64_sys_write+0x1a/0x20
         do_syscall_64+0x6f/0x6c0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Commit ca84d1a2 (KVM: x86: Add clock sync request to hardware enable) noted
      that "we always hold kvm_lock when hardware_enable is called.  The one place that
      doesn't need to worry about it is resume, as resuming a frozen CPU, the spinlock
      won't be taken." However, commit 6706dae9 (virt/kvm: Replace spin_is_locked() with
      lockdep) introduced a bug: its assertion fires when the lock is not held, and on
      the resume path the lock is legitimately not taken, which is contrary to the
      original goal.
      
      This patch fixes it by warning (WARN_ON) when the lock *is* held.
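
      A sketch of the fixed check (lock name per the surrounding kvm_main.c of
      that era; see the bracketed note below about the CONFIG_LOCKDEP wrapper):

          static void kvm_resume(void)
          {
              if (kvm_usage_count) {
          #ifdef CONFIG_LOCKDEP
                  /* resuming a frozen CPU: the lock must NOT be held */
                  WARN_ON(lockdep_is_held(&kvm_count_lock));
          #endif
                  hardware_enable_nolock(NULL);
              }
          }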
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Fixes: 6706dae9 ("virt/kvm: Replace spin_is_locked() with lockdep")
      [Wrap with #ifdef CONFIG_LOCKDEP - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  15. 17 May 2019, 1 commit
    • kvm: fix compilation on aarch64 · c011d23b
      Authored by Paolo Bonzini
      Commit e45adf66 ("KVM: Introduce a new guest mapping API", 2019-01-31)
      introduced a build failure on aarch64 defconfig:
      
      $ make -j$(nproc) ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- O=out defconfig \
                      Image.gz
      ...
      ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:
          In function '__kvm_map_gfn':
      ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1763:9: error:
          implicit declaration of function 'memremap'; did you mean 'memset_p'?
      ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1763:46: error:
          'MEMREMAP_WB' undeclared (first use in this function)
      ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:
          In function 'kvm_vcpu_unmap':
      ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1795:3: error:
          implicit declaration of function 'memunmap'; did you mean 'vm_munmap'?
      
      because these functions are declared in <linux/io.h> rather than <asm/io.h>,
      and the former was being pulled in already on x86 but not on aarch64.
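
      The fix then presumably amounts to switching the include in kvm_main.c:

          /* was: #include <asm/io.h> */
          #include <linux/io.h>    /* declares memremap(), memunmap(), MEMREMAP_WB */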
      Reported-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  16. 15 May 2019, 1 commit
    • mm/mmu_notifier: convert user range->blockable to helper function · dfcd6660
      Authored by Jérôme Glisse
      Use the mmu_notifier_range_blockable() helper function instead of directly
      dereferencing the range->blockable field.  This is done to make it easier
      to change the mmu_notifier range field.
      
      This patch is the outcome of the following coccinelle patch:
      
      %<-------------------------------------------------------------------
      @@
      identifier I1, FN;
      @@
      FN(..., struct mmu_notifier_range *I1, ...) {
      <...
      -I1->blockable
      +mmu_notifier_range_blockable(I1)
      ...>
      }
      ------------------------------------------------------------------->%
      
      spatch --in-place --sp-file blockable.spatch --dir .
      
      Link: http://lkml.kernel.org/r/20190326164747.24405-3-jglisse@redhat.com
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 14 May 2019, 1 commit
  18. 08 May 2019, 3 commits
  19. 01 May 2019, 4 commits
    • KVM: Introduce a new guest mapping API · e45adf66
      Authored by KarimAllah Ahmed
      In KVM, especially for nested guests, there is a dominant pattern of:
      
      	=> map guest memory -> do_something -> unmap guest memory
      
      In addition to all the unnecessary noise this boilerplate adds to the code,
      most of the time the mapping function does not properly handle memory
      that is not backed by "struct page". This new guest mapping API encapsulates
      most of that boilerplate and also handles guest memory that is not
      backed by "struct page".
      
      The current implementation of this API uses memremap for memory that is
      not backed by a "struct page", which would lead to a huge slow-down if it
      were used for high-frequency mapping operations. The API has no effect on
      current setups where guest memory is backed by a "struct page".
      Further patches are going to introduce a pfn-cache, which would
      significantly improve the performance of the memremap case.
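
      A usage sketch of the new API (signatures as introduced by this series;
      error handling simplified, and gpa/do_something are placeholders):

          struct kvm_host_map map;

          /* works whether or not the gfn is backed by a struct page */
          if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
              return -EFAULT;

          /* do_something(map.hva); */

          kvm_vcpu_unmap(vcpu, &map, true);    /* true: mark the page dirty */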
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm_main: fix some comments · b8b00220
      Authored by Jiang Biao
      is_dirty has been renamed to flush, but the comment for it is
      outdated. Also, the description of the @flush parameter for
      kvm_clear_dirty_log_protect() is missing; this patch adds it
      as well.
      Signed-off-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size · 65c4189d
      Authored by Paolo Bonzini
      If a memory slot's size is not a multiple of 64 pages (256K), then
      the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages
      either requires the requested page range to go beyond memslot->npages,
      or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect
      requires log->num_pages to be both in range and aligned.
      
      To allow this case, allow log->num_pages not to be a multiple of 64 if
      it ends exactly on the last page of the slot.
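
      The relaxed validation, sketched from the description above (not the
      verbatim diff):

          /* first_page must remain 64-page aligned */
          if (log->first_page & 63)
              return -EINVAL;

          /* num_pages may be unaligned only if the range ends exactly at
           * the end of the slot */
          if ((log->num_pages & 63) &&
              (log->first_page + log->num_pages != memslot->npages))
              return -EINVAL;

          if (log->first_page + log->num_pages > memslot->npages)
              return -EINVAL;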
      Reported-by: Peter Xu <peterx@redhat.com>
      Fixes: 98938aa8 ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size · 76d58e0f
      Authored by Paolo Bonzini
      If a memory slot's size is not a multiple of 64 pages (256K), then
      the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages
      either requires the requested page range to go beyond memslot->npages,
      or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect
      requires log->num_pages to be both in range and aligned.
      
      To allow this case, allow log->num_pages not to be a multiple of 64 if
      it ends exactly on the last page of the slot.
      Reported-by: Peter Xu <peterx@redhat.com>
      Fixes: 98938aa8 ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  20. 30 Apr 2019, 2 commits
  21. 26 Apr 2019, 1 commit
  22. 16 Apr 2019, 2 commits
  23. 29 Mar 2019, 1 commit
  24. 01 Mar 2019, 1 commit
  25. 23 Feb 2019, 1 commit
  26. 21 Feb 2019, 2 commits