1. 28 1月, 2020 8 次提交
  2. 24 1月, 2020 2 次提交
  3. 21 1月, 2020 4 次提交
  4. 09 1月, 2020 1 次提交
  5. 23 11月, 2019 1 次提交
  6. 15 11月, 2019 2 次提交
  7. 14 11月, 2019 1 次提交
  8. 12 11月, 2019 1 次提交
    • S
      KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved · a78986aa
      Sean Christopherson 提交于
      Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and
      instead manually handle ZONE_DEVICE on a case-by-case basis.  For things
      like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal
      pages, e.g. put pages grabbed via gup().  But for flows such as setting
      A/D bits or shifting refcounts for transparent huge pages, KVM needs to
      to avoid processing ZONE_DEVICE pages as the flows in question lack the
      underlying machinery for proper handling of ZONE_DEVICE pages.
      
      This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup()
      when running a KVM guest backed with /dev/dax memory, as KVM straight up
      doesn't put any references to ZONE_DEVICE pages acquired by gup().
      
      Note, Dan Williams proposed an alternative solution of doing put_page()
      on ZONE_DEVICE pages immediately after gup() in order to simplify the
      auditing needed to ensure is_zone_device_page() is called if and only if
      the backing device is pinned (via gup()).  But that approach would break
      kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til
      unmap() when accessing guest memory, unlike KVM's secondary MMU, which
      coordinates with mmu_notifier invalidations to avoid creating stale
      page references, i.e. doesn't rely on pages being pinned.
      
      [*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.plReported-by: NAdam Borowski <kilobyte@angband.pl>
      Analyzed-by: NDavid Hildenbrand <david@redhat.com>
      Acked-by: NDan Williams <dan.j.williams@intel.com>
      Cc: stable@vger.kernel.org
      Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings")
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a78986aa
  9. 11 11月, 2019 2 次提交
    • P
      KVM: fix placement of refcount initialization · e2d3fcaf
      Paolo Bonzini 提交于
      Reported by syzkaller:
      
         =============================
         WARNING: suspicious RCU usage
         -----------------------------
         ./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage!
      
         other info that might help us debug this:
      
         rcu_scheduler_active = 2, debug_locks = 1
         no locks held by repro_11/12688.
      
         stack backtrace:
         Call Trace:
          dump_stack+0x7d/0xc5
          lockdep_rcu_suspicious+0x123/0x170
          kvm_dev_ioctl+0x9a9/0x1260 [kvm]
          do_vfs_ioctl+0x1a1/0xfb0
          ksys_ioctl+0x6d/0x80
          __x64_sys_ioctl+0x73/0xb0
          do_syscall_64+0x108/0xaa0
          entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Commit a97b0e77 (kvm: call kvm_arch_destroy_vm if vm creation fails)
      sets users_count to 1 before kvm_arch_init_vm(), however, if kvm_arch_init_vm()
      fails, we need to decrease this count.  By moving it earlier, we can push
      the decrease to out_err_no_arch_destroy_vm without introducing yet another
      error label.
      
      syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=15209b84e00000
      
      Reported-by: syzbot+75475908cd0910f141ee@syzkaller.appspotmail.com
      Fixes: a97b0e77 ("kvm: call kvm_arch_destroy_vm if vm creation fails")
      Cc: Jim Mattson <jmattson@google.com>
      Analyzed-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e2d3fcaf
    • P
      KVM: Fix NULL-ptr deref after kvm_create_vm fails · 8a44119a
      Paolo Bonzini 提交于
      Reported by syzkaller:
      
          kasan: CONFIG_KASAN_INLINE enabled
          kasan: GPF could be caused by NULL-ptr deref or user memory access
          general protection fault: 0000 [#1] PREEMPT SMP KASAN
          CPU: 0 PID: 14727 Comm: syz-executor.3 Not tainted 5.4.0-rc4+ #0
          RIP: 0010:kvm_coalesced_mmio_init+0x5d/0x110 arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:121
          Call Trace:
           kvm_dev_ioctl_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:3446 [inline]
           kvm_dev_ioctl+0x781/0x1490 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3494
           vfs_ioctl fs/ioctl.c:46 [inline]
           file_ioctl fs/ioctl.c:509 [inline]
           do_vfs_ioctl+0x196/0x1150 fs/ioctl.c:696
           ksys_ioctl+0x62/0x90 fs/ioctl.c:713
           __do_sys_ioctl fs/ioctl.c:720 [inline]
           __se_sys_ioctl fs/ioctl.c:718 [inline]
           __x64_sys_ioctl+0x6e/0xb0 fs/ioctl.c:718
           do_syscall_64+0xca/0x5d0 arch/x86/entry/common.c:290
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Commit 9121923c ("kvm: Allocate memslots and buses before calling kvm_arch_init_vm")
      moves memslots and buses allocations around, however, if kvm->srcu/irq_srcu fails
      initialization, NULL will be returned instead of error code, NULL will not be intercepted
      in kvm_dev_ioctl_create_vm() and be dereferenced by kvm_coalesced_mmio_init(), this patch
      fixes it.
      
      Moving the initialization is required anyway to avoid an incorrect synchronize_srcu that
      was also reported by syzkaller:
      
       wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
       __synchronize_srcu+0x197/0x250 kernel/rcu/srcutree.c:921
       synchronize_srcu_expedited kernel/rcu/srcutree.c:946 [inline]
       synchronize_srcu+0x239/0x3e8 kernel/rcu/srcutree.c:997
       kvm_page_track_unregister_notifier+0xe7/0x130 arch/x86/kvm/page_track.c:212
       kvm_mmu_uninit_vm+0x1e/0x30 arch/x86/kvm/mmu.c:5828
       kvm_arch_destroy_vm+0x4a2/0x5f0 arch/x86/kvm/x86.c:9579
       kvm_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:702 [inline]
      
      so do it.
      
      Reported-by: syzbot+89a8060879fa0bd2db4f@syzkaller.appspotmail.com
      Reported-by: syzbot+e27e7027eb2b80e44225@syzkaller.appspotmail.com
      Fixes: 9121923c ("kvm: Allocate memslots and buses before calling kvm_arch_init_vm")
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8a44119a
  10. 05 11月, 2019 1 次提交
  11. 04 11月, 2019 1 次提交
  12. 31 10月, 2019 1 次提交
  13. 25 10月, 2019 1 次提交
  14. 22 10月, 2019 3 次提交
  15. 01 10月, 2019 1 次提交
  16. 19 8月, 2019 1 次提交
    • M
      KVM: Call kvm_arch_vcpu_blocking early into the blocking sequence · 07ab0f8d
      Marc Zyngier 提交于
      When a vpcu is about to block by calling kvm_vcpu_block, we call
      back into the arch code to allow any form of synchronization that
      may be required at this point (SVN stops the AVIC, ARM synchronises
      the VMCR and enables GICv4 doorbells). But this synchronization
      comes in quite late, as we've potentially waited for halt_poll_ns
      to expire.
      
      Instead, let's move kvm_arch_vcpu_blocking() to the beginning of
      kvm_vcpu_block(), which on ARM has several benefits:
      
      - VMCR gets synchronised early, meaning that any interrupt delivered
        during the polling window will be evaluated with the correct guest
        PMR
      - GICv4 doorbells are enabled, which means that any guest interrupt
        directly injected during that window will be immediately recognised
      
      Tang Nianyao ran some tests on a GICv4 machine to evaluate such
      change, and reported up to a 10% improvement for netperf:
      
      <quote>
      	netperf result:
      	D06 as server, intel 8180 server as client
      	with change:
      	package 512 bytes - 5500 Mbits/s
      	package 64 bytes - 760 Mbits/s
      	without change:
      	package 512 bytes - 5000 Mbits/s
      	package 64 bytes - 710 Mbits/s
      </quote>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      07ab0f8d
  17. 09 8月, 2019 1 次提交
  18. 05 8月, 2019 4 次提交
    • G
      KVM: no need to check return value of debugfs_create functions · 3e7093d0
      Greg KH 提交于
      When calling debugfs functions, there is no need to ever check the
      return value.  The function can work or not, but the code logic should
      never do something different based on this.
      
      Also, when doing this, change kvm_arch_create_vcpu_debugfs() to return
      void instead of an integer, as we should not care at all about if this
      function actually does anything or not.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <x86@kernel.org>
      Cc: <kvm@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3e7093d0
    • P
      KVM: remove kvm_arch_has_vcpu_debugfs() · 741cbbae
      Paolo Bonzini 提交于
      There is no need for this function as all arches have to implement
      kvm_arch_create_vcpu_debugfs() no matter what.  A #define symbol
      let us actually simplify the code.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      741cbbae
    • W
      KVM: Fix leak vCPU's VMCS value into other pCPU · 17e433b5
      Wanpeng Li 提交于
      After commit d73eb57b (KVM: Boost vCPUs that are delivering interrupts), a
      five years old bug is exposed. Running ebizzy benchmark in three 80 vCPUs VMs
      on one 80 pCPUs Skylake server, a lot of rcu_sched stall warning splatting
      in the VMs after stress testing:
      
       INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
       Call Trace:
         flush_tlb_mm_range+0x68/0x140
         tlb_flush_mmu.part.75+0x37/0xe0
         tlb_finish_mmu+0x55/0x60
         zap_page_range+0x142/0x190
         SyS_madvise+0x3cd/0x9c0
         system_call_fastpath+0x1c/0x21
      
      swait_active() sustains to be true before finish_swait() is called in
      kvm_vcpu_block(), voluntarily preempted vCPUs are taken into account
      by kvm_vcpu_on_spin() loop greatly increases the probability condition
      kvm_arch_vcpu_runnable(vcpu) is checked and can be true, when APICv
      is enabled the yield-candidate vCPU's VMCS RVI field leaks(by
      vmx_sync_pir_to_irr()) into spinning-on-a-taken-lock vCPU's current
      VMCS.
      
      This patch fixes it by checking conservatively a subset of events.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Marc Zyngier <Marc.Zyngier@arm.com>
      Cc: stable@vger.kernel.org
      Fixes: 98f4a146 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop)
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      17e433b5
    • W
      KVM: Check preempted_in_kernel for involuntary preemption · 046ddeed
      Wanpeng Li 提交于
      preempted_in_kernel is updated in preempt_notifier when involuntary preemption
      ocurrs, it can be stale when the voluntarily preempted vCPUs are taken into
      account by kvm_vcpu_on_spin() loop. This patch lets it just check preempted_in_kernel
      for involuntary preemption.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      046ddeed
  19. 20 7月, 2019 1 次提交
    • W
      KVM: Boost vCPUs that are delivering interrupts · d73eb57b
      Wanpeng Li 提交于
      Inspired by commit 9cac38dd (KVM/s390: Set preempted flag during
      vcpu wakeup and interrupt delivery), we want to also boost not just
      lock holders but also vCPUs that are delivering interrupts. Most
      smp_call_function_many calls are synchronous, so the IPI target vCPUs
      are also good yield candidates.  This patch introduces vcpu->ready to
      boost vCPUs during wakeup and interrupt delivery time; unlike s390 we do
      not reuse vcpu->preempted so that voluntarily preempted vCPUs are taken
      into account by kvm_vcpu_on_spin, but vmx_vcpu_pi_put is not affected
      (VT-d PI handles voluntary preemption separately, in pi_pre_block).
      
      Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM:
      ebizzy -M
      
                  vanilla     boosting    improved
      1VM          21443       23520         9%
      2VM           2800        8000       180%
      3VM           1800        3100        72%
      
      Testing on my Haswell desktop 8 HT, with 8 vCPUs VM 8GB RAM, two VMs,
      one running ebizzy -M, the other running 'stress --cpu 2':
      
      w/ boosting + w/o pv sched yield(vanilla)
      
                  vanilla     boosting   improved
                    1570         4000      155%
      
      w/ boosting + w/ pv sched yield(vanilla)
      
                  vanilla     boosting   improved
                    1844         5157      179%
      
      w/o boosting, perf top in VM:
      
       72.33%  [kernel]       [k] smp_call_function_many
        4.22%  [kernel]       [k] call_function_i
        3.71%  [kernel]       [k] async_page_fault
      
      w/ boosting, perf top in VM:
      
       38.43%  [kernel]       [k] smp_call_function_many
        6.31%  [kernel]       [k] async_page_fault
        6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
        4.88%  [kernel]       [k] call_function_interrupt
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d73eb57b
  20. 10 7月, 2019 1 次提交
  21. 19 6月, 2019 1 次提交
  22. 05 6月, 2019 1 次提交