1. 12 2月, 2020 1 次提交
  2. 05 2月, 2020 1 次提交
    • Z
      KVM: fix overflow of zero page refcount with ksm running · 7df003c8
      Zhuang Yanying 提交于
      We are testing Virtual Machine with KSM on v5.4-rc2 kernel,
      and found the zero_page refcount overflow.
      The cause of refcount overflow is increased in try_async_pf
      (get_user_page) without being decreased in mmu_set_spte()
      while handling ept violation.
      In kvm_release_pfn_clean(), only unreserved page will call
      put_page. However, zero page is reserved.
      So, as well as creating and destroy vm, the refcount of
      zero page will continue to increase until it overflows.
      
      step1:
      echo 10000 > /sys/kernel/pages_to_scan/pages_to_scan
      echo 1 > /sys/kernel/pages_to_scan/run
      echo 1 > /sys/kernel/pages_to_scan/use_zero_pages
      
      step2:
      just create several normal qemu kvm vms.
      And destroy it after 10s.
      Repeat this action all the time.
      
      After a long period of time, all domains hang because
      of the refcount of zero page overflow.
      
      Qemu print error log as follow:
       …
       error: kvm run failed Bad address
       EAX=00006cdc EBX=00000008 ECX=80202001 EDX=078bfbfd
       ESI=ffffffff EDI=00000000 EBP=00000008 ESP=00006cc4
       EIP=000efd75 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
       ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
       CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
       SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
       DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
       FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
       GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
       LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
       TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
       GDT=     000f7070 00000037
       IDT=     000f70ae 00000000
       CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
       DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
       DR6=00000000ffff0ff0 DR7=0000000000000400
       EFER=0000000000000000
       Code=00 01 00 00 00 e9 e8 00 00 00 c7 05 4c 55 0f 00 01 00 00 00 <8b> 35 00 00 01 00 8b 3d 04 00 01 00 b8 d8 d3 00 00 c1 e0 08 0c ea a3 00 00 01 00 c7 05 04
       …
      
      Meanwhile, a kernel warning is departed.
      
       [40914.836375] WARNING: CPU: 3 PID: 82067 at ./include/linux/mm.h:987 try_get_page+0x1f/0x30
       [40914.836412] CPU: 3 PID: 82067 Comm: CPU 0/KVM Kdump: loaded Tainted: G           OE     5.2.0-rc2 #5
       [40914.836415] RIP: 0010:try_get_page+0x1f/0x30
       [40914.836417] Code: 40 00 c3 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8 01 75 11 8b 47 34 85 c0 7e 10 f0 ff 47 34 b8 01 00 00 00 c3 48 8d 78 ff eb e9 <0f> 0b 31 c0 c3 66 90 66 2e 0f 1f 84 00 0
       0 00 00 00 48 8b 47 08 a8
       [40914.836418] RSP: 0018:ffffb4144e523988 EFLAGS: 00010286
       [40914.836419] RAX: 0000000080000000 RBX: 0000000000000326 RCX: 0000000000000000
       [40914.836420] RDX: 0000000000000000 RSI: 00004ffdeba10000 RDI: ffffdf07093f6440
       [40914.836421] RBP: ffffdf07093f6440 R08: 800000424fd91225 R09: 0000000000000000
       [40914.836421] R10: ffff9eb41bfeebb8 R11: 0000000000000000 R12: ffffdf06bbd1e8a8
       [40914.836422] R13: 0000000000000080 R14: 800000424fd91225 R15: ffffdf07093f6440
       [40914.836423] FS:  00007fb60ffff700(0000) GS:ffff9eb4802c0000(0000) knlGS:0000000000000000
       [40914.836425] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [40914.836426] CR2: 0000000000000000 CR3: 0000002f220e6002 CR4: 00000000003626e0
       [40914.836427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [40914.836427] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [40914.836428] Call Trace:
       [40914.836433]  follow_page_pte+0x302/0x47b
       [40914.836437]  __get_user_pages+0xf1/0x7d0
       [40914.836441]  ? irq_work_queue+0x9/0x70
       [40914.836443]  get_user_pages_unlocked+0x13f/0x1e0
       [40914.836469]  __gfn_to_pfn_memslot+0x10e/0x400 [kvm]
       [40914.836486]  try_async_pf+0x87/0x240 [kvm]
       [40914.836503]  tdp_page_fault+0x139/0x270 [kvm]
       [40914.836523]  kvm_mmu_page_fault+0x76/0x5e0 [kvm]
       [40914.836588]  vcpu_enter_guest+0xb45/0x1570 [kvm]
       [40914.836632]  kvm_arch_vcpu_ioctl_run+0x35d/0x580 [kvm]
       [40914.836645]  kvm_vcpu_ioctl+0x26e/0x5d0 [kvm]
       [40914.836650]  do_vfs_ioctl+0xa9/0x620
       [40914.836653]  ksys_ioctl+0x60/0x90
       [40914.836654]  __x64_sys_ioctl+0x16/0x20
       [40914.836658]  do_syscall_64+0x5b/0x180
       [40914.836664]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
       [40914.836666] RIP: 0033:0x7fb61cb6bfc7
      Signed-off-by: NLinFeng <linfeng23@huawei.com>
      Signed-off-by: NZhuang Yanying <ann.zhuangyanying@huawei.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7df003c8
  3. 31 1月, 2020 2 次提交
  4. 28 1月, 2020 16 次提交
  5. 24 1月, 2020 2 次提交
  6. 21 1月, 2020 4 次提交
  7. 09 1月, 2020 1 次提交
  8. 23 11月, 2019 1 次提交
  9. 15 11月, 2019 2 次提交
  10. 14 11月, 2019 1 次提交
  11. 12 11月, 2019 1 次提交
    • S
      KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved · a78986aa
      Sean Christopherson 提交于
      Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and
      instead manually handle ZONE_DEVICE on a case-by-case basis.  For things
      like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal
      pages, e.g. put pages grabbed via gup().  But for flows such as setting
      A/D bits or shifting refcounts for transparent huge pages, KVM needs to
      to avoid processing ZONE_DEVICE pages as the flows in question lack the
      underlying machinery for proper handling of ZONE_DEVICE pages.
      
      This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup()
      when running a KVM guest backed with /dev/dax memory, as KVM straight up
      doesn't put any references to ZONE_DEVICE pages acquired by gup().
      
      Note, Dan Williams proposed an alternative solution of doing put_page()
      on ZONE_DEVICE pages immediately after gup() in order to simplify the
      auditing needed to ensure is_zone_device_page() is called if and only if
      the backing device is pinned (via gup()).  But that approach would break
      kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til
      unmap() when accessing guest memory, unlike KVM's secondary MMU, which
      coordinates with mmu_notifier invalidations to avoid creating stale
      page references, i.e. doesn't rely on pages being pinned.
      
      [*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.plReported-by: NAdam Borowski <kilobyte@angband.pl>
      Analyzed-by: NDavid Hildenbrand <david@redhat.com>
      Acked-by: NDan Williams <dan.j.williams@intel.com>
      Cc: stable@vger.kernel.org
      Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings")
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a78986aa
  12. 11 11月, 2019 2 次提交
    • P
      KVM: fix placement of refcount initialization · e2d3fcaf
      Paolo Bonzini 提交于
      Reported by syzkaller:
      
         =============================
         WARNING: suspicious RCU usage
         -----------------------------
         ./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage!
      
         other info that might help us debug this:
      
         rcu_scheduler_active = 2, debug_locks = 1
         no locks held by repro_11/12688.
      
         stack backtrace:
         Call Trace:
          dump_stack+0x7d/0xc5
          lockdep_rcu_suspicious+0x123/0x170
          kvm_dev_ioctl+0x9a9/0x1260 [kvm]
          do_vfs_ioctl+0x1a1/0xfb0
          ksys_ioctl+0x6d/0x80
          __x64_sys_ioctl+0x73/0xb0
          do_syscall_64+0x108/0xaa0
          entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Commit a97b0e77 (kvm: call kvm_arch_destroy_vm if vm creation fails)
      sets users_count to 1 before kvm_arch_init_vm(), however, if kvm_arch_init_vm()
      fails, we need to decrease this count.  By moving it earlier, we can push
      the decrease to out_err_no_arch_destroy_vm without introducing yet another
      error label.
      
      syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=15209b84e00000
      
      Reported-by: syzbot+75475908cd0910f141ee@syzkaller.appspotmail.com
      Fixes: a97b0e77 ("kvm: call kvm_arch_destroy_vm if vm creation fails")
      Cc: Jim Mattson <jmattson@google.com>
      Analyzed-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e2d3fcaf
    • P
      KVM: Fix NULL-ptr deref after kvm_create_vm fails · 8a44119a
      Paolo Bonzini 提交于
      Reported by syzkaller:
      
          kasan: CONFIG_KASAN_INLINE enabled
          kasan: GPF could be caused by NULL-ptr deref or user memory access
          general protection fault: 0000 [#1] PREEMPT SMP KASAN
          CPU: 0 PID: 14727 Comm: syz-executor.3 Not tainted 5.4.0-rc4+ #0
          RIP: 0010:kvm_coalesced_mmio_init+0x5d/0x110 arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:121
          Call Trace:
           kvm_dev_ioctl_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:3446 [inline]
           kvm_dev_ioctl+0x781/0x1490 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3494
           vfs_ioctl fs/ioctl.c:46 [inline]
           file_ioctl fs/ioctl.c:509 [inline]
           do_vfs_ioctl+0x196/0x1150 fs/ioctl.c:696
           ksys_ioctl+0x62/0x90 fs/ioctl.c:713
           __do_sys_ioctl fs/ioctl.c:720 [inline]
           __se_sys_ioctl fs/ioctl.c:718 [inline]
           __x64_sys_ioctl+0x6e/0xb0 fs/ioctl.c:718
           do_syscall_64+0xca/0x5d0 arch/x86/entry/common.c:290
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Commit 9121923c ("kvm: Allocate memslots and buses before calling kvm_arch_init_vm")
      moves memslots and buses allocations around, however, if kvm->srcu/irq_srcu fails
      initialization, NULL will be returned instead of error code, NULL will not be intercepted
      in kvm_dev_ioctl_create_vm() and be dereferenced by kvm_coalesced_mmio_init(), this patch
      fixes it.
      
      Moving the initialization is required anyway to avoid an incorrect synchronize_srcu that
      was also reported by syzkaller:
      
       wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
       __synchronize_srcu+0x197/0x250 kernel/rcu/srcutree.c:921
       synchronize_srcu_expedited kernel/rcu/srcutree.c:946 [inline]
       synchronize_srcu+0x239/0x3e8 kernel/rcu/srcutree.c:997
       kvm_page_track_unregister_notifier+0xe7/0x130 arch/x86/kvm/page_track.c:212
       kvm_mmu_uninit_vm+0x1e/0x30 arch/x86/kvm/mmu.c:5828
       kvm_arch_destroy_vm+0x4a2/0x5f0 arch/x86/kvm/x86.c:9579
       kvm_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:702 [inline]
      
      so do it.
      
      Reported-by: syzbot+89a8060879fa0bd2db4f@syzkaller.appspotmail.com
      Reported-by: syzbot+e27e7027eb2b80e44225@syzkaller.appspotmail.com
      Fixes: 9121923c ("kvm: Allocate memslots and buses before calling kvm_arch_init_vm")
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8a44119a
  13. 05 11月, 2019 1 次提交
  14. 04 11月, 2019 1 次提交
  15. 31 10月, 2019 1 次提交
  16. 25 10月, 2019 1 次提交
  17. 22 10月, 2019 2 次提交