1. 18 Jun 2021: 36 commits
  2. 11 Jun 2021: 1 commit
      KVM: X86: Fix x86_emulator slab cache leak · dfdc0a71
      Committed by Wanpeng Li
      Commit c9b8b07c ("KVM: x86: Dynamically allocate per-vCPU emulation
      context") allocates the per-vCPU emulation context dynamically; however,
      the x86_emulator slab cache still exists after the VM is destroyed and
      the kvm module is unloaded, as shown below:
      
      grep x86_emulator /proc/slabinfo
      x86_emulator          36     36   2672   12    8 : tunables    0    0    0 : slabdata      3      3      0
      
      This patch fixes this slab cache leak by destroying the x86_emulator slab cache
      when the kvm module is unloaded.
      
      Fixes: c9b8b07c ("KVM: x86: Dynamically allocate per-vCPU emulation context")
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1623387573-5969-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
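The leak comes from an unbalanced lifecycle: the cache is created when the module loads but is never destroyed when it unloads. The sketch below is a hypothetical userspace model of that lifecycle, not KVM code. `struct model_cache`, `model_module_init()`, `model_module_exit()` and `model_cache_leaked()` are illustrative names. In the kernel, the matching calls would be kmem_cache_create() on the init path and kmem_cache_destroy() on the unload path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a slab cache: only its existence is tracked,
 * so a leak shows up as "still alive after module exit".
 * These names are illustrative, not the kernel's slab API.
 */
struct model_cache {
	const char *name;
	bool alive;
};

static struct model_cache emulator_cache = { "x86_emulator", false };

/* Module init: create the cache used for per-vCPU emulation contexts. */
int model_module_init(void)
{
	emulator_cache.alive = true;
	return 0;
}

/*
 * Module exit. Without the fix nothing destroys the cache here, so it
 * lingers (as in /proc/slabinfo); with the fix, exit tears it down.
 */
void model_module_exit(bool with_fix)
{
	if (with_fix)
		emulator_cache.alive = false;
}

/* True if the cache outlived the module. */
bool model_cache_leaked(void)
{
	return emulator_cache.alive;
}
```

Calling `model_module_exit(false)` after init leaves the cache alive, which corresponds to the stale x86_emulator line in /proc/slabinfo. Calling `model_module_exit(true)` removes it.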
  3. 10 Jun 2021: 1 commit
      KVM: x86: Immediately reset the MMU context when the SMM flag is cleared · 78fcb2c9
      Committed by Sean Christopherson
      Immediately reset the MMU context when the vCPU's SMM flag is cleared so
      that the SMM flag in the MMU role is always synchronized with the vCPU's
      flag.  If RSM fails (which isn't correctly emulated), KVM will bail
      without calling post_leave_smm() and leave the MMU in a bad state.
      
      The bad MMU role can lead to a NULL pointer dereference when grabbing a
      shadow page's rmap for a page fault as the initial lookups for the gfn
      will happen with the vCPU's SMM flag (=0), whereas the rmap lookup will
      use the shadow page's SMM flag, which comes from the MMU (=1).  SMM has
      an entirely different set of memslots, and so the initial lookup can find
      a memslot (SMM=0) and then explode on the rmap memslot lookup (SMM=1).
      
        general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
        KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
        CPU: 1 PID: 8410 Comm: syz-executor382 Not tainted 5.13.0-rc5-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:__gfn_to_rmap arch/x86/kvm/mmu/mmu.c:935 [inline]
        RIP: 0010:gfn_to_rmap+0x2b0/0x4d0 arch/x86/kvm/mmu/mmu.c:947
        Code: <42> 80 3c 20 00 74 08 4c 89 ff e8 f1 79 a9 00 4c 89 fb 4d 8b 37 44
        RSP: 0018:ffffc90000ffef98 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff888015b9f414 RCX: ffff888019669c40
        RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
        RBP: 0000000000000001 R08: ffffffff811d9cdb R09: ffffed10065a6002
        R10: ffffed10065a6002 R11: 0000000000000000 R12: dffffc0000000000
        R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000
        FS:  000000000124b300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 0000000028e31000 CR4: 00000000001526e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         rmap_add arch/x86/kvm/mmu/mmu.c:965 [inline]
         mmu_set_spte+0x862/0xe60 arch/x86/kvm/mmu/mmu.c:2604
         __direct_map arch/x86/kvm/mmu/mmu.c:2862 [inline]
         direct_page_fault+0x1f74/0x2b70 arch/x86/kvm/mmu/mmu.c:3769
         kvm_mmu_do_page_fault arch/x86/kvm/mmu.h:124 [inline]
         kvm_mmu_page_fault+0x199/0x1440 arch/x86/kvm/mmu/mmu.c:5065
         vmx_handle_exit+0x26/0x160 arch/x86/kvm/vmx/vmx.c:6122
         vcpu_enter_guest+0x3bdd/0x9630 arch/x86/kvm/x86.c:9428
         vcpu_run+0x416/0xc20 arch/x86/kvm/x86.c:9494
         kvm_arch_vcpu_ioctl_run+0x4e8/0xa40 arch/x86/kvm/x86.c:9722
         kvm_vcpu_ioctl+0x70f/0xbb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3460
         vfs_ioctl fs/ioctl.c:51 [inline]
         __do_sys_ioctl fs/ioctl.c:1069 [inline]
         __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:1055
         do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x440ce9
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+fb0b6a7e8713aeb0319c@syzkaller.appspotmail.com
      Fixes: 9ec19493 ("KVM: x86: clear SMM flags before loading state while leaving SMM")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210609185619.992058-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
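The invariant being restored is that the SMM bit in the MMU role always matches the vCPU's SMM flag. The sketch below is a hypothetical model of that invariant; the struct and function names are invented for illustration and do not exist in KVM. The role only changes when the MMU context is reset. Deferring that reset until after a successful RSM therefore leaves a window in which a failed RSM strands a stale role.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: the vCPU's SMM flag plus the copy of it baked
 * into the MMU role. The role is refreshed only on a context reset.
 */
struct model_vcpu {
	bool smm;          /* vCPU's current SMM flag */
	bool mmu_role_smm; /* SMM bit recorded in the MMU role */
};

void model_reset_mmu_context(struct model_vcpu *v)
{
	v->mmu_role_smm = v->smm;
}

void model_enter_smm(struct model_vcpu *v)
{
	v->smm = true;
	model_reset_mmu_context(v);
}

/*
 * Leaving SMM. Old behaviour: clear the flag and reset the MMU only
 * after RSM completes, so a failed RSM skips the reset.
 * Fixed behaviour: reset the MMU as soon as the flag is cleared.
 */
void model_leave_smm(struct model_vcpu *v, bool reset_immediately, bool rsm_ok)
{
	v->smm = false;
	if (reset_immediately)
		model_reset_mmu_context(v);
	if (!rsm_ok)
		return; /* bail out, as on a failed RSM */
	if (!reset_immediately)
		model_reset_mmu_context(v); /* deferred, post-RSM reset */
}

/* The invariant the fix guarantees: role and vCPU flag agree. */
bool model_roles_in_sync(const struct model_vcpu *v)
{
	return v->smm == v->mmu_role_smm;
}
```

Take the case where `reset_immediately` is false and RSM fails. The vCPU flag ends up 0 while the role still says 1. That is the same disagreement that made the gfn lookup and the rmap memslot lookup use different memslot sets in the trace above.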
  4. 09 Jun 2021: 2 commits
      KVM: x86: Unload MMU on guest TLB flush if TDP disabled to force MMU sync · b53e84ee
      Committed by Lai Jiangshan
      When using shadow paging, unload the guest MMU when emulating a guest TLB
      flush to ensure all roots are synchronized.  From the guest's perspective,
      flushing the TLB ensures any and all modifications to its PTEs will be
      recognized by the CPU.
      
      Note, unloading the MMU is overkill, but is done to mirror KVM's existing
      handling of INVPCID(all) and ensure the bug is squashed.  Future cleanup
      can be done to more precisely synchronize roots when servicing a guest
      TLB flush.
      
      If TDP is enabled, synchronizing the MMU is unnecessary even if nested
      TDP is in play, as a "legacy" TLB flush from L1 does not invalidate L1's
      TDP mappings.  For EPT, an explicit INVEPT is required to invalidate
      guest-physical mappings; for NPT, guest mappings are always tagged with
      an ASID and thus can only be invalidated via the VMCB's ASID control.
      
      This bug has existed since the introduction of KVM_VCPU_FLUSH_TLB.
      It was only recently exposed after Linux guests stopped flushing the
      local CPU's TLB prior to flushing remote TLBs (see commit 4ce94eab,
      "x86/mm/tlb: Flush remote and local TLBs concurrently"), but is also
      visible in Windows 10 guests.

      Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Fixes: f38a7b75 ("KVM: X86: support paravirtualized help for TLB shootdowns")
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      [sean: massaged comment and changelog]
      Message-Id: <20210531172256.2908-1-jiangshanlai@gmail.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
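The decision described in this commit depends only on whether TDP is in use. The sketch below is a hypothetical model; the field and function names are invented, and KVM's real handler differs in detail. Under shadow paging it drops the loaded roots, so the next root load re-synchronizes them, mirroring the INVPCID(all) handling mentioned above. Under TDP it keeps the plain hardware guest-flush path, because a legacy flush from L1 does not invalidate L1's TDP mappings.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vCPU state relevant to a guest TLB flush request. */
struct model_flush_vcpu {
	bool tdp_enabled;     /* true for EPT/NPT, false for shadow paging */
	bool roots_loaded;    /* shadow/TDP roots currently loaded */
	int hw_guest_flushes; /* flushes handed to the hardware path */
};

void model_flush_tlb_guest(struct model_flush_vcpu *v)
{
	if (!v->tdp_enabled) {
		/*
		 * Shadow paging: every guest PTE change must now be visible,
		 * so unload the roots; reloading them forces a full sync.
		 */
		v->roots_loaded = false;
		return;
	}
	/*
	 * TDP: a legacy guest TLB flush does not touch TDP mappings,
	 * so a hardware guest-TLB flush is sufficient.
	 */
	v->hw_guest_flushes++;
}
```

Unloading everything is the "overkill" option the commit message acknowledges. It trades extra root rebuilds for a guarantee that no stale shadow PTE survives the guest's flush.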
      KVM: x86: Ensure PV TLB flush tracepoint reflects KVM behavior · af3511ff
      Committed by Lai Jiangshan
      In record_steal_time(), st->preempted is read twice, so
      trace_kvm_pv_tlb_flush() may report a result that is inconsistent with
      the value that decides whether kvm_vcpu_flush_tlb_guest() is called,
      if st->preempted changes between the two reads.

      This is a trivial problem that is unlikely to cause real harm, and it
      can be avoided by reading and resetting st->preempted atomically via
      xchg().

      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20210531174628.10265-1-jiangshanlai@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
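The fix replaces two separate reads with one atomic read-and-clear, so the tracepoint and the flush decision see the same snapshot. The kernel uses its own xchg() for this; the sketch below substitutes C11 atomic_exchange() in userspace. `struct model_steal_time`, `struct model_result` and `model_record_steal_time()` are hypothetical names. MODEL_VCPU_FLUSH_TLB is assumed to mirror KVM_VCPU_FLUSH_TLB (bit 1).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag bit for this sketch: bit 1 requests a TLB flush. */
#define MODEL_VCPU_FLUSH_TLB (1u << 1)

/* Stand-in for the shared steal-time byte the guest may write concurrently. */
struct model_steal_time {
	_Atomic uint8_t preempted;
};

/* What the tracepoint reported versus what was actually done. */
struct model_result {
	bool traced_flush;
	bool did_flush;
};

/*
 * Read and clear the byte exactly once, then drive both the trace and
 * the flush decision from that single snapshot, so they cannot disagree.
 */
struct model_result model_record_steal_time(struct model_steal_time *st)
{
	uint8_t snap = atomic_exchange(&st->preempted, 0);
	bool flush = (snap & MODEL_VCPU_FLUSH_TLB) != 0;
	struct model_result r = { .traced_flush = flush, .did_flush = flush };
	return r;
}
```

With the two-read version, a guest write landing between the reads could make `traced_flush` and `did_flush` differ. Here both fields come from one value by construction, and the byte is left cleared for the next round.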