1. 15 10月, 2021 2 次提交
  2. 13 10月, 2021 7 次提交
  3. 12 10月, 2021 2 次提交
  4. 06 7月, 2021 4 次提交
    • W
      KVM: X86: Fix x86_emulator slab cache leak · f9a6de85
      Wanpeng Li 提交于
      stable inclusion
      from stable-5.10.46
      commit 3a9934d6b8dd8a91d61ed2d0d538fa27cb9192a3
      bugzilla: 168323
      CVE: NA
      
      --------------------------------
      
      commit dfdc0a71 upstream.
      
      Commit c9b8b07c (KVM: x86: Dynamically allocate per-vCPU emulation context)
      tries to allocate per-vCPU emulation context dynamically, however, the
      x86_emulator slab cache is still exiting after the kvm module is unload
      as below after destroying the VM and unloading the kvm module.
      
      grep x86_emulator /proc/slabinfo
      x86_emulator          36     36   2672   12    8 : tunables    0    0    0 : slabdata      3      3      0
      
      This patch fixes this slab cache leak by destroying the x86_emulator slab cache
      when the kvm module is unloaded.
      
      Fixes: c9b8b07c (KVM: x86: Dynamically allocate per-vCPU emulation context)
      Cc: stable@vger.kernel.org
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Message-Id: <1623387573-5969-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: NWeilong Chen <chenweilong@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      f9a6de85
    • S
      KVM: x86/mmu: Calculate and check "full" mmu_role for nested MMU · 4ba004a3
      Sean Christopherson 提交于
      stable inclusion
      from stable-5.10.46
      commit 18eca69f88f2e3f1421d57f1dc4219a68de5891d
      bugzilla: 168323
      CVE: NA
      
      --------------------------------
      
      commit 654430ef upstream.
      
      Calculate and check the full mmu_role when initializing the MMU context
      for the nested MMU, where "full" means the bits and pieces of the role
      that aren't handled by kvm_calc_mmu_role_common().  While the nested MMU
      isn't used for shadow paging, things like the number of levels in the
      guest's page tables are surprisingly important when walking the guest
      page tables.  Failure to reinitialize the nested MMU context if L2's
      paging mode changes can result in unexpected and/or missed page faults,
      and likely other explosions.
      
      E.g. if an L1 vCPU is running both a 32-bit PAE L2 and a 64-bit L2, the
      "common" role calculation will yield the same role for both L2s.  If the
      64-bit L2 is run after the 32-bit PAE L2, L0 will fail to reinitialize
      the nested MMU context, ultimately resulting in a bad walk of L2's page
      tables as the MMU will still have a guest root_level of PT32E_ROOT_LEVEL.
      
        WARNING: CPU: 4 PID: 167334 at arch/x86/kvm/vmx/vmx.c:3075 ept_save_pdptrs+0x15/0xe0 [kvm_intel]
        Modules linked in: kvm_intel]
        CPU: 4 PID: 167334 Comm: CPU 3/KVM Not tainted 5.13.0-rc1-d849817d5673-reqs #185
        Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
        RIP: 0010:ept_save_pdptrs+0x15/0xe0 [kvm_intel]
        Code: <0f> 0b c3 f6 87 d8 02 00f
        RSP: 0018:ffffbba702dbba00 EFLAGS: 00010202
        RAX: 0000000000000011 RBX: 0000000000000002 RCX: ffffffff810a2c08
        RDX: ffff91d7bc30acc0 RSI: 0000000000000011 RDI: ffff91d7bc30a600
        RBP: ffff91d7bc30a600 R08: 0000000000000010 R09: 0000000000000007
        R10: 0000000000000000 R11: 0000000000000000 R12: ffff91d7bc30a600
        R13: ffff91d7bc30acc0 R14: ffff91d67c123460 R15: 0000000115d7e005
        FS:  00007fe8e9ffb700(0000) GS:ffff91d90fb00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000029f15a001 CR4: 00000000001726e0
        Call Trace:
         kvm_pdptr_read+0x3a/0x40 [kvm]
         paging64_walk_addr_generic+0x327/0x6a0 [kvm]
         paging64_gva_to_gpa_nested+0x3f/0xb0 [kvm]
         kvm_fetch_guest_virt+0x4c/0xb0 [kvm]
         __do_insn_fetch_bytes+0x11a/0x1f0 [kvm]
         x86_decode_insn+0x787/0x1490 [kvm]
         x86_decode_emulated_instruction+0x58/0x1e0 [kvm]
         x86_emulate_instruction+0x122/0x4f0 [kvm]
         vmx_handle_exit+0x120/0x660 [kvm_intel]
         kvm_arch_vcpu_ioctl_run+0xe25/0x1cb0 [kvm]
         kvm_vcpu_ioctl+0x211/0x5a0 [kvm]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x40/0xb0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: bf627a92 ("x86/kvm/mmu: check if MMU reconfiguration is needed in init_kvm_nested_mmu()")
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210610220026.1364486-1-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: NWeilong Chen <chenweilong@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      4ba004a3
    • S
      KVM: x86: Immediately reset the MMU context when the SMM flag is cleared · a557ec4c
      Sean Christopherson 提交于
      stable inclusion
      from stable-5.10.46
      commit 669a8866e468fd020d34eb00e08cb41d3774b71b
      bugzilla: 168323
      CVE: NA
      
      --------------------------------
      
      commit 78fcb2c9 upstream.
      
      Immediately reset the MMU context when the vCPU's SMM flag is cleared so
      that the SMM flag in the MMU role is always synchronized with the vCPU's
      flag.  If RSM fails (which isn't correctly emulated), KVM will bail
      without calling post_leave_smm() and leave the MMU in a bad state.
      
      The bad MMU role can lead to a NULL pointer dereference when grabbing a
      shadow page's rmap for a page fault as the initial lookups for the gfn
      will happen with the vCPU's SMM flag (=0), whereas the rmap lookup will
      use the shadow page's SMM flag, which comes from the MMU (=1).  SMM has
      an entirely different set of memslots, and so the initial lookup can find
      a memslot (SMM=0) and then explode on the rmap memslot lookup (SMM=1).
      
        general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
        KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
        CPU: 1 PID: 8410 Comm: syz-executor382 Not tainted 5.13.0-rc5-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:__gfn_to_rmap arch/x86/kvm/mmu/mmu.c:935 [inline]
        RIP: 0010:gfn_to_rmap+0x2b0/0x4d0 arch/x86/kvm/mmu/mmu.c:947
        Code: <42> 80 3c 20 00 74 08 4c 89 ff e8 f1 79 a9 00 4c 89 fb 4d 8b 37 44
        RSP: 0018:ffffc90000ffef98 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff888015b9f414 RCX: ffff888019669c40
        RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
        RBP: 0000000000000001 R08: ffffffff811d9cdb R09: ffffed10065a6002
        R10: ffffed10065a6002 R11: 0000000000000000 R12: dffffc0000000000
        R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000
        FS:  000000000124b300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 0000000028e31000 CR4: 00000000001526e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         rmap_add arch/x86/kvm/mmu/mmu.c:965 [inline]
         mmu_set_spte+0x862/0xe60 arch/x86/kvm/mmu/mmu.c:2604
         __direct_map arch/x86/kvm/mmu/mmu.c:2862 [inline]
         direct_page_fault+0x1f74/0x2b70 arch/x86/kvm/mmu/mmu.c:3769
         kvm_mmu_do_page_fault arch/x86/kvm/mmu.h:124 [inline]
         kvm_mmu_page_fault+0x199/0x1440 arch/x86/kvm/mmu/mmu.c:5065
         vmx_handle_exit+0x26/0x160 arch/x86/kvm/vmx/vmx.c:6122
         vcpu_enter_guest+0x3bdd/0x9630 arch/x86/kvm/x86.c:9428
         vcpu_run+0x416/0xc20 arch/x86/kvm/x86.c:9494
         kvm_arch_vcpu_ioctl_run+0x4e8/0xa40 arch/x86/kvm/x86.c:9722
         kvm_vcpu_ioctl+0x70f/0xbb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3460
         vfs_ioctl fs/ioctl.c:51 [inline]
         __do_sys_ioctl fs/ioctl.c:1069 [inline]
         __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:1055
         do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x440ce9
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+fb0b6a7e8713aeb0319c@syzkaller.appspotmail.com
      Fixes: 9ec19493 ("KVM: x86: clear SMM flags before loading state while leaving SMM")
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210609185619.992058-2-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: NWeilong Chen <chenweilong@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      a557ec4c
    • J
      kvm: LAPIC: Restore guard to prevent illegal APIC register access · 6c9054dd
      Jim Mattson 提交于
      stable inclusion
      from stable-5.10.46
      commit 018685461a5b9a9a70e664ac77aef0d7415a3fd5
      bugzilla: 168323
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 218bf772 ]
      
      Per the SDM, "any access that touches bytes 4 through 15 of an APIC
      register may cause undefined behavior and must not be executed."
      Worse, such an access in kvm_lapic_reg_read can result in a leak of
      kernel stack contents. Prior to commit 01402cf8 ("kvm: LAPIC:
      write down valid APIC registers"), such an access was explicitly
      disallowed. Restore the guard that was removed in that commit.
      
      Fixes: 01402cf8 ("kvm: LAPIC: write down valid APIC registers")
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Message-Id: <20210602205224.3189316-1-jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: NWeilong Chen <chenweilong@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      6c9054dd
  5. 03 7月, 2021 2 次提交
    • S
      KVM: x86: Ensure liveliness of nested VM-Enter fail tracepoint message · 49bf0eca
      Sean Christopherson 提交于
      stable inclusion
      from stable-5.10.44
      commit d046f724bbd725a24007b7e52b2d675249870888
      bugzilla: 109295
      CVE: NA
      
      --------------------------------
      
      commit f31500b0 upstream.
      
      Use the __string() machinery provided by the tracing subystem to make a
      copy of the string literals consumed by the "nested VM-Enter failed"
      tracepoint.  A complete copy is necessary to ensure that the tracepoint
      can't outlive the data/memory it consumes and deference stale memory.
      
      Because the tracepoint itself is defined by kvm, if kvm-intel and/or
      kvm-amd are built as modules, the memory holding the string literals
      defined by the vendor modules will be freed when the module is unloaded,
      whereas the tracepoint and its data in the ring buffer will live until
      kvm is unloaded (or "indefinitely" if kvm is built-in).
      
      This bug has existed since the tracepoint was added, but was recently
      exposed by a new check in tracing to detect exactly this type of bug.
      
        fmt: '%s%s
        ' current_buffer: ' vmx_dirty_log_t-140127  [003] ....  kvm_nested_vmenter_failed: '
        WARNING: CPU: 3 PID: 140134 at kernel/trace/trace.c:3759 trace_check_vprintf+0x3be/0x3e0
        CPU: 3 PID: 140134 Comm: less Not tainted 5.13.0-rc1-ce2e73ce600a-req #184
        Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
        RIP: 0010:trace_check_vprintf+0x3be/0x3e0
        Code: <0f> 0b 44 8b 4c 24 1c e9 a9 fe ff ff c6 44 02 ff 00 49 8b 97 b0 20
        RSP: 0018:ffffa895cc37bcb0 EFLAGS: 00010282
        RAX: 0000000000000000 RBX: ffffa895cc37bd08 RCX: 0000000000000027
        RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff9766cfad74f8
        RBP: ffffffffc0a041d4 R08: ffff9766cfad74f0 R09: ffffa895cc37bad8
        R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffc0a041d4
        R13: ffffffffc0f4dba8 R14: 0000000000000000 R15: ffff976409f2c000
        FS:  00007f92fa200740(0000) GS:ffff9766cfac0000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000559bd11b0000 CR3: 000000019fbaa002 CR4: 00000000001726e0
        Call Trace:
         trace_event_printf+0x5e/0x80
         trace_raw_output_kvm_nested_vmenter_failed+0x3a/0x60 [kvm]
         print_trace_line+0x1dd/0x4e0
         s_show+0x45/0x150
         seq_read_iter+0x2d5/0x4c0
         seq_read+0x106/0x150
         vfs_read+0x98/0x180
         ksys_read+0x5f/0xe0
         do_syscall_64+0x40/0xb0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Fixes: 380e0055 ("KVM: nVMX: trace nested VM-Enter failures detected by H/W")
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Message-Id: <20210607175748.674002-1-seanjc@google.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: NWeilong Chen <chenweilong@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      49bf0eca
    • L
      KVM: X86: MMU: Use the correct inherited permissions to get shadow page · 26a4c1e7
      Lai Jiangshan 提交于
      stable inclusion
      from stable-5.10.44
      commit 6b6ff4d1f349cb35a7c7d2057819af1b14f80437
      bugzilla: 109295
      CVE: NA
      
      --------------------------------
      
      commit b1bd5cba upstream.
      
      When computing the access permissions of a shadow page, use the effective
      permissions of the walk up to that point, i.e. the logic AND of its parents'
      permissions.  Two guest PxE entries that point at the same table gfn need to
      be shadowed with different shadow pages if their parents' permissions are
      different.  KVM currently uses the effective permissions of the last
      non-leaf entry for all non-leaf entries.  Because all non-leaf SPTEs have
      full ("uwx") permissions, and the effective permissions are recorded only
      in role.access and merged into the leaves, this can lead to incorrect
      reuse of a shadow page and eventually to a missing guest protection page
      fault.
      
      For example, here is a shared pagetable:
      
         pgd[]   pud[]        pmd[]            virtual address pointers
                           /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
              /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
         pgd-|           (shared pmd[] as above)
              \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
                           \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
      
        pud1 and pud2 point to the same pmd table, so:
        - ptr1 and ptr3 points to the same page.
        - ptr2 and ptr4 points to the same page.
      
      (pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
      
      - First, the guest reads from ptr1 first and KVM prepares a shadow
        page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
        "u--" comes from the effective permissions of pgd, pud1 and
        pmd1, which are stored in pt->access.  "u--" is used also to get
        the pagetable for pud1, instead of "uw-".
      
      - Then the guest writes to ptr2 and KVM reuses pud1 which is present.
        The hypervisor set up a shadow page for ptr2 with pt->access is "uw-"
        even though the pud1 pmd (because of the incorrect argument to
        kvm_mmu_get_page in the previous step) has role.access="u--".
      
      - Then the guest reads from ptr3.  The hypervisor reuses pud1's
        shadow pmd for pud2, because both use "u--" for their permissions.
        Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
      
      - At last, the guest writes to ptr4.  This causes no vmexit or pagefault,
        because pud1's shadow page structures included an "uw-" page even though
        its role.access was "u--".
      
      Any kind of shared pagetable might have the similar problem when in
      virtual machine without TDP enabled if the permissions are different
      from different ancestors.
      
      In order to fix the problem, we change pt->access to be an array, and
      any access in it will not include permissions ANDed from child ptes.
      
      The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
      Remember to test it with TDP disabled.
      
      The problem had existed long before the commit 41074d07 ("KVM: MMU:
      Fix inherited permissions for emulated guest pte updates"), and it
      is hard to find which is the culprit.  So there is no fixes tag here.
      Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com>
      Cc: stable@vger.kernel.org
      Fixes: cea0f0e7 ("[PATCH] KVM: MMU: Shadow page table caching")
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: NWeilong Chen <chenweilong@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      26a4c1e7
  6. 15 6月, 2021 2 次提交
  7. 03 6月, 2021 21 次提交