1. 08 Mar, 2021 (2 commits)
  2. 28 Jan, 2021 (1 commit)
  3. 08 Dec, 2020 (3 commits)
  4. 16 Nov, 2020 (2 commits)
  5. 10 Oct, 2020 (1 commit)
  6. 22 Sep, 2020 (1 commit)
    • mm/hugetlb: fix a race between hugetlb sysctl handlers · 8cfde287
      Committed by Muchun Song
      mainline inclusion
      from mainline-v5.9-rc4
      commit 17743798
      category: bugfix
      bugzilla: NA
      CVE: CVE-2020-25285
      
      --------------------------------
      
      There is a race between the assignment of `table->data` in one hugetlb
      sysctl handler and the write of a value through that pointer in
      __do_proc_doulongvec_minmax() running on another thread.
      
        CPU0:                                 CPU1:
                                              proc_sys_write
        hugetlb_sysctl_handler                  proc_sys_call_handler
        hugetlb_sysctl_handler_common             hugetlb_sysctl_handler
          table->data = &tmp;                       hugetlb_sysctl_handler_common
                                                      table->data = &tmp;
            proc_doulongvec_minmax
              do_proc_doulongvec_minmax           sysctl_head_finish
                __do_proc_doulongvec_minmax         unuse_table
                  i = table->data;
                  *i = val;  // corrupt CPU1's stack
      
      Fix this by duplicating `table` and updating only the duplicate.  Also
      introduce a helper, proc_hugetlb_doulongvec_minmax(), to simplify the
      code, as sketched below.
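      
      A minimal sketch of that helper, modeled on the mainline fix (the exact
      type of the buffer argument varies across kernel versions):
      
          static int proc_hugetlb_doulongvec_minmax(struct ctl_table *table,
                          int write, void *buffer, size_t *length,
                          loff_t *ppos, unsigned long *out)
          {
                  struct ctl_table dup_table;
      
                  /*
                   * Duplicate the table and point the duplicate's ->data at
                   * the caller's local variable, so this handler never
                   * publishes a pointer into its own stack where a handler
                   * on another thread could see it.
                   */
                  dup_table = *table;
                  dup_table.data = out;
      
                  return proc_doulongvec_minmax(&dup_table, write, buffer,
                                                length, ppos);
          }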
      
      The following oops was seen:
      
          BUG: kernel NULL pointer dereference, address: 0000000000000000
          #PF: supervisor instruction fetch in kernel mode
          #PF: error_code(0x0010) - not-present page
          Code: Bad RIP value.
          ...
          Call Trace:
           ? set_max_huge_pages+0x3da/0x4f0
           ? alloc_pool_huge_page+0x150/0x150
           ? proc_doulongvec_minmax+0x46/0x60
           ? hugetlb_sysctl_handler_common+0x1c7/0x200
           ? nr_hugepages_store+0x20/0x20
           ? copy_fd_bitmaps+0x170/0x170
           ? hugetlb_sysctl_handler+0x1e/0x20
           ? proc_sys_call_handler+0x2f1/0x300
           ? unregister_sysctl_table+0xb0/0xb0
           ? __fd_install+0x78/0x100
           ? proc_sys_write+0x14/0x20
           ? __vfs_write+0x4d/0x90
           ? vfs_write+0xef/0x240
           ? ksys_write+0xc0/0x160
           ? __ia32_sys_read+0x50/0x50
           ? __close_fd+0x129/0x150
           ? __x64_sys_write+0x43/0x50
           ? do_syscall_64+0x6c/0x200
           ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: e5ff2159 ("hugetlb: multiple hstates for multiple page sizes")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200828031146.43035-1-songmuchun@bytedance.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  7. 14 Sep, 2020 (2 commits)
  8. 08 Sep, 2020 (1 commit)
  9. 25 Aug, 2020 (2 commits)
  10. 17 Aug, 2020 (1 commit)
  11. 12 May, 2020 (1 commit)
    • mm/hugetlb: fix an addressing exception caused by huge_pte_offset · d07f9eb4
      Committed by Longpeng
      commit 3c1d7e6c upstream.
      
      Our machine encountered a panic (addressing exception) after running
      for a long time, and the call trace is:
      
          RIP: hugetlb_fault+0x307/0xbe0
          RSP: 0018:ffff9567fc27f808  EFLAGS: 00010286
          RAX: e800c03ff1258d48 RBX: ffffd3bb003b69c0 RCX: e800c03ff1258d48
          RDX: 17ff3fc00eda72b7 RSI: 00003ffffffff000 RDI: e800c03ff1258d48
          RBP: ffff9567fc27f8c8 R08: e800c03ff1258d48 R09: 0000000000000080
          R10: ffffaba0704c22a8 R11: 0000000000000001 R12: ffff95c87b4b60d8
          R13: 00005fff00000000 R14: 0000000000000000 R15: ffff9567face8074
          FS:  00007fe2d9ffb700(0000) GS:ffff956900e40000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: ffffd3bb003b69c0 CR3: 000000be67374000 CR4: 00000000003627e0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Call Trace:
            follow_hugetlb_page+0x175/0x540
            __get_user_pages+0x2a0/0x7e0
            __get_user_pages_unlocked+0x15d/0x210
            __gfn_to_pfn_memslot+0x3c5/0x460 [kvm]
            try_async_pf+0x6e/0x2a0 [kvm]
            tdp_page_fault+0x151/0x2d0 [kvm]
           ...
            kvm_arch_vcpu_ioctl_run+0x330/0x490 [kvm]
            kvm_vcpu_ioctl+0x309/0x6d0 [kvm]
            do_vfs_ioctl+0x3f0/0x540
            SyS_ioctl+0xa1/0xc0
            system_call_fastpath+0x22/0x27
      
      For 1G hugepages, huge_pte_offset() should return either NULL or the
      pudp, but it may return a wrong pmdp if it races with another thread.
      Consider the following code snippet:
      
          ...
          pud = pud_offset(p4d, addr);
          if (sz != PUD_SIZE && pud_none(*pud))
              return NULL;
          /* hugepage or swap? */
          if (pud_huge(*pud) || !pud_present(*pud))
              return (pte_t *)pud;
      
          pmd = pmd_offset(pud, addr);
          if (sz != PMD_SIZE && pmd_none(*pmd))
              return NULL;
          /* hugepage or swap? */
          if (pmd_huge(*pmd) || !pmd_present(*pmd))
              return (pte_t *)pmd;
          ...
      
      The following sequence would trigger this bug:
      
       - CPU0: sz == PUD_SIZE and *pud == 0, so it continues
       - CPU0: "pud_huge(*pud)" is false
       - CPU1: calls hugetlb_no_page() and sets *pud to xxxx8e7 (PRESENT)
       - CPU0: "!pud_present(*pud)" is false, so it continues
       - CPU0: pmd = pmd_offset(pud, addr), which may return a wrong pmdp
      
      However, we want CPU0 to return NULL or pudp in this case.
      
      We must make sure there is exactly one dereference of pud and pmd, as
      in the sketch below.
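      
      A simplified sketch of the restructured lookup from the upstream fix
      (generic mm version; pgd/p4d handling and architecture-specific
      variants omitted), where each page-table level is read only once:
      
          pud = pud_offset(p4d, addr);
          if (sz == PUD_SIZE)
              /* must be pud huge, non-present or none */
              return (pte_t *)pud;
          if (!pud_present(*pud))
              return NULL;
          /* must have a valid entry and size to go further */
      
          pmd = pmd_offset(pud, addr);
          if (sz == PMD_SIZE)
              /* must be pmd huge, non-present or none */
              return (pte_t *)pmd;
          if (!pmd_present(*pmd))
              return NULL;
      
          return NULL;
      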
      Signed-off-by: Longpeng <longpeng2@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200413010342.771-1-longpeng2@huawei.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  12. 27 Dec, 2019 (20 commits)
  13. 08 Dec, 2018 (1 commit)
  14. 21 Nov, 2018 (1 commit)
    • hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444! · 7b46e532
      Committed by Mike Kravetz
      commit 5e41540c upstream.
      
      This bug has been experienced several times by the Oracle DB team.  The
      BUG is in remove_inode_hugepages() as follows:
      
      	/*
      	 * If page is mapped, it was faulted in after being
      	 * unmapped in caller.  Unmap (again) now after taking
      	 * the fault mutex.  The mutex will prevent faults
      	 * until we finish removing the page.
      	 *
      	 * This race can only happen in the hole punch case.
      	 * Getting here in a truncate operation is a bug.
      	 */
      	if (unlikely(page_mapped(page))) {
      		BUG_ON(truncate_op);
      
      In this case, the elevated map count is not the result of a race.
      Rather it was incorrectly incremented as the result of a bug in the huge
      pmd sharing code.  Consider the following:
      
       - Process A maps a hugetlbfs file of sufficient size and alignment
         (PUD_SIZE) that a pmd page could be shared.
      
       - Process B maps the same hugetlbfs file with the same size and
         alignment such that a pmd page is shared.
      
       - Process B then calls mprotect() to change protections for the mapping
         with the shared pmd. As a result, the pmd is 'unshared'.
      
       - Process B then calls mprotect() again to change protections for the
         mapping back to their original value. The pmd remains unshared.
      
       - Process B then forks and process C is created. During the fork
         process, we do dup_mm -> dup_mmap -> copy_page_range to copy page
         tables. Copying page tables for hugetlb mappings is done in the
         routine copy_hugetlb_page_range.
      
      In copy_hugetlb_page_range(), the destination pte is obtained by:
      
      	dst_pte = huge_pte_alloc(dst, addr, sz);
      
      If pmd sharing is possible, the returned pointer will be to a pte in an
      existing page table.  In the situation above, process C could share with
      either process A or process B.  Since process A is first in the list,
      the returned pte is a pointer to a pte in process A's page table.
      
      However, the check for pmd sharing in copy_hugetlb_page_range is:
      
      	/* If the pagetables are shared don't copy or take references */
      	if (dst_pte == src_pte)
      		continue;
      
      Since process C is sharing with process A instead of process B, the
      above test fails.  The code in copy_hugetlb_page_range which follows
      assumes dst_pte points to a huge_pte_none pte.  It copies the pte entry
      from src_pte to dst_pte and increments the map count of the associated
      page.  This is how we end up with an elevated map count.
      
      To solve this, check the dst_pte entry for huge_pte_none.  If it is not
      none, this implies PMD sharing, so do not copy; see the sketch below.
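      
      A sketch of the corrected check in copy_hugetlb_page_range() (based on
      the upstream fix; in the real code the test is repeated after taking
      the page table lock):
      
          dst_entry = huge_ptep_get(dst_pte);
      
          /*
           * If the pagetables are shared, don't copy or take references.
           * dst_pte == src_pte is the common case of src/dst sharing.
           * However, src could have 'unshared' and dst shares with another
           * vma: if the dst_pte entry is not none, that also implies
           * sharing.
           */
          if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
              continue;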
      
      Link: http://lkml.kernel.org/r/20181105212315.14125-1-mike.kravetz@oracle.com
      Fixes: c5c99429 ("fix hugepages leak due to pagetable page sharing")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  15. 14 Nov, 2018 (1 commit)
    • hugetlbfs: dirty pages as they are added to pagecache · fa5466d7
      Committed by Mike Kravetz
      commit 22146c3ce98962436e401f7b7016a6f664c9ffb5 upstream.
      
      Some test systems were experiencing negative huge page reserve counts
      and incorrect file block counts.  This was traced to
      /proc/sys/vm/drop_caches removing clean pages from hugetlbfs file
      pagecaches.  When non-hugetlbfs code explicitly removes the pages, the
      appropriate accounting is not performed.
      
      This can be recreated as follows:
       fallocate -l 2M /dev/hugepages/foo
       echo 1 > /proc/sys/vm/drop_caches
       fallocate -l 2M /dev/hugepages/foo
       grep -i huge /proc/meminfo
         AnonHugePages:         0 kB
         ShmemHugePages:        0 kB
         HugePages_Total:    2048
         HugePages_Free:     2047
         HugePages_Rsvd:    18446744073709551615
         HugePages_Surp:        0
         Hugepagesize:       2048 kB
         Hugetlb:         4194304 kB
       ls -lsh /dev/hugepages/foo
         4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
      
      To address this issue, dirty pages as they are added to the pagecache.
      The problem can easily be reproduced with fallocate as shown above.
      Read-faulted pages will eventually end up being marked dirty, but there
      is a window where they are clean and could be impacted by code such as
      drop_caches.  So, just dirty them all as they are added to the
      pagecache, as sketched below.
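      
      The change amounts to marking the page dirty at insertion time, roughly
      as follows (a sketch of huge_add_to_page_cache() after the fix; the
      i_blocks accounting that follows is abbreviated):
      
          int huge_add_to_page_cache(struct page *page,
                                     struct address_space *mapping,
                                     pgoff_t idx)
          {
              int err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);
      
              if (err)
                  return err;
              ClearPagePrivate(page);
      
              /*
               * Mark the page dirty here so that it will not be removed
               * from the cache/file by non-hugetlbfs specific code paths.
               */
              set_page_dirty(page);
              ...
              return 0;
          }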
      
      Link: http://lkml.kernel.org/r/b5be45b8-5afe-56cd-9482-28384699a049@oracle.com
      Fixes: 6bda666a ("hugepages: fold find_or_alloc_pages into huge_no_page()")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>