提交 1e392147 编写于 作者: A Andrea Arcangeli 提交者: Linus Torvalds

userfaultfd: hugetlbfs: prevent UFFDIO_COPY to fill beyond the end of i_size

This oops:

  kernel BUG at fs/hugetlbfs/inode.c:484!
  RIP: remove_inode_hugepages+0x3d0/0x410
  Call Trace:
    hugetlbfs_setattr+0xd9/0x130
    notify_change+0x292/0x410
    do_truncate+0x65/0xa0
    do_sys_ftruncate.constprop.3+0x11a/0x180
    SyS_ftruncate+0xe/0x10
    tracesys+0xd9/0xde

was caused by the lack of i_size check in hugetlb_mcopy_atomic_pte.

mmap() can still succeed beyond the end of the i_size after vmtruncate
zapped vmas in those ranges, but the faults must not succeed, and that
includes UFFDIO_COPY.

We could differentiate the retval to userland to represent a SIGBUS like
a page fault would do (vs SIGSEGV), but it doesn't seem very useful and
we'd need to pick a random retval as there's no meaningful syscall
retval that would differentiate from SIGSEGV and SIGBUS, there's just
-EFAULT.

Link: http://lkml.kernel.org/r/20171016223914.2421-2-aarcange@redhat.comSigned-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
上级 5cb0512c
...@@ -3984,6 +3984,9 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, ...@@ -3984,6 +3984,9 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
unsigned long src_addr, unsigned long src_addr,
struct page **pagep) struct page **pagep)
{ {
struct address_space *mapping;
pgoff_t idx;
unsigned long size;
int vm_shared = dst_vma->vm_flags & VM_SHARED; int vm_shared = dst_vma->vm_flags & VM_SHARED;
struct hstate *h = hstate_vma(dst_vma); struct hstate *h = hstate_vma(dst_vma);
pte_t _dst_pte; pte_t _dst_pte;
...@@ -4021,13 +4024,24 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, ...@@ -4021,13 +4024,24 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
__SetPageUptodate(page); __SetPageUptodate(page);
set_page_huge_active(page); set_page_huge_active(page);
mapping = dst_vma->vm_file->f_mapping;
idx = vma_hugecache_offset(h, dst_vma, dst_addr);
/* /*
* If shared, add to page cache * If shared, add to page cache
*/ */
if (vm_shared) { if (vm_shared) {
struct address_space *mapping = dst_vma->vm_file->f_mapping; size = i_size_read(mapping->host) >> huge_page_shift(h);
pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr); ret = -EFAULT;
if (idx >= size)
goto out_release_nounlock;
/*
* Serialization between remove_inode_hugepages() and
* huge_add_to_page_cache() below happens through the
* hugetlb_fault_mutex_table that here must be hold by
* the caller.
*/
ret = huge_add_to_page_cache(page, mapping, idx); ret = huge_add_to_page_cache(page, mapping, idx);
if (ret) if (ret)
goto out_release_nounlock; goto out_release_nounlock;
...@@ -4036,6 +4050,20 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, ...@@ -4036,6 +4050,20 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
ptl = huge_pte_lockptr(h, dst_mm, dst_pte); ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
spin_lock(ptl); spin_lock(ptl);
/*
* Recheck the i_size after holding PT lock to make sure not
* to leave any page mapped (as page_mapped()) beyond the end
* of the i_size (remove_inode_hugepages() is strict about
* enforcing that). If we bail out here, we'll also leave a
* page in the radix tree in the vm_shared case beyond the end
* of the i_size, but remove_inode_hugepages() will take care
* of it as soon as we drop the hugetlb_fault_mutex_table.
*/
size = i_size_read(mapping->host) >> huge_page_shift(h);
ret = -EFAULT;
if (idx >= size)
goto out_release_unlock;
ret = -EEXIST; ret = -EEXIST;
if (!huge_pte_none(huge_ptep_get(dst_pte))) if (!huge_pte_none(huge_ptep_get(dst_pte)))
goto out_release_unlock; goto out_release_unlock;
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册