  1. 19 January 2022, 1 commit
  2. 30 October 2021, 1 commit
  3. 15 October 2021, 2 commits
  4. 13 October 2021, 3 commits
  5. 12 October 2021, 4 commits
  6. 26 September 2021, 1 commit
  7. 02 September 2021, 1 commit
  8. 31 August 2021, 1 commit
    • mm: add pin memory method for checkpoint add restore · 7dc4c73d
      Jingxian He authored
      hulk inclusion
      category: feature
      bugzilla: 48159
      CVE: N/A
      
      ------------------------------
      
      We can use checkpoint and restore in userspace (criu) to dump and
      restore tasks when updating the kernel.
      Currently, criu needs to dump all memory data of the tasks to files.
      When the memory size is very large (larger than 1G),
      dumping the data takes a long time (more than 1 min).
      
      By pinning the memory data of the tasks and collecting the corresponding
      physical page mapping info in the checkpoint process, we can remap the
      physical pages into the restored tasks after upgrading the kernel. This
      pin memory method can restore the task data within one second.
      
      The pin memory area info is saved in the reserved memblock,
      which remains usable across the kernel update process.
      
      The pin memory driver provides the following ioctl commands for criu
      (a usage sketch follows the list):
      1) SET_PIN_MEM_AREA:
      Set a pin memory area, which can be remapped into the restored task.
      2) CLEAR_PIN_MEM_AREA:
      Clear the pin memory area info,
      which allows the user to reset the pinned data.
      3) REMAP_PIN_MEM_AREA:
      Remap the pages of the pinned memory into the restored task.
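      
      Below is a minimal userspace sketch of how criu might drive these
      ioctls. The device node path, header name, and struct layout are
      assumptions for illustration only; the real definitions of
      SET_PIN_MEM_AREA, CLEAR_PIN_MEM_AREA and REMAP_PIN_MEM_AREA come from
      the driver's UAPI header.
      
        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/ioctl.h>
        #include <sys/types.h>
        #include <unistd.h>
        /* Hypothetical UAPI header assumed to define the ioctl numbers and
         * the area descriptor used below. */
        #include <linux/pin_mem.h>
        
        int pin_task_area(pid_t pid, unsigned long start, unsigned long end)
        {
                /* Assumed device node exposed by the pin memory driver. */
                int fd = open("/dev/pinmem", O_RDWR);
                if (fd < 0) {
                        perror("open pin memory device");
                        return -1;
                }
        
                /* Hypothetical descriptor: target task and virtual range to pin. */
                struct pin_mem_area area = {
                        .pid = pid,
                        .virt_start = start,
                        .virt_end = end,
                };
        
                /* SET_PIN_MEM_AREA: record the area so its physical pages can be
                 * remapped into the restored task after the kernel update.
                 * REMAP_PIN_MEM_AREA and CLEAR_PIN_MEM_AREA follow the same pattern. */
                if (ioctl(fd, SET_PIN_MEM_AREA, &area) < 0)
                        perror("SET_PIN_MEM_AREA");
        
                close(fd);
                return 0;
        }
      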
      Signed-off-by: Jingxian He <hejingxian@huawei.com>
      Reviewed-by: Chen Wandun <chenwandun@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  9. 14 July 2021, 5 commits
  10. 19 April 2021, 1 commit
  11. 14 April 2021, 1 commit
  12. 09 March 2021, 1 commit
  13. 12 January 2021, 1 commit
    • mm/rmap: always do TTU_IGNORE_ACCESS · 6d38f6ae
      Shakeel Butt authored
      stable inclusion
      from stable-5.10.4
      commit dd156e3fcabff9ac2f102ae92f9b2f5dd8525e4d
      bugzilla: 46903
      
      --------------------------------
      
      [ Upstream commit 013339df ]
      
      Since commit 369ea824 ("mm/rmap: update to new mmu_notifier semantic
      v2"), the code to check the secondary MMU's page table access bit is
      broken for !(TTU_IGNORE_ACCESS), because the page is unmapped from the
      secondary MMU's page table before the check.  This applies specifically
      to secondary MMUs which unmap the memory in
      mmu_notifier_invalidate_range_start(), like kvm.
      
      However, memory reclaim is the only user of !(TTU_IGNORE_ACCESS), i.e.
      the absence of TTU_IGNORE_ACCESS, and it explicitly performs the page
      table access check before trying to unmap the page.  So, at worst,
      reclaim will miss accesses in a very short window if we remove the page
      table access check from the unmapping code.
      
      There is an unintended consequence of !(TTU_IGNORE_ACCESS) for memcg
      reclaim.  In memcg reclaim, page_referenced() only accounts the accesses
      from processes which are in the same memcg as the target page, but the
      unmapping code considers accesses from all processes, decreasing the
      effectiveness of memcg reclaim.
      
      The simplest solution is to always assume TTU_IGNORE_ACCESS in unmapping
      code.
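      
      For context, below is a simplified sketch of the kind of access check
      the unmapping path performed for !(TTU_IGNORE_ACCESS) and which this
      change drops along with the flag itself; the surrounding rmap walk and
      locking in try_to_unmap_one() are elided and the exact upstream diff
      may differ:
      
        /* Inside the page_vma_mapped_walk() loop of try_to_unmap_one(): */
        if (!(flags & TTU_IGNORE_ACCESS)) {
                /* Removed: abort the unmap if the young bit (including the
                 * secondary MMU's access bit, via the mmu notifier) is set. */
                if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) {
                        ret = false;
                        page_vma_mapped_walk_done(&pvmw);
                        break;
                }
        }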
      
      Link: https://lkml.kernel.org/r/20201104231928.1494083-1-shakeelb@google.com
      Fixes: 369ea824 ("mm/rmap: update to new mmu_notifier semantic v2")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
  14. 23 November 2020, 1 commit
    • mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault() · bfe8cc1d
      Gerald Schaefer authored
      Alexander reported a syzkaller / KASAN finding on s390, see below for
      complete output.
      
      In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be
      freed in some cases.  In the case of userfaultfd_missing(), this will
      happen after calling handle_userfault(), which might have released the
      mmap_lock.  Therefore, the following pte_free(vma->vm_mm, pgtable) will
      access an unstable vma->vm_mm, which could have been freed or re-used
      already.
      
      For all architectures other than s390 this will go without any negative
      impact, because pte_free() simply frees the page and ignores the
      passed-in mm.  The implementation for SPARC32 would also access
      mm->page_table_lock in pte_free(), but there is no THP support on
      SPARC32, so the buggy code path will not be used there.
      
      For s390, the mm->context.pgtable_list is being used to maintain the 2K
      pagetable fragments, and operating on an already freed or even re-used
      mm could result in various more or less subtle bugs due to list /
      pagetable corruption.
      
      Fix this by calling pte_free() before handle_userfault(), similar to how
      it is already done in __do_huge_pmd_anonymous_page() for the WRITE /
      non-huge_zero_page case.
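      
      Below is a simplified sketch of the resulting ordering in the
      huge-zero-page path of do_huge_pmd_anonymous_page(); error handling and
      the exact upstream diff are elided:
      
        if (userfaultfd_missing(vma)) {
                spin_unlock(vmf->ptl);
                /* Free the preallocated pagetable while vma->vm_mm is still
                 * guaranteed to be valid... */
                pte_free(vma->vm_mm, pgtable);
                /* ...because handle_userfault() may release mmap_lock, after
                 * which vma (and vma->vm_mm) must not be dereferenced again. */
                ret = handle_userfault(vmf, VM_UFFD_MISSING);
                VM_BUG_ON(ret & VM_FAULT_FALLBACK);
        }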
      
      Commit 6b251fc9 ("userfaultfd: call handle_userfault() for
      userfaultfd_missing() faults") actually introduced both the
      do_huge_pmd_anonymous_page() and the __do_huge_pmd_anonymous_page()
      changes with respect to calling handle_userfault(), but only in the
      latter case did it put the pte_free() before calling handle_userfault().
      
        BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
        Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334
      
        CPU: 1 PID: 9334 Comm: syz-executor.0 Not tainted 5.10.0-rc1-syzkaller-07083-g4c9720875573 #0
        Hardware name: IBM 3906 M04 701 (KVM/Linux)
        Call Trace:
          do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
          create_huge_pmd mm/memory.c:4256 [inline]
          __handle_mm_fault+0xe6e/0x1068 mm/memory.c:4480
          handle_mm_fault+0x288/0x748 mm/memory.c:4607
          do_exception+0x394/0xae0 arch/s390/mm/fault.c:479
          do_dat_exception+0x34/0x80 arch/s390/mm/fault.c:567
          pgm_check_handler+0x1da/0x22c arch/s390/kernel/entry.S:706
          copy_from_user_mvcos arch/s390/lib/uaccess.c:111 [inline]
          raw_copy_from_user+0x3a/0x88 arch/s390/lib/uaccess.c:174
          _copy_from_user+0x48/0xa8 lib/usercopy.c:16
          copy_from_user include/linux/uaccess.h:192 [inline]
          __do_sys_sigaltstack kernel/signal.c:4064 [inline]
          __s390x_sys_sigaltstack+0xc8/0x240 kernel/signal.c:4060
          system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
      
        Allocated by task 9334:
          slab_alloc_node mm/slub.c:2891 [inline]
          slab_alloc mm/slub.c:2899 [inline]
          kmem_cache_alloc+0x118/0x348 mm/slub.c:2904
          vm_area_dup+0x9c/0x2b8 kernel/fork.c:356
          __split_vma+0xba/0x560 mm/mmap.c:2742
          split_vma+0xca/0x108 mm/mmap.c:2800
          mlock_fixup+0x4ae/0x600 mm/mlock.c:550
          apply_vma_lock_flags+0x2c6/0x398 mm/mlock.c:619
          do_mlock+0x1aa/0x718 mm/mlock.c:711
          __do_sys_mlock2 mm/mlock.c:738 [inline]
          __s390x_sys_mlock2+0x86/0xa8 mm/mlock.c:728
          system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
      
        Freed by task 9333:
          slab_free mm/slub.c:3142 [inline]
          kmem_cache_free+0x7c/0x4b8 mm/slub.c:3158
          __vma_adjust+0x7b2/0x2508 mm/mmap.c:960
          vma_merge+0x87e/0xce0 mm/mmap.c:1209
          userfaultfd_release+0x412/0x6b8 fs/userfaultfd.c:868
          __fput+0x22c/0x7a8 fs/file_table.c:281
          task_work_run+0x200/0x320 kernel/task_work.c:151
          tracehook_notify_resume include/linux/tracehook.h:188 [inline]
          do_notify_resume+0x100/0x148 arch/s390/kernel/signal.c:538
          system_call+0xe6/0x28c arch/s390/kernel/entry.S:416
      
        The buggy address belongs to the object at 00000000962d6948 which belongs to the cache vm_area_struct of size 200
        The buggy address is located 64 bytes inside of 200-byte region [00000000962d6948, 00000000962d6a10)
        The buggy address belongs to the page: page:00000000313a09fe refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x962d6 flags: 0x3ffff00000000200(slab)
        raw: 3ffff00000000200 000040000257e080 0000000c0000000c 000000008020ba00
        raw: 0000000000000000 000f001e00000000 ffffffff00000001 0000000096959501
        page dumped because: kasan: bad access detected
        page->mem_cgroup:0000000096959501
      
        Memory state around the buggy address:
         00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
        >00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                              ^
         00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00
         00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        ==================================================================
      
      Fixes: 6b251fc9 ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults")
      Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: <stable@vger.kernel.org>	[4.3+]
      Link: https://lkml.kernel.org/r/20201110190329.11920-1-gerald.schaefer@linux.ibm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 17 October 2020, 6 commits
  16. 14 October 2020, 1 commit
  17. 28 September 2020, 1 commit
    • mm/thp: Split huge pmds/puds if they're pinned when fork() · d042035e
      Peter Xu authored
      Pinned pages shouldn't be write-protected when fork() happens, because
      follow-up copy-on-write on these pages could cause the pinned pages to
      be replaced by random newly allocated pages.
      
      For huge PMDs, we split the huge pmd if pinning is detected, so that
      future handling will be done at the PTE level (with our latest changes,
      each of the small pages will be copied).  We can achieve this by letting
      copy_huge_pmd() return -EAGAIN for pinned pages, so that we fall through
      in copy_pmd_range() and finally land in the next copy_pte_range() call.
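      
      Below is a simplified sketch of that copy_huge_pmd() bail-out; the
      pin-detection condition is an approximation of the upstream code and
      the surrounding locking context is elided:
      
        if (unlikely(is_cow_mapping(vma->vm_flags) &&
                     atomic_read(&src_mm->has_pinned) &&
                     page_maybe_dma_pinned(src_page))) {
                /* Give up on copying the huge PMD: split it and return -EAGAIN
                 * so copy_pmd_range() falls through to copy_pte_range(), which
                 * then copies (rather than write-protects) the small pages. */
                pte_free(dst_mm, pgtable);
                spin_unlock(src_ptl);
                spin_unlock(dst_ptl);
                __split_huge_pmd(vma, src_pmd, addr, false, NULL);
                return -EAGAIN;
        }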
      
      Huge PUDs are even more special: so far they do not support anonymous
      pages.  But they can be handled the same way as huge PMDs, even though
      splitting a huge PUD means erasing the PUD entries.  This guarantees
      that the follow-up fault-ins will remap the same pages in either the
      parent or the child later.
      
      This might not be the most efficient way, but it should be easy and
      clean enough.  It should be fine, since we're tackling a very rare case
      just to make sure userspaces that pinned some thps will still work even
      without MADV_DONTFORK after they fork()ed.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 20 September 2020, 1 commit
    • mm/thp: fix __split_huge_pmd_locked() for migration PMD · ec0abae6
      Ralph Campbell authored
      A migrating transparent huge page has to already be unmapped.  Otherwise,
      the page could be modified while it is being copied to a new page and data
      could be lost.  The function __split_huge_pmd() checks for a PMD migration
      entry before calling __split_huge_pmd_locked() leading one to think that
      __split_huge_pmd_locked() can handle splitting a migrating PMD.
      
      However, the code always increments the page->_mapcount and adjusts the
      memory control group accounting assuming the page is mapped.
      
      Also, if the PMD entry is a migration PMD entry, the call to
      is_huge_zero_pmd(*pmd) is incorrect because it calls pmd_pfn(pmd) instead
      of migration_entry_to_pfn(pmd_to_swp_entry(pmd)).  Fix these problems by
      checking for a PMD migration entry.
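      
      Below is a simplified sketch of the migration-entry handling in
      __split_huge_pmd_locked() described above; locking and the full
      accounting updates are elided and details may differ from the upstream
      diff:
      
        pmd_migration = is_pmd_migration_entry(old_pmd);
        if (unlikely(pmd_migration)) {
                /* A migration entry carries no valid pfn in the pmd itself, so
                 * go through the swap entry instead of pmd_page()/pmd_pfn(). */
                swp_entry_t entry = pmd_to_swp_entry(old_pmd);
        
                page = pfn_to_page(swp_offset(entry));
        } else {
                page = pmd_page(old_pmd);
                if (pmd_dirty(old_pmd))
                        set_page_dirty(page);
        }
        
        /* The page behind a migration entry is already unmapped, so skip the
         * mapcount/rmap and memcg accounting that assume a mapped page. */
        if (!pmd_migration)
                page_remove_rmap(page, true);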
      
      Fixes: 84c3fc4e ("mm: thp: check pmd migration entry in common path")
      Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Bharata B Rao <bharata@linux.ibm.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: <stable@vger.kernel.org>	[4.14+]
      Link: https://lkml.kernel.org/r/20200903183140.19055-1-rcampbell@nvidia.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 05 September 2020, 1 commit
  20. 04 September 2020, 1 commit
    • mm: Preserve the PG_arch_2 flag in __split_huge_page_tail() · 72e6afa0
      Catalin Marinas authored
      When a huge page is split into normal pages, part of the head page flags
      are transferred to the tail pages. However, the PG_arch_* flags are not
      part of the preserved set.
      
      PG_arch_2 is used by the arm64 MTE support to mark pages that have valid
      tags.  Without this flag, the arm64 set_pte_at() would clear the tags in
      order to avoid exposing stale tags to user space, and the swap-out hooks
      would ignore the tags.  Not preserving PG_arch_2 on huge page splitting
      therefore leads to tag corruption in the tail pages.
      
      Preserve the newly added PG_arch_2 flag in __split_huge_page_tail().
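      
      Below is a simplified sketch of the flag-preservation mask in
      __split_huge_page_tail(); only a subset of the preserved flags is shown:
      
        page_tail->flags |= (head->flags &
                             ((1L << PG_referenced) |
                              (1L << PG_swapbacked) |
                              (1L << PG_active) |
        #ifdef CONFIG_64BIT
                              (1L << PG_arch_2) | /* keep the MTE "tags valid" marker */
        #endif
                              (1L << PG_dirty)));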
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
  21. 13 August 2020, 2 commits
  22. 08 August 2020, 3 commits