1. 03 Jun, 2021: 1 commit
  2. 09 Apr, 2021: 1 commit
  3. 17 Oct, 2020: 1 commit
  4. 12 Oct, 2020: 1 commit
  5. 11 Oct, 2020: 1 commit
    • mm/khugepaged: fix filemap page_to_pgoff(page) != offset · 033b5d77
      Committed by Hugh Dickins
      There have been elusive reports of filemap_fault() hitting its
      VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
      with CONFIG_READ_ONLY_THP_FOR_FS=y.
      
      Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
      CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
      without NUMA reuses the same huge page after collapse_file() failed
      (whereas NUMA targets its allocation to the respective node each time).
      And most of us were usually testing with CONFIG_NUMA=y kernels.
      
      collapse_file(old start)
        new_page = khugepaged_alloc_page(hpage)
        __SetPageLocked(new_page)
        new_page->index = start // hpage->index=old offset
        new_page->mapping = mapping
        xas_store(&xas, new_page)
      
                                filemap_fault
                                  page = find_get_page(mapping, offset)
                                  // if offset falls inside hpage then
                                  // compound_head(page) == hpage
                                  lock_page_maybe_drop_mmap()
                                    __lock_page(page)
      
        // collapse fails
        xas_store(&xas, old page)
        new_page->mapping = NULL
        unlock_page(new_page)
      
      collapse_file(new start)
        new_page = khugepaged_alloc_page(hpage)
        __SetPageLocked(new_page)
        new_page->index = start // hpage->index=new offset
        new_page->mapping = mapping // mapping becomes valid again
      
                                  // since compound_head(page) == hpage
                                  // page_to_pgoff(page) got changed
                                  VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
      
      An initial patch replaced __SetPageLocked() by lock_page(), which did
      fix the race which Suren illustrates above.  But testing showed that it's
      not good enough: if the racing task's __lock_page() gets delayed long
      after its find_get_page(), then it may follow collapse_file(new start)'s
      successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.
      
      It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
      check and retry (as is done for mapping), with similar relaxations in
      find_lock_entry() and pagecache_get_page(): but it's not obvious what
      else might get caught out; and khugepaged non-NUMA appears to be unique
      in exposing a page to page cache, then revoking, without going through
      a full cycle of freeing before reuse.
      
      Instead, make non-NUMA khugepaged_prealloc_page() release the old page
      if anyone else has a reference to it (1% of cases when I tested).
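      A minimal sketch of the idea, assuming the !CONFIG_NUMA variant of
      khugepaged_prealloc_page() in mm/khugepaged.c (surrounding allocation
      logic elided):
      
        /* Non-NUMA khugepaged keeps one preallocated hpage and reuses
         * it across collapse attempts.  If a racing filemap_fault()
         * still holds a reference from find_get_page() on the old
         * offset, drop ours and allocate afresh, rather than reuse the
         * page with a new index/mapping visible to that racer. */
        if (*hpage && page_count(*hpage) > 1) {
        	put_page(*hpage);
        	*hpage = NULL;
        }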
      
      Although never reported on huge tmpfs, I believe its find_lock_entry()
      has been at similar risk; but huge tmpfs does not rely on khugepaged
      for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.
      Reported-by: Denis Lisov <dennis.lissov@gmail.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
      Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org
      Reported-by: Qian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw
      Reported-and-analyzed-by: Suren Baghdasaryan <surenb@google.com>
      Fixes: 87c460a0 ("mm/khugepaged: collapse_shmem() without freezing new_page")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org # v4.9+
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 06 Sep, 2020: 1 commit
  7. 22 Aug, 2020: 1 commit
  8. 13 Aug, 2020: 1 commit
    • mm/vmscan: protect the workingset on anonymous LRU · b518154e
      Committed by Joonsoo Kim
      In the current implementation, a newly created or swapped-in anonymous
      page starts out on the active list.  Growing the active list results in
      rebalancing the active/inactive lists, so old pages on the active list
      are demoted to the inactive list.  Hence, pages on the active list
      aren't protected at all.
      
      Following is an example of this situation.
      
      Assume there are 50 hot pages on the active list.  Numbers denote the
      number of pages on the active/inactive list (active | inactive).
      
      1. 50 hot pages on active list
      50(h) | 0
      
      2. workload: 50 newly created (used-once) pages
      50(uo) | 50(h)
      
      3. workload: another 50 newly created (used-once) pages
      50(uo) | 50(uo), swap-out 50(h)
      
      This patch tries to fix this issue.  As with the file LRU, newly
      created or swapped-in anonymous pages are inserted into the inactive
      list.  They are promoted to the active list if enough references
      happen.  This simple modification changes the above example as follows.
      
      1. 50 hot pages on active list
      50(h) | 0
      
      2. workload: 50 newly created (used-once) pages
      50(h) | 50(uo)
      
      3. workload: another 50 newly created (used-once) pages
      50(h) | 50(uo), swap-out 50(uo)
      
      As you can see, hot pages on the active list are now protected.
      
      Note that this implementation has a drawback: a page cannot be
      promoted, and will be swapped out, if its re-access interval is greater
      than the size of the inactive list but less than the size of the total
      (active + inactive).  To solve this potential issue, a following patch
      will apply workingset detection similar to the one already applied to
      the file LRU.
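      A sketch of the change at the anonymous fault paths (the function names
      follow my reading of the mainline patch; treat the snippet as
      illustrative rather than the exact diff):
      
        /* Before: a newly faulted anonymous page started life active. */
        lru_cache_add_active_or_unevictable(page, vma);
      
        /* After: start on the inactive list unless mlocked, as file
         * pages already do; promotion now requires further references. */
        lru_cache_add_inactive_or_unevictable(page, vma);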
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Link: http://lkml.kernel.org/r/1595490560-15117-3-git-send-email-iamjoonsoo.kim@lge.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 08 Aug, 2020: 4 commits
    • khugepaged: khugepaged_test_exit() check mmget_still_valid() · bbe98f9c
      Committed by Hugh Dickins
      Move collapse_huge_page()'s mmget_still_valid() check into
      khugepaged_test_exit() itself.  collapse_huge_page() is used for anon THP
      only, and earned its mmget_still_valid() check because it inserts a huge
      pmd entry in place of the page table's pmd entry; whereas
      collapse_file()'s retract_page_tables() or collapse_pte_mapped_thp()
      merely clears the page table's pmd entry.  But core dumping without mmap
      lock must have been as open to mistaking a racily cleared pmd entry for a
      page table at physical page 0, as exit_mmap() was.  And we certainly have
      no interest in mapping as a THP once dumping core.
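      
      A minimal sketch of the resulting helper in mm/khugepaged.c
      (simplified):
      
        /* Every khugepaged path that checks for exit now also refuses
         * to touch an mm whose core dump has begun. */
        static inline bool khugepaged_test_exit(struct mm_struct *mm)
        {
        	return atomic_read(&mm->mm_users) == 0 ||
        	       !mmget_still_valid(mm);
        }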
      
      Fixes: 59ea6d06 ("coredump: fix race condition between collapse_huge_page() and core dumping")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021217020.27773@eggly.anvils
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • khugepaged: retract_page_tables() remember to test exit · 18e77600
      Committed by Hugh Dickins
      Only once have I seen this scenario (and forgot even to notice what forced
      the eventual crash): a sequence of "BUG: Bad page map" alerts from
      vm_normal_page(), from zap_pte_range() servicing exit_mmap();
      pmd:00000000, pte values corresponding to data in physical page 0.
      
      The pte mappings being zapped in this case were supposed to be from a huge
      page of ext4 text (but could as well have been shmem): my belief is that
      it was racing with collapse_file()'s retract_page_tables(), found *pmd
      pointing to a page table, locked it, but *pmd had become 0 by the time
      start_pte was decided.
      
      In most cases, that possibility is excluded by holding mmap lock; but
      exit_mmap() proceeds without mmap lock.  Most of what's run by khugepaged
      checks khugepaged_test_exit() after acquiring mmap lock:
      khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate() do so,
      for example.  But retract_page_tables() did not: fix that.
      
      The fix is for retract_page_tables() to check khugepaged_test_exit(),
      after acquiring mmap lock, before doing anything to the page table.
      Getting the mmap lock serializes with __mmput(), which briefly takes and
      drops it in __khugepaged_exit(); then the khugepaged_test_exit() check on
      mm_users makes sure we don't touch the page table once exit_mmap() might
      reach it, since exit_mmap() will be proceeding without mmap lock, not
      expecting anyone to be racing with it.
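      
      A sketch of the fixed ordering in retract_page_tables(), with the
      collapse work itself elided:
      
        if (mmap_write_trylock(mm)) {
        	/* Holding mmap lock serializes with __mmput(); re-check
        	 * for exit before touching the page table, since
        	 * exit_mmap() proceeds without mmap lock. */
        	if (!khugepaged_test_exit(mm)) {
        		/* ... clear the pmd and free the page table ... */
        	}
        	mmap_write_unlock(mm);
        }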
      
      Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021215400.27773@eggly.anvils
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • khugepaged: collapse_pte_mapped_thp() protect the pmd lock · 119a5fc1
      Committed by Hugh Dickins
      When retract_page_tables() removes a page table to make way for a huge
      pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
      pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the
      case when the original mmap_write_trylock had failed), only
      mmap_write_trylock and pmd lock are held.
      
      That's not enough.  One machine has twice crashed under load, with "BUG:
      spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b.  Examining the second
      crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving
      page_referenced() on a file THP, that had found a page table at *pmd)
      discovers that the page table page and its lock have already been freed by
      the time it comes to unlock.
      
      Follow the example of retract_page_tables(), though only one of the
      huge page lock or i_mmap_lock_write is needed to guard against this.
      Because the huge page lock is the narrower lock, and because holding it
      lets collapse_pte_mapped_thp() know the hpage earlier, choose to rely
      on the huge page lock here.
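      
      A sketch of taking the huge page lock up front in
      collapse_pte_mapped_thp() (lookup and error handling simplified):
      
        /* Lock the hpage first: the narrower of the two locks that
         * retract_page_tables() holds, and enough to stop a racing
         * page_vma_mapped_walk() from finding the page table freed
         * beneath its pmd lock. */
        hpage = find_lock_page(vma->vm_file->f_mapping,
        		       linear_page_index(vma, haddr));
        if (!hpage)
        	return;
        /* ... collapse the pmd, then unlock_page(hpage), put_page(hpage) ... */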
      
      Fixes: 27e1f827 ("khugepaged: enable collapse pmd for pte-mapped THP")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: <stable@vger.kernel.org>	[5.4+]
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • khugepaged: collapse_pte_mapped_thp() flush the right range · 723a80da
      Committed by Hugh Dickins
      pmdp_collapse_flush() should be given the start address at which the huge
      page is mapped, haddr: it was given addr, which at that point has been
      used as a local variable, incremented to the end address of the extent.
      
      Found by source inspection while chasing a hugepage locking bug, which I
      then could not explain by this.  At first I thought this was very bad;
      then saw that all of the page translations that were not flushed would
      actually still point to the right pages afterwards, so harmless; then
      realized that I know nothing of how different architectures and models
      cache intermediate paging structures, so maybe it matters after all -
      particularly since the page table concerned is immediately freed.
      
      Much easier to fix than to think about.
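      
      The fix itself is a one-liner; a sketch, with variable names as I read
      them in mm/khugepaged.c:
      
        /* Was: pmdp_collapse_flush(vma, addr, pmd) -- but by this point
         * 'addr' has been advanced by the pte loop to the end address
         * of the extent.  Flush at the huge-page start address instead. */
        _pmd = pmdp_collapse_flush(vma, haddr, pmd);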
      
      Fixes: 27e1f827 ("khugepaged: enable collapse pmd for pte-mapped THP")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: <stable@vger.kernel.org>	[5.4+]
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021204390.27773@eggly.anvils
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 25 Jul, 2020: 1 commit
  11. 10 Jun, 2020: 3 commits
  12. 04 Jun, 2020: 12 commits
  13. 29 May, 2020: 1 commit
  14. 08 Apr, 2020: 3 commits
  15. 03 Apr, 2020: 2 commits
  16. 02 Dec, 2019: 1 commit
  17. 16 Nov, 2019: 1 commit
    • mm,thp: recheck each page before collapsing file THP · 4655e5e5
      Committed by Song Liu
      In collapse_file(), for the !is_shmem case, the current check cannot
      guarantee that the locked page is up to date.  Specifically,
      xas_unlock_irq() should not be called before lock_page() and
      get_page(), and it is necessary to recheck PageUptodate() after
      locking the page.
      
      With this bug and CONFIG_READ_ONLY_THP_FOR_FS=y, madvise(HUGE)'ed .text
      may contain corrupted data.  This is because khugepaged mistakenly
      collapses some not up-to-date sub pages into a huge page, and assumes
      the huge page is up-to-date.  This will NOT corrupt data on disk,
      because the page is read-only and never written back.  Fix this by
      properly checking PageUptodate() after locking the page.  This check
      replaces "VM_BUG_ON_PAGE(!PageUptodate(page), page);".
      
      Also, move the PageDirty() check to after locking the page.  Current
      khugepaged should not try to collapse a dirty file THP, because it is
      limited to read-only .text.  The only case we hit a dirty page here is
      when the page has been written to but not yet written back.  Bail out
      and retry when this happens.
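      
      A sketch of the corrected ordering for the !is_shmem case (labels and
      cleanup paths are illustrative, not the exact diff):
      
        /* Pin and trylock the page before dropping the xarray lock:
         * trylock_page() does not sleep, so it is safe under xas_lock. */
        get_page(page);
        if (!trylock_page(page)) {
        	put_page(page);
        	xas_unlock_irq(&xas);
        	goto out_retry;		/* illustrative label */
        }
        xas_unlock_irq(&xas);
      
        /* Re-check under the page lock: the earlier unlocked check may
         * have raced with a failed or still-pending read. */
        if (!PageUptodate(page) || PageDirty(page)) {
        	unlock_page(page);
        	put_page(page);
        	goto out_retry;
        }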
      
      syzbot reported a bug in a previous version of this patch.
      
      Link: http://lkml.kernel.org/r/20191106060930.2571389-2-songliubraving@fb.com
      Fixes: 99cb0dbd ("mm,thp: add read-only THP support for (non-shmem) FS")
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Reported-by: syzbot+efb9e48b9fbdc49bb34a@syzkaller.appspotmail.com
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 07 Nov, 2019: 1 commit
  19. 25 Sep, 2019: 3 commits