1. 12 Dec 2022: 1 commit
  2. 01 Dec 2022: 5 commits
    • mm/page_alloc: simplify locking during free_unref_page_list · a4bafffb
      Authored by Mel Gorman
      While freeing a large list, the zone lock will be released and reacquired
      to avoid long hold times since commit c24ad77d ("mm/page_alloc.c:
      avoid excessive IRQ disabled times in free_unref_page_list()").  As
      suggested by Vlastimil Babka, the lock release/reacquire logic can be
      simplified by reusing the logic that acquires a different lock when
      changing zones.
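      To illustrate the shape of the resulting loop, here is a minimal userspace
      sketch of the batching idea, not the kernel code: the names (free_page_list,
      BATCH, free_one_page_locked) and the pthread mutexes standing in for the
      zone lock are invented for the example.  When the batch expires, the
      currently locked zone is simply forgotten, so the existing "different zone"
      branch reacquires the lock on the next iteration.

        #include <stddef.h>
        #include <pthread.h>

        #define BATCH 63                      /* illustrative batch size */

        struct zone { pthread_mutex_t lock; };
        struct page { struct page *next; struct zone *zone; };

        static void free_one_page_locked(struct page *page) { (void)page; }

        void free_page_list(struct page *list)
        {
                struct zone *locked_zone = NULL;
                int batch_count = 0;

                for (struct page *page = list; page; page = page->next) {
                        /* One branch handles both "zone changed" and "batch expired". */
                        if (page->zone != locked_zone) {
                                if (locked_zone)
                                        pthread_mutex_unlock(&locked_zone->lock);
                                locked_zone = page->zone;
                                pthread_mutex_lock(&locked_zone->lock);
                                batch_count = 0;
                        }

                        free_one_page_locked(page);

                        /* Drop the lock periodically to bound hold times; the
                         * next iteration's zone check reacquires it. */
                        if (++batch_count >= BATCH) {
                                pthread_mutex_unlock(&locked_zone->lock);
                                locked_zone = NULL;
                        }
                }

                if (locked_zone)
                        pthread_mutex_unlock(&locked_zone->lock);
        }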
      
      Link: https://lkml.kernel.org/r/20221122131229.5263-3-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/page_alloc: leave IRQs enabled for per-cpu page allocations · 57490774
      Authored by Mel Gorman
      The pcp_spin_lock_irqsave protecting the PCP lists is IRQ-safe as a task
      allocating from the PCP must not re-enter the allocator from IRQ context. 
      In each instance where IRQ-reentrancy is possible, the lock is acquired
      using pcp_spin_trylock_irqsave() even though IRQs are disabled and
      re-entrancy is impossible.
      
      Demoting the lock to pcp_spin_lock avoids an IRQ disable/enable in the
      common case at the cost of some IRQ allocations taking a slower path.  If
      the PCP lists need to be refilled, the zone lock still needs to disable
      IRQs but that will only happen on PCP refill and drain.  If an IRQ is
      raised when a PCP allocation is in progress, the trylock will fail and
      fall back to using the buddy lists directly.  Note that this may not be a
      universal win if an interrupt-intensive workload also allocates heavily
      from interrupt context and contends heavily on the zone->lock as a result.
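      As a rough illustration of the trylock pattern (a userspace sketch, not the
      kernel's pcp_spin_trylock API; alloc_page_fastpath, pop_from_pcp_locked and
      alloc_from_buddy_slowpath are names invented for the example): the common
      case takes a cheap lock without touching the IRQ state, and any failure to
      get the lock, or an empty cache, falls back to the slower shared path.

        #include <stddef.h>
        #include <pthread.h>

        struct pcp_cache { pthread_mutex_t lock; int count; };

        /* Placeholders standing in for the buddy allocator and the PCP pop. */
        static void *alloc_from_buddy_slowpath(void) { return NULL; }
        static void *pop_from_pcp_locked(struct pcp_cache *pcp)
        {
                pcp->count--;
                return &pcp->count;     /* dummy non-NULL result */
        }

        void *alloc_page_fastpath(struct pcp_cache *pcp)
        {
                void *page = NULL;

                /* Fast path: no IRQ disable/enable, just a trylock. */
                if (pthread_mutex_trylock(&pcp->lock) == 0) {
                        if (pcp->count > 0)
                                page = pop_from_pcp_locked(pcp);
                        pthread_mutex_unlock(&pcp->lock);
                }

                /* Trylock failed (the analogue of an interrupt arriving while
                 * the per-cpu lock is held) or the cache was empty: take the
                 * slow path. */
                if (!page)
                        page = alloc_from_buddy_slowpath();
                return page;
        }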
      
      [mgorman@techsingularity.net: migratetype might be wrong if a PCP was locked]
        Link: https://lkml.kernel.org/r/20221122131229.5263-2-mgorman@techsingularity.net
      [yuzhao@google.com: reported lockdep issue on IO completion from softirq]
      [hughd@google.com: fix list corruption, lock improvements, micro-optimisations]
      Link: https://lkml.kernel.org/r/20221118101714.19590-3-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/page_alloc: always remove pages from temporary list · c3e58a70
      Authored by Mel Gorman
      Patch series "Leave IRQs enabled for per-cpu page allocations", v3.
      
      
      This patch (of 2):
      
      free_unref_page_list() has never properly removed pages from the list
      of pages to free.  It works by coincidence because list_add happens to
      do the right thing when pages are only added to the PCP lists.
      However, a later patch adds pages to either the PCP list or the zone
      list but only properly deletes the page from the list in one path,
      leading to list corruption and a subsequent failure.  As a preparation
      patch, always delete the pages from one list properly before adding
      them to another.  On its own this fixes nothing, and it adds a
      fractional amount of overhead, but it is critical to the next patch.
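      A tiny self-contained C illustration of the rule being enforced (plain
      doubly linked lists, not the kernel's list.h; the node layout and helper
      names are invented for the example): once there is more than one possible
      destination list, an entry must be unlinked from its current list before
      being linked elsewhere, otherwise the source list is left pointing through
      a node whose links now belong to another list.

        #include <stdio.h>

        struct node { struct node *prev, *next; int id; };

        static void list_init(struct node *head) { head->prev = head->next = head; }

        static void list_del(struct node *n)
        {
                n->prev->next = n->next;
                n->next->prev = n->prev;
        }

        static void list_add(struct node *n, struct node *head)
        {
                n->next = head->next;
                n->prev = head;
                head->next->prev = n;
                head->next = n;
        }

        int main(void)
        {
                struct node src, dst_a, dst_b, pages[4];

                list_init(&src);
                list_init(&dst_a);
                list_init(&dst_b);
                for (int i = 0; i < 4; i++) {
                        pages[i].id = i;
                        list_add(&pages[i], &src);
                }

                /* The pattern the patch enforces: always unlink first, then
                 * choose a destination list. */
                while (src.next != &src) {
                        struct node *n = src.next;
                        list_del(n);
                        list_add(n, (n->id & 1) ? &dst_a : &dst_b);
                }

                printf("source empty: %s\n", src.next == &src ? "yes" : "no");
                return 0;
        }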
      
      Link: https://lkml.kernel.org/r/20221118101714.19590-1-mgorman@techsingularity.net
      Link: https://lkml.kernel.org/r/20221118101714.19590-2-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Reported-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm,thp,rmap: subpages_mapcount COMPOUND_MAPPED if PMD-mapped · 4b51634c
      Authored by Hugh Dickins
      Can the lock_compound_mapcount() bit_spin_lock apparatus be removed now? 
      Yes.  Not by atomic64_t or cmpxchg games, those get difficult on 32-bit;
      but if we slightly abuse subpages_mapcount by additionally demanding that
      one bit be set there when the compound page is PMD-mapped, then a cascade
      of two atomic ops is able to maintain the stats without bit_spin_lock.
      
      This is harder to reason about than when bit_spin_locked, but I believe
      it is safe; and no drift in the stats was detected when testing.  When
      there are racing removes and adds, the sequence of operations is of
      course less well-defined; but each operation on subpages_mapcount is
      atomically good.  What might be disastrous is if subpages_mapcount could
      ever fleetingly appear negative: but the pte lock (or pmd lock) these
      rmap functions are called under ensures that a last remove cannot race
      ahead of a first add.
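      A simplified model of the flag-in-the-counter trick (illustrative C11
      atomics, not the kernel's layout; the bit position of COMPOUND_MAPPED and
      the assumption of at most one PMD mapping are for the example only): the
      low bits count PTE-mapped subpages, one high bit says "PMD-mapped", and
      each transition is a single atomic add or sub, so no bit_spin_lock is
      needed.

        #include <stdatomic.h>
        #include <stdbool.h>

        #define COMPOUND_MAPPED (1u << 24)            /* illustrative flag bit */
        #define SUBPAGES_MASK   (COMPOUND_MAPPED - 1)

        /* Low bits: number of PTE-mapped subpages; flag bit: PMD-mapped.
         * Assumes at most one PMD mapping of the page, for simplicity. */
        static atomic_uint subpages_mapcount;

        void map_pmd(void)   { atomic_fetch_add(&subpages_mapcount, COMPOUND_MAPPED); }
        void unmap_pmd(void) { atomic_fetch_sub(&subpages_mapcount, COMPOUND_MAPPED); }
        void map_pte(void)   { atomic_fetch_add(&subpages_mapcount, 1); }
        void unmap_pte(void) { atomic_fetch_sub(&subpages_mapcount, 1); }

        bool page_is_mapped(void)
        {
                unsigned int v = atomic_load(&subpages_mapcount);

                /* Mapped if PMD-mapped or any subpage is PTE-mapped. */
                return (v & COMPOUND_MAPPED) || (v & SUBPAGES_MASK);
        }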
      
      Continue to make an exception for hugetlb (PageHuge) pages, though that
      exception can be easily removed by a further commit if necessary: leave
      subpages_mapcount 0, don't bother with COMPOUND_MAPPED in its case, just
      carry on checking compound_mapcount too in folio_mapped(), page_mapped().
      
      Evidence is that this way goes slightly faster than the previous
      implementation in all cases (pmds after ptes now taking around 103ms); and
      relieves us of worrying about contention on the bit_spin_lock.
      
      Link: https://lkml.kernel.org/r/3978f3ca-5473-55a7-4e14-efea5968d892@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dan Carpenter <error27@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: James Houghton <jthoughton@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm,thp,rmap: simplify compound page mapcount handling · cb67f428
      Authored by Hugh Dickins
      Compound page (folio) mapcount calculations have been different for anon
      and file (or shmem) THPs, and involved the obscure PageDoubleMap flag. 
      And each huge mapping and unmapping of a file (or shmem) THP involved
      atomically incrementing and decrementing the mapcount of every subpage of
      that huge page, dirtying many struct page cachelines.
      
      Add subpages_mapcount field to the struct folio and first tail page, so
      that the total of subpage mapcounts is available in one place near the
      head: then page_mapcount() and total_mapcount() and page_mapped(), and
      their folio equivalents, are so quick that anon and file and hugetlb don't
      need to be optimized differently.  Delete the unloved PageDoubleMap.
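      Schematically (not the kernel's struct page/folio layout; the field names
      and plain counters below are illustrative), keeping a running total next
      to the compound mapcount is what turns the mapped/mapcount queries into a
      couple of reads near the head instead of a walk over every subpage:

        #include <stdbool.h>

        #define SUBPAGES_PER_HUGE_PAGE 512            /* e.g. 2MB / 4KB */

        struct huge_page {
                int compound_mapcount;                 /* PMD-level mappings */
                int subpages_mapcount;                 /* running total of PTE-level mappings */
                int subpage_mapcount[SUBPAGES_PER_HUGE_PAGE]; /* per-subpage detail */
        };

        /* O(1): no loop over the 512 per-subpage counters is needed. */
        int total_mapcount(const struct huge_page *hp)
        {
                return hp->compound_mapcount + hp->subpages_mapcount;
        }

        bool page_mapped(const struct huge_page *hp)
        {
                return hp->compound_mapcount > 0 || hp->subpages_mapcount > 0;
        }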
      
      page_add and page_remove rmap functions must now maintain the
      subpages_mapcount as well as the subpage _mapcount, when dealing with pte
      mappings of huge pages; and correct maintenance of NR_ANON_MAPPED and
      NR_FILE_MAPPED statistics still needs reading through the subpages, using
      nr_subpages_unmapped() - but only when first or last pmd mapping finds
      subpages_mapcount raised (double-map case, not the common case).
      
      But are those counts (used to decide when to split an anon THP, and in
      vmscan's pagecache_reclaimable heuristic) correctly maintained?  Not
      quite: since page_remove_rmap() (and also split_huge_pmd()) is often
      called without the page lock, there can be races when a subpage pte
      mapcount goes 0<->1 while a compound pmd mapcount 0<->1 transition is
      scanning the subpages - races which the previous implementation had
      prevented.  The statistics might become inaccurate, and even drift down
      until they underflow through 0.  That is not good enough, but is better
      dealt with in a follow-up patch.
      
      Update a few comments on first and second tail page overlaid fields. 
      hugepage_add_new_anon_rmap() has to "increment" compound_mapcount, but
      subpages_mapcount and compound_pincount are already correctly at 0, so
      delete its reinitialization of compound_pincount.
      
      A simple 100 X munmap(mmap(2GB, MAP_SHARED|MAP_POPULATE, tmpfs), 2GB) took
      18 seconds on small pages, and used to take 1 second on huge pages, but
      now takes 119 milliseconds on huge pages.  Mapping by pmds a second time
      used to take 860ms and now takes 92ms; mapping by pmds after mapping by
      ptes (when the scan is needed) used to take 870ms and now takes 495ms. 
      But there might be some benchmarks which would show a slowdown, because
      tail struct pages now fall out of cache until final freeing checks them.
      
      Link: https://lkml.kernel.org/r/47ad693-717-79c8-e1ba-46c3a6602e48@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: James Houghton <jthoughton@google.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  3. 23 Nov 2022: 1 commit
  4. 29 Oct 2022: 1 commit
    • mm: prep_compound_tail() clear page->private · 5aae9265
      Authored by Hugh Dickins
      Although page allocation always clears page->private in the first page or
      head page of an allocation, it has never made a point of clearing
      page->private in the tails (though 0 is often what is already there).
      
      But now commit 71e2d666 ("mm/huge_memory: do not clobber swp_entry_t
      during THP split") issues a warning when page_tail->private is found to be
      non-0 (unless it's swapcache).
      
      Change that warning to dump page_tail (which also dumps head), instead of
      just the head: so far we have seen dead000000000122, dead000000000003,
      dead000000000001 or 0000000000000002 in the raw output for tail private.
      
      We could just delete the warning, but today's consensus appears to want
      page->private to be 0, unless there's a good reason for it to be set: so
      now clear it in prep_compound_tail() (more general than just for THP; but
      not for high order allocation, which makes no pass down the tails).
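      The change itself is conceptually tiny; as a hedged sketch (simplified
      types, not the kernel's struct page or the real prep_compound_tail()),
      it amounts to zeroing the private field while the tail pages are being
      wired up, so later code such as the THP-split warning above can rely on
      tail->private being 0 unless something deliberately set it:

        #include <stddef.h>

        struct page {
                unsigned long flags;
                unsigned long private;          /* scratch field, owner-defined meaning */
                struct page *compound_head;
        };

        static void prep_compound_tail(struct page *head, struct page *tail)
        {
                tail->compound_head = head;
                tail->private = 0;              /* the clearing this commit adds */
        }

        void prep_compound_page(struct page *pages, unsigned int nr)
        {
                /* pages[0] is the head; pages[1..nr-1] become its tails. */
                for (unsigned int i = 1; i < nr; i++)
                        prep_compound_tail(&pages[0], &pages[i]);
        }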
      
      Link: https://lkml.kernel.org/r/1c4233bb-4e4d-5969-fbd4-96604268a285@google.com
      Fixes: 71e2d666 ("mm/huge_memory: do not clobber swp_entry_t during THP split")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  5. 21 Oct 2022: 1 commit
  6. 13 Oct 2022: 2 commits
  7. 04 Oct 2022: 21 commits
  8. 27 Sep 2022: 2 commits
  9. 12 Sep 2022: 6 commits