1. 18 Jul, 2022 (3 commits)
    • mm/page_alloc: split out buddy removal code from rmqueue into separate helper · 589d9973
      Committed by Mel Gorman
      This is a preparation patch to allow the buddy removal code to be reused in
      a later patch.
      
      No functional change.
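
      As an illustration of the shape such a helper takes, here is a minimal
      sketch (hedged: the name rmqueue_buddy_sketch and the exact retry/locking
      details are illustrative, not the verbatim kernel code):

      /* Illustrative only: buddy removal pulled out of rmqueue(). */
      static struct page *rmqueue_buddy_sketch(struct zone *zone, unsigned int order,
                                               int migratetype, unsigned int alloc_flags)
      {
              struct page *page;
              unsigned long flags;

              do {
                      /* The buddy free lists are protected by the zone lock. */
                      spin_lock_irqsave(&zone->lock, flags);
                      page = __rmqueue(zone, order, migratetype, alloc_flags);
                      spin_unlock_irqrestore(&zone->lock, flags);
                      /* Retry if the freshly removed pages fail validation. */
              } while (page && check_new_pages(page, order));

              return page;
      }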
      
      Link: https://lkml.kernel.org/r/20220624125423.6126-4-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      589d9973
    • mm/page_alloc: use only one PCP list for THP-sized allocations · 5d0a661d
      Committed by Mel Gorman
      The per_cpu_pages structure is cache-aligned on a standard x86-64
      distribution configuration, but a later patch will add a new field which
      would push the structure into the next cache line.  Use only one list to
      store THP-sized pages on the per-cpu list.  This assumes that the vast
      majority of THP-sized allocations are GFP_MOVABLE, but even if they were
      another type, they would not contribute to serious fragmentation that
      could cause a later THP allocation failure.  Align per_cpu_pages on the
      cacheline boundary to ensure there is no false cache sharing.
      
      After this patch, the structure sizing is:
      
      struct per_cpu_pages {
              int                        count;                /*     0     4 */
              int                        high;                 /*     4     4 */
              int                        batch;                /*     8     4 */
              short int                  free_factor;          /*    12     2 */
              short int                  expire;               /*    14     2 */
              struct list_head           lists[13];            /*    16   208 */
      
              /* size: 256, cachelines: 4, members: 6 */
              /* padding: 32 */
      } __attribute__((__aligned__(64)));
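
      The 13 lists above come from MIGRATE_PCPTYPES (3) lists for each order up
      to PAGE_ALLOC_COSTLY_ORDER (orders 0-3), plus the single shared list for
      THP-sized pages.  A rough sketch of the resulting order-to-list-index
      mapping (illustrative, not the verbatim kernel code):

      /* Illustrative mapping from (migratetype, order) to a PCP list index. */
      static unsigned int order_to_pindex_sketch(int migratetype, int order)
      {
              /* All THP-sized allocations share one trailing list... */
              if (order > PAGE_ALLOC_COSTLY_ORDER)
                      return MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1);

              /* ...while each lower order gets one list per migratetype. */
              return (MIGRATE_PCPTYPES * order) + migratetype;
      }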
      
      Link: https://lkml.kernel.org/r/20220624125423.6126-3-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      5d0a661d
    • mm/page_alloc: add page->buddy_list and page->pcp_list · bf75f200
      Committed by Mel Gorman
      Patch series "Drain remote per-cpu directly", v5.
      
      Some setups, notably NOHZ_FULL CPUs, may be running realtime or
      latency-sensitive applications that cannot tolerate interference due to
      per-cpu drain work queued by __drain_all_pages().  Introduce a new
      mechanism to remotely drain the per-cpu lists.  It is made possible by
      remotely locking the new per-cpu spinlocks in 'struct per_cpu_pages'.
      This has two advantages: the time to drain is more predictable, and other
      unrelated tasks are not interrupted.
      
      This series has the same intent as Nicolas' series "mm/page_alloc: Remote
      per-cpu lists drain support" -- avoid interference with a high-priority
      task due to a workqueue item draining per-cpu page lists.  While many
      workloads can tolerate a brief interruption, it may cause a real-time task
      running on a NOHZ_FULL CPU to miss a deadline, and at minimum the draining
      is non-deterministic.
      
      Currently an IRQ-safe local_lock protects the page allocator per-cpu
      lists.  The local_lock on its own prevents migration and the IRQ disabling
      protects from corruption due to an interrupt arriving while a page
      allocation is in progress.
      
      This series adjusts the locking.  A spinlock is added to struct
      per_cpu_pages to protect the list contents, while local_lock_irq is
      ultimately replaced by just the spinlock in the final patch.  This allows
      a remote CPU to safely drain a remote per-cpu list.  Follow-on work should
      allow the spin_lock_irqsave to be converted to spin_lock to avoid IRQs
      being disabled/enabled in most cases.  The follow-on patch will land one
      kernel release later, as it is relatively high risk and it will make
      bisection clearer if there are any problems.
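
      As a rough illustration of the locking change over the series (simplified;
      'pagesets.lock' stands for the pre-series IRQ-safe local_lock and
      'pcp->lock' for the spinlock this series adds to struct per_cpu_pages):

      /* Before: only the local CPU may touch its own per-cpu lists. */
      local_lock_irqsave(&pagesets.lock, flags);
      /* ... add/remove pages on this CPU's per_cpu_pages lists ... */
      local_unlock_irqrestore(&pagesets.lock, flags);

      /* After (end of series): the per-pcp spinlock protects the list contents,
       * so a remote CPU can take it and drain another CPU's lists directly. */
      spin_lock_irqsave(&pcp->lock, flags);
      /* ... add/remove pages, possibly on a remote CPU's lists ... */
      spin_unlock_irqrestore(&pcp->lock, flags);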
      
      Patch 1 is a cosmetic patch to clarify when page->lru is storing buddy pages
      	and when it is storing per-cpu pages.
      
      Patch 2 shrinks per_cpu_pages to make room for a spin lock. Strictly speaking
      	this is not necessary but it avoids per_cpu_pages consuming another
      	cache line.
      
      Patch 3 is a preparation patch to avoid code duplication.
      
      Patch 4 is a minor correction.
      
      Patch 5 uses a spin_lock to protect the per_cpu_pages contents while still
      	relying on local_lock to prevent migration, stabilise the pcp
      	lookup and prevent IRQ reentrancy.
      
      Patch 6 drains remote per-cpu pages directly instead of using a workqueue.
      
      Patch 7 uses a normal spinlock instead of local_lock for remote draining.
      
      
      This patch (of 7):
      
      The page allocator uses page->lru for storing pages on either buddy or PCP
      lists.  Create page->buddy_list and page->pcp_list as a union with
      page->lru.  This is simply to clarify what type of list a page is on in
      the page allocator.
      
      No functional change intended.
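
      A minimal sketch of the idea, as a simplified excerpt rather than the full
      struct page definition:

      /* Simplified: the same storage, named for how the page allocator uses it. */
      struct page_sketch {
              union {
                      struct list_head lru;           /* generic LRU usage */
                      struct list_head buddy_list;    /* page sits on a buddy free list */
                      struct list_head pcp_list;      /* page sits on a per-cpu (PCP) list */
              };
              /* ... remaining struct page fields elided ... */
      };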
      
      [minchan@kernel.org: fix page lru fields in macros]
      Link: https://lkml.kernel.org/r/20220624125423.6126-2-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      bf75f200
  2. 04 Jul, 2022 (4 commits)
  3. 17 Jun, 2022 (1 commit)
  4. 28 May, 2022 (2 commits)
  5. 27 May, 2022 (1 commit)
    • mm/page_alloc: always attempt to allocate at least one page during bulk allocation · c572e488
      Committed by Mel Gorman
      Peter Pavlisko reported the following problem on kernel bugzilla 216007.
      
      	When I try to extract an uncompressed tar archive (2.6 million
      	files, 760.3 GiB in size) on newly created (empty) XFS file system,
      	after first low tens of gigabytes extracted the process hangs in
      	iowait indefinitely. One CPU core is 100% occupied with iowait,
      	the other CPU core is idle (on 2-core Intel Celeron G1610T).
      
      It was bisected to c9fa5630 ("xfs: use alloc_pages_bulk_array() for
      buffers") but XFS is only the messenger.  The problem is that nothing
      wakes kswapd to reclaim pages when the PCP lists cannot be refilled until
      some reclaim happens.  The bulk allocator checks whether there are already
      some pages in the array; the original intent was that a bulk allocation
      did not necessarily need all the requested pages and that it was best to
      return as quickly as possible.
      
      This was fine for the first user of the API but both NFS and XFS require
      the requested number of pages be available before making progress.  Both
      could be adjusted to call the page allocator directly if a bulk allocation
      fails but it puts a burden on users of the API.  Adjust the semantics to
      attempt at least one allocation via __alloc_pages() before returning so
      kswapd is woken if necessary.
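
      A minimal sketch of the adjusted semantics (hedged: alloc_from_pcp() is a
      hypothetical stand-in for the existing PCP fast path, and the real
      __alloc_pages_bulk() has more bookkeeping):

      static unsigned long bulk_alloc_sketch(gfp_t gfp, int preferred_nid,
                                             unsigned long nr_pages,
                                             struct page **page_array)
      {
              /* alloc_from_pcp() is a stand-in for the PCP fast path. */
              unsigned long nr_populated = alloc_from_pcp(gfp, preferred_nid,
                                                          nr_pages, page_array);

              if (!nr_populated) {
                      /* Attempt at least one page via the normal path so that
                       * kswapd is woken when the PCP lists cannot be refilled
                       * without reclaim. */
                      struct page *page = __alloc_pages(gfp, 0, preferred_nid, NULL);

                      if (page)
                              page_array[nr_populated++] = page;
              }

              return nr_populated;
      }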
      
      It was reported via bugzilla that the patch addressed the problem and that
      the tar extraction completed successfully.  This may also address bug
      215975 but has yet to be confirmed.
      
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216007
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215975
      Link: https://lkml.kernel.org/r/20220526091210.GC3441@techsingularity.net
      Fixes: 387ba26f ("mm/page_alloc: add a bulk page allocator")
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Darrick J. Wong" <djwong@kernel.org>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: <stable@vger.kernel.org>	[5.13+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c572e488
  6. 26 May, 2022 (1 commit)
    • mm: fix a potential infinite loop in start_isolate_page_range() · 88ee1343
      Committed by Zi Yan
      In isolate_single_pageblock(), called by start_isolate_page_range(), there
      are some pageblock isolation issues causing a potential infinite loop when
      isolating a page range.  This was reported by Qian Cai.
      
      1. The pageblock was isolated by just changing the pageblock migratetype
         without checking for unmovable pages. Call set_migratetype_isolate() to
         isolate the pageblock properly.
      2. An off-by-one error caused pages to be migrated unnecessarily, since the
         page does not cross a pageblock boundary.
      3. Migrating a compound page across a pageblock boundary and then splitting
         the free page later has a small race window in which the free page might
         be allocated again, so the code will try again, causing a potential
         infinite loop. Temporarily set the to-be-migrated page's pageblock to
         MIGRATE_ISOLATE to prevent that, and bail out early if no free page is
         found after page migration.
      
      An additional fix to split_free_page() aims to avoid crashing in
      __free_one_page().  When the free page is split at the specified
      split_pfn_offset, free_page_order should check both the lowest set bit of
      free_page_pfn and the highest set bit of split_pfn_offset and use the
      smaller of the two.  For example, if free_page_pfn=0x10000 and
      split_pfn_offset=0xc000, free_page_order should first be 0x8000 and then
      0x4000, instead of 0x4000 then 0x8000, which the original algorithm did.
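
      A small userspace sketch of the order selection described above, using the
      example values (GCC/Clang builtins stand in for the kernel's __ffs()/__fls()):

      #include <stdio.h>

      int main(void)
      {
              unsigned long pfn = 0x10000, split_pfn_offset = 0xc000;

              while (split_pfn_offset) {
                      /* lowest set bit of pfn vs. highest set bit of the offset */
                      unsigned int order = __builtin_ctzl(pfn);
                      unsigned int max_order = 63 - __builtin_clzl(split_pfn_offset);

                      if (max_order < order)
                              order = max_order;

                      printf("free chunk of 0x%lx pages (order %u)\n",
                             1UL << order, order);
                      pfn += 1UL << order;
                      split_pfn_offset -= 1UL << order;
              }
              return 0;       /* prints 0x8000 (order 15), then 0x4000 (order 14) */
      }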
      
      [akpm@linux-foundation.org: suppress min() warning]
      Link: https://lkml.kernel.org/r/20220524194756.1698351-1-zi.yan@sent.com
      Fixes: b2c9e2fb ("mm: make alloc_contig_range work at pageblock granularity")
      Signed-off-by: Zi Yan <ziy@nvidia.com>
      Reported-by: Qian Cai <quic_qiancai@quicinc.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric Ren <renzhengeek@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      88ee1343
  7. 20 May, 2022 (2 commits)
  8. 13 May, 2022 (6 commits)
  9. 10 May, 2022 (1 commit)
  10. 30 Apr, 2022 (1 commit)
  11. 29 Apr, 2022 (6 commits)
  12. 25 Apr, 2022 (1 commit)
  13. 16 Apr, 2022 (1 commit)
    • mm, page_alloc: fix build_zonerefs_node() · e553f62f
      Committed by Juergen Gross
      Since commit 6aa303de ("mm, vmscan: only allocate and reclaim from
      zones with pages managed by the buddy allocator") only zones with free
      memory are included in a built zonelist.  This is problematic when, e.g.,
      all memory of a zone has been ballooned out at the time the zonelists are
      rebuilt.
      
      The decision whether to rebuild the zonelists when onlining new memory
      is made based on populated_zone() returning 0 for the zone the memory
      will be added to.  The new zone is added to the zonelists only if it
      has free memory pages (managed_zone() returns a non-zero value) after
      the memory has been onlined.  This implies that onlining memory will
      always free the added pages to the allocator immediately, but this is
      not true in all cases: when e.g. running as a Xen guest, the onlined new
      memory will be added only to the ballooned memory list; it will be freed
      only when the guest is ballooned up afterwards.
      
      Another problem with using managed_zone() to decide whether a zone is
      added to the zonelists is that a zone with all of its memory in use will
      in fact be removed from all zonelists in case the zonelists happen to be
      rebuilt.
      
      Use populated_zone() when building a zonelist as it has been done before
      that commit.
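
      A sketch of the resulting check in build_zonerefs_node() (simplified, not
      the verbatim kernel code):

      static int build_zonerefs_node_sketch(pg_data_t *pgdat, struct zoneref *zonerefs)
      {
              enum zone_type zone_type = MAX_NR_ZONES;
              int nr_zones = 0;

              do {
                      struct zone *zone = pgdat->node_zones + --zone_type;

                      /* Was managed_zone(zone): a zone whose memory is entirely
                       * ballooned out (zero managed pages) would be skipped.
                       * populated_zone() keeps any zone with present pages. */
                      if (populated_zone(zone)) {
                              zoneref_set_zone(zone, &zonerefs[nr_zones++]);
                              check_highest_zone(zone_type);
                      }
              } while (zone_type);

              return nr_zones;
      }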
      
      There was a report that QubesOS (based on Xen) is hitting this problem.
      Xen switched to using the zone device functionality in kernel 5.9, and
      QubesOS wants to use memory hotplug for guests in order to be able to
      start a guest with minimal memory and expand it as needed.  This was the
      report that led to the patch.
      
      Link: https://lkml.kernel.org/r/20220407120637.9035-1-jgross@suse.com
      Fixes: 6aa303de ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e553f62f
  14. 05 Apr, 2022 (1 commit)
  15. 02 Apr, 2022 (1 commit)
  16. 31 Mar, 2022 (1 commit)
  17. 25 Mar, 2022 (7 commits)