1. 06 May 2021 (10 commits)
    • mm: apply per-task gfp constraints in fast path · da6df1b0
      Pavel Tatashin authored
      Function current_gfp_context() is called after the fast path.  However,
      soon we will add more constraints which will also limit zones based on
      context.  Move this call into the fast path, and apply the correct
      constraints for all allocations.
      
      Also update .reclaim_idx based on value returned by
      current_gfp_context() because it soon will modify the allowed zones.
      
      Note:
      With this patch we do one extra current->flags load during the fast
      path, but we already load current->flags in the fast path:
      
      __alloc_pages()
       prepare_alloc_pages()
        current_alloc_flags(gfp_mask, *alloc_flags);
      
      Later, when we add the zone constraint logic to current_gfp_context(),
      we will be able to remove the current->flags load from
      current_alloc_flags() and thereby return the fast path to its current
      performance level.
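
      The constraint stripping described above can be modeled in userspace.
      This is a minimal sketch, not the kernel's implementation: the flag
      values, the explicit task argument, and the struct are assumptions made
      so the example is self-contained.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's gfp and task flags. */
#define __GFP_IO         0x40u
#define __GFP_FS         0x80u
#define PF_MEMALLOC_NOIO 0x1u
#define PF_MEMALLOC_NOFS 0x2u

struct task { unsigned int flags; };

/* Model of current_gfp_context(): strip I/O and FS capabilities from
 * the gfp mask when the task's context forbids them.  After this
 * patch the constrained mask is computed in the fast path, so every
 * allocation sees it. */
static unsigned int current_gfp_context(const struct task *current,
					unsigned int gfp_mask)
{
	if (current->flags & PF_MEMALLOC_NOIO)
		gfp_mask &= ~(__GFP_IO | __GFP_FS);
	else if (current->flags & PF_MEMALLOC_NOFS)
		gfp_mask &= ~__GFP_FS;
	return gfp_mask;
}
```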
      
      Link: https://lkml.kernel.org/r/20210215161349.246722-7-pasha.tatashin@soleen.com
      Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
      Suggested-by: Michal Hocko <mhocko@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      da6df1b0
    • mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN · 1a08ae36
      Pavel Tatashin authored
      PF_MEMALLOC_NOCMA is used to guarantee that the allocator will not
      return pages that might belong to the CMA region.  This is currently
      used for long-term gup to make sure that such pins are not done on any
      CMA pages.
      
      When PF_MEMALLOC_NOCMA was introduced, we did not realize that it
      focuses too narrowly on CMA pages and that there is a larger class of
      pages that needs the same treatment.  The MOVABLE zone cannot contain
      any long-term pins either, so it makes sense to reuse and redefine this
      flag for that use case as well.  Rename the flag to PF_MEMALLOC_PIN,
      which defines an allocation context that can only get pages suitable
      for long-term pins.
      
      Also rename memalloc_nocma_save()/memalloc_nocma_restore() to
      memalloc_pin_save()/memalloc_pin_restore() and make the new functions
      common.
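
      The save/restore pattern behind the renamed helpers can be sketched in
      userspace.  This is a simplified model under stated assumptions (the
      flag value, the explicit task argument, and the struct are inventions
      of the sketch, not the kernel's definitions).

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's task flag. */
#define PF_MEMALLOC_PIN 0x1u

struct task { unsigned int flags; };

/* Model of memalloc_pin_save(): remember the previous PF_MEMALLOC_PIN
 * state and set the flag for the pinned-allocation section. */
static unsigned int memalloc_pin_save(struct task *current)
{
	unsigned int flags = current->flags & PF_MEMALLOC_PIN;

	current->flags |= PF_MEMALLOC_PIN;
	return flags;
}

/* Model of memalloc_pin_restore(): put the saved state back, so nested
 * sections compose correctly. */
static void memalloc_pin_restore(struct task *current, unsigned int flags)
{
	current->flags = (current->flags & ~PF_MEMALLOC_PIN) | flags;
}
```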
      
      [rppt@linux.ibm.com: fix renaming of PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN]
        Link: https://lkml.kernel.org/r/20210331163816.11517-1-rppt@kernel.org
      
      Link: https://lkml.kernel.org/r/20210215161349.246722-6-pasha.tatashin@soleen.com
      Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1a08ae36
    • mm: use proper type for cma_[alloc|release] · 78fa5150
      Minchan Kim authored
      size_t in cma_alloc() is confusing since it makes people think it is a
      byte count, not a page count.  Change it to unsigned long[1].
      
      The unsigned int in cma_release() is also not right, so change it.
      Since cma_release() now takes unsigned long, free_contig_range() should
      also respect it.
      
      [1] 67a2e213, mm: cma: fix incorrect type conversion for size during dma allocation
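
      The narrowing hazard the type change guards against can be shown in a
      few lines.  This demo is illustrative only; the numbers are not taken
      from a real CMA configuration.

```c
#include <assert.h>

/* Model of the narrowing bug: passing a large page count through an
 * 'unsigned int' parameter silently drops the high bits. */
static unsigned int narrowed(unsigned long count)
{
	return (unsigned int)count;	/* high bits silently dropped */
}
```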
      
      Link: https://lore.kernel.org/linux-mm/20210324043434.GP1719932@casper.infradead.org/
      Link: https://lkml.kernel.org/r/20210331164018.710560-1-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      78fa5150
    • mm: replace migrate_[prep|finish] with lru_cache_[disable|enable] · 361a2a22
      Minchan Kim authored
      Currently, migrate_[prep|finish] is merely a wrapper around
      lru_cache_[disable|enable].  There is not much to gain from the
      additional abstraction.
      
      Use lru_cache_[disable|enable] instead of migrate_[prep|finish], which
      is more descriptive.
      
      Note: migrate_prep_local() in compaction.c was changed to
      lru_add_drain() to keep the old behavior while avoiding the scheduling
      cost of involving many other CPUs.
      
      Link: https://lkml.kernel.org/r/20210319175127.886124-2-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
      Cc: John Dias <joaodias@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Oliver Sang <oliver.sang@intel.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      361a2a22
    • mm: disable LRU pagevec during the migration temporarily · d479960e
      Minchan Kim authored
      An LRU pagevec holds a refcount on its pages until the pagevec is
      drained.  That can prevent migration, since the refcount of such a page
      is greater than what the migration logic expects.  To mitigate the
      issue, callers of migrate_pages() drain the LRU pagevecs via
      migrate_prep() or lru_add_drain_all() before calling migrate_pages().
      
      However, that is not enough: pages that enter a pagevec after the
      draining call can still sit there and keep preventing page migration.
      Since some callers of migrate_pages() have retry logic with LRU
      draining, the page would migrate on the next trial, but this is still
      fragile in that it does not close the fundamental race between pages
      entering a pagevec and migration, so the migration failure could
      ultimately cause a contiguous memory allocation failure.
      
      To close the race, this patch disables the LRU caches (i.e., pagevecs)
      while a migration is ongoing, until the migration is done.
      
      Since the race is really hard to reproduce, I measured how many times
      migrate_pages() retried in force mode (roughly, a fallback to
      synchronous migration) with the debug code below.
      
      int migrate_pages(struct list_head *from, new_page_t get_new_page,
      			..
      			..
      
        if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
               printk(KERN_ERR "pfn 0x%lx reason %d", page_to_pfn(page), rc);
               dump_page(page, "fail to migrate");
        }
      
      The test repeatedly launched Android apps with CMA allocation in the
      background every five seconds.  The total CMA allocation count was
      about 500 during the test.  With this patch, the dump_page() count was
      reduced from 400 to 30.
      
      The new interface is also useful for memory hotplug, which currently
      drains the LRU pcp caches after each migration failure.  This is rather
      suboptimal, as it has to disrupt other running tasks during the
      operation.  With the new interface, this happens only once.  This is
      also in line with the pcp allocator caches, which are disabled for
      offlining as well.
      
      Link: https://lkml.kernel.org/r/20210319175127.886124-1-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: John Dias <joaodias@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Oliver Sang <oliver.sang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d479960e
    • mm: compaction: update the COMPACT[STALL|FAIL] events properly · 06dac2f4
      Charan Teja Reddy authored
      By definition, the COMPACT[STALL|FAIL] events need to be counted only
      when 'in at least one zone compaction wasn't deferred or skipped from
      the direct compaction'.  But when compaction is skipped or deferred,
      COMPACT_SKIPPED is returned and yet these compaction events are still
      updated, which is wrong: COMPACT[STALL|FAIL] is counted without
      compaction even being tried.
      
      Correct this by skipping the counting of these events when
      COMPACT_SKIPPED is returned for compaction.  This also indirectly
      avoids an unnecessary call into get_page_from_freelist() when
      compaction is not even tried.
      
      There is a corner case where compaction is skipped but the COMPACTSTALL
      event is still counted: an IRQ came in and freed the page, and that
      page was captured via capture_control.
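
      The gating described above can be sketched as a small userspace model.
      The enum values and counter below are stand-ins for the kernel's
      vm_event machinery, not its actual definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the compaction result and event counter. */
enum compact_result { COMPACT_SKIPPED, COMPACT_COMPLETE };

static unsigned long compactstall;

/* Model of the fix: count COMPACTSTALL only when compaction was
 * actually attempted; deferred/skipped attempts are not stalls. */
static void account_compaction(enum compact_result status)
{
	if (status == COMPACT_SKIPPED)
		return;		/* nothing was tried: do not count */
	compactstall++;
}
```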
      
      Link: https://lkml.kernel.org/r/1613151184-21213-1-git-send-email-charante@codeaurora.org
      Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      06dac2f4
    • mm/vmscan: replace implicit RECLAIM_ZONE checks with explicit checks · 202e35db
      Dave Hansen authored
      RECLAIM_ZONE was assumed to be unused because it was never explicitly
      used in the kernel.  However, there were a number of places where it was
      checked implicitly by checking 'node_reclaim_mode' for a zero value.
      
      These zero checks are not great because it is not obvious what a zero
      mode *means* in the code.  Replace them with a helper which makes it
      more obvious: node_reclaim_enabled().
      
      This helper also provides a handy place to explicitly check the
      RECLAIM_ZONE bit itself.  Check it explicitly there to make it more
      obvious where the bit can affect behavior.
      
      This should have no functional impact.
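
      The helper can be modeled in userspace.  This is a sketch under
      assumptions: the bit values mirror the kernel's RECLAIM_* flags only
      loosely and are redefined here so the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the node reclaim mode bits. */
#define RECLAIM_ZONE	(1 << 0)
#define RECLAIM_WRITE	(1 << 1)
#define RECLAIM_UNMAP	(1 << 2)

static unsigned int node_reclaim_mode;

/* Model of node_reclaim_enabled(): an explicit mask replaces the
 * former opaque 'node_reclaim_mode != 0' checks, and the RECLAIM_ZONE
 * bit is now checked explicitly. */
static bool node_reclaim_enabled(void)
{
	return node_reclaim_mode & (RECLAIM_ZONE | RECLAIM_WRITE | RECLAIM_UNMAP);
}
```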
      
      Link: https://lkml.kernel.org/r/20210219172559.BF589C44@viggo.jf.intel.com
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Christoph Lameter <cl@linux.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: "Tobin C. Harding" <tobin@kernel.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      202e35db
    • mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig · eb14d4ee
      Oscar Salvador authored
      pfn_range_valid_contig() bails out when it finds an in-use page or a
      hugetlb page, among other things.  We can drop the in-use page check since
      __alloc_contig_pages can migrate away those pages, and the hugetlb page
      check can go too since isolate_migratepages_range is now capable of
      dealing with hugetlb pages.  Either way, those checks are racy so let the
      end function handle it when the time comes.
      
      Link: https://lkml.kernel.org/r/20210419075413.1064-8-osalvador@suse.de
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Suggested-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eb14d4ee
    • mm,compaction: let isolate_migratepages_{range,block} return error codes · c2ad7a1f
      Oscar Salvador authored
      Currently, isolate_migratepages_{range,block} and their callers use a pfn
      == 0 vs pfn != 0 scheme to let the caller know whether there was any error
      during isolation.
      
      This does not work once we need to start reporting different error
      codes and make sure we pass them down the chain, so that they are
      properly interpreted by functions such as alloc_contig_range().
      
      Let us rework isolate_migratepages_{range,block} so we can report error
      codes.  Since isolate_migratepages_block will stop returning the next pfn
      to be scanned, we reuse the cc->migrate_pfn field to keep track of that.
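
      The reworked contract can be sketched in userspace: the return value is
      0 or -errno, and the next pfn to scan travels out of band in
      cc->migrate_pfn.  In this model, isolate_one() is a hypothetical
      per-pfn step standing in for the real isolation logic, and the struct
      is a toy, not the kernel's compact_control.

```c
#include <assert.h>
#include <errno.h>

struct compact_control { unsigned long migrate_pfn; };

/* Hypothetical per-pfn step; fails on one pfn for the demo. */
static int isolate_one(unsigned long pfn)
{
	return pfn == 42 ? -ENOMEM : 0;
}

/* Model of the new interface: report an error code, while the scan
 * position is recorded in cc->migrate_pfn even on failure. */
static int isolate_migratepages_range(struct compact_control *cc,
				      unsigned long start, unsigned long end)
{
	for (unsigned long pfn = start; pfn < end; pfn++) {
		int ret = isolate_one(pfn);

		cc->migrate_pfn = pfn;	/* progress, out of band */
		if (ret)
			return ret;	/* e.g. -EINTR or -ENOMEM */
	}
	return 0;
}
```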
      
      Link: https://lkml.kernel.org/r/20210419075413.1064-3-osalvador@suse.de
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c2ad7a1f
    • mm,page_alloc: bail out earlier on -ENOMEM in alloc_contig_migrate_range · c8e28b47
      Oscar Salvador authored
      Patch series "Make alloc_contig_range handle Hugetlb pages", v10.
      
      alloc_contig_range() lacks the ability to handle HugeTLB pages.  This
      can be problematic for some users, e.g. CMA and virtio-mem, where the
      call will fail if alloc_contig_range() ever sees a HugeTLB page, even
      when those pages lie in ZONE_MOVABLE and are free.  That problem can be
      easily solved by replacing the page in the free hugepage pool.
      
      In-use HugeTLB pages are no exception, though, as those can be isolated
      and migrated like any other LRU or movable page.
      
      This aims to improve alloc_contig_range->isolate_migratepages_block, so
      that HugeTLB pages can be recognized and handled.
      
      Since we also need to start reporting errors down the chain (e.g.
      -ENOMEM due to not being able to allocate a new hugetlb page), the
      isolate_migratepages_{range,block} interfaces need to change to start
      reporting error codes instead of the pfn == 0 vs pfn != 0 scheme they
      use right now.  From now on, isolate_migratepages_block will no longer
      return the next pfn to be scanned, but -EINTR, -ENOMEM or 0; the next
      pfn to be scanned will be recorded in the cc->migrate_pfn field (as is
      already done in isolate_migratepages_range()).
      
      Below is an insight from David (thanks), where the problem can clearly be
      seen:
      
       "Start a VM with 4G. Hotplug 1G via virtio-mem and online it to
        ZONE_MOVABLE. Allocate 512 huge pages.
      
        [root@localhost ~]# cat /proc/meminfo
        MemTotal:        5061512 kB
        MemFree:         3319396 kB
        MemAvailable:    3457144 kB
        ...
        HugePages_Total:     512
        HugePages_Free:      512
        HugePages_Rsvd:        0
        HugePages_Surp:        0
        Hugepagesize:       2048 kB
      
        The huge pages get partially allocate from ZONE_MOVABLE. Try unplugging
        1G via virtio-mem (remember, all ZONE_MOVABLE). Inside the guest:
      
        [  180.058992] alloc_contig_range: [1b8000, 1c0000) PFNs busy
        [  180.060531] alloc_contig_range: [1b8000, 1c0000) PFNs busy
        [  180.061972] alloc_contig_range: [1b8000, 1c0000) PFNs busy
        [  180.063413] alloc_contig_range: [1b8000, 1c0000) PFNs busy
        [  180.064838] alloc_contig_range: [1b8000, 1c0000) PFNs busy
        [  180.065848] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
        [  180.066794] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
        [  180.067738] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
        [  180.068669] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
        [  180.069598] alloc_contig_range: [1bfc00, 1c0000) PFNs busy"
      
      And then with this patchset running:
      
       "Same experiment with ZONE_MOVABLE:
      
        a) Free huge pages: all memory can get unplugged again.
      
        b) Allocated/populated but idle huge pages: all memory can get unplugged
           again.
      
        c) Allocated/populated but all 512 huge pages are read/written in a
           loop: all memory can get unplugged again, but I get a single
      
           [  121.192345] alloc_contig_range: [180000, 188000) PFNs busy
      
           Most probably because it happened to try migrating a huge page
           while it was busy.  As virtio-mem retries on ZONE_MOVABLE a couple of
           times, it can deal with this temporary failure.
      
        Last but not least, I did something extreme:
      
        # cat /proc/meminfo
        MemTotal:        5061568 kB
        MemFree:          186560 kB
        MemAvailable:     354524 kB
        ...
        HugePages_Total:    2048
        HugePages_Free:     2048
        HugePages_Rsvd:        0
        HugePages_Surp:        0
      
        Triggering unplug would require to dissolve+alloc - which now fails
        when trying to allocate an additional ~512 huge pages (1G).
      
        As expected, I can properly see memory unplug not fully succeeding.  +
        I get a fairly continuous stream of
      
        [  226.611584] alloc_contig_range: [19f400, 19f800) PFNs busy
        ...
      
        But more importantly, the hugepage count remains stable, as configured
        by the admin (me):
      
        HugePages_Total:    2048
        HugePages_Free:     2048
        HugePages_Rsvd:        0
        HugePages_Surp:        0"
      
      This patch (of 7):
      
      Currently, __alloc_contig_migrate_range can generate -EINTR, -ENOMEM or
      -EBUSY, and report them down the chain.  The problem is that when
      migrate_pages() reports -ENOMEM, we keep going till we exhaust all the
      try-attempts (5 at the moment) instead of bailing out.
      
      migrate_pages() bails out right away on -ENOMEM because it is
      considered a fatal error.  Do the same here instead of continuing to
      retry.  Note that this is not fixing a real issue, just a cosmetic
      change, although we do save some cycles by backing off earlier.
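
      The retry policy described above can be modeled with a few lines of
      userspace C.  migrate_once() is a hypothetical stand-in for one
      migrate_pages() call; the error-injection global exists only for the
      demo.

```c
#include <assert.h>
#include <errno.h>

static int fail_with;	/* error injected by the demo */

/* Hypothetical stand-in for one migrate_pages() attempt. */
static int migrate_once(void)
{
	return fail_with;
}

/* Model of the fixed loop: -ENOMEM is fatal and aborts immediately,
 * while other failures burn one of the five attempts. */
static int alloc_contig_migrate_range(void)
{
	for (int tries = 0; tries < 5; tries++) {
		int ret = migrate_once();

		if (ret == 0)
			return 0;
		if (ret == -ENOMEM)
			return ret;	/* fatal: bail out right away */
	}
	return -EBUSY;			/* attempts exhausted */
}
```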
      
      Link: https://lkml.kernel.org/r/20210419075413.1064-1-osalvador@suse.de
      Link: https://lkml.kernel.org/r/20210419075413.1064-2-osalvador@suse.de
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c8e28b47
  2. 01 May 2021 (19 commits)
  3. 08 Apr 2021 (1 commit)
  4. 14 Mar 2021 (3 commits)
    • mm/memcg: set memcg when splitting page · e1baddf8
      Zhou Guanghui authored
      As described in the split_page() comment, for a non-compound high-order
      page, the sub-pages must be freed individually.  If the memcg of the
      first page is valid, the tail pages cannot be uncharged when they are
      freed.
      
      For example, when alloc_pages_exact() is used to allocate 1MB of
      contiguous physical memory, 2MB is charged (kmemcg is enabled and
      __GFP_ACCOUNT is set).  When make_alloc_exact() frees the unused 1MB
      and free_pages_exact() frees the requested 1MB, only 4KB (one page) is
      actually uncharged.
      
      Therefore, the memcg of the tail pages needs to be set when splitting a
      page.
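
      The propagation step can be sketched in userspace.  The struct below is
      a toy stand-in for the kernel's struct page, and the function name
      mirrors the idea rather than the kernel's exact implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the memcg link matters here. */
struct page { void *memcg_data; };

/* Model of the fix: when a high-order page is split, copy the head
 * page's memcg link to every tail page, so each sub-page can be
 * uncharged when it is freed individually. */
static void split_page_memcg(struct page *pages, unsigned int nr)
{
	if (!pages[0].memcg_data)
		return;
	for (unsigned int i = 1; i < nr; i++)
		pages[i].memcg_data = pages[0].memcg_data;
}
```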
      
      Michel:
      
      There are at least two explicit users of __GFP_ACCOUNT with
      alloc_pages_exact() added recently.  See 7efe8ef2 ("KVM: arm64:
      Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT") and c4196218
      ("KVM: s390: Add memcg accounting to KVM allocations"), so this is not
      just a theoretical issue.
      
      Link: https://lkml.kernel.org/r/20210304074053.65527-3-zhouguanghui1@huawei.com
      Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Cc: Tianhong Ding <dingtianhong@huawei.com>
      Cc: Weilong Chen <chenweilong@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e1baddf8
    • kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC · f9d79e8d
      Andrey Konovalov authored
      Currently, kasan_free_nondeferred_pages()->kasan_free_pages() is called
      after debug_pagealloc_unmap_pages(). This causes a crash when
      debug_pagealloc is enabled, as HW_TAGS KASAN can't set tags on an
      unmapped page.
      
      This patch puts kasan_free_nondeferred_pages() before
      debug_pagealloc_unmap_pages() and arch_free_page(), which can also make
      the page unavailable.
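
      The ordering constraint can be modeled in userspace.  This is a sketch
      only: the state flags stand in for real page state, and the function
      bodies model the invariant (tags can only be set while the page is
      mapped), not the kernel's actual implementations.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page state: mapped by debug_pagealloc, tagged by HW_TAGS KASAN. */
struct page_state { bool mapped; bool tagged; };

/* Model: HW_TAGS KASAN can only set tags on a mapped page. */
static void kasan_free_pages(struct page_state *p)
{
	assert(p->mapped);
	p->tagged = true;
}

static void debug_pagealloc_unmap_pages(struct page_state *p)
{
	p->mapped = false;
}

/* Model of the fixed free path: tag first, then unmap. */
static void free_path(struct page_state *p)
{
	kasan_free_pages(p);		/* now runs first */
	debug_pagealloc_unmap_pages(p);	/* then the page goes away */
}
```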
      
      Link: https://lkml.kernel.org/r/24cd7db274090f0e5bc3adcdc7399243668e3171.1614987311.git.andreyknvl@google.com
      Fixes: 94ab5b61 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS")
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f9d79e8d
    • mm/page_alloc.c: refactor initialization of struct page for holes in memory layout · 0740a50b
      Mike Rapoport authored
      There could be struct pages that are not backed by actual physical memory.
      This can happen when the actual memory bank is not a multiple of
      SECTION_SIZE or when an architecture does not register memory holes
      reserved by the firmware as memblock.memory.
      
      Such pages are currently initialized using init_unavailable_mem() function
      that iterates through PFNs in holes in memblock.memory and if there is a
      struct page corresponding to a PFN, the fields of this page are set to
      default values and it is marked as Reserved.
      
      init_unavailable_mem() does not take into account zone and node the page
      belongs to and sets both zone and node links in struct page to zero.
      
      Before commit 73a6e474 ("mm: memmap_init: iterate over memblock
      regions rather that check each PFN") the holes inside a zone were
      re-initialized during memmap_init() and got their zone/node links right.
      However, after that commit nothing updates the struct pages representing
      such holes.
      
      On a system that has firmware reserved holes in a zone above ZONE_DMA, for
      instance in a configuration below:
      
      	# grep -A1 E820 /proc/iomem
      	7a17b000-7a216fff : Unknown E820 type
      	7a217000-7bffffff : System RAM
      
      unset zone link in struct page will trigger
      
      	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
      
      in set_pfnblock_flags_mask() when called with a struct page from a range
      other than E820_TYPE_RAM because there are pages in the range of
      ZONE_DMA32 but the unset zone link in struct page makes them appear as a
      part of ZONE_DMA.
      
      Interleave initialization of the unavailable pages with the normal
      initialization of memory map, so that zone and node information will be
      properly set on struct pages that are not backed by the actual memory.
      
      With this change the pages for holes inside a zone will get proper
      zone/node links, and the pages that are not spanned by any node will
      get links to the adjacent zone/node.  The holes between nodes will be
      prepended to the zone/node above the hole, and the trailing pages in
      the last section will be appended to the zone/node below.
      
      [akpm@linux-foundation.org: don't initialize static to zero, use %llu for u64]
      
      Link: https://lkml.kernel.org/r/20210225224351.7356-2-rppt@kernel.org
      Fixes: 73a6e474 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Reported-by: Qian Cai <cai@lca.pw>
      Reported-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Łukasz Majczak <lma@semihalf.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Sarvela, Tomi P" <tomi.p.sarvela@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0740a50b
  5. 27 Feb 2021 (1 commit)
  6. 25 Feb 2021 (6 commits)