1. 06 May, 2021 (4 commits)
    • mm/vmscan: replace implicit RECLAIM_ZONE checks with explicit checks · 202e35db
      Committed by Dave Hansen
      RECLAIM_ZONE was assumed to be unused because it was never explicitly
      used in the kernel.  However, there were a number of places where it was
      checked implicitly by checking 'node_reclaim_mode' for a zero value.
      
      These zero checks are not great because it is not obvious what a zero
      mode *means* in the code.  Replace them with a helper which makes it
      more obvious: node_reclaim_enabled().
      
      This helper also provides a handy place to explicitly check the
      RECLAIM_ZONE bit itself.  Check it explicitly there to make it more
      obvious where the bit can affect behavior.
      
      This should have no functional impact.
      
      Link: https://lkml.kernel.org/r/20210219172559.BF589C44@viggo.jf.intel.com
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Christoph Lameter <cl@linux.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: "Tobin C. Harding" <tobin@kernel.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      202e35db
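      
      For reference, a rough sketch of the helper this describes; the exact set
      of RECLAIM_* bits it tests is inferred from the changelog, so treat it as
      an assumption:
      
          /* Sketch: "is node reclaim enabled at all?" in one obvious place */
          static inline bool node_reclaim_enabled(void)
          {
                  return node_reclaim_mode &
                         (RECLAIM_ZONE | RECLAIM_WRITE | RECLAIM_UNMAP);
          }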
    • mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig · eb14d4ee
      Committed by Oscar Salvador
      pfn_range_valid_contig() bails out when it finds an in-use page or a
      hugetlb page, among other things.  We can drop the in-use page check since
      __alloc_contig_pages can migrate away those pages, and the hugetlb page
      check can go too since isolate_migratepages_range is now capable of
      dealing with hugetlb pages.  Either way, those checks are racy so let the
      end function handle it when the time comes.
      
      Link: https://lkml.kernel.org/r/20210419075413.1064-8-osalvador@suse.de
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Suggested-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eb14d4ee
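      
      For illustration, a hedged sketch of what the per-pfn scan can shrink to
      once the in-use and hugetlb checks are gone; the remaining checks and
      names are assumptions based on the description above, not the verbatim
      result:
      
          /* Only cheap, stable checks remain; racy conditions are left for
           * alloc_contig_range()/__alloc_contig_pages() to handle later. */
          for (i = start_pfn; i < end_pfn; i++) {
                  page = pfn_to_online_page(i);
                  if (!page)
                          return false;
                  if (page_zone(page) != z)
                          return false;
                  if (PageReserved(page))
                          return false;
          }
          return true;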
    • mm,compaction: let isolate_migratepages_{range,block} return error codes · c2ad7a1f
      Committed by Oscar Salvador
      Currently, isolate_migratepages_{range,block} and their callers use a pfn
      == 0 vs pfn != 0 scheme to let the caller know whether there was any error
      during isolation.
      
      This does not work as soon as we need to start reporting different error
      codes and make sure we pass them down the chain, so they are properly
      interpreted by functions such as alloc_contig_range.
      
      Let us rework isolate_migratepages_{range,block} so we can report error
      codes.  Since isolate_migratepages_block will stop returning the next pfn
      to be scanned, we reuse the cc->migrate_pfn field to keep track of that.
      
      Link: https://lkml.kernel.org/r/20210419075413.1064-3-osalvador@suse.de
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c2ad7a1f
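      
      A caller-side sketch of the convention change (variable names are
      hypothetical, for illustration only):
      
          /* Old scheme: pfn == 0 signalled an isolation error */
          pfn = isolate_migratepages_range(cc, pfn, end_pfn);
          if (!pfn)
                  return -EINTR;
      
          /* New scheme: a 0/-errno return value, while the point to resume
           * scanning from is kept in cc->migrate_pfn. */
          ret = isolate_migratepages_range(cc, pfn, end_pfn);
          if (ret)
                  return ret;
          pfn = cc->migrate_pfn;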
    • mm,page_alloc: bail out earlier on -ENOMEM in alloc_contig_migrate_range · c8e28b47
      Committed by Oscar Salvador
      Patch series "Make alloc_contig_range handle Hugetlb pages", v10.
      
      alloc_contig_range lacks the ability to handle HugeTLB pages.  This can
      be problematic for some users, e.g. CMA and virtio-mem, where the call
      will fail if alloc_contig_range ever sees a HugeTLB page, even when
      those pages lie in ZONE_MOVABLE and are free.  That problem can easily
      be solved by replacing the page in the free hugepage pool.
      
      In-use HugeTLB pages are no exception though, as they can be isolated
      and migrated like any other LRU or Movable page.
      
      This aims to improve alloc_contig_range->isolate_migratepages_block, so
      that HugeTLB pages can be recognized and handled.
      
      Since we also need to start reporting errors down the chain (e.g.
      -ENOMEM due to not being able to allocate a new hugetlb page), the
      isolate_migratepages_{range,block} interfaces need to change to start
      reporting error codes instead of the pfn == 0 vs pfn != 0 scheme they
      are using right now.  From now on, isolate_migratepages_block will not
      return the next pfn to be scanned anymore, but -EINTR, -ENOMEM or 0,
      and the next pfn to be scanned will be recorded in the cc->migrate_pfn
      field (as is already done in isolate_migratepages_range()).
      
      Below is an insight from David (thanks), where the problem can clearly be
      seen:
      
       "Start a VM with 4G. Hotplug 1G via virtio-mem and online it to
        ZONE_MOVABLE. Allocate 512 huge pages.
      
        [root@localhost ~]# cat /proc/meminfo
        MemTotal:        5061512 kB
        MemFree:         3319396 kB
        MemAvailable:    3457144 kB
        ...
        HugePages_Total:     512
        HugePages_Free:      512
        HugePages_Rsvd:        0
        HugePages_Surp:        0
        Hugepagesize:       2048 kB
      
        The huge pages get partially allocated from ZONE_MOVABLE. Try unplugging
        1G via virtio-mem (remember, all ZONE_MOVABLE). Inside the guest:
      
        [  180.058992] alloc_contig_range: [1b8000, 1c0000) PFNs busy
        [  180.060531] alloc_contig_range: [1b8000, 1c0000) PFNs busy
        [  180.061972] alloc_contig_range: [1b8000, 1c0000) PFNs busy
        [  180.063413] alloc_contig_range: [1b8000, 1c0000) PFNs busy
        [  180.064838] alloc_contig_range: [1b8000, 1c0000) PFNs busy
        [  180.065848] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
        [  180.066794] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
        [  180.067738] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
        [  180.068669] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
        [  180.069598] alloc_contig_range: [1bfc00, 1c0000) PFNs busy"
      
      And then with this patchset running:
      
       "Same experiment with ZONE_MOVABLE:
      
        a) Free huge pages: all memory can get unplugged again.
      
        b) Allocated/populated but idle huge pages: all memory can get unplugged
           again.
      
        c) Allocated/populated but all 512 huge pages are read/written in a
           loop: all memory can get unplugged again, but I get a single
      
           [  121.192345] alloc_contig_range: [180000, 188000) PFNs busy
      
           Most probably because it happened to try migrating a huge page
           while it was busy.  As virtio-mem retries on ZONE_MOVABLE a couple of
           times, it can deal with this temporary failure.
      
        Last but not least, I did something extreme:
      
        # cat /proc/meminfo
        MemTotal:        5061568 kB
        MemFree:          186560 kB
        MemAvailable:     354524 kB
        ...
        HugePages_Total:    2048
        HugePages_Free:     2048
        HugePages_Rsvd:        0
        HugePages_Surp:        0
      
        Triggering unplug would require to dissolve+alloc - which now fails
        when trying to allocate an additional ~512 huge pages (1G).
      
        As expected, I can properly see memory unplug not fully succeeding.  +
        I get a fairly continuous stream of
      
        [  226.611584] alloc_contig_range: [19f400, 19f800) PFNs busy
        ...
      
        But more importantly, the hugepage count remains stable, as configured
        by the admin (me):
      
        HugePages_Total:    2048
        HugePages_Free:     2048
        HugePages_Rsvd:        0
        HugePages_Surp:        0"
      
      This patch (of 7):
      
      Currently, __alloc_contig_migrate_range can generate -EINTR, -ENOMEM or
      -EBUSY, and report them down the chain.  The problem is that when
      migrate_pages() reports -ENOMEM, we keep going till we exhaust all the
      try-attempts (5 at the moment) instead of bailing out.
      
      migrate_pages() bails out right away on -ENOMEM because it is considered
      a fatal error.  Do the same here instead of keeping on retrying.  Note
      that this is not fixing a real issue, just a cosmetic change, although
      we can save some cycles by backing off earlier.
      
      Link: https://lkml.kernel.org/r/20210419075413.1064-1-osalvador@suse.de
      Link: https://lkml.kernel.org/r/20210419075413.1064-2-osalvador@suse.de
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c8e28b47
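      
      A hedged sketch of the retry loop in __alloc_contig_migrate_range() with
      the early bail-out described in this patch; the loop condition and the
      migrate_pages() arguments are simplified assumptions:
      
          while (pfn < end || !list_empty(&cc->migratepages)) {
                  /* isolate the next batch of pages into cc->migratepages,
                   * giving up with -EBUSY after 5 failed attempts */
      
                  ret = migrate_pages(&cc->migratepages, alloc_migration_target,
                                      NULL, (unsigned long)&mtc, cc->mode,
                                      MR_CONTIG_RANGE);
                  /*
                   * migrate_pages() treats -ENOMEM as fatal and bails out right
                   * away, so burning the remaining attempts would only waste
                   * cycles: bail out here as well.
                   */
                  if (ret == -ENOMEM)
                          break;
          }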
  2. 01 May, 2021 (19 commits)
  3. 08 Apr, 2021 (1 commit)
  4. 14 Mar, 2021 (3 commits)
    • mm/memcg: set memcg when splitting page · e1baddf8
      Committed by Zhou Guanghui
      As described in the split_page() comment, for a non-compound high-order
      page, the sub-pages must be freed individually.  If the memcg of the
      first page is valid, the tail pages cannot be uncharged when they are
      freed.
      
      For example, when alloc_pages_exact is used to allocate 1MB of contiguous
      physical memory, 2MB is charged (kmemcg is enabled and __GFP_ACCOUNT is
      set).  When make_alloc_exact frees the unused 1MB and free_pages_exact
      frees the requested 1MB, only 4KB (one page) is actually uncharged.
      
      Therefore, the memcg of the tail page needs to be set when splitting a
      page.
      
      Michel:
      
      There are at least two explicit users of __GFP_ACCOUNT with
      alloc_pages_exact added recently.  See 7efe8ef2 ("KVM: arm64:
      Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT") and c4196218
      ("KVM: s390: Add memcg accounting to KVM allocations"), so this is not
      just a theoretical issue.
      
      Link: https://lkml.kernel.org/r/20210304074053.65527-3-zhouguanghui1@huawei.com
      Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Cc: Tianhong Ding <dingtianhong@huawei.com>
      Cc: Weilong Chen <chenweilong@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e1baddf8
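      
      A sketch of the resulting split_page(); the split_page_memcg() call is
      the addition described above, while the surrounding lines are
      reconstructed from the behaviour the changelog relies on and should be
      treated as an approximation:
      
          void split_page(struct page *page, unsigned int order)
          {
                  int i;
      
                  VM_BUG_ON_PAGE(PageCompound(page), page);
                  VM_BUG_ON_PAGE(!page_count(page), page);
      
                  for (i = 1; i < (1 << order); i++)
                          set_page_refcounted(page + i);
                  split_page_owner(page, 1 << order);
                  /* let every sub-page carry its share of the memcg charge,
                   * so freeing the tail pages uncharges them as well */
                  split_page_memcg(page, 1 << order);
          }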
    • kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC · f9d79e8d
      Committed by Andrey Konovalov
      Currently, kasan_free_nondeferred_pages()->kasan_free_pages() is called
      after debug_pagealloc_unmap_pages(). This causes a crash when
      debug_pagealloc is enabled, as HW_TAGS KASAN can't set tags on an
      unmapped page.
      
      This patch puts kasan_free_nondeferred_pages() before
      debug_pagealloc_unmap_pages() and arch_free_page(), which can also make
      the page unavailable.
      
      Link: https://lkml.kernel.org/r/24cd7db274090f0e5bc3adcdc7399243668e3171.1614987311.git.andreyknvl@google.com
      Fixes: 94ab5b61 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS")
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f9d79e8d
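      
      Schematically, the ordering this fix establishes in the freeing path
      (placement is a sketch based on the description, not a verbatim excerpt):
      
          /* set/clear KASAN tags while the page is still mapped ... */
          kasan_free_nondeferred_pages(page, order);
      
          /* ... and only afterwards make the page inaccessible */
          arch_free_page(page, order);
          debug_pagealloc_unmap_pages(page, 1 << order);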
    • mm/page_alloc.c: refactor initialization of struct page for holes in memory layout · 0740a50b
      Committed by Mike Rapoport
      There could be struct pages that are not backed by actual physical memory.
      This can happen when the actual memory bank is not a multiple of
      SECTION_SIZE or when an architecture does not register memory holes
      reserved by the firmware as memblock.memory.
      
      Such pages are currently initialized using the init_unavailable_mem()
      function, which iterates through PFNs in holes in memblock.memory and,
      if there is a struct page corresponding to a PFN, sets the fields of
      this page to default values and marks it as Reserved.
      
      init_unavailable_mem() does not take into account the zone and node the
      page belongs to and sets both zone and node links in struct page to zero.
      
      Before commit 73a6e474 ("mm: memmap_init: iterate over memblock
      regions rather that check each PFN") the holes inside a zone were
      re-initialized during memmap_init() and got their zone/node links right.
      However, after that commit nothing updates the struct pages representing
      such holes.
      
      On a system that has firmware reserved holes in a zone above ZONE_DMA, for
      instance in a configuration below:
      
      	# grep -A1 E820 /proc/iomem
      	7a17b000-7a216fff : Unknown E820 type
      	7a217000-7bffffff : System RAM
      
      the unset zone link in struct page will trigger
      
      	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
      
      in set_pfnblock_flags_mask() when called with a struct page from a range
      other than E820_TYPE_RAM because there are pages in the range of
      ZONE_DMA32 but the unset zone link in struct page makes them appear as a
      part of ZONE_DMA.
      
      Interleave initialization of the unavailable pages with the normal
      initialization of the memory map, so that zone and node information will
      be properly set on struct pages that are not backed by actual memory.
      
      With this change the pages for holes inside a zone will get proper
      zone/node links and the pages that are not spanned by any node will get
      links to the adjacent zone/node.  The holes between nodes will be
      prepended to the zone/node above the hole, and the trailing pages in the
      last section will be appended to the zone/node below.
      
      [akpm@linux-foundation.org: don't initialize static to zero, use %llu for u64]
      
      Link: https://lkml.kernel.org/r/20210225224351.7356-2-rppt@kernel.org
      Fixes: 73a6e474 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Reported-by: Qian Cai <cai@lca.pw>
      Reported-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Łukasz Majczak <lma@semihalf.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Sarvela, Tomi P" <tomi.p.sarvela@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0740a50b
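      
      Conceptually, the interleaving looks roughly like the sketch below; the
      helper names and signatures are assumptions based on the changelog, not
      the exact diff:
      
          unsigned long hole_pfn = zone_start_pfn;
      
          /* walk the memblock.memory regions overlapping this zone */
          for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
                  /* initialize the memory map of the real region as before */
                  memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn);
      
                  /* give the hole preceding this region proper zone/node links */
                  if (hole_pfn < start_pfn)
                          init_unavailable_range(hole_pfn, start_pfn, zone_id, nid);
                  hole_pfn = end_pfn;
          }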
  5. 27 Feb, 2021 (1 commit)
  6. 25 Feb, 2021 (8 commits)
  7. 07 Feb, 2021 (1 commit)
    • mm: page_frag: Introduce page_frag_alloc_align() · b358e212
      Committed by Kevin Hao
      In the current implementation of page_frag_alloc(), there is no
      alignment guarantee for the returned buffer address.  But some hardware
      does require the DMA buffer to be aligned correctly, so we would have to
      use workarounds like the one below if buffers allocated by
      page_frag_alloc() are used by such hardware for DMA.
          buf = page_frag_alloc(really_needed_size + align);
          buf = PTR_ALIGN(buf, align);
      
      This code seems ugly and would waste a lot of memory if the buffers are
      used in a network driver for TX/RX.  So introduce page_frag_alloc_align()
      to make sure that an aligned buffer address is returned.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b358e212
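      
      Caller-side, the workaround above can then collapse into a single call;
      the exact parameter order of page_frag_alloc_align() shown here is an
      assumption:
      
          /* ask for an aligned fragment directly, no over-allocation needed */
          buf = page_frag_alloc_align(nc, really_needed_size, GFP_ATOMIC, align);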
  8. 27 Jan, 2021 (1 commit)
  9. 25 Jan, 2021 (2 commits)
    • kasan, mm: fix resetting page_alloc tags for HW_TAGS · acb35b17
      Committed by Andrey Konovalov
      A previous commit added resetting KASAN page tags to
      kernel_init_free_pages() to avoid false-positives due to accesses to
      metadata with the hardware tag-based mode.
      
      That commit did reset page tags before the metadata access, but didn't
      restore them after.  As a result, KASAN fails to detect bad accesses to
      page_alloc allocations on some configurations.
      
      Fix this by recovering the tag after the metadata access.
      
      Link: https://lkml.kernel.org/r/02b5bcd692e912c27d484030f666b350ad7e4ae4.1611074450.git.andreyknvl@google.com
      Fixes: aa1ef4d7 ("kasan, mm: reset tags when accessing metadata")
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      acb35b17
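      
      A sketch of the save/reset/restore pattern described above, as it could
      look inside kernel_init_free_pages() (an illustration, not the exact
      diff):
      
          for (i = 0; i < numpages; i++) {
                  u8 tag = page_kasan_tag(page + i);  /* remember the tag */
                  page_kasan_tag_reset(page + i);     /* match-all while zeroing */
                  clear_highpage(page + i);
                  page_kasan_tag_set(page + i, tag);  /* restore, so later bad
                                                         accesses still fault */
          }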
    • mm: fix initialization of struct page for holes in memory layout · d3921cb8
      Committed by Mike Rapoport
      There could be struct pages that are not backed by actual physical
      memory.  This can happen when the actual memory bank is not a multiple
      of SECTION_SIZE or when an architecture does not register memory holes
      reserved by the firmware as memblock.memory.
      
      Such pages are currently initialized using the init_unavailable_mem()
      function, which iterates through PFNs in holes in memblock.memory and,
      if there is a struct page corresponding to a PFN, sets the fields of
      this page to default values and marks the page as Reserved.
      
      init_unavailable_mem() does not take into account the zone and node the
      page belongs to and sets both zone and node links in struct page to zero.
      
      On a system that has firmware reserved holes in a zone above ZONE_DMA,
      for instance in a configuration below:
      
      	# grep -A1 E820 /proc/iomem
      	7a17b000-7a216fff : Unknown E820 type
      	7a217000-7bffffff : System RAM
      
      the unset zone link in struct page will trigger
      
      	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
      
      because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link
      in struct page) in the same pageblock.
      
      Update init_unavailable_mem() to use zone constraints defined by an
      architecture to properly set up the zone link, and use the node ID of
      the adjacent range in memblock.memory to set the node link.
      
      Link: https://lkml.kernel.org/r/20210111194017.22696-3-rppt@kernel.org
      Fixes: 73a6e474 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Reported-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d3921cb8
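      
      A hedged sketch of the zone selection this describes; the
      arch_zone_*_possible_pfn arrays are existing page_alloc internals, but
      their use here is an assumption drawn from the changelog:
      
          /* pick a zone whose architectural PFN range covers the hole */
          for (zone_id = 0; zone_id < MAX_NR_ZONES; zone_id++)
                  if (pfn >= arch_zone_lowest_possible_pfn[zone_id] &&
                      pfn <  arch_zone_highest_possible_pfn[zone_id])
                          break;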