1. 28 6月, 2022 1 次提交
  2. 17 6月, 2022 2 次提交
  3. 29 4月, 2022 4 次提交
    • J
      mm/sparse-vmemmap: improve memory savings for compound devmaps · 4917f55b
      Joao Martins 提交于
      A compound devmap is a dev_pagemap with @vmemmap_shift > 0 and it means
      that pages are mapped at a given huge page alignment and utilize uses
      compound pages as opposed to order-0 pages.
      
      Take advantage of the fact that most tail pages look the same (except the
      first two) to minimize struct page overhead.  Allocate a separate page for
      the vmemmap area which contains the head page and separate for the next 64
      pages.  The rest of the subsections then reuse this tail vmemmap page to
      initialize the rest of the tail pages.
      
      Sections are arch-dependent (e.g.  on x86 it's 64M, 128M or 512M) and when
      initializing compound devmap with big enough @vmemmap_shift (e.g.  1G PUD)
      it may cross multiple sections.  The vmemmap code needs to consult @pgmap
      so that multiple sections that all map the same tail data can refer back
      to the first copy of that data for a given gigantic page.
      
      On compound devmaps with 2M align, this mechanism lets 6 pages be saved
      out of the 8 necessary PFNs necessary to set the subsection's 512 struct
      pages being mapped.  On a 1G compound devmap it saves 4094 pages.
      
      Altmap isn't supported yet, given various restrictions in altmap pfn
      allocator, thus fallback to the already in use vmemmap_populate().  It is
      worth noting that altmap for devmap mappings was there to relieve the
      pressure of inordinate amounts of memmap space to map terabytes of pmem. 
      With compound pages the motivation for altmaps for pmem gets reduced.
      
      Link: https://lkml.kernel.org/r/20220420155310.9712-5-joao.m.martins@oracle.comSigned-off-by: NJoao Martins <joao.m.martins@oracle.com>
      Reviewed-by: NMuchun Song <songmuchun@bytedance.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      4917f55b
    • J
      mm/sparse-vmemmap: refactor core of vmemmap_populate_basepages() to helper · 2beea70a
      Joao Martins 提交于
      In preparation for describing a memmap with compound pages, move the
      actual pte population logic into a separate function
      vmemmap_populate_address() and have a new helper vmemmap_populate_range()
      walk through all base pages it needs to populate.
      
      While doing that, change the helper to use a pte_t* as return value,
      rather than an hardcoded errno of 0 or -ENOMEM.
      
      Link: https://lkml.kernel.org/r/20220420155310.9712-3-joao.m.martins@oracle.comSigned-off-by: NJoao Martins <joao.m.martins@oracle.com>
      Reviewed-by: NMuchun Song <songmuchun@bytedance.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      2beea70a
    • J
      mm/sparse-vmemmap: add a pgmap argument to section activation · e3246d8f
      Joao Martins 提交于
      Patch series "sparse-vmemmap: memory savings for compound devmaps (device-dax)", v9.
      
      This series minimizes 'struct page' overhead by pursuing a similar
      approach as Muchun Song series "Free some vmemmap pages of hugetlb page"
      (now merged since v5.14), but applied to devmap with @vmemmap_shift
      (device-dax).  
      
      The vmemmap dedpulication original idea (already used in HugeTLB) is to
      reuse/deduplicate tail page vmemmap areas, particular the area which only
      describes tail pages.  So a vmemmap page describes 64 struct pages, and
      the first page for a given ZONE_DEVICE vmemmap would contain the head page
      and 63 tail pages.  The second vmemmap page would contain only tail pages,
      and that's what gets reused across the rest of the subsection/section. 
      The bigger the page size, the bigger the savings (2M hpage -> save 6
      vmemmap pages; 1G hpage -> save 4094 vmemmap pages).  
      
      This is done for PMEM /specifically only/ on device-dax configured
      namespaces, not fsdax.  In other words, a devmap with a @vmemmap_shift.
      
      In terms of savings, per 1Tb of memory, the struct page cost would go down
      with compound devmap:
      
      * with 2M pages we lose 4G instead of 16G (0.39% instead of 1.5% of
        total memory)
      
      * with 1G pages we lose 40MB instead of 16G (0.0014% instead of 1.5% of
        total memory)
      
      The series is mostly summed up by patch 4, and to summarize what the
      series does:
      
      Patches 1 - 3: Minor cleanups in preparation for patch 4.  Move the very
      nice docs of hugetlb_vmemmap.c into a Documentation/vm/ entry.
      
      Patch 4: Patch 4 is the one that takes care of the struct page savings
      (also referred to here as tail-page/vmemmap deduplication).  Much like
      Muchun series, we reuse the second PTE tail page vmemmap areas across a
      given @vmemmap_shift On important difference though, is that contrary to
      the hugetlbfs series, there's no vmemmap for the area because we are
      late-populating it as opposed to remapping a system-ram range.  IOW no
      freeing of pages of already initialized vmemmap like the case for
      hugetlbfs, which greatly simplifies the logic (besides not being
      arch-specific).  altmap case unchanged and still goes via the
      vmemmap_populate().  Also adjust the newly added docs to the device-dax
      case.
      
      [Note that device-dax is still a little behind HugeTLB in terms of
      savings.  I have an additional simple patch that reuses the head vmemmap
      page too, as a follow-up.  That will double the savings and namespaces
      initialization.]
      
      Patch 5: Initialize fewer struct pages depending on the page size with
      DRAM backed struct pages -- because fewer pages are unique and most tail
      pages (with bigger vmemmap_shift).
      
          NVDIMM namespace bootstrap improves from ~268-358 ms to
          ~80-110/<1ms on 128G NVDIMMs with 2M and 1G respectivally.  And struct
          page needed capacity will be 3.8x / 1071x smaller for 2M and 1G
          respectivelly.  Tested on x86 with 1.5Tb of pmem (including pinning,
          and RDMA registration/deregistration scalability with 2M MRs)
      
      
      This patch (of 5):
      
      In support of using compound pages for devmap mappings, plumb the pgmap
      down to the vmemmap_populate implementation.  Note that while altmap is
      retrievable from pgmap the memory hotplug code passes altmap without
      pgmap[*], so both need to be independently plumbed.
      
      So in addition to @altmap, pass @pgmap to sparse section populate
      functions namely:
      
      	sparse_add_section
      	  section_activate
      	    populate_section_memmap
         	      __populate_section_memmap
      
      Passing @pgmap allows __populate_section_memmap() to both fetch the
      vmemmap_shift in which memmap metadata is created for and also to let
      sparse-vmemmap fetch pgmap ranges to co-relate to a given section and pick
      whether to just reuse tail pages from past onlined sections.
      
      While at it, fix the kdoc for @altmap for sparse_add_section().
      
      [*] https://lore.kernel.org/linux-mm/20210319092635.6214-1-osalvador@suse.de/
      
      Link: https://lkml.kernel.org/r/20220420155310.9712-1-joao.m.martins@oracle.com
      Link: https://lkml.kernel.org/r/20220420155310.9712-2-joao.m.martins@oracle.comSigned-off-by: NJoao Martins <joao.m.martins@oracle.com>
      Reviewed-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NMuchun Song <songmuchun@bytedance.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      e3246d8f
    • M
      mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP* · 47010c04
      Muchun Song 提交于
      The word of "free" is not expressive enough to express the feature of
      optimizing vmemmap pages associated with each HugeTLB, rename this keywork
      to "optimize".  In this patch , cheanup configs to make code more
      expressive.
      
      Link: https://lkml.kernel.org/r/20220404074652.68024-4-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      47010c04
  4. 23 3月, 2022 3 次提交
    • M
      mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP · e5408417
      Muchun Song 提交于
      The vmemmap_remap_free/alloc are relevant to HugeTLB, so move those
      functiongs to the scope of CONFIG_HUGETLB_PAGE_FREE_VMEMMAP.
      
      Link: https://lkml.kernel.org/r/20211101031651.75851-6-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com>
      Reviewed-by: NBarry Song <song.bao.hua@hisilicon.com>
      Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
      Cc: Chen Huang <chenhuang5@huawei.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Fam Zheng <fam.zheng@bytedance.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e5408417
    • M
      mm: sparsemem: use page table lock to protect kernel pmd operations · d8d55f56
      Muchun Song 提交于
      The init_mm.page_table_lock is used to protect kernel page tables, we
      can use it to serialize splitting vmemmap PMD mappings instead of mmap
      write lock, which can increase the concurrency of vmemmap_remap_free().
      
      Actually, It increase the concurrency between allocations of HugeTLB
      pages.  But it is not the only benefit.  There are a lot of users of
      mmap read lock of init_mm.  The mmap write lock is holding through
      vmemmap_remap_free(), removing mmap write lock usage to make it does not
      affect other users of mmap read lock.  It is not making anything worse
      and always a win to move.
      
      Now the kernel page table walker does not hold the page_table_lock when
      walking pmd entries.  There may be consistency issue of a pmd entry,
      because pmd entry might change from a huge pmd entry to a PTE page
      table.  There is only one user of kernel page table walker, namely
      ptdump.  The ptdump already considers the consistency, which use a local
      variable to cache the value of pmd entry.  But we also need to update
      ->action to ACTION_CONTINUE to make sure the walker does not walk every
      pte entry again when concurrent thread has split the huge pmd.
      
      Link: https://lkml.kernel.org/r/20211101031651.75851-4-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com>
      Cc: Barry Song <song.bao.hua@hisilicon.com>
      Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
      Cc: Chen Huang <chenhuang5@huawei.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Fam Zheng <fam.zheng@bytedance.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d8d55f56
    • M
      mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page · e7d32485
      Muchun Song 提交于
      Patch series "Free the 2nd vmemmap page associated with each HugeTLB
      page", v7.
      
      This series can minimize the overhead of struct page for 2MB HugeTLB
      pages significantly.  It further reduces the overhead of struct page by
      12.5% for a 2MB HugeTLB compared to the previous approach, which means
      2GB per 1TB HugeTLB.  It is a nice gain.  Comments and reviews are
      welcome.  Thanks.
      
      The main implementation and details can refer to the commit log of patch
      1.  In this series, I have changed the following four helpers, the
      following table shows the impact of the overhead of those helpers.
      
      	+------------------+-----------------------+
      	|       APIs       | head page | tail page |
      	+------------------+-----------+-----------+
      	|    PageHead()    |     Y     |     N     |
      	+------------------+-----------+-----------+
      	|    PageTail()    |     Y     |     N     |
      	+------------------+-----------+-----------+
      	|  PageCompound()  |     N     |     N     |
      	+------------------+-----------+-----------+
      	|  compound_head() |     Y     |     N     |
      	+------------------+-----------+-----------+
      
      	Y: Overhead is increased.
      	N: Overhead is _NOT_ increased.
      
      It shows that the overhead of those helpers on a tail page don't change
      between "hugetlb_free_vmemmap=on" and "hugetlb_free_vmemmap=off".  But the
      overhead on a head page will be increased when "hugetlb_free_vmemmap=on"
      (except PageCompound()).  So I believe that Matthew Wilcox's folio series
      will help with this.
      
      The users of PageHead() and PageTail() are much less than compound_head()
      and most users of PageTail() are VM_BUG_ON(), so I have done some tests
      about the overhead of compound_head() on head pages.
      
      I have tested the overhead of calling compound_head() on a head page,
      which is 2.11ns (Measure the call time of 10 million times
      compound_head(), and then average).
      
      For a head page whose address is not aligned with PAGE_SIZE or a
      non-compound page, the overhead of compound_head() is 2.54ns which is
      increased by 20%.  For a head page whose address is aligned with
      PAGE_SIZE, the overhead of compound_head() is 2.97ns which is increased by
      40%.  Most pages are the former.  I do not think the overhead is
      significant since the overhead of compound_head() itself is low.
      
      This patch (of 5):
      
      This patch minimizes the overhead of struct page for 2MB HugeTLB pages
      significantly.  It further reduces the overhead of struct page by 12.5%
      for a 2MB HugeTLB compared to the previous approach, which means 2GB per
      1TB HugeTLB (2MB type).
      
      After the feature of "Free sonme vmemmap pages of HugeTLB page" is
      enabled, the mapping of the vmemmap addresses associated with a 2MB
      HugeTLB page becomes the figure below.
      
           HugeTLB                    struct pages(8 pages)         page frame(8 pages)
       +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
       |           |                     |     0     | -------------> |     0     |
       |           |                     +-----------+                +-----------+
       |           |                     |     1     | -------------> |     1     |
       |           |                     +-----------+                +-----------+
       |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
       |           |                     +-----------+                   | | | | |
       |           |                     |     3     | ------------------+ | | | |
       |           |                     +-----------+                     | | | |
       |           |                     |     4     | --------------------+ | | |
       |    2MB    |                     +-----------+                       | | |
       |           |                     |     5     | ----------------------+ | |
       |           |                     +-----------+                         | |
       |           |                     |     6     | ------------------------+ |
       |           |                     +-----------+                           |
       |           |                     |     7     | --------------------------+
       |           |                     +-----------+
       |           |
       |           |
       |           |
       +-----------+
      
      As we can see, the 2nd vmemmap page frame (indexed by 1) is reused and
      remaped. However, the 2nd vmemmap page frame is also can be freed to
      the buddy allocator, then we can change the mapping from the figure
      above to the figure below.
      
          HugeTLB                    struct pages(8 pages)         page frame(8 pages)
       +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
       |           |                     |     0     | -------------> |     0     |
       |           |                     +-----------+                +-----------+
       |           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
       |           |                     +-----------+                  | | | | | |
       |           |                     |     2     | -----------------+ | | | | |
       |           |                     +-----------+                    | | | | |
       |           |                     |     3     | -------------------+ | | | |
       |           |                     +-----------+                      | | | |
       |           |                     |     4     | ---------------------+ | | |
       |    2MB    |                     +-----------+                        | | |
       |           |                     |     5     | -----------------------+ | |
       |           |                     +-----------+                          | |
       |           |                     |     6     | -------------------------+ |
       |           |                     +-----------+                            |
       |           |                     |     7     | ---------------------------+
       |           |                     +-----------+
       |           |
       |           |
       |           |
       +-----------+
      
      After we do this, all tail vmemmap pages (1-7) are mapped to the head
      vmemmap page frame (0).  In other words, there are more than one page
      struct with PG_head associated with each HugeTLB page.  We __know__ that
      there is only one head page struct, the tail page structs with PG_head are
      fake head page structs.  We need an approach to distinguish between those
      two different types of page structs so that compound_head(), PageHead()
      and PageTail() can work properly if the parameter is the tail page struct
      but with PG_head.
      
      The following code snippet describes how to distinguish between real and
      fake head page struct.
      
      	if (test_bit(PG_head, &page->flags)) {
      		unsigned long head = READ_ONCE(page[1].compound_head);
      
      		if (head & 1) {
      			if (head == (unsigned long)page + 1)
      				==> head page struct
      			else
      				==> tail page struct
      		} else
      			==> head page struct
      	}
      
      We can safely access the field of the @page[1] with PG_head because the
      @page is a compound page composed with at least two contiguous pages.
      
      [songmuchun@bytedance.com: restore lost comment changes]
      
      Link: https://lkml.kernel.org/r/20211101031651.75851-1-songmuchun@bytedance.com
      Link: https://lkml.kernel.org/r/20211101031651.75851-2-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com>
      Reviewed-by: NBarry Song <song.bao.hua@hisilicon.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Chen Huang <chenhuang5@huawei.com>
      Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
      Cc: Fam Zheng <fam.zheng@bytedance.com>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e7d32485
  5. 07 11月, 2021 1 次提交
  6. 01 7月, 2021 3 次提交
    • M
      mm: sparsemem: split the huge PMD mapping of vmemmap pages · 3bc2b6a7
      Muchun Song 提交于
      Patch series "Split huge PMD mapping of vmemmap pages", v4.
      
      In order to reduce the difficulty of code review in series[1].  We disable
      huge PMD mapping of vmemmap pages when that feature is enabled.  In this
      series, we do not disable huge PMD mapping of vmemmap pages anymore.  We
      will split huge PMD mapping when needed.  When HugeTLB pages are freed
      from the pool we do not attempt coalasce and move back to a PMD mapping
      because it is much more complex.
      
      [1] https://lore.kernel.org/linux-doc/20210510030027.56044-1-songmuchun@bytedance.com/
      
      This patch (of 3):
      
      In [1], PMD mappings of vmemmap pages were disabled if the the feature
      hugetlb_free_vmemmap was enabled.  This was done to simplify the initial
      implementation of vmmemap freeing for hugetlb pages.  Now, remove this
      simplification by allowing PMD mapping and switching to PTE mappings as
      needed for allocated hugetlb pages.
      
      When a hugetlb page is allocated, the vmemmap page tables are walked to
      free vmemmap pages.  During this walk, split huge PMD mappings to PTE
      mappings as required.  In the unlikely case PTE pages can not be
      allocated, return error(ENOMEM) and do not optimize vmemmap of the hugetlb
      page.
      
      When HugeTLB pages are freed from the pool, we do not attempt to
      coalesce and move back to a PMD mapping because it is much more complex.
      
      [1] https://lkml.kernel.org/r/20210510030027.56044-8-songmuchun@bytedance.com
      
      Link: https://lkml.kernel.org/r/20210616094915.34432-1-songmuchun@bytedance.com
      Link: https://lkml.kernel.org/r/20210616094915.34432-2-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Chen Huang <chenhuang5@huawei.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3bc2b6a7
    • M
      mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · ad2fa371
      Muchun Song 提交于
      When we free a HugeTLB page to the buddy allocator, we need to allocate
      the vmemmap pages associated with it.  However, we may not be able to
      allocate the vmemmap pages when the system is under memory pressure.  In
      this case, we just refuse to free the HugeTLB page.  This changes behavior
      in some corner cases as listed below:
      
       1) Failing to free a huge page triggered by the user (decrease nr_pages).
      
          User needs to try again later.
      
       2) Failing to free a surplus huge page when freed by the application.
      
          Try again later when freeing a huge page next time.
      
       3) Failing to dissolve a free huge page on ZONE_MOVABLE via
          offline_pages().
      
          This can happen when we have plenty of ZONE_MOVABLE memory, but
          not enough kernel memory to allocate vmemmmap pages.  We may even
          be able to migrate huge page contents, but will not be able to
          dissolve the source huge page.  This will prevent an offline
          operation and is unfortunate as memory offlining is expected to
          succeed on movable zones.  Users that depend on memory hotplug
          to succeed for movable zones should carefully consider whether the
          memory savings gained from this feature are worth the risk of
          possibly not being able to offline memory in certain situations.
      
       4) Failing to dissolve a huge page on CMA/ZONE_MOVABLE via
          alloc_contig_range() - once we have that handling in place. Mainly
          affects CMA and virtio-mem.
      
          Similar to 3). virito-mem will handle migration errors gracefully.
          CMA might be able to fallback on other free areas within the CMA
          region.
      
      Vmemmap pages are allocated from the page freeing context.  In order for
      those allocations to be not disruptive (e.g.  trigger oom killer)
      __GFP_NORETRY is used.  hugetlb_lock is dropped for the allocation because
      a non sleeping allocation would be too fragile and it could fail too
      easily under memory pressure.  GFP_ATOMIC or other modes to access memory
      reserves is not used because we want to prevent consuming reserves under
      heavy hugetlb freeing.
      
      [mike.kravetz@oracle.com: fix dissolve_free_huge_page use of tail/head page]
        Link: https://lkml.kernel.org/r/20210527231225.226987-1-mike.kravetz@oracle.com
      [willy@infradead.org: fix alloc_vmemmap_page_list documentation warning]
        Link: https://lkml.kernel.org/r/20210615200242.1716568-6-willy@infradead.org
      
      Link: https://lkml.kernel.org/r/20210510030027.56044-7-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com>
      Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Barry Song <song.bao.hua@hisilicon.com>
      Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chen Huang <chenhuang5@huawei.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: HORIGUCHI NAOYA <naoya.horiguchi@nec.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Neukum <oneukum@suse.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ad2fa371
    • M
      mm: hugetlb: free the vmemmap pages associated with each HugeTLB page · f41f2ed4
      Muchun Song 提交于
      Every HugeTLB has more than one struct page structure.  We __know__ that
      we only use the first 4 (__NR_USED_SUBPAGE) struct page structures to
      store metadata associated with each HugeTLB.
      
      There are a lot of struct page structures associated with each HugeTLB
      page.  For tail pages, the value of compound_head is the same.  So we can
      reuse first page of tail page structures.  We map the virtual addresses of
      the remaining pages of tail page structures to the first tail page struct,
      and then free these page frames.  Therefore, we need to reserve two pages
      as vmemmap areas.
      
      When we allocate a HugeTLB page from the buddy, we can free some vmemmap
      pages associated with each HugeTLB page.  It is more appropriate to do it
      in the prep_new_huge_page().
      
      The free_vmemmap_pages_per_hpage(), which indicates how many vmemmap pages
      associated with a HugeTLB page can be freed, returns zero for now, which
      means the feature is disabled.  We will enable it once all the
      infrastructure is there.
      
      [willy@infradead.org: fix documentation warning]
        Link: https://lkml.kernel.org/r/20210615200242.1716568-5-willy@infradead.org
      
      Link: https://lkml.kernel.org/r/20210510030027.56044-5-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com>
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Tested-by: NChen Huang <chenhuang5@huawei.com>
      Tested-by: NBodeddula Balasubramaniam <bodeddub@amazon.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Barry Song <song.bao.hua@hisilicon.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: HORIGUCHI NAOYA <naoya.horiguchi@nec.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Neukum <oneukum@suse.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f41f2ed4
  7. 08 8月, 2020 3 次提交
    • W
      mm/sparse: only sub-section aligned range would be populated · 6cda7204
      Wei Yang 提交于
      There are two code path which invoke __populate_section_memmap()
      
        * sparse_init_nid()
        * sparse_add_section()
      
      For both case, we are sure the memory range is sub-section aligned.
      
        * we pass PAGES_PER_SECTION to sparse_init_nid()
        * we check range by check_pfn_span() before calling
          sparse_add_section()
      
      Also, the counterpart of __populate_section_memmap(), we don't do such
      calculation and check since the range is checked by check_pfn_span() in
      __remove_pages().
      
      Clear the calculation and check to keep it simple and comply with its
      counterpart.
      Signed-off-by: NWei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NDavid Hildenbrand <david@redhat.com>
      Link: http://lkml.kernel.org/r/20200703031828.14645-1-richard.weiyang@linux.alibaba.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6cda7204
    • A
      mm/sparsemem: enable vmem_altmap support in vmemmap_alloc_block_buf() · 56993b4e
      Anshuman Khandual 提交于
      There are many instances where vmemap allocation is often switched between
      regular memory and device memory just based on whether altmap is available
      or not.  vmemmap_alloc_block_buf() is used in various platforms to
      allocate vmemmap mappings.  Lets also enable it to handle altmap based
      device memory allocation along with existing regular memory allocations.
      This will help in avoiding the altmap based allocation switch in many
      places.  To summarize there are two different methods to call
      vmemmap_alloc_block_buf().
      
      vmemmap_alloc_block_buf(size, node, NULL)   /* Allocate from system RAM */
      vmemmap_alloc_block_buf(size, node, altmap) /* Allocate from altmap */
      
      This converts altmap_alloc_block_buf() into a static function, drops it's
      entry from the header and updates Documentation/vm/memory-model.rst.
      Suggested-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NAnshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Tested-by: NJia He <justin.he@arm.com>
      Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Will Deacon <will@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Hsin-Yi Wang <hsinyi@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Link: http://lkml.kernel.org/r/1594004178-8861-3-git-send-email-anshuman.khandual@arm.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      56993b4e
    • A
      mm/sparsemem: enable vmem_altmap support in vmemmap_populate_basepages() · 1d9cfee7
      Anshuman Khandual 提交于
      Patch series "arm64: Enable vmemmap mapping from device memory", v4.
      
      This series enables vmemmap backing memory allocation from device memory
      ranges on arm64.  But before that, it enables vmemmap_populate_basepages()
      and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based
      alocation requests.
      
      This patch (of 3):
      
      vmemmap_populate_basepages() is used across platforms to allocate backing
      memory for vmemmap mapping.  This is used as a standard default choice or
      as a fallback when intended huge pages allocation fails.  This just
      creates entire vmemmap mapping with base pages (PAGE_SIZE).
      
      On arm64 platforms, vmemmap_populate_basepages() is called instead of the
      platform specific vmemmap_populate() when ARM64_SWAPPER_USES_SECTION_MAPS
      is not enabled as in case for ARM64_16K_PAGES and ARM64_64K_PAGES configs.
      
      At present vmemmap_populate_basepages() does not support allocating from
      driver defined struct vmem_altmap while trying to create vmemmap mapping
      for a device memory range.  It prevents ARM64_16K_PAGES and
      ARM64_64K_PAGES configs on arm64 from supporting device memory with
      vmemap_altmap request.
      
      This enables vmem_altmap support in vmemmap_populate_basepages() unlocking
      device memory allocation for vmemap mapping on arm64 platforms with 16K or
      64K base page configs.
      
      Each architecture should evaluate and decide on subscribing device memory
      based base page allocation through vmemmap_populate_basepages().  Hence
      lets keep it disabled on all archs in order to preserve the existing
      semantics.  A subsequent patch enables it on arm64.
      Signed-off-by: NAnshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Tested-by: NJia He <justin.he@arm.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Acked-by: NWill Deacon <will@kernel.org>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hsin-Yi Wang <hsinyi@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Link: http://lkml.kernel.org/r/1594004178-8861-1-git-send-email-anshuman.khandual@arm.com
      Link: http://lkml.kernel.org/r/1594004178-8861-2-git-send-email-anshuman.khandual@arm.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1d9cfee7
  8. 10 6月, 2020 1 次提交
    • M
      mm: don't include asm/pgtable.h if linux/mm.h is already included · e31cf2f4
      Mike Rapoport 提交于
      Patch series "mm: consolidate definitions of page table accessors", v2.
      
      The low level page table accessors (pXY_index(), pXY_offset()) are
      duplicated across all architectures and sometimes more than once.  For
      instance, we have 31 definition of pgd_offset() for 25 supported
      architectures.
      
      Most of these definitions are actually identical and typically it boils
      down to, e.g.
      
      static inline unsigned long pmd_index(unsigned long address)
      {
              return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
      }
      
      static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
      {
              return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
      }
      
      These definitions can be shared among 90% of the arches provided
      XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
      
      For architectures that really need a custom version there is always
      possibility to override the generic version with the usual ifdefs magic.
      
      These patches introduce include/linux/pgtable.h that replaces
      include/asm-generic/pgtable.h and add the definitions of the page table
      accessors to the new header.
      
      This patch (of 12):
      
      The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
      functions involving page table manipulations, e.g.  pte_alloc() and
      pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
      in the files that include <linux/mm.h>.
      
      The include statements in such cases are remove with a simple loop:
      
      	for f in $(git grep -l "include <linux/mm.h>") ; do
      		sed -i -e '/include <asm\/pgtable.h>/ d' $f
      	done
      Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e31cf2f4
  9. 19 7月, 2019 1 次提交
  10. 31 10月, 2018 3 次提交
    • M
      mm: remove include/linux/bootmem.h · 57c8a661
      Mike Rapoport 提交于
      Move remaining definitions and declarations from include/linux/bootmem.h
      into include/linux/memblock.h and remove the redundant header.
      
      The includes were replaced with the semantic patch below and then
      semi-automated removal of duplicated '#include <linux/memblock.h>
      
      @@
      @@
      - #include <linux/bootmem.h>
      + #include <linux/memblock.h>
      
      [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
      [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
      [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
        Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
      Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      57c8a661
    • M
      memblock: replace BOOTMEM_ALLOC_* with MEMBLOCK variants · 97ad1087
      Mike Rapoport 提交于
      Drop BOOTMEM_ALLOC_ACCESSIBLE and BOOTMEM_ALLOC_ANYWHERE in favor of
      identical MEMBLOCK definitions.
      
      Link: http://lkml.kernel.org/r/1536927045-23536-29-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97ad1087
    • M
      memblock: remove _virt from APIs returning virtual address · eb31d559
      Mike Rapoport 提交于
      The conversion is done using
      
      sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
      	$(git grep -l memblock_virt_alloc)
      
      Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eb31d559
  11. 18 8月, 2018 5 次提交
    • P
      mm/sparse: delete old sparse_init and enable new one · 2a3cb8ba
      Pavel Tatashin 提交于
      Rename new_sparse_init() to sparse_init() which enables it.  Delete old
      sparse_init() and all the code that became obsolete with.
      
      [pasha.tatashin@oracle.com: remove unused sparse_mem_maps_populate_node()]
        Link: http://lkml.kernel.org/r/20180716174447.14529-6-pasha.tatashin@oracle.com
      Link: http://lkml.kernel.org/r/20180712203730.8703-6-pasha.tatashin@oracle.comSigned-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
      Tested-by: NOscar Salvador <osalvador@suse.de>
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2a3cb8ba
    • P
      mm/sparse: move buffer init/fini to the common place · afda57bc
      Pavel Tatashin 提交于
      Now that both variants of sparse memory use the same buffers to populate
      memory map, we can move sparse_buffer_init()/sparse_buffer_fini() to the
      common place.
      
      Link: http://lkml.kernel.org/r/20180712203730.8703-4-pasha.tatashin@oracle.comSigned-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
      Tested-by: NOscar Salvador <osalvador@suse.de>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      afda57bc
    • P
      mm/sparse: abstract sparse buffer allocations · 35fd1eb1
      Pavel Tatashin 提交于
      Patch series "sparse_init rewrite", v6.
      
      In sparse_init() we allocate two large buffers to temporary hold usemap
      and memmap for the whole machine.  However, we can avoid doing that if
      we changed sparse_init() to operated on per-node bases instead of doing
      it on the whole machine beforehand.
      
      As shown by Baoquan
        http://lkml.kernel.org/r/20180628062857.29658-1-bhe@redhat.com
      
      The buffers are large enough to cause machine stop to boot on small
      memory systems.
      
      Another benefit of these changes is that they also obsolete
      CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER.
      
      This patch (of 5):
      
      When struct pages are allocated for sparse-vmemmap VA layout, we first try
      to allocate one large buffer, and than if that fails allocate struct pages
      for each section as we go.
      
      The code that allocates buffer is uses global variables and is spread
      across several call sites.
      
      Cleanup the code by introducing three functions to handle the global
      buffer:
      
      sparse_buffer_init()	initialize the buffer
      sparse_buffer_fini()	free the remaining part of the buffer
      sparse_buffer_alloc()	alloc from the buffer, and if buffer is empty
      return NULL
      
      Define these functions in sparse.c instead of sparse-vmemmap.c because
      later we will use them for non-vmemmap sparse allocations as well.
      
      [akpm@linux-foundation.org: use PTR_ALIGN()]
      [akpm@linux-foundation.org: s/BUG_ON/WARN_ON/]
      Link: http://lkml.kernel.org/r/20180712203730.8703-2-pasha.tatashin@oracle.comSigned-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Tested-by: NOscar Salvador <osalvador@suse.de>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      35fd1eb1
    • B
      mm/sparse: optimize memmap allocation during sparse_init() · c98aff64
      Baoquan He 提交于
      In sparse_init(), two temporary pointer arrays, usemap_map and map_map
      are allocated with the size of NR_MEM_SECTIONS.  They are used to store
      each memory section's usemap and mem map if marked as present.  With the
      help of these two arrays, continuous memory chunk is allocated for
      usemap and memmap for memory sections on one node.  This avoids too many
      memory fragmentations.  Like below diagram, '1' indicates the present
      memory section, '0' means absent one.  The number 'n' could be much
      smaller than NR_MEM_SECTIONS on most of systems.
      
        |1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0|
        -------------------------------------------------------
         0 1 2 3         4 5         i   i+1     n-1   n
      
      If we fail to populate the page tables to map one section's memmap, its
      ->section_mem_map will be cleared finally to indicate that it's not
      present.  After use, these two arrays will be released at the end of
      sparse_init().
      
      In 4-level paging mode, each array costs 4M which can be ignorable.
      While in 5-level paging, they costs 256M each, 512M altogether.  Kdump
      kernel Usually only reserves very few memory, e.g 256M.  So, even thouth
      they are temporarily allocated, still not acceptable.
      
      In fact, there's no need to allocate them with the size of
      NR_MEM_SECTIONS.  Since the ->section_mem_map clearing has been deferred
      to the last, the number of present memory sections are kept the same
      during sparse_init() until we finally clear out the memory section's
      ->section_mem_map if its usemap or memmap is not correctly handled.
      Thus in the middle whenever for_each_present_section_nr() loop is taken,
      the i-th present memory section is always the same one.
      
      Here only allocate usemap_map and map_map with the size of
      'nr_present_sections'.  For the i-th present memory section, install its
      usemap and memmap to usemap_map[i] and mam_map[i] during allocation.
      Then in the last for_each_present_section_nr() loop which clears the
      failed memory section's ->section_mem_map, fetch usemap and memmap from
      usemap_map[] and map_map[] array and set them into mem_section[]
      accordingly.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20180628062857.29658-5-bhe@redhat.comSigned-off-by: NBaoquan He <bhe@redhat.com>
      Reviewed-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Oscar Salvador <osalvador@techadventures.net>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c98aff64
    • B
      mm/sparsemem.c: defer the ms->section_mem_map clearing · 07a34a8c
      Baoquan He 提交于
      In sparse_init(), if CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y, system
      will allocate one continuous memory chunk for mem maps on one node and
      populate the relevant page tables to map memory section one by one.  If
      fail to populate for a certain mem section, print warning and its
      ->section_mem_map will be cleared to cancel the marking of being
      present.  Like this, the number of mem sections marked as present could
      become less during sparse_init() execution.
      
      Here just defer the ms->section_mem_map clearing if failed to populate
      its page tables until the last for_each_present_section_nr() loop.  This
      is in preparation for later optimizing the mem map allocation.
      
      [akpm@linux-foundation.org: remove now-unused local `ms', per Oscar]
      Link: http://lkml.kernel.org/r/20180228032657.32385-3-bhe@redhat.comSigned-off-by: NBaoquan He <bhe@redhat.com>
      Acked-by: NDave Hansen <dave.hansen@intel.com>
      Reviewed-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      07a34a8c
  12. 09 1月, 2018 3 次提交
  13. 16 11月, 2017 2 次提交
    • M
      mm, sparse: do not swamp log with huge vmemmap allocation failures · fcdaf842
      Michal Hocko 提交于
      While doing memory hotplug tests under heavy memory pressure we have
      noticed too many page allocation failures when allocating vmemmap memmap
      backed by huge page
      
        kworker/u3072:1: page allocation failure: order:9, mode:0x24084c0(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO)
        [...]
        Call Trace:
          dump_trace+0x59/0x310
          show_stack_log_lvl+0xea/0x170
          show_stack+0x21/0x40
          dump_stack+0x5c/0x7c
          warn_alloc_failed+0xe2/0x150
          __alloc_pages_nodemask+0x3ed/0xb20
          alloc_pages_current+0x7f/0x100
          vmemmap_alloc_block+0x79/0xb6
          __vmemmap_alloc_block_buf+0x136/0x145
          vmemmap_populate+0xd2/0x2b9
          sparse_mem_map_populate+0x23/0x30
          sparse_add_one_section+0x68/0x18e
          __add_pages+0x10a/0x1d0
          arch_add_memory+0x4a/0xc0
          add_memory_resource+0x89/0x160
          add_memory+0x6d/0xd0
          acpi_memory_device_add+0x181/0x251
          acpi_bus_attach+0xfd/0x19b
          acpi_bus_scan+0x59/0x69
          acpi_device_hotplug+0xd2/0x41f
          acpi_hotplug_work_fn+0x1a/0x23
          process_one_work+0x14e/0x410
          worker_thread+0x116/0x490
          kthread+0xbd/0xe0
          ret_from_fork+0x3f/0x70
      
      and we do see many of those because essentially every allocation fails
      for each memory section.  This is an excessive way to tell the user that
      there is nothing to really worry about because we do have a fallback
      mechanism to use base pages.  The only downside might be a performance
      degradation due to TLB pressure.
      
      This patch changes vmemmap_alloc_block() to use __GFP_NOWARN and warn
      explicitly once on the first allocation failure.  This will reduce the
      noise in the kernel log considerably, while we still have an indication
      that a performance might be impacted.
      
      [mhocko@kernel.org: forgot to git add the follow up fix]
        Link: http://lkml.kernel.org/r/20171107090635.c27thtse2lchjgvb@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20171106092228.31098-1-mhocko@kernel.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fcdaf842
    • P
      mm: stop zeroing memory during allocation in vmemmap · f7f99100
      Pavel Tatashin 提交于
      vmemmap_alloc_block() will no longer zero the block, so zero memory at
      its call sites for everything except struct pages.  Struct page memory
      is zero'd by struct page initialization.
      
      Replace allocators in sparse-vmemmap to use the non-zeroing version.
      So, we will get the performance improvement by zeroing the memory in
      parallel when struct pages are zeroed.
      
      Add struct page zeroing as a part of initialization of other fields in
      __init_single_page().
      
      This single thread performance collected on: Intel(R) Xeon(R) CPU E7-8895
      v3 @ 2.60GHz with 1T of memory (268400646 pages in 8 nodes):
      
                               BASE            FIX
      sparse_init     11.244671836s   0.007199623s
      zone_sizes_init  4.879775891s   8.355182299s
                        --------------------------
      Total           16.124447727s   8.362381922s
      
      sparse_init is where memory for struct pages is zeroed, and the zeroing
      part is moved later in this patch into __init_single_page(), which is
      called from zone_sizes_init().
      
      [akpm@linux-foundation.org: make vmemmap_alloc_block_zero() private to sparse-vmemmap.c]
      Link: http://lkml.kernel.org/r/20171013173214.27300-10-pasha.tatashin@oracle.comSigned-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: NSteven Sistare <steven.sistare@oracle.com>
      Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
      Reviewed-by: NBob Picco <bob.picco@oracle.com>
      Tested-by: NBob Picco <bob.picco@oracle.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f7f99100
  14. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  15. 07 9月, 2017 1 次提交
  16. 13 7月, 2017 1 次提交
    • M
      mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic · dcda9b04
      Michal Hocko 提交于
      __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
      the page allocator.  This has been true but only for allocations
      requests larger than PAGE_ALLOC_COSTLY_ORDER.  It has been always
      ignored for smaller sizes.  This is a bit unfortunate because there is
      no way to express the same semantic for those requests and they are
      considered too important to fail so they might end up looping in the
      page allocator for ever, similarly to GFP_NOFAIL requests.
      
      Now that the whole tree has been cleaned up and accidental or misled
      usage of __GFP_REPEAT flag has been removed for !costly requests we can
      give the original flag a better name and more importantly a more useful
      semantic.  Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
      that the allocator would try really hard but there is no promise of a
      success.  This will work independent of the order and overrides the
      default allocator behavior.  Page allocator users have several levels of
      guarantee vs.  cost options (take GFP_KERNEL as an example)
      
       - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
         attempt to free memory at all. The most light weight mode which even
         doesn't kick the background reclaim. Should be used carefully because
         it might deplete the memory and the next user might hit the more
         aggressive reclaim
      
       - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
         allocation without any attempt to free memory from the current
         context but can wake kswapd to reclaim memory if the zone is below
         the low watermark. Can be used from either atomic contexts or when
         the request is a performance optimization and there is another
         fallback for a slow path.
      
       - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
         non sleeping allocation with an expensive fallback so it can access
         some portion of memory reserves. Usually used from interrupt/bh
         context with an expensive slow path fallback.
      
       - GFP_KERNEL - both background and direct reclaim are allowed and the
         _default_ page allocator behavior is used. That means that !costly
         allocation requests are basically nofail but there is no guarantee of
         that behavior so failures have to be checked properly by callers
         (e.g. OOM killer victim is allowed to fail currently).
      
       - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
         and all allocation requests fail early rather than cause disruptive
         reclaim (one round of reclaim in this implementation). The OOM killer
         is not invoked.
      
       - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
         behavior and all allocation requests try really hard. The request
         will fail if the reclaim cannot make any progress. The OOM killer
         won't be triggered.
      
       - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
         and all allocation requests will loop endlessly until they succeed.
         This might be really dangerous especially for larger orders.
      
      Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
      because they already had their semantic.  No new users are added.
      __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
      there is no progress and we have already passed the OOM point.
      
      This means that all the reclaim opportunities have been exhausted except
      the most disruptive one (the OOM killer) and a user defined fallback
      behavior is more sensible than keep retrying in the page allocator.
      
      [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
      [mhocko@suse.com: semantic fix]
        Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
      [mhocko@kernel.org: address other thing spotted by Vlastimil]
        Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dcda9b04
  17. 10 3月, 2017 1 次提交
  18. 03 8月, 2016 1 次提交
  19. 18 3月, 2016 2 次提交
  20. 16 1月, 2016 1 次提交