1. 08 Aug 2020: 14 commits
  2. 03 Aug 2020: 2 commits
  3. 29 Jul 2020: 2 commits
  4. 25 Jul 2020: 8 commits
  5. 21 Jul 2020: 1 commit
  6. 17 Jul 2020: 2 commits
  7. 14 Jul 2020: 1 commit
    • mm: document warning in move_normal_pmd() and make it warn only once · f81fdd0c
      Authored by Linus Torvalds
      Naresh Kamboju reported that the LTP tests can cause warnings on i386
      going back all the way to v5.0, and bisected it to commit 2c91bd4a
      ("mm: speed up mremap by 20x on large regions").
      
      The warning in move_normal_pmd() is actually mostly correct, but we have
      a very unusual special case at process creation time, when we may move
      the stack down with an overlapping move (kind of like a "memmove()"
      except using the page tables).
      
      And when you have just the right condition of "move a large initial
      stack by the right alignment in the end, but with the early part of the
      move being only page-aligned", we'll be in a situation where we're
      trying to move a normal PMD entry on top of an already existing - but
      now empty - PMD entry.
      
      The warning is still worth having, in case it ever triggers other cases,
      and perhaps as a reminder that we could do the stack move case more
      efficiently (although it's clearly rare enough that it probably doesn't
      matter).
      
      But make it do WARN_ON_ONCE(), so that you can't flood the logs with it.
      
      And add a *big* comment above it to explain and remind us what's going
      on, because it took some figuring out to see how this could trigger.
      Kudos to Joel Fernandes for debugging this.
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Debugged-and-acked-by: Joel Fernandes <joel@joelfernandes.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
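      A minimal sketch of the once-only warning described above, assuming the
      check in mm/mremap.c guards the destination PMD; this is illustrative,
      not the exact upstream diff.

          /*
           * The destination PMD is normally expected to be empty.  The
           * overlapping stack move at exec time can hit an entry that was
           * already populated and then emptied, so warn only once instead
           * of flooding the logs on every occurrence.
           */
          if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
              return false;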
  8. 11 Jul 2020: 2 commits
    • debugfs: make sure we can remove u32_array files cleanly · a2b992c8
      Authored by Jakub Kicinski
      debugfs_create_u32_array() allocates a small structure to wrap
      the data and size information about the array. If users ever
      try to remove the file, this leads to a leak since nothing ever
      frees this wrapper.
      
      That said, there are no upstream users of debugfs_create_u32_array()
      that'd remove a u32 array file (we only have one u32 array user in
      CMA), so there is no real bug here.
      
      Make callers pass a wrapper they allocated.  This way the lifetime
      management of the wrapper is on the caller, and we can avoid the
      potential leak in debugfs (see the sketch after this entry).
      
      CC: Chucheng Luo <luochucheng@vivo.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
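      A hedged sketch of the caller-allocated wrapper described above; the
      function name comes from the commit text, while the struct layout, the
      field names and the call site (my_values, parent_dentry) are assumptions.

          /* Wrapper describing the array; its lifetime is now the caller's job. */
          struct debugfs_u32_array {
              u32 *array;
              u32 n_elements;
          };

          static u32 my_values[4];                        /* hypothetical data */
          static struct debugfs_u32_array my_wrapper = {  /* must outlive the file */
              .array      = my_values,
              .n_elements = ARRAY_SIZE(my_values),
          };

          debugfs_create_u32_array("values", 0444, parent_dentry, &my_wrapper);

      Since debugfs no longer allocates anything per file, removing the file
      later cannot leak the wrapper.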
    • mm/hmm: provide the page mapping order in hmm_range_fault() · 3b50a6e5
      Authored by Ralph Campbell
      hmm_range_fault() returns an array of page frame numbers and flags for how
      the pages are mapped in the requested process' page tables. The PFN can be
      used to get the struct page with hmm_pfn_to_page() and the page size order
      can be determined with compound_order(page).
      
      However, if the page is larger than order 0 (PAGE_SIZE), there is no
      indication that a compound page is mapped by the CPU using a larger page
      size. Without this information, the caller can't safely use a large device
      PTE to map the compound page because the CPU might be using smaller PTEs
      with different read/write permissions.
      
      Add a new function hmm_pfn_to_map_order() to return the mapping size order
      so that callers know the pages are being mapped with consistent
      permissions and a large device page table mapping can be used if one is
      available.
      
      This will allow devices to optimize mapping the page into HW by avoiding
      or batching work for huge pages.  For instance, the dma_map can be done
      with a high order directly (see the usage sketch after this entry).
      
      Link: https://lore.kernel.org/r/20200701225352.9649-3-rcampbell@nvidia.com
      Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
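      A hedged usage sketch of the new helper on top of hmm_range_fault(); the
      range field name is assumed, and the device-side helpers
      (map_with_large_device_pte()/map_with_small_device_pte()) are hypothetical.

          unsigned long npages = (range->end - range->start) >> PAGE_SHIFT;
          unsigned long i;

          for (i = 0; i < npages; i++) {
              unsigned long pfn_entry = range->hmm_pfns[i];  /* assumed field name */
              unsigned int order = hmm_pfn_to_map_order(pfn_entry);

              /*
               * A non-zero order means the CPU maps these 1 << order pages
               * with uniform permissions, so a matching large device PTE
               * (or one high-order dma_map) is safe; otherwise fall back
               * to per-page device mappings.
               */
              if (order)
                  map_with_large_device_pte(pfn_entry, order);
              else
                  map_with_small_device_pte(pfn_entry);
          }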
  9. 10 Jul 2020: 1 commit
    • mm/memblock: expose only miminal interface to add/walk physmem · 77649905
      Authored by David Hildenbrand
      "physmem" in the memblock allocator is somewhat weird: it's not actually
      used for allocation, it's simply information collected during boot, which
      describes the unmodified physical memory map at boot time, without any
      standby/hotplugged memory. It's only used on s390 and is currently the
      only reason s390 keeps using CONFIG_ARCH_KEEP_MEMBLOCK.
      
      Physmem isn't numa aware and current users don't specify any flags. Let's
      hide it from the user, exposing only for_each_physmem(), and simplify. The
      interface for physmem is now really minimalistic:
      - memblock_physmem_add() to add ranges
      - for_each_physmem() / __next_physmem_range() to walk physmem ranges
      
      Don't place it into an __init section and don't discard it without
      CONFIG_ARCH_KEEP_MEMBLOCK. As we're reusing __next_mem_range(), remove
      the __meminit notifier to avoid section mismatch warnings once
      CONFIG_ARCH_KEEP_MEMBLOCK is no longer used with
      CONFIG_HAVE_MEMBLOCK_PHYS_MAP.
      
      While fixing up the documentation, sneak in some related cleanups. We can
      stop setting CONFIG_ARCH_KEEP_MEMBLOCK for s390 next.
      
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Message-Id: <20200701141830.18749-2-david@redhat.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
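      A hedged sketch of the minimal physmem interface described above, as it
      might be used from s390-style boot code; memblock_physmem_add() is named
      in the message, while the walker's macro name, its argument list and the
      base/size inputs are assumptions.

          phys_addr_t start, end;
          u64 i;

          /* Record a detected boot-time physical memory range (no NUMA node, no flags). */
          memblock_physmem_add(base, size);

          /* Walk every recorded physmem range. */
          for_each_physmem_range(i, NULL, &start, &end)
              pr_info("physmem: [%pa-%pa]\n", &start, &end);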
  10. 09 Jul 2020: 3 commits
  11. 08 Jul 2020: 1 commit
  12. 04 Jul 2020: 3 commits
    • mm/page_alloc: fix documentation error · 8beeae86
      Authored by Joel Savitz
      When I increased the upper bound of the min_free_kbytes value in
      ee8eb9a5 ("mm/page_alloc: increase default min_free_kbytes bound"), I
      forgot to tweak the above comment to reflect the new value.  This patch
      fixes that mistake.
      Signed-off-by: Joel Savitz <jsavitz@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Fabrizio D'Angelo <fdangelo@redhat.com>
      Link: http://lkml.kernel.org/r/20200624221236.29560-1-jsavitz@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
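      For context, a hedged sketch of the clamp whose comment the patch
      corrects; the bounds shown are recalled from the referenced commit and
      should be treated as illustrative rather than quoted from the tree.

          /*
           * Clamp the computed default.  The upper bound was raised by the
           * commit referenced above (to 262144 kB), and the comment that
           * documents the bound has to quote the same value.
           */
          if (min_free_kbytes < 128)
              min_free_kbytes = 128;
          if (min_free_kbytes > 262144)
              min_free_kbytes = 262144;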
    • mm/cma.c: use exact_nid true to fix possible per-numa cma leak · 40366bd7
      Authored by Barry Song
      Calling cma_declare_contiguous_nid() with exact_nid false for per-NUMA
      reservation can easily cause CMA leaks and confusion.  For example,
      mm/hugetlb.c tries to reserve per-NUMA CMA for gigantic pages, but it can
      easily leak CMA and confuse users when the system has memoryless nodes.
      
      Suppose the system has 4 NUMA nodes and only node0 has memory.  If we set
      hugetlb_cma=4G in bootargs, mm/hugetlb.c will get 4 CMA areas for the 4
      different NUMA nodes.  Since exact_nid=false in the current code, all 4
      nodes will successfully get CMA from node0, but hugetlb_cma[1] to
      hugetlb_cma[3] will never be usable for hugepages, since mm/hugetlb.c
      will only allocate memory from hugetlb_cma[0].
      
      Now suppose the system has 4 NUMA nodes where only node0 and node2 have
      memory and the other nodes have none.  If we set hugetlb_cma=4G in
      bootargs, mm/hugetlb.c will get 4 CMA areas for the 4 different NUMA
      nodes.  Since exact_nid=false in the current code, all 4 nodes will
      successfully get CMA from node0 or node2, but hugetlb_cma[1] and
      hugetlb_cma[3] will never be usable for hugepages, as mm/hugetlb.c will
      only allocate memory from hugetlb_cma[0] and hugetlb_cma[2].  This causes
      a permanent leak of the CMA areas which are supposed to be used by the
      memoryless nodes.
      
      Of course we could work around the issue by letting mm/hugetlb.c scan all
      CMA areas in alloc_gigantic_page() even when node_mask includes node0
      only; that way we could get pages from hugetlb_cma[1] to hugetlb_cma[3].
      But this would crash the kernel in free_gigantic_page() when it frees the
      page via (see the free-path sketch after this entry):
      cma_release(hugetlb_cma[page_to_nid(page)], page, 1 << order)
      
      On the other hand, exact_nid=false doesn't consider NUMA distance, so it
      might not be that useful to leverage CMA areas on remote nodes anyway.  I
      feel it is much simpler to make exact_nid true and make everything clear.
      After that, memoryless nodes won't be able to reserve per-NUMA CMA from
      other nodes which have memory.
      
      Fixes: cf11e85f ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
      Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Roman Gushchin <guro@fb.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Aslan Bakirov <aslan@fb.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Andreas Schaufler <andreas.schaufler@gmx.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200628074345.27228-1-song.bao.hua@hisilicon.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
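      A hedged sketch of the free path named above, showing why letting
      allocation stray into another node's CMA area would break it; the
      fallback call and guard are assumptions about the surrounding code.

          static void free_gigantic_page(struct page *page, unsigned int order)
          {
              /*
               * The CMA area is chosen purely by the page's node id.  If the
               * page had been handed out from hugetlb_cma[1] (physically
               * backed by node0 memory because exact_nid was false),
               * page_to_nid() would still report 0, and cma_release() would
               * be asked to release the page from the wrong area.  Making
               * exact_nid true keeps allocation and free consistent.
               */
              if (IS_ENABLED(CONFIG_CMA) &&
                  cma_release(hugetlb_cma[page_to_nid(page)], page, 1 << order))
                  return;

              free_contig_range(page_to_pfn(page), 1 << order);
          }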
    • mm/hugetlb.c: fix pages per hugetlb calculation · 1139d336
      Authored by Mike Kravetz
      The routine hpage_nr_pages() was incorrectly used to calculate the number
      of base pages in a hugetlb page.  hpage_nr_pages is designed to be called
      for THP pages and will return HPAGE_PMD_NR for hugetlb pages of any size.
      
      Due to the context in which hpage_nr_pages was called, it is unlikely to
      produce a user-visible error.  The routine with the incorrect call is only
      exercised in the case of hugetlb memory error or migration.  In addition,
      this would need to be on an architecture which supports huge page sizes
      smaller than PMD_SIZE, and the vma containing the huge page would also
      need to be smaller than PMD_SIZE.
      
      Fixes: c0d0381a ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
      Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200629185003.97202-1-mike.kravetz@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
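      A hedged sketch contrasting the two helpers the fix distinguishes,
      assuming the caller has the huge page's hstate at hand; the real call
      site is not reproduced here.

          struct hstate *h = page_hstate(page);

          /*
           * THP helper: reports HPAGE_PMD_NR for any page it treats as huge,
           * regardless of the real hugetlb page size, so it is the wrong way
           * to count the base pages of a hugetlb page.
           */
          unsigned long thp_count = hpage_nr_pages(page);

          /*
           * hugetlb helper: derives the base-page count from the hstate,
           * which knows the actual huge page order (possibly smaller than
           * PMD_SIZE on some architectures).
           */
          unsigned long hugetlb_count = pages_per_huge_page(h);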