1. 01 December 2022, 2 commits
    • mm,thp,rmap: subpages_mapcount COMPOUND_MAPPED if PMD-mapped · 4b51634c
      Authored by Hugh Dickins
      Can the lock_compound_mapcount() bit_spin_lock apparatus be removed now? 
      Yes.  Not by atomic64_t or cmpxchg games, those get difficult on 32-bit;
      but if we slightly abuse subpages_mapcount by additionally demanding that
      one bit be set there when the compound page is PMD-mapped, then a cascade
      of two atomic ops is able to maintain the stats without bit_spin_lock.
      
      This is harder to reason about than when bit_spin_locked, but I believe
      safe; and no drift in stats detected when testing.  When there are racing
      removes and adds, of course the sequence of operations is less well-
      defined; but each operation on subpages_mapcount is atomically good.  What
      might be disastrous, is if subpages_mapcount could ever fleetingly appear
      negative: but the pte lock (or pmd lock) these rmap functions are called
      under, ensures that a last remove cannot race ahead of a first add.
      
      Continue to make an exception for hugetlb (PageHuge) pages, though that
      exception can be easily removed by a further commit if necessary: leave
      subpages_mapcount 0, don't bother with COMPOUND_MAPPED in its case, just
      carry on checking compound_mapcount too in folio_mapped(), page_mapped().
      
      Evidence is that this way goes slightly faster than the previous
      implementation in all cases (pmds after ptes now taking around 103ms); and
      relieves us of worrying about contention on the bit_spin_lock.
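
      As an illustration only (not the kernel code), here is a toy userspace
      model of that scheme: one atomic word per compound page, an arbitrary
      high bit recording "PMD-mapped", and the low bits counting pte-mapped
      subpages, so "is this page mapped at all?" becomes a single atomic read.

        /* Toy model of the COMPOUND_MAPPED idea; the bit position is illustrative. */
        #include <stdatomic.h>
        #include <stdbool.h>
        #include <stdio.h>

        #define COMPOUND_MAPPED  0x800000                /* set while PMD-mapped      */
        #define SUBPAGES_MAPPED  (COMPOUND_MAPPED - 1)   /* mask: pte-mapped subpages */

        static atomic_int subpages_mapcount;             /* one word per compound page */

        static void map_pmd(void)   { atomic_fetch_add(&subpages_mapcount, COMPOUND_MAPPED); }
        static void unmap_pmd(void) { atomic_fetch_sub(&subpages_mapcount, COMPOUND_MAPPED); }
        static void map_pte(void)   { atomic_fetch_add(&subpages_mapcount, 1); }
        static void unmap_pte(void) { atomic_fetch_sub(&subpages_mapcount, 1); }

        static bool page_mapped(void)
        {
                /* Non-zero means mapped by pmd and/or by at least one pte. */
                return atomic_load(&subpages_mapcount) != 0;
        }

        int main(void)
        {
                map_pmd();
                map_pte();
                printf("mapped=%d pte-mapped subpages=%d\n", page_mapped(),
                       atomic_load(&subpages_mapcount) & SUBPAGES_MAPPED);
                unmap_pte();
                unmap_pmd();
                printf("mapped=%d\n", page_mapped());
                return 0;
        }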
      
      Link: https://lkml.kernel.org/r/3978f3ca-5473-55a7-4e14-efea5968d892@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dan Carpenter <error27@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: James Houghton <jthoughton@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm,thp,rmap: simplify compound page mapcount handling · cb67f428
      Authored by Hugh Dickins
      Compound page (folio) mapcount calculations have been different for anon
      and file (or shmem) THPs, and involved the obscure PageDoubleMap flag. 
      And each huge mapping and unmapping of a file (or shmem) THP involved
      atomically incrementing and decrementing the mapcount of every subpage of
      that huge page, dirtying many struct page cachelines.
      
      Add subpages_mapcount field to the struct folio and first tail page, so
      that the total of subpage mapcounts is available in one place near the
      head: then page_mapcount() and total_mapcount() and page_mapped(), and
      their folio equivalents, are so quick that anon and file and hugetlb don't
      need to be optimized differently.  Delete the unloved PageDoubleMap.
      
      page_add and page_remove rmap functions must now maintain the
      subpages_mapcount as well as the subpage _mapcount, when dealing with pte
      mappings of huge pages; and correct maintenance of NR_ANON_MAPPED and
      NR_FILE_MAPPED statistics still needs reading through the subpages, using
      nr_subpages_unmapped() - but only when first or last pmd mapping finds
      subpages_mapcount raised (double-map case, not the common case).
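
      As a sketch only (the struct below is a toy model, not the kernel's
      struct page/folio layout), the effect of keeping one folio-wide total
      near the head, maintained alongside each subpage _mapcount:

        #include <stdatomic.h>

        #define NR_SUBPAGES 512                     /* e.g. a 2MB THP of 4KB subpages */

        struct toy_folio {
                atomic_int compound_mapcount;                /* pmd (whole-page) mappings    */
                atomic_int subpages_mapcount;                /* total of pte-mapped subpages */
                atomic_int subpage_mapcount[NR_SUBPAGES];    /* per-subpage _mapcount        */
        };

        /* pte-map one subpage: bump its own count and the folio-wide total. */
        static void pte_map(struct toy_folio *f, int i)
        {
                atomic_fetch_add(&f->subpage_mapcount[i], 1);
                atomic_fetch_add(&f->subpages_mapcount, 1);
        }

        /* O(1), instead of summing _mapcount across all 512 tail pages. */
        static int total_mapcount(struct toy_folio *f)
        {
                return atomic_load(&f->compound_mapcount) +
                       atomic_load(&f->subpages_mapcount);
        }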
      
      But are those counts (used to decide when to split an anon THP, and in
      vmscan's pagecache_reclaimable heuristic) correctly maintained?  Not
      quite: since page_remove_rmap() (and also split_huge_pmd()) is often
      called without the page lock, a subpage pte mapcount can go 0<->1 while
      a compound pmd mapcount 0<->1 transition is scanning the subpages - races
      which the previous implementation had prevented.  The statistics might become
      inaccurate, and even drift down until they underflow through 0.  That is
      not good enough, but is better dealt with in a followup patch.
      
      Update a few comments on first and second tail page overlaid fields. 
      hugepage_add_new_anon_rmap() has to "increment" compound_mapcount, but
      subpages_mapcount and compound_pincount are already correctly at 0, so
      delete its reinitialization of compound_pincount.
      
      A simple 100 X munmap(mmap(2GB, MAP_SHARED|MAP_POPULATE, tmpfs), 2GB) took
      18 seconds on small pages, and used to take 1 second on huge pages, but
      now takes 119 milliseconds on huge pages.  Mapping by pmds a second time
      used to take 860ms and now takes 92ms; mapping by pmds after mapping by
      ptes (when the scan is needed) used to take 870ms and now takes 495ms. 
      But there might be some benchmarks which would show a slowdown, because
      tail struct pages now fall out of cache until final freeing checks them.
      
      Link: https://lkml.kernel.org/r/47ad693-717-79c8-e1ba-46c3a6602e48@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: James Houghton <jthoughton@google.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. 23 November 2022, 1 commit
  3. 29 October 2022, 1 commit
    • mm: prep_compound_tail() clear page->private · 5aae9265
      Authored by Hugh Dickins
      Although page allocation always clears page->private in the first page or
      head page of an allocation, it has never made a point of clearing
      page->private in the tails (though 0 is often what is already there).
      
      But now commit 71e2d666 ("mm/huge_memory: do not clobber swp_entry_t
      during THP split") issues a warning when page_tail->private is found to be
      non-0 (unless it's swapcache).
      
      Change that warning to dump page_tail (which also dumps head), instead of
      just the head: so far we have seen dead000000000122, dead000000000003,
      dead000000000001 or 0000000000000002 in the raw output for tail private.
      
      We could just delete the warning, but today's consensus appears to want
      page->private to be 0, unless there's a good reason for it to be set: so
      now clear it in prep_compound_tail() (more general than just for THP; but
      not for high order allocation, which makes no pass down the tails).
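
      For reference, a hedged sketch of the shape of the fix (helper and field
      names as in mm/page_alloc.c of that era; the exact upstream diff may
      differ in detail):

        /* Clear ->private while preparing each tail page of a compound page. */
        static void prep_compound_tail(struct page *head, int tail_idx)
        {
                struct page *p = head + tail_idx;

                p->mapping = TAIL_MAPPING;
                set_compound_head(p, head);
                set_page_private(p, 0);    /* the addition: tail private starts at 0 */
        }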
      
      Link: https://lkml.kernel.org/r/1c4233bb-4e4d-5969-fbd4-96604268a285@google.com
      Fixes: 71e2d666 ("mm/huge_memory: do not clobber swp_entry_t during THP split")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  4. 21 October 2022, 1 commit
  5. 13 October 2022, 2 commits
  6. 04 October 2022, 21 commits
  7. 27 September 2022, 2 commits
  8. 12 September 2022, 8 commits
  9. 25 August 2022, 1 commit
    • mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses. · ebc97a52
      Authored by Yosry Ahmed
      We keep track of several kernel memory stats (total kernel memory, page
      tables, stack, vmalloc, etc) on multiple levels (global, per-node,
      per-memcg, etc).  These stats give users insight into how much memory
      is used by the kernel and for what purposes.
      
      Currently, memory used by KVM mmu is not accounted in any of those
      kernel memory stats. This patch series accounts the memory pages
      used by KVM for page tables in those stats in a new
      NR_SECONDARY_PAGETABLE stat. This stat can be later extended to account
      for other types of secondary page tables (e.g. iommu page tables).
      
      KVM has a decent number of large allocations that aren't for page
      tables, but for most of them, the number/size of those allocations
      scales linearly with either the number of vCPUs or the amount of memory
      assigned to the VM. KVM's secondary page table allocations do not scale
      linearly, especially when nested virtualization is in use.
      
      From a KVM perspective, NR_SECONDARY_PAGETABLE will scale with KVM's
      per-VM pages_{4k,2m,1g} stats unless the guest is doing something
      bizarre (e.g. accessing only 4KB chunks of 2MB pages so that KVM is
      forced to allocate a large number of page tables even though the guest
      isn't accessing that much memory). However, someone would need to either
      understand how KVM works to make that connection, or know (or be told) to
      go look at KVM's stats if they're running VMs to better decipher the stats.
      
      Furthermore, having NR_PAGETABLE side-by-side with NR_SECONDARY_PAGETABLE
      is informative. For example, when backing a VM with THP vs. HugeTLB,
      NR_SECONDARY_PAGETABLE is roughly the same, but NR_PAGETABLE is an order
      of magnitude higher with THP. So having this stat will at the very least
      prove to be useful for understanding tradeoffs between VM backing types,
      and likely even steer folks towards potential optimizations.
      
      The original discussion with more details about the rationale:
      https://lore.kernel.org/all/87ilqoi77b.wl-maz@kernel.org
      
      This stat will be used by subsequent patches to count KVM mmu
      memory usage.
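
      A minimal sketch of how a secondary-MMU user might charge its page-table
      pages to the new counter, assuming the kernel's mod_lruvec_page_state()
      helper; the wrapper name below is illustrative, and the real KVM plumbing
      lands in the follow-up patches of this series.

        #include <linux/mm.h>
        #include <linux/memcontrol.h>

        /* Charge (nr > 0) or uncharge (nr < 0) page-table pages starting at 'virt'. */
        static inline void account_secondary_pgtable(void *virt, int nr)
        {
                mod_lruvec_page_state(virt_to_page(virt), NR_SECONDARY_PAGETABLE, nr);
        }
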
      Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220823004639.2387269-2-yosryahmed@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
  10. 30 July 2022, 1 commit