1. 22 Mar 2022: 4 commits
  2. 04 Mar 2022: 4 commits
  3. 18 Feb 2022: 3 commits
    • mm/munlock: page migration needs mlock pagevec drained · b7435507
      Authored by Hugh Dickins
      Page migration of a VM_LOCKED page tends to fail, because when the old
      page is unmapped, it is put on the mlock pagevec with raised refcount,
      which then makes the migration's refcount freeze fail.
      
      At first I thought this would be fixed by a local mlock_page_drain() at
      the upper rmap_walk() level - which would have nicely batched all the
      munlocks of that page; but tests show that the task can too easily move
      to another cpu, leaving pagevec residue behind which fails the migration.
      
      So have try_to_migrate_one() drain the local pagevec after
      page_remove_rmap() from a VM_LOCKED vma; do the same in
      try_to_unmap_one(), whose TTU_IGNORE_MLOCK users would want the same
      treatment; and do the same in remove_migration_pte() - not important
      when successfully inserting a new page, but necessary when hoping to
      retry after failure.
      
      Any new pagevec runs the risk of adding a new way of stranding, and we
      might discover other corners where mlock_page_drain() or lru_add_drain()
      would now help.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      b7435507
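
      A minimal sketch of the drain point described in the commit above.  The
      wrapper function below is hypothetical, only framing the fragment;
      mlock_page_drain() is the pagevec-drain helper this series introduces,
      and its exact signature may differ:

          #include <linux/mm.h>
          #include <linux/rmap.h>
          #include <linux/smp.h>

          void mlock_page_drain(int cpu);   /* from this series; signature may differ */

          /* Hypothetical wrapper: upstream, this lives inside try_to_migrate_one(). */
          static void unmap_and_drain_mlock(struct page *subpage,
                                            struct vm_area_struct *vma)
          {
                  page_remove_rmap(subpage, vma, false);
                  /*
                   * The just-unmapped VM_LOCKED page sits on this CPU's mlock
                   * pagevec with a raised refcount; drain it now so that stray
                   * reference cannot defeat the migration refcount freeze.
                   */
                  if (vma->vm_flags & VM_LOCKED)
                          mlock_page_drain(smp_processor_id());
                  put_page(subpage);
          }

      The same drain is repeated in try_to_unmap_one() and
      remove_migration_pte(), as the commit text explains.
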
    • mm/migrate: __unmap_and_move() push good newpage to LRU · c3096e67
      Authored by Hugh Dickins
      Compaction, NUMA page movement, THP collapse/split, and memory failure
      do isolate unevictable pages from their "LRU", losing the record of
      mlock_count in doing so (isolators are likely to use page->lru for their
      own private lists, so mlock_count has to be presumed lost).
      
      That's unfortunate, and we should put in some work to correct that: one
      can imagine a function to build up the mlock_count again - but it would
      require i_mmap_rwsem for read, so be careful where it's called.  Or
      page_referenced_one() and try_to_unmap_one() might do that extra work.
      
      But one place that can very easily be improved is page migration's
      __unmap_and_move(): with a small adjustment, the successfully migrated
      new page is put back on the LRU first, so that its mlock_count (if any)
      is built back up by remove_migration_ptes().
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      c3096e67
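
      A sketch of the reordering described above, assuming the names used in
      the commit message; the wrapper function is hypothetical, and the real
      logic sits near the end of __unmap_and_move() in mm/migrate.c:

          #include <linux/migrate.h>
          #include <linux/swap.h>

          /* Hypothetical helper framing the tail of __unmap_and_move(). */
          static void finish_move(struct page *page, struct page *newpage,
                                  int rc, bool page_was_mapped)
          {
                  /*
                   * On success, put newpage on the LRU *before* the migration
                   * ptes are removed, so that remove_migration_ptes() can
                   * rebuild its mlock_count for any VM_LOCKED mapping.
                   */
                  if (rc == MIGRATEPAGE_SUCCESS) {
                          lru_cache_add(newpage);
                          if (page_was_mapped)
                                  lru_add_drain();
                  }
                  if (page_was_mapped)
                          remove_migration_ptes(page,
                                  rc == MIGRATEPAGE_SUCCESS ? newpage : page,
                                  false);
          }
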
    • mm/munlock: rmap call mlock_vma_page() munlock_vma_page() · cea86fe2
      Authored by Hugh Dickins
      Add vma argument to mlock_vma_page() and munlock_vma_page(), make them
      inline functions which check (vma->vm_flags & VM_LOCKED) before calling
      mlock_page() and munlock_page() in mm/mlock.c.
      
      Add bool compound to mlock_vma_page() and munlock_vma_page(): this is
      because we have understandable difficulty in accounting pte maps of THPs,
      and if passed a PageHead page, mlock_page() and munlock_page() cannot
      tell whether it's a pmd map to be counted or a pte map to be ignored.
      
      Add vma arg to page_add_file_rmap() and page_remove_rmap(), like the
      others, and use that to call mlock_vma_page() at the end of the page
      adds, and munlock_vma_page() at the end of page_remove_rmap() (end or
      beginning? unimportant, but end was easier for assertions in testing).
      
      No page lock is required (although almost all adds happen to hold it):
      delete the "Serialize with page migration" BUG_ON(!PageLocked(page))s.
      Certainly page lock did serialize with page migration, but I'm having
      difficulty explaining why that was ever important.
      
      Mlock accounting on THPs has been hard to define, differed between anon
      and file, involved PageDoubleMap in some places and not others, required
      clear_page_mlock() at some points.  Keep it simple now: just count the
      pmds and ignore the ptes, there is no reason for ptes to undo pmd mlocks.
      
      page_add_new_anon_rmap() callers unchanged: they have long been calling
      lru_cache_add_inactive_or_unevictable(), which does its own VM_LOCKED
      handling (it also checks for not VM_SPECIAL: I think that's overcautious,
      and inconsistent with other checks, since mmap_region() already prevents
      VM_LOCKED on VM_SPECIAL; but I haven't quite convinced myself to change it).
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      cea86fe2
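
      A simplified sketch of the inline wrappers described above (the upstream
      versions live in mm/internal.h and carry extra checks; this only shows
      the VM_LOCKED test and the "count pmds, ignore ptes" filtering):

          #include <linux/mm.h>
          #include <linux/page-flags.h>

          void mlock_page(struct page *page);     /* from this series (mm/mlock.c) */
          void munlock_page(struct page *page);   /* from this series (mm/mlock.c) */

          static inline void mlock_vma_page(struct page *page,
                                            struct vm_area_struct *vma,
                                            bool compound)
          {
                  /* A pte map of a THP (!compound but PageTransCompound) is ignored. */
                  if ((vma->vm_flags & VM_LOCKED) &&
                      (compound || !PageTransCompound(page)))
                          mlock_page(page);
          }

          static inline void munlock_vma_page(struct page *page,
                                              struct vm_area_struct *vma,
                                              bool compound)
          {
                  if ((vma->vm_flags & VM_LOCKED) &&
                      (compound || !PageTransCompound(page)))
                          munlock_page(page);
          }
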
  4. 22 Jan 2022: 1 commit
    • mm/migrate.c: rework migration_entry_wait() to not take a pageref · ffa65753
      Authored by Alistair Popple
      This fixes the FIXME in migrate_vma_check_page().
      
      Before migrating a page, migration code will take a reference and check
      that there are no unexpected page references, failing the migration if
      there are.  When a thread faults on a migration entry, it will take a
      temporary reference to the page so it can wait for the page to become
      unlocked, signifying that the migration entry has been removed.
      
      This reference is dropped just prior to waiting on the page lock;
      however, the extra reference can cause migration failures, so it is
      desirable to avoid taking it.
      
      As migration code already has a reference to the migrating page, an
      extra reference to wait on PG_locked is unnecessary so long as the
      reference can't be dropped whilst setting up the wait.
      
      When faulting on a migration entry the ptl is taken to check the
      migration entry.  Removing a migration entry also requires the ptl, and
      migration code won't drop its page reference until after the migration
      entry has been removed.  Therefore retaining the ptl of a migration
      entry is sufficient to ensure the page has a reference.  Reworking
      migration_entry_wait() to hold the ptl until the wait setup is complete
      means the extra page reference is no longer needed.
      
      [apopple@nvidia.com: v5]
        Link: https://lkml.kernel.org/r/20211213033848.1973946-1-apopple@nvidia.com
      
      Link: https://lkml.kernel.org/r/20211118020754.954425-1-apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ffa65753
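
      A condensed sketch of the reworked fault-side wait (not the literal
      upstream code, which splits this across migration_entry_wait() and the
      new migration_entry_wait_on_locked() helper added by this commit):

          #include <linux/mm.h>
          #include <linux/swapops.h>

          /* Added by this commit; prototype repeated here for the sketch. */
          void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
                                              spinlock_t *ptl);

          static void migration_entry_wait_sketch(struct mm_struct *mm,
                                                  pmd_t *pmd,
                                                  unsigned long address)
          {
                  spinlock_t *ptl = pte_lockptr(mm, pmd);
                  pte_t *ptep = pte_offset_map(pmd, address);

                  spin_lock(ptl);
                  if (!is_swap_pte(*ptep) ||
                      !is_migration_entry(pte_to_swp_entry(*ptep))) {
                          pte_unmap_unlock(ptep, ptl);
                          return;
                  }
                  /*
                   * Hand off with the ptl still held: the helper queues this
                   * task on the page's lock waitqueue and only then drops the
                   * ptl, so no extra page reference is needed to keep the page
                   * stable while the wait is being set up.
                   */
                  migration_entry_wait_on_locked(pte_to_swp_entry(*ptep),
                                                 ptep, ptl);
          }
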
  5. 15 Jan 2022: 7 commits
  6. 08 Jan 2022: 1 commit
  7. 05 Jan 2022: 1 commit
  8. 12 Nov 2021: 2 commits
  9. 07 Nov 2021: 1 commit
  10. 19 Oct 2021: 3 commits
    • mm/migrate: fix CPUHP state to update node demotion order · a6a0251c
      Authored by Huang Ying
      The node demotion order needs to be updated during CPU hotplug, because
      whether a NUMA node has CPUs may influence the demotion order.  The
      update function should be called during CPU online/offline after the
      node_states[N_CPU] has been updated.  That is done in
      CPUHP_AP_ONLINE_DYN during CPU online and in CPUHP_MM_VMSTAT_DEAD during
      CPU offline.  But in commit 884a6e5d ("mm/migrate: update node
      demotion order on hotplug events"), the function to update node demotion
      order is called in CPUHP_AP_ONLINE_DYN during CPU online/offline.  This
      doesn't satisfy the order requirement.
      
      For example, there are 4 CPUs (P0, P1, P2, P3) in 2 sockets (P0, P1 in S0
      and P2, P3 in S1), the demotion order is
      
       - S0 -> NUMA_NO_NODE
       - S1 -> NUMA_NO_NODE
      
      After P2 and P3 are offlined, because S1 has no CPU now, the demotion
      order should have been changed to

       - S0 -> S1
       - S1 -> NUMA_NO_NODE
      
      but it isn't changed, because the order updating callback for CPU
      hotplug doesn't see the new nodemask.  After that, if P1 is offlined,
      the demotion order is changed to the expected order as above.
      
      So in this patch, we add CPUHP_AP_MM_DEMOTION_ONLINE and
      CPUHP_MM_DEMOTION_DEAD, which are called after CPUHP_AP_ONLINE_DYN and
      CPUHP_MM_VMSTAT_DEAD during CPU online and offline respectively, and
      register the update function on them.
      
      Link: https://lkml.kernel.org/r/20210929060351.7293-1-ying.huang@intel.com
      Fixes: 884a6e5d ("mm/migrate: update node demotion order on hotplug events")
      Signed-off-by: N"Huang, Ying" <ying.huang@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Wei Xu <weixugc@google.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a6a0251c
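
      A sketch of the registration described above.  CPUHP_AP_MM_DEMOTION_ONLINE
      and CPUHP_MM_DEMOTION_DEAD come from the commit text; the callback and
      init-function names are illustrative, and set_migration_target_nodes()
      stands in for the existing demotion-order rebuild in mm/migrate.c:

          #include <linux/cpuhotplug.h>
          #include <linux/init.h>

          void set_migration_target_nodes(void);  /* existing rebuild of node_demotion[] */

          static int migration_online_cpu(unsigned int cpu)
          {
                  set_migration_target_nodes();
                  return 0;
          }

          static int migration_offline_cpu(unsigned int cpu)
          {
                  set_migration_target_nodes();
                  return 0;
          }

          static int __init demotion_cpuhp_init(void)
          {
                  int ret;

                  /* Called after CPUHP_MM_VMSTAT_DEAD on offline, once N_CPU is updated. */
                  ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD,
                                                  "mm/demotion:offline",
                                                  NULL, migration_offline_cpu);
                  if (ret < 0)
                          return ret;

                  /* Called after CPUHP_AP_ONLINE_DYN on online, once N_CPU is updated. */
                  return cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE,
                                           "mm/demotion:online",
                                           migration_online_cpu, NULL);
          }
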
    • mm/migrate: add CPU hotplug to demotion #ifdef · 76af6a05
      Authored by Dave Hansen
      Once upon a time, the node demotion updates were driven solely by memory
      hotplug events.  But now, there are handlers for both CPU and memory
      hotplug.
      
      However, the #ifdef around the code checks only memory hotplug.  A
      system that has HOTPLUG_CPU=y but MEMORY_HOTPLUG=n would miss CPU
      hotplug events.
      
      Update the #ifdef around the common code.  Add memory and CPU-specific
      #ifdefs for their handlers.  These memory/CPU #ifdefs avoid unused
      function warnings when their Kconfig option is off.
      
      [arnd@arndb.de: rework hotplug_memory_notifier() stub]
        Link: https://lkml.kernel.org/r/20211013144029.2154629-1-arnd@kernel.org
      
      Link: https://lkml.kernel.org/r/20210924161255.E5FE8F7E@davehans-spike.ostc.intel.com
      Fixes: 884a6e5d ("mm/migrate: update node demotion order on hotplug events")
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Wei Xu <weixugc@google.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      76af6a05
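
      An illustrative layout of the #ifdef split described above (not the
      literal diff; the comments stand in for the code they would guard):

          /* Common demotion-order update code, needed by either handler. */
          #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_HOTPLUG_CPU)
          /* ... demotion-order rebuild helpers ... */
          #endif

          /* Memory hotplug notifier only. */
          #ifdef CONFIG_MEMORY_HOTPLUG
          /* ... memory hotplug callback registration ... */
          #endif

          /* CPU hotplug handlers only: avoids unused-function warnings
           * when CONFIG_HOTPLUG_CPU=n. */
          #ifdef CONFIG_HOTPLUG_CPU
          /* ... CPU online/offline handlers ... */
          #endif
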
    • mm/migrate: optimize hotplug-time demotion order updates · 295be91f
      Authored by Dave Hansen
      Patch series "mm/migrate: 5.15 fixes for automatic demotion", v2.
      
      This contains two fixes for the "automatic demotion" code which was
      merged into 5.15:
      
       * Fix a memory hotplug performance regression by suppressing any real
         action on irrelevant hotplug events.
      
       * Ensure CPU hotplug handler is registered when memory hotplug
         is disabled.
      
      This patch (of 2):
      
      == tl;dr ==
      
      Automatic demotion opted for a simple, lazy approach to handling hotplug
      events.  This noticeably slows down memory hotplug[1].  Optimize away
      updates to the demotion order when memory hotplug events should have no
      effect.
      
      This has no effect on CPU hotplug.  There is no known problem on the CPU
      side and any work there will be in a separate series.
      
      == Background ==
      
      Automatic demotion is a memory migration strategy to ensure that new
      allocations have room in faster memory tiers on tiered memory systems.
      The kernel maintains an array (node_demotion[]) to drive these
      migrations.
      
      The node_demotion[] path is calculated by starting at nodes with CPUs
      and then "walking" to nodes with memory.  Only hotplug events which
      online or offline a node with memory (N_ONLINE) or CPUs (N_CPU) will
      actually affect the migration order.
      
      == Problem ==
      
      However, the current code is lazy.  It completely regenerates the
      migration order on *any* CPU or memory hotplug event.  The logic was
      that these events are extremely rare and that the overhead from
      indiscriminate order regeneration is minimal.
      
      Part of the update logic involves a synchronize_rcu(), which is a pretty
      big hammer.  Its overhead was large enough to be detected by some 0day
      tests that watch memory hotplug performance[1].
      
      == Solution ==
      
      Add a new helper (node_demotion_topo_changed()) which can differentiate
      between superfluous and impactful hotplug events.  Skip the expensive
      update operation for superfluous events.
      
      == Aside: Locking ==
      
      It took me a few moments to declare the locking to be safe enough for
      node_demotion_topo_changed() to work.  It all hinges on the memory
      hotplug lock:
      
      During memory hotplug events, 'mem_hotplug_lock' is held for write.
      This ensures that two memory hotplug events can not be called
      simultaneously.
      
      CPU hotplug has a similar lock (cpuhp_state_mutex) which also provides
      mutual exclusion between CPU hotplug events.  In addition, the demotion
      code acquires and holds the mem_hotplug_lock for read during its CPU
      hotplug handlers.  This provides mutual exclusion between the demotion
      memory hotplug callbacks and the CPU hotplug callbacks.
      
      This effectively allows the migration target generation code to be
      treated as if it were single-threaded.
      
      1. https://lore.kernel.org/all/20210905135932.GE15026@xsang-OptiPlex-9020/
      
      Link: https://lkml.kernel.org/r/20210924161251.093CCD06@davehans-spike.ostc.intel.com
      Link: https://lkml.kernel.org/r/20210924161253.D7673E31@davehans-spike.ostc.intel.com
      Fixes: 884a6e5d ("mm/migrate: update node demotion order on hotplug events")
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Wei Xu <weixugc@google.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      295be91f
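
      A sketch of the fast-path check named above.  node_demotion_topo_changed()
      comes from the commit text; the cached nodemask fields are purely
      illustrative:

          #include <linux/nodemask.h>

          static nodemask_t prev_node_memory;     /* illustrative cached state */
          static nodemask_t prev_node_cpu;

          static bool node_demotion_topo_changed(void)
          {
                  /* Only N_MEMORY / N_CPU membership changes can affect node_demotion[]. */
                  if (nodes_equal(node_states[N_MEMORY], prev_node_memory) &&
                      nodes_equal(node_states[N_CPU], prev_node_cpu))
                          return false;   /* superfluous event: skip the expensive update */

                  prev_node_memory = node_states[N_MEMORY];
                  prev_node_cpu = node_states[N_CPU];
                  return true;
          }

      The hotplug locking described in the aside above is what allows this
      unsynchronized-looking comparison to be treated as single-threaded.
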
  11. 18 Oct 2021: 3 commits
  12. 27 Sep 2021: 2 commits
  13. 09 Sep 2021: 5 commits
  14. 04 Sep 2021: 3 commits