- 22 3月, 2022 4 次提交
-
-
由 Matthew Wilcox (Oracle) 提交于
Convert the implementation and all callers. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Matthew Wilcox (Oracle) 提交于
Convert the callers to pass a folio and the try_to_migrate_one() worker to use a folio throughout. Fixes an assumption that a folio must be <= PMD size. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Matthew Wilcox (Oracle) 提交于
page_mapped_in_vma() really just wants to walk one page, but as the code stands, if passed the head page of a compound page, it will walk every page in the compound page. Extract pfn/nr_pages/pgoff from the struct page early, so they can be overridden by page_mapped_in_vma(). Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Matthew Wilcox (Oracle) 提交于
Instead of declaring a struct page_vma_mapped_walk directly, use these helpers to allow us to transition to a PFN approach in the following patches. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
- 04 3月, 2022 4 次提交
-
-
由 Christoph Hellwig 提交于
Split the code used to migrate to and from ZONE_DEVICE memory from migrate.c into a new file. Link: https://lkml.kernel.org/r/20220210072828.2930359-14-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: N"Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Christian Knig <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Christoph Hellwig 提交于
Make the flow a little more clear and prepare for adding a new ZONE_DEVICE memory type. Link: https://lkml.kernel.org/r/20220210072828.2930359-13-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlistair Popple <apopple@nvidia.com> Tested-by: N"Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Christian Knig <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Christoph Hellwig 提交于
Make the flow a little more clear and prepare for adding a new ZONE_DEVICE memory type. Link: https://lkml.kernel.org/r/20220210072828.2930359-12-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlistair Popple <apopple@nvidia.com> Tested-by: N"Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Christian Knig <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Christoph Hellwig 提交于
ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference count doesn't need to be treated specially for ZONE_DEVICE pages. Note that this excludes the special idle page wakeup for fsdax pages, which still happens at refcount 1. This is a separate issue and will be sorted out later. Given that only fsdax pages require the notifiacation when the refcount hits 1 now, the PAGEMAP_OPS Kconfig symbol can go away and be replaced with a FS_DAX check for this hook in the put_page fastpath. Based on an earlier patch from Ralph Campbell <rcampbell@nvidia.com>. Link: https://lkml.kernel.org/r/20220210072828.2930359-8-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NRalph Campbell <rcampbell@nvidia.com> Reviewed-by: NJason Gunthorpe <jgg@nvidia.com> Reviewed-by: NDan Williams <dan.j.williams@intel.com> Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com> Tested-by: N"Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Christian Knig <christian.koenig@amd.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
- 18 2月, 2022 3 次提交
-
-
由 Hugh Dickins 提交于
Page migration of a VM_LOCKED page tends to fail, because when the old page is unmapped, it is put on the mlock pagevec with raised refcount, which then fails the freeze. At first I thought this would be fixed by a local mlock_page_drain() at the upper rmap_walk() level - which would have nicely batched all the munlocks of that page; but tests show that the task can too easily move to another cpu, leaving pagevec residue behind which fails the migration. So try_to_migrate_one() drain the local pagevec after page_remove_rmap() from a VM_LOCKED vma; and do the same in try_to_unmap_one(), whose TTU_IGNORE_MLOCK users would want the same treatment; and do the same in remove_migration_pte() - not important when successfully inserting a new page, but necessary when hoping to retry after failure. Any new pagevec runs the risk of adding a new way of stranding, and we might discover other corners where mlock_page_drain() or lru_add_drain() would now help. Signed-off-by: NHugh Dickins <hughd@google.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Hugh Dickins 提交于
Compaction, NUMA page movement, THP collapse/split, and memory failure do isolate unevictable pages from their "LRU", losing the record of mlock_count in doing so (isolators are likely to use page->lru for their own private lists, so mlock_count has to be presumed lost). That's unfortunate, and we should put in some work to correct that: one can imagine a function to build up the mlock_count again - but it would require i_mmap_rwsem for read, so be careful where it's called. Or page_referenced_one() and try_to_unmap_one() might do that extra work. But one place that can very easily be improved is page migration's __unmap_and_move(): a small adjustment to where the successful new page is put back on LRU, and its mlock_count (if any) is built back up by remove_migration_ptes(). Signed-off-by: NHugh Dickins <hughd@google.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Hugh Dickins 提交于
Add vma argument to mlock_vma_page() and munlock_vma_page(), make them inline functions which check (vma->vm_flags & VM_LOCKED) before calling mlock_page() and munlock_page() in mm/mlock.c. Add bool compound to mlock_vma_page() and munlock_vma_page(): this is because we have understandable difficulty in accounting pte maps of THPs, and if passed a PageHead page, mlock_page() and munlock_page() cannot tell whether it's a pmd map to be counted or a pte map to be ignored. Add vma arg to page_add_file_rmap() and page_remove_rmap(), like the others, and use that to call mlock_vma_page() at the end of the page adds, and munlock_vma_page() at the end of page_remove_rmap() (end or beginning? unimportant, but end was easier for assertions in testing). No page lock is required (although almost all adds happen to hold it): delete the "Serialize with page migration" BUG_ON(!PageLocked(page))s. Certainly page lock did serialize with page migration, but I'm having difficulty explaining why that was ever important. Mlock accounting on THPs has been hard to define, differed between anon and file, involved PageDoubleMap in some places and not others, required clear_page_mlock() at some points. Keep it simple now: just count the pmds and ignore the ptes, there is no reason for ptes to undo pmd mlocks. page_add_new_anon_rmap() callers unchanged: they have long been calling lru_cache_add_inactive_or_unevictable(), which does its own VM_LOCKED handling (it also checks for not VM_SPECIAL: I think that's overcautious, and inconsistent with other checks, that mmap_region() already prevents VM_LOCKED on VM_SPECIAL; but haven't quite convinced myself to change it). Signed-off-by: NHugh Dickins <hughd@google.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
- 22 1月, 2022 1 次提交
-
-
由 Alistair Popple 提交于
This fixes the FIXME in migrate_vma_check_page(). Before migrating a page migration code will take a reference and check there are no unexpected page references, failing the migration if there are. When a thread faults on a migration entry it will take a temporary reference to the page to wait for the page to become unlocked signifying the migration entry has been removed. This reference is dropped just prior to waiting on the page lock, however the extra reference can cause migration failures so it is desirable to avoid taking it. As migration code already has a reference to the migrating page an extra reference to wait on PG_locked is unnecessary so long as the reference can't be dropped whilst setting up the wait. When faulting on a migration entry the ptl is taken to check the migration entry. Removing a migration entry also requires the ptl, and migration code won't drop its page reference until after the migration entry has been removed. Therefore retaining the ptl of a migration entry is sufficient to ensure the page has a reference. Reworking migration_entry_wait() to hold the ptl until the wait setup is complete means the extra page reference is no longer needed. [apopple@nvidia.com: v5] Link: https://lkml.kernel.org/r/20211213033848.1973946-1-apopple@nvidia.com Link: https://lkml.kernel.org/r/20211118020754.954425-1-apopple@nvidia.comSigned-off-by: NAlistair Popple <apopple@nvidia.com> Acked-by: NDavid Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 1月, 2022 7 次提交
-
-
由 Colin Ian King 提交于
The variable addr is being set and incremented in a for-loop but not actually being used. It is redundant and so addr and also variable start can be removed. Link: https://lkml.kernel.org/r/20211221185729.609630-1-colin.i.king@gmail.comSigned-off-by: NColin Ian King <colin.i.king@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Huang Ying 提交于
Now, node_demotion and next_demotion_node() are placed between __unmap_and_move() and unmap_and_move(). This hurts code readability. So move them near their users in the file. There's no functionality change in this patch. Link: https://lkml.kernel.org/r/20211206031227.3323097-1-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: NYang Shi <shy828301@gmail.com> Reviewed-by: NWei Xu <weixugc@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baolin Wang 提交于
As Yang Shi suggested [1], it will be helpful to explain why we should select target node randomly now if there are multiple target nodes. [1] https://lore.kernel.org/all/CAHbLzkqSqCL+g7dfzeOw8fPyeEC0BBv13Ny1UVGHDkadnQdR=g@mail.gmail.com/ Link: https://lkml.kernel.org/r/c31d36bd097c6e9e69fc0f409c43b78e53e64fc2.1637766801.git.baolin.wang@linux.alibaba.comSigned-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: NYang Shi <shy828301@gmail.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Zi Yan <ziy@nvidia.com> Cc: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com> Cc: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baolin Wang 提交于
We have some machines with multiple memory types like below, which have one fast (DRAM) memory node and two slow (persistent memory) memory nodes. According to current node demotion policy, if node 0 fills up, its memory should be migrated to node 1, when node 1 fills up, its memory will be migrated to node 2: node 0 -> node 1 -> node 2 ->stop. But this is not efficient and suitbale memory migration route for our machine with multiple slow memory nodes. Since the distance between node 0 to node 1 and node 0 to node 2 is equal, and memory migration between slow memory nodes will increase persistent memory bandwidth greatly, which will hurt the whole system's performance. Thus for this case, we can treat the slow memory node 1 and node 2 as a whole slow memory region, and we should migrate memory from node 0 to node 1 and node 2 if node 0 fills up. This patch changes the node_demotion data structure to support multiple target nodes, and establishes the migration path to support multiple target nodes with validating if the node distance is the best or not. available: 3 nodes (0-2) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 node 0 size: 62153 MB node 0 free: 55135 MB node 1 cpus: node 1 size: 127007 MB node 1 free: 126930 MB node 2 cpus: node 2 size: 126968 MB node 2 free: 126878 MB node distances: node 0 1 2 0: 10 20 20 1: 20 10 20 2: 20 20 10 Link: https://lkml.kernel.org/r/00728da107789bb4ed9e0d28b1d08fd8056af2ef.1636697263.git.baolin.wang@linux.alibaba.comSigned-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: N"Huang, Ying" <ying.huang@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Yang Shi <shy828301@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com> Cc: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baolin Wang 提交于
Correct the migration stats for hugetlb with using compound_nr() instead of thp_nr_pages(), meanwhile change 'nr_failed_pages' to record the number of normal pages failed to migrate, including THP and hugetlb, and 'nr_succeeded' will record the number of normal pages migrated successfully. [baolin.wang@linux.alibaba.com: fix docs, per Mike] Link: https://lkml.kernel.org/r/141bdfc6-f898-3cc3-f692-726c5f6cb74d@linux.alibaba.com Link: https://lkml.kernel.org/r/71a4b6c22f208728fe8c78ad26375436c4ff9704.1636275127.git.baolin.wang@linux.alibaba.comSigned-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: NZi Yan <ziy@nvidia.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baolin Wang 提交于
Patch series "Improve the migration stats". According to talk with Zi Yan [1], this patch set changes the return value of migrate_pages() to avoid returning a number which is larger than the number of pages the users tried to migrate by move_pages() syscall. Also fix the hugetlb migration stats and migration stats in trace_mm_compaction_migratepages(). [1] https://lore.kernel.org/linux-mm/7E44019D-2A5D-4BA7-B4D5-00D4712F1687@nvidia.com/ This patch (of 3): As Zi Yan pointed out, the syscall move_pages() can return a non-migrated number larger than the number of pages the users tried to migrate, when a THP page is failed to migrate. This is confusing for users. Since other migration scenarios do not care about the actual non-migrated number of pages except the memory compaction migration which will fix in following patch. Thus we can change the return value to return the number of {normal page, THP, hugetlb} instead to avoid this issue, and the number of THP splits will be considered as the number of non-migrated THP, no matter how many subpages of the THP are migrated successfully. Meanwhile we should still keep the migration counters using the number of normal pages. Link: https://lkml.kernel.org/r/cover.1636275127.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/6486fabc3e8c66ff613e150af25e89b3147977a6.1636275127.git.baolin.wang@linux.alibaba.comSigned-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: NZi Yan <ziy@nvidia.com> Co-developed-by: NZi Yan <ziy@nvidia.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pasha Tatashin 提交于
Patch series "page table check", v3. Ensure that some memory corruptions are prevented by checking at the time of insertion of entries into user page tables that there is no illegal sharing. We have recently found a problem [1] that existed in kernel since 4.14. The problem was caused by broken page ref count and led to memory leaking from one process into another. The problem was accidentally detected by studying a dump of one process and noticing that one page contains memory that should not belong to this process. There are some other page->_refcount related problems that were recently fixed: [2], [3] which potentially could also lead to illegal sharing. In addition to hardening refcount [4] itself, this work is an attempt to prevent this class of memory corruption issues. It uses a simple state machine that is independent from regular MM logic to check for illegal sharing at time pages are inserted and removed from page tables. [1] https://lore.kernel.org/all/xr9335nxwc5y.fsf@gthelen2.svl.corp.google.com [2] https://lore.kernel.org/all/1582661774-30925-2-git-send-email-akaher@vmware.com [3] https://lore.kernel.org/all/20210622021423.154662-3-mike.kravetz@oracle.com [4] https://lore.kernel.org/all/20211221150140.988298-1-pasha.tatashin@soleen.com This patch (of 4): There are a few places where we first update the entry in the user page table, and later change the struct page to indicate that this is anonymous or file page. In most places, however, we first configure the page metadata and then insert entries into the page table. Page table check, will use the information from struct page to verify the type of entry is inserted. Change the order in all places to first update struct page, and later to update page table. This means that we first do calls that may change the type of page (anon or file): page_move_anon_rmap page_add_anon_rmap do_page_add_anon_rmap page_add_new_anon_rmap page_add_file_rmap hugepage_add_anon_rmap hugepage_add_new_anon_rmap And after that do calls that add entries to the page table: set_huge_pte_at set_pte_at Link: https://lkml.kernel.org/r/20211221154650.1047963-1-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20211221154650.1047963-2-pasha.tatashin@soleen.comSigned-off-by: NPasha Tatashin <pasha.tatashin@soleen.com> Cc: David Rientjes <rientjes@google.com> Cc: Paul Turner <pjt@google.com> Cc: Wei Xu <weixugc@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Will Deacon <will@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 1月, 2022 1 次提交
-
-
由 Matthew Wilcox (Oracle) 提交于
We currently store large folios as 2^N consecutive entries. While this consumes rather more memory than necessary, it also turns out to be buggy. A writeback operation which starts within a tail page of a dirty folio will not write back the folio as the xarray's dirty bit is only set on the head index. With multi-index entries, the dirty bit will be found no matter where in the folio the operation starts. This does end up simplifying the page cache slightly, although not as much as I had hoped. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
-
- 05 1月, 2022 1 次提交
-
-
由 Matthew Wilcox (Oracle) 提交于
Convert all three callers of put_and_wait_on_page_locked() to folio_put_wait_locked(). This shrinks the kernel overall by 19 bytes. filemap_update_page() shrinks by 19 bytes while __migration_entry_wait() is unchanged. folio_put_wait_locked() is 14 bytes smaller than put_and_wait_on_page_locked(), but pmd_migration_entry_wait() grows by 14 bytes. It removes the assumption from pmd_migration_entry_wait() that pages cannot be larger than a PMD (which is true today, but may be interesting to explore in the future). Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
-
- 12 11月, 2021 2 次提交
-
-
由 Alistair Popple 提交于
MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a source page was already locked during migrate_vma_collect(). If it wasn't then the a second attempt is made to lock the page. However if the first attempt failed it's unlikely a second attempt will succeed, and the retry adds complexity. So clean this up by removing the retry and MIGRATE_PFN_LOCKED flag. Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag set, but nothing actually checks that. Link: https://lkml.kernel.org/r/20211025041608.289017-1-apopple@nvidia.comSigned-off-by: NAlistair Popple <apopple@nvidia.com> Reviewed-by: NRalph Campbell <rcampbell@nvidia.com> Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baolin Wang 提交于
There is no need to validate the file-backed page's refcount before trying to freeze the page's expected refcount, instead we can rely on the folio_ref_freeze() to validate if the page has the expected refcount before migrating its mapping. Moreover we are always under the page lock when migrating the page mapping, which means nowhere else can remove it from the page cache, so we can remove the xas_load() validation under the i_pages lock. Link: https://lkml.kernel.org/r/cover.1629447552.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/df4c129fd8e86a95dbc55f4663d77441cc0d3bd1.1629447552.git.baolin.wang@linux.alibaba.comSigned-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Suggested-by: NMatthew Wilcox <willy@infradead.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 11月, 2021 1 次提交
-
-
由 Yang Shi 提交于
The memory demotion needs to call migrate_pages() to do the jobs. And it is controlled by a knob, however, the knob doesn't depend on CONFIG_MIGRATION. The knob could be truned on even though MIGRATION is disabled, this will not cause any crash since migrate_pages() would just return -ENOSYS. But it is definitely not optimal to go through demotion path then retry regular swap every time. And it doesn't make too much sense to have the knob visible to the users when !MIGRATION. Move the related code from mempolicy.[h|c] to migrate.[h|c]. Link: https://lkml.kernel.org/r/20211015005559.246709-1-shy828301@gmail.comSigned-off-by: NYang Shi <shy828301@gmail.com> Acked-by: N"Huang, Ying" <ying.huang@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 10月, 2021 3 次提交
-
-
由 Huang Ying 提交于
The node demotion order needs to be updated during CPU hotplug. Because whether a NUMA node has CPU may influence the demotion order. The update function should be called during CPU online/offline after the node_states[N_CPU] has been updated. That is done in CPUHP_AP_ONLINE_DYN during CPU online and in CPUHP_MM_VMSTAT_DEAD during CPU offline. But in commit 884a6e5d ("mm/migrate: update node demotion order on hotplug events"), the function to update node demotion order is called in CPUHP_AP_ONLINE_DYN during CPU online/offline. This doesn't satisfy the order requirement. For example, there are 4 CPUs (P0, P1, P2, P3) in 2 sockets (P0, P1 in S0 and P2, P3 in S1), the demotion order is - S0 -> NUMA_NO_NODE - S1 -> NUMA_NO_NODE After P2 and P3 is offlined, because S1 has no CPU now, the demotion order should have been changed to - S0 -> S1 - S1 -> NO_NODE but it isn't changed, because the order updating callback for CPU hotplug doesn't see the new nodemask. After that, if P1 is offlined, the demotion order is changed to the expected order as above. So in this patch, we added CPUHP_AP_MM_DEMOTION_ONLINE and CPUHP_MM_DEMOTION_DEAD to be called after CPUHP_AP_ONLINE_DYN and CPUHP_MM_VMSTAT_DEAD during CPU online and offline, and register the update function on them. Link: https://lkml.kernel.org/r/20210929060351.7293-1-ying.huang@intel.com Fixes: 884a6e5d ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: N"Huang, Ying" <ying.huang@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Xu <weixugc@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Keith Busch <kbusch@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Hansen 提交于
Once upon a time, the node demotion updates were driven solely by memory hotplug events. But now, there are handlers for both CPU and memory hotplug. However, the #ifdef around the code checks only memory hotplug. A system that has HOTPLUG_CPU=y but MEMORY_HOTPLUG=n would miss CPU hotplug events. Update the #ifdef around the common code. Add memory and CPU-specific #ifdefs for their handlers. These memory/CPU #ifdefs avoid unused function warnings when their Kconfig option is off. [arnd@arndb.de: rework hotplug_memory_notifier() stub] Link: https://lkml.kernel.org/r/20211013144029.2154629-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20210924161255.E5FE8F7E@davehans-spike.ostc.intel.com Fixes: 884a6e5d ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Xu <weixugc@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Hansen 提交于
Patch series "mm/migrate: 5.15 fixes for automatic demotion", v2. This contains two fixes for the "automatic demotion" code which was merged into 5.15: * Fix memory hotplug performance regression by watching suppressing any real action on irrelevant hotplug events. * Ensure CPU hotplug handler is registered when memory hotplug is disabled. This patch (of 2): == tl;dr == Automatic demotion opted for a simple, lazy approach to handling hotplug events. This noticeably slows down memory hotplug[1]. Optimize away updates to the demotion order when memory hotplug events should have no effect. This has no effect on CPU hotplug. There is no known problem on the CPU side and any work there will be in a separate series. == Background == Automatic demotion is a memory migration strategy to ensure that new allocations have room in faster memory tiers on tiered memory systems. The kernel maintains an array (node_demotion[]) to drive these migrations. The node_demotion[] path is calculated by starting at nodes with CPUs and then "walking" to nodes with memory. Only hotplug events which online or offline a node with memory (N_ONLINE) or CPUs (N_CPU) will actually affect the migration order. == Problem == However, the current code is lazy. It completely regenerates the migration order on *any* CPU or memory hotplug event. The logic was that these events are extremely rare and that the overhead from indiscriminate order regeneration is minimal. Part of the update logic involves a synchronize_rcu(), which is a pretty big hammer. Its overhead was large enough to be detected by some 0day tests that watch memory hotplug performance[1]. == Solution == Add a new helper (node_demotion_topo_changed()) which can differentiate between superfluous and impactful hotplug events. Skip the expensive update operation for superfluous events. == Aside: Locking == It took me a few moments to declare the locking to be safe enough for node_demotion_topo_changed() to work. It all hinges on the memory hotplug lock: During memory hotplug events, 'mem_hotplug_lock' is held for write. This ensures that two memory hotplug events can not be called simultaneously. CPU hotplug has a similar lock (cpuhp_state_mutex) which also provides mutual exclusion between CPU hotplug events. In addition, the demotion code acquire and hold the mem_hotplug_lock for read during its CPU hotplug handlers. This provides mutual exclusion between the demotion memory hotplug callbacks and the CPU hotplug callbacks. This effectively allows treating the migration target generation code to act as if it is single-threaded. 1. https://lore.kernel.org/all/20210905135932.GE15026@xsang-OptiPlex-9020/ Link: https://lkml.kernel.org/r/20210924161251.093CCD06@davehans-spike.ostc.intel.com Link: https://lkml.kernel.org/r/20210924161253.D7673E31@davehans-spike.ostc.intel.com Fixes: 884a6e5d ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reported-by: Nkernel test robot <oliver.sang@intel.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Xu <weixugc@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Thelen <gthelen@google.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 10月, 2021 3 次提交
-
-
由 Matthew Wilcox (Oracle) 提交于
This is the folio equivalent of migrate_page_copy(), which is retained as a wrapper for filesystems which are not yet converted to folios. Also convert copy_huge_page() to folio_copy(). Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NZi Yan <ziy@nvidia.com> Acked-by: NVlastimil Babka <vbabka@suse.cz>
-
由 Matthew Wilcox (Oracle) 提交于
Turn migrate_page_states() into a wrapper around folio_migrate_flags(). Also convert two functions only called from folio_migrate_flags() to be folio-based. ksm_migrate_page() becomes folio_migrate_ksm() and copy_page_owner() becomes folio_copy_owner(). folio_migrate_flags() alone shrinks by two thirds -- 1967 bytes down to 642 bytes. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NZi Yan <ziy@nvidia.com> Reviewed-by: NDavid Howells <dhowells@redhat.com> Acked-by: NVlastimil Babka <vbabka@suse.cz>
-
由 Matthew Wilcox (Oracle) 提交于
Reimplement migrate_page_move_mapping() as a wrapper around folio_migrate_mapping(). Saves 193 bytes of kernel text. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDavid Howells <dhowells@redhat.com> Acked-by: NVlastimil Babka <vbabka@suse.cz>
-
- 27 9月, 2021 2 次提交
-
-
由 Matthew Wilcox (Oracle) 提交于
Convert all callers of mem_cgroup_migrate() to call page_folio() first. They all look like they're using head pages already, but this proves it. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Acked-by: NMike Rapoport <rppt@linux.ibm.com> Reviewed-by: NDavid Howells <dhowells@redhat.com> Acked-by: NVlastimil Babka <vbabka@suse.cz>
-
由 Matthew Wilcox (Oracle) 提交于
Convert all callers of mem_cgroup_charge() to call page_folio() on the page they're currently passing in. Many of them will be converted to use folios themselves soon. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDavid Howells <dhowells@redhat.com> Acked-by: NVlastimil Babka <vbabka@suse.cz>
-
- 09 9月, 2021 5 次提交
-
-
由 Arnd Bergmann 提交于
These are all handled correctly when calling the native system call entry point, so remove the special cases. Link: https://lkml.kernel.org/r/20210727144859.4150043-6-arnd@kernel.orgSigned-off-by: NArnd Bergmann <arnd@arndb.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arnd Bergmann 提交于
The compat move_pages() implementation uses compat_alloc_user_space() for converting the pointer array. Moving the compat handling into the function itself is a bit simpler and lets us avoid the compat_alloc_user_space() call. Link: https://lkml.kernel.org/r/20210727144859.4150043-4-arnd@kernel.orgSigned-off-by: NArnd Bergmann <arnd@arndb.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baolin Wang 提交于
Change to use bool type for 'page_was_mapped' variable making it more readable. Link: https://lkml.kernel.org/r/ce1279df18d2c163998c403e0b5ec6d3f6f90f7a.1629447552.git.baolin.wang@linux.alibaba.comSigned-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: NYang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baolin Wang 提交于
since commit a98a2f0c ("mm/rmap: split migration into its own function"), the migration ptes establishment has been split into a separate try_to_migrate() function, thus update the related comments. Link: https://lkml.kernel.org/r/5b824bad6183259c916ae6cf42f81d14c6118b06.1629447552.git.baolin.wang@linux.alibaba.comSigned-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: NYang Shi <shy828301@gmail.com> Reviewed-by: NAlistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baolin Wang 提交于
Use thp_nr_pages() instead of compound_nr() to get the number of pages for THP page, meanwhile introducing a local variable 'nr_pages' to avoid getting the number of pages repeatedly. Link: https://lkml.kernel.org/r/a8e331ac04392ee230c79186330fb05e86a2aa77.1629447552.git.baolin.wang@linux.alibaba.comSigned-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: NYang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 9月, 2021 3 次提交
-
-
由 Randy Dunlap 提交于
Use the expected "Return:" format to prevent a kernel-doc warning. mm/migrate.c:1157: warning: Excess function parameter 'returns' description in 'next_demotion_node' Link: https://lkml.kernel.org/r/20210808203151.10632-1-rdunlap@infradead.orgSigned-off-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yang Shi 提交于
Under normal circumstances, migrate_pages() returns the number of pages migrated. In error conditions, it returns an error code. When returning an error code, there is no way to know how many pages were migrated or not migrated. Make migrate_pages() return how many pages are demoted successfully for all cases, including when encountering errors. Page reclaim behavior will depend on this in subsequent patches. Link: https://lkml.kernel.org/r/20210721063926.3024591-3-ying.huang@intel.com Link: https://lkml.kernel.org/r/20210715055145.195411-4-ying.huang@intel.comSigned-off-by: NYang Shi <yang.shi@linux.alibaba.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Signed-off-by: N"Huang, Ying" <ying.huang@intel.com> Suggested-by: Oscar Salvador <osalvador@suse.de> [optional parameter] Reviewed-by: NYang Shi <shy828301@gmail.com> Reviewed-by: NZi Yan <ziy@nvidia.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Xu <weixugc@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Keith Busch <kbusch@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Hansen 提交于
Reclaim-based migration is attempting to optimize data placement in memory based on the system topology. If the system changes, so must the migration ordering. The implementation is conceptually simple and entirely unoptimized. On any memory or CPU hotplug events, assume that a node was added or removed and recalculate all migration targets. This ensures that the node_demotion[] array is always ready to be used in case the new reclaim mode is enabled. This recalculation is far from optimal, most glaringly that it does not even attempt to figure out the hotplug event would have some *actual* effect on the demotion order. But, given the expected paucity of hotplug events, this should be fine. Link: https://lkml.kernel.org/r/20210721063926.3024591-2-ying.huang@intel.com Link: https://lkml.kernel.org/r/20210715055145.195411-3-ying.huang@intel.comSigned-off-by: NDave Hansen <dave.hansen@linux.intel.com> Signed-off-by: N"Huang, Ying" <ying.huang@intel.com> Reviewed-by: NYang Shi <shy828301@gmail.com> Reviewed-by: NZi Yan <ziy@nvidia.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Xu <weixugc@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-