1. 04 Jul 2022, 3 commits
  2. 17 Jun 2022, 2 commits
    • mm/memory-failure: disable unpoison once hw error happens · 67f22ba7
      Committed by zhenwei pi
      Currently unpoison_memory(unsigned long pfn) is designed for soft
      poison (hwpoison-inject) only.  Since 17fae129, the KPTE gets cleared
      on an x86 platform once hardware memory corruption occurs.
      
      Unpoisoning a hardware-corrupted page only puts the page back into the
      buddy allocator, so the kernel may later access the page through a
      *NOT PRESENT* KPTE.  This leads to a BUG when the corrupted KPTE is
      accessed.
      
      As suggested by David and Naoya, disable the unpoison mechanism when a
      real HW error happens, to avoid a BUG like this:
      
       Unpoison: Software-unpoisoned page 0x61234
       BUG: unable to handle page fault for address: ffff888061234000
       #PF: supervisor write access in kernel mode
       #PF: error_code(0x0002) - not-present page
       PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
       Oops: 0002 [#1] PREEMPT SMP NOPTI
       CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G   M       OE     5.18.0.bm.1-amd64 #7
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
       RIP: 0010:clear_page_erms+0x7/0x10
       Code: ...
       RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
       RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
       RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
       R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
       R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
       FS:  00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       PKRU: 55555554
       Call Trace:
        <TASK>
        prep_new_page+0x151/0x170
        get_page_from_freelist+0xca0/0xe20
        ? sysvec_apic_timer_interrupt+0xab/0xc0
        ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
        __alloc_pages+0x17e/0x340
        __folio_alloc+0x17/0x40
        vma_alloc_folio+0x84/0x280
        __handle_mm_fault+0x8d4/0xeb0
        handle_mm_fault+0xd5/0x2a0
        do_user_addr_fault+0x1d0/0x680
        ? kvm_read_and_reset_apf_flags+0x3b/0x50
        exc_page_fault+0x78/0x170
        asm_exc_page_fault+0x27/0x30
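      
      A minimal userspace sketch of the idea (hypothetical names and
      simplified logic, not the actual mm/memory-failure.c code): a global
      flag records that a real hardware error was handled, and later
      unpoison requests are refused instead of returning the page to the
      buddy allocator.
      
      	#include <errno.h>
      	#include <stdbool.h>
      	#include <stdio.h>
      
      	static bool hw_memory_failure;	/* set once a real HW error is handled */
      
      	static void memory_failure(unsigned long pfn, bool hw_error)
      	{
      		if (hw_error)
      			hw_memory_failure = true;	/* KPTE may now be unmapped */
      		printf("poisoned pfn 0x%lx (hw=%d)\n", pfn, hw_error);
      	}
      
      	static int unpoison_memory(unsigned long pfn)
      	{
      		if (hw_memory_failure) {
      			printf("unpoison disabled after HW memory failure\n");
      			return -EOPNOTSUPP;
      		}
      		printf("unpoisoned pfn 0x%lx\n", pfn);
      		return 0;
      	}
      
      	int main(void)
      	{
      		memory_failure(0x61234, false);	/* hwpoison-inject: soft poison */
      		unpoison_memory(0x61234);	/* still allowed */
      		memory_failure(0x61235, true);	/* real HW error */
      		return unpoison_memory(0x61235) ? 0 : 1;	/* now refused */
      	}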
      
      Link: https://lkml.kernel.org/r/20220615093209.259374-2-pizhenwei@bytedance.com
      Fixes: 847ce401 ("HWPOISON: Add unpoisoning support")
      Fixes: 17fae129 ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
      Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>	[5.8+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      67f22ba7
    • mm: re-allow pinning of zero pfns · 034e5afa
      Committed by Alex Williamson
      The commit referenced below subtly and inadvertently changed the logic to
      disallow pinning of zero pfns.  This breaks device assignment with vfio
      and potentially various other users of gup.  Exclude the zero page test
      from the negation.
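      
      A minimal sketch of the boolean fix (stubbed predicates with
      hypothetical names, not the real is_pinnable_page()): the zero-pfn
      test has to sit outside the negated check so the zero page stays
      pinnable.
      
      	#include <stdbool.h>
      	#include <stdio.h>
      
      	/* broken: the negation also swallows the zero-page test */
      	static bool pinnable_broken(bool in_movable_zone, bool is_zero_pfn)
      	{
      		return !(in_movable_zone || is_zero_pfn);
      	}
      
      	/* fixed: the zero pfn is always considered pinnable again */
      	static bool pinnable_fixed(bool in_movable_zone, bool is_zero_pfn)
      	{
      		return !in_movable_zone || is_zero_pfn;
      	}
      
      	int main(void)
      	{
      		/* zero page: must be pinnable for gup users such as vfio */
      		printf("broken=%d fixed=%d\n",
      		       pinnable_broken(false, true), pinnable_fixed(false, true));
      		return 0;
      	}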
      
      Link: https://lkml.kernel.org/r/165490039431.944052.12458624139225785964.stgit@omen
      Fixes: 1c563432 ("mm: fix is_pinnable_page against a cma page")
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reported-by: Yishai Hadas <yishaih@nvidia.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: John Dias <joaodias@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Zhangfei Gao <zhangfei.gao@linaro.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Yi Liu <yi.l.liu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      034e5afa
  3. 28 May 2022, 1 commit
    • mm: fix is_pinnable_page against a cma page · 1c563432
      Committed by Minchan Kim
      Pages in the CMA area could have MIGRATE_ISOLATE as well as MIGRATE_CMA so
      the current is_pinnable_page() could miss CMA pages which have
      MIGRATE_ISOLATE.  It ends up pinning CMA pages as longterm for the
      pin_user_pages() API so CMA allocations keep failing until the pin is
      released.
      
           CPU 0                                   CPU 1 - Task B
      
      cma_alloc
      alloc_contig_range
                                              pin_user_pages_fast(FOLL_LONGTERM)
      change pageblock as MIGRATE_ISOLATE
                                              internal_get_user_pages_fast
                                              lockless_pages_from_mm
                                              gup_pte_range
                                              try_grab_folio
                                              is_pinnable_page
                                                return true;
                                              So, pinned the page successfully.
      page migration failure with pinned page
                                              ..
                                              .. After 30 sec
                                              unpin_user_page(page)
      
      CMA allocation succeeded after 30 sec.
      
      The CMA allocation path protects against the migration-type change
      race using zone->lock, but all the GUP path needs to know is whether
      the page is in the CMA area, not its exact migration type.  Thus, we
      don't need zone->lock; just check whether the migration type is either
      MIGRATE_ISOLATE or MIGRATE_CMA.
      
      Adding the MIGRATE_ISOLATE check to is_pinnable_page() could cause
      pinning to be rejected for pages on MIGRATE_ISOLATE pageblocks even
      when they are in neither the CMA area nor the movable zone, if the
      page is temporarily unmovable.  However, such migration failures
      caused by unexpectedly held temporary refcounts are a general issue,
      not specific to MIGRATE_ISOLATE, and MIGRATE_ISOLATE is a transient
      state just like other temporarily-elevated-refcount problems.
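      
      A compact sketch of the check described above (hypothetical,
      simplified stand-ins for get_pageblock_migratetype() and friends; the
      real helper is is_pinnable_page() in include/linux/mm.h): the
      migration type is read locklessly and both CMA and isolated
      pageblocks are treated as not long-term pinnable.
      
      	#include <stdbool.h>
      	#include <stdio.h>
      
      	enum migratetype { MIGRATE_MOVABLE, MIGRATE_CMA, MIGRATE_ISOLATE };
      
      	/* no zone->lock needed: an exact, stable value is not required */
      	static bool longterm_pinnable(enum migratetype mt)
      	{
      		return mt != MIGRATE_CMA && mt != MIGRATE_ISOLATE;
      	}
      
      	int main(void)
      	{
      		/* a CMA pageblock already isolated by alloc_contig_range() */
      		printf("isolated CMA pageblock pinnable? %d\n",
      		       longterm_pinnable(MIGRATE_ISOLATE));
      		return 0;
      	}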
      
      Link: https://lkml.kernel.org/r/20220524171525.976723-1-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Acked-by: Paul E. McKenney <paulmck@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1c563432
  4. 19 May 2022, 1 commit
    • random: move randomize_page() into mm where it belongs · 5ad7dd88
      Committed by Jason A. Donenfeld
      randomize_page is an mm function. It is documented like one. It contains
      the history of one. It has the naming convention of one. It looks
      just like another very similar function in mm, randomize_stack_top().
      And it has always been maintained and updated by mm people. There is no
      need for it to be in random.c. In the "which shape does not look like
      the other ones" test, pointing to randomize_page() is correct.
      
      So move randomize_page() into mm/util.c, right next to the similar
      randomize_stack_top() function.
      
      This commit contains no actual code changes.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      5ad7dd88
  5. 13 May 2022, 3 commits
    • mm/hugetlb: only drop uffd-wp special pte if required · 05e90bd0
      Committed by Peter Xu
      As with shmem uffd-wp special ptes, only drop the uffd-wp special swap pte
      if unmapping an entire vma or synchronized such that faults can not race
      with the unmap operation.  This requires passing zap_flags all the way to
      the lowest level hugetlb unmap routine: __unmap_hugepage_range.
      
      In general, unmap calls originated in hugetlbfs code will pass the
      ZAP_FLAG_DROP_MARKER flag as synchronization is in place to prevent
      faults.  The exception is hole punch which will first unmap without any
      synchronization.  Later when hole punch actually removes the page from the
      file, it will check to see if there was a subsequent fault and if so take
      the hugetlb fault mutex while unmapping again.  This second unmap will
      pass in ZAP_FLAG_DROP_MARKER.
      
      The justification for "whether to apply the ZAP_FLAG_DROP_MARKER flag
      when unmapping a hugetlb range" is (IMHO): we should never reach a
      state where a page fault could erroneously fault in, as writable, a
      page-cache page that was wr-protected, even for an extremely short
      period.  That could happen if, e.g., we passed ZAP_FLAG_DROP_MARKER
      when hugetlbfs_punch_hole() calls hugetlb_vmdelete_list(): if a page
      fault occurs after that call and before remove_inode_hugepages() is
      executed, the page cache can be mapped writable again in that small
      racy window, which can cause unexpected data to be overwritten.
      
      [peterx@redhat.com: fix sparse warning]
        Link: https://lkml.kernel.org/r/Ylcdw8I1L5iAoWhb@xz-m1.local
      [akpm@linux-foundation.org: move zap_flags_t from mm.h to mm_types.h to fix build issues]
      Link: https://lkml.kernel.org/r/20220405014915.14873-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      05e90bd0
    • mm/shmem: persist uffd-wp bit across zapping for file-backed · 999dad82
      Committed by Peter Xu
      File-backed memory is prone to being unmapped at any time.  It means all
      information in the pte will be dropped, including the uffd-wp flag.
      
      To persist the uffd-wp flag, we'll use the pte markers.  This patch
      teaches the zap code to understand uffd-wp and know when to keep or drop
      the uffd-wp bit.
      
      Add a new flag, ZAP_FLAG_DROP_MARKER, and set it in zap_details when
      we don't want to persist such information, for example when destroying
      the whole vma or punching a hole in a shmem file.  In all other cases
      we should never drop the uffd-wp bit, or the wr-protect information
      will be lost.
      
      The new ZAP_FLAG_DROP_MARKER needs to be put into mm.h rather than
      memory.c because it'll be further referenced in hugetlb files later.
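      
      A simplified, self-contained sketch of the decision this patch teaches
      the zap code to make (hypothetical types and helper names; the real
      logic lives in mm/memory.c): the uffd-wp marker is only dropped when
      the caller explicitly passes ZAP_FLAG_DROP_MARKER.
      
      	#include <stdbool.h>
      	#include <stdio.h>
      
      	#define ZAP_FLAG_DROP_MARKER	(1u << 0)
      
      	struct zap_details {
      		unsigned int zap_flags;
      	};
      
      	/* should a uffd-wp marker survive zapping this pte? */
      	static bool keep_uffd_wp_marker(const struct zap_details *details)
      	{
      		return !(details && (details->zap_flags & ZAP_FLAG_DROP_MARKER));
      	}
      
      	int main(void)
      	{
      		struct zap_details unmap_vma = { .zap_flags = ZAP_FLAG_DROP_MARKER };
      		struct zap_details plain_zap = { .zap_flags = 0 };
      
      		printf("destroying vma: keep marker? %d\n",
      		       keep_uffd_wp_marker(&unmap_vma));
      		printf("ordinary zap:   keep marker? %d\n",
      		       keep_uffd_wp_marker(&plain_zap));
      		return 0;
      	}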
      
      Link: https://lkml.kernel.org/r/20220405014847.14295-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      999dad82
    • mm/mprotect: use mmu_gather · 4a18419f
      Committed by Nadav Amit
      Patch series "mm/mprotect: avoid unnecessary TLB flushes", v6.
      
      This patchset is intended to remove unnecessary TLB flushes during
      mprotect() syscalls.  Once this patchset makes it through, similar and
      further optimizations for MADV_COLD and userfaultfd will be possible.
      
      Basically, there are 3 optimizations in this patch-set:
      
      1. Use TLB batching infrastructure to batch flushes across VMAs and do
         better/fewer flushes.  This would also be handy for later userfaultfd
         enhancements.
      
      2. Avoid unnecessary TLB flushes.  This optimization is the one that
         provides most of the performance benefits.  Unlike previous versions,
         we now only avoid flushes that would not result in spurious
         page-faults.
      
      3. Avoiding TLB flushes on change_huge_pmd() that are only needed to
         prevent the A/D bits from changing.
      
      Andrew asked for some benchmark numbers.  I do not have an easy,
      deterministic macrobenchmark in which it is easy to show a benefit.  I
      therefore ran a microbenchmark: a loop that does the following on
      anonymous memory, just as a sanity check to see that time is saved by
      avoiding TLB flushes.  The loop goes:
      
      	mprotect(p, PAGE_SIZE, PROT_READ)
      	mprotect(p, PAGE_SIZE, PROT_READ|PROT_WRITE)
      	*p = 0; // make the page writable
      
      The test was run in KVM guest with 1 or 2 threads (the second thread was
      busy-looping).  I measured the time (cycles) of each operation:
      
      		1 thread		2 threads
      		mmots	+patch		mmots	+patch
      PROT_READ	3494	2725 (-22%)	8630	7788 (-10%)
      PROT_READ|WRITE	3952	2724 (-31%)	9075	2865 (-68%)
      
      [ mmots = v5.17-rc6-mmots-2022-03-06-20-38 ]
      
      The exact numbers are really meaningless, but the benefit is clear.
      There are 2 interesting results though.
      
      (1) PROT_READ is cheaper, while one could expect it not to be
      affected.  This is presumably due to a TLB miss that is saved.
      
      (2) Without the memory access (*p = 0), the speedup of the patch is
      even greater.  In that scenario mprotect(PROT_READ) also avoids the
      TLB flush.  As a result, both operations on the patched kernel take
      roughly ~1500 cycles (with either 1 or 2 threads), whereas on mmotm
      their cost is as high as presented in the table.
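      
      A self-contained userspace variant of this microbenchmark
      (hypothetical; it times with clock_gettime() instead of reading cycle
      counters, so the absolute numbers differ from the table above):
      
      	#define _GNU_SOURCE
      	#include <stdio.h>
      	#include <string.h>
      	#include <sys/mman.h>
      	#include <time.h>
      	#include <unistd.h>
      
      	static long long now_ns(void)
      	{
      		struct timespec ts;
      
      		clock_gettime(CLOCK_MONOTONIC, &ts);
      		return ts.tv_sec * 1000000000LL + ts.tv_nsec;
      	}
      
      	int main(void)
      	{
      		long page = sysconf(_SC_PAGESIZE);
      		char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
      			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      		const int iters = 100000;
      		long long start;
      
      		if (p == MAP_FAILED)
      			return 1;
      		memset(p, 0, page);	/* make sure the page is populated */
      
      		start = now_ns();
      		for (int i = 0; i < iters; i++) {
      			mprotect(p, page, PROT_READ);
      			mprotect(p, page, PROT_READ | PROT_WRITE);
      			*p = 0;	/* touch the page so it must be writable */
      		}
      		printf("%.1f ns per iteration\n",
      		       (double)(now_ns() - start) / iters);
      		return 0;
      	}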
      
      
      This patch (of 3):
      
      change_pXX_range() currently does not use mmu_gather, but instead
      implements its own deferred TLB flushes scheme.  This both complicates the
      code, as developers need to be aware of different invalidation schemes,
      and prevents opportunities to avoid TLB flushes or perform them in finer
      granularity.
      
      The use of mmu_gather for modified PTEs has benefits in various
      scenarios even if pages are not released.  For instance, if only a
      single page needs to be flushed out of a range of many pages, only
      that page would be flushed.  If a THP page is flushed, on x86 a single
      TLB invlpg instruction can be used instead of 512 instructions (or a
      full TLB flush, which Linux would actually use by default).
      mprotect() over multiple VMAs requires only a single flush.
      
      Use mmu_gather in change_pXX_range().  As the pages are not released, only
      record the flushed range using tlb_flush_pXX_range().
      
      Handle THP similarly and get rid of flush_cache_range() which becomes
      redundant since tlb_start_vma() calls it when needed.
      
      Link: https://lkml.kernel.org/r/20220401180821.1986781-1-namit@vmware.com
      Link: https://lkml.kernel.org/r/20220401180821.1986781-2-namit@vmware.com
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Nick Piggin <npiggin@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4a18419f
  6. 10 May 2022, 3 commits
    • mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page · a7f22660
      Committed by David Hildenbrand
      Whenever GUP currently ends up taking a R/O pin on an anonymous page that
      might be shared -- mapped R/O and !PageAnonExclusive() -- any write fault
      on the page table entry will end up replacing the mapped anonymous page
      due to COW, resulting in the GUP pin no longer being consistent with the
      page actually mapped into the page table.
      
      The possible ways to deal with this situation are:
       (1) Ignore and pin -- what we do right now.
       (2) Fail to pin -- which would be rather surprising to callers and
           could break user space.
       (3) Trigger unsharing and pin the now exclusive page -- reliable R/O
           pins.
      
      Let's implement 3) because it provides the clearest semantics and
      allows unpin_user_pages() and friends to check for possible BUGs: when
      trying to unpin a page that's no longer exclusive, clearly something
      went very wrong that might result in memory corruption which is hard
      to debug.  So we'd better have a nice way to spot such issues.
      
      This change implies that whenever user space *wrote* to a private mapping
      (IOW, we have an anonymous page mapped), that GUP pins will always remain
      consistent: reliable R/O GUP pins of anonymous pages.
      
      As a side note, this commit fixes the COW security issue for hugetlb with
      FOLL_PIN as documented in:
        https://lore.kernel.org/r/3ae33b08-d9ef-f846-56fb-645e3b9b4c66@redhat.com
      The vmsplice reproducer still applies, because vmsplice uses FOLL_GET
      instead of FOLL_PIN.
      
      Note that follow_huge_pmd() doesn't apply because we cannot end up in
      there with FOLL_PIN.
      
      This commit is heavily based on prototype patches by Andrea.
      
      Link: https://lkml.kernel.org/r/20220428083441.37290-17-david@redhat.com
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Co-developed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Liang Zhang <zhangliang5@huawei.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Oded Gabbay <oded.gabbay@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      a7f22660
    • mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap() · fb3d824d
      Committed by David Hildenbrand
      ...  and move the special check for pinned pages into
      page_try_dup_anon_rmap() to prepare for tracking exclusive anonymous pages
      via a new pageflag, clearing it only after making sure that there are no
      GUP pins on the anonymous page.
      
      We really only care about pins on anonymous pages, because they are
      prone to getting replaced in the COW handler once mapped R/O.  For
      !anon pages in cow-mappings (!VM_SHARED && VM_MAYWRITE) we shouldn't
      really need to care about that; at least I could not come up with an
      example where it matters.
      
      Let's drop the is_cow_mapping() check from page_needs_cow_for_dma(), as we
      know we're dealing with anonymous pages.  Also, drop the handling of
      pinned pages from copy_huge_pud() and add a comment if ever supporting
      anonymous pages on the PUD level.
      
      This is a preparation for tracking exclusivity of anonymous pages in the
      rmap code, and disallowing marking a page shared (-> failing to duplicate)
      if there are GUP pins on a page.
      
      Link: https://lkml.kernel.org/r/20220428083441.37290-5-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Liang Zhang <zhangliang5@huawei.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Oded Gabbay <oded.gabbay@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      fb3d824d
    • mm/hugetlb: take src_mm->write_protect_seq in copy_hugetlb_page_range() · 623a1ddf
      Committed by David Hildenbrand
      Let's do it just like copy_page_range(), taking the seqlock and making
      sure the mmap_lock is held in write mode.
      
      This allows us to add a VM_BUG_ON() to page_needs_cow_for_dma() and
      properly synchronizes concurrent fork() with GUP-fast on hugetlb
      pages, which will be relevant for further changes.
      
      Link: https://lkml.kernel.org/r/20220428083441.37290-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Liang Zhang <zhangliang5@huawei.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Oded Gabbay <oded.gabbay@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      623a1ddf
  7. 29 Apr 2022, 5 commits
    • mm/sparse-vmemmap: improve memory savings for compound devmaps · 4917f55b
      Committed by Joao Martins
      A compound devmap is a dev_pagemap with @vmemmap_shift > 0.  It means
      that pages are mapped at a given huge page alignment and use compound
      pages, as opposed to order-0 pages.
      
      Take advantage of the fact that most tail pages look the same (except
      the first two) to minimize struct page overhead.  Allocate one vmemmap
      page for the area that contains the head page and a separate one for
      the next 64 pages.  The remaining subsections then reuse this tail
      vmemmap page to initialize the rest of the tail pages.
      
      Sections are arch-dependent (e.g.  on x86 it's 64M, 128M or 512M) and when
      initializing compound devmap with big enough @vmemmap_shift (e.g.  1G PUD)
      it may cross multiple sections.  The vmemmap code needs to consult @pgmap
      so that multiple sections that all map the same tail data can refer back
      to the first copy of that data for a given gigantic page.
      
      On compound devmaps with 2M alignment, this mechanism lets 6 pages be
      saved out of the 8 PFNs necessary to map the subsection's 512 struct
      pages.  On a 1G compound devmap it saves 4094 pages.
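      
      The arithmetic behind those numbers can be checked with a short
      standalone calculation (assuming 64-byte struct pages and 4K base
      pages, as on x86-64; the "keep two pages" model follows the head/tail
      reuse scheme described above):
      
      	#include <stdio.h>
      
      	int main(void)
      	{
      		const long page_size = 4096;		/* base page */
      		const long struct_page_size = 64;	/* sizeof(struct page) */
      		const long sizes[] = { 2L << 20, 1L << 30 };	/* 2M, 1G */
      
      		for (int i = 0; i < 2; i++) {
      			long nr_struct_pages = sizes[i] / page_size;
      			long vmemmap_pages =
      				nr_struct_pages * struct_page_size / page_size;
      
      			/* keep the head vmemmap page plus one reused tail page */
      			printf("%ldM compound page: %ld vmemmap pages, %ld saved\n",
      			       sizes[i] >> 20, vmemmap_pages, vmemmap_pages - 2);
      		}
      		return 0;
      	}
      
      This prints 8 vmemmap pages / 6 saved for 2M, and 4096 / 4094 for 1G,
      matching the figures quoted above.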
      
      Altmap isn't supported yet, given various restrictions in the altmap
      pfn allocator, so fall back to the already-in-use vmemmap_populate().
      It is worth noting that altmap for devmap mappings was there to
      relieve the pressure of inordinate amounts of memmap space needed to
      map terabytes of pmem.  With compound pages the motivation for altmaps
      for pmem is reduced.
      
      Link: https://lkml.kernel.org/r/20220420155310.9712-5-joao.m.martins@oracle.com
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4917f55b
    • mm/sparse-vmemmap: add a pgmap argument to section activation · e3246d8f
      Committed by Joao Martins
      Patch series "sparse-vmemmap: memory savings for compound devmaps (device-dax)", v9.
      
      This series minimizes 'struct page' overhead by pursuing a similar
      approach as Muchun Song series "Free some vmemmap pages of hugetlb page"
      (now merged since v5.14), but applied to devmap with @vmemmap_shift
      (device-dax).  
      
      The original vmemmap deduplication idea (already used in HugeTLB) is
      to reuse/deduplicate tail-page vmemmap areas, in particular the area
      which only describes tail pages.  So a vmemmap page describes 64
      struct pages, and the first vmemmap page for a given ZONE_DEVICE range
      would contain the head page and 63 tail pages.  The second vmemmap
      page would contain only tail pages, and that's what gets reused across
      the rest of the subsection/section.  The bigger the page size, the
      bigger the savings (2M hpage -> save 6 vmemmap pages; 1G hpage -> save
      4094 vmemmap pages).
      
      This is done for PMEM /specifically only/ on device-dax configured
      namespaces, not fsdax.  In other words, a devmap with a @vmemmap_shift.
      
      In terms of savings, per 1Tb of memory, the struct page cost would go down
      with compound devmap:
      
      * with 2M pages we lose 4G instead of 16G (0.39% instead of 1.5% of
        total memory)
      
      * with 1G pages we lose 40MB instead of 16G (0.0014% instead of 1.5% of
        total memory)
      
      The series is mostly summed up by patch 4, and to summarize what the
      series does:
      
      Patches 1 - 3: Minor cleanups in preparation for patch 4.  Move the very
      nice docs of hugetlb_vmemmap.c into a Documentation/vm/ entry.
      
      Patch 4: Patch 4 is the one that takes care of the struct page savings
      (also referred to here as tail-page/vmemmap deduplication).  Much like
      Muchun's series, we reuse the second PTE tail-page vmemmap areas
      across a given @vmemmap_shift.  One important difference, though, is
      that contrary to the hugetlbfs series, there's no vmemmap for the area
      yet, because we are late-populating it as opposed to remapping a
      system-RAM range.  IOW there is no freeing of pages of
      already-initialized vmemmap as in the hugetlbfs case, which greatly
      simplifies the logic (besides not being arch-specific).  The altmap
      case is unchanged and still goes via vmemmap_populate().  Also adjust
      the newly added docs to the device-dax case.
      
      [Note that device-dax is still a little behind HugeTLB in terms of
      savings.  I have an additional simple patch that reuses the head vmemmap
      page too, as a follow-up.  That will double the savings and namespaces
      initialization.]
      
      Patch 5: Initialize fewer struct pages depending on the page size,
      with DRAM-backed struct pages -- because fewer pages are unique and
      most are tail pages (with a bigger vmemmap_shift).
      
          NVDIMM namespace bootstrap improves from ~268-358 ms to
          ~80-110 ms / <1 ms on 128G NVDIMMs with 2M and 1G respectively.  And
          the needed struct page capacity will be 3.8x / 1071x smaller for 2M
          and 1G respectively.  Tested on x86 with 1.5Tb of pmem (including
          pinning, and RDMA registration/deregistration scalability with 2M
          MRs).
      
      
      This patch (of 5):
      
      In support of using compound pages for devmap mappings, plumb the pgmap
      down to the vmemmap_populate implementation.  Note that while altmap is
      retrievable from pgmap the memory hotplug code passes altmap without
      pgmap[*], so both need to be independently plumbed.
      
      So in addition to @altmap, pass @pgmap to sparse section populate
      functions namely:
      
      	sparse_add_section
      	  section_activate
      	    populate_section_memmap
         	      __populate_section_memmap
      
      Passing @pgmap allows __populate_section_memmap() both to fetch the
      vmemmap_shift for which memmap metadata is created and to let
      sparse-vmemmap correlate pgmap ranges with a given section and decide
      whether to just reuse tail pages from previously onlined sections.
      
      While at it, fix the kdoc for @altmap for sparse_add_section().
      
      [*] https://lore.kernel.org/linux-mm/20210319092635.6214-1-osalvador@suse.de/
      
      Link: https://lkml.kernel.org/r/20220420155310.9712-1-joao.m.martins@oracle.com
      Link: https://lkml.kernel.org/r/20220420155310.9712-2-joao.m.martins@oracle.com
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e3246d8f
    • mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP* · 47010c04
      Committed by Muchun Song
      The word of "free" is not expressive enough to express the feature of
      optimizing vmemmap pages associated with each HugeTLB, rename this keywork
      to "optimize".  In this patch , cheanup configs to make code more
      expressive.
      
      Link: https://lkml.kernel.org/r/20220404074652.68024-4-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      47010c04
    • mm: simplify follow_invalidate_pte() · 0e5e64c0
      Committed by Muchun Song
      The only user (DAX) of the range and pmdpp parameters of
      follow_invalidate_pte() is gone, so it is safe to remove them and make
      it static to simplify the code.  This effectively reverts the
      following commits:
      
        09796395 ("mm: add follow_pte_pmd()")
        a4d1a885 ("dax: update to new mmu_notifier semantic")
      
      There is only one caller of follow_invalidate_pte(), so just fold it
      into follow_pte() and remove it.
      
      Link: https://lkml.kernel.org/r/20220403053957.10770-7-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
      Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0e5e64c0
    • Revert "mm/memory-failure.c: fix race with changing page compound again" · 2ba2b008
      Committed by Naoya Horiguchi
      Reverts commit 888af270 ("mm/memory-failure.c: fix race with changing
      page compound again") because now we fetch the page refcount under
      hugetlb_lock in try_memory_failure_hugetlb() so that the race check is no
      longer necessary.
      
      Link: https://lkml.kernel.org/r/20220408135323.1559401-4-naoya.horiguchi@linux.dev
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Suggested-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      2ba2b008
  8. 22 Apr 2022, 1 commit
  9. 25 Mar 2022, 1 commit
  10. 23 Mar 2022, 6 commits
    • userfaultfd: provide unmasked address on page-fault · 824ddc60
      Committed by Nadav Amit
      Userfaultfd is supposed to provide the full (i.e., unmasked) address
      of the faulting access back to userspace.  However, that has not been
      the case for quite some time.
      
      Even running "userfaultfd_demo" from the userfaultfd man page provides the
      wrong output (and contradicts the man page).  Notice that
      "UFFD_EVENT_PAGEFAULT event" shows the masked address (7fc5e30b3000) and
      not the first read address (0x7fc5e30b300f).
      
      	Address returned by mmap() = 0x7fc5e30b3000
      
      	fault_handler_thread():
      	    poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
      	    UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
      		(uffdio_copy.copy returned 4096)
      	Read address 0x7fc5e30b300f in main(): A
      	Read address 0x7fc5e30b340f in main(): A
      	Read address 0x7fc5e30b380f in main(): A
      	Read address 0x7fc5e30b3c0f in main(): A
      
      The exact address is useful for various reasons and specifically for
      prefetching decisions.  If it is known that the memory is populated by
      certain objects whose size is not page-aligned, then based on the faulting
      address, the uffd-monitor can decide whether to prefetch and prefault the
      adjacent page.
      
      This bug has been in the kernel for quite some time: since commit
      1a29d85e ("mm: use vmf->address instead of of vmf->virtual_address"),
      which dates back to 2016.  A concern has been raised that existing
      userspace applications might rely on the old/wrong behavior in which
      the address is masked.  Therefore, it was suggested to provide the
      masked address unless the user explicitly asks for the exact address.
      
      Add a new userfaultfd feature UFFD_FEATURE_EXACT_ADDRESS to direct
      userfaultfd to provide the exact address.  Add a new "real_address" field
      to vmf to hold the unmasked address.  Provide the address to userspace
      accordingly.
      
      Initialize real_address in various code-paths to be consistent with
      address, even when it is not used, to be on the safe side.
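      
      A minimal userspace sketch of opting in to the new behavior (error
      handling trimmed; UFFD_FEATURE_EXACT_ADDRESS and the userfaultfd API
      are real, but the surrounding code is only an illustration):
      
      	#define _GNU_SOURCE
      	#include <fcntl.h>
      	#include <linux/userfaultfd.h>
      	#include <stdio.h>
      	#include <sys/ioctl.h>
      	#include <sys/syscall.h>
      	#include <unistd.h>
      
      	int main(void)
      	{
      		int uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
      		struct uffdio_api api = {
      			.api = UFFD_API,
      			/* ask the kernel for unmasked fault addresses */
      			.features = UFFD_FEATURE_EXACT_ADDRESS,
      		};
      
      		if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api) < 0) {
      			perror("userfaultfd");
      			return 1;
      		}
      		/* later, uffd_msg.arg.pagefault.address is no longer page-masked */
      		printf("exact-address feature negotiated\n");
      		return 0;
      	}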
      
      [namit@vmware.com: initialize real_address on all code paths, per Jan]
        Link: https://lkml.kernel.org/r/20220226022655.350562-1-namit@vmware.com
      [akpm@linux-foundation.org: fix typo in comment, per Jan]
      
      Link: https://lkml.kernel.org/r/20220218041003.3508-1-namit@vmware.com
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      824ddc60
    • mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP · e5408417
      Committed by Muchun Song
      The vmemmap_remap_free/alloc functions are relevant to HugeTLB, so
      move them into the scope of CONFIG_HUGETLB_PAGE_FREE_VMEMMAP.
      
      Link: https://lkml.kernel.org/r/20211101031651.75851-6-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
      Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
      Cc: Chen Huang <chenhuang5@huawei.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Fam Zheng <fam.zheng@bytedance.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e5408417
    • mm/memory-failure.c: fix race with changing page compound again · 888af270
      Committed by Miaohe Lin
      Patch series "A few fixup patches for memory failure", v2.
      
      This series contains a few patches to fix the race with a page's
      compound state changing, make non-LRU movable pages unhandlable, and
      so on.  More details can be found in the respective changelogs.
      
      There is a race window where, after we get the compound_head, the
      hugetlb page could be freed to the buddy allocator, or even changed
      into another compound page, just before we try to get the hwpoison
      page.  Think about the race window below:
      
        CPU 1					  CPU 2
        memory_failure_hugetlb
        struct page *head = compound_head(p);
      					  hugetlb page might be freed to
      					  buddy, or even changed to another
      					  compound page.
      
        get_hwpoison_page -- page is not what we want now...
      
      If this race happens, just bail out.  Also MF_MSG_DIFFERENT_PAGE_SIZE is
      introduced to record this event.
      
      [akpm@linux-foundation.org: s@/**@/*@, per Naoya Horiguchi]
      
      Link: https://lkml.kernel.org/r/20220312074613.4798-1-linmiaohe@huawei.com
      Link: https://lkml.kernel.org/r/20220312074613.4798-2-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      888af270
    • arch/x86/mm/numa: Do not initialize nodes twice · 1ca75fa7
      Committed by Oscar Salvador
      On x86, prior to ("mm: handle uninitialized numa nodes gracefully"),
      NUMA nodes could be allocated at three different places.
      
       - numa_register_memblks
       - init_cpu_to_node
       - init_gi_nodes
      
      All these calls happen at setup_arch, and have the following order:
      
      setup_arch
        ...
        x86_numa_init
         numa_init
          numa_register_memblks
        ...
        init_cpu_to_node
         init_memory_less_node
          alloc_node_data
          free_area_init_memoryless_node
        init_gi_nodes
         init_memory_less_node
          alloc_node_data
          free_area_init_memoryless_node
      
      numa_register_memblks() is only interested in those nodes which have
      memory, so it skips over any memoryless node it finds.  Later on, when
      we have read ACPI's SRAT table, we call init_cpu_to_node() and
      init_gi_nodes(), which initialize any memoryless nodes we might have
      that have either CPU or Initiator affinity, meaning we allocate a
      pg_data_t struct for them and mark them as ONLINE.
      
      So far so good, but the thing is that after ("mm: handle uninitialized
      numa nodes gracefully"), we allocate all possible NUMA nodes in
      free_area_init(), meaning we have a picture like the following:
      
      setup_arch
        x86_numa_init
         numa_init
          numa_register_memblks  <-- allocate non-memoryless node
        x86_init.paging.pagetable_init
         ...
          free_area_init
           free_area_init_memoryless <-- allocate memoryless node
        init_cpu_to_node
         alloc_node_data             <-- allocate memoryless node with CPU
         free_area_init_memoryless_node
        init_gi_nodes
         alloc_node_data             <-- allocate memoryless node with Initiator
         free_area_init_memoryless_node
      
      free_area_init() already allocates all possible NUMA nodes, but
      init_cpu_to_node() and init_gi_nodes() are clueless about that, so they
      go ahead and allocate a new pg_data_t struct without checking anything,
      meaning we end up allocating twice.
      
      It should be made clear that this only happens when a memoryless NUMA
      node happens to have CPU/Initiator affinity.
      
      So get rid of init_memory_less_node() and just set the node online.
      
      Note that setting the node online is needed, otherwise we choke down the
      chain when bringup_nonboot_cpus() ends up calling
      __try_online_node()->register_one_node()->...  and we blow up in
      bus_add_device().  As can be seen here:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000060
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-1-default+ #45
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/4
        RIP: 0010:bus_add_device+0x5a/0x140
        Code: 8b 74 24 20 48 89 df e8 84 96 ff ff 85 c0 89 c5 75 38 48 8b 53 50 48 85 d2 0f 84 bb 00 004
        RSP: 0000:ffffc9000022bd10 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff888100987400 RCX: ffff8881003e4e19
        RDX: ffff8881009a5e00 RSI: ffff888100987400 RDI: ffff888100987400
        RBP: 0000000000000000 R08: ffff8881003e4e18 R09: ffff8881003e4c98
        R10: 0000000000000000 R11: ffff888100402bc0 R12: ffffffff822ceba0
        R13: 0000000000000000 R14: ffff888100987400 R15: 0000000000000000
        FS:  0000000000000000(0000) GS:ffff88853fc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000060 CR3: 000000000200a001 CR4: 00000000001706b0
        Call Trace:
         device_add+0x4c0/0x910
         __register_one_node+0x97/0x2d0
         __try_online_node+0x85/0xc0
         try_online_node+0x25/0x40
         cpu_up+0x4f/0x100
         bringup_nonboot_cpus+0x4f/0x60
         smp_init+0x26/0x79
         kernel_init_freeable+0x130/0x2f1
         kernel_init+0x17/0x150
         ret_from_fork+0x22/0x30
      
      The reason is simple, by the time bringup_nonboot_cpus() gets called, we
      did not register the node_subsys bus yet, so we crash when
      bus_add_device() tries to dereference bus()->p.
      
      The following shows the order of the calls:
      
      kernel_init_freeable
       smp_init
        bringup_nonboot_cpus
         ...
           bus_add_device()      <- we did not register node_subsys yet
       do_basic_setup
        do_initcalls
         postcore_initcall(register_node_type);
          register_node_type
           subsys_system_register
            subsys_register
             bus_register         <- register node_subsys bus
      
      Why does setting the node online save us, then?  Simply because
      __try_online_node() backs off when the node is already online, meaning
      we do not end up calling register_one_node() in the first place.
      
      This is subtle and broken and deserves deep analysis and thought about
      how to put it into shape, but for now let us have this easy fix for
      the memory-leak issue.
      
      [osalvador@suse.de: add comments]
        Link: https://lkml.kernel.org/r/20220221142649.3457-1-osalvador@suse.de
      
      Link: https://lkml.kernel.org/r/20220218224302.5282-2-osalvador@suse.de
      Fixes: da4490c958ad ("mm: handle uninitialized numa nodes gracefully")
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Rafael Aquini <raquini@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Alexey Makhalov <amakhalov@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1ca75fa7
    • mm/gup: remove unused get_user_pages_locked() · 73fd16d8
      Committed by John Hubbard
      Now that the last caller of get_user_pages_locked() is gone, remove it.
      
      Link: https://lkml.kernel.org/r/20220204020010.68930-6-jhubbard@nvidia.com
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      73fd16d8
    • mm/gup: remove unused pin_user_pages_locked() · ad6c4412
      Committed by John Hubbard
      This routine was used for a short while, but then the calling code was
      refactored and the only caller was removed.
      
      Link: https://lkml.kernel.org/r/20220204020010.68930-4-jhubbard@nvidia.com
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ad6c4412
  11. 22 Mar 2022, 14 commits