1. 13 Aug 2020, 5 commits
    • mm: clean up the last pieces of page fault accountings · a2beb5f1
      Authored by Peter Xu
      Here are the last pieces of page fault accounting that were still done
      outside handle_mm_fault(), in places where regs==NULL when calling
      handle_mm_fault():
      
      arch/powerpc/mm/copro_fault.c:   copro_handle_mm_fault
      arch/sparc/mm/fault_32.c:        force_user_fault
      arch/um/kernel/trap.c:           handle_page_fault
      mm/gup.c:                        faultin_page
                                       fixup_user_fault
      mm/hmm.c:                        hmm_vma_fault
      mm/ksm.c:                        break_ksm
      
      Some of them have the issue of duplicated accounting for page fault
      retries; some of them didn't do the accounting at all.
      
      This patch cleans all these up by letting handle_mm_fault() do per-task
      page fault accounting even if regs==NULL (though we'll still skip the
      perf event accounting).  With that, we can safely remove all the
      outliers now.
      
      There is one other functional change: we now account the page faults to
      the caller of gup, rather than to the task_struct that was passed into
      the gup code.  More information on this can be found at [1].
      
      After this patch, the following should never be touched again outside
      handle_mm_fault():
      
        - task_struct.[maj|min]_flt
        - PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]
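
      A minimal sketch of the resulting helper (close to mm_account_fault()
      in mm/memory.c after this series; simplified here): the per-task
      counters are always bumped, while the perf events are skipped when
      regs==NULL:

        static inline void mm_account_fault(struct pt_regs *regs,
                                            unsigned long address,
                                            unsigned int flags, vm_fault_t ret)
        {
                bool major;

                /* Only account completed faults; retries and errors are skipped. */
                if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
                        return;

                major = (ret & VM_FAULT_MAJOR) || (flags & FAULT_FLAG_TRIED);

                /* Per-task accounting now happens even for regs==NULL callers. */
                if (major)
                        current->maj_flt++;
                else
                        current->min_flt++;

                /* The perf events still need the interrupted context. */
                if (!regs)
                        return;

                if (major)
                        perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
                else
                        perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
        }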
      
      [1] https://lore.kernel.org/lkml/CAHk-=wj_V2Tps2QrMn20_W0OJF9xqNh52XSGA42s-ZJ8Y+GyKw@mail.gmail.com/

      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200707225021.200906-25-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: do page fault accounting in handle_mm_fault · bce617ed
      Authored by Peter Xu
      Patch series "mm: Page fault accounting cleanups", v5.
      
      This is v5 of the page fault accounting cleanup series.  It originates
      from Gerald Schaefer's report of an issue a week ago regarding
      incorrect page fault accounting for retried page faults after commit
      4064b982 ("mm: allow VM_FAULT_RETRY for multiple times"):
      
        https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/
      
      What this series did:
      
        - Correct page fault accounting: account a page fault (no matter
          whether it's from #PF handling, or gup, or anything else) only
          once, when the fault completes.  For example, page fault retries
          should not be counted in the page fault counters.  The same
          applies to the perf events.
      
        - Unify the definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
          event is used in an ad hoc way across different archs.

          Case (1): for many archs it's done at the entry of the page fault
          handler, so that it also covers e.g. erroneous faults.

          Case (2): for some other archs, it is only accounted when the page
          fault is resolved successfully.

          Case (3): quite a few archs have not enabled this perf event at all.

          Since this series touches nearly all the archs, we unify this
          perf event to always follow case (1), which is the one that makes the
          most sense (see the one-line example after this list).  And since we
          moved the accounting into handle_mm_fault, the other two MAJ/MIN perf
          events are naturally taken care of as well.
      
        - Unify definition of "major faults": the definition of "major
          fault" is slightly changed when used in accounting (not
          VM_FAULT_MAJOR).  More information in patch 1.
      
        - Always account the page fault to the task that triggered it.  This
          does not matter much for #PF handling, but mostly for gup.  More
          information on this in patch 25.
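
      For reference, case (1) above corresponds to the familiar one-liner at
      the entry of an arch #PF handler (as in e.g. arch/x86/mm/fault.c):

        perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);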
      
      Patchset layout:
      
      Patch 1:     Introduced the accounting in handle_mm_fault(), not enabled.
      Patch 2-23:  Enable the new accounting for arch #PF handlers one by one.
      Patch 24:    Enable the new accounting for the remaining outliers (gup, iommu, etc.)
      Patch 25:    Clean up the GUP task_struct pointer since it is no longer needed
      
      This patch (of 25):
      
      This is a preparation patch to move page fault accounting into the
      generic code in handle_mm_fault().  This includes both the per-task
      maj_flt/min_flt counters, and the major/minor page fault perf events.
      To do this, the pt_regs pointer is passed into handle_mm_fault().
      
      PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
      handlers.
      
      So far, every pt_regs pointer passed into handle_mm_fault() is NULL,
      which means this patch should have no intended functional change.
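
      After this patch the prototype becomes (include/linux/mm.h):

        vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
                                   unsigned long address, unsigned int flags,
                                   struct pt_regs *regs);

      Arch #PF handlers pass their pt_regs; gup and other non-#PF callers
      pass NULL for now.
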
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
      Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/gup: use a standard migration target allocation callback · ed03d924
      Authored by Joonsoo Kim
      There is a well-defined migration target allocation callback. Use it.
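
      A sketch of the gup CMA-migration call site after the conversion
      (simplified; the control structure replaces the open-coded
      new_non_cma_page() callback, field values shown from memory):

        struct migration_target_control mtc = {
                .nid = NUMA_NO_NODE,
                .gfp_mask = GFP_USER | __GFP_NOWARN,
        };

        ret = migrate_pages(&cma_page_list, alloc_migration_target, NULL,
                            (unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE);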
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Roman Gushchin <guro@fb.com>
      Link: http://lkml.kernel.org/r/1596180906-8442-3-git-send-email-iamjoonsoo.kim@lge.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb: make hugetlb migration callback CMA aware · bbe88753
      Authored by Joonsoo Kim
      new_non_cma_page() in gup.c needs to allocate a new page that is not in
      the CMA area.  It implements this by using the allocation scope APIs.
      
      However, there is a work-around for hugetlb.  The normal hugetlb page
      allocation API for migration is alloc_huge_page_nodemask().  It
      consists of two steps: first, dequeue a page from the pool; second, if
      no page is available on the queue, allocate one with the page
      allocator.

      new_non_cma_page() can't use this API since the first step (dequeue)
      isn't aware of the scope API that excludes the CMA area.  So
      new_non_cma_page() exports the hugetlb-internal function for the second
      step, alloc_migrate_huge_page(), to global scope and uses it directly.
      This is suboptimal since hugetlb pages on the queue cannot be utilized.
      
      This patch fixes the situation by making the hugetlb dequeue function
      CMA aware: in the dequeue function, CMA memory is skipped if the
      PF_MEMALLOC_NOCMA flag is set on the current task, as in the sketch
      below.
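
      A simplified sketch of the CMA-aware dequeue (modeled on
      dequeue_huge_page_node_exact() in mm/hugetlb.c; details trimmed):

        static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
        {
                struct page *page;
                bool nocma = !!(current->flags & PF_MEMALLOC_NOCMA);

                list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
                        /* Skip CMA pages inside a nocma allocation scope. */
                        if (nocma && is_migrate_cma_page(page))
                                continue;
                        if (PageHWPoison(page))
                                continue;
                        goto found;
                }
                return NULL;

        found:
                list_move(&page->lru, &h->hugepage_activelist);
                set_page_refcounted(page);
                h->free_huge_pages--;
                h->free_huge_pages_node[nid]--;
                return page;
        }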
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Roman Gushchin <guro@fb.com>
      Link: http://lkml.kernel.org/r/1596180906-8442-2-git-send-email-iamjoonsoo.kim@lge.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/gup: restrict CMA region by using allocation scope API · 41b4dc14
      Authored by Joonsoo Kim
      We have a well-defined scope API to exclude the CMA region.  Use it
      rather than manipulating gfp_mask manually.  With this change, we can
      restore __GFP_MOVABLE in gfp_mask, as for a usual migration target
      allocation; as a result, ZONE_MOVABLE is also searched by the page
      allocator.  For hugetlb, gfp_mask is redefined since it has a regular
      allocation mask filter for migration targets.  __GFP_NOWARN is added to
      the hugetlb gfp_mask filter since its new user, gup, wants to be silent
      when allocation fails.
      
      Note that this can be considered a fix for commit 9a4e9f3b
      ("mm: update get_user_pages_longterm to migrate pages allocated from CMA
      region").  However, no "Fixes" tag is added here, since the old
      behaviour was merely suboptimal and did not cause any problem.
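
      A sketch of the scope API in question (hypothetical call site; inside
      the nocma scope the page allocator skips CMA pageblocks, so the mask no
      longer has to drop __GFP_MOVABLE by hand):

        unsigned int flags = memalloc_nocma_save();

        /* __GFP_MOVABLE may stay: the scope, not the mask, excludes CMA. */
        page = __alloc_pages_node(nid, GFP_USER | __GFP_MOVABLE | __GFP_NOWARN, 0);

        memalloc_nocma_restore(flags);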
      Suggested-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
      Link: http://lkml.kernel.org/r/1596180906-8442-1-git-send-email-iamjoonsoo.kim@lge.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 08 Aug 2020, 1 commit
  3. 20 Jun 2020, 2 commits
  4. 10 Jun 2020, 6 commits
  5. 09 Jun 2020, 3 commits
  6. 04 Jun 2020, 5 commits
  7. 03 Jun 2020, 4 commits
    • mm/gup.c: further document vma_permits_fault() · 548b6a1e
      Authored by Miles Chen
      Describe the caller's responsibilities when passing
      FAULT_FLAG_ALLOW_RETRY.
      
      Link: http://lkml.kernel.org/r/1586915606.5647.5.camel@mtkswgap22
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/gup: introduce pin_user_pages_unlocked · 91429023
      Authored by John Hubbard
      Introduce pin_user_pages_unlocked(), which is nearly identical to the
      get_user_pages_unlocked() that it wraps, except that it sets FOLL_PIN
      and rejects FOLL_GET.
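
      The wrapper is short enough to show in full (essentially as merged in
      mm/gup.c):

        long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
                                     struct page **pages, unsigned int gup_flags)
        {
                /* FOLL_GET and FOLL_PIN are mutually exclusive. */
                if (WARN_ON_ONCE(gup_flags & FOLL_GET))
                        return -EINVAL;

                gup_flags |= FOLL_PIN;
                return get_user_pages_unlocked(start, nr_pages, pages, gup_flags);
        }
        EXPORT_SYMBOL(pin_user_pages_unlocked);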
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Walls <awalls@md.metrocast.net>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Link: http://lkml.kernel.org/r/20200518012157.1178336-2-jhubbard@nvidia.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/gup.c: update the documentation · adc8cb40
      Authored by Souptick Joarder
      This patch is an attempt to update the documentation.

       - Add/remove the extra '*' based on whether the function is static or
         global.

       - Add descriptions for functions and their input arguments.
      
      [akpm@linux-foundation.org: s@/*@/**@]
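
      For context, the convention being applied (names below are generic
      placeholders, not hunks from this patch): global functions get the
      kernel-doc "/**" opener, while static helpers keep a plain comment:

        /**
         * some_exported_helper() - one-line summary of the function
         * @arg: what the argument means to the caller
         */
        int some_exported_helper(int arg);

        /* Static helpers keep a plain comment; no kernel-doc is generated. */
        static int some_static_helper(int arg);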
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1588013630-4497-1-git-send-email-jrdr.linux@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • gup: document and work around "COW can break either way" issue · 17839856
      Authored by Linus Torvalds
      Doing a "get_user_pages()" on a copy-on-write page for reading can be
      ambiguous: the page can be COW'ed at any time afterwards, and the
      direction of a COW event isn't defined.
      
      Yes, whoever writes to it will generally do the COW, but if the thread
      that did the get_user_pages() unmapped the page before the write (and
      that could happen due to memory pressure in addition to any outright
      action), the writer could also just take over the old page instead.
      
      End result: the get_user_pages() call might result in a page pointer
      that is no longer associated with the original VM, and is associated
      with - and controlled by - another VM having taken it over instead.
      
      So when doing a get_user_pages() on a COW mapping, the only really safe
      thing to do would be to break the COW when getting the page, even when
      only getting it for reading.
      
      At the same time, some users simply don't even care.
      
      For example, the perf code wants to look up the page not because it
      cares about the page, but because the code simply wants to look up the
      physical address of the access for informational purposes, and doesn't
      really care about races when a page might be unmapped and remapped
      elsewhere.
      
      This adds logic to force a COW event by setting FOLL_WRITE on any
      copy-on-write mapping when FOLL_GET (or FOLL_PIN) is used to get a page
      pointer as a result.
      
      The current semantics end up being:
      
       - __get_user_pages_fast(): no change. If you don't ask for a write,
         you won't break COW. You'd better know what you're doing.
      
       - get_user_pages_fast(): the fast-case "look it up in the page tables
         without anything getting mmap_sem" now refuses to follow a read-only
         page, since it might need COW breaking.  Which happens in the slow
         path - the fast path doesn't know if the memory might be COW or not.
      
       - get_user_pages() (including the slow-path fallback for gup_fast()):
         for a COW mapping, turn on FOLL_WRITE for FOLL_GET/FOLL_PIN, with
         very similar semantics to FOLL_FORCE.
      
      If it turns out that we want finer granularity (ie "only break COW when
      it might actually matter" - things like the zero page are special and
      don't need to be broken) we might need to push these semantics deeper
      into the lookup fault path.  So if people care enough, it's possible
      that we might end up adding a new internal FOLL_BREAK_COW flag to go
      with the internal FOLL_COW flag we already have for tracking "I had a
      COW".
      
      Alternatively, if it turns out that different callers might want to
      explicitly control the forced COW break behavior, we might even want to
      make such a flag visible to the users of get_user_pages() instead of
      using the above default semantics.
      
      But for now, this is mostly commentary on the issue (this commit message
      being a lot bigger than the patch, and that patch in turn is almost all
      comments), with that minimal "enable COW breaking early" logic using the
      existing FOLL_WRITE behavior.
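
      The "minimal logic" boils down to this check (essentially as added to
      mm/gup.c):

        static inline bool should_force_cow_break(struct vm_area_struct *vma,
                                                  unsigned int flags)
        {
                return is_cow_mapping(vma->vm_flags) && (flags & FOLL_GET);
        }

                /* ...applied per vma in the slow path: */
                if (should_force_cow_break(vma, foll_flags))
                        foll_flags |= FOLL_WRITE;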
      
      [ It might be worth noting that we've always had this ambiguity, and it
        could arguably be seen as a user-space issue.
      
        You only get private COW mappings that could break either way in
        situations where user space is doing cooperative things (ie fork()
        before an execve() etc), but it _is_ surprising and very subtle, and
        fork() is supposed to give you independent address spaces.
      
        So let's treat this as a kernel issue and make the semantics of
        get_user_pages() easier to understand. Note that obviously a true
        shared mapping will still get a page that can change under us, so this
        does _not_ mean that get_user_pages() somehow returns any "stable"
        page ]
      Reported-by: Jann Horn <jannh@google.com>
      Tested-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Kirill Shutemov <kirill@shutemov.name>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 15 May 2020, 1 commit
  9. 22 Apr 2020, 1 commit
  10. 21 Apr 2020, 1 commit
  11. 09 Apr 2020, 1 commit
  12. 08 Apr 2020, 5 commits
  13. 03 Apr 2020, 5 commits
    • mm/gup: allow to react to fatal signals · 71335f37
      Authored by Peter Xu
      The existing gup code does not react to fatal signals in many code
      paths.  For example, one gup retry path still uses down_read() rather
      than down_read_killable().  Also, when doing page faults we don't pass
      in FAULT_FLAG_KILLABLE, so the faulting process waits in a non-killable
      way as well.  These issues were spotted by Linus during the code review
      of some other patches.
      
      Let's allow the gup code to react to fatal signals, to improve the
      responsiveness of threads that get killed while in gup.
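
      A sketch of the two changes (simplified from the mmap_sem-era
      __get_user_pages_locked() and faultin_page()):

        /* Retake the lock killably instead of with a plain down_read()... */
        ret = down_read_killable(&mm->mmap_sem);
        if (ret) {
                BUG_ON(ret > 0);
                if (!pages_done)
                        pages_done = -EINTR;
                break;
        }

        /* ...and make the fault wait killable as well, in faultin_page(): */
        fault_flags |= FAULT_FLAG_KILLABLE;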
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Brian Geffon <bgeffon@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Link: http://lkml.kernel.org/r/20200220160256.9887-1-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/gup: allow VM_FAULT_RETRY for multiple times · 4426e945
      Authored by Peter Xu
      This is the gup counterpart of the change that allows VM_FAULT_RETRY to
      happen more than once.  One thing to note is that we must check for
      fatal signals before each retry, because GUP can be interrupted by
      them; otherwise we could loop forever.
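
      The guard looks roughly like this inside the retry loop of
      __get_user_pages_locked() (simplified):

        /*
         * With multiple retries allowed, a fatal signal must break the
         * loop, or a task could spin here forever after e.g. SIGKILL.
         */
        if (fatal_signal_pending(current)) {
                if (!pages_done)
                        pages_done = -EINTR;
                break;
        }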
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Brian Geffon <bgeffon@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Link: http://lkml.kernel.org/r/20200220195357.16371-1-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/gup: fix __get_user_pages() on fault retry of hugetlb · ad415db8
      Authored by Peter Xu
      When follow_hugetlb_page() returns with *locked==0, it means we got a
      VM_FAULT_RETRY within the faulting process and we've released the
      mmap_sem.  When that happens, we should stop and bail out.
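
      The fix in __get_user_pages() is essentially (lightly trimmed):

        i = follow_hugetlb_page(mm, vma, pages, vmas, &start,
                                &nr_pages, i, gup_flags, locked);
        if (locked && *locked == 0) {
                /*
                 * We've got a VM_FAULT_RETRY and we've lost mmap_sem.
                 * We must stop here.
                 */
                BUG_ON(ret != 0);
                goto out;
        }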
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Brian Geffon <bgeffon@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Link: http://lkml.kernel.org/r/20200220155353.8676-3-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/gup: rename "nonblocking" to "locked" where proper · 4f6da934
      Authored by Peter Xu
      Patch series "mm: Page fault enhancements", v6.
      
      This series contains cleanups and enhancements to the current page
      fault logic.  The whole idea comes from the discussion between Andrea
      and Linus on the bug reported by syzbot here:
      
        https://lkml.org/lkml/2017/11/2/833
      
      Basically it does two things:
      
        (a) Allows the page fault logic to react not only to SIGKILL, but
            also to the rest of the userspace signals, and,

        (b) Allows the page fault retry (VM_FAULT_RETRY) to happen more than
            once.
      
      For (a): with the changes we should be able to react faster when page
      faults run in parallel with userspace signals like SIGSTOP and SIGCONT
      (and more).  With that we can remove the buggy part in userfaultfd, and
      the whole page fault mechanism benefits from faster signal delivery to
      userspace.
      
      For (b), we should be able to allow the page fault handler to loop more
      than twice.  Some context: for now, with FAULT_FLAG_ALLOW_RETRY we can
      retry the page fault once within the same interrupt context, but never
      more than twice.  Removing this assumption is a potential cleanup,
      since AFAIU the code itself doesn't really have a twice-only limitation
      (though it may have been a protective approach in the past).  At the
      same time it greatly simplifies future work like userfaultfd
      write-protect, where it's possible to retry more than twice (please
      have a look at [1] below for a possible user that might require the
      page fault to be handled a third time; if we can remove the retry
      limitation we can simply drop that patch and its complexity).
      
      This patch (of 16):
      
      There are plenty of places around __get_user_pages() with a parameter
      named "nonblocking" which does not really mean "it won't block" (it can
      block), but instead indicates whether the mmap_sem is released by
      up_read() during the page fault handling, mostly when VM_FAULT_RETRY is
      returned.

      We have the correct naming in e.g. get_user_pages_locked() or
      get_user_pages_remote() as "locked"; however, there are still many
      places that use "nonblocking" as the name.

      Rename these places to "locked" where proper, to better suit the
      functionality of the variable.  While at it, fix up some of the
      comments accordingly.
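
      An illustrative before/after on one such call site (faultin_page() in
      mm/gup.c; prototypes simplified):

        /* Before: "nonblocking" wrongly suggests the call cannot block. */
        static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
                                unsigned long address, unsigned int *flags,
                                int *nonblocking);

        /* After: "locked" names what it tracks: whether mmap_sem is still held. */
        static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
                                unsigned long address, unsigned int *flags,
                                int *locked);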
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Brian Geffon <bgeffon@google.com>
      Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Reviewed-by: Jerome Glisse <jglisse@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Link: http://lkml.kernel.org/r/20200220155353.8676-2-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path · df3a0a21
      Authored by Pingfan Liu
      FOLL_LONGTERM is a special case of FOLL_PIN.  It indicates a pin that
      is going to be given to hardware and can't move.  Such a pin would tie
      up CMA memory permanently, so CMA pages should be excluded from it.
      
      In the gup slow path, __gup_longterm_locked()->
      check_and_migrate_cma_pages() handles FOLL_LONGTERM, but the fast path
      lacks such a check, which means a CMA page may leak into a long-term
      pin.
      
      Place a check in try_grab_compound_head() in the fast path to fix the
      leak; if FOLL_LONGTERM is requested on a CMA page, gup falls back to
      the slow path, which migrates the page first.
      
      A note about the check: a huge page's subpages all have the same
      migrate type, due to allocation either from a free_list[] or from
      alloc_contig_range() with param MIGRATE_MOVABLE.  So it is enough to
      check a single subpage with is_migrate_cma_page(subpage).
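
      The fast-path check essentially looks like this (as added to the
      mm/gup.c of that era):

        static struct page *try_grab_compound_head(struct page *page,
                                                   int refs, unsigned int flags)
        {
                if (flags & FOLL_GET)
                        return try_get_compound_head(page, refs);
                else if (flags & FOLL_PIN) {
                        /*
                         * The fix: refuse a FOLL_LONGTERM pin on a CMA page
                         * here; returning NULL makes gup fall back to the
                         * slow path, which can migrate the page out of CMA.
                         */
                        if (unlikely(flags & FOLL_LONGTERM) &&
                            is_migrate_cma_page(page))
                                return NULL;
                        return try_get_compound_head(page,
                                        refs * GUP_PIN_COUNTING_BIAS);
                }

                WARN_ON_ONCE(1);
                return NULL;
        }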
      Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Link: http://lkml.kernel.org/r/1584876733-17405-3-git-send-email-kernelfans@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>