- 21 Apr, 2020 1 commit
-
-
Committed by Mauro Carvalho Chehab
Several references got broken due to the txt to ReST conversion. Several of them can be automatically fixed with: scripts/documentation-file-ref-check --fix
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> # hwtracing/coresight/Kconfig Reviewed-by: Paul E. McKenney <paulmck@kernel.org> # memory-barrier.txt Acked-by: Alex Shi <alex.shi@linux.alibaba.com> # translations/zh_CN Acked-by: Federico Vaga <federico.vaga@vaga.pv.it> # translations/it_IT Acked-by: Marc Zyngier <maz@kernel.org> # kvm/arm64 Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/6f919ddb83a33b5f2a63b6b5f0575737bb2b36aa.1586881715.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-
- 09 Apr, 2020 1 commit
-
-
Committed by Hillf Danton
__get_user_pages_locked() returns 0 instead of -EINTR after commit 4426e945 ("mm/gup: allow VM_FAULT_RETRY for multiple times"), which added extra code to let gup detect fatal signals faster. Restore the original -EINTR behavior.
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 4426e945 ("mm/gup: allow VM_FAULT_RETRY for multiple times") Reported-by: syzbot+3be1a33f04dc782e9fd5@syzkaller.appspotmail.com Signed-off-by: Hillf Danton <hdanton@sina.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 08 Apr, 2020 5 commits
-
-
Committed by Peter Xu
It's definitely incorrect to mark the lock as taken even if down_read_killable() failed. This was overlooked when we switched from down_read() to down_read_killable(), because down_read() cannot fail while down_read_killable() can.
Fixes: 71335f37 ("mm/gup: allow to react to fatal signals") Reported-by: syzbot+a8c70b7f3579fc0587dc@syzkaller.appspotmail.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
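The bug class described above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the actual mm/gup.c code: `fake_rwsem`, `fake_down_read_killable()` and `retry_take_lock()` are hypothetical names invented for illustration. It only shows the rule that the "locked" indicator may be set after the killable lock call has returned success, and never before checking its result.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reader lock; this is NOT the kernel rwsem. */
struct fake_rwsem {
    bool fatal_signal_pending; /* simulates a pending fatal signal */
    int readers;               /* number of readers holding the lock */
};

/* Mimics the contract of a killable lock: 0 on success, -EINTR if killed. */
static int fake_down_read_killable(struct fake_rwsem *sem)
{
    if (sem->fatal_signal_pending)
        return -EINTR;
    sem->readers++;
    return 0;
}

/*
 * Take the lock and report the lock state through *locked.
 * *locked is set to 1 only when the lock was really acquired.
 */
static int retry_take_lock(struct fake_rwsem *sem, int *locked)
{
    int ret = fake_down_read_killable(sem);

    if (ret) {
        *locked = 0; /* the lock is not held, so do not claim it is */
        return ret;
    }
    *locked = 1;
    return 0;
}
```

With a non-failing lock primitive the error branch would never run, which is presumably why the unconditional assignment went unnoticed until the switch to the killable variant.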
-
Committed by Joe Perches
Convert the various /* fallthrough */ comments to the pseudo-keyword fallthrough; Done via script: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Link: http://lkml.kernel.org/r/f62fea5d10eb0ccfc05d87c242a620c261219b66.camel@perches.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
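For readers unfamiliar with the pseudo-keyword, here is a small standalone sketch of how such a macro can be defined and used. The macro definition mirrors the general approach (a statement attribute where the compiler supports it, an empty statement otherwise) but is written here from scratch for userspace; `steps_from()` is a hypothetical example function, not kernel code.

```c
/* Use the statement attribute when the compiler provides it. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
/* Otherwise fall back to an empty statement plus the classic comment. */
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: count how many cases run when entering at 'level'. */
static int steps_from(int level)
{
    int n = 0;

    switch (level) {
    case 3:
        n++;
        fallthrough; /* was: a fallthrough comment */
    case 2:
        n++;
        fallthrough;
    case 1:
        n++;
        break;
    default:
        break;
    }
    return n;
}
```

Unlike a comment, the attribute form is visible to the compiler, so warnings about implicit fallthrough can distinguish intended cases from forgotten `break` statements.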
-
Committed by Huang Ying
Some comments for MADV_FREE are revised and added to help people understand the MADV_FREE code, especially the page flag PG_swapbacked. This makes page_is_file_cache() inconsistent with its comments, so the function is renamed to page_is_file_lru() to make them consistent again. All these are put in one patch as one logical change.
Suggested-by: David Hildenbrand <david@redhat.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: David Rientjes <rientjes@google.com> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@kernel.org> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@surriel.com> Link: http://lkml.kernel.org/r/20200317100342.2730705-1-ying.huang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Anshuman Khandual
This replaces all remaining open encodings with vma_is_anonymous().
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nick Piggin <npiggin@gmail.com> Cc: Paul Burton <paulburton@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/1582520593-30704-5-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Anshuman Khandual
Let's move the vma_is_accessible() helper to include/linux/mm.h, which makes it available for general use. While here, this replaces all remaining open encodings for the VMA access check with vma_is_accessible().
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Guo Ren <guoren@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paulburton@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Nick Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/1582520593-30704-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
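To make "open encoding" concrete, the sketch below shows what such helpers look like when reduced to a userspace mock. The `vm_area_struct` here is a drastically cut-down stand-in with only the two fields the helpers touch, and the helper bodies reflect my understanding of the kernel versions of that period (flag test and NULL `vm_ops` test); treat them as an illustration rather than a copy of include/linux/mm.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Permission bits as used in vm_flags. */
#define VM_READ  0x00000001UL
#define VM_WRITE 0x00000002UL
#define VM_EXEC  0x00000004UL

struct vm_operations_struct; /* opaque in this mock */

/* Mock VMA: only the fields needed for the two helpers. */
struct vm_area_struct {
    unsigned long vm_flags;
    const struct vm_operations_struct *vm_ops;
};

/* Replaces open-coded tests such as "vma->vm_flags & (VM_READ | ...)". */
static inline bool vma_is_accessible(const struct vm_area_struct *vma)
{
    return (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)) != 0;
}

/* Replaces open-coded tests such as "!vma->vm_ops". */
static inline bool vma_is_anonymous(const struct vm_area_struct *vma)
{
    return vma->vm_ops == NULL;
}
```

A call site then reads `if (!vma_is_accessible(vma))` instead of repeating the flag mask, which is the kind of replacement the two commits describe.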
-
- 03 Apr, 2020 14 commits
-
-
Committed by Peter Xu
The existing gup code does not react to fatal signals in many code paths. For example, in one retry path of gup we're still using down_read() rather than down_read_killable(). Also, when doing page faults we don't pass in FAULT_FLAG_KILLABLE, which means that within the faulting process we'll wait in a non-killable way too. These were spotted by Linus during the code review of some other patches. Let's allow the gup code to react to fatal signals, to improve the responsiveness of threads that are killed during gup.
Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160256.9887-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Peter Xu
This is the gup counterpart of the change that allows VM_FAULT_RETRY to happen more than once. One thing to mention is that we must check for a fatal signal here before retrying, because the GUP can be interrupted by one; otherwise we can loop forever.
Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220195357.16371-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
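The shape of an unbounded retry loop guarded by a fatal-signal check can be sketched in userspace as follows. Everything here is simulated: `fake_task`, `fake_fatal_signal_pending()`, `fake_fault_needs_retry()` and `fault_in_with_retry()` are hypothetical names, and the "fault" is just a counter. The only point is the ordering: check for a fatal signal before each retry, so that a fault which keeps asking for a retry cannot trap a dying task in the loop.

```c
#include <errno.h>
#include <stdbool.h>

/* Simulated task state; none of this is kernel API. */
struct fake_task {
    int retries_left; /* how many more times the fault asks for a retry */
    int kill_after;   /* fatal signal arrives after this many tries; <0: never */
    int tries;        /* number of fault attempts made so far */
};

static bool fake_fatal_signal_pending(const struct fake_task *t)
{
    return t->kill_after >= 0 && t->tries >= t->kill_after;
}

/* One fault attempt; returns true when it must be retried. */
static bool fake_fault_needs_retry(struct fake_task *t)
{
    t->tries++;
    if (t->retries_left > 0) {
        t->retries_left--;
        return true;
    }
    return false;
}

/* Retry as often as needed, but give up with -EINTR once the task is killed. */
static int fault_in_with_retry(struct fake_task *t)
{
    for (;;) {
        if (fake_fatal_signal_pending(t))
            return -EINTR;
        if (!fake_fault_needs_retry(t))
            return 0;
    }
}
```

Without the check at the top of the loop, a task whose fault never stops requesting retries would spin until the fault resolved, regardless of any pending kill.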
-
Committed by Peter Xu
When follow_hugetlb_page() returns with *locked==0, it means we've got a VM_FAULT_RETRY within the faulting process and we've released the mmap_sem. When that happens, we should stop and bail out.
Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220155353.8676-3-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Peter Xu
Patch series "mm: Page fault enhancements", v6.
This series contains cleanups and enhancements to the current page fault logic. The whole idea comes from the discussion between Andrea and Linus on the bug reported by syzbot here: https://lkml.org/lkml/2017/11/2/833
Basically it does two things: (a) allows the page fault logic to be more interactive on not only SIGKILL but also the rest of the userspace signals, and (b) allows the page fault retry (VM_FAULT_RETRY) to happen more than once.
For (a): with the changes we should be able to react faster when page faults are working in parallel with userspace signals like SIGSTOP and SIGCONT (and more), and with that we can remove the buggy part in userfaultfd and let the whole page fault mechanism benefit from faster signal delivery to userspace.
For (b): we should be able to allow the page fault handler to loop even more than twice. Some context: for now, since we have FAULT_FLAG_ALLOW_RETRY, we can retry the page fault once with the same interrupt context, however never more than twice. Removing this assumption is not only a potential cleanup, since AFAIU the code itself doesn't really have this twice-only limitation (though it was probably a protective approach in the past); it will also greatly simplify future work like userfaultfd write-protect, where it's possible to retry more than twice (please have a look at [1] below for a possible user that might require the page fault to be handled a third time; if we can remove the retry limitation we can simply drop that patch and that complexity).
This patch (of 16): There are plenty of places around __get_user_pages() that have a parameter "nonblocking", which does not really mean that "it won't block" (because it can really block); instead it shows whether the mmap_sem was released by up_read() during page fault handling, mostly when VM_FAULT_RETRY is returned. We have the correct naming in e.g. get_user_pages_locked() or get_user_pages_remote() as "locked", however there are still many places that use "nonblocking" as the name. Rename those places to "locked" where proper, to better suit the functionality of the variable. While at it, fix up some of the comments accordingly.
Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Cracauer <cracauer@cons.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220155353.8676-2-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Pingfan Liu
FOLL_LONGTERM is a special case of FOLL_PIN. It suggests a pin which is going to be given to hardware and can't move. It would truncate CMA permanently and should be excluded. In the gup slow path, __gup_longterm_locked->check_and_migrate_cma_pages() handles FOLL_LONGTERM, but the fast path lacks such a check, which means a CMA page can leak into a long-term pin. Place a check in try_grab_compound_head() in the fast path to fix the leak; if FOLL_LONGTERM happens on CMA, it will fall back to the slow path to migrate the page. A note about the check: a huge page's subpages have the same migrate type, because they come either from a free_list[] allocation or from alloc_contig_range() with param MIGRATE_MOVABLE. So it is enough to check a single subpage with is_migrate_cma_page(subpage).
Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Link: http://lkml.kernel.org/r/1584876733-17405-3-git-send-email-kernelfans@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Pingfan Liu
To better reflect the held state of pages and make the code self-explaining, rename nr to nr_pinned.
Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Link: http://lkml.kernel.org/r/1584876733-17405-2-git-send-email-kernelfans@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Claudio Imbrenda
With the introduction of protected KVM guests on s390 there is now a concept of inaccessible pages. These pages need to be made accessible before the host can access them. While CPU accesses will trigger a fault that can be resolved, I/O accesses will just fail. We need to add a callback into architecture code for places that will do I/O, namely when writeback is started or when a page reference is taken. This is not only to enable paging, file backing etc.; it is also necessary to protect the host against a malicious user space. For example, a bad QEMU could simply start direct I/O on such protected memory. We do not want userspace to be able to trigger I/O errors, and thus the logic is "whenever somebody accesses that page (gup) or does I/O, make sure that this page can be accessed". When the guest tries to access that page we will wait in the page fault handler for writeback to have finished and for the page_ref to be the expected value. On s390x the function is not supposed to fail, so it is ok to use a WARN_ON on failure. If we ever need some more fine-grained handling we can tackle this when we know the details.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Will Deacon <will@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200306132537.783769-3-imbrenda@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by John Hubbard
Now that pages are "DMA-pinned" via pin_user_page*(), and unpinned via unpin_user_pages*(), we need some visibility into whether all of this is working correctly. Add two new fields to /proc/vmstat: nr_foll_pin_acquired and nr_foll_pin_released. These are documented in Documentation/core-api/pin_user_pages.rst. They represent the number of pages (since boot time) that have been pinned ("nr_foll_pin_acquired") and unpinned ("nr_foll_pin_released"), via pin_user_pages*() and unpin_user_pages*(). In the absence of long-running DMA or RDMA operations that hold pages pinned, the above two fields will normally be equal to each other. Also: update Documentation/core-api/pin_user_pages.rst, to remove an earlier (now confirmed untrue) claim about a performance problem with /proc/vmstat. Also: update Documentation/core-api/pin_user_pages.rst to rename the new /proc/vmstat entries, to the names listed here.
Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200211001536.1027652-9-jhubbard@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
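One way to use the two counters is to compare them from userspace. The sketch below is a hypothetical helper, not part of the patch: `vmstat_lookup()` and `vmstat_pages_still_pinned()` are names invented here, and they operate on a text buffer in the "name value" per-line layout that /proc/vmstat uses. Reading the file into the buffer (for example with fopen() and fread()) is left to the caller, and the fields only exist on kernels that include this change.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find "name value" in vmstat-style text.
 * Returns 0 and stores the value on success, -1 if the field is absent
 * or malformed.
 */
static int vmstat_lookup(const char *text, const char *name,
                         unsigned long long *value)
{
    size_t name_len = strlen(name);
    const char *line = text;

    while (line && *line) {
        /* Match the whole field name, followed by the separating space. */
        if (strncmp(line, name, name_len) == 0 && line[name_len] == ' ') {
            if (sscanf(line + name_len, "%llu", value) == 1)
                return 0;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++; /* step past the newline to the next line */
    }
    return -1;
}

/*
 * Pages pinned but not yet unpinned, according to the two counters.
 * Returns -1 if either field is missing.
 */
static long long vmstat_pages_still_pinned(const char *text)
{
    unsigned long long acquired, released;

    if (vmstat_lookup(text, "nr_foll_pin_acquired", &acquired) ||
        vmstat_lookup(text, "nr_foll_pin_released", &released))
        return -1;
    return (long long)(acquired - released);
}
```

As the commit message notes, a non-zero difference is expected while long-running DMA or RDMA pins are held, so a single reading is not by itself evidence of a leak.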
-
Committed by John Hubbard
For huge pages (and in fact, any compound page), the GUP_PIN_COUNTING_BIAS scheme tends to overflow too easily, because each tail page increments the head page->_refcount by GUP_PIN_COUNTING_BIAS (1024). That limits the number of huge pages that can be pinned. This patch removes that limitation, by using an exact form of pin counting for compound pages of order > 1. The "order > 1" is required because this approach uses the 3rd struct page in the compound page, and order 1 compound pages only have two pages, so that won't work there. A new struct page field, hpage_pinned_refcount, has been added, replacing a padding field in the union (so no new space is used). This enhancement also has a useful side effect: huge pages and compound pages (of order > 1) do not suffer from the "potential false positives" problem that is discussed in the page_dma_pinned() comment block. That is because these compound pages have extra space for tracking things, so they get exact pin counts instead of overloading page->_refcount. Documentation/core-api/pin_user_pages.rst is updated accordingly.
Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200211001536.1027652-8-jhubbard@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
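The overflow concern can be made concrete with a little arithmetic. The sketch below assumes a 32-bit signed refcount, 4 KiB base pages (so a 2 MiB huge page has 512 subpages and a 1 GiB huge page has 262144), and ignores any references other than pins; `max_whole_page_pins()` is a hypothetical helper written for this illustration. It computes an upper bound on how many times every subpage of one compound page could be pinned under the bias scheme before the head page's refcount would exceed INT_MAX.

```c
#include <limits.h>

#define GUP_PIN_COUNTING_BIAS 1024

/*
 * Upper bound on the number of whole-page pins under the bias scheme.
 * Each pin of the whole compound page adds
 * nr_subpages * GUP_PIN_COUNTING_BIAS to the head page's refcount.
 */
static long max_whole_page_pins(long nr_subpages)
{
    long long per_pin = (long long)nr_subpages * GUP_PIN_COUNTING_BIAS;

    return (long)(INT_MAX / per_pin);
}
```

Under these assumptions a 1 GiB huge page adds 2^28 to the refcount per whole-page pin, so the bound is 7, and a 2 MiB huge page adds 2^19 per pin, giving a bound of 4095. An exact pin counter stored separately from the refcount avoids this multiplication entirely.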
-
Committed by John Hubbard
Add tracking of pages that were pinned via FOLL_PIN. This tracking is implemented via overloading of page->_refcount: pins are added by adding GUP_PIN_COUNTING_BIAS (1024) to the refcount. This provides a fuzzy indication of pinning, and it can have false positives (and that's OK). Please see the pre-existing Documentation/core-api/pin_user_pages.rst for details. As mentioned in pin_user_pages.rst, callers who effectively set FOLL_PIN (typically via pin_user_pages*()) are required to ultimately free such pages via unpin_user_page(). Please also note the limitation, discussed in pin_user_pages.rst under the "TODO: for 1GB and larger huge pages" section. (That limitation will be removed in a following patch.) The effect of a FOLL_PIN flag is similar to that of FOLL_GET, and may be thought of as "FOLL_GET for DIO and/or RDMA use". Pages that have been pinned via FOLL_PIN are identifiable via a new function call: bool page_maybe_dma_pinned(struct page *page); What to do in response to encountering such a page is left to later patchsets. There is discussion about this in [1], [2], [3], and [4]. This also changes a BUG_ON() to a WARN_ON() in follow_page_mask().
[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/
[4] LWN kernel index: get_user_pages(): https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
[jhubbard@nvidia.com: add kerneldoc] Link: http://lkml.kernel.org/r/20200307021157.235726-1-jhubbard@nvidia.com
[imbrenda@linux.ibm.com: if pin fails, we need to unpin, a simple put_page will not be enough] Link: http://lkml.kernel.org/r/20200306132537.783769-2-imbrenda@linux.ibm.com
[akpm@linux-foundation.org: fix put_compound_head defined but not used]
Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200211001536.1027652-7-jhubbard@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
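The "fuzzy indication" idea can be modelled in a few lines of userspace C. This is a toy model, not the kernel implementation: `fake_page` and the `fake_*` functions are hypothetical stand-ins, and the refcount is a plain int rather than an atomic. It shows why a threshold test on a biased refcount detects pins, and why a page with very many ordinary references is reported as "maybe pinned" even though nobody pinned it.

```c
#include <stdbool.h>

#define GUP_PIN_COUNTING_BIAS 1024

/* Toy page: a single refcount shared by ordinary references and pins. */
struct fake_page {
    int refcount;
};

/* Ordinary reference, FOLL_GET style: +1. */
static void fake_get_page(struct fake_page *page)
{
    page->refcount += 1;
}

/* Pin, FOLL_PIN style: +GUP_PIN_COUNTING_BIAS on the same counter. */
static void fake_pin_page(struct fake_page *page)
{
    page->refcount += GUP_PIN_COUNTING_BIAS;
}

/* Undo one pin. */
static void fake_unpin_page(struct fake_page *page)
{
    page->refcount -= GUP_PIN_COUNTING_BIAS;
}

/* Threshold test: true for any pin, but also for >= 1024 plain references. */
static bool fake_page_maybe_dma_pinned(const struct fake_page *page)
{
    return page->refcount >= GUP_PIN_COUNTING_BIAS;
}
```

The "maybe" in the function name carries the caveat: the test never misses a pinned page in this model, but it cannot tell one pin apart from 1024 ordinary references.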
-
Committed by John Hubbard
Internal to mm/gup.c, require that get_user_pages_fast() and __get_user_pages_fast() identify themselves, by setting FOLL_GET. This is required in order to be able to make decisions based on "FOLL_PIN, or FOLL_GET, or both or neither are set", in upcoming patches.
Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200211001536.1027652-6-jhubbard@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by John Hubbard
In preparation for an upcoming patch, send gup flags args to two more routines: put_compound_head(), and undo_dev_pagemap().
Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200211001536.1027652-5-jhubbard@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by John Hubbard
A subsequent patch requires access to gup flags, so pass the flags argument through to the __gup_device_* functions. Also placate checkpatch.pl by shortening a nearby line.
Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200211001536.1027652-3-jhubbard@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by John Hubbard
Patch series "mm/gup: track FOLL_PIN pages", v6.
This activates tracking of FOLL_PIN pages. This is in support of fixing the get_user_pages()+DMA problem described in [1]-[4].
FOLL_PIN support is now in the main linux tree. However, the patch to use FOLL_PIN to track pages was *not* submitted, because Leon saw an RDMA test suite failure that involved (I think) page refcount overflows when huge pages were used. This patch definitively solves that kind of overflow problem, by adding an exact pincount, for compound pages (of order > 1), in the 3rd struct page of a compound page. If available, that form of pincounting is used, instead of the GUP_PIN_COUNTING_BIAS approach. Thanks again to Jan Kara for that idea.
Other interesting changes:
* dump_page(): added one, or two new things to report for compound pages: head refcount (for all compound pages), and map_pincount (for compound pages of order > 1).
* Documentation/core-api/pin_user_pages.rst: removed the "TODO" for the huge page refcount upper limit problems, and added notes about how it works now. Also added a note about the dump_page() enhancements.
* Added some comments in gup.c and mm.h, to explain that there are two ways to count pinned pages: exact (for compound pages of order > 1) and fuzzy (GUP_PIN_COUNTING_BIAS: for all other pages).
============================================================
General notes about the tracking patch:
This is a prerequisite to solving the problem of proper interactions between file-backed pages, and [R]DMA activities, as discussed in [1], [2], [3], [4] and in a remarkable number of email threads since about 2017. :)
In contrast to earlier approaches, the page tracking can be incrementally applied to the kernel call sites that, until now, have been simply calling get_user_pages() ("gup"). In other words, opt-in by changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Future steps:
* Convert more subsystems from get_user_pages() to pin_user_pages(). The first probably needs to be bio/biovecs, because any filesystem testing is too difficult without those in place.
* Change VFS and filesystems to respond appropriately when encountering dma-pinned pages.
* Work with Ira and others to connect this all up with file system leases.
[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/
[4] LWN kernel index: get_user_pages() https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
This patch (of 12): An upcoming patch requires reusing the implementation of get_user_pages_remote(). Split up get_user_pages_remote() into an outer routine that checks flags, and an implementation routine that will be reused. This makes subsequent changes much easier to understand. There should be no change in behavior due to this patch.
Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200211001536.1027652-2-jhubbard@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 04 Feb, 2020 1 commit
-
-
Committed by Peter Zijlstra
Towards a more consistent naming scheme.

[akpm@linux-foundation.org: fix sparc64 Kconfig]
Link: http://lkml.kernel.org/r/20200116064531.483522-7-aneesh.kumar@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 01 Feb, 2020 8 commits
-
-
Committed by John Hubbard
In order to provide a clearer, more symmetric API for pinning and unpinning DMA pages. This way, pin_user_pages*() calls match up with unpin_user_pages*() calls, and the API is a lot closer to being self-explanatory.

Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by John Hubbard
Introduce pin_user_pages*() variations of get_user_pages*() calls, and also pin_longterm_pages*() variations.

For now, these are placeholder calls, until the various call sites are converted to use the correct get_user_pages*() or pin_user_pages*() API.

These variants will eventually all set FOLL_PIN, which is also introduced, and thoroughly documented.

    pin_user_pages()
    pin_user_pages_remote()
    pin_user_pages_fast()

All pages that are pinned via the above calls, must be unpinned via put_user_page().

The underlying rules are:

* FOLL_PIN is a gup-internal flag, so the call sites should not directly set it. That behavior is enforced with assertions.
* Call sites that want to indicate that they are going to do DirectIO ("DIO") or something with similar characteristics, should call a get_user_pages()-like wrapper call that sets FOLL_PIN. These wrappers will:
    * Start with "pin_user_pages" instead of "get_user_pages". That makes it easy to find and audit the call sites.
    * Set FOLL_PIN
* For pages that are received via FOLL_PIN, those pages must be returned via put_user_page().

Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases in this documentation. (I've reworded it and expanded upon it.)

Link: http://lkml.kernel.org/r/20200107224558.2362728-12-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> [Documentation]
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by John Hubbard
Commit 817be129 ("mm: validate get_user_pages_fast flags") allowed only FOLL_WRITE and FOLL_LONGTERM to be passed to get_user_pages_fast(). This, combined with the fact that get_user_pages_fast() falls back to "slow gup", which *does* accept FOLL_FORCE, leads to an odd situation: if you need FOLL_FORCE, you cannot call get_user_pages_fast().

There does not appear to be any reason for filtering out FOLL_FORCE. There is nothing in the _fast() implementation that requires that we avoid writing to the pages. So it appears to have been an oversight.

Fix by allowing FOLL_FORCE to be set for get_user_pages_fast().

Link: http://lkml.kernel.org/r/20200107224558.2362728-9-jhubbard@nvidia.com
Fixes: 817be129 ("mm: validate get_user_pages_fast flags")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
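The flag filtering described in this commit can be sketched in plain userspace C. This is only a model, not the code in gup.c: the flag values and the helper names (flags_allowed, fast_flags_ok_before, fast_flags_ok_after) are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values for this sketch only; the kernel defines the
 * real FOLL_* flags in its own headers. */
#define FOLL_WRITE    0x01u
#define FOLL_FORCE    0x10u
#define FOLL_LONGTERM 0x10000u

/* Hypothetical helper: true when gup_flags has no bit outside allowed_mask. */
static bool flags_allowed(unsigned int gup_flags, unsigned int allowed_mask)
{
    return (gup_flags & ~allowed_mask) == 0;
}

/* Model of the check before the fix: FOLL_FORCE is rejected. */
static bool fast_flags_ok_before(unsigned int gup_flags)
{
    return flags_allowed(gup_flags, FOLL_WRITE | FOLL_LONGTERM);
}

/* Model of the check after the fix: FOLL_FORCE is accepted too. */
static bool fast_flags_ok_after(unsigned int gup_flags)
{
    return flags_allowed(gup_flags,
                         FOLL_WRITE | FOLL_LONGTERM | FOLL_FORCE);
}

/* Usage: a caller that needs FOLL_FORCE is only served after the fix. */
static void fast_flags_demo(void)
{
    assert(!fast_flags_ok_before(FOLL_WRITE | FOLL_FORCE));
    assert(fast_flags_ok_after(FOLL_WRITE | FOLL_FORCE));
}
```

A mask test of this shape keeps the set of accepted flags in one place, so widening it is a one-line change.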
-
Committed by John Hubbard
As it says in the updated comment in gup.c: current FOLL_LONGTERM behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on vmas.

However, the corresponding restriction in get_user_pages_remote() was slightly stricter than is actually required: it forbade all FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers that do not set the "locked" arg.

Update the code and comments to loosen the restriction, allowing FOLL_LONGTERM in some cases.

Also, copy the DAX check ("if a VMA is DAX, don't allow long term pinning") from the VFIO call site, all the way into the internals of get_user_pages_remote() and __gup_longterm_locked(). That is: get_user_pages_remote() calls __gup_longterm_locked(), which in turn calls check_dax_vmas(). This check will then be removed from the VFIO call site in a subsequent patch.

Thanks to Jason Gunthorpe for pointing out a clean way to fix this, and to Dan Williams for helping clarify the DAX refactoring.

Link: http://lkml.kernel.org/r/20200107224558.2362728-7-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by John Hubbard
An upcoming patch uses try_get_compound_head() more widely, so move it to the top of gup.c.

Also fix a tiny spelling error and a checkpatch.pl warning.

Link: http://lkml.kernel.org/r/20200107224558.2362728-3-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by John Hubbard
Patch series "mm/gup: prereqs to track dma-pinned pages: FOLL_PIN", v12.

Overview:

This is a prerequisite to solving the problem of proper interactions between file-backed pages, and [R]DMA activities, as discussed in [1], [2], [3], and in a remarkable number of email threads since about 2017. :)

A new internal gup flag, FOLL_PIN is introduced, and thoroughly documented in the last patch's Documentation/vm/pin_user_pages.rst.

I believe that this will provide a good starting point for doing the layout lease work that Ira Weiny has been working on. That's because these new wrapper functions provide a clean, constrained, systematically named set of functionality that, again, is required in order to even know if a page is "dma-pinned".

In contrast to earlier approaches, the page tracking can be incrementally applied to the kernel call sites that, until now, have been simply calling get_user_pages() ("gup"). In other words, opt-in by changing from this:

    get_user_pages() (sets FOLL_GET)
    put_page()

to this:

    pin_user_pages() (sets FOLL_PIN)
    unpin_user_page()

Testing:

* I've done some overall kernel testing (LTP, and a few other goodies), and some directed testing to exercise some of the changes. And as you can see, gup_benchmark is enhanced to exercise this. Basically, I've been able to runtime test the core get_user_pages() and pin_user_pages() and related routines, but not so much on several of the call sites--but those are generally just a couple of lines changed, each.

Not much of the kernel is actually using this, which on one hand reduces risk quite a lot. But on the other hand, testing coverage is low. So I'd love it if, in particular, the Infiniband and PowerPC folks could do a smoke test of this series for me.

Runtime testing for the call sites so far is pretty light:

* io_uring: Some directed tests from liburing exercise this, and they pass.
* process_vm_access.c: A small directed test passes.
* gup_benchmark: the enhanced version hits the new gup.c code, and passes.
* infiniband: Ran rdma-core tests: rdma-core/build/bin/run_tests.py
* VFIO: compiles (I'm vowing to set up a run time test soon, but it's not ready just yet)
* powerpc: it compiles...
* drm/via: compiles...
* goldfish: compiles...
* net/xdp: compiles...
* media/v4l2: compiles...

[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/

This patch (of 22):

There are four locations in gup.c that have a fair amount of code duplication. This means that changing one requires making the same changes in four places, not to mention reading the same code four times, and wondering if there are subtle differences.

Factor out the common code into static functions, thus reducing the overall line count and the code's complexity.

Also, take the opportunity to slightly improve the efficiency of the error cases, by doing a mass subtraction of the refcount, surrounded by get_page()/put_page().

Also, further simplify (slightly), by waiting until the successful end of each routine, to increment *nr.

Link: http://lkml.kernel.org/r/20200107224558.2362728-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Wei Yang
No functional change; just use the helper function to improve readability, as is done elsewhere.

Link: http://lkml.kernel.org/r/20200113070322.26627-1-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Qiujun Huang
Sorry for not following up on this for a long time. I ran into it again.

patch v1: https://lkml.org/lkml/2019/9/20/656

    do_machine_check()
      do_memory_failure()
        memory_failure()
          hw_poison_user_mappings()
            try_to_unmap()
              pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));

...and now we have a swap entry that indicates that the page entry refers to a bad (and poisoned) page of memory, but gup_fast() at this level of the page table was ignoring swap entries, and incorrectly assuming that "!pxd_none() == valid and present".

And this was not just a poisoned page problem, but a general swap entry problem. So, any swap entry type (device memory migration, numa migration, or just regular swapping) could lead to the same problem.

Fix this by checking for pxd_present(), instead of pxd_none().

Link: http://lkml.kernel.org/r/1578479084-15508-1-git-send-email-hqjagain@gmail.com
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
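The difference between "not none" and "present" can be shown with a toy model. Everything below is an assumption for illustration: the entry encoding (bit 0 as the present bit, zero as "none") and the toy_* names are invented, and real page-table encodings are architecture-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy page-table entry: 0 means "none", bit 0 means "present".
 * A non-zero entry with bit 0 clear models a swap entry. */
typedef uint64_t toy_pxd_t;
#define TOY_PXD_PRESENT 0x1ull

static bool toy_pxd_none(toy_pxd_t e)
{
    return e == 0;
}

static bool toy_pxd_present(toy_pxd_t e)
{
    return (e & TOY_PXD_PRESENT) != 0;
}

/* Old logic: anything that is not "none" is walked as if it were mapped. */
static bool toy_walk_old(toy_pxd_t e)
{
    return !toy_pxd_none(e);
}

/* Fixed logic: only entries that are actually present are walked. */
static bool toy_walk_new(toy_pxd_t e)
{
    return toy_pxd_present(e);
}

/* Usage: a swap-style entry fools the old check but not the new one. */
static void toy_walk_demo(void)
{
    toy_pxd_t swap_like = 0x1230ull; /* non-zero, present bit clear */

    assert(toy_walk_old(swap_like));
    assert(!toy_walk_new(swap_like));
}
```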
-
- 01 Dec, 2019 2 commits
-
-
Committed by Liu Xiang
Fix the comments of __get_user_pages() and get_user_pages_remote() to make them clearer.

Link: http://lkml.kernel.org/r/1572443533-3118-1-git-send-email-liuxiang_1999@126.com
Signed-off-by: Liu Xiang <liuxiang_1999@126.com>
Suggested-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by zhong jiang
check_and_migrate_cma_pages() was recording the result of __get_user_pages_locked() in an unsigned "nr_pages" variable. Because __get_user_pages_locked() returns a signed value that can include negative errno values, this had the effect of hiding errors.

Change check_and_migrate_cma_pages() implementation so that it uses a signed variable instead, and propagates the results back to the caller just as other gup internal functions do.

This was discovered with the help of unsigned_lesser_than_zero.cocci.

Link: http://lkml.kernel.org/r/1571671030-58029-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Suggested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
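The class of bug described here is easy to reproduce outside the kernel. The sketch below is a userspace model only: fake_gup_locked() is a made-up stand-in for __get_user_pages_locked(), and the two callers are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Made-up stand-in: returns a page count on success, or a negative
 * errno value on failure. */
static long fake_gup_locked(bool fail)
{
    return fail ? -EFAULT : 4;
}

/* Buggy pattern: storing the result in an unsigned variable turns a
 * negative errno into a huge positive "page count". */
static bool buggy_caller_sees_success(bool fail)
{
    unsigned long nr_pages = (unsigned long)fake_gup_locked(fail);

    return nr_pages > 0;
}

/* Fixed pattern: a signed variable keeps the error visible. */
static bool fixed_caller_sees_success(bool fail)
{
    long ret = fake_gup_locked(fail);

    return ret > 0;
}

/* Usage: only the signed version notices the failure. */
static void signedness_demo(void)
{
    assert(buggy_caller_sees_success(true));  /* error is hidden */
    assert(!fixed_caller_sees_success(true)); /* error is seen */
}
```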
-
- 19 Oct, 2019 1 commit
-
-
Committed by John Hubbard
In several routines, the "flags" argument is incorrectly named "write". Change it to "flags".

Also, in one place, the misnaming led to an actual bug: "flags & FOLL_WRITE" is required, rather than just "flags". (That problem was flagged by krobot, in v1 of this patch.)

Also, change the flags argument from int, to unsigned int.

You can see that this was a simple oversight, because the calling code passes "flags" to the fifth argument:

    gup_pgd_range():
        ...
        if (!gup_huge_pd(__hugepd(pgd_val(pgd)), addr, PGDIR_SHIFT, next, flags, pages, nr))

...which, until this patch, the callees referred to as "write".

Also, change two lines to avoid checkpatch line length complaints, and another line to fix another oversight that checkpatch called out: missing "int" on pdshift.

Link: http://lkml.kernel.org/r/20191014184639.1512873-3-jhubbard@nvidia.com
Fixes: b798bec4 ("mm/gup: change write parameter to flags in fast walk")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reported-by: kbuild test robot <lkp@intel.com>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
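Why "flags" and "flags & FOLL_WRITE" differ can be seen in a few lines of userspace C. The flag values and both helper functions are assumptions for this sketch; they are not the kernel's callees.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values for this sketch only. */
#define FOLL_WRITE    0x01u
#define FOLL_LONGTERM 0x10000u

/* Buggy pattern: a flags word is treated as a boolean "write", so any
 * set flag looks like a write request. */
static bool wants_write_buggy(unsigned int flags)
{
    return flags != 0;
}

/* Fixed pattern: test the specific bit. */
static bool wants_write_fixed(unsigned int flags)
{
    return (flags & FOLL_WRITE) != 0;
}

/* Usage: a read-only long-term request is misread by the buggy check. */
static void write_flag_demo(void)
{
    assert(wants_write_buggy(FOLL_LONGTERM));
    assert(!wants_write_fixed(FOLL_LONGTERM));
}
```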
-
- 26 Sep, 2019 1 commit
-
-
Committed by Andrey Konovalov
This patch is a part of a series that extends kernel ABI to allow passing tagged user pointers (with the top byte set to something other than 0x00) as syscall arguments.

mm/gup.c provides a kernel interface that accepts user addresses and manipulates user pages directly (for example get_user_pages, that is used by the futex syscall). Since a user can provide tagged addresses, we need to handle this case.

Add untagging to gup.c functions that use user addresses for vma lookups.

Link: http://lkml.kernel.org/r/4731bddba3c938658c10ff4ed55cc01c60f4c8f8.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jens Wiklander <jens.wiklander@linaro.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
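As a rough illustration of what "untagging" means for a user pointer whose top byte carries a tag, here is a userspace sketch. It assumes a 64-bit address with the tag in bits 63..56 and simply masks it off; toy_untagged_addr() is a made-up name, and the kernel's own untagging helper is architecture-specific and may work differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: the tag lives in the top byte. */
#define TOY_TAG_SHIFT 56
#define TOY_ADDR_MASK ((UINT64_C(1) << TOY_TAG_SHIFT) - 1)

/* Made-up helper: drop the tag byte so the address can be used for a
 * lookup keyed on the untagged value. */
static uint64_t toy_untagged_addr(uint64_t addr)
{
    return addr & TOY_ADDR_MASK;
}

/* Usage: tagged and untagged forms resolve to the same address. */
static void untag_demo(void)
{
    uint64_t plain  = UINT64_C(0x0000007fffffe000);
    uint64_t tagged = plain | (UINT64_C(0x2a) << TOY_TAG_SHIFT);

    assert(tagged != plain);
    assert(toy_untagged_addr(tagged) == plain);
    assert(toy_untagged_addr(plain) == plain);
}
```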
-
- 25 Sep, 2019 3 commits
-
-
Committed by Song Liu
Introduce a new foll_flag: FOLL_SPLIT_PMD. As the name says, FOLL_SPLIT_PMD splits the huge pmd for the given mm_struct, while the underlying huge page stays as-is.

FOLL_SPLIT_PMD is useful for cases where we need to use regular pages, but would switch back to huge page and huge pmd later on. One such example is uprobe. The following patches use FOLL_SPLIT_PMD in uprobe.

Link: http://lkml.kernel.org/r/20190815164525.1848545-4-songliubraving@fb.com
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
From: John Hubbard <jhubbard@nvidia.com>
Subject: mm/gup: add make_dirty arg to put_user_pages_dirty_lock()

Patch series "mm/gup: add make_dirty arg to put_user_pages_dirty_lock()", v3.

There are about 50+ patches in my tree [2], and I'll be sending out the remaining ones in a few more groups:

* The block/bio related changes (Jerome mostly wrote those, but I've had to move stuff around extensively, and add a little code)
* mm/ changes
* other subsystem patches
* an RFC that shows the current state of the tracking patch set. That can only be applied after all call sites are converted, but it's good to get an early look at it.

This is part of a tree-wide conversion, as described in fc1d8e7c ("mm: introduce put_user_page*(), placeholder versions").

This patch (of 3):

Provide more capable variation of put_user_pages_dirty_lock(), and delete put_user_pages_dirty(). This is based on the following:

1. Lots of call sites become simpler if a bool is passed into put_user_page*(), instead of making the call site choose which put_user_page*() variant to call.
2. Christoph Hellwig's observation that set_page_dirty_lock() is usually correct, and set_page_dirty() is usually a bug, or at least questionable, within a put_user_page*() calling chain.

This leads to the following API choices:

* put_user_pages_dirty_lock(page, npages, make_dirty)
* There is no put_user_pages_dirty(). You have to hand code that, in the rare case that it's required.

[jhubbard@nvidia.com: remove unused variable in siw_free_plist()]
Link: http://lkml.kernel.org/r/20190729074306.10368-1-jhubbard@nvidia.com
Link: http://lkml.kernel.org/r/20190724044537.10458-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
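The "pass a bool instead of choosing a variant" idea can be modelled in userspace. This is a toy under stated assumptions: struct toy_user_page and toy_put_user_pages_dirty_lock() are invented for illustration, the dirty flag stands in for marking a page dirty, and the refcount decrement stands in for releasing the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented model of a page: just a reference count and a dirty bit. */
struct toy_user_page {
    int  refcount;
    bool dirty;
};

/* Toy version of a single release routine with a make_dirty argument:
 * optionally mark each page dirty, then drop one reference. */
static void toy_put_user_pages_dirty_lock(struct toy_user_page **pages,
                                          size_t npages, bool make_dirty)
{
    for (size_t i = 0; i < npages; i++) {
        if (make_dirty)
            pages[i]->dirty = true; /* stands in for dirtying the page */
        pages[i]->refcount--;       /* stands in for releasing the page */
    }
}

/* Usage: the caller passes its "did I write?" state straight through. */
static void make_dirty_demo(void)
{
    struct toy_user_page a = { 1, false };
    struct toy_user_page *vec[] = { &a };
    bool wrote_to_pages = true;

    toy_put_user_pages_dirty_lock(vec, 1, wrote_to_pages);
    assert(a.dirty && a.refcount == 0);
}
```

With one routine and a flag, the call site no longer needs an if/else that picks between two release functions.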
-
Committed by Matthew Wilcox (Oracle)
Replace 1 << compound_order(page) with compound_nr(page). Minor improvements in readability.

Link: http://lkml.kernel.org/r/20190721104612.19120-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
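The relationship between the two expressions can be written down as a small userspace model. The struct and the toy_* helpers are assumptions for illustration; they only mirror the arithmetic named in the commit message (the number of pages is 1 shifted left by the order).

```c
#include <assert.h>

/* Invented model of a compound page: only its order is tracked. */
struct toy_compound_page {
    unsigned int order; /* log2 of the number of base pages */
};

static unsigned int toy_compound_order(const struct toy_compound_page *page)
{
    return page->order;
}

/* Helper that names the intent, replacing an open-coded shift. */
static unsigned long toy_compound_nr(const struct toy_compound_page *page)
{
    return 1UL << toy_compound_order(page);
}

/* Usage: an order-9 compound page covers 512 base pages. */
static void compound_nr_demo(void)
{
    struct toy_compound_page huge = { 9 };

    assert(toy_compound_nr(&huge) == 512UL);
    assert(toy_compound_nr(&huge) == 1UL << toy_compound_order(&huge));
}
```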
-
- 17 Jul, 2019 1 commit
-
-
Committed by Robin Murphy
ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined with the long-out-of-date comment can lead to the impression that an architecture may just enable it (since __add_pages() now "comprehends device memory" for itself) and expect things to work.

In practice, however, ZONE_DEVICE users have little chance of functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper dependency so the real situation is clearer.

Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 13 Jul, 2019 2 commits
-
-
Committed by Guenter Roeck
Several mips builds generate the following build warning.

    mm/gup.c:1788:13: warning: 'undo_dev_pagemap' defined but not used

The function is declared unconditionally but only called from behind various ifdefs. Mark it __maybe_unused.

Link: http://lkml.kernel.org/r/1562072523-22311-1-git-send-email-linux@roeck-us.net
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Andy Lutomirski
If we end up without a PGD or PUD entry backing the gate area, don't BUG -- just fail gracefully.

It's not entirely implausible that this could happen some day on x86. It doesn't right now even with an execute-only emulated vsyscall page because the fixmap shares the PUD, but the core mm code shouldn't rely on that particular detail to avoid OOPSing.

Link: http://lkml.kernel.org/r/a1d9f4efb75b9d464e59fd6af00104b21c58f6f7.1561610798.git.luto@kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-