- 04 5月, 2019 1 次提交
-
-
由 Linus Torvalds 提交于
commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64 upstream. If the page refcount wraps around past zero, it will be freed while there are still four billion references to it. One of the possible avenues for an attacker to try to make this happen is by doing direct IO on a page multiple times. This patch makes get_user_pages() refuse to take a new page reference if there are already more than two billion references to the page. Reported-by: NJann Horn <jannh@google.com> Acked-by: NMatthew Wilcox <willy@infradead.org> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 24 3月, 2019 1 次提交
-
-
由 Yu Zhao 提交于
[ Upstream commit 414fd080d125408cb15d04ff4907e1dd8145c8c7 ] For dax pmd, pmd_trans_huge() returns false but pmd_huge() returns true on x86. So the function works as long as hugetlb is configured. However, dax doesn't depend on hugetlb. Link: http://lkml.kernel.org/r/20190111034033.601-1-yuzhao@google.comSigned-off-by: NYu Zhao <yuzhao@google.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Keith Busch <keith.busch@intel.com> Cc: "Michael S . Tsirkin" <mst@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 24 8月, 2018 1 次提交
-
-
由 Souptick Joarder 提交于
Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. Ref-> commit 1c8f4220 ("mm: change return type to vm_fault_t") The aim is to change the return type of finish_fault() and handle_mm_fault() to vm_fault_t type. As part of that clean up return type of all other recursively called functions have been changed to vm_fault_t type. The places from where handle_mm_fault() is getting invoked will be change to vm_fault_t type but in a separate patch. vmf_error() is the newly introduce inline function in 4.17-rc6. [akpm@linux-foundation.org: don't shadow outer local `ret' in __do_huge_pmd_anonymous_page()] Link: http://lkml.kernel.org/r/20180604171727.GA20279@jordon-HP-15-Notebook-PCSigned-off-by: NSouptick Joarder <jrdr.linux@gmail.com> Reviewed-by: NMatthew Wilcox <mawilcox@microsoft.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 7月, 2018 1 次提交
-
-
由 Michal Hocko 提交于
syzbot has noticed that a specially crafted library can easily hit VM_BUG_ON in __mm_populate kernel BUG at mm/gup.c:1242! invalid opcode: 0000 [#1] SMP CPU: 2 PID: 9667 Comm: a.out Not tainted 4.18.0-rc3 #644 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017 RIP: 0010:__mm_populate+0x1e2/0x1f0 Code: 55 d0 65 48 33 14 25 28 00 00 00 89 d8 75 21 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 75 18 f1 ff 0f 0b e8 6e 18 f1 ff <0f> 0b 31 db eb c9 e8 93 06 e0 ff 0f 1f 00 55 48 89 e5 53 48 89 fb Call Trace: vm_brk_flags+0xc3/0x100 vm_brk+0x1f/0x30 load_elf_library+0x281/0x2e0 __ia32_sys_uselib+0x170/0x1e0 do_fast_syscall_32+0xca/0x420 entry_SYSENTER_compat+0x70/0x7f The reason is that the length of the new brk is not page aligned when we try to populate the it. There is no reason to bug on that though. do_brk_flags already aligns the length properly so the mapping is expanded as it should. All we need is to tell mm_populate about it. Besides that there is absolutely no reason to to bug_on in the first place. The worst thing that could happen is that the last page wouldn't get populated and that is far from putting system into an inconsistent state. Fix the issue by moving the length sanitization code from do_brk_flags up to vm_brk_flags. The only other caller of do_brk_flags is brk syscall entry and it makes sure to provide the proper length so t here is no need for sanitation and so we can use do_brk_flags without it. Also remove the bogus BUG_ONs. [osalvador@techadventures.net: fix up vm_brk_flags s@request@len@] Link: http://lkml.kernel.org/r/20180706090217.GI32658@dhcp22.suse.czSigned-off-by: NMichal Hocko <mhocko@suse.com> Reported-by: Nsyzbot <syzbot+5dcb560fe12aa5091c06@syzkaller.appspotmail.com> Tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: NOscar Salvador <osalvador@suse.de> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 6月, 2018 2 次提交
-
-
由 Huang Ying 提交于
mmap_sem will be read locked when calling follow_pmd_mask(). But this cannot prevent PMD from being changed for all cases when PTL is unlocked, for example, from pmd_trans_huge() to pmd_none() via MADV_DONTNEED. So it is possible for the pmd_present() check in follow_pmd_mask() to encounter an invalid PMD. This may cause an incorrect VM_BUG_ON() or an infinite loop. Fix this by reading the PMD entry into a local variable with READ_ONCE() and checking the local variable and pmd_none() in the retry loop. As Kirill pointed out, with PTL unlocked, the *pmd may be changed under us, so reading it directly again and again may incur weird bugs. So although using *pmd directly other than for pmd_present() checking may be safe, it is still better to replace them to read *pmd once and check the local variable multiple times. When PTL unlocked, replace all *pmd with local variable was suggested by Kirill. Link: http://lkml.kernel.org/r/20180419083514.1365-1-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com> Reviewed-by: NZi Yan <zi.yan@cs.rutgers.edu> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Laurent Dufour 提交于
Currently the PTE special supports is turned on in per architecture header files. Most of the time, it is defined in arch/*/include/asm/pgtable.h depending or not on some other per architecture static definition. This patch introduce a new configuration variable to manage this directly in the Kconfig files. It would later replace __HAVE_ARCH_PTE_SPECIAL. Here notes for some architecture where the definition of __HAVE_ARCH_PTE_SPECIAL is not obvious: arm __HAVE_ARCH_PTE_SPECIAL which is currently defined in arch/arm/include/asm/pgtable-3level.h which is included by arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set. So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE. powerpc __HAVE_ARCH_PTE_SPECIAL is defined in 2 files: - arch/powerpc/include/asm/book3s/64/pgtable.h - arch/powerpc/include/asm/pte-common.h The first one is included if (PPC_BOOK3S & PPC64) while the second is included in all the other cases. So select ARCH_HAS_PTE_SPECIAL all the time. sparc: __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) && defined(__arch64__) which are defined through the compiler in sparc/Makefile if !SPARC32 which I assume to be if SPARC64. So select ARCH_HAS_PTE_SPECIAL if SPARC64 There is no functional change introduced by this patch. Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.comSigned-off-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com> Suggested-by: NJerome Glisse <jglisse@redhat.com> Reviewed-by: NJerome Glisse <jglisse@redhat.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <albert@sifive.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Christophe LEROY <christophe.leroy@c-s.fr> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 5月, 2018 1 次提交
-
-
由 Dan Williams 提交于
get_user_pages_fast() for device pages is missing the typical validation that all page references have been taken while the mapping was valid. Without this validation truncate operations can not reliably coordinate against new page reference events like O_DIRECT. Cc: <stable@vger.kernel.org> Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings") Reported-by: NJan Kara <jack@suse.cz> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 18 5月, 2018 1 次提交
-
-
由 Willy Tarreau 提交于
proc_pid_cmdline_read() and environ_read() directly access the target process' VM to retrieve the command line and environment. If this process remaps these areas onto a file via mmap(), the requesting process may experience various issues such as extra delays if the underlying device is slow to respond. Let's simply refuse to access file-backed areas in these functions. For this we add a new FOLL_ANON gup flag that is passed to all calls to access_remote_vm(). The code already takes care of such failures (including unmapped areas). Accesses via /proc/pid/mem were not changed though. This was assigned CVE-2018-1120. Note for stable backports: the patch may apply to kernels prior to 4.11 but silently miss one location; it must be checked that no call to access_remote_vm() keeps zero as the last argument. Reported-by: NQualys Security Advisory <qsa@qualys.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NWilly Tarreau <w@1wt.eu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 4月, 2018 2 次提交
-
-
由 Michael S. Tsirkin 提交于
__get_user_pages_fast handles errors differently from get_user_pages_fast: the former always returns the number of pages pinned, the later might return a negative error code. Link: http://lkml.kernel.org/r/1522962072-182137-6-git-send-email-mst@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michael S. Tsirkin 提交于
get_user_pages_fast is supposed to be a faster drop-in equivalent of get_user_pages. As such, callers expect it to return a negative return code when passed an invalid address, and never expect it to return 0 when passed a positive number of pages, since its documentation says: * Returns number of pages pinned. This may be fewer than the number * requested. If nr_pages is 0 or negative, returns 0. If no pages * were pinned, returns -errno. When get_user_pages_fast fall back on get_user_pages this is exactly what happens. Unfortunately the implementation is inconsistent: it returns 0 if passed a kernel address, confusing callers: for example, the following is pretty common but does not appear to do the right thing with a kernel address: ret = get_user_pages_fast(addr, 1, writeable, &page); if (ret < 0) return ret; Change get_user_pages_fast to return -EFAULT when supplied a kernel address to make it match expectations. All callers have been audited for consistency with the documented semantics. Link: http://lkml.kernel.org/r/1522962072-182137-4-git-send-email-mst@redhat.com Fixes: 5b65c467 ("mm, x86/mm: Fix performance regression in get_user_pages_fast()") Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Reported-by: syzbot+6304bf97ef436580fede@syzkaller.appspotmail.com Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 4月, 2018 1 次提交
-
-
由 Mario Leinweber 提交于
- Fixed style error: 8 spaces -> 1 tab. - Fixed style warning: Corrected misleading indentation. Link: http://lkml.kernel.org/r/20180302210254.31888-1-marioleinweber@web.deSigned-off-by: NMario Leinweber <marioleinweber@web.de> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 3月, 2018 1 次提交
-
-
由 Andrea Arcangeli 提交于
KVM is hanging during postcopy live migration with userfaultfd because get_user_pages_unlocked is not capable to handle FOLL_NOWAIT. Earlier FOLL_NOWAIT was only ever passed to get_user_pages. Specifically faultin_page (the callee of get_user_pages_unlocked caller) doesn't know that if FAULT_FLAG_RETRY_NOWAIT was set in the page fault flags, when VM_FAULT_RETRY is returned, the mmap_sem wasn't actually released (even if nonblocking is not NULL). So it sets *nonblocking to zero and the caller won't release the mmap_sem thinking it was already released, but it wasn't because of FOLL_NOWAIT. Link: http://lkml.kernel.org/r/20180302174343.5421-2-aarcange@redhat.com Fixes: ce53053c ("kvm: switch get_user_page_nowait() to get_user_pages_unlocked()") Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Reported-by: NDr. David Alan Gilbert <dgilbert@redhat.com> Tested-by: NDr. David Alan Gilbert <dgilbert@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 1月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
Change the calling convention so that get_dev_pagemap always consumes the previous reference instead of doing this using an explicit earlier call to put_dev_pagemap in the callers. The callers will still need to put the final reference after finishing the loop over the pages. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 16 12月, 2017 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commits 5c9d2d5c, c7da82b8, and e7fe7b5c. We'll probably need to revisit this, but basically we should not complicate the get_user_pages_fast() case, and checking the actual page table protection key bits will require more care anyway, since the protection keys depend on the exact state of the VM in question. Particularly when doing a "remote" page lookup (ie in somebody elses VM, not your own), you need to be much more careful than this was. Dave Hansen says: "So, the underlying bug here is that we now a get_user_pages_remote() and then go ahead and do the p*_access_permitted() checks against the current PKRU. This was introduced recently with the addition of the new p??_access_permitted() calls. We have checks in the VMA path for the "remote" gups and we avoid consulting PKRU for them. This got missed in the pkeys selftests because I did a ptrace read, but not a *write*. I also didn't explicitly test it against something where a COW needed to be done" It's also not entirely clear that it makes sense to check the protection key bits at this level at all. But one possible eventual solution is to make the get_user_pages_fast() case just abort if it sees protection key bits set, which makes us fall back to the regular get_user_pages() case, which then has a vma and can do the check there if we want to. We'll see. Somewhat related to this all: what we _do_ want to do some day is to check the PAGE_USER bit - it should obviously always be set for user pages, but it would be a good check to have back. Because we have no generic way to test for it, we lost it as part of moving over from the architecture-specific x86 GUP implementation to the generic one in commit e585513b ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation"). Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 12月, 2017 3 次提交
-
-
由 Al Viro 提交于
The only caller that doesn't pass true in it is get_user_pages() and it passes NULL in locked. The only place where we check it is if (notify_locked && lock_dropped && *locked) and lock_dropped can become true only if we have locked != NULL. In other words, the second part of condition will be false when called by get_user_pages(). Just get rid of the argument and turn the condition into if (lock_dropped && *locked) Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Equivalent transformation - the only place in __get_user_pages_locked() where we look at notify_drop argument is if (notify_drop && lock_dropped && *locked) { up_read(&mm->mmap_sem); *locked = 0; } in the very end. Changing notify_drop from false to true won't change behaviour unless *locked is non-zero. The caller is ret = __get_user_pages_locked(current, mm, start, nr_pages, pages, NULL, &locked, false, gup_flags | FOLL_TOUCH); if (locked) up_read(&mm->mmap_sem); so in that case the original kernel would have done up_read() right after return from __get_user_pages_locked(), while the modified one would've done it right before the return. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 30 11月, 2017 2 次提交
-
-
由 Dan Williams 提交于
Patch series "introduce get_user_pages_longterm()", v2. Here is a new get_user_pages api for cases where a driver intends to keep an elevated page count indefinitely. This is distinct from usages like iov_iter_get_pages where the elevated page counts are transient. The iov_iter_get_pages cases immediately turn around and submit the pages to a device driver which will put_page when the i/o operation completes (under kernel control). In the longterm case userspace is responsible for dropping the page reference at some undefined point in the future. This is untenable for filesystem-dax case where the filesystem is in control of the lifetime of the block / page and needs reasonable limits on how long it can wait for pages in a mapping to become idle. Fixing filesystems to actually wait for dax pages to be idle before blocks from a truncate/hole-punch operation are repurposed is saved for a later patch series. Also, allowing longterm registration of dax mappings is a future patch series that introduces a "map with lease" semantic where the kernel can revoke a lease and force userspace to drop its page references. I have also tagged these for -stable to purposely break cases that might assume that longterm memory registrations for filesystem-dax mappings were supported by the kernel. The behavior regression this policy change implies is one of the reasons we maintain the "dax enabled. Warning: EXPERIMENTAL, use at your own risk" notification when mounting a filesystem in dax mode. It is worth noting the device-dax interface does not suffer the same constraints since it does not support file space management operations like hole-punch. This patch (of 4): Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow long standing memory registrations against filesytem-dax vmas. Device-dax vmas do not have this problem and are explicitly allowed. This is temporary until a "memory registration with layout-lease" mechanism can be implemented for the affected sub-systems (RDMA and V4L2). [akpm@linux-foundation.org: use kcalloc()] Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: NDan Williams <dan.j.williams@intel.com> Suggested-by: NChristoph Hellwig <hch@lst.de> Cc: Doug Ledford <dledford@redhat.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Inki Dae <inki.dae@samsung.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Joonyoung Shim <jy0922.shim@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Seung-Woo Kim <sw0312.kim@samsung.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
The 'access_permitted' helper is used in the gup-fast path and goes beyond the simple _PAGE_RW check to also: - validate that the mapping is writable from a protection keys standpoint - validate that the pte has _PAGE_USER set since all fault paths where pte_write is must be referencing user-memory. Link: http://lkml.kernel.org/r/151043111604.2842.8051684481794973100.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 9月, 2017 1 次提交
-
-
由 Kirill A. Shutemov 提交于
The 0-day test bot found a performance regression that was tracked down to switching x86 to the generic get_user_pages_fast() implementation: http://lkml.kernel.org/r/20170710024020.GA26389@yexl-desktop The regression was caused by the fact that we now use local_irq_save() + local_irq_restore() in get_user_pages_fast() to disable interrupts. In x86 implementation local_irq_disable() + local_irq_enable() was used. The fix is to make get_user_pages_fast() use local_irq_disable(), leaving local_irq_save() for __get_user_pages_fast() that can be called with interrupts disabled. Numbers for pinning a gigabyte of memory, one page a time, 20 repeats: Before: Average: 14.91 ms, stddev: 0.45 ms After: Average: 10.76 ms, stddev: 0.18 ms Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: linux-mm@kvack.org Fixes: e585513b ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation") Link: http://lkml.kernel.org/r/20170908215603.9189-3-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 9月, 2017 2 次提交
-
-
由 Jérôme Glisse 提交于
Platform with advance system bus (like CAPI or CCIX) allow device memory to be accessible from CPU in a cache coherent fashion. Add a new type of ZONE_DEVICE to represent such memory. The use case are the same as for the un-addressable device memory but without all the corners cases. Link: http://lkml.kernel.org/r/20170817000548.32038-19-jglisse@redhat.comSigned-off-by: NJérôme Glisse <jglisse@redhat.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Nellans <dnellans@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sherry Cheung <SCheung@nvidia.com> Cc: Subhash Gutti <sgutti@nvidia.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Bob Liu <liubo95@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zi Yan 提交于
When THP migration is being used, memory management code needs to handle pmd migration entries properly. This patch uses !pmd_present() or is_swap_pmd() (depending on whether pmd_none() needs separate code or not) to check pmd migration entries at the places where a pmd entry is present. Since pmd-related code uses split_huge_page(), split_huge_pmd(), pmd_trans_huge(), pmd_trans_unstable(), or pmd_none_or_trans_huge_or_clear_bad(), this patch: 1. adds pmd migration entry split code in split_huge_pmd(), 2. takes care of pmd migration entries whenever pmd_trans_huge() is present, 3. makes pmd_none_or_trans_huge_or_clear_bad() pmd migration entry aware. Since split_huge_page() uses split_huge_pmd() and pmd_trans_unstable() is equivalent to pmd_none_or_trans_huge_or_clear_bad(), we do not change them. Until this commit, a pmd entry should be: 1. pointing to a pte page, 2. is_swap_pmd(), 3. pmd_trans_huge(), 4. pmd_devmap(), or 5. pmd_none(). Signed-off-by: NZi Yan <zi.yan@cs.rutgers.edu> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Nellans <dnellans@nvidia.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 9月, 2017 1 次提交
-
-
由 Oliver O'Halloran 提交于
These functions are the only bits of generic code that use {pud,pmd}_pfn() without checking for CONFIG_TRANSPARENT_HUGEPAGE. This works fine on x86, the only arch with devmap support, since the *_pfn() functions are always defined there, but this isn't true for every architecture. Link: http://lkml.kernel.org/r/20170626063833.11094-1-oohall@gmail.comSigned-off-by: NOliver O'Halloran <oohall@gmail.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 7月, 2017 5 次提交
-
-
由 Punit Agrawal 提交于
When speculatively taking references to a hugepage using page_cache_add_speculative() in gup_huge_pmd(), it is assumed that the page returned by pmd_page() is the head page. Although normally true, this assumption doesn't hold when the hugepage comprises of successive page table entries such as when using contiguous bit on arm64 at PTE or PMD levels. This can be addressed by ensuring that the page passed to page_cache_add_speculative() is the real head or by de-referencing the head page within the function. We take the first approach to keep the usage pattern aligned with page_cache_get_speculative() where users already pass the appropriate page, i.e., the de-referenced head. Apply the same logic to fix gup_huge_[pud|pgd]() as well. [punit.agrawal@arm.com: fix arm64 ltp failure] Link: http://lkml.kernel.org/r/20170619170145.25577-5-punit.agrawal@arm.com Link: http://lkml.kernel.org/r/20170522133604.11392-3-punit.agrawal@arm.comSigned-off-by: NPunit Agrawal <punit.agrawal@arm.com> Acked-by: NSteve Capper <steve.capper@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Will Deacon 提交于
When operating on hugepages with DEBUG_VM enabled, the GUP code checks the compound head for each tail page prior to calling page_cache_add_speculative. This is broken, because on the fast-GUP path (where we don't hold any page table locks) we can be racing with a concurrent invocation of split_huge_page_to_list. split_huge_page_to_list deals with this race by using page_ref_freeze to freeze the page and force concurrent GUPs to fail whilst the component pages are modified. This modification includes clearing the compound_head field for the tail pages, so checking this prior to a successful call to page_cache_add_speculative can lead to false positives: In fact, page_cache_add_speculative *already* has this check once the page refcount has been successfully updated, so we can simply remove the broken calls to VM_BUG_ON_PAGE. Link: http://lkml.kernel.org/r/20170522133604.11392-2-punit.agrawal@arm.comSigned-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com> Acked-by: NSteve Capper <steve.capper@arm.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
Architectures like ppc64 supports hugepage size that is not mapped to any of of the page table levels. Instead they add an alternate page table entry format called hugepage directory (hugepd). hugepd indicates that the page table entry maps to a set of hugetlb pages. Add support for this in generic follow_page_mask code. We already support this format in the generic gup code. The default implementation prints warning and returns NULL. We will add ppc64 support in later patches Link: http://lkml.kernel.org/r/1494926612-23928-7-git-send-email-aneesh.kumar@linux.vnet.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Mike Kravetz <kravetz@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anshuman Khandual 提交于
ppc64 supports pgd hugetlb entries. Add code to handle hugetlb pgd entries to follow_page_mask so that ppc64 can switch to it to handle hugetlbe entries. Link: http://lkml.kernel.org/r/1494926612-23928-5-git-send-email-aneesh.kumar@linux.vnet.ibm.comSigned-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Mike Kravetz <kravetz@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
Makes code reading easy. No functional changes in this patch. In a followup patch, we will be updating the follow_page_mask to handle hugetlb hugepd format so that archs like ppc64 can switch to the generic version. This split helps in doing that nicely. Link: http://lkml.kernel.org/r/1494926612-23928-3-git-send-email-aneesh.kumar@linux.vnet.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Mike Kravetz <kravetz@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 6月, 2017 1 次提交
-
-
由 Hugh Dickins 提交于
Stack guard page is a useful feature to reduce a risk of stack smashing into a different mapping. We have been using a single page gap which is sufficient to prevent having stack adjacent to a different mapping. But this seems to be insufficient in the light of the stack usage in userspace. E.g. glibc uses as large as 64kB alloca() in many commonly used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX] which is 256kB or stack strings with MAX_ARG_STRLEN. This will become especially dangerous for suid binaries and the default no limit for the stack size limit because those applications can be tricked to consume a large portion of the stack and a single glibc call could jump over the guard page. These attacks are not theoretical, unfortunatelly. Make those attacks less probable by increasing the stack guard gap to 1MB (on systems with 4k pages; but make it depend on the page size because systems with larger base pages might cap stack allocations in the PAGE_SIZE units) which should cover larger alloca() and VLA stack allocations. It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot. One could argue that the gap size should be configurable from userspace, but that can be done later when somebody finds that the new 1MB is wrong for some special case applications. For now, add a kernel command line option (stack_guard_gap) to specify the stack gap size (in page units). Implementation wise, first delete all the old code for stack guard page: because although we could get away with accounting one extra page in a stack vma, accounting a larger gap can break userspace - case in point, a program run with "ulimit -S -v 20000" failed when the 1MB gap was counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK and strict non-overcommit mode. Instead of keeping gap inside the stack vma, maintain the stack guard gap as a gap between vmas: using vm_start_gap() in place of vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few places which need to respect the gap - mainly arch_get_unmapped_area(), and and the vma tree's subtree_gap support for that. Original-patch-by: NOleg Nesterov <oleg@redhat.com> Original-patch-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NHugh Dickins <hughd@google.com> Acked-by: NMichal Hocko <mhocko@suse.com> Tested-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 6月, 2017 1 次提交
-
-
由 Kirill A. Shutemov 提交于
This patch provides all required callbacks required by the generic get_user_pages_fast() code and switches x86 over - and removes the platform specific implementation. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-2-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 6月, 2017 1 次提交
-
-
由 James Morse 提交于
KVM uses get_user_pages() to resolve its stage2 faults. KVM sets the FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it finds a VM_FAULT_HWPOISON. KVM handles these hwpoison pages as a special case. (check_user_page_hwpoison()) When huge pages are involved, this doesn't work so well. get_user_pages() calls follow_hugetlb_page(), which stops early if it receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning -EFAULT to the caller. The step to map this to -EHWPOISON based on the FOLL_ flags is missing. The hwpoison special case is skipped, and -EFAULT is returned to user-space, causing Qemu or kvmtool to exit. Instead, move this VM_FAULT_ to errno mapping code into a header file and use it from faultin_page() and follow_hugetlb_page(). With this, KVM works as expected. This isn't a problem for arm64 today as we haven't enabled MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86 too, so I think this should be a fix. This doesn't apply earlier than stable's v4.11.1 due to all sorts of cleanup. [james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()] suggested. Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.comSigned-off-by: NJames Morse <james.morse@arm.com> Acked-by: NPunit Agrawal <punit.agrawal@arm.com> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> [4.11.1+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 5月, 2017 1 次提交
-
-
由 Arnd Bergmann 提交于
MIPS just got changed to only accept a pointer argument for access_ok(), causing one warning in drivers/scsi/pmcraid.c. I tried changing x86 the same way and found the same warning in __get_user_pages_fast() and nowhere else in the kernel during randconfig testing: mm/gup.c: In function '__get_user_pages_fast': mm/gup.c:1578:6: error: passing argument 1 of '__chk_range_not_ok' makes pointer from integer without a cast [-Werror=int-conversion] It would probably be a good idea to enforce type-safety in general, so let's change this file to not cause a warning if we do that. I don't know why the warning did not appear on MIPS. Fixes: 2667f50e ("mm: introduce a general RCU get_user_pages_fast()") Link: http://lkml.kernel.org/r/20170421162659.3314521-1-arnd@arndb.deSigned-off-by: NArnd Bergmann <arnd@arndb.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 4月, 2017 1 次提交
-
-
由 Ingo Molnar 提交于
This reverts commit 2947ba05. Dan Williams reported dax-pmem kernel warnings with the following signature: WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200 percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic ... and bisected it to this commit, which suggests possible memory corruption caused by the x86 fast-GUP conversion. He also pointed out: " This is similar to the backtrace when we were not properly handling pud faults and was fixed with this commit: 220ced16 "mm: fix get_user_pages() vs device-dax pud mappings" I've found some missing _devmap checks in the generic get_user_pages_fast() path, but this does not fix the regression [...] " So given that there are known bugs, and a pretty robust looking bisection points to this commit suggesting that are unknown bugs in the conversion as well, revert it for the time being - we'll re-try in v4.13. Reported-by: NDan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: aneesh.kumar@linux.vnet.ibm.com Cc: dann.frazier@canonical.com Cc: dave.hansen@intel.com Cc: steve.capper@linaro.org Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 18 3月, 2017 7 次提交
-
-
由 Kirill A. Shutemov 提交于
This patch provides all required callbacks required by the generic get_user_pages_fast() code and switches x86 over - and removes the platform specific implementation. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316213906.89528-1-kirill.shutemov@linux.intel.com [ Minor readability edits. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill A. Shutemov 提交于
This is a preparation patch for the transition of x86 to the generic GUP_fast() implementation. On x86, get_user_pages_fast() does a couple of sanity checks to see if we can call __get_user_pages_fast() for the range. This kind of wrapping protection should be useful for the generic code too. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-7-kirill.shutemov@linux.intel.com [ Small readability edits. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill A. Shutemov 提交于
This is a preparation patch for the transition of x86 to the generic GUP_fast() implementation. Prepare generic GUP_fast() to handle dev_pagemap(). At the moment, it's only implemented on x86. On non-x86, the new code will be compiled out. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-6-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill A. Shutemov 提交于
This is a preparation patch for the transition of x86 to the generic GUP_fast() implementation. Unlike generic GUP_fast(), the x86 version makes all pages it touches referenced. It seems required for GRU and EPT. See the following commit: 8ee53820 ("thp: mmu_notifier_test_young") Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-5-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill A. Shutemov 提交于
This is a preparation patch for the transition of x86 to the generic GUP_fast() implementation. On x86 PAE, page table entry is larger than sizeof(long) and we would need to provide a helper that can read the entry atomically. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-4-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill A. Shutemov 提交于
This is a preparation patch for the transition of x86 to the generic GUP_fast() implementation. On x86, we would need to do additional permission checks to determine if access is allowed. Let's abstract it out into separate helpers. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-3-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill A. Shutemov 提交于
The only arch that defines it to something meaningful is x86. But x86 doesn't use the generic GUP_fast() implementation -- the only place where the callback is called. Let's drop it. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-2-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-