- 27 Sep 2022, 2 commits
-
-
Committed by Liam R. Howlett

Replace any vm_next use with vma_find(). Update free_pgtables(), unmap_vmas(), and zap_page_range() to use the maple tree. Use the new free_pgtables() and unmap_vmas() in do_mas_align_munmap(). At the same time, alter the loop to be more compact.

Now that free_pgtables() and unmap_vmas() take a maple tree as an argument, rearrange do_mas_align_munmap() to use the new tree to hold the vmas to remove.

Remove __vma_link_list() and __vma_unlink_list() as they are exclusively used to update the linked list. Drop the linked list update from __insert_vm_struct(). Rework validation of the tree as it was depending on the linked list.

[yang.lee@linux.alibaba.com: fix one kernel-doc comment]
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1949
Link: https://lkml.kernel.org/r/20220824021918.94116-1-yang.lee@linux.alibaba.com
Link: https://lkml.kernel.org/r/20220906194824.2110408-69-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
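As a rough sketch of the conversion described above (not the literal diff from the series, and assuming an mm_struct *mm is in scope), a walk over a process's VMAs moves from the old vm_next list to a maple-tree iteration:

    /* Before: linked-list walk over the soon-to-be-removed vm_next field. */
    struct vm_area_struct *vma;
    unsigned long nr_vmas = 0;

    for (vma = mm->mmap; vma; vma = vma->vm_next)
        nr_vmas++;

    /* After: iterate the maple tree that now tracks the VMAs. */
    MA_STATE(mas, &mm->mm_mt, 0, 0);

    mas_for_each(&mas, vma, ULONG_MAX)
        nr_vmas++;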
-
Committed by Liam R. Howlett

Remove the RB tree and start using the maple tree for vm_area_struct tracking.

Drop validate_mm() calls in expand_upwards() and expand_downwards() as the lock is not held.

Link: https://lkml.kernel.org/r/20220906194824.2110408-18-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 12 Sep 2022, 2 commits
-
-
Committed by Kefeng Wang

If a process does not have enough memory to allocate a new virtual mapping, we may see various kinds of errors (e.g. fork cannot allocate memory, a SIGBUS error in shmem), but it is difficult to confirm them. Add some debug information to make this scenario easy to check when __vm_enough_memory() fails.

Link: https://lkml.kernel.org/r/20220726145428.8030-1-wangkefeng.wang@huawei.com
Reported-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Committed by Kairui Song

folio_test_hugetlb() will call PageHeadHuge(), which is a function call and blocks the compiler from recognizing this redundant load. After rearranging the code, stack usage drops from 32 to 24 bytes and the function size is smaller (tested on GCC 12):

Before:
    Stack usage:
      mm/util.c:845:5:folio_mapcount  32  static
    Size:
      0000000000000ea0 00000000000000c7 T folio_mapcount

After:
    Stack usage:
      mm/util.c:845:5:folio_mapcount  24  static
    Size:
      0000000000000ea0 00000000000000b0 T folio_mapcount

Link: https://lkml.kernel.org/r/20220801173155.92008-1-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 03 Aug 2022, 1 commit
-
-
Committed by Matthew Wilcox (Oracle)

These drivers are rather uncomfortably hammered into the address_space_operations hole. They aren't filesystems and don't behave like filesystems. They just need their own movable_operations structure, which we can point to directly from page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
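For context, the replacement interface is a small ops table along these lines (member names as described in the series; treat this as an illustrative sketch rather than the authoritative definition):

    struct movable_operations {
        bool (*isolate_page)(struct page *page, isolate_mode_t mode);
        int (*migrate_page)(struct page *dst, struct page *src,
                            enum migrate_mode mode);
        void (*putback_page)(struct page *page);
    };

A driver that supports non-LRU page migration registers these operations instead of masquerading as a filesystem through address_space_operations.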
-
- 28 Jun 2022, 1 commit
-
-
Committed by Mike Rapoport

Rename Documentation/vm to Documentation/mm so it will be consistent with the mm code directory and with Documentation/admin-guide/mm, and won't be confused with virtual machines.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Wu XiangCheng <bobwxc@email.cn>
-
- 19 May 2022, 1 commit
-
-
Committed by Jason A. Donenfeld

randomize_page() is an mm function. It is documented like one. It contains the history of one. It has the naming convention of one. It looks just like another very similar function in mm, randomize_stack_top(). And it has always been maintained and updated by mm people. There is no need for it to be in random.c. In the "which shape does not look like the other ones" test, pointing to randomize_page() is correct. So move randomize_page() into mm/util.c, right next to the similar randomize_stack_top() function.

This commit contains no actual code changes.

Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- 10 May 2022, 1 commit
-
-
Committed by NeilBrown

Patch series "MM changes to improve swap-over-NFS support".

Assorted improvements for swap-via-filesystem. This is a resend of these patches, rebased on current HEAD. The only substantial change is that swap_dirty_folio() has replaced swap_set_page_dirty().

Currently swap-via-fs (SWP_FS_OPS) doesn't work for any filesystem. It has previously worked for NFS, but that broke a few releases back. This series changes to use a new ->swap_rw rather than ->readpage and ->direct_IO. It also makes other improvements.

There is a companion series already in linux-next which fixes various issues with NFS. Once both series land, a final patch is needed which changes NFS over to use ->swap_rw.

This patch (of 10): Many functions declared in include/linux/swap.h are only used within mm/. Create a new "mm/swap.h" and move some of these declarations there. Remove the redundant 'extern' from the function declarations.

[akpm@linux-foundation.org: mm/memory-failure.c needs mm/swap.h]
Link: https://lkml.kernel.org/r/164859751830.29473.5309689752169286816.stgit@noble.brown
Link: https://lkml.kernel.org/r/164859778120.29473.11725907882296224053.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 05 May 2022, 1 commit
-
-
Committed by Christophe Leroy

Commit e7142bf5 ("arm64, mm: make randomization selected by generic topdown mmap layout") introduced a default version of arch_randomize_brk(), provided when CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT is selected.

powerpc could select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT but needs to provide its own arch_randomize_brk(). In order to allow that, define the generic version of arch_randomize_brk() as a __weak symbol.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b222f1ca06c850daf1b2f26afdb46c6dd97d21ba.1649523076.git.christophe.leroy@csgroup.eu
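A weak definition lets an architecture override the helper simply by providing a strong symbol of the same name; a minimal sketch of the generic side (the real body lives in mm/util.c and may differ in detail):

    unsigned long __weak arch_randomize_brk(struct mm_struct *mm)
    {
        /* smaller randomisation range for 32-bit tasks */
        if (!IS_ENABLED(CONFIG_64BIT) || is_compat_task())
            return randomize_page(mm->brk, SZ_32M);

        return randomize_page(mm->brk, SZ_1G);
    }

powerpc can then supply its own non-weak arch_randomize_brk() and the linker will pick that one.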
-
- 25 Apr 2022, 1 commit
-
-
Committed by Linus Torvalds

Since commit 559089e0 ("vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP"), the use of hugepage mappings for vmalloc is an opt-in strategy, because it caused a number of problems that weren't noticed until x86 enabled it too.

One of the issues was fixed by Nick Piggin in commit 3b8000ae ("mm/vmalloc: huge vmalloc backing pages should be split rather than compound"), but I'm still worried about page protection issues, and VM_FLUSH_RESET_PERMS in particular.

However, like the hash table allocation case (commit f2edd118: "page_alloc: use vmalloc_huge for large system hash"), the use of kvmalloc() should be safe from any such games, since the returned pointer might be a SLUB allocation, and as such no user should reasonably be using it in any odd ways.

We also know that the allocations are fairly large, since it falls back to the vmalloc case only when a kmalloc() fails. So using a hugepage mapping seems both safe and relevant.

This patch does show a weakness in the opt-in strategy: since the opt-in flag is in the 'vm_flags', not the usual gfp_t allocation flags, very few of the usual interfaces actually expose it. That's not much of an issue in this case that already used one of the fairly specialized low-level vmalloc interfaces for the allocation, but for a lot of other vmalloc() users that might want to opt in, it's going to be very inconvenient.

We'll either have to fix any compatibility problems, or expose it in the gfp flags (__GFP_COMP would have made a lot of sense) to allow normal vmalloc() users to use hugepage mappings. That said, the cases that really matter were probably already taken care of by the hash table allocation.

Link: https://lore.kernel.org/all/20220415164413.2727220-1-song@kernel.org/
Link: https://lore.kernel.org/all/CAHk-=whao=iosX1s5Z4SF-ZGa-ebAukJoAdUJFk5SPwnofV+Vg@mail.gmail.com/
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Song Liu <songliubraving@fb.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 22 Mar 2022, 2 commits
-
-
Committed by Matthew Wilcox (Oracle)

Move the prototype from mm.h to mm/internal.h and convert all callers to pass a folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
-
Committed by Matthew Wilcox (Oracle)

This implements the same algorithm as total_mapcount(), which is transformed into a wrapper function.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 08 Mar 2022, 1 commit
-
-
Committed by Paolo Bonzini

Linux has dozens of occurrences of vmalloc(array_size()) and vzalloc(array_size()). Allow simplifying that code by providing vmalloc_array() and vcalloc(), as well as the underscored variants that let the caller specify the GFP flags.

Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
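A quick before/after sketch of a typical call site (buffer names invented for illustration):

    /* Before: open-coded, overflow-checked size calculation. */
    entries = vmalloc(array_size(nr_entries, sizeof(*entries)));

    /* After: the helpers perform the overflow check themselves. */
    entries = vmalloc_array(nr_entries, sizeof(*entries));  /* uninitialised */
    table = vcalloc(nr_slots, sizeof(*table));              /* zeroed */

The underscored variants __vmalloc_array() and __vcalloc() take an extra gfp_t argument for callers that need non-default flags.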
-
- 05 Mar 2022, 1 commit
-
-
Committed by Daniel Borkmann

syzkaller was recently triggering an oversized kvmalloc() warning via xdp_umem_create(). The triggered warning was added back in 7661809d ("mm: don't allow oversized kvmalloc() calls"). The rationale for the warning for huge kvmalloc sizes was as a reaction to a security bug where the size was more than UINT_MAX but not everything was prepared to handle unsigned long sizes.

Anyway, the AF_XDP related call trace from this syzkaller report was:

    kvmalloc include/linux/mm.h:806 [inline]
    kvmalloc_array include/linux/mm.h:824 [inline]
    kvcalloc include/linux/mm.h:829 [inline]
    xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
    xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
    xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
    xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
    __sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
    __do_sys_setsockopt net/socket.c:2187 [inline]
    __se_sys_setsockopt net/socket.c:2184 [inline]
    __x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x44/0xae

Björn mentioned that requests for >2GB allocation can still be valid:

    The structure that is being allocated is the page-pinning accounting.
    AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
    still fewer than what memcg allows (PAGE_COUNTER_MAX is a LONG_MAX/
    PAGE_SIZE on 64 bit systems). [...]

    I could just change from U32_MAX to INT_MAX, but as I stated earlier
    that has a hacky feeling to it. [...] From my perspective, the code
    isn't broken, with the memcg limits in consideration. [...]

Linus says:

    [...] Pretty much every time this has come up, the kernel warning has
    shown that yes, the code was broken and there really wasn't a reason
    for doing allocations that big.

    Of course, some people would be perfectly fine with the allocation
    failing, they just don't want the warning. I didn't want __GFP_NOWARN
    to shut it up originally because I wanted people to see all those
    cases, but these days I think we can just say "yeah, people can shut
    it up explicitly by saying 'go ahead and fail this allocation, don't
    warn about it'".

    So enough time has passed that by now I'd certainly be ok with [it].

Thus allow call-sites to silence such userspace-triggered splats if the allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call to kvcalloc() this is already the case, so nothing else is needed there.

Fixes: 7661809d ("mm: don't allow oversized kvmalloc() calls")
Reported-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
Link: https://lore.kernel.org/bpf/20211201202905.b9892171e3f5b9a60f9da251@linux-foundation.org
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
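In practice this means a call site that knowingly makes a huge, userspace-sized request can opt out of the splat; a hypothetical caller (names invented) would look like:

    /*
     * The element count comes from userspace and may be legitimately huge;
     * failing the allocation is fine, so suppress the oversized warning.
     */
    addrs = kvcalloc(nr_pages, sizeof(*addrs), GFP_KERNEL | __GFP_NOWARN);
    if (!addrs)
        return -ENOMEM;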
-
- 15 Jan 2022, 1 commit
-
-
Committed by Michal Hocko

Support for GFP_NO{FS,IO} and __GFP_NOFAIL has been implemented by previous patches, so we can allow that support for kvmalloc. This will allow some external users to simplify or completely remove their helpers.

GFP_NOWAIT semantics haven't been supported so far, but they haven't been explicitly documented either, so let's add a note about that.

ceph_kvmalloc is the first helper to be dropped and changed to kvmalloc.

Link: https://lkml.kernel.org/r/20211122153233.9924-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
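With that in place, a helper that used to hand-roll "try kmalloc, fall back to vmalloc, never fail" logic can in principle collapse into a single call along these lines (a sketch, not a drop-in for any particular caller):

    /*
     * Reclaim must not recurse into the filesystem, and this path
     * cannot tolerate allocation failure.
     */
    item = kvmalloc(size, GFP_NOFS | __GFP_NOFAIL);

Per the note added by the patch, GFP_NOWAIT remains unsupported for kvmalloc().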
-
- 17 Nov 2021, 1 commit
-
-
Committed by Matthew Wilcox (Oracle)

There's no need for this predicate; callers can just use !folio_test_large().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
-
- 18 Oct 2021, 2 commits
-
-
Committed by Matthew Wilcox (Oracle)

This is the folio equivalent of migrate_page_copy(), which is retained as a wrapper for filesystems which are not yet converted to folios. Also convert copy_huge_page() to folio_copy().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
-
Committed by Matthew Wilcox (Oracle)

This is a default implementation which calls flush_dcache_page() on each page in the folio. If architectures can do better, they should implement their own version of it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
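The default described above is essentially a loop over the folio's pages; a sketch of that shape (the in-tree version may differ slightly):

    void flush_dcache_folio(struct folio *folio)
    {
        long i, nr = folio_nr_pages(folio);

        for (i = 0; i < nr; i++)
            flush_dcache_page(folio_page(folio, i));
    }

An architecture that can flush a contiguous range in one operation would provide its own implementation instead.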
-
- 27 Sep 2021, 3 commits
-
-
Committed by Matthew Wilcox (Oracle)

Convert __page_rmapping to folio_raw_mapping and move it to mm/internal.h. It's only a couple of instructions (load and mask), so it's definitely going to be cheaper to inline it than call it. Leave page_rmapping out of line. Change page_anon_vma() to not call folio_raw_mapping() -- it's more efficient to do the subtraction than the mask.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
-
Committed by Matthew Wilcox (Oracle)

This function is the equivalent of page_mapped(). It is slightly shorter as we do not need to handle the PageTail() case. Reimplement page_mapped() as a wrapper around folio_mapped(). folio_mapped() is 13 bytes smaller than page_mapped(), but the page_mapped() wrapper is 30 bytes, for a net increase of 17 bytes of text.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
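The wrapper mentioned above is roughly the following (a sketch of the pattern rather than the exact in-tree text):

    bool page_mapped(struct page *page)
    {
        return folio_mapped(page_folio(page));
    }

page_folio() resolves a possibly-tail page to its folio, which is why the folio version no longer needs PageTail() handling.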
-
Committed by Matthew Wilcox (Oracle)

These are the folio equivalents of page_mapping() and page_file_mapping(). Add an out-of-line page_mapping() wrapper around folio_mapping() in order to prevent the page_folio() call from bloating every caller of page_mapping(). Adjust page_file_mapping() and page_mapping_file() to use folios internally. Rename __page_file_mapping() to swapcache_mapping() and change it to take a folio.

This ends up saving 122 bytes of text overall. folio_mapping() is 45 bytes shorter than page_mapping() was, but the new page_mapping() wrapper is 30 bytes. The major reduction is a few bytes less in dozens of nfs functions (which call page_file_mapping()). Most of these appear to be a slight change in gcc's register allocation decisions, which allow:

    48 8b 56 08    mov    0x8(%rsi),%rdx
    48 8d 42 ff    lea    -0x1(%rdx),%rax
    83 e2 01       and    $0x1,%edx
    48 0f 44 c6    cmove  %rsi,%rax

to become:

    48 8b 46 08    mov    0x8(%rsi),%rax
    48 8d 78 ff    lea    -0x1(%rax),%rdi
    a8 01          test   $0x1,%al
    48 0f 44 fe    cmove  %rsi,%rdi

for a reduction of a single byte. Once the NFS client is converted to use folios, this entire sequence will disappear.

Also add folio_mapping() documentation.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: David Howells <dhowells@redhat.com>
-
- 25 Sep 2021, 1 commit
-
-
Committed by Chen Jun

We get an unexpected value of /proc/sys/vm/overcommit_memory after running the following program:

    int main()
    {
        int fd = open("/proc/sys/vm/overcommit_memory", O_RDWR);
        write(fd, "1", 1);
        write(fd, "2", 1);
        close(fd);
    }

write(fd, "2", 1) will pass *ppos = 1 to proc_dointvec_minmax, and proc_dointvec_minmax will return 0 without setting new_policy:

    t.data = &new_policy;
    ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos)
        -->do_proc_dointvec
           -->__do_proc_dointvec
                if (write) {
                    if (proc_first_pos_non_zero_ignore(ppos, table))
                        goto out;

    sysctl_overcommit_memory = new_policy;

so sysctl_overcommit_memory will be set to an uninitialized value. Check whether new_policy has been changed by proc_dointvec_minmax.

Link: https://lkml.kernel.org/r/20210923020524.13289-1-chenjun102@huawei.com
Fixes: 56f3547b ("mm: adjust vm_committed_as_batch according to vm overcommit policy")
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Rui Xiang <rui.xiang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
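The shape of the fix is to seed new_policy with a sentinel and only commit it when proc_dointvec_minmax() actually parsed a value; a condensed sketch of the handler body (reconstructed from the description above, not a verbatim copy of the final patch):

    if (write) {
        struct ctl_table t = *table;
        int new_policy = -1;

        t.data = &new_policy;
        ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
        if (ret || new_policy == -1)    /* nothing was parsed */
            return ret ? ret : -EINVAL;

        mm_compute_batch(new_policy);
        sysctl_overcommit_memory = new_policy;
    } else {
        ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
    }

    return ret;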
-
- 03 Sep 2021, 1 commit
-
-
Committed by Linus Torvalds

'kvmalloc()' is a convenience function for people who want to do a kmalloc() but fall back on vmalloc() if there aren't enough physically contiguous pages, or if the allocation is larger than what kmalloc() supports.

However, let's make sure it doesn't get _too_ easy to do crazy things with it. In particular, don't allow big allocations that could be due to integer overflow or underflow. So make sure the allocation size fits in an 'int', to protect against trivial integer conversion issues.

Acked-by: Willy Tarreau <w@1wt.eu>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
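The guard boils down to an early bail-out near the top of the kvmalloc path, roughly (a sketch of the idea, not the exact patch text):

    /* Don't even allow crazy sizes */
    if (WARN_ON_ONCE(size > INT_MAX))
        return NULL;

A negative size_t passed by mistake wraps to a huge unsigned value and gets rejected here instead of reaching the page allocator.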
-
- 10 Aug 2021, 1 commit
-
-
Committed by Dave Chinner

During log recovery of an XFS filesystem with 64kB directory buffers, rebuilding a buffer split across two log records results in a memory allocation warning from krealloc like this:

    xfs filesystem being mounted at /mnt/scratch supports timestamps until 2038 (0x7fffffff)
    XFS (dm-0): Unmounting Filesystem
    XFS (dm-0): Mounting V5 Filesystem
    XFS (dm-0): Starting recovery (logdev: internal)
    ------------[ cut here ]------------
    WARNING: CPU: 5 PID: 3435170 at mm/page_alloc.c:3539 get_page_from_freelist+0xdee/0xe40
    .....
    RIP: 0010:get_page_from_freelist+0xdee/0xe40
    Call Trace:
     ? complete+0x3f/0x50
     __alloc_pages+0x16f/0x300
     alloc_pages+0x87/0x110
     kmalloc_order+0x2c/0x90
     kmalloc_order_trace+0x1d/0x90
     __kmalloc_track_caller+0x215/0x270
     ? xlog_recover_add_to_cont_trans+0x63/0x1f0
     krealloc+0x54/0xb0
     xlog_recover_add_to_cont_trans+0x63/0x1f0
     xlog_recovery_process_trans+0xc1/0xd0
     xlog_recover_process_ophdr+0x86/0x130
     xlog_recover_process_data+0x9f/0x160
     xlog_recover_process+0xa2/0x120
     xlog_do_recovery_pass+0x40b/0x7d0
     ? __irq_work_queue_local+0x4f/0x60
     ? irq_work_queue+0x3a/0x50
     xlog_do_log_recovery+0x70/0x150
     xlog_do_recover+0x38/0x1d0
     xlog_recover+0xd8/0x170
     xfs_log_mount+0x181/0x300
     xfs_mountfs+0x4a1/0x9b0
     xfs_fs_fill_super+0x3c0/0x7b0
     get_tree_bdev+0x171/0x270
     ? suffix_kstrtoint.constprop.0+0xf0/0xf0
     xfs_fs_get_tree+0x15/0x20
     vfs_get_tree+0x24/0xc0
     path_mount+0x2f5/0xaf0
     __x64_sys_mount+0x108/0x140
     do_syscall_64+0x3a/0x70
     entry_SYSCALL_64_after_hwframe+0x44/0xae

Essentially, we are taking a multi-order allocation from kmem_alloc() (which has an open coded no fail, no warn loop) and then reallocating it out to 64kB using krealloc(__GFP_NOFAIL), and that is then triggering the above warning.

This is a regression caused by converting this code from an open coded no fail/no warn reallocation loop to using __GFP_NOFAIL.

What we actually need here is kvrealloc(), so that if contiguous page allocation fails we fall back to vmalloc() and we don't get nasty warnings happening in XFS.

Fixes: 771915c4 ("xfs: remove kmem_realloc()")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
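kvrealloc() takes the old size as well as the new one so it can copy the data when it has to switch from a kmalloc'd buffer to a vmalloc'd one; a hypothetical caller (names invented) looks like:

    new_buf = kvrealloc(old_buf, old_size, new_size, GFP_KERNEL);
    if (!new_buf)
        return -ENOMEM;
    old_buf = new_buf;
    old_size = new_size;

kvfree() frees the result regardless of which allocator ended up backing it.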
-
- 13 Jul 2021, 1 commit
-
-
Committed by Matthew Wilcox (Oracle)

Rewrite copy_huge_page() and move it into mm/util.c so it's always available. Fixes an exposure of uninitialised memory on configurations with HUGETLB and UFFD enabled and MIGRATION disabled.

Fixes: 8cc5fcbb ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 01 Jul 2021, 1 commit
-
-
Committed by David Hildenbrand

A driver might set a page logically offline -- PageOffline() -- and turn the page inaccessible in the hypervisor; after that, access to page content can be fatal. One example is virtio-mem; while unplugged memory -- marked as PageOffline() -- can currently be read in the hypervisor, this will no longer be the case in the future; for example, when having a virtio-mem device backed by huge pages in the hypervisor.

Some special PFN walkers -- i.e., /proc/kcore -- read content of random pages after checking PageOffline(); however, these PFN walkers can race with drivers that set PageOffline().

Let's introduce page_offline_(begin|end|freeze|thaw) for synchronizing. page_offline_freeze()/page_offline_thaw() allows a subsystem to synchronize with such drivers, achieving that a page cannot be set PageOffline() while frozen. page_offline_begin()/page_offline_end() is used by drivers that care about such races when setting a page PageOffline().

For simplicity, use a rwsem for now; neither drivers nor users are performance sensitive.

Link: https://lkml.kernel.org/r/20210526093041.8800-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Aili Yao <yaoaili@kingsoft.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
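The intended pairing is roughly the following (a sketch of the usage pattern described above, with driver details reduced to a comment):

    /* Driver side, while marking a page logically offline: */
    page_offline_begin();
    __SetPageOffline(page);
    /* ... tell the hypervisor the page may no longer be accessed ... */
    page_offline_end();

    /* PFN-walker side (e.g. /proc/kcore), while dumping page contents: */
    page_offline_freeze();
    if (!PageOffline(page))
        copy_page(buf, page_address(page)); /* cannot become offline here */
    page_offline_thaw();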
-
- 11 May 2021, 1 commit
-
-
Committed by Maninder Singh

This commit enables a stack dump for the last free of an object:

    slab kmalloc-64 start c8ab0140 data offset 64 pointer offset 0 size 64 allocated at meminfo_proc_show+0x40/0x4fc
    [ 20.192078]     meminfo_proc_show+0x40/0x4fc
    [ 20.192263]     seq_read_iter+0x18c/0x4c4
    [ 20.192430]     proc_reg_read_iter+0x84/0xac
    [ 20.192617]     generic_file_splice_read+0xe8/0x17c
    [ 20.192816]     splice_direct_to_actor+0xb8/0x290
    [ 20.193008]     do_splice_direct+0xa0/0xe0
    [ 20.193185]     do_sendfile+0x2d0/0x438
    [ 20.193345]     sys_sendfile64+0x12c/0x140
    [ 20.193523]     ret_fast_syscall+0x0/0x58
    [ 20.193695]     0xbeeacde4
    [ 20.193822] Free path:
    [ 20.193935]     meminfo_proc_show+0x5c/0x4fc
    [ 20.194115]     seq_read_iter+0x18c/0x4c4
    [ 20.194285]     proc_reg_read_iter+0x84/0xac
    [ 20.194475]     generic_file_splice_read+0xe8/0x17c
    [ 20.194685]     splice_direct_to_actor+0xb8/0x290
    [ 20.194870]     do_splice_direct+0xa0/0xe0
    [ 20.195014]     do_sendfile+0x2d0/0x438
    [ 20.195174]     sys_sendfile64+0x12c/0x140
    [ 20.195336]     ret_fast_syscall+0x0/0x58
    [ 20.195491]     0xbeeacde4

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-
- 06 May 2021, 2 commits
-
-
Committed by Bhaskar Chowdhury

s/condtion/condition/

Link: https://lkml.kernel.org/r/20210317033439.3429411-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joe Perches

Simplify the code by using a temporary and reduce the object size by using a single call to pr_cont(). Reverse a test and unindent a block too.

    $ size mm/util.o* (defconfig x86-64)
       text    data    bss    dec    hex  filename
       7419     372     40   7831   1e97  mm/util.o.new
       7477     372     40   7889   1ed1  mm/util.o.old

Link: https://lkml.kernel.org/r/a6e105886338f68afd35f7a13d73bcf06b0cc732.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 01 May 2021, 1 commit
-
-
Committed by Matthew Wilcox (Oracle)

page_mapping_file() is only used by some architectures, and then it is usually only used in one place. Make it a static inline function so other architectures don't have to carry this dead code.

Link: https://lkml.kernel.org/r/20210317123011.350118-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
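As a static inline, the helper reduces to something like this sketch (the in-tree version may differ in detail):

    /* For file cache pages, return the address_space, otherwise return NULL. */
    static inline struct address_space *page_mapping_file(struct page *page)
    {
        if (unlikely(PageSwapCache(page)))
            return NULL;
        return page_mapping(page);
    }

Architectures that never call it then simply generate no code for it.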
-
- 09 Mar 2021, 2 commits
-
-
Committed by Paul E. McKenney

This commit adds a few crude tests for mem_dump_obj() to rcutorture runs. Just to prevent bitrot, you understand!

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-
Committed by Paul E. McKenney

The mem_dump_obj() functionality adds a few hundred bytes, which is a small price to pay. Except on kernels built with CONFIG_PRINTK=n, in which mem_dump_obj() messages will be suppressed. This commit therefore makes mem_dump_obj() be a static inline empty function on kernels built with CONFIG_PRINTK=n and excludes all of its support functions as well. This avoids kernel bloat on systems that cannot use mem_dump_obj().

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <linux-mm@kvack.org>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
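In header terms this is the familiar config-stub pattern, roughly:

    #ifdef CONFIG_PRINTK
    void mem_dump_obj(void *object);
    #else
    static inline void mem_dump_obj(void *object) {}
    #endif

so callers may invoke mem_dump_obj() unconditionally and a CONFIG_PRINTK=n build compiles it away.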
-
- 23 Jan 2021, 3 commits
-
-
Committed by Paul E. McKenney

This commit adds vmalloc() support to mem_dump_obj(). Note that the vmalloc_dump_obj() function combines the checking and dumping, in contrast with the split between kmem_valid_obj() and kmem_dump_obj(). The reason for the difference is that the checking in the vmalloc() case involves acquiring a global lock, and redundant acquisitions of global locks should be avoided, even on not-so-fast paths.

Note that this change causes on-stack variables to be reported as vmalloc() storage from kernel_clone() or similar, depending on the degree of inlining that your compiler does. This is likely more helpful than the earlier "non-paged (local) memory".

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-
Committed by Paul E. McKenney

This commit makes mem_dump_obj() call out NULL and zero-sized pointers specially instead of classifying them as non-paged memory.

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-
Committed by Paul E. McKenney

There are kernel facilities such as per-CPU reference counts that give error messages in generic handlers or callbacks, whose messages are unenlightening. In the case of per-CPU reference-count underflow, this is not a problem when creating a new use of this facility, because in that case the bug is almost certainly in the code implementing that new use. However, trouble arises when deploying across many systems, which might exercise corner cases that were not seen during development and testing. Here, it would be really nice to get some kind of hint as to which of several uses the underflow was caused by.

This commit therefore exposes a mem_dump_obj() function that takes a pointer to memory (which must still be allocated if it has been dynamically allocated) and prints available information on where that memory came from. This pointer can reference the middle of the block as well as the beginning of the block, as needed by things like RCU callback functions and timer handlers that might not know where the beginning of the memory block is. These functions and handlers can use mem_dump_obj() to print out better hints as to where the problem might lie.

The information printed can depend on kernel configuration. For example, the allocation return address can be printed only for slab and slub, and even then only when the necessary debug has been enabled. For slab, build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space to the next power of two or use SLAB_STORE_USER when creating the kmem_cache structure. For slub, build with CONFIG_SLUB_DEBUG=y and boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create() if more focused use is desired. Also for slub, use CONFIG_STACKTRACE to enable printing of the allocation-time stack trace.

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
[ paulmck: Convert to printing and change names per Joonsoo Kim. ]
[ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
[ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
[ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
[ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
[ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
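Typical use is a one-liner in whatever error path received the suspect pointer; a hypothetical debug check (condition and variable names invented) might be:

    if (WARN_ONCE(underflow_detected, "percpu ref underflow, dumping object\n"))
        mem_dump_obj(ptr);  /* print provenance of the enclosing allocation */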
-
- 19 Nov 2020, 1 commit
-
-
Committed by Christian König

Add the new vma_set_file() function to allow changing vma->vm_file with the necessary refcount dance.

v2: add more users of this.
v3: add missing EXPORT_SYMBOL, rebase on mmap cleanup, add comments why we drop the reference on two occasions.
v4: make it clear that changing an anonymous vma is illegal.
v5: move vma_set_file to mm/util.c

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v2)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://patchwork.freedesktop.org/patch/399360/
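The "refcount dance" amounts to taking a reference on the new file before dropping the one on the old; a sketch of the helper's shape (the in-tree version carries more commentary):

    void vma_set_file(struct vm_area_struct *vma, struct file *file)
    {
        /* Changing an anonymous vma with this is illegal */
        get_file(file);
        swap(vma->vm_file, file);
        fput(file);
    }

Using swap() keeps the old file pinned until the new one is installed, so the VMA never points at a file without holding a reference.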
-
- 17 Oct 2020, 1 commit
-
-
Committed by Bartosz Golaszewski

Memory allocated with kstrdup_const() must not be passed to regular krealloc(), as it is not aware of the possibility of the chunk residing in .rodata. Since there are no potential users of krealloc_const() at the moment, let's just update the documentation to make this explicit.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200817173927.23389-1-brgl@bgdev.pl
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
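The safe pairing is kstrdup_const()/kfree_const(); for example (an illustrative snippet, not taken from the patch):

    /* May return a pointer into .rodata rather than a fresh allocation. */
    name = kstrdup_const(default_name, GFP_KERNEL);

    /* Correct: kfree_const() handles both cases. */
    kfree_const(name);

    /* Incorrect: krealloc(name, len, GFP_KERNEL) could be handed .rodata. */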
-
- 04 Sep 2020, 1 commit
-
-
Committed by Catalin Marinas

When the Memory Tagging Extension is enabled, two pages are identical only if both their data and tags are identical.

Make the generic memcmp_pages() a __weak function and add an arm64-specific implementation which returns non-zero if either of the two pages contains valid MTE tags (PG_mte_tagged set). There isn't much benefit in comparing the tags of two pages since these are normally used for heap allocations and are likely to differ anyway.

Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
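For reference, the generic helper being weakened is a plain byte comparison along these lines (a sketch; the mm/util.c version may differ slightly):

    int __weak memcmp_pages(struct page *page1, struct page *page2)
    {
        char *addr1, *addr2;
        int ret;

        addr1 = kmap_atomic(page1);
        addr2 = kmap_atomic(page2);
        ret = memcmp(addr1, addr2, PAGE_SIZE);
        kunmap_atomic(addr2);
        kunmap_atomic(addr1);
        return ret;
    }

arm64 then overrides it with a strong symbol that refuses to report MTE-tagged pages as identical.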
-
- 08 Aug 2020, 2 commits
-
-
Committed by Peter Collingbourne

The current split between do_mmap() and do_mmap_pgoff() was introduced in commit 1fcfd8db ("mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff()") to support MPX.

The wrapper function do_mmap_pgoff() always passed 0 as the value of the vm_flags argument to do_mmap(). However, MPX support has subsequently been removed from the kernel and there were no more direct callers of do_mmap(); all calls were going via do_mmap_pgoff().

Simplify the code by removing do_mmap_pgoff() and changing all callers to directly call do_mmap(), which now no longer takes a vm_flags argument.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: http://lkml.kernel.org/r/20200727194109.1371462-1-pcc@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
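For a caller the change is purely mechanical, roughly (argument names illustrative; individual call sites differ):

    /* Before: */
    addr = do_mmap_pgoff(file, addr, len, prot, flags, pgoff, &populate, &uf);

    /* After: same argument list, now calling do_mmap() directly. */
    addr = do_mmap(file, addr, len, prot, flags, pgoff, &populate, &uf);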
-
Committed by Feng Tang

When checking a performance change for the will-it-scale scalability mmap test [1], we found very high lock contention for the spinlock of the percpu counter 'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]   [k] _raw_spin_lock_irqsave
      48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
      45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

Actually this heavy lock contention is not always necessary. 'vm_committed_as' only needs to be very precise when the strict OVERCOMMIT_NEVER policy is set, which requires a rather small batch number for the percpu counter.

So keep the 'batch' number unchanged for the strict OVERCOMMIT_NEVER policy, and lift it to 64X for the OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS policies. Also add a sysctl handler to adjust it when the policy is reconfigured.

Benchmarking with the same testcase in [1] shows 53% improvement on an 8C/16T desktop and 2097% (20X) on a 4S/72C/144T server. We tested with the test platforms in 0day (server, desktop and laptop), and 80%+ of the platforms show improvements with that test. Whether a platform shows improvement depends on whether the test mmap size is bigger than the computed batch number. If the lift is 16X, 1/3 of the platforms will show improvements, though it should help mmap/unmap usage generally, as Michal Hocko mentioned:

    : I believe that there are non-synthetic workloads which would benefit from
    : a larger batch. E.g. large in memory databases which do large mmaps
    : during startups from multiple threads.

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Qian Cai <cai@lca.pw>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: kernel test robot <rong.a.chen@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1592725000-73486-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-5-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-