1. 27 9月, 2022 2 次提交
  2. 12 9月, 2022 2 次提交
  3. 03 8月, 2022 1 次提交
  4. 28 6月, 2022 1 次提交
  5. 19 5月, 2022 1 次提交
    • J
      random: move randomize_page() into mm where it belongs · 5ad7dd88
      Jason A. Donenfeld 提交于
      randomize_page is an mm function. It is documented like one. It contains
      the history of one. It has the naming convention of one. It looks
      just like another very similar function in mm, randomize_stack_top().
      And it has always been maintained and updated by mm people. There is no
      need for it to be in random.c. In the "which shape does not look like
      the other ones" test, pointing to randomize_page() is correct.
      
      So move randomize_page() into mm/util.c, right next to the similar
      randomize_stack_top() function.
      
      This commit contains no actual code changes.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      5ad7dd88
  6. 10 5月, 2022 1 次提交
  7. 05 5月, 2022 1 次提交
  8. 25 4月, 2022 1 次提交
    • L
      kvmalloc: use vmalloc_huge for vmalloc allocations · 9becb688
      Linus Torvalds 提交于
      Since commit 559089e0 ("vmalloc: replace VM_NO_HUGE_VMAP with
      VM_ALLOW_HUGE_VMAP"), the use of hugepage mappings for vmalloc is an
      opt-in strategy, because it caused a number of problems that weren't
      noticed until x86 enabled it too.
      
      One of the issues was fixed by Nick Piggin in commit 3b8000ae
      ("mm/vmalloc: huge vmalloc backing pages should be split rather than
      compound"), but I'm still worried about page protection issues, and
      VM_FLUSH_RESET_PERMS in particular.
      
      However, like the hash table allocation case (commit f2edd118:
      "page_alloc: use vmalloc_huge for large system hash"), the use of
      kvmalloc() should be safe from any such games, since the returned
      pointer might be a SLUB allocation, and as such no user should
      reasonably be using it in any odd ways.
      
      We also know that the allocations are fairly large, since it falls back
      to the vmalloc case only when a kmalloc() fails.  So using a hugepage
      mapping seems both safe and relevant.
      
      This patch does show a weakness in the opt-in strategy: since the opt-in
      flag is in the 'vm_flags', not the usual gfp_t allocation flags, very
      few of the usual interfaces actually expose it.
      
      That's not much of an issue in this case that already used one of the
      fairly specialized low-level vmalloc interfaces for the allocation, but
      for a lot of other vmalloc() users that might want to opt in, it's going
      to be very inconvenient.
      
      We'll either have to fix any compatibility problems, or expose it in the
      gfp flags (__GFP_COMP would have made a lot of sense) to allow normal
      vmalloc() users to use hugepage mappings.  That said, the cases that
      really matter were probably already taken care of by the hash tabel
      allocation.
      
      Link: https://lore.kernel.org/all/20220415164413.2727220-1-song@kernel.org/
      Link: https://lore.kernel.org/all/CAHk-=whao=iosX1s5Z4SF-ZGa-ebAukJoAdUJFk5SPwnofV+Vg@mail.gmail.com/
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Menzel <pmenzel@molgen.mpg.de>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9becb688
  9. 22 3月, 2022 2 次提交
  10. 08 3月, 2022 1 次提交
  11. 05 3月, 2022 1 次提交
    • D
      mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls · 0708a0af
      Daniel Borkmann 提交于
      syzkaller was recently triggering an oversized kvmalloc() warning via
      xdp_umem_create().
      
      The triggered warning was added back in 7661809d ("mm: don't allow
      oversized kvmalloc() calls"). The rationale for the warning for huge
      kvmalloc sizes was as a reaction to a security bug where the size was
      more than UINT_MAX but not everything was prepared to handle unsigned
      long sizes.
      
      Anyway, the AF_XDP related call trace from this syzkaller report was:
      
        kvmalloc include/linux/mm.h:806 [inline]
        kvmalloc_array include/linux/mm.h:824 [inline]
        kvcalloc include/linux/mm.h:829 [inline]
        xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
        xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
        xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
        xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
        __sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
        __do_sys_setsockopt net/socket.c:2187 [inline]
        __se_sys_setsockopt net/socket.c:2184 [inline]
        __x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
        do_syscall_x64 arch/x86/entry/common.c:50 [inline]
        do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Björn mentioned that requests for >2GB allocation can still be valid:
      
        The structure that is being allocated is the page-pinning accounting.
        AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
        still fewer than what memcg allows (PAGE_COUNTER_MAX is a LONG_MAX/
        PAGE_SIZE on 64 bit systems). [...]
      
        I could just change from U32_MAX to INT_MAX, but as I stated earlier
        that has a hacky feeling to it. [...] From my perspective, the code
        isn't broken, with the memcg limits in consideration. [...]
      
      Linus says:
      
        [...] Pretty much every time this has come up, the kernel warning has
        shown that yes, the code was broken and there really wasn't a reason
        for doing allocations that big.
      
        Of course, some people would be perfectly fine with the allocation
        failing, they just don't want the warning. I didn't want __GFP_NOWARN
        to shut it up originally because I wanted people to see all those
        cases, but these days I think we can just say "yeah, people can shut
        it up explicitly by saying 'go ahead and fail this allocation, don't
        warn about it'".
      
        So enough time has passed that by now I'd certainly be ok with [it].
      
      Thus allow call-sites to silence such userspace triggered splats if the
      allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call
      to kvcalloc() this is already the case, so nothing else needed there.
      
      Fixes: 7661809d ("mm: don't allow oversized kvmalloc() calls")
      Reported-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
      Cc: Björn Töpel <bjorn@kernel.org>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
      Link: https://lore.kernel.org/bpf/20211201202905.b9892171e3f5b9a60f9da251@linux-foundation.orgReviewed-by: NLeon Romanovsky <leonro@nvidia.com>
      Ackd-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0708a0af
  12. 15 1月, 2022 1 次提交
  13. 17 11月, 2021 1 次提交
  14. 18 10月, 2021 2 次提交
  15. 27 9月, 2021 3 次提交
  16. 25 9月, 2021 1 次提交
  17. 03 9月, 2021 1 次提交
    • L
      mm: don't allow oversized kvmalloc() calls · 7661809d
      Linus Torvalds 提交于
      'kvmalloc()' is a convenience function for people who want to do a
      kmalloc() but fall back on vmalloc() if there aren't enough physically
      contiguous pages, or if the allocation is larger than what kmalloc()
      supports.
      
      However, let's make sure it doesn't get _too_ easy to do crazy things
      with it.  In particular, don't allow big allocations that could be due
      to integer overflow or underflow.  So make sure the allocation size fits
      in an 'int', to protect against trivial integer conversion issues.
      Acked-by: NWilly Tarreau <w@1wt.eu>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7661809d
  18. 10 8月, 2021 1 次提交
    • D
      mm: Add kvrealloc() · de2860f4
      Dave Chinner 提交于
      During log recovery of an XFS filesystem with 64kB directory
      buffers, rebuilding a buffer split across two log records results
      in a memory allocation warning from krealloc like this:
      
      xfs filesystem being mounted at /mnt/scratch supports timestamps until 2038 (0x7fffffff)
      XFS (dm-0): Unmounting Filesystem
      XFS (dm-0): Mounting V5 Filesystem
      XFS (dm-0): Starting recovery (logdev: internal)
      ------------[ cut here ]------------
      WARNING: CPU: 5 PID: 3435170 at mm/page_alloc.c:3539 get_page_from_freelist+0xdee/0xe40
      .....
      RIP: 0010:get_page_from_freelist+0xdee/0xe40
      Call Trace:
       ? complete+0x3f/0x50
       __alloc_pages+0x16f/0x300
       alloc_pages+0x87/0x110
       kmalloc_order+0x2c/0x90
       kmalloc_order_trace+0x1d/0x90
       __kmalloc_track_caller+0x215/0x270
       ? xlog_recover_add_to_cont_trans+0x63/0x1f0
       krealloc+0x54/0xb0
       xlog_recover_add_to_cont_trans+0x63/0x1f0
       xlog_recovery_process_trans+0xc1/0xd0
       xlog_recover_process_ophdr+0x86/0x130
       xlog_recover_process_data+0x9f/0x160
       xlog_recover_process+0xa2/0x120
       xlog_do_recovery_pass+0x40b/0x7d0
       ? __irq_work_queue_local+0x4f/0x60
       ? irq_work_queue+0x3a/0x50
       xlog_do_log_recovery+0x70/0x150
       xlog_do_recover+0x38/0x1d0
       xlog_recover+0xd8/0x170
       xfs_log_mount+0x181/0x300
       xfs_mountfs+0x4a1/0x9b0
       xfs_fs_fill_super+0x3c0/0x7b0
       get_tree_bdev+0x171/0x270
       ? suffix_kstrtoint.constprop.0+0xf0/0xf0
       xfs_fs_get_tree+0x15/0x20
       vfs_get_tree+0x24/0xc0
       path_mount+0x2f5/0xaf0
       __x64_sys_mount+0x108/0x140
       do_syscall_64+0x3a/0x70
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Essentially, we are taking a multi-order allocation from kmem_alloc()
      (which has an open coded no fail, no warn loop) and then
      reallocating it out to 64kB using krealloc(__GFP_NOFAIL) and that is
      then triggering the above warning.
      
      This is a regression caused by converting this code from an open
      coded no fail/no warn reallocation loop to using __GFP_NOFAIL.
      
      What we actually need here is kvrealloc(), so that if contiguous
      page allocation fails we fall back to vmalloc() and we don't
      get nasty warnings happening in XFS.
      
      Fixes: 771915c4 ("xfs: remove kmem_realloc()")
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Acked-by: NMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      de2860f4
  19. 13 7月, 2021 1 次提交
  20. 01 7月, 2021 1 次提交
    • D
      mm: introduce page_offline_(begin|end|freeze|thaw) to synchronize setting PageOffline() · 82840451
      David Hildenbrand 提交于
      A driver might set a page logically offline -- PageOffline() -- and turn
      the page inaccessible in the hypervisor; after that, access to page
      content can be fatal.  One example is virtio-mem; while unplugged memory
      -- marked as PageOffline() can currently be read in the hypervisor, this
      will no longer be the case in the future; for example, when having a
      virtio-mem device backed by huge pages in the hypervisor.
      
      Some special PFN walkers -- i.e., /proc/kcore -- read content of random
      pages after checking PageOffline(); however, these PFN walkers can race
      with drivers that set PageOffline().
      
      Let's introduce page_offline_(begin|end|freeze|thaw) for synchronizing.
      
      page_offline_freeze()/page_offline_thaw() allows for a subsystem to
      synchronize with such drivers, achieving that a page cannot be set
      PageOffline() while frozen.
      
      page_offline_begin()/page_offline_end() is used by drivers that care about
      such races when setting a page PageOffline().
      
      For simplicity, use a rwsem for now; neither drivers nor users are
      performance sensitive.
      
      Link: https://lkml.kernel.org/r/20210526093041.8800-5-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NMike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Cc: Aili Yao <yaoaili@kingsoft.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      82840451
  21. 11 5月, 2021 1 次提交
    • M
      mm/slub: Add Support for free path information of an object · e548eaa1
      Maninder Singh 提交于
      This commit adds enables a stack dump for the last free of an object:
      
      slab kmalloc-64 start c8ab0140 data offset 64 pointer offset 0 size 64 allocated at meminfo_proc_show+0x40/0x4fc
      [   20.192078]     meminfo_proc_show+0x40/0x4fc
      [   20.192263]     seq_read_iter+0x18c/0x4c4
      [   20.192430]     proc_reg_read_iter+0x84/0xac
      [   20.192617]     generic_file_splice_read+0xe8/0x17c
      [   20.192816]     splice_direct_to_actor+0xb8/0x290
      [   20.193008]     do_splice_direct+0xa0/0xe0
      [   20.193185]     do_sendfile+0x2d0/0x438
      [   20.193345]     sys_sendfile64+0x12c/0x140
      [   20.193523]     ret_fast_syscall+0x0/0x58
      [   20.193695]     0xbeeacde4
      [   20.193822]  Free path:
      [   20.193935]     meminfo_proc_show+0x5c/0x4fc
      [   20.194115]     seq_read_iter+0x18c/0x4c4
      [   20.194285]     proc_reg_read_iter+0x84/0xac
      [   20.194475]     generic_file_splice_read+0xe8/0x17c
      [   20.194685]     splice_direct_to_actor+0xb8/0x290
      [   20.194870]     do_splice_direct+0xa0/0xe0
      [   20.195014]     do_sendfile+0x2d0/0x438
      [   20.195174]     sys_sendfile64+0x12c/0x140
      [   20.195336]     ret_fast_syscall+0x0/0x58
      [   20.195491]     0xbeeacde4
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Co-developed-by: NVaneet Narang <v.narang@samsung.com>
      Signed-off-by: NVaneet Narang <v.narang@samsung.com>
      Signed-off-by: NManinder Singh <maninder1.s@samsung.com>
      Signed-off-by: NPaul E. McKenney <paulmck@kernel.org>
      e548eaa1
  22. 06 5月, 2021 2 次提交
  23. 01 5月, 2021 1 次提交
  24. 09 3月, 2021 2 次提交
  25. 23 1月, 2021 3 次提交
    • P
      mm: Make mem_dump_obj() handle vmalloc() memory · 98f18083
      Paul E. McKenney 提交于
      This commit adds vmalloc() support to mem_dump_obj().  Note that the
      vmalloc_dump_obj() function combines the checking and dumping, in
      contrast with the split between kmem_valid_obj() and kmem_dump_obj().
      The reason for the difference is that the checking in the vmalloc()
      case involves acquiring a global lock, and redundant acquisitions of
      global locks should be avoided, even on not-so-fast paths.
      
      Note that this change causes on-stack variables to be reported as
      vmalloc() storage from kernel_clone() or similar, depending on the degree
      of inlining that your compiler does.  This is likely more helpful than
      the earlier "non-paged (local) memory".
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <linux-mm@kvack.org>
      Reported-by: NAndrii Nakryiko <andrii@kernel.org>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Tested-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@kernel.org>
      98f18083
    • P
      mm: Make mem_dump_obj() handle NULL and zero-sized pointers · b70fa3b1
      Paul E. McKenney 提交于
      This commit makes mem_dump_obj() call out NULL and zero-sized pointers
      specially instead of classifying them as non-paged memory.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <linux-mm@kvack.org>
      Reported-by: NAndrii Nakryiko <andrii@kernel.org>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Tested-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@kernel.org>
      b70fa3b1
    • P
      mm: Add mem_dump_obj() to print source of memory block · 8e7f37f2
      Paul E. McKenney 提交于
      There are kernel facilities such as per-CPU reference counts that give
      error messages in generic handlers or callbacks, whose messages are
      unenlightening.  In the case of per-CPU reference-count underflow, this
      is not a problem when creating a new use of this facility because in that
      case the bug is almost certainly in the code implementing that new use.
      However, trouble arises when deploying across many systems, which might
      exercise corner cases that were not seen during development and testing.
      Here, it would be really nice to get some kind of hint as to which of
      several uses the underflow was caused by.
      
      This commit therefore exposes a mem_dump_obj() function that takes
      a pointer to memory (which must still be allocated if it has been
      dynamically allocated) and prints available information on where that
      memory came from.  This pointer can reference the middle of the block as
      well as the beginning of the block, as needed by things like RCU callback
      functions and timer handlers that might not know where the beginning of
      the memory block is.  These functions and handlers can use mem_dump_obj()
      to print out better hints as to where the problem might lie.
      
      The information printed can depend on kernel configuration.  For example,
      the allocation return address can be printed only for slab and slub,
      and even then only when the necessary debug has been enabled.  For slab,
      build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
      to the next power of two or use the SLAB_STORE_USER when creating the
      kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
      boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
      if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
      to enable printing of the allocation-time stack trace.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <linux-mm@kvack.org>
      Reported-by: NAndrii Nakryiko <andrii@kernel.org>
      [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
      [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
      [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
      [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
      [ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
      [ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
      Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Tested-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@kernel.org>
      8e7f37f2
  26. 19 11月, 2020 1 次提交
  27. 17 10月, 2020 1 次提交
  28. 04 9月, 2020 1 次提交
  29. 08 8月, 2020 2 次提交