1. 01 Sep, 2022 (1 commit)
  2. 24 Aug, 2022 (5 commits)
  3. 20 Jul, 2022 (1 commit)
  4. 04 Jul, 2022 (1 commit)
  5. 13 May, 2022 (1 commit)
  6. 16 Apr, 2022 (1 commit)
  7. 13 Apr, 2022 (1 commit)
    • mm/slab_common: move dma-kmalloc caches creation into new_kmalloc_cache() · 33647783
      Authored by Ohhoon Kwon
      There are four types of kmalloc_caches: KMALLOC_NORMAL, KMALLOC_CGROUP,
      KMALLOC_RECLAIM, and KMALLOC_DMA. While the first three types are
      created using new_kmalloc_cache(), KMALLOC_DMA caches are created in a
      separate code path. Let KMALLOC_DMA caches also be created using
      new_kmalloc_cache(), to enhance readability.
      
      Historically, there were only KMALLOC_NORMAL caches and KMALLOC_DMA
      caches, and they were initialized in two separate code paths.
      However, when KMALLOC_RECLAIM was introduced in v4.20 via
      commit 1291523f ("mm, slab/slub: introduce kmalloc-reclaimable
      caches") and KMALLOC_CGROUP was introduced in v5.14 via
      commit 494c1dfe ("mm: memcg/slab: create a new set of kmalloc-cg-<n>
      caches"), their creations were merged with KMALLOC_NORMAL's only.
      KMALLOC_DMA creation logic should be merged with them, too.
      
      By merging KMALLOC_DMA initialization with the other types, the
      following two changes may occur:
      1. The order in which dma-kmalloc-<n> caches are added to the
      slab_caches list may change (they may now be sorted by size), i.e. the
      order in which they appear in /proc/slabinfo may change as well.
      2. slab_state will be set to UP only after the KMALLOC_DMA caches are
      created. For SLUB, freelist randomization depends on slab_state >= UP,
      so the KMALLOC_DMA caches' freelists will not be randomized at
      creation; this is deferred to init_freelist_randomization().
      Co-developed-by: JaeSang Yoo <jsyoo5b@gmail.com>
      Signed-off-by: JaeSang Yoo <jsyoo5b@gmail.com>
      Signed-off-by: Ohhoon Kwon <ohkwon1043@gmail.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Link: https://lore.kernel.org/r/20220410162511.656541-1-ohkwon1043@gmail.com
  8. 06 Apr, 2022 (1 commit)
    • mm/slub: use stackdepot to save stack trace in objects · 5cf909c5
      Authored by Oliver Glitta
      Many stack traces are similar, so the per-object arrays of saved
      return addresses contain many duplicates.  Stackdepot saves each unique
      stack only once.
      
      Replace the addrs field in struct track with a depot_stack_handle_t
      handle, and use stackdepot to save the stack trace.
      
      The benefits are smaller memory overhead and the possibility to
      aggregate per-cache statistics in the following patch, using the
      stackdepot handle instead of matching stacks manually.
      
      [ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ]
      
      This was initially merged as commit 78869146 and reverted by commit
      ae14c63a due to several issues that should now be fixed.
      The problem of unconditional memory overhead by stackdepot has been
      addressed by commit 2dba5eb1 ("lib/stackdepot: allow optional init
      and stack_table allocation by kvmalloc()"), so the dependency on
      stackdepot will result in extra memory usage only when slab cache
      tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
      The build failures on some architectures were also addressed, and the
      reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
      patch.
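
      Roughly, the change replaces the fixed array of return addresses in
      struct track with a stackdepot handle (a sketch, not the exact
      mm/slub.c hunk; stack_trace_save() and stack_depot_save() are existing
      kernel APIs):

        #include <linux/stackdepot.h>

        struct track {
                unsigned long addr;             /* Called from address */
        #ifdef CONFIG_STACKDEPOT
                depot_stack_handle_t handle;    /* Deduplicated stack trace */
        #endif
                int cpu;                        /* Was running on cpu */
                int pid;                        /* Pid context */
                unsigned long when;             /* When did the operation occur */
        };

        static void set_track(struct kmem_cache *s, void *object,
                              enum track_item alloc, unsigned long addr)
        {
                struct track *p = get_track(s, object, alloc);
        #ifdef CONFIG_STACKDEPOT
                unsigned long entries[TRACK_ADDRS_COUNT];
                unsigned int nr_entries;

                /* Save once in the depot; identical stacks share one entry. */
                nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
                p->handle = stack_depot_save(entries, nr_entries, GFP_NOWAIT);
        #endif
                p->addr = addr;
                p->cpu = smp_processor_id();
                p->pid = current->pid;
                p->when = jiffies;
        }
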
      Signed-off-by: Oliver Glitta <glittao@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
  9. 21 Feb, 2022 (1 commit)
  10. 15 Jan, 2022 (5 commits)
  11. 06 Jan, 2022 (1 commit)
  12. 07 Nov, 2021 (1 commit)
  13. 04 Sep, 2021 (1 commit)
  14. 30 Jun, 2021 (5 commits)
  15. 17 Jun, 2021 (1 commit)
    • mm/slub: fix redzoning for small allocations · 74c1d3e0
      Authored by Kees Cook
      The redzone area for SLUB exists between s->object_size and s->inuse
      (which is at least the word-aligned object_size).  If a cache were
      created with an object_size smaller than sizeof(void *), the in-object
      stored freelist pointer would overwrite the redzone (e.g.  with boot
      param "slub_debug=ZF"):
      
        BUG test (Tainted: G    B            ): Right Redzone overwritten
        -----------------------------------------------------------------------------
      
        INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb
        INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200
        INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620
      
        Redzone  (____ptrval____): bb bb bb bb bb bb bb bb    ........
        Object   (____ptrval____): f6 f4 a5 40 1d e8          ...@..
        Redzone  (____ptrval____): 1a aa                      ..
        Padding  (____ptrval____): 00 00 00 00 00 00 00 00    ........
      
      Store the freelist pointer out of line when object_size is smaller than
      sizeof(void *) and redzoning is enabled.
      
      Additionally remove the "smaller than sizeof(void *)" check under
      CONFIG_DEBUG_VM in kmem_cache_sanity_check() as it is now redundant:
      SLAB and SLOB both handle small sizes.
      
      (Note that no caches within this size range are known to exist in the
      kernel currently.)
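
      A sketch of the relevant condition in SLUB's calculate_sizes() after
      the fix (simplified; the flags are the existing SLAB_* cache flags):

        if ((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || s->ctor ||
            ((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *))) {
                /*
                 * Relocate the free pointer after the object if it is not
                 * permitted to overwrite the first word of the object on
                 * kmem_cache_free(): here, because an in-object pointer
                 * would clobber the right redzone of a tiny object.
                 */
                s->offset = size;
                size += sizeof(void *);
        }
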
      
      Link: https://lkml.kernel.org/r/20210608183955.280836-3-keescook@chromium.org
      Fixes: 81819f0f ("SLUB core")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Lin, Zhenpeng" <zplin@psu.edu>
      Cc: Marco Elver <elver@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 15 May, 2021 (1 commit)
    • mm, slub: move slub_debug static key enabling outside slab_mutex · afe0c26d
      Authored by Vlastimil Babka
      Paul E.  McKenney reported [1] that commit 1f0723a4 ("mm, slub: enable
      slub_debug static key when creating cache with explicit debug flags")
      results in the lockdep complaint:
      
       ======================================================
       WARNING: possible circular locking dependency detected
       5.12.0+ #15 Not tainted
       ------------------------------------------------------
       rcu_torture_sta/109 is trying to acquire lock:
       ffffffff96063cd0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0x9/0x20
      
       but task is already holding lock:
       ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (slab_mutex){+.+.}-{3:3}:
              lock_acquire+0xb9/0x3a0
              __mutex_lock+0x8d/0x920
              slub_cpu_dead+0x15/0xf0
              cpuhp_invoke_callback+0x17a/0x7c0
              cpuhp_invoke_callback_range+0x3b/0x80
              _cpu_down+0xdf/0x2a0
              cpu_down+0x2c/0x50
              device_offline+0x82/0xb0
              remove_cpu+0x1a/0x30
              torture_offline+0x80/0x140
              torture_onoff+0x147/0x260
              kthread+0x10a/0x140
              ret_from_fork+0x22/0x30
      
       -> #0 (cpu_hotplug_lock){++++}-{0:0}:
              check_prev_add+0x8f/0xbf0
              __lock_acquire+0x13f0/0x1d80
              lock_acquire+0xb9/0x3a0
              cpus_read_lock+0x21/0xa0
              static_key_enable+0x9/0x20
              __kmem_cache_create+0x38d/0x430
              kmem_cache_create_usercopy+0x146/0x250
              kmem_cache_create+0xd/0x10
              rcu_torture_stats+0x79/0x280
              kthread+0x10a/0x140
              ret_from_fork+0x22/0x30
      
       other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(slab_mutex);
                                      lock(cpu_hotplug_lock);
                                      lock(slab_mutex);
         lock(cpu_hotplug_lock);
      
        *** DEADLOCK ***
      
       1 lock held by rcu_torture_sta/109:
        #0: ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250
      
       stack backtrace:
       CPU: 3 PID: 109 Comm: rcu_torture_sta Not tainted 5.12.0+ #15
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
       Call Trace:
        dump_stack+0x6d/0x89
        check_noncircular+0xfe/0x110
        ? lock_is_held_type+0x98/0x110
        check_prev_add+0x8f/0xbf0
        __lock_acquire+0x13f0/0x1d80
        lock_acquire+0xb9/0x3a0
        ? static_key_enable+0x9/0x20
        ? mark_held_locks+0x49/0x70
        cpus_read_lock+0x21/0xa0
        ? static_key_enable+0x9/0x20
        static_key_enable+0x9/0x20
        __kmem_cache_create+0x38d/0x430
        kmem_cache_create_usercopy+0x146/0x250
        ? rcu_torture_stats_print+0xd0/0xd0
        kmem_cache_create+0xd/0x10
        rcu_torture_stats+0x79/0x280
        ? rcu_torture_stats_print+0xd0/0xd0
        kthread+0x10a/0x140
        ? kthread_park+0x80/0x80
        ret_from_fork+0x22/0x30
      
      This is because there's one order of locking from the hotplug callbacks:
      
      lock(cpu_hotplug_lock); // from hotplug machinery itself
      lock(slab_mutex); // in e.g. slab_mem_going_offline_callback()
      
      And commit 1f0723a4 made the reverse sequence possible:
      lock(slab_mutex); // in kmem_cache_create_usercopy()
      lock(cpu_hotplug_lock); // kmem_cache_open() -> static_key_enable()
      
      The simplest fix is to move the static_key_enable() call to a place
      before slab_mutex is taken. That means kmem_cache_create_usercopy() in
      mm/slab_common.c, which is not ideal for SLUB-specific code, but the
      #ifdef CONFIG_SLUB_DEBUG makes it at least self-contained and obvious.
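
      A sketch of that fix in kmem_cache_create_usercopy() (simplified; the
      slub_debug_enabled static key and the SLAB_DEBUG_FLAGS mask already
      exist for SLUB):

        #ifdef CONFIG_SLUB_DEBUG
                /*
                 * If no slub_debug was enabled globally, the static key is
                 * not yet enabled by setup_slub_debug(). Enable it if the
                 * cache is being created with any of the debug flags, and do
                 * it before taking slab_mutex, so the cpu_hotplug_lock taken
                 * inside static_key_enable() does not nest under slab_mutex.
                 */
                if (flags & SLAB_DEBUG_FLAGS)
                        static_branch_enable(&slub_debug_enabled);
        #endif

                mutex_lock(&slab_mutex);
                /* ... the rest of cache creation ... */
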
      
      [1] https://lore.kernel.org/lkml/20210502171827.GA3670492@paulmck-ThinkPad-P17-Gen-1/
      
      Link: https://lkml.kernel.org/r/20210504120019.26791-1-vbabka@suse.cz
      Fixes: 1f0723a4 ("mm, slub: enable slub_debug static key when creating cache with explicit debug flags")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Paul E. McKenney <paulmck@kernel.org>
      Tested-by: Paul E. McKenney <paulmck@kernel.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 11 May, 2021 (1 commit)
    • mm/slub: Add Support for free path information of an object · e548eaa1
      Authored by Maninder Singh
      This commit enables a stack dump for the last free of an object:
      
      slab kmalloc-64 start c8ab0140 data offset 64 pointer offset 0 size 64 allocated at meminfo_proc_show+0x40/0x4fc
      [   20.192078]     meminfo_proc_show+0x40/0x4fc
      [   20.192263]     seq_read_iter+0x18c/0x4c4
      [   20.192430]     proc_reg_read_iter+0x84/0xac
      [   20.192617]     generic_file_splice_read+0xe8/0x17c
      [   20.192816]     splice_direct_to_actor+0xb8/0x290
      [   20.193008]     do_splice_direct+0xa0/0xe0
      [   20.193185]     do_sendfile+0x2d0/0x438
      [   20.193345]     sys_sendfile64+0x12c/0x140
      [   20.193523]     ret_fast_syscall+0x0/0x58
      [   20.193695]     0xbeeacde4
      [   20.193822]  Free path:
      [   20.193935]     meminfo_proc_show+0x5c/0x4fc
      [   20.194115]     seq_read_iter+0x18c/0x4c4
      [   20.194285]     proc_reg_read_iter+0x84/0xac
      [   20.194475]     generic_file_splice_read+0xe8/0x17c
      [   20.194685]     splice_direct_to_actor+0xb8/0x290
      [   20.194870]     do_splice_direct+0xa0/0xe0
      [   20.195014]     do_sendfile+0x2d0/0x438
      [   20.195174]     sys_sendfile64+0x12c/0x140
      [   20.195336]     ret_fast_syscall+0x0/0x58
      [   20.195491]     0xbeeacde4
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Co-developed-by: Vaneet Narang <v.narang@samsung.com>
      Signed-off-by: Vaneet Narang <v.narang@samsung.com>
      Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
  18. 01 May, 2021 (1 commit)
  19. 09 Mar, 2021 (2 commits)
  20. 27 Feb, 2021 (4 commits)
    • kasan, mm: optimize krealloc poisoning · d12d9ad8
      Authored by Andrey Konovalov
      Currently, krealloc() always calls ksize(), which unpoisons the whole
      object including the redzone.  This is inefficient, as kasan_krealloc()
      repoisons the redzone for objects that fit into the same buffer.
      
      This patch changes krealloc() instrumentation to use the
      uninstrumented __ksize() that doesn't unpoison the memory.  Instead,
      kasan_krealloc() is changed to unpoison the memory excluding the
      redzone.
      
      For objects that don't fit into the old allocation, this patch disables
      KASAN accessibility checks when copying memory into a new object instead
      of unpoisoning it.
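
      A simplified sketch of the resulting __do_krealloc() logic (helper
      names as in mm/slab_common.c and the KASAN runtime):

        /* Uninstrumented size lookup: don't unpoison the whole object. */
        ks = __ksize(p);

        /* The object still fits: repoison it precisely, keeping the
         * area beyond new_size (the redzone) poisoned. */
        if (ks >= new_size) {
                p = kasan_krealloc((void *)p, new_size, flags);
                return (void *)p;
        }

        /* Need a bigger object: copy from the old one without unpoisoning
         * it, temporarily suppressing KASAN checks for this memcpy(). */
        ret = kmalloc_track_caller(new_size, flags);
        if (ret && p) {
                kasan_disable_current();
                memcpy(ret, kasan_reset_tag(p), ks);
                kasan_enable_current();
        }
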
      
      Link: https://lkml.kernel.org/r/9bef90327c9cb109d736c40115684fd32f49e6b0.1612546384.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kasan, mm: fail krealloc on freed objects · 26a5ca7a
      Authored by Andrey Konovalov
      Currently, if krealloc() is called on a freed object with KASAN enabled,
      it allocates and returns a new object, but doesn't copy any memory from
      the old one, as ksize() returns 0.  This makes the caller believe that
      krealloc() succeeded (although a KASAN report is printed).
      
      This patch adds an accessibility check into __do_krealloc().  If the check
      fails, krealloc() returns NULL.  This check duplicates the one in ksize();
      this is fixed in the following patch.
      
      This patch also adds a KASAN-KUnit test to check krealloc() behaviour when
      it's called on a freed object.
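
      The added check is roughly the following (a sketch of __do_krealloc();
      kasan_check_byte() reports and returns false for an invalid object):

        if (likely(!ZERO_OR_NULL_PTR(p))) {
                /*
                 * A freed (or otherwise inaccessible) object fails the
                 * check: KASAN prints a report and krealloc() returns NULL
                 * instead of pretending the reallocation succeeded.
                 */
                if (!kasan_check_byte(p))
                        return NULL;
                ks = ksize(p);
        } else {
                ks = 0;
        }
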
      
      Link: https://lkml.kernel.org/r/cbcf7b02be0a1ca11de4f833f2ff0b3f2c9b00c8.1612546384.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kasan, mm: don't save alloc stacks twice · 92850134
      Authored by Andrey Konovalov
      Patch series "kasan: optimizations and fixes for HW_TAGS", v4.
      
      This patchset makes the HW_TAGS mode more efficient, mostly by reworking
      poisoning approaches and simplifying/inlining some internal helpers.
      
      With this change, the overhead of HW_TAGS annotations excluding setting
      and checking memory tags is ~3%.  The performance impact caused by tags
      will be unknown until we have hardware that supports MTE.
      
      As a side-effect, this patchset speeds up generic KASAN by ~15%.
      
      This patch (of 13):
      
      Currently KASAN saves allocation stacks in both kasan_slab_alloc() and
      kasan_kmalloc() annotations.  This patch changes KASAN to save allocation
      stacks for slab objects from kmalloc caches in kasan_kmalloc() only, and
      stacks for other slab objects in kasan_slab_alloc() only.
      
      This change requires ____kasan_kmalloc() to know whether the object
      belongs to a kmalloc cache.  This is implemented by adding a flag field to
      the kasan_info structure.  That flag is only set for kmalloc caches via a
      new kasan_cache_create_kmalloc() annotation.
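
      A sketch of the new flag and annotation (simplified; the exact
      kasan_cache layout may differ):

        struct kasan_cache {
                int alloc_meta_offset;
                int free_meta_offset;
                bool is_kmalloc;        /* alloc stack saved in kasan_kmalloc() only */
        };

        /* Called when a kmalloc cache is created. */
        void __kasan_cache_create_kmalloc(struct kmem_cache *cache)
        {
                cache->kasan_info.is_kmalloc = true;
        }

      The allocation hooks then save the stack in kasan_slab_alloc() only
      when is_kmalloc is false, and in kasan_kmalloc() only when it is true,
      so each object's allocation stack is recorded exactly once.
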
      
      Link: https://lkml.kernel.org/r/cover.1612546384.git.andreyknvl@google.com
      Link: https://lkml.kernel.org/r/7c673ebca8d00f40a7ad6f04ab9a2bddeeae2097.1612546384.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, kfence: insert KFENCE hooks for SLAB · d3fb45f3
      Authored by Alexander Potapenko
      Inserts KFENCE hooks into the SLAB allocator.
      
      To pass the originally requested size to KFENCE, add an argument
      'orig_size' to slab_alloc*(). The additional argument is required to
      preserve the requested original size for kmalloc() allocations, which
      use size classes (e.g. an allocation of 272 bytes will return an object
      of size 512). Therefore, kmem_cache::size does not represent the
      kmalloc-caller's requested size, and we must introduce the argument
      'orig_size' to propagate the originally requested size to KFENCE.
      
      Without the originally requested size, we would not be able to detect
      out-of-bounds accesses for objects placed at the end of a KFENCE object
      page if that object's size is not equal to the kmalloc size class it
      was bucketed into.
      
      When KFENCE is disabled, there is no additional overhead, since
      slab_alloc*() functions are __always_inline.
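
      A sketch of the hook placement in SLAB's slab_alloc() (simplified;
      kfence_alloc() is the existing KFENCE entry point from
      include/linux/kfence.h):

        static __always_inline void *
        slab_alloc(struct kmem_cache *cachep, gfp_t flags, size_t orig_size,
                   unsigned long caller)
        {
                void *objp;

                /*
                 * Give KFENCE a chance to serve the allocation. orig_size is
                 * the kmalloc caller's requested size, not cachep->size, so
                 * KFENCE can place its redzone right after the requested
                 * bytes rather than after the size class.
                 */
                objp = kfence_alloc(cachep, orig_size, flags);
                if (unlikely(objp))
                        return objp;

                /* ... regular SLAB fast/slow path fills objp ... */
                return objp;
        }
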
      
      Link: https://lkml.kernel.org/r/20201103175841.3495947-5-elver@google.com
      Signed-off-by: Marco Elver <elver@google.com>
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Co-developed-by: Marco Elver <elver@google.com>
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joern Engel <joern@purestorage.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: SeongJae Park <sjpark@amazon.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. 25 Feb, 2021 (4 commits)
    • kasan: fix bug detection via ksize for HW_TAGS mode · 611806b4
      Authored by Andrey Konovalov
      The currently existing kasan_check_read/write() annotations are intended
      to be used for kernel modules that have KASAN compiler instrumentation
      disabled. Thus, they are only relevant for the software KASAN modes that
      rely on compiler instrumentation.
      
      However there's another use case for these annotations: ksize() checks
      that the object passed to it is indeed accessible before unpoisoning the
      whole object. This is currently done via __kasan_check_read(), which is
      compiled away for the hardware tag-based mode that doesn't rely on
      compiler instrumentation. This leads to KASAN missing some memory
      corruptions.
      
      Provide another annotation called kasan_check_byte() that is available
      for all KASAN modes. As the implementation, rename and reuse
      kasan_check_invalid_free(). Use this new annotation in ksize().
      To avoid having ksize() as the top frame in the reported stack trace,
      pass _RET_IP_ to __kasan_check_byte().
      
      Also add a new ksize_uaf() test that checks that a use-after-free is
      detected via ksize() itself, and via plain accesses that happen later.
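
      The resulting ksize() is roughly the following (a sketch;
      kasan_unpoison_range() is the existing KASAN helper):

        size_t ksize(const void *objp)
        {
                size_t size;

                /*
                 * kasan_check_byte() works in every KASAN mode, including
                 * HW_TAGS. For an invalid (e.g. already freed) object it
                 * prints a report and we return 0 without unpoisoning.
                 */
                if (unlikely(ZERO_OR_NULL_PTR(objp)) || !kasan_check_byte(objp))
                        return 0;

                size = __ksize(objp);
                /* Callers may use the whole allocated area, so unpoison it. */
                kasan_unpoison_range(objp, size);
                return size;
        }
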
      
      Link: https://linux-review.googlesource.com/id/Iaabf771881d0f9ce1b969f2a62938e99d3308ec5
      Link: https://lkml.kernel.org/r/f32ad74a60b28d8402482a38476f02bb7600f620.1610733117.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Reviewed-by: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcontrol: fix slub memory accounting · 96403bfe
      Authored by Muchun Song
      SLUB currently accounts kmalloc() and kmalloc_node() allocations
      larger than order-1 pages per node, but it forgets to update the
      per-memcg vmstats.  This can lead to inaccurate "slab_unreclaimable"
      statistics in memory.stat.  Fix it by using mod_lruvec_page_state()
      instead of mod_node_page_state().
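
      The fix is essentially a substitution in the large-kmalloc allocation
      path (sketch from kmalloc_large_node() in mm/slub.c; the free path
      subtracts the same amount symmetrically):

        page = alloc_pages_node(node, flags, order);
        if (page) {
                ptr = page_address(page);
                /*
                 * Charge the pages to the memcg's lruvec stats as well, not
                 * only to the node, so memory.stat's slab_unreclaimable
                 * stays accurate.
                 */
                mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
                                      PAGE_SIZE << order);
        }
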
      
      Link: https://lkml.kernel.org/r/20210223092423.42420-1-songmuchun@bytedance.com
      Fixes: 6a486c0a ("mm, sl[ou]b: improve memory accounting")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Michal Koutný <mkoutny@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, slab, slub: stop taking cpu hotplug lock · 59450bbc
      Authored by Vlastimil Babka
      SLAB has been using get/put_online_cpus() around creating, destroying
      and shrinking kmem caches since 95402b38 ("cpu-hotplug: replace
      per-subsystem mutexes with get_online_cpus()") in 2008, which was
      supposed to replace a private mutex (cache_chain_mutex, called
      slab_mutex today) with a system-wide mechanism, but in the case of SLAB
      it is in fact used in addition to the existing mutex, without
      explanation why.
      
      SLUB appears to have avoided the cpu hotplug lock initially, but gained it
      due to common code unification, such as 20cea968 ("mm, sl[aou]b: Move
      kmem_cache_create mutex handling to common code").
      
      Regardless of the history, checking if the hotplug lock is actually needed
      today suggests that it's not, and therefore it's better to avoid this
      system-wide lock and the ordering this imposes wrt other locks (such as
      slab_mutex).
      
      Specifically, in SLAB we have for_each_online_cpu() in
      do_tune_cpucache() protected by slab_mutex, and cpu hotplug callbacks
      that also take the slab_mutex, which is also taken by the common slab
      functions that currently also take the hotplug lock.  Thus the
      slab_mutex protection should be sufficient.  Also, per-cpu array caches
      are allocated for each possible cpu, so they are not affected by cpu
      online/offline state.
      
      In SLUB we have for_each_online_cpu() in functions that show statistics
      and are already unprotected today, as racing with hotplug is not harmful.
      Otherwise SLUB relies on percpu allocator.  The slub_cpu_dead() hotplug
      callback takes the slab_mutex.
      
      To sum up, this patch removes get/put_online_cpus() calls from slab as it
      should be safe without further adjustments.
      
      Link: https://lkml.kernel.org/r/20210113131634.3671-4-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Qian Cai <cai@redhat.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, slab, slub: stop taking memory hotplug lock · 7e1fa93d
      Authored by Vlastimil Babka
      Since commit 03afc0e2 ("slab: get_online_mems for
      kmem_cache_{create,destroy,shrink}") we are taking memory hotplug lock for
      SLAB and SLUB when creating, destroying or shrinking a cache.  It is quite
      a heavy lock and it's best to avoid it if possible, as we had several
      issues with lockdep complaining about ordering in the past, see e.g.
      e4f8e513 ("mm/slub: fix a deadlock in show_slab_objects()").
      
      The problem scenario in 03afc0e2 (solved by the memory hotplug lock)
      can be summarized as follows: while there's slab_mutex synchronizing new
      kmem cache creation and SLUB's MEM_GOING_ONLINE callback
      slab_mem_going_online_callback(), we may miss creation of kmem_cache_node
      for the hotplugged node in the new kmem cache, because the hotplug
      callback doesn't yet see the new cache, and cache creation in
      init_kmem_cache_nodes() only inits kmem_cache_node for nodes in the
      N_NORMAL_MEMORY nodemask, which however may not yet include the new node,
      as that happens only later after the MEM_GOING_ONLINE callback.
      
      Instead of using get/put_online_mems(), the problem can be solved by
      SLUB maintaining its own nodemask of nodes for which it has allocated
      the per-node kmem_cache_node structures.  This nodemask would generally
      mirror the N_NORMAL_MEMORY nodemask, but would be updated only under
      SLUB's control, in its memory hotplug callbacks under the slab_mutex.
      This patch adds such a nodemask and its handling.
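
      A sketch of the nodemask and its update in SLUB's MEM_GOING_ONLINE
      callback (simplified from the patch description; the per-cache
      kmem_cache_node allocation is elided):

        /* Nodes for which SLUB has allocated kmem_cache_node structures. */
        static nodemask_t slab_nodes;

        static int slab_mem_going_online_callback(void *arg)
        {
                struct memory_notify *marg = arg;
                int nid = marg->status_change_nid;
                struct kmem_cache *s;

                if (nid < 0)
                        return 0;

                mutex_lock(&slab_mutex);
                list_for_each_entry(s, &slab_caches, list) {
                        /* ... allocate and initialize s->node[nid] ... */
                }
                /*
                 * Caches created after this point also get a kmem_cache_node
                 * for nid, because init_kmem_cache_nodes() now iterates
                 * slab_nodes instead of N_NORMAL_MEMORY.
                 */
                node_set(nid, slab_nodes);
                mutex_unlock(&slab_mutex);
                return 0;
        }
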
      
      Commit 03afc0e2 mentions "issues like [the one above]", but there
      don't appear to be further issues.  All the paths (shared for SLAB and
      SLUB) taking the memory hotplug locks are also taking the slab_mutex,
      except kmem_cache_shrink() where 03afc0e2 replaced slab_mutex with
      get/put_online_mems().
      
      We however cannot simply restore slab_mutex in kmem_cache_shrink(), as
      SLUB can enter the function from a write to the sysfs 'shrink' file,
      thus holding the kernfs lock, and in kmem_cache_create() the kernfs
      lock is nested within slab_mutex.  But on closer inspection we don't
      actually need to protect kmem_cache_shrink() from hotplug callbacks:
      while SLUB's __kmem_cache_shrink() does for_each_kmem_cache_node(),
      missing a new node added in parallel hotplug is not fatal, and parallel
      hotremove does not free kmem_cache_node's anymore after the previous
      patch, so use-after-free cannot happen.  The per-node shrinking itself
      is protected by
      n->list_lock.  Same is true for SLAB, and SLOB is no-op.
      
      SLAB also doesn't need the memory hotplug locking, which it only gained by
      03afc0e2 through the shared paths in slab_common.c.  Its memory
      hotplug callbacks are also protected by slab_mutex against races with
      these paths.  The problem of SLUB relying on N_NORMAL_MEMORY doesn't apply
      to SLAB, as its setup_kmem_cache_nodes relies on N_ONLINE, and the new
      node is already set there during the MEM_GOING_ONLINE callback, so no
      special care is needed for SLAB.
      
      As such, this patch removes all get/put_online_mems() usage by the slab
      subsystem.
      
      Link: https://lkml.kernel.org/r/20210113131634.3671-3-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Qian Cai <cai@redhat.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>