  1. 30 June 2021 (2 commits)
  2. 17 June 2021 (1 commit)
    • mm/slub: fix redzoning for small allocations · 74c1d3e0
      Kees Cook authored
      The redzone area for SLUB exists between s->object_size and s->inuse
      (which is at least the word-aligned object_size).  If a cache were
      created with an object_size smaller than sizeof(void *), the in-object
      stored freelist pointer would overwrite the redzone (e.g.  with boot
      param "slub_debug=ZF"):
      
        BUG test (Tainted: G    B            ): Right Redzone overwritten
        -----------------------------------------------------------------------------
      
        INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb
        INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200
        INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620
      
        Redzone  (____ptrval____): bb bb bb bb bb bb bb bb    ........
        Object   (____ptrval____): f6 f4 a5 40 1d e8          ...@..
        Redzone  (____ptrval____): 1a aa                      ..
        Padding  (____ptrval____): 00 00 00 00 00 00 00 00    ........
      
      Store the freelist pointer out of line when object_size is smaller than
      sizeof(void *) and redzoning is enabled.
      
      Additionally remove the "smaller than sizeof(void *)" check under
      CONFIG_DEBUG_VM in kmem_cache_sanity_check() as it is now redundant:
      SLAB and SLOB both handle small sizes.
      
      (Note that no caches within this size range are known to exist in the
      kernel currently.)
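      For illustration only, a minimal sketch of the idea in SLUB's
      calculate_sizes() (hedged; field names follow mm/slub.c, but this is not
      the verbatim upstream diff):

        /* Move the freelist pointer out of line when it cannot safely live
         * inside the object, e.g. when a small object is redzoned. */
        if (((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || s->ctor) ||
            ((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *))) {
                /* The in-object slot would overlap the right redzone,
                 * so store the pointer after the object instead. */
                s->offset = size;
                size += sizeof(void *);
        }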
      
      Link: https://lkml.kernel.org/r/20210608183955.280836-3-keescook@chromium.org
      Fixes: 81819f0f ("SLUB core")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Lin, Zhenpeng" <zplin@psu.edu>
      Cc: Marco Elver <elver@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      74c1d3e0
  3. 15 May 2021 (1 commit)
    • mm, slub: move slub_debug static key enabling outside slab_mutex · afe0c26d
      Vlastimil Babka authored
      Paul E.  McKenney reported [1] that commit 1f0723a4 ("mm, slub: enable
      slub_debug static key when creating cache with explicit debug flags")
      results in the lockdep complaint:
      
       ======================================================
       WARNING: possible circular locking dependency detected
       5.12.0+ #15 Not tainted
       ------------------------------------------------------
       rcu_torture_sta/109 is trying to acquire lock:
       ffffffff96063cd0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0x9/0x20
      
       but task is already holding lock:
       ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (slab_mutex){+.+.}-{3:3}:
              lock_acquire+0xb9/0x3a0
              __mutex_lock+0x8d/0x920
              slub_cpu_dead+0x15/0xf0
              cpuhp_invoke_callback+0x17a/0x7c0
              cpuhp_invoke_callback_range+0x3b/0x80
              _cpu_down+0xdf/0x2a0
              cpu_down+0x2c/0x50
              device_offline+0x82/0xb0
              remove_cpu+0x1a/0x30
              torture_offline+0x80/0x140
              torture_onoff+0x147/0x260
              kthread+0x10a/0x140
              ret_from_fork+0x22/0x30
      
       -> #0 (cpu_hotplug_lock){++++}-{0:0}:
              check_prev_add+0x8f/0xbf0
              __lock_acquire+0x13f0/0x1d80
              lock_acquire+0xb9/0x3a0
              cpus_read_lock+0x21/0xa0
              static_key_enable+0x9/0x20
              __kmem_cache_create+0x38d/0x430
              kmem_cache_create_usercopy+0x146/0x250
              kmem_cache_create+0xd/0x10
              rcu_torture_stats+0x79/0x280
              kthread+0x10a/0x140
              ret_from_fork+0x22/0x30
      
       other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(slab_mutex);
                                      lock(cpu_hotplug_lock);
                                      lock(slab_mutex);
         lock(cpu_hotplug_lock);
      
        *** DEADLOCK ***
      
       1 lock held by rcu_torture_sta/109:
        #0: ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250
      
       stack backtrace:
       CPU: 3 PID: 109 Comm: rcu_torture_sta Not tainted 5.12.0+ #15
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
       Call Trace:
        dump_stack+0x6d/0x89
        check_noncircular+0xfe/0x110
        ? lock_is_held_type+0x98/0x110
        check_prev_add+0x8f/0xbf0
        __lock_acquire+0x13f0/0x1d80
        lock_acquire+0xb9/0x3a0
        ? static_key_enable+0x9/0x20
        ? mark_held_locks+0x49/0x70
        cpus_read_lock+0x21/0xa0
        ? static_key_enable+0x9/0x20
        static_key_enable+0x9/0x20
        __kmem_cache_create+0x38d/0x430
        kmem_cache_create_usercopy+0x146/0x250
        ? rcu_torture_stats_print+0xd0/0xd0
        kmem_cache_create+0xd/0x10
        rcu_torture_stats+0x79/0x280
        ? rcu_torture_stats_print+0xd0/0xd0
        kthread+0x10a/0x140
        ? kthread_park+0x80/0x80
        ret_from_fork+0x22/0x30
      
      This is because there's one order of locking from the hotplug callbacks:
      
      lock(cpu_hotplug_lock); // from hotplug machinery itself
      lock(slab_mutex); // in e.g. slab_mem_going_offline_callback()
      
      And commit 1f0723a4 made the reverse sequence possible:
      lock(slab_mutex); // in kmem_cache_create_usercopy()
      lock(cpu_hotplug_lock); // kmem_cache_open() -> static_key_enable()
      
      The simplest fix is to move static_key_enable() to a place before slab_mutex is
      taken. That means kmem_cache_create_usercopy() in mm/slab_common.c which is not
      ideal for SLUB-specific code, but the #ifdef CONFIG_SLUB_DEBUG makes it
      at least self-contained and obvious.
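      For illustration, a hedged sketch of the resulting ordering in
      kmem_cache_create_usercopy() (assuming the existing slub_debug_enabled
      static key; simplified, not the exact diff):

      #ifdef CONFIG_SLUB_DEBUG
        /* Enable the static key before slab_mutex is taken, so that
         * cpu_hotplug_lock (taken inside static_key_enable()) is never
         * nested inside slab_mutex. */
        if (flags & SLAB_DEBUG_FLAGS)
                static_branch_enable(&slub_debug_enabled);
      #endif

        mutex_lock(&slab_mutex);
        /* ... cache creation proceeds as before ... */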
      
      [1] https://lore.kernel.org/lkml/20210502171827.GA3670492@paulmck-ThinkPad-P17-Gen-1/
      
      Link: https://lkml.kernel.org/r/20210504120019.26791-1-vbabka@suse.cz
      Fixes: 1f0723a4 ("mm, slub: enable slub_debug static key when creating cache with explicit debug flags")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Paul E. McKenney <paulmck@kernel.org>
      Tested-by: Paul E. McKenney <paulmck@kernel.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      afe0c26d
  4. 01 May 2021 (1 commit)
  5. 09 March 2021 (2 commits)
  6. 27 February 2021 (4 commits)
    • kasan, mm: optimize krealloc poisoning · d12d9ad8
      Andrey Konovalov authored
      Currently, krealloc() always calls ksize(), which unpoisons the whole
      object including the redzone.  This is inefficient, as kasan_krealloc()
      repoisons the redzone for objects that fit into the same buffer.
      
      This patch changes krealloc() instrumentation to use uninstrumented
      __ksize() that doesn't unpoison the memory.  Instead, kasan_krealloc() is
      changed to unpoison the memory excluding the redzone.
      
      For objects that don't fit into the old allocation, this patch disables
      KASAN accessibility checks when copying memory into a new object instead
      of unpoisoning it.
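      Roughly, the reallocation path then looks like the hedged sketch below
      (simplified; the ZERO_OR_NULL_PTR handling and the accessibility check
      added by the next patch are omitted):

        /* Don't use instrumented ksize() to allow precise KASAN poisoning. */
        ks = __ksize(p);

        /* The object still fits: repoison it precisely, redzone included. */
        if (ks >= new_size)
                return kasan_krealloc((void *)p, new_size, flags);

        ret = kmalloc_track_caller(new_size, flags);
        if (ret && p) {
                /* The old object's redzone is read here, so suppress
                 * KASAN checks around the copy instead of unpoisoning. */
                kasan_disable_current();
                memcpy(ret, kasan_reset_tag(p), ks);
                kasan_enable_current();
        }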
      
      Link: https://lkml.kernel.org/r/9bef90327c9cb109d736c40115684fd32f49e6b0.1612546384.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d12d9ad8
    • kasan, mm: fail krealloc on freed objects · 26a5ca7a
      Andrey Konovalov authored
      Currently, if krealloc() is called on a freed object with KASAN enabled,
      it allocates and returns a new object, but doesn't copy any memory from
      the old one as ksize() returns 0.  This makes the caller believe that
      krealloc() succeeded (KASAN report is printed though).
      
      This patch adds an accessibility check into __do_krealloc().  If the check
      fails, krealloc() returns NULL.  This check duplicates the one in ksize();
      this is fixed in the following patch.
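      The guard itself is small; a hedged sketch of the added check in
      __do_krealloc() (simplified):

        if (likely(!ZERO_OR_NULL_PTR(p))) {
                /* If even one byte is inaccessible (e.g. the object was
                 * already freed), fail instead of returning a new object
                 * with nothing copied into it. */
                if (!kasan_check_byte(p))
                        return NULL;
                ks = __ksize(p);
        } else {
                ks = 0;
        }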
      
      This patch also adds a KASAN-KUnit test to check krealloc() behaviour when
      it's called on a freed object.
      
      Link: https://lkml.kernel.org/r/cbcf7b02be0a1ca11de4f833f2ff0b3f2c9b00c8.1612546384.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      26a5ca7a
    • kasan, mm: don't save alloc stacks twice · 92850134
      Andrey Konovalov authored
      Patch series "kasan: optimizations and fixes for HW_TAGS", v4.
      
      This patchset makes the HW_TAGS mode more efficient, mostly by reworking
      poisoning approaches and simplifying/inlining some internal helpers.
      
      With this change, the overhead of HW_TAGS annotations excluding setting
      and checking memory tags is ~3%.  The performance impact caused by tags
      will be unknown until we have hardware that supports MTE.
      
      As a side-effect, this patchset speeds up generic KASAN by ~15%.
      
      This patch (of 13):
      
      Currently KASAN saves allocation stacks in both kasan_slab_alloc() and
      kasan_kmalloc() annotations.  This patch changes KASAN to save allocation
      stacks for slab objects from kmalloc caches in kasan_kmalloc() only, and
      stacks for other slab objects in kasan_slab_alloc() only.
      
      This change requires ____kasan_kmalloc() knowing whether the object
      belongs to a kmalloc cache.  This is implemented by adding a flag field to
      the kasan_info structure.  That flag is only set for kmalloc caches via a
      new kasan_cache_create_kmalloc() annotation.
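      A hedged sketch of the new flag and annotation (names follow the
      description above; not the exact upstream diff):

        struct kasan_cache {
                int alloc_meta_offset;
                int free_meta_offset;
                bool is_kmalloc;        /* set only for kmalloc caches */
        };

        void __kasan_cache_create_kmalloc(struct kmem_cache *cache)
        {
                cache->kasan_info.is_kmalloc = true;
        }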
      
      Link: https://lkml.kernel.org/r/cover.1612546384.git.andreyknvl@google.com
      Link: https://lkml.kernel.org/r/7c673ebca8d00f40a7ad6f04ab9a2bddeeae2097.1612546384.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      92850134
    • mm, kfence: insert KFENCE hooks for SLAB · d3fb45f3
      Alexander Potapenko authored
      Inserts KFENCE hooks into the SLAB allocator.
      
      To pass the originally requested size to KFENCE, add an argument
      'orig_size' to slab_alloc*(). The additional argument is required to
      preserve the requested original size for kmalloc() allocations, which
      uses size classes (e.g. an allocation of 272 bytes will return an object
      of size 512). Therefore, kmem_cache::size does not represent the
      kmalloc-caller's requested size, and we must introduce the argument
      'orig_size' to propagate the originally requested size to KFENCE.
      
      Without the originally requested size, we would not be able to detect
      out-of-bounds accesses for objects placed at the end of a KFENCE object
      page if that object is not equal to the kmalloc-size class it was
      bucketed into.
      
      When KFENCE is disabled, there is no additional overhead, since
      slab_alloc*() functions are __always_inline.
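      Conceptually the hook sits at the top of the fastpath; a hedged sketch
      for SLAB (simplified, locking and pre/post hooks omitted):

        static __always_inline void *slab_alloc(struct kmem_cache *cachep,
                                                gfp_t flags, unsigned long caller,
                                                size_t orig_size)
        {
                void *objp;

                /* Let KFENCE occasionally service the allocation instead. */
                objp = kfence_alloc(cachep, orig_size, flags);
                if (unlikely(objp))
                        return objp;

                /* ... fall through to the regular SLAB fastpath ... */
                return __do_cache_alloc(cachep, flags);
        }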
      
      Link: https://lkml.kernel.org/r/20201103175841.3495947-5-elver@google.com
      Signed-off-by: Marco Elver <elver@google.com>
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Co-developed-by: Marco Elver <elver@google.com>
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joern Engel <joern@purestorage.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: SeongJae Park <sjpark@amazon.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d3fb45f3
  7. 25 February 2021 (5 commits)
    • kasan: fix bug detection via ksize for HW_TAGS mode · 611806b4
      Andrey Konovalov authored
      The currently existing kasan_check_read/write() annotations are intended
      to be used for kernel modules that have KASAN compiler instrumentation
      disabled. Thus, they are only relevant for the software KASAN modes that
      rely on compiler instrumentation.
      
      However there's another use case for these annotations: ksize() checks
      that the object passed to it is indeed accessible before unpoisoning the
      whole object. This is currently done via __kasan_check_read(), which is
      compiled away for the hardware tag-based mode that doesn't rely on
      compiler instrumentation. This leads to KASAN missing detecting some
      memory corruptions.
      
      Provide another annotation called kasan_check_byte() that is available
      for all KASAN modes.  As the implementation, rename and reuse
      kasan_check_invalid_free().  Use this new annotation in ksize().
      To avoid having ksize() as the top frame in the reported stack trace,
      pass _RET_IP_ to __kasan_check_byte().
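      With the new annotation, ksize() can be sketched roughly as follows
      (hedged, simplified):

        size_t ksize(const void *objp)
        {
                size_t size;

                /* Probe one byte first: if the object is inaccessible
                 * (e.g. already freed), report and return 0 rather than
                 * unpoisoning the whole area below. */
                if (unlikely(ZERO_OR_NULL_PTR(objp)) || !kasan_check_byte(objp))
                        return 0;

                size = __ksize(objp);
                /* Callers may use the whole allocated area, so unpoison it. */
                kasan_unpoison_range(objp, size);
                return size;
        }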
      
      Also add a new ksize_uaf() test that checks that a use-after-free is
      detected via ksize() itself, and via plain accesses that happen later.
      
      Link: https://linux-review.googlesource.com/id/Iaabf771881d0f9ce1b969f2a62938e99d3308ec5
      Link: https://lkml.kernel.org/r/f32ad74a60b28d8402482a38476f02bb7600f620.1610733117.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Reviewed-by: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      611806b4
    • mm: memcontrol: fix slub memory accounting · 96403bfe
      Muchun Song authored
      SLUB currently accounts kmalloc() and kmalloc_node() allocations larger
      than order-1 pages per node, but it forgets to update the per-memcg
      vmstats.  This can lead to inaccurate "slab_unreclaimable" statistics
      in memory.stat.  Fix it by using mod_lruvec_page_state() instead of
      mod_node_page_state().
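      The change is essentially a one-liner in the large-kmalloc path; a
      hedged sketch (simplified):

        /* In kmalloc_order(): charge the page to the memcg-aware stat. */
        page = alloc_pages(flags, order);
        if (likely(page)) {
                ret = page_address(page);
                mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
                                      PAGE_SIZE << order);
        }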
      
      Link: https://lkml.kernel.org/r/20210223092423.42420-1-songmuchun@bytedance.com
      Fixes: 6a486c0a ("mm, sl[ou]b: improve memory accounting")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Michal Koutný <mkoutny@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96403bfe
    • mm, slab, slub: stop taking cpu hotplug lock · 59450bbc
      Vlastimil Babka authored
      SLAB has been using get/put_online_cpus() around creating, destroying and
      shrinking kmem caches since 95402b38 ("cpu-hotplug: replace
      per-subsystem mutexes with get_online_cpus()") in 2008, which was supposed
      to replace a private mutex (cache_chain_mutex, called slab_mutex
      today) with a system-wide mechanism, but in the case of SLAB it is in fact
      used in addition to the existing mutex, without explanation why.
      
      SLUB appears to have avoided the cpu hotplug lock initially, but gained it
      due to common code unification, such as 20cea968 ("mm, sl[aou]b: Move
      kmem_cache_create mutex handling to common code").
      
      Regardless of the history, checking if the hotplug lock is actually needed
      today suggests that it's not, and therefore it's better to avoid this
      system-wide lock and the ordering this imposes wrt other locks (such as
      slab_mutex).
      
      Specifically, in SLAB we have for_each_online_cpu() in do_tune_cpucache()
      protected by slab_mutex, and cpu hotplug callbacks that also take the
      slab_mutex, which is also taken by the common slab functions that currently
      also take the hotplug lock.  Thus the slab_mutex protection should be
      sufficient.  Also per-cpu array caches are allocated for each possible
      cpu, so not affected by their online/offline state.
      
      In SLUB we have for_each_online_cpu() in functions that show statistics
      and are already unprotected today, as racing with hotplug is not harmful.
      Otherwise SLUB relies on percpu allocator.  The slub_cpu_dead() hotplug
      callback takes the slab_mutex.
      
      To sum up, this patch removes get/put_online_cpus() calls from slab as it
      should be safe without further adjustments.
      
      Link: https://lkml.kernel.org/r/20210113131634.3671-4-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Qian Cai <cai@redhat.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      59450bbc
    • mm, slab, slub: stop taking memory hotplug lock · 7e1fa93d
      Vlastimil Babka authored
      Since commit 03afc0e2 ("slab: get_online_mems for
      kmem_cache_{create,destroy,shrink}") we are taking memory hotplug lock for
      SLAB and SLUB when creating, destroying or shrinking a cache.  It is quite
      a heavy lock and it's best to avoid it if possible, as we had several
      issues with lockdep complaining about ordering in the past, see e.g.
      e4f8e513 ("mm/slub: fix a deadlock in show_slab_objects()").
      
      The problem scenario in 03afc0e2 (solved by the memory hotplug lock)
      can be summarized as follows: while there's slab_mutex synchronizing new
      kmem cache creation and SLUB's MEM_GOING_ONLINE callback
      slab_mem_going_online_callback(), we may miss creation of kmem_cache_node
      for the hotplugged node in the new kmem cache, because the hotplug
      callback doesn't yet see the new cache, and cache creation in
      init_kmem_cache_nodes() only inits kmem_cache_node for nodes in the
      N_NORMAL_MEMORY nodemask, which however may not yet include the new node,
      as that happens only later after the MEM_GOING_ONLINE callback.
      
      Instead of using get/put_online_mems(), the problem can be solved by SLUB
      maintaining its own nodemask of nodes for which it has allocated the
      per-node kmem_cache_node structures.  This nodemask would generally mirror
      the N_NORMAL_MEMORY nodemask, but would be updated only under SLUB's
      control in its memory hotplug callbacks under the slab_mutex.  This patch
      adds such nodemask and its handling.
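      A hedged sketch of the nodemask and its update from the hotplug callback
      (simplified; error handling omitted):

        /* Nodes for which SLUB has allocated a struct kmem_cache_node. */
        static nodemask_t slab_nodes;

        static int slab_mem_going_online_callback(void *arg)
        {
                struct memory_notify *marg = arg;
                int nid = marg->status_change_nid_normal;

                mutex_lock(&slab_mutex);
                /* ... allocate kmem_cache_node for each cache on nid ... */
                node_set(nid, slab_nodes);      /* only ever set under slab_mutex */
                mutex_unlock(&slab_mutex);
                return 0;
        }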
      
      Commit 03afc0e2 mentions "issues like [the one above]", but there
      don't appear to be further issues.  All the paths (shared for SLAB and
      SLUB) taking the memory hotplug locks are also taking the slab_mutex,
      except kmem_cache_shrink() where 03afc0e2 replaced slab_mutex with
      get/put_online_mems().
      
      We however cannot simply restore slab_mutex in kmem_cache_shrink(), as
      SLUB can enter the function from a write to the sysfs 'shrink' file, thus
      holding kernfs lock, and in kmem_cache_create() the kernfs lock is nested
      within slab_mutex.  But on closer inspection we don't actually need to
      protect kmem_cache_shrink() from hotplug callbacks: While SLUB's
      __kmem_cache_shrink() does for_each_kmem_cache_node(), missing a new node
      added in parallel hotplug is not fatal, and parallel hotremove does not
      free kmem_cache_node's anymore after the previous patch, so use-after-free
      cannot happen.  The per-node shrinking itself is protected by
      n->list_lock.  Same is true for SLAB, and SLOB is no-op.
      
      SLAB also doesn't need the memory hotplug locking, which it only gained by
      03afc0e2 through the shared paths in slab_common.c.  Its memory
      hotplug callbacks are also protected by slab_mutex against races with
      these paths.  The problem of SLUB relying on N_NORMAL_MEMORY doesn't apply
      to SLAB, as its setup_kmem_cache_nodes relies on N_ONLINE, and the new
      node is already set there during the MEM_GOING_ONLINE callback, so no
      special care is needed for SLAB.
      
      As such, this patch removes all get/put_online_mems() usage by the slab
      subsystem.
      
      Link: https://lkml.kernel.org/r/20210113131634.3671-3-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Qian Cai <cai@redhat.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7e1fa93d
    • mm/sl?b.c: remove ctor argument from kmem_cache_flags · 37540008
      Nikolay Borisov authored
      This argument hasn't been used since e153362a ("slub: Remove objsize
      check in kmem_cache_flags()") so simply remove it.
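      The resulting prototype (hedged sketch, with the unused parameter dropped):

        slab_flags_t kmem_cache_flags(unsigned int object_size,
                                      slab_flags_t flags, const char *name);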
      
      Link: https://lkml.kernel.org/r/20210126095733.974665-1-nborisov@suse.com
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Christoph Lameter <cl@linux.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      37540008
  8. 23 January 2021 (1 commit)
    • mm: Add mem_dump_obj() to print source of memory block · 8e7f37f2
      Paul E. McKenney authored
      There are kernel facilities such as per-CPU reference counts that give
      error messages in generic handlers or callbacks, whose messages are
      unenlightening.  In the case of per-CPU reference-count underflow, this
      is not a problem when creating a new use of this facility because in that
      case the bug is almost certainly in the code implementing that new use.
      However, trouble arises when deploying across many systems, which might
      exercise corner cases that were not seen during development and testing.
      Here, it would be really nice to get some kind of hint as to which of
      several uses the underflow was caused by.
      
      This commit therefore exposes a mem_dump_obj() function that takes
      a pointer to memory (which must still be allocated if it has been
      dynamically allocated) and prints available information on where that
      memory came from.  This pointer can reference the middle of the block as
      well as the beginning of the block, as needed by things like RCU callback
      functions and timer handlers that might not know where the beginning of
      the memory block is.  These functions and handlers can use mem_dump_obj()
      to print out better hints as to where the problem might lie.
      
      The information printed can depend on kernel configuration.  For example,
      the allocation return address can be printed only for slab and slub,
      and even then only when the necessary debug has been enabled.  For slab,
      build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
      to the next power of two or use the SLAB_STORE_USER when creating the
      kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
      boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
      if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
      to enable printing of the allocation-time stack trace.
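      A hedged usage sketch, with a hypothetical sanity check standing in for
      whatever condition the callback wants to diagnose:

        static void my_rcu_callback(struct rcu_head *rhp)
        {
                /* my_sanity_check_failed() is hypothetical. */
                if (WARN_ON_ONCE(my_sanity_check_failed(rhp))) {
                        /* Print slab/vmalloc provenance of the block. */
                        mem_dump_obj(rhp);
                        return;
                }
                /* ... normal processing ... */
        }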
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <linux-mm@kvack.org>
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
      [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
      [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
      [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
      [ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
      [ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      8e7f37f2
  9. 23 December 2020 (2 commits)
  10. 16 December 2020 (2 commits)
    • mm: slab: clarify krealloc()'s behavior with __GFP_ZERO · 15d5de49
      Bartosz Golaszewski authored
      Patch series "slab: provide and use krealloc_array()", v3.
      
      Andy brought to my attention the fact that users allocating an array of
      equally sized elements should check if the size multiplication doesn't
      overflow.  This is why we have helpers like kmalloc_array().
      
      However we don't have krealloc_array() equivalent and there are many users
      who do their own multiplication when calling krealloc() for arrays.
      
      This series provides krealloc_array() and uses it in a couple places.
      
      A separate series will follow adding devm_krealloc_array() which is needed
      in the xilinx adc driver.
      
      This patch (of 9):
      
      __GFP_ZERO is ignored by krealloc() (unless we fall back to the kmalloc()
      path, in which case it's honored).  Point that out in the kerneldoc.
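      In other words (a hedged example, not taken from the patch):

        u8 *buf = kmalloc(8, GFP_KERNEL);

        /* If krealloc() can grow the allocation in place (the existing slab
         * object is already big enough), __GFP_ZERO is ignored and bytes
         * 8..15 are NOT guaranteed to be zero.  Only when it falls back to
         * a fresh kmalloc() is the flag honored. */
        buf = krealloc(buf, 16, GFP_KERNEL | __GFP_ZERO);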
      
      Link: https://lkml.kernel.org/r/20201109110654.12547-1-brgl@bgdev.pl
      Link: https://lkml.kernel.org/r/20201109110654.12547-2-brgl@bgdev.pl
      Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Maxime Ripard <mripard@kernel.org>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: "Michael S . Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jaroslav Kysela <perex@perex.cz>
      Cc: Takashi Iwai <tiwai@suse.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      15d5de49
    • mm/slab_common.c: use list_for_each_entry in dump_unreclaimable_slab() · 7714304f
      Hui Su authored
      dump_unreclaimable_slab() acquires the slab_mutex first, and it won't
      remove any slab_caches list entry while iterating over the slab_caches
      list.
      
      Thus we do not need list_for_each_entry_safe here, which only exists to
      protect against removal of list entries during iteration.
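      A hedged sketch of the resulting loop (simplified):

        mutex_lock(&slab_mutex);
        /* Nothing is removed from slab_caches while we iterate, so the
         * _safe variant (which caches the next pointer) is unnecessary. */
        list_for_each_entry(s, &slab_caches, list) {
                if (s->flags & SLAB_RECLAIM_ACCOUNT)
                        continue;
                /* ... accumulate and print per-cache usage ... */
        }
        mutex_unlock(&slab_mutex);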
      
      Link: https://lkml.kernel.org/r/20200926043440.GA180545@rlk
      Signed-off-by: Hui Su <sh_def@163.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7714304f
  11. 13 August 2020 (1 commit)
  12. 08 August 2020 (11 commits)
  13. 25 July 2020 (1 commit)
    • mm: memcg/slab: fix memory leak at non-root kmem_cache destroy · d38a2b7a
      Muchun Song authored
      If the kmem_cache refcount is greater than one, we should not mark the
      root kmem_cache as dying.  If we mark the root kmem_cache dying
      incorrectly, the non-root kmem_cache can never be destroyed.  This
      resulted in a memory leak when the memcg was destroyed.  We can use the
      following steps to reproduce.
      
        1) Use kmem_cache_create() to create a new kmem_cache named A.
        2) Coincidentally, the kmem_cache A is an alias for kmem_cache B,
           so the refcount of B is just increased.
        3) Use kmem_cache_destroy() to destroy the kmem_cache A, just
           decrease the B's refcount but mark the B as dying.
        4) Create a new memory cgroup and alloc memory from the kmem_cache
           B. It leads to create a non-root kmem_cache for allocating memory.
        5) When destroy the memory cgroup created in the step 4), the
           non-root kmem_cache can never be destroyed.
      
      If we repeat steps 4) and 5), this will leak a lot of memory.  So mark
      the root kmem_cache as dying only when its refcount reaches zero.
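      A hedged sketch of the intended ordering in kmem_cache_destroy()
      (simplified; not the verbatim diff):

        mutex_lock(&slab_mutex);

        s->refcount--;
        if (s->refcount) {
                /* Still aliased by another cache: do NOT mark it dying. */
                mutex_unlock(&slab_mutex);
                return;
        }

        /* Refcount reached zero: only now mark the root cache dying and
         * tear down its per-memcg children. */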
      
      Fixes: 92ee383f ("mm: fix race between kmem_cache destroy, create and deactivate")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200716165103.83462-1-songmuchun@bytedance.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d38a2b7a
  14. 26 June 2020 (1 commit)
    • mm/slab: use memzero_explicit() in kzfree() · 8982ae52
      Waiman Long authored
      The kzfree() function is normally used to clear some sensitive
      information, like encryption keys, in the buffer before freeing it back to
      the pool.  memset() is currently used for buffer clearing.  However
      unlikely, there is still a non-zero probability that the compiler may
      choose to optimize away the memory clearing, especially if LTO is being
      used in the future.
      
      To make sure that this optimization will never happen,
      memzero_explicit(), which was introduced in v3.18, is now used in
      kzfree() to future-proof it.
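      The resulting helper is tiny; a sketch close to the upstream code:

        void kzfree(const void *p)
        {
                size_t ks;

                if (unlikely(ZERO_OR_NULL_PTR(p)))
                        return;
                ks = ksize(p);
                /* memzero_explicit() cannot be optimized away by the compiler. */
                memzero_explicit((void *)p, ks);
                kfree(p);
        }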
      
      Link: http://lkml.kernel.org/r/20200616154311.12314-2-longman@redhat.com
      Fixes: 3ef0e5ba ("slab: introduce kzfree()")
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8982ae52
  15. 03 June 2020 (1 commit)
  16. 11 April 2020 (1 commit)
  17. 08 April 2020 (1 commit)
    • proc: faster open/read/close with "permanent" files · d919b33d
      Alexey Dobriyan authored
      Now that "struct proc_ops" exists we can start putting there stuff which
      could not fly with VFS "struct file_operations"...
      
      Most of the fs/proc/inode.c file is dedicated to making open/read/.../close
      reliable in the event of disappearing /proc entries, which usually happens
      when a module is being removed.  Files like /proc/cpuinfo which never
      disappear simply do not need such protection.
      
      Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such
      "permanent" files.
      
      Enable "permanent" flag for
      
      	/proc/cpuinfo
      	/proc/kmsg
      	/proc/modules
      	/proc/slabinfo
      	/proc/stat
      	/proc/sysvipc/*
      	/proc/swaps
      
      More will come once I figure out a foolproof way to prevent our module
      authors from marking their stuff "permanent" for performance reasons
      when it is not.
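      A hedged sketch of how an entry opts in via the new proc_ops flag
      (simplified; /proc/cpuinfo as the example):

        static const struct proc_ops cpuinfo_proc_ops = {
                .proc_flags     = PROC_ENTRY_PERMANENT, /* never removed */
                .proc_open      = cpuinfo_open,
                .proc_read      = seq_read,
                .proc_lseek     = seq_lseek,
                .proc_release   = seq_release,
        };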
      
      This should help with scalability: the benchmark is "read /proc/cpuinfo R
      times by N threads scattered over the system".
      
      	N	R	t, s (before)	t, s (after)
      	-----------------------------------------------------
      	64	4096	1.582458	1.530502	-3.2%
      	256	4096	6.371926	6.125168	-3.9%
      	1024	4096	25.64888	24.47528	-4.6%
      
      Benchmark source:
      
      #include <chrono>
      #include <iostream>
      #include <thread>
      #include <vector>
      
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <unistd.h>
      
      const int NR_CPUS = sysconf(_SC_NPROCESSORS_ONLN);
      int N;
      const char *filename;
      int R;
      
      int xxx = 0;
      
      int glue(int n)
      {
      	cpu_set_t m;
      	CPU_ZERO(&m);
      	CPU_SET(n, &m);
      	return sched_setaffinity(0, sizeof(cpu_set_t), &m);
      }
      
      void f(int n)
      {
      	glue(n % NR_CPUS);
      
      	while (*(volatile int *)&xxx == 0) {
      	}
      
      	for (int i = 0; i < R; i++) {
      		int fd = open(filename, O_RDONLY);
      		char buf[4096];
      		ssize_t rv = read(fd, buf, sizeof(buf));
      		asm volatile ("" :: "g" (rv));
      		close(fd);
      	}
      }
      
      int main(int argc, char *argv[])
      {
      	if (argc < 4) {
      		std::cerr << "usage: " << argv[0] << ' ' << "N /proc/filename R\n";
      		return 1;
      	}
      
      	N = atoi(argv[1]);
      	filename = argv[2];
      	R = atoi(argv[3]);
      
      	for (int i = 0; i < NR_CPUS; i++) {
      		if (glue(i) == 0)
      			break;
      	}
      
      	std::vector<std::thread> T;
      	T.reserve(N);
      	for (int i = 0; i < N; i++) {
      		T.emplace_back(f, i);
      	}
      
      	auto t0 = std::chrono::system_clock::now();
      	{
      		*(volatile int *)&xxx = 1;
      		for (auto& t: T) {
      			t.join();
      		}
      	}
      	auto t1 = std::chrono::system_clock::now();
      	std::chrono::duration<double> dt = t1 - t0;
      	std::cout << dt.count() << '\n';
      
      	return 0;
      }
      
      P.S.:
      Explicit randomization marker is added because adding a non-function pointer
      will silently disable structure layout randomization.
      
      [akpm@linux-foundation.org: coding style fixes]
      Reported-by: kbuild test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Joe Perches <joe@perches.com>
      Link: http://lkml.kernel.org/r/20200222201539.GA22576@avx2
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d919b33d
  18. 03 April 2020 (1 commit)
    • mm, memcg: fix build error around the usage of kmem_caches · a87425a3
      Yafang Shao authored
      When I manually set default n to MEMCG_KMEM in init/Kconfig, the following
      errors occur:
      
        mm/slab_common.c: In function 'memcg_slab_start':
        mm/slab_common.c:1530:30: error: 'struct mem_cgroup' has no member named
        'kmem_caches'
          return seq_list_start(&memcg->kmem_caches, *pos);
                                      ^
        mm/slab_common.c: In function 'memcg_slab_next':
        mm/slab_common.c:1537:32: error: 'struct mem_cgroup' has no member named
        'kmem_caches'
          return seq_list_next(p, &memcg->kmem_caches, pos);
                                        ^
        mm/slab_common.c: In function 'memcg_slab_show':
        mm/slab_common.c:1551:16: error: 'struct mem_cgroup' has no member named
        'kmem_caches'
          if (p == memcg->kmem_caches.next)
                        ^
          CC      arch/x86/xen/smp.o
        mm/slab_common.c: In function 'memcg_slab_start':
        mm/slab_common.c:1531:1: warning: control reaches end of non-void function
        [-Wreturn-type]
         }
         ^
        mm/slab_common.c: In function 'memcg_slab_next':
        mm/slab_common.c:1538:1: warning: control reaches end of non-void function
        [-Wreturn-type]
         }
         ^
      
      That's because kmem_caches is defined only when CONFIG_MEMCG_KMEM is set,
      while memcg_slab_start() will use it no matter whether CONFIG_MEMCG_KMEM is
      defined or not.
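      The fix is to build these helpers only when the field exists; a hedged
      sketch of the guard (simplified):

        #ifdef CONFIG_MEMCG_KMEM
        void *memcg_slab_start(struct seq_file *m, loff_t *pos)
        {
                struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));

                mutex_lock(&slab_mutex);
                return seq_list_start(&memcg->kmem_caches, *pos);
        }
        /* memcg_slab_next() and memcg_slab_show() are guarded likewise. */
        #endif /* CONFIG_MEMCG_KMEM */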
      
      By the way, the reason I manually undefined CONFIG_MEMCG_KMEM is to verify
      whether some of my other code changes are still stable when CONFIG_MEMCG_KMEM
      is not set.  Unfortunately, the existing code has already been unstable since
      v4.11.
      
      Fixes: bc2791f8 ("slab: link memcg kmem_caches on their associated memory cgroup")
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Link: http://lkml.kernel.org/r/1580970260-2045-1-git-send-email-laoar.shao@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a87425a3
  19. 04 February 2020 (1 commit)