1. 20 July 2022, 1 commit
  2. 04 July 2022, 1 commit
    • mm: slab: optimize memcg_slab_free_hook() · b77d5b1b
      Committed by Muchun Song
      Most callers of memcg_slab_free_hook() already know the slab, which could
      be passed to memcg_slab_free_hook() directly to avoid another call to
      virt_to_slab().  For bulk freeing of objects, the call to slab_objcgs()
      inside the loop in memcg_slab_free_hook() is redundant as well.  Rework
      memcg_slab_free_hook() and build_detached_freelist() to remove that
      unnecessary overhead and to let memcg_slab_free_hook() handle bulk
      freeing in slab_free().
      
      Move the call site of memcg_slab_free_hook() from do_slab_free() to
      slab_free() for SLUB to make the code clearer, since the current logic is
      awkward (the caller has to judge whether it needs to call
      memcg_slab_free_hook()).  That makes it easy to miss the call altogether,
      as in these fixes:
      
        commit d1b2cf6c ("mm: memcg/slab: uncharge during kmem_cache_free_bulk()")
        commit ae085d7f ("mm: kfence: fix missing objcg housekeeping for SLAB")
      
      This optimization is mainly for bulk object freeing.  The following
      numbers are for freeing 16 objects:
      
                                 before      after
        kmem_cache_free_bulk:   ~430 ns     ~400 ns
      
      The overhead is reduced by about 7% for 16-object freeing.
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Link: https://lore.kernel.org/r/20220429123044.37885-1-songmuchun@bytedance.com
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      b77d5b1b
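      A condensed C sketch of the reworked path (reconstructed from memory rather
      than copied from mm/slub.c, with the per-node vmstat update left out for
      brevity): the caller passes the slab it already knows, so virt_to_slab()
      disappears, and slab_objcgs() is looked up once per batch instead of once
      per object.

        static inline void memcg_slab_free_hook(struct kmem_cache *s,
                                                struct slab *slab,
                                                void **p, int objects)
        {
                struct obj_cgroup **objcgs = slab_objcgs(slab); /* hoisted out of the loop */
                int i;

                if (!objcgs)
                        return;

                for (i = 0; i < objects; i++) {
                        unsigned int off = obj_to_index(s, slab, p[i]);
                        struct obj_cgroup *objcg = objcgs[off];

                        if (!objcg)
                                continue;
                        objcgs[off] = NULL;
                        obj_cgroup_uncharge(objcg, obj_full_size(s));
                        obj_cgroup_put(objcg);
                }
        }

        /* slab_free() now calls the hook once for the whole batch: */
        static __always_inline void slab_free(struct kmem_cache *s, struct slab *slab,
                                              void *head, void *tail, void **p,
                                              int cnt, unsigned long addr)
        {
                memcg_slab_free_hook(s, slab, p, cnt);
                if (slab_free_freelist_hook(s, &head, &tail, &cnt))
                        do_slab_free(s, slab, head, tail, cnt, addr);
        }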
  3. 16 April 2022, 1 commit
  4. 06 April 2022, 1 commit
  5. 23 March 2022, 1 commit
    • mm: introduce kmem_cache_alloc_lru · 88f2ef73
      Committed by Muchun Song
      We currently allocate scope for every memcg to be tracked on every
      superblock instantiated in the system, regardless of whether that
      superblock is even accessible to that memcg.
      
      These huge memcg counts come from container hosts where memcgs are
      confined to just a small subset of the total number of superblocks
      instantiated at any given point in time.
      
      For these systems with huge container counts, list_lru does not need the
      capability of tracking every memcg on every superblock.  What it comes
      down to is adding the memcg to the list_lru only at the first insert.
      So introduce kmem_cache_alloc_lru() to allocate an object and set up its
      list_lru at the same time.  A later patch converts all inode and dentry
      allocations from kmem_cache_alloc() to kmem_cache_alloc_lru().
      
      Link: https://lkml.kernel.org/r/20220228122126.37293-3-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Cc: Chao Yu <chao@kernel.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Fam Zheng <fam.zheng@bytedance.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kari Argillander <kari.argillander@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      88f2ef73
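      A hedged usage sketch of the new API; the cache pointer and helper below are
      illustrative stand-ins rather than in-tree symbols, while the actual
      inode/dentry conversions land in later patches of the series:

        #include <linux/fs.h>
        #include <linux/list_lru.h>
        #include <linux/slab.h>

        static struct kmem_cache *my_inode_cachep;      /* hypothetical cache */

        static struct inode *my_alloc_inode(struct super_block *sb)
        {
                /* Before: kmem_cache_alloc(my_inode_cachep, GFP_KERNEL); */
                return kmem_cache_alloc_lru(my_inode_cachep, &sb->s_inode_lru,
                                            GFP_KERNEL);
        }

      Passing the superblock's list_lru to the allocator is what allows a memcg's
      per-superblock list_lru state to be set up lazily, at the first allocation,
      rather than for every memcg on every superblock.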
  6. 15 January 2022, 1 commit
  7. 07 January 2022, 1 commit
  8. 06 January 2022, 11 commits
  9. 21 November 2021, 1 commit
  10. 31 July 2021, 1 commit
  11. 16 July 2021, 1 commit
  12. 30 June 2021, 4 commits
    • mm: memcg/slab: properly set up gfp flags for objcg pointer array · 41eb5df1
      Committed by Waiman Long
      Patch series "mm: memcg/slab: Fix objcg pointer array handling problem", v4.
      
      Since the merging of the new slab memory controller in v5.9, the page
      structure stores a pointer to the objcg pointer array for slab pages.
      When a slab has no used objects, it can be freed in free_slab(), which
      will call kfree() to free the objcg pointer array allocated in
      memcg_alloc_page_obj_cgroups().  If it happens that the objcg pointer
      array is the last used object in its slab, that slab may then be freed,
      which may cause kfree() to be called again.
      
      With the right workload, the slab cache may be set up in a way that allows
      the recursive kfree() calling loop to nest deep enough to cause a kernel
      stack overflow and panic the system.  In fact, we have a reproducer that
      can cause kernel stack overflow on an s390 system involving the
      kmalloc-rcl-256 and kmalloc-rcl-128 slabs, with the following kfree()
      loop recursively called 74 times:

        [ 285.520739] [<000000000ec432fc>] kfree+0x4bc/0x560
        [ 285.520740] [<000000000ec43466>] __free_slab+0xc6/0x228
        [ 285.520741] [<000000000ec41fc2>] __slab_free+0x3c2/0x3e0
        [ 285.520742] [<000000000ec432fc>] kfree+0x4bc/0x560
        :

      While investigating this issue, I also found an issue on the allocation
      side.  If the objcg pointer array happens to come from the same slab, or
      a circular dependency is formed across multiple slabs, the affected slabs
      can never be freed again.
      
      This patch series addresses these two issues by introducing a new set of
      kmalloc-cg-<n> caches split off from the kmalloc-<n> caches.  The new set
      will contain only non-reclaimable, non-DMA objects that are accounted to
      memory cgroups, whereas the old set is now for unaccounted objects only.
      With this split, all objcg pointer arrays come from the kmalloc-<n>
      caches, but those caches never hold any objcg pointer array.  As a
      result, both the deeply nested kfree() calls and the unfreeable-slab
      problem are gone.
      
      This patch (of 4):
      
      Since the merging of the new slab memory controller in v5.9, the page
      structure may store a pointer to an obj_cgroup pointer array for slab
      pages.  Currently, only the __GFP_ACCOUNT bit is masked off.  However,
      the array is not readily reclaimable and does not need to be allocated
      from DMA memory, so those GFP bits should be masked off as well.

      Do the flag-bit clearing in memcg_alloc_page_obj_cgroups() to make sure
      it is applied consistently no matter where that function is called.
      
      Link: https://lkml.kernel.org/r/20210505200610.13943-1-longman@redhat.com
      Link: https://lkml.kernel.org/r/20210505200610.13943-2-longman@redhat.com
      Fixes: 286e04b8 ("mm: memcg/slab: allocate obj_cgroups for non-root slab pages")
      Signed-off-by: Waiman Long <longman@redhat.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      41eb5df1
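      A short sketch of the fix described above; the wrapper function is
      illustrative, and the mask reflects my reading of the change (the
      accounting, reclaimable and DMA bits are cleared before allocating the
      array):

        #define OBJCGS_CLEAR_MASK   (__GFP_DMA | __GFP_RECLAIMABLE | __GFP_ACCOUNT)

        static struct obj_cgroup **alloc_objcg_vec(unsigned int objects,
                                                   gfp_t gfp, int node)
        {
                /* the objcg array itself is neither DMA, reclaimable, nor accounted */
                gfp &= ~OBJCGS_CLEAR_MASK;
                return kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp, node);
        }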
    • mm/memcg: move mod_objcg_state() to memcontrol.c · fdbcb2a6
      Committed by Waiman Long
      Patch series "mm/memcg: Reduce kmemcache memory accounting overhead", v6.
      
      With the recent introduction of the new slab memory controller, we
      eliminate the need for separate kmemcaches for each memory cgroup and
      reduce overall kernel memory usage.  However, we also add memory
      accounting overhead to each call of kmem_cache_alloc() and
      kmem_cache_free().

      Workloads that perform a lot of kmemcache allocations and de-allocations
      may therefore experience a performance regression, as illustrated in [1]
      and [2].
      
      A simple kernel module that performs a repeated loop of 100,000,000
      kmem_cache_alloc() and kmem_cache_free() calls of either a small 32-byte
      object or a big 4k object at module init time, with a batch size of 4 (4
      kmalloc's followed by 4 kfree's), is used for benchmarking.  The
      benchmarking tool was run on a kernel based on linux-next-20210419.  The
      test was run on a CascadeLake server with turbo boosting disabled to
      reduce run-to-run variation.
      
      The small object test exercises mainly the object stock charging and
      vmstat update code paths.  The large object test also exercises the
      refill_obj_stock() and __memcg_kmem_charge()/__memcg_kmem_uncharge() code
      paths.
      
      With memory accounting disabled, the run time was 3.130s for both the
      small-object and big-object tests.
      
      With memory accounting enabled, both cgroup v1 and v2 showed similar
      results in the small object test.  The performance results of the large
      object test, however, differed between cgroup v1 and v2.
      
      The execution times with the application of various patches in the
      patchset were:
      
        Applied patches   Run time   Accounting overhead   %age 1   %age 2
        ---------------   --------   -------------------   ------   ------
      
        Small 32-byte object:
             None          11.634s         8.504s          100.0%   271.7%
              1-2           9.425s         6.295s           74.0%   201.1%
              1-3           9.708s         6.578s           77.4%   210.2%
              1-4           8.062s         4.932s           58.0%   157.6%
      
        Large 4k object (v2):
             None          22.107s        18.977s          100.0%   606.3%
              1-2          20.960s        17.830s           94.0%   569.6%
              1-3          14.238s        11.108s           58.5%   354.9%
              1-4          11.329s         8.199s           43.2%   261.9%
      
        Large 4k object (v1):
             None          36.807s        33.677s          100.0%  1075.9%
              1-2          36.648s        33.518s           99.5%  1070.9%
              1-3          22.345s        19.215s           57.1%   613.9%
              1-4          18.662s        15.532s           46.1%   496.2%
      
        N.B. %age 1 = overhead/unpatched overhead
             %age 2 = overhead/accounting disabled time
      
      Patch 2 (vmstat data stock caching) helps in both the small object test
      and the large v2 object test.  It doesn't help much in the v1 big object
      test.

      Patch 3 (refill_obj_stock improvement) does help the small object test
      but offers a significant performance improvement for the large object
      test (both v1 and v2).

      Patch 4 (eliminating irq disable/enable) helps in all test cases.
      
      To test the extreme case, a multi-threaded kmalloc/kfree microbenchmark
      was run on a 2-socket 48-core 96-thread system with 96 testing threads
      in the same memcg doing kmalloc+kfree of a 4k object with accounting
      enabled for 10s.  The combined kmalloc+kfree rates, in kilo operations
      per second (kops/s), were as follows:
      
        Applied patches   v1 kops/s   v1 change   v2 kops/s   v2 change
        ---------------   ---------   ---------   ---------   ---------
             None           3,520        1.00X      6,242        1.00X
              1-2           4,304        1.22X      8,478        1.36X
              1-3           4,731        1.34X    418,142       66.99X
              1-4           4,587        1.30X    438,838       70.30X
      
      With memory accounting disabled, the kmalloc/kfree rate was 1,481,291
      kops/s.  This test shows how significant the memory accounting overhead
      can be in some extreme situations.
      
      For this multithreaded test, the improvement from patch 2 mainly
      comes from the conditional atomic xchg of objcg->nr_charged_bytes in
      mod_objcg_state(). By using an unconditional xchg, the operation rates
      were similar to the unpatched kernel.
      
      Patch 3 eliminates the single highly contended cacheline of
      objcg->nr_charged_bytes for cgroup v2, leading to a huge performance
      improvement.  Cgroup v1, however, still has another highly contended
      cacheline in the shared page counter &memcg->kmem, so the improvement
      there is only modest.
      
      Patch 4 helps in cgroup v2, but performs worse in cgroup v1 as
      eliminating the irq_disable/irq_enable overhead seems to aggravate the
      cacheline contention.
      
      [1] https://lore.kernel.org/linux-mm/20210408193948.vfktg3azh2wrt56t@gabell/T/#u
      [2] https://lore.kernel.org/lkml/20210114025151.GA22932@xsang-OptiPlex-9020/
      
      This patch (of 4):
      
      mod_objcg_state() is moved from mm/slab.h to mm/memcontrol.c so that
      further optimization can be done to it in later patches without exposing
      unnecessary details to other mm components.
      
      Link: https://lkml.kernel.org/r/20210506150007.16288-1-longman@redhat.com
      Link: https://lkml.kernel.org/r/20210506150007.16288-2-longman@redhat.com
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Chris Down <chris@chrisdown.name>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fdbcb2a6
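      For reference, a hedged rendering of the moved helper, roughly what
      mod_objcg_state() looks like in mm/memcontrol.c right after the move (the
      later patches in the series layer per-CPU stock caching on top of it):

        void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
                             enum node_stat_item idx, int nr)
        {
                struct mem_cgroup *memcg;
                struct lruvec *lruvec;

                rcu_read_lock();
                memcg = obj_cgroup_memcg(objcg);
                lruvec = mem_cgroup_lruvec(memcg, pgdat);
                mod_memcg_lruvec_state(lruvec, idx, nr);
                rcu_read_unlock();
        }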
    • mm: slub: move sysfs slab alloc/free interfaces to debugfs · 64dd6849
      Committed by Faiyaz Mohammed
      The alloc_calls and free_calls implementations in sysfs have two issues:
      one is the PAGE_SIZE limitation of sysfs, and the other is that they do
      not adhere to the "one value per file" rule.

      To overcome these issues, move the alloc_calls and free_calls
      implementation to debugfs.

      The debugfs cache directory is created if the SLAB_STORE_USER flag is set.

      Rename alloc_calls/free_calls to alloc_traces/free_traces, to be in line
      with what they actually show.
      
      [faiyazm@codeaurora.org: fix the leak of alloc/free traces debugfs interface]
        Link: https://lkml.kernel.org/r/1624248060-30286-1-git-send-email-faiyazm@codeaurora.org
      
      Link: https://lkml.kernel.org/r/1623438200-19361-1-git-send-email-faiyazm@codeaurora.org
      Signed-off-by: Faiyaz Mohammed <faiyazm@codeaurora.org>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      64dd6849
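      A hedged sketch of the debugfs pattern the interfaces move to; the _show body
      and the helper are illustrative (the real mm/slub.c code uses the seq_file
      iterator interface, which is what removes the single-PAGE_SIZE limit of a
      sysfs attribute):

        static int alloc_traces_show(struct seq_file *m, void *unused)
        {
                struct kmem_cache *s = m->private;

                seq_printf(m, "allocation traces for %s would be emitted here\n",
                           s->name);
                return 0;
        }
        DEFINE_SHOW_ATTRIBUTE(alloc_traces);

        static void slab_debugfs_add(struct kmem_cache *s, struct dentry *cache_dir)
        {
                /* per the commit above, only caches with SLAB_STORE_USER get the file */
                if (s->flags & SLAB_STORE_USER)
                        debugfs_create_file("alloc_traces", 0400, cache_dir, s,
                                            &alloc_traces_fops);
        }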
    • mm/slub, kunit: add a KUnit test for SLUB debugging functionality · 1f9f78b1
      Committed by Oliver Glitta
      SLUB has a resiliency_test() function, but it is hidden behind an #ifdef
      SLUB_RESILIENCY_TEST that is not part of Kconfig, so nobody runs it.
      KUnit is a proper replacement for it.
      
      The tests change a byte in the redzone after allocation, and corrupt the
      pointer to the next free node, the first byte, the 50th byte and a
      redzone byte, then check whether validation finds the errors.
      
      There are several differences from the original resiliency test: the
      tests create their own caches with a known state instead of corrupting
      the shared kmalloc caches.

      The freepointer corruption uses the correct offset; the original
      resiliency test was broken by the freepointer changes.

      The test that changed a random byte is dropped, because it is not
      meaningful in a form where deterministic results are needed.
      
      Add a new CONFIG_SLUB_KUNIT_TEST option in Kconfig.  The next_pointer,
      first_word and clobber_50th_byte tests do not run with KASAN enabled,
      because they deliberately modify non-allocated objects.

      Use kunit_resource to count errors in the cache and to silence bug
      reports.  Count an error whenever slab_bug() or slab_fix() is called, or
      when the count of pages is wrong.
      
      [glittao@gmail.com: remove unused function test_exit(), from SLUB KUnit test]
        Link: https://lkml.kernel.org/r/20210512140656.12083-1-glittao@gmail.com
      [akpm@linux-foundation.org: export kasan_enable/disable_current to modules]
      
      Link: https://lkml.kernel.org/r/20210511150734.3492-2-glittao@gmail.com
      Signed-off-by: Oliver Glitta <glittao@gmail.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Daniel Latypov <dlatypov@google.com>
      Acked-by: Marco Elver <elver@google.com>
      Cc: Brendan Higgins <brendanhiggins@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f9f78b1
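      A minimal KUnit sketch in the spirit of the test described above; the cache
      name, object size and redzone offset are illustrative, and the real
      lib/slub_kunit.c counts slab_bug()/slab_fix() calls through a kunit_resource
      rather than relying on the assertions shown here:

        #include <kunit/test.h>
        #include <linux/kasan.h>
        #include <linux/slab.h>

        static void test_clobber_redzone(struct kunit *test)
        {
                struct kmem_cache *s = kmem_cache_create("TestSlub_RZ", 64, 0,
                                                         SLAB_RED_ZONE, NULL);
                u8 *p;

                KUNIT_ASSERT_NOT_ERR_OR_NULL(test, s);
                p = kmem_cache_alloc(s, GFP_KERNEL);
                KUNIT_ASSERT_NOT_ERR_OR_NULL(test, p);

                kasan_disable_current();        /* keep KASAN from reporting first */
                p[64] = 0x12;                   /* corrupt the right redzone */
                kasan_enable_current();

                /* SLUB validation / the free path is expected to flag the corruption */
                kmem_cache_free(s, p);
                kmem_cache_destroy(s);
        }

        static struct kunit_case slub_kunit_cases[] = {
                KUNIT_CASE(test_clobber_redzone),
                {}
        };

        static struct kunit_suite slub_kunit_suite = {
                .name = "slub_debug_sketch",
                .test_cases = slub_kunit_cases,
        };
        kunit_test_suite(slub_kunit_suite);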
  13. 11 May 2021, 1 commit
    • mm/slub: Add Support for free path information of an object · e548eaa1
      Committed by Maninder Singh
      This commit enables a stack dump for the last free of an object:
      
      slab kmalloc-64 start c8ab0140 data offset 64 pointer offset 0 size 64 allocated at meminfo_proc_show+0x40/0x4fc
      [   20.192078]     meminfo_proc_show+0x40/0x4fc
      [   20.192263]     seq_read_iter+0x18c/0x4c4
      [   20.192430]     proc_reg_read_iter+0x84/0xac
      [   20.192617]     generic_file_splice_read+0xe8/0x17c
      [   20.192816]     splice_direct_to_actor+0xb8/0x290
      [   20.193008]     do_splice_direct+0xa0/0xe0
      [   20.193185]     do_sendfile+0x2d0/0x438
      [   20.193345]     sys_sendfile64+0x12c/0x140
      [   20.193523]     ret_fast_syscall+0x0/0x58
      [   20.193695]     0xbeeacde4
      [   20.193822]  Free path:
      [   20.193935]     meminfo_proc_show+0x5c/0x4fc
      [   20.194115]     seq_read_iter+0x18c/0x4c4
      [   20.194285]     proc_reg_read_iter+0x84/0xac
      [   20.194475]     generic_file_splice_read+0xe8/0x17c
      [   20.194685]     splice_direct_to_actor+0xb8/0x290
      [   20.194870]     do_splice_direct+0xa0/0xe0
      [   20.195014]     do_sendfile+0x2d0/0x438
      [   20.195174]     sys_sendfile64+0x12c/0x140
      [   20.195336]     ret_fast_syscall+0x0/0x58
      [   20.195491]     0xbeeacde4
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Co-developed-by: Vaneet Narang <v.narang@samsung.com>
      Signed-off-by: Vaneet Narang <v.narang@samsung.com>
      Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      e548eaa1
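      A hedged sketch of the precondition for that dump: the free path is only
      recorded when the cache tracks its users (SLAB_STORE_USER, e.g. via the
      slub_debug=U boot parameter).  The module, cache name and object below are
      illustrative:

        #include <linux/module.h>
        #include <linux/slab.h>

        static struct kmem_cache *tracked_cache;        /* hypothetical */

        static int __init free_track_demo_init(void)
        {
                void *obj;

                tracked_cache = kmem_cache_create("tracked-64", 64, 0,
                                                  SLAB_STORE_USER, NULL);
                if (!tracked_cache)
                        return -ENOMEM;

                obj = kmem_cache_alloc(tracked_cache, GFP_KERNEL);
                if (obj)
                        kmem_cache_free(tracked_cache, obj);    /* this free's stack is recorded */
                return 0;
        }
        module_init(free_track_demo_init);
        MODULE_LICENSE("GPL");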
  14. 01 May 2021, 1 commit
    • kasan, mm: integrate slab init_on_alloc with HW_TAGS · da844b78
      Committed by Andrey Konovalov
      This change uses the previously added memory initialization feature of
      HW_TAGS KASAN routines for slab memory when init_on_alloc is enabled.
      
      With this change, the memory initialization memset() is no longer called
      when both HW_TAGS KASAN and init_on_alloc are enabled.  Instead, memory
      is initialized in the KASAN runtime.
      
      The memory initialization memset() is moved into slab_post_alloc_hook()
      that currently directly follows the initialization loop.  A new argument
      is added to slab_post_alloc_hook() that indicates whether to initialize
      the memory or not.
      
      To avoid future changes causing discrepancies in which memory gets
      initialized, the KASAN hook and the initialization memset() are put next
      to each other and a warning comment is added.
      
      Combining setting allocation tags with memory initialization improves
      HW_TAGS KASAN performance when init_on_alloc is enabled.
      
      Link: https://lkml.kernel.org/r/c1292aeb5d519da221ec74a0684a949b027d7720.1615296150.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      da844b78
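      A simplified sketch of the resulting pattern; the helper name is a stand-in
      for the relevant part of slab_post_alloc_hook(), and the exact helper names
      are from memory: kasan_slab_alloc() receives the init flag, and the explicit
      memset() is skipped when the KASAN mode initializes memory itself.

        static inline void post_alloc_init(struct kmem_cache *s, gfp_t flags,
                                           size_t size, void **p, bool init)
        {
                size_t i;

                for (i = 0; i < size; i++) {
                        /* HW_TAGS KASAN can zero the object while assigning its tag */
                        p[i] = kasan_slab_alloc(s, p[i], flags, init);
                        if (p[i] && init && !kasan_has_integrated_init())
                                memset(p[i], 0, s->object_size);
                }
        }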
  15. 08 April 2021, 1 commit
  16. 09 March 2021, 1 commit
    • mm: Don't build mm_dump_obj() on CONFIG_PRINTK=n kernels · 5bb1bb35
      Committed by Paul E. McKenney
      The mem_dump_obj() functionality adds a few hundred bytes, which is a
      small price to pay, except on kernels built with CONFIG_PRINTK=n, where
      mem_dump_obj()'s messages would be suppressed anyway.  This commit
      therefore makes mem_dump_obj() a static inline empty function on kernels
      built with CONFIG_PRINTK=n and excludes all of its support functions as
      well.  This avoids kernel bloat on systems that cannot use mem_dump_obj().
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <linux-mm@kvack.org>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      5bb1bb35
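      The arrangement is essentially the usual header-stub pattern (sketched from
      memory of include/linux/mm.h):

        #ifdef CONFIG_PRINTK
        void mem_dump_obj(void *object);
        #else
        static inline void mem_dump_obj(void *object) {}
        #endif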
  17. 25 February 2021, 2 commits
  18. 23 January 2021, 1 commit
    • mm: Add mem_dump_obj() to print source of memory block · 8e7f37f2
      Committed by Paul E. McKenney
      There are kernel facilities such as per-CPU reference counts that give
      error messages in generic handlers or callbacks, whose messages are
      unenlightening.  In the case of per-CPU reference-count underflow, this
      is not a problem when creating a new use of this facility because in that
      case the bug is almost certainly in the code implementing that new use.
      However, trouble arises when deploying across many systems, which might
      exercise corner cases that were not seen during development and testing.
      Here, it would be really nice to get some kind of hint as to which of
      several uses the underflow was caused by.
      
      This commit therefore exposes a mem_dump_obj() function that takes
      a pointer to memory (which must still be allocated if it has been
      dynamically allocated) and prints available information on where that
      memory came from.  This pointer can reference the middle of the block as
      well as the beginning of the block, as needed by things like RCU callback
      functions and timer handlers that might not know where the beginning of
      the memory block is.  These functions and handlers can use mem_dump_obj()
      to print out better hints as to where the problem might lie.
      
      The information printed can depend on kernel configuration.  For example,
      the allocation return address can be printed only for slab and slub,
      and even then only when the necessary debug has been enabled.  For slab,
      build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
      to the next power of two, or use the SLAB_STORE_USER flag when creating
      the kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
      boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
      if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
      to enable printing of the allocation-time stack trace.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <linux-mm@kvack.org>
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
      [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
      [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
      [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
      [ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
      [ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      8e7f37f2
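      A hedged usage sketch: a generic callback that hits an unexpected state can
      hand the pointer it was given to mem_dump_obj() to learn where that memory
      came from.  The structure, callback and condition below are illustrative
      assumptions:

        #include <linux/mm.h>
        #include <linux/rcupdate.h>
        #include <linux/slab.h>

        struct my_obj {
                struct rcu_head rcu;
                int refs;
        };

        static void my_rcu_cb(struct rcu_head *rhp)
        {
                struct my_obj *obj = container_of(rhp, struct my_obj, rcu);

                if (WARN_ON_ONCE(obj->refs != 0))
                        mem_dump_obj(obj);      /* slab cache, alloc/free info if available */
                kfree(obj);
        }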
  19. 16 December 2020, 2 commits
  20. 07 December 2020, 1 commit
  21. 03 December 2020, 2 commits
  22. 19 October 2020, 1 commit
    • mm: kmem: move memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current() · 279c3393
      Committed by Roman Gushchin
      Patch series "mm: kmem: kernel memory accounting in an interrupt context".
      
      This patchset implements memcg-based memory accounting of allocations made
      from an interrupt context.
      
      Historically, such allocations were left unaccounted, mostly because
      charging the memory cgroup of the current process wasn't an option;
      performance was likely a concern as well.

      The remote charging API allows temporarily overriding the currently
      active memory cgroup, so that all memory allocations are accounted
      towards some specified memory cgroup instead of the memory cgroup of the
      current process.
      
      This patchset extends the remote charging API so that it can be used from
      an interrupt context.  Then it removes the fence that prevented the
      accounting of allocations made from an interrupt context.  It also
      contains a couple of optimizations/code refactorings.
      
      This patchset doesn't directly enable accounting for any specific
      allocations, but prepares the code base for it.  The bpf memory
      accounting will likely be the first user: a typical example is a bpf
      program parsing an incoming network packet and allocating an entry in a
      hashmap to store some information.
      
      This patch (of 4):
      
      Currently memcg_kmem_bypass() is called before obtaining the current
      memory/obj cgroup using get_mem/obj_cgroup_from_current().  Moving
      memcg_kmem_bypass() into get_mem/obj_cgroup_from_current() reduces the
      number of call sites and allows further code simplifications.
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Link: http://lkml.kernel.org/r/20200827225843.1270629-1-guro@fb.com
      Link: http://lkml.kernel.org/r/20200827225843.1270629-2-guro@fb.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      279c3393
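      A small sketch of the caller-side effect of the move; the wrapper function
      is hypothetical, only the helpers it calls are real:

        static void charge_current_kmem(size_t size)
        {
                struct obj_cgroup *objcg;

                /*
                 * Before this patch, each caller had to bail out itself:
                 *
                 *      if (memcg_kmem_bypass())
                 *              return;
                 */
                objcg = get_obj_cgroup_from_current();  /* the bypass check now lives here */
                if (!objcg)
                        return;

                if (!obj_cgroup_charge(objcg, GFP_KERNEL, size))
                        obj_cgroup_uncharge(objcg, size);       /* undo immediately: demo only */
                obj_cgroup_put(objcg);
        }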
  23. 17 October 2020, 1 commit
  24. 14 October 2020, 1 commit