1. 06 1月, 2022 8 次提交
    • V
      mm/sl*b: Differentiate struct slab fields by sl*b implementations · 401fb12c
      Vlastimil Babka 提交于
      With a struct slab definition separate from struct page, we can go
      further and define only fields that the chosen sl*b implementation uses.
      This means everything between __page_flags and __page_refcount
      placeholders now depends on the chosen CONFIG_SL*B. Some fields exist in
      all implementations (slab_list) but can be part of a union in some, so
      it's simpler to repeat them than complicate the definition with ifdefs
      even more.
      
      The patch doesn't change physical offsets of the fields, although it
      could be done later - for example it's now clear that tighter packing in
      SLOB could be possible.
      
      This should also prevent accidental use of fields that don't exist in
      given implementation. Before this patch virt_to_cache() and
      cache_from_obj() were visible for SLOB (albeit not used), although they
      rely on the slab_cache field that isn't set by SLOB. With this patch
      it's now a compile error, so these functions are now hidden behind
      an #ifndef CONFIG_SLOB.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NRoman Gushchin <guro@fb.com>
      Tested-by: Marco Elver <elver@google.com> # kfence
      Reviewed-by: NHyeonggon Yoo <42.hyeyoo@gmail.com>
      Tested-by: NHyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <kasan-dev@googlegroups.com>
      401fb12c
    • V
      mm/memcg: Convert slab objcgs from struct page to struct slab · 4b5f8d9a
      Vlastimil Babka 提交于
      page->memcg_data is used with MEMCG_DATA_OBJCGS flag only for slab pages
      so convert all the related infrastructure to struct slab. Also use
      struct folio instead of struct page when resolving object pointers.
      
      This is not just mechanistic changing of types and names. Now in
      mem_cgroup_from_obj() we use folio_test_slab() to decide if we interpret
      the folio as a real slab instead of a large kmalloc, instead of relying
      on MEMCG_DATA_OBJCGS bit that used to be checked in page_objcgs_check().
      Similarly in memcg_slab_free_hook() where we can encounter
      kmalloc_large() pages (here the folio slab flag check is implied by
      virt_to_slab()). As a result, page_objcgs_check() can be dropped instead
      of converted.
      
      To avoid include cycles, move the inline definition of slab_objcgs()
      from memcontrol.h to mm/slab.h.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NRoman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <cgroups@vger.kernel.org>
      4b5f8d9a
    • V
      mm: Convert struct page to struct slab in functions used by other subsystems · 40f3bf0c
      Vlastimil Babka 提交于
      KASAN, KFENCE and memcg interact with SLAB or SLUB internals through
      functions nearest_obj(), obj_to_index() and objs_per_slab() that use
      struct page as parameter. This patch converts it to struct slab
      including all callers, through a coccinelle semantic patch.
      
      // Options: --include-headers --no-includes --smpl-spacing include/linux/slab_def.h include/linux/slub_def.h mm/slab.h mm/kasan/*.c mm/kfence/kfence_test.c mm/memcontrol.c mm/slab.c mm/slub.c
      // Note: needs coccinelle 1.1.1 to avoid breaking whitespace
      
      @@
      @@
      
      -objs_per_slab_page(
      +objs_per_slab(
       ...
       )
       { ... }
      
      @@
      @@
      
      -objs_per_slab_page(
      +objs_per_slab(
       ...
       )
      
      @@
      identifier fn =~ "obj_to_index|objs_per_slab";
      @@
      
       fn(...,
      -   const struct page *page
      +   const struct slab *slab
          ,...)
       {
      <...
      (
      - page_address(page)
      + slab_address(slab)
      |
      - page
      + slab
      )
      ...>
       }
      
      @@
      identifier fn =~ "nearest_obj";
      @@
      
       fn(...,
      -   struct page *page
      +   const struct slab *slab
          ,...)
       {
      <...
      (
      - page_address(page)
      + slab_address(slab)
      |
      - page
      + slab
      )
      ...>
       }
      
      @@
      identifier fn =~ "nearest_obj|obj_to_index|objs_per_slab";
      expression E;
      @@
      
       fn(...,
      (
      - slab_page(E)
      + E
      |
      - virt_to_page(E)
      + virt_to_slab(E)
      |
      - virt_to_head_page(E)
      + virt_to_slab(E)
      |
      - page
      + page_slab(page)
      )
        ,...)
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NAndrey Konovalov <andreyknvl@gmail.com>
      Reviewed-by: NRoman Gushchin <guro@fb.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Julia Lawall <julia.lawall@inria.fr>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <kasan-dev@googlegroups.com>
      Cc: <cgroups@vger.kernel.org>
      40f3bf0c
    • M
      mm: Convert check_heap_object() to use struct slab · 0b3eb091
      Matthew Wilcox (Oracle) 提交于
      Ensure that we're not seeing a tail page inside __check_heap_object() by
      converting to a slab instead of a page.  Take the opportunity to mark
      the slab as const since we're not modifying it.  Also move the
      declaration of __check_heap_object() to mm/slab.h so it's not available
      to the wider kernel.
      
      [ vbabka@suse.cz: in check_heap_object() only convert to struct slab for
        actual PageSlab pages; use folio as intermediate step instead of page ]
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NRoman Gushchin <guro@fb.com>
      0b3eb091
    • M
      mm: Use struct slab in kmem_obj_info() · 7213230a
      Matthew Wilcox (Oracle) 提交于
      All three implementations of slab support kmem_obj_info() which reports
      details of an object allocated from the slab allocator.  By using the
      slab type instead of the page type, we make it obvious that this can
      only be called for slabs.
      
      [ vbabka@suse.cz: also convert the related kmem_valid_obj() to folios ]
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NRoman Gushchin <guro@fb.com>
      7213230a
    • M
      mm: Convert virt_to_cache() to use struct slab · 82c1775d
      Matthew Wilcox (Oracle) 提交于
      This function is entirely self-contained, so can be converted from page
      to slab.
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NRoman Gushchin <guro@fb.com>
      Reviewed-by: NHyeonggon Yoo <42.hyeyoo@gmail.com>
      82c1775d
    • M
      mm: Convert [un]account_slab_page() to struct slab · b918653b
      Matthew Wilcox (Oracle) 提交于
      Convert the parameter of these functions to struct slab instead of
      struct page and drop _page from the names. For now their callers just
      convert page to slab.
      
      [ vbabka@suse.cz: replace existing functions instead of calling them ]
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NRoman Gushchin <guro@fb.com>
      b918653b
    • M
      mm: Split slab into its own type · d122019b
      Matthew Wilcox (Oracle) 提交于
      Make struct slab independent of struct page. It still uses the
      underlying memory in struct page for storing slab-specific data, but
      slab and slub can now be weaned off using struct page directly.  Some of
      the wrapper functions (slab_address() and slab_order()) still need to
      cast to struct folio, but this is a significant disentanglement.
      
      [ vbabka@suse.cz: Rebase on folios, use folio instead of page where
        possible.
      
        Do not duplicate flags field in struct slab, instead make the related
        accessors go through slab_folio(). For testing pfmemalloc use the
        folio_*_active flag accessors directly so the PageSlabPfmemalloc
        wrappers can be removed later.
      
        Make folio_slab() expect only folio_test_slab() == true folios and
        virt_to_slab() return NULL when folio_test_slab() == false.
      
        Move struct slab to mm/slab.h.
      
        Don't represent with struct slab pages that are not true slab pages,
        but just a compound page obtained directly rom page allocator (with
        large kmalloc() for SLUB and SLOB). ]
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NRoman Gushchin <guro@fb.com>
      d122019b
  2. 21 11月, 2021 1 次提交
  3. 31 7月, 2021 1 次提交
  4. 16 7月, 2021 1 次提交
  5. 30 6月, 2021 4 次提交
    • W
      mm: memcg/slab: properly set up gfp flags for objcg pointer array · 41eb5df1
      Waiman Long 提交于
      Patch series "mm: memcg/slab: Fix objcg pointer array handling problem", v4.
      
      Since the merging of the new slab memory controller in v5.9, the page
      structure stores a pointer to objcg pointer array for slab pages.  When
      the slab has no used objects, it can be freed in free_slab() which will
      call kfree() to free the objcg pointer array in
      memcg_alloc_page_obj_cgroups().  If it happens that the objcg pointer
      array is the last used object in its slab, that slab may then be freed
      which may caused kfree() to be called again.
      
      With the right workload, the slab cache may be set up in a way that allows
      the recursive kfree() calling loop to nest deep enough to cause a kernel
      stack overflow and panic the system.  In fact, we have a reproducer that
      can cause kernel stack overflow on a s390 system involving kmalloc-rcl-256
      and kmalloc-rcl-128 slabs with the following kfree() loop recursively
      called 74 times:
      
        [ 285.520739] [<000000000ec432fc>] kfree+0x4bc/0x560 [ 285.520740]
      [<000000000ec43466>] __free_slab+0xc6/0x228 [ 285.520741]
      [<000000000ec41fc2>] __slab_free+0x3c2/0x3e0 [ 285.520742]
      [<000000000ec432fc>] kfree+0x4bc/0x560 : While investigating this issue, I
      also found an issue on the allocation side.  If the objcg pointer array
      happen to come from the same slab or a circular dependency linkage is
      formed with multiple slabs, those affected slabs can never be freed again.
      
      This patch series addresses these two issues by introducing a new set of
      kmalloc-cg-<n> caches split from kmalloc-<n> caches.  The new set will
      only contain non-reclaimable and non-dma objects that are accounted in
      memory cgroups whereas the old set are now for unaccounted objects only.
      By making this split, all the objcg pointer arrays will come from the
      kmalloc-<n> caches, but those caches will never hold any objcg pointer
      array.  As a result, deeply nested kfree() call and the unfreeable slab
      problems are now gone.
      
      This patch (of 4):
      
      Since the merging of the new slab memory controller in v5.9, the page
      structure may store a pointer to obj_cgroup pointer array for slab pages.
      Currently, only the __GFP_ACCOUNT bit is masked off.  However, the array
      is not readily reclaimable and doesn't need to come from the DMA buffer.
      So those GFP bits should be masked off as well.
      
      Do the flag bit clearing at memcg_alloc_page_obj_cgroups() to make sure
      that it is consistently applied no matter where it is called.
      
      Link: https://lkml.kernel.org/r/20210505200610.13943-1-longman@redhat.com
      Link: https://lkml.kernel.org/r/20210505200610.13943-2-longman@redhat.com
      Fixes: 286e04b8 ("mm: memcg/slab: allocate obj_cgroups for non-root slab pages")
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Acked-by: NRoman Gushchin <guro@fb.com>
      Reviewed-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      41eb5df1
    • W
      mm/memcg: move mod_objcg_state() to memcontrol.c · fdbcb2a6
      Waiman Long 提交于
      Patch series "mm/memcg: Reduce kmemcache memory accounting overhead", v6.
      
      With the recent introduction of the new slab memory controller, we
      eliminate the need for having separate kmemcaches for each memory cgroup
      and reduce overall kernel memory usage.  However, we also add additional
      memory accounting overhead to each call of kmem_cache_alloc() and
      kmem_cache_free().
      
      For workloads that require a lot of kmemcache allocations and
      de-allocations, they may experience performance regression as illustrated
      in [1] and [2].
      
      A simple kernel module that performs repeated loop of 100,000,000
      kmem_cache_alloc() and kmem_cache_free() of either a small 32-byte object
      or a big 4k object at module init time with a batch size of 4 (4 kmalloc's
      followed by 4 kfree's) is used for benchmarking.  The benchmarking tool
      was run on a kernel based on linux-next-20210419.  The test was run on a
      CascadeLake server with turbo-boosting disable to reduce run-to-run
      variation.
      
      The small object test exercises mainly the object stock charging and
      vmstat update code paths.  The large object test also exercises the
      refill_obj_stock() and __memcg_kmem_charge()/__memcg_kmem_uncharge() code
      paths.
      
      With memory accounting disabled, the run time was 3.130s with both small
      object big object tests.
      
      With memory accounting enabled, both cgroup v1 and v2 showed similar
      results in the small object test.  The performance results of the large
      object test, however, differed between cgroup v1 and v2.
      
      The execution times with the application of various patches in the
      patchset were:
      
        Applied patches   Run time   Accounting overhead   %age 1   %age 2
        ---------------   --------   -------------------   ------   ------
      
        Small 32-byte object:
             None          11.634s         8.504s          100.0%   271.7%
              1-2           9.425s         6.295s           74.0%   201.1%
              1-3           9.708s         6.578s           77.4%   210.2%
              1-4           8.062s         4.932s           58.0%   157.6%
      
        Large 4k object (v2):
             None          22.107s        18.977s          100.0%   606.3%
              1-2          20.960s        17.830s           94.0%   569.6%
              1-3          14.238s        11.108s           58.5%   354.9%
              1-4          11.329s         8.199s           43.2%   261.9%
      
        Large 4k object (v1):
             None          36.807s        33.677s          100.0%  1075.9%
              1-2          36.648s        33.518s           99.5%  1070.9%
              1-3          22.345s        19.215s           57.1%   613.9%
              1-4          18.662s        15.532s           46.1%   496.2%
      
        N.B. %age 1 = overhead/unpatched overhead
             %age 2 = overhead/accounting disabled time
      
      Patch 2 (vmstat data stock caching) helps in both the small object test
      and the large v2 object test. It doesn't help much in v1 big object test.
      
      Patch 3 (refill_obj_stock improvement) does help the small object test
      but offer significant performance improvement for the large object test
      (both v1 and v2).
      
      Patch 4 (eliminating irq disable/enable) helps in all test cases.
      
      To test for the extreme case, a multi-threaded kmalloc/kfree
      microbenchmark was run on the 2-socket 48-core 96-thread system with
      96 testing threads in the same memcg doing kmalloc+kfree of a 4k object
      with accounting enabled for 10s. The total number of kmalloc+kfree done
      in kilo operations per second (kops/s) were as follows:
      
        Applied patches   v1 kops/s   v1 change   v2 kops/s   v2 change
        ---------------   ---------   ---------   ---------   ---------
             None           3,520        1.00X      6,242        1.00X
              1-2           4,304        1.22X      8,478        1.36X
              1-3           4,731        1.34X    418,142       66.99X
              1-4           4,587        1.30X    438,838       70.30X
      
      With memory accounting disabled, the kmalloc/kfree rate was 1,481,291
      kop/s. This test shows how significant the memory accouting overhead
      can be in some extreme situations.
      
      For this multithreaded test, the improvement from patch 2 mainly
      comes from the conditional atomic xchg of objcg->nr_charged_bytes in
      mod_objcg_state(). By using an unconditional xchg, the operation rates
      were similar to the unpatched kernel.
      
      Patch 3 elminates the single highly contended cacheline of
      objcg->nr_charged_bytes for cgroup v2 leading to a huge performance
      improvement. Cgroup v1, however, still has another highly contended
      cacheline in the shared page counter &memcg->kmem. So the improvement
      is only modest.
      
      Patch 4 helps in cgroup v2, but performs worse in cgroup v1 as
      eliminating the irq_disable/irq_enable overhead seems to aggravate the
      cacheline contention.
      
      [1] https://lore.kernel.org/linux-mm/20210408193948.vfktg3azh2wrt56t@gabell/T/#u
      [2] https://lore.kernel.org/lkml/20210114025151.GA22932@xsang-OptiPlex-9020/
      
      This patch (of 4):
      
      mod_objcg_state() is moved from mm/slab.h to mm/memcontrol.c so that
      further optimization can be done to it in later patches without exposing
      unnecessary details to other mm components.
      
      Link: https://lkml.kernel.org/r/20210506150007.16288-1-longman@redhat.com
      Link: https://lkml.kernel.org/r/20210506150007.16288-2-longman@redhat.comSigned-off-by: NWaiman Long <longman@redhat.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Acked-by: NRoman Gushchin <guro@fb.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Chris Down <chris@chrisdown.name>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fdbcb2a6
    • F
      mm: slub: move sysfs slab alloc/free interfaces to debugfs · 64dd6849
      Faiyaz Mohammed 提交于
      alloc_calls and free_calls implementation in sysfs have two issues, one is
      PAGE_SIZE limitation of sysfs and other is it does not adhere to "one
      value per file" rule.
      
      To overcome this issues, move the alloc_calls and free_calls
      implementation to debugfs.
      
      Debugfs cache will be created if SLAB_STORE_USER flag is set.
      
      Rename the alloc_calls/free_calls to alloc_traces/free_traces, to be
      inline with what it does.
      
      [faiyazm@codeaurora.org: fix the leak of alloc/free traces debugfs interface]
        Link: https://lkml.kernel.org/r/1624248060-30286-1-git-send-email-faiyazm@codeaurora.org
      
      Link: https://lkml.kernel.org/r/1623438200-19361-1-git-send-email-faiyazm@codeaurora.orgSigned-off-by: NFaiyaz Mohammed <faiyazm@codeaurora.org>
      Reviewed-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      64dd6849
    • O
      mm/slub, kunit: add a KUnit test for SLUB debugging functionality · 1f9f78b1
      Oliver Glitta 提交于
      SLUB has resiliency_test() function which is hidden behind #ifdef
      SLUB_RESILIENCY_TEST that is not part of Kconfig, so nobody runs it.
      KUnit should be a proper replacement for it.
      
      Try changing byte in redzone after allocation and changing pointer to next
      free node, first byte, 50th byte and redzone byte.  Check if validation
      finds errors.
      
      There are several differences from the original resiliency test: Tests
      create own caches with known state instead of corrupting shared kmalloc
      caches.
      
      The corruption of freepointer uses correct offset, the original resiliency
      test got broken with freepointer changes.
      
      Scratch changing random byte test, because it does not have meaning in
      this form where we need deterministic results.
      
      Add new option CONFIG_SLUB_KUNIT_TEST in Kconfig.  Tests next_pointer,
      first_word and clobber_50th_byte do not run with KASAN option on.  Because
      the test deliberately modifies non-allocated objects.
      
      Use kunit_resource to count errors in cache and silence bug reports.
      Count error whenever slab_bug() or slab_fix() is called or when the count
      of pages is wrong.
      
      [glittao@gmail.com: remove unused function test_exit(), from SLUB KUnit test]
        Link: https://lkml.kernel.org/r/20210512140656.12083-1-glittao@gmail.com
      [akpm@linux-foundation.org: export kasan_enable/disable_current to modules]
      
      Link: https://lkml.kernel.org/r/20210511150734.3492-2-glittao@gmail.comSigned-off-by: NOliver Glitta <glittao@gmail.com>
      Reviewed-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NDaniel Latypov <dlatypov@google.com>
      Acked-by: NMarco Elver <elver@google.com>
      Cc: Brendan Higgins <brendanhiggins@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f9f78b1
  6. 11 5月, 2021 1 次提交
    • M
      mm/slub: Add Support for free path information of an object · e548eaa1
      Maninder Singh 提交于
      This commit adds enables a stack dump for the last free of an object:
      
      slab kmalloc-64 start c8ab0140 data offset 64 pointer offset 0 size 64 allocated at meminfo_proc_show+0x40/0x4fc
      [   20.192078]     meminfo_proc_show+0x40/0x4fc
      [   20.192263]     seq_read_iter+0x18c/0x4c4
      [   20.192430]     proc_reg_read_iter+0x84/0xac
      [   20.192617]     generic_file_splice_read+0xe8/0x17c
      [   20.192816]     splice_direct_to_actor+0xb8/0x290
      [   20.193008]     do_splice_direct+0xa0/0xe0
      [   20.193185]     do_sendfile+0x2d0/0x438
      [   20.193345]     sys_sendfile64+0x12c/0x140
      [   20.193523]     ret_fast_syscall+0x0/0x58
      [   20.193695]     0xbeeacde4
      [   20.193822]  Free path:
      [   20.193935]     meminfo_proc_show+0x5c/0x4fc
      [   20.194115]     seq_read_iter+0x18c/0x4c4
      [   20.194285]     proc_reg_read_iter+0x84/0xac
      [   20.194475]     generic_file_splice_read+0xe8/0x17c
      [   20.194685]     splice_direct_to_actor+0xb8/0x290
      [   20.194870]     do_splice_direct+0xa0/0xe0
      [   20.195014]     do_sendfile+0x2d0/0x438
      [   20.195174]     sys_sendfile64+0x12c/0x140
      [   20.195336]     ret_fast_syscall+0x0/0x58
      [   20.195491]     0xbeeacde4
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Co-developed-by: NVaneet Narang <v.narang@samsung.com>
      Signed-off-by: NVaneet Narang <v.narang@samsung.com>
      Signed-off-by: NManinder Singh <maninder1.s@samsung.com>
      Signed-off-by: NPaul E. McKenney <paulmck@kernel.org>
      e548eaa1
  7. 01 5月, 2021 1 次提交
    • A
      kasan, mm: integrate slab init_on_alloc with HW_TAGS · da844b78
      Andrey Konovalov 提交于
      This change uses the previously added memory initialization feature of
      HW_TAGS KASAN routines for slab memory when init_on_alloc is enabled.
      
      With this change, memory initialization memset() is no longer called when
      both HW_TAGS KASAN and init_on_alloc are enabled.  Instead, memory is
      initialized in KASAN runtime.
      
      The memory initialization memset() is moved into slab_post_alloc_hook()
      that currently directly follows the initialization loop.  A new argument
      is added to slab_post_alloc_hook() that indicates whether to initialize
      the memory or not.
      
      To avoid discrepancies with which memory gets initialized that can be
      caused by future changes, both KASAN hook and initialization memset() are
      put together and a warning comment is added.
      
      Combining setting allocation tags with memory initialization improves
      HW_TAGS KASAN performance when init_on_alloc is enabled.
      
      Link: https://lkml.kernel.org/r/c1292aeb5d519da221ec74a0684a949b027d7720.1615296150.git.andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com>
      Reviewed-by: NMarco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      da844b78
  8. 08 4月, 2021 1 次提交
  9. 09 3月, 2021 1 次提交
    • P
      mm: Don't build mm_dump_obj() on CONFIG_PRINTK=n kernels · 5bb1bb35
      Paul E. McKenney 提交于
      The mem_dump_obj() functionality adds a few hundred bytes, which is a
      small price to pay.  Except on kernels built with CONFIG_PRINTK=n, in
      which mem_dump_obj() messages will be suppressed.  This commit therefore
      makes mem_dump_obj() be a static inline empty function on kernels built
      with CONFIG_PRINTK=n and excludes all of its support functions as well.
      This avoids kernel bloat on systems that cannot use mem_dump_obj().
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <linux-mm@kvack.org>
      Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NPaul E. McKenney <paulmck@kernel.org>
      5bb1bb35
  10. 25 2月, 2021 2 次提交
  11. 23 1月, 2021 1 次提交
    • P
      mm: Add mem_dump_obj() to print source of memory block · 8e7f37f2
      Paul E. McKenney 提交于
      There are kernel facilities such as per-CPU reference counts that give
      error messages in generic handlers or callbacks, whose messages are
      unenlightening.  In the case of per-CPU reference-count underflow, this
      is not a problem when creating a new use of this facility because in that
      case the bug is almost certainly in the code implementing that new use.
      However, trouble arises when deploying across many systems, which might
      exercise corner cases that were not seen during development and testing.
      Here, it would be really nice to get some kind of hint as to which of
      several uses the underflow was caused by.
      
      This commit therefore exposes a mem_dump_obj() function that takes
      a pointer to memory (which must still be allocated if it has been
      dynamically allocated) and prints available information on where that
      memory came from.  This pointer can reference the middle of the block as
      well as the beginning of the block, as needed by things like RCU callback
      functions and timer handlers that might not know where the beginning of
      the memory block is.  These functions and handlers can use mem_dump_obj()
      to print out better hints as to where the problem might lie.
      
      The information printed can depend on kernel configuration.  For example,
      the allocation return address can be printed only for slab and slub,
      and even then only when the necessary debug has been enabled.  For slab,
      build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
      to the next power of two or use the SLAB_STORE_USER when creating the
      kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
      boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
      if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
      to enable printing of the allocation-time stack trace.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <linux-mm@kvack.org>
      Reported-by: NAndrii Nakryiko <andrii@kernel.org>
      [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
      [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
      [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
      [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
      [ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
      [ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
      Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Tested-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@kernel.org>
      8e7f37f2
  12. 16 12月, 2020 2 次提交
  13. 07 12月, 2020 1 次提交
  14. 03 12月, 2020 2 次提交
  15. 19 10月, 2020 1 次提交
    • R
      mm: kmem: move memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current() · 279c3393
      Roman Gushchin 提交于
      Patch series "mm: kmem: kernel memory accounting in an interrupt context".
      
      This patchset implements memcg-based memory accounting of allocations made
      from an interrupt context.
      
      Historically, such allocations were passed unaccounted mostly because
      charging the memory cgroup of the current process wasn't an option.  Also
      performance reasons were likely a reason too.
      
      The remote charging API allows to temporarily overwrite the currently
      active memory cgroup, so that all memory allocations are accounted towards
      some specified memory cgroup instead of the memory cgroup of the current
      process.
      
      This patchset extends the remote charging API so that it can be used from
      an interrupt context.  Then it removes the fence that prevented the
      accounting of allocations made from an interrupt context.  It also
      contains a couple of optimizations/code refactorings.
      
      This patchset doesn't directly enable accounting for any specific
      allocations, but prepares the code base for it.  The bpf memory accounting
      will likely be the first user of it: a typical example is a bpf program
      parsing an incoming network packet, which allocates an entry in hashmap
      map to store some information.
      
      This patch (of 4):
      
      Currently memcg_kmem_bypass() is called before obtaining the current
      memory/obj cgroup using get_mem/obj_cgroup_from_current().  Moving
      memcg_kmem_bypass() into get_mem/obj_cgroup_from_current() reduces the
      number of call sites and allows further code simplifications.
      Signed-off-by: NRoman Gushchin <guro@fb.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Link: http://lkml.kernel.org/r/20200827225843.1270629-1-guro@fb.com
      Link: http://lkml.kernel.org/r/20200827225843.1270629-2-guro@fb.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      279c3393
  16. 17 10月, 2020 1 次提交
  17. 14 10月, 2020 1 次提交
  18. 08 8月, 2020 10 次提交