1. 01 Feb 2023 (1 commit)
  2. 12 Dec 2022 (1 commit)
    • mm: memcg: fix stale protection of reclaim target memcg · adb82130
      Yosry Ahmed authored
      Patch series "mm: memcg: fix protection of reclaim target memcg", v3.
      
      This series fixes a bug in calculating the protection of the reclaim
      target memcg where we end up using stale effective protection values from
      the last reclaim operation, instead of completely ignoring the protection
      of the reclaim target as intended.  More detailed explanation and examples
      in patch 1, which includes the fix.  Patches 2 & 3 introduce a selftest
      case that catches the bug.
      
      
      This patch (of 3):
      
      When we are doing memcg reclaim, the intended behavior is that we
      ignore any protection (memory.min, memory.low) of the target memcg (but
      not its children).  Ever since the patch pointed to by the "Fixes" tag,
      we actually read a stale value for the target memcg protection when
      deciding whether to skip the memcg or not because it is protected.  If
      the stale value happens to be high enough, we don't reclaim from the
      target memcg.
      
      Essentially, in some cases we may falsely skip reclaiming from the
      target memcg of reclaim because we read a stale protection value from
      last time we reclaimed from it.
      
      
      During reclaim, mem_cgroup_calculate_protection() is used to determine the
      effective protection (emin and elow) values of a memcg.  The protection of
      the reclaim target is ignored, but we cannot set its effective protection
      to 0 due to a limitation of the current implementation (see the comment in
      mem_cgroup_protection()).  Instead, we leave its effective protection
      values unchanged, and later ignore them in mem_cgroup_protection().
      
      However, mem_cgroup_protection() is called later in
      shrink_lruvec()->get_scan_count(), which is after the
      mem_cgroup_below_{min/low}() checks in shrink_node_memcgs().  As a result,
      the stale effective protection values of the target memcg may lead us to
      skip reclaiming from the target memcg entirely, before calling
      shrink_lruvec().  This can be even worse with recursive protection, where
      the stale target memcg protection can be higher than its standalone
      protection.  See two examples below (a similar version of example (a) is
      added to test_memcontrol in a later patch).
      
      (a) A simple example with proactive reclaim is as follows. Consider the
      following hierarchy:
      ROOT
       |
       A
       |
       B (memory.min = 10M)
      
      Consider the following scenario:
      - B has memory.current = 10M.
      - The system undergoes global reclaim (or memcg reclaim in A).
      - In shrink_node_memcgs():
        - mem_cgroup_calculate_protection() calculates the effective min (emin)
          of B as 10M.
        - mem_cgroup_below_min() returns true for B, we do not reclaim from B.
      - Now if we want to reclaim 5M from B using proactive reclaim
        (memory.reclaim), we should be able to, as the protection of the
        target memcg should be ignored.
      - In shrink_node_memcgs():
        - mem_cgroup_calculate_protection() immediately returns for B without
          doing anything, as B is the target memcg, relying on
          mem_cgroup_protection() to ignore B's stale effective min (still 10M).
        - mem_cgroup_below_min() reads the stale effective min for B and we
          skip it instead of ignoring its protection as intended, as we never
          reach mem_cgroup_protection().
      
      (b) A more complex example with recursive protection is as follows.
      Consider the following hierarchy with memory_recursiveprot:
      ROOT
       |
       A (memory.min = 50M)
       |
       B (memory.min = 10M, memory.high = 40M)
      
      Consider the following scenario:
      - B has memory.current = 35M.
      - The system undergoes global reclaim (target memcg is NULL).
      - B will have an effective min of 50M (all of A's unclaimed protection).
      - B will not be reclaimed from.
      - Now allocate 10M more memory in B, pushing it above its high limit.
      - The system undergoes memcg reclaim from B (target memcg is B).
      - Like example (a), we do nothing in mem_cgroup_calculate_protection(),
        then call mem_cgroup_below_min(), which will read the stale effective
        min for B (50M) and skip it. In this case, it's even worse because we
        are not just considering B's standalone protection (10M), but we are
        reading a much higher stale protection (50M) which will cause us to not
        reclaim from B at all.
      
      This is an artifact of commit 45c7f7e1 ("mm, memcg: decouple
      e{low,min} state mutations from protection checks") which made
      mem_cgroup_calculate_protection() only change the state without returning
      any value.  Before that commit, we used to return MEMCG_PROT_NONE for the
      target memcg, which would cause us to skip the
      mem_cgroup_below_{min/low}() checks.  After that commit we do not return
      anything and we end up checking the min & low effective protections for
      the target memcg, which are stale.
      
      Update mem_cgroup_supports_protection() to also check if we are reclaiming
      from the target, and rename it to mem_cgroup_unprotected() (now returns
      true if we should not protect the memcg, much simpler logic).
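
      A minimal sketch of what such a helper can look like (illustrative only,
      not necessarily the exact upstream code; the "memcg == target" check is
      the part that avoids the stale protection values):

        static inline bool mem_cgroup_unprotected(struct mem_cgroup *target,
                                                  struct mem_cgroup *memcg)
        {
                /*
                 * No protection applies when memcg is disabled, for the
                 * root memcg, or when memcg is the reclaim target itself.
                 */
                return mem_cgroup_disabled() || mem_cgroup_is_root(memcg) ||
                       memcg == target;
        }

      mem_cgroup_below_{min,low}() can then bail out early whenever this
      returns true, so the stale emin/elow values are never consulted for the
      reclaim target.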
      
      Link: https://lkml.kernel.org/r/20221202031512.1365483-1-yosryahmed@google.com
      Link: https://lkml.kernel.org/r/20221202031512.1365483-2-yosryahmed@google.com
      Fixes: 45c7f7e1 ("mm, memcg: decouple e{low,min} state mutations from protection checks")
      Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
      Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Chris Down <chris@chrisdown.name>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vasily Averin <vasily.averin@linux.dev>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  3. 04 Oct 2022 (3 commits)
  4. 29 Sep 2022 (1 commit)
  5. 27 Sep 2022 (3 commits)
    • mm: deduplicate cacheline padding code · e6ad640b
      Shakeel Butt authored
      There are three users (mmzone.h, memcontrol.h, page_counter.h) using
      similar code for forcing cacheline padding between fields of different
      structures.  Dedup that code.
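
      The shared helper boils down to a tiny struct-plus-macro pair, roughly
      along these lines (a sketch of the approach, not necessarily the exact
      final code):

        #ifdef CONFIG_SMP
        struct cacheline_padding {
                char x[0];
        } ____cacheline_internodealigned_in_smp;
        #define CACHELINE_PADDING(name)  struct cacheline_padding name
        #else
        #define CACHELINE_PADDING(name)
        #endif

      Each former user can then declare CACHELINE_PADDING(_pad1_) instead of
      carrying its own private padding struct.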
      
      Link: https://lkml.kernel.org/r/20220826230642.566725-1-shakeelb@google.com
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Suggested-by: Feng Tang <feng.tang@intel.com>
      Reviewed-by: Feng Tang <feng.tang@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: multi-gen LRU: support page table walks · bd74fdae
      Yu Zhao authored
      To further exploit spatial locality, the aging prefers to walk page tables
      to search for young PTEs and promote hot pages.  A kill switch will be
      added in the next patch to disable this behavior.  When disabled, the
      aging relies on the rmap only.
      
      NB: this behavior has nothing in common with the page table scanning in the
      2.4 kernel [1], which searches page tables for old PTEs, adds cold pages
      to swapcache and unmaps them.
      
      To avoid confusion, the term "iteration" specifically means the traversal
      of an entire mm_struct list; the term "walk" will be applied to page
      tables and the rmap, as usual.
      
      An mm_struct list is maintained for each memcg, and an mm_struct follows
      its owner task to the new memcg when this task is migrated.  Given an
      lruvec, the aging iterates lruvec_memcg()->mm_list and calls
      walk_page_range() with each mm_struct on this list to promote hot pages
      before it increments max_seq.
      
      When multiple page table walkers iterate the same list, each of them gets
      a unique mm_struct; therefore they can run concurrently.  Page table
      walkers ignore any misplaced pages, e.g., if an mm_struct was migrated,
      pages it left in the previous memcg will not be promoted when its current
      memcg is under reclaim.  Similarly, page table walkers will not promote
      pages from nodes other than the one under reclaim.
      
      This patch uses the following optimizations when walking page tables:
      1. It tracks the usage of mm_struct's between context switches so that
         page table walkers can skip processes that have been sleeping since
         the last iteration.
      2. It uses generational Bloom filters to record populated branches so
         that page table walkers can reduce their search space based on the
         query results, e.g., to skip page tables containing mostly holes or
         misplaced pages.
      3. It takes advantage of the accessed bit in non-leaf PMD entries when
         CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y.
      4. It does not zigzag between a PGD table and the same PMD table
         spanning multiple VMAs. IOW, it finishes all the VMAs within the
         range of the same PMD table before it returns to a PGD table. This
         improves the cache performance for workloads that have large
         numbers of tiny VMAs [2], especially when CONFIG_PGTABLE_LEVELS=5.
      
      Server benchmark results:
        Single workload:
          fio (buffered I/O): no change
      
        Single workload:
          memcached (anon): +[8, 10]%
                      Ops/sec      KB/sec
            patch1-7: 1147696.57   44640.29
            patch1-8: 1245274.91   48435.66
      
        Configurations:
          no change
      
      Client benchmark results:
        kswapd profiles:
          patch1-7
            48.16%  lzo1x_1_do_compress (real work)
             8.20%  page_vma_mapped_walk (overhead)
             7.06%  _raw_spin_unlock_irq
             2.92%  ptep_clear_flush
             2.53%  __zram_bvec_write
             2.11%  do_raw_spin_lock
             2.02%  memmove
             1.93%  lru_gen_look_around
             1.56%  free_unref_page_list
             1.40%  memset
      
          patch1-8
            49.44%  lzo1x_1_do_compress (real work)
             6.19%  page_vma_mapped_walk (overhead)
             5.97%  _raw_spin_unlock_irq
             3.13%  get_pfn_folio
             2.85%  ptep_clear_flush
             2.42%  __zram_bvec_write
             2.08%  do_raw_spin_lock
             1.92%  memmove
             1.44%  alloc_zspage
             1.36%  memset
      
        Configurations:
          no change
      
      Thanks to the following developers for their efforts [3].
        kernel test robot <lkp@intel.com>
      
      [1] https://lwn.net/Articles/23732/
      [2] https://llvm.org/docs/ScudoHardenedAllocator.html
      [3] https://lore.kernel.org/r/202204160827.ekEARWQo-lkp@intel.com/
      
      Link: https://lkml.kernel.org/r/20220918080010.2920238-9-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Acked-by: Brian Geffon <bgeffon@google.com>
      Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
      Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Acked-by: Steven Barrett <steven@liquorix.net>
      Acked-by: Suleiman Souhlal <suleiman@google.com>
      Tested-by: Daniel Byrne <djbyrne@mtu.edu>
      Tested-by: Donald Carr <d@chaos-reins.com>
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
      Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
      Tested-by: Sofia Trinh <sofia.trinh@edi.works>
      Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michael Larabel <Michael@MichaelLarabel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: multi-gen LRU: exploit locality in rmap · 018ee47f
      Yu Zhao authored
      Searching the rmap for PTEs mapping each page on an LRU list (to test and
      clear the accessed bit) can be expensive because pages from different VMAs
      (PA space) are not cache friendly to the rmap (VA space).  For workloads
      mostly using mapped pages, searching the rmap can incur the highest CPU
      cost in the reclaim path.
      
      This patch exploits spatial locality to reduce the trips into the rmap. 
      When shrink_page_list() walks the rmap and finds a young PTE, a new
      function lru_gen_look_around() scans at most BITS_PER_LONG-1 adjacent
      PTEs.  On finding another young PTE, it clears the accessed bit and
      updates the gen counter of the page mapped by this PTE to
      (max_seq%MAX_NR_GENS)+1.
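
      Conceptually, the look-around does something like the following sketch
      (hypothetical variables and helpers, boundary and batching checks
      omitted; this is not the actual lru_gen_look_around()):

        /* young_pte/young_addr: the PTE the rmap walk just found young */
        for (i = 1; i < BITS_PER_LONG; i++) {
                pte_t *pte = young_pte + i;
                unsigned long addr = young_addr + i * PAGE_SIZE;

                if (!pte_present(*pte) || !pte_young(*pte))
                        continue;
                if (ptep_test_and_clear_young(vma, addr, pte))
                        /* hypothetical helper: promote the mapped folio */
                        set_folio_gen(pfn_folio(pte_pfn(*pte)),
                                      (max_seq % MAX_NR_GENS) + 1);
        }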
      
      Server benchmark results:
        Single workload:
          fio (buffered I/O): no change
      
        Single workload:
          memcached (anon): +[3, 5]%
                      Ops/sec      KB/sec
            patch1-6: 1106168.46   43025.04
            patch1-7: 1147696.57   44640.29
      
        Configurations:
          no change
      
      Client benchmark results:
        kswapd profiles:
          patch1-6
            39.03%  lzo1x_1_do_compress (real work)
            18.47%  page_vma_mapped_walk (overhead)
             6.74%  _raw_spin_unlock_irq
             3.97%  do_raw_spin_lock
             2.49%  ptep_clear_flush
             2.48%  anon_vma_interval_tree_iter_first
             1.92%  folio_referenced_one
             1.88%  __zram_bvec_write
             1.48%  memmove
             1.31%  vma_interval_tree_iter_next
      
          patch1-7
            48.16%  lzo1x_1_do_compress (real work)
             8.20%  page_vma_mapped_walk (overhead)
             7.06%  _raw_spin_unlock_irq
             2.92%  ptep_clear_flush
             2.53%  __zram_bvec_write
             2.11%  do_raw_spin_lock
             2.02%  memmove
             1.93%  lru_gen_look_around
             1.56%  free_unref_page_list
             1.40%  memset
      
        Configurations:
          no change
      
      Link: https://lkml.kernel.org/r/20220918080010.2920238-8-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Acked-by: Barry Song <baohua@kernel.org>
      Acked-by: Brian Geffon <bgeffon@google.com>
      Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
      Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Acked-by: Steven Barrett <steven@liquorix.net>
      Acked-by: Suleiman Souhlal <suleiman@google.com>
      Tested-by: Daniel Byrne <djbyrne@mtu.edu>
      Tested-by: Donald Carr <d@chaos-reins.com>
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
      Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
      Tested-by: Sofia Trinh <sofia.trinh@edi.works>
      Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michael Larabel <Michael@MichaelLarabel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  6. 12 Sep 2022 (1 commit)
  7. 29 Aug 2022 (1 commit)
    • Revert "memcg: cleanup racy sum avoidance code" · dbb16df6
      Shakeel Butt authored
      This reverts commit 96e51ccf.
      
      Recently we started running the kernel with the rstat infrastructure on
      production traffic and began to see negative memcg stat values.  In
      particular, the 'sock' stat is the one we observed having a negative
      value.
      
      $ grep "sock " /mnt/memory/job/memory.stat
      sock 253952
      total_sock 18446744073708724224
      
      Re-run after a couple of seconds:
      
      $ grep "sock " /mnt/memory/job/memory.stat
      sock 253952
      total_sock 53248
      
      For now we are only seeing this issue on large machines (256 CPUs) and
      only with the 'sock' stat.  I think the networking stack increases the
      stat on one CPU and decreases it on another CPU much more often.  So, this
      negative sock value is due to the rstat flusher flushing the stats on the
      CPU that has seen the decrement of sock but missing the CPU that has the
      increments.  A typical race condition.
      
      For an easy stable backport, a revert is the simplest solution.  For a
      long term solution, I am thinking of two directions.  The first is to just
      reduce the race window by optimizing the rstat flusher.  The second is, if
      the reader sees a negative stat value, to force a flush and restart the
      stat collection.  Basically a retry, but limited.
      
      Link: https://lkml.kernel.org/r/20220817172139.3141101-1-shakeelb@google.com
      Fixes: 96e51ccf ("memcg: cleanup racy sum avoidance code")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Cc: "Michal Koutný" <mkoutny@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: <stable@vger.kernel.org>	[5.15]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  8. 04 Jul 2022 (1 commit)
    • mm: memcontrol: introduce mem_cgroup_ino() and mem_cgroup_get_from_ino() · c15187a4
      Roman Gushchin authored
      Patch series "mm: introduce shrinker debugfs interface", v5.
      
      The only existing debugging mechanism is a couple of tracepoints in
      do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end.  They
      don't cover everything though: shrinkers which report 0 objects will
      never show up, and there is no support for memcg-aware shrinkers.
      Shrinkers are identified by their scan function, which is not always
      enough (e.g. it is hard to guess which super block's shrinker it is when
      you only have "super_cache_scan").
      
      To provide a better visibility and debug options for memory shrinkers this
      patchset introduces a /sys/kernel/debug/shrinker interface, to some extent
      similar to /sys/kernel/slab.
      
      For each shrinker registered in the system a directory is created.  As of
      now, the directory contains only a "scan" file, which allows getting the
      number of managed objects for each memory cgroup (for memcg-aware
      shrinkers) and each numa node (for numa-aware shrinkers on a numa
      machine).  Other interfaces might be added in the future.
      
      To make debugging more pleasant, the patchset also names all shrinkers, so
      that debugfs entries can have meaningful names.
      
      
      This patch (of 5):
      
      Shrinker debugfs requires a way to represent memory cgroups without using
      full paths, both for displaying information and getting input from a user.
      
      Cgroup inode number is a perfect way, already used by bpf.
      
      This commit adds a couple of helper functions which will be used to handle
      memcg-aware shrinkers.
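
      A sketch of what the pair of helpers can look like (assuming the cgroup
      id doubles as the inode number and reusing existing cgroup APIs; not
      necessarily the exact upstream code):

        static inline unsigned long mem_cgroup_ino(struct mem_cgroup *memcg)
        {
                return memcg ? cgroup_ino(memcg->css.cgroup) : 0;
        }

        struct mem_cgroup *mem_cgroup_get_from_ino(unsigned long ino)
        {
                struct cgroup_subsys_state *css;
                struct cgroup *cgrp;

                cgrp = cgroup_get_from_id(ino);   /* assumption: id == ino */
                if (IS_ERR_OR_NULL(cgrp))
                        return ERR_PTR(-ENOENT);

                css = cgroup_get_e_css(cgrp, &memory_cgrp_subsys);
                cgroup_put(cgrp);
                return css ? mem_cgroup_from_css(css) : ERR_PTR(-ENOENT);
        }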
      
      Link: https://lkml.kernel.org/r/20220601032227.4076670-1-roman.gushchin@linux.dev
      Link: https://lkml.kernel.org/r/20220601032227.4076670-2-roman.gushchin@linux.dev
      Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  9. 17 Jun 2022 (2 commits)
  10. 20 May 2022 (1 commit)
    • zswap: memcg accounting · f4840ccf
      Johannes Weiner authored
      Applications can currently escape their cgroup memory containment when
      zswap is enabled.  This patch adds per-cgroup tracking and limiting of
      zswap backend memory to rectify this.
      
      The existing cgroup2 memory.stat file is extended to show zswap statistics
      analogous to what's in meminfo and vmstat.  Furthermore, two new control
      files, memory.zswap.current and memory.zswap.max, are added to allow
      tuning zswap usage on a per-workload basis.  This is important since not
      all workloads benefit from zswap equally; some even suffer compared to
      disk swap when memory contents don't compress well.  The optimal size of
      the zswap pool, and the threshold for writeback, also depends on the size
      of the workload's warm set.
      
      The implementation doesn't use a traditional page_counter transaction. 
      zswap is unconventional as a memory consumer in that we only know the
      amount of memory to charge once expensive compression has occurred.  If
      zswap is disabled or the limit is already exceeded, we obviously don't want
      to compress page upon page only to reject them all.  Instead, the limit is
      checked against current usage, then we compress and charge.  This allows
      some limit overrun, but not enough to matter in practice.
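
      The check-then-charge flow is roughly the following sketch (helper and
      variable names are illustrative, not the exact zswap code):

        /* reject early if the cgroup is already over its zswap limit */
        if (!obj_cgroup_may_zswap(objcg))             /* assumed limit check */
                goto reject;

        dlen = compress_page(page, dst);              /* hypothetical */

        /* charge only the bytes that were actually stored */
        obj_cgroup_charge_zswap(objcg, dlen);         /* assumed charge helper */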
      
      [hannes@cmpxchg.org: fix for CONFIG_SLOB builds]
        Link: https://lkml.kernel.org/r/YnwD14zxYjUJPc2w@cmpxchg.org
      [hannes@cmpxchg.org: opt out of cgroups v1]
        Link: https://lkml.kernel.org/r/Yn6it9mBYFA+/lTb@cmpxchg.org
      Link: https://lkml.kernel.org/r/20220510152847.230957-7-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  11. 13 May 2022 (1 commit)
  12. 29 Apr 2022 (1 commit)
  13. 22 Apr 2022 (1 commit)
    • memcg: sync flush only if periodic flush is delayed · 9b301615
      Shakeel Butt authored
      Daniel Dao has reported [1] a regression on workloads that may trigger a
      lot of refaults (anon and file).  The underlying issue is that flushing
      rstat is expensive.  Although rstat flushes are batched with (nr_cpus *
      MEMCG_BATCH) stat updates, it seems like there are workloads which
      genuinely do more stat updates than the batch value within a short amount
      of time.  Since the rstat flush can happen in performance-critical
      codepaths like page faults, such workloads can suffer greatly.
      
      This patch fixes this regression by making the rstat flushing
      conditional in the performance critical codepaths.  More specifically,
      the kernel relies on the async periodic rstat flusher to flush the stats
      and only if the periodic flusher is delayed by more than twice the
      amount of its normal time window then the kernel allows rstat flushing
      from the performance critical codepaths.
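
      The gist of the change, as a sketch (names and the exact timing window
      may differ from the final code):

        /* updated by the periodic flusher: when the next flush is due */
        static u64 flush_next_time;

        void mem_cgroup_flush_stats_delayed(void)     /* used in hot paths */
        {
                /* flush synchronously only if the periodic flusher is late */
                if (time_after64(get_jiffies_64(), READ_ONCE(flush_next_time)))
                        mem_cgroup_flush_stats();
        }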
      
      Now the question: what are the side-effects of this change?  The worst
      that can happen is that the refault codepath will see 4-second-old lruvec
      stats and may cause false (or missed) activations of the refaulted page,
      which may under- or over-estimate the workingset size.  Though that is not
      very concerning, as the kernel can already miss or do false activations.
      
      There are two more codepaths whose flushing behavior is not changed by
      this patch, and we may need to come back to them in the future.  One is
      the writeback stats used by dirty throttling, and the second is the
      deactivation heuristic in reclaim.  For now we are keeping an eye on them,
      and if there is a report of a regression due to these codepaths, we will
      reevaluate then.
      
      Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com [1]
      Link: https://lkml.kernel.org/r/20220304184040.1304781-1-shakeelb@google.com
      Fixes: 1f828223 ("memcg: flush lruvec stats in the refault")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Reported-by: Daniel Dao <dqminh@cloudflare.com>
      Tested-by: Ivan Babrou <ivan@cloudflare.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Frank Hofmann <fhofmann@cloudflare.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 23 Mar 2022 (5 commits)
    • mm: memcontrol: rename memcg_cache_id to memcg_kmem_id · 7c52f65d
      Muchun Song authored
      The memcg_cache_id() introduced by commit 2633d7a0 ("slab/slub:
      consider a memcg parameter in kmem_create_cache") was used to index into
      the kmem_cache->memcg_params->memcg_caches array.  Since
      kmem_cache->memcg_params.memcg_caches has been removed by commit
      9855609b ("mm: memcg/slab: use a single set of kmem_caches for all
      accounted allocations"), the name no longer needs to be cache related.
      Rename it to memcg_kmem_id so that it reflects that it is kmem related.
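
      The renamed helper is tiny; roughly (sketch):

        static inline int memcg_kmem_id(struct mem_cgroup *memcg)
        {
                return memcg ? memcg->kmemcg_id : -1;
        }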
      
      Link: https://lkml.kernel.org/r/20220228122126.37293-17-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Cc: Chao Yu <chao@kernel.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Fam Zheng <fam.zheng@bytedance.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kari Argillander <kari.argillander@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: list_lru: replace linear array with xarray · bbca91cc
      Muchun Song authored
      If we run 10k containers in the system, the size of the
      list_lru_memcg->lrus array can be ~96KB per list_lru.  When we decrease
      the number of containers, the size of the array is not shrunk; it is not
      scalable.  The xarray is a good choice for this case: we can save a lot of
      memory when there are tens of thousands of containers in the system.
      Using an xarray also lets us remove the array-resizing logic, which
      simplifies the code.
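
      A sketch of the resulting lookup path, assuming field names lru->xa and
      mlru->node[nid] (the real layout may differ slightly):

        static inline struct list_lru_one *
        list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
        {
                struct list_lru_memcg *mlru;

                if (idx < 0)
                        return &lru->node[nid].lru;   /* root memcg list */

                /* per-memcg lists live in an xarray keyed by the kmem id */
                mlru = xa_load(&lru->xa, idx);
                return mlru ? &mlru->node[nid] : NULL;
        }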
      
      [akpm@linux-foundation.org: remove unused local]
      
      Link: https://lkml.kernel.org/r/20220228122126.37293-13-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Cc: Chao Yu <chao@kernel.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Fam Zheng <fam.zheng@bytedance.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kari Argillander <kari.argillander@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: introduce kmem_cache_alloc_lru · 88f2ef73
      Muchun Song authored
      We currently allocate scope for every memcg to be tracked on every
      superblock instantiated in the system, regardless of whether that
      superblock is even accessible to that memcg.
      
      These huge memcg counts come from container hosts where memcgs are
      confined to just a small subset of the total number of superblocks
      instantiated at any given point in time.
      
      For these systems with huge container counts, list_lru does not need the
      capability of tracking every memcg on every superblock.  What it comes
      down to is adding the memcg to the list_lru at the first insert.  So
      introduce kmem_cache_alloc_lru to allocate objects and their list_lru.
      In a later patch, we will convert all inode and dentry allocations from
      kmem_cache_alloc to kmem_cache_alloc_lru.
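
      For illustration, a caller would look roughly like this (the superblock
      and cache names are just examples):

        /* allocate an inode and, on first use, register the memcg with the
         * superblock's inode list_lru
         */
        struct inode *inode = kmem_cache_alloc_lru(inode_cachep,
                                                   &sb->s_inode_lru,
                                                   GFP_KERNEL);
        if (!inode)
                return NULL;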
      
      Link: https://lkml.kernel.org/r/20220228122126.37293-3-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Cc: Chao Yu <chao@kernel.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Fam Zheng <fam.zheng@bytedance.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kari Argillander <kari.argillander@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memcg: retrieve parent memcg from css.parent · 486bc706
      Wei Yang authored
      The parent we get from page_counter is correct, but these are two
      different hierarchies.
      
      Let's retrieve the parent memcg from css.parent, just like parent_cs(),
      blkcg_parent(), etc. do.
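
      The resulting helper is essentially a one-liner, roughly:

        static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
        {
                /* css.parent is NULL for the root, so this returns NULL there */
                return mem_cgroup_from_css(memcg->css.parent);
        }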
      
      Link: https://lkml.kernel.org/r/20220201004643.8391-2-richard.weiyang@gmail.com
      Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: add per-memcg total kernel memory stat · a8c49af3
      Yosry Ahmed authored
      Currently memcg stats show several types of kernel memory: kernel stack,
      page tables, sock, vmalloc, and slab.  However, there are other
      allocations with __GFP_ACCOUNT (or supersets such as GFP_KERNEL_ACCOUNT)
      that are not accounted in any of those stats, a few examples are:
      
       - various kvm allocations (e.g. allocated pages to create vcpus)
       - io_uring
       - tmp_page in pipes during pipe_write()
       - bpf ringbuffers
       - unix sockets
      
      Keeping track of the total kernel memory is essential for the ease of
      migration from cgroup v1 to v2 as there are large discrepancies between
      v1's kmem.usage_in_bytes and the sum of the available kernel memory
      stats in v2.  Adding separate memcg stats for all __GFP_ACCOUNT kernel
      allocations is an impractical maintenance burden, as there are a lot of
      those all over the kernel code, with more use cases likely to show up in
      the future.
      
      Therefore, add a "kernel" memcg stat that is analogous to the kmem page
      counter, with added benefits such as using the rstat infrastructure,
      which aggregates stats more efficiently.  Additionally, this provides a
      lighter alternative in case the legacy kmem is deprecated in the future.
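
      A sketch of how kmem charging can feed the new stat (names approximate;
      the real code may also keep the v1 kmem page_counter in sync):

        static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
        {
                /* rstat-backed stat; nr_pages may be negative on uncharge */
                mod_memcg_state(memcg, MEMCG_KMEM, nr_pages);
        }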
      
      [yosryahmed@google.com: v2]
        Link: https://lkml.kernel.org/r/20220203193856.972500-1-yosryahmed@google.com
      
      Link: https://lkml.kernel.org/r/20220201200823.3283171-1-yosryahmed@google.com
      Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 12 Feb 2022 (1 commit)
    • mm: memcg: synchronize objcg lists with a dedicated spinlock · 0764db9b
      Roman Gushchin authored
      Alexander reported a circular lock dependency revealed by the mmap1 ltp
      test:
      
        LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
                WARNING: possible circular locking dependency detected
                5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
                ------------------------------------------------------
                mmap1/202299 is trying to acquire lock:
                00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
                but task is already holding lock:
                00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
                which lock already depends on the new lock.
                the existing dependency chain (in reverse order) is:
                -> #1 (&sighand->siglock){-.-.}-{2:2}:
                       __lock_acquire+0x604/0xbd8
                       lock_acquire.part.0+0xe2/0x238
                       lock_acquire+0xb0/0x200
                       _raw_spin_lock_irqsave+0x6a/0xd8
                       __lock_task_sighand+0x90/0x190
                       cgroup_freeze_task+0x2e/0x90
                       cgroup_migrate_execute+0x11c/0x608
                       cgroup_update_dfl_csses+0x246/0x270
                       cgroup_subtree_control_write+0x238/0x518
                       kernfs_fop_write_iter+0x13e/0x1e0
                       new_sync_write+0x100/0x190
                       vfs_write+0x22c/0x2d8
                       ksys_write+0x6c/0xf8
                       __do_syscall+0x1da/0x208
                       system_call+0x82/0xb0
                -> #0 (css_set_lock){..-.}-{2:2}:
                       check_prev_add+0xe0/0xed8
                       validate_chain+0x736/0xb20
                       __lock_acquire+0x604/0xbd8
                       lock_acquire.part.0+0xe2/0x238
                       lock_acquire+0xb0/0x200
                       _raw_spin_lock_irqsave+0x6a/0xd8
                       obj_cgroup_release+0x4a/0xe0
                       percpu_ref_put_many.constprop.0+0x150/0x168
                       drain_obj_stock+0x94/0xe8
                       refill_obj_stock+0x94/0x278
                       obj_cgroup_charge+0x164/0x1d8
                       kmem_cache_alloc+0xac/0x528
                       __sigqueue_alloc+0x150/0x308
                       __send_signal+0x260/0x550
                       send_signal+0x7e/0x348
                       force_sig_info_to_task+0x104/0x180
                       force_sig_fault+0x48/0x58
                       __do_pgm_check+0x120/0x1f0
                       pgm_check_handler+0x11e/0x180
                other info that might help us debug this:
                 Possible unsafe locking scenario:
                       CPU0                    CPU1
                       ----                    ----
                  lock(&sighand->siglock);
                                               lock(css_set_lock);
                                               lock(&sighand->siglock);
                  lock(css_set_lock);
                 *** DEADLOCK ***
                2 locks held by mmap1/202299:
                 #0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
                 #1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
                stack backtrace:
                CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
                Hardware name: IBM 3906 M04 704 (LPAR)
                Call Trace:
                  dump_stack_lvl+0x76/0x98
                  check_noncircular+0x136/0x158
                  check_prev_add+0xe0/0xed8
                  validate_chain+0x736/0xb20
                  __lock_acquire+0x604/0xbd8
                  lock_acquire.part.0+0xe2/0x238
                  lock_acquire+0xb0/0x200
                  _raw_spin_lock_irqsave+0x6a/0xd8
                  obj_cgroup_release+0x4a/0xe0
                  percpu_ref_put_many.constprop.0+0x150/0x168
                  drain_obj_stock+0x94/0xe8
                  refill_obj_stock+0x94/0x278
                  obj_cgroup_charge+0x164/0x1d8
                  kmem_cache_alloc+0xac/0x528
                  __sigqueue_alloc+0x150/0x308
                  __send_signal+0x260/0x550
                  send_signal+0x7e/0x348
                  force_sig_info_to_task+0x104/0x180
                  force_sig_fault+0x48/0x58
                  __do_pgm_check+0x120/0x1f0
                  pgm_check_handler+0x11e/0x180
                INFO: lockdep is turned off.
      
      In this example a slab allocation from __send_signal() caused a
      refilling and draining of a percpu objcg stock, which resulted in the
      release of another, unrelated objcg.  The objcg release path requires
      taking the css_set_lock, which is used to synchronize objcg lists.
      
      This can create a circular dependency with the sighand lock, which the
      freezer code takes while holding css_set_lock (to freeze a task).
      
      In general it seems that using css_set_lock to synchronize objcg lists
      makes any slab allocation or deallocation under css_set_lock, or under
      any lock nested inside it, risky.
      
      To fix the problem and make the code more robust let's stop using
      css_set_lock to synchronize objcg lists and use a new dedicated spinlock
      instead.
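
      Sketched, the change replaces css_set_lock with a lock that protects
      only the objcg lists (illustrative, not the exact diff):

        static DEFINE_SPINLOCK(objcg_lock);

        /* e.g. in obj_cgroup_release(), instead of css_set_lock: */
        spin_lock_irqsave(&objcg_lock, flags);
        list_del(&objcg->list);
        spin_unlock_irqrestore(&objcg_lock, flags);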
      
      Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
      Fixes: bf4f0599 ("mm: memcg/slab: obj_cgroup API")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Reviewed-by: Waiman Long <longman@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
      Tested-by: Jeremy Linton <jeremy.linton@arm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 15 Jan 2022 (2 commits)
  17. 06 Jan 2022 (1 commit)
    • mm/memcg: Convert slab objcgs from struct page to struct slab · 4b5f8d9a
      Vlastimil Babka authored
      page->memcg_data is used with MEMCG_DATA_OBJCGS flag only for slab pages
      so convert all the related infrastructure to struct slab. Also use
      struct folio instead of struct page when resolving object pointers.
      
      This is not just a mechanistic change of types and names.  Now in
      mem_cgroup_from_obj() we use folio_test_slab() to decide whether to
      interpret the folio as a real slab rather than a large kmalloc, instead
      of relying on the MEMCG_DATA_OBJCGS bit that used to be checked in
      page_objcgs_check().
      Similarly in memcg_slab_free_hook() where we can encounter
      kmalloc_large() pages (here the folio slab flag check is implied by
      virt_to_slab()). As a result, page_objcgs_check() can be dropped instead
      of converted.
      
      To avoid include cycles, move the inline definition of slab_objcgs()
      from memcontrol.h to mm/slab.h.
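
      The moved helper reads the objcg vector stored in slab->memcg_data with
      the flag bits masked off, roughly (debug assertions omitted):

        static inline struct obj_cgroup **slab_objcgs(struct slab *slab)
        {
                unsigned long memcg_data = READ_ONCE(slab->memcg_data);

                return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
        }
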
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <cgroups@vger.kernel.org>
  18. 07 Nov 2021 (2 commits)
    • mm/vmpressure: fix data-race with memcg->socket_pressure · 7e6ec49c
      Yuanzheng Song authored
      When reading memcg->socket_pressure in mem_cgroup_under_socket_pressure()
      and writing memcg->socket_pressure in vmpressure() at the same time, the
      following data-race occurs:
      
        BUG: KCSAN: data-race in __sk_mem_reduce_allocated / vmpressure
      
        write to 0xffff8881286f4938 of 8 bytes by task 24550 on cpu 3:
         vmpressure+0x218/0x230 mm/vmpressure.c:307
         shrink_node_memcgs+0x2b9/0x410 mm/vmscan.c:2658
         shrink_node+0x9d2/0x11d0 mm/vmscan.c:2769
         shrink_zones+0x29f/0x470 mm/vmscan.c:2972
         do_try_to_free_pages+0x193/0x6e0 mm/vmscan.c:3027
         try_to_free_mem_cgroup_pages+0x1c0/0x3f0 mm/vmscan.c:3345
         reclaim_high mm/memcontrol.c:2440 [inline]
         mem_cgroup_handle_over_high+0x18b/0x4d0 mm/memcontrol.c:2624
         tracehook_notify_resume include/linux/tracehook.h:197 [inline]
         exit_to_user_mode_loop kernel/entry/common.c:164 [inline]
         exit_to_user_mode_prepare+0x110/0x170 kernel/entry/common.c:191
         syscall_exit_to_user_mode+0x16/0x30 kernel/entry/common.c:266
         ret_from_fork+0x15/0x30 arch/x86/entry/entry_64.S:289
      
        read to 0xffff8881286f4938 of 8 bytes by interrupt on cpu 1:
         mem_cgroup_under_socket_pressure include/linux/memcontrol.h:1483 [inline]
         sk_under_memory_pressure include/net/sock.h:1314 [inline]
         __sk_mem_reduce_allocated+0x1d2/0x270 net/core/sock.c:2696
         __sk_mem_reclaim+0x44/0x50 net/core/sock.c:2711
         sk_mem_reclaim include/net/sock.h:1490 [inline]
         ......
         net_rx_action+0x17a/0x480 net/core/dev.c:6864
         __do_softirq+0x12c/0x2af kernel/softirq.c:298
         run_ksoftirqd+0x13/0x20 kernel/softirq.c:653
         smpboot_thread_fn+0x33f/0x510 kernel/smpboot.c:165
         kthread+0x1fc/0x220 kernel/kthread.c:292
         ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
      
      Fix it by using READ_ONCE() and WRITE_ONCE() to read and write
      memcg->socket_pressure.
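
      The fix boils down to marking both sides of the racy access, roughly:

        /* writer side, in vmpressure(): */
        WRITE_ONCE(memcg->socket_pressure, jiffies + HZ);

        /* reader side, in mem_cgroup_under_socket_pressure(): */
        if (time_before(jiffies, READ_ONCE(memcg->socket_pressure)))
                return true;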
      
      Link: https://lkml.kernel.org/r/20211025082843.671690-1-songyuanzheng@huawei.com
      Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcontrol: remove the kmem states · e80216d9
      Muchun Song authored
      Now the kmem state is only used to indicate whether the kmem is offline.
      However, we can set ->kmemcg_id to -1 to indicate that instead.  So we
      can remove the kmem state to simplify the code.
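
      In other words, an "offline" check reduces to an id comparison, e.g.
      (hypothetical helper name):

        static inline bool memcg_kmem_is_offline(struct mem_cgroup *memcg)
        {
                return memcg->kmemcg_id == -1;
        }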
      
      Link: https://lkml.kernel.org/r/20211025125259.56624-1-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 18 Oct 2021 (1 commit)
  20. 27 Sep 2021 (10 commits)