1. 19 Mar 2021, 1 commit
  2. 13 Feb 2021, 1 commit
  3. 11 Feb 2021, 2 commits
  4. 10 Feb 2021, 4 commits
    • mm: simplify swapdev_block · f885056a
      Christoph Hellwig authored
      Open code the parts of map_swap_entry that were actually used by
      swapdev_block, and remove the now-unused map_swap_entry function.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
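      A minimal sketch of what the open-coded result plausibly looks like,
      assuming the existing swap_type_to_swap_info() and
      offset_to_swap_extent() helpers; an illustration, not the verbatim
      kernel diff:

          sector_t swapdev_block(int type, pgoff_t offset)
          {
                  struct swap_info_struct *si = swap_type_to_swap_info(type);
                  struct swap_extent *se;

                  /* refuse swap devices that are missing or not writable */
                  if (!si || !(si->flags & SWP_WRITEOK))
                          return 0;
                  /* map the page offset through the device's extent list */
                  se = offset_to_swap_extent(si, offset);
                  return se->start_block + (offset - se->start_page);
          }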
    • Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" · e82553c1
      Johannes Weiner authored
      This reverts commit 536d3bf2, as it can
      cause writers to memory.high to get stuck in the kernel forever,
      performing page reclaim and consuming excessive amounts of CPU cycles.
      
      Before the patch, a write to memory.high would first put the new limit
      in place for the workload, and then reclaim the requested delta.  After
      the patch, the kernel tries to reclaim the delta before putting the new
      limit into place, in order to not overwhelm the workload with a sudden,
      large excess over the limit.  However, if reclaim is actively racing
      with new allocations from the uncurbed workload, it can keep the write()
      working inside the kernel indefinitely.
      
      This is causing problems in Facebook production.  A privileged
      system-level daemon that adjusts memory.high for various workloads
      running on a host can get unexpectedly stuck in the kernel and
      essentially turn into a sort of involuntary kswapd for one of the
      workloads.  We've observed that daemon busy-spin in a write() for
      minutes at a time, neglecting its other duties on the system, and
      expending privileged system resources on behalf of a workload.
      
      To remedy this, we have first considered changing the reclaim logic to
      break out after a couple of loops - whether the workload has converged
      to the new limit or not - and bound the write() call this way.  However,
      the root cause that inspired the sequence change in the first place has
      been fixed through other means, and so a revert back to the proven
      limit-setting sequence, also used by memory.max, is preferable.
      
      The sequence was changed to avoid extreme latencies in the workload when
      the limit was lowered: the sudden, large excess created by the limit
      lowering would erroneously trigger the penalty sleeping code that is
      meant to throttle excessive growth from below.  Allocating threads could
      end up sleeping long after the write() had already reclaimed the delta
      for which they were being punished.
      
      However, erroneous throttling also caused problems in other scenarios at
      around the same time.  This resulted in commit b3ff9291 ("mm, memcg:
      reclaim more aggressively before high allocator throttling"), included
      in the same release as the offending commit.  When allocating threads
      now encounter large excess caused by a racing write() to memory.high,
      instead of entering punitive sleeps, they will simply be tasked with
      helping reclaim down the excess, and will be held no longer than it
      takes to accomplish that.  This is in line with regular limit
      enforcement - i.e.  if the workload allocates up against or over an
      otherwise unchanged limit from below.
      
      With the patch breaking userspace, and the root cause addressed by other
      means already, revert it again.
      
      Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org
      Fixes: 536d3bf2 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Tejun Heo <tj@kernel.org>
      Acked-by: Chris Down <chris@chrisdown.name>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: <stable@vger.kernel.org>	[5.8+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
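      The ordering difference at the heart of this revert, as a hedged
      sketch: page_counter_set_high() is the real limit setter, while
      reclaim_high_delta() is a hypothetical stand-in for the reclaim loop
      in memory_high_write():

          /* proven sequence (restored by the revert, same as memory.max):
           * publish the new limit first, then reclaim the delta */
          page_counter_set_high(&memcg->memory, high);
          reclaim_high_delta(memcg);      /* hypothetical helper */

          /* reverted sequence: reclaim before the limit is in place;
           * racing allocations can keep the write() looping forever */
          reclaim_high_delta(memcg);      /* hypothetical helper */
          page_counter_set_high(&memcg->memory, high);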
    • mm/mremap: fix BUILD_BUG_ON() error in get_extent · a30a2909
      Arnd Bergmann authored
      clang can't evaluate this function argument at compile time when the
      function is not inlined, which leads to a link time failure:
      
        ld.lld: error: undefined symbol: __compiletime_assert_414
        >>> referenced by mremap.c
        >>>               mremap.o:(get_extent) in archive mm/built-in.a
      
      Mark the function as __always_inline to avoid it.
      
      Link: https://lkml.kernel.org/r/20201230154104.522605-1-arnd@kernel.org
      Fixes: 9ad9718b ("mm/mremap: calculate extent in one place")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Tested-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
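      A standalone sketch of the failure mode, under stated assumptions:
      BUILD_BUG_ON() is modeled by a call to a deliberately undefined
      function (the way the kernel's compile-time asserts work), the enum
      and sizes are simplified stand-ins for the real get_extent(), and it
      must be compiled with optimization (the kernel always is):

          #define __always_inline inline __attribute__((__always_inline__))

          extern void __compiletime_assert_414(void);    /* never defined */
          #define BUILD_BUG_ON_SKETCH(cond) \
                  do { if (cond) __compiletime_assert_414(); } while (0)

          enum pgt_entry { NORMAL_PMD, NORMAL_PUD };

          /* With plain "static inline", clang may emit this out of line;
           * "entry" is then a runtime value, the assert below is not
           * dead-code-eliminated, and the undefined symbol survives to
           * link time.  __always_inline forces constant folding at every
           * call site, letting the compiler prove the assert unreachable. */
          static __always_inline unsigned long get_extent(enum pgt_entry entry)
          {
                  switch (entry) {
                  case NORMAL_PMD:
                          return 1UL << 21;       /* PMD_SIZE */
                  case NORMAL_PUD:
                          return 1UL << 30;       /* PUD_SIZE */
                  }
                  BUILD_BUG_ON_SKETCH(1);         /* unreachable for valid input */
                  return 0;
          }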
    • kasan: fix stack traces dependency for HW_TAGS · 1cc4cdb5
      Andrey Konovalov authored
      Currently, whether alloc/free stack trace collection is enabled by
      default for hardware tag-based KASAN depends on CONFIG_DEBUG_KERNEL.
      The intention behind this dependency was to enable collection only on
      slow debug kernels, due to its significant performance and memory
      impact.
      
      As it turns out, CONFIG_DEBUG_KERNEL is not considered a debug option
      and is enabled on many production kernels, including Android and
      Ubuntu.  As a result, this dependency is pointless and only
      complicates the code and documentation.
      
      Having stack trace collection disabled by default would make the
      hardware mode work differently from the software ones, which is
      confusing.
      
      This change removes the dependency and enables stack trace collection
      by default.

      Looking into the future, this default might make sense for production
      kernels, assuming we implement a fast stack trace collection approach.
      
      Link: https://lkml.kernel.org/r/6678d77ceffb71f1cff2cf61560e2ffe7bb6bfe9.1612808820.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
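      A hedged sketch of the default-selection change in the HW_TAGS init
      path, assuming the kasan_flag_stacktrace static key used in
      mm/kasan/hw_tags.c; the surrounding boot-parameter handling is
      elided:

          /* before: collection followed CONFIG_DEBUG_KERNEL */
          if (IS_ENABLED(CONFIG_DEBUG_KERNEL))
                  static_branch_enable(&kasan_flag_stacktrace);

          /* after: on by default, still overridable on the kernel
           * command line via kasan.stacktrace=off */
          static_branch_enable(&kasan_flag_stacktrace);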
  5. 09 Feb 2021, 1 commit
    • mm: provide a saner PTE walking API for modules · 9fd6dad1
      Paolo Bonzini authored
      Currently, the follow_pfn function is exported for modules but
      follow_pte is not.  However, follow_pfn is very easy to misuse,
      because it does not provide protections (so most of its callers
      assume the page is writable!) and because it returns after having
      already unlocked the page table lock.
      
      Provide instead a simplified version of follow_pte that does
      not have the pmdpp and range arguments.  The older version
      survives as follow_invalidate_pte() for use by fs/dax.c.
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
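      The resulting split, sketched as prototypes per the description above
      (argument order is assumed from the commit's wording, not verified
      against the tree):

          /* simplified variant, safe to export to modules */
          int follow_pte(struct mm_struct *mm, unsigned long address,
                         pte_t **ptepp, spinlock_t **ptlp);

          /* older variant with the pmdpp and range arguments,
           * kept for fs/dax.c */
          int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
                                    struct mmu_notifier_range *range,
                                    pte_t **ptepp, pmd_t **pmdpp,
                                    spinlock_t **ptlp);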
  6. 07 Feb 2021, 1 commit
    • mm: page_frag: Introduce page_frag_alloc_align() · b358e212
      Kevin Hao authored
      The current implementation of page_frag_alloc() makes no alignment
      guarantee for the returned buffer address.  But some hardware
      requires DMA buffers to be correctly aligned, so if buffers
      allocated by page_frag_alloc() are used by such hardware for DMA,
      we would have to resort to workarounds like this:
          buf = page_frag_alloc(really_needed_size + align);
          buf = PTR_ALIGN(buf, align);
      
      This code is ugly and wastes a lot of memory if the buffers are used
      by a network driver for TX/RX.  So introduce page_frag_alloc_align()
      to make sure that an aligned buffer address is returned (a usage
      sketch follows below).
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
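      A hedged usage sketch, assuming the signature this patch introduces
      (a page_frag_cache pointer, the size, gfp flags, and a power-of-two
      alignment) inside a hypothetical driver allocation path:

          /* one call, no over-allocation or manual rounding */
          buf = page_frag_alloc_align(nc, really_needed_size,
                                      GFP_ATOMIC, align);
          if (!buf)
                  return -ENOMEM;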
  7. 06 Feb 2021, 11 commits
  8. 30 Jan 2021, 3 commits
  9. 29 Jan 2021, 1 commit
    • Revert "mm/slub: fix a memory leak in sysfs_slab_add()" · 757fed1d
      Wang Hai authored
      This reverts commit dde3c6b7.
      
      syzbot reported a double-free bug.  The following case can cause this
      bug:
      
       - mm/slab_common.c: create_cache(): if the __kmem_cache_create() fails,
         it does:
      
      	out_free_cache:
      		kmem_cache_free(kmem_cache, s);
      
       - but __kmem_cache_create() - at least for SLUB - will have done
      
      	sysfs_slab_add(s)
      		-> sysfs_create_group() .. fails ..
      		-> kobject_del(&s->kobj); .. which frees s ...
      
      We can't remove the kmem_cache_free() in create_cache(), because other
      error cases of __kmem_cache_create() do not free this.
      
      So, revert the commit dde3c6b7 ("mm/slub: fix a memory leak in
      sysfs_slab_add()") to fix this.
      
      Reported-by: syzbot+d0bd96b4696c1ef67991@syzkaller.appspotmail.com
      Fixes: dde3c6b7 ("mm/slub: fix a memory leak in sysfs_slab_add()")
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
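      The two frees, condensed into one hedged sketch from the call chains
      above (error handling and unrelated code elided; not the verbatim
      kernel source):

          err = __kmem_cache_create(s, flags);    /* on SLUB, a failing
                                                   * sysfs_slab_add() has
                                                   * already torn down the
                                                   * kobject, freeing s */
          if (err)
                  goto out_free_cache;
          ...
          out_free_cache:
                  kmem_cache_free(kmem_cache, s); /* frees s again */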
  10. 28 Jan 2021, 3 commits
  11. 27 Jan 2021, 1 commit
  12. 25 Jan 2021, 11 commits