1. 25 Feb 2021, 9 commits
    • mm, slub: remove slub_memcg_sysfs boot param and CONFIG_SLUB_MEMCG_SYSFS_ON · fe2cce15
      By Vlastimil Babka
      The boot param and config determine the value of memcg_sysfs_enabled,
      which is unused since commit 10befea9 ("mm: memcg/slab: use a single
      set of kmem_caches for all allocations") as there are no per-memcg kmem
      caches anymore.
      
      Link: https://lkml.kernel.org/r/20210127124745.7928-1-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, slub: splice cpu and page freelists in deactivate_slab() · d930ff03
      By Vlastimil Babka
      In deactivate_slab() we currently move all but one object on the cpu
      freelist to the page freelist one by one, using the costly cmpxchg_double()
      operation.  Then we unfreeze the page while moving the last object onto the
      page freelist, with a final cmpxchg_double().
      
      This can be optimized to avoid the cmpxchg_double() per object.  Just
      count the objects on the cpu freelist (to adjust page->inuse properly) and
      also remember the last object in the chain.  Then splice page->freelist onto
      the last object, effectively adding the whole cpu freelist to
      page->freelist while unfreezing the page, with a single cmpxchg_double().
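      
      The splice itself is ordinary singly-linked-list surgery.  A minimal
      user-space sketch of the idea (not the actual SLUB code; the freelist
      pointer encoding, page->inuse bookkeeping and the cmpxchg_double()
      publication step are all left out):
      
          #include <stdio.h>
          
          struct object { struct object *next; };
          
          /* Walk the cpu freelist once to count it and find its tail, then
           * attach the whole chain to the page freelist with one pointer
           * update instead of moving objects over one at a time. */
          static struct object *splice_freelists(struct object *cpu_freelist,
                                                 struct object *page_freelist,
                                                 int *moved)
          {
                  struct object *tail = cpu_freelist;
          
                  *moved = 0;
                  if (!cpu_freelist)
                          return page_freelist;
          
                  for (*moved = 1; tail->next; tail = tail->next)
                          (*moved)++;
          
                  tail->next = page_freelist;     /* the splice step */
                  return cpu_freelist;            /* new page freelist head */
          }
          
          int main(void)
          {
                  struct object objs[5] = {
                          { &objs[1] }, { &objs[2] }, { NULL },   /* cpu list  */
                          { &objs[4] }, { NULL }                  /* page list */
                  };
                  int moved;
                  struct object *head =
                          splice_freelists(&objs[0], &objs[3], &moved);
          
                  printf("moved %d objects, new head is objs[0]? %d\n",
                         moved, head == &objs[0]);
                  return 0;
          }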
      
      Link: https://lkml.kernel.org/r/20210115183543.15097-1-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Jann Horn <jannh@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, slab, slub: stop taking cpu hotplug lock · 59450bbc
      By Vlastimil Babka
      SLAB has been using get/put_online_cpus() around creating, destroying and
      shrinking kmem caches since 95402b38 ("cpu-hotplug: replace
      per-subsystem mutexes with get_online_cpus()") in 2008, which was supposed
      to replace a private mutex (cache_chain_mutex, called slab_mutex
      today) with a system-wide mechanism, but in the case of SLAB it is in fact
      used in addition to the existing mutex, without explanation why.
      
      SLUB appears to have avoided the cpu hotplug lock initially, but gained it
      due to common code unification, such as 20cea968 ("mm, sl[aou]b: Move
      kmem_cache_create mutex handling to common code").
      
      Regardless of the history, checking whether the hotplug lock is actually
      needed today suggests that it is not, and therefore it's better to avoid
      this system-wide lock and the ordering it imposes with respect to other
      locks (such as slab_mutex).
      
      Specifically, in SLAB we have for_each_online_cpu() in do_tune_cpucache()
      protected by slab_mutex, and cpu hotplug callbacks that also take the
      slab_mutex, which is also taken by the common slab functions that currently
      also take the hotplug lock.  Thus the slab_mutex protection should be
      sufficient.  Also, per-cpu array caches are allocated for each possible
      cpu, so they are not affected by the cpus' online/offline state.
      
      In SLUB we have for_each_online_cpu() in functions that show statistics
      and are already unprotected today, as racing with hotplug is not harmful.
      Otherwise SLUB relies on the percpu allocator.  The slub_cpu_dead() hotplug
      callback takes the slab_mutex.
      
      To sum up, this patch removes get/put_online_cpus() calls from slab as it
      should be safe without further adjustments.
      
      Link: https://lkml.kernel.org/r/20210113131634.3671-4-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Qian Cai <cai@redhat.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, slab, slub: stop taking memory hotplug lock · 7e1fa93d
      By Vlastimil Babka
      Since commit 03afc0e2 ("slab: get_online_mems for
      kmem_cache_{create,destroy,shrink}") we are taking the memory hotplug lock for
      SLAB and SLUB when creating, destroying or shrinking a cache.  It is quite
      a heavy lock and it's best to avoid it if possible, as we had several
      issues with lockdep complaining about ordering in the past, see e.g.
      e4f8e513 ("mm/slub: fix a deadlock in show_slab_objects()").
      
      The problem scenario in 03afc0e2 (solved by the memory hotplug lock)
      can be summarized as follows: while there's slab_mutex synchronizing new
      kmem cache creation and SLUB's MEM_GOING_ONLINE callback
      slab_mem_going_online_callback(), we may miss creation of kmem_cache_node
      for the hotplugged node in the new kmem cache, because the hotplug
      callback doesn't yet see the new cache, and cache creation in
      init_kmem_cache_nodes() only inits kmem_cache_node for nodes in the
      N_NORMAL_MEMORY nodemask, which however may not yet include the new node,
      as that happens only later after the MEM_GOING_ONLINE callback.
      
      Instead of using get/put_online_mems(), the problem can be solved by SLUB
      maintaining its own nodemask of nodes for which it has allocated the
      per-node kmem_cache_node structures.  This nodemask would generally mirror
      the N_NORMAL_MEMORY nodemask, but would be updated only under SLUB's
      control in its memory hotplug callbacks under the slab_mutex.  This patch
      adds such a nodemask and its handling.
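      
      A minimal sketch of that approach (simplified; treat the helper and
      variable names as illustrative of mm/slub.c rather than exact):
      
          /*
           * Nodes for which SLUB has allocated kmem_cache_node structures.
           * Updated only under slab_mutex from the memory hotplug callbacks,
           * so it may lag behind N_NORMAL_MEMORY but never lists a node
           * whose kmem_cache_node is missing.
           */
          static nodemask_t slab_nodes;
          
          static int init_kmem_cache_nodes(struct kmem_cache *s)
          {
                  int node;
          
                  /* Iterate SLUB's own mask instead of N_NORMAL_MEMORY. */
                  for_each_node_mask(node, slab_nodes) {
                          struct kmem_cache_node *n;
          
                          n = kmem_cache_alloc_node(kmem_cache_node,
                                                    GFP_KERNEL, node);
                          if (!n)
                                  return 0;       /* caller cleans up */
                          init_kmem_cache_node(n);
                          s->node[node] = n;
                  }
                  return 1;
          }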
      
      Commit 03afc0e2 mentions "issues like [the one above]", but there
      don't appear to be further issues.  All the paths (shared for SLAB and
      SLUB) taking the memory hotplug locks are also taking the slab_mutex,
      except kmem_cache_shrink() where 03afc0e2 replaced slab_mutex with
      get/put_online_mems().
      
      We cannot, however, simply restore slab_mutex in kmem_cache_shrink(), as
      SLUB can enter the function from a write to the sysfs 'shrink' file, thus
      holding the kernfs lock, and in kmem_cache_create() the kernfs lock is
      nested within slab_mutex.  But on closer inspection we don't actually need
      to protect kmem_cache_shrink() from hotplug callbacks: while SLUB's
      __kmem_cache_shrink() does for_each_kmem_cache_node(), missing a new node
      added by a parallel hotplug is not fatal, and a parallel hotremove no longer
      frees kmem_cache_node structures after the previous patch, so use-after-free
      cannot happen.  The per-node shrinking itself is protected by
      n->list_lock.  The same is true for SLAB, and SLOB is a no-op.
      
      SLAB also doesn't need the memory hotplug locking, which it only gained by
      03afc0e2 through the shared paths in slab_common.c.  Its memory
      hotplug callbacks are also protected by slab_mutex against races with
      these paths.  The problem of SLUB relying on N_NORMAL_MEMORY doesn't apply
      to SLAB, as its setup_kmem_cache_nodes relies on N_ONLINE, and the new
      node is already set there during the MEM_GOING_ONLINE callback, so no
      special care is needed for SLAB.
      
      As such, this patch removes all get/put_online_mems() usage by the slab
      subsystem.
      
      Link: https://lkml.kernel.org/r/20210113131634.3671-3-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Qian Cai <cai@redhat.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, slub: stop freeing kmem_cache_node structures on node offline · 666716fd
      By Vlastimil Babka
      Patch series "mm, slab, slub: remove cpu and memory hotplug locks".
      
      Some related work caused me to look at how we use get/put_online_mems()
      and get/put_online_cpus() during kmem cache
      creation/destruction/shrinking, and to realize that it should actually be
      safe to remove all of that with rather small effort (as e.g.  Michal Hocko
      already suspected in some past discussions).  This has the benefit of
      avoiding rather heavy locks that have caused locking order issues in
      the past.  So this is the result: patches 2 and 3 remove memory hotplug
      and cpu hotplug locking, respectively.  Patch 1 is due to the realization
      that some races in fact exist despite the locks (even if not removed), but
      the most sane solution is not to introduce more of them, but rather to
      accept some wasted memory in scenarios that should be rare anyway (full
      memory hot remove), as we do the same in other contexts already.
      
      This patch (of 3):
      
      Commit e4f8e513 ("mm/slub: fix a deadlock in show_slab_objects()")
      fixed a problematic locking order by removing the memory hotplug lock
      get/put_online_mems() from show_slab_objects().  During the discussion, it
      was argued [1] that this is OK, because existing slabs on the node would
      prevent a hotremove from proceeding.
      
      That's true, but per-node kmem_cache_node structures are not necessarily
      allocated on the same node and may exist even without actual slab pages on
      the same node.  Any path that uses get_node() directly or via
      for_each_kmem_cache_node() (such as show_slab_objects()) can race with
      freeing of kmem_cache_node even with the !NULL check, resulting in
      use-after-free.
      
      To that end, commit e4f8e513 argues in a comment that:
      
       * We don't really need mem_hotplug_lock (to hold off
       * slab_mem_going_offline_callback) here because slab's memory hot
       * unplug code doesn't destroy the kmem_cache->node[] data.
      
      While it's true that slab_mem_going_offline_callback() doesn't free the
      kmem_cache_node, the later callback slab_mem_offline_callback() actually
      does, so the race and use-after-free exist.  This affects not just
      show_slab_objects() after commit e4f8e513, but also many other places
      that are not under slab_mutex.  And adding slab_mutex locking or other
      synchronization to SLUB paths such as get_any_partial() would be bad for
      performance and error-prone.
      
      The easiest solution is therefore to make the above-mentioned comment true
      and stop freeing the kmem_cache_node structures, accepting some wasted
      memory in the full memory node removal scenario.  Analogously, we also
      don't free the hotremoved pgdat as mentioned in [1], nor the similar
      per-node structures in SLAB.  Importantly, this approach will not block the
      hotremove, as generally such nodes should be movable in order for hotremove
      to succeed in the first place, so the GFP_KERNEL-allocated
      kmem_cache_node will come from elsewhere.
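      
      For reference, the racy pattern being closed looks roughly like the
      following (a conceptual sketch, not one specific call site; assume s,
      node, n and nr_slabs are declared as in show_slab_objects()):
      
          /*
           * Reader side, e.g. show_slab_objects() or get_any_partial():
           * walks the per-node structures without slab_mutex or the memory
           * hotplug lock.  Before this patch, slab_mem_offline_callback()
           * could free s->node[nid] concurrently.
           */
          for_each_kmem_cache_node(s, node, n)    /* NULL nodes are skipped */
                  nr_slabs += node_nr_slabs(n);   /* n may already be freed */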
      
      [1] https://lore.kernel.org/linux-mm/20190924151147.GB23050@dhcp22.suse.cz/
      
      Link: https://lkml.kernel.org/r/20210113131634.3671-1-vbabka@suse.cz
      Link: https://lkml.kernel.org/r/20210113131634.3671-2-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Qian Cai <cai@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slub: disable user tracing for kmemleak caches by default · ca220593
      By Johannes Berg
      If kmemleak is enabled, it uses a kmem cache for its own objects.  These
      objects are used to hold information kmemleak uses, including a stack
      trace.  If slub_debug is also turned on, each of them has *another* stack
      trace, so the overhead adds up, and in my tests (on ARCH=um, admittedly)
      2/3rds of the allocations end up doing the stack tracing.
      
      Turn off SLAB_STORE_USER if SLAB_NOLEAKTRACE was given, to avoid storing
      essentially the same data twice.
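      
      The core of the change is a one-flag adjustment where SLUB computes the
      debug flags for a cache; a sketch of the idea (signature simplified; the
      real kmem_cache_flags() also parses the slub_debug= option string, which
      is omitted here):
      
          slab_flags_t kmem_cache_flags(unsigned int object_size,
                                        slab_flags_t flags, const char *name)
          {
                  slab_flags_t debug_flags = slub_debug;
          
                  /*
                   * Caches created for kmemleak's own tracking objects
                   * (SLAB_NOLEAKTRACE) already record a stack trace per
                   * object, so don't also force SLAB_STORE_USER by default.
                   */
                  if (flags & SLAB_NOLEAKTRACE)
                          debug_flags &= ~SLAB_STORE_USER;
          
                  return flags | debug_flags;
          }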
      
      Link: https://lkml.kernel.org/r/20210113215114.d94efa13ba30.I117b6764e725b3192318bbcf4269b13b709539ae@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slab: minor coding style tweaks · 0b411634
      By Zhiyuan Dai
      Fix some coding style issues to improve code readability.  Add whitespace
      to clearly separate the parameters.
      
      Link: https://lkml.kernel.org/r/1612841499-32166-1-git-send-email-daizhiyuan@phytium.com.cn
      Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/sl?b.c: remove ctor argument from kmem_cache_flags · 37540008
      By Nikolay Borisov
      This argument hasn't been used since e153362a ("slub: Remove objsize
      check in kmem_cache_flags()") so simply remove it.
      
      Link: https://lkml.kernel.org/r/20210126095733.974665-1-nborisov@suse.com
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Christoph Lameter <cl@linux.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, tracing: record slab name for kmem_cache_free() · 3544de8e
      By Jacob Wen
      Currently, a trace record generated by the RCU core is as below.
      
      ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f3b49a66
      
      It doesn't tell us what the RCU core has freed.
      
      This patch adds the slab name to trace_kmem_cache_free().
      The new format is as follows.
      
      ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000037f79c8d name=dentry
      ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f78cb7b5 name=sock_inode_cache
      ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000018768985 name=pool_workqueue
      ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=000000006a6cb484 name=radix_tree_node
      
      We can use it to understand what the RCU core is going to free.  For
      example, some users may be interested in when the RCU core starts
      freeing reclaimable slabs like dentry to reduce memory pressure.
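      
      A sketch of the tracepoint change that produces the name= field
      (simplified from include/trace/events/kmem.h; the field layout and
      printk format shown here are illustrative, not verbatim):
      
          TRACE_EVENT(kmem_cache_free,
          
                  TP_PROTO(unsigned long call_site, const void *ptr,
                           const char *name),
          
                  TP_ARGS(call_site, ptr, name),
          
                  TP_STRUCT__entry(
                          __field(unsigned long, call_site)
                          __field(const void *, ptr)
                          __string(name, name)            /* the new field */
                  ),
          
                  TP_fast_assign(
                          __entry->call_site = call_site;
                          __entry->ptr = ptr;
                          __assign_str(name, name);
                  ),
          
                  TP_printk("call_site=%pS ptr=%p name=%s",
                            (void *)__entry->call_site, __entry->ptr,
                            __get_str(name))
          );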
      
      Link: https://lkml.kernel.org/r/20201216072804.8838-1-jian.w.wen@oracle.com
      Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 15 Feb 2021, 2 commits
    • percpu: fix clang modpost section mismatch · 258e0815
      By Dennis Zhou
      pcpu_build_alloc_info() is an __init function that makes a call to
      cpumask_clear_cpu().  With CONFIG_GCOV_PROFILE_ALL enabled, the inline
      heuristics are modified and cpumask_clear_cpu(), although marked inline,
      doesn't get inlined.  Because it works on a mask in __initdata, modpost
      throws a section mismatch error.
      
      Arnd sent a patch with the flatten attribute as an alternative [2]. I've
      added it to compiler_attributes.h.
      
      modpost complaint:
        WARNING: modpost: vmlinux.o(.text+0x735425): Section mismatch in reference from the function cpumask_clear_cpu() to the variable .init.data:pcpu_build_alloc_info.mask
        The function cpumask_clear_cpu() references
        the variable __initdata pcpu_build_alloc_info.mask.
        This is often because cpumask_clear_cpu lacks a __initdata
        annotation or the annotation of pcpu_build_alloc_info.mask is wrong.
      
      clang output:
        mm/percpu.c:2724:5: remark: cpumask_clear_cpu not inlined into pcpu_build_alloc_info because too costly to inline (cost=725, threshold=325) [-Rpass-missed=inline]
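      
      A sketch of the resulting fix as described above (the __flatten macro
      name and the pcpu_build_alloc_info() prototype are assumptions; see the
      actual patch for the exact form):
      
          /* include/linux/compiler_attributes.h */
          #define __flatten __attribute__((flatten))
          
          /*
           * mm/percpu.c: force callees such as cpumask_clear_cpu() to be
           * inlined into this __init function, so no out-of-line copy ends
           * up referencing the __initdata mask.
           */
          static struct pcpu_alloc_info * __init __flatten
          pcpu_build_alloc_info(size_t reserved_size, size_t dyn_size,
                                size_t atom_size,
                                pcpu_fc_cpu_distance_fn_t cpu_distance_fn);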
      
      [1] https://lore.kernel.org/linux-mm/202012220454.9F6Bkz9q-lkp@intel.com/
      [2] https://lore.kernel.org/lkml/CAK8P3a2ZWfNeXKSm8K_SUhhwkor17jFo3xApLXjzfPqX0eUDUA@mail.gmail.com/
      Reported-by: kernel test robot <lkp@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
    • percpu: reduce the number of cpu distance comparisons · d7d29ac7
      By Wonhyuk Yang
      To build group_map[] and group_cnt[], we find out which group each CPU
      belongs to by comparing CPU distances.  However, this includes cases
      where comparisons are not required.
      
      This patch uses a bitmap to record the CPUs that have not yet been
      classified into a group.  CPUs whose group is already known are cleared
      from the bitmap, so they are not compared again.  As a result, we can
      reduce the number of unnecessary comparisons.
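      
      A self-contained sketch of the technique (a user-space analogue, not the
      kernel code: the boolean array, the toy distance() function and the
      LOCAL_DISTANCE value stand in for the cpumask and distance callback used
      by pcpu_build_alloc_info()):
      
          #include <stdbool.h>
          #include <stdio.h>
          
          #define NR_CPUS        8
          #define LOCAL_DISTANCE 10
          
          static int distance(int a, int b)
          {
                  /* toy topology: CPUs in the same half are "local" */
                  return (a / 4 == b / 4) ? LOCAL_DISTANCE : 20;
          }
          
          int main(void)
          {
                  bool unclassified[NR_CPUS];
                  int group_map[NR_CPUS], group_cnt[NR_CPUS] = { 0 };
                  int nr_groups = 0, comparisons = 0;
          
                  for (int cpu = 0; cpu < NR_CPUS; cpu++)
                          unclassified[cpu] = true;
          
                  for (int cpu = 0; cpu < NR_CPUS; cpu++) {
                          if (!unclassified[cpu])
                                  continue;       /* group already known */
                          int group = nr_groups++;
          
                          /* Compare only against still-unclassified CPUs;
                           * clearing a CPU here means it is never compared
                           * again in later iterations. */
                          for (int tcpu = cpu; tcpu < NR_CPUS; tcpu++) {
                                  if (!unclassified[tcpu])
                                          continue;
                                  comparisons++;
                                  if (distance(cpu, tcpu) == LOCAL_DISTANCE) {
                                          group_map[tcpu] = group;
                                          group_cnt[group]++;
                                          unclassified[tcpu] = false;
                                  }
                          }
                  }
                  printf("groups=%d comparisons=%d\n", nr_groups, comparisons);
                  for (int g = 0; g < nr_groups; g++)
                          printf("group%d has %d cpus\n", g, group_cnt[g]);
                  return 0;
          }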
      Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      [Dennis: added cpumask_clear() call and #include cpumask.h.]
  3. 13 Feb 2021, 1 commit
  4. 11 Feb 2021, 2 commits
  5. 10 Feb 2021, 4 commits
    • mm: simplify swapdev_block · f885056a
      By Christoph Hellwig
      Open code the parts of map_swap_entry() that were actually used by
      swapdev_block(), and remove the now unused map_swap_entry() function.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" · e82553c1
      By Johannes Weiner
      This reverts commit 536d3bf2, as it can
      cause writers to memory.high to get stuck in the kernel forever,
      performing page reclaim and consuming excessive amounts of CPU cycles.
      
      Before the patch, a write to memory.high would first put the new limit
      in place for the workload, and then reclaim the requested delta.  After
      the patch, the kernel tries to reclaim the delta before putting the new
      limit into place, in order to not overwhelm the workload with a sudden,
      large excess over the limit.  However, if reclaim is actively racing
      with new allocations from the uncurbed workload, it can keep the write()
      working inside the kernel indefinitely.
      
      This is causing problems in Facebook production.  A privileged
      system-level daemon that adjusts memory.high for various workloads
      running on a host can get unexpectedly stuck in the kernel and
      essentially turn into a sort of involuntary kswapd for one of the
      workloads.  We've observed the daemon busy-spinning in a write() for
      minutes at a time, neglecting its other duties on the system, and
      expending privileged system resources on behalf of a workload.
      
      To remedy this, we have first considered changing the reclaim logic to
      break out after a couple of loops - whether the workload has converged
      to the new limit or not - and bound the write() call this way.  However,
      the root cause that inspired the sequence change in the first place has
      been fixed through other means, and so a revert back to the proven
      limit-setting sequence, also used by memory.max, is preferable.
      
      The sequence was changed to avoid extreme latencies in the workload when
      the limit was lowered: the sudden, large excess created by the limit
      lowering would erroneously trigger the penalty sleeping code that is
      meant to throttle excessive growth from below.  Allocating threads could
      end up sleeping long after the write() had already reclaimed the delta
      for which they were being punished.
      
      However, erroneous throttling also caused problems in other scenarios at
      around the same time.  This resulted in commit b3ff9291 ("mm, memcg:
      reclaim more aggressively before high allocator throttling"), included
      in the same release as the offending commit.  When allocating threads
      now encounter large excess caused by a racing write() to memory.high,
      instead of entering punitive sleeps, they will simply be tasked with
      helping reclaim down the excess, and will be held no longer than it
      takes to accomplish that.  This is in line with regular limit
      enforcement - i.e.  if the workload allocates up against or over an
      otherwise unchanged limit from below.
      
      With the patch breaking userspace, and the root cause addressed by other
      means already, revert it again.
      
      Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org
      Fixes: 536d3bf2 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Tejun Heo <tj@kernel.org>
      Acked-by: Chris Down <chris@chrisdown.name>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: <stable@vger.kernel.org>	[5.8+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mremap: fix BUILD_BUG_ON() error in get_extent · a30a2909
      By Arnd Bergmann
      clang can't evaluate this function argument at compile time when the
      function is not inlined, which leads to a link time failure:
      
        ld.lld: error: undefined symbol: __compiletime_assert_414
        >>> referenced by mremap.c
        >>>               mremap.o:(get_extent) in archive mm/built-in.a
      
      Mark the function as __always_inline to avoid it.
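      
      A sketch of the fix (the parameter list of get_extent() is an assumption
      based on its role in mm/mremap.c):
      
          /*
           * The BUILD_BUG_ON() inside get_extent() can only be discharged
           * when clang sees a compile-time-constant 'entry' at every call
           * site, i.e. when the function is always inlined.
           */
          static __always_inline unsigned long get_extent(enum pgt_entry entry,
                          unsigned long old_addr, unsigned long old_end,
                          unsigned long new_addr);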
      
      Link: https://lkml.kernel.org/r/20201230154104.522605-1-arnd@kernel.org
      Fixes: 9ad9718b ("mm/mremap: calculate extent in one place")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Tested-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kasan: fix stack traces dependency for HW_TAGS · 1cc4cdb5
      By Andrey Konovalov
      Currently, whether alloc/free stack trace collection is enabled by
      default for hardware tag-based KASAN depends on CONFIG_DEBUG_KERNEL.
      The intention of this dependency was to only enable collection on slow
      debug kernels due to a significant performance and memory impact.
      
      As it turns out, CONFIG_DEBUG_KERNEL is not considered a debug option
      and is enabled on many production kernels, including Android and Ubuntu.
      As a result, this dependency is pointless and only complicates the
      code and documentation.
      
      Having stack trace collection disabled by default would make the
      hardware mode work differently from the software modes, which is
      confusing.
      
      This change removes the dependency and enables stack traces collection
      by default.
      
      Looking into the future, this default might make sense for production
      kernels, assuming we implement a fast stack trace collection approach.
      
      Link: https://lkml.kernel.org/r/6678d77ceffb71f1cff2cf61560e2ffe7bb6bfe9.1612808820.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 09 Feb 2021, 1 commit
    • mm: provide a saner PTE walking API for modules · 9fd6dad1
      By Paolo Bonzini
      Currently, the follow_pfn function is exported for modules but
      follow_pte is not.  However, follow_pfn is very easy to misuse,
      because it does not provide protections (so most of its callers
      assume the page is writable!) and because it returns after having
      already unlocked the page table lock.
      
      Provide instead a simplified version of follow_pte that does
      not have the pmdpp and range arguments.  The older version
      survives as follow_invalidate_pte() for use by fs/dax.c.
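      
      A sketch of the resulting interface split as described above (prototypes
      only; the exact parameter names are illustrative):
      
          /*
           * Full-featured variant kept for fs/dax.c: can also return the pmd
           * and fill an mmu_notifier_range for invalidation.
           */
          int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
                                    struct mmu_notifier_range *range,
                                    pte_t **ptepp, pmd_t **pmdpp,
                                    spinlock_t **ptlp);
          
          /*
           * Simplified variant exported for modules: just the pte and its
           * lock; the caller unmaps and unlocks when done.
           */
          int follow_pte(struct mm_struct *mm, unsigned long address,
                         pte_t **ptepp, spinlock_t **ptlp);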
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 07 Feb 2021, 1 commit
    • mm: page_frag: Introduce page_frag_alloc_align() · b358e212
      By Kevin Hao
      In the current implementation, page_frag_alloc() doesn't give any
      alignment guarantee for the returned buffer address.  But some hardware
      does require the DMA buffer to be aligned correctly, so we would have to
      use workarounds like the one below if buffers allocated by
      page_frag_alloc() are used by such hardware for DMA.
          buf = page_frag_alloc(really_needed_size + align);
          buf = PTR_ALIGN(buf, align);
      
      This code seems ugly and would waste a lot of memory if the buffers are
      used in a network driver for TX/RX.  So introduce page_frag_alloc_align()
      to make sure that an aligned buffer address is returned.
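      
      For comparison, a sketch of a caller using the new helper (the exact
      signature - in particular whether the last argument is an alignment or
      an alignment mask - is an assumption here; check the header in that
      kernel for the real prototype):
      
          /* hypothetical driver-side helper using the new API */
          static void *rx_buf_alloc(struct page_frag_cache *nc, unsigned int size)
          {
                  /* aligned at allocation time: no over-allocation and no
                   * PTR_ALIGN() fixup needed */
                  return page_frag_alloc_align(nc, size, GFP_ATOMIC,
                                               SMP_CACHE_BYTES);
          }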
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  8. 06 Feb 2021, 11 commits
  9. 30 Jan 2021, 3 commits
  10. 29 Jan 2021, 1 commit
    • Revert "mm/slub: fix a memory leak in sysfs_slab_add()" · 757fed1d
      By Wang Hai
      This reverts commit dde3c6b7.
      
      syzbot reported a double-free bug.  The following case can cause this bug.
      
       - mm/slab_common.c: create_cache(): if the __kmem_cache_create() fails,
         it does:
      
      	out_free_cache:
      		kmem_cache_free(kmem_cache, s);
      
       - but __kmem_cache_create() - at least for SLUB - will have done
      
      	sysfs_slab_add(s)
      		-> sysfs_create_group() .. fails ..
      		-> kobject_del(&s->kobj); .. which frees s ...
      
      We can't remove the kmem_cache_free() in create_cache(), because other
      error cases of __kmem_cache_create() do not free this.
      
      So, revert the commit dde3c6b7 ("mm/slub: fix a memory leak in
      sysfs_slab_add()") to fix this.
      
      Reported-by: syzbot+d0bd96b4696c1ef67991@syzkaller.appspotmail.com
      Fixes: dde3c6b7 ("mm/slub: fix a memory leak in sysfs_slab_add()")
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 28 Jan 2021, 3 commits
  12. 27 Jan 2021, 1 commit
  13. 25 Jan 2021, 1 commit