- 25 2月, 2021 9 次提交
-
-
由 Vlastimil Babka 提交于
The boot param and config determine the value of memcg_sysfs_enabled, which is unused since commit 10befea9 ("mm: memcg/slab: use a single set of kmem_caches for all allocations") as there are no per-memcg kmem caches anymore. Link: https://lkml.kernel.org/r/20210127124745.7928-1-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NRoman Gushchin <guro@fb.com> Acked-by: NDavid Rientjes <rientjes@google.com> Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vlastimil Babka 提交于
In deactivate_slab() we currently move all but one objects on the cpu freelist to the page freelist one by one using the costly cmpxchg_double() operation. Then we unfreeze the page while moving the last object on page freelist, with a final cmpxchg_double(). This can be optimized to avoid the cmpxchg_double() per object. Just count the objects on cpu freelist (to adjust page->inuse properly) and also remember the last object in the chain. Then splice page->freelist to the last object and effectively add the whole cpu freelist to page->freelist while unfreezing the page, with a single cmpxchg_double(). Link: https://lkml.kernel.org/r/20210115183543.15097-1-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NJann Horn <jannh@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vlastimil Babka 提交于
SLAB has been using get/put_online_cpus() around creating, destroying and shrinking kmem caches since 95402b38 ("cpu-hotplug: replace per-subsystem mutexes with get_online_cpus()") in 2008, which is supposed to be replacing a private mutex (cache_chain_mutex, called slab_mutex today) with system-wide mechanism, but in case of SLAB it's in fact used in addition to the existing mutex, without explanation why. SLUB appears to have avoided the cpu hotplug lock initially, but gained it due to common code unification, such as 20cea968 ("mm, sl[aou]b: Move kmem_cache_create mutex handling to common code"). Regardless of the history, checking if the hotplug lock is actually needed today suggests that it's not, and therefore it's better to avoid this system-wide lock and the ordering this imposes wrt other locks (such as slab_mutex). Specifically, in SLAB we have for_each_online_cpu() in do_tune_cpucache() protected by slab_mutex, and cpu hotplug callbacks that also take the slab_mutex, which is also taken by the common slab function that currently also take the hotplug lock. Thus the slab_mutex protection should be sufficient. Also per-cpu array caches are allocated for each possible cpu, so not affected by their online/offline state. In SLUB we have for_each_online_cpu() in functions that show statistics and are already unprotected today, as racing with hotplug is not harmful. Otherwise SLUB relies on percpu allocator. The slub_cpu_dead() hotplug callback takes the slab_mutex. To sum up, this patch removes get/put_online_cpus() calls from slab as it should be safe without further adjustments. Link: https://lkml.kernel.org/r/20210113131634.3671-4-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Qian Cai <cai@redhat.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vlastimil Babka 提交于
Since commit 03afc0e2 ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}") we are taking memory hotplug lock for SLAB and SLUB when creating, destroying or shrinking a cache. It is quite a heavy lock and it's best to avoid it if possible, as we had several issues with lockdep complaining about ordering in the past, see e.g. e4f8e513 ("mm/slub: fix a deadlock in show_slab_objects()"). The problem scenario in 03afc0e2 (solved by the memory hotplug lock) can be summarized as follows: while there's slab_mutex synchronizing new kmem cache creation and SLUB's MEM_GOING_ONLINE callback slab_mem_going_online_callback(), we may miss creation of kmem_cache_node for the hotplugged node in the new kmem cache, because the hotplug callback doesn't yet see the new cache, and cache creation in init_kmem_cache_nodes() only inits kmem_cache_node for nodes in the N_NORMAL_MEMORY nodemask, which however may not yet include the new node, as that happens only later after the MEM_GOING_ONLINE callback. Instead of using get/put_online_mems(), the problem can be solved by SLUB maintaining its own nodemask of nodes for which it has allocated the per-node kmem_cache_node structures. This nodemask would generally mirror the N_NORMAL_MEMORY nodemask, but would be updated only in under SLUB's control in its memory hotplug callbacks under the slab_mutex. This patch adds such nodemask and its handling. Commit 03afc0e2 mentiones "issues like [the one above]", but there don't appear to be further issues. All the paths (shared for SLAB and SLUB) taking the memory hotplug locks are also taking the slab_mutex, except kmem_cache_shrink() where 03afc0e2 replaced slab_mutex with get/put_online_mems(). We however cannot simply restore slab_mutex in kmem_cache_shrink(), as SLUB can enters the function from a write to sysfs 'shrink' file, thus holding kernfs lock, and in kmem_cache_create() the kernfs lock is nested within slab_mutex. But on closer inspection we don't actually need to protect kmem_cache_shrink() from hotplug callbacks: While SLUB's __kmem_cache_shrink() does for_each_kmem_cache_node(), missing a new node added in parallel hotplug is not fatal, and parallel hotremove does not free kmem_cache_node's anymore after the previous patch, so use-after free cannot happen. The per-node shrinking itself is protected by n->list_lock. Same is true for SLAB, and SLOB is no-op. SLAB also doesn't need the memory hotplug locking, which it only gained by 03afc0e2 through the shared paths in slab_common.c. Its memory hotplug callbacks are also protected by slab_mutex against races with these paths. The problem of SLUB relying on N_NORMAL_MEMORY doesn't apply to SLAB, as its setup_kmem_cache_nodes relies on N_ONLINE, and the new node is already set there during the MEM_GOING_ONLINE callback, so no special care is needed for SLAB. As such, this patch removes all get/put_online_mems() usage by the slab subsystem. Link: https://lkml.kernel.org/r/20210113131634.3671-3-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Qian Cai <cai@redhat.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vlastimil Babka 提交于
Patch series "mm, slab, slub: remove cpu and memory hotplug locks". Some related work caused me to look at how we use get/put_mems_online() and get/put_online_cpus() during kmem cache creation/descruction/shrinking, and realize that it should be actually safe to remove all of that with rather small effort (as e.g. Michal Hocko suspected in some of the past discussions already). This has the benefit to avoid rather heavy locks that have caused locking order issues already in the past. So this is the result, Patches 2 and 3 remove memory hotplug and cpu hotplug locking, respectively. Patch 1 is due to realization that in fact some races exist despite the locks (even if not removed), but the most sane solution is not to introduce more of them, but rather accept some wasted memory in scenarios that should be rare anyway (full memory hot remove), as we do the same in other contexts already. This patch (of 3): Commit e4f8e513 ("mm/slub: fix a deadlock in show_slab_objects()") has fixed a problematic locking order by removing the memory hotplug lock get/put_online_mems() from show_slab_objects(). During the discussion, it was argued [1] that this is OK, because existing slabs on the node would prevent a hotremove to proceed. That's true, but per-node kmem_cache_node structures are not necessarily allocated on the same node and may exist even without actual slab pages on the same node. Any path that uses get_node() directly or via for_each_kmem_cache_node() (such as show_slab_objects()) can race with freeing of kmem_cache_node even with the !NULL check, resulting in use-after-free. To that end, commit e4f8e513 argues in a comment that: * We don't really need mem_hotplug_lock (to hold off * slab_mem_going_offline_callback) here because slab's memory hot * unplug code doesn't destroy the kmem_cache->node[] data. While it's true that slab_mem_going_offline_callback() doesn't free the kmem_cache_node, the later callback slab_mem_offline_callback() actually does, so the race and use-after-free exists. Not just for show_slab_objects() after commit e4f8e513, but also many other places that are not under slab_mutex. And adding slab_mutex locking or other synchronization to SLUB paths such as get_any_partial() would be bad for performance and error-prone. The easiest solution is therefore to make the abovementioned comment true and stop freeing the kmem_cache_node structures, accepting some wasted memory in the full memory node removal scenario. Analogically we also don't free hotremoved pgdat as mentioned in [1], nor the similar per-node structures in SLAB. Importantly this approach will not block the hotremove, as generally such nodes should be movable in order to succeed hotremove in the first place, and thus the GFP_KERNEL allocated kmem_cache_node will come from elsewhere. [1] https://lore.kernel.org/linux-mm/20190924151147.GB23050@dhcp22.suse.cz/ Link: https://lkml.kernel.org/r/20210113131634.3671-1-vbabka@suse.cz Link: https://lkml.kernel.org/r/20210113131634.3671-2-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Qian Cai <cai@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Berg 提交于
If kmemleak is enabled, it uses a kmem cache for its own objects. These objects are used to hold information kmemleak uses, including a stack trace. If slub_debug is also turned on, each of them has *another* stack trace, so the overhead adds up, and on my tests (on ARCH=um, admittedly) 2/3rds of the allocations end up being doing the stack tracing. Turn off SLAB_STORE_USER if SLAB_NOLEAKTRACE was given, to avoid storing the essentially same data twice. Link: https://lkml.kernel.org/r/20210113215114.d94efa13ba30.I117b6764e725b3192318bbcf4269b13b709539ae@changeidSigned-off-by: NJohannes Berg <johannes.berg@intel.com> Acked-by: NDavid Rientjes <rientjes@google.com> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zhiyuan Dai 提交于
Fix some coding style issues, improve code reading. Adds whitespace to clearly separate the parameters. Link: https://lkml.kernel.org/r/1612841499-32166-1-git-send-email-daizhiyuan@phytium.com.cnSigned-off-by: NZhiyuan Dai <daizhiyuan@phytium.com.cn> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nikolay Borisov 提交于
This argument hasn't been used since e153362a ("slub: Remove objsize check in kmem_cache_flags()") so simply remove it. Link: https://lkml.kernel.org/r/20210126095733.974665-1-nborisov@suse.comSigned-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NChristoph Lameter <cl@linux.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jacob Wen 提交于
Currently, a trace record generated by the RCU core is as below. ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f3b49a66 It doesn't tell us what the RCU core has freed. This patch adds the slab name to trace_kmem_cache_free(). The new format is as follows. ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000037f79c8d name=dentry ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f78cb7b5 name=sock_inode_cache ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000018768985 name=pool_workqueue ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=000000006a6cb484 name=radix_tree_node We can use it to understand what the RCU core is going to free. For example, some users maybe interested in when the RCU core starts freeing reclaimable slabs like dentry to reduce memory pressure. Link: https://lkml.kernel.org/r/20201216072804.8838-1-jian.w.wen@oracle.comSigned-off-by: NJacob Wen <jian.w.wen@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 2月, 2021 2 次提交
-
-
由 Dennis Zhou 提交于
pcpu_build_alloc_info() is an __init function that makes a call to cpumask_clear_cpu(). With CONFIG_GCOV_PROFILE_ALL enabled, the inline heuristics are modified and such cpumask_clear_cpu() which is marked inline doesn't get inlined. Because it works on mask in __initdata, modpost throws a section mismatch error. Arnd sent a patch with the flatten attribute as an alternative [2]. I've added it to compiler_attributes.h. modpost complaint: WARNING: modpost: vmlinux.o(.text+0x735425): Section mismatch in reference from the function cpumask_clear_cpu() to the variable .init.data:pcpu_build_alloc_info.mask The function cpumask_clear_cpu() references the variable __initdata pcpu_build_alloc_info.mask. This is often because cpumask_clear_cpu lacks a __initdata annotation or the annotation of pcpu_build_alloc_info.mask is wrong. clang output: mm/percpu.c:2724:5: remark: cpumask_clear_cpu not inlined into pcpu_build_alloc_info because too costly to inline (cost=725, threshold=325) [-Rpass-missed=inline] [1] https://lore.kernel.org/linux-mm/202012220454.9F6Bkz9q-lkp@intel.com/ [2] https://lore.kernel.org/lkml/CAK8P3a2ZWfNeXKSm8K_SUhhwkor17jFo3xApLXjzfPqX0eUDUA@mail.gmail.com/Reported-by: Nkernel test robot <lkp@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: NDennis Zhou <dennis@kernel.org>
-
由 Wonhyuk Yang 提交于
To build group_map[] and group_cnt[], we find out which group CPUs belong to by comparing the distance of the cpu. However, this includes cases where comparisons are not required. This patch uses a bitmap to record CPUs that is not classified in the group. CPUs that we know which group they belong to should be cleared from the bitmap. In result, we can reduce the number of unnecessary comparisons. Signed-off-by: NWonhyuk Yang <vvghjk1234@gmail.com> Signed-off-by: NDennis Zhou <dennis@kernel.org> [Dennis: added cpumask_clear() call and #include cpumask.h.]
-
- 13 2月, 2021 1 次提交
-
-
由 Christophe Leroy 提交于
powerpc was the last provider of arch_remap() and the last user of mm-arch-hooks.h. Since commit 526a9c4a ("powerpc/vdso: Provide vdso_remap()"), arch_remap() hence mm-arch-hooks.h are not used anymore. Remove them. Signed-off-by: NChristophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
- 11 2月, 2021 2 次提交
-
-
由 Vlastimil Babka 提交于
When creating a new kmem cache, SLUB determines how large the slab pages will based on number of inputs, including the number of CPUs in the system. Larger slab pages mean that more objects can be allocated/free from per-cpu slabs before accessing shared structures, but also potentially more memory can be wasted due to low slab usage and fragmentation. The rough idea of using number of CPUs is that larger systems will be more likely to benefit from reduced contention, and also should have enough memory to spare. Number of CPUs used to be determined as nr_cpu_ids, which is number of possible cpus, but on some systems many will never be onlined, thus commit 045ab8c9 ("mm/slub: let number of online CPUs determine the slub page order") changed it to nr_online_cpus(). However, for kmem caches created early before CPUs are onlined, this may lead to permamently low slab page sizes. Vincent reports a regression [1] of hackbench on arm64 systems: "I'm facing significant performances regression on a large arm64 server system (224 CPUs). Regressions is also present on small arm64 system (8 CPUs) but in a far smaller order of magnitude On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16 v5.11-rc4 : 9.135sec (+/- 0.45%) v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%) v5.10: 3.136sec (+/- 0.40%)" Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting page allocator contention: "i.e. the patch incurs a 7% to 32% performance penalty. This bisected cleanly yesterday when I was looking for the regression and then found the thread. Numerous caches change size. For example, kmalloc-512 goes from order-0 (vanilla) to order-2 with the revert. So mostly this is down to the number of times SLUB calls into the page allocator which only caches order-0 pages on a per-cpu basis" Clearly num_online_cpus() doesn't work too early in bootup. We could change the order dynamically in a memory hotplug callback, but runtime order changing for existing kmem caches has been already shown as dangerous, and removed in 32a6f409 ("mm, slub: remove runtime allocation order changes"). It could be resurrected in a safe manner with some effort, but to fix the regression we need something simpler. We could use num_present_cpus() that should be the number of physically present CPUs even before they are onlined. That would work for PowerPC [3], which triggered the original commit, but that still doesn't work on arm64 [4] as explained in [5]. So this patch tries to determine the best available value without specific arch knowledge. - num_present_cpus() if the number is larger than 1, as that means the arch is likely setting it properly - nr_cpu_ids otherwise This should fix the reported regressions while also keeping the effect of 045ab8c9 for PowerPC systems. It's possible there are configurations where num_present_cpus() is 1 during boot while nr_cpu_ids is at the same time bloated, so these (if they exist) would keep the large orders based on nr_cpu_ids as was before 045ab8c9. [1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj7Rou=xzZg@mail.gmail.com/ [2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/ [3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/ [4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03YsY18cmWv_g@mail.gmail.com/ [5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/ Link: https://lkml.kernel.org/r/20210208134108.22286-1-vbabka@suse.cz Fixes: 045ab8c9 ("mm/slub: let number of online CPUs determine the slub page order") Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Reported-by: NVincent Guittot <vincent.guittot@linaro.org> Reported-by: NMel Gorman <mgorman@techsingularity.net> Tested-by: NMel Gorman <mgorman@techsingularity.net> Tested-by: NVincent Guittot <vincent.guittot@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Bharata B Rao <bharata@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jann Horn <jannh@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Will Deacon 提交于
Commit f9ce0be7 ("mm: Cleanup faultaround and finish_fault() codepaths") added a call to 'update_mmu_cache()' in mm/filemap.c, which breaks the build for microblaze: | mm/filemap.c: In function 'filemap_map_pages': | mm/filemap.c:3153:3: error: implicit declaration of function 'update_mmu_cache'; did you mean 'update_mmu_tlb'? Include asm/tlbflush.h in mm/filemap.c to make sure that the function (or indeed, macro) is available. Reported-by: NGuenter Roeck <linux@roeck-us.net> Tested-by: NGuenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210209202449.GA104837@roeck-us.netSigned-off-by: NWill Deacon <will@kernel.org>
-
- 10 2月, 2021 4 次提交
-
-
由 Christoph Hellwig 提交于
Open code the parts of map_swap_entry that was actually used by swapdev_block, and remove the now unused map_swap_entry function. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Johannes Weiner 提交于
This reverts commit 536d3bf2, as it can cause writers to memory.high to get stuck in the kernel forever, performing page reclaim and consuming excessive amounts of CPU cycles. Before the patch, a write to memory.high would first put the new limit in place for the workload, and then reclaim the requested delta. After the patch, the kernel tries to reclaim the delta before putting the new limit into place, in order to not overwhelm the workload with a sudden, large excess over the limit. However, if reclaim is actively racing with new allocations from the uncurbed workload, it can keep the write() working inside the kernel indefinitely. This is causing problems in Facebook production. A privileged system-level daemon that adjusts memory.high for various workloads running on a host can get unexpectedly stuck in the kernel and essentially turn into a sort of involuntary kswapd for one of the workloads. We've observed that daemon busy-spin in a write() for minutes at a time, neglecting its other duties on the system, and expending privileged system resources on behalf of a workload. To remedy this, we have first considered changing the reclaim logic to break out after a couple of loops - whether the workload has converged to the new limit or not - and bound the write() call this way. However, the root cause that inspired the sequence change in the first place has been fixed through other means, and so a revert back to the proven limit-setting sequence, also used by memory.max, is preferable. The sequence was changed to avoid extreme latencies in the workload when the limit was lowered: the sudden, large excess created by the limit lowering would erroneously trigger the penalty sleeping code that is meant to throttle excessive growth from below. Allocating threads could end up sleeping long after the write() had already reclaimed the delta for which they were being punished. However, erroneous throttling also caused problems in other scenarios at around the same time. This resulted in commit b3ff9291 ("mm, memcg: reclaim more aggressively before high allocator throttling"), included in the same release as the offending commit. When allocating threads now encounter large excess caused by a racing write() to memory.high, instead of entering punitive sleeps, they will simply be tasked with helping reclaim down the excess, and will be held no longer than it takes to accomplish that. This is in line with regular limit enforcement - i.e. if the workload allocates up against or over an otherwise unchanged limit from below. With the patch breaking userspace, and the root cause addressed by other means already, revert it again. Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org Fixes: 536d3bf2 ("mm: memcontrol: avoid workload stalls when lowering memory.high") Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Reported-by: NTejun Heo <tj@kernel.org> Acked-by: NChris Down <chris@chrisdown.name> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: <stable@vger.kernel.org> [5.8+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arnd Bergmann 提交于
clang can't evaluate this function argument at compile time when the function is not inlined, which leads to a link time failure: ld.lld: error: undefined symbol: __compiletime_assert_414 >>> referenced by mremap.c >>> mremap.o:(get_extent) in archive mm/built-in.a Mark the function as __always_inline to avoid it. Link: https://lkml.kernel.org/r/20201230154104.522605-1-arnd@kernel.org Fixes: 9ad9718b ("mm/mremap: calculate extent in one place") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Tested-by: NNick Desaulniers <ndesaulniers@google.com> Reviewed-by: NNathan Chancellor <natechancellor@gmail.com> Tested-by: NSedat Dilek <sedat.dilek@gmail.com> Cc: Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Konovalov 提交于
Currently, whether the alloc/free stack traces collection is enabled by default for hardware tag-based KASAN depends on CONFIG_DEBUG_KERNEL. The intention for this dependency was to only enable collection on slow debug kernels due to a significant perf and memory impact. As it turns out, CONFIG_DEBUG_KERNEL is not considered a debug option and is enabled on many productions kernels including Android and Ubuntu. As the result, this dependency is pointless and only complicates the code and documentation. Having stack traces collection disabled by default would make the hardware mode work differently to to the software ones, which is confusing. This change removes the dependency and enables stack traces collection by default. Looking into the future, this default might makes sense for production kernels, assuming we implement a fast stack trace collection approach. Link: https://lkml.kernel.org/r/6678d77ceffb71f1cff2cf61560e2ffe7bb6bfe9.1612808820.git.andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com> Reviewed-by: NMarco Elver <elver@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 2月, 2021 1 次提交
-
-
由 Paolo Bonzini 提交于
Currently, the follow_pfn function is exported for modules but follow_pte is not. However, follow_pfn is very easy to misuse, because it does not provide protections (so most of its callers assume the page is writable!) and because it returns after having already unlocked the page table lock. Provide instead a simplified version of follow_pte that does not have the pmdpp and range arguments. The older version survives as follow_invalidate_pte() for use by fs/dax.c. Reviewed-by: NJason Gunthorpe <jgg@nvidia.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 07 2月, 2021 1 次提交
-
-
由 Kevin Hao 提交于
In the current implementation of page_frag_alloc(), it doesn't have any align guarantee for the returned buffer address. But for some hardwares they do require the DMA buffer to be aligned correctly, so we would have to use some workarounds like below if the buffers allocated by the page_frag_alloc() are used by these hardwares for DMA. buf = page_frag_alloc(really_needed_size + align); buf = PTR_ALIGN(buf, align); These codes seems ugly and would waste a lot of memories if the buffers are used in a network driver for the TX/RX. So introduce page_frag_alloc_align() to make sure that an aligned buffer address is returned. Signed-off-by: NKevin Hao <haokexin@gmail.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 06 2月, 2021 11 次提交
-
-
由 Muchun Song 提交于
The VM_BUG_ON_PAGE avoids the generation of any code, even if that expression has side-effects when !CONFIG_DEBUG_VM. Link: https://lkml.kernel.org/r/20210126031009.96266-1-songmuchun@bytedance.com Fixes: e5dfaceb ("mm/hugetlb.c: just use put_page_testzero() instead of page_count()") Signed-off-by: NMuchun Song <songmuchun@bytedance.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vincenzo Frascino 提交于
Currently, addr_has_metadata() returns true for every address. An invalid address (e.g. NULL) passed to the function when, KASAN_HW_TAGS is enabled, leads to a kernel panic. Make addr_has_metadata() return true for valid addresses only. Note: KASAN_HW_TAGS support for vmalloc will be added with a future patch. Link: https://lkml.kernel.org/r/20210126134409.47894-3-vincenzo.frascino@arm.com Fixes: 2e903b91 ("kasan, arm64: implement HW_TAGS runtime") Signed-off-by: NVincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: NAndrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Waiman Long 提交于
Commit 3fea5a49 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API") introduced a bug in __add_to_page_cache_locked() causing the following splat: page dumped because: VM_BUG_ON_PAGE(page_memcg(page)) pages's memcg:ffff8889a4116000 ------------[ cut here ]------------ kernel BUG at mm/memcontrol.c:2924! invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 35 PID: 12345 Comm: cat Tainted: G S W I 5.11.0-rc4-debug+ #1 Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017 RIP: commit_charge+0xf4/0x130 Call Trace: mem_cgroup_charge+0x175/0x770 __add_to_page_cache_locked+0x712/0xad0 add_to_page_cache_lru+0xc5/0x1f0 cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles] __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache] __nfs_readpages_from_fscache+0x16d/0x630 [nfs] nfs_readpages+0x24e/0x540 [nfs] read_pages+0x5b1/0xc40 page_cache_ra_unbounded+0x460/0x750 generic_file_buffered_read_get_pages+0x290/0x1710 generic_file_buffered_read+0x2a9/0xc30 nfs_file_read+0x13f/0x230 [nfs] new_sync_read+0x3af/0x610 vfs_read+0x339/0x4b0 ksys_read+0xf1/0x1c0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Before that commit, there was a try_charge() and commit_charge() in __add_to_page_cache_locked(). These two separated charge functions were replaced by a single mem_cgroup_charge(). However, it forgot to add a matching mem_cgroup_uncharge() when the xarray insertion failed with the page released back to the pool. Fix this by adding a mem_cgroup_uncharge() call when insertion error happens. Link: https://lkml.kernel.org/r/20210125042441.20030-1-longman@redhat.com Fixes: 3fea5a49 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API") Signed-off-by: NWaiman Long <longman@redhat.com> Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <smuchun@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Roman Gushchin 提交于
With kaslr the kernel image is placed at a random place, so starting the bottom-up allocation with the kernel_end can result in an allocation failure and a warning like this one: hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node ------------[ cut here ]------------ memblock: bottom-up allocation failed, memory hotremove may be affected WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 RIP: 0010:memblock_find_in_range_node+0x178/0x25a Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000 RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046 RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000 R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000 FS: 0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0 Call Trace: memblock_alloc_range_nid+0x8d/0x11e cma_declare_contiguous_nid+0x2c4/0x38c hugetlb_cma_reserve+0xdc/0x128 flush_tlb_one_kernel+0xc/0x20 native_set_fixmap+0x82/0xd0 flat_get_apic_id+0x5/0x10 register_lapic_address+0x8e/0x97 setup_arch+0x8a5/0xc3f start_kernel+0x66/0x547 load_ucode_bsp+0x4c/0xcd secondary_startup_64_no_verify+0xb0/0xbb random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0 ---[ end trace f151227d0b39be70 ]--- At the same time, the kernel image is protected with memblock_reserve(), so we can just start searching at PAGE_SIZE. In this case the bottom-up allocation has the same chances to success as a top-down allocation, so there is no reason to fallback in the case of a failure. All together it simplifies the logic. Link: https://lkml.kernel.org/r/20201217201214.3414100-2-guro@fb.com Fixes: 8fabc623 ("powerpc: Ensure that swiotlb buffer is allocated from low memory") Signed-off-by: NRoman Gushchin <guro@fb.com> Reviewed-by: NMike Rapoport <rppt@linux.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Wonhyuk Yang <vvghjk1234@gmail.com> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Sergey reported deadlock between kswapd correctly doing its usual lock_page(page) followed by down_read(page->mapping->i_mmap_rwsem), and madvise(MADV_REMOVE) on an madvise(MADV_HUGEPAGE) area doing down_write(page->mapping->i_mmap_rwsem) followed by lock_page(page). This happened when shmem_fallocate(punch hole)'s unmap_mapping_range() reaches zap_pmd_range()'s call to __split_huge_pmd(). The same deadlock could occur when partially truncating a mapped huge tmpfs file, or using fallocate(FALLOC_FL_PUNCH_HOLE) on it. __split_huge_pmd()'s page lock was added in 5.8, to make sure that any concurrent use of reuse_swap_page() (holding page lock) could not catch the anon THP's mapcounts and swapcounts while they were being split. Fortunately, reuse_swap_page() is never applied to a shmem or file THP (not even by khugepaged, which checks PageSwapCache before calling), and anonymous THPs are never created in shmem or file areas: so that __split_huge_pmd()'s page lock can only be necessary for anonymous THPs, on which there is no risk of deadlock with i_mmap_rwsem. Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101161409470.2022@eggly.anvils Fixes: c444eb56 ("mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()") Signed-off-by: NHugh Dickins <hughd@google.com> Reported-by: NSergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rokudo Yan 提交于
In fast_isolate_freepages, high_pfn will be used if a prefered one (ie PFN >= low_fn) not found. But the high_pfn is not reset before searching an free area, so when it was used as freepage, it may from another free area searched before. As a result move_freelist_head(freelist, freepage) will have unexpected behavior (eg corrupt the MOVABLE freelist) Unable to handle kernel paging request at virtual address dead000000000200 Mem abort info: ESR = 0x96000044 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000044 CM = 0, WnR = 1 [dead000000000200] address between user and kernel address ranges -000|list_cut_before(inline) -000|move_freelist_head(inline) -000|fast_isolate_freepages(inline) -000|isolate_freepages(inline) -000|compaction_alloc(?, ?) -001|unmap_and_move(inline) -001|migrate_pages([NSD:0xFFFFFF80088CBBD0] from = 0xFFFFFF80088CBD88, [NSD:0xFFFFFF80088CBBC8] get_new_p -002|__read_once_size(inline) -002|static_key_count(inline) -002|static_key_false(inline) -002|trace_mm_compaction_migratepages(inline) -002|compact_zone(?, [NSD:0xFFFFFF80088CBCB0] capc = 0x0) -003|kcompactd_do_work(inline) -003|kcompactd([X19] p = 0xFFFFFF93227FBC40) -004|kthread([X20] _create = 0xFFFFFFE1AFB26380) -005|ret_from_fork(asm) The issue was reported on an smart phone product with 6GB ram and 3GB zram as swap device. This patch fixes the issue by reset high_pfn before searching each free area, which ensure freepage and freelist match when call move_freelist_head in fast_isolate_freepages(). Link: http://lkml.kernel.org/r/20190118175136.31341-12-mgorman@techsingularity.net Link: https://lkml.kernel.org/r/20210112094720.1238444-1-wu-yan@tcl.com Fixes: 5a811889 ("mm, compaction: use free lists to quickly locate a migration target") Signed-off-by: NRokudo Yan <wu-yan@tcl.com> Acked-by: NMel Gorman <mgorman@techsingularity.net> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
All pages isolated for the migration have an elevated reference count and therefore seeing a reference count equal to 1 means that the last user of the page has dropped the reference and the page has became unused and there doesn't make much sense to migrate it anymore. This has been done for regular pages and this patch does the same for hugetlb pages. Although the likelihood of the race is rather small for hugetlb pages it makes sense the two code paths in sync. Link: https://lkml.kernel.org/r/20210115124942.46403-2-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Acked-by: NYang Shi <shy828301@gmail.com> Acked-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
The page_huge_active() can be called from scan_movable_pages() which do not hold a reference count to the HugeTLB page. So when we call page_huge_active() from scan_movable_pages(), the HugeTLB page can be freed parallel. Then we will trigger a BUG_ON which is in the page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the VM_BUG_ON_PAGE. Link: https://lkml.kernel.org/r/20210115124942.46403-6-songmuchun@bytedance.com Fixes: 7e1f049e ("mm: hugetlb: cleanup using paeg_huge_active()") Signed-off-by: NMuchun Song <songmuchun@bytedance.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Acked-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
There is a race between isolate_huge_page() and __free_huge_page(). CPU0: CPU1: if (PageHuge(page)) put_page(page) __free_huge_page(page) spin_lock(&hugetlb_lock) update_and_free_page(page) set_compound_page_dtor(page, NULL_COMPOUND_DTOR) spin_unlock(&hugetlb_lock) isolate_huge_page(page) // trigger BUG_ON VM_BUG_ON_PAGE(!PageHead(page), page) spin_lock(&hugetlb_lock) page_huge_active(page) // trigger BUG_ON VM_BUG_ON_PAGE(!PageHuge(page), page) spin_unlock(&hugetlb_lock) When we isolate a HugeTLB page on CPU0. Meanwhile, we free it to the buddy allocator on CPU1. Then, we can trigger a BUG_ON on CPU0, because it is already freed to the buddy allocator. Link: https://lkml.kernel.org/r/20210115124942.46403-5-songmuchun@bytedance.com Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Signed-off-by: NMuchun Song <songmuchun@bytedance.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Acked-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
There is a race condition between __free_huge_page() and dissolve_free_huge_page(). CPU0: CPU1: // page_count(page) == 1 put_page(page) __free_huge_page(page) dissolve_free_huge_page(page) spin_lock(&hugetlb_lock) // PageHuge(page) && !page_count(page) update_and_free_page(page) // page is freed to the buddy spin_unlock(&hugetlb_lock) spin_lock(&hugetlb_lock) clear_page_huge_active(page) enqueue_huge_page(page) // It is wrong, the page is already freed spin_unlock(&hugetlb_lock) The race window is between put_page() and dissolve_free_huge_page(). We should make sure that the page is already on the free list when it is dissolved. As a result __free_huge_page would corrupt page(s) already in the buddy allocator. Link: https://lkml.kernel.org/r/20210115124942.46403-4-songmuchun@bytedance.com Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Signed-off-by: NMuchun Song <songmuchun@bytedance.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
If a new hugetlb page is allocated during fallocate it will not be marked as active (set_page_huge_active) which will result in a later isolate_huge_page failure when the page migration code would like to move that page. Such a failure would be unexpected and wrong. Only export set_page_huge_active, just leave clear_page_huge_active as static. Because there are no external users. Link: https://lkml.kernel.org/r/20210115124942.46403-3-songmuchun@bytedance.com Fixes: 70c3547e (hugetlbfs: add hugetlbfs_fallocate()) Signed-off-by: NMuchun Song <songmuchun@bytedance.com> Acked-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 1月, 2021 3 次提交
-
-
由 Will Deacon 提交于
The 'start' and 'end' arguments to tlb_gather_mmu() are no longer needed now that there is a separate function for 'fullmm' flushing. Remove the unused arguments and update all callers. Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NYu Zhao <yuzhao@google.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wjQWa14_4UpfDf=fiineNP+RH74kZeDMo_f1D35xNzq9w@mail.gmail.com
-
由 Will Deacon 提交于
Passing the range '0, -1' to tlb_gather_mmu() sets the 'fullmm' flag, which indicates that the mm_struct being operated on is going away. In this case, some architectures (such as arm64) can elide TLB invalidation by ensuring that the TLB tag (ASID) associated with this mm is not immediately reclaimed. Although this behaviour is documented in asm-generic/tlb.h, it's subtle and easily missed. Introduce tlb_gather_mmu_fullmm() to make it clearer that this is for the entire mm and WARN() if tlb_gather_mmu() is called with the 'fullmm' address range. Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NYu Zhao <yuzhao@google.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20210127235347.1402-4-will@kernel.org
-
由 Will Deacon 提交于
Since commit 7a30df49 ("mm: mmu_gather: remove __tlb_reset_range() for force flush"), the 'start' and 'end' arguments to tlb_finish_mmu() are no longer used, since we flush the whole mm in case of a nested invalidation. Remove the unused arguments and update all callers. Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NYu Zhao <yuzhao@google.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20210127235347.1402-3-will@kernel.org
-
- 29 1月, 2021 1 次提交
-
-
由 Wang Hai 提交于
This reverts commit dde3c6b7. syzbot report a double-free bug. The following case can cause this bug. - mm/slab_common.c: create_cache(): if the __kmem_cache_create() fails, it does: out_free_cache: kmem_cache_free(kmem_cache, s); - but __kmem_cache_create() - at least for slub() - will have done sysfs_slab_add(s) -> sysfs_create_group() .. fails .. -> kobject_del(&s->kobj); .. which frees s ... We can't remove the kmem_cache_free() in create_cache(), because other error cases of __kmem_cache_create() do not free this. So, revert the commit dde3c6b7 ("mm/slub: fix a memory leak in sysfs_slab_add()") to fix this. Reported-by: syzbot+d0bd96b4696c1ef67991@syzkaller.appspotmail.com Fixes: dde3c6b7 ("mm/slub: fix a memory leak in sysfs_slab_add()") Acked-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NWang Hai <wanghai38@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 1月, 2021 3 次提交
-
-
由 Geert Uytterhoeven 提交于
If CONFIG_MMU is not set (e.g. m68k/m5272c3_defconfig): mm/nommu.c:1671:6: error: conflicting types for ‘filemap_map_pages’ 1671 | void filemap_map_pages(struct vm_fault *vmf, | ^~~~~~~~~~~~~~~~~ In file included from mm/nommu.c:20: ./include/linux/mm.h:2578:19: note: previous declaration of ‘filemap_map_pages’ was here 2578 | extern vm_fault_t filemap_map_pages(struct vm_fault *vmf, | ^~~~~~~~~~~~~~~~~ The signature of filemap_map_pages() was changed, but the nommu implementation wasn't updated. Reported-by: noreply@ellerman.id.au Fixes: f9ce0be7 ("mm: Cleanup faultaround and finish_fault() codepaths") Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20210128100626.2257638-1-geert@linux-m68k.orgSigned-off-by: NWill Deacon <will@kernel.org>
-
由 Jens Axboe 提交于
Current tree spews this on compile: mm/swapfile.c:2290:17: warning: ‘map_swap_entry’ defined but not used [-Wunused-function] 2290 | static sector_t map_swap_entry(swp_entry_t entry, struct block_device **bdev) | ^~~~~~~~~~~~~~ if !CONFIG_HIBERNATION, as we don't use the function unless we have that config option set. Fixes: 48d15436 ("mm: remove get_swap_bio") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Just reuse the block_device and sector from the swap_info structure, just as used by the SWP_SYNCHRONOUS path. Also remove the checks for NULL returns from bio_alloc as that can't happen for sleeping allocations. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 27 1月, 2021 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit d3921cb8. Chris Wilson reports that it causes boot problems: "We have half a dozen or so different machines in CI that are silently failing to boot, that we believe is bisected to this patch" and the CI team confirmed that a revert fixed the issues. The cause is unknown for now, so let's revert it. Link: https://lore.kernel.org/lkml/161160687463.28991.354987542182281928@build.alporthouse.com/Reported-and-tested-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NMike Rapoport <rppt@linux.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 1月, 2021 1 次提交
-
-
由 Christoph Hellwig 提交于
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that the gendisk can be trivially accessed with an extra indirection, but it also allows to directly look up all information related to partition remapping. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-