- 08 9月, 2021 2 次提交
-
-
由 Vlastimil Babka 提交于
mainline inclusion from mainline-v5.12-rc1 commit 7e1fa93d category: bugfix bugzilla: 175589 CVE: NA ------------------------------------------------- Since commit 03afc0e2 ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}") we are taking memory hotplug lock for SLAB and SLUB when creating, destroying or shrinking a cache. It is quite a heavy lock and it's best to avoid it if possible, as we had several issues with lockdep complaining about ordering in the past, see e.g. e4f8e513 ("mm/slub: fix a deadlock in show_slab_objects()"). The problem scenario in 03afc0e2 (solved by the memory hotplug lock) can be summarized as follows: while there's slab_mutex synchronizing new kmem cache creation and SLUB's MEM_GOING_ONLINE callback slab_mem_going_online_callback(), we may miss creation of kmem_cache_node for the hotplugged node in the new kmem cache, because the hotplug callback doesn't yet see the new cache, and cache creation in init_kmem_cache_nodes() only inits kmem_cache_node for nodes in the N_NORMAL_MEMORY nodemask, which however may not yet include the new node, as that happens only later after the MEM_GOING_ONLINE callback. Instead of using get/put_online_mems(), the problem can be solved by SLUB maintaining its own nodemask of nodes for which it has allocated the per-node kmem_cache_node structures. This nodemask would generally mirror the N_NORMAL_MEMORY nodemask, but would be updated only in under SLUB's control in its memory hotplug callbacks under the slab_mutex. This patch adds such nodemask and its handling. Commit 03afc0e2 mentiones "issues like [the one above]", but there don't appear to be further issues. All the paths (shared for SLAB and SLUB) taking the memory hotplug locks are also taking the slab_mutex, except kmem_cache_shrink() where 03afc0e2 replaced slab_mutex with get/put_online_mems(). We however cannot simply restore slab_mutex in kmem_cache_shrink(), as SLUB can enters the function from a write to sysfs 'shrink' file, thus holding kernfs lock, and in kmem_cache_create() the kernfs lock is nested within slab_mutex. But on closer inspection we don't actually need to protect kmem_cache_shrink() from hotplug callbacks: While SLUB's __kmem_cache_shrink() does for_each_kmem_cache_node(), missing a new node added in parallel hotplug is not fatal, and parallel hotremove does not free kmem_cache_node's anymore after the previous patch, so use-after free cannot happen. The per-node shrinking itself is protected by n->list_lock. Same is true for SLAB, and SLOB is no-op. SLAB also doesn't need the memory hotplug locking, which it only gained by 03afc0e2 through the shared paths in slab_common.c. Its memory hotplug callbacks are also protected by slab_mutex against races with these paths. The problem of SLUB relying on N_NORMAL_MEMORY doesn't apply to SLAB, as its setup_kmem_cache_nodes relies on N_ONLINE, and the new node is already set there during the MEM_GOING_ONLINE callback, so no special care is needed for SLAB. As such, this patch removes all get/put_online_mems() usage by the slab subsystem. Link: https://lkml.kernel.org/r/20210113131634.3671-3-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Qian Cai <cai@redhat.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChengyang Fan <cy.fan@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Vlastimil Babka 提交于
mainline inclusion from mainline-v5.12-rc1 commit 666716fd category: bugfix bugzilla: 175589 CVE: NA ------------------------------------------------- Patch series "mm, slab, slub: remove cpu and memory hotplug locks". Some related work caused me to look at how we use get/put_mems_online() and get/put_online_cpus() during kmem cache creation/descruction/shrinking, and realize that it should be actually safe to remove all of that with rather small effort (as e.g. Michal Hocko suspected in some of the past discussions already). This has the benefit to avoid rather heavy locks that have caused locking order issues already in the past. So this is the result, Patches 2 and 3 remove memory hotplug and cpu hotplug locking, respectively. Patch 1 is due to realization that in fact some races exist despite the locks (even if not removed), but the most sane solution is not to introduce more of them, but rather accept some wasted memory in scenarios that should be rare anyway (full memory hot remove), as we do the same in other contexts already. This patch (of 3): Commit e4f8e513 ("mm/slub: fix a deadlock in show_slab_objects()") has fixed a problematic locking order by removing the memory hotplug lock get/put_online_mems() from show_slab_objects(). During the discussion, it was argued [1] that this is OK, because existing slabs on the node would prevent a hotremove to proceed. That's true, but per-node kmem_cache_node structures are not necessarily allocated on the same node and may exist even without actual slab pages on the same node. Any path that uses get_node() directly or via for_each_kmem_cache_node() (such as show_slab_objects()) can race with freeing of kmem_cache_node even with the !NULL check, resulting in use-after-free. To that end, commit e4f8e513 argues in a comment that: * We don't really need mem_hotplug_lock (to hold off * slab_mem_going_offline_callback) here because slab's memory hot * unplug code doesn't destroy the kmem_cache->node[] data. While it's true that slab_mem_going_offline_callback() doesn't free the kmem_cache_node, the later callback slab_mem_offline_callback() actually does, so the race and use-after-free exists. Not just for show_slab_objects() after commit e4f8e513, but also many other places that are not under slab_mutex. And adding slab_mutex locking or other synchronization to SLUB paths such as get_any_partial() would be bad for performance and error-prone. The easiest solution is therefore to make the abovementioned comment true and stop freeing the kmem_cache_node structures, accepting some wasted memory in the full memory node removal scenario. Analogically we also don't free hotremoved pgdat as mentioned in [1], nor the similar per-node structures in SLAB. Importantly this approach will not block the hotremove, as generally such nodes should be movable in order to succeed hotremove in the first place, and thus the GFP_KERNEL allocated kmem_cache_node will come from elsewhere. [1] https://lore.kernel.org/linux-mm/20190924151147.GB23050@dhcp22.suse.cz/ Link: https://lkml.kernel.org/r/20210113131634.3671-1-vbabka@suse.cz Link: https://lkml.kernel.org/r/20210113131634.3671-2-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Qian Cai <cai@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChengyang Fan <cy.fan@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 29 5月, 2021 2 次提交
-
-
由 Yang Yingliang 提交于
hulk inclusion category: bugfix bugzilla: 51349 CVE: NA ------------------------------------------------- This patchset https://patchwork.kernel.org/project/linux-block/cover/20190826111627.7505-1-vbabka@suse.cz/ will cause perfmance regression, so revert it and use another way to fix the warning introduced by fix CVE-2021-27365. Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Yang Yingliang 提交于
hulk inclusion category: bugfix bugzilla: 51349 CVE: NA ------------------------------------------------- This patchset https://patchwork.kernel.org/project/linux-block/cover/20190826111627.7505-1-vbabka@suse.cz/ will cause perfmance regression, so revert it and use another way to fix the warning introduced by fix CVE-2021-27365. Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 08 5月, 2021 2 次提交
-
-
由 Muchun Song 提交于
mainline inclusion from mainline-5.12-rc1 commit 96403bfe category: bugfix bugzilla: 51349 CVE: NA ------------------------------------------------- SLUB currently account kmalloc() and kmalloc_node() allocations larger than order-1 page per-node. But it forget to update the per-memcg vmstats. So it can lead to inaccurate statistics of "slab_unreclaimable" which is from memory.stat. Fix it by using mod_lruvec_page_state instead of mod_node_page_state. Link: https://lkml.kernel.org/r/20210223092423.42420-1-songmuchun@bytedance.com Fixes: 6a486c0a ("mm, sl[ou]b: improve memory accounting") Signed-off-by: NMuchun Song <songmuchun@bytedance.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Reviewed-by: NRoman Gushchin <guro@fb.com> Reviewed-by: NMichal Koutný <mkoutny@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 96403bfe) Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Vlastimil Babka 提交于
mainline inclusion from mainline-5.4-rc3 commit 6a486c0a category: feature bugzilla: 51349 CVE: NA ------------------------------------------------- Patch series "guarantee natural alignment for kmalloc()", v2. This patch (of 2): SLOB currently doesn't account its pages at all, so in /proc/meminfo the Slab field shows zero. Modifying a counter on page allocation and freeing should be acceptable even for the small system scenarios SLOB is intended for. Since reclaimable caches are not separated in SLOB, account everything as unreclaimable. SLUB currently doesn't account kmalloc() and kmalloc_node() allocations larger than order-1 page, that are passed directly to the page allocator. As they also don't appear in /proc/slabinfo, it might look like a memory leak. For consistency, account them as well. (SLAB doesn't actually use page allocator directly, so no change there). Ideally SLOB and SLUB would be handled in separate patches, but due to the shared kmalloc_order() function and different kfree() implementations, it's easier to patch both at once to prevent inconsistencies. Link: http://lkml.kernel.org/r/20190826111627.7505-2-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Darrick J . Wong" <darrick.wong@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 6a486c0a) Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 14 4月, 2021 1 次提交
-
-
由 Linus Torvalds 提交于
stable inclusion from linux-4.19.181 commit c6afb0eeed94a7d504acdfdde7128573ec6a9e3e -------------------------------- commit 9b1ea29b upstream. This reverts commit 8ff60eb0. The kernel test robot reports a huge performance regression due to the commit, and the reason seems fairly straightforward: when there is contention on the page list (which is what causes acquire_slab() to fail), we do _not_ want to just loop and try again, because that will transfer the contention to the 'n->list_lock' spinlock we hold, and just make things even worse. This is admittedly likely a problem only on big machines - the kernel test robot report comes from a 96-thread dual socket Intel Xeon Gold 6252 setup, but the regression there really is quite noticeable: -47.9% regression of stress-ng.rawpkt.ops_per_sec and the commit that was marked as being fixed (7ced3719: "slub: Acquire_slab() avoid loop") actually did the loop exit early very intentionally (the hint being that "avoid loop" part of that commit message), exactly to avoid this issue. The correct thing to do may be to pick some kind of reasonable middle ground: instead of breaking out of the loop on the very first sign of contention, or trying over and over and over again, the right thing may be to re-try _once_, and then give up on the second failure (or pick your favorite value for "once"..). Reported-by: Nkernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/lkml/20210301080404.GF12822@xsang-OptiPlex-9020/ Cc: Jann Horn <jannh@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
-
- 11 3月, 2021 1 次提交
-
-
由 Wang Hai 提交于
stable inclusion from linux-4.19.172 commit 08b227d9f380aff229fe7eb3e43c0d90ac14ec38 -------------------------------- commit 757fed1d upstream. This reverts commit dde3c6b7. syzbot report a double-free bug. The following case can cause this bug. - mm/slab_common.c: create_cache(): if the __kmem_cache_create() fails, it does: out_free_cache: kmem_cache_free(kmem_cache, s); - but __kmem_cache_create() - at least for slub() - will have done sysfs_slab_add(s) -> sysfs_create_group() .. fails .. -> kobject_del(&s->kobj); .. which frees s ... We can't remove the kmem_cache_free() in create_cache(), because other error cases of __kmem_cache_create() do not free this. So, revert the commit dde3c6b7 ("mm/slub: fix a memory leak in sysfs_slab_add()") to fix this. Reported-by: syzbot+d0bd96b4696c1ef67991@syzkaller.appspotmail.com Fixes: dde3c6b7 ("mm/slub: fix a memory leak in sysfs_slab_add()") Acked-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NWang Hai <wanghai38@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
-
- 22 2月, 2021 1 次提交
-
-
由 Jann Horn 提交于
stable inclusion from linux-4.19.169 commit 21a466aed724f6f2fb10635c982a50c0900ed8f3 -------------------------------- commit 8ff60eb0 upstream. acquire_slab() fails if there is contention on the freelist of the page (probably because some other CPU is concurrently freeing an object from the page). In that case, it might make sense to look for a different page (since there might be more remote frees to the page from other CPUs, and we don't want contention on struct page). However, the current code accidentally stops looking at the partial list completely in that case. Especially on kernels without CONFIG_NUMA set, this means that get_partial() fails and new_slab_objects() falls back to new_slab(), allocating new pages. This could lead to an unnecessary increase in memory fragmentation. Link: https://lkml.kernel.org/r/20201228130853.1871516-1-jannh@google.com Fixes: 7ced3719 ("slub: Acquire_slab() avoid loop") Signed-off-by: NJann Horn <jannh@google.com> Acked-by: NDavid Rientjes <rientjes@google.com> Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
-
- 22 9月, 2020 7 次提交
-
-
由 Eugeniu Rosca 提交于
mainline inclusion from mainline-v5.9-rc4 commit dc07a728 category: bugfix bugzilla: NA CVE: NA ------------------------------------------------- Commit 52f23478 ("mm/slub.c: fix corrupted freechain in deactivate_slab()") suffered an update when picked up from LKML [1]. Specifically, relocating 'freelist = NULL' into 'freelist_corrupted()' created a no-op statement. Fix it by sticking to the behavior intended in the original patch [1]. In addition, make freelist_corrupted() immune to passing NULL instead of &freelist. The issue has been spotted via static analysis and code review. [1] https://lore.kernel.org/linux-mm/20200331031450.12182-1-dongli.zhang@oracle.com/ Fixes: 52f23478 ("mm/slub.c: fix corrupted freechain in deactivate_slab()") Signed-off-by: NEugeniu Rosca <erosca@de.adit-jv.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: Joe Jin <joe.jin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200824130643.10291-1-erosca@de.adit-jv.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Qian Cai 提交于
stable inclusion from linux-4.19.132 commit 3e632652e3dc186d61d584258becf9ca69ae2f3a -------------------------------- [ Upstream commit a68ee057 ] There is no need to copy SLUB_STATS items from root memcg cache to new memcg cache copies. Doing so could result in stack overruns because the store function only accepts 0 to clear the stat and returns an error for everything else while the show method would print out the whole stat. Then, the mismatch of the lengths returns from show and store methods happens in memcg_propagate_slab_attrs(): else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf)) buf = mbuf; max_attr_size is only 2 from slab_attr_store(), then, it uses mbuf[64] in show_stat() later where a bounch of sprintf() would overrun the stack variable. Fix it by always allocating a page of buffer to be used in show_stat() if SLUB_STATS=y which should only be used for debug purpose. # echo 1 > /sys/kernel/slab/fs_cache/shrink BUG: KASAN: stack-out-of-bounds in number+0x421/0x6e0 Write of size 1 at addr ffffc900256cfde0 by task kworker/76:0/53251 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func Call Trace: number+0x421/0x6e0 vsnprintf+0x451/0x8e0 sprintf+0x9e/0xd0 show_stat+0x124/0x1d0 alloc_slowpath_show+0x13/0x20 __kmem_cache_create+0x47a/0x6b0 addr ffffc900256cfde0 is located in stack of task kworker/76:0/53251 at offset 0 in frame: process_one_work+0x0/0xb90 this frame has 1 object: [32, 72) 'lockdep_map' Memory state around the buggy address: ffffc900256cfc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffc900256cfd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffffc900256cfd80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 ^ ffffc900256cfe00: 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00 ffffc900256cfe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x6ac/0x6b0 Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func Call Trace: __kmem_cache_create+0x6ac/0x6b0 Fixes: 107dab5c ("slub: slub-specific propagation changes") Signed-off-by: NQian Cai <cai@lca.pw> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Glauber Costa <glauber@scylladb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200429222356.4322-1-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Dongli Zhang 提交于
stable inclusion from linux-4.19.132 commit 6c09755c02642ea3727a87c994a1d9fab32aa8f4 -------------------------------- [ Upstream commit 52f23478 ] The slub_debug is able to fix the corrupted slab freelist/page. However, alloc_debug_processing() only checks the validity of current and next freepointer during allocation path. As a result, once some objects have their freepointers corrupted, deactivate_slab() may lead to page fault. Below is from a test kernel module when 'slub_debug=PUF,kmalloc-128 slub_nomerge'. The test kernel corrupts the freepointer of one free object on purpose. Unfortunately, deactivate_slab() does not detect it when iterating the freechain. BUG: unable to handle page fault for address: 00000000123456f8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ... ... RIP: 0010:deactivate_slab.isra.92+0xed/0x490 ... ... Call Trace: ___slab_alloc+0x536/0x570 __slab_alloc+0x17/0x30 __kmalloc+0x1d9/0x200 ext4_htree_store_dirent+0x30/0xf0 htree_dirblock_to_tree+0xcb/0x1c0 ext4_htree_fill_tree+0x1bc/0x2d0 ext4_readdir+0x54f/0x920 iterate_dir+0x88/0x190 __x64_sys_getdents+0xa6/0x140 do_syscall_64+0x49/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Therefore, this patch adds extra consistency check in deactivate_slab(). Once an object's freepointer is corrupted, all following objects starting at this object are isolated. [akpm@linux-foundation.org: fix build with CONFIG_SLAB_DEBUG=n] Signed-off-by: NDongli Zhang <dongli.zhang@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Joe Jin <joe.jin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200331031450.12182-1-dongli.zhang@oracle.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Wang Hai 提交于
stable inclusion from linux-4.19.129 commit 53bb2a6566fb84d2ce3d36ecd42af5cb9c34f14e -------------------------------- commit dde3c6b7 upstream. syzkaller reports for memory leak when kobject_init_and_add() returns an error in the function sysfs_slab_add() [1] When this happened, the function kobject_put() is not called for the corresponding kobject, which potentially leads to memory leak. This patch fixes the issue by calling kobject_put() even if kobject_init_and_add() fails. [1] BUG: memory leak unreferenced object 0xffff8880a6d4be88 (size 8): comm "syz-executor.3", pid 946, jiffies 4295772514 (age 18.396s) hex dump (first 8 bytes): 70 69 64 5f 33 00 ff ff pid_3... backtrace: kstrdup+0x35/0x70 mm/util.c:60 kstrdup_const+0x3d/0x50 mm/util.c:82 kvasprintf_const+0x112/0x170 lib/kasprintf.c:48 kobject_set_name_vargs+0x55/0x130 lib/kobject.c:289 kobject_add_varg lib/kobject.c:384 [inline] kobject_init_and_add+0xd8/0x170 lib/kobject.c:473 sysfs_slab_add+0x1d8/0x290 mm/slub.c:5811 __kmem_cache_create+0x50a/0x570 mm/slub.c:4384 create_cache+0x113/0x1e0 mm/slab_common.c:407 kmem_cache_create_usercopy+0x1a1/0x260 mm/slab_common.c:505 kmem_cache_create+0xd/0x10 mm/slab_common.c:564 create_pid_cachep kernel/pid_namespace.c:54 [inline] create_pid_namespace kernel/pid_namespace.c:96 [inline] copy_pid_ns+0x77c/0x8f0 kernel/pid_namespace.c:148 create_new_namespaces+0x26b/0xa30 kernel/nsproxy.c:95 unshare_nsproxy_namespaces+0xa7/0x1e0 kernel/nsproxy.c:229 ksys_unshare+0x3d2/0x770 kernel/fork.c:2969 __do_sys_unshare kernel/fork.c:3037 [inline] __se_sys_unshare kernel/fork.c:3035 [inline] __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3035 do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295 Fixes: 80da026a ("mm/slub: fix slab double-free in case of duplicate sysfs filename") Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NWang Hai <wanghai38@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200602115033.1054-1-wanghai38@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Vlastimil Babka 提交于
stable inclusion from linux-4.19.113 commit 3e79ba6341083428ce4787d08e9fb757064d192c -------------------------------- commit 0715e6c5 upstream. Sachin reports [1] a crash in SLUB __slab_alloc(): BUG: Kernel NULL pointer dereference on read at 0x000073b0 Faulting instruction address: 0xc0000000003d55f4 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1 NIP: c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000 REGS: c0000008b37836d0 TRAP: 0300 Not tainted (5.6.0-rc2-next-20200218-autotest) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24004844 XER: 00000000 CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1 GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500 GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620 GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000 GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000 GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002 GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122 GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8 GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180 NIP ___slab_alloc+0x1f4/0x760 LR __slab_alloc+0x34/0x60 Call Trace: ___slab_alloc+0x334/0x760 (unreliable) __slab_alloc+0x34/0x60 __kmalloc_node+0x110/0x490 kvmalloc_node+0x58/0x110 mem_cgroup_css_online+0x108/0x270 online_css+0x48/0xd0 cgroup_apply_control_enable+0x2ec/0x4d0 cgroup_mkdir+0x228/0x5f0 kernfs_iop_mkdir+0x90/0xf0 vfs_mkdir+0x110/0x230 do_mkdirat+0xb0/0x1a0 system_call+0x5c/0x68 This is a PowerPC platform with following NUMA topology: available: 2 nodes (0-1) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 node 1 size: 35247 MB node 1 free: 30907 MB node distances: node 0 1 0: 10 40 1: 40 10 possible numa nodes: 0-31 This only happens with a mmotm patch "mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node" [2] which effectively calls kmalloc_node for each possible node. SLUB however only allocates kmem_cache_node on online N_NORMAL_MEMORY nodes, and relies on node_to_mem_node to return such valid node for other nodes since commit a561ce00 ("slub: fall back to node_to_mem_node() node if allocating on memoryless node"). This is however not true in this configuration where the _node_numa_mem_ array is not initialized for nodes 0 and 2-31, thus it contains zeroes and get_partial() ends up accessing non-allocated kmem_cache_node. A related issue was reported by Bharata (originally by Ramachandran) [3] where a similar PowerPC configuration, but with mainline kernel without patch [2] ends up allocating large amounts of pages by kmalloc-1k kmalloc-512. This seems to have the same underlying issue with node_to_mem_node() not behaving as expected, and might probably also lead to an infinite loop with CONFIG_SLUB_CPU_PARTIAL [4]. This patch should fix both issues by not relying on node_to_mem_node() anymore and instead simply falling back to NUMA_NO_NODE, when kmalloc_node(node) is attempted for a node that's not online, or has no usable memory. The "usable memory" condition is also changed from node_present_pages() to N_NORMAL_MEMORY node state, as that is exactly the condition that SLUB uses to allocate kmem_cache_node structures. The check in get_partial() is removed completely, as the checks in ___slab_alloc() are now sufficient to prevent get_partial() being reached with an invalid node. [1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/ [2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtuozzo.com/ [3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/ [4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.cz/ Fixes: a561ce00 ("slub: fall back to node_to_mem_node() node if allocating on memoryless node") Reported-by: NSachin Sant <sachinp@linux.vnet.ibm.com> Reported-by: NPUVICHAKRAVARTHY RAMACHANDRAN <puvichakravarthy@in.ibm.com> Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Tested-by: NSachin Sant <sachinp@linux.vnet.ibm.com> Tested-by: NBharata B Rao <bharata@linux.ibm.com> Reviewed-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christopher Lameter <cl@linux.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200320115533.9604-1-vbabka@suse.czDebugged-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Linus Torvalds 提交于
stable inclusion from linux-4.19.113 commit 451d4a2390ed7236b3713c315780c3c50618748c -------------------------------- commit 5076190d upstream. This is just a cleanup addition to Jann's fix to properly update the transaction ID for the slub slowpath in commit fd4d9c7d ("mm: slub: add missing TID bump.."). The transaction ID is what protects us against any concurrent accesses, but we should really also make sure to make the 'freelist' comparison itself always use the same freelist value that we then used as the new next free pointer. Jann points out that if we do all of this carefully, we could skip the transaction ID update for all the paths that only remove entries from the lists, and only update the TID when adding entries (to avoid the ABA issue with cmpxchg and list handling re-adding a previously seen value). But this patch just does the "make sure to cmpxchg the same value we used" rather than then try to be clever. Acked-by: NJann Horn <jannh@google.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Jann Horn 提交于
stable inclusion from linux-4.19.112 commit 30f6cae722654caef2ab4bacb2e910bfd766866b -------------------------------- commit fd4d9c7d upstream. When kmem_cache_alloc_bulk() attempts to allocate N objects from a percpu freelist of length M, and N > M > 0, it will first remove the M elements from the percpu freelist, then call ___slab_alloc() to allocate the next element and repopulate the percpu freelist. ___slab_alloc() can re-enable IRQs via allocate_slab(), so the TID must be bumped before ___slab_alloc() to properly commit the freelist head change. Fix it by unconditionally bumping c->tid when entering the slowpath. Cc: stable@vger.kernel.org Fixes: ebe909e0 ("slub: improve bulk alloc strategy") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 27 12月, 2019 4 次提交
-
-
由 Qian Cai 提交于
commit e4f8e513 upstream. A long time ago we fixed a similar deadlock in show_slab_objects() [1]. However, it is apparently due to the commits like 01fb58bc ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path") and 03afc0e2 ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back by just reading files in /sys/kernel/slab which will generate a lockdep splat below. Since the "mem_hotplug_lock" here is only to obtain a stable online node mask while racing with NUMA node hotplug, in the worst case, the results may me miscalculated while doing NUMA node hotplug, but they shall be corrected by later reads of the same files. WARNING: possible circular locking dependency detected ------------------------------------------------------ cat/5224 is trying to acquire lock: ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at: show_slab_objects+0x94/0x3a8 but task is already holding lock: b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (kn->count#45){++++}: lock_acquire+0x31c/0x360 __kernfs_remove+0x290/0x490 kernfs_remove+0x30/0x44 sysfs_remove_dir+0x70/0x88 kobject_del+0x50/0xb0 sysfs_slab_unlink+0x2c/0x38 shutdown_cache+0xa0/0xf0 kmemcg_cache_shutdown_fn+0x1c/0x34 kmemcg_workfn+0x44/0x64 process_one_work+0x4f4/0x950 worker_thread+0x390/0x4bc kthread+0x1cc/0x1e8 ret_from_fork+0x10/0x18 -> #1 (slab_mutex){+.+.}: lock_acquire+0x31c/0x360 __mutex_lock_common+0x16c/0xf78 mutex_lock_nested+0x40/0x50 memcg_create_kmem_cache+0x38/0x16c memcg_kmem_cache_create_func+0x3c/0x70 process_one_work+0x4f4/0x950 worker_thread+0x390/0x4bc kthread+0x1cc/0x1e8 ret_from_fork+0x10/0x18 -> #0 (mem_hotplug_lock.rw_sem){++++}: validate_chain+0xd10/0x2bcc __lock_acquire+0x7f4/0xb8c lock_acquire+0x31c/0x360 get_online_mems+0x54/0x150 show_slab_objects+0x94/0x3a8 total_objects_show+0x28/0x34 slab_attr_show+0x38/0x54 sysfs_kf_seq_show+0x198/0x2d4 kernfs_seq_show+0xa4/0xcc seq_read+0x30c/0x8a8 kernfs_fop_read+0xa8/0x314 __vfs_read+0x88/0x20c vfs_read+0xd8/0x10c ksys_read+0xb0/0x120 __arm64_sys_read+0x54/0x88 el0_svc_handler+0x170/0x240 el0_svc+0x8/0xc other info that might help us debug this: Chain exists of: mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(kn->count#45); lock(slab_mutex); lock(kn->count#45); lock(mem_hotplug_lock.rw_sem); *** DEADLOCK *** 3 locks held by cat/5224: #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8 #1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0 #2: b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0 stack backtrace: Call trace: dump_backtrace+0x0/0x248 show_stack+0x20/0x2c dump_stack+0xd0/0x140 print_circular_bug+0x368/0x380 check_noncircular+0x248/0x250 validate_chain+0xd10/0x2bcc __lock_acquire+0x7f4/0xb8c lock_acquire+0x31c/0x360 get_online_mems+0x54/0x150 show_slab_objects+0x94/0x3a8 total_objects_show+0x28/0x34 slab_attr_show+0x38/0x54 sysfs_kf_seq_show+0x198/0x2d4 kernfs_seq_show+0xa4/0xcc seq_read+0x30c/0x8a8 kernfs_fop_read+0xa8/0x314 __vfs_read+0x88/0x20c vfs_read+0xd8/0x10c ksys_read+0xb0/0x120 __arm64_sys_read+0x54/0x88 el0_svc_handler+0x170/0x240 el0_svc+0x8/0xc I think it is important to mention that this doesn't expose the show_slab_objects to use-after-free. There is only a single path that might really race here and that is the slab hotplug notifier callback __kmem_cache_shrink (via slab_mem_going_offline_callback) but that path doesn't really destroy kmem_cache_node data structures. [1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html [akpm@linux-foundation.org: add comment explaining why we don't need mem_hotplug_lock] Link: http://lkml.kernel.org/r/1570192309-10132-1-git-send-email-cai@lca.pw Fixes: 01fb58bc ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path") Fixes: 03afc0e2 ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}") Signed-off-by: NQian Cai <cai@lca.pw> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Shakeel Butt 提交于
mainline inclusion from mainline-v5.3-rc1 commit cb097cd4 category: bugfix bugzilla: 18470 CVE: NA ------------------------------------------------- Currently for CONFIG_SLUB, if a memcg kmem cache creation is failed and the corresponding root kmem cache has SLAB_PANIC flag, the kernel will be crashed. This is unnecessary as the kernel can handle the creation failures of memcg kmem caches. Additionally CONFIG_SLAB does not implement this behavior. So, to keep the behavior consistent between SLAB and SLUB, removing the panic for memcg kmem cache creation failures. The root kmem cache creation failure for SLAB_PANIC correctly panics for both SLAB and SLUB. Link: http://lkml.kernel.org/r/20190619232514.58994-1-shakeelb@google.comReported-by: NDave Hansen <dave.hansen@intel.com> Signed-off-by: NShakeel Butt <shakeelb@google.com> Acked-by: NDavid Rientjes <rientjes@google.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: Roman Gushchin <guro@fb.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ntongtiangen <tongtiangen@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Nicolas Boichat 提交于
commit 6d6ea1e9 upstream. Patch series "iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables", v6. This is a followup to the discussion in [1], [2]. IOMMUs using ARMv7 short-descriptor format require page tables (level 1 and 2) to be allocated within the first 4GB of RAM, even on 64-bit systems. For L1 tables that are bigger than a page, we can just use __get_free_pages with GFP_DMA32 (on arm64 systems only, arm would still use GFP_DMA). For L2 tables that only take 1KB, it would be a waste to allocate a full page, so we considered 3 approaches: 1. This series, adding support for GFP_DMA32 slab caches. 2. genalloc, which requires pre-allocating the maximum number of L2 page tables (4096, so 4MB of memory). 3. page_frag, which is not very memory-efficient as it is unable to reuse freed fragments until the whole page is freed. [3] This series is the most memory-efficient approach. stable@ note: We confirmed that this is a regression, and IOMMU errors happen on 4.19 and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue most likely starts from commit ad67f5a6 ("arm64: replace ZONE_DMA with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number of Mediatek platforms (and maybe others?). [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html [2] https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031696.html [3] https://patchwork.codeaurora.org/patch/671639/ This patch (of 3): IOMMUs using ARMv7 short-descriptor format require page tables to be allocated within the first 4GB of RAM, even on 64-bit systems. On arm64, this is done by passing GFP_DMA32 flag to memory allocation functions. For IOMMU L2 tables that only take 1KB, it would be a waste to allocate a full page using get_free_pages, so we considered 3 approaches: 1. This patch, adding support for GFP_DMA32 slab caches. 2. genalloc, which requires pre-allocating the maximum number of L2 page tables (4096, so 4MB of memory). 3. page_frag, which is not very memory-efficient as it is unable to reuse freed fragments until the whole page is freed. This change makes it possible to create a custom cache in DMA32 zone using kmem_cache_create, then allocate memory using kmem_cache_alloc. We do not create a DMA32 kmalloc cache array, as there are currently no users of kmalloc(..., GFP_DMA32). These calls will continue to trigger a warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK. This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32 kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and unnecessary). Link: http://lkml.kernel.org/r/20181210011504.122604-2-drinkcat@chromium.orgSigned-off-by: NNicolas Boichat <drinkcat@chromium.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NWill Deacon <will.deacon@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Sasha Levin <Alexander.Levin@microsoft.com> Cc: Huaisheng Ye <yehs1@lenovo.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Yong Wu <yong.wu@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Tomasz Figa <tfiga@google.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Qian Cai 提交于
mainline inclusion from mainline-5.0 commit 278d7756 category: bugfix bugzilla: 11609 CVE: NA ------------------------------------------------ "addr" function argument is not used in alloc_consistency_checks() at all, so remove it. Link: http://lkml.kernel.org/r/20190211123214.35592-1-cai@lca.pw Fixes: becfda68 ("slub: convert SLAB_DEBUG_FREE to SLAB_CONSISTENCY_CHECKS") Signed-off-by: NQian Cai <cai@lca.pw> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Nzhong jiang <zhongjiang@huawei.com> Reviewed-by: NJing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 30 8月, 2018 1 次提交
-
-
由 Mukesh Ojha 提交于
The conversion of the hotplug notifiers to a state machine left the notifier.h includes around in some places. Remove them. Signed-off-by: NMukesh Ojha <mojha@codeaurora.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1535114033-4605-1-git-send-email-mojha@codeaurora.org
-
- 18 8月, 2018 1 次提交
-
-
由 Vlastimil Babka 提交于
In SLUB, prefetch_freepointer() is used when allocating an object from cache's freelist, to make sure the next object in the list is cache-hot, since it's probable it will be allocated soon. Commit 2482ddec ("mm: add SLUB free list pointer obfuscation") has unintentionally changed the prefetch in a way where the prefetch is turned to a real fetch, and only the next->next pointer is prefetched. In case there is not a stream of allocations that would benefit from prefetching, the extra real fetch might add a useless cache miss to the allocation. Restore the previous behavior. Link: http://lkml.kernel.org/r/20180809085245.22448-1-vbabka@suse.cz Fixes: 2482ddec ("mm: add SLUB free list pointer obfuscation") Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NKees Cook <keescook@chromium.org> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 6月, 2018 1 次提交
-
-
由 Mikulas Patocka 提交于
In kernel 4.17 I removed some code from dm-bufio that did slab cache merging (commit 21bb1327: "dm bufio: remove code that merges slab caches") - both slab and slub support merging caches with identical attributes, so dm-bufio now just calls kmem_cache_create and relies on implicit merging. This uncovered a bug in the slub subsystem - if we delete a cache and immediatelly create another cache with the same attributes, it fails because of duplicate filename in /sys/kernel/slab/. The slub subsystem offloads freeing the cache to a workqueue - and if we create the new cache before the workqueue runs, it complains because of duplicate filename in sysfs. This patch fixes the bug by moving the call of kobject_del from sysfs_slab_remove_workfn to shutdown_cache. kobject_del must be called while we hold slab_mutex - so that the sysfs entry is deleted before a cache with the same attributes could be created. Running device-mapper-test-suite with: dmtest run --suite thin-provisioning -n /commit_failure_causes_fallback/ triggered: Buffer I/O error on dev dm-0, logical block 1572848, async page read device-mapper: thin: 253:1: metadata operation 'dm_pool_alloc_data_block' failed: error = -5 device-mapper: thin: 253:1: aborting current metadata transaction sysfs: cannot create duplicate filename '/kernel/slab/:a-0000144' CPU: 2 PID: 1037 Comm: kworker/u48:1 Not tainted 4.17.0.snitm+ #25 Hardware name: Supermicro SYS-1029P-WTR/X11DDW-L, BIOS 2.0a 12/06/2017 Workqueue: dm-thin do_worker [dm_thin_pool] Call Trace: dump_stack+0x5a/0x73 sysfs_warn_dup+0x58/0x70 sysfs_create_dir_ns+0x77/0x80 kobject_add_internal+0xba/0x2e0 kobject_init_and_add+0x70/0xb0 sysfs_slab_add+0xb1/0x250 __kmem_cache_create+0x116/0x150 create_cache+0xd9/0x1f0 kmem_cache_create_usercopy+0x1c1/0x250 kmem_cache_create+0x18/0x20 dm_bufio_client_create+0x1ae/0x410 [dm_bufio] dm_block_manager_create+0x5e/0x90 [dm_persistent_data] __create_persistent_data_objects+0x38/0x940 [dm_thin_pool] dm_pool_abort_metadata+0x64/0x90 [dm_thin_pool] metadata_operation_failed+0x59/0x100 [dm_thin_pool] alloc_data_block.isra.53+0x86/0x180 [dm_thin_pool] process_cell+0x2a3/0x550 [dm_thin_pool] do_worker+0x28d/0x8f0 [dm_thin_pool] process_one_work+0x171/0x370 worker_thread+0x49/0x3f0 kthread+0xf8/0x130 ret_from_fork+0x35/0x40 kobject_add_internal failed for :a-0000144 with -EEXIST, don't try to register things with the same name in the same directory. kmem_cache_create(dm_bufio_buffer-16) failed with error -17 Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1806151817130.6333@file01.intranet.prod.int.rdu2.redhat.comSigned-off-by: NMikulas Patocka <mpatocka@redhat.com> Reported-by: NMike Snitzer <snitzer@redhat.com> Tested-by: NMike Snitzer <snitzer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 6月, 2018 2 次提交
-
-
由 Kees Cook 提交于
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org> -
由 Kees Cook 提交于
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 08 6月, 2018 9 次提交
-
-
由 Matthew Wilcox 提交于
Christoph doubts anyone was using the 'reserved' file in sysfs, so remove it. Link: http://lkml.kernel.org/r/20180518194519.3820-17-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Acked-by: NChristoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox 提交于
The reserved field was only used for embedding an rcu_head in the data structure. With the previous commit, we no longer need it. That lets us remove the 'reserved' argument to a lot of functions. Link: http://lkml.kernel.org/r/20180518194519.3820-16-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Acked-by: NChristoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox 提交于
rcu_head may now grow larger than list_head without affecting slab or slub. Link: http://lkml.kernel.org/r/20180518194519.3820-15-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Acked-by: NChristoph Lameter <cl@linux.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox 提交于
Since the LRU is two words, this does not affect the double-word alignment of SLUB's freelist. Link: http://lkml.kernel.org/r/20180518194519.3820-10-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox 提交于
By moving page->private to the fourth word of struct page, we can put the SLUB counters in the same word as SLAB's s_mem and still do the cmpxchg_double trick. Now the SLUB counters no longer overlap with the mapcount or refcount so we can drop the call to page_mapcount_reset() and simplify set_page_slub_counters() to a single line. Link: http://lkml.kernel.org/r/20180518194519.3820-6-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox 提交于
This will allow us to store slub's counters in the same bits as slab's s_mem. slub now needs to set page->mapping to NULL as it frees the page, just like slab does. Link: http://lkml.kernel.org/r/20180518194519.3820-5-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Acked-by: NChristoph Lameter <cl@linux.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Canjiang Lu 提交于
The obsolete comment removed in this patch was introduced by 51df1142 ("slub: Dynamically size kmalloc cache allocations"). I paste related modification from that commit: +#ifdef CONFIG_NUMA + /* + * Allocate kmem_cache_node properly from the kmem_cache slab. + * kmem_cache_node is separately allocated so no need to + * update any list pointers. + */ + temp_kmem_cache_node = kmem_cache_node; + kmem_cache_node = kmem_cache_alloc(kmem_cache, GFP_NOWAIT); + memcpy(kmem_cache_node, temp_kmem_cache_node, kmem_size); + + kmem_cache_bootstrap_fixup(kmem_cache_node); + + caches++; +#else + /* + * kmem_cache has kmem_cache_node embedded and we moved it! + * Update the list heads + */ + INIT_LIST_HEAD(&kmem_cache->local_node.partial); + list_splice(&temp_kmem_cache->local_node.partial, &kmem_cache->local_node.partial); +#ifdef CONFIG_SLUB_DEBUG + INIT_LIST_HEAD(&kmem_cache->local_node.full); + list_splice(&temp_kmem_cache->local_node.full, &kmem_cache->local_node.full); +#endif As we can see there're used to distinguish the difference handling between NUMA/non-NUMA configuration in the original commit. I think it doesn't make any sense in current implementation which is placed above kmem_cache_node = bootstrap(&boot_kmem_cache_node); So maybe it's better to remove them now? Link: http://lkml.kernel.org/r/5af26f58.1c69fb81.1be0e.c520SMTPIN_ADDED_BROKEN@mx.google.comSigned-off-by: NCanjiang Lu <canjiang.lu@samsung.com> Acked-by: NChristoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mathieu Malaterre 提交于
__printf is useful to verify format and arguments. Remove the following warning (with W=1): mm/slub.c:721:2: warning: function might be possible candidate for `gnu_printf' format attribute [-Wsuggest-attribute=format] Link: http://lkml.kernel.org/r/20180505200706.19986-1-malat@debian.orgSigned-off-by: NMathieu Malaterre <malat@debian.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox 提交于
__GFP_ZERO requests that the object be initialised to all-zeroes, while the purpose of a constructor is to initialise an object to a particular pattern. We cannot do both. Add a warning to catch any users who mistakenly pass a __GFP_ZERO flag when allocating a slab with a constructor. Link: http://lkml.kernel.org/r/20180412191322.GA21205@bombadil.infradead.org Fixes: d07dbea4 ("Slab allocators: support __GFP_ZERO in all allocators") Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 4月, 2018 1 次提交
-
-
由 Andrey Konovalov 提交于
The kasan_slab_free hook's return value denotes whether the reuse of a slab object must be delayed (e.g. when the object is put into memory qurantine). The current way SLUB handles this hook is by ignoring its return value and hardcoding checks similar (but not exactly the same) to the ones performed in kasan_slab_free, which is prone to making mistakes. The main difference between the hardcoded checks and the ones in kasan_slab_free is whether we want to perform a free in case when an invalid-free or a double-free was detected (we don't). This patch changes the way SLUB handles this by: 1. taking into account the return value of kasan_slab_free for each of the objects, that are being freed; 2. reconstructing the freelist of objects to exclude the ones, whose reuse must be delayed. [andreyknvl@google.com: eliminate unnecessary branch in slab_free] Link: http://lkml.kernel.org/r/a62759a2545fddf69b0c034547212ca1eb1b3ce2.1520359686.git.andreyknvl@google.com Link: http://lkml.kernel.org/r/083f58501e54731203801d899632d76175868e97.1519400992.git.andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com> Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kostya Serebryany <kcc@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 4月, 2018 5 次提交
-
-
由 Shakeel Butt 提交于
The kasan quarantine is designed to delay freeing slab objects to catch use-after-free. The quarantine can be large (several percent of machine memory size). When kmem_caches are deleted related objects are flushed from the quarantine but this requires scanning the entire quarantine which can be very slow. We have seen the kernel busily working on this while holding slab_mutex and badly affecting cache_reaper, slabinfo readers and memcg kmem cache creations. It can easily reproduced by following script: yes . | head -1000000 | xargs stat > /dev/null for i in `seq 1 10`; do seq 500 | (cd /cg/memory && xargs mkdir) seq 500 | xargs -I{} sh -c 'echo $BASHPID > \ /cg/memory/{}/tasks && exec stat .' > /dev/null seq 500 | (cd /cg/memory && xargs rmdir) done The busy stack: kasan_cache_shutdown shutdown_cache memcg_destroy_kmem_caches mem_cgroup_css_free css_free_rwork_fn process_one_work worker_thread kthread ret_from_fork This patch is based on the observation that if the kmem_cache to be destroyed is empty then there should not be any objects of this cache in the quarantine. Without the patch the script got stuck for couple of hours. With the patch the script completed within a second. Link: http://lkml.kernel.org/r/20180327230603.54721-1-shakeelb@google.comSigned-off-by: NShakeel Butt <shakeelb@google.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: NChristoph Lameter <cl@linux.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> -
由 Alexey Dobriyan 提交于
Function returns size of the object without red zone which can't be negative. Link: http://lkml.kernel.org/r/20180305200730.15812-24-adobriyan@gmail.comSigned-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Acked-by: NChristoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
struct kmem_cache_order_objects is for mixing order and number of objects, and orders aren't big enough to warrant 64-bit width. Propagate unsignedness down so that everything fits. !!! Patch assumes that "PAGE_SIZE << order" doesn't overflow. !!! Link: http://lkml.kernel.org/r/20180305200730.15812-23-adobriyan@gmail.comSigned-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
slab_index() returns index of an object within a slab which is at most u15 (or u16?). Iterators additionally guarantee that "p >= addr". Link: http://lkml.kernel.org/r/20180305200730.15812-22-adobriyan@gmail.comSigned-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Acked-by: NChristoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
If kmem case sizes are 32-bit, then usecopy region should be too. Link: http://lkml.kernel.org/r/20180305200730.15812-21-adobriyan@gmail.comSigned-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-