- 08 8月, 2020 8 次提交
-
-
由 Vlastimil Babka 提交于
There are few places that call kmem_cache_debug(s) (which tests if any of debug flags are enabled for a cache) immediately followed by a test for a specific flag. The compiler can probably eliminate the extra check, but we can make the code nicer by introducing kmem_cache_debug_flags() that works like kmem_cache_debug() (including the static key check) but tests for specific flag(s). The next patches will add more users. [vbabka@suse.cz: change return from int to bool, per Kees. Add VM_WARN_ON_ONCE() for invalid flags, per Roman] Link: http://lkml.kernel.org/r/949b90ed-e0f0-07d7-4d21-e30ec0958a7c@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NRoman Gushchin <guro@fb.com> Acked-by: NChristoph Lameter <cl@linux.com> Acked-by: NKees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-8-vbabka@suse.czSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vlastimil Babka 提交于
One advantage of CONFIG_SLUB_DEBUG is that a generic distro kernel can be built with the option enabled, but it's inactive until simply enabled on boot, without rebuilding the kernel. With a static key, we can further eliminate the overhead of checking whether a cache has a particular debug flag enabled if we know that there are no such caches (slub_debug was not enabled during boot). We use the same mechanism also for e.g. page_owner, debug_pagealloc or kmemcg functionality. This patch introduces the static key and makes the general check for per-cache debug flags kmem_cache_debug() use it. This benefits several call sites, including (slow path but still rather frequent) __slab_free(). The next patches will add more uses. Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NKees Cook <keescook@chromium.org> Acked-by: NRoman Gushchin <guro@fb.com> Acked-by: NChristoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-7-vbabka@suse.czSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vlastimil Babka 提交于
The attribute reflects the SLAB_RECLAIM_ACCOUNT cache flag. It's not clear why this attribute was writable in the first place, as it's tied to how the cache is used by its creator, it's not a user tunable. Furthermore: - it affects slab merging, but that's not being checked while toggled - if affects whether __GFP_RECLAIMABLE flag is used to allocate page, but the runtime toggle doesn't update allocflags - it affects cache_vmstat_idx() so runtime toggling might lead to incosistency of NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE Thus make it read-only. Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NKees Cook <keescook@chromium.org> Acked-by: NRoman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-6-vbabka@suse.czSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vlastimil Babka 提交于
SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can be read to check if the respective debugging options are enabled for given cache. Some options, namely sanity_checks, trace, and failslab can be also enabled and disabled at runtime by writing into the files. The runtime toggling is racy. Some options disable __CMPXCHG_DOUBLE when enabled, which means that in case of concurrent allocations, some can still use __CMPXCHG_DOUBLE and some not, leading to potential corruption. The s->flags field is also not updated or checked atomically. The simplest solution is to remove the runtime toggling. The extended slub_debug boot parameter syntax introduced by earlier patch should allow to fine-tune the debugging configuration during boot with same granularity. Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NKees Cook <keescook@chromium.org> Acked-by: NRoman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-5-vbabka@suse.czSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vlastimil Babka 提交于
SLUB allows runtime changing of page allocation order by writing into the /sys/kernel/slab/<cache>/order file. Jann has reported [1] that this interface allows the order to be set too small, leading to crashes. While it's possible to fix the immediate issue, closer inspection reveals potential races. Storing the new order calls calculate_sizes() which non-atomically updates a lot of kmem_cache fields while the cache is still in use. Unexpected behavior might occur even if the fields are set to the same value as they were. This could be fixed by splitting out the part of calculate_sizes() that depends on forced_order, so that we only update kmem_cache.oo field. This could still race with init_cache_random_seq(), shuffle_freelist(), allocate_slab(). Perhaps it's possible to audit and e.g. add some READ_ONCE/WRITE_ONCE accesses, it might be easier just to remove the runtime order changes, which is what this patch does. If there are valid usecases for per-cache order setting, we could e.g. extend the boot parameters to do that. [1] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.comReported-by: NJann Horn <jannh@google.com> Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NKees Cook <keescook@chromium.org> Acked-by: NChristoph Lameter <cl@linux.com> Acked-by: NRoman Gushchin <guro@fb.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-4-vbabka@suse.czSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vlastimil Babka 提交于
SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can be read to check if the respective debugging options are enabled for given cache. The options can be also toggled at runtime by writing into the files. Some of those, namely red_zone, poison, and store_user can be toggled only when no objects yet exist in the cache. Vijayanand reports [1] that there is a problem with freelist randomization if changing the debugging option's state results in different number of objects per page, and the random sequence cache needs thus needs to be recomputed. However, another problem is that the check for "no objects yet exist in the cache" is racy, as noted by Jann [2] and fixing that would add overhead or otherwise complicate the allocation/freeing paths. Thus it would be much simpler just to remove the runtime toggling support. The documentation describes it's "In case you forgot to enable debugging on the kernel command line", but the neccessity of having no objects limits its usefulness anyway for many caches. Vijayanand describes an use case [3] where debugging is enabled for all but zram caches for memory overhead reasons, and using the runtime toggles was the only way to achieve such configuration. After the previous patch it's now possible to do that directly from the kernel boot option, so we can remove the dangerous runtime toggles by making the /sys attribute files read-only. While updating it, also improve the documentation of the debugging /sys files. [1] https://lkml.kernel.org/r/1580379523-32272-1-git-send-email-vjitta@codeaurora.org [2] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com [3] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.orgReported-by: NVijayanand Jitta <vjitta@codeaurora.org> Reported-by: NJann Horn <jannh@google.com> Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NKees Cook <keescook@chromium.org> Acked-by: NRoman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-3-vbabka@suse.czSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vlastimil Babka 提交于
Patch series "slub_debug fixes and improvements". The slub_debug kernel boot parameter can either apply a single set of options to all caches or a list of caches. There is a use case where debugging is applied for all caches and then disabled at runtime for specific caches, for performance and memory consumption reasons [1]. As runtime changes are dangerous, extend the boot parameter syntax so that multiple blocks of either global or slab-specific options can be specified, with blocks delimited by ';'. This will also support the use case of [1] without runtime changes. For details see the updated Documentation/vm/slub.rst [1] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org [weiyongjun1@huawei.com: make parse_slub_debug_flags() static] Link: http://lkml.kernel.org/r/20200702150522.4940-1-weiyongjun1@huawei.comSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NKees Cook <keescook@chromium.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Roman Gushchin <guro@fb.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200610163135.17364-2-vbabka@suse.czSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Long Li 提交于
kmalloc cannot allocate memory from HIGHMEM. Allocating large amounts of memory currently bypasses the check and will simply leak the memory when page_address() returns NULL. To fix this, factor the GFP_SLAB_BUG_MASK check out of slab & slub, and call it from kmalloc_order() as well. In order to make the code clear, the warning message is put in one place. Signed-off-by: NLong Li <lonuxli.64@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NPekka Enberg <penberg@kernel.org> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200704035027.GA62481@lilongSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 7月, 2020 1 次提交
-
-
由 Kees Cook 提交于
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 26 6月, 2020 1 次提交
-
-
According to Christopher Lameter two fixes have been merged for the same problem. As far as I can tell, the code does not acquire the list_lock and invoke kmalloc(). list_slab_objects() misses an unlock (the counterpart to get_map()) and the memory allocated in free_partial() isn't used. Revert the mentioned commit. Link: http://lkml.kernel.org/r/20200618201234.795692-1-bigeasy@linutronix.de Fixes: aa456c7a ("slub: remove kmalloc under list_lock from list_slab_objects() V2") Link: https://lkml.kernel.org/r/alpine.DEB.2.22.394.2006181501480.12014@www.lameter.comSigned-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 6月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
Better describe what these functions do. Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 6月, 2020 1 次提交
-
-
由 Ethon Paul 提交于
There is a typo in comment, fix it. Signed-off-by: NEthon Paul <ethp@qq.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NDavid Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/20200411002247.14468-1-ethp@qq.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 6月, 2020 2 次提交
-
-
由 Joonsoo Kim 提交于
classzone_idx is just different name for high_zoneidx now. So, integrate them and add some comment to struct alloc_context in order to reduce future confusion about the meaning of this variable. The accessor, ac_classzone_idx() is also removed since it isn't needed after integration. In addition to integration, this patch also renames high_zoneidx to highest_zoneidx since it represents more precise meaning. Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NBaoquan He <bhe@redhat.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Ye Xiaolong <xiaolong.ye@intel.com> Link: http://lkml.kernel.org/r/1587095923-7515-3-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wang Hai 提交于
syzkaller reports for memory leak when kobject_init_and_add() returns an error in the function sysfs_slab_add() [1] When this happened, the function kobject_put() is not called for the corresponding kobject, which potentially leads to memory leak. This patch fixes the issue by calling kobject_put() even if kobject_init_and_add() fails. [1] BUG: memory leak unreferenced object 0xffff8880a6d4be88 (size 8): comm "syz-executor.3", pid 946, jiffies 4295772514 (age 18.396s) hex dump (first 8 bytes): 70 69 64 5f 33 00 ff ff pid_3... backtrace: kstrdup+0x35/0x70 mm/util.c:60 kstrdup_const+0x3d/0x50 mm/util.c:82 kvasprintf_const+0x112/0x170 lib/kasprintf.c:48 kobject_set_name_vargs+0x55/0x130 lib/kobject.c:289 kobject_add_varg lib/kobject.c:384 [inline] kobject_init_and_add+0xd8/0x170 lib/kobject.c:473 sysfs_slab_add+0x1d8/0x290 mm/slub.c:5811 __kmem_cache_create+0x50a/0x570 mm/slub.c:4384 create_cache+0x113/0x1e0 mm/slab_common.c:407 kmem_cache_create_usercopy+0x1a1/0x260 mm/slab_common.c:505 kmem_cache_create+0xd/0x10 mm/slab_common.c:564 create_pid_cachep kernel/pid_namespace.c:54 [inline] create_pid_namespace kernel/pid_namespace.c:96 [inline] copy_pid_ns+0x77c/0x8f0 kernel/pid_namespace.c:148 create_new_namespaces+0x26b/0xa30 kernel/nsproxy.c:95 unshare_nsproxy_namespaces+0xa7/0x1e0 kernel/nsproxy.c:229 ksys_unshare+0x3d2/0x770 kernel/fork.c:2969 __do_sys_unshare kernel/fork.c:3037 [inline] __se_sys_unshare kernel/fork.c:3035 [inline] __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3035 do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295 Fixes: 80da026a ("mm/slub: fix slab double-free in case of duplicate sysfs filename") Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NWang Hai <wanghai38@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200602115033.1054-1-wanghai38@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 6月, 2020 4 次提交
-
-
由 Qian Cai 提交于
There is no need to copy SLUB_STATS items from root memcg cache to new memcg cache copies. Doing so could result in stack overruns because the store function only accepts 0 to clear the stat and returns an error for everything else while the show method would print out the whole stat. Then, the mismatch of the lengths returns from show and store methods happens in memcg_propagate_slab_attrs(): else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf)) buf = mbuf; max_attr_size is only 2 from slab_attr_store(), then, it uses mbuf[64] in show_stat() later where a bounch of sprintf() would overrun the stack variable. Fix it by always allocating a page of buffer to be used in show_stat() if SLUB_STATS=y which should only be used for debug purpose. # echo 1 > /sys/kernel/slab/fs_cache/shrink BUG: KASAN: stack-out-of-bounds in number+0x421/0x6e0 Write of size 1 at addr ffffc900256cfde0 by task kworker/76:0/53251 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func Call Trace: number+0x421/0x6e0 vsnprintf+0x451/0x8e0 sprintf+0x9e/0xd0 show_stat+0x124/0x1d0 alloc_slowpath_show+0x13/0x20 __kmem_cache_create+0x47a/0x6b0 addr ffffc900256cfde0 is located in stack of task kworker/76:0/53251 at offset 0 in frame: process_one_work+0x0/0xb90 this frame has 1 object: [32, 72) 'lockdep_map' Memory state around the buggy address: ffffc900256cfc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffc900256cfd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffffc900256cfd80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 ^ ffffc900256cfe00: 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00 ffffc900256cfe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x6ac/0x6b0 Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func Call Trace: __kmem_cache_create+0x6ac/0x6b0 Fixes: 107dab5c ("slub: slub-specific propagation changes") Signed-off-by: NQian Cai <cai@lca.pw> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Glauber Costa <glauber@scylladb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200429222356.4322-1-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christopher Lameter 提交于
list_slab_objects() is called when a slab is destroyed and there are objects still left to list the objects in the syslog. This is a pretty rare event. And there it seems we take the list_lock and call kmalloc while holding that lock. Perform the allocation in free_partial() before the list_lock is taken. Fixes: bbd7d57b ("slub: Potential stack overflow") Signed-off-by: NChristopher Lameter <cl@linux.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Yu Zhao <yuzhao@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2002031721250.1668@www.lameter.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Lameter 提交于
I came across some unnecessary uevents once again which reminded me this. The patch seems to be lost in the leaves of the original discussion [1], so resending. [1] https://lore.kernel.org/r/alpine.DEB.2.21.2001281813130.745@www.lameter.com Kmem caches are internal kernel structures so it is strange that userspace notifiers would be needed. And I am not aware of any use of these notifiers. These notifiers may just exist because in the initial slub release the sysfs code was copied from another subsystem. Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichal Koutný <mkoutny@suse.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200423115721.19821-1-mkoutny@suse.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dongli Zhang 提交于
The slub_debug is able to fix the corrupted slab freelist/page. However, alloc_debug_processing() only checks the validity of current and next freepointer during allocation path. As a result, once some objects have their freepointers corrupted, deactivate_slab() may lead to page fault. Below is from a test kernel module when 'slub_debug=PUF,kmalloc-128 slub_nomerge'. The test kernel corrupts the freepointer of one free object on purpose. Unfortunately, deactivate_slab() does not detect it when iterating the freechain. BUG: unable to handle page fault for address: 00000000123456f8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ... ... RIP: 0010:deactivate_slab.isra.92+0xed/0x490 ... ... Call Trace: ___slab_alloc+0x536/0x570 __slab_alloc+0x17/0x30 __kmalloc+0x1d9/0x200 ext4_htree_store_dirent+0x30/0xf0 htree_dirblock_to_tree+0xcb/0x1c0 ext4_htree_fill_tree+0x1bc/0x2d0 ext4_readdir+0x54f/0x920 iterate_dir+0x88/0x190 __x64_sys_getdents+0xa6/0x140 do_syscall_64+0x49/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Therefore, this patch adds extra consistency check in deactivate_slab(). Once an object's freepointer is corrupted, all following objects starting at this object are isolated. [akpm@linux-foundation.org: fix build with CONFIG_SLAB_DEBUG=n] Signed-off-by: NDongli Zhang <dongli.zhang@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Joe Jin <joe.jin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200331031450.12182-1-dongli.zhang@oracle.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 5月, 2020 1 次提交
-
-
由 Waiman Long 提交于
In a couple of places in the slub memory allocator, the code uses "s->offset" as a check to see if the free pointer is put right after the object. That check is no longer true with commit 3202fa62 ("slub: relocate freelist pointer to middle of object"). As a result, echoing "1" into the validate sysfs file, e.g. of dentry, may cause a bunch of "Freepointer corrupt" error reports like the following to appear with the system in panic afterwards. ============================================================================= BUG dentry(666:pmcd.service) (Tainted: G B): Freepointer corrupt ----------------------------------------------------------------------------- To fix it, use the check "s->offset == s->inuse" in the new helper function freeptr_outside_object() instead. Also add another helper function get_info_end() to return the end of info block (inuse + free pointer if not overlapping with object). Fixes: 3202fa62 ("slub: relocate freelist pointer to middle of object") Signed-off-by: NWaiman Long <longman@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NKees Cook <keescook@chromium.org> Acked-by: NRafael Aquini <aquini@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Vitaly Nikolenko <vnik@duasynt.com> Cc: Silvio Cesare <silvio.cesare@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: Changbin Du <changbin.du@gmail.com> Link: http://lkml.kernel.org/r/20200429135328.26976-1-longman@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 4月, 2020 1 次提交
-
-
由 Kees Cook 提交于
Marco Elver reported system crashes when booting with "slub_debug=Z". The freepointer location (s->offset) was not taking into account that the "inuse" size that includes the redzone area should not be used by the freelist pointer. Change the calculation to save the area of the object that an inline freepointer may be written into. Fixes: 3202fa62 ("slub: relocate freelist pointer to middle of object") Reported-by: NMarco Elver <elver@google.com> Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Tested-by: NMarco Elver <elver@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/202004151054.BD695840@keescook Link: https://lore.kernel.org/linux-mm/20200415164726.GA234932@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 4月, 2020 2 次提交
-
-
由 Jules Irenge 提交于
Sparse reports a warning at put_map()() warning: context imbalance in put_map() - unexpected unlock The root cause is the missing annotation at put_map() Add the missing __releases(&object_map_lock) annotation Signed-off-by: NJules Irenge <jbi.octave@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200214204741.94112-10-jbi.octave@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jules Irenge 提交于
Sparse reports a warning at get_map()() warning: context imbalance in get_map() - wrong count at exit The root cause is the missing annotation at get_map() Add the missing __acquires(&object_map_lock) annotation Signed-off-by: NJules Irenge <jbi.octave@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200214204741.94112-9-jbi.octave@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 4月, 2020 4 次提交
-
-
由 Kees Cook 提交于
In a recent discussion[1] with Vitaly Nikolenko and Silvio Cesare, it became clear that moving the freelist pointer away from the edge of allocations would likely improve the overall defensive posture of the inline freelist pointer. My benchmarks show no meaningful change to performance (they seem to show it being faster), so this looks like a reasonable change to make. Instead of having the freelist pointer at the very beginning of an allocation (offset 0) or at the very end of an allocation (effectively offset -sizeof(void *) from the next allocation), move it away from the edges of the allocation and into the middle. This provides some protection against small-sized neighboring overflows (or underflows), for which the freelist pointer is commonly the target. (Large or well controlled overwrites are much more likely to attack live object contents, instead of attempting freelist corruption.) The vaunted kernel build benchmark, across 5 runs. Before: Mean: 250.05 Std Dev: 1.85 and after, which appears mysteriously faster: Mean: 247.13 Std Dev: 0.76 Attempts at running "sysbench --test=memory" show the change to be well in the noise (sysbench seems to be pretty unstable here -- it's not really measuring allocation). Hackbench is more allocation-heavy, and while the std dev is above the difference, it looks like may manifest as an improvement as well: 20 runs of "hackbench -g 20 -l 1000", before: Mean: 36.322 Std Dev: 0.577 and after: Mean: 36.056 Std Dev: 0.598 [1] https://twitter.com/vnik5287/status/1235113523098685440Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NChristoph Lameter <cl@linux.com> Cc: Vitaly Nikolenko <vnik@duasynt.com> Cc: Silvio Cesare <silvio.cesare@gmail.com> Cc: Christoph Lameter <cl@linux.com>Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/202003051624.AAAC9AECC@keescookSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
Under CONFIG_SLAB_FREELIST_HARDENED=y, the obfuscation was relatively weak in that the ptr and ptr address were usually so close that the first XOR would result in an almost entirely 0-byte value[1], leaving most of the "secret" number ultimately being stored after the third XOR. A single blind memory content exposure of the freelist was generally sufficient to learn the secret. Add a swab() call to mix bits a little more. This is a cheap way (1 cycle) to make attacks need more than a single exposure to learn the secret (or to know _where_ the exposure is in memory). kmalloc-32 freelist walk, before: ptr ptr_addr stored value secret ffff90c22e019020@ffff90c22e019000 is 86528eb656b3b5bd (86528eb656b3b59d) ffff90c22e019040@ffff90c22e019020 is 86528eb656b3b5fd (86528eb656b3b59d) ffff90c22e019060@ffff90c22e019040 is 86528eb656b3b5bd (86528eb656b3b59d) ffff90c22e019080@ffff90c22e019060 is 86528eb656b3b57d (86528eb656b3b59d) ffff90c22e0190a0@ffff90c22e019080 is 86528eb656b3b5bd (86528eb656b3b59d) ... after: ptr ptr_addr stored value secret ffff9eed6e019020@ffff9eed6e019000 is 793d1135d52cda42 (86528eb656b3b59d) ffff9eed6e019040@ffff9eed6e019020 is 593d1135d52cda22 (86528eb656b3b59d) ffff9eed6e019060@ffff9eed6e019040 is 393d1135d52cda02 (86528eb656b3b59d) ffff9eed6e019080@ffff9eed6e019060 is 193d1135d52cdae2 (86528eb656b3b59d) ffff9eed6e0190a0@ffff9eed6e019080 is f93d1135d52cdac2 (86528eb656b3b59d) [1] https://blog.infosectcbr.com.au/2020/03/weaknesses-in-linux-kernel-heap.html Fixes: 2482ddec ("mm: add SLUB free list pointer obfuscation") Reported-by: NSilvio Cesare <silvio.cesare@gmail.com> Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/202003051623.AF4F8CB@keescookSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 chenqiwu 提交于
There are slub_cpu_partial() and slub_set_cpu_partial() APIs to wrap kmem_cache->cpu_partial. This patch will use the two APIs to replace kmem_cache->cpu_partial in slub code. Signed-off-by: Nchenqiwu <chenqiwu@xiaomi.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/1582079562-17980-1-git-send-email-qiwuchen55@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 chenqiwu 提交于
There are slub_percpu_partial() and slub_set_percpu_partial() APIs to wrap kmem_cache->cpu_partial. This patch will use the two to replace cpu_slab->partial in slub code. Signed-off-by: Nchenqiwu <chenqiwu@xiaomi.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/1581951895-3038-1-git-send-email-qiwuchen55@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 3月, 2020 1 次提交
-
-
由 Daniel Vetter 提交于
slab does this already, and I want to use this in a memory allocation tracker in drm for stuff that's tied to the lifetime of a drm_device, not the underlying struct device. Kinda like devres, but for drm. Acked-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Link: https://patchwork.freedesktop.org/patch/msgid/20200323144950.3018436-2-daniel.vetter@ffwll.ch
-
- 22 3月, 2020 1 次提交
-
-
由 Vlastimil Babka 提交于
Sachin reports [1] a crash in SLUB __slab_alloc(): BUG: Kernel NULL pointer dereference on read at 0x000073b0 Faulting instruction address: 0xc0000000003d55f4 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1 NIP: c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000 REGS: c0000008b37836d0 TRAP: 0300 Not tainted (5.6.0-rc2-next-20200218-autotest) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24004844 XER: 00000000 CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1 GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500 GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620 GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000 GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000 GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002 GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122 GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8 GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180 NIP ___slab_alloc+0x1f4/0x760 LR __slab_alloc+0x34/0x60 Call Trace: ___slab_alloc+0x334/0x760 (unreliable) __slab_alloc+0x34/0x60 __kmalloc_node+0x110/0x490 kvmalloc_node+0x58/0x110 mem_cgroup_css_online+0x108/0x270 online_css+0x48/0xd0 cgroup_apply_control_enable+0x2ec/0x4d0 cgroup_mkdir+0x228/0x5f0 kernfs_iop_mkdir+0x90/0xf0 vfs_mkdir+0x110/0x230 do_mkdirat+0xb0/0x1a0 system_call+0x5c/0x68 This is a PowerPC platform with following NUMA topology: available: 2 nodes (0-1) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 node 1 size: 35247 MB node 1 free: 30907 MB node distances: node 0 1 0: 10 40 1: 40 10 possible numa nodes: 0-31 This only happens with a mmotm patch "mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node" [2] which effectively calls kmalloc_node for each possible node. SLUB however only allocates kmem_cache_node on online N_NORMAL_MEMORY nodes, and relies on node_to_mem_node to return such valid node for other nodes since commit a561ce00 ("slub: fall back to node_to_mem_node() node if allocating on memoryless node"). This is however not true in this configuration where the _node_numa_mem_ array is not initialized for nodes 0 and 2-31, thus it contains zeroes and get_partial() ends up accessing non-allocated kmem_cache_node. A related issue was reported by Bharata (originally by Ramachandran) [3] where a similar PowerPC configuration, but with mainline kernel without patch [2] ends up allocating large amounts of pages by kmalloc-1k kmalloc-512. This seems to have the same underlying issue with node_to_mem_node() not behaving as expected, and might probably also lead to an infinite loop with CONFIG_SLUB_CPU_PARTIAL [4]. This patch should fix both issues by not relying on node_to_mem_node() anymore and instead simply falling back to NUMA_NO_NODE, when kmalloc_node(node) is attempted for a node that's not online, or has no usable memory. The "usable memory" condition is also changed from node_present_pages() to N_NORMAL_MEMORY node state, as that is exactly the condition that SLUB uses to allocate kmem_cache_node structures. The check in get_partial() is removed completely, as the checks in ___slab_alloc() are now sufficient to prevent get_partial() being reached with an invalid node. [1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/ [2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtuozzo.com/ [3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/ [4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.cz/ Fixes: a561ce00 ("slub: fall back to node_to_mem_node() node if allocating on memoryless node") Reported-by: NSachin Sant <sachinp@linux.vnet.ibm.com> Reported-by: NPUVICHAKRAVARTHY RAMACHANDRAN <puvichakravarthy@in.ibm.com> Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Tested-by: NSachin Sant <sachinp@linux.vnet.ibm.com> Tested-by: NBharata B Rao <bharata@linux.ibm.com> Reviewed-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christopher Lameter <cl@linux.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200320115533.9604-1-vbabka@suse.czDebugged-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 3月, 2020 2 次提交
-
-
由 Linus Torvalds 提交于
This is just a cleanup addition to Jann's fix to properly update the transaction ID for the slub slowpath in commit fd4d9c7d ("mm: slub: add missing TID bump.."). The transaction ID is what protects us against any concurrent accesses, but we should really also make sure to make the 'freelist' comparison itself always use the same freelist value that we then used as the new next free pointer. Jann points out that if we do all of this carefully, we could skip the transaction ID update for all the paths that only remove entries from the lists, and only update the TID when adding entries (to avoid the ABA issue with cmpxchg and list handling re-adding a previously seen value). But this patch just does the "make sure to cmpxchg the same value we used" rather than then try to be clever. Acked-by: NJann Horn <jannh@google.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jann Horn 提交于
When kmem_cache_alloc_bulk() attempts to allocate N objects from a percpu freelist of length M, and N > M > 0, it will first remove the M elements from the percpu freelist, then call ___slab_alloc() to allocate the next element and repopulate the percpu freelist. ___slab_alloc() can re-enable IRQs via allocate_slab(), so the TID must be bumped before ___slab_alloc() to properly commit the freelist head change. Fix it by unconditionally bumping c->tid when entering the slowpath. Cc: stable@vger.kernel.org Fixes: ebe909e0 ("slub: improve bulk alloc strategy") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 2月, 2020 1 次提交
-
-
由 Yu Zhao 提交于
If we are already under list_lock, don't call kmalloc(). Otherwise we will run into a deadlock because kmalloc() also tries to grab the same lock. Fix the problem by using a static bitmap instead. WARNING: possible recursive locking detected -------------------------------------------- mount-encrypted/4921 is trying to acquire lock: (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 but task is already holding lock: (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&n->list_lock)->rlock); lock(&(&n->list_lock)->rlock); *** DEADLOCK *** Link: http://lkml.kernel.org/r/20191108193958.205102-2-yuzhao@google.comSigned-off-by: NYu Zhao <yuzhao@google.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 1月, 2020 1 次提交
-
-
The allocation mask is no longer used by on_each_cpu_cond() and on_each_cpu_cond_mask() and can be removed. Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20200117090137.1205765-4-bigeasy@linutronix.de
-
- 14 1月, 2020 1 次提交
-
-
由 Vlastimil Babka 提交于
Commit 96a2b03f ("mm, debug_pagelloc: use static keys to enable debugging") has introduced a static key to reduce overhead when debug_pagealloc is compiled in but not enabled. It relied on the assumption that jump_label_init() is called before parse_early_param() as in start_kernel(), so when the "debug_pagealloc=on" option is parsed, it is safe to enable the static key. However, it turns out multiple architectures call parse_early_param() earlier from their setup_arch(). x86 also calls jump_label_init() even earlier, so no issue was found while testing the commit, but same is not true for e.g. ppc64 and s390 where the kernel would not boot with debug_pagealloc=on as found by our QA. To fix this without tricky changes to init code of multiple architectures, this patch partially reverts the static key conversion from 96a2b03f. Init-time and non-fastpath calls (such as in arch code) of debug_pagealloc_enabled() will again test a simple bool variable. Fastpath mm code is converted to a new debug_pagealloc_enabled_static() variant that relies on the static key, which is enabled in a well-defined point in mm_init() where it's guaranteed that jump_label_init() has been called, regardless of architecture. [sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early] Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz Fixes: 96a2b03f ("mm, debug_pagelloc: use static keys to enable debugging") Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Qian Cai <cai@lca.pw> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 12月, 2019 1 次提交
-
-
由 Thomas Gleixner 提交于
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the pte_unmap_same() and SLUB code over to use CONFIG_PREEMPTION. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NChistoph Lameter <cl@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20191015191821.11479-26-bigeasy@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 01 12月, 2019 3 次提交
-
-
由 Yu Zhao 提交于
The function doesn't need to return any value, and the check can be done in one pass. There is a behavior change: before the patch, we stop at the first invalid free object; after the patch, we stop at the first invalid object, free or in use. This shouldn't matter because the original behavior isn't intended anyway. Link: http://lkml.kernel.org/r/20191108193958.205102-1-yuzhao@google.comSigned-off-by: NYu Zhao <yuzhao@google.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yu Zhao 提交于
Slub doesn't use PG_active and PG_error anymore. Link: http://lkml.kernel.org/r/20191007222023.162256-1-yuzhao@google.comSigned-off-by: NYu Zhao <yuzhao@google.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Miles Chen 提交于
With commit ad67b74d ("printk: hash addresses printed with %p"), it is a little bit harder to match the fault addresses printed by check_bytes_and_report() or slab_pad_check() in the dump because the fault addresses may not show up in the dump. Print the offset of the fault addresses to make it easier to match the incorrect poison or padding values in the dump. Before: We have to search the "63" in the dump. If we want to get the offset of 63, we have to count it from the start of Object dump. ============================================================= BUG kmalloc-128 (Not tainted): Poison overwritten ------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: 0x00000000570da294-0x00000000570da294. First byte 0x63 instead of 0x6b ... INFO: Object 0x000000006ebb3b9e @offset=14208 fp=0x0000000065862488 Redzone 00000000a6abccff: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 00000000741c16f0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 0000000061ad278f: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 000000000467c1bd: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 000000008812766b: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 000000003d9b8f25: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 0000000000d80c33: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 00000000867b0d90: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Object 000000006ebb3b9e: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 000000005ea59a9f: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 000000003ef8bddc: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 000000008190375d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 000000006df7fb32: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 0000000069474eae: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 0000000008073b7d: 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 00000000b45ae74d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 After: We know the fault address is at @offset=1508, and the Object is at @offset=1408, so we know the fault address is at offset=100 within the object. ========================================================= BUG kmalloc-128 (Not tainted): Poison overwritten --------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: 0x00000000638ec1d1-0x00000000638ec1d1 @offset=1508. First byte 0x63 instead of 0x6b ... INFO: Object 0x000000008171818d @offset=1408 fp=0x0000000066dae230 Redzone 00000000e2697ab6: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 0000000064b6a381: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 00000000e413a234: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 0000000004c1dfeb: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 000000009ad24d42: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 000000002a196a23: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 00000000a7b8468a: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Redzone 0000000088db6da3: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb Object 000000008171818d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 000000007c4035d4: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 000000004dd281a4: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 0000000079121dff: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 00000000756682a9: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 0000000053b7e541: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 0000000091f8d530: 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Object 000000009c76035c: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 Link: http://lkml.kernel.org/r/20190925140807.20490-1-miles.chen@mediatek.comSigned-off-by: NMiles Chen <miles.chen@mediatek.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 11月, 2019 1 次提交
-
-
由 Laura Abbott 提交于
Commit 1b7e816f ("mm: slub: Fix slab walking for init_on_free") fixed one problem with the slab walking but missed a key detail: When walking the list, the head and tail pointers need to be updated since we end up reversing the list as a result. Without doing this, bulk free is broken. One way this is exposed is a NULL pointer with slub_debug=F: ============================================================================= BUG skbuff_head_cache (Tainted: G T): Object already free ----------------------------------------------------------------------------- INFO: Slab 0x000000000d2d2f8f objects=16 used=3 fp=0x0000000064309071 flags=0x3fff00000000201 BUG: kernel NULL pointer dereference, address: 0000000000000000 Oops: 0000 [#1] PREEMPT SMP PTI RIP: 0010:print_trailer+0x70/0x1d5 Call Trace: <IRQ> free_debug_processing.cold.37+0xc9/0x149 __slab_free+0x22a/0x3d0 kmem_cache_free_bulk+0x415/0x420 __kfree_skb_flush+0x30/0x40 net_rx_action+0x2dd/0x480 __do_softirq+0xf0/0x246 irq_exit+0x93/0xb0 do_IRQ+0xa0/0x110 common_interrupt+0xf/0xf </IRQ> Given we're now almost identical to the existing debugging code which correctly walks the list, combine with that. Link: https://lkml.kernel.org/r/20191104170303.GA50361@gandi.net Link: http://lkml.kernel.org/r/20191106222208.26815-1-labbott@redhat.com Fixes: 1b7e816f ("mm: slub: Fix slab walking for init_on_free") Signed-off-by: NLaura Abbott <labbott@redhat.com> Reported-by: NThibaut Sautereau <thibaut.sautereau@clip-os.org> Acked-by: NDavid Rientjes <rientjes@google.com> Tested-by: NAlexander Potapenko <glider@google.com> Acked-by: NAlexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <clipos@ssi.gouv.fr> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 10月, 2019 2 次提交
-
-
由 Alexander Potapenko 提交于
slab_alloc_node() already zeroed out the freelist pointer if init_on_free was on. Thibaut Sautereau noticed that the same needs to be done for kmem_cache_alloc_bulk(), which performs the allocations separately. kmem_cache_alloc_bulk() is currently used in two places in the kernel, so this change is unlikely to have a major performance impact. SLAB doesn't require a similar change, as auto-initialization makes the allocator store the freelist pointers off-slab. Link: http://lkml.kernel.org/r/20191007091605.30530-1-glider@google.com Fixes: 6471384a ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options") Signed-off-by: NAlexander Potapenko <glider@google.com> Reported-by: NThibaut Sautereau <thibaut@sautereau.fr> Reported-by: NKees Cook <keescook@chromium.org> Cc: Christoph Lameter <cl@linux.com> Cc: Laura Abbott <labbott@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Qian Cai 提交于
A long time ago we fixed a similar deadlock in show_slab_objects() [1]. However, it is apparently due to the commits like 01fb58bc ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path") and 03afc0e2 ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back by just reading files in /sys/kernel/slab which will generate a lockdep splat below. Since the "mem_hotplug_lock" here is only to obtain a stable online node mask while racing with NUMA node hotplug, in the worst case, the results may me miscalculated while doing NUMA node hotplug, but they shall be corrected by later reads of the same files. WARNING: possible circular locking dependency detected ------------------------------------------------------ cat/5224 is trying to acquire lock: ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at: show_slab_objects+0x94/0x3a8 but task is already holding lock: b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (kn->count#45){++++}: lock_acquire+0x31c/0x360 __kernfs_remove+0x290/0x490 kernfs_remove+0x30/0x44 sysfs_remove_dir+0x70/0x88 kobject_del+0x50/0xb0 sysfs_slab_unlink+0x2c/0x38 shutdown_cache+0xa0/0xf0 kmemcg_cache_shutdown_fn+0x1c/0x34 kmemcg_workfn+0x44/0x64 process_one_work+0x4f4/0x950 worker_thread+0x390/0x4bc kthread+0x1cc/0x1e8 ret_from_fork+0x10/0x18 -> #1 (slab_mutex){+.+.}: lock_acquire+0x31c/0x360 __mutex_lock_common+0x16c/0xf78 mutex_lock_nested+0x40/0x50 memcg_create_kmem_cache+0x38/0x16c memcg_kmem_cache_create_func+0x3c/0x70 process_one_work+0x4f4/0x950 worker_thread+0x390/0x4bc kthread+0x1cc/0x1e8 ret_from_fork+0x10/0x18 -> #0 (mem_hotplug_lock.rw_sem){++++}: validate_chain+0xd10/0x2bcc __lock_acquire+0x7f4/0xb8c lock_acquire+0x31c/0x360 get_online_mems+0x54/0x150 show_slab_objects+0x94/0x3a8 total_objects_show+0x28/0x34 slab_attr_show+0x38/0x54 sysfs_kf_seq_show+0x198/0x2d4 kernfs_seq_show+0xa4/0xcc seq_read+0x30c/0x8a8 kernfs_fop_read+0xa8/0x314 __vfs_read+0x88/0x20c vfs_read+0xd8/0x10c ksys_read+0xb0/0x120 __arm64_sys_read+0x54/0x88 el0_svc_handler+0x170/0x240 el0_svc+0x8/0xc other info that might help us debug this: Chain exists of: mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(kn->count#45); lock(slab_mutex); lock(kn->count#45); lock(mem_hotplug_lock.rw_sem); *** DEADLOCK *** 3 locks held by cat/5224: #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8 #1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0 #2: b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0 stack backtrace: Call trace: dump_backtrace+0x0/0x248 show_stack+0x20/0x2c dump_stack+0xd0/0x140 print_circular_bug+0x368/0x380 check_noncircular+0x248/0x250 validate_chain+0xd10/0x2bcc __lock_acquire+0x7f4/0xb8c lock_acquire+0x31c/0x360 get_online_mems+0x54/0x150 show_slab_objects+0x94/0x3a8 total_objects_show+0x28/0x34 slab_attr_show+0x38/0x54 sysfs_kf_seq_show+0x198/0x2d4 kernfs_seq_show+0xa4/0xcc seq_read+0x30c/0x8a8 kernfs_fop_read+0xa8/0x314 __vfs_read+0x88/0x20c vfs_read+0xd8/0x10c ksys_read+0xb0/0x120 __arm64_sys_read+0x54/0x88 el0_svc_handler+0x170/0x240 el0_svc+0x8/0xc I think it is important to mention that this doesn't expose the show_slab_objects to use-after-free. There is only a single path that might really race here and that is the slab hotplug notifier callback __kmem_cache_shrink (via slab_mem_going_offline_callback) but that path doesn't really destroy kmem_cache_node data structures. [1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html [akpm@linux-foundation.org: add comment explaining why we don't need mem_hotplug_lock] Link: http://lkml.kernel.org/r/1570192309-10132-1-git-send-email-cai@lca.pw Fixes: 01fb58bc ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path") Fixes: 03afc0e2 ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}") Signed-off-by: NQian Cai <cai@lca.pw> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-