- 24 3月, 2022 2 次提交
-
-
由 Randy Dunlap 提交于
initcall_blacklist() should return 1 to indicate that it handled its cmdline arguments. set_debug_rodata() should return 1 to indicate that it handled its cmdline arguments. Print a warning if the option string is invalid. This prevents these strings from being added to the 'init' program's environment as they are not init arguments/parameters. Link: https://lkml.kernel.org/r/20220221050901.23985-1-rdunlap@infradead.orgSigned-off-by: NRandy Dunlap <rdunlap@infradead.org> Reported-by: NIgor Zhbanov <i.zhbanov@omprussia.ru> Cc: Ingo Molnar <mingo@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mark-PK Tsai 提交于
Use ktime_us_delta() to make the initcall_debug log more precise than right shifting the result of ktime_to_ns() by 10 bits. Link: https://lkml.kernel.org/r/20220209053350.15771-1-mark-pk.tsai@mediatek.comSigned-off-by: NMark-PK Tsai <mark-pk.tsai@mediatek.com> Reviewed-by: NAndrew Halaney <ahalaney@redhat.com> Tested-by: NAndrew Halaney <ahalaney@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Kees Cook <keescook@chromium.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: YJ Chiang <yj.chiang@mediatek.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 1月, 2022 1 次提交
-
-
由 Vlastimil Babka 提交于
Currently, enabling CONFIG_STACKDEPOT means its stack_table will be allocated from memblock, even if stack depot ends up not actually used. The default size of stack_table is 4MB on 32-bit, 8MB on 64-bit. This is fine for use-cases such as KASAN which is also a config option and has overhead on its own. But it's an issue for functionality that has to be actually enabled on boot (page_owner) or depends on hardware (GPU drivers) and thus the memory might be wasted. This was raised as an issue [1] when attempting to add stackdepot support for SLUB's debug object tracking functionality. It's common to build kernels with CONFIG_SLUB_DEBUG and enable slub_debug on boot only when needed, or create only specific kmem caches with debugging for testing purposes. It would thus be more efficient if stackdepot's table was allocated only when actually going to be used. This patch thus makes the allocation (and whole stack_depot_init() call) optional: - Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN select this flag. - Other users have to call stack_depot_init() as part of their own init when it's determined that stack depot will actually be used. This may depend on both config and runtime conditions. Convert current users which are page_owner and several in the DRM subsystem. Same will be done for SLUB later. - Because the init might now be called after the boot-time memblock allocation has given all memory to the buddy allocator, change stack_depot_init() to allocate stack_table with kvmalloc() when memblock is no longer available. Also handle allocation failure by disabling stackdepot (could have theoretically happened even with memblock allocation previously), and don't unnecessarily align the memblock allocation to its own size anymore. [1] https://lore.kernel.org/all/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/ Link: https://lkml.kernel.org/r/20211013073005.11351-1-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NDmitry Vyukov <dvyukov@google.com> Reviewed-by: Marco Elver <elver@google.com> # stackdepot Cc: Marco Elver <elver@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Oliver Glitta <glittao@gmail.com> Cc: Imran Khan <imran.f.khan@oracle.com> From: Colin Ian King <colin.king@canonical.com> Subject: lib/stackdepot: fix spelling mistake and grammar in pr_err message There is a spelling mistake of the work allocation so fix this and re-phrase the message to make it easier to read. Link: https://lkml.kernel.org/r/20211015104159.11282-1-colin.king@canonical.comSigned-off-by: NColin Ian King <colin.king@canonical.com> Cc: Vlastimil Babka <vbabka@suse.cz> From: Vlastimil Babka <vbabka@suse.cz> Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup On FLATMEM, we call page_ext_init_flatmem_late() just before kmem_cache_init() which means stack_depot_init() (called by page owner init) will not recognize properly it should use kvmalloc() and not memblock_alloc(). memblock_alloc() will also not issue a warning and return a block memory that can be invalid and cause kernel page fault when saving stacks, as reported by the kernel test robot [1]. Fix this by moving page_ext_init_flatmem_late() below kmem_cache_init() so that slab_is_available() is true during stack_depot_init(). SPARSEMEM doesn't have this issue, as it doesn't do page_ext_init_flatmem_late(), but a different page_ext_init() even later in the boot process. Thanks to Mike Rapoport for pointing out the FLATMEM init ordering issue. While at it, also actually resolve a checkpatch warning in stack_depot_init() from DRM CI, which was supposed to be in the original patch already. [1] https://lore.kernel.org/all/20211014085450.GC18719@xsang-OptiPlex-9020/ Link: https://lkml.kernel.org/r/6abd9213-19a9-6d58-cedc-2414386d2d81@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Reported-by: Nkernel test robot <oliver.sang@intel.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> From: Vlastimil Babka <vbabka@suse.cz> Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup3 Due to cd06ab2f ("drm/locking: add backtrace for locking contended locks without backoff") landing recently to -next adding a new stack depot user in drivers/gpu/drm/drm_modeset_lock.c we need to add an appropriate call to stack_depot_init() there as well. Link: https://lkml.kernel.org/r/2a692365-cfa1-64f2-34e0-8aa5674dce5e@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Marco Elver <elver@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Oliver Glitta <glittao@gmail.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> From: Vlastimil Babka <vbabka@suse.cz> Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup4 Due to 4e66934e ("lib: add reference counting tracking infrastructure") landing recently to net-next adding a new stack depot user in lib/ref_tracker.c we need to add an appropriate call to stack_depot_init() there as well. Link: https://lkml.kernel.org/r/45c1b738-1a2f-5b5f-2f6d-86fab206d01c@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NEric Dumazet <edumazet@google.com> Cc: Jiri Slab <jirislaby@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 11月, 2021 1 次提交
-
-
由 Andrew Halaney 提交于
The prior message is confusing users, which is the exact opposite of the goal. If the message is being seen, one of the following situations is happening: 1. the param is misspelled 2. the param is not valid due to the kernel configuration 3. the param is intended for init but isn't after the '--' delineator on the command line To make that more clear to the user, explicitly mention "kernel command line" and also note that the params are still passed to user space to avoid causing any alarm over params intended for init. Link: https://lkml.kernel.org/r/20211013223502.96756-1-ahalaney@redhat.com Fixes: 86d1919a ("init: print out unknown kernel parameters") Signed-off-by: NAndrew Halaney <ahalaney@redhat.com> Suggested-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: NRandy Dunlap <rdunlap@infradead.org> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 11月, 2021 2 次提交
-
-
由 Mike Rapoport 提交于
Rename memblock_free_ptr() to memblock_free() and use memblock_free() when freeing a virtual pointer so that memblock_free() will be a counterpart of memblock_alloc() The callers are updated with the below semantic patch and manual addition of (void *) casting to pointers that are represented by unsigned long variables. @@ identifier vaddr; expression size; @@ ( - memblock_phys_free(__pa(vaddr), size); + memblock_free(vaddr, size); | - memblock_free_ptr(vaddr, size); + memblock_free(vaddr, size); ) [sfr@canb.auug.org.au: fixup] Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.orgSigned-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Juergen Gross <jgross@suse.com> Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christophe Leroy 提交于
core_kernel_text() considers that until system_state in at least SYSTEM_RUNNING, init memory is valid. But init memory is freed a few lines before setting SYSTEM_RUNNING, so we have a small period of time when core_kernel_text() is wrong. Create an intermediate system state called SYSTEM_FREEING_INIT that is set before starting freeing init memory, and use it in core_kernel_text() to report init memory invalid earlier. Link: https://lkml.kernel.org/r/9ecfdee7dd4d741d172cb93ff1d87f1c58127c9a.1633001016.git.christophe.leroy@csgroup.euSigned-off-by: NChristophe Leroy <christophe.leroy@csgroup.eu> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 10月, 2021 1 次提交
-
-
由 Christoph Hellwig 提交于
Except for the features passed to blk_queue_required_elevator_features, elevator.h is only needed internally to the block layer. Move the ELEVATOR_F_* definitions to blkdev.h, and the move elevator.h to block/, dropping all the spurious includes outside of that. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-13-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 11 10月, 2021 4 次提交
-
-
由 Masami Hiramatsu 提交于
Free unused memblock in a error case to fix memblock leak in xbc_make_cmdline(). Link: https://lkml.kernel.org/r/163177339181.682366.8713781325929549256.stgit@devnote2 Fixes: 51887d03 ("bootconfig: init: Allow admin to use bootconfig for kernel command line") Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Masami Hiramatsu 提交于
Avoid using this noisy name and use more calm one. This is just a name change. No functional change. Link: https://lkml.kernel.org/r/163187295918.2366983.5231840238429996027.stgit@devnote2Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Masami Hiramatsu 提交于
Add xbc_get_info() API which allows user to get the number of used xbc_nodes and the size of bootconfig data. This is also useful for checking the bootconfig is initialized or not. Link: https://lkml.kernel.org/r/163177340877.682366.4360676589783197627.stgit@devnote2Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Masami Hiramatsu 提交于
Allocate 'xbc_data' in the xbc_init() so that it does not need to care about the ownership of the copied data. Link: https://lkml.kernel.org/r/163177339986.682366.898762699429769117.stgit@devnote2Suggested-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 23 9月, 2021 1 次提交
-
-
由 Geert Uytterhoeven 提交于
Commit f8ade8dd ("xsurf100: drop include of lib8390.c") accidentally changed init/main.c. Revert that part. Fixes: f8ade8dd ("xsurf100: drop include of lib8390.c") Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 9月, 2021 1 次提交
-
-
由 Linus Torvalds 提交于
The boot-time allocation interface for memblock is a mess, with 'memblock_alloc()' returning a virtual pointer, but then you are supposed to free it with 'memblock_free()' that takes a _physical_ address. Not only is that all kinds of strange and illogical, but it actually causes bugs, when people then use it like a normal allocation function, and it fails spectacularly on a NULL pointer: https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/ or just random memory corruption if the debug checks don't catch it: https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/ I really don't want to apply patches that treat the symptoms, when the fundamental cause is this horribly confusing interface. I started out looking at just automating a sane replacement sequence, but because of this mix or virtual and physical addresses, and because people have used the "__pa()" macro that can take either a regular kernel pointer, or just the raw "unsigned long" address, it's all quite messy. So this just introduces a new saner interface for freeing a virtual address that was allocated using 'memblock_alloc()', and that was kept as a regular kernel pointer. And then it converts a couple of users that are obvious and easy to test, including the 'xbc_nodes' case in lib/bootconfig.c that caused problems. Reported-by: Nkernel test robot <oliver.sang@intel.com> Fixes: 40caa127 ("init: bootconfig: Remove all bootconfig data when the init memory is removed") Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 9月, 2021 4 次提交
-
-
由 Masami Hiramatsu 提交于
Reorder the init parameters from bootconfig and kernel cmdline so that the kernel cmdline always be the last part of the parameters as below. " -- "[bootconfig init params][cmdline init params] This change will help us to prevent that bootconfig init params overwrite the init params which user gives in the command line. Link: https://lkml.kernel.org/r/163077085675.222577.5665176468023636160.stgit@devnote2Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Masami Hiramatsu 提交于
Since the bootconfig is used only in the init functions, it doesn't need to keep the data after boot. Free it when the init memory is removed. Link: https://lkml.kernel.org/r/163077084958.222577.5924961258513004428.stgit@devnote2Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Kefeng Wang 提交于
There are some empty trap_init() definitions in different ARCHs, Introduce a new weak trap_init() function to clean them up. Link: https://lkml.kernel.org/r/20210812123602.76356-1-wangkefeng.wang@huawei.comSigned-off-by: NKefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm32] Acked-by: Vineet Gupta [arc] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Stafford Horne <shorne@gmail.com> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <palmerdabbelt@google.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rasmus Villemoes 提交于
Currently, usermodehelper is enabled right before PID1 starts going through the initcalls. However, any call of a usermodehelper from a pure_, core_, postcore_, arch_, subsys_ or fs_ initcall is futile, as there is no filesystem contents yet. Up until commit e7cb072e ("init/initramfs.c: do unpacking asynchronously"), such calls, whether via some request_module(), a legacy uevent "/sbin/hotplug" notification or something else, would just fail silently with (presumably) -ENOENT from kernel_execve(). However, that commit introduced the wait_for_initramfs() synchronization hook which must be called from the usermodehelper exec path right before the kernel_execve, in order that request_module() et al done from *after* rootfs_initcall() time (i.e. device_ and late_ initcalls) would continue to find a populated initramfs as they used to. Any call of wait_for_initramfs() done before the unpacking has been scheduled (i.e. before rootfs_initcall time) must just return immediately [and let the caller find an empty file system] in order not to deadlock the machine. I mistakenly thought, and my limited testing confirmed, that there were no such calls, so I added a pr_warn_once() in wait_for_initramfs(). It turns out that one can indeed hit request_module() as well as kobject_uevent_env() during those early init calls, leading to a user-visible warning in the kernel log emitted consistently for certain configurations. We could just remove the pr_warn_once(), but I think it's better to postpone enabling the usermodehelper framework until there is at least some chance of finding the executable. That is also a little more efficient in that a lot of work done in umh.c will be elided. However, it does change the error seen by those early callers from -ENOENT to -EBUSY, so there is a risk of a regression if any caller care about the exact error value. Link: https://lkml.kernel.org/r/20210728134638.329060-1-linux@rasmusvillemoes.dk Fixes: e7cb072e ("init/initramfs.c: do unpacking asynchronously") Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Reported-by: NAlexander Egorenkov <egorenar@linux.ibm.com> Reported-by: NBruno Goncalves <bgoncalv@redhat.com> Reported-by: NHeiner Kallweit <hkallweit1@gmail.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 8月, 2021 1 次提交
-
-
由 Masami Hiramatsu 提交于
Since the 'bootconfig' command line parameter is handled before parsing the command line, it doesn't use early_param(). But in this case, kernel shows a wrong warning message about it. [ 0.013714] Kernel command line: ro console=ttyS0 bootconfig console=tty0 [ 0.013741] Unknown command line parameters: bootconfig To suppress this message, add a dummy handler for 'bootconfig'. Link: https://lkml.kernel.org/r/162812945097.77369.1849780946468010448.stgit@devnote2 Fixes: 86d1919a ("init: print out unknown kernel parameters") Reviewed-by: NAndrew Halaney <ahalaney@redhat.com> Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 03 8月, 2021 1 次提交
-
-
由 Michael Schmitz 提交于
Now that ax88796.c exports the ax_NS8390_reinit() symbol, we can include 8390.h instead of lib8390.c, avoiding duplication of that function and killing a few compile warnings in the bargain. Fixes: 861928f4 ("net-next: New ax88796 platform driver for Amiga X-Surf 100 Zorro board (m68k)") Signed-off-by: NMichael Schmitz <schmitzmic@gmail.com> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 7月, 2021 1 次提交
-
-
由 Stephen Boyd 提交于
Parse the kernel's build ID at initialization so that other code can print a hex format string representation of the running kernel's build ID. This will be used in the kdump and dump_stack code so that developers can easily locate the vmlinux debug symbols for a crash/stacktrace. [swboyd@chromium.org: fix implicit declaration of init_vmlinux_build_id()] Link: https://lkml.kernel.org/r/CAE-0n51UjTbay8N9FXAyE7_aR2+ePrQnKSRJ0gbmRsXtcLBVaw@mail.gmail.com Link: https://lkml.kernel.org/r/20210511003845.2429846-4-swboyd@chromium.orgSigned-off-by: NStephen Boyd <swboyd@chromium.org> Acked-by: NBaoquan He <bhe@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Evan Green <evgreen@chromium.org> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Matthew Wilcox <willy@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sasha Levin <sashal@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 7月, 2021 1 次提交
-
-
由 Andrew Halaney 提交于
It is easy to foobar setting a kernel parameter on the command line without realizing it, there's not much output that you can use to assess what the kernel did with that parameter by default. Make it a little more explicit which parameters on the command line _looked_ like a valid parameter for the kernel, but did not match anything and ultimately got tossed to init. This is very similar to the unknown parameter message received when loading a module. This assumes the parameters are processed in a normal fashion, some parameters (dyndbg= for example) don't register their parameter with the rest of the kernel's parameters, and therefore always show up in this list (and are also given to init - like the rest of this list). Another example is BOOT_IMAGE= is highlighted as an offender, which it technically is, but is passed by LILO and GRUB so most systems will see that complaint. An example output where "foobared" and "unrecognized" are intentionally invalid parameters: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.12-dirty debug log_buf_len=4M foobared unrecognized=foo Unknown command line parameters: foobared BOOT_IMAGE=/boot/vmlinuz-5.12-dirty unrecognized=foo Link: https://lkml.kernel.org/r/20210511211009.42259-1-ahalaney@redhat.comSigned-off-by: NAndrew Halaney <ahalaney@redhat.com> Suggested-by: NSteven Rostedt <rostedt@goodmis.org> Suggested-by: NBorislav Petkov <bp@suse.de> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 6月, 2021 1 次提交
-
-
由 Masami Hiramatsu 提交于
Move the checksum calculation function into the header for sharing it with tools/bootconfig. Link: https://lkml.kernel.org/r/162262197470.264090.16325743685807878807.stgit@devnote2Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 05 6月, 2021 1 次提交
-
-
由 Mark Rutland 提交于
During boot, kernel_init_freeable() initializes `cad_pid` to the init task's struct pid. Later on, we may change `cad_pid` via a sysctl, and when this happens proc_do_cad_pid() will increment the refcount on the new pid via get_pid(), and will decrement the refcount on the old pid via put_pid(). As we never called get_pid() when we initialized `cad_pid`, we decrement a reference we never incremented, can therefore free the init task's struct pid early. As there can be dangling references to the struct pid, we can later encounter a use-after-free (e.g. when delivering signals). This was spotted when fuzzing v5.13-rc3 with Syzkaller, but seems to have been around since the conversion of `cad_pid` to struct pid in commit 9ec52099 ("[PATCH] replace cad_pid by a struct pid") from the pre-KASAN stone age of v2.6.19. Fix this by getting a reference to the init task's struct pid when we assign it to `cad_pid`. Full KASAN splat below. ================================================================== BUG: KASAN: use-after-free in ns_of_pid include/linux/pid.h:153 [inline] BUG: KASAN: use-after-free in task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509 Read of size 4 at addr ffff23794dda0004 by task syz-executor.0/273 CPU: 1 PID: 273 Comm: syz-executor.0 Not tainted 5.12.0-00001-g9aef892b2d15 #1 Hardware name: linux,dummy-virt (DT) Call trace: ns_of_pid include/linux/pid.h:153 [inline] task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509 do_notify_parent+0x308/0xe60 kernel/signal.c:1950 exit_notify kernel/exit.c:682 [inline] do_exit+0x2334/0x2bd0 kernel/exit.c:845 do_group_exit+0x108/0x2c8 kernel/exit.c:922 get_signal+0x4e4/0x2a88 kernel/signal.c:2781 do_signal arch/arm64/kernel/signal.c:882 [inline] do_notify_resume+0x300/0x970 arch/arm64/kernel/signal.c:936 work_pending+0xc/0x2dc Allocated by task 0: slab_post_alloc_hook+0x50/0x5c0 mm/slab.h:516 slab_alloc_node mm/slub.c:2907 [inline] slab_alloc mm/slub.c:2915 [inline] kmem_cache_alloc+0x1f4/0x4c0 mm/slub.c:2920 alloc_pid+0xdc/0xc00 kernel/pid.c:180 copy_process+0x2794/0x5e18 kernel/fork.c:2129 kernel_clone+0x194/0x13c8 kernel/fork.c:2500 kernel_thread+0xd4/0x110 kernel/fork.c:2552 rest_init+0x44/0x4a0 init/main.c:687 arch_call_rest_init+0x1c/0x28 start_kernel+0x520/0x554 init/main.c:1064 0x0 Freed by task 270: slab_free_hook mm/slub.c:1562 [inline] slab_free_freelist_hook+0x98/0x260 mm/slub.c:1600 slab_free mm/slub.c:3161 [inline] kmem_cache_free+0x224/0x8e0 mm/slub.c:3177 put_pid.part.4+0xe0/0x1a8 kernel/pid.c:114 put_pid+0x30/0x48 kernel/pid.c:109 proc_do_cad_pid+0x190/0x1b0 kernel/sysctl.c:1401 proc_sys_call_handler+0x338/0x4b0 fs/proc/proc_sysctl.c:591 proc_sys_write+0x34/0x48 fs/proc/proc_sysctl.c:617 call_write_iter include/linux/fs.h:1977 [inline] new_sync_write+0x3ac/0x510 fs/read_write.c:518 vfs_write fs/read_write.c:605 [inline] vfs_write+0x9c4/0x1018 fs/read_write.c:585 ksys_write+0x124/0x240 fs/read_write.c:658 __do_sys_write fs/read_write.c:670 [inline] __se_sys_write fs/read_write.c:667 [inline] __arm64_sys_write+0x78/0xb0 fs/read_write.c:667 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall arch/arm64/kernel/syscall.c:49 [inline] el0_svc_common.constprop.1+0x16c/0x388 arch/arm64/kernel/syscall.c:129 do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:168 el0_svc+0x28/0x38 arch/arm64/kernel/entry-common.c:416 el0_sync_handler+0x134/0x180 arch/arm64/kernel/entry-common.c:432 el0_sync+0x154/0x180 arch/arm64/kernel/entry.S:701 The buggy address belongs to the object at ffff23794dda0000 which belongs to the cache pid of size 224 The buggy address is located 4 bytes inside of 224-byte region [ffff23794dda0000, ffff23794dda00e0) The buggy address belongs to the page: page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dda0 head:(____ptrval____) order:1 compound_mapcount:0 flags: 0x3fffc0000010200(slab|head) raw: 03fffc0000010200 dead000000000100 dead000000000122 ffff23794d40d080 raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff23794dd9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff23794dd9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff23794dda0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff23794dda0080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff23794dda0100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 ================================================================== Link: https://lkml.kernel.org/r/20210524172230.38715-1-mark.rutland@arm.com Fixes: 9ec52099 ("[PATCH] replace cad_pid by a struct pid") Signed-off-by: NMark Rutland <mark.rutland@arm.com> Acked-by: NChristian Brauner <christian.brauner@ubuntu.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Christian Brauner <christian@brauner.io> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 6月, 2021 1 次提交
-
-
由 Peter Zijlstra 提交于
Extend 8fb12156 ("init: Pin init task to the boot CPU, initially") to cover the new PF_NO_SETAFFINITY requirement. While there, move wait_for_completion(&kthreadd_done) into kernel_init() to make it absolutely clear it is the very first thing done by the init thread. Fixes: 570a752b ("lib/smp_processor_id: Use is_percpu_thread() instead of nr_cpus_allowed") Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NValentin Schneider <valentin.schneider@arm.com> Tested-by: NValentin Schneider <valentin.schneider@arm.com> Tested-by: NBorislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/YLS4mbKUrA3Gnb4t@hirez.programming.kicks-ass.net
-
- 12 5月, 2021 1 次提交
-
-
由 Valentin Schneider 提交于
As pointed out by commit de9b8f5d ("sched: Fix crash trying to dequeue/enqueue the idle thread") init_idle() can and will be invoked more than once on the same idle task. At boot time, it is invoked for the boot CPU thread by sched_init(). Then smp_init() creates the threads for all the secondary CPUs and invokes init_idle() on them. As the hotplug machinery brings the secondaries to life, it will issue calls to idle_thread_get(), which itself invokes init_idle() yet again. In this case it's invoked twice more per secondary: at _cpu_up(), and at bringup_cpu(). Given smp_init() already initializes the idle tasks for all *possible* CPUs, no further initialization should be required. Now, removing init_idle() from idle_thread_get() exposes some interesting expectations with regards to the idle task's preempt_count: the secondary startup always issues a preempt_disable(), requiring some reset of the preempt count to 0 between hot-unplug and hotplug, which is currently served by idle_thread_get() -> idle_init(). Given the idle task is supposed to have preemption disabled once and never see it re-enabled, it seems that what we actually want is to initialize its preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove init_idle() from idle_thread_get(). Secondary startups were patched via coccinelle: @begone@ @@ -preempt_disable(); ... cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); Signed-off-by: NValentin Schneider <valentin.schneider@arm.com> Signed-off-by: NIngo Molnar <mingo@kernel.org> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210512094636.2958515-1-valentin.schneider@arm.com
-
- 11 5月, 2021 1 次提交
-
-
由 Frederic Weisbecker 提交于
Once srcu_init() is called, the SRCU core will make use of delayed workqueues, which rely on timers. However init_timers() is called several steps after rcu_init(). This means that a call_srcu() after rcu_init() but before init_timers() would find itself within a dangerously uninitialized timer core. This commit therefore creates a separate call to srcu_init() after init_timer() completes, which ensures that we stay in early SRCU mode until timers are safe(r). Signed-off-by: NFrederic Weisbecker <frederic@kernel.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: NPaul E. McKenney <paulmck@kernel.org>
-
- 07 5月, 2021 1 次提交
-
-
由 Rasmus Villemoes 提交于
Patch series "background initramfs unpacking, and CONFIG_MODPROBE_PATH", v3. These two patches are independent, but better-together. The second is a rather trivial patch that simply allows the developer to change "/sbin/modprobe" to something else - e.g. the empty string, so that all request_module() during early boot return -ENOENT early, without even spawning a usermode helper, needlessly synchronizing with the initramfs unpacking. The first patch delegates decompressing the initramfs to a worker thread, allowing do_initcalls() in main.c to proceed to the device_ and late_ initcalls without waiting for that decompression (and populating of rootfs) to finish. Obviously, some of those later calls may rely on the initramfs being available, so I've added synchronization points in the firmware loader and usermodehelper paths - there might be other places that would need this, but so far no one has been able to think of any places I have missed. There's not much to win if most of the functionality needed during boot is only available as modules. But systems with a custom-made .config and initramfs can boot faster, partly due to utilizing more than one cpu earlier, partly by avoiding known-futile modprobe calls (which would still trigger synchronization with the initramfs unpacking, thus eliminating most of the first benefit). This patch (of 2): Most of the boot process doesn't actually need anything from the initramfs, until of course PID1 is to be executed. So instead of doing the decompressing and populating of the initramfs synchronously in populate_rootfs() itself, push that off to a worker thread. This is primarily motivated by an embedded ppc target, where unpacking even the rather modest sized initramfs takes 0.6 seconds, which is long enough that the external watchdog becomes unhappy that it doesn't get attention soon enough. By doing the initramfs decompression in a worker thread, we get to do the device_initcalls and hence start petting the watchdog much sooner. Normal desktops might benefit as well. On my mostly stock Ubuntu kernel, my initramfs is a 26M xz-compressed blob, decompressing to around 126M. That takes almost two seconds: [ 0.201454] Trying to unpack rootfs image as initramfs... [ 1.976633] Freeing initrd memory: 29416K Before this patch, these lines occur consecutively in dmesg. With this patch, the timestamps on these two lines is roughly the same as above, but with 172 lines inbetween - so more than one cpu has been kept busy doing work that would otherwise only happen after the populate_rootfs() finished. Should one of the initcalls done after rootfs_initcall time (i.e., device_ and late_ initcalls) need something from the initramfs (say, a kernel module or a firmware blob), it will simply wait for the initramfs unpacking to be done before proceeding, which should in theory make this completely safe. But if some driver pokes around in the filesystem directly and not via one of the official kernel interfaces (i.e. request_firmware*(), call_usermodehelper*) that theory may not hold - also, I certainly might have missed a spot when sprinkling wait_for_initramfs(). So there is an escape hatch in the form of an initramfs_async= command line parameter. Link: https://lkml.kernel.org/r/20210313212528.2956377-1-linux@rasmusvillemoes.dk Link: https://lkml.kernel.org/r/20210313212528.2956377-2-linux@rasmusvillemoes.dkSigned-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: NLuis Chamberlain <mcgrof@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 5月, 2021 2 次提交
-
-
由 Kefeng Wang 提交于
mem_init_print_info() is called in mem_init() on each architecture, and pass NULL argument, so using void argument and move it into mm_init(). Link: https://lkml.kernel.org/r/20210317015210.33641-1-wangkefeng.wang@huawei.comSigned-off-by: NKefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86] Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> [powerpc] Acked-by: NDavid Hildenbrand <david@redhat.com> Tested-by: Anatoly Pugachev <matorola@gmail.com> [sparc64] Acked-by: Russell King <rmk+kernel@armlinux.org.uk> [arm] Acked-by: NMike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Guo Ren <guoren@kernel.org> Cc: Yoshinori Sato <ysato@users.osdn.me> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nicholas Piggin 提交于
This changes the awkward approach where architectures provide init functions to determine which levels they can provide large mappings for, to one where the arch is queried for each call. This removes code and indirection, and allows constant-folding of dead code for unsupported levels. This also adds a prot argument to the arch query. This is unused currently but could help with some architectures (e.g., some powerpc processors can't map uncacheable memory with large pages). Link: https://lkml.kernel.org/r/20210317062402.533919-7-npiggin@gmail.comSigned-off-by: NNicholas Piggin <npiggin@gmail.com> Reviewed-by: NDing Tianhong <dingtianhong@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Russell King <linux@armlinux.org.uk> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 4月, 2021 1 次提交
-
-
由 Kees Cook 提交于
This provides the ability for architectures to enable kernel stack base address offset randomization. This feature is controlled by the boot param "randomize_kstack_offset=on/off", with its default value set by CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT. This feature is based on the original idea from the last public release of PaX's RANDKSTACK feature: https://pax.grsecurity.net/docs/randkstack.txt All the credit for the original idea goes to the PaX team. Note that the design and implementation of this upstream randomize_kstack_offset feature differs greatly from the RANDKSTACK feature (see below). Reasoning for the feature: This feature aims to make harder the various stack-based attacks that rely on deterministic stack structure. We have had many such attacks in past (just to name few): https://jon.oberheide.org/files/infiltrate12-thestackisback.pdf https://jon.oberheide.org/files/stackjacking-infiltrate11.pdf https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html As Linux kernel stack protections have been constantly improving (vmap-based stack allocation with guard pages, removal of thread_info, STACKLEAK), attackers have had to find new ways for their exploits to work. They have done so, continuing to rely on the kernel's stack determinism, in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT were not relevant. For example, the following recent attacks would have been hampered if the stack offset was non-deterministic between syscalls: https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf (page 70: targeting the pt_regs copy with linear stack overflow) https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html (leaked stack address from one syscall as a target during next syscall) The main idea is that since the stack offset is randomized on each system call, it is harder for an attack to reliably land in any particular place on the thread stack, even with address exposures, as the stack base will change on the next syscall. Also, since randomization is performed after placing pt_regs, the ptrace-based approach[1] to discover the randomized offset during a long-running syscall should not be possible. Design description: During most of the kernel's execution, it runs on the "thread stack", which is pretty deterministic in its structure: it is fixed in size, and on every entry from userspace to kernel on a syscall the thread stack starts construction from an address fetched from the per-cpu cpu_current_top_of_stack variable. The first element to be pushed to the thread stack is the pt_regs struct that stores all required CPU registers and syscall parameters. Finally the specific syscall function is called, with the stack being used as the kernel executes the resulting request. The goal of randomize_kstack_offset feature is to add a random offset after the pt_regs has been pushed to the stack and before the rest of the thread stack is used during the syscall processing, and to change it every time a process issues a syscall. The source of randomness is currently architecture-defined (but x86 is using the low byte of rdtsc()). Future improvements for different entropy sources is possible, but out of scope for this patch. Further more, to add more unpredictability, new offsets are chosen at the end of syscalls (the timing of which should be less easy to measure from userspace than at syscall entry time), and stored in a per-CPU variable, so that the life of the value does not stay explicitly tied to a single task. As suggested by Andy Lutomirski, the offset is added using alloca() and an empty asm() statement with an output constraint, since it avoids changes to assembly syscall entry code, to the unwinder, and provides correct stack alignment as defined by the compiler. In order to make this available by default with zero performance impact for those that don't want it, it is boot-time selectable with static branches. This way, if the overhead is not wanted, it can just be left turned off with no performance impact. The generated assembly for x86_64 with GCC looks like this: ... ffffffff81003977: 65 8b 05 02 ea 00 7f mov %gs:0x7f00ea02(%rip),%eax # 12380 <kstack_offset> ffffffff8100397e: 25 ff 03 00 00 and $0x3ff,%eax ffffffff81003983: 48 83 c0 0f add $0xf,%rax ffffffff81003987: 25 f8 07 00 00 and $0x7f8,%eax ffffffff8100398c: 48 29 c4 sub %rax,%rsp ffffffff8100398f: 48 8d 44 24 0f lea 0xf(%rsp),%rax ffffffff81003994: 48 83 e0 f0 and $0xfffffffffffffff0,%rax ... As a result of the above stack alignment, this patch introduces about 5 bits of randomness after pt_regs is spilled to the thread stack on x86_64, and 6 bits on x86_32 (since its has 1 fewer bit required for stack alignment). The amount of entropy could be adjusted based on how much of the stack space we wish to trade for security. My measure of syscall performance overhead (on x86_64): lmbench: /usr/lib/lmbench/bin/x86_64-linux-gnu/lat_syscall -N 10000 null randomize_kstack_offset=y Simple syscall: 0.7082 microseconds randomize_kstack_offset=n Simple syscall: 0.7016 microseconds So, roughly 0.9% overhead growth for a no-op syscall, which is very manageable. And for people that don't want this, it's off by default. There are two gotchas with using the alloca() trick. First, compilers that have Stack Clash protection (-fstack-clash-protection) enabled by default (e.g. Ubuntu[3]) add pagesize stack probes to any dynamic stack allocations. While the randomization offset is always less than a page, the resulting assembly would still contain (unreachable!) probing routines, bloating the resulting assembly. To avoid this, -fno-stack-clash-protection is unconditionally added to the kernel Makefile since this is the only dynamic stack allocation in the kernel (now that VLAs have been removed) and it is provably safe from Stack Clash style attacks. The second gotcha with alloca() is a negative interaction with -fstack-protector*, in that it sees the alloca() as an array allocation, which triggers the unconditional addition of the stack canary function pre/post-amble which slows down syscalls regardless of the static branch. In order to avoid adding this unneeded check and its associated performance impact, architectures need to carefully remove uses of -fstack-protector-strong (or -fstack-protector) in the compilation units that use the add_random_kstack() macro and to audit the resulting stack mitigation coverage (to make sure no desired coverage disappears). No change is visible for this on x86 because the stack protector is already unconditionally disabled for the compilation unit, but the change is required on arm64. There is, unfortunately, no attribute that can be used to disable stack protector for specific functions. Comparison to PaX RANDKSTACK feature: The RANDKSTACK feature randomizes the location of the stack start (cpu_current_top_of_stack), i.e. including the location of pt_regs structure itself on the stack. Initially this patch followed the same approach, but during the recent discussions[2], it has been determined to be of a little value since, if ptrace functionality is available for an attacker, they can use PTRACE_PEEKUSR/PTRACE_POKEUSR to read/write different offsets in the pt_regs struct, observe the cache behavior of the pt_regs accesses, and figure out the random stack offset. Another difference is that the random offset is stored in a per-cpu variable, rather than having it be per-thread. As a result, these implementations differ a fair bit in their implementation details and results, though obviously the intent is similar. [1] https://lore.kernel.org/kernel-hardening/2236FBA76BA1254E88B949DDB74E612BA4BC57C1@IRSMSX102.ger.corp.intel.com/ [2] https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshetova@intel.com/ [3] https://lists.ubuntu.com/archives/ubuntu-devel/2019-June/040741.htmlCo-developed-by: NElena Reshetova <elena.reshetova@intel.com> Signed-off-by: NElena Reshetova <elena.reshetova@intel.com> Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210401232347.2791257-4-keescook@chromium.org
-
- 19 3月, 2021 1 次提交
-
-
由 Cao jin 提交于
Parameter "cmdline" has no use, drop it. Link: https://lkml.kernel.org/r/20210311085213.27680-1-jojing64@gmail.comAcked-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NCao jin <jojing64@gmail.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 27 2月, 2021 3 次提交
-
-
由 Sumit Garg 提交于
Currently breakpoints in kernel .init.text section are not handled correctly while allowing to remove them even after corresponding pages have been freed. Fix it via killing .init.text section breakpoints just prior to initmem pages being freed. Doug: "HW breakpoints aren't handled by this patch but it's probably not such a big deal". Link: https://lkml.kernel.org/r/20210224081652.587785-1-sumit.garg@linaro.orgSigned-off-by: NSumit Garg <sumit.garg@linaro.org> Suggested-by: NDoug Anderson <dianders@chromium.org> Acked-by: NDoug Anderson <dianders@chromium.org> Acked-by: NDaniel Thompson <daniel.thompson@linaro.org> Tested-by: NDaniel Thompson <daniel.thompson@linaro.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vijayanand Jitta 提交于
Add a kernel parameter stack_depot_disable to disable stack depot. So that stack hash table doesn't consume any memory when stack depot is disabled. The use case is CONFIG_PAGE_OWNER without page_owner=on. Without this patch, stackdepot will consume the memory for the hashtable. By default, it's 8M which is never trivial. With this option, in CONFIG_PAGE_OWNER configured system, page_owner=off, stack_depot_disable in kernel command line, we could save the wasted memory for the hashtable. [akpm@linux-foundation.org: fix CONFIG_STACKDEPOT=n build] Link: https://lkml.kernel.org/r/1611749198-24316-2-git-send-email-vjitta@codeaurora.orgSigned-off-by: NVinayak Menon <vinmenon@codeaurora.org> Signed-off-by: NVijayanand Jitta <vjitta@codeaurora.org> Cc: Alexander Potapenko <glider@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Yogesh Lal <ylal@codeaurora.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Potapenko 提交于
Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7. This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a low-overhead sampling-based memory safety error detector of heap use-after-free, invalid-free, and out-of-bounds access errors. This series enables KFENCE for the x86 and arm64 architectures, and adds KFENCE hooks to the SLAB and SLUB allocators. KFENCE is designed to be enabled in production kernels, and has near zero performance overhead. Compared to KASAN, KFENCE trades performance for precision. The main motivation behind KFENCE's design, is that with enough total uptime KFENCE will detect bugs in code paths not typically exercised by non-production test workloads. One way to quickly achieve a large enough total uptime is when the tool is deployed across a large fleet of machines. KFENCE objects each reside on a dedicated page, at either the left or right page boundaries. The pages to the left and right of the object page are "guard pages", whose attributes are changed to a protected state, and cause page faults on any attempted access to them. Such page faults are then intercepted by KFENCE, which handles the fault gracefully by reporting a memory access error. Guarded allocations are set up based on a sample interval (can be set via kfence.sample_interval). After expiration of the sample interval, the next allocation through the main allocator (SLAB or SLUB) returns a guarded allocation from the KFENCE object pool. At this point, the timer is reset, and the next allocation is set up after the expiration of the interval. To enable/disable a KFENCE allocation through the main allocator's fast-path without overhead, KFENCE relies on static branches via the static keys infrastructure. The static branch is toggled to redirect the allocation to KFENCE. The KFENCE memory pool is of fixed size, and if the pool is exhausted no further KFENCE allocations occur. The default config is conservative with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB pages). We have verified by running synthetic benchmarks (sysbench I/O, hackbench) and production server-workload benchmarks that a kernel with KFENCE (using sample intervals 100-500ms) is performance-neutral compared to a non-KFENCE baseline kernel. KFENCE is inspired by GWP-ASan [1], a userspace tool with similar properties. The name "KFENCE" is a homage to the Electric Fence Malloc Debugger [2]. For more details, see Documentation/dev-tools/kfence.rst added in the series -- also viewable here: https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst [1] http://llvm.org/docs/GwpAsan.html [2] https://linux.die.net/man/3/efence This patch (of 9): This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a low-overhead sampling-based memory safety error detector of heap use-after-free, invalid-free, and out-of-bounds access errors. KFENCE is designed to be enabled in production kernels, and has near zero performance overhead. Compared to KASAN, KFENCE trades performance for precision. The main motivation behind KFENCE's design, is that with enough total uptime KFENCE will detect bugs in code paths not typically exercised by non-production test workloads. One way to quickly achieve a large enough total uptime is when the tool is deployed across a large fleet of machines. KFENCE objects each reside on a dedicated page, at either the left or right page boundaries. The pages to the left and right of the object page are "guard pages", whose attributes are changed to a protected state, and cause page faults on any attempted access to them. Such page faults are then intercepted by KFENCE, which handles the fault gracefully by reporting a memory access error. To detect out-of-bounds writes to memory within the object's page itself, KFENCE also uses pattern-based redzones. The following figure illustrates the page layout: ---+-----------+-----------+-----------+-----------+-----------+--- | xxxxxxxxx | O : | xxxxxxxxx | : O | xxxxxxxxx | | xxxxxxxxx | B : | xxxxxxxxx | : B | xxxxxxxxx | | x GUARD x | J : RED- | x GUARD x | RED- : J | x GUARD x | | xxxxxxxxx | E : ZONE | xxxxxxxxx | ZONE : E | xxxxxxxxx | | xxxxxxxxx | C : | xxxxxxxxx | : C | xxxxxxxxx | | xxxxxxxxx | T : | xxxxxxxxx | : T | xxxxxxxxx | ---+-----------+-----------+-----------+-----------+-----------+--- Guarded allocations are set up based on a sample interval (can be set via kfence.sample_interval). After expiration of the sample interval, a guarded allocation from the KFENCE object pool is returned to the main allocator (SLAB or SLUB). At this point, the timer is reset, and the next allocation is set up after the expiration of the interval. To enable/disable a KFENCE allocation through the main allocator's fast-path without overhead, KFENCE relies on static branches via the static keys infrastructure. The static branch is toggled to redirect the allocation to KFENCE. To date, we have verified by running synthetic benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE is performance-neutral compared to the non-KFENCE baseline. For more details, see Documentation/dev-tools/kfence.rst (added later in the series). [elver@google.com: fix parameter description for kfence_object_start()] Link: https://lkml.kernel.org/r/20201106092149.GA2851373@elver.google.com [elver@google.com: avoid stalling work queue task without allocations] Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com Link: https://lkml.kernel.org/r/20201110135320.3309507-1-elver@google.com [elver@google.com: fix potential deadlock due to wake_up()] Link: https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com Link: https://lkml.kernel.org/r/20210104130749.1768991-1-elver@google.com [elver@google.com: add option to use KFENCE without static keys] Link: https://lkml.kernel.org/r/20210111091544.3287013-1-elver@google.com [elver@google.com: add missing copyright and description headers] Link: https://lkml.kernel.org/r/20210118092159.145934-1-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.comSigned-off-by: NMarco Elver <elver@google.com> Signed-off-by: NAlexander Potapenko <glider@google.com> Reviewed-by: NDmitry Vyukov <dvyukov@google.com> Reviewed-by: NSeongJae Park <sjpark@amazon.de> Co-developed-by: NMarco Elver <elver@google.com> Reviewed-by: NJann Horn <jannh@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Joern Engel <joern@purestorage.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 2月, 2021 1 次提交
-
-
由 Andy Shevchenko 提交于
SFI-based platforms are gone. So does this framework. This removes mention of SFI through the drivers and other code as well. Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: NHans de Goede <hdegoede@redhat.com> Acked-by: NLinus Walleij <linus.walleij@linaro.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 06 2月, 2021 1 次提交
-
-
由 Johannes Berg 提交于
On ARCH=um, loading a module doesn't result in its constructors getting called, which breaks module gcov since the debugfs files are never registered. On the other hand, in-kernel constructors have already been called by the dynamic linker, so we can't call them again. Get out of this conundrum by allowing CONFIG_CONSTRUCTORS to be selected, but avoiding the in-kernel constructor calls. Also remove the "if !UML" from GCOV selecting CONSTRUCTORS now, since we really do want CONSTRUCTORS, just not kernel binary ones. Link: https://lkml.kernel.org/r/20210120172041.c246a2cac2fb.I1358f584b76f1898373adfed77f4462c8705b736@changeidSigned-off-by: NJohannes Berg <johannes.berg@intel.com> Reviewed-by: NPeter Oberparleiter <oberpar@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jessica Yu <jeyu@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 1月, 2021 1 次提交
-
-
由 Petr Mladek 提交于
This reverts commit 757055ae. The commit caused that ttynull was used as the default console on several systems[1][2][3]. As a result, the console was blank even when a better alternative existed. It happened when there was no console configured on the command line and ttynull_init() was the first initcall calling register_console(). Or it happened when /dev/ did not exist when console_on_rootfs() was called. It was not able to open /dev/console even though a console driver was registered. It tried to add ttynull console but it obviously did not help. But ttynull became the preferred console and was used by /dev/console when it was available later. The commit tried to fix a historical problem that have been there for ages. The primary motivation was the commit 3cffa06a ("printk/console: Allow to disable console output by using console="" or console=null"). It provided a clean solution for a workaround that was widely used and worked only by chance. This revert causes that the console="" or console=null command line options will again work only by chance. These options will cause that a particular console will be preferred and the default (tty) ones will not get enabled. There will be no console registered at all. As a result there won't be stdin, stdout, and stderr for the init process. But it worked exactly this way even before. The proper solution has to fulfill many conditions: + Register ttynull only when explicitly required or as the ultimate fallback. + ttynull should get associated with /dev/console but it must not become preferred console when used as a fallback. Especially, it must still be possible to replace it by a better console later. Such a change requires clean up of the register_console() code. Otherwise, it would be even harder to follow. Especially, the use of has_preferred_console and CON_CONSDEV flag is tricky. The clean up is risky. The ordering of consoles is not well defined. And any changes tend to break existing user settings. Do the revert at the least risky solution for now. [1] https://lore.kernel.org/linux-kselftest/20201221144302.GR4077@smile.fi.intel.com/ [2] https://lore.kernel.org/lkml/d2a3b3c0-e548-7dd1-730f-59bc5c04e191@synopsys.com/ [3] https://patchwork.ozlabs.org/project/linux-um/patch/20210105120128.10854-1-thomas@m3y3r.de/Reported-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Reported-by: NVineet Gupta <vgupta@synopsys.com> Reported-by: NThomas Meyer <thomas@m3y3r.de> Signed-off-by: NPetr Mladek <pmladek@suse.com> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 12月, 2020 3 次提交
-
-
由 Vlastimil Babka 提交于
Patch series "cleanup page poisoning", v3. I have identified a number of issues and opportunities for cleanup with CONFIG_PAGE_POISON and friends: - interaction with init_on_alloc and init_on_free parameters depends on the order of parameters (Patch 1) - the boot time enabling uses static key, but inefficienty (Patch 2) - sanity checking is incompatible with hibernation (Patch 3) - CONFIG_PAGE_POISONING_NO_SANITY can be removed now that we have init_on_free (Patch 4) - CONFIG_PAGE_POISONING_ZERO can be most likely removed now that we have init_on_free (Patch 5) This patch (of 5): Enabling page_poison=1 together with init_on_alloc=1 or init_on_free=1 produces a warning in dmesg that page_poison takes precedence. However, as these warnings are printed in early_param handlers for init_on_alloc/free, they are not printed if page_poison is enabled later on the command line (handlers are called in the order of their parameters), or when init_on_alloc/free is always enabled by the respective config option - before the page_poison early param handler is called, it is not considered to be enabled. This is inconsistent. We can remove the dependency on order by making the init_on_* parameters only set a boolean variable, and postponing the evaluation after all early params have been processed. Introduce a new init_mem_debugging_and_hardening() function for that, and move the related debug_pagealloc processing there as well. As a result init_mem_debugging_and_hardening() knows always accurately if init_on_* and/or page_poison options were enabled. Thus we can also optimize want_init_on_alloc() and want_init_on_free(). We don't need to check page_poisoning_enabled() there, we can instead not enable the init_on_* static keys at all, if page poisoning is enabled. This results in a simpler and more effective code. Link: https://lkml.kernel.org/r/20201113104033.22907-1-vbabka@suse.cz Link: https://lkml.kernel.org/r/20201113104033.22907-2-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NMike Rapoport <rppt@linux.ibm.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mateusz Nosek <mateusznosek0@gmail.com> Cc: Laura Abbott <labbott@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lin Feng 提交于
In the booting phase if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set, we have following callchain: start_kernel ... mm_init mem_init memblock_free_all reset_all_zones_managed_pages free_low_memory_core_early ... buffer_init nr_free_buffer_pages zone->managed_pages ... rest_init kernel_init kernel_init_freeable page_alloc_init_late kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid); wait_for_completion(&pgdat_init_all_done_comp); ... files_maxfiles_init It's clear that buffer_init depends on zone->managed_pages, but it's reset in reset_all_zones_managed_pages after that pages are readded into zone->managed_pages, but when buffer_init runs this process is half done and most of them will finally be added till deferred_init_memmap done. In large memory couting of nr_free_buffer_pages drifts too much, also drifting from kernels to kernels on same hardware. Fix is simple, it delays buffer_init run till deferred_init_memmap all done. But as corrected by this patch, max_buffer_heads becomes very large, the value is roughly as many as 4 times of totalram_pages, formula: max_buffer_heads = nrpages * (10%) * (PAGE_SIZE / sizeof(struct buffer_head)); Say in a 64GB memory box we have 16777216 pages, then max_buffer_heads turns out to be roughly 67,108,864. In common cases, should a buffer_head be mapped to one page/block(4KB)? So max_buffer_heads never exceeds totalram_pages. IMO it's likely to make buffer_heads_over_limit bool value alwasy false, then make codes 'if (buffer_heads_over_limit)' test in vmscan unnecessary. So this patch will change the original behavior related to buffer_heads_over_limit in vmscan since we used a half done value of zone->managed_pages before, or should we use a smaller factor(<10%) in previous formula. akpm: I think this is OK - the max_buffer_heads code is only needed on highmem machines, to prevent ZONE_NORMAL from being consumed by large amounts of buffer_heads attached to highmem pagecache. This problem will not occur on 64-bit machines, so this feature's non-functionality on such machines is a feature, not a bug. Link: https://lkml.kernel.org/r/20201123110500.103523-1-linf@wangsu.comSigned-off-by: NLin Feng <linf@wangsu.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zhenhua Huang 提交于
Page owner of pages used by page owner itself used is missing on arm32 targets. The reason is dummy_handle and failure_handle is not initialized correctly. Buddy allocator is used to initialize these two handles. However, buddy allocator is not ready when page owner calls it. This change fixed that by initializing page owner after buddy initialization. The working flow before and after this change are: original logic: 1. allocated memory for page_ext(using memblock). 2. invoke the init callback of page_ext_ops like page_owner(using buddy allocator). 3. initialize buddy. after this change: 1. allocated memory for page_ext(using memblock). 2. initialize buddy. 3. invoke the init callback of page_ext_ops like page_owner(using buddy allocator). with the change, failure/dummy_handle can get its correct value and page owner output for example has the one for page owner itself: Page allocated via order 2, mask 0x6202c0(GFP_USER|__GFP_NOWARN), pid 1006, ts 67278156558 ns PFN 543776 type Unmovable Block 531 type Unmovable Flags 0x0() init_page_owner+0x28/0x2f8 invoke_init_callbacks_flatmem+0x24/0x34 start_kernel+0x33c/0x5d8 Link: https://lkml.kernel.org/r/1603104925-5888-1-git-send-email-zhenhuah@codeaurora.orgSigned-off-by: NZhenhua Huang <zhenhuah@codeaurora.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-