- 31 3月, 2022 1 次提交
-
-
由 Kees Cook 提交于
To follow the existing per-arch conventions, rename "sp_in_global" to "current_stack_pointer". This will let it be used in non-arch places (like HARDENED_USERCOPY). Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NPalmer Dabbelt <palmer@rivosinc.com>
-
- 30 3月, 2022 1 次提交
-
-
由 Fangrui Song 提交于
On ELF, (NOLOAD) sets the section type to SHT_NOBITS[1]. It is conceptually inappropriate for .plt, .got, and .got.plt sections which are always SHT_PROGBITS. In GNU ld, if PLT entries are needed, .plt will be SHT_PROGBITS anyway and (NOLOAD) will be essentially ignored. In ld.lld, since https://reviews.llvm.org/D118840 ("[ELF] Support (TYPE=<value>) to customize the output section type"), ld.lld will report a `section type mismatch` error (later changed to a warning). Just remove (NOLOAD) to fix the warning. [1] https://lld.llvm.org/ELF/linker_script.html As of today, "The section should be marked as not loadable" on https://sourceware.org/binutils/docs/ld/Output-Section-Type.html is outdated for ELF. Link: https://github.com/ClangBuiltLinux/linux/issues/1597 Fixes: ab1ef68e ("RISC-V: Add sections of PLT and GOT for kernel module") Reported-by: NNathan Chancellor <nathan@kernel.org> Signed-off-by: NFangrui Song <maskray@google.com> Signed-off-by: NPalmer Dabbelt <palmer@rivosinc.com>
-
- 24 3月, 2022 4 次提交
-
-
由 Feiyang Chen 提交于
Select HAVE_ARCH_NODEDATA_EXTENSION for loongson64 to fix build error when CONFIG_NUMA=y: mips64el-unknown-linux-gnu-ld: mm/page_alloc.o: in function `free_area_init': (.init.text+0x1714): undefined reference to `node_data' mips64el-unknown-linux-gnu-ld: (.init.text+0x1730): undefined reference to `node_data' Also, select HAVE_ARCH_NODEDATA_EXTENSION for sgi-ip27 to fix build error: mips64el-unknown-linux-gnu-ld: mm/page_alloc.o: in function `free_area_init': page_alloc.c:(.init.text+0x1ba8): undefined reference to `node_data' mips64el-unknown-linux-gnu-ld: page_alloc.c:(.init.text+0x1bcc): undefined reference to `node_data' mips64el-unknown-linux-gnu-ld: page_alloc.c:(.init.text+0x1be4): undefined reference to `node_data' mips64el-unknown-linux-gnu-ld: page_alloc.c:(.init.text+0x1bf4): undefined reference to `node_data' Signed-off-by: NFeiyang Chen <chenfeiyang@loongson.cn> Reviewed-by: NHuacai Chen <chenhuacai@kernel.org> Signed-off-by: NThomas Bogendoerfer <tsbogend@alpha.franken.de>
-
由 Jisheng Zhang 提交于
Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and increase compile coverage. Link: https://lkml.kernel.org/r/20211206160514.2000-5-jszhang@kernel.orgSigned-off-by: NJisheng Zhang <jszhang@kernel.org> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Acked-by: NBaoquan He <bhe@redhat.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jisheng Zhang 提交于
Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and increase compile coverage. Link: https://lkml.kernel.org/r/20211206160514.2000-4-jszhang@kernel.orgSigned-off-by: NJisheng Zhang <jszhang@kernel.org> Acked-by: NBaoquan He <bhe@redhat.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jisheng Zhang 提交于
Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and increase compile coverage. Link: https://lkml.kernel.org/r/20211206160514.2000-3-jszhang@kernel.orgSigned-off-by: NJisheng Zhang <jszhang@kernel.org> Acked-by: NPalmer Dabbelt <palmer@rivosinc.com> Acked-by: NBaoquan He <bhe@redhat.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 3月, 2022 18 次提交
-
-
由 Andre Przywara 提交于
The Kconfig symbols required for the Allwinner F1C100 (MACH_SUNIV) are currently not selected by any defconfig. sunxi_defconfig only covers the v7 SoCs, but the F1C100s is ARMv5, so we cannot share a single image. Add the required symbols to multi_v5_defconfig, to give people some sane default config when playing around with this chip. This is probably more important as there are surely not many distros out there supporting ARMv5 out of the box. This allows my LicheePi Nano board to boot to a busybox prompt. The zImage size grows by about 50 KB, in detail: text data bss dec hex filename 10510000 4400700 687740 15598440 ee0368 vmlinux-old 10588592 4469096 686812 15744500 f03df4 vmlinux-new 14922908 arch/arm/boot/Image-old 15067388 arch/arm/boot/Image-new 6388016 arch/arm/boot/zImage-old 6440064 arch/arm/boot/zImage-new Signed-off-by: NAndre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20220317183043.948432-6-andre.przywara@arm.com' Signed-off-by: NArnd Bergmann <arnd@arndb.de>
-
由 David Hildenbrand 提交于
... and call node_dev_init() after memory_dev_init() from driver_init(), so before any of the existing arch/subsys calls. All online nodes should be known at that point: early during boot, arch code determines node and zone ranges and sets the relevant nodes online; usually this happens in setup_arch(). This is in line with memory_dev_init(), which initializes the memory device subsystem and creates all memory block devices. Similar to memory_dev_init(), panic() if anything goes wrong, we don't want to continue with such basic initialization errors. The important part is that node_dev_init() gets called after memory_dev_init() and after cpu_dev_init(), but before any of the relevant archs call register_cpu() to register the new cpu device under the node device. The latter should be the case for the current users of topology_init(). Link: https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Tested-by: Anatoly Pugachev <matorola@gmail.com> (sparc64) Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Mike Rapoport <rppt@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Prior to "mm: handle uninitialized numa nodes gracefully" memory hotplug used to allocate pgdat when memory has been added to a node (hotadd_init_pgdat) arch_free_nodedata has been only used in the failure path because once the pgdat is exported (to be visible by NODA_DATA(nid)) it cannot really be freed because there is no synchronization available for that. pgdat is allocated for each possible nodes now so the memory hotplug doesn't need to do the ever use arch_free_nodedata so drop it. This patch doesn't introduce any functional change. Link: https://lkml.kernel.org/r/20220127085305.20890-4-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NRafael Aquini <raquini@redhat.com> Acked-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NMike Rapoport <rppt@linux.ibm.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Cc: Alexey Makhalov <amakhalov@vmware.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
We have had several reports [1][2][3] that page allocator blows up when an allocation from a possible node is requested. The underlying reason is that NODE_DATA for the specific node is not allocated. NUMA specific initialization is arch specific and it can vary a lot. E.g. x86 tries to initialize all nodes that have some cpu affinity (see init_cpu_to_node) but this can be insufficient because the node might be cpuless for example. One way to address this problem would be to check for !node_online nodes when trying to get a zonelist and silently fall back to another node. That is unfortunately adding a branch into allocator hot path and it doesn't handle any other potential NODE_DATA users. This patch takes a different approach (following a lead of [3]) and it pre allocates pgdat for all possible nodes in an arch indipendent code - free_area_init. All uninitialized nodes are treated as memoryless nodes. node_state of the node is not changed because that would lead to other side effects - e.g. sysfs representation of such a node and from past discussions [4] it is known that some tools might have problems digesting that. Newly allocated pgdat only gets a minimal initialization and the rest of the work is expected to be done by the memory hotplug - hotadd_new_pgdat (renamed to hotadd_init_pgdat). generic_alloc_nodedata is changed to use the memblock allocator because neither page nor slab allocators are available at the stage when all pgdats are allocated. Hotplug doesn't allocate pgdat anymore so we can use the early boot allocator. The only arch specific implementation is ia64 and that is changed to use the early allocator as well. [1] http://lkml.kernel.org/r/20211101201312.11589-1-amakhalov@vmware.com [2] http://lkml.kernel.org/r/20211207224013.880775-1-npache@redhat.com [3] http://lkml.kernel.org/r/20190114082416.30939-1-mhocko@kernel.org [4] http://lkml.kernel.org/r/20200428093836.27190-1-srikar@linux.vnet.ibm.com [akpm@linux-foundation.org: replace comment, per Mike] Link: https://lkml.kernel.org/r/Yfe7RBeLCijnWBON@dhcp22.suse.czReported-by: NAlexey Makhalov <amakhalov@vmware.com> Tested-by: NAlexey Makhalov <amakhalov@vmware.com> Reported-by: NNico Pache <npache@redhat.com> Acked-by: NRafael Aquini <raquini@redhat.com> Tested-by: NRafael Aquini <raquini@redhat.com> Acked-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Acked-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NMichal Hocko <mhocko@suse.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Patch series "mm, memory_hotplug: handle unitialized numa node gracefully". The core of the fix is patch 2 which also links existing bug reports. The high level goal is to have all possible numa nodes have their pgdat allocated and initialized so for_each_possible_node(nid) NODE_DATA(nid) will never return garbage. This has proven to be problem in several places when an offline numa node is used for an allocation just to realize that node_data and therefore allocation fallback zonelists are not initialized and such an allocation request blows up. There were attempts to address that by checking node_online in several places including the page allocator. This patchset approaches the problem from a different perspective and instead of special casing, which just adds a runtime overhead, it allocates pglist_data for each possible node. This can add some memory overhead for platforms with high number of possible nodes if they do not contain any memory. This should be a rather rare configuration though. How to test this? David has provided and excellent howto: http://lkml.kernel.org/r/6e5ebc19-890c-b6dd-1924-9f25c441010d@redhat.com Patches 1 and 3-6 are mostly cleanups. The patchset has been reviewed by Rafael (thanks!) and the core fix tested by Rafael and Alexey (thanks to both). David has tested as per instructions above and hasn't found any fallouts in the memory hotplug scenarios. This patch (of 6): This is a preparatory patch and it doesn't introduce any functional change. It merely pulls out arch_alloc_nodedata (and co) outside of CONFIG_MEMORY_HOTPLUG because the following patch will need to call this from the generic MM code. Link: https://lkml.kernel.org/r/20220127085305.20890-1-mhocko@kernel.org Link: https://lkml.kernel.org/r/20220127085305.20890-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NRafael Aquini <raquini@redhat.com> Acked-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NMike Rapoport <rppt@linux.ibm.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Reviewed-by: NWei Yang <richard.weiyang@gmail.com> Cc: Alexey Makhalov <amakhalov@vmware.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hari Bathini 提交于
With commit a4e92ce8 ("powerpc/fadump: Reservationless firmware assisted dump"), Linux kernel's Contiguous Memory Allocator (CMA) based reservation was introduced in fadump. That change was aimed at using CMA to let applications utilize the memory reserved for fadump while blocking it from being used for kernel pages. The assumption was, even if CMA activation fails for whatever reason, the memory still remains reserved to avoid it from being used for kernel pages. But commit 072355c1 ("mm/cma: expose all pages to the buddy if activation of an area fails") breaks this assumption as it started exposing all pages to buddy allocator on CMA activation failure. It led to warning messages like below while running crash-utility on vmcore of a kernel having above two commits: crash: seek error: kernel virtual address: <from reserved region> To fix this problem, opt out from exposing pages to buddy allocator on CMA activation failure for fadump reserved memory. Link: https://lkml.kernel.org/r/20220117075246.36072-3-hbathini@linux.ibm.comSigned-off-by: NHari Bathini <hbathini@linux.ibm.com> Acked-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anshuman Khandual 提交于
ARCH_WANT_GENERAL_HUGETLB config has duplicate definitions on platforms that subscribe it. Instead make it a generic config option which can be selected on applicable platforms when required. Link: https://lkml.kernel.org/r/1643718465-4324-1-git-send-email-anshuman.khandual@arm.comSigned-off-by: NAnshuman Khandual <anshuman.khandual@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 luofei 提交于
When the hwpoison page meets the filter conditions, it should not be regarded as successful memory_failure() processing for mce handler, but should return a distinct value, otherwise mce handler regards the error page has been identified and isolated, which may lead to calling set_mce_nospec() to change page attribute, etc. Here memory_failure() return -EOPNOTSUPP to indicate that the error event is filtered, mce handler should not take any action for this situation and hwpoison injector should treat as correct. Link: https://lkml.kernel.org/r/20220223082135.2769649-1-luofei@unicloud.comSigned-off-by: Nluofei <luofei@unicloud.com> Acked-by: NBorislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oscar Salvador 提交于
On x86, prior to ("mm: handle uninitialized numa nodes gracecully"), NUMA nodes could be allocated at three different places. - numa_register_memblks - init_cpu_to_node - init_gi_nodes All these calls happen at setup_arch, and have the following order: setup_arch ... x86_numa_init numa_init numa_register_memblks ... init_cpu_to_node init_memory_less_node alloc_node_data free_area_init_memoryless_node init_gi_nodes init_memory_less_node alloc_node_data free_area_init_memoryless_node numa_register_memblks() is only interested in those nodes which have memory, so it skips over any memoryless node it founds. Later on, when we have read ACPI's SRAT table, we call init_cpu_to_node() and init_gi_nodes(), which initialize any memoryless node we might have that have either CPU or Initiator affinity, meaning we allocate pg_data_t struct for them and we mark them as ONLINE. So far so good, but the thing is that after ("mm: handle uninitialized numa nodes gracefully"), we allocate all possible NUMA nodes in free_area_init(), meaning we have a picture like the following: setup_arch x86_numa_init numa_init numa_register_memblks <-- allocate non-memoryless node x86_init.paging.pagetable_init ... free_area_init free_area_init_memoryless <-- allocate memoryless node init_cpu_to_node alloc_node_data <-- allocate memoryless node with CPU free_area_init_memoryless_node init_gi_nodes alloc_node_data <-- allocate memoryless node with Initiator free_area_init_memoryless_node free_area_init() already allocates all possible NUMA nodes, but init_cpu_to_node() and init_gi_nodes() are clueless about that, so they go ahead and allocate a new pg_data_t struct without checking anything, meaning we end up allocating twice. It should be mad clear that this only happens in the case where memoryless NUMA node happens to have a CPU/Initiator affinity. So get rid of init_memory_less_node() and just set the node online. Note that setting the node online is needed, otherwise we choke down the chain when bringup_nonboot_cpus() ends up calling __try_online_node()->register_one_node()->... and we blow up in bus_add_device(). As can be seen here: BUG: kernel NULL pointer dereference, address: 0000000000000060 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-1-default+ #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/4 RIP: 0010:bus_add_device+0x5a/0x140 Code: 8b 74 24 20 48 89 df e8 84 96 ff ff 85 c0 89 c5 75 38 48 8b 53 50 48 85 d2 0f 84 bb 00 004 RSP: 0000:ffffc9000022bd10 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888100987400 RCX: ffff8881003e4e19 RDX: ffff8881009a5e00 RSI: ffff888100987400 RDI: ffff888100987400 RBP: 0000000000000000 R08: ffff8881003e4e18 R09: ffff8881003e4c98 R10: 0000000000000000 R11: ffff888100402bc0 R12: ffffffff822ceba0 R13: 0000000000000000 R14: ffff888100987400 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88853fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000060 CR3: 000000000200a001 CR4: 00000000001706b0 Call Trace: device_add+0x4c0/0x910 __register_one_node+0x97/0x2d0 __try_online_node+0x85/0xc0 try_online_node+0x25/0x40 cpu_up+0x4f/0x100 bringup_nonboot_cpus+0x4f/0x60 smp_init+0x26/0x79 kernel_init_freeable+0x130/0x2f1 kernel_init+0x17/0x150 ret_from_fork+0x22/0x30 The reason is simple, by the time bringup_nonboot_cpus() gets called, we did not register the node_subsys bus yet, so we crash when bus_add_device() tries to dereference bus()->p. The following shows the order of the calls: kernel_init_freeable smp_init bringup_nonboot_cpus ... bus_add_device() <- we did not register node_subsys yet do_basic_setup do_initcalls postcore_initcall(register_node_type); register_node_type subsys_system_register subsys_register bus_register <- register node_subsys bus Why setting the node online saves us then? Well, simply because __try_online_node() backs off when the node is online, meaning we do not end up calling register_one_node() in the first place. This is subtle, broken and deserves a deep analysis and thought about how to put this into shape, but for now let us have this easy fix for the leaking memory issue. [osalvador@suse.de: add comments] Link: https://lkml.kernel.org/r/20220221142649.3457-1-osalvador@suse.de Link: https://lkml.kernel.org/r/20220218224302.5282-2-osalvador@suse.de Fixes: da4490c958ad ("mm: handle uninitialized numa nodes gracefully") Signed-off-by: NOscar Salvador <osalvador@suse.de> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Rafael Aquini <raquini@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Alexey Makhalov <amakhalov@vmware.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Hildenbrand 提交于
Patch series "mm: enforce pageblock_order < MAX_ORDER". Having pageblock_order >= MAX_ORDER seems to be able to happen in corner cases and some parts of the kernel are not prepared for it. For example, Aneesh has shown [1] that such kernels can be compiled on ppc64 with 64k base pages by setting FORCE_MAX_ZONEORDER=8, which will run into a WARN_ON_ONCE(order >= MAX_ORDER) in comapction code right during boot. We can get pageblock_order >= MAX_ORDER when the default hugetlb size is bigger than the maximum allocation granularity of the buddy, in which case we are no longer talking about huge pages but instead gigantic pages. Having pageblock_order >= MAX_ORDER can only make alloc_contig_range() of such gigantic pages more likely to succeed. Reliable use of gigantic pages either requires boot time allcoation or CMA, no need to overcomplicate some places in the kernel to optimize for corner cases that are broken in other areas of the kernel. This patch (of 2): Let's enforce pageblock_order < MAX_ORDER and simplify. Especially patch #1 can be regarded a cleanup before: [PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range alignment. [2] [1] https://lkml.kernel.org/r/87r189a2ks.fsf@linux.ibm.com [2] https://lkml.kernel.org/r/20220211164135.1803616-1-zi.yan@sent.com Link: https://lkml.kernel.org/r/20220214174132.219303-2-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NZi Yan <ziy@nvidia.com> Acked-by: NRob Herring <robh@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Frank Rowand <frowand.list@gmail.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: John Garry via iommu <iommu@lists.linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stafford Horne 提交于
Originally the mmu_gathers were removed in commit 1c395176 ("mm: now that all old mmu_gather code is gone, remove the storage"). However, the openrisc and hexagon architecture were merged around the same time and mmu_gathers was not removed. This patch removes them from openrisc, hexagon and nds32: Noticed while cleaning this warning: arch/openrisc/mm/init.c:41:1: warning: symbol 'mmu_gathers' was not declared. Should it be static? Link: https://lkml.kernel.org/r/20220205141956.3315419-1-shorne@gmail.comSigned-off-by: NStafford Horne <shorne@gmail.com> Acked-by: NMike Rapoport <rppt@linux.ibm.com> Cc: Brian Cain <bcain@codeaurora.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: David Hildenbrand <david@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anshuman Khandual 提交于
Each call into pte_mkhuge() is invariably followed by arch_make_huge_pte(). Instead arch_make_huge_pte() can accommodate pte_mkhuge() at the beginning. This updates generic fallback stub for arch_make_huge_pte() and available platforms definitions. This makes huge pte creation much cleaner and easier to follow. Link: https://lkml.kernel.org/r/1643860669-26307-1-git-send-email-anshuman.khandual@arm.comSigned-off-by: NAnshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: NChristophe Leroy <christophe.leroy@csgroup.eu> Acked-by: NMike Kravetz <mike.kravetz@oracle.com> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vincent Chen 提交于
Add calls to rseq_signal_deliver() and rseq_syscall() to introduce RSEQ support. 1. Call the rseq_signal_deliver() function to fixup on the pre-signal frame when a signal is delivered on top of a restartable sequence critical section. 2. Check that system calls are not invoked from within rseq critical sections by invoking rseq_signal() from ret_from_syscall(). With CONFIG_DEBUG_RSEQ, such behavior results in termination of the process with SIGSEGV. Signed-off-by: NVincent Chen <vincent.chen@sifive.com> Reviewed-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: NPalmer Dabbelt <palmer@rivosinc.com>
-
由 Alexei Starovoitov 提交于
This reverts commit 75caf33e. Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Alexei Starovoitov 提交于
This reverts commit 83acdce6. Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Alexei Starovoitov 提交于
This reverts commit 02752bd9. Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Alexei Starovoitov 提交于
This reverts commit 515a4917. Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Max Filippov 提交于
Before the commit f9ce0be7 ("mm: Cleanup faultaround and finish_fault() codepaths") there was a call to update_mmu_cache in alloc_set_pte that used to invalidate TLB entry caching invalid PTE that caused a page fault. That commit removed that call so now invalid TLB entry survives causing repetitive page faults on the CPU that took the initial fault until that TLB entry is occasionally evicted. This issue is spotted by the xtensa TLB sanity checker. Fix this issue by defining update_mmu_tlb function that flushes TLB entry for the faulting address. Cc: stable@vger.kernel.org # 5.12+ Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com>
-
- 22 3月, 2022 10 次提交
-
-
由 Randy Dunlap 提交于
Add <asm/paravirt_api_clock.h> for arch/arm/, mapped to <asm/paravirt.h>, to simplify #ifdeffery in generic code. Fixes this build error introduced by the scheduler tree: In file included from ../kernel/sched/core.c:81: ../kernel/sched/sched.h:87:11: fatal error: asm/paravirt_api_clock.h: No such file or directory 87 | # include <asm/paravirt_api_clock.h> Reviewed-by: NNathan Chancellor <nathan@kernel.org> Fixes: 4ff8f2ca ("sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies") Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NIngo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20220316204146.14000-1-rdunlap@infradead.org
-
由 Atish Patra 提交于
The sscofpmf extension allows counter overflow and filtering for programmable counters. Enable the perf driver to handle the overflow interrupt. The overflow interrupt is a hart local interrupt. Thus, per cpu overflow interrupts are setup as a child under the root INTC irq domain. Signed-off-by: NAtish Patra <atish.patra@wdc.com> Signed-off-by: NAtish Patra <atishp@rivosinc.com> Signed-off-by: NPalmer Dabbelt <palmer@rivosinc.com>
-
由 Atish Patra 提交于
This patch adds all the definitions defined by the SBI PMU extension. Reviewed-by: NAnup Patel <anup@brainfault.org> Signed-off-by: NAtish Patra <atish.patra@wdc.com> Signed-off-by: NAtish Patra <atishp@rivosinc.com> Signed-off-by: NPalmer Dabbelt <palmer@rivosinc.com>
-
由 Atish Patra 提交于
Linux kernel can directly read these counters as the HPMCOUNTERS CSRs are accessible in S-mode. Reviewed-by: NAnup Patel <anup@brainfault.org> Signed-off-by: NAtish Patra <atish.patra@wdc.com> Signed-off-by: NAtish Patra <atishp@rivosinc.com> Signed-off-by: NPalmer Dabbelt <palmer@rivosinc.com>
-
由 Atish Patra 提交于
The current perf implementation in RISC-V is not very useful as it can not count any events other than cycle/instructions. Moreover, perf record can not be used or the events can not be started or stopped. Remove the implementation now for a better platform driver in future that will implement most of the missing functionality. Reviewed-by: NAnup Patel <anup@brainfault.org> Signed-off-by: NAtish Patra <atish.patra@wdc.com> Signed-off-by: NAtish Patra <atishp@rivosinc.com> Signed-off-by: NPalmer Dabbelt <palmer@rivosinc.com>
-
由 Matthew Wilcox (Oracle) 提交于
We need to use this function in common code; pull it out of pmd_page(). Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Matthew Wilcox (Oracle) 提交于
This is straightforward for everything except nohash64 where we indirect through pmd_page(). There must be a better way to do this. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Matthew Wilcox (Oracle) 提交于
Whether or not the platform supports PMD sized pages, we need to provide pmd_pfn() for an upcoming patch. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Mike Rapoport 提交于
We need to use this function in common code, so define it for architectures and/or configrations that miss it. The result of pmd_pfn() will only be used if TRANSPARENT_HUGEPAGE is enabled, but a function or macro called pmd_pfn() must be defined, even on machines with two level page tables. Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Matthew Wilcox (Oracle) 提交于
Add isolate_lru_page() as a wrapper around isolate_lru_folio(). TestClearPageLRU() would have always failed on a tail page, so returning -EBUSY is the same behaviour. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com> Reviewed-by: NJason Gunthorpe <jgg@nvidia.com> Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
-
- 21 3月, 2022 6 次提交
-
-
由 Julia Lawall 提交于
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: NJulia Lawall <Julia.Lawall@inria.fr> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220318103729.157574-9-Julia.Lawall@inria.fr
-
由 Kees Cook 提交于
Add SUBARCH target for Clang+um (which must go last, not alphabetically, so the other SUBARCHes are assigned). Remove open-coded "DEFINE" macro, instead using linux/kbuild.h's version which was updated to use Clang-friendly assembly in commit cf0c3e68 ("kbuild: fix asm-offset generation to work with clang"). Redefine "DEFINE_LONGS" in terms of "COMMENT" and "DEFINE" so that the intended coment actually has useful content. Add a missed "break" to avoid implicit fall-through warnings. This lets me run KUnit tests with Clang: $ ./tools/testing/kunit/kunit.py run --make_options LLVM=1 ... Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: David Gow <davidgow@google.com> Cc: linux-um@lists.infradead.org Cc: linux-kbuild@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: kunit-dev@googlegroups.com Cc: llvm@lists.linux.dev Reviewed-by: NNathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/lkml/Yg2YubZxvYvx7%2Fnm@dev-arch.archlinux-ax161/Tested-by: NDavid Gow <davidgow@google.com> Link: https://lore.kernel.org/lkml/CABVgOSk=oFxsbSbQE-v65VwR2+mXeGXDDjzq8t7FShwjJ3+kUg@mail.gmail.com/Signed-off-by: NKees Cook <keescook@chromium.org> --- v1: https://lore.kernel.org/lkml/20220217002843.2312603-1-keescook@chromium.org v2: https://lore.kernel.org/lkml/20220224055831.1854786-1-keescook@chromium.org v3: - use kbuild.h to avoid duplication (Masahiro) - fix intended comments (Masahiro) - use SUBARCH (Nathan)
-
由 Paolo Bonzini 提交于
Instead of using array_size, use a function that takes care of the multiplication. While at it, switch to kvcalloc since this allocation should not be very large. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Oliver Upton 提交于
KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not advertise the set of quirks which may be disabled to userspace, so it is impossible to predict the behavior of KVM. Worse yet, KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning it fails to reject attempts to set invalid quirk bits. The only valid workaround for the quirky quirks API is to add a new CAP. Actually advertise the set of quirks that can be disabled to userspace so it can predict KVM's behavior. Reject values for cap->args[0] that contain invalid bits. Finally, add documentation for the new capability and describe the existing quirks. Signed-off-by: NOliver Upton <oupton@google.com> Message-Id: <20220301060351.442881-5-oupton@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Thomas Gleixner 提交于
Non constant TSC is a nightmare on bare metal already, but with virtualization it becomes a complete disaster because the workarounds are horrible latency wise. That's also a preliminary for running RT in a guest on top of a RT host. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Message-Id: <Yh5eJSG19S2sjZfy@linutronix.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Guests X86_BUG_NULL_SEG if and only if the host has them. Use the info from static_cpu_has_bug to form the 0x80000021 CPUID leaf that was defined for Zen3. Userspace can then set the bit even on older CPUs that do not have the bug, such as Zen2. Do the same for X86_FEATURE_LFENCE_RDTSC as well, since various processors have had very different ways of detecting it and not all of them are available to userspace. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-