1. 18 2月, 2020 1 次提交
  2. 18 11月, 2019 1 次提交
  3. 05 9月, 2019 1 次提交
  4. 21 5月, 2019 1 次提交
  5. 13 3月, 2019 1 次提交
    • M
      memblock: drop __memblock_alloc_base() · 42b46aef
      Mike Rapoport 提交于
      The __memblock_alloc_base() function tries to allocate a memory up to
      the limit specified by its max_addr parameter.  Depending on the value
      of this parameter, the __memblock_alloc_base() can is replaced with the
      appropriate memblock_phys_alloc*() variant.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-9-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Acked-by: NRob Herring <robh@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      42b46aef
  6. 06 3月, 2019 1 次提交
  7. 31 10月, 2018 2 次提交
    • M
      mm: remove include/linux/bootmem.h · 57c8a661
      Mike Rapoport 提交于
      Move remaining definitions and declarations from include/linux/bootmem.h
      into include/linux/memblock.h and remove the redundant header.
      
      The includes were replaced with the semantic patch below and then
      semi-automated removal of duplicated '#include <linux/memblock.h>
      
      @@
      @@
      - #include <linux/bootmem.h>
      + #include <linux/memblock.h>
      
      [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
      [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
      [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
        Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
      Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      57c8a661
    • M
      memblock: rename memblock_alloc{_nid,_try_nid} to memblock_phys_alloc* · 9a8dd708
      Mike Rapoport 提交于
      Make it explicit that the caller gets a physical address rather than a
      virtual one.
      
      This will also allow using meblock_alloc prefix for memblock allocations
      returning virtual address, which is done in the following patches.
      
      The conversion is done using the following semantic patch:
      
      @@
      expression e1, e2, e3;
      @@
      (
      - memblock_alloc(e1, e2)
      + memblock_phys_alloc(e1, e2)
      |
      - memblock_alloc_nid(e1, e2, e3)
      + memblock_phys_alloc_nid(e1, e2, e3)
      |
      - memblock_alloc_try_nid(e1, e2, e3)
      + memblock_phys_alloc_try_nid(e1, e2, e3)
      )
      
      Link: http://lkml.kernel.org/r/1536927045-23536-7-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a8dd708
  8. 14 5月, 2018 1 次提交
  9. 08 4月, 2017 1 次提交
  10. 03 4月, 2017 2 次提交
  11. 28 1月, 2017 1 次提交
    • I
      x86/boot/e820: Move asm/e820.h to asm/e820/api.h · 66441bd3
      Ingo Molnar 提交于
      In line with asm/e820/types.h, move the e820 API declarations to
      asm/e820/api.h and update all usage sites.
      
      This is just a mechanical, obviously correct move & replace patch,
      there will be subsequent changes to clean up the code and to make
      better use of the new header organization.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      66441bd3
  12. 15 12月, 2016 1 次提交
  13. 22 9月, 2016 1 次提交
    • T
      x86/numa: Online memory-less nodes at boot time · 2532fc31
      Tang Chen 提交于
      For now, x86 does not support memory-less node. A node without memory
      will not be onlined, and the cpus on it will be mapped to the other
      online nodes with memory in init_cpu_to_node(). The reason of doing this
      is to ensure each cpu has mapped to a node with memory, so that it will
      be able to allocate local memory for that cpu.
      
      But we don't have to do it in this way.
      
      In this series of patches, we are going to construct cpu <-> node mapping
      for all possible cpus at boot time, which is a persistent mapping. It means
      that the cpu will be mapped to the node which it belongs to, and will never
      be changed. If a node has only cpus but no memory, the cpus on it will be
      mapped to a memory-less node. And the memory-less node should be onlined.
      
      Allocate pgdats for all memory-less nodes and online them at boot
      time. Then build zonelists for these nodes. As a result, when cpus on these
      memory-less nodes try to allocate memory from local node, it will
      automatically fall back to the proper zones in the zonelists.
      Signed-off-by: NZhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Signed-off-by: NDou Liyang <douly.fnst@cn.fujitsu.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: mika.j.penttila@gmail.com
      Cc: len.brown@intel.com
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: rafael@kernel.org
      Cc: rjw@rjwysocki.net
      Cc: yasu.isimatu@gmail.com
      Cc: linux-mm@kvack.org
      Cc: linux-acpi@vger.kernel.org
      Cc: isimatu.yasuaki@jp.fujitsu.com
      Cc: gongzhaogang@inspur.com
      Cc: tj@kernel.org
      Cc: izumi.taku@jp.fujitsu.com
      Cc: cl@linux.com
      Cc: chen.tang@easystack.cn
      Cc: akpm@linux-foundation.org
      Cc: kamezawa.hiroyu@jp.fujitsu.com
      Cc: lenb@kernel.org
      Link: http://lkml.kernel.org/r/1472114120-3281-2-git-send-email-douly.fnst@cn.fujitsu.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      2532fc31
  14. 14 7月, 2016 1 次提交
    • P
      x86/mm: Audit and remove any unnecessary uses of module.h · 4b599fed
      Paul Gortmaker 提交于
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      
      This means we should be able to reduce the usage of module.h
      in code that is obj-y Makefile or bool Kconfig.  The advantage
      in doing so is that module.h itself sources about 15 other headers;
      adding significantly to what we feed cpp, and it can obscure what
      headers we are effectively using.
      
      Since module.h was the source for init.h (for __init) and for
      export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
      for the presence of either and replace accordingly where needed.
      
      Note that some bool/obj-y instances remain since module.h is
      the header for some exception table entry stuff, and for things
      like __init_or_module (code that is tossed when MODULES=n).
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160714001901.31603-3-paul.gortmaker@windriver.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4b599fed
  15. 30 5月, 2016 1 次提交
  16. 20 5月, 2016 1 次提交
  17. 08 2月, 2016 3 次提交
    • I
      x86/mm/numa: Check for failures in numa_clear_kernel_node_hotplug() · 5f7ee246
      Ingo Molnar 提交于
      numa_clear_kernel_node_hotplug() uses memblock_set_node() without
      checking for failures.
      
      memblock_set_node() is a complex function that might extend the
      memblock array - which extension might fail - so check for this
      possibility.
      
      It's not supposed to happen (because realistically if we have so
      little memory that this fails then we likely won't be able to
      boot anyway), but do the check nevertheless.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: y14sg1 <y14sg1@comcast.net>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      5f7ee246
    • I
      x86/mm/numa: Clean up numa_clear_kernel_node_hotplug() · c1a0bf34
      Ingo Molnar 提交于
      So we fixed an overflow bug in numa_clear_kernel_node_hotplug():
      
        2b54ab3c66d4 ("x86/mm/numa: Fix memory corruption on 32-bit NUMA kernels")
      
      ... and the bug was indirectly caused by poor coding style,
      such as using start/end local variables unnecessarily, which
      lost the physaddr_t type.
      
      So make the code more readable and try to fully comment all
      the thinking behind the logic.
      
      No change in functionality.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: y14sg1 <y14sg1@comcast.net>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c1a0bf34
    • I
      x86/mm/numa: Fix 32-bit memblock range truncation bug on 32-bit NUMA kernels · 59fd1214
      Ingo Molnar 提交于
      The following commit:
      
        a0acda91 ("acpi, numa, mem_hotplug: mark all nodes the kernel resides un-hotpluggable")
      
      Introduced numa_clear_kernel_node_hotplug(), which function is executed
      during early bootup, and which marks all currently reserved memblock
      regions as hot-memory-unswappable as well.
      
      y14sg1 <y14sg1@comcast.net> reported that when running 32-bit NUMA kernels,
      the grsecurity/PAX kernel patch flagged a size overflow in this function:
      
        PAX: size overflow detected in function x86_numa_init arch/x86/mm/numa.c:691 [...]
      
      ... the reason for the overflow is that memblock_clear_hotplug() takes physical
      addresses as arguments, while the start/end variables used by
      numa_clear_kernel_node_hotplug() are 'unsigned long', which is 32-bit on PAE
      kernels, but which has 64-bit physical addresses.
      
      So on 32-bit PAE kernels that have physical memory above the 4GB boundary,
      we truncate a 64-bit physical address range to 32 bits and pass it to
      memblock_clear_hotplug(), which at minimum prevents the original memory-hotplug
      bugfix from working, but might have other side effects as well.
      
      The fix is to use the proper type to handle physical addresses, phys_addr_t.
      Reported-by: Ny14sg1 <y14sg1@comcast.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      59fd1214
  18. 09 9月, 2015 1 次提交
    • T
      mem-hotplug: handle node hole when initializing numa_meminfo. · 95cf82ec
      Tang Chen 提交于
      When parsing SRAT, all memory ranges are added into numa_meminfo.  In
      numa_init(), before entering numa_cleanup_meminfo(), all possible memory
      ranges are in numa_meminfo.  And numa_cleanup_meminfo() removes all
      ranges over max_pfn or empty.
      
      But, this only works if the nodes are continuous.  Let's have a look at
      the following example:
      
      We have an SRAT like this:
      SRAT: Node 0 PXM 0 [mem 0x00000000-0x5fffffff]
      SRAT: Node 0 PXM 0 [mem 0x100000000-0x1ffffffffff]
      SRAT: Node 1 PXM 1 [mem 0x20000000000-0x3ffffffffff]
      SRAT: Node 4 PXM 2 [mem 0x40000000000-0x5ffffffffff] hotplug
      SRAT: Node 5 PXM 3 [mem 0x60000000000-0x7ffffffffff] hotplug
      SRAT: Node 2 PXM 4 [mem 0x80000000000-0x9ffffffffff] hotplug
      SRAT: Node 3 PXM 5 [mem 0xa0000000000-0xbffffffffff] hotplug
      SRAT: Node 6 PXM 6 [mem 0xc0000000000-0xdffffffffff] hotplug
      SRAT: Node 7 PXM 7 [mem 0xe0000000000-0xfffffffffff] hotplug
      
      On boot, only node 0,1,2,3 exist.
      
      And the numa_meminfo will look like this:
      numa_meminfo.nr_blks = 9
      1. on node 0: [0, 60000000]
      2. on node 0: [100000000, 20000000000]
      3. on node 1: [20000000000, 40000000000]
      4. on node 4: [40000000000, 60000000000]
      5. on node 5: [60000000000, 80000000000]
      6. on node 2: [80000000000, a0000000000]
      7. on node 3: [a0000000000, a0800000000]
      8. on node 6: [c0000000000, a0800000000]
      9. on node 7: [e0000000000, a0800000000]
      
      And numa_cleanup_meminfo() will merge 1 and 2, and remove 8,9 because the
      end address is over max_pfn, which is a0800000000.  But 4 and 5 are not
      removed because their end addresses are less then max_pfn.  But in fact,
      node 4 and 5 don't exist.
      
      In a word, numa_cleanup_meminfo() is not able to handle holes between nodes.
      
      Since memory ranges in node 4 and 5 are in numa_meminfo, in
      numa_register_memblks(), node 4 and 5 will be mistakenly set to online.
      
      If you run lscpu, it will show:
      NUMA node0 CPU(s):     0-14,128-142
      NUMA node1 CPU(s):     15-29,143-157
      NUMA node2 CPU(s):
      NUMA node3 CPU(s):
      NUMA node4 CPU(s):     62-76,190-204
      NUMA node5 CPU(s):     78-92,206-220
      
      In this patch, we use memblock_overlaps_region() to check if ranges in
      numa_meminfo overlap with ranges in memory_block.  Since memory_block
      contains all available memory at boot time, if they overlap, it means the
      ranges exist.  If not, then remove them from numa_meminfo.
      
      After this patch, lscpu will show:
      NUMA node0 CPU(s):     0-14,128-142
      NUMA node1 CPU(s):     15-29,143-157
      NUMA node4 CPU(s):     62-76,190-204
      NUMA node5 CPU(s):     78-92,206-220
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Vladimir Murzin <vladimir.murzin@arm.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      95cf82ec
  19. 07 4月, 2015 1 次提交
    • D
      x86/mm/numa: Fix kernel stack corruption in numa_init()->numa_clear_kernel_node_hotplug() · 22ef882e
      Dave Young 提交于
      I got below kernel panic during kdump test on Thinkpad T420
      laptop:
      
      [    0.000000] No NUMA configuration found
      [    0.000000] Faking a node at [mem 0x0000000000000000-0x0000000037ba4fff]
      [    0.000000] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81d21910
       ...
      [    0.000000] Call Trace:
      [    0.000000]  [<ffffffff817c2a26>] dump_stack+0x45/0x57
      [    0.000000]  [<ffffffff817bc8d2>] panic+0xd0/0x204
      [    0.000000]  [<ffffffff81d21910>] ? numa_clear_kernel_node_hotplug+0xe6/0xf2
      [    0.000000]  [<ffffffff8107741b>] __stack_chk_fail+0x1b/0x20
      [    0.000000]  [<ffffffff81d21910>] numa_clear_kernel_node_hotplug+0xe6/0xf2
      [    0.000000]  [<ffffffff81d21e5d>] numa_init+0x1a5/0x520
      [    0.000000]  [<ffffffff81d222b1>] x86_numa_init+0x19/0x3d
      [    0.000000]  [<ffffffff81d22460>] initmem_init+0x9/0xb
      [    0.000000]  [<ffffffff81d0d00c>] setup_arch+0x94f/0xc82
      [    0.000000]  [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
      [    0.000000]  [<ffffffff817bd0bb>] ? printk+0x55/0x6b
      [    0.000000]  [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
      [    0.000000]  [<ffffffff81d05d9b>] start_kernel+0xe8/0x4d6
      [    0.000000]  [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
      [    0.000000]  [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
      [    0.000000]  [<ffffffff81d055ee>] x86_64_start_reservations+0x2a/0x2c
      [    0.000000]  [<ffffffff81d05751>] x86_64_start_kernel+0x161/0x184
      [    0.000000] ---[ end Kernel panic - not syncing: stack-protector: Kernel sta
      
      This is caused by writing over the end of numa mask bitmap
      in numa_clear_kernel_node().
      
      numa_clear_kernel_node() tries to set the node id in a mask bitmap,
      by iterating all reserved regions and assuming that every region
      has a valid nid.
      
      This assumption is not true because there's an exception for some
      graphic memory quirks. See trim_snb_memory() in arch/x86/kernel/setup.c
      
      It is easily to reproduce the bug in the kdump kernel because kdump
      kernel use pre-reserved memory instead of the whole memory, but
      kexec pass other reserved memory ranges to 2nd kernel as well.
      like below in my test:
      
      kdump kernel ram 0x2d000000 - 0x37bfffff
      One of the reserved regions: 0x40000000 - 0x40100000 which
      includes 0x40004000, a page excluded in trim_snb_memory(). For
      this memblock reserved region the nid is not set, it is still
      default value MAX_NUMNODES. later node_set will set bit
      MAX_NUMNODES thus stack corruption happen.
      
      This also happens when booting with mem= kernel commandline
      during my test.
      
      Fixing it by adding a check, do not call node_set in case nid is
      MAX_NUMNODES.
      Signed-off-by: NDave Young <dyoung@redhat.com>
      Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bhe@redhat.com
      Cc: qiuxishi@huawei.com
      Link: http://lkml.kernel.org/r/20150407134132.GA23522@dhcp-16-198.nay.redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      22ef882e
  20. 14 2月, 2015 1 次提交
  21. 14 10月, 2014 1 次提交
    • X
      arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable · bd5cfb89
      Xishi Qiu 提交于
      If all the nodes are marked hotpluggable, alloc node data will fail.
      Because __next_mem_range_rev() will skip the hotpluggable memory
      regions.  numa_clear_kernel_node_hotplug() is called after alloc node
      data.
      
      numa_init()
          ...
          ret = init_func();  // this will mark hotpluggable flag from SRAT
          ...
          memblock_set_bottom_up(false);
          ...
          ret = numa_register_memblks(&numa_meminfo);  // this will alloc node data(pglist_data)
          ...
          numa_clear_kernel_node_hotplug();  // in case all the nodes are hotpluggable
          ...
      
      numa_register_memblks()
          setup_node_data()
              memblock_find_in_range_node()
                  __memblock_find_range_top_down()
                      for_each_mem_range_rev()
                          __next_mem_range_rev()
      
      This patch moves numa_clear_kernel_node_hotplug() into
      numa_register_memblks(), clear kernel node hotpluggable flag before
      alloc node data, then alloc node data won't fail even all the nodes
      are hotpluggable.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NXishi Qiu <qiuxishi@huawei.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bd5cfb89
  22. 16 9月, 2014 1 次提交
    • L
      x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data() · 8b375f64
      Luiz Capitulino 提交于
      The setup_node_data() function allocates a pg_data_t object,
      inserts it into the node_data[] array and initializes the
      following fields: node_id, node_start_pfn and
      node_spanned_pages.
      
      However, a few function calls later during the kernel boot,
      free_area_init_node() re-initializes those fields, possibly with
      setup_node_data() is not used.
      
      This causes a small glitch when running Linux as a hyperv numa
      guest:
      
        SRAT: PXM 0 -> APIC 0x00 -> Node 0
        SRAT: PXM 0 -> APIC 0x01 -> Node 0
        SRAT: PXM 1 -> APIC 0x02 -> Node 1
        SRAT: PXM 1 -> APIC 0x03 -> Node 1
        SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
        SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff]
        SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff]
        NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff]
        Initmem setup node 0 [mem 0x00000000-0x7fffffff]
          NODE_DATA [mem 0x7ffdc000-0x7ffeffff]
        Initmem setup node 1 [mem 0x80800000-0x1081fffff]
          NODE_DATA [mem 0x1081ea000-0x1081fdfff]
        crashkernel: memory value expected
         [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0
         [ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1
        Zone ranges:
          DMA      [mem 0x00001000-0x00ffffff]
          DMA32    [mem 0x01000000-0xffffffff]
          Normal   [mem 0x100000000-0x1081fffff]
        Movable zone start for each node
        Early memory node ranges
          node   0: [mem 0x00001000-0x0009efff]
          node   0: [mem 0x00100000-0x7ffeffff]
          node   1: [mem 0x80200000-0xf7ffffff]
          node   1: [mem 0x100000000-0x1081fffff]
        On node 0 totalpages: 524174
          DMA zone: 64 pages used for memmap
          DMA zone: 21 pages reserved
          DMA zone: 3998 pages, LIFO batch:0
          DMA32 zone: 8128 pages used for memmap
          DMA32 zone: 520176 pages, LIFO batch:31
        On node 1 totalpages: 524288
          DMA32 zone: 7672 pages used for memmap
          DMA32 zone: 491008 pages, LIFO batch:31
          Normal zone: 520 pages used for memmap
          Normal zone: 33280 pages, LIFO batch:7
      
      In this dmesg, the SRAT table reports that the memory range for
      node 1 starts at 0x80200000.  However, the line starting with
      "Initmem" reports that node 1 memory range starts at 0x80800000.
       The "Initmem" line is reported by setup_node_data() and is
      wrong, because the kernel ends up using the range as reported in
      the SRAT table.
      
      This commit drops all that dead code from setup_node_data(),
      renames it to alloc_node_data() and adds a printk() to
      free_area_init_node() so that we report a node's memory range
      accurately.
      
      Here's the same dmesg section with this patch applied:
      
         SRAT: PXM 0 -> APIC 0x00 -> Node 0
         SRAT: PXM 0 -> APIC 0x01 -> Node 0
         SRAT: PXM 1 -> APIC 0x02 -> Node 1
         SRAT: PXM 1 -> APIC 0x03 -> Node 1
         SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
         SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff]
         SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff]
         NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff]
         NODE_DATA(0) allocated [mem 0x7ffdc000-0x7ffeffff]
         NODE_DATA(1) allocated [mem 0x1081ea000-0x1081fdfff]
         crashkernel: memory value expected
          [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0
          [ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1
         Zone ranges:
           DMA      [mem 0x00001000-0x00ffffff]
           DMA32    [mem 0x01000000-0xffffffff]
           Normal   [mem 0x100000000-0x1081fffff]
         Movable zone start for each node
         Early memory node ranges
           node   0: [mem 0x00001000-0x0009efff]
           node   0: [mem 0x00100000-0x7ffeffff]
           node   1: [mem 0x80200000-0xf7ffffff]
           node   1: [mem 0x100000000-0x1081fffff]
         Initmem setup node 0 [mem 0x00001000-0x7ffeffff]
         On node 0 totalpages: 524174
           DMA zone: 64 pages used for memmap
           DMA zone: 21 pages reserved
           DMA zone: 3998 pages, LIFO batch:0
           DMA32 zone: 8128 pages used for memmap
           DMA32 zone: 520176 pages, LIFO batch:31
         Initmem setup node 1 [mem 0x80200000-0x1081fffff]
         On node 1 totalpages: 524288
           DMA32 zone: 7672 pages used for memmap
           DMA32 zone: 491008 pages, LIFO batch:31
           Normal zone: 520 pages used for memmap
           Normal zone: 33280 pages, LIFO batch:7
      
      This commit was tested on a two node bare-metal NUMA machine and
      Linux as a numa guest on hyperv and qemu/kvm.
      
      PS: The wrong memory range reported by setup_node_data() seems to be
          harmless in the current kernel because it's just not used.  However,
          that bad range is used in kernel 2.6.32 to initialize the old boot
          memory allocator, which causes a crash during boot.
      Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      8b375f64
  23. 05 6月, 2014 1 次提交
  24. 28 2月, 2014 1 次提交
  25. 07 2月, 2014 2 次提交
  26. 22 1月, 2014 3 次提交
    • T
      acpi, numa, mem_hotplug: mark all nodes the kernel resides un-hotpluggable · a0acda91
      Tang Chen 提交于
      At very early time, the kernel have to use some memory such as loading
      the kernel image.  We cannot prevent this anyway.  So any node the
      kernel resides in should be un-hotpluggable.
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a0acda91
    • T
      acpi, numa, mem_hotplug: mark hotpluggable memory in memblock · 05d1d8cb
      Tang Chen 提交于
      When parsing SRAT, we know that which memory area is hotpluggable.  So we
      invoke function memblock_mark_hotplug() introduced by previous patch to
      mark hotpluggable memory in memblock.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      05d1d8cb
    • T
      memblock: make memblock_set_node() support different memblock_type · e7e8de59
      Tang Chen 提交于
      [sfr@canb.auug.org.au: fix powerpc build]
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e7e8de59
  27. 19 12月, 2013 1 次提交
    • L
      x86/mm/numa: Fix 32-bit kernel NUMA boot · f3d815cb
      Lans Zhang 提交于
      When booting a 32-bit x86 kernel on a NUMA machine, node data
      cannot be allocated from local node if the account of memory for
      node 0 covers the low memory space entirely:
      
        [    0.000000] Initmem setup node 0 [mem 0x00000000-0x83fffffff]
        [    0.000000]   NODE_DATA [mem 0x367ed000-0x367edfff]
        [    0.000000] Initmem setup node 1 [mem 0x840000000-0xfffffffff]
        [    0.000000] Cannot find 4096 bytes in node 1
        [    0.000000] 64664MB HIGHMEM available.
        [    0.000000] 871MB LOWMEM available.
      
      To fix this issue, node data is allowed to be allocated from
      other nodes if the memory of local node is still not mapped. The
      expected result looks like this:
      
        [    0.000000] Initmem setup node 0 [mem 0x00000000-0x83fffffff]
        [    0.000000]   NODE_DATA [mem 0x367ed000-0x367edfff]
        [    0.000000] Initmem setup node 1 [mem 0x840000000-0xfffffffff]
        [    0.000000]   NODE_DATA [mem 0x367ec000-0x367ecfff]
        [    0.000000]     NODE_DATA(1) on node 0
        [    0.000000] 64664MB HIGHMEM available.
        [    0.000000] 871MB LOWMEM available.
      Signed-off-by: NLans Zhang <jia.zhang@windriver.com>
      Cc: <andi@firstfloor.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1386303510-18574-1-git-send-email-jia.zhang@windriver.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f3d815cb
  28. 13 11月, 2013 1 次提交
    • T
      mem-hotplug: introduce movable_node boot option · c5320926
      Tang Chen 提交于
      The hot-Pluggable field in SRAT specifies which memory is hotpluggable.
      As we mentioned before, if hotpluggable memory is used by the kernel, it
      cannot be hot-removed.  So memory hotplug users may want to set all
      hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.
      
      Memory hotplug users may also set a node as movable node, which has
      ZONE_MOVABLE only, so that the whole node can be hot-removed.
      
      But the kernel cannot use memory in ZONE_MOVABLE.  By doing this, the
      kernel cannot use memory in movable nodes.  This will cause NUMA
      performance down.  And other users may be unhappy.
      
      So we need a way to allow users to enable and disable this functionality.
      In this patch, we introduce movable_node boot option to allow users to
      choose to not to consume hotpluggable memory at early boot time and later
      we can set it as ZONE_MOVABLE.
      
      To achieve this, the movable_node boot option will control the memblock
      allocation direction.  That said, after memblock is ready, before SRAT is
      parsed, we should allocate memory near the kernel image as we explained in
      the previous patches.  So if movable_node boot option is set, the kernel
      does the following:
      
      1. After memblock is ready, make memblock allocate memory bottom up.
      2. After SRAT is parsed, make memblock behave as default, allocate memory
         top down.
      
      Users can specify "movable_node" in kernel commandline to enable this
      functionality.  For those who don't use memory hotplug or who don't want
      to lose their NUMA performance, just don't specify anything.  The kernel
      will work as before.
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Suggested-by: NKamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NTejun Heo <tj@kernel.org>
      Acked-by: NToshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c5320926
  29. 15 7月, 2013 1 次提交
    • P
      x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8
  30. 30 4月, 2013 1 次提交
  31. 03 3月, 2013 1 次提交
    • Y
      x86, ACPI, mm: Revert movablemem_map support · 20e6926d
      Yinghai Lu 提交于
      Tim found:
      
        WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
        Hardware name: S2600CP
        sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
        smpboot: Booting Node   1, Processors  #1
        Modules linked in:
        Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
        Call Trace:
          set_cpu_sibling_map+0x279/0x449
          start_secondary+0x11d/0x1e5
      
      Don Morris reproduced on a HP z620 workstation, and bisected it to
      commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock
      is ready")
      
      It turns out movable_map has some problems, and it breaks several things
      
      1. numa_init is called several times, NOT just for srat. so those
      	nodes_clear(numa_nodes_parsed)
      	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
         can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
         and make fall back path working.
      
      2. simply split acpi_numa_init to early_parse_srat.
         a. that early_parse_srat is NOT called for ia64, so you break ia64.
         b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
      	     set_apicid_to_node(i, NUMA_NO_NODE)
           still left in numa_init. So it will just clear result from early_parse_srat.
           it should be moved before that....
         c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
             early before override from INITRD is settled.
      
      3. that patch TITLE is total misleading, there is NO x86 in the title,
         but it changes critical x86 code. It caused x86 guys did not
         pay attention to find the problem early. Those patches really should
         be routed via tip/x86/mm.
      
      4. after that commit, following range can not use movable ram:
        a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
        b. initrd... it will be freed after booting, so it could be on movable...
        c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
      	anymore.
        d. init_mem_mapping: can not put page table high anymore.
        e. initmem_init: vmemmap can not be high local node anymore. That is
           not good.
      
      If node is hotplugable, the mem related range like page table and
      vmemmap could be on the that node without problem and should be on that
      node.
      
      We have workaround patch that could fix some problems, but some can not
      be fixed.
      
      So just remove that offending commit and related ones including:
      
       f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
          protect movablecore_map in memblock_overlaps_region().")
      
       01a178a9 ("acpi, memory-hotplug: support getting hotplug info from
          SRAT")
      
       27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to
          the end of node")
      
       e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is
          ready")
      
       fb06bc8e ("page_alloc: bootmem limit with movablecore_map")
      
       42f47e27 ("page_alloc: make movablemem_map have higher priority")
      
       6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep
          movable limit for nodes")
      
       34b71f1e ("page_alloc: add movable_memmap kernel parameter")
      
       4d59a751 ("x86: get pg_data_t's memory from other node")
      
      Later we should have patches that will make sure kernel put page table
      and vmemmap on local node ram instead of push them down to node0.  Also
      need to find way to put other kernel used ram to local node ram.
      Reported-by: NTim Gardner <tim.gardner@canonical.com>
      Reported-by: NDon Morris <don.morris@hp.com>
      Bisected-by: NDon Morris <don.morris@hp.com>
      Tested-by: NDon Morris <don.morris@hp.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20e6926d
  32. 24 2月, 2013 2 次提交
    • W
      x86/mm/numa: Don't check if node is NUMA_NO_NODE · 942670d0
      Wen Congyang 提交于
      If we aren't debugging per_cpu maps, the cpu's node is stored in
      per_cpu variable numa_node.  If `node' is NUMA_NO_NODE, it means
      the caller wants to clear the cpu's node.  So we should also
      call set_cpu_numa_node() in this case.
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      942670d0
    • T
      acpi, memory-hotplug: parse SRAT before memblock is ready · e8d19552
      Tang Chen 提交于
      On linux, the pages used by kernel could not be migrated.  As a result,
      if a memory range is used by kernel, it cannot be hot-removed.  So if we
      want to hot-remove memory, we should prevent kernel from using it.
      
      The way now used to prevent this is specify a memory range by
      movablemem_map boot option and set it as ZONE_MOVABLE.
      
      But when the system is booting, memblock will allocate memory, and
      reserve the memory for kernel.  And before we parse SRAT, and know the
      node memory ranges, memblock is working.  And it may allocate memory in
      ranges to be set as ZONE_MOVABLE.  This memory can be used by kernel,
      and never be freed.
      
      So, let's parse SRAT before memblock is called first.  And it is early
      enough.
      
      The first call of memblock_find_in_range_node() is in:
      
        setup_arch()
          |-->setup_real_mode()
      
      so, this patch add a function early_parse_srat() to parse SRAT, and call
      it before setup_real_mode() is called.
      
      NOTE:
      
      1) early_parse_srat() is called before numa_init(), and has initialized
         numa_meminfo.  So DO NOT clear numa_nodes_parsed in numa_init() and DO
         NOT zero numa_meminfo in numa_init(), otherwise we will lose memory
         numa info.
      
      2) I don't know why using count of memory affinities parsed from SRAT
         as a return value in original acpi_numa_init().  So I add a static
         variable srat_mem_cnt to remember this count and use it as the return
         value of the new acpi_numa_init()
      
      [mhocko@suse.cz: parse SRAT before memblock is ready fix]
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NWen Congyang <wency@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: "Brown, Len" <len.brown@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e8d19552