- 03 6月, 2017 1 次提交
-
-
由 Michal Hocko 提交于
We have seen an early OOM killer invocation on ppc64 systems with crashkernel=4096M: kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0 kthreadd cpuset=/ mems_allowed=7 CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1 Call Trace: dump_stack+0xb0/0xf0 (unreliable) dump_header+0xb0/0x258 out_of_memory+0x5f0/0x640 __alloc_pages_nodemask+0xa8c/0xc80 kmem_getpages+0x84/0x1a0 fallback_alloc+0x2a4/0x320 kmem_cache_alloc_node+0xc0/0x2e0 copy_process.isra.25+0x260/0x1b30 _do_fork+0x94/0x470 kernel_thread+0x48/0x60 kthreadd+0x264/0x330 ret_from_kernel_thread+0x5c/0xa4 Mem-Info: active_anon:0 inactive_anon:0 isolated_anon:0 active_file:0 inactive_file:0 isolated_file:0 unevictable:0 dirty:0 writeback:0 unstable:0 slab_reclaimable:5 slab_unreclaimable:73 mapped:0 shmem:0 pagetables:0 bounce:0 free:0 free_pcp:0 free_cma:0 Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB 0 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 819200 pages RAM 0 pages HighMem/MovableOnly 817481 pages reserved 0 pages cma reserved 0 pages hwpoisoned the reason is that the managed memory is too low (only 110MB) while the rest of the the 50GB is still waiting for the deferred intialization to be done. update_defer_init estimates the initial memoty to initialize to 2GB at least but it doesn't consider any memory allocated in that range. In this particular case we've had Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB) so the low 2GB is mostly depleted. Fix this by considering memblock allocations in the initial static initialization estimation. Move the max_initialise to reset_deferred_meminit and implement a simple memblock_reserved_memory helper which iterates all reserved blocks and sums the size of all that start below the given address. The cumulative size is than added on top of the initial estimation. This is still not ideal because reset_deferred_meminit doesn't consider holes and so reservation might be above the initial estimation whihch we ignore but let's make the logic simpler until we really need to handle more complicated cases. Fixes: 3a80a7fa ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set") Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.czSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NMel Gorman <mgorman@suse.de> Tested-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [4.2+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 4月, 2017 2 次提交
-
-
由 AKASHI Takahiro 提交于
Add memblock_cap_memory_range() which will remove all the memblock regions except the memory range specified in the arguments. In addition, rework is done on memblock_mem_limit_remove_map() to re-implement it using memblock_cap_memory_range(). This function, like memblock_mem_limit_remove_map(), will not remove memblocks with MEMMAP_NOMAP attribute as they may be mapped and accessed later as "device memory." See the commit a571d4eb ("mm/memblock.c: add new infrastructure to address the mem limit issue"). This function is used, in a succeeding patch in the series of arm64 kdump suuport, to limit the range of usable memory, or System RAM, on crash dump kernel. (Please note that "mem=" parameter is of little use for this purpose.) Signed-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: NWill Deacon <will.deacon@arm.com> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Acked-by: NDennis Chen <dennis.chen@arm.com> Cc: linux-mm@kvack.org Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 AKASHI Takahiro 提交于
This function, with a combination of memblock_mark_nomap(), will be used in a later kdump patch for arm64 when it temporarily isolates some range of memory from the other memory blocks in order to create a specific kernel mapping at boot time. Signed-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
- 10 3月, 2017 1 次提交
-
-
由 AKASHI Takahiro 提交于
Obviously, we should not access memblock.memory.regions[right] if 'right' is outside of [0..memblock.memory.cnt>. Fixes: b92df1de ("mm: page_alloc: skip over regions of invalid pfns where possible") Link: http://lkml.kernel.org/r/20170303023745.9104-1-takahiro.akashi@linaro.orgSigned-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Paul Burton <paul.burton@imgtec.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 2月, 2017 3 次提交
-
-
由 Heiko Carstens 提交于
Provide the name of each memblock type with struct memblock_type. This allows to get rid of the function memblock_type_name() and duplicating the type names in __memblock_dump_all(). The only memblock_type usage out of mm/memblock.c seems to be arch/s390/kernel/crash_dump.c. While at it, give it a name. Link: http://lkml.kernel.org/r/20170120123456.46508-4-heiko.carstens@de.ibm.comSigned-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heiko Carstens 提交于
Since commit 70210ed9 ("mm/memblock: add physical memory list") the memblock structure knows about a physical memory list. The physical memory list should also be dumped if memblock_dump_all() is called in case memblock_debug is switched on. This makes debugging a bit easier. Link: http://lkml.kernel.org/r/20170120123456.46508-3-heiko.carstens@de.ibm.comSigned-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heiko Carstens 提交于
Since commit 70210ed9 ("mm/memblock: add physical memory list") the memblock structure knows about a physical memory list. memblock_type_name() should return "physmem" instead of "unknown" if the name of the physmem memblock_type is being asked for. Link: http://lkml.kernel.org/r/20170120123456.46508-2-heiko.carstens@de.ibm.comSigned-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 2月, 2017 4 次提交
-
-
由 Miles Chen 提交于
There is no variable named flags in memblock_add() and memblock_reserve() so remove it from the log messages. This patch also cleans up the type casting for phys_addr_t by using %pa to print them. Link: http://lkml.kernel.org/r/1484720165-25403-1-git-send-email-miles.chen@mediatek.comSigned-off-by: NMiles Chen <miles.chen@mediatek.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
memblock_reserve() would add a new range to memblock.reserved in case the new range is not totally covered by any of the current memblock.reserved range. If the memblock.reserved is full and can't resize, memblock_reserve() would fail. This doesn't happen in real world now, I observed this during code review. While theoretically, it has the chance to happen. And if it happens, others would think this range of memory is still available and may corrupt the memory. This patch checks the return value and goto "done" after it succeeds. Link: http://lkml.kernel.org/r/1482363033-24754-3-git-send-email-richard.weiyang@gmail.comSigned-off-by: NWei Yang <richard.weiyang@gmail.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
memblock_is_region_memory() invoke memblock_search() to see whether the base address is in the memory region. If it fails, idx would be -1. Then, it returns 0. If the memblock_search() returns a valid index, it means the base address is guaranteed to be in the range memblock.memory.regions[idx]. Because of this, it is not necessary to check the base again. This patch removes the check on "base". Link: http://lkml.kernel.org/r/1482363033-24754-2-git-send-email-richard.weiyang@gmail.comSigned-off-by: NWei Yang <richard.weiyang@gmail.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Burton 提交于
When using a sparse memory model memmap_init_zone() when invoked with the MEMMAP_EARLY context will skip over pages which aren't valid - ie. which aren't in a populated region of the sparse memory map. However if the memory map is extremely sparse then it can spend a long time linearly checking each PFN in a large non-populated region of the memory map & skipping it in turn. When CONFIG_HAVE_MEMBLOCK_NODE_MAP is enabled, we have sufficient information to quickly discover the next valid PFN given an invalid one by searching through the list of memory regions & skipping forwards to the first PFN covered by the memory region to the right of the non-populated region. Implement this in order to speed up memmap_init_zone() for systems with extremely sparse memory maps. James said "I have tested this patch on a virtual model of a Samurai CPU with a sparse memory map. The kernel boot time drops from 109 to 62 seconds. " Link: http://lkml.kernel.org/r/20161125185518.29885-1-paul.burton@imgtec.comSigned-off-by: NPaul Burton <paul.burton@imgtec.com> Tested-by: NJames Hartley <james.hartley@imgtec.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 10月, 2016 1 次提交
-
-
由 Catalin Marinas 提交于
Some of the kmemleak_*() callbacks in memblock, bootmem, CMA convert a physical address to a virtual one using __va(). However, such physical addresses may sometimes be located in highmem and using __va() is incorrect, leading to inconsistent object tracking in kmemleak. The following functions have been added to the kmemleak API and they take a physical address as the object pointer. They only perform the corresponding action if the address has a lowmem mapping: kmemleak_alloc_phys kmemleak_free_part_phys kmemleak_not_leak_phys kmemleak_ignore_phys The affected calling places have been updated to use the new kmemleak API. Link: http://lkml.kernel.org/r/1471531432-16503-1-git-send-email-catalin.marinas@arm.comSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com> Reported-by: NVignesh R <vigneshr@ti.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 10月, 2016 1 次提交
-
-
由 Srikar Dronamraju 提交于
The total reserved memory in a system is accounted but not available for use use outside mm/memblock.c. By exposing the total reserved memory, systems can better calculate the size of large hashes. Link: http://lkml.kernel.org/r/1472476010-4709-3-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Suggested-by: NMel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 8月, 2016 2 次提交
-
-
由 zijun_hu 提交于
It causes NULL dereference error and failure to get type_a->regions[0] info if parameter type_b of __next_mem_range_rev() == NULL Fix this by checking before dereferring and initializing idx_b to 0 The approach is tested by dumping all types of region via __memblock_dump_all() and __next_mem_range_rev() fixed to UART separately the result is okay after checking the logs. Link: http://lkml.kernel.org/r/57A0320D.6070102@zoho.comSigned-off-by: Nzijun_hu <zijun_hu@htc.com> Tested-by: Nzijun_hu <zijun_hu@htc.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Kuleshov 提交于
s/accomodate/accommodate/ Link: http://lkml.kernel.org/r/20160804121824.18100-1-kuleshovmail@gmail.comSigned-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 7月, 2016 3 次提交
-
-
由 zijun_hu 提交于
Fix region index adjustment error when parameter type_b of __next_mem_range_rev() == NULL. Signed-off-by: Nzijun_hu <zijun_hu@htc.com> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Wei Yang <weiyang@linux.vnet.ibm.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Richard Leitner <dev@g0hl1n.net> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dennis Chen 提交于
In some cases, memblock is queried by kernel to determine whether a specified address is RAM or not. For example, the ACPI core needs this information to determine which attributes to use when mapping ACPI regions(acpi_os_ioremap). Use of incorrect memory types can result in faults, data corruption, or other issues. Removing memory with memblock_enforce_memory_limit() throws away this information, and so a kernel booted with 'mem=' may suffer from the issues described above. To avoid this, we need to keep those NOMAP regions instead of removing all above the limit, which preserves the information we need while preventing other use of those regions. This patch adds new infrastructure to retain all NOMAP memblock regions while removing others, to cater for this. Link: http://lkml.kernel.org/r/1468475036-5852-2-git-send-email-dennis.chen@arm.comSigned-off-by: NDennis Chen <dennis.chen@arm.com> Acked-by: NSteve Capper <steve.capper@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Kaly Xin <kaly.xin@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
asm-generic headers are generic implementations for architecture specific code and should not be included by common code. Thus use the asm/ version of sections.h to get at the linker sections. Link: http://lkml.kernel.org/r/1468285103-7470-1-git-send-email-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 7月, 2016 1 次提交
-
-
由 nimisolo 提交于
If nr_new is 0 which means there's no region would be added, so just return to the caller. Signed-off-by: Nnimisolo <nimisolo@gmail.com> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 5月, 2016 2 次提交
-
-
由 Richard Leitner 提交于
Comparing an u64 variable to >= 0 returns always true and can therefore be removed. This issue was detected using the -Wtype-limits gcc flag. This patch fixes following type-limits warning: mm/memblock.c: In function `__next_reserved_mem_region': mm/memblock.c:843:11: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] if (*idx >= 0 && *idx < type->cnt) { Link: http://lkml.kernel.org/r/20160510103625.3a7f8f32@g0hl1n.netSigned-off-by: NRichard Leitner <dev@g0hl1n.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Kuleshov 提交于
memblock_add_region() and memblock_reserve_region() do nothing specific before the call of memblock_add_range(), only print debug output. We can do the same in memblock_add() and memblock_reserve() since both memblock_add_region() and memblock_reserve_region() are not used by anybody outside of memblock.c and memblock_{add,reserve}() have the same set of flags and nids. Since memblock_add_region() and memblock_reserve_region() will be inlined, there will not be functional changes, but will improve code readability a little. Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com> Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Pekka Enberg <penberg@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 3月, 2016 1 次提交
-
-
由 Joe Perches 提交于
Kernel style prefers a single string over split strings when the string is 'user-visible'. Miscellanea: - Add a missing newline - Realign arguments Signed-off-by: NJoe Perches <joe@perches.com> Acked-by: Tejun Heo <tj@kernel.org> [percpu] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 3月, 2016 1 次提交
-
-
由 Alexander Kuleshov 提交于
We define struct memblock_type *type in the memblock_add_region() and memblock_reserve_region() functions only for passing it to the memlock_add_range() and memblock_reserve_range() functions. Let's remove these variables and will pass a type directly. Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 2月, 2016 1 次提交
-
-
由 David Gibson 提交于
At the moment memblock_phys_mem_size() is marked as __init, and so is discarded after boot. This is different from most of the memblock functions which are marked __init_memblock, and are only discarded after boot if memory hotplug is not configured. To allow for upcoming code which will need memblock_phys_mem_size() in the hotplug path, change it from __init to __init_memblock. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 1月, 2016 3 次提交
-
-
由 Alexander Kuleshov 提交于
We already have the for_each_memblock() macro in <linux/memblock.h> which provides ability to iterate over memblock regions of a known type. The for_each_memblock() macro allows us to pass the pointer to the struct memblock_type, instead we need to pass name of the type. This patch introduces a new macro for_each_memblock_type() which allows us iterate over memblock regions with the given type when the type is unknown. Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Kuleshov 提交于
Remove rgnbase and rgnsize variables from memblock_overlaps_region(). We use these variables only for passing to the memblock_addrs_overlap() function and that's all. Let's remove them. Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yaowei Bai 提交于
Make memblock_is_memory() and memblock_is_reserved return bool to improve readability due to these particular functions only using either one or zero as their return value. No functional change. Signed-off-by: NYaowei Bai <baiyaowei@cmss.chinamobile.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 12月, 2015 1 次提交
-
-
由 Ard Biesheuvel 提交于
This introduces the MEMBLOCK_NOMAP attribute and the required plumbing to make it usable as an indicator that some parts of normal memory should not be covered by the kernel direct mapping. It is up to the arch to actually honor the attribute when laying out this mapping, but the memblock code itself is modified to disregard these regions for allocations and other general use. Cc: linux-mm@kvack.org Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: NMatt Fleming <matt@codeblueprint.co.uk> Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NWill Deacon <will.deacon@arm.com>
-
- 06 11月, 2015 1 次提交
-
-
由 Alexander Kuleshov 提交于
memblock_remove_range() is only used in the mm/memblock.c, so we can make it static. Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 9月, 2015 6 次提交
-
-
由 Alexander Kuleshov 提交于
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Kuleshov 提交于
s/succees/success/ Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Kuleshov 提交于
Since commit e3239ff9 ("memblock: Rename memblock_region to memblock_type and memblock_property to memblock_region"), all local variables of the membock_type type were renamed to 'type'. This commit renames all remaining local variables with the memblock_type type to the same view. Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tang Chen 提交于
When parsing SRAT, all memory ranges are added into numa_meminfo. In numa_init(), before entering numa_cleanup_meminfo(), all possible memory ranges are in numa_meminfo. And numa_cleanup_meminfo() removes all ranges over max_pfn or empty. But, this only works if the nodes are continuous. Let's have a look at the following example: We have an SRAT like this: SRAT: Node 0 PXM 0 [mem 0x00000000-0x5fffffff] SRAT: Node 0 PXM 0 [mem 0x100000000-0x1ffffffffff] SRAT: Node 1 PXM 1 [mem 0x20000000000-0x3ffffffffff] SRAT: Node 4 PXM 2 [mem 0x40000000000-0x5ffffffffff] hotplug SRAT: Node 5 PXM 3 [mem 0x60000000000-0x7ffffffffff] hotplug SRAT: Node 2 PXM 4 [mem 0x80000000000-0x9ffffffffff] hotplug SRAT: Node 3 PXM 5 [mem 0xa0000000000-0xbffffffffff] hotplug SRAT: Node 6 PXM 6 [mem 0xc0000000000-0xdffffffffff] hotplug SRAT: Node 7 PXM 7 [mem 0xe0000000000-0xfffffffffff] hotplug On boot, only node 0,1,2,3 exist. And the numa_meminfo will look like this: numa_meminfo.nr_blks = 9 1. on node 0: [0, 60000000] 2. on node 0: [100000000, 20000000000] 3. on node 1: [20000000000, 40000000000] 4. on node 4: [40000000000, 60000000000] 5. on node 5: [60000000000, 80000000000] 6. on node 2: [80000000000, a0000000000] 7. on node 3: [a0000000000, a0800000000] 8. on node 6: [c0000000000, a0800000000] 9. on node 7: [e0000000000, a0800000000] And numa_cleanup_meminfo() will merge 1 and 2, and remove 8,9 because the end address is over max_pfn, which is a0800000000. But 4 and 5 are not removed because their end addresses are less then max_pfn. But in fact, node 4 and 5 don't exist. In a word, numa_cleanup_meminfo() is not able to handle holes between nodes. Since memory ranges in node 4 and 5 are in numa_meminfo, in numa_register_memblks(), node 4 and 5 will be mistakenly set to online. If you run lscpu, it will show: NUMA node0 CPU(s): 0-14,128-142 NUMA node1 CPU(s): 15-29,143-157 NUMA node2 CPU(s): NUMA node3 CPU(s): NUMA node4 CPU(s): 62-76,190-204 NUMA node5 CPU(s): 78-92,206-220 In this patch, we use memblock_overlaps_region() to check if ranges in numa_meminfo overlap with ranges in memory_block. Since memory_block contains all available memory at boot time, if they overlap, it means the ranges exist. If not, then remove them from numa_meminfo. After this patch, lscpu will show: NUMA node0 CPU(s): 0-14,128-142 NUMA node1 CPU(s): 15-29,143-157 NUMA node4 CPU(s): 62-76,190-204 NUMA node5 CPU(s): 78-92,206-220 Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com> Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tang Chen 提交于
memblock_overlaps_region() checks if the given memblock region intersects a region in memblock. If so, it returns the index of the intersected region. But its only caller is memblock_is_region_reserved(), and it returns 0 if false, non-zero if true. Both of these should return bool. Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
Each memblock_region has flags to indicates the type of this range. For the overlap case, memblock_add_range() inserts the lower part and leave the upper part as indicated in the overlapped region. If the flags of the new range differs from the overlapped region, the information recorded is not correct. This patch adds a WARN_ON when the flags of the new range differs from the overlapped region. Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 9月, 2015 1 次提交
-
-
由 Wei Yang 提交于
Each memblock_region has nid to indicates the Node ID of this range. For the overlap case, memblock_add_range() inserts the lower part and leave the upper part as indicated in the overlapped region. If the nid of the new range differs from the overlapped region, the information recorded is not correct. This patch adds a WARN_ON when the nid of the new range differs from the overlapped region. Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 7月, 2015 2 次提交
-
-
由 Mel Gorman 提交于
__free_pages_bootmem prepares a page for release to the buddy allocator and assumes that the struct page is initialised. Parallel initialisation of struct pages defers initialisation and __free_pages_bootmem can be called for struct pages that cannot yet map struct page to PFN. This patch passes PFN to __free_pages_bootmem with no other functional change. Signed-off-by: NMel Gorman <mgorman@suse.de> Tested-by: NNate Zimmer <nzimmer@sgi.com> Tested-by: NWaiman Long <waiman.long@hp.com> Tested-by: NDaniel J Blueman <daniel@numascale.com> Acked-by: NPekka Enberg <penberg@kernel.org> Cc: Robin Holt <robinmholt@gmail.com> Cc: Nate Zimmer <nzimmer@sgi.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Scott Norton <scott.norton@hp.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Robin Holt 提交于
Struct page initialisation had been identified as one of the reasons why large machines take a long time to boot. Patches were posted a long time ago to defer initialisation until they were first used. This was rejected on the grounds it should not be necessary to hurt the fast paths. This series reuses much of the work from that time but defers the initialisation of memory to kswapd so that one thread per node initialises memory local to that node. After applying the series and setting the appropriate Kconfig variable I see this in the boot log on a 64G machine [ 7.383764] kswapd 0 initialised deferred memory in 188ms [ 7.404253] kswapd 1 initialised deferred memory in 208ms [ 7.411044] kswapd 3 initialised deferred memory in 216ms [ 7.411551] kswapd 2 initialised deferred memory in 216ms On a 1TB machine, I see [ 8.406511] kswapd 3 initialised deferred memory in 1116ms [ 8.428518] kswapd 1 initialised deferred memory in 1140ms [ 8.435977] kswapd 0 initialised deferred memory in 1148ms [ 8.437416] kswapd 2 initialised deferred memory in 1148ms Once booted the machine appears to work as normal. Boot times were measured from the time shutdown was called until ssh was available again. In the 64G case, the boot time savings are negligible. On the 1TB machine, the savings were 16 seconds. Nate Zimmer said: : On an older 8 TB box with lots and lots of cpus the boot time, as : measure from grub to login prompt, the boot time improved from 1484 : seconds to exactly 1000 seconds. Waiman Long said: : I ran a bootup timing test on a 12-TB 16-socket IvyBridge-EX system. From : grub menu to ssh login, the bootup time was 453s before the patch and 265s : after the patch - a saving of 188s (42%). Daniel Blueman said: : On a 7TB, 1728-core NumaConnect system with 108 NUMA nodes, we're seeing : stock 4.0 boot in 7136s. This drops to 2159s, or a 70% reduction with : this patchset. Non-temporal PMD init (https://lkml.org/lkml/2015/4/23/350) : drops this to 1045s. This patch (of 13): As part of initializing struct page's in 2MiB chunks, we noticed that at the end of free_all_bootmem(), there was nothing which had forced the reserved/allocated 4KiB pages to be initialized. This helper function will be used for that expansion. Signed-off-by: NRobin Holt <holt@sgi.com> Signed-off-by: NNate Zimmer <nzimmer@sgi.com> Signed-off-by: NMel Gorman <mgorman@suse.de> Tested-by: NNate Zimmer <nzimmer@sgi.com> Tested-by: NWaiman Long <waiman.long@hp.com> Tested-by: NDaniel J Blueman <daniel@numascale.com> Acked-by: NPekka Enberg <penberg@kernel.org> Cc: Robin Holt <robinmholt@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Scott Norton <scott.norton@hp.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 6月, 2015 2 次提交
-
-
由 Tony Luck 提交于
Try to allocate all boot time kernel data structures from mirrored memory. If we run out of mirrored memory print warnings, but fall back to using non-mirrored memory to make sure that we still boot. By number of bytes, most of what we allocate at boot time is the page structures. 64 bytes per 4K page on x86_64 ... or about 1.5% of total system memory. For workloads where the bulk of memory is allocated to applications this may represent a useful improvement to system availability since 1.5% of total memory might be a third of the memory allocated to the kernel. Signed-off-by: NTony Luck <tony.luck@intel.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Xiexiuqi <xiexiuqi@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tony Luck 提交于
Some high end Intel Xeon systems report uncorrectable memory errors as a recoverable machine check. Linux has included code for some time to process these and just signal the affected processes (or even recover completely if the error was in a read only page that can be replaced by reading from disk). But we have no recovery path for errors encountered during kernel code execution. Except for some very specific cases were are unlikely to ever be able to recover. Enter memory mirroring. Actually 3rd generation of memory mirroing. Gen1: All memory is mirrored Pro: No s/w enabling - h/w just gets good data from other side of the mirror Con: Halves effective memory capacity available to OS/applications Gen2: Partial memory mirror - just mirror memory begind some memory controllers Pro: Keep more of the capacity Con: Nightmare to enable. Have to choose between allocating from mirrored memory for safety vs. NUMA local memory for performance Gen3: Address range partial memory mirror - some mirror on each memory controller Pro: Can tune the amount of mirror and keep NUMA performance Con: I have to write memory management code to implement The current plan is just to use mirrored memory for kernel allocations. This has been broken into two phases: 1) This patch series - find the mirrored memory, use it for boot time allocations 2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the unused mirrored memory from mm/memblock.c and only give it out to select kernel allocations (this is still being scoped because page_alloc.c is scary). This patch (of 3): Add extra "flags" to memblock to allow selection of memory based on attribute. No functional changes Signed-off-by: NTony Luck <tony.luck@intel.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Xiexiuqi <xiexiuqi@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-