- 24 2月, 2013 5 次提交
-
-
由 Tang Chen 提交于
Introduce a new API vmemmap_free() to free and remove vmemmap pagetables. Since pagetable implements are different, each architecture has to provide its own version of vmemmap_free(), just like vmemmap_populate(). Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc. [mhocko@suse.cz: fix implicit declaration of remove_pagetable] Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: NJianguo Wu <wujianguo@huawei.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NMichal Hocko <mhocko@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tang Chen 提交于
Search a page table about the removed memory, and clear page table for x86_64 architecture. [akpm@linux-foundation.org: make kernel_physical_mapping_remove() static] Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Signed-off-by: NJianguo Wu <wujianguo@huawei.com> Signed-off-by: NJiang Liu <jiang.liu@huawei.com> Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wen Congyang 提交于
When memory is removed, the corresponding pagetables should alse be removed. This patch introduces some common APIs to support vmemmap pagetable and x86_64 architecture direct mapping pagetable removing. All pages of virtual mapping in removed memory cannot be freed if some pages used as PGD/PUD include not only removed memory but also other memory. So this patch uses the following way to check whether a page can be freed or not. 1) When removing memory, the page structs of the removed memory are filled with 0FD. 2) All page structs are filled with 0xFD on PT/PMD, PT/PMD can be cleared. In this case, the page used as PT/PMD can be freed. For direct mapping pages, update direct_pages_count[level] when we freed their pagetables. And do not free the pages again because they were freed when offlining. For vmemmap pages, free the pages and their pagetables. For larger pages, do not split them into smaller ones because there is no way to know if the larger page has been split. As a result, there is no way to decide when to split. We deal the larger pages in the following way: 1) For direct mapped pages, all the pages were freed when they were offlined. And since menmory offline is done section by section, all the memory ranges being removed are aligned to PAGE_SIZE. So only need to deal with unaligned pages when freeing vmemmap pages. 2) For vmemmap pages being used to store page_struct, if part of the larger page is still in use, just fill the unused part with 0xFD. And when the whole page is fulfilled with 0xFD, then free the larger page. [akpm@linux-foundation.org: fix typo in comment] [tangchen@cn.fujitsu.com: do not calculate direct mapping pages when freeing vmemmap pagetables] [tangchen@cn.fujitsu.com: do not free direct mapping pages twice] [tangchen@cn.fujitsu.com: do not free page split from hugepage one by one] [tangchen@cn.fujitsu.com: do not split pages when freeing pagetable pages] [akpm@linux-foundation.org: use pmd_page_vaddr()] [akpm@linux-foundation.org: fix used-uninitialised bug] Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: NJianguo Wu <wujianguo@huawei.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yasuaki Ishimatsu 提交于
For removing memmap region of sparse-vmemmap which is allocated bootmem, memmap region of sparse-vmemmap needs to be registered by get_page_bootmem(). So the patch searches pages of virtual mapping and registers the pages by get_page_bootmem(). NOTE: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390, and sparc. So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node() when platform doesn't support it. It's implemented by adding a new Kconfig option named CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected by memory-hotplug feature fully supported archs(currently only on x86_64). Since we have 2 config options called MEMORY_HOTPLUG and MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately, and codes in function register_page_bootmem_info_node() are only used for collecting infomation for hot-remove, so reside it under MEMORY_HOTREMOVE. Besides page_isolation.c selected by MEMORY_ISOLATION under MEMORY_HOTPLUG is also such case, move it too. [mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE] [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()] [mhocko@suse.cz: remove the arch specific functions without any implementation] [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed] [rientjes@google.com: fix defined but not used warning] Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com> Reviewed-by: NWu Jianguo <wujianguo@huawei.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NMichal Hocko <mhocko@suse.cz> Signed-off-by: NLin Feng <linfeng@cn.fujitsu.com> Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wen Congyang 提交于
For removing memory, we need to remove page tables. But it depends on architecture. So the patch introduce arch_remove_memory() for removing page table. Now it only calls __remove_pages(). Note: __remove_pages() for some archtecuture is not implemented (I don't know how to implement it for s390). Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 2月, 2013 1 次提交
-
-
由 Mel Gorman 提交于
A user reported the following oops when a backup process reads /proc/kcore: BUG: unable to handle kernel paging request at ffffbb00ff33b000 IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110 [...] Call Trace: [<ffffffff811b8aaa>] read_kcore+0x17a/0x370 [<ffffffff811ad847>] proc_reg_read+0x77/0xc0 [<ffffffff81151687>] vfs_read+0xc7/0x130 [<ffffffff811517f3>] sys_read+0x53/0xa0 [<ffffffff81449692>] system_call_fastpath+0x16/0x1b Investigation determined that the bug triggered when reading system RAM at the 4G mark. On this system, that was the first address using 1G pages for the virt->phys direct mapping so the PUD is pointing to a physical address, not a PMD page. The problem is that the page table walker in kern_addr_valid() is not checking pud_large() and treats the physical address as if it was a PMD. If it happens to look like pmd_none then it'll silently fail, probably returning zeros instead of real data. If the data happens to look like a present PMD though, it will be walked resulting in the oops above. This patch adds the necessary pud_large() check. Unfortunately the problem was not readily reproducible and now they are running the backup program without accessing /proc/kcore so the patch has not been validated but I think it makes sense. Signed-off-by: NMel Gorman <mgorman@suse.de> Reviewed-by: NRik van Riel <riel@redhat.coM> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: stable@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 30 1月, 2013 4 次提交
-
-
由 Yinghai Lu 提交于
HPA said, we should not have RW and +x set at the time. for kernel layout: [ 0.000000] Kernel Layout: [ 0.000000] .text: [0x01000000-0x021434f8] [ 0.000000] .rodata: [0x02200000-0x02a13fff] [ 0.000000] .data: [0x02c00000-0x02dc763f] [ 0.000000] .init: [0x02dc9000-0x0312cfff] [ 0.000000] .bss: [0x0313b000-0x03dd6fff] [ 0.000000] .brk: [0x03dd7000-0x03dfffff] before the patch, we have ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff82200000 18M ro PSE GLB x pmd 0xffffffff82200000-0xffffffff82c00000 10M ro PSE GLB NX pmd 0xffffffff82c00000-0xffffffff82dc9000 1828K RW GLB x pte 0xffffffff82dc9000-0xffffffff82e00000 220K RW GLB NX pte 0xffffffff82e00000-0xffffffff83000000 2M RW PSE GLB NX pmd 0xffffffff83000000-0xffffffff8313a000 1256K RW GLB NX pte 0xffffffff8313a000-0xffffffff83200000 792K RW GLB x pte 0xffffffff83200000-0xffffffff83e00000 12M RW PSE GLB x pmd 0xffffffff83e00000-0xffffffffa0000000 450M pmd after patch,, we get ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff82200000 18M ro PSE GLB x pmd 0xffffffff82200000-0xffffffff82c00000 10M ro PSE GLB NX pmd 0xffffffff82c00000-0xffffffff82e00000 2M RW GLB NX pte 0xffffffff82e00000-0xffffffff83000000 2M RW PSE GLB NX pmd 0xffffffff83000000-0xffffffff83200000 2M RW GLB NX pte 0xffffffff83200000-0xffffffff83e00000 12M RW PSE GLB NX pmd 0xffffffff83e00000-0xffffffffa0000000 450M pmd so data, bss, brk get NX ... Signed-off-by: NYinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-33-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Yinghai Lu 提交于
We are not having max_pfn_mapped set correctly until init_memory_mapping. So don't print its initial value for 64bit Also need to use KERNEL_IMAGE_SIZE directly for highmap cleanup. -v2: update comments about max_pfn_mapped according to Stefano Stabellini. Signed-off-by: NYinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-14-git-send-email-yinghai@kernel.orgAcked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Yinghai Lu 提交于
It is simple version for kernel_physical_mapping_init. it will work to build one page table that will be used later. Use mapping_info to control 1. alloc_pg_page method 2. if PMD is EXEC, 3. if pgd is with kernel low mapping or ident mapping. Will use to replace some local versions in kexec, hibernation and etc. Signed-off-by: NYinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-8-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Yinghai Lu 提交于
Just like the way we calculate next for pud and pmd, aka round down and add size. Also, do not do boundary-checking with 'next', and just pass 'end' down to phys_pud_init() instead. Because the loop in phys_pud_init() stops at PTRS_PER_PUD and thus can handle a possibly bigger 'end' properly. Signed-off-by: NYinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-6-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 24 1月, 2013 1 次提交
-
-
由 Wen Congyang 提交于
The address range of sync_global_pgds() should be [start, end], but we pass [start, end) to this function. Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 12月, 2012 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit 53b87cf0. It causes odd bootup problems on x86-64. Markus Trippelsdorf gets a repeatable oops, and I see a non-repeatable oops (or constant stream of messages that scroll off too quickly to read) that seems to go away with this commit reverted. So we don't know exactly what is wrong with the commit, but it's definitely problematic, and worth reverting sooner rather than later. Bisected-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Cc: H Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Matt Fleming <matt.fleming@intel.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 12月, 2012 1 次提交
-
-
由 Lai Jiangshan 提交于
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Since we introduced N_MEMORY, we update the initialization of node_states. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NLin Feng <linfeng@cn.fujitsu.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 11月, 2012 8 次提交
-
-
由 Yinghai Lu 提交于
Now NO_BOOTMEM version free_all_bootmem_node() does not really do free_bootmem at all, and it only call register_page_bootmem_info_node instead. That is confusing, try to kill that free_all_bootmem_node(). Before that, this patch will remove numa_free_all_bootmem(). That function could be replaced with register_page_bootmem_info() and free_all_bootmem(); Signed-off-by: NYinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-43-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Yinghai Lu 提交于
They are almost same except 64 bit need to handle after_bootmem case. Add mm_internal.h to make that alloc_low_page() only to be accessible from arch/x86/mm/init*.c Signed-off-by: NYinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-25-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Yinghai Lu 提交于
Now all page table buf are pre-mapped, and could use virtual address directly. So don't need to remember physical address anymore. Remove that phys pointer in alloc_low_page(), and that will allow us to merge alloc_low_page between 64bit and 32bit. Signed-off-by: NYinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-24-git-send-email-yinghai@kernel.orgAcked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Yinghai Lu 提交于
We try to put page table high to make room for kdump, and at that time those ranges are not mapped yet, and have to use ioremap to access it. Now after patch that pre-map page table top down. x86, mm: setup page table in top-down We do not need that workaround anymore. Just use __va to return directly mapping address. Signed-off-by: NYinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-23-git-send-email-yinghai@kernel.orgAcked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Yinghai Lu 提交于
Get pgt_buf early from BRK, and use it to map PMD_SIZE from top at first. Then use mapped pages to map more ranges below, and keep looping until all pages get mapped. alloc_low_page will use page from BRK at first, after that buffer is used up, will use memblock to find and reserve pages for page table usage. Introduce min_pfn_mapped to make sure find new pages from mapped ranges, that will be updated when lower pages get mapped. Also add step_size to make sure that don't try to map too big range with limited mapped pages initially, and increase the step_size when we have more mapped pages on hand. We don't need to call pagetable_reserve anymore, reserve work is done in alloc_low_page() directly. At last we can get rid of calculation and find early pgt related code. -v2: update to after fix_xen change, also use MACRO for initial pgt_buf size and add comments with it. -v3: skip big reserved range in memblock.reserved near end. -v4: don't need fix_xen change now. -v5: add changelog about moving about reserving pagetable to alloc_low_page. Suggested-by: N"H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NYinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-22-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Yinghai Lu 提交于
After we add code use buffer in BRK to pre-map buf for page table in following patch: x86, mm: setup page table in top-down it should be safe to remove early_memmap for page table accessing. Instead we get panic with that. It turns out that we clear the initial page table wrongly for next range that is separated by holes. And it only happens when we are trying to map ram range one by one. We need to check if the range is ram before clearing page table. We change the loop structure to remove the extra little loop and use one loop only, and in that loop will caculate next at first, and check if [addr,next) is covered by E820_RAM. -v2: E820_RESERVED_KERN is treated as E820_RAM. EFI one change some E820_RAM to that, so next kernel by kexec will know that range is used already. Signed-off-by: NYinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-20-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Yinghai Lu 提交于
We are going to use buffer in BRK to map small range just under memory top, and use those new mapped ram to map ram range under it. The ram range that will be mapped at first could be only page aligned, but ranges around it are ram too, we could use bigger page to map it to avoid small page size. We will adjust page_size_mask in following patch: x86, mm: Use big page size for small memory range to use big page size for small ram range. Before that patch, this patch will make sure start address to be aligned down according to bigger page size, otherwise entry in page page will not have correct value. Signed-off-by: NYinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-18-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Jacob Shin 提交于
Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT ) and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not backed by actual DRAM. This is fine for holes under 4GB which are covered by fixed and variable range MTRRs to be UC. However, we run into trouble on higher memory addresses which cannot be covered by MTRRs. Our system with 1TB of RAM has an e820 that looks like this: BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable and so direct mappings are created for huge memory hole between 0x000000e038000000 to 0x0000010000000000. Even though the kernel never generates memory accesses in that region, since the page tables mark them incorrectly as being WB, our (AMD) processor ends up causing a MCE while doing some memory bookkeeping/optimizations around that area. This patch iterates through e820 and only direct maps ranges that are marked as E820_RAM, and keeps track of those pfn ranges. Depending on the alignment of E820 ranges, this may possibly result in using smaller size (i.e. 4K instead of 2M or 1G) page tables. -v2: move changes from setup.c to mm/init.c, also use for_each_mem_pfn_range instead. - Yinghai Lu -v3: add calculate_all_table_space_size() to get correct needed page table size. - Yinghai Lu -v4: fix add_pfn_range_mapped() to get correct max_low_pfn_mapped when mem map does have hole under 4g that is found by Konard on xen domU with 8g ram. - Yinghai Signed-off-by: NJacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1353123563-3103-16-git-send-email-yinghai@kernel.orgSigned-off-by: NYinghai Lu <yinghai@kernel.org> Reviewed-by: NPekka Enberg <penberg@kernel.org> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 17 11月, 2012 1 次提交
-
-
由 Alexander Duyck 提交于
When I made an attempt at separating __pa_symbol and __pa I found that there were a number of cases where __pa was used on an obvious symbol. I also caught one non-obvious case as _brk_start and _brk_end are based on the address of __brk_base which is a C visible symbol. In mark_rodata_ro I was able to reduce the overhead of kernel symbol to virtual memory translation by using a combination of __va(__pa_symbol()) instead of page_address(virt_to_page()). Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Link: http://lkml.kernel.org/r/20121116215640.8521.80483.stgit@ahduyck-cp1.jf.intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 30 10月, 2012 1 次提交
-
-
由 Matt Fleming 提交于
There are various pieces of code in arch/x86 that require a page table with an identity mapping. Make trampoline_pgd a proper kernel page table, it currently only includes the kernel text and module space mapping. One new feature of trampoline_pgd is that it now has mappings for the physical I/O device addresses, which are inserted at ioremap() time. Some broken implementations of EFI firmware require these mappings to always be around. Acked-by: NJan Beulich <jbeulich@suse.com> Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
-
- 24 10月, 2012 1 次提交
-
-
由 Jan Beulich 提交于
Commit 20167d34 ("x86-64: Fix accounting in kernel_physical_mapping_init()") went a little too far by entirely removing the counting of pre-populated page tables: this should be done at boot time (to cover the page tables set up in early boot code), but shouldn't be done during memory hot add. Hence, re-add the removed increments of "pages", but make them and the one in phys_pte_init() conditional upon !after_bootmem. Reported-Acked-and-Tested-by: NHugh Dickins <hughd@google.com> Signed-off-by: NJan Beulich <jbeulich@suse.com> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/r/506DAFBA020000780009FA8C@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 18 5月, 2012 1 次提交
-
-
由 Jan Beulich 提交于
When finding a present and acceptable 2M/1G mapping, the number of pages mapped this way shouldn't be incremented (as it was already incremented when the earlier part of the mapping was established). Instead, last_map_addr needs to be updated in this case. Further, address increments were wrong in one place each in both phys_pmd_init() and phys_pud_init() (lacking the aligning down to the respective page boundary). As we're now doing the same calculation several times, fold it into a single instance using a local variable (matching how kernel_physical_mapping_init() itself does it at the PGD level). Observed during code inspection, not because of an actual problem. Signed-off-by: NJan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4FB3C27202000078000841A0@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 29 3月, 2012 1 次提交
-
-
由 David Howells 提交于
Disintegrate asm/system.h for X86. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NH. Peter Anvin <hpa@zytor.com> cc: x86@kernel.org
-
- 11 11月, 2011 5 次提交
-
-
由 Pekka Enberg 提交于
Now that zone_sizes_init() is identical on 32-bit and 64-bit, move the code to arch/x86/mm/init.c and use it for both architectures. Acked-by: NTejun Heo <tj@kernel.org> Acked-by: NYinghai Lu <yinghai@kernel.org> Signed-off-by: NPekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/1320155902-10424-7-git-send-email-penberg@kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Pekka Enberg 提交于
Make 32-bit and 64-bit zone_sizes_init() identical in preparation for unification. Acked-by: NTejun Heo <tj@kernel.org> Acked-by: NYinghai Lu <yinghai@kernel.org> Signed-off-by: NPekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/1320155902-10424-6-git-send-email-penberg@kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Pekka Enberg 提交于
64-bit has no highmem so max_low_pfn is always the same as 'max_pfn'. Acked-by: NTejun Heo <tj@kernel.org> Acked-by: NYinghai Lu <yinghai@kernel.org> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NPekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/1320155902-10424-5-git-send-email-penberg@kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Pekka Enberg 提交于
In preparation for unifying 32-bit and 64-bit zone_sizes_init() make sure ZONE_DMA32 is wrapped in CONFIG_ZONE_DMA32. Acked-by: NTejun Heo <tj@kernel.org> Acked-by: NYinghai Lu <yinghai@kernel.org> Acked-by: NDavid Rientjes <rientjes@google.com> Acked-by: NArun Sharma <asharma@fb.com> Signed-off-by: NPekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/1320155902-10424-4-git-send-email-penberg@kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Pekka Enberg 提交于
This patch introduces a zone_sizes_init() helper function on 64-bit to make it more similar to 32-bit init. Acked-by: NTejun Heo <tj@kernel.org> Acked-by: NYinghai Lu <yinghai@kernel.org> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NPekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/1320155902-10424-2-git-send-email-penberg@kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 15 7月, 2011 1 次提交
-
-
由 Tejun Heo 提交于
From 5732e1247898d67cbf837585150fe9f68974671d Mon Sep 17 00:00:00 2001 From: Tejun Heo <tj@kernel.org> Date: Thu, 14 Jul 2011 11:22:16 +0200 Convert x86 to HAVE_MEMBLOCK_NODE_MAP. The only difference in memory handling is that allocations can't no longer cross node boundaries whether they're node affine or not, which shouldn't matter at all. This conversion will enable further simplification of boot memory handling. -v2: Fix build failure on !NUMA configurations discovered by hpa. Signed-off-by: NTejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110714094423.GG3455@htj.dyndns.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 12 7月, 2011 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
The macro MIN_MEMORY_BLOCK_SIZE is currently defined twice in two .c files, and I need it in a third one to fix a powerpc bug, so let's first move it into a header Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: NIngo Molnar <mingo@elte.hu>
-
- 17 5月, 2011 1 次提交
-
-
由 David Rientjes 提交于
ZONE_DMA is unnecessary for a large number of machines that do not require less than 32-bit DMA addressing, e.g. ISA legacy DMA or PCI cards with a restricted DMA address mask. This patch allows users to disable ZONE_DMA for x86 if they know they will not be using such devices with their kernel. This prevents the VM from unnecessarily reserving a ratio of memory (defaulting to 1/256th of system capacity) with lowmem_reserve_ratio for such allocations when it will never be used. Signed-off-by: NDavid Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1105161353560.4353@chino.kir.corp.google.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 02 5月, 2011 1 次提交
-
-
由 Tejun Heo 提交于
The only special handling NUMA needs to do for hotadd memory is determining the node for the hotadd memory given the address of it and there's nothing specific to specific config method used. srat_64.c does somewhat elaborate error checking on ACPI_SRAT_MEM_HOT_PLUGGABLE regions, remembers them and implements memory_add_physaddr_to_nid() which determines the node for given hotadd address. This is almost completely redundant. All the information is already available to the generic NUMA code which already performs all the sanity checking and merging. All that's necessary is not using __initdata from numa_meminfo and providing a function which uses it to map address to node. Drop the specific implementation from srat_64.c and add generic memory_add_physaddr_to_nid() in numa_64.c, which is enabled if CONFIG_MEMORY_HOTPLUG is set. Other than dropping the code, srat_64.c doesn't need any change as it already calls numa_add_memblk() for hot pluggable regions which is enough. While at it, change CONFIG_MEMORY_HOTPLUG_SPARSE in srat_64.c to CONFIG_MEMORY_HOTPLUG, for NUMA on x86-64, the two are always the same. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
-
- 24 3月, 2011 3 次提交
-
-
由 Stephen Wilson 提交于
Now that gate vma's are referenced with respect to a particular mm and not a particular task it only makes sense to propagate the change to this predicate as well. Signed-off-by: NStephen Wilson <wilsons@start.ca> Reviewed-by: NMichel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Stephen Wilson 提交于
Morally, the question of whether an address lies in a gate vma should be asked with respect to an mm, not a particular task. Moreover, dropping the dependency on task_struct will help make existing and future operations on mm's more flexible and convenient. Signed-off-by: NStephen Wilson <wilsons@start.ca> Reviewed-by: NMichel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Stephen Wilson 提交于
Morally, the presence of a gate vma is more an attribute of a particular mm than a particular task. Moreover, dropping the dependency on task_struct will help make both existing and future operations on mm's more flexible and convenient. Signed-off-by: NStephen Wilson <wilsons@start.ca> Reviewed-by: NMichel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 20 3月, 2011 1 次提交
-
-
由 Yinghai Lu 提交于
Now cleanup_highmap actually is in two steps: one is early in head64.c and only clears above _end; a second one is in init_memory_mapping() and tries to clean from _brk_end to _end. It should check if those boundaries are PMD_SIZE aligned but currently does not. Also init_memory_mapping() is called several times for numa or memory hotplug, so we really should not handle initial kernel mappings there. This patch moves cleanup_highmap() down after _brk_end is settled so we can do everything in one step. Also we honor max_pfn_mapped in the implementation of cleanup_highmap. Signed-off-by: NYinghai Lu <yinghai@kernel.org> Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com> LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 10 3月, 2011 1 次提交
-
-
由 Andrea Arcangeli 提交于
It's forbidden to take the page_table_lock with the irq disabled or if there's contention the IPIs (for tlb flushes) sent with the page_table_lock held will never run leading to a deadlock. Nobody takes the pgd_lock from irq context so the _irqsave can be removed. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Acked-by: NRik van Riel <riel@redhat.com> Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> LKML-Reference: <201102162345.p1GNjMjm021738@imap1.linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-