- 24 2月, 2013 2 次提交
-
-
由 Andrea Arcangeli 提交于
Without this patch any kernel code that reads kernel memory in non present kernel pte/pmds (as set by pageattr.c) will crash. With this kernel code: static struct page *crash_page; static unsigned long *crash_address; [..] crash_page = alloc_pages(GFP_KERNEL, 9); crash_address = page_address(crash_page); if (set_memory_np((unsigned long)crash_address, 1)) printk("set_memory_np failure\n"); [..] The kernel will crash if inside the "crash tool" one would try to read the memory at the not present address. crash> p crash_address crash_address = $8 = (long unsigned int *) 0xffff88023c000000 crash> rd 0xffff88023c000000 [ *lockup* ] The lockup happens because _PAGE_GLOBAL and _PAGE_PROTNONE shares the same bit, and pageattr leaves _PAGE_GLOBAL set on a kernel pte which is then mistaken as _PAGE_PROTNONE (so pte_present returns true by mistake and the kernel fault then gets confused and loops). With THP the same can happen after we taught pmd_present to check _PAGE_PROTNONE and _PAGE_PSE in commit 027ef6c8 ("mm: thp: fix pmd_present for split_huge_page and PROT_NONE with THP"). THP has the same problem with _PAGE_GLOBAL as the 4k pages, but it also has a problem with _PAGE_PSE, which must be cleared too. After the patch is applied copy_user correctly returns -EFAULT and doesn't lockup anymore. crash> p crash_address crash_address = $9 = (long unsigned int *) 0xffff88023c000000 crash> rd 0xffff88023c000000 rd: read error: kernel virtual address: ffff88023c000000 type: "64-bit KVADDR" Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Shaohua Li <shaohua.li@intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Wen Congyang 提交于
When memory is removed, the corresponding pagetables should alse be removed. This patch introduces some common APIs to support vmemmap pagetable and x86_64 architecture direct mapping pagetable removing. All pages of virtual mapping in removed memory cannot be freed if some pages used as PGD/PUD include not only removed memory but also other memory. So this patch uses the following way to check whether a page can be freed or not. 1) When removing memory, the page structs of the removed memory are filled with 0FD. 2) All page structs are filled with 0xFD on PT/PMD, PT/PMD can be cleared. In this case, the page used as PT/PMD can be freed. For direct mapping pages, update direct_pages_count[level] when we freed their pagetables. And do not free the pages again because they were freed when offlining. For vmemmap pages, free the pages and their pagetables. For larger pages, do not split them into smaller ones because there is no way to know if the larger page has been split. As a result, there is no way to decide when to split. We deal the larger pages in the following way: 1) For direct mapped pages, all the pages were freed when they were offlined. And since menmory offline is done section by section, all the memory ranges being removed are aligned to PAGE_SIZE. So only need to deal with unaligned pages when freeing vmemmap pages. 2) For vmemmap pages being used to store page_struct, if part of the larger page is still in use, just fill the unused part with 0xFD. And when the whole page is fulfilled with 0xFD, then free the larger page. [akpm@linux-foundation.org: fix typo in comment] [tangchen@cn.fujitsu.com: do not calculate direct mapping pages when freeing vmemmap pagetables] [tangchen@cn.fujitsu.com: do not free direct mapping pages twice] [tangchen@cn.fujitsu.com: do not free page split from hugepage one by one] [tangchen@cn.fujitsu.com: do not split pages when freeing pagetable pages] [akpm@linux-foundation.org: use pmd_page_vaddr()] [akpm@linux-foundation.org: fix used-uninitialised bug] Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: NJianguo Wu <wujianguo@huawei.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 1月, 2013 2 次提交
-
-
由 Dave Hansen 提交于
This is necessary because __pa() does not work on some kinds of memory, like vmalloc() or the alloc_remap() areas on 32-bit NUMA systems. We have some functions to do conversions _like_ this in the vmalloc() code (like vmalloc_to_page()), but they do not work on sizes other than 4k pages. We would potentially need to be able to handle all the page sizes that we use for the kernel linear mapping (4k, 2M, 1G). In practice, on 32-bit NUMA systems, the percpu areas get stuck in the alloc_remap() area. Any __pa() call on them will break and basically return garbage. This patch introduces a new function slow_virt_to_phys(), which walks the kernel page tables on x86 and should do precisely the same logical thing as __pa(), but actually work on a wider range of memory. It should work on the normal linear mapping, vmalloc(), kmap(), etc... Signed-off-by: NDave Hansen <dave@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20130122212433.4D1FCA62@kernel.stglabs.ibm.comAcked-by: NRik van Riel <riel@redhat.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Dave Hansen 提交于
try_preserve_large_page() can be slightly simplified by using the new page_level_*() helpers. This also moves the 'level' over to the new pg_level enum type. Signed-off-by: NDave Hansen <dave@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20130122212432.14F3D993@kernel.stglabs.ibm.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 16 12月, 2012 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit bd52276f ("x86-64/efi: Use EFI to deal with platform wall clock (again)"), and the two supporting commits: da5a108d: "x86/kernel: remove tboot 1:1 page table creation code" 185034e7: "x86, efi: 1:1 pagetable mapping for virtual EFI calls") as they all depend semantically on commit 53b87cf0 ("x86, mm: Include the entire kernel memory map in trampoline_pgd") that got reverted earlier due to the problems it caused. This was pointed out by Yinghai Lu, and verified by me on my Macbook Air that uses EFI. Pointed-out-by: NYinghai Lu <yinghai@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 11月, 2012 1 次提交
-
-
由 Yinghai Lu 提交于
We are going to map ram only, so under max_low_pfn_mapped, between 4g and max_pfn_mapped does not mean mapped at all. Use pfn_range_is_mapped() directly. Signed-off-by: NYinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-13-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 17 11月, 2012 1 次提交
-
-
由 Alexander Duyck 提交于
When I made an attempt at separating __pa_symbol and __pa I found that there were a number of cases where __pa was used on an obvious symbol. I also caught one non-obvious case as _brk_start and _brk_end are based on the address of __brk_base which is a C visible symbol. In mark_rodata_ro I was able to reduce the overhead of kernel symbol to virtual memory translation by using a combination of __va(__pa_symbol()) instead of page_address(virt_to_page()). Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Link: http://lkml.kernel.org/r/20121116215640.8521.80483.stgit@ahduyck-cp1.jf.intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 30 10月, 2012 1 次提交
-
-
由 Jan Beulich 提交于
Other than ix86, x86-64 on EFI so far didn't set the {g,s}et_wallclock accessors to the EFI routines, thus incorrectly using raw RTC accesses instead. Simply removing the #ifdef around the respective code isn't enough, however: While so far early get-time calls were done in physical mode, this doesn't work properly for x86-64, as virtual addresses would still need to be set up for all runtime regions (which wasn't the case on the system I have access to), so instead the patch moves the call to efi_enter_virtual_mode() ahead (which in turn allows to drop all code related to calling efi-get-time in physical mode). Additionally the earlier calling of efi_set_executable() requires the CPA code to cope, i.e. during early boot it must be avoided to call cpa_flush_array(), as the first thing this function does is a BUG_ON(irqs_disabled()). Also make the two EFI functions in question here static - they're not being referenced elsewhere. History: This commit was originally merged as bacef661 ("x86-64/efi: Use EFI to deal with platform wall clock") but it resulted in some ASUS machines no longer booting due to a firmware bug, and so was reverted in f026cfa8. A pre-emptive fix for the buggy ASUS firmware was merged in 03a1c254975e ("x86, efi: 1:1 pagetable mapping for virtual EFI calls") so now this patch can be reapplied. Signed-off-by: NJan Beulich <jbeulich@suse.com> Tested-by: NMatt Fleming <matt.fleming@intel.com> Acked-by: NMatthew Garrett <mjg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> [added commit history]
-
- 15 8月, 2012 1 次提交
-
-
由 H. Peter Anvin 提交于
This reverts commit bacef661. This commit has been found to cause serious regressions on a number of ASUS machines at the least. We probably need to provide a 1:1 map in addition to the EFI virtual memory map in order for this to work. Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Reported-and-bisected-by: NJérôme Carretero <cJ-ko@zougloub.eu> Cc: Jan Beulich <jbeulich@suse.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120805172903.5f8bb24c@zougloub.eu
-
- 11 6月, 2012 1 次提交
-
-
由 Wanpeng Li 提交于
Fix kernel-doc warnings in arch/x86/mm/ioremap.c and arch/x86/mm/pageattr.c, just like this one: Warning(arch/x86/mm/ioremap.c:204): No description found for parameter 'phys_addr' Warning(arch/x86/mm/ioremap.c:204): Excess function parameter 'offset' description in 'ioremap_nocache' Signed-off-by: NWanpeng Li <liwp@linux.vnet.ibm.com> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Cc: Wanpeng Li <liwp.linux@gmail.com> Link: http://lkml.kernel.org/r/1339296652-2935-1-git-send-email-liwp.linux@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 6月, 2012 1 次提交
-
-
由 Jan Beulich 提交于
Other than ix86, x86-64 on EFI so far didn't set the {g,s}et_wallclock accessors to the EFI routines, thus incorrectly using raw RTC accesses instead. Simply removing the #ifdef around the respective code isn't enough, however: While so far early get-time calls were done in physical mode, this doesn't work properly for x86-64, as virtual addresses would still need to be set up for all runtime regions (which wasn't the case on the system I have access to), so instead the patch moves the call to efi_enter_virtual_mode() ahead (which in turn allows to drop all code related to calling efi-get-time in physical mode). Additionally the earlier calling of efi_set_executable() requires the CPA code to cope, i.e. during early boot it must be avoided to call cpa_flush_array(), as the first thing this function does is a BUG_ON(irqs_disabled()). Also make the two EFI functions in question here static - they're not being referenced elsewhere. Signed-off-by: NJan Beulich <jbeulich@suse.com> Tested-by: NMatt Fleming <matt.fleming@intel.com> Acked-by: NMatthew Garrett <mjg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4FBFBF5F020000780008637F@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 12月, 2011 2 次提交
-
-
由 Stanislaw Gruszka 提交于
When (no)bootmem finish operation, it pass pages to buddy allocator. Since debug_pagealloc_enabled is not set, we will do not protect pages, what is not what we want with CONFIG_DEBUG_PAGEALLOC=y. To fix remove debug_pagealloc_enabled. That variable was introduced by commit 12d6f21e "x86: do not PSE on CONFIG_DEBUG_PAGEALLOC=y" to get more CPA (change page attribude) code testing. But currently we have CONFIG_CPA_DEBUG, which test CPA. Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1322582711-14571-1-git-send-email-sgruszka@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 H Hartley Sweeten 提交于
Local functions should be marked static. This also quiets the following sparse noise: warning: symbol '_set_memory_array' was not declared. Should it be static? Signed-off-by: NH Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: hartleys@visionengravers.com Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 18 3月, 2011 1 次提交
-
-
由 Lucas De Marchi 提交于
They were generated by 'codespell' and then manually reviewed. Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi> Cc: trivial@kernel.org LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 10 3月, 2011 1 次提交
-
-
由 Andrea Arcangeli 提交于
It's forbidden to take the page_table_lock with the irq disabled or if there's contention the IPIs (for tlb flushes) sent with the page_table_lock held will never run leading to a deadlock. Nobody takes the pgd_lock from irq context so the _irqsave can be removed. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Acked-by: NRik van Riel <riel@redhat.com> Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> LKML-Reference: <201102162345.p1GNjMjm021738@imap1.linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 03 2月, 2011 1 次提交
-
-
由 Matthieu CASTET 提交于
Xen want page table pages read only. But the initial page table (from head_*.S) live in .data or .bss. That was broken by 64edc8ed. There is absolutely no reason to force these pages RW after they have already been marked RO. Signed-off-by: NMatthieu CASTET <castet.matthieu@free.fr> Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 18 11月, 2010 2 次提交
-
-
由 Matthieu Castet 提交于
This patch expands functionality of CONFIG_DEBUG_RODATA to set main (static) kernel data area as NX. The following steps are taken to achieve this: 1. Linker script is adjusted so .text always starts and ends on a page bound 2. Linker script is adjusted so .rodata always start and end on a page boundary 3. NX is set for all pages from _etext through _end in mark_rodata_ro. 4. free_init_pages() sets released memory NX in arch/x86/mm/init.c 5. bios rom is set to x when pcibios is used. The results of patch application may be observed in the diff of kernel page table dumps: pcibios: -- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400 ++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400 0x00000000-0xc0000000 3G pmd ---[ Kernel Mapping ]--- -0xc0000000-0xc0100000 1M RW GLB x pte +0xc0000000-0xc00a0000 640K RW GLB NX pte +0xc00a0000-0xc0100000 384K RW GLB x pte -0xc0100000-0xc03d7000 2908K ro GLB x pte +0xc0100000-0xc0318000 2144K ro GLB x pte +0xc0318000-0xc03d7000 764K ro GLB NX pte -0xc03d7000-0xc0600000 2212K RW GLB x pte +0xc03d7000-0xc0600000 2212K RW GLB NX pte 0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd 0xf7a00000-0xf7bfe000 2040K RW GLB NX pte 0xf7bfe000-0xf7c00000 8K pte No pcibios: -- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400 ++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400 0x00000000-0xc0000000 3G pmd ---[ Kernel Mapping ]--- -0xc0000000-0xc0100000 1M RW GLB x pte +0xc0000000-0xc0100000 1M RW GLB NX pte -0xc0100000-0xc03d7000 2908K ro GLB x pte +0xc0100000-0xc0318000 2144K ro GLB x pte +0xc0318000-0xc03d7000 764K ro GLB NX pte -0xc03d7000-0xc0600000 2212K RW GLB x pte +0xc03d7000-0xc0600000 2212K RW GLB NX pte 0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd 0xf7a00000-0xf7bfe000 2040K RW GLB NX pte 0xf7bfe000-0xf7c00000 8K pte The patch has been originally developed for Linux 2.6.34-rc2 x86 by Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>. -v1: initial patch for 2.6.30 -v2: patch for 2.6.31-rc7 -v3: moved all code into arch/x86, adjusted credits -v4: fixed ifdef, removed credits from CREDITS -v5: fixed an address calculation bug in mark_nxdata_nx() -v6: added acked-by and PT dump diff to commit log -v7: minor adjustments for -tip -v8: rework with the merge of "Set first MB as RW+NX" Signed-off-by: NSiarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: NXuxian Jiang <jiang@cs.ncsu.edu> Signed-off-by: NMatthieu CASTET <castet.matthieu@free.fr> Cc: Arjan van de Ven <arjan@infradead.org> Cc: James Morris <jmorris@namei.org> Cc: Andi Kleen <ak@muc.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CE2F82E.60601@free.fr> [ minor cleanliness edits ] Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 matthieu castet 提交于
This patch fixes a bug in try_preserve_large_page() which may result in improper large page preservation and improper application of page attributes to the memory area outside of the original change request. More specifically, the problem manifests itself when set_memory_*() is called for several pages at the beginning of the large page and try_preserve_large_page() erroneously concludes that the change can be applied to whole large page. The fix consists of 3 parts: 1. Addition of "required" protection attributes in static_protections(), so .data and .bss can be guaranteed to stay "RW" 2. static_protections() is now called for every small page within large page to determine compatibility of new protection attributes (instead of just small pages within the requested range). 3. Large page can be preserved only if attribute change is large-page-aligned and covers whole large page. -v1: Try_preserve_large_page() patch for Linux 2.6.34-rc2 -v2: Replaced pfn check with address check for kernel rw-data Signed-off-by: NSiarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: NXuxian Jiang <jiang@cs.ncsu.edu> Reviewed-by: NSuresh Siddha <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: James Morris <jmorris@namei.org> Cc: Andi Kleen <ak@muc.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CE2F7F3.8030809@free.fr> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 06 4月, 2010 1 次提交
-
-
由 Pauli Nieminen 提交于
Setting single memory pages at a time to wc takes a lot time in cache flush. To reduce number of cache flush set_pages_array_wc and set_memory_array_wc can be used to set multiple pages to WC with single cache flush. This improves allocation performance for wc cached pages in drm/ttm. CC: Suresh Siddha <suresh.b.siddha@intel.com> CC: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com> Signed-off-by: NPauli Nieminen <suokkos@gmail.com> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
- 30 3月, 2010 1 次提交
-
-
由 Tejun Heo 提交于
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: NTejun Heo <tj@kernel.org> Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
-
- 23 2月, 2010 1 次提交
-
-
由 Suresh Siddha 提交于
We currently enforce the !RW mapping for the kernel mapping that maps holes between different text, rodata and data sections. However, kernel identity mappings will have different RWX permissions to the pages mapping to text and to the pages padding (which are freed) the text, rodata sections. Hence kernel identity mappings will be broken to smaller pages. For 64-bit, kernel text and kernel identity mappings are different, so we can enable protection checks that come with CONFIG_DEBUG_RODATA, as well as retain 2MB large page mappings for kernel text. Konrad reported a boot failure with the Linux Xen paravirt guest because of this. In this paravirt guest case, the kernel text mapping and the kernel identity mapping share the same page-table pages. Thus forcing the !RW mapping for some of the kernel mappings also cause the kernel identity mappings to be read-only resulting in the boot failure. Linux Xen paravirt guest also uses 4k mappings and don't use 2M mapping. Fix this issue and retain large page performance advantage for native kernels by not working hard and not enforcing !RW for the kernel text mapping, if the current mapping is already using small page mapping. Reported-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1266522700.2909.34.camel@sbs-t61.sc.intel.com> Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: stable@kernel.org [2.6.32, 2.6.33] Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 17 11月, 2009 1 次提交
-
-
由 H. Peter Anvin 提交于
Make set_memory_x/set_memory_nx directly aware of if NX is supported in the system or not, rather than requiring that every caller assesses that support independently. Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: Tim Starling <tstarling@wikimedia.org> Cc: Hannes Eder <hannes@hanneseder.net> LKML-Reference: <1258154897-6770-4-git-send-email-hpa@zytor.com> Acked-by: NKees Cook <kees.cook@canonical.com>
-
- 03 11月, 2009 2 次提交
-
-
由 Suresh Siddha 提交于
On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA. So use the kernel identity mapping instead of the kernel text mapping to modify the kernel text. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Tested-by: NSteven Rostedt <rostedt@goodmis.org> LKML-Reference: <20091029024821.080941108@sbs-t61.sc.intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Suresh Siddha 提交于
Steven Rostedt reported that we are unconditionally making the kernel text mapping as read-only. i.e., if someone does cpa() to the kernel text area for setting/clearing any page table attribute, we unconditionally clear the read-write attribute for the kernel text mapping that is set at compile time. We should delay (to forbid the write attribute) and enforce only after the kernel has mapped the text as read-only. Reported-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Tested-by: NSteven Rostedt <rostedt@goodmis.org> LKML-Reference: <20091029024820.996634347@sbs-t61.sc.intel.com> [ marked kernel_set_to_readonly as __read_mostly ] Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 28 10月, 2009 1 次提交
-
-
由 Steven Rostedt 提交于
The commit 74e08179 x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA prevents text sections from becoming read/write using set_memory_rw. The dynamic ftrace changes all text pages to read/write just before converting the calls to tracing to nops, and vice versa. I orginally just added a flag to allow this transaction when ftrace did the change, but I also found that when the CPA testing was running it would remove the read/write as well, and ftrace does not do the text conversion on boot up, and the CPA changes caused the dynamic tracer to fail on self tests. The current solution I have is to simply not to prevent change_page_attr from setting the RW bit for kernel text pages. Reported-by: NIngo Molnar <mingo@elte.hu> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 20 10月, 2009 1 次提交
-
-
由 Suresh Siddha 提交于
CONFIG_DEBUG_RODATA chops the large pages spanning boundaries of kernel text/rodata/data to small 4KB pages as they are mapped with different attributes (text as RO, RODATA as RO and NX etc). On x86_64, preserve the large page mappings for kernel text/rodata/data boundaries when CONFIG_DEBUG_RODATA is enabled. This is done by allowing the RODATA section to be hugepage aligned and having same RWX attributes for the 2MB page boundaries Extra Memory pages padding the sections will be freed during the end of the boot and the kernel identity mappings will have different RWX permissions compared to the kernel text mappings. Kernel identity mappings to these physical pages will be mapped with smaller pages but large page mappings are still retained for kernel text,rodata,data mappings. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20091014220254.190119924@sbs-t61.sc.intel.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 12 9月, 2009 1 次提交
-
-
由 Eric Anholt 提交于
Ever since we enabled GEM, the pre-9xx chipsets (particularly 865) have had serious stability issues. Back in May a wbinvd was added to the DRM to work around much of the problem. Some failure remained -- easily visible by dragging a window around on an X -retro desktop, or by looking at bugzilla. The chipset flush was on the right track -- hitting the right amount of memory, and it appears to be the only way to flush on these chipsets, but the flush page was mapped uncached. As a result, the writes trying to clear the writeback cache ended up bypassing the cache, and not flushing anything! The wbinvd would flush out other writeback data and often cause the data we wanted to get flushed, but not always. By removing the setting of the page to UC and instead just clflushing the data we write to try to flush it, we get the desired behavior with no wbinvd. This exports clflush_cache_range(), which was laying around and happened to basically match the code I was otherwise going to copy from the DRM. Signed-off-by: NEric Anholt <eric@anholt.net> Signed-off-by: NBrice Goglin <Brice.Goglin@ens-lyon.org> Cc: stable@kernel.org
-
- 10 9月, 2009 1 次提交
-
-
由 Jack Steiner 提交于
Fix address passed to cpa_flush_range() when changing page attributes from WB to UC. The address (*addr) is modified by __change_page_attr_set_clr(). The result is that the pages being flushed start at the _end_ of the changed range instead of the beginning. This should be considered for 2.6.30-stable and 2.6.31-stable. Signed-off-by: NJack Steiner <steiner@sgi.com> Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Cc: Stable team <stable@kernel.org>
-
- 14 8月, 2009 1 次提交
-
-
由 Tejun Heo 提交于
With x86 converted to embedding allocator, lpage doesn't have any user left. Kill it along with cpa handling code. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com>
-
- 04 8月, 2009 1 次提交
-
-
由 Thomas Hellstrom 提交于
The code was incorrectly reserving memtypes using the page virtual address instead of the physical address. Furthermore, the code was not ignoring highmem pages as it ought to. ( upstream does not pass in highmem pages yet - but upcoming graphics code will do it and there's no reason to not handle this properly in the CPA APIs.) Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=13884Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com> Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com> Cc: <stable@kernel.org> Cc: dri-devel@lists.sourceforge.net Cc: venkatesh.pallipadi@intel.com LKML-Reference: <1249284345-7654-1-git-send-email-thellstrom@vmware.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 31 7月, 2009 1 次提交
-
-
由 Pallipadi, Venkatesh 提交于
Changeset 3869c4aa that went in after 2.6.30-rc1 was a seemingly small change to _set_memory_wc() to make it complaint with SDM requirements. But, introduced a nasty bug, which can result in crash and/or strange corruptions when set_memory_wc is used. One such crash reported here http://lkml.org/lkml/2009/7/30/94 Actually, that changeset introduced two bugs. * change_page_attr_set() takes &addr as first argument and can the addr value might have changed on return, even for single page change_page_attr_set() call. That will make the second change_page_attr_set() in this routine operate on unrelated addr, that can eventually cause strange corruptions and bad page state crash. * The second change_page_attr_set() call, before setting _PAGE_CACHE_WC, should clear the earlier _PAGE_CACHE_UC_MINUS, as otherwise cache attribute will not be WC (will be UC instead). The patch below fixes both these problems. Sending a single patch to fix both the problems, as the change is to the same line of code. The change to have a addr_copy is not very clean. But, it is simpler than making more changes through various routines in pageattr.c. A huge thanks to Jerome for reporting this problem and providing a simple test case that helped us root cause the problem. Reported-by: NJerome Glisse <glisse@freedesktop.org> Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20090730214319.GA1889@linux-os.sc.intel.com> Acked-by: NDave Airlie <airlied@redhat.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 04 7月, 2009 1 次提交
-
-
由 Tejun Heo 提交于
Generalize and move x86 setup_pcpu_lpage() into pcpu_lpage_first_chunk(). setup_pcpu_lpage() now is a simple wrapper around the generalized version. Other than taking size parameters and using arch supplied callbacks to allocate/free/map memory, pcpu_lpage_first_chunk() is identical to the original implementation. This simplifies arch code and will help converting more archs to dynamic percpu allocator. While at it, factor out pcpu_calc_fc_sizes() which is common to pcpu_embed_first_chunk() and pcpu_lpage_first_chunk(). [ Impact: code reorganization and generalization ] Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
-
- 22 6月, 2009 2 次提交
-
-
由 Tejun Heo 提交于
lpage allocator aliases a PMD page for each cpu and returns whatever is unused to the page allocator. When the pageattr of the recycled pages are changed, this makes the two aliases point to the overlapping regions with different attributes which isn't allowed and known to cause subtle data corruption in certain cases. This can be handled in simliar manner to the x86_64 highmap alias. pageattr code should detect if the target pages have PMD alias and split the PMD alias and synchronize the attributes. pcpur allocator is updated to keep the allocated PMD pages map sorted in ascending address order and provide pcpu_lpage_remapped() function which binary searches the array to determine whether the given address is aliased and if so to which address. pageattr is updated to use pcpu_lpage_remapped() to detect the PMD alias and split it up as necessary from cpa_process_alias(). Jan Beulich spotted the original problem and incorrect usage of vaddr instead of laddr for lookup. With this, lpage percpu allocator should work correctly. Re-enable it. [ Impact: fix subtle lpage pageattr bug and re-enable lpage ] Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NJan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu>
-
由 Tejun Heo 提交于
Reorganize cpa_process_alias() so that new alias condition can be added easily. Jan Beulich spotted problem in the original cleanup thread which incorrectly assumed the two existing conditions were mutially exclusive. [ Impact: code reorganization ] Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu>
-
- 15 6月, 2009 1 次提交
-
-
由 Vegard Nossum 提交于
As these are allocated using the page allocator, we need to pass __GFP_NOTRACK before we add page allocator support to kmemcheck. Signed-off-by: NVegard Nossum <vegard.nossum@gmail.com>
-
- 27 5月, 2009 1 次提交
-
-
由 Pallipadi, Venkatesh 提交于
Cleanup cpa_flush_array() to avoid back to back on_each_cpu() calls. [ Impact: optimizes fix 0af48f42 ] Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 23 5月, 2009 2 次提交
-
-
cpa_flush_array seems to prefer wbinvd() over clflush at 4M threshold. clflush needs to be done on only one CPU as per instruction definition. wbinvd() however, should be done on all CPUs. [ Impact: fix missing flush which could cause data corruption ] Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
wbinvd is supported on all CPUs 486 or later. But, pageattr.c is checking x86_model >= 4 before wbinvd(), which looks like an oversight bug. It was first introduced at one place by changeset d7c8f21a and got copied over to second place in the same file later. [ Impact: fix missing cache flush on early-model CPUs, potential data corruption ] Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 10 4月, 2009 2 次提交
-
-
As per SDM, there should not be any aliasing of a WC with any cacheable type across CPUs. That is if one CPU is changing the identity map memtype to _WC, no other CPU at the time of this change should not have a TLB for this page that carries a WB attribute. SDM suggests to make the page not present. But for that we will have to handle any page faults that can potentially happen due to these pages being not present. Other way to deal with this without having any WB mapping is to change the page first to UC and then to WC. This ensures that we meet the SDM requirement of no cacheable alais to WC page. This also has same or lower overhead than marking the page not present and making it present later. Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20090409212708.797481000@intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
Handle faults and do proper cleanups in set_memory_*() functions. In some cases, these functions were not doing proper free on failure paths. With the changes to tracking memtype of RAM pages in struct page instead of pat list, we do not need the changes in commits c5e147. This patch reverts that change. Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20090409212708.653222000@intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-