- 08 Dec, 2014 1 commit
-
-
Committed by Rasmus Villemoes
seq_puts is a lot cheaper than seq_printf, so use that to print literal strings.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: http://lkml.kernel.org/r/1417208622-12264-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 19 Nov, 2014 2 commits
-
-
Committed by Dave Hansen
get_reg_offset() used to return the register contents themselves instead of the register offset. When it did that, it was an unsigned long. I changed it to return an integer _offset_ instead of the register. But, I neglected to change the return type of the function or the variables in which we store the result of the call. This fixes up the code to clear up the warnings from the smatch bot:

New smatch warnings:
arch/x86/mm/mpx.c:178 mpx_get_addr_ref() warn: unsigned 'addr_offset' is never less than zero.
arch/x86/mm/mpx.c:184 mpx_get_addr_ref() warn: unsigned 'base_offset' is never less than zero.
arch/x86/mm/mpx.c:188 mpx_get_addr_ref() warn: unsigned 'indx_offset' is never less than zero.
arch/x86/mm/mpx.c:196 mpx_get_addr_ref() warn: unsigned 'addr_offset' is never less than zero.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141118182343.C3E0C629@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
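The class of bug described in this commit can be reproduced in a few lines of userspace C. The sketch below is not the kernel code: `struct example_regs`, `example_reg_offset()` and the other `example_*` names are hypothetical stand-ins. It only illustrates why a function that reports errors as negative values needs a signed return type and signed storage at the call site.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pt_regs; only the layout matters here. */
struct example_regs {
    unsigned long ax, bx, cx, dx;
};

/*
 * Return the byte offset of register 'regno' inside struct example_regs,
 * or -EINVAL if 'regno' is out of range. The return type is a signed int
 * so that the error value stays negative.
 */
static int example_reg_offset(int regno)
{
    static const int offsets[] = {
        (int)offsetof(struct example_regs, ax),
        (int)offsetof(struct example_regs, bx),
        (int)offsetof(struct example_regs, cx),
        (int)offsetof(struct example_regs, dx),
    };
    const int nr = (int)(sizeof(offsets) / sizeof(offsets[0]));

    if (regno < 0 || regno >= nr)
        return -EINVAL;
    return offsets[regno];
}

/* Correct caller: the result lives in an int, so "off < 0" can be true. */
static int example_lookup(int regno, unsigned long *offset_out)
{
    int off = example_reg_offset(regno);

    assert(offset_out != NULL);
    if (off < 0)
        return off;
    *offset_out = (unsigned long)off;
    return 0;
}

/*
 * What happens when the result is stored in an unsigned long instead:
 * -EINVAL turns into a huge positive number, and a "< 0" check on it
 * can never fire.
 */
static unsigned long example_offset_as_unsigned(int regno)
{
    return (unsigned long)example_reg_offset(regno);
}
```

With the signed variant an out-of-range register number is reported as `-EINVAL`; with the unsigned variant the same call yields a very large "offset" that a later `< 0` test silently accepts, which is exactly what the smatch warnings point at.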
-
Committed by Kees Cook
When setting up permissions on kernel memory at boot, the end of the PMD that was split from bss remained executable. It should be NX like the rest. This performs a PMD alignment instead of a PAGE alignment to get the correct span of memory.

Before:
---[ High Kernel Mapping ]---
...
0xffffffff8202d000-0xffffffff82200000 1868K RW GLB NX pte
0xffffffff82200000-0xffffffff82c00000 10M RW PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82df5000 2004K RW GLB NX pte
0xffffffff82df5000-0xffffffff82e00000 44K RW GLB x pte
0xffffffff82e00000-0xffffffffc0000000 978M pmd

After:
---[ High Kernel Mapping ]---
...
0xffffffff8202d000-0xffffffff82200000 1868K RW GLB NX pte
0xffffffff82200000-0xffffffff82e00000 12M RW PSE GLB NX pmd
0xffffffff82e00000-0xffffffffc0000000 978M pmd

[ tglx: Changed it to roundup(_brk_end, PMD_SIZE) and added a comment. We really should unmap the remainder along with the holes caused by init, initdata etc., but that's a different issue ]

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20141114194737.GA3091@www.outflux.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
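The difference between the two alignments can be checked with plain integer arithmetic. The following is a userspace sketch, not the kernel's `roundup()` macro; the `EX_*` constants and function names are made up for illustration, and the sample address `0xffffffff82df5000` is taken from the "Before" dump above on the assumption that it is where `_brk_end` sat on that system.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SIZE UINT64_C(0x1000)    /* 4 KiB page */
#define EX_PMD_SIZE  UINT64_C(0x200000)  /* 2 MiB, the span of one x86-64 PMD entry */

/* Round 'addr' up to the next multiple of 'align' (a power of two). */
static uint64_t ex_round_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Number of bytes between the page-aligned end and the PMD-aligned end.
 * With page alignment only, this tail of the PMD is left out of the
 * NX range.
 */
static uint64_t ex_uncovered_tail(uint64_t brk_end)
{
    return ex_round_up(brk_end, EX_PMD_SIZE) - ex_round_up(brk_end, EX_PAGE_SIZE);
}
```

For the sample address, rounding up to a page boundary leaves it unchanged, while rounding up to a PMD boundary yields `0xffffffff82e00000`; the difference is the 44K region that shows up as executable in the "Before" dump.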
-
- 18 Nov, 2014 4 commits
-
-
Committed by Dave Hansen
The previous patch allocates bounds tables on-demand. As noted in an earlier description, these can add up to *HUGE* amounts of memory. This has caused OOMs in practice when running tests.

This patch adds support for freeing bounds tables when they are no longer in use.

There are two types of mappings in play when unmapping tables:
1. The mapping with the actual data, which userspace is munmap()ing or brk()ing away, etc...
2. The mapping for the bounds table *backing* the data (is tagged with VM_MPX, see the patch "add MPX specific mmap interface").

If userspace uses the prctl() introduced earlier in this patchset to enable the management of bounds tables in kernel, when it unmaps the first type of mapping with the actual data, the kernel needs to free the mapping for the bounds table backing the data. This patch hooks in at the very end of do_unmap() to do so. We look at the addresses being unmapped and find the bounds directory entries and tables which cover those addresses. If an entire table is unused, we clear the associated directory entry and free the table.

Once we unmap the bounds table, we would have a bounds directory entry pointing at empty address space. That address space might now be allocated for some other (random) use, and the MPX hardware might now try to walk it as if it were a bounds table. That would be bad. So any unmapping of an entire bounds table has to be accompanied by a corresponding write to the bounds directory entry to invalidate it. That write to the bounds directory can fault, which causes the following problem:

Since we are doing the freeing from munmap() (and other paths like it), we hold mmap_sem for write. If we fault, the page fault handler will attempt to acquire mmap_sem for read and we will deadlock. To avoid the deadlock, we pagefault_disable() when touching the bounds directory entry and use a get_user_pages() to resolve the fault.

The unmapping of bounds tables happens under vm_munmap(). We also (indirectly) call vm_munmap() to _do_ the unmapping of the bounds tables. We avoid unbounded recursion by disallowing freeing of bounds tables *for* bounds tables. This would not occur normally, so should not have any practical impact. Being strict about it here helps ensure that we do not have an exploitable stack overflow.

Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151831.E4531C4A@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Dave Hansen
This is really the meat of the MPX patch set. If there is one patch to review in the entire series, this is the one. There is a new ABI here and this kernel code also interacts with userspace memory in a relatively unusual manner. (small FAQ below).

Long Description:

This patch adds two prctl() commands to enable or disable the management of bounds tables in kernel, including on-demand kernel allocation (See the patch "on-demand kernel allocation of bounds tables") and cleanup (See the patch "cleanup unused bound tables"). Applications do not strictly need the kernel to manage bounds tables and we expect some applications to use MPX without taking advantage of this kernel support. This means the kernel can not simply infer whether an application needs bounds table management from the MPX registers. The prctl() is an explicit signal from userspace.

PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to require kernel's help in managing bounds tables. PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace doesn't want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel won't allocate and free bounds tables even if the CPU supports MPX.

PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds directory out of a userspace register (bndcfgu) and then cache it into a new field (->bd_addr) in the 'mm_struct'. PR_MPX_DISABLE_MANAGEMENT will set "bd_addr" to an invalid address. Using this scheme, we can use "bd_addr" to determine whether the management of bounds tables in kernel is enabled.

Also, the only way to access that bndcfgu register is via an xsaves, which can be expensive. Caching "bd_addr" like this also helps reduce the cost of those xsaves when doing table cleanup at munmap() time. Unfortunately, we can not apply this optimization to #BR fault time because we need an xsave to get the value of BNDSTATUS.

==== Why does the hardware even have these Bounds Tables? ====

MPX only has 4 hardware registers for storing bounds information. If MPX-enabled code needs more than these 4 registers, it needs to spill them somewhere. It has two special instructions for this which allow the bounds to be moved between the bounds registers and some new "bounds tables".

#BR exceptions are similar conceptually to a page fault and are raised by the MPX hardware both on bounds violations and when the tables are not present. This patch handles those #BR exceptions for not-present tables by carving the space out of the normal process's address space (essentially calling the new mmap() interface introduced earlier in this patch set) and then pointing the bounds-directory over to it.

The tables *need* to be accessed and controlled by userspace because the instructions for moving bounds in and out of them are extremely frequent. They potentially happen every time a register pointing to memory is dereferenced. Any direct kernel involvement (like a syscall) to access the tables would obviously destroy performance.

==== Why not do this in userspace? ====

This patch is obviously doing this allocation in the kernel. However, MPX does not strictly *require* anything in the kernel. It can theoretically be done completely from userspace. Here are a few ways this *could* be done. I don't think any of them are practical in the real world, but here they are.

Q: Can virtual space simply be reserved for the bounds tables so that we never have to allocate them?
A: As noted earlier, these tables are *HUGE*. An X-GB virtual area needs 4*X GB of virtual space, plus 2GB for the bounds directory. If we were to preallocate them for the 128TB of user virtual address space, we would need to reserve 512TB+2GB, which is larger than the entire virtual address space today. This means they can not be reserved ahead of time. Also, a single process's pre-populated bounds directory consumes 2GB of virtual *AND* physical memory. IOW, it's completely infeasible to prepopulate bounds directories.

Q: Can we preallocate bounds table space at the same time memory is allocated which might contain pointers that might eventually need bounds tables?
A: This would work if we could hook the site of each and every memory allocation syscall. This can be done for small, constrained applications. But, it isn't practical at a larger scale since a given app has no way of controlling how all the parts of the app might allocate memory (think libraries). The kernel is really the only place to intercept these calls.

Q: Could a bounds fault be handed to userspace and the tables allocated there in a signal handler instead of in the kernel?
A: (thanks to tglx) mmap() is not on the list of safe async handler functions and even if mmap() would work it still requires locking or nasty tricks to keep track of the allocation state there.

Having ruled out all of the userspace-only approaches for managing bounds tables that we could think of, we create them on demand in the kernel.

Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
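The reservation figures quoted in the first FAQ answer follow directly from the stated ratio (4 bytes of bounds table per byte covered, plus a fixed 2GB directory). A small sketch of that arithmetic, with hypothetical `EX_*` names and helper functions that exist only for this illustration:

```c
#include <assert.h>
#include <stdint.h>

#define EX_GIB (UINT64_C(1) << 30)
#define EX_TIB (UINT64_C(1) << 40)

/* Size of the bounds directory, as stated in the commit message. */
#define EX_BOUNDS_DIR_BYTES (2 * EX_GIB)

/* Bounds-table space needed to cover 'covered' bytes of address space. */
static uint64_t ex_bounds_table_bytes(uint64_t covered)
{
    assert(covered <= UINT64_MAX / 4);  /* keep the multiplication in range */
    return 4 * covered;
}

/* Total virtual space to reserve up front: tables plus the directory. */
static uint64_t ex_mpx_reservation_bytes(uint64_t covered)
{
    return ex_bounds_table_bytes(covered) + EX_BOUNDS_DIR_BYTES;
}
```

Calling `ex_mpx_reservation_bytes(128 * EX_TIB)` gives 512TB + 2GB, which exceeds even a full 48-bit (256TB) virtual address space, so preallocation cannot fit.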
-
Committed by Dave Hansen
This patch sets bound violation fields of siginfo struct in #BR exception handler by decoding the user instruction and constructing the faulting pointer.

We have to be very careful when decoding these instructions. They are completely controlled by userspace and may be changed at any time up to and including the point where we try to copy them in to the kernel. They may or may not be MPX instructions and could be completely invalid for all we know.

Note: This code is based on Qiaowei Ren's specialized MPX decoder, but uses the generic decoder whenever possible. It was tested for robustness by generating a completely random data stream and trying to decode that stream. I also unmapped random pages inside the stream to test the "partial instruction" short read code.

We kzalloc() the siginfo instead of stack allocating it because we need to memset() it anyway, and doing this makes it much more clear when it got initialized by the MPX instruction decoder.

Changes from the old decoder:
* Use the generic decoder instead of custom functions. Saved ~70 lines of code overall.
* Remove insn->addr_bytes code (never used??)
* Make sure never to possibly overflow the regoff[] array, plus check the register range correctly in 32 and 64-bit modes.
* Allow get_reg() to return an error and have mpx_get_addr_ref() handle when it sees errors.
* Only call insn_get_*() near where we actually use the values instead of trying to call them all at once.
* Handle short reads from copy_from_user() and check the actual number of read bytes against what we expect from insn_get_length(). If a read stops in the middle of an instruction, we error out.
* Actually check the opcodes instead of ignoring them.
* Dynamically kzalloc() siginfo_t so we don't leak any stack data.
* Detect and handle decoder failures instead of ignoring them.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151828.5BDD0915@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
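The "short read" rule from the change list above (error out if the copy from userspace stops in the middle of an instruction) boils down to comparing two lengths. The sketch below is a userspace illustration under stated assumptions: `example_check_insn_read()` is a hypothetical helper, not the function in arch/x86/mm/mpx.c, and the error codes chosen here are a guess at a reasonable convention rather than a record of what the kernel returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Architectural upper bound on the length of one x86 instruction. */
#define EXAMPLE_MAX_INSN_SIZE 15

/*
 * 'nr_copied' is how many instruction bytes were actually read from
 * userspace; 'insn_len' is the length the decoder reports for the
 * instruction found in those bytes. Returns 0 if the whole instruction
 * was read, a negative errno value otherwise.
 */
static int example_check_insn_read(size_t nr_copied, size_t insn_len)
{
    /* The caller never asks for more than one maximal instruction. */
    assert(nr_copied <= EXAMPLE_MAX_INSN_SIZE);

    if (nr_copied == 0)
        return -EFAULT;   /* nothing readable at the faulting address */
    if (insn_len == 0 || insn_len > EXAMPLE_MAX_INSN_SIZE)
        return -EINVAL;   /* decoder failure or bogus length */
    if (nr_copied < insn_len)
        return -EFAULT;   /* read stopped in the middle of the instruction */
    return 0;
}
```

The important property is that a decoded length is never trusted beyond the number of bytes that were really copied, which matters when the page following the instruction start is unmapped.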
-
Committed by Qiaowei Ren
We have chosen to perform the allocation of bounds tables in kernel (See the patch "on-demand kernel allocation of bounds tables") and to mark these VMAs with VM_MPX.

However, there is currently no suitable interface to actually do this. Existing interfaces, like do_mmap_pgoff(), have no way to set a modified ->vm_ops or ->vm_flags and don't hold mmap_sem long enough to let a caller do it.

This patch wraps mmap_region() and holds mmap_sem long enough to make the modifications to the VMA which we need.

Also note the 32/64-bit #ifdef in the header. We actually need to do this at runtime eventually. But, for now, we don't support running 32-bit binaries on 64-bit kernels. Support for this will come in later patches.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151827.CE440F67@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 17 Nov, 2014 1 commit
-
-
Committed by Thomas Gleixner
Commit e00c8cc9 "x86: Use new cache mode type in memtype related functions" broke the ARCH=um build.

arch/x86/include/asm/cacheflush.h:67:36: error: return type is an incomplete type
static inline enum page_cache_mode get_page_memtype(struct page *pg)

The reason is simple. get_page_memtype() and set_page_memtype() require enum page_cache_mode now, which is defined in asm/pgtable_types.h. UM does not include that file for obvious reasons.

The simple solution is to move those functions to arch/x86/mm/pat.c where the only callsites of this are located. They should have been there in the first place.

Fixes: e00c8cc9 "x86: Use new cache mode type in memtype related functions"
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Weinberger <richard@nod.at>
-
- 16 Nov, 2014 12 commits
-
-
Committed by Juergen Gross
Update the translation tables from cache mode to pgprot values according to the PAT settings. This enables changing the cache attributes of a PAT index in just one place without having to change anything on the users' side.

With this change it is possible to use the same kernel with different PAT configurations, e.g. supporting Xen.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-18-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Juergen Gross
The PAT bit in the ptes is not moved to the correct position when copying page protection attributes between entries of different sized pages. Translate the ptes according to their page size.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-17-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
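On x86 the PAT bit sits at bit 7 in a 4k pte, while in a large-page entry bit 7 is the page-size bit and PAT moves to bit 12. The translation this commit refers to is therefore a bit move. The following is a userspace sketch of that idea; the `ex_*` names are hypothetical and the functions operate on a bare 64-bit protection value rather than the kernel's pgprot_t.

```c
#include <assert.h>
#include <stdint.h>

#define EX_BIT_PAT        7   /* PAT bit position in a 4k pte */
#define EX_BIT_PAT_LARGE 12   /* PAT bit position in a large-page entry */
#define EX_PAT        (UINT64_C(1) << EX_BIT_PAT)
#define EX_PAT_LARGE  (UINT64_C(1) << EX_BIT_PAT_LARGE)

/* Convert 4k-style protection bits to the large-page layout. */
static uint64_t ex_prot_4k_to_large(uint64_t prot)
{
    /* Bit 12 belongs to the address in a 4k pte, so it must not be set. */
    assert((prot & EX_PAT_LARGE) == 0);

    uint64_t rest = prot & ~(EX_PAT | EX_PAT_LARGE);
    return rest | ((prot & EX_PAT) << (EX_BIT_PAT_LARGE - EX_BIT_PAT));
}

/* Convert large-page-style protection bits to the 4k layout. */
static uint64_t ex_prot_large_to_4k(uint64_t prot)
{
    /* Bit 7 (page size in a large entry) is dropped, bit 12 moves down. */
    uint64_t rest = prot & ~(EX_PAT | EX_PAT_LARGE);
    return rest | ((prot & EX_PAT_LARGE) >> (EX_BIT_PAT_LARGE - EX_BIT_PAT));
}
```

Copying protection bits between entries of different sizes without this move would either lose the PAT bit or accidentally set the page-size bit.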
-
Committed by Juergen Gross
Dumping page table protection bits is not correct for entries on levels 2 and 3 regarding the PAT bit, which is at a different position than on level 4.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-16-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Juergen Gross
Instead of directly using the cache mode bits in the pte, switch to using the cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-14-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Juergen Gross
Instead of directly using the cache mode bits in the pte, switch to using the cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-13-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Juergen Gross
Instead of directly using the cache mode bits in the pte, switch to using the cache mode type in the functions for modifying page attributes.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-12-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Juergen Gross
When modifying page attributes via change_page_attr_set_clr(), don't test for setting _PAGE_PAT_LARGE, as this is
- never done
- PAT support for large pages is not included in the kernel up to now

Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-11-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Juergen Gross
Instead of directly using the cache mode bits in the pte, switch to using the cache mode type. As those are the main callers of lookup_memtype(), change this as well.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-10-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Juergen Gross
Instead of directly using the cache mode bits in the pte, switch to using the cache mode type. This requires changing io_reserve_memtype() as well.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-9-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Juergen Gross
Instead of directly using the cache mode bits in the pte, switch to using the cache mode type. This requires changing some callers of is_new_memtype_allowed() as well.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-8-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Juergen Gross
Instead of directly using the cache mode bits in the pte, switch to using the cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-7-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Juergen Gross
At the moment there are a lot of places that handle setting or getting the page cache mode by treating the pgprot bits as equal to the cache mode. This is only true because there are a lot of assumptions about the setup of the PAT MSR. Otherwise the cache type needs to get translated into pgprot bits and vice versa.

This patch tries to prepare for that by introducing a separate type for the cache mode and adding functions to translate between those and pgprot values.

To avoid too much performance penalty the translation between cache mode and pgprot values is done via tables which contain the relevant information. Write-back cache mode is hard-wired to be 0, all other modes are configurable via those tables. For large pages there are translation functions as the PAT bit is located at different positions in the ptes of 4k and large pages.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-2-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
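The table-driven translation described here can be sketched as two small lookup tables: one from an abstract cache mode to pte bits, and one from the 3-bit (PAT, PCD, PWT) index back to a cache mode. Everything below is illustrative: the `ex_*` names, the reduced set of four modes, and the table contents are assumptions made for this example, not the tables the patch adds. Only the bit positions (PWT at bit 3, PCD at bit 4, PAT at bit 7 in a 4k pte) are architectural.

```c
#include <assert.h>
#include <stdint.h>

/* Abstract cache modes; write-back is hard-wired to 0 as in the commit text. */
enum ex_cache_mode {
    EX_MODE_WB = 0,
    EX_MODE_WC,
    EX_MODE_UC_MINUS,
    EX_MODE_UC,
    EX_MODE_NUM
};

#define EX_PWT (1u << 3)
#define EX_PCD (1u << 4)
#define EX_PAT (1u << 7)

/* Forward table: cache mode -> pte cache bits (illustrative contents). */
static const uint16_t ex_mode_to_bits[EX_MODE_NUM] = {
    [EX_MODE_WB]       = 0,
    [EX_MODE_WC]       = EX_PWT,
    [EX_MODE_UC_MINUS] = EX_PCD,
    [EX_MODE_UC]       = EX_PCD | EX_PWT,
};

/* Reverse table, indexed by (PAT << 2) | (PCD << 1) | PWT. */
static const uint8_t ex_index_to_mode[8] = {
    EX_MODE_WB, EX_MODE_WC, EX_MODE_UC_MINUS, EX_MODE_UC,
    EX_MODE_WB, EX_MODE_WC, EX_MODE_UC_MINUS, EX_MODE_UC,
};

static uint16_t ex_cachemode_to_bits(enum ex_cache_mode mode)
{
    assert((int)mode >= 0 && (int)mode < EX_MODE_NUM);
    return ex_mode_to_bits[mode];
}

static enum ex_cache_mode ex_bits_to_cachemode(uint16_t bits)
{
    /* Squeeze PWT (bit 3), PCD (bit 4) and PAT (bit 7) into a 3-bit index. */
    unsigned idx = ((bits & (EX_PWT | EX_PCD)) >> 3) | ((bits & EX_PAT) >> 5);
    return (enum ex_cache_mode)ex_index_to_mode[idx];
}
```

Because callers only ever see `enum ex_cache_mode`, a different PAT MSR layout would be accommodated by rewriting the two tables at boot, with no changes at the call sites. That is the point of introducing a separate type.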
-
- 12 Nov, 2014 1 commit
-
-
Committed by Mathias Krause
In commit 3891a04a ("x86-64, espfix: Don't leak bits 31:16 of %esp returning..") the "ESPFix Area" was added to the page table dump special sections. That area, though, has a limited number of entries printed.

The EFI runtime services are, unfortunately, located in-between the espfix area and the high kernel memory mapping. Due to the enforced limitation for the espfix area, the EFI mappings won't be printed in the page table dump.

To make the EFI runtime service mappings visible again, provide them a dedicated entry.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
-
- 10 Nov, 2014 1 commit
-
-
Committed by Thierry Reding
The xlate_dev_{kmem,mem}_ptr() functions take either a physical address or a kernel virtual address, so data types should be phys_addr_t and void *. They both return a kernel virtual address which is only ever used in calls to copy_{from,to}_user(), so make variables that store it void * rather than char * for consistency.

Also only define a weak unxlate_dev_mem_ptr() function if architectures haven't overridden it in the asm/io.h header file.

Signed-off-by: Thierry Reding <treding@nvidia.com>
-
- 05 Nov, 2014 1 commit
-
-
Committed by Daniel J Blueman
On large-memory x86-64 systems of 64GB or more with memory hot-plug enabled, use a 2GB memory block size. E.g., with 64GB of memory, this reduces the number of directories in /sys/devices/system/memory from 512 to 32, making it more manageable, and reducing the creation time accordingly.

The caveat is that the memory can't be offlined (for hotplug or otherwise) with the finer default 128MB granularity, but this is unimportant due to the high memory densities generally used with such large-memory systems, where e.g. a single DIMM is of the order of 16GB.

Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Cc: Steffen Persvold <sp@numascale.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1415089784-28779-4-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
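The directory counts quoted in this commit are simply total memory divided by block size. A short sketch of that arithmetic, where the `ex_*` helpers are hypothetical and the 64GB threshold test is a simplification of whatever condition the kernel actually evaluates:

```c
#include <assert.h>
#include <stdint.h>

#define EX_MIB (UINT64_C(1) << 20)
#define EX_GIB (UINT64_C(1) << 30)

/* Pick a memory block size following the rule described in the commit. */
static uint64_t ex_memory_block_size(uint64_t total_bytes)
{
    return total_bytes >= 64 * EX_GIB ? 2 * EX_GIB : 128 * EX_MIB;
}

/* Number of memory block directories needed to cover 'total_bytes'. */
static uint64_t ex_memory_block_count(uint64_t total_bytes, uint64_t block_bytes)
{
    assert(block_bytes != 0);
    return (total_bytes + block_bytes - 1) / block_bytes;
}
```

For 64GB this gives 512 blocks at 128MB granularity and 32 blocks at 2GB granularity, matching the numbers in the message.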
-
- 29 Oct, 2014 1 commit
-
-
Committed by Dexuan Cui
pte_pfn() returns a PFN of long (32 bits in 32-PAE), so "long << PAGE_SHIFT" will overflow for PFNs above 4GB.

Due to this issue, some Linux 32-PAE distros, running as guests on Hyper-V, with 5GB memory assigned, can't load the netvsc driver successfully and hence the synthetic network device can't work (we can use the kernel parameter mem=3000M to work around the issue).

Cast pte_pfn() to phys_addr_t before shifting.

Fixes: "commit d7656534: x86, mm: Create slow_virt_to_phys()"
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: gregkh@linuxfoundation.org
Cc: linux-mm@kvack.org
Cc: olaf@aepfle.de
Cc: apw@canonical.com
Cc: jasowang@redhat.com
Cc: dave.hansen@intel.com
Cc: riel@redhat.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1414580017-27444-1-git-send-email-decui@microsoft.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
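The overflow is easy to demonstrate outside the kernel. In the sketch below a `uint32_t` stands in for the 32-bit `long` of a 32-bit PAE kernel (unsigned so that the wraparound is well defined in C), and the `ex_*` function names are hypothetical. The fix shown is the one the commit describes: widen first, then shift.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12

/* Shift in 32 bits first: physical address bits above bit 31 are lost. */
static uint64_t ex_pfn_to_phys_truncated(uint32_t pfn)
{
    uint32_t shifted = pfn << EX_PAGE_SHIFT;
    return shifted;
}

/* Widen to 64 bits before shifting, as a cast to phys_addr_t would. */
static uint64_t ex_pfn_to_phys(uint32_t pfn)
{
    uint64_t phys = (uint64_t)pfn << EX_PAGE_SHIFT;

    /* The widened shift is lossless: the PFN can be recovered. */
    assert((phys >> EX_PAGE_SHIFT) == pfn);
    return phys;
}
```

A page at the 5GB mark has PFN `0x140000`; the truncating variant maps it to `0x40000000` (1GB), while the widened variant yields the correct `0x140000000`.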
-
- 14 Oct, 2014 2 commits
-
-
Committed by Xishi Qiu
If all the nodes are marked hotpluggable, allocating node data will fail, because __next_mem_range_rev() skips the hotpluggable memory regions. numa_clear_kernel_node_hotplug() is only called after the node data has been allocated.

numa_init()
    ...
    ret = init_func();  // this will mark hotpluggable flag from SRAT
    ...
    memblock_set_bottom_up(false);
    ...
    ret = numa_register_memblks(&numa_meminfo);  // this will alloc node data(pglist_data)
    ...
    numa_clear_kernel_node_hotplug();  // in case all the nodes are hotpluggable
    ...

numa_register_memblks()
    setup_node_data()
        memblock_find_in_range_node()
            __memblock_find_range_top_down()
                for_each_mem_range_rev()
                    __next_mem_range_rev()

This patch moves numa_clear_kernel_node_hotplug() into numa_register_memblks() so that the kernel node hotpluggable flag is cleared before node data is allocated. Node data allocation then no longer fails even if all the nodes are hotpluggable.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Mike Travis
Use the optimized ioresource lookup, "region_is_ram", for the ioremap function. If the region is not found, it falls back to the "page_is_ram" function. If it is found and it is RAM, then the usual warning message is issued, and the ioremap operation is aborted. Otherwise, the ioremap operation continues.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Reviewed-by: Cliff Wickman <cpw@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 23 Sep, 2014 2 commits
-
-
Committed by David Vrabel
The _PAGE_IOMAP PTE flag was only used by Xen PV guests to mark PTEs that were used to map I/O regions that are 1:1 in the p2m. This allowed Xen to obtain the correct PFN when converting the MFNs read from a PTE back to their PFN.

Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn() returns the correct PFN by using a combination of the m2p and p2m to determine if an MFN corresponds to a 1:1 mapping in the p2m.

Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for future uses of the PTE flag.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
-
Committed by David Vrabel
If a fault on a kernel address is due to a non-present page, then it cannot be the result of a stale TLB entry from a protection change (RO to RW or NX to X). Thus the pagetable walk in spurious_fault() can be skipped.

See the initial if in spurious_fault() and the tests in spurious_fault_check() for the set of possible error codes checked for spurious faults. These are:

        IRUWP
Before  x00xx && ( 1xxxx || xxx1x )
After   ( 10001 || 00011 ) && ( 1xxxx || xxx1x )

Thus the new condition is a subset of the previous one, excluding only non-present faults (I == 1 and W == 1 are mutually exclusive).

This avoids spurious_fault() oopsing in some cases if the pagetables it attempts to walk are not accessible. Such an oops obscures the location of the original fault.

This also fixes a crash with Xen PV guests when they access entries in the M2P corresponding to device MMIO regions. The M2P is mapped (read-only) by Xen into the kernel address space of the guest and this mapping may contain holes for non-RAM regions. Read faults will result in calls to spurious_fault(), but because the page tables for the M2P mappings are not accessible by the guest the pagetable walk would fault.

This was not normally a problem as MMIO mappings would not normally result in an M2P lookup because of the use of the _PAGE_IOMAP bit in the PTE. However, removing the _PAGE_IOMAP bit requires M2P lookups for MMIO mappings as well.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
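The "new condition is a subset of the previous one" claim can be checked exhaustively over all 32 combinations of the five error-code bits (I, R, U, W, P from bit 4 down to bit 0). The sketch below encodes the Before and After conditions exactly as written in the message; the `ex_*` names are hypothetical and this is not the kernel's spurious_fault() code.

```c
#include <assert.h>

#define EX_PF_PROT  (1u << 0)  /* P: fault on a present page */
#define EX_PF_WRITE (1u << 1)  /* W: write access */
#define EX_PF_USER  (1u << 2)  /* U: user-mode access */
#define EX_PF_RSVD  (1u << 3)  /* R: reserved bit set in a paging entry */
#define EX_PF_INSTR (1u << 4)  /* I: instruction fetch */

/* Before: x00xx && ( 1xxxx || xxx1x ) */
static int ex_old_candidate(unsigned code)
{
    if (code & (EX_PF_RSVD | EX_PF_USER))
        return 0;
    return (code & (EX_PF_INSTR | EX_PF_WRITE)) != 0;
}

/* After: ( 10001 || 00011 ) && ( 1xxxx || xxx1x ) */
static int ex_new_candidate(unsigned code)
{
    if (code != (EX_PF_INSTR | EX_PF_PROT) && code != (EX_PF_WRITE | EX_PF_PROT))
        return 0;
    return (code & (EX_PF_INSTR | EX_PF_WRITE)) != 0;
}

/* Count how many of the 32 possible codes a predicate accepts. */
static unsigned ex_count(int (*pred)(unsigned))
{
    unsigned n = 0;

    assert(pred != 0);
    for (unsigned code = 0; code < 32; code++)
        n += pred(code) ? 1u : 0u;
    return n;
}

/* Returns 1 if every code accepted by the new test was accepted before. */
static int ex_new_is_subset_of_old(void)
{
    for (unsigned code = 0; code < 32; code++)
        if (ex_new_candidate(code) && !ex_old_candidate(code))
            return 0;
    return 1;
}
```

The old test accepts six codes and the new one accepts two. Three of the four dropped codes have P == 0, i.e. they are non-present faults; the fourth, 10011, needs I and W set together, which the message notes cannot happen.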
-
- 19 Sep, 2014 2 commits
-
-
Committed by Aaron Tomlin
This facility is used in a few places so let's introduce a helper function to improve code readability. Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: aneesh.kumar@linux.vnet.ibm.com Cc: dzickus@redhat.com Cc: bmr@redhat.com Cc: jcastillo@redhat.com Cc: oleg@redhat.com Cc: riel@redhat.com Cc: prarit@redhat.com Cc: jgh@redhat.com Cc: minchan@kernel.org Cc: mpe@ellerman.id.au Cc: tglx@linutronix.de Cc: hannes@cmpxchg.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1410527779-8133-3-git-send-email-atomlin@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Aaron Tomlin
Tasks get their end of stack set to STACK_END_MAGIC with the aim to catch stack overruns. Currently this feature does not apply to init_task. This patch removes this restriction. Note that a similar patch was posted by Prarit Bhargava some time ago but was never merged: http://marc.info/?l=linux-kernel&m=127144305403241&w=2 Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: aneesh.kumar@linux.vnet.ibm.com Cc: dzickus@redhat.com Cc: bmr@redhat.com Cc: jcastillo@redhat.com Cc: jgh@redhat.com Cc: minchan@kernel.org Cc: tglx@linutronix.de Cc: hannes@cmpxchg.org Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Daeseok Youn <daeseok.youn@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
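The idea of an end-of-stack magic value can be illustrated in user space. This is only a sketch under stated assumptions: `struct toy_task` and every `toy_*` function are hypothetical stand-ins, not the kernel's data structures or helpers; only the `STACK_END_MAGIC` value is taken from the kernel's magic number header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_END_MAGIC 0x57AC6E9DUL /* value from the kernel's magic.h */
#define TOY_STACK_WORDS 64           /* hypothetical toy stack size */

/* Hypothetical stand-in for a task stack that grows down toward words[0]. */
struct toy_task {
    unsigned long words[TOY_STACK_WORDS];
};

/* The last word a downward-growing stack may legitimately reach. */
static unsigned long *toy_end_of_stack(struct toy_task *t)
{
    return &t->words[0];
}

/* Plant the magic value; per the commit, this now also applies to init_task. */
static void toy_set_stack_end_magic(struct toy_task *t)
{
    *toy_end_of_stack(t) = STACK_END_MAGIC;
}

/* The kind of readability helper the companion commit describes. */
static bool toy_stack_end_corrupted(struct toy_task *t)
{
    return *toy_end_of_stack(t) != STACK_END_MAGIC;
}

/* Simulate stack usage: overwrite `depth` words starting from the top. */
static void toy_use_stack(struct toy_task *t, size_t depth)
{
    assert(depth <= TOY_STACK_WORDS);
    for (size_t i = 0; i < depth; i++)
        t->words[TOY_STACK_WORDS - 1 - i] = 0;
}
```

Using 63 of the 64 words leaves the magic word intact, while using all 64 overwrites it, so the check reports corruption only once the stack has been fully consumed.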
-
- 16 Sep, 2014 3 commits
-
-
Committed by Luiz Capitulino
The setup_node_data() function allocates a pg_data_t object, inserts it into the node_data[] array and initializes the following fields: node_id, node_start_pfn and node_spanned_pages. However, a few function calls later during the kernel boot, free_area_init_node() re-initializes those fields, possibly with different values. This means that the initialization done by setup_node_data() is not used. This causes a small glitch when running Linux as a hyperv NUMA guest:

    SRAT: PXM 0 -> APIC 0x00 -> Node 0
    SRAT: PXM 0 -> APIC 0x01 -> Node 0
    SRAT: PXM 1 -> APIC 0x02 -> Node 1
    SRAT: PXM 1 -> APIC 0x03 -> Node 1
    SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
    SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff]
    SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff]
    NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff]
    Initmem setup node 0 [mem 0x00000000-0x7fffffff]
      NODE_DATA [mem 0x7ffdc000-0x7ffeffff]
    Initmem setup node 1 [mem 0x80800000-0x1081fffff]
      NODE_DATA [mem 0x1081ea000-0x1081fdfff]
    crashkernel: memory value expected
     [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0
     [ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1
    Zone ranges:
      DMA [mem 0x00001000-0x00ffffff]
      DMA32 [mem 0x01000000-0xffffffff]
      Normal [mem 0x100000000-0x1081fffff]
    Movable zone start for each node
    Early memory node ranges
      node 0: [mem 0x00001000-0x0009efff]
      node 0: [mem 0x00100000-0x7ffeffff]
      node 1: [mem 0x80200000-0xf7ffffff]
      node 1: [mem 0x100000000-0x1081fffff]
    On node 0 totalpages: 524174
      DMA zone: 64 pages used for memmap
      DMA zone: 21 pages reserved
      DMA zone: 3998 pages, LIFO batch:0
      DMA32 zone: 8128 pages used for memmap
      DMA32 zone: 520176 pages, LIFO batch:31
    On node 1 totalpages: 524288
      DMA32 zone: 7672 pages used for memmap
      DMA32 zone: 491008 pages, LIFO batch:31
      Normal zone: 520 pages used for memmap
      Normal zone: 33280 pages, LIFO batch:7

In this dmesg, the SRAT table reports that the memory range for node 1 starts at 0x80200000. However, the line starting with "Initmem" reports that node 1 memory range starts at 0x80800000. The "Initmem" line is reported by setup_node_data() and is wrong, because the kernel ends up using the range as reported in the SRAT table. This commit drops all that dead code from setup_node_data(), renames it to alloc_node_data() and adds a printk() to free_area_init_node() so that we report a node's memory range accurately. Here's the same dmesg section with this patch applied:

    SRAT: PXM 0 -> APIC 0x00 -> Node 0
    SRAT: PXM 0 -> APIC 0x01 -> Node 0
    SRAT: PXM 1 -> APIC 0x02 -> Node 1
    SRAT: PXM 1 -> APIC 0x03 -> Node 1
    SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
    SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff]
    SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff]
    NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff]
    NODE_DATA(0) allocated [mem 0x7ffdc000-0x7ffeffff]
    NODE_DATA(1) allocated [mem 0x1081ea000-0x1081fdfff]
    crashkernel: memory value expected
     [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0
     [ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1
    Zone ranges:
      DMA [mem 0x00001000-0x00ffffff]
      DMA32 [mem 0x01000000-0xffffffff]
      Normal [mem 0x100000000-0x1081fffff]
    Movable zone start for each node
    Early memory node ranges
      node 0: [mem 0x00001000-0x0009efff]
      node 0: [mem 0x00100000-0x7ffeffff]
      node 1: [mem 0x80200000-0xf7ffffff]
      node 1: [mem 0x100000000-0x1081fffff]
    Initmem setup node 0 [mem 0x00001000-0x7ffeffff]
    On node 0 totalpages: 524174
      DMA zone: 64 pages used for memmap
      DMA zone: 21 pages reserved
      DMA zone: 3998 pages, LIFO batch:0
      DMA32 zone: 8128 pages used for memmap
      DMA32 zone: 520176 pages, LIFO batch:31
    Initmem setup node 1 [mem 0x80200000-0x1081fffff]
    On node 1 totalpages: 524288
      DMA32 zone: 7672 pages used for memmap
      DMA32 zone: 491008 pages, LIFO batch:31
      Normal zone: 520 pages used for memmap
      Normal zone: 33280 pages, LIFO batch:7

This commit was tested on a two node bare-metal NUMA machine and Linux as a NUMA guest on hyperv and qemu/kvm. PS: The wrong memory range reported by setup_node_data() seems to be harmless in the current kernel because it's just not used. However, that bad range is used in kernel 2.6.32 to initialize the old boot memory allocator, which causes a crash during boot. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Yasuaki Ishimatsu
When hot-adding/removing memory, sync_global_pgds() is called to synchronize the PGD to the PGD entries of all processes' MM. But when hot-removing memory, sync_global_pgds() does not work correctly. First, sync_global_pgds() checks whether the target PGD is none, and if it is, the PGD is skipped. But when hot-removing memory, the PGD may be none since it may have been cleared by free_pud_table(). So when sync_global_pgds() is called after hot-removing memory, it should not skip a PGD even if the PGD is none, and it must clear the PGD entries of all processes' MM. Currently sync_global_pgds() does not clear the PGD entries of all processes' MM when hot-removing memory. So when hot-adding memory in the same range as previously removed memory, the following call trace is shown:

    kernel BUG at arch/x86/mm/init_64.c:206!
    ...
    [<ffffffff815e0c80>] kernel_physical_mapping_init+0x1b2/0x1d2
    [<ffffffff815ced94>] init_memory_mapping+0x1d4/0x380
    [<ffffffff8104aebd>] arch_add_memory+0x3d/0xd0
    [<ffffffff815d03d9>] add_memory+0xb9/0x1b0
    [<ffffffff81352415>] acpi_memory_device_add+0x1af/0x28e
    [<ffffffff81325dc4>] acpi_bus_device_attach+0x8c/0xf0
    [<ffffffff813413b9>] acpi_ns_walk_namespace+0xc8/0x17f
    [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7
    [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7
    [<ffffffff813418ed>] acpi_walk_namespace+0x95/0xc5
    [<ffffffff81326b4c>] acpi_bus_scan+0x9a/0xc2
    [<ffffffff81326bff>] acpi_scan_bus_device_check+0x8b/0x12e
    [<ffffffff81326cb5>] acpi_scan_device_check+0x13/0x15
    [<ffffffff81320122>] acpi_os_execute_deferred+0x25/0x32
    [<ffffffff8107e02b>] process_one_work+0x17b/0x460
    [<ffffffff8107edfb>] worker_thread+0x11b/0x400
    [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400
    [<ffffffff81085aef>] kthread+0xcf/0xe0
    [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
    [<ffffffff815fc76c>] ret_from_fork+0x7c/0xb0
    [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140

This patch clears the PGD entries of all processes' MM when sync_global_pgds() is called after hot-removing memory. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
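The difference between skipping a none PGD and clearing its copies can be modeled with plain arrays. The sketch below is a user-space toy, not the kernel implementation: `toy_pgd_t`, `TOY_PGD_ENTRIES`, `toy_sync_global_pgds()` and the `removed` parameter are illustrative assumptions, with a value of 0 standing in for a none entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PGD_ENTRIES 8           /* hypothetical table size */

typedef unsigned long toy_pgd_t;    /* 0 stands in for a "none" entry */

/*
 * Propagate the reference table `ref` into every per-process copy in `mms`
 * for slots [first, last]. With removed == false, a none reference entry is
 * skipped (the old behaviour for every call). With removed == true, a none
 * reference entry clears the stale per-process copies instead.
 */
static void toy_sync_global_pgds(const toy_pgd_t *ref,
                                 toy_pgd_t mms[][TOY_PGD_ENTRIES],
                                 size_t nr_mms, size_t first, size_t last,
                                 bool removed)
{
    assert(first <= last && last < TOY_PGD_ENTRIES);

    for (size_t i = first; i <= last; i++) {
        if (ref[i] == 0 && !removed)
            continue;               /* nothing to propagate on hot-add */

        for (size_t m = 0; m < nr_mms; m++) {
            if (ref[i] != 0 && mms[m][i] == 0)
                mms[m][i] = ref[i]; /* hot-add: fill in the missing entry */
            else if (ref[i] == 0 && removed)
                mms[m][i] = 0;      /* hot-remove: drop the stale entry */
        }
    }
}
```

In this model, a sync after removal without the `removed` flag leaves the old entry in each per-process table, which is the stale state the commit message describes as leading to the BUG on the next hot-add of the same range.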
-
Committed by Yasuaki Ishimatsu
When hot-adding memory after hot-removing memory, the following call trace is shown:

    kernel BUG at arch/x86/mm/init_64.c:206!
    ...
    [<ffffffff815e0c80>] kernel_physical_mapping_init+0x1b2/0x1d2
    [<ffffffff815ced94>] init_memory_mapping+0x1d4/0x380
    [<ffffffff8104aebd>] arch_add_memory+0x3d/0xd0
    [<ffffffff815d03d9>] add_memory+0xb9/0x1b0
    [<ffffffff81352415>] acpi_memory_device_add+0x1af/0x28e
    [<ffffffff81325dc4>] acpi_bus_device_attach+0x8c/0xf0
    [<ffffffff813413b9>] acpi_ns_walk_namespace+0xc8/0x17f
    [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7
    [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7
    [<ffffffff813418ed>] acpi_walk_namespace+0x95/0xc5
    [<ffffffff81326b4c>] acpi_bus_scan+0x9a/0xc2
    [<ffffffff81326bff>] acpi_scan_bus_device_check+0x8b/0x12e
    [<ffffffff81326cb5>] acpi_scan_device_check+0x13/0x15
    [<ffffffff81320122>] acpi_os_execute_deferred+0x25/0x32
    [<ffffffff8107e02b>] process_one_work+0x17b/0x460
    [<ffffffff8107edfb>] worker_thread+0x11b/0x400
    [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400
    [<ffffffff81085aef>] kthread+0xcf/0xe0
    [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
    [<ffffffff815fc76c>] ret_from_fork+0x7c/0xb0
    [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140

The patch-set fixes the issue. This patch (of 2): remove_pagetable() takes a start argument and passes it to sync_global_pgds(). The argument must therefore not be modified in between. If it is modified before being passed to sync_global_pgds(), sync_global_pgds() does not correctly synchronize the PGD to the PGD entries of all processes' MM, since the synchronized memory range [start, end] is wrong. Unfortunately the start argument is modified in remove_pagetable(). So this patch fixes the issue. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
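The class of bug described here, using an input argument as the loop cursor and then reusing it afterwards, can be shown with a small user-space model. Everything in this sketch (`struct toy_range`, `TOY_STEP`, `toy_remove_buggy()`, `toy_remove_fixed()`) is hypothetical; the returned range stands for what would be handed on to the later synchronization step.

```c
#include <assert.h>

#define TOY_STEP 0x1000UL /* hypothetical mapping granularity */

/* The range a later sync step would receive, plus how many steps were torn down. */
struct toy_range {
    unsigned long start;
    unsigned long end;
    unsigned long steps;
};

/* Buggy shape: `start` doubles as the loop cursor and ends up equal to `end`. */
static struct toy_range toy_remove_buggy(unsigned long start, unsigned long end)
{
    unsigned long steps = 0;

    assert(start <= end);
    for (; start < end; start += TOY_STEP)
        steps++;                    /* stand-in for tearing down one mapping */

    return (struct toy_range){ start, end, steps };
}

/* Fixed shape: a separate cursor keeps `start` intact for the later sync. */
static struct toy_range toy_remove_fixed(unsigned long start, unsigned long end)
{
    unsigned long steps = 0;

    assert(start <= end);
    for (unsigned long addr = start; addr < end; addr += TOY_STEP)
        steps++;                    /* same work, but `start` is untouched */

    return (struct toy_range){ start, end, steps };
}
```

Both variants tear down the same number of steps, but only the fixed one still reports the original [start, end] range afterwards; the buggy one reports an empty range beginning at `end`.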
-
- 09 Sep, 2014 2 commits
-
-
Committed by Jan-Simon Möller
This fixes a compilation error in clang in that a linker section attribute can't be added to a type: arch/x86/mm/mmap.c:34:8: error: '__section__' attribute only applies to functions and global variables struct __read_mostly ... By moving the section attribute to the variable declaration, the desired effect is achieved. Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de> Signed-off-by: Behan Webster <behanw@converseincode.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1409959005-11479-1-git-send-email-behanw@converseincode.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
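The placement issue can be reproduced outside the kernel. The following is a minimal sketch assuming an ELF target and a GCC-compatible compiler; `toy_read_mostly`, `struct toy_va_alignment`, `toy_va_align` and `toy_va_flags()` are hypothetical names, and the section name is only an example of what such a macro might expand to.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for a read-mostly section annotation. */
#define toy_read_mostly __attribute__((__section__(".data.read_mostly")))

struct toy_va_alignment {
    int flags;
    unsigned long mask;
};

/*
 * Attribute between the `struct` keyword and the tag attaches to the type,
 * which clang rejects for section attributes:
 *
 *     struct toy_read_mostly toy_va_alignment toy_va_align = { ... };
 *
 * Attribute next to the declarator attaches to the variable, which both
 * GCC and clang accept:
 */
static struct toy_va_alignment toy_read_mostly toy_va_align = {
    .flags = -1,
};

/* Accessor showing the variable is still initialized and usable as normal. */
static int toy_va_flags(void)
{
    assert(toy_va_align.mask == 0); /* unnamed members are zero-initialized */
    return toy_va_align.flags;
}
```

The section placement does not change how the variable is read or written; it only affects where the linker puts it.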
-
Committed by Mathias Krause
We should classify the espfix area as such only if we actually have enabled the corresponding option. Otherwise the page table dump might look confusing. Signed-off-by: Mathias Krause <minipli@googlemail.com> Link: http://lkml.kernel.org/r/1410114629-24523-1-git-send-email-minipli@googlemail.com Cc: Arjan van de Ven <arjan.van.de.ven@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
- 01 Sep, 2014 1 commit
-
-
Committed by Matthew Wilcox
The last user of set_pmd_pfn() went away in commit f03574f2, so this has been dead code for over a year. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

    arch/x86/include/asm/pgtable_32.h |  3 ---
    arch/x86/mm/pgtable_32.c          | 35 -----------------------------------
    2 files changed, 38 deletions(-)
-
- 27 Aug, 2014 1 commit
-
-
Committed by Christoph Lameter
__get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processor's percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment.

__get_cpu_var() is defined as:

    #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables.

This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and fewer registers are used when code is generated.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

    DEFINE_PER_CPU(int, y);
    int *x = &__get_cpu_var(y);

   Converts to

    int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

    DEFINE_PER_CPU(int, y[20]);
    int *x = __get_cpu_var(y);

   Converts to

    int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processor's instance of a per cpu variable.

    DEFINE_PER_CPU(int, y);
    int x = __get_cpu_var(y);

   Converts to

    int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

    DEFINE_PER_CPU(struct mystruct, y);
    struct mystruct x = __get_cpu_var(y);

   Converts to

    memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

    DEFINE_PER_CPU(int, y);
    __get_cpu_var(y) = x;

   Converts to

    __this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

    DEFINE_PER_CPU(int, y);
    __get_cpu_var(y)++;

   Converts to

    __this_cpu_inc(y);

Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 10 Aug, 2014 1 commit
-
-
Committed by Jeremiah Mahler
A sparse warning is generated about 'tlb_single_page_flush_ceiling' not being declared. arch/x86/mm/tlb.c:177:15: warning: symbol 'tlb_single_page_flush_ceiling' was not declared. Should it be static? Since it isn't used anywhere outside this file, fix the warning by making it static. Also, optimize the use of this variable by adding the __read_mostly directive, as suggested by David Rientjes. Suggested-by: David Rientjes <rientjes@google.com> Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/r/1407569913-4035-1-git-send-email-jmmahler@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 08 Aug, 2014 1 commit
-
-
Committed by Dave Hansen
Dave Jones reported seeing a bug from one of my TLB tracepoints: http://lkml.kernel.org/r/20140806181801.GA4605@redhat.com According to Paul McKenney, the right way to fix this is adding an _rcuidle suffix to the tracepoint. http://lkml.kernel.org/r/20140807065055.GA5821@linux.vnet.ibm.com This patch does just that. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Dave Hansen <dave@sr71.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140807175841.5C92D878@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 07 Aug, 2014 1 commit
-
-
Committed by Wang Nan
This patch introduces zone_for_memory() to arch_add_memory() on x86_32 to ensure that new, higher memory is added to ZONE_MOVABLE if the movable zone has already been set up. Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "Mel Gorman" <mgorman@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-