- 19 5月, 2015 1 次提交
-
-
由 David Hildenbrand 提交于
Introduce faulthandler_disabled() and use it to check for irq context and disabled pagefaults (via pagefault_disable()) in the pagefault handlers. Please note that we keep the in_atomic() checks in place - to detect whether in irq context (in which case preemption is always properly disabled). In contrast, preempt_disable() should never be used to disable pagefaults. With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt counter, and therefore the result of in_atomic() differs. We validate that condition by using might_fault() checks when calling might_sleep(). Therefore, add a comment to faulthandler_disabled(), describing why this is needed. faulthandler_disabled() and pagefault_disable() are defined in linux/uaccess.h, so let's properly add that include to all relevant files. This patch is based on a patch from Thomas Gleixner. Reviewed-and-tested-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: benh@kernel.crashing.org Cc: bigeasy@linutronix.de Cc: borntraeger@de.ibm.com Cc: daniel.vetter@intel.com Cc: heiko.carstens@de.ibm.com Cc: herbert@gondor.apana.org.au Cc: hocko@suse.cz Cc: hughd@google.com Cc: mst@redhat.com Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: schwidefsky@de.ibm.com Cc: yang.shi@windriver.com Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 23 3月, 2015 2 次提交
-
-
由 Andy Lutomirski 提交于
user_mode_vm() and user_mode() are now the same. Change all callers of user_mode_vm() to user_mode(). The next patch will remove the definition of user_mode_vm. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/43b1f57f3df70df5a08b0925897c660725015554.1426728647.git.luto@kernel.org [ Merged to a more recent kernel. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
This is slightly shorter and slightly faster. It's also more correct: the split between user and kernel addresses is TASK_SIZE_MAX, regardless of ti->flags. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/09156b63bad90a327827003c9e53faa82ef4c56e.1426728647.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 04 2月, 2015 1 次提交
-
-
由 Andy Lutomirski 提交于
Context switches and TLB flushes can change individual bits of CR4. CR4 reads take several cycles, so store a shadow copy of CR4 in a per-cpu variable. To avoid wasting a cache line, I added the CR4 shadow to cpu_tlbstate, which is already touched in switch_mm. The heaviest users of the cr4 shadow will be switch_mm and __switch_to_xtra, and __switch_to_xtra is called shortly after switch_mm during context switch, so the cacheline is likely to be hot. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 30 1月, 2015 1 次提交
-
-
由 Linus Torvalds 提交于
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit fee7e49d ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: NTakashi Iwai <tiwai@suse.de> Tested-by: NJan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 12月, 2014 1 次提交
-
-
由 Linus Torvalds 提交于
My commit 26178ec1 ("x86: mm: consolidate VM_FAULT_RETRY handling") had a really stupid typo: the FAULT_FLAG_USER bit is in the 'flags' variable, not the 'fault' variable. Duh, The one silver lining in this is that Dave finding this at least confirms that trinity actually triggers this special path easily, in a way normal use does not. Reported-by: NDave Jones <davej@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 12月, 2014 2 次提交
-
-
由 Linus Torvalds 提交于
The VM_FAULT_RETRY handling was confusing and incorrect for the case of returning to kernel mode. We need to handle the exception table fixup if we return to kernel mode due to a fatal signal - it will basically look to the kernel user mode access like the access failed due to the VM going away from udner it. Which is correct - the process is dying - and avoids the whole "repeat endless kernel page faults" case. Handling the VM_FAULT_RETRY early and in just one place also simplifies the mmap_sem handling, since once we've taken care of VM_FAULT_RETRY we know that we can just drop the lock. The remaining accounting and possible error handling is thread-local and does not need the mmap_sem. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
This replaces four copies in various stages of mm_fault_error() handling with just a single one. It will also allow for more natural placement of the unlocking after some further cleanup. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 9月, 2014 1 次提交
-
-
由 David Vrabel 提交于
If a fault on a kernel address is due to a non-present page, then it cannot be the result of stale TLB entry from a protection change (RO to RW or NX to X). Thus the pagetable walk in spurious_fault() can be skipped. See the initial if in spurious_fault() and the tests in spurious_fault_check()) for the set of possible error codes checked for spurious faults. These are: IRUWP Before x00xx && ( 1xxxx || xxx1x ) After ( 10001 || 00011 ) && ( 1xxxx || xxx1x ) Thus the new condition is a subset of the previous one, excluding only non-present faults (I == 1 and W == 1 are mutually exclusive). This avoids spurious_fault() oopsing in some cases if the pagetables it attempts to walk are not accessible. This obscures the location of the original fault. This also fixes a crash with Xen PV guests when they access entries in the M2P corresponding to device MMIO regions. The M2P is mapped (read-only) by Xen into the kernel address space of the guest and this mapping may contains holes for non-RAM regions. Read faults will result in calls to spurious_fault(), but because the page tables for the M2P mappings are not accessible by the guest the pagetable walk would fault. This was not normally a problem as MMIO mappings would not normally result in a M2P lookup because of the use of the _PAGE_IOMAP bit the PTE. However, removing the _PAGE_IOMAP bit requires M2P lookups for MMIO mappings as well. Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com> Reported-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: NDave Hansen <dave.hansen@intel.com>
-
- 19 9月, 2014 2 次提交
-
-
由 Aaron Tomlin 提交于
This facility is used in a few places so let's introduce a helper function to improve code readability. Signed-off-by: NAaron Tomlin <atomlin@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: aneesh.kumar@linux.vnet.ibm.com Cc: dzickus@redhat.com Cc: bmr@redhat.com Cc: jcastillo@redhat.com Cc: oleg@redhat.com Cc: riel@redhat.com Cc: prarit@redhat.com Cc: jgh@redhat.com Cc: minchan@kernel.org Cc: mpe@ellerman.id.au Cc: tglx@linutronix.de Cc: hannes@cmpxchg.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1410527779-8133-3-git-send-email-atomlin@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Aaron Tomlin 提交于
Tasks get their end of stack set to STACK_END_MAGIC with the aim to catch stack overruns. Currently this feature does not apply to init_task. This patch removes this restriction. Note that a similar patch was posted by Prarit Bhargava some time ago but was never merged: http://marc.info/?l=linux-kernel&m=127144305403241&w=2Signed-off-by: NAaron Tomlin <atomlin@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: aneesh.kumar@linux.vnet.ibm.com Cc: dzickus@redhat.com Cc: bmr@redhat.com Cc: jcastillo@redhat.com Cc: jgh@redhat.com Cc: minchan@kernel.org Cc: tglx@linutronix.de Cc: hannes@cmpxchg.org Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Daeseok Youn <daeseok.youn@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 9月, 2014 1 次提交
-
-
由 Yasuaki Ishimatsu 提交于
When hot-adding/removing memory, sync_global_pgds() is called for synchronizing PGD to PGD entries of all processes MM. But when hot-removing memory, sync_global_pgds() does not work correctly. At first, sync_global_pgds() checks whether target PGD is none or not. And if PGD is none, the PGD is skipped. But when hot-removing memory, PGD may be none since PGD may be cleared by free_pud_table(). So when sync_global_pgds() is called after hot-removing memory, sync_global_pgds() should not skip PGD even if the PGD is none. And sync_global_pgds() must clear PGD entries of all processes MM. Currently sync_global_pgds() does not clear PGD entries of all processes MM when hot-removing memory. So when hot adding memory which is same memory range as removed memory after hot-removing memory, following call traces are shown: kernel BUG at arch/x86/mm/init_64.c:206! ... [<ffffffff815e0c80>] kernel_physical_mapping_init+0x1b2/0x1d2 [<ffffffff815ced94>] init_memory_mapping+0x1d4/0x380 [<ffffffff8104aebd>] arch_add_memory+0x3d/0xd0 [<ffffffff815d03d9>] add_memory+0xb9/0x1b0 [<ffffffff81352415>] acpi_memory_device_add+0x1af/0x28e [<ffffffff81325dc4>] acpi_bus_device_attach+0x8c/0xf0 [<ffffffff813413b9>] acpi_ns_walk_namespace+0xc8/0x17f [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7 [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7 [<ffffffff813418ed>] acpi_walk_namespace+0x95/0xc5 [<ffffffff81326b4c>] acpi_bus_scan+0x9a/0xc2 [<ffffffff81326bff>] acpi_scan_bus_device_check+0x8b/0x12e [<ffffffff81326cb5>] acpi_scan_device_check+0x13/0x15 [<ffffffff81320122>] acpi_os_execute_deferred+0x25/0x32 [<ffffffff8107e02b>] process_one_work+0x17b/0x460 [<ffffffff8107edfb>] worker_thread+0x11b/0x400 [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400 [<ffffffff81085aef>] kthread+0xcf/0xe0 [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 [<ffffffff815fc76c>] ret_from_fork+0x7c/0xb0 [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 This patch clears PGD entries of all processes MM when sync_global_pgds() is called after hot-removing memory Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: NToshi Kani <toshi.kani@hp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 07 8月, 2014 1 次提交
-
-
由 Paul Cassella 提交于
Add a comment describing the circumstances in which __lock_page_or_retry() will or will not release the mmap_sem when returning 0. Add comments to lock_page_or_retry()'s callers (filemap_fault(), do_swap_page()) noting the impact on VM_FAULT_RETRY returns. Add comments on up the call tree, particularly replacing the false "We return with mmap_sem still held" comments. Signed-off-by: NPaul Cassella <cassella@cray.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 6月, 2014 1 次提交
-
-
由 Jiri Kosina 提交于
If pagefault triggers due to SMEP triggering, it can't be really easily distinguished from any other oops-causing pagefault, which might lead to quite some confusion when trying to understand the reason for the oops. Print an explanatory message in case the fault happened during instruction fetch for _PAGE_USER page which is present and executable on SMEP-enabled CPUs. This is consistent with what we are doing for NX already; in addition to immediately seeing from the oops what might be happening, it can even easily give a good indication to sysadmins who are carefully monitoring their kernel logs that someone might be trying to pwn them. Signed-off-by: NJiri Kosina <jkosina@suse.cz> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1406102248490.1321@pobox.suse.czSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 06 5月, 2014 1 次提交
-
-
由 Andy Lutomirski 提交于
This makes the 64-bit and x32 vdsos use the same mechanism as the 32-bit vdso. Most of the churn is deleting all the old fixmap code. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/8af87023f57f6bb96ec8d17fce3f88018195b49b.1399317206.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 24 4月, 2014 1 次提交
-
-
由 Masami Hiramatsu 提交于
Use NOKPROBE_SYMBOL macro for protecting functions from kprobes instead of __kprobes annotation under arch/x86. This applies nokprobe_inline annotation for some cases, because NOKPROBE_SYMBOL() will inhibit inlining by referring the symbol address. This just folds a bunch of previous NOKPROBE_SYMBOL() cleanup patches for x86 to one patch. Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20140417081814.26341.51656.stgit@ltc230.yrl.intra.hitachi.co.jp Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp> Cc: Gleb Natapov <gleb@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Lebon <jlebon@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Michel Lespinasse <walken@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 07 3月, 2014 1 次提交
-
-
由 Peter Zijlstra 提交于
Building on commit 0ac09f9f ("x86, trace: Fix CR2 corruption when tracing page faults") this patch addresses another few issues: - Now that read_cr2() is lifted into trace_do_page_fault(), we should pass the address to trace_page_fault_entries() to avoid it re-reading a potentially changed cr2. - Put both trace_do_page_fault() and trace_page_fault_entries() under CONFIG_TRACING. - Mark both fault entry functions {,trace_}do_page_fault() as notrace to avoid getting __mcount or other function entry trace callbacks before we've observed CR2. - Mark __do_page_fault() as noinline to guarantee the function tracer does get to see the fault. Cc: <jolsa@redhat.com> Cc: <vincent.weaver@maine.edu> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140306145300.GO9987@twins.programming.kicks-ass.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 05 3月, 2014 2 次提交
-
-
由 Jiri Olsa 提交于
The trace_do_page_fault function trigger tracepoint and then handles the actual page fault. This could lead to error if the tracepoint caused page fault. The original cr2 value gets lost and the original page fault handler kills current process with SIGSEGV. This happens if you record page faults with callchain data, the user part of it will cause tracepoint handler to page fault: # perf record -g -e exceptions:page_fault_user ls Fixing this by saving the original cr2 value and using it after tracepoint handler is done. v2: Moving the cr2 read before exception_enter, because it could trigger tracepoint as well. Reported-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net> Reported-by: NVince Weaver <vincent.weaver@maine.edu> Tested-by: NVince Weaver <vincent.weaver@maine.edu> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1402211701380.6395@vincent-weaver-1.um.maine.edu Link: http://lkml.kernel.org/r/20140228160526.GD1133@krava.brq.redhat.com
-
由 Matt Fleming 提交于
Now that we have EFI-specific page tables we need to lookup the pgd when dumping those page tables, rather than assuming that swapper_pgdir is the current pgdir. Remove the double underscore prefix, which is usually reserved for static functions. Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
-
- 14 2月, 2014 1 次提交
-
-
由 H. Peter Anvin 提交于
If CONFIG_X86_SMAP is disabled, smap_violation() tests for conditions which are incorrect (as the AC flag doesn't matter), causing spurious faults. The dynamic disabling of SMAP (nosmap on the command line) is fine because it disables X86_FEATURE_SMAP, therefore causing the static_cpu_has() to return false. Found by Fengguang Wu's test system. [ v3: move all predicates into smap_violation() ] [ v2: use IS_ENABLED() instead of #ifdef ] Reported-by: NFengguang Wu <fengguang.wu@intel.com> Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhostSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.7+
-
- 16 1月, 2014 1 次提交
-
-
由 Peter Zijlstra 提交于
Waiman managed to trigger a PMI while in a emulate_vsyscall() fault, the PMI in turn managed to trigger a fault while obtaining a stack trace. This triggered the sig_on_uaccess_error recursive fault logic and killed the process dead. Fix this by explicitly excluding interrupts from the recursive fault logic. Reported-and-Tested-by: NWaiman Long <waiman.long@hp.com> Fixes: e00b12e6 ("perf/x86: Further optimize copy_from_user_nmi()") Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140110200603.GJ7572@laptop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 11月, 2013 1 次提交
-
-
由 Jiri Slaby 提交于
Consider a kernel crash in a module, simulated the following way: static int my_init(void) { char *map = (void *)0x5; *map = 3; return 0; } module_init(my_init); When we turn off FRAME_POINTERs, the very first instruction in that function causes a BUG. The problem is that we print IP in the BUG report using %pB (from printk_address). And %pB decrements the pointer by one to fix printing addresses of functions with tail calls. This was added in commit 71f9e598 ("x86, dumpstack: Use %pB format specifier for stack trace") to fix the call stack printouts. So instead of correct output: BUG: unable to handle kernel NULL pointer dereference at 0000000000000005 IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173] We get: BUG: unable to handle kernel NULL pointer dereference at 0000000000000005 IP: [<ffffffffa0152000>] 0xffffffffa0151fff To fix that, we use %pS only for stack addresses printouts (via newly added printk_stack_address) and %pB for regs->ip (via printk_address). I.e. we revert to the old behaviour for all except call stacks. And since from all those reliable is 1, we remove that parameter from printk_address. Signed-off-by: NJiri Slaby <jslaby@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: joe@perches.com Cc: jirislaby@gmail.com Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 11月, 2013 1 次提交
-
-
由 H. Peter Anvin 提交于
Tracepoints are named hierachially, and it makes more sense to keep a general flow of information level from general to specific from left to right, i.e. x86_exceptions.page_fault_user|kernel rather than x86_exceptions.user|kernel_page_fault Suggested-by: NIngo Molnar <mingo@kernel.org> Acked-by: NSeiji Aguchi <seiji.aguchi@hds.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/20131111082955.GB12405@gmail.com
-
- 09 11月, 2013 2 次提交
-
-
由 Seiji Aguchi 提交于
This patch introduces page fault tracepoints to x86 architecture by switching IDT. Two events, for user and kernel spaces, are introduced at the beginning of page fault handler for tracing. - User space event There is a request of page fault event for user space as below. https://lkml.kernel.org/r/1368079520-11015-2-git-send-email-fdeslaur+()+gmail+!+com https://lkml.kernel.org/r/1368079520-11015-1-git-send-email-fdeslaur+()+gmail+!+com - Kernel space event: When we measure an overhead in kernel space for investigating performance issues, we can check if it comes from the page fault events. Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/52716E67.6090705@hds.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Seiji Aguchi 提交于
This patch registers exception handlers for tracing to a trace IDT. To implemented it in set_intr_gate(), this patch does followings. - Register the exception handlers to the trace IDT by prepending "trace_" to the handler's names. - Also, newly introduce trace_page_fault() to add tracepoints in a subsequent patch. Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/52716DEC.5050204@hds.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 29 10月, 2013 1 次提交
-
-
由 Peter Zijlstra 提交于
Now that we can deal with nested NMI due to IRET re-enabling NMIs and can deal with faults from NMI by making sure we preserve CR2 over NMIs we can in fact simply access user-space memory from NMI context. So rewrite copy_from_user_nmi() to use __copy_from_user_inatomic() and rework the fault path to do the minimal required work before taking the in_atomic() fault handler. In particular avoid perf_sw_event() which would make perf recurse on itself (it should be harmless as our recursion protections should be able to deal with this -- but why tempt fate). Also rename notify_page_fault() to kprobes_fault() as that is a much better name; there is no notifier in it and its specific to kprobes. Don measured that his worst case NMI path shrunk from ~300K cycles to ~150K cycles. Cc: Stephane Eranian <eranian@google.com> Cc: jmario@redhat.com Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: dave.hansen@linux.intel.com Tested-by: NDon Zickus <dzickus@redhat.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131024105206.GM2490@laptop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 9月, 2013 2 次提交
-
-
由 Johannes Weiner 提交于
The x86 fault handler bails in the middle of error handling when the task has a fatal signal pending. For a subsequent patch this is a problem in OOM situations because it relies on pagefault_out_of_memory() being called even when the task has been killed, to perform proper per-task OOM state unwinding. Shortcutting the fault like this is a rather minor optimization that saves a few instructions in rare cases. Just remove it for user-triggered faults. Use the opportunity to split the fault retry handling from actual fault errors and add locking documentation that reads suprisingly similar to ARM's. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: azurIt <azurit@pobox.sk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
Unlike global OOM handling, memory cgroup code will invoke the OOM killer in any OOM situation because it has no way of telling faults occuring in kernel context - which could be handled more gracefully - from user-triggered faults. Pass a flag that identifies faults originating in user space from the architecture-specific fault handlers to generic code so that memcg OOM handling can be improved. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: azurIt <azurit@pobox.sk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 4月, 2013 1 次提交
-
-
由 Samu Kallio 提交于
In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops when lazy MMU updates are enabled, because set_pgd effects are being deferred. One instance of this problem is during process mm cleanup with memory cgroups enabled. The chain of events is as follows: - zap_pte_range enables lazy MMU updates - zap_pte_range eventually calls mem_cgroup_charge_statistics, which accesses the vmalloc'd mem_cgroup per-cpu stat area - vmalloc_fault is triggered which tries to sync the corresponding PGD entry with set_pgd, but the update is deferred - vmalloc_fault oopses due to a mismatch in the PUD entries The OOPs usually looks as so: ------------[ cut here ]------------ kernel BUG at arch/x86/mm/fault.c:396! invalid opcode: 0000 [#1] SMP .. snip .. CPU 1 Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1 RIP: e030:[<ffffffff816271bf>] [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208 .. snip .. Call Trace: [<ffffffff81627759>] do_page_fault+0x399/0x4b0 [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110 [<ffffffff81624065>] page_fault+0x25/0x30 [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50 [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350 [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60 [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150 [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80 [<ffffffff81153e61>] unmap_single_vma+0x531/0x870 [<ffffffff81154962>] unmap_vmas+0x52/0xa0 [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100 [<ffffffff8115c8f8>] exit_mmap+0x98/0x170 [<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e [<ffffffff81059ce3>] mmput+0x83/0xf0 [<ffffffff810624c4>] exit_mm+0x104/0x130 [<ffffffff8106264a>] do_exit+0x15a/0x8c0 [<ffffffff810630ff>] do_group_exit+0x3f/0xa0 [<ffffffff81063177>] sys_exit_group+0x17/0x20 [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the changes visible to the consistency checks. Cc: <stable@vger.kernel.org> RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737Tested-by: NJosh Boyer <jwboyer@redhat.com> Reported-and-Tested-by: NKrishna Raman <kraman@redhat.com> Signed-off-by: NSamu Kallio <samu.kallio@aberdeencloud.com> Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.comTested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 03 4月, 2013 1 次提交
-
-
由 Borislav Petkov 提交于
... to using the new facility and drop the cpuinfo_x86 member. Signed-off-by: NBorislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1363788448-31325-3-git-send-email-bp@alien8.deSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 08 3月, 2013 2 次提交
-
-
由 Frederic Weisbecker 提交于
On exception exit, we restore the previous context tracking state based on the regs of the interrupted frame. Iff that frame is in user mode as stated by user_mode() helper, we restore the context tracking user mode. However there is a tiny chunck of low level arch code after we pass through user_enter() and until the CPU eventually resumes userspace. If an exception happens in this tiny area, exception_enter() correctly exits the context tracking user mode but exception_exit() won't restore it because of the value returned by user_mode(regs). As a result we may return to userspace with the wrong context tracking state. To fix this, change exception_enter() to return the context tracking state prior to its call and pass this saved state to exception_exit(). This restores the real context tracking state of the interrupted frame. (May be this patch was suggested to me, I don't recall exactly. If so, sorry for the missing credit). Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Mats Liljegren <mats.liljegren@enea.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Frederic Weisbecker 提交于
Exceptions handling on context tracking should share common treatment: on entry we exit user mode if the exception triggered in that context. Then on exception exit we return to that previous context. Generalize this to avoid duplication across archs. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Mats Liljegren <mats.liljegren@enea.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 24 2月, 2013 1 次提交
-
-
由 Andrea Arcangeli 提交于
I got a report for a minor regression introduced by commit 027ef6c8 ("mm: thp: fix pmd_present for split_huge_page and PROT_NONE with THP"). So the problem is, pageattr creates kernel pagetables (pte and pmds) that breaks pte_present/pmd_present and the patch above exposed this invariant breakage for pmd_present. The same problem already existed for the pte and pte_present and it was fixed by commit 660a293e ("x86, mm: Make spurious_fault check explicitly check the PRESENT bit") (if it wasn't for that commit, it wouldn't even be a regression). That fix avoids the pagefault to use pte_present. I could follow through by stopping using pmd_present/pmd_huge too. However I think it's more robust to fix pageattr and to clear the PSE/GLOBAL bitflags too in addition to the present bitflag. So the kernel page fault can keep using the regular pte_present/pmd_present/pmd_huge. The confusion arises because _PAGE_GLOBAL and _PAGE_PROTNONE are sharing the same bit, and in the pmd case we pretend _PAGE_PSE to be set only in present pmds (to facilitate split_huge_page final tlb flush). Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Shaohua Li <shaohua.li@intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 08 2月, 2013 1 次提交
-
-
由 Kees Cook 提交于
Without this patch, it is trivial to determine kernel page mappings by examining the error code reported to dmesg[1]. Instead, declare the entire kernel memory space as a violation of a present page. Additionally, since show_unhandled_signals is enabled by default, switch branch hinting to the more realistic expectation, and unobfuscate the setting of the PF_PROT bit to improve readability. [1] http://vulnfactory.org/blog/2013/02/06/a-linux-memory-trick/Reported-by: NDan Rosenberg <dan.j.rosenberg@gmail.com> Suggested-by: NBrad Spengler <spender@grsecurity.net> Signed-off-by: NKees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Acked-by: NH. Peter Anvin <hpa@zytor.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130207174413.GA12485@www.outflux.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 12月, 2012 1 次提交
-
-
由 David Rientjes 提交于
out_of_memory() is a globally defined function to call the oom killer. x86, sh, and powerpc all use a function of the same name within file scope in their respective fault.c unnecessarily. Inline the functions into the pagefault handlers to clean the code up. Signed-off-by: NDavid Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 12月, 2012 1 次提交
-
-
由 Frederic Weisbecker 提交于
Create a new subsystem that probes on kernel boundaries to keep track of the transitions between level contexts with two basic initial contexts: user or kernel. This is an abstraction of some RCU code that use such tracking to implement its userspace extended quiescent state. We need to pull this up from RCU into this new level of indirection because this tracking is also going to be used to implement an "on demand" generic virtual cputime accounting. A necessary step to shutdown the tick while still accounting the cputime. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Gilad Ben-Yossef <gilad@benyossef.com> Reviewed-by: NSteven Rostedt <rostedt@goodmis.org> [ paulmck: fix whitespace error and email address. ] Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 09 10月, 2012 1 次提交
-
-
由 Shaohua Li 提交于
.fault now can retry. The retry can break state machine of .fault. In filemap_fault, if page is miss, ra->mmap_miss is increased. In the second try, since the page is in page cache now, ra->mmap_miss is decreased. And these are done in one fault, so we can't detect random mmap file access. Add a new flag to indicate .fault is tried once. In the second try, skip ra->mmap_miss decreasing. The filemap_fault state machine is ok with it. I only tested x86, didn't test other archs, but looks the change for other archs is obvious, but who knows :) Signed-off-by: NShaohua Li <shaohua.li@fusionio.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 9月, 2012 1 次提交
-
-
由 Frederic Weisbecker 提交于
Add necessary hooks to x86 exception for userspace RCU extended quiescent state support. This includes traps, page fault, debug exceptions, etc... Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 22 9月, 2012 1 次提交
-
-
由 H. Peter Anvin 提交于
If we get a page fault due to SMAP, trigger an oops rather than spinning forever. Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1348256595-29119-11-git-send-email-hpa@linux.intel.com
-
- 03 5月, 2012 1 次提交
-
-
由 Eric W. Biederman 提交于
cred.h and a few trivial users of struct cred are changed. The rest of the users of struct cred are left for other patches as there are too many changes to make in one go and leave the change reviewable. If the user namespace is disabled and CONFIG_UIDGID_STRICT_TYPE_CHECKS are disabled the code will contiue to compile and behave correctly. Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-