- 01 5月, 2014 1 次提交
-
-
由 H. Peter Anvin 提交于
The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. This causes some 16-bit software to break, but it also leaks kernel state to user space. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 64-bit mode. In checkin: b3b42ac2 x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels we "solved" this by forbidding 16-bit segments on 64-bit kernels, with the logic that 16-bit support is crippled on 64-bit kernels anyway (no V86 support), but it turns out that people are doing stuff like running old Win16 binaries under Wine and expect it to work. This works around this by creating percpu "ministacks", each of which is mapped 2^16 times 64K apart. When we detect that the return SS is on the LDT, we copy the IRET frame to the ministack and use the relevant alias to return to userspace. The ministacks are mapped readonly, so if IRET faults we promote #GP to #DF which is an IST vector and thus has its own stack; we then do the fixup in the #DF handler. (Making #GP an IST exception would make the msr_safe functions unsafe in NMI/MC context, and quite possibly have other effects.) Special thanks to: - Andy Lutomirski, for the suggestion of using very small stack slots and copy (as opposed to map) the IRET frame there, and for the suggestion to mark them readonly and let the fault promote to #DF. - Konrad Wilk for paravirt fixup and testing. - Borislav Petkov for testing help and useful comments. Reported-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Lutomriski <amluto@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dirk Hohndel <dirk@hohndel.org> Cc: Arjan van de Ven <arjan.van.de.ven@intel.com> Cc: comex <comexk@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <stable@vger.kernel.org> # consider after upstream merge
-
- 10 1月, 2014 1 次提交
-
-
由 Steven Rostedt 提交于
Function tracing callbacks expect to have the ftrace_ops that registered it passed to them, not the address of the variable that holds the ftrace_ops that registered it. Use a mov instead of a lea to store the ftrace_ops into the parameter of the function tracing callback. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20131113152004.459787f9@gandalf.local.homeSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.8+
-
- 09 11月, 2013 1 次提交
-
-
由 Seiji Aguchi 提交于
This patch registers exception handlers for tracing to a trace IDT. To implemented it in set_intr_gate(), this patch does followings. - Register the exception handlers to the trace IDT by prepending "trace_" to the handler's names. - Also, newly introduce trace_page_fault() to add tracepoints in a subsequent patch. Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/52716DEC.5050204@hds.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 01 10月, 2013 1 次提交
-
-
由 Frederic Weisbecker 提交于
All arch overriden implementations of do_softirq() share the following common code: disable irqs (to avoid races with the pending check), check if there are softirqs pending, then execute __do_softirq() on a specific stack. Consolidate the common parts such that archs only worry about the stack switch. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@au1.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@au1.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Andrew Morton <akpm@linux-foundation.org>
-
- 25 9月, 2013 1 次提交
-
-
由 Peter Zijlstra 提交于
Convert x86 to use a per-cpu preemption count. The reason for doing so is that accessing per-cpu variables is a lot cheaper than accessing thread_info variables. We still need to save/restore the actual preemption count due to PREEMPT_ACTIVE so we place the per-cpu __preempt_count variable in the same cache-line as the other hot __switch_to() variables such as current_task. NOTE: this save/restore is required even for !PREEMPT kernels as cond_resched() also relies on preempt_count's PREEMPT_ACTIVE to ignore task_struct::state. Also rename thread_info::preempt_count to ensure nobody is 'accidentally' still poking at it. Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-gzn5rfsf8trgjoqx8hyayy3q@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 10 9月, 2013 1 次提交
-
-
由 Borislav Petkov 提交于
b3af11af ("x86: get rid of pt_regs argument of iopl(2)") dropped PTREGSCALL which was also the last user of save_rest. Drop that now-unused function too. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/1378546750-19727-1-git-send-email-bp@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 26 6月, 2013 1 次提交
-
-
由 H. Peter Anvin 提交于
Bit 1 in the x86 EFLAGS is always set. Name the macro something that actually tries to explain what it is all about, rather than being a tautology. Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Gleb Natapov <gleb@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/n/tip-f10rx5vjjm6tfnt8o1wseb3v@git.kernel.org
-
- 23 6月, 2013 1 次提交
-
-
由 Seiji Aguchi 提交于
In case CONFIG_X86_MCE_THRESHOLD and CONFIG_X86_THERMAL_VECTOR are disabled, kernel build fails as follows. arch/x86/built-in.o: In function `trace_threshold_interrupt': (.entry.text+0x122b): undefined reference to `smp_trace_threshold_interrupt' arch/x86/built-in.o: In function `trace_thermal_interrupt': (.entry.text+0x132b): undefined reference to `smp_trace_thermal_interrupt' In this case, trace_threshold_interrupt/trace_thermal_interrupt are not needed to define. So, add config option checking to their definitions in entry_64.S. Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com> Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/51C58B8A.2080808@hds.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 21 6月, 2013 1 次提交
-
-
由 Seiji Aguchi 提交于
[Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled. In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
-
- 17 4月, 2013 1 次提交
-
-
由 Yang Zhang 提交于
Posted Interrupt feature requires a special IPI to deliver posted interrupt to guest. And it should has a high priority so the interrupt will not be blocked by others. Normally, the posted interrupt will be consumed by vcpu if target vcpu is running and transparent to OS. But in some cases, the interrupt will arrive when target vcpu is scheduled out. And host will see it. So we need to register a dump handler to handle it. Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Acked-by: NIngo Molnar <mingo@kernel.org> Reviewed-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 13 2月, 2013 1 次提交
-
-
由 K. Y. Srinivasan 提交于
Starting with win8, vmbus interrupts can be delivered on any VCPU in the guest and furthermore can be concurrently active on multiple VCPUs. Support this interrupt delivery model by setting up a separate IDT entry for Hyper-V vmbus. interrupts. I would like to thank Jan Beulich <JBeulich@suse.com> and Thomas Gleixner <tglx@linutronix.de>, for their help. In this version of the patch, based on the feedback, I have merged the IDT vector for Xen and Hyper-V and made the necessary adjustments. Furhermore, based on Jan's feedback I have added the necessary compilation switches. Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com> Link: http://lkml.kernel.org/r/1359940959-32168-3-git-send-email-kys@microsoft.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 04 2月, 2013 3 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
we are not going to return via SYSRET anyway. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 24 1月, 2013 1 次提交
-
-
由 Jan Beulich 提交于
While in one case a plain annotation is necessary, in the other case the stack adjustment can simply be folded into the immediately preceding RESTORE_ALL, thus getting the correct annotation for free. Signed-off-by: NJan Beulich <jbeulich@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander van Heukelum <heukelum@mailshack.com> Link: http://lkml.kernel.org/r/51010C9302000078000B9045@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 20 12月, 2012 2 次提交
-
-
由 Al Viro 提交于
Again, conditional on CONFIG_GENERIC_SIGALTSTACK Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Conditional on CONFIG_GENERIC_SIGALTSTACK; architectures that do not select it are completely unaffected Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 01 12月, 2012 1 次提交
-
-
由 Frederic Weisbecker 提交于
Create a new subsystem that probes on kernel boundaries to keep track of the transitions between level contexts with two basic initial contexts: user or kernel. This is an abstraction of some RCU code that use such tracking to implement its userspace extended quiescent state. We need to pull this up from RCU into this new level of indirection because this tracking is also going to be used to implement an "on demand" generic virtual cputime accounting. A necessary step to shutdown the tick while still accounting the cputime. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Gilad Ben-Yossef <gilad@benyossef.com> Reviewed-by: NSteven Rostedt <rostedt@goodmis.org> [ paulmck: fix whitespace error and email address. ] Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 29 11月, 2012 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 21 11月, 2012 1 次提交
-
-
由 Jan Beulich 提交于
While these got added in the right place everywhere else, entry_64.S is the odd one where they ended up before the initial CFI directive(s). In order to cover the full code ranges, the CFI directive must be first, though. Signed-off-by: NJan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/5093BA1F02000078000A600E@nat28.tlf.novell.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 02 11月, 2012 1 次提交
-
-
由 Salman Qazi 提交于
The nested NMI modifies the place (instruction, flags and stack) that the first NMI will iret to. However, the copy of registers modified is exactly the one that is the part of pt_regs in the first NMI. This can change the behaviour of the first NMI. In particular, Google's arch_trigger_all_cpu_backtrace handler also prints regions of memory surrounding addresses appearing in registers. This results in handled exceptions, after which nested NMIs start coming in. These nested NMIs change the value of registers in pt_regs. This can cause the original NMI handler to produce incorrect output. We solve this problem by interchanging the position of the preserved copy of the iret registers ("saved") and the copy subject to being trampled by nested NMI ("copied"). Link: http://lkml.kernel.org/r/20121002002919.27236.14388.stgit@dungbeetle.mtv.corp.google.comSigned-off-by: NSalman Qazi <sqazi@google.com> [ Added a needed CFI_ADJUST_CFA_OFFSET ] Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 20 10月, 2012 1 次提交
-
-
由 David Vrabel 提交于
In 32 bit guests, if a userspace process has %eax == -ERESTARTSYS (-512) or -ERESTARTNOINTR (-513) when it is interrupted by an event /and/ the process has a pending signal then %eip (and %eax) are corrupted when returning to the main process after handling the signal. The application may then crash with SIGSEGV or a SIGILL or it may have subtly incorrect behaviour (depending on what instruction it returned to). The occurs because handle_signal() is incorrectly thinking that there is a system call that needs to restarted so it adjusts %eip and %eax to re-execute the system call instruction (even though user space had not done a system call). If %eax == -514 (-ERESTARTNOHAND (-514) or -ERESTART_RESTARTBLOCK (-516) then handle_signal() only corrupted %eax (by setting it to -EINTR). This may cause the application to crash or have incorrect behaviour. handle_signal() assumes that regs->orig_ax >= 0 means a system call so any kernel entry point that is not for a system call must push a negative value for orig_ax. For example, for physical interrupts on bare metal the inverse of the vector is pushed and page_fault() sets regs->orig_ax to -1, overwriting the hardware provided error code. xen_hypervisor_callback() was incorrectly pushing 0 for orig_ax instead of -1. Classic Xen kernels pushed %eax which works as %eax cannot be both non-negative and -RESTARTSYS (etc.), but using -1 is consistent with other non-system call entry points and avoids some of the tests in handle_signal(). There were similar bugs in xen_failsafe_callback() of both 32 and 64-bit guests. If the fault was corrected and the normal return path was used then 0 was incorrectly pushed as the value for orig_ax. Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com> Acked-by: NJan Beulich <JBeulich@suse.com> Acked-by: NIan Campbell <ian.campbell@citrix.com> Cc: stable@vger.kernel.org Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 13 10月, 2012 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 01 10月, 2012 2 次提交
-
-
由 Al Viro 提交于
32bit wrapper is lost on that; 64bit one is *not*, since we need to arrange for full pt_regs on stack when we call sys_execve() and we need to load callee-saved ones from there afterwards. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 26 9月, 2012 2 次提交
-
-
由 Frederic Weisbecker 提交于
This way we can exit the RCU extended quiescent state before we schedule a new task from irq/exception exit. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
-
由 Tao Guo 提交于
GAS in binutils(2.16.91) could not parse parentheses within macro parameters unless fully parenthesized, and this is a workaround to make old gas work without generating below errors: arch/x86/kernel/entry_64.S: Assembler messages: arch/x86/kernel/entry_64.S:387: Error: too many positional arguments arch/x86/kernel/entry_64.S:389: Error: too many positional arguments [...] Signed-off-by: NTao Guo <glorioustao@gmail.com> Reluctantly-Acked-by: NJan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1348648102-12653-1-git-send-email-glorioustao@gmail.com [ Jan argues that these old GAS versions are fragile - which is so, but lets give them a chance. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 22 9月, 2012 1 次提交
-
-
由 H. Peter Anvin 提交于
When Supervisor Mode Access Prevention (SMAP) is enabled, access to userspace from the kernel is controlled by the AC flag. To make the performance of manipulating that flag acceptable, there are two new instructions, STAC and CLAC, to set and clear it. This patch adds those instructions, via alternative(), when the SMAP feature is enabled. It also adds X86_EFLAGS_AC unconditionally to the SYSCALL entry mask; there is simply no reason to make that one conditional. Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1348256595-29119-9-git-send-email-hpa@linux.intel.com
-
- 14 9月, 2012 1 次提交
-
-
由 Steven Rostedt 提交于
Allow ftrace handlers to change RIP register (regs->ip) in handlers. This will allow handlers to call another function instead of original function. Link: http://lkml.kernel.org/r/20120905143118.10329.5078.stgit@localhost.localdomain Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 13 9月, 2012 1 次提交
-
-
由 Ian Campbell 提交于
On 64 bit x86 we save the current eflags in cpu_init for use in ret_from_fork. Strictly speaking reserved bits in EFLAGS should be read as written but in practise it is unlikely that EFLAGS could ever be extended in this way and the kernel alread clears any undefined flags early on. The equivalent 32 bit code simply hard codes 0x0202 as the new EFLAGS. This change makes 64 bit use the same mechanism to setup the initial EFLAGS on fork. Note that 64 bit resets EFLAGS before calling schedule_tail() as opposed to 32 bit which calls schedule_tail() first. Therefore the correct value for EFLAGS has opposite IF bit. Signed-off-by: NIan Campbell <ian.campbell@citrix.com> Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Acked-by: NAndi Kleen <ak@linux.intel.com> Acked-by: N"H. Peter Anvin" <hpa@zytor.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Pekka Enberg <penberg@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20120824195847.GA31628@moonSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 23 8月, 2012 1 次提交
-
-
由 Steven Rostedt 提交于
If the kernel is compiled with gcc 4.6.0 which supports -mfentry, then use that instead of mcount. With mcount, frame pointers are forced with the -pg option and we get something like: <can_vma_merge_before>: 55 push %rbp 48 89 e5 mov %rsp,%rbp 53 push %rbx 41 51 push %r9 e8 fe 6a 39 00 callq ffffffff81483d00 <mcount> 31 c0 xor %eax,%eax 48 89 fb mov %rdi,%rbx 48 89 d7 mov %rdx,%rdi 48 33 73 30 xor 0x30(%rbx),%rsi 48 f7 c6 ff ff ff f7 test $0xfffffffff7ffffff,%rsi With -mfentry, frame pointers are no longer forced and the call looks like this: <can_vma_merge_before>: e8 33 af 37 00 callq ffffffff81461b40 <__fentry__> 53 push %rbx 48 89 fb mov %rdi,%rbx 31 c0 xor %eax,%eax 48 89 d7 mov %rdx,%rdi 41 51 push %r9 48 33 73 30 xor 0x30(%rbx),%rsi 48 f7 c6 ff ff ff f7 test $0xfffffffff7ffffff,%rsi This adds the ftrace hook at the beginning of the function before a frame is set up, and allows the function callbacks to be able to access parameters. As kprobes now can use function tracing (at least on x86) this speeds up the kprobe hooks that are at the beginning of the function. Link: http://lkml.kernel.org/r/20120807194100.130477900@goodmis.orgAcked-by: NIngo Molnar <mingo@kernel.org> Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 31 7月, 2012 1 次提交
-
-
由 Steven Rostedt 提交于
The graph caller is called by the mcount callers, which already does the check against the function_trace_stop variable. No reason to check it again. Link: http://lkml.kernel.org/r/20120711195745.588538769@goodmis.orgSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 20 7月, 2012 2 次提交
-
-
由 Steven Rostedt 提交于
Add a way to have different functions calling different trampolines. If a ftrace_ops wants regs saved on the return, then have only the functions with ops registered to save regs. Functions registered by other ops would not be affected, unless the functions overlap. If one ftrace_ops registered functions A, B and C and another ops registered fucntions to save regs on A, and D, then only functions A and D would be saving regs. Function B and C would work as normal. Although A is registered by both ops: normal and saves regs; this is fine as saving the regs is needed to satisfy one of the ops that calls it but the regs are ignored by the other ops function. x86_64 implements the full regs saving, and i386 just passes a NULL for regs to satisfy the ftrace_ops passing. Where an arch must supply both regs and ftrace_ops parameters, even if regs is just NULL. It is OK for an arch to pass NULL regs. All function trace users that require regs passing must add the flag FTRACE_OPS_FL_SAVE_REGS when registering the ftrace_ops. If the arch does not support saving regs then the ftrace_ops will fail to register. The flag FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED may be set that will prevent the ftrace_ops from failing to register. In this case, the handler may either check if regs is not NULL or check if ARCH_SUPPORTS_FTRACE_SAVE_REGS. If the arch supports passing regs it will set this macro and pass regs for ops that request them. All other archs will just pass NULL. Link: Link: http://lkml.kernel.org/r/20120711195745.107705970@goodmis.org Cc: Alexander van Heukelum <heukelum@fastmail.fm> Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
Currently the function trace callback receives only the ip and parent_ip of the function that it traced. It would be more powerful to also return the ops that registered the function as well. This allows the same function to act differently depending on what ftrace_ops registered it. Link: http://lkml.kernel.org/r/20120612225424.267254552@goodmis.orgReviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 28 6月, 2012 1 次提交
-
-
由 Alex Shi 提交于
There are 32 INVALIDATE_TLB_VECTOR now in kernel. That is quite big amount of vector in IDT. But it is still not enough, since modern x86 sever has more cpu number. That still causes heavy lock contention in TLB flushing. The patch using generic smp call function to replace it. That saved 32 vector number in IDT, and resolved the lock contention in TLB flushing on large system. In the NHM EX machine 4P * 8cores * HT = 64 CPUs, hackbench pthread has 3% performance increase. Signed-off-by: NAlex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-9-git-send-email-alex.shi@intel.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 07 6月, 2012 1 次提交
-
-
由 Steven Rostedt 提交于
Avi Kivity reported that page faults in NMIs could cause havic if the NMI preempted another page fault handler: The recent changes to NMI allow exceptions to take place in NMI handlers, but I think that a #PF (say, due to access to vmalloc space) is still problematic. Consider the sequence #PF (cr2 set by processor) NMI ... #PF (cr2 clobbered) do_page_fault() IRET ... IRET do_page_fault() address = read_cr2() The last line reads the overwritten cr2 value. Originally I wrote a patch to solve this by saving the cr2 on the stack. Brian Gerst suggested to save it in the r12 register as both r12 and rbx are saved by the do_nmi handler as required by the C standard. But rbx is already used for saving if swapgs needs to be run on exit of the NMI handler. Link: http://lkml.kernel.org/r/4FBB8C40.6080304@redhat.com Link: http://lkml.kernel.org/r/1337763411.13348.140.camel@gandalf.stny.rr.comReported-by: NAvi Kivity <avi@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Suggested-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 01 6月, 2012 1 次提交
-
-
由 Steven Rostedt 提交于
When both DYNAMIC_FTRACE and LOCKDEP are set, the TRACE_IRQS_ON/OFF will call into the lockdep code. The lockdep code can call lots of functions that may be traced by ftrace. When ftrace is updating its code and hits a breakpoint, the breakpoint handler will call into lockdep. If lockdep happens to call a function that also has a breakpoint attached, it will jump back into the breakpoint handler resetting the stack to the debug stack and corrupt the contents currently on that stack. The 'do_sym' call that calls do_int3() is protected by modifying the IST table to point to a different location if another breakpoint is hit. But the TRACE_IRQS_OFF/ON are outside that protection, and if a breakpoint is hit from those, the stack will get corrupted, and the kernel will crash: [ 1013.243754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000002 [ 1013.272665] IP: [<ffff880145cc0000>] 0xffff880145cbffff [ 1013.285186] PGD 1401b2067 PUD 14324c067 PMD 0 [ 1013.298832] Oops: 0010 [#1] PREEMPT SMP [ 1013.310600] CPU 2 [ 1013.317904] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 1013.401848] [ 1013.407399] Pid: 112, comm: kworker/2:1 Not tainted 3.4.0+ #30 [ 1013.437943] RIP: 8eb8:[<ffff88014630a000>] [<ffff88014630a000>] 0xffff880146309fff [ 1013.459871] RSP: ffffffff8165e919:ffff88014780f408 EFLAGS: 00010046 [ 1013.477909] RAX: 0000000000000001 RBX: ffffffff81104020 RCX: 0000000000000000 [ 1013.499458] RDX: ffff880148008ea8 RSI: ffffffff8131ef40 RDI: ffffffff82203b20 [ 1013.521612] RBP: ffffffff81005751 R08: 0000000000000000 R09: 0000000000000000 [ 1013.543121] R10: ffffffff82cdc318 R11: 0000000000000000 R12: ffff880145cc0000 [ 1013.564614] R13: ffff880148008eb8 R14: 0000000000000002 R15: ffff88014780cb40 [ 1013.586108] FS: 0000000000000000(0000) GS:ffff880148000000(0000) knlGS:0000000000000000 [ 1013.609458] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1013.627420] CR2: 0000000000000002 CR3: 0000000141f10000 CR4: 00000000001407e0 [ 1013.649051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1013.670724] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1013.692376] Process kworker/2:1 (pid: 112, threadinfo ffff88013fe0e000, task ffff88014020a6a0) [ 1013.717028] Stack: [ 1013.724131] ffff88014780f570 ffff880145cc0000 0000400000004000 0000000000000000 [ 1013.745918] cccccccccccccccc ffff88014780cca8 ffffffff811072bb ffffffff81651627 [ 1013.767870] ffffffff8118f8a7 ffffffff811072bb ffffffff81f2b6c5 ffffffff81f11bdb [ 1013.790021] Call Trace: [ 1013.800701] Code: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a <e7> d7 64 81 ff ff ff ff 01 00 00 00 00 00 00 00 65 d9 64 81 ff [ 1013.861443] RIP [<ffff88014630a000>] 0xffff880146309fff [ 1013.884466] RSP <ffff88014780f408> [ 1013.901507] CR2: 0000000000000002 The solution was to reuse the NMI functions that change the IDT table to make the debug stack keep its current stack (in kernel mode) when hitting a breakpoint: call debug_stack_set_zero TRACE_IRQS_ON call debug_stack_reset If the TRACE_IRQS_ON happens to hit a breakpoint then it will keep the current stack and not crash the box. Reported-by: NDave Jones <davej@redhat.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 21 4月, 2012 1 次提交
-
-
由 H. Peter Anvin 提交于
Remove open-coded exception table entries in arch/x86/kernel/entry_64.S, and replace them with _ASM_EXTABLE() macros; this will allow us to change the format and type of the exception table entries. Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Cc: David Daney <david.daney@cavium.com> Link: http://lkml.kernel.org/r/CA%2B55aFyijf43qSu3N9nWHEBwaGbb7T2Oq9A=9EyR=Jtyqfq_cQ@mail.gmail.com
-
- 27 2月, 2012 1 次提交
-
-
由 Mark Wielaard 提交于
Commit eab9e613 ("x86-64: Fix CFI data for interrupt frames") introduced a DW_CFA_def_cfa_expression in the SAVE_ARGS_IRQ macro. To later define the CFA using a simple register+offset rule both register and offset need to be supplied. Just using CFI_DEF_CFA_REGISTER leaves the offset undefined. So use CFI_DEF_CFA with reg+off explicitly at the end of common_interrupt. Signed-off-by: NMark Wielaard <mjw@redhat.com> Acked-by: NJan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/1330079527-30711-1-git-send-email-mjw@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 25 2月, 2012 1 次提交
-
-
由 Steven Rostedt 提交于
Some of the comments for the nesting NMI algorithm were stale and had some references to some prototypes that were first tried. I also updated the comments to be a little easier to understand the flow of the code. It definitely needs the documentation. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-