1. 26 6月, 2013 1 次提交
  2. 23 6月, 2013 1 次提交
  3. 21 6月, 2013 1 次提交
    • S
      x86, trace: Add irq vector tracepoints · cf910e83
      Seiji Aguchi 提交于
      [Purpose of this patch]
      
      As Vaibhav explained in the thread below, tracepoints for irq vectors
      are useful.
      
      http://www.spinics.net/lists/mm-commits/msg85707.html
      
      <snip>
      The current interrupt traces from irq_handler_entry and irq_handler_exit
      provide when an interrupt is handled.  They provide good data about when
      the system has switched to kernel space and how it affects the currently
      running processes.
      
      There are some IRQ vectors which trigger the system into kernel space,
      which are not handled in generic IRQ handlers.  Tracing such events gives
      us the information about IRQ interaction with other system events.
      
      The trace also tells where the system is spending its time.  We want to
      know which cores are handling interrupts and how they are affecting other
      processes in the system.  Also, the trace provides information about when
      the cores are idle and which interrupts are changing that state.
      <snip>
      
      On the other hand, my usecase is tracing just local timer event and
      getting a value of instruction pointer.
      
      I suggested to add an argument local timer event to get instruction pointer before.
      But there is another way to get it with external module like systemtap.
      So, I don't need to add any argument to irq vector tracepoints now.
      
      [Patch Description]
      
      Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
      But there is an above use case to trace specific irq_vector rather than tracing all events.
      In this case, we are concerned about overhead due to unwanted events.
      
      So, add following tracepoints instead of introducing irq_vector_entry/exit.
      so that we can enable them independently.
         - local_timer_vector
         - reschedule_vector
         - call_function_vector
         - call_function_single_vector
         - irq_work_entry_vector
         - error_apic_vector
         - thermal_apic_vector
         - threshold_apic_vector
         - spurious_apic_vector
         - x86_platform_ipi_vector
      
      Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty
      makes a zero when tracepoints are disabled. Detailed explanations are as follows.
       - Create trace irq handlers with entering_irq()/exiting_irq().
       - Create a new IDT, trace_idt_table, at boot time by adding a logic to
         _set_gate(). It is just a copy of original idt table.
       - Register the new handlers for tracpoints to the new IDT by introducing
         macros to alloc_intr_gate() called at registering time of irq_vector handlers.
       - Add checking, whether irq vector tracing is on/off, into load_current_idt().
         This has to be done below debug checking for these reasons.
         - Switching to debug IDT may be kicked while tracing is enabled.
         - On the other hands, switching to trace IDT is kicked only when debugging
           is disabled.
      
      In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being
      used for other purposes.
      Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      cf910e83
  4. 17 4月, 2013 1 次提交
  5. 13 2月, 2013 1 次提交
  6. 04 2月, 2013 3 次提交
  7. 24 1月, 2013 1 次提交
  8. 20 12月, 2012 2 次提交
  9. 01 12月, 2012 1 次提交
    • F
      context_tracking: New context tracking susbsystem · 91d1aa43
      Frederic Weisbecker 提交于
      Create a new subsystem that probes on kernel boundaries
      to keep track of the transitions between level contexts
      with two basic initial contexts: user or kernel.
      
      This is an abstraction of some RCU code that use such tracking
      to implement its userspace extended quiescent state.
      
      We need to pull this up from RCU into this new level of indirection
      because this tracking is also going to be used to implement an "on
      demand" generic virtual cputime accounting. A necessary step to
      shutdown the tick while still accounting the cputime.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      [ paulmck: fix whitespace error and email address. ]
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      91d1aa43
  10. 29 11月, 2012 1 次提交
  11. 21 11月, 2012 1 次提交
  12. 02 11月, 2012 1 次提交
    • S
      x86: Don't clobber top of pt_regs in nested NMI · 28696f43
      Salman Qazi 提交于
      The nested NMI modifies the place (instruction, flags and stack)
      that the first NMI will iret to.  However, the copy of registers
      modified is exactly the one that is the part of pt_regs in
      the first NMI.  This can change the behaviour of the first NMI.
      
      In particular, Google's arch_trigger_all_cpu_backtrace handler
      also prints regions of memory surrounding addresses appearing in
      registers.  This results in handled exceptions, after which nested NMIs
      start coming in.  These nested NMIs change the value of registers
      in pt_regs.  This can cause the original NMI handler to produce
      incorrect output.
      
      We solve this problem by interchanging the position of the preserved
      copy of the iret registers ("saved") and the copy subject to being
      trampled by nested NMI ("copied").
      
      Link: http://lkml.kernel.org/r/20121002002919.27236.14388.stgit@dungbeetle.mtv.corp.google.comSigned-off-by: NSalman Qazi <sqazi@google.com>
      [ Added a needed CFI_ADJUST_CFA_OFFSET ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      28696f43
  13. 20 10月, 2012 1 次提交
    • D
      xen/x86: don't corrupt %eip when returning from a signal handler · a349e23d
      David Vrabel 提交于
      In 32 bit guests, if a userspace process has %eax == -ERESTARTSYS
      (-512) or -ERESTARTNOINTR (-513) when it is interrupted by an event
      /and/ the process has a pending signal then %eip (and %eax) are
      corrupted when returning to the main process after handling the
      signal.  The application may then crash with SIGSEGV or a SIGILL or it
      may have subtly incorrect behaviour (depending on what instruction it
      returned to).
      
      The occurs because handle_signal() is incorrectly thinking that there
      is a system call that needs to restarted so it adjusts %eip and %eax
      to re-execute the system call instruction (even though user space had
      not done a system call).
      
      If %eax == -514 (-ERESTARTNOHAND (-514) or -ERESTART_RESTARTBLOCK
      (-516) then handle_signal() only corrupted %eax (by setting it to
      -EINTR).  This may cause the application to crash or have incorrect
      behaviour.
      
      handle_signal() assumes that regs->orig_ax >= 0 means a system call so
      any kernel entry point that is not for a system call must push a
      negative value for orig_ax.  For example, for physical interrupts on
      bare metal the inverse of the vector is pushed and page_fault() sets
      regs->orig_ax to -1, overwriting the hardware provided error code.
      
      xen_hypervisor_callback() was incorrectly pushing 0 for orig_ax
      instead of -1.
      
      Classic Xen kernels pushed %eax which works as %eax cannot be both
      non-negative and -RESTARTSYS (etc.), but using -1 is consistent with
      other non-system call entry points and avoids some of the tests in
      handle_signal().
      
      There were similar bugs in xen_failsafe_callback() of both 32 and
      64-bit guests. If the fault was corrected and the normal return path
      was used then 0 was incorrectly pushed as the value for orig_ax.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NJan Beulich <JBeulich@suse.com>
      Acked-by: NIan Campbell <ian.campbell@citrix.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      a349e23d
  14. 13 10月, 2012 1 次提交
  15. 01 10月, 2012 2 次提交
  16. 26 9月, 2012 2 次提交
    • F
      x86: Use the new schedule_user API on userspace preemption · 0430499c
      Frederic Weisbecker 提交于
      This way we can exit the RCU extended quiescent state before
      we schedule a new task from irq/exception exit.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      0430499c
    • T
      x86_64: Work around old GAS bug · 1b2b23d8
      Tao Guo 提交于
      GAS in binutils(2.16.91) could not parse parentheses within
      macro parameters unless fully parenthesized, and this is a
      workaround to make old gas work without generating below errors:
      
       arch/x86/kernel/entry_64.S: Assembler messages:
       arch/x86/kernel/entry_64.S:387: Error: too many positional arguments
       arch/x86/kernel/entry_64.S:389: Error: too many positional arguments
       [...]
      Signed-off-by: NTao Guo <glorioustao@gmail.com>
      Reluctantly-Acked-by: NJan Beulich <jbeulich@novell.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1348648102-12653-1-git-send-email-glorioustao@gmail.com
      [ Jan argues that these old GAS versions are fragile - which is so, but lets give them a chance. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1b2b23d8
  17. 22 9月, 2012 1 次提交
  18. 14 9月, 2012 1 次提交
  19. 13 9月, 2012 1 次提交
  20. 23 8月, 2012 1 次提交
    • S
      ftrace/x86: Add support for -mfentry to x86_64 · d57c5d51
      Steven Rostedt 提交于
      If the kernel is compiled with gcc 4.6.0 which supports -mfentry,
      then use that instead of mcount.
      
      With mcount, frame pointers are forced with the -pg option and we
      get something like:
      
      <can_vma_merge_before>:
             55                      push   %rbp
             48 89 e5                mov    %rsp,%rbp
             53                      push   %rbx
             41 51                   push   %r9
             e8 fe 6a 39 00          callq  ffffffff81483d00 <mcount>
             31 c0                   xor    %eax,%eax
             48 89 fb                mov    %rdi,%rbx
             48 89 d7                mov    %rdx,%rdi
             48 33 73 30             xor    0x30(%rbx),%rsi
             48 f7 c6 ff ff ff f7    test   $0xfffffffff7ffffff,%rsi
      
      With -mfentry, frame pointers are no longer forced and the call looks
      like this:
      
      <can_vma_merge_before>:
             e8 33 af 37 00          callq  ffffffff81461b40 <__fentry__>
             53                      push   %rbx
             48 89 fb                mov    %rdi,%rbx
             31 c0                   xor    %eax,%eax
             48 89 d7                mov    %rdx,%rdi
             41 51                   push   %r9
             48 33 73 30             xor    0x30(%rbx),%rsi
             48 f7 c6 ff ff ff f7    test   $0xfffffffff7ffffff,%rsi
      
      This adds the ftrace hook at the beginning of the function before a
      frame is set up, and allows the function callbacks to be able to access
      parameters. As kprobes now can use function tracing (at least on x86)
      this speeds up the kprobe hooks that are at the beginning of the
      function.
      
      Link: http://lkml.kernel.org/r/20120807194100.130477900@goodmis.orgAcked-by: NIngo Molnar <mingo@kernel.org>
      Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      d57c5d51
  21. 31 7月, 2012 1 次提交
  22. 20 7月, 2012 2 次提交
    • S
      ftrace/x86: Add separate function to save regs · 08f6fba5
      Steven Rostedt 提交于
      Add a way to have different functions calling different trampolines.
      If a ftrace_ops wants regs saved on the return, then have only the
      functions with ops registered to save regs. Functions registered by
      other ops would not be affected, unless the functions overlap.
      
      If one ftrace_ops registered functions A, B and C and another ops
      registered fucntions to save regs on A, and D, then only functions
      A and D would be saving regs. Function B and C would work as normal.
      Although A is registered by both ops: normal and saves regs; this is fine
      as saving the regs is needed to satisfy one of the ops that calls it
      but the regs are ignored by the other ops function.
      
      x86_64 implements the full regs saving, and i386 just passes a NULL
      for regs to satisfy the ftrace_ops passing. Where an arch must supply
      both regs and ftrace_ops parameters, even if regs is just NULL.
      
      It is OK for an arch to pass NULL regs. All function trace users that
      require regs passing must add the flag FTRACE_OPS_FL_SAVE_REGS when
      registering the ftrace_ops. If the arch does not support saving regs
      then the ftrace_ops will fail to register. The flag
      FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED may be set that will prevent the
      ftrace_ops from failing to register. In this case, the handler may
      either check if regs is not NULL or check if ARCH_SUPPORTS_FTRACE_SAVE_REGS.
      If the arch supports passing regs it will set this macro and pass regs
      for ops that request them. All other archs will just pass NULL.
      
      Link: Link: http://lkml.kernel.org/r/20120711195745.107705970@goodmis.org
      
      Cc: Alexander van Heukelum <heukelum@fastmail.fm>
      Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      08f6fba5
    • S
      ftrace: Pass ftrace_ops as third parameter to function trace callback · 2f5f6ad9
      Steven Rostedt 提交于
      Currently the function trace callback receives only the ip and parent_ip
      of the function that it traced. It would be more powerful to also return
      the ops that registered the function as well. This allows the same function
      to act differently depending on what ftrace_ops registered it.
      
      Link: http://lkml.kernel.org/r/20120612225424.267254552@goodmis.orgReviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2f5f6ad9
  23. 28 6月, 2012 1 次提交
  24. 07 6月, 2012 1 次提交
  25. 01 6月, 2012 1 次提交
    • S
      ftrace/x86: Do not change stacks in DEBUG when calling lockdep · 5963e317
      Steven Rostedt 提交于
      When both DYNAMIC_FTRACE and LOCKDEP are set, the TRACE_IRQS_ON/OFF
      will call into the lockdep code. The lockdep code can call lots of
      functions that may be traced by ftrace. When ftrace is updating its
      code and hits a breakpoint, the breakpoint handler will call into
      lockdep. If lockdep happens to call a function that also has a breakpoint
      attached, it will jump back into the breakpoint handler resetting
      the stack to the debug stack and corrupt the contents currently on
      that stack.
      
      The 'do_sym' call that calls do_int3() is protected by modifying the
      IST table to point to a different location if another breakpoint is
      hit. But the TRACE_IRQS_OFF/ON are outside that protection, and if
      a breakpoint is hit from those, the stack will get corrupted, and
      the kernel will crash:
      
      [ 1013.243754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
      [ 1013.272665] IP: [<ffff880145cc0000>] 0xffff880145cbffff
      [ 1013.285186] PGD 1401b2067 PUD 14324c067 PMD 0
      [ 1013.298832] Oops: 0010 [#1] PREEMPT SMP
      [ 1013.310600] CPU 2
      [ 1013.317904] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
      [ 1013.401848]
      [ 1013.407399] Pid: 112, comm: kworker/2:1 Not tainted 3.4.0+ #30
      [ 1013.437943] RIP: 8eb8:[<ffff88014630a000>]  [<ffff88014630a000>] 0xffff880146309fff
      [ 1013.459871] RSP: ffffffff8165e919:ffff88014780f408  EFLAGS: 00010046
      [ 1013.477909] RAX: 0000000000000001 RBX: ffffffff81104020 RCX: 0000000000000000
      [ 1013.499458] RDX: ffff880148008ea8 RSI: ffffffff8131ef40 RDI: ffffffff82203b20
      [ 1013.521612] RBP: ffffffff81005751 R08: 0000000000000000 R09: 0000000000000000
      [ 1013.543121] R10: ffffffff82cdc318 R11: 0000000000000000 R12: ffff880145cc0000
      [ 1013.564614] R13: ffff880148008eb8 R14: 0000000000000002 R15: ffff88014780cb40
      [ 1013.586108] FS:  0000000000000000(0000) GS:ffff880148000000(0000) knlGS:0000000000000000
      [ 1013.609458] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1013.627420] CR2: 0000000000000002 CR3: 0000000141f10000 CR4: 00000000001407e0
      [ 1013.649051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1013.670724] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [ 1013.692376] Process kworker/2:1 (pid: 112, threadinfo ffff88013fe0e000, task ffff88014020a6a0)
      [ 1013.717028] Stack:
      [ 1013.724131]  ffff88014780f570 ffff880145cc0000 0000400000004000 0000000000000000
      [ 1013.745918]  cccccccccccccccc ffff88014780cca8 ffffffff811072bb ffffffff81651627
      [ 1013.767870]  ffffffff8118f8a7 ffffffff811072bb ffffffff81f2b6c5 ffffffff81f11bdb
      [ 1013.790021] Call Trace:
      [ 1013.800701] Code: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a <e7> d7 64 81 ff ff ff ff 01 00 00 00 00 00 00 00 65 d9 64 81 ff
      [ 1013.861443] RIP  [<ffff88014630a000>] 0xffff880146309fff
      [ 1013.884466]  RSP <ffff88014780f408>
      [ 1013.901507] CR2: 0000000000000002
      
      The solution was to reuse the NMI functions that change the IDT table to make the debug
      stack keep its current stack (in kernel mode) when hitting a breakpoint:
      
        call debug_stack_set_zero
        TRACE_IRQS_ON
        call debug_stack_reset
      
      If the TRACE_IRQS_ON happens to hit a breakpoint then it will keep the current stack
      and not crash the box.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      5963e317
  26. 21 4月, 2012 1 次提交
  27. 27 2月, 2012 1 次提交
  28. 25 2月, 2012 3 次提交
  29. 21 2月, 2012 4 次提交
    • S
      x86: Specify a size for the cmp in the NMI handler · a38449ef
      Steven Rostedt 提交于
      Linus noticed that the cmp used to check if the code segment is
      __KERNEL_CS or not did not specify a size. Perhaps it does not matter
      as H. Peter Anvin noted that user space can not set the bottom two
      bits of the %cs register. But it's best not to let the assembly choose
      and change things between different versions of gas, but instead just
      pick the size.
      
      Four bytes are used to compare the saved code segment against
      __KERNEL_CS. Perhaps this might mess up Xen, but we can fix that when
      the time comes.
      
      Also I noticed that there was another non-specified cmp that checks
      the special stack variable if it is 1 or 0. This too probably doesn't
      matter what cmp is used, but this patch uses cmpl just to make it non
      ambiguous.
      
      Link: http://lkml.kernel.org/r/CA+55aFxfAn9MWRgS3O5k2tqN5ys1XrhSFVO5_9ZAoZKDVgNfGA@mail.gmail.comSuggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a38449ef
    • H
      x32: Handle process creation · d1a797f3
      H. Peter Anvin 提交于
      Allow an x32 process to be started.
      Originally-by: NH. J. Lu <hjl.tools@gmail.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      d1a797f3
    • H
      x32: Signal-related system calls · c5a37394
      H. Peter Anvin 提交于
      x32 uses the 64-bit signal frame format, obviously, but there are some
      structures which mixes that with pointers or sizeof(long) types, as
      such we have to create a handful of system calls specific to x32.  By
      and large these are a mixture of the 64-bit and the compat system
      calls.
      Originally-by: NH. J. Lu <hjl.tools@gmail.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      c5a37394
    • H
      x32: Handle the x32 system call flag · fca460f9
      H. Peter Anvin 提交于
      x32 shares most system calls with x86-64, but unfortunately some
      subsystem (the input subsystem is the chief offender) which require
      is_compat() when operating with a 32-bit userspace.  The input system
      actually has text files in sysfs whose meaning is dependent on
      sizeof(long) in userspace!
      
      We could solve this by having two completely disjoint system call
      tables; requiring that each system call be duplicated.  This patch
      takes a different approach: we add a flag to the system call number;
      this flag doesn't affect the system call dispatch but requests compat
      treatment from affected subsystems for the duration of the system call.
      
      The change of cmpq to cmpl is safe since it immediately follows the
      and.
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      fca460f9