1. 16 1月, 2009 1 次提交
    • I
      percpu: add optimized generic percpu accessors · 6dbde353
      Ingo Molnar 提交于
      It is an optimization and a cleanup, and adds the following new
      generic percpu methods:
      
        percpu_read()
        percpu_write()
        percpu_add()
        percpu_sub()
        percpu_and()
        percpu_or()
        percpu_xor()
      
      and implements support for them on x86. (other architectures will fall
      back to a default implementation)
      
      The advantage is that for example to read a local percpu variable,
      instead of this sequence:
      
       return __get_cpu_var(var);
      
       ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
       ffffffff8102ca32:	81
       ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
       ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax
      
      We can get a single instruction by using the optimized variants:
      
       return percpu_read(var);
      
       ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax
      
      I also cleaned up the x86-specific APIs and made the x86 code use
      these new generic percpu primitives.
      
      tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
          * added percpu_and() for completeness's sake
          * made generic percpu ops atomic against preemption
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      6dbde353
  2. 04 1月, 2009 1 次提交
    • J
      x86: process_32.c fix style problems · befa9e78
      Jaswinder Singh Rajput 提交于
      Impact: cleanup
      
      Fix:
      
       WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
       WARNING: Use #include <linux/io.h> instead of <asm/io.h>
       WARNING: Use #include <linux/kdebug.h> instead of <asm/kdebug.h>
       WARNING: Use #include <linux/smp.h> instead of <asm/smp.h>
       ERROR: "foo * bar" should be "foo *bar"
       ERROR: trailing whitespace
       ERROR: spaces required around that ':' (ctx:WxO)
       ERROR: spaces required around that ':' (ctx:OxW)
      
       total: 7 errors, 4 warnings
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      befa9e78
  3. 20 12月, 2008 1 次提交
    • M
      x86, bts: add fork and exit handling · bf53de90
      Markus Metzger 提交于
      Impact: introduce new ptrace facility
      
      Add arch_ptrace_untrace() function that is called when the tracer
      detaches (either voluntarily or when the tracing task dies);
      ptrace_disable() is only called on a voluntary detach.
      
      Add ptrace_fork() and arch_ptrace_fork(). They are called when a
      traced task is forked.
      
      Clear DS and BTS related fields on fork.
      
      Release DS resources and reclaim memory in ptrace_untrace(). This
      releases resources already when the tracing task dies. We used to do
      that when the traced task dies.
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      bf53de90
  4. 12 12月, 2008 1 次提交
  5. 08 12月, 2008 1 次提交
    • F
      tracing/function-graph-tracer: introduce __notrace_funcgraph to filter special functions · 8b96f011
      Frederic Weisbecker 提交于
      Impact: trace more functions
      
      When the function graph tracer is configured, three more files are not
      traced to prevent only four functions to be traced. And this impacts the
      normal function tracer too.
      
      arch/x86/kernel/process_64/32.c:
      
      I had crashes when I let this file traced. After some debugging, I saw
      that the "current" task point was changed inside__swtich_to(), ie:
      "write_pda(pcurrent, next_p);" inside process_64.c Since the tracer store
      the original return address of the function inside current, we had
      crashes. Only __switch_to() has to be excluded from tracing.
      
      kernel/module.c and kernel/extable.c:
      
      Because of a function used internally by the function graph tracer:
      __kernel_text_address()
      
      To let the other functions inside these files to be traced, this patch
      introduces the __notrace_funcgraph function prefix which is __notrace if
      function graph tracer is configured and nothing if not.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8b96f011
  6. 13 10月, 2008 1 次提交
  7. 24 9月, 2008 1 次提交
  8. 23 9月, 2008 1 次提交
    • T
      x86: prevent stale state of c1e_mask across CPU offline/online · 4faac97d
      Thomas Gleixner 提交于
      Impact: hang which happens across CPU offline/online on AMD C1E systems.
      
      When a CPU goes offline then the corresponding bit in the broadcast
      mask is cleared. For AMD C1E enabled CPUs we do not reenable the
      broadcast when the CPU comes online again as we do not clear the
      corresponding bit in the c1e_mask, which keeps track which CPUs
      have been switched to broadcast already. So on those !$@#& machines
      we never switch back to broadcasting after a CPU offline/online cycle.
      
      Clear the bit when the CPU plays dead.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      4faac97d
  9. 17 9月, 2008 1 次提交
  10. 05 9月, 2008 1 次提交
  11. 25 8月, 2008 3 次提交
  12. 15 8月, 2008 1 次提交
  13. 22 7月, 2008 2 次提交
  14. 19 7月, 2008 1 次提交
    • T
      nohz: prevent tick stop outside of the idle loop · b8f8c3cf
      Thomas Gleixner 提交于
      Jack Ren and Eric Miao tracked down the following long standing
      problem in the NOHZ code:
      
      	scheduler switch to idle task
      	enable interrupts
      
      Window starts here
      
      	----> interrupt happens (does not set NEED_RESCHED)
      	      	irq_exit() stops the tick
      
      	----> interrupt happens (does set NEED_RESCHED)
      
      	return from schedule()
      	
      	cpu_idle(): preempt_disable();
      
      Window ends here
      
      The interrupts can happen at any point inside the race window. The
      first interrupt stops the tick, the second one causes the scheduler to
      rerun and switch away from idle again and we end up with the tick
      disabled.
      
      The fact that it needs two interrupts where the first one does not set
      NEED_RESCHED and the second one does made the bug obscure and extremly
      hard to reproduce and analyse. Kudos to Jack and Eric.
      
      Solution: Limit the NOHZ functionality to the idle loop to make sure
      that we can not run into such a situation ever again.
      
      cpu_idle()
      {
      	preempt_disable();
      
      	while(1) {
      		 tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
      		 			          are in the idle loop
      
      		 while (!need_resched())
      		       halt();
      
      		 tick_nohz_restart_sched_tick(); <- disables NOHZ mode
      		 preempt_enable_no_resched();
      		 schedule();
      		 preempt_disable();
      	}
      }
      
      In hindsight we should have done this forever, but ... 
      
      /me grabs a large brown paperbag.
      
      Debugged-by: Jack Ren <jack.ren@marvell.com>, 
      Debugged-by: Neric miao <eric.y.miao@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      b8f8c3cf
  15. 08 7月, 2008 1 次提交
  16. 19 6月, 2008 1 次提交
    • S
      x86: fix NULL pointer deref in __switch_to · 75118a82
      Suresh Siddha 提交于
      Patrick McHardy reported a crash:
      
      > > I get this oops once a day, its apparently triggered by something
      > > run by cron, but the process is a different one each time.
      > >
      > > Kernel is -git from yesterday shortly before the -rc6 release
      > > (last commit is the usb-2.6 merge, the x86 patches are missing),
      > > .config is attached.
      > >
      > > I'll retry with current -git, but the patches that have gone in
      > > since I last updated don't look related.
      > >
      > > [62060.043009] BUG: unable to handle kernel NULL pointer dereference at
      > > 000001ff
      > > [62060.043009] IP: [<c0102a9b>] __switch_to+0x2f/0x118
      > > [62060.043009] *pde = 00000000
      > > [62060.043009] Oops: 0002 [#1] PREEMPT
      
      Vegard Nossum analyzed it:
      
      > This decodes to
      >
      >    0:   0f ae 00                fxsave (%eax)
      >
      > so it's related to the floating-point context. This is the exact
      > location of the crash:
      >
      > $ addr2line -e arch/x86/kernel/process_32.o -i ab0
      > include/asm/i387.h:232
      > include/asm/i387.h:262
      > arch/x86/kernel/process_32.c:595
      >
      > ...so it looks like prev_task->thread.xstate->fxsave has become NULL.
      > Or maybe it never had any other value.
      
      Somehow (as described below) TS_USEDFPU is set but the fpu is not
      allocated or freed.
      
      Another possible FPU pre-emption issue with the sleazy FPU optimization
      which was benign before but not so anymore, with the dynamic FPU allocation
      patch.
      
      New task is getting exec'd and it is prempted at the below point.
      
      flush_thread() {
      	...
      	/*
      	* Forget coprocessor state..
      	*/
      	clear_fpu(tsk);
      		<----- Preemption point
      	clear_used_math();
      	...
      }
      
      Now when it context switches in again, as the used_math() is still set
      and fpu_counter can be > 5, we will do a math_state_restore() which sets
      the task's TS_USEDFPU. After it continues from the above preemption point
      it does clear_used_math() and much later free_thread_xstate().
      
      Now, at the next context switch, it is quite possible that xstate is
      null, used_math() is not set and TS_USEDFPU is still set. This will
      trigger unlazy_fpu() causing kernel oops.
      
      Fix this  by clearing tsk's fpu_counter before clearing task's fpu.
      Reported-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      75118a82
  17. 10 6月, 2008 2 次提交
  18. 04 6月, 2008 1 次提交
    • S
      x86, fpu: fix CONFIG_PREEMPT=y corruption of application's FPU stack · 870568b3
      Suresh Siddha 提交于
      Jürgen Mell reported an FPU state corruption bug under CONFIG_PREEMPT,
      and bisected it to commit v2.6.19-1363-gacc20761, "i386: add sleazy FPU
      optimization".
      
      Add tsk_used_math() checks to prevent calling math_state_restore()
      which can sleep in the case of !tsk_used_math(). This prevents
      making a blocking call in __switch_to().
      
      Apparently "fpu_counter > 5" check is not enough, as in some signal handling
      and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.
      
      It's a side effect though. This is the failing scenario:
      
      process 'A' in save_i387_ia32() just after clear_used_math()
      
      Got an interrupt and pre-empted out.
      
      At the next context switch to process 'A' again, kernel tries to restore
      the math state proactively and sees a fpu_counter > 0 and !tsk_used_math()
      
      This results in init_fpu() during the __switch_to()'s math_state_restore()
      
      And resulting in fpu corruption which will be saved/restored
      (save_i387_fxsave and restore_i387_fxsave) during the remaining
      part of the signal handling after the context switch.
      Bisected-by: NJürgen Mell <j.mell@t-online.de>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Tested-by: NJürgen Mell <j.mell@t-online.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
      870568b3
  19. 24 5月, 2008 1 次提交
    • S
      ftrace: trace preempt off critical timings · 6cd8a4bb
      Steven Rostedt 提交于
      Add preempt off timings. A lot of kernel core code is taken from the RT patch
      latency trace that was written by Ingo Molnar.
      
      This adds "preemptoff" and "preemptirqsoff" to /debugfs/tracing/available_tracers
      
      Now instead of just tracing irqs off, preemption off can be selected
      to be recorded.
      
      When this is selected, it shares the same files as irqs off timings.
      One can either trace preemption off, irqs off, or one or the other off.
      
      By echoing "preemptoff" into /debugfs/tracing/current_tracer, recording
      of preempt off only is performed. "irqsoff" will only record the time
      irqs are disabled, but "preemptirqsoff" will take the total time irqs
      or preemption are disabled. Runtime switching of these options is now
      supported by simpling echoing in the appropriate trace name into
      /debugfs/tracing/current_tracer.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6cd8a4bb
  20. 13 5月, 2008 2 次提交
    • A
      x86, ptrace: PEBS support, warning fix · 970e7250
      Andrew Morton 提交于
      arch/x86/kernel/process_32.c:566: warning: unused variable 'ds_next'
      arch/x86/kernel/process_32.c:566: warning: unused variable 'ds_prev'
      
      Cc: Markus Metzger <markus.t.metzger@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <juan.villacis@intel.com>
      Cc: stephane eranian <eranian@googlemail.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      970e7250
    • M
      x86, ptrace: PEBS support · 93fa7636
      Markus Metzger 提交于
      Polish the ds.h interface and add support for PEBS.
      
      Ds.c is meant to be the resource allocator for per-thread and per-cpu
      BTS and PEBS recording.
      It is used by ptrace/utrace to provide execution tracing of debugged tasks.
      It will be used by profilers (e.g. perfmon2).
      It may be used by kernel debuggers to provide a kernel execution trace.
      
      Changes in detail:
      - guard DS and ptrace by CONFIG macros
      - separate DS and BTS more clearly
      - simplify field accesses
      - add functions to manage PEBS buffers
      - add simple protection/allocation mechanism
      - added support for Atom
      
      Opens:
      - buffer overflow handling
        Currently, only circular buffers are supported. This is all we need
        for debugging. Profilers would want an overflow notification.
        This is planned to be added when perfmon2 is made to use the ds.h
        interface.
      - utrace intermediate layer
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      93fa7636
  21. 27 4月, 2008 1 次提交
    • P
      fix idle (arch, acpi and apm) and lockdep · 7f424a8b
      Peter Zijlstra 提交于
      OK, so 25-mm1 gave a lockdep error which made me look into this.
      
      The first thing that I noticed was the horrible mess; the second thing I
      saw was hacks like: 71e93d15
      
      The problem is that arch idle routines are somewhat inconsitent with
      their IRQ state handling and instead of fixing _that_, we go paper over
      the problem.
      
      So the thing I've tried to do is set a standard for idle routines and
      fix them all up to adhere to that. So the rules are:
      
        idle routines are entered with IRQs disabled
        idle routines will exit with IRQs enabled
      
      Nearly all already did this in one form or another.
      
      Merge the 32 and 64 bit bits so they no longer have different bugs.
      
      As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
      irq-enable.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: NBob Copeland <me@bobcopeland.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7f424a8b
  22. 25 4月, 2008 1 次提交
  23. 20 4月, 2008 4 次提交
  24. 17 4月, 2008 5 次提交
  25. 11 4月, 2008 1 次提交
  26. 01 3月, 2008 1 次提交
  27. 09 2月, 2008 2 次提交