1. 03 Feb, 2012 · 1 commit
  2. 12 Dec, 2011 · 2 commits
    • rcu: Fix early call to rcu_idle_enter() · 416eb33c
      Committed by Frederic Weisbecker
      On the irq exit path, tick_nohz_irq_exit() may raise a softirq,
      and that raise leads into the wakeup path and into
      select_task_rq_fair(), which uses RCU to iterate over the
      scheduler domains.
      
      This is an illegal use of RCU because we may be in an RCU
      extended quiescent state if we interrupted an RCU-idle
      window in the idle loop:
      
      [  132.978883] ===============================
      [  132.978883] [ INFO: suspicious RCU usage. ]
      [  132.978883] -------------------------------
      [  132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
      [  132.978883]
      [  132.978883] other info that might help us debug this:
      [  132.978883]
      [  132.978883]
      [  132.978883] rcu_scheduler_active = 1, debug_locks = 0
      [  132.978883] RCU used illegally from extended quiescent state!
      [  132.978883] 2 locks held by swapper/0:
      [  132.978883]  #0:  (&p->pi_lock){-.-.-.}, at: [<ffffffff8105a729>] try_to_wake_up+0x39/0x2f0
      [  132.978883]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8105556a>] select_task_rq_fair+0x6a/0xec0
      [  132.978883]
      [  132.978883] stack backtrace:
      [  132.978883] Pid: 0, comm: swapper Tainted: G        W   3.0.0+ #178
      [  132.978883] Call Trace:
      [  132.978883]  <IRQ>  [<ffffffff810a01f6>] lockdep_rcu_suspicious+0xe6/0x100
      [  132.978883]  [<ffffffff81055c49>] select_task_rq_fair+0x749/0xec0
      [  132.978883]  [<ffffffff8105556a>] ? select_task_rq_fair+0x6a/0xec0
      [  132.978883]  [<ffffffff812fe494>] ? do_raw_spin_lock+0x54/0x150
      [  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
      [  132.978883]  [<ffffffff8105a7c3>] try_to_wake_up+0xd3/0x2f0
      [  132.978883]  [<ffffffff81094f98>] ? ktime_get+0x68/0xf0
      [  132.978883]  [<ffffffff8105aa35>] wake_up_process+0x15/0x20
      [  132.978883]  [<ffffffff81069dd5>] raise_softirq_irqoff+0x65/0x110
      [  132.978883]  [<ffffffff8108eb65>] __hrtimer_start_range_ns+0x415/0x5a0
      [  132.978883]  [<ffffffff812fe3ee>] ? do_raw_spin_unlock+0x5e/0xb0
      [  132.978883]  [<ffffffff8108ed08>] hrtimer_start+0x18/0x20
      [  132.978883]  [<ffffffff8109c9c3>] tick_nohz_stop_sched_tick+0x393/0x450
      [  132.978883]  [<ffffffff810694f2>] irq_exit+0xd2/0x100
      [  132.978883]  [<ffffffff81829e96>] do_IRQ+0x66/0xe0
      [  132.978883]  [<ffffffff81820d53>] common_interrupt+0x13/0x13
      [  132.978883]  <EOI>  [<ffffffff8103434b>] ? native_safe_halt+0xb/0x10
      [  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
      [  132.978883]  [<ffffffff810144ea>] default_idle+0xba/0x370
      [  132.978883]  [<ffffffff810147fe>] amd_e400_idle+0x5e/0x130
      [  132.978883]  [<ffffffff8100a9f6>] cpu_idle+0xb6/0x120
      [  132.978883]  [<ffffffff817f217f>] rest_init+0xef/0x150
      [  132.978883]  [<ffffffff817f20e2>] ? rest_init+0x52/0x150
      [  132.978883]  [<ffffffff81ed9cf3>] start_kernel+0x3da/0x3e5
      [  132.978883]  [<ffffffff81ed9346>] x86_64_start_reservations+0x131/0x135
      [  132.978883]  [<ffffffff81ed944d>] x86_64_start_kernel+0x103/0x112
      
      Fix this by calling rcu_idle_enter() after tick_nohz_irq_exit().
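      
      A hedged sketch of the corrected ordering described above (the
      surrounding irq_exit() logic is elided and the exact mainline diff
      may differ):
      
       /*
        * Illustrative only: the nohz work that can raise a softirq must
        * run while RCU is still watching; only afterwards may this CPU
        * enter the RCU extended quiescent state.
        */
       void irq_exit(void)
       {
               /* ... account time, drop preempt count, run softirqs ... */
               if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
                       tick_nohz_irq_exit();   /* may raise softirq -> wakeup -> RCU use */
               rcu_idle_enter();               /* safe: all RCU users above are done */
       }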
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • nohz: Separate out irq exit and idle loop dyntick logic · 280f0677
      Committed by Frederic Weisbecker
      The tick_nohz_stop_sched_tick() function, which tries to delay
      the next timer tick as long as possible, can be called from two
      places:
      
      - From the idle loop, to start the dyntick idle mode
      - From interrupt exit, if we have interrupted the dyntick
      idle mode, so that we reprogram the next tick event in
      case the irq changed some internal state that requires this
      action.
      
      There are only a few minor differences between the two cases, all
      handled by that function and driven by the ts->inidle per-cpu
      variable and the inidle parameter. Together these guarantee
      that we only update the dyntick mode on irq exit if we actually
      interrupted the dyntick idle mode, and that we enter the RCU extended
      quiescent state from idle loop entry only.
      
      Split this function into:
      
      - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
      dynticks idle mode unconditionally if it can, and enters the RCU
      extended quiescent state.
      
      - tick_nohz_irq_exit(), which only updates the dynticks idle mode
      when ts->inidle is set (i.e., if tick_nohz_idle_enter() has been called).
      
      To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
      to tick_nohz_idle_exit().
      
      This simplifies the code and micro-optimizes the irq exit path (no need
      for local_irq_save() there). It also prepares for the split between the
      dynticks and RCU extended quiescent state logic, which we'll need to
      further fix illegal uses of RCU in extended quiescent states in the idle
      loop.
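      
      As a hedged sketch (the loop body and cpu_sleep() are hypothetical
      placeholders, not from this patch), an architecture idle loop pairs
      the split API like this:
      
       void cpu_idle(void)
       {
               while (1) {
                       /* sets ts->inidle, stops the tick, enters RCU extended QS */
                       tick_nohz_idle_enter();
                       while (!need_resched())
                               cpu_sleep();    /* irqs taken here end in tick_nohz_irq_exit() */
                       /* restarts the tick, leaves RCU extended QS */
                       tick_nohz_idle_exit();
                       schedule();
               }
       }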
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
  3. 31 Oct, 2011 · 1 commit
  4. 21 Jul, 2011 · 1 commit
    • softirq,rcu: Inform RCU of irq_exit() activity · ec433f0c
      Committed by Peter Zijlstra
      The rcu_read_unlock_special() function relies on in_irq() to exclude
      scheduler activity from interrupt level.  This fails because irq_exit()
      can invoke the scheduler after clearing the preempt_count() bits that
      in_irq() uses to determine that it is at interrupt level.  This situation
      can result in failures as follows:
      
       $task			IRQ		SoftIRQ
      
       rcu_read_lock()
      
       /* do stuff */
      
       <preempt> |= UNLOCK_BLOCKED
      
       rcu_read_unlock()
         --t->rcu_read_lock_nesting
      
      			irq_enter();
      			/* do stuff, don't use RCU */
      			irq_exit();
      			  sub_preempt_count(IRQ_EXIT_OFFSET);
      			  invoke_softirq()
      
      					ttwu();
      					  spin_lock_irq(&pi->lock)
      					  rcu_read_lock();
      					  /* do stuff */
      					  rcu_read_unlock();
      					    rcu_read_unlock_special()
      					      rcu_report_exp_rnp()
      					        ttwu()
      					          spin_lock_irq(&pi->lock) /* deadlock */
      
         rcu_read_unlock_special(t);
      
      Ed can trigger this 'easily' because invoke_softirq() immediately
      does a ttwu() of ksoftirqd/# instead of doing the in-place softirq stuff
      first, but even without that the above happens.
      
      Cure this by also excluding softirqs from the
      rcu_read_unlock_special() handler and ensuring the force_irqthreads
      ksoftirqd/# wakeup is done from full softirq context.
      
      [ Alternatively, delaying the ->rcu_read_lock_nesting decrement
        until after the special handling would make the thing more robust
        in the face of interrupts as well.  And there is a separate patch
        for that. ]
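      
      A hedged sketch of the exclusion described above (the exact condition
      and its surroundings in mainline may differ):
      
       /* Treat both hardirq and softirq-serving context as unsafe for the
        * wakeup-capable slow path of rcu_read_unlock_special(): a ttwu()
        * from here could deadlock on pi->lock as in the trace above. */
       if (in_irq() || in_serving_softirq()) {
               /* defer the special handling to a safe context */
               local_irq_restore(flags);
               return;
       }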
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Reported-and-tested-by: Ed Tomlinson <edt@aei.ca>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  5. 15 Jun, 2011 · 1 commit
    • rcu: Use softirq to address performance regression · 09223371
      Committed by Shaohua Li
      Commit a26ac245 (rcu: move TREE_RCU from softirq to kthread)
      introduced a performance regression. In an AIM7 test, this commit degraded
      performance by about 40%.
      
      The commit runs rcu callbacks in a kthread instead of softirq. We observed
      a high rate of context switches caused by this. Our test system has
      64 CPUs and HZ is 1000, so we saw more than 64k context switches per second
      caused by RCU's per-CPU kthread.  A trace showed that most of
      the time the RCU per-CPU kthread doesn't actually handle any callbacks,
      but instead just does a very small amount of work handling grace periods.
      This means that RCU's per-CPU kthreads are making the scheduler do quite
      a bit of work in order to allow a very small amount of RCU-related
      processing to be done.
      
      Alex Shi's analysis determined that this slowdown is due to lock
      contention within the scheduler.  Unfortunately, as Peter Zijlstra points
      out, the scheduler's real-time semantics require global action, which
      means that this contention is inherent in real-time scheduling.  (Yes,
      perhaps someone will come up with a workaround -- otherwise, -rt is not
      going to do well on large SMP systems -- but this patch will work around
      this issue in the meantime.  And "the meantime" might well be forever.)
      
      This patch therefore re-introduces softirq processing to RCU, but only
      for core RCU work.  RCU callbacks are still executed in kthread context,
      so that only a small amount of RCU work runs in softirq context in the
      common case.  This should minimize ksoftirqd execution, allowing us to
      skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.
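      
      A hedged sketch of the split described above (the helper name follows
      mainline around this change; treat the details as illustrative):
      
       /* Core grace-period work goes back to softirq: raising RCU_SOFTIRQ
        * is cheap and needs no context switch, unlike waking the per-CPU
        * kthread.  Callback invocation stays in kthread context. */
       static void invoke_rcu_core(void)
       {
               raise_softirq(RCU_SOFTIRQ);
       }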
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Tested-by: "Alex,Shi" <alex.shi@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  6. 06 May, 2011 · 1 commit
  7. 31 Mar, 2011 · 1 commit
  8. 23 Mar, 2011 · 1 commit
  9. 26 Feb, 2011 · 1 commit
    • genirq: Provide forced interrupt threading · 8d32a307
      Committed by Thomas Gleixner
      Add a command-line parameter "threadirqs" which forces all interrupts except
      those marked IRQF_NO_THREAD to run threaded. It is mostly a debug option to
      allow retrieving better debug data from crashing interrupt handlers. If
      "threadirqs" is not enabled on the kernel command line, then there is no
      impact on the interrupt hotpath.
      
      Architecture code needs to select CONFIG_IRQ_FORCED_THREADING after
      marking the interrupts which can't be threaded IRQF_NO_THREAD. All
      interrupts which have IRQF_TIMER set are implicitly marked
      IRQF_NO_THREAD, and all PER_CPU interrupts are also excluded.
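      
      For illustration (the device name and handler below are hypothetical),
      a driver opts a critical interrupt out of forced threading like this:
      
       /* IRQF_NO_THREAD keeps this handler in hard interrupt context even
        * when the kernel is booted with "threadirqs". */
       ret = request_irq(irq, my_critical_handler, IRQF_NO_THREAD,
                         "my-critical-dev", dev);
       if (ret)
               return ret;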
      
      Forced threading of hard interrupts also forces all soft interrupt
      handling into thread context.
      
      When enabled it might slow things down a bit, but for debugging problems
      in interrupt code it's a reasonable penalty, as it does not immediately
      crash and burn the machine when an interrupt handler is buggy.
      
      Some test results on a Core2Duo machine:
      
      Cache cold run of:
       # time git grep irq_desc
      
            non-threaded       threaded
       real 1m18.741s          1m19.061s
       user 0m1.874s           0m1.757s
       sys  0m5.843s           0m5.427s
      
       # iperf -c server
      non-threaded
      [  3]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
      [  3]  0.0-10.0 sec  1.09 GBytes   934 Mbits/sec
      [  3]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
      threaded
      [  3]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
      [  3]  0.0-10.0 sec  1.09 GBytes   934 Mbits/sec
      [  3]  0.0-10.0 sec  1.09 GBytes   937 Mbits/sec
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <20110223234956.772668648@linutronix.de>
  10. 09 Feb, 2011 · 1 commit
  11. 26 Jan, 2011 · 1 commit
  12. 14 Jan, 2011 · 1 commit
    • kernel: clean up USE_GENERIC_SMP_HELPERS · 351f8f8e
      Committed by Amerigo Wang
      An arch which needs USE_GENERIC_SMP_HELPERS has to select
      USE_GENERIC_SMP_HELPERS itself, rather than leaving the choice to the
      user, since it doesn't provide its own implementations.
      
      Also, move on_each_cpu() to kernel/smp.c; it is strange to put it in
      kernel/softirq.c.
      
      For an arch which doesn't use USE_GENERIC_SMP_HELPERS, e.g. blackfin,
      only on_each_cpu() is compiled.
      Signed-off-by: Amerigo Wang <amwang@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 07 Jan, 2011 · 1 commit
  14. 17 Dec, 2010 · 1 commit
  15. 23 Oct, 2010 · 1 commit
  16. 21 Oct, 2010 · 1 commit
    • tracing: Cleanup the convoluted softirq tracepoints · f4bc6bb2
      Committed by Thomas Gleixner
      With the addition of trace_softirq_raise() the softirq tracepoint got
      even more convoluted. Why the tracepoints take two pointers to assign
      an integer is beyond my comprehension.
      
      But adding an extra case which treats the first pointer as an unsigned
      long when the second pointer is NULL, including the back-and-forth
      type casting, is just horrible.
      
      Convert the softirq tracepoints to take a single unsigned int argument
      for the softirq vector number and fix the call sites.
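      
      A hedged sketch of a simplified call site after this change (the
      surrounding __do_softirq() loop is elided):
      
       /* The tracepoints now take the vector number directly. */
       unsigned int vec_nr = h - softirq_vec;  /* index of the current handler */
       
       trace_softirq_entry(vec_nr);
       h->action(h);
       trace_softirq_exit(vec_nr);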
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <alpine.LFD.2.00.1010191428560.6815@localhost6.localdomain6>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: mathieu.desnoyers@efficios.com
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
  17. 19 Oct, 2010 · 3 commits
    • sched: Call tick_check_idle before __irq_enter · d267f87f
      Committed by Venkatesh Pallipadi
      When the CPU is idle, irq_enter calls tick_check_idle() on the first
      interrupt to note the interruption from idle. But there is a problem if
      this call is done after __irq_enter, as all routines in __irq_enter may
      see stale time because tick_check_idle has not yet run.
      
      Specifically, this affects trace calls in __irq_enter when they use the
      global clock, and also the account_system_vtime change in this patch, as
      it wants to use sched_clock_cpu() to do proper irq timing.
      
      But, tick_check_idle was moved after __irq_enter intentionally to
      prevent problem of unneeded ksoftirqd wakeups by the commit ee5f80a9:
      
          irq: call __irq_enter() before calling the tick_idle_check
          Impact: avoid spurious ksoftirqd wakeups
      
      Moving tick_check_idle() before __irq_enter and wrapping it with
      local_bh_enable/disable solves both problems.
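      
      A hedged sketch of the resulting ordering (details illustrative):
      
       void irq_enter(void)
       {
               int cpu = smp_processor_id();
       
               rcu_irq_enter();
               if (idle_cpu(cpu) && !in_interrupt()) {
                       /* bh disabled around tick_check_idle() to keep the
                        * spurious ksoftirqd wakeups of ee5f80a9 away */
                       local_bh_disable();
                       tick_check_idle(cpu);
                       _local_bh_enable();
               }
               __irq_enter();  /* now sees fresh time */
       }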
      Fixed-by: Yong Zhang <yong.zhang0@gmail.com>
      Signed-off-by: Venkatesh Pallipadi <venki@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1286237003-12406-9-git-send-email-venki@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Add a PF flag for ksoftirqd identification · 6cdd5199
      Committed by Venkatesh Pallipadi
      To account softirq time cleanly in the scheduler, we need to identify
      whether a softirq is invoked in ksoftirqd context or at the tail of a
      hardirq. Add PF_KSOFTIRQD for that purpose.
      
      As all PF flag bits are currently taken, create space by moving one of the
      infrequently used bits (PF_THREAD_BOUND) down in task_struct to be along
      with some other state fields.
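      
      A hedged sketch of how the flag is set and consulted (the accounting
      helper below is hypothetical):
      
       /* In the ksoftirqd thread function: mark this task once. */
       current->flags |= PF_KSOFTIRQD;
       
       /* Later, in accounting code: distinguish the two softirq contexts. */
       if (current->flags & PF_KSOFTIRQD)
               account_to_ksoftirqd();  /* hypothetical accounting helper */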
      Signed-off-by: Venkatesh Pallipadi <venki@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1286237003-12406-4-git-send-email-venki@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Fix softirq time accounting · 75e1056f
      Committed by Venkatesh Pallipadi
      Peter Zijlstra found a bug in the way softirq time is accounted in
      VIRT_CPU_ACCOUNTING on this thread:
      
         http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html
      
      The problem is that softirq processing uses local_bh_disable internally.
      There is no way, later in the flow, to tell whether a softirq is actually
      being processed or bh has merely been disabled. So a hardirq arriving while
      bh is disabled results in time being wrongly accounted as softirq.
      
      Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING
      as well, as account_system_time() in normal tick-based accounting also uses
      softirq_count(), which will be set even when not in softirq with bh disabled.
      
      Peter also suggested the solution of using 2*SOFTIRQ_OFFSET as the irq count
      for local_bh_{disable,enable} and just SOFTIRQ_OFFSET while processing a
      softirq. The patch below does that and adds the API in_serving_softirq(),
      which returns whether we are currently processing a softirq.
      
      Also changes one of the usages of softirq_count in net/sched/cls_cgroup.c
      to in_serving_softirq.
      
      It looks like many usages of in_softirq really want in_serving_softirq.
      Those changes can be made individually on a case-by-case basis.
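      
      Under this scheme the distinction is a single bit of the softirq
      count; a minimal sketch matching the names above (details may differ):
      
       /* local_bh_disable() adds 2*SOFTIRQ_OFFSET, while actually serving a
        * softirq adds exactly SOFTIRQ_OFFSET, so the low softirq bit is set
        * only while a handler is running. */
       #define in_softirq()            (softirq_count())
       #define in_serving_softirq()    (softirq_count() & SOFTIRQ_OFFSET)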
      Signed-off-by: Venkatesh Pallipadi <venki@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1286237003-12406-2-git-send-email-venki@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  18. 12 Oct, 2010 · 2 commits
  19. 22 Sep, 2010 · 1 commit
  20. 05 Jun, 2010 · 1 commit
  21. 28 May, 2010 · 1 commit
  22. 11 May, 2010 · 1 commit
    • rcu: refactor RCU's context-switch handling · 25502a6c
      Committed by Paul E. McKenney
      The addition of preemptible RCU to treercu resulted in a bit of
      confusion and inefficiency surrounding the handling of context switches
      for RCU-sched and for RCU-preempt.  For RCU-sched, a context switch
      is a quiescent state, pure and simple, just like it always has been.
      For RCU-preempt, a context switch is in no way a quiescent state, but
      special handling is required when a task blocks in an RCU read-side
      critical section.
      
      However, the callout from the scheduler and the outer loop in ksoftirqd
      still call something named rcu_sched_qs(), whose name is no longer
      accurate.  Furthermore, when rcu_check_callbacks() notes an RCU-sched
      quiescent state, it ends up unnecessarily (though harmlessly, aside
      from the performance hit) enqueuing the current task if it happens to
      be running in an RCU-preempt read-side critical section.  This not only
      increases the maximum latency of scheduler_tick(), it also needlessly
      increases the overhead of the next outermost rcu_read_unlock() invocation.
      
      This patch addresses this situation by separating the notion of RCU's
      context-switch handling from that of RCU-sched's quiescent states.
      The context-switch handling is covered by rcu_note_context_switch() in
      general and by rcu_preempt_note_context_switch() for preemptible RCU.
      This permits rcu_sched_qs() to handle quiescent states and only quiescent
      states.  It also reduces the maximum latency of scheduler_tick(), though
      probably by much less than a microsecond.  Finally, it means that tasks
      within preemptible-RCU read-side critical sections avoid incurring the
      overhead of queuing unless there really is a context switch.
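      
      A hedged sketch of the separation (function names as in the text
      above; the bodies are illustrative):
      
       /* A context switch is now handled apart from RCU-sched QS detection. */
       void rcu_note_context_switch(int cpu)
       {
               rcu_sched_qs(cpu);                      /* for RCU-sched it is a QS */
               rcu_preempt_note_context_switch(cpu);   /* RCU-preempt: a task may be
                                                        * blocking in a read-side
                                                        * critical section */
       }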
      Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
  23. 04 Feb, 2010 · 1 commit
  24. 02 Nov, 2009 · 1 commit
    • rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls · c5e0cb3d
      Committed by Lai Jiangshan
      Currently, rcu_irq_exit() is invoked only for CONFIG_NO_HZ,
      while rcu_irq_enter() is invoked unconditionally.  This patch
      moves rcu_irq_exit() out from under CONFIG_NO_HZ so that the
      calls are balanced.
      
      This patch has no effect on the behavior of the kernel because
      both rcu_irq_enter() and rcu_irq_exit() are empty for
      !CONFIG_NO_HZ, but the code is easier to understand if the calls
      are obviously balanced in all cases.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12567428891605-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  25. 29 Oct, 2009 · 1 commit
    • percpu: make percpu symbols under kernel/ and mm/ unique · 1871e52c
      Committed by Tejun Heo
      This patch updates percpu-related symbols under kernel/ and mm/ such
      that percpu symbols are unique and don't clash with local symbols.
      This serves the two purposes of decreasing the possibility of global
      percpu symbol collisions and of allowing the per_cpu__ prefix to be
      dropped from percpu symbols.
      
      * kernel/lockdep.c: s/lock_stats/cpu_lock_stats/
      
      * kernel/sched.c: s/init_rq_rt/init_rt_rq_var/	(any better idea?)
        		  s/sched_group_cpus/sched_groups/
      
      * kernel/softirq.c: s/ksoftirqd/run_ksoftirqd/
      
      * kernel/softlockup.c: s/(*)_timestamp/softlockup_\1_ts/
        		       s/watchdog_task/softlockup_watchdog/
      		       s/timestamp/ts/ for local variables
      
      * kernel/time/timer_stats: s/lookup_lock/tstats_lookup_lock/
      
      * mm/slab.c: s/reap_work/slab_reap_work/
        	     s/reap_node/slab_reap_node/
      
      * mm/vmstat.c: local variable changed to avoid collision with vmstat_work
      
      Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
      which cause name clashes" patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: (slab/vmstat) Christoph Lameter <cl@linux-foundation.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
  26. 18 Sep, 2009 · 1 commit
  27. 23 Aug, 2009 · 1 commit
    • rcu: Renamings to increase RCU clarity · d6714c22
      Committed by Paul E. McKenney
      Make RCU-sched, RCU-bh, and RCU-preempt be underlying
      implementations, with "RCU" defined in terms of one of the
      three.  Update the outdated rcu_qsctr_inc() names, as these
      functions no longer increment anything.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <12509746132696-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  28. 22 Jul, 2009 · 1 commit
    • softirq: introduce tasklet_hrtimer infrastructure · 9ba5f005
      Committed by Peter Zijlstra
      Commit ca109491 (hrtimer: removing all ur callback modes) moved all
      hrtimer callbacks into hard interrupt context when high resolution
      timers are active. That breaks code which relied on the assumption
      that the callback happens in softirq context.
      
      Provide a generic infrastructure which combines tasklets and hrtimers
      together to provide an in-softirq hrtimer experience.
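      
      A hedged usage sketch of the new API (the callback body and period
      are illustrative):
      
       static struct tasklet_hrtimer my_timer;
       
       /* Runs from the tasklet, i.e. in softirq context, restoring the
        * pre-ca109491 expectations of such callbacks. */
       static enum hrtimer_restart my_timer_fn(struct hrtimer *t)
       {
               /* ... softirq-safe work ... */
               return HRTIMER_NORESTART;
       }
       
       static void my_init(void)
       {
               tasklet_hrtimer_init(&my_timer, my_timer_fn,
                                    CLOCK_MONOTONIC, HRTIMER_MODE_REL);
               tasklet_hrtimer_start(&my_timer, ktime_set(0, 10 * NSEC_PER_MSEC),
                                     HRTIMER_MODE_REL);
       }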
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: torvalds@linux-foundation.org
      Cc: kaber@trash.net
      Cc: David Miller <davem@davemloft.net>
      LKML-Reference: <1248265724.27058.1366.camel@twins>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  29. 19 Jun, 2009 · 1 commit
    • softirq: introduce statistics for softirq · aa0ce5bb
      Committed by Keika Kobayashi
      Statistics for softirqs don't exist yet,
      though they would be as helpful as the statistics for interrupts.
      This patch introduces counting of the number of softirqs,
      which is exported in /proc/softirqs.
      
      When a softirq handler consumes much CPU time,
      /proc/stat looks like the following.
      
      $ while :; do  cat /proc/stat | head -n1 ; sleep 10 ; done
      cpu  88 0 408 739665 583 28 2 0 0
      cpu  450 0 1090 740970 594 28 1294 0 0
                                    ^^^^
                                   softirq
      
      In such a situation,
      /proc/softirqs shows us which softirq handlers are invoked.
      We can see the rate of increase of each softirq.
      
      <before>
      $ cat /proc/softirqs
                      CPU0       CPU1       CPU2       CPU3
      HI                 0          0          0          0
      TIMER         462850     462805     462782     462718
      NET_TX             0          0          0        365
      NET_RX          2472          2          2         40
      BLOCK              0          0        381       1164
      TASKLET            0          0          0        224
      SCHED         462654     462689     462698     462427
      RCU             3046       2423       3367       3173
      
      <after>
      $ cat /proc/softirqs
                      CPU0       CPU1       CPU2       CPU3
      HI                 0          0          0          0
      TIMER         463361     465077     465056     464991
      NET_TX            53          0          1        365
      NET_RX          3757          2          2         40
      BLOCK              0          0        398       1170
      TASKLET            0          0          0        224
      SCHED         463074     464318     464612     463330
      RCU             3505       2948       3947       3673
      
      When the softirq CPU time is high,
      the rates of increase are the following.
        TIMER  : 220/sec     : CPU1-3
        NET_TX : 5/sec       : CPU0
        NET_RX : 120/sec     : CPU0
        SCHED  : 40-200/sec  : all CPU
        RCU    : 45-58/sec   : all CPU
      
      The rates of increase in idle mode are the following.
        TIMER  : 250/sec
        SCHED  : 250/sec
        RCU    : 2/sec
      
      It seems that many softirqs for receiving packets and for RCU are invoked.
      This helps us check the system.
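      
      A hedged sketch of the counting hook (the helper name follows
      mainline around this change; placement is illustrative):
      
       /* In __do_softirq(), just before invoking each handler: bump the
        * per-CPU counter that /proc/softirqs reports. */
       kstat_incr_softirqs_this_cpu(h - softirq_vec);
       h->action(h);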
      Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp>
      Reviewed-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  30. 13 Jun, 2009 · 1 commit
    • tasklets: new tasklet scheduling function · 7c692cba
      Committed by Vegard Nossum
      Rationale: kmemcheck needs to be able to schedule a tasklet without
      touching any dynamically allocated memory _at_ _all_ (since that would
      lead to a recursive page fault). This tasklet is used for writing the
      error reports to the kernel log.
      
      The new scheduling function avoids touching any other tasklets by
      inserting the new tasklet at the head of the "tasklet_hi" list instead
      of at the tail.
      
      Also don't wake up the softirq thread lest the scheduler access some
      tracked memory and we go down with a recursive page fault.
      
      In this case, we'd better just wait for the maximum time of 1/HZ for the
      message to appear.
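      
      A hedged sketch of such a head insertion (in mainline this became
      __tasklet_hi_schedule_first(); the per-CPU accessors are of that era):
      
       /* Link the tasklet in at the head and raise HI_SOFTIRQ without waking
        * ksoftirqd -- no other tasklet and no scheduler state is touched. */
       void __tasklet_hi_schedule_first(struct tasklet_struct *t)
       {
               BUG_ON(!irqs_disabled());
       
               t->next = __get_cpu_var(tasklet_hi_vec).head;
               __get_cpu_var(tasklet_hi_vec).head = t;
               __raise_softirq_irqoff(HI_SOFTIRQ);     /* no ksoftirqd wakeup */
       }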
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
  31. 29 Apr, 2009 · 1 commit
    • tracing: fix build failure on s390 · a0e39ed3
      Committed by Heiko Carstens
      "tracing: create automated trace defines" causes this compile error on s390,
      as reported by Sachin Sant against linux-next:
      
       kernel/built-in.o: In function `__do_softirq':
       (.text+0x1c680): undefined reference to `__tracepoint_softirq_entry'
      
      This happens because the definitions of the softirq tracepoints were
      moved from kernel/softirq.c to kernel/irq/handle.c. Since s390 doesn't
      support generic hardirqs, handle.c doesn't get compiled and the
      definitions are missing.
      
      So move the tracepoints to softirq.c again.
      
      [ Impact: fix build failure on s390 ]
      Reported-by: Sachin Sant <sachinp@in.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: fweisbec@gmail.com
      LKML-Reference: <20090429135139.5fac79b8@osiris.boeblingen.de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  32. 28 Apr, 2009 · 1 commit
    • x86/irq: change irq_desc_alloc() to take node instead of cpu · 85ac16d0
      Committed by Yinghai Lu
      This simplifies the node awareness of the code. All our allocators
      only deal with a NUMA node ID locality, not with CPU ids anyway - so
      there's no need to maintain (and transform) a CPU id all across the
      IRQ layer.
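      
      A hedged sketch of the interface change (prototypes are illustrative,
      following the name in the title):
      
       /* Before: a CPU id was passed and converted to a node internally. */
       struct irq_desc *irq_desc_alloc(unsigned int irq, int cpu);
       
       /* After: the NUMA node of the allocation is passed directly. */
       struct irq_desc *irq_desc_alloc(unsigned int irq, int node);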
      
      v2: keep move_irq_desc related
      
      [ Impact: cleanup, prepare IRQ code to be NUMA-aware ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      LKML-Reference: <49F65536.2020300@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  33. 17 Apr, 2009 · 1 commit
  34. 15 Apr, 2009 · 2 commits
    • tracing/events: move trace point headers into include/trace/events · ad8d75ff
      Committed by Steven Rostedt
      Impact: clean up
      
      Create a subdirectory in include/trace called events to keep the
      trace point headers in their own separate directory. Only headers that
      declare trace points should be defined in this directory.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: create automated trace defines · a8d154b0
      Committed by Steven Rostedt
      This patch lowers the number of places a developer must modify to add
      new tracepoints. The current method to add a new tracepoint
      into an existing system is to write the trace point macro in the
      trace header with one of the macros TRACE_EVENT, TRACE_FORMAT or
      DECLARE_TRACE, then add the same-named item into the C file
      with the macro DEFINE_TRACE(name), and then add the trace point itself.
      
      This change cuts out the need to add the DEFINE_TRACE(name).
      Every file that uses the tracepoint must still include the trace/<type>.h
      file, but the one C file must also add a define before including
      that file.
      
       #define CREATE_TRACE_POINTS
       #include <trace/mytrace.h>
      
      This will cause the trace/mytrace.h file to also produce the C code
      necessary to implement the trace point.
      
      Note, if more than one trace/<type>.h is used to create the C code
      it is best to list them all together.
      
       #define CREATE_TRACE_POINTS
       #include <trace/foo.h>
       #include <trace/bar.h>
       #include <trace/fido.h>
      
      Thanks to Mathieu Desnoyers and Christoph Hellwig for coming up with
      the cleaner solution of the define above the includes over my first
      design to have the C code include a "special" header.
      
      This patch converts sched, irq and lockdep and skb to use this new
      method.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  35. 31 Mar, 2009 · 1 commit
    • hrtimer: fix rq->lock inversion (again) · 7f1e2ca9
      Committed by Peter Zijlstra
      It appears I inadvertently introduced rq->lock recursion to the
      hrtimer_start() path when I delegated running already expired
      timers to softirq context.
      
      This patch fixes it by introducing a __hrtimer_start_range_ns()
      method that will not use raise_softirq_irqoff() but
      __raise_softirq_irqoff(), which avoids the wakeup.
      
      It then also changes schedule() to check for pending softirqs and
      do the wakeup then. I'm not quite sure I like this last bit, nor
      am I convinced it's really needed.
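      
      A hedged sketch of the wakeup-free raise relied on here (at the time
      this was roughly a one-line macro; treat it as illustrative):
      
       /* Marks the softirq pending without waking ksoftirqd, so it is safe
        * with rq->lock held; the work is picked up on the next irq exit or,
        * per this patch, when schedule() checks for pending softirqs. */
       #define __raise_softirq_irqoff(nr) \
               do { or_softirq_pending(1UL << (nr)); } while (0)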
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      LKML-Reference: <20090313112301.096138802@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>