1. 01 May, 2013 2 commits
  2. 08 Feb, 2013 1 commit
  3. 18 Nov, 2012 1 commit
    • printk: Wake up klogd using irq_work · 74876a98
      Frederic Weisbecker committed
      klogd is woken up asynchronously from the tick in order
      to do it safely.
      
      However if printk is called when the tick is stopped, the reader
      won't be woken up until the next interrupt, which might not fire
      for a while. As a result, the user may miss some messages.
      
      To fix this, let's implement the printk tick using a lazy irq work.
      This subsystem takes care of the timer tick state and can
      fix up accordingly.

      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
  4. 10 Oct, 2012 1 commit
  5. 21 Aug, 2012 3 commits
    • timer: Implement TIMER_IRQSAFE · c5f66e99
      Tejun Heo committed
      Timer internals are protected with irq-safe locks but timer execution
      isn't, so a timer being dequeued for execution and its execution
      aren't atomic against IRQs.  This makes it impossible to wait for its
      completion from IRQ handlers and difficult to shoot down a timer from
      IRQ handlers.
      
      This caused some problems for the delayed_work interface.  Because
      there's no way to reliably shoot down delayed_work->timer from IRQ
      handlers, __cancel_delayed_work() can't share the logic to steal the
      target delayed_work with cancel_delayed_work_sync(), and can only
      steal delayed_works which are queued on timer.  Similarly, the
      pending mod_delayed_work() can't be used from IRQ handlers.
      
      This patch adds a new timer flag TIMER_IRQSAFE, which causes the timer
      to be executed without enabling IRQs after dequeueing, so that its
      dequeueing and execution are atomic against IRQ handlers.
      
      This makes it safe to wait for the timer's completion from IRQ
      handlers, for example, using del_timer_sync().  It can never be
      executing on the local CPU and if executing on other CPUs it won't be
      interrupted until done.
      
      This will enable simplifying delayed_work cancel/mod interface.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: torvalds@linux-foundation.org
      Cc: peterz@infradead.org
      Link: http://lkml.kernel.org/r/1344449428-24962-5-git-send-email-tj@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • timer: Clean up timer initializers · fc683995
      Tejun Heo committed
      Over time, timer initializers became messy with unnecessarily
      duplicated code which is inconsistently spread across timer.h and
      timer.c.
      
      This patch cleans up timer initializers.
      
      * timer.c::__init_timer() is renamed to do_init_timer().
      
      * __TIMER_INITIALIZER() added.  It takes @flags and all initializers
        are wrappers around it.
      
      * init_timer[_on_stack]_key() now take @flags.
      
      * __init_timer[_on_stack]() added.  They take @flags and all init
        macros are wrappers around them.
      
      * __setup_timer[_on_stack]() added.  They use __init_timer() and take
        @flags.  All setup macros are wrappers around the two.
      
      Note that this patch doesn't add missing init/setup combinations -
      e.g. init_timer_deferrable_on_stack().  Adding missing ones is
      trivial.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: torvalds@linux-foundation.org
      Cc: peterz@infradead.org
      Link: http://lkml.kernel.org/r/1344449428-24962-4-git-send-email-tj@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • timer: Generalize timer->base flags handling · e52b1db3
      Tejun Heo committed
      To prepare for addition of another flag, generalize timer->base flags
      handling.
      
      * Rename from TBASE_*_FLAG to TIMER_* and make them LU (unsigned
        long) constants.
      
      * Define and use TIMER_FLAG_MASK for flags masking so that multiple
        flags can be handled correctly.
      
      * Don't dereference timer->base directly even if
        !tbase_get_deferrable().  Both such places are already passed
        @base, so use it instead.
      
      * Make sure tvec_base's alignment is large enough for timer->base
        flags using BUILD_BUG_ON().

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: torvalds@linux-foundation.org
      Cc: peterz@infradead.org
      Link: http://lkml.kernel.org/r/1344449428-24962-2-git-send-email-tj@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
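
The flag handling described in the commit above relies on storing flag bits in the low bits of an aligned pointer. The following is a minimal userspace sketch of that idea, not the kernel code: `struct base_sketch`, `pack_base()`, `unpack_base()` and `unpack_flags()` are hypothetical names, and the flag values are assumed for illustration. The `_Static_assert` plays the role the commit assigns to BUILD_BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/* Flag values are assumptions for this sketch; the names follow the commit text. */
#define TIMER_DEFERRABLE 0x1LU
#define TIMER_IRQSAFE    0x2LU
#define TIMER_FLAG_MASK  0x3LU

/* Hypothetical stand-in for the per-CPU timer base. */
struct base_sketch {
    unsigned long timer_jiffies;
};

/* The alignment must leave the masked low bits of every base address free. */
_Static_assert(_Alignof(struct base_sketch) > TIMER_FLAG_MASK,
               "base alignment too small to hold the flag bits");

/* Combine a base pointer and its flags into one word. */
static inline uintptr_t pack_base(struct base_sketch *base, unsigned long flags)
{
    assert(((uintptr_t)base & TIMER_FLAG_MASK) == 0);
    assert((flags & ~TIMER_FLAG_MASK) == 0);
    return (uintptr_t)base | flags;
}

/* Mask the flags off before using the word as a pointer again. */
static inline struct base_sketch *unpack_base(uintptr_t packed)
{
    return (struct base_sketch *)(packed & ~(uintptr_t)TIMER_FLAG_MASK);
}

/* Extract all flags at once, so several flags can coexist. */
static inline unsigned long unpack_flags(uintptr_t packed)
{
    return packed & TIMER_FLAG_MASK;
}
```

Masking with a single TIMER_FLAG_MASK, rather than testing one hard-coded bit, is what lets a second flag be added without touching every user of the pointer.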
  6. 19 Aug, 2012 1 commit
  7. 06 Jun, 2012 4 commits
    • timers: Improve get_next_timer_interrupt() · e40468a5
      Thomas Gleixner committed
      Gilad reported at
      
       http://lkml.kernel.org/r/1336056962-10465-2-git-send-email-gilad@benyossef.com
      
      "Current timer code fails to correctly return a value meaning that
       there is no future timer event, with the result that the timer keeps
       getting re-armed in HZ one shot mode even when we could turn it off,
       generating unneeded interrupts.
      
       What is happening is that when __next_timer_interrupt() wishes
       to return a value that signifies "there is no future timer
       event", it returns (base->timer_jiffies + NEXT_TIMER_MAX_DELTA).
      
       However, the code in tick_nohz_stop_sched_tick(), which called
       __next_timer_interrupt() via get_next_timer_interrupt(),
       compares the return value to (last_jiffies + NEXT_TIMER_MAX_DELTA)
       to see if the timer needs to be re-armed.
      
       base->timer_jiffies != last_jiffies and so tick_nohz_stop_sched_tick()
       interprets the return value as indication that there is a distant
       future event 12 days from now and programs the timer to fire next
       after KTIME_MAX nsecs instead of not arming it. This ends up
       causing a needless interrupt once every KTIME_MAX nsecs."
      
      Fix this by using the new active timer accounting. This completely
      avoids scans when no active timer is enqueued, so we don't have to
      rely on base->next_timer and base->timer_jiffies anymore.

      Reported-by: Gilad Ben-Yossef <gilad@benyossef.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20120525214819.317535385@linutronix.de
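
The mismatch in the quoted report can be reproduced with plain arithmetic. This is a simplified userspace sketch, not the kernel functions: all function names are hypothetical, and the value of NEXT_TIMER_MAX_DELTA is an assumption chosen to match the "12 days" figure at HZ=1000.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value: about 2^30 jiffies, roughly 12 days at HZ=1000. */
#define NEXT_TIMER_MAX_DELTA ((1UL << 30) - 1)

/* Old scheme: "no future timer" is encoded relative to the base's own clock. */
static unsigned long next_event_old(unsigned long timer_jiffies)
{
    return timer_jiffies + NEXT_TIMER_MAX_DELTA;
}

/* The caller's check, encoded relative to its own notion of "now". */
static bool caller_sees_no_timer(unsigned long next, unsigned long last_jiffies)
{
    return next == last_jiffies + NEXT_TIMER_MAX_DELTA;
}

/* New scheme: with a count of active timers, answer relative to the caller's clock. */
static unsigned long next_event_fixed(unsigned long now,
                                      unsigned long active_timers,
                                      unsigned long next_timer)
{
    if (!active_timers)
        return now + NEXT_TIMER_MAX_DELTA;
    return next_timer;
}

/* Usage: the base clock lags the caller by three jiffies. */
static void sentinel_demo(void)
{
    /* Old: the sentinel is not recognised, so the tick is re-armed needlessly. */
    assert(!caller_sees_no_timer(next_event_old(1000), 1003));
    /* Fixed: the sentinel matches what the caller compares against. */
    assert(caller_sees_no_timer(next_event_fixed(1003, 0, 0), 1003));
}
```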
    • timers: Add accounting of non deferrable timers · 99d5f3aa
      Thomas Gleixner committed
      The code in get_next_timer_interrupt() is suboptimal as it has to run
      through the cascade to find the next expiring timer. On a completely
      idle core we should only do that when there is an active timer
      enqueued and base->next_timer does not give us a fast answer.
      
      Add accounting of the active timers to the now consolidated
      attach/detach code. I deliberately avoided sanity checks because the
      code is fully symmetric and any fiddling with timers w/o using the API
      functions will lead to cute explosions anyway. ulong is big enough
      even on 32bit and if we really run into the situation to have more
      than 1<<32 timers enqueued there, then we are definitely not in a
      state to go idle and run through that code.
      
      This allows us to fix another shortcoming of get_next_timer_interrupt().

      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20120525214819.236377028@linutronix.de
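
The accounting idea can be sketched in a few lines of userspace C. The types and function names below are hypothetical. Unlike the commit, which deliberately omits sanity checks, this sketch adds asserts purely to make the symmetry of attach and detach visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the timer and its base. */
struct acct_timer {
    bool pending;
    bool deferrable;
};

struct acct_base {
    unsigned long active_timers;   /* counts non-deferrable pending timers only */
};

/* Enqueue: only non-deferrable timers count as "active". */
static void attach_timer(struct acct_base *base, struct acct_timer *t)
{
    assert(!t->pending);           /* illustration only; the commit has no such check */
    t->pending = true;
    if (!t->deferrable)
        base->active_timers++;
}

/* Dequeue: the exact mirror image of attach_timer(). */
static void detach_timer(struct acct_base *base, struct acct_timer *t)
{
    assert(t->pending);            /* illustration only */
    t->pending = false;
    if (!t->deferrable)
        base->active_timers--;
}

/* An idle CPU needs to walk the cascade only if something is counted. */
static bool need_cascade_scan(const struct acct_base *base)
{
    return base->active_timers != 0;
}
```

Because every enqueue and dequeue goes through this one pair of functions, the counter stays correct without any extra bookkeeping at the call sites.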
    • timers: Consolidate base->next_timer update · facbb4a7
      Thomas Gleixner committed
      Another bunch of mindlessly copied code. All callers of
      internal_add_timer() except the recascading code update
      base->next_timer.
      
      Move this into internal_add_timer() and let the cascading code call
      __internal_add_timer().

      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20120525214819.189946224@linutronix.de
    • timers: Create detach_if_pending() and use it · ec44bc7a
      Thomas Gleixner committed
      Most callers of detach_timer() have the same pattern around
      them: they check whether the timer is pending and possibly update
      base->next_timer.
      
      Create detach_if_pending() and replace the duplicated code.

      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20120525214819.131246037@linutronix.de
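
The duplicated pattern that the commit folds into one helper might look roughly like this. It is a userspace sketch with hypothetical types and names; in particular, resetting the cached expiry to the base clock is an assumption about what "updating base->next_timer" means.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the timer and its base. */
struct dip_base {
    unsigned long timer_jiffies;   /* base clock */
    unsigned long next_timer;      /* cached earliest expiry */
};

struct dip_timer {
    bool pending;
    bool deferrable;
    unsigned long expires;
};

/* Returns 1 if the timer was pending and has been detached, 0 otherwise. */
static int detach_if_pending_sketch(struct dip_timer *t, struct dip_base *base)
{
    if (!t->pending)
        return 0;

    t->pending = false;

    /*
     * If this timer defined the cached next expiry, the cache is stale.
     * Assumption: falling back to the base clock forces a fresh search.
     */
    if (t->expires == base->next_timer && !t->deferrable)
        base->next_timer = base->timer_jiffies;

    return 1;
}

/* Usage: the second call is a no-op because the timer is no longer pending. */
static void detach_demo(void)
{
    struct dip_base base = { .timer_jiffies = 100, .next_timer = 150 };
    struct dip_timer t = { .pending = true, .deferrable = false, .expires = 150 };

    assert(detach_if_pending_sketch(&t, &base) == 1);
    assert(base.next_timer == 100);
    assert(detach_if_pending_sketch(&t, &base) == 0);
}
```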
  8. 15 May, 2012 1 commit
    • lockdep: fix oops in processing workqueue · 4d82a1de
      Peter Zijlstra committed
      Under memory load, on x86_64, with lockdep enabled, the workqueue's
      process_one_work() has been seen to oops in __lock_acquire(), barfing
      on a 0xffffffff00000000 pointer in the lockdep_map's class_cache[].
      
      Because it's permissible to free a work_struct from its callout function,
      the map used is an onstack copy of the map given in the work_struct: and
      that copy is made without any locking.
      
      Surprisingly, gcc (4.5.1 in Hugh's case) uses "rep movsl" rather than
      "rep movsq" for that structure copy: which might race with a workqueue
      user's wait_on_work() doing lock_map_acquire() on the source of the
      copy, putting a pointer into the class_cache[], but only in time for
      the top half of that pointer to be copied to the destination map.
      
      Boom when process_one_work() subsequently does lock_map_acquire()
      on its onstack copy of the lockdep_map.
      
      Fix this, and a similar instance in call_timer_fn(), with a
      lockdep_copy_map() function which additionally NULLs the class_cache[].
      
      Note: this oops was actually seen on 3.4-next, where flush_work() newly
      does the racing lock_map_acquire(); but Tejun points out that 3.4 and
      earlier are already vulnerable to the same through wait_on_work().
      
      * Patch originally from Peter.  Hugh modified it a bit and wrote the
        description.

      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Reported-by: Hugh Dickins <hughd@google.com>
      LKML-Reference: <alpine.LSU.2.00.1205070951170.1544@eggly.anvils>
      Signed-off-by: Tejun Heo <tj@kernel.org>
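
The torn pointer in the oops can be modelled without any real concurrency. In this userspace sketch, `copy_in_halves()` stands for a 64-bit copy done as two 32-bit moves, low half first, with a store from another CPU landing in between. `struct map_sketch`, `copy_map_sketch()` and the slot count are hypothetical simplifications of the lockdep map and of the fix the commit describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS_SKETCH 2       /* assumed size, for illustration only */

/* Hypothetical, simplified stand-in for a lockdep map. */
struct map_sketch {
    const char *name;
    uint64_t class_cache[CACHE_SLOTS_SKETCH];   /* pointers, held as integers */
};

/*
 * Model a 64-bit copy performed as two 32-bit moves. The first move reads
 * the value as it was before a concurrent store, the second move reads it
 * after the store.
 */
static uint64_t copy_in_halves(uint64_t seen_first, uint64_t seen_second)
{
    uint64_t low  = seen_first  & 0x00000000ffffffffULL;
    uint64_t high = seen_second & 0xffffffff00000000ULL;
    return high | low;
}

/* The fix as described: copy the map, then clear the cache in the copy. */
static void copy_map_sketch(struct map_sketch *to, const struct map_sketch *from)
{
    *to = *from;
    for (size_t i = 0; i < CACHE_SLOTS_SKETCH; i++)
        to->class_cache[i] = 0;
}

/* A NULL slot that becomes a kernel pointer mid-copy yields a half pointer. */
static void torn_copy_demo(void)
{
    assert(copy_in_halves(0, 0xffffffff81234567ULL) == 0xffffffff00000000ULL);
}
```

Clearing the cache costs a later lookup, but it means a torn value in the copy is never dereferenced.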
  9. 03 May, 2012 1 commit
  10. 27 Apr, 2012 1 commit
  11. 09 Dec, 2011 1 commit
  12. 24 Nov, 2011 2 commits
  13. 31 Oct, 2011 1 commit
  14. 03 Jun, 2011 1 commit
  15. 08 Mar, 2011 1 commit
    • debugobjects: Add hint for better object identification · 99777288
      Stanislaw Gruszka committed
      In complex subsystems like mac80211 structures can contain several
      timers and work structs, so identifying a specific instance from the
      call trace and object type output of debugobjects can be hard.
      
      Allow the subsystems which support debugobjects to provide a hint
      function. This function returns a pointer to a kernel address
      (preferably the object's callback function) which is printed along
      with the debugobjects type.
      
      Add hint methods for timer_list, work_struct and hrtimer.
      
      [ tglx: Massaged changelog, made it compile ]

      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <20110307085809.GA9334@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  16. 16 Feb, 2011 1 commit
  17. 08 Feb, 2011 1 commit
  18. 04 Feb, 2011 1 commit
  19. 31 Jan, 2011 1 commit
  20. 13 Dec, 2010 1 commit
    • timers: Use this_cpu_read · 7496351a
      Christoph Lameter committed
      Eric asked for this.
      
      [tglx: Because it generates faster code according to Eric ]

      Signed-off-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: linux-mm@kvack.org
      LKML-Reference: <alpine.DEB.2.00.1011301404490.4039@router.home>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  21. 09 Dec, 2010 2 commits
    • nohz: Fix get_next_timer_interrupt() vs cpu hotplug · dbd87b5a
      Heiko Carstens committed
      This fixes a bug as seen on 2.6.32 based kernels where timers got
      enqueued on offline cpus.
      
      If a cpu goes offline it might still have pending timers. These will
      be migrated during CPU_DEAD handling after the cpu is offline.
      However while the cpu is going offline it will schedule the idle task
      which will then call tick_nohz_stop_sched_tick().
      
      That function in turn will call get_next_timer_interrupt() to figure
      out if the tick of the cpu can be stopped or not. If it turns out that
      the next tick is just one jiffy off (delta_jiffies == 1)
      tick_nohz_stop_sched_tick() incorrectly assumes that the tick should
      not stop and takes an early exit and thus it won't update the load
      balancer cpu.
      
      Just afterwards the cpu will be killed and the load balancer cpu could
      be the offline cpu.
      
      On 2.6.32 based kernels get_nohz_load_balancer() gets called to decide
      on which cpu a timer should be enqueued (see __mod_timer()). Which
      leads to the possibility that timers get enqueued on an offline cpu.
      These will never expire and can cause a system hang.
      
      This has been observed on 2.6.32 kernels. On current kernels
      __mod_timer() uses get_nohz_timer_target() which doesn't have that
      problem. However there might be other problems because of the too
      early exit from tick_nohz_stop_sched_tick() in case a cpu goes offline.
      
      The easiest and probably safest fix seems to be to let
      get_next_timer_interrupt() just lie and let it say there isn't any
      pending timer if the current cpu is offline.
      
      I also thought of moving migrate_[hr]timers() from CPU_DEAD to
      CPU_DYING, but seeing that there already have been fixes at least in
      the hrtimer code in this area I'm afraid that this could add new
      subtle bugs.

      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101201091109.GA8984@osiris.boeblingen.de.ibm.com>
      Cc: stable@kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Cure more NO_HZ load average woes · 0f004f5a
      Peter Zijlstra committed
      There's a long-running regression that proved difficult to fix and
      which is hitting certain people and is rather annoying in its effects.
      
      Damien reported that after 74f5187a (sched: Cure load average vs
      NO_HZ woes) his load average is unnaturally high; he also noted that
      even with that patch reverted the load average numbers are not
      correct.
      
      The problem is that the previous patch only solved half the NO_HZ
      problem: it addressed the part of going into NO_HZ mode, not of
      coming out of NO_HZ mode. This patch implements that missing half.
      
      When coming out of NO_HZ mode there are two important things to take
      care of:
      
       - Folding the pending idle delta into the global active count.
       - Correctly aging the averages for the idle-duration.
      
      So with this patch the NO_HZ interaction should be complete and
      behaviour between CONFIG_NO_HZ=[yn] should be equivalent.
      
      Furthermore, this patch slightly changes the load average computation
      by adding a rounding term to the fixed point multiplication.

      Reported-by: Damien Wyart <damien.wyart@free.fr>
      Reported-by: Tim McGrath <tmhikaru@gmail.com>
      Tested-by: Damien Wyart <damien.wyart@free.fr>
      Tested-by: Orion Poplawski <orion@cora.nwra.com>
      Tested-by: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      Cc: Chase Douglas <chase.douglas@canonical.com>
      LKML-Reference: <1291129145.32004.874.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
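
The effect of a rounding term in a fixed-point multiplication can be shown in isolation. This sketch assumes an 11-bit fractional part and a one-minute decay factor of 1884/2048; both constants and both function names are assumptions for illustration rather than a quotation of the patch.

```c
#include <assert.h>

#define FSHIFT   11                /* assumed number of fractional bits */
#define FIXED_1  (1UL << FSHIFT)   /* 1.0 in fixed point */
#define EXP_1    1884UL            /* assumed decay factor for the 1-minute average */

/* load and active are fixed-point values; plain truncation rounds down. */
static unsigned long calc_load_truncating(unsigned long load, unsigned long exp,
                                          unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

/* Adding half of the divisor before the shift rounds to nearest instead. */
static unsigned long calc_load_rounding(unsigned long load, unsigned long exp,
                                        unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    load += 1UL << (FSHIFT - 1);
    return load >> FSHIFT;
}

/* Usage: a tiny residual load behaves differently under the two variants. */
static void calc_load_demo(void)
{
    /* 1 * 1884 / 2048 is about 0.92: truncation drops it, rounding keeps it. */
    assert(calc_load_truncating(1, EXP_1, 0) == 0);
    assert(calc_load_rounding(1, EXP_1, 0) == 1);
}
```

For values that divide exactly, such as a steady load of FIXED_1 with FIXED_1 active, both variants return the same result.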
  22. 22 Oct, 2010 3 commits
  23. 21 Oct, 2010 1 commit
  24. 19 Oct, 2010 1 commit
    • irq_work: Add generic hardirq context callbacks · e360adbe
      Peter Zijlstra committed
      Provide a mechanism that allows running code in IRQ context. It is
      most useful for NMI code that needs to interact with the rest of the
      system -- like waking up a task to drain buffers.
      
      Perf currently has such a mechanism, so extract that and provide it as
      a generic feature, independent of perf so that others may also
      benefit.
      
      The IRQ context callback is generated through self-IPIs where
      possible, or on architectures like powerpc the decrementer (the
      built-in timer facility) is set to generate an interrupt immediately.
      
      Architectures that don't have anything like this get to make do with a
      callback from the timer tick. These architectures can call
      irq_work_run() at the tail of any IRQ handlers that might enqueue such
      work (like the perf IRQ handler) to avoid undue latencies in
      processing the work.

      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      [ various fixes ]
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
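
The deferral pattern described above (claim a work item now, run its callback later from a safer context) can be modelled in userspace. Everything here is a hypothetical, single-threaded sketch: the list push is not NMI-safe as a real implementation would need to be, and the place where a self-IPI would be raised is only marked by a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified work item. */
struct work_sketch {
    atomic_bool pending;                    /* set while queued */
    void (*func)(struct work_sketch *);
    struct work_sketch *next;
};

/* One global list here; a kernel would keep one per CPU. */
static struct work_sketch *pending_list;

/* Example callback: count how often the "reader" would be woken. */
static int wakeups;
static void wake_reader(struct work_sketch *w)
{
    (void)w;
    wakeups++;
}

/* Claim and enqueue. Returns false if the item is already queued. */
static bool work_queue_sketch(struct work_sketch *w)
{
    bool expected = false;

    assert(w->func != NULL);
    if (!atomic_compare_exchange_strong(&w->pending, &expected, true))
        return false;

    w->next = pending_list;
    pending_list = w;
    /* A real implementation would raise a self-IPI here, or wait for the tick. */
    return true;
}

/* Drain: called from the interrupt (or tick) that services the queue. */
static void work_run_sketch(void)
{
    struct work_sketch *w = pending_list;

    pending_list = NULL;
    while (w) {
        struct work_sketch *next = w->next;

        atomic_store(&w->pending, false);   /* item may be queued again from now on */
        w->func(w);
        w = next;
    }
}
```

Queuing the same item twice before it runs results in a single callback, which is the behaviour wanted for a wake-up: many requests, one wake.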
  25. 11 Aug, 2010 1 commit
  26. 04 Aug, 2010 2 commits
    • timer: Added usleep_range timer · 5e7f5a17
      Patrick Pannuto committed
      usleep_range is a finer precision implementation of msleep
      and is designed to be a drop-in replacement for udelay where
      a precise sleep / busy-wait is unnecessary.
      
      Since an easy interface to hrtimers could lead to an undesired
      proliferation of interrupts, we provide only a "range" API,
      forcing the caller to think about an acceptable tolerance on
      both ends and hopefully avoiding introducing another interrupt.
      
      INTRO
      
      As discussed here ( http://lkml.org/lkml/2007/8/3/250 ), msleep(1) is not
      precise enough for many drivers (yes, sleep precision is an unfair notion,
      but consistently sleeping for ~an order of magnitude greater than requested
      is worth fixing). This patch adds a usleep API so that udelay does not have
      to be used. Obviously not every udelay can be replaced (those in atomic
      contexts or being used for simple bitbanging come to mind), but there are
      many, many examples of
      
      mydriver_write(...)
      /* Wait for hardware to latch */
      udelay(100)
      
      in various drivers where a busy-wait loop is neither beneficial nor
      necessary, but msleep simply does not provide enough precision and people
      are using a busy-wait loop instead.
      
      CONCERNS FROM THE RFC
      
      Why is udelay a problem / necessary? Most callers of udelay are in device/
      driver initialization code, which is serial...
      
      	As I see it, there is only benefit to sleeping over a delay; the
      	notion of "refactoring" areas that use udelay was presented, but
      	I see usleep as the refactoring. Consider i2c, if the bus is busy,
      	you need to wait a bit (say 100us) before trying again, your
      	current options are:
      
      		* udelay(100)
      		* msleep(1) <-- As noted above, actually as high as ~20ms
      				on some platforms, so not really an option
      		* Manually set up an hrtimer to try again in 100us (which
      		  is what usleep does anyway...)
      
      	People choose the udelay route because it is EASY; we need to
      	provide a better easy route.
      
      	Device / driver / boot code is *currently* serial, but every few
      	months someone makes noise about parallelizing boot, and IMHO, a
      	little forward-thinking now is one less thing to worry about
      	if/when that ever happens
      
      udelay's could be preempted
      
      	Sure, but if udelay plans on looping 1000 times, and it gets
      	preempted on loop 200, whenever it's scheduled again, it is
      	going to do the next 800 loops.
      
      Is the interruptible case needed?
      
      	Probably not, but I see usleep as a very logical parallel to msleep,
      	so it made sense to include the "full" API. Processors are getting
      	faster (albeit not as quickly as they are becoming more parallel),
      	so if someone wanted to be interruptible for a few usecs, why not
      	let them? If this is a contentious point, I'm happy to remove it.
      
      OTHER THOUGHTS
      
      I believe there is also value in exposing the usleep_range option; it gives
      the scheduler a lot more flexibility and allows the programmer to express
      his intent much more clearly; it's something I would hope future driver
      writers will take advantage of.
      
      To get the results in the NUMBERS section below, I literally s/udelay/usleep
      the kernel tree; I had to go in and undo the changes to the USB drivers, but
      everything else booted successfully; I find that extremely telling in and
      of itself -- many people are using a delay API where a sleep will suit them
      just fine.
      
      SOME ATTEMPTS AT NUMBERS
      
      It turns out that calculating quantifiable benefit on this is challenging,
      so instead I will simply present the current state of things, and I hope
      this to be sufficient:
      
      How many udelay calls are there in 2.6.35-rc5?
      
      	udelay(ARG) >=	| COUNT
      	1000		| 319
      	500		| 414
      	100		| 1146
      	20		| 1832
      
      I am working on Android, so that is my focus for this. The following table
      is a modified usleep that simply printk's the amount of time requested to
      sleep; these tests were run on a kernel with udelay >= 20 --> usleep
      
      "boot" is power-on to lock screen
      "power collapse" is when the power button is pushed and the device suspends
      "resume" is when the power button is pushed and the lock screen is displayed
               (no touchscreen events or anything, just turning on the display)
      "use device" is from the unlock swipe to clicking around a bit; there is no
      	sd card in this phone, so fail loading music, video, camera
      
      	ACTION		| TOTAL NUMBER OF USLEEP CALLS	| NET TIME (us)
      	boot		| 22				| 1250
      	power-collapse	| 9				| 1200
      	resume		| 5				| 500
      	use device	| 59				| 7700
      
      The most interesting category to me is the "use device" field; 7700us of
      busy-wait time that could be put towards better responsiveness, or at the
      least less power usage.

      Signed-off-by: Patrick Pannuto <ppannuto@codeaurora.org>
      Cc: apw@canonical.com
      Cc: corbet@lwn.net
      Cc: arjan@linux.intel.com
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
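
The reason for a "range" API can be illustrated with a toy model of wake-up selection. This is a userspace sketch under a simplifying assumption: a sleeper that accepts any wake-up in [min_us, max_us] can ride on an interrupt that is already scheduled inside that window, and only needs a new one (at the upper bound) if none exists. `pick_wakeup_us()` is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Choose a wake-up time for a sleeper that tolerates [min_us, max_us].
 * scheduled[] holds the times of interrupts that will fire anyway.
 * *new_interrupt reports whether an extra interrupt had to be programmed.
 */
static unsigned long pick_wakeup_us(unsigned long min_us, unsigned long max_us,
                                    const unsigned long *scheduled, size_t n,
                                    bool *new_interrupt)
{
    unsigned long best = max_us;   /* fallback: fire at the upper bound */
    bool found = false;

    assert(min_us <= max_us);

    for (size_t i = 0; i < n; i++) {
        bool in_window = scheduled[i] >= min_us && scheduled[i] <= max_us;

        /* Prefer the earliest existing event inside the window. */
        if (in_window && (!found || scheduled[i] < best)) {
            best = scheduled[i];
            found = true;
        }
    }

    *new_interrupt = !found;
    return best;
}
```

In driver code, the pattern from the commit message would then become something like `usleep_range(100, 200)` in place of `udelay(100)`, provided the caller is allowed to sleep. The wider the range, the better the chance that an existing interrupt can be reused.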
    • Revert "timer: Added usleep[_range] timer" · e1b004c3
      Thomas Gleixner committed
      This reverts commit 22b8f15c to merge
      an advanced version.

      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  27. 03 Aug, 2010 1 commit
  28. 23 Jul, 2010 2 commits