1. 10 12月, 2009 1 次提交
    • T
      hrtimer: Tune hrtimer_interrupt hang logic · 41d2e494
      Thomas Gleixner 提交于
      The hrtimer_interrupt hang logic adjusts min_delta_ns based on the
      execution time of the hrtimer callbacks.
      
      This is error-prone for virtual machines, where a guest vcpu can be
      scheduled out during the execution of the callbacks (and the callbacks
      themselves can do operations that translate to blocking operations in
      the hypervisor), which in can lead to large min_delta_ns rendering the
      system unusable.
      
      Replace the current heuristics with something more reliable. Allow the
      interrupt code to try 3 times to catch up with the lost time. If that
      fails use the total time spent in the interrupt handler to defer the
      next timer interrupt so the system can catch up with other things
      which got delayed. Limit that deferment to 100ms.
      
      The retry events and the maximum time spent in the interrupt handler
      are recorded and exposed via /proc/timer_list
      
      Inspired by a patch from Marcelo.
      Reported-by: NMichael Tokarev <mjt@tls.msk.ru>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Cc: kvm@vger.kernel.org
      41d2e494
  2. 14 11月, 2009 1 次提交
    • J
      nohz: Allow 32-bit machines to sleep for more than 2.15 seconds · 97813f2f
      Jon Hunter 提交于
      In the dynamic tick code, "max_delta_ns" (member of the
      "clock_event_device" structure) represents the maximum sleep time
      that can occur between timer events in nanoseconds.
      
      The variable, "max_delta_ns", is defined as an unsigned long
      which is a 32-bit integer for 32-bit machines and a 64-bit
      integer for 64-bit machines (if -m64 option is used for gcc).
      The value of max_delta_ns is set by calling the function
      "clockevent_delta2ns()" which returns a maximum value of LONG_MAX.
      For a 32-bit machine LONG_MAX is equal to 0x7fffffff and in
      nanoseconds this equates to ~2.15 seconds. Hence, the maximum
      sleep time for a 32-bit machine is ~2.15 seconds, where as for
      a 64-bit machine it will be many years.
      
      This patch changes the type of max_delta_ns to be "u64" instead of
      "unsigned long" so that this variable is a 64-bit type for both 32-bit
      and 64-bit machines. It also changes the maximum value returned by
      clockevent_delta2ns() to KTIME_MAX.  Hence this allows a 32-bit
      machine to sleep for longer than ~2.15 seconds. Please note that this
      patch also changes "min_delta_ns" to be "u64" too and although this is
      unnecessary, it makes the patch simpler as it avoids to fixup all
      callers of clockevent_delta2ns().
      
      [ tglx: changed "unsigned long long" to u64 as we use this data type
        	through out the time code ]
      Signed-off-by: NJon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1250617512-23567-3-git-send-email-jon-hunter@ti.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      97813f2f
  3. 26 9月, 2009 1 次提交
  4. 15 9月, 2009 1 次提交
    • A
      hrtimer: Eliminate needless reprogramming of clock events device · 7403f41f
      Ashwin Chaugule 提交于
      On NOHZ systems the following timers,
      
      -  tick_nohz_restart_sched_tick (tick_sched_timer)
      -  hrtimer_start (tick_sched_timer)
      
      are reprogramming the clock events device far more often than needed.
      No specific test case was required to observe this effect. This
      occurres because there was no check to see if the currently removed or
      restarted hrtimer was:
      
      1) the one which previously armed the clock events device.
      2) going to be replaced by another timer which has the same expiry time.
      
      Avoid the reprogramming in hrtimer_force_reprogram when the new expiry
      value which is evaluated from the clock bases is equal to
      cpu_base->expires_next. This results in faster application startup
      time by ~4%.
      
      [ tglx: simplified initial solution ]
      Signed-off-by: NAshwin Chaugule <ashwinc@quicinc.com>
      LKML-Reference: <4AA00165.90609@codeaurora.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      7403f41f
  5. 29 8月, 2009 2 次提交
    • X
      hrtimer: Add tracepoint for hrtimers · c6a2a177
      Xiao Guangrong 提交于
      Add tracepoints which cover the life cycle of a hrtimer. The
      tracepoints are integrated with the already existing debug_object
      debug points as far as possible.
      
      [ tglx: Fixed comments, made output conistent, easier to read and
        	parse. Fixed output for 32bit archs which do not use the
        	scalar representation of ktime_t. Hand current time to
        	trace_hrtimer_expiry_entry instead of calling get_time()
        	inside of the trace assignment. ]
      Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Zhaolei <zhaolei@cn.fujitsu.com>
      LKML-Reference: <4A7F8B2B.5020908@cn.fujitsu.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      c6a2a177
    • S
      pktgen: spin using hrtimer · 2bc481cf
      Stephen Hemminger 提交于
      This changes how the pktgen thread spins/waits between
      packets if delay is configured. It uses a high res timer to
      wait for time to arrive.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2bc481cf
  6. 22 7月, 2009 1 次提交
  7. 10 7月, 2009 2 次提交
    • T
      hrtimer: Fix migration expiry check · 6ff7041d
      Thomas Gleixner 提交于
      The timer migration expiry check should prevent the migration of a
      timer to another CPU when the timer expires before the next event is
      scheduled on the other CPU. Migrating the timer might delay it because
      we can not reprogram the clock event device on the other CPU. But the
      code implementing that check has two flaws:
      
      - for !HIGHRES the check compares the expiry value with the clock
        events device expiry value which is wrong for CLOCK_REALTIME based
        timers.
      
      - the check is racy. It holds the hrtimer base lock of the target CPU,
        but the clock event device expiry value can be modified
        nevertheless, e.g. by an timer interrupt firing.
      
      The !HIGHRES case is easy to fix as we can enqueue the timer on the
      cpu which was selected by the load balancer. It runs the idle
      balancing code once per jiffy anyway. So the maximum delay for the
      timer is the same as when we keep the tick on the current cpu going.
      
      In the HIGHRES case we can get the next expiry value from the hrtimer
      cpu_base of the target CPU and serialize the update with the cpu_base
      lock. This moves the lock section in hrtimer_interrupt() so we can set
      next_event to KTIME_MAX while we are handling the expired timers and
      set it to the next expiry value after we handled the timers under the
      base lock. While the expired timers are processed timer migration is
      blocked because the expiry time of the timer is always <= KTIME_MAX.
      
      Also remove the now useless clockevents_get_next_event() function.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6ff7041d
    • T
      hrtimer: migration: do not check expiry time on current CPU · 7e0c5086
      Thomas Gleixner 提交于
      The timer migration code needs to check whether the expiry time of the
      timer is before the programmed clock event expiry time when the timer
      is enqueued on another CPU because we can not reprogram the timer
      device on the other CPU. The current logic checks the expiry time even
      if we enqueue on the current CPU when nohz_get_load_balancer() returns
      current CPU. This might lead to an endless loop in the expiry check
      code when the expiry time of the timer is before the current
      programmed next event.
      
      Check whether nohz_get_load_balancer() returns current CPU and skip
      the expiry check if this is the case.
      
      The bug was triggered from the networking code. The patch fixes the
      regression http://bugzilla.kernel.org/show_bug.cgi?id=13738
      (Soft-Lockup/Race in networking in 2.6.31-rc1+195)
      
      Cc: Arun Bharadwaj <arun@linux.vnet.ibm.com
      Tested-by: NJoao Correia <joaomiguelcorreia@gmail.com>
      Tested-by: NAndres Freund <andres@anarazel.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      7e0c5086
  8. 07 7月, 2009 2 次提交
    • T
      timekeeping: Move ktime_get() functions to timekeeping.c · a40f262c
      Thomas Gleixner 提交于
      The ktime_get() functions for GENERIC_TIME=n are still located in
      hrtimer.c. Move them to time/timekeeping.c where they belong.
      
      LKML-Reference: <new-submission>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      a40f262c
    • M
      timekeeping: optimized ktime_get[_ts] for GENERIC_TIME=y · 951ed4d3
      Martin Schwidefsky 提交于
      The generic ktime_get function defined in kernel/hrtimer.c is suboptimial
      for GENERIC_TIME=y:
      
       0)               |  ktime_get() {
       0)               |    ktime_get_ts() {
       0)               |      getnstimeofday() {
       0)               |        read_tod_clock() {
       0)   0.601 us    |        }
       0)   1.938 us    |      }
       0)               |      set_normalized_timespec() {
       0)   0.602 us    |      }
       0)   4.375 us    |    }
       0)   5.523 us    |  }
      
      Overall there are two read_seqbegin/read_seqretry loops and a lot of
      unnecessary struct timespec calculations. ktime_get returns a nano second
      value which is the sum of xtime, wall_to_monotonic and the nano second
      delta from the clock source.
      
      ktime_get can be optimized for GENERIC_TIME=y. The new version only calls
      clocksource_read:
      
       0)               |  ktime_get() {
       0)               |    read_tod_clock() {
       0)   0.610 us    |    }
       0)   1.977 us    |  }
      
      It uses a single read_seqbegin/readseqretry loop and just adds everthing
      to a nano second value.
      
      ktime_get_ts is optimized in a similar fashion.
      
      [ tglx: added WARN_ON(timekeeping_suspended) as in getnstimeofday() ]
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Acked-by: Njohn stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090707112728.3005244d@skybase>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      951ed4d3
  9. 13 6月, 2009 1 次提交
  10. 11 6月, 2009 1 次提交
  11. 08 6月, 2009 1 次提交
  12. 13 5月, 2009 2 次提交
    • A
      timers: Logic to move non pinned timers · eea08f32
      Arun R Bharadwaj 提交于
      * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]:
      
      This patch migrates all non pinned timers and hrtimers to the current
      idle load balancer, from all the idle CPUs. Timers firing on busy CPUs
      are not migrated.
      
      While migrating hrtimers, care should be taken to check if migrating
      a hrtimer would result in a latency or not. So we compare the expiry of the
      hrtimer with the next timer interrupt on the target cpu and migrate the
      hrtimer only if it expires *after* the next interrupt on the target cpu.
      So, added a clockevents_get_next_event() helper function to return the
      next_event on the target cpu's clock_event_device.
      
      [ tglx: cleanups and simplifications ]
      Signed-off-by: NArun R Bharadwaj <arun@linux.vnet.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      eea08f32
    • A
      timers: Framework for identifying pinned timers · 597d0275
      Arun R Bharadwaj 提交于
      * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]:
      
      This patch creates a new framework for identifying cpu-pinned timers
      and hrtimers.
      
      This framework is needed because pinned timers are expected to fire on
      the same CPU on which they are queued. So it is essential to identify
      these and not migrate them, in case there are any.
      
      For regular timers, the currently existing add_timer_on() can be used
      queue pinned timers and subsequently mod_timer_pinned() can be used
      to modify the 'expires' field.
      
      For hrtimers, new modes HRTIMER_ABS_PINNED and HRTIMER_REL_PINNED are
      added to queue cpu-pinned hrtimer.
      
      [ tglx: use .._PINNED mode argument instead of creating tons of new
      functions ]
      Signed-off-by: NArun R Bharadwaj <arun@linux.vnet.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      597d0275
  13. 31 3月, 2009 1 次提交
    • P
      hrtimer: fix rq->lock inversion (again) · 7f1e2ca9
      Peter Zijlstra 提交于
      It appears I inadvertly introduced rq->lock recursion to the
      hrtimer_start() path when I delegated running already expired
      timers to softirq context.
      
      This patch fixes it by introducing a __hrtimer_start_range_ns()
      method that will not use raise_softirq_irqoff() but
      __raise_softirq_irqoff() which avoids the wakeup.
      
      It then also changes schedule() to check for pending softirqs and
      do the wakeup then, I'm not quite sure I like this last bit, nor
      am I convinced its really needed.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      LKML-Reference: <20090313112301.096138802@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7f1e2ca9
  14. 31 1月, 2009 3 次提交
    • T
      hrtimer: prevent negative expiry value after clock_was_set() · b0a9b511
      Thomas Gleixner 提交于
      Impact: prevent false positive WARN_ON() in clockevents_program_event()
      
      clock_was_set() changes the base->offset of CLOCK_REALTIME and
      enforces the reprogramming of the clockevent device to expire timers
      which are based on CLOCK_REALTIME. If the clock change is large enough
      then the subtraction of the timer expiry value and base->offset can
      become negative which triggers the warning in
      clockevents_program_event().
      
      Check the subtraction result and set a negative value to 0.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      b0a9b511
    • S
      hrtimers: allow the hot-unplugging of all cpus · 94df7de0
      Sebastien Dugue 提交于
      Impact: fix CPU hotplug hang on Power6 testbox
      
      On architectures that support offlining all cpus (at least powerpc/pseries),
      hot-unpluging the tick_do_timer_cpu can result in a system hang.
      
      This comes from the fact that if the cpu going down happens to be the
      cpu doing the tick, then as the tick_do_timer_cpu handover happens after the
      cpu is dead (via the CPU_DEAD notification), we're left without ticks,
      jiffies are frozen and any task relying on timers (msleep, ...) is stuck.
      That's particularly the case for the cpu looping in __cpu_die() waiting
      for the dying cpu to be dead.
      
      This patch addresses this by having the tick_do_timer_cpu handover happen
      earlier during the CPU_DYING notification. For this, a new clockevent
      notification type is introduced (CLOCK_EVT_NOTIFY_CPU_DYING) which is triggered
      in hrtimer_cpu_notify().
      Signed-off-by: NSebastien Dugue <sebastien.dugue@bull.net>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      94df7de0
    • F
      hrtimers: increase clock min delta threshold while interrupt hanging · 7f22391c
      Frederic Weisbecker 提交于
      Impact: avoid timer IRQ hanging slow systems
      
      While using the function graph tracer on a virtualized system, the
      hrtimer_interrupt can hang the system on an infinite loop.
      
      This can be caused in several situations:
      
       - the hardware is very slow and HZ is set too high
      
       - something intrusive is slowing the system down (tracing under emulation)
      
      ... and the next clock events to program are always before the current time.
      
      This patch implements a reasonable compromise: if such a situation is
      detected, we share the CPUs time in 1/4 to process the hrtimer interrupts.
      This is enough to let the system running without serious starvation.
      
      It has been successfully tested under VirtualBox with 1000 HZ and 100 HZ
      with function graph tracer launched. On both cases, the clock events were
      increased until about 25 ms periodic ticks, which means 40 HZ.
      
      So we change a hard to debug hang into a warning message and a system that
      still manages to limp along.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7f22391c
  15. 25 1月, 2009 1 次提交
    • T
      hrtimer: prevent negative expiry value after clock_was_set() · 6626bff2
      Thomas Gleixner 提交于
      Impact: prevent false positive WARN_ON() in clockevents_program_event()
      
      clock_was_set() changes the base->offset of CLOCK_REALTIME and
      enforces the reprogramming of the clockevent device to expire timers
      which are based on CLOCK_REALTIME. If the clock change is large enough
      then the subtraction of the timer expiry value and base->offset can
      become negative which triggers the warning in
      clockevents_program_event(). 
      
      Check the subtraction result and set a negative value to 0.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6626bff2
  16. 19 1月, 2009 1 次提交
    • P
      hrtimers: fix inconsistent lock state on resume in hres_timers_resume · 1d4a7f1c
      Peter Zijlstra 提交于
      Andrey Borzenkov reported this lockdep assert:
      
      > [17854.688347] =================================
      > [17854.688347] [ INFO: inconsistent lock state ]
      > [17854.688347] 2.6.29-rc2-1avb #1
      > [17854.688347] ---------------------------------
      > [17854.688347] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
      > [17854.688347] pm-suspend/18240 [HC0[0]:SC0[0]:HE1:SE1] takes:
      > [17854.688347]  (&cpu_base->lock){++..}, at: [<c0136fcc>] retrigger_next_event+0x5c/0xa0
      > [17854.688347] {in-hardirq-W} state was registered at:
      > [17854.688347]   [<c01443cd>] __lock_acquire+0x79d/0x1930
      > [17854.688347]   [<c01455bc>] lock_acquire+0x5c/0x80
      > [17854.688347]   [<c03092e5>] _spin_lock+0x35/0x70
      > [17854.688347]   [<c0136e61>] hrtimer_run_queues+0x31/0x140
      > [17854.688347]   [<c0128d98>] run_local_timers+0x8/0x20
      > [17854.688347]   [<c0128dd3>] update_process_times+0x23/0x60
      > [17854.688347]   [<c013e274>] tick_periodic+0x24/0x80
      > [17854.688347]   [<c013e2e2>] tick_handle_periodic+0x12/0x70
      > [17854.688347]   [<c0104e24>] timer_interrupt+0x14/0x20
      > [17854.688347]   [<c01607b9>] handle_IRQ_event+0x29/0x60
      > [17854.688347]   [<c0161c59>] handle_level_irq+0x69/0xe0
      > [17854.688347]   [<ffffffff>] 0xffffffff
      > [17854.688347] irq event stamp: 55771
      > [17854.688347] hardirqs last  enabled at (55771): [<c0309125>] _spin_unlock_irqrestore+0x35/0x60
      > [17854.688347] hardirqs last disabled at (55770): [<c0309419>] _spin_lock_irqsave+0x19/0x80
      > [17854.688347] softirqs last  enabled at (54836): [<c0124f54>] __do_softirq+0xc4/0x110
      > [17854.688347] softirqs last disabled at (54831): [<c01049ae>] do_softirq+0x8e/0xe0
      > [17854.688347]
      > [17854.688347] other info that might help us debug this:
      > [17854.688347] 3 locks held by pm-suspend/18240:
      > [17854.688347]  #0:  (&buffer->mutex){--..}, at: [<c01dd4c5>] sysfs_write_file+0x25/0x100
      > [17854.688347]  #1:  (pm_mutex){--..}, at: [<c015056f>] enter_state+0x4f/0x140
      > [17854.688347]  #2:  (dpm_list_mtx){--..}, at: [<c027880f>] device_pm_lock+0xf/0x20
      > [17854.688347]
      > [17854.688347] stack backtrace:
      > [17854.688347] Pid: 18240, comm: pm-suspend Not tainted 2.6.29-rc2-1avb #1
      > [17854.688347] Call Trace:
      > [17854.688347]  [<c0306248>] ? printk+0x18/0x20
      > [17854.688347]  [<c0141fac>] print_usage_bug+0x16c/0x1d0
      > [17854.688347]  [<c0142bcf>] mark_lock+0x8bf/0xc90
      > [17854.688347]  [<c0106b8f>] ? pit_next_event+0x2f/0x40
      > [17854.688347]  [<c01441b0>] __lock_acquire+0x580/0x1930
      > [17854.688347]  [<c030916d>] ? _spin_unlock+0x1d/0x20
      > [17854.688347]  [<c0106b8f>] ? pit_next_event+0x2f/0x40
      > [17854.688347]  [<c013dd38>] ? clockevents_program_event+0x98/0x160
      > [17854.688347]  [<c0142fe8>] ? mark_held_locks+0x48/0x90
      > [17854.688347]  [<c0309125>] ? _spin_unlock_irqrestore+0x35/0x60
      > [17854.688347]  [<c0143229>] ? trace_hardirqs_on_caller+0x139/0x190
      > [17854.688347]  [<c014328b>] ? trace_hardirqs_on+0xb/0x10
      > [17854.688347]  [<c01455bc>] lock_acquire+0x5c/0x80
      > [17854.688347]  [<c0136fcc>] ? retrigger_next_event+0x5c/0xa0
      > [17854.688347]  [<c03092e5>] _spin_lock+0x35/0x70
      > [17854.688347]  [<c0136fcc>] ? retrigger_next_event+0x5c/0xa0
      > [17854.688347]  [<c0136fcc>] retrigger_next_event+0x5c/0xa0
      > [17854.688347]  [<c013711a>] hres_timers_resume+0xa/0x10
      > [17854.688347]  [<c013aa8e>] timekeeping_resume+0xee/0x150
      > [17854.688347]  [<c0273384>] __sysdev_resume+0x14/0x50
      > [17854.688347]  [<c0273407>] sysdev_resume+0x47/0x80
      > [17854.688347]  [<c02791ab>] device_power_up+0xb/0x20
      > [17854.688347]  [<c015043f>] suspend_devices_and_enter+0xcf/0x150
      > [17854.688347]  [<c0150c2f>] ? freeze_processes+0x3f/0x90
      > [17854.688347]  [<c0150614>] enter_state+0xf4/0x140
      > [17854.688347]  [<c01506dd>] state_store+0x7d/0xc0
      > [17854.688347]  [<c0150660>] ? state_store+0x0/0xc0
      > [17854.688347]  [<c0202da4>] kobj_attr_store+0x24/0x30
      > [17854.688347]  [<c01dd53c>] sysfs_write_file+0x9c/0x100
      > [17854.688347]  [<c019916c>] vfs_write+0x9c/0x160
      > [17854.688347]  [<c0103494>] ? restore_nocheck_notrace+0x0/0xe
      > [17854.688347]  [<c01dd4a0>] ? sysfs_write_file+0x0/0x100
      > [17854.688347]  [<c01992ed>] sys_write+0x3d/0x70
      > [17854.688347]  [<c0103371>] sysenter_do_call+0x12/0x31
      
      Andrey's analysis:
      
      > timekeeping_resume() is called via class ->resume
      > method; and according to comments in sysdev_resume() and
      > device_power_up(), they are called with interrupts disabled.
      >
      > Looking at suspend_enter, irqs *are* disabled at this point.
      >
      > So it actually looks like something (may be some driver)
      > unconditionally enabled irqs in resume path.
      
      Add a debug check to test this theory. If it triggers then it
      triggers because the resume code calls it with irqs enabled,
      which is a no-no not just for timekeeping_resume(), but also
      bad for a number of other resume handlers.
      Reported-by: NAndrey Borzenkov <arvidjaar@mail.ru>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1d4a7f1c
  17. 14 1月, 2009 1 次提交
  18. 05 1月, 2009 6 次提交
  19. 30 12月, 2008 1 次提交
    • S
      hrtimers: allow the hot-unplugging of all cpus · 5762ba18
      Sebastien Dugue 提交于
      Impact: fix CPU hotplug hang on Power6 testbox
      
      On architectures that support offlining all cpus (at least powerpc/pseries),
      hot-unpluging the tick_do_timer_cpu can result in a system hang.
      
      This comes from the fact that if the cpu going down happens to be the
      cpu doing the tick, then as the tick_do_timer_cpu handover happens after the
      cpu is dead (via the CPU_DEAD notification), we're left without ticks,
      jiffies are frozen and any task relying on timers (msleep, ...) is stuck.
      That's particularly the case for the cpu looping in __cpu_die() waiting
      for the dying cpu to be dead.
      
      This patch addresses this by having the tick_do_timer_cpu handover happen
      earlier during the CPU_DYING notification. For this, a new clockevent
      notification type is introduced (CLOCK_EVT_NOTIFY_CPU_DYING) which is triggered
      in hrtimer_cpu_notify().
      Signed-off-by: NSebastien Dugue <sebastien.dugue@bull.net>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5762ba18
  20. 27 12月, 2008 1 次提交
    • F
      hrtimers: increase clock min delta threshold while interrupt hanging · 1cc4fff0
      Frederic Weisbecker 提交于
      Impact: avoid timer IRQ hanging slow systems
      
      While using the function graph tracer on a virtualized system, the
      hrtimer_interrupt can hang the system on an infinite loop.
      
      This can be caused in several situations:
      
       - the hardware is very slow and HZ is set too high
      
       - something intrusive is slowing the system down (tracing under emulation)
      
      ... and the next clock events to program are always before the current time.
      
      This patch implements a reasonable compromise: if such a situation is
      detected, we share the CPUs time in 1/4 to process the hrtimer interrupts.
      This is enough to let the system running without serious starvation.
      
      It has been successfully tested under VirtualBox with 1000 HZ and 100 HZ
      with function graph tracer launched. On both cases, the clock events were
      increased until about 25 ms periodic ticks, which means 40 HZ.
      
      So we change a hard to debug hang into a warning message and a system that
      still manages to limp along.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1cc4fff0
  21. 26 12月, 2008 1 次提交
  22. 19 12月, 2008 1 次提交
  23. 09 12月, 2008 1 次提交
  24. 04 12月, 2008 1 次提交
  25. 25 11月, 2008 1 次提交
    • P
      hrtimer: removing all ur callback modes · ca109491
      Peter Zijlstra 提交于
      Impact: cleanup, move all hrtimer processing into hardirq context
      
      This is an attempt at removing some of the hrtimer complexity by
      reducing the number of callback modes to 1.
      
      This means that all hrtimer callback functions will be ran from HARD-irq
      context.
      
      I went through all the 30 odd hrtimer callback functions in the kernel
      and saw only one that I'm not quite sure of, which is the one in
      net/can/bcm.c - hence I'm CC-ing the folks responsible for that code.
      
      Furthermore, the hrtimer core now calls callbacks directly with IRQs
      disabled in case you try to enqueue an expired timer. If this timer is a
      periodic timer (which should use hrtimer_forward() to advance its time)
      then it might be possible to end up in an inf. recursive loop due to the
      fact that hrtimer_forward() doesn't round up to the next timer
      granularity, and therefore keeps on calling the callback - obviously
      this needs a fix.
      
      Aside from that, this seems to compile and actually boot on my dual core
      test box - although I'm sure there are some bugs in, me not hitting any
      makes me certain :-)
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ca109491
  26. 12 11月, 2008 1 次提交
  27. 11 11月, 2008 1 次提交
    • G
      timers: handle HRTIMER_CB_IRQSAFE_UNLOCKED correctly from softirq context · 5d5254f0
      Gautham R Shenoy 提交于
      Impact: fix incorrect locking triggered during hotplug-intense stress-tests
      
      While migrating the the CB_IRQSAFE_UNLOCKED timers during a cpu-offline,
      we queue them on the cb_pending list, so that they won't go
      stale.
      
      Thus, when the callbacks of the timers run from the softirq context,
      they could run into potential deadlocks, since these callbacks
      assume that they're running with irq's disabled, thereby annoying
      lockdep!
      
      Fix this by emulating hardirq context while running these callbacks from
      the hrtimer softirq.
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.27 #2
      --------------------------------
      inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
      ksoftirqd/0/4 [HC0[0]:SC1[1]:HE1:SE0] takes:
       (&rq->lock){++..}, at: [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
      {in-hardirq-W} state was registered at:
        [<c014103c>] __lock_acquire+0x549/0x121e
        [<c0107890>] native_sched_clock+0x88/0x99
        [<c013aa12>] clocksource_get_next+0x39/0x3f
        [<c0139abc>] update_wall_time+0x616/0x7df
        [<c0141d6b>] lock_acquire+0x5a/0x74
        [<c0121724>] scheduler_tick+0x3a/0x18d
        [<c047ed45>] _spin_lock+0x1c/0x45
        [<c0121724>] scheduler_tick+0x3a/0x18d
        [<c0121724>] scheduler_tick+0x3a/0x18d
        [<c012c436>] update_process_times+0x3a/0x44
        [<c013c044>] tick_periodic+0x63/0x6d
        [<c013c062>] tick_handle_periodic+0x14/0x5e
        [<c010568c>] timer_interrupt+0x44/0x4a
        [<c0150c9f>] handle_IRQ_event+0x13/0x3d
        [<c0151c14>] handle_level_irq+0x79/0xbd
        [<c0105634>] do_IRQ+0x69/0x7d
        [<c01041e4>] common_interrupt+0x28/0x30
        [<c047007b>] aac_probe_one+0x1a3/0x3f3
        [<c047ec2d>] _spin_unlock_irqrestore+0x36/0x39
        [<c01512b4>] setup_irq+0x1be/0x1f9
        [<c065d70b>] start_kernel+0x259/0x2c5
        [<ffffffff>] 0xffffffff
      irq event stamp: 50102
      hardirqs last  enabled at (50102): [<c047ebf4>] _spin_unlock_irq+0x20/0x23
      hardirqs last disabled at (50101): [<c047edc2>] _spin_lock_irq+0xa/0x4b
      softirqs last  enabled at (50088): [<c0128ba6>] do_softirq+0x37/0x4d
      softirqs last disabled at (50099): [<c0128ba6>] do_softirq+0x37/0x4d
      
      other info that might help us debug this:
      no locks held by ksoftirqd/0/4.
      
      stack backtrace:
      Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.27 #2
       [<c013f6cb>] print_usage_bug+0x13e/0x147
       [<c013fef5>] mark_lock+0x493/0x797
       [<c01410b1>] __lock_acquire+0x5be/0x121e
       [<c0141d6b>] lock_acquire+0x5a/0x74
       [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
       [<c047ed45>] _spin_lock+0x1c/0x45
       [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
       [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
       [<c01210fd>] finish_task_switch+0x41/0xbd
       [<c0107890>] native_sched_clock+0x88/0x99
       [<c011dae6>] sched_rt_period_timer+0x0/0x1fc
       [<c0136dda>] run_hrtimer_pending+0x54/0xe5
       [<c011dae6>] sched_rt_period_timer+0x0/0x1fc
       [<c0128afb>] __do_softirq+0x7b/0xef
       [<c0128ba6>] do_softirq+0x37/0x4d
       [<c0128c12>] ksoftirqd+0x56/0xc5
       [<c0128bbc>] ksoftirqd+0x0/0xc5
       [<c0134649>] kthread+0x38/0x5d
       [<c0134611>] kthread+0x0/0x5d
       [<c0104477>] kernel_thread_helper+0x7/0x10
       =======================
      Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5d5254f0
  28. 20 10月, 2008 2 次提交