1. 31 March 2009 · 1 commit
    • hrtimer: fix rq->lock inversion (again) · 7f1e2ca9
      Peter Zijlstra committed
      It appears I inadvertently introduced rq->lock recursion to the
      hrtimer_start() path when I delegated running already expired
      timers to softirq context.
      
      This patch fixes it by introducing a __hrtimer_start_range_ns()
      method that will not use raise_softirq_irqoff() but
      __raise_softirq_irqoff() which avoids the wakeup.
      
      It then also changes schedule() to check for pending softirqs and
      do the wakeup then. I'm not quite sure I like this last bit, nor
      am I convinced it's really needed.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      LKML-Reference: <20090313112301.096138802@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
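The difference between raising a softirq with and without an immediate wakeup, as described in the commit above, can be pictured with a toy user-space model. This is only a sketch: `toy_cpu`, `toy_raise`, `toy_raise_nowake` and `toy_schedule_check` are hypothetical names invented for illustration and are not kernel symbols; the real functions operate on per-CPU state with interrupts disabled.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_cpu {
    unsigned int softirq_pending; /* bitmask of raised softirqs */
    int wakeups;                  /* how often the softirq thread was woken */
};

/* Stands in for __raise_softirq_irqoff(): only mark the softirq pending. */
void toy_raise_nowake(struct toy_cpu *cpu, unsigned int nr)
{
    cpu->softirq_pending |= 1u << nr;
}

/* Stands in for raise_softirq_irqoff(): mark pending, then wake the
 * softirq thread. The wakeup is the step the commit wants to avoid
 * on the hrtimer_start() path. */
void toy_raise(struct toy_cpu *cpu, unsigned int nr)
{
    toy_raise_nowake(cpu, nr);
    cpu->wakeups++;
}

/* Stands in for the check added to schedule(): perform the deferred
 * wakeup if anything is pending. Returns true if a wakeup happened. */
bool toy_schedule_check(struct toy_cpu *cpu)
{
    if (cpu->softirq_pending == 0)
        return false;
    cpu->wakeups++;
    return true;
}
```

In this model, a caller that must not trigger a wakeup uses `toy_raise_nowake()`, and the wakeup happens later when `toy_schedule_check()` runs.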
  2. 31 January 2009 · 3 commits
    • hrtimer: prevent negative expiry value after clock_was_set() · b0a9b511
      Thomas Gleixner committed
      Impact: prevent false positive WARN_ON() in clockevents_program_event()
      
      clock_was_set() changes the base->offset of CLOCK_REALTIME and
      enforces the reprogramming of the clockevent device to expire timers
      which are based on CLOCK_REALTIME. If the clock change is large enough
      then the subtraction of the timer expiry value and base->offset can
      become negative which triggers the warning in
      clockevents_program_event().
      
      Check the subtraction result and set a negative value to 0.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
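The clamp described in the commit above ("check the subtraction result and set a negative value to 0") can be sketched as follows. This is a minimal illustration, assuming nanosecond values held in signed 64-bit integers; `toy_reprogram_expiry` is a hypothetical helper, not the function the patch actually touches.

```c
#include <stdint.h>

/* Hypothetical helper: compute the value handed to the clock event
 * device. If the clock was set far enough that the subtraction goes
 * negative, clamp to 0 so the event is programmed to fire right away
 * instead of tripping a sanity check. */
int64_t toy_reprogram_expiry(int64_t timer_expires_ns, int64_t base_offset_ns)
{
    int64_t expires = timer_expires_ns - base_offset_ns;

    if (expires < 0)
        expires = 0;
    return expires;
}
```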
    • hrtimers: allow the hot-unplugging of all cpus · 94df7de0
      Sebastien Dugue committed
      Impact: fix CPU hotplug hang on Power6 testbox
      
      On architectures that support offlining all cpus (at least powerpc/pseries),
      hot-unplugging the tick_do_timer_cpu can result in a system hang.
      
      This comes from the fact that if the cpu going down happens to be the
      cpu doing the tick, then as the tick_do_timer_cpu handover happens after the
      cpu is dead (via the CPU_DEAD notification), we're left without ticks,
      jiffies are frozen and any task relying on timers (msleep, ...) is stuck.
      That's particularly the case for the cpu looping in __cpu_die() waiting
      for the dying cpu to be dead.
      
      This patch addresses this by having the tick_do_timer_cpu handover happen
      earlier during the CPU_DYING notification. For this, a new clockevent
      notification type is introduced (CLOCK_EVT_NOTIFY_CPU_DYING) which is triggered
      in hrtimer_cpu_notify().
      Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • hrtimers: increase clock min delta threshold while interrupt hanging · 7f22391c
      Frederic Weisbecker committed
      Impact: avoid timer IRQ hanging slow systems
      
      While using the function graph tracer on a virtualized system, the
      hrtimer_interrupt can hang the system in an infinite loop.
      
      This can be caused in several situations:
      
       - the hardware is very slow and HZ is set too high
      
       - something intrusive is slowing the system down (tracing under emulation)
      
      ... and the next clock events to program are always before the current time.
      
      This patch implements a reasonable compromise: if such a situation is
      detected, we limit hrtimer interrupt processing to 1/4 of the CPU's time.
      This is enough to keep the system running without serious starvation.
      
      It has been successfully tested under VirtualBox with 1000 HZ and 100 HZ
      with the function graph tracer running. In both cases, the clock event
      period was increased to about 25 ms, which means ticks at about 40 HZ.
      
      So we change a hard-to-debug hang into a warning message and a system that
      still manages to limp along.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
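The arithmetic in the commit above can be checked with a short sketch. The factor of 3 below is an assumption chosen so that interrupt handling ends up with 1/4 of the CPU's time (one part handling, three parts left over); the commit message itself only states the 1/4 share and the observed 25 ms period. All function names are hypothetical.

```c
#include <stdint.h>

/* Assumed policy: the next event may not be programmed sooner than
 * three times the time just spent handling the interrupt. */
uint64_t toy_min_delta_ns(uint64_t handling_ns)
{
    return handling_ns * 3;
}

/* Percentage of CPU time spent in the handler under that policy:
 * handling / (handling + min_delta). */
uint64_t toy_handler_share_percent(uint64_t handling_ns)
{
    uint64_t total = handling_ns + toy_min_delta_ns(handling_ns);

    return total ? (100 * handling_ns) / total : 0;
}

/* Tick frequency in Hz for a tick period given in nanoseconds
 * (period_ns must be non-zero). */
uint64_t toy_period_to_hz(uint64_t period_ns)
{
    return UINT64_C(1000000000) / period_ns;
}
```

With a 25 ms period, `toy_period_to_hz(25000000)` gives 40, matching the 40 HZ figure quoted in the message.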
  3. 25 January 2009 · 1 commit
    • hrtimer: prevent negative expiry value after clock_was_set() · 6626bff2
      Thomas Gleixner committed
      Impact: prevent false positive WARN_ON() in clockevents_program_event()
      
      clock_was_set() changes the base->offset of CLOCK_REALTIME and
      enforces the reprogramming of the clockevent device to expire timers
      which are based on CLOCK_REALTIME. If the clock change is large enough
      then the subtraction of the timer expiry value and base->offset can
      become negative which triggers the warning in
      clockevents_program_event(). 
      
      Check the subtraction result and set a negative value to 0.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  4. 19 January 2009 · 1 commit
    • hrtimers: fix inconsistent lock state on resume in hres_timers_resume · 1d4a7f1c
      Peter Zijlstra committed
      Andrey Borzenkov reported this lockdep assert:
      
      > [17854.688347] =================================
      > [17854.688347] [ INFO: inconsistent lock state ]
      > [17854.688347] 2.6.29-rc2-1avb #1
      > [17854.688347] ---------------------------------
      > [17854.688347] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
      > [17854.688347] pm-suspend/18240 [HC0[0]:SC0[0]:HE1:SE1] takes:
      > [17854.688347]  (&cpu_base->lock){++..}, at: [<c0136fcc>] retrigger_next_event+0x5c/0xa0
      > [17854.688347] {in-hardirq-W} state was registered at:
      > [17854.688347]   [<c01443cd>] __lock_acquire+0x79d/0x1930
      > [17854.688347]   [<c01455bc>] lock_acquire+0x5c/0x80
      > [17854.688347]   [<c03092e5>] _spin_lock+0x35/0x70
      > [17854.688347]   [<c0136e61>] hrtimer_run_queues+0x31/0x140
      > [17854.688347]   [<c0128d98>] run_local_timers+0x8/0x20
      > [17854.688347]   [<c0128dd3>] update_process_times+0x23/0x60
      > [17854.688347]   [<c013e274>] tick_periodic+0x24/0x80
      > [17854.688347]   [<c013e2e2>] tick_handle_periodic+0x12/0x70
      > [17854.688347]   [<c0104e24>] timer_interrupt+0x14/0x20
      > [17854.688347]   [<c01607b9>] handle_IRQ_event+0x29/0x60
      > [17854.688347]   [<c0161c59>] handle_level_irq+0x69/0xe0
      > [17854.688347]   [<ffffffff>] 0xffffffff
      > [17854.688347] irq event stamp: 55771
      > [17854.688347] hardirqs last  enabled at (55771): [<c0309125>] _spin_unlock_irqrestore+0x35/0x60
      > [17854.688347] hardirqs last disabled at (55770): [<c0309419>] _spin_lock_irqsave+0x19/0x80
      > [17854.688347] softirqs last  enabled at (54836): [<c0124f54>] __do_softirq+0xc4/0x110
      > [17854.688347] softirqs last disabled at (54831): [<c01049ae>] do_softirq+0x8e/0xe0
      > [17854.688347]
      > [17854.688347] other info that might help us debug this:
      > [17854.688347] 3 locks held by pm-suspend/18240:
      > [17854.688347]  #0:  (&buffer->mutex){--..}, at: [<c01dd4c5>] sysfs_write_file+0x25/0x100
      > [17854.688347]  #1:  (pm_mutex){--..}, at: [<c015056f>] enter_state+0x4f/0x140
      > [17854.688347]  #2:  (dpm_list_mtx){--..}, at: [<c027880f>] device_pm_lock+0xf/0x20
      > [17854.688347]
      > [17854.688347] stack backtrace:
      > [17854.688347] Pid: 18240, comm: pm-suspend Not tainted 2.6.29-rc2-1avb #1
      > [17854.688347] Call Trace:
      > [17854.688347]  [<c0306248>] ? printk+0x18/0x20
      > [17854.688347]  [<c0141fac>] print_usage_bug+0x16c/0x1d0
      > [17854.688347]  [<c0142bcf>] mark_lock+0x8bf/0xc90
      > [17854.688347]  [<c0106b8f>] ? pit_next_event+0x2f/0x40
      > [17854.688347]  [<c01441b0>] __lock_acquire+0x580/0x1930
      > [17854.688347]  [<c030916d>] ? _spin_unlock+0x1d/0x20
      > [17854.688347]  [<c0106b8f>] ? pit_next_event+0x2f/0x40
      > [17854.688347]  [<c013dd38>] ? clockevents_program_event+0x98/0x160
      > [17854.688347]  [<c0142fe8>] ? mark_held_locks+0x48/0x90
      > [17854.688347]  [<c0309125>] ? _spin_unlock_irqrestore+0x35/0x60
      > [17854.688347]  [<c0143229>] ? trace_hardirqs_on_caller+0x139/0x190
      > [17854.688347]  [<c014328b>] ? trace_hardirqs_on+0xb/0x10
      > [17854.688347]  [<c01455bc>] lock_acquire+0x5c/0x80
      > [17854.688347]  [<c0136fcc>] ? retrigger_next_event+0x5c/0xa0
      > [17854.688347]  [<c03092e5>] _spin_lock+0x35/0x70
      > [17854.688347]  [<c0136fcc>] ? retrigger_next_event+0x5c/0xa0
      > [17854.688347]  [<c0136fcc>] retrigger_next_event+0x5c/0xa0
      > [17854.688347]  [<c013711a>] hres_timers_resume+0xa/0x10
      > [17854.688347]  [<c013aa8e>] timekeeping_resume+0xee/0x150
      > [17854.688347]  [<c0273384>] __sysdev_resume+0x14/0x50
      > [17854.688347]  [<c0273407>] sysdev_resume+0x47/0x80
      > [17854.688347]  [<c02791ab>] device_power_up+0xb/0x20
      > [17854.688347]  [<c015043f>] suspend_devices_and_enter+0xcf/0x150
      > [17854.688347]  [<c0150c2f>] ? freeze_processes+0x3f/0x90
      > [17854.688347]  [<c0150614>] enter_state+0xf4/0x140
      > [17854.688347]  [<c01506dd>] state_store+0x7d/0xc0
      > [17854.688347]  [<c0150660>] ? state_store+0x0/0xc0
      > [17854.688347]  [<c0202da4>] kobj_attr_store+0x24/0x30
      > [17854.688347]  [<c01dd53c>] sysfs_write_file+0x9c/0x100
      > [17854.688347]  [<c019916c>] vfs_write+0x9c/0x160
      > [17854.688347]  [<c0103494>] ? restore_nocheck_notrace+0x0/0xe
      > [17854.688347]  [<c01dd4a0>] ? sysfs_write_file+0x0/0x100
      > [17854.688347]  [<c01992ed>] sys_write+0x3d/0x70
      > [17854.688347]  [<c0103371>] sysenter_do_call+0x12/0x31
      
      Andrey's analysis:
      
      > timekeeping_resume() is called via class ->resume
      > method; and according to comments in sysdev_resume() and
      > device_power_up(), they are called with interrupts disabled.
      >
      > Looking at suspend_enter, irqs *are* disabled at this point.
      >
      > So it actually looks like something (may be some driver)
      > unconditionally enabled irqs in resume path.
      
      Add a debug check to test this theory. If it triggers then it
      triggers because the resume code calls it with irqs enabled,
      which is a no-no not just for timekeeping_resume(), but also
      bad for a number of other resume handlers.
      Reported-by: Andrey Borzenkov <arvidjaar@mail.ru>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
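The kind of debug check described in the commit above (warn if the resume path is entered with interrupts enabled) might look roughly like this in a toy user-space model. The flag `toy_irqs_enabled`, the counter `toy_warnings` and the function `toy_resume_timers` are all hypothetical stand-ins; the kernel would query the real interrupt state and use its own warning macros.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the CPU's interrupt-enable state. */
bool toy_irqs_enabled;
/* Counts how many times the debug check fired. */
int toy_warnings;

/* Hypothetical resume hook: it expects to be called with interrupts
 * disabled and complains, without aborting, when that is not the case. */
void toy_resume_timers(void)
{
    if (toy_irqs_enabled) {
        toy_warnings++;
        fprintf(stderr, "resume handler called with IRQs enabled\n");
    }
    /* The actual timer retriggering would follow here. */
}
```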
  5. 14 January 2009 · 1 commit
  6. 05 January 2009 · 6 commits
  7. 30 December 2008 · 1 commit
    • hrtimers: allow the hot-unplugging of all cpus · 5762ba18
      Sebastien Dugue committed
      Impact: fix CPU hotplug hang on Power6 testbox
      
      On architectures that support offlining all cpus (at least powerpc/pseries),
      hot-unplugging the tick_do_timer_cpu can result in a system hang.
      
      This comes from the fact that if the cpu going down happens to be the
      cpu doing the tick, then as the tick_do_timer_cpu handover happens after the
      cpu is dead (via the CPU_DEAD notification), we're left without ticks,
      jiffies are frozen and any task relying on timers (msleep, ...) is stuck.
      That's particularly the case for the cpu looping in __cpu_die() waiting
      for the dying cpu to be dead.
      
      This patch addresses this by having the tick_do_timer_cpu handover happen
      earlier during the CPU_DYING notification. For this, a new clockevent
      notification type is introduced (CLOCK_EVT_NOTIFY_CPU_DYING) which is triggered
      in hrtimer_cpu_notify().
      Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  8. 27 December 2008 · 1 commit
    • hrtimers: increase clock min delta threshold while interrupt hanging · 1cc4fff0
      Frederic Weisbecker committed
      Impact: avoid timer IRQ hanging slow systems
      
      While using the function graph tracer on a virtualized system, the
      hrtimer_interrupt can hang the system in an infinite loop.
      
      This can be caused in several situations:
      
       - the hardware is very slow and HZ is set too high
      
       - something intrusive is slowing the system down (tracing under emulation)
      
      ... and the next clock events to program are always before the current time.
      
      This patch implements a reasonable compromise: if such a situation is
      detected, we limit hrtimer interrupt processing to 1/4 of the CPU's time.
      This is enough to keep the system running without serious starvation.
      
      It has been successfully tested under VirtualBox with 1000 HZ and 100 HZ
      with the function graph tracer running. In both cases, the clock event
      period was increased to about 25 ms, which means ticks at about 40 HZ.
      
      So we change a hard-to-debug hang into a warning message and a system that
      still manages to limp along.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 26 December 2008 · 1 commit
  10. 19 December 2008 · 1 commit
  11. 09 December 2008 · 1 commit
  12. 04 December 2008 · 1 commit
  13. 25 November 2008 · 1 commit
    • hrtimer: removing all ur callback modes · ca109491
      Peter Zijlstra committed
      Impact: cleanup, move all hrtimer processing into hardirq context
      
      This is an attempt at removing some of the hrtimer complexity by
      reducing the number of callback modes to 1.
      
      This means that all hrtimer callback functions will be run from HARD-irq
      context.
      
      I went through all the 30-odd hrtimer callback functions in the kernel
      and saw only one that I'm not quite sure of, which is the one in
      net/can/bcm.c - hence I'm CC-ing the folks responsible for that code.
      
      Furthermore, the hrtimer core now calls callbacks directly with IRQs
      disabled in case you try to enqueue an expired timer. If this timer is a
      periodic timer (which should use hrtimer_forward() to advance its time)
      then it might be possible to end up in an infinite recursive loop due to the
      fact that hrtimer_forward() doesn't round up to the next timer
      granularity, and therefore keeps on calling the callback - obviously
      this needs a fix.
      
      Aside from that, this seems to compile and actually boot on my dual core
      test box - although I'm sure there are some bugs in, me not hitting any
      makes me certain :-)
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
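The commit above mentions that periodic timers should advance their expiry with hrtimer_forward(). The idea of forwarding can be sketched with a simplified model that moves the expiry past the current time in whole intervals. This is not the kernel's implementation: `toy_forward` is a hypothetical function, and it does not model timer granularity, which is the rounding issue the message raises.

```c
#include <stdint.h>

/* Simplified model of forwarding a periodic timer: add whole intervals
 * to *expires until it lies after 'now'. Returns how many intervals
 * were added (0 if the timer has not expired yet or the interval is
 * not positive). */
uint64_t toy_forward(int64_t *expires, int64_t now, int64_t interval)
{
    uint64_t overruns;

    if (interval <= 0 || now < *expires)
        return 0;

    /* Intervals that have fully elapsed, plus one to get past 'now'. */
    overruns = (uint64_t)((now - *expires) / interval) + 1;
    *expires += (int64_t)overruns * interval;
    return overruns;
}
```

For example, with an expiry of 100, an interval of 100 and a current time of 350, three intervals are added and the new expiry is 400. A callback that re-arms a timer whose expiry never moves past the current time would be invoked again immediately, which is the loop the message warns about.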
  14. 12 November 2008 · 1 commit
  15. 11 November 2008 · 1 commit
    • timers: handle HRTIMER_CB_IRQSAFE_UNLOCKED correctly from softirq context · 5d5254f0
      Gautham R Shenoy committed
      Impact: fix incorrect locking triggered during hotplug-intense stress-tests
      
      While migrating the CB_IRQSAFE_UNLOCKED timers during a cpu-offline,
      we queue them on the cb_pending list, so that they won't go
      stale.
      
      Thus, when the callbacks of the timers run from the softirq context,
      they could run into potential deadlocks, since these callbacks
      assume that they're running with irqs disabled, thereby annoying
      lockdep!
      
      Fix this by emulating hardirq context while running these callbacks from
      the hrtimer softirq.
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.27 #2
      --------------------------------
      inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
      ksoftirqd/0/4 [HC0[0]:SC1[1]:HE1:SE0] takes:
       (&rq->lock){++..}, at: [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
      {in-hardirq-W} state was registered at:
        [<c014103c>] __lock_acquire+0x549/0x121e
        [<c0107890>] native_sched_clock+0x88/0x99
        [<c013aa12>] clocksource_get_next+0x39/0x3f
        [<c0139abc>] update_wall_time+0x616/0x7df
        [<c0141d6b>] lock_acquire+0x5a/0x74
        [<c0121724>] scheduler_tick+0x3a/0x18d
        [<c047ed45>] _spin_lock+0x1c/0x45
        [<c0121724>] scheduler_tick+0x3a/0x18d
        [<c0121724>] scheduler_tick+0x3a/0x18d
        [<c012c436>] update_process_times+0x3a/0x44
        [<c013c044>] tick_periodic+0x63/0x6d
        [<c013c062>] tick_handle_periodic+0x14/0x5e
        [<c010568c>] timer_interrupt+0x44/0x4a
        [<c0150c9f>] handle_IRQ_event+0x13/0x3d
        [<c0151c14>] handle_level_irq+0x79/0xbd
        [<c0105634>] do_IRQ+0x69/0x7d
        [<c01041e4>] common_interrupt+0x28/0x30
        [<c047007b>] aac_probe_one+0x1a3/0x3f3
        [<c047ec2d>] _spin_unlock_irqrestore+0x36/0x39
        [<c01512b4>] setup_irq+0x1be/0x1f9
        [<c065d70b>] start_kernel+0x259/0x2c5
        [<ffffffff>] 0xffffffff
      irq event stamp: 50102
      hardirqs last  enabled at (50102): [<c047ebf4>] _spin_unlock_irq+0x20/0x23
      hardirqs last disabled at (50101): [<c047edc2>] _spin_lock_irq+0xa/0x4b
      softirqs last  enabled at (50088): [<c0128ba6>] do_softirq+0x37/0x4d
      softirqs last disabled at (50099): [<c0128ba6>] do_softirq+0x37/0x4d
      
      other info that might help us debug this:
      no locks held by ksoftirqd/0/4.
      
      stack backtrace:
      Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.27 #2
       [<c013f6cb>] print_usage_bug+0x13e/0x147
       [<c013fef5>] mark_lock+0x493/0x797
       [<c01410b1>] __lock_acquire+0x5be/0x121e
       [<c0141d6b>] lock_acquire+0x5a/0x74
       [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
       [<c047ed45>] _spin_lock+0x1c/0x45
       [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
       [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
       [<c01210fd>] finish_task_switch+0x41/0xbd
       [<c0107890>] native_sched_clock+0x88/0x99
       [<c011dae6>] sched_rt_period_timer+0x0/0x1fc
       [<c0136dda>] run_hrtimer_pending+0x54/0xe5
       [<c011dae6>] sched_rt_period_timer+0x0/0x1fc
       [<c0128afb>] __do_softirq+0x7b/0xef
       [<c0128ba6>] do_softirq+0x37/0x4d
       [<c0128c12>] ksoftirqd+0x56/0xc5
       [<c0128bbc>] ksoftirqd+0x0/0xc5
       [<c0134649>] kthread+0x38/0x5d
       [<c0134611>] kthread+0x0/0x5d
       [<c0104477>] kernel_thread_helper+0x7/0x10
       =======================
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  16. 20 October 2008 · 2 commits
  17. 14 October 2008 · 1 commit
  18. 13 October 2008 · 1 commit
  19. 12 October 2008 · 1 commit
  20. 29 September 2008 · 4 commits
    • hrtimer: prevent migration of per CPU hrtimers · ccc7dadf
      Thomas Gleixner committed
      Impact: per CPU hrtimers can be migrated from a dead CPU
      
      The hrtimer code has no knowledge about per CPU timers, but we need to
      prevent the migration of such timers and warn when such a timer is
      active at migration time.
      
      Explicitly mark the timers as per CPU and use a more understandable
      mode descriptor for the interrupt-safe unlocked callback mode, which
      is used by hrtimer_sleeper and the scheduler code.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • hrtimer: mark migration state · b00c1a99
      Thomas Gleixner committed
      Impact: during migration active hrtimers can be seen as inactive
      
      The migration code removes the hrtimers from the queues of the dead
      CPU and temporarily sets the state to INACTIVE. The enqueue code sets it
      to ACTIVE/PENDING again.
      
      Prevent the wrong state from being seen by using a separate migration
      state bit.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
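The separate migration state bit described in the commit above can be illustrated with a small model. The flag values and the names `toy_timer`, `toy_dequeue_for_migration`, `toy_enqueue` and `toy_is_active` are assumptions made for this sketch; they only show why a dedicated bit keeps a timer from looking inactive while it is between queues.

```c
/* Hypothetical state flags for the sketch. */
enum {
    TOY_STATE_INACTIVE = 0x00,
    TOY_STATE_ENQUEUED = 0x01,
    TOY_STATE_MIGRATE  = 0x10
};

struct toy_timer {
    unsigned int state;
};

/* A timer counts as active whenever any state bit is set. */
int toy_is_active(const struct toy_timer *t)
{
    return t->state != TOY_STATE_INACTIVE;
}

/* Remove the timer from the dead CPU's queue. Instead of dropping to
 * INACTIVE, mark it as migrating so observers still see it as active. */
void toy_dequeue_for_migration(struct toy_timer *t)
{
    t->state = TOY_STATE_MIGRATE;
}

/* Enqueue on the new CPU and clear the migration marker. */
void toy_enqueue(struct toy_timer *t)
{
    t->state |= TOY_STATE_ENQUEUED;
    t->state &= ~(unsigned int)TOY_STATE_MIGRATE;
}
```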
    • hrtimer: fix migration of CB_IRQSAFE_NO_SOFTIRQ hrtimers · 41e1022e
      Thomas Gleixner committed
      Impact: Stale timers after a CPU went offline.
      
      commit 37bb6cb4
             hrtimer: unlock hrtimer_wakeup
      
      changed the hrtimer sleeper callback mode to CB_IRQSAFE_NO_SOFTIRQ due
      to locking problems. A result of this change is that when enqueue is
      called for an already expired hrtimer the callback function is no
      longer called directly from the enqueue code. The normal callers have
      been fixed in the code, but the migration code which moves hrtimers
      from a dead CPU to a live CPU was not made aware of this.
      
      This can be fixed by checking the timer state after the call to
      enqueue in the migration code.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • hrtimer: migrate pending list on cpu offline · 7659e349
      Thomas Gleixner committed
      Impact: hrtimers which are on the pending list are not migrated at cpu
      	offline and can be stale forever
      
      Add the pending list migration when CONFIG_HIGH_RES_TIMERS is enabled.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  21. 22 September 2008 · 1 commit
  22. 11 September 2008 · 2 commits
  23. 08 September 2008 · 1 commit
  24. 06 September 2008 · 3 commits
  25. 21 August 2008 · 1 commit
  26. 04 July 2008 · 1 commit