1. 11 11月, 2008 1 次提交
    • G
      timers: handle HRTIMER_CB_IRQSAFE_UNLOCKED correctly from softirq context · 5d5254f0
      Gautham R Shenoy 提交于
      Impact: fix incorrect locking triggered during hotplug-intense stress-tests
      
      While migrating the the CB_IRQSAFE_UNLOCKED timers during a cpu-offline,
      we queue them on the cb_pending list, so that they won't go
      stale.
      
      Thus, when the callbacks of the timers run from the softirq context,
      they could run into potential deadlocks, since these callbacks
      assume that they're running with irq's disabled, thereby annoying
      lockdep!
      
      Fix this by emulating hardirq context while running these callbacks from
      the hrtimer softirq.
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.27 #2
      --------------------------------
      inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
      ksoftirqd/0/4 [HC0[0]:SC1[1]:HE1:SE0] takes:
       (&rq->lock){++..}, at: [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
      {in-hardirq-W} state was registered at:
        [<c014103c>] __lock_acquire+0x549/0x121e
        [<c0107890>] native_sched_clock+0x88/0x99
        [<c013aa12>] clocksource_get_next+0x39/0x3f
        [<c0139abc>] update_wall_time+0x616/0x7df
        [<c0141d6b>] lock_acquire+0x5a/0x74
        [<c0121724>] scheduler_tick+0x3a/0x18d
        [<c047ed45>] _spin_lock+0x1c/0x45
        [<c0121724>] scheduler_tick+0x3a/0x18d
        [<c0121724>] scheduler_tick+0x3a/0x18d
        [<c012c436>] update_process_times+0x3a/0x44
        [<c013c044>] tick_periodic+0x63/0x6d
        [<c013c062>] tick_handle_periodic+0x14/0x5e
        [<c010568c>] timer_interrupt+0x44/0x4a
        [<c0150c9f>] handle_IRQ_event+0x13/0x3d
        [<c0151c14>] handle_level_irq+0x79/0xbd
        [<c0105634>] do_IRQ+0x69/0x7d
        [<c01041e4>] common_interrupt+0x28/0x30
        [<c047007b>] aac_probe_one+0x1a3/0x3f3
        [<c047ec2d>] _spin_unlock_irqrestore+0x36/0x39
        [<c01512b4>] setup_irq+0x1be/0x1f9
        [<c065d70b>] start_kernel+0x259/0x2c5
        [<ffffffff>] 0xffffffff
      irq event stamp: 50102
      hardirqs last  enabled at (50102): [<c047ebf4>] _spin_unlock_irq+0x20/0x23
      hardirqs last disabled at (50101): [<c047edc2>] _spin_lock_irq+0xa/0x4b
      softirqs last  enabled at (50088): [<c0128ba6>] do_softirq+0x37/0x4d
      softirqs last disabled at (50099): [<c0128ba6>] do_softirq+0x37/0x4d
      
      other info that might help us debug this:
      no locks held by ksoftirqd/0/4.
      
      stack backtrace:
      Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.27 #2
       [<c013f6cb>] print_usage_bug+0x13e/0x147
       [<c013fef5>] mark_lock+0x493/0x797
       [<c01410b1>] __lock_acquire+0x5be/0x121e
       [<c0141d6b>] lock_acquire+0x5a/0x74
       [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
       [<c047ed45>] _spin_lock+0x1c/0x45
       [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
       [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
       [<c01210fd>] finish_task_switch+0x41/0xbd
       [<c0107890>] native_sched_clock+0x88/0x99
       [<c011dae6>] sched_rt_period_timer+0x0/0x1fc
       [<c0136dda>] run_hrtimer_pending+0x54/0xe5
       [<c011dae6>] sched_rt_period_timer+0x0/0x1fc
       [<c0128afb>] __do_softirq+0x7b/0xef
       [<c0128ba6>] do_softirq+0x37/0x4d
       [<c0128c12>] ksoftirqd+0x56/0xc5
       [<c0128bbc>] ksoftirqd+0x0/0xc5
       [<c0134649>] kthread+0x38/0x5d
       [<c0134611>] kthread+0x0/0x5d
       [<c0104477>] kernel_thread_helper+0x7/0x10
       =======================
      Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5d5254f0
  2. 20 10月, 2008 2 次提交
  3. 14 10月, 2008 1 次提交
  4. 13 10月, 2008 1 次提交
  5. 12 10月, 2008 1 次提交
  6. 29 9月, 2008 4 次提交
    • T
      hrtimer: prevent migration of per CPU hrtimers · ccc7dadf
      Thomas Gleixner 提交于
      Impact: per CPU hrtimers can be migrated from a dead CPU
      
      The hrtimer code has no knowledge about per CPU timers, but we need to
      prevent the migration of such timers and warn when such a timer is
      active at migration time.
      
      Explicitely mark the timers as per CPU and use a more understandable
      mode descriptor for the interrupts safe unlocked callback mode, which
      is used by hrtimer_sleeper and the scheduler code.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      ccc7dadf
    • T
      hrtimer: mark migration state · b00c1a99
      Thomas Gleixner 提交于
      Impact: during migration active hrtimers can be seen as inactive
      
      The migration code removes the hrtimers from the queues of the dead
      CPU and sets the state temporary to INACTIVE. The enqueue code sets it
      to ACTIVE/PENDING again.
      
      Prevent that the wrong state can be seen by using a separate migration
      state bit.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      b00c1a99
    • T
      hrtimer: fix migration of CB_IRQSAFE_NO_SOFTIRQ hrtimers · 41e1022e
      Thomas Gleixner 提交于
      Impact: Stale timers after a CPU went offline.
      
      commit 37bb6cb4
             hrtimer: unlock hrtimer_wakeup
      
      changed the hrtimer sleeper callback mode to CB_IRQSAFE_NO_SOFTIRQ due
      to locking problems. A result of this change is that when enqueue is
      called for an already expired hrtimer the callback function is not
      longer called directly from the enqueue code. The normal callers have
      been fixed in the code, but the migration code which moves hrtimers
      from a dead CPU to a live CPU was not made aware of this.
      
      This can be fixed by checking the timer state after the call to
      enqueue in the migration code.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      41e1022e
    • T
      hrtimer: migrate pending list on cpu offline · 7659e349
      Thomas Gleixner 提交于
      Impact: hrtimers which are on the pending list are not migrated at cpu
      	offline and can be stale forever
      
      Add the pending list migration when CONFIG_HIGH_RES_TIMERS is enabled
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      7659e349
  7. 22 9月, 2008 1 次提交
  8. 11 9月, 2008 2 次提交
  9. 08 9月, 2008 1 次提交
  10. 06 9月, 2008 3 次提交
  11. 21 8月, 2008 1 次提交
  12. 04 7月, 2008 1 次提交
  13. 26 6月, 2008 1 次提交
  14. 27 5月, 2008 2 次提交
  15. 25 5月, 2008 1 次提交
    • C
      Remove argument from open_softirq which is always NULL · 962cf36c
      Carlos R. Mafra 提交于
      As git-grep shows, open_softirq() is always called with the last argument
      being NULL
      
      block/blk-core.c:       open_softirq(BLOCK_SOFTIRQ, blk_done_softirq, NULL);
      kernel/hrtimer.c:       open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq, NULL);
      kernel/rcuclassic.c:    open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
      kernel/rcupreempt.c:    open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
      kernel/sched.c: open_softirq(SCHED_SOFTIRQ, run_rebalance_domains, NULL);
      kernel/softirq.c:       open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
      kernel/softirq.c:       open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
      kernel/timer.c: open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL);
      net/core/dev.c: open_softirq(NET_TX_SOFTIRQ, net_tx_action, NULL);
      net/core/dev.c: open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL);
      
      This observation has already been made by Matthew Wilcox in June 2002
      (http://www.cs.helsinki.fi/linux/linux-kernel/2002-25/0687.html)
      
      "I notice that none of the current softirq routines use the data element
      passed to them."
      
      and the situation hasn't changed since them. So it appears we can safely
      remove that extra argument to save 128 (54) bytes of kernel data (text).
      Signed-off-by: NCarlos R. Mafra <crmafra@ift.unesp.br>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      962cf36c
  16. 13 5月, 2008 1 次提交
  17. 04 5月, 2008 1 次提交
  18. 30 4月, 2008 1 次提交
  19. 29 4月, 2008 1 次提交
    • T
      hrtimer: raise softirq unlocked to avoid circular lock dependency · 0c96c597
      Thomas Gleixner 提交于
      The scheduler hrtimer bits in 2.6.25 introduced a circular lock
      dependency in a rare code path:
      
      =======================================================
      [ INFO: possible circular locking dependency detected ]
      2.6.25-sched-devel.git-x86-latest.git #19
      -------------------------------------------------------
      X/2980 is trying to acquire lock:
       (&rq->rq_lock_key#2){++..}, at: [<ffffffff80230146>] task_rq_lock+0x56/0xa0
      
      but task is already holding lock:
       (&cpu_base->lock){++..}, at: [<ffffffff80257ae1>] lock_hrtimer_base+0x31/0x60
      
      which lock already depends on the new lock.
      
      The scenario which leads to this is:
      
      posix-timer signal is delivered
       -> posix-timer is rearmed
          timer is already expired in hrtimer_enqueue()
           -> softirq is raised
      
      To prevent this we need to move the raise of the softirq out of the
      base->lock protected code path.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      0c96c597
  20. 28 4月, 2008 1 次提交
    • B
      hrtimer: timeout too long when using HRTIMER_CB_SOFTIRQ · d7b41a24
      Bodo Stroesser 提交于
      When using hrtimer with timer->cb_mode == HRTIMER_CB_SOFTIRQ
      in some cases the clockevent is not programmed.
      This happens, if:
       - a timer is rearmed while it's state is HRTIMER_STATE_CALLBACK
       - hrtimer_reprogram() returns -ETIME, when it is called after
         CALLBACK is finished. This occurs if the new timer->expires
         is in the past when CALLBACK is done.
      In this case, the timer needs to be removed from the tree and put
      onto the pending list again.
      
      The patch is against 2.6.22.5, but AFAICS, it is relevant
      for 2.6.25 also (in run_hrtimer_pending()).
      Signed-off-by: NBodo Stroesser <bstroesser@fujitsu-siemens.com>
      Cc: stable@kernel.org
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      d7b41a24
  21. 21 4月, 2008 2 次提交
  22. 17 4月, 2008 2 次提交
  23. 15 2月, 2008 2 次提交
  24. 10 2月, 2008 2 次提交
    • O
      hrtimer: don't modify restart_block->fn in restart functions · c289b074
      Oleg Nesterov 提交于
      hrtimer_nanosleep_restart() clears/restores restart_block->fn. This is
      pointless and complicates its usage. Note that if sys_restart_syscall()
      doesn't actually happen, we have a bogus "pending" restart->fn anyway,
      this is harmless.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: Pavel Emelyanov <xemul@sw.ru>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Toyo Abe <toyoa@mvista.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      c289b074
    • O
      hrtimer: fix *rmtp handling in hrtimer_nanosleep() · 080344b9
      Oleg Nesterov 提交于
      Spotted by Pavel Emelyanov and Alexey Dobriyan.
      
      hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to
      the local variable which lives in the caller's stack frame. This means that
      if sys_restart_syscall() actually happens and it is interrupted as well, we
      don't update the user-space variable, but write into the already dead stack
      frame.
      
      Introduced by commit 04c22714
      hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier
      
      Change the callers to pass "__user *rmtp" to hrtimer_nanosleep(), and change
      hrtimer_nanosleep() to use copy_to_user() to actually update *rmtp.
      
      Small problem remains. man 2 nanosleep states that *rtmp should be written if
      nanosleep() was interrupted (it says nothing whether it is OK to update *rmtp
      if nanosleep returns 0), but (with or without this patch) we can dirty *rem
      even if nanosleep() returns 0.
      
      NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other
      bugs. Fixed by the next patch.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Pavel Emelyanov <xemul@sw.ru>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Toyo Abe <toyoa@mvista.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      
       include/linux/hrtimer.h |    2 -
       kernel/hrtimer.c        |   51 +++++++++++++++++++++++++-----------------------
       kernel/posix-timers.c   |   14 +------------
       3 files changed, 30 insertions(+), 37 deletions(-)
      080344b9
  25. 06 2月, 2008 1 次提交
    • D
      timerfd: new timerfd API · 4d672e7a
      Davide Libenzi 提交于
      This is the new timerfd API as it is implemented by the following patch:
      
      int timerfd_create(int clockid, int flags);
      int timerfd_settime(int ufd, int flags,
      		    const struct itimerspec *utmr,
      		    struct itimerspec *otmr);
      int timerfd_gettime(int ufd, struct itimerspec *otmr);
      
      The timerfd_create() API creates an un-programmed timerfd fd.  The "clockid"
      parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.
      
      The timerfd_settime() API give new settings by the timerfd fd, by optionally
      retrieving the previous expiration time (in case the "otmr" parameter is not
      NULL).
      
      The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
      is set in the "flags" parameter.  Otherwise it's a relative time.
      
      The timerfd_gettime() API returns the next expiration time of the timer, or
      {0, 0} if the timerfd has not been set yet.
      
      Like the previous timerfd API implementation, read(2) and poll(2) are
      supported (with the same interface).  Here's a simple test program I used to
      exercise the new timerfd APIs:
      
      http://www.xmailserver.org/timerfd-test2.c
      
      [akpm@linux-foundation.org: coding-style cleanups]
      [akpm@linux-foundation.org: fix ia64 build]
      [akpm@linux-foundation.org: fix m68k build]
      [akpm@linux-foundation.org: fix mips build]
      [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
      [heiko.carstens@de.ibm.com: fix s390]
      [akpm@linux-foundation.org: fix powerpc build]
      [akpm@linux-foundation.org: fix sparc64 more]
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4d672e7a
  26. 02 2月, 2008 1 次提交
  27. 26 1月, 2008 2 次提交