1. 10 Jul, 2007 27 commits
  2. 04 Jul, 2007 1 commit
    • T
      NTP: remove clock_was_set() call to prevent deadlock · 746976a3
      Thomas Gleixner committed
      The clock_was_set() call in seconds_overflow(), which happens only when
      leap seconds are inserted or deleted, is wrong in two respects:
      
      1. it results in a call to on_each_cpu() with interrupts disabled
      2. it is a potential deadlock source vs. call_lock in smp_call_function()
      
      The only possible side effect of the removal is that an absolute
      CLOCK_REALTIME timer might fire 1 second too late, in the rare case of a
      leap second deletion while such a timer expires in the affected time
      frame. It will never fire too early.
      
      This was probably observed by the reporter of a June 30th -> July 1st
      hang: http://lkml.org/lkml/2007/7/3/103
      
      A similar problem was observed by Dave Jones, who provided a screenshot
      with a lockdep backtrace, which made it possible to analyse the problem.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 02 Jul, 2007 1 commit
    • R
      PM: introduce set_target method in pm_ops · 2391dae3
      Rafael J. Wysocki committed
      Commit 52ade9b3 changed the suspend code
      ordering to execute pm_ops->prepare() after the device model per-device
      .suspend() calls in order to fix some ACPI-related issues.  Unfortunately, it
      broke the at91 platform which assumed that pm_ops->prepare() would be called
      before suspending devices.
      
      at91 used pm_ops->prepare() to get notified of the target system sleep state,
      so that it could use this information while suspending devices.  However, with
      the current suspend code ordering pm_ops->prepare() is called too late for
      this purpose.  Thus, at91 needs an additional method in 'struct pm_ops' that
      will be used for notifying the platform of the target system sleep state.
      Moreover, in the future such a method will also be needed by ACPI.
      
      This patch adds the .set_target() method to 'struct pm_ops' and makes the
      suspend code call it, if implemented, before executing the device model
      per-device .suspend() calls.  It also modifies the at91 code to use
      pm_ops->set_target() instead of pm_ops->prepare().
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: David Brownell <dbrownell@users.sourceforge.net>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
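The ordering this commit describes can be sketched in user-space C. This is a simplified model, not the kernel code: `struct pm_ops_sketch`, `sleep_state_t`, `enter_sleep_sketch()` and the `demo_*` hooks are hypothetical names introduced here, and the real `struct pm_ops` has more members than the two shown.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's sleep state type (assumption). */
typedef int sleep_state_t;

/* Reduced model of 'struct pm_ops': only the two hooks discussed above. */
struct pm_ops_sketch {
	int (*set_target)(sleep_state_t state);	/* optional, may be NULL */
	int (*prepare)(sleep_state_t state);	/* optional, may be NULL */
};

/* Call log so the ordering can be inspected afterwards. */
enum step { STEP_SET_TARGET = 1, STEP_DEVICES = 2, STEP_PREPARE = 3 };
static enum step call_log[8];
static size_t call_count;

static void log_step(enum step s)
{
	assert(call_count < sizeof(call_log) / sizeof(call_log[0]));
	call_log[call_count++] = s;
}

/* Hypothetical platform hooks; real ones would touch hardware state. */
static sleep_state_t platform_target;

static int demo_set_target(sleep_state_t state)
{
	platform_target = state;	/* remembered for use during device suspend */
	log_step(STEP_SET_TARGET);
	return 0;
}

static int demo_prepare(sleep_state_t state)
{
	(void)state;
	log_step(STEP_PREPARE);
	return 0;
}

/* Stand-in for the device model's per-device .suspend() calls. */
static int suspend_devices_sketch(void)
{
	log_step(STEP_DEVICES);
	return 0;
}

/* Ordering from the commit text: set_target (if any), devices, prepare. */
static int enter_sleep_sketch(const struct pm_ops_sketch *ops,
			      sleep_state_t state)
{
	int err;

	if (ops->set_target) {
		err = ops->set_target(state);
		if (err)
			return err;
	}
	err = suspend_devices_sketch();
	if (err)
		return err;
	if (ops->prepare)
		return ops->prepare(state);
	return 0;
}
```

Calling `enter_sleep_sketch()` with both hooks set records set_target, devices, prepare in that order. With `set_target` left NULL, the hook is skipped and the other two steps run unchanged, which is the "if implemented" behaviour the message mentions.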
  4. 29 Jun, 2007 2 commits
  5. 25 Jun, 2007 1 commit
  6. 24 Jun, 2007 3 commits
  7. 22 Jun, 2007 1 commit
    • T
      posix-timers: Prevent softirq starvation by small intervals and SIG_IGN · 58229a18
      Thomas Gleixner committed
      posix-timers which deliver an ignored signal are currently rearmed in
      the timer softirq. This is necessary because the timer needs to be
      delivered again when SIG_IGN is removed. This is not a problem when
      the interval is reasonable.
      
      With high resolution timers enabled one might arm a posix timer with a
      very small interval and ignore the signal. This might lead to a
      softirq starvation when the interval is so small that the timer is
      requeued onto the softirq pending list right away.
      
      This problem was pointed out by Jan Kiszka. Thanks, Jan!
      
      The correct solution would be to stop the timer when the signal is
      ignored and rearm it when SIG_IGN is removed. Unfortunately this
      requires modification in sigaction and involves non-trivial sighand
      locking. It's too late in the release cycle for such a change.
      
      For now we just keep the timer running and enforce that the timer
      fires at most once per jiffy. This does not break anything as we keep
      the overrun counter correct. It adds a little inaccuracy to the
      timer_gettime() interface, but...
      
      The more complex change is necessary anyway to fix another
      shortcoming of the current implementation, which I discovered while
      looking at this problem: A pending signal is discarded when SIG_IGN is
      set. If a posix timer signal is pending, it is discarded as well, but
      when SIG_IGN is removed later nothing rearms the timer. This is not
      new; it has been that way since posix timers were merged. So nothing
      to worry about right now.
      
      I have a working solution to fix all of this, but the impact is too
      large for both stable and 2.6.22. I'm going to send it out for review
      in the next days.
      
      This should go into 2.6.21.stable as well.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Stable Team <stable@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
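The idea of firing at most once per jiffy while keeping the overrun count correct can be illustrated with a little arithmetic. The sketch below is a user-space model under stated assumptions, not the kernel patch: `struct timer_sketch`, `forward_sketch()` and `rearm_ignored_sketch()` are hypothetical names, times are plain nanosecond integers, and the 4 ms jiffy in the usage note assumes HZ=250.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal periodic-timer model; all times in nanoseconds. */
struct timer_sketch {
	int64_t expires_ns;	/* next expiry */
	int64_t interval_ns;	/* requested period, must be > 0 */
	uint64_t overrun;	/* expiries skipped without their own firing */
};

/*
 * Advance 'expires_ns' in whole intervals until it lies strictly after
 * 'now_ns'. Returns the number of intervals added (0 if not yet expired).
 */
static uint64_t forward_sketch(struct timer_sketch *t, int64_t now_ns)
{
	int64_t delta = now_ns - t->expires_ns;
	uint64_t periods;

	assert(t->interval_ns > 0);
	if (delta < 0)
		return 0;
	periods = (uint64_t)(delta / t->interval_ns) + 1;
	t->expires_ns += (int64_t)periods * t->interval_ns;
	return periods;
}

/*
 * Rearm a timer whose signal is ignored. If the interval is shorter than
 * a jiffy, pretend "now" is one jiffy later, so the next expiry cannot
 * land inside the current jiffy. Every interval skipped this way is
 * added to the overrun count instead of being lost.
 */
static void rearm_ignored_sketch(struct timer_sketch *t, int64_t now_ns,
				 int64_t jiffy_ns)
{
	uint64_t periods;

	if (t->interval_ns < jiffy_ns)
		now_ns += jiffy_ns;
	periods = forward_sketch(t, now_ns);
	if (periods > 0)
		t->overrun += periods - 1;	/* one period is the normal rearm */
}
```

For a 100 µs interval and a 4 ms jiffy, rearming at the moment of expiry moves the next expiry 41 intervals ahead and records 40 overruns. A 10 ms interval is rearmed exactly one period ahead with no overrun, so timers with reasonable intervals are unaffected.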
  8. 19 Jun, 2007 4 commits
    • L
      Fix possible runqueue lock starvation in wait_task_inactive() · fa490cfd
      Linus Torvalds committed
      Miklos Szeredi reported very long pauses (several seconds, sometimes
      more) on his T60 (with a Core2Duo) which he managed to track down to
      wait_task_inactive()'s open-coded busy-loop.
      
      He observed that an interrupt on one core tries to acquire the
      runqueue-lock but does not succeed in doing so for a very long time,
      while wait_task_inactive() on the other core loops waiting for the first
      core to deschedule a task (which it won't do while spinning in an
      interrupt handler).
      
      This rewrites wait_task_inactive() to do all its waiting optimistically
      without any locks taken at all, and then just double-check the end
      result with the proper runqueue lock held over just a very short
      section.  If there were races in the optimistic wait, or a preemption
      event scheduled the process away, we simply re-synchronize, and start
      over.
      
      So the code now looks like this:
      
      	repeat:
      		/* Unlocked, optimistic looping! */
      		rq = task_rq(p);
      		while (task_running(rq, p))
      			cpu_relax();
      
      		/* Get the *real* values */
      		rq = task_rq_lock(p, &flags);
      		running = task_running(rq, p);
      		array = p->array;
      		task_rq_unlock(rq, &flags);
      
      		/* Check them.. */
      		if (unlikely(running)) {
      			cpu_relax();
      			goto repeat;
      		}
      
      		/* Preempted away? Yield if so.. */
      		if (unlikely(array)) {
      			yield();
      			goto repeat;
      		}
      
      Basically, that first "while()" loop is done entirely without any
      locking at all (and doesn't check for the case where the target process
      might have been preempted away), and so it's possibly "incorrect", but
      we don't really care.  Both the runqueue used, and the "task_running()"
      check might be the wrong tests, but they won't oops - they just mean
      that we could possibly get the wrong results due to lack of locking and
      exit the loop early in the case of a race condition.
      
      So once we've exited the loop, we then get the proper (and careful) rq
      lock, and check the running/runnable state _safely_.  And if it turns
      out that our quick-and-dirty and unsafe loop was wrong after all, we
      just go back and try it all again.
      
      (The patch also adds a lot of comments, which is the actual bulk of it
      all, to make it more obvious why we can do these things without holding
      the locks).
      
      Thanks to Miklos for all the testing and tracking it down.
      Tested-by: Miklos Szeredi <miklos@szeredi.hu>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
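The control flow of that retry loop can be replayed deterministically in user space. The sketch below is a single-threaded model with no real concurrency or locking: the "other CPU" is a scripted sequence of task states, and `script`, `other_cpu_progress()` and `wait_task_inactive_sketch()` are hypothetical names introduced for illustration. It only shows how rarely the "lock" is taken compared with the unlocked polling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task_state {
	bool running;	/* currently executing on a CPU */
	bool queued;	/* still on a runqueue (e.g. preempted) */
};

/* Scripted states the simulated "other CPU" moves through. */
static const struct task_state script[] = {
	{ true,  true  },	/* on a CPU */
	{ true,  true  },	/* still on a CPU */
	{ false, true  },	/* preempted: off the CPU but still queued */
	{ false, false },	/* really descheduled */
};
#define SCRIPT_LEN (sizeof(script) / sizeof(script[0]))

static size_t script_pos;
static int unlocked_polls;	/* iterations of the optimistic loop */
static int lock_acquisitions;	/* times the (simulated) rq lock was taken */

/* Stands in for cpu_relax()/yield(): lets the simulated CPU advance. */
static void other_cpu_progress(void)
{
	assert(script_pos < SCRIPT_LEN);
	if (script_pos + 1 < SCRIPT_LEN)
		script_pos++;
}

static void wait_task_inactive_sketch(void)
{
	for (;;) {
		struct task_state snap;

		/* Unlocked, optimistic wait; its result is never trusted alone. */
		while (script[script_pos].running) {
			unlocked_polls++;
			other_cpu_progress();
		}

		/* Short "locked" section: take one consistent snapshot. */
		lock_acquisitions++;
		snap = script[script_pos];

		/* Raced with the task starting to run again? Start over. */
		if (snap.running) {
			other_cpu_progress();
			continue;
		}
		/* Preempted but still queued? Give it time, then start over. */
		if (snap.queued) {
			other_cpu_progress();
			continue;
		}
		return;
	}
}
```

With the script above, the function polls twice without the lock, takes the lock once and finds the task preempted but still queued, retries, and takes the lock a second time to confirm the task is off the runqueue. In this single-threaded model the `snap.running` branch can never trigger; it is kept only to mirror the structure of the code quoted above.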
    • I
      sched: fix SysRq-N (normalize RT tasks) · a0f98a1c
      Ingo Molnar committed
      Gene Heskett reported the following problem while testing CFS: SysRq-N
      is not always effective in normalizing tasks back to SCHED_OTHER.
      
      The reason for that turns out to be the following bug:
      
       - normalize_rt_tasks() uses for_each_process() to iterate through all
         tasks in the system.  The problem is, this method does not iterate
         through all tasks, it iterates through all thread groups.
      
      The proper mechanism to enumerate all threads is to use a
      do_each_thread() + while_each_thread() loop.
      Reported-by: Gene Heskett <gene.heskett@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
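The difference between visiting thread groups and visiting every thread can be shown with a small user-space model. This is an illustrative sketch only: the structures and function names below are hypothetical and do not reproduce the kernel's task_struct or its iteration macros.

```c
#include <assert.h>
#include <stddef.h>

enum policy_sketch { POLICY_OTHER, POLICY_FIFO };

struct thread_sketch {
	enum policy_sketch policy;
	struct thread_sketch *next_in_group;	/* NULL ends the group */
};

struct process_sketch {
	struct thread_sketch *leader;		/* first thread of the group */
	struct process_sketch *next;		/* NULL ends the process list */
};

/* Buggy pattern: one visit per thread group, so only leaders are reset. */
static void normalize_leaders_only(struct process_sketch *head)
{
	for (struct process_sketch *p = head; p; p = p->next) {
		assert(p->leader != NULL);
		p->leader->policy = POLICY_OTHER;
	}
}

/* Fixed pattern: nested walk that reaches every thread of every group. */
static void normalize_all_threads(struct process_sketch *head)
{
	for (struct process_sketch *p = head; p; p = p->next)
		for (struct thread_sketch *t = p->leader; t; t = t->next_in_group)
			t->policy = POLICY_OTHER;
}

/* Count threads that still have a real-time policy. */
static int count_rt_threads(const struct process_sketch *head)
{
	int n = 0;

	for (const struct process_sketch *p = head; p; p = p->next)
		for (const struct thread_sketch *t = p->leader; t; t = t->next_in_group)
			if (t->policy == POLICY_FIFO)
				n++;
	return n;
}
```

For a three-thread process plus a single-thread process, all real-time, the leaders-only walk leaves two threads at the real-time policy, while the nested walk leaves none. This matches the symptom reported above, where SysRq-N was not always effective.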
    • B
      Fix signalfd interaction with thread-private signals · caec4e8d
      Benjamin Herrenschmidt committed
      Don't let signalfd dequeue private signals off other threads (in the
      case of things like SIGILL or SIGSEGV, trying to do so would result
      in undefined behaviour as to who actually gets the signal, since they
      are force-unblocked).
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • T
      Revert "futex_requeue_pi optimization" · bd197234
      Thomas Gleixner committed
      This reverts commit d0aa7a70.
      
      It not only introduced user-space visible changes to the futex syscall,
      it is also non-functional and there is no way to fix it properly before
      the 2.6.22 release.
      
      The breakage report ( http://lkml.org/lkml/2007/5/12/17 ) went
      unanswered, and unfortunately it turned out that the concept is not
      feasible at all.  It violates the rtmutex semantics badly by introducing
      a virtual owner, which hacks around the coupling of the user-space
      pi_futex and the kernel internal rt_mutex representation.
      
      At the moment the only safe option is to remove it fully as it contains
      user-space visible changes to broken kernel code, which we do not want
      to expose in the 2.6.22 release.
      
      The patch reverts the original patch mostly 1:1, but contains a couple
      of trivial manual cleanups which were necessary because later patches
      touched the same area of code.
      
      Verified against the glibc tests and my own PI futex tests.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Ulrich Drepper <drepper@redhat.com>
      Cc: Pierre Peiffer <pierre.peiffer@bull.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>