1. 20 September 2007 (6 commits)
  2. 16 September 2007 (5 commits)
    • clockevents: prevent stale tick update on offline cpu · 5e41d0d6
      Committed by Thomas Gleixner
      Taking a cpu offline removes the cpu from the online mask before the
      CPU_DEAD notification is done. The clock events layer does the cleanup
      of the dead CPU from the CPU_DEAD notifier chain. tick_do_timer_cpu is
      used to avoid xtime lock contention by assigning the task of jiffies
      xtime updates to one CPU. If a CPU is taken offline, then this
      assignment becomes stale. This went unnoticed because most of the time
      the offline CPU went dead before the online CPU reached __cpu_die(),
      where the CPU_DEAD state is checked. If the offline CPU has not reached the
      DEAD state by the time we reach __cpu_die(), the code there sleeps for 100ms
      while it waits. Because the jiffies update is still assigned to the now-dead
      CPU, jiffies never advance, the sleep never completes, and the system is
      stuck forever.
      
      Take the assignment away when a cpu is no longer in the cpu_online_mask.
      We do this in the last call to tick_nohz_stop_sched_tick(), when the offline
      CPU is on its way to the final play_dead() idle entry.
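
      A minimal sketch of the idea (assumed shape of the check near the top of
      tick_nohz_stop_sched_tick(); exact names, placement and the "unassigned"
      sentinel may differ):

        int cpu = smp_processor_id();

        /*
         * If this cpu is offline but still owns the jiffies update, drop the
         * assignment so an online cpu can take it over. Otherwise jiffies go
         * stale and the 100ms wait in __cpu_die() never makes progress.
         * (-1 is used here as an assumed "no cpu owns the update" value.)
         */
        if (unlikely(!cpu_online(cpu))) {
                if (cpu == tick_do_timer_cpu)
                        tick_do_timer_cpu = -1;
        }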
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • clockevents: do not shutdown the oneshot broadcast device · 31d9b393
      Committed by Thomas Gleixner
      When a cpu goes offline it is removed from the broadcast masks. If the
      mask becomes empty, the code shuts down the broadcast device. This is
      wrong, because the broadcast device still needs to be ready for the
      remaining online cpus going idle (into a C-state, which stops the local
      APIC timer).
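
      A sketch of the shape of the fix (assumed; the dead-cpu cleanup simply stops
      shutting the device down when its broadcast mask runs empty):

        /* before: powering the broadcast device off once no cpu was left in the mask */
        if (cpus_empty(tick_broadcast_oneshot_mask))
                clockevents_set_mode(bc, CLOCK_EVT_MODE_SHUTDOWN);

        /* after: drop the shutdown; the remaining online cpus still need the
         * broadcast device as a backup when a C-state stops their local APIC timer */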
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • clockevents: Enforce oneshot broadcast when broadcast mask is set on resume · 07eec6af
      Committed by Thomas Gleixner
      The jinxed VAIO refuses to resume without hitting keys on the keyboard
      when this is not enforced. It is unclear why the cpu ends up in a lower
      C State without notifying the clock events layer, but enforcing the
      oneshot broadcast here is safe.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • timekeeping: Prevent time going backwards on resume · 6a669ee8
      Committed by Thomas Gleixner
      Timekeeping resume adjusts xtime by adding the slept time in seconds and
      resets the reference value of the clock source (clock->cycle_last).
      clock->cycle_last is used to calculate the delta between the last xtime
      update and the readout of the clock source in __get_nsec_offset(). xtime
      plus the offset is the current time. The resume code ignores the delta
      which had already elapsed between the last xtime update and the actual
      time of suspend. If the suspend time is short, then we can see time
      going backwards on resume.
      
      Suspend:
      offs_s = clock->read() - clock->cycle_last;
      now = xtime + offs_s;
      timekeeping_suspend_time = read_rtc();
      
      Resume:
      sleep_time = read_rtc() - timekeeping_suspend_time;
      xtime.tv_sec += sleep_time;
      clock->cycle_last = clock->read();
      offs_r = clock->read() - clock->cycle_last;
      now = xtime + offs_r;
      
      if sleep_time_seconds == 0 and offs_r < offs_s, then time goes
      backwards.
      
      Fix this by storing the offset from the last xtime update and adding it to
      xtime during resume, when we reset clock->cycle_last:
      
      sleep_time = read_rtc() - timekeeping_suspend_time;
      xtime.tv_sec += sleep_time;
      xtime += offs_s;	/* Fixup xtime offset at suspend time */
      clock->cycle_last = clock->read();
      offs_r = clock->read() - clock->cycle_last;
      now = xtime + offs_r;
      
      Thanks to Marcelo for tracking this down on the OLPC and providing the
      necessary details to analyze the root cause.
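
      In code terms the fix amounts to capturing the not-yet-accounted clocksource
      offset in the suspend path and folding it into xtime on resume; a sketch, with
      names assumed from the timekeeping code of that era:

        static unsigned long timekeeping_suspend_nsecs;

        /* suspend: remember the nanoseconds elapsed since the last xtime update */
        timekeeping_suspend_nsecs = __get_nsec_offset();

        /* resume: fold the saved offset into xtime before resetting cycle_last */
        timespec_add_ns(&xtime, timekeeping_suspend_nsecs);
        clock->cycle_last = clocksource_read(clock);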
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Marcelo Tosatti <marcelo@kvack.org>
    • timekeeping: access rtc outside of xtime lock · 3be90950
      Committed by Thomas Gleixner
      Lockdep complains about the RTC access in timekeeping_suspend() inside
      the interrupt-disabled region of the write-locked xtime lock. Move the
      access outside the lock.
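
      A sketch of the reordering (function and variable names assumed from the
      timekeeping code of that era; the point is only that the RTC read happens
      before the lock is taken):

        /* read the persistent clock first, outside any locking */
        timekeeping_suspend_time = read_persistent_clock();

        /* then take the write-locked, irq-off region for the state change */
        write_seqlock_irqsave(&xtime_lock, flags);
        timekeeping_suspended = 1;
        write_sequnlock_irqrestore(&xtime_lock, flags);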
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <johnstul@us.ibm.com>
  3. 12 September 2007 (3 commits)
    • Fix "no_sync_cmos_clock" logic inversion in kernel/time/ntp.c · 298a5df4
      Committed by Tony Breeds
      The logic is inverted: the sync timer only gets started on platforms
      that set no_sync_cmos_clock, i.e. exactly the ones that say they don't
      want it.
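
      The fix is the one-character inversion you would expect; a sketch of the
      intended logic (the surrounding function in kernel/time/ntp.c is assumed):

        static void notify_cmos_timer(void)
        {
                /* arm the CMOS-sync timer only when syncing is NOT disabled */
                if (!no_sync_cmos_clock)
                        mod_timer(&sync_cmos_timer, jiffies + 1);
        }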
      Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Gabriel Paubert <paubert@iram.es>
      Cc: Zachary Amsden <zach@vmware.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Restore call_usermodehelper_pipe() behaviour · 3210f0ec
      Committed by Michael Ellerman
      The semantics of call_usermodehelper_pipe() used to be that it would fork
      the helper, and wait for the kernel thread to be started.  This was
      implemented by setting sub_info.wait to 0 (implicitly), and doing a
      wait_for_completion().
      
      As part of the cleanup done in 0ab4dc92,
      call_usermodehelper_pipe() was changed to pass 1 as the value for wait to
      call_usermodehelper_exec().
      
      This is equivalent to setting sub_info.wait to 1, which is a change from
      the previous behaviour.  Using 1 instead of 0 causes
      __call_usermodehelper() to start the kernel thread running
      wait_for_helper(), rather than directly calling ____call_usermodehelper().
      
      The end result is that the calling kernel code blocks until the user mode
      helper finishes.  As the helper is expecting input on stdin, and now no one
      is writing anything, everything locks up (observed in do_coredump).
      
      The fix is to change the 1 to UMH_WAIT_EXEC (aka 0), indicating that we
      want to wait for the kernel thread to be started, but not for the helper to
      finish.
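
      A sketch of the essence of the change in kernel/kmod.c (shape assumed; the
      commit text itself identifies UMH_WAIT_EXEC as the value 0):

        /* before: waited for the helper to finish, deadlocking on its stdin */
        return call_usermodehelper_exec(sub_info, 1);

        /* after: wait only until the kernel thread has started the helper */
        return call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);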
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Acked-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • futex_compat: fix list traversal bugs · 179c85ea
      Committed by Arnd Bergmann
      The futex list traversal on the compat side appears to have
      a bug.
      
      Its loop termination condition compares:
      
              while (compat_ptr(uentry) != &head->list)
      
      But that can't be right because "uentry" has the special
      "pi" indicator bit still potentially set at bit 0.  This
      is cleared by fetch_robust_entry() into the "entry"
      return value.
      
      What this seems to mean is that the list won't terminate
      when list iteration gets back to the head.  And we'll
      also process the list head like a normal entry, which could
      cause all kinds of problems.
      
      So we should check for equality with "entry".  That pointer
      is of the non-compat type so we have to do a little casting
      to keep the compiler and sparse happy.
      
      The same problem can in theory occur with the 'pending'
      variable, although that has not been reported from users
      so far.
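
      A sketch of the corrected termination test (casting details assumed; the key
      is to compare the cleaned 'entry' pointer instead of the raw 'uentry'):

        /* before: never matches the list head while the pi bit is set in uentry */
        while (compat_ptr(uentry) != &head->list) {

        /* after: compare the cleaned pointer, casting the head to keep sparse happy */
        while (entry != (struct robust_list __user *) &head->list) {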
      
      Based on the original patch from David Miller.
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 11 September 2007 (1 commit)
    • Fix spurious syscall tracing after PTRACE_DETACH + PTRACE_ATTACH · 7d941432
      Committed by Roland McGrath
      When PTRACE_SYSCALL was used and then PTRACE_DETACH is used, the
      TIF_SYSCALL_TRACE flag is left set on the formerly-traced task.  This
      means that when a new tracer comes along and does PTRACE_ATTACH, it's
      possible he gets a syscall tracing stop even though he's never used
      PTRACE_SYSCALL.  This happens if the task was in the middle of a system
      call when the second PTRACE_ATTACH was done.  The symptom is an
      unexpected SIGTRAP when the tracer thinks that only SIGSTOP should have
      been provoked by his ptrace calls so far.
      
      A few machines already fix this in ptrace_disable (i386, ia64, m68k),
      but all other machines do not and still have this bug.  On x86_64, this
      constitutes a regression in IA32 compatibility support.
      
      Since all machines now use TIF_SYSCALL_TRACE for this, I put the
      clearing of TIF_SYSCALL_TRACE in the generic ptrace_detach code rather
      than adding it to every other machine's ptrace_disable.
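
      A sketch of where the generic clearing lands (placement assumed, in
      kernel/ptrace.c right after the per-architecture hook):

        /* architecture-specific hardware disable .. */
        ptrace_disable(child);
        /* generic: no stale syscall-trace stops may survive a detach */
        clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);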
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 05 September 2007 (8 commits)
  6. 31 August 2007 (6 commits)
  7. 28 August 2007 (7 commits)
    • sched: clean up task_new_fair() · 9f508f82
      Committed by Ingo Molnar
      cleanup: we have the 'se' and 'curr' entity-pointers already,
      no need to use p->se and current->se.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
    • sched: small schedstat fix · 213c8af6
      Committed by Ingo Molnar
      small schedstat fix: the cfs_rq->wait_runtime 'sum of all runtimes'
      statistics counters missed newly forked tasks and thus had a constant
      negative skew. Fix this.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
    • sched: fix wait_start_fair condition in update_stats_wait_end() · b77d69db
      Committed by Ingo Molnar
      Peter Zijlstra noticed the following bug in SCHED_FEAT_SKIP_INITIAL (which
      is disabled by default at the moment): it relies on se.wait_start_fair
      being 0, but update_stats_wait_end() did not recognize a 0 value,
      so instead of 'skipping' the initial interval we gave the new child
      a maximum boost of +runtime-limit ...
      
      (No impact on the default kernel, but nice to fix for completeness.)
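
      A sketch of the guard (placement assumed at the top of update_stats_wait_end();
      the field name se->wait_start_fair is taken from the description above):

        /* a zero wait_start_fair means "skip the initial interval": bail out
         * instead of treating it as an enormous wait that earns a huge boost */
        if (unlikely(!se->wait_start_fair))
                return;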
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
    • sched: call update_curr() in task_tick_fair() · 7109c442
      Committed by Ting Yang
      update the fair-clock before using it for the key value.
      
      [ mingo@elte.hu: small cleanups. ]
      Signed-off-by: Ting Yang <tingy@cs.umass.edu>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    • sched: make the scheduler converge to the ideal latency · f6cf891c
      Committed by Ingo Molnar
      de-HZ-ification of the granularity defaults unearthed a pre-existing
      property of CFS: while it correctly converges to the granularity goal,
      it does not prevent run-time fluctuations in the range of
      [-gran ... 0 ... +gran].
      
      With the increase of the granularity due to the removal of HZ
      dependencies, this becomes visible in chew-max output (with 5 tasks
      running):
      
       out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
       out:  27 . 27. 32 | flu:  0 .  0 | ran:   17 .   13 | per:   44 .   40
       out:  27 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   36 .   40
       out:  29 . 27. 32 | flu:  2 .  0 | ran:   17 .   13 | per:   46 .   40
       out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
       out:  29 . 27. 32 | flu:  0 .  0 | ran:   18 .   13 | per:   47 .   40
       out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
      
      average slice is the ideal 13 msecs and the period is picture-perfect 40
      msecs. But the 'ran' field fluctuates around 13.33 msecs and there's no
      mechanism in CFS to keep that from happening: it's a perfectly valid
      solution that CFS finds.
      
      to fix this we add a granularity/preemption rule that knows about
      the "target latency", which makes tasks that run longer than the ideal
      latency run a bit less. The simplest approach is to simply decrease the
      preemption granularity when a task overruns its ideal latency. For this
      we have to track how much the task executed since its last preemption.
      
      ( this adds a new field to task_struct, but we can eliminate that
        overhead in 2.6.24 by putting all the scheduler timestamps into an
        anonymous union. )
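
      A sketch of the rule (field and sysctl names assumed; the idea is the one
      described above: drop the preemption granularity once the current task has
      overrun its ideal share of the latency period):

        ideal_runtime = sysctl_sched_latency / cfs_rq->nr_running;
        delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;

        /* ran longer than its ideal slice since the last preemption:
         * allow the next task to preempt immediately */
        if (delta_exec > ideal_runtime)
                granularity = 0;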
      
      with this change in place, chew-max output is fluctuation-less all
      around:
      
       out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  1 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  1 | ran:   13 .   13 | per:   41 .   40
      
      this patch has no impact on any fastpath or on any globally observable
      scheduling property. (unless you have sharp enough eyes to see
      millisecond-level ruckles in glxgears smoothness :-)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
    • sched: fix sleeper bonus limit · 5f01d519
      Committed by Mike Galbraith
      There is an Amarok song switch time increase (regression) under
      hefty load.
      
      What is happening is that sleeper_bonus is never consumed, and only
      rarely goes below runtime_limit, so for the most part, Amarok isn't
      getting any bonus at all.  We're keeping sleeper_bonus right at
      runtime_limit (sched_latency == sched_runtime_limit == 40ms) forever, i.e.
      we don't consume it if we're lower than that, and don't add if we're above
      it.  One Amarok thread waking (or anybody else) will push us past the
      threshold, so the next thread waking gets nada, but will reap pain from
      the previous thread waking until we drop back to runtime_limit.  It
      looks to me like under load, some random task gets a bonus, and
      everybody else pays, whether deserving or not.
      
      This diff fixed the regression for me at any load rate.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    • fix bogus hotplug cpu warning · d243769d
      Committed by Hugh Dickins
      Fix a bogus DEBUG_PREEMPT warning on x86_64 when a cpu is brought online
      after bootup: current_is_keventd is right to note that its use of
      smp_processor_id is preempt-safe, but it should use raw_smp_processor_id
      to avoid the warning.
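
      A sketch of the one-line change (in current_is_keventd(), with
      kernel/workqueue.c assumed as the location):

        /* keventd is per-cpu and cannot migrate, so this is preempt-safe;
         * use the raw variant to keep DEBUG_PREEMPT from warning about it */
        int cpu = raw_smp_processor_id();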
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 26 August 2007 (4 commits)