1. 13 12月, 2008 2 次提交
  2. 04 12月, 2008 1 次提交
    • J
      time: catch xtime_nsec underflows and fix them · 6c9bacb4
      john stultz 提交于
      Impact: fix time warp bug
      
      Alex Shi, along with Yanmin Zhang have been noticing occasional time
      inconsistencies recently. Through their great diagnosis, they found that
      the xtime_nsec value used in update_wall_time was occasionally going
      negative. After looking through the code for awhile, I realized we have
      the possibility for an underflow when three conditions are met in
      update_wall_time():
      
        1) We have accumulated a second's worth of nanoseconds, so we
           incremented xtime.tv_sec and appropriately decrement xtime_nsec.
           (This doesn't cause xtime_nsec to go negative, but it can cause it
            to be small).
      
        2) The remaining offset value is large, but just slightly less then
           cycle_interval.
      
        3) clocksource_adjust() is speeding up the clock, causing a
           corrective amount (compensating for the increase in the multiplier
           being multiplied against the unaccumulated offset value) to be
           subtracted from xtime_nsec.
      
      This can cause xtime_nsec to underflow.
      
      Unfortunately, since we notify the NTP subsystem via second_overflow()
      whenever we accumulate a full second, and this effects the error
      accumulation that has already occured, we cannot simply revert the
      accumulated second from xtime nor move the second accumulation to after
      the clocksource_adjust call without a change in behavior.
      
      This leaves us with (at least) two options:
      
      1) Simply return from clocksource_adjust() without making a change if we
         notice the adjustment would cause xtime_nsec to go negative.
      
      This would work, but I'm concerned that if a large adjustment was needed
      (due to the error being large), it may be possible to get stuck with an
      ever increasing error that becomes too large to correct (since it may
      always force xtime_nsec negative). This may just be paranoia on my part.
      
      2) Catch xtime_nsec if it is negative, then add back the amount its
         negative to both xtime_nsec and the error.
      
      This second method is consistent with how we've handled earlier rounding
      issues, and also has the benefit that the error being added is always in
      the oposite direction also always equal or smaller then the correction
      being applied. So the risk of a corner case where things get out of
      control is lessened.
      
      This patch fixes bug 11970, as tested by Yanmin Zhang
      http://bugzilla.kernel.org/show_bug.cgi?id=11970
      
      Reported-by: alex.shi@intel.com
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Acked-by: N"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Tested-by: N"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6c9bacb4
  3. 11 11月, 2008 1 次提交
    • T
      nohz: disable tick_nohz_kick_tick() for now · ae99286b
      Thomas Gleixner 提交于
      Impact: nohz powersavings and wakeup regression
      
      commit fb02fbc1 (NOHZ: restart tick
      device from irq_enter()) causes a serious wakeup regression.
      
      While the patch is correct it does not take into account that spurious
      wakeups happen on x86. A fix for this issue is available, but we just
      revert to the .27 behaviour and let long running softirqs screw
      themself.
      
      Disable it for now.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      ae99286b
  4. 22 10月, 2008 1 次提交
    • T
      NOHZ: fix thinko in the timer restart code path · c4bd822e
      Thomas Gleixner 提交于
      commit fb02fbc1 (NOHZ: restart tick
      device from irq_enter())
      
      solves the problem of stale jiffies when long running softirqs happen
      in a long idle sleep period, but it has a major thinko in it:
      
      When the interrupt which came in _is_ the timer interrupt which should
      expire ts->sched_timer then we cancel and rearm the timer _before_ it
      gets expired in hrtimer_interrupt() to the next period. That means the
      call back function is not called. This game can go on for ever :(
      
      Prevent this by making sure to only rearm the timer when the expiry
      time is more than one tick_period away. Otherwise keep it running as
      it is either already expired or will expiry at the right point to
      update jiffies.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NVenkatesch Pallipadi <venkatesh.pallipadi@intel.com>
      c4bd822e
  5. 20 10月, 2008 3 次提交
  6. 18 10月, 2008 3 次提交
    • T
      NOHZ: restart tick device from irq_enter() · fb02fbc1
      Thomas Gleixner 提交于
      We did not restart the tick device from irq_enter() to avoid double
      reprogramming and extra events in the return immediate to idle case.
      
      But long lasting softirqs can lead to a situation where jiffies become
      stale:
      
      idle()
        tick stopped (reprogrammed to next pending timer)
        halt()
         interrupt
           jiffies updated from irq_enter()
           interrupt handler
           softirq function 1 runs 20ms
           softirq function 2 arms a 10ms timer with a stale jiffies value
           jiffies updated from irq_exit()
           timer wheel has now an already expired timer
           (the one added in function 2)
           timer fires and timer softirq runs
      
      This was discovered when debugging a timer problem which happend only
      when the ath5k driver is active. The debugging proved that there is a
      softirq function running for more than 20ms, which is a bug by itself.
      
      To solve this we restart the tick timer right from irq_enter(), but do
      not go through the other functions which are necessary to return from
      idle when need_resched() is set.
      Reported-by: NElias Oltmanns <eo@nebensachen.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NElias Oltmanns <eo@nebensachen.de>
      fb02fbc1
    • T
      NOHZ: split tick_nohz_restart_sched_tick() · c34bec5a
      Thomas Gleixner 提交于
      Split out the clock event device reprogramming. Preparatory
      patch.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      c34bec5a
    • T
      NOHZ: unify the nohz function calls in irq_enter() · 719254fa
      Thomas Gleixner 提交于
      We have two separate nohz function calls in irq_enter() for no good
      reason. Just call a single NOHZ function from irq_enter() and call
      the bits in the tick code.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      719254fa
  7. 17 10月, 2008 1 次提交
  8. 10 10月, 2008 1 次提交
  9. 04 10月, 2008 1 次提交
  10. 29 9月, 2008 1 次提交
    • T
      hrtimer: prevent migration of per CPU hrtimers · ccc7dadf
      Thomas Gleixner 提交于
      Impact: per CPU hrtimers can be migrated from a dead CPU
      
      The hrtimer code has no knowledge about per CPU timers, but we need to
      prevent the migration of such timers and warn when such a timer is
      active at migration time.
      
      Explicitely mark the timers as per CPU and use a more understandable
      mode descriptor for the interrupts safe unlocked callback mode, which
      is used by hrtimer_sleeper and the scheduler code.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      ccc7dadf
  11. 24 9月, 2008 3 次提交
  12. 23 9月, 2008 5 次提交
    • I
      timers: fix build error in !oneshot case · f8e256c6
      Ingo Molnar 提交于
       kernel/time/tick-common.c: In function ‘tick_setup_periodic’:
       kernel/time/tick-common.c:113: error: implicit declaration of function ‘tick_broadcast_oneshot_active’
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f8e256c6
    • T
      clockevents: prevent mode mismatch on cpu online · 27ce4cb4
      Thomas Gleixner 提交于
      Impact: timer hang on CPU online observed on AMD C1E systems
      
      When a CPU is brought online then the broadcast machinery can
      be in the one shot state already. Check this and setup the timer 
      device of the new CPU in one shot mode so the broadcast code
      can pick up the next_event value correctly.
      
      Another AMD C1E oddity, as we switch to broadcast immediately and
      not after the full bring up via the ACPI cpu idle code.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      27ce4cb4
    • T
      clockevents: check broadcast device not tick device · 30274569
      Thomas Gleixner 提交于
      Impact: Possible hang on CPU online observed on AMD C1E machines.
      
      The broadcast setup code looks at the mode of the tick device to
      determine whether it needs to be shut down or setup. This is wrong
      when the broadcast mode is set to one shot already. This can happen
      when a CPU is brought online as it goes through the periodic setup
      first.
      
      The problem went unnoticed as sane systems do not call into that code
      before the switch to one shot for the clock event device happens.
      The AMD C1E idle routine switches over immediately and thereby shuts
      down the just setup device before the first interrupt happens.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      30274569
    • T
      clockevents: prevent stale tick_next_period for onlining CPUs · 49d670fb
      Thomas Gleixner 提交于
      Impact: possible hang on CPU onlining in timer one shot mode.
      
      The tick_next_period variable is only used during boot on nohz/highres
      enabled systems, but for CPU onlining it needs to be maintained when
      the per cpu clock events device operates in one shot mode.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      49d670fb
    • T
      clockevents: prevent cpu online to interfere with nohz · 6441402b
      Thomas Gleixner 提交于
      Impact: rare hang which can be triggered on CPU online.
      
      tick_do_timer_cpu keeps track of the CPU which updates jiffies
      via do_timer. The value -1 is used to signal, that currently no
      CPU is doing this. There are two cases, where the variable can 
      have this state:
      
       boot:
          necessary for systems where the boot cpu id can be != 0
      
       nohz long idle sleep:
          When the CPU which did the jiffies update last goes into
          a long idle sleep it drops the update jiffies duty so
          another CPU which is not idle can pick it up and keep
          jiffies going.
      
      Using the same value for both situations is wrong, as the CPU online
      code can see the -1 state when the timer of the newly onlined CPU is
      setup. The setup for a newly onlined CPU goes through periodic mode
      and can pick up the do_timer duty without being aware of the nohz /
      highres mode of the already running system.
      
      Use two separate states and make them constants to avoid magic
      numbers confusion. 
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6441402b
  13. 17 9月, 2008 1 次提交
    • T
      clockevents: make device shutdown robust · 2344abbc
      Thomas Gleixner 提交于
      The device shut down does not cleanup the next_event variable of the
      clock event device. So when the device is reactivated the possible
      stale next_event value can prevent the device to be reprogrammed as it
      claims to wait on a event already.
      
      This is the root cause of the resurfacing suspend/resume problem,
      where systems need key press to come back to life.
      
      Fix this by setting next_event to KTIME_MAX when the device is shut
      down. Use a separate function for shutdown which takes care of that
      and only keep the direct set mode call in the broadcast code, where we
      can not touch the next_event value.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      2344abbc
  14. 10 9月, 2008 2 次提交
    • T
      clockevents: remove WARN_ON which was used to gather information · e75b986a
      Thomas Gleixner 提交于
      The issue of the endless reprogramming loop due to a too small
      min_delta_ns was fixed with the previous updates of the clock events
      code, but we had no information about the spread of this problem. I
      added a WARN_ON to get automated information via kerneloops.org and to
      get some direct reports, which allowed me to analyse the affected
      machines.
      
      The WARN_ON has served its purpose and would be annoying for a release
      kernel. Remove it and just keep the information about the increase of
      the min_delta_ns value.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      e75b986a
    • T
      clockevents: remove WARN_ON which was used to gather information · 61c22c34
      Thomas Gleixner 提交于
      The issue of the endless reprogramming loop due to a too small
      min_delta_ns was fixed with the previous updates of the clock events
      code, but we had no information about the spread of this problem. I
      added a WARN_ON to get automated information via kerneloops.org and to
      get some direct reports, which allowed me to analyse the affected
      machines.
      
      The WARN_ON has served its purpose and would be annoying for a release
      kernel. Remove it and just keep the information about the increase of
      the min_delta_ns value.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      61c22c34
  15. 08 9月, 2008 1 次提交
  16. 06 9月, 2008 4 次提交
    • M
      ntp: fix calculation of the next jiffie to trigger RTC sync · 4ff4b9e1
      Maciej W. Rozycki 提交于
      We have a bug in the calculation of the next jiffie to trigger the RTC
      synchronisation.  The aim here is to run sync_cmos_clock() as close as
      possible to the middle of a second.  Which means we want this function to
      be called less than or equal to half a jiffie away from when now.tv_nsec
      equals 5e8 (500000000).
      
      If this is not the case for a given call to the function, for this purpose
      instead of updating the RTC we calculate the offset in nanoseconds to the
      next point in time where now.tv_nsec will be equal 5e8.  The calculated
      offset is then converted to jiffies as these are the unit used by the
      timer.
      
      Hovewer timespec_to_jiffies() used here uses a ceil()-type rounding mode,
      where the resulting value is rounded up.  As a result the range of
      now.tv_nsec when the timer will trigger is from 5e8 to 5e8 + TICK_NSEC
      rather than the desired 5e8 - TICK_NSEC / 2 to 5e8 + TICK_NSEC / 2.
      
      As a result if for example sync_cmos_clock() happens to be called at the
      time when now.tv_nsec is between 5e8 + TICK_NSEC / 2 and 5e8 to 5e8 +
      TICK_NSEC, it will simply be rescheduled HZ jiffies later, falling in the
      same range of now.tv_nsec again.  Similarly for cases offsetted by an
      integer multiple of TICK_NSEC.
      
      This change addresses the problem by subtracting TICK_NSEC / 2 from the
      nanosecond offset to the next point in time where now.tv_nsec will be
      equal 5e8, effectively shifting the following rounding in
      timespec_to_jiffies() so that it produces a rounded-to-nearest result.
      Signed-off-by: NMaciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4ff4b9e1
    • T
      clockevents: broadcast fixup possible waiters · 7300711e
      Thomas Gleixner 提交于
      Until the C1E patches arrived there where no users of periodic broadcast
      before switching to oneshot mode. Now we need to trigger a possible
      waiter for a periodic broadcast when switching to oneshot mode.
      Otherwise we can starve them for ever.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      7300711e
    • A
      hrtimer: convert kernel/* to the new hrtimer apis · cc584b21
      Arjan van de Ven 提交于
      In order to be able to do range hrtimers we need to use accessor functions
      to the "expire" member of the hrtimer struct.
      This patch converts kernel/* to these accessors.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      cc584b21
    • P
      sched_clock: fix NOHZ interaction · 56c7426b
      Peter Zijlstra 提交于
      If HLT stops the TSC, we'll fail to account idle time, thereby inflating the
      actual process times. Fix this by re-calibrating the clock against GTOD when
      leaving nohz mode.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: NAvi Kivity <avi@qumranet.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      56c7426b
  17. 05 9月, 2008 5 次提交
    • T
      clockevents: prevent endless loop lockup · 1fb9b7d2
      Thomas Gleixner 提交于
      The C1E/HPET bug reports on AMDX2/RS690 systems where tracked down to a
      too small value of the HPET minumum delta for programming an event.
      
      The clockevents code needs to enforce an interrupt event on the clock event
      device in some cases. The enforcement code was stupid and naive, as it just
      added the minimum delta to the current time and tried to reprogram the device.
      When the minimum delta is too small, then this loops forever.
      
      Add a sanity check. Allow reprogramming to fail 3 times, then print a warning
      and double the minimum delta value to make sure, that this does not happen again.
      Use the same function for both tick-oneshot and tick-broadcast code.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1fb9b7d2
    • T
      clockevents: prevent multiple init/shutdown · 9c17bcda
      Thomas Gleixner 提交于
      While chasing the C1E/HPET bugreports I went through the clock events
      code inch by inch and found that the broadcast device can be initialized
      and shutdown multiple times. Multiple shutdowns are not critical, but
      useless waste of time. Multiple initializations are simply broken. Another
      CPU might have the device in use already after the first initialization and
      the second init could just render it unusable again.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9c17bcda
    • T
      clockevents: enforce reprogram in oneshot setup · 7205656a
      Thomas Gleixner 提交于
      In tick_oneshot_setup we program the device to the given next_event,
      but we do not check the return value. We need to make sure that the
      device is programmed enforced so the interrupt handler engine starts
      working. Split out the reprogramming function from tick_program_event()
      and call it with the device, which was handed in to tick_setup_oneshot().
      Set the force argument, so the devices is firing an interrupt.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7205656a
    • T
      clockevents: prevent endless loop in periodic broadcast handler · d4496b39
      Thomas Gleixner 提交于
      The reprogramming of the periodic broadcast handler was broken,
      when the first programming returned -ETIME. The clockevents code
      stores the new expiry value in the clock events device next_event field
      only when the programming time has not been elapsed yet. The loop in
      question calculates the new expiry value from the next_event value
      and therefor never increases.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d4496b39
    • V
      clockevents: prevent clockevent event_handler ending up handler_noop · 7c1e7689
      Venkatesh Pallipadi 提交于
      There is a ordering related problem with clockevents code, due to which
      clockevents_register_device() called after tickless/highres switch
      will not work. The new clockevent ends up with clockevents_handle_noop as
      event handler, resulting in no timer activity.
      
      The problematic path seems to be
      
      * old device already has hrtimer_interrupt as the event_handler
      * new clockevent device registers with a higher rating
      * tick_check_new_device() is called
        * clockevents_exchange_device() gets called
          * old->event_handler is set to clockevents_handle_noop
        * tick_setup_device() is called for the new device
          * which sets new->event_handler using the old->event_handler which is noop.
      
      Change the ordering so that new device inherits the proper handler.
      
      This does not have any issue in normal case as most likely all the clockevent
      devices are setup before the highres switch. But, can potentially be affecting
      some corner case where HPET force detect happens after the highres switch.
      This was a problem with HPET in MSI mode code that we have been experimenting
      with.
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7c1e7689
  18. 22 8月, 2008 1 次提交
  19. 21 8月, 2008 3 次提交
    • M
      nohz: fix wrong event handler after online an offlined cpu · 3c4fbe5e
      Miao Xie 提交于
      On the tickless system(CONFIG_NO_HZ=y and CONFIG_HIGH_RES_TIMERS=n), after
      I made an offlined cpu online, I found this cpu's event handler was
      tick_handle_periodic, not tick_nohz_handler.
      
      After debuging, I found this bug was caused by the wrong tick mode.  the
      tick mode is not changed to NOHZ_MODE_INACTIVE when the cpu is offline.
      
      This patch fixes this bug.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3c4fbe5e
    • J
      clocksource: introduce CLOCK_MONOTONIC_RAW · 2d42244a
      John Stultz 提交于
      In talking with Josip Loncaric, and his work on clock synchronization (see
      btime.sf.net), he mentioned that for really close synchronization, it is
      useful to have access to "hardware time", that is a notion of time that is
      not in any way adjusted by the clock slewing done to keep close time sync.
      
      Part of the issue is if we are using the kernel's ntp adjusted
      representation of time in order to measure how we should correct time, we
      can run into what Paul McKenney aptly described as "Painting a road using
      the lines we're painting as the guide".
      
      I had been thinking of a similar problem, and was trying to come up with a
      way to give users access to a purely hardware based time representation
      that avoided users having to know the underlying frequency and mask values
      needed to deal with the wide variety of possible underlying hardware
      counters.
      
      My solution is to introduce CLOCK_MONOTONIC_RAW.  This exposes a
      nanosecond based time value, that increments starting at bootup and has no
      frequency adjustments made to it what so ever.
      
      The time is accessed from userspace via the posix_clock_gettime() syscall,
      passing CLOCK_MONOTONIC_RAW as the clock_id.
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2d42244a
    • R
      clocksource: introduce clocksource_forward_now() · 9a055117
      Roman Zippel 提交于
      To keep the raw monotonic patch simple first introduce
      clocksource_forward_now(), which takes care of the offset since the last
      update_wall_time() call and adds it to the clock, so there is no need
      anymore to deal with it explicitly at various places, which need to make
      significant changes to the clock.
      
      This is also gets rid of the timekeeping_suspend_nsecs, instead of
      waiting until resume, the value is accumulated during suspend. In the end
      there is only a single user of __get_nsec_offset() left, so I integrated
      it back to getnstimeofday().
      Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9a055117