1. 12 6月, 2015 2 次提交
    • J
      time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge · 833f32d7
      John Stultz 提交于
      Currently, leapsecond adjustments are done at tick time. As a result,
      the leapsecond was applied at the first timer tick *after* the
      leapsecond (~1-10ms late depending on HZ), rather then exactly on the
      second edge.
      
      This was in part historical from back when we were always tick based,
      but correcting this since has been avoided since it adds extra
      conditional checks in the gettime fastpath, which has performance
      overhead.
      
      However, it was recently pointed out that ABS_TIME CLOCK_REALTIME
      timers set for right after the leapsecond could fire a second early,
      since some timers may be expired before we trigger the timekeeping
      timer, which then applies the leapsecond.
      
      This isn't quite as bad as it sounds, since behaviorally it is similar
      to what is possible w/ ntpd made leapsecond adjustments done w/o using
      the kernel discipline. Where due to latencies, timers may fire just
      prior to the settimeofday call. (Also, one should note that all
      applications using CLOCK_REALTIME timers should always be careful,
      since they are prone to quirks from settimeofday() disturbances.)
      
      However, the purpose of having the kernel do the leap adjustment is to
      avoid such latencies, so I think this is worth fixing.
      
      So in order to properly keep those timers from firing a second early,
      this patch modifies the ntp and timekeeping logic so that we keep
      enough state so that the update_base_offsets_now accessor, which
      provides the hrtimer core the current time, can check and apply the
      leapsecond adjustment on the second edge. This prevents the hrtimer
      core from expiring timers too early.
      
      This patch does not modify any other time read path, so no additional
      overhead is incurred. However, this also means that the leap-second
      continues to be applied at tick time for all other read-paths.
      
      Apologies to Richard Cochran, who pushed for similar changes years
      ago, which I resisted due to the concerns about the performance
      overhead.
      
      While I suspect this isn't extremely critical, folks who care about
      strict leap-second correctness will likely want to watch
      this. Potentially a -stable candidate eventually.
      Originally-suggested-by: NRichard Cochran <richardcochran@gmail.com>
      Reported-by: NDaniel Bristot de Oliveira <bristot@redhat.com>
      Reported-by: NPrarit Bhargava <prarit@redhat.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      833f32d7
    • J
      time: Move clock_was_set_seq update before updating shadow-timekeeper · d1518326
      John Stultz 提交于
      It was reported that 868a3e91 (hrtimer: Make offset
      update smarter) was causing timer problems after suspend/resume.
      
      The problem with that change is the modification to
      clock_was_set_seq in timekeeping_update is done prior to
      mirroring the time state to the shadow-timekeeper. Thus the
      next time we do update_wall_time() the updated sequence is
      overwritten by whats in the shadow copy.
      
      This patch moves the shadow-timekeeper mirroring to the end
      of the function, after all updates have been made, so all data
      is kept in sync.
      
      (This patch also affects the update_fast_timekeeper calls which
      were also problematically done prior to the mirroring).
      Reported-and-tested-by: NJeremiah Mahler <jmmahler@gmail.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/1434063297-28657-2-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      d1518326
  2. 23 5月, 2015 3 次提交
    • X
      time: Remove read_boot_clock() · e83d0a41
      Xunlei Pang 提交于
      Now that we have a read_boot_clock64() function available on every
      architecture, and converted all the users to it, it's time to remove
      the (now unused) read_boot_clock() completely from the kernel.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
      [jstultz: Minor commit message tweak suggested by Ingo]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      e83d0a41
    • J
      time: Rework debugging variables so they aren't global · 57d05a93
      John Stultz 提交于
      Ingo suggested that the timekeeping debugging variables
      recently added should not be global, and should be tied
      to the timekeeper's read_base.
      
      Thus this patch implements that suggestion.
      
      This version is different from the earlier versions
      as it keeps the variables in the timekeeper structure
      rather then in the tkr.
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      57d05a93
    • H
      timekeeping: Provide new API to get the current time resolution · 6374f912
      Harald Geyer 提交于
      This patch series introduces a new function
      u32 ktime_get_resolution_ns(void)
      which allows to clean up some driver code.
      
      In particular the IIO subsystem has a function to provide timestamps for
      events but no means to get their resolution. So currently the dht11 driver
      tries to guess the resolution in a rather messy and convoluted way. We
      can do much better with the new code.
      
      This API is not designed to be exposed to user space.
      
      This has been tested on i386, sunxi and mxs.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NHarald Geyer <harald@ccbib.org>
      [jstultz: Tweaked to make it build after upstream changes]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      6374f912
  3. 22 4月, 2015 2 次提交
  4. 03 4月, 2015 6 次提交
  5. 01 4月, 2015 1 次提交
  6. 27 3月, 2015 4 次提交
  7. 13 3月, 2015 4 次提交
    • J
      timekeeping: Add warnings when overflows or underflows are observed · 4ca22c26
      John Stultz 提交于
      It was suggested that the underflow/overflow protection
      should probably throw some sort of warning out, rather
      than just silently fixing the issue.
      
      So this patch adds some warnings here. The flag variables
      used are not protected by locks, but since we can't print
      from the reading functions, just being able to say we
      saw an issue in the update interval is useful enough,
      and can be slightly racy without real consequence.
      
      The big complication is that we're only under a read
      seqlock, so the data could shift under us during
      our calculation to see if there was a problem. This
      patch avoids this issue by nesting another seqlock
      which allows us to snapshot the just required values
      atomically. So we shouldn't see false positives.
      
      I also added some basic rate-limiting here, since
      on one build machine w/ skewed TSCs it was fairly
      noisy at bootup.
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-8-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4ca22c26
    • J
      timekeeping: Try to catch clocksource delta underflows · 057b87e3
      John Stultz 提交于
      In the case where there is a broken clocksource
      where there are multiple actual clocks that
      aren't perfectly aligned, we may see small "negative"
      deltas when we subtract 'now' from 'cycle_last'.
      
      The values are actually negative with respect to the
      clocksource mask value, not necessarily negative
      if cast to a s64, but we can check by checking the
      delta to see if it is a small (relative to the mask)
      negative value (again negative relative to the mask).
      
      If so, we assume we jumped backwards somehow and
      instead use zero for our delta.
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-7-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      057b87e3
    • J
      timekeeping: Add checks to cap clocksource reads to the 'max_cycles' value · a558cd02
      John Stultz 提交于
      When calculating the current delta since the last tick, we
      currently have no hard protections to prevent a multiplication
      overflow from occuring.
      
      This patch introduces infrastructure to allow a cap that
      limits the clocksource read delta value to the 'max_cycles' value,
      which is where an overflow would occur.
      
      Since this is in the hotpath, it adds the extra checking under
      CONFIG_DEBUG_TIMEKEEPING=y.
      
      There was some concern that capping time like this could cause
      problems as we may stop expiring timers, which could go circular
      if the timer that triggers time accumulation were mis-scheduled
      too far in the future, which would cause time to stop.
      
      However, since the mult overflow would result in a smaller time
      value, we would effectively have the same problem there.
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-6-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a558cd02
    • J
      timekeeping: Add debugging checks to warn if we see delays · 3c17ad19
      John Stultz 提交于
      Recently there's been requests for better sanity
      checking in the time code, so that it's more clear
      when something is going wrong, since timekeeping issues
      could manifest in a large number of strange ways in
      various subsystems.
      
      Thus, this patch adds some extra infrastructure to
      add a check to update_wall_time() to print two new
      warnings:
      
       1) if we see the call delayed beyond the 'max_cycles'
          overflow point,
      
       2) or if we see the call delayed beyond the clocksource's
          'max_idle_ns' value, which is currently 50% of the
          overflow point.
      
      This extra infrastructure is conditional on
      a new CONFIG_DEBUG_TIMEKEEPING option, also
      added in this patch - default off.
      
      Tested this a bit by halting qemu for specified
      lengths of time to trigger the warnings.
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-5-git-send-email-john.stultz@linaro.org
      [ Improved the changelog and the messages a bit. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      3c17ad19
  8. 16 2月, 2015 2 次提交
    • R
      PM / sleep: Make it possible to quiesce timers during suspend-to-idle · 124cf911
      Rafael J. Wysocki 提交于
      The efficiency of suspend-to-idle depends on being able to keep CPUs
      in the deepest available idle states for as much time as possible.
      Ideally, they should only be brought out of idle by system wakeup
      interrupts.
      
      However, timer interrupts occurring periodically prevent that from
      happening and it is not practical to chase all of the "misbehaving"
      timers in a whack-a-mole fashion.  A much more effective approach is
      to suspend the local ticks for all CPUs and the entire timekeeping
      along the lines of what is done during full suspend, which also
      helps to keep suspend-to-idle and full suspend reasonably similar.
      
      The idea is to suspend the local tick on each CPU executing
      cpuidle_enter_freeze() and to make the last of them suspend the
      entire timekeeping.  That should prevent timer interrupts from
      triggering until an IO interrupt wakes up one of the CPUs.  It
      needs to be done with interrupts disabled on all of the CPUs,
      though, because otherwise the suspended clocksource might be
      accessed by an interrupt handler which might lead to fatal
      consequences.
      
      Unfortunately, the existing ->enter callbacks provided by cpuidle
      drivers generally cannot be used for implementing that, because some
      of them re-enable interrupts temporarily and some idle entry methods
      cause interrupts to be re-enabled automatically on exit.  Also some
      of these callbacks manipulate local clock event devices of the CPUs
      which really shouldn't be done after suspending their ticks.
      
      To overcome that difficulty, introduce a new cpuidle state callback,
      ->enter_freeze, that will be guaranteed (1) to keep interrupts
      disabled all the time (and return with interrupts disabled) and (2)
      not to touch the CPU timer devices.  Modify cpuidle_enter_freeze() to
      look for the deepest available idle state with ->enter_freeze present
      and to make the CPU execute that callback with suspended tick (and the
      last of the online CPUs to execute it with suspended timekeeping).
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      124cf911
    • R
      timekeeping: Make it safe to use the fast timekeeper while suspended · 060407ae
      Rafael J. Wysocki 提交于
      Theoretically, ktime_get_mono_fast_ns() may be executed after
      timekeeping has been suspended (or before it is resumed) which
      in turn may lead to undefined behavior, for example, when the
      clocksource read from timekeeping_get_ns() called by it is
      not accessible at that time.
      
      Prevent that from happening by setting up a dummy readout base for
      the fast timekeeper during timekeeping_suspend() such that it will
      always return the same number of cycles.
      
      After the last timekeeping_update() in timekeeping_suspend() the
      clocksource is read and the result is stored as cycles_at_suspend.
      The readout base from the current timekeeper is copied onto the
      dummy and the ->read pointer of the dummy is set to a routine
      unconditionally returning cycles_at_suspend.  Next, the dummy is
      passed to update_fast_timekeeper().
      
      Then, ktime_get_mono_fast_ns() will work until the subsequent
      timekeeping_resume() and the proper readout base for the fast
      timekeeper will be restored by the timekeeping_update() called
      right after clearing timekeeping_suspended.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NJohn Stultz <john.stultz@linaro.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      060407ae
  9. 14 2月, 2015 1 次提交
  10. 24 1月, 2015 1 次提交
    • J
      time: Expose getboottime64 for in-kernel uses · d08c0cdd
      John Stultz 提交于
      Adds a timespec64 based getboottime64() implementation
      that can be used as we convert internal users of
      getboottime away from using timespecs.
      
      Cc: pang.xunlei <pang.xunlei@linaro.org>
      Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      d08c0cdd
  11. 25 11月, 2014 1 次提交
  12. 22 11月, 2014 7 次提交
  13. 29 10月, 2014 2 次提交
  14. 06 9月, 2014 1 次提交
  15. 15 8月, 2014 1 次提交
  16. 24 7月, 2014 2 次提交
    • J
      timekeeping: Use cached ntp_tick_length when accumulating error · 375f45b5
      John Stultz 提交于
      By caching the ntp_tick_length() when we correct the frequency error,
      and then using that cached value to accumulate error, we avoid large
      initial errors when the tick length is changed.
      
      This makes convergence happen much faster in the simulator, since the
      initial error doesn't have to be slowly whittled away.
      
      This initially seems like an accounting error, but Miroslav pointed out
      that ntp_tick_length() can change mid-tick, so when we apply it in the
      error accumulation, we are applying any recent change to the entire tick.
      
      This approach chooses to apply changes in the ntp_tick_length() only to
      the next tick, which allows us to calculate the freq correction before
      using the new tick length, which avoids accummulating error.
      
      Credit to Miroslav for pointing this out and providing the original patch
      this functionality has been pulled out from, along with the rational.
      
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Reported-by: NMiroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      375f45b5
    • J
      timekeeping: Rework frequency adjustments to work better w/ nohz · dc491596
      John Stultz 提交于
      The existing timekeeping_adjust logic has always been complicated
      to understand. Further, since it was developed prior to NOHZ becoming
      common, its not surprising it performs poorly when NOHZ is enabled.
      
      Since Miroslav pointed out the problematic nature of the existing code
      in the NOHZ case, I've tried to refactor the code to perform better.
      
      The problem with the previous approach was that it tried to adjust
      for the total cumulative error using a scaled dampening factor. This
      resulted in large errors to be corrected slowly, while small errors
      were corrected quickly. With NOHZ the timekeeping code doesn't know
      how far out the next tick will be, so this results in bad
      over-correction to small errors, and insufficient correction to large
      errors.
      
      Inspired by Miroslav's patch, I've refactored the code to try to
      address the correction in two steps.
      
      1) Check the future freq error for the next tick, and if the frequency
      error is large, try to make sure we correct it so it doesn't cause
      much accumulated error.
      
      2) Then make a small single unit adjustment to correct any cumulative
      error that has collected over time.
      
      This method performs fairly well in the simulator Miroslav created.
      
      Major credit to Miroslav for pointing out the issue, providing the
      original patch to resolve this, a simulator for testing, as well as
      helping debug and resolve issues in my implementation so that it
      performed closer to his original implementation.
      
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Reported-by: NMiroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      dc491596