1. 10 Nov, 2015 1 commit
    • remove abs64() · 79211c8e
      Committed by Andrew Morton
      Switch everything to the new and more capable implementation of abs().
      Mainly to give the new abs() a bit of a workout.
      
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      79211c8e
  2. 16 Oct, 2015 1 commit
  3. 02 Oct, 2015 2 commits
  4. 22 Sep, 2015 1 commit
  5. 13 Sep, 2015 1 commit
    • time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64() · 2619d7e9
      Committed by John Stultz
      The internal clocksteering done for fine-grained error
      correction uses a logarithmic approximation, so any time
      adjtimex() adjusts the clock steering, timekeeping_freqadjust()
      quickly approximates the correct clock frequency over a series
      of ticks.
      
      Unfortunately, the logic in timekeeping_freqadjust(), introduced
      in commit:
      
        dc491596 ("timekeeping: Rework frequency adjustments to work better w/ nohz")
      
      used the abs() function with a s64 error value to calculate the
      size of the approximated adjustment to be made.
      
      Per include/linux/kernel.h:
      
        "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()".
      
      Thus on 32-bit platforms, the clock steering took a heavily
      dampened random walk while trying to converge on the proper
      frequency, so adjustments were made much more slowly than
      intended (most easily observed when large adjustments are
      made).
      
      This patch fixes the issue by using abs64() instead.
      Reported-by: Nuno Gonçalves <nunojpg@gmail.com>
      Tested-by: Nuno Goncalves <nunojpg@gmail.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: <stable@vger.kernel.org> # v3.17+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2619d7e9
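The 32-bit failure described in this commit can be illustrated in user space. The following is a minimal sketch, not the kernel's macro: `low_32_bits()`, `abs_through_int()` and `abs_full_width()` are hypothetical names, and the sketch assumes the failure mode is that an s64 value gets squeezed through an int-sized temporary before its sign is examined.

```c
#include <stdint.h>
#include <string.h>

/* Keep only the low 32 bits of v, reinterpreted as a signed 32-bit value.
 * This models passing an s64 through an int-sized temporary. */
static int32_t low_32_bits(int64_t v)
{
    uint32_t low = (uint32_t)(uint64_t)v; /* well-defined: reduces modulo 2^32 */
    int32_t out;

    memcpy(&out, &low, sizeof(out));
    return out;
}

/* Hypothetical stand-in for an abs() that works on an int-sized temporary. */
static int64_t abs_through_int(int64_t v)
{
    int64_t t = low_32_bits(v); /* widen first so negating INT32_MIN is safe */

    return t < 0 ? -t : t;
}

/* Hypothetical stand-in for a 64-bit-aware absolute value.
 * (INT64_MIN is not handled; it has no positive counterpart.) */
static int64_t abs_full_width(int64_t v)
{
    return v < 0 ? -v : v;
}
```

With an error value of -4294967301 (that is, -(2^32 + 5)), `abs_full_width()` returns 4294967301 while `abs_through_int()` returns 5, so a large error looks tiny. That is consistent with the slow, wandering convergence the commit message reports.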
  6. 18 Aug, 2015 2 commits
    • time: Introduce current_kernel_time64() · 8758a240
      Committed by Baolin Wang
      current_kernel_time() is not year-2038 safe on 32-bit systems
      since it returns a timespec value. Introduce current_kernel_time64(),
      which returns a timespec64 value.
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      8758a240
    • time: Always make sure wall_to_monotonic isn't positive · e1d7ba87
      Committed by Wang YanQing
      Two issues were found on an IMX6 development board without an
      enabled RTC device (resulting in the boot time and monotonic
      time being initialized to 0).
      
      Issue 1: exportfs -a generates:
             "exportfs: /opt/nfs/arm does not support NFS export"
      Issue 2: cat /proc/stat shows:
             "btime 4294967236"
      
      The same issues can be reproduced on x86 after running the
      following code:
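      	#include <stddef.h>
      	#include <sys/time.h>

      	int main(void)
      	{
      	    struct timeval val;
      	    int ret;

      	    /* Set the wall clock to the Epoch (needs root). */
      	    val.tv_sec = 0;
      	    val.tv_usec = 0;
      	    ret = settimeofday(&val, NULL);
      	    return ret ? 1 : 0;
      	}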
      
      The two issues are different symptoms of the same problem:
      a positive wall_to_monotonic pushes the boot time back to
      before the Epoch, so getboottime() returns a negative value.
      
      In symptom 1:
                the negative boot time causes get_expiry() to overflow time_t
                when the input expiry time is 2147483647, so cache_flush()
                always clears the entries just added in ip_map_parse.
      In symptom 2:
                show_stat() uses "unsigned long" to print the negative btime
                value returned by getboottime().
      
      This patch fixes the problem by prohibiting time from being set to a value which
      would cause a negative boot time. As a result, one can't set the CLOCK_REALTIME
      time prior to (1970 + system uptime).
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Wang YanQing <udknight@gmail.com>
      [jstultz: reworded commit message]
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      e1d7ba87
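The rule this commit describes comes down to a little arithmetic, sketched below. The helper names `boot_time_sec()`, `settime_allowed()` and `as_unsigned_32()` are hypothetical and are not the kernel's functions. The sketch assumes whole seconds, and takes boot time to be the wall-clock time minus the uptime.

```c
#include <stdbool.h>
#include <stdint.h>

/* Boot time is the wall-clock time minus how long the system has been up. */
static int64_t boot_time_sec(int64_t realtime_sec, int64_t uptime_sec)
{
    return realtime_sec - uptime_sec;
}

/* Hypothetical check: refuse a new wall-clock time that would imply
 * a boot time before the Epoch. */
static bool settime_allowed(int64_t new_realtime_sec, int64_t uptime_sec)
{
    return boot_time_sec(new_realtime_sec, uptime_sec) >= 0;
}

/* How a (possibly negative) second count looks when printed as a
 * 32-bit unsigned value, as in the btime symptom. */
static uint32_t as_unsigned_32(int64_t sec)
{
    return (uint32_t)(uint64_t)sec; /* reduces modulo 2^32 */
}
```

Under these assumptions, setting the clock to 0 after 60 seconds of uptime gives a boot time of -60, which prints as 4294967236 when treated as a 32-bit unsigned number. That is the btime value quoted in the report.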
  7. 18 Jun, 2015 1 commit
    • timekeeping: Copy the shadow-timekeeper over the real timekeeper last · 906c5557
      Committed by John Stultz
      The fix in d1518326 (time: Move clock_was_set_seq update
      before updating shadow-timekeeper) was unfortunately incomplete.
      
      The main gist of that change was to do the shadow-copy update
      last, so that any state changes were properly duplicated, and
      we wouldn't accidentally have stale data in the shadow.
      
      Unfortunately, in the main update_wall_time() logic, we
      use the shadow-timekeeper to calculate the next update values,
      then, while holding the lock, copy the shadow-timekeeper over,
      then call timekeeping_update() to do some additional
      bookkeeping (skipping the shadow mirror). The bug with this is
      that the additional bookkeeping isn't all read-only, and some of
      it changes timekeeper state. Thus we might then overwrite this
      state change on the next update.
      
      To avoid this problem, do the timekeeping_update() on the
      shadow-timekeeper prior to copying the full state over to
      the real-timekeeper.
      
      This avoids problems with both the clock_was_set_seq and
      next_leap_ktime being overwritten and possibly the
      fast-timekeepers as well.
      
      Many thanks to Prarit for his rigorous testing, which discovered
      this problem, along with Prarit and Daniel's work validating this
      fix.
      Reported-by: Prarit Bhargava <prarit@redhat.com>
      Tested-by: Prarit Bhargava <prarit@redhat.com>
      Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1434560753-7441-1-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      906c5557
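The ordering problem this commit fixes can be modelled with two copies of a small struct. This is a heavily reduced sketch with hypothetical names (`struct tk_state`, `bookkeeping()`, `update_buggy()`, `update_fixed()`). It assumes the bookkeeping always bumps a sequence counter, which is a simplification of the real conditions.

```c
#include <stdint.h>

/* Hypothetical, heavily reduced timekeeper state. */
struct tk_state {
    uint64_t xtime_sec;
    unsigned int clock_was_set_seq;
};

static struct tk_state real_tk;
static struct tk_state shadow_tk;

/* Bookkeeping that is not read-only: it bumps a sequence counter. */
static void bookkeeping(struct tk_state *tk)
{
    tk->clock_was_set_seq++;
}

/* Buggy order: copy the shadow over, then do the bookkeeping on the
 * real copy only. The shadow never sees the counter change, so the
 * next copy overwrites it. */
static void update_buggy(void)
{
    shadow_tk.xtime_sec++;
    real_tk = shadow_tk;
    bookkeeping(&real_tk);
}

/* Fixed order: do the bookkeeping on the shadow, then copy the full
 * state over the real timekeeper last. */
static void update_fixed(void)
{
    shadow_tk.xtime_sec++;
    bookkeeping(&shadow_tk);
    real_tk = shadow_tk;
}
```

Starting from zeroed state, two calls to `update_buggy()` leave the real counter at 1, because the second copy discards the first bump. Two calls to `update_fixed()` leave both copies at 2.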
  8. 12 Jun, 2015 2 commits
    • time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge · 833f32d7
      Committed by John Stultz
      Currently, leapsecond adjustments are done at tick time. As a result,
      the leapsecond is applied at the first timer tick *after* the
      leapsecond (~1-10ms late depending on HZ), rather than exactly on the
      second edge.
      
      This is in part historical, from back when we were always tick based,
      but correcting it has been avoided since it adds extra
      conditional checks in the gettime fastpath, which has performance
      overhead.
      
      However, it was recently pointed out that ABS_TIME CLOCK_REALTIME
      timers set for right after the leapsecond could fire a second early,
      since some timers may be expired before we trigger the timekeeping
      timer, which then applies the leapsecond.
      
      This isn't quite as bad as it sounds, since behaviorally it is similar
      to what is possible with leapsecond adjustments made by ntpd without
      using the kernel discipline, where, due to latencies, timers may fire
      just prior to the settimeofday() call. (Also, one should note that all
      applications using CLOCK_REALTIME timers should always be careful,
      since they are prone to quirks from settimeofday() disturbances.)
      
      However, the purpose of having the kernel do the leap adjustment is to
      avoid such latencies, so I think this is worth fixing.
      
      So in order to properly keep those timers from firing a second early,
      this patch modifies the ntp and timekeeping logic so that we keep
      enough state so that the update_base_offsets_now accessor, which
      provides the hrtimer core the current time, can check and apply the
      leapsecond adjustment on the second edge. This prevents the hrtimer
      core from expiring timers too early.
      
      This patch does not modify any other time read path, so no additional
      overhead is incurred. However, this also means that the leap-second
      continues to be applied at tick time for all other read-paths.
      
      Apologies to Richard Cochran, who pushed for similar changes years
      ago, which I resisted due to the concerns about the performance
      overhead.
      
      While I suspect this isn't extremely critical, folks who care about
      strict leap-second correctness will likely want to watch
      this. Potentially a -stable candidate eventually.
      Originally-suggested-by: Richard Cochran <richardcochran@gmail.com>
      Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Reported-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      833f32d7
    • time: Move clock_was_set_seq update before updating shadow-timekeeper · d1518326
      Committed by John Stultz
      It was reported that 868a3e91 (hrtimer: Make offset
      update smarter) was causing timer problems after suspend/resume.
      
      The problem with that change is that the modification to
      clock_was_set_seq in timekeeping_update() is done prior to
      mirroring the time state to the shadow-timekeeper. Thus the
      next time we do update_wall_time(), the updated sequence is
      overwritten by what's in the shadow copy.
      
      This patch moves the shadow-timekeeper mirroring to the end
      of the function, after all updates have been made, so all data
      is kept in sync.
      
      (This patch also affects the update_fast_timekeeper calls which
      were also problematically done prior to the mirroring).
      Reported-and-tested-by: Jeremiah Mahler <jmmahler@gmail.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/1434063297-28657-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      d1518326
  9. 28 May, 2015 2 commits
  10. 23 May, 2015 3 commits
    • time: Remove read_boot_clock() · e83d0a41
      Committed by Xunlei Pang
      Now that we have a read_boot_clock64() function available on every
      architecture, and converted all the users to it, it's time to remove
      the (now unused) read_boot_clock() completely from the kernel.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
      [jstultz: Minor commit message tweak suggested by Ingo]
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      e83d0a41
    • time: Rework debugging variables so they aren't global · 57d05a93
      Committed by John Stultz
      Ingo suggested that the timekeeping debugging variables
      recently added should not be global, and should be tied
      to the timekeeper's read_base.
      
      Thus this patch implements that suggestion.
      
      This version is different from the earlier versions
      as it keeps the variables in the timekeeper structure
      rather than in the tkr.
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      57d05a93
    • timekeeping: Provide new API to get the current time resolution · 6374f912
      Committed by Harald Geyer
      This patch series introduces a new function
      u32 ktime_get_resolution_ns(void)
      which allows some driver code to be cleaned up.
      
      In particular the IIO subsystem has a function to provide timestamps for
      events but no means to get their resolution. So currently the dht11 driver
      tries to guess the resolution in a rather messy and convoluted way. We
      can do much better with the new code.
      
      This API is not designed to be exposed to user space.
      
      This has been tested on i386, sunxi and mxs.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Harald Geyer <harald@ccbib.org>
      [jstultz: Tweaked to make it build after upstream changes]
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      6374f912
  11. 22 Apr, 2015 2 commits
  12. 03 Apr, 2015 6 commits
  13. 01 Apr, 2015 1 commit
  14. 27 Mar, 2015 4 commits
  15. 13 Mar, 2015 4 commits
    • timekeeping: Add warnings when overflows or underflows are observed · 4ca22c26
      Committed by John Stultz
      It was suggested that the underflow/overflow protection
      should probably throw some sort of warning out, rather
      than just silently fixing the issue.
      
      So this patch adds some warnings here. The flag variables
      used are not protected by locks, but since we can't print
      from the reading functions, just being able to say we
      saw an issue in the update interval is useful enough,
      and can be slightly racy without real consequence.
      
      The big complication is that we're only under a read
      seqlock, so the data could shift under us during
      our calculation to see if there was a problem. This
      patch avoids this issue by nesting another seqlock
      which allows us to snapshot the just required values
      atomically. So we shouldn't see false positives.
      
      I also added some basic rate-limiting here, since
      on one build machine w/ skewed TSCs it was fairly
      noisy at bootup.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-8-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4ca22c26
    • timekeeping: Try to catch clocksource delta underflows · 057b87e3
      Committed by John Stultz
      In the case where there is a broken clocksource
      where there are multiple actual clocks that
      aren't perfectly aligned, we may see small "negative"
      deltas when we subtract 'cycle_last' from 'now'.
      
      The values are actually negative with respect to the
      clocksource mask value, not necessarily negative
      if cast to a s64, but we can check by checking the
      delta to see if it is a small (relative to the mask)
      negative value (again negative relative to the mask).
      
      If so, we assume we jumped backwards somehow and
      instead use zero for our delta.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-7-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      057b87e3
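The "negative relative to the mask" test from this commit can be sketched as follows. `clamped_delta()` is a hypothetical helper, and the `mask >> 3` cutoff for what counts as a "small" backwards step is an assumption chosen for illustration, not a value taken from the commit message.

```c
#include <stdint.h>

/* Return the masked cycle delta, or 0 when the delta looks like a small
 * step backwards (a value just below the mask after wrapping). */
static uint64_t clamped_delta(uint64_t now, uint64_t cycle_last, uint64_t mask)
{
    uint64_t delta = (now - cycle_last) & mask;

    /* A small negative step wraps to a value close to the mask, so its
     * complement within the mask is small. The mask >> 3 threshold is an
     * assumption made for this sketch. */
    if ((~delta & mask) < (mask >> 3))
        return 0;
    return delta;
}
```

With a 32-bit mask, stepping from 1000 back to 990 yields a raw masked delta of 0xFFFFFFF6, which the check turns into 0. A forward step of 100 cycles, or a genuine counter wrap from 0xFFFFFFF0 to 0x10, passes through unchanged.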
    • timekeeping: Add checks to cap clocksource reads to the 'max_cycles' value · a558cd02
      Committed by John Stultz
      When calculating the current delta since the last tick, we
      currently have no hard protections to prevent a multiplication
      overflow from occurring.
      
      This patch introduces infrastructure to allow a cap that
      limits the clocksource read delta value to the 'max_cycles' value,
      which is where an overflow would occur.
      
      Since this is in the hotpath, it adds the extra checking under
      CONFIG_DEBUG_TIMEKEEPING=y.
      
      There was some concern that capping time like this could cause
      problems as we may stop expiring timers, which could go circular
      if the timer that triggers time accumulation were mis-scheduled
      too far in the future, which would cause time to stop.
      
      However, since the mult overflow would result in a smaller time
      value, we would effectively have the same problem there.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-6-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a558cd02
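The overflow this commit guards against, and the effect of the cap, can be shown with plain 64-bit arithmetic. This is a sketch under stated assumptions: `capped_delta()` and `cycles_to_ns()` are hypothetical names, the conversion is taken to be `(delta * mult) >> shift`, and `max_cycles` is taken to be the largest delta whose product with `mult` still fits in 64 bits.

```c
#include <stdint.h>

/* Limit a clocksource delta to max_cycles. */
static uint64_t capped_delta(uint64_t delta, uint64_t max_cycles)
{
    return delta > max_cycles ? max_cycles : delta;
}

/* Convert cycles to nanoseconds with a mult/shift pair. The multiplication
 * wraps modulo 2^64 when delta is too large, giving a smaller time value. */
static uint64_t cycles_to_ns(uint64_t delta, uint32_t mult, uint32_t shift)
{
    return (delta * mult) >> shift;
}
```

For example, with `mult = 0x80000000` and `shift = 31`, a delta of 2^33 cycles wraps to 0 ns when uncapped. Capped to `UINT64_MAX / mult`, it converts to 2^33 - 1 ns instead, so time still moves forward rather than collapsing to a small value.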
    • timekeeping: Add debugging checks to warn if we see delays · 3c17ad19
      Committed by John Stultz
      Recently there have been requests for better sanity
      checking in the time code, so that it's clearer
      when something is going wrong, since timekeeping issues
      could manifest in a large number of strange ways in
      various subsystems.
      
      Thus, this patch adds some extra infrastructure to
      add a check to update_wall_time() to print two new
      warnings:
      
       1) if we see the call delayed beyond the 'max_cycles'
          overflow point,
      
       2) or if we see the call delayed beyond the clocksource's
          'max_idle_ns' value, which is currently 50% of the
          overflow point.
      
      This extra infrastructure is conditional on
      a new CONFIG_DEBUG_TIMEKEEPING option, also
      added in this patch - default off.
      
      Tested this a bit by halting qemu for specified
      lengths of time to trigger the warnings.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-5-git-send-email-john.stultz@linaro.org
      [ Improved the changelog and the messages a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3c17ad19
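The two warning conditions listed in this commit can be sketched as a small classifier. `classify_delay()` and the enum values are hypothetical. The sketch assumes both thresholds can be compared in cycles, and uses half of `max_cycles` as a stand-in for the 'max_idle_ns' limit that the text describes as 50% of the overflow point.

```c
#include <stdint.h>

enum delay_check {
    DELAY_OK,
    DELAY_PAST_MAX_IDLE,   /* beyond 50% of the overflow point */
    DELAY_PAST_MAX_CYCLES  /* beyond the overflow point itself */
};

/* Classify how late an update is, given the cycles elapsed since the
 * last update (offset) and the overflow point (max_cycles). */
static enum delay_check classify_delay(uint64_t offset, uint64_t max_cycles)
{
    if (offset > max_cycles)
        return DELAY_PAST_MAX_CYCLES;
    if (offset > max_cycles / 2) /* assumed stand-in for max_idle_ns */
        return DELAY_PAST_MAX_IDLE;
    return DELAY_OK;
}
```

A caller would print the corresponding warning for either non-OK result. Checking the more severe condition first ensures only one warning is reported per delayed call.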
  16. 16 Feb, 2015 2 commits
    • PM / sleep: Make it possible to quiesce timers during suspend-to-idle · 124cf911
      Committed by Rafael J. Wysocki
      The efficiency of suspend-to-idle depends on being able to keep CPUs
      in the deepest available idle states for as much time as possible.
      Ideally, they should only be brought out of idle by system wakeup
      interrupts.
      
      However, timer interrupts occurring periodically prevent that from
      happening and it is not practical to chase all of the "misbehaving"
      timers in a whack-a-mole fashion.  A much more effective approach is
      to suspend the local ticks for all CPUs and the entire timekeeping
      along the lines of what is done during full suspend, which also
      helps to keep suspend-to-idle and full suspend reasonably similar.
      
      The idea is to suspend the local tick on each CPU executing
      cpuidle_enter_freeze() and to make the last of them suspend the
      entire timekeeping.  That should prevent timer interrupts from
      triggering until an IO interrupt wakes up one of the CPUs.  It
      needs to be done with interrupts disabled on all of the CPUs,
      though, because otherwise the suspended clocksource might be
      accessed by an interrupt handler which might lead to fatal
      consequences.
      
      Unfortunately, the existing ->enter callbacks provided by cpuidle
      drivers generally cannot be used for implementing that, because some
      of them re-enable interrupts temporarily and some idle entry methods
      cause interrupts to be re-enabled automatically on exit.  Also some
      of these callbacks manipulate local clock event devices of the CPUs
      which really shouldn't be done after suspending their ticks.
      
      To overcome that difficulty, introduce a new cpuidle state callback,
      ->enter_freeze, that will be guaranteed (1) to keep interrupts
      disabled all the time (and return with interrupts disabled) and (2)
      not to touch the CPU timer devices.  Modify cpuidle_enter_freeze() to
      look for the deepest available idle state with ->enter_freeze present
      and to make the CPU execute that callback with suspended tick (and the
      last of the online CPUs to execute it with suspended timekeeping).
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      124cf911
    • timekeeping: Make it safe to use the fast timekeeper while suspended · 060407ae
      Committed by Rafael J. Wysocki
      Theoretically, ktime_get_mono_fast_ns() may be executed after
      timekeeping has been suspended (or before it is resumed) which
      in turn may lead to undefined behavior, for example, when the
      clocksource read from timekeeping_get_ns() called by it is
      not accessible at that time.
      
      Prevent that from happening by setting up a dummy readout base for
      the fast timekeeper during timekeeping_suspend() such that it will
      always return the same number of cycles.
      
      After the last timekeeping_update() in timekeeping_suspend() the
      clocksource is read and the result is stored as cycles_at_suspend.
      The readout base from the current timekeeper is copied onto the
      dummy and the ->read pointer of the dummy is set to a routine
      unconditionally returning cycles_at_suspend.  Next, the dummy is
      passed to update_fast_timekeeper().
      
      Then, ktime_get_mono_fast_ns() will work until the subsequent
      timekeeping_resume() and the proper readout base for the fast
      timekeeper will be restored by the timekeeping_update() called
      right after clearing timekeeping_suspended.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      060407ae
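The dummy readout base described in this commit amounts to swapping a function pointer. The following is a reduced sketch, not the kernel's code: `struct readout_base`, `hw_read()`, `dummy_read()`, `sketch_suspend()` and `sketch_resume()` are hypothetical names, and a plain variable stands in for the hardware counter.

```c
#include <stdint.h>

/* Hypothetical, reduced readout base: just the read callback. */
struct readout_base {
    uint64_t (*read)(void);
};

static uint64_t hw_counter;        /* stands in for the clocksource register */
static uint64_t cycles_at_suspend; /* snapshot taken when suspending */

static uint64_t hw_read(void)
{
    return hw_counter;
}

/* Unconditionally returns the value captured at suspend time. */
static uint64_t dummy_read(void)
{
    return cycles_at_suspend;
}

static struct readout_base fast_base = { hw_read };

/* On suspend: read the clocksource once, then point ->read at the dummy. */
static void sketch_suspend(void)
{
    cycles_at_suspend = fast_base.read();
    fast_base.read = dummy_read;
}

/* On resume: restore the proper read routine. */
static void sketch_resume(void)
{
    fast_base.read = hw_read;
}

/* Fast-path reader: never touches the hardware while suspended. */
static uint64_t fast_cycles(void)
{
    return fast_base.read();
}
```

While suspended, every call to `fast_cycles()` returns the same cycle count, whatever the underlying counter would read. This matches the behaviour described for ktime_get_mono_fast_ns() between timekeeping_suspend() and timekeeping_resume().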
  17. 14 Feb, 2015 1 commit
  18. 24 Jan, 2015 1 commit
    • time: Expose getboottime64 for in-kernel uses · d08c0cdd
      Committed by John Stultz
      Adds a timespec64 based getboottime64() implementation
      that can be used as we convert internal users of
      getboottime away from using timespecs.
      
      Cc: pang.xunlei <pang.xunlei@linaro.org>
      Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      d08c0cdd
  19. 25 Nov, 2014 1 commit
  20. 22 Nov, 2014 2 commits