1. 03 Apr, 2015 4 commits
  2. 01 Apr, 2015 1 commit
  3. 27 Mar, 2015 4 commits
  4. 13 Mar, 2015 4 commits
    • J
      timekeeping: Add warnings when overflows or underflows are observed · 4ca22c26
      John Stultz committed
      It was suggested that the underflow/overflow protection
      should probably throw some sort of warning out, rather
      than just silently fixing the issue.
      
      So this patch adds some warnings here. The flag variables
      used are not protected by locks, but since we can't print
      from the reading functions, just being able to say we
      saw an issue in the update interval is useful enough,
      and can be slightly racy without real consequence.
      
      The big complication is that we're only under a read
      seqlock, so the data could shift under us during
      our calculation to see if there was a problem. This
      patch avoids this issue by nesting another seqlock
      which allows us to snapshot just the required values
      atomically. So we shouldn't see false positives.
      
      I also added some basic rate-limiting here, since
      on one build machine w/ skewed TSCs it was fairly
      noisy at bootup.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-8-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4ca22c26
    • J
      timekeeping: Try to catch clocksource delta underflows · 057b87e3
      John Stultz committed
      In the case where there is a broken clocksource
      where there are multiple actual clocks that
      aren't perfectly aligned, we may see small "negative"
      deltas when we subtract 'cycle_last' from 'now'.
      
      The values are actually negative with respect to the
      clocksource mask value, not necessarily negative
      if cast to a s64, but we can check by checking the
      delta to see if it is a small (relative to the mask)
      negative value (again negative relative to the mask).
      
      If so, we assume we jumped backwards somehow and
      instead use zero for our delta.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-7-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      057b87e3
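The underflow check described in this commit message can be sketched in userspace C as follows. This is a minimal illustration, not the kernel's code: the `cycle_t` typedef, the function name `delta_with_underflow_check`, the `underflow_seen` flag parameter, and the "within one eighth of the mask" threshold are all assumptions chosen for the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's cycle counter type. */
typedef uint64_t cycle_t;

/*
 * Compute the masked delta between two counter reads and treat a small
 * mask-relative "negative" result as a backwards jump.
 *
 * A small backwards step makes (now - last) & mask land just below mask,
 * so (~delta & mask) is the distance of delta from mask. If that distance
 * is small (here: less than mask / 8, an assumed threshold), the delta is
 * replaced with zero and the flag is set.
 */
static cycle_t delta_with_underflow_check(cycle_t now, cycle_t last,
					  cycle_t mask, int *underflow_seen)
{
	cycle_t delta = (now - last) & mask;

	if ((~delta & mask) < (mask >> 3)) {
		*underflow_seen = 1;
		delta = 0;
	}
	return delta;
}
```

With a 32-bit mask, a read that is 5 cycles behind `last` yields zero and sets the flag, while a read 5 cycles ahead is returned unchanged.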
    • J
      timekeeping: Add checks to cap clocksource reads to the 'max_cycles' value · a558cd02
      John Stultz committed
      When calculating the current delta since the last tick, we
      currently have no hard protections to prevent a multiplication
      overflow from occurring.
      
      This patch introduces infrastructure to allow a cap that
      limits the clocksource read delta value to the 'max_cycles' value,
      which is where an overflow would occur.
      
      Since this is in the hotpath, it adds the extra checking under
      CONFIG_DEBUG_TIMEKEEPING=y.
      
      There was some concern that capping time like this could cause
      problems as we may stop expiring timers, which could go circular
      if the timer that triggers time accumulation were mis-scheduled
      too far in the future, which would cause time to stop.
      
      However, since the mult overflow would result in a smaller time
      value, we would effectively have the same problem there.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-6-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a558cd02
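The capping idea from this commit message can be illustrated with a small userspace sketch. It assumes the cycles-to-nanoseconds conversion is a 64-bit multiply followed by a shift; the helper names `max_cycles_for`, `cap_delta`, and `cycles_to_ns`, and the `overflow_seen` flag, are hypothetical and only serve the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's cycle counter type. */
typedef uint64_t cycle_t;

/* Largest delta for which delta * mult still fits in 64 bits (assumed definition). */
static cycle_t max_cycles_for(uint32_t mult)
{
	return UINT64_MAX / mult;
}

/* Clamp the delta to max_cycles and record that an overflow was about to happen. */
static cycle_t cap_delta(cycle_t delta, cycle_t max_cycles, int *overflow_seen)
{
	if (delta > max_cycles) {
		*overflow_seen = 1;
		delta = max_cycles;
	}
	return delta;
}

/* Scaled conversion: safe from multiplication overflow once delta is capped. */
static uint64_t cycles_to_ns(cycle_t delta, uint32_t mult, uint32_t shift)
{
	return (delta * mult) >> shift;
}
```

Capping trades a wrapped (too small) time value for a saturated one, which matches the argument in the message that the uncapped overflow would cause the same stalled-time problem anyway.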
    • J
      timekeeping: Add debugging checks to warn if we see delays · 3c17ad19
      John Stultz committed
      Recently there have been requests for better sanity
      checking in the time code, so that it's more clear
      when something is going wrong, since timekeeping issues
      could manifest in a large number of strange ways in
      various subsystems.
      
      Thus, this patch adds some extra infrastructure to
      add a check to update_wall_time() to print two new
      warnings:
      
       1) if we see the call delayed beyond the 'max_cycles'
          overflow point,
      
       2) or if we see the call delayed beyond the clocksource's
          'max_idle_ns' value, which is currently 50% of the
          overflow point.
      
      This extra infrastructure is conditional on
      a new CONFIG_DEBUG_TIMEKEEPING option, also
      added in this patch - default off.
      
      Tested this a bit by halting qemu for specified
      lengths of time to trigger the warnings.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-5-git-send-email-john.stultz@linaro.org
      [ Improved the changelog and the messages a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3c17ad19
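The two warning conditions listed in this commit message can be sketched as a simple classification of the update delay. This is an illustrative userspace sketch only: the enum, the function name `check_update_delay`, and the use of half of `max_cycles` as a cycle-based stand-in for the 'max_idle_ns' safety margin are assumptions.

```c
#include <stdint.h>

/* Hypothetical result codes for the two warnings described above. */
enum delay_level {
	DELAY_OK,                 /* update arrived in time */
	DELAY_PAST_SAFETY_MARGIN, /* later than 50% of the overflow point */
	DELAY_PAST_OVERFLOW       /* later than the max_cycles overflow point */
};

/*
 * Classify how late an update is, given the cycle offset since the
 * last update and the clocksource's overflow point in cycles.
 */
static enum delay_level check_update_delay(uint64_t offset, uint64_t max_cycles)
{
	if (offset > max_cycles)
		return DELAY_PAST_OVERFLOW;
	if (offset > (max_cycles >> 1))
		return DELAY_PAST_SAFETY_MARGIN;
	return DELAY_OK;
}
```

A caller would print a warning for the two non-OK levels, ideally rate-limited so a persistently late tick does not flood the log.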
  5. 16 Feb, 2015 2 commits
    • R
      PM / sleep: Make it possible to quiesce timers during suspend-to-idle · 124cf911
      Rafael J. Wysocki committed
      The efficiency of suspend-to-idle depends on being able to keep CPUs
      in the deepest available idle states for as much time as possible.
      Ideally, they should only be brought out of idle by system wakeup
      interrupts.
      
      However, timer interrupts occurring periodically prevent that from
      happening and it is not practical to chase all of the "misbehaving"
      timers in a whack-a-mole fashion.  A much more effective approach is
      to suspend the local ticks for all CPUs and the entire timekeeping
      along the lines of what is done during full suspend, which also
      helps to keep suspend-to-idle and full suspend reasonably similar.
      
      The idea is to suspend the local tick on each CPU executing
      cpuidle_enter_freeze() and to make the last of them suspend the
      entire timekeeping.  That should prevent timer interrupts from
      triggering until an IO interrupt wakes up one of the CPUs.  It
      needs to be done with interrupts disabled on all of the CPUs,
      though, because otherwise the suspended clocksource might be
      accessed by an interrupt handler which might lead to fatal
      consequences.
      
      Unfortunately, the existing ->enter callbacks provided by cpuidle
      drivers generally cannot be used for implementing that, because some
      of them re-enable interrupts temporarily and some idle entry methods
      cause interrupts to be re-enabled automatically on exit.  Also some
      of these callbacks manipulate local clock event devices of the CPUs
      which really shouldn't be done after suspending their ticks.
      
      To overcome that difficulty, introduce a new cpuidle state callback,
      ->enter_freeze, that will be guaranteed (1) to keep interrupts
      disabled all the time (and return with interrupts disabled) and (2)
      not to touch the CPU timer devices.  Modify cpuidle_enter_freeze() to
      look for the deepest available idle state with ->enter_freeze present
      and to make the CPU execute that callback with suspended tick (and the
      last of the online CPUs to execute it with suspended timekeeping).
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      124cf911
    • R
      timekeeping: Make it safe to use the fast timekeeper while suspended · 060407ae
      Rafael J. Wysocki committed
      Theoretically, ktime_get_mono_fast_ns() may be executed after
      timekeeping has been suspended (or before it is resumed) which
      in turn may lead to undefined behavior, for example, when the
      clocksource read from timekeeping_get_ns() called by it is
      not accessible at that time.
      
      Prevent that from happening by setting up a dummy readout base for
      the fast timekeeper during timekeeping_suspend() such that it will
      always return the same number of cycles.
      
      After the last timekeeping_update() in timekeeping_suspend() the
      clocksource is read and the result is stored as cycles_at_suspend.
      The readout base from the current timekeeper is copied onto the
      dummy and the ->read pointer of the dummy is set to a routine
      unconditionally returning cycles_at_suspend.  Next, the dummy is
      passed to update_fast_timekeeper().
      
      Then, ktime_get_mono_fast_ns() will work until the subsequent
      timekeeping_resume() and the proper readout base for the fast
      timekeeper will be restored by the timekeeping_update() called
      right after clearing timekeeping_suspended.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      060407ae
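The dummy readout base described in this commit message can be sketched in userspace C. The struct layout, the parameterless `read` callback, and the names `fake_hw_counter`, `hw_read`, and `suspend_fast_readout` are simplifications assumed for illustration; only `cycles_at_suspend` and the idea of swapping the `->read` pointer come from the message.

```c
#include <stdint.h>

/* Simplified, hypothetical readout base: a read callback plus cached state. */
struct read_base {
	uint64_t (*read)(void);
	uint64_t cycle_last;
};

/* Stand-in for a hardware counter that becomes inaccessible while suspended. */
static uint64_t fake_hw_counter;
static uint64_t hw_read(void)
{
	return fake_hw_counter;
}

/* Value captured once at suspend time. */
static uint64_t cycles_at_suspend;

/* Dummy read routine: always returns the value captured at suspend. */
static uint64_t dummy_clock_read(void)
{
	return cycles_at_suspend;
}

/*
 * Prepare the fast timekeeper's readout base for suspend: read the clock
 * one last time, copy the current base, then redirect ->read to the dummy
 * so that later readers never touch the suspended clocksource.
 */
static void suspend_fast_readout(struct read_base *fast,
				 const struct read_base *tk)
{
	cycles_at_suspend = tk->read();
	*fast = *tk;
	fast->read = dummy_clock_read;
}
```

After `suspend_fast_readout()` the fast path keeps returning one frozen cycle count, so the computed time stands still instead of depending on hardware that may not respond.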
  6. 14 Feb, 2015 1 commit
  7. 24 Jan, 2015 1 commit
    • J
      time: Expose getboottime64 for in-kernel uses · d08c0cdd
      John Stultz committed
      Adds a timespec64 based getboottime64() implementation
      that can be used as we convert internal users of
      getboottime away from using timespecs.
      
      Cc: pang.xunlei <pang.xunlei@linaro.org>
      Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      d08c0cdd
  8. 25 Nov, 2014 1 commit
  9. 22 Nov, 2014 7 commits
  10. 29 Oct, 2014 2 commits
  11. 06 Sep, 2014 1 commit
  12. 15 Aug, 2014 1 commit
  13. 24 Jul, 2014 11 commits
    • J
      timekeeping: Use cached ntp_tick_length when accumulating error · 375f45b5
      John Stultz committed
      By caching the ntp_tick_length() when we correct the frequency error,
      and then using that cached value to accumulate error, we avoid large
      initial errors when the tick length is changed.
      
      This makes convergence happen much faster in the simulator, since the
      initial error doesn't have to be slowly whittled away.
      
      This initially seems like an accounting error, but Miroslav pointed out
      that ntp_tick_length() can change mid-tick, so when we apply it in the
      error accumulation, we are applying any recent change to the entire tick.
      
      This approach chooses to apply changes in the ntp_tick_length() only to
      the next tick, which allows us to calculate the freq correction before
      using the new tick length, which avoids accumulating error.
      
      Credit to Miroslav for pointing this out and providing the original patch
      this functionality has been pulled out from, along with the rationale.
      
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      375f45b5
    • J
      timekeeping: Rework frequency adjustments to work better w/ nohz · dc491596
      John Stultz committed
      The existing timekeeping_adjust logic has always been complicated
      to understand. Further, since it was developed prior to NOHZ becoming
      common, it's not surprising it performs poorly when NOHZ is enabled.
      
      Since Miroslav pointed out the problematic nature of the existing code
      in the NOHZ case, I've tried to refactor the code to perform better.
      
      The problem with the previous approach was that it tried to adjust
      for the total cumulative error using a scaled dampening factor. This
      resulted in large errors being corrected slowly, while small errors
      were corrected quickly. With NOHZ the timekeeping code doesn't know
      how far out the next tick will be, so this results in bad
      over-correction to small errors, and insufficient correction to large
      errors.
      
      Inspired by Miroslav's patch, I've refactored the code to try to
      address the correction in two steps.
      
      1) Check the future freq error for the next tick, and if the frequency
      error is large, try to make sure we correct it so it doesn't cause
      much accumulated error.
      
      2) Then make a small single unit adjustment to correct any cumulative
      error that has collected over time.
      
      This method performs fairly well in the simulator Miroslav created.
      
      Major credit to Miroslav for pointing out the issue, providing the
      original patch to resolve this, a simulator for testing, as well as
      helping debug and resolve issues in my implementation so that it
      performed closer to his original implementation.
      
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      dc491596
    • J
      timekeeping: Minor fixup for timespec64->timespec assignment · e2dff1ec
      John Stultz committed
      In the GENERIC_TIME_VSYSCALL_OLD update_vsyscall implementation,
      we take the tk_xtime() value, which returns a timespec64, and
      store it in a timespec.
      
      This luckily is ok, since the only architectures that use
      GENERIC_TIME_VSYSCALL_OLD are ia64 and ppc64, which are both
      64 bit systems where timespec64 is the same as a timespec.
      
      Even so, for cleanliness reasons, use the conversion function
      to assign the proper type.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      e2dff1ec
    • T
      timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC · 4396e058
      Thomas Gleixner committed
      Tracers want a correlated time between the kernel instrumentation and
      user space. We really do not want to export sched_clock() to user
      space, so we need to provide something sensible for this.
      
      Using separate data structures with a non-blocking sequence count
      based update mechanism allows us to do that. The data structure
      required for the readout has a sequence counter and two copies of the
      timekeeping data.
      
      On the update side:
      
        smp_wmb();
        tkf->seq++;
        smp_wmb();
        update(tkf->base[0], tk);
        smp_wmb();
        tkf->seq++;
        smp_wmb();
        update(tkf->base[1], tk);
      
      On the reader side:
      
        do {
           seq = tkf->seq;
           smp_rmb();
           idx = seq & 0x01;
           now = now(tkf->base[idx]);
           smp_rmb();
        } while (seq != tkf->seq);
      
      So if a NMI hits the update of base[0] it will use base[1] which is
      still consistent, but this timestamp is not guaranteed to be monotonic
      across an update.
      
      The timestamp is calculated by:
      
      	now = base_mono + clock_delta * slope
      
      So if the update lowers the slope, readers who are forced to the
      not yet updated second array are still using the old steeper slope.
      
       tmono
       ^
       |    o  n
       |   o n
       |  u
       | o
       |o
       |12345678---> reader order
      
       o = old slope
       u = update
       n = new slope
      
      So reader 6 will observe time going backwards versus reader 5.
      
      While other CPUs are likely to be able to observe that, the only way
      for a CPU local observation is when an NMI hits in the middle of
      the update. Timestamps taken from that NMI context might be ahead
      of the following timestamps. Callers need to be aware of that and
      deal with it.
      
      V2: Got rid of clock monotonic raw and reorganized the data
          structures. Folded in the barrier fix from Mathieu.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      4396e058
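The update and reader sequences in this commit message can be sketched with C11 atomics. This is a simplified userspace illustration, not the kernel implementation: the struct and function names are hypothetical, the cycle count is passed in as a parameter instead of being read from a clocksource, and the slope is modelled as an assumed `mult`/`shift` pair. A production version would also need the payload copies themselves to be safe against concurrent access.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical readout data: base time plus slope as mult/shift. */
struct read_base {
	uint64_t base_mono;
	uint64_t cycle_last;
	uint32_t mult;
	uint32_t shift;
};

/* Sequence counter plus two copies of the readout data. */
struct fast_tk {
	atomic_uint seq;
	struct read_base base[2];
};

/* Update side: bump seq around each copy so readers use the other one. */
static void fast_tk_update(struct fast_tk *tkf, const struct read_base *tk)
{
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_add_explicit(&tkf->seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	tkf->base[0] = *tk;	/* seq is odd: readers use base[1] */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_add_explicit(&tkf->seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	tkf->base[1] = *tk;	/* seq is even: readers use base[0] */
}

/* Reader side: never blocks, retries only if seq changed during the read. */
static uint64_t fast_tk_read(struct fast_tk *tkf, uint64_t cycles)
{
	unsigned int seq;
	uint64_t now;

	do {
		seq = atomic_load_explicit(&tkf->seq, memory_order_acquire);
		const struct read_base *b = &tkf->base[seq & 0x01];

		/* now = base_mono + clock_delta * slope */
		now = b->base_mono +
		      (((cycles - b->cycle_last) * b->mult) >> b->shift);
		atomic_thread_fence(memory_order_acquire);
	} while (seq != atomic_load_explicit(&tkf->seq, memory_order_relaxed));

	return now;
}
```

Because a reader interrupting the update always finds one copy that is not being written, it can run from NMI context without spinning on the writer; the cost is the non-monotonic window across an update that the message describes.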
    • T
      timekeeping: Use tk_read_base as argument for timekeeping_get_ns() · 0e5ac3a8
      Thomas Gleixner committed
      All the function needs is in the tk_read_base struct. No functional
      change for the current code, just a preparatory patch for the NMI safe
      accessor to clock monotonic which will use struct tk_read_base as well.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      0e5ac3a8
    • T
      timekeeping: Create struct tk_read_base and use it in struct timekeeper · d28ede83
      Thomas Gleixner committed
      The members of the new struct are the required ones for the new NMI
      safe accessor to clock monotonic. In order to reuse the existing
      timekeeping code and to make the update of the fast NMI safe
      timekeepers a simple memcpy use the struct for the timekeeper as well
      and convert all users.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      d28ede83
    • T
      timekeeping: Restructure the timekeeper some more · 6d3aadf3
      Thomas Gleixner committed
      Accessing time requires touching at least two cachelines:
      
         1) The timekeeper data structure
      
         2) The clocksource data structure
      
      The access to the clocksource data structure can be avoided as almost
      all clocksource implementations ignore the argument to the read
      callback, which is a pointer to the clocksource.
      
      But the core needs to touch it to access the members @read and @mask.
      
      So we are better off by copying the @read function pointer and the
      @mask from the clocksource to the core data structure itself.
      
      For the most used ktime_get() access all required data including the
      @read and @mask copies fits together with the sequence counter into a
      single 64 byte cacheline.
      
      For the other time access functions we touch in the current code three
      cache lines in the worst case. But with the clocksource data copies we
      can reduce that to two adjacent cachelines, which is more efficient
      than disjoint cache lines.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      6d3aadf3
    • T
      clocksource: Get rid of cycle_last · 4a0e6377
      Thomas Gleixner committed
      cycle_last was added to the clocksource to support the TSC
      validation. We moved that to the core code, so we can get rid of the
      extra copy.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      4a0e6377
    • T
      clocksource: Make delta calculation a function · 3a978377
      Thomas Gleixner committed
      We want to move the TSC sanity check into core code to make NMI safe
      accessors to clock monotonic[_raw] possible. For this we need to
      sanity check the delta calculation. Create a helper function and
      convert all sites to use it.
      
      [ Build fix from jstultz ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      3a978377
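A delta helper of the kind this commit message describes might look like the sketch below. The name `delta_cycles`, its signature, and the `cycle_t` typedef are assumptions for illustration; the message only states that the calculation was moved into a helper function.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's cycle counter type. */
typedef uint64_t cycle_t;

/*
 * Cycles elapsed between two reads of a counter that is only as wide
 * as `mask`. Unsigned subtraction followed by masking gives the right
 * answer even if the counter wrapped once between the two reads.
 */
static inline cycle_t delta_cycles(cycle_t now, cycle_t last, cycle_t mask)
{
	return (now - last) & mask;
}
```

Keeping the calculation in one place means any sanity check on the delta only has to be added once rather than at every call site.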
    • T
      timekeeping: Provide ktime_get_raw() · f519b1a2
      Thomas Gleixner committed
      Provide a ktime_t based interface for raw monotonic time.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      f519b1a2
    • T
      timekeeping: Simplify timekeeping_clocktai() · 61edec81
      Thomas Gleixner committed
      timekeeping_clocktai() is not used in fast paths, so the extra
      timespec conversion is not problematic.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      61edec81