提交 · 833f32d763028c1bb371c64f457788b933773b3e · openanolis / cloud-kernel

12 6月, 2015 2 次提交

time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge · 833f32d7

由 John Stultz 提交于 6月 11, 2015

Currently, leapsecond adjustments are done at tick time. As a result,
the leapsecond was applied at the first timer tick *after* the
leapsecond (~1-10ms late depending on HZ), rather then exactly on the
second edge.

This was in part historical from back when we were always tick based,
but correcting this since has been avoided since it adds extra
conditional checks in the gettime fastpath, which has performance
overhead.

However, it was recently pointed out that ABS_TIME CLOCK_REALTIME
timers set for right after the leapsecond could fire a second early,
since some timers may be expired before we trigger the timekeeping
timer, which then applies the leapsecond.

This isn't quite as bad as it sounds, since behaviorally it is similar
to what is possible w/ ntpd made leapsecond adjustments done w/o using
the kernel discipline. Where due to latencies, timers may fire just
prior to the settimeofday call. (Also, one should note that all
applications using CLOCK_REALTIME timers should always be careful,
since they are prone to quirks from settimeofday() disturbances.)

However, the purpose of having the kernel do the leap adjustment is to
avoid such latencies, so I think this is worth fixing.

So in order to properly keep those timers from firing a second early,
this patch modifies the ntp and timekeeping logic so that we keep
enough state so that the update_base_offsets_now accessor, which
provides the hrtimer core the current time, can check and apply the
leapsecond adjustment on the second edge. This prevents the hrtimer
core from expiring timers too early.

This patch does not modify any other time read path, so no additional
overhead is incurred. However, this also means that the leap-second
continues to be applied at tick time for all other read-paths.

Apologies to Richard Cochran, who pushed for similar changes years
ago, which I resisted due to the concerns about the performance
overhead.

While I suspect this isn't extremely critical, folks who care about
strict leap-second correctness will likely want to watch
this. Potentially a -stable candidate eventually.
Originally-suggested-by: NRichard Cochran <richardcochran@gmail.com>
Reported-by: NDaniel Bristot de Oliveira <bristot@redhat.com>
Reported-by: NPrarit Bhargava <prarit@redhat.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

833f32d7

time: Move clock_was_set_seq update before updating shadow-timekeeper · d1518326

由 John Stultz 提交于 6月 11, 2015

It was reported that 868a3e91 (hrtimer: Make offset
update smarter) was causing timer problems after suspend/resume.

The problem with that change is the modification to
clock_was_set_seq in timekeeping_update is done prior to
mirroring the time state to the shadow-timekeeper. Thus the
next time we do update_wall_time() the updated sequence is
overwritten by whats in the shadow copy.

This patch moves the shadow-timekeeper mirroring to the end
of the function, after all updates have been made, so all data
is kept in sync.

(This patch also affects the update_fast_timekeeper calls which
were also problematically done prior to the mirroring).
Reported-and-tested-by: NJeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1434063297-28657-2-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

d1518326

23 5月, 2015 3 次提交

time: Remove read_boot_clock() · e83d0a41

由 Xunlei Pang 提交于 4月 09, 2015

Now that we have a read_boot_clock64() function available on every
architecture, and converted all the users to it, it's time to remove
the (now unused) read_boot_clock() completely from the kernel.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
[jstultz: Minor commit message tweak suggested by Ingo]
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

e83d0a41

time: Rework debugging variables so they aren't global · 57d05a93

由 John Stultz 提交于 5月 13, 2015

Ingo suggested that the timekeeping debugging variables
recently added should not be global, and should be tied
to the timekeeper's read_base.

Thus this patch implements that suggestion.

This version is different from the earlier versions
as it keeps the variables in the timekeeper structure
rather then in the tkr.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

57d05a93

timekeeping: Provide new API to get the current time resolution · 6374f912

由 Harald Geyer 提交于 4月 07, 2015

This patch series introduces a new function
u32 ktime_get_resolution_ns(void)
which allows to clean up some driver code.

In particular the IIO subsystem has a function to provide timestamps for
events but no means to get their resolution. So currently the dht11 driver
tries to guess the resolution in a rather messy and convoluted way. We
can do much better with the new code.

This API is not designed to be exposed to user space.

This has been tested on i386, sunxi and mxs.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: NHarald Geyer <harald@ccbib.org>
[jstultz: Tweaked to make it build after upstream changes]
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

6374f912

22 4月, 2015 2 次提交

hrtimer: Make offset update smarter · 868a3e91

由 Thomas Gleixner 提交于 4月 14, 2015

On every tick/hrtimer interrupt we update the offset variables of the
clock bases. That's silly because these offsets change very seldom.

Add a sequence counter to the time keeping code which keeps track of
the offset updates (clock_was_set()). Have a sequence cache in the
hrtimer cpu bases to evaluate whether the offsets must be updated or
not. This allows us later to avoid pointless cacheline pollution.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20150414203501.132820245@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>

868a3e91

hrtimer: Get rid of softirq time · 21d6d52a

由 Thomas Gleixner 提交于 4月 14, 2015

The softirq time field in the clock bases is an optimization from the
early days of hrtimers. It provides a coarse "jiffies" like time
mostly for self rearming timers.

But that comes with a price:
    - Larger code size
    - Extra storage space
    - Duplicated functions with really small differences
   
The benefit of this is optimization is marginal for contemporary
systems.

Consolidate everything on the high resolution timer
implementation. This makes further optimizations possible.

Text size reduction:
       x8664 -95, i386 -356, ARM -148, ARM64 -40, power64 -16
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.039977424@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

21d6d52a

03 4月, 2015 6 次提交

timekeeping: Get rid of stale comment · 347c6f6d

由 Thomas Gleixner 提交于 4月 03, 2015

Arch specific management of xtime/jiffies/wall_to_monotonic is
gone for quite a while. Zap the stale comment.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/2422730.dmO29q661S@vostro.rjw.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

347c6f6d

time, drivers/rtc: Don't bother with rtc_resume() for the nonstop clocksource · 0fa88cb4

由 Xunlei Pang 提交于 4月 01, 2015

If a system does not provide a persistent_clock(), the time
will be updated on resume by rtc_resume(). With the addition
of the non-stop clocksources for suspend timing, those systems
set the time on resume in timekeeping_resume(), but may not
provide a valid persistent_clock().

This results in the rtc_resume() logic thinking no one has set
the time and it then will over-write the suspend time again,
which is not necessary and only increases clock error.

So, fix this for rtc_resume().

This patch also improves the name of persistent_clock_exist to
make it more grammatical.
Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427945681-29972-19-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

0fa88cb4

time: Fix a bug in timekeeping_suspend() with no persistent clock · 264bb3f7

由 Xunlei Pang 提交于 4月 01, 2015

When there's no persistent clock, normally
timekeeping_suspend_time should always be zero, but this can
break in timekeeping_suspend().

At T1, there was a system suspend, so old_delta was assigned T1.
After some time, one time adjustment happened, and xtime got the
value of T1-dt(0s<dt<2s). Then, there comes another system
suspend soon after this adjustment, obviously we will get a
small negative delta_delta, resulting in a negative
timekeeping_suspend_time.

This is problematic, when doing timekeeping_resume() if there is
no nonstop clocksource for example, it will hit the else leg and
inject the improper sleeptime which is the wrong logic.

So, we can solve this problem by only doing delta related code
when the persistent clock is existent. Actually the code only
makes sense for persistent clock cases.
Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427945681-29972-18-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

264bb3f7

time: Don't build timekeeping_inject_sleeptime64() if no one uses it · 7f298139

由 Xunlei Pang 提交于 4月 01, 2015

timekeeping_inject_sleeptime64() is only used by RTC
suspend/resume, so add build dependencies on the necessary RTC
related macros.
Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
[ Improve commit message clarity. ]
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427945681-29972-16-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

7f298139

time: Add y2038 safe read_persistent_clock64() · 2ee96632

由 Xunlei Pang 提交于 4月 01, 2015

As part of addressing in-kernel y2038 issues, this patch adds
read_persistent_clock64() and replaces all the call sites of
read_persistent_clock() with this function. This is a __weak
implementation, which simply calls the existing y2038 unsafe
read_persistent_clock().

This allows architecture specific implementations to be
converted independently, and eventually the y2038 unsafe
read_persistent_clock() can be removed after all its
architecture specific implementations have been converted to
read_persistent_clock64().
Suggested-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427945681-29972-3-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

2ee96632

time: Add y2038 safe read_boot_clock64() · 9a806ddb

由 Xunlei Pang 提交于 4月 01, 2015

As part of addressing in-kernel y2038 issues, this patch adds
read_boot_clock64() and replaces all the call sites of
read_boot_clock() with this function. This is a __weak
implementation, which simply calls the existing y2038 unsafe
read_boot_clock().

This allows architecture specific implementations to be
converted independently, and eventually the y2038 unsafe
read_boot_clock() can be removed after all its architecture
specific implementations have been converted to
read_boot_clock64().
Suggested-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427945681-29972-2-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

9a806ddb

01 4月, 2015 1 次提交

clockevents: Make suspend/resume calls explicit · 4ffee521

由 Thomas Gleixner 提交于 3月 25, 2015

clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call.

We are way better off to have explicit calls instead of this
monstrosity. Split out the suspend/resume() calls and invoke
them directly from the call sites.

No locking required at this point because these calls happen
with interrupts disabled and a single cpu online.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
[ Rebased on top of 4.0-rc5. ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/713674030.jVm1qaHuPf@vostro.rjw.lan
[ Rebased on top of latest timers/core. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

4ffee521

27 3月, 2015 4 次提交

time: Introduce tk_fast_raw · f09cb9a1

由 Peter Zijlstra 提交于 3月 19, 2015

Add the NMI safe CLOCK_MONOTONIC_RAW accessor..
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.562746929@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

f09cb9a1

time: Parametrize all tk_fast_mono users · 4498e746

由 Peter Zijlstra 提交于 3月 19, 2015

In preparation for more tk_fast instances, remove all hard-coded
tk_fast_mono references.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.484279927@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

4498e746

time: Add timerkeeper::tkr_raw · 4a4ad80d

由 Peter Zijlstra 提交于 3月 19, 2015

Introduce tkr_raw and make use of it.

  base_raw -> tkr_raw.base
  clock->{mult,shift} -> tkr_raw.{mult.shift}

Kill timekeeping_get_ns_raw() in favour of
timekeeping_get_ns(&tkr_raw), this removes all mono_raw special
casing.

Duplicate the updates to tkr_mono.cycle_last into tkr_raw.cycle_last,
both need the same value.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.422589590@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

4a4ad80d

time: Rename timekeeper::tkr to timekeeper::tkr_mono · 876e7881

由 Peter Zijlstra 提交于 3月 19, 2015

In preparation of adding another tkr field, rename this one to
tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base,
since the structure is not specific to CLOCK_MONOTONIC and the mono
name got added to the tk_read_base instance.

Lots of trivial churn.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

876e7881

13 3月, 2015 4 次提交

timekeeping: Add warnings when overflows or underflows are observed · 4ca22c26

由 John Stultz 提交于 3月 11, 2015

It was suggested that the underflow/overflow protection
should probably throw some sort of warning out, rather
than just silently fixing the issue.

So this patch adds some warnings here. The flag variables
used are not protected by locks, but since we can't print
from the reading functions, just being able to say we
saw an issue in the update interval is useful enough,
and can be slightly racy without real consequence.

The big complication is that we're only under a read
seqlock, so the data could shift under us during
our calculation to see if there was a problem. This
patch avoids this issue by nesting another seqlock
which allows us to snapshot the just required values
atomically. So we shouldn't see false positives.

I also added some basic rate-limiting here, since
on one build machine w/ skewed TSCs it was fairly
noisy at bootup.
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-8-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

4ca22c26

timekeeping: Try to catch clocksource delta underflows · 057b87e3

由 John Stultz 提交于 3月 11, 2015

In the case where there is a broken clocksource
where there are multiple actual clocks that
aren't perfectly aligned, we may see small "negative"
deltas when we subtract 'now' from 'cycle_last'.

The values are actually negative with respect to the
clocksource mask value, not necessarily negative
if cast to a s64, but we can check by checking the
delta to see if it is a small (relative to the mask)
negative value (again negative relative to the mask).

If so, we assume we jumped backwards somehow and
instead use zero for our delta.
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-7-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

057b87e3

timekeeping: Add checks to cap clocksource reads to the 'max_cycles' value · a558cd02

由 John Stultz 提交于 3月 11, 2015

When calculating the current delta since the last tick, we
currently have no hard protections to prevent a multiplication
overflow from occuring.

This patch introduces infrastructure to allow a cap that
limits the clocksource read delta value to the 'max_cycles' value,
which is where an overflow would occur.

Since this is in the hotpath, it adds the extra checking under
CONFIG_DEBUG_TIMEKEEPING=y.

There was some concern that capping time like this could cause
problems as we may stop expiring timers, which could go circular
if the timer that triggers time accumulation were mis-scheduled
too far in the future, which would cause time to stop.

However, since the mult overflow would result in a smaller time
value, we would effectively have the same problem there.
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-6-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

a558cd02

timekeeping: Add debugging checks to warn if we see delays · 3c17ad19

由 John Stultz 提交于 3月 11, 2015

Recently there's been requests for better sanity
checking in the time code, so that it's more clear
when something is going wrong, since timekeeping issues
could manifest in a large number of strange ways in
various subsystems.

Thus, this patch adds some extra infrastructure to
add a check to update_wall_time() to print two new
warnings:

 1) if we see the call delayed beyond the 'max_cycles'
    overflow point,

 2) or if we see the call delayed beyond the clocksource's
    'max_idle_ns' value, which is currently 50% of the
    overflow point.

This extra infrastructure is conditional on
a new CONFIG_DEBUG_TIMEKEEPING option, also
added in this patch - default off.

Tested this a bit by halting qemu for specified
lengths of time to trigger the warnings.
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-5-git-send-email-john.stultz@linaro.org
[ Improved the changelog and the messages a bit. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

3c17ad19

16 2月, 2015 2 次提交

PM / sleep: Make it possible to quiesce timers during suspend-to-idle · 124cf911

由 Rafael J. Wysocki 提交于 2月 13, 2015

The efficiency of suspend-to-idle depends on being able to keep CPUs
in the deepest available idle states for as much time as possible.
Ideally, they should only be brought out of idle by system wakeup
interrupts.

However, timer interrupts occurring periodically prevent that from
happening and it is not practical to chase all of the "misbehaving"
timers in a whack-a-mole fashion. A much more effective approach is
to suspend the local ticks for all CPUs and the entire timekeeping
along the lines of what is done during full suspend, which also
helps to keep suspend-to-idle and full suspend reasonably similar.

The idea is to suspend the local tick on each CPU executing
cpuidle_enter_freeze() and to make the last of them suspend the
entire timekeeping. That should prevent timer interrupts from
triggering until an IO interrupt wakes up one of the CPUs. It
needs to be done with interrupts disabled on all of the CPUs,
though, because otherwise the suspended clocksource might be
accessed by an interrupt handler which might lead to fatal
consequences.

Unfortunately, the existing ->enter callbacks provided by cpuidle
drivers generally cannot be used for implementing that, because some
of them re-enable interrupts temporarily and some idle entry methods
cause interrupts to be re-enabled automatically on exit. Also some
of these callbacks manipulate local clock event devices of the CPUs
which really shouldn't be done after suspending their ticks.

To overcome that difficulty, introduce a new cpuidle state callback,
->enter_freeze, that will be guaranteed (1) to keep interrupts
disabled all the time (and return with interrupts disabled) and (2)
not to touch the CPU timer devices. Modify cpuidle_enter_freeze() to
look for the deepest available idle state with ->enter_freeze present
and to make the CPU execute that callback with suspended tick (and the
last of the online CPUs to execute it with suspended timekeeping).
Suggested-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>

124cf911

timekeeping: Make it safe to use the fast timekeeper while suspended · 060407ae

由 Rafael J. Wysocki 提交于 2月 13, 2015

Theoretically, ktime_get_mono_fast_ns() may be executed after
timekeeping has been suspended (or before it is resumed) which
in turn may lead to undefined behavior, for example, when the
clocksource read from timekeeping_get_ns() called by it is
not accessible at that time.

Prevent that from happening by setting up a dummy readout base for
the fast timekeeper during timekeeping_suspend() such that it will
always return the same number of cycles.

After the last timekeeping_update() in timekeeping_suspend() the
clocksource is read and the result is stored as cycles_at_suspend.
The readout base from the current timekeeper is copied onto the
dummy and the ->read pointer of the dummy is set to a routine
unconditionally returning cycles_at_suspend.  Next, the dummy is
passed to update_fast_timekeeper().

Then, ktime_get_mono_fast_ns() will work until the subsequent
timekeeping_resume() and the proper readout base for the fast
timekeeper will be restored by the timekeeping_update() called
right after clearing timekeeping_suspended.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NJohn Stultz <john.stultz@linaro.org>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>

060407ae

14 2月, 2015 1 次提交

timekeeping: Pass readout base to update_fast_timekeeper() · affe3e85

由 Rafael J. Wysocki 提交于 2月 11, 2015

Modify update_fast_timekeeper() to take a struct tk_read_base
pointer as its argument (instead of a struct timekeeper pointer)
and update its kerneldoc comment to reflect that.

That will allow a struct tk_read_base that is not part of a
struct timekeeper to be passed to it in the next patch.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NJohn Stultz <john.stultz@linaro.org>

affe3e85

24 1月, 2015 1 次提交

time: Expose getboottime64 for in-kernel uses · d08c0cdd

由 John Stultz 提交于 12月 08, 2014

Adds a timespec64 based getboottime64() implementation
that can be used as we convert internal users of
getboottime away from using timespecs.

Cc: pang.xunlei <pang.xunlei@linaro.org>
Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

d08c0cdd

25 11月, 2014 1 次提交

time: Fix sign bug in NTP mult overflow warning · cb2aa634

由 John Stultz 提交于 11月 24, 2014

In commit 6067dc5a ("time: Avoid possible NTP adjustment
mult overflow") a new check was added to watch for adjustments
that could cause a mult overflow.

Unfortunately the check compares a signed with unsigned value
and ignored the case where the adjustment was negative, which
causes spurious warn-ons on some systems (and seems like it
would result in problematic time adjustments there as well, due
to the early return).

Thus this patch adds a check to make sure the adjustment is
positive before we check for an overflow, and resovles the issue
in my testing.
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Debugged-by: Npang.xunlei <pang.xunlei@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1416890145-30048-1-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

cb2aa634

22 11月, 2014 7 次提交

time: Fixup comments to reflect usage of timespec64 · 5322e4c2

由 John Stultz 提交于 11月 07, 2014

Fix up a few comments that weren't updated when the
functions were converted to use timespec64 structures.
Acked-by: NArnd Bergmann <arnd.bergmann@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

5322e4c2

time: Expose get_monotonic_coarse64() for in-kernel uses · 334334b5

由 John Stultz 提交于 11月 07, 2014

Adds a timespec64 based get_monotonic_coarse64() implementation
that can be used as we convert internal users of
get_monotonic_coarse away from using timespecs.
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

334334b5

time: Expose getrawmonotonic64 for in-kernel uses · cdba2ec5

由 John Stultz 提交于 11月 07, 2014

Adds a timespec64 based getrawmonotonic64() implementation
that can be used as we convert internal users of
getrawmonotonic away from using timespecs.
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

cdba2ec5

time: Provide y2038 safe timekeeping_inject_sleeptime() replacement · 04d90890

由 pang.xunlei 提交于 11月 18, 2014

As part of addressing "y2038 problem" for in-kernel uses, this
patch adds timekeeping_inject_sleeptime64() using timespec64.

After this patch, timekeeping_inject_sleeptime() is deprecated
and all its call sites will be fixed using the new interface,
after that it can be removed.

NOTE: timekeeping_inject_sleeptime() is safe actually, but we
want to eliminate timespec eventually, so comes this patch.
Signed-off-by: Npang.xunlei <pang.xunlei@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

04d90890

time: Provide y2038 safe do_settimeofday() replacement · 21f7eca5

由 pang.xunlei 提交于 11月 18, 2014

The kernel uses 32-bit signed value(time_t) for seconds elapsed
1970-01-01:00:00:00, thus it will overflow at 2038-01-19 03:14:08
on 32-bit systems. This is widely known as the y2038 problem.

As part of addressing "y2038 problem" for in-kernel uses, this patch
adds safe do_settimeofday64() using timespec64.

After this patch, do_settimeofday() is deprecated and all its call
sites will be fixed using do_settimeofday64(), after that it can be
removed.
Signed-off-by: Npang.xunlei <pang.xunlei@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

21f7eca5

time: Complete NTP adjustment threshold judging conditions · 659bc17b

由 pang.xunlei 提交于 10月 09, 2014

The clocksource mult-adjustment threshold is [mult-maxadj, mult+maxadj],
timekeeping_adjust() only deals with the upper threshold, but misses the
lower threshold.

This patch adds the lower threshold judging condition.
Signed-off-by: Npang.xunlei <pang.xunlei@linaro.org>
[jstultz: Minor fix for > 80 char line]
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

659bc17b

time: Avoid possible NTP adjustment mult overflow. · 6067dc5a

由 pang.xunlei 提交于 10月 08, 2014

Ideally, __clocksource_updatefreq_scale, selects the largest shift
value possible for a clocksource. This results in the mult memember of
struct clocksource being particularly large, although not so large
that NTP would adjust the clock to cause it to overflow.

That said, nothing actually prohibits an overflow from occuring, its
just that it "shouldn't" occur.

So while very unlikely, and so far never observed, the value of
(cs->mult+cs->maxadj) may have a chance to reach very near 0xFFFFFFFF,
so there is a possibility it may overflow when doing NTP positive
adjustment

See the following detail: When NTP slewes the clock, kernel goes
through update_wall_time()->...->timekeeping_apply_adjustment():
	tk->tkr.mult += mult_adj;

Since there is no guard against it, its possible tk->tkr.mult may
overflow during this operation.

This patch avoids any possible mult overflow by judging the overflow
case before adding mult_adj to mult, also adds the WARNING message
when capturing such case.
Signed-off-by: Npang.xunlei <pang.xunlei@linaro.org>
[jstultz: Reworded commit message]
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

6067dc5a

29 10月, 2014 2 次提交

timekeeping: Provide y2038 safe accessor to the seconds portion of CLOCK_REALTIME · dbe7aa62

由 Heena Sirwani 提交于 10月 29, 2014

ktime_get_real_seconds() is the replacement function for get_seconds()
returning the seconds portion of CLOCK_REALTIME in a time64_t. For
64bit the function is equivivalent to get_seconds(), but for 32bit it
protects the readout with the timekeeper sequence count. This is
required because 32-bit machines cannot access 64-bit tk->xtime_sec
variable atomically.

[tglx: Massaged changelog and added docbook comment ]
Signed-off-by: NHeena Sirwani <heenasirwani@gmail.com>
Reviewed-by: NArnd Bergman <arnd@arndb.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: opw-kernel@googlegroups.com
Link: http://lkml.kernel.org/r/7adcfaa8962b8ad58785d9a2456c3f77d93c0ffb.1414578445.git.heenasirwani@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

dbe7aa62

timekeeping: Provide fast accessor to the seconds part of CLOCK_MONOTONIC · 9e3680b1

由 Heena Sirwani 提交于 10月 29, 2014

This is the counterpart to get_seconds() based on CLOCK_MONOTONIC. The
use case for this interface are kernel internal coarse grained
timestamps which do neither require the nanoseconds fraction of
current time nor the CLOCK_REALTIME properties. Such timestamps can
currently only retrieved by calling ktime_get_ts64() and using the
tv_sec field of the returned timespec64. That's inefficient as it
involves the read of the clocksource, math operations and must be
protected by the timekeeper sequence counter.

To avoid the sequence counter protection we restrict the return value
to unsigned 32bit on 32bit machines. This covers ~136 years of uptime
and therefor an overflow is not expected to hit anytime soon.

To avoid math in the function we calculate the current seconds portion
of CLOCK_MONOTONIC when the timekeeper gets updated in
tk_update_ktime_data() similar to the CLOCK_REALTIME counterpart
xtime_sec.

[ tglx: Massaged changelog, simplified and commented the update
  	function, added docbook comment ]
Signed-off-by: NHeena Sirwani <heenasirwani@gmail.com>
Reviewed-by: NArnd Bergman <arnd@arndb.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: opw-kernel@googlegroups.com
Link: http://lkml.kernel.org/r/da0b63f4bdf3478909f92becb35861197da3a905.1414578445.git.heenasirwani@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

9e3680b1

06 9月, 2014 1 次提交

timekeeping: Update timekeeper before updating vsyscall and pvclock · 9bf2419f

由 Thomas Gleixner 提交于 9月 06, 2014

The update_walltime() code works on the shadow timekeeper to make the
seqcount protected region as short as possible. But that update to the
shadow timekeeper does not update all timekeeper fields because it's
sufficient to do that once before it becomes life. One of these fields
is tkr.base_mono. That stays stale in the shadow timekeeper unless an
operation happens which copies the real timekeeper to the shadow.

The update function is called after the update calls to vsyscall and
pvclock. While not correct, it did not cause any problems because none
of the invoked update functions used base_mono.

commit cbcf2dd3 (x86: kvm: Make kvm_get_time_and_clockread()
nanoseconds based) changed that in the kvm pvclock update function, so
the stale mono_base value got used and caused kvm-clock to malfunction.

Put the update where it belongs and fix the issue.
Reported-by: NChris J Arges <chris.j.arges@canonical.com>
Reported-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409050000570.3333@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

9bf2419f

15 8月, 2014 1 次提交

timekeeping: Another fix to the VSYSCALL_OLD update_vsyscall · 0680eb1f

由 John Stultz 提交于 8月 13, 2014

Benjamin Herrenschmidt pointed out that I further missed modifying
update_vsyscall after the wall_to_mono value was changed to a
timespec64.  This causes issues on powerpc32, which expects a 32bit
timespec.

This patch fixes the problem by properly converting from a timespec64 to
a timespec before passing the value on to the arch-specific vsyscall
logic.

[ Thomas is currently on vacation, but reviewed it and wanted me to send
  this fix on to you directly. ]

Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0680eb1f

24 7月, 2014 2 次提交

timekeeping: Use cached ntp_tick_length when accumulating error · 375f45b5

由 John Stultz 提交于 4月 23, 2014

By caching the ntp_tick_length() when we correct the frequency error,
and then using that cached value to accumulate error, we avoid large
initial errors when the tick length is changed.

This makes convergence happen much faster in the simulator, since the
initial error doesn't have to be slowly whittled away.

This initially seems like an accounting error, but Miroslav pointed out
that ntp_tick_length() can change mid-tick, so when we apply it in the
error accumulation, we are applying any recent change to the entire tick.

This approach chooses to apply changes in the ntp_tick_length() only to
the next tick, which allows us to calculate the freq correction before
using the new tick length, which avoids accummulating error.

Credit to Miroslav for pointing this out and providing the original patch
this functionality has been pulled out from, along with the rational.

Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reported-by: NMiroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

375f45b5

timekeeping: Rework frequency adjustments to work better w/ nohz · dc491596

由 John Stultz 提交于 12月 06, 2013

The existing timekeeping_adjust logic has always been complicated
to understand. Further, since it was developed prior to NOHZ becoming
common, its not surprising it performs poorly when NOHZ is enabled.

Since Miroslav pointed out the problematic nature of the existing code
in the NOHZ case, I've tried to refactor the code to perform better.

The problem with the previous approach was that it tried to adjust
for the total cumulative error using a scaled dampening factor. This
resulted in large errors to be corrected slowly, while small errors
were corrected quickly. With NOHZ the timekeeping code doesn't know
how far out the next tick will be, so this results in bad
over-correction to small errors, and insufficient correction to large
errors.

Inspired by Miroslav's patch, I've refactored the code to try to
address the correction in two steps.

1) Check the future freq error for the next tick, and if the frequency
error is large, try to make sure we correct it so it doesn't cause
much accumulated error.

2) Then make a small single unit adjustment to correct any cumulative
error that has collected over time.

This method performs fairly well in the simulator Miroslav created.

Major credit to Miroslav for pointing out the issue, providing the
original patch to resolve this, a simulator for testing, as well as
helping debug and resolve issues in my implementation so that it
performed closer to his original implementation.

Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reported-by: NMiroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

dc491596

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功