提交 · 69585dd69e663a40729492c7b52eb82477a2027a · openanolis / cloud-kernel

25 8月, 2009 1 次提交

timekeeping: Fix invalid getboottime() value · 36d47481

由 Hiroshi Shimamoto 提交于 8月 25, 2009

Don't use timespec_add_safe() with wall_to_monotonic, because
wall_to_monotonic has negative values which will cause overflow
in timespec_add_safe(). That makes btime in /proc/stat invalid.
Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Daniel Walker <dwalker@fifo99.com>
LKML-Reference: <4A937FDE.4050506@ct.jp.nec.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

36d47481

22 8月, 2009 1 次提交

time: Introduce CLOCK_REALTIME_COARSE · da15cfda

由 john stultz 提交于 8月 19, 2009

After talking with some application writers who want very fast, but not
fine-grained timestamps, I decided to try to implement new clock_ids
to clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE
which returns the time at the last tick. This is very fast as we don't
have to access any hardware (which can be very painful if you're using
something like the acpi_pm clocksource), and we can even use the vdso
clock_gettime() method to avoid the syscall. The only trade off is you
only get low-res tick grained time resolution.

This isn't a new idea, I know Ingo has a patch in the -rt tree that made
the vsyscall gettimeofday() return coarse grained time when the
vsyscall64 sysctrl was set to 2. However this affects all applications
on a system.

With this method, applications can choose the proper speed/granularity
trade-off for themselves.
Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: nikolag@ca.ibm.com
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: arjan@infradead.org
Cc: jonathan@jonmasters.org
LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

da15cfda

15 8月, 2009 11 次提交

timekeeping: Introduce read_boot_clock · 23970e38

由 Martin Schwidefsky 提交于 8月 14, 2009

Add the new function read_boot_clock to get the exact time the system
has been started. For architectures without support for exact boot
time a new weak function is added that returns 0.  Use the exact boot
time to initialize wall_to_monotonic, or xtime if the read_boot_clock
returned 0.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Daniel Walker <dwalker@fifo99.com>
LKML-Reference: <20090814134811.296703241@de.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

23970e38

timekeeping: Increase granularity of read_persistent_clock() · d4f587c6

由 Martin Schwidefsky 提交于 8月 14, 2009

The persistent clock of some architectures (e.g. s390) have a
better granularity than seconds. To reduce the delta between the
host clock and the guest clock in a virtualized system change the 
read_persistent_clock function to return a struct timespec.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Daniel Walker <dwalker@fifo99.com>
LKML-Reference: <20090814134811.013873340@de.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

d4f587c6

timekeeping: Update clocksource with stop_machine · 75c5158f

由 Martin Schwidefsky 提交于 8月 14, 2009

update_wall_time calls change_clocksource HZ times per second to check
if a new clock source is available. In close to 100% of all calls
there is no new clock. Replace the tick based check by an update done
with stop_machine.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Daniel Walker <dwalker@fifo99.com>
LKML-Reference: <20090814134810.711836357@de.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

75c5158f

timekeeping: Add timekeeper read_clock helper functions · 2ba2a305

由 Martin Schwidefsky 提交于 8月 14, 2009

Add timekeeper_read_clock_ntp and timekeeper_read_clock_raw and use
them for getnstimeofday, ktime_get, ktime_get_ts and getrawmonotonic.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Daniel Walker <dwalker@fifo99.com>
LKML-Reference: <20090814134810.435105711@de.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

2ba2a305

timekeeping: Move NTP adjusted clock multiplier to struct timekeeper · 0a544198

由 Martin Schwidefsky 提交于 8月 14, 2009

The clocksource structure has two multipliers, the unmodified multiplier
clock->mult_orig and the NTP corrected multiplier clock->mult. The NTP
multiplier is misplaced in the struct clocksource, this is private
information of the timekeeping code. Add the mult field to the struct
timekeeper to contain the NTP corrected value, keep the unmodifed
multiplier in clock->mult and remove clock->mult_orig.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Daniel Walker <dwalker@fifo99.com>
LKML-Reference: <20090814134810.149047645@de.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

0a544198

timekeeping: Add xtime_shift and ntp_error_shift to struct timekeeper · 23ce7211

由 Martin Schwidefsky 提交于 8月 14, 2009

The xtime_nsec value in the timekeeper structure is shifted by a few
bits to improve precision. This happens to be the same value as the
clock->shift. To improve readability add xtime_shift to the timekeeper
and use it instead of the clock->shift. Likewise add ntp_error_shift
and replace all (NTP_SCALE_SHIFT - clock->shift) expressions.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Daniel Walker <dwalker@fifo99.com>
LKML-Reference: <20090814134809.871899606@de.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

23ce7211

timekeeping: Introduce struct timekeeper · 155ec602

由 Martin Schwidefsky 提交于 8月 14, 2009

Add struct timekeeper to keep the internal values timekeeping.c needs
in regard to the currently selected clock source. This moves the
timekeeping intervals, xtime_nsec and the ntp error value from struct
clocksource to struct timekeeper. The raw_time is removed from the
clocksource as well. It gets treated like xtime as a global variable.
Eventually xtime raw_time should be moved to struct timekeeper.

[ tglx: minor cleanup ]
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Daniel Walker <dwalker@fifo99.com>
LKML-Reference: <20090814134809.613209842@de.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

155ec602

clocksource: Cleanup clocksource selection · f1b82746

由 Martin Schwidefsky 提交于 8月 14, 2009

If a non high-resolution clocksource is first set as override clock
and then registered it becomes active even if the system is in one-shot
mode. Move the override check from sysfs_override_clocksource to the
clocksource selection. That fixes the bug and simplifies the code. The
check in clocksource_register for double registration of the same
clocksource is removed without replacement.

To find the initial clocksource a new weak function in jiffies.c is
defined that returns the jiffies clocksource. The architecture code
can then override the weak function with a more suitable clocksource,
e.g. the TOD clock on s390.

[ tglx: Folded in a fix from John Stultz ]
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Daniel Walker <dwalker@fifo99.com>
LKML-Reference: <20090814134808.388024160@de.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

f1b82746

timekeeping: Move reset of cycle_last for tsc clocksource to tsc · 1be39679

由 Martin Schwidefsky 提交于 8月 14, 2009

change_clocksource resets the cycle_last value to zero then sets it to
a value read from the clocksource. The reset to zero is required only
for the TSC clocksource to make the read_tsc function work after a
resume. The reason is that the TSC read function uses cycle_last to
detect backwards going TSCs. In the resume case cycle_last contains
the TSC value from the last update before the suspend. On resume the
TSC starts counting from 0 again and would trip over the cycle_last
comparison.

This is subtle and surprising. Move the reset to a resume function in
the tsc code.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Daniel Walker <dwalker@fifo99.com>
LKML-Reference: <20090814134808.142191175@de.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

1be39679

timekeeping: Remove clocksource inline functions · a0f7d48b

由 Martin Schwidefsky 提交于 8月 14, 2009

The three inline functions clocksource_read, clocksource_enable and
clocksource_disable are simple wrappers of an indirect call plus the
copy from and to the mult_orig value. The functions are exclusively
used by the timekeeping code which has intimate knowledge of the
clocksource anyway. Therefore remove the inline functions. No
functional change.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Daniel Walker <dwalker@fifo99.com>
LKML-Reference: <20090814134807.903108946@de.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

a0f7d48b

timekeeping: Introduce timekeeping_leap_insert · 31089c13

由 John Stultz 提交于 8月 14, 2009

Move the adjustment of xtime, wall_to_monotonic and the update of the
vsyscall variables to the timekeeping code.
Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
LKML-Reference: <20090814134807.609730216@de.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

31089c13

07 7月, 2009 2 次提交

timekeeping: Move ktime_get() functions to timekeeping.c · a40f262c

由 Thomas Gleixner 提交于 7月 07, 2009

The ktime_get() functions for GENERIC_TIME=n are still located in
hrtimer.c. Move them to time/timekeeping.c where they belong.

LKML-Reference: <new-submission>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

a40f262c

timekeeping: optimized ktime_get[_ts] for GENERIC_TIME=y · 951ed4d3

由 Martin Schwidefsky 提交于 7月 07, 2009

The generic ktime_get function defined in kernel/hrtimer.c is suboptimial
for GENERIC_TIME=y:

 0)               |  ktime_get() {
 0)               |    ktime_get_ts() {
 0)               |      getnstimeofday() {
 0)               |        read_tod_clock() {
 0)   0.601 us    |        }
 0)   1.938 us    |      }
 0)               |      set_normalized_timespec() {
 0)   0.602 us    |      }
 0)   4.375 us    |    }
 0)   5.523 us    |  }

Overall there are two read_seqbegin/read_seqretry loops and a lot of
unnecessary struct timespec calculations. ktime_get returns a nano second
value which is the sum of xtime, wall_to_monotonic and the nano second
delta from the clock source.

ktime_get can be optimized for GENERIC_TIME=y. The new version only calls
clocksource_read:

 0)               |  ktime_get() {
 0)               |    read_tod_clock() {
 0)   0.610 us    |    }
 0)   1.977 us    |  }

It uses a single read_seqbegin/readseqretry loop and just adds everthing
to a nano second value.

ktime_get_ts is optimized in a similar fashion.

[ tglx: added WARN_ON(timekeeping_suspended) as in getnstimeofday() ]
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Njohn stultz <johnstul@us.ibm.com>
LKML-Reference: <20090707112728.3005244d@skybase>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

951ed4d3

15 5月, 2009 1 次提交

sched, timers: move calc_load() to scheduler · dce48a84

由 Thomas Gleixner 提交于 4月 11, 2009

Dimitri Sivanich noticed that xtime_lock is held write locked across
calc_load() which iterates over all online CPUs. That can cause long
latencies for xtime_lock readers on large SMP systems. 

The load average calculation is an rough estimate anyway so there is
no real need to protect the readers vs. the update. It's not a problem
when the avenrun array is updated while a reader copies the values.

Instead of iterating over all online CPUs let the scheduler_tick code
update the number of active tasks shortly before the avenrun update
happens. The avenrun update itself is handled by the CPU which calls
do_timer().

[ Impact: reduce xtime_lock write locked section ]
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra <peterz@infradead.org>

dce48a84

02 5月, 2009 1 次提交

timekeeping: create arch_gettimeoffset infrastructure · 7d27558c

由 john stultz 提交于 5月 01, 2009

Some arches don't supply their own clocksource. This is mainly the
case in architectures that get their inter-tick times by reading the
counter on their interval timer.  Since these timers wrap every tick,
they're not really useful as clocksources.  Wrapping them to act like
one is possible but not very efficient. So we provide a callout these
arches can implement for use with the jiffies clocksource to provide
finer then tick granular time.

[ Impact: ease the migration to generic time keeping ]
Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

7d27558c

22 4月, 2009 1 次提交

clocksource: add enable() and disable() callbacks · 4614e6ad

由 Magnus Damm 提交于 4月 21, 2009

Add enable() and disable() callbacks for clocksources.

This allows us to put unused clocksources in power save mode.  The
functions clocksource_enable() and clocksource_disable() wrap the
callbacks and are inserted in the timekeeping code to enable before use
and disable after switching to a new clocksource.
Signed-off-by: NMagnus Damm <damm@igel.co.jp>
Acked-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4614e6ad

31 12月, 2008 1 次提交

sched_clock: prevent scd->clock from moving backwards, take #2 · 1c5745aa

由 Thomas Gleixner 提交于 12月 22, 2008

Redo:

  5b7dba4f: sched_clock: prevent scd->clock from moving backwards

which had to be reverted due to s2ram hangs:

  ca7e716c: Revert "sched_clock: prevent scd->clock from moving backwards"

... this time with resume restoring GTOD later in the sequence
taken into account as well.

The "timekeeping_suspended" flag is not very nice but we cannot call into
GTOD before it has been properly resumed and the scheduler will run very
early in the resume sequence.

Cc: <stable@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1c5745aa

04 12月, 2008 1 次提交

time: catch xtime_nsec underflows and fix them · 6c9bacb4

由 john stultz 提交于 12月 01, 2008

Impact: fix time warp bug

Alex Shi, along with Yanmin Zhang have been noticing occasional time
inconsistencies recently. Through their great diagnosis, they found that
the xtime_nsec value used in update_wall_time was occasionally going
negative. After looking through the code for awhile, I realized we have
the possibility for an underflow when three conditions are met in
update_wall_time():

  1) We have accumulated a second's worth of nanoseconds, so we
     incremented xtime.tv_sec and appropriately decrement xtime_nsec.
     (This doesn't cause xtime_nsec to go negative, but it can cause it
      to be small).

  2) The remaining offset value is large, but just slightly less then
     cycle_interval.

  3) clocksource_adjust() is speeding up the clock, causing a
     corrective amount (compensating for the increase in the multiplier
     being multiplied against the unaccumulated offset value) to be
     subtracted from xtime_nsec.

This can cause xtime_nsec to underflow.

Unfortunately, since we notify the NTP subsystem via second_overflow()
whenever we accumulate a full second, and this effects the error
accumulation that has already occured, we cannot simply revert the
accumulated second from xtime nor move the second accumulation to after
the clocksource_adjust call without a change in behavior.

This leaves us with (at least) two options:

1) Simply return from clocksource_adjust() without making a change if we
   notice the adjustment would cause xtime_nsec to go negative.

This would work, but I'm concerned that if a large adjustment was needed
(due to the error being large), it may be possible to get stuck with an
ever increasing error that becomes too large to correct (since it may
always force xtime_nsec negative). This may just be paranoia on my part.

2) Catch xtime_nsec if it is negative, then add back the amount its
   negative to both xtime_nsec and the error.

This second method is consistent with how we've handled earlier rounding
issues, and also has the benefit that the error being added is always in
the oposite direction also always equal or smaller then the correction
being applied. So the risk of a corner case where things get out of
control is lessened.

This patch fixes bug 11970, as tested by Yanmin Zhang
http://bugzilla.kernel.org/show_bug.cgi?id=11970

Reported-by: alex.shi@intel.com
Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
Acked-by: N"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Tested-by: N"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6c9bacb4

24 9月, 2008 1 次提交

timekeeping: fix rounding problem during clock update · 5cd1c9c5

由 Roman Zippel 提交于 9月 22, 2008

Due to a rounding problem during a clock update it's possible for readers
to observe the clock jumping back by 1nsec.  The following simplified
example demonstrates the problem:

cycle	xtime
0	0
1000	999999.6
2000	1999999.2
3000	2999998.8
...

1500 =	1499999.4
=	0.0 + 1499999.4
=	999999.6 + 499999.8

When reading the clock only the full nanosecond part is used, while
timekeeping internally keeps nanosecond fractions.  If the clock is now
updated at cycle 1500 here, a nanosecond is missing due to the truncation.

The simple fix is to round up the xtime value during the update, this also
changes the distance to the reference time, but the adjustment will
automatically take care that it stays under control.
Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

5cd1c9c5

21 8月, 2008 2 次提交

clocksource: introduce CLOCK_MONOTONIC_RAW · 2d42244a

由 John Stultz 提交于 8月 20, 2008

In talking with Josip Loncaric, and his work on clock synchronization (see
btime.sf.net), he mentioned that for really close synchronization, it is
useful to have access to "hardware time", that is a notion of time that is
not in any way adjusted by the clock slewing done to keep close time sync.

Part of the issue is if we are using the kernel's ntp adjusted
representation of time in order to measure how we should correct time, we
can run into what Paul McKenney aptly described as "Painting a road using
the lines we're painting as the guide".

I had been thinking of a similar problem, and was trying to come up with a
way to give users access to a purely hardware based time representation
that avoided users having to know the underlying frequency and mask values
needed to deal with the wide variety of possible underlying hardware
counters.

My solution is to introduce CLOCK_MONOTONIC_RAW.  This exposes a
nanosecond based time value, that increments starting at bootup and has no
frequency adjustments made to it what so ever.

The time is accessed from userspace via the posix_clock_gettime() syscall,
passing CLOCK_MONOTONIC_RAW as the clock_id.
Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2d42244a

clocksource: introduce clocksource_forward_now() · 9a055117

由 Roman Zippel 提交于 8月 20, 2008

To keep the raw monotonic patch simple first introduce
clocksource_forward_now(), which takes care of the offset since the last
update_wall_time() call and adds it to the clock, so there is no need
anymore to deal with it explicitly at various places, which need to make
significant changes to the clock.

This is also gets rid of the timekeeping_suspend_nsecs, instead of
waiting until resume, the value is accumulated during suspend. In the end
there is only a single user of __get_nsec_offset() left, so I integrated
it back to getnstimeofday().
Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9a055117

01 5月, 2008 3 次提交

ntp: handle leap second via timer · 7dffa3c6

由 Roman Zippel 提交于 5月 01, 2008

Remove the leap second handling from second_overflow(), which doesn't have to
check for it every second anymore.  With CONFIG_NO_HZ this also makes sure the
leap second is handled close to the full second.  Additionally this makes it
possible to abort a leap second properly by resetting the STA_INS/STA_DEL
status bits.
Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7dffa3c6

ntp: remove current_tick_length() · 8383c423

由 Roman Zippel 提交于 5月 01, 2008

current_tick_length used to do a little more, but now it just returns
tick_length, which we can also access directly at the few places, where it's
needed.
Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8383c423

ntp: rename TICK_LENGTH_SHIFT to NTP_SCALE_SHIFT · 7fc5c784

由 Roman Zippel 提交于 5月 01, 2008

As TICK_LENGTH_SHIFT is used for more than just the tick length, the name
isn't quite approriate anymore, so this renames it to NTP_SCALE_SHIFT.
Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7fc5c784

20 4月, 2008 1 次提交

x86: tsc prevent time going backwards · d8bb6f4c

由 Thomas Gleixner 提交于 4月 01, 2008

We already catch most of the TSC problems by sanity checks, but there
is a subtle bug which has been in the code forever. This can cause
time jumps in the range of hours.

This was reported in:
     http://lkml.org/lkml/2007/8/23/96
and
     http://lkml.org/lkml/2008/3/31/23

I was able to reproduce the problem with a gettimeofday loop test on a
dual core and a quad core machine which both have sychronized
TSCs. The TSCs seems not to be perfectly in sync though, but the
kernel is not able to detect the slight delta in the sync check. Still
there exists an extremly small window where this delta can be observed
with a real big time jump. So far I was only able to reproduce this
with the vsyscall gettimeofday implementation, but in theory this
might be observable with the syscall based version as well.

CPU 0 updates the clock source variables under xtime/vyscall lock and
CPU1, where the TSC is slighty behind CPU0, is reading the time right
after the seqlock was unlocked.

The clocksource reference data was updated with the TSC from CPU0 and
the value which is read from TSC on CPU1 is less than the reference
data. This results in a huge delta value due to the unsigned
subtraction of the TSC value and the reference value. This algorithm
can not be changed due to the support of wrapping clock sources like
pm timer.

The huge delta is converted to nanoseconds and added to xtime, which
is then observable by the caller. The next gettimeofday call on CPU1
will show the correct time again as now the TSC has advanced above the
reference value.

To prevent this TSC specific wreckage we need to compare the TSC value
against the reference value and return the latter when it is larger
than the actual TSC value.

I pondered to mark the TSC unstable when the readout is smaller than
the reference value, but this would render an otherwise good and fast
clocksource unusable without a real good reason.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d8bb6f4c

25 3月, 2008 1 次提交

Don't 'printk()' while holding xtime lock for writing · 92896bd9

由 Linus Torvalds 提交于 3月 24, 2008

The printk() can deadlock because it can wake up klogd(), and
task enqueueing will try to read the time in order to set a hrtimer.
Reported-by: NMarcin Slusarz <marcin.slusarz@gmail.com>
Debugged-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

92896bd9

09 3月, 2008 1 次提交

time: remove obsolete CLOCK_TICK_ADJUST · 10a398d0

由 Roman Zippel 提交于 3月 04, 2008

The first version of the ntp_interval/tick_length inconsistent usage patch was
recently merged as bbe4d18a

http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bbe4d18ac2e058c56adb0cd71f49d9ed3216a405

While the fix did greatly improve the situation, it was correctly pointed out
by Roman that it does have a small bug: If the users change clocksources after
the system has been running and NTP has made corrections, the correctoins made
against the old clocksource will be applied against the new clocksource,
causing error.

The second attempt, which corrects the issue in the NTP_INTERVAL_LENGTH
definition has also made it up-stream as commit
e13a2e61

http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e13a2e61dd5152f5499d2003470acf9c838eab84

Roman has correctly pointed out that CLOCK_TICK_ADJUST is calculated
based on the PIT's frequency, and isn't really relevant to non-PIT
driven clocksources (that is, clocksources other then jiffies and pit).

This patch reverts both of those changes, and simply removes
CLOCK_TICK_ADJUST.

This does remove the granularity error correction for users of PIT and Jiffies
clocksource users, but the granularity error but for the majority of users, it
should be within the 500ppm range NTP can accommodate for.

For systems that have granularity errors greater then 500ppm, the
"ntp_tick_adj=" boot option can be used to compensate.

[johnstul@us.ibm.com: provided changelog]
[mattilinnanvuori@yahoo.com: maek ntp_tick_adj static]
Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
Acked-by: Njohn stultz <johnstul@us.ibm.com>
Signed-off-by: NMatti Linnanvuori <mattilinnanvuori@yahoo.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: mingo@elte.hu
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

10a398d0

09 2月, 2008 2 次提交

time: fix typo in comments · 3eb05676

由 Li Zefan 提交于 2月 08, 2008

Fix typo in comments.

BTW: I have to fix coding style in arch/ia64/kernel/time.c also, otherwise
checkpatch.pl will be complaining.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3eb05676

timekeeping: rename timekeeping_is_continuous to timekeeping_valid_for_hres · cf4fc6cb

由 Li Zefan 提交于 2月 08, 2008

Function timekeeping_is_continuous() no longer checks flag
CLOCK_IS_CONTINUOUS, and it checks CLOCK_SOURCE_VALID_FOR_HRES now.  So rename
the function accordingly.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cf4fc6cb

02 2月, 2008 1 次提交

timekeeping: update xtime_cache when time(zone) changes · 1001d0a9

由 Thomas Gleixner 提交于 2月 01, 2008

xtime_cache needs to be updated whenever xtime and or wall_to_monotic
are changed. Otherwise users of xtime_cache might see a stale (and in
the case of timezone changes utterly wrong) value until the next
update happens.

Fixup the obvious places, which miss this update.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NJohn Stultz <johnstul@us.ibm.com>
Tested-by: NDhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1001d0a9

30 1月, 2008 2 次提交

NTP: correct inconsistent ntp interval/tick_length usage · bbe4d18a

由 john stultz 提交于 1月 30, 2008

I recently noticed on one of my boxes that when synched with an NTP
server, the drift value reported for the system was ~283ppm. While in
some cases, clock hardware can be that bad, it struck me as unusual as
the system was using the acpi_pm clocksource, which is one of the more
trustworthy and accurate clocksources on x86 hardware.

I brought up another system and let it sync to the same NTP server, and
I noticed a similar 280some ppm drift.

In looking at the code, I found that the acpi_pm's constant frequency
was being computed correctly at boot-up, however once the system was up,
even without the ntp daemon running, the clocksource's frequency was
being modified by the clocksource_adjust() function.

Digging deeper, I realized that in the code that keeps track of how much
the clocksource is skewing from the ntp desired time, we were using
different lengths to establish how long an time interval was.

The clocksource was being setup with the following interval:
	NTP_INTERVAL_LENGTH = NSEC_PER_SEC/NTP_INTERVAL_FREQ

While the ntp code was using the tick_length_base value:
	tick_length_base ~= (tick_usec * NSEC_PER_USEC * USER_HZ)
					/NTP_INTERVAL_FREQ

The subtle difference is:
	(tick_usec * NSEC_PER_USEC * USER_HZ) != NSEC_PER_SEC

This difference in calculation was causing the clocksource correction
code to apply a correction factor to the clocksource so the two
intervals were the same, however this results in the actual frequency of
the clocksource to be made incorrect. I believe this difference would
affect all clocksources, although to differing degrees depending on the
clocksource resolution.

The issue was introduced when my HZ free ntp patch landed in 2.6.21-rc1,
so my apologies for the mistake, and for not noticing it until now.

The following patch, corrects the clocksource's initialization code so
it uses the same interval length as the code in ntp.c. After applying
this patch, the drift value for the same system went from ~283ppm to
only 2.635ppm.

I believe this patch to be good, however it does affect all arches and
I've only tested on x86, so some caution is advised. I do think it would
be a likely candidate for a stable 2.6.24.x release.

Any thoughts or feedback would be appreciated.
Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

bbe4d18a

time: fold __get_realtime_clock_ts() into getnstimeofday() · efd9ac86

由 Geert Uytterhoeven 提交于 1月 30, 2008

  - getnstimeofday() was just a wrapper around __get_realtime_clock_ts()
  - Replace calls to __get_realtime_clock_ts() by calls to getnstimeofday()
  - Fix bogus reference to get_realtime_clock_ts(), which never existed
Signed-off-by: NGeert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

efd9ac86

25 1月, 2008 1 次提交

Driver core: change sysdev classes to use dynamic kobject names · af5ca3f4

由 Kay Sievers 提交于 12月 20, 2007

All kobjects require a dynamically allocated name now. We no longer
need to keep track if the name is statically assigned, we can just
unconditionally free() all kobject names on cleanup.
Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

af5ca3f4

17 10月, 2007 2 次提交

kernel/time/timekeeping.c: cleanups · ba2a631b

由 Adrian Bunk 提交于 10月 16, 2007

- remove the no longer required __attribute__((weak)) of xtime_lock
- remove the following no longer used EXPORT_SYMBOL's:
  - xtime
  - xtime_lock
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ba2a631b

time: introduce xtime_seconds · f20bf612

由 Ingo Molnar 提交于 10月 16, 2007

improve performance of sys_time(). sys_time() returns time in seconds,
but it does so by calling do_gettimeofday() and then returning the
tv_sec portion of the GTOD time. But the data structure "xtime", which
is updated by every timer/scheduler tick, already offers HZ granularity
time.

the patch improves the sysbench oltp macrobenchmark by 4-5% on an AMD
dual-core system:

v2.6.23:

#threads

   1:     transactions:                        4073   (407.23 per sec.)
   2:     transactions:                        8530   (852.81 per sec.)
   3:     transactions:                        8321   (831.88 per sec.)
   4:     transactions:                        8407   (840.58 per sec.)
   5:     transactions:                        8070   (806.74 per sec.)

v2.6.23 + sys_time-speedup.patch:

   1:     transactions:                        4281   (428.09 per sec.)
   2:     transactions:                        8910   (890.85 per sec.)
   3:     transactions:                        8659   (865.79 per sec.)
   4:     transactions:                        8676   (867.34 per sec.)
   5:     transactions:                        8532   (852.91 per sec.)

and by 4-5% on an Intel dual-core system too:

2.6.23:

  1:     transactions:                        4560   (455.94 per sec.)
  2:     transactions:                        10094  (1009.30 per sec.)
  3:     transactions:                        9755   (975.36 per sec.)
  4:     transactions:                        9859   (985.78 per sec.)
  5:     transactions:                        9701   (969.72 per sec.)

2.6.23 + sys_time-speedup.patch:

  1:     transactions:                        4779   (477.84 per sec.)
  2:     transactions:                        10103  (1010.14 per sec.)
  3:     transactions:                        10141  (1013.93 per sec.)
  4:     transactions:                        10371  (1036.89 per sec.)
  5:     transactions:                        10178  (1017.50 per sec.)

(the more CPUs the system has, the more speedup this patch gives for
this particular workload.)
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f20bf612

16 9月, 2007 2 次提交

timekeeping: Prevent time going backwards on resume · 6a669ee8

由 Thomas Gleixner 提交于 9月 16, 2007

Timekeeping resume adjusts xtime by adding the slept time in seconds and
resets the reference value of the clock source (clock->cycle_last).
clock->cycle last is used to calculate the delta between the last xtime
update and the readout of the clock source in __get_nsec_offset(). xtime
plus the offset is the current time. The resume code ignores the delta
which had already elapsed between the last xtime update and the actual
time of suspend. If the suspend time is short, then we can see time
going backwards on resume.

Suspend:
offs_s = clock->read() - clock->cycle_last;
now = xtime + offs_s;
timekeeping_suspend_time = read_rtc();

Resume:
sleep_time = read_rtc() - timekeeping_suspend_time;
xtime.tv_sec += sleep_time;
clock->cycle_last = clock->read();
offs_r = clock->read() - clock->cycle_last;
now = xtime + offs_r;

if sleep_time_seconds == 0 and offs_r < offs_s, then time goes
backwards.

Fix this by storing the offset from the last xtime update and add it to
xtime during resume, when we reset clock->cycle_last:

sleep_time = read_rtc() - timekeeping_suspend_time;
xtime.tv_sec += sleep_time;
xtime += offs_s;	/* Fixup xtime offset at suspend time */
clock->cycle_last = clock->read();
offs_r = clock->read() - clock->cycle_last;
now = xtime + offs_r;

Thanks to Marcelo for tracking this down on the OLPC and providing the
necessary details to analyze the root cause.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Tosatti <marcelo@kvack.org>

6a669ee8

timekeeping: access rtc outside of xtime lock · 3be90950

由 Thomas Gleixner 提交于 9月 16, 2007

Lockdep complains about the access of rtc in timekeeping_suspend
inside the interrupt disabled region of the write locked xtime lock.
Move the access outside.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <johnstul@us.ibm.com>

3be90950

26 7月, 2007 1 次提交

Cache xtime every call to update_wall_time · 17c38b74

由 john stultz 提交于 7月 24, 2007

This avoids xtime lag seen with dynticks, because while 'xtime' itself
is still not updated often, we keep a 'xtime_cache' variable around that
contains the approximate real-time that _is_ updated each time we do a
'update_wall_time()', and is thus never off by more than one tick.

IOW, this restores the original semantics for 'xtime' users, as long as
you use the proper abstraction functions (ie 'current_kernel_time()' or
'get_seconds()' depending on whether you want a timespec or just the
seconds field).

[ Updated Patch.  As penance for my sins I've also yanked another #ifdef
  that was added to avoid the xtime lag w/ hrtimers.  ]
Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

17c38b74

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功