1. 14 Nov 2009, 4 commits
    • nohz: Prevent clocksource wrapping during idle · 98962465
      Committed by Jon Hunter
      The dynamic tick allows the kernel to sleep for periods longer than a
      single tick, but it currently does not limit the sleep time. In the
      worst case the kernel could sleep longer than the wrap-around time of
      the timekeeping clock source, which would result in losing track of
      time.

      Prevent this by limiting the sleep time to the safe maximum of the
      current timekeeping clock source. The value is calculated when the
      clock source is registered.
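
      The limit follows from the clocksource's counter width (mask) and its
      cycle-to-nanosecond conversion factors. The helper below is a hedged
      userspace sketch of that calculation, not the kernel's exact code; the
      function name and the 1/32 safety margin are illustrative.

      #include <stdint.h>
      #include <stdio.h>

      /* Hedged sketch: derive a safe maximum idle sleep time (in ns) from
       * a clocksource's counter mask and its mult/shift factors. */
      static uint64_t max_idle_ns(uint64_t mask, uint32_t mult, uint32_t shift)
      {
              /* Largest cycle count whose product with mult fits in 64 bits. */
              uint64_t max_cycles = UINT64_MAX / mult;

              if (max_cycles > mask)  /* also bounded by the counter width */
                      max_cycles = mask;

              uint64_t max_nsecs = (max_cycles * mult) >> shift;

              /* Keep a margin so we wake up well before the wrap. */
              return max_nsecs - (max_nsecs >> 5);
      }

      int main(void)
      {
              /* Example: 32-bit counter at 1 GHz (mult/shift give 1 ns/cycle). */
              printf("%llu ns\n",
                     (unsigned long long)max_idle_ns(0xffffffffULL, 1 << 21, 21));
              return 0;
      }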
      
      [ tglx: simplified the code a bit and massaged the commit msg ]
      Signed-off-by: Jon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • nohz: Type cast printk argument · 529eaccd
      Committed by Thomas Gleixner
      On some archs local_softirq_pending() has a data type of unsigned long;
      on others it is unsigned int. Cast it to (unsigned int) in the
      printk to avoid the compiler warning.
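
      The fix itself is one cast; a hedged sketch of its shape (the exact
      message text is assumed, not quoted from the patch):

      printk(KERN_ERR "NOHZ: local_softirq_pending %02x\n",
             (unsigned int) local_softirq_pending());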
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
    • clocksource: Provide a generic mult/shift factor calculation · 7d2f944a
      Committed by Thomas Gleixner
      MIPS has two functions to calculate the mult/shift factors for clock
      sources and clock events at run time. ARM needs such functions as
      well.

      Implement a function which calculates the mult/shift factors from the
      frequencies which are converted from and to. The function also has a
      parameter to specify the minimum conversion range in seconds. This
      range is guaranteed not to produce a 64-bit overflow when a value is
      multiplied with the calculated mult factor. The larger the conversion
      range, the lower the conversion accuracy.
      
      Provide two inline wrappers which handle clock events and clock
      sources. For clock events the "from" frequency is nanoseconds per
      second, which corresponds to 1GHz, and "to" is the device frequency.
      For clock sources "from" is the device frequency and "to" is
      nanoseconds per second.
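
      The algorithm can be exercised as a standalone userspace model. The
      sketch below follows the description above (find the largest shift
      whose resulting mult neither overflows u32 nor overflows 64 bits
      within maxsec seconds); names and the rounding step are assumptions,
      not quotes from the patch.

      #include <stdint.h>
      #include <stdio.h>

      static void calc_mult_shift(uint32_t *mult, uint32_t *shift,
                                  uint32_t from, uint32_t to, uint32_t maxsec)
      {
              uint64_t tmp;
              uint32_t sft, sftacc = 32;

              /* Shrink the allowed mult width so that maxsec seconds worth
               * of 'from' cycles times mult cannot overflow 64 bits. */
              tmp = ((uint64_t)maxsec * from) >> 32;
              while (tmp) {
                      tmp >>= 1;
                      sftacc--;
              }

              /* Pick the largest shift whose mult still fits. */
              for (sft = 32; sft > 0; sft--) {
                      tmp = (uint64_t)to << sft;
                      tmp += from / 2;        /* round to nearest */
                      tmp /= from;
                      if ((tmp >> sftacc) == 0)
                              break;
              }
              *mult = (uint32_t)tmp;
              *shift = sft;
      }

      int main(void)
      {
              uint32_t mult, shift;

              /* Clock source case: 13.5 MHz device clock -> nanoseconds. */
              calc_mult_shift(&mult, &shift, 13500000, 1000000000, 600);
              printf("mult=%u shift=%u\n", mult, shift);
              return 0;
      }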
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Mikael Pettersson <mikpe@it.uu.se>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: Linus Walleij <linus.walleij@stericsson.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20091111134229.766673305@linutronix.de>
    • clockevents: Use u32 for mult and shift factors · 23af368e
      Committed by Thomas Gleixner
      The mult and shift factors of clock events differ in their data type
      from those of clock sources for no reason. u32 is sufficient for
      both: shift is always <= 32, and mult is limited to 2^32-1 to avoid
      64-bit multiplication overflows in the conversion.
      
      Preparatory patch for a generic mult/shift factor calculation
      function.
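
      The constraint can be stated directly in code; a hedged illustration
      of the conversion both subsystems perform (names illustrative):

      #include <stdint.h>

      /* With mult <= 2^32-1 and a 32-bit input value, the product fits in
       * 64 bits, so ns->cycles (clock events) and cycles->ns (clock
       * sources) reduce to one 64-bit multiply and a shift. */
      static inline uint64_t convert(uint32_t val, uint32_t mult, uint32_t shift)
      {
              return ((uint64_t)val * mult) >> shift;
      }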
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Mikael Pettersson <mikpe@it.uu.se>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: Linus Walleij <linus.walleij@stericsson.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20091111134229.725664788@linutronix.de>
  2. 05 Nov 2009, 2 commits
    • nohz: Introduce arch_needs_cpu · 3c5d92a0
      Committed by Martin Schwidefsky
      Allow the architecture to request a normal jiffy tick when the system
      goes idle and tick_nohz_stop_sched_tick is called. On s390 the hook is
      used to prevent the system from going fully idle if there has been an
      interrupt other than a clock comparator interrupt since the last wakeup.
      
      On s390 the HiperSockets response time for 1 connection ping-pong goes
      down from 42 to 34 microseconds. The CPU cost decreases by 27%.
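
      A hedged sketch of the hook's shape: a no-op default that an
      architecture can override, consulted before the tick is stopped (the
      call site in tick_nohz_stop_sched_tick is simplified here):

      /* Default: no architecture veto; arch code may define its own. */
      #ifndef arch_needs_cpu
      #define arch_needs_cpu(cpu) (0)
      #endif

      /* In tick_nohz_stop_sched_tick(), simplified: */
      if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu) || arch_needs_cpu(cpu))
              delta_jiffies = 1;      /* keep a normal jiffy tick */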
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      LKML-Reference: <20090929122533.402715150@de.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • nohz: Reuse ktime in sub-functions of tick_check_idle. · eed3b9cf
      Committed by Martin Schwidefsky
      On a system with NOHZ=y tick_check_idle calls tick_nohz_stop_idle and
      tick_nohz_update_jiffies. Given the right conditions (ts->idle_active
      and/or ts->tick_stopped) both functions get a time stamp with ktime_get.
      The same time stamp can be reused if both functions require one.
      
      On s390 this change has the additional benefit that gcc inlines the
      tick_nohz_stop_idle function into tick_check_idle. The number of
      instructions to execute tick_check_idle drops from 225 to 144
      (without the ktime_get optimization it is 367 vs 215 instructions).
      
      before:
      
       0)               |  tick_check_idle() {
       0)               |    tick_nohz_stop_idle() {
       0)               |      ktime_get() {
       0)               |        read_tod_clock() {
       0)   0.601 us    |        }
       0)   1.765 us    |      }
       0)   3.047 us    |    }
       0)               |    ktime_get() {
       0)               |      read_tod_clock() {
       0)   0.570 us    |      }
       0)   1.727 us    |    }
       0)               |    tick_do_update_jiffies64() {
       0)   0.609 us    |    }
       0)   8.055 us    |  }
      
      after:
      
       0)               |  tick_check_idle() {
       0)               |    ktime_get() {
       0)               |      read_tod_clock() {
       0)   0.617 us    |      }
       0)   1.773 us    |    }
       0)               |    tick_do_update_jiffies64() {
       0)   0.593 us    |    }
       0)   4.477 us    |  }
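
      A hedged sketch of the refactoring's shape: the time stamp is taken
      once in tick_check_idle and handed to the sub-functions instead of
      each calling ktime_get itself (signatures and surrounding logic
      simplified):

      void tick_check_idle(int cpu)
      {
              ktime_t now = ktime_get();      /* single clock read */

              tick_nohz_stop_idle(cpu, now);  /* reuses 'now' ... */
              tick_nohz_update_jiffies(now);  /* ... instead of ktime_get() */
      }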
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: john stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090929122533.206589318@de.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  3. 05 Oct 2009, 2 commits
    • time: Remove xtime_cache · 7bc7d637
      Committed by john stultz
      With the prior logarithmic time accumulation patch, xtime will now
      always be within one "tick" of the current time, instead of
      possibly half a second off.
      
      This removes the need for the xtime_cache value, which always
      stored the time at the last interrupt, so this patch cleans that up,
      removing the xtime_cache related code.
      
      This is a bit simpler, but still could use some wider testing.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: John Kacur <jkacur@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <1254525855.7741.95.camel@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • time: Implement logarithmic time accumulation · a092ff0f
      Committed by john stultz
      Accumulating one tick at a time works well unless we're using NOHZ.
      Then it can be an issue, since we may have to run through the loop
      a few thousand times, which can increase latency caused by the
      timer interrupt.
      
      The current solution was to accumulate in half-second intervals
      with NOHZ. This kept the number of loops down; however, it slightly
      changed how we make NTP adjustments. While not an issue for NTPd
      users, as NTPd makes adjustments over a longer period of time, other
      adjtimex() users have noticed the half-second granularity with which
      we can apply frequency changes to the clock.
      
      For instance, if an application tries to apply a 100ppm frequency
      correction for 20ms to correct a 2us offset, with NOHZ it either
      gets no correction, or a 50us correction.
      
      Now, there will always be some granularity error when applying
      frequency corrections. However, users sensitive to this error
      have seen a 50-500x increase with NOHZ compared to running without
      NOHZ.
      
      So I figured I'd try an approach other than simply increasing
      the interval. My approach is to consume the time interval
      logarithmically. This reduces the number of times through the loop,
      keeping latency down, while still preserving the original
      granularity error for adjtimex() changes.
      
      Further, this change allows us to remove the xtime_cache code
      (patch to follow), as xtime is always within one tick of the
      current time, instead of the half-second updates it saw before.
      
      An earlier version of this patch has been shipping to x86 users in
      the RedHat MRG releases for a while without issue, but I've reworked
      this version to be even more careful about avoiding possible
      overflows if the shift value gets too large.
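
      The loop can be modeled in userspace. Below is a hedged sketch of the
      logarithmic scheme: pick a starting power-of-two chunk of ticks from
      the size of the pending offset, then halve the chunk whenever the
      remainder no longer covers it; names and the tick length are
      illustrative.

      #include <stdint.h>
      #include <stdio.h>

      #define TICK_NS 1000000ULL              /* example: 1 ms tick */

      /* Consume 'TICK_NS << shift' per pass instead of a single tick; the
       * real code also advances xtime and the NTP error scaled by shift. */
      static uint64_t accumulate(uint64_t offset, int shift)
      {
              return offset - (TICK_NS << shift);
      }

      int main(void)
      {
              uint64_t offset = 123456789012ULL;  /* long NOHZ sleep, ns */
              int shift = 0, passes = 0;

              /* Largest power-of-two number of ticks fitting the offset. */
              while ((TICK_NS << (shift + 1)) <= offset)
                      shift++;

              while (offset >= TICK_NS) {
                      offset = accumulate(offset, shift);
                      passes++;
                      while (shift > 0 && offset < (TICK_NS << shift))
                              shift--;
              }
              printf("%d passes, %llu ns left over\n",
                     passes, (unsigned long long)offset);
              return 0;
      }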
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: John Kacur <jkacur@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <1254525473.7741.88.camel@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 02 Oct 2009, 1 commit
  5. 25 Sep 2009, 1 commit
    • clocksource: Resume clocksource without taking the clocksource mutex · 89133f93
      Committed by Martin Schwidefsky
      git commit 75c5158f converted the clocksource spinlock to a
      mutex. This causes the following BUG:
      
      BUG: sleeping function called from invalid context at kernel/mutex.c:280
      in_atomic(): 0, irqs_disabled(): 1, pid: 2473, name: pm-suspend
      2 locks held by pm-suspend/2473:
       #0:  (&buffer->mutex){......}, at: [<ffffffff8115ab13>] sysfs_write_file+0x3c/0x137
       #1:  (pm_mutex){......}, at: [<ffffffff810865b5>] enter_state+0x39/0x130
      Pid: 2473, comm: pm-suspend Not tainted 2.6.31 #1
      Call Trace:
       [<ffffffff810792f0>] ? __debug_show_held_locks+0x22/0x24
       [<ffffffff8104a2ef>] __might_sleep+0x107/0x10b
       [<ffffffff8141fca9>] mutex_lock_nested+0x25/0x43
       [<ffffffff81073537>] clocksource_resume+0x1c/0x60
       [<ffffffff81072902>] timekeeping_resume+0x1e/0x1c8
       [<ffffffff812aee62>] __sysdev_resume+0x25/0xcf
       [<ffffffff812aef79>] sysdev_resume+0x6d/0xae
       [<ffffffff810864f8>] suspend_devices_and_enter+0x12b/0x1af
       [<ffffffff8108665b>] enter_state+0xdf/0x130
       [<ffffffff81085dc3>] state_store+0xb6/0xd3
       [<ffffffff81204c73>] kobj_attr_store+0x17/0x19
       [<ffffffff8115abd2>] sysfs_write_file+0xfb/0x137
       [<ffffffff811057d2>] vfs_write+0xae/0x10b
       [<ffffffff81208392>] ? __up_read+0x1a/0x7f
       [<ffffffff811058ef>] sys_write+0x4a/0x6e
       [<ffffffff81011b82>] system_call_fastpath+0x16/0x1b
      
      clocksource_resume is called early in the resume process, when there
      is only one cpu, no processes are running, and interrupts are
      disabled. It is therefore possible to resume the clocksources
      without taking the clocksource mutex.
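
      A hedged sketch of the resulting function: walk the clocksource list
      and invoke the resume callbacks without the mutex, which is safe in
      this single-cpu, irqs-off context (details simplified from the
      clocksource code):

      void clocksource_resume(void)
      {
              struct clocksource *cs;

              /* No clocksource_mutex: resume runs on one cpu, irqs off. */
              list_for_each_entry(cs, &clocksource_list, list)
                      if (cs->resume)
                              cs->resume();

              clocksource_resume_watchdog();
      }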
      Reported-by: Xiaotian Feng <xtfeng@gmail.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Tested-by: Michal Schmidt <mschmidt@redhat.com>
      Cc: Xiaotian Feng <xtfeng@gmail.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090924172952.49697825@mschwide.boeblingen.de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 24 Sep 2009, 1 commit
    • time: add function to convert between calendar time and broken-down time for universal use · 57f1f087
      Committed by Zhaolei
      There is a lot of similar code in the kernel for one purpose: converting
      between calendar time and broken-down time.
      
      Here is some source I found:
        fs/ncpfs/dir.c
        fs/smbfs/proc.c
        fs/fat/misc.c
        fs/udf/udftime.c
        fs/cifs/netmisc.c
        net/netfilter/xt_time.c
        drivers/scsi/ips.c
        drivers/input/misc/hp_sdc_rtc.c
        drivers/rtc/rtc-lib.c
        arch/ia64/hp/sim/boot/fw-emu.c
        arch/m68k/mac/misc.c
        arch/powerpc/kernel/time.c
        arch/parisc/include/asm/rtc.h
        ...
      
      We can make a common function for this type of conversion. At the very
      least we get the following benefits:

      1: The kernel becomes simpler and more unified.
      2: Bugs in the conversion code are easier to fix.
      3: Future clones of this code are avoided.
         For example, I'm trying to make ftrace display wall time;
         this patch makes that easy.
      
      This code is based on code from glibc-2.6
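
      A standalone model of the conversion being centralized; the algorithm
      follows glibc's offtime, as the commit notes, but the struct-free
      interface and helper names below are illustrative, not the kernel's.

      #include <stdint.h>
      #include <stdio.h>

      #define LEAPS_THRU_END_OF(y)  ((y) / 4 - (y) / 100 + (y) / 400)

      static int is_leap(long y)
      {
              return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
      }

      /* Days before the start of each month: non-leap, leap years. */
      static const short mon_yday[2][12] = {
              { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 },
              { 0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335 },
      };

      /* Convert seconds since the epoch to broken-down UTC time. */
      static void secs_to_tm(int64_t totalsecs, int *year, int *mon,
                             int *mday, int *hour, int *min, int *sec)
      {
              int64_t days = totalsecs / 86400;
              int rem = (int)(totalsecs % 86400);
              long y = 1970;
              int m;

              if (rem < 0) { rem += 86400; days--; }
              *hour = rem / 3600; rem %= 3600;
              *min = rem / 60;
              *sec = rem % 60;

              while (days < 0 || days >= (is_leap(y) ? 366 : 365)) {
                      /* Guess a corrected year, then adjust. */
                      long yg = y + days / 365 - (days % 365 < 0);
                      days -= (yg - y) * 365 + LEAPS_THRU_END_OF(yg - 1)
                              - LEAPS_THRU_END_OF(y - 1);
                      y = yg;
              }
              for (m = 11; days < mon_yday[is_leap(y)][m]; m--)
                      ;
              days -= mon_yday[is_leap(y)][m];
              *year = (int)y; *mon = m + 1; *mday = (int)days + 1;
      }

      int main(void)
      {
              int Y, M, D, h, m, s;

              secs_to_tm(1254525473, &Y, &M, &D, &h, &m, &s);
              printf("%04d-%02d-%02d %02d:%02d:%02d UTC\n", Y, M, D, h, m, s);
              return 0;
      }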
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 15 Sep 2009, 2 commits
  8. 12 Sep 2009, 1 commit
    • clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash · f79e0258
      Committed by Martin Schwidefsky
      The watchdog timer is started after the watchdog clocksource
      and at least one watched clocksource have been registered. The
      clocksource work element watchdog_work is initialized just
      before the clocksource timer is started. This is too late for
      the clocksource_mark_unstable call from native_cpu_up. To fix
      this, use a static initializer for watchdog_work.
      
      This resolves a boot crash reported by multiple people.
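
      A hedged sketch of the fix's shape, using the standard static
      initializer so the work item is valid from the start of boot:

      /* Statically initialized: usable before the watchdog timer starts,
       * so an early clocksource_mark_unstable() can safely queue it. */
      static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);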
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090911153305.3fe9a361@skybase>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 29 Aug 2009, 1 commit
    • clocksource: Resolve cpu hotplug dead lock with TSC unstable · 7285dd7f
      Committed by Thomas Gleixner
      Martin Schwidefsky analyzed it:
      To register a clocksource the clocksource_mutex is acquired and, if
      necessary, timekeeping_notify is called to install the clocksource as
      the timekeeper clock. timekeeping_notify uses stop_machine, which needs
      to take the cpu_add_remove_lock mutex.
      Starting a new cpu is done with the cpu_add_remove_lock mutex held.
      native_cpu_up checks the tsc of the new cpu, and if the tsc is no good,
      clocksource_change_rating is called, which needs the clocksource_mutex,
      and the deadlock is complete.
      
      The solution is to replace the TSC via the clocksource watchdog
      mechanism. Mark the TSC as unstable and schedule the watchdog work so
      it gets removed in the watchdog thread context.
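
      A hedged sketch of the mechanism (locking and helper details
      simplified from the clocksource watchdog code):

      void clocksource_mark_unstable(struct clocksource *cs)
      {
              unsigned long flags;

              spin_lock_irqsave(&watchdog_lock, flags);
              if (!(cs->flags & CLOCK_SOURCE_UNSTABLE)) {
                      cs->flags |= CLOCK_SOURCE_UNSTABLE;
                      /* Rating change happens later, in watchdog context. */
                      schedule_work(&watchdog_work);
              }
              spin_unlock_irqrestore(&watchdog_lock, flags);
      }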
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
  10. 25 Aug 2009, 1 commit
  11. 22 Aug 2009, 1 commit
    • time: Introduce CLOCK_REALTIME_COARSE · da15cfda
      Committed by john stultz
      After talking with some application writers who want very fast, but not
      fine-grained timestamps, I decided to try to implement new clock_ids
      for clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE,
      which return the time at the last tick. This is very fast as we don't
      have to access any hardware (which can be very painful if you're using
      something like the acpi_pm clocksource), and we can even use the vdso
      clock_gettime() method to avoid the syscall. The only trade-off is that
      you get only low-res, tick-grained time resolution.
      
      This isn't a new idea; I know Ingo has a patch in the -rt tree that made
      the vsyscall gettimeofday() return coarse-grained time when the
      vsyscall64 sysctl was set to 2. However, that affects all applications
      on a system.
      
      With this method, applications can choose the proper speed/granularity
      trade-off for themselves.
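
      From userspace the new clock id is used like any other; a minimal
      example (on older glibc, link with -lrt):

      #include <stdio.h>
      #include <time.h>

      int main(void)
      {
              struct timespec ts;

              /* Tick-granularity time stamp, no hardware clock access. */
              clock_gettime(CLOCK_REALTIME_COARSE, &ts);
              printf("%ld.%09ld\n", (long)ts.tv_sec, ts.tv_nsec);
              return 0;
      }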
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: nikolag@ca.ibm.com
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: arjan@infradead.org
      Cc: jonathan@jonmasters.org
      LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  12. 20 Aug 2009, 1 commit
    • clockevent: Prevent dead lock on clockevents_lock · f833bab8
      Committed by Suresh Siddha
      Currently clockevents_notify() is called with interrupts enabled at
      some places and interrupts disabled at other places.

      This results in a deadlock in the following scenario:

      cpu A holds clockevents_lock in clockevents_notify() with irqs enabled
      cpu B waits for clockevents_lock in clockevents_notify() with irqs disabled
      cpu C does set_mtrr(), which tries to rendezvous all the cpus.

      C and A reach the rendezvous point and wait for B. B is stuck forever
      waiting for the spinlock, never reaching the rendezvous point.
      
      Fix the clockevents code so that clockevents_lock is taken with
      interrupts disabled and thus avoid the above deadlock.
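
      A hedged sketch of the resulting locking pattern in
      clockevents_notify() (body abbreviated):

      void clockevents_notify(unsigned long reason, void *arg)
      {
              unsigned long flags;

              /* Always disable irqs; callers may have them on or off. */
              spin_lock_irqsave(&clockevents_lock, flags);
              clockevents_do_notify(reason, arg);
              /* ... */
              spin_unlock_irqrestore(&clockevents_lock, flags);
      }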
      
      Also call lapic_timer_propagate_broadcast() on the destination cpu so
      that we avoid calling smp_call_function() in the clockevents notifier
      chain.
      
      This issue left us wondering if we need to change the MTRR rendezvous
      logic to use stop machine logic (instead of smp_call_function) or add
      a check in the spinlock debug code for spinlocks which get taken under
      both interrupts-enabled and interrupts-disabled conditions.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: "Brown Len" <len.brown@intel.com>
      LKML-Reference: <1250544899.2709.210.camel@sbs-t61.sc.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  13. 19 Aug 2009, 2 commits
    • clocksource: Avoid clocksource watchdog circular locking dependency · 01548f4d
      Committed by Martin Schwidefsky
      Calling stop_machine from a multithreaded workqueue is not allowed
      because of a circular locking dependency between cpu_down and the
      workqueue execution. Use a kernel thread to do the clocksource
      downgrade instead.
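
      A hedged sketch of the shape of the fix: the work function only
      spawns a short-lived kernel thread, which can call stop_machine
      safely (names as in the clocksource code, details simplified):

      static void clocksource_watchdog_work(struct work_struct *work)
      {
              /* stop_machine must not run from the multithreaded
               * workqueue; hand the downgrade off to a kernel thread. */
              kthread_run(clocksource_watchdog_kthread, NULL, "kwatchdog");
      }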
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: john stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090818170942.3ab80c91@skybase>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • clocksource: Protect the watchdog rating changes with clocksource_mutex · d0981a1b
      Committed by Thomas Gleixner
      Martin pointed out that commit 6ea41d2529 (clocksource: Call
      clocksource_change_rating() outside of watchdog_lock) has a
      theoretical reference count problem. The calls to
      clocksource_change_rating() are now done outside of the clocksource
      mutex and outside of the watchdog lock. A concurrent
      clocksource_unregister() could remove the clock.
      
      Split out the code which changes the rating from
      clocksource_change_rating() into __clocksource_change_rating().
      
      Protect the clocksource_watchdog_work() code sequence with the
      clocksource_mutex and call __clocksource_change_rating().
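
      A hedged sketch of the resulting split (list helpers simplified):

      /* Caller must hold clocksource_mutex. */
      static void __clocksource_change_rating(struct clocksource *cs,
                                              int rating)
      {
              list_del(&cs->list);
              cs->rating = rating;
              clocksource_enqueue(cs);
      }

      void clocksource_change_rating(struct clocksource *cs, int rating)
      {
              mutex_lock(&clocksource_mutex);
              __clocksource_change_rating(cs, rating);
              mutex_unlock(&clocksource_mutex);
      }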
      
      LKML-Reference: <alpine.LFD.2.00.0908171038420.2782@localhost.localdomain>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
  14. 17 Aug 2009, 1 commit
    • timers: Drop write permission on /proc/timer_list · de809347
      Committed by Amerigo Wang
      /proc/timer_list and /proc/slabinfo are not supposed to be
      written to, so there should be no write permissions on them.
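
      The fix amounts to dropping the write bit when the proc entry is
      created; a hedged sketch:

      /* 0444 instead of 0644: read-only for everyone. */
      pe = proc_create("timer_list", 0444, NULL, &timer_list_fops);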
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: linux-mm@kvack.org
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Amerigo Wang <amwang@redhat.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      LKML-Reference: <20090817094525.6355.88682.sendpatchset@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 15 Aug 2009, 16 commits
  16. 19 Jul 2009, 1 commit
  17. 10 Jul 2009, 1 commit
    • hrtimer: Fix migration expiry check · 6ff7041d
      Committed by Thomas Gleixner
      The timer migration expiry check should prevent the migration of a
      timer to another CPU when the timer expires before the next event is
      scheduled on the other CPU. Migrating the timer might delay it because
      we cannot reprogram the clock event device on the other CPU. But the
      code implementing that check has two flaws:
      
      - for !HIGHRES the check compares the expiry value with the clock
        events device expiry value, which is wrong for CLOCK_REALTIME based
        timers.
      
      - the check is racy. It holds the hrtimer base lock of the target CPU,
        but the clock event device expiry value can nevertheless be modified,
        e.g. by a timer interrupt firing.
      
      The !HIGHRES case is easy to fix as we can enqueue the timer on the
      cpu which was selected by the load balancer. It runs the idle
      balancing code once per jiffy anyway. So the maximum delay for the
      timer is the same as when we keep the tick on the current cpu going.
      
      In the HIGHRES case we can get the next expiry value from the hrtimer
      cpu_base of the target CPU and serialize the update with the cpu_base
      lock. This moves the lock section in hrtimer_interrupt() so we can set
      next_event to KTIME_MAX while we are handling the expired timers and
      set it to the next expiry value after we handled the timers under the
      base lock. While the expired timers are processed timer migration is
      blocked because the expiry time of the timer is always <= KTIME_MAX.
      
      Also remove the now useless clockevents_get_next_event() function.
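
      A hedged sketch of the HIGHRES part of the fix: compare the timer's
      expiry against the target CPU's next event, serialized by the
      target's base lock (helper shown in the spirit of the hrtimer code,
      details simplified):

      /* Nonzero if the timer would expire before the target CPU's next
       * programmed event, i.e. migrating it could delay its expiry. */
      static int hrtimer_check_target(struct hrtimer *timer,
                                      struct hrtimer_clock_base *new_base)
      {
              ktime_t expires;

              if (!new_base->cpu_base->hres_active)
                      return 0;

              expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
              return expires.tv64 <= new_base->cpu_base->expires_next.tv64;
      }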
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  18. 07 Jul 2009, 1 commit