1. 27 Jul 2010 (3 commits)
    • timekeeping: Fix update_vsyscall to provide wall_to_monotonic offset · 7615856e
      John Stultz authored
      update_vsyscall() did not provide the wall_to_monotonic offset,
      so arch-specific implementations tend to reference wall_to_monotonic
      directly. This limits future cleanups in the timekeeping core, so
      this patch fixes the update_vsyscall interface to provide
      wall_to_monotonic, allowing wall_to_monotonic to be made static
      as planned in Documentation/feature-removal-schedule.txt
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Tony Luck <tony.luck@intel.com>
      LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      7615856e
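      A sketch of the interface change described above (signature simplified
      from memory; the exact arch hooks vary):

        /* Before: arch code peeked at the global wall_to_monotonic. */
        void update_vsyscall(struct timespec *wall_time,
                             struct clocksource *clock, u32 mult);

        /* After: the offset is passed in explicitly, so the global can
         * eventually be made static in the timekeeping core. */
        void update_vsyscall(struct timespec *wall_time,
                             struct timespec *wtm,   /* wall_to_monotonic */
                             struct clocksource *clock, u32 mult);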
    • time: Kill off CONFIG_GENERIC_TIME · 592913ec
      John Stultz authored
      Now that all arches have been converted over to use generic time via
      clocksources or arch_gettimeoffset(), we can remove the GENERIC_TIME
      config option and simplify the generic code.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1279068988-21864-4-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      592913ec
    • time: Implement timespec_add · ce3bf7ab
      John Stultz authored
      After accidentally misusing timespec_add_safe, I wanted to make sure
      we don't accidentally trip over that issue again, so I created a simple
      timespec_add() function which we can use to replace the instances
      of timespec_add_safe() that don't want the overflow detection.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1279068988-21864-3-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      ce3bf7ab
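      The helper is simple enough to sketch standalone (a normalization loop
      stands in for the kernel's set_normalized_timespec(); note there is no
      overflow clamping, which is what distinguishes it from
      timespec_add_safe()):

        #include <time.h>

        #define NSEC_PER_SEC 1000000000L

        static struct timespec timespec_add(struct timespec lhs,
                                            struct timespec rhs)
        {
            struct timespec sum;

            sum.tv_sec  = lhs.tv_sec + rhs.tv_sec;
            sum.tv_nsec = lhs.tv_nsec + rhs.tv_nsec;
            /* carry nanoseconds into seconds, keeping tv_nsec in [0, 1s) */
            while (sum.tv_nsec >= NSEC_PER_SEC) {
                sum.tv_nsec -= NSEC_PER_SEC;
                sum.tv_sec++;
            }
            return sum;
        }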
  2. 01 Jul 2010 (1 commit)
    • sched: Cure nr_iowait_cpu() users · 8c215bd3
      Peter Zijlstra authored
      Commit 0224cf4c (sched: Introduce get_cpu_iowait_time_us())
      broke things by not making sure preemption was indeed disabled
      by the callers of nr_iowait_cpu(), which took the iowait value of
      the current cpu.
      
      This resulted in a heap of preempt warnings. Cure this by making
      nr_iowait_cpu() take a cpu number and fix up the callers to pass
      in the right number.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Maxim Levitsky <maximlevitsky@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: linux-pm@lists.linux-foundation.org
      LKML-Reference: <1277968037.1868.120.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8c215bd3
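      The shape of the fix, as prototypes only (a sketch, not the full
      patch):

        /* Before: implicitly reads the *current* cpu's runqueue, so it is
         * only safe with preemption disabled - which some callers forgot. */
        unsigned long nr_iowait_cpu(void);

        /* After: the caller names the cpu it is accounting for, e.g. the
         * cpu whose idle statistics it is updating. */
        unsigned long nr_iowait_cpu(int cpu);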
  3. 18 Jun 2010 (1 commit)
  4. 10 May 2010 (7 commits)
  5. 13 Apr 2010 (1 commit)
    • time: Remove xtime_cache · 6a867a39
      John Stultz authored
      With the earlier logarithmic time accumulation patch, xtime will now
      always be within one "tick" of the current time, instead of possibly
      half a second off.
      
      This removes the need for the xtime_cache value, which always stored the
      time at the last interrupt, so this patch cleans that up, removing the
      xtime_cache-related code.
      
      This patch also addresses an issue with an earlier version of this change,
      where xtime_cache was normalizing xtime, which could in some cases be
      invalid (i.e. tv_nsec == NSEC_PER_SEC). This is fixed by handling
      the edge case in update_wall_time().
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Petr Titěra <P.Titera@century.cz>
      LKML-Reference: <1270589451-30773-1-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      6a867a39
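      A sketch of the edge-case handling mentioned above (per the commit
      text; variable names assumed from the timekeeping code of that era):

        /* Rounding up the sub-nanosecond remainder can leave xtime
         * un-normalized (tv_nsec == NSEC_PER_SEC), so carry it into
         * tv_sec instead of storing the invalid value. */
        if (unlikely(xtime.tv_nsec >= NSEC_PER_SEC)) {
            xtime.tv_nsec -= NSEC_PER_SEC;
            xtime.tv_sec++;
            second_overflow();
        }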
  6. 30 Mar 2010 (1 commit)
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there, i.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition, while for others adding it to an
         implementation .h or embedding .c file was more appropriate.  This
         step added inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
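      As a concrete illustration of the rule the script enforces (a
      hypothetical file, not taken from the patch):

        /* drivers/foo/bar.c - uses kmalloc() and gfp flags, so it must
         * include both headers itself rather than inheriting slab.h
         * implicitly through percpu.h. */
        #include <linux/gfp.h>
        #include <linux/slab.h>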
  7. 24 Mar 2010 (1 commit)
  8. 23 Mar 2010 (1 commit)
    • time: Fix accumulation bug triggered by long delay · 830ec045
      John Stultz authored
      The logarithmic accumulation done in the timekeeping code has some
      overflow protection that limits the max shift value. That means it will
      take more than shift loops to accumulate all of the cycles. This causes
      the shift decrement to underflow, which causes the loop to never exit.
      
      The simplest fix would be simply to do a:
      	if (shift)
      		shift--;
      
      However, that is not optimal: as we know the cycle offset is larger
      than interval << shift, the above would make shift drop to zero, and
      we would then be spinning for quite a while, accumulating one
      interval chunk at a time.
      
      Instead, this patch only decreases shift if the offset is smaller
      than cycle_interval << shift.  This makes sure we accumulate using
      the largest chunks possible without overflowing tick_length, and limits
      the number of iterations through the loop.
      
      This issue was found and reported by Sonic Zhang, who also tested the fix.
      Many thanks for the explanation and testing!
      Reported-by: Sonic Zhang <sonic.adi@gmail.com>
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Tested-by: Sonic Zhang <sonic.adi@gmail.com>
      LKML-Reference: <1268948850-5225-1-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      830ec045
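      A sketch of the fixed accumulation loop (names assumed per
      kernel/time/timekeeping.c of that era; simplified):

        while (offset >= timekeeper.cycle_interval) {
            offset = logarithmic_accumulation(offset, shift);
            /* Only drop to a smaller chunk once the remaining offset no
             * longer covers a full cycle_interval << shift, instead of
             * decrementing unconditionally (which underflowed). */
            if (offset < timekeeper.cycle_interval << shift)
                shift--;
        }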
  9. 13 Mar 2010 (1 commit)
  10. 12 Mar 2010 (1 commit)
    • sched: Rate-limit nohz · 39c0cbe2
      Mike Galbraith authored
      Entering nohz code on every micro-idle is costing ~10% throughput for netperf
      TCP_RR when scheduling cross-cpu.  Rate limiting entry fixes this, but raises
      ticks a bit.  On my Q6600, an idle box goes from ~85 interrupts/sec to 128.
      
      The higher the context switch rate, the more nohz entry costs.  With this patch
      and some cycle recovery patches in my tree, max cross-cpu context switch rate is
      improved by ~16%, a large portion of which is this ratelimiting.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268301003.6785.28.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      39c0cbe2
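      A hedged sketch of the rate limit (field names approximate, assumed
      per the sched/nohz code of that era):

        /* Skip nohz entry if this cpu attempted it less than half a tick
         * ago - a micro-idle not worth the nohz bookkeeping. */
        int nohz_ratelimit(int cpu)
        {
            struct rq *rq = cpu_rq(cpu);
            u64 diff = rq->clock - rq->nohz_stamp;

            rq->nohz_stamp = rq->clock;

            return diff < ((NSEC_PER_SEC / HZ) >> 1);
        }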
  11. 02 Mar 2010 (1 commit)
    • timekeeping: Prevent oops when GENERIC_TIME=n · ad6759fb
      john stultz authored
      Aaro Koskinen reported an issue in kernel.org bugzilla #15366, where
      on non-GENERIC_TIME systems, accessing
      /sys/devices/system/clocksource/clocksource0/current_clocksource
      results in an oops.
      
      It seems the timekeeper/clocksource rework missed initializing the
      curr_clocksource value in the !GENERIC_TIME case.
      
      Thanks to Aaro for reporting and diagnosing the issue as well as
      testing the fix!
      Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: stable@kernel.org
      LKML-Reference: <1267475683.4216.61.camel@localhost.localdomain>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      ad6759fb
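      The fix itself is tiny; a sketch per the commit text, assuming the
      jiffies clocksource as the fallback:

        /* Give curr_clocksource a sane default so the sysfs read does not
         * dereference NULL when GENERIC_TIME=n and the clocksource
         * selection logic never runs. */
        static struct clocksource *curr_clocksource = &clocksource_jiffies;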
  12. 10 Feb 2010 (1 commit)
  13. 05 Feb 2010 (2 commits)
  14. 29 Jan 2010 (2 commits)
  15. 26 Jan 2010 (1 commit)
    • clocksource: Prevent potential kgdb deadlock · 7b7422a5
      Thomas Gleixner authored
      commit 0f8e8ef7 (clocksource: Simplify clocksource watchdog resume
      logic) introduced a potential kgdb deadlock. When the kernel is
      stopped by kgdb inside code which holds watchdog_lock, kgdb deadlocks
      in clocksource_resume_watchdog().
      
      clocksource_resume_watchdog() is called from kgdb via
      clocksource_touch_watchdog() to prevent the clocksource watchdog from
      marking TSC unstable after the kernel has been stopped.
      
      Solve this by replacing spin_lock with a spin_trylock and just
      returning in case the lock is held. Not resetting the watchdog might
      result in TSC being marked unstable, but that's an acceptable penalty
      for using kgdb.
      
      The timekeeping is easily screwed up by kgdb anyway when the system
      uses either jiffies or a clock source which wraps in short intervals
      (e.g. pm_timer wraps about every 4.6s), so we really do not have to
      worry about that occasional TSC-marked-unstable side effect.
      
      The second caller of clocksource_resume_watchdog() is
      clocksource_resume(). The trylock is safe here as well because the
      system is UP at this point, interrupts are disabled and nothing else
      can hold watchdog_lock.
      Reported-by: Jason Wessel <jason.wessel@windriver.com>
      LKML-Reference: <1264480000-6997-4-git-send-email-jason.wessel@windriver.com>
      Cc: kgdb-bugreport@lists.sourceforge.net
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      7b7422a5
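      A sketch of the trylock variant, close to the fix as described above:

        static void clocksource_resume_watchdog(void)
        {
            unsigned long flags;

            /* kgdb may stop the kernel while watchdog_lock is held;
             * spinning here would deadlock, so skip the reset and accept
             * that TSC might get marked unstable. */
            if (!spin_trylock_irqsave(&watchdog_lock, flags))
                return;

            clocksource_reset_watchdog();
            spin_unlock_irqrestore(&watchdog_lock, flags);
        }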
  16. 18 Jan 2010 (1 commit)
  17. 23 Dec 2009 (1 commit)
    • Revert "time: Remove xtime_cache" · 83f57a11
      Linus Torvalds authored
      This reverts commit 7bc7d637, as
      requested by John Stultz. Quoting John:
      
       "Petr Titěra reported an issue where he saw odd atime regressions with
        2.6.33 where there were a full second worth of nanoseconds in the
        nanoseconds field.
      
        He also reviewed the time code and narrowed down the problem: unhandled
        overflow of the nanosecond field caused by rounding up the
        sub-nanosecond accumulated time.
      
        Details:
      
         * At the end of update_wall_time(), we currently round up the
        sub-nanosecond portion of accumulated time when storing it into xtime.
        This was added to avoid time inconsistencies caused when the
        sub-nanosecond portion was truncated when storing into xtime.
        Unfortunately we don't handle the possible second overflow caused by
        that rounding.
      
         * Previously the xtime_cache code hid this overflow by normalizing the
        xtime value when storing into the xtime_cache.
      
         * We could try to handle the second overflow after the rounding up, but
        since this affects the timekeeping's internal state, this would further
        complicate the next accumulation cycle, causing small errors in ntp
        steering. As much as I'd like to get rid of it, the xtime_cache code is
        known to work.
      
         * The correct fix is really to include the sub-nanosecond portion in the
        timekeeping accessor function, so we don't need to round up during
        accumulation. This would greatly simplify the accumulation code.
        Unfortunately, we can't do this safely until the last three
        non-GENERIC_TIME arches (sparc32, arm, cris) are converted (those
        patches are in -mm) and we kill off the spots where arches set xtime
        directly. This is all 2.6.34 material, so I think reverting the
        xtime_cache change is the best approach for now.
      
        Many thanks to Petr for both reporting and finding the issue!"
      Reported-by: Petr Titěra <P.Titera@century.cz>
      Requested-by: john stultz <johnstul@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83f57a11
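      The overflow in miniature (values illustrative, not from the patch):

        xtime.tv_nsec = 999999999;   /* just below NSEC_PER_SEC          */
        /* a positive sub-nanosecond remainder gets rounded up ...       */
        xtime.tv_nsec += 1;          /* == NSEC_PER_SEC: un-normalized   */
        /* readers now see a full second worth of nanoseconds - the atime
         * regression reported above. The xtime_cache path normalized
         * this away, which is why the revert restores it. */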
  18. 17 Dec 2009 (1 commit)
  19. 16 Dec 2009 (1 commit)
  20. 15 Dec 2009 (3 commits)
  21. 11 Dec 2009 (1 commit)
  22. 10 Dec 2009 (1 commit)
    • hrtimer: Tune hrtimer_interrupt hang logic · 41d2e494
      Thomas Gleixner authored
      The hrtimer_interrupt hang logic adjusts min_delta_ns based on the
      execution time of the hrtimer callbacks.
      
      This is error-prone for virtual machines, where a guest vcpu can be
      scheduled out during the execution of the callbacks (and the callbacks
      themselves can do operations that translate to blocking operations in
      the hypervisor), which can in turn lead to a large min_delta_ns,
      rendering the system unusable.
      
      Replace the current heuristics with something more reliable. Allow the
      interrupt code to try 3 times to catch up with the lost time. If that
      fails, use the total time spent in the interrupt handler to defer the
      next timer interrupt so the system can catch up with other things
      which got delayed. Limit that deferment to 100ms.
      
      The retry events and the maximum time spent in the interrupt handler
      are recorded and exposed via /proc/timer_list.
      
      Inspired by a patch from Marcelo.
      Reported-by: Michael Tokarev <mjt@tls.msk.ru>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: kvm@vger.kernel.org
      41d2e494
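      A hedged sketch of the new policy in hrtimer_interrupt() (simplified;
      the /proc/timer_list stats bookkeeping is omitted):

        if (++retries < 3)
            goto retry;                 /* try to catch up first          */

        /* Still behind: defer the next event by the time we burned in the
         * handler, capped at 100ms, so the system can catch up. */
        delta = ktime_sub(ktime_get(), entry_time);
        if (delta.tv64 > 100 * NSEC_PER_MSEC)
            expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
        else
            expires_next = ktime_add(now, delta);
        tick_program_event(expires_next, 1);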
  23. 18 Nov 2009 (1 commit)
  24. 17 Nov 2009 (1 commit)
    • timekeeping: Fix clock_gettime vsyscall time warp · 0696b711
      Lin Ming authored
      Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier
      to struct timekeeper" the clock multiplier of vsyscall is updated with
      the unmodified clock multiplier of the clock source and not with the
      NTP adjusted multiplier of the timekeeper.
      
      This causes user-space observable time warps:
      new CLOCK-warp maximum: 120 nsecs,  00000025c337c537 -> 00000025c337c4bf
      
      Add a new argument "mult" to update_vsyscall() and hand in the
      timekeeping-internal NTP-adjusted multiplier.
      Signed-off-by: Lin Ming <ming.m.lin@intel.com>
      Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Tony Luck <tony.luck@intel.com>
      LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      0696b711
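      The shape of the fix as a sketch (the actual patch also touches each
      arch's update_vsyscall() implementation):

        /* before: the raw clocksource multiplier leaks into the vsyscall */
        update_vsyscall(&xtime, timekeeper.clock);
        /* after: pass the NTP-adjusted multiplier the timekeeper uses */
        update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);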
  25. 14 Nov 2009 (4 commits)
    • clocksource/events: Fix fallout of generic code changes · a362c638
      Thomas Gleixner authored
      powerpc grew a new warning due to the type change of clockevent->mult.
      
      The architectures which use parts of the generic time keeping
      infrastructure tripped over my wrong assumption that
      clocksource_register is only used when GENERIC_TIME=y.
      
      I should have looked and also I should have known better. These
      renitent Gaul villages are racking my nerves. Some serious deprecating
      is due.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a362c638
    • nohz: Allow 32-bit machines to sleep for more than 2.15 seconds · 97813f2f
      Jon Hunter authored
      In the dynamic tick code, "max_delta_ns" (member of the
      "clock_event_device" structure) represents the maximum sleep time
      that can occur between timer events in nanoseconds.
      
      The variable, "max_delta_ns", is defined as an unsigned long
      which is a 32-bit integer for 32-bit machines and a 64-bit
      integer for 64-bit machines (if the -m64 option is used for gcc).
      The value of max_delta_ns is set by calling the function
      "clockevent_delta2ns()" which returns a maximum value of LONG_MAX.
      For a 32-bit machine LONG_MAX is equal to 0x7fffffff and in
      nanoseconds this equates to ~2.15 seconds. Hence, the maximum
      sleep time for a 32-bit machine is ~2.15 seconds, whereas for
      a 64-bit machine it will be many years.
      
      This patch changes the type of max_delta_ns to be "u64" instead of
      "unsigned long" so that this variable is a 64-bit type for both 32-bit
      and 64-bit machines. It also changes the maximum value returned by
      clockevent_delta2ns() to KTIME_MAX.  Hence this allows a 32-bit
      machine to sleep for longer than ~2.15 seconds. Please note that this
      patch also changes "min_delta_ns" to be "u64" too; although this is
      unnecessary, it makes the patch simpler as it avoids having to fix up
      all callers of clockevent_delta2ns().
      
      [ tglx: changed "unsigned long long" to u64 as we use this data type
        throughout the time code ]
      Signed-off-by: Jon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1250617512-23567-3-git-send-email-jon-hunter@ti.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      97813f2f
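      The ~2.15s figure is just LONG_MAX nanoseconds on a 32-bit machine; a
      standalone check of the arithmetic:

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
            uint64_t long_max_32 = 0x7fffffff;    /* 32-bit LONG_MAX */

            /* 2147483647 ns = ~2.147 s */
            printf("max sleep: %.3f s\n", long_max_32 / 1e9);
            return 0;
        }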
    • nohz: Track last do_timer() cpu · 27185016
      Thomas Gleixner authored
      The previous patch, which limits the sleep time to the maximum
      deferment time of the timekeeping clocksource, has a limitation on
      SMP machines: if all CPUs are idle, the maximum sleep time is limited
      on all of them.
      
      Solve this by keeping track of which cpu had the do_timer() duty
      assigned last and limit the sleep time only for this cpu.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Cc: Jon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      27185016
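      A hedged sketch of the bookkeeping (field names approximate, assumed
      per the tick-sched code of that era):

        /* remember whether this cpu ran do_timer() last ... */
        if (cpu == tick_do_timer_cpu) {
            tick_do_timer_cpu = TICK_DO_TIMER_NONE;
            ts->do_timer_last = 1;
        } else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
            ts->do_timer_last = 0;
        }

        /* ... and clamp the sleep to the clocksource's maximum deferment
         * only on that cpu; all other idle cpus may sleep freely. */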
    • nohz: Prevent clocksource wrapping during idle · 98962465
      Jon Hunter authored
      The dynamic tick allows the kernel to sleep for periods longer than a
      single tick, but it currently does not limit the sleep time. In the
      worst case the kernel could sleep longer than the wrap-around time of
      the timekeeping clocksource, which would result in losing track of
      time.
      
      Prevent this by limiting it to the safe maximum sleep time of the
      current timekeeping clocksource. The value is calculated when the
      clocksource is registered.
      
      [ tglx: simplified the code a bit and massaged the commit msg ]
      Signed-off-by: Jon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      98962465
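      A sketch of the calculation done at registration time (close to the
      clocksource_max_deferment() helper this work introduced; treat the
      exact margin as an assumption):

        static u64 clocksource_max_deferment(struct clocksource *cs)
        {
            u64 max_nsecs, max_cycles;

            /* max cycles that fit a 63-bit cyc2ns multiplication ... */
            max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1));
            /* ... and that the counter can represent before wrapping */
            max_cycles = min_t(u64, max_cycles, (u64) cs->mask);
            max_nsecs = clocksource_cyc2ns(max_cycles, cs->mult, cs->shift);

            /* leave a safety margin so we wake before the wrap */
            return max_nsecs - (max_nsecs >> 5);
        }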