提交 · 123f94f22e3d283dfe68742b269c245b0501ad82 · openanolis / cloud-kernel

01 7月, 2010 1 次提交

sched: Cure nr_iowait_cpu() users · 8c215bd3

由 Peter Zijlstra 提交于 7月 01, 2010

Commit 0224cf4c (sched: Intoduce get_cpu_iowait_time_us())
broke things by not making sure preemption was indeed disabled
by the callers of nr_iowait_cpu() which took the iowait value of
the current cpu.

This resulted in a heap of preempt warnings. Cure this by making
nr_iowait_cpu() take a cpu number and fix up the callers to pass
in the right number.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: linux-pm@lists.linux-foundation.org
LKML-Reference: <1277968037.1868.120.camel@laptop>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8c215bd3

18 6月, 2010 1 次提交

nohz: Fix nohz ratelimit · 3310d4d3

由 Peter Zijlstra 提交于 6月 17, 2010

Chris Wedgwood reports that 39c0cbe2 (sched: Rate-limit nohz) causes a
serial console regression, unresponsiveness, and indeed it does. The
reason is that the nohz code is skipped even when the tick was already
stopped before the nohz_ratelimit(cpu) condition changed.

Move the nohz_ratelimit() check to the other conditions which prevent
long idle sleeps.
Reported-by: NChris Wedgwood <cw@f00f.org>
Tested-by: NBrian Bloniarz <bmb@athenacr.com>
Signed-off-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg KH <gregkh@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Jef Driesen <jefdriesen@telenet.be>
LKML-Reference: <1276790557.27822.516.camel@twins>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

3310d4d3

10 5月, 2010 6 次提交

sched: Intoduce get_cpu_iowait_time_us() · 0224cf4c

由 Arjan van de Ven 提交于 5月 09, 2010

For the ondemand cpufreq governor, it is desired that the iowait
time is microaccounted in a similar way as idle time is.

This patch introduces the infrastructure to account and expose
this information via the get_cpu_iowait_time_us() function.

[akpm@linux-foundation.org: fix CONFIG_NO_HZ=n build]
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NRik van Riel <riel@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: davej@redhat.com
LKML-Reference: <20100509082523.284feab6@infradead.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

0224cf4c

sched: Eliminate the ts->idle_lastupdate field · e0e37c20

由 Arjan van de Ven 提交于 5月 09, 2010

Now that the only user of ts->idle_lastupdate is
update_ts_time_stats(), the entire field can be eliminated.

In update_ts_time_stats(), idle_lastupdate is first set to
"now", and a few lines later, the only user is an if() statement
that assigns a variable either to "now" or to
ts->idle_lastupdate, which has the value of "now" at that point.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NRik van Riel <riel@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: davej@redhat.com
LKML-Reference: <20100509082439.2fab0b4f@infradead.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e0e37c20

sched: Fold updating of the last_update_time_info into update_ts_time_stats() · 8d63bf94

由 Arjan van de Ven 提交于 5月 09, 2010

This patch folds the updating of the last_update_time into the
update_ts_time_stats() function, and updates the callers.

This allows for further cleanups that are done in the next
patch.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NRik van Riel <riel@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: davej@redhat.com
LKML-Reference: <20100509082403.60072967@infradead.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8d63bf94

sched: Update the idle statistics in get_cpu_idle_time_us() · 8c7b09f4

由 Arjan van de Ven 提交于 5月 09, 2010

Right now, get_cpu_idle_time_us() only reports the idle
statistics upto the point the CPU entered last idle; not what is
valid right now.

This patch adds an update of the idle statistics to
get_cpu_idle_time_us(), so that calling this function always
returns statistics that are accurate at the point of the call.

This includes resetting the start of the idle time for
accounting purposes to avoid double accounting.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NRik van Riel <riel@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: davej@redhat.com
LKML-Reference: <20100509082323.2d2f1945@infradead.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8c7b09f4

sched: Introduce a function to update the idle statistics · 595aac48

由 Arjan van de Ven 提交于 5月 09, 2010

Currently, two places update the idle statistics (and more to
come later in this series).

This patch creates a helper function for updating these
statistics.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NRik van Riel <riel@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: davej@redhat.com
LKML-Reference: <20100509082245.163e67ed@infradead.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

595aac48

sched: Add a comment to get_cpu_idle_time_us() · b1f724c3

由 Arjan van de Ven 提交于 5月 09, 2010

The exported function get_cpu_idle_time_us() has no comment
describing it; add a kerneldoc comment
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NRik van Riel <riel@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: davej@redhat.com
LKML-Reference: <20100509082208.7cb721f0@infradead.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b1f724c3

12 3月, 2010 1 次提交

sched: Rate-limit nohz · 39c0cbe2

由 Mike Galbraith 提交于 3月 11, 2010

Entering nohz code on every micro-idle is costing ~10% throughput for netperf
TCP_RR when scheduling cross-cpu. Rate limiting entry fixes this, but raises
ticks a bit. On my Q6600, an idle box goes from ~85 interrupts/sec to 128.

The higher the context switch rate, the more nohz entry costs. With this patch
and some cycle recovery patches in my tree, max cross cpu context switch rate is
improved by ~16%, a large portion of which of which is this ratelimiting.
Signed-off-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301003.6785.28.camel@marge.simson.net>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

39c0cbe2

14 11月, 2009 3 次提交

nohz: Track last do_timer() cpu · 27185016

由 Thomas Gleixner 提交于 11月 12, 2009

The previous patch which limits the sleep time to the maximum
deferment time of the time keeping clocksource has some limitations on
SMP machines: if all CPUs are idle then for all CPUs the maximum sleep
time is limited.

Solve this by keeping track of which cpu had the do_timer() duty
assigned last and limit the sleep time only for this cpu.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Cc: Jon Hunter <jon-hunter@ti.com>
Cc: John Stultz <johnstul@us.ibm.com>

27185016

nohz: Prevent clocksource wrapping during idle · 98962465

由 Jon Hunter 提交于 8月 18, 2009

The dynamic tick allows the kernel to sleep for periods longer than a
single tick, but it does not limit the sleep time currently. In the
worst case the kernel could sleep longer than the wrap around time of
the time keeping clock source which would result in losing track of
time.

Prevent this by limiting it to the safe maximum sleep time of the
current time keeping clock source. The value is calculated when the
clock source is registered.

[ tglx: simplified the code a bit and massaged the commit msg ]
Signed-off-by: NJon Hunter <jon-hunter@ti.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

98962465

nohz: Type cast printk argument · 529eaccd

由 Thomas Gleixner 提交于 11月 13, 2009

On some archs local_softirq_pending() has a data type of unsigned long
on others its unsigned int. Type cast it to (unsigned int) in the
printk to avoid the compiler warning.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>

529eaccd

05 11月, 2009 2 次提交

nohz: Introduce arch_needs_cpu · 3c5d92a0

由 Martin Schwidefsky 提交于 9月 29, 2009

Allow the architecture to request a normal jiffy tick when the system
goes idle and tick_nohz_stop_sched_tick is called . On s390 the hook is
used to prevent the system going fully idle if there has been an
interrupt other than a clock comparator interrupt since the last wakeup.

On s390 the HiperSockets response time for 1 connection ping-pong goes
down from 42 to 34 microseconds. The CPU cost decreases by 27%.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
LKML-Reference: <20090929122533.402715150@de.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

3c5d92a0

nohz: Reuse ktime in sub-functions of tick_check_idle. · eed3b9cf

由 Martin Schwidefsky 提交于 9月 29, 2009

On a system with NOHZ=y tick_check_idle calls tick_nohz_stop_idle and
tick_nohz_update_jiffies. Given the right conditions (ts->idle_active
and/or ts->tick_stopped) both function get a time stamp with ktime_get.
The same time stamp can be reused if both function require one.

On s390 this change has the additional benefit that gcc inlines the
tick_nohz_stop_idle function into tick_check_idle. The number of
instructions to execute tick_check_idle drops from 225 to 144
(without the ktime_get optimization it is 367 vs 215 instructions).

before:

 0)               |  tick_check_idle() {
 0)               |    tick_nohz_stop_idle() {
 0)               |      ktime_get() {
 0)               |        read_tod_clock() {
 0)   0.601 us    |        }
 0)   1.765 us    |      }
 0)   3.047 us    |    }
 0)               |    ktime_get() {
 0)               |      read_tod_clock() {
 0)   0.570 us    |      }
 0)   1.727 us    |    }
 0)               |    tick_do_update_jiffies64() {
 0)   0.609 us    |    }
 0)   8.055 us    |  }

after:

 0)               |  tick_check_idle() {
 0)               |    ktime_get() {
 0)               |      read_tod_clock() {
 0)   0.617 us    |      }
 0)   1.773 us    |    }
 0)               |    tick_do_update_jiffies64() {
 0)   0.593 us    |    }
 0)   4.477 us    |  }
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Cc: john stultz <johnstul@us.ibm.com>
LKML-Reference: <20090929122533.206589318@de.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

eed3b9cf

07 10月, 2009 1 次提交

NOHZ: update idle state also when NOHZ is inactive · fdc6f192

由 Eero Nurkkala 提交于 10月 07, 2009

Commit f2e21c96 had unfortunate side
effects with cpufreq governors on some systems.

If the system did not switch into NOHZ mode ts->inidle is not set when
tick_nohz_stop_sched_tick() is called from the idle routine. Therefor
all subsequent calls from irq_exit() to tick_nohz_stop_sched_tick()
fail to call tick_nohz_start_idle(). This results in bogus idle
accounting information which is passed to cpufreq governors.

Set the inidle flag unconditionally of the NOHZ active state to keep
the idle time accounting correct in any case.

[ tglx: Added comment and tweaked the changelog ]
Reported-by: NSteven Noonan <steven@uplinklabs.net>
Signed-off-by: NEero Nurkkala <ext-eero.nurkkala@nokia.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: stable@kernel.org
LKML-Reference: <1254907901.30157.93.camel@eenurkka-desktop>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

fdc6f192

27 5月, 2009 1 次提交

NOHZ: Properly feed cpufreq ondemand governor · f2e21c96

由 Eero Nurkkala 提交于 5月 25, 2009

A call from irq_exit() may occasionally pause the timing
info for cpufreq ondemand governor. This results in the
cpufreq ondemand governor to fail to calculate the 
system load properly. Thus, relocate the checks for this
particular case to keep the governor always functional.
Signed-off-by: NEero Nurkkala <ext-eero.nurkkala@nokia.com>
Reported-by: NTero Kristo <tero.kristo@nokia.com>
Acked-by: NRik van Riel <riel@redhat.com>
Acked-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

f2e21c96

13 5月, 2009 1 次提交

timers: Identifying the existing pinned timers · 5c333864

由 Arun R Bharadwaj 提交于 4月 16, 2009

* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]:

The following pinned hrtimers have been identified and marked:
1)sched_rt_period_timer
2)tick_sched_timer
3)stack_trace_timer_fn

[ tglx: fixup the hrtimer pinned mode ]
Signed-off-by: NArun R Bharadwaj <arun@linux.vnet.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

5c333864

15 1月, 2009 1 次提交

time-sched.c: tick_nohz_update_jiffies should be static · 934d96ea

由 Jaswinder Singh Rajput 提交于 1月 14, 2009

Impact: cleanup, reduce kernel size a bit, avoid sparse warning

Fixes sparse warning:

kernel/time/tick-sched.c:137:6: warning: symbol 'tick_nohz_update_jiffies' was not declared. Should it be static?
Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

934d96ea

31 12月, 2008 2 次提交

[PATCH] idle cputime accounting · 79741dd3

由 Martin Schwidefsky 提交于 12月 31, 2008

The cpu time spent by the idle process actually doing something is
currently accounted as idle time. This is plain wrong, the architectures
that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
time spent doing nothing and the time spent by idle doing work. The first
is accounted with account_idle_time and the second with account_system_time.
The architectures that use the account_xxx_time interface directly and not
the account_xxx_ticks interface now need to do the check for the idle
process in their arch code. In particular to improve the system vs true
idle time accounting the arch code needs to measure the true idle time
instead of just testing for the idle process.
To improve the tick based accounting as well we would need an architecture
primitive that can tell us if the pt_regs of the interrupted context
points to the magic instruction that halts the cpu.

In addition idle time is no more added to the stime of the idle process.
This field now contains the system time of the idle process as it should
be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
every tick that occurs while idle is running will be accounted as idle
time.

This patch contains the necessary common code changes to be able to
distinguish idle system time and true idle time. The architectures with
support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

79741dd3

[PATCH] fix scaled & unscaled cputime accounting · 457533a7

由 Martin Schwidefsky 提交于 12月 31, 2008

The utimescaled / stimescaled fields in the task structure and the
global cpustat should be set on all architectures. On s390 the calls
to account_user_time_scaled and account_system_time_scaled never have
been added. In addition system time that is accounted as guest time
to the user time of a process is accounted to the scaled system time
instead of the scaled user time.
To fix the bugs and to prevent future forgetfulness this patch merges
account_system_time_scaled into account_system_time and
account_user_time_scaled into account_user_time.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Michael Neuling <mikey@neuling.org>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

457533a7

12 12月, 2008 2 次提交

nohz: suppress needless timer reprogramming · 00147449

由 Woodruff, Richard 提交于 12月 01, 2008

In my device I get many interrupts from a high speed USB device in a very
short period of time.  The system spends a lot of time reprogramming the
hardware timer which is in a slower timing domain as compared to the CPU. 
This results in the CPU spending a huge amount of time waiting for the
timer posting to be done.  All of this reprogramming is useless as the
wake up time has not changed.

As measured using ETM trace this drops my reprogramming penalty from
almost 60% CPU load down to 15% during high interrupt rate.  I can send
traces to show this.

Suppress setting of duplicate timer event when timer already stopped. 
Timer programming can be very costly and can result in long cpu stall/wait
times.

[akpm@linux-foundation.org: coding-style fixes]
[tglx@linutronix.de: move the check to the right place and avoid raising
		     the softirq for nothing]
Signed-off-by: NRichard Woodruff <r-woodruff2@ti.com>
Cc: johnstul@us.ibm.com
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

00147449

nohz: no softirq pending warnings for offline cpus · fa116ea3

由 Heiko Carstens 提交于 12月 11, 2008

Impact: remove false positive warning

After a cpu was taken down during cpu hotplug (read: disabled for interrupts)
it still might have pending softirqs. However take_cpu_down makes sure
that the idle task will run next instead of ksoftirqd on the taken down cpu.
The idle task will call tick_nohz_stop_sched_tick which might warn about
pending softirqs just before the cpu kills itself completely.

However the pending softirqs on the dead cpu aren't a problem because they
will be moved to an online cpu during CPU_DEAD handling.

So make sure we warn only for online cpus.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

fa116ea3

25 11月, 2008 2 次提交

hrtimer: removing all ur callback modes · ca109491

由 Peter Zijlstra 提交于 11月 25, 2008

Impact: cleanup, move all hrtimer processing into hardirq context

This is an attempt at removing some of the hrtimer complexity by
reducing the number of callback modes to 1.

This means that all hrtimer callback functions will be ran from HARD-irq
context.

I went through all the 30 odd hrtimer callback functions in the kernel
and saw only one that I'm not quite sure of, which is the one in
net/can/bcm.c - hence I'm CC-ing the folks responsible for that code.

Furthermore, the hrtimer core now calls callbacks directly with IRQs
disabled in case you try to enqueue an expired timer. If this timer is a
periodic timer (which should use hrtimer_forward() to advance its time)
then it might be possible to end up in an inf. recursive loop due to the
fact that hrtimer_forward() doesn't round up to the next timer
granularity, and therefore keeps on calling the callback - obviously
this needs a fix.

Aside from that, this seems to compile and actually boot on my dual core
test box - although I'm sure there are some bugs in, me not hitting any
makes me certain :-)
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

ca109491

sched: convert nohz_cpu_mask to cpumask_var_t. · 6a7b3dc3

由 Rusty Russell 提交于 11月 25, 2008

Impact: (future) size reduction for large NR_CPUS.

Dynamically allocating cpumasks (when CONFIG_CPUMASK_OFFSTACK) saves
space for small nr_cpu_ids but big CONFIG_NR_CPUS.  cpumask_var_t
is just a struct cpumask for !CONFIG_CPUMASK_OFFSTACK.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6a7b3dc3

11 11月, 2008 1 次提交

nohz: disable tick_nohz_kick_tick() for now · ae99286b

由 Thomas Gleixner 提交于 11月 10, 2008

Impact: nohz powersavings and wakeup regression

commit fb02fbc1 (NOHZ: restart tick
device from irq_enter()) causes a serious wakeup regression.

While the patch is correct it does not take into account that spurious
wakeups happen on x86. A fix for this issue is available, but we just
revert to the .27 behaviour and let long running softirqs screw
themself.

Disable it for now.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

ae99286b

22 10月, 2008 1 次提交

NOHZ: fix thinko in the timer restart code path · c4bd822e

由 Thomas Gleixner 提交于 10月 21, 2008

commit fb02fbc1 (NOHZ: restart tick
device from irq_enter())

solves the problem of stale jiffies when long running softirqs happen
in a long idle sleep period, but it has a major thinko in it:

When the interrupt which came in _is_ the timer interrupt which should
expire ts->sched_timer then we cancel and rearm the timer _before_ it
gets expired in hrtimer_interrupt() to the next period. That means the
call back function is not called. This game can go on for ever :(

Prevent this by making sure to only rearm the timer when the expiry
time is more than one tick_period away. Otherwise keep it running as
it is either already expired or will expiry at the right point to
update jiffies.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NVenkatesch Pallipadi <venkatesh.pallipadi@intel.com>

c4bd822e

18 10月, 2008 3 次提交

NOHZ: restart tick device from irq_enter() · fb02fbc1

由 Thomas Gleixner 提交于 10月 17, 2008

We did not restart the tick device from irq_enter() to avoid double
reprogramming and extra events in the return immediate to idle case.

But long lasting softirqs can lead to a situation where jiffies become
stale:

idle()
  tick stopped (reprogrammed to next pending timer)
  halt()
   interrupt
     jiffies updated from irq_enter()
     interrupt handler
     softirq function 1 runs 20ms
     softirq function 2 arms a 10ms timer with a stale jiffies value
     jiffies updated from irq_exit()
     timer wheel has now an already expired timer
     (the one added in function 2)
     timer fires and timer softirq runs

This was discovered when debugging a timer problem which happend only
when the ath5k driver is active. The debugging proved that there is a
softirq function running for more than 20ms, which is a bug by itself.

To solve this we restart the tick timer right from irq_enter(), but do
not go through the other functions which are necessary to return from
idle when need_resched() is set.
Reported-by: NElias Oltmanns <eo@nebensachen.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NElias Oltmanns <eo@nebensachen.de>

fb02fbc1

NOHZ: split tick_nohz_restart_sched_tick() · c34bec5a

由 Thomas Gleixner 提交于 10月 17, 2008

Split out the clock event device reprogramming. Preparatory
patch.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

c34bec5a

NOHZ: unify the nohz function calls in irq_enter() · 719254fa

由 Thomas Gleixner 提交于 10月 17, 2008

We have two separate nohz function calls in irq_enter() for no good
reason. Just call a single NOHZ function from irq_enter() and call
the bits in the tick code.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

719254fa

10 10月, 2008 1 次提交

[CPUFREQ][5/6] cpufreq: Changes to get_cpu_idle_time_us(), used by ondemand governor · 8083e4ad

由 venkatesh.pallipadi@intel.com 提交于 8月 04, 2008

export get_cpu_idle_time_us() for it to be used in ondemand governor.
Last update time can be current time when the CPU is currently non-idle,
accounting for the busy time since last idle.
Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: NDave Jones <davej@redhat.com>

8083e4ad

29 9月, 2008 1 次提交

hrtimer: prevent migration of per CPU hrtimers · ccc7dadf

由 Thomas Gleixner 提交于 9月 29, 2008

Impact: per CPU hrtimers can be migrated from a dead CPU

The hrtimer code has no knowledge about per CPU timers, but we need to
prevent the migration of such timers and warn when such a timer is
active at migration time.

Explicitely mark the timers as per CPU and use a more understandable
mode descriptor for the interrupts safe unlocked callback mode, which
is used by hrtimer_sleeper and the scheduler code.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

ccc7dadf

23 9月, 2008 2 次提交

clockevents: prevent stale tick_next_period for onlining CPUs · 49d670fb

由 Thomas Gleixner 提交于 9月 22, 2008

Impact: possible hang on CPU onlining in timer one shot mode.

The tick_next_period variable is only used during boot on nohz/highres
enabled systems, but for CPU onlining it needs to be maintained when
the per cpu clock events device operates in one shot mode.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

49d670fb

clockevents: prevent cpu online to interfere with nohz · 6441402b

由 Thomas Gleixner 提交于 9月 22, 2008

Impact: rare hang which can be triggered on CPU online.

tick_do_timer_cpu keeps track of the CPU which updates jiffies
via do_timer. The value -1 is used to signal, that currently no
CPU is doing this. There are two cases, where the variable can 
have this state:

 boot:
    necessary for systems where the boot cpu id can be != 0

 nohz long idle sleep:
    When the CPU which did the jiffies update last goes into
    a long idle sleep it drops the update jiffies duty so
    another CPU which is not idle can pick it up and keep
    jiffies going.

Using the same value for both situations is wrong, as the CPU online
code can see the -1 state when the timer of the newly onlined CPU is
setup. The setup for a newly onlined CPU goes through periodic mode
and can pick up the do_timer duty without being aware of the nohz /
highres mode of the already running system.

Use two separate states and make them constants to avoid magic
numbers confusion. 
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

6441402b

06 9月, 2008 2 次提交

hrtimer: convert kernel/* to the new hrtimer apis · cc584b21

由 Arjan van de Ven 提交于 9月 01, 2008

In order to be able to do range hrtimers we need to use accessor functions
to the "expire" member of the hrtimer struct.
This patch converts kernel/* to these accessors.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>

cc584b21

sched_clock: fix NOHZ interaction · 56c7426b

由 Peter Zijlstra 提交于 9月 01, 2008

If HLT stops the TSC, we'll fail to account idle time, thereby inflating the
actual process times. Fix this by re-calibrating the clock against GTOD when
leaving nohz mode.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: NAvi Kivity <avi@qumranet.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

56c7426b

21 8月, 2008 1 次提交

nohz: fix wrong event handler after online an offlined cpu · 3c4fbe5e

由 Miao Xie 提交于 8月 20, 2008

On the tickless system(CONFIG_NO_HZ=y and CONFIG_HIGH_RES_TIMERS=n), after
I made an offlined cpu online, I found this cpu's event handler was
tick_handle_periodic, not tick_nohz_handler.

After debuging, I found this bug was caused by the wrong tick mode.  the
tick mode is not changed to NOHZ_MODE_INACTIVE when the cpu is offline.

This patch fixes this bug.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3c4fbe5e

11 8月, 2008 1 次提交

printk: robustify printk · b845b517

由 Peter Zijlstra 提交于 8月 08, 2008

Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
wakeup by polling from the timer tick.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b845b517

31 7月, 2008 1 次提交

sched clock: revert various sched_clock() changes · e4e4e534

由 Ingo Molnar 提交于 4月 14, 2008

Found an interactivity problem on a quad core test-system - simple
CPU loops would occasionally delay the system un an unacceptable way.

After much debugging with Peter Zijlstra it turned out that the problem
is caused by the string of sched_clock() changes - they caused the CPU
clock to jump backwards a bit - which confuses the scheduler arithmetics.

(which is unsigned for performance reasons)

So revert:

 # c300ba25: sched_clock: and multiplier for TSC to gtod drift
 # c0c87734: sched_clock: only update deltas with local reads.
 # af52a90a: sched_clock: stop maximum check on NO HZ
 # f7cce27f: sched_clock: widen the max and min time

This solves the interactivity problems.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NMike Galbraith <efault@gmx.de>

e4e4e534

19 7月, 2008 1 次提交

nohz: prevent tick stop outside of the idle loop · b8f8c3cf

由 Thomas Gleixner 提交于 7月 18, 2008

Jack Ren and Eric Miao tracked down the following long standing
problem in the NOHZ code:

	scheduler switch to idle task
	enable interrupts

Window starts here

	----> interrupt happens (does not set NEED_RESCHED)
	      	irq_exit() stops the tick

	----> interrupt happens (does set NEED_RESCHED)

	return from schedule()
	
	cpu_idle(): preempt_disable();

Window ends here

The interrupts can happen at any point inside the race window. The
first interrupt stops the tick, the second one causes the scheduler to
rerun and switch away from idle again and we end up with the tick
disabled.

The fact that it needs two interrupts where the first one does not set
NEED_RESCHED and the second one does made the bug obscure and extremly
hard to reproduce and analyse. Kudos to Jack and Eric.

Solution: Limit the NOHZ functionality to the idle loop to make sure
that we can not run into such a situation ever again.

cpu_idle()
{
	preempt_disable();

	while(1) {
		 tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
		 			          are in the idle loop

		 while (!need_resched())
		       halt();

		 tick_nohz_restart_sched_tick(); <- disables NOHZ mode
		 preempt_enable_no_resched();
		 schedule();
		 preempt_disable();
	}
}

In hindsight we should have done this forever, but ... 

/me grabs a large brown paperbag.

Debugged-by: Jack Ren <jack.ren@marvell.com>, 
Debugged-by: Neric miao <eric.y.miao@gmail.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

b8f8c3cf

11 7月, 2008 1 次提交

sched_clock: stop maximum check on NO HZ · af52a90a

由 Steven Rostedt 提交于 7月 07, 2008

Working with ftrace I would get large jumps of 11 millisecs or more with
the clock tracer. This killed the latencing timings of ftrace and also
caused the irqoff self tests to fail.

What was happening is with NO_HZ the idle would stop the jiffy counter and
before the jiffy counter was updated the sched_clock would have a bad
delta jiffies to compare with the gtod with the maximum.

The jiffies would stop and the last sched_tick would record the last gtod.
On wakeup, the sched clock update would compare the gtod + delta jiffies
(which would be zero) and compare it to the TSC. The TSC would have
correctly (with a stable TSC) moved forward several jiffies. But because the
jiffies has not been updated yet the clock would be prevented from moving
forward because it would appear that the TSC jumped too far ahead.

The clock would then virtually stop, until the jiffies are updated. Then
the next sched clock update would see that the clock was very much behind
since the delta jiffies is now correct. This would then jump the clock
forward by several jiffies.

This caused ftrace to report several milliseconds of interrupts off
latency at every resume from NO_HZ idle.

This patch adds hooks into the nohz code to disable the checking of the
maximum clock update when nohz is in effect. It resumes the max check
when nohz has updated the jiffies again.
Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

af52a90a

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功