提交 · a89c7edbe7d7aa80f507915f3dd801211b116b79 · openanolis / cloud-kernel

05 5月, 2013 1 次提交

tick: Use zalloc_cpumask_var for allocating offstack cpumasks · fbd44a60

由 Thomas Gleixner 提交于 5月 03, 2013

commit b352bc1c (tick: Convert broadcast cpu bitmaps to
cpumask_var_t) broke CONFIG_CPUMASK_OFFSTACK in a very subtle way.

Instead of allocating the cpumasks with zalloc_cpumask_var it uses
alloc_cpumask_var, so we can get random data there, which of course
confuses the logic completely and causes random failures.
Reported-and-tested-by: NDave Jones <davej@redhat.com>
Reported-and-tested-by: NYinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305032015060.2990@ionosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

fbd44a60

25 4月, 2013 1 次提交

clockevents: Set dummy handler on CPU_DEAD shutdown · 6f7a05d7

由 Thomas Gleixner 提交于 4月 25, 2013

Vitaliy reported that a per cpu HPET timer interrupt crashes the
system during hibernation. What happens is that the per cpu HPET timer
gets shut down when the nonboot cpus are stopped. When the nonboot
cpus are onlined again the HPET code sets up the MSI interrupt which
fires before the clock event device is registered. The event handler
is still set to hrtimer_interrupt, which then crashes the machine due
to highres mode not being active.

See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=700333

There is no real good way to avoid that in the HPET code. The HPET
code alrady has a mechanism to detect spurious interrupts when event
handler == NULL for a similar reason.

We can handle that in the clockevent/tick layer and replace the
previous functional handler with a dummy handler like we do in
tick_setup_new_device().

The original clockevents code did this in clockevents_exchange_device(),
but that got removed by commit 7c1e7689 (clockevents: prevent
clockevent event_handler ending up handler_noop) which forgot to fix
it up in tick_shutdown(). Same issue with the broadcast device.
Reported-by: NVitaliy Fillipov <vitalif@yourcmc.ru>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Cc: 700333@bugs.debian.org
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

6f7a05d7

18 4月, 2013 1 次提交

clockevents: Switch into oneshot mode even if broadcast registered late · c038c1c4

由 Stephen Boyd 提交于 4月 17, 2013

tick_oneshot_notify() is used to notify a particular CPU to try
to switch into oneshot mode after a oneshot capable tick device
is registered and tick_clock_notify() is used to notify all CPUs
to try to switch into oneshot mode after a high res clocksource
is registered. There is one caveat; if the tick devices suffer
from FEAT_C3_STOP we don't try to switch into oneshot mode unless
we have a oneshot capable broadcast device already registered.

If the broadcast device is registered after the tick devices that
have FEAT_C3_STOP we'll never try to switch into oneshot mode
again, causing us to be stuck in periodic mode forever. Avoid
this scenario by calling tick_clock_notify() after we register
the broadcast device so that we try to switch into oneshot mode
on all CPUs one more time.

[ tglx: Adopted to timers/core and added a comment ]
Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
Link: http://lkml.kernel.org/r/1366219566-29783-1-git-send-email-sboyd@codeaurora.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

c038c1c4

16 4月, 2013 1 次提交

nohz: Switch from "extended nohz" to "full nohz" based naming · c5bfece2

由 Frederic Weisbecker 提交于 4月 12, 2013

"Extended nohz" was used as a naming base for the full dynticks
API and Kconfig symbols. It reflects the fact the system tries
to stop the tick in more places than just idle.

But that "extended" name is a bit opaque and vague. Rename it to
"full" makes it clearer what the system tries to do under this
config: try to shutdown the tick anytime it can. The various
constraints that prevent that to happen shouldn't be considered
as fundamental properties of this feature but rather technical
issues that may be solved in the future.
Reported-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>

c5bfece2

21 3月, 2013 1 次提交

nohz: Assign timekeeping duty to a CPU outside the full dynticks range · a382bf93

由 Frederic Weisbecker 提交于 12月 18, 2012

This way the full nohz CPUs can safely run with the tick
stopped with a guarantee that somebody else is taking
care of the jiffies and GTOD progression.

Once the duty is attributed to a CPU, it won't change. Also that
CPU can't enter into dyntick idle mode or be hot unplugged.

This may later be improved from a power consumption POV. At
least we should be able to share the duty amongst all CPUs
outside the full dynticks range. Then the duty could even be
shared with full dynticks CPUs when those can't stop their
tick for any reason.

But let's start with that very simple approach first.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[fix have_nohz_full_mask offcase]
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a382bf93

13 3月, 2013 3 次提交

tick: Provide a check for a forced broadcast pending · eaa907c5

由 Thomas Gleixner 提交于 3月 06, 2013

On the CPU which gets woken along with the target CPU of the broadcast
the following happens:

  deep_idle()
			<-- spurious wakeup
  broadcast_exit()
    set forced bit
  
  enable interrupts
    
			<-- Nothing happens

  disable interrupts

  broadcast_enter()
			<-- Here we observe the forced bit is set
  deep_idle()

Now after that the target CPU of the broadcast runs the broadcast
handler and finds the other CPU in both the broadcast and the forced
mask, sends the IPI and stuff gets back to normal.

So it's not actually harmful, just more evidence for the theory, that
hardware designers have access to very special drug supplies.

Now there is no point in going back to deep idle just to wake up again
right away via an IPI. Provide a check which allows the idle code to
avoid the deep idle transition.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Jason Liu <liu.h.jason@gmail.com>
Link: http://lkml.kernel.org/r/20130306111537.565418308@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

eaa907c5

tick: Handle broadcast wakeup of multiple cpus · 989dcb64

由 Thomas Gleixner 提交于 3月 06, 2013

Some brilliant hardware implementations wake multiple cores when the
broadcast timer fires. This leads to the following interesting
problem:

CPU0				CPU1
wakeup from idle		wakeup from idle

leave broadcast mode		leave broadcast mode
 restart per cpu timer		 restart per cpu timer
 	     	 		go back to idle
handle broadcast
 (empty mask)			
				enter broadcast mode
				programm broadcast device
enter broadcast mode
programm broadcast device

So what happens is that due to the forced reprogramming of the cpu
local timer, we need to set a event in the future. Now if we manage to
go back to idle before the timer fires, we switch off the timer and
arm the broadcast device with an already expired time (covered by
forced mode). So in the worst case we repeat the above ping pong
forever.
					
Unfortunately we have no information about what caused the wakeup, but
we can check current time against the expiry time of the local cpu. If
the local event is already in the past, we know that the broadcast
timer is about to fire and send an IPI. So we mark ourself as an IPI
target even if we left broadcast mode and avoid the reprogramming of
the local cpu timer.

This still leaves the possibility that a CPU which is not handling the
broadcast interrupt is going to reach idle again before the IPI
arrives. This can't be solved in the core code and will be handled in
follow up patches.
Reported-by: NJason Liu <liu.h.jason@gmail.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Link: http://lkml.kernel.org/r/20130306111537.492045206@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

989dcb64

tick: Avoid programming the local cpu timer if broadcast pending · 26517f3e

由 Thomas Gleixner 提交于 3月 06, 2013

If the local cpu timer stops in deep idle, we arm the broadcast device
and get woken by an IPI. Now when we return from deep idle we reenable
the local cpu timer unconditionally before handling the IPI. But
that's a pointless exercise: the timer is already expired and the IPI
is on the way. And it's an expensive exercise as we use the forced
reprogramming mode so that we do not lose a timer event. This forced
reprogramming will loop at least once in the retry.

To avoid this reprogramming, we mark the cpu in a pending bit mask
before we send the IPI. Now when the IPI target cpu wakes up, it will
see the pending bit set and skip the reprogramming. The reprogramming
of the cpu local timer will happen in the IPI handler which runs the
cpu local timer interrupt function.
Reported-by: NJason Liu <liu.h.jason@gmail.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Link: http://lkml.kernel.org/r/20130306111537.431082074@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

26517f3e

08 3月, 2013 1 次提交

clockevents: Don't allow dummy broadcast timers · a7dc19b8

由 Mark Rutland 提交于 3月 07, 2013

Currently tick_check_broadcast_device doesn't reject clock_event_devices
with CLOCK_EVT_FEAT_DUMMY, and may select them in preference to real
hardware if they have a higher rating value. In this situation, the
dummy timer is responsible for broadcasting to itself, and the core
clockevents code may attempt to call non-existent callbacks for
programming the dummy, eventually leading to a panic.

This patch makes tick_check_broadcast_device always reject dummy timers,
preventing this problem.
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Jon Medhurst (Tixy) <tixy@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

a7dc19b8

07 3月, 2013 3 次提交

tick: Dynamically set broadcast irq affinity · d2348fb6

由 Daniel Lezcano 提交于 3月 02, 2013

When a cpu goes to a deep idle state where its local timer is
shutdown, it notifies the time frame work to use the broadcast timer
instead.  Unfortunately, the broadcast device could wake up any CPU,
including an idle one which is not concerned by the wake up at all. So
in the worst case an idle CPU will wake up to send an IPI to the CPU
whose timer expired.

Provide an opt-in feature CLOCK_EVT_FEAT_DYNIRQ which tells the core
that is should set the interrupt affinity of the broadcast interrupt
to the cpu which has the earliest expiry time. This avoids unnecessary
spurious wakeups and IPIs.

[ tglx: Adopted to cpumask rework, silenced an uninitialized warning,
  massaged changelog ]
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Cc: viresh.kumar@linaro.org
Cc: jacob.jun.pan@linux.intel.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: santosh.shilimkar@ti.com
Cc: linaro-kernel@lists.linaro.org
Cc: patches@linaro.org
Cc: rickard.andersson@stericsson.com
Cc: vincent.guittot@linaro.org
Cc: linus.walleij@stericsson.com
Cc: john.stultz@linaro.org
Link: http://lkml.kernel.org/r/1362219013-18173-3-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

d2348fb6

tick: Pass broadcast device to tick_broadcast_set_event() · f9ae39d0

由 Daniel Lezcano 提交于 3月 02, 2013

Pass the broadcast timer to tick_broadcast_set_event() instead of
reevaluating tick_broadcast_device.evtdev.

[ tglx: Massaged changelog ]
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Cc: viresh.kumar@linaro.org
Cc: jacob.jun.pan@linux.intel.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: santosh.shilimkar@ti.com
Cc: linaro-kernel@lists.linaro.org
Cc: patches@linaro.org
Cc: rickard.andersson@stericsson.com
Cc: vincent.guittot@linaro.org
Cc: linus.walleij@stericsson.com
Cc: john.stultz@linaro.org
Link: http://lkml.kernel.org/r/1362219013-18173-2-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

f9ae39d0

tick: Convert broadcast cpu bitmaps to cpumask_var_t · b352bc1c

由 Thomas Gleixner 提交于 3月 05, 2013

Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130306111537.366394000@linutronix.de
Cc: Rusty Russell <rusty@rustcorp.com.au>

b352bc1c

13 2月, 2013 1 次提交

clockevents: Fix generic broadcast for FEAT_C3STOP · 5d1d9a29

由 Mark Rutland 提交于 2月 08, 2013

Commit 12ad1000: "clockevents: Add generic timer broadcast function"
made tick_device_uses_broadcast set up the generic broadcast function
for dummy devices (where !tick_device_is_functional(dev)), but neglected
to set up the broadcast function for devices that stop in low power
states (with the CLOCK_EVT_FEAT_C3STOP flag).

When these devices enter low power states they will not have the generic
broadcast function assigned, and will bring down the system when an
attempt is made to broadcast to them.

This patch ensures that the broadcast function is also assigned for
devices which require broadcast in low power states.
Reported-by: NStephen Warren <swarren@nvidia.com>
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Tested-by: NStephen Warren <swarren@nvidia.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: nico@linaro.org
Cc: Marc.Zyngier@arm.com
Cc: Will.Deacon@arm.com
Cc: santosh.shilimkar@ti.com
Cc: john.stultz@linaro.org
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

5d1d9a29

01 2月, 2013 2 次提交

clockevents: Add generic timer broadcast function · 12ad1000

由 Mark Rutland 提交于 1月 14, 2013

Currently, the timer broadcast mechanism is defined by a function
pointer on struct clock_event_device. As the fundamental mechanism for
broadcast is architecture-specific, this means that clock_event_device
drivers cannot be shared across multiple architectures.

This patch adds an (optional) architecture-specific function for timer
tick broadcast, allowing drivers which may require broadcast
functionality to be shared across multiple architectures.
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Reviewed-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: nico@linaro.org
Cc: Will.Deacon@arm.com
Cc: Marc.Zyngier@arm.com
Cc: john.stultz@linaro.org
Link: http://lkml.kernel.org/r/1358183124-28461-3-git-send-email-mark.rutland@arm.comTested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: NStephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

12ad1000

clockevents: Add generic timer broadcast receiver · 12572dbb

由 Mark Rutland 提交于 1月 14, 2013

Currently the broadcast mechanism used for timers is abstracted by a
function pointer on struct clock_event_device. As the fundamental
mechanism for broadcast is architecture-specific, this ties each
clock_event_device driver to a single architecture, even where the
driver is otherwise generic.

This patch adds a standard path for the receipt of timer broadcasts, so
drivers and/or architecture backends need not manage redundant lists of
timers for the purpose of routing broadcast timer ticks.

[tglx: Made the implementation depend on the config switch as well ]
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Reviewed-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: nico@linaro.org
Cc: Will.Deacon@arm.com
Cc: Marc.Zyngier@arm.com
Cc: john.stultz@linaro.org
Link: http://lkml.kernel.org/r/1358183124-28461-2-git-send-email-mark.rutland@arm.comTested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: NStephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

12572dbb

20 4月, 2012 2 次提交

tick: Fix the spurious broadcast timer ticks after resume · a6371f80

由 Suresh Siddha 提交于 4月 18, 2012

During resume, tick_resume_broadcast() programs the broadcast timer in
oneshot mode unconditionally. On the platforms where broadcast timer
is not really required, this will generate spurious broadcast timer
ticks upon resume. For example, on the always running apic timer
platforms with HPET, I see spurious hpet tick once every ~5minutes
(which is the 32-bit hpet counter wraparound time).

Similar to boot time, during resume make the oneshot mode setting of
the broadcast clock event device conditional on the state of active
broadcast users.
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: svenjoac@gmx.de
Cc: torvalds@linux-foundation.org
Cc: rjw@sisk.pl
Link: http://lkml.kernel.org/r/1334802459.28674.209.camel@sbsiddha-desk.sc.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

a6371f80

tick: Ensure that the broadcast device is initialized · b9a6a235

由 Thomas Gleixner 提交于 4月 18, 2012

Santosh found another trap when we avoid to initialize the broadcast
device in the switch_to_oneshot code. The broadcast device might be
still in SHUTDOWN state when we actually need to use it. That
obviously breaks, as set_next_event() is called on a shutdown
device. This did not break on x86, but Suresh analyzed it:

From the review, most likely on Sven's system we are force enabling
the hpet using the pci quirk's method very late. And in this case,
hpet_clockevent (which will be global_clock_event) handler can be
null, specifically as this platform might not be using deeper c-states
and using the reliable APIC timer.

Prior to commit 'fa4da365', that handler will be set to
'tick_handle_oneshot_broadcast' when we switch the broadcast timer to
oneshot mode, even though we don't use it. Post commit
'fa4da365', we stopped switching the broadcast mode to oneshot
as this is not really needed and his platform's global_clock_event's
handler will remain null. While on my SNB laptop, same is set to
'clockevents_handle_noop' because hpet gets enabled very early. (noop
handler on my platform set when the early enabled hpet timer gets
replaced by the lapic timer).

But the commit 'fa4da365' tracked the broadcast timer mode in
the SW as oneshot, even though it didn't touch the HW timer. During
resume however, tick_resume_broadcast() saw the SW broadcast mode as
oneshot and actually programmed the broadcast device also into oneshot
mode. So this triggered the null pointer de-reference after the hpet
wraps around and depending on what the hpet counter is set to. On the
normal platforms where hpet gets enabled early we should be seeing a
spurious interrupt (in my SNB laptop I see one spurious interrupt
after around 5 minutes ;) which is 32-bit hpet counter wraparound
time), but that's a separate issue.

Enforce the mode setting when trying to set an event.
Reported-and-tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Cc: torvalds@linux-foundation.org
Cc: svenjoac@gmx.de
Cc: rjw@sisk.pl
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1204181723350.2542@ionos

b9a6a235

18 4月, 2012 1 次提交

tick: Fix oneshot broadcast setup really · b435092f

由 Thomas Gleixner 提交于 4月 18, 2012

Sven Joachim reported, that suspend/resume on rc3 trips over a NULL
pointer dereference. Linus spotted the clockevent handler being NULL.

commit fa4da365(clockevents: tTack broadcast device mode change in
tick_broadcast_switch_to_oneshot()) tried to fix a problem with the
broadcast device setup, which was introduced in commit 77b0d60c(
clockevents: Leave the broadcast device in shutdown mode when not
needed).

The initial commit avoided to set up the broadcast device when no
broadcast request bits were set, but that left the broadcast device
disfunctional. In consequence deep idle states which need the
broadcast device were not woken up.

commit fa4da365 tried to fix that by initializing the state of the
broadcast facility, but that missed the fact, that nothing initializes
the event handler and some other state of the underlying clock event
device.

The fix is to revert both commits and make only the mode setting of
the clock event device conditional on the state of active broadcast
users. 

That initializes everything except the low level device mode, but this
happens when the broadcast functionality is invoked by deep idle.
Reported-and-tested-by: NSven Joachim <svenjoac@gmx.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1204181205540.2542@ionos

b435092f

10 4月, 2012 1 次提交

clockevents: tTack broadcast device mode change in tick_broadcast_switch_to_oneshot() · fa4da365

由 Suresh Siddha 提交于 4月 09, 2012

In the commit 77b0d60c,
"clockevents: Leave the broadcast device in shutdown mode when not needed",
we were bailing out too quickly in tick_broadcast_switch_to_oneshot(),
with out tracking the broadcast device mode change to 'TICKDEV_MODE_ONESHOT'.

This breaks the platforms which need broadcast device oneshot services during
deep idle states. tick_broadcast_oneshot_control() thinks that it is
in periodic mode and fails to take proper decisions based on the
CLOCK_EVT_NOTIFY_BROADCAST_[ENTER, EXIT] notifications during deep
idle entry/exit.

Fix this by tracking the broadcast device mode as 'TICKDEV_MODE_ONESHOT',
before leaving the broadcast HW device in shutdown mode if there are no active
requests for the moment.
Reported-and-tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Cc: johnstul@us.ibm.com
Link: http://lkml.kernel.org/r/1334011304.12400.81.camel@sbsiddha-desk.sc.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

fa4da365

15 2月, 2012 1 次提交

clockevents: Leave the broadcast device in shutdown mode when not needed · 77b0d60c

由 Suresh Siddha 提交于 11月 04, 2011

Platforms with Always Running APIC Timer doesn't use the broadcast timer
but the kernel is leaving the broadcast timer (HPET in this case)
in oneshot mode.

On these platforms, before the switch to oneshot mode, broadcast device is
actually in shutdown mode. Code checks for empty tick_broadcast_mask and
avoids going into the periodic mode.

During switch to oneshot mode, add the same tick_broadcast_mask checks in the
tick_broadcast_switch_to_oneshot() and avoid the broadcast device going into
the oneshot mode.
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: venki@google.com
Link: http://lkml.kernel.org/r/1320452301.15071.16.camel@sbsiddha-desk.sc.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

77b0d60c

02 12月, 2011 1 次提交

tick-broadcast: Stop active broadcast device when replacing it · c1be8430

由 Thomas Gleixner 提交于 12月 02, 2011

When a better rated broadcast device is installed, then the current
active device is not disabled, which results in two running broadcast
devices.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org

c1be8430

08 9月, 2011 1 次提交

clockevents: Make minimum delay adjustments configurable · d1748302

由 Martin Schwidefsky 提交于 8月 23, 2011

The automatic increase of the min_delta_ns of a clockevents device
should be done in the clockevents code as the minimum delay is an
attribute of the clockevents device.

In addition not all architectures want the automatic adjustment, on a
massively virtualized system it can happen that the programming of a
clock event fails several times in a row because the virtual cpu has
been rescheduled quickly enough. In that case the minimum delay will
erroneously be increased with no way back. The new config symbol
GENERIC_CLOCKEVENTS_MIN_ADJUST is used to enable the automatic
adjustment. The config option is selected only for x86.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Cc: john stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/20110823133142.494157493@de.ibm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

d1748302

17 5月, 2011 1 次提交

tick: Clear broadcast active bit when switching to oneshot · 07f4beb0

由 Thomas Gleixner 提交于 5月 16, 2011

The first cpu which switches from periodic to oneshot mode switches
also the broadcast device into oneshot mode. The broadcast device
serves as a backup for per cpu timers which stop in deeper
C-states. To avoid starvation of the cpus which might be in idle and
depend on broadcast mode it marks the other cpus as broadcast active
and sets the brodcast expiry value of those cpus to the next tick.

The oneshot mode broadcast bit for the other cpus is sticky and gets
only cleared when those cpus exit idle. If a cpu was not idle while
the bit got set in consequence the bit prevents that the broadcast
device is armed on behalf of that cpu when it enters idle for the
first time after it switched to oneshot mode.

In most cases that goes unnoticed as one of the other cpus has usually
a timer pending which keeps the broadcast device armed with a short
timeout. Now if the only cpu which has a short timer active has the
bit set then the broadcast device will not be armed on behalf of that
cpu and will fire way after the expected timer expiry. In the case of
Christians bug report it took ~145 seconds which is about half of the
wrap around time of HPET (the limit for that device) due to the fact
that all other cpus had no timers armed which expired before the 145
seconds timeframe.

The solution is simply to clear the broadcast active bit
unconditionally when a cpu switches to oneshot mode after the first
cpu switched the broadcast device over. It's not idle at that point
otherwise it would not be executing that code.

[ I fundamentally hate that broadcast crap. Why the heck thought some
  folks that when going into deep idle it's a brilliant concept to
  switch off the last device which brings the cpu back from that
  state? ]

Thanks to Christian for providing all the valuable debug information!
Reported-and-tested-by: NChristian Hoffmann <email@christianhoffmann.info>
Cc: John Stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1105161105170.3078%40ionos%3E
Cc: stable@kernel.org
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

07f4beb0

05 5月, 2011 1 次提交

clockevents: Move C3 stop test outside lock · 7372b0b1

由 Andi Kleen 提交于 5月 04, 2011

Avoid taking broadcast_lock in the idle path for systems where the
timer doesn't stop in C3.

[ tglx: Removed the stale label and added comment ]
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Dave Kleikamp <dkleikamp@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: lenb@kernel.org
Cc: paulmck@us.ibm.com
Link: http://lkml.kernel.org/r/%3C20110504234806.GF2925%40one.firstfloor.org%3ESigned-off-by: NThomas Gleixner <tglx@linutronix.de>

7372b0b1

26 2月, 2011 1 次提交

clockevents: Prevent oneshot mode when broadcast device is periodic · 3a142a06

由 Thomas Gleixner 提交于 2月 25, 2011

When the per cpu timer is marked CLOCK_EVT_FEAT_C3STOP, then we only
can switch into oneshot mode, when the backup broadcast device
supports oneshot mode as well. Otherwise we would try to switch the
broadcast device into an unsupported mode unconditionally. This went
unnoticed so far as the current available broadcast devices support
oneshot mode. Seth unearthed this problem while debugging and working
around an hpet related BIOS wreckage.

Add the necessary check to tick_is_oneshot_available().
Reported-and-tested-by: NSeth Forshee <seth.forshee@canonical.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
LKML-Reference: <alpine.LFD.2.00.1102252231200.2701@localhost6.localdomain6>
Cc: stable@kernel.org # .21 ->

3a142a06

01 2月, 2011 1 次提交

time: Make do_timer() and xtime_lock local to kernel/time/ · e2830b5c

由 Torben Hohn 提交于 1月 27, 2011

All callers of do_timer() are converted to xtime_update(). The only
users of xtime_lock are in kernel/time/. Make both local to
kernel/time/ and remove them from the global header files.

[ tglx: Reuse tick-internal.h instead of creating another local header
  	file. Massaged changelog ]
Signed-off-by: NTorben Hohn <torbenh@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: johnstul@us.ibm.com
Cc: yong.zhang0@gmail.com
Cc: hch@infradead.org
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

e2830b5c

12 7月, 2010 1 次提交

fix comment/printk typos concerning "already" · 698f9315

由 Uwe Kleine-König 提交于 7月 02, 2010

Signed-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

698f9315

15 12月, 2009 1 次提交

clockevents: Convert to raw_spinlock · b5f91da0

由 Thomas Gleixner 提交于 12月 08, 2009

Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: NIngo Molnar <mingo@elte.hu>

b5f91da0

20 8月, 2009 1 次提交

clockevent: Prevent dead lock on clockevents_lock · f833bab8

由 Suresh Siddha 提交于 8月 17, 2009

Currently clockevents_notify() is called with interrupts enabled at
some places and interrupts disabled at some other places.

This results in a deadlock in this scenario.

cpu A holds clockevents_lock in clockevents_notify() with irqs enabled
cpu B waits for clockevents_lock in clockevents_notify() with irqs disabled
cpu C doing set_mtrr() which will try to rendezvous of all the cpus.

This will result in C and A come to the rendezvous point and waiting
for B. B is stuck forever waiting for the spinlock and thus not
reaching the rendezvous point.

Fix the clockevents code so that clockevents_lock is taken with
interrupts disabled and thus avoid the above deadlock.

Also call lapic_timer_propagate_broadcast() on the destination cpu so
that we avoid calling smp_call_function() in the clockevents notifier
chain.

This issue left us wondering if we need to change the MTRR rendezvous
logic to use stop machine logic (instead of smp_call_function) or add
a check in spinlock debug code to see if there are other spinlocks
which gets taken under both interrupts enabled/disabled conditions.
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: "Brown Len" <len.brown@intel.com>
LKML-Reference: <1250544899.2709.210.camel@sbs-t61.sc.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

f833bab8

02 5月, 2009 1 次提交

clockevents: tick_broadcast_device can become static · a52f5c56

由 Dmitri Vorobiev 提交于 5月 01, 2009

The variable tick_broadcast_device is not used outside of the
file where it is defined, so let's make it static.
Signed-off-by: NDmitri Vorobiev <dmitri.vorobiev@movial.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

a52f5c56

01 1月, 2009 2 次提交

cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/ · 5db0e1e9

由 Rusty Russell 提交于 1月 01, 2009

Impact: cleanup

Simple replacement, now the _nr is redundant.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NMike Travis <travis@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>

5db0e1e9

cpumask: convert kernel time functions · 6b954823

由 Rusty Russell 提交于 1月 01, 2009

Impact: Use new APIs

Convert kernel/time functions to use struct cpumask *.

Note the ugly bitmap declarations in tick-broadcast.c.  These should
be cpumask_var_t, but there was no obvious initialization function to
put the alloc_cpumask_var() calls in.  This was safe.

(Eventually 'struct cpumask' will be undefined for CONFIG_CPUMASK_OFFSTACK,
so we use a bitmap here to show we really mean it).
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NMike Travis <travis@sgi.com>

6b954823

13 12月, 2008 1 次提交

cpumask: convert struct clock_event_device to cpumask pointers. · 320ab2b0

由 Rusty Russell 提交于 12月 13, 2008

Impact: change calling convention of existing clock_event APIs

struct clock_event_timer's cpumask field gets changed to take pointer,
as does the ->broadcast function.

Another single-patch change.  For safety, we BUG_ON() in
clockevents_register_device() if it's not set.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>

320ab2b0

18 10月, 2008 1 次提交

NOHZ: restart tick device from irq_enter() · fb02fbc1

由 Thomas Gleixner 提交于 10月 17, 2008

We did not restart the tick device from irq_enter() to avoid double
reprogramming and extra events in the return immediate to idle case.

But long lasting softirqs can lead to a situation where jiffies become
stale:

idle()
  tick stopped (reprogrammed to next pending timer)
  halt()
   interrupt
     jiffies updated from irq_enter()
     interrupt handler
     softirq function 1 runs 20ms
     softirq function 2 arms a 10ms timer with a stale jiffies value
     jiffies updated from irq_exit()
     timer wheel has now an already expired timer
     (the one added in function 2)
     timer fires and timer softirq runs

This was discovered when debugging a timer problem which happend only
when the ath5k driver is active. The debugging proved that there is a
softirq function running for more than 20ms, which is a bug by itself.

To solve this we restart the tick timer right from irq_enter(), but do
not go through the other functions which are necessary to return from
idle when need_resched() is set.
Reported-by: NElias Oltmanns <eo@nebensachen.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NElias Oltmanns <eo@nebensachen.de>

fb02fbc1

04 10月, 2008 1 次提交

clockevents: check broadcast tick device not the clock events device · 07454bff

由 Thomas Gleixner 提交于 10月 04, 2008

Impact: jiffies increment too fast.

Hugh Dickins noted that with NOHZ=n and HIGHRES=n jiffies get
incremented too fast. The reason is a wrong check in the broadcast
enter/exit code, which keeps the local apic timer in periodic mode
when the switch happens.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

07454bff

23 9月, 2008 2 次提交

clockevents: prevent mode mismatch on cpu online · 27ce4cb4

由 Thomas Gleixner 提交于 9月 22, 2008

Impact: timer hang on CPU online observed on AMD C1E systems

When a CPU is brought online then the broadcast machinery can
be in the one shot state already. Check this and setup the timer 
device of the new CPU in one shot mode so the broadcast code
can pick up the next_event value correctly.

Another AMD C1E oddity, as we switch to broadcast immediately and
not after the full bring up via the ACPI cpu idle code.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

27ce4cb4

clockevents: check broadcast device not tick device · 30274569

由 Thomas Gleixner 提交于 9月 22, 2008

Impact: Possible hang on CPU online observed on AMD C1E machines.

The broadcast setup code looks at the mode of the tick device to
determine whether it needs to be shut down or setup. This is wrong
when the broadcast mode is set to one shot already. This can happen
when a CPU is brought online as it goes through the periodic setup
first.

The problem went unnoticed as sane systems do not call into that code
before the switch to one shot for the clock event device happens.
The AMD C1E idle routine switches over immediately and thereby shuts
down the just setup device before the first interrupt happens.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

30274569

17 9月, 2008 1 次提交

clockevents: make device shutdown robust · 2344abbc

由 Thomas Gleixner 提交于 9月 16, 2008

The device shut down does not cleanup the next_event variable of the
clock event device. So when the device is reactivated the possible
stale next_event value can prevent the device to be reprogrammed as it
claims to wait on a event already.

This is the root cause of the resurfacing suspend/resume problem,
where systems need key press to come back to life.

Fix this by setting next_event to KTIME_MAX when the device is shut
down. Use a separate function for shutdown which takes care of that
and only keep the direct set mode call in the broadcast code, where we
can not touch the next_event value.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

2344abbc

06 9月, 2008 1 次提交

clockevents: broadcast fixup possible waiters · 7300711e

由 Thomas Gleixner 提交于 9月 06, 2008

Until the C1E patches arrived there where no users of periodic broadcast
before switching to oneshot mode. Now we need to trigger a possible
waiter for a periodic broadcast when switching to oneshot mode.
Otherwise we can starve them for ever.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

7300711e

05 9月, 2008 1 次提交

clockevents: prevent endless loop lockup · 1fb9b7d2

由 Thomas Gleixner 提交于 9月 03, 2008

The C1E/HPET bug reports on AMDX2/RS690 systems where tracked down to a
too small value of the HPET minumum delta for programming an event.

The clockevents code needs to enforce an interrupt event on the clock event
device in some cases. The enforcement code was stupid and naive, as it just
added the minimum delta to the current time and tried to reprogram the device.
When the minimum delta is too small, then this loops forever.

Add a sanity check. Allow reprogramming to fail 3 times, then print a warning
and double the minimum delta value to make sure, that this does not happen again.
Use the same function for both tick-oneshot and tick-broadcast code.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1fb9b7d2

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功