- 30 1月, 2008 25 次提交
-
-
由 Thomas Gleixner 提交于
Use the same name for the 32 and 64 bit variant. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Thomas Gleixner 提交于
Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Thomas Gleixner 提交于
White space and coding style cleanups. Change unsigned to int. There is no win when we compare mincount against pc->size, which is an int as well. Casting pc->size to unsigned just might hide real problems. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Thomas Gleixner 提交于
Create a ldt write accessor like the 32 bit one. Preparatory patch for merging ldt.c and anyway necessary for 64bit paravirt ops. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Thomas Gleixner 提交于
White space and coding style clenaup. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Thomas Gleixner 提交于
White space and coding style cleanup. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
code cleanups: errors lines of code errors/KLOC arch/x86/kernel/pci-gart_64.c 183 748 244.6 arch/x86/kernel/pci-gart_64.c 0 790 0 Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Ingo Molnar 提交于
clean up arch/x86/kernel/aperture_64.c printk()s. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Ingo Molnar 提交于
whitespace cleanup. No code changed: text data bss dec hex filename 2080 76 4 2160 870 aperture_64.o.before 2080 76 4 2160 870 aperture_64.o.after errors lines of code errors/KLOC arch/x86/kernel/aperture_64.c 114 299 381.2 arch/x86/kernel/aperture_64.c 0 315 0 Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Hiroshi Shimamoto 提交于
local_irq_enable() is missing after sched_clock_idle_wakeup_event(). Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC in HLT too, not just when going through the ACPI methods. (the ACPI idle code already does this.) [ update the 64-bit side too, as noticed by Jiri Slaby. ] Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Guillaume Chazarain 提交于
scale the sched_clock() cyc_2_nsec scaling factor according to CPU frequency changes. [ mingo@elte.hu: simplified it and fixed it for SMP. ] Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Roland McGrath 提交于
cf http://lkml.org/lkml/2007/10/3/41 To summarize: on Linux, SA_ONSTACK decides whether you are already on the signal stack based on the value of the SP at the time of a signal. If you are not already inside the range, you are not "on the signal stack" and so the new signal handler frame starts over at the base of the signal stack. sigaltstack (and sigstack before it) was invented in BSD. There, the SA_ONSTACK behavior has always been different. It uses a kernel state flag to decide, rather than the SP value. When you first take an SA_ONSTACK signal and switch to the alternate signal stack, it sets the SS_ONSTACK flag in the thread's sigaltstack state in the kernel. Thereafter you are "on the signal stack" and don't switch SP before pushing a handler frame no matter what the SP value is. Only when you sigreturn from the original handler context do you clear the SS_ONSTACK flag so that a new handler frame will start over at the base of the alternate signal stack. The undesireable effect of the Linux behavior is that an overflow of the alternate signal stack can not only go undetected, but lead to a ring buffer effect of clobbering the original handler frame at the base of the signal stack for each successive signal that comes just after the overflow. This is what Shi Weihua's test case demonstrates. Normally this does not come up because of the signal mask, but the test case uses SA_NODEFER for its SIGSEGV handler. The other subtle part of the existing Linux semantics is that a simple longjmp out of a signal handler serves to take you off the signal stack in a safe and reliable fashion without having used sigreturn (nor having just returned from the handler normally, which means the same). After the longjmp (or even informal stack switching not via any proper libc or kernel interface), the alternate signal stack stands ready to be used again. A paranoid program would allocate a PROT_NONE red zone around its alternate signal stack. Then a small overflow would trigger a SIGSEGV in handler setup, and be fatal (core dump) whether or not SIGSEGV is blocked. As with thread stack red zones, that cannot catch all overflows (or underflows). e.g., a local array as large as page size allocated in a function called from a handler, but not actually touched before more calls push more stack, could cause an overflow that silently pushes into some unrelated allocated pages. The BSD behavior does not do anything in particular about overflow. But it does at least avoid the wraparound or "ring buffer effect", so you'll just get a straightforward all-out overflow down your address space past the low end of the alternate signal stack. I don't know what the BSD behavior is for longjmp out of an SA_ONSTACK handler. The POSIX wording relating to sigaltstack is pretty minimal. I don't think it speaks to this issue one way or another. (The program that overflows its stack is clearly in undefined behavior territory of one sort or another anyhow.) Given the longjmp issue and the potential for highly subtle complications in existing programs relying on this in arcane ways deep in their code, I am very dubious about changing the behavior to the BSD style persistent flag. I think Shi Weihua's patches have a similar effect by tracking the SP used in the last handler setup. I think it would be sensible for the signal handler setup code to detect when it would itself be causing a stack overflow. Maybe something like the following patch (untested). This issue exists in the same way on all machines, so ideally they would all do a similar check. When it's the handler function itself or its callees that cause the overflow, rather than the signal handler frame setup alone crossing the boundary, this still won't help. But I don't see any way to distinguish that from the valid longjmp case. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Ingo Molnar 提交于
add the DMI strings provided by Islam Amer <pharon@gmail.com>, for the Compaq Presario V6000 (Quanta/30B7). Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Ingo Molnar 提交于
various changes to the in_p/out_p delay details: - add the io_delay=none method - make each method selectable from the kernel config - simplify the delay code a bit by getting rid of an indirect function call - add the /proc/sys/kernel/io_delay_type sysctl - change 'io_delay=standard|alternate' to io_delay=0x80 and io_delay=0xed - make the io delay config not depend on CONFIG_DEBUG_KERNEL Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: N"David P. Reed" <dpreed@reed.com>
-
由 Rene Herman 提交于
x86: provide a DMI based port 0x80 I/O delay override. Certain (HP) laptops experience trouble from our port 0x80 I/O delay writes. This patch provides for a DMI based switch to the "alternate diagnostic port" 0xed (as used by some BIOSes as well) for these. David P. Reed confirmed that port 0xed works for him and provides a proper delay. The symptoms of _not_ working are a hanging machine, with "hwclock" use being a direct trigger. Earlier versions of this attempted to simply use udelay(2), with the 2 being a value tested to be a nicely conservative upper-bound with help from many on the linux-kernel mailinglist but that approach has two problems. First, pre-loops_per_jiffy calibration (which is post PIT init while some implementations of the PIT are actually one of the historically problematic devices that need the delay) udelay() isn't particularly well-defined. We could initialise loops_per_jiffy conservatively (and based on CPU family so as to not unduly delay old machines) which would sort of work, but... Second, delaying isn't the only effect that a write to port 0x80 has. It's also a PCI posting barrier which some devices may be explicitly or implicitly relying on. Alan Cox did a survey and found evidence that additionally some drivers may be racy on SMP without the bus locking outb. Switching to an inb() makes the timing too unpredictable and as such, this DMI based switch should be the safest approach for now. Any more invasive changes should get more rigid testing first. It's moreover only very few machines with the problem and a DMI based hack seems to fit that situation. This also introduces a command-line parameter "io_delay" to override the DMI based choice again: io_delay=<standard|alternate> where "standard" means using the standard port 0x80 and "alternate" port 0xed. This retains the udelay method as a config (CONFIG_UDELAY_IO_DELAY) and command-line ("io_delay=udelay") choice for testing purposes as well. This does not change the io_delay() in the boot code which is using the same port 0x80 I/O delay but those do not appear to be a problem as David P. Reed reported the problem was already gone after using the udelay version. He moreover reported that booting with "acpi=off" also fixed things and seeing as how ACPI isn't touched until after this DMI based I/O port switch I believe it's safe to leave the ones in the boot code be. The DMI strings from David's HP Pavilion dv9000z are in there already and we need to get/verify the DMI info from other machines with the problem, notably the HP Pavilion dv6000z. This patch is partly based on earlier patches from Pavel Machek and David P. Reed. Signed-off-by: NRene Herman <rene.herman@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Mike Galbraith 提交于
s2ram recently became useful here, except for the kernel's annoying habit of disabling my P4's perfectly good TSC. [ 107.894470] CPU 1 is now offline [ 107.894474] SMP alternatives: switching to UP code [ 107.895832] CPU0 attaching sched-domain: [ 107.895836] domain 0: span 1 [ 107.895838] groups: 1 [ 107.896097] CPU1 is down [ 3.726156] Intel machine check architecture supported. [ 3.726165] Intel machine check reporting enabled on CPU#0. [ 3.726167] CPU0: Intel P4/Xeon Extended MCE MSRs (12) available [ 3.726170] CPU0: Thermal monitoring enabled [ 3.726175] Back to C! [ 3.726708] Force enabled HPET at resume [ 3.726775] Enabling non-boot CPUs ... [ 3.727049] CPU0 attaching NULL sched-domain. [ 3.727165] SMP alternatives: switching to SMP code [ 3.727858] Booting processor 1/1 eip 3000 [ 3.727862] CPU 1 irqstacks, hard=b042f000 soft=b042d000 [ 3.738173] Initializing CPU#1 [ 3.798912] Calibrating delay using timer specific routine.. 5986.12 BogoMIPS (lpj=2993061) [ 3.798920] CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 00000000 [ 3.798931] CPU: Trace cache: 12K uops, L1 D cache: 8K [ 3.798934] CPU: L2 cache: 512K [ 3.798936] CPU: Physical Processor ID: 0 [ 3.798938] CPU: After all inits, caps: bfebfbff 00000000 00000000 0000b080 00004400 00000000 00000000 00000000 [ 3.798946] Intel machine check architecture supported. [ 3.798952] Intel machine check reporting enabled on CPU#1. [ 3.798955] CPU1: Intel P4/Xeon Extended MCE MSRs (12) available [ 3.798959] CPU1: Thermal monitoring enabled [ 3.799161] CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 09 [ 3.799187] checking TSC synchronization [CPU#0 -> CPU#1]: [ 3.819181] Measured 63588552840 cycles TSC warp between CPUs, turning off TSC clock. [ 3.819184] Marking TSC unstable due to: check_tsc_sync_source failed. If check_tsc_warp() is called after initial boot, and the TSC has in the meantime been set (BIOS, user, silicon, elves) to a value lower than the last stored/stale value, we blame the TSC. Reset to pristine condition after every test. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Rafael J. Wysocki 提交于
Document the fact that __save_processor_state() has to save all CPU registers referred to by the kernel in case a different kernel is used to load and restore a hibernation image containing it. Sigend-off-by: NRafael J. Wysocki <rjw@sisk.pl> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Balaji Rao 提交于
Looks like IRQ 31 is assigned to timer 3, even without the patch! I wonder who wrote the number 31. But the manual says that it is zero by default. I think we should check whether the timer has been allocated an IRQ before proceeding to assign one to it. Here is a patch that does this. Signed-off-by: NBalaji Rao <balajirrao@gmail.com> Tested-by: NYinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Balaji Rao 提交于
The userspace API for the HPET (see Documentation/hpet.txt) did not work. The HPET_IE_ON ioctl was failing as there was no IRQ assigned to the timer device. This patch fixes it by allocating IRQs to timer blocks in the HPET. arch/x86/kernel/hpet.c | 13 +++++-------- drivers/char/hpet.c | 45 ++++++++++++++++++++++++++++++++++++++------- include/linux/hpet.h | 2 +- 3 files changed, 44 insertions(+), 16 deletions(-) Signed-off-by: NBalaji Rao <balajirrao@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
The following scenario might leave PIT as a disfunctional clock source: PIT is registered as clocksource PM_TIMER is registered as clocksource and enables highres/dyntick mode PIT is switched to oneshot mode -> now the readout of PIT is bogus, but the user might select PIT via the sysfs override, which would break the box as the time readout is unusable. Unregister the PIT clocksource when the PIT clock event device is switched into shutdown / oneshot mode. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Thomas Gleixner 提交于
On x86 the PIT might become an unusable clocksource. Add an unregister function to provide a possibilty to remove the PIT from the list of available clock sources. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Thomas Gleixner 提交于
PIT clocksource is registered unconditionally even when HPET is enabled or when PIT is replaced by the local APIC timer. In both cases PIT can not be used as it is stopped and the readout would be stale. Prevent registering PIT in those cases. patch depends on: x86: offer is_hpet_enabled() on !CONFIG_HPET_TIMER too Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Pavel Machek 提交于
I was confused by FSEC = 10^15 NSEC statement, plus small whitespace fixes. When there's copyright, there should be GPL. Signed-off-by: NPavel Machek <pavel@suse.cz> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Greg KH 提交于
the logic in this function is just crazy. It's recursive, but we can circumvent the creation for the kobject and whole creation of the threshold_block if some conditions are met. That's why we see the allocate_threshold_blocks so many times in the callstack, yet only a few kobjects created. Then we blow up in kobject_uevent_env() on the first debug printk. Which means that we are just passing in garbage. Man, this is one time that comments in code would have been very nice to have, and why forward goto's into major code blocks are just evil... Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 29 1月, 2008 1 次提交
-
-
由 Sam Ravnborg 提交于
This patch consolidate all definitions of .init.text, .init.data and .exit.text, .exit.data section definitions in the generic vmlinux.lds.h. This is a preparational patch - alone it does not buy us much good. Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
-
- 26 1月, 2008 3 次提交
-
-
由 Arjan van de Ven 提交于
LatencyTOP kernel infrastructure; it measures latencies in the scheduler and tracks it system wide and per process. Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Use HR-timers (when available) to deliver an accurate preemption tick. The regular scheduler tick that runs at 1/HZ can be too coarse when nice level are used. The fairness system will still keep the cpu utilisation 'fair' by then delaying the task that got an excessive amount of CPU time but try to minimize this by delivering preemption points spot-on. The average frequency of this extra interrupt is sched_latency / nr_latency. Which need not be higher than 1/HZ, its just that the distribution within the sched_latency period is important. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Gautham R Shenoy 提交于
Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use get_online_cpus and put_online_cpus instead as it highlights the refcount semantics in these operations. The new API guarantees protection against the cpu-hotplug operation, but it doesn't guarantee serialized access to any of the local data structures. Hence the changes needs to be reviewed. In case of pseries_add_processor/pseries_remove_processor, use cpu_maps_update_begin()/cpu_maps_update_done() as we're modifying the cpu_present_map there. Signed-off-by: NGautham R Shenoy <ego@in.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 25 1月, 2008 6 次提交
-
-
由 Kay Sievers 提交于
All kobjects require a dynamically allocated name now. We no longer need to keep track if the name is statically assigned, we can just unconditionally free() all kobject names on cleanup. Signed-off-by: NKay Sievers <kay.sievers@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Greg Kroah-Hartman 提交于
There is no need for kobject_unregister() anymore, thanks to Kay's kobject cleanup changes, so replace all instances of it with kobject_put(). Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Greg Kroah-Hartman 提交于
Stop using kobject_register, as this way we can control the sending of the uevent properly, after everything is properly initialized. Cc: Jacob Shin <jacob.shin@amd.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Greg Kroah-Hartman 提交于
Make this kobject dynamic and convert it to not use kobject_register, which is going away. Cc: Jacob Shin <jacob.shin@amd.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Greg Kroah-Hartman 提交于
Stop using kobject_register, as this way we can control the sending of the uevent properly, after everything is properly initialized. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Rafael J. Wysocki 提交于
This patch reorganizes the way suspend and resume notifications are sent to drivers. The major changes are that now the PM core acquires every device semaphore before calling the methods, and calls to device_add() during suspends will fail, while calls to device_del() during suspends will block. It also provides a way to safely remove a suspended device with the help of the PM core, by using the device_pm_schedule_removal() callback introduced specifically for this purpose, and updates two drivers (msr and cpuid) that need to use it. Signed-off-by: NAlan Stern <stern@rowland.harvard.edu> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 23 1月, 2008 1 次提交
-
-
由 Jordan Crouse 提交于
When we set the MFGPT timer tick, there is a chance that we'll immediately assert an event. If for some reason the IRQ routing for this clock has been setup for some other purpose, then we could end up firing an interrupt into the SMM handler or worse. This rearranges the timer tick init function to initalize the handler before we set up the MFGPT clock to make sure that even if we get an event, it will go to the handler. Furthermore, in the handler we need to make sure that we clear the event, even if the timer isn't running. Signed-off-by: NJordan Crouse <jordan.crouse@amd.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NIngo Molnar <mingo@elte.hu> Tested-by: NArnd Hannemann <hannemann@i4.informatik.rwth-aachen.de>
-
- 22 1月, 2008 1 次提交
-
-
由 Thomas Gleixner 提交于
This reverts commit d4d25dec. It tried to fix long standing bugzilla entries, but the solution was reported to break other systems. The reporter of http://bugzilla.kernel.org/show_bug.cgi?id=9791 tracked it down to this commit and confirmed that reverting the patch restores the correct behaviour. It's too late in the release cycle to find a better solution than reverting the commit to avoid regressions. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NIngo Molnar <mingo@elte.hu>
-
- 16 1月, 2008 1 次提交
-
-
由 Peter Zijlstra 提交于
On Sat, 2007-12-29 at 18:06 +0100, Marcin Slusarz wrote: > Hi > Today I've got this (while i was upgrading my gentoo box): > > WARNING: at kernel/lockdep.c:2658 check_flags() > Pid: 21680, comm: conftest Not tainted 2.6.24-rc6 #63 > > Call Trace: > [<ffffffff80253457>] check_flags+0x1c7/0x1d0 > [<ffffffff80257217>] lock_acquire+0x57/0xc0 > [<ffffffff8024d5c0>] __atomic_notifier_call_chain+0x60/0xd0 > [<ffffffff8024d641>] atomic_notifier_call_chain+0x11/0x20 > [<ffffffff8024d67e>] notify_die+0x2e/0x30 > [<ffffffff8020da0a>] do_divide_error+0x5a/0xa0 > [<ffffffff80522bdd>] trace_hardirqs_on_thunk+0x35/0x3a > [<ffffffff80255b89>] trace_hardirqs_on+0xd9/0x180 > [<ffffffff80522bdd>] trace_hardirqs_on_thunk+0x35/0x3a > [<ffffffff80523c2d>] error_exit+0x0/0xa9 > > possible reason: unannotated irqs-off. > irq event stamp: 4693 > hardirqs last enabled at (4693): [<ffffffff80522bdd>] trace_hardirqs_on_thunk+0x35/0x3a > hardirqs last disabled at (4692): [<ffffffff80522c17>] trace_hardirqs_off_thunk+0x35/0x37 > softirqs last enabled at (3546): [<ffffffff80238343>] __do_softirq+0xb3/0xd0 > softirqs last disabled at (3521): [<ffffffff8020c97c>] call_softirq+0x1c/0x30 more early fixups for notify_die().. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 15 1月, 2008 2 次提交
-
-
由 Bernhard Walle 提交于
In the current code, RTC_AIE doesn't work if the RTC relies on CONFIG_HPET_EMULATE_RTC because the code sets the RTC_AIE flag in hpet_set_rtc_irq_bit(). The interrupt handles does accidentally check for RTC_PIE and not RTC_AIE when comparing the time which was set in hpet_set_alarm_time(). I now verified on a test system here that without the patch applied, the attached test program fails on a system that has HPET with 2.6.24-rc7-default. That's not critical since I guess the problem has been there for several kernel releases, but as the fix is quite obvious. Configuration is CONFIG_RTC=y and CONFIG_HPET_EMULATE_RTC=y. Signed-off-by: NBernhard Walle <bwalle@suse.de> Acked-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
Sometimes cpu_idle_wait gets stuck because it might miss CPUS that are already in idle, have no tasks waiting to run and have no interrupts going to them. This is common on bootup when switching cpu idle governors. This patch gives those CPUS that don't check in an IPI kick. Background: ----------- I notice this while developing the mcount patches, that every once in a while the system would hang. Looking deeper, the hang was always at boot up when registering init_menu of the cpu_idle menu governor. Talking with Thomas Gliexner, we discovered that one of the CPUS had no timer events scheduled for it and it was in idle (running with NO_HZ). So the CPU would not set the cpu_idle_state bit. Hitting sysrq-t a few times would eventually route the interrupt to the stuck CPU and the system would continue. Note, I would have used the PDA isidle but that is set after the cpu_idle_state bit is cleared, and would leave a window open where we may miss being kicked. hmm, looking closer at this, we still have a small race window between clearing the cpu_idle_state and disabling interrupts (hence the RFC). CPU0: CPU 1: --------- --------- cpu_idle_wait(): cpu_idle(): | __cpu_cpu_var(is_idle) = 1; | if (__get_cpu_var(cpu_idle_state)) /* == 0 */ per_cpu(cpu_idle_state, 1) = 1; | if (per_cpu(is_idle, 1)) /* == 1 */ | smp_call_function(1) | | receives ipi and runs do_nothing. wait on map == empty idle(); /* waits forever */ So really we need interrupts off for most of this then. One might think that we could simply clear the cpu_idle_state from do_nothing, but I'm assuming that cpu_idle governors can be removed, and this might cause a race that a governor might be used after the module was removed. Venki said: I think your RFC patch is the right solution here. As I see it, there is no race with your RFC patch. As long as you call a dummy smp_call_function on all CPUs, we should be OK. We can get rid of cpu_idle_state and the current wait forever logic altogether with dummy smp_call_function. And so there wont be any wait forever scenario. The whole point of cpu_idle_wait() is to make all CPUs come out of idle loop atleast once. The caller will use cpu_idle_wait something like this. // Want to change idle handler - Switch global idle handler to always present default_idle - call cpu_idle_wait so that all cpus come out of idle for an instant and stop using old idle pointer and start using default idle - Change the idle handler to a new handler - optional cpu_idle_wait if you want all cpus to start using the new handler immediately. Maybe the below 1s patch is safe bet for .24. But for .25, I would say we just replace all complicated logic by simple dummy smp_call_function and remove cpu_idle_state altogether. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: NIngo Molnar <mingo@elte.hu> Acked-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@suse.de> Cc: Len Brown <lenb@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-