- 22 7月, 2007 3 次提交
-
-
由 Tim Hockin 提交于
Background: The MCE handler has several paths that it can take, depending on various conditions of the MCE status and the value of the 'tolerant' knob. The exact semantics are not well defined and the code is a bit twisty. Description: This patch makes the MCE handler's behavior more clear by documenting the behavior for various 'tolerant' levels. It also fixes or enhances several small things in the handler. Specifically: * If RIPV is set it is not safe to restart, so set the 'no way out' flag rather than the 'kill it' flag. * Don't panic() on correctable MCEs. * If the _OVER bit is set *and* the _UC bit is set (meaning possibly dropped uncorrected errors), set the 'no way out' flag. * Use EIPV for testing whether an app can be killed (SIGBUS) rather than RIPV. According to docs, EIPV indicates that the error is related to the IP, while RIPV simply means the IP is valid to restart from. * Don't clear the MCi_STATUS registers until after the panic() path. This leaves the status bits set after the panic() so clever BIOSes can find them (and dumb BIOSes can do nothing). This patch also calls nonseekable_open() in mce_open (as suggested by akpm). Result: Tolerant levels behave almost identically to how they always have, but not it's well defined. There's a slightly higher chance of panic()ing when multiple errors happen (a good thing, IMHO). If you take an MBE and panic(), the error status bits are not cleared. Alternatives: None. Testing: I used software to inject correctable and uncorrectable errors. With tolerant = 3, the system usually survives. With tolerant = 2, the system usually panic()s (PCC) but not always. With tolerant = 1, the system always panic()s. When the system panic()s, the BIOS is able to detect that the cause of death was an MC4. I was not able to reproduce the case of a non-PCC error in userspace, with EIPV, with (tolerant < 3). That will be rare at best. Signed-off-by: NTim Hockin <thockin@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tim Hockin 提交于
Background: /dev/mcelog is typically polled manually. This is less than optimal for situations where accurate accounting of MCEs is important. Calling poll() on /dev/mcelog does not work. Description: This patch adds support for poll() to /dev/mcelog. This results in immediate wakeup of user apps whenever the poller finds MCEs. Because the exception handler can not take any locks, it can not call the wakeup itself. Instead, it uses a thread_info flag (TIF_MCE_NOTIFY) which is caught at the next return from interrupt or exit from idle, calling the mce_user_notify() routine. This patch also disables the "fake panic" path of the mce_panic(), because it results in printk()s in the exception handler and crashy systems. This patch also does some small cleanup for essentially unused variables, and moves the user notification into the body of the poller, so it is only called once per poll, rather than once per CPU. Result: Applications can now poll() on /dev/mcelog. When an error is logged (whether through the poller or through an exception) the applications are woken up promptly. This should not affect any previous behaviors. If no MCEs are being logged, there is no overhead. Alternatives: I considered simply supporting poll() through the poller and not using TIF_MCE_NOTIFY at all. However, the time between an uncorrectable error happening and the user application being notified is *the*most* critical window for us. Many uncorrectable errors can be logged to the network if given a chance. I also considered doing the MCE poll directly from the idle notifier, but decided that was overkill. Testing: I used an error-injecting DIMM to create lots of correctable DRAM errors and verified that my user app is woken up in sync with the polling interval. I also used the northbridge to inject uncorrectable ECC errors, and verified (printk() to the rescue) that the notify routine is called and the user app does wake up. I built with PREEMPT on and off, and verified that my machine survives MCEs. [wli@holomorphy.com: build fix] Signed-off-by: NTim Hockin <thockin@google.com> Signed-off-by: NWilliam Irwin <bill.irwin@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tim Hockin 提交于
Background: /dev/mcelog is a clear-on-read interface. It is currently possible for multiple users to open and read() the device. Users are protected from each other during any one read, but not across reads. Description: This patch adds support for O_EXCL to /dev/mcelog. If a user opens the device with O_EXCL, no other user may open the device (EBUSY). Likewise, any user that tries to open the device with O_EXCL while another user has the device will fail (EBUSY). Result: Applications can get exclusive access to /dev/mcelog. Applications that do not care will be unchanged. Alternatives: A simpler choice would be to only allow one open() at all, regardless of O_EXCL. Testing: I wrote an application that opens /dev/mcelog with O_EXCL and observed that any other app that tried to open /dev/mcelog would fail until the exclusive app had closed the device. Caveats: None. Signed-off-by: NTim Hockin <thockin@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 7月, 2007 1 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Rather than using a tri-state integer for the wait flag in call_usermodehelper_exec, define a proper enum, and use that. I've preserved the integer values so that any callers I've missed should still work OK. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andi Kleen <ak@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Howells <dhowells@redhat.com>
-
- 24 6月, 2007 1 次提交
-
-
由 Joshua Wise 提交于
Background: When a userspace application wants to know about machine check events, it opens /dev/mcelog and does a read(). Usually, we found that this interface works well, but in some cases, when the system was taking large numbers of machine check exceptions, the read() would hang. The system would output a soft-lockup warning, and the daemon reading from /dev/mcelog would suck up as much of a single CPU as it could spinning in system space. Description: This patch fixes this bug. In particular, there was a "continue" inside a timeout loop that presumably was intended to break out of the outer loop, but instead caused the inner loop to continue. This patch also makes the condition for the break-out a little more evident by changing a !time_before to a time_after_eq. Result: The read() no longer hangs in this test case. Testing: On my system, I could replicate the bug with the following command: # for i in `seq 15000`; do ./inject_sbe.sh; done where inject_sbe.sh contains commands to inject a single-bit error into the next memory write transaction. Patch: This patch is against git f1518a08. Signed-off-by: NJoshua Wise <jwise@google.com> Signed-off-by: NTim Hockin <thockin@google.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 5月, 2007 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 5月, 2007 1 次提交
-
-
由 Christoph Hellwig 提交于
This patch moves the die notifier handling to common code. Previous various architectures had exactly the same code for it. Note that the new code is compiled unconditionally, this should be understood as an appel to the other architecture maintainer to implement support for it aswell (aka sprinkling a notify_die or two in the proper place) arm had a notifiy_die that did something totally different, I renamed it to arm_notify_die as part of the patch and made it static to the file it's declared and used at. avr32 used to pass slightly less information through this interface and I brought it into line with the other architectures. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix vmalloc_sync_all bustage] [bryan.wu@analog.com: fix vmalloc_sync_all in nommu] Signed-off-by: NChristoph Hellwig <hch@lst.de> Cc: <linux-arch@vger.kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: NBryan Wu <bryan.wu@analog.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 5月, 2007 1 次提交
-
-
由 Tim Hockin 提交于
Background: We've found that MCEs (specifically DRAM SBEs) tend to come in bunches, especially when we are trying really hard to stress the system out. The current MCE poller uses a static interval which does not care whether it has or has not found MCEs recently. Description: This patch makes the MCE poller adjust the polling interval dynamically. If we find an MCE, poll 2x faster (down to 10 ms). When we stop finding MCEs, poll 2x slower (up to check_interval seconds). The check_interval tunable becomes the max polling interval. The "Machine check events logged" printk() is rate limited to the check_interval, which should be identical behavior to the old functionality. Result: If you start to take a lot of correctable errors (not exceptions), you log them faster and more accurately (less chance of overflowing the MCA registers). If you don't take a lot of errors, you will see no change. Alternatives: I considered simply reducing the polling interval to 10 ms immediately and keeping it there as long as we continue to find errors. This felt a bit heavy handed, but does perform significantly better for the default check_interval of 5 minutes (we're using a few seconds when testing for DRAM errors). I could be convinced to go with this, if anyone felt it was not too aggressive. Testing: I used an error-injecting DIMM to create lots of correctable DRAM errors and verified that the polling interval accelerates. The printk() only happens once per check_interval seconds. Patch: This patch is against 2.6.21-rc7. Signed-Off-By: NTim Hockin <thockin@google.com> Signed-off-by: NAndi Kleen <ak@suse.de>
-
- 13 2月, 2007 2 次提交
-
-
由 Andi Kleen 提交于
When a machine check event is detected (including a AMD RevF threshold overflow event) allow to run a "trigger" program. This allows user space to react to such events sooner. The trigger is configured using a new trigger entry in the machinecheck sysfs interface. It is currently shared between all CPUs. I also fixed the AMD threshold handler to run the machine check polling code immediately to actually log any events that might have caused the threshold interrupt. Also added some documentation for the mce sysfs interface. Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Arjan van de Ven 提交于
Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. [akpm@osdl.org: sparc64 fix] Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 12月, 2006 1 次提交
-
-
由 Ingo Molnar 提交于
There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn, prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus generating compiler warnings of unused symbols, hence forcing people to add #ifdefs. the compiler can skip truly unused functions just fine: text data bss dec hex filename 1624412 728710 3674856 6027978 5bfaca vmlinux.before 1624412 728710 3674856 6027978 5bfaca vmlinux.after [akpm@osdl.org: topology.c fix] Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 07 12月, 2006 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Make mce_remove_device() clean up the kobject in per_cpu(device_mce, cpu) after it has been unregistered. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Signed-off-by: NAndi Kleen <ak@suse.de>
-
- 22 11月, 2006 2 次提交
-
-
由 David Howells 提交于
Pass the work_struct pointer to the work function rather than context data. The work function can use container_of() to work out the data. For the cases where the container of the work_struct may go away the moment the pending bit is cleared, it is made possible to defer the release of the structure by deferring the clearing of the pending bit. To make this work, an extra flag is introduced into the management side of the work_struct. This governs auto-release of the structure upon execution. Ordinarily, the work queue executor would release the work_struct for further scheduling or deallocation by clearing the pending bit prior to jumping to the work function. This means that, unless the driver makes some guarantee itself that the work_struct won't go away, the work function may not access anything else in the work_struct or its container lest they be deallocated.. This is a problem if the auxiliary data is taken away (as done by the last patch). However, if the pending bit is *not* cleared before jumping to the work function, then the work function *may* access the work_struct and its container with no problems. But then the work function must itself release the work_struct by calling work_release(). In most cases, automatic release is fine, so this is the default. Special initiators exist for the non-auto-release case (ending in _NAR). Signed-Off-By: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Separate delayable work items from non-delayable work items be splitting them into a separate structure (delayed_work), which incorporates a work_struct and the timer_list removed from work_struct. The work_struct struct is huge, and this limits it's usefulness. On a 64-bit architecture it's nearly 100 bytes in size. This reduces that by half for the non-delayable type of event. Signed-Off-By: NDavid Howells <dhowells@redhat.com>
-
- 26 9月, 2006 2 次提交
-
-
由 Dmitriy Zavin 提交于
Refactor the event processing (syslog messaging and rate limiting) into separate file therm_throt.c. This allows consistent reporting of CPU thermal throttle events. After ACK'ing the interrupt, if the event is current, the user (p4.c/mce_intel.c) calls therm_throt_process to log (and rate limit) the event. If that function returns 1, the user has the option to log things further (such as to mce_log in x86_64). AK: minor cleanup Signed-off-by: NDmitriy Zavin <dmitriyz@google.com> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Andi Kleen 提交于
And replace all users with ordinary smp_processor_id. The function was originally added to get some basic oops information out even if the GS register was corrupted. However that didn't work for some anymore because printk is needed to print the oops and it uses smp_processor_id() already. Also GS register corruptions are not particularly common anymore. This also helps the Xen port which would otherwise need to do this in a special way because it can't access the local APIC. Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: NAndi Kleen <ak@suse.de>
-
- 01 8月, 2006 1 次提交
-
-
由 Chandra Seetharaman 提交于
Use hotplug version of register_cpu_notifier in late init functions. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 28 6月, 2006 2 次提交
-
-
由 Chandra Seetharaman 提交于
Make notifier_blocks associated with cpu_notifier as __cpuinitdata. __cpuinitdata makes sure that the data is init time only unless CONFIG_HOTPLUG_CPU is defined. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Chandra Seetharaman 提交于
In 2.6.17, there was a problem with cpu_notifiers and XFS. I provided a band-aid solution to solve that problem. In the process, i undid all the changes you both were making to ensure that these notifiers were available only at init time (unless CONFIG_HOTPLUG_CPU is defined). We deferred the real fix to 2.6.18. Here is a set of patches that fixes the XFS problem cleanly and makes the cpu notifiers available only at init time (unless CONFIG_HOTPLUG_CPU is defined). If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run time. This patch reverts the notifier_call changes made in 2.6.17 Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 27 6月, 2006 1 次提交
-
-
由 Jacob Shin 提交于
Get rid of /sys/devices/system/threshold directory and move mce_amd thresholding files into the machine sysfs directory -- /sys/devices/system/machinecheck. AK: Fixed warning Signed-off-by: NJacob Shin <jacob.shin@amd.com> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 26 4月, 2006 1 次提交
-
-
由 Chandra Seetharaman 提交于
Few of the notifier_chain_register() callers use __init in the definition of notifier_call. It is incorrect as the function definition should be available after the initializations (they do not unregister them during initializations). This patch fixes all such usages to _not_ have the notifier_call __init section. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 10 4月, 2006 1 次提交
-
-
由 Andi Kleen 提交于
Machine checks can stall the machine for a long time and it's not good to trigger the nmi watchdog during that. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 01 4月, 2006 1 次提交
-
-
由 OGAWA Hirofumi 提交于
The boot cmdline is parsed in parse_early_param() and parse_args(,unknown_bootoption). And __setup() is used in obsolete_checksetup(). start_kernel() -> parse_args() -> unknown_bootoption() -> obsolete_checksetup() If __setup()'s callback (->setup_func()) returns 1 in obsolete_checksetup(), obsolete_checksetup() thinks a parameter was handled. If ->setup_func() returns 0, obsolete_checksetup() tries other ->setup_func(). If all ->setup_func() that matched a parameter returns 0, a parameter is seted to argv_init[]. Then, when runing /sbin/init or init=app, argv_init[] is passed to the app. If the app doesn't ignore those arguments, it will warning and exit. This patch fixes a wrong usage of it, however fixes obvious one only. Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 24 3月, 2006 1 次提交
-
-
由 Akinobu Mita 提交于
While working on these patch set, I found several possible cleanup on x86-64 and ia64. akpm: I stole this from Andi's queue. Not only does it clean up bitops. It also unrelatedly changes the prototype of pci_mmcfg_init() and removes its arch_initcall(). It seems that the wrong two patches got joined together, but this is the one which has been tested. This patch fixes the current x86_64 build error (the pci_mmcfg_init() declaration in arch/i386/pci/pci.h disagrees with the definition in arch/x86_64/pci/mmconfig.c) This also means that x86_64's pci_mmcfg_init() gets called in the same (new) manner as x86's: from arch/i386/pci/init.c:pci_access_init(), rather than via initcall. The bitops cleanups came along for free. All this worked OK in -mm testing (since 2.6.16-rc4-mm1) because x86_64 was tested with both patches applied. Signed-off-by: NAkinobu Mita <mita@miraclelinux.com> Signed-off-by: NAndi Kleen <ak@suse.de> Cc: Con Kolivas <kernel@kolivas.org> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 05 2月, 2006 1 次提交
-
-
由 Ashok Raj 提交于
attached patch is 2 more cases i found via running the reference_init.pl script. These were easy to spot just knowing the file names. There is one another about init/main.c that i cant exactly zero in. (partly because i dont know how to interpret the data thats spewed out of the tool). Signed-off-by: NAshok Raj <ashok.raj@intel.com> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 12 1月, 2006 5 次提交
-
-
由 Andi Kleen 提交于
hard_smp_processor_id would return the local APIC id instead of the Linux processor id. On big systems they are often not identical. safe_smp_processor_id is just a wrapper around it that does the necessary conversions. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andi Kleen 提交于
Hopefully the users will take the hint. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Shaohua Li 提交于
There is one CPU here whose MCE bank count is 6. This patch increases x86_64's MCE bank count. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Jan Beulich 提交于
This adjusts things so that handlers of the die() notifier will have sufficient information about the trap currently being handled. It also adjusts the notify_die() prototype to (again) match that of i386. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Randy Dunlap 提交于
arch: Use <linux/capability.h> where capable() is used. Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 15 11月, 2005 2 次提交
-
-
由 Andi Kleen 提交于
The logging for boot errors was turned off because it was broken on some AMD systems. But give Intel EM64T systems a chance because they are supposed to be correct there. The advantage is that there is a chance to actually log uncorrected machine checks after the reset. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Jacob Shin 提交于
MC4_MISC - DRAM Errors Threshold Register realized under AMD K8 Rev F. This register is used to count correctable and uncorrectable ECC errors that occur during DRAM read operations. The user may interface through sysfs files in order to change the threshold configuration. bank%d/error_count - reads current error count, write to clear. bank%d/interrupt_enable - set/clear interrupt enable. bank%d/threshold_limit - read/write the threshold limit. APIC vector 0xF9 in hw_irq.h. 5 software defined bank ids in mce.h. new apic.c function to setup threshold apic lvt. defaults to interrupt off, count enabled, and threshold limit max. sysfs interface created on /sys/devices/system/threshold. AK: added some ifdefs to make it compile on UP Signed-off-by: NJacob Shin <jacob.shin@amd.com> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 30 9月, 2005 1 次提交
-
-
由 Mike Waychison 提交于
The attempt to fixup the lockless mce log buffer introduced an infinite loop when trying to find a free entry. And: Using rcu_dereference() to load mcelog.next doesn't seem to be sufficient enough to ensure that mcelog.next is loaded each time around the loop in mce_log(). Instead, use an explicit rmb() to ensure that the compiler gets it right. AK: turned the smp_wmbs into true wmbs to make sure they are not reordered by the compiler on UP. Signed-off-by: NMike Waychison <mikew@google.com> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 13 9月, 2005 4 次提交
-
-
由 Randy Dunlap 提交于
Use the add_taint() interface for setting tainted bit flags instead of doing it manually. Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andi Kleen 提交于
The resume code uses CPU hotplug now so at resume time we only ever see one CPU. Pointed out by Yu Luming. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andi Kleen 提交于
Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andi Kleen 提交于
One machine is constantly throwing NMI watchdog timeouts in mce_log This was one attempt to fix it. (AK: this doesn't actually fix the bug I'm seeing unfortunately, probably drop. I don't like it that the reader can spin forever now waiting for a writer) Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 08 8月, 2005 1 次提交
-
-
由 Andi Kleen 提交于
Don't log machine check events left over from boot. Too many BIOSes leave bogus events in there. This unfortunately also makes it impossible to log events that caused a reboot. For people with non broken BIOS there is mce=bootlog Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 29 7月, 2005 1 次提交
-
-
由 Andi Kleen 提交于
This patch will create machinecheck sysdev directories per CPU. All of the cpus still share the same ctl banks. When compiled with CONFIG_HOTPLUG_CPU, it will also bring up/down sysdev directories as cpus go up/down. I have tested the patch along with CONFIG_HOTPLUG_CPU option on in 2.6.13-rc1 kernel. Minor changes by AK: remove useless unload function Signed-off-by: NJacob Shin <jacob.shin@amd.com> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 26 6月, 2005 1 次提交
-
-
由 Paul E. McKenney 提交于
2.6.12-rc6-mm1 has a few remaining synchronize_kernel()s, some (but not all) in comments. This patch changes these synchronize_kernel() calls (and comments) to synchronize_rcu() or synchronize_sched() as follows: - arch/x86_64/kernel/mce.c mce_read(): change to synchronize_sched() to handle races with machine-check exceptions (synchronize_rcu() would not cut it given RCU implementations intended for hardcore realtime use. - drivers/input/serio/i8042.c i8042_stop(): change to synchronize_sched() to handle races with i8042_interrupt() interrupt handler. Again, synchronize_rcu() would not cut it given RCU implementations intended for hardcore realtime use. - include/*/kdebug.h comments: change to synchronize_sched() to handle races with NMIs. As before, synchronize_rcu() would not cut it... - include/linux/list.h comment: change to synchronize_rcu(), since this comment is for list_del_rcu(). - security/keys/key.c unregister_key_type(): change to synchronize_rcu(), since this is interacting with RCU read side. - security/keys/process_keys.c install_session_keyring(): change to synchronize_rcu(), since this is interacting with RCU read side. Signed-off-by: N"Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-