- 30 4月, 2012 3 次提交
-
-
由 Borislav Petkov 提交于
Turn off MC4_MISC thresholding banks on models which have them but that particular processor implementation does not supply applicable error sources to be counted. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Borislav Petkov 提交于
Depending on whether the box supports the APIC LVT interrupt for thresholding, we want to show the 'interrupt_enable' sysfs node or not. Make that the case by adding it to the default sysfs attributes only if it is supported. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Borislav Petkov 提交于
Currently, the APIC LVT interrupt for error thresholding is implicitly enabled. However, there are models in the F15h range which do not enable it. Make the code machinery which sets up the APIC interrupt support an optional setting and add an ->interrupt_capable member to the bank representation mirroring that capability and enable the interrupt offset programming only if it is true. Simplify code and fixup comment style while at it. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
- 29 3月, 2012 1 次提交
-
-
由 David Howells 提交于
Disintegrate asm/system.h for X86. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NH. Peter Anvin <hpa@zytor.com> cc: x86@kernel.org
-
- 07 3月, 2012 1 次提交
-
-
由 Srivatsa S. Bhat 提交于
While booting, the following message is seen: [ 21.665087] =============================== [ 21.669439] [ INFO: suspicious RCU usage. ] [ 21.673798] 3.2.0-0.0.0.28.36b5ec9-default #2 Not tainted [ 21.681353] ------------------------------- [ 21.685864] arch/x86/kernel/cpu/mcheck/mce.c:194 suspicious rcu_dereference_index_check() usage! [ 21.695013] [ 21.695014] other info that might help us debug this: [ 21.695016] [ 21.703488] [ 21.703489] rcu_scheduler_active = 1, debug_locks = 1 [ 21.710426] 3 locks held by modprobe/2139: [ 21.714754] #0: (&__lockdep_no_validate__){......}, at: [<ffffffff8133afd3>] __driver_attach+0x53/0xa0 [ 21.725020] #1: [ 21.725323] ioatdma: Intel(R) QuickData Technology Driver 4.00 [ 21.733206] (&__lockdep_no_validate__){......}, at: [<ffffffff8133afe1>] __driver_attach+0x61/0xa0 [ 21.743015] #2: (i7core_edac_lock){+.+.+.}, at: [<ffffffffa01cfa5f>] i7core_probe+0x1f/0x5c0 [i7core_edac] [ 21.753708] [ 21.753709] stack backtrace: [ 21.758429] Pid: 2139, comm: modprobe Not tainted 3.2.0-0.0.0.28.36b5ec9-default #2 [ 21.768253] Call Trace: [ 21.770838] [<ffffffff810977cd>] lockdep_rcu_suspicious+0xcd/0x100 [ 21.777366] [<ffffffff8101aa41>] drain_mcelog_buffer+0x191/0x1b0 [ 21.783715] [<ffffffff8101aa78>] mce_register_decode_chain+0x18/0x20 [ 21.790430] [<ffffffffa01cf8db>] i7core_register_mci+0x2fb/0x3e4 [i7core_edac] [ 21.798003] [<ffffffffa01cfb14>] i7core_probe+0xd4/0x5c0 [i7core_edac] [ 21.804809] [<ffffffff8129566b>] local_pci_probe+0x5b/0xe0 [ 21.810631] [<ffffffff812957c9>] __pci_device_probe+0xd9/0xe0 [ 21.816650] [<ffffffff813362e4>] ? get_device+0x14/0x20 [ 21.822178] [<ffffffff81296916>] pci_device_probe+0x36/0x60 [ 21.828061] [<ffffffff8133ac8a>] really_probe+0x7a/0x2b0 [ 21.833676] [<ffffffff8133af23>] driver_probe_device+0x63/0xc0 [ 21.839868] [<ffffffff8133b01b>] __driver_attach+0x9b/0xa0 [ 21.845718] [<ffffffff8133af80>] ? driver_probe_device+0xc0/0xc0 [ 21.852027] [<ffffffff81339168>] bus_for_each_dev+0x68/0x90 [ 21.857876] [<ffffffff8133aa3c>] driver_attach+0x1c/0x20 [ 21.863462] [<ffffffff8133a64d>] bus_add_driver+0x16d/0x2b0 [ 21.869377] [<ffffffff8133b6dc>] driver_register+0x7c/0x160 [ 21.875220] [<ffffffff81296bda>] __pci_register_driver+0x6a/0xf0 [ 21.881494] [<ffffffffa01fe000>] ? 0xffffffffa01fdfff [ 21.886846] [<ffffffffa01fe047>] i7core_init+0x47/0x1000 [i7core_edac] [ 21.893737] [<ffffffff810001ce>] do_one_initcall+0x3e/0x180 [ 21.899670] [<ffffffff810a9b95>] sys_init_module+0xc5/0x220 [ 21.905542] [<ffffffff8149bc39>] system_call_fastpath+0x16/0x1b Fix this by using ACCESS_ONCE() instead of rcu_dereference_check_mce() over mcelog.next. Since the access to each entry is controlled by the ->finished field, ACCESS_ONCE() should work just fine. An rcu_dereference is unnecessary here. Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Suggested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
- 23 2月, 2012 2 次提交
-
-
由 Naoya Horiguchi 提交于
Current kernel MCE code reads ERST at the first reading of /dev/mcelog (maybe in starting mcelogd,) even if the system does not support ERST, which results in a fake "no such device" message (as described in [1].) This problem is not critical, but can confuse system admins. This patch fixes it by filtering the return value from lower (ACPI) layer. [1] http://thread.gmane.org/gmane.linux.kernel/1060250 Reported by: Jon Masters <jonathan@jonmasters.org> Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Link: https://lkml.org/lkml/2012/1/23/299Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Greg Kroah-Hartman 提交于
When I previously fixed up the mce_device code, I used a static array of the pointers. It was (rightfully) pointed out to me that I should be using the per_cpu code instead. This patch converts the code over to that structure, moving the variable back into the per_cpu area, like it used to be for 3.2 and earlier. Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de> Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Link: https://lkml.org/lkml/2012/1/27/165Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 22 2月, 2012 1 次提交
-
-
由 Borislav Petkov 提交于
141168c3 ("x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86'") removed a bunch of CONFIG_SMP ifdefs around code touching struct cpuinfo_x86 members but also caused the following build error with Randy's randconfigs: mce_amd.c:(.cpuinit.text+0x4723): undefined reference to `cpu_llc_shared_map' Restore the #ifdef in threshold_create_bank() which creates symlinks on the non-BSP CPUs. There's a better patch series being worked on by Kevin Winchester which will solve this in a cleaner fashion, but that series is too ambitious for v3.3 merging - so we first queue up this trivial fix and then do the rest for v3.4. Signed-off-by: NBorislav Petkov <bp@alien8.de> Acked-by: NKevin Winchester <kjwinchester@gmail.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Nick Bowler <nbowler@elliptictech.com> Link: http://lkml.kernel.org/r/20120203191801.GA2846@x1.osrc.amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 27 1月, 2012 1 次提交
-
-
由 Tony Luck 提交于
Magic constants like 0x0134 in code just invite questions on where they come from, what they mean, can they be changed. Provide #defines for the architecturally defined MCACOD values with a reference to the Intel Software Developers manual which describes them. Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 17 1月, 2012 1 次提交
-
-
由 Greg Kroah-Hartman 提交于
When suspending, there was a large list of warnings going something like: Device 'machinecheck1' does not have a release() function, it is broken and must be fixed This patch turns the static mce_devices into dynamically allocated, and properly frees them when they are removed from the system. It solves the warning messages on my laptop here. Reported-by: N"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Tested-by: NDjalal Harouni <tixxdz@opendz.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@amd64.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 1月, 2012 1 次提交
-
-
由 Srivatsa S. Bhat 提交于
Commit 8a25a2fd ("cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem") changed how things are dealt with in the MCE subsystem. Some of the things that got broken due to this are CPU hotplug and suspend/hibernate. MCE uses per_cpu allocations of struct device. So, when a CPU goes offline and comes back online, in order to ensure that we start from a clean slate with respect to the MCE subsystem, zero out the entire per_cpu device structure to 0 before using it. Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 1月, 2012 5 次提交
-
-
由 Tony Luck 提交于
Action required data path signature is defined in table 15-19 of SDM: +-----------------------------------------------------------------------------+ | SRAR Error | Valid | OVER | UC | EN | MISCV | ADDRV | PCC | S | AR | MCACOD | | Data Load | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0x134 | +-----------------------------------------------------------------------------+ Recognise this, and pass MCE_AR_SEVERITY code back to do_machine_check() if we have the action handler configured (CONFIG_MEMORY_FAILURE=y) Acked-by: NBorislav Petkov <bp@amd64.org> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Tony Luck 提交于
All non-urgent actions (reporting low severity errors and handling "action-optional" errors) are now handled by a work queue. This means that TIF_MCE_NOTIFY can be used to block execution for a thread experiencing an "action-required" fault until we get all cpus out of the machine check handler (and the thread that hit the fault into mce_notify_process(). We use the new mce_{save,find,clear}_info() API to get information from do_machine_check() to mce_notify_process(), and then use the newly improved memory_failure(..., MF_ACTION_REQUIRED) to handle the error (possibly signalling the process). Update some comments to make the new code flows clearer. Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Tony Luck 提交于
Machine checks on Intel cpus interrupt execution on all cpus, regardless of interrupt masking. We have a need to save some data about the cause of the machine check (physical address) in the machine check handler that can be retrieved later to attempt recovery in a more flexible execution state. Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Tony Luck 提交于
The MCI_STATUS_MISCV and MCI_STATUS_ADDRV bits in the bank status registers define whether the MISC and ADDR registers respectively contain valid data - provide a helper function to check these bits and read the registers when needed. In addition, processors that support software error recovery (as indicated by the MCG_SER_P bit in the MCG_CAP register) may include some undefined bits in the ADDR register - mask these out. Acked-by: NBorislav Petkov <bp@amd64.org> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Tony Luck 提交于
There is only one caller of memory_failure(), all other users call __memory_failure() and pass in the flags argument explicitly. The lone user of memory_failure() will soon need to pass flags too. Add flags argument to the callsite in mce.c. Delete the old memory_failure() function, and then rename __memory_failure() without the leading "__". Provide clearer message when action optional memory errors are ignored. Acked-by: NBorislav Petkov <bp@amd64.org> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 22 12月, 2011 1 次提交
-
-
由 Kay Sievers 提交于
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem and converts the devices to regular devices. The sysdev drivers are implemented as subsystem interfaces now. After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Userspace relies on events and generic sysfs subsystem infrastructure from sysdev devices, which are made available with this conversion. Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@amd64.org> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Len Brown <lenb@kernel.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: NKay Sievers <kay.sievers@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 21 12月, 2011 1 次提交
-
-
由 Kevin Winchester 提交于
Several fields in struct cpuinfo_x86 were not defined for the !SMP case, likely to save space. However, those fields still have some meaning for UP, and keeping them allows some #ifdef removal from other files. The additional size of the UP kernel from this change is not significant enough to worry about keeping up the distinction: text data bss dec hex filename 4737168 506459 972040 6215667 5ed7f3 vmlinux.o.before 4737444 506459 972040 6215943 5ed907 vmlinux.o.after for a difference of 276 bytes for an example UP config. If someone wants those 276 bytes back badly then it should be implemented in a cleaner way. Signed-off-by: NKevin Winchester <kjwinchester@gmail.com> Cc: Steffen Persvold <sp@numascale.com> Link: http://lkml.kernel.org/r/1324428742-12498-1-git-send-email-kjwinchester@gmail.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 17 12月, 2011 1 次提交
-
-
由 Chen Gong 提交于
mce-inject provides a mechanism to simulate errors so that test scripts can check for correct operation of the kernel without requiring any specialized hardware to create rare events. The existing code can simulate events in normal process context and also in NMI context - but not in IRQ context. This patch fills that gap. Link: https://lkml.org/lkml/2011/12/7/537Signed-off-by: NChen Gong <gong.chen@linux.intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 15 12月, 2011 1 次提交
-
-
由 Fenghua Yu 提交于
Thermal throttle and power limit events are not defined as MCE errors in x86 architecture and should not generate MCE errors in mcelog. Current kernel generates fake software defined MCE errors for these events. This may confuse users because they may think the machine has real MCE errors while actually only thermal throttle or power limit events happen. To make it worse, buggy firmware on some platforms may falsely generate the events. Therefore, kernel reports MCE errors which users think as real hardware errors. Although the firmware bugs should be fixed, on the other hand, kernel should not report MCE errors either. So mcelog is not a good mechanism to report these events. To report the events, we count them in respective counters (core_power_limit_count, package_power_limit_count, core_throttle_count, and package_throttle_count) in /sys/devices/system/cpu/cpu#/thermal_throttle/. Users can check the counters for each event on each CPU. Please note that all CPU's on one package report duplicate counters. It's user application's responsibity to retrieve a package level counter for one package. This patch doesn't report package level power limit, core level power limit, and package level thermal throttle events in mcelog. When the events happen, only report them in respective counters in sysfs. Since core level thermal throttle has been legacy code in kernel for a while and users accepted it as MCE error in mcelog, core level thermal throttle is still reported in mcelog. In the mean time, the event is counted in a counter in sysfs as well. Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Acked-by: NBorislav Petkov <bp@amd64.org> Acked-by: NTony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20111215001945.GA21009@linux-os.sc.intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 14 12月, 2011 2 次提交
-
-
由 Borislav Petkov 提交于
Add a function which drains whatever MCEs were logged in already during boot and before the decoder chains were registered. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Borislav Petkov 提交于
No functionality change, this is done so that in a follow-on patch all queued-up MCEs can be decoded after registering on the chain. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
- 12 12月, 2011 1 次提交
-
-
由 Frederic Weisbecker 提交于
Interrupts notify the idle exit state before calling irq_enter(). But the notifier code calls rcu_read_lock() and this is not allowed while rcu is in an extended quiescent state. We need to wait for irq_enter() -> rcu_idle_exit() to be called before doing so otherwise this results in a grumpy RCU: [ 0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110() [ 0.099991] Hardware name: AMD690VM-FMH [ 0.099991] Modules linked in: [ 0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255 [ 0.099991] Call Trace: [ 0.099991] <IRQ> [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0 [ 0.099991] [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20 [ 0.099991] [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110 [ 0.099991] [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20 [ 0.099991] [<ffffffff81001873>] exit_idle+0x43/0x50 [ 0.099991] [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0 [ 0.099991] [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20 [ 0.099991] <EOI> [<ffffffff8100ae67>] ? default_idle+0xa7/0x350 [ 0.099991] [<ffffffff8100ae65>] ? default_idle+0xa5/0x350 [ 0.099991] [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110 [ 0.099991] [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160 [ 0.099991] [<ffffffff810019a0>] cpu_idle+0xb0/0x110 [ 0.099991] [<ffffffff817a7505>] rest_init+0xe5/0x140 [ 0.099991] [<ffffffff817a7468>] ? rest_init+0x48/0x140 [ 0.099991] [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc [ 0.099991] [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135 [ 0.099991] [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4 Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Henroid <andrew.d.henroid@intel.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
-
- 08 11月, 2011 1 次提交
-
-
由 Luck, Tony 提交于
Arjan would like to make struct file_operations const, but mce-inject directly writes to the mce_chrdev_ops to install its write handler. In an ideal world mce-inject would have its own character device, but we have a sizable legacy of test scripts that hardwire "/dev/mcelog", so it would be painful to switch to a separate device now. Instead, this patch switches to a stub function in the mce code, with a registration helper that mce-inject can call when it is loaded. Note that this would also allow for a sane process to allow mce-inject to be unloaded again (with an unregister function, and appropriate module_{get,put}() calls), but that is left for potential future patches. Reported-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4eb2e1971326651a3b@agluck-desktop.sc.intel.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 01 11月, 2011 3 次提交
-
-
由 Borislav Petkov 提交于
Remove edac_mce pieces and use the normal MCE decoder notifier chain by retaining the same functionality with considerably less code. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
-
由 Paul Gortmaker 提交于
These files were implicitly getting EXPORT_SYMBOL via device.h which was including module.h, but that will be fixed up shortly. By fixing these now, we can avoid seeing things like: arch/x86/kernel/rtc.c:29: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’ arch/x86/kernel/pci-dma.c:20: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’ arch/x86/kernel/e820.c:69: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL_GPL’ [ with input from Randy Dunlap <rdunlap@xenotime.net> and also from Stephen Rothwell <sfr@canb.auug.org.au> ] Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
由 Borislav Petkov 提交于
Drop the edac_mce custom hook in favor of the generic notifier mechanism. Also, do not log the error to mcelog if the notified agent was able to decode it. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
-
- 19 10月, 2011 1 次提交
-
-
由 Borislav Petkov 提交于
506ed6b5 ("x86, intel: Output microcode revision in /proc/cpuinfo") added microcode revision format to /proc/cpuinfo and the MCE handler in decimal format but both AMD and Intel patch levels are handled as hex numbers. Fix it. Acked-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
- 14 10月, 2011 1 次提交
-
-
由 Andi Kleen 提交于
I got a request to make it easier to determine the microcode update level on Intel CPUs. This patch adds a new "microcode" field to /proc/cpuinfo. The microcode level is also outputed on fatal machine checks together with the other CPUID model information. I removed the respective code from the microcode update driver, it just reads the field from cpu_data. Also when the microcode is updated it fills in the new values too. I had to add a memory barrier to native_cpuid to prevent it being optimized away when the result is not used. This turns out to clean up further code which already got this information manually. This is done in followon patches. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NH. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1318466795-7393-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 10 10月, 2011 1 次提交
-
-
由 Don Zickus 提交于
Just convert all the files that have an nmi handler to the new routines. Most of it is straight forward conversion. A couple of places needed some tweaking like kgdb which separates the debug notifier from the nmi handler and mce removes a call to notify_die. [Thanks to Ying for finding out the history behind that mce call https://lkml.org/lkml/2010/5/27/114 And Boris responding that he would like to remove that call because of it https://lkml.org/lkml/2011/9/21/163] The things that get converted are the registeration/unregistration routines and the nmi handler itself has its args changed along with code removal to check which list it is on (most are on one NMI list except for kgdb which has both an NMI routine and an NMI Unknown routine). Signed-off-by: NDon Zickus <dzickus@redhat.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NCorey Minyard <minyard@acm.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Corey Minyard <minyard@acm.org> Cc: Jack Steiner <steiner@sgi.com> Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 14 9月, 2011 1 次提交
-
-
由 Hidetoshi Seto 提交于
del_timer_sync() can cause a deadlock when called in interrupt context. It is used with on_each_cpu() in some parts for sysfs files like bank*, check_interval, cmci_disabled and ignore_ce. However, use of on_each_cpu() results in calling the function passed as the argument in interrupt context. This causes a flood of nested warnings from del_timer_sync() (it runs on each CPU) caused even by a simple file access like: $ echo 300 > /sys/devices/system/machinecheck/machinecheck0/check_interval Fortunately, these MCE-specific files are rarely used and AFAIK only few MCE geeks experience this warning. To remove the warning, move timer deletion outside of the interrupt context. Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
- 13 9月, 2011 1 次提交
-
-
由 Thomas Gleixner 提交于
The cmci_discover_lock can be taken in atomic context (cpu bring up sequence) and therefore cannot be preempted on -rt. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 16 6月, 2011 8 次提交
-
-
由 Hidetoshi Seto 提交于
There are many functions named mce_* so use a new prefix for the subset of functions related to sysfs support. And since f3c6ea1b introduces syscore_ops, use the prefix mce_syscore for some functions related to power management which were in sysdev_class before. Before: After: mce_device mce_sysdev mce_sysclass mce_sysdev_class mce_attrs mce_sysdev_attrs mce_dev_initialized mce_sysdev_initialized mce_create_device mce_sysdev_create mce_remove_device mce_sysdev_remove mce_suspend mce_syscore_suspend mce_shutdown mce_syscore_shutdown mce_resume mce_syscore_resume Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: NTony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4DEED81B.8020506@jp.fujitsu.comSigned-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Hidetoshi Seto 提交于
There are many functions named mce_* so use a new prefix for the subset of functions dealing with the character device /dev/mcelog. This change doesn't impact the mce-inject module because the exported symbol mce_chrdev_ops already has the prefix, therefore it is left unchanged. Before: After: mce_wait mce_chrdev_wait mce_state_lock mce_chrdev_state_lock open_count mce_chrdev_open_count open_exclu mce_chrdev_open_exclu mce_open mce_chrdev_open mce_release mce_chrdev_release mce_read_mutex mce_chrdev_read_mutex mce_read mce_chrdev_read mce_poll mce_chrdev_poll mce_ioctl mce_chrdev_ioctl mce_log_device mce_chrdev_device Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: NTony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4DEED7CD.3040500@jp.fujitsu.comSigned-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Hidetoshi Seto 提交于
Use a temporary local variable m to simplify the code. No change in logic. Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: NTony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4DEED7A8.8020307@jp.fujitsu.comSigned-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Hidetoshi Seto 提交于
Use temporary local variable sysdev to simplify the code. No change in logic. Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: NTony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4DEED777.7080205@jp.fujitsu.comSigned-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Hidetoshi Seto 提交于
Because "ancient CPUs" like p5 and winchip don't have X86_FEATURE_MCA (I suppose so), mcheck_cpu_init() on such CPUs will return at check of mce_available() after __mcheck_cpu_ancient_init(). It is hard to know this implicit behavior without knowing the CPUs well. So make it clear that we leave mcheck_cpu_init() when the CPU is initialized in __mcheck_cpu_ancient_init(). Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: NTony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4DEED74B.20502@jp.fujitsu.comSigned-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Hidetoshi Seto 提交于
This patch introduces mce_gather_info() which is to be called at the beginning of error handling and gathers minimum error information from proper error registers (and saved registers). As the result of mce_get_rip() is integrated, unnecessary zeroing is removed. This also takes care of saving RIP which is required to make some decision about error severity for SRAR errors, instead of retrieving it later in the handler. Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: NTony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4DEED71A.1060906@jp.fujitsu.comSigned-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Hidetoshi Seto 提交于
Follow other MCi register defines. Plus define MCI_MISC_ADDR_LSB() and MCI_MISC_ADDR_MODE(). Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: NTony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4DEED6E8.9090509@jp.fujitsu.comSigned-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Hidetoshi Seto 提交于
The MCE handler uses a special vector for self IPI to invoke post-emergency processing in an interrupt context, e.g. call an NMI-unsafe function, wakeup loggers, schedule time-consuming work for recovery, etc. This mechanism is now generalized by the following commit: > e360adbe > Author: Peter Zijlstra <a.p.zijlstra@chello.nl> > Date: Thu Oct 14 14:01:34 2010 +0800 > > irq_work: Add generic hardirq context callbacks > > Provide a mechanism that allows running code in IRQ context. It is > most useful for NMI code that needs to interact with the rest of the > system -- like wakeup a task to drain buffers. : So change to use provided generic mechanism. Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: NTony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4DEED6B2.6080005@jp.fujitsu.comSigned-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-