- 29 3月, 2014 1 次提交
-
-
由 Chen, Gong 提交于
When CMCI storm persists for a long time(at least beyond predefined threshold. It's 30 seconds for now), we can watch CMCI storm is detected immediately after it subsides. ... Dec 10 22:04:29 kernel: CMCI storm detected: switching to poll mode Dec 10 22:04:59 kernel: CMCI storm subsided: switching to interrupt mode Dec 10 22:04:59 kernel: CMCI storm detected: switching to poll mode Dec 10 22:05:29 kernel: CMCI storm subsided: switching to interrupt mode ... The problem is that our logic that determines that the storm has ended is incorrect. We announce the end, re-enable interrupts and realize that the storm is still going on, so we switch back to polling mode. Rinse, repeat. When a storm happens we disable signaling of errors via CMCI and begin polling machine check banks instead. If we find any logged errors, then we need to set a per-cpu flag so that our per-cpu tests that check whether the storm is ongoing will see that errors are still being logged independently of whether mce_notify_irq() says that the error has been fully processed. cmci_clear() is not the right tool to disable a bank. It disables the interrupt for the bank as desired, but it also clears the bit for this bank in "mce_banks_owned" so we will skip the bank when polling (so we fail to see that the storm continues because we stop looking). New cmci_storm_disable_banks() just disables the interrupt while allowing polling to continue. Reported-by: NWilliam Dauchy <wdauchy@gmail.com> Signed-off-by: NChen, Gong <gong.chen@linux.intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 07 1月, 2014 1 次提交
-
-
由 Paul Gortmaker 提交于
None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. [ hpa: undid incorrect removal from arch/x86/kernel/head_32.S ] Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/1389054026-12947-1-git-send-email-paul.gortmaker@windriver.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 09 7月, 2013 1 次提交
-
-
由 Naveen N. Rao 提交于
The Corrected Machine Check structure (CMC) in HEST has a flag which can be set by the firmware to indicate to the OS that it prefers to process the corrected error events first. In this scenario, the OS is expected to not monitor for corrected errors (through CMCI/polling). Instead, the firmware notifies the OS on corrected error events through GHES. Linux already has support for GHES. This patch adds support for parsing CMC structure and to disable CMCI/polling if the firmware first flag is set. Further, the list of machine check bank structures at the end of CMC is used to determine which MCA banks function in FF mode, so that we continue to monitor error events on the other banks. Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 26 6月, 2013 1 次提交
-
-
由 Naveen N. Rao 提交于
There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned' per-cpu bitmaps. Provide comments so that we all know exactly what these are used for, and why. Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 03 4月, 2013 1 次提交
-
-
由 Srivatsa S. Bhat 提交于
Dave Jones reports that offlining a CPU leads to this trace: numa_remove_cpu cpu 1 node 0: mask now 0,2-3 smpboot: CPU 1 is now offline BUG: using smp_processor_id() in preemptible [00000000] code: cpu-offline.sh/10591 caller is cmci_rediscover+0x6a/0xe0 Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2 Call Trace: [<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100 [<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0 [<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae [<ffffffff8160ea66>] notifier_call_chain+0x66/0x150 [<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10 [<ffffffff8104c2e3>] cpu_notify+0x23/0x50 [<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20 [<ffffffff815ef082>] _cpu_down+0x302/0x350 [<ffffffff815ef106>] cpu_down+0x36/0x50 [<ffffffff815f1c9d>] store_online+0x8d/0xd0 [<ffffffff813edc48>] dev_attr_store+0x18/0x30 [<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150 [<ffffffff811adfb2>] vfs_write+0xa2/0x170 [<ffffffff811ae16c>] sys_write+0x4c/0xa0 [<ffffffff81613019>] system_call_fastpath+0x16/0x1b However, a look at cmci_rediscover shows that it can be simplified quite a bit, apart from solving the above issue. It invokes functions that take spin locks with interrupts disabled, and hence it can run in atomic context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU is already dead and out of the cpu_online_mask. So take these points into account and simplify the code, and thereby also fix the above issue. Reported-by: NDave Jones <davej@redhat.com> Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 31 10月, 2012 1 次提交
-
-
由 Tang Chen 提交于
cmci_rediscover() used set_cpus_allowed_ptr() to change the current process's running cpu, and migrate itself to the dest cpu. But worker processes are not allowed to be migrated. If current is a worker, the worker will be migrated to another cpu, but the corresponding worker_pool is still on the original cpu. In this case, the following BUG_ON in try_to_wake_up_local() will be triggered: BUG_ON(rq != this_rq()); This will cause the kernel panic. The call trace is like the following: [ 6155.451107] ------------[ cut here ]------------ [ 6155.452019] kernel BUG at kernel/sched/core.c:1654! ...... [ 6155.452019] RIP: 0010:[<ffffffff810add15>] [<ffffffff810add15>] try_to_wake_up_local+0x115/0x130 ...... [ 6155.452019] Call Trace: [ 6155.452019] [<ffffffff8166fc14>] __schedule+0x764/0x880 [ 6155.452019] [<ffffffff81670059>] schedule+0x29/0x70 [ 6155.452019] [<ffffffff8166de65>] schedule_timeout+0x235/0x2d0 [ 6155.452019] [<ffffffff810db57d>] ? mark_held_locks+0x8d/0x140 [ 6155.452019] [<ffffffff810dd463>] ? __lock_release+0x133/0x1a0 [ 6155.452019] [<ffffffff81671c50>] ? _raw_spin_unlock_irq+0x30/0x50 [ 6155.452019] [<ffffffff810db8f5>] ? trace_hardirqs_on_caller+0x105/0x190 [ 6155.452019] [<ffffffff8166fefb>] wait_for_common+0x12b/0x180 [ 6155.452019] [<ffffffff810b0b30>] ? try_to_wake_up+0x2f0/0x2f0 [ 6155.452019] [<ffffffff8167002d>] wait_for_completion+0x1d/0x20 [ 6155.452019] [<ffffffff8110008a>] stop_one_cpu+0x8a/0xc0 [ 6155.452019] [<ffffffff810abd40>] ? __migrate_task+0x1a0/0x1a0 [ 6155.452019] [<ffffffff810a6ab8>] ? complete+0x28/0x60 [ 6155.452019] [<ffffffff810b0fd8>] set_cpus_allowed_ptr+0x128/0x130 [ 6155.452019] [<ffffffff81036785>] cmci_rediscover+0xf5/0x140 [ 6155.452019] [<ffffffff816643c0>] mce_cpu_callback+0x18d/0x19d [ 6155.452019] [<ffffffff81676187>] notifier_call_chain+0x67/0x150 [ 6155.452019] [<ffffffff810a03de>] __raw_notifier_call_chain+0xe/0x10 [ 6155.452019] [<ffffffff81070470>] __cpu_notify+0x20/0x40 [ 6155.452019] [<ffffffff810704a5>] cpu_notify_nofail+0x15/0x30 [ 6155.452019] [<ffffffff81655182>] _cpu_down+0x262/0x2e0 [ 6155.452019] [<ffffffff81655236>] cpu_down+0x36/0x50 [ 6155.452019] [<ffffffff813d3eaa>] acpi_processor_remove+0x50/0x11e [ 6155.452019] [<ffffffff813a6978>] acpi_device_remove+0x90/0xb2 [ 6155.452019] [<ffffffff8143cbec>] __device_release_driver+0x7c/0xf0 [ 6155.452019] [<ffffffff8143cd6f>] device_release_driver+0x2f/0x50 [ 6155.452019] [<ffffffff813a7870>] acpi_bus_remove+0x32/0x6d [ 6155.452019] [<ffffffff813a7932>] acpi_bus_trim+0x87/0xee [ 6155.452019] [<ffffffff813a7a21>] acpi_bus_hot_remove_device+0x88/0x16b [ 6155.452019] [<ffffffff813a33ee>] acpi_os_execute_deferred+0x27/0x34 [ 6155.452019] [<ffffffff81090589>] process_one_work+0x219/0x680 [ 6155.452019] [<ffffffff81090528>] ? process_one_work+0x1b8/0x680 [ 6155.452019] [<ffffffff813a33c7>] ? acpi_os_wait_events_complete+0x23/0x23 [ 6155.452019] [<ffffffff810923be>] worker_thread+0x12e/0x320 [ 6155.452019] [<ffffffff81092290>] ? manage_workers+0x110/0x110 [ 6155.452019] [<ffffffff81098396>] kthread+0xc6/0xd0 [ 6155.452019] [<ffffffff8167c4c4>] kernel_thread_helper+0x4/0x10 [ 6155.452019] [<ffffffff81671f30>] ? retint_restore_args+0x13/0x13 [ 6155.452019] [<ffffffff810982d0>] ? __init_kthread_worker+0x70/0x70 [ 6155.452019] [<ffffffff8167c4c0>] ? gs_change+0x13/0x13 This patch removes the set_cpus_allowed_ptr() call, and put the cmci rediscover jobs onto all the other cpus using system_wq. This could bring some delay for the jobs. Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 26 10月, 2012 2 次提交
-
-
由 Borislav Petkov 提交于
mce_ser, mce_bios_cmci_threshold and mce_disabled are the last three bools which need conversion. Move them to the mca_config struct and adjust usage sites accordingly. Signed-off-by: NBorislav Petkov <bp@alien8.de> Acked-by: NTony Luck <tony.luck@intel.com>
-
由 Borislav Petkov 提交于
Move them into the mca_config struct and adjust code touching them accordingly. Signed-off-by: NBorislav Petkov <bp@alien8.de> Acked-by: NTony Luck <tony.luck@intel.com>
-
- 28 9月, 2012 1 次提交
-
-
由 Naveen N. Rao 提交于
The ACPI spec doesn't provide for a way for the bios to pass down recommended thresholds to the OS on a _per-bank_ basis. This patch adds a new boot option, which if passed, tells Linux to use CMCI thresholds set by the bios. As fail-safe, we initialize threshold to 1 if some banks have not been initialized by the bios and warn the user. Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 10 8月, 2012 2 次提交
-
-
由 Chen Gong 提交于
On Intel systems corrected machine check interrupts (CMCI) may be sent to multiple logical processors; possibly to all processors on the affected socket (SDM Volume 3B "15.5.1 CMCI Local APIC Interface"). This means that a persistent error (such as a stuck bit in ECC memory) may cause a storm of interrupts that greatly hinders or prevents forward progress (probably on many processors). To solve this we keep track of the rate at which each processor sees CMCI. If we exceed a threshold, we disable CMCI delivery and switch to polling the machine check banks. If the storm subsides (none of the affected processors see any more errors for a complete poll interval) we re-enable CMCI. [Tony: Added console messages when storm begins/ends and increased storm threshold from 5 to 15 so we have a few more logged entries before we disable interrupts and start dropping reports] Signed-off-by: NChen Gong <gong.chen@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NChen Gong <gong.chen@linux.intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Tony Luck 提交于
cmci_discover() works out which machine check banks support CMCI, and which of those are shared by multiple logical processors. It uses this information to ensure that exactly one cpu is designated the owner of each bank so that when interrupts are broadcast to multiple cpus, only one of them will look in a shared bank to log the error and clear the bank. At boot time cmci_discover() performs this task silently. But during certain cpu hotplug operations it prints out a set of summary lines like this: CPU 35 MCA banks CMCI:0 CMCI:1 CMCI:3 CMCI:5 CMCI:6 CMCI:7 CMCI:8 CMCI:9 CMCI:10 CMCI:11 CPU 1 MCA banks CMCI:0 CMCI:1 CMCI:3 CPU 39 MCA banks CMCI:0 CMCI:1 CMCI:3 CPU 38 MCA banks CMCI:0 CMCI:1 CMCI:3 CPU 32 MCA banks CMCI:0 CMCI:1 CMCI:3 CPU 37 MCA banks CMCI:0 CMCI:1 CMCI:3 CPU 36 MCA banks CMCI:0 CMCI:1 CMCI:3 CPU 34 MCA banks CMCI:0 CMCI:1 CMCI:3 The value of these messages seems very low. A user might painstakingly cross-check against the data sheet for a processor to ensure that all CMCI supported banks are correctly reported, but this seems improbable. If users really wanted to do this, we should print the information at boot time too. Remove the messages. Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 13 9月, 2011 1 次提交
-
-
由 Thomas Gleixner 提交于
The cmci_discover_lock can be taken in atomic context (cpu bring up sequence) and therefore cannot be preempted on -rt. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 30 12月, 2010 1 次提交
-
-
由 Tejun Heo 提交于
Replace all uses of current_cpu_data with this_cpu operations on the per cpu structure cpu_info. The scala accesses are replaced with the matching this_cpu ops which results in smaller and more efficient code. In the long run, it might be a good idea to remove cpu_data() macro too and use per_cpu macro directly. tj: updated description Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: NH. Peter Anvin <hpa@zytor.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 11 6月, 2010 2 次提交
-
-
由 Huang Ying 提交于
It is reported that CMCI is not raised when number of corrected error reaches preset threshold. After inspection, it is found that MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch fixed it. Value of MCI_CTL2_CMCI_THRESHOLD_MASK is fixed according to x86_64 Software Developer's Manual too. Reported-by: NShaohui Zheng <shaohui.zheng@intel.com> Signed-off-by: NHuang Ying <ying.huang@intel.com> LKML-Reference: <1275977350.3444.660.camel@yhuang-dev.sh.intel.com> Reviewed-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Huang Ying 提交于
Rename CMCI_EN to MCI_CTL2_CMCI_EN and CMCI_THRESHOLD_MASK to MCI_CTL2_CMCI_THRESHOLD_MASK to make naming consistent. Signed-off-by: NHuang Ying <ying.huang@intel.com> LKML-Reference: <1275977348.3444.659.camel@yhuang-dev.sh.intel.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 30 3月, 2010 1 次提交
-
-
由 Tejun Heo 提交于
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: NTejun Heo <tj@kernel.org> Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
-
- 11 3月, 2010 1 次提交
-
-
由 Mike Travis 提交于
Don't write per cpu MCA boot up messages. Signed-of-by: NMike Travis <travis@sgi.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: x86@kernel.org Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 12 10月, 2009 1 次提交
-
-
由 Alexey Dobriyan 提交于
After m68k's task_thread_info() doesn't refer to current, it's possible to remove sched.h from interrupt.h and not break m68k! Many thanks to Heiko Carstens for allowing this. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
- 10 7月, 2009 1 次提交
-
-
由 Andi Kleen 提交于
Instead of open coded calculations for bank MSRs hide the indexing of higher banks MCE register MSRs in new macros. No semantic changes. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 17 6月, 2009 8 次提交
-
-
由 H. Peter Anvin 提交于
mce_intel.c uses apic_write() and lapic_get_maxlvt(), and so it needs <asm/apic.h>. Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
-
由 Cyrill Gorcunov 提交于
If APIC was disabled (for some reason) and as result it's not even mapped we should not try to enable thermal interrupts at all. Reported-by: NSimon Holm Thøgersen <odie@cs.aau.dk> Tested-by: NSimon Holm Thøgersen <odie@cs.aau.dk> Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <20090615182633.GA7606@lenovo> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Hidetoshi Seto 提交于
Rename files that are no longer 64bit specific: mce_amd_64.c => mce_amd.c mce_intel_64.c => mce_intel.c Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Hidetoshi Seto 提交于
Now all symbols in the header are static. Remove the header. Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Hidetoshi Seto 提交于
Put common functions into therm_throt.c, modify Makefile. unexpected_thermal_interrupt intel_thermal_interrupt smp_thermal_interrupt intel_set_thermal_handler Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Hidetoshi Seto 提交于
Let them in same shape. Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Hidetoshi Seto 提交于
Break smp_thermal_interrupt() into two functions. Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Hidetoshi Seto 提交于
There are 2 headers: arch/x86/include/asm/mce.h arch/x86/kernel/cpu/mcheck/mce.h and in the latter small header: #include <asm/mce.h> This patch move all contents in the latter header into the former, and fix all files using the latter to include the former instead. Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 11 6月, 2009 1 次提交
-
-
由 Hidetoshi Seto 提交于
This patch introduces three boot options (no_cmci, dont_log_ce and ignore_ce) to control handling for corrected errors. The "mce=no_cmci" boot option disables the CMCI feature. Since CMCI is a new feature so having boot controls to disable it will be a help if the hardware is misbehaving. The "mce=dont_log_ce" boot option disables logging for corrected errors. All reported corrected errors will be cleared silently. This option will be useful if you never care about corrected errors. The "mce=ignore_ce" boot option disables features for corrected errors, i.e. polling timer and cmci. All corrected events are not cleared and kept in bank MSRs. Usually this disablement is not recommended, however it will be a help if there are some conflict with the BIOS or hardware monitoring applications etc., that clears corrected events in banks instead of OS. [ And trivial cleanup (space -> tab) for doc is included. ] Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Reviewed-by: NAndi Kleen <ak@linux.intel.com> LKML-Reference: <4A30ACDF.5030408@jp.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 04 6月, 2009 1 次提交
-
-
由 Andi Kleen 提交于
Rename the mce_notify_user function to mce_notify_irq. The next patch will split the wakeup handling of interrupt context and of process context and it's better to give it a clearer name for this. Contains a fix from Ying Huang [ Impact: cleanup ] Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 29 5月, 2009 4 次提交
-
-
由 Hidetoshi Seto 提交于
Fix for: WARNING: space prohibited between function name and open parenthesis '(' + for_each_online_cpu (cpu) { Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Thomas Gleixner 提交于
Decode magic constants and turn them into symbols. [ Cleanup to use symbols already exists - HS ] [ Impact: cleanup ] Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Thomas Gleixner 提交于
Mechanic unification. No change in code. [ Impact: cleanup, 32-bit / 64-bit unification ] Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Thomas Gleixner 提交于
Prepare for unification, make two intel_init_thermal equal. [ Impact: cleanup ] Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 09 5月, 2009 1 次提交
-
-
由 Huang Weiyi 提交于
Remove duplicated #include in arch/x86/kernel/cpu/mcheck/mce_intel_64.c. [ Impact: cleanup ] Signed-off-by: NHuang Weiyi <weiyi.huang@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 08 5月, 2009 1 次提交
-
-
由 Hidetoshi Seto 提交于
Lockdep reports the warning below when Li tries to offline one cpu: [ 110.835487] ================================= [ 110.835616] [ INFO: inconsistent lock state ] [ 110.835688] 2.6.30-rc4-00336-g8c9ed899 #52 [ 110.835757] --------------------------------- [ 110.835828] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. [ 110.835908] swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes: [ 110.835982] (cmci_discover_lock){?.+...}, at: [<ffffffff80236dc0>] cmci_clear+0x30/0x9b cmci_clear() can be called via smp_call_function_single(). It is better to disable interrupt while holding cmci_discover_lock, to turn it into an irq-safe lock - we can deadlock otherwise. [ Impact: fix possible deadlock in the MCE code ] Reported-by: NShaohua Li <shaohua.li@intel.com> Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4A03ED38.8000700@jp.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu> Reported-by: Shaohua Li<shaohua.li@intel.com>
-
- 16 3月, 2009 1 次提交
-
-
由 Hidetoshi Seto 提交于
Impact: Bug fix on UP Referring commit cc3ca220, Peter removed __cpuinit annotations for mce_cpu_features() and its successor functions, which caused troubles on UP configurations. However the intel_init_cmci() was introduced after that and it also has __cpuinit annotation even though it is called from mce_cpu_features(). Remove the annotation from that function too. Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 13 3月, 2009 1 次提交
-
-
由 Rusty Russell 提交于
Impact: cleanup 1) &cpu_online_map -> cpu_online_mask 2) first_cpu/next_cpu_nr -> cpumask_first/cpumask_next 3) cpu_*_map manipulation -> init_cpu_* / set_cpu_* Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 25 2月, 2009 2 次提交
-
-
由 H. Peter Anvin 提交于
Impact: Bug fix on UP The MCE code is reinitialized from resume, so we can't use __cpuinit/__cpuexit for most of the code. Remove those annotations for anything downstream of mce_init(). Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Andi Kleen 提交于
Impact: Major new feature Intel CMCI (Corrected Machine Check Interrupt) is a new feature on Nehalem CPUs. It allows the CPU to trigger interrupts on corrected events, which allows faster reaction to them instead of with the traditional polling timer. Also use CMCI to discover shared banks. Machine check banks can be shared by CPU threads or even cores. Using the CMCI enable bit it is possible to detect the fact that another CPU already saw a specific bank. Use this to assign shared banks only to one CPU to avoid reporting duplicated events. On CPU hot unplug bank sharing is re discovered. This is done using a thread that cycles through all the CPUs. To avoid races between the poller and CMCI we only poll for banks that are not CMCI capable and only check CMCI owned banks on a interrupt. The shared banks ownership information is currently only used for CMCI interrupts, not polled banks. The sharing discovery code follows the algorithm recommended in the IA32 SDM Vol3a 14.5.2.1 The CMCI interrupt handler just calls the machine check poller to pick up the machine check event that caused the interrupt. I decided not to implement a separate threshold event like the AMD version has, because the threshold is always one currently and adding another event didn't seem to add any value. Some code inspired by Yunhong Jiang's Xen implementation, which was in term inspired by a earlier CMCI implementation by me. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 21 2月, 2009 1 次提交
-
-
由 H. Peter Anvin 提交于
Impact: Bug fix on UP Checkin 6ec68bff: x86, mce: reinitialize per cpu features on resume introduced a call to mce_cpu_features() in the resume path, in order for the MCE machinery to get properly reinitialized after a resume. However, this function (and its successors) was flagged __cpuinit, which becomes __init on UP configurations (on SMP suspend/resume requires CPU hotplug and so this would not be seen.) Remove the offending __cpuinit annotations for mce_cpu_features() and its successor functions. Cc: Andi Kleen <ak@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-