- 20 Jul 2010, 5 commits
-
-
By Suresh Siddha
xsaveopt is a more optimized form of xsave, specifically designed for context-switch usage. xsaveopt doesn't save state that has not been modified since the prior xrstor. And if a specific feature's state has returned to the init state, xsaveopt just updates the header bit in the xsave memory layout without writing out the corresponding memory image.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100719230205.604014179@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
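As a rough illustration, a save primitive built on the new instruction might look like the sketch below. This is not this patch's code: the EDX:EAX mask convention follows the SDM, and kernels of this era emitted the opcode via .byte alternatives rather than relying on assembler support for the mnemonic.

/* Illustrative only: save extended state, letting the CPU skip
 * components unmodified since the prior xrstor. The save mask is
 * passed in EDX:EAX per the SDM. */
static inline void fpu_xsaveopt(struct xsave_struct *fx, u64 mask)
{
	u32 lmask = mask;
	u32 hmask = mask >> 32;

	asm volatile("xsaveopt %0"
		     : "=m" (*fx)
		     : "a" (lmask), "d" (hmask)
		     : "memory");
}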
-
By Suresh Siddha
With xsaveopt, if a processor implementation discerns that a processor state component is in its initialized state, it may clear the corresponding bit in xsave_hdr.xstate_bv to '0' without modifying the corresponding memory layout. Hence, while presenting the xstate information to user space, we always ensure that the memory layout of a feature will be in the init state if the corresponding header bit is zero. This ensures consistency and avoids the user seeing stale state in the memory layout during signal handling, debugging, etc.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100719230205.351459480@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
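A condensed sketch of that sanitizing step, with illustrative names; the real code walks every feature using the offsets and sizes enumerated from CPUID leaf 0xD:

/* Sketch: before copying the xsave image to user space, overwrite
 * any component whose xstate_bv bit is clear with its init value,
 * so the user never sees stale bytes. */
static void sanitize_xstate(struct xsave_struct *xsave)
{
	u64 xstate_bv = xsave->xsave_hdr.xstate_bv;

	/* SSE in init state: user must see zeroed xmm registers */
	if (!(xstate_bv & XSTATE_SSE))
		memset(xsave->i387.xmm_space, 0,
		       sizeof(xsave->i387.xmm_space));
	/* ... FP, YMM, etc. handled likewise, using the per-feature
	 * offsets and sizes recorded from CPUID leaf 0xD ... */
}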
-
By Suresh Siddha
Subleaves of cpuid leaf 0xd provide the offset and size of the different feature states managed by xsave/xrstor. Track these for upcoming use during signal handling.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100719230205.262987929@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
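The same enumeration can be reproduced from user space. A minimal stand-alone sketch using GCC's <cpuid.h>, with the sub-leaf register layout (EAX = size, EBX = offset) per the SDM:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx, n;

	/* Sub-leaves 2+ of leaf 0xD describe one xstate feature each. */
	for (n = 2; n < 64; n++) {
		__cpuid_count(0x0d, n, eax, ebx, ecx, edx);
		if (!eax)
			continue; /* feature not implemented/enabled */
		printf("xstate feature %u: %u bytes at offset %u\n",
		       n, eax, ebx);
	}
	return 0;
}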
-
By Suresh Siddha
Enumerate the xsaveopt feature.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100719230205.604014179@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
By Suresh Siddha
Some cpuid features (like xsaveopt) are enumerated using cpuid subleaves. Extend init_scattered_cpuid_features() to take the subleaf into account.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100719230205.439900717@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
- 08 Jul 2010, 1 commit
-
-
By H. Peter Anvin
Intel has defined CPUID leaf 7 as the next set of feature flags (see the AVX specification, version 007). Add support for this new feature-flags word.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <tip-*@vger.kernel.org>
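A minimal stand-alone sketch of reading the new word, CPUID.(EAX=07H, ECX=0):EBX, with GCC's <cpuid.h>:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* Leaf 7 is sub-leaf indexed; sub-leaf 0 carries the flags. */
	__cpuid_count(7, 0, eax, ebx, ecx, edx);
	printf("CPUID leaf 7 feature flags (EBX): 0x%08x\n", ebx);
	return 0;
}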
-
- 17 Jun 2010, 1 commit
-
-
By Venkatesh Pallipadi
The new IA32_ENERGY_PERF_BIAS MSR allows system software to give the hardware a hint about whether OS policy favors more power saving or more performance. This allows the OS to have some influence on internal hardware power/performance tradeoffs where the OS previously had no influence. The support for this feature is indicated by CPUID.06H.ECX.bit3, as documented in the Intel Architectures Software Developer's Manual. This patch discovers support of this feature and displays it as "epb" in /proc/cpuinfo.

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <alpine.LFD.2.00.1006032310160.6669@localhost.localdomain>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
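Detection reduces to one CPUID query; a user-space sketch:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID.06H:ECX bit 3 advertises IA32_ENERGY_PERF_BIAS. */
	__cpuid(6, eax, ebx, ecx, edx);
	printf("IA32_ENERGY_PERF_BIAS %ssupported\n",
	       (ecx & (1 << 3)) ? "" : "not ");
	return 0;
}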
-
- 10 Jun 2010, 3 commits
-
-
By Borislav Petkov
Extend support to future families, and in particular:

* extend the direct mapping split of the Tseg SMM area.
* extend K8-flavored alternatives (NOPs).
* rep movs* prefix is fast in ucode.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100602182921.GA21557@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
By Borislav Petkov
This is in preparation for disabling L3 cache indices after having received correctable ECCs in the L3 cache. Now we allow for initial setting of a disabled index slot (write once) and deny writing new indices to it after it has been disabled. Also, we deny using both slots to disable one and the same index. Userspace can restore the previously disabled indices by rewriting those sysfs entries when booting.

Clean up and reorganize code while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100602161840.GI18327@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
By Dan Carpenter
The places which call check_for_xstate() only care about zero or non-zero, so this patch doesn't change how the code runs, but it's a cleanup. The main reason for this patch is that I'm looking for places which don't return -EFAULT for copy_from_user() failures.

Signed-off-by: Dan Carpenter <error27@gmail.com>
LKML-Reference: <20100603100746.GU5483@bicker>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
-
- 02 Jun 2010, 1 commit
-
-
By Borislav Petkov
Percpu initialization now happens after booting the cores on the machine, which causes them all to be displayed as belonging to node 0:

Jun 8 05:57:21 kepek kernel: [ 0.106999] Booting Node 0, Processors #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 Ok.

Use early_cpu_to_node() to get the correct node of each core instead.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mike Travis <travis@sgi.com>
LKML-Reference: <20100601190455.GA14237@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 01 Jun 2010, 2 commits
-
-
By Joerg Roedel
This patch implements a fallback to the GART IOMMU if the AMD IOMMU initialization fails and the fallback is possible. Otherwise the fallback would be nommu, which is very problematic on machines with more than 4GB of memory, or swiotlb, which hurts I/O performance.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
-
By Joerg Roedel
When request_mem_region() fails, the error path tries to disable the IOMMUs. This accesses the mmio region which was not allocated, leading to a kernel crash. This patch fixes the issue.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
-
- 31 May 2010, 2 commits
-
-
By Akinobu Mita
The DBG() macro for CONFIG_DEBUG_PER_CPU_MAPS is unused.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
LKML-Reference: <1274706291-13554-1-git-send-email-akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
By Stephane Eranian
The transactional API patch between the generic and model-specific code introduced several important bugs with event scheduling, at least on x86. If you had pinned events, e.g., watchdog, and were over-committing the PMU, you would get bogus counts. The bug was showing up on Intel CPUs because events would move around more often than on AMD. But the problem also existed on AMD, though it was harder to expose.

The issues were:

- group_sched_in() was missing a cancel_txn() in the error path.
- cpuc->n_added was not properly maintained, leading to missing actions in hw_perf_enable(), i.e., n_running being 0. You cannot update n_added until you know the transaction has succeeded. In case of a failed transaction, n_added was not adjusted back.
- In case of failed transactions, event_sched_out() was called and eventually invoked x86_disable_event() to touch the HW reg. But with transactions on x86, event_sched_in() does not touch HW registers; it simply collects events into a list. Thus, you could end up calling x86_disable_event() on a counter which did not correspond to the current event when idx != -1.

The patch modifies the generic and x86 code to avoid all those problems. First, we keep track of the number of events added last. In case the transaction fails, we subtract them from n_added. This approach is necessary (as opposed to delaying updates to n_added) because not all event updates use the transaction API, e.g., single events.

Second, we encapsulate the event_sched_in() and event_sched_out() in group_sched_in() inside the transaction. That makes the operations symmetrical, and you can also detect that you are inside a transaction and skip the HW reg access by checking cpuc->group_flag.

With this patch, you can now overcommit the PMU even with pinned system-wide events present and still get valid counts.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1274796225.5882.1389.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
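A condensed sketch of the n_added bookkeeping this describes. Field names follow the commit text, but the function shape is simplified; the real callbacks take no argument and use per-cpu accessors:

/* Track how many events the open transaction added (n_txn), and
 * roll n_added back if the transaction is cancelled. */
static void x86_pmu_start_txn(struct cpu_hw_events *cpuc)
{
	cpuc->group_flag |= PERF_EVENT_TXN;
	cpuc->n_txn = 0;
}

static void x86_pmu_cancel_txn(struct cpu_hw_events *cpuc)
{
	cpuc->group_flag &= ~PERF_EVENT_TXN;
	/* the transaction failed: those events were never scheduled */
	cpuc->n_added -= cpuc->n_txn;
}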
-
- 28 May 2010, 3 commits
-
-
By Lee Schermerhorn
x86 arch-specific changes to use the generic numa_node_id() based on the generic percpu variable infrastructure. Back out x86's custom version of numa_node_id().

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By FUJITA Tomonori
The sync_single_range_for_cpu and sync_single_range_for_device hooks in swiotlb_dma_ops are unnecessary because sync_single_for_cpu and sync_single_for_device are used there.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Akinobu Mita
After the previous modification, cpu notifiers can return an encapsulated errno value. This converts the cpu notifiers for msr, cpuid, and therm_throt.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 27 May 2010, 1 commit
-
-
By Julia Lawall
Add a missing spin_unlock on the error path. The locks and unlocks are balanced in other functions, so it seems that the same should be the case here.

The semantic match that finds this problem is as follows (http://coccinelle.lip6.fr/):

// <smpl>
@@
expression E1;
@@

* spin_lock(E1,...);
  <+... when != E1
  if (...) {
    ... when != E1
*   return ...;
  }
  ...+>
* spin_unlock(E1,...);
// </smpl>

Cc: stable@kernel.org
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
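The shape of the bug and the fix, in a hypothetical function; this is exactly the pattern the semantic match above flags:

static int frob(struct foo *f)
{
	int ret = 0;

	spin_lock(&f->lock);
	if (!f->ready) {
		ret = -EINVAL;
		goto out_unlock;  /* previously: return -EINVAL; leaked the lock */
	}
	do_work(f);
out_unlock:
	spin_unlock(&f->lock);
	return ret;
}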
-
- 26 May 2010, 2 commits
-
-
By Borislav Petkov
Fix the following warning:

"WARNING: arch/x86/kernel/built-in.o(.exit.text+0x72): Section mismatch in reference from the function powernowk8_exit() to the variable .cpuinit.data:cpb_nb
The function __exit powernowk8_exit() references a variable __cpuinitdata cpb_nb.
This is often seen when error handling in the exit function uses functionality in the init path.
The fix is often to remove the __cpuinitdata annotation of cpb_nb so it may be used outside an init section."

Cc: <stable@kernel.org>
Reported-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100525152858.GA24836@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
By Kay Sievers
This adds:

alias: devname:<name>

to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled in, but distros seem so much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts.

The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory:

$ cat /lib/modules/2.6.34-00650-g537b60d1-dirty/modules.devname
# Device nodes to trigger on-demand module loading.
microcode cpu/microcode c10:184
fuse fuse c10:229
ppp_generic ppp c108:0
tun net/tun c10:200
dm_mod mapper/control c10:235

Udev will pick up the depmod-created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed:

$ /sbin/udevd --debug
...
static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
static_dev_create_from_modules: mknod '/dev/fuse' c10:229
static_dev_create_from_modules: mknod '/dev/ppp' c108:0
static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666

A few device nodes are switched to statically allocated numbers to allow the static nodes to work. This might also be useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers.

Note: The devname aliases must be limited to the *common* and *single-instance* device nodes, like the misc devices, and never be used for conceptually unlimited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless of whether they are ever used. This facility is to hide the mess distros are creating with overly modularized kernels, and just to hide that these modules are not compiled in, not to paper over broken concepts. Thanks! :)

Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
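On the module side the change is just extra alias macros. For the fuse example from the table above (FUSE_MINOR and MODULE_ALIAS_MISCDEV come from linux/miscdevice.h):

#include <linux/module.h>
#include <linux/miscdevice.h>

/* char 10:229, so udev can create the node before the module loads */
MODULE_ALIAS_MISCDEV(FUSE_MINOR);
MODULE_ALIAS("devname:fuse");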
-
- 25 May 2010, 4 commits
-
-
By Peter Zijlstra
Patch b7e2ecef (perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction) made the unfortunate mistake of assuming the world is x86 only; correct this.

The problem was that perf_fetch_caller_regs() did local_save_flags() into regs->flags, and I re-used that to remove another local_save_flags(), forgetting that !x86 doesn't have regs->flags.

Do the reverse: remove the local_save_flags() from perf_fetch_caller_regs() and let the ftrace site do the local_save_flags() instead.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: acme@redhat.com
Cc: efault@gmx.de
Cc: fweisbec@gmail.com
Cc: rostedt@goodmis.org
LKML-Reference: <1274778175.5882.623.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
By Gabor Gombas
The low-memory corruption checker triggers during suspend/resume, so we need to reserve the low 64k. Don't be fooled that the BIOS identifies itself as "Dell Inc."; it's still a Phoenix BIOS.

[ hpa: I think we blacklist almost every BIOS in existence. We should either change this to a whitelist or just make it unconditional. ]

Signed-off-by: Gabor Gombas <gombasg@digikabel.hu>
LKML-Reference: <201005241913.o4OJDIMM010877@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org>
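Such quirks are normally expressed as dmi_system_id table entries. A hedged sketch of what this one could look like inside setup.c; the callback name is borrowed from the existing low-memory-corruption machinery, but treat the exact entry as illustrative:

#include <linux/dmi.h>

static const struct dmi_system_id low_mem_quirks[] __initconst = {
	{
		/* Phoenix BIOS shipping under the Dell vendor string */
		.callback = dmi_low_memory_corruption,
		.ident = "Phoenix BIOS (Dell branded)",
		.matches = {
			DMI_MATCH(DMI_BIOS_VENDOR, "Dell Inc."),
		},
	},
	{}	/* terminator */
};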
-
By Jan Beulich
Bits set in cpu_possible_mask prior to the execution of prefill_possible_map() (i.e. when parsing ACPI or MPS tables) would prevent the SMP alternatives logic from switching to UP mode, plus cause unnecessary setup of per-CPU data for CPUs that can never come online. Additionally, without CONFIG_HOTPLUG_CPU, disabled CPUs can never come online, and hence setting cpu_possible_mask bits for them is again a simple waste of resources.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <201005241913.o4OJDH3Z010874@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
By Kerstin Jonsson
When the SMP kernel decides to crash_kexec(), the local APICs may have pending interrupts in their vector tables. The setup routine for the local APIC has a deficient mechanism for clearing these interrupts: it only safely handles interrupts that have already been dispatched to the local core for servicing (the ISR register); it doesn't consider lower-prioritized queued interrupts stored in the IRR register.

If you have more than one pending interrupt within the same 32-bit word in the LAPIC vector table registers, you may find yourself entering the IO APIC setup with pending interrupts left in the LAPIC. This is a situation for which the IO APIC setup is not prepared. Depending on which interrupt vector(s) are stuck in the APIC tables, your system may show various degrees of malfunctioning. That was the reason why check_timer() failed in our system: the timer interrupts were blocked by pending interrupts from the old kernel when routed through the IO APIC.

Additional comment from Jiri Bohac:
==============
If this should go into a stable release, I'd add some kind of limit on the number of iterations, just to be safe from hard-to-debug lock-ups:

+ if (loops++ > MAX_LOOPS) {
+         printk("LAPIC pending clean-up");
+         break;
+ }
  } while (queued);

With MAX_LOOPS something like 1E9, this would leave plenty of time for the pending IRQs to be cleared and would still cause at most a second of delay if the loop were to lock up for whatever reason.

[trenn@suse.de:
 V2: Use tsc if available to bail out after 1 sec due to possible virtual apic_read calls which may take rather long (suggested by: Avi Kivity <avi@redhat.com>). If no tsc is available, bail out quickly after cpu_khz; if we broke out too early and still have irqs pending (which should never happen?) we still get a WARN_ON...
 V3: - Fixed indentation -> checkpatch clean
     - max_loops must be signed
 V4: - Fix typo, mixed up tsc and ntsc in first rdtscll() call
 V5: Adjust WARN_ON() condition to also catch error in cpu_has_tsc case]

Cc: <jbohac@novell.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Kerstin Jonsson <kerstin.jonsson@ericsson.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
LKML-Reference: <201005241913.o4OJDGWM010865@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
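A simplified sketch of the resulting drain loop, folding in the bail-out logic from the review notes above; the real patch also acks ISR entries and treats the no-TSC case more carefully:

/* Drain pending IRR interrupts left over by the crashed kernel,
 * bounded by roughly one second of TSC time. */
s64 max_loops = cpu_khz << 10;
u64 tsc = 0, ntsc;
int queued, i;

if (cpu_has_tsc)
	rdtscll(tsc);

do {
	queued = 0;
	for (i = APIC_ISR_NR - 1; i >= 0; i--)
		queued |= apic_read(APIC_IRR + i * 0x10);

	/* ... service/ack whatever sits in the ISR here ... */

	if (cpu_has_tsc) {
		rdtscll(ntsc);
		max_loops = (cpu_khz << 10) - (ntsc - tsc);
	} else
		max_loops--;
} while (queued && max_loops > 0);

WARN_ON(max_loops <= 0);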
-
- 21 May 2010, 8 commits
-
-
By Jason Wessel
Allow kdb to work properly with earlyprintk=vga by interpreting the backspace and carriage-return output characters. The interpretation of these characters supports the simple line editing provided in the kdb shell.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: x86@kernel.org
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
-
By Jason Wessel
If the kernel debugger was configured, attached, and started with kgdbwait, the hardware breakpoint registers should get restored by the kgdb code which is managing the dr registers.

CC: x86@kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
-
By Jason Wessel
It is not possible to use the hw_breakpoint.c API prior to mm_init(), but it is possible to use hardware breakpoints with the kernel debugger. Prior to smp_init() it is possible to simply write to the dr registers of the boot cpu directly. This can be used up until kgdb_arch_late() is invoked, at which point the standard hw_breakpoint.c API gets used.

CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
-
By Jason Wessel
The kernel debugger can operate well before mm_init(), but the x86 hardware breakpoint code, which uses the perf API, requires that the kernel allocators are initialized. This means the kernel debug core needs to provide an optional arch-specific callback to allow the initialization functions to run after the kernel has been further initialized. The kdb shell already had a similar restriction with an early initialization and a late initialization, so kdb_init() was moved into the debug core's version of the late init, which is called dbg_late_init().

CC: kgdb-bugreport@lists.sourceforge.net
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
-
By Jan Kiszka
Allow the x86 arch to have early exception processing for the purpose of debugging via kgdb.

Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
-
By Jason Wessel
The only way the debugger can handle a trap inside rcu_lock, notify_die, or atomic_notifier_call_chain without a triple fault is to have a low-level "first opportunity handler" in the int3 exception handler. Generally this will be something the vast majority of folks will not need, but for those who need it, it is added as a kernel .config option called KGDB_LOW_LEVEL_TRAP.

CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: H. Peter Anvin <hpa@zytor.com>
CC: x86@kernel.org
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
-
By Jason Wessel
Remove all the references to kgdb_post_primary_code. This function serves no useful purpose because you can obtain the same information from the "struct kgdb_state *ks" from within the debugger, if for some reason you want the data. Also remove the unintentional duplicate assignment for ks->ex_vector.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
-
By Jason Wessel
These are the minimum changes to the kgdb core in order to enable an API to connect a new front end (kdb) to the debug core.

This patch introduces the dbg_kdb_mode variable, which controls where the user-level I/O is routed. It will be routed to the gdbstub (kgdb) or to the kdb front end, which is a simple shell available over the kgdboc connection. You can switch back and forth between kdb and the gdb stub mode of operation dynamically. From gdb stub mode you can blindly type "$3#33", or from kdb mode you can enter "kgdb" to switch to the gdb stub.

The logic in the debug core depends on kdb to look for the typical gdb connection sequences and return immediately with KGDB_PASS_EVENT if a gdb serial command sequence is detected. That should allow a reasonably seamless transition between kdb -> gdb without leaving the kernel exception state. The two gdb serial queries that kdb is responsible for detecting are the "?" and "qSupported" packets.

CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Martin Hicks <mort@sgi.com>
-
- 20 May 2010, 2 commits
-
-
By Huang Ying
Traditionally, a fatal MCE will cause Linux to print an error log to the console and then reboot. Because the MCE registers preserve their content across a warm reboot, the hardware error can be logged to disk or network after the reboot. But the system may fail to warm reboot, in which case you may lose the hardware error log. ERST can help here: by saving the hardware error log into flash via ERST before going into panic, the hardware error log can be retrieved from flash after the system boots successfully again.

The fatal MCE processing procedure with ERST involved is as follows:

- Hardware detects an error, MCE is raised
- MCE handler reads the MCE registers, checks the error severity (fatal), prepares the error record
- Write the MCE error record into flash via ERST
- Go panic, then trigger a system reboot
- System reboots; /sbin/mcelog runs, reads /dev/mcelog to check flash for error records of the previous boot via ERST, and outputs and clears them if available
- /sbin/mcelog logs the error records to disk or network

ERST only accepts the CPER record format, but no pre-defined CPER section can accommodate all the information in struct mce, so a customized section type is defined to hold struct mce inside a CPER record as an error section.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
-
By Huang Ying
Generic Hardware Error Source (GHES) provides a way to report platform hardware errors (such as those from the chipset). It works in so-called "Firmware First" mode: hardware errors are reported to firmware first, then reported to Linux by the firmware. This way, some non-standard hardware error registers or non-standard hardware links can be checked by the firmware to produce more valuable hardware error information for Linux.

For now, only the SCI notification type and memory errors are supported. More notification types and hardware error types will be added later. Memory errors are reported to user space through /dev/mcelog by faking a corrected Machine Check, so that the failing memory page can be offlined by /sbin/mcelog if the error count for one page exceeds the threshold. On some machines, Machine Check cannot report the physical address for some corrected memory errors, but GHES can. So this simplified GHES is implemented first.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
-
- 19 May 2010, 5 commits
-
-
By Glauber Costa
If the HV told us we can fully trust the TSC, skip any correction.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
By Glauber Costa
We have now added a new set of clock-related MSRs in replacement of the old ones. In theory, we could just try to use them and get a return value indicating they do not exist, due to our use of kvm_write_msr_save. However, kvm clock registration happens very early, and if we ever try to write to a non-existent MSR, we raise a lethal #GP, since our idt handlers are not in place yet. So this patch tests for a cpuid feature exported by the host to decide which set of MSRs is supported.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
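A sketch of that detection step. The leaf and bit values follow the kvm_para ABI as I recall it, so double-check them against the headers; cpuid_eax() is the kernel's helper for reading EAX of a leaf:

#define KVM_CPUID_FEATURES		0x40000001
#define KVM_FEATURE_CLOCKSOURCE		0
#define KVM_FEATURE_CLOCKSOURCE2	3

/* Decide which kvmclock MSR pair to register, before any wrmsr. */
static int kvmclock_msr_generation(void)
{
	unsigned int feats = cpuid_eax(KVM_CPUID_FEATURES);

	if (feats & (1 << KVM_FEATURE_CLOCKSOURCE2))
		return 2;	/* use the new MSR pair */
	if (feats & (1 << KVM_FEATURE_CLOCKSOURCE))
		return 1;	/* fall back to the old MSRs */
	return 0;		/* no kvmclock at all */
}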
-
By Glauber Costa
In recent stress tests, it was found that pvclock-based systems could seriously warp on SMP systems. Using Ingo's time-warp-test.c, I could trigger a scenario as bad as 1.5 million warps a minute on some systems (to be fair, it wasn't that bad on most of them).

Investigating further, I found out that such warps were caused by the very offset-based calculation pvclock is based on. This happens even on some machines that report constant_tsc in their tsc flags, especially on multi-socket ones. Two reads of the same kernel timestamp at approximately the same time will likely have their tsc timestamped on different occasions too. This means the delta we calculate is unpredictable at best, and can probably be smaller on a cpu that is legitimately reading the clock on a later occasion. Some adjustments on the host could make this window less likely to happen, but still, it pretty much poses as an intrinsic problem of the mechanism.

A while ago, I thought about using a shared variable anyway, to hold the clock's last state, but gave up due to the high contention locking was likely to introduce, possibly rendering the thing useless on big machines. I argue, however, that locking is not necessary.

We do a read-and-return sequence in pvclock, and between read and return, the global value can have changed. However, it can only have changed by means of an addition of a positive value. So if we detect that our clock timestamp is less than the current global, we know that we need to return a higher one, even though it is not exactly the one we compared to. OTOH, if we detect we're greater than the current time source, we atomically replace the value with our new readings. This does cause contention on big boxes (but big here means *BIG*), but it seems like a good trade-off, since it provides us with a time source guaranteed to be stable with respect to time warps.

After this patch is applied, I don't see a single warp in time during 5 days of execution, on any of the machines I saw them on before.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Avi Kivity <avi@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
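The guard described above needs no lock. A user-space sketch of the same logic with C11 atomics (the kernel uses its atomic64 API instead, but the compare-and-swap loop is identical in shape):

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t last_value;

/* Never return a timestamp older than one any CPU has already
 * handed out; publish ours if it is the newest. */
static uint64_t clamp_monotonic(uint64_t now)
{
	uint64_t last = atomic_load(&last_value);

	for (;;) {
		if (now <= last)
			return last;	/* someone saw a later time */
		if (atomic_compare_exchange_weak(&last_value, &last, now))
			return now;	/* we published the newest value */
		/* CAS failed: 'last' was refreshed with the fresher
		 * global value; re-check and retry */
	}
}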
-
By Glauber Costa
This patch removes one padding byte and transforms it into a flags field. New versions of guests using pvclock will query these flags upon each read. Flags, however, will only be interpreted when the guest decides to. It uses the pvclock_valid_flags function to signal that a specific set of flags should be taken into consideration. Which flags are valid is usually devised via HV negotiation.

Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
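The layout change itself is tiny; roughly, with field names per the pvclock ABI and the repurposed byte marked:

struct pvclock_vcpu_time_info {
	u32 version;
	u32 pad0;
	u64 tsc_timestamp;
	u64 system_time;
	u32 tsc_to_system_mul;
	s8  tsc_shift;
	u8  flags;	/* was the first of three pad bytes */
	u8  pad[2];
} __attribute__((packed));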
-
By Shane Wang
Per the documentation, for the feature control MSR:

Bit 1 enables VMXON in SMX operation. If the bit is clear, execution of VMXON in SMX operation causes a general-protection exception.
Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution of VMXON outside SMX operation causes a general-protection exception.

This patch enables this kind of SMX-aware check for VMXON in KVM.

Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
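A sketch of the resulting check, with bit meanings per the text above. Here tboot_enabled() stands in for "we are in SMX operation", and the helper shape is illustrative rather than the exact KVM code:

#define FC_LOCKED		(1 << 0)
#define FC_VMXON_IN_SMX		(1 << 1)
#define FC_VMXON_OUTSIDE_SMX	(1 << 2)

static int vmx_disabled_by_bios(void)
{
	u64 msr;

	rdmsrl(MSR_IA32_FEATURE_CONTROL, msr);
	if (!(msr & FC_LOCKED))
		return 0;	/* unlocked: KVM may set the bits itself */
	/* locked: the enable bit matching our SMX state must be set */
	if (tboot_enabled())
		return !(msr & FC_VMXON_IN_SMX);
	return !(msr & FC_VMXON_OUTSIDE_SMX);
}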
-