- 12 8月, 2007 3 次提交
-
-
由 Andi Kleen 提交于
The Averatec 2370 and some other Turion laptop BIOS seems to program the ENABLE_C1E MSR inconsistently between cores. This confuses the lapic use heuristics because when C1E is enabled anywhere it seems to affect the complete chip. Use a global flag instead of a per cpu flag to handle this. If any CPU has C1E enabled disabled lapic use. Thanks to Cal Peake for debugging. Cc: tglx@linutronix.de Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andi Kleen 提交于
Commit 19d36ccd "x86: Fix alternatives and kprobes to remap write-protected kernel text" uses code which is being patched for patching. In particular, paravirt_ops does patching in two stages: first it calls paravirt_ops.patch, then it fills any remaining instructions with nop_out(). nop_out calls text_poke() which calls lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val): that call site is one of the places we patch. If we always do patching as one single call to text_poke(), we only need make sure we're not patching the memcpy in text_poke itself. This means the prototype to paravirt_ops.patch needs to change, to marshal the new code into a buffer rather than patching in place as it does now. It also means all patching goes through text_poke(), which is known to be safe (apply_alternatives is also changed to make a single patch). AK: fix compilation on x86-64 (bad rusty!) AK: fix boot on x86-64 (sigh) AK: merged with other patches Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muli Ben-Yehuda 提交于
This patch finishes the i386 and x86-64 ->sysdata conversion and hopefully also fixes Riku's and Andy's observed bugs. It is based on Yinghai Lu's and Andy Whitcroft's patches (thanks!) with some changes: - introduce pci_scan_bus_with_sysdata() and use it instead of pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will allocate the sysdata structure and then call pci_scan_bus(). - always allocate pci_sysdata dynamically. The whole point of this sysdata work is to make it easy to do root-bus specific things (e.g., support PCI domains and IOMMU's). I dislike using a default struct pci_sysdata in some places and a dynamically allocated pci_sysdata elsewhere - the potential for someone indavertantly changing the default structure is too high. - this patch only makes the minimal changes necessary, i.e., the NUMA node is always initialized to -1. Patches to do the right thing with regards to the NUMA node can build on top of this (either add a 'node' parameter to pci_scan_bus_with_sysdata() or just update the node when it becomes known). The patch was compile tested with various configurations (e.g., NUMAQ, VISWS) and run-time tested on i386 and x86-64. Unfortunately none of my machines exhibited the bugs so caveat emptor. Andy, could you please see if this fixes the NUMA issues you've seen? Riku, does this fix "pci=noacpi" on your laptop? Signed-off-by: NMuli Ben-Yehuda <muli@il.ibm.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andi Kleen <ak@suse.de> Cc: Chuck Ebbert <cebbert@redhat.com> Cc: <riku.seppala@kymp.net> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 8月, 2007 2 次提交
-
-
由 Andrew Morton 提交于
Revert 7e92b4fc. It broke Sébastien Dugué's machine and Jeff said (persuasively) This seems like it will break decades-long-working stuff, in favor of breaking new ground in our favorite area, "trusting the BIOS." It's just not worth it for serial ports, IMO. Serial ports are something that just shouldn't break at this late stage in the game. My new Intel platform boxes don't even have serial ports, so I question the value of messing with serial port probing even more... because... just wait a year, and your box won't have a serial port either! :) I certainly don't object to the use of platform devices (or isa_driver), but the probe change seems questionable. That's sorta analagous to rewriting the floppy driver probe routine. Sure you could do it... but why risk all that damage and go through debugging all over again? It seems clear from this report that we cannot, should not, trust BIOS for something (a) so simple and (b) that has been working for over a decade. Much discussion ensued and we've decided to have another go at all of this. Cc: Sébastien Dugué <sebastien.dugue@bull.net> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Len Brown <lenb@kernel.org> Cc: Adam Belay <ambx1@neo.rr.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Jeff Garzik <jeff@garzik.org> Acked-by: NAlan Cox <alan@lxorguk.ukuu.org.uk> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Cc: Sascha Sommer <saschasommer@freenet.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stephane Eranian 提交于
Remove unused TIF_NOTIFY_RESUME flag for all processor architectures. The flag was not used excecpt on IA-64 where the patch replaces it with TIF_PERFMON_WORK. Signed-off-by: Nstephane eranian <eranian@hpl.hp.com> Cc: <linux-arch@vger.kernel.org> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 7月, 2007 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION to avoid confusion (among other things, with CONFIG_SUSPEND introduced in the next patch). Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 7月, 2007 2 次提交
-
-
由 H. Peter Anvin 提交于
Make "struct ist_info" valid on both i386 and x86-64, and use the structure by name in the setup code. Additionally, "Intel SpeedStep IST" is redundant, refer to it as IST consistently. Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 H. Peter Anvin 提交于
Fix missing letters in the structure members of struct efi_info. Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 25 7月, 2007 1 次提交
-
-
由 Len Brown 提交于
As it was a synonym for (CONFIG_ACPI && CONFIG_X86), the ifdefs for it were more clutter than they were worth. For ia64, just add a few stubs in anticipation of future S3 or S4 support. Signed-off-by: NLen Brown <len.brown@intel.com>
-
- 23 7月, 2007 4 次提交
-
-
由 Andi Kleen 提交于
Previously lock was unconditionally used, but shouldn't be needed on UP systems. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Juergen Beisert 提交于
Due to index register access ordering problems, when using macros a line like this fails (and does nothing): setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88); With inlined functions this line will work as expected. Note about a side effect: Seems on Geode GX1 based systems the "suspend on halt power saving feature" was never enabled due to this wrong macro expansion. With inlined functions it will be enabled, but this will stop the TSC when the CPU runs into a HLT instruction. Kernel output something like this: Clocksource tsc unstable (delta = -472746897 ns) This is the 3rd version of this patch. - Adding missed arch/i386/kernel/cpu/mtrr/state.c Thanks to Andres Salomon - Adding some big fat comments into the new header file Suggested by Andi Kleen AK: fixed x86-64 compilation Signed-off-by: NJuergen Beisert <juergen@kreuzholzen.de> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andi Kleen 提交于
When a machine check or NMI occurs while multiple byte code is patched the CPU could theoretically see an inconsistent instruction and crash. Prevent this by temporarily disabling MCEs and returning early in the NMI handler. Based on discussion with Mathieu Desnoyers. Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andi Kleen 提交于
Reenable kprobes and alternative patching when the kernel text is write protected by DEBUG_RODATA Add a general utility function to change write protected text. The new function remaps the code using vmap to write it and takes care of CPU synchronization. It also does CLFLUSH to make icache recovery faster. There are some limitations on when the function can be used, see the comment. This is a newer version that also changes the paravirt_ops code. text_poke also supports multi byte patching now. Contains bug fixes from Zach Amsden and suggestions from Mathieu Desnoyers. Cc: Jan Beulich <jbeulich@novell.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org> Cc: Zach Amsden <zach@vmware.com> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 7月, 2007 15 次提交
-
-
由 Muli Ben-Yehuda 提交于
This patch introduces struct pci_sysdata to x86 and x86-64, and converts the existing two users (NUMA, Calgary) to use it. This lays the groundwork for having other users of sysdata, such as the PCI domains work. The Calgary bits are tested, the NUMA bits just look ok. Signed-off-by: NJeff Garzik <jeff@garzik.org> Signed-off-by: NMuli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andres Salomon 提交于
This builds upon the existing geode infrastructure, but adds southbridge support, some GPIO functions, and a header file (asm-i386/geode.h) with some useful GX/LX detection tests. The majority of this code was written by Jordan Crouse. Signed-off-by: NJordan Crouse <jordan.crouse@amd.com> Signed-off-by: NAndres Salomon <dilinger@debian.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Thomas Gleixner 提交于
setup_pit_timer is declared in asm-i386/timer.h. Move it to the pit header file, so it can be used by x86_64 as well. Move also the PIT constants. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Robert P. J. Day 提交于
Signed-off-by: NRobert P. J. Day <rpjday@mindspring.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andreas Mohr 提交于
Add cpu_relax() to cmos_lock() inline function for faster operation on SMT CPUs and less power consumption on others in case of lock contention (which probably doesn't happen too often, so admittedly this patch is not too exciting). [akpm@linux-foundation.org: Include the header file for cpu_relax()] Signed-off-by: NAndreas Mohr <andi@lisas.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rafael J. Wysocki 提交于
On some systems the ACPI NVS area is located in the first 1 MB of RAM and it is overwritten by the i386 code during the restore after hibernation. This confuses the ACPI platform firmware that doesn't update the AC adapter status appropriately as a result (http://bugzilla.kernel.org/show_bug.cgi?id=7995). The solution is to register the reserved memory in the first 1 MB as 'nosave', so that swsusp doesn't touch it during the restore. Also, this has been done on x86_64 for a long time now, so this patch makes the i386 restore code behave like the x86_64 one. [akpm@linux-foundation.org: build fix] Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Acked-by: NPavel Machek <pavel@ucw.cz> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Adrian Bunk 提交于
The Rise CPUs were only very short-lived, and there are no reports of anyone both owning one and running Linux on it. Googling for the printk string "CPU: Rise iDragon" didn't find any dmesg available online. If it turns out that against all expectations there are actually users reverting this patch would be easy. This patch will make the kernel images smaller by a few bytes for all i386 users. Signed-off-by: NAdrian Bunk <bunk@stusta.de> Acked-by: NDave Jones <davej@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrew Morton 提交于
Prevent stuff like this: mm/vmalloc.c: In function 'unmap_kernel_range': mm/vmalloc.c:75: warning: unused variable 'start' Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nigel Cunningham 提交于
Signed-off-by: NNigel Cunningham <nigel@nigel.suspend2.net> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Truxton Fulton 提交于
Commit 59f4e7d5 fixed machine rebooting on Truxton's machine (when no keyboard was present). But it broke it on Lee's machine. The patch reinstates the old (pre-59f4e7d5) code and if that doesn't work out, try the new, post-59f4e7d5 code instead. Cc: Lee Garrett <lee-in-berlin@web.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Beulich 提交于
Constrain __supported_pte_mask and NX handling to just the PAE kernel. Signed-off-by: NJan Beulich <jbeulich@novell.com> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Thomas Gleixner 提交于
hpet.h in asm-i386 and asm-x86_64 contain tons of duplicated stuff. Consolidate into one shared header file. AK: Fix i386 compilation with !X86_IO_APIC Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NChris Wright <chrisw@sous-sol.org> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Chris Wright 提交于
Remove pit_interrupt_hook as it adds just an extra layer. Signed-off-by: NChris Wright <chrisw@sous-sol.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andi Kleen 提交于
The compiler generally generates reasonable inline code for the simple cases and for the rest it's better for code size for them to be out of line. Also there they can be potentially optimized more in the future. In fact they probably should be in a .S file because they're all pure assembly, but that's for another day. Also some code style cleanup on them while I was on it (this seems to be the last untouched really early Linux code) This saves ~12k text for a defconfig kernel with gcc 4.1. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Thomas Gleixner 提交于
i386 and sparc64 have the identical code to update the cmos clock. Move it into kernel/time/ntp.c as there are other architectures coming along with the same requirements. [akpm@linux-foundation.org: build fixes] Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 7月, 2007 6 次提交
-
-
由 Avi Kivity 提交于
Currently, CONFIG_X86_CMPXCHG64 both enables boot-time checking of the cmpxchg64b feature and enables compilation of the set_64bit() family. Since the option is dependent on PAE, and since KVM depends on set_64bit(), this effectively disables KVM on i386 nopae. Simplify by removing the config option altogether: the boot check is made dependent on CONFIG_X86_PAE directly, and the set_64bit() family is exposed without constraints. It is up to users to check for the feature flag (KVM does not as virtualiation extensions imply its existence). Signed-off-by: NAvi Kivity <avi@qumranet.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ralf Baechle 提交于
Since Ingo's recent scheduler rewrite which was merged as commit 0437e109 sched_cacheflush is unused. Signed-off-by: NRalf Baechle <ralf@linux-mips.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Rusty Russell 提交于
This is the code for the "lg.ko" module, which allows lguest guests to be launched. [akpm@linux-foundation.org: update for futex-new-private-futexes] [akpm@linux-foundation.org: build fix] [jmorris@namei.org: lguest: use hrtimers] [akpm@linux-foundation.org: x86_64 build fix] Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
New arch macro STACK_TOP_MAX it gives the larges valid stack address for the architecture in question. It differs from STACK_TOP in that it will not distinguish between personalities but will always return the largest possible address. This is used to create the initial stack on execve, which we will move down to the proper location once the binfmt code has figured out where that is. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NOllie Wild <aaw@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Fenghua Yu 提交于
per cpu data section contains two types of data. One set which is exclusively accessed by the local cpu and the other set which is per cpu, but also shared by remote cpus. In the current kernel, these two sets are not clearely separated out. This can potentially cause the same data cacheline shared between the two sets of data, which will result in unnecessary bouncing of the cacheline between cpus. One way to fix the problem is to cacheline align the remotely accessed per cpu data, both at the beginning and at the end. Because of the padding at both ends, this will likely cause some memory wastage and also the interface to achieve this is not clean. This patch: Moves the remotely accessed per cpu data (which is currently marked as ____cacheline_aligned_in_smp) into a different section, where all the data elements are cacheline aligned. And as such, this differentiates the local only data and remotely accessed data cleanly. Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: <linux-arch@vger.kernel.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michael Ellerman 提交于
AFAICT now that jprobe.entry is a void *, JPROBE_ENTRY doesn't do anything useful - so remove it .. I've left a do-nothing version so that out-of-tree jprobes code will still compile without modifications. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Acked-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 7月, 2007 6 次提交
-
-
由 Jeremy Fitzhardinge 提交于
This communicates with the machine control software via a registry residing in a controlling virtual machine. This allows dynamic creation, destruction and modification of virtual device configurations (network devices, block devices and CPUS, to name some examples). [ Greg, would you mind giving this a review? Thanks -J ] Signed-off-by: NIan Pratt <ian.pratt@xensource.com> Signed-off-by: NChristian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NChris Wright <chrisw@sous-sol.org> Cc: Greg KH <greg@kroah.com>
-
由 Jeremy Fitzhardinge 提交于
This patch is a rollup of all the core pieces of the Xen implementation, including: - booting and setup - pagetable setup - privileged instructions - segmentation - interrupt flags - upcalls - multicall batching BOOTING AND SETUP The vmlinux image is decorated with ELF notes which tell the Xen domain builder what the kernel's requirements are; the domain builder then constructs the address space accordingly and starts the kernel. Xen has its own entrypoint for the kernel (contained in an ELF note). The ELF notes are set up by xen-head.S, which is included into head.S. In principle it could be linked separately, but it seems to provoke lots of binutils bugs. Because the domain builder starts the kernel in a fairly sane state (32-bit protected mode, paging enabled, flat segments set up), there's not a lot of setup needed before starting the kernel proper. The main steps are: 1. Install the Xen paravirt_ops, which is simply a matter of a structure assignment. 2. Set init_mm to use the Xen-supplied pagetables (analogous to the head.S generated pagetables in a native boot). 3. Reserve address space for Xen, since it takes a chunk at the top of the address space for its own use. 4. Call start_kernel() PAGETABLE SETUP Once we hit the main kernel boot sequence, it will end up calling back via paravirt_ops to set up various pieces of Xen specific state. One of the critical things which requires a bit of extra care is the construction of the initial init_mm pagetable. Because Xen places tight constraints on pagetables (an active pagetable must always be valid, and must always be mapped read-only to the guest domain), we need to be careful when constructing the new pagetable to keep these constraints in mind. It turns out that the easiest way to do this is use the initial Xen-provided pagetable as a template, and then just insert new mappings for memory where a mapping doesn't already exist. This means that during pagetable setup, it uses a special version of xen_set_pte which ignores any attempt to remap a read-only page as read-write (since Xen will map its own initial pagetable as RO), but lets other changes to the ptes happen, so that things like NX are set properly. PRIVILEGED INSTRUCTIONS AND SEGMENTATION When the kernel runs under Xen, it runs in ring 1 rather than ring 0. This means that it is more privileged than user-mode in ring 3, but it still can't run privileged instructions directly. Non-performance critical instructions are dealt with by taking a privilege exception and trapping into the hypervisor and emulating the instruction, but more performance-critical instructions have their own specific paravirt_ops. In many cases we can avoid having to do any hypercalls for these instructions, or the Xen implementation is quite different from the normal native version. The privileged instructions fall into the broad classes of: Segmentation: setting up the GDT and the GDT entries, LDT, TLS and so on. Xen doesn't allow the GDT to be directly modified; all GDT updates are done via hypercalls where the new entries can be validated. This is important because Xen uses segment limits to prevent the guest kernel from damaging the hypervisor itself. Traps and exceptions: Xen uses a special format for trap entrypoints, so when the kernel wants to set an IDT entry, it needs to be converted to the form Xen expects. Xen sets int 0x80 up specially so that the trap goes straight from userspace into the guest kernel without going via the hypervisor. sysenter isn't supported. Kernel stack: The esp0 entry is extracted from the tss and provided to Xen. TLB operations: the various TLB calls are mapped into corresponding Xen hypercalls. Control registers: all the control registers are privileged. The most important is cr3, which points to the base of the current pagetable, and we handle it specially. Another instruction we treat specially is CPUID, even though its not privileged. We want to control what CPU features are visible to the rest of the kernel, and so CPUID ends up going into a paravirt_op. Xen implements this mainly to disable the ACPI and APIC subsystems. INTERRUPT FLAGS Xen maintains its own separate flag for masking events, which is contained within the per-cpu vcpu_info structure. Because the guest kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely ignored (and must be, because even if a guest domain disables interrupts for itself, it can't disable them overall). (A note on terminology: "events" and interrupts are effectively synonymous. However, rather than using an "enable flag", Xen uses a "mask flag", which blocks event delivery when it is non-zero.) There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which are implemented to manage the Xen event mask state. The only thing worth noting is that when events are unmasked, we need to explicitly see if there's a pending event and call into the hypervisor to make sure it gets delivered. UPCALLS Xen needs a couple of upcall (or callback) functions to be implemented by each guest. One is the event upcalls, which is how events (interrupts, effectively) are delivered to the guests. The other is the failsafe callback, which is used to report errors in either reloading a segment register, or caused by iret. These are implemented in i386/kernel/entry.S so they can jump into the normal iret_exc path when necessary. MULTICALL BATCHING Xen provides a multicall mechanism, which allows multiple hypercalls to be issued at once in order to mitigate the cost of trapping into the hypervisor. This is particularly useful for context switches, since the 4-5 hypercalls they would normally need (reload cr3, update TLS, maybe update LDT) can be reduced to one. This patch implements a generic batching mechanism for hypercalls, which gets used in many places in the Xen code. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NChris Wright <chrisw@sous-sol.org> Cc: Ian Pratt <ian.pratt@xensource.com> Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Cc: Adrian Bunk <bunk@stusta.de>
-
由 Jeremy Fitzhardinge 提交于
Add Xen interface header files. These are taken fairly directly from the Xen tree, but somewhat rearranged to suit the kernel's conventions. Define macros and inline functions for doing hypercalls into the hypervisor. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NIan Pratt <ian.pratt@xensource.com> Signed-off-by: NChristian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: NChris Wright <chrisw@sous-sol.org>
-
由 Jeremy Fitzhardinge 提交于
The tsc-based get_scheduled_cycles interface is not a good match for Xen's runstate accounting, which reports everything in nanoseconds. This patch replaces this interface with a sched_clock interface, which matches both Xen and VMI's requirements. In order to do this, we: 1. replace get_scheduled_cycles with sched_clock 2. hoist cycles_2_ns into a common header 3. update vmi accordingly One thing to note: because sched_clock is implemented as a weak function in kernel/sched.c, we must define a real function in order to override this weak binding. This means the usual paravirt_ops technique of using an inline function won't work in this case. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Cc: Zachary Amsden <zach@vmware.com> Cc: Dan Hecht <dhecht@vmware.com> Cc: john stultz <johnstul@us.ibm.com>
-
由 Jeremy Fitzhardinge 提交于
In a virtual environment, device drivers such as legacy IDE will waste quite a lot of time probing for their devices which will never appear. This helper function allows a paravirt implementation to lay claim to the whole iomem and ioport space, thereby disabling all device drivers trying to claim IO resources. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NChris Wright <chrisw@sous-sol.org> Cc: Rusty Russell <rusty@rustcorp.com.au>
-
由 Jeremy Fitzhardinge 提交于
Paravirt implementations need to set the sibling map on new cpus. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
-