- 20 Jun, 2008 1 commit
-
-
Committed by Jeremy Fitzhardinge
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 25 Apr, 2008 2 commits
-
-
Committed by Jeremy Fitzhardinge
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Jeremy Fitzhardinge
Rename (alloc|release)_(pt|pd) to pte/pmd to explicitly match the name of the appropriate pagetable level structure. [ x86.git merge work by Mark McLoughlin <markmc@redhat.com> ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 17 Apr, 2008 1 commit
-
-
Committed by Jeremy Fitzhardinge
The memory resource is also used for main memory, and we need it to allocate physical addresses for memory hotplug. Knobbling io space is enough to get the job done anyway. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 30 Jan, 2008 13 commits
-
-
Committed by Eduardo Habkost
This finally makes paravirt-ops able to compile and boot under x86_64. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Eduardo Habkost
paravirt_pagetable_setup_{start,done}() are not used (yet) under x86_64, and native_pagetable_setup_{start,done}() don't exist on x86_64. So they don't need to be set. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
This patch fills in the read and write cr8 fields with their native versions. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
x86_read_per_cpu() and its writeish sister are not present in x86_64. So in this patch, we replace them with __get_cpu_var(), which is present in both. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
The core patching code for paravirt differs enough between i386 and x86_64 that we move it to arch-specific files. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
This patch adds a paravirt hook for the swapgs operation, which is a privileged operation in x86_64. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
This patch adds a field in pv_cpu_ops for a paravirtualized hook for rdtscp, needed for x86_64. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
This patch renames paravirt_32.c to paravirt.c. The goal is to have paravirt support in x86_64, so we do it in a common file. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
This patch changes the signature of write_ldt_entry. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> CC: Zachary Amsden <zach@vmware.com> CC: Jeremy Fitzhardinge <Jeremy.Fitzhardinge.citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
This patch changes the write_gdt_entry function signature. Instead of the old "a" and "b" parameters, it now receives a pointer to a desc_struct, and the size of the entry being handled. This is because x86_64 can have some 16-byte entries as well as 8-byte ones. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> CC: Zachary Amsden <zach@vmware.com> CC: Jeremy Fitzhardinge <Jeremy.Fitzhardinge.citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
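To illustrate why a pointer plus a size is more flexible than two 32-bit words, here is a minimal user-space sketch. It is not the kernel's implementation: `demo_desc` and `demo_write_gdt_entry` are hypothetical names, and the table is an ordinary array rather than a real GDT.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 8-byte descriptor slot. */
struct demo_desc {
    uint32_t a, b;
};

/*
 * Copy `size` bytes of descriptor data into the table, starting at slot
 * `entry`. An 8-byte descriptor fills one slot; a 16-byte descriptor
 * (as x86_64 system descriptors are) spills into the following slot.
 */
static void demo_write_gdt_entry(struct demo_desc *table, int entry,
                                 const void *desc, size_t size)
{
    /* Only whole slots can be written. */
    assert(size % sizeof(struct demo_desc) == 0);
    memcpy(&table[entry], desc, size);
}
```

With the old two-word interface a 16-byte entry would have needed two calls and two slot indices; here the caller passes `sizeof` of whatever descriptor type it holds.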
-
This patch changes the write_idt_entry signature. It now takes a gate_desc instead of the a and b parameters. This will allow it to be unified between i386 and x86_64 later. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> CC: Zachary Amsden <zach@vmware.com> CC: Jeremy Fitzhardinge <Jeremy.Fitzhardinge.citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by H. Peter Anvin
This changes size-specific register names (eip/rip, esp/rsp, etc.) to generic names in the thread and tss structures. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
This patch consolidates the irqflags include files containing common paravirt definitions. The native definitions for interrupt handling, halt, and such are the same for 32 and 64 bit, and they are kept in irqflags.h. The differences are split into the arch-specific files. The syscall function, irq_enable_sysexit, has a very i386-specific name, so it is renamed to a more general one. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 30 Nov, 2007 1 commit
-
-
Committed by Jeremy Fitzhardinge
Subdividing the paravirt_ops structure caused a regression in certain non-GPL modules which try to use mmu_ops and cpu_ops. This restores the old behaviour, and makes it consistent with the non-CONFIG_PARAVIRT case.

Takashi Iwai <tiwai@suse.de> adds:
> I took a look at this problem (as I have an nvidia card on one of my
> workstations), and found out that the following suffer from
> EXPORT_SYMBOL_GPL changes:
>
> * local_disable_irq(), local_irq_save*(), etc.
> * MSR-related macros like rdmsr(), wrmsr(), read_cr0(), etc.
>   wbinvd(), too.
> * pmd_val(), pgd_val(), etc are all involved with pv_mm_ops.
>   pmd_large() and pmd_bad() is also indirectly involved.
>   __flush_tlb() and friends suffer, too.

Christoph Hellwig objects to this patch on the grounds that modules shouldn't be using these operations anyway. I don't think this is a particularly good reason to reject the patch, for several reasons:
1. These operations are still available to modules when not using CONFIG_PARAVIRT, since they are implicitly exported as inline functions via the kernel headers. Exporting the same functionality as GPL-only symbols just adds a gratuitous difference between CONFIG_PARAVIRT and non-CONFIG_PARAVIRT configurations. If we really think these operations are not for module use (or non-GPL module use), then we should solve the problem in a general way.
2. It's a regression from previous kernels, which would work with these modules even with CONFIG_PARAVIRT enabled.
3. The operations in question seem pretty reasonable for modules to use. The control registers/MSRs can be accessed directly anyway, so there's no benefit in preventing modules from using standard interfaces. And it seems reasonable to allow a graphics driver to create its own mappings if it wants.

Therefore, I think this patch should go in for 2.6.24. If people really think that these operations should not be available to modules, then we can address that separately.
Signed-off-by: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com> Cc: Tobias Powalowski <t.powa@gmx.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Takashi Iwai <tiwai@suse.de> Cc: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 17 Oct, 2007 2 commits
-
-
Committed by Jeremy Fitzhardinge
Currently, the set_lazy_mode pv_op is overloaded with 5 functions:
1. enter lazy cpu mode
2. leave lazy cpu mode
3. enter lazy mmu mode
4. leave lazy mmu mode
5. flush pending batched operations

This complicates each paravirt backend, since it needs to deal with all the possible state transitions, handling flushing, etc. In particular, flushing is quite distinct from the other 4 functions, and seems to just cause complication.

This patch removes the set_lazy_mode operation, and adds "enter" and "leave" lazy mode operations on mmu_ops and cpu_ops. All the logic associated with entering and leaving lazy states is now in common code (basically BUG_ONs to make sure that no mode is current when entering a lazy mode, and that the mode is current when leaving). Also, flush is handled in a common way, by simply leaving and re-entering the lazy mode.

The result is that the Xen, lguest and VMI lazy mode implementations are much simpler.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andi Kleen <ak@suse.de> Cc: Zach Amsden <zach@vmware.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Avi Kivity <avi@qumranet.com> Cc: Anthony Liguory <aliguori@us.ibm.com> Cc: "Glauber de Oliveira Costa" <glommer@gmail.com> Cc: Jun Nakajima <jun.nakajima@intel.com>
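The common-code state tracking described in this message can be sketched roughly as follows. This is a user-space illustration, not the kernel source: all `demo_*` names are hypothetical, `assert()` stands in for BUG_ON, and the backend hooks are empty placeholders apart from a counter.

```c
#include <assert.h>

enum demo_lazy_mode { DEMO_LAZY_NONE, DEMO_LAZY_MMU, DEMO_LAZY_CPU };

/* Which lazy mode is current (the kernel would track this per CPU). */
static enum demo_lazy_mode demo_lazy = DEMO_LAZY_NONE;

/* Counts how often the backend was asked to push out its batch. */
static int demo_flushes;

/* Hypothetical backend hooks: start batching / submit the batch. */
static void demo_backend_enter(void) { }
static void demo_backend_leave(void) { demo_flushes++; }

static void demo_enter_lazy(enum demo_lazy_mode mode)
{
    assert(demo_lazy == DEMO_LAZY_NONE);  /* no mode may be current */
    demo_lazy = mode;
    demo_backend_enter();
}

static void demo_leave_lazy(enum demo_lazy_mode mode)
{
    assert(demo_lazy == mode);            /* must leave the current mode */
    demo_backend_leave();
    demo_lazy = DEMO_LAZY_NONE;
}

/* Flush needs no backend hook of its own: leave, then re-enter. */
static void demo_flush_lazy_mmu(void)
{
    if (demo_lazy == DEMO_LAZY_MMU) {
        demo_leave_lazy(DEMO_LAZY_MMU);
        demo_enter_lazy(DEMO_LAZY_MMU);
    }
}
```

Because the checks live in the common wrappers, a backend in this sketch only supplies the two hooks and never sees an invalid transition.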
-
Committed by Jeremy Fitzhardinge
This patch refactors the paravirt_ops structure into groups of functionally related ops:
 pv_info - random info, rather than function entrypoints
 pv_init_ops - functions used at boot time (some for module_init too)
 pv_misc_ops - lazy mode, which didn't fit well anywhere else
 pv_time_ops - time-related functions
 pv_cpu_ops - various privileged instruction ops
 pv_irq_ops - operations for managing interrupt state
 pv_apic_ops - APIC operations
 pv_mmu_ops - operations for managing pagetables

There are several motivations for this:
1. Some of these ops will be general to all x86, and some will be i386/x86-64 specific. This makes it easier to share common stuff while allowing separate implementations where needed.
2. At the moment we must export all of paravirt_ops, but modules only need selected parts of it. This allows us to export on a case by case basis (and also choose which export license we want to apply).
3. Functional groupings make things a bit more readable.

Struct paravirt_ops is now only used as a template to generate patch-site identifiers, and to extract function pointers for inserting into jmp/calls when patching. It is only instantiated when needed.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Zach Amsden <zach@vmware.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Anthony Liguory <aliguori@us.ibm.com> Cc: "Glauber de Oliveira Costa" <glommer@gmail.com> Cc: Jun Nakajima <jun.nakajima@intel.com>
-
- 11 Oct, 2007 2 commits
-
-
Committed by Thomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Thomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 12 Aug, 2007 1 commit
-
-
Committed by Andi Kleen
Commit 19d36ccd "x86: Fix alternatives and kprobes to remap write-protected kernel text" uses code which is being patched for patching.

In particular, paravirt_ops does patching in two stages: first it calls paravirt_ops.patch, then it fills any remaining instructions with nop_out(). nop_out calls text_poke() which calls lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val): that call site is one of the places we patch.

If we always do patching as one single call to text_poke(), we only need to make sure we're not patching the memcpy in text_poke itself. This means the prototype of paravirt_ops.patch needs to change, to marshal the new code into a buffer rather than patching in place as it does now. It also means all patching goes through text_poke(), which is known to be safe (apply_alternatives is also changed to make a single patch).

AK: fix compilation on x86-64 (bad rusty!)
AK: fix boot on x86-64 (sigh)
AK: merged with other patches
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
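The "marshal into a buffer, then poke once" flow can be sketched in user space as below. This is an illustration under stated assumptions, not the kernel code: `demo_text_poke`, `demo_backend_patch` and `demo_apply_patch` are hypothetical names, the "kernel text" is a plain byte array, and the backend always emits a single `cli` (opcode 0xfa). The byte values 0xfa and 0x90 (`nop`) are the real x86 encodings.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NOP      0x90  /* x86 single-byte nop */
#define DEMO_MAX_SITE 16    /* largest call site this sketch handles */

static unsigned demo_pokes; /* counts writes to the patched text */

/* Stand-in for text_poke(): the only function that writes to a site. */
static void demo_text_poke(unsigned char *addr, const unsigned char *src,
                           size_t len)
{
    memcpy(addr, src, len);
    demo_pokes++;
}

/*
 * Hypothetical backend patcher: emits replacement code into `buf`
 * (never into the site itself) and returns the number of bytes used.
 * Returning `len` with `buf` untouched leaves the site as it was.
 */
static size_t demo_backend_patch(unsigned char *buf, size_t len)
{
    static const unsigned char insn[] = { 0xfa };  /* cli */

    if (len < sizeof(insn))
        return len;
    memcpy(buf, insn, sizeof(insn));
    return sizeof(insn);
}

static void demo_apply_patch(unsigned char *site, size_t len)
{
    unsigned char buf[DEMO_MAX_SITE];
    size_t used;

    assert(len <= sizeof(buf));
    memcpy(buf, site, len);                    /* start from the old code */
    used = demo_backend_patch(buf, len);
    memset(buf + used, DEMO_NOP, len - used);  /* pad inside the buffer */
    demo_text_poke(site, buf, len);            /* exactly one write */
}
```

The nop padding happens in the scratch buffer, so nothing that runs between the backend call and the final write depends on a half-patched site.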
-
- 23 Jul, 2007 2 commits
-
-
Committed by Andi Kleen
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Andi Kleen
Reenable kprobes and alternative patching when the kernel text is write protected by DEBUG_RODATA. Add a general utility function to change write protected text. The new function remaps the code using vmap to write it and takes care of CPU synchronization. It also does CLFLUSH to make icache recovery faster. There are some limitations on when the function can be used, see the comment. This is a newer version that also changes the paravirt_ops code. text_poke also supports multi byte patching now. Contains bug fixes from Zach Amsden and suggestions from Mathieu Desnoyers. Cc: Jan Beulich <jbeulich@novell.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org> Cc: Zach Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 18 Jul, 2007 2 commits
-
-
Committed by Jeremy Fitzhardinge
The tsc-based get_scheduled_cycles interface is not a good match for Xen's runstate accounting, which reports everything in nanoseconds. This patch replaces this interface with a sched_clock interface, which matches both Xen and VMI's requirements. In order to do this, we: 1. replace get_scheduled_cycles with sched_clock 2. hoist cycles_2_ns into a common header 3. update vmi accordingly. One thing to note: because sched_clock is implemented as a weak function in kernel/sched.c, we must define a real function in order to override this weak binding. This means the usual paravirt_ops technique of using an inline function won't work in this case. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Zachary Amsden <zach@vmware.com> Cc: Dan Hecht <dhecht@vmware.com> Cc: john stultz <johnstul@us.ibm.com>
-
Committed by Jeremy Fitzhardinge
In a virtual environment, device drivers such as legacy IDE will waste quite a lot of time probing for their devices which will never appear. This helper function allows a paravirt implementation to lay claim to the whole iomem and ioport space, thereby disabling all device drivers trying to claim IO resources. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Rusty Russell <rusty@rustcorp.com.au>
-
- 11 May, 2007 1 commit
-
-
Committed by Eric W. Biederman
This reverts commit c9ccf30d. Entering the kernel at startup_32 without passing our real mode data in %esi, and without guaranteeing that physical and virtual addresses are identity mapped, makes head.S impossible to maintain. The only user of this infrastructure is lguest, which is not merged, so nothing we currently support will break by removing this over-designed nightmare, and only the pending lguest patches will be affected. The pending Xen patches have a different entry point that they use. We are currently discussing what Xen and lguest need to do to boot the kernel in a more normal fashion, so using startup_32 in this weird manner is clearly not their long term direction. So let's remove this code in head.S before it causes brain damage to people trying to maintain head.S. Cc: Chris Wright <chrisw@sous-sol.org> Cc: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Zachary Amsden <zach@vmware.com> CC: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 03 May, 2007 12 commits
-
-
Committed by Jeremy Fitzhardinge
startup_ipi_hook depends on CONFIG_X86_LOCAL_APIC, so move it to the right part of the paravirt_ops initialization. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>
-
Committed by Andi Kleen
Otherwise non-GPL modules cannot even do basic operations like disabling interrupts anymore, which would be excessive. Longer term, we should split the single structure up into internal and external symbols and not export the internal ones at all. Signed-off-by: Andi Kleen <ak@suse.de>
-
Committed by Jeremy Fitzhardinge
In shadow mode hypervisors, ptep_get_and_clear achieves the desired purpose of keeping the shadows in sync by issuing a native_get_and_clear, followed by a call to pte_update, which indicates the PTE has been modified. Direct mode hypervisors (Xen) have no need for this anyway, and will trap the update using writable pagetables. This means no hypervisor makes use of ptep_get_and_clear; there is no reason to have it in the paravirt-ops structure. Change confusing terminology about raw vs. native functions into consistent use of native_pte_xxx for operations which do not invoke paravirt-ops. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de>
-
Committed by Jeremy Fitzhardinge
Xen and VMI both have special requirements when mapping a highmem pte page into the kernel address space. These can be dealt with by adding a new kmap_atomic_pte() function for mapping highptes, and hooking it into the paravirt_ops infrastructure. Xen specifically wants to map the pte page RO, so this patch exposes a helper function, kmap_atomic_prot, which maps the page with the specified page protections. This also adds a kmap_flush_unused() function to clear out the cached kmap mappings. Xen needs this to clear out any potential stray RW mappings of pages which will become part of a pagetable. [ Zach - vmi.c will need some attention after this patch. It wasn't immediately obvious to me what needs to be done. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com>
-
Committed by Jeremy Fitzhardinge
Back out the map_pt_hook to clear the way for kmap_atomic_pte. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com>
-
Committed by Jeremy Fitzhardinge
This patch adds a pv_op for flush_tlb_others. Linux running on native hardware uses cross-CPU IPIs to flush the TLB on any CPU which may have a particular mm's pagetable entries cached in its TLB. This is inefficient in a paravirtualized environment, since the hypervisor knows which real CPUs actually contain cached mappings, which may be a small subset of a guest's VCPUs. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>
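A rough model of why the hook helps, written as a user-space sketch: the native implementation interrupts every CPU in the mask, while a hypervisor backend can restrict itself to VCPUs that are actually running. Everything here is hypothetical (the `demo_*` names, the bitmask CPU representation, the fixed `demo_vcpus_running` value), and the functions only count the interrupts they would send.

```c
#include <assert.h>

typedef unsigned int demo_cpumask;  /* bit n set = CPU n (assumption) */

/* Number of CPUs in a mask. */
static unsigned demo_weight(demo_cpumask m)
{
    unsigned n = 0;

    for (; m; m &= m - 1)  /* clear the lowest set bit each round */
        n++;
    return n;
}

/* VCPUs currently scheduled on a real CPU, as the hypervisor sees it. */
static demo_cpumask demo_vcpus_running = 0x5;  /* VCPUs 0 and 2 */

/* Native: one IPI per CPU in the mask. Returns the number sent. */
static unsigned demo_native_flush_tlb_others(demo_cpumask mask)
{
    return demo_weight(mask);
}

/* Hypervisor backend: only running VCPUs can hold stale entries. */
static unsigned demo_hv_flush_tlb_others(demo_cpumask mask)
{
    return demo_weight(mask & demo_vcpus_running);
}

struct demo_mmu_ops {
    unsigned (*flush_tlb_others)(demo_cpumask mask);
};

static struct demo_mmu_ops demo_mmu_ops = { demo_native_flush_tlb_others };

/* Call site: always goes through the ops table. */
static unsigned demo_flush_tlb_others(demo_cpumask mask)
{
    assert(mask != 0);  /* callers never pass an empty mask */
    return demo_mmu_ops.flush_tlb_others(mask);
}
```

Swapping the function pointer is all a backend has to do; callers of `demo_flush_tlb_others()` are unchanged.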
-
Committed by Jeremy Fitzhardinge
Implement the actual patching machinery. paravirt_patch_default() contains the logic to automatically patch a callsite based on a few simple rules:
 - if the paravirt_op function is paravirt_nop, then patch nops
 - if the paravirt_op function is a jmp target, then jmp to it
 - if the paravirt_op function is callable and doesn't clobber too much for the callsite, call it directly

paravirt_patch_default is suitable as a default implementation of paravirt_ops.patch, and will remove most of the expensive indirect calls in favour of either a direct call or a pile of nops. Backends may implement their own patcher, however. There are several helper functions to help with this:
 paravirt_patch_nop - nop out a callsite
 paravirt_patch_ignore - leave the callsite as-is
 paravirt_patch_call - patch a call if the caller and callee have compatible clobbers
 paravirt_patch_jmp - patch in a jmp
 paravirt_patch_insns - patch some literal instructions over the callsite, if they fit

This patch also implements more direct patches for the native case, so that when running on native hardware many common operations are implemented inline.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com> Cc: Anthony Liguori <anthony@codemonkey.ws> Acked-by: Ingo Molnar <mingo@elte.hu>
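The three rules can be read as a small decision function. The sketch below only models the decision, not the instruction rewriting, and makes several assumptions not found in the message: the `demo_*` names, the representation of clobbers as a register bitmask, and the explicit `is_jmp_target` flag are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_action {
    DEMO_PATCH_NOPS,      /* replace the call with nops */
    DEMO_PATCH_JMP,       /* replace it with a direct jmp */
    DEMO_PATCH_CALL,      /* replace it with a direct call */
    DEMO_LEAVE_INDIRECT   /* keep the original indirect call */
};

typedef void (*demo_fn)(void);

static int demo_counter;
static void demo_nop(void) { }                  /* models paravirt_nop */
static void demo_work(void) { demo_counter++; } /* some real operation */

struct demo_site {
    demo_fn  target;          /* current value of the op */
    bool     is_jmp_target;   /* op is entered by jmp, never returns */
    unsigned site_clobbers;   /* registers the call site lets us clobber */
    unsigned target_clobbers; /* registers the target may clobber */
};

static enum demo_action demo_patch_default(const struct demo_site *s)
{
    if (s->target == demo_nop)
        return DEMO_PATCH_NOPS;
    if (s->is_jmp_target)
        return DEMO_PATCH_JMP;
    /* A direct call is safe only if the target clobbers nothing
     * beyond what the site already tolerates. */
    if ((s->target_clobbers & ~s->site_clobbers) == 0)
        return DEMO_PATCH_CALL;
    return DEMO_LEAVE_INDIRECT;
}
```

The fall-through case matters: when the clobbers are incompatible, the sketch keeps the slower indirect call rather than risking a corrupted register.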
-
Committed by Jeremy Fitzhardinge
Use patch type identifiers derived from the offset of the operation in the paravirt_ops structure. This avoids having to maintain a separate enum for patch site types. Also, since the identifier is derived from the offset into paravirt_ops, the offset can be derived from the identifier. This is used to remove replicated information in the various callsite macros, which has been a source of bugs in the past. This patch also drops the fused save_fl+cli operation, which doesn't really add much and makes things more complex - specifically because it breaks the 1:1 relationship between identifiers and offsets. If this operation turns out to be particularly beneficial, then the right answer is to define a new entrypoint for it. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com>
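The identifier/offset relationship can be demonstrated with `offsetof`. This is a minimal sketch with a hypothetical three-member structure (`demo_ops`) and hypothetical macro and function names; it assumes, as is true on common platforms, that all function pointers have the same size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*demo_fn)(void);

/* Hypothetical, heavily reduced ops structure: every member is exactly
 * one function pointer, which is what makes the scheme work. */
struct demo_ops {
    void (*irq_disable)(void);
    void (*irq_enable)(void);
    unsigned long (*save_fl)(void);
};

#define DEMO_NR_OPS (sizeof(struct demo_ops) / sizeof(demo_fn))

/* Identifier = index of the member, counted in pointer-sized slots. */
#define DEMO_PATCH(member) \
    ((unsigned)(offsetof(struct demo_ops, member) / sizeof(demo_fn)))

static void demo_irq_disable(void) { }
static void demo_irq_enable(void) { }
static unsigned long demo_save_fl(void) { return 0; }

static const struct demo_ops demo_native = {
    demo_irq_disable, demo_irq_enable, demo_save_fl
};

/* Inverse mapping: identifier -> offset -> function pointer. */
static demo_fn demo_op_for_type(const struct demo_ops *ops, unsigned type)
{
    demo_fn fn;

    assert(type < DEMO_NR_OPS);
    memcpy(&fn, (const char *)ops + type * sizeof(demo_fn), sizeof(fn));
    return fn;
}
```

Adding a member to `demo_ops` gives it an identifier automatically, so there is no separate enum to keep in sync. A fused operation with no member of its own would have no slot, which is the 1:1 problem the message mentions.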
-
Committed by Jeremy Fitzhardinge
Add hooks to allow a paravirt implementation to track the lifetime of an mm. Paravirtualization requires three hooks, but only two are needed in common code. They are:
 arch_dup_mmap, which is called when a new mmap is created at fork
 arch_exit_mmap, which is called when the last process reference to an mm is dropped, which typically happens on exit and exec.

The third hook is activate_mm, which is called from the arch-specific activate_mm() macro/function, and so doesn't need stub versions for other architectures. It's called when an mm is first used.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: linux-arch@vger.kernel.org Cc: James Bottomley <James.Bottomley@SteelEye.com> Acked-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Jeremy Fitzhardinge
Normally when running in PAE mode, the 4th PMD maps the kernel address space, which can be shared among all processes (since they all need the same kernel mappings). Xen, however, does not allow guests to have the kernel pmd shared between page tables, so parameterize pgtable.c to allow both modes of operation.

There are several side-effects of this. One is that vmalloc will update the kernel address space mappings, and those updates need to be propagated into all processes if the kernel mappings are not intrinsically shared. In the non-PAE case, this is done by maintaining a pgd_list of all processes; this list is used when all process pagetables must be updated. pgd_list is threaded via otherwise unused entries in the page structure for the pgd, which means that the pgd must be page-sized for this to work.

Normally the PAE pgd is only 4x64-bit entries large, but Xen requires the PAE pgd to be page aligned anyway, so this patch forces the pgd to be page aligned+sized when the kernel pmd is unshared, to accommodate both these requirements.

Also, since there may be several distinct kernel pmds (if the user/kernel split is below 3G), there's no point in allocating them from a slab cache; they're just allocated with get_free_page and initialized appropriately. (Of course they could be cached if there is just a single kernel pmd - which is the default with a 3G user/kernel split - but it doesn't seem worthwhile to add yet another case into this code).

[ Many thanks to wli for review comments. ]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com> Cc: Christoph Lameter <clameter@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Committed by Jeremy Fitzhardinge
This patch introduces paravirt_ops hooks to control how the kernel's initial pagetable is set up.

In the case of a native boot, the very early bootstrap code creates a simple non-PAE pagetable to map the kernel and physical memory. When the VM subsystem is initialized, it creates a proper pagetable which respects the PAE mode, large pages, etc.

When booting under a hypervisor, there are many possibilities for what paging environment the hypervisor establishes for the guest kernel, so the construction of the kernel's pagetable depends on the hypervisor.

In the case of Xen, the hypervisor boots the kernel with a fully constructed pagetable, which is already using PAE if necessary. Also, Xen requires particular care when constructing pagetables to make sure all pagetables are always mapped read-only.

In order to make this easier, the kernel's initial pagetable construction has been changed to only allocate and initialize a pagetable page if there's no page already present in the pagetable. This allows the Xen paravirt backend to make a copy of the hypervisor-provided pagetable, allowing the kernel to establish any more mappings it needs while keeping the existing ones.

A slightly subtle point which is worth highlighting here is that Xen requires all kernel mappings to share the same pte_t pages between all pagetables, so that updating a kernel page's mapping in one pagetable is reflected in all other pagetables. This makes it possible to allocate a page and attach it to a pagetable without having to explicitly enumerate that page's mapping in all pagetables.

And:

From: "Eric W. Biederman" <ebiederm@xmission.com>

If we don't set the leaf page table entries it is quite possible that we will inherit an incorrect page table entry from the initial boot page table setup in head.S. So we need to redo the effort here, so we pick up PSE, PGE and the like.

Hypervisors like Xen require that their page tables be read-only, which is slightly incompatible with our low identity mappings; however, I discussed this with Jeremy and he has modified the Xen early set_pte function to avoid problems in this area.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: William Irwin <bill.irwin@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>
-
Committed by Jeremy Fitzhardinge
Add a set of accessors to pack, unpack and modify page table entries (at all levels). This allows a paravirt implementation to control the contents of pgd/pmd/pte entries. For example, Xen uses this to convert the (pseudo-)physical address into a machine address when populating a pagetable entry, and converting back to pphys address when an entry is read. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu>
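The pack/unpack idea can be sketched with a toy translation table. This is an illustration only: the `demo_*` names, the four-entry pseudo-physical-to-machine table and the 12-bit flag field are assumptions chosen for the example, and a real backend would typically also special-case non-present entries, which this sketch does not.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_FLAGS_MASK ((UINT64_C(1) << DEMO_PAGE_SHIFT) - 1)
#define DEMO_NR_FRAMES  4

typedef struct { uint64_t pte; } demo_pte_t;

/* Toy pseudo-physical -> machine frame table (hypothetical values). */
static const uint64_t demo_p2m[DEMO_NR_FRAMES] = { 7, 5, 6, 4 };

static uint64_t demo_pfn_to_mfn(uint64_t pfn)
{
    assert(pfn < DEMO_NR_FRAMES);
    return demo_p2m[pfn];
}

static uint64_t demo_mfn_to_pfn(uint64_t mfn)
{
    for (uint64_t pfn = 0; pfn < DEMO_NR_FRAMES; pfn++)
        if (demo_p2m[pfn] == mfn)
            return pfn;
    return UINT64_MAX;  /* machine frame not owned by this guest */
}

/* Pack: the kernel supplies a pseudo-physical value, the entry that is
 * stored holds the machine frame. Flag bits pass through unchanged. */
static demo_pte_t demo_make_pte(uint64_t val)
{
    uint64_t mfn = demo_pfn_to_mfn(val >> DEMO_PAGE_SHIFT);
    demo_pte_t p = { (mfn << DEMO_PAGE_SHIFT) | (val & DEMO_FLAGS_MASK) };

    return p;
}

/* Unpack: translate the stored machine frame back for the kernel. */
static uint64_t demo_pte_val(demo_pte_t p)
{
    uint64_t pfn = demo_mfn_to_pfn(p.pte >> DEMO_PAGE_SHIFT);

    return (pfn << DEMO_PAGE_SHIFT) | (p.pte & DEMO_FLAGS_MASK);
}
```

Code that only ever goes through the two accessors sees pseudo-physical addresses throughout, while the stored entry holds the machine frame.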
-