- 03 5月, 2007 40 次提交
-
-
由 Jeremy Fitzhardinge 提交于
This patch introduces paravirt_ops hooks to control how the kernel's initial pagetable is set up. In the case of a native boot, the very early bootstrap code creates a simple non-PAE pagetable to map the kernel and physical memory. When the VM subsystem is initialized, it creates a proper pagetable which respects the PAE mode, large pages, etc. When booting under a hypervisor, there are many possibilities for what paging environment the hypervisor establishes for the guest kernel, so the constructon of the kernel's pagetable depends on the hypervisor. In the case of Xen, the hypervisor boots the kernel with a fully constructed pagetable, which is already using PAE if necessary. Also, Xen requires particular care when constructing pagetables to make sure all pagetables are always mapped read-only. In order to make this easier, kernel's initial pagetable construction has been changed to only allocate and initialize a pagetable page if there's no page already present in the pagetable. This allows the Xen paravirt backend to make a copy of the hypervisor-provided pagetable, allowing the kernel to establish any more mappings it needs while keeping the existing ones. A slightly subtle point which is worth highlighting here is that Xen requires all kernel mappings to share the same pte_t pages between all pagetables, so that updating a kernel page's mapping in one pagetable is reflected in all other pagetables. This makes it possible to allocate a page and attach it to a pagetable without having to explicitly enumerate that page's mapping in all pagetables. And: +From: "Eric W. Biederman" <ebiederm@xmission.com> If we don't set the leaf page table entries it is quite possible that will inherit and incorrect page table entry from the initial boot page table setup in head.S. So we need to redo the effort here, so we pick up PSE, PGE and the like. Hypervisors like Xen require that their page tables be read-only, which is slightly incompatible with our low identity mappings, however I discussed this with Jeremy he has modified the Xen early set_pte function to avoid problems in this area. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de> Acked-by: NWilliam Irwin <bill.irwin@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>
-
由 Jeremy Fitzhardinge 提交于
Add a set of accessors to pack, unpack and modify page table entries (at all levels). This allows a paravirt implementation to control the contents of pgd/pmd/pte entries. For example, Xen uses this to convert the (pseudo-)physical address into a machine address when populating a pagetable entry, and converting back to pphys address when an entry is read. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de> Acked-by: NIngo Molnar <mingo@elte.hu>
-
由 Jeremy Fitzhardinge 提交于
Add a _paravirt_nop function for use as a stub for no-op operations, and paravirt_nop #defined void * version to make using it easier (since all its uses are as a void *). This is useful to allow the patcher to automatically identify noop operations so it can simply nop out the callsite. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de> Acked-by: NIngo Molnar <mingo@elte.hu> [mingo] but only as a cleanup of the current open-coded (void *) casts. My problem with this is that it loses the types. Not that there is much to check for, but still, this adds some assumptions about how function calls look like
-
由 Jeremy Fitzhardinge 提交于
Remove CONFIG_DEBUG_PARAVIRT. When inlining code, this option attempts to trash registers in the patch-site's "clobber" field, on the grounds that this should find bugs with incorrect clobbers. Unfortunately, the clobber field really means "registers modified by this patch site", which includes return values. Because of this, this option has outlived its usefulness, so remove it. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au>
-
由 Jeremy Fitzhardinge 提交于
Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Zachary Amsden <zach@vmware.com> Cc: Rusty Russell <rusty@rustcorp.com.au>
-
由 Rusty Russell 提交于
On Thu, 2007-03-29 at 13:16 +0200, Andi Kleen wrote: > Please clean it up properly with two structs. Not sure about this, now I've done it. Running it here. If you like it, I can do x86-64 as well. == lguest defines its own TSS struct because the "struct tss_struct" contains linux-specific additions. Andi asked me to split the struct in processor.h. Unfortunately it makes usage a little awkward. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 James Puthukattukaran 提交于
I have a 4 socket AMD Operton system. The 2.6.18 kernel I have crashes when there is no memory in node0. AK: changed call to _nopanic Cc: Andi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndi Kleen <ak@suse.de>
-
Setting the DEBUG_SIG flag breaks compilation due to a wrong struct access. Aditionally, it raises two warnings. This is one patch to fix them all. Signed-off-by: NGlauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Jeremy Fitzhardinge 提交于
Add "noreplace-smp" to disable SMP instruction replacement. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Jeremy Fitzhardinge 提交于
The .smp_altinstructions section and its corresponding symbols are completely unused, so remove them. Also, remove stray #ifdef __KENREL__ in asm-i386/alternative.h Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de>
-
由 H. Peter Anvin 提交于
This patch is based on Rusty's recent cleanup of the EFLAGS-related macros; it extends the same kind of cleanup to control registers and MSRs. It also unifies these between i386 and x86-64; at least with regards to MSRs, the two had definitely gotten out of sync. Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Jeremy Fitzhardinge 提交于
Let's allow page-alignment in general for per-cpu data (wanted by Xen, and Ingo suggested KVM as well). Because larger alignments can use more room, we increase the max per-cpu memory to 64k rather than 32k: it's getting a little tight. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de> Acked-by: NIngo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
-
由 Andi Kleen 提交于
As a bug workaround bank 0 on K7s is normally disabled, but no need to do that on other AMD CPUs. Cc: davej@redhat.com Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Jeremy Fitzhardinge 提交于
Update documentation for i386 smp_call_function* functions. As reported by Randy Dunlap <rdunlap@xenotime.net> [ I've posted this before but it seems to have been lost along the way. ] Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de> Cc: Randy Dunlap <rdunlap@xenotime.net>
-
由 Jan Engelhardt 提交于
(I hope Andi is the right one to Cc, otherwise please add, thanks!) Use menuconfigs instead of menus, so the whole menu can be disabled at once instead of going through all options. Signed-off-by: NJan Engelhardt <jengelh@gmx.de> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Andi Kleen 提交于
It doesn't put the CPU into deeper sleep states, so it's better to use the standard idle loop to save power. But allow to reenable it anyways for benchmarking. I also removed the obsolete idle=halt on i386 Cc: andreas.herrmann@amd.com Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Jeremy Fitzhardinge 提交于
Most of asm-x86_64/bugs.h is code which should be in a C file, so put it there. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>
-
由 Jeremy Fitzhardinge 提交于
Now that relocation of the VDSO for COMPAT_VDSO users is done at runtime rather than compile time, it is possible to enable/disable compat mode at runtime. This patch allows you to enable COMPAT_VDSO mode with "vdso=2" on the kernel command line, or via sysctl. (Switching on a running system shouldn't be done lightly; any process which was relying on the compat VDSO will be upset if it goes away.) The COMPAT_VDSO config option still exists, but if enabled it just makes vdso_enabled default to VDSO_COMPAT. +From: Hugh Dickins <hugh@veritas.com> Fix oops from i386-make-compat_vdso-runtime-selectable.patch. Even mingetty at system startup finds it easy to trigger an oops while reading /proc/PID/maps: though it has a good hold on the mm itself, that cannot stop exit_mm() from resetting tsk->mm to NULL. (It is usually show_map()'s call to get_gate_vma() which oopses, and I expect we could change that to check priv->tail_vma instead; but no matter, even m_start()'s call just after get_task_mm() is racy.) Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com> Cc: "Jan Beulich" <JBeulich@novell.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com>
-
由 Jeremy Fitzhardinge 提交于
Some versions of libc can't deal with a VDSO which doesn't have its ELF headers matching its mapped address. COMPAT_VDSO maps the VDSO at a specific system-wide fixed address. Previously this was all done at build time, on the grounds that the fixed VDSO address is always at the top of the address space. However, a hypervisor may reserve some of that address space, pushing the fixmap address down. This patch does the adjustment dynamically at runtime, depending on the runtime location of the VDSO fixmap. [ Patch has been through several hands: Jan Beulich wrote the orignal version; Zach reworked it, and Jeremy converted it to relocate phdrs as well as sections. ] Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com> Cc: "Jan Beulich" <JBeulich@novell.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com>
-
由 Jeremy Fitzhardinge 提交于
identify_cpu() is used to identify both the boot CPU and secondary CPUs, but it performs some actions which only apply to the boot CPU. Those functions are therefore really __init functions, but because they're called by identify_cpu(), they must be marked __cpuinit. This patch splits identify_cpu() into identify_boot_cpu() and identify_secondary_cpu(), and calls the appropriate init functions from each. Also, identify_boot_cpu() and all the functions it dominates are marked __init. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Jeremy Fitzhardinge 提交于
Most of asm-i386/bugs.h is code which should be in a C file, so put it there. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>
-
由 Andi Kleen 提交于
Ugly ifdef, but should handle all 64bit platforms that have suitable zones. On some like Altix it's probably impossible without IOMMU use to get memory <4GB this way, but they have to live with that. Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Avi Kivity 提交于
The xmm space on x86_64 is 256 bytes. Signed-off-by: NAvi Kivity <avi@qumranet.com> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Andi Kleen 提交于
As per i386 patch: move X86_EFLAGS_IF et al out to a new header: processor-flags.h, so we can include it from irqflags.h and use it in raw_irqs_disabled_flags(). As a side-effect, we could now use these flags in .S files. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Jan Beulich 提交于
Under CONFIG_DISCONTIGMEM, assuming that a !pfn_valid() implies all subsequent pfn-s are also invalid is wrong. Thus replace this by explicitly checking against the E820 map. AK: make e820 on x86-64 not initdata Signed-off-by: NJan Beulich <jbeulich@novell.com> Signed-off-by: NAndi Kleen <ak@suse.de> Acked-by: NMark Langsdorf <mark.langsdorf@amd.com>
-
由 Jeremy Fitzhardinge 提交于
Rather than using a single constant PERCPU_ENOUGH_ROOM, compute it as the sum of kernel_percpu + PERCPU_MODULE_RESERVE. This is now common to all architectures; if an architecture wants to set PERCPU_ENOUGH_ROOM to something special, then it may do so (ia64 is the only one which does). Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de>
-
由 Andi Kleen 提交于
All were already in some header Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Andi Kleen 提交于
Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Jeremy Fitzhardinge 提交于
machine_ops is an interface for the machine_* functions defined in <linux/reboot.h>. This is intended to allow hypervisors to intercept the reboot process, but it could be used to implement other x86 subarchtecture reboots. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Jeremy Fitzhardinge 提交于
Add a smp_ops interface. This abstracts the API defined by <linux/smp.h> for use within arch/i386. The primary intent is that it be used by a paravirtualizing hypervisor to implement SMP, but it could also be used by non-APIC-using sub-architectures. This is related to CONFIG_PARAVIRT, but is implemented unconditionally since it is simpler that way and not a highly performance-sensitive interface. Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: NAndi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
-
由 Rusty Russell 提交于
Now we have an explicit per-cpu GDT variable, we don't need to keep the descriptors around to use them to find the GDT: expose cpu_gdt directly. We could go further and make load_gdt() pack the descriptor for us, or even assume it means "load the current cpu's GDT" which is what it always does. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
-
由 Bernhard Walle 提交于
Fix "Section mismatch" warnings in arch/x86_64/kernel/time.c Signed-off-by: NBernhard Walle <bwalle@suse.de> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Jan Beulich 提交于
commit 5e518d76 did the same change to i386's variant. With this change, i386's and x86-64's versions are identical, raising the question whether the x86-64 one should go (just like there's only one instance of edd.S). Signed-off-by: NJan Beulich <jbeulich@novell.com> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Konrad Rzeszutek 提交于
This patch touches the NMI watchdog every MAX_ORDER_NR_PAGES to inhibit the machine from triggering an NMI while the CPUs are locked. This situation is happening on boxes with more than 64CPUs and 128GB of RAM when Alt-SysRq-m is performed. It has been succesfully tested for regression on uni, 2, 4, 8 32, and 64 CPU boxes with various memory configuration. Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Eric Dumazet 提交于
Current vsyscall_gtod_data is large (3 or 4 cache lines dirtied at timer interrupt). We can shrink it to exactly 64 bytes (1 cache line on AMD64) Instead of copying a whole struct clocksource, we copy only needed fields. I deleted an unused field : offset_base This patch fixes one oddity in vgettimeofday(): It can returns a timeval with tv_usec = 1000000. Maybe not a bug, but why not doing the right thing ? Signed-off-by: NEric Dumazet <dada1@cosmosbay.com> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Eric Dumazet 提交于
There is a tiny probability that the return value from vtime(time_t *t) is Signed-off-by: NAndi Kleen <ak@suse.de> different than the value stored in *t Using a temporary variable solves the problem and gives a faster code. 17: 48 85 ff test %rdi,%rdi 1a: 48 8b 05 00 00 00 00 mov 0(%rip),%rax # __vsyscall_gtod_data.wall_time_tv.tv_sec 21: 74 03 je 26 23: 48 89 07 mov %rax,(%rdi) 26: c9 leaveq 27: c3 retq Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
-
由 Adrian Bunk 提交于
Many years ago, UNEXPECTED_IO_APIC() contained printk()'s (but nothing more). Now that it's completely empty for years, we can as well remove it. Signed-off-by: NAdrian Bunk <bunk@stusta.de> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Adrian Bunk 提交于
- there's no reason for duplicating the prototype from include/linux/syscalls.h in include/asm-x86_64/unistd.h - every file should #include the headers containing the prototypes for it's global functions Signed-off-by: NAdrian Bunk <bunk@stusta.de> Signed-off-by: NAndi Kleen <ak@suse.de>
-
由 Christoph Lameter 提交于
x86_64 currently simulates a list using the index and private fields of the page struct. Seems that the code was inherited from i386. But x86_64 does not use the slab to allocate pgds and pmds etc. So the lru field is not used by the slab and therefore available. This patch uses standard list operations on page->lru to realize pgd tracking. Signed-off-by: NChristoph Lameter <clameter@sgi.com> Signed-off-by: NAndi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
-
由 Parag Warudkar 提交于
Signed-off-by: NParag Warudkar <parag.warudkar@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndi Kleen <ak@suse.de>
-