- 20 August 2009, 1 commit
-
-
Committed by Jeremy Fitzhardinge

Make sure the stack-protector segment registers are properly set up before calling any functions which may have stack-protection compiled into them.

[ Impact: prevent Xen early-boot crash when stack-protector is enabled ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
- 16 May 2009, 1 commit
-
-
Committed by Jeremy Fitzhardinge

Xiaohui Xin and some other folks at Intel have been looking into what's behind the performance hit of paravirt_ops when running native. It appears that the hit is entirely due to the paravirtualized spinlocks introduced by:

    | commit 8efcbab6
    | Date:   Mon Jul 7 12:07:51 2008 -0700
    |
    |     paravirt: introduce a "lock-byte" spinlock implementation

The extra call/return in the spinlock path is somehow causing an increase in the cycles/instruction of somewhere around 2-7% (seems to vary quite a lot from test to test). The working theory is that the CPU's pipeline is getting upset about the call->call->locked-op->return->return, and seems to be failing to speculate (though I haven't seen anything definitive about the precise reasons). This doesn't entirely make sense, because the performance hit is also visible on unlock and other operations which don't involve locked instructions. But spinlock operations clearly swamp all the other pvops operations, even though I can't imagine that they're nearly as common (there's only a .05% increase in instructions executed).

If I disable just the pv-spinlock calls, my tests show that pvops is identical to non-pvops performance on native (my measurements show that it is actually about .1% faster, but Xiaohui shows a .05% slowdown).

Summary of results, averaging 10 runs of the "mmperf" test, using a no-pvops build as baseline:

                        nopv        Pv-nospin   Pv-spin
    CPU cycles          100.00%     99.89%      102.18%
    instructions        100.00%     100.10%     100.15%
    CPI                 100.00%     99.79%      102.03%
    cache ref           100.00%     100.84%     100.28%
    cache miss          100.00%     90.47%      88.56%
    cache miss rate     100.00%     89.72%      88.31%
    branches            100.00%     99.93%      100.04%
    branch miss         100.00%     103.66%     107.72%
    branch miss rt      100.00%     103.73%     107.67%
    wallclock           100.00%     99.90%      102.20%

The clear effect here is that the 2% increase in CPI is directly reflected in the final wallclock time.

(The other interesting effect is that the more ops are out of line calls via pvops, the lower the cache access and miss rates. Not too surprising, but it suggests that the non-pvops kernel is over-inlined. On the flipside, the branch misses go up correspondingly...)

So, what's the fix?

Paravirt patching turns all the pvops calls into direct calls, so _spin_lock etc do end up having direct calls. For example, the compiler generated code for paravirtualized _spin_lock is:

    <_spin_lock+0>:  mov    %gs:0xb4c8,%rax
    <_spin_lock+9>:  incl   0xffffffffffffe044(%rax)
    <_spin_lock+15>: callq  *0xffffffff805a5b30
    <_spin_lock+22>: retq

The indirect call will get patched to:

    <_spin_lock+0>:  mov    %gs:0xb4c8,%rax
    <_spin_lock+9>:  incl   0xffffffffffffe044(%rax)
    <_spin_lock+15>: callq  <__ticket_spin_lock>
    <_spin_lock+20>: nop; nop    /* or whatever 2-byte nop */
    <_spin_lock+22>: retq

One possibility is to inline _spin_lock, etc, when building an optimised kernel (ie, when there's no spinlock/preempt instrumentation/debugging enabled). That will remove the outer call/return pair, returning the instruction stream to a single call/return, which will presumably execute the same as the non-pvops case. The downsides are: 1) it will replicate the preempt_disable/enable code at each lock/unlock callsite; this code is fairly small, but not nothing; and 2) the spinlock definitions are already a very heavily tangled mass of #ifdefs and other preprocessor magic, and making any changes will be non-trivial.

The other obvious answer is to disable pv-spinlocks. Making them a separate config option is fairly easy, and it would be trivial to enable them only when Xen is enabled (as the only non-default user). But it doesn't really address the common case of a distro build which is going to have Xen support enabled, and leaves the open question of whether the native performance cost of pv-spinlocks is worth the performance improvement on a loaded Xen system (10% saving of overall system CPU when guests block rather than spin). Still, it is a reasonable short-term workaround.

[ Impact: fix pvops performance regression when running native ]

Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com>
Analysed-by: "Li Xin" <xin.li@intel.com>
Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Xen-devel <xen-devel@lists.xensource.com>
LKML-Reference: <4A0B62F7.5030802@goop.org>
[ fixed the help text ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
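
To make the call layering concrete, here is a minimal user-space sketch of the indirect pvop call that patching later turns into a direct call. This is not kernel code; the names merely echo the kernel's:

    /* Minimal user-space sketch of the pvops call layering described above. */
    #include <stdio.h>

    struct pv_lock_ops {
            void (*spin_lock)(int *lock);   /* pvop slot, normally an indirect call */
    };

    static void ticket_spin_lock(int *lock)
    {
            *lock = 1;                      /* stand-in for the real locked op */
    }

    static struct pv_lock_ops pv_lock_ops = { .spin_lock = ticket_spin_lock };

    /* The outer wrapper adds the extra call/return pair the message blames
     * for the CPI hit: caller -> _spin_lock -> pvop -> return -> return. */
    static void _spin_lock(int *lock)
    {
            pv_lock_ops.spin_lock(lock);    /* patching rewrites this to a direct call */
    }

    int main(void)
    {
            int lock = 0;
            _spin_lock(&lock);
            printf("lock = %d\n", lock);
            return 0;
    }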
-
- 13 May 2009, 1 commit
-
-
Committed by Randy Dunlap

mmu.c needs to #include module.h to prevent these warnings:

    arch/x86/xen/mmu.c:239: warning: data definition has no type or storage class
    arch/x86/xen/mmu.c:239: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
    arch/x86/xen/mmu.c:239: warning: parameter names (without types) in function declaration

[ Impact: cleanup ]

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
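
A hedged sketch of the shape of the fix: EXPORT_SYMBOL_GPL expands to data definitions declared in <linux/module.h>, so without the include gcc sees an undeclared construct and emits exactly these "defaults to 'int'" warnings. The symbol name below is hypothetical, not the one at mmu.c:239:

    #include <linux/module.h>   /* the missing include: declares EXPORT_SYMBOL_GPL */

    /* hypothetical exported symbol, standing in for whatever mmu.c exports */
    int xen_mmu_example;
    EXPORT_SYMBOL_GPL(xen_mmu_example);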
-
- 09 May 2009, 3 commits
-
-
Committed by Jeremy Fitzhardinge

stts() is implemented in terms of read_cr0/write_cr0 to update the state of the TS bit. This happens during context switch, and so is fairly performance critical. Rather than falling back to a trap-and-emulate native read_cr0, implement our own by caching the last-written value from write_cr0 (the TS bit is the only one we really care about).

Impact: optimise Xen context switches

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
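
A sketch of the caching idea under stated assumptions (the per-CPU accessor names are illustrative; this is not the exact kernel code):

    /* Cache the last value written to CR0 so reads need no hypervisor trap. */
    static DEFINE_PER_CPU(unsigned long, xen_cr0_value);

    static unsigned long xen_read_cr0(void)
    {
            return this_cpu_read(xen_cr0_value);    /* cached: no trap-and-emulate */
    }

    static void xen_write_cr0(unsigned long cr0)
    {
            this_cpu_write(xen_cr0_value, cr0);     /* remember for later reads */
            /* ... then tell the hypervisor about the new TS bit ... */
    }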
-
Committed by Jeremy Fitzhardinge

Ignore known IST-using traps. Aside from the debugger traps, they're low-level faults which Xen will handle for us, so the kernel needn't worry about them. Keep warning in case an unknown trap starts using IST.

Impact: suppress spurious warnings

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Native x86-64 uses the IST mechanism to run int3 and debug traps on an alternative stack. Xen does not do this, and so the frames were being misinterpreted by the ptrace code. This change special-cases these two exceptions by using Xen variants which run on the normal kernel stack properly.

Impact: avoid crash or bad data when IST trap is invoked under Xen

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
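
The special-casing can be pictured like this, as a fragment of the gate-conversion logic; xen_debug and xen_int3 denote the stack-friendly Xen variants the message refers to, and the surrounding variables are assumed from context:

    /* While converting native IDT entries into Xen trap-table entries,
     * substitute non-IST handlers for the two debugger traps. */
    if (addr == (unsigned long)debug)
            addr = (unsigned long)xen_debug;        /* #DB on the kernel stack */
    else if (addr == (unsigned long)int3)
            addr = (unsigned long)xen_int3;         /* #BP on the kernel stack */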
-
- 08 May 2009, 2 commits
-
-
Committed by Jeremy Fitzhardinge

Use reserve_early rather than e820 reservations for the Xen start info and mfn->pfn table, so that the memory use is a bit more self-documenting.

[ Impact: cleanup ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4A032EF1.6070708@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
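
For illustration, a hedged sketch of what the self-documenting reservation looks like (the label string and the one-page size are assumptions):

    /* Named early reservation instead of an anonymous e820 entry. */
    reserve_early(__pa(xen_start_info),
                  __pa(xen_start_info) + PAGE_SIZE,
                  "XEN START INFO");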
-
Committed by Jeremy Fitzhardinge

The Xen pagetables are no longer implicitly reserved as part of the other i386_start_kernel reservations, so make sure we explicitly reserve them. This prevents them from being released into the general kernel free page pool and reused.

[ Impact: fix Xen guest crash ]

Also-Bisected-by: Bryan Donlan <bdonlan@gmail.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4A032EEC.30509@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 22 April 2009, 1 commit
-
-
Committed by Magnus Damm

Pass a clocksource pointer to the read() callback for clocksources. This allows us to share the callback between multiple instances.

[hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
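
A sketch of the interface change, using a hypothetical Xen callback as the example: the read() hook now receives the clocksource, so one function can serve several registered instances by inspecting which one was passed (the .rating value is an assumption):

    /* New-style read() callback: takes the clocksource it is reading. */
    static cycle_t xen_cs_read(struct clocksource *cs)
    {
            /* cs identifies the instance; shared helpers can branch on it */
            return xen_clocksource_read();
    }

    static struct clocksource xen_clocksource = {
            .name   = "xen",
            .rating = 400,
            .read   = xen_cs_read,          /* was: cycle_t (*read)(void) */
    };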
-
- 11 April 2009, 1 commit
-
-
Committed by Masami Hiramatsu

Impact: fix kprobes crash on 32-bit with RAM above 4G

Use phys_addr_t for receiving a physical address argument instead of unsigned long. This allows fixmap to handle pages higher than 4GB on x86-32.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: systemtap-ml <systemtap@sources.redhat.com>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <49DE3695.6040800@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
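
The type change in a nutshell (prototype sketch; exact function names in the fixmap layer may differ):

    /* before: void __set_fixmap(enum fixed_addresses idx,
     *                           unsigned long phys, pgprot_t flags);
     * an unsigned long is 32 bits on x86-32 and truncates addresses above 4GB */

    /* after: phys_addr_t is 64 bits when PAE is enabled, so high pages work */
    void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t flags);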
-
- 10 April 2009, 2 commits
-
-
Committed by Masami Hiramatsu

Use phys_addr_t for receiving a physical address argument instead of unsigned long. This allows fixmap to handle pages higher than 4GB on x86-32.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Jeremy Fitzhardinge

FIX_TEXT_POKE[01] are used to map kernel addresses, so they're mapping pfns, not mfns.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
- 09 April 2009, 13 commits
-
-
Committed by Jeremy Fitzhardinge

FIX_TEXT_POKE[01] are used to map kernel addresses, so they're mapping pfns, not mfns.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Use GATE_INTERRUPT/TRAP rather than 0xe/f.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Some 64-bit machines don't support the NX flag in ptes. Check for NX before constructing the kernel pagetables.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Impact: fixes crashing bug

There's no particular problem with getting an empty cpu mask, so just shortcut-return if we get one. Avoids a crash reported by Christophe Saout <christophe@saout.de>.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
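
The shortcut is about this simple (a sketch; the enclosing function name is illustrative):

    static void xen_flush_tlb_others_sketch(const struct cpumask *mask)
    {
            /* An empty mask means no CPUs to notify: return here instead of
             * letting later code pick a "first" CPU that doesn't exist. */
            if (cpumask_empty(mask))
                    return;
            /* ... build and issue the flush multicall for the CPUs in mask ... */
    }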
-
Committed by Jeremy Fitzhardinge

1. Make sure early-allocated ptes are pinned, so they can be later unpinned.
2. Don't pin pmd+pud, just make them RO.
3. Scatter some __inits around.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE to be set. This confuses the kernel and it ends up crashing on an xsetbv instruction. At boot time, try to set cr4.OSXSAVE, and mask XSAVE out of cpuid if we can't. This will produce a spurious error from Xen, but allows us to support XSAVE if/when Xen does. This also factors out the cpuid mask decisions to boot time.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Hannes Eder

Fix these sparse warnings:

    arch/x86/xen/smp.c:316:52: warning: Using plain integer as NULL pointer
    arch/x86/xen/smp.c:421:60: warning: Using plain integer as NULL pointer

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
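
The fix pattern, sketched against a hypothetical call site (whether this is the exact call at smp.c:316/421 is an assumption):

    /* sparse warns when a bare 0 flows into a pointer parameter: */
    rc = HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, 0);          /* warning */

    /* passing NULL says "pointer, intentionally absent" and is silent: */
    rc = HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL);       /* clean */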
-
Committed by Jeremy Fitzhardinge

Don't need the noise.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Remove the use of the multicall machinery, which is unused here (gdt loading is never performance critical). This removes the implicit use of percpu variables, which simplifies understanding how the percpu code's use of load_gdt interacts with this code.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Makes the logic a bit clearer.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Build the p2m_mfn_list_list early, with the rest of the p2m table, but register it later, when the real shared_info structure is in place.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

When doing very early p2m setting, we need to separate setting from allocation, so split things up accordingly.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

xen_mc_flush() requires preemption to be disabled for its own sanity, so disable it while we're flushing.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
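
The pattern described is simply:

    /* Keep this CPU's multicall buffer stable for the duration of the flush:
     * a preemption-triggered migration mid-flush would corrupt percpu state. */
    preempt_disable();
    xen_mc_flush();
    preempt_enable();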
-
- 31 March 2009, 10 commits
-
-
Committed by Jeremy Fitzhardinge

Use GATE_INTERRUPT/TRAP rather than 0xe/f.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Some 64-bit machines don't support the NX flag in ptes. Check for NX before constructing the kernel pagetables.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Impact: fixes crashing bug

There's no particular problem with getting an empty cpu mask, so just shortcut-return if we get one. Avoids a crash reported by Christophe Saout <christophe@saout.de>.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

1. Make sure early-allocated ptes are pinned, so they can be later unpinned.
2. Don't pin pmd+pud, just make them RO.
3. Scatter some __inits around.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE to be set. This confuses the kernel and it ends up crashing on an xsetbv instruction. At boot time, try to set cr4.OSXSAVE, and mask XSAVE out of cpuid if we can't. This will produce a spurious error from Xen, but allows us to support XSAVE if/when Xen does. This also factors out the cpuid mask decisions to boot time.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Hannes Eder

Fix these sparse warnings:

    arch/x86/xen/smp.c:316:52: warning: Using plain integer as NULL pointer
    arch/x86/xen/smp.c:421:60: warning: Using plain integer as NULL pointer

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Don't need the noise.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Remove the use of the multicall machinery, which is unused here (gdt loading is never performance critical). This removes the implicit use of percpu variables, which simplifies understanding how the percpu code's use of load_gdt interacts with this code.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Makes the logic a bit clearer.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Build the p2m_mfn_list_list early, with the rest of the p2m table, but register it later, when the real shared_info structure is in place.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
- 30 March 2009, 5 commits
-
-
Committed by Jeremy Fitzhardinge

When doing very early p2m setting, we need to separate setting from allocation, so split things up accordingly.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

xen_mc_flush() requires preemption to be disabled for its own sanity, so disable it while we're flushing.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
Committed by Jeremy Fitzhardinge

Impact: remove obsolete checks, simplification

Lift the restrictions on preemption with lazy mmu mode, as it is now allowed.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
-
Committed by Jeremy Fitzhardinge

Impact: fix lazy context switch API

Pass the previous and next tasks into the context switch start and end calls, so that the called functions can properly access the task state (especially in end_context_switch, in which the next task is not yet completely current).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
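
The sketched shape of the reworked API (the hook names follow the arch hooks of that era; treat them as illustrative):

    /* In the scheduler's switch path: each half gets the task it acts on. */
    arch_start_context_switch(prev_p);      /* prev: flush/park its lazy state */
    /* ... swap stacks, registers and address space ... */
    arch_end_context_switch(next_p);        /* next: not yet fully 'current' */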
-
Committed by Jeremy Fitzhardinge

Impact: allow preemption during lazy mmu updates

If we're in lazy mmu mode when context switching, leave lazy mmu mode, but remember the task's state in TIF_LAZY_MMU_UPDATES. When we resume the task, check this flag and re-enter lazy mmu mode if it's set.

This sets things up for allowing lazy mmu mode while preemptible, though that won't actually be active until the next change.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
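
A sketch of the flag dance under the stated assumptions (close in spirit to, but not verbatim, the patch):

    /* Switching out: if prev was batching MMU updates, stop and remember. */
    void paravirt_start_context_switch(struct task_struct *prev)
    {
            if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
                    arch_leave_lazy_mmu_mode();
                    set_ti_thread_flag(task_thread_info(prev),
                                       TIF_LAZY_MMU_UPDATES);
            }
    }

    /* Switching in: restore lazy mode if this task was interrupted in it. */
    void paravirt_end_context_switch(struct task_struct *next)
    {
            if (test_and_clear_ti_thread_flag(task_thread_info(next),
                                              TIF_LAZY_MMU_UPDATES))
                    arch_enter_lazy_mmu_mode();
    }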
-