- 10 9月, 2009 3 次提交
-
-
由 Yang Xiaowei 提交于
We need to have a stronger barrier between releasing the lock and checking for any waiting spinners. A compiler barrier is not sufficient because the CPU's ordering rules do not prevent the read xl->spinners from happening before the unlock assignment, as they are different memory locations. We need to have an explicit barrier to enforce the write-read ordering to different memory locations. Because of it, I can't bring up > 4 HVM guests on one SMP machine. [ Code and commit comments expanded -J ] [ Impact: avoid deadlock when using Xen PV spinlocks ] Signed-off-by: NYang Xiaowei <xiaowei.yang@intel.com> Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Where possible we enable interrupts while waiting for a spinlock to become free, in order to reduce big latency spikes in interrupt handling. However, at present if we manage to pick up the spinlock just before blocking, we'll end up holding the lock with interrupts enabled for a while. This will cause a deadlock if we recieve an interrupt in that window, and the interrupt handler tries to take the lock too. Solve this by shrinking the interrupt-enabled region to just around the blocking call. [ Impact: avoid race/deadlock when using Xen PV spinlocks ] Reported-by: N"Yang, Xiaowei" <xiaowei.yang@intel.com> Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
-fstack-protector uses a special per-cpu "stack canary" value. gcc generates special code in each function to test the canary to make sure that the function's stack hasn't been overrun. On x86-64, this is simply an offset of %gs, which is the usual per-cpu base segment register, so setting it up simply requires loading %gs's base as normal. On i386, the stack protector segment is %gs (rather than the usual kernel percpu %fs segment register). This requires setting up the full kernel GDT and then loading %gs accordingly. We also need to make sure %gs is initialized when bringing up secondary cpus too. To keep things consistent, we do the full GDT/segment register setup on both architectures. Because we need to avoid -fstack-protected code before setting up the GDT and because there's no way to disable it on a per-function basis, several files need to have stack-protector inhibited. [ Impact: allow Xen booting with stack-protector enabled ] Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
- 26 8月, 2009 2 次提交
-
-
由 H. Peter Anvin 提交于
Initialize cx before calling xen_cpuid(), in order to suppress the "may be used uninitialized in this function" warning. Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
-
由 Jeremy Fitzhardinge 提交于
Xen always runs on CPUs which properly support WP enforcement in privileged mode, so there's no need to test for it. This also works around a crash reported by Arnd Hannemann, though I think its just a band-aid for that case. Reported-by: NArnd Hannemann <hannemann@nets.rwth-aachen.de> Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: NPekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 20 8月, 2009 1 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Make sure the stack-protector segment registers are properly set up before calling any functions which may have stack-protection compiled into them. [ Impact: prevent Xen early-boot crash when stack-protector is enabled ] Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
- 16 5月, 2009 1 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Xiaohui Xin and some other folks at Intel have been looking into what's behind the performance hit of paravirt_ops when running native. It appears that the hit is entirely due to the paravirtualized spinlocks introduced by: | commit 8efcbab6 | Date: Mon Jul 7 12:07:51 2008 -0700 | | paravirt: introduce a "lock-byte" spinlock implementation The extra call/return in the spinlock path is somehow causing an increase in the cycles/instruction of somewhere around 2-7% (seems to vary quite a lot from test to test). The working theory is that the CPU's pipeline is getting upset about the call->call->locked-op->return->return, and seems to be failing to speculate (though I haven't seen anything definitive about the precise reasons). This doesn't entirely make sense, because the performance hit is also visible on unlock and other operations which don't involve locked instructions. But spinlock operations clearly swamp all the other pvops operations, even though I can't imagine that they're nearly as common (there's only a .05% increase in instructions executed). If I disable just the pv-spinlock calls, my tests show that pvops is identical to non-pvops performance on native (my measurements show that it is actually about .1% faster, but Xiaohui shows a .05% slowdown). Summary of results, averaging 10 runs of the "mmperf" test, using a no-pvops build as baseline: nopv Pv-nospin Pv-spin CPU cycles 100.00% 99.89% 102.18% instructions 100.00% 100.10% 100.15% CPI 100.00% 99.79% 102.03% cache ref 100.00% 100.84% 100.28% cache miss 100.00% 90.47% 88.56% cache miss rate 100.00% 89.72% 88.31% branches 100.00% 99.93% 100.04% branch miss 100.00% 103.66% 107.72% branch miss rt 100.00% 103.73% 107.67% wallclock 100.00% 99.90% 102.20% The clear effect here is that the 2% increase in CPI is directly reflected in the final wallclock time. (The other interesting effect is that the more ops are out of line calls via pvops, the lower the cache access and miss rates. Not too surprising, but it suggests that the non-pvops kernel is over-inlined. On the flipside, the branch misses go up correspondingly...) So, what's the fix? Paravirt patching turns all the pvops calls into direct calls, so _spin_lock etc do end up having direct calls. For example, the compiler generated code for paravirtualized _spin_lock is: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq *0xffffffff805a5b30 <_spin_lock+22>: retq The indirect call will get patched to: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq <__ticket_spin_lock> <_spin_lock+20>: nop; nop /* or whatever 2-byte nop */ <_spin_lock+22>: retq One possibility is to inline _spin_lock, etc, when building an optimised kernel (ie, when there's no spinlock/preempt instrumentation/debugging enabled). That will remove the outer call/return pair, returning the instruction stream to a single call/return, which will presumably execute the same as the non-pvops case. The downsides arel 1) it will replicate the preempt_disable/enable code at eack lock/unlock callsite; this code is fairly small, but not nothing; and 2) the spinlock definitions are already a very heavily tangled mass of #ifdefs and other preprocessor magic, and making any changes will be non-trivial. The other obvious answer is to disable pv-spinlocks. Making them a separate config option is fairly easy, and it would be trivial to enable them only when Xen is enabled (as the only non-default user). But it doesn't really address the common case of a distro build which is going to have Xen support enabled, and leaves the open question of whether the native performance cost of pv-spinlocks is worth the performance improvement on a loaded Xen system (10% saving of overall system CPU when guests block rather than spin). Still it is a reasonable short-term workaround. [ Impact: fix pvops performance regression when running native ] Analysed-by: N"Xin Xiaohui" <xiaohui.xin@intel.com> Analysed-by: N"Li Xin" <xin.li@intel.com> Analysed-by: N"Nakajima Jun" <jun.nakajima@intel.com> Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: NH. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Xen-devel <xen-devel@lists.xensource.com> LKML-Reference: <4A0B62F7.5030802@goop.org> [ fixed the help text ] Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 13 5月, 2009 1 次提交
-
-
由 Randy Dunlap 提交于
mmu.c needs to #include module.h to prevent these warnings: arch/x86/xen/mmu.c:239: warning: data definition has no type or storage class arch/x86/xen/mmu.c:239: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' arch/x86/xen/mmu.c:239: warning: parameter names (without types) in function declaration [ Impact: cleanup ] Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Acked-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 09 5月, 2009 3 次提交
-
-
由 Jeremy Fitzhardinge 提交于
stts() is implemented in terms of read_cr0/write_cr0 to update the state of the TS bit. This happens during context switch, and so is fairly performance critical. Rather than falling back to a trap-and-emulate native read_cr0, implement our own by caching the last-written value from write_cr0 (the TS bit is the only one we really care about). Impact: optimise Xen context switches Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Ignore known IST-using traps. Aside from the debugger traps, they're low-level faults which Xen will handle for us, so the kernel needn't worry about them. Keep warning in case unknown trap starts using IST. Impact: suppress spurious warnings Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Native x86-64 uses the IST mechanism to run int3 and debug traps on an alternative stack. Xen does not do this, and so the frames were being misinterpreted by the ptrace code. This change special-cases these two exceptions by using Xen variants which run on the normal kernel stack properly. Impact: avoid crash or bad data when IST trap is invoked under Xen Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
- 08 5月, 2009 2 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Use reserve_early rather than e820 reservations for Xen start info and mfn->pfn table, so that the memory use is a bit more self-documenting. [ Impact: cleanup ] Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Xen-devel <xen-devel@lists.xensource.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4A032EF1.6070708@goop.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Jeremy Fitzhardinge 提交于
The Xen pagetables are no longer implicitly reserved as part of the other i386_start_kernel reservations, so make sure we explicitly reserve them. This prevents them from being released into the general kernel free page pool and reused. [ Impact: fix Xen guest crash ] Also-Bisected-by: NBryan Donlan <bdonlan@gmail.com> Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Xen-devel <xen-devel@lists.xensource.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4A032EEC.30509@goop.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 22 4月, 2009 1 次提交
-
-
由 Magnus Damm 提交于
Pass clocksource pointer to the read() callback for clocksources. This allows us to share the callback between multiple instances. [hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods] [akpm@linux-foundation.org: cleanup] Signed-off-by: NMagnus Damm <damm@igel.co.jp> Acked-by: NJohn Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 4月, 2009 1 次提交
-
-
由 Masami Hiramatsu 提交于
Impact: fix kprobes crash on 32-bit with RAM above 4G Use phys_addr_t for receiving a physical address argument instead of unsigned long. This allows fixmap to handle pages higher than 4GB on x86-32. Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com> Acked-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: systemtap-ml <systemtap@sources.redhat.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <49DE3695.6040800@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 10 4月, 2009 2 次提交
-
-
由 Masami Hiramatsu 提交于
Use phys_addr_t for receiving a physical address argument instead of unsigned long. This allows fixmap to handle pages higher than 4GB on x86-32. Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jeremy Fitzhardinge 提交于
FIX_TEXT_POKE[01] are used to map kernel addresses, so they're mapping pfns, not mfns. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
- 09 4月, 2009 13 次提交
-
-
由 Jeremy Fitzhardinge 提交于
FIX_TEXT_POKE[01] are used to map kernel addresses, so they're mapping pfns, not mfns. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Use GATE_INTERRUPT/TRAP rather than 0xe/f. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Some 64-bit machines don't support the NX flag in ptes. Check for NX before constructing the kernel pagetables. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Impact: fixes crashing bug There's no particular problem with getting an empty cpu mask, so just shortcut-return if we get one. Avoids crash reported by Christophe Saout <christophe@saout.de> Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
1. make sure early-allocated ptes are pinned, so they can be later unpinned 2. don't pin pmd+pud, just make them RO 3. scatter some __inits around Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE to be set. This confuses the kernel and it ends up crashing on an xsetbv instruction. At boot time, try to set cr4.OSXSAVE, and mask XSAVE out of cpuid it we can't. This will produce a spurious error from Xen, but allows us to support XSAVE if/when Xen does. This also factors out the cpuid mask decisions to boot time. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Hannes Eder 提交于
Fix this sparse warnings: arch/x86/xen/smp.c:316:52: warning: Using plain integer as NULL pointer arch/x86/xen/smp.c:421:60: warning: Using plain integer as NULL pointer Signed-off-by: NHannes Eder <hannes@hanneseder.net> Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Don't need the noise. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Remove use of multicall machinery which is unused (gdt loading is never performance critical). This removes the implicit use of percpu variables, which simplifies understanding how the percpu code's use of load_gdt interacts with this code. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Makes the logic a bit clearer. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Build the p2m_mfn_list_list early with the rest of the p2m table, but register it later when the real shared_info structure is in place. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
When doing very early p2m setting, we need to separate setting from allocation, so split things up accordingly. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
xen_mc_flush() requires preemption to be disabled for its own sanity, so disable it while we're flushing. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
- 31 3月, 2009 10 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Use GATE_INTERRUPT/TRAP rather than 0xe/f. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Some 64-bit machines don't support the NX flag in ptes. Check for NX before constructing the kernel pagetables. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Impact: fixes crashing bug There's no particular problem with getting an empty cpu mask, so just shortcut-return if we get one. Avoids crash reported by Christophe Saout <christophe@saout.de> Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
1. make sure early-allocated ptes are pinned, so they can be later unpinned 2. don't pin pmd+pud, just make them RO 3. scatter some __inits around Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE to be set. This confuses the kernel and it ends up crashing on an xsetbv instruction. At boot time, try to set cr4.OSXSAVE, and mask XSAVE out of cpuid it we can't. This will produce a spurious error from Xen, but allows us to support XSAVE if/when Xen does. This also factors out the cpuid mask decisions to boot time. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Hannes Eder 提交于
Fix this sparse warnings: arch/x86/xen/smp.c:316:52: warning: Using plain integer as NULL pointer arch/x86/xen/smp.c:421:60: warning: Using plain integer as NULL pointer Signed-off-by: NHannes Eder <hannes@hanneseder.net> Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Don't need the noise. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Remove use of multicall machinery which is unused (gdt loading is never performance critical). This removes the implicit use of percpu variables, which simplifies understanding how the percpu code's use of load_gdt interacts with this code. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Makes the logic a bit clearer. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Build the p2m_mfn_list_list early with the rest of the p2m table, but register it later when the real shared_info structure is in place. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-