- 31 8月, 2009 1 次提交
-
-
由 Thomas Gleixner 提交于
ARCH_SETUP is a horrible leftover from the old arch/i386 mach support code. It still has a lonely user in xen. Move it to x86_init_ops. Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 18 6月, 2009 1 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Split the monolithic asm/paravirt.h into separate paravirt.h (inlines and other "active" definitions), and paravirt_types.h (types, constants and other "passive" definitions). This makes it easier to use the type/constant definitions without pulling in everything else and causing circular dependency problems. [ Impact: cleanup ] Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
- 16 5月, 2009 1 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Xiaohui Xin and some other folks at Intel have been looking into what's behind the performance hit of paravirt_ops when running native. It appears that the hit is entirely due to the paravirtualized spinlocks introduced by: | commit 8efcbab6 | Date: Mon Jul 7 12:07:51 2008 -0700 | | paravirt: introduce a "lock-byte" spinlock implementation The extra call/return in the spinlock path is somehow causing an increase in the cycles/instruction of somewhere around 2-7% (seems to vary quite a lot from test to test). The working theory is that the CPU's pipeline is getting upset about the call->call->locked-op->return->return, and seems to be failing to speculate (though I haven't seen anything definitive about the precise reasons). This doesn't entirely make sense, because the performance hit is also visible on unlock and other operations which don't involve locked instructions. But spinlock operations clearly swamp all the other pvops operations, even though I can't imagine that they're nearly as common (there's only a .05% increase in instructions executed). If I disable just the pv-spinlock calls, my tests show that pvops is identical to non-pvops performance on native (my measurements show that it is actually about .1% faster, but Xiaohui shows a .05% slowdown). Summary of results, averaging 10 runs of the "mmperf" test, using a no-pvops build as baseline: nopv Pv-nospin Pv-spin CPU cycles 100.00% 99.89% 102.18% instructions 100.00% 100.10% 100.15% CPI 100.00% 99.79% 102.03% cache ref 100.00% 100.84% 100.28% cache miss 100.00% 90.47% 88.56% cache miss rate 100.00% 89.72% 88.31% branches 100.00% 99.93% 100.04% branch miss 100.00% 103.66% 107.72% branch miss rt 100.00% 103.73% 107.67% wallclock 100.00% 99.90% 102.20% The clear effect here is that the 2% increase in CPI is directly reflected in the final wallclock time. (The other interesting effect is that the more ops are out of line calls via pvops, the lower the cache access and miss rates. Not too surprising, but it suggests that the non-pvops kernel is over-inlined. On the flipside, the branch misses go up correspondingly...) So, what's the fix? Paravirt patching turns all the pvops calls into direct calls, so _spin_lock etc do end up having direct calls. For example, the compiler generated code for paravirtualized _spin_lock is: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq *0xffffffff805a5b30 <_spin_lock+22>: retq The indirect call will get patched to: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq <__ticket_spin_lock> <_spin_lock+20>: nop; nop /* or whatever 2-byte nop */ <_spin_lock+22>: retq One possibility is to inline _spin_lock, etc, when building an optimised kernel (ie, when there's no spinlock/preempt instrumentation/debugging enabled). That will remove the outer call/return pair, returning the instruction stream to a single call/return, which will presumably execute the same as the non-pvops case. The downsides arel 1) it will replicate the preempt_disable/enable code at eack lock/unlock callsite; this code is fairly small, but not nothing; and 2) the spinlock definitions are already a very heavily tangled mass of #ifdefs and other preprocessor magic, and making any changes will be non-trivial. The other obvious answer is to disable pv-spinlocks. Making them a separate config option is fairly easy, and it would be trivial to enable them only when Xen is enabled (as the only non-default user). But it doesn't really address the common case of a distro build which is going to have Xen support enabled, and leaves the open question of whether the native performance cost of pv-spinlocks is worth the performance improvement on a loaded Xen system (10% saving of overall system CPU when guests block rather than spin). Still it is a reasonable short-term workaround. [ Impact: fix pvops performance regression when running native ] Analysed-by: N"Xin Xiaohui" <xiaohui.xin@intel.com> Analysed-by: N"Li Xin" <xin.li@intel.com> Analysed-by: N"Nakajima Jun" <jun.nakajima@intel.com> Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: NH. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Xen-devel <xen-devel@lists.xensource.com> LKML-Reference: <4A0B62F7.5030802@goop.org> [ fixed the help text ] Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 11 4月, 2009 1 次提交
-
-
由 Masami Hiramatsu 提交于
Impact: fix kprobes crash on 32-bit with RAM above 4G Use phys_addr_t for receiving a physical address argument instead of unsigned long. This allows fixmap to handle pages higher than 4GB on x86-32. Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com> Acked-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: systemtap-ml <systemtap@sources.redhat.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <49DE3695.6040800@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 10 4月, 2009 1 次提交
-
-
由 Masami Hiramatsu 提交于
Use phys_addr_t for receiving a physical address argument instead of unsigned long. This allows fixmap to handle pages higher than 4GB on x86-32. Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 3月, 2009 3 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Impact: fix lazy context switch API Pass the previous and next tasks into the context switch start end calls, so that the called functions can properly access the task state (esp in end_context_switch, in which the next task is not yet completely current). Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
-
由 Jeremy Fitzhardinge 提交于
Impact: allow preemption during lazy mmu updates If we're in lazy mmu mode when context switching, leave lazy mmu mode, but remember the task's state in TIF_LAZY_MMU_UPDATES. When we resume the task, check this flag and re-enter lazy mmu mode if its set. This sets things up for allowing lazy mmu mode while preemptible, though that won't actually be active until the next change. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
-
由 Jeremy Fitzhardinge 提交于
Impact: simplification, prepare for later changes Make lazy cpu mode more specific to context switching, so that it makes sense to do more context-switch specific things in the callbacks. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
-
- 19 3月, 2009 1 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Impact: cleanup set_pte_present() is no longer used, directly or indirectly, so remove it. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Xen-devel <xen-devel@lists.xensource.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Avi Kivity <avi@redhat.com> LKML-Reference: <1237406613-2929-2-git-send-email-jeremy@goop.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 17 3月, 2009 1 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Impact: fix crash on VMI (VMware) When we generate a call sequence for calling a paravirtualized function, we presume that the generated code is "call *0xXXXXX", which is a 6 byte opcode; this is larger than a normal direct call, and so we can patch a direct call over it. At the moment, however we give gcc enough rope to hang us by putting the address in a register and generating a two byte indirect-via-register call. Prevent this by explicitly dereferencing the function pointer and passing it into the asm as a constant. This prevents crashes in VMI, as it cannot handle unpatchable callsites. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Alok Kataria <akataria@vmware.com> LKML-Reference: <49BEEDC2.2070809@goop.org> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 13 2月, 2009 1 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Impact: avoid access to percpu vars in preempible context They are intended to be used whenever there's the possibility that there's some stale state which is going to be overwritten with a queued update, or to force a state change when we may be in lazy mode. Either way, we could end up calling it with preemption enabled, so wrap the functions in their own little preempt-disable section so they can be safely called in any context (though preemption should never be enabled if we're actually in a lazy state). (Move out of line to avoid #include dependencies.) Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 12 2月, 2009 2 次提交
-
-
由 Jeremy Fitzhardinge 提交于
pgtable*.h is intended for definitions relating to actual pagetables and their entries, so move all the definitions for (pte|pmd|pud|pgd)(val)?_t to the appropriate pgtable*.h headers. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org>
-
- 10 2月, 2009 1 次提交
-
-
由 Kyle McMartin 提交于
Architectures other than mips and x86 are not using ticket spinlocks. Therefore, the contention on the lock is meaningless, since there is nobody known to be waiting on it (arguably /fairly/ unfair locks). Dummy it out to return 0 on other architectures. Signed-off-by: NKyle McMartin <kyle@redhat.com> Acked-by: NRalf Baechle <ralf@linux-mips.org> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 2月, 2009 1 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Impact: Bug fix A hunk went missing in the original patch, and callee-save callsites were not marked as returning the upper 32-bit of result, causing Badness. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 03 2月, 2009 1 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Impact: bugfix In the 32-bit calling convention, %eax:%edx is used to return 64-bit values. Don't save and restore %edx around wrapped functions, or they can't return a full 64-bit result. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 31 1月, 2009 6 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Impact: Fix build when CONFIG_PARAVIRT_DEBUG is enabled Fix missed convertion to using callee-saved calls for pud_val, which causes a compile error when CONFIG_PARAVIRT_DEBUG is enabled. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Jeremy Fitzhardinge 提交于
Impact: Optimization In the native case, pte_val, make_pte, etc are all just identity functions, so there's no need to clobber a lot of registers over them. (This changes the 32-bit callee-save calling convention to return both EAX and EDX so functions can return 64-bit values.) Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Jeremy Fitzhardinge 提交于
Impact: Optimization Functions with the callee save calling convention clobber many fewer registers than the normal C calling convention. Implement variants of PVOP_V?CALL* accordingly. This only bothers with functions up to 3 args, since functions with more args may as well use the normal calling convention. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Jeremy Fitzhardinge 提交于
Impact: Optimization One of the problems with inserting a pile of C calls where previously there were none is that the register pressure is greatly increased. The C calling convention says that the caller must expect a certain set of registers may be trashed by the callee, and that the callee can use those registers without restriction. This includes the function argument registers, and several others. This patch seeks to alleviate this pressure by introducing wrapper thunks that will do the register saving/restoring, so that the callsite doesn't need to worry about it, but the callee function can be conventional compiler-generated code. In many cases (particularly performance-sensitive cases) the callee will be in assembler anyway, and need not use the compiler's calling convention. Standard calling convention is: arguments return scratch x86-32 eax edx ecx eax ? x86-64 rdi rsi rdx rcx rax r8 r9 r10 r11 The thunk preserves all argument and scratch registers. The return register is not preserved, and is available as a scratch register for unwrapped callee code (and of course the return value). Wrapped function pointers are themselves wrapped in a struct paravirt_callee_save structure, in order to get some warning from the compiler when functions with mismatched calling conventions are used. The most common paravirt ops, both statically and dynamically, are interrupt enable/disable/save/restore, so handle them first. This is particularly easy since their calls are handled specially anyway. XXX Deal with VMI. What's their calling convention? Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Jeremy Fitzhardinge 提交于
Impact: Optimization Each asm paravirt-ops call says what registers are available for clobbering. This patch makes use of this to selectively save/restore registers around each pvops call. In many cases this significantly shrinks code size. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Jeremy Fitzhardinge 提交于
Impact: Optimization Several paravirt ops implementations simply return their arguments, the most obvious being the make_pte/pte_val class of operations on native. On 32-bit, the identity function is literally a no-op, as the calling convention uses the same registers for the first argument and return. On 64-bit, it can be implemented with a single "mov". This patch adds special identity functions for 32 and 64 bit argument, and machinery to recognize them and replace them with either nops or a mov as appropriate. At the moment, the only users for the identity functions are the pagetable entry conversion functions. The result is a measureable improvement on pagetable-heavy benchmarks (2-3%, reducing the pvops overhead from 5 to 2%). Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 23 1月, 2009 1 次提交
-
-
由 Jeremy Fitzhardinge 提交于
pte_flags() was introduced as a new pvop in order to extract just the flags portion of a pte, which is a potentially cheaper operation than extracting the page number as well. It turns out this operation is not needed, because simply using a mask to extract the flags from a pte is sufficient for all current users. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 21 1月, 2009 1 次提交
-
-
由 Jiri Kosina 提交于
Impact: cleanup Remove byte locks implementation, which was introduced by Jeremy in 8efcbab6 ("paravirt: introduce a "lock-byte" spinlock implementation"), but turned out to be dead code that is not used by any in-kernel virtualization guest (Xen uses its own variant of spinlocks implementation and KVM is not planning to move to byte locks). Signed-off-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 12 1月, 2009 1 次提交
-
-
由 Rusty Russell 提交于
Impact: reduce stack usage, use new cpumask API. This is made a little more tricky by uv_flush_tlb_others which actually alters its argument, for an IPI to be sent to the remaining cpus in the mask. I solve this by allocating a cpumask_var_t for this case and falling back to IPI should this fail. To eliminate temporaries in the caller, all flush_tlb_others implementations now do the this-cpu-elimination step themselves. Note also the curious "cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask)" which has been there since pre-git and yet f->flush_cpumask is always zero at this point. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NMike Travis <travis@sgi.com>
-
- 23 10月, 2008 2 次提交
-
-
由 H. Peter Anvin 提交于
Change header guards named "ASM_X86__*" to "_ASM_X86_*" since: a. the double underscore is ugly and pointless. b. no leading underscore violates namespace constraints. Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 22 8月, 2008 2 次提交
-
-
由 Yinghai Lu 提交于
commandline show_msr=1 for bsp, show_msr=32 for all 32 cpus. [ mingo@elte.hu: added documentation ] Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Eduardo Habkost 提交于
This patch changes the pfn args from 'u32' to 'unsigned long' on alloc_p*() functions on paravirt_ops, and the corresponding implementations for Xen and VMI. The prototypes for CONFIG_PARAVIRT=n are already using unsigned long, so paravirt.h now matches the prototypes on asm-x86/pgalloc.h. It shouldn't result in any changes on generated code on 32-bit, with or without CONFIG_PARAVIRT. On both cases, 'codiff -f' didn't show any change after applying this patch. On 64-bit, there are (expected) binary changes only when CONFIG_PARAVIRT is enabled, as the patch is really supposed to change the size of the pfn args. [ v2: KVM_GUEST: use the right parameter type on kvm_release_pt() ] Signed-off-by: NEduardo Habkost <ehabkost@redhat.com> Acked-by: NJeremy Fitzhardinge <jeremy@goop.org> Acked-by: NZachary Amsden <zach@vmware.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 20 8月, 2008 1 次提交
-
-
由 Jeremy Fitzhardinge 提交于
It is useful for a pv_lock_ops backend to know whether interrupts are enabled or not in the context a spin_lock is being called. This allows it to enable interrupts while spinning, which could be particularly helpful when spinning becomes blocking. The default implementation just calls the normal spin_lock op, ignoring the flags. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 24 7月, 2008 1 次提交
-
-
由 Jeremy Fitzhardinge 提交于
LTP testing showed that Xen does not properly implement sys_modify_ldt(). This patch does the final little bits needed to make the ldt work properly. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 23 7月, 2008 1 次提交
-
-
由 Vegard Nossum 提交于
This patch is the result of an automatic script that consolidates the format of all the headers in include/asm-x86/. The format: 1. No leading underscore. Names with leading underscores are reserved. 2. Pathname components are separated by two underscores. So we can distinguish between mm_types.h and mm/types.h. 3. Everything except letters and numbers are turned into single underscores. Signed-off-by: NVegard Nossum <vegard.nossum@gmail.com>
-
- 22 7月, 2008 2 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Rusty, in his peevish way, complained that macros defining constants should have a name which somewhat accurately reflects the actual purpose of the constant. Aside from the fact that PTE_MASK gives no clue as to what's actually being masked, and is misleadingly similar to the functionally entirely different PMD_MASK, PUD_MASK and PGD_MASK, I don't really see what the problem is. But if this patch silences the incessent noise, then it will have achieved its goal (TODO: write test-case). Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Rusty Russell 提交于
(Jeremy said: rusty: use PTE_MASK rusty: use PTE_MASK rusty: use PTE_MASK When I asked: jsgf: does that include the NX flag? He responded eloquently: rusty: use PTE_MASK rusty: use PTE_MASK yes, it's the official constant of masking flags out of ptes ) Change a15af1c9 'x86/paravirt: add pte_flags to just get pte flags' removed lguest's private pte_flags() in favor of a generic one. Unfortunately, the generic one doesn't filter out the non-flags bits: this results in lguest creating corrupt shadow page tables and blowing up host memory. Since noone is supposed to use the pfn part of pte_flags(), it seems safest to always do the filtering. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Acked-by: NJeremy Fitzhardinge <jeremy@goop.org> Signed-off-and-morning-tea-spilled-by: NIngo Molnar <mingo@elte.hu>
-
- 18 7月, 2008 2 次提交
-
-
由 Harvey Harrison 提交于
include/asm/paravirt.h:1404:2: warning: returning void-valued expression include/asm/paravirt.h:1414:2: warning: returning void-valued expression Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Maciej W. Rozycki 提交于
Use alternatives to select the workaround for the 11AP Pentium erratum for the affected steppings on the fly rather than build time. Remove the X86_GOOD_APIC configuration option and replace all the calls to apic_write_around() with plain apic_write(), protecting accesses to the ESR as appropriate due to the 3AP Pentium erratum. Remove apic_read_around() and all its invocations altogether as not needed. Remove apic_write_atomic() and all its implementing backends. The use of ASM_OUTPUT2() is not strictly needed for input constraints, but I have used it for readability's sake. I had the feeling no one else was brave enough to do it, so I went ahead and here it is. Verified by checking the generated assembly and tested with both a 32-bit and a 64-bit configuration, also with the 11AP "feature" forced on and verified with gdb on /proc/kcore to work as expected (as an 11AP machines are quite hard to get hands on these days). Some script complained about the use of "volatile", but apic_write() needs it for the same reason and is effectively a replacement for writel(), so I have disregarded it. I am not sure what the policy wrt defconfig files is, they are generated and there is risk of a conflict resulting from an unrelated change, so I have left changes to them out. The option will get removed from them at the next run. Some testing with machines other than mine will be needed to avoid some stupid mistake, but despite its volume, the change is not really that intrusive, so I am fairly confident that because it works for me, it will everywhere. Signed-off-by: NMaciej W. Rozycki <macro@linux-mips.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 16 7月, 2008 4 次提交
-
-
由 Ingo Molnar 提交于
Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Jeremy Fitzhardinge 提交于
Implement a version of the old spinlock algorithm, in which everyone spins waiting for a lock byte. In order to be compatible with the ticket-lock's use of a zero initializer, this uses the convention of '0' for unlocked and '1' for locked. This algorithm is much better than ticket locks in a virtual envionment, because it doesn't interact badly with the vcpu scheduler. If there are multiple vcpus spinning on a lock and the lock is released, the next vcpu to be scheduled will take the lock, rather than cycling around until the next ticketed vcpu gets it. To use this, you must call paravirt_use_bytelocks() very early, before any spinlocks have been taken. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <clameter@linux-foundation.org> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Virtualization <virtualization@lists.linux-foundation.org> Cc: Xen devel <xen-devel@lists.xensource.com> Cc: Thomas Friebel <thomas.friebel@amd.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Jeremy Fitzhardinge 提交于
Ticket spinlocks have absolutely ghastly worst-case performance characteristics in a virtual environment. If there is any contention for physical CPUs (ie, there are more runnable vcpus than cpus), then ticket locks can cause the system to end up spending 90+% of its time spinning. The problem is that (v)cpus waiting on a ticket spinlock will be granted access to the lock in strict order they got their tickets. If the hypervisor scheduler doesn't give the vcpus time in that order, they will burn timeslices waiting for the scheduler to give the right vcpu some time. In the worst case it could take O(n^2) vcpu scheduler timeslices for everyone waiting on the lock to get it, not counting new cpus trying to take the lock while the log-jam is sorted out. These hooks allow a paravirt backend to replace the spinlock implementation. At the very least, this could revert the implementation back to the old lock algorithm, which allows the next scheduled vcpu to take the lock, and has basically fairly good performance. It also allows the spinlocks to take advantages of the hypervisor features to make locks more efficient (spin and block, for example). The cost to native execution is an extra direct call when using a spinlock function. There's no overhead if CONFIG_PARAVIRT is turned off. The lock structure is fixed at a single "unsigned int", initialized to zero, but the spinlock implementation can use it as it wishes. Thanks to Thomas Friebel's Xen Summit talk "Preventing Guests from Spinning Around" for pointing out this problem. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <clameter@linux-foundation.org> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Virtualization <virtualization@lists.linux-foundation.org> Cc: Xen devel <xen-devel@lists.xensource.com> Cc: Thomas Friebel <thomas.friebel@amd.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Jeremy Fitzhardinge 提交于
The Xen hypercall interface is allowed to trash any or all of the argument registers, so we need to be careful that the kernel state isn't damaged. On 32-bit kernels, the hypercall parameter registers same as a regparm function call, so we've got away without explicit clobbering so far. The 64-bit ABI defines lots of caller-save registers, so save them all for safety. We can trim this set later by re-distributing the responsibility for saving all these registers. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-