- 06 2月, 2009 1 次提交
-
-
由 Tejun Heo 提交于
Make the following style cleanups: * drop unnecessary //#include from xen-asm_32.S * compulsive adding of space after comma * reformat multiline comments Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 05 2月, 2009 4 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Enable the use of the direct vcpu-access operations on 64-bit. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Jeremy Fitzhardinge 提交于
Now that x86-64 has directly accessible percpu variables, it can also implement the direct versions of these operations, which operate on a vcpu_info structure directly embedded in the percpu area. In fact, the 64-bit versions are more or less identical, and so can be shared. The only two differences are: 1. xen_restore_fl_direct takes its argument in eax on 32-bit, and rdi on 64-bit. Unfortunately it isn't possible to directly refer to the 2nd lsb of rdi directly (as you can with %ah), so the code isn't quite as dense. 2. check_events needs to variants to save different registers. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Jeremy Fitzhardinge 提交于
We need to access percpu data fairly early, so set up the percpu registers as soon as possible. We only need to load the appropriate segment register. We already have a GDT, but its hard to change it early because we need to manipulate the pagetable to do so, and that hasn't been set up yet. Also, set the kernel stack when bringing up secondary CPUs. If we don't they all end up sharing the same stack... Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Jeremy Fitzhardinge 提交于
Moving the mmu code from enlighten.c to mmu.c inadvertently broke the 32-bit build. Fix it. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 31 1月, 2009 4 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Impact: fix xen booting We need to access percpu data fairly early, so set up the percpu registers as soon as possible. We only need to load the appropriate segment register. We already have a GDT, but its hard to change it early because we need to manipulate the pagetable to do so, and that hasn't been set up yet. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Jeremy Fitzhardinge 提交于
Impact: Optimization In the native case, pte_val, make_pte, etc are all just identity functions, so there's no need to clobber a lot of registers over them. (This changes the 32-bit callee-save calling convention to return both EAX and EDX so functions can return 64-bit values.) Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Jeremy Fitzhardinge 提交于
Impact: Optimization One of the problems with inserting a pile of C calls where previously there were none is that the register pressure is greatly increased. The C calling convention says that the caller must expect a certain set of registers may be trashed by the callee, and that the callee can use those registers without restriction. This includes the function argument registers, and several others. This patch seeks to alleviate this pressure by introducing wrapper thunks that will do the register saving/restoring, so that the callsite doesn't need to worry about it, but the callee function can be conventional compiler-generated code. In many cases (particularly performance-sensitive cases) the callee will be in assembler anyway, and need not use the compiler's calling convention. Standard calling convention is: arguments return scratch x86-32 eax edx ecx eax ? x86-64 rdi rsi rdx rcx rax r8 r9 r10 r11 The thunk preserves all argument and scratch registers. The return register is not preserved, and is available as a scratch register for unwrapped callee code (and of course the return value). Wrapped function pointers are themselves wrapped in a struct paravirt_callee_save structure, in order to get some warning from the compiler when functions with mismatched calling conventions are used. The most common paravirt ops, both statically and dynamically, are interrupt enable/disable/save/restore, so handle them first. This is particularly easy since their calls are handled specially anyway. XXX Deal with VMI. What's their calling convention? Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Jeremy Fitzhardinge 提交于
Impact: Cleanup Move remaining mmu-related stuff into mmu.c. A general cleanup, and lay the groundwork for later patches. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 27 1月, 2009 1 次提交
-
-
由 Brian Gerst 提交于
Impact: cleanup Rename init_gdt() to setup_percpu_segment(), and move it to setup_percpu.c. Signed-off-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 23 1月, 2009 2 次提交
-
-
由 Ingo Molnar 提交于
Impact: build fix This build error: arch/x86/xen/suspend.c:22: error: implicit declaration of function 'fix_to_virt' arch/x86/xen/suspend.c:22: error: 'FIX_PARAVIRT_BOOTMAP' undeclared (first use in this function) arch/x86/xen/suspend.c:22: error: (Each undeclared identifier is reported only once arch/x86/xen/suspend.c:22: error: for each function it appears in.) triggers because the hardirq.h unification removed an implicit fixmap.h include - on which arch/x86/xen/suspend.c depended. Add the fixmap.h include explicitly. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Jeremy Fitzhardinge 提交于
pte_flags() was introduced as a new pvop in order to extract just the flags portion of a pte, which is a potentially cheaper operation than extracting the page number as well. It turns out this operation is not needed, because simply using a mask to extract the flags from a pte is sufficient for all current users. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 20 1月, 2009 1 次提交
-
-
由 Brian Gerst 提交于
Impact: cleanup Copy the code to cpu_init() to satisfy the requirement that the cpu be reinitialized. Remove all other calls, since the segments are already initialized in head_64.S. Signed-off-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 18 1月, 2009 5 次提交
-
-
由 Brian Gerst 提交于
tj: * in asm-offsets_64.c, pda.h inclusion shouldn't be removed as pda is still referenced in the file * s/oldrsp/old_rsp/ Signed-off-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Brian Gerst 提交于
Also clean up PER_CPU_VAR usage in xen-asm_64.S tj: * remove now unused stack_thread_info() * s/kernelstack/kernel_stack/ * added FIXME comment in xen-asm_64.S Signed-off-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Brian Gerst 提交于
Signed-off-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Brian Gerst 提交于
Signed-off-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Brian Gerst 提交于
Signed-off-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 16 1月, 2009 3 次提交
-
-
由 Ingo Molnar 提交于
It is an optimization and a cleanup, and adds the following new generic percpu methods: percpu_read() percpu_write() percpu_add() percpu_sub() percpu_and() percpu_or() percpu_xor() and implements support for them on x86. (other architectures will fall back to a default implementation) The advantage is that for example to read a local percpu variable, instead of this sequence: return __get_cpu_var(var); ffffffff8102ca2b: 48 8b 14 fd 80 09 74 mov -0x7e8bf680(,%rdi,8),%rdx ffffffff8102ca32: 81 ffffffff8102ca33: 48 c7 c0 d8 59 00 00 mov $0x59d8,%rax ffffffff8102ca3a: 48 8b 04 10 mov (%rax,%rdx,1),%rax We can get a single instruction by using the optimized variants: return percpu_read(var); ffffffff8102ca3f: 65 48 8b 05 91 8f fd mov %gs:0x7efd8f91(%rip),%rax I also cleaned up the x86-specific APIs and made the x86 code use these new generic percpu primitives. tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out * added percpu_and() for completeness's sake * made generic percpu ops atomic against preemption Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
Do the following cleanups: * kill x86_64_init_pda() which now is equivalent to pda_init() * use per_cpu_offset() instead of cpu_pda() when initializing initial_gs Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Tejun Heo 提交于
[ Based on original patch from Christoph Lameter and Mike Travis. ] Currently pdas and percpu areas are allocated separately. %gs points to local pda and percpu area can be reached using pda->data_offset. This patch folds pda into percpu area. Due to strange gcc requirement, pda needs to be at the beginning of the percpu area so that pda->stack_canary is at %gs:40. To achieve this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is added and used to reserve pda sized chunk at the start of the percpu area. After this change, for boot cpu, %gs first points to pda in the data.init area and later during setup_per_cpu_areas() gets updated to point to the actual pda. This means that setup_per_cpu_areas() need to reload %gs for CPU0 while clearing pda area for other cpus as cpu0 already has modified it when control reaches setup_per_cpu_areas(). This patch also removes now unnecessary get_local_pda() and its call sites. A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into per cpu area" patch. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 12 1月, 2009 1 次提交
-
-
由 Rusty Russell 提交于
Impact: reduce stack usage, use new cpumask API. This is made a little more tricky by uv_flush_tlb_others which actually alters its argument, for an IPI to be sent to the remaining cpus in the mask. I solve this by allocating a cpumask_var_t for this case and falling back to IPI should this fail. To eliminate temporaries in the caller, all flush_tlb_others implementations now do the this-cpu-elimination step themselves. Note also the curious "cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask)" which has been there since pre-git and yet f->flush_cpumask is always zero at this point. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NMike Travis <travis@sgi.com>
-
- 31 12月, 2008 1 次提交
-
-
由 Martin Schwidefsky 提交于
The cpu time spent by the idle process actually doing something is currently accounted as idle time. This is plain wrong, the architectures that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the time spent doing nothing and the time spent by idle doing work. The first is accounted with account_idle_time and the second with account_system_time. The architectures that use the account_xxx_time interface directly and not the account_xxx_ticks interface now need to do the check for the idle process in their arch code. In particular to improve the system vs true idle time accounting the arch code needs to measure the true idle time instead of just testing for the idle process. To improve the tick based accounting as well we would need an architecture primitive that can tell us if the pt_regs of the interrupted context points to the magic instruction that halts the cpu. In addition idle time is no more added to the stime of the idle process. This field now contains the system time of the idle process as it should be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as every tick that occurs while idle is running will be accounted as idle time. This patch contains the necessary common code changes to be able to distinguish idle system time and true idle time. The architectures with support for VIRT_CPU_ACCOUNTING need some changes to exploit this. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 17 12月, 2008 6 次提交
-
-
由 Mike Travis 提交于
Impact: use new API, remove cpumask from stack. Change smp_call_function_mask() callers to smp_call_function_many(). This removes a cpumask from the stack, and falls back should allocating the cpumask var fail (only possible with CONFIG_CPUMASKS_OFFSTACK). Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NMike Travis <travis@sgi.com> Cc: jeremy@xensource.com
-
由 Mike Travis 提交于
This patch simply changes cpumask_t to struct cpumask and similar trivial modernizations. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NMike Travis <travis@sgi.com>
-
由 Mike Travis 提交于
Simple change, and eventual space saving when NR_CPUS >> nr_cpu_ids. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NMike Travis <travis@sgi.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
-
由 Mike Travis 提交于
Impact: cleanup, change parameter passing * Change genapic interfaces to accept cpumask_t pointers where possible. * Modify external callers to use cpumask_t pointers in function calls. * Create new send_IPI_mask_allbutself which is the same as the send_IPI_mask functions but removes smp_processor_id() from list. This removes another common need for a temporary cpumask_t variable. * Functions that used a temp cpumask_t variable for: cpumask_t allbutme = cpu_online_map; cpu_clear(smp_processor_id(), allbutme); if (!cpus_empty(allbutme)) ... become: if (!cpus_equal(cpu_online_map, cpumask_of_cpu(cpu))) ... * Other minor code optimizations (like using cpus_clear instead of CPU_MASK_NONE, etc.) Applies to linux-2.6.tip/master. Signed-off-by: NMike Travis <travis@sgi.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Acked-by: NIngo Molnar <mingo@elte.hu>
-
由 Jeremy Fitzhardinge 提交于
Impact: cleanup hypervisor.h had accumulated a lot of crud, including lots of spurious #includes. Clean it all up, and go around fixing up everything else accordingly. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Tej 提交于
Impact: cleanup Signed-off-by: NTej <bewith.tej@gmail.com> Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 13 12月, 2008 1 次提交
-
-
由 Rusty Russell 提交于
Impact: change calling convention of existing clock_event APIs struct clock_event_timer's cpumask field gets changed to take pointer, as does the ->broadcast function. Another single-patch change. For safety, we BUG_ON() in clockevents_register_device() if it's not set. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu>
-
- 01 12月, 2008 2 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Al Viro 提交于
... so get xen-ops.h in agreement with xen/smp.c Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 11月, 2008 1 次提交
-
-
由 Ian Campbell 提交于
Impact: fix Xen guest boot failure commit eefb47f6 ("xen: use spin_lock_nest_lock when pinning a pagetable") changed xen_pgd_walk to walk over mm->pgd rather than taking pgd as an argument. This breaks xen_mm_(un)pin_all() because it makes init_mm.pgd readonly instead of the pgd we are interested in and therefore the pin subsequently fails. (XEN) mm.c:2280:d15 Bad type (saw 00000000e8000001 != exp 0000000060000000) for mfn bc464 (pfn 21ca7) (XEN) mm.c:2665:d15 Error while pinning mfn bc464 [ 14.586913] 1 multicall(s) failed: cpu 0 [ 14.586926] Pid: 14, comm: kstop/0 Not tainted 2.6.28-rc5-x86_32p-xenU-00172-gee2f6cc7 #200 [ 14.586940] Call Trace: [ 14.586955] [<c030c17a>] ? printk+0x18/0x1e [ 14.586972] [<c0103df3>] xen_mc_flush+0x163/0x1d0 [ 14.586986] [<c0104bc1>] __xen_pgd_pin+0xa1/0x110 [ 14.587000] [<c015a330>] ? stop_cpu+0x0/0xf0 [ 14.587015] [<c0104d7b>] xen_mm_pin_all+0x4b/0x70 [ 14.587029] [<c022bcb9>] xen_suspend+0x39/0xe0 [ 14.587042] [<c015a330>] ? stop_cpu+0x0/0xf0 [ 14.587054] [<c015a3cd>] stop_cpu+0x9d/0xf0 [ 14.587067] [<c01417cd>] run_workqueue+0x8d/0x150 [ 14.587080] [<c030e4b3>] ? _spin_unlock_irqrestore+0x23/0x40 [ 14.587094] [<c014558a>] ? prepare_to_wait+0x3a/0x70 [ 14.587107] [<c0141918>] worker_thread+0x88/0xf0 [ 14.587120] [<c01453c0>] ? autoremove_wake_function+0x0/0x50 [ 14.587133] [<c0141890>] ? worker_thread+0x0/0xf0 [ 14.587146] [<c014509c>] kthread+0x3c/0x70 [ 14.587157] [<c0145060>] ? kthread+0x0/0x70 [ 14.587170] [<c0109d1b>] kernel_thread_helper+0x7/0x10 [ 14.587181] call 1/3: op=14 arg=[c0415000] result=0 [ 14.587192] call 2/3: op=14 arg=[e1ca2000] result=0 [ 14.587204] call 3/3: op=26 arg=[c1808860] result=-22 Signed-off-by: NIan Campbell <ian.campbell@citrix.com> Acked-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 07 11月, 2008 2 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Xen requires that all mappings of pagetable pages are read-only, so that they can't be updated illegally. As a result, if a page is being turned into a pagetable page, we need to make sure all its mappings are RO. If the page had been used for ioremap or vmalloc, it may still have left over mappings as a result of not having been lazily unmapped. This change makes sure we explicitly mop them all up before pinning the page. Unlike aliases created by kmap, the there can be vmalloc aliases even for non-high pages, so we must do the flush unconditionally. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Linux Memory Management List <linux-mm@kvack.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Jeremy Fitzhardinge 提交于
Impact: fix 32-bit Xen guest boot crash On 32-bit PAE, pud_page, for no good reason, didn't really return a struct page *. Since Jan Beulich's fix "i386/PAE: fix pud_page()", pud_page does return a struct page *. Because PAE has 3 pagetable levels, the pud level is folded into the pgd level, so pgd_page() is the same as pud_page(), and now returns a struct page *. Update the xen/mmu.c code which uses pgd_page() accordingly. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 27 10月, 2008 1 次提交
-
-
由 Chris Lalancette 提交于
Impact: fix guest kernel boot crash on certain configs Recent i686 2.6.27 kernels with a certain amount of memory (between 736 and 855MB) have a problem booting under a hypervisor that supports batched mprotect (this includes the RHEL-5 Xen hypervisor as well as any 3.3 or later Xen hypervisor). The problem ends up being that xen_ptep_modify_prot_commit() is using virt_to_machine to calculate which pfn to update. However, this only works for pages that are in the p2m list, and the pages coming from change_pte_range() in mm/mprotect.c are kmap_atomic pages. Because of this, we can run into the situation where the lookup in the p2m table returns an INVALID_MFN, which we then try to pass to the hypervisor, which then (correctly) denies the request to a totally bogus pfn. The right thing to do is to use arbitrary_virt_to_machine, so that we can be sure we are modifying the right pfn. This unfortunately introduces a performance penalty because of a full page-table-walk, but we can avoid that penalty for pages in the p2m list by checking if virt_addr_valid is true, and if so, just doing the lookup in the p2m table. The attached patch implements this, and allows my 2.6.27 i686 based guest with 768MB of memory to boot on a RHEL-5 hypervisor again. Thanks to Jeremy for the suggestions about how to fix this particular issue. Signed-off-by: NChris Lalancette <clalance@redhat.com> Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Chris Lalancette <clalance@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 21 10月, 2008 1 次提交
-
-
由 Steven Rostedt 提交于
Due to confusion between the ftrace infrastructure and the gcc profiling tracer "ftrace", this patch renames the config options from FTRACE to FUNCTION_TRACER. The other two names that are offspring from FTRACE DYNAMIC_FTRACE and FTRACE_MCOUNT_RECORD will stay the same. This patch was generated mostly by script, and partially by hand. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 20 10月, 2008 1 次提交
-
-
由 Nick Piggin 提交于
Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and provide a fast, scalable percpu frontend for small vmaps (requires a slightly different API, though). The biggest problem with vmap is actually vunmap. Presently this requires a global kernel TLB flush, which on most architectures is a broadcast IPI to all CPUs to flush the cache. This is all done under a global lock. As the number of CPUs increases, so will the number of vunmaps a scaled workload will want to perform, and so will the cost of a global TLB flush. This gives terrible quadratic scalability characteristics. Another problem is that the entire vmap subsystem works under a single lock. It is a rwlock, but it is actually taken for write in all the fast paths, and the read locking would likely never be run concurrently anyway, so it's just pointless. This is a rewrite of vmap subsystem to solve those problems. The existing vmalloc API is implemented on top of the rewritten subsystem. The TLB flushing problem is solved by using lazy TLB unmapping. vmap addresses do not have to be flushed immediately when they are vunmapped, because the kernel will not reuse them again (would be a use-after-free) until they are reallocated. So the addresses aren't allocated again until a subsequent TLB flush. A single TLB flush then can flush multiple vunmaps from each CPU. XEN and PAT and such do not like deferred TLB flushing because they can't always handle multiple aliasing virtual addresses to a physical address. They now call vm_unmap_aliases() in order to flush any deferred mappings. That call is very expensive (well, actually not a lot more expensive than a single vunmap under the old scheme), however it should be OK if not called too often. The virtual memory extent information is stored in an rbtree rather than a linked list to improve the algorithmic scalability. There is a per-CPU allocator for small vmaps, which amortizes or avoids global locking. To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces must be used in place of vmap and vunmap. Vmalloc does not use these interfaces at the moment, so it will not be quite so scalable (although it will use lazy TLB flushing). As a quick test of performance, I ran a test that loops in the kernel, linearly mapping then touching then unmapping 4 pages. Different numbers of tests were run in parallel on an 4 core, 2 socket opteron. Results are in nanoseconds per map+touch+unmap. threads vanilla vmap rewrite 1 14700 2900 2 33600 3000 4 49500 2800 8 70631 2900 So with a 8 cores, the rewritten version is already 25x faster. In a slightly more realistic test (although with an older and less scalable version of the patch), I ripped the not-very-good vunmap batching code out of XFS, and implemented the large buffer mapping with vm_map_ram and vm_unmap_ram... along with a couple of other tricks, I was able to speed up a large directory workload by 20x on a 64 CPU system. I believe vmap/vunmap is actually sped up a lot more than 20x on such a system, but I'm running into other locks now. vmap is pretty well blown off the profiles. Before: 1352059 total 0.1401 798784 _write_lock 8320.6667 <- vmlist_lock 529313 default_idle 1181.5022 15242 smp_call_function 15.8771 <- vmap tlb flushing 2472 __get_vm_area_node 1.9312 <- vmap 1762 remove_vm_area 4.5885 <- vunmap 316 map_vm_area 0.2297 <- vmap 312 kfree 0.1950 300 _spin_lock 3.1250 252 sn_send_IPI_phys 0.4375 <- tlb flushing 238 vmap 0.8264 <- vmap 216 find_lock_page 0.5192 196 find_next_bit 0.3603 136 sn2_send_IPI 0.2024 130 pio_phys_write_mmr 2.0312 118 unmap_kernel_range 0.1229 After: 78406 total 0.0081 40053 default_idle 89.4040 33576 ia64_spinlock_contention 349.7500 1650 _spin_lock 17.1875 319 __reg_op 0.5538 281 _atomic_dec_and_lock 1.0977 153 mutex_unlock 1.5938 123 iget_locked 0.1671 117 xfs_dir_lookup 0.1662 117 dput 0.1406 114 xfs_iget_core 0.0268 92 xfs_da_hashname 0.1917 75 d_alloc 0.0670 68 vmap_page_range 0.0462 <- vmap 58 kmem_cache_alloc 0.0604 57 memset 0.0540 52 rb_next 0.1625 50 __copy_user 0.0208 49 bitmap_find_free_region 0.2188 <- vmap 46 ia64_sn_udelay 0.1106 45 find_inode_fast 0.1406 42 memcmp 0.2188 42 finish_task_switch 0.1094 42 __d_lookup 0.0410 40 radix_tree_lookup_slot 0.1250 37 _spin_unlock_irqrestore 0.3854 36 xfs_bmapi 0.0050 36 kmem_cache_free 0.0256 35 xfs_vn_getattr 0.0322 34 radix_tree_lookup 0.1062 33 __link_path_walk 0.0035 31 xfs_da_do_buf 0.0091 30 _xfs_buf_find 0.0204 28 find_get_page 0.0875 27 xfs_iread 0.0241 27 __strncpy_from_user 0.2812 26 _xfs_buf_initialize 0.0406 24 _xfs_buf_lookup_pages 0.0179 24 vunmap_page_range 0.0250 <- vunmap 23 find_lock_page 0.0799 22 vm_map_ram 0.0087 <- vmap 20 kfree 0.0125 19 put_page 0.0330 18 __kmalloc 0.0176 17 xfs_da_node_lookup_int 0.0086 17 _read_lock 0.0885 17 page_waitqueue 0.0664 vmap has gone from being the top 5 on the profiles and flushing the crap out of all TLBs, to using less than 1% of kernel time. [akpm@linux-foundation.org: cleanups, section fix] [akpm@linux-foundation.org: fix build on alpha] Signed-off-by: NNick Piggin <npiggin@suse.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 10月, 2008 2 次提交
-
-
由 Thomas Gleixner 提交于
Revert the dynarray changes. They need more thought and polishing. Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Alex Nixon 提交于
Following commit 9c3f2468d8339866d9ef6a25aae31a8909c6be0d, do_IRQ() looks up the IRQ number in the per-cpu variable vector_irq. This commit makes Xen initialise an identity vector_irq map for both X86_32 and X86_64. Signed-off-by: NAlex Nixon <alex.nixon@citrix.com> Acked-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-