- 23 3月, 2006 1 次提交
-
-
由 Andrew Morton 提交于
When we stop allocating percpu memory for not-possible CPUs we must not touch the percpu data for not-possible CPUs at all. The correct way of doing this is to test cpu_possible() or to use for_each_cpu(). This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very few instances of this bug, if any. But the patch converts lots of open-coded test to use the preferred helper macros. Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Acked-by: NKyle McMartin <kyle@parisc-linux.org> Cc: Anton Blanchard <anton@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Andi Kleen <ak@muc.de> Cc: Christian Zankel <chris@zankel.net> Cc: Philippe Elie <phil.el@wanadoo.fr> Cc: Nathan Scott <nathans@sgi.com> Cc: Jens Axboe <axboe@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 20 3月, 2006 26 次提交
-
-
由 David S. Miller 提交于
The mapping is a simple "(cpuid >> 2) == core" for now. Later we'll add more sophisticated code that will walk the sun4v machine description and figure this out from there. We should also add core mappings for jaguar and panther processors. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Don't piggy back the SMP receive signal code to do the context version change handling. Instead allocate another fixed PIL number for this asynchronous cross-call. We can't use smp_call_function() because this thing is invoked with interrupts disabled and a few spinlocks held. Also, fix smp_call_function_mask() to count "cpus" correctly. There is no guarentee that the local cpu is in the mask yet that is exactly what this code was assuming. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This cpu mondo sending interface isn't all that easy to use correctly... We were clearing out the wrong bits from the "mask" after getting something other than EOK from the hypervisor. It turns out the hypervisor can just be resent the same cpu_list[] array, with the 0xffff "done" entries still in there, and it will do the right thing. So don't update or try to rebuild the cpu_list[] array to condense it. This requires the "forward_progress" check to be done slightly differently, but this new scheme is less bug prone than what we were doing before. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
There were several bugs in the SUN4V cpu mondo dispatch code. In fact, if we ever got a EWOULDBLOCK or other error from the hypervisor call, we'd potentially send a cpu mondo multiple times to the same cpu and even worse we could loop until the timeout resending the same mondo over and over to such cpus. So let's bulletproof this thing as follows: 1) Implement cpu_mondo_send() and cpu_state() hypervisor calls in arch/sparc64/kernel/entry.S, add prototypes to asm/hypervisor.h 2) Don't build and update the cpulist using inline functions, this was causing the cpu mask to not get updated in the caller. 3) Disable interrupts during the entire mondo send, otherwise our cpu list and/or mondo block could get overwritten if we take an interrupt and do a cpu mondo send on the current cpu. 4) Check for all possible error return types from the cpu_mondo_send() hypervisor call. In particular: HV_EOK) Our work is done, all cpus have received the mondo. HV_CPUERROR) One or more of the cpus in the cpu list we passed to the hypervisor are in error state. Use cpu_state() calls over the entries in the cpu list to see which ones. Record them in "error_mask" and report this after we are done sending the mondo to cpus which are not in error state. HV_EWOULDBLOCK) We need to keep trying. Any other error we consider fatal, we report the event and exit immediately. 5) We only timeout if forward progress is not made. Forward progress is defined as having at least one cpu get the mondo successfully in a given cpu_mondo_send() call. Otherwise we bump a counter and delay a little. If the counter hits a limit, we signal an error and report the event. Also, smp_call_function_mask() error handling reports the number of cpus incorrectly. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
1) We must flush the TLB, duh. 2) Even if the sw context was seen to be valid, the local cpu's hw context can be out of date, so reload it unconditionally. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
It's in "arg0" not "func". Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
The context allocation scheme we use depends upon there being a 1<-->1 mapping from cpu to physical TLB for correctness. Chips like Niagara break this assumption. So what we do is notify all cpus with a cross call when the context version number changes, and if necessary this makes them allocate a valid context for the address space they are running at the time. Stress tested with make -j1024, make -j2048, and make -j4096 kernel builds on a 32-strand, 8 core, T2000 with 16GB of ram. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Set, but never used. We used to use this for dynamic IRQ retargetting, but that code died a long time ago. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
The sibling cpu bringup is extremely fragile. We can only perform the most basic calls until we take over the trap table from the firmware/hypervisor on the new cpu. This means no accesses to %g4, %g5, %g6 since those can't be TLB translated without our trap handlers. In order to achieve this: 1) Change sun4v_init_mondo_queues() so that it can operate in several modes. It can allocate the queues, or install them in the current processor, or both. The boot cpu does both in it's call early on. Later, the boot cpu allocates the sibling cpu queue, starts the sibling cpu, then the sibling cpu loads them in. 2) init_cur_cpu_trap() is changed to take the current_thread_info() as an argument instead of reading %g6 directly on the current cpu. 3) Create a trampoline stack for the sibling cpus. We do our basic kernel calls using this stack, which is locked into the kernel image, then go to our proper thread stack after taking over the trap table. 4) While we are in this delicate startup state, we put 0xdeadbeef into %g4/%g5/%g6 in order to catch accidental accesses. 5) On the final prom_set_trap_table*() call, we put &init_thread_union into %g6. This is a hack to make prom_world(0) work. All that wants to do is restore the %asi register using get_thread_current_ds(). Longer term we should just do the OBP calls to set the trap table by hand just like we do for everything else. This would avoid that silly prom_world(0) issue, then we can remove the init_thread_union hack. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Use prom_startcpu_cpuid() on SUN4V instead of prom_startcpu(). We should really test for "SUNW,start-cpu-by-cpuid" presence and use it if present even on SUN4U. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
That now gets done as a side effect of taking over the trap table from OBP. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Writes by privileged code are not allowed. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
We have to use bootmem during init_IRQ and page alloc for sibling cpu calls. Also, fix incorrect hypervisor call return value checks in the hypervisor SMP cpu mondo send code. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
We do this right after we take over the trap table from OBP. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Function goes in %o5, args go in %o0 --> %o5. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Technically the hypervisor call supports sending in a list of all cpus to get the cross-call, but I only pass in one cpu at a time for now. The multi-cpu support is there, just ifdef'd out so it's easy to enable or delete it later. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
And more consistently check cheetah{,_plus} instead of assuming anything not spitfire is cheetah{,_plus}. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
On uniprocessor, it's always zero for optimize that. On SMP, the jmpl to the stub kills the return address stack in the cpu branch prediction logic, so expand the code sequence inline and use a code patching section to fix things up. This also always better and explicit register selection, which will be taken advantage of in a future changeset. The hard_smp_processor_id() function is big, so do not inline it. Fix up tests for Jalapeno to also test for Serrano chips too. These tests want "jbus Ultra-IIIi" cases to match, so that is what we should test for. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
It is totally unnecessary complexity. After we take over the trap table, we handle all PROM tlb misses fully. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
No longer needed now that we no longer have hard-coded alternate global register usage. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
As the RSS grows, grow the TSB in order to reduce the likelyhood of hash collisions and thus poor hit rates in the TSB. This definitely needs some serious tuning. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
UltraSPARC has special sets of global registers which are switched to for certain trap types. There is one set for MMU related traps, one set of Interrupt Vector processing, and another set (called the Alternate globals) for all other trap types. For what seems like forever we've hard coded the values in some of these trap registers. Some examples include: 1) Interrupt Vector global %g6 holds current processors interrupt work struct where received interrupts are managed for IRQ handler dispatch. 2) MMU global %g7 holds the base of the page tables of the currently active address space. 3) Alternate global %g6 held the current_thread_info() value. Such hardcoding has resulted in some serious issues in many areas. There are some code sequences where having another register available would help clean up the implementation. Taking traps such as cross-calls from the OBP firmware requires some trick code sequences wherein we have to save away and restore all of the special sets of global registers when we enter/exit OBP. We were also using the IMMU TSB register on SMP to hold the per-cpu area base address, which doesn't work any longer now that we actually use the TSB facility of the cpu. The implementation is pretty straight forward. One tricky bit is getting the current processor ID as that is different on different cpu variants. We use a stub with a fancy calling convention which we patch at boot time. The calling convention is that the stub is branched to and the (PC - 4) to return to is in register %g1. The cpu number is left in %g6. This stub can be invoked by using the __GET_CPUID macro. We use an array of per-cpu trap state to store the current thread and physical address of the current address space's page tables. The TRAP_LOAD_THREAD_REG loads %g6 with the current thread from this table, it uses __GET_CPUID and also clobbers %g1. TRAP_LOAD_IRQ_WORK is used by the interrupt vector processing to load the current processor's IRQ software state into %g6. It also uses __GET_CPUID and clobbers %g1. Finally, TRAP_LOAD_PGD_PHYS loads the physical address base of the current address space's page tables into %g7, it clobbers %g1 and uses __GET_CPUID. Many refinements are possible, as well as some tuning, with this stuff in place. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
We now use the TSB hardware assist features of the UltraSPARC MMUs. SMP is currently knowingly broken, we need to find another place to store the per-cpu base pointers. We hid them away in the TSB base register, and that obviously will not work any more :-) Another known broken case is non-8KB base page size. Also noticed that flush_tlb_all() is not referenced anywhere, only the internal __flush_tlb_all() (local cpu only) is used by the sparc64 port, so we can get rid of flush_tlb_all(). The kernel gets it's own 8KB TSB (swapper_tsb) and each address space gets it's own private 8K TSB. Later we can add code to dynamically increase the size of per-process TSB as the RSS grows. An 8KB TSB is good enough for up to about a 4MB RSS, after which the TSB starts to incur many capacity and conflict misses. We even accumulate OBP translations into the kernel TSB. Another area for refinement is large page size support. We could use a secondary address space TSB to handle those. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 2月, 2006 1 次提交
-
-
由 David S. Miller 提交于
The change to kernel/sched.c's init code to use for_each_cpu() requires that the cpu_possible_map be setup much earlier. Set it up via setup_arch(), constrained to NR_CPUS, and later constrain it to max_cpus in smp_prepare_cpus(). This fixes SMP booting on sparc64. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 1月, 2006 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 12 11月, 2005 1 次提交
-
-
由 David S. Miller 提交于
Noticed by Tom 'spot' Callaway. Even on uniprocessor we always reported the number of physical cpus in the system via /proc/cpuinfo. But when this got changed to use num_possible_cpus() it always reads as "1" on uniprocessor. This change was unintentional. So scan the firmware device tree and count the number of cpu nodes, and report that, as we always did. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 11月, 2005 2 次提交
-
-
由 Nick Piggin 提交于
Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce confusion, and make their semantics rigid. Improves efficiency of resched_task and some cpu_idle routines. * In resched_task: - TIF_NEED_RESCHED is only cleared with the task's runqueue lock held, and as we hold it during resched_task, then there is no need for an atomic test and set there. The only other time this should be set is when the task's quantum expires, in the timer interrupt - this is protected against because the rq lock is irq-safe. - If TIF_NEED_RESCHED is set, then we don't need to do anything. It won't get unset until the task get's schedule()d off. - If we are running on the same CPU as the task we resched, then set TIF_NEED_RESCHED and no further action is required. - If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set after TIF_NEED_RESCHED has been set, then we need to send an IPI. Using these rules, we are able to remove the test and set operation in resched_task, and make clear the previously vague semantics of POLLING_NRFLAG. * In idle routines: - Enter cpu_idle with preempt disabled. When the need_resched() condition becomes true, explicitly call schedule(). This makes things a bit clearer (IMO), but haven't updated all architectures yet. - Many do a test and clear of TIF_NEED_RESCHED for some reason. According to the resched_task rules, this isn't needed (and actually breaks the assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock held). So remove that. Generally one less locked memory op when switching to the idle thread. - Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner most polling idle loops. The above resched_task semantics allow it to be set until before the last time need_resched() is checked before going into a halt requiring interrupt wakeup. Many idle routines simply never enter such a halt, and so POLLING_NRFLAG can be always left set, completely eliminating resched IPIs when rescheduling the idle task. POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs. Signed-off-by: NNick Piggin <npiggin@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Con Kolivas <kernel@kolivas.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Nick Piggin 提交于
Run idle threads with preempt disabled. Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()). How did it ever work before? Might fix the CPU hotplugging hang which Nigel Cunningham noted. We think the bug hits if the idle thread is preempted after checking need_resched() and before going to sleep, then the CPU offlined. After calling stop_machine_run, the CPU eventually returns from preemption and into the idle thread and goes to sleep. The CPU will continue executing previous idle and have no chance to call play_dead. By disabling preemption until we are ready to explicitly schedule, this bug is fixed and the idle threads generally become more robust. From: alexs <ashepard@u.washington.edu> PPC build fix From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp> MIPS build fix Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NYoichi Yuasa <yuasa@hh.iij4u.or.jp> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 08 11月, 2005 2 次提交
-
-
由 David S. Miller 提交于
It isn't needed any longer, as noted by Hugh Dickins. We still need the flush routines, due to the one remaining call site in hugetlb_prefault_arch_hook(). That can be eliminated at some later point, however. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hugh Dickins 提交于
sparc64 is unique among architectures in taking the page_table_lock in its context switch (well, cris does too, but erroneously, and it's not yet SMP anyway). This seems to be a private affair between switch_mm and activate_mm, using page_table_lock as a per-mm lock, without any relation to its uses elsewhere. That's fine, but comment it as such; and unlock sooner in switch_mm, more like in activate_mm (preemption is disabled here). There is a block of "if (0)"ed code in smp_flush_tlb_pending which would have liked to rely on the page_table_lock, in switch_mm and elsewhere; but its comment explains how dup_mmap's flush_tlb_mm defeated it. And though that could have been changed at any time over the past few years, now the chance vanishes as we push the page_table_lock downwards, and perhaps split it per page table page. Just delete that block of code. Which leaves the mysterious spin_unlock_wait(&oldmm->page_table_lock) in kernel/fork.c copy_mm. Textual analysis (supported by Nick Piggin) suggests that the comment was written by DaveM, and that it relates to the defeated approach in the sparc64 smp_flush_tlb_pending. Just delete this block too. Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 10月, 2005 1 次提交
-
-
由 David S. Miller 提交于
Doing a "SUNW,stop-self" firmware call on the other cpus is not the correct thing to do when dropping into the firmware for a halt, reboot, or power-off. For now, just do nothing to quiet the other cpus, as the system should be quiescent enough. Later we may decide to implement smp_send_stop() like the other SMP platforms do. Based upon a report from Christopher Zimmermann. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 9月, 2005 1 次提交
-
-
由 David S. Miller 提交于
At boot time, determine the D-cache, I-cache and E-cache size and line-size. Use them in cache flushes when appropriate. This change was motivated by discovering that the D-cache on UltraSparc-IIIi and later are 64K not 32K, and the flushes done by the Cheetah error handlers were assuming a 32K size. There are still some pieces of code that are hard coding things and will need to be fixed up at some point. While we're here, fix the D-cache and I-cache parity error handlers to run with interrupts disabled, and when the trap occurs at trap level > 1 log the event via a counter displayed in /proc/cpuinfo. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 8月, 2005 1 次提交
-
-
由 David S. Miller 提交于
It appears that a memory barrier soon after a mispredicted branch, not just in the delay slot, can cause the hang condition of this cpu errata. So move them out-of-line, and explicitly put them into a "branch always, predict taken" delay slot which should fully kill this problem. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 7月, 2005 1 次提交
-
-
由 David S. Miller 提交于
These two bits were accesses non-atomically from assembler code. So, in order to eliminate any potential races resulting from that, move these pieces of state into two bytes elsewhere in struct thread_info. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 7月, 2005 1 次提交
-
-
由 Andrew Morton 提交于
arch/sparc64/kernel/smp.c:48: error: parse error before "__attribute__" arch/sparc64/kernel/smp.c:49: error: parse error before "__attribute__" Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 7月, 2005 1 次提交
-
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-