1. 23 March 2006, 1 commit
    • A
      [PATCH] more for_each_cpu() conversions · 394e3902
      Committed by Andrew Morton
      When we stop allocating percpu memory for not-possible CPUs we must not touch
      the percpu data for not-possible CPUs at all.  The correct way of doing this
      is to test cpu_possible() or to use for_each_cpu().
      
      This patch is a kernel-wide sweep of all instances of NR_CPUS.  I found very
      few instances of this bug, if any.  But the patch converts lots of open-coded
      tests to use the preferred helper macros.
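      
      As a hedged illustration of the conversion pattern (the variable and
      helper names here are invented for the example, not taken from the
      patch; in the 2006 kernel for_each_cpu() walked cpu_possible_map):
      
          /* before: open-coded loop walks per-cpu data of impossible cpus */
          for (i = 0; i < NR_CPUS; i++)
                  total += per_cpu(my_counter, i);
      
          /* after: visit only cpus that can ever be brought up */
          for_each_cpu(i)
                  total += per_cpu(my_counter, i);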
      
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Acked-by: Kyle McMartin <kyle@parisc-linux.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Christian Zankel <chris@zankel.net>
      Cc: Philippe Elie <phil.el@wanadoo.fr>
      Cc: Nathan Scott <nathans@sgi.com>
      Cc: Jens Axboe <axboe@suse.de>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      394e3902
  2. 20 March 2006, 26 commits
    • D
      [SPARC64]: Add SMT scheduling support for Niagara. · 8935dced
      Committed by David S. Miller
      The mapping is a simple "(cpuid >> 2) == core" for now.
      Later we'll add more sophisticated code that will walk
      the sun4v machine description and figure this out from
      there.
      
      We should also add core mappings for jaguar and panther
      processors.
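      
      A minimal sketch of that mapping (the helper name is invented for
      illustration):
      
          /* Niagara: four hardware strands share each physical core */
          static inline int niagara_core_of(int cpuid)
          {
                  return cpuid >> 2;
          }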
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8935dced
    • D
      [SPARC64]: Fix new context version SMP handling. · ee29074d
      Committed by David S. Miller
      Don't piggy back the SMP receive signal code to do the
      context version change handling.
      
      Instead allocate another fixed PIL number for this
      asynchronous cross-call.  We can't use smp_call_function()
      because this thing is invoked with interrupts disabled
      and a few spinlocks held.
      
      Also, fix smp_call_function_mask() to count "cpus" correctly.
      There is no guarantee that the local cpu is in the mask, yet that is
      exactly what this code was assuming.
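      
      A hedged sketch of the corrected counting, using the 2006-era cpumask
      API (illustrative, not the exact diff):
      
          static void send_to_mask_sketch(cpumask_t mask)
          {
                  int cpus;
      
                  /* don't assume the local cpu is in the mask; remove it
                   * if present, then count the real targets */
                  cpu_clear(smp_processor_id(), mask);
                  cpus = cpus_weight(mask);
                  if (cpus)
                          ; /* ... deliver the cross-call to "mask" ... */
          }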
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee29074d
    • D
      [SPARC64]: More SUN4V cpu mondo bug fixing. · 3cab0c3e
      Committed by David S. Miller
      This cpu mondo sending interface isn't all that easy to
      use correctly...
      
      We were clearing out the wrong bits from the "mask" after getting
      something other than EOK from the hypervisor.
      
      It turns out the hypervisor can just be resent the same cpu_list[]
      array, with the 0xffff "done" entries still in there, and it will do
      the right thing.
      
      So don't update or try to rebuild the cpu_list[] array to condense it.
      
      This requires the "forward_progress" check to be done slightly
      differently, but this new scheme is less bug prone than what we were
      doing before.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3cab0c3e
    • D
      [SPARC64]: Fix bugs in SUN4V cpu mondo dispatch. · b830ab66
      Committed by David S. Miller
      There were several bugs in the SUN4V cpu mondo dispatch code.
      
      In fact, if we ever got a EWOULDBLOCK or other error from
      the hypervisor call, we'd potentially send a cpu mondo multiple
      times to the same cpu and even worse we could loop until the
      timeout resending the same mondo over and over to such cpus.
      
      So let's bulletproof this thing as follows:
      
      1) Implement cpu_mondo_send() and cpu_state() hypervisor calls
         in arch/sparc64/kernel/entry.S, add prototypes to asm/hypervisor.h
      
      2) Don't build and update the cpulist using inline functions, this
         was causing the cpu mask to not get updated in the caller.
      
      3) Disable interrupts during the entire mondo send, otherwise our
         cpu list and/or mondo block could get overwritten if we take
         an interrupt and do a cpu mondo send on the current cpu.
      
      4) Check for all possible error return types from the cpu_mondo_send()
         hypervisor call.  In particular:
      
         HV_EOK) Our work is done, all cpus have received the mondo.
         HV_CPUERROR) One or more of the cpus in the cpu list we passed
                      to the hypervisor are in error state.  Use cpu_state()
                      calls over the entries in the cpu list to see which
                      ones.  Record them in "error_mask" and report this
                      after we are done sending the mondo to cpus which are
                      not in error state.
         HV_EWOULDBLOCK) We need to keep trying.
      
         Any other error we consider fatal, we report the event and exit
         immediately.
      
      5) We only timeout if forward progress is not made.  Forward progress
         is defined as having at least one cpu get the mondo successfully
         in a given cpu_mondo_send() call.  Otherwise we bump a counter
         and delay a little.  If the counter hits a limit, we signal an
         error and report the event.
      
      Also, smp_call_function_mask() error handling reported the number
      of cpus incorrectly; fix that too.
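      
      A hedged sketch of how points 4 and 5 fit together (the helper name,
      the retry limit, and the bookkeeping variables are illustrative, not
      the exact code):
      
          int retries = 0, delivered = 0;
      
          do {
                  int prev = delivered;
      
                  status = hv_cpu_mondo_send_sketch(cnt, cpu_list_pa);
                  if (status == HV_EOK)
                          break;                  /* every cpu got the mondo */
                  if (status != HV_EWOULDBLOCK && status != HV_CPUERROR)
                          goto fatal;             /* report event, exit */
                  /* ... recount the 0xffff "done" slots into "delivered",
                   * record error-state cpus in "error_mask" ... */
                  if (delivered == prev) {
                          if (++retries > RETRY_LIMIT)
                                  goto timeout;   /* no forward progress */
                          udelay(2 * cnt);        /* brief delay, retry */
                  }
          } while (1);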
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b830ab66
    • D
      [SPARC64]: Fix bugs in SMP TLB context version expiration handling. · aac0aadf
      Committed by David S. Miller
      1) We must flush the TLB, duh.
      
      2) Even if the sw context was seen to be valid, the local cpu's
         hw context can be out of date, so reload it unconditionally.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aac0aadf
    • D
      [SPARC64]: Report mondo error correctly in hypervisor_xcall_deliver(). · 6cc80cfa
      Committed by David S. Miller
      It's in "arg0" not "func".
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6cc80cfa
    • D
      [SPARC64]: Fix TLB context allocation with SMT style shared TLBs. · a0663a79
      Committed by David S. Miller
      The context allocation scheme we use depends upon there being a 1<-->1
      mapping from cpu to physical TLB for correctness.  Chips like Niagara
      break this assumption.
      
      So what we do is notify all cpus with a cross call when the context
      version number changes, and if necessary this makes them allocate
      a valid context for the address space they are running in at the time.
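      
      A hedged sketch of the receiving side (close in spirit to the sparc64
      handler, simplified for illustration):
      
          static void new_ctx_version_client_sketch(void)
          {
                  struct mm_struct *mm = current->active_mm;
                  unsigned long flags;
      
                  if (unlikely(!mm || (mm == &init_mm)))
                          return;
      
                  spin_lock_irqsave(&mm->context.lock, flags);
                  if (unlikely(!CTX_VALID(mm->context)))
                          get_new_mmu_context(mm);
                  spin_unlock_irqrestore(&mm->context.lock, flags);
      
                  load_secondary_context(mm);     /* reload hw context */
          }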
      
      Stress tested with make -j1024, make -j2048, and make -j4096 kernel
      builds on a 32-strand, 8-core T2000 with 16GB of RAM.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a0663a79
    • D
      [SPARC64]: Kill cpudata->idle_volume. · 1bd0cd74
      Committed by David S. Miller
      Set, but never used.
      
      We used to use this for dynamic IRQ retargetting, but that
      code died a long time ago.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1bd0cd74
    • D
      [SPARC64]: Get SUN4V SMP working. · 72aff53f
      Committed by David S. Miller
      The sibling cpu bringup is extremely fragile.  We can only
      perform the most basic calls until we take over the trap
      table from the firmware/hypervisor on the new cpu.
      
      This means no accesses to %g4, %g5, %g6 since those can't be
      TLB translated without our trap handlers.
      
      In order to achieve this:
      
      1) Change sun4v_init_mondo_queues() so that it can operate in
         several modes.
      
         It can allocate the queues, or install them in the current
         processor, or both.
      
         The boot cpu does both in its call early on.
      
         Later, the boot cpu allocates the sibling cpu queue, starts
         the sibling cpu, then the sibling cpu loads them in.
      
      2) init_cur_cpu_trap() is changed to take the current_thread_info()
         as an argument instead of reading %g6 directly on the current
         cpu.
      
      3) Create a trampoline stack for the sibling cpus.  We do our basic
         kernel calls using this stack, which is locked into the kernel
         image, then go to our proper thread stack after taking over the
         trap table.
      
      4) While we are in this delicate startup state, we put 0xdeadbeef
         into %g4/%g5/%g6 in order to catch accidental accesses.
      
      5) On the final prom_set_trap_table*() call, we put &init_thread_union
         into %g6.  This is a hack to make prom_world(0) work.  All that
         wants to do is restore the %asi register using
         get_thread_current_ds().
      
      Longer term we should just do the OBP calls to set the trap table by
      hand just like we do for everything else.  This would avoid that silly
      prom_world(0) issue, and then we can remove the init_thread_union hack.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      72aff53f
    • D
      [SPARC64]: Add prom_{start,stop}cpu_cpuid(). · 7890f794
      Committed by David S. Miller
      Use prom_startcpu_cpuid() on SUN4V instead of prom_startcpu().
      
      We should really test for "SUNW,start-cpu-by-cpuid" presence
      and use it if present even on SUN4U.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7890f794
    • D
      f03b8a54
    • D
      [SPARC64]: Kill sun4v_register_fault_status() on SMP. · 4a07e646
      Committed by David S. Miller
      That now gets done as a side effect of taking over the
      trap table from OBP.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4a07e646
    • D
      [SPARC64]: Do not try to synchronize %stick registers on SUN4V. · 02fead75
      Committed by David S. Miller
      Writes by privileged code are not allowed.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      02fead75
    • D
      [SPARC64]: Fix mondo queue allocations. · b5a37e96
      Committed by David S. Miller
      We have to use bootmem during init_IRQ and page alloc
      for sibling cpu calls.
      
      Also, fix incorrect hypervisor call return value
      checks in the hypervisor SMP cpu mondo send code.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b5a37e96
    • D
      [SPARC64]: Register kernel TSB with hypervisor. · 490384e7
      Committed by David S. Miller
      We do this right after we take over the trap table from OBP.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      490384e7
    • D
      [SPARC64]: Fix hypervisor call arg passing. · 164c220f
      Committed by David S. Miller
      Function goes in %o5, args go in %o0 --> %o4.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      164c220f
    • D
      [SPARC64]: Sun4v cross-call sending support. · 1d2f1f90
      Committed by David S. Miller
      Technically the hypervisor call supports sending in a list
      of all cpus to get the cross-call, but I only pass in one
      cpu at a time for now.
      
      The multi-cpu support is there, just ifdef'd out so it's easy to
      enable or delete it later.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1d2f1f90
    • D
      [SPARC64]: Add some hypervisor tlb_type checks. · a43fe0e7
      Committed by David S. Miller
      And more consistently check cheetah{,_plus} instead
      of assuming anything not spitfire is cheetah{,_plus}.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a43fe0e7
    • D
      [SPARC64]: Refine code sequences to get the cpu id. · 92704a1c
      Committed by David S. Miller
      On uniprocessor, it's always zero, so optimize for that.
      
      On SMP, the jmpl to the stub kills the return address stack in the cpu
      branch prediction logic, so expand the code sequence inline and use a
      code patching section to fix things up.  This also allows better and
      explicit register selection, which will be taken advantage of in a
      future changeset.
      
      The hard_smp_processor_id() function is big, so do not inline it.
      
      Fix up tests for Jalapeno to also test for Serrano chips too.  These
      tests want "jbus Ultra-IIIi" cases to match, so that is what we should
      test for.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92704a1c
    • D
      [SPARC64]: Kill PROM locked TLB entry preservation code. · 3487d1d4
      Committed by David S. Miller
      It is totally unnecessary complexity.  After we take over
      the trap table, we handle all PROM tlb misses fully.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3487d1d4
    • D
      [SPARC64]: Kill {save,restore}_alternate_globals() · 96c6e0d8
      Committed by David S. Miller
      No longer needed now that we no longer have hard-coded
      alternate global register usage.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      96c6e0d8
    • D
      [SPARC64]: Dynamically grow TSB in response to RSS growth. · bd40791e
      Committed by David S. Miller
      As the RSS grows, grow the TSB in order to reduce the likelihood
      of hash collisions and thus poor hit rates in the TSB.
      
      This definitely needs some serious tuning.
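      
      A hedged sketch of the sizing idea (the thresholds are invented; the
      real policy is exactly what needs the tuning mentioned above):
      
          static unsigned long tsb_size_for_rss_sketch(unsigned long rss_pages)
          {
                  unsigned long bytes = 8192;     /* start at one 8KB TSB */
      
                  /* each 16-byte entry maps one 8KB base page: double the
                   * TSB while RSS exceeds its entry count, up to a 1MB cap */
                  while ((bytes / 16) < rss_pages && bytes < (1UL << 20))
                          bytes <<= 1;
                  return bytes;
          }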
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd40791e
    • D
      [SPARC64]: Eliminate all usage of hard-coded trap globals. · 56fb4df6
      Committed by David S. Miller
      UltraSPARC has special sets of global registers which are switched to
      for certain trap types.  There is one set for MMU related traps, one
      set for Interrupt Vector processing, and another set (called the
      Alternate globals) for all other trap types.
      
      For what seems like forever we've hard coded the values in some of
      these trap registers.  Some examples include:
      
      1) Interrupt Vector global %g6 holds the current processor's interrupt
         work struct where received interrupts are managed for IRQ handler
         dispatch.
      
      2) MMU global %g7 holds the base of the page tables of the currently
         active address space.
      
      3) Alternate global %g6 held the current_thread_info() value.
      
      Such hardcoding has resulted in some serious issues in many areas.
      There are some code sequences where having another register available
      would help clean up the implementation.  Taking traps such as
      cross-calls from the OBP firmware requires some trick code sequences
      wherein we have to save away and restore all of the special sets of
      global registers when we enter/exit OBP.
      
      We were also using the IMMU TSB register on SMP to hold the per-cpu
      area base address, which doesn't work any longer now that we actually
      use the TSB facility of the cpu.
      
      The implementation is pretty straightforward.  One tricky bit is
      getting the current processor ID as that is different on different cpu
      variants.  We use a stub with a fancy calling convention which we
      patch at boot time.  The calling convention is that the stub is
      branched to and the (PC - 4) to return to is in register %g1.  The cpu
      number is left in %g6.  This stub can be invoked by using the
      __GET_CPUID macro.
      
      We use an array of per-cpu trap state to store the current thread and
      physical address of the current address space's page tables.  The
      TRAP_LOAD_THREAD_REG loads %g6 with the current thread from this
      table, it uses __GET_CPUID and also clobbers %g1.
      
      TRAP_LOAD_IRQ_WORK is used by the interrupt vector processing to load
      the current processor's IRQ software state into %g6.  It also uses
      __GET_CPUID and clobbers %g1.
      
      Finally, TRAP_LOAD_PGD_PHYS loads the physical address base of the
      current address space's page tables into %g7, it clobbers %g1 and uses
      __GET_CPUID.
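      
      A hedged sketch of the per-cpu trap state those macros index (the
      struct and field names here are illustrative):
      
          struct trap_per_cpu_sketch {
                  struct thread_info *thread;     /* %g6, TRAP_LOAD_THREAD_REG */
                  unsigned long pgd_paddr;        /* %g7, TRAP_LOAD_PGD_PHYS */
                  unsigned long irq_worklist;     /* %g6, TRAP_LOAD_IRQ_WORK */
          };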
      
      Many refinements are possible, as well as some tuning, with this stuff
      in place.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      56fb4df6
    • D
      [SPARC64]: Move away from virtual page tables, part 1. · 74bf4312
      Committed by David S. Miller
      We now use the TSB hardware assist features of the UltraSPARC
      MMUs.
      
      SMP is currently knowingly broken, we need to find another place
      to store the per-cpu base pointers.  We hid them away in the TSB
      base register, and that obviously will not work any more :-)
      
      Another known broken case is non-8KB base page size.
      
      Also noticed that flush_tlb_all() is not referenced anywhere, only
      the internal __flush_tlb_all() (local cpu only) is used by the
      sparc64 port, so we can get rid of flush_tlb_all().
      
      The kernel gets its own 8KB TSB (swapper_tsb) and each address space
      gets its own private 8K TSB.  Later we can add code to dynamically
      increase the size of per-process TSB as the RSS grows.  An 8KB TSB is
      good enough for up to about a 4MB RSS, after which the TSB starts to
      incur many capacity and conflict misses.
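      
      (The 4MB figure follows directly: an 8KB TSB holds 512 sixteen-byte
      entries, and with an 8KB base page size those 512 entries map
      512 × 8KB = 4MB before capacity misses take over.)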
      
      We even accumulate OBP translations into the kernel TSB.
      
      Another area for refinement is large page size support.  We could use
      a secondary address space TSB to handle those.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      74bf4312
  3. 27 February 2006, 1 commit
  4. 13 January 2006, 1 commit
  5. 12 November 2005, 1 commit
  6. 09 November 2005, 2 commits
    • N
      [PATCH] sched: resched and cpu_idle rework · 64c7c8f8
      Committed by Nick Piggin
      Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce
      confusion, and make their semantics rigid.  Improves efficiency of
      resched_task and some cpu_idle routines.
      
      * In resched_task:
      - TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
        and since we hold it during resched_task, there is no need for an
        atomic test-and-set there.  The only other time this should be set is
        when the task's quantum expires, in the timer interrupt - this is
        protected against because the rq lock is irq-safe.
      
      - If TIF_NEED_RESCHED is set, then we don't need to do anything. It
        won't get unset until the task gets schedule()d off.
      
      - If we are running on the same CPU as the task we resched, then set
        TIF_NEED_RESCHED and no further action is required.
      
      - If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
        after TIF_NEED_RESCHED has been set, then we need to send an IPI.
      
      Using these rules, we are able to remove the test and set operation in
      resched_task, and make clear the previously vague semantics of
      POLLING_NRFLAG.
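      
      Those rules condense into roughly the following (a hedged sketch, not
      the exact scheduler code):
      
          static void resched_task_sketch(struct task_struct *p)
          {
                  /* caller holds this task's runqueue lock */
                  if (test_tsk_thread_flag(p, TIF_NEED_RESCHED))
                          return;         /* already pending, nothing to do */
      
                  set_tsk_thread_flag(p, TIF_NEED_RESCHED);
      
                  if (task_cpu(p) == smp_processor_id())
                          return;         /* local cpu notices on its own */
      
                  /* remote cpu: IPI only if it is not polling the flag */
                  if (!test_tsk_thread_flag(p, TIF_POLLING_NRFLAG))
                          smp_send_reschedule(task_cpu(p));
          }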
      
      * In idle routines:
      - Enter cpu_idle with preempt disabled. When the need_resched() condition
        becomes true, explicitly call schedule(). This makes things a bit clearer
        (IMO), but I haven't updated all architectures yet.
      
      - Many do a test and clear of TIF_NEED_RESCHED for some reason. According
        to the resched_task rules, this isn't needed (and actually breaks the
        assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
        held). So remove that. Generally one less locked memory op when switching
        to the idle thread.
      
      - Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner
        most polling idle loops. The above resched_task semantics allow it to be
        set until before the last time need_resched() is checked before going into
        a halt requiring interrupt wakeup.
      
        Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
        can be always left set, completely eliminating resched IPIs when rescheduling
        the idle task.
      
        POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.
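      
      An idle loop written to these rules looks roughly like this (hedged
      sketch):
      
          void cpu_idle_sketch(void)
          {
                  /* entered with preemption disabled, per the above */
                  set_thread_flag(TIF_POLLING_NRFLAG);
      
                  while (1) {
                          while (!need_resched())
                                  cpu_relax();    /* poll, never halt */
                          preempt_enable_no_resched();
                          schedule();
                          preempt_disable();
                  }
          }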
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Con Kolivas <kernel@kolivas.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      64c7c8f8
    • N
      [PATCH] sched: disable preempt in idle tasks · 5bfb5d69
      Committed by Nick Piggin
      Run idle threads with preempt disabled.
      
      Also corrected a bug in arm26's cpu_idle (make it actually call schedule()).
      How did it ever work before?
      
      Might fix the CPU hotplugging hang which Nigel Cunningham noted.
      
      We think the bug hits if the idle thread is preempted after checking
      need_resched() and before going to sleep, then the CPU offlined.
      
      After calling stop_machine_run, the CPU eventually returns from preemption and
      into the idle thread and goes to sleep.  The CPU will continue executing
      the previous idle loop and have no chance to call play_dead.
      
      By disabling preemption until we are ready to explicitly schedule, this bug is
      fixed and the idle threads generally become more robust.
      
      From: alexs <ashepard@u.washington.edu>
      
        PPC build fix
      
      From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
      
        MIPS build fix
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      5bfb5d69
  7. 08 November 2005, 2 commits
    • D
      [SPARC64] mm: Do not flush TLB mm in tlb_finish_mmu() · 62dbec78
      Committed by David S. Miller
      It isn't needed any longer, as noted by Hugh Dickins.
      
      We still need the flush routines, due to the one remaining
      call site in hugetlb_prefault_arch_hook().  That can be
      eliminated at some later point, however.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      62dbec78
    • H
      [SPARC64] mm: context switch ptlock · dedeb002
      Committed by Hugh Dickins
      sparc64 is unique among architectures in taking the page_table_lock in
      its context switch (well, cris does too, but erroneously, and it's not
      yet SMP anyway).
      
      This seems to be a private affair between switch_mm and activate_mm,
      using page_table_lock as a per-mm lock, without any relation to its uses
      elsewhere.  That's fine, but comment it as such; and unlock sooner in
      switch_mm, more like in activate_mm (preemption is disabled here).
      
      There is a block of "if (0)"ed code in smp_flush_tlb_pending which would
      have liked to rely on the page_table_lock, in switch_mm and elsewhere;
      but its comment explains how dup_mmap's flush_tlb_mm defeated it.  And
      though that could have been changed at any time over the past few years,
      now the chance vanishes as we push the page_table_lock downwards, and
      perhaps split it per page table page.  Just delete that block of code.
      
      Which leaves the mysterious spin_unlock_wait(&oldmm->page_table_lock)
      in kernel/fork.c copy_mm.  Textual analysis (supported by Nick Piggin)
      suggests that the comment was written by DaveM, and that it relates to
      the defeated approach in the sparc64 smp_flush_tlb_pending.  Just delete
      this block too.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dedeb002
  8. 15 October 2005, 1 commit
    • D
      [SPARC64]: Fix powering off on SMP. · b4d1b825
      Committed by David S. Miller
      Doing a "SUNW,stop-self" firmware call on the other cpus is not the
      correct thing to do when dropping into the firmware for a halt,
      reboot, or power-off.
      
      For now, just do nothing to quiet the other cpus, as the system should
      be quiescent enough.  Later we may decide to implement smp_send_stop()
      like the other SMP platforms do.
      
      Based upon a report from Christopher Zimmermann.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b4d1b825
  9. 26 September 2005, 1 commit
    • D
      [SPARC64]: Probe D/I/E-cache config and use. · 80dc0d6b
      Committed by David S. Miller
      At boot time, determine the D-cache, I-cache and E-cache size and
      line-size.  Use them in cache flushes when appropriate.
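      
      A hedged sketch of the probing step: prom_getintdefault() is the
      sparc64 firmware accessor, the property names are the standard OF
      cache properties, and the fallback values here are invented:
      
          static unsigned long dcache_size, dcache_line_size;
          static unsigned long icache_size, ecache_size;
      
          static void probe_cache_geometry_sketch(int cpu_node)
          {
                  dcache_size = prom_getintdefault(cpu_node, "dcache-size",
                                                   16 * 1024);
                  dcache_line_size = prom_getintdefault(cpu_node,
                                                        "dcache-line-size", 32);
                  icache_size = prom_getintdefault(cpu_node, "icache-size",
                                                   16 * 1024);
                  ecache_size = prom_getintdefault(cpu_node, "ecache-size",
                                                   4 * 1024 * 1024);
          }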
      
      This change was motivated by discovering that the D-cache on
      UltraSparc-IIIi and later is 64K, not 32K, and the flushes done by the
      Cheetah error handlers were assuming a 32K size.
      
      There are still some pieces of code that are hard coding things and
      will need to be fixed up at some point.
      
      While we're here, fix the D-cache and I-cache parity error handlers
      to run with interrupts disabled, and when the trap occurs at trap
      level > 1 log the event via a counter displayed in /proc/cpuinfo.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      80dc0d6b
  10. 30 August 2005, 1 commit
  11. 25 July 2005, 1 commit
  12. 13 July 2005, 1 commit
  13. 11 July 2005, 1 commit