1. 12 Jan, 2010 1 commit
  2. 25 Nov, 2009 1 commit
  3. 24 Nov, 2009 2 commits
    • sh: Minor optimisations to FPU handling · d3ea9fa0
      Stuart Menefy authored
      A number of small optimisations to FPU handling, in particular:
      
       - move the task USEDFPU flag from the thread_info flags field (which
         is accessed asynchronously to the thread) to a new status field,
         which is only accessed by the thread itself. This allows locking to
         be removed in most cases, or can be reduced to a preempt_lock().
         This mimics the i386 behaviour.
      
       - move the modification of regs->sr and thread_info->status flags out
         of save_fpu() to __unlazy_fpu(). This gives the compiler a better
         chance to optimise things, as well as making save_fpu() symmetrical
         with restore_fpu() and init_fpu().
      
       - implement prepare_to_copy(), so that when creating a thread, we can
         unlazy the FPU prior to copying the thread data structures.
      
      Also make sure that the FPU is disabled while in the kernel, in
      particular while booting, and for newly created kernel threads.
      
      In a very artificial benchmark, the execution time for 2500000
      context switches was reduced from 50 to 45 seconds.
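
      As a rough illustration of the second point, a minimal sketch of the
      save path with the bookkeeping hoisted out of save_fpu() (TS_USEDFPU
      and the helper names here are assumptions based on the description
      above, not the literal sh code):

         static inline void __unlazy_fpu(struct task_struct *tsk,
                                         struct pt_regs *regs)
         {
                 if (task_thread_info(tsk)->status & TS_USEDFPU) {
                         task_thread_info(tsk)->status &= ~TS_USEDFPU;
                         save_fpu(tsk);     /* spill FPU regs into the thread struct */
                         release_fpu(regs); /* set sr.FD so the next FPU insn traps */
                 }
         }

      Because the status field is only ever touched by its own thread, this
      path runs without the locking the thread_info flags variant needed.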
      Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • sh: add sleazy FPU optimization · a0458b07
      Giuseppe CAVALLARO authored
      sh port of the sLeAZY-fpu feature currently implemented for some architectures
      such as i386.
      
      Right now the SH kernel has 100% lazy FPU behaviour.
      This is of course great for applications that have very sporadic or no FPU use.
      However, for very frequent FPU users you take an extra trap on every
      context switch.
      The patch below adds a simple heuristic to this code: after 5 consecutive
      context switches of FPU use, the lazy behaviour is disabled and the context
      gets restored on every context switch.
      After 256 switches, this is reset and the 100% lazy behaviour is restored.
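
      A minimal sketch of the heuristic, assuming an i386-style fpu_counter
      field in the task structure (the field and helper names here are
      assumptions):

         /* context-switch path: preload FPU state for habitual users */
         if (next->fpu_counter > 5)
                 prefetch(&next->thread.fpu);

         /* FPU-disabled trap path: count consecutive switches with FPU
          * use; the counter is narrow, so it wraps around and the fully
          * lazy behaviour periodically gets another chance */
         tsk->fpu_counter++;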
      
      Tests with LMbench showed no regression.
      I saw a little improvement due to the prefetching (~2%).
      
      The tests below also show that, with this sLeazy patch, the number of
      FPU exceptions is indeed reduced.
      To test this, I hacked the lat_ctx LMbench benchmark to use the FPU a little more.
      
         sLeazy implementation
         ===========================================
         switch_to calls             |  79326
         sleazy calls                |  42577
         do_fpu_state_restore calls  |  59232
         restore_fpu calls           |  59032

         Exceptions: 0x800 (FPU disabled):  16604

         100% lazy (default implementation)
         ===========================================
         switch_to calls             |  79690
         do_fpu_state_restore calls  |  53299
         restore_fpu calls           |  53101

         Exceptions: 0x800 (FPU disabled):  53273
      Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  4. 27 Oct, 2009 1 commit
  5. 24 Aug, 2009 1 commit
  6. 11 Jul, 2009 1 commit
    • sh: Mark __switch_to() as __notrace_funcgraph · 7816fecd
      Matt Fleming authored
      Annotate __switch_to() so that the function graph tracer does not try to
      trace it. Use __notrace_funcgraph, as opposed to notrace, so that other
      tracers can continue to trace __switch_to().
      
      The reason that we don't want to trace __switch_to() with the function
      graph tracer is because of how the return address stack in task_struct
      is implemented. When we enter __switch_to we store the real return
      address on prev's ret_stack. When we return from __switch_to() we've
      patched the return address on the kernel stack to be
      return_to_handler. Calling return_to_handler we do,
      
             -> ftrace_return_to_handler()
       	  -> ftrace_pop_return_trace()
      
      Which tries to pop the real return address from current->ret_stack. The
      problem being that we stored the return address on prev->ret_stack, but
      current now points to next, and next->ret_stack doesn't contain the
      correct return address (and is possibly even empty).
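
      The annotation itself is a one-line change to the function definition;
      roughly (body elided):

         __notrace_funcgraph struct task_struct *
         __switch_to(struct task_struct *prev, struct task_struct *next)
         {
                 /* usual register switching; only the function graph
                  * tracer is excluded, other tracers still see this */
         }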
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  7. 19 Jun, 2009 1 commit
  8. 18 Jun, 2009 1 commit
  9. 08 May, 2009 1 commit
  10. 04 Apr, 2009 1 commit
    • sh: Fix up DSP context save/restore. · 01ab1039
      Michael Trimarchi authored
      There were a number of issues with the DSP context save/restore code,
      mostly left-over relics from when it was introduced on SH3-DSP with
      little follow-up testing, resulting in things like task_pt_dspregs()
      referencing incorrect state on the stack.
      
      This follows the MIPS convention of tracking the DSP state in the
      thread_struct and handling the state save/restore in switch_to() and
      finish_arch_switch() respectively. The regset interface is also updated,
      which allows us to finally be rid of task_pt_dspregs() and the special
      cased task_pt_regs().
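
      A minimal sketch of the MIPS-style arrangement described above (the
      macro bodies here are assumptions, not the literal sh code):

         /* save the outgoing task's DSP state at switch time ... */
         #define switch_to(prev, next, last)             \
         do {                                            \
                 if (is_dsp_enabled(prev))               \
                         __save_dsp(prev);               \
                 last = __switch_to(prev, next);         \
         } while (0)

         /* ... and restore the incoming task's state once the switch
          * has completed */
         #define finish_arch_switch(prev)                \
         do {                                            \
                 if (is_dsp_enabled(current))            \
                         __restore_dsp(current);         \
         } while (0)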
      Signed-off-by: Michael Trimarchi <michael@evidence.eu.com>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  11. 03 Apr, 2009 1 commit
  12. 22 Dec, 2008 5 commits
  13. 21 Sep, 2008 2 commits
    • sh: Add FPU registers to regset interface. · e7ab3cd2
      Paul Mundt authored
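      The hookup amounts to one regset table entry of roughly this shape
      (the callback names here are assumptions based on the generic regset
      API, not the literal sh code):

         [REGSET_FPU] = {
                 .core_note_type = NT_PRFPREG,  /* FP note type in core dumps */
                 .n              = sizeof(struct user_fpu_struct) / sizeof(long),
                 .size           = sizeof(long),
                 .align          = sizeof(long),
                 .get            = fpregs_get,
                 .set            = fpregs_set,
         },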
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • sh: Trivial trace_mark() instrumentation for core events. · 3d58695e
      Paul Mundt authored
      This implements trace points across a number of events that are
      deemed interesting:
      
      	- The page fault handler / TLB miss
      	- IPC calls
      	- Kernel thread creation
      
      The original LTTng patch had the slow-path instrumented, which
      fails to account for the vast majority of events. In general
      placing this in the fast-path is not a huge performance hit, as
      we don't take page faults for kernel addresses.
      
      The other bits of interest are some of the other trap handlers, as
      well as the syscall entry/exit (which is better off being handled
      through the tracehook API). Most of the other trap handlers are corner
      cases where alternate means of notification exist, so there is little
      value in placing extra trace points in these locations.
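
      With the trace_mark() API of this era, each point is a single line at
      the instrumented site; for the kernel-thread case it looks roughly
      like the following (the event name and format string are illustrative):

         pid = kernel_thread(fn, arg, flags);
         trace_mark(kernel_arch_kthread_create, "pid %d fn %p", pid, fn);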
      
      Based on the trace points provided by both the LTTng instrumentation
      patch and the patch shipping in the ST-Linux tree, albeit in a
      stripped-down form.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  14. 08 Sep, 2008 2 commits
  15. 28 Jul, 2008 1 commit
    • sh/kernel/ cleanups · 4c1cfab1
      Adrian Bunk authored
      This patch contains the following cleanups:
      - make the following needlessly global code static:
        - cf-enabler.c: cf_init()
        - cpu/clock.c: __clk_enable()
        - cpu/clock.c: __clk_disable()
        - process_32.c: default_idle()
        - time_32.c: struct clocksource_sh
        - timers/timer-tmu.c: struct tmu_timer_ops
      - remove the following unused functions (no CONFIG_BLK_DEV_FD on sh):
        - process_{32,64}.c: disable_hlt()
        - process_{32,64}.c: enable_hlt()
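
      Each of the "make static" items is a one-word linkage change of this
      shape (illustrative, not a verbatim hunk):

         /* before: visible kernel-wide, though nothing else uses it */
         void default_idle(void);

         /* after: internal linkage, local to process_32.c */
         static void default_idle(void);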
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  16. 19 Jul, 2008 1 commit
    • nohz: prevent tick stop outside of the idle loop · b8f8c3cf
      Thomas Gleixner authored
      Jack Ren and Eric Miao tracked down the following long standing
      problem in the NOHZ code:
      
      	scheduler switch to idle task
      	enable interrupts
      
      Window starts here
      
      	----> interrupt happens (does not set NEED_RESCHED)
      	      	irq_exit() stops the tick
      
      	----> interrupt happens (does set NEED_RESCHED)
      
      	return from schedule()
      	
      	cpu_idle(): preempt_disable();
      
      Window ends here
      
      The interrupts can happen at any point inside the race window. The
      first interrupt stops the tick, the second one causes the scheduler to
      rerun and switch away from idle again and we end up with the tick
      disabled.
      
      The fact that it needs two interrupts where the first one does not set
      NEED_RESCHED and the second one does made the bug obscure and extremely
      hard to reproduce and analyse. Kudos to Jack and Eric.
      
      Solution: Limit the NOHZ functionality to the idle loop to make sure
      that we can not run into such a situation ever again.
      
      cpu_idle()
      {
      	preempt_disable();
      
      	while(1) {
      		 tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
      		 			          are in the idle loop
      
      		 while (!need_resched())
      		       halt();
      
      		 tick_nohz_restart_sched_tick(); <- disables NOHZ mode
      		 preempt_enable_no_resched();
      		 schedule();
      		 preempt_disable();
      	}
      }
      
      In hindsight we should have done this forever, but ... 
      
      /me grabs a large brown paperbag.
      
      Debugged-by: Jack Ren <jack.ren@marvell.com>
      Debugged-by: eric miao <eric.y.miao@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  17. 26 Mar, 2008 1 commit
  18. 28 Jan, 2008 4 commits
  19. 20 Oct, 2007 1 commit
  20. 28 Sep, 2007 2 commits
  21. 31 Jul, 2007 1 commit
  22. 26 Jul, 2007 1 commit
  23. 11 Jun, 2007 1 commit
  24. 08 Jun, 2007 2 commits
  25. 21 May, 2007 1 commit
    • sh: sr.bl toggling around idle sleep. · f3a9022f
      Paul Mundt authored
      As pointed out by Saito-san, without the sr.bl manipulation we can
      occasionally hit delays in the idle loop due to interrupt handling, so
      ensure that interrupts are blocked before going to sleep.
      
      At the same time, we throw in TIF_POLLING_NRFLAG for the !hlt_counter
      case (primarily used by the ST-40 parts).
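
      A minimal sketch of the resulting idle sequence, assuming the sh
      set_bl_bit()/clear_bl_bit() helpers (the surrounding structure is an
      assumption, not the literal patch):

         if (!hlt_counter) {
                 set_bl_bit();              /* sr.BL = 1: block interrupts */
                 while (!need_resched())
                         cpu_sleep();       /* still wakes on an IRQ request;
                                             * the handler runs once BL drops */
                 clear_bl_bit();            /* sr.BL = 0: accept interrupts */
         } else
                 while (!need_resched())
                         cpu_relax();       /* busy-poll when halting is disabled */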
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  26. 09 May, 2007 3 commits
    • sh: clockevent/clocksource/hrtimers/nohz TMU support. · 57be2b48
      Paul Mundt authored
      This adds basic support for clockevents and clocksources,
      presently only implemented for TMU-based systems (which
      are the majority of SH-3 and SH-4 systems).
      
      The old NO_IDLE_HZ implementation is also dropped completely,
      the only users of this were on TMU-based systems anyways.
      
      More work needs to be done to generalize the TMU handling,
      in that the current implementation is rather tied to the
      notion of TMU0 and TMU1 utilization.
      
      Additionally, as more SH timers switch over to this scheme,
      we'll be able to gut most of the remaining system timer
      infrastructure that existed before.
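
      For reference, a clocksource registration of this era looks roughly
      like the following (the field values and read callback are
      illustrative assumptions, not the literal TMU code):

         static struct clocksource clocksource_tmu = {
                 .name   = "tmu",
                 .rating = 200,
                 .read   = tmu_clocksource_read,  /* free-running TMU channel */
                 .mask   = CLOCKSOURCE_MASK(32),
                 .shift  = 10,
                 .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
         };

         /* at init time, once the input clock rate is known */
         clocksource_tmu.mult = clocksource_hz2mult(tmu_rate, clocksource_tmu.shift);
         clocksource_register(&clocksource_tmu);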
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • sh: Convert to common die chain. · b118ca57
      Paul Mundt authored
      This went in immediately after SH added the die chain notifiers,
      so move over to that instead..
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • sh: Fix PC adjustments for varying opcode length. · 53f983a9
      Paul Mundt authored
      There are a few different cases for figuring out how to
      size the instruction. We read in the instruction located
      at regs->pc - 4 when rewinding the opcode to figure out if
      there's a 32-bit opcode before the faulting instruction, with
      a default of a -2 adjustment on a mismatch. In practice this
      works for the cases where pc - 4 is just another 16-bit opcode,
      or where we happen to have a 32-bit and a 16-bit opcode immediately
      preceding the pc value.
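
      A minimal sketch of the rewind heuristic (the helper name and the
      surrounding fault handling are assumptions):

         u16 insn;

         /* peek at pc - 4: a 32-bit opcode there means the faulting
          * instruction was 32 bits wide, otherwise fall back to -2 */
         if (__get_user(insn, (u16 *)(regs->pc - 4)) == 0 &&
             is_32bit_opcode(insn))
                 regs->pc -= 4;
         else
                 regs->pc -= 2;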
      
      In the cases where we aren't rewinding, this is much less ugly..
      
      We also don't bother fixing up the places where we're explicitly
      dealing with 16-bit instructions, since this might lead to
      confusion regarding the encoding size possibilities on other
      CPU variants.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>