1. 08 May 2012 (2 commits)
  2. 29 Mar 2012 (1 commit)
  3. 09 Mar 2012 (1 commit)
    • powerpc: Rework lazy-interrupt handling · 7230c564
      Committed by Benjamin Herrenschmidt
      The current implementation of lazy interrupt handling has some
      issues that this patch tries to address.
      
      We don't apply the various workarounds we need when re-enabling
      interrupts in some cases, such as when returning from an interrupt,
      and thus we may still lose, or get delayed, decrementer or doorbell
      interrupts.
      
      The current scheme also makes it much harder to handle the external
      "edge" interrupts provided by some BookE processors when using the
      EPR facility (External Proxy) and the Freescale Hypervisor.
      
      Additionally, we tend to keep interrupts hard disabled in a number
      of cases, such as decrementer interrupts, external interrupts, or
      when a masked decrementer interrupt is pending. This is sub-optimal.
      
      This is an attempt at fixing it all in one go by reworking the way
      we do the lazy interrupt disabling from the ground up.
      
      The base idea is to replace the "hard_enabled" field with an
      "irq_happened" field in which we store a bit mask of which interrupts
      occurred while soft-disabled.
      
      When re-enabling, either via arch_local_irq_restore() or when returning
      from an interrupt, we can now decide what to do by testing bits in that
      field.
      
      We then implement replaying of the missed interrupts either by
      re-using the existing exception frame (in exception exit case) or via
      the creation of a new one from an assembly trampoline (in the
      arch_local_irq_enable case).
      
      This removes the need to play with the decrementer to try to create
      fake interrupts, among others.
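      
      To illustrate the idea, here is a loose sketch: the flag names mirror
      the PACA_IRQ_* bits the patch introduces, but the replay logic below
      is simplified and illustrative, not the actual implementation, and
      replay_interrupt() is a stand-in name:
      
      #define PACA_IRQ_HARD_DIS  0x01   /* hard-disabled on entry */
      #define PACA_IRQ_DBELL     0x02   /* doorbell fired while soft-disabled */
      #define PACA_IRQ_EE        0x04   /* external interrupt fired */
      #define PACA_IRQ_DEC       0x08   /* decrementer fired */
      
      void arch_local_irq_restore(unsigned long en)
      {
              local_paca->soft_enabled = en;
              if (!en)
                      return;                 /* staying soft-disabled */
      
              /* Something fired while we were soft-disabled: replay it now,
               * either on the existing exception frame or via the assembly
               * trampoline mentioned above (vector numbers are Book3S). */
              if (local_paca->irq_happened & PACA_IRQ_DEC)
                      replay_interrupt(0x900);        /* decrementer */
              if (local_paca->irq_happened & PACA_IRQ_EE)
                      replay_interrupt(0x500);        /* external */
      
              local_paca->irq_happened = 0;
              __hard_irq_enable();
      }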
      
      In addition, this adds a few refinements:
      
       - We no longer hard disable decrementer interrupts that occur
      while soft-disabled. We now simply bump the decrementer back to max
      (on BookS) or leave it stopped (on BookE) and continue with hard interrupts
      enabled, which means that we'll potentially get better sample quality from
      performance monitor interrupts.
      
       - Timer, decrementer and doorbell interrupts now hard-enable
      shortly after removing the source of the interrupt, which means
      they no longer run entirely hard disabled. Again, this will improve
      perf sample quality.
      
       - On Book3E 64-bit, we now make the performance monitor interrupt
      act as an NMI like Book3S (the necessary C code for that to work
      appears to already be present in the FSL perf code, notably calling
      nmi_enter instead of irq_enter). (This also fixes a bug where BookE
      perfmon interrupts could clobber r14 ... oops)
      
       - We could make "masked" decrementer interrupts act as NMIs when doing
      timer-based perf sampling to improve the sample quality.
      
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2:
      
      - Add hard-enable to decrementer, timer and doorbells
      - Fix CR clobber in masked irq handling on BookE
      - Make embedded perf interrupt act as an NMI
      - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
        to retrigger an interrupt without preventing hard-enable
      
      v3:
      
       - Fix or vs. ori bug on Book3E
       - Fix enabling of interrupts for some exceptions on Book3E
      
      v4:
      
       - Fix resend of doorbells on return from interrupt on Book3E
      
      v5:
      
       - Rebased on top of my latest series, which involves some significant
      rework of some aspects of the patch.
      
      v6:
       - 32-bit compile fix
       - more compile fixes with various .config combos
       - factor out the asm code to soft-disable interrupts
       - remove the C wrapper around preempt_schedule_irq
      
      v7:
       - Fix a bug with hard irq state tracking on native power7
  4. 01 Mar 2012 (2 commits)
  5. 11 Jan 2012 (1 commit)
    • powerpc: Fix RCU idle and hcall tracing · a5ccfee0
      Committed by Anton Blanchard
      Tracepoints should not be called inside an rcu_idle_enter/rcu_idle_exit
      region. Since pSeries calls H_CEDE in the idle loop, we were violating
      this rule.
      
      commit a7b152d5 (powerpc: Tell RCU about idle after hcall tracing)
      tried to work around it by delaying the rcu_idle_enter until after we
      called the hcall tracepoint, but there are a number of issues with it.
      
      The hcall tracepoint trampoline code is called conditionally when the
      tracepoint is enabled. If the tracepoint is not enabled we never call
      rcu_idle_enter. The idle_uses_rcu check was also done at compile time,
      which breaks multiplatform builds.
      
      The simple fix is to avoid tracing H_CEDE and rely on other tracepoints
      and the hypervisor dispatch trace log to work out if we called H_CEDE.
      
      This fixes a hang during boot on pSeries.
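      
      Conceptually, the fixed idle path keeps the cede untraced and does its
      RCU bookkeeping around it; a minimal sketch, not the literal patch:
      
      rcu_idle_enter();               /* our last use of RCU is behind us */
      plpar_hcall_norets(H_CEDE);     /* cede the CPU, hcall tracepoint skipped */
      rcu_idle_exit();                /* no tracepoint fired while RCU was idle */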
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  6. 12 Dec 2011 (4 commits)
    • nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu() · 1268fbc7
      Committed by Frederic Weisbecker
      Those two APIs were provided to combine the calls to
      tick_nohz_idle_enter() and rcu_idle_enter() into a single
      irq-disabled section, so that no interrupt happening in between
      would needlessly process any RCU job.
      
      Now we are talking about an optimization whose benefits have yet
      to be measured. Let's start simple and completely decouple the
      RCU idle and dyntick idle logic.
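      
      With the combined entry points gone, an arch idle loop simply makes
      the two calls back to back; a minimal sketch:
      
      tick_nohz_idle_enter();         /* stop the tick */
      rcu_idle_enter();               /* enter the RCU extended quiescent state */
      while (!need_resched())
              cpu_relax();            /* stand-in for the arch sleep primitive */
      rcu_idle_exit();
      tick_nohz_idle_exit();          /* restart the tick */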
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • powerpc: Tell RCU about idle after hcall tracing · a7b152d5
      Committed by Paul E. McKenney
      The PowerPC pSeries platform (CONFIG_PPC_PSERIES=y) enables
      hypervisor-call tracing for CONFIG_TRACEPOINTS=y kernels.  One of the
      hypervisor calls that is traced is the H_CEDE call in the idle loop
      that tells the hypervisor that this OS instance no longer needs the
      current CPU.  However, tracing uses RCU, so this combination of kernel
      configuration variables needs to avoid telling RCU about the current CPU's
      idleness until after the H_CEDE-entry tracing completes on the one hand,
      and must tell RCU that the current CPU is no longer idle before the
      H_CEDE-exit tracing starts.
      
      In all other cases, it suffices to inform RCU of CPU idleness upon
      idle-loop entry and exit.
      
      This commit makes the required adjustments.
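      
      The required ordering, sketched (argument lists are schematic):
      
      trace_hcall_entry(...);         /* tracing uses RCU: trace first...  */
      rcu_idle_enter();               /* ...then tell RCU this CPU is idle */
      /* CPU sleeps in H_CEDE */
      rcu_idle_exit();                /* wake RCU up before...             */
      trace_hcall_exit(...);          /* ...the H_CEDE-exit tracing runs   */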
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • nohz: Allow rcu extended quiescent state handling separately from tick stop · 2bbb6817
      Committed by Frederic Weisbecker
      It is assumed that rcu won't be used once we switch to tickless
      mode and until we restart the tick. However this is not always
      true, as in x86-64 where we dereference the idle notifiers after
      the tick is stopped.
      
      To prepare for fixing this, add two new APIs:
      tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
      
      If no use of RCU is made in the idle loop between the
      tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, the arch
      must instead call the new *_norcu() versions, so that it doesn't
      need to call rcu_idle_enter() and rcu_idle_exit().
      
      Otherwise the arch must call tick_nohz_idle_enter() and
      tick_nohz_idle_exit() and also explicitly call:
      
      - rcu_idle_enter() after its last use of RCU before the CPU is put
      to sleep.
      - rcu_idle_exit() before the first use of RCU after the CPU is woken
      up.
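      
      In sketch form, the two resulting patterns (arch_halt() and
      do_idle_work_using_rcu() are illustrative stand-ins):
      
      /* Arch with no RCU use while idle: */
      tick_nohz_idle_enter_norcu();   /* also enters the RCU extended QS */
      while (!need_resched())
              arch_halt();
      tick_nohz_idle_exit_norcu();
      
      /* Arch that still uses RCU early in the idle loop: */
      tick_nohz_idle_enter();
      do_idle_work_using_rcu();
      rcu_idle_enter();               /* after the last use of RCU */
      while (!need_resched())
              arch_halt();
      rcu_idle_exit();                /* before the first use of RCU on wakeup */
      tick_nohz_idle_exit();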
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • nohz: Separate out irq exit and idle loop dyntick logic · 280f0677
      Committed by Frederic Weisbecker
      The tick_nohz_stop_sched_tick() function, which tries to delay
      the next timer tick as long as possible, can be called from two
      places:
      
      - From the idle loop to start the dyntick idle mode
      - From interrupt exit if we have interrupted the dyntick
      idle mode, so that we reprogram the next tick event in
      case the irq changed some internal state that requires this
      action.
      
      There are only a few minor differences between the two, handled by
      that function and driven by the ts->inidle per-CPU variable and the
      inidle parameter. Together they guarantee that we only update the
      dyntick mode on irq exit if we actually interrupted the dyntick idle
      mode, and that we enter the RCU extended quiescent state from idle
      loop entry only.
      
      Split this function into:
      
      - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
      dynticks idle mode unconditionally if it can, and enters into RCU
      extended quiescent state.
      
      - tick_nohz_irq_exit() which only updates the dynticks idle mode
      when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).
      
      To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
      to tick_nohz_idle_exit().
      
      This simplifies the code and micro-optimizes the irq exit path (no need
      for local_irq_save there). It also prepares for the split between the
      dyntick and RCU extended quiescent state logic. We'll need this split to
      further fix illegal uses of RCU in extended quiescent states in the idle
      loop.
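      
      A sketch of the two resulting call sites (arch_halt() is again an
      illustrative stand-in for the real sleep primitive):
      
      /* Idle loop: */
      tick_nohz_idle_enter();         /* sets ts->inidle, stops the tick */
      while (!need_resched())
              arch_halt();
      tick_nohz_idle_exit();          /* formerly tick_nohz_restart_sched_tick() */
      
      /* Interrupt exit path: */
      if (idle_cpu(smp_processor_id()) && !in_interrupt())
              tick_nohz_irq_exit();   /* no-op unless ts->inidle is set */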
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
  7. 08 Dec 2011 (1 commit)
  8. 24 Aug 2010 (1 commit)
    • powerpc: Re-enable preemption before cpu_die() · a7c2bb82
      Committed by Darren Hart
      start_secondary() is called shortly after _start and also via
      
      cpu_idle()->cpu_die()->pseries_mach_cpu_die()
      
      start_secondary() expects a preempt_count() of 0. pseries_mach_cpu_die() is
      called via the cpu_idle() routine with preemption disabled, resulting in the
      following repeating message during rapid cpu offline/online tests
      with CONFIG_PREEMPT=y:
      
      BUG: scheduling while atomic: swapper/0/0x00000002
      Modules linked in: autofs4 binfmt_misc dm_mirror dm_region_hash dm_log [last unloaded: scsi_wait_scan]
      Call Trace:
      [c00000010e7079c0] [c0000000000133ec] .show_stack+0xd8/0x218 (unreliable)
      [c00000010e707aa0] [c0000000006a47f0] .dump_stack+0x28/0x3c
      [c00000010e707b20] [c00000000006e7a4] .__schedule_bug+0x7c/0x9c
      [c00000010e707bb0] [c000000000699d9c] .schedule+0x104/0x800
      [c00000010e707cd0] [c000000000015b24] .cpu_idle+0x1c4/0x1d8
      [c00000010e707d70] [c0000000006aa1b4] .start_secondary+0x398/0x3d4
      [c00000010e707e30] [c000000000008278] .start_secondary_resume+0x10/0x14
      
      Move the cpu_die() call inside the existing preemption-enabled block of
      cpu_idle(). This is safe because the idle task is affined to a single CPU,
      so the debug_smp_processor_id() tests (from cpu_should_die()) won't
      trigger: we are effectively in a "migration disabled" region.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Nathan Fontenot <nfont@austin.ibm.com>
      Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
      Cc: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  9. 19 Nov 2009 (1 commit)
  10. 12 Nov 2009 (1 commit)
  11. 21 Nov 2008 (1 commit)
    • powerpc: ftrace, do not latency trace idle · 6d07bb47
      Committed by Steven Rostedt
      Impact: fix for irq off latency tracer
      
      When idle is called, interrupts are disabled, but the idle function
      will still wake up on an interrupt. The problem is that the
      interrupts-off latency tracer counts this whole idle period as a
      latency.
      
      This patch disables latency tracing when going into idle.
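      
      A sketch of the shape of the fix, assuming it brackets the idle sleep
      with the critical-timings helpers from include/linux/irqflags.h:
      
      stop_critical_timings();        /* irqsoff tracer: ignore the idle sleep */
      ppc_md.power_save();            /* wait for an interrupt, irqs hard off  */
      start_critical_timings();       /* resume latency measurement            */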
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
  12. 30 Sep 2008 (1 commit)
  13. 19 Jul 2008 (1 commit)
    • nohz: prevent tick stop outside of the idle loop · b8f8c3cf
      Committed by Thomas Gleixner
      Jack Ren and Eric Miao tracked down the following long standing
      problem in the NOHZ code:
      
      	scheduler switch to idle task
      	enable interrupts
      
      Window starts here
      
      	----> interrupt happens (does not set NEED_RESCHED)
      	      	irq_exit() stops the tick
      
      	----> interrupt happens (does set NEED_RESCHED)
      
      	return from schedule()
      	
      	cpu_idle(): preempt_disable();
      
      Window ends here
      
      The interrupts can happen at any point inside the race window. The
      first interrupt stops the tick, the second one causes the scheduler to
      rerun and switch away from idle again and we end up with the tick
      disabled.
      
      The fact that it needs two interrupts, where the first one does not set
      NEED_RESCHED and the second one does, made the bug obscure and extremely
      hard to reproduce and analyse. Kudos to Jack and Eric.
      
      Solution: Limit the NOHZ functionality to the idle loop to make sure
      that we cannot run into such a situation ever again.
      
      cpu_idle()
      {
      	preempt_disable();
      
      	while(1) {
      		 tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
      		 			          are in the idle loop
      
      		 while (!need_resched())
      		       halt();
      
      		 tick_nohz_restart_sched_tick(); <- disables NOHZ mode
      		 preempt_enable_no_resched();
      		 schedule();
      		 preempt_disable();
      	}
      }
      
      In hindsight we should have done this forever, but ... 
      
      /me grabs a large brown paperbag.
      
      Debugged-by: Jack Ren <jack.ren@marvell.com>
      Debugged-by: eric miao <eric.y.miao@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  14. 08 Nov 2007 (1 commit)
  15. 03 Oct 2007 (1 commit)
  16. 07 May 2007 (1 commit)
  17. 15 Feb 2007 (2 commits)
  18. 25 Oct 2006 (1 commit)
  19. 01 Jul 2006 (1 commit)
  20. 14 Apr 2006 (1 commit)
  21. 27 Mar 2006 (1 commit)
    • powerpc: Unify the 32 and 64 bit idle loops · a0652fc9
      Committed by Paul Mackerras
      This unifies the 32-bit (ARCH=ppc and ARCH=powerpc) and 64-bit idle
      loops.  It brings over the concept of having a ppc_md.power_save
      function from 32-bit to ARCH=powerpc, which lets us get rid of
      native_idle().  With this we will also be able to simplify the idle
      handling for pSeries and cell.
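      
      The concept, sketched (simplified, not the complete unified loop):
      
      struct machdep_calls {
              /* ... other platform hooks ... */
              void (*power_save)(void);       /* platform's low-power idle */
      };
      
      void cpu_idle(void)
      {
              while (!need_resched()) {
                      if (ppc_md.power_save)
                              ppc_md.power_save();    /* e.g. cede on pSeries */
                      else
                              cpu_relax();            /* generic polling fallback */
              }
      }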
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  22. 18 Nov 2005 (1 commit)
  23. 10 Nov 2005 (1 commit)
  24. 09 Nov 2005 (2 commits)
    • [PATCH] sched: resched and cpu_idle rework · 64c7c8f8
      Committed by Nick Piggin
      Make some changes to the NEED_RESCHED and POLLING_NRFLAG handling to
      reduce confusion and make their semantics rigid.  This improves the
      efficiency of resched_task and some cpu_idle routines.
      
      * In resched_task:
      - TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
        and as we hold it during resched_task, then there is no need for an
        atomic test and set there. The only other time this should be set is
        when the task's quantum expires, in the timer interrupt - this is
        protected against because the rq lock is irq-safe.
      
       - If TIF_NEED_RESCHED is set, then we don't need to do anything. It
         won't get unset until the task gets schedule()d off.
      
      - If we are running on the same CPU as the task we resched, then set
        TIF_NEED_RESCHED and no further action is required.
      
      - If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
        after TIF_NEED_RESCHED has been set, then we need to send an IPI.
      
      Using these rules, we are able to remove the test and set operation in
      resched_task, and make clear the previously vague semantics of
      POLLING_NRFLAG.
      
      * In idle routines:
       - Enter cpu_idle with preempt disabled. When the need_resched() condition
         becomes true, explicitly call schedule(). This makes things a bit clearer
         (IMO), but I haven't updated all architectures yet.
      
      - Many do a test and clear of TIF_NEED_RESCHED for some reason. According
        to the resched_task rules, this isn't needed (and actually breaks the
        assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
        held). So remove that. Generally one less locked memory op when switching
        to the idle thread.
      
      - Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner
        most polling idle loops. The above resched_task semantics allow it to be
        set until before the last time need_resched() is checked before going into
        a halt requiring interrupt wakeup.
      
        Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
        can be always left set, completely eliminating resched IPIs when rescheduling
        the idle task.
      
        POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.
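      
      Under these rules a purely polling idle loop reduces to something like
      this sketch:
      
      void cpu_idle(void)
      {
              /* Left set forever: a remote resched_task() sees it and
               * skips the IPI, since polling will notice NEED_RESCHED. */
              set_thread_flag(TIF_POLLING_NRFLAG);
      
              while (1) {
                      while (!need_resched())
                              cpu_relax();            /* poll, never halt */
                      preempt_enable_no_resched();
                      schedule();
                      preempt_disable();              /* idle runs preempt-off */
              }
      }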
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Con Kolivas <kernel@kolivas.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: disable preempt in idle tasks · 5bfb5d69
      Committed by Nick Piggin
      Run idle threads with preempt disabled.
      
      Also corrected a bug in arm26's cpu_idle (make it actually call schedule()).
      How did it ever work before?
      
      Might fix the CPU hotplugging hang which Nigel Cunningham noted.
      
      We think the bug hits if the idle thread is preempted after checking
      need_resched() and before going to sleep, and the CPU is then offlined.
      
      After calling stop_machine_run, the CPU eventually returns from preemption
      into the idle thread and goes to sleep. The CPU then continues executing
      the previous idle path and has no chance to call play_dead.
      
      By disabling preemption until we are ready to explicitly schedule, this bug is
      fixed and the idle threads generally become more robust.
      
      From: alexs <ashepard@u.washington.edu>
      
        PPC build fix
      
      From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
      
        MIPS build fix
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  25. 07 Nov 2005 (1 commit)
    • powerpc: Various UP build fixes · 2249ca9d
      Committed by Paul Mackerras
      Mostly this involves adding #include <asm/smp.h>, since that defines
      things like boot_cpuid[_phys] and [gs]et_hard_smp_processor_id, which
      are SMP-related but still needed on UP.  This incorporates fixes
      posted by Olof Johansson and Heikki Lindholm.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  26. 19 Oct 2005 (1 commit)
    • powerpc: Merge machdep.h · 143a1dec
      Committed by Paul Mackerras
      A few things change for consistency between ppc32 and ppc64:
      idle functions return void; *_get_boot_time functions return
      unsigned long (i.e. time_t) rather than filling in a struct rtc_time
      (since that's useful to the callers and easier for pmac to
      generate); *_get_rtc_time and *_set_rtc_time functions take
      a struct rtc_time; irq_canonicalize is gone; nvram_sync returns
      void.
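      
      The harmonized hooks, sketched (field order illustrative):
      
      struct machdep_calls {
              void          (*idle)(void);                    /* now returns void */
              unsigned long (*get_boot_time)(void);           /* i.e. a time_t    */
              void          (*get_rtc_time)(struct rtc_time *tm);
              int           (*set_rtc_time)(struct rtc_time *tm);
              void          (*nvram_sync)(void);              /* now returns void */
      };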
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  27. 08 Jul 2005 (5 commits)
  28. 30 Jun 2005 (2 commits)