1. 12 12月, 2011 40 次提交
    • P
      rcu: Eliminate RCU_FAST_NO_HZ grace-period hang · f535a607
      Paul E. McKenney 提交于
      With the new implementation of RCU_FAST_NO_HZ, it was possible to hang
      RCU grace periods as follows:
      
      o	CPU 0 attempts to go idle, cycles several times through the
      	rcu_prepare_for_idle() loop, then goes dyntick-idle when
      	RCU needs nothing more from it, while still having at least
      	on RCU callback pending.
      
      o	CPU 1 goes idle with no callbacks.
      
      Both CPUs can then stay in dyntick-idle mode indefinitely, preventing
      the RCU grace period from ever completing, possibly hanging the system.
      
      This commit therefore prevents CPUs that have RCU callbacks from entering
      dyntick-idle mode.  This approach also eliminates the need for the
      end-of-grace-period IPIs used previously.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      f535a607
    • P
      rcu: Avoid needlessly IPIing CPUs at GP end · 84ad00cb
      Paul E. McKenney 提交于
      If a CPU enters dyntick-idle mode with callbacks pending, it will need
      an IPI at the end of the grace period.  However, if it exits dyntick-idle
      mode before the grace period ends, it will be needlessly IPIed at the
      end of the grace period.
      
      Therefore, this commit clears the per-CPU rcu_awake_at_gp_end flag
      when a CPU determines that it does not need it.  This in turn requires
      disabling interrupts across much of rcu_prepare_for_idle() in order to
      avoid having nested interrupts clearing this state out from under us.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      84ad00cb
    • P
      rcu: Go dyntick-idle more quickly if CPU has serviced current grace period · 3084f2f8
      Paul E. McKenney 提交于
      The earlier version would attempt to push callbacks through five times
      before going into dyntick-idle mode if callbacks remained, but the CPU
      had done all that it needed to do for the current RCU grace periods.
      This is wasteful:  In most cases, once the CPU has done all that it
      needs to for the current RCU grace periods, it will make no further
      progress on the callbacks no matter how many times it loops through
      the RCU core processing and the idle-entry code.
      
      This commit therefore goes to dyntick-idle mode whenever the current
      CPU has done all it can for the current grace period.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      3084f2f8
    • P
      rcu: Add tracing for RCU_FAST_NO_HZ · 433cdddc
      Paul E. McKenney 提交于
      This commit adds trace_rcu_prep_idle(), which is invoked from
      rcu_prepare_for_idle() and rcu_wake_cpu() to trace attempts on
      the part of RCU to force CPUs into dyntick-idle mode.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      433cdddc
    • P
      rcu: Update trace_rcu_dyntick() header comment · 045fb931
      Paul E. McKenney 提交于
      This commit updates the trace_rcu_dyntick() header comment to reflect
      events added by commit 4b4f421.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      045fb931
    • P
      doc: Add load/store guarantees to Documentation/atomic-ops.txt · 182dd4b2
      Paul E. McKenney 提交于
      An IRC discussion uncovered many conflicting opinions on what types
      of data may be atomically loaded and stored.  This commit therefore
      calls this out the official set: pointers, longs, ints, and chars (but
      not shorts).  This commit also gives some examples of compiler mischief
      that can thwart atomicity.
      
      Please note that this discussion is relevant to !SMP kernels if
      CONFIG_PREEMPT=y: preemption can cause almost as much trouble as can SMP.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Chris Zankel <chris@zankel.net>
      182dd4b2
    • F
      nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu() · 1268fbc7
      Frederic Weisbecker 提交于
      Those two APIs were provided to optimize the calls of
      tick_nohz_idle_enter() and rcu_idle_enter() into a single
      irq disabled section. This way no interrupt happening in-between would
      needlessly process any RCU job.
      
      Now we are talking about an optimization for which benefits
      have yet to be measured. Let's start simple and completely decouple
      idle rcu and dyntick idle logics to simplify.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      1268fbc7
    • P
      rcu: Add rcutorture CPU-hotplug capability · b58bdcca
      Paul E. McKenney 提交于
      Running CPU-hotplug operations concurrently with rcutorture has
      historically been a good way to find bugs in both RCU and CPU hotplug.
      This commit therefore adds an rcutorture module parameter called
      "onoff_interval" that causes a randomly selected CPU-hotplug operation to
      be executed at the specified interval, in seconds.  The default value of
      "onoff_interval" is zero, which disables rcutorture-instigated CPU-hotplug
      operations.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      b58bdcca
    • P
      tile: Make tile use the new is_idle_task() API · a95f8817
      Paul E. McKenney 提交于
      Change from direct comparison of ->pid with zero to is_idle_task().
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Acked-by: NChris Metcalf <cmetcalf@tilera.com>
      a95f8817
    • P
      events: Make events use the new is_idle_task() API · 77aeeebd
      Paul E. McKenney 提交于
      Change from direct comparison of ->pid with zero to is_idle_task().
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      77aeeebd
    • P
      kdb: Make KDB use the new is_idle_task() API · 7fc20c5c
      Paul E. McKenney 提交于
      Change from direct comparison of ->pid with zero to is_idle_task().
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      7fc20c5c
    • P
      sparc: Make SPARC use the new is_idle_task() API · 29f043a2
      Paul E. McKenney 提交于
      Change from direct comparison of ->pid with zero to is_idle_task().
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      29f043a2
    • P
      rcu: Make RCU use the new is_idle_task() API · 99745b6a
      Paul E. McKenney 提交于
      Change from direct comparison of ->pid with zero to is_idle_task().
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      99745b6a
    • P
      sched: Add is_idle_task() to handle invalidated uses of idle_cpu() · c4f30608
      Paul E. McKenney 提交于
      Commit 908a3283 (Fix idle_cpu()) invalidated some uses of idle_cpu(),
      which used to say whether or not the CPU was running the idle task,
      but now instead says whether or not the CPU is running the idle task
      in the absence of pending wakeups.  Although this new implementation
      gives a better answer to the question "is this CPU idle?", it also
      invalidates other uses that were made of idle_cpu().
      
      This commit therefore introduces a new is_idle_task() API member
      that determines whether or not the specified task is one of the
      idle tasks, allowing open-coded "->pid == 0" sequences to be replaced
      by something more meaningful.
      Suggested-by: NJosh Triplett <josh@joshtriplett.org>
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      c4f30608
    • P
      rcu: Control rcutorture startup from kernel boot parameters · bb3bf705
      Paul E. McKenney 提交于
      Currently, if rcutorture is built into the kernel, it must be manually
      started or started from an init script.  This is inconvenient for
      automated KVM testing, where it is good to be able to fully control
      rcutorture execution from the kernel parameters.  This patch therefore
      adds a module parameter named "rcutorture_runnable" that defaults
      to zero ("don't start automatically"), but which can be set to one
      to cause rcutorture to start up immediately during boot.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      bb3bf705
    • P
      rcu: Add rcutorture system-shutdown capability · d5f546d8
      Paul E. McKenney 提交于
      Although it is easy to run rcutorture tests under KVM, there is currently
      no nice way to run such a test for a fixed time period, collect all of
      the rcutorture data, and then shut the system down cleanly.  This commit
      therefore adds an rcutorture module parameter named "shutdown_secs" that
      specified the run duration in seconds, after which rcutorture terminates
      the test and powers the system down.  The default value for "shutdown_secs"
      is zero, which disables shutdown.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      d5f546d8
    • P
      rcu: Permit RCU_FAST_NO_HZ to be used by TREE_PREEMPT_RCU · b807fbff
      Paul E. McKenney 提交于
      The new implementation of RCU_FAST_NO_HZ is compatible with preemptible
      RCU, so this commit removes the Kconfig restriction that previously
      prohibited this.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      b807fbff
    • P
      rcu: Fix idle-task checks · 11dbaa8c
      Paul E. McKenney 提交于
      RCU has traditionally relied on idle_cpu() to determine whether a given
      CPU is running in the context of an idle task, but commit 908a3283
      (Fix idle_cpu()) has invalidated this approach.  After commit 908a3283,
      idle_cpu() will return true if the current CPU is currently running the
      idle task, and will be doing so for the foreseeable future.  RCU instead
      needs to know whether or not the current CPU is currently running the
      idle task, regardless of what the near future might bring.
      
      This commit therefore switches from idle_cpu() to "current->pid != 0".
      Reported-by: NWu Fengguang <fengguang.wu@intel.com>
      Suggested-by: NCarsten Emde <C.Emde@osadl.org>
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Tested-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      11dbaa8c
    • P
      rcu: Allow dyntick-idle mode for CPUs with callbacks · aea1b35e
      Paul E. McKenney 提交于
      Currently, RCU does not permit a CPU to enter dyntick-idle mode if that
      CPU has any RCU callbacks queued.  This means that workloads for which
      each CPU wakes up and does some RCU updates every few ticks will never
      enter dyntick-idle mode.  This can result in significant unnecessary power
      consumption, so this patch permits a given to enter dyntick-idle mode if
      it has callbacks, but only if that same CPU has completed all current
      work for the RCU core.  We determine use rcu_pending() to determine
      whether a given CPU has completed all current work for the RCU core.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      aea1b35e
    • P
      rcu: Add more information to the wrong-idle-task complaint · 0989cb46
      Paul E. McKenney 提交于
      The current code just complains if the current task is not the idle task.
      This commit therefore adds printing of the identity of the idle task.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      0989cb46
    • P
      rcu: Deconfuse dynticks entry-exit tracing · 4145fa7f
      Paul E. McKenney 提交于
      The trace_rcu_dyntick() trace event did not print both the old and
      the new value of the nesting level, and furthermore printed only
      the low-order 32 bits of it.  This could result in some confusion
      when interpreting trace-event dumps, so this commit prints both
      the old and the new value, prints the full 64 bits, and also selects
      the process-entry/exit increment to print nicely in hexadecimal.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      4145fa7f
    • P
      rcu: Add documentation for raw SRCU read-side primitives · 9ceae0e2
      Paul E. McKenney 提交于
      Update various files in Documentation/RCU to reflect srcu_read_lock_raw()
      and srcu_read_unlock_raw().  Credit to Peter Zijlstra for suggesting
      use of the existing _raw suffix instead of the earlier bulkref names.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      9ceae0e2
    • P
      rcu: Introduce raw SRCU read-side primitives · 0c53dd8b
      Paul E. McKenney 提交于
      The RCU implementations, including SRCU, are designed to be used in a
      lock-like fashion, so that the read-side lock and unlock primitives must
      execute in the same context for any given read-side critical section.
      This constraint is enforced by lockdep-RCU.  However, there is a need
      to enter an SRCU read-side critical section within the context of an
      exception and then exit in the context of the task that encountered the
      exception.  The cost of this capability is that the read-side operations
      incur the overhead of disabling interrupts.
      
      Note that although the current implementation allows a given read-side
      critical section to be entered by one task and then exited by another, all
      known possible implementations that allow this have scalability problems.
      Therefore, a given read-side critical section must be exited by the same
      task that entered it, though perhaps from an interrupt or exception
      handler running within that task's context.  But if you are thinking
      in terms of interrupt handlers, make sure that you have considered the
      possibility of threaded interrupt handlers.
      
      Credit goes to Peter Zijlstra for suggesting use of the existing _raw
      suffix to indicate disabling lockdep over the earlier "bulkref" names.
      Requested-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      0c53dd8b
    • P
      powerpc: Tell RCU about idle after hcall tracing · a7b152d5
      Paul E. McKenney 提交于
      The PowerPC pSeries platform (CONFIG_PPC_PSERIES=y) enables
      hypervisor-call tracing for CONFIG_TRACEPOINTS=y kernels.  One of the
      hypervisor calls that is traced is the H_CEDE call in the idle loop
      that tells the hypervisor that this OS instance no longer needs the
      current CPU.  However, tracing uses RCU, so this combination of kernel
      configuration variables needs to avoid telling RCU about the current CPU's
      idleness until after the H_CEDE-entry tracing completes on the one hand,
      and must tell RCU that the the current CPU is no longer idle before the
      H_CEDE-exit tracing starts.
      
      In all other cases, it suffices to inform RCU of CPU idleness upon
      idle-loop entry and exit.
      
      This commit makes the required adjustments.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      a7b152d5
    • F
      rcu: Fix early call to rcu_idle_enter() · 416eb33c
      Frederic Weisbecker 提交于
      On the irq exit path, tick_nohz_irq_exit()
      may raise a softirq, which action leads to the wake up
      path and select_task_rq_fair() that makes use of rcu
      to iterate the domains.
      
      This is an illegal use of RCU because we may be in RCU
      extended quiescent state if we interrupted an RCU-idle
      window in the idle loop:
      
      [  132.978883] ===============================
      [  132.978883] [ INFO: suspicious RCU usage. ]
      [  132.978883] -------------------------------
      [  132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
      [  132.978883]
      [  132.978883] other info that might help us debug this:
      [  132.978883]
      [  132.978883]
      [  132.978883] rcu_scheduler_active = 1, debug_locks = 0
      [  132.978883] RCU used illegally from extended quiescent state!
      [  132.978883] 2 locks held by swapper/0:
      [  132.978883]  #0:  (&p->pi_lock){-.-.-.}, at: [<ffffffff8105a729>] try_to_wake_up+0x39/0x2f0
      [  132.978883]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8105556a>] select_task_rq_fair+0x6a/0xec0
      [  132.978883]
      [  132.978883] stack backtrace:
      [  132.978883] Pid: 0, comm: swapper Tainted: G        W   3.0.0+ #178
      [  132.978883] Call Trace:
      [  132.978883]  <IRQ>  [<ffffffff810a01f6>] lockdep_rcu_suspicious+0xe6/0x100
      [  132.978883]  [<ffffffff81055c49>] select_task_rq_fair+0x749/0xec0
      [  132.978883]  [<ffffffff8105556a>] ? select_task_rq_fair+0x6a/0xec0
      [  132.978883]  [<ffffffff812fe494>] ? do_raw_spin_lock+0x54/0x150
      [  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
      [  132.978883]  [<ffffffff8105a7c3>] try_to_wake_up+0xd3/0x2f0
      [  132.978883]  [<ffffffff81094f98>] ? ktime_get+0x68/0xf0
      [  132.978883]  [<ffffffff8105aa35>] wake_up_process+0x15/0x20
      [  132.978883]  [<ffffffff81069dd5>] raise_softirq_irqoff+0x65/0x110
      [  132.978883]  [<ffffffff8108eb65>] __hrtimer_start_range_ns+0x415/0x5a0
      [  132.978883]  [<ffffffff812fe3ee>] ? do_raw_spin_unlock+0x5e/0xb0
      [  132.978883]  [<ffffffff8108ed08>] hrtimer_start+0x18/0x20
      [  132.978883]  [<ffffffff8109c9c3>] tick_nohz_stop_sched_tick+0x393/0x450
      [  132.978883]  [<ffffffff810694f2>] irq_exit+0xd2/0x100
      [  132.978883]  [<ffffffff81829e96>] do_IRQ+0x66/0xe0
      [  132.978883]  [<ffffffff81820d53>] common_interrupt+0x13/0x13
      [  132.978883]  <EOI>  [<ffffffff8103434b>] ? native_safe_halt+0xb/0x10
      [  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
      [  132.978883]  [<ffffffff810144ea>] default_idle+0xba/0x370
      [  132.978883]  [<ffffffff810147fe>] amd_e400_idle+0x5e/0x130
      [  132.978883]  [<ffffffff8100a9f6>] cpu_idle+0xb6/0x120
      [  132.978883]  [<ffffffff817f217f>] rest_init+0xef/0x150
      [  132.978883]  [<ffffffff817f20e2>] ? rest_init+0x52/0x150
      [  132.978883]  [<ffffffff81ed9cf3>] start_kernel+0x3da/0x3e5
      [  132.978883]  [<ffffffff81ed9346>] x86_64_start_reservations+0x131/0x135
      [  132.978883]  [<ffffffff81ed944d>] x86_64_start_kernel+0x103/0x112
      
      Fix this by calling rcu_idle_enter() after tick_nohz_irq_exit().
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      416eb33c
    • F
      x86: Call idle notifier after irq_enter() · 98ad1cc1
      Frederic Weisbecker 提交于
      Interrupts notify the idle exit state before calling irq_enter().
      But the notifier code calls rcu_read_lock() and this is not
      allowed while rcu is in an extended quiescent state. We need
      to wait for irq_enter() -> rcu_idle_exit() to be called before
      doing so otherwise this results in a grumpy RCU:
      
      [    0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
      [    0.099991] Hardware name: AMD690VM-FMH
      [    0.099991] Modules linked in:
      [    0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255
      [    0.099991] Call Trace:
      [    0.099991]  <IRQ>  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
      [    0.099991]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
      [    0.099991]  [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110
      [    0.099991]  [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20
      [    0.099991]  [<ffffffff81001873>] exit_idle+0x43/0x50
      [    0.099991]  [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0
      [    0.099991]  [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20
      [    0.099991]  <EOI>  [<ffffffff8100ae67>] ? default_idle+0xa7/0x350
      [    0.099991]  [<ffffffff8100ae65>] ? default_idle+0xa5/0x350
      [    0.099991]  [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110
      [    0.099991]  [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160
      [    0.099991]  [<ffffffff810019a0>] cpu_idle+0xb0/0x110
      [    0.099991]  [<ffffffff817a7505>] rest_init+0xe5/0x140
      [    0.099991]  [<ffffffff817a7468>] ? rest_init+0x48/0x140
      [    0.099991]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
      [    0.099991]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
      [    0.099991]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Andy Henroid <andrew.d.henroid@intel.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      98ad1cc1
    • F
      x86: Enter rcu extended qs after idle notifier call · e37e112d
      Frederic Weisbecker 提交于
      The idle notifier, called by enter_idle(), enters into rcu read
      side critical section but at that time we already switched into
      the RCU-idle window (rcu_idle_enter() has been called). And it's
      illegal to use rcu_read_lock() in that state.
      
      This results in rcu reporting its bad mood:
      
      [    1.275635] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
      [    1.275635] Hardware name: AMD690VM-FMH
      [    1.275635] Modules linked in:
      [    1.275635] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #252
      [    1.275635] Call Trace:
      [    1.275635]  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
      [    1.275635]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
      [    1.275635]  [<ffffffff817d6f22>] __atomic_notifier_call_chain+0xd2/0x110
      [    1.275635]  [<ffffffff817d6f71>] atomic_notifier_call_chain+0x11/0x20
      [    1.275635]  [<ffffffff810018a0>] enter_idle+0x20/0x30
      [    1.275635]  [<ffffffff81001995>] cpu_idle+0xa5/0x110
      [    1.275635]  [<ffffffff817a7465>] rest_init+0xe5/0x140
      [    1.275635]  [<ffffffff817a73c8>] ? rest_init+0x48/0x140
      [    1.275635]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
      [    1.275635]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
      [    1.275635]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
      [    1.275635] ---[ end trace a22d306b065d4a66 ]---
      
      Fix this by entering rcu extended quiescent state later, just before
      the CPU goes to sleep.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      e37e112d
    • F
      nohz: Allow rcu extended quiescent state handling seperately from tick stop · 2bbb6817
      Frederic Weisbecker 提交于
      It is assumed that rcu won't be used once we switch to tickless
      mode and until we restart the tick. However this is not always
      true, as in x86-64 where we dereference the idle notifiers after
      the tick is stopped.
      
      To prepare for fixing this, add two new APIs:
      tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
      
      If no use of RCU is made in the idle loop between
      tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
      must instead call the new *_norcu() version such that the arch doesn't
      need to call rcu_idle_enter() and rcu_idle_exit().
      
      Otherwise the arch must call tick_nohz_enter_idle() and
      tick_nohz_exit_idle() and also call explicitly:
      
      - rcu_idle_enter() after its last use of RCU before the CPU is put
      to sleep.
      - rcu_idle_exit() before the first use of RCU after the CPU is woken
      up.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      2bbb6817
    • F
      nohz: Separate out irq exit and idle loop dyntick logic · 280f0677
      Frederic Weisbecker 提交于
      The tick_nohz_stop_sched_tick() function, which tries to delay
      the next timer tick as long as possible, can be called from two
      places:
      
      - From the idle loop to start the dytick idle mode
      - From interrupt exit if we have interrupted the dyntick
      idle mode, so that we reprogram the next tick event in
      case the irq changed some internal state that requires this
      action.
      
      There are only few minor differences between both that
      are handled by that function, driven by the ts->inidle
      cpu variable and the inidle parameter. The whole guarantees
      that we only update the dyntick mode on irq exit if we actually
      interrupted the dyntick idle mode, and that we enter in RCU extended
      quiescent state from idle loop entry only.
      
      Split this function into:
      
      - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
      dynticks idle mode unconditionally if it can, and enters into RCU
      extended quiescent state.
      
      - tick_nohz_irq_exit() which only updates the dynticks idle mode
      when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).
      
      To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
      into tick_nohz_idle_exit().
      
      This simplifies the code and micro-optimize the irq exit path (no need
      for local_irq_save there). This also prepares for the split between
      dynticks and rcu extended quiescent state logics. We'll need this split to
      further fix illegal uses of RCU in extended quiescent states in the idle
      loop.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      280f0677
    • P
      rcu: Make srcu_read_lock_held() call common lockdep-enabled function · 867f236b
      Paul E. McKenney 提交于
      A common debug_lockdep_rcu_enabled() function is used to check whether
      RCU lockdep splats should be reported, but srcu_read_lock() does not
      use it.  This commit therefore brings srcu_read_lock_held() up to date.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      867f236b
    • P
      rcu: Warn when srcu_read_lock() is used in an extended quiescent state · ff195cb6
      Paul E. McKenney 提交于
      Catch SRCU up to the other variants of RCU by making PROVE_RCU
      complain if either srcu_read_lock() or srcu_read_lock_held() are
      used from within RCU-idle mode.
      
      Frederic reworked this to allow for the new versions of his patches
      that check for extended quiescent states.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      ff195cb6
    • P
      rcu: Remove one layer of abstraction from PROVE_RCU checking · d8ab29f8
      Paul E. McKenney 提交于
      Simplify things a bit by substituting the definitions of the single-line
      rcu_read_acquire(), rcu_read_release(), rcu_read_acquire_bh(),
      rcu_read_release_bh(), rcu_read_acquire_sched(), and
      rcu_read_release_sched() functions at their call points.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      d8ab29f8
    • F
      rcu: Warn when rcu_read_lock() is used in extended quiescent state · 00f49e57
      Frederic Weisbecker 提交于
      We are currently able to detect uses of rcu_dereference_check() inside
      extended quiescent states (such as the RCU-free window in idle).
      But rcu_read_lock() and friends can be used without rcu_dereference(),
      so that the earlier commit checking for use of rcu_dereference() and
      friends while in RCU idle mode miss some error conditions.  This commit
      therefore adds extended quiescent state checking to rcu_read_lock() and
      friends.
      
      Uses of RCU from within RCU-idle mode are totally ignored by
      RCU, hence the importance of these checks.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      00f49e57
    • F
      rcu: Inform the user about extended quiescent state on PROVE_RCU warning · 0464e937
      Frederic Weisbecker 提交于
      Inform the user if an RCU usage error is detected by lockdep while in
      an extended quiescent state (in this case, the RCU-free window in idle).
      This is accomplished by adding a line to the RCU lockdep splat indicating
      whether or not the splat occurred in extended quiescent state.
      
      Uses of RCU from within extended quiescent state mode are totally ignored
      by RCU, hence the importance of this diagnostic.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      0464e937
    • F
      rcu: Detect illegal rcu dereference in extended quiescent state · e6b80a3b
      Frederic Weisbecker 提交于
      Report that none of the rcu read lock maps are held while in an RCU
      extended quiescent state (the section between rcu_idle_enter()
      and rcu_idle_exit()). This helps detect any use of rcu_dereference()
      and friends from within the section in idle where RCU is not allowed.
      
      This way we can guarantee an extended quiescent window where the CPU
      can be put in dyntick idle mode or can simply aoid to be part of any
      global grace period completion while in the idle loop.
      
      Uses of RCU from such mode are totally ignored by RCU, hence the
      importance of these checks.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      e6b80a3b
    • T
      rcu: Remove redundant return from rcu_report_exp_rnp() · a0f8eefb
      Thomas Gleixner 提交于
      Empty void functions do not need "return", so this commit removes it
      from rcu_report_exp_rnp().
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      a0f8eefb
    • T
      rcu: Omit self-awaken when setting up expedited grace period · b40d293e
      Thomas Gleixner 提交于
      When setting up an expedited grace period, if there were no readers, the
      task will awaken itself.  This commit removes this useless self-awakening.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      b40d293e
    • P
      rcu: Disable preemption in rcu_is_cpu_idle() · 34240697
      Paul E. McKenney 提交于
      Because rcu_is_cpu_idle() is to be used to check for extended quiescent
      states in RCU-preempt read-side critical sections, it cannot assume that
      preemption is disabled.  And preemption must be disabled when accessing
      the dyntick-idle state, because otherwise the following sequence of events
      could occur:
      
      1.	Task A on CPU 1 enters rcu_is_cpu_idle() and picks up the pointer
      	to CPU 1's per-CPU variables.
      
      2.	Task B preempts Task A and starts running on CPU 1.
      
      3.	Task A migrates to CPU 2.
      
      4.	Task B blocks, leaving CPU 1 idle.
      
      5.	Task A continues execution on CPU 2, accessing CPU 1's dyntick-idle
      	information using the pointer fetched in step 1 above, and finds
      	that CPU 1 is idle.
      
      6.	Task A therefore incorrectly concludes that it is executing in
      	an extended quiescent state, possibly issuing a spurious splat.
      
      Therefore, this commit disables preemption within the rcu_is_cpu_idle()
      function.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      34240697
    • P
      rcu: Document failing tick as cause of RCU CPU stall warning · 2c01531f
      Paul E. McKenney 提交于
      One of lclaudio's systems was seeing RCU CPU stall warnings from idle.
      These turned out to be caused by a bug that stopped scheduling-clock
      tick interrupts from being sent to a given CPU for several hundred seconds.
      This commit therefore updates the documentation to call this out as a
      possible cause for RCU CPU stall warnings.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      2c01531f
    • P
      rcu: Add failure tracing to rcutorture · 91afaf30
      Paul E. McKenney 提交于
      Trace the rcutorture RCU accesses and dump the trace buffer when the
      first failure is detected.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      91afaf30