1. 25 Sep, 2013 (2 commits)
    • rcu: Change EXPORT_SYMBOL() to EXPORT_SYMBOL_GPL() · f9ffc31e
      Committed by Paul E. McKenney
      Commit e6b80a3b (rcu: Detect illegal rcu dereference in extended
      quiescent state) exported the pre-existing rcu_is_cpu_idle() function
      using EXPORT_SYMBOL().  However, this is inconsistent with the remaining
      exports from RCU, which are all EXPORT_SYMBOL_GPL().  The current state
      of affairs means that a non-GPL module could use rcu_is_cpu_idle(),
      but in a CONFIG_TREE_PREEMPT_RCU=y kernel would be unable to invoke
      rcu_read_lock() and rcu_read_unlock().
      
      This commit therefore makes rcu_is_cpu_idle()'s export be consistent
      with the rest of RCU, namely EXPORT_SYMBOL_GPL().
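      
      A minimal sketch of the change, assuming the surrounding code is
      unchanged; the function body below is a placeholder, and only the
      export macro on the last line is what this commit touches:
      
        int rcu_is_cpu_idle(void)
        {
                return 0;  /* real code checks per-CPU dyntick-idle state */
        }
        EXPORT_SYMBOL_GPL(rcu_is_cpu_idle);  /* was: EXPORT_SYMBOL(rcu_is_cpu_idle) */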
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      f9ffc31e
    • rcu: Is it safe to enter an RCU read-side critical section? · cc6783f7
      Committed by Paul E. McKenney
      There is currently no way for kernel code to determine whether it
      is safe to enter an RCU read-side critical section, in other words,
      whether or not RCU is paying attention to the currently running CPU.
      Given the large and increasing quantity of code shared by the idle loop
      and non-idle code, this shortcoming is becoming increasingly painful.
      
      This commit therefore adds __rcu_is_watching(), which returns true if
      it is safe to enter an RCU read-side critical section on the currently
      running CPU.  This function is quite fast, using only a __this_cpu_read().
      However, the caller must disable preemption.
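      
      A hedged sketch of what such a helper could look like; the per-CPU
      counter name and its convention are assumptions, while the
      __this_cpu_read() usage and the preemption-disabled requirement come
      from the description above:
      
        bool __rcu_is_watching(void)
        {
                /* Caller must have preemption disabled. */
                return __this_cpu_read(rcu_dynticks.dynticks_nesting) != 0;
        }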
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      cc6783f7
  2. 01 Sep, 2013 (2 commits)
    • nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU · eb75767b
      Committed by Paul E. McKenney
      Because RCU's quiescent-state-forcing mechanism is used to drive the
      full-system-idle state machine, and because this mechanism is executed
      by RCU's grace-period kthreads, this commit forces these kthreads to
      run on the timekeeping CPU (tick_do_timer_cpu).  To do otherwise would
      mean that the RCU grace-period kthreads would force the system into
      non-idle state every time they drove the state machine, which would
      be just a bit on the futile side.
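      
      A hedged sketch of the binding, run from the grace-period kthread
      itself; the guard and exact call site are assumptions, while
      tick_do_timer_cpu and set_cpus_allowed_ptr() are existing kernel
      symbols:
      
        static void rcu_bind_gp_kthread(void)
        {
                int cpu = tick_do_timer_cpu;
      
                if (cpu < 0 || cpu >= nr_cpu_ids)
                        return;  /* no designated timekeeping CPU yet */
                set_cpus_allowed_ptr(current, cpumask_of(cpu));
        }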
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      eb75767b
    • nohz_full: Add full-system-idle state machine · 0edd1b17
      Committed by Paul E. McKenney
      This commit adds the state machine that takes the per-CPU idle data
      as input and produces a full-system-idle indication as output.  This
      state machine is driven out of RCU's quiescent-state-forcing
      mechanism, which invokes rcu_sysidle_check_cpu() to collect per-CPU
      idle state and then rcu_sysidle_report() to drive the state machine.
      
      The full-system-idle state is sampled using rcu_sys_is_idle(), which
      also drives the state machine if RCU is idle (and does so by forcing
      RCU to become non-idle).  This function returns true if all but the
      timekeeping CPU (tick_do_timer_cpu) are idle and have been idle long
      enough to avoid memory contention on the full_sysidle_state state
      variable.  The rcu_sysidle_force_exit() may be called externally
      to reset the state machine back into non-idle state.
      
      For large systems the state machine is driven out of RCU's
      force-quiescent-state logic, which provides good scalability at the price
      of millisecond-scale latencies on the transition to full-system-idle
      state.  This is not so good for battery-powered systems, which are usually
      small enough that they don't need to care about scalability, but which
      do care deeply about energy efficiency.  Small systems therefore drive
      the state machine directly out of the idle-entry code.  The number of
      CPUs in a "small" system is defined by a new NO_HZ_FULL_SYSIDLE_SMALL
      Kconfig parameter, which defaults to 8.  Note that this is a build-time
      definition.
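      
      A hedged sketch of how the timekeeping CPU might consult the state
      machine before stopping its tick; everything other than
      rcu_sys_is_idle() and rcu_sysidle_force_exit() is an assumption:
      
        static bool timekeeper_may_stop_tick(void)
        {
                if (!rcu_sys_is_idle())
                        return false;  /* some non-timekeeping CPU is busy */
                /*
                 * All other CPUs have been idle long enough; it is safe to
                 * stop the tick.  rcu_sysidle_force_exit() later pushes the
                 * state machine back to non-idle when activity resumes.
                 */
                return true;
        }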
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      [ paulmck: Use true and false for boolean constants per Lai Jiangshan. ]
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      [ paulmck: Simplify logic and provide better comments for memory barriers,
        based on review comments and questions by Lai Jiangshan. ]
      0edd1b17
  3. 21 Aug, 2013 (1 commit)
  4. 19 Aug, 2013 (7 commits)
    • nohz_full: Add full-system-idle arguments to API · 217af2a2
      Committed by Paul E. McKenney
      This commit adds an isidle and jiffies argument to force_qs_rnp(),
      dyntick_save_progress_counter(), and rcu_implicit_dynticks_qs() to enable
      RCU's force-quiescent-state process to check for full-system idle.
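      
      A hedged sketch of the widened signatures; the exact parameter order
      and types are assumptions:
      
        static void force_qs_rnp(struct rcu_state *rsp,
                                 int (*f)(struct rcu_data *rdp, bool *isidle,
                                          unsigned long *maxj),
                                 bool *isidle, unsigned long *maxj);
        static int dyntick_save_progress_counter(struct rcu_data *rdp,
                                                 bool *isidle, unsigned long *maxj);
        static int rcu_implicit_dynticks_qs(struct rcu_data *rdp,
                                            bool *isidle, unsigned long *maxj);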
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      [ paulmck: Use true and false for boolean constants per Lai Jiangshan. ]
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      217af2a2
    • nohz_full: Add per-CPU idle-state tracking · eb348b89
      Committed by Paul E. McKenney
      This commit adds the code that updates the rcu_dyntick structure's
      new fields to track the per-CPU idle state based on interrupts and
      transitions into and out of the idle loop (NMIs are ignored because NMI
      handlers cannot cleanly read out the time anyway).  This code is similar
      to the code that maintains RCU's idea of per-CPU idleness, but differs
      in that RCU treats CPUs running in user mode as idle, whereas this new
      code does not.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      eb348b89
    • nohz_full: Add rcu_dyntick data for scalable detection of all-idle state · 2333210b
      Committed by Paul E. McKenney
      This commit adds fields to the rcu_dyntick structure that are used to
      detect idle CPUs.  These new fields differ from the existing ones in
      that the existing ones consider a CPU executing in user mode to be idle,
      where the new ones consider CPUs executing in user mode to be busy.
      The handling of these new fields is otherwise quite similar to that for
      the existing fields.  This commit also adds the initialization required
      for these fields.
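      
      A hedged sketch of the kind of per-CPU state this adds; the exact
      field names inside struct rcu_dynticks are assumptions:
      
        struct rcu_dynticks {
                /* ... existing dyntick-idle bookkeeping (usermode counts as idle) ... */
        #ifdef CONFIG_NO_HZ_FULL_SYSIDLE
                long long dynticks_idle_nesting;     /* irq/idle nesting; usermode is busy */
                atomic_t dynticks_idle;              /* even: idle, odd: non-idle */
                unsigned long dynticks_idle_jiffies; /* jiffies of last transition */
        #endif
        };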
      
      So, why is usermode execution treated differently, with RCU considering
      it a quiescent state equivalent to idle, while in contrast the new
      full-system idle state detection considers usermode execution to be
      non-idle?
      
      It turns out that although one of RCU's quiescent states is usermode
      execution, it is not a full-system idle state.  This is because the
      purpose of the full-system idle state is not RCU, but rather determining
      when accurate timekeeping can safely be disabled.  Whenever accurate
      timekeeping is required in a CONFIG_NO_HZ_FULL kernel, at least one
      CPU must keep the scheduling-clock tick going.  If even one CPU is
      executing in user mode, accurate timekeeping is required, particularly for
      architectures where gettimeofday() and friends do not enter the kernel.
      Only when all CPUs are really and truly idle can accurate timekeeping be
      disabled, allowing all CPUs to turn off the scheduling clock interrupt,
      thus greatly improving energy efficiency.
      
      This naturally raises the question "Why is this code in RCU rather than in
      timekeeping?", and the answer is that RCU has the data and infrastructure
      to efficiently make this determination.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      2333210b
    • rcu: Eliminate unused APIs intended for adaptive ticks · feed66ed
      Committed by Paul E. McKenney
      The rcu_user_enter_after_irq() and rcu_user_exit_after_irq()
      functions were intended for use by adaptive ticks, but changes
      in implementation have rendered them unnecessary.  This commit
      therefore removes them.
      Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      feed66ed
    • rcu: Avoid redundant grace-period kthread wakeups · 1eafd31c
      Committed by Paul E. McKenney
      When setting up an in-the-future "advanced" grace period, the code needs
      to wake up the relevant grace-period kthread, which it currently does
      unconditionally.  However, this results in needless wakeups in the case
      where the advanced grace period is being set up by the grace-period
      kthread itself, which is not an uncommon situation.  This commit therefore
      checks to see if the running thread is the grace-period kthread, and
      avoids doing the irq_work_queue()-mediated wakeup in that case.
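      
      A hedged sketch of the check; the helper name and struct field names
      are assumptions:
      
        static void rcu_gp_kthread_wake_if_needed(struct rcu_state *rsp)
        {
                if (current == rsp->gp_kthread)
                        return;  /* the kthread will notice the change itself */
                irq_work_queue(&rsp->wakeup_work);  /* deferred, lock-safe wakeup */
        }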
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      1eafd31c
    • rcu: Make call_rcu() leak callbacks for debug-object errors · ae150184
      Committed by Paul E. McKenney
      If someone does a duplicate call_rcu(), the worst thing the second
      call_rcu() could do would be to actually queue the callback the second
      time because doing so corrupts whatever list the callback was already
      queued on.  This commit therefore makes __call_rcu() check the new
      return value from debug-objects and leak the callback upon error.
      This commit also substitutes rcu_leak_callback() for whatever callback
      function was previously in place in order to avoid freeing the callback
      out from under any readers that might still be referencing it.
      
      These changes increase the probability that the debug-objects error
      messages will actually make it somewhere visible.
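      
      A hedged sketch of the guard; the exact shape of __call_rcu() and the
      debug-objects helper name are assumptions, while rcu_leak_callback()
      is named in the text above:
      
        static void rcu_leak_callback(struct rcu_head *rhp)
        {
                /* Deliberately empty: leaking beats corrupting a callback list. */
        }
      
        static void __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *))
        {
                if (debug_rcu_head_queue(head)) {
                        /* Probable double call_rcu(): complain, then leak. */
                        head->func = rcu_leak_callback;
                        WARN_ONCE(1, "duplicate call_rcu() of %p\n", head);
                        return;
                }
                head->func = func;
                /* ... enqueue on this CPU's callback list ... */
        }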
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      ae150184
    • rcu: Expedite grace periods during suspend/resume · d1d74d14
      Committed by Borislav Petkov
      CONFIG_RCU_FAST_NO_HZ can increase grace-period durations by up to
      a factor of four, which can result in long suspend and resume times.
      Thus, this commit temporarily switches to expedited grace periods when
      suspending the box and returns to normal settings when resuming.  Similar
      logic is applied to hibernation.
      
      Because expedited grace periods are of dubious benefit on very large
      systems, this commit restricts their automated use during suspend
      and resume to systems of 256 or fewer CPUs.  (Some day a number of
      Linux-kernel facilities, including RCU's expedited grace periods,
      will be more scalable, but I need to see bug reports first.)
      
      [ paulmck: This also papers over an audio/irq bug, but hopefully that will
        be fixed soon. ]
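      
      A hedged sketch of a PM notifier doing what is described above; the
      notifier name, the exact threshold placement, and the use of the
      rcu_expedited flag are assumptions:
      
        static int rcu_pm_notify(struct notifier_block *self,
                                 unsigned long action, void *hcpu)
        {
                switch (action) {
                case PM_HIBERNATION_PREPARE:
                case PM_SUSPEND_PREPARE:
                        if (num_online_cpus() <= 256)  /* expedite only on smaller boxes */
                                rcu_expedited = 1;
                        break;
                case PM_POST_HIBERNATION:
                case PM_POST_SUSPEND:
                        rcu_expedited = 0;  /* back to normal grace periods */
                        break;
                default:
                        break;
                }
                return NOTIFY_OK;
        }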
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      d1d74d14
  5. 30 Jul, 2013 (3 commits)
    • rcu: Have the RCU tracepoints use the tracepoint_string infrastructure · f7f7bac9
      Committed by Steven Rostedt (Red Hat)
      Currently, RCU tracepoints save only a pointer to strings in the
      ring buffer. When displayed via the /sys/kernel/debug/tracing/trace file
      they are referenced, like printf's "%s", by looking at the address
      in the ring buffer and printing out the string it points to.  This requires
      that the strings be constant and persistent in the kernel.
      
      The problem with this is for tools like trace-cmd and perf that read the
      binary data from the buffers but have no access to the kernel memory to
      find out what string is represented by the address in the buffer.
      
      By using the tracepoint_string infrastructure, the RCU tracepoint strings
      can be exported such that userspace tools can map the addresses to
      the strings.
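      
      A hedged illustration of the resulting pattern; the TPS() shorthand for
      tracepoint_string() follows the approach described above, but the exact
      call sites are assumptions:
      
        #define TPS(x)  tracepoint_string(x)
      
        static void example_note_context_switch(void)
        {
                trace_rcu_utilization(TPS("Start context switch"));
                /* ... */
                trace_rcu_utilization(TPS("End context switch"));
        }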
      
       # cat /sys/kernel/debug/tracing/printk_formats
      0xffffffff81a4a0e8 : "rcu_preempt"
      0xffffffff81a4a0f4 : "rcu_bh"
      0xffffffff81a4a100 : "rcu_sched"
      0xffffffff818437a0 : "cpuqs"
      0xffffffff818437a6 : "rcu_sched"
      0xffffffff818437a0 : "cpuqs"
      0xffffffff818437b0 : "rcu_bh"
      0xffffffff818437b7 : "Start context switch"
      0xffffffff818437cc : "End context switch"
      0xffffffff818437a0 : "cpuqs"
      [...]
      
      Now userspace tools can display:
      
       rcu_utilization:      Start context switch
       rcu_dyntick:          Start 1 0
       rcu_utilization:      End context switch
       rcu_batch_start:      rcu_preempt CBs=0/5 bl=10
       rcu_dyntick:          End 0 140000000000000
       rcu_invoke_callback:  rcu_preempt rhp=0xffff880071c0d600 func=proc_i_callback
       rcu_invoke_callback:  rcu_preempt rhp=0xffff880077b5b230 func=__d_free
       rcu_dyntick:          Start 140000000000000 0
       rcu_invoke_callback:  rcu_preempt rhp=0xffff880077563980 func=file_free_rcu
       rcu_batch_end:        rcu_preempt CBs-invoked=3 idle=>c<>c<>c<>c<
       rcu_utilization:      End RCU core
       rcu_grace_period:     rcu_preempt 9741 start
       rcu_dyntick:          Start 1 0
       rcu_dyntick:          End 0 140000000000000
       rcu_dyntick:          Start 140000000000000 0
      
      Instead of:
      
       rcu_utilization:      ffffffff81843110
       rcu_future_grace_period: ffffffff81842f1d 9939 9939 9940 0 0 3 ffffffff81842f32
       rcu_batch_start:      ffffffff81842f1d CBs=0/4 bl=10
       rcu_future_grace_period: ffffffff81842f1d 9939 9939 9940 0 0 3 ffffffff81842f3c
       rcu_grace_period:     ffffffff81842f1d 9939 ffffffff81842f80
       rcu_invoke_callback:  ffffffff81842f1d rhp=0xffff88007888aac0 func=file_free_rcu
       rcu_grace_period:     ffffffff81842f1d 9939 ffffffff81842f95
       rcu_invoke_callback:  ffffffff81842f1d rhp=0xffff88006aeb4600 func=proc_i_callback
       rcu_future_grace_period: ffffffff81842f1d 9939 9939 9940 0 0 3 ffffffff81842f32
       rcu_future_grace_period: ffffffff81842f1d 9939 9939 9940 0 0 3 ffffffff81842f3c
       rcu_invoke_callback:  ffffffff81842f1d rhp=0xffff880071cb9fc0 func=__d_free
       rcu_grace_period:     ffffffff81842f1d 9939 ffffffff81842f80
       rcu_invoke_callback:  ffffffff81842f1d rhp=0xffff88007888ae80 func=file_free_rcu
       rcu_batch_end:        ffffffff81842f1d CBs-invoked=4 idle=>c<>c<>c<>c<
       rcu_utilization:      ffffffff8184311f
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f7f7bac9
    • rcu: Simplify RCU_STATE_INITIALIZER() macro · a41bfeb2
      Committed by Steven Rostedt (Red Hat)
      The RCU_STATE_INITIALIZER() macro is used only in the rcutree.c and
      rcutree_plugin.h files.  It is passed as an rvalue to a variable of a
      similar name, and a per_cpu variable with a similar name is also created.
      
      The uses of RCU_STATE_INITIALIZER() can be simplified to remove some of
      the duplicated code.  Currently the three users of this macro have this
      format:
      
      struct rcu_state rcu_sched_state =
      	RCU_STATE_INITIALIZER(rcu_sched, call_rcu_sched);
      DEFINE_PER_CPU(struct rcu_data, rcu_sched_data);
      
      Notice that "rcu_sched" appears three times.  The same is true of
      the other two users.  This can be condensed to just:
      
      RCU_STATE_INITIALIZER(rcu_sched, call_rcu_sched);
      
      by moving the rest into the macro itself.
      
      This also opens the door to allow the RCU tracepoint strings and
      their addresses to be exported so that userspace tracing tools can
      translate the contents of the pointers of the RCU tracepoints.
      The change will allow for helper code to be placed in the
      RCU_STATE_INITIALIZER() macro to export the name that is used.
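      
      A hedged sketch of the condensed macro; only the overall shape follows
      the description above, and the field initializers shown are assumptions:
      
        #define RCU_STATE_INITIALIZER(sname, cr) \
                struct rcu_state sname##_state = { \
                        .call = cr, \
                        .name = #sname, \
                        /* ... remaining rcu_state field initializers ... */ \
                }; \
                DEFINE_PER_CPU(struct rcu_data, sname##_data)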
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      a41bfeb2
    • rcu: Add const annotation to char * for RCU tracepoints and functions · e66c33d5
      Committed by Steven Rostedt (Red Hat)
      All the RCU tracepoints and functions that reference char pointers do
      so with just 'char *' even though they do not modify the contents of
      the string itself. This will cause warnings if a const char * is used
      in one of these functions.
      
      The RCU tracepoints store the pointer to the string so that it can be
      referred back to when the trace output is displayed.  As this can be
      minutes, hours or even days later, those strings had better be constant.
      
      This change also opens the door to allow the RCU tracepoint strings and
      their addresses to be exported so that userspace tracing tools can
      translate the contents of the pointers of the RCU tracepoints.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      e66c33d5
  6. 15 Jul, 2013 (1 commit)
    • rcu: delete __cpuinit usage from all rcu files · 49fb4c62
      Committed by Paul Gortmaker
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      This removes all of the RCU uses of the __cpuinit macros
      from all C files.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@freedesktop.org>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      49fb4c62
  7. 04 Jul, 2013 (1 commit)
  8. 11 Jun, 2013 (12 commits)
    • rcu: Drive quiescent-state-forcing delay from HZ · 026ad283
      Committed by Paul E. McKenney
      Systems with HZ=100 can have slow bootup times due to the default
      three-jiffy delays between quiescent-state forcing attempts.  This
      commit therefore auto-tunes the RCU_JIFFIES_TILL_FORCE_QS value based
      on the value of HZ.  However, this would break very large systems that
      require more time between quiescent-state forcing attempts.  This
      commit therefore also ups the default delay by one jiffy for each
      256 CPUs that might be on the system (based off of nr_cpu_ids at
      runtime, -not- NR_CPUS at build time).
      
      Updated to collapse #ifdefs for RCU_JIFFIES_TILL_FORCE_QS into a
      step-function definition as suggested by Josh Triplett.
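      
      A hedged sketch of the step-function definition; the exact HZ
      breakpoints and the per-256-CPU adjustment constant are assumptions:
      
        #if HZ > 500
        #define RCU_JIFFIES_TILL_FORCE_QS  3  /* fast HZ: keep the old delay */
        #elif HZ > 250
        #define RCU_JIFFIES_TILL_FORCE_QS  2
        #else
        #define RCU_JIFFIES_TILL_FORCE_QS  1  /* HZ=100: force quiescent states sooner */
        #endif
      
        /* One extra jiffy per 256 possible CPUs, based on nr_cpu_ids at runtime. */
        #define RCU_JIFFIES_FQS_DIV        256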
      Reported-by: Paul Mackerras <paulus@au1.ibm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      026ad283
    • rcu: Move redundant call to note_gp_changes() into called function · 05eb552b
      Committed by Paul E. McKenney
      The __rcu_process_callbacks() function invokes note_gp_changes()
      immediately before invoking rcu_check_quiescent_state(), which
      conditionally invokes that same function.  This commit therefore
      eliminates the call to note_gp_changes() in __rcu_process_callbacks()
      in favor of making the call from rcu_check_quiescent_state() to
      note_gp_changes() unconditional.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      05eb552b
    • rcu: Inline trivial wrapper function rcu_start_gp_per_cpu() · ce3d9c03
      Committed by Paul E. McKenney
      Given the changes that introduce note_gp_changes(), rcu_start_gp_per_cpu()
      is now a trivial wrapper function with only one caller.  This commit
      therefore inlines it into its sole call site.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      ce3d9c03
    • rcu: Eliminate check_for_new_grace_period() wrapper function · 63274cfb
      Committed by Paul E. McKenney
      One of the calls to check_for_new_grace_period() is now redundant due to
      an immediately preceding call to note_gp_changes().  Eliminating this
      redundant call leaves a single caller, which is simpler if inlined.
      This commit therefore eliminates the redundant call and inlines the
      body of check_for_new_grace_period() into the single remaining call site.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      63274cfb
    • rcu: Merge __rcu_process_gp_end() into __note_gp_changes() · ba9fbe95
      Committed by Paul E. McKenney
      This commit eliminates some duplicated code by merging
      __rcu_process_gp_end() into __note_gp_changes().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      ba9fbe95
    • rcu: Switch callers from rcu_process_gp_end() to note_gp_changes() · 470716fc
      Committed by Paul E. McKenney
      Because note_gp_changes() now incorporates the functionality of
      rcu_process_gp_end(), this commit switches callers to the former and
      eliminates the latter.  In addition, this commit changes external calls
      from __rcu_process_gp_end() to __note_gp_changes().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      470716fc
    • rcu: Rename note_new_gpnum() to note_gp_changes() · d34ea322
      Committed by Paul E. McKenney
      Because note_new_gpnum() now also checks for the ends of old grace periods,
      this commit changes its name to note_gp_changes().  Later commits will merge
      rcu_process_gp_end() into note_gp_changes().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      d34ea322
    • rcu: Make __note_new_gpnum() check for ends of prior grace periods · 398ebe60
      Committed by Paul E. McKenney
      The current implementation can detect the beginning of a new grace period
      before noting the end of a previous grace period.  Although the current
      implementation correctly handles this sort of nonsense, it would be
      good to reduce RCU's state space by making such nonsense unnecessary,
      which is now possible thanks to the fact that RCU's callback groups are
      now numbered.
      
      This commit therefore makes __note_new_gpnum() invoke
      __rcu_process_gp_end() in order to note the ends of prior grace
      periods before noting the beginnings of new grace periods.
      Of course, this now means that note_new_gpnum() notes both the
      beginnings and ends of grace periods, and could therefore be
      used in place of rcu_process_gp_end().  But that is a job for
      later commits.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      398ebe60
    • rcu: Move code to apply callback-numbering simplifications · 6eaef633
      Committed by Paul E. McKenney
      The addition of callback numbering allows combining the detection of the
      ends of old grace periods and the beginnings of new grace periods.  This
      commit moves code to set the stage for this combining.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      6eaef633
    • rcu: Convert rcutree.c printk calls · d7f3e207
      Committed by Paul E. McKenney
      This commit converts printk() calls to the corresponding pr_*() calls.
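      
      A hedged before/after illustration of the conversion; the message text
      is made up:
      
        static void report_example_old_style(void)
        {
                printk(KERN_ERR "INFO: rcu detected a stall\n");
        }
      
        static void report_example_new_style(void)
        {
                pr_err("INFO: rcu detected a stall\n");  /* same output, shorter call */
        }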
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      d7f3e207
    • rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration · 971394f3
      Committed by Paul E. McKenney
      In Steven Rostedt's words:
      
      > I've been debugging the last couple of days why my tests have been
      > locking up. One of my tracing tests, runs all available tracers. The
      > lockup always happened with the mmiotrace, which is used to trace
      > interactions between priority drivers and the kernel. But to do this
      > easily, when the tracer gets registered, it disables all but the boot
      > CPUs. The lockup always happened after it got done disabling the CPUs.
      >
      > Then I decided to try this:
      >
      > while :; do
      > 	for i in 1 2 3; do
      > 		echo 0 > /sys/devices/system/cpu/cpu$i/online
      > 	done
      > 	for i in 1 2 3; do
      > 		echo 1 > /sys/devices/system/cpu/cpu$i/online
      > 	done
      > done
      >
      > Well, sure enough, that locked up too, with the same users. Doing a
      > sysrq-w (showing all blocked tasks):
      >
      > [ 2991.344562]   task                        PC stack   pid father
      > [ 2991.344562] rcu_preempt     D ffff88007986fdf8     0    10      2 0x00000000
      > [ 2991.344562]  ffff88007986fc98 0000000000000002 ffff88007986fc48 0000000000000908
      > [ 2991.344562]  ffff88007986c280 ffff88007986ffd8 ffff88007986ffd8 00000000001d3c80
      > [ 2991.344562]  ffff880079248a40 ffff88007986c280 0000000000000000 00000000fffd4295
      > [ 2991.344562] Call Trace:
      > [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
      > [ 2991.344562]  [<ffffffff81541750>] schedule_timeout+0xbc/0xf9
      > [ 2991.344562]  [<ffffffff8154bec0>] ? ftrace_call+0x5/0x2f
      > [ 2991.344562]  [<ffffffff81049513>] ? cascade+0xa8/0xa8
      > [ 2991.344562]  [<ffffffff815417ab>] schedule_timeout_uninterruptible+0x1e/0x20
      > [ 2991.344562]  [<ffffffff810c980c>] rcu_gp_kthread+0x502/0x94b
      > [ 2991.344562]  [<ffffffff81062791>] ? __init_waitqueue_head+0x50/0x50
      > [ 2991.344562]  [<ffffffff810c930a>] ? rcu_gp_fqs+0x64/0x64
      > [ 2991.344562]  [<ffffffff81061cdb>] kthread+0xb1/0xb9
      > [ 2991.344562]  [<ffffffff81091e31>] ? lock_release_holdtime.part.23+0x4e/0x55
      > [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
      > [ 2991.344562]  [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0
      > [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
      > [ 2991.344562] kworker/0:1     D ffffffff81a30680     0    47      2 0x00000000
      > [ 2991.344562] Workqueue: events cpuset_hotplug_workfn
      > [ 2991.344562]  ffff880078dbbb58 0000000000000002 0000000000000006 00000000000000d8
      > [ 2991.344562]  ffff880078db8100 ffff880078dbbfd8 ffff880078dbbfd8 00000000001d3c80
      > [ 2991.344562]  ffff8800779ca5c0 ffff880078db8100 ffffffff81541fcf 0000000000000000
      > [ 2991.344562] Call Trace:
      > [ 2991.344562]  [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609
      > [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
      > [ 2991.344562]  [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24
      > [ 2991.344562]  [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609
      > [ 2991.344562]  [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50
      > [ 2991.344562]  [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50
      > [ 2991.344562]  [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40
      > [ 2991.344562]  [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50
      > [ 2991.344562]  [<ffffffff810af7e6>] rebuild_sched_domains_locked+0x6e/0x3a8
      > [ 2991.344562]  [<ffffffff810b0ec6>] rebuild_sched_domains+0x1c/0x2a
      > [ 2991.344562]  [<ffffffff810b109b>] cpuset_hotplug_workfn+0x1c7/0x1d3
      > [ 2991.344562]  [<ffffffff810b0ed9>] ? cpuset_hotplug_workfn+0x5/0x1d3
      > [ 2991.344562]  [<ffffffff81058e07>] process_one_work+0x2d4/0x4d1
      > [ 2991.344562]  [<ffffffff81058d3a>] ? process_one_work+0x207/0x4d1
      > [ 2991.344562]  [<ffffffff8105964c>] worker_thread+0x2e7/0x3b5
      > [ 2991.344562]  [<ffffffff81059365>] ? rescuer_thread+0x332/0x332
      > [ 2991.344562]  [<ffffffff81061cdb>] kthread+0xb1/0xb9
      > [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
      > [ 2991.344562]  [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0
      > [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
      > [ 2991.344562] bash            D ffffffff81a4aa80     0  2618   2612 0x10000000
      > [ 2991.344562]  ffff8800379abb58 0000000000000002 0000000000000006 0000000000000c2c
      > [ 2991.344562]  ffff880077fea140 ffff8800379abfd8 ffff8800379abfd8 00000000001d3c80
      > [ 2991.344562]  ffff8800779ca5c0 ffff880077fea140 ffffffff81541fcf 0000000000000000
      > [ 2991.344562] Call Trace:
      > [ 2991.344562]  [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609
      > [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
      > [ 2991.344562]  [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24
      > [ 2991.344562]  [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609
      > [ 2991.344562]  [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e
      > [ 2991.344562]  [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e
      > [ 2991.344562]  [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40
      > [ 2991.344562]  [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e
      > [ 2991.344562]  [<ffffffff81091c99>] ? __lock_is_held+0x32/0x53
      > [ 2991.344562]  [<ffffffff81548912>] notifier_call_chain+0x6b/0x98
      > [ 2991.344562]  [<ffffffff810671fd>] __raw_notifier_call_chain+0xe/0x10
      > [ 2991.344562]  [<ffffffff8103cf64>] __cpu_notify+0x20/0x32
      > [ 2991.344562]  [<ffffffff8103cf8d>] cpu_notify_nofail+0x17/0x36
      > [ 2991.344562]  [<ffffffff815225de>] _cpu_down+0x154/0x259
      > [ 2991.344562]  [<ffffffff81522710>] cpu_down+0x2d/0x3a
      > [ 2991.344562]  [<ffffffff81526351>] store_online+0x4e/0xe7
      > [ 2991.344562]  [<ffffffff8134d764>] dev_attr_store+0x20/0x22
      > [ 2991.344562]  [<ffffffff811b3c5f>] sysfs_write_file+0x108/0x144
      > [ 2991.344562]  [<ffffffff8114c5ef>] vfs_write+0xfd/0x158
      > [ 2991.344562]  [<ffffffff8114c928>] SyS_write+0x5c/0x83
      > [ 2991.344562]  [<ffffffff8154c494>] tracesys+0xdd/0xe2
      >
      > As well as held locks:
      >
      > [ 3034.728033] Showing all locks held in the system:
      > [ 3034.728033] 1 lock held by rcu_preempt/10:
      > [ 3034.728033]  #0:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff810c9471>] rcu_gp_kthread+0x167/0x94b
      > [ 3034.728033] 4 locks held by kworker/0:1/47:
      > [ 3034.728033]  #0:  (events){.+.+.+}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1
      > [ 3034.728033]  #1:  (cpuset_hotplug_work){+.+.+.}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1
      > [ 3034.728033]  #2:  (cpuset_mutex){+.+.+.}, at: [<ffffffff810b0ec1>] rebuild_sched_domains+0x17/0x2a
      > [ 3034.728033]  #3:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50
      > [ 3034.728033] 1 lock held by mingetty/2563:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 1 lock held by mingetty/2565:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 1 lock held by mingetty/2569:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 1 lock held by mingetty/2572:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 1 lock held by mingetty/2575:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 7 locks held by bash/2618:
      > [ 3034.728033]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff8114bc3f>] file_start_write+0x2a/0x2c
      > [ 3034.728033]  #1:  (&buffer->mutex#2){+.+.+.}, at: [<ffffffff811b3b93>] sysfs_write_file+0x3c/0x144
      > [ 3034.728033]  #2:  (s_active#54){.+.+.+}, at: [<ffffffff811b3c3e>] sysfs_write_file+0xe7/0x144
      > [ 3034.728033]  #3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff810217c2>] cpu_hotplug_driver_lock+0x17/0x19
      > [ 3034.728033]  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8103d196>] cpu_maps_update_begin+0x17/0x19
      > [ 3034.728033]  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103cfd8>] cpu_hotplug_begin+0x2c/0x6d
      > [ 3034.728033]  #6:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e
      > [ 3034.728033] 1 lock held by bash/2980:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      >
      > Things looked a little weird. Also, this is a deadlock that lockdep did
      > not catch. But what we have here does not look like a circular lock
      > issue:
      >
      > Bash is blocked in rcu_cpu_notify():
      >
      > 1961		/* Exclude any attempts to start a new grace period. */
      > 1962		mutex_lock(&rsp->onoff_mutex);
      >
      >
      > kworker is blocked in get_online_cpus(), which makes sense as we are
      > currently taking down a CPU.
      >
      > But rcu_preempt is not blocked on anything. It is simply sleeping in
      > rcu_gp_kthread (really rcu_gp_init) here:
      >
      > 1453	#ifdef CONFIG_PROVE_RCU_DELAY
      > 1454			if ((prandom_u32() % (rcu_num_nodes * 8)) == 0 &&
      > 1455			    system_state == SYSTEM_RUNNING)
      > 1456				schedule_timeout_uninterruptible(2);
      > 1457	#endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
      >
      > And it does this while holding the onoff_mutex that bash is waiting for.
      >
      > Doing a function trace, it showed me where it happened:
      >
      > [  125.940066] rcu_pree-10      3.... 28384115273: schedule_timeout_uninterruptible <-rcu_gp_kthread
      > [...]
      > [  125.940066] rcu_pree-10      3d..3 28384202439: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=D ==> next_comm=watchdog/3 next_pid=38 next_prio=120
      >
      > The watchdog ran, and then:
      >
      > [  125.940066] watchdog-38      3d..3 28384692863: sched_switch: prev_comm=watchdog/3 prev_pid=38 prev_prio=120 prev_state=P ==> next_comm=modprobe next_pid=2848 next_prio=118
      >
      > Not sure what modprobe was doing, but shortly after that:
      >
      > [  125.940066] modprobe-2848    3d..3 28385041749: sched_switch: prev_comm=modprobe prev_pid=2848 prev_prio=118 prev_state=R+ ==> next_comm=migration/3 next_pid=40 next_prio=0
      >
      > Where the migration thread took down the CPU:
      >
      > [  125.940066] migratio-40      3d..3 28389148276: sched_switch: prev_comm=migration/3 prev_pid=40 prev_prio=0 prev_state=P ==> next_comm=swapper/3 next_pid=0 next_prio=120
      >
      > which finally did:
      >
      > [  125.940066]   <idle>-0       3...1 28389282142: arch_cpu_idle_dead <-cpu_startup_entry
      > [  125.940066]   <idle>-0       3...1 28389282548: native_play_dead <-arch_cpu_idle_dead
      > [  125.940066]   <idle>-0       3...1 28389282924: play_dead_common <-native_play_dead
      > [  125.940066]   <idle>-0       3...1 28389283468: idle_task_exit <-play_dead_common
      > [  125.940066]   <idle>-0       3...1 28389284644: amd_e400_remove_cpu <-play_dead_common
      >
      >
      > CPU 3 is now offline, the rcu_preempt thread that ran on CPU 3 is still
      > doing a schedule_timeout_uninterruptible() and it registered it's
      > timeout to the timer base for CPU 3. You would think that it would get
      > migrated right? The issue here is that the timer migration happens at
      > the CPU notifier for CPU_DEAD. The problem is that the rcu notifier for
      > CPU_DOWN is blocked waiting for the onoff_mutex to be released, which is
      > held by the thread that just put itself into a uninterruptible sleep,
      > that wont wake up until the CPU_DEAD notifier of the timer
      > infrastructure is called, which wont happen until the rcu notifier
      > finishes. Here's our deadlock!
      
      This commit breaks this deadlock cycle by substituting a shorter udelay()
      for the previous schedule_timeout_uninterruptible(), while at the same
      time increasing the probability of the delay.  This maintains the intensity
      of the testing.
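      
      A hedged sketch of the substituted delay in rcu_gp_init(); the new
      divisor and the 200-microsecond value are assumptions, the key point
      being that udelay() does not sleep while onoff_mutex is held:
      
        #ifdef CONFIG_PROVE_RCU_DELAY
                if ((prandom_u32() % (rcu_num_nodes + 1)) == 0 &&
                    system_state == SYSTEM_RUNNING)
                        udelay(200);  /* was: schedule_timeout_uninterruptible(2) */
        #endif /* #ifdef CONFIG_PROVE_RCU_DELAY */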
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
      971394f3
    • rcu: Don't call wakeup() with rcu_node structure ->lock held · 016a8d5b
      Committed by Steven Rostedt
      This commit fixes a lockdep-detected deadlock by moving a wake_up()
      call out from a rnp->lock critical section.  Please see below for
      the long version of this story.
      
      On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:
      
      > [12572.705832] ======================================================
      > [12572.750317] [ INFO: possible circular locking dependency detected ]
      > [12572.796978] 3.10.0-rc3+ #39 Not tainted
      > [12572.833381] -------------------------------------------------------
      > [12572.862233] trinity-child17/31341 is trying to acquire lock:
      > [12572.870390]  (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
      > [12572.878859]
      > but task is already holding lock:
      > [12572.894894]  (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
      > [12572.903381]
      > which lock already depends on the new lock.
      >
      > [12572.927541]
      > the existing dependency chain (in reverse order) is:
      > [12572.943736]
      > -> #4 (&ctx->lock){-.-...}:
      > [12572.960032]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12572.968337]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12572.976633]        [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0
      > [12572.984969]        [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0
      > [12572.993326]        [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0
      > [12573.001652]        [<ffffffff816eacfe>] schedule_user+0x2e/0x70
      > [12573.009998]        [<ffffffff816ecd64>] retint_careful+0x12/0x2e
      > [12573.018321]
      > -> #3 (&rq->lock){-.-.-.}:
      > [12573.034628]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.042930]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12573.051248]        [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260
      > [12573.059579]        [<ffffffff810492f5>] do_fork+0x105/0x470
      > [12573.067880]        [<ffffffff81049686>] kernel_thread+0x26/0x30
      > [12573.076202]        [<ffffffff816cee63>] rest_init+0x23/0x140
      > [12573.084508]        [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe
      > [12573.092852]        [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c
      > [12573.101233]        [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf
      > [12573.109528]
      > -> #2 (&p->pi_lock){-.-.-.}:
      > [12573.125675]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.133829]        [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
      > [12573.141964]        [<ffffffff8108e881>] try_to_wake_up+0x31/0x320
      > [12573.150065]        [<ffffffff8108ebe2>] default_wake_function+0x12/0x20
      > [12573.158151]        [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40
      > [12573.166195]        [<ffffffff81085398>] __wake_up_common+0x58/0x90
      > [12573.174215]        [<ffffffff81086909>] __wake_up+0x39/0x50
      > [12573.182146]        [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
      > [12573.190119]        [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
      > [12573.198023]        [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
      > [12573.205860]        [<ffffffff8107a91d>] kthread+0xed/0x100
      > [12573.213656]        [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
      > [12573.221379]
      > -> #1 (&rsp->gp_wq){..-.-.}:
      > [12573.236329]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.243783]        [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
      > [12573.251178]        [<ffffffff810868f3>] __wake_up+0x23/0x50
      > [12573.258505]        [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
      > [12573.265891]        [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
      > [12573.273248]        [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
      > [12573.280564]        [<ffffffff8107a91d>] kthread+0xed/0x100
      > [12573.287807]        [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
      
      Notice the above call chain.
      
      rcu_start_future_gp() is called with the rnp->lock held. Then it calls
      rcu_start_gp_advanced(), which does a wakeup.
      
      You can't do wakeups while holding the rnp->lock, as that would mean
      that you could not do a rcu_read_unlock() while holding the rq lock, or
      any lock that was taken while holding the rq lock. This is because...
      (See below).
      
      > [12573.295067]
      > -> #0 (rcu_node_0){..-.-.}:
      > [12573.309293]        [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
      > [12573.316568]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.323825]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12573.331081]        [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
      > [12573.338377]        [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
      > [12573.345648]        [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
      > [12573.352942]        [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
      > [12573.360211]        [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
      > [12573.367514]        [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
      > [12573.374816]        [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
      
      Notice the above trace.
      
      perf took its own ctx->lock, which can be taken while holding the rq
      lock. While holding this lock, it did a rcu_read_unlock(). The
      perf_lock_task_context() basically looks like:
      
      rcu_read_lock();
      raw_spin_lock(ctx->lock);
      rcu_read_unlock();
      
      Now, what looks to have happened, is that we scheduled after taking that
      first rcu_read_lock() but before taking the spin lock. When we scheduled
      back in and took the ctx->lock, the following rcu_read_unlock()
      triggered the "special" code.
      
      The rcu_read_unlock_special() takes the rnp->lock, which gives us a
      possible deadlock scenario.
      
      	CPU0		CPU1		CPU2
      	----		----		----
      
      				     rcu_nocb_kthread()
          lock(rq->lock);
      		    lock(ctx->lock);
      				     lock(rnp->lock);
      
      				     wake_up();
      
      				     lock(rq->lock);
      
      		    rcu_read_unlock();
      
      		    rcu_read_unlock_special();
      
      		    lock(rnp->lock);
          lock(ctx->lock);
      
      **** DEADLOCK ****
      
      > [12573.382068]
      > other info that might help us debug this:
      >
      > [12573.403229] Chain exists of:
      >   rcu_node_0 --> &rq->lock --> &ctx->lock
      >
      > [12573.424471]  Possible unsafe locking scenario:
      >
      > [12573.438499]        CPU0                    CPU1
      > [12573.445599]        ----                    ----
      > [12573.452691]   lock(&ctx->lock);
      > [12573.459799]                                lock(&rq->lock);
      > [12573.467010]                                lock(&ctx->lock);
      > [12573.474192]   lock(rcu_node_0);
      > [12573.481262]
      >  *** DEADLOCK ***
      >
      > [12573.501931] 1 lock held by trinity-child17/31341:
      > [12573.508990]  #0:  (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
      > [12573.516475]
      > stack backtrace:
      > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
      > [12573.545357]  ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
      > [12573.552868]  ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
      > [12573.560353]  0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
      > [12573.567856] Call Trace:
      > [12573.575011]  [<ffffffff816e375b>] dump_stack+0x19/0x1b
      > [12573.582284]  [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f
      > [12573.589637]  [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
      > [12573.596982]  [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100
      > [12573.604344]  [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.611652]  [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
      > [12573.619030]  [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12573.626331]  [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
      > [12573.633671]  [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
      > [12573.640992]  [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0
      > [12573.648330]  [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40
      > [12573.655662]  [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0
      > [12573.662964]  [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
      > [12573.670276]  [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
      > [12573.677622]  [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370
      > [12573.684981]  [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
      > [12573.692358]  [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
      > [12573.699753]  [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50
      > [12573.707135]  [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
      > [12573.714599]  [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
      > [12573.721996]  [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
      
      This commit delays the wakeup via irq_work(), which is what
      perf and ftrace use to perform wakeups in critical sections.
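      
      A hedged sketch of the deferred-wakeup pattern; the field and helper
      names are assumptions:
      
        static void rsp_wakeup(struct irq_work *work)
        {
                struct rcu_state *rsp = container_of(work, struct rcu_state, wakeup_work);
      
                wake_up(&rsp->gp_wq);  /* wake the grace-period kthread */
        }
      
        static void rcu_gp_kthread_wake_deferred(struct rcu_state *rsp)
        {
                /*
                 * Safe with rnp->lock held: the actual wake_up() runs later,
                 * from the irq_work handler, outside the critical section.
                 */
                irq_work_queue(&rsp->wakeup_work);
        }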
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      016a8d5b
  9. 30 Apr, 2013 (1 commit)
  10. 19 Apr, 2013 (1 commit)
    • nohz: Ensure full dynticks CPUs are RCU nocbs · d1e43fa5
      Committed by Frederic Weisbecker
      We need full dynticks CPUs to also be RCU nocb CPUs so
      that we don't have to keep the tick on to handle RCU
      callbacks.
      
      Make sure the range passed to the nohz_full= boot
      parameter is a subset of rcu_nocbs=.
      
      The CPUs that fail to meet this requirement will be
      excluded from the nohz_full range. This is checked
      early in boot time, before any CPU has the opportunity
      to stop its tick.
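      
      A hedged sketch of the early-boot check; the mask names are assumptions,
      while cpumask_subset() and cpumask_and() are existing helpers:
      
        static void __init nohz_full_sanitize_cpumask(void)
        {
                if (!cpumask_subset(tick_nohz_full_mask, rcu_nocb_mask)) {
                        pr_warn("NO_HZ: dropping nohz_full CPUs that are not rcu_nocbs\n");
                        cpumask_and(tick_nohz_full_mask, tick_nohz_full_mask,
                                    rcu_nocb_mask);
                }
        }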
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      d1e43fa5
  11. 16 Apr, 2013 (1 commit)
    • rcu: Kick adaptive-ticks CPUs that are holding up RCU grace periods · 65d798f0
      Committed by Paul E. McKenney
      Adaptive-ticks CPUs inform RCU when they enter kernel mode, but they do
      not necessarily turn the scheduler-clock tick back on.  This state of
      affairs could result in RCU waiting on an adaptive-ticks CPU running
      for an extended period in kernel mode.  Such a CPU will never run the
      RCU state machine, and could therefore indefinitely extend the current
      RCU grace period, sooner or later resulting in an OOM condition.
      
      This patch, inspired by an earlier patch by Frederic Weisbecker, therefore
      causes RCU's force-quiescent-state processing to check for this condition
      and to send an IPI to CPUs that remain in that state for too long.
      "Too long" currently means about three jiffies by default, which is
      quite some time for a CPU to remain in the kernel without blocking.
      The rcutree.jiffies_till_first_fqs and rcutree.jiffies_till_next_fqs
      sysfs variables may be used to tune "too long" if needed.
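      
      A hedged sketch of the nudge, invoked from RCU's force-quiescent-state
      processing for a CPU that has stayed too long in the kernel; the helper
      name is an assumption:
      
        static void rcu_kick_nohz_cpu(int cpu)
        {
        #ifdef CONFIG_NO_HZ_FULL
                if (tick_nohz_full_cpu(cpu))
                        smp_send_reschedule(cpu);  /* IPI turns the tick back on */
        #endif
        }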
      Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      65d798f0
  12. 26 Mar, 2013 (8 commits)