1. 13 Nov 2009 (1 commit)
    • rcu: Accelerate callback processing on CPUs not detecting GP end · b32e9eb6
      Committed by Paul E. McKenney
      An earlier fix for a race resulted in a situation where the CPUs
      other than the CPU that detected the end of the grace period would
      not process their callbacks until the next grace period started.
      
      This means that these other CPUs would unnecessarily demand that an
      extra grace period be started.
      
      This patch eliminates this extra grace period and speeds callback
      processing by propagating rsp->completed to the rcu_node structures
      in the case where the CPU detecting the end of the grace period
      sees no reason to start a new grace period.
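      
      A sketch of that propagation, using 2.6.32-era rcutree names; the
      exact placement inside the grace-period-detection path is an
      assumption, not quoted from the patch:
      
          /*
           * If nobody needs a new grace period, propagate the freshly
           * recorded rsp->completed value down the tree so that every
           * leaf rcu_node (and thus every CPU) sees the old GP's end.
           */
          if (!cpu_needs_another_gp(rsp, rdp)) {
              rcu_for_each_node_breadth_first(rsp, rnp) {
                  spin_lock(&rnp->lock);  /* irqs already disabled */
                  rnp->completed = rsp->completed;
                  spin_unlock(&rnp->lock);
              }
          }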
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1258094104417-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 12 Nov 2009 (1 commit)
  3. 11 Nov 2009 (4 commits)
    • rcu: Simplify association of quiescent states with grace periods · c64ac3ce
      Committed by Paul E. McKenney
      The rdp->passed_quiesc_completed fields are used to properly
      associate the recorded quiescent state with a grace period.  It
      is OK to wrongly associate a given quiescent state with a
      preceding grace period, but it is fatal to associate a given
      quiescent state with a grace period that begins after the
      quiescent state occurred.  Grace periods are numbered, and the
      following fields track them:
      
      o	->gpnum is the number of the grace period currently in
      	progress, or the number of the last grace period to
      	complete if no grace period is currently in progress.
      
      o	->completed is the number of the last grace period to
      	have completed.
      
      These two fields are equal if there is no grace period in
      progress, otherwise ->gpnum is one greater than ->completed.
      But the rdp->passed_quiesc_completed field is compared against
      ->completed; if they are equal, the quiescent state is presumed
      to count against the current grace period.
      
      The earlier code copied rdp->completed to
      rdp->passed_quiesc_completed, which has been made to work but
      is error-prone.  In contrast, copying one less than rdp->gpnum
      is guaranteed safe, because rdp->gpnum is not incremented until
      after the corresponding grace period has started.  At the end
      of the grace period, once ->completed has been incremented, any
      quiescent states recorded earlier are discarded.
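      
      A sketch of the safe recording described above (close to what the
      patched quiescent-state hooks do, though the details here should
      be treated as assumed rather than quoted):
      
          static void rcu_sched_qs(int cpu)
          {
              struct rcu_data *rdp = &per_cpu(rcu_sched_data, cpu);
      
              /*
               * ->gpnum is not incremented until the new grace period
               * has begun, so ->gpnum - 1 can never name a GP that
               * starts after this quiescent state.
               */
              rdp->passed_quiesc_completed = rdp->gpnum - 1;
              barrier();  /* take the snapshot before announcing the QS */
              rdp->passed_quiesc = 1;
          }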
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12578890421011-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Rename dynticks_completed to completed_fqs · 4bcfe055
      Committed by Paul E. McKenney
      This field is used whether or not CONFIG_NO_HZ is set, so the
      old name of ->dynticks_completed is quite misleading.
      
      Change to ->completed_fqs, given that it is the value that
      force_quiescent_state() is trying to drive the ->completed
      field away from.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12578890423298-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Enable synchronize_sched_expedited() fastpath · 956539b7
      Committed by Paul E. McKenney
      This patch adds a counter increment to enable tasks to actually
      take the synchronize_sched_expedited() function's fastpath.
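      
      The log is terse, so here is a heavily hedged sketch of the
      generation-counter fastpath being enabled; the counter and mutex
      names are illustrative stand-ins, not the kernel's identifiers:
      
          static atomic_t sync_count;          /* hypothetical GP counter */
          static DEFINE_MUTEX(sync_mutex);     /* hypothetical */
      
          void expedited_sync_sketch(void)
          {
              int snap = atomic_read(&sync_count) + 1;
      
              while (!mutex_trylock(&sync_mutex)) {
                  /*
                   * Fastpath: a full expedited GP elapsed while we
                   * waited, so someone else did our work for us.
                   */
                  if (atomic_read(&sync_count) - snap >= 0)
                      return;
                  schedule_timeout_uninterruptible(1);
              }
              /* ... force the expedited grace period ... */
              atomic_inc(&sync_count);  /* without an increment like this,
                                         * the fastpath above never fires */
              mutex_unlock(&sync_mutex);
          }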
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1257889042435-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Remove inline from forward-referenced functions · dbe01350
      Committed by Paul E. McKenney
      Some variants of gcc are reputed to dislike forward references
      to functions declared "inline".  Remove the "inline" keyword
      from such functions.
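      
      A small illustration of the pattern in question; the function
      names are hypothetical:
      
          static inline void do_work(void);   /* forward reference that
                                               * some gcc variants
                                               * reportedly mishandle */
      
          static void caller(void)
          {
              do_work();      /* called before its body is seen */
          }
      
          static inline void do_work(void)
          {
              /* ... */
          }
      
          /* The fix: drop "inline" from declaration and definition and
           * let the compiler make its own inlining decision. */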
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12578890422402-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 10 Nov 2009 (4 commits)
    • rcu: Fix note_new_gpnum() uses of ->gpnum · 9160306e
      Committed by Paul E. McKenney
      Impose a clear locking design on the note_new_gpnum()
      function's use of the ->gpnum counter.  This is done by updating
      rdp->gpnum only from the corresponding leaf rcu_node structure's
      rnp->gpnum field, and even then only under the protection of
      that same rcu_node structure's ->lock field.  Performance and
      scalability are maintained using a form of double-checked
      locking, and excessive spinning is avoided by use of the
      spin_trylock() function.  The use of spin_trylock() is safe
      because CPUs that fail to acquire this lock will simply try
      again later.  The hierarchical nature of the rcu_node data
      structure limits contention (which could be limited further if
      need be using the RCU_FANOUT kernel parameter).
      
      Without this patch, obscure but quite possible races could
      cause a quiescent state that occurred during one grace period
      to be accounted to the following grace period, causing that
      following grace period to end prematurely.  Not good!
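      
      A sketch of the double-checked pattern described above, using
      2.6.32-era field names; the helper's exact signature and placement
      are assumptions:
      
          static void note_new_gpnum_sketch(struct rcu_data *rdp)
          {
              struct rcu_node *rnp = rdp->mynode;
              unsigned long flags;
      
              if (rdp->gpnum == ACCESS_ONCE(rnp->gpnum))
                  return;     /* unlocked check: nothing new */
              if (!spin_trylock_irqsave(&rnp->lock, flags))
                  return;     /* contended: retry on a later pass */
              if (rdp->gpnum != rnp->gpnum)
                  rdp->gpnum = rnp->gpnum;  /* update only under ->lock */
              spin_unlock_irqrestore(&rnp->lock, flags);
          }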
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: <stable@kernel.org> # .32.x
      LKML-Reference: <12571987492350-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Fix synchronization for rcu_process_gp_end() uses of ->completed counter · d09b62df
      Committed by Paul E. McKenney
      Impose a clear locking design on the rcu_process_gp_end()
      function's use of the ->completed counter.  This is done by
      creating a ->completed field in the rcu_node structure, which
      can safely be accessed under the protection of that structure's
      lock.  Performance and scalability are maintained by using a
      form of double-checked locking, so that rcu_process_gp_end()
      only acquires the leaf rcu_node structure's ->lock if a grace
      period has recently ended.
      
      This fix reduces rcutorture failure rate by at least two orders
      of magnitude under heavy stress with force_quiescent_state()
      being invoked artificially often.  Without this fix,
      unsynchronized access to the ->completed field can cause
      rcu_process_gp_end() to advance callbacks whose grace period has
      not yet expired.  (Bad idea!)
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: <stable@kernel.org> # .32.x
      LKML-Reference: <12571987494069-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling of ->completed counter · 281d150c
      Committed by Paul E. McKenney
      Impose a clear locking design on non-NO_HZ handling of the
      ->completed counter.  This increases the distance between the
      RCU and the CPU-hotplug mechanisms.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: <stable@kernel.org> # .32.x
      LKML-Reference: <12571987491353-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Merge branch 'core/urgent' into core/rcu · 7e1a2766
      Committed by Ingo Molnar
      Merge reason: Pick up RCU fixlet to base further commits on.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 02 Nov 2009 (3 commits)
    • rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls · c5e0cb3d
      Committed by Lai Jiangshan
      Currently, rcu_irq_exit() is invoked only for CONFIG_NO_HZ,
      while rcu_irq_enter() is invoked unconditionally.  This patch
      moves rcu_irq_exit() out from under CONFIG_NO_HZ so that the
      calls are balanced.
      
      This patch has no effect on the behavior of the kernel because
      both rcu_irq_enter() and rcu_irq_exit() are empty for
      !CONFIG_NO_HZ, but the code is easier to understand if the calls
      are obviously balanced in all cases.
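      
      For reference, the !CONFIG_NO_HZ side reduces to empty inline
      stubs, so balancing the call sites costs nothing at runtime; a
      sketch of the declarations (header placement assumed):
      
          #ifdef CONFIG_NO_HZ
          extern void rcu_irq_enter(void);
          extern void rcu_irq_exit(void);
          #else
          static inline void rcu_irq_enter(void) { }  /* compiles away */
          static inline void rcu_irq_exit(void) { }   /* compiles away */
          #endif /* CONFIG_NO_HZ */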
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12567428891605-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Fix long-grace-period race between forcing and initialization · 83f5b01f
      Committed by Paul E. McKenney
      Very long RCU read-side critical sections (50 milliseconds or
      so) can cause a race between force_quiescent_state() and
      rcu_start_gp() as follows on kernel builds with multi-level
      rcu_node hierarchies:
      
      1.	CPU 0 calls force_quiescent_state(), sees that there is a
      	grace period in progress, and acquires ->fsqlock.
      
      2.	CPU 1 detects the end of the grace period, and so
      	cpu_quiet_msk_finish() sets rsp->completed to rsp->gpnum.
      	This operation is carried out under the root rnp->lock,
      	but CPU 0 has not yet acquired that lock.  Note that
      	rsp->signaled is still RCU_SAVE_DYNTICK from the last
      	grace period.
      
      3.	CPU 1 calls rcu_start_gp(), but no one wants a new grace
      	period, so it drops the root rnp->lock and returns.
      
      4.	CPU 0 acquires the root rnp->lock and picks up rsp->completed
      	and rsp->signaled, then drops rnp->lock.  It then enters the
      	RCU_SAVE_DYNTICK leg of the switch statement.
      
      5.	CPU 2 invokes call_rcu(), and now needs a new grace period.
      	It calls rcu_start_gp(), which acquires the root rnp->lock, sets
      	rsp->signaled to RCU_GP_INIT (too bad that CPU 0 is already in
      	the RCU_SAVE_DYNTICK leg of the switch statement!)  and starts
      	initializing the rcu_node hierarchy.  If there are multiple
      	levels to the hierarchy, it will drop the root rnp->lock and
      	initialize the lower levels of the hierarchy.
      
      6.	CPU 0 notes that rsp->completed has not changed, which
      	permits both CPU 2 and CPU 0 to try updating it concurrently.
      	If CPU 0's update prevails, later calls to
      	force_quiescent_state() can count old quiescent states
      	against the new grace period, which can in turn result in
      	premature ending of grace periods.
      
      	Not good.
      
      This patch adds an RCU_GP_IDLE state for rsp->signaled that is
      set initially at boot time and any time a grace period ends.
      This prevents CPU 0 from getting into the workings of
      force_quiescent_state() in step 4.  Additional locking and
      checks prevent the concurrent update of rsp->signaled in step 6.
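      
      A sketch of the guard this state makes possible; its exact
      placement inside force_quiescent_state() is assumed:
      
          spin_lock_irqsave(&rnp->lock, flags);  /* root rcu_node lock */
          if (rsp->signaled == RCU_GP_IDLE) {
              /*
               * The grace period ended (or never started): there is
               * nothing to force until rcu_start_gp() moves
               * ->signaled out of the idle state again.
               */
              spin_unlock_irqrestore(&rnp->lock, flags);
              return;
          }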
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1256742889199-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • uids: Prevent tear down race · b00bc0b2
      Committed by Thomas Gleixner
      Ingo triggered the following warning:
      
      WARNING: at lib/debugobjects.c:255 debug_print_object+0x42/0x50()
      Hardware name: System Product Name
      ODEBUG: init active object type: timer_list
      Modules linked in:
      Pid: 2619, comm: dmesg Tainted: G        W  2.6.32-rc5-tip+ #5298
      Call Trace:
       [<81035443>] warn_slowpath_common+0x6a/0x81
       [<8120e483>] ? debug_print_object+0x42/0x50
       [<81035498>] warn_slowpath_fmt+0x29/0x2c
       [<8120e483>] debug_print_object+0x42/0x50
       [<8120ec2a>] __debug_object_init+0x279/0x2d7
       [<8120ecb3>] debug_object_init+0x13/0x18
       [<810409d2>] init_timer_key+0x17/0x6f
       [<81041526>] free_uid+0x50/0x6c
       [<8104ed2d>] put_cred_rcu+0x61/0x72
       [<81067fac>] rcu_do_batch+0x70/0x121
      
      debugobjects warns about an enqueued timer being initialized. If
      CONFIG_USER_SCHED=y the user management code uses delayed work to
      remove the user from the hash table and tear down the sysfs objects.
      
      free_uid is called from RCU and initializes/schedules delayed work if
      the usage count of the user_struct is 0. The init/schedule happens
      outside of the uidhash_lock protected region which allows a concurrent
      caller of find_user() to reference the about to be destroyed
      user_struct w/o preventing the work from being scheduled. If the next
      free_uid call happens before the work timer has expired, the active
      timer is initialized and the work is scheduled again.
      
      The race was introduced in commit 5cb350ba (sched: group scheduling,
      sysfs tunables) and made more prominent by commit 3959214f (sched:
      delayed cleanup of user_struct).
      
      Move the init/schedule_delayed_work inside the uidhash_lock
      protected region to prevent the race.
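      
      Roughly the shape of free_uid() once the scheduling is serialized
      under uidhash_lock; treat this as a sketch rather than the literal
      diff:
      
          void free_uid(struct user_struct *up)
          {
              unsigned long flags;
      
              if (!up)
                  return;
              local_irq_save(flags);
              if (atomic_dec_and_lock(&up->__count, &uidhash_lock))
                  free_user(up, flags);  /* inits and schedules the
                                          * delayed work while still
                                          * holding uidhash_lock */
              else
                  local_irq_restore(flags);
          }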
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@us.ibm.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: stable@kernel.org
  6. 29 Oct 2009 (1 commit)
    • futex: Fix spurious wakeup for requeue_pi really · 11df6ddd
      Committed by Thomas Gleixner
      The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
      NULL test) nor does it use the wake_list of futex_wake(), which were
      the reason for commit 41890f2 (futex: Handle spurious wake up)
      
      See the debugging discussion on LKML, Message-ID: <4AD4080C.20703@us.ibm.com>
      
      The changes in this fix to the wait_requeue_pi path were considered a
      likely unnecessary, but harmless, safety net. But it turns out that
      because, for unknown $@#!*( reasons, EWOULDBLOCK is defined as EAGAIN,
      we built an endless loop in the code path which correctly returns
      EWOULDBLOCK.
      
      Spurious wakeups in the wait_requeue_pi code path are unlikely, so we
      take the easy solution and return EWOULDBLOCK^WEAGAIN to user space
      and let it deal with the spurious wakeup.
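      
      The trap in miniature; wait_once() is a hypothetical stand-in for
      one pass through the futex wait path:
      
          #include <errno.h>
      
          static int wait_once(void);  /* hypothetical */
      
          static int wait_until_done(void)
          {
              int ret = wait_once();
      
              /*
               * On Linux, EWOULDBLOCK is #defined as EAGAIN, so this
               * retry loop also swallows the -EWOULDBLOCK that was
               * meant to reach user space: an endless loop.
               */
              while (ret == -EAGAIN)
                  ret = wait_once();
              return ret;
          }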
      
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      LKML-Reference: <4AE23C74.1090502@us.ibm.com>
      Cc: stable@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  7. 27 Oct 2009 (2 commits)
  8. 26 Oct 2009 (6 commits)
    • rcu: Do tiny cleanups in rcutiny · 4ce5b903
      Committed by Ingo Molnar
      No change in functionality - just straighten out a few small
      stylistic details.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: avi@redhat.com
      Cc: mtosatti@redhat.com
      LKML-Reference: <12565226351355-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Improve rcutorture diagnostics when bad torture_type specified · cf886c44
      Committed by Paul E. McKenney
      Make rcutorture list the available torture_type values when it
      doesn't like the one specified.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Josh Triplett <josh@joshtriplett.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: avi@redhat.com
      Cc: mtosatti@redhat.com
      LKML-Reference: <12565226351868-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Add synchronize_srcu_expedited() to the documentation · 64179861
      Committed by Paul E. McKenney
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Josh Triplett <josh@joshtriplett.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: avi@redhat.com
      Cc: mtosatti@redhat.com
      LKML-Reference: <12565226354176-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Add synchronize_srcu_expedited() to the rcutorture test suite · 804bb837
      Committed by Paul E. McKenney
      Adds the "srcu_expedited" torture type, and also renames
      sched_ops_sync to sched_sync_ops for consistency while we are in
      this file.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Josh Triplett <josh@joshtriplett.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: avi@redhat.com
      Cc: mtosatti@redhat.com
      LKML-Reference: <12565226353636-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Add synchronize_srcu_expedited() · 0cd397d3
      Committed by Paul E. McKenney
      This patch creates a synchronize_srcu_expedited() that uses
      synchronize_sched_expedited() where synchronize_srcu()
      uses synchronize_sched().  The synchronize_srcu() and
      synchronize_srcu_expedited() functions become one-liners that
      pass synchronize_sched() or synchronize_sched_expedited(),
      respectively, to a new __synchronize_srcu() function.
      
      While in the file, move the EXPORT_SYMBOL_GPL()s to immediately
      follow the corresponding functions.
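      
      The resulting shape, following the commit's own description (exact
      prototypes assumed):
      
          static void __synchronize_srcu(struct srcu_struct *sp,
                                         void (*sync_func)(void))
          {
              /* ... the former synchronize_srcu() body, calling
               * sync_func() where synchronize_sched() used to be ... */
          }
      
          void synchronize_srcu(struct srcu_struct *sp)
          {
              __synchronize_srcu(sp, synchronize_sched);
          }
          EXPORT_SYMBOL_GPL(synchronize_srcu);
      
          void synchronize_srcu_expedited(struct srcu_struct *sp)
          {
              __synchronize_srcu(sp, synchronize_sched_expedited);
          }
          EXPORT_SYMBOL_GPL(synchronize_srcu_expedited);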
      Requested-by: Avi Kivity <avi@redhat.com>
      Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Josh Triplett <josh@joshtriplett.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: avi@redhat.com
      LKML-Reference: <12565226354038-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: "Tiny RCU", The Bloatwatch Edition · 9b1d82fa
      Committed by Paul E. McKenney
      This patch provides a version of RCU designed for !SMP systems
      that need a small-footprint RCU implementation.  In particular,
      the implementation of synchronize_rcu() is extremely lightweight
      and high-performance.  It passes rcutorture testing in each of the
      four relevant configurations (combinations of NO_HZ and PREEMPT)
      on x86.  This saves about 1K bytes compared to old Classic RCU
      (which is no longer in mainline), and more than three kilobytes
      compared to Hierarchical RCU (updated to 2.6.30):
      
      	CONFIG_TREE_RCU:
      
      	   text	   data	    bss	    dec	    filename
      	    183       4       0     187     kernel/rcupdate.o
      	   2783     520      36    3339     kernel/rcutree.o
      				   3526 Total (vs 4565 for v7)
      
      	CONFIG_TREE_PREEMPT_RCU:
      
      	   text	   data	    bss	    dec	    filename
      	    263       4       0     267     kernel/rcupdate.o
      	   4594     776      52    5422     kernel/rcutree.o
      				   5689 Total (vs 6155 for v7)
      
      	CONFIG_TINY_RCU:
      
      	   text	   data	    bss	    dec	    filename
      	     96       4       0     100     kernel/rcupdate.o
      	    734      24       0     758     kernel/rcutiny.o
      	    			    858 Total (vs 848 for v7)
      
      The above is for x86.  Your mileage may vary on other platforms.
      Further compression is possible, but is being procrastinated.
      
      Changes from v7 (http://lkml.org/lkml/2009/10/9/388)
      
      o	Apply Lai Jiangshan's review comments (aside from
      	might_sleep() in synchronize_sched(), which is covered by
      	SMP builds).
      
      o	Fix up expedited primitives.
      
      Changes from v6 (http://lkml.org/lkml/2009/9/23/293).
      
      o	Forward ported to put it into the 2.6.33 stream.
      
      o	Added lockdep support.
      
      o	Make lightweight rcu_barrier.
      
      Changes from v5 (http://lkml.org/lkml/2009/6/23/12).
      
      o	Ported to latest pre-2.6.32 merge window kernel.
      
      	- Renamed rcu_qsctr_inc() to rcu_sched_qs().
      	- Renamed rcu_bh_qsctr_inc() to rcu_bh_qs().
      	- Provided trivial rcu_cpu_notify().
      	- Provided trivial exit_rcu().
      	- Provided trivial rcu_needs_cpu().
      	- Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h.
      
      o	Removed the dependence on EMBEDDED, with a view to making
      	TINY_RCU default for !SMP at some time in the future.
      
      o	Added (trivial) support for expedited grace periods.
      
      Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include:
      
      o	Squeeze the size down a bit further by removing the
      	->completed field from struct rcu_ctrlblk.
      
      o	This permits synchronize_rcu() to become the empty function.
      	Previous concerns about rcutorture were unfounded, as
      	rcutorture correctly handles a constant value from
      	rcu_batches_completed() and rcu_batches_completed_bh().
      
      Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include:
      
      o	Changed rcu_batches_completed(), rcu_batches_completed_bh()
      	rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and
      	rcu_nmi_exit(), to be static inlines, as suggested by David
      	Howells.  Doing this saves about 100 bytes from rcutiny.o.
      	(The numbers between v3 and this v4 of the patch are not directly
      	comparable, since they are against different versions of Linux.)
      
      Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include:
      
      o	Fix whitespace issues.
      
      o	Change short-circuit "||" operator to instead be "+" in
      	order to fix performance bug noted by "kraai" on LWN.
      
      		(http://lwn.net/Articles/324348/)
      
      Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include:
      
      o	This version depends on EMBEDDED as well as !SMP, as suggested
      	by Ingo.
      
      o	Updated rcu_needs_cpu() to unconditionally return zero,
      	permitting the CPU to enter dynticks-idle mode at any time.
      	This works because callbacks can be invoked upon entry to
      	dynticks-idle mode.
      
      o	Paul is now OK with this being included, based on a poll
      	at the Kernel Miniconf at linux.conf.au, where about ten
      	people said that they cared about saving 900 bytes on
      	single-CPU systems.
      
      o	Applies to both mainline and tip/core/rcu.
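      
      To make the size savings concrete: per the v4 note above, the
      heart of the !SMP implementation can be sketched as an (almost)
      empty function, where the comment is the entire algorithm:
      
          void synchronize_rcu(void)
          {
              /*
               * On a single-CPU, non-preemptible kernel, the mere
               * fact that this code is running means the CPU is not
               * inside an RCU read-side critical section, so a grace
               * period has already elapsed: nothing to wait for.
               */
          }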
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Acked-by: Josh Triplett <josh@joshtriplett.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: avi@redhat.com
      Cc: mtosatti@redhat.com
      LKML-Reference: <12565226351355-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 16 Oct 2009 (2 commits)
    • futex: Move drop_futex_key_refs out of spinlock'ed region · 89061d3d
      Committed by Darren Hart
      When requeuing tasks from one futex to another, the reference held
      by the requeued task to the original futex location needs to be
      dropped eventually.
      
      Dropping the reference may ultimately lead to a call to
      "iput_final" and subsequently into filesystem-specific code,
      which may be non-atomic.
      
      It is therefore safer to defer this drop operation until after the
      futex_hash_bucket spinlock has been dropped.
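      
      A sketch of the reordering in the requeue loop; the variable names
      follow kernel/futex.c of that era, but the exact diff is assumed:
      
          /* Inside futex_requeue(): while the hash-bucket locks are
           * held, only count the drops ... */
          drop_count++;
      
          /* ... and perform them after both locks are released, since
           * drop_futex_key_refs() may end in iput_final() and in
           * non-atomic filesystem code. */
          spin_unlock(&hb1->lock);
          if (hb1 != hb2)
              spin_unlock(&hb2->lock);
          while (--drop_count >= 0)
              drop_futex_key_refs(&key1);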
      
      Originally-From: Helge Bahmann <hcb@chaoticmind.net>
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: <stable@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      Cc: Sven-Thorsten Dietrich <sdietrich@novell.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <4AD7A298.5040802@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang · 237c80c5
      Committed by Paul E. McKenney
      If the following sequence of events occurs, then
      TREE_PREEMPT_RCU will hang waiting for a grace period to
      complete, eventually OOMing the system:
      
      o	A TREE_PREEMPT_RCU build of the kernel is booted on a system
      	with more than 64 physical CPUs present (32 on a 32-bit system).
      	Alternatively, a TREE_PREEMPT_RCU build of the kernel is booted
      	with RCU_FANOUT set to a sufficiently small value that the
      	physical CPUs populate two or more leaf rcu_node structures.
      
      o	A task is preempted in an RCU read-side critical section
      	while running on a CPU corresponding to a given leaf rcu_node
      	structure.
      
      o	All CPUs corresponding to this same leaf rcu_node structure
      	record quiescent states for the current grace period.
      
      o	All of these same CPUs go offline (hence the need for enough
      	physical CPUs to populate more than one leaf rcu_node structure).
      	This causes the preempted task to be moved to the root rcu_node
      	structure.
      
      At this point, there is nothing left to cause the quiescent
      state to be propagated up the rcu_node tree, so the current
      grace period never completes.
      
      The simplest fix, especially after considering the deadlock
      possibilities, is to detect this situation when the last CPU is
      offlined, and to set that CPU's ->qsmask bit in its leaf
      rcu_node structure.  This will cause the next invocation of
      force_quiescent_state() to end the grace period.
      
      Without this fix, this hang can be triggered in an hour or so on
      some machines with rcutorture and random CPU onlining/offlining.
      With this fix, these same machines pass a full 10 hours of this
      sort of abuse.
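      
      A sketch of the fix in the offline path; the field names follow
      rcutree, but the exact condition and placement are assumptions:
      
          spin_lock_irqsave(&rnp->lock, flags);
          rnp->qsmaskinit &= ~mask;   /* exclude CPU from future GPs */
          if (rnp->qsmaskinit == 0) {
              /*
               * Last CPU of this leaf went offline while preempted
               * tasks may still block the grace period: re-set the
               * outgoing CPU's bit so that the next
               * force_quiescent_state() pass reports the quiescent
               * state and ends the GP.
               */
              rnp->qsmask |= mask;
          }
          spin_unlock_irqrestore(&rnp->lock, flags);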
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <20091015162614.GA19131@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  10. 15 Oct 2009 (6 commits)
    • rcu: Update trace.txt documentation for blocked-tasks lists · 0edf1a68
      Committed by Paul E. McKenney
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: npiggin@suse.de
      Cc: jens.axboe@oracle.com
      LKML-Reference: <12555405592804-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Update trace.txt documentation to reflect recent changes · bd58b430
      Committed by Paul E. McKenney
      o	Remove the CONFIG_PREEMPT_RCU documentation since this
      	config option has now been removed.
      
      o	Change the now-incorrect references to "rcu" labels to
      	instead be "rcu_sched".
      
      o	Add notes stating that CONFIG_TREE_PREEMPT_RCU kernels will
      	have additional "rcu_preempt" output.
      
      o	Note the new "oqlen" field in the rcuhier output (for
      	RCU callbacks orphaned by an offlined CPU).
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: npiggin@suse.de
      Cc: jens.axboe@oracle.com
      LKML-Reference: <1255540559799-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Add rnp->blocked_tasks to tracing · 3397e040
      Committed by Paul E. McKenney
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: npiggin@suse.de
      Cc: jens.axboe@oracle.com
      Cc: Josh Triplett <josh@joshtriplett.org>
      LKML-Reference: <20091014233638.GE6763@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
       kernel/rcutree_trace.c |    8 ++++++--
       1 file changed, 6 insertions(+), 2 deletions(-)
    • rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU · 019129d5
      Committed by Paul E. McKenney
      For the short term, map synchronize_rcu_expedited() to
      synchronize_rcu() for TREE_PREEMPT_RCU and to
      synchronize_sched_expedited() for TREE_RCU.
      
      Longer term, there needs to be a real expedited grace period for
      TREE_PREEMPT_RCU, but candidate patches to date are considerably
      more complex and intrusive.
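      
      The mapping itself, per the commit text (header placement and
      inline-ness assumed):
      
          #ifdef CONFIG_TREE_PREEMPT_RCU
          static inline void synchronize_rcu_expedited(void)
          {
              synchronize_rcu();  /* stopgap: not actually expedited */
          }
          #else /* CONFIG_TREE_RCU */
          static inline void synchronize_rcu_expedited(void)
          {
              synchronize_sched_expedited();
          }
          #endif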
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: npiggin@suse.de
      Cc: jens.axboe@oracle.com
      LKML-Reference: <12555405592331-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Prevent RCU IPI storms in presence of high call_rcu() load · 37c72e56
      Committed by Paul E. McKenney
      As the number of callbacks on a given CPU rises, invoke
      force_quiescent_state() only every blimit number of callbacks
      (defaults to 10,000), and even then only if no other CPU has
      invoked force_quiescent_state() in the meantime.
      
      This should fix the performance regression reported by Nick.
      Reported-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: jens.axboe@oracle.com
      LKML-Reference: <12555405592133-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • futex: Check for NULL keys in match_futex · 2bc87203
      Committed by Darren Hart
      If userspace tries to perform a requeue_pi on a non-requeue_pi waiter,
      it will find the futex_q->requeue_pi_key to be NULL and OOPS.
      
      Check for NULL in match_futex() instead of doing explicit NULL pointer
      checks on all call sites.  While match_futex(NULL, NULL) returning
      false is a little odd, it's still correct as we expect valid key
      references.
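      
      The check itself is tiny; this follows the commit description and
      should be read as a sketch of kernel/futex.c's match_futex():
      
          static inline int match_futex(union futex_key *key1,
                                        union futex_key *key2)
          {
              return (key1 && key2
                      && key1->both.word == key2->both.word
                      && key1->both.ptr == key2->both.ptr
                      && key1->both.offset == key2->both.offset);
          }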
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: stable@kernel.org
      LKML-Reference: <4AD60687.10306@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  11. 14 Oct 2009 (1 commit)
    • futex: Handle spurious wake up · d58e6576
      Committed by Thomas Gleixner
      The futex code does not handle spurious wake up in futex_wait and
      futex_wait_requeue_pi.
      
      The code assumes that any wake up which was not caused by futex_wake /
      requeue or by a timeout was caused by a signal wake up and returns one
      of the syscall restart error codes.
      
      In case of a spurious wake up the signal delivery code which deals
      with the restart error codes is not invoked and we return that error
      code to user space. That causes applications which actually check the
      return codes to fail. Blaise reported that on preempt-rt a python test
      program ran into an exception trap. -rt exposed this thanks to a
      built-in spurious wake up accelerator :)
      
      Solve this by checking signal_pending(current) in the wake up path
      and handling the spurious wake up case w/o returning to user space.
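      
      In outline, the wait path becomes a loop of the following shape;
      the helper names here are hypothetical placeholders:
      
          for (;;) {
              /* ... queue on the futex and schedule() ... */
              if (!still_queued(q))        /* hypothetical check */
                  break;                   /* genuine futex_wake() */
              if (timed_out(to))           /* hypothetical check */
                  break;                   /* -ETIMEDOUT path */
              if (signal_pending(current))
                  break;                   /* restart codes handled by
                                            * signal delivery */
              /* None of the above: a spurious wakeup, so sleep again
               * rather than leak a restart error to user space. */
          }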
      Reported-by: Blaise Gassend <blaise@willowgarage.com>
      Debugged-by: Darren Hart <dvhltc@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@kernel.org
      LKML-Reference: <new-submission>
  12. 13 Oct 2009 (1 commit)
  13. 10 Oct 2009 (2 commits)
    • oprofile: warn on freeing event buffer too early · c0868934
      Committed by Robert Richter
      A race shouldn't happen since all workqueues or handlers are canceled
      or flushed before the event buffer is freed. A warning is now
      triggered if the buffer is freed too early.
      
      Also, this patch adds some comments about event buffer protection,
      reworks some code and adds code to clear buffer_pos during alloc and
      free of the event buffer.
      
      Cc: David Rientjes <rientjes@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
    • oprofile: fix race condition in event_buffer free · 066b3aa8
      Committed by David Rientjes
      Looking at the 2.6.31-rc9 code, it appears there is a race condition
      in the event_buffer cleanup code path (shutdown). This could lead to
      a kernel panic as some CPUs may be operating on the event buffer AFTER
      it has been freed. The attached patch solves the problem and makes
      sure CPUs check that the buffer is not NULL before they access it, as
      some may have been spinning on the mutex while the buffer was being
      freed.
      
      The race may happen if the buffer is freed during pending reads. But
      it is not clear why there are races in add_event_entry() since all
      workqueues or handlers are canceled or flushed before the event buffer
      is freed.
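      
      A sketch of the reader-side check the patch adds; the names follow
      drivers/oprofile/event_buffer.c, with the details assumed:
      
          mutex_lock(&buffer_mutex);
      
          /* The buffer may have been freed while we slept on the mutex. */
          if (!event_buffer) {
              retval = -EINTR;
              goto out;
          }
      
          /* ... safe to copy entries out of event_buffer ... */
      out:
          mutex_unlock(&buffer_mutex);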
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
  14. 09 Oct 2009 (6 commits)