1. 05 Jul, 2008 · 1 commit
2. 01 Jul, 2008 · 1 commit
    • rcu: fix hotplug vs rcu race · 8558f8f8
      Gautham R Shenoy authored
      Dhaval Giani reported this warning during cpu hotplug stress-tests:
      
      | On running kernel compiles in parallel with cpu hotplug:
      |
      | WARNING: at arch/x86/kernel/smp.c:118
      | native_smp_send_reschedule+0x21/0x36()
      | Modules linked in:
      | Pid: 27483, comm: cc1 Not tainted 2.6.26-rc7 #1
      | [...]
      |  [<c0110355>] native_smp_send_reschedule+0x21/0x36
      |  [<c014fe8f>] force_quiescent_state+0x47/0x57
      |  [<c014fef0>] call_rcu+0x51/0x6d
      |  [<c01713b3>] __fput+0x130/0x158
      |  [<c0171231>] fput+0x17/0x19
      |  [<c016fd99>] filp_close+0x4d/0x57
      |  [<c016fdff>] sys_close+0x5c/0x97
      
      IMHO the warning is a spurious one.
      
      cpu_online_map is updated by the _cpu_down() using stop_machine_run().
      Since force_quiescent_state is invoked from irqs disabled section,
      stop_machine_run() won't be executing while a cpu is executing
      force_quiescent_state(). Hence the cpu_online_map is stable while we're
      in the irq disabled section.
      
      However, a cpu might have been offlined _just_ before we disabled irqs
      while entering force_quiescent_state(). And rcu subsystem might not yet
      have handled the CPU_DEAD notification, leading to the offlined cpu's
      bit being set in the rcp->cpumask.
      
Hence we compute cpumask = (rcp->cpumask & cpu_online_map) to prevent
sending smp_send_reschedule() to an offlined CPU.
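
A minimal sketch of what the fix boils down to, modeled on the
2.6.26-era rcuclassic force_quiescent_state() (local declarations,
locking and the rest of the function omitted; an illustration of the
masking, not the verbatim patch):

	if (unlikely(!rcp->signaled)) {
		rcp->signaled = 1;
		/*
		 * Don't send an IPI to ourselves: with irqs disabled,
		 * rdp->cpu is the current cpu.
		 *
		 * cpu_online_map is stable here (irqs are off, so
		 * stop_machine_run() cannot run concurrently), but
		 * rcp->cpumask may still carry the bit of a cpu that
		 * went offline just before irqs were disabled, so
		 * mask it out before sending IPIs.
		 */
		cpumask = rcp->cpumask;
		cpu_clear(rdp->cpu, cpumask);
		cpus_and(cpumask, cpumask, cpu_online_map);
		for_each_cpu_mask(cpu, cpumask)
			smp_send_reschedule(cpu);
	}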
      
      Here's the timeline:
      
      CPU_A						 CPU_B
      --------------------------------------------------------------
      cpu_down():					.
      .					   	.
      .						.
      stop_machine(): /* disables preemption,		.
      		 * and irqs */			.
      .						.
      .						.
      take_cpu_down();				.
      .						.
      .						.
      .						.
      cpu_disable(); /*this removes cpu 		.
      		*from cpu_online_map 		.
      		*/				.
      .						.
      .						.
      restart_machine(); /* enables irqs */		.
      ------WINDOW DURING WHICH rcp->cpumask is stale ---------------
      .						call_rcu();
      .						/* disables irqs here */
      .						.force_quiescent_state();
      .CPU_DEAD:					.for_each_cpu(rcp->cpumask)
      .						.   smp_send_reschedule();
      .						.
      .						.   WARN_ON() for offlined CPU!
      .
      .
      .
      rcu_cpu_notify:
      .
      -------- WINDOW ENDS ------------------------------------------
      rcu_offline_cpu() /* Which calls cpu_quiet()
      		   * which removes
      		   * cpu from rcp->cpumask.
      		   */
      
If a new batch was started just before calling stop_machine_run(), the
"to-be-offlined" cpu is still present in rcp->cpumask.
      
      During a cpu-offline, from take_cpu_down(), we queue an rt-prio idle
      task as the next task to be picked by the scheduler. We also call
      cpu_disable() which will disable any further interrupts and remove the
      cpu's bit from the cpu_online_map.
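
Roughly, take_cpu_down() looks like this (a simplified sketch of the
kernel/cpu.c code of that era; the CPU_DYING notification and error
handling details are trimmed):

	static int take_cpu_down(void *unused)
	{
		int err;

		/* Ensure this CPU doesn't handle any more interrupts. */
		err = __cpu_disable();	/* clears the cpu_online_map bit */
		if (err < 0)
			return err;

		/*
		 * Force the idle task to run as soon as we yield: it
		 * should immediately notice that the cpu is offline
		 * and die quickly.
		 */
		sched_idle_next();
		return 0;
	}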
      
      Once the stop_machine_run() successfully calls take_cpu_down(), it calls
      schedule(). That's the last time a schedule is called on the offlined
      cpu, and hence the last time when rdp->passed_quiesc will be set to 1
      through rcu_qsctr_inc().
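
rcu_qsctr_inc() itself is trivial; in the classic RCU implementation of
that time it is essentially:

	static inline void rcu_qsctr_inc(int cpu)
	{
		struct rcu_data *rdp = &per_cpu(rcu_data, cpu);

		rdp->passed_quiesc = 1;
	}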
      
But cpu_quiet() on this cpu will be called only when the next
RCU_SOFTIRQ occurs on this CPU. So at this point, the offlined CPU's
bit is still set in rcp->cpumask.
      
Now coming back to the idle task which truly offlines the CPU: it does
check for a pending RCU and raises the softirq, since it will find
rdp->passed_quiesc to be 0 in this case. However, since the cpu is
offline, I am not sure if the softirq will trigger on the CPU.
      
Even if it doesn't, rcu_offline_cpu() will find that rcp->completed
      is not the same as rcp->cur, which means that our cpu could be holding
      up the grace period progression. Hence we call cpu_quiet() and move
      ahead.
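
A sketch of that CPU_DEAD path, modeled on the 2.6.26 rcuclassic code
(the migration of the dead cpu's pending callbacks is elided):

	static void __rcu_offline_cpu(struct rcu_data *this_rdp,
				      struct rcu_ctrlblk *rcp,
				      struct rcu_data *rdp)
	{
		/*
		 * If the cpu going offline still owes the current
		 * grace period a quiescent state, report it on the
		 * cpu's behalf so the grace period can progress.
		 */
		if (rcp->cur != rcp->completed)
			cpu_quiet(rdp->cpu, rcp);
		/* rcu_move_batch() calls migrate callbacks here */
	}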
      
But because of the window explained in the timeline, we could still
have a call_rcu() before the RCU subsystem executes its CPU_DEAD
notification, and we send smp_send_reschedule() to an offlined cpu
while trying to force the quiescent states. The appended patch adds
comments and masks rcp->cpumask with cpu_online_map instead of checking
each cpu for being online every time.
      
Reported-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8558f8f8
3. 27 Jun, 2008 · 1 commit
4. 23 Jun, 2008 · 1 commit
    • futexes: fix fault handling in futex_lock_pi · 1b7558e4
      Thomas Gleixner authored
      This patch addresses a very sporadic pi-futex related failure in
      highly threaded java apps on large SMP systems.
      
      David Holmes reported that the pi_state consistency check in
      lookup_pi_state triggered with his test application. This means that
      the kernel internal pi_state and the user space futex variable are out
      of sync. First we assumed that this is a user space data corruption,
but deeper investigation revealed that the problem happened because the
      pi-futex code is not handling a fault in the futex_lock_pi path when
      the user space variable needs to be fixed up.
      
The fault happens when a fork has mapped the anon memory which contains
the futex read-only for COW, or the page got swapped out, exactly
between the unlock of the futex and the return of either the new futex
owner or the task which was the expected owner but failed to acquire
the kernel internal rtmutex. The current futex_lock_pi() code drops out
with an inconsistent state in case it faults and returns -EFAULT to
user space. User space has no way to fix up that state.
      
      When we wrote this code we thought that we could not drop the hash
      bucket lock at this point to handle the fault.
      
After analysing the code again, it turned out that this assumption was
wrong, because only two tasks are involved which might modify the
pi_state and the user space variable:
      
       - the task which acquired the rtmutex
       - the pending owner of the pi_state which did not get the rtmutex
      
      Both tasks drop into the fixup_pi_state() function before returning to
      user space. The first task which acquired the hash bucket lock faults
      in the fixup of the user space variable, drops the spinlock and calls
futex_handle_fault() to fault in the page. Now the second task can
acquire the hash bucket lock and try to fix up the user space variable
as well. It either faults too, or it succeeds because the first task
has already faulted the page in.
      
      One caveat is to avoid a double fixup. After returning from the fault
      handling we reacquire the hash bucket lock and check whether the
      pi_state owner has been modified already.
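
In rough pseudo-C, the retry pattern this introduces in the fixup path
looks like the following (names follow the 2.6.26-era kernel/futex.c;
this is a sketch of the shape of the change, not the applied diff):

	handle_fault:
		/* Drop the hash bucket lock and fault the page in. */
		spin_unlock(q->lock_ptr);
		ret = futex_handle_fault((unsigned long)uaddr, fshared,
					 attempt++);
		/* Reacquire the lock before touching pi_state again. */
		spin_lock(q->lock_ptr);
		/*
		 * Check whether the other task already fixed the
		 * pi_state owner up for us while we were faulting
		 * the page in - this avoids the double fixup.
		 */
		if (pi_state->owner != oldowner)
			return 0;
		if (ret)
			return ret;
		goto retry;
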
Reported-by: David Holmes <david.holmes@sun.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Holmes <david.holmes@sun.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
      
       kernel/futex.c |   93 ++++++++++++++++++++++++++++++++++++++++++++-------------
       1 file changed, 73 insertions(+), 20 deletions(-)
      1b7558e4
5. 22 Jun, 2008 · 4 commits
6. 21 Jun, 2008 · 29 commits
7. 20 Jun, 2008 · 3 commits