1. 11 6月, 2013 8 次提交
    • P
      rcu: Merge adjacent identical ifdefs · 4982969d
      Paul E. McKenney 提交于
      Two ifdefs in kernel/rcupdate.c now have identical conditions with
      nothing between them, so the commit merges them into a single ifdef.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      4982969d
    • P
      rcu: Drive quiescent-state-forcing delay from HZ · 026ad283
      Paul E. McKenney 提交于
      Systems with HZ=100 can have slow bootup times due to the default
      three-jiffy delays between quiescent-state forcing attempts.  This
      commit therefore auto-tunes the RCU_JIFFIES_TILL_FORCE_QS value based
      on the value of HZ.  However, this would break very large systems that
      require more time between quiescent-state forcing attempts.  This
      commit therefore also ups the default delay by one jiffy for each
      256 CPUs that might be on the system (based off of nr_cpu_ids at
      runtime, -not- NR_CPUS at build time).
      
      Updated to collapse #ifdefs for RCU_JIFFIES_TILL_FORCE_QS into a
      step-function definition as suggested by Josh Triplett.
      Reported-by: NPaul Mackerras <paulus@au1.ibm.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      026ad283
    • P
      rcu: Remove "Experimental" flags · 9a5739d7
      Paul E. McKenney 提交于
      After a release or two, features are no longer experimental.  Therefore,
      this commit removes the "Experimental" tag from them.
      Reported-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      9a5739d7
    • P
      rcu: Convert rcutree_plugin.h printk calls · efc151c3
      Paul E. McKenney 提交于
      This commit converts printk() calls to the corresponding pr_*() calls.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      efc151c3
    • P
      rcu: Convert rcutree.c printk calls · d7f3e207
      Paul E. McKenney 提交于
      This commit converts printk() calls to the corresponding pr_*() calls.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      d7f3e207
    • P
      rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration · 971394f3
      Paul E. McKenney 提交于
      In Steven Rostedt's words:
      
      > I've been debugging the last couple of days why my tests have been
      > locking up. One of my tracing tests, runs all available tracers. The
      > lockup always happened with the mmiotrace, which is used to trace
      > interactions between priority drivers and the kernel. But to do this
      > easily, when the tracer gets registered, it disables all but the boot
      > CPUs. The lockup always happened after it got done disabling the CPUs.
      >
      > Then I decided to try this:
      >
      > while :; do
      > 	for i in 1 2 3; do
      > 		echo 0 > /sys/devices/system/cpu/cpu$i/online
      > 	done
      > 	for i in 1 2 3; do
      > 		echo 1 > /sys/devices/system/cpu/cpu$i/online
      > 	done
      > done
      >
      > Well, sure enough, that locked up too, with the same users. Doing a
      > sysrq-w (showing all blocked tasks):
      >
      > [ 2991.344562]   task                        PC stack   pid father
      > [ 2991.344562] rcu_preempt     D ffff88007986fdf8     0    10      2 0x00000000
      > [ 2991.344562]  ffff88007986fc98 0000000000000002 ffff88007986fc48 0000000000000908
      > [ 2991.344562]  ffff88007986c280 ffff88007986ffd8 ffff88007986ffd8 00000000001d3c80
      > [ 2991.344562]  ffff880079248a40 ffff88007986c280 0000000000000000 00000000fffd4295
      > [ 2991.344562] Call Trace:
      > [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
      > [ 2991.344562]  [<ffffffff81541750>] schedule_timeout+0xbc/0xf9
      > [ 2991.344562]  [<ffffffff8154bec0>] ? ftrace_call+0x5/0x2f
      > [ 2991.344562]  [<ffffffff81049513>] ? cascade+0xa8/0xa8
      > [ 2991.344562]  [<ffffffff815417ab>] schedule_timeout_uninterruptible+0x1e/0x20
      > [ 2991.344562]  [<ffffffff810c980c>] rcu_gp_kthread+0x502/0x94b
      > [ 2991.344562]  [<ffffffff81062791>] ? __init_waitqueue_head+0x50/0x50
      > [ 2991.344562]  [<ffffffff810c930a>] ? rcu_gp_fqs+0x64/0x64
      > [ 2991.344562]  [<ffffffff81061cdb>] kthread+0xb1/0xb9
      > [ 2991.344562]  [<ffffffff81091e31>] ? lock_release_holdtime.part.23+0x4e/0x55
      > [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
      > [ 2991.344562]  [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0
      > [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
      > [ 2991.344562] kworker/0:1     D ffffffff81a30680     0    47      2 0x00000000
      > [ 2991.344562] Workqueue: events cpuset_hotplug_workfn
      > [ 2991.344562]  ffff880078dbbb58 0000000000000002 0000000000000006 00000000000000d8
      > [ 2991.344562]  ffff880078db8100 ffff880078dbbfd8 ffff880078dbbfd8 00000000001d3c80
      > [ 2991.344562]  ffff8800779ca5c0 ffff880078db8100 ffffffff81541fcf 0000000000000000
      > [ 2991.344562] Call Trace:
      > [ 2991.344562]  [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609
      > [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
      > [ 2991.344562]  [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24
      > [ 2991.344562]  [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609
      > [ 2991.344562]  [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50
      > [ 2991.344562]  [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50
      > [ 2991.344562]  [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40
      > [ 2991.344562]  [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50
      > [ 2991.344562]  [<ffffffff810af7e6>] rebuild_sched_domains_locked+0x6e/0x3a8
      > [ 2991.344562]  [<ffffffff810b0ec6>] rebuild_sched_domains+0x1c/0x2a
      > [ 2991.344562]  [<ffffffff810b109b>] cpuset_hotplug_workfn+0x1c7/0x1d3
      > [ 2991.344562]  [<ffffffff810b0ed9>] ? cpuset_hotplug_workfn+0x5/0x1d3
      > [ 2991.344562]  [<ffffffff81058e07>] process_one_work+0x2d4/0x4d1
      > [ 2991.344562]  [<ffffffff81058d3a>] ? process_one_work+0x207/0x4d1
      > [ 2991.344562]  [<ffffffff8105964c>] worker_thread+0x2e7/0x3b5
      > [ 2991.344562]  [<ffffffff81059365>] ? rescuer_thread+0x332/0x332
      > [ 2991.344562]  [<ffffffff81061cdb>] kthread+0xb1/0xb9
      > [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
      > [ 2991.344562]  [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0
      > [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
      > [ 2991.344562] bash            D ffffffff81a4aa80     0  2618   2612 0x10000000
      > [ 2991.344562]  ffff8800379abb58 0000000000000002 0000000000000006 0000000000000c2c
      > [ 2991.344562]  ffff880077fea140 ffff8800379abfd8 ffff8800379abfd8 00000000001d3c80
      > [ 2991.344562]  ffff8800779ca5c0 ffff880077fea140 ffffffff81541fcf 0000000000000000
      > [ 2991.344562] Call Trace:
      > [ 2991.344562]  [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609
      > [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
      > [ 2991.344562]  [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24
      > [ 2991.344562]  [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609
      > [ 2991.344562]  [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e
      > [ 2991.344562]  [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e
      > [ 2991.344562]  [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40
      > [ 2991.344562]  [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e
      > [ 2991.344562]  [<ffffffff81091c99>] ? __lock_is_held+0x32/0x53
      > [ 2991.344562]  [<ffffffff81548912>] notifier_call_chain+0x6b/0x98
      > [ 2991.344562]  [<ffffffff810671fd>] __raw_notifier_call_chain+0xe/0x10
      > [ 2991.344562]  [<ffffffff8103cf64>] __cpu_notify+0x20/0x32
      > [ 2991.344562]  [<ffffffff8103cf8d>] cpu_notify_nofail+0x17/0x36
      > [ 2991.344562]  [<ffffffff815225de>] _cpu_down+0x154/0x259
      > [ 2991.344562]  [<ffffffff81522710>] cpu_down+0x2d/0x3a
      > [ 2991.344562]  [<ffffffff81526351>] store_online+0x4e/0xe7
      > [ 2991.344562]  [<ffffffff8134d764>] dev_attr_store+0x20/0x22
      > [ 2991.344562]  [<ffffffff811b3c5f>] sysfs_write_file+0x108/0x144
      > [ 2991.344562]  [<ffffffff8114c5ef>] vfs_write+0xfd/0x158
      > [ 2991.344562]  [<ffffffff8114c928>] SyS_write+0x5c/0x83
      > [ 2991.344562]  [<ffffffff8154c494>] tracesys+0xdd/0xe2
      >
      > As well as held locks:
      >
      > [ 3034.728033] Showing all locks held in the system:
      > [ 3034.728033] 1 lock held by rcu_preempt/10:
      > [ 3034.728033]  #0:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff810c9471>] rcu_gp_kthread+0x167/0x94b
      > [ 3034.728033] 4 locks held by kworker/0:1/47:
      > [ 3034.728033]  #0:  (events){.+.+.+}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1
      > [ 3034.728033]  #1:  (cpuset_hotplug_work){+.+.+.}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1
      > [ 3034.728033]  #2:  (cpuset_mutex){+.+.+.}, at: [<ffffffff810b0ec1>] rebuild_sched_domains+0x17/0x2a
      > [ 3034.728033]  #3:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50
      > [ 3034.728033] 1 lock held by mingetty/2563:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 1 lock held by mingetty/2565:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 1 lock held by mingetty/2569:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 1 lock held by mingetty/2572:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 1 lock held by mingetty/2575:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 7 locks held by bash/2618:
      > [ 3034.728033]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff8114bc3f>] file_start_write+0x2a/0x2c
      > [ 3034.728033]  #1:  (&buffer->mutex#2){+.+.+.}, at: [<ffffffff811b3b93>] sysfs_write_file+0x3c/0x144
      > [ 3034.728033]  #2:  (s_active#54){.+.+.+}, at: [<ffffffff811b3c3e>] sysfs_write_file+0xe7/0x144
      > [ 3034.728033]  #3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff810217c2>] cpu_hotplug_driver_lock+0x17/0x19
      > [ 3034.728033]  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8103d196>] cpu_maps_update_begin+0x17/0x19
      > [ 3034.728033]  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103cfd8>] cpu_hotplug_begin+0x2c/0x6d
      > [ 3034.728033]  #6:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e
      > [ 3034.728033] 1 lock held by bash/2980:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      >
      > Things looked a little weird. Also, this is a deadlock that lockdep did
      > not catch. But what we have here does not look like a circular lock
      > issue:
      >
      > Bash is blocked in rcu_cpu_notify():
      >
      > 1961		/* Exclude any attempts to start a new grace period. */
      > 1962		mutex_lock(&rsp->onoff_mutex);
      >
      >
      > kworker is blocked in get_online_cpus(), which makes sense as we are
      > currently taking down a CPU.
      >
      > But rcu_preempt is not blocked on anything. It is simply sleeping in
      > rcu_gp_kthread (really rcu_gp_init) here:
      >
      > 1453	#ifdef CONFIG_PROVE_RCU_DELAY
      > 1454			if ((prandom_u32() % (rcu_num_nodes * 8)) == 0 &&
      > 1455			    system_state == SYSTEM_RUNNING)
      > 1456				schedule_timeout_uninterruptible(2);
      > 1457	#endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
      >
      > And it does this while holding the onoff_mutex that bash is waiting for.
      >
      > Doing a function trace, it showed me where it happened:
      >
      > [  125.940066] rcu_pree-10      3.... 28384115273: schedule_timeout_uninterruptible <-rcu_gp_kthread
      > [...]
      > [  125.940066] rcu_pree-10      3d..3 28384202439: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=D ==> next_comm=watchdog/3 next_pid=38 next_prio=120
      >
      > The watchdog ran, and then:
      >
      > [  125.940066] watchdog-38      3d..3 28384692863: sched_switch: prev_comm=watchdog/3 prev_pid=38 prev_prio=120 prev_state=P ==> next_comm=modprobe next_pid=2848 next_prio=118
      >
      > Not sure what modprobe was doing, but shortly after that:
      >
      > [  125.940066] modprobe-2848    3d..3 28385041749: sched_switch: prev_comm=modprobe prev_pid=2848 prev_prio=118 prev_state=R+ ==> next_comm=migration/3 next_pid=40 next_prio=0
      >
      > Where the migration thread took down the CPU:
      >
      > [  125.940066] migratio-40      3d..3 28389148276: sched_switch: prev_comm=migration/3 prev_pid=40 prev_prio=0 prev_state=P ==> next_comm=swapper/3 next_pid=0 next_prio=120
      >
      > which finally did:
      >
      > [  125.940066]   <idle>-0       3...1 28389282142: arch_cpu_idle_dead <-cpu_startup_entry
      > [  125.940066]   <idle>-0       3...1 28389282548: native_play_dead <-arch_cpu_idle_dead
      > [  125.940066]   <idle>-0       3...1 28389282924: play_dead_common <-native_play_dead
      > [  125.940066]   <idle>-0       3...1 28389283468: idle_task_exit <-play_dead_common
      > [  125.940066]   <idle>-0       3...1 28389284644: amd_e400_remove_cpu <-play_dead_common
      >
      >
      > CPU 3 is now offline, the rcu_preempt thread that ran on CPU 3 is still
      > doing a schedule_timeout_uninterruptible() and it registered it's
      > timeout to the timer base for CPU 3. You would think that it would get
      > migrated right? The issue here is that the timer migration happens at
      > the CPU notifier for CPU_DEAD. The problem is that the rcu notifier for
      > CPU_DOWN is blocked waiting for the onoff_mutex to be released, which is
      > held by the thread that just put itself into a uninterruptible sleep,
      > that wont wake up until the CPU_DEAD notifier of the timer
      > infrastructure is called, which wont happen until the rcu notifier
      > finishes. Here's our deadlock!
      
      This commit breaks this deadlock cycle by substituting a shorter udelay()
      for the previous schedule_timeout_uninterruptible(), while at the same
      time increasing the probability of the delay.  This maintains the intensity
      of the testing.
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NSteven Rostedt <rostedt@goodmis.org>
      971394f3
    • S
      rcu: Don't call wakeup() with rcu_node structure ->lock held · 016a8d5b
      Steven Rostedt 提交于
      This commit fixes a lockdep-detected deadlock by moving a wake_up()
      call out from a rnp->lock critical section.  Please see below for
      the long version of this story.
      
      On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:
      
      > [12572.705832] ======================================================
      > [12572.750317] [ INFO: possible circular locking dependency detected ]
      > [12572.796978] 3.10.0-rc3+ #39 Not tainted
      > [12572.833381] -------------------------------------------------------
      > [12572.862233] trinity-child17/31341 is trying to acquire lock:
      > [12572.870390]  (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
      > [12572.878859]
      > but task is already holding lock:
      > [12572.894894]  (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
      > [12572.903381]
      > which lock already depends on the new lock.
      >
      > [12572.927541]
      > the existing dependency chain (in reverse order) is:
      > [12572.943736]
      > -> #4 (&ctx->lock){-.-...}:
      > [12572.960032]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12572.968337]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12572.976633]        [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0
      > [12572.984969]        [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0
      > [12572.993326]        [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0
      > [12573.001652]        [<ffffffff816eacfe>] schedule_user+0x2e/0x70
      > [12573.009998]        [<ffffffff816ecd64>] retint_careful+0x12/0x2e
      > [12573.018321]
      > -> #3 (&rq->lock){-.-.-.}:
      > [12573.034628]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.042930]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12573.051248]        [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260
      > [12573.059579]        [<ffffffff810492f5>] do_fork+0x105/0x470
      > [12573.067880]        [<ffffffff81049686>] kernel_thread+0x26/0x30
      > [12573.076202]        [<ffffffff816cee63>] rest_init+0x23/0x140
      > [12573.084508]        [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe
      > [12573.092852]        [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c
      > [12573.101233]        [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf
      > [12573.109528]
      > -> #2 (&p->pi_lock){-.-.-.}:
      > [12573.125675]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.133829]        [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
      > [12573.141964]        [<ffffffff8108e881>] try_to_wake_up+0x31/0x320
      > [12573.150065]        [<ffffffff8108ebe2>] default_wake_function+0x12/0x20
      > [12573.158151]        [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40
      > [12573.166195]        [<ffffffff81085398>] __wake_up_common+0x58/0x90
      > [12573.174215]        [<ffffffff81086909>] __wake_up+0x39/0x50
      > [12573.182146]        [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
      > [12573.190119]        [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
      > [12573.198023]        [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
      > [12573.205860]        [<ffffffff8107a91d>] kthread+0xed/0x100
      > [12573.213656]        [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
      > [12573.221379]
      > -> #1 (&rsp->gp_wq){..-.-.}:
      > [12573.236329]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.243783]        [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
      > [12573.251178]        [<ffffffff810868f3>] __wake_up+0x23/0x50
      > [12573.258505]        [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
      > [12573.265891]        [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
      > [12573.273248]        [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
      > [12573.280564]        [<ffffffff8107a91d>] kthread+0xed/0x100
      > [12573.287807]        [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
      
      Notice the above call chain.
      
      rcu_start_future_gp() is called with the rnp->lock held. Then it calls
      rcu_start_gp_advance, which does a wakeup.
      
      You can't do wakeups while holding the rnp->lock, as that would mean
      that you could not do a rcu_read_unlock() while holding the rq lock, or
      any lock that was taken while holding the rq lock. This is because...
      (See below).
      
      > [12573.295067]
      > -> #0 (rcu_node_0){..-.-.}:
      > [12573.309293]        [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
      > [12573.316568]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.323825]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12573.331081]        [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
      > [12573.338377]        [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
      > [12573.345648]        [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
      > [12573.352942]        [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
      > [12573.360211]        [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
      > [12573.367514]        [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
      > [12573.374816]        [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
      
      Notice the above trace.
      
      perf took its own ctx->lock, which can be taken while holding the rq
      lock. While holding this lock, it did a rcu_read_unlock(). The
      perf_lock_task_context() basically looks like:
      
      rcu_read_lock();
      raw_spin_lock(ctx->lock);
      rcu_read_unlock();
      
      Now, what looks to have happened, is that we scheduled after taking that
      first rcu_read_lock() but before taking the spin lock. When we scheduled
      back in and took the ctx->lock, the following rcu_read_unlock()
      triggered the "special" code.
      
      The rcu_read_unlock_special() takes the rnp->lock, which gives us a
      possible deadlock scenario.
      
      	CPU0		CPU1		CPU2
      	----		----		----
      
      				     rcu_nocb_kthread()
          lock(rq->lock);
      		    lock(ctx->lock);
      				     lock(rnp->lock);
      
      				     wake_up();
      
      				     lock(rq->lock);
      
      		    rcu_read_unlock();
      
      		    rcu_read_unlock_special();
      
      		    lock(rnp->lock);
          lock(ctx->lock);
      
      **** DEADLOCK ****
      
      > [12573.382068]
      > other info that might help us debug this:
      >
      > [12573.403229] Chain exists of:
      >   rcu_node_0 --> &rq->lock --> &ctx->lock
      >
      > [12573.424471]  Possible unsafe locking scenario:
      >
      > [12573.438499]        CPU0                    CPU1
      > [12573.445599]        ----                    ----
      > [12573.452691]   lock(&ctx->lock);
      > [12573.459799]                                lock(&rq->lock);
      > [12573.467010]                                lock(&ctx->lock);
      > [12573.474192]   lock(rcu_node_0);
      > [12573.481262]
      >  *** DEADLOCK ***
      >
      > [12573.501931] 1 lock held by trinity-child17/31341:
      > [12573.508990]  #0:  (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
      > [12573.516475]
      > stack backtrace:
      > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
      > [12573.545357]  ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
      > [12573.552868]  ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
      > [12573.560353]  0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
      > [12573.567856] Call Trace:
      > [12573.575011]  [<ffffffff816e375b>] dump_stack+0x19/0x1b
      > [12573.582284]  [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f
      > [12573.589637]  [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
      > [12573.596982]  [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100
      > [12573.604344]  [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.611652]  [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
      > [12573.619030]  [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12573.626331]  [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
      > [12573.633671]  [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
      > [12573.640992]  [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0
      > [12573.648330]  [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40
      > [12573.655662]  [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0
      > [12573.662964]  [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
      > [12573.670276]  [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
      > [12573.677622]  [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370
      > [12573.684981]  [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
      > [12573.692358]  [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
      > [12573.699753]  [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50
      > [12573.707135]  [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
      > [12573.714599]  [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
      > [12573.721996]  [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
      
      This commit delays the wakeup via irq_work(), which is what
      perf and ftrace use to perform wakeups in critical sections.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      016a8d5b
    • P
      trace: Allow idle-safe tracepoints to be called from irq · d6284099
      Paul E. McKenney 提交于
      __DECLARE_TRACE_RCU() currently creates an _rcuidle() tracepoint which
      may safely be invoked from what RCU considers to be an idle CPU.
      However, these _rcuidle() tracepoints may -not- be invoked from the
      handler of an irq taken from idle, because rcu_idle_enter() zeroes
      RCU's nesting-level counter, so that the rcu_irq_exit() returning to
      idle will trigger a WARN_ON_ONCE().
      
      This commit therefore substitutes rcu_irq_enter() for rcu_idle_exit()
      and rcu_irq_exit() for rcu_idle_enter() in order to make the _rcuidle()
      tracepoints usable from irq handlers as well as from process context.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      d6284099
  2. 09 6月, 2013 13 次提交
  3. 08 6月, 2013 14 次提交
    • L
      Merge tag 'trace-fixes-v3.10-rc3-v3' of... · 14d0ee05
      Linus Torvalds 提交于
      Merge tag 'trace-fixes-v3.10-rc3-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull tracing fixes from Steven Rostedt:
       "This contains 4 fixes.
      
        The first two fix the case where full RCU debugging is enabled,
        enabling function tracing causes a live lock of the system.  This is
        due to the added debug checks in rcu_dereference_raw() that is used by
        the function tracer.  These checks are also traced by the function
        tracer as well as cause enough overhead to the function tracer to slow
        down the system enough that the time to finish an interrupt can take
        longer than when the next interrupt is triggered, causing a live lock
        from the timer interrupt.
      
        Talking this over with Paul McKenney, we came up with a fix that adds
        a new rcu_dereference_raw_notrace() that does not perform these added
        checks, and let the function tracer use that.
      
        The third commit fixes a failed compile when branch tracing is
        enabled, due to the conversion of the trace_test_buffer() selftest
        that the branch trace wasn't converted for.
      
        The forth patch fixes a bug caught by the RCU lockdep code where a
        rcu_read_lock() is performed when rcu is disabled (either going to or
        from idle, or user space).  This happened on the irqsoff tracer as it
        calls task_uid().  The fix here was to use current_uid() when possible
        that doesn't use rcu locking.  Which luckily, is always used when
        irqsoff calls this code."
      
      * tag 'trace-fixes-v3.10-rc3-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Use current_uid() for critical time tracing
        tracing: Fix bad parameter passed in branch selftest
        ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends
        rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu()
      14d0ee05
    • R
      Revert "ACPI / scan: do not match drivers against objects having scan handlers" · ea7f6656
      Rafael J. Wysocki 提交于
      Commit 9f29ab11 ("ACPI / scan: do not match drivers against objects
      having scan handlers") introduced a boot regression on Tony's ia64 HP
      rx2600.  Tony says:
      
        "It panics with the message:
      
         Kernel panic - not syncing: Unable to find SBA IOMMU: Try a generic or DIG kernel
      
         [...] my problem comes from arch/ia64/hp/common/sba_iommu.c
         where the code in sba_init() says:
      
              acpi_bus_register_driver(&acpi_sba_ioc_driver);
              if (!ioc_list) {
      
         but because of this change we never managed to call ioc_init()
         so ioc_list doesn't get set up, and we die."
      
      Revert it to avoid this breakage and we'll fix the problem it attempted
      to address later.
      Reported-by: NTony Luck <tony.luck@gmail.com>
      Cc: 3.9+ <stable@vger.kernel.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea7f6656
    • O
      Merge tag 'mxs-fixes-3.10' of git://git.linaro.org/people/shawnguo/linux-2.6 into fixes · 090878aa
      Olof Johansson 提交于
      From Shawn Guo, mxs fixes for 3.10:
      
      - Since the time we move to MULTI_IRQ_HANDLER, the 0x7f polling for no
        interrupt in icoll_handle_irq() becomes insane, because 0x7f is an
        valid interrupt number, the irq of gpio bank 0.  That unnecessary
        polling results in the driver not detecting when irq 0x7f is active
        which makes the machine effectively dead lock.  The fix removes the
        interrupt poll loop and allows usage of gpio0 interrupt without an
        infinite loop.
      
      * tag 'mxs-fixes-3.10' of git://git.linaro.org/people/shawnguo/linux-2.6:
        ARM: mxs: icoll: Fix interrupts gpio bank 0
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      090878aa
    • O
      Merge tag 'imx-fixes-3.10-2' of git://git.linaro.org/people/shawnguo/linux-2.6 into fixes · 3d0d8b91
      Olof Johansson 提交于
      From Shawn Guo, imx fixes for 3.10, take 2:
      
      - One device tree fix for all spi node to have per clock added.
        The clock is needed by spi driver to calculate bit rate divisor.
        The spi node in the current device trees either does not have the
        clock or is defined as dummy clock, in which case the driver probe
        will fail or spi will run at a wrong bit rate.
      
      - Two imx6q clock fixes, which correct axi_sels and ldb_di_sels.
      
      * tag 'imx-fixes-3.10-2' of git://git.linaro.org/people/shawnguo/linux-2.6:
        ARM: imx: clk-imx6q: AXI clock select index is incorrect
        ARM: dts: imx: fix clocks for cspi
        ARM i.MX6q: fix for ldb_di_sels
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      3d0d8b91
    • D
      ARM: exynos: add debug_ll_io_init() call in exynos_init_io() · 9c1fcdcc
      Doug Anderson 提交于
      If the early MMU mapping of the UART happens to get booted out of the
      TLB between the start of paging_init() and when we finally re-add the
      UART at the very end of s3c_init_cpu(), we'll get a hang at bootup if
      we've got early_printk enabled.  Avoid this hang by calling
      debug_ll_io_init() early.
      
      Without this patch, you can reliably reproduce a hang when early
      printk is enabled by adding flush_tlb_all() at the start of
      exynos_init_io().  After this patch the hang goes away.
      Signed-off-by: NDoug Anderson <dianders@chromium.org>
      Acked-by: NKukjin Kim <kgene.kim@samsung.com>
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      9c1fcdcc
    • O
      Merge tag 'renesas-fixes-for-v3.10' of... · fb565ff7
      Olof Johansson 提交于
      Merge tag 'renesas-fixes-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes
      
      From Simon Horman, Renesas ARM based SoC fixes for v3.10:
      - Correction to USB OVC and PENC pin groupings on r8a7779 SoC.
        This avoids conflicts when the USB_OVCn pins are used by another function.
        This has been observed to be a problem in v3.10-rc1.
      - Update CMT clock rating for sh73a0 SoC to resolve boot failure
        on kzm9g-reference. This resolves a regression between v3.9 and v3.10-rc1.
      
      * tag 'renesas-fixes-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
        ARM: shmobile: sh73a0: Update CMT clockevent rating to 80
        sh-pfc: r8a7779: Don't group USB OVC and PENC pins
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      fb565ff7
    • T
      ARM: EXYNOS: uncompress - print debug messages if DEBUG_LL is defined · 437d8ac5
      Tushar Behera 提交于
      Printing low-level debug messages make an assumption that the specified
      UART port has been preconfigured by the bootloader. Incorrectly
      specified UART port results in system getting stalled while printing the
      message "Uncompressing Linux... done, booting the kernel"
      This UART port number is specified through S3C_LOWLEVEL_UART_PORT. Since
      the UART port might different for different board, it is not possible to
      specify it correctly for every board that use a common defconfig file.
      
      Calling this print subroutine only when DEBUG_LL fixes the problem. By
      disabling DEBUG_LL in default config file, we would be able to boot
      multiple boards with different default UART ports.
      
      With this current approach, we miss the print "Uncompressing Linux...
      done, booting the kernel." when DEBUG_LL is not defined.
      Signed-off-by: NTushar Behera <tushar.behera@linaro.org>
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      437d8ac5
    • L
      Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband · e8193ce5
      Linus Torvalds 提交于
      Pull infiniband fixes from Roland Dreier:
       - qib RCU/lockdep fix
       - iser device removal fix, plus doc fixes
      
      * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
        IB/qib: Fix lockdep splat in qib_alloc_lkey()
        MAINTAINERS: Add entry for iSCSI Extensions for RDMA (iSER) initiator
        IB/iser: Add Mellanox copyright
        IB/iser: Fix device removal flow
      e8193ce5
    • L
      Merge tag 'vfio-v3.10-rc5' of git://github.com/awilliam/linux-vfio · 5125ed5b
      Linus Torvalds 提交于
      Pull vfio fix from Alex Williamson:
       "fix rmmod crash"
      
      * tag 'vfio-v3.10-rc5' of git://github.com/awilliam/linux-vfio:
        vfio: fix crash on rmmod
      5125ed5b
    • L
      Merge tag 'ecryptfs-3.10-rc5-msync' of... · b8e9dbac
      Linus Torvalds 提交于
      Merge tag 'ecryptfs-3.10-rc5-msync' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
      
      Pull ecryptfs fixes from Tyler Hicks:
       - Fixes how eCryptfs handles msync to sync both the upper and lower
         file
       - A couple of MAINTAINERS updates
      
      * tag 'ecryptfs-3.10-rc5-msync' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
        eCryptfs: Check return of filemap_write_and_wait during fsync
        Update eCryptFS maintainers
        ecryptfs: fixed msync to flush data
      b8e9dbac
    • L
      Merge branch 'for-3.10' of git://git.samba.org/sfrench/cifs-2.6 · e4327859
      Linus Torvalds 提交于
      Pull CIFS fix from Steve French:
       "Fix one byte buffer overrun with prefixpaths on cifs mounts which can
        cause a problem with mount depending on the string length"
      
      * 'for-3.10' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: fix off-by-one bug in build_unc_path_to_root
      e4327859
    • A
      dmatest: do not allow to interrupt ongoing tests · bcc567e3
      Andy Shevchenko 提交于
      When user interrupts ongoing transfers the dmatest may end up with console
      lockup, oops, or data mismatch. This patch prevents user to abort any ongoing
      test.
      
      Documentation is updated accordingly.
      Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reported-by: NWill Deacon <will.deacon@arm.com>
      Tested-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NVinod Koul <vinod.koul@intel.com>
      bcc567e3
    • L
      Merge tag 'sound-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 308f8813
      Linus Torvalds 提交于
      Pull sound fixes from Takashi Iwai:
       - A pile of small regression fix patches for HD-audio VIA codecs
       - Quirks for HD-aduio and USB-audio devices
       - A trivial SIS7019 error path fix
      
      * tag 'sound-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: usb-audio - Fix invalid volume resolution on Logitech HD webcam c270
        ALSA: usb-audio - Apply Logitech QuickCam Pro 9000 quirk only to audio iface
        ALSA: hda/via - Clean up duplicated codes
        ALSA: hda/via - Fix wrongly cleared pins after suspend on VT1802
        ALSA: hda - Add keep_eapd_on flag to generic parser
        ALSA: hda - Allow setting automute/automic hooks after parsing
        ALSA: hda/via - Disable broken dynamic power control
        ALSA: usb-audio: fix Roland/Cakewalk UM-3G support
        ALSA: hda - Add headset quirk for two Dell machines
        ALSA: hda - add dock support for Thinkpad T431s
        ALSA: sis7019: fix error return code in sis_chip_create()
      308f8813
    • L
      Merge tag 'pm+acpi-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 3132aef2
      Linus Torvalds 提交于
      Pull power management and ACPI fixes from Rafael J Wysocki:
      
       - Fix for an ACPI PM regression causing Toshiba P870-303 to crash
         during boot from Rafael J Wysocki.
      
       - ACPI fix for an issue causing some drivers to attempt to bind to
         devices they shouldn't touch from Aaron Lu.
      
       - Fix for a recent cpufreq regression related to a possible race with
         CPU offline from Michael Wang.
      
       - ACPI cpufreq regression fix for an issue causing turbo frequencies to
         be underutilized in some cases from Ross Lagerwall.
      
       - cpufreq-cpu0 driver fix related to incorrect clock ACPI usage from
         Guennadi Liakhovetski.
      
       - HP WMI driver fix for an issue causing GPS initialization and
         poweroff failures on HP Elitebook 6930p from Lan Tianyu.
      
       - APEI (ACPI Platform Error Interface) fix for an issue in the error
         code path in ghes_probe() from Wei Yongjun.
      
       - New ACPI video driver blacklist entries for HP m4 and HP Pavilion g6
         from Alex Hung and Ash Willis.
      
      * tag 'pm+acpi-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / PM: Do not execute _PS0 for devices without _PSC during initialization
        cpufreq: cpufreq-cpu0: use the exact frequency for clk_set_rate()
        cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work()
        ACPI / scan: do not match drivers against objects having scan handlers
        ACPI / APEI: fix error return code in ghes_probe()
        acpi-cpufreq: set current frequency based on target P-State
        ACPI / video: ignore BIOS initial backlight value for HP Pavilion g6
        ACPI / video: ignore BIOS initial backlight value for HP m4
        x86 / platform / hp_wmi: Fix bluetooth_rfkill misuse in hp_wmi_rfkill_setup()
      3132aef2
  4. 07 6月, 2013 5 次提交