1. 25 Feb, 2016 · 2 commits
    • rcu: Use simple wait queues where possible in rcutree · abedf8e2
      Committed by Paul Gortmaker
      As of commit dae6e64d ("rcu: Introduce proper blocking to no-CBs kthreads
      GP waits") the RCU subsystem started making use of wait queues.
      
      Here we convert all additions of RCU wait queues to use simple wait queues,
      since they don't need the extra overhead of the full wait queue features.
      
      Originally this was done for RT kernels[1], since we would get things like...
      
        BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
        in_atomic(): 1, irqs_disabled(): 1, pid: 8, name: rcu_preempt
        Pid: 8, comm: rcu_preempt Not tainted
        Call Trace:
         [<ffffffff8106c8d0>] __might_sleep+0xd0/0xf0
         [<ffffffff817d77b4>] rt_spin_lock+0x24/0x50
         [<ffffffff8106fcf6>] __wake_up+0x36/0x70
         [<ffffffff810c4542>] rcu_gp_kthread+0x4d2/0x680
         [<ffffffff8105f910>] ? __init_waitqueue_head+0x50/0x50
         [<ffffffff810c4070>] ? rcu_gp_fqs+0x80/0x80
         [<ffffffff8105eabb>] kthread+0xdb/0xe0
         [<ffffffff8106b912>] ? finish_task_switch+0x52/0x100
         [<ffffffff817e0754>] kernel_thread_helper+0x4/0x10
         [<ffffffff8105e9e0>] ? __init_kthread_worker+0x60/0x60
         [<ffffffff817e0750>] ? gs_change+0xb/0xb
      
      ...and hence simple wait queues were deployed on RT out of necessity
      (as simple wait uses a raw lock), but mainline might as well take
      advantage of the more streamlined support too.
      
      [1] This is a carry forward of work from v3.10-rt; the original conversion
      was by Thomas on an earlier -rt version, and Sebastian extended it to
      additional post-3.10 added RCU waiters; here I've added a commit log and
      unified the RCU changes into one, and uprev'd it to match mainline RCU.
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-rt-users@vger.kernel.org
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1455871601-27484-6-git-send-email-wagi@monom.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      abedf8e2
    • rcu: Do not call rcu_nocb_gp_cleanup() while holding rnp->lock · 065bb78c
      Committed by Daniel Wagner
      rcu_nocb_gp_cleanup() is called while holding rnp->lock. Currently,
      this is okay because the wake_up_all() in rcu_nocb_gp_cleanup() will
      not enable the IRQs. lockdep is happy.
      
      After switching over to swait, this is no longer true. swake_up_all()
      enables IRQs while processing the waiters. __do_softirq() can then
      run and will eventually call rcu_process_callbacks(), which wants to
      grab rnp->lock.
      
      Let's move the rcu_nocb_gp_cleanup() call outside the lock before we
      switch over to swait.
      
      If we held rnp->lock while using swait, lockdep would report the
      following:
      
       =================================
       [ INFO: inconsistent lock state ]
       4.2.0-rc5-00025-g9a73ba0 #136 Not tainted
       ---------------------------------
       inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
       rcu_preempt/8 [HC0[0]:SC0[0]:HE1:SE1] takes:
        (rcu_node_1){+.?...}, at: [<ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0
       {IN-SOFTIRQ-W} state was registered at:
         [<ffffffff81109b9f>] __lock_acquire+0xd5f/0x21e0
         [<ffffffff8110be0f>] lock_acquire+0xdf/0x2b0
         [<ffffffff81841cc9>] _raw_spin_lock_irqsave+0x59/0xa0
         [<ffffffff81136991>] rcu_process_callbacks+0x141/0x3c0
         [<ffffffff810b1a9d>] __do_softirq+0x14d/0x670
         [<ffffffff810b2214>] irq_exit+0x104/0x110
         [<ffffffff81844e96>] smp_apic_timer_interrupt+0x46/0x60
         [<ffffffff81842e70>] apic_timer_interrupt+0x70/0x80
         [<ffffffff810dba66>] rq_attach_root+0xa6/0x100
         [<ffffffff810dbc2d>] cpu_attach_domain+0x16d/0x650
         [<ffffffff810e4b42>] build_sched_domains+0x942/0xb00
         [<ffffffff821777c2>] sched_init_smp+0x509/0x5c1
         [<ffffffff821551e3>] kernel_init_freeable+0x172/0x28f
         [<ffffffff8182cdce>] kernel_init+0xe/0xe0
         [<ffffffff8184231f>] ret_from_fork+0x3f/0x70
       irq event stamp: 76
       hardirqs last  enabled at (75): [<ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60
       hardirqs last disabled at (76): [<ffffffff8184116f>] _raw_spin_lock_irq+0x1f/0x90
       softirqs last  enabled at (0): [<ffffffff810a8df2>] copy_process.part.26+0x602/0x1cf0
       softirqs last disabled at (0): [<          (null)>]           (null)
       other info that might help us debug this:
        Possible unsafe locking scenario:
              CPU0
              ----
         lock(rcu_node_1);
         <Interrupt>
           lock(rcu_node_1);
        *** DEADLOCK ***
       1 lock held by rcu_preempt/8:
        #0:  (rcu_node_1){+.?...}, at: [<ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0
       stack backtrace:
       CPU: 0 PID: 8 Comm: rcu_preempt Not tainted 4.2.0-rc5-00025-g9a73ba0 #136
       Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014
        0000000000000000 000000006d7e67d8 ffff881fb081fbd8 ffffffff818379e0
        0000000000000000 ffff881fb0812a00 ffff881fb081fc38 ffffffff8110813b
        0000000000000000 0000000000000001 ffff881f00000001 ffffffff8102fa4f
       Call Trace:
        [<ffffffff818379e0>] dump_stack+0x4f/0x7b
        [<ffffffff8110813b>] print_usage_bug+0x1db/0x1e0
        [<ffffffff8102fa4f>] ? save_stack_trace+0x2f/0x50
        [<ffffffff811087ad>] mark_lock+0x66d/0x6e0
        [<ffffffff81107790>] ? check_usage_forwards+0x150/0x150
        [<ffffffff81108898>] mark_held_locks+0x78/0xa0
        [<ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60
        [<ffffffff81108a28>] trace_hardirqs_on_caller+0x168/0x220
        [<ffffffff81108aed>] trace_hardirqs_on+0xd/0x10
        [<ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60
        [<ffffffff810fd1c7>] swake_up_all+0xb7/0xe0
        [<ffffffff811386e1>] rcu_gp_kthread+0xab1/0xeb0
        [<ffffffff811089bf>] ? trace_hardirqs_on_caller+0xff/0x220
        [<ffffffff81841341>] ? _raw_spin_unlock_irq+0x41/0x60
        [<ffffffff81137c30>] ? rcu_barrier+0x20/0x20
        [<ffffffff810d2014>] kthread+0x104/0x120
        [<ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60
        [<ffffffff810d1f10>] ? kthread_create_on_node+0x260/0x260
        [<ffffffff8184231f>] ret_from_fork+0x3f/0x70
        [<ffffffff810d1f10>] ? kthread_create_on_node+0x260/0x260
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-rt-users@vger.kernel.org
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1455871601-27484-5-git-send-email-wagi@monom.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      065bb78c
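The ordering described in this commit message (decide what to wake while holding the node lock, then do the wakeup only after the lock is dropped) can be modeled in user space. This is a toy sketch, not the kernel change: `struct rnp_sketch`, `struct waitq_sketch`, `pick_waitq_locked()`, `wake_all_sketch()`, and `gp_cleanup_sketch()` are hypothetical names, and a boolean stands in for rnp->lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical wait queue: only counts how often it was woken. */
struct waitq_sketch {
	int wakeups;
};

/* Hypothetical stand-in for an rcu_node; "locked" models rnp->lock. */
struct rnp_sketch {
	bool locked;
	struct waitq_sketch wq;
};

/* Must be called with the node lock held: pick the queue to wake later. */
static struct waitq_sketch *pick_waitq_locked(struct rnp_sketch *rnp)
{
	assert(rnp->locked);
	return &rnp->wq;
}

/* The wakeup itself must not run while the node lock is held. */
static void wake_all_sketch(const struct rnp_sketch *rnp,
			    struct waitq_sketch *wq)
{
	assert(!rnp->locked);
	wq->wakeups++;
}

/* Grab the queue pointer under the lock, unlock, then wake. */
static void gp_cleanup_sketch(struct rnp_sketch *rnp)
{
	rnp->locked = true;                              /* lock */
	struct waitq_sketch *wq = pick_waitq_locked(rnp);
	rnp->locked = false;                             /* unlock */
	wake_all_sketch(rnp, wq);                        /* outside the lock */
}
```

The assertion in `wake_all_sketch()` is the point of the model: if the wakeup were moved back above the unlock, the sketch would abort, much as lockdep complains in the report above.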
  2. 06 Dec, 2015 · 1 commit
  3. 05 Dec, 2015 · 2 commits
    • rcu: Reduce expedited GP memory contention via per-CPU variables · df5bd514
      Committed by Paul E. McKenney
      Currently, the piggybacked-work checks carried out by sync_exp_work_done()
      atomically increment a small set of variables (the ->expedited_workdone0,
      ->expedited_workdone1, ->expedited_workdone2, ->expedited_workdone3
      fields in the rcu_state structure), which will form a memory-contention
      bottleneck given a sufficiently large number of CPUs concurrently invoking
      either synchronize_rcu_expedited() or synchronize_sched_expedited().
      
      This commit therefore moves these four fields to the per-CPU rcu_data
      structure, eliminating the memory contention.  The show_rcuexp() function
      also changes to sum up each field in the rcu_data structures.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      df5bd514
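The idea of replacing shared counters with per-CPU counters that are summed only at readout can be sketched in plain C. This is a user-space illustration, not the kernel code: `NR_CPUS_SKETCH`, `struct rcu_data_sketch`, `note_workdone1()`, and `sum_workdone1()` are hypothetical, and only the field name `expedited_workdone1` is taken from the commit message.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4  /* hypothetical CPU count for the illustration */

/* Hypothetical stand-in for the per-CPU rcu_data structure. */
struct rcu_data_sketch {
	unsigned long expedited_workdone1;  /* written only by its own CPU */
};

/*
 * A plain array only models the per-CPU indexing; real per-CPU data
 * lives in separate per-CPU areas, which is what avoids contention.
 */
static struct rcu_data_sketch rcu_data_sketch[NR_CPUS_SKETCH];

/* Fast path: each CPU increments its own counter, no atomics needed. */
static void note_workdone1(size_t cpu)
{
	rcu_data_sketch[cpu].expedited_workdone1++;
}

/* Slow path (statistics readout): sum the field across all CPUs. */
static unsigned long sum_workdone1(void)
{
	unsigned long sum = 0;

	for (size_t cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += rcu_data_sketch[cpu].expedited_workdone1;
	return sum;
}
```

The trade-off is the usual one for statistics counters: updates become cheap and local, while the rarely used readout pays for a loop over all CPUs.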
    • rcu: Clarify role of ->expmaskinitnext · 1de6e56d
      Committed by Paul E. McKenney
      Analogy with the ->qsmaskinitnext field might lead one to believe that
      ->expmaskinitnext tracks online CPUs.  This belief is incorrect: Any CPU
      that has ever been online will have its bit set in the ->expmaskinitnext
      field.  This commit therefore adds a comment to make this clear, at
      least to people who read comments.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      1de6e56d
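The distinction drawn in this commit message can be illustrated with two bitmasks. This is a sketch under the assumption that ->qsmaskinitnext follows the set of online CPUs, as the analogy in the message suggests; `struct rnp_masks_sketch`, `cpu_online_sketch()`, and `cpu_offline_sketch()` are hypothetical names, not kernel functions.

```c
/* Hypothetical container for the two masks discussed above. */
struct rnp_masks_sketch {
	unsigned long qsmaskinitnext;   /* assumed: CPUs currently online */
	unsigned long expmaskinitnext;  /* CPUs that have ever been online */
};

/* A CPU coming online sets its bit in both masks. */
static void cpu_online_sketch(struct rnp_masks_sketch *m, unsigned int cpu)
{
	m->qsmaskinitnext |= 1UL << cpu;
	m->expmaskinitnext |= 1UL << cpu;
}

/* A CPU going offline clears only the "currently online" bit. */
static void cpu_offline_sketch(struct rnp_masks_sketch *m, unsigned int cpu)
{
	m->qsmaskinitnext &= ~(1UL << cpu);
	/* expmaskinitnext is deliberately left untouched. */
}
```

After CPUs 1 and 3 come online and CPU 1 goes offline again, the first mask holds only bit 3 while the second still holds bits 1 and 3.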
  4. 24 Nov, 2015 · 1 commit
  5. 08 Oct, 2015 · 3 commits
    • rcu: Add online/offline info to expedited stall warning message · 74611ecb
      Committed by Paul E. McKenney
      This commit makes the RCU CPU stall warning message print online/offline
      indications immediately after the CPU number.  An "O" indicates that the
      CPU is globally offline, and "." that it is globally online.  An "o"
      indicates that RCU believes the CPU is offline for the current grace
      period, and "." otherwise.  An "N" indicates that RCU believes the CPU
      will be offline for the next grace period, and "." otherwise.  So for
      CPU 10, you would normally see "10-...:", indicating that everything
      believes that the CPU is online.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      74611ecb
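The three-character indication described above can be reproduced with a small formatting helper. This is a sketch of the encoding only, not the kernel's stall-warning code: `format_cpu_state()` and its parameter names are hypothetical, and the "10-...:" layout is taken from the example in the commit message.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "<cpu>-<O|.><o|.><N|.>:" into buf.
 *   'O' = globally offline
 *   'o' = RCU believes the CPU is offline for the current grace period
 *   'N' = RCU believes the CPU will be offline for the next grace period
 *   '.' = online from that point of view
 */
static int format_cpu_state(char *buf, size_t len, int cpu,
			    bool globally_online,
			    bool rcu_online_this_gp,
			    bool rcu_online_next_gp)
{
	return snprintf(buf, len, "%d-%c%c%c:", cpu,
			globally_online ? '.' : 'O',
			rcu_online_this_gp ? '.' : 'o',
			rcu_online_next_gp ? '.' : 'N');
}
```

With all three flags true for CPU 10, the helper produces "10-...:", matching the normal case given in the message.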
    • rcu: Stop silencing lockdep false positive for expedited grace periods · 83c2c735
      Committed by Paul E. McKenney
      This reverts commit af859bea (rcu: Silence lockdep false positive
      for expedited grace periods).  Because synchronize_rcu_expedited()
      no longer invokes synchronize_sched_expedited(), ->exp_funnel_mutex
      acquisition is no longer nested, so the false positive no longer happens.
      This commit therefore removes the extra lockdep data structures, as they
      are no longer needed.
      83c2c735
    • rcu: Switch synchronize_sched_expedited() to IPI · 6587a23b
      Committed by Paul E. McKenney
      This commit switches synchronize_sched_expedited() from stop_one_cpu_nowait()
      to smp_call_function_single(), thus moving from an IPI and a pair of
      context switches to an IPI and a single pass through the scheduler.
      Of course, if the scheduler actually does decide to switch to a different
      task, there will still be a pair of context switches, but there would
      likely have been a pair of context switches anyway, just a bit later.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      6587a23b
  6. 07 Oct, 2015 · 4 commits
  7. 21 Sep, 2015 · 5 commits
    • rcu: Make ->cpu_no_qs be a union for aggregate OR · 5b74c458
      Committed by Paul E. McKenney
      This commit converts the rcu_data structure's ->cpu_no_qs field
      to a union.  The bytewise side of this union allows individual access
      to indications as to whether this CPU needs to find a quiescent state
      for a normal (.norm) and/or expedited (.exp) grace period.  The setwise
      side of the union allows testing whether or not a quiescent state is
      needed at all, for either type of grace period.
      
      For now, only .norm is used.  A later commit will introduce the expedited
      usage.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      5b74c458
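The bytewise and setwise views described in this commit message can be sketched as a C union. The member names `.norm` and `.exp` come from the message; the union name `rcu_noqs_sketch`, the view names `b` and `s`, and the helper `needs_any_qs()` are assumptions made for this illustration, using fixed-width user-space types.

```c
#include <stdint.h>

/* Sketch of a ->cpu_no_qs style union. */
union rcu_noqs_sketch {
	struct {
		uint8_t norm;  /* quiescent state still needed, normal GP */
		uint8_t exp;   /* quiescent state still needed, expedited GP */
	} b;                   /* bytewise view: test or set each flag alone */
	uint16_t s;            /* setwise view: both flags in one load */
};

/* Aggregate OR: one comparison covers both kinds of grace period. */
static int needs_any_qs(const union rcu_noqs_sketch *q)
{
	return q->s != 0;
}
```

The setwise test works regardless of byte order, because `s` is nonzero exactly when at least one of the two bytes is nonzero.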
    • rcu: Invert passed_quiesce and rename to cpu_no_qs · 0d43eb34
      Committed by Paul E. McKenney
      This commit inverts the sense of the rcu_data structure's ->passed_quiesce
      field and renames it to ->cpu_no_qs.  This will allow a later commit to
      use an "aggregate OR" operation to test expedited as well as normal grace
      periods without added overhead.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      0d43eb34
    • rcu: Rename qs_pending to core_needs_qs · 97c668b8
      Committed by Paul E. McKenney
      An upcoming commit needs to invert the sense of the ->passed_quiesce
      rcu_data structure field, so this commit is taking this opportunity
      to clarify things a bit by renaming ->qs_pending to ->core_needs_qs.
      
      So if !rdp->core_needs_qs, then this CPU need not concern itself with
      quiescent states, in particular, it need not acquire its leaf rcu_node
      structure's ->lock to check.  Otherwise, it needs to report the next
      quiescent state.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      97c668b8
    • rcu: Move synchronize_sched_expedited() to combining tree · bce5fa12
      Committed by Paul E. McKenney
      Currently, synchronize_sched_expedited() uses a single global counter
      to track the number of remaining context switches that the current
      expedited grace period must wait on.  This is problematic on large
      systems, where the resulting memory contention can be pathological.
      This commit therefore makes synchronize_sched_expedited() instead use
      the combining tree in the same manner as synchronize_rcu_expedited(),
      keeping memory contention down to a dull roar.
      
      This commit creates a temporary function sync_sched_exp_select_cpus()
      that is very similar to sync_rcu_exp_select_cpus().  A later commit
      will consolidate these two functions, which becomes possible when
      synchronize_sched_expedited() switches from stop_one_cpu_nowait() to
      smp_call_function_single().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      bce5fa12
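The contrast between a single global counter and a combining tree can be illustrated with a two-level tree of bitmasks. This is a single-threaded sketch of the general technique, not the kernel's rcu_node tree: `LEAVES`, `CPUS_PER_LEAF`, `struct tree_sketch`, `tree_init()`, and `report_cpu()` are hypothetical, and the per-node locking that a concurrent version would need is omitted.

```c
#include <stdbool.h>

#define LEAVES        2  /* hypothetical tree shape */
#define CPUS_PER_LEAF 4

struct tree_node_sketch {
	unsigned int mask;  /* bits (CPUs or children) still being waited on */
};

struct tree_sketch {
	struct tree_node_sketch root;
	struct tree_node_sketch leaf[LEAVES];
};

/* Start of a grace period: wait on every CPU and every leaf. */
static void tree_init(struct tree_sketch *t)
{
	t->root.mask = (1u << LEAVES) - 1;
	for (int i = 0; i < LEAVES; i++)
		t->leaf[i].mask = (1u << CPUS_PER_LEAF) - 1;
}

/*
 * A CPU reports in: clear its bit in its own leaf.  Only the last CPU
 * of a leaf touches the root, so most updates stay local to one leaf.
 * Returns true once the whole tree has been cleared.
 */
static bool report_cpu(struct tree_sketch *t, int cpu)
{
	int l = cpu / CPUS_PER_LEAF;

	t->leaf[l].mask &= ~(1u << (cpu % CPUS_PER_LEAF));
	if (t->leaf[l].mask == 0)
		t->root.mask &= ~(1u << l);
	return t->root.mask == 0;
}
```

With eight CPUs, only two of the eight reports ever write to the root node, whereas a global counter would be written by all eight.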
    • rcu: Consolidate tree setup for synchronize_rcu_expedited() · b9585e94
      Committed by Paul E. McKenney
      This commit replaces sync_rcu_preempt_exp_init1() and
      sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug()
      and sync_exp_reset_tree(), which will also be used by
      synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which
      contains code specific to synchronize_rcu_expedited().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b9585e94
  8. 04 Aug, 2015 · 2 commits
    • rcu,locking: Privatize smp_mb__after_unlock_lock() · 12d560f4
      Committed by Paul E. McKenney
      RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
      likely the only thing that ever will use it, so this commit makes this
      macro private to RCU.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
      12d560f4
    • rcu: Silence lockdep false positive for expedited grace periods · af859bea
      Committed by Paul E. McKenney
      In a CONFIG_PREEMPT=y kernel, synchronize_rcu_expedited()
      acquires the ->exp_funnel_mutex in rcu_preempt_state, then invokes
      synchronize_sched_expedited(), which acquires the ->exp_funnel_mutex in
      rcu_sched_state.  There can be no deadlock because rcu_preempt_state
      ->exp_funnel_mutex acquisition always precedes that of rcu_sched_state.
      But lockdep does not know that, so it gives false-positive splats.
      
      This commit therefore associates a separate lock_class_key structure
      with the rcu_sched_state structure's ->exp_funnel_mutex, allowing
      lockdep to see the lock ordering, avoiding the false positives.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      af859bea
  9. 18 Jul, 2015 · 11 commits
  10. 16 Jul, 2015 · 6 commits
  11. 28 May, 2015 · 3 commits