1. 07 10月, 2015 1 次提交
  2. 21 9月, 2015 3 次提交
    • P
      rcu: Make ->cpu_no_qs be a union for aggregate OR · 5b74c458
      Paul E. McKenney 提交于
      This commit converts the rcu_data structure's ->cpu_no_qs field
      to a union.  The bytewise side of this union allows individual access
      to indications as to whether this CPU needs to find a quiescent state
      for a normal (.norm) and/or expedited (.exp) grace period.  The setwise
      side of the union allows testing whether or not a quiescent state is
      needed at all, for either type of grace period.
      
      For now, only .norm is used.  A later commit will introduce the expedited
      usage.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      5b74c458
    • P
      rcu: Invert passed_quiesce and rename to cpu_no_qs · 0d43eb34
      Paul E. McKenney 提交于
      This commit inverts the sense of the rcu_data structure's ->passed_quiesce
      field and renames it to ->cpu_no_qs.  This will allow a later commit to
      use an "aggregate OR" operation to test expedited as well as normal grace
      periods without added overhead.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      0d43eb34
    • P
      rcu: Rename qs_pending to core_needs_qs · 97c668b8
      Paul E. McKenney 提交于
      An upcoming commit needs to invert the sense of the ->passed_quiesce
      rcu_data structure field, so this commit is taking this opportunity
      to clarify things a bit by renaming ->qs_pending to ->core_needs_qs.
      
      So if !rdp->core_needs_qs, then this CPU need not concern itself with
      quiescent states, in particular, it need not acquire its leaf rcu_node
      structure's ->lock to check.  Otherwise, it needs to report the next
      quiescent state.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      97c668b8
  3. 18 7月, 2015 6 次提交
  4. 28 5月, 2015 1 次提交
  5. 13 3月, 2015 1 次提交
    • P
      rcu: Process offlining and onlining only at grace-period start · 0aa04b05
      Paul E. McKenney 提交于
      Races between CPU hotplug and grace periods can be difficult to resolve,
      so the ->onoff_mutex is used to exclude the two events.  Unfortunately,
      this means that it is impossible for an outgoing CPU to perform the
      last bits of its offlining from its last pass through the idle loop,
      because sleeplocks cannot be acquired in that context.
      
      This commit avoids these problems by buffering online and offline events
      in a new ->qsmaskinitnext field in the leaf rcu_node structures.  When a
      grace period starts, the events accumulated in this mask are applied to
      the ->qsmaskinit field, and, if needed, up the rcu_node tree.  The special
      case of all CPUs corresponding to a given leaf rcu_node structure being
      offline while there are still elements in that structure's ->blkd_tasks
      list is handled using a new ->wait_blkd_tasks field.  In this case,
      propagating the offline bits up the tree is deferred until the beginning
      of the grace period after all of the tasks have exited their RCU read-side
      critical sections and removed themselves from the list, at which point
      the ->wait_blkd_tasks flag is cleared.  If one of that leaf rcu_node
      structure's CPUs comes back online before the list empties, then the
      ->wait_blkd_tasks flag is simply cleared.
      
      This of course means that RCU's notion of which CPUs are offline can be
      out of date.  This is OK because RCU need only wait on CPUs that were
      online at the time that the grace period started.  In addition, RCU's
      force-quiescent-state actions will handle the case where a CPU goes
      offline after the grace period starts.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      0aa04b05
  6. 16 1月, 2015 1 次提交
    • P
      rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors · 5cd37193
      Paul E. McKenney 提交于
      Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
      in places where it would be useful for it to apply to the normal RCU
      flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
      case for workloads that aggressively overload the system, particularly
      those that generate large numbers of RCU updates on systems running
      NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
      from cond_resched_rcu_qs() to the normal RCU flavors.
      
      Note that it is unfortunately necessary to leave the old ->passed_quiesce
      mechanism in place to allow quiescent states that apply to only one
      flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
      that case, but that is not so good for debugging of RCU internals.)
      In addition, if one of the RCU flavor's grace period has stalled, this
      will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight
      quiescent state visible from other CPUs.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu()
        was used in preemptible code. ]
      5cd37193
  7. 18 2月, 2014 2 次提交
  8. 04 12月, 2013 1 次提交
    • P
      rcu: Break call_rcu() deadlock involving scheduler and perf · 96d3fd0d
      Paul E. McKenney 提交于
      Dave Jones got the following lockdep splat:
      
      >  ======================================================
      >  [ INFO: possible circular locking dependency detected ]
      >  3.12.0-rc3+ #92 Not tainted
      >  -------------------------------------------------------
      >  trinity-child2/15191 is trying to acquire lock:
      >   (&rdp->nocb_wq){......}, at: [<ffffffff8108ff43>] __wake_up+0x23/0x50
      >
      > but task is already holding lock:
      >   (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
      >
      > which lock already depends on the new lock.
      >
      >
      > the existing dependency chain (in reverse order) is:
      >
      > -> #3 (&ctx->lock){-.-...}:
      >         [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >         [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
      >         [<ffffffff811500ff>] __perf_event_task_sched_out+0x2df/0x5e0
      >         [<ffffffff81091b83>] perf_event_task_sched_out+0x93/0xa0
      >         [<ffffffff81732052>] __schedule+0x1d2/0xa20
      >         [<ffffffff81732f30>] preempt_schedule_irq+0x50/0xb0
      >         [<ffffffff817352b6>] retint_kernel+0x26/0x30
      >         [<ffffffff813eed04>] tty_flip_buffer_push+0x34/0x50
      >         [<ffffffff813f0504>] pty_write+0x54/0x60
      >         [<ffffffff813e900d>] n_tty_write+0x32d/0x4e0
      >         [<ffffffff813e5838>] tty_write+0x158/0x2d0
      >         [<ffffffff811c4850>] vfs_write+0xc0/0x1f0
      >         [<ffffffff811c52cc>] SyS_write+0x4c/0xa0
      >         [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
      >
      > -> #2 (&rq->lock){-.-.-.}:
      >         [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >         [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
      >         [<ffffffff810980b2>] wake_up_new_task+0xc2/0x2e0
      >         [<ffffffff81054336>] do_fork+0x126/0x460
      >         [<ffffffff81054696>] kernel_thread+0x26/0x30
      >         [<ffffffff8171ff93>] rest_init+0x23/0x140
      >         [<ffffffff81ee1e4b>] start_kernel+0x3f6/0x403
      >         [<ffffffff81ee1571>] x86_64_start_reservations+0x2a/0x2c
      >         [<ffffffff81ee1664>] x86_64_start_kernel+0xf1/0xf4
      >
      > -> #1 (&p->pi_lock){-.-.-.}:
      >         [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >         [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
      >         [<ffffffff810979d1>] try_to_wake_up+0x31/0x350
      >         [<ffffffff81097d62>] default_wake_function+0x12/0x20
      >         [<ffffffff81084af8>] autoremove_wake_function+0x18/0x40
      >         [<ffffffff8108ea38>] __wake_up_common+0x58/0x90
      >         [<ffffffff8108ff59>] __wake_up+0x39/0x50
      >         [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
      >         [<ffffffff81111450>] __call_rcu+0x140/0x820
      >         [<ffffffff81111b8d>] call_rcu+0x1d/0x20
      >         [<ffffffff81093697>] cpu_attach_domain+0x287/0x360
      >         [<ffffffff81099d7e>] build_sched_domains+0xe5e/0x10a0
      >         [<ffffffff81efa7fc>] sched_init_smp+0x3b7/0x47a
      >         [<ffffffff81ee1f4e>] kernel_init_freeable+0xf6/0x202
      >         [<ffffffff817200be>] kernel_init+0xe/0x190
      >         [<ffffffff8173d22c>] ret_from_fork+0x7c/0xb0
      >
      > -> #0 (&rdp->nocb_wq){......}:
      >         [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
      >         [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >         [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
      >         [<ffffffff8108ff43>] __wake_up+0x23/0x50
      >         [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
      >         [<ffffffff81111450>] __call_rcu+0x140/0x820
      >         [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
      >         [<ffffffff81149abf>] put_ctx+0x4f/0x70
      >         [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
      >         [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
      >         [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
      >         [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
      >         [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
      >
      > other info that might help us debug this:
      >
      > Chain exists of:
      >   &rdp->nocb_wq --> &rq->lock --> &ctx->lock
      >
      >   Possible unsafe locking scenario:
      >
      >         CPU0                    CPU1
      >         ----                    ----
      >    lock(&ctx->lock);
      >                                 lock(&rq->lock);
      >                                 lock(&ctx->lock);
      >    lock(&rdp->nocb_wq);
      >
      >  *** DEADLOCK ***
      >
      > 1 lock held by trinity-child2/15191:
      >  #0:  (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
      >
      > stack backtrace:
      > CPU: 2 PID: 15191 Comm: trinity-child2 Not tainted 3.12.0-rc3+ #92
      >  ffffffff82565b70 ffff880070c2dbf8 ffffffff8172a363 ffffffff824edf40
      >  ffff880070c2dc38 ffffffff81726741 ffff880070c2dc90 ffff88022383b1c0
      >  ffff88022383aac0 0000000000000000 ffff88022383b188 ffff88022383b1c0
      > Call Trace:
      >  [<ffffffff8172a363>] dump_stack+0x4e/0x82
      >  [<ffffffff81726741>] print_circular_bug+0x200/0x20f
      >  [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
      >  [<ffffffff810c6439>] ? get_lock_stats+0x19/0x60
      >  [<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
      >  [<ffffffff810cc243>] lock_acquire+0x93/0x200
      >  [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
      >  [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
      >  [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
      >  [<ffffffff8108ff43>] __wake_up+0x23/0x50
      >  [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
      >  [<ffffffff81111450>] __call_rcu+0x140/0x820
      >  [<ffffffff8109bc8f>] ? local_clock+0x3f/0x50
      >  [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
      >  [<ffffffff81149abf>] put_ctx+0x4f/0x70
      >  [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
      >  [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
      >  [<ffffffff810c9af5>] ? trace_hardirqs_on_caller+0x115/0x1e0
      >  [<ffffffff810c9bcd>] ? trace_hardirqs_on+0xd/0x10
      >  [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
      >  [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
      >  [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
      
      The underlying problem is that perf is invoking call_rcu() with the
      scheduler locks held, but in NOCB mode, call_rcu() will with high
      probability invoke the scheduler -- which just might want to use its
      locks.  The reason that call_rcu() needs to invoke the scheduler is
      to wake up the corresponding rcuo callback-offload kthread, which
      does the job of starting up a grace period and invoking the callbacks
      afterwards.
      
      One solution (championed on a related problem by Lai Jiangshan) is to
      simply defer the wakeup to some point where scheduler locks are no longer
      held.  Since we don't want to unnecessarily incur the cost of such
      deferral, the task before us is threefold:
      
      1.	Determine when it is likely that a relevant scheduler lock is held.
      
      2.	Defer the wakeup in such cases.
      
      3.	Ensure that all deferred wakeups eventually happen, preferably
      	sooner rather than later.
      
      We use irqs_disabled_flags() as a proxy for relevant scheduler locks
      being held.  This works because the relevant locks are always acquired
      with interrupts disabled.  We may defer more often than needed, but that
      is at least safe.
      
      The wakeup deferral is tracked via a new field in the per-CPU and
      per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup.
      
      This flag is checked by the RCU core processing.  The __rcu_pending()
      function now checks this flag, which causes rcu_check_callbacks()
      to initiate RCU core processing at each scheduling-clock interrupt
      where this flag is set.  Of course this is not sufficient because
      scheduling-clock interrupts are often turned off (the things we used to
      be able to count on!).  So the flags are also checked on entry to any
      state that RCU considers to be idle, which includes both NO_HZ_IDLE idle
      state and NO_HZ_FULL user-mode-execution state.
      
      This approach should allow call_rcu() to be invoked regardless of what
      locks you might be holding, the key word being "should".
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      96d3fd0d
  9. 16 10月, 2013 1 次提交
  10. 05 5月, 2013 1 次提交
  11. 26 3月, 2013 1 次提交
  12. 17 11月, 2012 2 次提交
    • P
      rcu: Separate accounting of callbacks from callback-free CPUs · c635a4e1
      Paul E. McKenney 提交于
      Currently, callback invocations from callback-free CPUs are accounted to
      the CPU that registered the callback, but using the same field that is
      used for normal callbacks.  This makes it impossible to determine from
      debugfs output whether callbacks are in fact being diverted.  This commit
      therefore adds a separate ->n_nocbs_invoked field in the rcu_data structure
      in which diverted callback invocations are counted.  RCU's debugfs tracing
      still displays normal callback invocations using ci=, but displayed
      diverted callbacks with nci=.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      c635a4e1
    • P
      rcu: Add callback-free CPUs · 3fbfbf7a
      Paul E. McKenney 提交于
      RCU callback execution can add significant OS jitter and also can
      degrade both scheduling latency and, in asymmetric multiprocessors,
      energy efficiency.  This commit therefore adds the ability for selected
      CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded
      to kthreads.  If the "rcu_nocb_poll" boot parameter is also specified,
      these kthreads will do polling, removing the need for the offloaded
      CPUs to do wakeups.  At least one CPU must be doing normal callback
      processing: currently CPU 0 cannot be selected as a no-CBs CPU.
      In addition, attempts to offline the last normal-CBs CPU will fail.
      
      This feature was inspired by Jim Houston's and Joe Korty's JRCU, and
      this commit includes fixes to problems located by Fengguang Wu's
      kbuild test robot.
      
      [ paulmck: Added gfp.h include file as suggested by Fengguang Wu. ]
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      3fbfbf7a
  13. 09 11月, 2012 14 次提交
  14. 23 9月, 2012 3 次提交
  15. 13 8月, 2012 1 次提交
  16. 06 7月, 2012 1 次提交