1. 21 Apr, 2017 3 commits
    • srcu: Expedite srcu_schedule_cbs_snp() callback invocation · 0497b489
      Paul E. McKenney committed
      Although Tree SRCU does reduce delays when there is at least one
      synchronize_srcu_expedited() invocation pending, srcu_schedule_cbs_snp()
      still waits for SRCU_INTERVAL before invoking callbacks.  Since
      synchronize_srcu_expedited() now posts a callback and waits for
      that callback to do a wakeup, this destroys the expedited nature of
      synchronize_srcu_expedited().  This destruction became apparent to
      Marc Zyngier in the guise of a guest-OS bootup slowdown from five
      seconds to no fewer than forty seconds.
      
      This commit therefore invokes callbacks immediately at the end of the
      grace period when there is at least one synchronize_srcu_expedited()
      invocation pending.  This brought Marc's guest-OS bootup times back
      into the realm of reason.
      Reported-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Marc Zyngier <marc.zyngier@arm.com>
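The behavior described in 0497b489 can be pictured with a small userspace sketch. The struct, the function names, and the value given to SRCU_INTERVAL are assumptions made for illustration and are not taken from the kernel source; the point is only that the post-grace-period delay collapses to zero while an expedited request is outstanding.

```c
#include <assert.h>
#include <stdatomic.h>

#define SRCU_INTERVAL 1UL	/* assumed non-expedited delay, in jiffies */

/* Hypothetical stand-in for the relevant part of srcu_struct. */
struct srcu_sketch {
	atomic_ulong exp_pending;	/* expedited requests in flight */
};

static void exp_request_begin(struct srcu_sketch *sp)
{
	atomic_fetch_add(&sp->exp_pending, 1);
}

static void exp_request_end(struct srcu_sketch *sp)
{
	unsigned long old = atomic_fetch_sub(&sp->exp_pending, 1);

	assert(old > 0);	/* every end must pair with a begin */
}

/* Delay to apply before invoking callbacks once a grace period ends. */
static unsigned long callback_delay(struct srcu_sketch *sp)
{
	return atomic_load(&sp->exp_pending) ? 0UL : SRCU_INTERVAL;
}
```

A caller would bracket each expedited wait with exp_request_begin() and exp_request_end(), and the grace-period handler would consult callback_delay() when scheduling callback invocation.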
    • srcu: Parallelize callback handling · da915ad5
      Paul E. McKenney committed
      Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1,2].
      However, there are workloads that could result in a high volume of
      concurrent invocations of call_srcu(), which with current SRCU would
      result in excessive lock contention on the srcu_struct structure's
      ->queue_lock, which protects SRCU's callback lists.  This commit therefore
      moves SRCU to per-CPU callback lists, thus greatly reducing contention.
      
      Because a given SRCU instance no longer has a single centralized callback
      list, starting grace periods and invoking callbacks are both more complex
      than in the single-list Classic SRCU implementation.  Starting grace
      periods and handling callbacks are now handled using an srcu_node tree
      that is in some ways similar to the rcu_node trees used by RCU-bh,
      RCU-preempt, and RCU-sched (for example, the srcu_node tree shape is
      controlled by exactly the same Kconfig options and boot parameters that
      control the shape of the rcu_node tree).
      
      In addition, the old per-CPU srcu_array structure is now named srcu_data
      and contains an rcu_segcblist structure named ->srcu_cblist for its
      callbacks (and a spinlock to protect this).  The srcu_struct gets
      an srcu_gp_seq that is used to associate callback segments with the
      corresponding completion-time grace-period number.  These completion-time
      grace-period numbers are propagated up the srcu_node tree so that the
      grace-period workqueue handler can determine whether additional grace
      periods are needed on the one hand and where to look for callbacks that
      are ready to be invoked on the other.
      
      The srcu_barrier() function must now wait on all instances of the per-CPU
      ->srcu_cblist.  Because each ->srcu_cblist is protected by ->lock,
      srcu_barrier() can remotely add the needed callbacks.  In theory,
      it could also remotely start grace periods, but in practice doing so
      is complex and racy.  And interestingly enough, it is never necessary
      for srcu_barrier() to start a grace period because srcu_barrier() only
      enqueues a callback when a callback is already present--and it turns out
      that a grace period has to have already been started for this pre-existing
      callback.  Furthermore, it is only the callback that srcu_barrier()
      needs to wait on, not any particular grace period.  Therefore, a new
      rcu_segcblist_entrain() function enqueues the srcu_barrier() function's
      callback into the same segment occupied by the last pre-existing callback
      in the list.  The special case where all the pre-existing callbacks are
      on a different list (because they are in the process of being invoked)
      is handled by enqueuing srcu_barrier()'s callback into the RCU_DONE_TAIL
      segment, relying on the done-callbacks check that takes place after all
      callbacks are invoked.
      
      Note that the readers use the same algorithm as before.  Note that there
      is a separate srcu_idx that tells the readers what counter to increment.
      This unfortunately cannot be combined with srcu_gp_seq because they
      need to be incremented at different times.
      
      This commit introduces some ugly #ifdefs in rcutorture.  These will go
      away when I feel good enough about Tree SRCU to ditch Classic SRCU.
      
      Some crude performance comparisons, courtesy of a quickly hacked rcuperf
      asynchronous-grace-period capability:
      
      			Callback Queuing Overhead
      			-------------------------
      	# CPUS		Classic SRCU	Tree SRCU
      	------          ------------    ---------
      	     2              0.349 us     0.342 us
      	    16             31.66  us     0.4   us
      	    41             ---------     0.417 us
      
      The times are the 90th percentiles, a statistic that was chosen to reject
      the overheads of the occasional srcu_barrier() call needed to avoid OOMing
      the test machine.  The rcuperf test hangs when running Classic SRCU at 41
      CPUs, hence the line of dashes.  Despite the hacks to both the rcuperf code
      and the statistics, this is a convincing demonstration of Tree SRCU's
      performance and scalability advantages.
      
      [1] https://lwn.net/Articles/309030/
      [2] https://patchwork.kernel.org/patch/5108281/
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Fix initialization if synchronize_srcu_expedited() called first. ]
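The entrain operation described in da915ad5 can be sketched in userspace as follows. This is a simplified model rather than the kernel's rcu_segcblist code: the struct layout, the SEG_* names, and seglist_entrain() are assumptions for illustration, and all locking and memory ordering is left out. A callback is appended to the last non-empty segment, falling back to the done segment when every pre-existing callback has already been pulled off the list for invocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { SEG_DONE, SEG_WAIT, SEG_NEXT_READY, SEG_NEXT, SEG_COUNT };

struct cb {
	struct cb *next;
};

/* Hypothetical, simplified stand-in for rcu_segcblist. */
struct seglist {
	struct cb *head;
	struct cb **tails[SEG_COUNT];	/* tails[i]: end of segment i */
	long len;			/* callbacks queued or being invoked */
};

static void seglist_init(struct seglist *l)
{
	l->head = NULL;
	l->len = 0;
	for (int i = 0; i < SEG_COUNT; i++)
		l->tails[i] = &l->head;
}

/*
 * Queue cbp in the segment holding the last pre-existing callback.
 * Returns false, queuing nothing, if there are no callbacks at all.
 */
static bool seglist_entrain(struct seglist *l, struct cb *cbp)
{
	int i;

	assert(cbp != NULL);
	if (l->len == 0)
		return false;
	l->len++;
	cbp->next = NULL;
	/* Find the last non-empty segment; fall back to SEG_DONE. */
	for (i = SEG_NEXT; i > SEG_DONE; i--)
		if (l->tails[i] != l->tails[i - 1])
			break;
	*l->tails[i] = cbp;
	/* cbp now ends segment i and every (empty) segment after it. */
	for (; i < SEG_COUNT; i++)
		l->tails[i] = &cbp->next;
	return true;
}
```

Because the new callback shares a segment with an existing one, it needs no grace period of its own, which is the property the commit message relies on for srcu_barrier().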
    • kvm: Move srcu_struct fields to end of struct kvm · 6ade8694
      Paul E. McKenney committed
      Parallelizing SRCU callback handling will increase the size of
      srcu_struct, which will move the kvm structure's kvm_arch field out
      of reach of powerpc's current assembly code, which will result in the
      following sort of build error:
      
      arch/powerpc/kvm/book3s_hv_rmhandlers.S:617: Error: operand out of range (0x000000000000b328 is not between 0xffffffffffff8000 and 0x0000000000007fff)
      
      This commit moves the srcu_struct fields in the kvm structure to follow
      the kvm_arch field, which will allow powerpc's assembly code to continue
      to be able to reach the kvm_arch field.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Reported-by: Michael Ellerman <michaele@au1.ibm.com>
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      [ paulmck: Moved this commit to precede SRCU callback parallelization,
        and reworded the commit log into future tense, all in the name of
        bisectability. ]
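The layout problem in 6ade8694 can be reproduced in miniature. The structures below are stand-ins with invented sizes, not the real struct kvm; the sketch only shows how placing a large member ahead of a field pushes that field's offset beyond the signed 16-bit displacement range quoted in the build error, and how moving the large member to the end restores reachability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins; the sizes are invented for illustration. */
struct big_srcu { char bytes[0xb000]; };	/* enlarged srcu_struct */
struct arch_state { long regs[4]; };		/* kvm_arch */

struct kvm_before {			/* srcu_struct ahead of kvm_arch */
	long lock;
	struct big_srcu srcu;
	struct arch_state arch;
};

struct kvm_after {			/* srcu_struct moved to the end */
	long lock;
	struct arch_state arch;
	struct big_srcu srcu;
};

/* True if off fits the non-negative half of a signed 16-bit displacement. */
static bool fits_disp16(size_t off)
{
	return off <= 0x7fff;
}

/* Compile-time guard of the kind that would catch a regression early. */
static_assert(offsetof(struct kvm_after, arch) <= 0x7fff,
	      "arch must stay reachable from assembly");
```

Here offsetof(struct kvm_before, arch) exceeds 0x7fff while offsetof(struct kvm_after, arch) does not, mirroring the before and after states of the commit.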
  2. 19 Apr, 2017 37 commits
    • srcu: Introduce CLASSIC_SRCU Kconfig option · dad81a20
      Paul E. McKenney committed
      The TREE_SRCU rewrite is large and a bit on the non-simple side, so
      this commit helps reduce risk by allowing the old v4.11 SRCU algorithm
      to be selected using a new CLASSIC_SRCU Kconfig option that depends
      on RCU_EXPERT.  The default is to use the new TREE_SRCU and TINY_SRCU
      algorithms, in order to help get these the testing that they need.
      However, if your users do not require the update-side scalability that
      is to be provided by TREE_SRCU, select RCU_EXPERT and then CLASSIC_SRCU
      to revert to the old classic SRCU algorithm.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcutorture: Print Tiny SRCU reader statistics · 32071141
      Paul E. McKenney committed
      The srcu_torture_stats() function is adapted to the specific srcu_struct
      layout traditionally used by SRCU.  This commit therefore adds support
      for Tiny SRCU.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Create a tiny SRCU · d8be8173
      Paul E. McKenney committed
      In response to automated complaints about modifications to SRCU
      increasing its size, this commit creates a tiny SRCU that is
      used in SMP=n && PREEMPT=n builds.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • mm: Use static initialization for "srcu" · dde8da6c
      Paul E. McKenney committed
      The MM-notifier code currently dynamically initializes the srcu_struct
      named "srcu" at subsys_initcall() time, and includes a BUG_ON() to check
      this initialization in do_mmu_notifier_register().  Unfortunately, there
      is no foolproof way to verify that an srcu_struct has been initialized,
      given the possibility of an srcu_struct being allocated on the stack or
      on the heap.  This means that creating an srcu_struct_is_initialized()
      function is not a reasonable course of action.  Nor is peppering
      do_mmu_notifier_register() with SRCU-specific #ifdefs an attractive
      alternative.
      
      This commit therefore uses DEFINE_STATIC_SRCU() to initialize
      this srcu_struct at compile time, thus eliminating both the
      subsys_initcall()-time initialization and the runtime BUG_ON().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: <linux-mm@kvack.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
    • srcu: Crude control of expedited grace periods · f60d231a
      Paul E. McKenney committed
      SRCU's implementation of expedited grace periods has always assumed
      that the SRCU instance is idle when the expedited request arrives.
      This commit improves this a bit by maintaining a count of the number
      of outstanding expedited requests, thus allowing prior non-expedited
      grace periods to accommodate these requests by shifting to expedited mode.
      However, any non-expedited wait already in progress will still wait for
      the full duration.
      
      Improved control of expedited grace periods is planned, but one step
      at a time.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Merge ->srcu_state into ->srcu_gp_seq · 80a7956f
      Paul E. McKenney committed
      Updating ->srcu_state and ->srcu_gp_seq will lead to extremely complex
      race conditions given multiple callback queues, so this commit takes
      advantage of the two-bit state now available in rcu_seq counters to
      store the state in the bottom two bits of ->srcu_gp_seq.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Allow a second bit in rcu_seq for SRCU state · f1ec57a4
      Paul E. McKenney committed
      This commit increases the number of reserved bits at the bottom of an
      rcu_seq grace-period counter from one to two, as will be needed to
      accommodate SRCU's three-state grace periods.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
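A counter with two reserved low-order bits, as described in f1ec57a4, might be handled along the following lines. The accessor names rcu_seq_ctr() and rcu_seq_state() come from the neighboring commit messages; the macro names and the rcu_seq_with_state() helper are assumptions for this sketch, which runs in userspace and ignores memory ordering.

```c
#include <assert.h>

#define RCU_SEQ_CTR_SHIFT	2
#define RCU_SEQ_STATE_MASK	((1UL << RCU_SEQ_CTR_SHIFT) - 1)

/* Grace-period count held in the upper bits. */
static unsigned long rcu_seq_ctr(unsigned long s)
{
	return s >> RCU_SEQ_CTR_SHIFT;
}

/* State held in the bottom two bits. */
static unsigned long rcu_seq_state(unsigned long s)
{
	return s & RCU_SEQ_STATE_MASK;
}

/* Hypothetical helper: replace the state bits, keeping the count. */
static unsigned long rcu_seq_with_state(unsigned long s, unsigned long state)
{
	assert(state <= RCU_SEQ_STATE_MASK);
	return (s & ~RCU_SEQ_STATE_MASK) | state;
}
```

Keeping count and state in one word means both can be read or updated in a single access, which is what makes storing the SRCU state in ->srcu_gp_seq attractive.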
    • srcu: Improve rcu_seq grace-period-counter abstraction · 031aeee0
      Paul E. McKenney committed
      The expedited grace-period code contains several open-coded shifts that
      know the format of an rcu_seq grace-period counter, which is not
      particularly good style.  This commit therefore creates a new
      rcu_seq_ctr() function that extracts the counter portion of the
      counter, and an rcu_seq_state() function that extracts the low-order
      state bit.  This commit prepares for SRCU callback parallelization,
      which will require two state bits.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • 91e27c35
    • srcu: Make num_rcu_lvl[] array be external · e95d68d2
      Paul E. McKenney committed
      This commit makes the num_rcu_lvl[] array external so that SRCU can
      make use of it for initializing its upcoming srcu_node tree.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Move rcu_node traversal macros to rcu.h · efbe451d
      Paul E. McKenney committed
      This commit moves rcu_for_each_node_breadth_first(),
      rcu_for_each_nonleaf_node_breadth_first(), and
      rcu_for_each_leaf_node() from kernel/rcu/tree.h to
      kernel/rcu/rcu.h so that SRCU can access them.
      This commit is code-movement only.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove redundant levelcnt[] array from rcu_init_one() · 41f5c631
      Paul E. McKenney committed
      The levelcnt[] array is identical to num_rcu_lvl[], so this commit
      removes levelcnt[].
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Move rcu_init_levelspread() to rcu_tree_node.h · 2b34c43c
      Paul E. McKenney committed
      This commit moves the rcu_init_levelspread() function from
      kernel/rcu/tree.c to kernel/rcu/rcu.h so that SRCU can access it.  This is
      another step towards enabling SRCU to create its own combining tree.
      This commit is code-movement only, give or take knock-on adjustments.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Move combining-tree definitions for SRCU's benefit · f2425b4e
      Paul E. McKenney committed
      This commit moves the C preprocessor code that defines the default shape
      of the rcu_node combining tree to a new include/linux/rcu_node_tree.h
      file as a first step towards enabling SRCU to create its own combining
      tree, which in turn enables SRCU to implement per-CPU callback handling,
      thus avoiding contention on the lock currently guarding the single list
      of callbacks.  Note that users of SRCU still need to know the size of
      the srcu_struct structure, hence include/linux rather than kernel/rcu.
      
      This commit is code-movement only.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Use rcu_segcblist to track SRCU callbacks · 8660b7d8
      Paul E. McKenney committed
      This commit switches SRCU from custom-built callback queues to the new
      rcu_segcblist structure.  This change associates grace-period sequence
      numbers with groups of callbacks, which will be needed for efficient
      processing of per-CPU callbacks.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Add grace-period sequence numbers · ac367c1c
      Paul E. McKenney committed
      This commit adds grace-period sequence numbers, which will be used to
      handle mid-boot grace periods and per-CPU callback lists.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Move to state-based grace-period sequencing · c2a8ec07
      Paul E. McKenney committed
      The current SRCU grace-period processing might never reach the last
      portion of srcu_advance_batches().  This is OK given the current
      implementation, as the first portion, up to the try_check_zero()
      following the srcu_flip() is sufficient to drive grace periods forward.
      However, it has the unfortunate side-effect of making it impossible to
      determine when a given grace period has ended, and it will be necessary
      to efficiently trace ends of grace periods in order to efficiently handle
      per-CPU SRCU callback lists.
      
      This commit therefore adds states to the SRCU grace-period processing,
      so that the end of a given SRCU grace period is marked by the transition
      to the SRCU_STATE_DONE state.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
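One way to picture the state-based sequencing of c2a8ec07 is as a small state machine in which the end of a grace period is an explicit state rather than an implicit side effect. The enum values and gp_step() below are hypothetical; only the existence of a "done" state (SRCU_STATE_DONE) is taken from the commit message, and the two-scan structure is an assumption about how the readers' counters are checked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names; GP_DONE plays the role of SRCU_STATE_DONE. */
enum gp_state { GP_IDLE, GP_SCAN1, GP_SCAN2, GP_DONE };

/*
 * Take one step.  readers_drained reports whether the reader counters
 * currently being scanned have reached zero.
 */
static enum gp_state gp_step(enum gp_state s, bool readers_drained)
{
	switch (s) {
	case GP_IDLE:
		return GP_SCAN1;	/* a grace period was requested */
	case GP_SCAN1:
		return readers_drained ? GP_SCAN2 : GP_SCAN1;
	case GP_SCAN2:
		return readers_drained ? GP_DONE : GP_SCAN2;
	case GP_DONE:
		return GP_IDLE;		/* the end has been observed */
	}
	assert(!"unknown state");
	return s;
}
```

Any observer that sees the transition into GP_DONE knows that a specific grace period has ended, which is the property needed for associating callbacks with grace periods.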
    • srcu: Push srcu_advance_batches() fastpath into common case · c6e56f59
      Paul E. McKenney committed
      This commit simplifies the SRCU state machine by pushing the
      srcu_advance_batches() idle-SRCU fastpath into the common case.  This is
      done by giving srcu_reschedule() a delay parameter, which is zero in
      the call from srcu_advance_batches().
      
      This commit is a step towards numbering callbacks in order to
      efficiently handle per-CPU callback lists.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Fix warning in rcu_seq_end() · f010ed82
      Dmitry Vyukov committed
      The rcu_seq_end() function increments seq to signify completion
      of a grace period, then checks that seq is even and wakes
      _synchronize_rcu_expedited().  The _synchronize_rcu_expedited() function
      uses wait_event() to wait for even seq.  The problem is that wait_event()
      can return as soon as seq becomes even without waiting for the wakeup.
      In such case the warning in rcu_seq_end() can falsely fire if the next
      expedited grace period starts before the check.
      
      Check that seq has a good value before incrementing it.
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: syzkaller@googlegroups.com
      Cc: linux-kernel@vger.kernel.org
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: josh@joshtriplett.org
      Cc: jiangshanlai@gmail.com
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      
      ---
      
      syzkaller-triggered warning:
      
      WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533
      rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
      CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Workqueue: events wait_rcu_exp_gp
      Call Trace:
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
       panic+0x1fb/0x412 kernel/panic.c:179
       __warn+0x1c4/0x1e0 kernel/panic.c:540
       warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
       rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
       rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline]
       rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517
       rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline]
       wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570
       process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096
       worker_thread+0x223/0x19c0 kernel/workqueue.c:2230
       kthread+0x326/0x3f0 kernel/kthread.c:227
       ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
      ---
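The ordering change in f010ed82 can be sketched as follows. This is a single-threaded userspace model with hypothetical function names, assert() standing in for the kernel's warning, and no memory barriers. It illustrates why the sanity check belongs before the increment: as soon as the counter becomes even, a waiter may proceed and start the next grace period, so a check made afterwards can observe an odd value that is perfectly legitimate.

```c
#include <assert.h>

/* Odd value: grace period in progress.  Even value: idle. */
static void seq_start(unsigned long *sp)
{
	assert(!(*sp & 0x1));	/* must be idle before starting */
	*sp += 1;
}

static void seq_end(unsigned long *sp)
{
	/* Check first: once *sp is even, others may change it again. */
	assert(*sp & 0x1);
	*sp += 1;
}
```

With the check placed ahead of the increment, the function inspects only the state that it alone is responsible for.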
    • rcu: Expedited wakeups need to be fully ordered · 3c345825
      Paul E. McKenney committed
      Expedited grace periods use workqueue handlers that wake up the requesters,
      but there is no lock mediating this wakeup.  Therefore, memory barriers
      are required to ensure that the handler's memory references are seen by
      all to occur before synchronize_*_expedited() returns to its caller.
      Possibly detected by syzkaller.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Move rcu_seq_start() and friends to rcu.h · 2e8c28c2
      Paul E. McKenney committed
      This commit moves rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(),
      and rcu_seq_done() from kernel/rcu/tree.c to kernel/rcu/rcu.h.
      This will allow SRCU to use these functions, which in turn will
      allow SRCU to move from a single global callback queue to a
      per-CPU callback queue.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add single-element dequeue functions to rcu_segcblist · bdcabf4c
      Paul E. McKenney committed
      This commit adds single-element dequeue functions to rcu_segcblist.
      These are less efficient than using the extract and insert functions,
      but allow more precise debugging code.  These functions are thus
      expected to be used only in debug builds, for example, CONFIG_PROVE_RCU.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Allow early boot use of synchronize_srcu() · b5eaeaa5
      Paul E. McKenney committed
      This commit checks for pre-scheduler state, and if that early in the
      boot process, synchronize_srcu() and friends are no-ops.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Allow SRCU to access rcu_scheduler_active · 900b1028
      Paul E. McKenney committed
      This is primarily a code-movement commit in preparation for allowing
      SRCU to handle early-boot SRCU grace periods.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Abstract multi-tail callback list handling · 15fecf89
      Paul E. McKenney committed
      RCU has only one multi-tail callback list, which is implemented via
      the nxtlist, nxttail, nxtcompleted, qlen_lazy, and qlen fields in the
      rcu_data structure, and whose operations are open-coded throughout the
      Tree RCU implementation.  This has been more or less OK in the past,
      but upcoming callback-list optimizations in SRCU could really use
      a multi-tail callback list there as well.
      
      This commit therefore abstracts the multi-tail callback list handling
      into a new kernel/rcu/rcu_segcblist.h file, and uses this new API.
      The simple head-and-tail pointer callback list is also abstracted and
      applied everywhere except for the NOCB callback-offload lists.  (Yes,
      the plan is to apply them there as well, but this commit is already
      bigger than would be good.)
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
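A multi-tail callback list of the kind abstracted in 15fecf89 can be modeled as one singly linked list plus an array of tail pointers, one per segment. The sketch below is a simplified userspace illustration with assumed names (struct seglist, the SEG_* constants, and all three functions are hypothetical). In particular, seglist_advance() moves every segment forward unconditionally, whereas real code would consult grace-period numbers before advancing.

```c
#include <assert.h>
#include <stddef.h>

enum { SEG_DONE, SEG_WAIT, SEG_NEXT_READY, SEG_NEXT, SEG_COUNT };

struct cb {
	struct cb *next;
};

struct seglist {
	struct cb *head;
	struct cb **tails[SEG_COUNT];	/* tails[i]: end of segment i */
};

static void seglist_init(struct seglist *l)
{
	l->head = NULL;
	for (int i = 0; i < SEG_COUNT; i++)
		l->tails[i] = &l->head;
}

/* New callbacks always join the last segment. */
static void seglist_enqueue(struct seglist *l, struct cb *c)
{
	assert(c != NULL);
	c->next = NULL;
	*l->tails[SEG_NEXT] = c;
	l->tails[SEG_NEXT] = &c->next;
}

/* Merge each segment into its predecessor; SEG_NEXT ends up empty. */
static void seglist_advance(struct seglist *l)
{
	for (int i = SEG_DONE; i < SEG_NEXT; i++)
		l->tails[i] = l->tails[i + 1];
}

/* Number of callbacks in the done segment, that is, ready to invoke. */
static size_t seglist_done_count(struct seglist *l)
{
	size_t n = 0;

	for (struct cb **pp = &l->head; pp != l->tails[SEG_DONE];
	     pp = &(*pp)->next)
		n++;
	return n;
}
```

Advancing a segment is just a pointer assignment, so whole groups of callbacks change status without any list traversal.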
    • rcu: Default RCU_FANOUT_LEAF to 16 unless explicitly changed · b8c78d3a
      Paul E. McKenney committed
      If the RCU_EXPERT Kconfig option is not set (the default), then the
      RCU_FANOUT_LEAF Kconfig option will not be defined, which will cause
      the leaf-level rcu_node tree fanout to default to 32 on 32-bit systems
      and 64 on 64-bit systems.  This can result in excessive lock contention.
      This commit therefore changes the computation of the leaf-level rcu_node
      tree fanout so that the result will be 16 unless an explicit Kconfig or
      kernel-boot setting says otherwise.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
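The defaulting rule in b8c78d3a amounts to a preprocessor fallback. The fragment below is a sketch of that idea and not a copy of the kernel header; leaf_nodes() is a hypothetical helper added to show the effect on tree geometry, and the boot-parameter override mentioned in the commit message is not modeled.

```c
#include <assert.h>

/* CONFIG_RCU_FANOUT_LEAF is defined only when set explicitly. */
#ifdef CONFIG_RCU_FANOUT_LEAF
#define RCU_FANOUT_LEAF CONFIG_RCU_FANOUT_LEAF
#else
#define RCU_FANOUT_LEAF 16
#endif

/* Leaf rcu_node structures needed to cover ncpus CPUs. */
static unsigned int leaf_nodes(unsigned int ncpus)
{
	assert(ncpus > 0);
	return (ncpus + RCU_FANOUT_LEAF - 1) / RCU_FANOUT_LEAF;
}
```

With the fallback of 16, a 64-CPU system spreads its CPUs over four leaf nodes rather than sharing a single leaf lock among all 64.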
    • rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions · 9226b10d
      Paul E. McKenney committed
      The rcu_all_qs() and rcu_note_context_switch() functions do a series of checks,
      taking various actions to supply RCU with quiescent states, depending
      on the outcomes of the various checks.  This is a bit much for scheduling
      fastpaths, so this commit creates a separate ->rcu_urgent_qs field in
      the rcu_dynticks structure that acts as a global guard for these checks.
      Thus, in the common case, rcu_all_qs() and rcu_note_context_switch()
      check the ->rcu_urgent_qs field, find it false, and simply return.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
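The guard pattern from 9226b10d looks roughly like this in a userspace sketch. Only the ->rcu_urgent_qs field name comes from the commit message; the other fields and the function are hypothetical, and per-CPU access and memory ordering are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state; only rcu_urgent_qs is named in the commit. */
struct dynticks_sketch {
	bool rcu_urgent_qs;		/* guard for everything below */
	bool need_heavy_qs;		/* hypothetical detailed request */
	unsigned long qs_reported;	/* hypothetical bookkeeping */
};

static void all_qs_sketch(struct dynticks_sketch *d)
{
	assert(d != NULL);
	if (!d->rcu_urgent_qs)
		return;			/* common case: one load, one branch */
	d->rcu_urgent_qs = false;
	if (d->need_heavy_qs) {
		d->need_heavy_qs = false;
		/* the expensive operation would go here */
	}
	d->qs_reported++;
}
```

The detailed checks run only after the guard has been set, so the scheduling fastpath pays for a single test in the common case.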
    • rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle() · 0f9be8ca
      Paul E. McKenney committed
      The rcu_momentary_dyntick_idle() function scans the RCU flavors, checking
      that one of them still needs a quiescent state before doing an expensive
      atomic operation on the ->dynticks counter.  However, this check reduces
      overhead only after a rare race condition, and increases complexity.  This
      commit therefore removes the scan and the mechanism enabling the scan.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Pull rcu_qs_ctr into rcu_dynticks structure · 9577df9a
      Paul E. McKenney committed
      The rcu_qs_ctr variable is yet another isolated per-CPU variable,
      so this commit pulls it into the pre-existing rcu_dynticks per-CPU
      structure.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure · abb06b99
      Paul E. McKenney committed
      The rcu_sched_qs_mask variable is yet another isolated per-CPU variable,
      so this commit pulls it into the pre-existing rcu_dynticks per-CPU
      structure.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Semicolon inside RCU_TRACE() for tree.c · 88a4976d
      Paul E. McKenney committed
      The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
      where "statement" is a local-variable declaration, as it can leave a
      misplaced ";" in the source code.  This commit therefore converts these
      to "RCU_TRACE(statement;)", which avoids the misplaced ";".
      Reported-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
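The difference between the two spellings shows up when tracing is compiled out. In the sketch below, the RCU_TRACE() definition is an assumption about the macro's general shape and count_set() is a hypothetical function. Written as "RCU_TRACE(int visited = 0);", the disabled case would leave a bare ";" between the declarations; written with the semicolon inside, the whole line disappears.

```c
#include <assert.h>

#ifdef CONFIG_RCU_TRACE
#define RCU_TRACE(stmt) stmt
#else
#define RCU_TRACE(stmt)
#endif

static int count_set(const int *flags, int n)
{
	int i;
	RCU_TRACE(int visited = 0;)	/* vanishes entirely when tracing is off */
	int hits = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		RCU_TRACE(visited++;)
		if (flags[i])
			hits++;
	}
	RCU_TRACE(assert(visited == n);)
	return hits;
}
```

The function compiles to the same result with or without CONFIG_RCU_TRACE defined, and no empty statement is left behind in either case.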
    • rcu: Semicolon inside RCU_TRACE() for Tiny RCU · 6c8c1485
      Paul E. McKenney committed
      The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
      where "statement" is a local-variable declaration, as it can leave a
      misplaced ";" in the source code.  This commit therefore converts these
      to "RCU_TRACE(statement;)", which avoids the misplaced ";".
      Reported-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Semicolon inside RCU_TRACE() for rcu.h · dffd06a7
      Paul E. McKenney committed
      The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
      where "statement" is a local-variable declaration, as it can leave a
      misplaced ";" in the source code.  This commit therefore converts these
      to "RCU_TRACE(statement;)", which avoids the misplaced ";".
      Reported-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Check for tardy grace-period activity in cleanup_srcu_struct() · 15c68f7f
      Paul E. McKenney committed
      Users of SRCU are obliged to complete all grace-period activity before
      invoking cleanup_srcu_struct().  This means that all calls to either
      synchronize_srcu() or synchronize_srcu_expedited() must have returned,
      and all calls to call_srcu() must have returned, and the last call to
      call_srcu() must have been followed by a call to srcu_barrier().
      Furthermore, the caller must have done something to prevent any
      further calls to synchronize_srcu(), synchronize_srcu_expedited(),
      and call_srcu().
      
      Therefore, if there has ever been an invocation of call_srcu() on
      the srcu_struct in question, the sequence of events must be as
      follows:
      
      1.  Prevent any further calls to call_srcu().
      2.  Wait for any pre-existing call_srcu() invocations to return.
      3.  Invoke srcu_barrier().
      4.  It is now safe to invoke cleanup_srcu_struct().
      
      On the other hand, if there has ever been a call to synchronize_srcu()
      or synchronize_srcu_expedited(), the sequence of events must be as
      follows:
      
      1.  Prevent any further calls to synchronize_srcu() or
          synchronize_srcu_expedited().
      2.  Wait for any pre-existing synchronize_srcu() or
          synchronize_srcu_expedited() invocations to return.
      3.  It is now safe to invoke cleanup_srcu_struct().
      
      If there have been calls to both types of functions (call_srcu()
      and either of synchronize_srcu() and synchronize_srcu_expedited()), then
      the caller must do the first three steps of the call_srcu() procedure
      above and the first two steps of the synchronize_s*() procedure above,
      and only then invoke cleanup_srcu_struct().
      
      Note that cleanup_srcu_struct() does some probabilistic checks
      for the caller failing to follow these procedures, in which case
      cleanup_srcu_struct() does WARN_ON() and avoids freeing the per-CPU
      structures associated with the specified srcu_struct structure.
      Reported-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • srcu: Consolidate batch checking into rcu_all_batches_empty() · cc985822
      Paul E. McKenney committed
      The srcu_reschedule() function invokes rcu_batch_empty() on each of
      the four rcu_batch structures in the srcu_struct in question twice.
      Given that this check will also be needed in cleanup_srcu_struct(), this
      commit consolidates these four checks into a new rcu_all_batches_empty()
      function.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: Make arch select smp_mb__after_unlock_lock() strength · 77e58496
      Paul E. McKenney committed
      The definition of smp_mb__after_unlock_lock() is currently smp_mb()
      for CONFIG_PPC and a no-op otherwise.  It would be better to instead
      provide an architecture-selectable Kconfig option, and select the
      strength of smp_mb__after_unlock_lock() based on that option.  This
      commit therefore creates ARCH_WEAK_RELEASE_ACQUIRE, has PPC select it,
      and bases the definition of smp_mb__after_unlock_lock() on this new
      ARCH_WEAK_RELEASE_ACQUIRE Kconfig option.
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Boqun Feng <boqun.feng@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: <linuxppc-dev@lists.ozlabs.org>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: Maintain special bits at bottom of ->dynticks counter · b8c17e66
      Paul E. McKenney committed
      Currently, IPIs are used to force other CPUs to invalidate their TLBs
      in response to a kernel virtual-memory mapping change.  This works, but
      degrades both battery lifetime (for idle CPUs) and real-time response
      (for nohz_full CPUs), and in addition results in unnecessary IPIs due to
      the fact that CPUs executing in usermode are unaffected by stale kernel
      mappings.  It would be better to cause a CPU executing in usermode to
      wait until it is entering kernel mode to do the flush, first to avoid
      interrupting usermode tasks and second to handle multiple flush requests
      with a single flush in the case of a long-running user task.
      
      This commit therefore reserves a bit at the bottom of the ->dynticks
      counter, which is checked upon exit from extended quiescent states.
      If it is set, it is cleared and then a new rcu_eqs_special_exit() macro is
      invoked, which, if not supplied, is an empty single-pass do-while loop.
      If this bottom bit is set on -entry- to an extended quiescent state,
      then a WARN_ON_ONCE() triggers.
      
      This bottom bit may be set using a new rcu_eqs_special_set() function,
      which returns true if the bit was set, or false if the CPU turned
      out to not be in an extended quiescent state.  Please note that this
      function refuses to set the bit for a non-nohz_full CPU when that CPU
      is executing in usermode because usermode execution is tracked by RCU
      as a dyntick-idle extended quiescent state only for nohz_full CPUs.
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
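The reserved-bit scheme of b8c17e66 can be sketched in userspace with C11 atomics. The constant names, the struct, and the function names are assumptions for illustration, assert() stands in for WARN_ON_ONCE(), and a counter increment stands in for the rcu_eqs_special_exit() hook. The counter advances in steps of two so that bit 0 stays free for the request flag, and bit 1 being set is taken to mean that the CPU is not in an extended quiescent state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CTRL_MASK 0x1	/* reserved request bit */
#define CTRL_CTR  0x2	/* counter increment; set when not in an EQS */

struct dyn_sketch {
	atomic_int dynticks;
	int special_exits;	/* stands in for rcu_eqs_special_exit() */
};

static void eqs_enter(struct dyn_sketch *d)
{
	int seq = atomic_fetch_add(&d->dynticks, CTRL_CTR) + CTRL_CTR;

	assert(!(seq & CTRL_CTR));	/* now in an EQS */
	assert(!(seq & CTRL_MASK));	/* request bit must be clear on entry */
}

static void eqs_exit(struct dyn_sketch *d)
{
	int seq = atomic_fetch_add(&d->dynticks, CTRL_CTR) + CTRL_CTR;

	assert(seq & CTRL_CTR);		/* no longer in an EQS */
	if (seq & CTRL_MASK) {
		atomic_fetch_and(&d->dynticks, ~CTRL_MASK);
		d->special_exits++;	/* deferred work runs here */
	}
}

/* Returns true if the bit was set, false if the CPU is not in an EQS. */
static bool eqs_special_set(struct dyn_sketch *d)
{
	int old = atomic_load(&d->dynticks);

	do {
		if (old & CTRL_CTR)
			return false;
	} while (!atomic_compare_exchange_weak(&d->dynticks, &old,
					       old | CTRL_MASK));
	return true;
}
```

A remote CPU that gets true back from eqs_special_set() can skip its IPI, since the target will run the deferred work on its way out of the extended quiescent state.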