1. 09 Jun 2017, 14 commits
  2. 08 Jun 2017, 26 commits
    • rcu: Use RCU_NOCB_WAKE rather than RCU_NOGP_WAKE · 511324e4
      Committed by Paul E. McKenney
      The RCU_NOGP_WAKE_NOT, RCU_NOGP_WAKE, and RCU_NOGP_WAKE_FORCE flags
      are used to mediate wakeups for the no-CBs CPU kthreads.  The "NOGP"
      really doesn't make any sense, so this commit does s/NOGP/NOCB/.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      511324e4
    • sched: Rely on synchronize_rcu_mult() de-duplication · d7d34d5e
      Committed by Paul E. McKenney
      The synchronize_rcu_mult() function now detects duplicate requests
      for the same grace-period flavor and waits only once for each flavor.
      This commit therefore removes the ugly #ifdef from sched_cpu_deactivate()
      because synchronize_rcu_mult(call_rcu, call_rcu_sched) now does what
      the #ifdef used to be needed for.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      d7d34d5e
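      As a rough sketch of the simplification (not the exact diff; the
      pre-existing #else branch shown here is an assumption), the relevant
      part of sched_cpu_deactivate() goes from:

          /* Before: the caller had to special-case non-preemptible kernels. */
          #ifdef CONFIG_PREEMPT
                  synchronize_rcu_mult(call_rcu, call_rcu_sched);
          #else
                  synchronize_rcu();      /* assumed pre-existing fallback */
          #endif

          /* After: duplicate flavors are waited for only once, so one call suffices. */
          synchronize_rcu_mult(call_rcu, call_rcu_sched);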
    • rcu: Make synchronize_rcu_mult() check for duplicates · 68ab0b42
      Committed by Paul E. McKenney
      Currently, doing synchronize_rcu_mult(call_rcu, call_rcu) might
      (or might not) wait for two RCU grace periods.  One approach is
      of course "don't do that!", but in CONFIG_PREEMPT=n kernels,
      synchronize_rcu_mult(call_rcu, call_rcu_sched) does exactly that.
      This results in an ugly #ifdef in sched_cpu_deactivate().
      
      This commit therefore makes __wait_rcu_gp() check for duplicates,
      which in turn allows duplicates to be passed to synchronize_rcu_mult()
      without risk of waiting twice on the same type of grace period.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      68ab0b42
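      The idea behind the duplicate check can be sketched as follows (a
      simplified illustration of __wait_rcu_gp()'s approach; wait_for_flavor()
      is a hypothetical stand-in for the real wait logic):

          int i, j;

          for (i = 0; i < n; i++) {
                  /* Skip any entry that repeats an earlier flavor. */
                  for (j = 0; j < i; j++)
                          if (crcu_array[j] == crcu_array[i])
                                  break;
                  if (j == i)
                          wait_for_flavor(crcu_array[i]);  /* hypothetical helper */
          }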
    • srcu: Add DEBUG_OBJECTS_RCU_HEAD functionality · a602538e
      Committed by Paul E. McKenney
      This commit adds DEBUG_OBJECTS_RCU_HEAD checking to detect call_srcu()
      counterparts to double-free bugs.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      a602538e
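      Conceptually, the checking brackets each SRCU callback with the
      debug-objects queue/unqueue hooks, roughly as below (a sketch; SRCU's
      actual enqueueing and locking are omitted, and call_srcu_sketch() is a
      made-up name):

          void call_srcu_sketch(struct srcu_struct *sp, struct rcu_head *rhp,
                                rcu_callback_t func)
          {
                  if (debug_rcu_head_queue(rhp)) {
                          /* Probable double call_srcu() on the same rcu_head. */
                          WARN_ON_ONCE(1);
                          return;
                  }
                  rhp->func = func;
                  /* ... enqueue rhp on the srcu_struct's callback list ... */
          }

          /* And when the callback is eventually invoked: */
          debug_rcu_head_unqueue(rhp);
          rhp->func(rhp);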
    • srcu: Shrink Tiny SRCU a bit · d4efe6c5
      Committed by Paul E. McKenney
      In Tiny SRCU, __srcu_read_lock() is a trivial function, outweighed by
      its EXPORT_SYMBOL_GPL() and, on many architectures, by its call sequence.
      This commit therefore moves it to srcutiny.h so that it can be inlined.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      d4efe6c5
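      After the move, the reader entry point can be a static inline in
      srcutiny.h, along these lines (a sketch assuming Tiny SRCU's per-index
      nesting counters; treat field names as an approximation):

          static inline int __srcu_read_lock(struct srcu_struct *sp)
          {
                  int idx;

                  idx = READ_ONCE(sp->srcu_idx);
                  WRITE_ONCE(sp->srcu_lock_nesting[idx],
                             sp->srcu_lock_nesting[idx] + 1);
                  return idx;
          }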
    • rcu: Add lockdep_assert_held() teeth to tree_plugin.h · ea9b0c8a
      Committed by Paul E. McKenney
      Comments can be helpful, but assertions carry more force.  This commit
      therefore adds lockdep_assert_held() and RCU_LOCKDEP_WARN() calls to
      enforce lock-held and interrupt-disabled preconditions.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      ea9b0c8a
    • rcu: Add lockdep_assert_held() teeth to tree.c · c0b334c5
      Committed by Paul E. McKenney
      Comments can be helpful, but assertions carry more force.  This
      commit therefore adds lockdep_assert_held() and RCU_LOCKDEP_WARN()
      calls to enforce lock-held and interrupt-disabled preconditions.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      c0b334c5
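      The assertions added by the two commits above take this general form (a
      sketch; example_rnp_operation() is a made-up name, and the actual call
      sites, locks, and messages vary):

          static void example_rnp_operation(struct rcu_node *rnp)
          {
                  /* Caller must hold this rcu_node structure's ->lock. */
                  lockdep_assert_held(&rnp->lock);

                  /* Caller must have interrupts disabled. */
                  RCU_LOCKDEP_WARN(!irqs_disabled(),
                                   "example_rnp_operation() with irqs enabled");
                  /* ... */
          }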
    • srcu: Print non-default exp_holdoff values at boot time · 0c8e0e3c
      Committed by Paul E. McKenney
      This commit makes srcu_bootup_announce() check for non-default values
      of the auto-expedite holdoff time exp_holdoff and print a message if so.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      0c8e0e3c
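      The announcement amounts to a conditional pr_info() at boot, roughly as
      sketched below (DEFAULT_SRCU_EXP_HOLDOFF stands in for whatever the
      built-in default is named, and the initcall wiring is an assumption):

          static int __init srcu_bootup_announce(void)
          {
                  pr_info("Hierarchical SRCU implementation.\n");
                  if (exp_holdoff != DEFAULT_SRCU_EXP_HOLDOFF)
                          pr_info("\tNon-default auto-expedite holdoff of %lu ns.\n",
                                  exp_holdoff);
                  return 0;
          }
          early_initcall(srcu_bootup_announce);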
    • srcu: Make exp_holdoff module parameter be static · b5815e6c
      Committed by Paul E. McKenney
      Because exp_holdoff is not used outside of srcutree.c, it can be static.
      This commit therefore makes this change.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b5815e6c
    • rcu: Update rcu_bootup_announce_oddness() · 17c7798b
      Committed by Paul E. McKenney
      This commit updates rcu_bootup_announce_oddness() to check additional
      Kconfig options and module/boot parameters.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      17c7798b
    • rcu: Print out rcupdate.c non-default boot-time settings · 59d80fd8
      Committed by Paul E. McKenney
      This commit adds a rcupdate_announce_bootup_oddness() function to
      print out non-default values of significant kernel boot parameter
      settings to aid in debugging.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      59d80fd8
    • rcu: Add preemptibility checks in rcu_sched_qs() and rcu_bh_qs() · f4687d26
      Committed by Paul E. McKenney
      This commit adds WARN_ON_ONCE() calls that trigger if either
      rcu_sched_qs() or rcu_bh_qs() are invoked with preemption enabled.
      In the immortal words of Peter Zijlstra: "these are much harder to ignore
      than comments".
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      f4687d26
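      In sketch form, the check is a one-liner at the top of each function;
      one plausible spelling that follows the wording above (the exact
      predicate used upstream may differ):

          void rcu_sched_qs(void)
          {
                  /* This is expected to run with preemption disabled. */
                  WARN_ON_ONCE(preemptible());
                  /* ... record the quiescent state as before ... */
          }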
    • rcuperf: Add writer_holdoff boot parameter · 820687a7
      Committed by Paul E. McKenney
      This commit adds a writer_holdoff boot parameter to rcuperf, which is
      intended to be used to test Tree SRCU's auto-expediting.  This
      boot parameter is in microseconds, and defaults to zero (that is,
      disabled).  Set it a bit larger than srcutree.exp_holdoff (keeping the
      nanosecond-to-microsecond conversion in mind) to force Tree SRCU
      to auto-expedite more aggressively.
      
      This commit also adds documentation for this parameter, and fixes some
      alphabetization while in the neighborhood.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      820687a7
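      The parameter boils down to a per-iteration delay in the writer kthread,
      roughly as below (a sketch; torture_param() is how rcuperf's other
      module parameters are declared):

          torture_param(int, writer_holdoff, 0,
                        "Holdoff (us) between GPs, zero to disable");

          /* In the writer kthread's main loop: */
          if (writer_holdoff)
                  udelay(writer_holdoff);
          /* ... then start timing the next grace period ... */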
    • rcuperf: Set more user-friendly defaults · 492b95e5
      Committed by Paul E. McKenney
      Common-case use of rcuperf must set rcuperf.nreaders=0 and, if not built
      as a module, rcuperf.shutdown=1.  This commit therefore sets the default
      for rcuperf.nreaders to zero and sets the default for rcuperf.shutdown
      to zero if rcuperf is built as a module and to one otherwise.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      492b95e5
    • srcu: Shrink Tiny SRCU a bit more · 3ddf20c9
      Committed by Paul E. McKenney
      This commit rearranges Tiny SRCU's srcu_struct structure, substitutes
      u8 for bool, and shrinks counters down to short.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      3ddf20c9
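      The shrunken structure looks roughly like this (a sketch of the layout
      implied above; exact field names and ordering are an approximation):

          struct srcu_struct {
                  short srcu_lock_nesting[2];      /* srcu_read_lock() nesting depth. */
                  short srcu_idx;                  /* Current reader array element. */
                  u8 srcu_gp_running;              /* Grace-period workqueue running? */
                  u8 srcu_gp_waiting;              /* Grace period waiting on a reader? */
                  struct swait_queue_head srcu_wq; /* Last reader wakes the grace period. */
                  struct rcu_head *srcu_cb_head;   /* Callback list: head. */
                  struct rcu_head **srcu_cb_tail;  /* Callback list: tail. */
                  struct work_struct srcu_work;    /* Drives grace periods. */
          };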
    • srcu: Make Classic and Tree SRCU announce themselves at bootup · 1f4f6da1
      Committed by Paul E. McKenney
      Currently, the only way to tell whether a given kernel is running
      Classic, Tiny, or Tree SRCU is to look at the .config file, which
      can easily be lost or associated with the wrong kernel.  This commit
      therefore has Classic and Tree SRCU identify themselves at boot time.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      1f4f6da1
    • rcuperf: Add test for dynamically initialized srcu_struct · f60cb4d4
      Committed by Paul E. McKenney
      This commit adds a perf_type of "srcud", which specifies that rcuperf
      test SRCU on a dynamically initialized srcu_struct.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      f60cb4d4
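      The difference from the pre-existing "srcu" perf_type is that the
      srcu_struct is initialized and cleaned up at test time rather than
      defined statically, in the spirit of the sketch below (variable and
      function names here are illustrative, not rcuperf's actual ones):

          /* Static variant, as exercised by the existing "srcu" perf_type: */
          DEFINE_STATIC_SRCU(perf_srcu);

          /* Dynamic variant exercised by "srcud": */
          static struct srcu_struct perf_srcud;

          static void perf_srcud_init(void)
          {
                  WARN_ON_ONCE(init_srcu_struct(&perf_srcud));
          }

          static void perf_srcud_cleanup(void)
          {
                  cleanup_srcu_struct(&perf_srcud);
          }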
    • rcu: Make sync_rcu_preempt_exp_done() return bool · dcfc315b
      Committed by Paul E. McKenney
      The sync_rcu_preempt_exp_done() function returns a logical expression,
      but its return type is nevertheless int.  This commit therefore changes
      the return type to bool.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      dcfc315b
    • rcuperf: Add ability to performance-test call_rcu() and friends · 881ed593
      Committed by Paul E. McKenney
      This commit upgrades rcuperf so that it can do performance testing on
      asynchronous grace-period primitives such as call_srcu().  There is
      a new rcuperf.gp_async module parameter that specifies this new behavior,
      with the pre-existing rcuperf.gp_exp testing expedited grace periods such as
      synchronize_rcu_expedited(), and with the default being to test synchronous
      non-expedited grace periods such as synchronize_rcu().
      
      There is also a new rcuperf.gp_async_max module parameter that specifies
      the maximum number of outstanding callbacks per writer kthread, defaulting
      to 1,000.  When this limit is exceeded, the writer thread invokes the
      appropriate flavor of rcu_barrier() to wait for callbacks to drain.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Removed the redundant initialization noted by Arnd Bergmann. ]
      881ed593
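      The asynchronous mode can be pictured as the writer posting callbacks
      and periodically draining them, something like the fragment below
      (heavily simplified; cur_ops->async, cur_ops->gp_barrier,
      rcu_perf_async_cb, n_outstanding, and cb (a per-callback wrapper around
      an rcu_head) are illustrative names, not rcuperf's exact ones):

          /* In the writer kthread, when rcuperf.gp_async is set: */
          cur_ops->async(&cb->rh, rcu_perf_async_cb);   /* call_rcu(), call_srcu(), ... */
          if (++n_outstanding >= gp_async_max) {
                  /* Too many callbacks in flight: wait for them to drain. */
                  cur_ops->gp_barrier();                /* rcu_barrier(), srcu_barrier(), ... */
                  n_outstanding = 0;
          }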
    • rcu: Remove obsolete reference to synchronize_kernel() · e28371c8
      Committed by Paul E. McKenney
      The synchronize_kernel() primitive was removed in favor of
      synchronize_sched() more than a decade ago, and it seems likely that
      rather few kernel hackers are familiar with it.  Its continued presence
      is therefore providing more confusion than enlightenment.  This commit
      therefore removes the reference from the synchronize_sched() header
      comment, and adds the corresponding information to the synchronize_rcu()
      header comment.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      e28371c8
    • rcuperf: Defer expedited/normal check to end of test · 9683937d
      Committed by Paul E. McKenney
      At startup, rcuperf currently checks whether the user asked to measure
      only expedited grace periods, yet constrained all grace periods to be
      normal, or if the user asked to measure only normal grace periods, yet
      constrained all grace periods to be expedited.  Useless tests of this
      sort are aborted.
      
      Unfortunately, making RCU work through the mid-boot dead zone [1] puts
      RCU into expedited-only mode during that zone, which also happens to be
      the exact time that rcuperf carries out the aforementioned check.
      So if the user asks rcuperf to measure only normal grace periods (the
      default), rcuperf will now always complain and terminate the test.
      
      This commit therefore moves the checks to rcu_perf_cleanup().  This has
      the disadvantage of failing to abort useless tests, but avoids the need to
      create yet another kthread and the need to do fiddly checks involving the
      holdoff time.  (Yes, another approach is to do the checks in a late-stage
      init function, but that would require some way to communicate badness
      to rcuperf's kthreads, and seems not worth the bother.)
      
      [1] https://lwn.net/Articles/716148/
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      9683937d
    • rcu: Complain if blocking in preemptible RCU read-side critical section · 5b72f964
      Committed by Paul E. McKenney
      Although preemptible RCU allows its read-side critical sections to be
      preempted, general blocking is forbidden.  The reason for this is that
      excessive preemption times can be handled by CONFIG_RCU_BOOST=y, but a
      voluntarily blocked task doesn't care how high you boost its priority.
      Because preemptible RCU is a global mechanism, one ill-behaved reader
      hurts everyone.  Hence the prohibition against general blocking in
      RCU-preempt read-side critical sections.  Preemption yes, blocking no.
      
      This commit enforces this prohibition.
      
      There is a special exception for the -rt patchset (which they kindly
      volunteered to implement):  It is OK to block (as opposed to merely being
      preempted) within an RCU-preempt read-side critical section, but only if
      the blocking is subject to priority inheritance.  This exception permits
      CONFIG_RCU_BOOST=y to get -rt RCU readers out of trouble.
      
      Why doesn't this exception also apply to mainline's rt_mutex?  Because
      of the possibility that someone does general blocking while holding
      an rt_mutex.  Yes, the priority boosting will affect the rt_mutex,
      but it won't help with the task doing general blocking while holding
      that rt_mutex.
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      5b72f964
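      The enforcement reduces to a check when a task inside an RCU-preempt
      read-side critical section passes through the scheduler, in the spirit
      of the sketch below (the real check lives in RCU's context-switch path
      and also accommodates the -rt priority-inheritance exception; the
      function name here is made up):

          void rcu_note_context_switch_sketch(bool preempt)
          {
                  struct task_struct *t = current;

                  /* Voluntary blocking inside an RCU-preempt reader is a bug. */
                  WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
                  /* ... existing context-switch processing ... */
          }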
    • srcu: Eliminate possibility of destructive counter overflow · 881ec9d2
      Committed by Paul E. McKenney
      Earlier versions of Tree SRCU were subject to a counter overflow bug that
      could theoretically result in too-short grace periods.  This commit
      eliminates this problem by adding an update-side memory barrier.
      The short explanation is that if the updater sums the unlock counts
      too late to see a given __srcu_read_unlock() increment, that CPU's
      next __srcu_read_lock() must see the new value of ->srcu_idx, thus
      incrementing the other bank of counters.  This eliminates the possibility
      of destructive counter overflow as long as the srcu_read_lock() nesting
      level does not exceed floor(ULONG_MAX/NR_CPUS/2), which should be an
      eminently reasonable nesting limit, especially on 64-bit systems.
      Reported-by: Lance Roy <ldr709@gmail.com>
      Suggested-by: Lance Roy <ldr709@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      881ec9d2
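      The fix reduces to a full memory barrier on the update side around the
      ->srcu_idx flip, roughly as sketched below (the comment restates the
      argument above; exact barrier placement within srcu_flip() is an
      approximation, not a verbatim copy):

          static void srcu_flip(struct srcu_struct *sp)
          {
                  /*
                   * If this updater saw a given reader's __srcu_read_unlock()
                   * increment, that reader's next __srcu_read_lock() must see
                   * the new ->srcu_idx and so increment the other counter bank.
                   */
                  smp_mb();
                  WRITE_ONCE(sp->srcu_idx, sp->srcu_idx + 1);
                  smp_mb();  /* Order the flip before later counter scans. */
          }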
    • rcu: Prevent rcu_barrier() from starting needless grace periods · f92c734f
      Committed by Paul E. McKenney
      Currently rcu_barrier() uses call_rcu() to enqueue new callbacks
      on each CPU with a non-empty callback list.  This works, but means
      that rcu_barrier() forces grace periods that are not otherwise needed.
      The key point is that rcu_barrier() never needs to wait for a grace
      period, but instead only for all pre-existing callbacks to be invoked.
      This means that rcu_barrier()'s new callbacks should be placed in
      the callback-list segment containing the last pre-existing callback.
      
      This commit makes this change using the new rcu_segcblist_entrain()
      function.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      f92c734f
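      On each CPU with pending callbacks, entraining rather than enqueueing a
      fresh callback looks roughly like this (a sketch; locking, tracing, and
      the no-CBs path are omitted):

          rdp->barrier_head.func = rcu_barrier_callback;
          debug_rcu_head_queue(&rdp->barrier_head);
          if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head, 0)) {
                  /* Landed after the last pre-existing callback: count it. */
                  atomic_inc(&rsp->barrier_cpu_count);
          } else {
                  /* Callback list was empty: nothing to wait for on this CPU. */
                  debug_rcu_head_unqueue(&rdp->barrier_head);
          }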
    • srcu: Allow use of Classic SRCU from both process and interrupt context · 1123a604
      Committed by Paolo Bonzini
      Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
      down a guest running iperf on a VFIO assigned device.  This happens
      because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
      context, while a worker thread does the same inside kvm_set_irq().  If the
      interrupt happens while the worker thread is executing __srcu_read_lock(),
      updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
      ->srcu_lock_count[] field can be lost.
      
      The docs say you are not supposed to call srcu_read_lock() and
      srcu_read_unlock() from irq context, but KVM interrupt injection happens
      from (host) interrupt context and it would be nice if SRCU supported the
      use case.  KVM is using SRCU here not really for the "sleepable" part,
      but rather due to its IPI-free fast detection of grace periods.  It is
      therefore not desirable to switch back to RCU, which would effectively
      revert commit 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
      2014-01-16).
      
      However, the docs are overly conservative.  You can have an SRCU instance
      that only has users in irq context, and you can mix process and irq
      context as long as process-context users disable interrupts.  In addition,
      __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
      Classic SRCU.  For those two implementations, only srcu_read_lock()
      is unsafe.
      
      When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
      in commit 5a41344a ("srcu: Simplify __srcu_read_unlock() via
      this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
      Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
      the caller.  Tree SRCU however only does one increment, so on most
      architectures it is more efficient for __srcu_read_lock() to use
      this_cpu_inc(), and any performance differences appear to be down in
      the noise.
      
      Cc: stable@vger.kernel.org
      Fixes: 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
      Reported-by: Linu Cherian <linuc.decode@gmail.com>
      Suggested-by: Linu Cherian <linuc.decode@gmail.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      1123a604
    • srcu: Allow use of Tiny/Tree SRCU from both process and interrupt context · cdf7abc4
      Committed by Paolo Bonzini
      Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
      down a guest running iperf on a VFIO assigned device.  This happens
      because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
      context, while a worker thread does the same inside kvm_set_irq().  If the
      interrupt happens while the worker thread is executing __srcu_read_lock(),
      updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
      ->srcu_lock_count[] field can be lost.
      
      The docs say you are not supposed to call srcu_read_lock() and
      srcu_read_unlock() from irq context, but KVM interrupt injection happens
      from (host) interrupt context and it would be nice if SRCU supported the
      use case.  KVM is using SRCU here not really for the "sleepable" part,
      but rather due to its IPI-free fast detection of grace periods.  It is
      therefore not desirable to switch back to RCU, which would effectively
      revert commit 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
      2014-01-16).
      
      However, the docs are overly conservative.  You can have an SRCU instance
      that only has users in irq context, and you can mix process and irq
      context as long as process-context users disable interrupts.  In addition,
      __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
      Classic SRCU.  For those two implementations, only srcu_read_lock()
      is unsafe.
      
      When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
      in commit 5a41344a ("srcu: Simplify __srcu_read_unlock() via
      this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
      Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
      the caller.  Tree SRCU however only does one increment, so on most
      architectures it is more efficient for __srcu_read_lock() to use
      this_cpu_inc(), and any performance differences appear to be down in
      the noise.
      
      Unlike Classic and Tree SRCU, Tiny SRCU does increments and decrements on
      a single variable.  Therefore, as Peter Zijlstra pointed out, Tiny SRCU's
      implementation already supports mixed-context use of srcu_read_lock()
      and srcu_read_unlock(), at least as long as uses of srcu_read_lock()
      and srcu_read_unlock() in each handler are nested and paired properly.
      In other words, it is still illegal to (say) invoke srcu_read_lock()
      in an interrupt handler and to invoke the matching srcu_read_unlock()
      in a softirq handler.  Therefore, the only change required for Tiny SRCU
      is to its comments.
      
      Fixes: 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
      Reported-by: Linu Cherian <linuc.decode@gmail.com>
      Suggested-by: Linu Cherian <linuc.decode@gmail.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Paolo Bonzini <pbonzini@redhat.com>
      cdf7abc4
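      For Tree SRCU, the change boils down to the reader using an irq-safe
      this_cpu_inc() so that no preempt_disable() is needed in the caller,
      roughly as sketched below (field names follow the Tree SRCU of that era
      and should be treated as an approximation):

          int __srcu_read_lock(struct srcu_struct *sp)
          {
                  int idx;

                  idx = READ_ONCE(sp->srcu_idx) & 0x1;
                  this_cpu_inc(sp->sda->srcu_lock_count[idx]);  /* was __this_cpu_inc() */
                  smp_mb(); /* B */  /* Keep the critical section after the increment. */
                  return idx;
          }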