1. 16 May 2018, 1 commit
    • rcu: Rename cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs() · cee43939
      Committed by Paul E. McKenney
      Commit e31d28b6 ("trace: Eliminate cond_resched_rcu_qs() in favor
      of cond_resched()") substituted cond_resched() for the earlier call
      to cond_resched_rcu_qs().  However, the new-age cond_resched() does
      not do anything to help RCU-tasks grace periods because (1) RCU-tasks
      is only enabled when CONFIG_PREEMPT=y and (2) cond_resched() is a
      complete no-op when preemption is enabled.  This situation results
      in hangs when running the trace benchmarks.
      
      A number of potential fixes were discussed on LKML
      (https://lkml.kernel.org/r/20180224151240.0d63a059@vmware.local.home),
      including making cond_resched() not be a no-op; making cond_resched()
      not be a no-op, but only when running tracing benchmarks; reverting
      the aforementioned commit (which works because cond_resched_rcu_qs()
      does provide an RCU-tasks quiescent state); and adding a call to the
      scheduler/RCU rcu_note_voluntary_context_switch() function.  All were
      deemed unsatisfactory, either due to added cond_resched() overhead or
      due to magic functions inviting cargo culting.
      
      This commit renames cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs(),
      which provides a clear hint as to what this function is doing and
      why and where it should be used, and then replaces the call to
      cond_resched() with cond_resched_tasks_rcu_qs() in the trace benchmark's
      benchmark_event_kthread() function.
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Nicholas Piggin <npiggin@gmail.com>
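      A minimal sketch of the resulting pattern, assuming a simplified
      stand-in for benchmark_event_kthread() (the work function below is
      hypothetical; cond_resched_tasks_rcu_qs() and the kthread API are real):

          #include <linux/kthread.h>
          #include <linux/rcupdate.h>

          /* Sketch: a tight kthread loop that must still report Tasks-RCU
           * quiescent states even where cond_resched() compiles to nothing. */
          static int benchmark_like_kthread(void *arg)
          {
                  while (!kthread_should_stop()) {
                          do_one_benchmark_pass();        /* hypothetical work */
                          /* Unlike plain cond_resched(), this also records a
                           * Tasks-RCU quiescent state under CONFIG_PREEMPT=y. */
                          cond_resched_tasks_rcu_qs();
                  }
                  return 0;
          }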
  2. 08 Dec 2017, 1 commit
  3. 27 Oct 2017, 2 commits
  4. 10 Oct 2017, 2 commits
    • rcu: Suppress RCU CPU stall warnings while dumping trace · f22ce091
      Committed by Paul E. McKenney
      Currently, RCU emits RCU CPU stall warnings during its
      automatically initiated ftrace_dump() calls after detecting an error
      condition, which can result in excessive console output
      and lost trace events.  This commit therefore suppresses RCU CPU stall
      warnings across any of these ftrace_dump() calls.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
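      The shape of the fix, as a hedged sketch (the suppress/unsuppress
      helpers are assumptions standing in for the actual mechanism; only
      ftrace_dump() and DUMP_ALL are known kernel symbols here):

          /* Sketch: bracket the automatically initiated trace dump so the
           * long-running dump cannot itself trigger stall warnings. */
          static void rcu_dump_trace_sketch(void)
          {
                  rcu_stall_suppress();           /* assumed helper */
                  ftrace_dump(DUMP_ALL);
                  rcu_stall_unsuppress();         /* assumed helper */
          }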
    • rcu: Create call_rcu_tasks() kthread at boot time · c63eb17f
      Committed by Paul E. McKenney
      Currently the call_rcu_tasks() kthread is created upon first
      invocation of call_rcu_tasks().  This has the advantage of avoiding
      creation if there are never any invocations of call_rcu_tasks() and of
      synchronize_rcu_tasks(), but it requires an unreliable heuristic to
      determine when it is safe to create the kthread.  For example, it is
      not safe to create the kthread when call_rcu_tasks() is invoked with
      a spinlock held, but there is no good way to detect this in !PREEMPT
      kernels.
      
      This commit therefore creates this kthread unconditionally at
      core_initcall() time.  If you don't want this kthread created, then
      build with CONFIG_TASKS_RCU=n.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
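      A sketch of unconditional creation at core_initcall() time (the
      function names are illustrative, not the exact upstream code):

          static struct task_struct *rcu_tasks_kthread_ptr;

          static int __init rcu_spawn_tasks_kthread_init(void)
          {
                  /* Runs with irqs enabled and the scheduler fully up, so
                   * kthread_run()'s allocations are always legal here --
                   * no heuristic needed. */
                  rcu_tasks_kthread_ptr = kthread_run(rcu_tasks_kthread_fn,
                                                      NULL, "rcu_tasks_kthread");
                  WARN_ON(IS_ERR(rcu_tasks_kthread_ptr));
                  return 0;
          }
          core_initcall(rcu_spawn_tasks_kthread_init);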
  5. 17 Aug 2017, 1 commit
  6. 09 Jun 2017, 2 commits
  7. 08 Jun 2017, 2 commits
  8. 21 Apr 2017, 1 commit
    • rcu: Make non-preemptive schedule be Tasks RCU quiescent state · bcbfdd01
      Committed by Paul E. McKenney
      Currently, a call to schedule() acts as a Tasks RCU quiescent state
      only if a context switch actually takes place.  However, just the
      call to schedule() guarantees that the calling task has moved off of
      whatever tracing trampoline it might previously have been on.
      This commit therefore plumbs schedule()'s "preempt" parameter into
      rcu_note_context_switch(), which then records the Tasks RCU quiescent
      state, but only if this call to schedule() was -not- due to a preemption.
      
      To avoid adding overhead to the common-case context-switch path,
      this commit hides the rcu_note_context_switch() check under an existing
      non-common-case check.
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
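      Schematically, and only as a sketch (the helper recording the
      quiescent state is an assumed name; the real hook placement differs):

          /* In __schedule(), the existing "preempt" flag is passed down: */
          rcu_note_context_switch(preempt);

          /* In RCU: */
          void rcu_note_context_switch(bool preempt)
          {
                  /* ... existing context-switch bookkeeping ... */
                  if (!preempt)
                          /* Voluntary schedule(): the task is off any tracing
                           * trampoline, so record a Tasks-RCU quiescent state. */
                          record_tasks_rcu_qs(current);   /* assumed helper */
          }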
  9. 19 Apr 2017, 1 commit
  10. 02 Mar 2017, 3 commits
    • sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> · b17b0153
      Committed by Ingo Molnar
      We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which
      will have to be picked up from other headers and a couple of .c files.
      
      Create a trivial placeholder <linux/sched/debug.h> file that just
      maps to <linux/sched.h> to make this patch obviously correct and
      bisectable.
      
      Include the new header in the files that are going to need it.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
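      The placeholder itself is trivial; per the description it amounts to
      this (the include-guard name is an assumption):

          /* include/linux/sched/debug.h -- temporary placeholder */
          #ifndef _LINUX_SCHED_DEBUG_H
          #define _LINUX_SCHED_DEBUG_H
          #include <linux/sched.h>
          #endif /* _LINUX_SCHED_DEBUG_H */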
    • sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> · 3f07c014
      Committed by Ingo Molnar
      We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
      will have to be picked up from other headers and a couple of .c files.
      
      Create a trivial placeholder <linux/sched/signal.h> file that just
      maps to <linux/sched.h> to make this patch obviously correct and
      bisectable.
      
      Include the new header in the files that are going to need it.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • rcu: Separate the RCU synchronization types and APIs into <linux/rcupdate_wait.h> · f9411ebe
      Committed by Ingo Molnar
      So rcupdate.h is a pretty complex header, in particular it includes
      <linux/completion.h> which includes <linux/wait.h> - creating a
      dependency that includes <linux/wait.h> in <linux/sched.h>,
      which prevents the isolation of <linux/sched.h> from the derived
      <linux/wait.h> header.
      
      Solve part of the problem by decoupling rcupdate.h from completions:
      this can be done by separating out the rcu_synchronize types and APIs,
      and updating their usage sites.
      
      Since these are mostly RCU-internal types, this will not just simplify
      <linux/sched.h>'s dependencies, but will make all the hundreds of
      .c files that include rcupdate.h but not completions or wait.h build
      faster.
      
      ( For rcutiny this means that two dependent APIs have to be uninlined,
        but that shouldn't be much of a problem as they are rare variants. )
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
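      The separated-out type is essentially a completion keyed off an
      rcu_head; a sketch of what moves into <linux/rcupdate_wait.h>:

          #include <linux/rcupdate.h>
          #include <linux/completion.h>

          /* Sketch of the rcu_synchronize type described above; the
           * rcu_head callback completes ->completion for the waiter. */
          struct rcu_synchronize {
                  struct rcu_head head;
                  struct completion completion;
          };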
  11. 17 Jan 2017, 1 commit
  12. 15 Jan 2017, 1 commit
    • rcu: Narrow early boot window of illegal synchronous grace periods · 52d7e48b
      Committed by Paul E. McKenney
      The current preemptible RCU implementation goes through three phases
      during bootup.  In the first phase, there is only one CPU that is running
      with preemption disabled, so that a no-op is a synchronous grace period.
      In the second mid-boot phase, the scheduler is running, but RCU has
      not yet gotten its kthreads spawned (and, for expedited grace periods,
      workqueues are not yet running).  During this time, any attempt to do
      a synchronous grace period will hang the system (or complain bitterly,
      depending).  In the third and final phase, RCU is fully operational and
      everything works normally.
      
      This has been OK for some time, but some synchronous grace periods
      have recently been showing up during the second mid-boot phase.
      This code worked "by accident" for a while, but started failing as soon
      as expedited RCU grace periods switched over to workqueues in commit
      8b355e3b ("rcu: Drive expedited grace periods from workqueue").
      Note that the code was buggy even before this commit, as it was subject
      to failure on real-time systems that forced all expedited grace periods
      to run as normal grace periods (for example, using the rcu_normal ksysfs
      parameter).  The callchain from the failure case is as follows:
      
      early_amd_iommu_init()
      |-> acpi_put_table(ivrs_base);
      |-> acpi_tb_put_table(table_desc);
      |-> acpi_tb_invalidate_table(table_desc);
      |-> acpi_tb_release_table(...)
      |-> acpi_os_unmap_memory
      |-> acpi_os_unmap_iomem
      |-> acpi_os_map_cleanup
      |-> synchronize_rcu_expedited
      
      The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
      which caused the code to try using workqueues before they were
      initialized, which did not go well.
      
      This commit therefore reworks RCU to permit synchronous grace periods
      to proceed during this mid-boot phase.  This commit is therefore a
      fix to a regression introduced in v4.9, and is therefore being put
      forward post-merge-window in v4.10.
      
      This commit sets a flag from the existing rcu_scheduler_starting()
      function which causes all synchronous grace periods to take the expedited
      path.  The expedited path now checks this flag, using the requesting task
      to drive the expedited grace period forward during the mid-boot phase.
      Finally, this flag is updated by a core_initcall() function named
      rcu_exp_runtime_mode(), which causes the runtime codepaths to be used.
      
      Note that this arrangement assumes that tasks are not sent POSIX signals
      (or anything similar) from the time that the first task is spawned
      through core_initcall() time.
      
      Fixes: 8b355e3b ("rcu: Drive expedited grace periods from workqueue")
      Reported-by: N"Zheng, Lv" <lv.zheng@intel.com>
      Reported-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NStan Kain <stan.kain@gmail.com>
      Tested-by: NIvan <waffolz@hotmail.com>
      Tested-by: NEmanuel Castelo <emanuel.castelo@gmail.com>
      Tested-by: NBruno Pesavento <bpesavento@infinito.it>
      Tested-by: NBorislav Petkov <bp@suse.de>
      Tested-by: NFrederic Bezies <fredbezies@gmail.com>
      Cc: <stable@vger.kernel.org> # 4.9.0-
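      The gating described above, reduced to a sketch (the flag test is
      simplified; upstream drives this through the rcu_scheduler_active
      state machine, and the function names here are illustrative):

          void synchronize_rcu_sketch(void)
          {
                  if (READ_ONCE(rcu_midboot_flag)) {      /* assumed flag */
                          /* Mid-boot: no kthreads or workqueues yet, so the
                           * requesting task drives an expedited grace period. */
                          synchronize_rcu_expedited();
                          return;
                  }
                  /* ... normal runtime grace-period path ... */
          }

          /* Flip to the runtime code paths once workqueues are available. */
          core_initcall(rcu_exp_runtime_mode);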
  13. 23 Aug 2016, 1 commit
    • rcu: Don't use modular infrastructure in non-modular code · e77b7041
      Committed by Paul Gortmaker
      The Kconfig currently controlling compilation of tree.c is:
      
      init/Kconfig:config TREE_RCU
      init/Kconfig:   bool
      
      ...and update.c and sync.c are "obj-y" meaning that none are ever
      built as a module by anyone.
      
      Since MODULE_ALIAS() is a no-op for non-modular code, we can remove
      its uses from these files.
      
      We leave moduleparam.h behind, since the files still instantiate some
      boot-time configuration parameters with module_param().
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
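      For contrast, the kind of use that keeps moduleparam.h in place; this
      mirrors a real boot parameter in update.c (a representative sketch):

          #include <linux/moduleparam.h>

          /* Built-in code may still define boot-time parameters this way;
           * only modular metadata such as MODULE_ALIAS() was removed. */
          static int rcu_expedited;
          module_param(rcu_expedited, int, 0);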
  14. 16 Jun 2016, 1 commit
    • rcu: Make call_rcu_tasks() tolerate first call with irqs disabled · 4929c913
      Committed by Paul E. McKenney
      Currently, if the very first call to call_rcu_tasks() has irqs disabled,
      it will create the rcu_tasks_kthread with irqs disabled, which will
      result in a splat in the memory allocator, which kthread_run() invokes
      with the expectation that irqs are enabled.
      
      This commit fixes this problem by deferring kthread creation if called
      with irqs disabled.  The first call to call_rcu_tasks() that has irqs
      enabled will create the kthread.
      
      This bug was detected by rcutorture changes that were motivated by
      Iftekhar Ahmed's mutation-testing efforts.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
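      The deferral idea, sketched (names simplified; the real check lives
      in call_rcu_tasks() and its kthread-spawning helper):

          void call_rcu_tasks_sketch(struct rcu_head *rhp, rcu_callback_t func)
          {
                  /* ... enqueue the callback under the list lock ... */
                  if (!READ_ONCE(tasks_kthread_created) && !irqs_disabled())
                          spawn_tasks_kthread();  /* assumed helper; kthread_run()
                                                   * requires irqs enabled */
          }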
  15. 20 May 2016, 2 commits
  16. 01 Apr 2016, 1 commit
    • rcu: Remove superfluous versions of rcu_read_lock_sched_held() · 293e2421
      Committed by Boqun Feng
      Currently, we have four versions of rcu_read_lock_sched_held(), depending
      on the combined choices on PREEMPT_COUNT and DEBUG_LOCK_ALLOC.  However,
      there is an existing function preemptible() that already distinguishes
      between the PREEMPT_COUNT=y and PREEMPT_COUNT=n cases, and allows these
      four implementations to be consolidated down to two.
      
      This commit therefore uses preemptible() to achieve this consolidation.
      
      Note that there could be a small performance regression in the case
      of CONFIG_DEBUG_LOCK_ALLOC=y && PREEMPT_COUNT=n.  However, given the
      overhead associated with CONFIG_DEBUG_LOCK_ALLOC=y, this should be
      down in the noise.
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
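      The consolidated !DEBUG_LOCK_ALLOC variant reduces to a one-liner,
      matching the description above (a sketch of the header-side code):

          /* In an RCU-sched read-side critical section exactly when
           * preemption is impossible. */
          static inline int rcu_read_lock_sched_held(void)
          {
                  return !preemptible();
          }
          /* With PREEMPT_COUNT=n, preemptible() is constant 0, so this
           * conservatively reports "held", as the old variant did. */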
  17. 24 Feb 2016, 1 commit
  18. 08 Dec 2015, 1 commit
  19. 05 Dec 2015, 2 commits
    • rcu: Allow expedited grace periods to be disabled at init · 3e42ec1a
      Committed by Paul E. McKenney
      Expedited grace periods can speed up boot, but are undesirable in
      aggressive real-time systems.  This commit therefore introduces a
      kernel parameter rcupdate.rcu_normal_after_boot that disables
      expedited grace periods just before init is spawned.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
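      Sketch of the hook point, per the description (rcu_end_inkernel_boot()
      runs just before init is spawned; treat the body as illustrative):

          void rcu_end_inkernel_boot_sketch(void)
          {
                  rcu_unexpedite_gp();    /* balance any boot-time expediting */
                  if (rcu_normal_after_boot)
                          /* From here on, treat expedited requests as normal. */
                          WRITE_ONCE(rcu_normal, 1);
          }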
    • rcu: Add rcu_normal kernel parameter to suppress expediting · 5a9be7c6
      Committed by Paul E. McKenney
      Although expedited grace periods can be quite useful, and although their
      OS jitter has been greatly reduced, they can still pose problems for
      extreme real-time workloads.  This commit therefore adds a rcu_normal
      kernel boot parameter (which can also be manipulated via sysfs)
      to suppress expedited grace periods, that is, to treat requests for
      expedited grace periods as if they were requests for normal grace periods.
      If both rcu_expedited and rcu_normal are specified, rcu_normal wins.
      This means that if you are relying on expedited grace periods to speed up
      boot, you will want to specify rcu_expedited on the kernel command line,
      and then specify rcu_normal via sysfs once boot completes.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
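      The precedence rule ("rcu_normal wins") can be sketched inside the
      expedited primitive itself (illustrative, not the exact upstream code):

          void synchronize_sched_expedited_sketch(void)
          {
                  if (rcu_gp_is_normal()) {       /* rcupdate.rcu_normal set? */
                          synchronize_sched();    /* fall back to a normal GP */
                          return;
                  }
                  /* ... expedited grace-period machinery ... */
          }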
  20. 07 Oct 2015, 1 commit
  21. 23 Jul 2015, 3 commits
  22. 16 Jul 2015, 1 commit
    • rcu: Deinline rcu_read_lock_sched_held() if DEBUG_LOCK_ALLOC · d5671f6b
      Committed by Denys Vlasenko
      DEBUG_LOCK_ALLOC=y is not a production setting, but it is
      not very unusual either. Many developers routinely
      use kernels built with it enabled.
      
      Apart from being selected by hand, it is also auto-selected by
      PROVE_LOCKING "Lock debugging: prove locking correctness" and
      LOCK_STAT "Lock usage statistics" config options.
      LOCK_STAT is necessary for "perf lock" to work.
      
      I wouldn't spend too much time optimizing it, but this particular
      function has a very large cost in code size: when it is deinlined,
      code size decreases by 830,000 bytes:
      
          text     data      bss       dec     hex filename
      85674192 22294776 20627456 128596424 7aa39c8 vmlinux.before
      84837612 22294424 20627456 127759492 79d7484 vmlinux
      
      (with this config: http://busybox.net/~vda/kernel_config)
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      CC: Josh Triplett <josh@joshtriplett.org>
      CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      CC: Lai Jiangshan <laijs@cn.fujitsu.com>
      CC: Tejun Heo <tj@kernel.org>
      CC: Oleg Nesterov <oleg@redhat.com>
      CC: linux-kernel@vger.kernel.org
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
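      The change pattern, sketched: the body moves out of the header into
      update.c, leaving only a declaration behind (details illustrative):

          /* include/linux/rcupdate.h: formerly a large inline, now just: */
          int rcu_read_lock_sched_held(void);

          /* kernel/rcu/update.c: one out-of-line copy, exported for modules. */
          int rcu_read_lock_sched_held(void)
          {
                  /* ... the same lockdep-based checks as before ... */
                  return lock_is_held(&rcu_sched_lock_map);
          }
          EXPORT_SYMBOL(rcu_read_lock_sched_held);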
  23. 28 May 2015, 1 commit
  24. 27 Feb 2015, 2 commits
    • rcu: Add Kconfig option to expedite grace periods during boot · ee42571f
      Committed by Paul E. McKenney
      This commit adds a CONFIG_RCU_EXPEDITE_BOOT Kconfig parameter
      that emulates a very early boot rcu_expedite_gp().  A late-boot
      call to rcu_end_inkernel_boot() will provide the corresponding
      rcu_unexpedite_gp().  The late-boot call to rcu_end_inkernel_boot()
      should be made just before init is spawned.
      
      According to Arjan:
      
      > To show the boot time, I'm using the timestamp of the "Write protecting"
      > line, that's pretty much the last thing we print prior to ring 3 execution.
      >
      > A kernel with default RCU behavior (inside KVM, only virtual devices)
      > looks like this:
      >
      > [    0.038724] Write protecting the kernel read-only data: 10240k
      >
      > a kernel with expedited RCU (using the command line option, so that I
      > don't have to recompile between measurements and thus am completely
      > oranges-to-oranges)
      >
      > [    0.031768] Write protecting the kernel read-only data: 10240k
      >
      > which, in percentage, is an 18% improvement.
      Reported-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Arjan van de Ven <arjan@linux.intel.com>
    • rcu: Provide rcu_expedite_gp() and rcu_unexpedite_gp() · 0d39482c
      Committed by Paul E. McKenney
      Currently, expediting of normal synchronous grace-period primitives
      (synchronize_rcu() and friends) is controlled by the rcu_expedited
      boot/sysfs parameter.  This works well, but does not handle nesting.
      This commit therefore provides rcu_expedite_gp() to enable expediting
      and rcu_unexpedite_gp() to cancel a prior rcu_expedite_gp(), both of
      which support nesting.
      Reported-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
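      A sketch of the nesting counter behind both commits in this group
      (this follows the descriptions closely, but treat it as illustrative):

          /* Starts at 1 when CONFIG_RCU_EXPEDITE_BOOT=y, emulating a very
           * early rcu_expedite_gp(); rcu_end_inkernel_boot() undoes it. */
          static atomic_t rcu_expedited_nesting =
                  ATOMIC_INIT(IS_ENABLED(CONFIG_RCU_EXPEDITE_BOOT) ? 1 : 0);

          void rcu_expedite_gp(void)   { atomic_inc(&rcu_expedited_nesting); }
          void rcu_unexpedite_gp(void) { atomic_dec(&rcu_expedited_nesting); }

          /* True if the boot/sysfs flag is set or any caller has nested. */
          bool rcu_gp_is_expedited(void)
          {
                  return rcu_expedited || atomic_read(&rcu_expedited_nesting);
          }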
  25. 26 Feb 2015, 1 commit
  26. 14 Nov 2014, 1 commit
  27. 04 Nov 2014, 1 commit
  28. 30 Oct 2014, 1 commit
  29. 08 Sep 2014, 1 commit
    • rcu: Remove local_irq_disable() in rcu_preempt_note_context_switch() · 1d082fd0
      Committed by Paul E. McKenney
      The rcu_preempt_note_context_switch() function is on a scheduling fast
      path, so it would be good to avoid disabling irqs.  The reason that irqs
      are disabled is to synchronize process-level and irq-handler access to
      the task_struct ->rcu_read_unlock_special bitmask.  This commit therefore
      makes ->rcu_read_unlock_special instead be a union of bools with a short
      allowing single-access checks in RCU's __rcu_read_unlock().  This results
      in the process-level and irq-handler accesses being simple loads and
      stores, so that irqs need no longer be disabled.  This commit therefore
      removes the irq disabling from rcu_preempt_note_context_switch().
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
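      The union described above looks essentially like this (field names
      assumed; a sketch):

          /* A short overlays the bools so that __rcu_read_unlock() can
           * test both with a single load, and process-level and irq-handler
           * code can use plain loads and stores -- no irq disabling. */
          union rcu_special {
                  struct {
                          bool blocked;   /* blocked in an RCU read-side section */
                          bool need_qs;   /* a quiescent state is needed */
                  } b;
                  short s;                /* covers both bools in one access */
          };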