1. 09 Jun, 2017 (2 commits)
  2. 08 Jun, 2017 (1 commit)
  3. 21 Apr, 2017 (1 commit)
    • rcu: Make non-preemptive schedule be Tasks RCU quiescent state · bcbfdd01
      Committed by Paul E. McKenney
      Currently, a call to schedule() acts as a Tasks RCU quiescent state
      only if a context switch actually takes place.  However, just the
      call to schedule() guarantees that the calling task has moved off of
      whatever tracing trampoline it might previously have been executing on.
      This commit therefore plumbs schedule()'s "preempt" parameter into
      rcu_note_context_switch(), which then records the Tasks RCU quiescent
      state, but only if this call to schedule() was -not- due to a preemption
      (see the sketch at the end of this entry).
      
      To avoid adding overhead to the common-case context-switch path,
      this commit hides the rcu_note_context_switch() check under an existing
      non-common-case check.
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      bcbfdd01
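
      A minimal sketch of the resulting flow (simplified, and assuming the
      rcu_note_voluntary_context_switch() helper used by Tasks RCU in this
      era; this is not the exact kernel/rcu/tree.c code):

          /* Called from __schedule(), which now passes its "preempt" flag. */
          void rcu_note_context_switch(bool preempt)
          {
                  /* ... existing context-switch processing ... */
                  if (!preempt)
                          /* Voluntary call to schedule(): report a Tasks RCU QS. */
                          rcu_note_voluntary_context_switch(current);
          }
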
  4. 19 Apr, 2017 (1 commit)
  5. 11 Apr, 2017 (1 commit)
    • rcu/tracing: Add rcu_disabled to denote when rcu_irq_enter() will not work · 03ecd3f4
      Committed by Steven Rostedt (VMware)
      Tracing uses rcu_irq_enter() as a way to make sure that RCU is watching when
      it needs to use rcu_read_lock() and friends. This is because tracing can
      happen as RCU is about to enter user space, or about to go idle, and RCU
      does not watch for RCU read side critical sections as it makes the
      transition.
      
      There is a small window within the RCU infrastructure in which
      rcu_irq_enter() itself will not work. If tracing were to occur in that
      section, it would break if it tried to use rcu_irq_enter().
      
      Originally, this happened with the stack_tracer, because it calls
      save_stack_trace() when it encounters stack usage greater than any it
      had encountered previously. There was a case where that happened in the
      RCU section where rcu_irq_enter() did not work, and lockdep complained
      loudly about it. To fix it, stack tracing gained a way to be disabled,
      and RCU disabled stack tracing during the critical section in which
      rcu_irq_enter() was inoperable. That solution worked, but there are
      other users of rcu_irq_enter(), so it would be better for RCU itself to
      provide a way to let others know that rcu_irq_enter() will not work.
      Trace events, for example, could use it (see the sketch at the end of
      this entry).
      
      Another helpful aspect of this change is that it moves the per-CPU
      variable checked in that RCU critical section into the same cache
      locality as the other RCU per-CPU variables used in that location.
      
      I'm keeping the stack_trace_disable() code, as that still could be used in
      the future by places that really need to disable it. And since it's only a
      static inline, it won't take up any kernel text if it is not used.
      
      Link: http://lkml.kernel.org/r/20170405093207.404f8deb@gandalf.local.home
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      03ecd3f4
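
      A sketch of the intended usage pattern (assuming a predicate along the
      lines of rcu_irq_enter_disabled(); the tracer function below is purely
      illustrative):

          static void my_trace_hook(void)
          {
                  /* RCU cannot be made to watch from here; skip tracing. */
                  if (rcu_irq_enter_disabled())
                          return;

                  rcu_irq_enter();        /* make RCU watch this context */
                  /* ... work that relies on rcu_read_lock() and friends ... */
                  rcu_irq_exit();
          }
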
  6. 02 Mar, 2017 (1 commit)
    • rcu: Separate the RCU synchronization types and APIs into <linux/rcupdate_wait.h> · f9411ebe
      Committed by Ingo Molnar
      rcupdate.h is a pretty complex header; in particular it includes
      <linux/completion.h>, which includes <linux/wait.h> - creating a
      dependency that pulls <linux/wait.h> into <linux/sched.h> and thus
      prevents the isolation of <linux/sched.h> from the <linux/wait.h>
      header.
      
      Solve part of the problem by decoupling rcupdate.h from completions:
      this can be done by separating out the rcu_synchronize types and APIs,
      and updating their usage sites.
      
      Since these are mostly RCU-internal types, this will not just simplify
      <linux/sched.h>'s dependencies, but will also make the hundreds of
      .c files that include rcupdate.h but not completions or wait.h build
      faster (see the sketch of the new header at the end of this entry).
      
      ( For rcutiny this means that two dependent APIs have to be uninlined,
        but that shouldn't be much of a problem as they are rare variants. )
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f9411ebe
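
      Roughly what moves out of rcupdate.h and into the new header (a
      simplified sketch of <linux/rcupdate_wait.h>; the struct and helper
      below are the existing rcu_synchronize machinery, not new additions):

          #include <linux/completion.h>
          #include <linux/rcupdate.h>

          /* Structure used by the synchronous grace-period primitives. */
          struct rcu_synchronize {
                  struct rcu_head head;
                  struct completion completion;
          };

          /* RCU callback that wakes the task waiting in wait_rcu_gp(). */
          void wakeme_after_rcu(struct rcu_head *head);
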
  7. 26 Jan, 2017 (1 commit)
    • srcu: Force full grace-period ordering · d85b62f1
      Committed by Paul E. McKenney
      If a process invokes synchronize_srcu(), is delayed just the right amount
      of time, and thus does not sleep when waiting for the grace period to
      complete, there is no ordering between the end of the grace period and
      the code following the synchronize_srcu().  Similarly, there can be a
      lack of ordering between the end of the SRCU grace period and callback
      invocation.
      
      This commit adds the necessary ordering.
      Reported-by: Lance Roy <ldr709@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Further smp_mb() adjustment per email with Lance Roy. ]
      d85b62f1
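
      The shape of the fix, as a simplified sketch (the helper name below is
      hypothetical; the real change places the barrier inside the SRCU
      grace-period machinery):

          static void srcu_gp_end_sketch(void)
          {
                  /* ... grace-period completion bookkeeping ... */

                  /*
                   * Full barrier so that code after synchronize_srcu(), and
                   * SRCU callback invocation, are ordered after the end of
                   * the grace period even if the waiter never slept.
                   */
                  smp_mb();
          }
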
  8. 15 Jan, 2017 (1 commit)
    • rcu: Narrow early boot window of illegal synchronous grace periods · 52d7e48b
      Committed by Paul E. McKenney
      The current preemptible RCU implementation goes through three phases
      during bootup.  In the first phase, there is only one CPU, running
      with preemption disabled, so that a synchronous grace period is a
      no-op.  In the second, mid-boot phase, the scheduler is running, but
      RCU has not yet spawned its kthreads (and, for expedited grace
      periods, workqueues are not yet running).  During this time, any
      attempt to do a synchronous grace period will hang the system (or
      complain bitterly, depending).  In the third and final phase, RCU is
      fully operational and everything works normally.
      
      This has been OK for some time, but there have recently been
      synchronous grace periods showing up during the second, mid-boot phase.
      This code worked "by accident" for a while, but started failing as soon
      as expedited RCU grace periods switched over to workqueues in commit
      8b355e3b ("rcu: Drive expedited grace periods from workqueue").
      Note that the code was buggy even before this commit, as it was subject
      to failure on real-time systems that forced all expedited grace periods
      to run as normal grace periods (for example, using the rcu_normal ksysfs
      parameter).  The callchain from the failure case is as follows:
      
      early_amd_iommu_init()
      |-> acpi_put_table(ivrs_base);
      |-> acpi_tb_put_table(table_desc);
      |-> acpi_tb_invalidate_table(table_desc);
      |-> acpi_tb_release_table(...)
      |-> acpi_os_unmap_memory
      |-> acpi_os_unmap_iomem
      |-> acpi_os_map_cleanup
      |-> synchronize_rcu_expedited
      
      The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
      which caused the code to try using workqueues before they were
      initialized, which did not go well.
      
      This commit therefore reworks RCU to permit synchronous grace periods
      to proceed during this mid-boot phase.  This commit is therefore a
      fix to a regression introduced in v4.9, and is therefore being put
      forward post-merge-window in v4.10.
      
      This commit sets a flag from the existing rcu_scheduler_starting()
      function, which causes all synchronous grace periods to take the
      expedited path.  The expedited path now checks this flag, using the
      requesting task to drive the expedited grace period forward during the
      mid-boot phase.  Finally, this flag is updated by a core_initcall()
      function named rcu_exp_runtime_mode(), which causes the runtime
      codepaths to be used (see the sketch at the end of this entry).
      
      Note that this arrangement assumes that tasks are not sent POSIX signals
      (or anything similar) from the time that the first task is spawned
      through core_initcall() time.
      
      Fixes: 8b355e3b ("rcu: Drive expedited grace periods from workqueue")
      Reported-by: "Zheng, Lv" <lv.zheng@intel.com>
      Reported-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Stan Kain <stan.kain@gmail.com>
      Tested-by: Ivan <waffolz@hotmail.com>
      Tested-by: Emanuel Castelo <emanuel.castelo@gmail.com>
      Tested-by: Bruno Pesavento <bpesavento@infinito.it>
      Tested-by: Borislav Petkov <bp@suse.de>
      Tested-by: Frederic Bezies <fredbezies@gmail.com>
      Cc: <stable@vger.kernel.org> # 4.9.0-
      52d7e48b
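
      A simplified sketch of the boot-phase gating (the flag name here is
      illustrative; rcu_scheduler_starting() and rcu_exp_runtime_mode() are
      named in the commit text, but the bodies below are not the actual
      kernel code):

          static bool rcu_mid_boot_expedite;   /* hypothetical flag name */

          void rcu_scheduler_starting(void)
          {
                  /*
                   * Scheduler is up but workqueues are not: force synchronous
                   * grace periods onto the expedited path, driven directly by
                   * the requesting task.
                   */
                  rcu_mid_boot_expedite = true;
          }

          static int __init rcu_exp_runtime_mode(void)
          {
                  /* Workqueues are available: switch to runtime codepaths. */
                  rcu_mid_boot_expedite = false;
                  return 0;
          }
          core_initcall(rcu_exp_runtime_mode);
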
  9. 23 Aug, 2016 (1 commit)
    • rcu: Provide exact CPU-online tracking for RCU · 7ec99de3
      Committed by Paul E. McKenney
      Up to now, RCU has assumed that the CPU-online process makes it from
      CPU_UP_PREPARE to set_cpu_online() within one jiffy.  Given the recent
      rise of virtualized environments, this assumption is very clearly
      obsolete.  Failing to meet this deadline can result in RCU paying
      attention to an incoming CPU for one jiffy, then ignoring it until the
      grace period following the one in which that CPU sets itself online.
      This situation might prove to be fatally disappointing to any RCU
      read-side critical sections that had the misfortune to execute during
      the time in which RCU was ignoring the slow-to-come-online CPU.
      
      This commit therefore updates RCU's internal CPU state-tracking
      information at notify_cpu_starting() time, thus providing RCU with
      an exact transition of the CPU's state from offline to online.
      
      Note that this means that incoming CPUs must not use RCU read-side
      critical sections (other than those of SRCU) until notify_cpu_starting()
      time.  Note also that the CPU_STARTING notifiers -are- allowed to use
      RCU read-side critical sections.  (Of course, CPU-hotplug notifiers are
      rapidly becoming obsolete, so you need to act fast!)
      
      If a given architecture or CPU family needs to use RCU read-side
      critical sections earlier, the call to rcu_cpu_starting() from
      notify_cpu_starting() will need to be architecture-specific, with
      architectures that need early use being required to hand-place
      the call to rcu_cpu_starting() at some point preceding the call to
      notify_cpu_starting().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      7ec99de3
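
      A sketch of the resulting hook placement (simplified; the surrounding
      notifier invocation in kernel/cpu.c is elided):

          void notify_cpu_starting(unsigned int cpu)
          {
                  /*
                   * Tell RCU that this CPU is online *now*, rather than
                   * having RCU infer it within a jiffy of CPU_UP_PREPARE.
                   */
                  rcu_cpu_starting(cpu);

                  /* ... invoke the CPU_STARTING notifier chain ... */
          }
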
  10. 06 Jul, 2016 (1 commit)
  11. 16 Jun, 2016 (2 commits)
  12. 15 Jun, 2016 (1 commit)
  13. 01 Apr, 2016 (2 commits)
  14. 02 Mar, 2016 (1 commit)
    • rcu: Make CPU_DYING_IDLE an explicit call · 27d50c7e
      Committed by Thomas Gleixner
      Make the RCU CPU_DYING_IDLE callback an explicit function call, so it gets
      invoked at the proper place.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.870167933@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      27d50c7e
  15. 24 Feb, 2016 (1 commit)
  16. 08 Dec, 2015 (3 commits)
  17. 05 Dec, 2015 (1 commit)
    • rcu: Add rcu_normal kernel parameter to suppress expediting · 5a9be7c6
      Committed by Paul E. McKenney
      Although expedited grace periods can be quite useful, and although their
      OS jitter has been greatly reduced, they can still pose problems for
      extreme real-time workloads.  This commit therefore adds an rcu_normal
      kernel boot parameter (which can also be manipulated via sysfs)
      to suppress expedited grace periods, that is, to treat requests for
      expedited grace periods as if they were requests for normal grace
      periods (see the sketch at the end of this entry).
      If both rcu_expedited and rcu_normal are specified, rcu_normal wins.
      This means that if you are relying on expedited grace periods to speed up
      boot, you will want to specify rcu_expedited on the kernel command line,
      and then specify rcu_normal via sysfs once boot completes.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      5a9be7c6
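
      A sketch of how an expedited entry point can honor the new knob
      (simplified; rcu_gp_is_normal() is the predicate suggested by the
      rcu_normal/rcu_expedited precedence described above, and the body
      below is illustrative rather than the actual tree code):

          void synchronize_rcu_expedited(void)
          {
                  if (rcu_gp_is_normal()) {
                          /* rcu_normal set: fall back to a normal GP. */
                          wait_rcu_gp(call_rcu);
                          return;
                  }
                  /* ... expedited grace-period machinery ... */
          }
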
  18. 07 Oct, 2015 (4 commits)
    • rcu: Remove deprecated rcu_lockdep_assert() · e62e3f62
      Committed by Paul E. McKenney
      The old rcu_lockdep_assert() was retained to ease handling of incoming
      patches, but any use will result in deprecation warnings.  However, its
      replacement, RCU_LOCKDEP_WARN(), is now upstream.  It is therefore
      time to remove rcu_lockdep_assert(), which this commit does.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      e62e3f62
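
      The conversion pattern for any straggling users (note that
      RCU_LOCKDEP_WARN() takes the *inverted* condition relative to the old
      assertion; the lockdep message string here is just an example):

          /* Before (removed by this commit): */
          rcu_lockdep_assert(rcu_read_lock_held(),
                             "example: access outside RCU read-side section");

          /* After: */
          RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
                           "example: access outside RCU read-side section");
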
    • rcu: Add rcu_pointer_handoff() · c3ac7cf1
      Committed by Paul E. McKenney
      This commit adds an rcu_pointer_handoff() that is intended to mark
      situations where a structure's protection transitions from RCU to some
      other mechanism (locking, reference counting, whatever).  These markings
      should allow external tools to more easily spot bugs involving leaking
      pointers out of RCU read-side critical sections.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      c3ac7cf1
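
      An illustrative use (the gp pointer, struct foo, and its refcount are
      hypothetical; the point is the handoff from RCU protection to
      reference-count protection inside the read-side critical section):

          struct foo *get_foo(void)
          {
                  struct foo *p;

                  rcu_read_lock();
                  p = rcu_dereference(gp);
                  if (p && !atomic_inc_not_zero(&p->refcnt))
                          p = NULL;                   /* object going away    */
                  else if (p)
                          p = rcu_pointer_handoff(p); /* now refcount-protected */
                  rcu_read_unlock();
                  return p;                           /* caller drops the ref */
          }
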
    • rcu: Don't disable preemption for Tiny and Tree RCU readers · bb73c52b
      Committed by Boqun Feng
      Because preempt_disable() maps to barrier() for non-debug builds,
      it forces the compiler to spill and reload registers.  Because Tree
      RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
      barrier() instances generate needless extra code for each instance of
      rcu_read_lock() and rcu_read_unlock().  This extra code slows down Tree
      RCU and bloats Tiny RCU.
      
      This commit therefore removes the preempt_disable() and preempt_enable()
      from the non-preemptible implementations of __rcu_read_lock() and
      __rcu_read_unlock(), respectively.  However, for debug purposes,
      preempt_disable() and preempt_enable() are still invoked if
      CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
      atomic sections in non-preemptible kernels.
      
      However, Tiny and Tree RCU operate by coalescing all RCU read-side
      critical sections on a given CPU that lie between successive quiescent
      states.  It is therefore necessary to compensate for removing barriers
      from __rcu_read_lock() and __rcu_read_unlock() by adding them to a
      couple of the RCU functions invoked during quiescent states, namely to
      rcu_all_qs() and rcu_note_context_switch() (see the sketch at the end
      of this entry).  However, note that the latter is more paranoia than
      necessity, at least until link-time optimizations become more
      aggressive.
      
      This is based on an earlier patch by Paul E. McKenney, fixing
      a bug encountered in kernels built with CONFIG_PREEMPT=n and
      CONFIG_PREEMPT_COUNT=y.
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      bb73c52b
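
      A simplified sketch of the non-preemptible reader fast paths after this
      change, with compiler ordering reinstated at a quiescent state (assume
      a CONFIG_PREEMPT=n build; this is not the exact rcupdate.h/tree.c code):

          static inline void __rcu_read_lock(void)
          {
                  if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
                          preempt_disable();   /* debug builds: catch sleeping */
                  /* Otherwise: no code at all, not even barrier(). */
          }

          static inline void __rcu_read_unlock(void)
          {
                  if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
                          preempt_enable();
          }

          void rcu_all_qs(void)
          {
                  barrier();   /* Keep readers from leaking past the QS.  */
                  /* ... record the quiescent state ... */
                  barrier();   /* Keep later readers after the QS report. */
          }
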
    • rcu: Use rcu_callback_t in call_rcu*() and friends · b6a4ae76
      Committed by Boqun Feng
      As we now have the rcu_callback_t typedef as the type of RCU callbacks,
      we should use it as the parameter type in call_rcu*() and friends. This
      saves a few lines of code and makes it clear which functions require an
      RCU callback, rather than some other kind of callback, as their argument
      (see the sketch at the end of this entry).
      
      Besides, this can also help cscope to generate a better database for
      code reading.
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      b6a4ae76
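
      The typedef and the resulting prototypes, as they appear in
      <linux/rcupdate.h> of this era (call_rcu_bh() and call_rcu_sched() were
      later folded into call_rcu() by the flavor consolidation):

          /* An RCU callback receives the rcu_head embedded in the object. */
          typedef void (*rcu_callback_t)(struct rcu_head *head);

          void call_rcu(struct rcu_head *head, rcu_callback_t func);
          void call_rcu_bh(struct rcu_head *head, rcu_callback_t func);
          void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
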
  19. 21 Sep, 2015 (1 commit)
  20. 23 Jul, 2015 (3 commits)
  21. 16 Jul, 2015 (1 commit)
    • rcu: Deinline rcu_read_lock_sched_held() if DEBUG_LOCK_ALLOC · d5671f6b
      Committed by Denys Vlasenko
      DEBUG_LOCK_ALLOC=y is not a production setting, but it is
      not very unusual either. Many developers routinely
      use kernels built with it enabled.
      
      Apart from being selected by hand, it is also auto-selected by the
      PROVE_LOCKING ("Lock debugging: prove locking correctness") and
      LOCK_STAT ("Lock usage statistics") config options.
      LOCK_STAT is necessary for "perf lock" to work.
      
      I wouldn't spend too much time optimizing it, but this particular
      function has a very large cost in code size: when it is deinlined,
      code size decreases by about 830,000 bytes:
      
          text     data      bss       dec     hex filename
      85674192 22294776 20627456 128596424 7aa39c8 vmlinux.before
      84837612 22294424 20627456 127759492 79d7484 vmlinux
      
      (with this config: http://busybox.net/~vda/kernel_config)
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      CC: Josh Triplett <josh@joshtriplett.org>
      CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      CC: Lai Jiangshan <laijs@cn.fujitsu.com>
      CC: Tejun Heo <tj@kernel.org>
      CC: Oleg Nesterov <oleg@redhat.com>
      CC: linux-kernel@vger.kernel.org
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      d5671f6b
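
      The shape of the change, as a sketch (file placement follows the usual
      RCU layout; treat the details as illustrative, and note the real
      out-of-line body is the old inline's lockdep-based check):

          /* include/linux/rcupdate.h */
          #ifdef CONFIG_DEBUG_LOCK_ALLOC
          int rcu_read_lock_sched_held(void);   /* was a big static inline */
          #else
          /* ... the small non-debug inline is unchanged ... */
          #endif

          /* kernel/rcu/update.c */
          int rcu_read_lock_sched_held(void)
          {
                  /* ... lockdep-based check moved here from the old inline ... */
                  return 0;   /* placeholder; real body elided */
          }
          EXPORT_SYMBOL(rcu_read_lock_sched_held);
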
  22. 07 Jul, 2015 (1 commit)
  23. 28 May, 2015 (5 commits)
  24. 22 Apr, 2015 (1 commit)
  25. 13 Mar, 2015 (1 commit)
    • rcu: Handle outgoing CPUs on exit from idle loop · 88428cc5
      Committed by Paul E. McKenney
      This commit informs RCU of an outgoing CPU just before that CPU invokes
      arch_cpu_idle_dead() during its last pass through the idle loop (via a
      new CPU_DYING_IDLE notifier value).  This change means that RCU need not
      deal with outgoing CPUs passing through the scheduler after informing
      RCU that they are no longer online.  Note that removing the CPU from
      the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
      and orphaning callbacks is still done at CPU_DEAD time, the reason being
      that at CPU_DEAD time we have another CPU that can adopt them.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      88428cc5
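
      A sketch of where the notification lands in the idle loop (simplified;
      the RCU-notification helper name below is hypothetical, standing in for
      the CPU_DYING_IDLE notification described above):

          static void cpu_idle_loop(void)
          {
                  while (1) {
                          if (cpu_is_offline(smp_processor_id())) {
                                  /* Last pass: tell RCU, then never return. */
                                  rcu_report_cpu_dying_idle(smp_processor_id());
                                  arch_cpu_idle_dead();
                          }
                          /* ... normal idle processing ... */
                  }
          }
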
  26. 04 Mar, 2015 (1 commit)
    • rcu: Reverse rcu_dereference_check() conditions · b826565a
      Committed by Paul E. McKenney
      The rcu_dereference_check() family of primitives evaluates the RCU
      lockdep expression first, and only then evaluates the expression passed
      in.  This works fine normally, but can potentially fail in environments
      (such as NMI handlers) where lockdep cannot be invoked.  The problem is
      that even if the expression passed in is "1", the compiler would need to
      prove that the RCU lockdep expression (rcu_read_lock_held(), for example)
      is free of side effects in order to be able to elide it.  Given that
      rcu_read_lock_held() is sometimes separately compiled, the compiler cannot
      always use this optimization.
      
      This commit therefore reverses the order of evaluation, so that the
      expression passed in is evaluated first, and the RCU lockdep expression
      is evaluated only if the passed-in expression evaluates to false,
      courtesy of the C-language short-circuit boolean evaluation rules.  This
      compels the compiler to forgo executing the RCU lockdep expression in
      cases where the passed-in expression evaluates to "1" at compile time,
      so that (for example) rcu_dereference_raw() can be guaranteed to execute
      safely within an NMI handler (see the sketch at the end of this entry).
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      b826565a
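
      The reordering, in macro form (a simplified sketch of the
      rcu_dereference_check() definition; __rcu_dereference_check() details
      are elided):

          /* Before: lockdep expression evaluated first. */
          #define rcu_dereference_check(p, c) \
                  __rcu_dereference_check((p), rcu_read_lock_held() || (c), __rcu)

          /* After: the caller's condition comes first, so a compile-time "1"
           * short-circuits the (possibly out-of-line) lockdep expression.   */
          #define rcu_dereference_check(p, c) \
                  __rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
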