1. 02 Mar, 2017 2 commits
    • sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h> · ae7e81c0
      Committed by Ingo Molnar
      We are going to move scheduler ABI details to <uapi/linux/sched/types.h>,
      which will be used from a number of .c files.
      
      Create empty placeholder header that maps to <linux/types.h>.
      
      Include the new header in the files that are going to need it.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • rcu: Separate the RCU synchronization types and APIs into <linux/rcupdate_wait.h> · f9411ebe
      Committed by Ingo Molnar
      So rcupdate.h is a pretty complex header, in particular it includes
      <linux/completion.h> which includes <linux/wait.h> - creating a
      dependency that includes <linux/wait.h> in <linux/sched.h>,
      which prevents the isolation of <linux/sched.h> from the derived
      <linux/wait.h> header.
      
      Solve part of the problem by decoupling rcupdate.h from completions:
      this can be done by separating out the rcu_synchronize types and APIs,
      and updating their usage sites.
      
      Since these are mostly RCU-internal types, this will not just simplify
      <linux/sched.h>'s dependencies, but will make all the hundreds of
      .c files that include rcupdate.h but not completions or wait.h build
      faster.
      
      ( For rcutiny this means that two dependent APIs have to be uninlined,
        but that shouldn't be much of a problem as they are rare variants. )
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 26 Jan, 2017 3 commits
    • srcu: Reduce probability of SRCU ->unlock_count[] counter overflow · 7f554a3d
      Committed by Paul E. McKenney
      Because there are no memory barriers between the srcu_flip() ->completed
      increment and the summation of the read-side ->unlock_count[] counters,
      both the compiler and the CPU can reorder the summation with the
      ->completed increment.  If the updater is preempted long enough during
      this process, the read-side counters could overflow, resulting in a
      too-short grace period.
      
      This commit therefore adds a memory barrier just after the ->completed
      increment, ensuring that if the summation misses an increment of
      ->unlock_count[] from __srcu_read_unlock(), the next __srcu_read_lock()
      will see the new value of ->completed, thus bounding the number of
      ->unlock_count[] increments that can be missed to NR_CPUS.  The actual
      overflow computation is more complex due to the possibility of nesting
      of __srcu_read_lock().
      Reported-by: Lance Roy <ldr709@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Force full grace-period ordering · d85b62f1
      Committed by Paul E. McKenney
      If a process invokes synchronize_srcu(), is delayed just the right amount
      of time, and thus does not sleep when waiting for the grace period to
      complete, there is no ordering between the end of the grace period and
      the code following the synchronize_srcu().  Similarly, there can be a
      lack of ordering between the end of the SRCU grace period and callback
      invocation.
      
      This commit adds the necessary ordering.
      Reported-by: Lance Roy <ldr709@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Further smp_mb() adjustment per email with Lance Roy. ]
    • srcu: Implement more-efficient reader counts · f2c46896
      Committed by Lance Roy
      SRCU uses two per-cpu counters: a nesting counter to count the number of
      active critical sections, and a sequence counter to ensure that the nesting
      counters don't change while they are being added together in
      srcu_readers_active_idx_check().
      
      This patch instead uses per-cpu lock and unlock counters. Because both
      counters only increase and srcu_readers_active_idx_check() reads the unlock
      counter before the lock counter, this achieves the same end without having
      to increment two different counters in srcu_read_lock(). This also saves a
      smp_mb() in srcu_readers_active_idx_check().
      
      Possible bug: There is no guarantee that the lock counter won't overflow
      during srcu_readers_active_idx_check(), as there are no memory barriers
      around srcu_flip() (see comment in srcu_readers_active_idx_check() for
      details). However, this problem was already present before this patch.
      Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Lance Roy <ldr709@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
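The lock/unlock counter scheme described in f2c46896 can be modeled in user space. The following is only a minimal sketch under stated assumptions, not the kernel's implementation: the names `srcu_model`, `model_read_lock()`, `model_read_unlock()` and `model_readers_idle()` are hypothetical, C11 atomics stand in for per-cpu counters, and a sequentially consistent fence stands in for the kernel's memory barriers. It shows the one property the commit message relies on: readers only ever increment, and the checker sums the unlock counters before the lock counters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count for this model */

/* Hypothetical stand-in for the per-cpu SRCU counters. */
struct srcu_model {
    /* [cpu][idx]: both arrays only ever increase. */
    atomic_ulong lock_count[MODEL_NR_CPUS][2];
    atomic_ulong unlock_count[MODEL_NR_CPUS][2];
};

static void model_init(struct srcu_model *sp)
{
    assert(sp != NULL);
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        for (int idx = 0; idx < 2; idx++) {
            atomic_init(&sp->lock_count[cpu][idx], 0);
            atomic_init(&sp->unlock_count[cpu][idx], 0);
        }
    }
}

/* Reader entry: one increment of the lock counter on the current CPU. */
static void model_read_lock(struct srcu_model *sp, int cpu, int idx)
{
    atomic_fetch_add(&sp->lock_count[cpu][idx], 1);
}

/* Reader exit: one increment of the unlock counter, possibly on another CPU. */
static void model_read_unlock(struct srcu_model *sp, int cpu, int idx)
{
    atomic_fetch_add(&sp->unlock_count[cpu][idx], 1);
}

/*
 * Sum the unlock counters first, then the lock counters. Because both
 * only increase, every unlock seen in the first pass has its matching
 * lock visible in the second pass, so equal sums mean no reader was
 * active for this index.
 */
static bool model_readers_idle(struct srcu_model *sp, int idx)
{
    unsigned long unlocks = 0, locks = 0;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        unlocks += atomic_load(&sp->unlock_count[cpu][idx]);

    atomic_thread_fence(memory_order_seq_cst); /* model of the ordering barrier */

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        locks += atomic_load(&sp->lock_count[cpu][idx]);

    return locks == unlocks;
}
```

Only the sums are compared, so a reader that locks on one CPU and unlocks on another still balances out. The model does not attempt to reproduce the overflow concern noted in the commit message.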
  3. 24 Jan, 2017 16 commits
  4. 17 Jan, 2017 3 commits
  5. 15 Jan, 2017 2 commits
    • rcu: Narrow early boot window of illegal synchronous grace periods · 52d7e48b
      Committed by Paul E. McKenney
      The current preemptible RCU implementation goes through three phases
      during bootup.  In the first phase, there is only one CPU that is running
      with preemption disabled, so that a no-op is a synchronous grace period.
      In the second mid-boot phase, the scheduler is running, but RCU has
      not yet gotten its kthreads spawned (and, for expedited grace periods,
      workqueues are not yet running).  During this time, any attempt to do
      a synchronous grace period will hang the system (or complain bitterly,
      depending).  In the third and final phase, RCU is fully operational and
      everything works normally.
      
      This has been OK for some time, but there have recently been some
      synchronous grace periods showing up during the second mid-boot phase.
      This code worked "by accident" for a while, but started failing as soon
      as expedited RCU grace periods switched over to workqueues in commit
      8b355e3b ("rcu: Drive expedited grace periods from workqueue").
      Note that the code was buggy even before this commit, as it was subject
      to failure on real-time systems that forced all expedited grace periods
      to run as normal grace periods (for example, using the rcu_normal ksysfs
      parameter).  The callchain from the failure case is as follows:
      
      early_amd_iommu_init()
      |-> acpi_put_table(ivrs_base);
      |-> acpi_tb_put_table(table_desc);
      |-> acpi_tb_invalidate_table(table_desc);
      |-> acpi_tb_release_table(...)
      |-> acpi_os_unmap_memory
      |-> acpi_os_unmap_iomem
      |-> acpi_os_map_cleanup
      |-> synchronize_rcu_expedited
      
      The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
      which caused the code to try using workqueues before they were
      initialized, which did not go well.
      
      This commit therefore reworks RCU to permit synchronous grace periods
      to proceed during this mid-boot phase.  This commit is therefore a
      fix to a regression introduced in v4.9, and is therefore being put
      forward post-merge-window in v4.10.
      
      This commit sets a flag from the existing rcu_scheduler_starting()
      function which causes all synchronous grace periods to take the expedited
      path.  The expedited path now checks this flag, using the requesting task
      to drive the expedited grace period forward during the mid-boot phase.
      Finally, this flag is updated by a core_initcall() function named
      rcu_exp_runtime_mode(), which causes the runtime codepaths to be used.
      
      Note that this arrangement assumes that tasks are not sent POSIX signals
      (or anything similar) from the time that the first task is spawned
      through core_initcall() time.
      
      Fixes: 8b355e3b ("rcu: Drive expedited grace periods from workqueue")
      Reported-by: "Zheng, Lv" <lv.zheng@intel.com>
      Reported-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Stan Kain <stan.kain@gmail.com>
      Tested-by: Ivan <waffolz@hotmail.com>
      Tested-by: Emanuel Castelo <emanuel.castelo@gmail.com>
      Tested-by: Bruno Pesavento <bpesavento@infinito.it>
      Tested-by: Borislav Petkov <bp@suse.de>
      Tested-by: Frederic Bezies <fredbezies@gmail.com>
      Cc: <stable@vger.kernel.org> # 4.9.0-
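The three boot phases described in 52d7e48b, and the path a synchronous grace period takes in each, can be summarized as a small decision function. This is a sketch under stated assumptions rather than the kernel code: the enum and function names below are hypothetical, and the model only encodes what the commit message says (a no-op in the first phase, an expedited grace period driven by the requesting task in the mid-boot phase, and the usual runtime paths afterwards).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three phases in the commit message. */
enum model_boot_phase {
    MODEL_BOOT_EARLY,   /* one CPU, preemption disabled */
    MODEL_BOOT_MID,     /* scheduler running, kthreads/workqueues not ready */
    MODEL_BOOT_RUNTIME  /* RCU fully operational */
};

/* Hypothetical labels for how a synchronous grace period is carried out. */
enum model_gp_path {
    MODEL_GP_NOOP,                   /* nothing to wait for */
    MODEL_GP_EXPEDITED_BY_CALLER,    /* requesting task drives the GP */
    MODEL_GP_EXPEDITED_BY_WORKQUEUE, /* runtime expedited path */
    MODEL_GP_NORMAL                  /* runtime normal path */
};

/* Pick the grace-period path for a request made during a given phase. */
static enum model_gp_path model_sync_path(enum model_boot_phase phase,
                                          bool want_expedited)
{
    switch (phase) {
    case MODEL_BOOT_EARLY:
        return MODEL_GP_NOOP;
    case MODEL_BOOT_MID:
        /* All synchronous grace periods take the expedited path here. */
        return MODEL_GP_EXPEDITED_BY_CALLER;
    case MODEL_BOOT_RUNTIME:
        return want_expedited ? MODEL_GP_EXPEDITED_BY_WORKQUEUE
                              : MODEL_GP_NORMAL;
    }
    assert(0 && "unknown boot phase");
    return MODEL_GP_NORMAL;
}
```

In this model the caller's preference for a normal or expedited grace period only matters once the runtime phase is reached, which mirrors the flag switch performed at core_initcall() time in the description above.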
    • rcu: Remove cond_resched() from Tiny synchronize_sched() · f466ae66
      Committed by Paul E. McKenney
      It is now legal to invoke synchronize_sched() at early boot, which causes
      Tiny RCU's synchronize_sched() to emit spurious splats.  This commit
      therefore removes the cond_resched() from Tiny RCU's synchronize_sched().
      
      Fixes: 8b355e3b ("rcu: Drive expedited grace periods from workqueue")
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org> # 4.9.0-
  6. 15 Nov, 2016 6 commits
  7. 11 Oct, 2016 1 commit
    • latent_entropy: Mark functions with __latent_entropy · 0766f788
      Committed by Emese Revfy
      The __latent_entropy gcc attribute can be used only on functions and
      variables.  If it is on a function then the plugin will instrument it for
      gathering control-flow entropy. If the attribute is on a variable then
      the plugin will initialize it with random contents.  The variable must
      be an integer, an integer array type or a structure with integer fields.
      
      These specific functions have been selected because they are init
      functions (to help gather boot-time entropy), are called at unpredictable
      times, or they have variable loops, each of which provides some level of
      latent entropy.
      Signed-off-by: Emese Revfy <re.emese@gmail.com>
      [kees: expanded commit message]
      Signed-off-by: Kees Cook <keescook@chromium.org>
  8. 23 Aug, 2016 7 commits
    • rcuperf: Consistently insert space between flag and message · a56fefa2
      Committed by SeongJae Park
      A few rcuperf dmesg output messages have no space between the flag and
      the start of the message. In contrast, every other message consistently
      supplies a single space.  This difference makes rcuperf dmesg output
      hard to read and to mechanically parse.  This commit therefore fixes
      this problem by modifying a pr_alert() call and PERFOUT_STRING() macro
      function to provide that single space.
      Signed-off-by: SeongJae Park <sj38.park@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
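To illustrate why a consistent single space after the flag helps mechanical parsing, as a56fefa2 argues, here is a small user-space sketch. It is not the rcuperf code: `MODEL_PERF_FLAG`, `model_perfout()` and `model_perf_message()` are hypothetical names, the flag string is an assumed value, and `snprintf()` stands in for the kernel's pr_alert().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed flag text for this model; the real flag string may differ. */
#define MODEL_PERF_FLAG "-perf:"

/* Format "<type><flag> <message>\n", always with one space after the flag. */
static int model_perfout(char *buf, size_t len,
                         const char *perf_type, const char *msg)
{
    assert(buf != NULL && perf_type != NULL && msg != NULL);
    return snprintf(buf, len, "%s" MODEL_PERF_FLAG " %s\n", perf_type, msg);
}

/*
 * Return a pointer to the message text that follows "<flag> ", or NULL
 * if the line does not contain the flag followed by a single space.
 */
static const char *model_perf_message(const char *line)
{
    const char *p = strstr(line, MODEL_PERF_FLAG " ");

    return p ? p + strlen(MODEL_PERF_FLAG " ") : NULL;
}
```

With every message formatted this way, a parser can split on the flag plus one space and does not need a special case for lines where the space is missing.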
    • rcutorture: Print out barrier error as document says · 472213a6
      Committed by SeongJae Park
      Tests for rcu_barrier() were introduced by commit fae4b54f ("rcu:
      Introduce rcutorture testing for rcu_barrier()").  This commit updated
      the documentation to say that the "rtbe" field in rcutorture's dmesg
      output indicates test failure.  However, the code was not updated, only
      the documentation.  This commit therefore updates the code to match the
      updated documentation.
      Signed-off-by: SeongJae Park <sj38.park@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • torture: Add task state to writer-task stall printk()s · 4ffa6699
      Committed by Paul E. McKenney
      This commit adds a dump of the scheduler state for stalled rcutorture
      writer tasks.  This addition provides yet more debug for the intermittent
      "failures to proceed", where grace periods move ahead but the rcutorture
      writer tasks fail to do so.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcutorture: Convert to hotplug state machine · 0ffd374b
      Committed by Sebastian Andrzej Siewior
      Install the callbacks via the state machine and let the core invoke
      the callbacks on the already online CPUs.
      
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Provide exact CPU-online tracking for RCU · 7ec99de3
      Committed by Paul E. McKenney
      Up to now, RCU has assumed that the CPU-online process makes it from
      CPU_UP_PREPARE to set_cpu_online() within one jiffy.  Given the recent
      rise of virtualized environments, this assumption is very clearly
      obsolete.  Failing to meet this deadline can result in RCU paying
      attention to an incoming CPU for one jiffy, then ignoring it until the
      grace period following the one in which that CPU sets itself online.
      This situation might prove to be fatally disappointing to any RCU
      read-side critical sections that had the misfortune to execute during
      the time in which RCU was ignoring the slow-to-come-online CPU.
      
      This commit therefore updates RCU's internal CPU state-tracking
      information at notify_cpu_starting() time, thus providing RCU with
      an exact transition of the CPU's state from offline to online.
      
      Note that this means that incoming CPUs must not use RCU read-side
      critical sections (other than those of SRCU) until notify_cpu_starting()
      time.  Note also that the CPU_STARTING notifiers -are- allowed to use
      RCU read-side critical sections.  (Of course, CPU-hotplug notifiers are
      rapidly becoming obsolete, so you need to act fast!)
      
      If a given architecture or CPU family needs to use RCU read-side
      critical sections earlier, the call to rcu_cpu_starting() from
      notify_cpu_starting() will need to be architecture-specific, with
      architectures that need early use being required to hand-place
      the call to rcu_cpu_starting() at some point preceding the call to
      notify_cpu_starting().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Avoid redundant quiescent-state chasing · 3563a438
      Committed by Paul E. McKenney
      Currently, __note_gp_changes() checks to see if the CPU has slept through
      multiple grace periods.  If it has, it resynchronizes that CPU's view
      of the grace-period state, which includes whether or not the current
      grace period needs a quiescent state from this CPU.  The fact of this
      need (or lack thereof) needs to be in two places, rdp->cpu_no_qs.b.norm
      and rdp->core_needs_qs.  The former tells RCU's context-switch code to
      go get a quiescent state and the latter says that it needs to be reported.
      The current code unconditionally sets the former to true, but correctly
      sets the latter.
      
      This does not result in failures, but it does unnecessarily increase
      the amount of work done on average at context-switch time.  This commit
      therefore correctly sets both fields.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
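The difference between the old and new resynchronization behavior described in 3563a438 can be shown with a simplified model. This is a sketch only: the structures are cut-down, hypothetical stand-ins for the kernel's per-CPU and per-node RCU data, and the way the need is computed here (a bit test of `qsmask` against `grpmask`) is an assumption that the commit message itself does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the leaf node state. */
struct model_rnp {
    unsigned long qsmask;  /* CPUs the current grace period still waits on */
};

/* Hypothetical, simplified stand-in for the per-CPU state. */
struct model_rdp {
    unsigned long grpmask; /* this CPU's bit within the node (assumed) */
    bool cpu_no_qs_norm;   /* models rdp->cpu_no_qs.b.norm */
    bool core_needs_qs;    /* models rdp->core_needs_qs */
};

/* Old behavior: the first field is set unconditionally. */
static void model_resync_old(struct model_rdp *rdp, const struct model_rnp *rnp)
{
    assert(rdp != NULL && rnp != NULL);
    rdp->cpu_no_qs_norm = true;
    rdp->core_needs_qs = (rnp->qsmask & rdp->grpmask) != 0;
}

/* New behavior: both fields reflect the same computed need. */
static void model_resync_new(struct model_rdp *rdp, const struct model_rnp *rnp)
{
    assert(rdp != NULL && rnp != NULL);
    bool need_qs = (rnp->qsmask & rdp->grpmask) != 0;

    rdp->cpu_no_qs_norm = need_qs;
    rdp->core_needs_qs = need_qs;
}
```

When the grace period does not need this CPU, the old variant still asks the context-switch path to chase a quiescent state that will never be reported, which is the extra work the commit removes.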
    • rcu: Don't use modular infrastructure in non-modular code · e77b7041
      Committed by Paul Gortmaker
      The Kconfig currently controlling compilation of tree.c is:
      
      init/Kconfig:config TREE_RCU
      init/Kconfig:   bool
      
      ...and update.c and sync.c are "obj-y" meaning that none are ever
      built as a module by anyone.
      
      Since MODULE_ALIAS is a no-op for non-modular code, we can remove
      them from these files.
      
      We leave moduleparam.h behind since the files instantiate some boot
      time configuration parameters with module_param() still.
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>