1. 21 9月, 2015 6 次提交
    • P
      rcu: Invert passed_quiesce and rename to cpu_no_qs · 0d43eb34
      Paul E. McKenney 提交于
      This commit inverts the sense of the rcu_data structure's ->passed_quiesce
      field and renames it to ->cpu_no_qs.  This will allow a later commit to
      use an "aggregate OR" operation to test expedited as well as normal grace
      periods without added overhead.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      0d43eb34
    • P
      rcu: Rename qs_pending to core_needs_qs · 97c668b8
      Paul E. McKenney 提交于
      An upcoming commit needs to invert the sense of the ->passed_quiesce
      rcu_data structure field, so this commit is taking this opportunity
      to clarify things a bit by renaming ->qs_pending to ->core_needs_qs.
      
      So if !rdp->core_needs_qs, then this CPU need not concern itself with
      quiescent states, in particular, it need not acquire its leaf rcu_node
      structure's ->lock to check.  Otherwise, it needs to report the next
      quiescent state.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      97c668b8
    • P
      rcu: Use single-stage IPI algorithm for RCU expedited grace period · 8203d6d0
      Paul E. McKenney 提交于
      The current preemptible-RCU expedited grace-period algorithm invokes
      synchronize_sched_expedited() to enqueue all tasks currently running
      in a preemptible-RCU read-side critical section, then waits for all the
      ->blkd_tasks lists to drain.  This works, but results in both an IPI and
      a double context switch even on CPUs that do not happen to be running
      in a preemptible RCU read-side critical section.
      
      This commit implements a new algorithm that causes less OS jitter.
      This new algorithm IPIs all online CPUs that are not idle (from an
      RCU perspective), but refrains from self-IPIs.  If a CPU receiving
      this IPI is not in a preemptible RCU read-side critical section (or
      is just now exiting one), it pushes quiescence up the rcu_node tree,
      otherwise, it sets a flag that will be handled by the upcoming outermost
      rcu_read_unlock(), which will then push quiescence up the tree.
      
      The expedited grace period must of course wait on any pre-existing blocked
      readers, and newly blocked readers must be queued carefully based on
      the state of both the normal and the expedited grace periods.  This
      new queueing approach also avoids the need to update boost state,
      courtesy of the fact that blocked tasks are no longer ever migrated to
      the root rcu_node structure.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      8203d6d0
    • P
      rcu: Consolidate tree setup for synchronize_rcu_expedited() · b9585e94
      Paul E. McKenney 提交于
      This commit replaces sync_rcu_preempt_exp_init1(() and
      sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug()
      and sync_exp_reset_tree(), which will also be used by
      synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which
      contains code specific to synchronize_rcu_expedited().
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      b9585e94
    • P
      rcu: Move rcu_report_exp_rnp() to allow consolidation · 7922cd0e
      Paul E. McKenney 提交于
      This is a nearly pure code-movement commit, moving rcu_report_exp_rnp(),
      sync_rcu_preempt_exp_done(), and rcu_preempted_readers_exp() so
      that later commits can make synchronize_sched_expedited() use them.
      The non-code-movement portion of this commit tags rcu_report_exp_rnp()
      as __maybe_unused to avoid build errors when CONFIG_PREEMPT=n.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      7922cd0e
    • P
      rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq · f4ecea30
      Paul E. McKenney 提交于
      Now that there is an ->expedited_wq waitqueue in each rcu_state structure,
      there is no need for the sync_rcu_preempt_exp_wq global variable.  This
      commit therefore substitutes ->expedited_wq for sync_rcu_preempt_exp_wq.
      It also initializes ->expedited_wq only once at boot instead of at the
      start of each expedited grace period.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      f4ecea30
  2. 23 7月, 2015 3 次提交
  3. 18 7月, 2015 5 次提交
  4. 16 7月, 2015 1 次提交
  5. 19 6月, 2015 1 次提交
    • T
      timer: Reduce timer migration overhead if disabled · bc7a34b8
      Thomas Gleixner 提交于
      Eric reported that the timer_migration sysctl is not really nice
      performance wise as it needs to check at every timer insertion whether
      the feature is enabled or not. Further the check does not live in the
      timer code, so we have an extra function call which checks an extra
      cache line to figure out that it is disabled.
      
      We can do better and store that information in the per cpu (hr)timer
      bases. I pondered to use a static key, but that's a nightmare to
      update from the nohz code and the timer base cache line is hot anyway
      when we select a timer base.
      
      The old logic enabled the timer migration unconditionally if
      CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
      line.
      
      With this modification, we start off with migration disabled. The user
      visible sysctl is still set to enabled. If the kernel switches to NOHZ
      migration is enabled, if the user did not disable it via the sysctl
      prior to the switch. If nohz=off is on the kernel command line,
      migration stays disabled no matter what.
      
      Before:
        47.76%  hog       [.] main
        14.84%  [kernel]  [k] _raw_spin_lock_irqsave
         9.55%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.71%  [kernel]  [k] mod_timer
         6.24%  [kernel]  [k] lock_timer_base.isra.38
         3.76%  [kernel]  [k] detach_if_pending
         3.71%  [kernel]  [k] del_timer
         2.50%  [kernel]  [k] internal_add_timer
         1.51%  [kernel]  [k] get_nohz_timer_target
         1.28%  [kernel]  [k] __internal_add_timer
         0.78%  [kernel]  [k] timerfn
         0.48%  [kernel]  [k] wake_up_nohz_cpu
      
      After:
        48.10%  hog       [.] main
        15.25%  [kernel]  [k] _raw_spin_lock_irqsave
         9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.50%  [kernel]  [k] mod_timer
         6.44%  [kernel]  [k] lock_timer_base.isra.38
         3.87%  [kernel]  [k] detach_if_pending
         3.80%  [kernel]  [k] del_timer
         2.67%  [kernel]  [k] internal_add_timer
         1.33%  [kernel]  [k] __internal_add_timer
         0.73%  [kernel]  [k] timerfn
         0.54%  [kernel]  [k] wake_up_nohz_cpu
      Reported-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      bc7a34b8
  6. 28 5月, 2015 12 次提交
  7. 22 4月, 2015 1 次提交
  8. 13 3月, 2015 3 次提交
    • P
      rcu: Process offlining and onlining only at grace-period start · 0aa04b05
      Paul E. McKenney 提交于
      Races between CPU hotplug and grace periods can be difficult to resolve,
      so the ->onoff_mutex is used to exclude the two events.  Unfortunately,
      this means that it is impossible for an outgoing CPU to perform the
      last bits of its offlining from its last pass through the idle loop,
      because sleeplocks cannot be acquired in that context.
      
      This commit avoids these problems by buffering online and offline events
      in a new ->qsmaskinitnext field in the leaf rcu_node structures.  When a
      grace period starts, the events accumulated in this mask are applied to
      the ->qsmaskinit field, and, if needed, up the rcu_node tree.  The special
      case of all CPUs corresponding to a given leaf rcu_node structure being
      offline while there are still elements in that structure's ->blkd_tasks
      list is handled using a new ->wait_blkd_tasks field.  In this case,
      propagating the offline bits up the tree is deferred until the beginning
      of the grace period after all of the tasks have exited their RCU read-side
      critical sections and removed themselves from the list, at which point
      the ->wait_blkd_tasks flag is cleared.  If one of that leaf rcu_node
      structure's CPUs comes back online before the list empties, then the
      ->wait_blkd_tasks flag is simply cleared.
      
      This of course means that RCU's notion of which CPUs are offline can be
      out of date.  This is OK because RCU need only wait on CPUs that were
      online at the time that the grace period started.  In addition, RCU's
      force-quiescent-state actions will handle the case where a CPU goes
      offline after the grace period starts.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      0aa04b05
    • P
      rcu: Move rcu_report_unblock_qs_rnp() to common code · cc99a310
      Paul E. McKenney 提交于
      The rcu_report_unblock_qs_rnp() function is invoked when the
      last task blocking the current grace period exits its outermost
      RCU read-side critical section.  Previously, this was called only
      from rcu_read_unlock_special(), and was therefore defined only when
      CONFIG_RCU_PREEMPT=y.  However, this function will be invoked even when
      CONFIG_RCU_PREEMPT=n once CPU-hotplug operations are processed only at
      the beginnings of RCU grace periods.  The reason for this change is that
      the last task on a given leaf rcu_node structure's ->blkd_tasks list
      might well exit its RCU read-side critical section between the time that
      recent CPU-hotplug operations were applied and when the new grace period
      was initialized.  This situation could result in RCU waiting forever on
      that leaf rcu_node structure, because if all that structure's CPUs were
      already offline, there would be no quiescent-state events to drive that
      structure's part of the grace period.
      
      This commit therefore moves rcu_report_unblock_qs_rnp() to common code
      that is built unconditionally so that the quiescent-state-forcing code
      can clean up after this situation, avoiding the grace-period stall.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      cc99a310
    • P
      rcu: Rework preemptible expedited bitmask handling · 8eb74b2b
      Paul E. McKenney 提交于
      Currently, the rcu_node tree ->expmask bitmasks are initially set to
      reflect the online CPUs.  This is pointless, because only the CPUs
      preempted within RCU read-side critical sections by the preceding
      synchronize_sched_expedited() need to be tracked.  This commit therefore
      instead sets up these bitmasks based on the state of the ->blkd_tasks
      lists.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      8eb74b2b
  9. 12 3月, 2015 2 次提交
  10. 04 3月, 2015 4 次提交
  11. 27 2月, 2015 2 次提交
    • P
      rcu: Tighten up affinity and check for sysidle · 5871968d
      Paul E. McKenney 提交于
      If the RCU grace-period kthread invoking rcu_sysidle_check_cpu()
      happens to be running on the tick_do_timer_cpu initially,
      then rcu_bind_gp_kthread() won't bind it.  This kthread might
      then migrate before invoking rcu_gp_fqs(), which will trigger the
      WARN_ON_ONCE() in rcu_sysidle_check_cpu().  This commit therefore makes
      rcu_bind_gp_kthread() do the binding even if the kthread is currently
      on the same CPU.  Because this incurs added overhead, this commit also
      causes each RCU grace-period kthread to invoke rcu_bind_gp_kthread()
      once at boot rather than at the beginning of each grace period.
      And as long as rcu_bind_gp_kthread() is being modified, this commit
      eliminates its #ifdef.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      5871968d
    • P
      rcu: Update from rcu_expedited variable to rcu_gp_is_expedited() · 5afff48b
      Paul E. McKenney 提交于
      This commit updates open-coded tests of the rcu_expedited variable
      to instead use rcu_gp_is_expedited().
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      5afff48b