1. 12 December 2011 (11 commits)
    • rcu: Inform the user about extended quiescent state on PROVE_RCU warning · 0464e937
      Committed by Frederic Weisbecker
      Inform the user if an RCU usage error is detected by lockdep while in
      an extended quiescent state (in this case, the RCU-free window in idle).
      This is accomplished by adding a line to the RCU lockdep splat indicating
      whether or not the splat occurred in extended quiescent state.
      
      Uses of RCU from within extended quiescent state mode are totally ignored
      by RCU, hence the importance of this diagnostic.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
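      As a rough illustration of the diagnostic described above (a sketch, not
      the actual hunk; the message text and placement are assumptions), the
      lockdep splat path could note the extended quiescent state using the
      rcu_is_cpu_idle() helper that appears in the commits below:

      	/*
      	 * Sketch only: when lockdep reports an RCU usage error, also say
      	 * whether the CPU was in RCU's idle (extended quiescent state)
      	 * window, since RCU ignores read-side critical sections there.
      	 */
      	if (rcu_is_cpu_idle())
      		printk(KERN_ERR "RCU used illegally from extended quiescent state!\n");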
    • rcu: Detect illegal rcu dereference in extended quiescent state · e6b80a3b
      Committed by Frederic Weisbecker
      Report that none of the rcu read lock maps are held while in an RCU
      extended quiescent state (the section between rcu_idle_enter()
      and rcu_idle_exit()). This helps detect any use of rcu_dereference()
      and friends from within the section in idle where RCU is not allowed.
      
      This way we can guarantee an extended quiescent window where the CPU
      can be put in dyntick-idle mode or can simply avoid being part of any
      global grace-period completion while in the idle loop.
      
      Uses of RCU from such mode are totally ignored by RCU, hence the
      importance of these checks.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
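      A minimal sketch of the kind of check this implies, assuming the lockdep
      helpers debug_lockdep_rcu_enabled() and lock_is_held() and the
      rcu_lock_map used by PROVE_RCU (not necessarily the exact patch):

      	int rcu_read_lock_held(void)
      	{
      		if (!debug_lockdep_rcu_enabled())
      			return 1;
      		if (rcu_is_cpu_idle())
      			return 0;	/* in the idle window, no RCU read lock can be held */
      		return lock_is_held(&rcu_lock_map);
      	}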
    • rcu: Remove redundant return from rcu_report_exp_rnp() · a0f8eefb
      Committed by Thomas Gleixner
      Empty void functions do not need "return", so this commit removes it
      from rcu_report_exp_rnp().
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Omit self-awaken when setting up expedited grace period · b40d293e
      Committed by Thomas Gleixner
      When setting up an expedited grace period, if there are no readers, the
      task awakens itself.  This commit removes this useless self-awakening.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
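      One way to express the idea, sketched under the assumption that the
      completion-reporting path takes a "wake" flag and that the expedited
      waitqueue is named sync_rcu_preempt_exp_wq (both assumptions, not
      necessarily the exact patch):

      	/*
      	 * Sketch only: let the caller say whether the waiter should be
      	 * woken, so a task that reports completion of its own expedited
      	 * grace period (because there were no readers) does not wake itself.
      	 */
      	if (wake)
      		wake_up(&sync_rcu_preempt_exp_wq);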
    • rcu: Disable preemption in rcu_is_cpu_idle() · 34240697
      Committed by Paul E. McKenney
      Because rcu_is_cpu_idle() is to be used to check for extended quiescent
      states in RCU-preempt read-side critical sections, it cannot assume that
      preemption is disabled.  And preemption must be disabled when accessing
      the dyntick-idle state, because otherwise the following sequence of events
      could occur:
      
      1.	Task A on CPU 1 enters rcu_is_cpu_idle() and picks up the pointer
      	to CPU 1's per-CPU variables.
      
      2.	Task B preempts Task A and starts running on CPU 1.
      
      3.	Task A migrates to CPU 2.
      
      4.	Task B blocks, leaving CPU 1 idle.
      
      5.	Task A continues execution on CPU 2, accessing CPU 1's dyntick-idle
      	information using the pointer fetched in step 1 above, and finds
      	that CPU 1 is idle.
      
      6.	Task A therefore incorrectly concludes that it is executing in
      	an extended quiescent state, possibly issuing a spurious splat.
      
      Therefore, this commit disables preemption within the rcu_is_cpu_idle()
      function.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
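      A sketch of what the resulting function might look like, assuming the
      per-CPU rcu_dynticks structure whose ->dynticks counter is even when the
      CPU is idle (a sketch under those assumptions, not the definitive hunk):

      	int rcu_is_cpu_idle(void)
      	{
      		int ret;

      		preempt_disable();	/* stay on the CPU whose state we sample */
      		ret = (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
      		preempt_enable();
      		return ret;
      	}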
    • rcu: Add failure tracing to rcutorture · 91afaf30
      Committed by Paul E. McKenney
      Trace the rcutorture RCU accesses and dump the trace buffer when the
      first failure is detected.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
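      A hedged sketch of a dump-once helper that rcutorture might use for this
      (the helper name and the guard variable are illustrative):

      	static void rcutorture_trace_dump(void)
      	{
      		static atomic_t beenhere = ATOMIC_INIT(0);

      		/* Dump only on the first detected failure, so later errors
      		 * do not overwrite the interesting part of the trace. */
      		if (atomic_xchg(&beenhere, 1) == 0)
      			ftrace_dump(DUMP_ALL);
      	}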
    • trace: Allow ftrace_dump() to be called from modules · a8eecf22
      Committed by Paul E. McKenney
      Add an EXPORT_SYMBOL_GPL() so that rcutorture can dump the trace buffer
      upon detection of an RCU error.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
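      The change amounts to exporting the existing symbol so that modules such
      as rcutorture can call it; presumably roughly a one-liner next to the
      function's definition:

      	EXPORT_SYMBOL_GPL(ftrace_dump);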
    • rcu: Track idleness independent of idle tasks · 9b2e4f18
      Committed by Paul E. McKenney
      Earlier versions of RCU used the scheduling-clock tick to detect idleness
      by checking for the idle task, but handled idleness differently for
      CONFIG_NO_HZ=y.  But there are now a number of uses of RCU read-side
      critical sections in the idle task, for example, for tracing.  A more
      fine-grained detection of idleness is therefore required.
      
      This commit presses the old dyntick-idle code into full-time service,
      so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
      always invoked at the beginning of an idle loop iteration.  Similarly,
      rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
      at the end of an idle-loop iteration.  This allows the idle task to
      use RCU everywhere except between consecutive rcu_idle_enter() and
      rcu_idle_exit() calls, in turn allowing architecture maintainers to
      specify exactly where in the idle loop that RCU may be used.
      
      Because some of the userspace upcall uses can result in what looks
      to RCU like half of an interrupt, it is not possible to expect that
      the irq_enter() and irq_exit() hooks will give exact counts.  This
      patch therefore expands the ->dynticks_nesting counter to 64 bits
      and uses two separate bitfields to count process/idle transitions
      and interrupt entry/exit transitions.  It is presumed that userspace
      upcalls do not happen in the idle loop or from usermode execution
      (though usermode might do a system call that results in an upcall).
      The counter is hard-reset on each process/idle transition, which
      prevents the interrupt entry/exit error from accumulating.  Overflow
      is avoided by the 64-bitness of the ->dynticks_nesting counter.
      
      This commit also adds warnings if a non-idle task asks RCU to enter
      idle state (and these checks will need some adjustment before applying
      Frederic's OS-jitter patches, http://lkml.org/lkml/2011/10/7/246).
      In addition, validation of ->dynticks and ->dynticks_nesting is added.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
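      To make the counter scheme concrete, here is an illustrative sketch: the
      low-order bits of the 64-bit ->dynticks_nesting field count interrupt
      entry/exit nesting, while a large bias added on each process-level entry
      keeps task-level nesting in the high-order bits, and the hard reset on
      process/idle transitions discards any accumulated irq miscount.  The
      constant name, its value, and the example_* helpers are assumptions made
      for the sketch, not the exact patch:

      	#define DYNTICK_TASK_NESTING	(LLONG_MAX / 2 - 1)	/* illustrative bias */

      	/* Process/idle transition: hard reset, discarding any accumulated
      	 * interrupt-level miscount in the low-order bits. */
      	static void example_idle_exit(long long *dynticks_nesting)
      	{
      		*dynticks_nesting = DYNTICK_TASK_NESTING;
      	}

      	/* Interrupt entry: nest in the low-order bits only. */
      	static void example_irq_enter(long long *dynticks_nesting)
      	{
      		(*dynticks_nesting)++;
      	}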
    • rcu: Make synchronize_sched_expedited() better at work sharing · 7077714e
      Committed by Paul E. McKenney
      When synchronize_sched_expedited() takes its second and subsequent
      snapshots of sync_sched_expedited_started, it subtracts 1.  This
      means that a concurrent caller of synchronize_sched_expedited()
      that incremented the counter to that value cannot take advantage of
      our successful completion, even though it sees it.  This restriction
      is pointless, given that our full expedited grace period happened
      after that other caller started, and thus should be able to serve as
      a proxy for that caller successfully executing try_stop_cpus().
      
      This commit therefore removes the subtraction of 1.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
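      A sketch of the snapshot in question, using the counter name from the
      commit text above; the pre-change form is shown in the trailing comment
      (a sketch, not the exact hunk):

      	/* Take the snapshot without subtracting 1, so a caller that
      	 * incremented the counter to exactly this value can also treat
      	 * our completed expedited grace period as its own. */
      	snap = atomic_read(&sync_sched_expedited_started);	/* was: ... - 1 */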
    • rcu: Avoid RCU-preempt expedited grace-period botch · 389abd48
      Committed by Paul E. McKenney
      Because rcu_read_unlock_special() samples rcu_preempted_readers_exp(rnp)
      after dropping rnp->lock, the following sequence of events is possible:
      
      1.	Task A exits its RCU read-side critical section, and removes
      	itself from the ->blkd_tasks list, releases rnp->lock, and is
      	then preempted.  Task B remains on the ->blkd_tasks list, and
      	blocks the current expedited grace period.
      
      2.	Task B exits from its RCU read-side critical section and removes
      	itself from the ->blkd_tasks list.  Because it is the last task
      	blocking the current expedited grace period, it ends that
      	expedited grace period.
      
      3.	Task A resumes, and samples rcu_preempted_readers_exp(rnp) which
      	of course indicates that nothing is blocking the nonexistent
      	expedited grace period. Task A is again preempted.
      
      4.	Some other CPU starts an expedited grace period.  There are several
      	tasks blocking this expedited grace period queued on the
      	same rcu_node structure that Task A was using in step 1 above.
      
      5.	Task A examines its state and incorrectly concludes that it was
      	the last task blocking the expedited grace period on the current
      	rcu_node structure.  It therefore reports completion up the
      	rcu_node tree.
      
      6.	The expedited grace period can then incorrectly complete before
      	the tasks blocked on this same rcu_node structure exit their
      	RCU read-side critical sections.  Arbitrarily bad things happen.
      
      This commit therefore takes a snapshot of rcu_preempted_readers_exp(rnp)
      prior to dropping the lock, so that only the last task thinks that it is
      the last task, thus avoiding the failure scenario laid out above.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
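      A sketch of the fix's shape (the local variable names are illustrative;
      the rcu_report_exp_rnp() call uses the "wake" form introduced earlier in
      this series):

      	/* Snapshot while still holding rnp->lock... */
      	empty_exp_now = !rcu_preempted_readers_exp(rnp);
      	raw_spin_unlock_irqrestore(&rnp->lock, flags);

      	/* ...so only the task that really was last reports completion. */
      	if (!empty_exp && empty_exp_now)
      		rcu_report_exp_rnp(rsp, rnp, true);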
    • rcu: ->signaled better named ->fqs_state · af446b70
      Committed by Paul E. McKenney
      The ->signaled field was named before complications in the form of
      dyntick-idle mode and offlined CPUs.  These complications have required
      that force_quiescent_state() be implemented as a state machine, instead
      of simply unconditionally sending reschedule IPIs.  Therefore, this
      commit renames ->signaled to ->fqs_state to catch up with the new
      force_quiescent_state() reality.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
  2. 09 December 2011 (2 commits)
  3. 07 December 2011 (2 commits)
  4. 06 December 2011 (6 commits)
  5. 05 December 2011 (1 commit)
    • perf: Fix loss of notification with multi-event · 10c6db11
      Committed by Peter Zijlstra
      When you do:
              $ perf record -e cycles,cycles,cycles noploop 10
      
      You expect about 10,000 samples for each event, i.e., 10s at
      1000 samples/sec.  However, this is not what's happening.  You
      get far fewer samples, maybe 3700 samples/event:
      
      $ perf report -D | tail -15
      Aggregated stats:
                 TOTAL events:      10998
                  MMAP events:         66
                  COMM events:          2
                SAMPLE events:      10930
      cycles stats:
                 TOTAL events:       3644
                SAMPLE events:       3644
      cycles stats:
                 TOTAL events:       3642
                SAMPLE events:       3642
      cycles stats:
                 TOTAL events:       3644
                SAMPLE events:       3644
      
      On an Intel Nehalem or even AMD64, there are 4 counters capable
      of measuring cycles, so there is plenty of space to measure those
      events without multiplexing (even with the NMI watchdog active).
      And even with multiplexing, we'd expect roughly the same number
      of samples per event.
      
      The root of the problem was that when the event that caused the buffer
      to become full was not the first event passed on the cmdline, the user
      notification would get lost. The notification was sent to the file
      descriptor of the overflowed event but the perf tool was not polling
      on it.  The perf tool aggregates all samples into a single buffer,
      i.e., the buffer of the first event. Consequently, it assumes
      notifications for any event will come via that descriptor.
      
      The seemingly straightforward solution of moving the waitq into the
      ringbuffer object doesn't work because of lifetime issues. One could
      perf_event_set_output() on a fd that you're also blocking on and cause
      the old rb object to be freed while its waitq would still be
      referenced by the blocked thread -> FAIL.
      
      Therefore link all events to the ringbuffer and broadcast the wakeup
      from the ringbuffer object to all possible events that could be waited
      upon. This is rather ugly, and we're open to better solutions but it
      works for now.
      Reported-by: Stephane Eranian <eranian@google.com>
      Finished-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
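      A hedged sketch of the broadcast, assuming the ring buffer keeps an
      RCU-protected event_list and each perf_event carries a waitq and an
      rb_entry link (a sketch under those assumptions, not necessarily the
      exact patch):

      	static void ring_buffer_wakeup(struct perf_event *event)
      	{
      		struct ring_buffer *rb;

      		rcu_read_lock();
      		rb = rcu_dereference(event->rb);
      		if (rb) {
      			/* Wake every event attached to this buffer, not just
      			 * the one whose fd happened to overflow. */
      			list_for_each_entry_rcu(event, &rb->event_list, rb_entry)
      				wake_up_all(&event->waitq);
      		}
      		rcu_read_unlock();
      	}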
  6. 02 December 2011 (5 commits)
  7. 29 November 2011 (1 commit)
  8. 25 November 2011 (1 commit)
    • cgroup_freezer: fix freezing groups with stopped tasks · 884a45d9
      Committed by Michal Hocko
      Commit 2d3cbf8b (cgroup_freezer: update_freezer_state() does incorrect
      state transitions) removed is_task_frozen_enough() and replaced it with
      a plain frozen() check.  This, however, breaks freezing for a group with
      stopped tasks, because those cannot be frozen and so the group remains
      in the CGROUP_FREEZING state (update_if_frozen() doesn't count stopped
      tasks) and never reaches CGROUP_FROZEN.
      
      Let's add is_task_frozen_enough() back and use it at the original
      locations (update_if_frozen() and try_to_freeze_cgroup()).  Semantically
      we consider stopped tasks frozen enough, so both cases should be covered
      when testing whether tasks are frozen.
      
      Testcase:
      mkdir /dev/freezer
      mount -t cgroup -o freezer none /dev/freezer
      mkdir /dev/freezer/foo
      sleep 1h &
      pid=$!
      kill -STOP $pid
      echo $pid > /dev/freezer/foo/tasks
      echo FROZEN > /dev/freezer/foo/freezer.state
      while true
      do
      	cat /dev/freezer/foo/freezer.state
      	[ "`cat /dev/freezer/foo/freezer.state`" = "FROZEN" ] && break
      	sleep 1
      done
      echo OK
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tomasz Buchert <tomasz.buchert@inria.fr>
      Cc: Paul Menage <paul@paulmenage.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@kernel.org
      Signed-off-by: Tejun Heo <htejun@gmail.com>
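      A sketch of the restored helper, treating stopped or traced tasks as
      frozen enough (assuming the frozen(), freezing() and
      task_is_stopped_or_traced() predicates; the exact condition is a sketch,
      not a quote of the patch):

      	static bool is_task_frozen_enough(struct task_struct *task)
      	{
      		/* A stopped or traced task cannot enter the refrigerator, so
      		 * count it as frozen unless it is still being frozen. */
      		return frozen(task) ||
      		       (task_is_stopped_or_traced(task) && !freezing(task));
      	}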
  9. 24 November 2011 (1 commit)
  10. 19 November 2011 (3 commits)
  11. 18 November 2011 (2 commits)
  12. 17 November 2011 (1 commit)
  13. 16 November 2011 (2 commits)
  14. 14 November 2011 (2 commits)