1. 15 May 2012 (1 commit)
    • genirq: export handle_edge_irq() and irq_to_desc() · 3911ff30
      Jiri Kosina authored
      Export handle_edge_irq() and irq_to_desc() to modules to allow them to
      do things such as
      
      	__irq_set_handler_locked(...., handle_edge_irq);
      
      This fixes
      
      	ERROR: "handle_edge_irq" [drivers/gpio/gpio-pch.ko] undefined!
      	ERROR: "irq_to_desc" [drivers/gpio/gpio-pch.ko] undefined!
      
      when gpio-pch is being built as a module.
      
      This was introduced by commit df9541a6 ("gpio: pch9: Use proper flow
      type handlers") that added
      
      	__irq_set_handler_locked(d->irq, handle_edge_irq);
      
      but handle_edge_irq() was not exported to modules (and the inlined
      __irq_set_handler_locked() requires irq_to_desc() to be exported as
      well).  A sketch of the exports appears below.
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3911ff30
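      A minimal sketch of what the exports look like, assuming the symbols
      are exported right next to their definitions (whether EXPORT_SYMBOL_GPL
      or plain EXPORT_SYMBOL was used is an assumption, not something this
      log states):
      
      	/* kernel/irq/chip.c, after the definition of handle_edge_irq() */
      	EXPORT_SYMBOL_GPL(handle_edge_irq);
      
      	/* kernel/irq/irqdesc.c, after the definition of irq_to_desc() */
      	EXPORT_SYMBOL_GPL(irq_to_desc);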
  2. 11 May 2012 (2 commits)
    • namespaces, pid_ns: fix leakage on fork() failure · 5e2bf014
      Mike Galbraith authored
      A fork() failure after namespace creation for a child cloned with
      CLONE_NEWPID leaks pid_namespace/mnt_cache, because proc is mounted
      while the namespace is created but not unmounted during cleanup.
      Call pid_ns_release_proc() during cleanup; see the sketch below.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Louis Rilling <louis.rilling@kerlabs.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5e2bf014
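      A sketch of the shape of the fix in copy_process()'s error unwinding;
      the label name follows kernel/fork.c conventions of that era, and the
      exact placement is an assumption:
      
      	/* kernel/fork.c, copy_process() error path (sketch) */
      	bad_fork_cleanup_namespaces:
      		/*
      		 * proc was mounted when the CLONE_NEWPID namespace was
      		 * created; unmount it here, or the pid_namespace and its
      		 * mnt_cache entry leak.
      		 */
      		if (unlikely(clone_flags & CLONE_NEWPID))
      			pid_ns_release_proc(task_active_pid_ns(p));
      		exit_task_namespaces(p);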
    • tracing: Do not enable function event with enable · 9b63776f
      Steven Rostedt authored
      Adding the function tracing event to perf caused a side effect that
      produces the following warning when enabling all events in ftrace:
      
       # echo 1 > /sys/kernel/debug/tracing/events/enable
      
      [console]
      event trace: Could not enable event function
      
      This is because when enabling all events via the debugfs system
      it ignores events that do not have a ->reg() function assigned.
      This was to skip over the ftrace internal events (as they are
      not TRACE_EVENTs). But as the ftrace function event now has
      a ->reg() function attached to it for use with perf, it is no
      longer ignored.
      
      Worse yet, this ->reg() function is being called when it should
      not be. It returns an error and causes the above warning to
      be printed.
      
      By adding a new event_call flag (TRACE_EVENT_FL_IGNORE_ENABLE)
      and setting it in all ftrace internal event structures, writing to
      events/enable no longer tries to incorrectly enable the function
      event and no longer warns; see the sketch below.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      9b63776f
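      A sketch of how such a flag would be consulted when walking all
      events; the exact loop in kernel/trace/trace_events.c is assumed here:
      
      	/* sketch: inside the walk over every registered event */
      	list_for_each_entry(call, &ftrace_events, list) {
      		/*
      		 * Internal ftrace events (not real TRACE_EVENTs) must
      		 * not be toggled via events/enable, even though the
      		 * function event now has a ->reg() method for perf.
      		 */
      		if (call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)
      			continue;
      		ftrace_event_enable_disable(call, set);
      	}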
  3. 10 May 2012 (4 commits)
    • compat: Fix RT signal mask corruption via sigprocmask · b7dafa0e
      Jan Kiszka authored
      compat_sys_sigprocmask reads a smaller signal mask from userspace than
      sigprocmask accepts for setting.  So the high word of blocked.sig[0]
      will be cleared, releasing any potentially blocked RT signal.
      
      This was discovered via userspace code that relies on get/setcontext.
      glibc's i386 versions of those functions use sigprocmask instead of
      rt_sigprocmask to save/restore the signal mask and caused RT signal
      unblocking this way.
      
      As suggested by Linus, this replaces the sys_sigprocmask based compat
      version with one that open-codes the required logic, including the merge
      of the existing blocked set with the new one provided on SIG_SETMASK;
      a sketch follows below.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b7dafa0e
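      A condensed sketch of the open-coded logic, modelled on the
      description above; treat helper choices and details as assumptions
      rather than a quote of the patch:
      
      	asmlinkage long compat_sys_sigprocmask(int how,
      					       compat_old_sigset_t __user *nset,
      					       compat_old_sigset_t __user *oset)
      	{
      		old_sigset_t old_set, new_set;
      		sigset_t new_blocked;
      
      		old_set = current->blocked.sig[0];
      
      		if (nset) {
      			if (get_user(new_set, nset))
      				return -EFAULT;
      			new_set &= ~(sigmask(SIGKILL) | sigmask(SIGSTOP));
      
      			new_blocked = current->blocked;
      
      			switch (how) {
      			case SIG_BLOCK:
      				sigaddsetmask(&new_blocked, new_set);
      				break;
      			case SIG_UNBLOCK:
      				sigdelsetmask(&new_blocked, new_set);
      				break;
      			case SIG_SETMASK:
      				/*
      				 * The compat mask is only 32 bits wide:
      				 * replace the low word only, keeping any
      				 * blocked RT signals in the high word.
      				 */
      				sigdelsetmask(&new_blocked,
      					      (compat_old_sigset_t)-1);
      				sigaddsetmask(&new_blocked, new_set);
      				break;
      			default:
      				return -EINVAL;
      			}
      
      			set_current_blocked(&new_blocked);
      		}
      
      		if (oset && put_user(old_set, oset))
      			return -EFAULT;
      
      		return 0;
      	}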
    • rcu: Make rcu_barrier() less disruptive · b1420f1c
      Paul E. McKenney authored
      The rcu_barrier() primitive interrupts each and every CPU, registering
      a callback on every CPU.  Once all of these callbacks have been invoked,
      rcu_barrier() knows that every callback that was registered before
      the call to rcu_barrier() has also been invoked.
      
      However, there is no point in registering a callback on a CPU that
      currently has no callbacks, most especially if that CPU is in a
      deep idle state.  This commit therefore makes rcu_barrier() avoid
      interrupting CPUs that have no callbacks (see the sketch below).
      Doing this requires reworking
      the handling of orphaned callbacks, otherwise callbacks could slip through
      rcu_barrier()'s net by being orphaned from a CPU that rcu_barrier() had
      not yet interrupted to a CPU that rcu_barrier() had already interrupted.
      This reworking was needed anyway to take a first step towards weaning
      RCU from the CPU_DYING notifier's use of stop_cpu().
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b1420f1c
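      A conceptual sketch of the selective interruption; the field and
      helper names here are assumptions, not quoted from the patch:
      
      	/* sketch: register the barrier callback only where work exists */
      	for_each_online_cpu(cpu) {
      		struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
      
      		if (!ACCESS_ONCE(rdp->qlen))
      			continue;	/* no callbacks: leave this CPU alone */
      		smp_call_function_single(cpu, rcu_barrier_func,
      					 (void *)type, 1);
      	}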
    • rcu: Explicitly initialize RCU_FAST_NO_HZ per-CPU variables · 98248a0e
      Paul E. McKenney authored
      The current initialization of the RCU_FAST_NO_HZ per-CPU variables makes
      needless and fragile assumptions about the initial value of things like
      the jiffies counter.  This commit therefore explicitly initializes all of
      them that are better started with a non-zero value.  It also adds some
      comments describing the per-CPU state variables.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      98248a0e
    • rcu: Make RCU_FAST_NO_HZ handle timer migration · 21e52e15
      Paul E. McKenney authored
      The current RCU_FAST_NO_HZ assumes that timers do not migrate unless a
      CPU goes offline, in which case it assumes that the CPU will have to come
      out of dyntick-idle mode (cancelling the timer) in order to go offline.
      This is important because when RCU_FAST_NO_HZ permits a CPU to enter
      dyntick-idle mode despite having RCU callbacks pending, it posts a timer
      on that CPU to force a wakeup on that CPU.  This wakeup ensures that the
      CPU will eventually handle the end of the grace period, including invoking
      its RCU callbacks.
      
      However, Pascal Chapperon's test setup shows that the timer handler
      rcu_idle_gp_timer_func() really does get invoked in some cases.  This is
      problematic because this can cause the CPU that entered dyntick-idle
      mode despite still having RCU callbacks pending to remain in
      dyntick-idle mode indefinitely, which means that its RCU callbacks might
      never be invoked.  This situation can result in grace-period delays or
      even system hangs, which matches Pascal's observations of slow boot-up
      and shutdown (https://lkml.org/lkml/2012/4/5/142).  See also the bugzilla:
      
      	https://bugzilla.redhat.com/show_bug.cgi?id=806548
      
      This commit therefore causes the "should never be invoked" timer handler
      rcu_idle_gp_timer_func() to use smp_call_function_single() to wake up
      the CPU for which the timer was intended, allowing that CPU to invoke
      its RCU callbacks in a timely manner.  A sketch of the idea follows.
      Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      21e52e15
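      A sketch of the idea; rcu_idle_demigrate() stands in for whatever the
      wakeup function is named and is an assumption here:
      
      	/* sketch: the timer migrated, so it may fire on the wrong CPU */
      	static void rcu_idle_gp_timer_func(unsigned long cpu_in)
      	{
      		int cpu = (int)cpu_in;
      
      		/* Kick the CPU this timer was posted for, so that it
      		 * notices grace-period ends and runs its callbacks. */
      		smp_call_function_single(cpu, rcu_idle_demigrate, NULL, 0);
      	}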
  4. 09 May 2012 (1 commit)
  5. 03 May 2012 (2 commits)
  6. 01 May 2012 (12 commits)
    • rcu: Ensure that RCU_FAST_NO_HZ timers expire on correct CPU · f511fc62
      Paul E. McKenney authored
      Timers are subject to migration, which can lead to the following
      system-hang scenario when CONFIG_RCU_FAST_NO_HZ=y:
      
      1.	CPU 0 executes synchronize_rcu(), which posts an RCU callback.
      
      2.	CPU 0 then goes idle.  It cannot immediately invoke the callback,
      	but there is nothing RCU needs from it, so it enters dyntick-idle
      	mode after posting a timer.
      
      3.	The timer gets migrated to CPU 1.
      
      4.	CPU 0 never wakes up, so the synchronize_rcu() never returns, so
      	the system hangs.
      
      This commit fixes this problem by using mod_timer_pinned(), as suggested
      by Peter Zijlstra, to ensure that the timer is actually posted on the
      running CPU; see the sketch below.
      Reported-by: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      f511fc62
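      A sketch of the fix; the per-CPU timer name is taken from the commits
      above, while the wakeup interval ("delay") is a stand-in:
      
      	/* sketch: post the wakeup timer pinned to the current CPU */
      	struct timer_list *tp = &per_cpu(rcu_idle_gp_timer,
      					 smp_processor_id());
      
      	/* mod_timer_pinned() keeps the timer on this CPU, so the
      	 * dyntick-idle CPU is the one that actually gets woken. */
      	mod_timer_pinned(tp, jiffies + delay);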
    • rcu: Add rcutorture test for call_srcu() · 9059c940
      Lai Jiangshan authored
      Add srcu_torture_deferred_free() for srcu_ops so as to test the new
      call_srcu().  Rename the original srcu_ops to srcu_sync_ops.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      9059c940
    • rcu: Implement per-domain single-threaded call_srcu() state machine · 931ea9d1
      Lai Jiangshan authored
      This commit implements an SRCU state machine in support of call_srcu().
      The state machine is preemptible, light-weight, and single-threaded,
      minimizing synchronization overhead.  In particular, there is no longer
      any need for synchronize_srcu() to be guarded by a mutex.
      
      Expedited processing is handled, at least in the absence of concurrent
      grace-period operations on that same srcu_struct structure, by having
      the synchronize_srcu_expedited() thread take on the role of the
      workqueue thread for one iteration.
      
      There is a reasonable probability that a given SRCU callback will
      be invoked on the same CPU that registered it; however, there is no
      guarantee.  Concurrent SRCU grace-period primitives can cause callbacks
      to be executed elsewhere, even in the absence of CPU-hotplug operations.
      
      Callbacks execute in process context, but under the influence of
      local_bh_disable(), so it is illegal to sleep in an SRCU callback
      function.  A usage sketch follows below.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      931ea9d1
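      A usage sketch, assuming a caller-defined struct my_obj with an
      embedded rcu_head and an srcu_struct named my_srcu (all hypothetical
      names):
      
      	struct my_obj {
      		struct rcu_head rcu;
      		/* ... payload ... */
      	};
      
      	static void my_srcu_cb(struct rcu_head *head)
      	{
      		struct my_obj *obj = container_of(head, struct my_obj, rcu);
      
      		/* Runs in process context under local_bh_disable():
      		 * must not sleep. */
      		kfree(obj);
      	}
      
      	/* queue obj for freeing once an SRCU grace period elapses */
      	call_srcu(&my_srcu, &obj->rcu, my_srcu_cb);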
    • rcu: Use single value to handle expedited SRCU grace periods · d9792edd
      Lai Jiangshan authored
      The earlier algorithm used an "expedited" flag combined with a "trycount"
      counter to differentiate between normal and expedited SRCU grace periods.
      However, the difference can be encoded into a single counter with a cutoff
      value and different initial values for expedited and normal SRCU grace
      periods.  This commit makes that change; a conceptual sketch follows
      below.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      
      Conflicts:
      
      	kernel/srcu.c
      d9792edd
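      A conceptual sketch of the single-counter scheme; the macro names,
      values, and helper names here are assumptions chosen to illustrate the
      cutoff idea:
      
      	#define SRCU_RETRY_CHECK_DELAY		5	/* microseconds */
      	#define SYNCHRONIZE_SRCU_TRYCOUNT	2	/* normal GP start */
      	#define SYNCHRONIZE_SRCU_EXP_TRYCOUNT	12	/* expedited GP start */
      
      	/* spin while above the cutoff, then fall back to sleeping;
      	 * expedited GPs simply start further from the cutoff */
      	static void wait_idx(struct srcu_struct *sp, int idx, int trycount)
      	{
      		while (!srcu_readers_active_idx_check(sp, idx)) {
      			if (--trycount > 0)
      				udelay(SRCU_RETRY_CHECK_DELAY);
      			else
      				schedule_timeout_interruptible(1);
      		}
      	}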
    • rcu: Improve srcu_readers_active_idx()'s cache locality · dc879175
      Lai Jiangshan authored
      Expand the calls to srcu_readers_active_idx() from srcu_readers_active()
      inline.  This change improves cache locality by iterating over the CPUs
      once rather than twice; see the sketch below.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      dc879175
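      A sketch of the combined scan; the structure and field names follow
      the SRCU per-CPU counter layout described in nearby commits and are
      assumptions here:
      
      	/* sketch: one pass over the CPUs, summing both counter sets */
      	static int srcu_readers_active(struct srcu_struct *sp)
      	{
      		int cpu;
      		unsigned long sum = 0;
      
      		for_each_possible_cpu(cpu) {
      			struct srcu_struct_array *cpuc =
      				per_cpu_ptr(sp->per_cpu_ref, cpu);
      
      			sum += ACCESS_ONCE(cpuc->c[0]);
      			sum += ACCESS_ONCE(cpuc->c[1]);
      		}
      		return sum;	/* nonzero means readers are present */
      	}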
    • rcu: Implement a variant of Peter's SRCU algorithm · b52ce066
      Lai Jiangshan authored
      This commit implements a variant of Peter's algorithm, which may be found
      at https://lkml.org/lkml/2012/2/1/119.
      
      o	Make the checking lock-free to enable parallel checking.
      	Parallel checking is required when (1) the original checking
      	task is preempted for a long time, (2) synchronize_srcu_expedited()
      	starts during an ongoing SRCU grace period, or (3) we wish to
      	avoid acquiring a lock.
      
      o	Since the checking is lock-free, we avoid a mutex in the state
      	machine for call_srcu().
      
      o	Remove the SRCU_REF_MASK and remove the coupling with the flipping.
      	This might allow us to remove the preempt_disable() in future
      	versions, though such removal will need great care because it
      	rescinds the one-old-reader-per-CPU guarantee.
      
      o	Remove a smp_mb(), simplify the comments and make the smp_mb() pairs
      	more intuitive.
      Inspired-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b52ce066
    • rcu: Improve SRCU's wait_idx() comments · 18108ebf
      Lai Jiangshan authored
      The safety of SRCU is provided by wait_idx() rather than flipping.
      The flipping actually prevents starvation.
      
      This commit therefore updates the comments to more accurately and
      precisely describe what is going on.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      18108ebf
    • rcu: Flip ->completed only once per SRCU grace period · 944ce9af
      Lai Jiangshan authored
      This is an optimization of the SRCU grace period.  To guard against
      preempted readers with old values of the counter, it suffices to scan the
      old counters once more, then flip ->completed only one time.  The reason
      this works is that the old readers must have incremented the old set of
      counters (if they have not yet incremented, then their critical section
      starts after this grace period, so they may be safely ignored).
      
      This commit therefore optimizes the second flip out in favor of a simple
      rescan.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      944ce9af
    • rcu: Increment upper bit only for srcu_read_lock() · 440253c1
      Lai Jiangshan authored
      The purpose of the upper bit of SRCU's per-CPU counters is to guarantee
      that no reasonable series of srcu_read_lock() and srcu_read_unlock()
      operations can return the value of the counter to its original value.
      This guarantee is required only after the index has been switched to
      the other set of counters, so at most one srcu_read_lock() can affect
      a given CPU's counter.  The number of srcu_read_unlock() operations
      on a given counter is limited to the number of tasks in the system,
      which given the Linux kernel's current structure is limited to far less
      than 2^30 on 32-bit systems and far less than 2^62 on 64-bit systems.
      (Something about a limited number of bytes in the kernel's address space.)
      
      Therefore, if srcu_read_lock() increments the upper bits, then
      srcu_read_unlock() need not do so.  In this case, an srcu_read_lock() and
      an srcu_read_unlock() will flip the lower bit of the upper field of the
      counter.  An unreasonably large additional number of srcu_read_unlock()
      operations would be required to return the counter to its initial value,
      thus preserving the guarantee.
      
      This commit takes this approach, which further allows it to shrink
      the size of the upper field to one bit, making the number of
      srcu_read_unlock() operations required to return the counter to its
      initial value even more unreasonable than before.  See the counter
      sketch below.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      440253c1
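      A sketch of the counter arithmetic; the mask names mirror those
      mentioned in this series, but the exact definitions are assumptions:
      
      	#define SRCU_USAGE_BITS		1
      	#define SRCU_REF_MASK		(ULONG_MAX >> SRCU_USAGE_BITS)
      	#define SRCU_USAGE_COUNT	(SRCU_REF_MASK + 1)
      
      	/* srcu_read_lock() side: bump the count AND flip the upper bit */
      	ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) +=
      		SRCU_USAGE_COUNT + 1;
      
      	/* srcu_read_unlock() side: only drop the count */
      	ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) -= 1;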
    • rcu: Remove fast check path from __synchronize_srcu() · 4b7a3e9e
      Lai Jiangshan authored
      The fastpath in __synchronize_srcu() is designed to handle cases where
      there are a large number of concurrent calls for the same srcu_struct
      structure.  However, the Linux kernel currently does not use SRCU in
      this manner, so remove the fastpath checks for simplicity.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      4b7a3e9e
    • rcu: Direct algorithmic SRCU implementation · cef50120
      Paul E. McKenney authored
      The current implementation of synchronize_srcu_expedited() can cause
      severe OS jitter due to its use of synchronize_sched(), which in turn
      invokes try_stop_cpus(), which causes each CPU to be sent an IPI.
      This can result in severe performance degradation for real-time workloads
      and especially for short-iteration-length HPC workloads.  Furthermore,
      because only one instance of try_stop_cpus() can be making forward progress
      at a given time, only one instance of synchronize_srcu_expedited() can
      make forward progress at a time, even if they are all operating on
      distinct srcu_struct structures.
      
      This commit, inspired by an earlier implementation by Peter Zijlstra
      (https://lkml.org/lkml/2012/1/31/211) and by further offline discussions,
      takes a strictly algorithmic bits-in-memory approach.  This has the
      disadvantage of requiring one explicit memory-barrier instruction in
      each of srcu_read_lock() and srcu_read_unlock(), but on the other hand
      completely dispenses with OS jitter and furthermore allows SRCU to be
      used freely by CPUs that RCU believes to be idle or offline.
      
      The update-side implementation handles the single read-side memory
      barrier by rechecking the per-CPU counters after summing them and
      by running through the update-side state machine twice.
      
      This implementation has passed moderate rcutorture testing on both
      x86 and Power.  Also updated to use this_cpu_ptr() instead of per_cpu_ptr(),
      as suggested by Peter Zijlstra.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      cef50120
    • rcu: Introduce rcutorture testing for rcu_barrier() · fae4b54f
      Paul E. McKenney authored
      Although rcutorture does invoke rcu_barrier() and friends, it cannot
      really be called a torture test given that it invokes them only once
      at the end of the test.  This commit therefore introduces heavy-duty
      rcutorture testing for rcu_barrier(), which may be carried out
      concurrently with normal rcutorture testing.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      fae4b54f
  7. 27 Apr 2012 (1 commit)
  8. 26 Apr 2012 (4 commits)
  9. 25 Apr 2012 (8 commits)
    • hung task debugging: Inject NMI when hung and going to panic · 625056b6
      Sasha Levin authored
      Send an NMI to all CPUs when a hung task is detected and the hung
      task code is configured to panic. This gives us a fairly up-to-date
      snapshot of all CPUs in the system.
      
      This lets us get stack traces of all CPUs, which makes it easier to
      debug a deadlock, and the NMI doesn't change anything since the next
      step is a kernel panic anyway. See the sketch below.
      Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1331848040-1676-1-git-send-email-levinsasha928@gmail.com
      [ extended the changelog a bit ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      625056b6
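      A sketch of the change in the hung-task check, assuming the panic knob
      is consulted as in kernel/hung_task.c; the panic message text is
      illustrative:
      
      	/* sketch: in check_hung_task(), once a hung task is found */
      	if (sysctl_hung_task_panic) {
      		/* NMI all CPUs first: the resulting stack traces give an
      		 * up-to-date snapshot, and we are about to panic anyway. */
      		trigger_all_cpu_backtrace();
      		panic("hung_task: blocked tasks");
      	}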
    • rcu: Fixes to rcutorture error handling and cleanup · 37e377d2
      Paul E. McKenney authored
      The rcutorture initialization code ignored the error returns from
      rcu_torture_onoff_init() and rcu_torture_stall_init().  The rcutorture
      cleanup code failed to NULL out a number of pointers.  These bugs will
      normally have no effect, but this commit fixes them nevertheless.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      37e377d2
    • rcu: Make RCU_FAST_NO_HZ account for pauses out of idle · c57afe80
      Paul E. McKenney authored
      Both Steven Rostedt's new idle-capable trace macros and the RCU_NONIDLE()
      macro can cause RCU to momentarily pause out of idle without the rest
      of the system being involved.  This can cause rcu_prepare_for_idle()
      to run through its state machine too quickly, which can in turn result
      in needless scheduling-clock interrupts.
      
      This commit therefore adds code to enable rcu_prepare_for_idle() to
      distinguish between an initial entry to idle on the one hand (which needs
      to advance the rcu_prepare_for_idle() state machine) and an idle reentry
      due to idle-capable trace macros and RCU_NONIDLE() on the other hand
      (which should avoid advancing the rcu_prepare_for_idle() state machine).
      Additional state is maintained to allow the timer to be correctly reposted
      when returning after a momentary pause out of idle, and even more state
      is maintained to detect when new non-lazy callbacks have been enqueued
      (which may require re-evaluation of the approach to idleness).
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      c57afe80
    • rcu: Make RCU_FAST_NO_HZ use timer rather than hrtimer · 2ee3dc80
      Paul E. McKenney authored
      The RCU_FAST_NO_HZ facility uses an hrtimer to wake up a CPU when
      it is allowed to go into dyntick-idle mode, which is almost always
      cancelled soon after.  This is not what hrtimers are good at, so
      this commit switches to the timer wheel.
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      2ee3dc80
    • rcu: Add RCU_FAST_NO_HZ tracing for idle exit · 2fdbb31b
      Paul E. McKenney authored
      Traces of rcu_prep_idle events can be confusing because
      rcu_cleanup_after_idle() does no tracing.  This commit therefore adds
      the missing tracing on idle exit.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      2fdbb31b
    • rcu: Document why rcu_blocking_is_gp() is safe · 6d813391
      Paul E. McKenney authored
      The rcu_blocking_is_gp() function tests to see if there is only one
      online CPU, and if so, synchronize_sched() and friends become no-ops.
      However, for larger systems, num_online_cpus() scans a large vector,
      and might be preempted while doing so.  While preempted, any number
      of CPUs might come online and go offline, potentially resulting in
      num_online_cpus() returning 1 when there was never a point at which
      only one CPU was online.  This could result in a too-short RCU grace
      period, which
      could in turn result in total failure, except that the only way that
      the grace period is too short is if there is an RCU read-side critical
      section spanning it.  For RCU-sched and RCU-bh (which are the only
      cases using rcu_blocking_is_gp()), RCU read-side critical sections
      have either preemption or bh disabled, which prevents CPUs from going
      offline.  This in turn prevents actual failures from occurring.
      
      This commit therefore adds a large block comment to rcu_blocking_is_gp()
      documenting why it is safe.  This commit also moves rcu_blocking_is_gp()
      into kernel/rcutree.c, which should help prevent unwary developers from
      mistaking it for a generally useful function.  A sketch follows below.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      6d813391
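      A sketch of the function with the gist of that comment (the real block
      comment is considerably longer):
      
      	/* kernel/rcutree.c (sketch) */
      	/*
      	 * num_online_cpus() may be preempted mid-scan, but RCU-sched and
      	 * RCU-bh readers run with preemption or bh disabled, which keeps
      	 * CPUs from going offline.  So if this returns 1, any read-side
      	 * critical section that could make the grace period too short
      	 * would also have prevented the offlining that made it return 1.
      	 */
      	static int rcu_blocking_is_gp(void)
      	{
      		might_sleep();  /* Check for RCU read-side critical section. */
      		return num_online_cpus() <= 1;
      	}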
    • rcu: Reduce cache-miss initialization latencies for large systems · 8932a63d
      Paul E. McKenney authored
      Commit #0209f649 (rcu: limit rcu_node leaf-level fanout) set an upper
      limit of 16 on the leaf-level fanout for the rcu_node tree.  This was
      needed to reduce lock contention that was induced by the synchronization
      of scheduling-clock interrupts, which was in turn needed to improve
      energy efficiency for moderate-sized lightly loaded servers.
      
      However, reducing the leaf-level fanout means that there are more
      leaf-level rcu_node structures in the tree, which in turn means that
      RCU's grace-period initialization incurs more cache misses.  This is
      not a problem on moderate-sized servers with only a few tens of CPUs,
      but becomes a major source of real-time latency spikes on systems with
      many hundreds of CPUs.  In addition, the workloads running on these large
      systems tend to be CPU-bound, which eliminates the energy-efficiency
      advantages of synchronizing scheduling-clock interrupts.  Therefore,
      these systems need maximal values for the rcu_node leaf-level fanout.
      
      This commit addresses this problem by introducing a new kernel parameter
      named RCU_FANOUT_LEAF that directly controls the leaf-level fanout.
      This parameter defaults to 16 to handle the common case of
      moderate-sized, lightly loaded servers, but may be set higher on
      larger systems.
      Reported-by: Mike Galbraith <efault@gmx.de>
      Reported-by: Dimitri Sivanich <sivanich@sgi.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      8932a63d
    • PM / Hibernate: fix the number of pages used for hibernate/thaw buffering · f8262d47
      Bojan Smojver authored
      Hibernation regression fix, since 3.2.
      
      Calculate the number of required free pages based on non-high memory
      pages only, because that is where the buffers will come from.
      
      Commit 081a9d04 introduced a new buffer
      page allocation logic during hibernation, in order to improve the
      performance. The number of pages allocated was calculated based on the
      total number of pages available, although only non-high memory pages are
      usable for this purpose. This caused the hibernation code to attempt to
      over-allocate pages on platforms that have high memory, which led to
      hangs. See the sketch below.
      Signed-off-by: Bojan Smojver <bojan@rexursive.com>
      Signed-off-by: Rafael J. Wysocki <rjw@suse.de>
      f8262d47
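      A sketch of the corrected budget; the helper name is an assumption
      modelled on the description above:
      
      	/* sketch: buffers come from lowmem, so budget from lowmem only */
      	static inline unsigned long low_free_pages(void)
      	{
      		return nr_free_pages() - nr_free_highpages();
      	}
      
      	/* later, when sizing the hibernate/thaw buffer pool: */
      	nr_to_alloc = low_free_pages() / 2;	/* not total free pages */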
  10. 24 Apr 2012 (1 commit)
    • irq: hide debug macros so they don't collide with others. · 9f3045ec
      Paul Gortmaker authored
      The file kernel/irq/debug.h temporarily defines P, PS, PD
      and then undefines them.  However these names aren't really
      "internal" enough, and collide with other more legit users
      such as the ones in the xtensa arch, causing:
      
      In file included from kernel/irq/internals.h:58:0,
                       from kernel/irq/irqdesc.c:18:
      kernel/irq/debug.h:8:0: warning: "PS" redefined [enabled by default]
      arch/xtensa/include/asm/regs.h:59:0: note: this is the location of the previous definition
      
      Add a handful of underscores to do a better job of hiding these
      temporary macros; see the sketch below.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      9f3045ec
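      A sketch of the renamed macros; the macro bodies shown are
      illustrative assumptions, the point is the ___-prefixed names and the
      matching #undefs:
      
      	/* kernel/irq/debug.h (sketch) */
      	#define ___P(f)  if (desc->status_use_accessors & f) printk("%14s set\n", #f)
      	#define ___PS(f) if (desc->istate & f) printk("%14s set\n", #f)
      	#define ___PD(f) do { } while (0)	/* body illustrative only */
      
      	/* ... macros used to dump the irq_desc state ... */
      
      	#undef ___P
      	#undef ___PS
      	#undef ___PD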
  11. 20 Apr 2012 (3 commits)
    • tracing: Fix stacktrace of latency tracers (irqsoff and friends) · db4c75cb
      Steven Rostedt authored
      While debugging a latency with someone on IRC (mirage335) on #linux-rt (OFTC),
      we discovered that the stacktrace output of the latency tracers
      (preemptirqsoff) was empty.
      
      This bug was caused by the creation of the dynamic length stack trace
      again (like commit 12b5da34 "tracing: Fix ent_size in trace output" was).
      
      This bug is caused by the latency tracers requiring the next event
      to determine the time between the current event and the next. But by
      grabbing the next event, the iter->ent_size is set to the next event
      instead of the current one. As the stacktrace event is the last event,
      this makes the ent_size zero and causes nothing to be printed for
      the stack trace. The dynamic stacktrace uses the ent_size to determine
      how much of the stack can be printed. The ent_size of zero means
      no stack.
      
      The simple fix is to save iter->ent_size before finding the next event
      and restore it afterward, as sketched below.
      
      Note, mirage335 asked to remain anonymous from LKML and git, so I will
      not add the Reported-by and Tested-by tags, even though he did report
      the issue and tested the fix.
      
      Cc: stable@vger.kernel.org # 3.1+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      db4c75cb
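      A sketch of the fix in the lookahead path; the helper names follow
      kernel/trace/trace.c conventions and the exact shape is assumed:
      
      	/* sketch: peek at the next event without clobbering the size
      	 * of the current one, which dynamic stack traces rely on */
      	struct trace_entry *trace_find_next_entry(struct trace_iterator *iter,
      						  int *ent_cpu, u64 *ent_ts)
      	{
      		int ent_size = iter->ent_size;	/* save current event's size */
      		struct trace_entry *entry;
      
      		entry = __find_next_entry(iter, ent_cpu, NULL, ent_ts);
      		iter->ent_size = ent_size;	/* restore before printing */
      
      		return entry;
      	}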
    • tick: Fix the spurious broadcast timer ticks after resume · a6371f80
      Suresh Siddha authored
      During resume, tick_resume_broadcast() programs the broadcast timer in
      oneshot mode unconditionally. On the platforms where broadcast timer
      is not really required, this will generate spurious broadcast timer
      ticks upon resume. For example, on the always running apic timer
      platforms with HPET, I see a spurious hpet tick once every ~5 minutes
      (which is the 32-bit hpet counter wraparound time).
      
      Similar to boot time, during resume make the oneshot mode setting of
      the broadcast clock event device conditional on the state of active
      broadcast users; see the sketch below.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: svenjoac@gmx.de
      Cc: torvalds@linux-foundation.org
      Cc: rjw@sisk.pl
      Link: http://lkml.kernel.org/r/1334802459.28674.209.camel@sbsiddha-desk.sc.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a6371f80
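      A conceptual sketch of the resume-path change; the exact structure of
      tick_resume_broadcast() is assumed here:
      
      	/* sketch: in tick_resume_broadcast() */
      	case TICKDEV_MODE_ONESHOT:
      		/* Program oneshot mode only if some CPU actually
      		 * depends on the broadcast device. */
      		if (!cpumask_empty(tick_get_broadcast_oneshot_mask()))
      			tick_resume_broadcast_oneshot(bc);
      		break;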
    • tick: Ensure that the broadcast device is initialized · b9a6a235
      Thomas Gleixner authored
      Santosh found another trap when we avoid initializing the broadcast
      device in the switch_to_oneshot code. The broadcast device might
      still be in SHUTDOWN state when we actually need to use it. That
      obviously breaks, as set_next_event() is called on a shutdown
      device. This did not break on x86, but Suresh analyzed it:
      
      From the review, most likely on Sven's system we are force enabling
      the hpet using the pci quirk's method very late. And in this case,
      hpet_clockevent (which will be global_clock_event) handler can be
      null, specifically as this platform might not be using deeper c-states
      and using the reliable APIC timer.
      
      Prior to commit 'fa4da365', that handler will be set to
      'tick_handle_oneshot_broadcast' when we switch the broadcast timer to
      oneshot mode, even though we don't use it. Post commit
      'fa4da365', we stopped switching the broadcast mode to oneshot
      as this is not really needed and his platform's global_clock_event's
      handler will remain null. While on my SNB laptop, same is set to
      'clockevents_handle_noop' because hpet gets enabled very early. (noop
      handler on my platform set when the early enabled hpet timer gets
      replaced by the lapic timer).
      
      But the commit 'fa4da365' tracked the broadcast timer mode in
      the SW as oneshot, even though it didn't touch the HW timer. During
      resume however, tick_resume_broadcast() saw the SW broadcast mode as
      oneshot and actually programmed the broadcast device also into oneshot
      mode. So this triggered the null pointer de-reference after the hpet
      wraps around and depending on what the hpet counter is set to. On the
      normal platforms where hpet gets enabled early we should be seeing a
      spurious interrupt (in my SNB laptop I see one spurious interrupt
      after around 5 minutes ;) which is 32-bit hpet counter wraparound
      time), but that's a separate issue.
      
      Enforce the mode setting when trying to set an event; see the sketch
      below.
      Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: torvalds@linux-foundation.org
      Cc: svenjoac@gmx.de
      Cc: rjw@sisk.pl
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1204181723350.2542@ionos
      b9a6a235
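      A sketch of the enforcement; the function name follows the broadcast
      code's conventions and the exact shape is assumed:
      
      	/* sketch: in tick_broadcast_set_event() */
      	static int tick_broadcast_set_event(ktime_t expires, int force)
      	{
      		struct clock_event_device *bc = tick_broadcast_device.evtdev;
      
      		/* Never program a device still left in SHUTDOWN mode. */
      		if (bc->mode != CLOCK_EVT_MODE_ONESHOT)
      			clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
      
      		return clockevents_program_event(bc, expires, force);
      	}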
  12. 18 Apr 2012 (1 commit)
    • tick: Fix oneshot broadcast setup really · b435092f
      Thomas Gleixner authored
      Sven Joachim reported that suspend/resume on rc3 trips over a NULL
      pointer dereference. Linus spotted the clockevent handler being NULL.
      
      commit fa4da365 ("clockevents: Track broadcast device mode change in
      tick_broadcast_switch_to_oneshot()") tried to fix a problem with the
      broadcast device setup, which was introduced in commit 77b0d60c
      ("clockevents: Leave the broadcast device in shutdown mode when not
      needed").
      
      The initial commit avoided setting up the broadcast device when no
      broadcast request bits were set, but that left the broadcast device
      dysfunctional. In consequence, deep idle states which need the
      broadcast device were not woken up.
      
      commit fa4da365 tried to fix that by initializing the state of the
      broadcast facility, but that missed the fact that nothing initializes
      the event handler and some other state of the underlying clock event
      device.
      
      The fix is to revert both commits and make only the mode setting of
      the clock event device conditional on the state of active broadcast
      users. 
      
      That initializes everything except the low level device mode, but this
      happens when the broadcast functionality is invoked by deep idle.
      Reported-and-tested-by: Sven Joachim <svenjoac@gmx.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1204181205540.2542@ionos
      b435092f