1. 19 5月, 2011 9 次提交
    • S
      ftrace: Free hash with call_rcu_sched() · 07fd5515
      Steven Rostedt 提交于
      When a hash is modified and might be in use, we need to perform
      a schedule RCU operation on it, as the hashes will soon be used
      directly in the function tracer callback.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      07fd5515
    • S
      ftrace: Have global_ops store the functions that are to be traced · 2b499381
      Steven Rostedt 提交于
      This is a step towards each ops structure defining its own set
      of functions to trace. As the current code with pid's and such
      are specific to the global_ops, it is restructured to be used
      with the global ops.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2b499381
    • S
      ftrace: Add ops parameter to ftrace_startup/shutdown functions · bd69c30b
      Steven Rostedt 提交于
      In order to allow different ops to enable different functions,
      the ftrace_startup() and ftrace_shutdown() functions need the
      ops parameter passed to them.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      bd69c30b
    • S
      ftrace: Add enabled_functions file · 647bcd03
      Steven Rostedt 提交于
      Add the enabled_functions file that is used to show all the
      functions that have been enabled for tracing as well as their
      ref counts. This helps seeing if any function has been registered
      and what functions are being traced.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      647bcd03
    • S
      ftrace: Use counters to enable functions to trace · ed926f9b
      Steven Rostedt 提交于
      Every function has its own record that stores the instruction
      pointer and flags for the function to be traced. There are only
      two flags: enabled and free. The enabled flag states that tracing
      for the function has been enabled (actively traced), and the free
      flag states that the record no longer points to a function and can
      be used by new functions (loaded modules).
      
      These flags are now moved to the MSB of the flags (actually just
      the top 32bits). The rest of the bits (30 bits) are now used as
      a ref counter. Everytime a tracer register functions to trace,
      those functions will have its counter incremented.
      
      When tracing is enabled, to determine if a function should be traced,
      the counter is examined, and if it is non-zero it is set to trace.
      
      When a ftrace_ops is registered to trace functions, its hashes
      are examined. If the ftrace_ops filter_hash count is zero, then
      all functions are set to be traced, otherwise only the functions
      in the hash are to be traced. The exception to this is if a function
      is also in the ftrace_ops notrace_hash. Then that function's counter
      is not incremented for this ftrace_ops.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ed926f9b
    • S
      ftrace: Separate hash allocation and assignment · 33dc9b12
      Steven Rostedt 提交于
      When filtering, allocate a hash to insert the function records.
      After the filtering is complete, assign it to the ftrace_ops structure.
      
      This allows the ftrace_ops structure to have a much smaller array of
      hash buckets instead of wasting a lot of memory.
      
      A read only empty_hash is created to be the minimum size that any ftrace_ops
      can point to.
      
      When a new hash is created, it has the following steps:
      
      o Allocate a default hash.
      o Walk the function records assigning the filtered records to the hash
      o Allocate a new hash with the appropriate size buckets
      o Move the entries from the default hash to the new hash.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      33dc9b12
    • S
      ftrace: Create a global_ops to hold the filter and notrace hashes · f45948e8
      Steven Rostedt 提交于
      Combine the filter and notrace hashes to be accessed by a single entity,
      the global_ops. The global_ops is a ftrace_ops structure that is passed
      to different functions that can read or modify the filtering of the
      function tracer.
      
      The ftrace_ops structure was modified to hold a filter and notrace
      hashes so that later patches may allow each ftrace_ops to have its own
      set of rules to what functions may be filtered.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f45948e8
    • S
      ftrace: Use hash instead for FTRACE_FL_FILTER · 1cf41dd7
      Steven Rostedt 提交于
      When multiple users are allowed to have their own set of functions
      to trace, having the FTRACE_FL_FILTER flag will not be enough to
      handle the accounting of those users. Each user will need their own
      set of functions.
      
      Replace the FTRACE_FL_FILTER with a filter_hash instead. This is
      temporary until the rest of the function filtering accounting
      gets in.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      1cf41dd7
    • S
      ftrace: Replace FTRACE_FL_NOTRACE flag with a hash of ignored functions · b448c4e3
      Steven Rostedt 提交于
      To prepare for the accounting system that will allow multiple users of
      the function tracer, having the FTRACE_FL_NOTRACE as a flag in the
      dyn_trace record does not make sense.
      
      All ftrace_ops will soon have a hash of functions they should trace
      and not trace. By making a global hash of functions not to trace makes
      this easier for the transition.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      b448c4e3
  2. 07 5月, 2011 1 次提交
  3. 04 5月, 2011 1 次提交
  4. 03 5月, 2011 3 次提交
  5. 30 4月, 2011 10 次提交
  6. 29 4月, 2011 2 次提交
    • T
      hrtimer: Initialize CLOCK_ID to HRTIMER_BASE table statically · ce31332d
      Thomas Gleixner 提交于
      Sedat and Bruno reported RCU stalls which turned out to be caused by
      the following;
      
      sched_init() calls init_rt_bandwidth() which calls hrtimer_init()
      _BEFORE_ hrtimers_init() is called. While not entirely correct this
      worked because hrtimer_init() only accessed statically initialized
      data (hrtimer_bases.clock_base[CLOCK_MONOTONIC])
      
      Commit e06383db (hrtimers: extend hrtimer base code to handle more
      then 2 clockids) added an indirection to the hrtimer_bases.clock_base
      lookup to avoid gap handling in the hot path. The table which is used
      for the translataion from CLOCK_ID to HRTIMER_BASE index is
      initialized at runtime in hrtimers_init(). So the early call of the
      scheduler code translates CLOCK_MONOTONIC to HRTIMER_BASE_REALTIME.
      
      Thus the rt_bandwith timer ends up on CLOCK_REALTIME. If the timer is
      armed and the wall clock time is set (e.g. ntpdate in the early boot
      process - which also gives the problem deterministic behaviour
      i.e. magic recovery after N hours), then the timer ends up with an
      expiry time far into the future. That breaks the RT throttler
      mechanism as rt runtime is accumulated and never cleared, so the rt
      throttler detects a false cpu hog condition and blocks all RT tasks
      until the timer finally expires. That in turn stalls the RCU thread of
      TINYRCU which leads to an huge amount of RCU callbacks piling up.
      
      Make the translation table statically initialized, so we are back to
      the status of <= 2.6.39.
      Reported-and-tested-by: NSedat Dilek <sedat.dilek@gmail.com>
      Reported-by: NBruno Prémont <bonbons@linux-vserver.org>
      Cc: John stultz <johnstul@us.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104282353140.3005%40ionos%3EReviewed-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      ce31332d
    • H
      kernel/watchdog.c: disable nmi perf event in the error path of enabling watchdog · 1409f141
      Hillf Danton 提交于
      In corner cases where softlockup watchdog is not setup successfully, the
      relevant nmi perf event for hardlockup watchdog could be disabled, then
      the status of the underlying hardware remains unchanged.
      
      Also, if the kthread doesn't start then the hrtimer won't run and the
      hardlockup detector will falsely fire.
      Signed-off-by: NHillf Danton <dhillf@gmail.com>
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1409f141
  7. 25 4月, 2011 1 次提交
    • F
      ptrace: Prepare to fix racy accesses on task breakpoints · bf26c018
      Frederic Weisbecker 提交于
      When a task is traced and is in a stopped state, the tracer
      may execute a ptrace request to examine the tracee state and
      get its task struct. Right after, the tracee can be killed
      and thus its breakpoints released.
      This can happen concurrently when the tracer is in the middle
      of reading or modifying these breakpoints, leading to dereferencing
      a freed pointer.
      
      Hence, to prepare the fix, create a generic breakpoint reference
      holding API. When a reference on the breakpoints of a task is
      held, the breakpoints won't be released until the last reference
      is dropped. After that, no more ptrace request on the task's
      breakpoints can be serviced for the tracer.
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: v2.6.33.. <stable@kernel.org>
      Link: http://lkml.kernel.org/r/1302284067-7860-2-git-send-email-fweisbec@gmail.com
      bf26c018
  8. 21 4月, 2011 1 次提交
  9. 20 4月, 2011 1 次提交
  10. 19 4月, 2011 2 次提交
    • R
      PM: Fix error code paths executed after failing syscore_suspend() · 2ca6f62f
      Rafael J. Wysocki 提交于
      If syscore_suspend() fails in suspend_enter(), create_image() or
      resume_target_kernel(), it is necessary to call sysdev_resume(),
      because sysdev_suspend() has been called already and succeeded
      and we are going to abort the transition.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
      2ca6f62f
    • L
      next_pidmap: fix overflow condition · c78193e9
      Linus Torvalds 提交于
      next_pidmap() just quietly accepted whatever 'last' pid that was passed
      in, which is not all that safe when one of the users is /proc.
      
      Admittedly the proc code should do some sanity checking on the range
      (and that will be the next commit), but that doesn't mean that the
      helper functions should just do that pidmap pointer arithmetic without
      checking the range of its arguments.
      
      So clamp 'last' to PID_MAX_LIMIT.  The fact that we then do "last+1"
      doesn't really matter, the for-loop does check against the end of the
      pidmap array properly (it's only the actual pointer arithmetic overflow
      case we need to worry about, and going one bit beyond isn't going to
      overflow).
      
      [ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ]
      Reported-by: NTavis Ormandy <taviso@cmpxchg8b.com>
      Analyzed-by: NRobert Święcki <robert@swiecki.net>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c78193e9
  11. 18 4月, 2011 1 次提交
  12. 16 4月, 2011 2 次提交
    • J
      block: make unplug timer trace event correspond to the schedule() unplug · 49cac01e
      Jens Axboe 提交于
      It's a pretty close match to what we had before - the timer triggering
      would mean that nobody unplugged the plug in due time, in the new
      scheme this matches very closely what the schedule() unplug now is.
      It's essentially the difference between an explicit unplug (IO unplug)
      or an implicit unplug (timer unplug, we scheduled with pending IO
      queued).
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      49cac01e
    • J
      block: let io_schedule() flush the plug inline · a237c1c5
      Jens Axboe 提交于
      Linus correctly observes that the most important dispatch cases
      are now done from kblockd, this isn't ideal for latency reasons.
      The original reason for switching dispatches out-of-line was to
      avoid too deep a stack, so by _only_ letting the "accidental"
      flush directly in schedule() be guarded by offload to kblockd,
      we should be able to get the best of both worlds.
      
      So add a blk_schedule_flush_plug() that offloads to kblockd,
      and only use that from the schedule() path.
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      a237c1c5
  13. 15 4月, 2011 1 次提交
  14. 13 4月, 2011 1 次提交
  15. 12 4月, 2011 4 次提交