1. 24 Feb 2012, 1 commit
  2. 23 Feb 2012, 1 commit
    • D
      tracepoint, vfs, sched: Add exec() tracepoint · 4ff16c25
      David Smith authored
      Added a minimal exec tracepoint. Exec is an important event
      in the life of a task, like fork(), clone() or exit(), all of
      which we already trace.
      
      [ We also do scheduling re-balancing during exec() - so it's useful
        from a scheduler instrumentation POV as well. ]
      
      If you want to watch a task start up, when it gets exec'ed is a good place
      to start.  With the addition of this tracepoint, execs can be monitored
      and a better picture of general system activity can be obtained. This
      tracepoint will also enable better process life tracking, allowing you to
      answer questions like "what process keeps starting up binary X?".
      
      This tracepoint can also be useful in ftrace filtering and trigger
      conditions, e.g. starting or stopping filtering when exec is called.
      Signed-off-by: David Smith <dsmith@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/4F314D19.7030504@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4ff16c25
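      For reference, a minimal sketch of what such a tracepoint definition
      looks like in the TRACE_EVENT() style used by the existing sched events.
      The event and field layout below are illustrative and are not claimed to
      match the merged event exactly:

      TRACE_EVENT(sched_process_exec,

              TP_PROTO(struct task_struct *p, pid_t old_pid,
                       struct linux_binprm *bprm),

              TP_ARGS(p, old_pid, bprm),

              TP_STRUCT__entry(
                      __string(filename, bprm->filename)   /* path of the new image */
                      __field(pid_t, pid)                   /* pid after the exec    */
                      __field(pid_t, old_pid)               /* pid before the exec   */
              ),

              TP_fast_assign(
                      __assign_str(filename, bprm->filename);
                      __entry->pid     = p->pid;
                      __entry->old_pid = old_pid;
              ),

              TP_printk("filename=%s pid=%d old_pid=%d",
                        __get_str(filename), __entry->pid, __entry->old_pid)
      );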
  3. 22 Feb 2012, 2 commits
    • P
      sched/events: Revert trace_sched_stat_sleeptime() · 8c79a045
      Peter Zijlstra authored
      Commit 1ac9bc69 ("sched/tracing: Add a new tracepoint for sleeptime")
      added a new sched:sched_stat_sleeptime tracepoint.
      
      It's broken: the first sample we get on a task might be bad because
      of a stale sleep_start value that wasn't reset at the last task switch
      because the tracepoint was not active.
      
      It also breaks the existing schedstat samples due to the side
      effects of:
      
      -               se->statistics.sleep_start = 0;
      ...
      -               se->statistics.block_start = 0;
      
      Nor do I see a way to fix it without adding overhead to the scheduler
      fast path, which I'm not willing to do for the sake of redundant
      instrumentation.
      
      Most importantly, sleep time information can already be constructed
      by tracing context switches and wakeups, and taking the timestamp
      difference between the schedule-out, the wakeup and the schedule-in.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Vagin <avagin@openvz.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-pc4c9qhl8q6vg3bs4j6k0rbd@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8c79a045
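      The reconstruction described above can be done entirely in
      post-processing.  A hedged sketch, assuming per-task records of the three
      event timestamps (the structure and helper names are illustrative, not an
      existing API):

      struct task_timeline {
              unsigned long long switch_out_ts;   /* sched_switch: task left the CPU     */
              unsigned long long wakeup_ts;       /* sched_wakeup: task became runnable  */
              unsigned long long switch_in_ts;    /* sched_switch: task got the CPU back */
      };

      /* Sleep/block time: from the schedule-out until the wakeup. */
      static unsigned long long sleep_time(const struct task_timeline *t)
      {
              return t->wakeup_ts - t->switch_out_ts;
      }

      /* Run-queue wait time: from the wakeup until the schedule-in. */
      static unsigned long long wait_time(const struct task_timeline *t)
      {
              return t->switch_in_ts - t->wakeup_ts;
      }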
    • P
      rcu: Avoid waking up CPUs having only kfree_rcu() callbacks · 486e2593
      Paul E. McKenney authored
      When CONFIG_RCU_FAST_NO_HZ is enabled, RCU will allow a given CPU to
      enter dyntick-idle mode even if it still has RCU callbacks queued.
      RCU avoids system hangs in this case by scheduling a timer for several
      jiffies in the future.  However, if all of the callbacks on that CPU
      are from kfree_rcu(), there is no reason to wake the CPU up, as it is
      not a problem to defer freeing of memory.
      
      This commit therefore tracks the number of callbacks on a given CPU
      that are from kfree_rcu(), and avoids scheduling the timer if all of
      a given CPU's callbacks are from kfree_rcu().
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      486e2593
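      Conceptually, the per-CPU callback state gains a "lazy" count and the
      wakeup decision becomes a comparison.  A hedged sketch; the field and
      function names are illustrative, not the actual rcutree fields:

      struct cpu_rcu_callbacks {
              long qlen;        /* all callbacks queued on this CPU    */
              long qlen_lazy;   /* callbacks that only kfree() memory  */
      };

      /* Wake the CPU only if some callback does more than free memory. */
      static bool rcu_cpu_needs_wakeup(const struct cpu_rcu_callbacks *c)
      {
              return c->qlen != c->qlen_lazy;
      }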
  4. 14 Feb 2012, 1 commit
  5. 13 Feb 2012, 1 commit
  6. 06 Feb 2012, 1 commit
  7. 05 Feb 2012, 1 commit
  8. 04 Feb 2012, 1 commit
  9. 01 Feb 2012, 1 commit
  10. 17 Jan 2012, 2 commits
  11. 14 Jan 2012, 1 commit
  12. 13 Jan 2012, 1 commit
  13. 11 Jan 2012, 2 commits
    • K
      tracepoint: add tracepoints for debugging oom_score_adj · 43d2b113
      KAMEZAWA Hiroyuki authored
      oom_score_adj is used to guard processes from the OOM killer.  One
      problem is that it's inherited at fork().  When a daemon sets oom_score_adj
      and creates children, it's hard to know where the value was set.
      
      This patch adds three tracepoints useful for debugging:
        - creating a new task
        - renaming a task (exec)
        - setting oom_score_adj
      
      To debug, users need to enable some tracepoints. Filtering may be
      useful, as in:
      # EVENT=/sys/kernel/debug/tracing/events/task/
      # echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
      # echo "oom_score_adj != 0" > $EVENT/task_rename/filter
      # echo 1 > $EVENT/enable
      # EVENT=/sys/kernel/debug/tracing/events/oom/
      # echo 1 > $EVENT/enable
      
      The output will look like this:
      # grep oom /sys/kernel/debug/tracing/trace
      bash-7699  [007] d..3  5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
      bash-7699  [007] ...1  5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
      ls-7729  [003] ...2  5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
      bash-7699  [002] ...1  5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
      grep-7730  [007] ...2  5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      43d2b113
    • K
      mm-tracepoint: rename page-free events · b413d48a
      Konstantin Khlebnikov authored
      Rename mm_page_free_direct to mm_page_free and mm_pagevec_free to
      mm_page_free_batched.
      
      Since v2.6.33-5426-gc475dab6 the kernel triggers mm_page_free_direct for
      all freed pages, not only for those freed directly.  So, let's name it
      properly.  For pages freed via a page list we also trigger the
      mm_page_free_batched event.
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b413d48a
  14. 24 Dec 2011, 1 commit
  15. 19 Dec 2011, 1 commit
  16. 18 Dec 2011, 2 commits
    • W
      writeback: dirty ratelimit - think time compensation · 83712358
      Wu Fengguang authored
      Compensate the task's think time when computing the final pause time,
      so that ->dirty_ratelimit can be executed accurately.
      
              think time := time spent outside of balance_dirty_pages()
      
      In the rare case that the task slept longer than the 200ms period time
      (resulting in a negative pause time), the sleep time will be compensated
      for in the following periods, too, if it's less than 1 second.
      
      Accumulated errors are carefully avoided as long as the max pause area
      is not hit.
      
      Pseudo code:
      
              period = pages_dirtied / task_ratelimit;
              think = jiffies - dirty_paused_when;
              pause = period - think;
      
      1) normal case: period > think
      
              pause = period - think
              dirty_paused_when = jiffies + pause
              nr_dirtied = 0
      
                                   period time
                    |===============================>|
                        think time      pause time
                    |===============>|==============>|
              ------|----------------|---------------|------------------------
              dirty_paused_when   jiffies
      
      2) no pause case: period <= think
      
              don't pause; reduce future pause time by:
              dirty_paused_when += period
              nr_dirtied = 0
      
                                 period time
                    |===============================>|
                                        think time
                    |===================================================>|
              ------|--------------------------------+-------------------|----
              dirty_paused_when                                       jiffies
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      83712358
    • W
      writeback: show writeback reason with __print_symbolic · b3bba872
      Wu Fengguang authored
      This makes the binary trace understandable by trace-cmd.
      
      CC: Dave Chinner <david@fromorbit.com>
      CC: Curt Wohlgemuth <curtw@google.com>
      CC: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      b3bba872
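      The change boils down to printing the numeric reason through a symbolic
      mapping, which also exports the value-to-string table in the event format
      so userspace decoders can use it.  A hedged sketch of the TP_printk() arm
      only; the reason values and strings are illustrative, not the exact
      WB_REASON_* table:

              TP_printk("bdi %s: reason=%s",
                        __entry->name,
                        __print_symbolic(__entry->reason,
                                { 0, "background" },
                                { 1, "periodic"   },
                                { 2, "sync"       }))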
  17. 12 Dec 2011, 8 commits
    • P
      rcu: Augment rcu_batch_end tracing for idle and callback state · 4968c300
      Paul E. McKenney authored
      The current rcu_batch_end event trace records only the name of the RCU
      flavor and the total number of callbacks that remain queued on the
      current CPU.  This is insufficient for testing and tuning the new
      dyntick-idle RCU_FAST_NO_HZ code, so this commit adds idle state along
      with whether or not any of the callbacks that were ready to invoke
      at the beginning of rcu_do_batch() are still queued.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      4968c300
    • P
      rcu: Permit dyntick-idle with callbacks pending · 7cb92499
      Paul E. McKenney authored
      The current implementation of RCU_FAST_NO_HZ prevents CPUs from entering
      dyntick-idle state if they have RCU callbacks pending.  Unfortunately,
      this has the side-effect of often preventing them from entering this
      state, especially if at least one other CPU is not in dyntick-idle state.
      However, the resulting per-tick wakeup is wasteful in many cases: if the
      CPU has already fully responded to the current RCU grace period, there
      will be nothing for it to do until this grace period ends, which will
      frequently take several jiffies.
      
      This commit therefore permits a CPU that has done everything that the
      current grace period has asked of it (rcu_pending() == 0) to enter
      dyntick-idle mode even if it still has RCU callbacks pending.  However,
      such a CPU posts a timer to
      wake it up several jiffies later (6 jiffies, based on experience with
      grace-period lengths).  This wakeup is required to handle situations
      that can result in all CPUs being in dyntick-idle mode, thus failing
      to ever complete the current grace period.  If a CPU wakes up before
      the timer goes off, then it cancels that timer, thus avoiding spurious
      wakeups.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      7cb92499
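      The arm-on-idle / cancel-on-wakeup pattern can be sketched with the
      classic kernel timer API of that era; all names below are illustrative,
      not the actual rcutree code:

      #include <linux/timer.h>
      #include <linux/jiffies.h>

      static struct timer_list rcu_idle_gp_timer;

      static void rcu_idle_gp_timer_func(unsigned long unused)
      {
              /* The wakeup itself is the point: the tick resumes and RCU can
                 notice the grace period ending and invoke callbacks. */
      }

      static void rcu_idle_timer_init(void)
      {
              setup_timer(&rcu_idle_gp_timer, rcu_idle_gp_timer_func, 0);
      }

      /* Entering dyntick-idle with callbacks queued but nothing to do yet. */
      static void rcu_arm_idle_timer(void)
      {
              mod_timer(&rcu_idle_gp_timer, jiffies + 6);
      }

      /* The CPU woke up early for another reason: avoid a spurious wakeup. */
      static void rcu_cancel_idle_timer(void)
      {
              del_timer(&rcu_idle_gp_timer);
      }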
    • P
      rcu: Eliminate RCU_FAST_NO_HZ grace-period hang · f535a607
      Paul E. McKenney authored
      With the new implementation of RCU_FAST_NO_HZ, it was possible to hang
      RCU grace periods as follows:
      
      o	CPU 0 attempts to go idle, cycles several times through the
      	rcu_prepare_for_idle() loop, then goes dyntick-idle when
      	RCU needs nothing more from it, while still having at least
      	one RCU callback pending.
      
      o	CPU 1 goes idle with no callbacks.
      
      Both CPUs can then stay in dyntick-idle mode indefinitely, preventing
      the RCU grace period from ever completing, possibly hanging the system.
      
      This commit therefore prevents CPUs that have RCU callbacks from entering
      dyntick-idle mode.  This approach also eliminates the need for the
      end-of-grace-period IPIs used previously.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      f535a607
    • P
      rcu: Add tracing for RCU_FAST_NO_HZ · 433cdddc
      Paul E. McKenney authored
      This commit adds trace_rcu_prep_idle(), which is invoked from
      rcu_prepare_for_idle() and rcu_wake_cpu() to trace attempts on
      the part of RCU to force CPUs into dyntick-idle mode.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      433cdddc
    • P
      rcu: Update trace_rcu_dyntick() header comment · 045fb931
      Paul E. McKenney authored
      This commit updates the trace_rcu_dyntick() header comment to reflect
      events added by commit 4b4f421.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      045fb931
    • P
      rcu: Deconfuse dynticks entry-exit tracing · 4145fa7f
      Paul E. McKenney authored
      The trace_rcu_dyntick() trace event did not print both the old and
      the new value of the nesting level, and furthermore printed only
      the low-order 32 bits of it.  This could result in some confusion
      when interpreting trace-event dumps, so this commit prints both
      the old and the new value, prints the full 64 bits, and also selects
      the process-entry/exit increment to print nicely in hexadecimal.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      4145fa7f
    • P
      rcu: Add failure tracing to rcutorture · 91afaf30
      Paul E. McKenney authored
      Trace the rcutorture RCU accesses and dump the trace buffer when the
      first failure is detected.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      91afaf30
    • P
      rcu: Track idleness independent of idle tasks · 9b2e4f18
      Paul E. McKenney authored
      Earlier versions of RCU used the scheduling-clock tick to detect idleness
      by checking for the idle task, but handled idleness differently for
      CONFIG_NO_HZ=y.  But there are now a number of uses of RCU read-side
      critical sections in the idle task, for example, for tracing.  A more
      fine-grained detection of idleness is therefore required.
      
      This commit presses the old dyntick-idle code into full-time service,
      so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
      always invoked at the beginning of an idle loop iteration.  Similarly,
      rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
      at the end of an idle-loop iteration.  This allows the idle task to
      use RCU everywhere except between consecutive rcu_idle_enter() and
      rcu_idle_exit() calls, in turn allowing architecture maintainers to
      specify exactly where in the idle loop that RCU may be used.
      
      Because some of the userspace upcall uses can result in what looks
      to RCU like half of an interrupt, it is not possible to expect that
      the irq_enter() and irq_exit() hooks will give exact counts.  This
      patch therefore expands the ->dynticks_nesting counter to 64 bits
      and uses two separate bitfields to count process/idle transitions
      and interrupt entry/exit transitions.  It is presumed that userspace
      upcalls do not happen in the idle loop or from usermode execution
      (though usermode might do a system call that results in an upcall).
      The counter is hard-reset on each process/idle transition, which
      avoids the interrupt entry/exit error from accumulating.  Overflow
      is avoided by the 64-bitness of the ->dynticks_nesting counter.
      
      This commit also adds warnings if a non-idle task asks RCU to enter
      idle state (and these checks will need some adjustment before applying
      Frederic's OS-jitter patches: http://lkml.org/lkml/2011/10/7/246).
      In addition, validation of ->dynticks and ->dynticks_nesting is added.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      9b2e4f18
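      The resulting shape of an architecture's idle loop, as a hedged sketch
      (only rcu_idle_enter()/rcu_idle_exit() are the interfaces named above;
      the rest is illustrative):

      static void cpu_idle_loop(void)
      {
              while (1) {
                      rcu_idle_enter();     /* no RCU read-side use beyond this point */
                      while (!need_resched())
                              cpu_relax();  /* stand-in for the arch's low-power wait */
                      rcu_idle_exit();      /* RCU read-side critical sections OK again */
                      schedule();
              }
      }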
  18. 06 Dec 2011, 1 commit
  19. 01 Dec 2011, 1 commit
  20. 01 Nov 2011, 3 commits
    • M
      mm: change isolate mode from #define to bitwise type · 4356f21d
      Minchan Kim authored
      Replace the ISOLATE_XXX macros with a bitwise isolate_mode_t type.
      Normally, macros aren't recommended, as they are type-unsafe and make
      debugging harder since the symbol cannot be passed through to the
      debugger.
      
      Quote from Johannes
      " Hmm, it would probably be cleaner to fully convert the isolation mode
      into independent flags.  INACTIVE, ACTIVE, BOTH is currently a
      tri-state among flags, which is a bit ugly."
      
      This patch moves the isolate mode definitions from swap.h to mmzone.h so
      they can be used by memcontrol.h.
      Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4356f21d
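      The pattern, as a hedged sketch (the exact kernel definitions and flag
      values may differ):

      /* A sparse-checked bitwise type replaces the old tri-state macros. */
      typedef unsigned int __bitwise isolate_mode_t;

      #define ISOLATE_INACTIVE ((__force isolate_mode_t)0x1)  /* isolate inactive pages */
      #define ISOLATE_ACTIVE   ((__force isolate_mode_t)0x2)  /* isolate active pages   */

      /*
       * Callers now combine independent flags instead of selecting a third
       * "BOTH" state, e.g. (hypothetical helper):
       *
       *      isolate_pages(zone, ISOLATE_INACTIVE | ISOLATE_ACTIVE);
       */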
    • P
      Revert "tracing: Include module.h in define_trace.h" · 67b84999
      Paul Gortmaker authored
      This reverts commit 3a9f987b.
      
      With all the files that are real modules now having module.h
      explicitly called out for inclusion, and no reliance on any
      implicit presence of module.h assumed, we should no longer
      need this workaround.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      67b84999
    • P
      include: replace linux/module.h with "struct module" wherever possible · de477254
      Paul Gortmaker authored
      The <linux/module.h> header pretty much brings in the kitchen sink along
      with it, so it should be avoided wherever reasonably possible in
      terms of being included from other commonly used <linux/something.h>
      files, as it results in a measurable increase in compile times.
      
      The worst culprit was probably device.h since it is used everywhere.
      This file also had an implicit dependency/usage of mutex.h which was
      masked by module.h, and is also fixed here at the same time.
      
      There are over a dozen other headers that simply declare the
      struct instead of pulling in the whole file, so follow their lead
      and do the same for a few more.
      
      Most of the implicit dependencies on module.h being pulled in by
      these headers have now been weeded out, so we can finally make this
      change with hopefully minimal breakage.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      de477254
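      The header-slimming pattern itself is simple; a hedged sketch with a
      hypothetical header and function name:

      /* some_subsys.h -- only passes struct module around by pointer */
      #ifndef _SOME_SUBSYS_H
      #define _SOME_SUBSYS_H

      struct module;   /* forward declaration is enough for pointer parameters */

      int some_subsys_register(struct module *owner);   /* hypothetical API */

      #endif /* _SOME_SUBSYS_H */

      /*
       * Only the .c files that actually dereference struct module (or use the
       * module macros) include the full <linux/module.h> alongside this header.
       */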
  21. 31 Oct 2011, 4 commits
  22. 27 Oct 2011, 1 commit
    • E
      ext4: optimize ext4_ext_convert_to_initialized() · 6f91bc5f
      Eric Gouriou authored
      This patch introduces a fast path in ext4_ext_convert_to_initialized()
      for the case when the conversion can be performed by transferring
      the newly initialized blocks from the uninitialized extent into
      an adjacent initialized extent. Doing so removes the expensive
      invocations of memmove() which occur during extent insertion and
      the subsequent merge.
      
      In practice this should be the common case for clients performing
      append writes into files pre-allocated via
      fallocate(FALLOC_FL_KEEP_SIZE). In such a workload performed via
      direct IO and when using a suboptimal implementation of memmove()
      (x86_64 prior to the 2.6.39 rewrite), this patch reduces kernel CPU
      consumption by 32%.
      
      Two new trace points are added to ext4_ext_convert_to_initialized()
      to offer visibility into its operations. No exit trace point has
      been added due to the multiplicity of return points. This can be
      revisited once the upstream cleanup is backported.
      Signed-off-by: Eric Gouriou <egouriou@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      6f91bc5f
  23. 25 Oct 2011, 1 commit
    • A
      net/9p: Convert net/9p protocol dumps to tracepoints · 348b5901
      Aneesh Kumar K.V authored
      This gives more control over debugging.
      root@qemu-img-64:~# ls /pass/123
      ls: cannot access /pass/123: No such file or directory
      root@qemu-img-64:~# cat /sys/kernel/debug/tracing/trace
      # tracer: nop
      #
      #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
      #              | |       |          |         |
                    ls-1536  [001]    70.928584: 9p_protocol_dump: clnt 18446612132784021504 P9_TWALK(tag = 1)
      000: 16 00 00 00 6e 01 00 01 00 00 00 02 00 00 00 01
      010: 00 03 00 31 32 33 00 00 00 ff ff ff ff 00 00 00
      
                    ls-1536  [001]    70.928587: <stack trace>
       => trace_9p_protocol_dump
       => p9pdu_finalize
       => p9_client_rpc
       => p9_client_walk
       => v9fs_vfs_lookup
       => d_alloc_and_lookup
       => walk_component
       => path_lookupat
                    ls-1536  [000]    70.929696: 9p_protocol_dump: clnt 18446612132784021504 P9_RLERROR(tag = 1)
      000: 0b 00 00 00 07 01 00 02 00 00 00 4e 03 00 02 00
      010: 00 00 00 00 03 00 02 00 00 00 00 00 ff 43 00 00
      
                    ls-1536  [000]    70.929697: <stack trace>
       => trace_9p_protocol_dump
       => p9_client_rpc
       => p9_client_walk
       => v9fs_vfs_lookup
       => d_alloc_and_lookup
       => walk_component
       => path_lookupat
       => do_path_lookup
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
      348b5901
  24. 04 Oct 2011, 1 commit