1. 07 4月, 2009 1 次提交
    • L
      tracing: fix incorrect return type of ns2usecs() · cf8e3474
      Lai Jiangshan 提交于
      Impact: fix time output bug in 32bits system
      
      ns2usecs() returns 'long', it's incorrect.
      
      (In i386)
      ...
                <idle>-0     [000]   521.442100: _spin_lock <-tick_do_update_jiffies64
                <idle>-0     [000]   521.442101: do_timer <-tick_do_update_jiffies64
                <idle>-0     [000]   521.442102: update_wall_time <-do_timer
                <idle>-0     [000]   521.442102: update_xtime_cache <-update_wall_time
      ....
      (It always print the time less than 2200 seconds besides ...)
      Because 'long' is 32bits in i386. ( (1<<31) useconds is about 2200 seconds)
      
      ...
                <idle>-0     [001] 4154502640.134759: rcu_bh_qsctr_inc <-__do_softirq
                <idle>-0     [001] 4154502640.134760: _local_bh_enable <-__do_softirq
                <idle>-0     [001] 4154502640.134761: idle_cpu <-irq_exit
      ...
      (very large value)
      Because 'long' is a signed type and it is 32bits in i386.
      
      Changes in v2:
      return 'unsigned long long' instead of 'cycle_t'
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <49D05D10.4030009@cn.fujitsu.com>
      Reported-by: NLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cf8e3474
  2. 24 3月, 2009 2 次提交
    • S
      function-graph: add option for include sleep times · be6f164a
      Steven Rostedt 提交于
      Impact: give user a choice to show times spent while sleeping
      
      The user may want to see the time a function spent sleeping.
      This patch adds the trace option "sleep-time" to allow that.
      The "sleep-time" option is default on.
      
       echo sleep-time > /debug/tracing/trace_options
      
      produces:
      
       ------------------------------------------
       2)  avahi-d-3428  =>    <idle>-0
       ------------------------------------------
      
       2)               |      finish_task_switch() {
       2)   0.621 us    |        _spin_unlock_irq();
       2)   2.202 us    |      }
       2) ! 1002.197 us |    }
       2) ! 1003.521 us |  }
      
      where as,
      
       echo nosleep-time > /debug/tracing/trace_options
      
      produces:
      
       0)    <idle>-0    =>  yum-upd-3416
       ------------------------------------------
      
       0)               |              finish_task_switch() {
       0)   0.643 us    |                _spin_unlock_irq();
       0)   2.342 us    |              }
       0) + 41.302 us   |            }
       0) + 42.453 us   |          }
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      be6f164a
    • T
      tracing/filters: use trace_seq_printf() to print filters · 4bda2d51
      Tom Zanussi 提交于
      Impact: cleanup
      
      Instead of just using the trace_seq buffer to print the filters, use
      trace_seq_printf() as it was intended to be used.
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NTom Zanussi <tzanussi@gmail.com>
      Cc: =?ISO-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1237878871.8339.59.camel@charm-linux>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4bda2d51
  3. 23 3月, 2009 4 次提交
    • F
      tracing/events: don't use wake up for events · 07edf712
      Frederic Weisbecker 提交于
      Impact: fix hard-lockup with sched switch events
      
      Some ftrace events, such as sched wakeup, can be traced
      while the runqueue lock is hold. Since they are using
      trace_current_buffer_unlock_commit(), they call wake_up()
      which can try to grab the runqueue lock too, resulting in
      a deadlock.
      
      Now for all event, we call a new helper:
      trace_nowake_buffer_unlock_commit() which do pretty the same than
      trace_current_buffer_unlock_commit() except than it doesn't call
      trace_wake_up().
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1237759847-21025-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      07edf712
    • T
      tracing: add per-subsystem filtering · cfb180f3
      Tom Zanussi 提交于
      This patch adds per-subsystem filtering to the event tracing subsystem.
      
      It adds a 'filter' debugfs file to each subsystem directory.  This file
      can be written to to set filters; reading from it will display the
      current set of filters set for that subsystem.
      
      Basically what it does is propagate the filter down to each event
      contained in the subsystem.  If a particular event doesn't have a field
      with the name specified in the filter, it simply doesn't get set for
      that event.  You can verify whether or not the filter was set for a
      particular event by looking at the filter file for that event.
      
      As with per-event filters, compound expressions are supported, echoing
      '0' to the subsystem's filter file clears all filters in the subsystem,
      etc.
      Signed-off-by: NTom Zanussi <tzanussi@gmail.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1237710677.7703.49.camel@charm-linux>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cfb180f3
    • T
      tracing: add per-event filtering · 7ce7e424
      Tom Zanussi 提交于
      This patch adds per-event filtering to the event tracing subsystem.
      
      It adds a 'filter' debugfs file to each event directory.  This file can
      be written to to set filters; reading from it will display the current
      set of filters set for that event.
      
      Basically, any field listed in the 'format' file for an event can be
      filtered on (including strings, but not yet other array types) using
      either matching ('==') or non-matching ('!=') 'predicates'.  A
      'predicate' can be either a single expression:
      
       # echo pid != 0 > filter
      
       # cat filter
       pid != 0
      
      or a compound expression of up to 8 sub-expressions combined using '&&'
      or '||':
      
       # echo comm == Xorg > filter
       # echo "&& sig != 29" > filter
      
       # cat filter
       comm == Xorg
       && sig != 29
      
      Only events having field values matching an expression will be available
      in the trace output; non-matching events are discarded.
      
      Note that a compound expression is built up by echoing each
      sub-expression separately - it's not the most efficient way to do
      things, but it keeps the parser simple and assumes that compound
      expressions will be relatively uncommon.  In any case, a subsequent
      patch introducing a way to set filters for entire subsystems should
      mitigate any need to do this for lots of events.
      
      Setting a filter without an '&&' or '||' clears the previous filter
      completely and sets the filter to the new expression:
      
       # cat filter
       comm == Xorg
       && sig != 29
      
       # echo comm != Xorg
      
       # cat filter
       comm != Xorg
      
      To clear a filter, echo 0 to the filter file:
      
       # echo 0 > filter
       # cat filter
       none
      
      The limit of 8 predicates for a compound expression is arbitrary - for
      efficiency, it's implemented as an array of pointers to predicates, and
      8 seemed more than enough for any filter...
      Signed-off-by: NTom Zanussi <tzanussi@gmail.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1237710665.7703.48.camel@charm-linux>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7ce7e424
    • T
      tracing: add run-time field descriptions for event filtering · cf027f64
      Tom Zanussi 提交于
      This patch makes the field descriptions defined for event tracing
      available at run-time, for the event-filtering mechanism introduced
      in a subsequent patch.
      
      The common event fields are prepended with 'common_' in the format
      display, allowing them to be distinguished from the other fields
      that might internally have same name and can therefore be
      unambiguously used in filters.
      Signed-off-by: NTom Zanussi <tzanussi@gmail.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1237710639.7703.46.camel@charm-linux>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cf027f64
  4. 20 3月, 2009 2 次提交
  5. 18 3月, 2009 1 次提交
  6. 17 3月, 2009 1 次提交
    • S
      tracing: protect reader of cmdline output · 4ca53085
      Steven Rostedt 提交于
      Impact: fix to one cause of incorrect comm outputs in trace
      
      The spinlock only protected the creation of a comm <=> pid pair.
      But it was possible that a reader could look up a pid, and get the
      wrong comm because it had no locking.
      
      This also required changing trace_find_cmdline to copy the comm cache
      and not just send back a pointer to it.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      4ca53085
  7. 13 3月, 2009 5 次提交
    • F
      tracing/syscalls: core infrastructure for syscalls tracing, enhancements · bed1ffca
      Frederic Weisbecker 提交于
      Impact: new feature
      
      This adds the generic support for syscalls tracing. This is
      currently exploited through a devoted tracer but other tracing
      engines can use it. (They just have to play with
      {start,stop}_ftrace_syscalls() and use the display callbacks
      unless they want to override them.)
      
      The syscalls prototypes definitions are abused here to steal
      some metadata informations:
      
      - syscall name, param types, param names, number of params
      
      The syscall addr is not directly saved during this definition
      because we don't know if its prototype is available in the
      namespace. But we don't really need it. The arch has just to
      build a function able to resolve the syscall number to its
      metadata struct.
      
      The current tracer prints the syscall names, parameters names
      and values (and their types optionally). Currently the value is
      a raw hex but higher level values diplaying is on my TODO list.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1236955332-10133-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      bed1ffca
    • F
      tracing/ftrace: syscall tracing infrastructure, basics · ee08c6ec
      Frederic Weisbecker 提交于
      Provide basic callbacks to do syscall tracing.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <1236401580-5758-2-git-send-email-fweisbec@gmail.com>
      [ simplified it to a trace_printk() for now. ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ee08c6ec
    • S
      tracing: add comment for use of double __builtin_consant_p · bdc06758
      Steven Rostedt 提交于
      Impact: documentation
      
      The use of the double __builtin_contant_p checks in the event_trace_printk
      can be confusing to developers and reviewers. This patch adds a comment
      to explain why it is there.
      Requested-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      LKML-Reference: <20090313122235.43EB.A69D9226@jp.fujitsu.com>
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      bdc06758
    • S
      tracing: have event_trace_printk use static tracer · e9fb2b6d
      Steven Rostedt 提交于
      Impact: speed up on event tracing
      
      The event_trace_printk is currently a wrapper function that calls
      trace_vprintk. Because it uses a variable for the fmt it misses out
      on the optimization of using the binary printk.
      
      This patch makes event_trace_printk into a macro wrapper to use the
      fmt as the same as the trace_printks.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      e9fb2b6d
    • F
      tracing/core: bring back raw trace_printk for dynamic formats strings · 48ead020
      Frederic Weisbecker 提交于
      Impact: fix callsites with dynamic format strings
      
      Since its new binary implementation, trace_printk() internally uses static
      containers for the format strings on each callsites. But the value is
      assigned once at build time, which means that it can't take dynamic
      formats.
      
      So this patch unearthes the raw trace_printk implementation for the callers
      that will need trace_printk to be able to carry these dynamic format
      strings. The trace_printk() macro will use the appropriate implementation
      for each callsite. Most of the time however, the binary implementation will
      still be used.
      
      The other impact of this patch is that mmiotrace_printk() will use the old
      implementation because it calls the low level trace_vprintk and we can't
      guess here whether the format passed in it is dynamic or not.
      
      Some parts of this patch have been written by Steven Rostedt (most notably
      the part that chooses the appropriate implementation for each callsites).
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      48ead020
  8. 12 3月, 2009 1 次提交
  9. 10 3月, 2009 1 次提交
    • S
      tracing: new format for specialized trace points · da4d0302
      Steven Rostedt 提交于
      Impact: clean up and enhancement
      
      The TRACE_EVENT_FORMAT macro looks quite ugly and is limited in its
      ability to save data as well as to print the record out. Working with
      Ingo Molnar, we came up with a new format that is much more pleasing to
      the eye of C developers. This new macro is more C style than the old
      macro, and is more obvious to what it does.
      
      Here's the example. The only updated macro in this patch is the
      sched_switch trace point.
      
      The old method looked like this:
      
       TRACE_EVENT_FORMAT(sched_switch,
              TP_PROTO(struct rq *rq, struct task_struct *prev,
                      struct task_struct *next),
              TP_ARGS(rq, prev, next),
              TP_FMT("task %s:%d ==> %s:%d",
                    prev->comm, prev->pid, next->comm, next->pid),
              TRACE_STRUCT(
                      TRACE_FIELD(pid_t, prev_pid, prev->pid)
                      TRACE_FIELD(int, prev_prio, prev->prio)
                      TRACE_FIELD_SPECIAL(char next_comm[TASK_COMM_LEN],
                                          next_comm,
                                          TP_CMD(memcpy(TRACE_ENTRY->next_comm,
                                                       next->comm,
                                                       TASK_COMM_LEN)))
                      TRACE_FIELD(pid_t, next_pid, next->pid)
                      TRACE_FIELD(int, next_prio, next->prio)
              ),
              TP_RAW_FMT("prev %d:%d ==> next %s:%d:%d")
              );
      
      The above method is hard to read and requires two format fields.
      
      The new method:
      
       /*
        * Tracepoint for task switches, performed by the scheduler:
        *
        * (NOTE: the 'rq' argument is not used by generic trace events,
        *        but used by the latency tracer plugin. )
        */
       TRACE_EVENT(sched_switch,
      
      	TP_PROTO(struct rq *rq, struct task_struct *prev,
      		 struct task_struct *next),
      
      	TP_ARGS(rq, prev, next),
      
      	TP_STRUCT__entry(
      		__array(	char,	prev_comm,	TASK_COMM_LEN	)
      		__field(	pid_t,	prev_pid			)
      		__field(	int,	prev_prio			)
      		__array(	char,	next_comm,	TASK_COMM_LEN	)
      		__field(	pid_t,	next_pid			)
      		__field(	int,	next_prio			)
      	),
      
      	TP_printk("task %s:%d [%d] ==> %s:%d [%d]",
      		__entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
      		__entry->next_comm, __entry->next_pid, __entry->next_prio),
      
      	TP_fast_assign(
      		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
      		__entry->prev_pid	= prev->pid;
      		__entry->prev_prio	= prev->prio;
      		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
      		__entry->next_pid	= next->pid;
      		__entry->next_prio	= next->prio;
      	)
       );
      
      This macro is called TRACE_EVENT, it is broken up into 5 parts:
      
       TP_PROTO:        the proto type of the trace point
       TP_ARGS:         the arguments of the trace point
       TP_STRUCT_entry: the structure layout of the entry in the ring buffer
       TP_printk:       the printk format
       TP_fast_assign:  the method used to write the entry into the ring buffer
      
      The structure is the definition of how the event will be saved in the
      ring buffer. The printk is used by the internal tracing in case of
      an oops, and the kernel needs to print out the format of the record
      to the console. This the TP_printk gives a means to show the records
      in a human readable format. It is also used to print out the data
      from the trace file.
      
      The TP_fast_assign is executed directly. It is basically like a C function,
      where the __entry is the handle to the record.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      da4d0302
  10. 07 3月, 2009 3 次提交
    • I
      tracing: trace_bprintk() cleanups · 9de36825
      Ingo Molnar 提交于
      Impact: cleanup
      
      Remove a few leftovers and clean up the code a bit.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9de36825
    • F
      tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() · 769b0441
      Frederic Weisbecker 提交于
      Impact: faster and lighter tracing
      
      Now that we have trace_bprintk() which is faster and consume lesser
      memory than trace_printk() and has the same purpose, we can now drop
      the old implementation in favour of the binary one from trace_bprintk(),
      which means we move all the implementation of trace_bprintk() to
      trace_printk(), so the Api doesn't change except that we must now use
      trace_seq_bprintk() to print the TRACE_PRINT entries.
      
      Some changes result of this:
      
      - Previously, trace_bprintk depended of a single tracer and couldn't
        work without. This tracer has been dropped and the whole implementation
        of trace_printk() (like the module formats management) is now integrated
        in the tracing core (comes with CONFIG_TRACING), though we keep the file
        trace_printk (previously trace_bprintk.c) where we can find the module
        management. Thus we don't overflow trace.c
      
      - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries.
      
      - change a bit trace_printk/trace_vprintk macros to support non-builtin formats
        constants, and fix 'const' qualifiers warnings. But this is all transparent for
        developers.
      
      - etc...
      
      V2:
      
      - Rebase against last changes
      - Fix mispell on the changelog
      
      V3:
      
      - Rebase against last changes (moving trace_printk() to kernel.h)
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      769b0441
    • L
      tracing: infrastructure for supporting binary record · 1427cdf0
      Lai Jiangshan 提交于
      Impact: save on memory for tracing
      
      Current tracers are typically using a struct(like struct ftrace_entry,
      struct ctx_switch_entry, struct special_entr etc...)to record a binary
      event. These structs can only record a their own kind of events.
      A new kind of tracer need a new struct and a lot of code too handle it.
      
      So we need a generic binary record for events. This infrastructure
      is for this purpose.
      
      [fweisbec@gmail.com: rebase against latest -tip, make it safe while sched
      tracing as reported by Steven Rostedt]
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1236356510-8381-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1427cdf0
  11. 05 3月, 2009 2 次提交
    • I
      tracing: rename ftrace_printk() => trace_printk() · 5e1607a0
      Ingo Molnar 提交于
      Impact: cleanup
      
      Use a more generic name - this also allows the prototype to move
      to kernel.h and be generally available to kernel developers who
      want to do some quick tracing.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5e1607a0
    • S
      tracing: add latency output format option · c032ef64
      Steven Rostedt 提交于
      With the removal of the latency_trace file, we lost the ability
      to see some of the finer details in a trace. Like the state of
      interrupts enabled, the preempt count, need resched, and if we
      are in an interrupt handler, softirq handler or not.
      
      This patch simply creates an option to bring back the old format.
      This also removes the warning about an unused variable that held
      the latency_trace file operations.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      c032ef64
  12. 04 3月, 2009 1 次提交
    • S
      tracing: add binary buffer files for use with splice · 2cadf913
      Steven Rostedt 提交于
      Impact: new feature
      
      This patch creates a directory of files that correspond to the
      per CPU ring buffers. These are binary files and are made to
      be used with splice. This is the fastest way to extract data from
      the ftrace ring buffers.
      
      Thanks to Jiaying Zhang for pushing me to get this code fixed,
       and to Eduard - Gabriel Munteanu for his splice code that helped
       me debug my code.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      2cadf913
  13. 03 3月, 2009 2 次提交
    • S
      tracing: add format file to describe event struct fields · 981d081e
      Steven Rostedt 提交于
      This patch adds the "format" file to the trace point event directory.
      This is based off of work by Tom Zanussi, in which a file is exported
      to be tread from user land such that a user space app may read the
      binary record stored in the ring buffer.
      
       # cat /debug/tracing/events/sched/sched_switch/format
              field:pid_t prev_pid;   offset:12;      size:4;
              field:int prev_prio;    offset:16;      size:4;
              field special:char next_comm[TASK_COMM_LEN];    offset:20;      size:16;
              field:pid_t next_pid;   offset:36;      size:4;
              field:int next_prio;    offset:40;      size:4;
      
      Idea-from: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      981d081e
    • S
      tracing: make trace_seq_reset global and rename to trace_seq_init · f9520750
      Steven Rostedt 提交于
      Impact: clean up
      
      The trace_seq functions may be used separately outside of the ftrace
      iterator. The trace_seq_reset is needed for these operations.
      
      This patch also renames trace_seq_reset to the more appropriate
      trace_seq_init.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      f9520750
  14. 28 2月, 2009 3 次提交
    • S
      tracing: add raw fast tracing interface for trace events · fd994989
      Steven Rostedt 提交于
      This patch adds the interface to enable the C style trace points.
      In the directory /debugfs/tracing/events/subsystem/event
      We now have three files:
      
       enable : values 0 or 1 to enable or disable the trace event.
      
       available_types: values 'raw' and 'printf' which indicate the tracing
             types available for the trace point. If a developer does not
             use the TRACE_EVENT_FORMAT macro and just uses the TRACE_FORMAT
             macro, then only 'printf' will be available. This file is
             read only.
      
       type: values 'raw' or 'printf'. This indicates which type of tracing
             is active for that trace point. 'printf' is the default and
             if 'raw' is not available, this file is read only.
      
       # echo raw > /debug/tracing/events/sched/sched_wakeup/type
       # echo 1 > /debug/tracing/events/sched/sched_wakeup/enable
      
       Will enable the C style tracing for the sched_wakeup trace point.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      fd994989
    • S
      tracing: add raw trace point recording infrastructure · c32e827b
      Steven Rostedt 提交于
      Impact: lower overhead tracing
      
      The current event tracer can automatically pick up trace points
      that are registered with the TRACE_FORMAT macro. But it required
      a printf format string and parsing. Although, this adds the ability
      to get guaranteed information like task names and such, it took
      a hit in overhead processing. This processing can add about 500-1000
      nanoseconds overhead, but in some cases that too is considered
      too much and we want to shave off as much from this overhead as
      possible.
      
      Tom Zanussi recently posted tracing patches to lkml that are based
      on a nice idea about capturing the data via C structs using
      STRUCT_ENTER, STRUCT_EXIT type of macros.
      
      I liked that method very much, but did not like the implementation
      that required a developer to add data/code in several disjoint
      locations.
      
      This patch extends the event_tracer macros to do a similar "raw C"
      approach that Tom Zanussi did. But instead of having the developers
      needing to tweak a bunch of code all over the place, they can do it
      all in one macro - preferably placed near the code that it is
      tracing. That makes it much more likely that tracepoints will be
      maintained on an ongoing basis by the code they modify.
      
      The new macro TRACE_EVENT_FORMAT is created for this approach. (Note,
      a developer may still utilize the more low level DECLARE_TRACE macros
      if they don't care about getting their traces automatically in the event
      tracer.)
      
      They can also use the existing TRACE_FORMAT if they don't need to code
      the tracepoint in C, but just want to use the convenience of printf.
      
      So if the developer wants to "hardwire" a tracepoint in the fastest
      possible way, and wants to acquire their data via a user space utility
      in a raw binary format, or wants to see it in the trace output but not
      sacrifice any performance, then they can implement the faster but
      more complex TRACE_EVENT_FORMAT macro.
      
      Here's what usage looks like:
      
        TRACE_EVENT_FORMAT(name,
      	TPPROTO(proto),
      	TPARGS(args),
      	TPFMT(fmt, fmt_args),
      	TRACE_STUCT(
      		TRACE_FIELD(type1, item1, assign1)
      		TRACE_FIELD(type2, item2, assign2)
      			[...]
      	),
      	TPRAWFMT(raw_fmt)
      	);
      
      Note name, proto, args, and fmt, are all identical to what TRACE_FORMAT
      uses.
      
       name: is the unique identifier of the trace point
       proto: The proto type that the trace point uses
       args: the args in the proto type
       fmt: printf format to use with the event printf tracer
       fmt_args: the printf argments to match fmt
      
       TRACE_STRUCT starts the ability to create a structure.
       Each item in the structure is defined with a TRACE_FIELD
      
        TRACE_FIELD(type, item, assign)
      
       type: the C type of item.
       item: the name of the item in the stucture
       assign: what to assign the item in the trace point callback
      
       raw_fmt is a way to pretty print the struct. It must match
        the order of the items are added in TRACE_STUCT
      
       An example of this would be:
      
       TRACE_EVENT_FORMAT(sched_wakeup,
      	TPPROTO(struct rq *rq, struct task_struct *p, int success),
      	TPARGS(rq, p, success),
      	TPFMT("task %s:%d %s",
      	      p->comm, p->pid, success?"succeeded":"failed"),
      	TRACE_STRUCT(
      		TRACE_FIELD(pid_t, pid, p->pid)
      		TRACE_FIELD(int, success, success)
      	),
      	TPRAWFMT("task %d success=%d")
      	);
      
       This creates us a unique struct of:
      
       struct {
      	pid_t		pid;
      	int		success;
       };
      
       And the way the call back would assign these values would be:
      
      	entry->pid = p->pid;
      	entry->success = success;
      
      The nice part about this is that the creation of the assignent is done
      via macro magic in the event tracer.  Once the TRACE_EVENT_FORMAT is
      created, the developer will then have a faster method to record
      into the ring buffer. They do not need to worry about the tracer itself.
      
      The developer would only need to touch the files in include/trace/*.h
      
      Again, I would like to give special thanks to Tom Zanussi for this
      nice idea.
      
      Idea-from: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      c32e827b
    • S
      tracing: add interface to write into current tracer buffer · ef5580d0
      Steven Rostedt 提交于
      Right now all tracers must manage their own trace buffers. This was
      to enforce tracers to be independent in case we finally decide to
      allow each tracer to have their own trace buffer.
      
      But now we are adding event tracing that writes to the current tracer's
      buffer. This adds an interface to allow events to write to the current
      tracer buffer without having to manage its own. Since event tracing
      has no "tracer", and is just a way to hook into any other tracer.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      ef5580d0
  15. 25 2月, 2009 2 次提交
    • F
      tracing/core: make the read callbacks reentrants · d7350c3f
      Frederic Weisbecker 提交于
      Now that several per-cpu files can be read or spliced at the
      same, we want the read/splice callbacks for tracing files to be
      reentrants.
      
      Until now, a single global mutex (trace_types_lock) serialized
      the access to tracing_read_pipe(), tracing_splice_read_pipe(),
      and the seq helpers.
      
      Ie: it means that if a user tries to read trace_pipe0 and
      trace_pipe1 at the same time, the access to the function
      tracing_read_pipe() is contended and one reader must wait for
      the other to finish its read call.
      
      The trace_type_lock mutex is mostly here to serialize the access
      to the global current tracer (current_trace), which can be
      changed concurrently. Although the iter struct keeps a private
      pointer to this tracer, its callbacks can be changed by another
      function.
      
      The method used here is to not keep anymore private reference to
      the tracer inside the iterator but to make a copy of it inside
      the iterator. Then it checks on subsequents read calls if the
      tracer has changed. This is not costly because the current
      tracer is not expected to be changed often, so we use a branch
      prediction for that.
      
      Moreover, we add a private mutex to the iterator (there is one
      iterator per file descriptor) to serialize the accesses in case
      of multiple consumers per file descriptor (which would be a
      silly idea from the user). Note that this is not to protect the
      ring buffer, since the ring buffer already serializes the
      readers accesses. This is to prevent from traces weirdness in
      case of concurrent consumers. But these mutexes can be dropped
      anyway, that would not result in any crash. Just tell me what
      you think about it.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d7350c3f
    • F
      tracing/core: introduce per cpu tracing files · b04cc6b1
      Frederic Weisbecker 提交于
      Impact: split up tracing output per cpu
      
      Currently, on the tracing debugfs directory, three files are
      available to the user to let him extracting the trace output:
      
      - trace is an iterator through the ring-buffer. It's a reader
        but not a consumer It doesn't block when no more traces are
        available.
      
      - trace pretty similar to the former, except that it adds more
        informations such as prempt count, irq flag, ...
      
      - trace_pipe is a reader and a consumer, it will also block
        waiting for traces if necessary (heh, yes it's a pipe).
      
      The traces coming from different cpus are curretly mixed up
      inside these files. Sometimes it messes up the informations,
      sometimes it's useful, depending on what does the tracer
      capture.
      
      The tracing_cpumask file is useful to filter the output and
      select only the traces captured a custom defined set of cpus.
      But still it is not enough powerful to extract at the same time
      one trace buffer per cpu.
      
      So this patch creates a new directory: /debug/tracing/per_cpu/.
      
      Inside this directory, you will now find one trace_pipe file and
      one trace file per cpu.
      
      Which means if you have two cpus, you will have:
      
       trace0
       trace1
       trace_pipe0
       trace_pipe1
      
      And of course, reading these files will have the same effect
      than with the usual tracing files, except that you will only see
      the traces from the given cpu.
      
      The original all-in-one cpu trace file are still available on
      their original place.
      
      Until now, only one consumer was allowed on trace_pipe to avoid
      racy consuming on the ring-buffer. Now the approach changed a
      bit, you can have only one consumer per cpu.
      
      Which means you are allowed to read concurrently trace_pipe0 and
      trace_pipe1 But you can't have two readers on trace_pipe0 or
      trace_pipe1.
      
      Following the same logic, if there is one reader on the common
      trace_pipe, you can not have at the same time another reader on
      trace_pipe0 or in trace_pipe1. Because in trace_pipe is already
      a consumer in all cpu buffers in essence.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b04cc6b1
  16. 18 2月, 2009 1 次提交
    • F
      tracing/core: use appropriate waiting on trace_pipe · 6eaaa5d5
      Frederic Weisbecker 提交于
      Impact: api and pipe waiting change
      
      Currently, the waiting used in tracing_read_pipe() is done through a
      100 msecs schedule_timeout() loop which periodically check if there
      are traces on the buffer.
      
      This can cause small latencies for programs which are reading the incoming
      events.
      
      This patch makes the reader waiting for the trace_wait waitqueue except
      for few tracers such as the sched and functions tracers which might be
      already hold the runqueue lock while waking up the reader.
      
      This is performed through a new callback wait_pipe() on struct tracer.
      If none is implemented on a specific tracer, the default waiting for
      trace_wait queue is attached.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6eaaa5d5
  17. 10 2月, 2009 1 次提交
  18. 09 2月, 2009 3 次提交
  19. 08 2月, 2009 2 次提交
    • W
      trace: trivial fixes in comment typos. · 57794a9d
      Wenji Huang 提交于
      Impact: clean up
      
      Fixed several typos in the comments.
      Signed-off-by: NWenji Huang <wenji.huang@oracle.com>
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      57794a9d
    • S
      trace: remove deprecated entry->cpu · 1830b52d
      Steven Rostedt 提交于
      Impact: fix to prevent developers from using entry->cpu
      
      With the new ring buffer infrastructure, the cpu for the entry is
      implicit with which CPU buffer it is on.
      
      The original code use to record the current cpu into the generic
      entry header, which can be retrieved by entry->cpu. When the
      ring buffer was introduced, the users were convert to use the
      the cpu number of which cpu ring buffer was in use (this was passed
      to the tracers by the iterator: iter->cpu).
      
      Unfortunately, the cpu item in the entry structure was never removed.
      This allowed for developers to use it instead of the proper iter->cpu,
      unknowingly, using an uninitialized variable. This was not the fault
      of the developers, since it would seem like the logical place to
      retrieve the cpu identifier.
      
      This patch removes the cpu item from the entry structure and fixes
      all the users that should have been using iter->cpu.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      1830b52d
  20. 06 2月, 2009 2 次提交
    • A
      trace: Call tracing_reset_online_cpus before tracer->init() · b6f11df2
      Arnaldo Carvalho de Melo 提交于
      Impact: cleanup
      
      To make it easy for ftrace plugin writers, as this was open coded in
      the existing plugins
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NFrédéric Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b6f11df2
    • A
      tracing: Introduce trace_buffer_{lock_reserve,unlock_commit} · 51a763dd
      Arnaldo Carvalho de Melo 提交于
      Impact: new API
      
      These new functions do what previously was being open coded, reducing
      the number of details ftrace plugin writers have to worry about.
      
      It also standardizes the handling of stacktrace, userstacktrace and
      other trace options we may introduce in the future.
      
      With this patch, for instance, the blk tracer (and some others already
      in the tree) can use the "userstacktrace" /d/tracing/trace_options
      facility.
      
      $ codiff /tmp/vmlinux.before /tmp/vmlinux.after
      linux-2.6-tip/kernel/trace/trace.c:
        trace_vprintk              |   -5
        trace_graph_return         |  -22
        trace_graph_entry          |  -26
        trace_function             |  -45
        __ftrace_trace_stack       |  -27
        ftrace_trace_userstack     |  -29
        tracing_sched_switch_trace |  -66
        tracing_stop               |   +1
        trace_seq_to_user          |   -1
        ftrace_trace_special       |  -63
        ftrace_special             |   +1
        tracing_sched_wakeup_trace |  -70
        tracing_reset_online_cpus  |   -1
       13 functions changed, 2 bytes added, 355 bytes removed, diff: -353
      
      linux-2.6-tip/block/blktrace.c:
        __blk_add_trace |  -58
       1 function changed, 58 bytes removed, diff: -58
      
      linux-2.6-tip/kernel/trace/trace.c:
        trace_buffer_lock_reserve  |  +88
        trace_buffer_unlock_commit |  +86
       2 functions changed, 174 bytes added, diff: +174
      
      /tmp/vmlinux.after:
       16 functions changed, 176 bytes added, 413 bytes removed, diff: -237
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NFrédéric Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      51a763dd