  1. 18 Apr, 2009 1 commit
  2. 17 Apr, 2009 3 commits
    • tracing/events: perform function tracing in event selftests · 9ea21c1e
      Committed by Steven Rostedt
      We can find some bugs in the trace events if we stress the writes as well.
      The function tracer is a good way to stress the events.
      
      [ Impact: extend scope of event tracer self-tests ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20090416161746.604786131@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing/events/ring-buffer: expose format of ring buffer headers to users · d1b182a8
      Committed by Steven Rostedt
      Currently, everything needed to read the binary output from the
      ring buffers is available, with the exception of the way the ring
      buffer handles itself internally.
      
      This patch creates two special files in the debugfs/tracing/events
      directory:
      
       # cat /debug/tracing/events/header_page
              field: u64 timestamp;   offset:0;       size:8;
              field: local_t commit;  offset:8;       size:8;
              field: char data;       offset:16;      size:4080;
      
       # cat /debug/tracing/events/header_event
              type        :    2 bits
              len         :    3 bits
              time_delta  :   27 bits
              array       :   32 bits
      
              padding     : type == 0
              time_extend : type == 1
              data        : type == 3
      
      This is to allow a userspace app to see if the ring buffer format changes
      or not.
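
      As a rough illustration of what a userspace reader could do with the
      header_event description, the C sketch below unpacks the first 32-bit
      word of an event. The struct and function names are made up for this
      example, and the bit positions are an assumption: header_event lists
      only the field widths, so the sketch guesses that the fields are packed
      starting from the least significant bit.

```c
#include <stdint.h>

/* Decoded view of the first 32-bit word of a ring buffer event. */
struct event_header {
	uint32_t type;       /* 2 bits */
	uint32_t len;        /* 3 bits */
	uint32_t time_delta; /* 27 bits */
};

/* Event type values as listed in header_event. */
enum { EVENT_PADDING = 0, EVENT_TIME_EXTEND = 1, EVENT_DATA = 3 };

/*
 * ASSUMPTION: type occupies bits 0-1, len bits 2-4 and time_delta
 * bits 5-31. header_event only states the widths, so a real reader
 * must confirm the bit order for its target architecture.
 */
static struct event_header decode_event_header(uint32_t word)
{
	struct event_header h;

	h.type = word & 0x3u;
	h.len = (word >> 2) & 0x7u;
	h.time_delta = word >> 5;
	return h;
}
```

      Under that assumption, a word built as (1000 << 5) | (4 << 2) | 3
      decodes to a data event with len 4 and time_delta 1000.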
      
      [ Impact: allow userspace apps to know of ringbuffer format changes ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing/events: add startup tests for events · e6187007
      Committed by Steven Rostedt
      As events become popular as the new way to add tracing
      infrastructure into ftrace, it is important to catch any problems
      that might happen with a mistake in the TRACE_EVENT macro.

      This patch introduces a startup self test on the registered trace
      events. Note that it can only do a generic test; any testing that
      needs more involvement must be implemented by the tracepoint
      creators.
      
      The test goes through the events one by one, enabling a trace point
      and running some random tasks (random in the sense that I just made
      them up). Those tasks create threads, grab mutexes and spinlocks,
      and use workqueues.

      After testing each event individually, it does the same test after
      enabling each system of trace points, such as sched, irq, and lockdep.
      
      Then finally it enables all tracepoints and performs the tasks again.
      The output to the console on bootup will look like this when everything
      works:
      
      Running tests on trace events:
      Testing event kfree_skb: OK
      Testing event kmalloc: OK
      Testing event kmem_cache_alloc: OK
      Testing event kmalloc_node: OK
      Testing event kmem_cache_alloc_node: OK
      Testing event kfree: OK
      Testing event kmem_cache_free: OK
      Testing event irq_handler_exit: OK
      Testing event irq_handler_entry: OK
      Testing event softirq_entry: OK
      Testing event softirq_exit: OK
      Testing event lock_acquire: OK
      Testing event lock_release: OK
      Testing event sched_kthread_stop: OK
      Testing event sched_kthread_stop_ret: OK
      Testing event sched_wait_task: OK
      Testing event sched_wakeup: OK
      Testing event sched_wakeup_new: OK
      Testing event sched_switch: OK
      Testing event sched_migrate_task: OK
      Testing event sched_process_free: OK
      Testing event sched_process_exit: OK
      Testing event sched_process_wait: OK
      Testing event sched_process_fork: OK
      Testing event sched_signal_send: OK
      Running tests on trace event systems:
      Testing event system skb: OK
      Testing event system kmem: OK
      Testing event system irq: OK
      Testing event system lockdep: OK
      Testing event system sched: OK
      Running tests on all trace events:
      Testing all events: OK
      
      [ folded in:
      
        tracing: add #include <linux/delay.h> to fix build failure in test_work()
      
        This build failure occurred on a few rare configs:
      
         kernel/trace/trace_events.c: In function ‘test_work’:
         kernel/trace/trace_events.c:975: error: implicit declaration of function ‘udelay’
         kernel/trace/trace_events.c:980: error: implicit declaration of function ‘msleep’
      
        delay.h is included in way too many other headers, hiding cases
        where new usage is added without header inclusion.
      
        [ Impact: build fix ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ]
      
      [ Impact: add event tracer self-tests ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  3. 15 Apr, 2009 4 commits
  4. 14 Apr, 2009 2 commits
    • tracing/filters: allow on-the-fly filter switching · 0a19e53c
      Committed by Tom Zanussi
      This patch allows event filters to be safely removed or switched
      on-the-fly while avoiding the use of rcu or the suspension of tracing of
      previous versions.
      
      It does it by adding a new filter_pred_none() predicate function which
      does nothing and by never deallocating either the predicates or any of
      the filter_pred members used in matching; the predicate lists are
      allocated and initialized during ftrace_event_calls initialization.
      
      Whenever a filter is removed or replaced, the filter_pred_* functions
      currently in use by the affected ftrace_event_call are immediately
      switched over to the filter_pred_none() function, while the rest of
      the filter_pred members are left intact, allowing any currently
      executing filter_pred_* functions to finish up, using the values they're
      currently using.
      
      In the case of filter replacement, the new predicate values are copied
      into the old predicates after the above step, and the filter_pred_none()
      functions are replaced by the filter_pred_* functions for the new
      filter.  In this case, it is possible though very unlikely that a
      previous filter_pred_* is still running even after the
      filter_pred_none() switch and the switch to the new filter_pred_*.  In
      that case, however, because nothing has been deallocated in the
      filter_pred, the worst that can happen is that the old filter_pred_*
      function sees the new values and as a result produces either a false
      positive or a false negative, depending on the values it finds.
      
      So one downside to this method is that rarely, it can produce a bad
      match during the filter switch, but it should be possible to live with
      that, IMHO.
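
      The ordering of the switch can be sketched in plain C as follows. This
      is a single-threaded illustration only, not the kernel code: the struct
      layout, filter_pred_string(), filter_replace() and filter_remove() are
      hypothetical, and it is an assumption of the sketch that the
      "does nothing" predicate reports no match. Real kernel code also has to
      deal with concurrent readers, which the sketch does not model.

```c
#include <string.h>

#define STR_VAL_MAX 128 /* size limit the commit gives for embedded str_val */

struct filter_pred;
typedef int (*filter_pred_fn)(const struct filter_pred *pred,
			      const char *field);

struct filter_pred {
	filter_pred_fn fn;         /* always points at some function */
	char str_val[STR_VAL_MAX]; /* embedded value, never freed */
};

/* ASSUMPTION: the placeholder predicate simply reports "no match". */
static int filter_pred_none(const struct filter_pred *pred, const char *field)
{
	(void)pred;
	(void)field;
	return 0;
}

/* Example predicate: match when the field equals the stored string. */
static int filter_pred_string(const struct filter_pred *pred,
			      const char *field)
{
	return strcmp(pred->str_val, field) == 0;
}

/* Remove a filter: only the function pointer changes, values stay put. */
static void filter_remove(struct filter_pred *pred)
{
	pred->fn = filter_pred_none;
}

/* Replace a filter in place, without freeing or reallocating anything. */
static void filter_replace(struct filter_pred *pred, filter_pred_fn fn,
			   const char *val)
{
	pred->fn = filter_pred_none;                  /* 1: park on none */
	strncpy(pred->str_val, val, STR_VAL_MAX - 1); /* 2: copy new values */
	pred->str_val[STR_VAL_MAX - 1] = '\0';
	pred->fn = fn;                                /* 3: install new fn */
}
```

      Because str_val is embedded in the predicate, a function that is still
      running during the switch can at worst read a mix of old and new
      values, which is the false positive or negative described above.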
      
      The other downside is that at least in this patch the predicate lists
      are always pre-allocated, taking up memory from the start.  They could
      probably be allocated on first-use, and de-allocated when tracing is
      completely stopped - if this patch makes sense, I could create another
      one to do that later on.
      
      Oh, and it also places a restriction on the size of __arrays in events,
      currently set to 128, since they can't be larger than the now embedded
      str_val arrays in the filter_pred struct.
      Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: paulmck@linux.vnet.ibm.com
      LKML-Reference: <1239610670.6660.49.camel@tropicana>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events · e1112b4d
      Committed by Tom Zanussi
      This patch adds run-time field descriptions to all the event formats
      exported using TRACE_EVENT_FORMAT.  It also hooks up all the tracers
      that use them (i.e. the tracers in the 'ftrace subsystem') so they can
      also have their output filtered by the event-filtering mechanism.
      
      When I was testing this, there were a couple of things that fooled me
      into thinking the filters weren't working, when actually they were -
      I'll mention them here so others don't make the same mistakes (and file
      bug reports. ;-)
      
      One is that some of the tracers trace multiple events e.g. the
      sched_switch tracer uses the context_switch and wakeup events, and if
      you don't set filters on all of the traced events, the unfiltered output
      from the events without filters on them can make it look like the
      filtering as a whole isn't working properly, when actually it is doing
      what it was asked to do - it just wasn't asked to do the right thing.
      
      The other is that for the really high-volume tracers e.g. the function
      tracer, the volume of filtered events can be so high that it pushes the
      unfiltered events out of the ring buffer before they can be read so e.g.
      cat'ing the trace file repeatedly shows either no output, or once in
      a while some output that isn't there the next time you read the
      trace, which isn't what you normally expect when reading the trace file.
      If you read from the trace_pipe file though, you can catch them before
      they disappear.
      
      Changes from v1:
      
      As suggested by Frederic Weisbecker:
      
      - get rid of externs in functions
      - added unlikely() to filter_check_discard()
      Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 12 Apr, 2009 2 commits
  6. 26 Mar, 2009 1 commit
    • tracing: filter fix for TRACE_EVENT_FORMAT events · 9a8118ba
      Committed by Tom Zanussi
      Impact: fix crash (hang) when using TRACE_EVENT_FORMAT filter files
      
      Filters are only hooked up to the tracepoint events defined using
      TRACE_EVENT, but not to the tracers that use TRACE_EVENT_FORMAT, such
      as ftrace.
      
      Do not display the filter files at all for TRACE_EVENT_FORMAT events
      for the time being.
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1237878882.8339.61.camel@charm-linux>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 24 Mar, 2009 2 commits
  8. 23 Mar, 2009 6 commits
    • tracing/filters: clean up filter_add_subsystem_pred() · c4cff064
      Committed by Tom Zanussi
      Impact: cleanup, memory leak fix
      
      This patch cleans up filter_add_subsystem_pred():
      
      - searches for the field before creating a copy of the pred
      
      - fixes memory leak in the case a predicate isn't applied
      
      - if -ENOMEM, makes sure there's no longer a reference to the
        pred so the caller can free the half-finished filter
      
      - changes the confusing i == MAX_FILTER_PRED - 1 comparison
        previously remarked upon
      
      This affects only per-subsystem event filtering.
      Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1237796808.7527.40.camel@charm-linux>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing/events: make the filter files writable · 9bd7d099
      Committed by Frederic Weisbecker
      We need the filter files to be writable, but the current
      filter file permissions only make them readable.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <1237759847-21025-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: add run-time field descriptions for event filtering, kfree fix · fe9f57f2
      Committed by Ingo Molnar
      Impact: fix potential kfree of random data in (rare) failure path
      
      Zero-initialize the field structure.
      Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <1237710639.7703.46.camel@charm-linux>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: add per-subsystem filtering · cfb180f3
      Committed by Tom Zanussi
      This patch adds per-subsystem filtering to the event tracing subsystem.
      
      It adds a 'filter' debugfs file to each subsystem directory.  Writing
      to this file sets filters; reading from it displays the current set
      of filters for that subsystem.
      
      Basically what it does is propagate the filter down to each event
      contained in the subsystem.  If a particular event doesn't have a field
      with the name specified in the filter, it simply doesn't get set for
      that event.  You can verify whether or not the filter was set for a
      particular event by looking at the filter file for that event.
      
      As with per-event filters, compound expressions are supported, echoing
      '0' to the subsystem's filter file clears all filters in the subsystem,
      etc.
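
      The propagation rule can be pictured with the following C sketch. It is
      not the kernel implementation: struct event, event_has_field() and
      subsystem_set_filter() are hypothetical names, and the filter is
      reduced to just the name of the field it tests. The point shown is that
      events lacking the named field are skipped rather than treated as
      errors.

```c
#include <string.h>

#define MAX_FIELDS 8

/* Hypothetical stand-in for an event and the one filter field it holds. */
struct event {
	const char *name;
	const char *fields[MAX_FIELDS]; /* field names, NULL-terminated */
	const char *filter_field;       /* field being filtered on, or NULL */
};

/* Return 1 if the event defines a field with the given name. */
static int event_has_field(const struct event *ev, const char *field)
{
	int i;

	for (i = 0; i < MAX_FIELDS && ev->fields[i]; i++)
		if (strcmp(ev->fields[i], field) == 0)
			return 1;
	return 0;
}

/*
 * Propagate a filter on 'field' to every event in the subsystem.
 * Events without that field are left untouched. Returns the number
 * of events that took the filter.
 */
static int subsystem_set_filter(struct event *events, int n,
				const char *field)
{
	int i, applied = 0;

	for (i = 0; i < n; i++) {
		if (!event_has_field(&events[i], field))
			continue;
		events[i].filter_field = field;
		applied++;
	}
	return applied;
}
```

      In this sketch, setting a filter on "pid" for a subsystem holding
      sched_switch (prev_pid, next_pid) and sched_wakeup (pid, success)
      would reach only sched_wakeup.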
      Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1237710677.7703.49.camel@charm-linux>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: add per-event filtering · 7ce7e424
      Committed by Tom Zanussi
      This patch adds per-event filtering to the event tracing subsystem.
      
      It adds a 'filter' debugfs file to each event directory.  Writing to
      this file sets filters; reading from it displays the current set of
      filters for that event.
      
      Basically, any field listed in the 'format' file for an event can be
      filtered on (including strings, but not yet other array types) using
      either matching ('==') or non-matching ('!=') 'predicates'.  A
      'predicate' can be either a single expression:
      
       # echo pid != 0 > filter
      
       # cat filter
       pid != 0
      
      or a compound expression of up to 8 sub-expressions combined using '&&'
      or '||':
      
       # echo comm == Xorg > filter
       # echo "&& sig != 29" > filter
      
       # cat filter
       comm == Xorg
       && sig != 29
      
      Only events having field values matching an expression will be available
      in the trace output; non-matching events are discarded.
      
      Note that a compound expression is built up by echoing each
      sub-expression separately - it's not the most efficient way to do
      things, but it keeps the parser simple and assumes that compound
      expressions will be relatively uncommon.  In any case, a subsequent
      patch introducing a way to set filters for entire subsystems should
      mitigate any need to do this for lots of events.
      
      Setting a filter without an '&&' or '||' clears the previous filter
      completely and sets the filter to the new expression:
      
       # cat filter
       comm == Xorg
       && sig != 29
      
       # echo comm != Xorg > filter
      
       # cat filter
       comm != Xorg
      
      To clear a filter, echo 0 to the filter file:
      
       # echo 0 > filter
       # cat filter
       none
      
      The limit of 8 predicates for a compound expression is arbitrary - for
      efficiency, it's implemented as an array of pointers to predicates, and
      8 seemed more than enough for any filter...
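
      One way to picture a compound filter is as a fixed array of predicates
      evaluated against a record, as in the C sketch below. The types and
      function names are hypothetical, only integer fields are handled, and
      the strict left-to-right combination of '&&' and '||' (no precedence)
      is an assumption of the sketch rather than something this commit
      message specifies.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FILTER_PRED 8 /* the arbitrary limit mentioned above */

enum pred_op { OP_EQ, OP_NE };
enum pred_join { JOIN_FIRST, JOIN_AND, JOIN_OR };

/* Hypothetical predicate on an int field located by byte offset. */
struct int_pred {
	enum pred_join join; /* how this predicate joins the previous result */
	enum pred_op op;     /* '==' or '!=' */
	size_t offset;       /* byte offset of the field within the record */
	int val;             /* value to compare against */
};

/* Evaluate one predicate against a raw record. */
static int pred_match(const struct int_pred *p, const void *rec)
{
	int field;

	memcpy(&field, (const char *)rec + p->offset, sizeof(field));
	return p->op == OP_EQ ? field == p->val : field != p->val;
}

/*
 * ASSUMPTION: predicates combine strictly left to right. An empty
 * filter matches every record.
 */
static int filter_match(const struct int_pred *preds, size_t n,
			const void *rec)
{
	size_t i;
	int result = 1;

	for (i = 0; i < n && i < MAX_FILTER_PRED; i++) {
		int m = pred_match(&preds[i], rec);

		if (preds[i].join == JOIN_OR)
			result = result || m;
		else
			result = result && m;
	}
	return result;
}
```

      A filter equivalent to "pid != 0" followed by "&& sig != 29" would be
      two entries in the array, and only records passing filter_match()
      would be kept.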
      Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1237710665.7703.48.camel@charm-linux>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: add run-time field descriptions for event filtering · cf027f64
      Committed by Tom Zanussi
      This patch makes the field descriptions defined for event tracing
      available at run-time, for the event-filtering mechanism introduced
      in a subsequent patch.
      
      The common event fields are prepended with 'common_' in the format
      display, allowing them to be distinguished from the other fields
      that might internally have the same name, so that they can be
      unambiguously used in filters.
      Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1237710639.7703.46.camel@charm-linux>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 20 Mar, 2009 2 commits
  10. 17 Mar, 2009 1 commit
    • tracing: fix leak in event_format_read() · c269fc8c
      Committed by Tom Zanussi
      Impact: fix memory leak
      
      If event_format_read() exits early due to nonzero ppos, the
      previous kmalloc doesn't get freed - might as well do the
      check before the kmalloc and avoid the problem.
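
      The shape of the fix, checking the position before allocating, can be
      sketched in userspace C as below. This is not the kernel function:
      format_read() is a hypothetical name, malloc()/free() stand in for
      kmalloc()/kfree(), and the text being returned is a placeholder.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Copy a short text into ubuf on the first read only.
 * Returns the number of bytes copied, 0 once *ppos is nonzero,
 * or -1 if the temporary buffer cannot be allocated.
 */
static long format_read(char *ubuf, size_t cnt, long *ppos)
{
	static const char text[] = "name: wakeup\n";
	size_t len = sizeof(text) - 1;
	char *buf;

	/* Check first: nothing has been allocated, so nothing can leak. */
	if (*ppos)
		return 0;

	buf = malloc(len + 1);
	if (!buf)
		return -1;
	memcpy(buf, text, len + 1);

	if (len > cnt)
		len = cnt;
	memcpy(ubuf, buf, len);
	*ppos += (long)len;

	free(buf); /* the only exit after malloc() frees the buffer */
	return (long)len;
}
```

      With the early return placed ahead of the allocation, every path that
      allocates also reaches the free().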
      Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1237270859.8033.141.camel@charm-linux>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  11. 13 Mar, 2009 1 commit
  12. 12 Mar, 2009 1 commit
  13. 11 Mar, 2009 1 commit
  14. 10 Mar, 2009 4 commits
    • tracing: do not allow modifying the ftrace events via the event files · 40e26815
      Committed by Steven Rostedt
      Impact: fix to prevent crash on calling NULL function pointer
      
      The ftrace internal records have their format exported via the event
      system under the ftrace subsystem. These are only for exporting the
      format to allow binary readers to be able to parse them in a binary
      output.
      
      The ftrace subsystem events can only be enabled via the ftrace tracers
      and do not have a registering function. The event files expect the
      event record to have a registering function and will call it directly.
      Passing in an ftrace subsystem event will cause the kernel to crash
      because it will call through a NULL pointer.
      
      This patch prevents the ftrace subsystem from being viewable to the
      event enabling files.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: fix printk format specifier · ce8eb2bf
      Committed by Steven Rostedt
      Impact: clean up
      
      The offsetof and sizeof are of type size_t, and instead of typecasting
      them to unsigned int for printk formatting, one could just use %zu.
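
      A userspace C sketch of the same idea follows. printk is kernel-only,
      so the sketch uses snprintf(), which accepts the same %zu conversion
      for size_t in C99; struct sample_entry and format_field() are
      hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record layout, used only to have something to measure. */
struct sample_entry {
	unsigned char type;
	int pid;
};

/*
 * Format one field description line. offsetof() and sizeof both
 * yield size_t, so %zu prints them without any cast.
 */
static int format_field(char *buf, size_t buflen)
{
	return snprintf(buf, buflen, "field:int pid;\toffset:%zu;\tsize:%zu;",
			offsetof(struct sample_entry, pid),
			sizeof(((struct sample_entry *)0)->pid));
}
```

      The same format string then compiles without warnings whether size_t
      is 32 or 64 bits wide.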
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: new format for specialized trace points · da4d0302
      Committed by Steven Rostedt
      Impact: clean up and enhancement
      
      The TRACE_EVENT_FORMAT macro looks quite ugly and is limited in its
      ability to save data as well as to print the record out. Working with
      Ingo Molnar, we came up with a new format that is much more pleasing to
      the eye of C developers. This new macro is more C style than the old
      macro, and makes it more obvious what it does.
      
      Here's the example. The only updated macro in this patch is the
      sched_switch trace point.
      
      The old method looked like this:
      
       TRACE_EVENT_FORMAT(sched_switch,
              TP_PROTO(struct rq *rq, struct task_struct *prev,
                      struct task_struct *next),
              TP_ARGS(rq, prev, next),
              TP_FMT("task %s:%d ==> %s:%d",
                    prev->comm, prev->pid, next->comm, next->pid),
              TRACE_STRUCT(
                      TRACE_FIELD(pid_t, prev_pid, prev->pid)
                      TRACE_FIELD(int, prev_prio, prev->prio)
                      TRACE_FIELD_SPECIAL(char next_comm[TASK_COMM_LEN],
                                          next_comm,
                                          TP_CMD(memcpy(TRACE_ENTRY->next_comm,
                                                       next->comm,
                                                       TASK_COMM_LEN)))
                      TRACE_FIELD(pid_t, next_pid, next->pid)
                      TRACE_FIELD(int, next_prio, next->prio)
              ),
              TP_RAW_FMT("prev %d:%d ==> next %s:%d:%d")
              );
      
      The above method is hard to read and requires two format fields.
      
      The new method:
      
       /*
        * Tracepoint for task switches, performed by the scheduler:
        *
        * (NOTE: the 'rq' argument is not used by generic trace events,
        *        but used by the latency tracer plugin. )
        */
       TRACE_EVENT(sched_switch,
      
      	TP_PROTO(struct rq *rq, struct task_struct *prev,
      		 struct task_struct *next),
      
      	TP_ARGS(rq, prev, next),
      
      	TP_STRUCT__entry(
      		__array(	char,	prev_comm,	TASK_COMM_LEN	)
      		__field(	pid_t,	prev_pid			)
      		__field(	int,	prev_prio			)
      		__array(	char,	next_comm,	TASK_COMM_LEN	)
      		__field(	pid_t,	next_pid			)
      		__field(	int,	next_prio			)
      	),
      
      	TP_printk("task %s:%d [%d] ==> %s:%d [%d]",
      		__entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
      		__entry->next_comm, __entry->next_pid, __entry->next_prio),
      
      	TP_fast_assign(
      		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
      		__entry->prev_pid	= prev->pid;
      		__entry->prev_prio	= prev->prio;
      		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
      		__entry->next_pid	= next->pid;
      		__entry->next_prio	= next->prio;
      	)
       );
      
      This macro is called TRACE_EVENT. It is broken up into 5 parts:

       TP_PROTO:         the prototype of the trace point
       TP_ARGS:          the arguments of the trace point
       TP_STRUCT__entry: the structure layout of the entry in the ring buffer
       TP_printk:        the printk format
       TP_fast_assign:   the method used to write the entry into the ring buffer
      
      The structure is the definition of how the event will be saved in the
      ring buffer. The printk format is used by the internal tracing when,
      in case of an oops, the kernel needs to print out the record to the
      console. TP_printk thus gives a means to show the records in a
      human-readable format. It is also used to print out the data from the
      trace file.
      
      The TP_fast_assign is executed directly. It is basically like a C function,
      where the __entry is the handle to the record.
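
      To make the roles of the parts concrete, here is a hand-written
      userspace C sketch of roughly what the structure, assignment and
      printk sections correspond to. It is not the macro expansion: struct
      task is a made-up stand-in for task_struct, int replaces pid_t, and
      the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16

/* Minimal stand-in for the kernel's task_struct (hypothetical). */
struct task {
	char comm[TASK_COMM_LEN];
	int pid;
	int prio;
};

/* Roughly what TP_STRUCT__entry describes: the record layout. */
struct sched_switch_entry {
	char prev_comm[TASK_COMM_LEN];
	int prev_pid;
	int prev_prio;
	char next_comm[TASK_COMM_LEN];
	int next_pid;
	int next_prio;
};

/* Roughly what TP_fast_assign does: fill the record from the arguments. */
static void sched_switch_assign(struct sched_switch_entry *entry,
				const struct task *prev,
				const struct task *next)
{
	memcpy(entry->prev_comm, prev->comm, TASK_COMM_LEN);
	entry->prev_pid = prev->pid;
	entry->prev_prio = prev->prio;
	memcpy(entry->next_comm, next->comm, TASK_COMM_LEN);
	entry->next_pid = next->pid;
	entry->next_prio = next->prio;
}

/* Roughly what TP_printk does: render a stored record as text. */
static int sched_switch_print(char *buf, size_t len,
			      const struct sched_switch_entry *entry)
{
	return snprintf(buf, len, "task %s:%d [%d] ==> %s:%d [%d]",
			entry->prev_comm, entry->prev_pid, entry->prev_prio,
			entry->next_comm, entry->next_pid, entry->next_prio);
}
```

      The print step works only from the stored record, never from the live
      task, which is why the structure has to hold everything the format
      string needs.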
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: typecast sizeof and offsetof to unsigned int · 156b5f17
      Committed by Steven Rostedt
      Impact: fix compiler warnings
      
      On x86_64 sizeof and offsetof are treated as long, whereas on x86_32
      they are int. This patch typecasts them to unsigned int to avoid
      one arch giving warnings while the other does not.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
  15. 06 Mar, 2009 1 commit
    • tracing: add format files for ftrace default entries · 770cb243
      Committed by Steven Rostedt
      Impact: allow user apps to read binary format of basic ftrace entries
      
      Currently, only defined raw events export their formats so a binary
      reader can parse them. There's no reason that the default ftrace entries
      can't export their formats.
      
      This patch adds a subsystem called "ftrace" in the events directory
      that includes the ftrace entries for basic ftrace recorded items.
      
      These only have three files in the events directory:
      
       type             : printf
       available_types  : printf
       format           : format for the event entry
      
      For example:
      
       # cat /debug/tracing/events/ftrace/wakeup/format
      name: wakeup
      ID: 3
      format:
              field:unsigned char type;       offset:0;       size:1;
              field:unsigned char flags;      offset:1;       size:1;
              field:unsigned char preempt_count;      offset:2;       size:1;
              field:int pid;  offset:4;       size:4;
              field:int tgid; offset:8;       size:4;
      
              field:unsigned int prev_pid;    offset:12;      size:4;
              field:unsigned char prev_prio;  offset:16;      size:1;
              field:unsigned char prev_state; offset:17;      size:1;
              field:unsigned int next_pid;    offset:20;      size:4;
              field:unsigned char next_prio;  offset:24;      size:1;
              field:unsigned char next_state; offset:25;      size:1;
              field:unsigned int next_cpu;    offset:28;      size:4;
      
      print fmt: "%u:%u:%u  ==+ %u:%u:%u [%03u]"
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
  16. 03 Mar, 2009 4 commits
    • tracing: add trace name and id to event formats · c5e4e192
      Committed by Steven Rostedt
      To be able to identify the trace in the binary format output, the
      id of the trace event (which is dynamically assigned) must also be listed.
      
      This patch adds the name of the trace point as well as the id assigned.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add ftrace headers to event format files · 91729ef9
      Committed by Steven Rostedt
      This patch adds the ftrace header to the event format files:
      
       # cat /debug/tracing/events/sched/sched_switch/format
              field:unsigned char type;       offset:0;       size:1;
              field:unsigned char flags;      offset:1;       size:1;
              field:unsigned char preempt_count;      offset:2;       size:1;
              field:int pid;  offset:4;       size:4;
              field:int tgid; offset:8;       size:4;
      
              field:pid_t prev_pid;   offset:12;      size:4;
              field:int prev_prio;    offset:16;      size:4;
              field special:char next_comm[TASK_COMM_LEN];    offset:20;      size:16;
              field:pid_t next_pid;   offset:36;      size:4;
              field:int next_prio;    offset:40;      size:4;
      
      A blank line is used as a delimiter between the ftrace header and the
      trace point fields.
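
      A userspace parser for these lines might look like the following C
      sketch. struct field_desc and parse_field_line() are hypothetical
      names, not part of any kernel or tool API; the sketch assumes only the
      "field...; offset:N; size:N;" layout shown above and reports failure
      for anything else, including the blank separator line.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of a format file. */
struct field_desc {
	char decl[64];   /* C declaration text, e.g. "pid_t prev_pid" */
	unsigned offset; /* byte offset within the record */
	unsigned size;   /* size in bytes */
};

/* Returns 0 on success, -1 if the line is not a field description. */
static int parse_field_line(const char *line, struct field_desc *out)
{
	const char *start, *semi, *p;
	size_t n;

	while (*line == ' ' || *line == '\t')
		line++;
	if (strncmp(line, "field", 5) != 0)
		return -1;

	/* The declaration sits between the first ':' and the first ';'. */
	start = strchr(line, ':');
	if (!start)
		return -1;
	start++;
	while (*start == ' ')
		start++;
	semi = strchr(start, ';');
	if (!semi)
		return -1;
	n = (size_t)(semi - start);
	if (n >= sizeof(out->decl))
		return -1;
	memcpy(out->decl, start, n);
	out->decl[n] = '\0';

	p = strstr(semi, "offset:");
	if (!p || sscanf(p, "offset:%u", &out->offset) != 1)
		return -1;
	p = strstr(semi, "size:");
	if (!p || sscanf(p, "size:%u", &out->size) != 1)
		return -1;
	return 0;
}
```

      A reader could call this on each line and treat the first failing
      (blank) line as the boundary between the common ftrace header fields
      and the event's own fields.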
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add format file to describe event struct fields · 981d081e
      Committed by Steven Rostedt
      This patch adds the "format" file to the trace point event directory.
      This is based off of work by Tom Zanussi, in which a file is exported
      to be read from user land such that a user space app may read the
      binary record stored in the ring buffer.
      
       # cat /debug/tracing/events/sched/sched_switch/format
              field:pid_t prev_pid;   offset:12;      size:4;
              field:int prev_prio;    offset:16;      size:4;
              field special:char next_comm[TASK_COMM_LEN];    offset:20;      size:16;
              field:pid_t next_pid;   offset:36;      size:4;
              field:int next_prio;    offset:40;      size:4;
      
      Idea-from: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add protection around modify trace event fields · 11a241a3
      Committed by Steven Rostedt
      The trace event objects are currently not protected against
      reentrancy. This patch adds a mutex around the modifications of
      the trace event fields.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
  17. 28 Feb, 2009 4 commits
    • tracing: add raw fast tracing interface for trace events · fd994989
      Committed by Steven Rostedt
      This patch adds the interface to enable the C style trace points.
      In the directory /debugfs/tracing/events/subsystem/event
      we now have three files:
      
       enable : values 0 or 1 to enable or disable the trace event.
      
       available_types: values 'raw' and 'printf' which indicate the tracing
             types available for the trace point. If a developer does not
             use the TRACE_EVENT_FORMAT macro and just uses the TRACE_FORMAT
             macro, then only 'printf' will be available. This file is
             read only.
      
       type: values 'raw' or 'printf'. This indicates which type of tracing
             is active for that trace point. 'printf' is the default and
             if 'raw' is not available, this file is read only.
      
       # echo raw > /debug/tracing/events/sched/sched_wakeup/type
       # echo 1 > /debug/tracing/events/sched/sched_wakeup/enable
      
       This enables the C style tracing for the sched_wakeup trace point.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add raw trace point recording infrastructure · c32e827b
      Committed by Steven Rostedt
      Impact: lower overhead tracing
      
      The current event tracer can automatically pick up trace points
      that are registered with the TRACE_FORMAT macro. But it requires
      a printf format string and parsing. Although this adds the ability
      to get guaranteed information like task names and such, it takes
      a hit in overhead processing. This processing can add about 500-1000
      nanoseconds of overhead, and in some cases that is considered
      too much and we want to shave off as much from this overhead as
      possible.
      
      Tom Zanussi recently posted tracing patches to lkml that are based
      on a nice idea about capturing the data via C structs using
      STRUCT_ENTER, STRUCT_EXIT type of macros.
      
      I liked that method very much, but did not like the implementation
      that required a developer to add data/code in several disjoint
      locations.
      
      This patch extends the event_tracer macros to do a similar "raw C"
      approach that Tom Zanussi did. But instead of having the developers
      needing to tweak a bunch of code all over the place, they can do it
      all in one macro - preferably placed near the code that it is
      tracing. That makes it much more likely that tracepoints will be
      maintained on an ongoing basis by the code they modify.
      
      The new macro TRACE_EVENT_FORMAT is created for this approach. (Note,
      a developer may still utilize the more low level DECLARE_TRACE macros
      if they don't care about getting their traces automatically in the event
      tracer.)
      
      They can also use the existing TRACE_FORMAT if they don't need to code
      the tracepoint in C, but just want to use the convenience of printf.
      
      So if the developer wants to "hardwire" a tracepoint in the fastest
      possible way, and wants to acquire their data via a user space utility
      in a raw binary format, or wants to see it in the trace output but not
      sacrifice any performance, then they can implement the faster but
      more complex TRACE_EVENT_FORMAT macro.
      
      Here's what usage looks like:
      
        TRACE_EVENT_FORMAT(name,
      	TPPROTO(proto),
      	TPARGS(args),
      	TPFMT(fmt, fmt_args),
      	TRACE_STRUCT(
      		TRACE_FIELD(type1, item1, assign1)
      		TRACE_FIELD(type2, item2, assign2)
      			[...]
      	),
      	TPRAWFMT(raw_fmt)
      	);
      
      Note name, proto, args, and fmt, are all identical to what TRACE_FORMAT
      uses.
      
       name: the unique identifier of the trace point
       proto: the prototype that the trace point uses
       args: the arguments in the prototype
       fmt: the printf format to use with the event printf tracer
       fmt_args: the printf arguments to match fmt
      
       TRACE_STRUCT begins the definition of a structure.
       Each item in the structure is defined with a TRACE_FIELD:
      
        TRACE_FIELD(type, item, assign)
      
       type: the C type of item.
       item: the name of the item in the structure
       assign: what to assign the item in the trace point callback
      
       raw_fmt is a way to pretty print the struct. It must match
        the order in which the items are added in TRACE_STRUCT.
      
       An example of this would be:
      
       TRACE_EVENT_FORMAT(sched_wakeup,
      	TPPROTO(struct rq *rq, struct task_struct *p, int success),
      	TPARGS(rq, p, success),
      	TPFMT("task %s:%d %s",
      	      p->comm, p->pid, success?"succeeded":"failed"),
      	TRACE_STRUCT(
      		TRACE_FIELD(pid_t, pid, p->pid)
      		TRACE_FIELD(int, success, success)
      	),
      	TPRAWFMT("task %d success=%d")
      	);
      
       This creates a unique struct:
      
       struct {
      	pid_t		pid;
      	int		success;
       };
      
       And the callback would assign these values as follows:
      
      	entry->pid = p->pid;
      	entry->success = success;
      
      The nice part about this is that the creation of the assignment is done
      via macro magic in the event tracer.  Once the TRACE_EVENT_FORMAT is
      created, the developer will then have a faster method to record
      into the ring buffer. They do not need to worry about the tracer itself.
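
A minimal user-space sketch of how one field list can yield both the
struct and the assignments. This is not the kernel's TRACE_EVENT_FORMAT
implementation; SCHED_WAKEUP_FIELDS, DECLARE_FIELD, ASSIGN_FIELD,
struct task and sched_wakeup_record are hypothetical names chosen for
illustration, and int stands in for pid_t.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct task_struct (assumption, not the kernel type). */
struct task {
	int pid;
};

/*
 * Hypothetical field list playing the role of the TRACE_STRUCT body.
 * Each entry is (type, item, assign), as in TRACE_FIELD.
 */
#define SCHED_WAKEUP_FIELDS(FIELD)		\
	FIELD(int, pid, p->pid)			\
	FIELD(int, success, success)

/* First expansion: declare one struct member per field. */
#define DECLARE_FIELD(type, item, assign)	type item;
struct sched_wakeup_entry {
	SCHED_WAKEUP_FIELDS(DECLARE_FIELD)
};

/* Second expansion: emit one assignment per field. */
#define ASSIGN_FIELD(type, item, assign)	entry->item = (assign);
static void sched_wakeup_record(struct sched_wakeup_entry *entry,
				const struct task *p, int success)
{
	assert(entry != NULL && p != NULL);
	SCHED_WAKEUP_FIELDS(ASSIGN_FIELD)
}
```

Because the same list is expanded twice, the struct layout and the
assignments cannot drift apart, which is the property described above.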
      
      The developer would only need to touch the files in include/trace/*.h
      
      Again, I would like to give special thanks to Tom Zanussi for this
      nice idea.
      
      Idea-from: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      c32e827b
    • S
      tracing: make the set_event and available_events subsystem aware · b628b3e6
      Steven Rostedt committed
      This patch makes the event files, set_event and available_events
      aware of the subsystem.
      
      Now you can enable an entire subsystem with:
      
        echo 'irq:*' > set_event
      
      Note: the '*' is not needed.
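
A rough sketch of the matching rule described above, written as
stand-alone C. It is not the kernel's set_event parser;
selector_matches is a hypothetical helper, and treating a selector
without a ':' as a plain event name is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical matcher: does `selector` (e.g. "irq:*", "irq:" or
 * "irq:irq_handler_entry") select event `name` in subsystem `system`?
 */
static bool selector_matches(const char *selector,
			     const char *system, const char *name)
{
	const char *colon;
	size_t sys_len;

	assert(selector && system && name);

	colon = strchr(selector, ':');
	if (!colon)
		/* Assumption: no ':' means a plain event name. */
		return strcmp(selector, name) == 0;

	/* The part before ':' must equal the subsystem name exactly. */
	sys_len = (size_t)(colon - selector);
	if (strlen(system) != sys_len ||
	    strncmp(selector, system, sys_len) != 0)
		return false;

	/* An empty event part or a lone '*' selects the whole subsystem. */
	if (colon[1] == '\0' || strcmp(colon + 1, "*") == 0)
		return true;

	return strcmp(colon + 1, name) == 0;
}
```

With this rule, 'irq:*' and 'irq:' both select every event in the irq
subsystem, which is why the '*' is optional.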
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      b628b3e6
    • S
      tracing: add subsystem level to trace events · 6ecc2d1c
      Steven Rostedt committed
      If a trace point header defines TRACE_SYSTEM, then the trace points
      defined after it are added to that event system.
      
      If include/trace/irq_event_types.h has:
      
       #define TRACE_SYSTEM irq
      
      at the top and
      
       #undef TRACE_SYSTEM
      
      at the bottom, then a directory "irq" will be created in the
      /debug/tracing/events directory. That directory will contain the
      two trace points that are defined in include/trace/irq_event_types.h.
      
      Only adding the above to irq and not to sched, we get:
      
       # ls /debug/tracing/events/
      irq                     sched_process_exit  sched_signal_send  sched_wakeup_new
      sched_kthread_stop      sched_process_fork  sched_switch
      sched_kthread_stop_ret  sched_process_free  sched_wait_task
      sched_migrate_task      sched_process_wait  sched_wakeup
      
       # ls /debug/tracing/events/irq
      irq_handler_entry  irq_handler_exit
      
      If we add #define TRACE_SYSTEM sched to the trace/sched_event_types.h
      then the rest of the trace events will be put in a sched directory
      within the events directory.
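
A small sketch of the define/undef bracketing and the resulting
directory layout, in stand-alone C. It is an illustration under
assumptions rather than the kernel's code: struct event_desc,
STRINGIFY, EVENTS_DIR and event_path are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EVENTS_DIR "/debug/tracing/events"

/* Two-level stringification so TRACE_SYSTEM is expanded before '#'. */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* Hypothetical per-event record. */
struct event_desc {
	const char *system;	/* NULL when no TRACE_SYSTEM was in effect */
	const char *name;
};

/* As in a header that brackets its events with TRACE_SYSTEM. */
#define TRACE_SYSTEM irq
static const struct event_desc irq_handler_entry_desc = {
	STRINGIFY(TRACE_SYSTEM), "irq_handler_entry"
};
#undef TRACE_SYSTEM

/* A header that has not been converted leaves the subsystem unset. */
static const struct event_desc sched_switch_desc = { NULL, "sched_switch" };

/* Build the path at which the event would appear. */
static int event_path(char *buf, size_t len, const struct event_desc *ev)
{
	assert(buf && ev && ev->name);
	if (ev->system)
		return snprintf(buf, len, "%s/%s/%s",
				EVENTS_DIR, ev->system, ev->name);
	return snprintf(buf, len, "%s/%s", EVENTS_DIR, ev->name);
}
```

Events recorded while TRACE_SYSTEM is defined land in the subsystem
directory; the others stay at the top level of the events directory,
matching the listing above.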
      
      I've been playing with this idea of the subsystem for a while, but
      recently Tom Zanussi posted some patches to lkml that included this
      method. Tom's approach was clean and got me to finally put some effort
      into cleaning up the event trace points.
      
      Thanks to Tom Zanussi for demonstrating how nice the subsystem
      method is.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      6ecc2d1c