1. 21 12月, 2013 2 次提交
    • T
      tracing: Add 'traceon' and 'traceoff' event trigger commands · 2a2df321
      Tom Zanussi 提交于
      Add 'traceon' and 'traceoff' event_command commands.  traceon and
      traceoff event triggers are added by the user via these commands in a
      similar way and using practically the same syntax as the analagous
      'traceon' and 'traceoff' ftrace function commands, but instead of
      writing to the set_ftrace_filter file, the traceon and traceoff
      triggers are written to the per-event 'trigger' files:
      
          echo 'traceon' > .../tracing/events/somesys/someevent/trigger
          echo 'traceoff' > .../tracing/events/somesys/someevent/trigger
      
      The above command will turn tracing on or off whenever someevent is
      hit.
      
      This also adds a 'count' version that limits the number of times the
      command will be invoked:
      
          echo 'traceon:N' > .../tracing/events/somesys/someevent/trigger
          echo 'traceoff:N' > .../tracing/events/somesys/someevent/trigger
      
      Where N is the number of times the command will be invoked.
      
      The above commands will will turn tracing on or off whenever someevent
      is hit, but only N times.
      
      Some common register/unregister_trigger() implementations of the
      event_command reg()/unreg() callbacks are also provided, which add and
      remove trigger instances to the per-event list of triggers, and
      arm/disarm them as appropriate.  event_trigger_callback() is a
      general-purpose event_command func() implementation that orchestrates
      command parsing and registration for most normal commands.
      
      Most event commands will use these, but some will override and
      possibly reuse them.
      
      The event_trigger_init(), event_trigger_free(), and
      event_trigger_print() functions are meant to be common implementations
      of the event_trigger_ops init(), free(), and print() ops,
      respectively.
      
      Most trigger_ops implementations will use these, but some will
      override and possibly reuse them.
      
      Link: http://lkml.kernel.org/r/00a52816703b98d2072947478dd6e2d70cde5197.1382622043.git.tom.zanussi@linux.intel.comSigned-off-by: NTom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2a2df321
    • T
      tracing: Add basic event trigger framework · 85f2b082
      Tom Zanussi 提交于
      Add a 'trigger' file for each trace event, enabling 'trace event
      triggers' to be set for trace events.
      
      'trace event triggers' are patterned after the existing 'ftrace
      function triggers' implementation except that triggers are written to
      per-event 'trigger' files instead of to a single file such as the
      'set_ftrace_filter' used for ftrace function triggers.
      
      The implementation is meant to be entirely separate from ftrace
      function triggers, in order to keep the respective implementations
      relatively simple and to allow them to diverge.
      
      The event trigger functionality is built on top of SOFT_DISABLE
      functionality.  It adds a TRIGGER_MODE bit to the ftrace_event_file
      flags which is checked when any trace event fires.  Triggers set for a
      particular event need to be checked regardless of whether that event
      is actually enabled or not - getting an event to fire even if it's not
      enabled is what's already implemented by SOFT_DISABLE mode, so trigger
      mode directly reuses that.  Event trigger essentially inherit the soft
      disable logic in __ftrace_event_enable_disable() while adding a bit of
      logic and trigger reference counting via tm_ref on top of that in a
      new trace_event_trigger_enable_disable() function.  Because the base
      __ftrace_event_enable_disable() code now needs to be invoked from
      outside trace_events.c, a wrapper is also added for those usages.
      
      The triggers for an event are actually invoked via a new function,
      event_triggers_call(), and code is also added to invoke them for
      ftrace_raw_event calls as well as syscall events.
      
      The main part of the patch creates a new trace_events_trigger.c file
      to contain the trace event triggers implementation.
      
      The standard open, read, and release file operations are implemented
      here.
      
      The open() implementation sets up for the various open modes of the
      'trigger' file.  It creates and attaches the trigger iterator and sets
      up the command parser.  If opened for reading set up the trigger
      seq_ops.
      
      The read() implementation parses the event trigger written to the
      'trigger' file, looks up the trigger command, and passes it along to
      that event_command's func() implementation for command-specific
      processing.
      
      The release() implementation does whatever cleanup is needed to
      release the 'trigger' file, like releasing the parser and trigger
      iterator, etc.
      
      A couple of functions for event command registration and
      unregistration are added, along with a list to add them to and a mutex
      to protect them, as well as an (initially empty) registration function
      to add the set of commands that will be added by future commits, and
      call to it from the trace event initialization code.
      
      also added are a couple trigger-specific data structures needed for
      these implementations such as a trigger iterator and a struct for
      trigger-specific data.
      
      A couple structs consisting mostly of function meant to be implemented
      in command-specific ways, event_command and event_trigger_ops, are
      used by the generic event trigger command implementations.  They're
      being put into trace.h alongside the other trace_event data structures
      and functions, in the expectation that they'll be needed in several
      trace_event-related files such as trace_events_trigger.c and
      trace_events.c.
      
      The event_command.func() function is meant to be called by the trigger
      parsing code in order to add a trigger instance to the corresponding
      event.  It essentially coordinates adding a live trigger instance to
      the event, and arming the triggering the event.
      
      Every event_command func() implementation essentially does the
      same thing for any command:
      
         - choose ops - use the value of param to choose either a number or
           count version of event_trigger_ops specific to the command
         - do the register or unregister of those ops
         - associate a filter, if specified, with the triggering event
      
      The reg() and unreg() ops allow command-specific implementations for
      event_trigger_op registration and unregistration, and the
      get_trigger_ops() op allows command-specific event_trigger_ops
      selection to be parameterized.  When a trigger instance is added, the
      reg() op essentially adds that trigger to the triggering event and
      arms it, while unreg() does the opposite.  The set_filter() function
      is used to associate a filter with the trigger - if the command
      doesn't specify a set_filter() implementation, the command will ignore
      filters.
      
      Each command has an associated trigger_type, which serves double duty,
      both as a unique identifier for the command as well as a value that
      can be used for setting a trigger mode bit during trigger invocation.
      
      The signature of func() adds a pointer to the event_command struct,
      used to invoke those functions, along with a command_data param that
      can be passed to the reg/unreg functions.  This allows func()
      implementations to use command-specific blobs and supports code
      re-use.
      
      The event_trigger_ops.func() command corrsponds to the trigger 'probe'
      function that gets called when the triggering event is actually
      invoked.  The other functions are used to list the trigger when
      needed, along with a couple mundane book-keeping functions.
      
      This also moves event_file_data() into trace.h so it can be used
      outside of trace_events.c.
      
      Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.comSigned-off-by: NTom Zanussi <tom.zanussi@linux.intel.com>
      Idea-by: NSteve Rostedt <rostedt@goodmis.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      85f2b082
  2. 19 11月, 2013 1 次提交
  3. 06 11月, 2013 1 次提交
    • T
      tracing: Update event filters for multibuffer · f306cc82
      Tom Zanussi 提交于
      The trace event filters are still tied to event calls rather than
      event files, which means you don't get what you'd expect when using
      filters in the multibuffer case:
      
      Before:
      
        # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # mkdir /sys/kernel/debug/tracing/instances/test1
        # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 2048
        # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        bytes_alloc > 2048
      
      Setting the filter in tracing/instances/test1/events shouldn't affect
      the same event in tracing/events as it does above.
      
      After:
      
        # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # mkdir /sys/kernel/debug/tracing/instances/test1
        # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        bytes_alloc > 2048
      
      We'd like to just move the filter directly from ftrace_event_call to
      ftrace_event_file, but there are a couple cases that don't yet have
      multibuffer support and therefore have to continue using the current
      event_call-based filters.  For those cases, a new USE_CALL_FILTER bit
      is added to the event_call flags, whose main purpose is to keep the
      old behavior for those cases until they can be updated with
      multibuffer support; at that point, the USE_CALL_FILTER flag (and the
      new associated call_filter_check_discard() function) can go away.
      
      The multibuffer support also made filter_current_check_discard()
      redundant, so this change removes that function as well and replaces
      it with filter_check_discard() (or call_filter_check_discard() as
      appropriate).
      
      Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.comSigned-off-by: NTom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f306cc82
  4. 03 8月, 2013 1 次提交
  5. 01 8月, 2013 1 次提交
  6. 27 7月, 2013 1 次提交
    • S
      tracing: Add __tracepoint_string() to export string pointers · 102c9323
      Steven Rostedt (Red Hat) 提交于
      There are several tracepoints (mostly in RCU), that reference a string
      pointer and uses the print format of "%s" to display the string that
      exists in the kernel, instead of copying the actual string to the
      ring buffer (saves time and ring buffer space).
      
      But this has an issue with userspace tools that read the binary buffers
      that has the address of the string but has no access to what the string
      itself is. The end result is just output that looks like:
      
       rcu_dyntick:          ffffffff818adeaa 1 0
       rcu_dyntick:          ffffffff818adeb5 0 140000000000000
       rcu_dyntick:          ffffffff818adeb5 0 140000000000000
       rcu_utilization:      ffffffff8184333b
       rcu_utilization:      ffffffff8184333b
      
      The above is pretty useless when read by the userspace tools. Ideally
      we would want something that looks like this:
      
       rcu_dyntick:          Start 1 0
       rcu_dyntick:          End 0 140000000000000
       rcu_dyntick:          Start 140000000000000 0
       rcu_callback:         rcu_preempt rhp=0xffff880037aff710 func=put_cred_rcu 0/4
       rcu_callback:         rcu_preempt rhp=0xffff880078961980 func=file_free_rcu 0/5
       rcu_dyntick:          End 0 1
      
      The trace_printk() which also only stores the address of the string
      format instead of recording the string into the buffer itself, exports
      the mapping of kernel addresses to format strings via the printk_format
      file in the debugfs tracing directory.
      
      The tracepoint strings can use this same method and output the format
      to the same file and the userspace tools will be able to decipher
      the address without any modification.
      
      The tracepoint strings need its own section to save the strings because
      the trace_printk section will cause the trace_printk() buffers to be
      allocated if anything exists within the section. trace_printk() is only
      used for debugging and should never exist in the kernel, we can not use
      the trace_printk sections.
      
      Add a new tracepoint_str section that will also be examined by the output
      of the printk_format file.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      102c9323
  7. 10 5月, 2013 1 次提交
    • M
      tracing: Modify soft-mode only if there's no other referrer · 1cf4c073
      Masami Hiramatsu 提交于
      Modify soft-mode flag only if no other soft-mode referrer
      (currently only the ftrace triggers) by using a reference
      counter in each ftrace_event_file.
      
      Without this fix, adding and removing several different
      enable/disable_event triggers on the same event clear
      soft-mode bit from the ftrace_event_file. This also
      happens with a typo of glob on setting triggers.
      
      e.g.
      
       # echo vfs_symlink:enable_event:net:netif_rx > set_ftrace_filter
       # cat events/net/netif_rx/enable
       0*
       # echo typo_func:enable_event:net:netif_rx > set_ftrace_filter
       # cat events/net/netif_rx/enable
       0
       # cat set_ftrace_filter
       #### all functions enabled ####
       vfs_symlink:enable_event:net:netif_rx:unlimited
      
      As above, we still have a trigger, but soft-mode is gone.
      
      Link: http://lkml.kernel.org/r/20130509054429.30398.7464.stgit@mhiramat-M0-7522
      
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: David Sharp <dhsharp@google.com>
      Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      Cc: Tom Zanussi <tom.zanussi@intel.com>
      Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      1cf4c073
  8. 20 4月, 2013 1 次提交
  9. 15 3月, 2013 8 次提交
    • S
      tracing: Add a way to soft disable trace events · 417944c4
      Steven Rostedt (Red Hat) 提交于
      In order to let triggers enable or disable events, we need a 'soft'
      method for doing so. For example, if a function probe is added that
      lets a user enable or disable events when a function is called, that
      change must be done without taking locks or a mutex, and definitely
      it can't sleep. But the full enabling of a tracepoint is expensive.
      
      By adding a 'SOFT_DISABLE' flag, and converting the flags to be updated
      without the protection of a mutex (using set/clear_bit()), this soft
      disable flag can be used to allow critical sections to enable or disable
      events from being traced (after the event has been placed into "SOFT_MODE").
      
      Some caveats though: The comm recorder (to map pids with a comm) can not
      be soft disabled (yet). If you disable an event with with a "soft"
      disable and wait a while before reading the trace, the comm cache may be
      replaced and you'll get a bunch of <...> for comms in the trace.
      
      Reading the "enable" file for an event that is disabled will now give
      you "0*" where the '*' denotes that the tracepoint is still active but
      the event itself is "disabled".
      
      [ fixed _BIT used in & operation : thanks to Dan Carpenter and smatch ]
      
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      417944c4
    • S
      tracing: Fix comments for ftrace_event_file/call flags · 57d01ad0
      Steven Rostedt (Red Hat) 提交于
      Most of the flags for the struct ftrace_event_file were moved over
      to the flags of the struct ftrace_event_call, but the comments were
      never updated.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      57d01ad0
    • S
      tracing: Consolidate max_tr into main trace_array structure · 12883efb
      Steven Rostedt (Red Hat) 提交于
      Currently, the way the latency tracers and snapshot feature works
      is to have a separate trace_array called "max_tr" that holds the
      snapshot buffer. For latency tracers, this snapshot buffer is used
      to swap the running buffer with this buffer to save the current max
      latency.
      
      The only items needed for the max_tr is really just a copy of the buffer
      itself, the per_cpu data pointers, the time_start timestamp that states
      when the max latency was triggered, and the cpu that the max latency
      was triggered on. All other fields in trace_array are unused by the
      max_tr, making the max_tr mostly bloat.
      
      This change removes the max_tr completely, and adds a new structure
      called trace_buffer, that holds the buffer pointer, the per_cpu data
      pointers, the time_start timestamp, and the cpu where the latency occurred.
      
      The trace_array, now has two trace_buffers, one for the normal trace and
      one for the max trace or snapshot. By doing this, not only do we remove
      the bloat from the max_trace but the instances of traces can now use
      their own snapshot feature and not have just the top level global_trace have
      the snapshot feature and latency tracers for itself.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      12883efb
    • S
      tracing: Only clear trace buffer on module unload if event was traced · 575380da
      Steven Rostedt (Red Hat) 提交于
      Currently, when a module with events is unloaded, the trace buffer is
      cleared. This is just a safety net in case the module might have some
      strange callback when its event is outputted. But there's no reason
      to reset the buffer if the module didn't have any of its events traced.
      
      Add a flag to the event "call" structure called WAS_ENABLED and gets set
      when the event is ever enabled, and this flag never gets cleared. When a
      module gets unloaded, if any of its events have this flag set, then the
      trace buffer will get cleared.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      575380da
    • S
      tracing: Add comment for trace event flag IGNORE_ENABLE · 2a30c11f
      Steven Rostedt (Red Hat) 提交于
      All the trace event flags have comments but the IGNORE_ENABLE flag
      which is set for ftrace internal events that should not be enabled
      via the debugfs "enable" file. That is, if the top level enable file
      is set, it will enable all events. It use to just check the ftrace
      event call descriptor "reg" field and skip those whithout it, but now
      some ftrace internal events have a reg field but still need to be
      skipped. The flag was created to ignore those events.
      
      Now document it.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2a30c11f
    • L
      tracing: Add a helper function for event print functions · f71130de
      Li Zefan 提交于
      Move duplicate code in event print functions to a helper function.
      
      This shrinks the size of the kernel by ~13K.
      
         text    data     bss     dec     hex filename
      6596137 1743966 10138672        18478775        119f6b7 vmlinux.o.old
      6583002 1743849 10138672        18465523        119c2f3 vmlinux.o.new
      
      Link: http://lkml.kernel.org/r/51258746.2060304@huawei.comSigned-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f71130de
    • S
      tracing: Pass the ftrace_file to the buffer lock reserve code · ccb469a1
      Steven Rostedt 提交于
      Pass the struct ftrace_event_file *ftrace_file to the
      trace_event_buffer_lock_reserve() (new function that replaces the
      trace_current_buffer_lock_reserver()).
      
      The ftrace_file holds a pointer to the trace_array that is in use.
      In the case of multiple buffers with different trace_arrays, this
      allows different events to be recorded into different buffers.
      
      Also fixed some of the stale comments in include/trace/ftrace.h
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ccb469a1
    • S
      tracing: Separate out trace events from global variables · ae63b31e
      Steven Rostedt 提交于
      The trace events for ftrace are all defined via global variables.
      The arrays of events and event systems are linked to a global list.
      This prevents multiple users of the event system (what to enable and
      what not to).
      
      By adding descriptors to represent the event/file relation, as well
      as to which trace_array descriptor they are associated with, allows
      for more than one set of events to be defined. Once the trace events
      files have a link between the trace event and the trace_array they
      are associated with, we can create multiple trace_arrays that can
      record separate events in separate buffers.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ae63b31e
  10. 31 1月, 2013 1 次提交
    • H
      tracing: Make a snapshot feature available from userspace · debdd57f
      Hiraku Toyooka 提交于
      Ftrace has a snapshot feature available from kernel space and
      latency tracers (e.g. irqsoff) are using it. This patch enables
      user applictions to take a snapshot via debugfs.
      
      Add "snapshot" debugfs file in "tracing" directory.
      
        snapshot:
          This is used to take a snapshot and to read the output of the
          snapshot.
      
           # echo 1 > snapshot
      
          This will allocate the spare buffer for snapshot (if it is
          not allocated), and take a snapshot.
      
           # cat snapshot
      
          This will show contents of the snapshot.
      
           # echo 0 > snapshot
      
          This will free the snapshot if it is allocated.
      
          Any other positive values will clear the snapshot contents if
          the snapshot is allocated, or return EINVAL if it is not allocated.
      
      Link: http://lkml.kernel.org/r/20121226025300.3252.86850.stgit@liselsia
      
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: David Sharp <dhsharp@google.com>
      Signed-off-by: NHiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      [
         Fixed irqsoff selftest and also a conflict with a change
         that fixes the update_max_tr.
      ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      debdd57f
  11. 22 1月, 2013 2 次提交
    • S
      tracing: Remove the extra 4 bytes of padding in events · b000c806
      Steven Rostedt 提交于
      Due to a userspace issue with PowerTop v2beta, which hardcoded
      the offset of event fields that it was using, it broke when
      we removed the Big Kernel Lock counter from the event header.
      
       (commit e6e1e259 "tracing: Remove lock_depth from event entry")
      
      Because this broke userspace, it was determined that we must
      keep those 4 bytes around.
      
       (commit a3a4a5ac "Regression: partial revert "tracing: Remove lock_depth from event entry"")
      
      This unfortunately wastes space in the ring buffer. 4 bytes per
      event, where a lot of events are just 24 bytes. That's 16% of the
      buffer wasted. A million events will add 4 megs of white space
      into the buffer.
      
      It was later noticed that PowerTop v2beta could not work on systems
      where the kernel was 64 bit but the userspace was 32 bits.
      The reason was because the offsets are different between the
      two and the hard coded offset of one would not work with the other.
      
      With PowerTop v2 final, it implemented the same interface that both
      perf and trace-cmd use. That is, it reads the format file of
      the event to find the offsets of the fields it needs. This fixes
      the problem with running powertop on a 32 bit userspace running
      on a 64 bit kernel. It also no longer requires the 4 byte padding.
      
      As PowerTop v2 has been out for a while, and is included in all
      major distributions, it is time that we can safely remove the
      4 bytes of padding. Users of PowerTop v2beta should upgrade to
      PowerTop v2 final.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      b000c806
    • S
      tracing: Fix sparse warning with is_signed_type() macro · 418c59e4
      Steven Rostedt 提交于
      Sparse complains when is_signed_type() is used on a pointer.
      This macro is needed for the format output used for ftrace
      and perf, to know if a binary field is a signed type or not.
      The is_signed_type() macro is used against all fields that are
      recorded by events to automate the operation.
      
      The problem sparse has is with the current way is_signed_type()
      works:
      
        ((type)-1 < 0)
      
      If "type" is a poiner, than sparse does not like it being compared
      to an integer (zero). The simple fix is to just give zero the
      same type. The runtime result stays the same.
      Reported-by: NRobert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      418c59e4
  12. 14 11月, 2012 1 次提交
    • D
      tracing: Format non-nanosec times from tsc clock without a decimal point. · 8be0709f
      David Sharp 提交于
      With the addition of the "tsc" clock, formatting timestamps to look like
      fractional seconds is misleading. Mark clocks as either in nanoseconds or
      not, and format non-nanosecond timestamps as decimal integers.
      
      Tested:
      $ cd /sys/kernel/debug/tracing/
      $ cat trace_clock
      [local] global tsc
      $ echo sched_switch > set_event
      $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
      $ cat trace
                <idle>-0     [000]  6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
                 sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      $ echo 1 > options/latency-format
      $ cat trace
        <idle>-0       0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
         sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      $ echo tsc > trace_clock
      $ cat trace
      $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
      $ echo 0 > options/latency-format
      $ cat trace
                <idle>-0     [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
                 sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      echo 1 > options/latency-format
      $ cat trace
        <idle>-0       0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
         sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      
      v2:
      Move arch-specific bits out of generic code.
      v4:
      Fix x86_32 build due to 64-bit division.
      
      Google-Bug-Id: 6980623
      Link: http://lkml.kernel.org/r/1352837903-32191-2-git-send-email-dhsharp@google.com
      
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NDavid Sharp <dhsharp@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      8be0709f
  13. 02 11月, 2012 1 次提交
    • S
      tracing: Use irq_work for wake ups and remove *_nowake_*() functions · 0d5c6e1c
      Steven Rostedt 提交于
      Have the ring buffer commit function use the irq_work infrastructure to
      wake up any waiters waiting on the ring buffer for new data. The irq_work
      was created for such a purpose, where doing the actual wake up at the
      time of adding data is too dangerous, as an event or function trace may
      be in the midst of the work queue locks and cause deadlocks. The irq_work
      will either delay the action to the next timer interrupt, or trigger an IPI
      to itself forcing an interrupt to do the work (in a safe location).
      
      With irq_work, all ring buffer commits can safely do wakeups, removing
      the need for the ring buffer commit "nowake" variants, which were used
      by events and function tracing. All commits can now safely use the
      normal commit, and the "nowake" variants can be removed.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0d5c6e1c
  14. 31 7月, 2012 1 次提交
  15. 29 6月, 2012 1 次提交
    • S
      tracing: Remove NR_CPUS array from trace_iterator · 6d158a81
      Steven Rostedt 提交于
      Replace the NR_CPUS array of buffer_iter from the trace_iterator
      with an allocated array. This will just create an array of
      possible CPUS instead of the max number specified.
      
      The use of NR_CPUS in that array caused allocation failures for
      machines that were tight on memory. This did not cause any failures
      to the system itself (no crashes), but caused unnecessary failures
      for reading the trace files.
      
      Added a helper function called 'trace_buffer_iter()' that returns
      the buffer_iter item or NULL if it is not defined or the array was
      not allocated. Some routines do not require the array
      (tracing_open_pipe() for one).
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      6d158a81
  16. 15 6月, 2012 1 次提交
    • S
      tracing: Add comments for the other bits of ftrace_event_call.flags · 5da43bed
      Steven Rostedt 提交于
      	TRACE_EVENT_FL_ENABLED_BIT,
      	TRACE_EVENT_FL_FILTERED_BIT,
      	TRACE_EVENT_FL_RECORDED_CMD_BIT,
      
      Have comments about what they are, but:
      
      	TRACE_EVENT_FL_CAP_ANY_BIT,
      	TRACE_EVENT_FL_NO_SET_FILTER_BIT,
      	TRACE_EVENT_FL_IGNORE_ENABLE_BIT,
      
      do not, making them second class citizens. To prevent another
      class warfare, these bits have protested for their right to be
      commented. And By Golly! I'll give them what they want!
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      5da43bed
  17. 11 5月, 2012 1 次提交
    • S
      tracing: Do not enable function event with enable · 9b63776f
      Steven Rostedt 提交于
      With the adding of function tracing event to perf, it caused a
      side effect that produces the following warning when enabling all
      events in ftrace:
      
       # echo 1 > /sys/kernel/debug/tracing/events/enable
      
      [console]
      event trace: Could not enable event function
      
      This is because when enabling all events via the debugfs system
      it ignores events that do not have a ->reg() function assigned.
      This was to skip over the ftrace internal events (as they are
      not TRACE_EVENTs). But as the ftrace function event now has
      a ->reg() function attached to it for use with perf, it is no
      longer ignored.
      
      Worse yet, this ->reg() function is being called when it should
      not be. It returns an error and causes the above warning to
      be printed.
      
      By adding a new event_call flag (TRACE_EVENT_FL_IGNORE_ENABLE)
      and have all ftrace internel event structures have it set,
      setting the events/enable will no longe try to incorrectly enable
      the function event and does not warn.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      9b63776f
  18. 14 3月, 2012 1 次提交
  19. 22 2月, 2012 3 次提交
  20. 06 12月, 2011 1 次提交
  21. 03 11月, 2011 1 次提交
  22. 15 7月, 2011 1 次提交
    • S
      tracing: Have dynamic size event stack traces · 4a9bd3f1
      Steven Rostedt 提交于
      Currently the stack trace per event in ftace is only 8 frames.
      This can be quite limiting and sometimes useless. Especially when
      the "ignore frames" is wrong and we also use up stack frames for
      the event processing itself.
      
      Change this to be dynamic by adding a percpu buffer that we can
      write a large stack frame into and then copy into the ring buffer.
      
      For interrupts and NMIs that come in while another event is being
      process, will only get to use the 8 frame stack. That should be enough
      as the task that it interrupted will have the full stack frame anyway.
      Requested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      4a9bd3f1
  23. 15 6月, 2011 1 次提交
    • M
      tracing/kprobes: Fix kprobe-tracer to support stack trace · 1fd8df2c
      Masami Hiramatsu 提交于
      Fix to support kernel stack trace correctly on kprobe-tracer.
      Since the execution path of kprobe-based dynamic events is different
      from other tracepoint-based events, normal ftrace_trace_stack() doesn't
      work correctly. To fix that, this introduces ftrace_trace_stack_regs()
      which traces stack via pt_regs instead of current stack register.
      
      e.g.
      
       # echo p schedule+4 > /sys/kernel/debug/tracing/kprobe_events
       # echo 1 > /sys/kernel/debug/tracing/options/stacktrace
       # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
       # head -n 20 /sys/kernel/debug/tracing/trace
                  bash-2968  [000] 10297.050245: p_schedule_4: (schedule+0x4/0x4ca)
                  bash-2968  [000] 10297.050247: <stack trace>
       => schedule_timeout
       => n_tty_read
       => tty_read
       => vfs_read
       => sys_read
       => system_call_fastpath
           kworker/0:1-2940  [000] 10297.050265: p_schedule_4: (schedule+0x4/0x4ca)
           kworker/0:1-2940  [000] 10297.050266: <stack trace>
       => worker_thread
       => kthread
       => kernel_thread_helper
                  sshd-1132  [000] 10297.050365: p_schedule_4: (schedule+0x4/0x4ca)
                  sshd-1132  [000] 10297.050365: <stack trace>
       => sysret_careful
      
      Note: Even with this fix, the first entry will be skipped
      if the probe is put on the function entry area before
      the frame pointer is set up (usually, that is 4 bytes
       (push %bp; mov %sp %bp) on x86), because stack unwinder
      depends on the frame pointer.
      Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: yrl.pp-manager.tt@hitachi.com
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Link: http://lkml.kernel.org/r/20110608070934.17777.17116.stgit@fedora15Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      1fd8df2c
  24. 26 5月, 2011 1 次提交
  25. 07 5月, 2011 1 次提交
  26. 10 3月, 2011 1 次提交
    • S
      tracing: Remove lock_depth from event entry · e6e1e259
      Steven Rostedt 提交于
      The lock_depth field in the event headers was added as a temporary
      data point for help in removing the BKL. Now that the BKL is pretty
      much been removed, we can remove this field.
      
      This in turn changes the header from 12 bytes to 8 bytes,
      removing the 4 byte buffer that gcc would insert if the first field
      in the data load was 8 bytes in size.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e6e1e259
  27. 08 2月, 2011 1 次提交
  28. 19 11月, 2010 1 次提交
    • S
      tracing/events: Show real number in array fields · 04295780
      Steven Rostedt 提交于
      Currently we have in something like the sched_switch event:
      
        field:char prev_comm[TASK_COMM_LEN];	offset:12;	size:16;	signed:1;
      
      When a userspace tool such as perf tries to parse this, the
      TASK_COMM_LEN is meaningless. This is done because the TRACE_EVENT() macro
      simply uses a #len to show the string of the length. When the length is
      an enum, we get a string that means nothing for tools.
      
      By adding a static buffer and a mutex to protect it, we can store the
      string into that buffer with snprintf and show the actual number.
      Now we get:
      
        field:char prev_comm[16];       offset:12;      size:16;        signed:1;
      
      Something much more useful.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      04295780
  29. 18 11月, 2010 1 次提交
    • F
      tracing: Allow syscall trace events for non privileged users · 53cf810b
      Frederic Weisbecker 提交于
      As for the raw syscalls events, individual syscall events won't
      leak system wide information on task bound tracing. Allow non
      privileged users to use them in such workflow.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      53cf810b