1. 06 Nov, 2013 1 commit
    • tracing: Update event filters for multibuffer · f306cc82
      Committed by Tom Zanussi
      The trace event filters are still tied to event calls rather than
      event files, which means you don't get what you'd expect when using
      filters in the multibuffer case:
      
      Before:
      
        # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # mkdir /sys/kernel/debug/tracing/instances/test1
        # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 2048
        # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        bytes_alloc > 2048
      
      Setting the filter in tracing/instances/test1/events shouldn't affect
      the same event in tracing/events as it does above.
      
      After:
      
        # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # mkdir /sys/kernel/debug/tracing/instances/test1
        # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        bytes_alloc > 2048
      
      We'd like to just move the filter directly from ftrace_event_call to
      ftrace_event_file, but there are a couple cases that don't yet have
      multibuffer support and therefore have to continue using the current
      event_call-based filters.  For those cases, a new USE_CALL_FILTER bit
      is added to the event_call flags, whose main purpose is to keep the
      old behavior for those cases until they can be updated with
      multibuffer support; at that point, the USE_CALL_FILTER flag (and the
      new associated call_filter_check_discard() function) can go away.
      
      The multibuffer support also made filter_current_check_discard()
      redundant, so this change removes that function as well and replaces
      it with filter_check_discard() (or call_filter_check_discard() as
      appropriate).
      
      Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f306cc82
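
The difference between call-based and file-based filters can be sketched in plain C. This is a toy userspace model, not kernel code: the `toy_` types, the `TOY_USE_CALL_FILTER` flag value, and the single `min_bytes` predicate are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed filter such as 'bytes_alloc > 8192'. */
struct toy_filter {
	unsigned long min_bytes;
};

#define TOY_USE_CALL_FILTER 0x1u	/* illustrative flag value */

/* One per event, shared by every buffer instance. */
struct toy_event_call {
	unsigned int flags;
	const struct toy_filter *filter;	/* legacy, call-wide filter */
};

/* One per (event, instance) pair, so each instance can filter on its own. */
struct toy_event_file {
	const struct toy_event_call *call;
	const struct toy_filter *filter;	/* per-instance filter */
};

/* Returns true when the event should be dropped for this instance. */
static bool toy_filter_check_discard(const struct toy_event_file *file,
				     unsigned long bytes_alloc)
{
	const struct toy_filter *f = file->filter;

	/* Events without multibuffer support keep using the shared filter. */
	if (file->call->flags & TOY_USE_CALL_FILTER)
		f = file->call->filter;

	return f != NULL && bytes_alloc <= f->min_bytes;
}
```

With two files that point at the same call, a 4096-byte allocation is discarded by a file filtering on `> 8192` and kept by a file filtering on `> 2048`, which mirrors the "After" transcript above.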
  2. 03 Aug, 2013 1 commit
  3. 01 Aug, 2013 1 commit
  4. 27 Jul, 2013 1 commit
    • tracing: Add __tracepoint_string() to export string pointers · 102c9323
      Committed by Steven Rostedt (Red Hat)
      There are several tracepoints (mostly in RCU) that reference a string
      pointer and use the print format of "%s" to display the string that
      exists in the kernel, instead of copying the actual string to the
      ring buffer (saves time and ring buffer space).
      
      But this is a problem for userspace tools that read the binary buffers:
      they get the address of the string but have no access to the string
      itself. The end result is just output that looks like:
      
       rcu_dyntick:          ffffffff818adeaa 1 0
       rcu_dyntick:          ffffffff818adeb5 0 140000000000000
       rcu_dyntick:          ffffffff818adeb5 0 140000000000000
       rcu_utilization:      ffffffff8184333b
       rcu_utilization:      ffffffff8184333b
      
      The above is pretty useless when read by the userspace tools. Ideally
      we would want something that looks like this:
      
       rcu_dyntick:          Start 1 0
       rcu_dyntick:          End 0 140000000000000
       rcu_dyntick:          Start 140000000000000 0
       rcu_callback:         rcu_preempt rhp=0xffff880037aff710 func=put_cred_rcu 0/4
       rcu_callback:         rcu_preempt rhp=0xffff880078961980 func=file_free_rcu 0/5
       rcu_dyntick:          End 0 1
      
      trace_printk(), which also stores only the address of the format
      string instead of recording the string into the buffer itself, exports
      the mapping of kernel addresses to format strings via the printk_formats
      file in the debugfs tracing directory.
      
      The tracepoint strings can use this same method and output the format
      to the same file, and the userspace tools will be able to decipher
      the address without any modification.
      
      The tracepoint strings need their own section to save the strings because
      the trace_printk section will cause the trace_printk() buffers to be
      allocated if anything exists within the section. trace_printk() is only
      used for debugging and should never exist in the kernel, so we cannot use
      the trace_printk sections.
      
      Add a new tracepoint_str section that will also be examined by the output
      of the printk_formats file.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      102c9323
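
A userspace tool could resolve such addresses by reading the exported mapping one line at a time. The sketch below is only a guess at how that might look: the line layout `0x<address> : "<string>"`, the 63-character limit, and the function name are assumptions, and the pairing of `ffffffff818adeaa` with "Start" is purely illustrative.

```c
#include <stdio.h>

/*
 * Parse one line of the assumed form
 *   0xffffffff818adeaa : "Start"
 * Returns 1 and fills addr/str on success, 0 otherwise.
 */
static int toy_parse_format_line(const char *line,
				 unsigned long long *addr, char str[64])
{
	return sscanf(line, "0x%llx : \"%63[^\"]\"", addr, str) == 2;
}
```

A reader would then build a table from these pairs and substitute the string wherever an event field carries a matching address.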
  5. 10 May, 2013 1 commit
    • tracing: Modify soft-mode only if there's no other referrer · 1cf4c073
      Committed by Masami Hiramatsu
      Modify soft-mode flag only if no other soft-mode referrer
      (currently only the ftrace triggers) by using a reference
      counter in each ftrace_event_file.
      
      Without this fix, adding and removing several different
      enable/disable_event triggers on the same event clears the
      soft-mode bit from the ftrace_event_file. This also
      happens when a trigger is set with a mistyped glob.
      
      e.g.
      
       # echo vfs_symlink:enable_event:net:netif_rx > set_ftrace_filter
       # cat events/net/netif_rx/enable
       0*
       # echo typo_func:enable_event:net:netif_rx > set_ftrace_filter
       # cat events/net/netif_rx/enable
       0
       # cat set_ftrace_filter
       #### all functions enabled ####
       vfs_symlink:enable_event:net:netif_rx:unlimited
      
      As above, we still have a trigger, but soft-mode is gone.
      
      Link: http://lkml.kernel.org/r/20130509054429.30398.7464.stgit@mhiramat-M0-7522
      
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: David Sharp <dhsharp@google.com>
      Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      Cc: Tom Zanussi <tom.zanussi@intel.com>
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      1cf4c073
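
The reference-counting idea can be shown with a small model. It is a sketch under stated assumptions: the `toy_` names and the flag value are hypothetical, and a plain `int` is used for the counter for simplicity.

```c
#define TOY_SOFT_MODE 0x1u	/* illustrative flag value */

struct toy_event_file {
	unsigned int flags;
	int sm_ref;	/* number of triggers that rely on soft-mode */
};

/* Called when a trigger starts relying on soft-mode. */
static void toy_soft_mode_get(struct toy_event_file *file)
{
	if (file->sm_ref++ == 0)
		file->flags |= TOY_SOFT_MODE;
}

/* Called when a trigger goes away; only the last one clears the bit. */
static void toy_soft_mode_put(struct toy_event_file *file)
{
	if (file->sm_ref > 0 && --file->sm_ref == 0)
		file->flags &= ~TOY_SOFT_MODE;
}
```

With two triggers registered, removing one leaves the soft-mode bit set, which is the behavior the commit message asks for.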
  6. 20 Apr, 2013 1 commit
  7. 15 Mar, 2013 8 commits
    • tracing: Add a way to soft disable trace events · 417944c4
      Committed by Steven Rostedt (Red Hat)
      In order to let triggers enable or disable events, we need a 'soft'
      method for doing so. For example, if a function probe is added that
      lets a user enable or disable events when a function is called, that
      change must be done without taking locks or a mutex, and definitely
      it can't sleep. But the full enabling of a tracepoint is expensive.
      
      By adding a 'SOFT_DISABLE' flag, and converting the flags to be updated
      without the protection of a mutex (using set/clear_bit()), this soft
      disable flag can be used to allow critical sections to enable or disable
      events from being traced (after the event has been placed into "SOFT_MODE").
      
      Some caveats though: The comm recorder (to map pids with a comm) can not
      be soft disabled (yet). If you disable an event with a "soft"
      disable and wait a while before reading the trace, the comm cache may be
      replaced and you'll get a bunch of <...> for comms in the trace.
      
      Reading the "enable" file for an event that is disabled will now give
      you "0*" where the '*' denotes that the tracepoint is still active but
      the event itself is "disabled".
      
      [ fixed _BIT used in & operation : thanks to Dan Carpenter and smatch ]
      
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      417944c4
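
Two points from this message lend themselves to a short illustration: the difference between a bit number and a bit mask (the `_BIT` fix credited to smatch), and the "0*" text shown by the "enable" file. The names, bit positions, and the exact mapping to text below are assumptions chosen for the sketch.

```c
enum {
	TOY_FL_ENABLED_BIT,		/* bit number 0 */
	TOY_FL_SOFT_MODE_BIT,		/* bit number 1 */
	TOY_FL_SOFT_DISABLED_BIT,	/* bit number 2 */
};

/* Masks derived from the bit numbers; these are what '&' must be used with. */
#define TOY_FL_ENABLED		(1u << TOY_FL_ENABLED_BIT)
#define TOY_FL_SOFT_MODE	(1u << TOY_FL_SOFT_MODE_BIT)
#define TOY_FL_SOFT_DISABLED	(1u << TOY_FL_SOFT_DISABLED_BIT)

/* Text a reader of the "enable" file would see for a given flag word. */
static const char *toy_enable_text(unsigned int flags)
{
	if (!(flags & TOY_FL_ENABLED))
		return "0";
	return (flags & TOY_FL_SOFT_DISABLED) ? "0*" : "1";
}
```

Writing `flags & TOY_FL_SOFT_DISABLED_BIT` by mistake would test the value 2, which in this layout is the soft-mode mask rather than the soft-disabled one.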
    • tracing: Fix comments for ftrace_event_file/call flags · 57d01ad0
      Committed by Steven Rostedt (Red Hat)
      Most of the flags for the struct ftrace_event_file were moved over
      to the flags of the struct ftrace_event_call, but the comments were
      never updated.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      57d01ad0
    • tracing: Consolidate max_tr into main trace_array structure · 12883efb
      Committed by Steven Rostedt (Red Hat)
      Currently, the way the latency tracers and snapshot feature work
      is to have a separate trace_array called "max_tr" that holds the
      snapshot buffer. For latency tracers, this snapshot buffer is used
      to swap the running buffer with this buffer to save the current max
      latency.
      
      The only items needed for the max_tr are really just a copy of the buffer
      itself, the per_cpu data pointers, the time_start timestamp that states
      when the max latency was triggered, and the cpu that the max latency
      was triggered on. All other fields in trace_array are unused by the
      max_tr, making the max_tr mostly bloat.
      
      This change removes the max_tr completely, and adds a new structure
      called trace_buffer that holds the buffer pointer, the per_cpu data
      pointers, the time_start timestamp, and the cpu where the latency occurred.
      
      The trace_array now has two trace_buffers, one for the normal trace and
      one for the max trace or snapshot. By doing this, not only do we remove
      the bloat from the max_tr, but the instances of traces can now use
      their own snapshot feature instead of only the top level global_trace having
      the snapshot feature and latency tracers for itself.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      12883efb
    • tracing: Only clear trace buffer on module unload if event was traced · 575380da
      Committed by Steven Rostedt (Red Hat)
      Currently, when a module with events is unloaded, the trace buffer is
      cleared. This is just a safety net in case the module might have some
      strange callback when its event is outputted. But there's no reason
      to reset the buffer if the module didn't have any of its events traced.
      
      Add a flag to the event "call" structure called WAS_ENABLED that gets set
      whenever the event is enabled and is never cleared. When a
      module gets unloaded, if any of its events have this flag set, then the
      trace buffer will get cleared.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      575380da
    • tracing: Add comment for trace event flag IGNORE_ENABLE · 2a30c11f
      Committed by Steven Rostedt (Red Hat)
      All the trace event flags have comments except the IGNORE_ENABLE flag,
      which is set for ftrace internal events that should not be enabled
      via the debugfs "enable" file. That is, if the top level enable file
      is set, it will enable all events. It used to just check the ftrace
      event call descriptor "reg" field and skip those without it, but now
      some ftrace internal events have a reg field but still need to be
      skipped. The flag was created to ignore those events.
      
      Now document it.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      2a30c11f
    • tracing: Add a helper function for event print functions · f71130de
      Committed by Li Zefan
      Move duplicate code in event print functions to a helper function.
      
      This shrinks the size of the kernel by ~13K.
      
         text    data      bss      dec     hex filename
      6596137 1743966 10138672 18478775 119f6b7 vmlinux.o.old
      6583002 1743849 10138672 18465523 119c2f3 vmlinux.o.new
      
      Link: http://lkml.kernel.org/r/51258746.2060304@huawei.com
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f71130de
    • tracing: Pass the ftrace_file to the buffer lock reserve code · ccb469a1
      Committed by Steven Rostedt
      Pass the struct ftrace_event_file *ftrace_file to
      trace_event_buffer_lock_reserve() (a new function that replaces
      trace_current_buffer_lock_reserve()).
      
      The ftrace_file holds a pointer to the trace_array that is in use.
      In the case of multiple buffers with different trace_arrays, this
      allows different events to be recorded into different buffers.
      
      Also fixed some of the stale comments in include/trace/ftrace.h
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      ccb469a1
    • tracing: Separate out trace events from global variables · ae63b31e
      Committed by Steven Rostedt
      The trace events for ftrace are all defined via global variables.
      The arrays of events and event systems are linked to a global list.
      This prevents multiple users of the event system from each choosing
      what to enable and what not to.
      
      Adding descriptors that represent the event/file relation, as well
      as the trace_array descriptor they are associated with, allows
      for more than one set of events to be defined. Once the trace event
      files have a link between the trace event and the trace_array they
      are associated with, we can create multiple trace_arrays that can
      record separate events in separate buffers.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      ae63b31e
  8. 31 Jan, 2013 1 commit
    • tracing: Make a snapshot feature available from userspace · debdd57f
      Committed by Hiraku Toyooka
      Ftrace has a snapshot feature available from kernel space and
      latency tracers (e.g. irqsoff) are using it. This patch enables
      user applications to take a snapshot via debugfs.
      
      Add "snapshot" debugfs file in "tracing" directory.
      
        snapshot:
          This is used to take a snapshot and to read the output of the
          snapshot.
      
           # echo 1 > snapshot
      
          This will allocate the spare buffer for snapshot (if it is
          not allocated), and take a snapshot.
      
           # cat snapshot
      
          This will show contents of the snapshot.
      
           # echo 0 > snapshot
      
          This will free the snapshot if it is allocated.
      
          Any other positive value will clear the snapshot contents if
          the snapshot is allocated, or return EINVAL if it is not allocated.
      
      Link: http://lkml.kernel.org/r/20121226025300.3252.86850.stgit@liselsia
      
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: David Sharp <dhsharp@google.com>
      Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      [
         Fixed irqsoff selftest and also a conflict with a change
         that fixes the update_max_tr.
      ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      debdd57f
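
The write semantics described for the "snapshot" file amount to a three-way decision on the written value. The following is a userspace model of that decision only, assuming a hypothetical `toy_snapshot` state with two booleans; it does not touch any real buffer.

```c
#include <errno.h>
#include <stdbool.h>

struct toy_snapshot {
	bool allocated;	/* spare buffer exists */
	bool has_data;	/* spare buffer holds a snapshot */
};

/* Model of writing a value to the snapshot file; returns 0 or -EINVAL. */
static int toy_snapshot_write(struct toy_snapshot *s, unsigned long val)
{
	switch (val) {
	case 0:		/* free the snapshot buffer if it is allocated */
		s->allocated = false;
		s->has_data = false;
		return 0;
	case 1:		/* allocate if needed, then take a snapshot */
		s->allocated = true;
		s->has_data = true;
		return 0;
	default:	/* clear contents, or fail if nothing is allocated */
		if (!s->allocated)
			return -EINVAL;
		s->has_data = false;
		return 0;
	}
}
```

Writing 2 before any snapshot has been taken fails with `-EINVAL` in this model, while writing 2 after `echo 1` keeps the buffer allocated but empties it.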
  9. 22 Jan, 2013 2 commits
    • tracing: Remove the extra 4 bytes of padding in events · b000c806
      Committed by Steven Rostedt
      Due to a userspace issue with PowerTop v2beta, which hardcoded
      the offset of event fields that it was using, it broke when
      we removed the Big Kernel Lock counter from the event header.
      
       (commit e6e1e259 "tracing: Remove lock_depth from event entry")
      
      Because this broke userspace, it was determined that we must
      keep those 4 bytes around.
      
       (commit a3a4a5ac "Regression: partial revert "tracing: Remove lock_depth from event entry"")
      
      This unfortunately wastes space in the ring buffer. 4 bytes per
      event, where a lot of events are just 24 bytes. That's 16% of the
      buffer wasted. A million events will add 4 megs of white space
      into the buffer.
      
      It was later noticed that PowerTop v2beta could not work on systems
      where the kernel was 64 bit but the userspace was 32 bits.
      The reason was because the offsets are different between the
      two and the hard coded offset of one would not work with the other.
      
      With PowerTop v2 final, it implemented the same interface that both
      perf and trace-cmd use. That is, it reads the format file of
      the event to find the offsets of the fields it needs. This fixes
      the problem with running powertop on a 32 bit userspace running
      on a 64 bit kernel. It also no longer requires the 4 byte padding.
      
      As PowerTop v2 has been out for a while, and is included in all
      major distributions, it is time that we can safely remove the
      4 bytes of padding. Users of PowerTop v2beta should upgrade to
      PowerTop v2 final.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      b000c806
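
The cost figures quoted in this message can be checked with a little integer arithmetic. The helpers below are hypothetical and assume that the 24-byte event size already includes the 4 bytes of padding.

```c
/* Share of the buffer lost to padding, in tenths of a percent. */
static unsigned int toy_waste_permille(unsigned int pad, unsigned int event_size)
{
	return pad * 1000u / event_size;
}

/* Total padding bytes written for a given number of events. */
static unsigned long long toy_waste_bytes(unsigned int pad,
					  unsigned long long events)
{
	return (unsigned long long)pad * events;
}
```

For 4 bytes in a 24-byte event the first helper gives 166, that is roughly 16.6%, and one million events carry 4,000,000 bytes of padding, in line with the "16%" and "4 megs" in the text.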
    • tracing: Fix sparse warning with is_signed_type() macro · 418c59e4
      Committed by Steven Rostedt
      Sparse complains when is_signed_type() is used on a pointer.
      This macro is needed for the format output used for ftrace
      and perf, to know if a binary field is a signed type or not.
      The is_signed_type() macro is used against all fields that are
      recorded by events to automate the operation.
      
      The problem sparse has is with the current way is_signed_type()
      works:
      
        ((type)-1 < 0)
      
      If "type" is a pointer, then sparse does not like it being compared
      to an integer (zero). The simple fix is to just give zero the
      same type. The runtime result stays the same.
      
      Reported-by: Robert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      418c59e4
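
The two forms of the macro can be placed side by side. The `toy_` names are made up for this sketch, and the second definition is one reading of "give zero the same type"; the exact text of the kernel macro is not shown in the message above.

```c
/* Original form: compares a value of "type" against the int constant 0. */
#define toy_is_signed_type_old(type)	(((type)(-1)) < 0)

/* Fixed form: zero is cast to the same type, so both operands match. */
#define toy_is_signed_type(type)	(((type)(-1)) < (type)0)
```

For integer types both forms agree, for example true for `int` and false for `unsigned int`. With a pointer type such as `char *`, the fixed form compares two pointers instead of a pointer and an integer, and on common platforms evaluates to false.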
  10. 14 Nov, 2012 1 commit
    • tracing: Format non-nanosec times from tsc clock without a decimal point. · 8be0709f
      Committed by David Sharp
      With the addition of the "tsc" clock, formatting timestamps to look like
      fractional seconds is misleading. Mark clocks as either in nanoseconds or
      not, and format non-nanosecond timestamps as decimal integers.
      
      Tested:
      $ cd /sys/kernel/debug/tracing/
      $ cat trace_clock
      [local] global tsc
      $ echo sched_switch > set_event
      $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
      $ cat trace
                <idle>-0     [000]  6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
                 sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      $ echo 1 > options/latency-format
      $ cat trace
        <idle>-0       0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
         sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      $ echo tsc > trace_clock
      $ cat trace
      $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
      $ echo 0 > options/latency-format
      $ cat trace
                <idle>-0     [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
                 sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      $ echo 1 > options/latency-format
      $ cat trace
        <idle>-0       0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
         sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      
      v2:
      Move arch-specific bits out of generic code.
      v4:
      Fix x86_32 build due to 64-bit division.
      
      Google-Bug-Id: 6980623
      Link: http://lkml.kernel.org/r/1352837903-32191-2-git-send-email-dhsharp@google.com
      
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: David Sharp <dhsharp@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      8be0709f
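
The formatting rule in this commit can be approximated in a few lines. The function below is a hypothetical sketch, assuming a per-clock `in_ns` flag and ignoring the column padding seen in the real trace output.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Format a raw timestamp for display. Nanosecond clocks are shown as
 * seconds with a microsecond fraction; counter clocks such as "tsc"
 * are shown as a plain decimal integer.
 */
static int toy_format_ts(char *buf, size_t len,
			 unsigned long long ts, bool in_ns)
{
	unsigned long long usecs;

	if (!in_ns)
		return snprintf(buf, len, "%llu", ts);

	usecs = ts / 1000;	/* ns -> us */
	return snprintf(buf, len, "%llu.%06llu",
			usecs / 1000000, usecs % 1000000);
}
```

A nanosecond value of 6330555552000 is rendered as `6330.555552`, while a tsc value of 16490053398357 is printed unchanged, matching the two styles in the test transcript.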
  11. 02 Nov, 2012 1 commit
    • tracing: Use irq_work for wake ups and remove *_nowake_*() functions · 0d5c6e1c
      Committed by Steven Rostedt
      Have the ring buffer commit function use the irq_work infrastructure to
      wake up any waiters waiting on the ring buffer for new data. The irq_work
      was created for such a purpose, where doing the actual wake up at the
      time of adding data is too dangerous, as an event or function trace may
      be in the midst of the work queue locks and cause deadlocks. The irq_work
      will either delay the action to the next timer interrupt, or trigger an IPI
      to itself forcing an interrupt to do the work (in a safe location).
      
      With irq_work, all ring buffer commits can safely do wakeups, removing
      the need for the ring buffer commit "nowake" variants, which were used
      by events and function tracing. All commits can now safely use the
      normal commit, and the "nowake" variants can be removed.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      0d5c6e1c
  12. 31 Jul, 2012 1 commit
  13. 29 Jun, 2012 1 commit
    • tracing: Remove NR_CPUS array from trace_iterator · 6d158a81
      Committed by Steven Rostedt
      Replace the NR_CPUS array of buffer_iter from the trace_iterator
      with an allocated array. This will just create an array of
      possible CPUS instead of the max number specified.
      
      The use of NR_CPUS in that array caused allocation failures for
      machines that were tight on memory. This did not cause any failures
      to the system itself (no crashes), but caused unnecessary failures
      for reading the trace files.
      
      Added a helper function called 'trace_buffer_iter()' that returns
      the buffer_iter item or NULL if it is not defined or the array was
      not allocated. Some routines do not require the array
      (tracing_open_pipe() for one).
      
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      6d158a81
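
The allocation change and the accessor can be modeled in userspace C. Everything named `toy_` here is hypothetical, and `calloc()` stands in for whatever allocator the kernel code uses.

```c
#include <stddef.h>
#include <stdlib.h>

struct toy_buffer_iter {
	int cpu;
};

struct toy_trace_iterator {
	struct toy_buffer_iter **buffer_iter;	/* NULL, or one slot per CPU */
};

/* Size the array by the CPUs that can exist, not by a compile-time maximum. */
static int toy_alloc_buffer_iters(struct toy_trace_iterator *iter,
				  size_t nr_possible_cpus)
{
	iter->buffer_iter = calloc(nr_possible_cpus, sizeof(*iter->buffer_iter));
	return iter->buffer_iter ? 0 : -1;
}

/* Safe accessor: NULL when the array or the per-CPU entry is missing. */
static struct toy_buffer_iter *
toy_trace_buffer_iter(const struct toy_trace_iterator *iter, size_t cpu)
{
	if (iter->buffer_iter && iter->buffer_iter[cpu])
		return iter->buffer_iter[cpu];
	return NULL;
}
```

Callers that never allocate the array, as the message notes for `tracing_open_pipe()`, simply get NULL from the accessor instead of dereferencing a missing array.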
  14. 15 Jun, 2012 1 commit
    • tracing: Add comments for the other bits of ftrace_event_call.flags · 5da43bed
      Committed by Steven Rostedt
      	TRACE_EVENT_FL_ENABLED_BIT,
      	TRACE_EVENT_FL_FILTERED_BIT,
      	TRACE_EVENT_FL_RECORDED_CMD_BIT,
      
      Have comments about what they are, but:
      
      	TRACE_EVENT_FL_CAP_ANY_BIT,
      	TRACE_EVENT_FL_NO_SET_FILTER_BIT,
      	TRACE_EVENT_FL_IGNORE_ENABLE_BIT,
      
      do not, making them second class citizens. To prevent another
      class warfare, these bits have protested for their right to be
      commented. And By Golly! I'll give them what they want!
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      5da43bed
  15. 11 May, 2012 1 commit
    • tracing: Do not enable function event with enable · 9b63776f
      Committed by Steven Rostedt
      Adding the function tracing event to perf had a side effect:
      it produces the following warning when enabling all
      events in ftrace:
      
       # echo 1 > /sys/kernel/debug/tracing/events/enable
      
      [console]
      event trace: Could not enable event function
      
      This is because when enabling all events via the debugfs system
      it ignores events that do not have a ->reg() function assigned.
      This was to skip over the ftrace internal events (as they are
      not TRACE_EVENTs). But as the ftrace function event now has
      a ->reg() function attached to it for use with perf, it is no
      longer ignored.
      
      Worse yet, this ->reg() function is being called when it should
      not be. It returns an error and causes the above warning to
      be printed.
      
      By adding a new event_call flag (TRACE_EVENT_FL_IGNORE_ENABLE)
      and having all ftrace internal event structures set it,
      setting events/enable will no longer try to incorrectly enable
      the function event and does not warn.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      9b63776f
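
The effect of the new flag on the "enable everything" path can be sketched as a loop that skips flagged events. The structure, the flag value, and the function name are assumptions for illustration; the real code walks kernel event lists and calls `->reg()`.

```c
#include <stddef.h>

#define TOY_FL_IGNORE_ENABLE 0x1u	/* illustrative flag value */

struct toy_event_call {
	const char *name;
	unsigned int flags;
	int enabled;
};

/* Enable every event except the ones marked as internal; returns the count. */
static size_t toy_enable_all(struct toy_event_call *calls, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (calls[i].flags & TOY_FL_IGNORE_ENABLE)
			continue;	/* e.g. the ftrace "function" event */
		calls[i].enabled = 1;
		count++;
	}
	return count;
}
```

Because the check is on an explicit flag rather than on the presence of a `reg` callback, an internal event can gain a callback for perf without becoming visible to `events/enable`.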
  16. 14 Mar, 2012 1 commit
  17. 22 Feb, 2012 3 commits
  18. 06 Dec, 2011 1 commit
  19. 03 Nov, 2011 1 commit
  20. 15 Jul, 2011 1 commit
    • tracing: Have dynamic size event stack traces · 4a9bd3f1
      Committed by Steven Rostedt
      Currently the stack trace per event in ftrace is only 8 frames.
      This can be quite limiting and sometimes useless, especially when
      the "ignore frames" is wrong and we also use up stack frames for
      the event processing itself.
      
      Change this to be dynamic by adding a percpu buffer that we can
      write a large stack frame into and then copy into the ring buffer.
      
      Interrupts and NMIs that come in while another event is being
      processed will only get to use the 8 frame stack. That should be enough
      as the task that it interrupted will have the full stack frame anyway.
      
      Requested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      4a9bd3f1
  21. 15 Jun, 2011 1 commit
    • tracing/kprobes: Fix kprobe-tracer to support stack trace · 1fd8df2c
      Committed by Masami Hiramatsu
      Fix to support kernel stack trace correctly on kprobe-tracer.
      Since the execution path of kprobe-based dynamic events is different
      from other tracepoint-based events, normal ftrace_trace_stack() doesn't
      work correctly. To fix that, this introduces ftrace_trace_stack_regs()
      which traces stack via pt_regs instead of current stack register.
      
      e.g.
      
       # echo p schedule+4 > /sys/kernel/debug/tracing/kprobe_events
       # echo 1 > /sys/kernel/debug/tracing/options/stacktrace
       # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
       # head -n 20 /sys/kernel/debug/tracing/trace
                  bash-2968  [000] 10297.050245: p_schedule_4: (schedule+0x4/0x4ca)
                  bash-2968  [000] 10297.050247: <stack trace>
       => schedule_timeout
       => n_tty_read
       => tty_read
       => vfs_read
       => sys_read
       => system_call_fastpath
           kworker/0:1-2940  [000] 10297.050265: p_schedule_4: (schedule+0x4/0x4ca)
           kworker/0:1-2940  [000] 10297.050266: <stack trace>
       => worker_thread
       => kthread
       => kernel_thread_helper
                  sshd-1132  [000] 10297.050365: p_schedule_4: (schedule+0x4/0x4ca)
                  sshd-1132  [000] 10297.050365: <stack trace>
       => sysret_careful
      
      Note: Even with this fix, the first entry will be skipped
      if the probe is put on the function entry area before
      the frame pointer is set up (usually, that is 4 bytes
       (push %bp; mov %sp %bp) on x86), because stack unwinder
      depends on the frame pointer.
      
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: yrl.pp-manager.tt@hitachi.com
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Link: http://lkml.kernel.org/r/20110608070934.17777.17116.stgit@fedora15
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      1fd8df2c
  22. 26 May, 2011 1 commit
  23. 07 May, 2011 1 commit
  24. 10 Mar, 2011 1 commit
    • tracing: Remove lock_depth from event entry · e6e1e259
      Committed by Steven Rostedt
      The lock_depth field in the event headers was added as a temporary
      data point for help in removing the BKL. Now that the BKL has pretty
      much been removed, we can remove this field.
      
      This in turn changes the header from 12 bytes to 8 bytes,
      removing the 4 byte buffer that gcc would insert if the first field
      in the data load was 8 bytes in size.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      e6e1e259
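
The header size change and the alignment gap can be made visible with `sizeof` and `offsetof`. The field layout below is an assumption reconstructed from the sizes given in the message (12 and 8 bytes), not a copy of the kernel structure, and the size of the gap depends on how the target ABI aligns 8-byte fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed 12-byte header layout with the lock_depth field. */
struct toy_entry_old {
	unsigned short type;
	unsigned char flags;
	unsigned char preempt_count;
	int pid;
	int lock_depth;
};

/* Assumed 8-byte header layout after the removal. */
struct toy_entry_new {
	unsigned short type;
	unsigned char flags;
	unsigned char preempt_count;
	int pid;
};

/* Events whose first payload field is 8 bytes wide. */
struct toy_event_old { struct toy_entry_old ent; uint64_t first; };
struct toy_event_new { struct toy_entry_new ent; uint64_t first; };

/* Bytes the compiler inserts between the header and the first field. */
#define TOY_GAP_OLD \
	(offsetof(struct toy_event_old, first) - sizeof(struct toy_entry_old))
#define TOY_GAP_NEW \
	(offsetof(struct toy_event_new, first) - sizeof(struct toy_entry_new))
```

On an ABI that aligns `uint64_t` to 8 bytes, the old layout leaves a 4-byte hole after the 12-byte header, while the new 8-byte header leaves none.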
  25. 08 Feb, 2011 1 commit
  26. 19 Nov, 2010 1 commit
    • tracing/events: Show real number in array fields · 04295780
      Committed by Steven Rostedt
      Currently we have in something like the sched_switch event:
      
        field:char prev_comm[TASK_COMM_LEN];	offset:12;	size:16;	signed:1;
      
      When a userspace tool such as perf tries to parse this, the
      TASK_COMM_LEN is meaningless. This is done because the TRACE_EVENT() macro
      simply stringifies the length with #len. When the length is
      an enum, we get a string that means nothing to tools.
      
      By adding a static buffer and a mutex to protect it, we can store the
      string into that buffer with snprintf and show the actual number.
      Now we get:
      
        field:char prev_comm[16];       offset:12;      size:16;        signed:1;
      
      Something much more useful.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      04295780
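
The difference between stringifying a length and printing its value is easy to reproduce outside the kernel. In this sketch the enum, the macro, and the helper are hypothetical names, and a caller-supplied buffer replaces the static buffer and mutex mentioned in the message.

```c
#include <stdio.h>

enum { TOY_TASK_COMM_LEN = 16 };	/* an enum, not a preprocessor macro */

/* Stringifying the length argument keeps the enum's name, not its value. */
#define TOY_FIELD_DECL_STR(type, name, len)	#type " " #name "[" #len "]"

/* Formatting the value at run time yields the number tools can parse. */
static int toy_field_decl(char *buf, size_t size, const char *type,
			  const char *name, int len)
{
	return snprintf(buf, size, "%s %s[%d]", type, name, len);
}
```

`TOY_FIELD_DECL_STR(char, prev_comm, TOY_TASK_COMM_LEN)` expands to the string `char prev_comm[TOY_TASK_COMM_LEN]`, whereas `toy_field_decl()` with the same arguments writes `char prev_comm[16]`.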
  27. 18 Nov, 2010 2 commits
    • tracing: Allow syscall trace events for non privileged users · 53cf810b
      Committed by Frederic Weisbecker
      As for the raw syscalls events, individual syscall events won't
      leak system wide information on task bound tracing. Allow non
      privileged users to use them in such workflow.
      
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      53cf810b
    • tracing: New flag to allow non privileged users to use a trace event · 61c32659
      Committed by Frederic Weisbecker
      This adds a new trace event internal flag that allows events to be
      used in perf by non privileged users in case of task bound tracing.
      
      This is desired for syscall tracepoints because, unlike some other
      tracepoints, they don't leak global system information.
      
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      61c32659
  28. 10 Sep, 2010 1 commit
    • perf: Rework the PMU methods · a4eaf7f1
      Committed by Peter Zijlstra
      Replace pmu::{enable,disable,start,stop,unthrottle} with
      pmu::{add,del,start,stop}, all of which take a flags argument.
      
      The new interface extends the capability to stop a counter while
      keeping it scheduled on the PMU. We replace the throttled state with
      the generic stopped state.
      
      This also allows us to efficiently stop/start counters over certain
      code paths (like IRQ handlers).
      
      It also allows scheduling a counter without it starting, allowing for
      a generic frozen state (useful for rotating stopped counters).
      
      The stopped state is implemented in two different ways, depending on
      how the architecture implemented the throttled state:
      
       1) We disable the counter:
          a) the pmu has per-counter enable bits, we flip that
          b) we program a NOP event, preserving the counter state
      
       2) We store the counter state and ignore all read/overflow events
      
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus <paulus@samba.org>
      Cc: stephane eranian <eranian@googlemail.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      Cc: Yanmin <yanmin_zhang@linux.intel.com>
      Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Michael Cree <mcree@orcon.net.nz>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a4eaf7f1
  29. 19 Aug, 2010 1 commit