1. 08 7月, 2009 1 次提交
    • S
      ring-buffer: make lockless · 77ae365e
      Steven Rostedt 提交于
      This patch converts the ring buffers into a completely lockless
      buffer recording system. The read side still takes locks since
      we still serialize readers. But the writers are the ones that
      must be lockless (those can happen in NMIs).
      
      The main change is to the "head_page" pointer. We write to the
      tail, and read from the head. The "head_page" pointer in the cpu
      buffer is now just a reference to where to look. The real head
      page is now kept in the head_page->list->prev->next pointer.
      That is, in the list head of the previous page we set flags.
      
      The list pages are allocated to be aligned such that the lowest
      significant bits are always zero pointing to the list. This gives
      us play to put in flags to their pointers.
      
      bit 0: set when the page is a head page
      bit 1: set when the writer is moving the page (for overwrite mode)
      
      cmpxchg is used to update the pointer.
      
      When the writer wraps the buffer and the tail meets the head,
      in overwrite mode, the writer must move the head page forward.
      It first uses cmpxchg to change the pointer flag from 1 to 2.
      Once this is done, the reader on another CPU will not take the
      page from the buffer.
      
      The writers need to protect against interrupts (we don't bother with
      disabling interrupts because NMIs are allowed to write too).
      
      After the writer sets the pointer flag to 2, it takes care to
      manage interrupts coming in. This is discribed in detail within the
      comments of the code.
      
       Changes in version 2:
        - Let reader reset entries value of header page.
        - Fix tail page passing commit page on reader page test.
        - Always increment entries and write counter in rb_tail_page_update
        - Add safety check in rb_set_commit_to_write to break out of infinite loop
        - add mask in rb_is_reader_page
      
      [ Impact: lock free writing to the ring buffer ]
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      77ae365e
  2. 15 6月, 2009 1 次提交
  3. 09 6月, 2009 1 次提交
    • P
      ring-buffer: pass in lockdep class key for reader_lock · 1f8a6a10
      Peter Zijlstra 提交于
      On Sun, 7 Jun 2009, Ingo Molnar wrote:
      > Testing tracer sched_switch: <6>Starting ring buffer hammer
      > PASSED
      > Testing tracer sysprof: PASSED
      > Testing tracer function: PASSED
      > Testing tracer irqsoff:
      > =============================================
      > PASSED
      > Testing tracer preemptoff: PASSED
      > Testing tracer preemptirqsoff: [ INFO: possible recursive locking detected ]
      > PASSED
      > Testing tracer branch: 2.6.30-rc8-tip-01972-ge5b9078-dirty #5760
      > ---------------------------------------------
      > rb_consumer/431 is trying to acquire lock:
      >  (&cpu_buffer->reader_lock){......}, at: [<c109eef7>] ring_buffer_reset_cpu+0x37/0x70
      >
      > but task is already holding lock:
      >  (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
      >
      > other info that might help us debug this:
      > 1 lock held by rb_consumer/431:
      >  #0:  (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
      
      The ring buffer is a generic structure, and can be used outside of
      ftrace. If ftrace traces within the use of the ring buffer, it can produce
      false positives with lockdep.
      
      This patch passes in a static lock key into the allocation of the ring
      buffer, so that different ring buffers will have their own lock class.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1244477919.13761.9042.camel@twins>
      
      [ store key in ring buffer descriptor ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      1f8a6a10
  4. 06 5月, 2009 1 次提交
    • S
      ring-buffer: add counters for commit overrun and nmi dropped entries · f0d2c681
      Steven Rostedt 提交于
      The WARN_ON in the ring buffer when a commit is preempted and the
      buffer is filled by preceding writes can happen in normal operations.
      The WARN_ON makes it look like a bug, not to mention, because
      it does not stop tracing and calls printk which can also recurse, this
      is prone to deadlock (the WARN_ON is not in a position to recurse).
      
      This patch removes the WARN_ON and replaces it with a counter that
      can be retrieved by a tracer. This counter is called commit_overrun.
      
      While at it, I added a nmi_dropped counter to count any time an NMI entry
      is dropped because the NMI could not take the spinlock.
      
      [ Impact: prevent deadlock by printing normal case warning ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f0d2c681
  5. 24 4月, 2009 1 次提交
  6. 17 4月, 2009 1 次提交
    • S
      tracing/events/ring-buffer: expose format of ring buffer headers to users · d1b182a8
      Steven Rostedt 提交于
      Currently, every thing needed to read the binary output from the
      ring buffers is available, with the exception of the way the ring
      buffers handles itself internally.
      
      This patch creates two special files in the debugfs/tracing/events
      directory:
      
       # cat /debug/tracing/events/header_page
              field: u64 timestamp;   offset:0;       size:8;
              field: local_t commit;  offset:8;       size:8;
              field: char data;       offset:16;      size:4080;
      
       # cat /debug/tracing/events/header_event
              type        :    2 bits
              len         :    3 bits
              time_delta  :   27 bits
              array       :   32 bits
      
              padding     : type == 0
              time_extend : type == 1
              data        : type == 3
      
      This is to allow a userspace app to see if the ring buffer format changes
      or not.
      
      [ Impact: allow userspace apps to know of ringbuffer format changes ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      d1b182a8
  7. 14 4月, 2009 1 次提交
    • S
      ring-buffer: add ring_buffer_discard_commit · fa1b47dd
      Steven Rostedt 提交于
      The ring_buffer_discard_commit is similar to ring_buffer_event_discard
      but it can only be done on an event that has yet to be commited.
      Unpredictable results can happen otherwise.
      
      The main difference between ring_buffer_discard_commit and
      ring_buffer_event_discard is that ring_buffer_discard_commit will try
      to free the data in the ring buffer if nothing has addded data
      after the reserved event. If something did, then it acts almost the
      same as ring_buffer_event_discard followed by a
      ring_buffer_unlock_commit.
      
      Note, either ring_buffer_commit_discard and ring_buffer_unlock_commit
      can be called on an event, not both.
      
      This commit also exports both discard functions to be usable by
      GPL modules.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fa1b47dd
  8. 23 3月, 2009 1 次提交
  9. 18 3月, 2009 1 次提交
  10. 05 3月, 2009 1 次提交
    • S
      tracing: add tracing_on/tracing_off to kernel.h · 2002c258
      Steven Rostedt 提交于
      Impact: cleanup
      
      The functions tracing_start/tracing_stop have been moved to kernel.h.
      These are not the functions a developer most likely wants to use
      when they want to insert a place to stop tracing and restart it from
      user space.
      
      tracing_start/tracing_stop was created to work with things like
      suspend to ram, where even calling smp_processor_id() can crash the
      system. The tracing_start/tracing_stop was used to stop the tracer from
      doing anything. These are still light weight functions, but add a bit
      more overhead to be able to stop the tracers. They also have no interface
      back to userland. That is, if the kernel calls tracing_stop, userland
      can not start tracing.
      
      What a developer most likely wants to use is tracing_on/tracing_off.
      These are very light weight functions (simply sets or clears a bit).
      These functions just stop recording into the ring buffer. The tracers
      don't even know that this happens except that they would receive NULL
      from the ring_buffer_lock_reserve function.
      
      Also, there's a way for the user land to enable or disable this bit.
      In debugfs/tracing/tracing_on, a user may echo "0" (same as tracing_off())
      or echo "1" (same as tracing_on()) into this file. This becomes handy when
      a kernel developer is debugging and wants tracing to turn off when it
      hits an anomaly. Then the developer can examine the trace, and restart
      tracing if they want to try again (echo 1 > tracing_on).
      
      This patch moves the prototypes for tracing_on/tracing_off to kernel.h
      and comments their use, so that a kernel developer will know how
      to use them.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      2002c258
  11. 04 3月, 2009 1 次提交
  12. 17 2月, 2009 1 次提交
  13. 11 2月, 2009 1 次提交
  14. 08 2月, 2009 1 次提交
  15. 06 2月, 2009 1 次提交
    • A
      ring_buffer: remove unused flags parameter · 0a987751
      Arnaldo Carvalho de Melo 提交于
      Impact: API change, cleanup
      
      >From ring_buffer_{lock_reserve,unlock_commit}.
      
      $ codiff /tmp/vmlinux.before /tmp/vmlinux.after
      linux-2.6-tip/kernel/trace/trace.c:
        trace_vprintk              |  -14
        trace_graph_return         |  -14
        trace_graph_entry          |  -10
        trace_function             |   -8
        __ftrace_trace_stack       |   -8
        ftrace_trace_userstack     |   -8
        tracing_sched_switch_trace |   -8
        ftrace_trace_special       |  -12
        tracing_sched_wakeup_trace |   -8
       9 functions changed, 90 bytes removed, diff: -90
      
      linux-2.6-tip/block/blktrace.c:
        __blk_add_trace |   -1
       1 function changed, 1 bytes removed, diff: -1
      
      /tmp/vmlinux.after:
       10 functions changed, 91 bytes removed, diff: -91
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NFrédéric Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0a987751
  16. 10 12月, 2008 1 次提交
  17. 08 12月, 2008 1 次提交
  18. 03 12月, 2008 1 次提交
    • S
      ring-buffer: read page interface · 8789a9e7
      Steven Rostedt 提交于
      Impact: new API to ring buffer
      
      This patch adds a new interface into the ring buffer that allows a
      page to be read from the ring buffer on a given CPU. For every page
      read, one must also be given to allow for a "swap" of the pages.
      
       rpage = ring_buffer_alloc_read_page(buffer);
       if (!rpage)
      	goto err;
       ret = ring_buffer_read_page(buffer, &rpage, cpu, full);
       if (!ret)
      	goto empty;
       process_page(rpage);
       ring_buffer_free_read_page(rpage);
      
      The caller of these functions must handle any waits that are
      needed to wait for new data. The ring_buffer_read_page will simply
      return 0 if there is no data, or if "full" is set and the writer
      is still on the current page.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8789a9e7
  19. 23 11月, 2008 1 次提交
    • S
      ring-buffer: add tracing_off_permanent · 033601a3
      Steven Rostedt 提交于
      Impact: feature to permanently disable ring buffer
      
      This patch adds a API to the ring buffer code that will permanently
      disable the ring buffer from ever recording. This should only be
      called when some serious anomaly is detected, and the system
      may be in an unstable state. When that happens, shutting down the
      recording to the ring buffers may be appropriate.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      033601a3
  20. 12 11月, 2008 1 次提交
    • S
      ring-buffer: buffer record on/off switch · a3583244
      Steven Rostedt 提交于
      Impact: enable/disable ring buffer recording API added
      
      Several kernel developers have requested that there be a way to stop
      recording into the ring buffers with a simple switch that can also
      be enabled from userspace. This patch addes a new kernel API to the
      ring buffers called:
      
       tracing_on()
       tracing_off()
      
      When tracing_off() is called, all ring buffers will not be able to record
      into their buffers.
      
      tracing_on() will enable the ring buffers again.
      
      These two act like an on/off switch. That is, there is no counting of the
      number of times tracing_off or tracing_on has been called.
      
      A new file is added to the debugfs/tracing directory called
      
        tracing_on
      
      This allows for userspace applications to also flip the switch.
      
        echo 0 > debugfs/tracing/tracing_on
      
      disables the tracing.
      
        echo 1 > /debugfs/tracing/tracing_on
      
      enables it.
      
      Note, this does not disable or enable any tracers. It only sets or clears
      a flag that needs to be set in order for the ring buffers to write to
      their buffers. It is a global flag, and affects all ring buffers.
      
      The buffers start out with tracing_on enabled.
      
      There are now three flags that control recording into the buffers:
      
       tracing_on: which affects all ring buffer tracers.
      
       buffer->record_disabled: which affects an allocated buffer, which may be set
           if an anomaly is detected, and tracing is disabled.
      
       cpu_buffer->record_disabled: which is set by tracing_stop() or if an
           anomaly is detected. tracing_start can not reenable this if
           an anomaly occurred.
      
      The userspace debugfs/tracing/tracing_enabled is implemented with
      tracing_stop() but the user space code can not enable it if the kernel
      called tracing_stop().
      
      Userspace can enable the tracing_on even if the kernel disabled it.
      It is just a switch used to stop tracing if a condition was hit.
      tracing_on is not for protecting critical areas in the kernel nor is
      it for stopping tracing if an anomaly occurred. This is because userspace
      can reenable it at any time.
      
      Side effect: With this patch, I discovered a dead variable in ftrace.c
        called tracing_on. This patch removes it.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      a3583244
  21. 14 10月, 2008 2 次提交
    • S
      ring_buffer: implement new locking · d769041f
      Steven Rostedt 提交于
      The old "lock always" scheme had issues with lockdep, and was not very
      efficient anyways.
      
      This patch does a new design to be partially lockless on writes.
      Writes will add new entries to the per cpu pages by simply disabling
      interrupts. When a write needs to go to another page than it will
      grab the lock.
      
      A new "read page" has been added so that the reader can pull out a page
      from the ring buffer to read without worrying about the writer writing over
      it. This allows us to not take the lock for all reads. The lock is
      now only taken when a read needs to go to a new page.
      
      This is far from lockless, and interrupts still need to be disabled,
      but it is a step towards a more lockless solution, and it also
      solves a lot of the issues that were noticed by the first conversion
      of ftrace to the ring buffers.
      
      Note: the ring_buffer_{un}lock API has been removed.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d769041f
    • S
      tracing: unified trace buffer · 7a8e76a3
      Steven Rostedt 提交于
      This is a unified tracing buffer that implements a ring buffer that
      hopefully everyone will eventually be able to use.
      
      The events recorded into the buffer have the following structure:
      
        struct ring_buffer_event {
      	u32 type:2, len:3, time_delta:27;
      	u32 array[];
        };
      
      The minimum size of an event is 8 bytes. All events are 4 byte
      aligned inside the buffer.
      
      There are 4 types (all internal use for the ring buffer, only
      the data type is exported to the interface users).
      
       RINGBUF_TYPE_PADDING: this type is used to note extra space at the end
      	of a buffer page.
      
       RINGBUF_TYPE_TIME_EXTENT: This type is used when the time between events
      	is greater than the 27 bit delta can hold. We add another
      	32 bits, and record that in its own event (8 byte size).
      
       RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to
      	help keep the buffer timestamps in sync.
      
      RINGBUF_TYPE_DATA: The event actually holds user data.
      
      The "len" field is only three bits. Since the data must be
      4 byte aligned, this field is shifted left by 2, giving a
      max length of 28 bytes. If the data load is greater than 28
      bytes, the first array field holds the full length of the
      data load and the len field is set to zero.
      
      Example, data size of 7 bytes:
      
      	type = RINGBUF_TYPE_DATA
      	len = 2
      	time_delta: <time-stamp> - <prev_event-time-stamp>
      	array[0..1]: <7 bytes of data> <1 byte empty>
      
      This event is saved in 12 bytes of the buffer.
      
      An event with 82 bytes of data:
      
      	type = RINGBUF_TYPE_DATA
      	len = 0
      	time_delta: <time-stamp> - <prev_event-time-stamp>
      	array[0]: 84 (Note the alignment)
      	array[1..14]: <82 bytes of data> <2 bytes empty>
      
      The above event is saved in 92 bytes (if my math is correct).
      82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length.
      
      Do not reference the above event struct directly. Use the following
      functions to gain access to the event table, since the
      ring_buffer_event structure may change in the future.
      
      ring_buffer_event_length(event): get the length of the event.
      	This is the size of the memory used to record this
      	event, and not the size of the data pay load.
      
      ring_buffer_time_delta(event): get the time delta of the event
      	This returns the delta time stamp since the last event.
      	Note: Even though this is in the header, there should
      		be no reason to access this directly, accept
      		for debugging.
      
      ring_buffer_event_data(event): get the data from the event
      	This is the function to use to get the actual data
      	from the event. Note, it is only a pointer to the
      	data inside the buffer. This data must be copied to
      	another location otherwise you risk it being written
      	over in the buffer.
      
      ring_buffer_lock: A way to lock the entire buffer.
      ring_buffer_unlock: unlock the buffer.
      
      ring_buffer_alloc: create a new ring buffer. Can choose between
      	overwrite or consumer/producer mode. Overwrite will
      	overwrite old data, where as consumer producer will
      	throw away new data if the consumer catches up with the
      	producer.  The consumer/producer is the default.
      
      ring_buffer_free: free the ring buffer.
      
      ring_buffer_resize: resize the buffer. Changes the size of each cpu
      	buffer. Note, it is up to the caller to provide that
      	the buffer is not being used while this is happening.
      	This requirement may go away but do not count on it.
      
      ring_buffer_lock_reserve: locks the ring buffer and allocates an
      	entry on the buffer to write to.
      ring_buffer_unlock_commit: unlocks the ring buffer and commits it to
      	the buffer.
      
      ring_buffer_write: writes some data into the ring buffer.
      
      ring_buffer_peek: Look at a next item in the cpu buffer.
      ring_buffer_consume: get the next item in the cpu buffer and
      	consume it. That is, this function increments the head
      	pointer.
      
      ring_buffer_read_start: Start an iterator of a cpu buffer.
      	For now, this disables the cpu buffer, until you issue
      	a finish. This is just because we do not want the iterator
      	to be overwritten. This restriction may change in the future.
      	But note, this is used for static reading of a buffer which
      	is usually done "after" a trace. Live readings would want
      	to use the ring_buffer_consume above, which will not
      	disable the ring buffer.
      
      ring_buffer_read_finish: Finishes the read iterator and reenables
      	the ring buffer.
      
      ring_buffer_iter_peek: Look at the next item in the cpu iterator.
      ring_buffer_read: Read the iterator and increment it.
      ring_buffer_iter_reset: Reset the iterator to point to the beginning
      	of the cpu buffer.
      ring_buffer_iter_empty: Returns true if the iterator is at the end
      	of the cpu buffer.
      
      ring_buffer_size: returns the size in bytes of each cpu buffer.
      	Note, the real size is this times the number of CPUs.
      
      ring_buffer_reset_cpu: Sets the cpu buffer to empty
      ring_buffer_reset: sets all cpu buffers to empty
      
      ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a
      	cpu buffer of another buffer. This is handy when you
      	want to take a snap shot of a running trace on just one
      	cpu. Having a backup buffer, to swap with facilitates this.
      	Ftrace max latencies use this.
      
      ring_buffer_empty: Returns true if the ring buffer is empty.
      ring_buffer_empty_cpu: Returns true if the cpu buffer is empty.
      
      ring_buffer_record_disable: disable all cpu buffers (read only)
      ring_buffer_record_disable_cpu: disable a single cpu buffer (read only)
      ring_buffer_record_enable: enable all cpu buffers.
      ring_buffer_record_enabl_cpu: enable a single cpu buffer.
      
      ring_buffer_entries: The number of entries in a ring buffer.
      ring_buffer_overruns: The number of entries removed due to writing wrap.
      
      ring_buffer_time_stamp: Get the time stamp used by the ring buffer
      ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp
      	into nanosecs.
      
      I still need to implement the GTOD feature. But we need support from
      the cpu frequency infrastructure.  But this can be done at a later
      time without affecting the ring buffer interface.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7a8e76a3