1. 03 December 2008, 2 commits
    • ring-buffer: read page interface · 8789a9e7
      Steven Rostedt committed
      Impact: new API to ring buffer
      
      This patch adds a new interface into the ring buffer that allows a
      page to be read from the ring buffer on a given CPU. For every page
      read, a page must also be supplied to allow for a "swap" of the pages.
      
       rpage = ring_buffer_alloc_read_page(buffer);
       if (!rpage)
      	goto err;
       ret = ring_buffer_read_page(buffer, &rpage, cpu, full);
       if (!ret)
      	goto empty;
       process_page(rpage);
       ring_buffer_free_read_page(rpage);
      
      The caller of these functions must handle any waiting needed for
      new data. ring_buffer_read_page will simply
      return 0 if there is no data, or if "full" is set and the writer
      is still on the current page.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8789a9e7
    • ring-buffer: move some metadata into buffer page · abc9b56d
      Steven Rostedt committed
      Impact: get ready for splice changes
      
      This patch moves the commit and timestamp into the beginning of each
      data page of the buffer. This change will allow the page to be moved
      to another location (disk, network, etc) and still carry, within the
      page itself, the information needed to read it.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      abc9b56d
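      A rough sketch of the idea (hedged: the field names and types are
      assumptions based on the description above, and local_t comes from
      <asm/local.h>; this is not necessarily the committed layout):

       struct buffer_data_page {
       	u64		time_stamp;	/* first timestamp on this page */
       	local_t		commit;		/* last committed offset in the page */
       	unsigned char	data[];		/* the trace data itself */
       };

      Because the timestamp and commit now live inside the data page, the
      page can be shipped elsewhere and still be decoded on its own.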
  2. 27 November 2008, 1 commit
  3. 23 November 2008, 1 commit
    • ring-buffer: add tracing_off_permanent · 033601a3
      Steven Rostedt committed
      Impact: feature to permanently disable ring buffer
      
      This patch adds an API to the ring buffer code that will permanently
      disable the ring buffer from ever recording. This should only be
      called when some serious anomaly is detected, and the system
      may be in an unstable state. When that happens, shutting down the
      recording to the ring buffers may be appropriate.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      033601a3
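      For illustration, a hedged sketch of a caller; the anomaly check,
      detect_buffer_corruption(), is hypothetical:

       /* somewhere in buffer-integrity checking code (hypothetical) */
       if (detect_buffer_corruption()) {
       	/* permanently stop all ring buffer recording */
       	tracing_off_permanent();
       	WARN_ONCE(1, "ring buffer anomaly: tracing permanently disabled\n");
       }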
  4. 19 November 2008, 1 commit
  5. 13 November 2008, 1 commit
  6. 12 November 2008, 5 commits
    • ring-buffer: fix deadlock from reader_lock in read_start · 642edba5
      Steven Rostedt committed
      Impact: deadlock fix in ring_buffer_read_start
      
      ring_buffer_iter_reset was called from ring_buffer_read_start, and
      both functions grabbed the reader_lock, causing a deadlock.
      
      This patch separates the internals of ring_buffer_iter_reset out
      into its own function so that both APIs may grab the reader_lock.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      642edba5
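      The resulting pattern looks roughly like this (a sketch; the name of
      the unlocked helper is illustrative):

       /* unlocked internals; caller must already hold reader_lock */
       static void rb_iter_reset(struct ring_buffer_iter *iter)
       {
       	/* ... reset head_page, head and read_stamp ... */
       }

       void ring_buffer_iter_reset(struct ring_buffer_iter *iter)
       {
       	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
       	unsigned long flags;

       	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
       	rb_iter_reset(iter);
       	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
       }

      ring_buffer_read_start() can then call the unlocked helper while it
      already holds the reader_lock, avoiding the recursive acquisition.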
    • ring-buffer: no preempt for sched_clock() · 47e74f2b
      Steven Rostedt committed
      Impact: disable preemption when calling sched_clock()
      
      ring_buffer_time_stamp still uses sched_clock as its counter, but it
      is a bug to call sched_clock with preemption enabled. This requirement
      should not be pushed onto ring_buffer_time_stamp's callers, so
      ring_buffer_time_stamp needs to disable preemption itself when calling
      sched_clock.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      47e74f2b
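      Roughly what the fix amounts to (a sketch, not the exact diff):

       u64 ring_buffer_time_stamp(int cpu)
       {
       	u64 time;

       	preempt_disable_notrace();
       	/* sched_clock() must not be called with preemption enabled */
       	time = sched_clock();
       	preempt_enable_no_resched_notrace();

       	return time;
       }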
    • ring-buffer: clean up warn ons · 3e89c7bb
      Steven Rostedt committed
      Impact: Restructure WARN_ONs in ring_buffer.c
      
      The current WARN_ON macros in ring_buffer.c are quite ugly.
      
      This patch cleans them up and uses a single RB_WARN_ON that returns
      the value of the condition. This allows the caller to abort the
      function if the condition is true.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3e89c7bb
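      A sketch of the shape this takes; the macro warns, disables the
      buffer, and hands the condition back to the caller (the condition in
      the caller below is illustrative):

       #define RB_WARN_ON(buffer, cond)				\
       	({							\
       		int _____ret = unlikely(cond);			\
       		if (_____ret) {					\
       			atomic_inc(&buffer->record_disabled);	\
       			WARN_ON(1);				\
       		}						\
       		_____ret;					\
       	})

       /* caller can abort the function when the condition triggers */
       if (RB_WARN_ON(cpu_buffer, write > BUF_PAGE_SIZE))
       	return NULL;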
    • ring-buffer: buffer record on/off switch · a3583244
      Steven Rostedt committed
      Impact: enable/disable ring buffer recording API added
      
      Several kernel developers have requested that there be a way to stop
      recording into the ring buffers with a simple switch that can also
      be enabled from userspace. This patch adds a new kernel API to the
      ring buffers called:
      
       tracing_on()
       tracing_off()
      
      When tracing_off() is called, no ring buffer will be able to record
      into its buffer.
      
      tracing_on() will enable the ring buffers again.
      
      These two act like an on/off switch. That is, there is no counting of the
      number of times tracing_off or tracing_on has been called.
      
      A new file is added to the debugfs/tracing directory called
      
        tracing_on
      
      This allows for userspace applications to also flip the switch.
      
        echo 0 > debugfs/tracing/tracing_on
      
      disables the tracing.
      
        echo 1 > debugfs/tracing/tracing_on
      
      enables it.
      
      Note, this does not disable or enable any tracers. It only sets or clears
      a flag that needs to be set in order for the ring buffers to write to
      their buffers. It is a global flag, and affects all ring buffers.
      
      The buffers start out with tracing_on enabled.
      
      There are now three flags that control recording into the buffers:
      
       tracing_on: which affects all ring buffer tracers.
      
       buffer->record_disabled: which affects an allocated buffer; it may be set
           if an anomaly is detected, and tracing is disabled.
      
       cpu_buffer->record_disabled: which is set by tracing_stop() or if an
           anomaly is detected. tracing_start() cannot re-enable this if
           an anomaly occurred.
      
      The userspace debugfs/tracing/tracing_enabled file is implemented with
      tracing_stop(), but userspace cannot enable it again if the kernel
      itself called tracing_stop().
      
      Userspace can enable tracing_on even if the kernel disabled it.
      It is just a switch used to stop tracing if a condition was hit.
      tracing_on is not for protecting critical areas in the kernel nor is
      it for stopping tracing if an anomaly occurred. This is because userspace
      can reenable it at any time.
      
      Side effect: With this patch, I discovered a dead variable in ftrace.c
        called tracing_on. This patch removes it.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      a3583244
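      Conceptually, the switch is just a global flag consulted on the write
      path; a rough sketch (the flag name and the exact placement of the
      check are assumptions):

       static int ring_buffer_flags __read_mostly = 1;	/* buffers start enabled */

       void tracing_on(void)
       {
       	ring_buffer_flags = 1;
       }

       void tracing_off(void)
       {
       	ring_buffer_flags = 0;
       }

       /* in ring_buffer_lock_reserve(): refuse to record while switched off */
       if (!ring_buffer_flags)
       	return NULL;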
    • ring-buffer: add reader lock · f83c9d0f
      Steven Rostedt committed
      Impact: serialize reader accesses to individual CPU ring buffers
      
      The code in the ring buffer expects only one reader at a time, but currently
      it puts that requirement on the caller. This is not strong enough, and this
      patch adds a "reader_lock" that serializes the access to the reader API
      of the ring buffer.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f83c9d0f
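      The serialization is the usual spinlock pattern wrapped around each
      reader-side entry point; a sketch (rb_per_cpu_empty() stands in for an
      internal helper):

       int ring_buffer_empty_cpu(struct ring_buffer *buffer, int cpu)
       {
       	struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
       	unsigned long flags;
       	int ret;

       	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
       	ret = rb_per_cpu_empty(cpu_buffer);
       	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);

       	return ret;
       }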
  7. 11 November 2008, 2 commits
    • ring-buffer: replace most bug ons with warn on and disable buffer · f536aafc
      Steven Rostedt committed
      This patch replaces most of the BUG_ONs in the ring_buffer code with
      RB_WARN_ON variants. It adds some more variants as needed for the
      replacement. This lets the buffer die nicely and still warn the user.
      
      One BUG_ON remains in the code, and that is because it detects a
      bad pointer passed in by the calling function, and not a bug by
      the ring buffer code itself.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f536aafc
    • ring-buffer: prevent infinite looping on time stamping · 4143c5cb
      Steven Rostedt committed
      Impact: removal of unnecessary looping
      
      The lockless part of the ring buffer allows for reentry into the code
      from interrupts. A timestamp is taken, a test is performed, and if it
      detects that an interrupt occurred that did tracing, it tries again.
      
      The problem arises if the timestamp code itself causes a trace.
      The detection will detect this and loop again. The difference between
      this and an interrupt doing tracing, is that this will fail every time,
      and cause an infinite loop.
      
      Currently, we test if the loop happens 1000 times, and if so, it will
      produce a warning and disable the ring buffer.
      
      The problem with this approach is that it makes it difficult to perform
      some types of tracing (tracing the timestamp code itself).
      
      Each trace entry has a delta timestamp from the previous entry.
      If a trace entry is reserved but an interrupt occurs and traces before
      the previous entry is committed, the delta timestamp for that entry will
      be zero. This actually makes sense in terms of tracing, because the
      interrupt entry happened before the preempted entry was committed, so
      one may consider the two happening at the same time. The order is
      still preserved in the buffer.
      
      With this idea, instead of trying to get a new timestamp if an interrupt
      made it in between the timestamp and the test, the entry could simply
      make the delta zero and continue. This will prevent interrupts or
      tracers in the timer code from causing the above loop.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      4143c5cb
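      In sketch form (event_interrupted_commit() is a hypothetical stand-in
      for the real "did another trace sneak in between?" detection):

       u64 ts    = ring_buffer_time_stamp(cpu_buffer->cpu);
       u64 delta = ts - cpu_buffer->write_stamp;

       if (event_interrupted_commit(cpu_buffer)) {
       	/*
       	 * An interrupt traced between our timestamp and this reserve:
       	 * treat the two as simultaneous instead of retrying forever.
       	 */
       	delta = 0;
       }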
  8. 06 November 2008, 1 commit
    • ring-buffer: convert to raw spinlocks · 3e03fb7f
      Steven Rostedt committed
      Impact: no lockdep debugging of ring buffer
      
      The problem with running lockdep on the ring buffer is that the
      ring buffer is the core infrastructure of ftrace. What happens is
      that the tracer will start tracing the lockdep code while lockdep
      is testing the ring buffers locks.  This can cause lockdep to
      fail due to testing cases that have not fully finished their
      locking transition.
      
      This patch converts the spin locks used by the ring buffer back
      into raw spin locks which lockdep does not check.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3e03fb7f
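      The conversion is essentially the following (a sketch; the initializer
      form reflects the raw-spinlock API of that era and may differ in
      detail):

       /* before: a normal spinlock, instrumented by lockdep */
       spinlock_t	lock;

       /* after: a raw spinlock, which lockdep does not check */
       raw_spinlock_t	lock = (raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;

       local_irq_save(flags);
       __raw_spin_lock(&cpu_buffer->lock);
       /* ... critical section ... */
       __raw_spin_unlock(&cpu_buffer->lock);
       local_irq_restore(flags);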
  9. 04 November 2008, 1 commit
  10. 03 November 2008, 1 commit
    • tracing, ring-buffer: add paranoid checks for loops · 818e3dd3
      Steven Rostedt committed
      While writing a new tracer, I had a bug where I caused the ring-buffer
      to recurse in a bad way. The bug was with the tracer I was writing
      and not the ring-buffer itself. But it took a long time to find the
      problem.
      
      This patch adds paranoid checks into the ring-buffer infrastructure
      that will catch bugs of this nature.
      
      Note: I put the bug back in the tracer and this patch showed the error
            nicely and prevented the lockup.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      818e3dd3
  11. 27 October 2008, 1 commit
    • trace: fix printk warning for u64 · e2862c94
      Stephen Rothwell committed
      A powerpc ppc64_defconfig build produces these warnings:
      
      kernel/trace/ring_buffer.c: In function 'rb_add_time_stamp':
      kernel/trace/ring_buffer.c:969: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'u64'
      kernel/trace/ring_buffer.c:969: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u64'
      kernel/trace/ring_buffer.c:969: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'u64'
      
      Just cast the u64s to unsigned long long like we do everywhere else.
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e2862c94
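      The fix is the usual explicit cast (the message text below is
      illustrative, not the exact line from the file):

       /* u64 may be 'unsigned long' on some 64-bit arches, so cast for %llu */
       printk(KERN_WARNING "Delta way too big! %llu ts=%llu write stamp=%llu\n",
              (unsigned long long)delta,
              (unsigned long long)ts,
              (unsigned long long)cpu_buffer->write_stamp);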
  12. 22 October 2008, 1 commit
  13. 14 October 2008, 10 commits
    • ring-buffer: make reentrant · bf41a158
      Steven Rostedt committed
      This patch replaces the local_irq_save/restore with preempt_disable/
      enable. This allows for interrupts to enter while recording.
      To write to the ring buffer, you must reserve data, and then
      commit it. During this time, an interrupt may call a trace function
      that will also record into the buffer before the commit is made.
      
      The interrupt will reserve its entry after the first entry, even
      though the first entry did not finish yet.
      
      The time stamp delta of the interrupt entry will be zero, since
      in the view of the trace, the interrupt happened during the
      first field anyway.
      
      Locking still takes place when the tail/write moves from one page
      to the next. The reader always takes the locks.
      
      A new page pointer is added, called the commit. The write/tail will
      always point to the end of all entries. The commit field will
      point to the last committed entry. Only this commit entry may
      update the write time stamp.
      
      The reader can only go up to the commit. It cannot go past it.
      
      If a lot of interrupts come in during a commit that fills up the
      buffer, and it happens to make it all the way around the buffer
      back to the commit, then a warning is printed and new events will
      be dropped.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      bf41a158
    • ring-buffer: move page indexes into page headers · 6f807acd
      Steven Rostedt committed
      Remove the global head and tail indexes and move them into the
      page header. Each page will now keep track of where the last
      write and read were made. We also rename head and tail to read
      and write for clarity.
      
      This patch is needed for future enhancements to move the ring buffer
      to a lockless solution.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6f807acd
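      The per-page bookkeeping then looks roughly like this (a sketch; exact
      field types and ordering may differ from the committed code):

       struct buffer_page {
       	u64		time_stamp;	/* page's first timestamp */
       	unsigned	write;		/* index of the last write on this page */
       	unsigned	read;		/* index of the last read on this page */
       	struct list_head list;		/* linked list of this cpu's pages */
       	void		*page;		/* the actual data page */
       };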
    • ring_buffer: map to cpu not page · aa1e0e3b
      Steven Rostedt committed
      My original patch had a compile bug when NUMA was configured. I
      referenced cpu when it should have been cpu_buffer->cpu.
      
      Ingo quickly fixed this bug by replacing cpu with 'i' because that
      was the loop counter. Unfortunately, the 'i' was the counter of
      pages, not CPUs. This caused a crash when the number of pages allocated
      for the buffers exceeded the number of CPUs, which would usually
      be the case.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      aa1e0e3b
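      The corrected allocation, in sketch form, ties each page to the node of
      the CPU that owns the buffer rather than to the page index (error path
      and list handling elided):

       for (i = 0; i < nr_pages; i++) {
       	struct page *page;

       	/* allocate on the owning CPU's node, not on node 'i' */
       	page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
       				GFP_KERNEL, 0);
       	if (!page)
       		goto free_pages;
       	/* ... link the page into cpu_buffer's page list ... */
       }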
    • ring-buffer: fix build error · 77ae11f6
      Ingo Molnar committed
      fix:
      
       kernel/trace/ring_buffer.c: In function ‘rb_allocate_pages’:
       kernel/trace/ring_buffer.c:235: error: ‘cpu’ undeclared (first use in this function)
       kernel/trace/ring_buffer.c:235: error: (Each undeclared identifier is reported only once
       kernel/trace/ring_buffer.c:235: error: for each function it appears in.)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      77ae11f6
    • ring_buffer: allocate buffer page pointer · e4c2ce82
      Steven Rostedt committed
      The current method of overlaying the page frame as the buffer page pointer
      can be very dangerous and limits our ability to do other things with
      a page from the buffer, like send it off to disk.
      
      This patch allocates the buffer_page instead of overlaying the page's
      page frame. The use of the buffer_page has hardly changed due to this.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e4c2ce82
    • ring_buffer: implement new locking · d769041f
      Steven Rostedt committed
      The old "lock always" scheme had issues with lockdep, and was not very
      efficient anyway.
      
      This patch introduces a new design that is partially lockless on writes.
      Writes add new entries to the per-cpu pages by simply disabling
      interrupts. When a write needs to go to another page, it will
      grab the lock.
      
      A new "read page" has been added so that the reader can pull out a page
      from the ring buffer to read without worrying about the writer writing over
      it. This allows us to not take the lock for all reads. The lock is
      now only taken when a read needs to go to a new page.
      
      This is far from lockless, and interrupts still need to be disabled,
      but it is a step towards a more lockless solution, and it also
      solves a lot of the issues that were noticed by the first conversion
      of ftrace to the ring buffers.
      
      Note: the ring_buffer_{un}lock API has been removed.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d769041f
    • ring_buffer: remove raw from local_irq_save · 70255b5e
      Steven Rostedt committed
      raw_local_irq_save causes issues with lockdep. We don't need it,
      so replace it with local_irq_save.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      70255b5e
    • ring_buffer: reset buffer page when freeing · ed56829c
      Steven Rostedt committed
      Mathieu Desnoyers pointed out that the page frame needs to be reset
      before being freed, otherwise we might trigger a BUG_ON in the page
      free code.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ed56829c
    • ring_buffer: add paranoid check for buffer page · a7b13743
      Steven Rostedt committed
      If for some strange reason the buffer_page gets bigger, or the page struct
      gets smaller, I want to know this ASAP.  The best way is to not let the
      kernel compile.
      
      This patch adds code to test the size of the struct buffer_page against the
      page struct and will cause compile issues if the buffer_page ever gets bigger
      than the page struct.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a7b13743
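      One way to express such a compile-time check (a sketch; the actual
      patch may phrase it differently):

       static inline void rb_check_buffer_page_size(void)
       {
       	/* fail the build if struct buffer_page outgrows struct page */
       	BUILD_BUG_ON(sizeof(struct buffer_page) > sizeof(struct page));
       }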
    • tracing: unified trace buffer · 7a8e76a3
      Steven Rostedt committed
      This is a unified tracing buffer that implements a ring buffer that
      hopefully everyone will eventually be able to use.
      
      The events recorded into the buffer have the following structure:
      
        struct ring_buffer_event {
      	u32 type:2, len:3, time_delta:27;
      	u32 array[];
        };
      
      The minimum size of an event is 8 bytes. All events are 4 byte
      aligned inside the buffer.
      
      There are 4 types (all internal use for the ring buffer, only
      the data type is exported to the interface users).
      
       RINGBUF_TYPE_PADDING: this type is used to note extra space at the end
      	of a buffer page.
      
       RINGBUF_TYPE_TIME_EXTEND: This type is used when the time between events
      	is greater than the 27 bit delta can hold. We add another
      	32 bits, and record that in its own event (8 byte size).
      
       RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to
      	help keep the buffer timestamps in sync.
      
      RINGBUF_TYPE_DATA: The event actually holds user data.
      
      The "len" field is only three bits. Since the data must be
      4 byte aligned, this field is shifted left by 2, giving a
      max length of 28 bytes. If the data load is greater than 28
      bytes, the first array field holds the full length of the
      data load and the len field is set to zero.
      
      Example, data size of 7 bytes:
      
      	type = RINGBUF_TYPE_DATA
      	len = 2
      	time_delta: <time-stamp> - <prev_event-time-stamp>
      	array[0..1]: <7 bytes of data> <1 byte empty>
      
      This event is saved in 12 bytes of the buffer.
      
      An event with 82 bytes of data:
      
      	type = RINGBUF_TYPE_DATA
      	len = 0
      	time_delta: <time-stamp> - <prev_event-time-stamp>
      	array[0]: 84 (Note the alignment)
      	array[1..21]: <82 bytes of data> <2 bytes empty>
      
      The above event is saved in 92 bytes (if my math is correct).
      82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length.
      
      Do not reference the above event struct directly. Use the following
      functions to gain access to the event table, since the
      ring_buffer_event structure may change in the future.
      
      ring_buffer_event_length(event): get the length of the event.
      	This is the size of the memory used to record this
      	event, and not the size of the data payload.
      
      ring_buffer_time_delta(event): get the time delta of the event
      	This returns the delta time stamp since the last event.
      	Note: Even though this is in the header, there should
      		be no reason to access this directly, except
      		for debugging.
      
      ring_buffer_event_data(event): get the data from the event
      	This is the function to use to get the actual data
      	from the event. Note, it is only a pointer to the
      	data inside the buffer. This data must be copied to
      	another location otherwise you risk it being written
      	over in the buffer.
      
      ring_buffer_lock: A way to lock the entire buffer.
      ring_buffer_unlock: unlock the buffer.
      
      ring_buffer_alloc: create a new ring buffer. Can choose between
      	overwrite or consumer/producer mode. Overwrite will
      	overwrite old data, whereas consumer/producer will
      	throw away new data if the consumer catches up with the
      	producer.  The consumer/producer is the default.
      
      ring_buffer_free: free the ring buffer.
      
      ring_buffer_resize: resize the buffer. Changes the size of each cpu
      	buffer. Note, it is up to the caller to ensure that
      	the buffer is not being used while this is happening.
      	This requirement may go away but do not count on it.
      
      ring_buffer_lock_reserve: locks the ring buffer and allocates an
      	entry on the buffer to write to.
      ring_buffer_unlock_commit: unlocks the ring buffer and commits it to
      	the buffer.
      
      ring_buffer_write: writes some data into the ring buffer.
      
      ring_buffer_peek: Look at the next item in the cpu buffer.
      ring_buffer_consume: get the next item in the cpu buffer and
      	consume it. That is, this function increments the head
      	pointer.
      
      ring_buffer_read_start: Start an iterator of a cpu buffer.
      	For now, this disables the cpu buffer, until you issue
      	a finish. This is just because we do not want the iterator
      	to be overwritten. This restriction may change in the future.
      	But note, this is used for static reading of a buffer which
      	is usually done "after" a trace. Live readings would want
      	to use the ring_buffer_consume above, which will not
      	disable the ring buffer.
      
      ring_buffer_read_finish: Finishes the read iterator and reenables
      	the ring buffer.
      
      ring_buffer_iter_peek: Look at the next item in the cpu iterator.
      ring_buffer_read: Read the iterator and increment it.
      ring_buffer_iter_reset: Reset the iterator to point to the beginning
      	of the cpu buffer.
      ring_buffer_iter_empty: Returns true if the iterator is at the end
      	of the cpu buffer.
      
      ring_buffer_size: returns the size in bytes of each cpu buffer.
      	Note, the real size is this times the number of CPUs.
      
      ring_buffer_reset_cpu: Sets the cpu buffer to empty
      ring_buffer_reset: sets all cpu buffers to empty
      
      ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a
      	cpu buffer of another buffer. This is handy when you
      	want to take a snapshot of a running trace on just one
      	cpu. Having a backup buffer to swap with facilitates this.
      	Ftrace max latencies use this.
      
      ring_buffer_empty: Returns true if the ring buffer is empty.
      ring_buffer_empty_cpu: Returns true if the cpu buffer is empty.
      
      ring_buffer_record_disable: disable all cpu buffers (read only)
      ring_buffer_record_disable_cpu: disable a single cpu buffer (read only)
      ring_buffer_record_enable: enable all cpu buffers.
      ring_buffer_record_enable_cpu: enable a single cpu buffer.
      
      ring_buffer_entries: The number of entries in a ring buffer.
      ring_buffer_overruns: The number of entries removed due to writing wrap.
      
      ring_buffer_time_stamp: Get the time stamp used by the ring buffer
      ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp
      	into nanosecs.
      
      I still need to implement the GTOD feature, but that needs support from
      the cpu frequency infrastructure. It can be done at a later
      time without affecting the ring buffer interface.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      7a8e76a3
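      A hedged end-to-end usage sketch based on the API described above
      (error handling is trimmed, signatures of this era may differ slightly,
      and rb_api_demo() is just an illustrative wrapper):

       static int rb_api_demo(void)
       {
       	struct ring_buffer *buffer;
       	struct ring_buffer_event *event;
       	unsigned long irq_flags;
       	u64 ts;
       	int *entry;

       	/* consumer/producer mode (pass RB_FL_OVERWRITE for overwrite mode) */
       	buffer = ring_buffer_alloc(1 << 20, 0);
       	if (!buffer)
       		return -ENOMEM;

       	/* reserve, fill and commit one 4-byte entry */
       	event = ring_buffer_lock_reserve(buffer, sizeof(*entry), &irq_flags);
       	if (event) {
       		entry = ring_buffer_event_data(event);
       		*entry = 42;
       		ring_buffer_unlock_commit(buffer, event, irq_flags);
       	}

       	/* consume it back from CPU 0 */
       	event = ring_buffer_consume(buffer, 0, &ts);
       	if (event)
       		printk(KERN_INFO "read %d\n",
       		       *(int *)ring_buffer_event_data(event));

       	ring_buffer_free(buffer);
       	return 0;
       }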