1. 01 Apr, 2010 2 commits
    • S
      ring-buffer: Add lost event count to end of sub buffer · ff0ff84a
      Steven Rostedt committed
      Currently, binary readers of the ring buffer only know where events were
      lost, but not how many events were lost at that location.
      This information is available, but it would require adding another
      field to the sub buffer header to include it.
      
      But when an event cannot fit at the end of a sub buffer, it is written
      to the next sub buffer. This means there is a good chance that the
      buffer may have room to hold this counter. If it does, write
      the counter at the end of the sub buffer and set another flag
      in the data size field that states that this information exists.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      ff0ff84a
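The "write the counter if there is room" step can be sketched roughly as below. This is a minimal user-space sketch, not the kernel code: the struct layout, the deliberately tiny 64-byte data area, the bit positions, and the names `MISSED_EVENTS`, `MISSED_STORED`, and `store_missed()` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SUBBUF_DATA_SIZE 64u               /* tiny on purpose (assumption) */
#define MISSED_EVENTS (UINT32_C(1) << 31)  /* events were lost at this point */
#define MISSED_STORED (UINT32_C(1) << 30)  /* a count follows the data */

struct sub_buffer {
	uint32_t commit;                   /* bytes used, plus flag bits */
	unsigned char data[SUBBUF_DATA_SIZE];
};

/*
 * Mark the sub buffer as having lost events and, if the unused tail is
 * large enough, append the count right after the last event.
 * Returns true when the count was stored.
 */
static bool store_missed(struct sub_buffer *sb, uint32_t used, uint64_t missed)
{
	assert(used <= SUBBUF_DATA_SIZE);
	sb->commit = used | MISSED_EVENTS;
	if (SUBBUF_DATA_SIZE - used < sizeof(missed))
		return false;              /* no room: only the flag is set */
	memcpy(&sb->data[used], &missed, sizeof(missed));
	sb->commit |= MISSED_STORED;
	return true;
}
```

A binary reader would check the second flag before trusting the bytes that follow the data; when only the first flag is set, it knows events were lost but not how many.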
    • S
      ring-buffer: Add place holder recording of dropped events · 66a8cb95
      Steven Rostedt committed
      Currently, when the ring buffer drops events, it does not record
      the fact that it did so. It does inform the writer that the event
      was dropped by returning a NULL event, but it does not put in any
      place holder where the event was dropped.
      
      This is not a trivial thing to add because the ring buffer mostly
      runs in overwrite (flight recorder) mode. That is, when the ring
      buffer is full, new data will overwrite old data.
      
      In producer/consumer mode, where new data is simply dropped when
      the ring buffer is full, it is trivial to add the placeholder
      for dropped events. When there's more room to write new data, then
      a special event can be added to notify the reader about the dropped
      events.
      
      But in overwrite mode, any new write can overwrite events. A
      placeholder cannot be inserted into the ring buffer since there may
      never be room. A reader could also come in at any time and miss the
      placeholder.
      
      Luckily, the way the ring buffer works, the read side can find out
      whether events were lost, and how many. Every time a write
      takes place, if it overwrites the head page (the next read) it
      updates an "overrun" variable that keeps track of the number of
      lost events. When a reader swaps out a page from the ring buffer,
      it can record this number, perform the swap, and then check to
      see if the number changed. If it has, the difference is
      the number of events dropped. This can be stored by the reader
      and returned to callers of the reader.
      
      Since the reader page swap will fail if the writer moved the head
      page since the time the reader set up the swap, this gives room
      to record the overruns without worrying about races. If the reader
      sets up the pages, records the overrun, then performs the swap,
      and the swap succeeds, then the overrun variable has not been
      updated since the setup before the swap.
      
      For binary readers of the ring buffer, a flag is set in the header
      of each sub page (sub buffer) of the ring buffer. This flag is embedded
      in the size field of the data on the sub buffer, in the 31st bit (the size
      can be 32 or 64 bits depending on the architecture), but only 27
      bits need to be used for the actual size (less, actually).
      
      We could add a new field in the sub buffer header to also record the
      number of events dropped since the last read, but this will change the
      format of the binary ring buffer a bit too much. Perhaps this change can
      be made if the information on the number of events dropped is considered
      important enough.
      
      Note, the notification of dropped events is only used by consuming reads
      or peeking at the ring buffer. Iterating over the ring buffer does not
      keep this information because the necessary data is only available when
      a page swap is made, and the iterator does not swap out pages.
      
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      66a8cb95
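The reader-side bookkeeping described in this commit might look like the following. It is a minimal sketch under stated assumptions: `struct cpu_buffer`, `account_lost()`, `size_word()`, and the macro names are hypothetical, the page swap itself is left out, and the size word is taken to be 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU state; writers bump overrun when they overwrite. */
struct cpu_buffer {
	unsigned long overrun;       /* events overwritten so far */
	unsigned long last_overrun;  /* value seen at the previous swap */
	unsigned long lost_events;   /* reported with the next read */
};

/*
 * Call with the overrun value sampled just before a swap that then
 * succeeded. A successful swap means the writer did not move the head
 * page in between, so the sample is still exact.
 */
static void account_lost(struct cpu_buffer *cb, unsigned long sampled)
{
	cb->lost_events = sampled - cb->last_overrun;
	cb->last_overrun = sampled;
}

#define SIZE_MISSED_EVENTS (UINT32_C(1) << 31)  /* flag bit in the size word */
#define SIZE_MASK ((UINT32_C(1) << 27) - 1)     /* bits left for the size */

/* Build the size word a binary reader sees in the sub buffer header. */
static uint32_t size_word(uint32_t size, unsigned long lost)
{
	assert(size <= SIZE_MASK);
	return size | (lost ? SIZE_MISSED_EVENTS : 0);
}
```

An iterator never goes through `account_lost()` because it never swaps a page, which is why only consuming reads and peeks can report dropped events.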
  2. 19 Mar, 2010 1 commit
    • S
      ring-buffer: Do 8 byte alignment for 64 bit that can not handle 4 byte align · 2271048d
      Steven Rostedt committed
      The ring buffer uses 4 byte alignment while recording events into the
      buffer, even on 64bit machines. This saves space when there are lots
      of events being recorded at 4 byte boundaries.
      
      The ring buffer has a zero copy method to write into the buffer, with
      the reserving of space and then committing it. This may cause problems
      when writing an 8 byte word into a 4 byte alignment (not 8). For x86 and
      PPC this is not an issue, but on some architectures this would cause an
      out-of-alignment exception.
      
      This patch uses CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to determine
      if it is OK to use 4 byte alignments on 64 bit machines. If it is not,
      it forces the ring buffer event header to be 8 bytes and not 4,
      and will align the length of the data to be 8 byte aligned.
      This keeps the data payload at 8 byte alignments and will allow these
      machines to run without issue.
      
      The trick to this is that the header can be either 4 bytes or 8 bytes
      depending on the length of the data payload. The 4 byte header
      has a length field that supports up to 112 bytes. If the length of
      the data is more than 112, the length field is set to zero, and the actual
      length is stored in the next 4 bytes after the header.
      
      When CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set, the code forces
      zero in the 4 byte header forcing the length to be stored in the 4 byte
      array, even with a small data load. It also forces the length of the
      data load to be 8 byte aligned. The combination of these two guarantees
      that the data is always at 8 byte alignment.
      Tested-by: Frederic Weisbecker <fweisbec@gmail.com>
                 (on sparc64)
      Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      2271048d
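The effect of the two header forms on payload alignment can be worked through with a small model. This is a simplified sketch, not the kernel's layout code: `event_size()`, `payload_offset()`, and the constants are hypothetical names, and the `force_8byte` argument stands in for the case where CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HDR_SIZE      4u    /* compact event header */
#define LEN_WORD_SIZE 4u    /* extra word that holds the length */
#define MAX_SMALL_LEN 112u  /* largest payload the compact header describes */

/* Round n up to a multiple of a (a must be a power of two). */
static size_t align_up(size_t n, size_t a)
{
	return (n + a - 1) & ~(a - 1);
}

/* Offset of the payload from the start of the event. */
static size_t payload_offset(size_t payload, bool force_8byte)
{
	if (force_8byte || align_up(payload, 4) > MAX_SMALL_LEN)
		return HDR_SIZE + LEN_WORD_SIZE;  /* length kept in the extra word */
	return HDR_SIZE;                          /* length kept in the header */
}

/* Total bytes the event occupies in the buffer. */
static size_t event_size(size_t payload, bool force_8byte)
{
	size_t align = force_8byte ? 8 : 4;

	return payload_offset(payload, force_8byte) + align_up(payload, align);
}
```

With `force_8byte` set, every event starts with an 8 byte header and has a length that is a multiple of 8, so as long as the first event starts on an 8 byte boundary, every payload does too.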
  3. 13 Mar, 2010 1 commit
  4. 04 Feb, 2010 1 commit
  5. 27 Jan, 2010 2 commits
    • S
      ring-buffer: Check for end of page in iterator · 3c05d748
      Steven Rostedt committed
      If the iterator comes to an empty page for some reason, or if
      the page is emptied by a consuming read, the iterator code currently
      does not check whether the iterator is past the contents, and may
      return a false entry.
      
      This patch adds a check to the ring buffer iterator to test if the
      current page has been completely read and sets the iterator to the
      next page if necessary.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      3c05d748
    • S
      ring-buffer: Check if ring buffer iterator has stale data · 492a74f4
      Steven Rostedt committed
      Usually reads of the ring buffer are performed by a single task.
      There are two types of reads from the ring buffer.
      
      One is a consuming read which will consume the entry that was read
      and the next read will be the entry that follows.
      
      The other is an iterator that will let the user read the contents of
      the ring buffer without modifying it. When an iterator is allocated,
      writes to the ring buffer are disabled to protect the iterator.
      
      The problem exists when consuming reads happen while an iterator is
      allocated. Specifically, the kind of read that swaps out an entire
      page (used by splice) and replaces it with a new read. If the iterator
      is on the page that is swapped out, then the next read may read
      from this swapped out page and return garbage.
      
      This patch adds a check when reading the iterator to make sure that
      the iterator contents are still valid. If a consuming read has taken
      place, the iterator is reset.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      492a74f4
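One way to picture the validity check is an iterator that remembers what the buffer looked like when it was last reset and compares on every read. The sketch below is an assumption-heavy illustration: the struct fields and the `iter_reset()` / `iter_validate()` helpers are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page { int id; };

struct cpu_buffer {
	struct page *reader_page;  /* replaced by a page-swapping read */
	unsigned long read;        /* events consumed so far */
};

struct iter {
	struct cpu_buffer *cb;
	struct page *cache_reader_page;  /* reader page at last reset */
	unsigned long cache_read;        /* consumed count at last reset */
	size_t pos;                      /* position within the buffer */
};

/* Point the iterator back at the start and snapshot the buffer state. */
static void iter_reset(struct iter *it)
{
	it->cache_reader_page = it->cb->reader_page;
	it->cache_read = it->cb->read;
	it->pos = 0;
}

/* Returns true if a consuming read made the iterator stale. */
static bool iter_validate(struct iter *it)
{
	if (it->cache_reader_page == it->cb->reader_page &&
	    it->cache_read == it->cb->read)
		return false;
	iter_reset(it);
	return true;
}
```

Calling `iter_validate()` at the top of each iterator read keeps the iterator from walking a page that has already been swapped out.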
  6. 07 Jan, 2010 2 commits
    • S
      ring-buffer: Add rb_list_head() wrapper around new reader page next field · 0e1ff5d7
      Steven Rostedt committed
      If the very unlikely case happens where the writer moves the head by one
      between where the head page is read and where the new reader page
      is assigned _and_ the writer then writes and wraps the entire ring buffer
      so that the head page is back to what was originally read as the head page,
      the page to be swapped will have a corrupted next pointer.
      
      The simple solution is to wrap the assignment of the next pointer with
      rb_list_head().
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      0e1ff5d7
    • D
      ring-buffer: Wrap a list.next reference with rb_list_head() · 5ded3dc6
      David Sharp committed
      This reference at the end of rb_get_reader_page() was causing off-by-one
      writes to the prev pointer of the page after the reader page when that
      page is the head page, and therefore the reader page has the RB_PAGE_HEAD
      flag in its list.next pointer. This eventually results in a GPF in a
      subsequent call to rb_set_head_page() (usually from rb_get_reader_page())
      when that prev pointer is dereferenced. The dereferenced register would
      characteristically have an address that appears shifted left by one byte
      (e.g., ffxxxxxxxxxxxxyy instead of ffffxxxxxxxxxxxx) due to being written at
      an address one byte too high.
      Signed-off-by: David Sharp <dhsharp@google.com>
      LKML-Reference: <1262826727-9090-1-git-send-email-dhsharp@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      5ded3dc6
  7. 05 Jan, 2010 1 commit
  8. 15 Dec, 2009 3 commits
  9. 11 Dec, 2009 2 commits
    • S
      ring-buffer: Move resize integrity check under reader lock · dd7f5943
      Steven Rostedt committed
      While using an application that does splice on the ftrace ring
      buffer at start up, I triggered an integrity check failure.
      
      Looking into this, I discovered that resizing the buffer performs
      an integrity check after the buffer is resized. This check unfortunately
      is performed after it releases the reader lock. If a reader is
      reading the buffer it may cause the integrity check to trigger a
      false failure.
      
      This patch simply moves the integrity checker under the protection
      of the ring buffer reader lock.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      dd7f5943
    • S
      ring-buffer: Use sync sched protection on ring buffer resizing · 18421015
      Steven Rostedt committed
      There was a comment in the ring buffer code that says the calling
      layers should prevent tracing or reading of the ring buffer while
      resizing. I have discovered that the tracers do not honor this
      arrangement.
      
      This patch moves the disabling and synchronizing of the ring buffer to
      a higher layer during resizing. This guarantees that no writes
      are occurring while the resize takes place.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      18421015
  10. 17 Nov, 2009 1 commit
    • S
      ring-buffer: Move access to commit_page up into function used · 5a50e33c
      Steven Rostedt committed
      With the change in the way we process commits, a commit only happens
      at the outermost level, and we no longer need to worry about
      a commit ending after rb_start_commit() has been called. The code
      used to grab the commit page before the tail page to prevent a possible
      race, but this race no longer exists with the rb_start_commit() /
      rb_end_commit() interface.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      5a50e33c
  11. 04 Nov, 2009 1 commit
  12. 24 Oct, 2009 2 commits
  13. 06 Oct, 2009 1 commit
  14. 20 Sep, 2009 1 commit
  15. 14 Sep, 2009 1 commit
    • S
      ring-buffer: typecast cmpxchg to fix PowerPC warning · 08a40816
      Steven Rostedt committed
      The cmpxchg used by PowerPC does the following:
      
        ({									 \
           __typeof__(*(ptr)) _o_ = (o);					 \
           __typeof__(*(ptr)) _n_ = (n);					 \
           (__typeof__(*(ptr))) __cmpxchg((ptr), (unsigned long)_o_,		 \
      				    (unsigned long)_n_, sizeof(*(ptr))); \
        })
      
      This does a type check of *ptr to both o and n.
      
      Unfortunately, the code in ring-buffer.c assigns longs to pointers
      and pointers to longs and causes a warning on PowerPC:
      
      ring_buffer.c: In function 'rb_head_page_set':
      ring_buffer.c:704: warning: initialization makes pointer from integer without a cast
      ring_buffer.c:704: warning: initialization makes pointer from integer without a cast
      ring_buffer.c: In function 'rb_head_page_replace':
      ring_buffer.c:797: warning: initialization makes integer from pointer without a cast
      
      This patch adds the typecasts inside cmpxchg to annotate that a long is
      being cast to a pointer and a pointer is being cast to a long, which
      removes the PowerPC warnings.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      08a40816
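The type check performed by such a macro, and the kind of cast that satisfies it, can be reproduced in user space. The sketch below is an illustration only: `checked_cmpxchg()` and `flip_flag()` are hypothetical names, the macro is built on the GCC/Clang `__sync_val_compare_and_swap` builtin rather than the PowerPC `__cmpxchg` helper, and it relies on the GNU statement-expression extension just as the quoted macro does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Type-checked compare-and-swap in the style of the macro quoted above.
 * The two initializers force o and n to have the type of *ptr.
 */
#define checked_cmpxchg(ptr, o, n)                        \
({                                                        \
	__typeof__(*(ptr)) _o_ = (o);                     \
	__typeof__(*(ptr)) _n_ = (n);                     \
	__sync_val_compare_and_swap((ptr), _o_, _n_);     \
})

struct node { struct node *next; };

/*
 * Change the flag kept in the low bits of prev->next, which points at
 * page. Returns nonzero if the pointer held old_flag and was updated.
 */
static int flip_flag(struct node *prev, struct node *page,
		     uintptr_t old_flag, uintptr_t new_flag)
{
	uintptr_t val = (uintptr_t)page;
	struct node *old = (struct node *)(val | old_flag);
	struct node *seen;

	/* Passing (val | new_flag) without the cast would make the macro
	 * initialize a pointer from an integer and draw the warning. */
	seen = checked_cmpxchg(&prev->next, old,
			       (struct node *)(val | new_flag));
	return seen == old;
}
```

The explicit casts change nothing at run time; they only tell the compiler that the integer-to-pointer conversion is intended.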
  16. 10 Sep, 2009 1 commit
  17. 05 Sep, 2009 2 commits
  18. 04 Sep, 2009 7 commits
    • S
      ring-buffer: disable all cpu buffers when one finds a problem · 077c5407
      Steven Rostedt committed
      Currently, RB_WARN_ON disables either the current
      CPU buffer or all CPU buffers, depending on whether a ring_buffer or
      ring_buffer_per_cpu struct was passed into the macro.
      
      Most users of the RB_WARN_ON pass in the CPU buffer, so only the one
      CPU buffer gets disabled but the rest are still active. This may
      confuse users even though a warning is sent to the console.
      
      This patch changes the macro to disable the entire buffer even if
      the CPU buffer is passed in.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      077c5407
    • S
      ring-buffer: do not count discarded events · a1863c21
      Steven Rostedt committed
      The latency tracers report the number of items in the trace buffer.
      This uses the ring buffer data to calculate this. Because discarded
      events are also counted, the numbers do not match the number of items
      that are printed. The ring buffer also adds a "padding" item to the
      end of each buffer page which also gets counted as a discarded item.
      
      This patch decrements the counter to the page entries on a discard.
      This allows us to ignore discarded entries while reading the buffer.
      
      Decrementing the counter is still safe since it can only happen while
      the committing flag is still set.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      a1863c21
    • S
      ring-buffer: remove ring_buffer_event_discard · dc892f73
      Steven Rostedt committed
      The function ring_buffer_event_discard can be used on any item in the
      ring buffer, even after the item was committed. This function provides
      no safety nets and is very race prone.
      
      An item may be safely removed from the ring buffer before it is committed
      with the ring_buffer_discard_commit.
      
      Since there are currently no users of this function, and because this
      function is racy and error prone, this patch removes it altogether.
      
      Note, removing this function also allows the counters to ignore
      all discarded events (patches will follow).
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      dc892f73
    • S
      ring-buffer: fix ring_buffer_read crossing pages · 7e9391cf
      Steven Rostedt committed
      When the ring buffer uses an iterator (static read mode, not on the
      fly reading) and crosses a page boundary, it will skip the first
      entry on the next page. The reason is that the last entry of a page
      is usually padding if the page is not full. The padding will not be
      returned to the user.
      
      The problem arises on ring_buffer_read because it also increments the
      iterator. Because both the read and peek use the same rb_iter_peek,
      the rb_iter_peek will return the padding but also increment to the next
      item. This is because the ring_buffer_peek will not increment it
      itself.
      
      The ring_buffer_read will increment it again and then call rb_iter_peek
      again to get the next item. But that will be the second item, not the
      first one on the page.
      
      The reason this never showed up before, is because the ftrace utility
      always calls ring_buffer_peek first and only uses ring_buffer_read
      to increment to the next item. The ring_buffer_peek will always keep
      the pointer to a valid item and not padding. This just hid the bug.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      7e9391cf
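The intended contract between peek and read (skip padding, never return it, advance exactly once per read) can be shown on a flattened model of the buffer. This is a sketch of that contract only, not the kernel fix: the `item` and `iter` types and the `iter_peek()` / `iter_read()` helpers are hypothetical, and page boundaries are represented by a single padding item.

```c
#include <assert.h>
#include <stddef.h>

enum item_type { ITEM_DATA, ITEM_PADDING };

struct item { enum item_type type; int value; };

struct iter {
	const struct item *items;
	size_t count;
	size_t pos;
};

/* Return the current data item without consuming it; padding is skipped. */
static const struct item *iter_peek(struct iter *it)
{
	while (it->pos < it->count && it->items[it->pos].type == ITEM_PADDING)
		it->pos++;
	return it->pos < it->count ? &it->items[it->pos] : NULL;
}

/* Return the current data item and advance past it exactly once. */
static const struct item *iter_read(struct iter *it)
{
	const struct item *ev = iter_peek(it);

	if (ev)
		it->pos++;
	return ev;
}
```

With the items data(1), padding, data(2), data(3), successive reads return 1, 2, 3; the bug described above corresponds to the second read returning 3 because the position was advanced twice across the padding.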
    • S
      ring-buffer: remove unnecessary cpu_relax · 1b959e18
      Steven Rostedt committed
      The loops in the ring buffer that use cpu_relax are not dependent on
      other CPUs. They simply came across some padding in the ring buffer and
      are skipping over it. It is a normal loop and does not require a
      cpu_relax.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      1b959e18
    • S
      ring-buffer: do not swap buffers during a commit · 98277991
      Steven Rostedt committed
      If a commit is taking place on a CPU ring buffer, do not allow it to
      be swapped. Return -EBUSY when this is detected instead.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      98277991
    • S
      ring-buffer: do not reset while in a commit · 41b6a95d
      Steven Rostedt committed
      The callers of reset must ensure that no commit can be taking place
      at the time of the reset. If one is, then we may corrupt the ring buffer.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      41b6a95d
  19. 08 Aug, 2009 1 commit
  20. 06 Aug, 2009 3 commits
    • R
      ring-buffer: Fix advance of reader in rb_buffer_peek() · 469535a5
      Robert Richter committed
      When calling rb_buffer_peek() from ring_buffer_consume() and a
      padding event is returned, the function rb_advance_reader() is
      called twice. This may lead to missing samples or, under high
      workloads, to the warning below. This patch fixes this. If a padding
      event is returned by rb_buffer_peek() it will be consumed by the
      calling function now.
      
      Also, I simplified some code in ring_buffer_consume().
      
      ------------[ cut here ]------------
      WARNING: at /dev/shm/.source/linux/kernel/trace/ring_buffer.c:2289 rb_advance_reader+0x2e/0xc5()
      Hardware name: Anaheim
      Modules linked in:
      Pid: 29, comm: events/2 Tainted: G        W  2.6.31-rc3-oprofile-x86_64-standard-00059-g5050dc2 #1
      Call Trace:
      [<ffffffff8106776f>] ? rb_advance_reader+0x2e/0xc5
      [<ffffffff81039ffe>] warn_slowpath_common+0x77/0x8f
      [<ffffffff8103a025>] warn_slowpath_null+0xf/0x11
      [<ffffffff8106776f>] rb_advance_reader+0x2e/0xc5
      [<ffffffff81068bda>] ring_buffer_consume+0xa0/0xd2
      [<ffffffff81326933>] op_cpu_buffer_read_entry+0x21/0x9e
      [<ffffffff810be3af>] ? __find_get_block+0x4b/0x165
      [<ffffffff8132749b>] sync_buffer+0xa5/0x401
      [<ffffffff810be3af>] ? __find_get_block+0x4b/0x165
      [<ffffffff81326c1b>] ? wq_sync_buffer+0x0/0x78
      [<ffffffff81326c76>] wq_sync_buffer+0x5b/0x78
      [<ffffffff8104aa30>] worker_thread+0x113/0x1ac
      [<ffffffff8104dd95>] ? autoremove_wake_function+0x0/0x38
      [<ffffffff8104a91d>] ? worker_thread+0x0/0x1ac
      [<ffffffff8104dc9a>] kthread+0x88/0x92
      [<ffffffff8100bdba>] child_rip+0xa/0x20
      [<ffffffff8104dc12>] ? kthread+0x0/0x92
      [<ffffffff8100bdb0>] ? child_rip+0x0/0x20
      ---[ end trace f561c0a58fcc89bd ]---
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      469535a5
    • S
      ring-buffer: do not disable ring buffer on oops_in_progress · 464e85eb
      Steven Rostedt committed
      The commit:
      
        commit e0fdace1
        Author: David Miller <davem@davemloft.net>
        Date:   Fri Aug 1 01:11:22 2008 -0700
      
          debug_locks: set oops_in_progress if we will log messages.
      
          Otherwise lock debugging messages on runqueue locks can deadlock the
          system due to the wakeups performed by printk().
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      
      Will permanently set oops_in_progress on any lockdep failure.
      When this triggers it will cause any read from the ring buffer to
      permanently disable the ring buffer (not to mention no locking of
      printk).
      
      This patch removes the check. It keeps the print in NMI which makes
      sense. This is probably OK, since the ring buffer should not cause
      something to set oops_in_progress anyway.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      464e85eb
    • S
      ring-buffer: fix check of try_to_discard result · 0f2541d2
      Steven Rostedt committed
      The function ring_buffer_discard_commit inverted the code path
      of the result of try_to_discard. It should skip incrementing the
      entry counter if try_to_discard succeeded. But instead, it increments
      the entry counter if the discard succeeded, and does not increment
      it if it fails.
      
      The result of this bug is that filtering will make the stat counters
      incorrect.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      0f2541d2
  21. 17 Jul, 2009 1 commit
  22. 08 Jul, 2009 2 commits
    • S
      ring-buffer: make lockless · 77ae365e
      Steven Rostedt committed
      This patch converts the ring buffers into a completely lockless
      buffer recording system. The read side still takes locks since
      we still serialize readers. But the writers are the ones that
      must be lockless (those can happen in NMIs).
      
      The main change is to the "head_page" pointer. We write to the
      tail, and read from the head. The "head_page" pointer in the cpu
      buffer is now just a reference to where to look. The real head
      page is now kept in the head_page->list->prev->next pointer.
      That is, in the list head of the previous page we set flags.
      
      The list pages are allocated to be aligned such that the least
      significant bits of pointers to the list are always zero. This gives
      us room to put flags in those pointers.
      
      bit 0: set when the page is a head page
      bit 1: set when the writer is moving the page (for overwrite mode)
      
      cmpxchg is used to update the pointer.
      
      When the writer wraps the buffer and the tail meets the head,
      in overwrite mode, the writer must move the head page forward.
      It first uses cmpxchg to change the pointer flag from 1 to 2.
      Once this is done, the reader on another CPU will not take the
      page from the buffer.
      
      The writers need to protect against interrupts (we don't bother with
      disabling interrupts because NMIs are allowed to write too).
      
      After the writer sets the pointer flag to 2, it takes care to
      manage interrupts coming in. This is described in detail within the
      comments of the code.
      
       Changes in version 2:
        - Let reader reset entries value of header page.
        - Fix tail page passing commit page on reader page test.
        - Always increment entries and write counter in rb_tail_page_update
        - Add safety check in rb_set_commit_to_write to break out of infinite loop
        - add mask in rb_is_reader_page
      
      [ Impact: lock free writing to the ring buffer ]
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      77ae365e
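The flag-in-pointer scheme described in this commit can be sketched in user space as follows. This is a minimal illustration, not the kernel implementation: `list_head_ptr()`, `ptr_flags()`, `set_page_flag()`, and the `PAGE_*` macros are hypothetical names, the GCC/Clang `__atomic_compare_exchange_n` builtin stands in for the kernel's cmpxchg, and the nodes are assumed to be at least 4 byte aligned so that the two low bits are free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_HEAD   ((uintptr_t)1)  /* bit 0: the page is the head page */
#define PAGE_UPDATE ((uintptr_t)2)  /* bit 1: the writer is moving the page */
#define FLAG_MASK   ((uintptr_t)3)

struct list_node { struct list_node *next, *prev; };

/* Strip the flag bits to recover the real pointer. */
static struct list_node *list_head_ptr(struct list_node *p)
{
	return (struct list_node *)((uintptr_t)p & ~FLAG_MASK);
}

/* Extract the flag bits carried by a pointer. */
static uintptr_t ptr_flags(struct list_node *p)
{
	return (uintptr_t)p & FLAG_MASK;
}

/*
 * Atomically change the flag on prev->next, which points at page, from
 * old_flag to new_flag. Fails if another context changed it first.
 */
static bool set_page_flag(struct list_node *prev, struct list_node *page,
			  uintptr_t old_flag, uintptr_t new_flag)
{
	uintptr_t base = (uintptr_t)page;
	struct list_node *expected = (struct list_node *)(base | old_flag);
	struct list_node *desired = (struct list_node *)(base | new_flag);

	return __atomic_compare_exchange_n(&prev->next, &expected, desired,
					   false, __ATOMIC_SEQ_CST,
					   __ATOMIC_SEQ_CST);
}
```

Because a flagged pointer is not a valid address, every traversal has to go through a helper such as `list_head_ptr()` before dereferencing, which is the role the later commits give to rb_list_head().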
    • S
      ring-buffer: make the buffer a true circular link list · 3adc54fa
      Steven Rostedt committed
      This patch changes the ring buffer data pages from using a linked list
      head pointer, to making each buffer page point to another buffer page
      and never back to a "head".
      
      This makes the handling of the ring buffer less complex, since the
      traversing of the ring buffer pages no longer needs to account for the
      head pointer.
      
      This change also is needed to make the ring buffer lockless.
      
      [
        Changes in version 2:
      
        - Added change that Lai Jiangshan mentioned.
      
        From: Lai Jiangshan <laijs@cn.fujitsu.com>
        Date: Thu, 11 Jun 2009 11:25:48 +0800
        LKML-Reference: <4A30793C.6090208@cn.fujitsu.com>
      
        I'm not sure whether these 4 lines:
      	bpage = list_entry(pages.next, struct buffer_page, list);
      	list_del_init(&bpage->list);
      	cpu_buffer->pages = &bpage->list;
      
      	list_splice(&pages, cpu_buffer->pages);
        equal to these 2 lines:
       	cpu_buffer->pages = pages.next;
       	list_del(&pages);
      
        If there are equivalent, I think the second one
        are simpler. It may be not a really necessarily cleanup.
      
        What I asked is: if there are equivalent, could you use these two line:
       	cpu_buffer->pages = pages.next;
      	list_del(&pages);
      ]
      
      [ Impact: simplify the ring buffer to help make it lockless ]
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      3adc54fa
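The two-line form suggested in the quoted mail (take `pages.next`, then unlink the temporary head) can be checked with a stand-alone list. The sketch below re-implements the few list helpers it needs in user space instead of using the kernel's `<linux/list.h>`; `make_ring()` and `ring_len()` are hypothetical helpers added for the demonstration.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink an entry; its neighbours end up pointing at each other. */
static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Build a headed list of count nodes, then drop the head (count >= 1). */
static struct list_head *make_ring(struct list_head *nodes, int count)
{
	struct list_head pages;
	struct list_head *ring;

	list_init(&pages);
	for (int i = 0; i < count; i++)
		list_add_tail(&nodes[i], &pages);

	ring = pages.next;  /* first real page */
	list_del(&pages);   /* last page now links straight to the first */
	return ring;
}

/* Count the nodes of a headless ring. */
static int ring_len(struct list_head *start)
{
	int n = 1;

	for (struct list_head *p = start->next; p != start; p = p->next)
		n++;
	return n;
}
```

After `list_del(&pages)` no node refers to the temporary head any more, so the pages form a true circular list in which traversal never has to skip a head entry.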
  23. 25 Jun, 2009 1 commit
    • P
      ring-buffer: Make it generally available · 1155de47
      Paul Mundt committed
      In hunting down the cause for the hwlat_detector ring buffer spew in
      my failed -next builds it became obvious that folks are now treating
      ring_buffer as something that is generic independent of tracing and thus,
      suitable for public driver consumption.
      
      Given that there are only a few minor areas in ring_buffer that have any
      reliance on CONFIG_TRACING or CONFIG_FUNCTION_TRACER, provide stubs for
      those and make it generally available.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Cc: Jon Masters <jcm@jonmasters.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <20090625053012.GB19944@linux-sh.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1155de47