1. 04 Feb, 2010 1 commit
  2. 27 Jan, 2010 2 commits
    • S
      ring-buffer: Check for end of page in iterator · 3c05d748
      Steven Rostedt committed
      If the iterator comes to an empty page for some reason, or if
      the page is emptied by a consuming read, the iterator code currently
      does not check whether the iterator is past the contents, and may
      return a false entry.
      
      This patch adds a check to the ring buffer iterator to test if the
      current page has been completely read and sets the iterator to the
      next page if necessary.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • S
      ring-buffer: Check if ring buffer iterator has stale data · 492a74f4
      Steven Rostedt committed
      Usually, reads of the ring buffer are performed by a single task.
      There are two types of reads from the ring buffer.
      
      One is a consuming read which will consume the entry that was read
      and the next read will be the entry that follows.
      
      The other is an iterator that will let the user read the contents of
      the ring buffer without modifying it. When an iterator is allocated,
      writes to the ring buffer are disabled to protect the iterator.
      
      The problem exists when consuming reads happen while an iterator is
      allocated. Specifically, the kind of read that swaps out an entire
      page (used by splice) and replaces it with a new page. If the iterator
      is on the page that is swapped out, then the next read may read
      from this swapped out page and return garbage.
      
      This patch adds a check when reading the iterator to make sure that
      the iterator contents are still valid. If a consuming read has taken
      place, the iterator is reset.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
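      The validity check described in this commit can be sketched as follows.
      This is a minimal user-space model, not the kernel code: the struct
      layouts, the cached snapshot fields, and the helper names iter_reset()
      and iter_revalidate() are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a per-CPU buffer and its iterator. */
struct page { int id; };

struct cpu_buffer {
	struct page *reader_page;  /* page currently owned by the reader */
	unsigned long read;        /* number of events consumed so far */
};

struct iter {
	struct cpu_buffer *buf;
	struct page *cache_reader_page; /* snapshot taken at reset time */
	unsigned long cache_read;       /* snapshot taken at reset time */
	size_t pos;                     /* position within the buffer */
};

/* Rewind the iterator and snapshot the consumer-side state. */
static void iter_reset(struct iter *it)
{
	it->pos = 0;
	it->cache_reader_page = it->buf->reader_page;
	it->cache_read = it->buf->read;
}

/*
 * Call before every iterator read. If a consuming read has moved the
 * reader page or consumed events since the snapshot, the iterator may
 * point into a swapped-out page, so it starts over.
 * Returns true when a reset was needed.
 */
static bool iter_revalidate(struct iter *it)
{
	if (it->cache_reader_page != it->buf->reader_page ||
	    it->cache_read != it->buf->read) {
		iter_reset(it);
		return true;
	}
	return false;
}
```

      Comparing two cheap snapshot values keeps the check off the write path;
      the cost is that an invalidated iterator restarts from the beginning
      instead of resuming.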
  3. 07 Jan, 2010 2 commits
    • S
      ring-buffer: Add rb_list_head() wrapper around new reader page next field · 0e1ff5d7
      Steven Rostedt committed
      If the very unlikely case happens where the writer moves the head by one
      between where the head page is read and where the new reader page
      is assigned _and_ the writer then writes and wraps the entire ring buffer
      so that the head page is back to what was originally read as the head page,
      the page to be swapped will have a corrupted next pointer.
      
      The simple solution is to wrap the assignment of the next pointer with
      rb_list_head().
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • D
      ring-buffer: Wrap a list.next reference with rb_list_head() · 5ded3dc6
      David Sharp committed
      This reference at the end of rb_get_reader_page() was causing off-by-one
      writes to the prev pointer of the page after the reader page when that
      page is the head page, and therefore the reader page has the RB_PAGE_HEAD
      flag in its list.next pointer. This eventually results in a GPF in a
      subsequent call to rb_set_head_page() (usually from rb_get_reader_page())
      when that prev pointer is dereferenced. The dereferenced register would
      characteristically have an address that appears shifted left by one byte
      (e.g., ffxxxxxxxxxxxxyy instead of ffffxxxxxxxxxxxx) due to being written at
      an address one byte too high.
      Signed-off-by: David Sharp <dhsharp@google.com>
      LKML-Reference: <1262826727-9090-1-git-send-email-dhsharp@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
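      Both commits in this group wrap a list.next reference with
      rb_list_head(). The idea can be sketched in user space as below. This is
      an illustrative model under stated assumptions: RB_PAGE_HEAD is named in
      the commit text, while RB_PAGE_UPDATE, RB_FLAG_MASK, and the helper
      tag_head() are names assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits kept in the low bits of a page's list.next pointer. */
#define RB_PAGE_HEAD   ((uintptr_t)1)
#define RB_PAGE_UPDATE ((uintptr_t)2)  /* assumed name */
#define RB_FLAG_MASK   ((uintptr_t)3)  /* assumed name */

struct list_head {
	struct list_head *next, *prev;
};

/* Strip the flag bits so the pointer can be dereferenced safely. */
static struct list_head *rb_list_head(struct list_head *list)
{
	uintptr_t val = (uintptr_t)list;

	return (struct list_head *)(val & ~RB_FLAG_MASK);
}

/* Illustrative helper: mark a link as pointing at the head page. */
static struct list_head *tag_head(struct list_head *list)
{
	return (struct list_head *)((uintptr_t)list | RB_PAGE_HEAD);
}
```

      A link carrying the head flag holds an address one byte higher than the
      real list_head, so writing through it without the mask lands one byte
      too high, which matches the shifted addresses described above.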
  4. 15 Dec, 2009 3 commits
  5. 11 Dec, 2009 2 commits
    • S
      ring-buffer: Move resize integrity check under reader lock · dd7f5943
      Steven Rostedt committed
      While using an application that does splice on the ftrace ring
      buffer at start up, I triggered an integrity check failure.
      
      Looking into this, I discovered that resizing the buffer performs
      an integrity check after the buffer is resized. This check unfortunately
      is performed after it releases the reader lock. If a reader is
      reading the buffer it may cause the integrity check to trigger a
      false failure.
      
      This patch simply moves the integrity checker under the protection
      of the ring buffer reader lock.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • S
      ring-buffer: Use sync sched protection on ring buffer resizing · 18421015
      Steven Rostedt committed
      There was a comment in the ring buffer code that says the calling
      layers should prevent tracing or reading of the ring buffer while
      resizing. I have discovered that the tracers do not honor this
      arrangement.
      
      This patch moves the disabling and synchronizing of the ring buffer to
      a higher layer during resizing. This guarantees that no writes
      are occurring while the resize takes place.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  6. 17 Nov, 2009 1 commit
    • S
      ring-buffer: Move access to commit_page up into function used · 5a50e33c
      Steven Rostedt committed
      With the change in the way we process commits, a commit only happens
      at the outermost level, and we no longer need to worry about
      a commit ending after rb_start_commit() has been called. The code
      used to grab the commit page before the tail page to prevent a possible
      race, but this race no longer exists with the rb_start_commit() and
      rb_end_commit() interface.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  7. 04 Nov, 2009 1 commit
  8. 24 Oct, 2009 2 commits
  9. 06 Oct, 2009 1 commit
  10. 20 Sep, 2009 1 commit
  11. 14 Sep, 2009 1 commit
    • S
      ring-buffer: typecast cmpxchg to fix PowerPC warning · 08a40816
      Steven Rostedt committed
      The cmpxchg used by PowerPC does the following:
      
        ({									 \
           __typeof__(*(ptr)) _o_ = (o);					 \
           __typeof__(*(ptr)) _n_ = (n);					 \
           (__typeof__(*(ptr))) __cmpxchg((ptr), (unsigned long)_o_,		 \
      				    (unsigned long)_n_, sizeof(*(ptr))); \
        })
      
      This does a type check of *ptr to both o and n.
      
      Unfortunately, the code in ring-buffer.c assigns longs to pointers
      and pointers to longs and causes a warning on PowerPC:
      
      ring_buffer.c: In function 'rb_head_page_set':
      ring_buffer.c:704: warning: initialization makes pointer from integer without a cast
      ring_buffer.c:704: warning: initialization makes pointer from integer without a cast
      ring_buffer.c: In function 'rb_head_page_replace':
      ring_buffer.c:797: warning: initialization makes integer from pointer without a cast
      
      This patch adds the typecasts inside cmpxchg to annotate that a long is
      being cast to a pointer and a pointer is being cast to a long, and this
      removes the PowerPC warnings.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  12. 10 Sep, 2009 1 commit
  13. 05 Sep, 2009 2 commits
  14. 04 Sep, 2009 7 commits
    • S
      ring-buffer: disable all cpu buffers when one finds a problem · 077c5407
      Steven Rostedt committed
      Currently, RB_WARN_ON disables either the current
      CPU buffer or all CPU buffers, depending on whether a ring_buffer or
      ring_buffer_per_cpu struct was passed into the macro.
      
      Most users of the RB_WARN_ON pass in the CPU buffer, so only the one
      CPU buffer gets disabled but the rest are still active. This may
      confuse users even though a warning is sent to the console.
      
      This patch changes the macro to disable the entire buffer even if
      the CPU buffer is passed in.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • S
      ring-buffer: do not count discarded events · a1863c21
      Steven Rostedt committed
      The latency tracers report the number of items in the trace buffer.
      This uses the ring buffer data to calculate this. Because discarded
      events are also counted, the numbers do not match the number of items
      that are printed. The ring buffer also adds a "padding" item to the
      end of each buffer page which also gets counted as a discarded item.
      
      This patch decrements the page's entries counter on a discard.
      This allows us to ignore discarded entries while reading the buffer.
      
      Decrementing the counter is still safe since it can only happen while
      the committing flag is still set.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • S
      ring-buffer: remove ring_buffer_event_discard · dc892f73
      Steven Rostedt committed
      The function ring_buffer_event_discard can be used on any item in the
      ring buffer, even after the item was committed. This function provides
      no safety nets and is very race prone.
      
      An item may be safely removed from the ring buffer before it is committed
      with ring_buffer_discard_commit.
      
      Since there are currently no users of this function, and because this
      function is racy and error prone, this patch removes it altogether.
      
      Note, removing this function also allows the counters to ignore
      all discarded events (patches will follow).
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • S
      ring-buffer: fix ring_buffer_read crossing pages · 7e9391cf
      Steven Rostedt committed
      When the ring buffer uses an iterator (static read mode, not on the
      fly reading), when it crosses a page boundary, it will skip the first
      entry on the next page. The reason is that the last entry of a page
      is usually padding if the page is not full. The padding will not be
      returned to the user.
      
      The problem arises on ring_buffer_read because it also increments the
      iterator. Because both the read and peek use the same rb_iter_peek,
      the rb_iter_peek will return the padding but also increment to the next
      item. This is because the ring_buffer_peek will not increment it
      itself.
      
      The ring_buffer_read will increment it again and then call rb_iter_peek
      again to get the next item. But that will be the second item, not the
      first one on the page.
      
      The reason this never showed up before, is because the ftrace utility
      always calls ring_buffer_peek first and only uses ring_buffer_read
      to increment to the next item. The ring_buffer_peek will always keep
      the pointer to a valid item and not padding. This just hid the bug.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
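      The double increment described in this commit can be reproduced in a
      small model. The sketch below is an interpretation of the description,
      not the kernel code: the flat event array, the struct layouts, and the
      names iter_peek(), iter_read_buggy(), and iter_read_fixed() are all
      assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum ev_type { EV_DATA, EV_PADDING };

struct event {
	enum ev_type type;
	int value;
};

struct iter {
	const struct event *ev;  /* events laid out back to back */
	size_t len;
	size_t pos;
};

/*
 * Return the event under the iterator without consuming data events.
 * A padding event is returned too, but the iterator has already been
 * stepped past it, as the commit message describes.
 */
static const struct event *iter_peek(struct iter *it)
{
	const struct event *e;

	if (it->pos >= it->len)
		return NULL;
	e = &it->ev[it->pos];
	if (e->type == EV_PADDING)
		it->pos++;
	return e;
}

/* Buggy read: advances even when peek already stepped over padding. */
static const struct event *iter_read_buggy(struct iter *it)
{
	const struct event *e;

	do {
		e = iter_peek(it);
		if (!e)
			return NULL;
		it->pos++;
	} while (e->type == EV_PADDING);
	return e;
}

/* Fixed read: retry on padding and advance only past a real event. */
static const struct event *iter_read_fixed(struct iter *it)
{
	const struct event *e;

	do {
		e = iter_peek(it);
		if (!e)
			return NULL;
	} while (e->type == EV_PADDING);
	it->pos++;
	return e;
}
```

      With the events data(1), padding, data(2), data(3), the buggy reader
      returns 1 and then 3, while the fixed reader returns 1, 2, 3.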
    • S
      ring-buffer: remove unnecessary cpu_relax · 1b959e18
      Steven Rostedt committed
      The loops in the ring buffer that use cpu_relax are not dependent on
      other CPUs. They simply came across some padding in the ring buffer and
      are skipping over it. It is a normal loop and does not require a
      cpu_relax.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • S
      ring-buffer: do not swap buffers during a commit · 98277991
      Steven Rostedt committed
      If a commit is taking place on a CPU ring buffer, do not allow it to
      be swapped. Return -EBUSY when this is detected instead.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
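      The guard described in this commit amounts to refusing the swap while
      either side has a commit in flight. A minimal single-threaded sketch,
      assuming a hypothetical `committing` counter and a hypothetical
      swap_cpu_buffers() helper (real code would need atomic accesses and
      proper ordering):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-CPU buffer: only the fields the check needs. */
struct cpu_buffer {
	int committing;  /* non-zero while a commit is in progress */
	int id;
};

/* Swap two per-CPU buffers unless a commit is in progress on either. */
static int swap_cpu_buffers(struct cpu_buffer **a, struct cpu_buffer **b)
{
	struct cpu_buffer *tmp;

	if ((*a)->committing || (*b)->committing)
		return -EBUSY;

	tmp = *a;
	*a = *b;
	*b = tmp;
	return 0;
}
```

      Returning an error leaves the retry policy to the caller instead of
      spinning inside the ring buffer code.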
    • S
      ring-buffer: do not reset while in a commit · 41b6a95d
      Steven Rostedt committed
      The callers of reset must ensure that no commit can be taking place
      at the time of the reset. If one does, we may corrupt the ring buffer.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  15. 08 Aug, 2009 1 commit
  16. 06 Aug, 2009 3 commits
    • R
      ring-buffer: Fix advance of reader in rb_buffer_peek() · 469535a5
      Robert Richter committed
      When rb_buffer_peek() is called from ring_buffer_consume() and a
      padding event is returned, the function rb_advance_reader() is
      called twice. This may lead to missing samples or, under high
      workloads, to the warning below. This patch fixes this. If a padding
      event is returned by rb_buffer_peek(), it will now be consumed by the
      calling function.
      
      Also, I simplified some code in ring_buffer_consume().
      
      ------------[ cut here ]------------
      WARNING: at /dev/shm/.source/linux/kernel/trace/ring_buffer.c:2289 rb_advance_reader+0x2e/0xc5()
      Hardware name: Anaheim
      Modules linked in:
      Pid: 29, comm: events/2 Tainted: G        W  2.6.31-rc3-oprofile-x86_64-standard-00059-g5050dc2 #1
      Call Trace:
      [<ffffffff8106776f>] ? rb_advance_reader+0x2e/0xc5
      [<ffffffff81039ffe>] warn_slowpath_common+0x77/0x8f
      [<ffffffff8103a025>] warn_slowpath_null+0xf/0x11
      [<ffffffff8106776f>] rb_advance_reader+0x2e/0xc5
      [<ffffffff81068bda>] ring_buffer_consume+0xa0/0xd2
      [<ffffffff81326933>] op_cpu_buffer_read_entry+0x21/0x9e
      [<ffffffff810be3af>] ? __find_get_block+0x4b/0x165
      [<ffffffff8132749b>] sync_buffer+0xa5/0x401
      [<ffffffff810be3af>] ? __find_get_block+0x4b/0x165
      [<ffffffff81326c1b>] ? wq_sync_buffer+0x0/0x78
      [<ffffffff81326c76>] wq_sync_buffer+0x5b/0x78
      [<ffffffff8104aa30>] worker_thread+0x113/0x1ac
      [<ffffffff8104dd95>] ? autoremove_wake_function+0x0/0x38
      [<ffffffff8104a91d>] ? worker_thread+0x0/0x1ac
      [<ffffffff8104dc9a>] kthread+0x88/0x92
      [<ffffffff8100bdba>] child_rip+0xa/0x20
      [<ffffffff8104dc12>] ? kthread+0x0/0x92
      [<ffffffff8100bdb0>] ? child_rip+0x0/0x20
      ---[ end trace f561c0a58fcc89bd ]---
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • S
      ring-buffer: do not disable ring buffer on oops_in_progress · 464e85eb
      Steven Rostedt committed
      The commit:
      
        commit e0fdace1
        Author: David Miller <davem@davemloft.net>
        Date:   Fri Aug 1 01:11:22 2008 -0700
      
          debug_locks: set oops_in_progress if we will log messages.
      
          Otherwise lock debugging messages on runqueue locks can deadlock the
          system due to the wakeups performed by printk().
          Signed-off-by: David S. Miller <davem@davemloft.net>
          Signed-off-by: Ingo Molnar <mingo@elte.hu>
      
      Will permanently set oops_in_progress on any lockdep failure.
      When this triggers it will cause any read from the ring buffer to
      permanently disable the ring buffer (not to mention no locking of
      printk).
      
      This patch removes the check. It keeps the print in NMI which makes
      sense. This is probably OK, since the ring buffer should not cause
      something to set oops_in_progress anyway.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • S
      ring-buffer: fix check of try_to_discard result · 0f2541d2
      Steven Rostedt committed
      The function ring_buffer_discard_commit inverted the code path
      for the result of try_to_discard. It should skip incrementing the
      entry counter if try_to_discard succeeded. But instead, it increments
      the entry counter if the discard succeeded, and does not increment
      it if the discard fails.
      
      The result of this bug is that filtering will make the stat counters
      incorrect.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  17. 17 Jul, 2009 1 commit
  18. 08 Jul, 2009 2 commits
    • S
      ring-buffer: make lockless · 77ae365e
      Steven Rostedt committed
      This patch converts the ring buffers into a completely lockless
      buffer recording system. The read side still takes locks since
      we still serialize readers. But the writers are the ones that
      must be lockless (those can happen in NMIs).
      
      The main change is to the "head_page" pointer. We write to the
      tail, and read from the head. The "head_page" pointer in the cpu
      buffer is now just a reference to where to look. The real head
      page is now kept in the head_page->list->prev->next pointer.
      That is, in the list head of the previous page we set flags.
      
      The list pages are allocated to be aligned such that the least
      significant bits of a pointer to the list are always zero. This gives
      us room to store flags in those pointers.
      
      bit 0: set when the page is a head page
      bit 1: set when the writer is moving the page (for overwrite mode)
      
      cmpxchg is used to update the pointer.
      
      When the writer wraps the buffer and the tail meets the head,
      in overwrite mode, the writer must move the head page forward.
      It first uses cmpxchg to change the pointer flag from 1 to 2.
      Once this is done, the reader on another CPU will not take the
      page from the buffer.
      
      The writers need to protect against interrupts (we don't bother with
      disabling interrupts because NMIs are allowed to write too).
      
      After the writer sets the pointer flag to 2, it takes care to
      manage interrupts coming in. This is described in detail within the
      comments of the code.
      
       Changes in version 2:
        - Let reader reset entries value of header page.
        - Fix tail page passing commit page on reader page test.
        - Always increment entries and write counter in rb_tail_page_update
        - Add safety check in rb_set_commit_to_write to break out of infinite loop
        - add mask in rb_is_reader_page
      
      [ Impact: lock free writing to the ring buffer ]
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
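      The flag transitions described in this commit can be modeled in user
      space with C11 atomics. This is a sketch under stated assumptions, not
      the kernel implementation: the kernel uses its own cmpxchg() rather than
      <stdatomic.h>, and the names RB_PAGE_UPDATE, RB_FLAG_MASK,
      writer_claim_head(), and reader_swap_head() are assumed here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RB_PAGE_HEAD   ((uintptr_t)1)  /* bit 0: next page is the head */
#define RB_PAGE_UPDATE ((uintptr_t)2)  /* bit 1: writer is moving it */
#define RB_FLAG_MASK   ((uintptr_t)3)

/* The link of the page before the head: page address plus flag bits. */
struct page_link {
	_Atomic uintptr_t next;
};

/*
 * Writer side: change the flag from HEAD (1) to UPDATE (2) before
 * moving the head page forward. Fails if the link has changed.
 */
static bool writer_claim_head(struct page_link *prev, uintptr_t head)
{
	uintptr_t expected = head | RB_PAGE_HEAD;

	return atomic_compare_exchange_strong(&prev->next, &expected,
					      head | RB_PAGE_UPDATE);
}

/*
 * Reader side: swap in a spare page only while the link still carries
 * the HEAD flag. Fails once the writer has claimed the page.
 */
static bool reader_swap_head(struct page_link *prev, uintptr_t head,
			     uintptr_t spare)
{
	uintptr_t expected = head | RB_PAGE_HEAD;

	return atomic_compare_exchange_strong(&prev->next, &expected, spare);
}
```

      Because both sides compare against the same pointer-plus-flag value,
      exactly one of them wins when they race for the head page.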
    • S
      ring-buffer: make the buffer a true circular link list · 3adc54fa
      Steven Rostedt committed
      This patch changes the ring buffer data pages from using a linked list
      head pointer, to making each buffer page point to another buffer page
      and never back to a "head".
      
      This makes the handling of the ring buffer less complex, since the
      traversing of the ring buffer pages no longer needs to account for the
      head pointer.
      
      This change also is needed to make the ring buffer lockless.
      
      [
        Changes in version 2:
      
        - Added change that Lai Jiangshan mentioned.
      
        From: Lai Jiangshan <laijs@cn.fujitsu.com>
        Date: Thu, 11 Jun 2009 11:25:48 +0800
        LKML-Reference: <4A30793C.6090208@cn.fujitsu.com>
      
        I'm not sure whether these 4 lines:
      	bpage = list_entry(pages.next, struct buffer_page, list);
      	list_del_init(&bpage->list);
      	cpu_buffer->pages = &bpage->list;
      
      	list_splice(&pages, cpu_buffer->pages);
        equal to these 2 lines:
       	cpu_buffer->pages = pages.next;
       	list_del(&pages);
      
        If they are equivalent, I think the second one
        is simpler. It may not be a really necessary cleanup.
      
        What I asked is: if they are equivalent, could you use these two lines:
       	cpu_buffer->pages = pages.next;
      	list_del(&pages);
      ]
      
      [ Impact: simplify the ring buffer to help make it lockless ]
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
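      The two-line form quoted above can be tried out with a minimal
      re-implementation of the list helpers. This sketch does not use the
      kernel's <linux/list.h> (whose list_del() also poisons the removed
      entry); list_init() and ring_len() are names assumed for illustration.

```c
#include <assert.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Make an empty list: the head points at itself. */
static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink entry; its neighbours now point at each other. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Count the nodes of a headless ring, starting from any node. */
static int ring_len(const struct list_head *start)
{
	const struct list_head *p;
	int n = 1;

	for (p = start->next; p != start; p = p->next)
		n++;
	return n;
}
```

      Building the pages on a temporary head and then deleting only that head
      leaves the pages linked to each other in a closed ring, so traversal
      never has to skip a non-page node.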
  19. 25 Jun, 2009 1 commit
    • P
      ring-buffer: Make it generally available · 1155de47
      Paul Mundt committed
      In hunting down the cause for the hwlat_detector ring buffer spew in
      my failed -next builds it became obvious that folks are now treating
      ring_buffer as something that is generic independent of tracing and thus,
      suitable for public driver consumption.
      
      Given that there are only a few minor areas in ring_buffer that have any
      reliance on CONFIG_TRACING or CONFIG_FUNCTION_TRACER, provide stubs for
      those and make it generally available.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Cc: Jon Masters <jcm@jonmasters.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <20090625053012.GB19944@linux-sh.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  20. 18 Jun, 2009 4 commits
    • S
      ring-buffer: do not grab locks in nmi · 8d707e8e
      Steven Rostedt committed
      If ftrace_dump_on_oops is set, and an NMI detects a lockup, then it
      will need to read from the ring buffer. But the read side of the
      ring buffer still takes locks. This patch adds a check on the read
      side that if it is in an NMI, then it will disable the ring buffer
      and not take any locks.
      
      Reads can still happen on a disabled ring buffer.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • S
      ring-buffer: add locks around rb_per_cpu_empty · d4788207
      Steven Rostedt committed
      The checking of whether the buffer is empty or not needs to be serialized
      among the readers. Add the reader spin lock around it.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • S
      ring-buffer: check for less than two in size allocation · 5f78abee
      Steven Rostedt committed
      The ring buffer must have at least two pages allocated for the
      reader page swap to work.
      
      The page count check will miss the case of a zero size passed in.
      Even though a zero size ring buffer would probably fail an allocation,
      making the min size check for less than two instead of equal to one makes
      the code a bit more robust.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
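      The difference between testing for "equal to one" and "less than two"
      shows up for a zero size. A minimal sketch, assuming a hypothetical
      per-page payload size BUF_PAGE_SIZE and a hypothetical helper
      rb_nr_pages():

```c
#include <assert.h>
#include <stddef.h>

/* Assumed payload bytes per buffer page; the real value differs. */
#define BUF_PAGE_SIZE ((size_t)4080)

/* Number of pages needed for size bytes, never fewer than two. */
static size_t rb_nr_pages(size_t size)
{
	size_t pages = (size + BUF_PAGE_SIZE - 1) / BUF_PAGE_SIZE;

	/* "pages == 1" would let a zero-size request through as 0 pages. */
	if (pages < 2)
		pages = 2;
	return pages;
}
```

      With this check a request for 0 bytes still yields two pages, so the
      reader page swap always has a page to swap with.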
    • S
      ring-buffer: remove useless compile check for buffer_page size · 0dcd4d6c
      Steven Rostedt committed
      The original version of the ring buffer had a hack that mapped the
      page struct that held the pages of the buffer to also be the structure
      that the ring buffer used to keep the pages in a linked list.
      
      This overlap of the page struct was very dangerous and that hack was
      removed a while ago.
      
      But there was a check to make sure the buffer_page never became bigger
      than the page struct, and would fail the compile if it did. The
      check was only meaningful when we had the hack. Now that we have separate
      allocated descriptors for the buffer pages, we can remove this check.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  21. 17 Jun, 2009 1 commit