1. 20 Sep, 2009: 1 commit
  2. 14 Sep, 2009: 1 commit
    • ring-buffer: typecast cmpxchg to fix PowerPC warning · 08a40816
      Committed by Steven Rostedt
      The cmpxchg used by PowerPC does the following:
      
        ({									 \
           __typeof__(*(ptr)) _o_ = (o);					 \
           __typeof__(*(ptr)) _n_ = (n);					 \
           (__typeof__(*(ptr))) __cmpxchg((ptr), (unsigned long)_o_,		 \
      				    (unsigned long)_n_, sizeof(*(ptr))); \
        })
      
      This type-checks both o and n against *ptr.
      
      Unfortunately, the code in ring_buffer.c assigns longs to pointers
      and pointers to longs, which causes warnings on PowerPC:
      
      ring_buffer.c: In function 'rb_head_page_set':
      ring_buffer.c:704: warning: initialization makes pointer from integer without a cast
      ring_buffer.c:704: warning: initialization makes pointer from integer without a cast
      ring_buffer.c: In function 'rb_head_page_replace':
      ring_buffer.c:797: warning: initialization makes integer from pointer without a cast
      
      This patch adds typecasts inside the cmpxchg calls to annotate that a
      long is being cast to a pointer and a pointer is being cast to a long,
      which removes the PowerPC warnings.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
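The same pattern can be reproduced in user space. This is a sketch that assumes GCC or Clang (statement expressions, `__typeof__`, and the `__sync_val_compare_and_swap` builtin); `typed_cmpxchg` and `set_next` are hypothetical names, not the kernel's code. It only illustrates why explicit casts satisfy a cmpxchg that type-checks `o` and `n` against `*ptr`.

```c
#include <assert.h>

/*
 * Hypothetical user-space analogue of the PowerPC macro: o and n are
 * copied into temporaries of type *ptr, so the compiler type-checks them.
 * Assumes GCC/Clang (statement expressions, __typeof__, __sync builtins).
 */
#define typed_cmpxchg(ptr, o, n)                                  \
	({                                                        \
		__typeof__(*(ptr)) _o_ = (o);                     \
		__typeof__(*(ptr)) _n_ = (n);                     \
		__sync_val_compare_and_swap((ptr), _o_, _n_);     \
	})

struct list_head {
	struct list_head *next, *prev;
};

/*
 * Hypothetical helper: swing list->next from old to new, where the caller
 * works with the pointer values as unsigned longs (e.g. to add flag bits).
 * Returns 1 if the swap happened.
 */
static int set_next(struct list_head *list, unsigned long old,
		    unsigned long new)
{
	struct list_head *ret;

	/*
	 * Passing old/new uncast would hit the "makes pointer from integer
	 * without a cast" warning; the casts document the conversion.
	 */
	ret = typed_cmpxchg(&list->next, (struct list_head *)old,
			    (struct list_head *)new);

	/* Same idea in the other direction: pointer back to long. */
	return (unsigned long)ret == old;
}
```

Removing either `(struct list_head *)` cast in `set_next` should bring back a pointer-from-integer warning at the `_o_`/`_n_` initializations.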
  3. 10 Sep, 2009: 1 commit
  4. 05 Sep, 2009: 2 commits
  5. 04 Sep, 2009: 7 commits
    • ring-buffer: disable all cpu buffers when one finds a problem · 077c5407
      Committed by Steven Rostedt
      Currently, RB_WARN_ON disables either the current CPU buffer or all
      CPU buffers, depending on whether a ring_buffer_per_cpu or a
      ring_buffer struct was passed into the macro.
      
      Most users of RB_WARN_ON pass in the CPU buffer, so only that one
      CPU buffer gets disabled while the rest stay active. This may
      confuse users, even though a warning is sent to the console.
      
      This patch changes the macro to disable the entire buffer even if
      the CPU buffer is passed in.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
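The behavior after this change can be sketched in user space. The structures below are stripped-down stand-ins, not the kernel's definitions, and the type dispatch uses GCC's `__builtin_types_compatible_p` (the builtin behind the kernel's `__same_type()`). The sketch uses a plain int counter and `fprintf` where kernel code would use atomic operations and `WARN_ON()`.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct ring_buffer {
	int record_disabled;
};

struct ring_buffer_per_cpu {
	int record_disabled;
	struct ring_buffer *buffer;   /* owning ring buffer */
};

/*
 * Whether b is a ring_buffer or a ring_buffer_per_cpu, the whole
 * ring_buffer gets disabled. Both branches must compile for both types,
 * which is why each struct has a record_disabled field.
 */
#define RB_WARN_ON(b, cond)                                             \
	({                                                              \
		int _ret = !!(cond);                                    \
		if (_ret) {                                             \
			if (__builtin_types_compatible_p(               \
				    __typeof__(*(b)),                   \
				    struct ring_buffer_per_cpu)) {      \
				struct ring_buffer_per_cpu *_cb =       \
					(void *)(b);                    \
				/* disable the whole buffer */          \
				_cb->buffer->record_disabled++;         \
			} else {                                        \
				(b)->record_disabled++;                 \
			}                                               \
			fprintf(stderr, "RB_WARN_ON(%s)\n", #cond);     \
		}                                                       \
		_ret;                                                   \
	})
```

Passing a per-CPU buffer now bumps `buffer->record_disabled` on the parent rather than on the per-CPU struct, so every CPU stops recording.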
    • ring-buffer: do not count discarded events · a1863c21
      Committed by Steven Rostedt
      The latency tracers report the number of items in the trace buffer,
      calculated from the ring buffer data. Because discarded events are
      also counted, the numbers do not match the number of items that are
      printed. The ring buffer also adds a "padding" item to the end of
      each buffer page, which also gets counted as a discarded item.
      
      This patch decrements the page's entries counter on a discard.
      This allows us to ignore discarded entries while reading the buffer.
      
      Decrementing the counter is still safe since it can only happen while
      the committing flag is still set.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
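A minimal single-CPU model of the accounting, assuming the entries count and the committing flag can be treated as plain counters (the real code uses per-CPU local operations; `struct rb_model` and its helpers are hypothetical):

```c
#include <assert.h>

/* Hypothetical, single-CPU model of the counters involved. */
struct rb_model {
	long page_entries;   /* entries counted on the current page */
	int committing;      /* nonzero while a writer is inside a commit */
};

/* Reserving an event starts a commit and counts the event. */
static void rb_reserve_event(struct rb_model *rb)
{
	rb->committing++;
	rb->page_entries++;
}

/* A normal commit leaves the count alone. */
static void rb_commit_event(struct rb_model *rb)
{
	assert(rb->committing > 0);
	rb->committing--;
}

/*
 * Discarding takes the event back out of the count. It is only done
 * while the commit is still open, so no reader has seen the event yet.
 */
static void rb_discard_event(struct rb_model *rb)
{
	assert(rb->committing > 0);
	rb->page_entries--;
	rb->committing--;
}
```

With one committed event and one discarded event, the page reports a single entry, matching what the tracer prints.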
    • ring-buffer: remove ring_buffer_event_discard · dc892f73
      Committed by Steven Rostedt
      The function ring_buffer_event_discard can be used on any item in the
      ring buffer, even after the item was committed. This function provides
      no safety nets and is very race prone.
      
      An item may be safely removed from the ring buffer before it is committed
      by using ring_buffer_discard_commit.
      
      Since there are currently no users of this function, and because it
      is racy and error-prone, this patch removes it altogether.
      
      Note, removing this function also allows the counters to ignore
      all discarded events (patches will follow).
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: fix ring_buffer_read crossing pages · 7e9391cf
      Committed by Steven Rostedt
      When the ring buffer is read with an iterator (static read mode,
      not on-the-fly reading) and the iterator crosses a page boundary,
      it skips the first entry on the next page. The reason is that the
      last entry of a page is usually padding if the page is not full,
      and the padding is not returned to the user.
      
      The problem arises in ring_buffer_read because it also increments
      the iterator. Because both read and peek use the same rb_iter_peek,
      rb_iter_peek will return the padding but also increment to the next
      item. It does this because ring_buffer_peek does not increment the
      iterator itself.
      
      ring_buffer_read then increments it again and calls rb_iter_peek
      again to get the next item. But that will be the second item on the
      page, not the first one.
      
      This never showed up before because the ftrace utility always calls
      ring_buffer_peek first and only uses ring_buffer_read to increment
      to the next item. ring_buffer_peek always keeps the pointer on a
      valid item, not on padding, which hid the bug.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: remove unnecessary cpu_relax · 1b959e18
      Committed by Steven Rostedt
      The loops in the ring buffer that use cpu_relax do not depend on
      other CPUs. They simply came across some padding in the ring buffer
      and are skipping over it. These are normal loops and do not require
      cpu_relax.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: do not swap buffers during a commit · 98277991
      Committed by Steven Rostedt
      If a commit is taking place on a CPU ring buffer, do not allow it to
      be swapped. Return -EBUSY when this is detected instead.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
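A sketch of the guard, assuming a hypothetical `struct cpu_buffer` reduced to a `committing` counter. A concurrent version would also need to stop new writers before checking; this single-threaded model leaves that out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-CPU buffer reduced to what the check needs. */
struct cpu_buffer {
	int committing;   /* > 0 while a writer is inside a commit */
	int id;
};

/*
 * Swap two per-CPU buffers, refusing with -EBUSY if either one is in
 * the middle of a commit.
 */
static int swap_cpu_buffers(struct cpu_buffer **a, struct cpu_buffer **b)
{
	struct cpu_buffer *tmp;

	if ((*a)->committing || (*b)->committing)
		return -EBUSY;

	tmp = *a;
	*a = *b;
	*b = tmp;
	return 0;
}
```

The caller sees `-EBUSY` and can retry later instead of swapping a buffer that still has a write in flight.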
    • ring-buffer: do not reset while in a commit · 41b6a95d
      Committed by Steven Rostedt
      The callers of reset must ensure that no commit can be taking place
      at the time of the reset. If one is, the ring buffer may be corrupted.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  6. 08 Aug, 2009: 1 commit
  7. 06 Aug, 2009: 3 commits
    • ring-buffer: Fix advance of reader in rb_buffer_peek() · 469535a5
      Committed by Robert Richter
      When rb_buffer_peek() is called from ring_buffer_consume() and a
      padding event is returned, rb_advance_reader() is called twice.
      This may lead to missing samples or, under high workloads, to the
      warning below. This patch fixes this: if rb_buffer_peek() returns a
      padding event, the calling function now consumes it.
      
      Also, I simplified some code in ring_buffer_consume().
      
      ------------[ cut here ]------------
      WARNING: at /dev/shm/.source/linux/kernel/trace/ring_buffer.c:2289 rb_advance_reader+0x2e/0xc5()
      Hardware name: Anaheim
      Modules linked in:
      Pid: 29, comm: events/2 Tainted: G        W  2.6.31-rc3-oprofile-x86_64-standard-00059-g5050dc2 #1
      Call Trace:
      [<ffffffff8106776f>] ? rb_advance_reader+0x2e/0xc5
      [<ffffffff81039ffe>] warn_slowpath_common+0x77/0x8f
      [<ffffffff8103a025>] warn_slowpath_null+0xf/0x11
      [<ffffffff8106776f>] rb_advance_reader+0x2e/0xc5
      [<ffffffff81068bda>] ring_buffer_consume+0xa0/0xd2
      [<ffffffff81326933>] op_cpu_buffer_read_entry+0x21/0x9e
      [<ffffffff810be3af>] ? __find_get_block+0x4b/0x165
      [<ffffffff8132749b>] sync_buffer+0xa5/0x401
      [<ffffffff810be3af>] ? __find_get_block+0x4b/0x165
      [<ffffffff81326c1b>] ? wq_sync_buffer+0x0/0x78
      [<ffffffff81326c76>] wq_sync_buffer+0x5b/0x78
      [<ffffffff8104aa30>] worker_thread+0x113/0x1ac
      [<ffffffff8104dd95>] ? autoremove_wake_function+0x0/0x38
      [<ffffffff8104a91d>] ? worker_thread+0x0/0x1ac
      [<ffffffff8104dc9a>] kthread+0x88/0x92
      [<ffffffff8100bdba>] child_rip+0xa/0x20
      [<ffffffff8104dc12>] ? kthread+0x0/0x92
      [<ffffffff8100bdb0>] ? child_rip+0x0/0x20
      ---[ end trace f561c0a58fcc89bd ]---
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ring-buffer: do not disable ring buffer on oops_in_progress · 464e85eb
      Committed by Steven Rostedt
      The commit:
      
        commit e0fdace1
        Author: David Miller <davem@davemloft.net>
        Date:   Fri Aug 1 01:11:22 2008 -0700
      
          debug_locks: set oops_in_progress if we will log messages.
      
          Otherwise lock debugging messages on runqueue locks can deadlock the
          system due to the wakeups performed by printk().
          Signed-off-by: David S. Miller <davem@davemloft.net>
          Signed-off-by: Ingo Molnar <mingo@elte.hu>
      
      will permanently set oops_in_progress on any lockdep failure.
      When this triggers, any read from the ring buffer permanently
      disables the ring buffer (not to mention disabling the locking in
      printk).
      
      This patch removes the check. It keeps the print in NMI, which makes
      sense. This is probably OK, since the ring buffer should not cause
      anything to set oops_in_progress anyway.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: fix check of try_to_discard result · 0f2541d2
      Committed by Steven Rostedt
      The function ring_buffer_discard_commit inverted its handling of
      the result of try_to_discard. It should skip incrementing the
      entry counter if try_to_discard succeeded. Instead, it incremented
      the entry counter when the discard succeeded and did not increment
      it when the discard failed.
      
      The result of this bug is that filtering will make the stat counters
      incorrect.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
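The corrected branch, sketched with a hypothetical `struct rb_model` and a boolean that stands in for the result of try_to_discard. The assumption in the comment (a failed discard leaves an item that readers still count) follows from the description above rather than from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU entry counter. */
struct rb_model {
	long entries;
};

/*
 * Fixed logic: a successful discard frees the space, so nothing is
 * counted; a failed one leaves a discarded item in the buffer, which is
 * assumed to still be counted.
 */
static void rb_discard_commit(struct rb_model *rb, bool try_to_discard_ok)
{
	if (!try_to_discard_ok)
		rb->entries++;
}

/* The inverted pre-fix logic, for comparison. */
static void rb_discard_commit_buggy(struct rb_model *rb,
				    bool try_to_discard_ok)
{
	if (try_to_discard_ok)
		rb->entries++;
}
```

With filtering, most discards succeed, so the buggy version inflates the counter on exactly the events that were removed.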
  8. 17 Jul, 2009: 1 commit
  9. 08 Jul, 2009: 2 commits
    • ring-buffer: make lockless · 77ae365e
      Committed by Steven Rostedt
      This patch converts the ring buffers into a completely lockless
      buffer recording system. The read side still takes locks, since
      readers are still serialized. The writers are the ones that must
      be lockless, because they can run in NMIs.
      
      The main change is to the "head_page" pointer. We write to the
      tail and read from the head. The "head_page" pointer in the CPU
      buffer is now just a reference to where to look. The real head
      page is now kept in the head_page->list->prev->next pointer;
      that is, the flags are set in the list head of the previous page.
      
      The list pages are allocated with an alignment that guarantees the
      least significant bits of any pointer to the list are always zero.
      This gives us room to put flags in those pointers.
      
      bit 0: set when the page is a head page
      bit 1: set when the writer is moving the page (for overwrite mode)
      
      cmpxchg is used to update the pointer.
      
      When the writer wraps the buffer and the tail meets the head,
      in overwrite mode, the writer must move the head page forward.
      It first uses cmpxchg to change the pointer flag from 1 to 2.
      Once this is done, the reader on another CPU will not take the
      page from the buffer.
      
      The writers need to protect against interrupts (we don't bother with
      disabling interrupts because NMIs are allowed to write too).
      
      After the writer sets the pointer flag to 2, it takes care to
      manage interrupts coming in. This is described in detail in the
      comments in the code.
      
       Changes in version 2:
        - Let reader reset entries value of header page.
        - Fix tail page passing commit page on reader page test.
        - Always increment entries and write counter in rb_tail_page_update
        - Add safety check in rb_set_commit_to_write to break out of infinite loop
        - add mask in rb_is_reader_page
      
      [ Impact: lock free writing to the ring buffer ]
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
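The flag handling can be sketched with C11 atomics. `struct page_desc`, `rb_head_to_update` and `rb_reader_take_head` are hypothetical, and the flag names are modeled on the description above. The sketch shows only the two compare-and-swap steps that decide who owns the head page, not the full writer path (moving the head flag to the next page and handling interrupts).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Flag values as described above: bit 0 = head, bit 1 = update. */
#define RB_PAGE_NORMAL 0UL
#define RB_PAGE_HEAD   1UL
#define RB_PAGE_UPDATE 2UL
#define RB_FLAG_MASK   3UL

/* Hypothetical page descriptor; its link pointer carries the flags. */
struct page_desc {
	_Atomic uintptr_t next;   /* tagged pointer to the next page */
	int id;
};

/* Alignment of at least 4 keeps the two low bits of any pointer free. */
_Static_assert(_Alignof(struct page_desc) >= 4, "need two free low bits");

static struct page_desc *rb_page(uintptr_t v)
{
	return (struct page_desc *)(v & ~RB_FLAG_MASK);
}

static unsigned long rb_flag(uintptr_t v)
{
	return v & RB_FLAG_MASK;
}

/*
 * Writer (overwrite mode): flip prev->next from HEAD to UPDATE with one
 * compare-and-swap. Fails if the pointer no longer says "head".
 */
static int rb_head_to_update(struct page_desc *prev, struct page_desc *head)
{
	uintptr_t expected = (uintptr_t)head | RB_PAGE_HEAD;
	uintptr_t desired = (uintptr_t)head | RB_PAGE_UPDATE;

	return atomic_compare_exchange_strong(&prev->next, &expected, desired);
}

/*
 * Reader: replace the head page with its own reader page, but only if
 * the link still carries the HEAD flag. Once a writer has set UPDATE,
 * this compare-and-swap fails and the reader leaves the page alone.
 */
static int rb_reader_take_head(struct page_desc *prev, struct page_desc *head,
			       struct page_desc *reader)
{
	uintptr_t expected = (uintptr_t)head | RB_PAGE_HEAD;

	return atomic_compare_exchange_strong(&prev->next, &expected,
					      (uintptr_t)reader);
}
```

Because both sides compare against the same tagged value, exactly one of them can win: the writer's 1-to-2 flip makes the reader's swap fail, and a completed reader swap makes the writer's flip fail.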
    • ring-buffer: make the buffer a true circular link list · 3adc54fa
      Committed by Steven Rostedt
      This patch changes the ring buffer data pages from using a linked list
      head pointer to having each buffer page point to another buffer page,
      never back to a "head".
      
      This makes the handling of the ring buffer less complex, since
      traversing the ring buffer pages no longer needs to account for the
      head pointer.
      
      This change is also needed to make the ring buffer lockless.
      
      [
        Changes in version 2:
      
        - Added change that Lai Jiangshan mentioned.
      
        From: Lai Jiangshan <laijs@cn.fujitsu.com>
        Date: Thu, 11 Jun 2009 11:25:48 +0800
        LKML-Reference: <4A30793C.6090208@cn.fujitsu.com>
      
        I'm not sure whether these 4 lines:
      	bpage = list_entry(pages.next, struct buffer_page, list);
      	list_del_init(&bpage->list);
      	cpu_buffer->pages = &bpage->list;

      	list_splice(&pages, cpu_buffer->pages);
        are equivalent to these 2 lines:
      	cpu_buffer->pages = pages.next;
      	list_del(&pages);

        If they are equivalent, I think the second form
        is simpler. It may not be a really necessary cleanup.

        What I asked is: if they are equivalent, could you use these two lines:
      	cpu_buffer->pages = pages.next;
      	list_del(&pages);
      ]
      
      [ Impact: simplify the ring buffer to help make it lockless ]
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
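The two-line form from Lai's question can be checked with a user-space model. The list helpers below are minimal re-implementations in the style of the kernel's `<linux/list.h>` (an assumption: `list_del` here sets the pointers to NULL rather than poisoning them), and `make_ring` and `count_ring` are hypothetical. After the temporary head is deleted, the pages form a ring that only points at other pages.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list in the style of <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink entry; the kernel poisons the pointers, this sketch NULLs them. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = NULL;
	entry->prev = NULL;
}

/*
 * Hypothetical helper: queue n pages on a temporary head, then apply the
 * two-line form (pages = head.next; list_del(&head)). n must be >= 1.
 */
static struct list_head *make_ring(struct list_head *head,
				   struct list_head *pages, int n)
{
	struct list_head *first;

	init_list_head(head);
	for (int i = 0; i < n; i++)
		list_add_tail(&pages[i], head);

	first = head->next;
	list_del(head);
	return first;
}

/* Walk the ring once, starting and ending at start. */
static int count_ring(struct list_head *start)
{
	int n = 1;

	for (struct list_head *p = start->next; p != start; p = p->next)
		n++;
	return n;
}
```

Deleting the head splices its neighbours together, so the last page's `next` becomes the first page and vice versa, which is exactly the headless ring the commit describes.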
  10. 25 Jun, 2009: 1 commit
    • ring-buffer: Make it generally available · 1155de47
      Committed by Paul Mundt
      While hunting down the cause of the hwlat_detector ring buffer spew in
      my failed -next builds, it became obvious that folks are now treating
      ring_buffer as something generic, independent of tracing, and thus
      suitable for public driver consumption.
      
      Given that there are only a few minor areas in ring_buffer that have any
      reliance on CONFIG_TRACING or CONFIG_FUNCTION_TRACER, provide stubs for
      those and make it generally available.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Cc: Jon Masters <jcm@jonmasters.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <20090625053012.GB19944@linux-sh.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  11. 18 Jun, 2009: 4 commits
    • ring-buffer: do not grab locks in nmi · 8d707e8e
      Committed by Steven Rostedt
      If ftrace_dump_on_oops is set and an NMI detects a lockup, the NMI
      will need to read from the ring buffer. But the read side of the
      ring buffer still takes locks. This patch adds a check on the read
      side: if it is in an NMI, it disables the ring buffer and does not
      take any locks.
      
      Reads can still happen on a disabled ring buffer.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: add locks around rb_per_cpu_empty · d4788207
      Committed by Steven Rostedt
      The checking of whether the buffer is empty or not needs to be serialized
      among the readers. Add the reader spin lock around it.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: check for less than two in size allocation · 5f78abee
      Committed by Steven Rostedt
      The ring buffer must have at least two pages allocated for the
      reader page swap to work.
      
      The page count check misses the case where a size of zero is passed in.
      Even though a zero-size ring buffer would probably fail an allocation,
      checking the minimum size for less than two instead of equal to one
      makes the code a bit more robust.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
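The sizing rule as a sketch. `BUF_PAGE_SIZE` here is an assumed value (a 4096-byte page minus a 16-byte header), `rb_nr_pages` is a hypothetical helper, and `rb_nr_pages_eq1` shows the old `== 1` check for comparison.

```c
#include <assert.h>

/* Assumed usable bytes per page: 4096-byte page minus a 16-byte header. */
#define BUF_PAGE_SIZE 4080UL

/* Hypothetical helper: pages needed for size bytes, with a floor of two. */
static unsigned long rb_nr_pages(unsigned long size)
{
	unsigned long nr_pages = (size + BUF_PAGE_SIZE - 1) / BUF_PAGE_SIZE;

	/* The reader page swap needs at least two pages. */
	if (nr_pages < 2)
		nr_pages = 2;
	return nr_pages;
}

/* The old "== 1" form, for comparison: size 0 slips through as 0 pages. */
static unsigned long rb_nr_pages_eq1(unsigned long size)
{
	unsigned long nr_pages = (size + BUF_PAGE_SIZE - 1) / BUF_PAGE_SIZE;

	if (nr_pages == 1)
		nr_pages = 2;
	return nr_pages;
}
```

A size of 0 rounds up to 0 pages, which only the `< 2` form corrects.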
    • ring-buffer: remove useless compile check for buffer_page size · 0dcd4d6c
      Committed by Steven Rostedt
      The original version of the ring buffer had a hack that mapped the
      page struct holding each buffer page to also be the structure the
      ring buffer used to keep its pages in a linked list.
      
      This overlap of the page struct was very dangerous and that hack was
      removed a while ago.
      
      But there was a check that would fail the compile if buffer_page ever
      became bigger than the page struct. That check was only meaningful
      while we had the hack. Now that the buffer pages have separately
      allocated descriptors, we can remove it.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  12. 17 Jun, 2009: 4 commits
    • ring-buffer: remove useless warn on check · c6a9d7b5
      Committed by Steven Rostedt
      A check for "write > BUF_PAGE_SIZE" is done right after a
      
      	if (write > BUF_PAGE_SIZE)
      		return ...;
      
      Thus the check is really testing the compiler, not the
      kernel. It is useless, so remove it.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index · 22f470f8
      Committed by Steven Rostedt
      The index of an event is found by masking its address to its offset
      within the page (addr & ~PAGE_MASK) and subtracting the header size.
      Currently the header size is calculated as PAGE_SIZE - BUF_PAGE_SIZE,
      even though we already have a macro, BUF_PAGE_HDR_SIZE, that defines it.
      
      If we change BUF_PAGE_SIZE to something smaller than the rest of
      the page (this is done for debugging), the algorithm for finding
      the index breaks.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
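Worked arithmetic, assuming 4 KiB pages and a 16-byte page header (both assumptions for illustration): an event 56 bytes into its page has offset `56 & 0xfff = 56` and index `56 - 16 = 40`. If `BUF_PAGE_SIZE` is shrunk to 2048 for debugging, the old formula subtracts `4096 - 2048 = 2048` instead of the header size. `rb_event_index_old` is a hypothetical name for the pre-patch form.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration: 4 KiB pages, 16-byte page header. */
#define PAGE_SIZE         4096UL
#define PAGE_MASK         (~(PAGE_SIZE - 1))
#define BUF_PAGE_HDR_SIZE 16UL

/* After the change: offset within the page minus the header size. */
static unsigned long rb_event_index(uintptr_t addr)
{
	return (addr & ~PAGE_MASK) - BUF_PAGE_HDR_SIZE;
}

/*
 * Before the change: the header size was derived as
 * PAGE_SIZE - buf_page_size, which is only right while the data area
 * fills the rest of the page.
 */
static unsigned long rb_event_index_old(uintptr_t addr,
					unsigned long buf_page_size)
{
	return (addr & ~PAGE_MASK) - (PAGE_SIZE - buf_page_size);
}
```

Both forms agree while `buf_page_size == PAGE_SIZE - BUF_PAGE_HDR_SIZE`; only the new one is independent of `BUF_PAGE_SIZE`.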
    • ring-buffer: use commit counters for commit pointer accounting · fa743953
      Committed by Steven Rostedt
      The ring buffer is made up of three sets of pointers.
      
      The head page pointer, which points to the next page for the reader to
      get.

      The commit pointer and commit index, which point to the page and index
      of the last committed write, respectively.

      The tail pointer and tail index, which point to the page and index
      of the last reserved (not yet committed) data, respectively.
      
      The commit pointer is only moved forward by the outermost writer.
      If a nested writer comes in, it will not move the pointer forward.
      
      The current implementation has a flaw: it assumes that the outermost
      writer successfully reserved data. There is a small race window where
      the outermost writer could find the tail pointer, but a nested
      writer could come in (via an interrupt) and move the tail forward, and
      even the commit forward.
      
      The outer writer would not realize the commit had moved forward, and
      the accounting would break.
      
      This patch changes the design to use counters in the per-CPU buffers
      to keep track of commits. The counters are incremented at the start
      of a commit and decremented at the end. If the counter is 1 when a
      commit ends, that writer moves the commit pointers. A loop checks for
      races between testing the counter and moving the commit pointers.
      Only the outermost commit should move the pointers anyway.
      
      The test of whether a reserve is equal to the last commit update
      is still needed for timekeeping. The time code is much less
      racy than the commit updates.
      
      This change not only solves the mentioned race, but also makes the
      code simpler.
      
      [ Impact: fix commit race and simplify code ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
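A single-CPU sketch of the counter scheme. `struct cpu_buffer`, `rb_reserve` and the plain-integer tail/commit indices are simplifications (the real buffer tracks page/index pairs with local operations and compiler barriers); only the start/end commit logic follows the description above.

```c
#include <assert.h>

/* Hypothetical single-CPU model: indices stand in for page/index pairs. */
struct cpu_buffer {
	unsigned long tail;        /* end of reserved data */
	unsigned long commit;      /* end of committed data */
	unsigned long committing;  /* writers currently inside a commit */
	unsigned long commits;     /* total commits started */
};

static void rb_start_commit(struct cpu_buffer *cb)
{
	cb->committing++;
	cb->commits++;
}

/* Reserve len bytes at the tail; returns the start offset. */
static unsigned long rb_reserve(struct cpu_buffer *cb, unsigned long len)
{
	unsigned long start = cb->tail;

	cb->tail += len;
	return start;
}

static void rb_end_commit(struct cpu_buffer *cb)
{
	unsigned long commits;

	assert(cb->committing > 0);
again:
	commits = cb->commits;
	/* Only the outermost writer moves the commit pointer. */
	if (cb->committing == 1)
		cb->commit = cb->tail;
	cb->committing--;
	/*
	 * In the kernel an interrupt may start and finish a commit between
	 * the update above and the decrement; if so, and nobody is still
	 * committing, go around again. In this single-threaded model the
	 * retry never fires.
	 */
	if (cb->commits != commits && cb->committing == 0) {
		cb->committing++;
		goto again;
	}
}
```

A nested writer (simulated by calling `rb_start_commit` again before the outer writer ends) leaves the commit pointer alone; the outer writer then moves it past both reservations.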
    • ring-buffer: remove unused variable · 263294f3
      Committed by Steven Rostedt
      Fix the compiler warning:
      
      kernel/trace/ring_buffer.c: In function 'rb_move_tail':
      kernel/trace/ring_buffer.c:1236: warning: unused variable 'event'
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  13. 15 Jun, 2009: 3 commits
  14. 10 Jun, 2009: 1 commit
  15. 09 Jun, 2009: 1 commit
    • ring-buffer: pass in lockdep class key for reader_lock · 1f8a6a10
      Committed by Peter Zijlstra
      On Sun, 7 Jun 2009, Ingo Molnar wrote:
      > Testing tracer sched_switch: <6>Starting ring buffer hammer
      > PASSED
      > Testing tracer sysprof: PASSED
      > Testing tracer function: PASSED
      > Testing tracer irqsoff:
      > =============================================
      > PASSED
      > Testing tracer preemptoff: PASSED
      > Testing tracer preemptirqsoff: [ INFO: possible recursive locking detected ]
      > PASSED
      > Testing tracer branch: 2.6.30-rc8-tip-01972-ge5b9078-dirty #5760
      > ---------------------------------------------
      > rb_consumer/431 is trying to acquire lock:
      >  (&cpu_buffer->reader_lock){......}, at: [<c109eef7>] ring_buffer_reset_cpu+0x37/0x70
      >
      > but task is already holding lock:
      >  (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
      >
      > other info that might help us debug this:
      > 1 lock held by rb_consumer/431:
      >  #0:  (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
      
      The ring buffer is a generic structure and can be used outside of
      ftrace. If ftrace traces code that is itself using a ring buffer,
      lockdep can report false positives.
      
      This patch passes a static lock key into the allocation of the ring
      buffer, so that different ring buffers have their own lock class.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1244477919.13761.9042.camel@twins>
      
      [ store key in ring buffer descriptor ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  16. 03 Jun, 2009: 3 commits
    • ring-buffer: discard timestamps that are at the start of the buffer · ea05b57c
      Committed by Steven Rostedt
      Every buffer page in the ring buffer includes its own time stamp.
      When an event is recorded to the ring buffer with a delta time greater
      than what can be held in the event header, a time stamp event is created.
      
      If the created timestamp spills over to the next buffer page, it is
      redundant, because the buffer page holds a full time stamp. This patch
      tries to discard the time stamp when it falls at the start of the
      next page.
      
      This change also fixes an issue with discarding events. If most events are
      discarded, timestamps will start to creep into the ring buffer. If we
      do not discard the timestamps, they can fill up the ring buffer over
      time and waste space.
      
      This change keeps time stamps from spilling over into another page. If
      something is recorded in the buffer page and the rest is filtered,
      the time stamps can only fill up to the end of the page.
      
      [ Impact: prevent time stamps from filling ring buffer ]
      Reported-by: Tim Bird <tim.bird@am.sony.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: try to discard unneeded timestamps · edd813bf
      Committed by Steven Rostedt
      Occasionally a race causes a timestamp to be added in a nested
      write. That timestamp contains just a zero delta and serves
      no purpose.
      
      Now that we have a way to discard events, this patch will try to discard
      the timestamp instead of just wasting the space in the ring buffer.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: fix bug in ring_buffer_discard_commit · a2023556
      Committed by Tim Bird
      There's a bug in ring_buffer_discard_commit.  The wrong
      pointer is being compared in order to check if the event
      can be freed from the buffer rather than discarded
      (i.e. marked as PAD).
      
      I noticed this when I was working on duration filtering.
      The bug is not deadly - it just results in lots of wasted
      space in the buffer.  All filtered events are left in
      the buffer and marked as discarded, rather than being
      removed from the buffer to make space for other events.
      
      Unfortunately, when I fixed this bug, I got errors doing a
      filtered function trace.  Multiple TIME_EXTEND
      events pile up in the buffer, and trigger the
      following loop overage warning in rb_iter_peek():
      
      again:
      	...
      	if (RB_WARN_ON(cpu_buffer, ++nr_loops > 10))
      		return NULL;
      
      I'm not sure what the best way is to fix this. I don't
      know if I should extend the loop threshold, make the test
      more complex (ignore TIME_EXTEND events), or just get rid
      of this loop check completely.
      
      Note that if I implement a workaround for this, then I
      see another problem from rb_advance_iter().  I haven't
      tracked that one down yet.
      
      In general, it seems like the case of removing filtered
      events has not been working properly, and so some assumptions
      about buffer invariant conditions need to be revisited.
      
      Here's the patch for the simple fix:
      
      Compare correct pointer for checking if an event can be
      freed rather than left as discarded in the buffer.
      Signed-off-by: Tim Bird <tim.bird@am.sony.com>
      LKML-Reference: <4A25BE9E.5090909@am.sony.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  17. 12 May, 2009: 4 commits