1. 21 10月, 2010 1 次提交
    • S
      ring-buffer: Pass delta by value and not by reference · f25106ae
      Steven Rostedt 提交于
      The delta between events is passed to the timestamp code by reference
      and the timestamp code will reset the value. But it can be reset
      from the caller. No need to pass it in by reference.
      
      By changing the call to pass by value, lets gcc optimize the code
      a bit more where it can store the delta in a register and not
      worry about updating the reference.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f25106ae
  2. 20 10月, 2010 2 次提交
    • S
      ring-buffer: Pass timestamp by value and not by reference · e8bc43e8
      Steven Rostedt 提交于
      The original code for the ring buffer had locations that modified
      the timestamp and that change was used by the callers. Now,
      the timestamp is not reused by the callers and there is no reason
      to pass it by reference.
      
      By changing the call to pass by value, lets gcc optimize the code
      a bit more where it can store the timestamp in a register and not
      worry about updating the reference.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e8bc43e8
    • S
      ring-buffer: Make write slow path out of line · 747e94ae
      Steven Rostedt 提交于
      Gcc inlines the slow path of the ring buffer write which can
      hurt performance. This patch simply forces the slow path function
      rb_move_tail() to always be a function.
      
      The ring_buffer_benchmark module with reader_disabled=1 shows that
      this patch changes the time to record an event from 135 ns to
      132 ns. (3 ns or 2.22% improvement)
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      747e94ae
  3. 13 10月, 2010 1 次提交
    • S
      ring-buffer: Fix typo of time extends per page · d0134324
      Steven Rostedt 提交于
      Time stamps for the ring buffer are created by the difference between
      two events. Each page of the ring buffer holds a full 64 bit timestamp.
      Each event has a 27 bit delta stamp from the last event. The unit of time
      is nanoseconds, so 27 bits can hold ~134 milliseconds. If two events
      happen more than 134 milliseconds apart, a time extend is inserted
      to add more bits for the delta. The time extend has 59 bits, which
      is good for ~18 years.
      
      Currently the time extend is committed separately from the event.
      If an event is discarded before it is committed, due to filtering,
      the time extend still exists. If all events are being filtered, then
      after ~134 milliseconds a new time extend will be added to the buffer.
      
      This can only happen till the end of the page. Since each page holds
      a full timestamp, there is no reason to add a time extend to the
      beginning of a page. Time extends can only fill a page that has actual
      data at the beginning, so there is no fear that time extends will fill
      more than a page without any data.
      
      When reading an event, a loop is made to skip over time extends
      since they are only used to maintain the time stamp and are never
      given to the caller. As a paranoid check to prevent the loop running
      forever, with the knowledge that time extends may only fill a page,
      a check is made that tests the iteration of the loop, and if the
      iteration is more than the number of time extends that can fit in a page
      a warning is printed and the ring buffer is disabled (all of ftrace
      is also disabled with it).
      
      There is another event type that is called a TIMESTAMP which can
      hold 64 bits of data in the theoretical case that two events happen
      18 years apart. This code has not been implemented, but the name
      of this event exists, as well as the structure for it. The
      size of a TIMESTAMP is 16 bytes, where as a time extend is only
      8 bytes. The macro used to calculate how many time extends can fit on
      a page used the TIMESTAMP size instead of the time extend size
      cutting the amount in half.
      
      The following test case can easily trigger the warning since we only
      need to have half the page filled with time extends to trigger the
      warning:
      
       # cd /sys/kernel/debug/tracing/
       # echo function > current_tracer
       # echo 'common_pid < 0' > events/ftrace/function/filter
       # echo > trace
       # echo 1 > trace_marker
       # sleep 120
       # cat trace
      
      Enabling the function tracer and then setting the filter to only trace
      functions where the process id is negative (no events), then clearing
      the trace buffer to ensure that we have nothing in the buffer,
      then write to trace_marker to add an event to the beginning of a page,
      sleep for 2 minutes (only 35 seconds is probably needed, but this
      guarantees the bug), and then finally reading the trace which will
      trigger the bug.
      
      This patch fixes the typo and prevents the false positive of that warning.
      Reported-by: NHans J. Koch <hjk@linutronix.de>
      Tested-by: NHans J. Koch <hjk@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stable Kernel <stable@kernel.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      d0134324
  4. 05 9月, 2010 1 次提交
  5. 07 8月, 2010 1 次提交
    • H
      tracing: Fix ring_buffer_read_page reading out of page boundary · 18fab912
      Huang Ying 提交于
      With the configuration: CONFIG_DEBUG_PAGEALLOC=y and Shaohua's patch:
      
      [PATCH]x86: make spurious_fault check correct pte bit
      
      Function call graph trace with the following will trigger a page fault.
      
      # cd /sys/kernel/debug/tracing/
      # echo function_graph > current_tracer
      # cat per_cpu/cpu1/trace_pipe_raw > /dev/null
      
      BUG: unable to handle kernel paging request at ffff880006e99000
      IP: [<ffffffff81085572>] rb_event_length+0x1/0x3f
      PGD 1b19063 PUD 1b1d063 PMD 3f067 PTE 6e99160
      Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      last sysfs file: /sys/devices/virtual/net/lo/operstate
      CPU 1
      Modules linked in:
      
      Pid: 1982, comm: cat Not tainted 2.6.35-rc6-aes+ #300 /Bochs
      RIP: 0010:[<ffffffff81085572>]  [<ffffffff81085572>] rb_event_length+0x1/0x3f
      RSP: 0018:ffff880006475e38  EFLAGS: 00010006
      RAX: 0000000000000ff0 RBX: ffff88000786c630 RCX: 000000000000001d
      RDX: ffff880006e98000 RSI: 0000000000000ff0 RDI: ffff880006e99000
      RBP: ffff880006475eb8 R08: 000000145d7008bd R09: 0000000000000000
      R10: 0000000000008000 R11: ffffffff815d9336 R12: ffff880006d08000
      R13: ffff880006e605d8 R14: 0000000000000000 R15: 0000000000000018
      FS:  00007f2b83e456f0(0000) GS:ffff880002100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: ffff880006e99000 CR3: 00000000064a8000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process cat (pid: 1982, threadinfo ffff880006474000, task ffff880006e40770)
      Stack:
       ffff880006475eb8 ffffffff8108730f 0000000000000ff0 000000145d7008bd
      <0> ffff880006e98010 ffff880006d08010 0000000000000296 ffff88000786c640
      <0> ffffffff81002956 0000000000000000 ffff8800071f4680 ffff8800071f4680
      Call Trace:
       [<ffffffff8108730f>] ? ring_buffer_read_page+0x15a/0x24a
       [<ffffffff81002956>] ? return_to_handler+0x15/0x2f
       [<ffffffff8108a575>] tracing_buffers_read+0xb9/0x164
       [<ffffffff810debfe>] vfs_read+0xaf/0x150
       [<ffffffff81002941>] return_to_handler+0x0/0x2f
       [<ffffffff810248b0>] __bad_area_nosemaphore+0x17e/0x1a1
       [<ffffffff81002941>] return_to_handler+0x0/0x2f
       [<ffffffff810248e6>] bad_area_nosemaphore+0x13/0x15
      Code: 80 25 b2 16 b3 00 fe c9 c3 55 48 89 e5 f0 80 0d a4 16 b3 00 02 c9 c3 55 31 c0 48 89 e5 48 83 3d 94 16 b3 00 01 c9 0f 94 c0 c3 55 <8a> 0f 48 89 e5 83 e1 1f b8 08 00 00 00 0f b6 d1 83 fa 1e 74 27
      RIP  [<ffffffff81085572>] rb_event_length+0x1/0x3f
       RSP <ffff880006475e38>
      CR2: ffff880006e99000
      ---[ end trace a6877bb92ccb36bb ]---
      
      The root cause is that ring_buffer_read_page() may read out of page
      boundary, because the boundary checking is done after reading. This is
      fixed via doing boundary checking before reading.
      Reported-by: NShaohua Li <shaohua.li@intel.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      LKML-Reference: <1280297641.2771.307.camel@yhuang-dev>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      18fab912
  6. 21 7月, 2010 1 次提交
  7. 04 6月, 2010 1 次提交
    • S
      tracing: Remove ftrace_preempt_disable/enable · 5168ae50
      Steven Rostedt 提交于
      The ftrace_preempt_disable/enable functions were to address a
      recursive race caused by the function tracer. The function tracer
      traces all functions which makes it easily susceptible to recursion.
      One area was preempt_enable(). This would call the scheduler and
      the schedulre would call the function tracer and loop.
      (So was it thought).
      
      The ftrace_preempt_disable/enable was made to protect against recursion
      inside the scheduler by storing the NEED_RESCHED flag. If it was
      set before the ftrace_preempt_disable() it would not call schedule
      on ftrace_preempt_enable(), thinking that if it was set before then
      it would have already scheduled unless it was already in the scheduler.
      
      This worked fine except in the case of SMP, where another task would set
      the NEED_RESCHED flag for a task on another CPU, and then kick off an
      IPI to trigger it. This could cause the NEED_RESCHED to be saved at
      ftrace_preempt_disable() but the IPI to arrive in the the preempt
      disabled section. The ftrace_preempt_enable() would not call the scheduler
      because the flag was already set before entring the section.
      
      This bug would cause a missed preemption check and cause lower latencies.
      
      Investigating further, I found that the recusion caused by the function
      tracer was not due to schedule(), but due to preempt_schedule(). Now
      that preempt_schedule is completely annotated with notrace, the recusion
      no longer is an issue.
      Reported-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      5168ae50
  8. 25 5月, 2010 2 次提交
    • S
      ring-buffer: Move zeroing out excess in page to ring buffer code · 2711ca23
      Steven Rostedt 提交于
      Currently the trace splice code zeros out the excess bytes in the page before
      sending it off to userspace.
      
      This is to make sure userspace is not getting anything it should not be
      when reading the pages, because the excess data was never initialized
      to zero before writing (for perfomance reasons).
      
      But the splice code has no business in doing this work, it should be
      done by the ring buffer. With the latest changes for recording lost
      events, the splice code gets it wrong anyway.
      
      Move the zeroing out of excess bytes into the ring buffer code.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2711ca23
    • S
      ring-buffer: Reset "real_end" when page is filled · b3230c8b
      Steven Rostedt 提交于
      The code to store the "lost events" requires knowing the real end
      of the page. Since the 'commit' includes the padding at the end of
      a page a "real_end" variable was used to keep track of the end not
      including the padding.
      
      If events were lost, the reader can place the count of events in
      the padded area if there is enough room.
      
      The bug this patch fixes is that when we fill the page we do not
      reset the real_end variable, and if the writer had wrapped a few
      times, the real_end would be incorrect.
      
      This patch simply resets the real_end if the page was filled.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      b3230c8b
  9. 05 5月, 2010 1 次提交
  10. 28 4月, 2010 1 次提交
    • D
      ring-buffer: Make non-consuming read less expensive with lots of cpus. · 72c9ddfd
      David Miller 提交于
      When performing a non-consuming read, a synchronize_sched() is
      performed once for every cpu which is actively tracing.
      
      This is very expensive, and can make it take several seconds to open
      up the 'trace' file with lots of cpus.
      
      Only one synchronize_sched() call is actually necessary.  What is
      desired is for all cpus to see the disabling state change.  So we
      transform the existing sequence:
      
      	for_each_cpu() {
      		ring_buffer_read_start();
      	}
      
      where each ring_buffer_start() call performs a synchronize_sched(),
      into the following:
      
      	for_each_cpu() {
      		ring_buffer_read_prepare();
      	}
      	ring_buffer_read_prepare_sync();
      	for_each_cpu() {
      		ring_buffer_read_start();
      	}
      
      wherein only the single ring_buffer_read_prepare_sync() call needs to
      do the synchronize_sched().
      
      The first phase, via ring_buffer_read_prepare(), allocates the 'iter'
      memory and increments ->record_disabled.
      
      In the second phase, ring_buffer_read_prepare_sync() makes sure this
      ->record_disabled state is visible fully to all cpus.
      
      And in the final third phase, the ring_buffer_read_start() calls reset
      the 'iter' objects allocated in the first phase since we now know that
      none of the cpus are adding trace entries any more.
      
      This makes openning the 'trace' file nearly instantaneous on a
      sparc64 Niagara2 box with 128 cpus tracing.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      LKML-Reference: <20100420.154711.11246950.davem@davemloft.net>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      72c9ddfd
  11. 01 4月, 2010 2 次提交
    • S
      ring-buffer: Add lost event count to end of sub buffer · ff0ff84a
      Steven Rostedt 提交于
      Currently, binary readers of the ring buffer only know where events were
      lost, but not how many events were lost at that location.
      This information is available, but it would require adding another
      field to the sub buffer header to include it.
      
      But when a event can not fit at the end of a sub buffer, it is written
      to the next sub buffer. This means there is a good chance that the
      buffer may have room to hold this counter. If it does, write
      the counter at the end of the sub buffer and set another flag
      in the data size field that states that this information exists.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ff0ff84a
    • S
      ring-buffer: Add place holder recording of dropped events · 66a8cb95
      Steven Rostedt 提交于
      Currently, when the ring buffer drops events, it does not record
      the fact that it did so. It does inform the writer that the event
      was dropped by returning a NULL event, but it does not put in any
      place holder where the event was dropped.
      
      This is not a trivial thing to add because the ring buffer mostly
      runs in overwrite (flight recorder) mode. That is, when the ring
      buffer is full, new data will overwrite old data.
      
      In a produce/consumer mode, where new data is simply dropped when
      the ring buffer is full, it is trivial to add the placeholder
      for dropped events. When there's more room to write new data, then
      a special event can be added to notify the reader about the dropped
      events.
      
      But in overwrite mode, any new write can overwrite events. A place
      holder can not be inserted into the ring buffer since there never
      may be room. A reader could also come in at anytime and miss the
      placeholder.
      
      Luckily, the way the ring buffer works, the read side can find out
      if events were lost or not, and how many events. Everytime a write
      takes place, if it overwrites the header page (the next read) it
      updates a "overrun" variable that keeps track of the number of
      lost events. When a reader swaps out a page from the ring buffer,
      it can record this number, perfom the swap, and then check to
      see if the number changed, and take the diff if it has, which would be
      the number of events dropped. This can be stored by the reader
      and returned to callers of the reader.
      
      Since the reader page swap will fail if the writer moved the head
      page since the time the reader page set up the swap, this gives room
      to record the overruns without worrying about races. If the reader
      sets up the pages, records the overrun, than performs the swap,
      if the swap succeeds, then the overrun variable has not been
      updated since the setup before the swap.
      
      For binary readers of the ring buffer, a flag is set in the header
      of each sub page (sub buffer) of the ring buffer. This flag is embedded
      in the size field of the data on the sub buffer, in the 31st bit (the size
      can be 32 or 64 bits depending on the architecture), but only 27
      bits needs to be used for the actual size (less actually).
      
      We could add a new field in the sub buffer header to also record the
      number of events dropped since the last read, but this will change the
      format of the binary ring buffer a bit too much. Perhaps this change can
      be made if the information on the number of events dropped is considered
      important enough.
      
      Note, the notification of dropped events is only used by consuming reads
      or peeking at the ring buffer. Iterating over the ring buffer does not
      keep this information because the necessary data is only available when
      a page swap is made, and the iterator does not swap out pages.
      
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      66a8cb95
  12. 30 3月, 2010 2 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
    • J
      ring-buffer: Add missing unlock · 292f60c0
      Julia Lawall 提交于
      In some error handling cases the lock is not unlocked.  The return is
      converted to a goto, to share the unlock at the end of the function.
      
      A simplified version of the semantic patch that finds this problem is as
      follows: (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r exists@
      expression E1;
      identifier f;
      @@
      
      f (...) { <+...
      * spin_lock_irq (E1,...);
      ... when != E1
      * return ...;
      ...+> }
      // </smpl>
      Signed-off-by: NJulia Lawall <julia@diku.dk>
      LKML-Reference: <Pine.LNX.4.64.1003291736440.21896@ask.diku.dk>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      292f60c0
  13. 19 3月, 2010 1 次提交
    • S
      ring-buffer: Do 8 byte alignment for 64 bit that can not handle 4 byte align · 2271048d
      Steven Rostedt 提交于
      The ring buffer uses 4 byte alignment while recording events into the
      buffer, even on 64bit machines. This saves space when there are lots
      of events being recorded at 4 byte boundaries.
      
      The ring buffer has a zero copy method to write into the buffer, with
      the reserving of space and then committing it. This may cause problems
      when writing an 8 byte word into a 4 byte alignment (not 8). For x86 and
      PPC this is not an issue, but on some architectures this would cause an
      out-of-alignment exception.
      
      This patch uses CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to determine
      if it is OK to use 4 byte alignments on 64 bit machines. If it is not,
      it forces the ring buffer event header to be 8 bytes and not 4,
      and will align the length of the data to be 8 byte aligned.
      This keeps the data payload at 8 byte alignments and will allow these
      machines to run without issue.
      
      The trick to this is that the header can be either 4 bytes or 8 bytes
      depending on the length of the data payload. The 4 byte header
      has a length field that supports up to 112 bytes. If the length of
      the data is more than 112, the length field is set to zero, and the actual
      length is stored in the next 4 bytes after the header.
      
      When CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set, the code forces
      zero in the 4 byte header forcing the length to be stored in the 4 byte
      array, even with a small data load. It also forces the length of the
      data load to be 8 byte aligned. The combination of these two guarantee
      that the data is always at 8 byte alignment.
      Tested-by: NFrederic Weisbecker <fweisbec@gmail.com>
                 (on sparc64)
      Reported-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2271048d
  14. 13 3月, 2010 1 次提交
  15. 04 2月, 2010 1 次提交
  16. 27 1月, 2010 2 次提交
    • S
      ring-buffer: Check for end of page in iterator · 3c05d748
      Steven Rostedt 提交于
      If the iterator comes to an empty page for some reason, or if
      the page is emptied by a consuming read. The iterator code currently
      does not check if the iterator is pass the contents, and may
      return a false entry.
      
      This patch adds a check to the ring buffer iterator to test if the
      current page has been completely read and sets the iterator to the
      next page if necessary.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      3c05d748
    • S
      ring-buffer: Check if ring buffer iterator has stale data · 492a74f4
      Steven Rostedt 提交于
      Usually reads of the ring buffer is performed by a single task.
      There are two types of reads from the ring buffer.
      
      One is a consuming read which will consume the entry that was read
      and the next read will be the entry that follows.
      
      The other is an iterator that will let the user read the contents of
      the ring buffer without modifying it. When an iterator is allocated,
      writes to the ring buffer are disabled to protect the iterator.
      
      The problem exists when consuming reads happen while an iterator is
      allocated. Specifically, the kind of read that swaps out an entire
      page (used by splice) and replaces it with a new read. If the iterator
      is on the page that is swapped out, then the next read may read
      from this swapped out page and return garbage.
      
      This patch adds a check when reading the iterator to make sure that
      the iterator contents are still valid. If a consuming read has taken
      place, the iterator is reset.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      492a74f4
  17. 07 1月, 2010 2 次提交
    • S
      ring-buffer: Add rb_list_head() wrapper around new reader page next field · 0e1ff5d7
      Steven Rostedt 提交于
      If the very unlikely case happens where the writer moves the head by one
      between where the head page is read and where the new reader page
      is assigned _and_ the writer then writes and wraps the entire ring buffer
      so that the head page is back to what was originally read as the head page,
      the page to be swapped will have a corrupted next pointer.
      
      Simple solution is to wrap the assignment of the next pointer with a
      rb_list_head().
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0e1ff5d7
    • D
      ring-buffer: Wrap a list.next reference with rb_list_head() · 5ded3dc6
      David Sharp 提交于
      This reference at the end of rb_get_reader_page() was causing off-by-one
      writes to the prev pointer of the page after the reader page when that
      page is the head page, and therefore the reader page has the RB_PAGE_HEAD
      flag in its list.next pointer. This eventually results in a GPF in a
      subsequent call to rb_set_head_page() (usually from rb_get_reader_page())
      when that prev pointer is dereferenced. The dereferenced register would
      characteristically have an address that appears shifted left by one byte
      (eg, ffxxxxxxxxxxxxyy instead of ffffxxxxxxxxxxxx) due to being written at
      an address one byte too high.
      Signed-off-by: NDavid Sharp <dhsharp@google.com>
      LKML-Reference: <1262826727-9090-1-git-send-email-dhsharp@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      5ded3dc6
  18. 05 1月, 2010 1 次提交
  19. 15 12月, 2009 3 次提交
  20. 11 12月, 2009 2 次提交
    • S
      ring-buffer: Move resize integrity check under reader lock · dd7f5943
      Steven Rostedt 提交于
      While using an application that does splice on the ftrace ring
      buffer at start up, I triggered an integrity check failure.
      
      Looking into this, I discovered that resizing the buffer performs
      an integrity check after the buffer is resized. This check unfortunately
      is preformed after it releases the reader lock. If a reader is
      reading the buffer it may cause the integrity check to trigger a
      false failure.
      
      This patch simply moves the integrity checker under the protection
      of the ring buffer reader lock.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      dd7f5943
    • S
      ring-buffer: Use sync sched protection on ring buffer resizing · 18421015
      Steven Rostedt 提交于
      There was a comment in the ring buffer code that says the calling
      layers should prevent tracing or reading of the ring buffer while
      resizing. I have discovered that the tracers do not honor this
      arrangement.
      
      This patch moves the disabling and synchronizing the ring buffer to
      a higher layer during resizing. This guarantees that no writes
      are occurring while the resize takes place.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      18421015
  21. 17 11月, 2009 1 次提交
    • S
      ring-buffer: Move access to commit_page up into function used · 5a50e33c
      Steven Rostedt 提交于
      With the change of the way we process commits. Where a commit only happens
      at the outer most level, and that we don't need to worry about
      a commit ending after the rb_start_commit() has been called, the code
      use to grab the commit page before the tail page to prevent a possible
      race. But this race no longer exists with the rb_start_commit()
      rb_end_commit() interface.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      5a50e33c
  22. 04 11月, 2009 1 次提交
  23. 24 10月, 2009 2 次提交
  24. 06 10月, 2009 1 次提交
  25. 20 9月, 2009 1 次提交
  26. 14 9月, 2009 1 次提交
    • S
      ring-buffer: typecast cmpxchg to fix PowerPC warning · 08a40816
      Steven Rostedt 提交于
      The cmpxchg used by PowerPC does the following:
      
        ({									 \
           __typeof__(*(ptr)) _o_ = (o);					 \
           __typeof__(*(ptr)) _n_ = (n);					 \
           (__typeof__(*(ptr))) __cmpxchg((ptr), (unsigned long)_o_,		 \
      				    (unsigned long)_n_, sizeof(*(ptr))); \
        })
      
      This does a type check of *ptr to both o and n.
      
      Unfortunately, the code in ring-buffer.c assigns longs to pointers
      and pointers to longs and causes a warning on PowerPC:
      
      ring_buffer.c: In function 'rb_head_page_set':
      ring_buffer.c:704: warning: initialization makes pointer from integer without a cast
      ring_buffer.c:704: warning: initialization makes pointer from integer without a cast
      ring_buffer.c: In function 'rb_head_page_replace':
      ring_buffer.c:797: warning: initialization makes integer from pointer without a cast
      
      This patch adds the typecasts inside cmpxchg to annotate that a long is
      being cast to a pointer and a pointer is being casted to a long and this
      removes the PowerPC warnings.
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      08a40816
  27. 10 9月, 2009 1 次提交
  28. 05 9月, 2009 2 次提交
  29. 04 9月, 2009 1 次提交
    • S
      ring-buffer: disable all cpu buffers when one finds a problem · 077c5407
      Steven Rostedt 提交于
      Currently the way RB_WARN_ON works, is to disable either the current
      CPU buffer or all CPU buffers, depending on whether a ring_buffer or
      ring_buffer_per_cpu struct was passed into the macro.
      
      Most users of the RB_WARN_ON pass in the CPU buffer, so only the one
      CPU buffer gets disabled but the rest are still active. This may
      confuse users even though a warning is sent to the console.
      
      This patch changes the macro to disable the entire buffer even if
      the CPU buffer is passed in.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      077c5407