1. 11 11月, 2014 1 次提交
    • R
      tracing: Do not busy wait in buffer splice · e30f53aa
      Rabin Vincent 提交于
      On a !PREEMPT kernel, attempting to use trace-cmd results in a soft
      lockup:
      
       # trace-cmd record -e raw_syscalls:* -F false
       NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61]
       ...
       Call Trace:
        [<ffffffff8105b580>] ? __wake_up_common+0x90/0x90
        [<ffffffff81092e25>] wait_on_pipe+0x35/0x40
        [<ffffffff810936e3>] tracing_buffers_splice_read+0x2e3/0x3c0
        [<ffffffff81093300>] ? tracing_stats_read+0x2a0/0x2a0
        [<ffffffff812d10ab>] ? _raw_spin_unlock+0x2b/0x40
        [<ffffffff810dc87b>] ? do_read_fault+0x21b/0x290
        [<ffffffff810de56a>] ? handle_mm_fault+0x2ba/0xbd0
        [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
        [<ffffffff810951e2>] ? trace_buffer_lock_reserve+0x22/0x60
        [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
        [<ffffffff8112415d>] do_splice_to+0x6d/0x90
        [<ffffffff81126971>] SyS_splice+0x7c1/0x800
        [<ffffffff812d1edd>] tracesys_phase2+0xd3/0xd8
      
      The problem is this: tracing_buffers_splice_read() calls
      ring_buffer_wait() to wait for data in the ring buffers.  The buffers
      are not empty so ring_buffer_wait() returns immediately.  But
      tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1,
      meaning it only wants to read a full page.  When the full page is not
      available, tracing_buffers_splice_read() tries to wait again with
      ring_buffer_wait(), which again returns immediately, and so on.
      
      Fix this by adding a "full" argument to ring_buffer_wait() which will
      make ring_buffer_wait() wait until the writer has left the reader's
      page, i.e.  until full-page reads will succeed.
      
      Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in
      
      Cc: stable@vger.kernel.org # 3.16+
      Fixes: b1169cc6 ("tracing: Remove mock up poll wait function")
      Signed-off-by: NRabin Vincent <rabin@rab.in>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e30f53aa
  2. 03 10月, 2014 1 次提交
    • S
      ring-buffer: Fix infinite spin in reading buffer · 24607f11
      Steven Rostedt (Red Hat) 提交于
      Commit 651e22f2 "ring-buffer: Always reset iterator to reader page"
      fixed one bug but in the process caused another one. The reset is to
      update the header page, but that fix also changed the way the cached
      reads were updated. The cache reads are used to test if an iterator
      needs to be updated or not.
      
      A ring buffer iterator, when created, disables writes to the ring buffer
      but does not stop other readers or consuming reads from happening.
      Although all readers are synchronized via a lock, they are only
      synchronized when in the ring buffer functions. Those functions may
      be called by any number of readers. The iterator continues down when
      its not interrupted by a consuming reader. If a consuming read
      occurs, the iterator starts from the beginning of the buffer.
      
      The way the iterator sees that a consuming read has happened since
      its last read is by checking the reader "cache". The cache holds the
      last counts of the read and the reader page itself.
      
      Commit 651e22f2 changed what was saved by the cache_read when
      the rb_iter_reset() occurred, making the iterator never match the cache.
      Then if the iterator calls rb_iter_reset(), it will go into an
      infinite loop by checking if the cache doesn't match, doing the reset
      and retrying, just to see that the cache still doesn't match! Which
      should never happen as the reset is suppose to set the cache to the
      current value and there's locks that keep a consuming reader from
      having access to the data.
      
      Fixes: 651e22f2 "ring-buffer: Always reset iterator to reader page"
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      24607f11
  3. 26 8月, 2014 1 次提交
    • J
      trace: Fix epoll hang when we race with new entries · 4ce97dbf
      Josef Bacik 提交于
      Epoll on trace_pipe can sometimes hang in a weird case.  If the ring buffer is
      empty when we set waiters_pending but an event shows up exactly at that moment
      we can miss being woken up by the ring buffers irq work.  Since
      ring_buffer_empty() is inherently racey we will sometimes think that the buffer
      is not empty.  So we don't get woken up and we don't think there are any events
      even though there were some ready when we added the watch, which makes us hang.
      This patch fixes this by making sure that we are actually on the wait list
      before we set waiters_pending, and add a memory barrier to make sure
      ring_buffer_empty() is going to be correct.
      
      Link: http://lkml.kernel.org/p/1408989581-23727-1-git-send-email-jbacik@fb.com
      
      Cc: stable@vger.kernel.org # 3.10+
      Cc: Martin Lau <kafai@fb.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      4ce97dbf
  4. 07 8月, 2014 2 次提交
    • S
      ring-buffer: Always reset iterator to reader page · 651e22f2
      Steven Rostedt (Red Hat) 提交于
      When performing a consuming read, the ring buffer swaps out a
      page from the ring buffer with a empty page and this page that
      was swapped out becomes the new reader page. The reader page
      is owned by the reader and since it was swapped out of the ring
      buffer, writers do not have access to it (there's an exception
      to that rule, but it's out of scope for this commit).
      
      When reading the "trace" file, it is a non consuming read, which
      means that the data in the ring buffer will not be modified.
      When the trace file is opened, a ring buffer iterator is allocated
      and writes to the ring buffer are disabled, such that the iterator
      will not have issues iterating over the data.
      
      Although the ring buffer disabled writes, it does not disable other
      reads, or even consuming reads. If a consuming read happens, then
      the iterator is reset and starts reading from the beginning again.
      
      My tests would sometimes trigger this bug on my i386 box:
      
      WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa()
      Modules linked in:
      CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8
      Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
       00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b
       f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0
       ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80^M
      Call Trace:
       [<c18796b3>] dump_stack+0x4b/0x75
       [<c103a0e3>] warn_slowpath_common+0x7e/0x95
       [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
       [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
       [<c103a185>] warn_slowpath_fmt+0x33/0x35
       [<c10bd85a>] __trace_find_cmdline+0x66/0xaa^M
       [<c10bed04>] trace_find_cmdline+0x40/0x64
       [<c10c3c16>] trace_print_context+0x27/0xec
       [<c10c4360>] ? trace_seq_printf+0x37/0x5b
       [<c10c0b15>] print_trace_line+0x319/0x39b
       [<c10ba3fb>] ? ring_buffer_read+0x47/0x50
       [<c10c13b1>] s_show+0x192/0x1ab
       [<c10bfd9a>] ? s_next+0x5a/0x7c
       [<c112e76e>] seq_read+0x267/0x34c
       [<c1115a25>] vfs_read+0x8c/0xef
       [<c112e507>] ? seq_lseek+0x154/0x154
       [<c1115ba2>] SyS_read+0x54/0x7f
       [<c188488e>] syscall_call+0x7/0xb
      ---[ end trace 3f507febd6b4cc83 ]---
      >>>> ##### CPU 1 buffer started ####
      
      Which was the __trace_find_cmdline() function complaining about the pid
      in the event record being negative.
      
      After adding more test cases, this would trigger more often. Strangely
      enough, it would never trigger on a single test, but instead would trigger
      only when running all the tests. I believe that was the case because it
      required one of the tests to be shutting down via delayed instances while
      a new test started up.
      
      After spending several days debugging this, I found that it was caused by
      the iterator becoming corrupted. Debugging further, I found out why
      the iterator became corrupted. It happened with the rb_iter_reset().
      
      As consuming reads may not read the full reader page, and only part
      of it, there's a "read" field to know where the last read took place.
      The iterator, must also start at the read position. In the rb_iter_reset()
      code, if the reader page was disconnected from the ring buffer, the iterator
      would start at the head page within the ring buffer (where writes still
      happen). But the mistake there was that it still used the "read" field
      to start the iterator on the head page, where it should always start
      at zero because readers never read from within the ring buffer where
      writes occur.
      
      I originally wrote a patch to have it set the iter->head to 0 instead
      of iter->head_page->read, but then I questioned why it wasn't always
      setting the iter to point to the reader page, as the reader page is
      still valid.  The list_empty(reader_page->list) just means that it was
      successful in swapping out. But the reader_page may still have data.
      
      There was a bug report a long time ago that was not reproducible that
      had something about trace_pipe (consuming read) not matching trace
      (iterator read). This may explain why that happened.
      
      Anyway, the correct answer to this bug is to always use the reader page
      an not reset the iterator to inside the writable ring buffer.
      
      Cc: stable@vger.kernel.org # 2.6.28+
      Fixes: d769041f "ring_buffer: implement new locking"
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      651e22f2
    • S
      ring-buffer: Up rb_iter_peek() loop count to 3 · 021de3d9
      Steven Rostedt (Red Hat) 提交于
      After writting a test to try to trigger the bug that caused the
      ring buffer iterator to become corrupted, I hit another bug:
      
       WARNING: CPU: 1 PID: 5281 at kernel/trace/ring_buffer.c:3766 rb_iter_peek+0x113/0x238()
       Modules linked in: ipt_MASQUERADE sunrpc [...]
       CPU: 1 PID: 5281 Comm: grep Tainted: G        W     3.16.0-rc3-test+ #143
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
        0000000000000000 ffffffff81809a80 ffffffff81503fb0 0000000000000000
        ffffffff81040ca1 ffff8800796d6010 ffffffff810c138d ffff8800796d6010
        ffff880077438c80 ffff8800796d6010 ffff88007abbe600 0000000000000003
       Call Trace:
        [<ffffffff81503fb0>] ? dump_stack+0x4a/0x75
        [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
        [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
        [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
        [<ffffffff810c14df>] ? ring_buffer_iter_peek+0x2d/0x5c
        [<ffffffff810c6f73>] ? tracing_iter_reset+0x6e/0x96
        [<ffffffff810c74a3>] ? s_start+0xd7/0x17b
        [<ffffffff8112b13e>] ? kmem_cache_alloc_trace+0xda/0xea
        [<ffffffff8114cf94>] ? seq_read+0x148/0x361
        [<ffffffff81132d98>] ? vfs_read+0x93/0xf1
        [<ffffffff81132f1b>] ? SyS_read+0x60/0x8e
        [<ffffffff8150bf9f>] ? tracesys+0xdd/0xe2
      
      Debugging this bug, which triggers when the rb_iter_peek() loops too
      many times (more than 2 times), I discovered there's a case that can
      cause that function to legitimately loop 3 times!
      
      rb_iter_peek() is different than rb_buffer_peek() as the rb_buffer_peek()
      only deals with the reader page (it's for consuming reads). The
      rb_iter_peek() is for traversing the buffer without consuming it, and as
      such, it can loop for one more reason. That is, if we hit the end of
      the reader page or any page, it will go to the next page and try again.
      
      That is, we have this:
      
       1. iter->head > iter->head_page->page->commit
          (rb_inc_iter() which moves the iter to the next page)
          try again
      
       2. event = rb_iter_head_event()
          event->type_len == RINGBUF_TYPE_TIME_EXTEND
          rb_advance_iter()
          try again
      
       3. read the event.
      
      But we never get to 3, because the count is greater than 2 and we
      cause the WARNING and return NULL.
      
      Up the counter to 3.
      
      Cc: stable@vger.kernel.org # 2.6.37+
      Fixes: 69d1b839 "ring-buffer: Bind time extend and data events together"
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      021de3d9
  5. 24 7月, 2014 1 次提交
  6. 19 7月, 2014 1 次提交
  7. 16 7月, 2014 1 次提交
    • M
      ring-buffer: Fix polling on trace_pipe · 97b8ee84
      Martin Lau 提交于
      ring_buffer_poll_wait() should always put the poll_table to its wait_queue
      even there is immediate data available.  Otherwise, the following epoll and
      read sequence will eventually hang forever:
      
      1. Put some data to make the trace_pipe ring_buffer read ready first
      2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee)
      3. epoll_wait()
      4. read(trace_pipe_fd) till EAGAIN
      5. Add some more data to the trace_pipe ring_buffer
      6. epoll_wait() -> this epoll_wait() will block forever
      
      ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2,
        ring_buffer_poll_wait() returns immediately without adding poll_table,
        which has poll_table->_qproc pointing to ep_poll_callback(), to its
        wait_queue.
      ~ During the epoll_wait() call in step 3 and step 6,
        ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue
        because the poll_table->_qproc is NULL and it is how epoll works.
      ~ When there is new data available in step 6, ring_buffer does not know
        it has to call ep_poll_callback() because it is not in its wait queue.
        Hence, block forever.
      
      Other poll implementation seems to call poll_wait() unconditionally as the very
      first thing to do.  For example, tcp_poll() in tcp.c.
      
      Link: http://lkml.kernel.org/p/20140610060637.GA14045@devbig242.prn2.facebook.com
      
      Cc: stable@vger.kernel.org # 2.6.27
      Fixes: 2a2cc8f7 "ftrace: allow the event pipe to be polled"
      Reviewed-by: NChris Mason <clm@fb.com>
      Signed-off-by: NMartin Lau <kafai@fb.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      97b8ee84
  8. 10 6月, 2014 1 次提交
  9. 20 3月, 2014 1 次提交
    • S
      trace, ring-buffer: Fix CPU hotplug callback registration · d39ad278
      Srivatsa S. Bhat 提交于
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the tracing ring-buffer code by using this latter form of callback
      registration.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      d39ad278
  10. 12 2月, 2014 1 次提交
    • S
      ring-buffer: Fix first commit on sub-buffer having non-zero delta · d651aa1d
      Steven Rostedt (Red Hat) 提交于
      Each sub-buffer (buffer page) has a full 64 bit timestamp. The events on
      that page use a 27 bit delta against that timestamp in order to save on
      bits written to the ring buffer. If the time between events is larger than
      what the 27 bits can hold, a "time extend" event is added to hold the
      entire 64 bit timestamp again and the events after that hold a delta from
      that timestamp.
      
      As a "time extend" is always paired with an event, it is logical to just
      allocate the event with the time extend, to make things a bit more efficient.
      
      Unfortunately, when the pairing code was written, it removed the "delta = 0"
      from the first commit on a page, causing the events on the page to be
      slightly skewed.
      
      Fixes: 69d1b839 "ring-buffer: Bind time extend and data events together"
      Cc: stable@vger.kernel.org # 2.6.37+
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      d651aa1d
  11. 13 1月, 2014 1 次提交
    • P
      sched/clock, x86: Use a static_key for sched_clock_stable · 35af99e6
      Peter Zijlstra 提交于
      In order to avoid the runtime condition and variable load turn
      sched_clock_stable into a static_key.
      
      Also provide a shorter implementation of local_clock() and
      cpu_clock(int) when sched_clock_stable==1.
      
                              MAINLINE   PRE       POST
      
          sched_clock_stable: 1          1         1
          (cold) sched_clock: 329841     221876    215295
          (cold) local_clock: 301773     234692    220773
          (warm) sched_clock: 38375      25602     25659
          (warm) local_clock: 100371     33265     27242
          (warm) rdtsc:       27340      24214     24208
          sched_clock_stable: 0          0         0
          (cold) sched_clock: 382634     235941    237019
          (cold) local_clock: 396890     297017    294819
          (warm) sched_clock: 38194      25233     25609
          (warm) local_clock: 143452     71234     71232
          (warm) rdtsc:       27345      24245     24243
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      35af99e6
  12. 19 7月, 2013 2 次提交
  13. 28 5月, 2013 1 次提交
  14. 16 3月, 2013 1 次提交
    • S
      ring-buffer: Add ring buffer startup selftest · 6c43e554
      Steven Rostedt (Red Hat) 提交于
      When testing my large changes to the ftrace system, there was
      a bug that looked like the ring buffer was dropping events.
      I wrote up a quick integrity checker of the ring buffer to
      see if it was.
      
      Although the bug ended up being something stupid I did in ftrace,
      and had nothing to do with the ring buffer, I figured if I spent
      the time to write up this test, I might as well include it in the
      kernel.
      
      I cleaned it up a bit, as the original version was rather ugly.
      Not saying this version is pretty, but it's a beauty queen
      compared to what I original wrote.
      
      To enable the start up test, set CONFIG_RING_BUFFER_STARTUP_TEST.
      
      Note, it runs for 10 seconds, so it will slow your boot time
      by at least 10 more seconds.
      
      What it does is documented in both the comments and the Kconfig
      help.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      6c43e554
  15. 15 3月, 2013 3 次提交
  16. 03 3月, 2013 1 次提交
    • J
      trace/ring_buffer: handle 64bit aligned structs · 649508f6
      James Hogan 提交于
      Some 32 bit architectures require 64 bit values to be aligned (for
      example Meta which has 64 bit read/write instructions). These require 8
      byte alignment of event data too, so use
      !CONFIG_HAVE_64BIT_ALIGNED_ACCESS instead of !CONFIG_64BIT ||
      CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to decide alignment, and align
      buffer_data_page::data accordingly.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org> (previous version subtly different)
      649508f6
  17. 31 1月, 2013 1 次提交
  18. 23 1月, 2013 2 次提交
    • S
      ring-buffer: Remove trace.h from ring_buffer.c · 0b07436d
      Steven Rostedt 提交于
      ring_buffer.c use to require declarations from trace.h, but
      these have moved to the generic header files. There's nothing
      in trace.h that ring_buffer.c requires.
      
      There's some headers that trace.h included that ring_buffer.c
      needs, but it's best that it includes them directly, and not
      include trace.h.
      
      Also, some things may use ring_buffer.c without having tracing
      configured. This removes the dependency that may come in the
      future.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0b07436d
    • S
      ring-buffer: User context bit recursion checking · 567cd4da
      Steven Rostedt 提交于
      Using context bit recursion checking, we can help increase the
      performance of the ring buffer.
      
      Before this patch:
      
       # echo function > /debug/tracing/current_tracer
       # for i in `seq 10`; do ./hackbench 50; done
      Time: 10.285
      Time: 10.407
      Time: 10.243
      Time: 10.372
      Time: 10.380
      Time: 10.198
      Time: 10.272
      Time: 10.354
      Time: 10.248
      Time: 10.253
      
      (average: 10.3012)
      
      Now we have:
      
       # echo function > /debug/tracing/current_tracer
       # for i in `seq 10`; do ./hackbench 50; done
      Time: 9.712
      Time: 9.824
      Time: 9.861
      Time: 9.827
      Time: 9.962
      Time: 9.905
      Time: 9.886
      Time: 10.088
      Time: 9.861
      Time: 9.834
      
      (average: 9.876)
      
       a 4% savings!
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      567cd4da
  19. 22 1月, 2013 1 次提交
    • S
      ring-buffer: Remove unnecessary recusive call in rb_advance_iter() · 771e0384
      Steven Rostedt 提交于
      The original ring-buffer code had special checks at the start
      of rb_advance_iter() and instead of repeating them again at the
      end of the function if a certain condition existed, I just did
      a recursive call to rb_advance_iter() because the special condition
      would cause rb_advance_iter() to return early (after the checks).
      
      But as things have changed, the special checks no longer exist
      and the only thing done for the special_condition is to call
      rb_inc_iter() and return. Instead of doing a confusing recursive call,
      just call rb_inc_iter instead.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      771e0384
  20. 01 12月, 2012 2 次提交
    • S
      ring-buffer: Fix race between integrity check and readers · 9366c1ba
      Steven Rostedt 提交于
      The function rb_check_pages() was added to make sure the ring buffer's
      pages were sane. This check is done when the ring buffer size is modified
      as well as when the iterator is released (closing the "trace" file),
      as that was considered a non fast path and a good place to do a sanity
      check.
      
      The problem is that the check does not have any locks around it.
      If one process were to read the trace file, and another were to read
      the raw binary file, the check could happen while the reader is reading
      the file.
      
      The issues with this is that the check requires to clear the HEAD page
      before doing the full check and it restores it afterward. But readers
      require the HEAD page to exist before it can read the buffer, otherwise
      it gives a nasty warning and disables the buffer.
      
      By adding the reader lock around the check, this keeps the race from
      happening.
      
      Cc: stable@vger.kernel.org # 3.6
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      9366c1ba
    • S
      ring-buffer: Fix NULL pointer if rb_set_head_page() fails · 54f7be5b
      Steven Rostedt 提交于
      The function rb_set_head_page() searches the list of ring buffer
      pages for a the page that has the HEAD page flag set. If it does
      not find it, it will do a WARN_ON(), disable the ring buffer and
      return NULL, as this should never happen.
      
      But if this bug happens to happen, not all callers of this function
      can handle a NULL pointer being returned from it. That needs to be
      fixed.
      
      Cc: stable@vger.kernel.org # 3.0+
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      54f7be5b
  21. 02 11月, 2012 1 次提交
  22. 01 11月, 2012 2 次提交
  23. 12 10月, 2012 1 次提交
  24. 07 8月, 2012 1 次提交
  25. 30 6月, 2012 2 次提交
  26. 29 6月, 2012 1 次提交
    • S
      ring-buffer: Fix uninitialized read_stamp · a5fb8331
      Steven Rostedt 提交于
      The ring buffer reader page is used to swap a page from the writable
      ring buffer. If the writer happens to be on that page, it ends up on the
      reader page, but will simply move off of it, back into the writable ring
      buffer as writes are added.
      
      The time stamp passed back to the readers is stored in the cpu_buffer per
      CPU descriptor. This stamp is updated when a swap of the reader page takes
      place, and it reads the current stamp from the page taken from the writable
      ring buffer. Everytime a writer goes to a new page, it updates the time stamp
      of that page.
      
      The problem happens if a reader reads a page from an empty per CPU ring buffer.
      If the buffer is empty, the swap still takes place, placing the writer at the
      start of the reader page. If at a later time, a write happens, it updates the
      page's time stamp and continues. But the problem is that the read_stamp does
      not get updated, because the page was already swapped.
      
      The solution to this was to not swap the page if the ring buffer happens to
      be empty. This also removes the side effect that the writes on the reader
      page will not get updated because the writer never gets back on the reader
      page without a swap. That is, if a read happens on an empty buffer, but then
      no reads happen for a while. If a swap took place, and the writer were to start
      writing a lot of data (function tracer), it will start overflowing the ring buffer
      and overwrite the older data. But because the writer never goes back onto the
      reader page, the data left on the reader page never gets overwritten. This
      causes the reader to see really old data, followed by a jump to newer data.
      
      Link: http://lkml.kernel.org/r/1340060577-9112-1-git-send-email-dhsharp@google.com
      Google-Bug-Id: 6410455
      Reported-by: NDavid Sharp <dhsharp@google.com>
      tested-by: NDavid Sharp <dhsharp@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a5fb8331
  27. 24 5月, 2012 1 次提交
    • S
      ring-buffer: Check for valid buffer before changing size · 6a31e1f1
      Steven Rostedt 提交于
      On some machines the number of possible CPUS is not the same as the
      number of CPUs that is on the machine. Ftrace uses possible_cpus to
      update the tracing structures but the ring buffer only allocates
      per cpu buffers for online CPUs when they come up.
      
      When the wakeup tracer was enabled in such a case, the ftrace code
      enabled all possible cpu buffers, but the code in ring_buffer_resize()
      did not check to see if the buffer in question was allocated. Since
      boot up CPUs did not match possible CPUs it caused the following
      crash:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000020
      IP: [<c1097851>] ring_buffer_resize+0x16a/0x28d
      *pde = 00000000
      Oops: 0000 [#1] PREEMPT SMP
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in: [last unloaded: scsi_wait_scan]
      
      Pid: 1387, comm: bash Not tainted 3.4.0-test+ #13                  /DG965MQ
      EIP: 0060:[<c1097851>] EFLAGS: 00010217 CPU: 0
      EIP is at ring_buffer_resize+0x16a/0x28d
      EAX: f5a14340 EBX: f6026b80 ECX: 00000ff4 EDX: 00000ff3
      ESI: 00000000 EDI: 00000002 EBP: f4275ecc ESP: f4275eb0
       DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      CR0: 80050033 CR2: 00000020 CR3: 34396000 CR4: 000007d0
      DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      DR6: ffff0ff0 DR7: 00000400
      Process bash (pid: 1387, ti=f4274000 task=f4380cb0 task.ti=f4274000)
      Stack:
       c109cf9a f6026b98 00000162 00160f68 00000006 00160f68 00000002 f4275ef0
       c109d013 f4275ee8 c123b72a c1c0bf00 c1cc81dc 00000005 f4275f98 00000007
       f4275f70 c109d0c7 7700000e 75656b61 00000070 f5e90900 f5c4e198 00000301
      Call Trace:
       [<c109cf9a>] ? tracing_set_tracer+0x115/0x1e9
       [<c109d013>] tracing_set_tracer+0x18e/0x1e9
       [<c123b72a>] ? _copy_from_user+0x30/0x46
       [<c109d0c7>] tracing_set_trace_write+0x59/0x7f
       [<c10ec01e>] ? fput+0x18/0x1c6
       [<c11f8732>] ? security_file_permission+0x27/0x2b
       [<c10eaacd>] ? rw_verify_area+0xcf/0xf2
       [<c10ec01e>] ? fput+0x18/0x1c6
       [<c109d06e>] ? tracing_set_tracer+0x1e9/0x1e9
       [<c10ead77>] vfs_write+0x8b/0xe3
       [<c10ebead>] ? fget_light+0x30/0x81
       [<c10eaf54>] sys_write+0x42/0x63
       [<c1834fbf>] sysenter_do_call+0x12/0x28
      
      This happens with the latency tracer as the ftrace code updates the
      saved max buffer via its cpumask and not with a global setting.
      
      Adding a check in ring_buffer_resize() to make sure the buffer being resized
      exists, fixes the problem.
      
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      6a31e1f1
  28. 19 5月, 2012 1 次提交
  29. 17 5月, 2012 4 次提交
    • S
      ring-buffer: Reset head page before running self test · 308f7eeb
      Steven Rostedt 提交于
      When the ring buffer does its consistency test on itself, it
      removes the head page, runs the tests, and then adds it back
      to what the "head_page" pointer was. But because the head_page
      pointer may lack behind the real head page (held by the link
      list pointer). The reset may be incorrect.
      
      Instead, if the head_page exists (it does not on first allocation)
      reset it back to the real head page before running the consistency
      tests. Then it will be put back to its original location after
      the tests are complete.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      308f7eeb
    • S
      ring-buffer: Add integrity check at end of iter read · 659f451f
      Steven Rostedt 提交于
      There use to be ring buffer integrity checks after updating the
      size of the ring buffer. But now that the ring buffer can modify
      the size while the system is running, the integrity checks were
      removed, as they require the ring buffer to be disabed to perform
      the check.
      
      Move the integrity check to the reading of the ring buffer via the
      iterator reads (the "trace" file). As reading via an iterator requires
      disabling the ring buffer, it is a perfect place to have it.
      
      If the ring buffer happens to be disabled when updating the size,
      we still perform the integrity check.
      
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      659f451f
    • V
      ring-buffer: Make addition of pages in ring buffer atomic · 5040b4b7
      Vaibhav Nagarnaik 提交于
      This patch adds the capability to add new pages to a ring buffer
      atomically while write operations are going on. This makes it possible
      to expand the ring buffer size without reinitializing the ring buffer.
      
      The new pages are attached between the head page and its previous page.
      
      Link: http://lkml.kernel.org/r/1336096792-25373-2-git-send-email-vnagarnaik@google.com
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Laurent Chavey <chavey@google.com>
      Cc: Justin Teravest <teravest@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      5040b4b7
    • V
      ring-buffer: Make removal of ring buffer pages atomic · 83f40318
      Vaibhav Nagarnaik 提交于
      This patch adds the capability to remove pages from a ring buffer
      without destroying any existing data in it.
      
      This is done by removing the pages after the tail page. This makes sure
      that first all the empty pages in the ring buffer are removed. If the
      head page is one in the list of pages to be removed, then the page after
      the removed ones is made the head page. This removes the oldest data
      from the ring buffer and keeps the latest data around to be read.
      
      To do this in a non-racey manner, tracing is stopped for a very short
      time while the pages to be removed are identified and unlinked from the
      ring buffer. The pages are freed after the tracing is restarted to
      minimize the time needed to stop tracing.
      
      The context in which the pages from the per-cpu ring buffer are removed
      runs on the respective CPU. This minimizes the events not traced to only
      NMI trace contexts.
      
      Link: http://lkml.kernel.org/r/1336096792-25373-1-git-send-email-vnagarnaik@google.com
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Laurent Chavey <chavey@google.com>
      Cc: Justin Teravest <teravest@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      83f40318