1. 16 7月, 2014 1 次提交
    • M
      ring-buffer: Fix polling on trace_pipe · 97b8ee84
      Martin Lau 提交于
      ring_buffer_poll_wait() should always put the poll_table to its wait_queue
      even there is immediate data available.  Otherwise, the following epoll and
      read sequence will eventually hang forever:
      
      1. Put some data to make the trace_pipe ring_buffer read ready first
      2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee)
      3. epoll_wait()
      4. read(trace_pipe_fd) till EAGAIN
      5. Add some more data to the trace_pipe ring_buffer
      6. epoll_wait() -> this epoll_wait() will block forever
      
      ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2,
        ring_buffer_poll_wait() returns immediately without adding poll_table,
        which has poll_table->_qproc pointing to ep_poll_callback(), to its
        wait_queue.
      ~ During the epoll_wait() call in step 3 and step 6,
        ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue
        because the poll_table->_qproc is NULL and it is how epoll works.
      ~ When there is new data available in step 6, ring_buffer does not know
        it has to call ep_poll_callback() because it is not in its wait queue.
        Hence, block forever.
      
      Other poll implementation seems to call poll_wait() unconditionally as the very
      first thing to do.  For example, tcp_poll() in tcp.c.
      
      Link: http://lkml.kernel.org/p/20140610060637.GA14045@devbig242.prn2.facebook.com
      
      Cc: stable@vger.kernel.org # 2.6.27
      Fixes: 2a2cc8f7 "ftrace: allow the event pipe to be polled"
      Reviewed-by: NChris Mason <clm@fb.com>
      Signed-off-by: NMartin Lau <kafai@fb.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      97b8ee84
  2. 10 6月, 2014 1 次提交
  3. 20 3月, 2014 1 次提交
    • S
      trace, ring-buffer: Fix CPU hotplug callback registration · d39ad278
      Srivatsa S. Bhat 提交于
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the tracing ring-buffer code by using this latter form of callback
      registration.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      d39ad278
  4. 12 2月, 2014 1 次提交
    • S
      ring-buffer: Fix first commit on sub-buffer having non-zero delta · d651aa1d
      Steven Rostedt (Red Hat) 提交于
      Each sub-buffer (buffer page) has a full 64 bit timestamp. The events on
      that page use a 27 bit delta against that timestamp in order to save on
      bits written to the ring buffer. If the time between events is larger than
      what the 27 bits can hold, a "time extend" event is added to hold the
      entire 64 bit timestamp again and the events after that hold a delta from
      that timestamp.
      
      As a "time extend" is always paired with an event, it is logical to just
      allocate the event with the time extend, to make things a bit more efficient.
      
      Unfortunately, when the pairing code was written, it removed the "delta = 0"
      from the first commit on a page, causing the events on the page to be
      slightly skewed.
      
      Fixes: 69d1b839 "ring-buffer: Bind time extend and data events together"
      Cc: stable@vger.kernel.org # 2.6.37+
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      d651aa1d
  5. 13 1月, 2014 1 次提交
    • P
      sched/clock, x86: Use a static_key for sched_clock_stable · 35af99e6
      Peter Zijlstra 提交于
      In order to avoid the runtime condition and variable load turn
      sched_clock_stable into a static_key.
      
      Also provide a shorter implementation of local_clock() and
      cpu_clock(int) when sched_clock_stable==1.
      
                              MAINLINE   PRE       POST
      
          sched_clock_stable: 1          1         1
          (cold) sched_clock: 329841     221876    215295
          (cold) local_clock: 301773     234692    220773
          (warm) sched_clock: 38375      25602     25659
          (warm) local_clock: 100371     33265     27242
          (warm) rdtsc:       27340      24214     24208
          sched_clock_stable: 0          0         0
          (cold) sched_clock: 382634     235941    237019
          (cold) local_clock: 396890     297017    294819
          (warm) sched_clock: 38194      25233     25609
          (warm) local_clock: 143452     71234     71232
          (warm) rdtsc:       27345      24245     24243
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      35af99e6
  6. 19 7月, 2013 2 次提交
  7. 28 5月, 2013 1 次提交
  8. 16 3月, 2013 1 次提交
    • S
      ring-buffer: Add ring buffer startup selftest · 6c43e554
      Steven Rostedt (Red Hat) 提交于
      When testing my large changes to the ftrace system, there was
      a bug that looked like the ring buffer was dropping events.
      I wrote up a quick integrity checker of the ring buffer to
      see if it was.
      
      Although the bug ended up being something stupid I did in ftrace,
      and had nothing to do with the ring buffer, I figured if I spent
      the time to write up this test, I might as well include it in the
      kernel.
      
      I cleaned it up a bit, as the original version was rather ugly.
      Not saying this version is pretty, but it's a beauty queen
      compared to what I original wrote.
      
      To enable the start up test, set CONFIG_RING_BUFFER_STARTUP_TEST.
      
      Note, it runs for 10 seconds, so it will slow your boot time
      by at least 10 more seconds.
      
      What it does is documented in both the comments and the Kconfig
      help.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      6c43e554
  9. 15 3月, 2013 3 次提交
  10. 03 3月, 2013 1 次提交
    • J
      trace/ring_buffer: handle 64bit aligned structs · 649508f6
      James Hogan 提交于
      Some 32 bit architectures require 64 bit values to be aligned (for
      example Meta which has 64 bit read/write instructions). These require 8
      byte alignment of event data too, so use
      !CONFIG_HAVE_64BIT_ALIGNED_ACCESS instead of !CONFIG_64BIT ||
      CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to decide alignment, and align
      buffer_data_page::data accordingly.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org> (previous version subtly different)
      649508f6
  11. 31 1月, 2013 1 次提交
  12. 23 1月, 2013 2 次提交
    • S
      ring-buffer: Remove trace.h from ring_buffer.c · 0b07436d
      Steven Rostedt 提交于
      ring_buffer.c use to require declarations from trace.h, but
      these have moved to the generic header files. There's nothing
      in trace.h that ring_buffer.c requires.
      
      There's some headers that trace.h included that ring_buffer.c
      needs, but it's best that it includes them directly, and not
      include trace.h.
      
      Also, some things may use ring_buffer.c without having tracing
      configured. This removes the dependency that may come in the
      future.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0b07436d
    • S
      ring-buffer: User context bit recursion checking · 567cd4da
      Steven Rostedt 提交于
      Using context bit recursion checking, we can help increase the
      performance of the ring buffer.
      
      Before this patch:
      
       # echo function > /debug/tracing/current_tracer
       # for i in `seq 10`; do ./hackbench 50; done
      Time: 10.285
      Time: 10.407
      Time: 10.243
      Time: 10.372
      Time: 10.380
      Time: 10.198
      Time: 10.272
      Time: 10.354
      Time: 10.248
      Time: 10.253
      
      (average: 10.3012)
      
      Now we have:
      
       # echo function > /debug/tracing/current_tracer
       # for i in `seq 10`; do ./hackbench 50; done
      Time: 9.712
      Time: 9.824
      Time: 9.861
      Time: 9.827
      Time: 9.962
      Time: 9.905
      Time: 9.886
      Time: 10.088
      Time: 9.861
      Time: 9.834
      
      (average: 9.876)
      
       a 4% savings!
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      567cd4da
  13. 22 1月, 2013 1 次提交
    • S
      ring-buffer: Remove unnecessary recusive call in rb_advance_iter() · 771e0384
      Steven Rostedt 提交于
      The original ring-buffer code had special checks at the start
      of rb_advance_iter() and instead of repeating them again at the
      end of the function if a certain condition existed, I just did
      a recursive call to rb_advance_iter() because the special condition
      would cause rb_advance_iter() to return early (after the checks).
      
      But as things have changed, the special checks no longer exist
      and the only thing done for the special_condition is to call
      rb_inc_iter() and return. Instead of doing a confusing recursive call,
      just call rb_inc_iter instead.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      771e0384
  14. 01 12月, 2012 2 次提交
    • S
      ring-buffer: Fix race between integrity check and readers · 9366c1ba
      Steven Rostedt 提交于
      The function rb_check_pages() was added to make sure the ring buffer's
      pages were sane. This check is done when the ring buffer size is modified
      as well as when the iterator is released (closing the "trace" file),
      as that was considered a non fast path and a good place to do a sanity
      check.
      
      The problem is that the check does not have any locks around it.
      If one process were to read the trace file, and another were to read
      the raw binary file, the check could happen while the reader is reading
      the file.
      
      The issues with this is that the check requires to clear the HEAD page
      before doing the full check and it restores it afterward. But readers
      require the HEAD page to exist before it can read the buffer, otherwise
      it gives a nasty warning and disables the buffer.
      
      By adding the reader lock around the check, this keeps the race from
      happening.
      
      Cc: stable@vger.kernel.org # 3.6
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      9366c1ba
    • S
      ring-buffer: Fix NULL pointer if rb_set_head_page() fails · 54f7be5b
      Steven Rostedt 提交于
      The function rb_set_head_page() searches the list of ring buffer
      pages for a the page that has the HEAD page flag set. If it does
      not find it, it will do a WARN_ON(), disable the ring buffer and
      return NULL, as this should never happen.
      
      But if this bug happens to happen, not all callers of this function
      can handle a NULL pointer being returned from it. That needs to be
      fixed.
      
      Cc: stable@vger.kernel.org # 3.0+
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      54f7be5b
  15. 02 11月, 2012 1 次提交
  16. 01 11月, 2012 2 次提交
  17. 12 10月, 2012 1 次提交
  18. 07 8月, 2012 1 次提交
  19. 30 6月, 2012 2 次提交
  20. 29 6月, 2012 1 次提交
    • S
      ring-buffer: Fix uninitialized read_stamp · a5fb8331
      Steven Rostedt 提交于
      The ring buffer reader page is used to swap a page from the writable
      ring buffer. If the writer happens to be on that page, it ends up on the
      reader page, but will simply move off of it, back into the writable ring
      buffer as writes are added.
      
      The time stamp passed back to the readers is stored in the cpu_buffer per
      CPU descriptor. This stamp is updated when a swap of the reader page takes
      place, and it reads the current stamp from the page taken from the writable
      ring buffer. Everytime a writer goes to a new page, it updates the time stamp
      of that page.
      
      The problem happens if a reader reads a page from an empty per CPU ring buffer.
      If the buffer is empty, the swap still takes place, placing the writer at the
      start of the reader page. If at a later time, a write happens, it updates the
      page's time stamp and continues. But the problem is that the read_stamp does
      not get updated, because the page was already swapped.
      
      The solution to this was to not swap the page if the ring buffer happens to
      be empty. This also removes the side effect that the writes on the reader
      page will not get updated because the writer never gets back on the reader
      page without a swap. That is, if a read happens on an empty buffer, but then
      no reads happen for a while. If a swap took place, and the writer were to start
      writing a lot of data (function tracer), it will start overflowing the ring buffer
      and overwrite the older data. But because the writer never goes back onto the
      reader page, the data left on the reader page never gets overwritten. This
      causes the reader to see really old data, followed by a jump to newer data.
      
      Link: http://lkml.kernel.org/r/1340060577-9112-1-git-send-email-dhsharp@google.com
      Google-Bug-Id: 6410455
      Reported-by: NDavid Sharp <dhsharp@google.com>
      tested-by: NDavid Sharp <dhsharp@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a5fb8331
  21. 24 5月, 2012 1 次提交
    • S
      ring-buffer: Check for valid buffer before changing size · 6a31e1f1
      Steven Rostedt 提交于
      On some machines the number of possible CPUS is not the same as the
      number of CPUs that is on the machine. Ftrace uses possible_cpus to
      update the tracing structures but the ring buffer only allocates
      per cpu buffers for online CPUs when they come up.
      
      When the wakeup tracer was enabled in such a case, the ftrace code
      enabled all possible cpu buffers, but the code in ring_buffer_resize()
      did not check to see if the buffer in question was allocated. Since
      boot up CPUs did not match possible CPUs it caused the following
      crash:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000020
      IP: [<c1097851>] ring_buffer_resize+0x16a/0x28d
      *pde = 00000000
      Oops: 0000 [#1] PREEMPT SMP
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in: [last unloaded: scsi_wait_scan]
      
      Pid: 1387, comm: bash Not tainted 3.4.0-test+ #13                  /DG965MQ
      EIP: 0060:[<c1097851>] EFLAGS: 00010217 CPU: 0
      EIP is at ring_buffer_resize+0x16a/0x28d
      EAX: f5a14340 EBX: f6026b80 ECX: 00000ff4 EDX: 00000ff3
      ESI: 00000000 EDI: 00000002 EBP: f4275ecc ESP: f4275eb0
       DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      CR0: 80050033 CR2: 00000020 CR3: 34396000 CR4: 000007d0
      DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      DR6: ffff0ff0 DR7: 00000400
      Process bash (pid: 1387, ti=f4274000 task=f4380cb0 task.ti=f4274000)
      Stack:
       c109cf9a f6026b98 00000162 00160f68 00000006 00160f68 00000002 f4275ef0
       c109d013 f4275ee8 c123b72a c1c0bf00 c1cc81dc 00000005 f4275f98 00000007
       f4275f70 c109d0c7 7700000e 75656b61 00000070 f5e90900 f5c4e198 00000301
      Call Trace:
       [<c109cf9a>] ? tracing_set_tracer+0x115/0x1e9
       [<c109d013>] tracing_set_tracer+0x18e/0x1e9
       [<c123b72a>] ? _copy_from_user+0x30/0x46
       [<c109d0c7>] tracing_set_trace_write+0x59/0x7f
       [<c10ec01e>] ? fput+0x18/0x1c6
       [<c11f8732>] ? security_file_permission+0x27/0x2b
       [<c10eaacd>] ? rw_verify_area+0xcf/0xf2
       [<c10ec01e>] ? fput+0x18/0x1c6
       [<c109d06e>] ? tracing_set_tracer+0x1e9/0x1e9
       [<c10ead77>] vfs_write+0x8b/0xe3
       [<c10ebead>] ? fget_light+0x30/0x81
       [<c10eaf54>] sys_write+0x42/0x63
       [<c1834fbf>] sysenter_do_call+0x12/0x28
      
      This happens with the latency tracer as the ftrace code updates the
      saved max buffer via its cpumask and not with a global setting.
      
      Adding a check in ring_buffer_resize() to make sure the buffer being resized
      exists, fixes the problem.
      
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      6a31e1f1
  22. 19 5月, 2012 1 次提交
  23. 17 5月, 2012 4 次提交
    • S
      ring-buffer: Reset head page before running self test · 308f7eeb
      Steven Rostedt 提交于
      When the ring buffer does its consistency test on itself, it
      removes the head page, runs the tests, and then adds it back
      to what the "head_page" pointer was. But because the head_page
      pointer may lack behind the real head page (held by the link
      list pointer). The reset may be incorrect.
      
      Instead, if the head_page exists (it does not on first allocation)
      reset it back to the real head page before running the consistency
      tests. Then it will be put back to its original location after
      the tests are complete.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      308f7eeb
    • S
      ring-buffer: Add integrity check at end of iter read · 659f451f
      Steven Rostedt 提交于
      There use to be ring buffer integrity checks after updating the
      size of the ring buffer. But now that the ring buffer can modify
      the size while the system is running, the integrity checks were
      removed, as they require the ring buffer to be disabed to perform
      the check.
      
      Move the integrity check to the reading of the ring buffer via the
      iterator reads (the "trace" file). As reading via an iterator requires
      disabling the ring buffer, it is a perfect place to have it.
      
      If the ring buffer happens to be disabled when updating the size,
      we still perform the integrity check.
      
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      659f451f
    • V
      ring-buffer: Make addition of pages in ring buffer atomic · 5040b4b7
      Vaibhav Nagarnaik 提交于
      This patch adds the capability to add new pages to a ring buffer
      atomically while write operations are going on. This makes it possible
      to expand the ring buffer size without reinitializing the ring buffer.
      
      The new pages are attached between the head page and its previous page.
      
      Link: http://lkml.kernel.org/r/1336096792-25373-2-git-send-email-vnagarnaik@google.com
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Laurent Chavey <chavey@google.com>
      Cc: Justin Teravest <teravest@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      5040b4b7
    • V
      ring-buffer: Make removal of ring buffer pages atomic · 83f40318
      Vaibhav Nagarnaik 提交于
      This patch adds the capability to remove pages from a ring buffer
      without destroying any existing data in it.
      
      This is done by removing the pages after the tail page. This makes sure
      that first all the empty pages in the ring buffer are removed. If the
      head page is one in the list of pages to be removed, then the page after
      the removed ones is made the head page. This removes the oldest data
      from the ring buffer and keeps the latest data around to be read.
      
      To do this in a non-racey manner, tracing is stopped for a very short
      time while the pages to be removed are identified and unlinked from the
      ring buffer. The pages are freed after the tracing is restarted to
      minimize the time needed to stop tracing.
      
      The context in which the pages from the per-cpu ring buffer are removed
      runs on the respective CPU. This minimizes the events not traced to only
      NMI trace contexts.
      
      Link: http://lkml.kernel.org/r/1336096792-25373-1-git-send-email-vnagarnaik@google.com
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Laurent Chavey <chavey@google.com>
      Cc: Justin Teravest <teravest@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      83f40318
  24. 24 4月, 2012 1 次提交
  25. 23 2月, 2012 1 次提交
    • S
      tracing/ring-buffer: Only have tracing_on disable tracing buffers · 499e5470
      Steven Rostedt 提交于
      As the ring-buffer code is being used by other facilities in the
      kernel, having tracing_on file disable *all* buffers is not a desired
      affect. It should only disable the ftrace buffers that are being used.
      
      Move the code into the trace.c file and use the buffer disabling
      for tracing_on() and tracing_off(). This way only the ftrace buffers
      will be affected by them and other kernel utilities will not be
      confused to why their output suddenly stopped.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      499e5470
  26. 13 9月, 2011 1 次提交
  27. 31 8月, 2011 1 次提交
    • V
      trace: Add ring buffer stats to measure rate of events · c64e148a
      Vaibhav Nagarnaik 提交于
      The stats file under per_cpu folder provides the number of entries,
      overruns and other statistics about the CPU ring buffer. However, the
      numbers do not provide any indication of how full the ring buffer is in
      bytes compared to the overall size in bytes. Also, it is helpful to know
      the rate at which the cpu buffer is filling up.
      
      This patch adds an entry "bytes: " in printed stats for per_cpu ring
      buffer which provides the actual bytes consumed in the ring buffer. This
      field includes the number of bytes used by recorded events and the
      padding bytes added when moving the tail pointer to next page.
      
      It also adds the following time stamps:
      "oldest event ts:" - the oldest timestamp in the ring buffer
      "now ts:"  - the timestamp at the time of reading
      
      The field "now ts" provides a consistent time snapshot to the userspace
      when being read. This is read from the same trace clock used by tracing
      event timestamps.
      
      Together, these values provide the rate at which the buffer is filling
      up, from the formula:
      bytes / (now_ts - oldest_event_ts)
      Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Link: http://lkml.kernel.org/r/1313531179-9323-3-git-send-email-vnagarnaik@google.comSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      c64e148a
  28. 15 6月, 2011 3 次提交