1. 03 11月, 2015 1 次提交
  2. 11 11月, 2014 1 次提交
    • R
      tracing: Do not busy wait in buffer splice · e30f53aa
      Rabin Vincent 提交于
      On a !PREEMPT kernel, attempting to use trace-cmd results in a soft
      lockup:
      
       # trace-cmd record -e raw_syscalls:* -F false
       NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61]
       ...
       Call Trace:
        [<ffffffff8105b580>] ? __wake_up_common+0x90/0x90
        [<ffffffff81092e25>] wait_on_pipe+0x35/0x40
        [<ffffffff810936e3>] tracing_buffers_splice_read+0x2e3/0x3c0
        [<ffffffff81093300>] ? tracing_stats_read+0x2a0/0x2a0
        [<ffffffff812d10ab>] ? _raw_spin_unlock+0x2b/0x40
        [<ffffffff810dc87b>] ? do_read_fault+0x21b/0x290
        [<ffffffff810de56a>] ? handle_mm_fault+0x2ba/0xbd0
        [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
        [<ffffffff810951e2>] ? trace_buffer_lock_reserve+0x22/0x60
        [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
        [<ffffffff8112415d>] do_splice_to+0x6d/0x90
        [<ffffffff81126971>] SyS_splice+0x7c1/0x800
        [<ffffffff812d1edd>] tracesys_phase2+0xd3/0xd8
      
      The problem is this: tracing_buffers_splice_read() calls
      ring_buffer_wait() to wait for data in the ring buffers.  The buffers
      are not empty so ring_buffer_wait() returns immediately.  But
      tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1,
      meaning it only wants to read a full page.  When the full page is not
      available, tracing_buffers_splice_read() tries to wait again with
      ring_buffer_wait(), which again returns immediately, and so on.
      
      Fix this by adding a "full" argument to ring_buffer_wait() which will
      make ring_buffer_wait() wait until the writer has left the reader's
      page, i.e.  until full-page reads will succeed.
      
      Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in
      
      Cc: stable@vger.kernel.org # 3.16+
      Fixes: b1169cc6 ("tracing: Remove mock up poll wait function")
      Signed-off-by: NRabin Vincent <rabin@rab.in>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e30f53aa
  3. 10 6月, 2014 1 次提交
  4. 15 3月, 2013 1 次提交
  5. 31 1月, 2013 1 次提交
  6. 02 11月, 2012 1 次提交
  7. 01 11月, 2012 1 次提交
  8. 24 4月, 2012 1 次提交
  9. 23 2月, 2012 1 次提交
    • S
      tracing/ring-buffer: Only have tracing_on disable tracing buffers · 499e5470
      Steven Rostedt 提交于
      As the ring-buffer code is being used by other facilities in the
      kernel, having tracing_on file disable *all* buffers is not a desired
      affect. It should only disable the ftrace buffers that are being used.
      
      Move the code into the trace.c file and use the buffer disabling
      for tracing_on() and tracing_off(). This way only the ftrace buffers
      will be affected by them and other kernel utilities will not be
      confused to why their output suddenly stopped.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      499e5470
  10. 31 8月, 2011 1 次提交
    • V
      trace: Add ring buffer stats to measure rate of events · c64e148a
      Vaibhav Nagarnaik 提交于
      The stats file under per_cpu folder provides the number of entries,
      overruns and other statistics about the CPU ring buffer. However, the
      numbers do not provide any indication of how full the ring buffer is in
      bytes compared to the overall size in bytes. Also, it is helpful to know
      the rate at which the cpu buffer is filling up.
      
      This patch adds an entry "bytes: " in printed stats for per_cpu ring
      buffer which provides the actual bytes consumed in the ring buffer. This
      field includes the number of bytes used by recorded events and the
      padding bytes added when moving the tail pointer to next page.
      
      It also adds the following time stamps:
      "oldest event ts:" - the oldest timestamp in the ring buffer
      "now ts:"  - the timestamp at the time of reading
      
      The field "now ts" provides a consistent time snapshot to the userspace
      when being read. This is read from the same trace clock used by tracing
      event timestamps.
      
      Together, these values provide the rate at which the buffer is filling
      up, from the formula:
      bytes / (now_ts - oldest_event_ts)
      Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Link: http://lkml.kernel.org/r/1313531179-9323-3-git-send-email-vnagarnaik@google.comSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      c64e148a
  11. 15 6月, 2011 1 次提交
    • V
      tracing: Use NUMA allocation for per-cpu ring buffer pages · 7ea59064
      Vaibhav Nagarnaik 提交于
      The tracing ring buffer is a group of per-cpu ring buffers where
      allocation and logging is done on a per-cpu basis. The events that are
      generated on a particular CPU are logged in the corresponding buffer.
      This is to provide wait-free writes between CPUs and good NUMA node
      locality while accessing the ring buffer.
      
      However, the allocation routines consider NUMA locality only for buffer
      page metadata and not for the actual buffer page. This causes the pages
      to be allocated on the NUMA node local to the CPU where the allocation
      routine is running at the time.
      
      This patch fixes the problem by using a NUMA node specific allocation
      routine so that the pages are allocated from a NUMA node local to the
      logging CPU.
      
      I tested with the getuid_microbench from autotest. It is a simple binary
      that calls getuid() in a loop and measures the average time for the
      syscall to complete. The following command was used to test:
      $ getuid_microbench 1000000
      
      Compared the numbers found on kernel with and without this patch and
      found that logging latency decreases by 30-50 ns/call.
      tracing with non-NUMA allocation - 569 ns/call
      tracing with NUMA allocation     - 512 ns/call
      Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.comSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      7ea59064
  12. 10 3月, 2011 1 次提交
    • D
      tracing: Add an 'overwrite' trace_option. · 750912fa
      David Sharp 提交于
      Add an "overwrite" trace_option for ftrace to control whether the buffer should
      be overwritten on overflow or not. The default remains to overwrite old events
      when the buffer is full. This patch adds the option to instead discard newest
      events when the buffer is full. This is useful to get a snapshot of traces just
      after enabling traces. Dropping the current event is also a simpler code path.
      Signed-off-by: NDavid Sharp <dhsharp@google.com>
      LKML-Reference: <1291844807-15481-1-git-send-email-dhsharp@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      750912fa
  13. 21 10月, 2010 1 次提交
  14. 28 4月, 2010 1 次提交
    • D
      ring-buffer: Make non-consuming read less expensive with lots of cpus. · 72c9ddfd
      David Miller 提交于
      When performing a non-consuming read, a synchronize_sched() is
      performed once for every cpu which is actively tracing.
      
      This is very expensive, and can make it take several seconds to open
      up the 'trace' file with lots of cpus.
      
      Only one synchronize_sched() call is actually necessary.  What is
      desired is for all cpus to see the disabling state change.  So we
      transform the existing sequence:
      
      	for_each_cpu() {
      		ring_buffer_read_start();
      	}
      
      where each ring_buffer_start() call performs a synchronize_sched(),
      into the following:
      
      	for_each_cpu() {
      		ring_buffer_read_prepare();
      	}
      	ring_buffer_read_prepare_sync();
      	for_each_cpu() {
      		ring_buffer_read_start();
      	}
      
      wherein only the single ring_buffer_read_prepare_sync() call needs to
      do the synchronize_sched().
      
      The first phase, via ring_buffer_read_prepare(), allocates the 'iter'
      memory and increments ->record_disabled.
      
      In the second phase, ring_buffer_read_prepare_sync() makes sure this
      ->record_disabled state is visible fully to all cpus.
      
      And in the final third phase, the ring_buffer_read_start() calls reset
      the 'iter' objects allocated in the first phase since we now know that
      none of the cpus are adding trace entries any more.
      
      This makes openning the 'trace' file nearly instantaneous on a
      sparc64 Niagara2 box with 128 cpus tracing.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      LKML-Reference: <20100420.154711.11246950.davem@davemloft.net>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      72c9ddfd
  15. 01 4月, 2010 1 次提交
    • S
      ring-buffer: Add place holder recording of dropped events · 66a8cb95
      Steven Rostedt 提交于
      Currently, when the ring buffer drops events, it does not record
      the fact that it did so. It does inform the writer that the event
      was dropped by returning a NULL event, but it does not put in any
      place holder where the event was dropped.
      
      This is not a trivial thing to add because the ring buffer mostly
      runs in overwrite (flight recorder) mode. That is, when the ring
      buffer is full, new data will overwrite old data.
      
      In a produce/consumer mode, where new data is simply dropped when
      the ring buffer is full, it is trivial to add the placeholder
      for dropped events. When there's more room to write new data, then
      a special event can be added to notify the reader about the dropped
      events.
      
      But in overwrite mode, any new write can overwrite events. A place
      holder can not be inserted into the ring buffer since there never
      may be room. A reader could also come in at anytime and miss the
      placeholder.
      
      Luckily, the way the ring buffer works, the read side can find out
      if events were lost or not, and how many events. Everytime a write
      takes place, if it overwrites the header page (the next read) it
      updates a "overrun" variable that keeps track of the number of
      lost events. When a reader swaps out a page from the ring buffer,
      it can record this number, perfom the swap, and then check to
      see if the number changed, and take the diff if it has, which would be
      the number of events dropped. This can be stored by the reader
      and returned to callers of the reader.
      
      Since the reader page swap will fail if the writer moved the head
      page since the time the reader page set up the swap, this gives room
      to record the overruns without worrying about races. If the reader
      sets up the pages, records the overrun, than performs the swap,
      if the swap succeeds, then the overrun variable has not been
      updated since the setup before the swap.
      
      For binary readers of the ring buffer, a flag is set in the header
      of each sub page (sub buffer) of the ring buffer. This flag is embedded
      in the size field of the data on the sub buffer, in the 31st bit (the size
      can be 32 or 64 bits depending on the architecture), but only 27
      bits needs to be used for the actual size (less actually).
      
      We could add a new field in the sub buffer header to also record the
      number of events dropped since the last read, but this will change the
      format of the binary ring buffer a bit too much. Perhaps this change can
      be made if the information on the number of events dropped is considered
      important enough.
      
      Note, the notification of dropped events is only used by consuming reads
      or peeking at the ring buffer. Iterating over the ring buffer does not
      keep this information because the necessary data is only available when
      a page swap is made, and the iterator does not swap out pages.
      
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      66a8cb95
  16. 05 9月, 2009 1 次提交
  17. 04 9月, 2009 1 次提交
    • S
      ring-buffer: remove ring_buffer_event_discard · dc892f73
      Steven Rostedt 提交于
      The function ring_buffer_event_discard can be used on any item in the
      ring buffer, even after the item was committed. This function provides
      no safety nets and is very race prone.
      
      An item may be safely removed from the ring buffer before it is committed
      with the ring_buffer_discard_commit.
      
      Since there are currently no users of this function, and because this
      function is racey and error prone, this patch removes it altogether.
      
      Note, removing this function also allows the counters to ignore
      all discarded events (patches will follow).
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      dc892f73
  18. 08 7月, 2009 1 次提交
    • S
      ring-buffer: make lockless · 77ae365e
      Steven Rostedt 提交于
      This patch converts the ring buffers into a completely lockless
      buffer recording system. The read side still takes locks since
      we still serialize readers. But the writers are the ones that
      must be lockless (those can happen in NMIs).
      
      The main change is to the "head_page" pointer. We write to the
      tail, and read from the head. The "head_page" pointer in the cpu
      buffer is now just a reference to where to look. The real head
      page is now kept in the head_page->list->prev->next pointer.
      That is, in the list head of the previous page we set flags.
      
      The list pages are allocated to be aligned such that the lowest
      significant bits are always zero pointing to the list. This gives
      us play to put in flags to their pointers.
      
      bit 0: set when the page is a head page
      bit 1: set when the writer is moving the page (for overwrite mode)
      
      cmpxchg is used to update the pointer.
      
      When the writer wraps the buffer and the tail meets the head,
      in overwrite mode, the writer must move the head page forward.
      It first uses cmpxchg to change the pointer flag from 1 to 2.
      Once this is done, the reader on another CPU will not take the
      page from the buffer.
      
      The writers need to protect against interrupts (we don't bother with
      disabling interrupts because NMIs are allowed to write too).
      
      After the writer sets the pointer flag to 2, it takes care to
      manage interrupts coming in. This is discribed in detail within the
      comments of the code.
      
       Changes in version 2:
        - Let reader reset entries value of header page.
        - Fix tail page passing commit page on reader page test.
        - Always increment entries and write counter in rb_tail_page_update
        - Add safety check in rb_set_commit_to_write to break out of infinite loop
        - add mask in rb_is_reader_page
      
      [ Impact: lock free writing to the ring buffer ]
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      77ae365e
  19. 15 6月, 2009 1 次提交
  20. 09 6月, 2009 1 次提交
    • P
      ring-buffer: pass in lockdep class key for reader_lock · 1f8a6a10
      Peter Zijlstra 提交于
      On Sun, 7 Jun 2009, Ingo Molnar wrote:
      > Testing tracer sched_switch: <6>Starting ring buffer hammer
      > PASSED
      > Testing tracer sysprof: PASSED
      > Testing tracer function: PASSED
      > Testing tracer irqsoff:
      > =============================================
      > PASSED
      > Testing tracer preemptoff: PASSED
      > Testing tracer preemptirqsoff: [ INFO: possible recursive locking detected ]
      > PASSED
      > Testing tracer branch: 2.6.30-rc8-tip-01972-ge5b9078-dirty #5760
      > ---------------------------------------------
      > rb_consumer/431 is trying to acquire lock:
      >  (&cpu_buffer->reader_lock){......}, at: [<c109eef7>] ring_buffer_reset_cpu+0x37/0x70
      >
      > but task is already holding lock:
      >  (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
      >
      > other info that might help us debug this:
      > 1 lock held by rb_consumer/431:
      >  #0:  (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
      
      The ring buffer is a generic structure, and can be used outside of
      ftrace. If ftrace traces within the use of the ring buffer, it can produce
      false positives with lockdep.
      
      This patch passes in a static lock key into the allocation of the ring
      buffer, so that different ring buffers will have their own lock class.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1244477919.13761.9042.camel@twins>
      
      [ store key in ring buffer descriptor ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      1f8a6a10
  21. 06 5月, 2009 1 次提交
    • S
      ring-buffer: add counters for commit overrun and nmi dropped entries · f0d2c681
      Steven Rostedt 提交于
      The WARN_ON in the ring buffer when a commit is preempted and the
      buffer is filled by preceding writes can happen in normal operations.
      The WARN_ON makes it look like a bug, not to mention, because
      it does not stop tracing and calls printk which can also recurse, this
      is prone to deadlock (the WARN_ON is not in a position to recurse).
      
      This patch removes the WARN_ON and replaces it with a counter that
      can be retrieved by a tracer. This counter is called commit_overrun.
      
      While at it, I added a nmi_dropped counter to count any time an NMI entry
      is dropped because the NMI could not take the spinlock.
      
      [ Impact: prevent deadlock by printing normal case warning ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f0d2c681
  22. 24 4月, 2009 1 次提交
  23. 17 4月, 2009 1 次提交
    • S
      tracing/events/ring-buffer: expose format of ring buffer headers to users · d1b182a8
      Steven Rostedt 提交于
      Currently, every thing needed to read the binary output from the
      ring buffers is available, with the exception of the way the ring
      buffers handles itself internally.
      
      This patch creates two special files in the debugfs/tracing/events
      directory:
      
       # cat /debug/tracing/events/header_page
              field: u64 timestamp;   offset:0;       size:8;
              field: local_t commit;  offset:8;       size:8;
              field: char data;       offset:16;      size:4080;
      
       # cat /debug/tracing/events/header_event
              type        :    2 bits
              len         :    3 bits
              time_delta  :   27 bits
              array       :   32 bits
      
              padding     : type == 0
              time_extend : type == 1
              data        : type == 3
      
      This is to allow a userspace app to see if the ring buffer format changes
      or not.
      
      [ Impact: allow userspace apps to know of ringbuffer format changes ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      d1b182a8
  24. 14 4月, 2009 1 次提交
    • S
      ring-buffer: add ring_buffer_discard_commit · fa1b47dd
      Steven Rostedt 提交于
      The ring_buffer_discard_commit is similar to ring_buffer_event_discard
      but it can only be done on an event that has yet to be commited.
      Unpredictable results can happen otherwise.
      
      The main difference between ring_buffer_discard_commit and
      ring_buffer_event_discard is that ring_buffer_discard_commit will try
      to free the data in the ring buffer if nothing has addded data
      after the reserved event. If something did, then it acts almost the
      same as ring_buffer_event_discard followed by a
      ring_buffer_unlock_commit.
      
      Note, either ring_buffer_commit_discard and ring_buffer_unlock_commit
      can be called on an event, not both.
      
      This commit also exports both discard functions to be usable by
      GPL modules.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fa1b47dd
  25. 23 3月, 2009 1 次提交
  26. 18 3月, 2009 1 次提交
  27. 05 3月, 2009 1 次提交
    • S
      tracing: add tracing_on/tracing_off to kernel.h · 2002c258
      Steven Rostedt 提交于
      Impact: cleanup
      
      The functions tracing_start/tracing_stop have been moved to kernel.h.
      These are not the functions a developer most likely wants to use
      when they want to insert a place to stop tracing and restart it from
      user space.
      
      tracing_start/tracing_stop was created to work with things like
      suspend to ram, where even calling smp_processor_id() can crash the
      system. The tracing_start/tracing_stop was used to stop the tracer from
      doing anything. These are still light weight functions, but add a bit
      more overhead to be able to stop the tracers. They also have no interface
      back to userland. That is, if the kernel calls tracing_stop, userland
      can not start tracing.
      
      What a developer most likely wants to use is tracing_on/tracing_off.
      These are very light weight functions (simply sets or clears a bit).
      These functions just stop recording into the ring buffer. The tracers
      don't even know that this happens except that they would receive NULL
      from the ring_buffer_lock_reserve function.
      
      Also, there's a way for the user land to enable or disable this bit.
      In debugfs/tracing/tracing_on, a user may echo "0" (same as tracing_off())
      or echo "1" (same as tracing_on()) into this file. This becomes handy when
      a kernel developer is debugging and wants tracing to turn off when it
      hits an anomaly. Then the developer can examine the trace, and restart
      tracing if they want to try again (echo 1 > tracing_on).
      
      This patch moves the prototypes for tracing_on/tracing_off to kernel.h
      and comments their use, so that a kernel developer will know how
      to use them.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      2002c258
  28. 04 3月, 2009 1 次提交
  29. 17 2月, 2009 1 次提交
  30. 11 2月, 2009 1 次提交
  31. 08 2月, 2009 1 次提交
  32. 06 2月, 2009 1 次提交
    • A
      ring_buffer: remove unused flags parameter · 0a987751
      Arnaldo Carvalho de Melo 提交于
      Impact: API change, cleanup
      
      >From ring_buffer_{lock_reserve,unlock_commit}.
      
      $ codiff /tmp/vmlinux.before /tmp/vmlinux.after
      linux-2.6-tip/kernel/trace/trace.c:
        trace_vprintk              |  -14
        trace_graph_return         |  -14
        trace_graph_entry          |  -10
        trace_function             |   -8
        __ftrace_trace_stack       |   -8
        ftrace_trace_userstack     |   -8
        tracing_sched_switch_trace |   -8
        ftrace_trace_special       |  -12
        tracing_sched_wakeup_trace |   -8
       9 functions changed, 90 bytes removed, diff: -90
      
      linux-2.6-tip/block/blktrace.c:
        __blk_add_trace |   -1
       1 function changed, 1 bytes removed, diff: -1
      
      /tmp/vmlinux.after:
       10 functions changed, 91 bytes removed, diff: -91
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NFrédéric Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0a987751
  33. 10 12月, 2008 1 次提交
  34. 08 12月, 2008 1 次提交
  35. 03 12月, 2008 1 次提交
    • S
      ring-buffer: read page interface · 8789a9e7
      Steven Rostedt 提交于
      Impact: new API to ring buffer
      
      This patch adds a new interface into the ring buffer that allows a
      page to be read from the ring buffer on a given CPU. For every page
      read, one must also be given to allow for a "swap" of the pages.
      
       rpage = ring_buffer_alloc_read_page(buffer);
       if (!rpage)
      	goto err;
       ret = ring_buffer_read_page(buffer, &rpage, cpu, full);
       if (!ret)
      	goto empty;
       process_page(rpage);
       ring_buffer_free_read_page(rpage);
      
      The caller of these functions must handle any waits that are
      needed to wait for new data. The ring_buffer_read_page will simply
      return 0 if there is no data, or if "full" is set and the writer
      is still on the current page.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8789a9e7
  36. 23 11月, 2008 1 次提交
    • S
      ring-buffer: add tracing_off_permanent · 033601a3
      Steven Rostedt 提交于
      Impact: feature to permanently disable ring buffer
      
      This patch adds a API to the ring buffer code that will permanently
      disable the ring buffer from ever recording. This should only be
      called when some serious anomaly is detected, and the system
      may be in an unstable state. When that happens, shutting down the
      recording to the ring buffers may be appropriate.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      033601a3
  37. 12 11月, 2008 1 次提交
    • S
      ring-buffer: buffer record on/off switch · a3583244
      Steven Rostedt 提交于
      Impact: enable/disable ring buffer recording API added
      
      Several kernel developers have requested that there be a way to stop
      recording into the ring buffers with a simple switch that can also
      be enabled from userspace. This patch addes a new kernel API to the
      ring buffers called:
      
       tracing_on()
       tracing_off()
      
      When tracing_off() is called, all ring buffers will not be able to record
      into their buffers.
      
      tracing_on() will enable the ring buffers again.
      
      These two act like an on/off switch. That is, there is no counting of the
      number of times tracing_off or tracing_on has been called.
      
      A new file is added to the debugfs/tracing directory called
      
        tracing_on
      
      This allows for userspace applications to also flip the switch.
      
        echo 0 > debugfs/tracing/tracing_on
      
      disables the tracing.
      
        echo 1 > /debugfs/tracing/tracing_on
      
      enables it.
      
      Note, this does not disable or enable any tracers. It only sets or clears
      a flag that needs to be set in order for the ring buffers to write to
      their buffers. It is a global flag, and affects all ring buffers.
      
      The buffers start out with tracing_on enabled.
      
      There are now three flags that control recording into the buffers:
      
       tracing_on: which affects all ring buffer tracers.
      
       buffer->record_disabled: which affects an allocated buffer, which may be set
           if an anomaly is detected, and tracing is disabled.
      
       cpu_buffer->record_disabled: which is set by tracing_stop() or if an
           anomaly is detected. tracing_start can not reenable this if
           an anomaly occurred.
      
      The userspace debugfs/tracing/tracing_enabled is implemented with
      tracing_stop() but the user space code can not enable it if the kernel
      called tracing_stop().
      
      Userspace can enable the tracing_on even if the kernel disabled it.
      It is just a switch used to stop tracing if a condition was hit.
      tracing_on is not for protecting critical areas in the kernel nor is
      it for stopping tracing if an anomaly occurred. This is because userspace
      can reenable it at any time.
      
      Side effect: With this patch, I discovered a dead variable in ftrace.c
        called tracing_on. This patch removes it.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      a3583244
  38. 14 10月, 2008 2 次提交
    • S
      ring_buffer: implement new locking · d769041f
      Steven Rostedt 提交于
      The old "lock always" scheme had issues with lockdep, and was not very
      efficient anyways.
      
      This patch does a new design to be partially lockless on writes.
      Writes will add new entries to the per cpu pages by simply disabling
      interrupts. When a write needs to go to another page than it will
      grab the lock.
      
      A new "read page" has been added so that the reader can pull out a page
      from the ring buffer to read without worrying about the writer writing over
      it. This allows us to not take the lock for all reads. The lock is
      now only taken when a read needs to go to a new page.
      
      This is far from lockless, and interrupts still need to be disabled,
      but it is a step towards a more lockless solution, and it also
      solves a lot of the issues that were noticed by the first conversion
      of ftrace to the ring buffers.
      
      Note: the ring_buffer_{un}lock API has been removed.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d769041f
    • S
      tracing: unified trace buffer · 7a8e76a3
      Steven Rostedt 提交于
      This is a unified tracing buffer that implements a ring buffer that
      hopefully everyone will eventually be able to use.
      
      The events recorded into the buffer have the following structure:
      
        struct ring_buffer_event {
      	u32 type:2, len:3, time_delta:27;
      	u32 array[];
        };
      
      The minimum size of an event is 8 bytes. All events are 4 byte
      aligned inside the buffer.
      
      There are 4 types (all internal use for the ring buffer, only
      the data type is exported to the interface users).
      
       RINGBUF_TYPE_PADDING: this type is used to note extra space at the end
      	of a buffer page.
      
       RINGBUF_TYPE_TIME_EXTENT: This type is used when the time between events
      	is greater than the 27 bit delta can hold. We add another
      	32 bits, and record that in its own event (8 byte size).
      
       RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to
      	help keep the buffer timestamps in sync.
      
      RINGBUF_TYPE_DATA: The event actually holds user data.
      
      The "len" field is only three bits. Since the data must be
      4 byte aligned, this field is shifted left by 2, giving a
      max length of 28 bytes. If the data load is greater than 28
      bytes, the first array field holds the full length of the
      data load and the len field is set to zero.
      
      Example, data size of 7 bytes:
      
      	type = RINGBUF_TYPE_DATA
      	len = 2
      	time_delta: <time-stamp> - <prev_event-time-stamp>
      	array[0..1]: <7 bytes of data> <1 byte empty>
      
      This event is saved in 12 bytes of the buffer.
      
      An event with 82 bytes of data:
      
      	type = RINGBUF_TYPE_DATA
      	len = 0
      	time_delta: <time-stamp> - <prev_event-time-stamp>
      	array[0]: 84 (Note the alignment)
      	array[1..14]: <82 bytes of data> <2 bytes empty>
      
      The above event is saved in 92 bytes (if my math is correct).
      82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length.
      
      Do not reference the above event struct directly. Use the following
      functions to gain access to the event table, since the
      ring_buffer_event structure may change in the future.
      
      ring_buffer_event_length(event): get the length of the event.
      	This is the size of the memory used to record this
      	event, and not the size of the data pay load.
      
      ring_buffer_time_delta(event): get the time delta of the event
      	This returns the delta time stamp since the last event.
      	Note: Even though this is in the header, there should
      		be no reason to access this directly, accept
      		for debugging.
      
      ring_buffer_event_data(event): get the data from the event
      	This is the function to use to get the actual data
      	from the event. Note, it is only a pointer to the
      	data inside the buffer. This data must be copied to
      	another location otherwise you risk it being written
      	over in the buffer.
      
      ring_buffer_lock: A way to lock the entire buffer.
      ring_buffer_unlock: unlock the buffer.
      
      ring_buffer_alloc: create a new ring buffer. Can choose between
      	overwrite or consumer/producer mode. Overwrite will
      	overwrite old data, where as consumer producer will
      	throw away new data if the consumer catches up with the
      	producer.  The consumer/producer is the default.
      
      ring_buffer_free: free the ring buffer.
      
      ring_buffer_resize: resize the buffer. Changes the size of each cpu
      	buffer. Note, it is up to the caller to provide that
      	the buffer is not being used while this is happening.
      	This requirement may go away but do not count on it.
      
      ring_buffer_lock_reserve: locks the ring buffer and allocates an
      	entry on the buffer to write to.
      ring_buffer_unlock_commit: unlocks the ring buffer and commits it to
      	the buffer.
      
      ring_buffer_write: writes some data into the ring buffer.
      
      ring_buffer_peek: Look at a next item in the cpu buffer.
      ring_buffer_consume: get the next item in the cpu buffer and
      	consume it. That is, this function increments the head
      	pointer.
      
      ring_buffer_read_start: Start an iterator of a cpu buffer.
      	For now, this disables the cpu buffer, until you issue
      	a finish. This is just because we do not want the iterator
      	to be overwritten. This restriction may change in the future.
      	But note, this is used for static reading of a buffer which
      	is usually done "after" a trace. Live readings would want
      	to use the ring_buffer_consume above, which will not
      	disable the ring buffer.
      
      ring_buffer_read_finish: Finishes the read iterator and reenables
      	the ring buffer.
      
      ring_buffer_iter_peek: Look at the next item in the cpu iterator.
      ring_buffer_read: Read the iterator and increment it.
      ring_buffer_iter_reset: Reset the iterator to point to the beginning
      	of the cpu buffer.
      ring_buffer_iter_empty: Returns true if the iterator is at the end
      	of the cpu buffer.
      
      ring_buffer_size: returns the size in bytes of each cpu buffer.
      	Note, the real size is this times the number of CPUs.
      
      ring_buffer_reset_cpu: Sets the cpu buffer to empty
      ring_buffer_reset: sets all cpu buffers to empty
      
      ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a
      	cpu buffer of another buffer. This is handy when you
      	want to take a snap shot of a running trace on just one
      	cpu. Having a backup buffer, to swap with facilitates this.
      	Ftrace max latencies use this.
      
      ring_buffer_empty: Returns true if the ring buffer is empty.
      ring_buffer_empty_cpu: Returns true if the cpu buffer is empty.
      
      ring_buffer_record_disable: disable all cpu buffers (read only)
      ring_buffer_record_disable_cpu: disable a single cpu buffer (read only)
      ring_buffer_record_enable: enable all cpu buffers.
      ring_buffer_record_enabl_cpu: enable a single cpu buffer.
      
      ring_buffer_entries: The number of entries in a ring buffer.
      ring_buffer_overruns: The number of entries removed due to writing wrap.
      
      ring_buffer_time_stamp: Get the time stamp used by the ring buffer
      ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp
      	into nanosecs.
      
      I still need to implement the GTOD feature. But we need support from
      the cpu frequency infrastructure.  But this can be done at a later
      time without affecting the ring buffer interface.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7a8e76a3