1. 11 Feb, 2009 1 commit
    • ring_buffer: pahole struct ring_buffer · 00f62f61
      Arnaldo Carvalho de Melo committed
      While fixing some bugs in pahole (built-in.o files were not being
      processed due to relocation problems) I found out about these packable
      structures:
      
      $ pahole --packable kernel/trace/ring_buffer.o  | grep ring
      ring_buffer	72	64	8
      ring_buffer_per_cpu	112	104	8
      
      If we take a look at the current layout of struct ring_buffer we can see
      that we have two 4-byte holes.
      
      $ pahole -C ring_buffer kernel/trace/ring_buffer.o
      struct ring_buffer {
      	unsigned int               pages;           /*     0     4 */
      	unsigned int               flags;           /*     4     4 */
      	int                        cpus;            /*     8     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	cpumask_var_t              cpumask;         /*    16     8 */
      	atomic_t                   record_disabled; /*    24     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct mutex               mutex;           /*    32    32 */
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	struct ring_buffer_per_cpu * * buffers;     /*    64     8 */
      
      	/* size: 72, cachelines: 2, members: 7 */
      	/* sum members: 64, holes: 2, sum holes: 8 */
      	/* last cacheline: 8 bytes */
      };
      
      So, if I ask pahole to reorganize it:
      
      $ pahole -C ring_buffer --reorganize kernel/trace/ring_buffer.o
      
      struct ring_buffer {
      	unsigned int               pages;           /*     0     4 */
      	unsigned int               flags;           /*     4     4 */
      	int                        cpus;            /*     8     4 */
      	atomic_t                   record_disabled; /*    12     4 */
      	cpumask_var_t              cpumask;         /*    16     8 */
      	struct mutex               mutex;           /*    24    32 */
      	struct ring_buffer_per_cpu * * buffers;     /*    56     8 */
      	/* --- cacheline 1 boundary (64 bytes) --- */
      
      	/* size: 64, cachelines: 1, members: 7 */
      };   /* saved 8 bytes and 1 cacheline! */
      
      The struct now fits in a single 64-byte cacheline.
      
      To see what it did:
      
      $ pahole -C ring_buffer --reorganize --show_reorg_steps \
      	kernel/trace/ring_buffer.o | grep \/
      /* Moving 'record_disabled' from after 'cpumask' to after 'cpus' */
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
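The effect of the reordering can be reproduced in user space. The following is a minimal sketch, not the kernel's definition: `struct fake_mutex`, `struct rb_before`, `struct rb_after` and the helper functions are hypothetical stand-ins, and the sizes in the comments assume an LP64 target (8-byte pointers, 8-byte alignment for `uint64_t`).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mutex: 32 bytes, 8-byte aligned on LP64. */
struct fake_mutex {
	uint64_t words[4];
};

/* Original member order: two 4-byte holes on LP64. */
struct rb_before {
	unsigned int      pages;
	unsigned int      flags;
	int               cpus;
	/* 4-byte hole: the pointer below needs 8-byte alignment */
	void             *cpumask;          /* stand-in for cpumask_var_t */
	int               record_disabled;  /* stand-in for atomic_t */
	/* 4-byte hole: the mutex stand-in needs 8-byte alignment */
	struct fake_mutex mutex;
	void            **buffers;
};

/* Reorganized order: record_disabled moved to directly after cpus. */
struct rb_after {
	unsigned int      pages;
	unsigned int      flags;
	int               cpus;
	int               record_disabled;
	void             *cpumask;
	struct fake_mutex mutex;
	void            **buffers;
};

size_t rb_before_size(void)
{
	return sizeof(struct rb_before);
}

size_t rb_after_size(void)
{
	return sizeof(struct rb_after);
}

size_t rb_after_record_disabled_offset(void)
{
	return offsetof(struct rb_after, record_disabled);
}
```

Under the LP64 assumption the two sizes match the 72 and 64 bytes pahole reports above; on other ABIs the numbers may differ.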
  2. 08 Feb, 2009 2 commits
    • ring-buffer: use generic version of in_nmi · a81bd80a
      Steven Rostedt committed
      Impact: clean up
      
      Now that a generic in_nmi is available, this patch removes the
      special code in the ring_buffer and implements the in_nmi generic
      version instead.
      
      With this change, I was also able to rename the "arch_ftrace_nmi_enter"
      back to "ftrace_nmi_enter" and remove the code from the ring buffer.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ring-buffer: add NMI protection for spinlocks · 78d904b4
      Steven Rostedt committed
      Impact: prevent deadlock in NMI
      
      The ring buffers are not yet totally lockless with writing to
      the buffer. When a writer crosses a page, it grabs a per cpu spinlock
      to protect against a reader. The spinlocks taken by a writer are not
      to protect against other writers, since a writer can only write to
      its own per cpu buffer. The spinlocks protect against readers that
      can touch any cpu buffer. The writers are made to be reentrant
      with the spinlocks disabling interrupts.
      
      The problem arises when an NMI writes to the buffer, and that write
      crosses a page boundary. If it grabs a spinlock, it can be racing
      with another writer (since disabling interrupts does not protect
      against NMIs) or with a reader on the same CPU. Luckily, most of the
      users are not reentrant, which protects against this issue. But if a
      user of the ring buffer becomes reentrant (which the ring buffers do
      allow) and the NMI also writes to the ring buffer, then we risk a
      deadlock.
      
      This patch moves the ftrace_nmi_enter called by nmi_enter() to the
      ring buffer code. It replaces the current ftrace_nmi_enter that is
      used by arch specific code to arch_ftrace_nmi_enter and updates
      the Kconfig to handle it.
      
      When an NMI is called, it will set a per cpu variable in the ring buffer
      code and will clear it when the NMI exits. If a write to the ring buffer
      crosses page boundaries inside an NMI, a trylock is used on the spin
      lock instead. If the spinlock fails to be acquired, then the entry
      is discarded.
      
      This bug appeared in the ftrace work in the RT tree, where event tracing
      is reentrant. This workaround solved the deadlocks that appeared there.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
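The trylock-in-NMI idea described above can be sketched in user-space C. This is an illustration only, not the kernel code: the lock is a C11 `atomic_flag`, the per-CPU NMI marker is a plain global, and every name ending in `_sketch` is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: one spinlock and one "in NMI" marker.
 * In the kernel the marker would be a per-CPU variable. */
static atomic_flag page_lock_sketch = ATOMIC_FLAG_INIT;
static bool in_nmi_sketch;

void nmi_enter_sketch(void) { in_nmi_sketch = true; }
void nmi_exit_sketch(void)  { in_nmi_sketch = false; }

/* Returns true if the lock was acquired. */
bool page_trylock_sketch(void)
{
	return !atomic_flag_test_and_set_explicit(&page_lock_sketch,
						  memory_order_acquire);
}

void page_unlock_sketch(void)
{
	atomic_flag_clear_explicit(&page_lock_sketch, memory_order_release);
}

/*
 * Called when a write crosses a page boundary.
 * Returns true with the lock held if the write may proceed,
 * or false if the entry has to be discarded.
 */
bool lock_for_page_cross_sketch(void)
{
	if (in_nmi_sketch)
		return page_trylock_sketch();   /* never spin inside an NMI */

	while (!page_trylock_sketch())
		;                               /* normal context: spin */
	return true;
}
```

Spinning inside the NMI would never finish if the interrupted context on the same CPU already holds the lock, which is why the NMI path gives up and drops the entry instead.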
  3. 06 Feb, 2009 1 commit
    • ring_buffer: remove unused flags parameter · 0a987751
      Arnaldo Carvalho de Melo committed
      Impact: API change, cleanup
      
      From ring_buffer_{lock_reserve,unlock_commit}.
      
      $ codiff /tmp/vmlinux.before /tmp/vmlinux.after
      linux-2.6-tip/kernel/trace/trace.c:
        trace_vprintk              |  -14
        trace_graph_return         |  -14
        trace_graph_entry          |  -10
        trace_function             |   -8
        __ftrace_trace_stack       |   -8
        ftrace_trace_userstack     |   -8
        tracing_sched_switch_trace |   -8
        ftrace_trace_special       |  -12
        tracing_sched_wakeup_trace |   -8
       9 functions changed, 90 bytes removed, diff: -90
      
      linux-2.6-tip/block/blktrace.c:
        __blk_add_trace |   -1
       1 function changed, 1 bytes removed, diff: -1
      
      /tmp/vmlinux.after:
       10 functions changed, 91 bytes removed, diff: -91
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Frédéric Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 22 Jan, 2009 3 commits
  5. 21 Jan, 2009 1 commit
    • ring_buffer: reset write when reserve buffer fail · 551b4048
      Lai Jiangshan committed
      Impact: reset struct buffer_page.write when interrupt storm
      
      If struct buffer_page.write is not reset, any subsequent commit
      will corrupt the ring_buffer:
      
      static inline void
      rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
      {
      	......
      		cpu_buffer->commit_page->commit =
      			cpu_buffer->commit_page->write;
      	......
      }
      
      When "if (RB_WARN_ON(cpu_buffer, next_page == reader_page))" triggers, the
      ring_buffer is disabled, but some reserved buffers may not have been
      committed, so we need to reset struct buffer_page.write.
      
      When "if (unlikely(next_page == cpu_buffer->commit_page))" triggers, the
      ring_buffer is still available, and we should not corrupt it.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 20 Jan, 2009 1 commit
    • ring-buffer: fix alignment problem · 082605de
      Steven Rostedt committed
      Impact: fix to allow some archs to use the ring buffer
      
      Commits in the ring buffer are checked by pointer arithmetic.
      If the calculation is incorrect, then the commits will never take
      place and the buffer will simply fill up and report an error.
      
      Each page in the ring buffer has a small header:
      
      struct buffer_data_page {
      	u64		time_stamp;
      	local_t		commit;
      	unsigned char	data[];
      };
      
      Unfortunately, some of the calculations used sizeof(struct buffer_data_page)
      to know the size of the header. But this is incorrect on some archs,
      where sizeof(struct buffer_data_page) does not equal
      offsetof(struct buffer_data_page, data), and on those archs, the commits
      are never processed.
      
      This patch replaces the sizeof with offsetof.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
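The difference between sizeof and offsetof for a header that ends in a flexible array member can be shown with a small sketch. `struct page_header_sketch` is a hypothetical stand-in: its `commit` field is a fixed 4-byte integer to mimic an arch where `local_t` is 32 bits wide, which is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct buffer_data_page on an arch
 * where local_t is 4 bytes wide. */
struct page_header_sketch {
	uint64_t      time_stamp;
	int32_t       commit;
	unsigned char data[];   /* payload starts right after commit */
};

/* May include trailing padding added to satisfy the struct's alignment. */
size_t header_size_by_sizeof(void)
{
	return sizeof(struct page_header_sketch);
}

/* The real distance from the start of the page to the first data byte. */
size_t header_size_by_offsetof(void)
{
	return offsetof(struct page_header_sketch, data);
}
```

Where `uint64_t` is 8-byte aligned, sizeof typically rounds the struct up to 16 bytes while `data` starts at offset 12, so pointer arithmetic based on sizeof lands 4 bytes past the true start of the data.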
  7. 15 Jan, 2009 1 commit
    • ring_buffer: reset write when reserve buffer fail · 6f3b3440
      Lai Jiangshan committed
      Impact: reset struct buffer_page.write when interrupt storm
      
      If struct buffer_page.write is not reset, any subsequent commit
      will corrupt the ring_buffer:
      
      static inline void
      rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
      {
      	......
      		cpu_buffer->commit_page->commit =
      			cpu_buffer->commit_page->write;
      	......
      }
      
      When "if (RB_WARN_ON(cpu_buffer, next_page == reader_page))" triggers, the
      ring_buffer is disabled, but some reserved buffers may not have been
      committed, so we need to reset struct buffer_page.write.
      
      When "if (unlikely(next_page == cpu_buffer->commit_page))" triggers, the
      ring_buffer is still available, and we should not corrupt it.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  8. 11 Jan, 2009 2 commits
  9. 08 Jan, 2009 1 commit
    • ring_buffer: fix ring_buffer_event_length() · 465634ad
      Robert Richter committed
      Function ring_buffer_event_length() provides an interface to detect
      the length of data stored in an entry. However, the length contains
      offsets depending on the internal usage. This makes it unusable. This
      patch fixes this and now ring_buffer_event_length() returns the
      aligned length that has been used in ring_buffer_lock_reserve().
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
  10. 01 Jan, 2009 1 commit
  11. 24 Dec, 2008 2 commits
  12. 18 Dec, 2008 1 commit
  13. 17 Dec, 2008 1 commit
  14. 12 Dec, 2008 1 commit
  15. 10 Dec, 2008 1 commit
  16. 03 Dec, 2008 3 commits
    • ring-buffer: change "page" variable names to "bpage" · 044fa782
      Steven Rostedt committed
      Impact: clean up
      
      Andrew Morton pointed out that the kernel convention of a variable
      named page should be of type struct page. The ring buffer uses
      a variable named "page" for a pointer to something else.
      
      This patch converts those to be called "bpage" (as in "buffer page").
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ring-buffer: read page interface · 8789a9e7
      Steven Rostedt committed
      Impact: new API to ring buffer
      
      This patch adds a new interface into the ring buffer that allows a
      page to be read from the ring buffer on a given CPU. For every page
      read, one must also be given to allow for a "swap" of the pages.
      
       rpage = ring_buffer_alloc_read_page(buffer);
       if (!rpage)
      	goto err;
       ret = ring_buffer_read_page(buffer, &rpage, cpu, full);
       if (!ret)
      	goto empty;
       process_page(rpage);
       ring_buffer_free_read_page(rpage);
      
      The caller of these functions must handle any waits that are
      needed to wait for new data. The ring_buffer_read_page will simply
      return 0 if there is no data, or if "full" is set and the writer
      is still on the current page.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ring-buffer: move some metadata into buffer page · abc9b56d
      Steven Rostedt committed
      Impact: get ready for splice changes
      
      This patch moves the commit and timestamp into the beginning of each
      data page of the buffer. This change will allow the page to be moved
      to another location (disk, network, etc) and still have information
      in the page to be able to read it.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  17. 27 Nov, 2008 1 commit
  18. 23 Nov, 2008 1 commit
    • ring-buffer: add tracing_off_permanent · 033601a3
      Steven Rostedt committed
      Impact: feature to permanently disable ring buffer
      
      This patch adds an API to the ring buffer code that will permanently
      disable the ring buffer from ever recording. This should only be
      called when some serious anomaly is detected, and the system
      may be in an unstable state. When that happens, shutting down the
      recording to the ring buffers may be appropriate.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  19. 19 Nov, 2008 1 commit
  20. 13 Nov, 2008 1 commit
  21. 12 Nov, 2008 5 commits
    • ring-buffer: fix deadlock from reader_lock in read_start · 642edba5
      Steven Rostedt committed
      Impact: deadlock fix in ring_buffer_read_start
      
      The ring_buffer_iter_reset was called from ring_buffer_read_start
      where both grabbed the reader_lock.
      
      This patch separates out the internals of ring_buffer_iter_reset
      to its own function so that both APIs may grab the reader_lock.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ring-buffer: no preempt for sched_clock() · 47e74f2b
      Steven Rostedt committed
      Impact: disable preemption when calling sched_clock()
      
      The ring_buffer_time_stamp still uses sched_clock as its counter.
      But it is a bug to call it with preemption enabled. This requirement
      should not be pushed to the ring_buffer_time_stamp callers, so
      the ring_buffer_time_stamp needs to disable preemption when calling
      sched_clock.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ring-buffer: clean up warn ons · 3e89c7bb
      Steven Rostedt committed
      Impact: Restructure WARN_ONs in ring_buffer.c
      
      The current WARN_ON macros in ring_buffer.c are quite ugly.
      
      This patch cleans them up and uses a single RB_WARN_ON that returns
      the value of the condition. This allows the caller to abort the
      function if the condition is true.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
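A warn-on helper that returns the value of its condition lets callers test and bail out in one statement. The sketch below is hypothetical and is not the kernel's RB_WARN_ON: the struct, the helper and the macro name are invented for illustration, and disabling the buffer on a hit is an assumption taken from the earlier "replace most bug ons with warn on and disable buffer" commit.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the ring buffer state. */
struct rb_sketch {
	bool disabled;
};

/* Warn, disable the buffer, and hand the condition back to the caller. */
bool rb_warn_on_sketch(struct rb_sketch *rb, bool cond, const char *expr)
{
	if (cond) {
		rb->disabled = true;
		fprintf(stderr, "ring buffer warning: %s\n", expr);
	}
	return cond;
}

#define RB_WARN_ON_SKETCH(rb, cond) \
	rb_warn_on_sketch((rb), (cond) ? true : false, #cond)

/* Example caller: abort the function when the sanity check fails. */
int advance_sketch(struct rb_sketch *rb, int head, int size)
{
	if (RB_WARN_ON_SKETCH(rb, head >= size))
		return -1;
	return head + 1;
}
```

Because the macro evaluates to the condition, the check and the early return read as a single `if`, with no separate warning call.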
    • ring-buffer: buffer record on/off switch · a3583244
      Steven Rostedt committed
      Impact: enable/disable ring buffer recording API added
      
      Several kernel developers have requested that there be a way to stop
      recording into the ring buffers with a simple switch that can also
      be enabled from userspace. This patch adds a new kernel API to the
      ring buffers called:
      
       tracing_on()
       tracing_off()
      
      When tracing_off() is called, all ring buffers will not be able to record
      into their buffers.
      
      tracing_on() will enable the ring buffers again.
      
      These two act like an on/off switch. That is, there is no counting of the
      number of times tracing_off or tracing_on has been called.
      
      A new file is added to the debugfs/tracing directory called
      
        tracing_on
      
      This allows for userspace applications to also flip the switch.
      
        echo 0 > /debugfs/tracing/tracing_on
      
      disables the tracing.
      
        echo 1 > /debugfs/tracing/tracing_on
      
      enables it.
      
      Note, this does not disable or enable any tracers. It only sets or clears
      a flag that needs to be set in order for the ring buffers to write to
      their buffers. It is a global flag, and affects all ring buffers.
      
      The buffers start out with tracing_on enabled.
      
      There are now three flags that control recording into the buffers:
      
       tracing_on: which affects all ring buffer tracers.
      
       buffer->record_disabled: which affects an allocated buffer, which may be set
           if an anomaly is detected, and tracing is disabled.
      
       cpu_buffer->record_disabled: which is set by tracing_stop() or if an
           anomaly is detected. tracing_start can not reenable this if
           an anomaly occurred.
      
      The userspace debugfs/tracing/tracing_enabled is implemented with
      tracing_stop() but the user space code can not enable it if the kernel
      called tracing_stop().
      
      Userspace can enable the tracing_on even if the kernel disabled it.
      It is just a switch used to stop tracing if a condition was hit.
      tracing_on is not for protecting critical areas in the kernel nor is
      it for stopping tracing if an anomaly occurred. This is because userspace
      can reenable it at any time.
      
      Side effect: With this patch, I discovered a dead variable in ftrace.c
        called tracing_on. This patch removes it.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
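The non-counting on/off behaviour described above can be sketched with a single global flag. This is a user-space illustration, not the kernel implementation; all names ending in `_sketch` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global switch; recording starts out enabled. */
static atomic_bool recording_on_sketch = true;

void tracing_on_sketch(void)
{
	atomic_store(&recording_on_sketch, true);
}

void tracing_off_sketch(void)
{
	atomic_store(&recording_on_sketch, false);
}

/* A writer checks the flag before reserving space; false means "drop". */
bool may_record_sketch(void)
{
	return atomic_load(&recording_on_sketch);
}
```

Because the flag is a plain boolean and not a counter, two calls to the off function followed by one call to the on function leave recording enabled.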
    • ring-buffer: add reader lock · f83c9d0f
      Steven Rostedt committed
      Impact: serialize reader accesses to individual CPU ring buffers
      
      The code in the ring buffer expects only one reader at a time, but currently
      it puts that requirement on the caller. This is not strong enough, and this
      patch adds a "reader_lock" that serializes the access to the reader API
      of the ring buffer.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  22. 11 Nov, 2008 2 commits
    • ring-buffer: replace most bug ons with warn on and disable buffer · f536aafc
      Steven Rostedt committed
      This patch replaces most of the BUG_ONs in the ring_buffer code with
      RB_WARN_ON variants. It adds some more variants as needed for the
      replacement. This lets the buffer die nicely and still warn the user.
      
      One BUG_ON remains in the code, and that is because it detects a
      bad pointer passed in by the calling function, and not a bug by
      the ring buffer code itself.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ring-buffer: prevent infinite looping on time stamping · 4143c5cb
      Steven Rostedt committed
      Impact: removal of unnecessary looping
      
      The lockless part of the ring buffer allows for reentry into the code
      from interrupts. A timestamp is taken, a test is performed and if it
      detects that an interrupt occurred that did tracing, it tries again.
      
      The problem arises if the timestamp code itself causes a trace.
      The detection will detect this and loop again. The difference between
      this and an interrupt doing tracing, is that this will fail every time,
      and cause an infinite loop.
      
      Currently, we test if the loop happens 1000 times, and if so, it will
      produce a warning and disable the ring buffer.
      
      The problem with this approach is that it makes it difficult to perform
      some types of tracing (tracing the timestamp code itself).
      
      Each trace entry has a delta timestamp from the previous entry.
      If a trace entry is reserved but an interrupt occurs and traces before
      the previous entry is committed, the delta timestamp for that entry will
      be zero. This actually makes sense in terms of tracing, because the
      interrupt entry happened before the preempted entry was committed, so
      one may consider the two happening at the same time. The order is
      still preserved in the buffer.
      
      With this idea, instead of trying to get a new timestamp if an interrupt
      made it in between the timestamp and the test, the entry could simply
      make the delta zero and continue. This will prevent interrupts or
      tracers in the timer code from causing the above loop.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
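The delta rule described above can be written as a tiny function. This is a simplified sketch under stated assumptions, not the kernel logic: the function name and the `is_outermost_entry` parameter are hypothetical, and the caller is assumed to already know whether the entry was reserved while an earlier entry was still uncommitted.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns the delta to store in a new entry.
 * An entry reserved while an earlier one is still uncommitted
 * (for example, from an interrupt) gets a delta of zero and the
 * writer does not loop to fetch a fresh timestamp.
 */
uint64_t event_delta_sketch(uint64_t now, uint64_t last_stamp,
			    bool is_outermost_entry)
{
	if (!is_outermost_entry)
		return 0;
	return now - last_stamp;
}
```

With this rule there is nothing to retry, so a tracer hooked into the timestamp code itself can no longer cause the loop.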
  23. 06 Nov, 2008 1 commit
    • ring-buffer: convert to raw spinlocks · 3e03fb7f
      Steven Rostedt committed
      Impact: no lockdep debugging of ring buffer
      
      The problem with running lockdep on the ring buffer is that the
      ring buffer is the core infrastructure of ftrace. What happens is
      that the tracer will start tracing the lockdep code while lockdep
      is testing the ring buffer's locks.  This can cause lockdep to
      fail due to testing cases that have not fully finished their
      locking transition.
      
      This patch converts the spin locks used by the ring buffer back
      into raw spin locks which lockdep does not check.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  24. 04 Nov, 2008 1 commit
  25. 03 Nov, 2008 1 commit
    • tracing, ring-buffer: add paranoid checks for loops · 818e3dd3
      Steven Rostedt committed
      While writing a new tracer, I had a bug where I caused the ring-buffer
      to recurse in a bad way. The bug was with the tracer I was writing
      and not the ring-buffer itself. But it took a long time to find the
      problem.
      
      This patch adds paranoid checks into the ring-buffer infrastructure
      that will catch bugs of this nature.
      
      Note: I put the bug back in the tracer and this patch showed the error
            nicely and prevented the lockup.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  26. 27 Oct, 2008 1 commit
    • trace: fix printk warning for u64 · e2862c94
      Stephen Rothwell committed
      A powerpc ppc64_defconfig build produces these warnings:
      
      kernel/trace/ring_buffer.c: In function 'rb_add_time_stamp':
      kernel/trace/ring_buffer.c:969: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'u64'
      kernel/trace/ring_buffer.c:969: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u64'
      kernel/trace/ring_buffer.c:969: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'u64'
      
      Just cast the u64s to unsigned long long like we do everywhere else.
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
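The cast mentioned above looks like this in portable user-space C. The sketch uses `uint64_t` and `snprintf` in place of the kernel's `u64` and `printk`; `format_u64_sketch` is a hypothetical helper name.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * A 64-bit unsigned type may be a typedef for unsigned long on some
 * 64-bit targets, so passing it straight to "%llu" can trigger a
 * format warning. Casting to unsigned long long matches the format
 * on every target.
 */
int format_u64_sketch(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "%llu", (unsigned long long)value);
}
```

The cast never loses bits, since unsigned long long is at least 64 bits wide.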
  27. 22 Oct, 2008 1 commit
  28. 14 Oct, 2008 1 commit
    • ring-buffer: make reentrant · bf41a158
      Steven Rostedt committed
      This patch replaces the local_irq_save/restore with preempt_disable/
      enable. This allows for interrupts to enter while recording.
      To write to the ring buffer, you must reserve data, and then
      commit it. During this time, an interrupt may call a trace function
      that will also record into the buffer before the commit is made.
      
      The interrupt will reserve its entry after the first entry, even
      though the first entry did not finish yet.
      
      The time stamp delta of the interrupt entry will be zero, since
      in the view of the trace, the interrupt happened during the
      first field anyway.
      
      Locking still takes place when the tail/write moves from one page
      to the next. The reader always takes the locks.
      
      A new page pointer is added, called the commit. The write/tail will
      always point to the end of all entries. The commit field will
      point to the last committed entry. Only this commit entry may
      update the write time stamp.
      
      The reader can only go up to the commit. It cannot go past it.
      
      If a lot of interrupts come in during a commit that fills up the
      buffer, and it happens to make it all the way around the buffer
      back to the commit, then a warning is printed and new events will
      be dropped.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>