Commits · ff0ff84a0767df48d728c36510365344a7e7d582 · openeuler / raspberrypi-kernel

01 Apr, 2010 2 commits

ring-buffer: Add lost event count to end of sub buffer · ff0ff84a

Committed by Steven Rostedt on Mar 31, 2010

Currently, binary readers of the ring buffer only know where events were
lost, but not how many events were lost at that location.
This information is available, but it would require adding another
field to the sub buffer header to include it.

But when an event cannot fit at the end of a sub buffer, it is written
to the next sub buffer. This means there is a good chance that the
buffer may have room to hold this counter. If it does, write
the counter at the end of the sub buffer and set another flag
in the data size field that states that this information exists.
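The idea above can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code: the struct layout, the names `mark_lost_events` and `read_lost_events`, the flag bit positions, the 27 bit size mask, and the use of a 32 bit counter are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SUB_BUF_DATA_SIZE  4080u              /* hypothetical payload capacity */
#define FLAG_MISSED_EVENTS (1u << 31)         /* assumed: events were lost here */
#define FLAG_MISSED_STORED (1u << 30)         /* assumed: a count follows the data */
#define SIZE_MASK          ((1u << 27) - 1)   /* assumed: low bits hold the size */

struct sub_buffer {
    uint64_t time_stamp;
    uint32_t commit;                          /* data size plus flag bits */
    unsigned char data[SUB_BUF_DATA_SIZE];
};

/* Flag the sub buffer as having lost events; store the count only if it fits. */
static void mark_lost_events(struct sub_buffer *sb, uint32_t lost)
{
    uint32_t used = sb->commit & SIZE_MASK;

    assert(used <= SUB_BUF_DATA_SIZE);
    sb->commit |= FLAG_MISSED_EVENTS;
    if (SUB_BUF_DATA_SIZE - used >= sizeof(lost)) {
        /* Room is left after the last event: append the counter there. */
        memcpy(&sb->data[used], &lost, sizeof(lost));
        sb->commit |= FLAG_MISSED_STORED;
    }
}

/* Returns 1 and fills *lost if a count was stored, 0 if only the flag is set. */
static int read_lost_events(const struct sub_buffer *sb, uint32_t *lost)
{
    uint32_t used = sb->commit & SIZE_MASK;

    if (!(sb->commit & FLAG_MISSED_STORED))
        return 0;
    memcpy(lost, &sb->data[used], sizeof(*lost));
    return 1;
}
```

A binary reader in this sketch checks the "stored" flag first and only then looks past the end of the data, so sub buffers without spare room still report where events were lost, just not how many.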
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ring-buffer: Add place holder recording of dropped events · 66a8cb95

Committed by Steven Rostedt on Mar 31, 2010

Currently, when the ring buffer drops events, it does not record
the fact that it did so. It does inform the writer that the event
was dropped by returning a NULL event, but it does not put in any
place holder where the event was dropped.

This is not a trivial thing to add because the ring buffer mostly
runs in overwrite (flight recorder) mode. That is, when the ring
buffer is full, new data will overwrite old data.

In a producer/consumer mode, where new data is simply dropped when
the ring buffer is full, it is trivial to add the placeholder
for dropped events. When there's more room to write new data, then
a special event can be added to notify the reader about the dropped
events.

But in overwrite mode, any new write can overwrite events. A place
holder cannot be inserted into the ring buffer since there may never
be room. A reader could also come in at any time and miss the
placeholder.

Luckily, the way the ring buffer works, the read side can find out
if events were lost or not, and how many events. Every time a write
takes place, if it overwrites the head page (the next read) it
updates an "overrun" variable that keeps track of the number of
lost events. When a reader swaps out a page from the ring buffer,
it can record this number, perform the swap, and then check to
see if the number changed, and take the diff if it has, which would be
the number of events dropped. This can be stored by the reader
and returned to callers of the reader.

Since the reader page swap will fail if the writer moved the head
page since the time the reader set up the swap, this gives room
to record the overruns without worrying about races. If the reader
sets up the pages, records the overrun, then performs the swap,
if the swap succeeds, then the overrun variable has not been
updated since the setup before the swap.
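The ordering described above (record the overrun, attempt the swap, take the diff only on success) can be sketched as follows. This is a single-threaded illustration, not the kernel implementation: the struct, the field names, and the `swap_ok` parameter that stands in for the result of the real page swap are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct cpu_buffer {
    unsigned long overrun;       /* total events overwritten by the writer */
    unsigned long last_overrun;  /* overrun seen at the last successful swap */
    unsigned long lost_events;   /* what the reader reports to its caller */
};

/* Writer side: account for n events overwritten on the head page. */
static void writer_overwrite(struct cpu_buffer *cb, unsigned long n)
{
    assert(cb != NULL);
    cb->overrun += n;
}

/*
 * Reader side: swap_ok stands in for the outcome of the page swap.
 * Returns 1 if the swap succeeded, 0 if the reader has to retry.
 */
static int reader_swap(struct cpu_buffer *cb, int swap_ok)
{
    /* Record the counter before the swap is attempted. */
    unsigned long overrun = cb->overrun;

    if (!swap_ok)
        return 0;   /* the writer moved the head page; nothing is recorded */

    /* The swap succeeded, so the recorded value is still current. */
    cb->lost_events = overrun - cb->last_overrun;
    cb->last_overrun = overrun;
    return 1;
}
```

In this sketch a failed swap leaves both `last_overrun` and `lost_events` untouched, so the retry picks up any overruns that happened in between.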

For binary readers of the ring buffer, a flag is set in the header
of each sub page (sub buffer) of the ring buffer. This flag is embedded
in the size field of the data on the sub buffer, in the 31st bit (the size
can be 32 or 64 bits depending on the architecture), but only 27
bits need to be used for the actual size (less actually).

We could add a new field in the sub buffer header to also record the
number of events dropped since the last read, but this will change the
format of the binary ring buffer a bit too much. Perhaps this change can
be made if the information on the number of events dropped is considered
important enough.

Note, the notification of dropped events is only used by consuming reads
or peeking at the ring buffer. Iterating over the ring buffer does not
keep this information because the necessary data is only available when
a page swap is made, and the iterator does not swap out pages.

Cc: Robert Richter <robert.richter@amd.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

19 Mar, 2010 1 commit

ring-buffer: Do 8 byte alignment for 64 bit that can not handle 4 byte align · 2271048d

Committed by Steven Rostedt on Mar 18, 2010

The ring buffer uses 4 byte alignment while recording events into the
buffer, even on 64bit machines. This saves space when there are lots
of events being recorded at 4 byte boundaries.

The ring buffer has a zero copy method to write into the buffer, with
the reserving of space and then committing it. This may cause problems
when writing an 8 byte word into a 4 byte alignment (not 8). For x86 and
PPC this is not an issue, but on some architectures this would cause an
out-of-alignment exception.

This patch uses CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to determine
if it is OK to use 4 byte alignments on 64 bit machines. If it is not,
it forces the ring buffer event header to be 8 bytes and not 4,
and will align the length of the data to be 8 byte aligned.
This keeps the data payload at 8 byte alignments and will allow these
machines to run without issue.

The trick to this is that the header can be either 4 bytes or 8 bytes
depending on the length of the data payload. The 4 byte header
has a length field that supports up to 112 bytes. If the length of
the data is more than 112, the length field is set to zero, and the actual
length is stored in the next 4 bytes after the header.

When CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set, the code forces
zero in the 4 byte header forcing the length to be stored in the 4 byte
array, even with a small data load. It also forces the length of the
data load to be 8 byte aligned. The combination of these two guarantees
that the data is always at 8 byte alignment.
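The header sizing rule described above can be sketched as a small layout calculation. This is an illustration only: `layout_event`, the struct, and the `force_8byte` parameter (standing in for the case where CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set) are hypothetical names, and the length field is assumed to count 4 byte units so that its largest value corresponds to 112 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define RB_ALIGN       4u                 /* normal event alignment */
#define MAX_SMALL_DATA (28u * RB_ALIGN)   /* 112 bytes fit the short length field */

struct event_layout {
    uint32_t len_field;     /* value placed in the 4 byte header's length field */
    uint32_t header_bytes;  /* 4, or 8 when the length lives in the next word */
    uint32_t data_bytes;    /* payload size after alignment */
};

/* Round n up to a multiple of align (align must be a power of two). */
static uint32_t round_up(uint32_t n, uint32_t align)
{
    return (n + align - 1) & ~(align - 1);
}

static struct event_layout layout_event(uint32_t len, int force_8byte)
{
    struct event_layout l;

    assert(len > 0);
    l.data_bytes = round_up(len, force_8byte ? 8u : RB_ALIGN);
    if (force_8byte || l.data_bytes > MAX_SMALL_DATA) {
        /* Zero in the header: the real length goes in the following 4 bytes. */
        l.len_field = 0;
        l.header_bytes = 8;
    } else {
        l.len_field = l.data_bytes / RB_ALIGN;
        l.header_bytes = 4;
    }
    return l;
}
```

With `force_8byte` set, the header is always 8 bytes and the payload length is a multiple of 8, so every payload in this sketch starts on an 8 byte boundary as long as the first event does.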
Tested-by: Frederic Weisbecker <fweisbec@gmail.com>
           (on sparc64)
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

13 Mar, 2010 1 commit

ring-buffer: Move disabled check into preempt disable section · 52fbe9cd

Committed by Lai Jiangshan on Mar 08, 2010

The ring buffer resizing and resetting relies on a schedule RCU
action. The buffers are disabled, a synchronize_sched() is called
and then the resize or reset takes place.

But this only works if the disabling of the buffers is within the
preempt disabled section; otherwise a window exists in which the buffers
can be written to while a reset or resize takes place.

Cc: stable@kernel.org
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4B949E43.2010906@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

04 Feb, 2010 1 commit

Fix misspellings of "truly" in comments. · c41b20e7

Committed by Adam Buchbinder on Dec 11, 2009

Some comments misspell "truly"; this fixes them. No code changes.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

27 Jan, 2010 2 commits

ring-buffer: Check for end of page in iterator · 3c05d748

Committed by Steven Rostedt on Jan 26, 2010

The iterator may come to an empty page for some reason, or the page
may be emptied by a consuming read. The iterator code currently
does not check if the iterator is past the contents, and may
return a false entry.

This patch adds a check to the ring buffer iterator to test if the
current page has been completely read and sets the iterator to the
next page if necessary.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ring-buffer: Check if ring buffer iterator has stale data · 492a74f4

Committed by Steven Rostedt on Jan 25, 2010

Usually reads of the ring buffer are performed by a single task.
There are two types of reads from the ring buffer.

One is a consuming read which will consume the entry that was read
and the next read will be the entry that follows.

The other is an iterator that will let the user read the contents of
the ring buffer without modifying it. When an iterator is allocated,
writes to the ring buffer are disabled to protect the iterator.

The problem exists when consuming reads happen while an iterator is
allocated. Specifically, the kind of read that swaps out an entire
page (used by splice) and replaces it with a new page. If the iterator
is on the page that is swapped out, then the next read may read
from this swapped out page and return garbage.

This patch adds a check when reading the iterator to make sure that
the iterator contents are still valid. If a consuming read has taken
place, the iterator is reset.
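One way to picture such a validity check is for the iterator to cache what it saw of the reader state when it was last reset, and to compare on every read. The sketch below assumes that approach; the struct fields and function names are hypothetical and the reader page is reduced to an integer id for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct cpu_buffer {
    unsigned long read;   /* hypothetical: events consumed so far */
    int reader_page;      /* hypothetical: id of the current reader page */
};

struct iterator {
    const struct cpu_buffer *cb;
    unsigned long cached_read;   /* reader state seen at the last reset */
    int cached_reader_page;
    size_t pos;                  /* next position the iterator will return */
};

/* Start over from the beginning and remember the current reader state. */
static void iter_reset(struct iterator *it)
{
    assert(it->cb != NULL);
    it->cached_read = it->cb->read;
    it->cached_reader_page = it->cb->reader_page;
    it->pos = 0;
}

/* Returns the position to read next, resetting first if a consuming
 * read has changed the buffer since the iterator last looked. */
static size_t iter_next_pos(struct iterator *it)
{
    if (it->cached_read != it->cb->read ||
        it->cached_reader_page != it->cb->reader_page)
        iter_reset(it);
    return it->pos++;
}
```

The comparison is cheap, so doing it on every iterator read is a reasonable trade for never walking a page that has been swapped out.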
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

07 Jan, 2010 2 commits

ring-buffer: Add rb_list_head() wrapper around new reader page next field · 0e1ff5d7

Committed by Steven Rostedt on Jan 06, 2010

If the very unlikely case happens where the writer moves the head by one
between where the head page is read and where the new reader page
is assigned _and_ the writer then writes and wraps the entire ring buffer
so that the head page is back to what was originally read as the head page,
the page to be swapped will have a corrupted next pointer.

The simple solution is to wrap the assignment of the next pointer with
rb_list_head().
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ring-buffer: Wrap a list.next reference with rb_list_head() · 5ded3dc6

Committed by David Sharp on Jan 06, 2010

This reference at the end of rb_get_reader_page() was causing off-by-one
writes to the prev pointer of the page after the reader page when that
page is the head page, and therefore the reader page has the RB_PAGE_HEAD
flag in its list.next pointer. This eventually results in a GPF in a
subsequent call to rb_set_head_page() (usually from rb_get_reader_page())
when that prev pointer is dereferenced. The dereferenced register would
characteristically have an address that appears shifted left by one byte
(e.g., ffxxxxxxxxxxxxyy instead of ffffxxxxxxxxxxxx) due to being written at
an address one byte too high.
Signed-off-by: David Sharp <dhsharp@google.com>
LKML-Reference: <1262826727-9090-1-git-send-email-dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

05 Jan, 2010 1 commit

local_t: Move local.h include to ringbuffer.c and ring_buffer_benchmark.c · 79615760

Committed by Christoph Lameter on Jan 05, 2010

ringbuffer*.c are the last users of local.h.

Remove the include from modules.h and add it to ringbuffer files.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>

15 Dec, 2009 3 commits

locking: Convert __raw_spin* functions to arch_spin* · 0199c4e6

Committed by Thomas Gleixner on Dec 02, 2009

Name space cleanup. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org

locking: Rename __RAW_SPIN_LOCK_UNLOCKED to __ARCH_SPIN_LOCK_UNLOCKED · edc35bd7

Committed by Thomas Gleixner on Dec 03, 2009

Further name space cleanup. No functional change
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org

locking: Convert raw_spinlock to arch_spinlock · 445c8951

Committed by Thomas Gleixner on Dec 02, 2009

The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.

Linus suggested converting the raw_ locks to arch_ locks and cleaning up
the name space instead of using an artificial name like core_spin,
atomic_spin or whatever.

No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org

11 Dec, 2009 2 commits

ring-buffer: Move resize integrity check under reader lock · dd7f5943

Committed by Steven Rostedt on Dec 10, 2009

While using an application that does splice on the ftrace ring
buffer at start up, I triggered an integrity check failure.

Looking into this, I discovered that resizing the buffer performs
an integrity check after the buffer is resized. This check unfortunately
is performed after it releases the reader lock. If a reader is
reading the buffer it may cause the integrity check to trigger a
false failure.

This patch simply moves the integrity checker under the protection
of the ring buffer reader lock.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ring-buffer: Use sync sched protection on ring buffer resizing · 18421015

Committed by Steven Rostedt on Dec 10, 2009

There was a comment in the ring buffer code that says the calling
layers should prevent tracing or reading of the ring buffer while
resizing. I have discovered that the tracers do not honor this
arrangement.

This patch moves the disabling and synchronizing of the ring buffer to
a higher layer during resizing. This guarantees that no writes
are occurring while the resize takes place.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

17 Nov, 2009 1 commit

ring-buffer: Move access to commit_page up into function used · 5a50e33c

Committed by Steven Rostedt on Nov 17, 2009

With the change in the way we process commits, a commit only happens
at the outermost level, and we no longer need to worry about
a commit ending after rb_start_commit() has been called. The code
used to grab the commit page before the tail page to prevent a possible
race, but this race no longer exists with the rb_start_commit()/
rb_end_commit() interface.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

04 Nov, 2009 1 commit

ring-buffer: Synchronize resizing buffer with reader lock · f7112949

Committed by Lai Jiangshan on Nov 03, 2009

We got a sudden panic when we reduced the size of the
ringbuffer.

We can reproduce the panic by the following steps:

echo 1 > events/sched/enable
cat trace_pipe > /dev/null &

while ((1))
do
echo 12000 > buffer_size_kb
echo 512 > buffer_size_kb
done

(not more than 5 seconds, panic ...)
Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4AF01735.9060409@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

24 Oct, 2009 2 commits

tracing: Remove cpu arg from the rb_time_stamp() function · 6d3f1e12

Committed by Jiri Olsa on Oct 23, 2009

The cpu argument is not used inside the rb_time_stamp() function.
Plus fix a typo.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233647.118547500@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

tracing: Fix comment typo and documentation example · 67b394f7

Committed by Jiri Olsa on Oct 23, 2009

Trivial patch to fix a documentation example and to fix a
comment.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.871719877@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

06 Oct, 2009 1 commit

tracing/events: Add 'signed' field to format files · 26a50744

Committed by Tom Zanussi on Oct 06, 2009

The sign info used for filters in the kernel is also useful to
applications that process the trace stream.  Add it to the format
files and make it available to userspace.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: rostedt@goodmis.org
Cc: lizf@cn.fujitsu.com
Cc: hch@infradead.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1254809398-8078-2-git-send-email-tzanussi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

20 Sep, 2009 1 commit

includecheck fix: kernel/trace, ring_buffer.c · a0f320f4

Committed by Jaswinder Singh Rajput on Sep 20, 2009

fix the following 'make includecheck' warning:

  kernel/trace/ring_buffer.c: trace.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1247068617.4382.107.camel@ht.satnam>

14 Sep, 2009 1 commit

ring-buffer: typecast cmpxchg to fix PowerPC warning · 08a40816

Committed by Steven Rostedt on Sep 14, 2009

The cmpxchg used by PowerPC does the following:

  ({									 \
     __typeof__(*(ptr)) _o_ = (o);					 \
     __typeof__(*(ptr)) _n_ = (n);					 \
     (__typeof__(*(ptr))) __cmpxchg((ptr), (unsigned long)_o_,		 \
				    (unsigned long)_n_, sizeof(*(ptr))); \
  })

This does a type check of *ptr to both o and n.

Unfortunately, the code in ring_buffer.c assigns longs to pointers
and pointers to longs and causes a warning on PowerPC:

ring_buffer.c: In function 'rb_head_page_set':
ring_buffer.c:704: warning: initialization makes pointer from integer without a cast
ring_buffer.c:704: warning: initialization makes pointer from integer without a cast
ring_buffer.c: In function 'rb_head_page_replace':
ring_buffer.c:797: warning: initialization makes integer from pointer without a cast

This patch adds the typecasts inside cmpxchg to annotate that a long is
being cast to a pointer and a pointer is being cast to a long and this
removes the PowerPC warnings.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

10 Sep, 2009 1 commit

ring-buffer: consolidate interface of rb_buffer_peek() · d8eeb2d3

Committed by Robert Richter on Jul 31, 2009

rb_buffer_peek() operates with struct ring_buffer_per_cpu *cpu_buffer
only. Thus, instead of passing variables buffer and cpu it is better
to use cpu_buffer directly. This also reduces the risk of races since
cpu_buffer is not calculated twice.
Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1249045084-3028-1-git-send-email-robert.richter@amd.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

05 Sep, 2009 2 commits

ring-buffer: only enable ring_buffer_swap_cpu when needed · 85bac32c

Committed by Steven Rostedt on Sep 04, 2009

Since the ability to swap the cpu buffers adds a small overhead to
the recording of a trace, we only want to add it when needed.

Only the irqsoff and preemptoff tracers use this feature, and both are
not recommended for production kernels. This patch disables its use
when neither irqsoff nor preemptoff is configured.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ring-buffer: check for swapped buffers in start of committing · 62f0b3eb

Committed by Steven Rostedt on Sep 04, 2009

Because the irqsoff tracer can swap an internal CPU buffer, it is possible
that a swap happens between the start of the write and before the committing
bit is set (the committing bit will disable swapping).

This patch adds a check for this and will fail the write if it detects it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

04 Sep, 2009 7 commits

ring-buffer: disable all cpu buffers when one finds a problem · 077c5407

Committed by Steven Rostedt on Sep 03, 2009

Currently, RB_WARN_ON disables either the current
CPU buffer or all CPU buffers, depending on whether a ring_buffer or
ring_buffer_per_cpu struct was passed into the macro.

Most users of the RB_WARN_ON pass in the CPU buffer, so only the one
CPU buffer gets disabled but the rest are still active. This may
confuse users even though a warning is sent to the console.

This patch changes the macro to disable the entire buffer even if
the CPU buffer is passed in.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ring-buffer: do not count discarded events · a1863c21

Committed by Steven Rostedt on Sep 03, 2009

The latency tracers report the number of items in the trace buffer.
This uses the ring buffer data to calculate this. Because discarded
events are also counted, the numbers do not match the number of items
that are printed. The ring buffer also adds a "padding" item to the
end of each buffer page which also gets counted as a discarded item.

This patch decrements the counter to the page entries on a discard.
This allows us to ignore discarded entries while reading the buffer.

Decrementing the counter is still safe since it can only happen while
the committing flag is still set.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ring-buffer: remove ring_buffer_event_discard · dc892f73

Committed by Steven Rostedt on Sep 03, 2009

The function ring_buffer_event_discard can be used on any item in the
ring buffer, even after the item was committed. This function provides
no safety nets and is very race prone.

An item may be safely removed from the ring buffer before it is committed
with the ring_buffer_discard_commit.

Since there are currently no users of this function, and because this
function is racy and error prone, this patch removes it altogether.

Note, removing this function also allows the counters to ignore
all discarded events (patches will follow).
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ring-buffer: fix ring_buffer_read crossing pages · 7e9391cf

Committed by Steven Rostedt on Sep 03, 2009

When the ring buffer uses an iterator (static read mode, not on the
fly reading) and it crosses a page boundary, it will skip the first
entry on the next page. The reason is that the last entry of a page
is usually padding if the page is not full. The padding will not be
returned to the user.

The problem arises on ring_buffer_read because it also increments the
iterator. Because both the read and peek use the same rb_iter_peek,
the rb_iter_peek will return the padding but also increment to the next
item. This is because the ring_buffer_peek will not increment it
itself.

The ring_buffer_read will increment it again and then call rb_iter_peek
again to get the next item. But that will be the second item, not the
first one on the page.

The reason this never showed up before, is because the ftrace utility
always calls ring_buffer_peek first and only uses ring_buffer_read
to increment to the next item. The ring_buffer_peek will always keep
the pointer to a valid item and not padding. This just hid the bug.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ring-buffer: remove unnecessary cpu_relax · 1b959e18

Committed by Steven Rostedt on Sep 03, 2009

The loops in the ring buffer that use cpu_relax are not dependent on
other CPUs. They simply came across some padding in the ring buffer and
are skipping over it. It is a normal loop and does not require a
cpu_relax.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ring-buffer: do not swap buffers during a commit · 98277991

Committed by Steven Rostedt on Sep 02, 2009

If a commit is taking place on a CPU ring buffer, do not allow it to
be swapped. Return -EBUSY when this is detected instead.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ring-buffer: do not reset while in a commit · 41b6a95d

Committed by Steven Rostedt on Sep 02, 2009

The callers of reset must ensure that no commit can be taking place
at the time of the reset. If it does then we may corrupt the ring buffer.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

08 Aug, 2009 1 commit

ring-buffer: Fix memleak in ring_buffer_free() · bd3f0221

Committed by Eric Dumazet on Aug 07, 2009

I noticed oprofile memleaked in linux-2.6 current tree,
and tracked this ring-buffer leak.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <4A7C06B9.2090302@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

06 Aug, 2009 3 commits

ring-buffer: Fix advance of reader in rb_buffer_peek() · 469535a5

Committed by Robert Richter on Jul 30, 2009

When calling rb_buffer_peek() from ring_buffer_consume() and a
padding event is returned, the function rb_advance_reader() is
called twice. This may lead to missing samples or, under high
workloads, to the warning below. This patch fixes this. If a padding
event is returned by rb_buffer_peek() it will be consumed by the
calling function now.

Also, I simplified some code in ring_buffer_consume().

------------[ cut here ]------------
WARNING: at /dev/shm/.source/linux/kernel/trace/ring_buffer.c:2289 rb_advance_reader+0x2e/0xc5()
Hardware name: Anaheim
Modules linked in:
Pid: 29, comm: events/2 Tainted: G        W  2.6.31-rc3-oprofile-x86_64-standard-00059-g5050dc2 #1
Call Trace:
[<ffffffff8106776f>] ? rb_advance_reader+0x2e/0xc5
[<ffffffff81039ffe>] warn_slowpath_common+0x77/0x8f
[<ffffffff8103a025>] warn_slowpath_null+0xf/0x11
[<ffffffff8106776f>] rb_advance_reader+0x2e/0xc5
[<ffffffff81068bda>] ring_buffer_consume+0xa0/0xd2
[<ffffffff81326933>] op_cpu_buffer_read_entry+0x21/0x9e
[<ffffffff810be3af>] ? __find_get_block+0x4b/0x165
[<ffffffff8132749b>] sync_buffer+0xa5/0x401
[<ffffffff810be3af>] ? __find_get_block+0x4b/0x165
[<ffffffff81326c1b>] ? wq_sync_buffer+0x0/0x78
[<ffffffff81326c76>] wq_sync_buffer+0x5b/0x78
[<ffffffff8104aa30>] worker_thread+0x113/0x1ac
[<ffffffff8104dd95>] ? autoremove_wake_function+0x0/0x38
[<ffffffff8104a91d>] ? worker_thread+0x0/0x1ac
[<ffffffff8104dc9a>] kthread+0x88/0x92
[<ffffffff8100bdba>] child_rip+0xa/0x20
[<ffffffff8104dc12>] ? kthread+0x0/0x92
[<ffffffff8100bdb0>] ? child_rip+0x0/0x20
---[ end trace f561c0a58fcc89bd ]---

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@kernel.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

ring-buffer: do not disable ring buffer on oops_in_progress · 464e85eb

Committed by Steven Rostedt on Aug 05, 2009

The commit:

  commit e0fdace1
  Author: David Miller <davem@davemloft.net>
  Date:   Fri Aug 1 01:11:22 2008 -0700

    debug_locks: set oops_in_progress if we will log messages.

    Otherwise lock debugging messages on runqueue locks can deadlock the
    system due to the wakeups performed by printk().

    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

Will permanently set oops_in_progress on any lockdep failure.
When this triggers it will cause any read from the ring buffer to
permanently disable the ring buffer (not to mention no locking of
printk).

This patch removes the check. It keeps the print in NMI which makes
sense. This is probably OK, since the ring buffer should not cause
something to set oops_in_progress anyway.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ring-buffer: fix check of try_to_discard result · 0f2541d2

Committed by Steven Rostedt on Aug 05, 2009

The function ring_buffer_discard_commit inverted the code path
of the result of try_to_discard. It should skip incrementing the
entry counter if try_to_discard succeeded. But instead, it increments
the entry counter if the discard succeeded, and does not increment
it if it fails.

The result of this bug is that filtering will make the stat counters
incorrect.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

17 Jul, 2009 1 commit

ring_buffer: Fix warning while ignoring cmpxchg return value · da706d8b

Committed by Lai Jiangshan on Jul 15, 2009

kernel/trace/ring_buffer.c: In function 'rb_tail_page_update':
kernel/trace/ring_buffer.c:849: warning: value computed is not used
kernel/trace/ring_buffer.c:850: warning: value computed is not used

Add "(void)"s to fix this warning, because we don't need to handle
the failure case of cmpxchg here; it's fine if an interrupt already did
the job.

Changed from V1:
  Add a comment(which is written by Steven) for it.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

08 Jul, 2009 2 commits

ring-buffer: make lockless · 77ae365e

Committed by Steven Rostedt on Mar 27, 2009

This patch converts the ring buffers into a completely lockless
buffer recording system. The read side still takes locks since
we still serialize readers. But the writers are the ones that
must be lockless (those can happen in NMIs).

The main change is to the "head_page" pointer. We write to the
tail, and read from the head. The "head_page" pointer in the cpu
buffer is now just a reference to where to look. The real head
page is now kept in the head_page->list->prev->next pointer.
That is, in the list head of the previous page we set flags.

The list pages are allocated to be aligned such that the least
significant bits are always zero pointing to the list. This gives
us room to put flags in their pointers.

bit 0: set when the page is a head page
bit 1: set when the writer is moving the page (for overwrite mode)

cmpxchg is used to update the pointer.

When the writer wraps the buffer and the tail meets the head,
in overwrite mode, the writer must move the head page forward.
It first uses cmpxchg to change the pointer flag from 1 to 2.
Once this is done, the reader on another CPU will not take the
page from the buffer.
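The flag-in-pointer scheme above can be sketched in user space with C11 atomics standing in for the kernel's cmpxchg. The flag values 1 and 2 follow the bit description above; everything else (the struct, `link_target`, `link_flags`, `link_change_flag`) is a hypothetical name chosen for this illustration, and the real code handles many more cases.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define PAGE_HEAD   ((uintptr_t)1)  /* bit 0: the next page is the head page */
#define PAGE_UPDATE ((uintptr_t)2)  /* bit 1: the writer is moving the head */
#define FLAG_MASK   ((uintptr_t)3)

struct list_node {
    _Atomic uintptr_t next;  /* pointer to the next node plus flag bits */
};

/* Strip the flag bits to get the real next pointer. */
static struct list_node *link_target(struct list_node *n)
{
    return (struct list_node *)(atomic_load(&n->next) & ~FLAG_MASK);
}

static uintptr_t link_flags(struct list_node *n)
{
    return atomic_load(&n->next) & FLAG_MASK;
}

/*
 * Atomically change the flag on prev->next from old_flag to new_flag.
 * Returns 1 on success, 0 if the pointer or its flag no longer match.
 */
static int link_change_flag(struct list_node *prev, struct list_node *next,
                            uintptr_t old_flag, uintptr_t new_flag)
{
    uintptr_t expected = (uintptr_t)next | old_flag;
    uintptr_t desired = (uintptr_t)next | new_flag;

    /* The scheme relies on nodes being aligned so the low bits are free. */
    assert(((uintptr_t)next & FLAG_MASK) == 0);
    return atomic_compare_exchange_strong(&prev->next, &expected, desired);
}
```

Once a writer has moved the flag from 1 to 2 in this sketch, a reader that still expects flag 1 fails its own compare-and-exchange and has to look again, which mirrors the "reader will not take the page" behaviour described above.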

The writers need to protect against interrupts (we don't bother with
disabling interrupts because NMIs are allowed to write too).

After the writer sets the pointer flag to 2, it takes care to
manage interrupts coming in. This is described in detail within the
comments of the code.

 Changes in version 2:
  - Let reader reset entries value of header page.
  - Fix tail page passing commit page on reader page test.
  - Always increment entries and write counter in rb_tail_page_update
  - Add safety check in rb_set_commit_to_write to break out of infinite loop
  - add mask in rb_is_reader_page

[ Impact: lock free writing to the ring buffer ]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ring-buffer: make the buffer a true circular link list · 3adc54fa

Committed by Steven Rostedt on Mar 30, 2009

This patch changes the ring buffer data pages from using a linked list
head pointer, to making each buffer page point to another buffer page
and never back to a "head".

This makes the handling of the ring buffer less complex, since the
traversing of the ring buffer pages no longer needs to account for the
head pointer.

This change also is needed to make the ring buffer lockless.

[
  Changes in version 2:

  - Added change that Lai Jiangshan mentioned.

  From: Lai Jiangshan <laijs@cn.fujitsu.com>
  Date: Thu, 11 Jun 2009 11:25:48 +0800
  LKML-Reference: <4A30793C.6090208@cn.fujitsu.com>

  I'm not sure whether these 4 lines:
	bpage = list_entry(pages.next, struct buffer_page, list);
	list_del_init(&bpage->list);
	cpu_buffer->pages = &bpage->list;

	list_splice(&pages, cpu_buffer->pages);
  equal to these 2 lines:
 	cpu_buffer->pages = pages.next;
 	list_del(&pages);

  If they are equivalent, I think the second one
  is simpler. It may not be a really necessary cleanup.

  What I asked is: if they are equivalent, could you use these two lines:
 	cpu_buffer->pages = pages.next;
	list_del(&pages);
]

[ Impact: simplify the ring buffer to help make it lockless ]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>

25 Jun, 2009 1 commit

ring-buffer: Make it generally available · 1155de47

Committed by Paul Mundt on Jun 25, 2009

In hunting down the cause for the hwlat_detector ring buffer spew in
my failed -next builds it became obvious that folks are now treating
ring_buffer as something that is generic independent of tracing and thus,
suitable for public driver consumption.

Given that there are only a few minor areas in ring_buffer that have any
reliance on CONFIG_TRACING or CONFIG_FUNCTION_TRACER, provide stubs for
those and make it generally available.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Cc: Jon Masters <jcm@jonmasters.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20090625053012.GB19944@linux-sh.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
