提交 · 18fab912d4fa70133df164d2dcf3310be0c38c34 · openanolis / cloud-kernel

07 8月, 2010 2 次提交

tracing: Fix ring_buffer_read_page reading out of page boundary · 18fab912

由 Huang Ying 提交于 7月 28, 2010

With the configuration: CONFIG_DEBUG_PAGEALLOC=y and Shaohua's patch:

[PATCH]x86: make spurious_fault check correct pte bit

Function call graph trace with the following will trigger a page fault.

# cd /sys/kernel/debug/tracing/
# echo function_graph > current_tracer
# cat per_cpu/cpu1/trace_pipe_raw > /dev/null

BUG: unable to handle kernel paging request at ffff880006e99000
IP: [<ffffffff81085572>] rb_event_length+0x1/0x3f
PGD 1b19063 PUD 1b1d063 PMD 3f067 PTE 6e99160
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/net/lo/operstate
CPU 1
Modules linked in:

Pid: 1982, comm: cat Not tainted 2.6.35-rc6-aes+ #300 /Bochs
RIP: 0010:[<ffffffff81085572>]  [<ffffffff81085572>] rb_event_length+0x1/0x3f
RSP: 0018:ffff880006475e38  EFLAGS: 00010006
RAX: 0000000000000ff0 RBX: ffff88000786c630 RCX: 000000000000001d
RDX: ffff880006e98000 RSI: 0000000000000ff0 RDI: ffff880006e99000
RBP: ffff880006475eb8 R08: 000000145d7008bd R09: 0000000000000000
R10: 0000000000008000 R11: ffffffff815d9336 R12: ffff880006d08000
R13: ffff880006e605d8 R14: 0000000000000000 R15: 0000000000000018
FS:  00007f2b83e456f0(0000) GS:ffff880002100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff880006e99000 CR3: 00000000064a8000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process cat (pid: 1982, threadinfo ffff880006474000, task ffff880006e40770)
Stack:
 ffff880006475eb8 ffffffff8108730f 0000000000000ff0 000000145d7008bd
<0> ffff880006e98010 ffff880006d08010 0000000000000296 ffff88000786c640
<0> ffffffff81002956 0000000000000000 ffff8800071f4680 ffff8800071f4680
Call Trace:
 [<ffffffff8108730f>] ? ring_buffer_read_page+0x15a/0x24a
 [<ffffffff81002956>] ? return_to_handler+0x15/0x2f
 [<ffffffff8108a575>] tracing_buffers_read+0xb9/0x164
 [<ffffffff810debfe>] vfs_read+0xaf/0x150
 [<ffffffff81002941>] return_to_handler+0x0/0x2f
 [<ffffffff810248b0>] __bad_area_nosemaphore+0x17e/0x1a1
 [<ffffffff81002941>] return_to_handler+0x0/0x2f
 [<ffffffff810248e6>] bad_area_nosemaphore+0x13/0x15
Code: 80 25 b2 16 b3 00 fe c9 c3 55 48 89 e5 f0 80 0d a4 16 b3 00 02 c9 c3 55 31 c0 48 89 e5 48 83 3d 94 16 b3 00 01 c9 0f 94 c0 c3 55 <8a> 0f 48 89 e5 83 e1 1f b8 08 00 00 00 0f b6 d1 83 fa 1e 74 27
RIP  [<ffffffff81085572>] rb_event_length+0x1/0x3f
 RSP <ffff880006475e38>
CR2: ffff880006e99000
---[ end trace a6877bb92ccb36bb ]---

The root cause is that ring_buffer_read_page() may read out of page
boundary, because the boundary checking is done after reading. This is
fixed via doing boundary checking before reading.
Reported-by: NShaohua Li <shaohua.li@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: NHuang Ying <ying.huang@intel.com>
LKML-Reference: <1280297641.2771.307.camel@yhuang-dev>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

18fab912

tracing: Fix an unallocated memory access in function_graph · 575570f0

由 Shaohua Li 提交于 7月 27, 2010

With CONFIG_DEBUG_PAGEALLOC, I observed an unallocated memory access in
function_graph trace. It appears we find a small size entry in ring buffer,
but we access it as a big size entry. The access overflows the page size
and touches an unallocated page.

Cc: <stable@kernel.org>
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
LKML-Reference: <1280217994.32400.76.camel@sli10-desk.sh.intel.com>
[ Added a comment to explain the problem - SDR ]
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

575570f0

11 6月, 2010 1 次提交

perf/tracing: Fix regression of perf losing kprobe events · a8fb2608

由 Steven Rostedt 提交于 6月 10, 2010

With the addition of the code to shrink the kernel tracepoint
infrastructure, we lost kprobes being traced by perf. The reason
is that I tested if the "tp_event->class->perf_probe" existed before
enabling it. This prevents "ftrace only" events (like the function
trace events) from being enabled by perf.

Unfortunately, kprobe events do not use perf_probe. This causes
kprobes to be missed by perf. To fix this, we add the test to
see if "tp_event->class->reg" exists as well as perf_probe.

Normal trace events have only "perf_probe" but no "reg" function,
and kprobes and syscalls have the "reg" but no "perf_probe".
The ftrace unique events do not have either, so this is a valid
test. If a kprobe or syscall is not to be probed by perf, the
"reg" function is called anyway, and will return a failure and
prevent perf from probing it.
Reported-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a8fb2608

31 5月, 2010 3 次提交

blktrace: Fix new kernel-doc warnings · 546cf44a

由 Randy Dunlap 提交于 5月 29, 2010

Fix blktrace.c kernel-doc warnings:
 Warning(kernel/trace/blktrace.c:858): No description found for parameter 'ignore'
 Warning(kernel/trace/blktrace.c:890): No description found for parameter 'ignore'
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100529114507.c466fc1e.randy.dunlap@oracle.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

546cf44a

perf_events, trace: Fix perf_trace_destroy(), mutex went missing · 2e97942f

由 Peter Zijlstra 提交于 5月 21, 2010

Steve spotted I forgot to do the destroy under event_mutex.
Reported-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1274451913.1674.1707.camel@laptop>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2e97942f

perf_events, trace: Fix probe unregister race · 3771f077

由 Peter Zijlstra 提交于 5月 21, 2010

tracepoint_probe_unregister() does not synchronize against the probe
callbacks, so do that explicitly. This properly serializes the callbacks
and the free of the data used therein.

Also, use this_cpu_ptr() where possible.
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1274438476.1674.1702.camel@laptop>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3771f077

25 5月, 2010 3 次提交

ring-buffer: Move zeroing out excess in page to ring buffer code · 2711ca23

由 Steven Rostedt 提交于 5月 21, 2010

Currently the trace splice code zeros out the excess bytes in the page before
sending it off to userspace.

This is to make sure userspace is not getting anything it should not be
when reading the pages, because the excess data was never initialized
to zero before writing (for perfomance reasons).

But the splice code has no business in doing this work, it should be
done by the ring buffer. With the latest changes for recording lost
events, the splice code gets it wrong anyway.

Move the zeroing out of excess bytes into the ring buffer code.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2711ca23

ring-buffer: Reset "real_end" when page is filled · b3230c8b

由 Steven Rostedt 提交于 5月 21, 2010

The code to store the "lost events" requires knowing the real end
of the page. Since the 'commit' includes the padding at the end of
a page a "real_end" variable was used to keep track of the end not
including the padding.

If events were lost, the reader can place the count of events in
the padded area if there is enough room.

The bug this patch fixes is that when we fill the page we do not
reset the real_end variable, and if the writer had wrapped a few
times, the real_end would be incorrect.

This patch simply resets the real_end if the page was filled.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

b3230c8b

perf, trace: Fix !x86 build bug · 87f44bbc

由 Peter Zijlstra 提交于 5月 25, 2010

Patch b7e2ecef (perf, trace: Optimize tracepoints by removing
IRQ-disable from perf/tracepoint interaction) made the
unfortunate mistake of assuming the world is x86 only, correct
this.

The problem was that perf_fetch_caller_regs() did
local_save_flags() into regs->flags, and I re-used that to
remove another local_save_flags(), forgetting !x86 doesn't have
regs->flags.

Do the reverse, remove the local_save_flags() from
perf_fetch_caller_regs() and let the ftrace site do the
local_save_flags() instead.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NPaul Mackerras <paulus@samba.org>
Cc: acme@redhat.com
Cc: efault@gmx.de
Cc: fweisbec@gmail.com
Cc: rostedt@goodmis.org
LKML-Reference: <1274778175.5882.623.camel@twins>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

87f44bbc

22 5月, 2010 1 次提交

pipe: add support for shrinking and growing pipes · 35f3d14d

由 Jens Axboe 提交于 5月 20, 2010

This patch adds F_GETPIPE_SZ and F_SETPIPE_SZ fcntl() actions for
growing and shrinking the size of a pipe and adjusts pipe.c and splice.c
(and relay and network splice) usage to work with these larger (or smaller)
pipes.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

35f3d14d

21 5月, 2010 2 次提交

perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events · 1c024eca

由 Peter Zijlstra 提交于 5月 19, 2010

Avoid the swevent hash-table by using per-tracepoint
hlists.

Also, avoid conditionals on the fast path by ordering
with probe unregister so that we should never get on
the callback path without the data being there.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20100521090710.473188012@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1c024eca

perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction · b7e2ecef

由 Peter Zijlstra 提交于 5月 19, 2010

Improves performance.
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1274259525.5605.10352.camel@twins>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b7e2ecef

19 5月, 2010 1 次提交

perf/ftrace: Optimize perf/tracepoint interaction for single events · 4f41c013

由 Peter Zijlstra 提交于 5月 18, 2010

When we've got but a single event per tracepoint
there is no reason to try and multiplex it so don't.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: NIngo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

4f41c013

15 5月, 2010 8 次提交

tracing: Fix function declarations if !CONFIG_STACKTRACE · e1f7992e

由 Li Zefan 提交于 5月 10, 2010

ftrace_trace_stack() and frace_trace_userstacke() take a
struct ring_buffer argument, not struct trace_array. Commit
e77405ad("tracing: pass around ring buffer instead of tracer")
made this change.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4BE77C14.5010806@cn.fujitsu.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

e1f7992e

tracing: Combine event filter_active and enable into single flags field · 553552ce

由 Steven Rostedt 提交于 4月 23, 2010

The filter_active and enable both use an int (4 bytes each) to
set a single flag. We can save 4 bytes per event by combining the
two into a single integer.

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4894944	1018052	 861512	6774508	 675eec	vmlinux.id
4894871	1012292	 861512	6768675	 674823	vmlinux.flags

This gives us another 5K in savings.

The modification of both the enable and filter fields are done
under the event_mutex, so it is still safe to combine the two.

Note: Although Mathieu gave his Acked-by, he would like it documented
 that the reads of flags are not protected by the mutex. The way the
 code works, these reads will not break anything, but will have a
 residual effect. Since this behavior is the same even before this
 patch, describing this situation is left to another patch, as this
 patch does not change the behavior, but just brought it to Mathieu's
 attention.

v2: Updated the event trace self test to for this change.
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

553552ce

tracing: Remove duplicate id information in event structure · 32c0edae

由 Steven Rostedt 提交于 4月 23, 2010

Now that the trace_event structure is embedded in the ftrace_event_call
structure, there is no need for the ftrace_event_call id field.
The id field is the same as the trace_event type field.

Removing the id and re-arranging the structure brings down the tracepoint
footprint by another 5K.

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4895024	1023812	 861512	6780348	 6775bc	vmlinux.print
4894944	1018052	 861512	6774508	 675eec	vmlinux.id
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

32c0edae

tracing: Move print functions into event class · 80decc70

由 Steven Rostedt 提交于 4月 23, 2010

Currently, every event has its own trace_event structure. This is
fine since the structure is needed anyway. But the print function
structure (trace_event_functions) is now separate. Since the output
of the trace event is done by the class (with the exception of events
defined by DEFINE_EVENT_PRINT), it makes sense to have the class
define the print functions that all events in the class can use.

This makes a bigger deal with the syscall events since all syscall events
use the same class. The savings here is another 30K.

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4900382	1048964	 861512	6810858	 67ecea	vmlinux.init
4900446	1049028	 861512	6810986	 67ed6a	vmlinux.preprint
4895024	1023812	 861512	6780348	 6775bc	vmlinux.print

To accomplish this, and to let the class know what event is being
printed, the event structure is embedded in the ftrace_event_call
structure. This should not be an issues since the event structure
was created for each event anyway.
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

80decc70

tracing: Allow events to share their print functions · a9a57763

由 Steven Rostedt 提交于 4月 22, 2010

Multiple events may use the same method to print their data.
Instead of having all events have a pointer to their print funtions,
the trace_event structure now points to a trace_event_functions structure
that will hold the way to print ouf the event.

The event itself is now passed to the print function to let the print
function know what kind of event it should print.

This opens the door to consolidating the way several events print
their output.

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4900382	1048964	 861512	6810858	 67ecea	vmlinux.init
4900446	1049028	 861512	6810986	 67ed6a	vmlinux.preprint

This change slightly increases the size but is needed for the next change.

v3: Fix the branch tracer events to handle this change.

v2: Fix the new function graph tracer event calls to handle this change.
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a9a57763

tracing: Move raw_init from events to class · 0405ab80

由 Steven Rostedt 提交于 4月 22, 2010

The raw_init function pointer in the event is used to initialize
various kinds of events. The type of initialization needed is usually
classed to the kind of event it is.

Two events with the same class will always have the same initialization
function, so it makes sense to move this to the class structure.

Perhaps even making a special system structure would work since
the initialization is the same for all events within a system.
But since there's no system structure (yet), this will just move it
to the class.

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4900375	1053380	 861512	6815267	 67fe23	vmlinux.fields
4900382	1048964	 861512	6810858	 67ecea	vmlinux.init

The text grew very slightly, but this is a constant growth that happened
with the changing of the C files that call the init code.
The bigger savings is the data which will be saved the more events share
a class.
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

0405ab80

tracing: Move fields from event to class structure · 2e33af02

由 Steven Rostedt 提交于 4月 22, 2010

Move the defined fields from the event to the class structure.
Since the fields of the event are defined by the class they belong
to, it makes sense to have the class hold the information instead
of the individual events. The events of the same class would just
hold duplicate information.

After this change the size of the kernel dropped another 3K:

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4900252	1057412	 861512	6819176	 680d68	vmlinux.regs
4900375	1053380	 861512	6815267	 67fe23	vmlinux.fields

Although the text increased, this was mainly due to the C files
having to adapt to the change. This is a constant increase, where
new tracepoints will not increase the Text. But the big drop is
in the data size (as well as needed allocations to hold the fields).
This will give even more savings as more tracepoints are created.

Note, if just TRACE_EVENT()s are used and not DECLARE_EVENT_CLASS()
with several DEFINE_EVENT()s, then the savings will be lost. But
we are pushing developers to consolidate events with DEFINE_EVENT()
so this should not be an issue.

The kprobes define a unique class to every new event, but are dynamic
so it should not be a issue.

The syscalls however have a single class but the fields for the individual
events are different. The syscalls use a metadata to define the
fields. I moved the fields list from the event to the metadata and
added a "get_fields()" function to the class. This function is used
to find the fields. For normal events and kprobes, get_fields() just
returns a pointer to the fields list_head in the class. For syscall
events, it returns the fields list_head in the metadata for the event.

v2:  Fixed the syscall fields. The syscall metadata needs a list
     of fields for both enter and exit.
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2e33af02

tracing: Remove per event trace registering · 2239291a

由 Steven Rostedt 提交于 4月 21, 2010

This patch removes the register functions of TRACE_EVENT() to enable
and disable tracepoints. The registering of a event is now down
directly in the trace_events.c file. The tracepoint_probe_register()
is now called directly.

The prototypes are no longer type checked, but this should not be
an issue since the tracepoints are created automatically by the
macros. If a prototype is incorrect in the TRACE_EVENT() macro, then
other macros will catch it.

The trace_event_class structure now holds the probes to be called
by the callbacks. This removes needing to have each event have
a separate pointer for the probe.

To handle kprobes and syscalls, since they register probes in a
different manner, a "reg" field is added to the ftrace_event_class
structure. If the "reg" field is assigned, then it will be called for
enabling and disabling of the probe for either ftrace or perf. To let
the reg function know what is happening, a new enum (trace_reg) is
created that has the type of control that is needed.

With this new rework, the 82 kernel events and 618 syscall events
has their footprint dramatically lowered:

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4914025	1088868	 861512	6864405	 68be15	vmlinux.class
4918492	1084612	 861512	6864616	 68bee8	vmlinux.tracepoint
4900252	1057412	 861512	6819176	 680d68	vmlinux.regs

The size went from 6863829 to 6819176, that's a total of 44K
in savings. With tracepoints being continuously added, this is
critical that the footprint becomes minimal.

v5: Added #ifdef CONFIG_PERF_EVENTS around a reference to perf
    specific structure in trace_events.c.

v4: Fixed trace self tests to check probe because regfunc no longer
    exists.

v3: Updated to handle void *data in beginning of probe parameters.
    Also added the tracepoint: check_trace_callback_type_##call().

v2: Changed the callback probes to pass void * and typecast the
    value within the function.
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2239291a

14 5月, 2010 2 次提交

tracing: Let tracepoints have data passed to tracepoint callbacks · 38516ab5

由 Steven Rostedt 提交于 4月 20, 2010

This patch adds data to be passed to tracepoint callbacks.

The created functions from DECLARE_TRACE() now need a mandatory data
parameter. For example:

DECLARE_TRACE(mytracepoint, int value, value)

Will create the register function:

int register_trace_mytracepoint((void(*)(void *data, int value))probe,
                                void *data);

As the first argument, all callbacks (probes) must take a (void *data)
parameter. So a callback for the above tracepoint will look like:

void myprobe(void *data, int value)
{
}

The callback may choose to ignore the data parameter.

This change allows callbacks to register a private data pointer along
with the function probe.

	void mycallback(void *data, int value);

	register_trace_mytracepoint(mycallback, mydata);

Then the mycallback() will receive the "mydata" as the first parameter
before the args.

A more detailed example:

  DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

  /* In the C file */

  DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

  [...]

       trace_mytracepoint(status);

  /* In a file registering this tracepoint */

  int my_callback(void *data, int status)
  {
	struct my_struct my_data = data;
	[...]
  }

  [...]
	my_data = kmalloc(sizeof(*my_data), GFP_KERNEL);
	init_my_data(my_data);
	register_trace_mytracepoint(my_callback, my_data);

The same callback can also be registered to the same tracepoint as long
as the data registered is different. Note, the data must also be used
to unregister the callback:

	unregister_trace_mytracepoint(my_callback, my_data);

Because of the data parameter, tracepoints declared this way can not have
no args. That is:

  DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS());

will cause an error.

If no arguments are needed, a new macro can be used instead:

  DECLARE_TRACE_NOARGS(mytracepoint);

Since there are no arguments, the proto and args fields are left out.

This is part of a series to make the tracepoint footprint smaller:

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4914025	1088868	 861512	6864405	 68be15	vmlinux.class
4918492	1084612	 861512	6864616	 68bee8	vmlinux.tracepoint

Again, this patch also increases the size of the kernel, but
lays the ground work for decreasing it.

 v5: Fixed net/core/drop_monitor.c to handle these updates.

 v4: Moved the DECLARE_TRACE() DECLARE_TRACE_NOARGS out of the
     #ifdef CONFIG_TRACE_POINTS, since the two are the same in both
     cases. The __DECLARE_TRACE() is what changes.
     Thanks to Frederic Weisbecker for pointing this out.

 v3: Made all register_* functions require data to be passed and
     all callbacks to take a void * parameter as its first argument.
     This makes the calling functions comply with C standards.

     Also added more comments to the modifications of DECLARE_TRACE().

 v2: Made the DECLARE_TRACE() have the ability to pass arguments
     and added a new DECLARE_TRACE_NOARGS() for tracepoints that
     do not need any arguments.
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

38516ab5

tracing: Create class struct for events · 8f082018

由 Steven Rostedt 提交于 4月 20, 2010

This patch creates a ftrace_event_class struct that event structs point to.
This class struct will be made to hold information to modify the
events. Currently the class struct only holds the events system name.

This patch slightly increases the size, but this change lays the ground work
of other changes to make the footprint of tracepoints smaller.

With 82 standard tracepoints, and 618 system call tracepoints
(two tracepoints per syscall: enter and exit):

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4914025	1088868	 861512	6864405	 68be15	vmlinux.class

This patch also cleans up some stale comments in ftrace.h.

v2: Fixed missing semi-colon in macro.
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

8f082018

07 5月, 2010 2 次提交

sched: Remove rq argument to the tracepoints · 27a9da65

由 Peter Zijlstra 提交于 5月 04, 2010

struct rq isn't visible outside of sched.o so its near useless to
expose the pointer, also there are no users of it, so remove it.
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1272997616.1642.207.camel@laptop>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

27a9da65

perf: Fix check at end of event search · d9f599e1

由 Dan Carpenter 提交于 3月 20, 2010

The original code doesn't work because "call" is never NULL there.
Signed-off-by: NDan Carpenter <error27@gmail.com>
LKML-Reference: <20100320143911.GF5331@bicker>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

d9f599e1

06 5月, 2010 1 次提交

tracing: Fix "integer as NULL pointer" warning. · 668eb65f

由 Thiago Farina 提交于 1月 24, 2010

kernel/trace/trace_output.c:256:24: warning: Using plain integer as NULL pointer
Signed-off-by: NThiago Farina <tfransosi@gmail.com>
LKML-Reference: <1264349038-1766-3-git-send-email-tfransosi@gmail.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

668eb65f

05 5月, 2010 1 次提交

ring-buffer: Wrap open-coded WARN_ONCE · 95609791

由 Borislav Petkov 提交于 5月 02, 2010

Wrap open-coded WARN_ONCE functionality into the equivalent macro.
Signed-off-by: NBorislav Petkov <bp@alien8.de>
LKML-Reference: <20100502060354.GA5281@liondog.tnic>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

95609791

01 5月, 2010 3 次提交

hw-breakpoints: Get the number of available registers on boot dynamically · feef47d0

由 Frederic Weisbecker 提交于 4月 23, 2010

The breakpoint generic layer assumes that archs always know in advance
the static number of address registers available to host breakpoints
through the HBP_NUM macro.

However this is not true for every archs. For example Arm needs to get
this information dynamically to handle the compatiblity between
different versions.

To solve this, this patch proposes to drop the static HBP_NUM macro
and let the arch provide the number of available slots through a
new hw_breakpoint_slots() function. For archs that have
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS selected, it will be called once
as the number of registers fits for instruction and data breakpoints
together.
For the others it will be called first to get the number of
instruction breakpoint registers and another time to get the
data breakpoint registers, the targeted type is given as a
parameter of hw_breakpoint_slots().
Reported-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Acked-by: NPaul Mundt <lethal@linux-sh.org>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>

feef47d0

[SCSI] add scsi trace core functions and put trace points · bf816235

由 Kei Tokunaga 提交于 4月 01, 2010

Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NTomohiro Kusumi <kusumi.tomohiro@jp.fujitsu.com>
Signed-off-by: NKei Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>

bf816235

[SCSI] ftrace: add __print_hex() · 5a2e3995

由 Kei Tokunaga 提交于 4月 01, 2010

__print_hex() prints values in an array in hex (w/o '0x') (space separated)
EX) 92 33 32 f3 ee 4d
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NTomohiro Kusumi <kusumi.tomohiro@jp.fujitsu.com>
Signed-off-by: NKei Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>

5a2e3995

28 4月, 2010 5 次提交

tracing: Fix sleep time function profiling · 37e44bc5

由 Steven Rostedt 提交于 4月 27, 2010

When sleep_time is off the function profiler ignores the time that a task
is scheduled out. When the task is scheduled out a timestamp is taken.
When the task is scheduled back in, the timestamp is compared to the
current time and the saved calltimes are adjusted accordingly.

But when stopping the function profiler, the sched switch hook that
does this adjustment was stopped before shutting down the tracer.
This allowed some tasks to not get their timestamps set when they
scheduled out. When the function profiler started again, this would
skew the times of the scheduler functions.

This patch moves the stopping of the sched switch to after the function
profiler is stopped. It also ignores zero set calltimes, which may
happen on start up.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

37e44bc5

tracing: Show sample std dev in function profiling · e330b3bc

由 Chase Douglas 提交于 4月 26, 2010

When combined with function graph tracing the ftrace function profiler
also prints the average run time of functions. While this gives us some
good information, it doesn't tell us anything about the variance of the
run times of the function. This change prints out the s^2 sample
standard deviation alongside the average.

This change adds one entry to the profile record structure. This
increases the memory footprint of the function profiler by 1/3 on a
32-bit system, and by 1/5 on a 64-bit system when function graphing is
enabled, though the memory is only allocated when the profiler is turned
on. During the profiling, one extra line of code adds the squared
calltime to the new record entry, so this should not adversly affect
performance.

Note that the square of the sample standard deviation is printed because
there is no sqrt implementation for unsigned long long in the kernel.
Signed-off-by: NChase Douglas <chase.douglas@canonical.com>
LKML-Reference: <1272304925-2436-1-git-send-email-chase.douglas@canonical.com>

[ fixed comment about ns^2 -> us^2 conversion ]
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

e330b3bc

ring-buffer: Make benchmark handle missed events · a838b2e6

由 Steven Rostedt 提交于 4月 27, 2010

With the addition of the "missed events" flags that is stored in the
commit field of the ring buffer page, the ring_buffer_benchmark
was not updated to handle this. If events are missed, then the
missed events flag is set in the ring buffer page, the benchmark
will count that flag as part of the size of the page and will hit the BUG()
when it tries to read beyond the page.

The solution is simply to have the ring buffer benchmark mask off
the extra bits.
Reported-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a838b2e6

ring-buffer: Make non-consuming read less expensive with lots of cpus. · 72c9ddfd

由 David Miller 提交于 4月 20, 2010

When performing a non-consuming read, a synchronize_sched() is
performed once for every cpu which is actively tracing.

This is very expensive, and can make it take several seconds to open
up the 'trace' file with lots of cpus.

Only one synchronize_sched() call is actually necessary.  What is
desired is for all cpus to see the disabling state change.  So we
transform the existing sequence:

	for_each_cpu() {
		ring_buffer_read_start();
	}

where each ring_buffer_start() call performs a synchronize_sched(),
into the following:

	for_each_cpu() {
		ring_buffer_read_prepare();
	}
	ring_buffer_read_prepare_sync();
	for_each_cpu() {
		ring_buffer_read_start();
	}

wherein only the single ring_buffer_read_prepare_sync() call needs to
do the synchronize_sched().

The first phase, via ring_buffer_read_prepare(), allocates the 'iter'
memory and increments ->record_disabled.

In the second phase, ring_buffer_read_prepare_sync() makes sure this
->record_disabled state is visible fully to all cpus.

And in the final third phase, the ring_buffer_read_start() calls reset
the 'iter' objects allocated in the first phase since we now know that
none of the cpus are adding trace entries any more.

This makes openning the 'trace' file nearly instantaneous on a
sparc64 Niagara2 box with 128 cpus tracing.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
LKML-Reference: <20100420.154711.11246950.davem@davemloft.net>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

72c9ddfd

tracing: Add graph output support for irqsoff tracer · 62b915f1

由 Jiri Olsa 提交于 4月 02, 2010

Add function graph output to irqsoff tracer.

The graph output is enabled by setting new 'display-graph' trace option.
Signed-off-by: NJiri Olsa <jolsa@redhat.com>
LKML-Reference: <1270227683-14631-4-git-send-email-jolsa@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

62b915f1

27 4月, 2010 2 次提交

tracing: Have graph flags passed in to ouput functions · d7a8d9e9

由 Jiri Olsa 提交于 4月 02, 2010

Let the function graph tracer have custom flags passed to its
output functions.
Signed-off-by: NJiri Olsa <jolsa@redhat.com>
LKML-Reference: <1270227683-14631-3-git-send-email-jolsa@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

d7a8d9e9

tracing: Add ftrace events for graph tracer · 9106b693

由 Jiri Olsa 提交于 4月 02, 2010

Add ftrace events for graph tracer, so the graph output could be shared
with other tracers.
Signed-off-by: NJiri Olsa <jolsa@redhat.com>
LKML-Reference: <1270227683-14631-2-git-send-email-jolsa@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

9106b693

22 4月, 2010 1 次提交

tracing: Dump either the oops's cpu source or all cpus buffers · cecbca96

由 Frederic Weisbecker 提交于 4月 18, 2010

The ftrace_dump_on_oops kernel parameter, sysctl and sysrq let one
dump every cpu buffers when an oops or panic happens.

It's nice when you have few cpus but it may take ages if have many,
plus you miss the real origin of the problem in all the cpu traces.

Sometimes, all you need is to dump the cpu buffer that triggered the
opps, most of the time it is our main interest.

This patch modifies ftrace_dump_on_oops to handle this choice.

The ftrace_dump_on_oops kernel parameter, when it comes alone, has
the same behaviour than before. But ftrace_dump_on_oops=orig_cpu
will only dump the buffer of the cpu that oops'ed.

Similarly, sysctl kernel.ftrace_dump_on_oops=1 and
echo 1 > /proc/sys/kernel/ftrace_dump_on_oops keep their previous
behaviour. But setting 2 jumps into cpu origin dump mode.

v2: Fix double setup
v3: Fix spelling issues reported by Randy Dunlap
v4: Also update __ftrace_dump in the selftests
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>

cecbca96

15 4月, 2010 1 次提交

tracing/kprobes: Support basic types on dynamic events · 93ccae7a

由 Masami Hiramatsu 提交于 4月 12, 2010

Support basic types of integer (u8, u16, u32, u64, s8, s16, s32, s64) in
kprobe tracer. With this patch, users can specify above basic types on
each arguments after ':'. If omitted, the argument type is set as
unsigned long (u32 or u64, arch-dependent).

 e.g.
  echo 'p account_system_time+0 hardirq_offset=%si:s32' > kprobe_events

  adds a probe recording hardirq_offset in signed-32bits value on the
  entry of account_system_time.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100412171708.3790.18599.stgit@localhost6.localdomain6>
Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

93ccae7a

05 4月, 2010 1 次提交

tracing: Fix uninitialized variable of tracing/trace output · aa27497c

由 Lai Jiangshan 提交于 4月 05, 2010

Because a local variable is not initialized, I got these
when I did 'cat tracing/trace'. (not trace_pipe):

CPU:0 [LOST 18446744071579453134 EVENTS]
ps-3099 [000] 560.770221: lock_acquire: ffff880030865010 &(&dentry->d_lock)->rlock
CPU:0 [LOST 18446744071579453134 EVENTS]
ps-3099 [000] 560.770221: lock_release: ffff880030865010 &(&dentry->d_lock)->rlock
CPU:0 [LOST 18446612133255294080 EVENTS]
ps-3099 [000] 560.770221: lock_acquire: ffff880030865010 &(&dentry->d_lock)->rlock
CPU:0 [LOST 18446744071579453134 EVENTS]
ps-3099 [000] 560.770222: lock_release: ffff880030865010 &(&dentry->d_lock)->rlock
CPU:0 [LOST 18446744071579453134 EVENTS]
ps-3099 [000] 560.770222: lock_release: ffffffff816cfb98 dcache_lock

See peek_next_entry(), it does not set *lost_events when we 'cat tracing/trace'
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4BB9A929.2000303@cn.fujitsu.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

aa27497c

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功