1. 21 Apr 2017, 1 commit
  2. 18 Apr 2017, 1 commit
  3. 25 Mar 2017, 4 commits
    • tracing: Move trace_handle_return() out of line · af0009fc
      Committed by Steven Rostedt (VMware)
      Currently trace_handle_return() looks like this:
      
       static inline enum print_line_t trace_handle_return(struct trace_seq *s)
       {
              return trace_seq_has_overflowed(s) ?
                      TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
       }
      
      Where trace_seq_has_overflowed(s) is:
      
       static inline bool trace_seq_has_overflowed(struct trace_seq *s)
       {
      	return s->full || seq_buf_has_overflowed(&s->seq);
       }
      
      And seq_buf_has_overflowed(&s->seq) is:
      
       static inline bool
       seq_buf_has_overflowed(struct seq_buf *s)
       {
      	return s->len > s->size;
       }
      
      Making trace_handle_return() into:
      
        return (s->full || (s->seq.len > s->seq.size)) ?
                 TRACE_TYPE_PARTIAL_LINE :
                 TRACE_TYPE_HANDLED;
      
      One would think this is not an issue to keep as an inline. But because this
      is used in the TRACE_EVENT() macro, it is expanded for every tracepoint in
      the system. Take a look at a single tracepoint, x86_irq_vector (the first
      one I chose at random). As trace_handle_return() is used in the
      TRACE_EVENT() macro's trace_raw_output_##call(), we disassemble
      trace_raw_output_x86_irq_vector and do a diff:
      
      - is the original
      + is the out-of-line code
      
      I removed lines that were identical apart from differing addresses.
      
      --- /tmp/irq-vec-orig	2017-03-16 09:12:48.569384851 -0400
      +++ /tmp/irq-vec-ool	2017-03-16 09:13:39.378153385 -0400
      @@ -6,27 +6,23 @@
              53                      push   %rbx
              48 89 fb                mov    %rdi,%rbx
              4c 8b a7 c0 20 00 00    mov    0x20c0(%rdi),%r12
              e8 f7 72 13 00          callq  ffffffff81155c80 <trace_raw_output_prep>
              83 f8 01                cmp    $0x1,%eax
              74 05                   je     ffffffff8101e993 <trace_raw_output_x86_irq_vector+0x23>
              5b                      pop    %rbx
              41 5c                   pop    %r12
              5d                      pop    %rbp
              c3                      retq
              41 8b 54 24 08          mov    0x8(%r12),%edx
      -       48 8d bb 98 10 00 00    lea    0x1098(%rbx),%rdi
      +       48 81 c3 98 10 00 00    add    $0x1098,%rbx
      -       48 c7 c6 7b 8a a0 81    mov    $0xffffffff81a08a7b,%rsi
      +       48 c7 c6 ab 8a a0 81    mov    $0xffffffff81a08aab,%rsi
      -       e8 c5 85 13 00          callq  ffffffff81156f70 <trace_seq_printf>
      
       === here's the start of the main difference ===
      
      +       48 89 df                mov    %rbx,%rdi
      +       e8 62 7e 13 00          callq  ffffffff81156810 <trace_seq_printf>
      -       8b 93 b8 20 00 00       mov    0x20b8(%rbx),%edx
      -       31 c0                   xor    %eax,%eax
      -       85 d2                   test   %edx,%edx
      -       75 11                   jne    ffffffff8101e9c8 <trace_raw_output_x86_irq_vector+0x58>
      -       48 8b 83 a8 20 00 00    mov    0x20a8(%rbx),%rax
      -       48 39 83 a0 20 00 00    cmp    %rax,0x20a0(%rbx)
      -       0f 93 c0                setae  %al
      +       48 89 df                mov    %rbx,%rdi
      +       e8 4a c5 12 00          callq  ffffffff8114af00 <trace_handle_return>
              5b                      pop    %rbx
      -       0f b6 c0                movzbl %al,%eax
      
       === end ===
      
              41 5c                   pop    %r12
              5d                      pop    %rbp
              c3                      retq
      
      If you notice, the original has 22 bytes more text than the out-of-line
      version. As this applies to every TRACE_EVENT() defined in the system, the
      total can become quite large.
      
         text	   data	    bss	    dec	    hex	filename
      8690305	5450490	1298432	15439227	 eb957b	vmlinux-orig
      8681725	5450490	1298432	15430647	 eb73f7	vmlinux-handle
      
      This change has a total of 8580 bytes in savings.
      
       $ objdump -dr /tmp/vmlinux-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
      324
      
      That's 324 tracepoints. But this does not include modules (which contain
      many more tracepoints). For an allyesconfig build:
      
       $ objdump -dr vmlinux-allyes-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
      1401
      
      That's 1401 tracepoints, giving us:
      
         text    data     bss     dec     hex filename
      137920629       140221067       53264384        331406080       13c0db00 vmlinux-allyes-orig
      137827709       140221067       53264384        331313160       13bf7008 vmlinux-allyes-handle
      
      92920 bytes in savings!!!
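
      For reference, a minimal sketch of the out-of-line form this patch moves
      to: only a declaration stays in the header, and the single shared body
      lives in a C file (the exact file placement shown is an assumption):

       /* include/linux/trace_events.h (assumed location): declaration only */
       enum print_line_t trace_handle_return(struct trace_seq *s);

       /* kernel/trace/trace_output.c (assumed location): one shared definition */
       enum print_line_t trace_handle_return(struct trace_seq *s)
       {
               return trace_seq_has_overflowed(s) ?
                       TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
       }
       EXPORT_SYMBOL_GPL(trace_handle_return);

      Every TRACE_EVENT() expansion then emits a single call instruction instead
      of the inlined comparison sequence shown in the diff above.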
      
      Link: http://lkml.kernel.org/r/20170315021431.13107-2-andi@firstfloor.org
      Reported-by: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ftrace: Have function tracing start in early boot up · dbeafd0d
      Committed by Steven Rostedt (VMware)
      Register the function tracer right after the tracing buffers are initialized
      in early boot up. This will allow function tracing to begin early if it is
      enabled via the kernel command line.
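
      For example, function tracing can then be requested from the kernel
      command line (ftrace= and ftrace_filter= are existing boot parameters;
      the filter value below is just an illustration):

       ftrace=function ftrace_filter=schedule*
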
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: Postpone tracer start-up tests till the system is more robust · 9afecfbb
      Committed by Steven Rostedt (VMware)
      As tracing can now be enabled very early in boot up, even before some
      critical system services (like scheduling), do not run the tracer selftests
      until after early_initcall() is performed. If a tracer is registered before
      such time, it is saved off in a list and the test is run when the system is
      able to handle more diverse functions.
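
      A condensed sketch of that idea (the struct and function names here are
      assumptions, not necessarily the patch's own):

       /* Tracers registered too early are parked on a list... */
       struct postponed_test {
               struct list_head        list;
               struct tracer           *type;
       };

       static LIST_HEAD(postponed_selftests);

       static int save_selftest(struct tracer *type)
       {
               struct postponed_test *t = kmalloc(sizeof(*t), GFP_KERNEL);

               if (!t)
                       return -ENOMEM;

               t->type = type;
               list_add(&t->list, &postponed_selftests);
               return 0;
       }

       /* ...and an early_initcall() later drains the list, running the
        * deferred selftest for each parked tracer. */
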
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: Split tracing initialization into two for early initialization · e725c731
      Committed by Steven Rostedt (VMware)
      Create an early_trace_init() function that will initialize the buffers and
      allow for earlier use of trace_printk(). This will also allow future work
      to have function tracing start earlier at boot up.
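
      A rough sketch of the shape of the split (responsibilities as described
      above; the bodies here are placeholders, not the patch's code):

       /* Called very early from start_kernel(): just enough for trace_printk() */
       void __init early_trace_init(void)
       {
               /* allocate and initialize the trace buffers */
       }

       /* Called later, once more of the system is up */
       void __init trace_init(void)
       {
               /* the rest of tracing initialization, e.g. trace events */
       }
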
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  4. 01 Mar 2017, 1 commit
  5. 17 Feb 2017, 1 commit
  6. 03 Feb 2017, 1 commit
  7. 01 Feb 2017, 1 commit
    • fs: Better permission checking for submounts · 93faccbb
      Committed by Eric W. Biederman
      To support unprivileged users mounting filesystems, two permission
      checks have to be performed: a test to see if the user is allowed to
      create a mount in the mount namespace, and a test to see if
      the user is allowed to access the specified filesystem.
      
      The automount case is special in that mounting the original filesystem
      grants permission to mount the sub-filesystems to any user who
      happens to stumble across their mountpoint and satisfies the
      ordinary filesystem permission checks.
      
      Attempting to handle the automount case by using override_creds
      almost works.  It preserves the idea that permission to mount
      the original filesystem is permission to mount the sub-filesystem.
      Unfortunately, using override_creds messes up the filesystem's
      ordinary permission checks.
      
      Solve this by being explicit that a mount is a submount by introducing
      vfs_submount, and using it where appropriate.
      
      vfs_submount uses a new internal mount flag, MS_SUBMOUNT, to let
      sget and friends know that a mount is a submount so they can take
      appropriate action.
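
      A plausible minimal form of such a helper (a sketch based on the
      description above; the real body may differ):

       struct vfsmount *vfs_submount(const struct dentry *mountpoint,
                                     struct file_system_type *type,
                                     const char *name, void *data)
       {
               /*
                * MS_SUBMOUNT lets sget and friends skip the ordinary
                * mount permission checks for this mount. The mountpoint
                * is passed down so future checks can make use of it.
                */
               return vfs_kern_mount(type, MS_SUBMOUNT, name, data);
       }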
      
      sget and sget_userns are modified to not perform any permission checks
      on submounts.
      
      follow_automount is modified to stop using override_creds as that
      has proven problematic.
      
      do_mount is modified to always remove the new MS_SUBMOUNT flag so
      that we know userspace will never be able to specify it.
      
      autofs4 is modified to stop using current_real_cred(), which was put in
      there to handle the previous version of submount permission checking.
      
      cifs is modified to pass the mountpoint all of the way down to vfs_submount.
      
      debugfs is modified to pass the mountpoint all of the way down to
      trace_automount by adding a new parameter.  To make this change easier
      a new typedef debugfs_automount_t is introduced to capture the type of
      the debugfs automount function.
      
      Cc: stable@vger.kernel.org
      Fixes: 069d5ac9 ("autofs: Fix automounts by using current_real_cred()->uid")
      Fixes: aeaa4a79 ("fs: Call d_automount with the filesystems creds")
      Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Reviewed-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  8. 25 Dec 2016, 1 commit
  9. 13 Dec 2016, 1 commit
  10. 09 Dec 2016, 1 commit
    • tracing: Replace kmap with copy_from_user() in trace_marker writing · 656c7f0d
      Committed by Steven Rostedt (Red Hat)
      Instead of using get_user_pages_fast() and kmap_atomic() when writing
      to the trace_marker file, just allocate enough space on the ring buffer
      directly, and write into it via copy_from_user().
      
      Writing into the trace_marker file used to allocate a temporary buffer
      to perform the copy_from_user(), as we didn't want to write into the
      ring buffer if the copy failed. But as a trace_marker write is supposed
      to be extremely fast, and allocating memory causes other tracepoints to
      trigger, Peter Zijlstra suggested using get_user_pages_fast() and
      kmap_atomic() to keep the user space pages in memory and read them
      directly. But Henrik Austad had issues with this because it required
      taking the mm->mmap_sem, causing long delays with the write.
      
      Instead, just allocate the space in the ring buffer and use
      copy_from_user() directly. If it faults, return -EFAULT and write
      "<faulted>" into the ring buffer.
      
      Link: http://lkml.kernel.org/r/20161208124018.72dd0f86@gandalf.local.home
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Henrik Austad <henrik@austad.us>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Updates: d696b58c "tracing: Do not allocate buffer for trace_marker"
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  11. 02 Dec 2016, 1 commit
  12. 30 Nov 2016, 1 commit
  13. 24 Nov 2016, 3 commits
  14. 23 Nov 2016, 1 commit
  15. 16 Nov 2016, 1 commit
  16. 15 Nov 2016, 1 commit
  17. 26 Sep 2016, 1 commit
  18. 25 Sep 2016, 1 commit
  19. 12 Sep 2016, 1 commit
  20. 03 Sep 2016, 1 commit
    • tracing: Added hardware latency tracer · e7c15cd8
      Committed by Steven Rostedt (Red Hat)
      The hardware latency tracer has been in the PREEMPT_RT patch for some time.
      It is used to detect possible SMIs or any other hardware interruptions that
      the kernel is unaware of. Note that NMIs may also be detected, which is
      useful information in itself.
      
      The logic is pretty simple. It simply creates a thread that spins on a
      single CPU for a specified amount of time (width) within a periodic window
      (window). These numbers may be adjusted via their corresponding files in
      
         /sys/kernel/tracing/hwlat_detector/
      
      The defaults are window = 1000000 us (1 second)
                       width  =  500000 us (1/2 second)
      
      The loop consists of:
      
      	t1 = trace_clock_local();
      	t2 = trace_clock_local();
      
      Where trace_clock_local() is a variant of sched_clock().
      
      The difference t2 - t1 is recorded as the "inner" latency, and the
      difference t1 - prev_t2 is recorded as the "outer" latency. If either of
      these differences is greater than the time set in
      /sys/kernel/tracing/tracing_thresh, the event is recorded.
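
      The heart of the detector is therefore roughly the following (a condensed
      sketch; the helper and variable names are assumptions):

       u64 t1, t2, prev_t2 = 0;
       unsigned long end_of_width = jiffies + usecs_to_jiffies(width);

       while (time_before(jiffies, end_of_width)) {
               t1 = trace_clock_local();
               t2 = trace_clock_local();
               if (prev_t2) {
                       u64 outer = t1 - prev_t2;  /* gap between sample pairs */
                       u64 inner = t2 - t1;       /* gap inside one sample pair */

                       if (inner > thresh || outer > thresh)
                               record_hwlat_sample(inner, outer); /* hypothetical */
               }
               prev_t2 = t2;
       }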
      
      When this tracer is started, and tracing_thresh is zero, it changes to the
      default threshold of 10 us.
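
      Usage then follows the usual tracefs pattern, e.g. (a sketch):

       # echo hwlat > /sys/kernel/tracing/current_tracer
       # echo 100 > /sys/kernel/tracing/tracing_thresh
       # cat /sys/kernel/tracing/trace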
      
      The hwlat tracer in the PREEMPT_RT patch was originally written by
      Jon Masters. I have modified it quite a bit and turned it into a
      tracer.
      Based-on-code-by: Jon Masters <jcm@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  21. 24 Aug 2016, 1 commit
  22. 05 Jul 2016, 2 commits
  23. 24 Jun 2016, 1 commit
    • tracing: Skip more functions when doing stack tracing of events · be54f69c
      Committed by Steven Rostedt (Red Hat)
       # echo 1 > options/stacktrace
       # echo 1 > events/sched/sched_switch/enable
       # cat trace
                <idle>-0     [002] d..2  1982.525169: <stack trace>
       => save_stack_trace
       => __ftrace_trace_stack
       => trace_buffer_unlock_commit_regs
       => event_trigger_unlock_commit
       => trace_event_buffer_commit
       => trace_event_raw_event_sched_switch
       => __schedule
       => schedule
       => schedule_preempt_disabled
       => cpu_startup_entry
       => start_secondary
      
      The above shows that we are seeing 6 functions before ever making it to the
      caller of the sched_switch event.
      
       # echo stacktrace > events/sched/sched_switch/trigger
       # cat trace
                <idle>-0     [002] d..3  2146.335208: <stack trace>
       => trace_event_buffer_commit
       => trace_event_raw_event_sched_switch
       => __schedule
       => schedule
       => schedule_preempt_disabled
       => cpu_startup_entry
       => start_secondary
      
      The stacktrace trigger isn't as bad, because it adds its own skip to the
      stacktracing, but it still shows two extra functions.
      
      One issue is that if the stacktrace passes its own "regs" then there should
      be no addition to the skip, as the regs will not include the functions being
      called. This was fixed by commit 7717c6be ("tracing: Fix stacktrace
      skip depth in trace_buffer_unlock_commit_regs()"), as adding the skip
      number for kprobes made the probes not have any stack at all.
      
      But since this is only an issue when regs is being used, a skip should be
      added if regs is NULL. Now we have:
      
       # echo 1 > options/stacktrace
       # echo 1 > events/sched/sched_switch/enable
       # cat trace
                <idle>-0     [000] d..2  1297.676333: <stack trace>
       => __schedule
       => schedule
       => schedule_preempt_disabled
       => cpu_startup_entry
       => rest_init
       => start_kernel
       => x86_64_start_reservations
       => x86_64_start_kernel
      
       # echo stacktrace > events/sched/sched_switch/trigger
       # cat trace
                <idle>-0     [002] d..3  1370.759745: <stack trace>
       => __schedule
       => schedule
       => schedule_preempt_disabled
       => cpu_startup_entry
       => start_secondary
      
      And kprobes are not touched.
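
      The shape of the fix is roughly the following (a sketch; the constant 4
      stands in for the four commit-path helpers listed in the first trace
      above):

       if (!regs)
               skip += 4;      /* skip the commit-path helpers; regs-based
                                * callers such as kprobes are left alone */
       __ftrace_trace_stack(buffer, flags, skip, pc, regs);
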
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  24. 20 Jun 2016, 5 commits
  25. 04 May 2016, 1 commit
    • tracing: Use temp buffer when filtering events · 0fc1b09f
      Committed by Steven Rostedt (Red Hat)
      Filtering of events requires the data to be written to the ring buffer
      before it can be decided whether to filter the event or not. This is
      because the filter's parameters are based on the result that is written
      to the ring buffer and not on the parameters that are passed into the
      trace functions.
      
      The ftrace ring buffer is optimized for writing into the ring buffer and
      committing. The discard procedure, used when the filter decides the event
      should be discarded, is much more heavyweight. Thus, using a temporary
      buffer when filtering events can speed things up drastically.
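
      In outline, the idea is the following (a sketch; the helper names other
      than filter_match_preds() and ring_buffer_lock_reserve() are assumptions):

       if (filtering_enabled) {
               entry = reserve_in_temp_buffer(size);      /* per-CPU scratch */
               write_event_fields(entry);
               if (!filter_match_preds(file->filter, entry))
                       return;  /* discard is free: ring buffer untouched */
               copy_into_ring_buffer(buffer, entry, size); /* the extra copy */
       } else {
               entry = ring_buffer_lock_reserve(buffer, size);
               write_event_fields(entry);
               commit(buffer, entry);
       }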
      
      Without a temp buffer we have:
      
       # trace-cmd start -p nop
       # perf stat -r 10 hackbench 50
             0.790706626 seconds time elapsed ( +-  0.71% )
      
       # trace-cmd start -e all
       # perf stat -r 10 hackbench 50
             1.566904059 seconds time elapsed ( +-  0.27% )
      
       # trace-cmd start -e all -f 'common_preempt_count==20'
       # perf stat -r 10 hackbench 50
             1.690598511 seconds time elapsed ( +-  0.19% )
      
       # trace-cmd start -e all -f 'common_preempt_count!=20'
       # perf stat -r 10 hackbench 50
             1.707486364 seconds time elapsed ( +-  0.30% )
      
      The first run above is without any tracing, just to get a baseline figure.
      hackbench takes ~0.79 seconds to run on the system.
      
      The second run enables tracing all events where nothing is filtered. This
      increases the time by 100% and hackbench takes 1.57 seconds to run.
      
      The third run filters all events where the preempt count equals "20"
      (which should never happen), so all events are discarded. This takes 1.69
      seconds to run, which is 10% slower than just committing the events!
      
      The last run enables all events with a filter that will commit every
      event, and this takes 1.70 seconds to run. The filtering overhead is
      approximately 10%. Thus, discarding an event and committing an event from
      the ring buffer may take about the same time.
      
      With this patch, the numbers change:
      
       # trace-cmd start -p nop
       # perf stat -r 10 hackbench 50
             0.778233033 seconds time elapsed ( +-  0.38% )
      
       # trace-cmd start -e all
       # perf stat -r 10 hackbench 50
             1.582102692 seconds time elapsed ( +-  0.28% )
      
       # trace-cmd start -e all -f 'common_preempt_count==20'
       # perf stat -r 10 hackbench 50
             1.309230710 seconds time elapsed ( +-  0.22% )
      
       # trace-cmd start -e all -f 'common_preempt_count!=20'
       # perf stat -r 10 hackbench 50
             1.786001924 seconds time elapsed ( +-  0.20% )
      
      The first run is again the base with no tracing.
      
      The second run is all tracing with no filtering. It is a little slower, but
      that may be well within the noise.
      
      The third run shows that discarding all events only took 1.3 seconds. This
      is a speed up of 23%! The discard is much faster than even the commit.
      
      The one downside is shown in the last run. Events that are not discarded
      by the filter take longer to add; this is due to the extra copy of the
      event.
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  26. 30 Apr 2016, 5 commits