提交 · 60b1af3300724d211bb0b420c1fbe6bf5b87b013 · openeuler / Kernel

16 7月, 2016 1 次提交

tracing: Use outer () on __get_str() definition · 934de5f2

由 Daniel Bristot de Oliveira 提交于 7月 01, 2016

__get_str(str)'s definition includes a (char *) operator
overloading that is not protected with outer ().

This patch adds () around __get_str()'s definition, enabling
some code cleanup.

Link: http://lkml.kernel.org/r/20ac1a10c2ec4ccd23e4a8ef34101fb6e4157d37.1467407618.git.bristot@redhat.com

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NDaniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

934de5f2

08 4月, 2016 1 次提交

perf: remove unused __addr variable · e93735be

由 Alexei Starovoitov 提交于 4月 06, 2016

now all calls to perf_trace_buf_submit() pass 0 as 4th
argument which will be repurposed in the next patch which will
change the meaning of 1st arg of perf_tp_event() to event_type
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e93735be

24 12月, 2015 1 次提交

tracing: Introduce TRACE_EVENT_FN_COND macro · 2701121b

由 Denis Kirjanov 提交于 12月 14, 2015

TRACE_EVENT_FN can't be used in some circumstances
like invoking trace functions from offlined CPU due
to RCU usage.

This patch adds the TRACE_EVENT_FN_COND macro
to make such trace points conditional.

Link: http://lkml.kernel.org/r/1450124286-4822-1-git-send-email-kda@linux-powerpc.orgSigned-off-by: NDenis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2701121b

23 9月, 2015 1 次提交

tracing: Move non perf code out of perf.h · 46ac5182

由 Steven Rostedt (Red Hat) 提交于 9月 23, 2015

Commit ee53bbd1 "tracing: Move the perf code out of trace_event.h" moved
more than just the perf code out of trace_event.h, but also removed a bit of
the tracing code too. Move it back.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

46ac5182

14 5月, 2015 11 次提交

tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call() · d0ee8f4a

由 Steven Rostedt (Red Hat) 提交于 5月 13, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The function ftrace_get_offsets_##call()
is used to find the offset into dynamically allocated trace event fields
for printing. It has nothing to do with function tracing. Rename it.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

d0ee8f4a

tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call() · 33d0f35e

由 Steven Rostedt (Red Hat) 提交于 5月 13, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The function ftrace_define_fields_##call()
is used to define how to process the trace_event fields. It has nothing to
do with function tracing. Rename it.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

33d0f35e

tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call · 3ad017ba

由 Steven Rostedt (Red Hat) 提交于 5月 13, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structure ftrace_event_type_funcs_##call
is used to define how the trace_events will be printed. It has nothing to
do with function tracing. Rename it.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

3ad017ba

tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call · 62323a14

由 Steven Rostedt (Red Hat) 提交于 5月 13, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structure ftrace_data_offset_##call is
used to find the offsets of dynamically allocated fields in trace_events.
It has nothing to do with function tracing. Rename it.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

62323a14

tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call · a7237765

由 Steven Rostedt (Red Hat) 提交于 5月 13, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The ftrace_raw_##call structures are built
by macros for trace events. They have nothing to do with function tracing.
Rename them.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a7237765

tracing: Rename ftrace_output functions to trace_output · 892c505a

由 Steven Rostedt (Red Hat) 提交于 5月 05, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The ftrace_output_*() and ftrace_raw_output_*()
functions represent the trace_event code. Rename them to just trace_output
or trace_raw_output.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

892c505a

tracing: Rename ftrace_event_{call,class} to trace_event_{call,class} · 2425bcb9

由 Steven Rostedt (Red Hat) 提交于 5月 05, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structures ftrace_event_call and
ftrace_event_class have nothing to do with the function hooks, and are
really trace_event structures. Rename ftrace_event_* to trace_event_*.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2425bcb9

tracing: Rename ftrace_print_*() functions ta trace_print_*() · 645df987

由 Steven Rostedt (Red Hat) 提交于 5月 04, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The functions ftrace_print_*() are not part of
the function infrastructure, and the names can be confusing. Rename them
to be trace_print_*().
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

645df987

tracing: Rename ftrace_event.h to trace_events.h · af658dca

由 Steven Rostedt (Red Hat) 提交于 4月 29, 2015

The term "ftrace" is really the infrastructure of the function hooks,
and not the trace events. Rename ftrace_event.h to trace_events.h to
represent the trace_event infrastructure and decouple the term ftrace
from it.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

af658dca

tracing: Move the perf code out of trace_event.h · ee53bbd1

由 Steven Rostedt (Red Hat) 提交于 4月 29, 2015

The trace_event.h file is for the generic trace event code. Move
the perf related code into its own trace header file perf.h
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

ee53bbd1

tracing: Rename trace/ftrace.h to trace/trace_events.h · 2167ae72

由 Steven Rostedt (Red Hat) 提交于 4月 29, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the TRACE_EVENT() macros. The file trace/ftrace.h was originally
written to be mostly focused toward the "ftrace" code (that in kernel/trace/)
but ended up being generic and used by perf and others.

Rename the file to be less confusing about what infrastructure it belongs to.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2167ae72

08 4月, 2015 2 次提交

tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values · 0c564a53

由 Steven Rostedt (Red Hat) 提交于 3月 24, 2015

Several tracepoints use the helper functions __print_symbolic() or
__print_flags() and pass in enums that do the mapping between the
binary data stored and the value to print. This works well for reading
the ASCII trace files, but when the data is read via userspace tools
such as perf and trace-cmd, the conversion of the binary value to a
human string format is lost if an enum is used, as userspace does not
have access to what the ENUM is.

For example, the tracepoint trace_tlb_flush() has:

 __print_symbolic(REC->reason,
    { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
    { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
    { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
    { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })

Which maps the enum values to the strings they represent. But perf and
trace-cmd do no know what value TLB_LOCAL_MM_SHOOTDOWN is, and would
not be able to map it.

With TRACE_DEFINE_ENUM(), developers can place these in the event header
files and ftrace will convert the enums to their values:

By adding:

 TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
 TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
 TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
 TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);

 $ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format
[...]
 __print_symbolic(REC->reason,
    { 0, "flush on task switch" },
    { 1, "remote shootdown" },
    { 2, "local shootdown" },
    { 3, "local mm shootdown" })

The above is what userspace expects to see, and tools do not need to
be modified to parse them.

Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org

Cc: Guilherme Cox <cox@computer.org>
Cc: Tony Luck <tony.luck@gmail.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Acked-by: NNamhyung Kim <namhyung@kernel.org>
Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

0c564a53

tracing: Give system name a pointer · acd388fd

由 Steven Rostedt (Red Hat) 提交于 3月 31, 2015

Normally the compiler will use the same pointer for a string throughout
the file. But there's no guarantee of that happening. Later changes will
require that all events have the same pointer to the system string.

Name the system string and have all events point to it.

Testing this, it did not increases the size of the text, except for the
notes section, which should not harm the real size any.

Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.orgReviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

acd388fd

28 1月, 2015 1 次提交

tracing: Add array printing helper · 6ea22486

由 Dave Martin 提交于 1月 28, 2015

If a trace event contains an array, there is currently no standard
way to format this for text output.  Drivers are currently hacking
around this by a) local hacks that use the trace_seq functionailty
directly, or b) just not printing that information.  For fixed size
arrays, formatting of the elements can be open-coded, but this gets
cumbersome for arrays of non-trivial size.

These approaches result in non-standard content of the event format
description delivered to userspace, so userland tools needs to be
taught to understand and parse each array printing method
individually.

This patch implements a __print_array() helper that tracepoint
implementations can use instead of reinventing it.  A simple C-style
syntax is used to delimit the array and its elements {like,this}.

So that the helpers can be used with large static arrays as well as
dynamic arrays, they take a pointer and element count: they can be
used with __get_dynamic_array() for use with dynamic arrays.
Link: http://lkml.kernel.org/r/1422449335-8289-2-git-send-email-javi.merino@arm.com

Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Signed-off-by: NJavi Merino <javi.merino@arm.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

6ea22486

14 1月, 2015 1 次提交

perf: Avoid horrible stack usage · 86038c5e

由 Peter Zijlstra (Intel) 提交于 12月 16, 2014

Both Linus (most recent) and Steve (a while ago) reported that perf
related callbacks have massive stack bloat.

The problem is that software events need a pt_regs in order to
properly report the event location and unwind stack. And because we
could not assume one was present we allocated one on stack and filled
it with minimal bits required for operation.

Now, pt_regs is quite large, so this is undesirable. Furthermore it
turns out that most sites actually have a pt_regs pointer available,
making this even more onerous, as the stack space is pointless waste.

This patch addresses the problem by observing that software events
have well defined nesting semantics, therefore we can use static
per-cpu storage instead of on-stack.

Linus made the further observation that all but the scheduler callers
of perf_sw_event() have a pt_regs available, so we change the regular
perf_sw_event() to require a valid pt_regs (where it used to be
optional) and add perf_sw_event_sched() for the scheduler.

We have a scheduler specific call instead of a more generic _noregs()
like construct because we can assume non-recursion from the scheduler
and thereby simplify the code further (_noregs would have to put the
recursion context call inline in order to assertain which __perf_regs
element to use).

One last note on the implementation of perf_trace_buf_prepare(); we
allow .regs = NULL for those cases where we already have a pt_regs
pointer available and do not need another.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Reported-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

86038c5e

20 11月, 2014 2 次提交

tracing: Fix return value of ftrace_raw_output_prep() · 8e2e095c

由 Steven Rostedt (Red Hat) 提交于 11月 14, 2014

If the trace_seq of ftrace_raw_output_prep() is full this function
returns TRACE_TYPE_PARTIAL_LINE, otherwise it returns zero.

The problem is that TRACE_TYPE_PARTIAL_LINE happens to be zero!

The thing is, the caller of ftrace_raw_output_prep() expects a
success to be zero. Change that to expect it to be
TRACE_TYPE_HANDLED.

Link: http://lkml.kernel.org/r/20141114112522.GA2988@dhcp128.suse.czReminded-by: NPetr Mladek <pmladek@suse.cz>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

8e2e095c

tracing: Add trace_seq_has_overflowed() and trace_handle_return() · 19a7fe20

由 Steven Rostedt (Red Hat) 提交于 11月 12, 2014

Adding a trace_seq_has_overflowed() which returns true if the trace_seq
had too much written into it allows us to simplify the code.

Instead of checking the return value of every call to trace_seq_printf()
and friends, they can all be called normally, and at the end we can
return !trace_seq_has_overflowed() instead.

Several functions also return TRACE_TYPE_PARTIAL_LINE when the trace_seq
overflowed and TRACE_TYPE_HANDLED otherwise. Another helper function
was created called trace_handle_return() which takes a trace_seq and
returns these enums. Using this helper function also simplifies the
code.

This change also makes it possible to remove the return values of
trace_seq_printf() and friends. They should instead just be
void functions.

Link: http://lkml.kernel.org/r/20141114011410.365183157@goodmis.orgReviewed-by: NPetr Mladek <pmladek@suse.cz>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

19a7fe20

21 6月, 2014 1 次提交

tracing: Add __field_struct macro for TRACE_EVENT() · 4d4c9cc8

由 Steven Rostedt 提交于 6月 17, 2014

Currently the __field() macro in TRACE_EVENT is only good for primitive
values, such as integers and pointers, but it fails on complex data types
such as structures or unions. This is because the __field() macro
determines if the variable is signed or not with the test of:

  (((type)(-1)) < (type)1)

Unfortunately, that fails when type is a structure.

Since trace events should support structures as fields a new macro
is created for such a case called __field_struct() which acts exactly
the same as __field() does but it does not do the signed type check
and just uses a constant false for that answer.

Cc: Tony Luck <tony.luck@gmail.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

4d4c9cc8

05 6月, 2014 1 次提交

tracing: Add __get_dynamic_array_len() macro for trace events · beba4bb0

由 Steven Rostedt (Red Hat) 提交于 6月 04, 2014

If a trace event uses a dynamic array for something other than a string
then there's currently no way the TP_printk() can figure out what size
it is. A __get_dynamic_array_len() is required to know the length.

This also simplifies the __get_bitmask() macro which required it as well,
but instead just hardcoded it.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

beba4bb0

15 5月, 2014 1 次提交

tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks · 4449bf92

由 Steven Rostedt (Red Hat) 提交于 5月 06, 2014

Being able to show a cpumask of events can be useful as some events
may affect only some CPUs. There is no standard way to record the
cpumask and converting it to a string is rather expensive during
the trace as traces happen in hotpaths. It would be better to record
the raw event mask and be able to parse it at print time.

The following macros were added for use with the TRACE_EVENT() macro:

  __bitmask()
  __assign_bitmask()
  __get_bitmask()

To test this, I added this to the sched_migrate_task event, which
looked like this:

TRACE_EVENT(sched_migrate_task,

	TP_PROTO(struct task_struct *p, int dest_cpu, const struct cpumask *cpus),

	TP_ARGS(p, dest_cpu, cpus),

	TP_STRUCT__entry(
		__array(	char,	comm,	TASK_COMM_LEN	)
		__field(	pid_t,	pid			)
		__field(	int,	prio			)
		__field(	int,	orig_cpu		)
		__field(	int,	dest_cpu		)
		__bitmask(	cpumask, num_possible_cpus()	)
	),

	TP_fast_assign(
		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
		__entry->pid		= p->pid;
		__entry->prio		= p->prio;
		__entry->orig_cpu	= task_cpu(p);
		__entry->dest_cpu	= dest_cpu;
		__assign_bitmask(cpumask, cpumask_bits(cpus), num_possible_cpus());
	),

	TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d cpumask=%s",
		  __entry->comm, __entry->pid, __entry->prio,
		  __entry->orig_cpu, __entry->dest_cpu,
		  __get_bitmask(cpumask))
);

With the output of:

        ksmtuned-3613  [003] d..2   485.220508: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=3 dest_cpu=2 cpumask=00000000,0000000f
     migration/1-13    [001] d..5   485.221202: sched_migrate_task: comm=ksmtuned pid=3614 prio=120 orig_cpu=1 dest_cpu=0 cpumask=00000000,0000000f
             awk-3615  [002] d.H5   485.221747: sched_migrate_task: comm=rcu_preempt pid=7 prio=120 orig_cpu=0 dest_cpu=1 cpumask=00000000,000000ff
     migration/2-18    [002] d..5   485.222062: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=2 dest_cpu=3 cpumask=00000000,0000000f

Link: http://lkml.kernel.org/r/1399377998-14870-6-git-send-email-javi.merino@arm.com
Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.homeSuggested-by: NJavi Merino <javi.merino@arm.com>
Tested-by: NJavi Merino <javi.merino@arm.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

4449bf92

10 4月, 2014 1 次提交

tracing: Fix anonymous unions in struct ftrace_event_call · abb43f69

由 Mathieu Desnoyers 提交于 4月 09, 2014

gcc <= 4.5.x has significant limitations with respect to initialization
of anonymous unions within structures. They need to be surrounded by
brackets, _and_ they need to be initialized in the same order in which
they appear in the structure declaration.

Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676
Link: http://lkml.kernel.org/r/1397077568-3156-1-git-send-email-mathieu.desnoyers@efficios.comSigned-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

abb43f69

09 4月, 2014 1 次提交

tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints · de7b2973

由 Mathieu Desnoyers 提交于 4月 08, 2014

Register/unregister tracepoint probes with struct tracepoint pointer
rather than tracepoint name.

This change, which vastly simplifies tracepoint.c, has been proposed by
Steven Rostedt. It also removes 8.8kB (mostly of text) to the vmlinux
size.

From this point on, the tracers need to pass a struct tracepoint pointer
to probe register/unregister. A probe can now only be connected to a
tracepoint that exists. Moreover, tracers are responsible for
unregistering the probe before the module containing its associated
tracepoint is unloaded.

   text    data     bss     dec     hex filename
10443444        4282528 10391552        25117524        17f4354 vmlinux.orig
10434930        4282848 10391552        25109330        17f2352 vmlinux

Link: http://lkml.kernel.org/r/1396992381-23785-2-git-send-email-mathieu.desnoyers@efficios.com

CC: Ingo Molnar <mingo@kernel.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Frank Ch. Eigler <fche@redhat.com>
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
[ SDR - fixed return val in void func in tracepoint_module_going() ]
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

de7b2973

22 3月, 2014 1 次提交

Revert "tracing: Move event storage for array from macro to standalone function" · bc4c426e

由 Steven Rostedt (Red Hat) 提交于 3月 21, 2014

I originally wrote commit 35bb4399 to shrink the size of the overhead of
tracepoints by several kilobytes. Later, I received a patch from Vaibhav
Nagarnaik that fixed a bug in the same code that this commit touches. Not
only did it fix a bug, it also removed code and shrunk the size of the
overhead of trace events even more than this commit did.

Since this commit is scheduled for 3.15 and Vaibhav's patch is already in
mainline, I need to revert this patch in order to keep it from conflicting
with Vaibhav's patch. Not to mention, Vaibhav's patch makes this patch
obsolete.

Link: http://lkml.kernel.org/r/20140320225637.0226041b@gandalf.local.home

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

bc4c426e

21 3月, 2014 1 次提交

tracing: Fix array size mismatch in format string · 87291347

由 Vaibhav Nagarnaik 提交于 2月 13, 2014

In event format strings, the array size is reported in two locations.
One in array subscript and then via the "size:" attribute. The values
reported there have a mismatch.

For e.g., in sched:sched_switch the prev_comm and next_comm character
arrays have subscript values as [32] where as the actual field size is
16.

name: sched_switch
ID: 301
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1;signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:char prev_comm[32];       offset:8;       size:16;        signed:1;
        field:pid_t prev_pid;   offset:24;      size:4; signed:1;
        field:int prev_prio;    offset:28;      size:4; signed:1;
        field:long prev_state;  offset:32;      size:8; signed:1;
        field:char next_comm[32];       offset:40;      size:16;        signed:1;
        field:pid_t next_pid;   offset:56;      size:4; signed:1;
        field:int next_prio;    offset:60;      size:4; signed:1;

After bisection, the following commit was blamed:
92edca07 tracing: Use direct field, type and system names

This commit removes the duplication of strings for field->name and
field->type assuming that all the strings passed in
__trace_define_field() are immutable. This is not true for arrays, where
the type string is created in event_storage variable and field->type for
all array fields points to event_storage.

Use __stringify() to create a string constant for the type string.

Also, get rid of event_storage and event_storage_mutex that are not
needed anymore.

also, an added benefit is that this reduces the overhead of events a bit more:

   text    data     bss     dec     hex filename
8424787 2036472 1302528 11763787         b3804b vmlinux
8420814 2036408 1302528 11759750         b37086 vmlinux.patched

Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com

Cc: Laurent Chavey <chavey@google.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

87291347

07 3月, 2014 5 次提交

tracing: Evaluate len expression only once in __dynamic_array macro · 114e7b52

由 Filipe Brandenburger 提交于 2月 28, 2014

Use a temporary variable to store the expansion of the len expression.
If the evaluation is expensive, this commit will ensure it is evaluated
only once inside ftrace_get_offsets_<call>.

Link: http://lkml.kernel.org/r/1393651938-16418-3-git-send-email-filbranden@google.comSigned-off-by: NFilipe Brandenburger <filbranden@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

114e7b52

tracing: Correctly expand len expressions from __dynamic_array macro · 5859fa1a

由 Filipe Brandenburger 提交于 2月 28, 2014

This fixes expansion of the len argument in __dynamic_array macros.
The previous code from commit 7d536cb3 would not fully evaluate the
expression before multiplying its result by the size of the type.

This went unnoticed because the length stored in the high 16 bits of the
offset (which is the one that was broken here) is only used by
filter_pred_strloc which only acts on strings for which the size of the
type is 1.

Link: http://lkml.kernel.org/r/1393651938-16418-2-git-send-email-filbranden@google.comSigned-off-by: NFilipe Brandenburger <filbranden@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

5859fa1a

tracing: Use helper functions in event assignment to shrink macro size · 3fd40d1e

由 Steven Rostedt 提交于 8月 09, 2012

The functions that assign the contents for the ftrace events are
defined by the TRACE_EVENT() macros. Each event has its own unique
way to assign data to its buffer. When you have over 500 events,
that means there's 500 functions assigning data uniquely for each
event (not really that many, as DECLARE_EVENT_CLASS() and multiple
DEFINE_EVENT()s will only need a single function).

By making helper functions in the core kernel to do some of the work
instead, we can shrink the size of the kernel down a bit.

With a kernel configured with 502 events, the change in size was:

text data bss dec hex filename
12987390 1913504 9785344 24686238 178ae9e /tmp/vmlinux
12959102 1913504 9785344 24657950 178401e /tmp/vmlinux.patched

That's a total of 28288 bytes, which comes down to 56 bytes per event.

Link: http://lkml.kernel.org/r/20120810034708.370808175@goodmis.orgSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>

3fd40d1e

tracing: Move event storage for array from macro to standalone function · 35bb4399

由 Steven Rostedt 提交于 8月 09, 2012

The code that shows array fields for events is defined for all events.
This can add up quite a bit when you have over 500 events.

By making helper functions in the core kernel to do the work
instead, we can shrink the size of the kernel down a bit.

With a kernel configured with 502 events, the change in size was:

text data bss dec hex filename
12990946 1913568 9785344 24689858 178bcc2 /tmp/vmlinux
12987390 1913504 9785344 24686238 178ae9e /tmp/vmlinux.patched

That's a total of 3556 bytes, which comes down to 7 bytes per event.
Although it's not much, this code is just called at initialization of
the events.

Link: http://lkml.kernel.org/r/20120810034708.084036335@goodmis.orgSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>

35bb4399

tracing: Move raw output code from macro to standalone function · 1d6bae96

由 Steven Rostedt 提交于 8月 09, 2012

The code for trace events to format the raw recorded event data
into human readable format in the 'trace' file is repeated for every
event in the system. When you have over 500 events, this can add up
quite a bit.

By making helper functions in the core kernel to do the work
instead, we can shrink the size of the kernel down a bit.

With a kernel configured with 502 events, the change in size was:

   text    data     bss     dec     hex filename
12991007        1913568 9785344 24689919        178bcff /tmp/vmlinux.orig
12990946        1913568 9785344 24689858        178bcc2 /tmp/vmlinux.patched

Note, this version does not save as much as the version of this patch
I had a few years ago. That is because in the mean time, commit
f71130de ("tracing: Add a helper function for event print functions")
did a lot of the work my original patch did. But this change helps
slightly, and is part of a larger clean up to reduce the size much further.

Link: http://lkml.kernel.org/r/20120810034707.378538034@goodmis.org

Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

1d6bae96

10 1月, 2014 1 次提交

tracing: Consolidate event trigger code · 13a1e4ae

由 Steven Rostedt (Red Hat) 提交于 1月 06, 2014

The event trigger code that checks for callback triggers before and
after recording of an event has lots of flags checks. This code is
duplicated throughout the ftrace events, kprobes and system calls.
They all do the exact same checks against the event flags.

Added helper functions ftrace_trigger_soft_disabled(),
event_trigger_unlock_commit() and event_trigger_unlock_commit_regs()
that consolidated the code and these are used instead.

Link: http://lkml.kernel.org/r/20140106222703.5e7dbba2@gandalf.local.homeAcked-by: NTom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: NTom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

13a1e4ae

22 12月, 2013 1 次提交

tracing: Add and use generic set_trigger_filter() implementation · bac5fb97

由 Tom Zanussi 提交于 10月 24, 2013

Add a generic event_command.set_trigger_filter() op implementation and
have the current set of trigger commands use it - this essentially
gives them all support for filters.

Syntactically, filters are supported by adding 'if <filter>' just
after the command, in which case only events matching the filter will
invoke the trigger. For example, to add a filter to an
enable/disable_event command:

echo 'enable_event:system:event if common_pid == 999' > \
.../othersys/otherevent/trigger

The above command will only enable the system:event event if the
common_pid field in the othersys:otherevent event is 999.

As another example, to add a filter to a stacktrace command:

echo 'stacktrace if common_pid == 999' > \
.../somesys/someevent/trigger

The above command will only trigger a stacktrace if the common_pid
field in the event is 999.

The filter syntax is the same as that described in the 'Event
filtering' section of Documentation/trace/events.txt.

Because triggers can now use filters, the trigger-invoking logic needs
to be moved in those cases - e.g. for ftrace_raw_event_calls, if a
trigger has a filter associated with it, the trigger invocation now
needs to happen after the { assign; } part of the call, in order for
the trigger condition to be tested.

There's still a SOFT_DISABLED-only check at the top of e.g. the
ftrace_raw_events function, so when an event is soft disabled but not
because of the presence of a trigger, the original SOFT_DISABLED
behavior remains unchanged.

There's also a bit of trickiness in that some triggers need to avoid
being invoked while an event is currently in the process of being
logged, since the trigger may itself log data into the trace buffer.
Thus we make sure the current event is committed before invoking those
triggers. To do that, we split the trigger invocation in two - the
first part (event_triggers_call()) checks the filter using the current
trace record; if a command has the post_trigger flag set, it sets a
bit for itself in the return value, otherwise it directly invoks the
trigger. Once all commands have been either invoked or set their
return flag, event_triggers_call() returns. The current record is
then either committed or discarded; if any commands have deferred
their triggers, those commands are finally invoked following the close
of the current event by event_triggers_post_call().

To simplify the above and make it more efficient, the TRIGGER_COND bit
is introduced, which is set only if a soft-disabled trigger needs to
use the log record for filter testing or needs to wait until the
current log record is closed.

The syscall event invocation code is also changed in analogous ways.

Because event triggers need to be able to create and free filters,
this also adds a couple external wrappers for the existing
create_filter and free_filter functions, which are too generic to be
made extern functions themselves.

Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.comSigned-off-by: NTom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

bac5fb97

21 12月, 2013 1 次提交

tracing: Add basic event trigger framework · 85f2b082

由 Tom Zanussi 提交于 10月 24, 2013

Add a 'trigger' file for each trace event, enabling 'trace event
triggers' to be set for trace events.

'trace event triggers' are patterned after the existing 'ftrace
function triggers' implementation except that triggers are written to
per-event 'trigger' files instead of to a single file such as the
'set_ftrace_filter' used for ftrace function triggers.

The implementation is meant to be entirely separate from ftrace
function triggers, in order to keep the respective implementations
relatively simple and to allow them to diverge.

The event trigger functionality is built on top of SOFT_DISABLE
functionality.  It adds a TRIGGER_MODE bit to the ftrace_event_file
flags which is checked when any trace event fires.  Triggers set for a
particular event need to be checked regardless of whether that event
is actually enabled or not - getting an event to fire even if it's not
enabled is what's already implemented by SOFT_DISABLE mode, so trigger
mode directly reuses that.  Event trigger essentially inherit the soft
disable logic in __ftrace_event_enable_disable() while adding a bit of
logic and trigger reference counting via tm_ref on top of that in a
new trace_event_trigger_enable_disable() function.  Because the base
__ftrace_event_enable_disable() code now needs to be invoked from
outside trace_events.c, a wrapper is also added for those usages.

The triggers for an event are actually invoked via a new function,
event_triggers_call(), and code is also added to invoke them for
ftrace_raw_event calls as well as syscall events.

The main part of the patch creates a new trace_events_trigger.c file
to contain the trace event triggers implementation.

The standard open, read, and release file operations are implemented
here.

The open() implementation sets up for the various open modes of the
'trigger' file.  It creates and attaches the trigger iterator and sets
up the command parser.  If opened for reading set up the trigger
seq_ops.

The read() implementation parses the event trigger written to the
'trigger' file, looks up the trigger command, and passes it along to
that event_command's func() implementation for command-specific
processing.

The release() implementation does whatever cleanup is needed to
release the 'trigger' file, like releasing the parser and trigger
iterator, etc.

A couple of functions for event command registration and
unregistration are added, along with a list to add them to and a mutex
to protect them, as well as an (initially empty) registration function
to add the set of commands that will be added by future commits, and
call to it from the trace event initialization code.

also added are a couple trigger-specific data structures needed for
these implementations such as a trigger iterator and a struct for
trigger-specific data.

A couple structs consisting mostly of function meant to be implemented
in command-specific ways, event_command and event_trigger_ops, are
used by the generic event trigger command implementations.  They're
being put into trace.h alongside the other trace_event data structures
and functions, in the expectation that they'll be needed in several
trace_event-related files such as trace_events_trigger.c and
trace_events.c.

The event_command.func() function is meant to be called by the trigger
parsing code in order to add a trigger instance to the corresponding
event.  It essentially coordinates adding a live trigger instance to
the event, and arming the triggering the event.

Every event_command func() implementation essentially does the
same thing for any command:

   - choose ops - use the value of param to choose either a number or
     count version of event_trigger_ops specific to the command
   - do the register or unregister of those ops
   - associate a filter, if specified, with the triggering event

The reg() and unreg() ops allow command-specific implementations for
event_trigger_op registration and unregistration, and the
get_trigger_ops() op allows command-specific event_trigger_ops
selection to be parameterized.  When a trigger instance is added, the
reg() op essentially adds that trigger to the triggering event and
arms it, while unreg() does the opposite.  The set_filter() function
is used to associate a filter with the trigger - if the command
doesn't specify a set_filter() implementation, the command will ignore
filters.

Each command has an associated trigger_type, which serves double duty,
both as a unique identifier for the command as well as a value that
can be used for setting a trigger mode bit during trigger invocation.

The signature of func() adds a pointer to the event_command struct,
used to invoke those functions, along with a command_data param that
can be passed to the reg/unreg functions.  This allows func()
implementations to use command-specific blobs and supports code
re-use.

The event_trigger_ops.func() command corrsponds to the trigger 'probe'
function that gets called when the triggering event is actually
invoked.  The other functions are used to list the trigger when
needed, along with a couple mundane book-keeping functions.

This also moves event_file_data() into trace.h so it can be used
outside of trace_events.c.

Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.comSigned-off-by: NTom Zanussi <tom.zanussi@linux.intel.com>
Idea-by: NSteve Rostedt <rostedt@goodmis.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

85f2b082

26 11月, 2013 1 次提交

tracing: Allow events to have NULL strings · 4e58e547

由 Steven Rostedt (Red Hat) 提交于 11月 26, 2013

If an TRACE_EVENT() uses __assign_str() or __get_str on a NULL pointer
then the following oops will happen:

BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [<c127a17b>] strlen+0x10/0x1a
*pde = 00000000 ^M
Oops: 0000 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.13.0-rc1-test+ #2
Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006^M
task: f5cde9f0 ti: f5e5e000 task.ti: f5e5e000
EIP: 0060:[<c127a17b>] EFLAGS: 00210046 CPU: 1
EIP is at strlen+0x10/0x1a
EAX: 00000000 EBX: c2472da8 ECX: ffffffff EDX: c2472da8
ESI: c1c5e5fc EDI: 00000000 EBP: f5e5fe84 ESP: f5e5fe80
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 00000000 CR3: 01f32000 CR4: 000007d0
Stack:
 f5f18b90 f5e5feb8 c10687a8 0759004f 00000005 00000005 00000005 00200046
 00000002 00000000 c1082a93 f56c7e28 c2472da8 c1082a93 f5e5fee4 c106bc61^M
 00000000 c1082a93 00000000 00000000 00000001 00200046 00200082 00000000
Call Trace:
 [<c10687a8>] ftrace_raw_event_lock+0x39/0xc0
 [<c1082a93>] ? ktime_get+0x29/0x69
 [<c1082a93>] ? ktime_get+0x29/0x69
 [<c106bc61>] lock_release+0x57/0x1a5
 [<c1082a93>] ? ktime_get+0x29/0x69
 [<c10824dd>] read_seqcount_begin.constprop.7+0x4d/0x75
 [<c1082a93>] ? ktime_get+0x29/0x69^M
 [<c1082a93>] ktime_get+0x29/0x69
 [<c108a46a>] __tick_nohz_idle_enter+0x1e/0x426
 [<c10690e8>] ? lock_release_holdtime.part.19+0x48/0x4d
 [<c10bc184>] ? time_hardirqs_off+0xe/0x28
 [<c1068c82>] ? trace_hardirqs_off_caller+0x3f/0xaf
 [<c108a8cb>] tick_nohz_idle_enter+0x59/0x62
 [<c1079242>] cpu_startup_entry+0x64/0x192
 [<c102299c>] start_secondary+0x277/0x27c
Code: 90 89 c6 89 d0 88 c4 ac 38 e0 74 09 84 c0 75 f7 be 01 00 00 00 89 f0 48 5e 5d c3 55 89 e5 57 66 66 66 66 90 83 c9 ff 89 c7 31 c0 <f2> ae f7 d1 8d 41 ff 5f 5d c3 55 89 e5 57 66 66 66 66 90 31 ff
EIP: [<c127a17b>] strlen+0x10/0x1a SS:ESP 0068:f5e5fe80
CR2: 0000000000000000
---[ end trace 01bc47bf519ec1b2 ]---

New tracepoints have been added that have allowed for NULL pointers
being assigned to strings. To fix this, change the TRACE_EVENT() code
to check for NULL and if it is, it will assign "(null)" to it instead
(similar to what glibc printf does).
Reported-by: NShuah Khan <shuah.kh@samsung.com>
Reported-by: NJovi Zhangwei <jovi.zhangwei@gmail.com>
Link: http://lkml.kernel.org/r/CAGdX0WFeEuy+DtpsJzyzn0343qEEjLX97+o1VREFkUEhndC+5Q@mail.gmail.com
Link: http://lkml.kernel.org/r/528D6972.9010702@samsung.com
Fixes: 9cbf1176 ("tracing/events: provide string with undefined size support")
Cc: stable@vger.kernel.org # 2.6.31+
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

4e58e547

19 11月, 2013 1 次提交

ftrace, perf: Avoid infinite event generation loop · d5b5f391

由 Peter Zijlstra 提交于 11月 14, 2013

Vince's perf-trinity fuzzer found yet another 'interesting' problem.

When we sample the irq_work_exit tracepoint with period==1 (or
PERF_SAMPLE_PERIOD) and we add an fasync SIGNAL handler we create an
infinite event generation loop:

  ,-> <IPI>
  |     irq_work_exit() ->
  |       trace_irq_work_exit() ->
  |         ...
  |           __perf_event_overflow() -> (due to fasync)
  |             irq_work_queue() -> (irq_work_list must be empty)
  '---------      arch_irq_work_raise()

Similar things can happen due to regular poll() wakeups if we exceed
the ring-buffer wakeup watermark, or have an event_limit.

To avoid this, dis-allow sampling this particular tracepoint.

In order to achieve this, create a special perf_perm function pointer
for each event and call this (when set) on trying to create a
tracepoint perf event.

[ roasted: use expr... to allow for ',' in your expression ]
Reported-by: NVince Weaver <vincent.weaver@maine.edu>
Tested-by: NVince Weaver <vincent.weaver@maine.edu>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20131114152304.GC5364@laptop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

d5b5f391

06 11月, 2013 1 次提交

tracing: Update event filters for multibuffer · f306cc82

由 Tom Zanussi 提交于 10月 24, 2013

The trace event filters are still tied to event calls rather than
event files, which means you don't get what you'd expect when using
filters in the multibuffer case:

Before:

  # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 8192
  # mkdir /sys/kernel/debug/tracing/instances/test1
  # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 2048
  # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  bytes_alloc > 2048

Setting the filter in tracing/instances/test1/events shouldn't affect
the same event in tracing/events as it does above.

After:

  # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 8192
  # mkdir /sys/kernel/debug/tracing/instances/test1
  # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 8192
  # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  bytes_alloc > 2048

We'd like to just move the filter directly from ftrace_event_call to
ftrace_event_file, but there are a couple cases that don't yet have
multibuffer support and therefore have to continue using the current
event_call-based filters.  For those cases, a new USE_CALL_FILTER bit
is added to the event_call flags, whose main purpose is to keep the
old behavior for those cases until they can be updated with
multibuffer support; at that point, the USE_CALL_FILTER flag (and the
new associated call_filter_check_discard() function) can go away.

The multibuffer support also made filter_current_check_discard()
redundant, so this change removes that function as well and replaces
it with filter_check_discard() (or call_filter_check_discard() as
appropriate).

Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.comSigned-off-by: NTom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

f306cc82

14 8月, 2013 1 次提交

tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible · d027e6a9

由 Oleg Nesterov 提交于 8月 06, 2013

perf_trace_buf_prepare() + perf_trace_buf_submit(task => NULL)
make no sense if hlist_empty(head). Change perf_trace_##call()
to check ->perf_events beforehand and do nothing if it is empty.

This removes the overhead for tasks without events associated
with them. For example, "perf record -e sched:sched_switch -p1"
attaches the counter(s) to the single task, but every task in
system will do perf_trace_buf_prepare/submit() just to realize
that it was not attached to this event.

However, we can only do this if __task == NULL, so we also add
the __builtin_constant_p(__task) check.

With this patch "perf bench sched pipe" shows approximately 4%
improvement when "perf record -p1" runs in parallel, many thanks
to Steven for the testing.

Link: http://lkml.kernel.org/r/20130806160847.GA2746@redhat.comTested-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

d027e6a9

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功