提交 · 190b57fcb9c5fed5414935a174094f534fc510bc · openeuler / raspberrypi-kernel

16 7月, 2011 10 次提交

perf probe: Add probed module in front of function · 190b57fc

由 Masami Hiramatsu 提交于 6月 27, 2011

Add probed module name and ":" in front of function name
if -m module option is given. In the result, the symbol
name passed to kprobe-tracer becomes MODULE:FUNCTION,
so that kallsyms can solve it as a symbol in the module
correctly.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20110627072745.6528.26416.stgit@fedora15Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

190b57fc

perf probe: Introduce debuginfo to encapsulate dwarf information · ff741783

由 Masami Hiramatsu 提交于 6月 27, 2011

Introduce debuginfo to encapsulate dwarf information.
This new object allows us to reuse and expand debuginfo easily.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20110627072739.6528.12438.stgit@fedora15Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

ff741783

perf-probe: Move dwarf library routines to dwarf-aux.{c, h} · e0d153c6

由 Masami Hiramatsu 提交于 6月 27, 2011

Move dwarf library related routines to dwarf-aux.{c,h}.
This includes several minor changes.
- Add simple documents for each API.
- Rename die_find_real_subprogram() to die_find_realfunc()
- Rename line_walk_handler_t to line_walk_callback_t.
- Minor cleanups.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20110627072727.6528.57647.stgit@fedora15Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

e0d153c6

perf probe: Remove redundant dwarf functions · bcfc0821

由 Masami Hiramatsu 提交于 6月 27, 2011

Since there are dwarf_bitsize, dwarf_bitoffset and dwarf_bytesize
defined in libdw, we don't need die_get_bit_size, die_get_bit_offset
and die_get_byte_size anymore.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20110627072721.6528.2747.stgit@fedora15Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

bcfc0821

perf probe: Move strtailcmp to string.c · bad03ae4

由 Masami Hiramatsu 提交于 6月 27, 2011

Since strtailcmp() is enough generic, it should be defined in string.c.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20110627072715.6528.10677.stgit@fedora15Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

bad03ae4

perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END · baad2d3e

由 Masami Hiramatsu 提交于 6月 27, 2011

Since die_find/walk* callbacks use DIE_FIND_CB_FOUND for
both of failed and found cases, it should be "END"
instead "FOUND" for avoiding confusion.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reported-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/20110627072709.6528.45706.stgit@fedora15Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

baad2d3e

tracing/kprobe: Update symbol reference when loading module · 7f6878a3

由 Masami Hiramatsu 提交于 6月 27, 2011

Since the address of a module-local variable can only be
solved after the target module is loaded, the symbol
fetch-argument should be updated when loading target
module.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20110627072703.6528.75042.stgit@fedora15Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

7f6878a3

tracing/kprobes: Support module init function probing · 61424318

由 Masami Hiramatsu 提交于 6月 27, 2011

To support probing module init functions, kprobe-tracer allows
user to define a probe on non-existed function when it is given
with a module name. This also enables user to set a probe on
a function on a specific module, even if a same name (but different)
function is locally defined in another module.

The module name must be in the front of function name and separated
by a ':'. e.g. btrfs:btrfs_init_sysfs
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20110627072656.6528.89970.stgit@fedora15Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

61424318

kprobes: Return -ENOENT if probe point doesn't exist · bc81d48d

由 Masami Hiramatsu 提交于 6月 27, 2011

Return -ENOENT if probe point doesn't exist, but still returns
-EINVAL if both of kprobe->addr and kprobe->symbol_name are
specified or both are not specified.
Acked-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20110627072650.6528.67329.stgit@fedora15Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

bc81d48d

tracing/kprobes: Merge trace probe enable/disable functions · 1538f888

由 Masami Hiramatsu 提交于 6月 27, 2011

Merge redundant enable/disable functions into enable_trace_probe()
and disable_trace_probe().
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: http://lkml.kernel.org/r/20110627072644.6528.26910.stgit@fedora15

[ converted kprobe selftest to use  enable_trace_probe ]
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

1538f888

15 7月, 2011 4 次提交

tracing/kprobes: Rename probe_* to trace_probe_* · 7143f168

由 Masami Hiramatsu 提交于 6月 27, 2011

Rename probe_* to trace_probe_* for avoiding namespace
confliction. This also fixes improper names of find_probe_event()
and cleanup_all_probes() to find_trace_probe() and
release_all_trace_probes() respectively.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20110627072636.6528.60374.stgit@fedora15Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

7143f168

perf, x86: P4 PMU - Introduce event alias feature · f9129870

由 Cyrill Gorcunov 提交于 7月 09, 2011

Instead of hw_nmi_watchdog_set_attr() weak function
and appropriate x86_pmu::hw_watchdog_set_attr() call
we introduce even alias mechanism which allow us
to drop this routines completely and isolate quirks
of Netburst architecture inside P4 PMU code only.

The main idea remains the same though -- to allow
nmi-watchdog and perf top run simultaneously.

Note the aliasing mechanism applies to generic
PERF_COUNT_HW_CPU_CYCLES event only because arbitrary
event (say passed as RAW initially) might have some
additional bits set inside ESCR register changing
the behaviour of event and we can't guarantee anymore
that alias event will give the same result.

P.S. Thanks a huge to Don and Steven for for testing
     and early review.
Acked-by: NDon Zickus <dzickus@redhat.com>
Tested-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Stephane Eranian <eranian@google.com>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20110708201712.GS23657@sunSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>

f9129870

tracing: Have dynamic size event stack traces · 4a9bd3f1

由 Steven Rostedt 提交于 7月 14, 2011

Currently the stack trace per event in ftace is only 8 frames.
This can be quite limiting and sometimes useless. Especially when
the "ignore frames" is wrong and we also use up stack frames for
the event processing itself.

Change this to be dynamic by adding a percpu buffer that we can
write a large stack frame into and then copy into the ring buffer.

For interrupts and NMIs that come in while another event is being
process, will only get to use the 8 frame stack. That should be enough
as the task that it interrupted will have the full stack frame anyway.
Requested-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

4a9bd3f1

perf: Robustify proc and debugfs file recording · 259032bf

由 Sonny Rao 提交于 7月 14, 2011

While attempting to create a timechart of boot up I found perf didn't
tolerate modules being loaded/unloaded.  This patch fixes this by
reading the file once and then writing the size read at the correct
point in the file.  It also simplifies the code somewhat.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: NSonny Rao <sonnyrao@chromium.org>
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Link: http://lkml.kernel.org/r/10011.1310614483@neuling.orgSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>

259032bf

14 7月, 2011 3 次提交

ftrace: Fix dynamic selftest failure on some archs · 6331c28c

由 Steven Rostedt 提交于 7月 13, 2011

Archs that do not implement CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST, will
fail the dynamic ftrace selftest.

The function tracer has a quick 'off' variable that will prevent
the call back functions from being called. This variable is called
function_trace_stop. In x86, this is implemented directly in the mcount
assembly, but for other archs, an intermediate function is used called
ftrace_test_stop_func().

In dynamic ftrace, the function pointer variable ftrace_trace_function is
used to update the caller code in the mcount caller. But for archs that
do not have CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST set, it only calls
ftrace_test_stop_func() instead, which in turn calls __ftrace_trace_function.

When more than one ftrace_ops is registered, the function it calls is
ftrace_ops_list_func(), which will iterate over all registered ftrace_ops
and call the callbacks that have their hash matching.

The issue happens when two ftrace_ops are registered for different functions
and one is then unregistered. The __ftrace_trace_function is then pointed
to the remaining ftrace_ops callback function directly. This mean it will
be called for all functions that were registered to trace by both ftrace_ops
that were registered.

This is not an issue for archs with CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST,
because the update of ftrace_trace_function doesn't happen until after all
functions have been updated, and then the mcount caller is updated. But
for those archs that do use the ftrace_test_stop_func(), the update is
immediate.

The dynamic selftest fails because it hits this situation, and the
ftrace_ops that it registers fails to only trace what it was suppose to
and instead traces all other functions.

The solution is to delay the setting of __ftrace_trace_function until
after all the functions have been updated according to the registered
ftrace_ops. Also, function_trace_stop is set during the update to prevent
function tracing from calling code that is caused by the function tracer
itself.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

6331c28c

ftrace: Update filter when tracing enabled in set_ftrace_filter() · 072126f4

由 Steven Rostedt 提交于 7月 13, 2011

Currently, if set_ftrace_filter() is called when the ftrace_ops is
active, the function filters will not be updated. They will only be updated
when tracing is disabled and re-enabled.

Update the functions immediately during set_ftrace_filter().
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

072126f4

ftrace: Balance records when updating the hash · 41fb61c2

由 Steven Rostedt 提交于 7月 13, 2011

Whenever the hash of the ftrace_ops is updated, the record counts
must be balance. This requires disabling the records that are set
in the original hash, and then enabling the records that are set
in the updated hash.

Moving the update into ftrace_hash_move() removes the bug where the
hash was updated but the records were not, which results in ftrace
triggering a warning and disabling itself because the ftrace_ops filter
is updated while the ftrace_ops was registered, and then the failure
happens when the ftrace_ops is unregistered.

The current code will not trigger this bug, but new code will.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

41fb61c2

08 7月, 2011 3 次提交

ftrace: Do not disable interrupts for modules in mcount update · 4376cac6

由 Steven Rostedt 提交于 6月 24, 2011

When I mounted an NFS directory, it caused several modules to be loaded. At the
time I was running the preemptirqsoff tracer, and it showed the following
output:

# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 2.6.33.9-rt30-mrg-test
# --------------------------------------------------------------------
# latency: 1177 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
#    -----------------
#    | task: modprobe-19370 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: ftrace_module_notify
#  => ended at:   ftrace_module_notify
#
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /_--=> lock-depth
#                |||||/     delay
#  cmd     pid   |||||| time  |   caller
#     \   /      ||||||   \   |   /
modprobe-19370   3d....    0us!: ftrace_process_locs <-ftrace_module_notify
modprobe-19370   3d.... 1176us : ftrace_process_locs <-ftrace_module_notify
modprobe-19370   3d.... 1178us : trace_hardirqs_on <-ftrace_module_notify
modprobe-19370   3d.... 1178us : <stack trace>
 => ftrace_process_locs
 => ftrace_module_notify
 => notifier_call_chain
 => __blocking_notifier_call_chain
 => blocking_notifier_call_chain
 => sys_init_module
 => system_call_fastpath

That's over 1ms that interrupts are disabled on a Real-Time kernel!

Looking at the cause (being the ftrace author helped), I found that the
interrupts are disabled before the code modification of mcounts into nops. The
interrupts only need to be disabled on start up around this code, not when
modules are being loaded.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

4376cac6

tracing: Still trace filtered irq functions when irq trace is disabled · e4a3f541

由 Steven Rostedt 提交于 6月 14, 2011

If a function is set to be traced by the set_graph_function, but the
option funcgraph-irqs is zero, and the traced function happens to be
called from a interrupt, it will not be traced.

The point of funcgraph-irqs is to not trace interrupts when we are
preempted by an irq, not to not trace functions we want to trace that
happen to be *in* a irq.

Luckily the current->trace_recursion element is perfect to add a flag
to help us be able to trace functions within an interrupt even when
we are not tracing interrupts that preempt the trace.
Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

e4a3f541

tracing, x86/irq: Do not trace arch_local_{*,irq_*}() functions · e08fbb78

由 Steven Rostedt 提交于 7月 01, 2011

I triggered a triple fault with gcc 4.5.1 because it did not
honor the inline annotation to arch_local_save_flags() function
and that function was added to the pool of functions traced by
the function tracer.

When preempt_schedule() called arch_local_save_flags() (called
by irqs_disabled()), it was traced, but the first thing the
function tracer does is disable preemption. When it enables
preemption, the NEED_RESCHED flag will not have been cleared and
the preemption check will trigger the call to preempt_schedule()
again.

Although the dynamic function tracer crashed immediately, the
static version of the function tracer (CONFIG_DYNAMIC_FTRACE is
not set) actually was able to show where the problem was.

 swapper-1       3.N.. 103885us : arch_local_save_flags <-preempt_schedule
 swapper-1       3.N.. 103886us : arch_local_save_flags <-preempt_schedule
 swapper-1       3.N.. 103886us : arch_local_save_flags <-preempt_schedule
 swapper-1       3.N.. 103887us : arch_local_save_flags <-preempt_schedule
 swapper-1       3.N.. 103887us : arch_local_save_flags <-preempt_schedule
 swapper-1       3.N.. 103888us : arch_local_save_flags <-preempt_schedule
 swapper-1       3.N.. 103888us : arch_local_save_flags <-preempt_schedule

It went on for a while before it triple faulted with a corrupted
stack.

The arch_local_save_flags and arch_local_irq_* functions should
not be traced. Even though they are marked as inline, gcc may
still make them a function and enable tracing of them.

The simple solution is to just mark them as notrace. I had to
add the <linux/types.h> for this file to include the notrace
tag.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110702033852.733414762@goodmis.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

e08fbb78

05 7月, 2011 2 次提交

Merge branch 'tip/perf/core-2' of... · 931da613

由 Ingo Molnar 提交于 7月 05, 2011

Merge branch 'tip/perf/core-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core

931da613

perf report/annotate/script: Add option to specify a CPU range · 5d67be97

由 Anton Blanchard 提交于 7月 04, 2011

Add an option to perf report/annotate/script to specify which
CPUs to operate on. This enables us to take a single system wide
profile and analyse each CPU (or group of CPUs) in isolation.

This was useful when profiling a multiprocess workload where the
bottleneck was on one CPU but this was hidden in the overall
profile. Per process and per thread breakdowns didn't help
because multiple processes were running on each CPU and no
single process consumed an entire CPU.

The patch converts the list of CPUs returned by cpu_map__new
into a bitmap for fast lookup. I wanted to use -C to be
consistent with perf top/record/stat, but unfortunately perf
report already uses -C <comms>.

 v2: Incorporate suggestions from David Ahern:
	- Added -c to perf script
	- Check that SAMPLE_CPU is set when -c is used
	- Update documentation

 v3: Create perf_session__cpu_bitmap()
Signed-off-by: NAnton Blanchard <anton@samba.org>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Link: http://lkml.kernel.org/r/20110704215750.11647eb9@krytenSigned-off-by: NIngo Molnar <mingo@elte.hu>

5d67be97

04 7月, 2011 2 次提交
- I
  
  Merge branch 'core' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/core · 9f8b6a6c
  由 Ingo Molnar 提交于 7月 04, 2011
  
  9f8b6a6c
- I
  Merge branch 'perf/stacktrace' of... · 729aa21a
  由 Ingo Molnar 提交于 7月 03, 2011
```
Merge branch 'perf/stacktrace' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core
```
  729aa21a
03 7月, 2011 6 次提交

x86: Don't use frame pointer to save old stack on irq entry · a2bbe750

由 Frederic Weisbecker 提交于 7月 02, 2011

rbp is used in SAVE_ARGS_IRQ to save the old stack pointer
in order to restore it later in ret_from_intr.

It is convenient because we save its value in the irq regs
and it's easily restored using the leave instruction.

However this is a kind of abuse of the frame pointer which
role is to help unwinding the kernel by chaining frames
together, each node following the return address to the
previous frame.

But although we are breaking the frame by changing the stack
pointer, there is no preceding return address before the new
frame. Hence using the frame pointer to link the two stacks
breaks the stack unwinders that find a random value instead of
a return address here.

There is no workaround that can work in every case. We are using
the fixup_bp_irq_link() function to dereference that abused frame
pointer in the case of non nesting interrupt (which means stack
changed).
But that doesn't fix the case of interrupts that don't change the
stack (but we still have the unconditional frame link), which is
the case of hardirq interrupting softirq. We have no way to detect
this transition so the frame irq link is considered as a real frame
pointer and the return address is dereferenced but it is still a
spurious one.

There are two possible results of this: either the spurious return
address, a random stack value, luckily belongs to the kernel text
and then the unwinding can continue and we just have a weird entry
in the stack trace. Or it doesn't belong to the kernel text and
unwinding stops there.

This is the reason why stacktraces (including perf callchains) on
irqs that interrupted softirqs don't work very well.

To solve this, we don't save the old stack pointer on rbp anymore
but we save it to a scratch register that we push on the new
stack and that we pop back later on irq return.

This preserves the whole frame chain without spurious return addresses
in the middle and drops the need for the horrid fixup_bp_irq_link()
workaround.

And finally irqs that interrupt softirq are sanely unwinded.

Before:

    99.81%         perf  [kernel.kallsyms]  [k] perf_pending_event
                   |
                   --- perf_pending_event
                       irq_work_run
                       smp_irq_work_interrupt
                       irq_work_interrupt
                      |
                      |--41.60%-- __read
                      |          |
                      |          |--99.90%-- create_worker
                      |          |          bench_sched_messaging
                      |          |          cmd_bench
                      |          |          run_builtin
                      |          |          main
                      |          |          __libc_start_main
                      |           --0.10%-- [...]

After:

     1.64%  swapper  [kernel.kallsyms]  [k] perf_pending_event
            |
            --- perf_pending_event
                irq_work_run
                smp_irq_work_interrupt
                irq_work_interrupt
               |
               |--95.00%-- arch_irq_work_raise
               |          irq_work_queue
               |          __perf_event_overflow
               |          perf_swevent_overflow
               |          perf_swevent_event
               |          perf_tp_event
               |          perf_trace_softirq
               |          __do_softirq
               |          call_softirq
               |          do_softirq
               |          irq_exit
               |          |
               |          |--73.68%-- smp_apic_timer_interrupt
               |          |          apic_timer_interrupt
               |          |          |
               |          |          |--96.43%-- amd_e400_idle
               |          |          |          cpu_idle
               |          |          |          start_secondary
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>

a2bbe750

x86: Remove useless unwinder backlink from irq regs saving · 48ffee7d

由 Frederic Weisbecker 提交于 7月 02, 2011

The unwinder backlink in interrupt entry is very useless.
It's actually not part of the stack frame chain and thus is
never used.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>

48ffee7d

x86,64: Separate arg1 from rbp handling in SAVE_REGS_IRQ · 3b99a3ef

由 Frederic Weisbecker 提交于 7月 01, 2011

Just for clarity in the code. Have a first block that handles
the frame pointer and a separate one that handles pt_regs
pointer and its use.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>

3b99a3ef

x86,64: Simplify save_regs() · 1871853f

由 Frederic Weisbecker 提交于 7月 01, 2011

The save_regs function that saves the regs on low level
irq entry is complicated because of the fact it changes
its stack in the middle and also because it manipulates
data allocated in the caller frame and accesses there
are directly calculated from callee rsp value with the
return address in the middle of the way.

This complicates the static stack offsets calculation and
require more dynamic ones. It also needs a save/restore
of the function's return address.

To simplify and optimize this, turn save_regs() into a
macro.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>

1871853f

x86: Fetch stack from regs when possible in dump_trace() · 47ce11a2

由 Frederic Weisbecker 提交于 6月 30, 2011

When regs are passed to dump_stack(), we fetch the frame
pointer from the regs but the stack pointer is taken from
the current frame.

Thus the frame and stack pointers may not come from the same
context. For example this can result in the unwinder to
think the context is in irq, due to the current value of
the stack, but the frame pointer coming from the regs points
to a frame from another place. It then tries to fix up
the irq link but ends up dereferencing a random frame
pointer that doesn't belong to the irq stack:

[ 9131.706906] ------------[ cut here ]------------
[ 9131.707003] WARNING: at arch/x86/kernel/dumpstack_64.c:129 dump_trace+0x2aa/0x330()
[ 9131.707003] Hardware name: AMD690VM-FMH
[ 9131.707003] Perf: bad frame pointer = 0000000000000005 in callchain
[ 9131.707003] Modules linked in:
[ 9131.707003] Pid: 1050, comm: perf Not tainted 3.0.0-rc3+ #181
[ 9131.707003] Call Trace:
[ 9131.707003]  <IRQ>  [<ffffffff8104bd4a>] warn_slowpath_common+0x7a/0xb0
[ 9131.707003]  [<ffffffff8104be21>] warn_slowpath_fmt+0x41/0x50
[ 9131.707003]  [<ffffffff8178b873>] ? bad_to_user+0x6d/0x10be
[ 9131.707003]  [<ffffffff8100c2da>] dump_trace+0x2aa/0x330
[ 9131.707003]  [<ffffffff810107d3>] ? native_sched_clock+0x13/0x50
[ 9131.707003]  [<ffffffff8101b164>] perf_callchain_kernel+0x54/0x70
[ 9131.707003]  [<ffffffff810d391f>] perf_prepare_sample+0x19f/0x2a0
[ 9131.707003]  [<ffffffff810d546c>] __perf_event_overflow+0x16c/0x290
[ 9131.707003]  [<ffffffff810d5430>] ? __perf_event_overflow+0x130/0x290
[ 9131.707003]  [<ffffffff810107d3>] ? native_sched_clock+0x13/0x50
[ 9131.707003]  [<ffffffff8100fbb9>] ? sched_clock+0x9/0x10
[ 9131.707003]  [<ffffffff810752e5>] ? T.375+0x15/0x90
[ 9131.707003]  [<ffffffff81084da4>] ? trace_hardirqs_on_caller+0x64/0x180
[ 9131.707003]  [<ffffffff810817bd>] ? trace_hardirqs_off+0xd/0x10
[ 9131.707003]  [<ffffffff810d5764>] perf_event_overflow+0x14/0x20
[ 9131.707003]  [<ffffffff810d588c>] perf_swevent_hrtimer+0x11c/0x130
[ 9131.707003]  [<ffffffff817821a1>] ? error_exit+0x51/0xb0
[ 9131.707003]  [<ffffffff81072e93>] __run_hrtimer+0x83/0x1e0
[ 9131.707003]  [<ffffffff810d5770>] ? perf_event_overflow+0x20/0x20
[ 9131.707003]  [<ffffffff81073256>] hrtimer_interrupt+0x106/0x250
[ 9131.707003]  [<ffffffff812a3bfd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 9131.707003]  [<ffffffff81024833>] smp_apic_timer_interrupt+0x53/0x90
[ 9131.707003]  [<ffffffff81789053>] apic_timer_interrupt+0x13/0x20
[ 9131.707003]  <EOI>  [<ffffffff817821a1>] ? error_exit+0x51/0xb0
[ 9131.707003]  [<ffffffff8178219c>] ? error_exit+0x4c/0xb0
[ 9131.707003] ---[ end trace b2560d4876709347 ]---

Fix this by simply taking the stack pointer from regs->sp
when regs are provided.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>

47ce11a2

x86: Save stack pointer in perf live regs savings · 9e46294d

由 Frederic Weisbecker 提交于 7月 02, 2011

In order to prepare for fetching the stack pointer from the
regs when possible in dump_trace() instead of taking the
local one, save the current stack pointer in perf live regs saving.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>

9e46294d

01 7月, 2011 10 次提交

perf, powerpc: Fix build borkage · 29dfc4fd

由 Peter Zijlstra 提交于 7月 01, 2011

The patch a8b0ca17 ("perf: Remove the nmi parameter from the swevent
and overflow interface") missed a spot in the ppc hw_breakpoint code,
fix this up so things compile again.
Reported-by: NIngo Molnar <mingo@elte.hu>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-09pfip95g88s70iwkxu6nnbt@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

29dfc4fd

perf stat: Add noise output for csv mode · 3ae9a34d

由 Zhengyu He 提交于 6月 23, 2011

Previously, when you want perf-stat to output the statistics in
csv mode, no information of the noise will be printed out.

For example right now we output this --repeat information:

 ./perf stat -r3 -x, sleep 1
 1.164789,task-clock
 8,context-switches
 0,CPU-migrations
 219,page-faults
 3337800,cycles

With this patch, the output will be appended with an additional
entry for the noise value:

 ./perf stat -r3 -x, sleep 1
 1.164789,task-clock,3.75%
 8,context-switches,75.00%
 0,CPU-migrations,100.00%
 219,page-faults,0.00%
 3337800,cycles,3.36%
Signed-off-by: NZhengyu He <zhengyuh@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1308861942-4945-1-git-send-email-zhengyuh@google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

3ae9a34d

Merge branch 'perf/core' of... · 343a031f

由 Ingo Molnar 提交于 7月 01, 2011

Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core

343a031f

perf: export perf_event_refresh() to modules · 26ca5c11

由 Avi Kivity 提交于 6月 29, 2011

KVM needs one-shot samples, since a PMC programmed to -X will fire after X
events and then again after 2^40 events (i.e. variable period).
Signed-off-by: NAvi Kivity <avi@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309362157-6596-4-git-send-email-avi@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

26ca5c11

x86, perf: Add constraints for architectural PMU · 0af3ac1f

由 Avi Kivity 提交于 6月 29, 2011

The v1 PMU does not have any fixed counters. Using the v2 constraints,
which do have fixed counters, causes an additional choice to be present
in the weight calculation, but not when actually scheduling the event,
leading to an event being not scheduled at all.
Signed-off-by: NAvi Kivity <avi@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309362157-6596-3-git-send-email-avi@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

0af3ac1f

perf: Add context field to perf_event · 4dc0da86

由 Avi Kivity 提交于 6月 29, 2011

The perf_event overflow handler does not receive any caller-derived
argument, so many callers need to resort to looking up the perf_event
in their local data structure.  This is ugly and doesn't scale if a
single callback services many perf_events.

Fix by adding a context parameter to perf_event_create_kernel_counter()
(and derived hardware breakpoints APIs) and storing it in the perf_event.
The field can be accessed from the callback as event->overflow_handler_context.
All callers are updated.
Signed-off-by: NAvi Kivity <avi@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

4dc0da86

perf, arch: Add generic NODE cache events · 89d6c0b5

由 Peter Zijlstra 提交于 4月 22, 2011

Add a NODE level to the generic cache events which is used to measure
local vs remote memory accesses. Like all other cache events, an
ACCESS is HIT+MISS, if there is no way to distinguish between reads
and writes do reads only etc..

The below needs filling out for !x86 (which I filled out with
unsupported events).

I'm fairly sure ARM can leave it like that since it doesn't strike me as
an architecture that even has NUMA support. SH might have something since
it does appear to have some NUMA bits.

Sparc64, PowerPC and MIPS certainly want a good look there since they
clearly are NUMA capable.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Miller <davem@davemloft.net>
Cc: Anton Blanchard <anton@samba.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptopSigned-off-by: NIngo Molnar <mingo@elte.hu>

89d6c0b5

perf, intel: Try alternative OFFCORE encodings · b79e8941

由 Peter Zijlstra 提交于 5月 23, 2011

Since the OFFCORE registers are fully symmetric, try the other one
when the specified one is already in use.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1306141897.18455.8.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>

b79e8941

perf_events: Add Intel Sandy Bridge offcore_response low-level support · ee89cbc2

由 Stephane Eranian 提交于 6月 06, 2011

This patch adds Intel Sandy Bridge offcore_response support by
providing the low-level constraint table for those events.

On Sandy Bridge, there are two offcore_response events. Each uses
its own dedictated extra register. But those registers are NOT shared
between sibling CPUs when HT is on unlike Nehalem/Westmere. They are
always private to each CPU. But they still need to be controlled within
an event group. All events within an event group must use the same
value for the extra MSR. That's not controlled by the second patch in
this series.

Furthermore on Sandy Bridge, the offcore_response events have NO
counter constraints contrary to what the official documentation
indicates, so drop the events from the contraint table.
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110606145712.GA7304@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>

ee89cbc2

perf_events: Fix validation of events using an extra reg · cd8a38d3

由 Stephane Eranian 提交于 6月 06, 2011

The validate_group() function needs to validate events with
extra shared regs. Within an event group, only events with
the same value for the extra reg can co-exist. This was not
checked by validate_group() because it was missing the
shared_regs logic.

This patch changes the allocation of the fake cpuc used for
validation to also point to a fake shared_regs structure such
that group events be properly testing.

It modifies __intel_shared_reg_get_constraints() to use
spin_lock_irqsave() to avoid lockdep issues.
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110606145708.GA7279@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>

cd8a38d3