- 22 4月, 2010 1 次提交
-
-
由 Frederic Weisbecker 提交于
The ftrace_dump_on_oops kernel parameter, sysctl and sysrq let one dump every cpu buffers when an oops or panic happens. It's nice when you have few cpus but it may take ages if have many, plus you miss the real origin of the problem in all the cpu traces. Sometimes, all you need is to dump the cpu buffer that triggered the opps, most of the time it is our main interest. This patch modifies ftrace_dump_on_oops to handle this choice. The ftrace_dump_on_oops kernel parameter, when it comes alone, has the same behaviour than before. But ftrace_dump_on_oops=orig_cpu will only dump the buffer of the cpu that oops'ed. Similarly, sysctl kernel.ftrace_dump_on_oops=1 and echo 1 > /proc/sys/kernel/ftrace_dump_on_oops keep their previous behaviour. But setting 2 jumps into cpu origin dump mode. v2: Fix double setup v3: Fix spelling issues reported by Randy Dunlap v4: Also update __ftrace_dump in the selftests Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
-
- 05 4月, 2010 1 次提交
-
-
由 Lai Jiangshan 提交于
Because a local variable is not initialized, I got these when I did 'cat tracing/trace'. (not trace_pipe): CPU:0 [LOST 18446744071579453134 EVENTS] ps-3099 [000] 560.770221: lock_acquire: ffff880030865010 &(&dentry->d_lock)->rlock CPU:0 [LOST 18446744071579453134 EVENTS] ps-3099 [000] 560.770221: lock_release: ffff880030865010 &(&dentry->d_lock)->rlock CPU:0 [LOST 18446612133255294080 EVENTS] ps-3099 [000] 560.770221: lock_acquire: ffff880030865010 &(&dentry->d_lock)->rlock CPU:0 [LOST 18446744071579453134 EVENTS] ps-3099 [000] 560.770222: lock_release: ffff880030865010 &(&dentry->d_lock)->rlock CPU:0 [LOST 18446744071579453134 EVENTS] ps-3099 [000] 560.770222: lock_release: ffffffff816cfb98 dcache_lock See peek_next_entry(), it does not set *lost_events when we 'cat tracing/trace' Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <4BB9A929.2000303@cn.fujitsu.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 01 4月, 2010 4 次提交
-
-
由 Frederic Weisbecker 提交于
The trace event buffer used by perf to record raw sample events is typed as an array of char and may then not be aligned to 8 by alloc_percpu(). But we need it to be aligned to 8 in sparc64 because we cast this buffer into a random structure type built by the TRACE_EVENT() macro to store the traces. So if a random 64 bits field is accessed inside, it may be not under an expected good alignment. Use an array of long instead to force the appropriate alignment, and perform a compile time check to ensure the size in byte of the buffer is a multiple of sizeof(long) so that its actual size doesn't get shrinked under us. This fixes unaligned accesses reported while using perf lock in sparc 64. Suggested-by: NDavid Miller <davem@davemloft.net> Suggested-by: NTejun Heo <htejun@gmail.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Steven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
Currently, binary readers of the ring buffer only know where events were lost, but not how many events were lost at that location. This information is available, but it would require adding another field to the sub buffer header to include it. But when a event can not fit at the end of a sub buffer, it is written to the next sub buffer. This means there is a good chance that the buffer may have room to hold this counter. If it does, write the counter at the end of the sub buffer and set another flag in the data size field that states that this information exists. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
Now that the ring buffer can keep track of where events are lost. Use this information to the output of trace_pipe: hackbench-3588 [001] 1326.701660: lock_acquire: ffffffff816591e0 read rcu_read_lock hackbench-3588 [001] 1326.701661: lock_acquire: ffff88003f4091f0 &(&dentry->d_lock)->rlock hackbench-3588 [001] 1326.701664: lock_release: ffff88003f4091f0 &(&dentry->d_lock)->rlock CPU:1 [LOST 673 EVENTS] hackbench-3588 [001] 1326.702711: kmem_cache_free: call_site=ffffffff81102b85 ptr=ffff880026d96738 hackbench-3588 [001] 1326.702712: lock_release: ffff88003e1480a8 &mm->mmap_sem hackbench-3588 [001] 1326.702713: lock_acquire: ffff88003e1480a8 &mm->mmap_sem Even works with the function graph tracer: 2) ! 170.098 us | } 2) 4.036 us | rcu_irq_exit(); 2) 3.657 us | idle_cpu(); 2) ! 190.301 us | } CPU:2 [LOST 2196 EVENTS] 2) 0.853 us | } /* cancel_dirty_page */ 2) | remove_from_page_cache() { 2) 1.578 us | _raw_spin_lock_irq(); 2) | __remove_from_page_cache() { Note, it does not work with the iterator "trace" file, since it requires the use of consuming the page from the ring buffer to determine how many events were lost, which the iterator does not do. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
Currently, when the ring buffer drops events, it does not record the fact that it did so. It does inform the writer that the event was dropped by returning a NULL event, but it does not put in any place holder where the event was dropped. This is not a trivial thing to add because the ring buffer mostly runs in overwrite (flight recorder) mode. That is, when the ring buffer is full, new data will overwrite old data. In a produce/consumer mode, where new data is simply dropped when the ring buffer is full, it is trivial to add the placeholder for dropped events. When there's more room to write new data, then a special event can be added to notify the reader about the dropped events. But in overwrite mode, any new write can overwrite events. A place holder can not be inserted into the ring buffer since there never may be room. A reader could also come in at anytime and miss the placeholder. Luckily, the way the ring buffer works, the read side can find out if events were lost or not, and how many events. Everytime a write takes place, if it overwrites the header page (the next read) it updates a "overrun" variable that keeps track of the number of lost events. When a reader swaps out a page from the ring buffer, it can record this number, perfom the swap, and then check to see if the number changed, and take the diff if it has, which would be the number of events dropped. This can be stored by the reader and returned to callers of the reader. Since the reader page swap will fail if the writer moved the head page since the time the reader page set up the swap, this gives room to record the overruns without worrying about races. If the reader sets up the pages, records the overrun, than performs the swap, if the swap succeeds, then the overrun variable has not been updated since the setup before the swap. For binary readers of the ring buffer, a flag is set in the header of each sub page (sub buffer) of the ring buffer. This flag is embedded in the size field of the data on the sub buffer, in the 31st bit (the size can be 32 or 64 bits depending on the architecture), but only 27 bits needs to be used for the actual size (less actually). We could add a new field in the sub buffer header to also record the number of events dropped since the last read, but this will change the format of the binary ring buffer a bit too much. Perhaps this change can be made if the information on the number of events dropped is considered important enough. Note, the notification of dropped events is only used by consuming reads or peeking at the ring buffer. Iterating over the ring buffer does not keep this information because the necessary data is only available when a page swap is made, and the iterator does not swap out pages. Cc: Robert Richter <robert.richter@amd.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 30 3月, 2010 3 次提交
-
-
由 Tejun Heo 提交于
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: NTejun Heo <tj@kernel.org> Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
-
由 Julia Lawall 提交于
In some error handling cases the lock is not unlocked. The return is converted to a goto, to share the unlock at the end of the function. A simplified version of the semantic patch that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ expression E1; identifier f; @@ f (...) { <+... * spin_lock_irq (E1,...); ... when != E1 * return ...; ...+> } // </smpl> Signed-off-by: NJulia Lawall <julia@diku.dk> LKML-Reference: <Pine.LNX.4.64.1003291736440.21896@ask.diku.dk> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Li Zefan 提交于
# echo 1 > events/enable # echo global > trace_clock ------------[ cut here ]------------ WARNING: at kernel/lockdep.c:3162 check_flags+0xb2/0x190() ... ---[ end trace 3f86734a89416623 ]--- possible reason: unannotated irqs-on. ... There's no reason to use the raw_local_irq_save() in trace_clock_global. The local_irq_save() version is fine, and does not cause the bug in lockdep. Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4BA97FA1.7030606@cn.fujitsu.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 19 3月, 2010 1 次提交
-
-
由 Steven Rostedt 提交于
The ring buffer uses 4 byte alignment while recording events into the buffer, even on 64bit machines. This saves space when there are lots of events being recorded at 4 byte boundaries. The ring buffer has a zero copy method to write into the buffer, with the reserving of space and then committing it. This may cause problems when writing an 8 byte word into a 4 byte alignment (not 8). For x86 and PPC this is not an issue, but on some architectures this would cause an out-of-alignment exception. This patch uses CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to determine if it is OK to use 4 byte alignments on 64 bit machines. If it is not, it forces the ring buffer event header to be 8 bytes and not 4, and will align the length of the data to be 8 byte aligned. This keeps the data payload at 8 byte alignments and will allow these machines to run without issue. The trick to this is that the header can be either 4 bytes or 8 bytes depending on the length of the data payload. The 4 byte header has a length field that supports up to 112 bytes. If the length of the data is more than 112, the length field is set to zero, and the actual length is stored in the next 4 bytes after the header. When CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set, the code forces zero in the 4 byte header forcing the length to be stored in the 4 byte array, even with a small data load. It also forces the length of the data load to be 8 byte aligned. The combination of these two guarantee that the data is always at 8 byte alignment. Tested-by: NFrederic Weisbecker <fweisbec@gmail.com> (on sparc64) Reported-by: NFrederic Weisbecker <fweisbec@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 17 3月, 2010 1 次提交
-
-
由 Frederic Weisbecker 提交于
perf_arch_fetch_caller_regs() is exported for the overriden x86 version, but not for the generic weak version. As a general rule, weak functions should not have their symbol exported in the same file they are defined. So let's export it on trace_event_perf.c as it is used by trace events only. This fixes: ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined! ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined! -v2: And also only build it if trace events are enabled. -v3: Fix changelog mistake Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 13 3月, 2010 5 次提交
-
-
由 Steven Rostedt 提交于
A bug was found with Li Zefan's ftrace_stress_test that caused applications to segfault during the test. Placing a tracing_off() in the segfault code, and examining several traces, I found that the following was always the case. The lock tracer was enabled (lockdep being required) and userstack was enabled. Testing this out, I just enabled the two, but that was not good enough. I needed to run something else that could trigger it. Running a load like hackbench did not work, but executing a new program would. The following would trigger the segfault within seconds: # echo 1 > /debug/tracing/options/userstacktrace # echo 1 > /debug/tracing/events/lock/enable # while :; do ls > /dev/null ; done Enabling the function graph tracer and looking at what was happening I finally noticed that all cashes happened just after an NMI. 1) | copy_user_handle_tail() { 1) | bad_area_nosemaphore() { 1) | __bad_area_nosemaphore() { 1) | no_context() { 1) | fixup_exception() { 1) 0.319 us | search_exception_tables(); 1) 0.873 us | } [...] 1) 0.314 us | __rcu_read_unlock(); 1) 0.325 us | native_apic_mem_write(); 1) 0.943 us | } 1) 0.304 us | rcu_nmi_exit(); [...] 1) 0.479 us | find_vma(); 1) | bad_area() { 1) | __bad_area() { After capturing several traces of failures, all of them happened after an NMI. Curious about this, I added a trace_printk() to the NMI handler to read the regs->ip to see where the NMI happened. In which I found out it was here: ffffffff8135b660 <page_fault>: ffffffff8135b660: 48 83 ec 78 sub $0x78,%rsp ffffffff8135b664: e8 97 01 00 00 callq ffffffff8135b800 <error_entry> What was happening is that the NMI would happen at the place that a page fault occurred. It would call rcu_read_lock() which was traced by the lock events, and the user_stack_trace would run. This would trigger a page fault inside the NMI. I do not see where the CR2 register is saved or restored in NMI handling. This means that it would corrupt the page fault handling that the NMI interrupted. The reason the while loop of ls helped trigger the bug, was that each execution of ls would cause lots of pages to be faulted in, and increase the chances of the race happening. The simple solution is to not allow user stack traces in NMI context. After this patch, I ran the above "ls" test for a couple of hours without any issues. Without this patch, the bug would trigger in less than a minute. Cc: stable@kernel.org Reported-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
When the trace iterator is read, tracing_start() and tracing_stop() is called to stop tracing while the iterator is processing the trace output. These functions disable both the standard buffer and the max latency buffer. But if the wakeup tracer is running, it can switch these buffers between the two disables: buffer = global_trace.buffer; if (buffer) ring_buffer_record_disable(buffer); <<<--------- swap happens here buffer = max_tr.buffer; if (buffer) ring_buffer_record_disable(buffer); What happens is that we disabled the same buffer twice. On tracing_start() we can enable the same buffer twice. All ring_buffer_record_disable() must be matched with a ring_buffer_record_enable() or the buffer can be disable permanently, or enable prematurely, and cause a bug where a reset happens while a trace is commiting. This patch protects these two by taking the ftrace_max_lock to prevent a switch from occurring. Found with Li Zefan's ftrace_stress_test. Cc: stable@kernel.org Reported-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
In the ftrace code that resets the ring buffer it references the buffer with a local variable, but then uses the tr->buffer as the parameter to reset. If the wakeup tracer is running, which can switch the tr->buffer with the max saved buffer, this can break the requirement of disabling the buffer before the reset. buffer = tr->buffer; ring_buffer_record_disable(buffer); synchronize_sched(); __tracing_reset(tr->buffer, cpu); If the tr->buffer is swapped, then the reset is not happening to the buffer that was disabled. This will cause the ring buffer to fail. Found with Li Zefan's ftrace_stress_test. Cc: stable@kernel.org Reported-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
If the graph tracer is active, and a task is forked but the allocating of the processes graph stack fails, it can cause crash later on. This is due to the temporary stack being NULL, but the curr_ret_stack variable is copied from the parent. If it is not -1, then in ftrace_graph_probe_sched_switch() the following: for (index = next->curr_ret_stack; index >= 0; index--) next->ret_stack[index].calltime += timestamp; Will cause a kernel OOPS. Found with Li Zefan's ftrace_stress_test. Cc: stable@kernel.org Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Lai Jiangshan 提交于
The ring buffer resizing and resetting relies on a schedule RCU action. The buffers are disabled, a synchronize_sched() is called and then the resize or reset takes place. But this only works if the disabling of the buffers are within the preempt disabled section, otherwise a window exists that the buffers can be written to while a reset or resize takes place. Cc: stable@kernel.org Reported-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <4B949E43.2010906@cn.fujitsu.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 11 3月, 2010 2 次提交
-
-
由 Xiao Guangrong 提交于
Export perf_trace_regs and perf_arch_fetch_caller_regs since module will use these. Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> [ use EXPORT_PER_CPU_SYMBOL_GPL() ] Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4B989C1B.2090407@cn.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Paul E. McKenney 提交于
Replace the calls to read_barrier_depends() in ftrace_list_func() with rcu_dereference_raw() to improve readability. The reason that we use rcu_dereference_raw() here is that removed entries are never freed, instead they are simply leaked. This is one of a very few cases where use of rcu_dereference_raw() is the long-term right answer. And I don't yet know of any others. ;-) Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1267830207-9474-1-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 10 3月, 2010 2 次提交
-
-
由 Frederic Weisbecker 提交于
Drop the obsolete "profile" naming used by perf for trace events. Perf can now do more than simple events counting, so generalize the API naming. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Jason Baron <jbaron@redhat.com>
-
由 Frederic Weisbecker 提交于
We are taking a wrong regs snapshot when a trace event triggers. Either we use get_irq_regs(), which gives us the interrupted registers if we are in an interrupt, or we use task_pt_regs() which gives us the state before we entered the kernel, assuming we are lucky enough to be no kernel thread, in which case task_pt_regs() returns the initial set of regs when the kernel thread was started. What we want is different. We need a hot snapshot of the regs, so that we can get the instruction pointer to record in the sample, the frame pointer for the callchain, and some other things. Let's use the new perf_fetch_caller_regs() for that. Comparison with perf record -e lock: -R -a -f -g Before: perf [kernel] [k] __do_softirq | --- __do_softirq | |--55.16%-- __open | --44.84%-- __write_nocancel After: perf [kernel] [k] perf_tp_event | --- perf_tp_event | |--41.07%-- lock_acquire | | | |--39.36%-- _raw_spin_lock | | | | | |--7.81%-- hrtimer_interrupt | | | smp_apic_timer_interrupt | | | apic_timer_interrupt The old case was producing unreliable callchains. Now having right frame and instruction pointers, we have the trace we want. Also syscalls and kprobe events already have the right regs, let's use them instead of wasting a retrieval. v2: Follow the rename perf_save_regs() -> perf_fetch_caller_regs() Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Archs <linux-arch@vger.kernel.org>
-
- 06 3月, 2010 4 次提交
-
-
由 Tim Bird 提交于
Add support for tracing_thresh to the function_graph tracer. This version of this feature isolates the checks into new entry and return functions, to avoid adding more conditional code into the main function_graph paths. When the tracing_thresh is set and the function graph tracer is enabled, only the functions that took longer than the time in microseconds that was set in tracing_thresh are recorded. To do this efficiently, only the function exits are recorded: [tracing]# echo 100 > tracing_thresh [tracing]# echo function_graph > current_tracer [tracing]# cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) ! 119.214 us | } /* smp_apic_timer_interrupt */ 1) <========== | 0) ! 101.527 us | } /* __rcu_process_callbacks */ 0) ! 126.461 us | } /* rcu_process_callbacks */ 0) ! 145.111 us | } /* __do_softirq */ 0) ! 149.667 us | } /* do_softirq */ 0) ! 168.817 us | } /* irq_exit */ 0) ! 248.254 us | } /* smp_apic_timer_interrupt */ Also, add support for specifying tracing_thresh on the kernel command line. When used like so: "tracing_thresh=200 ftrace=function_graph" this can be used to analyse system startup. It is important to disable tracing soon after boot, in order to avoid losing the trace data. Acked-by: NFrederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NTim Bird <tim.bird@am.sony.com> LKML-Reference: <4B87098B.4040308@am.sony.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Arnaldo Carvalho de Melo 提交于
The latency output showed: # | task: -3 (uid:0 nice:0 policy:1 rt_prio:99) The comm is missing in the "task:" and it looks like a minus 3 is the output. The correct display should be: # | task: migration/0-3 (uid:0 nice:0 policy:1 rt_prio:99) The problem is that the comm is being stored in the wrong data structure. The max_tr.data[cpu] is what stores the comm, not the tr->data[cpu]. Before this patch the max_tr.data[cpu]->comm was zeroed and the /debug/trace ended up showing just the '-' sign followed by the pid. Also remove a needless initialization of max_data. Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1267824230-23861-1-git-send-email-acme@infradead.org> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
When a '}' does not have a matching function start, the name is printed within parenthesis. But this makes it confusing between ending '}' and function starts. This patch makes the function name appear in C comment notation. Old view: 3) 1.281 us | } (might_fault) 3) 3.620 us | } (filldir) 3) 5.251 us | } (call_filldir) 3) | call_filldir() { 3) | filldir() { New view: 3) 1.281 us | } /* might_fault */ 3) 3.620 us | } /* filldir */ 3) 5.251 us | } /* call_filldir */ 3) | call_filldir() { 3) | filldir() { Requested-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
The declaration of ftrace_set_func() is at the start of the ftrace.c file and wrapped with a #ifdef CONFIG_FUNCTION_GRAPH condition. If function graph tracing is enabled but CONFIG_DYNAMIC_FTRACE is not, a warning about that function being declared static and unused is given. This really should have been placed within the CONFIG_FUNCTION_GRAPH condition that uses ftrace_set_func(). Moving the declaration down fixes the warning and makes the code cleaner. Reported-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 04 3月, 2010 1 次提交
-
-
由 Paul E. McKenney 提交于
Change the pair of rcu_dereference() calls in ftrace_perf_buf_prepare() to rcu_dereference_sched(). Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1267667418-32233-3-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 03 3月, 2010 1 次提交
-
-
由 Lai Jiangshan 提交于
This warning in s_next() can be triggered by lseek(): [<c018b3f7>] ? s_next+0x77/0x80 [<c013e3c1>] warn_slowpath_common+0x81/0xa0 [<c018b3f7>] ? s_next+0x77/0x80 [<c013e3fa>] warn_slowpath_null+0x1a/0x20 [<c018b3f7>] s_next+0x77/0x80 [<c01efa77>] traverse+0x117/0x200 [<c01eff13>] seq_lseek+0xa3/0x120 [<c01efe70>] ? seq_lseek+0x0/0x120 [<c01d7081>] vfs_llseek+0x41/0x50 [<c01d8116>] sys_llseek+0x66/0xa0 [<c0102bd0>] sysenter_do_call+0x12/0x26 The iterator "leftover" variable is zeroed in the opening of the trace file. But lseek can call s_start() which will call s_next() without reseting the "leftover" variable back to zero, which might trigger the WARN_ON_ONCE(iter->leftover) that is in s_next(). Cc: stable@kernel.org Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <4B8CE06A.9090207@cn.fujitsu.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 01 3月, 2010 2 次提交
-
-
由 Dmitry Monakhov 提交于
Currently even if BLKTRACESETUP ioctl has failed user must call BLKTRACETEARDOWN to be shure what all staff was cleaned, which is contr-intuitive. Let's setup ioctl make necessery cleanup by it self. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Frederic Weisbecker 提交于
trace_clock.c includes spinlock.h, which ends up including asm/system.h, which in turn includes linux/irqflags.h in x86. So the definition of raw_local_irq_save is luckily covered there, but this is not the case in parisc: tip/kernel/trace/trace_clock.c:86: error: implicit declaration of function 'raw_local_irq_save' tip/kernel/trace/trace_clock.c:112: error: implicit declaration of function 'raw_local_irq_restore' We need to include linux/irqflags.h directly from trace_clock.c to avoid such build error. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Robert Richter <robert.richter@amd.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 27 2月, 2010 1 次提交
-
-
由 Steven Rostedt 提交于
The function graph tracer is currently the most invasive tracer in the ftrace family. It can easily overflow the buffer even with 10megs per CPU. This means that events can often be lost. On start up, or after events are lost, if the function return is recorded but the function enter was lost, all we get to see is the exiting '}'. Here is how a typical trace output starts: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } 0) ! 567.961 us | } 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } 0) ! 603.361 us | } 0) ! 613.574 us | } 0) ! 623.554 us | } 0) 3.653 us | fget_light(); 0) | sock_poll() { There are a series of '}' with no matching "func() {". There's no information to what functions these ending brackets belong to. This patch adds a stack on the per cpu structure used in outputting the function graph tracer to keep track of what function was outputted. Then on a function exit event, it checks the depth to see if the function exit has a matching entry event. If it does, then it only prints the '}', otherwise it adds the function name after the '}'. This allows function exit events to show what function they belong to at trace output startup, when the entry was lost due to ring buffer overflow, or even after a new task is scheduled in. Here is what the above trace will look like after this patch: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } (irq_exit) 0) ! 567.961 us | } (smp_apic_timer_interrupt) 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } (add_wait_queue) 0) ! 603.361 us | } (__pollwait) 0) ! 613.574 us | } (tcp_poll) 0) ! 623.554 us | } (sock_poll) 0) 3.653 us | fget_light(); 0) | sock_poll() { Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 25 2月, 2010 6 次提交
-
-
由 Wenji Huang 提交于
Discard freeing field->type since it is not necessary. Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NWenji Huang <wenji.huang@oracle.com> LKML-Reference: <1266997226-6833-5-git-send-email-wenji.huang@oracle.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Wenji Huang 提交于
The "cpu" variable is declared at the start of the function and also within a branch, with the exact same initialization. Remove the local variable of the same name in the branch. Signed-off-by: NWenji Huang <wenji.huang@oracle.com> LKML-Reference: <1266997226-6833-3-git-send-email-wenji.huang@oracle.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Wenji Huang 提交于
Signed-off-by: NWenji Huang <wenji.huang@oracle.com> LKML-Reference: <1266997226-6833-2-git-send-email-wenji.huang@oracle.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Wenji Huang 提交于
Signed-off-by: NWenji Huang <wenji.huang@oracle.com> LKML-Reference: <1266997226-6833-1-git-send-email-wenji.huang@oracle.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Li Zefan 提交于
The power tracer has been converted to power trace events. Acked-by: NFrederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4B84D50E.4070806@cn.fujitsu.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Jeff Mahoney 提交于
GCC 4.5 introduces behavior that forces the alignment of structures to use the largest possible value. The default value is 32 bytes, so if some structures are defined with a 4-byte alignment and others aren't declared with an alignment constraint at all - it will align at 32-bytes. For things like the ftrace events, this results in a non-standard array. When initializing the ftrace subsystem, we traverse the _ftrace_events section and call the initialization callback for each event. When the structures are misaligned, we could be treating another part of the structure (or the zeroed out space between them) as a function pointer. This patch forces the alignment for all the ftrace_event_call structures to 4 bytes. Without this patch, the kernel fails to boot very early when built with gcc 4.5. It's trivial to check the alignment of the members of the array, so it might be worthwhile to add something to the build system to do that automatically. Unfortunately, that only covers this case. I've asked one of the gcc developers about adding a warning when this condition is seen. Cc: stable@kernel.org Signed-off-by: NJeff Mahoney <jeffm@suse.com> LKML-Reference: <4B85770B.6010901@suse.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 17 2月, 2010 2 次提交
-
-
由 Heiko Carstens 提交于
KPROBES_EVENT actually depends on the regs and stack access API (b1cf540f) and not on x86. So introduce a new config option which architectures can select if they have the API implemented and switch x86. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Acked-by: NMasami Hiramatsu <mhiramat@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> LKML-Reference: <20100210162517.GB6933@osiris.boeblingen.de.ibm.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
由 Mike Frysinger 提交于
Most implementations of arch_syscall_addr() are the same, so create a default version in common code and move the one piece that differs (the syscall table) to asm/syscall.h. New arch ports don't have to waste time copying & pasting this simple function. The s390/sparc versions need to be different, so document why. Signed-off-by: NMike Frysinger <vapier@gentoo.org> Acked-by: NDavid S. Miller <davem@davemloft.net> Acked-by: NPaul Mundt <lethal@linux-sh.org> Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1264498803-17278-1-git-send-email-vapier@gentoo.org> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
- 14 2月, 2010 1 次提交
-
-
由 Heiko Carstens 提交于
Trying to add a probe like: echo p:myprobe 0x10000 > /sys/kernel/debug/tracing/kprobe_events will fail since the wrong pointer is passed to strict_strtoul when trying to convert the address to an unsigned long. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Acked-by: NMasami Hiramatsu <mhiramat@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20100210162346.GA6933@osiris.boeblingen.de.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 12 2月, 2010 1 次提交
-
-
由 Li Zefan 提交于
I don't see why we can only clear all functions from the filter. After patching: # echo sys_open > set_graph_function # echo sys_close >> set_graph_function # cat set_graph_function sys_open sys_close # echo '!sys_close' >> set_graph_function # cat set_graph_function sys_open Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4B726388.2000408@cn.fujitsu.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 10 2月, 2010 1 次提交
-
-
由 Steven Rostedt 提交于
The branch annotation is a bit difficult to see the worst offenders because it only sorts by percentage: correct incorrect % Function File Line ------- --------- - -------- ---- ---- 0 163 100 qdisc_restart sch_generic.c 179 0 163 100 pfifo_fast_dequeue sch_generic.c 447 0 4 100 pskb_trim_rcsum skbuff.h 1689 0 4 100 llc_rcv llc_input.c 170 0 18 100 psmouse_interrupt psmouse-base.c 304 0 3 100 atkbd_interrupt atkbd.c 389 0 5 100 usb_alloc_dev usb.c 437 0 11 100 vsscanf vsprintf.c 1897 0 2 100 IS_ERR err.h 34 0 23 100 __rmqueue_fallback page_alloc.c 865 0 4 100 probe_wakeup_sched_switch trace_sched_wakeup.c 142 0 3 100 move_masked_irq migration.c 11 Adding the incorrect and correct values as sort keys makes this file a bit more informative: correct incorrect % Function File Line ------- --------- - -------- ---- ---- 0 366541 100 audit_syscall_entry auditsc.c 1637 0 366538 100 audit_syscall_exit auditsc.c 1685 0 115839 100 sched_info_switch sched_stats.h 269 0 74567 100 sched_info_queued sched_stats.h 222 0 66578 100 sched_info_dequeued sched_stats.h 177 0 15113 100 trace_workqueue_insertion workqueue.h 38 0 15107 100 trace_workqueue_execution workqueue.h 45 0 3622 100 syscall_trace_leave ptrace.c 1772 0 2750 100 sched_move_task sched.c 10100 0 2750 100 sched_move_task sched.c 10110 0 1815 100 pre_schedule_rt sched_rt.c 1462 0 837 100 audit_alloc auditsc.c 879 0 814 100 tcp_mss_split_point tcp_output.c 1302 Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-