- 14 10月, 2008 40 次提交
-
-
由 Stephen Rothwell 提交于
After disabling FTRACE_MCOUNT_RECORD via a patch, a dormant build failure surfaced: kernel/trace/ftrace.c: In function 'ftrace_record_ip': kernel/trace/ftrace.c:416: error: incompatible type for argument 1 of '_spin_lock_irqsave' kernel/trace/ftrace.c:433: error: incompatible type for argument 1 of '_spin_lock_irqsave' Introduced by commit 6dad8e07f4c10b17b038e84d29f3ca41c2e55cd0 ("ftrace: add necessary locking for ftrace records"). Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
The modification of code is performed either by kstop_machine, before SMP starts, or on module code before the module is executed. There is no reason to do the modifications from assembly. The copy to and from user functions are sufficient and produces cleaner and easier to read code. Thanks to Benjamin Herrenschmidt for suggesting the idea. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
During tests and checks, I've discovered that there were failures to convert mcount callers into nops. Looking deeper into these failures, code that was attempted to be changed was not an mcount caller. The current code only updates if the code being changed is what it expects, but I still investigate any time there is a failure. What was happening is that a weak symbol was being used as a reference for other mcount callers. That weak symbol was also referenced elsewhere so the offsets were using the strong symbol and not the function symbol that it was referenced from. This patch changes the setting up of the mcount_loc section to search for a global function that is not weak. It will pick a local over a weak but if only a weak is found in a section, a warning is printed and the mcount location is not recorded (just to be safe). Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
I'm trying to keep all the arch changes in recordmcount.pl in one place. I moved your code into that area, by adding the flags to the commands that were passed in. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Jeremy Fitzhardinge 提交于
I'm seeing when I use separate src/build dirs: make[3]: *** [arch/x86/kernel/time_32.o] Error 1 /bin/sh: scripts/recordmcount.pl: No such file or directory make[3]: *** [arch/x86/kernel/irq_32.o] Error 1 /bin/sh: scripts/recordmcount.pl: No such file or directory make[3]: *** [arch/x86/kernel/ldt.o] Error 1 /bin/sh: scripts/recordmcount.pl: No such file or directory make[3]: *** [arch/x86/kernel/i8259.o] Error 1 /bin/sh: scripts/recordmcount.pl: No such file or directory This fixes it. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Huang Ying 提交于
This patch fixes incorrect comment style of __ftrace_enabled_save(). Signed-off-by: NHuang Ying <ying.huang@intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
The new design of pre-recorded mcounts and updating the code outside of kstop_machine has changed the way the records themselves are protected. This patch uses the ftrace_lock to protect the records. Note, the lock still does not need to be taken within calls that are only called via kstop_machine, since the that code can not run while the spin lock is held. Also removed the hash_lock needed for the daemon when MCOUNT_RECORD is configured. Also did a slight cleanup of an unused variable. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
If one of the self tests of ftrace has disabled the function tracer, do not run the code to convert the mcount calls in modules. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Frédéric Weisbecker 提交于
This patch fixes some mistakes on the tracer in warning messages when debugfs fails to create tracing files. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: srostedt@redhat.com Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
hack around: ld: Relocatable linking with relocations from format elf32-i386 (init/.tmp_gl_calibrate.o) to format elf64-x86-64 (init/.tmp_mx_calibrate.o) i CC arch/x86/mm/extable.o objcopy: 'init/.tmp_mx_calibrate.o': No such file rm: cannot remove `init/.tmp_mx_calibrate.o': No such file or directory ld: Relocatable linking with relocations from format elf32-i386 (arch/x86/mm/extable.o) to format elf64-x86-64 (arch/x86/mm/.tmp_mx_extable.o) is not supported mv: cannot stat `arch/x86/mm/.tmp_mx_extable.o': No such file or directory ld: Relocatable linking with relocations from format elf32-i386 (arch/x86/mm/fault.o) to format elf64-x86-64 (arch/x86/mm/.tmp_mx_fault.o) is not supported Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
fix: kernel/built-in.o: In function `ftrace_dump': (.text+0x2e2ea): undefined reference to `ftrace_kill_atomic' Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
fix: In file included from init/main.c:65: include/linux/ftrace.h:166: error: expected ‘,' or ‘;' before ‘{' token make[1]: *** [init/main.o] Error 1 make: *** [init/main.o] Error 2 Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
At OLS I had a lot of interest to be able to have the ftrace buffers dumped on panic. Usually one would expect to uses kexec and examine the buffers after a new kernel is loaded. But sometimes the resources do not permit kdump and kexec, so having an option to still see the sequence of events up to the crash is very advantageous. This patch adds the option to have the ftrace buffers dumped to the console in the latency_trace format on a panic. When the option is set, the default entries per CPU buffer are lowered to 16384, since the writing to the serial (if that is the console) may take an awful long time otherwise. [ Changes since -v1: Got alpine to send correctly (as well as spell check working). Removed config option. Moved the static variables into ftrace_dump itself. Gave printk a log level. ] Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
Based on Randy Dunlap's suggestion, the ftrace_printk kernel-doc belongs with the ftrace_printk macro that should be used. Not with the __ftrace_printk internal function. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Acked-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
This patch adds a feature that can help kernel developers debug their code using ftrace. int ftrace_printk(const char *fmt, ...); This records into the ftrace buffer using printf formatting. The entry size in the buffers are still a fixed length. A new type has been added that allows for more entries to be used for a single recording. The start of the print is still the same as the other entries. It returns the number of characters written to the ftrace buffer. For example: Having a module with the following code: static int __init ftrace_print_test(void) { ftrace_printk("jiffies are %ld\n", jiffies); return 0; } Gives me: insmod-5441 3...1 7569us : ftrace_print_test: jiffies are 4296626666 for the latency_trace file and: insmod-5441 [03] 1959.370498: ftrace_print_test jiffies are 4296626666 for the trace file. Note: Only the infrastructure should go into the kernel. It is to help facilitate debugging for other kernel developers. Calls to ftrace_printk is not intended to be left in the kernel, and should be frowned upon just like scattering printks around in the code. But having this easily at your fingertips helps the debugging go faster and bugs be solved quicker. Maybe later on, we can hook this with markers and have their printf format be sucked into ftrace output. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
Some tracers will need to work with more than one entry. In order to do this the trace_entry structure was split into two fields. One for the start of all entries, and one to continue an existing entry. The trace_entry structure now has a "field" entry that consists of the previous content of the trace_entry, and a "cont" entry that is just a string buffer the size of the "field" entry. Thanks to Andrew Morton for suggesting this idea. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
When a mcount pointer is recorded into a table, it is used to add or remove calls to mcount (replacing them with nops). If the code is removed via removing a module, the pointers still exist. At modifying the code a check is always made to make sure the code being replaced is the code expected. In-other-words, the code being replaced is compared to what it is expected to be before being replaced. There is a very small chance that the code being replaced just happens to look like code that calls mcount (very small since the call to mcount is relative). To remove this chance, this patch adds ftrace_release to allow module unloading to remove the pointers to mcount within the module. Another change for init calls is made to not trace calls marked with __init. The tracing can not be started until after init is done anyway. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
The notrace define belongs in compiler.h so that it can be used in init.h Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
Seems that freed records can appear in the available_filter_functions list. This patch fixes that. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
Mathieu Desnoyers revealed a bug in the original code. The nop that is used to relpace the mcount caller can be a two part nop. This runs the risk where a process can be preempted after executing the first nop, but before the second part of the nop. The ftrace code calls kstop_machine to keep multiple CPUs from executing code that is being modified, but it does not protect against a task preempting in the middle of a two part nop. If the above preemption happens and the tracer is enabled, after the kstop_machine runs, all those nops will be calls to the trace function. If the preempted process that was preempted between the two nops is executed again, it will execute half of the call to the trace function, and this might crash the system. This patch instead uses what both the latest Intel and AMD spec suggests. That is the P6_NOP5 sequence of "0x0f 0x1f 0x44 0x00 0x00". Note, some older CPUs and QEMU might fault on this nop, so this nop is executed with fault handling first. If it detects a fault, it will then use the code "0x66 0x66 0x66 0x66 0x90". If that faults, it will then default to a simple "jmp 1f; .byte 0x00 0x00 0x00; 1:". The jmp is not optimal but will do if the first two can not be executed. TODO: Examine the cpuid to determine the nop to use. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
x86 now sets up the mcount locations through the build and no longer needs to record the ip when the function is executed. This patch changes the initial mcount to simply return. There's no need to do any other work. If the ftrace start up test fails, the original mcount will be what everything will use, so having this as fast as possible is a good thing. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
Enable the use of the __mcount_loc infrastructure on x86_64 and i386. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
When enabling or disabling CONFIG_FTRACE_MCOUNT_RECORD, we want a full kernel compile to handle the adding of the __mcount_loc sections. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
This patch enables the loading of the __mcount_section of modules and changing all the callers of mcount into nops. The modification is done before the init_module function is called, so again, we do not need to use kstop_machine to make these changes. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
This is the infrastructure to the converting the mcount call sites recorded by the __mcount_loc section into nops on boot. It also allows for using these sites to enable tracing as normal. When the __mcount_loc section is used, the "ftraced" kernel thread is disabled. This uses the current infrastructure to record the mcount call sites as well as convert them to nops. The mcount function is kept as a stub on boot up and not converted to the ftrace_record_ip function. We use the ftrace_record_ip to only record from the table. This patch does not handle modules. That comes with a later patch. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
This patch creates a section in the kernel called "__mcount_loc". This will hold a list of pointers to the mcount relocation for each call site of mcount. For example: objdump -dr init/main.o [...] Disassembly of section .text: 0000000000000000 <do_one_initcall>: 0: 55 push %rbp [...] 000000000000017b <init_post>: 17b: 55 push %rbp 17c: 48 89 e5 mov %rsp,%rbp 17f: 53 push %rbx 180: 48 83 ec 08 sub $0x8,%rsp 184: e8 00 00 00 00 callq 189 <init_post+0xe> 185: R_X86_64_PC32 mcount+0xfffffffffffffffc [...] We will add a section to point to each function call. .section __mcount_loc,"a",@progbits [...] .quad .text + 0x185 [...] The offset to of the mcount call site in init_post is an offset from the start of the section, and not the start of the function init_post. The mcount relocation is at the call site 0x185 from the start of the .text section. .text + 0x185 == init_post + 0xa We need a way to add this __mcount_loc section in a way that we do not lose the relocations after final link. The .text section here will be attached to all other .text sections after final link and the offsets will be meaningless. We need to keep track of where these .text sections are. To do this, we use the start of the first function in the section. do_one_initcall. We can make a tmp.s file with this function as a reference to the start of the .text section. .section __mcount_loc,"a",@progbits [...] .quad do_one_initcall + 0x185 [...] Then we can compile the tmp.s into a tmp.o gcc -c tmp.s -o tmp.o And link it into back into main.o. ld -r main.o tmp.o -o tmp_main.o mv tmp_main.o main.o But we have a problem. What happens if the first function in a section is not exported, and is a static function. The linker will not let the tmp.o use it. This case exists in main.o as well. Disassembly of section .init.text: 0000000000000000 <set_reset_devices>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: e8 00 00 00 00 callq 9 <set_reset_devices+0x9> 5: R_X86_64_PC32 mcount+0xfffffffffffffffc The first function in .init.text is a static function. 00000000000000a8 t __setup_set_reset_devices 000000000000105f t __setup_str_set_reset_devices 0000000000000000 t set_reset_devices The lowercase 't' means that set_reset_devices is local and is not exported. If we simply try to link the tmp.o with the set_reset_devices we end up with two symbols: one local and one global. .section __mcount_loc,"a",@progbits .quad set_reset_devices + 0x10 00000000000000a8 t __setup_set_reset_devices 000000000000105f t __setup_str_set_reset_devices 0000000000000000 t set_reset_devices U set_reset_devices We still have an undefined reference to set_reset_devices, and if we try to compile the kernel, we will end up with an undefined reference to set_reset_devices, or even worst, it could be exported someplace else, and then we will have a reference to the wrong location. To handle this case, we make an intermediate step using objcopy. We convert set_reset_devices into a global exported symbol before linking it with tmp.o and set it back afterwards. 00000000000000a8 t __setup_set_reset_devices 000000000000105f t __setup_str_set_reset_devices 0000000000000000 T set_reset_devices 00000000000000a8 t __setup_set_reset_devices 000000000000105f t __setup_str_set_reset_devices 0000000000000000 T set_reset_devices 00000000000000a8 t __setup_set_reset_devices 000000000000105f t __setup_str_set_reset_devices 0000000000000000 t set_reset_devices Now we have a section in main.o called __mcount_loc that we can place somewhere in the kernel using vmlinux.ld.S and access it to convert all these locations that call mcount into nops before starting SMP and thus, eliminating the need to do this with kstop_machine. Note, A well documented perl script (scripts/recordmcount.pl) is used to do all this in one location. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
it can be called in the NMI path: [ 0.645999] calling ftrace_dynamic_init+0x0/0xd6 [ 0.647521] ------------[ cut here ]------------ [ 0.647521] WARNING: at kernel/trace/ftrace.c:348 ftrace_record_ip+0x4e/0x252() [ 0.647521] Modules linked in: [ 0.647521] Pid: 15, comm: kstop1 Not tainted 2.6.27-rc1-tip #22686 [ 0.647521] [ 0.647521] Call Trace: [ 0.647521] <NMI> [<ffffffff8024593f>] warn_on_slowpath+0x5d/0x84 [ 0.647521] [<ffffffff80220b99>] ? lapic_wd_event+0xb/0x5c [ 0.647521] [<ffffffff80287b3b>] ftrace_record_ip+0x4e/0x252 [ 0.647521] [<ffffffff80211274>] mcount_call+0x5/0x31 [ 0.647521] [<ffffffff80220b9e>] ? lapic_wd_event+0x10/0x5c [ 0.647521] [<ffffffff8083f3ec>] nmi_watchdog_tick+0x19d/0x1ad [ 0.647521] [<ffffffff8083e875>] default_do_nmi+0x75/0x1e3 [ 0.647521] [<ffffffff8083f0b3>] do_nmi+0x5d/0x94 [ 0.647521] [<ffffffff8083e2d2>] nmi+0xa2/0xc2 [ 0.647521] [<ffffffff802b48c3>] ? check_bytes_and_report+0x11/0xcc [ 0.647521] <<EOE>> [<ffffffff80211274>] ? mcount_call+0x5/0x31 [ 0.647521] [<ffffffff802b49df>] check_object+0x61/0x1b0 [ 0.647521] [<ffffffff802b502a>] __slab_free+0x169/0x2ae [ 0.647521] [<ffffffff80242dbf>] ? __cleanup_sighand+0x25/0x27 [ 0.647521] [<ffffffff80242dbf>] ? __cleanup_sighand+0x25/0x27 [ 0.647521] [<ffffffff802b60cd>] kmem_cache_free+0x85/0xb9 [ 0.647521] [<ffffffff80242dbf>] __cleanup_sighand+0x25/0x27 [ 0.647521] [<ffffffff80247b3d>] release_task+0x256/0x339 [ 0.647521] [<ffffffff802490b4>] do_exit+0x764/0x7ef [ 0.647521] [<ffffffff8027624c>] __xchg+0x0/0x38 [ 0.647521] [<ffffffff8027619a>] ? stop_cpu+0x0/0xb2 [ 0.647521] [<ffffffff8027619a>] ? stop_cpu+0x0/0xb2 [ 0.647521] [<ffffffff8025922f>] kthread+0x4e/0x7b [ 0.647521] [<ffffffff80212979>] child_rip+0xa/0x11 [ 0.647521] [<ffffffff80211c17>] ? restore_args+0x0/0x30 [ 0.647521] [<ffffffff802283a5>] ? native_load_tls+0x14/0x2e [ 0.647521] [<ffffffff802591e1>] ? kthread+0x0/0x7b [ 0.647521] [<ffffffff8021296f>] ? child_rip+0x0/0x11 [ 0.647521] [ 0.647521] ---[ end trace 4eaa2a86a8e2da22 ]--- [ 0.672032] initcall ftrace_dynamic_init+0x0/0xd6 returned 0 after 19 msecs also mark it no-kprobes while at it. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
kprobes already has an extensive list of annotations for functions that should not be instrumented. Add notrace annotations to these functions as well. This is particularly useful for functions called by the NMI path. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mathieu Desnoyers 提交于
Steven Rostedt suggested: | Wouldn't it look nicer to have: (TRACEPOINT_TABLE_SIZE - 1) ? Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Pekka Paalanen 提交于
When SIL, DIL, BPL or SPL registers were used in MMIO, the datum was extracted from AH, BH, CH, or DH, which are incorrect. Signed-off-by: NPekka Paalanen <pq@iki.fi> Cc: "Vegard Nossum" <vegard.nossum@gmail.com> Cc: "Steven Rostedt" <srostedt@redhat.com> Cc: proski@gnu.org Cc: "Pekka Enberg" <penberg@cs.helsinki.fi> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
do not expose users to CONFIG_TRACEPOINTS - tracers can select it just fine. update ftrace to select CONFIG_TRACEPOINTS. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
make it a bit more structured hence more readable. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
while it's arguably low overhead, we dont enable new features by default. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mathieu Desnoyers 提交于
Porting the trace_mark() used by ftrace to tracepoints. (cleanup) Changelog : - Change error messages : marker -> tracepoint [ mingo@elte.hu: conflict resolutions ] Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: N'Peter Zijlstra' <peterz@infradead.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mathieu Desnoyers 提交于
Instrument the scheduler activity (sched_switch, migration, wakeups, wait for a task, signal delivery) and process/thread creation/destruction (fork, exit, kthread stop). Actually, kthread creation is not instrumented in this patch because it is architecture dependent. It allows to connect tracers such as ftrace which detects scheduling latencies, good/bad scheduler decisions. Tools like LTTng can export this scheduler information along with instrumentation of the rest of the kernel activity to perform post-mortem analysis on the scheduler activity. About the performance impact of tracepoints (which is comparable to markers), even without immediate values optimizations, tests done by Hideo Aoki on ia64 show no regression. His test case was using hackbench on a kernel where scheduler instrumentation (about 5 events in code scheduler code) was added. See the "Tracepoints" patch header for performance result detail. Changelog : - Change instrumentation location and parameter to match ftrace instrumentation, previously done with kernel markers. [ mingo@elte.hu: conflict resolutions ] Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: N'Peter Zijlstra' <peterz@infradead.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mathieu Desnoyers 提交于
Tracepoint example code under samples/. Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: N'Peter Zijlstra' <peterz@infradead.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mathieu Desnoyers 提交于
Documentation of tracepoint usage. Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: N'Peter Zijlstra' <peterz@infradead.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mathieu Desnoyers 提交于
Implementation of kernel tracepoints. Inspired from the Linux Kernel Markers. Allows complete typing verification by declaring both tracing statement inline functions and probe registration/unregistration static inline functions within the same macro "DEFINE_TRACE". No format string is required. See the tracepoint Documentation and Samples patches for usage examples. Taken from the documentation patch : "A tracepoint placed in code provides a hook to call a function (probe) that you can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or "off" (no probe is attached). When a tracepoint is "off" it has no effect, except for adding a tiny time penalty (checking a condition for a branch) and space penalty (adding a few bytes for the function call at the end of the instrumented function and adds a data structure in a separate section). When a tracepoint is "on", the function you provide is called each time the tracepoint is executed, in the execution context of the caller. When the function provided ends its execution, it returns to the caller (continuing from the tracepoint site). You can put tracepoints at important locations in the code. They are lightweight hooks that can pass an arbitrary number of parameters, which prototypes are described in a tracepoint declaration placed in a header file." Addition and removal of tracepoints is synchronized by RCU using the scheduler (and preempt_disable) as guarantees to find a quiescent state (this is really RCU "classic"). The update side uses rcu_barrier_sched() with call_rcu_sched() and the read/execute side uses "preempt_disable()/preempt_enable()". We make sure the previous array containing probes, which has been scheduled for deletion by the rcu callback, is indeed freed before we proceed to the next update. It therefore limits the rate of modification of a single tracepoint to one update per RCU period. The objective here is to permit fast batch add/removal of probes on _different_ tracepoints. Changelog : - Use #name ":" #proto as string to identify the tracepoint in the tracepoint table. This will make sure not type mismatch happens due to connexion of a probe with the wrong type to a tracepoint declared with the same name in a different header. - Add tracepoint_entry_free_old. - Change __TO_TRACE to get rid of the 'i' iterator. Masami Hiramatsu <mhiramat@redhat.com> : Tested on x86-64. Performance impact of a tracepoint : same as markers, except that it adds about 70 bytes of instructions in an unlikely branch of each instrumented function (the for loop, the stack setup and the function call). It currently adds a memory read, a test and a conditional branch at the instrumentation site (in the hot path). Immediate values will eventually change this into a load immediate, test and branch, which removes the memory read which will make the i-cache impact smaller (changing the memory read for a load immediate removes 3-4 bytes per site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it also saves the d-cache hit). About the performance impact of tracepoints (which is comparable to markers), even without immediate values optimizations, tests done by Hideo Aoki on ia64 show no regression. His test case was using hackbench on a kernel where scheduler instrumentation (about 5 events in code scheduler code) was added. Quoting Hideo Aoki about Markers : I evaluated overhead of kernel marker using linux-2.6-sched-fixes git tree, which includes several markers for LTTng, using an ia64 server. While the immediate trace mark feature isn't implemented on ia64, there is no major performance regression. So, I think that we don't have any issues to propose merging marker point patches into Linus's tree from the viewpoint of performance impact. I prepared two kernels to evaluate. The first one was compiled without CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS. I downloaded the original hackbench from the following URL: http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c I ran hackbench 5 times in each condition and calculated the average and difference between the kernels. The parameter of hackbench: every 50 from 50 to 800 The number of CPUs of the server: 2, 4, and 8 Below is the results. As you can see, major performance regression wasn't found in any case. Even if number of processes increases, differences between marker-enabled kernel and marker- disabled kernel doesn't increase. Moreover, if number of CPUs increases, the differences doesn't increase either. Curiously, marker-enabled kernel is better than marker-disabled kernel in more than half cases, although I guess it comes from the difference of memory access pattern. * 2 CPUs Number of | without | with | diff | diff | processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | -------------------------------------------------------------- 50 | 4.811 | 4.872 | +0.061 | +1.27 | 100 | 9.854 | 10.309 | +0.454 | +4.61 | 150 | 15.602 | 15.040 | -0.562 | -3.6 | 200 | 20.489 | 20.380 | -0.109 | -0.53 | 250 | 25.798 | 25.652 | -0.146 | -0.56 | 300 | 31.260 | 30.797 | -0.463 | -1.48 | 350 | 36.121 | 35.770 | -0.351 | -0.97 | 400 | 42.288 | 42.102 | -0.186 | -0.44 | 450 | 47.778 | 47.253 | -0.526 | -1.1 | 500 | 51.953 | 52.278 | +0.325 | +0.63 | 550 | 58.401 | 57.700 | -0.701 | -1.2 | 600 | 63.334 | 63.222 | -0.112 | -0.18 | 650 | 68.816 | 68.511 | -0.306 | -0.44 | 700 | 74.667 | 74.088 | -0.579 | -0.78 | 750 | 78.612 | 79.582 | +0.970 | +1.23 | 800 | 85.431 | 85.263 | -0.168 | -0.2 | -------------------------------------------------------------- * 4 CPUs Number of | without | with | diff | diff | processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | -------------------------------------------------------------- 50 | 2.586 | 2.584 | -0.003 | -0.1 | 100 | 5.254 | 5.283 | +0.030 | +0.56 | 150 | 8.012 | 8.074 | +0.061 | +0.76 | 200 | 11.172 | 11.000 | -0.172 | -1.54 | 250 | 13.917 | 14.036 | +0.119 | +0.86 | 300 | 16.905 | 16.543 | -0.362 | -2.14 | 350 | 19.901 | 20.036 | +0.135 | +0.68 | 400 | 22.908 | 23.094 | +0.186 | +0.81 | 450 | 26.273 | 26.101 | -0.172 | -0.66 | 500 | 29.554 | 29.092 | -0.461 | -1.56 | 550 | 32.377 | 32.274 | -0.103 | -0.32 | 600 | 35.855 | 35.322 | -0.533 | -1.49 | 650 | 39.192 | 38.388 | -0.804 | -2.05 | 700 | 41.744 | 41.719 | -0.025 | -0.06 | 750 | 45.016 | 44.496 | -0.520 | -1.16 | 800 | 48.212 | 47.603 | -0.609 | -1.26 | -------------------------------------------------------------- * 8 CPUs Number of | without | with | diff | diff | processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | -------------------------------------------------------------- 50 | 2.094 | 2.072 | -0.022 | -1.07 | 100 | 4.162 | 4.273 | +0.111 | +2.66 | 150 | 6.485 | 6.540 | +0.055 | +0.84 | 200 | 8.556 | 8.478 | -0.078 | -0.91 | 250 | 10.458 | 10.258 | -0.200 | -1.91 | 300 | 12.425 | 12.750 | +0.325 | +2.62 | 350 | 14.807 | 14.839 | +0.032 | +0.22 | 400 | 16.801 | 16.959 | +0.158 | +0.94 | 450 | 19.478 | 19.009 | -0.470 | -2.41 | 500 | 21.296 | 21.504 | +0.208 | +0.98 | 550 | 23.842 | 23.979 | +0.137 | +0.57 | 600 | 26.309 | 26.111 | -0.198 | -0.75 | 650 | 28.705 | 28.446 | -0.259 | -0.9 | 700 | 31.233 | 31.394 | +0.161 | +0.52 | 750 | 34.064 | 33.720 | -0.344 | -1.01 | 800 | 36.320 | 36.114 | -0.206 | -0.57 | -------------------------------------------------------------- Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: NMasami Hiramatsu <mhiramat@redhat.com> Acked-by: N'Peter Zijlstra' <peterz@infradead.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip由 Linus Torvalds 提交于
Merges oprofile, timers/hpet, x86/traps, x86/time, and x86/core misc items. * 'x86-core-v4-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (132 commits) x86: change early_ioremap to use slots instead of nesting x86: adjust dependencies for CONFIG_X86_CMOV dumpstack: x86: various small unification steps, fix x86: remove additional_cpus x86: remove additional_cpus configurability x86: improve UP kernel when CPU-hotplug and SMP is enabled dumpstack: x86: various small unification steps dumpstack: i386: make kstack= an early boot-param and add oops=panic dumpstack: x86: use log_lvl and unify trace formatting dumptrace: x86: consistently include loglevel, print stack switch dumpstack: x86: add "end" parameter to valid_stack_ptr and print_context_stack dumpstack: x86: make printk_address equal dumpstack: x86: move die_nmi to dumpstack_32.c traps: x86: finalize unification of traps.c traps: x86: make traps_32.c and traps_64.c equal traps: x86: various noop-changes preparing for unification of traps_xx.c traps: x86_64: use task_pid_nr(tsk) instead of tsk->pid in do_general_protection traps: i386: expand clear_mem_error and remove from mach_traps.h traps: x86_64: make io_check_error equal to the one on i386 traps: i386: use preempt_conditional_sti/cli in do_int3 ...
-
由 Alan Cox 提交于
Original idea for this from a patch by Rodolfo Giometti which merges various bits of PPS support Signed-off-by: NAlan Cox <alan@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-