- 16 6月, 2015 2 次提交
-
-
由 Alexei Starovoitov 提交于
bpf_trace_printk() is a helper function used to debug eBPF programs. Let socket and TC programs use it as well. Note, it's DEBUG ONLY helper. If it's used in the program, the kernel will print warning banner to make sure users don't use it in production. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
eBPF programs attached to kprobes need to filter based on current->pid, uid and other fields, so introduce helper functions: u64 bpf_get_current_pid_tgid(void) Return: current->tgid << 32 | current->pid u64 bpf_get_current_uid_gid(void) Return: current_gid << 32 | current_uid bpf_get_current_comm(char *buf, int size_of_buf) stores current->comm into buf They can be used from the programs attached to TC as well to classify packets based on current task fields. Update tracex2 example to print histogram of write syscalls for each process instead of aggregated for all. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 6月, 2015 1 次提交
-
-
由 Wang Long 提交于
The producer should be used producer_fifo as its sched_priority, so correct it. Link: http://lkml.kernel.org/r/1433923957-67842-1-git-send-email-long.wanglong@huawei.com Cc: stable@vger.kernel.org # 2.6.33+ Signed-off-by: NWang Long <long.wanglong@huawei.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 01 6月, 2015 1 次提交
-
-
由 Daniel Borkmann 提交于
As this is already exported from tracing side via commit d9847d31 ("tracing: Allow BPF programs to call bpf_ktime_get_ns()"), we might as well want to move it to the core, so also networking users can make use of it, e.g. to measure diffs for certain flows from ingress/egress. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 5月, 2015 1 次提交
-
-
由 Alexei Starovoitov 提交于
introduce bpf_tail_call(ctx, &jmp_table, index) helper function which can be used from BPF programs like: int bpf_prog(struct pt_regs *ctx) { ... bpf_tail_call(ctx, &jmp_table, index); ... } that is roughly equivalent to: int bpf_prog(struct pt_regs *ctx) { ... if (jmp_table[index]) return (*jmp_table[index])(ctx); ... } The important detail that it's not a normal call, but a tail call. The kernel stack is precious, so this helper reuses the current stack frame and jumps into another BPF program without adding extra call frame. It's trivially done in interpreter and a bit trickier in JITs. In case of x64 JIT the bigger part of generated assembler prologue is common for all programs, so it is simply skipped while jumping. Other JITs can do similar prologue-skipping optimization or do stack unwind before jumping into the next program. bpf_tail_call() arguments: ctx - context pointer jmp_table - one of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table index - index in the jump table Since all BPF programs are idenitified by file descriptor, user space need to populate the jmp_table with FDs of other BPF programs. If jmp_table[index] is empty the bpf_tail_call() doesn't jump anywhere and program execution continues as normal. New BPF_MAP_TYPE_PROG_ARRAY map type is introduced so that user space can populate this jmp_table array with FDs of other bpf programs. Programs can share the same jmp_table array or use multiple jmp_tables. The chain of tail calls can form unpredictable dynamic loops therefore tail_call_cnt is used to limit the number of calls and currently is set to 32. Use cases: Acked-by: NDaniel Borkmann <daniel@iogearbox.net> ========== - simplify complex programs by splitting them into a sequence of small programs - dispatch routine For tracing and future seccomp the program may be triggered on all system calls, but processing of syscall arguments will be different. It's more efficient to implement them as: int syscall_entry(struct seccomp_data *ctx) { bpf_tail_call(ctx, &syscall_jmp_table, ctx->nr /* syscall number */); ... default: process unknown syscall ... } int sys_write_event(struct seccomp_data *ctx) {...} int sys_read_event(struct seccomp_data *ctx) {...} syscall_jmp_table[__NR_write] = sys_write_event; syscall_jmp_table[__NR_read] = sys_read_event; For networking the program may call into different parsers depending on packet format, like: int packet_parser(struct __sk_buff *skb) { ... parse L2, L3 here ... __u8 ipproto = load_byte(skb, ... offsetof(struct iphdr, protocol)); bpf_tail_call(skb, &ipproto_jmp_table, ipproto); ... default: process unknown protocol ... } int parse_tcp(struct __sk_buff *skb) {...} int parse_udp(struct __sk_buff *skb) {...} ipproto_jmp_table[IPPROTO_TCP] = parse_tcp; ipproto_jmp_table[IPPROTO_UDP] = parse_udp; - for TC use case, bpf_tail_call() allows to implement reclassify-like logic - bpf_map_update_elem/delete calls into BPF_MAP_TYPE_PROG_ARRAY jump table are atomic, so user space can build chains of BPF programs on the fly Implementation details: ======================= - high performance of bpf_tail_call() is the goal. It could have been implemented without JIT changes as a wrapper on top of BPF_PROG_RUN() macro, but with two downsides: . all programs would have to pay performance penalty for this feature and tail call itself would be slower, since mandatory stack unwind, return, stack allocate would be done for every tailcall. . tailcall would be limited to programs running preempt_disabled, since generic 'void *ctx' doesn't have room for 'tail_call_cnt' and it would need to be either global per_cpu variable accessed by helper and by wrapper or global variable protected by locks. In this implementation x64 JIT bypasses stack unwind and jumps into the callee program after prologue. - bpf_prog_array_compatible() ensures that prog_type of callee and caller are the same and JITed/non-JITed flag is the same, since calling JITed program from non-JITed is invalid, since stack frames are different. Similarly calling kprobe type program from socket type program is invalid. - jump table is implemented as BPF_MAP_TYPE_PROG_ARRAY to reuse 'map' abstraction, its user space API and all of verifier logic. It's in the existing arraymap.c file, since several functions are shared with regular array map. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 5月, 2015 1 次提交
-
-
由 Alex Bennée 提交于
The only caller to this function (__print_array) was getting it wrong by passing the array length instead of buffer length. As the element size was already being passed for other reasons it seems reasonable to push the calculation of buffer length into the function. Link: http://lkml.kernel.org/r/1430320727-14582-1-git-send-email-alex.bennee@linaro.orgSigned-off-by: NAlex Bennée <alex.bennee@linaro.org> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 17 4月, 2015 1 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
The code that replaces the enum names with the enum values in the tracepoints' format files could possible miss the end of string nul character. This was caused by processing things like backslashes, quotes and other tokens. After processing the tokens, a check for the nul character needed to be done before continuing the loop, because the loop incremented the pointer before doing the check, which could bypass the nul character. Link: http://lkml.kernel.org/r/552E661D.5060502@oracle.com Reported-by: Sasha Levin <sasha.levin@oracle.com> # via KASan Tested-by: NAndrey Ryabinin <a.ryabinin@samsung.com> Fixes: 0c564a53 "tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values" Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 16 4月, 2015 4 次提交
-
-
由 Joonsoo Kim 提交于
There is a problem that trace events are not properly enabled with boot cmdline. The problem is that if we pass "trace_event=kmem:mm_page_alloc" to the boot cmdline, it enables all kmem trace events, and not just the page_alloc event. This is caused by the parsing mechanism. When we parse the cmdline, the buffer contents is modified due to tokenization. And, if we use this buffer again, we will get the wrong result. Unfortunately, this buffer is be accessed three times to set trace events properly at boot time. So, we need to handle this situation. There is already code handling ",", but we need another for ":". This patch adds it. Link: http://lkml.kernel.org/r/1429159484-22977-1-git-send-email-iamjoonsoo.kim@lge.com Cc: stable@vger.kernel.org # 3.19+ Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> [ added missing return ret; ] Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Rabin Vincent 提交于
graph_trace_open() can be called in atomic context from ftrace_dump(). Use GFP_ATOMIC for the memory allocations when that's the case, in order to avoid the following splat. BUG: sleeping function called from invalid context at mm/slab.c:2849 in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/0 Backtrace: .. [<8004dc94>] (__might_sleep) from [<801371f4>] (kmem_cache_alloc_trace+0x160/0x238) r7:87800040 r6:000080d0 r5:810d16e8 r4:000080d0 [<80137094>] (kmem_cache_alloc_trace) from [<800cbd60>] (graph_trace_open+0x30/0xd0) r10:00000100 r9:809171a8 r8:00008e28 r7:810d16f0 r6:00000001 r5:810d16e8 r4:810d16f0 [<800cbd30>] (graph_trace_open) from [<800c79c4>] (trace_init_global_iter+0x50/0x9c) r8:00008e28 r7:808c853c r6:00000001 r5:810d16e8 r4:810d16f0 r3:800cbd30 [<800c7974>] (trace_init_global_iter) from [<800c7aa0>] (ftrace_dump+0x90/0x2ec) r4:810d2580 r3:00000000 [<800c7a10>] (ftrace_dump) from [<80414b2c>] (sysrq_ftrace_dump+0x1c/0x20) r10:00000100 r9:809171a8 r8:808f6e7c r7:00000001 r6:00000007 r5:0000007a r4:808d5394 [<80414b10>] (sysrq_ftrace_dump) from [<800169b8>] (return_to_handler+0x0/0x18) [<80415498>] (__handle_sysrq) from [<800169b8>] (return_to_handler+0x0/0x18) r8:808c8100 r7:808c8444 r6:00000101 r5:00000010 r4:84eb3210 [<80415668>] (handle_sysrq) from [<800169b8>] (return_to_handler+0x0/0x18) [<8042a760>] (pl011_int) from [<800169b8>] (return_to_handler+0x0/0x18) r10:809171bc r9:809171a8 r8:00000001 r7:00000026 r6:808c6000 r5:84f01e60 r4:8454fe00 [<8007782c>] (handle_irq_event_percpu) from [<80077b44>] (handle_irq_event+0x4c/0x6c) r10:808c7ef0 r9:87283e00 r8:00000001 r7:00000000 r6:8454fe00 r5:84f01e60 r4:84f01e00 [<80077af8>] (handle_irq_event) from [<8007aa28>] (handle_fasteoi_irq+0xf0/0x1ac) r6:808f52a4 r5:84f01e60 r4:84f01e00 r3:00000000 [<8007a938>] (handle_fasteoi_irq) from [<80076dc0>] (generic_handle_irq+0x3c/0x4c) r6:00000026 r5:00000000 r4:00000026 r3:8007a938 [<80076d84>] (generic_handle_irq) from [<80077128>] (__handle_domain_irq+0x8c/0xfc) r4:808c1e38 r3:0000002e [<8007709c>] (__handle_domain_irq) from [<800087b8>] (gic_handle_irq+0x34/0x6c) r10:80917748 r9:00000001 r8:88802100 r7:808c7ef0 r6:808c8fb0 r5:00000015 r4:8880210c r3:808c7ef0 [<80008784>] (gic_handle_irq) from [<80014044>] (__irq_svc+0x44/0x7c) Link: http://lkml.kernel.org/r/1428953721-31349-1-git-send-email-rabin@rab.in Link: http://lkml.kernel.org/r/1428957012-2319-1-git-send-email-rabin@rab.in Cc: stable@vger.kernel.org # 3.13+ Signed-off-by: NRabin Vincent <rabin@rab.in> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Joe Perches 提交于
The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Miscellanea: o Remove unused return value from trace_lookup_stack Signed-off-by: NJoe Perches <joe@perches.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Howells 提交于
relayfs and tracefs are dealing with inodes of their own; those two act as filesystem drivers Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 08 4月, 2015 3 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
Add a enum_map file in the tracing directory to see what enums have been saved to convert in the print fmt files. As this requires the enum mapping to be persistent in memory, it is only created if the new config option CONFIG_TRACE_ENUM_MAP_FILE is enabled. This is for debugging and will increase the persistent memory footprint of the kernel. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.orgReviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt (Red Hat) 提交于
Update the infrastructure such that modules that declare TRACE_DEFINE_ENUM() will have those enums converted into their values in the tracepoint print fmt strings. Link: http://lkml.kernel.org/r/87vbhjp74q.fsf@rustcorp.com.auAcked-by: NRusty Russell <rusty@rustcorp.com.au> Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt (Red Hat) 提交于
Several tracepoints use the helper functions __print_symbolic() or __print_flags() and pass in enums that do the mapping between the binary data stored and the value to print. This works well for reading the ASCII trace files, but when the data is read via userspace tools such as perf and trace-cmd, the conversion of the binary value to a human string format is lost if an enum is used, as userspace does not have access to what the ENUM is. For example, the tracepoint trace_tlb_flush() has: __print_symbolic(REC->reason, { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" }, { TLB_REMOTE_SHOOTDOWN, "remote shootdown" }, { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }) Which maps the enum values to the strings they represent. But perf and trace-cmd do no know what value TLB_LOCAL_MM_SHOOTDOWN is, and would not be able to map it. With TRACE_DEFINE_ENUM(), developers can place these in the event header files and ftrace will convert the enums to their values: By adding: TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH); TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN); $ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format [...] __print_symbolic(REC->reason, { 0, "flush on task switch" }, { 1, "remote shootdown" }, { 2, "local shootdown" }, { 3, "local mm shootdown" }) The above is what userspace expects to see, and tools do not need to be modified to parse them. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Cc: Guilherme Cox <cox@computer.org> Cc: Tony Luck <tony.luck@gmail.com> Cc: Xie XiuQi <xiexiuqi@huawei.com> Acked-by: NNamhyung Kim <namhyung@kernel.org> Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 03 4月, 2015 1 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
Dynamically allocated trampolines call ftrace_ops_get_func to get the function which they should call. For dynamic fops (FTRACE_OPS_FL_DYNAMIC flag is set) ftrace_ops_list_func is always returned. This is reasonable for static trampolines but goes against the main advantage of dynamic ones, that is avoidance of going through the list of all registered callbacks for functions that are only being traced by a single callback. We can fix it by returning ops->func (or recursion safe version) from ftrace_ops_get_func whenever it is possible for dynamic trampolines. Note that dynamic trampolines are not allowed for dynamic fops if CONFIG_PREEMPT=y. Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1501291023000.25445@pobox.suse.cz Link: http://lkml.kernel.org/r/1424357773-13536-1-git-send-email-mbenes@suse.czReported-by: NMiroslav Benes <mbenes@suse.cz> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 02 4月, 2015 5 次提交
-
-
由 Ingo Molnar 提交于
So bpf_tracing.o depends on CONFIG_BPF_SYSCALL - but that's not its only dependency, it also depends on the tracing infrastructure and on kprobes, without which it will fail to build with: In file included from kernel/trace/bpf_trace.c:14:0: kernel/trace/trace.h: In function ‘trace_test_and_set_recursion’: kernel/trace/trace.h:491:28: error: ‘struct task_struct’ has no member named ‘trace_recursion’ unsigned int val = current->trace_recursion; [...] It took quite some time to trigger this build failure, because right now BPF_SYSCALL is very obscure, depends on CONFIG_EXPERT. So also make BPF_SYSCALL more configurable, not just under CONFIG_EXPERT. If BPF_SYSCALL, tracing and kprobes are enabled then enable the bpf_tracing gateway as well. We might want to make this an interactive option later on, although I'd not complicate it unnecessarily: enabling BPF_SYSCALL is enough of an indicator that the user wants BPF support. Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexei Starovoitov 提交于
Debugging of BPF programs needs some form of printk from the program, so let programs call limited trace_printk() with %d %u %x %p modifiers only. Similar to kernel modules, during program load verifier checks whether program is calling bpf_trace_printk() and if so, kernel allocates trace_printk buffers and emits big 'this is debug only' banner. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Reviewed-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1427312966-8434-6-git-send-email-ast@plumgrid.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexei Starovoitov 提交于
bpf_ktime_get_ns() is used by programs to compute time delta between events or as a timestamp Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Reviewed-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1427312966-8434-5-git-send-email-ast@plumgrid.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexei Starovoitov 提交于
BPF programs, attached to kprobes, provide a safe way to execute user-defined BPF byte-code programs without being able to crash or hang the kernel in any way. The BPF engine makes sure that such programs have a finite execution time and that they cannot break out of their sandbox. The user interface is to attach to a kprobe via the perf syscall: struct perf_event_attr attr = { .type = PERF_TYPE_TRACEPOINT, .config = event_id, ... }; event_fd = perf_event_open(&attr,...); ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd); 'prog_fd' is a file descriptor associated with BPF program previously loaded. 'event_id' is an ID of the kprobe created. Closing 'event_fd': close(event_fd); ... automatically detaches BPF program from it. BPF programs can call in-kernel helper functions to: - lookup/update/delete elements in maps - probe_read - wraper of probe_kernel_read() used to access any kernel data structures BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is architecture dependent) and return 0 to ignore the event and 1 to store kprobe event into the ring buffer. Note, kprobes are a fundamentally _not_ a stable kernel ABI, so BPF programs attached to kprobes must be recompiled for every kernel version and user must supply correct LINUX_VERSION_CODE in attr.kern_version during bpf_prog_load() call. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Reviewed-by: NSteven Rostedt <rostedt@goodmis.org> Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexei Starovoitov 提交于
add TRACE_EVENT_FL_KPROBE flag to differentiate kprobe type of tracepoints, since bpf programs can only be attached to kprobe type of PERF_TYPE_TRACEPOINT perf events. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Reviewed-by: NSteven Rostedt <rostedt@goodmis.org> Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1427312966-8434-3-git-send-email-ast@plumgrid.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 31 3月, 2015 1 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
A clean up of the recursive protection code changed val = this_cpu_read(current_context); val--; val &= this_cpu_read(current_context); to val = this_cpu_read(current_context); val &= val & (val - 1); Which has a duplicate use of '&' as the above is the same as val = val & (val - 1); Actually, it would be best to remove that line altogether and just add it to where it is used. And Christoph even mentioned that it can be further compacted to just a single line: __this_cpu_and(current_context, __this_cpu_read(current_context) - 1); Link: http://lkml.kernel.org/alpine.DEB.2.11.1503271423580.23114@gentwo.orgSuggested-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 25 3月, 2015 4 次提交
-
-
由 Stephen Rothwell 提交于
The commit that added a check for this to checkpatch says: "Using weak declarations can have unintended link defects. The __weak on the declaration causes non-weak definitions to become weak." In this case, when a PowerPC kernel is built with CONFIG_KPROBE_EVENT but not CONFIG_UPROBE_EVENT, it generates the following warning: WARNING: 1 bad relocations c0000000014f2190 R_PPC64_ADDR64 uprobes_fetch_type_table This is fixed by passing the fetch_table arrays to traceprobe_parse_probe_arg() which also means that they can never be NULL. Link: http://lkml.kernel.org/r/20150312165834.4482cb48@canb.auug.org.auAcked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 He Kuang 提交于
TRACE_EVENT_FL_USE_CALL_FILTER flag in ftrace:functon event can be removed. This flag was first introduced in commit f306cc82 ("tracing: Update event filters for multibuffer"). Now, the only place uses this flag is ftrace:function, but the filter of ftrace:function has a different code path with events/syscalls and events/tracepoints. It uses ftrace_filter_write() and perf's ftrace_profile_set_filter() to set the filter, the functionality of file 'tracing/events/ftrace/function/filter' is bypassed in function init_pred(), in which case, neither call->filter nor file->filter is used. So we can safely remove TRACE_EVENT_FL_USE_CALL_FILTER flag from ftrace:function events. Link: http://lkml.kernel.org/r/1425367294-27852-1-git-send-email-hekuang@huawei.comSigned-off-by: NHe Kuang <hekuang@huawei.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Scott Wood 提交于
Use %pS for actual addresses, otherwise you'll get bad output on arches like ppc64 where %pF expects a function descriptor. Link: http://lkml.kernel.org/r/1426130037-17956-22-git-send-email-scottwood@freescale.comSigned-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
It has come to my attention that this_cpu_read/write are horrible on architectures other than x86. Worse yet, they actually disable preemption or interrupts! This caused some unexpected tracing results on ARM. 101.356868: preempt_count_add <-ring_buffer_lock_reserve 101.356870: preempt_count_sub <-ring_buffer_lock_reserve The ring_buffer_lock_reserve has recursion protection that requires accessing a per cpu variable. But since preempt_disable() is traced, it too got traced while accessing the variable that is suppose to prevent recursion like this. The generic version of this_cpu_read() and write() are: #define this_cpu_generic_read(pcp) \ ({ typeof(pcp) ret__; \ preempt_disable(); \ ret__ = *this_cpu_ptr(&(pcp)); \ preempt_enable(); \ ret__; \ }) #define this_cpu_generic_to_op(pcp, val, op) \ do { \ unsigned long flags; \ raw_local_irq_save(flags); \ *__this_cpu_ptr(&(pcp)) op val; \ raw_local_irq_restore(flags); \ } while (0) Which is unacceptable for locations that know they are within preempt disabled or interrupt disabled locations. Paul McKenney stated that __this_cpu_() versions produce much better code on other architectures than this_cpu_() does, if we know that the call is done in a preempt disabled location. I also changed the recursive_unlock() to use two local variables instead of accessing the per_cpu variable twice. Link: http://lkml.kernel.org/r/20150317114411.GE3589@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/20150317104038.312e73d1@gandalf.local.home Cc: stable@vger.kernel.org Acked-by: NChristoph Lameter <cl@linux.com> Reported-by: NUwe Kleine-Koenig <u.kleine-koenig@pengutronix.de> Tested-by: NUwe Kleine-Koenig <u.kleine-koenig@pengutronix.de> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 23 3月, 2015 1 次提交
-
-
由 Peter Zijlstra 提交于
The only reason CQM had to use a hard-coded pmu type was so it could use cqm_target in hw_perf_event. Do away with the {tp,bp,cqm}_target pointers and provide a non type specific one. This allows us to do away with that silly pmu type as well. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Vince Weaver <vince@deater.net> Cc: acme@kernel.org Cc: acme@redhat.com Cc: hpa@zytor.com Cc: jolsa@redhat.com Cc: kanaka.d.juvva@intel.com Cc: matt.fleming@intel.com Cc: tglx@linutronix.de Cc: torvalds@linux-foundation.org Cc: vikas.shivappa@linux.intel.com Link: http://lkml.kernel.org/r/20150305211019.GU21418@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 3月, 2015 3 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
Some archs (specifically PowerPC), are sensitive with the ordering of the enabling of the calls to function tracing and setting of the function to use to be traced. That is, update_ftrace_function() sets what function the ftrace_caller trampoline should call. Some archs require this to be set before calling ftrace_run_update_code(). Another bug was discovered, that ftrace_startup_sysctl() called ftrace_run_update_code() directly. If the function the ftrace_caller trampoline changes, then it will not be updated. Instead a call to ftrace_startup_enable() should be called because it tests to see if the callback changed since the code was disabled, and will tell the arch to update appropriately. Most archs do not need this notification, but PowerPC does. The problem could be seen by the following commands: # echo 0 > /proc/sys/kernel/ftrace_enabled # echo function > /sys/kernel/debug/tracing/current_tracer # echo 1 > /proc/sys/kernel/ftrace_enabled # cat /sys/kernel/debug/tracing/trace The trace will show that function tracing was not active. Cc: stable@vger.kernel.org # 2.6.27+ Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Pratyush Anand 提交于
When ftrace is enabled globally through the proc interface, we must check if ftrace_graph_active is set. If it is set, then we should also pass the FTRACE_START_FUNC_RET command to ftrace_run_update_code(). Similarly, when ftrace is disabled globally through the proc interface, we must check if ftrace_graph_active is set. If it is set, then we should also pass the FTRACE_STOP_FUNC_RET command to ftrace_run_update_code(). Consider the following situation. # echo 0 > /proc/sys/kernel/ftrace_enabled After this ftrace_enabled = 0. # echo function_graph > /sys/kernel/debug/tracing/current_tracer Since ftrace_enabled = 0, ftrace_enable_ftrace_graph_caller() is never called. # echo 1 > /proc/sys/kernel/ftrace_enabled Now ftrace_enabled will be set to true, but still ftrace_enable_ftrace_graph_caller() will not be called, which is not desired. Further if we execute the following after this: # echo nop > /sys/kernel/debug/tracing/current_tracer Now since ftrace_enabled is set it will call ftrace_disable_ftrace_graph_caller(), which causes a kernel warning on the ARM platform. On the ARM platform, when ftrace_enable_ftrace_graph_caller() is called, it checks whether the old instruction is a nop or not. If it's not a nop, then it returns an error. If it is a nop then it replaces instruction at that address with a branch to ftrace_graph_caller. ftrace_disable_ftrace_graph_caller() behaves just the opposite. Therefore, if generic ftrace code ever calls either ftrace_enable_ftrace_graph_caller() or ftrace_disable_ftrace_graph_caller() consecutively two times in a row, then it will return an error, which will cause the generic ftrace code to raise a warning. Note, x86 does not have an issue with this because the architecture specific code for ftrace_enable_ftrace_graph_caller() and ftrace_disable_ftrace_graph_caller() does not check the previous state, and calling either of these functions twice in a row has no ill effect. Link: http://lkml.kernel.org/r/e4fbe64cdac0dd0e86a3bf914b0f83c0b419f146.1425666454.git.panand@redhat.com Cc: stable@vger.kernel.org # 2.6.31+ Signed-off-by: NPratyush Anand <panand@redhat.com> [ removed extra if (ftrace_start_up) and defined ftrace_graph_active as 0 if CONFIG_FUNCTION_GRAPH_TRACER is not set. ] Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt (Red Hat) 提交于
When /proc/sys/kernel/ftrace_enabled is set to zero, all function tracing is disabled. But the records that represent the functions still hold information about the ftrace_ops that are hooked to them. ftrace_ops may request "REGS" (have a full set of pt_regs passed to the callback), or "TRAMP" (the ops has its own trampoline to use). When the record is updated to represent the state of the ops hooked to it, it sets "REGS_EN" and/or "TRAMP_EN" to state that the callback points to the correct trampoline (REGS has its own trampoline). When ftrace_enabled is set to zero, all ftrace locations are a nop, so they do not point to any trampoline. But the _EN flags are still set. This can cause the accounting to go wrong when ftrace_enabled is cleared and an ops that has a trampoline is registered or unregistered. For example, the following will cause ftrace to crash: # echo function_graph > /sys/kernel/debug/tracing/current_tracer # echo 0 > /proc/sys/kernel/ftrace_enabled # echo nop > /sys/kernel/debug/tracing/current_tracer # echo 1 > /proc/sys/kernel/ftrace_enabled # echo function_graph > /sys/kernel/debug/tracing/current_tracer As function_graph uses a trampoline, when ftrace_enabled is set to zero the updates to the record are not done. When enabling function_graph again, the record will still have the TRAMP_EN flag set, and it will look for an op that has a trampoline other than the function_graph ops, and fail to find one. Cc: stable@vger.kernel.org # 3.17+ Reported-by: NPratyush Anand <panand@redhat.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 14 2月, 2015 1 次提交
-
-
由 Tejun Heo 提交于
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 2月, 2015 1 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
When an application connects to the ring buffer via splice, it can only read full pages. Splice does not work with partial pages. If there is not enough data to fill a page, the splice command will either block or return -EAGAIN (if set to nonblock). Code was added where if the page is not full, to just sleep again. The problem is, it will get woken up again on the next event. That is, when something is written into the ring buffer, if there is a waiter it will wake it up. The waiter would then check the buffer, see that it still does not have enough data to fill a page and go back to sleep. To make matters worse, when the waiter goes back to sleep, it could cause another event, which would wake it back up again to see it doesn't have enough data and sleep again. This produces a tremendous overhead and fills the ring buffer with noise. For example, recording sched_switch on an idle system for 10 seconds produces 25,350,475 events!!! Create another wait queue for those waiters wanting full pages. When an event is written, it only wakes up waiters if there's a full page of data. It does not wake up the waiter if the page is not yet full. After this change, recording sched_switch on an idle system for 10 seconds produces only 800 events. Getting rid of 25,349,675 useless events (99.9969% of events!!), is something to take seriously. Cc: stable@vger.kernel.org # 3.16+ Cc: Rabin Vincent <rabin@rab.in> Fixes: e30f53aa "tracing: Do not busy wait in buffer splice" Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 10 2月, 2015 1 次提交
-
-
由 Vikram Mulukutla 提交于
Commit 6edb2a8a introduced an array map_pages that contains the addresses returned by kmap_atomic. However, when unmapping those pages, map_pages[0] is unmapped before map_pages[1], breaking the nesting requirement as specified in the documentation for kmap_atomic/kunmap_atomic. This was caught by the highmem debug code present in kunmap_atomic. Fix the loop to do the unmapping properly. Link: http://lkml.kernel.org/r/1418871056-6614-1-git-send-email-markivx@codeaurora.org Cc: stable@vger.kernel.org # 3.5+ Reviewed-by: NStephen Boyd <sboyd@codeaurora.org> Reported-by: NLime Yang <limey@codeaurora.org> Signed-off-by: NVikram Mulukutla <markivx@codeaurora.org> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 04 2月, 2015 5 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
The tracing "instances" directory can create sub tracing buffers with mkdir, and remove them with rmdir. As a mkdir will also create all the files and directories that control the sub buffer the inode mutexes need to be released before this is done, to avoid deadlocks. It is better to let the tracing system unlock the inode mutexes before calling the functions that create the files within the new directory (or deletes the files from the one being destroyed). Now that tracing has been converted over to tracefs, the tracefs file system can be modified to accommodate this feature. It still releases the locks, but the filesystem itself can take care of the ugly business and let the user just do what it needs. The tracing system now attaches a descriptor to the directory dentry that can have userspace create or remove sub directories. If this descriptor does not exist for a dentry, then that dentry can not be used to create other directories. This descriptor holds a mkdir and rmdir method that only takes a character string as an argument. The tracefs file system will first make a copy of the dentry name before releasing the locks. Then it will pass the copied name to the methods. It is up to the tracing system that supplied the methods to handle races with duplicate names and such as all the inode mutexes would be released when the functions are called. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt (Red Hat) 提交于
As tools currently rely on the tracing directory in debugfs, we can not just created a tracefs infrastructure and expect sysadmins to mount the new tracefs to have their old tools work. Instead, the debugfs tracing directory is still created and the tracefs file system is mounted there when the debugfs filesystem is mounted. No longer does the tracing infrastructure update the debugfs file system, but instead interacts with the tracefs file system. But now, it still appears to the user like nothing changed, except you also have the feature of mounting just the tracing system without needing all of debugfs! Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt (Red Hat) 提交于
debugfs was fine for the tracing facility as a quick way to get an interface. Now that tracing has matured, it should separate itself from debugfs such that it can be mounted separately without needing to mount all of debugfs with it. That is, users resist using tracing because it requires mounting debugfs. Having tracing have its own file system lets users get the features of tracing without needing to bring in the rest of the kernel's debug infrastructure. Another reason for tracefs is that debubfs does not support mkdir. Currently, to create instances, one does a mkdir in the tracing/instance directory. This is implemented via a hack that forces debugfs to do something it is not intended on doing. By converting over to tracefs, this hack can be removed and mkdir can be properly implemented. This patch does not address this yet, but it lays the ground work for that to be done. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt (Red Hat) 提交于
The options for cmdline tracers are not created if the debugfs system is not ready yet. If tracing has started before debugfs is up, then the option files for the tracer are not created. Create them when creating the tracing directory if the current tracer requires option files. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt (Red Hat) 提交于
Do not bother creating tracer options if no tracing directory exists. If a tracer is enabled via the command line, and is started before the tracing directory is created, then it wont have its tracer specific options created. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 02 2月, 2015 2 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
The top level trace array is treated a little different than the instances, as it has to deal with more of the general tracing. The tr->dir is the tracing directory, which is an immutable dentry, where as the tr->dir of instances are the dentry that was created, and can be destroyed later. These should have different functions accessing them. As only tracing_init_dentry() deals with the top level array, fold the code for it into that function, and remove the trace_init_dentry_tr() that was also used by the instances to get their directory dentry. Add a tracing_get_dentry() to just get the tracing dir entry for instances as well as the top level array. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt (Red Hat) 提交于
tracing_init_dentry_tr() is not used outside of trace.c, it should be static. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 30 1月, 2015 1 次提交
-
-
由 Todd E Brandt 提交于
Export the suspend_resume tracepoint so it can be used in loadable modules. Signed-off-by: NTodd Brandt <todd.e.brandt@linux.intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-