1. 25 10月, 2017 1 次提交
    • Y
      bpf: permit multiple bpf attachments for a single perf event · e87c6bc3
      Yonghong Song 提交于
      This patch enables multiple bpf attachments for a
      kprobe/uprobe/tracepoint single trace event.
      Each trace_event keeps a list of attached perf events.
      When an event happens, all attached bpf programs will
      be executed based on the order of attachment.
      
      A global bpf_event_mutex lock is introduced to protect
      prog_array attaching and detaching. An alternative will
      be introduce a mutex lock in every trace_event_call
      structure, but it takes a lot of extra memory.
      So a global bpf_event_mutex lock is a good compromise.
      
      The bpf prog detachment involves allocation of memory.
      If the allocation fails, a dummy do-nothing program
      will replace to-be-detached program in-place.
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e87c6bc3
  2. 29 8月, 2017 1 次提交
    • Z
      perf/ftrace: Fix double traces of perf on ftrace:function · 75e83876
      Zhou Chengming 提交于
      When running perf on the ftrace:function tracepoint, there is a bug
      which can be reproduced by:
      
        perf record -e ftrace:function -a sleep 20 &
        perf record -e ftrace:function ls
        perf script
      
                    ls 10304 [005]   171.853235: ftrace:function:
        perf_output_begin
                    ls 10304 [005]   171.853237: ftrace:function:
        perf_output_begin
                    ls 10304 [005]   171.853239: ftrace:function:
        task_tgid_nr_ns
                    ls 10304 [005]   171.853240: ftrace:function:
        task_tgid_nr_ns
                    ls 10304 [005]   171.853242: ftrace:function:
        __task_pid_nr_ns
                    ls 10304 [005]   171.853244: ftrace:function:
        __task_pid_nr_ns
      
      We can see that all the function traces are doubled.
      
      The problem is caused by the inconsistency of the register
      function perf_ftrace_event_register() with the probe function
      perf_ftrace_function_call(). The former registers one probe
      for every perf_event. And the latter handles all perf_events
      on the current cpu. So when two perf_events on the current cpu,
      the traces of them will be doubled.
      
      So this patch adds an extra parameter "event" for perf_tp_event,
      only send sample data to this event when it's not NULL.
      Signed-off-by: NZhou Chengming <zhouchengming1@huawei.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: alexander.shishkin@linux.intel.com
      Cc: huawei.libin@huawei.com
      Link: http://lkml.kernel.org/r/1503668977-12526-1-git-send-email-zhouchengming1@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      75e83876
  3. 09 7月, 2017 1 次提交
  4. 08 7月, 2017 1 次提交
  5. 30 6月, 2017 1 次提交
  6. 18 5月, 2017 1 次提交
  7. 04 4月, 2017 1 次提交
    • A
      tracing/kprobes: expose maxactive for kretprobe in kprobe_events · 696ced4f
      Alban Crequy 提交于
      When a kretprobe is installed on a kernel function, there is a maximum
      limit of how many calls in parallel it can catch (aka "maxactive"). A
      kernel module could call register_kretprobe() and initialize maxactive
      (see example in samples/kprobes/kretprobe_example.c).
      
      But that is not exposed to userspace and it is currently not possible to
      choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events
      
      The default maxactive can be as low as 1 on single-core with a
      non-preemptive kernel. This is too low and we need to increase it not
      only for recursive functions, but for functions that sleep or resched.
      
      This patch updates the format of the command that can be written to
      kprobe_events so that maxactive can be optionally specified.
      
      I need this for a bpf program attached to the kretprobe of
      inet_csk_accept, which can sleep for a long time.
      
      This patch includes a basic selftest:
      
      > # ./ftracetest -v  test.d/kprobe/
      > === Ftrace unit tests ===
      > [1] Kprobe dynamic event - adding and removing	[PASS]
      > [2] Kprobe dynamic event - busy event check	[PASS]
      > [3] Kprobe dynamic event with arguments	[PASS]
      > [4] Kprobes event arguments with types	[PASS]
      > [5] Kprobe dynamic event with function tracer	[PASS]
      > [6] Kretprobe dynamic event with arguments	[PASS]
      > [7] Kretprobe dynamic event with maxactive	[PASS]
      >
      > # of passed:  7
      > # of failed:  0
      > # of unresolved:  0
      > # of untested:  0
      > # of unsupported:  0
      > # of xfailed:  0
      > # of undefined(test bug):  0
      
      BugLink: https://github.com/iovisor/bcc/issues/1072
      Link: http://lkml.kernel.org/r/1491215782-15490-1-git-send-email-alban@kinvolk.ioAcked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NAlban Crequy <alban@kinvolk.io>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      696ced4f
  8. 16 3月, 2017 1 次提交
    • N
      trace/kprobes: Fix check for kretprobe offset within function entry · 1d585e70
      Naveen N. Rao 提交于
      perf specifies an offset from _text and since this offset is fed
      directly into the arch-specific helper, kprobes tracer rejects
      installation of kretprobes through perf. Fix this by looking up the
      actual offset from a function for the specified sym+offset.
      
      Refactor and reuse existing routines to limit code duplication -- we
      repurpose kprobe_addr() for determining final kprobe address and we
      split out the function entry offset determination into a separate
      generic helper.
      
      Before patch:
      
        naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -v do_open%return
        probe-definition(0): do_open%return
        symbol:do_open file:(null) line:0 offset:0 return:1 lazy:(null)
        0 arguments
        Looking at the vmlinux_path (8 entries long)
        Using /boot/vmlinux for symbols
        Open Debuginfo file: /boot/vmlinux
        Try to find probe point from debuginfo.
        Matched function: do_open [2d0c7ff]
        Probe point found: do_open+0
        Matched function: do_open [35d76dc]
        found inline addr: 0xc0000000004ba9c4
        Failed to find "do_open%return",
         because do_open is an inlined function and has no return point.
        An error occurred in debuginfo analysis (-22).
        Trying to use symbols.
        Opening /sys/kernel/debug/tracing//README write=0
        Opening /sys/kernel/debug/tracing//kprobe_events write=1
        Writing event: r:probe/do_open _text+4469776
        Failed to write event: Invalid argument
          Error: Failed to add events. Reason: Invalid argument (Code: -22)
        naveen@ubuntu:~/linux/tools/perf$ dmesg | tail
        <snip>
        [   33.568656] Given offset is not valid for return probe.
      
      After patch:
      
        naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -v do_open%return
        probe-definition(0): do_open%return
        symbol:do_open file:(null) line:0 offset:0 return:1 lazy:(null)
        0 arguments
        Looking at the vmlinux_path (8 entries long)
        Using /boot/vmlinux for symbols
        Open Debuginfo file: /boot/vmlinux
        Try to find probe point from debuginfo.
        Matched function: do_open [2d0c7d6]
        Probe point found: do_open+0
        Matched function: do_open [35d76b3]
        found inline addr: 0xc0000000004ba9e4
        Failed to find "do_open%return",
         because do_open is an inlined function and has no return point.
        An error occurred in debuginfo analysis (-22).
        Trying to use symbols.
        Opening /sys/kernel/debug/tracing//README write=0
        Opening /sys/kernel/debug/tracing//kprobe_events write=1
        Writing event: r:probe/do_open _text+4469808
        Writing event: r:probe/do_open_1 _text+4956344
        Added new events:
          probe:do_open        (on do_open%return)
          probe:do_open_1      (on do_open%return)
      
        You can now use it in all perf tools, such as:
      
      	  perf record -e probe:do_open_1 -aR sleep 1
      
        naveen@ubuntu:~/linux/tools/perf$ sudo cat /sys/kernel/debug/kprobes/list
        c000000000041370  k  kretprobe_trampoline+0x0    [OPTIMIZED]
        c0000000004ba0b8  r  do_open+0x8    [DISABLED]
        c000000000443430  r  do_open+0x0    [DISABLED]
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/d8cd1ef420ec22e3643ac332fdabcffc77319a42.1488961018.git.naveen.n.rao@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1d585e70
  9. 04 3月, 2017 2 次提交
  10. 02 3月, 2017 1 次提交
  11. 15 2月, 2017 1 次提交
  12. 02 2月, 2017 1 次提交
  13. 13 12月, 2016 2 次提交
  14. 24 8月, 2016 1 次提交
  15. 20 6月, 2016 1 次提交
  16. 08 4月, 2016 1 次提交
  17. 23 3月, 2016 1 次提交
  18. 09 2月, 2016 1 次提交
    • M
      kprobes: Optimize hot path by using percpu counter to collect 'nhit' statistics · a7636d9e
      Martin KaFai Lau 提交于
      When doing ebpf+kprobe on some hot TCP functions (e.g.
      tcp_rcv_established), the kprobe_dispatcher() function
      shows up in 'perf report'.
      
      In kprobe_dispatcher(), there is a lot of cache bouncing
      on 'tk->nhit++'.  'tk->nhit' and 'tk->tp.flags' also share
      the same cacheline.
      
      perf report (cycles:pp):
      
      	8.30%  ipv4_dst_check
      	4.74%  copy_user_enhanced_fast_string
      	3.93%  dst_release
      	2.80%  tcp_v4_rcv
      	2.31%  queued_spin_lock_slowpath
      	2.30%  _raw_spin_lock
      	1.88%  mlx4_en_process_rx_cq
      	1.84%  eth_get_headlen
      	1.81%  ip_rcv_finish
      	~~~~
      	1.71%  kprobe_dispatcher
      	~~~~
      	1.55%  mlx4_en_xmit
      	1.09%  __probe_kernel_read
      
      perf report after patch:
      
      	9.15%  ipv4_dst_check
      	5.00%  copy_user_enhanced_fast_string
      	4.12%  dst_release
      	2.96%  tcp_v4_rcv
      	2.50%  _raw_spin_lock
      	2.39%  queued_spin_lock_slowpath
      	2.11%  eth_get_headlen
      	2.03%  mlx4_en_process_rx_cq
      	1.69%  mlx4_en_xmit
      	1.19%  ip_rcv_finish
      	1.12%  __probe_kernel_read
      	1.02%  ehci_hcd_cleanup
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Kernel Team <kernel-team@fb.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1454531308-2441898-1-git-send-email-kafai@fb.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a7636d9e
  19. 29 8月, 2015 1 次提交
  20. 14 5月, 2015 5 次提交
  21. 02 4月, 2015 2 次提交
    • A
      tracing, perf: Implement BPF programs attached to kprobes · 2541517c
      Alexei Starovoitov 提交于
      BPF programs, attached to kprobes, provide a safe way to execute
      user-defined BPF byte-code programs without being able to crash or
      hang the kernel in any way. The BPF engine makes sure that such
      programs have a finite execution time and that they cannot break
      out of their sandbox.
      
      The user interface is to attach to a kprobe via the perf syscall:
      
      	struct perf_event_attr attr = {
      		.type	= PERF_TYPE_TRACEPOINT,
      		.config	= event_id,
      		...
      	};
      
      	event_fd = perf_event_open(&attr,...);
      	ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
      
      'prog_fd' is a file descriptor associated with BPF program
      previously loaded.
      
      'event_id' is an ID of the kprobe created.
      
      Closing 'event_fd':
      
      	close(event_fd);
      
      ... automatically detaches BPF program from it.
      
      BPF programs can call in-kernel helper functions to:
      
        - lookup/update/delete elements in maps
      
        - probe_read - wraper of probe_kernel_read() used to access any
          kernel data structures
      
      BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
      architecture dependent) and return 0 to ignore the event and 1 to store
      kprobe event into the ring buffer.
      
      Note, kprobes are a fundamentally _not_ a stable kernel ABI,
      so BPF programs attached to kprobes must be recompiled for
      every kernel version and user must supply correct LINUX_VERSION_CODE
      in attr.kern_version during bpf_prog_load() call.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2541517c
    • A
      tracing: Add kprobe flag · 72cbbc89
      Alexei Starovoitov 提交于
      add TRACE_EVENT_FL_KPROBE flag to differentiate kprobe type of
      tracepoints, since bpf programs can only be attached to kprobe
      type of PERF_TYPE_TRACEPOINT perf events.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-3-git-send-email-ast@plumgrid.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      72cbbc89
  22. 25 3月, 2015 1 次提交
  23. 04 2月, 2015 1 次提交
    • S
      tracing: Convert the tracing facility over to use tracefs · 8434dc93
      Steven Rostedt (Red Hat) 提交于
      debugfs was fine for the tracing facility as a quick way to get
      an interface. Now that tracing has matured, it should separate itself
      from debugfs such that it can be mounted separately without needing
      to mount all of debugfs with it. That is, users resist using tracing
      because it requires mounting debugfs. Having tracing have its own file
      system lets users get the features of tracing without needing to bring
      in the rest of the kernel's debug infrastructure.
      
      Another reason for tracefs is that debubfs does not support mkdir.
      Currently, to create instances, one does a mkdir in the tracing/instance
      directory. This is implemented via a hack that forces debugfs to do
      something it is not intended on doing. By converting over to tracefs, this
      hack can be removed and mkdir can be properly implemented. This patch does
      not address this yet, but it lays the ground work for that to be done.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      8434dc93
  24. 23 1月, 2015 1 次提交
  25. 14 1月, 2015 1 次提交
    • P
      perf: Avoid horrible stack usage · 86038c5e
      Peter Zijlstra (Intel) 提交于
      Both Linus (most recent) and Steve (a while ago) reported that perf
      related callbacks have massive stack bloat.
      
      The problem is that software events need a pt_regs in order to
      properly report the event location and unwind stack. And because we
      could not assume one was present we allocated one on stack and filled
      it with minimal bits required for operation.
      
      Now, pt_regs is quite large, so this is undesirable. Furthermore it
      turns out that most sites actually have a pt_regs pointer available,
      making this even more onerous, as the stack space is pointless waste.
      
      This patch addresses the problem by observing that software events
      have well defined nesting semantics, therefore we can use static
      per-cpu storage instead of on-stack.
      
      Linus made the further observation that all but the scheduler callers
      of perf_sw_event() have a pt_regs available, so we change the regular
      perf_sw_event() to require a valid pt_regs (where it used to be
      optional) and add perf_sw_event_sched() for the scheduler.
      
      We have a scheduler specific call instead of a more generic _noregs()
      like construct because we can assume non-recursion from the scheduler
      and thereby simplify the code further (_noregs would have to put the
      recursion context call inline in order to assertain which __perf_regs
      element to use).
      
      One last note on the implementation of perf_trace_buf_prepare(); we
      allow .regs = NULL for those cases where we already have a pt_regs
      pointer available and do not need another.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      86038c5e
  26. 20 11月, 2014 1 次提交
  27. 14 11月, 2014 2 次提交
  28. 06 6月, 2014 1 次提交
  29. 24 4月, 2014 2 次提交
  30. 09 4月, 2014 1 次提交
  31. 21 2月, 2014 1 次提交