1. 16 3月, 2016 1 次提交
  2. 04 3月, 2016 1 次提交
    • S
      tracing: Do not have 'comm' filter override event 'comm' field · e57cbaf0
      Steven Rostedt (Red Hat) 提交于
      Commit 9f616680 "tracing: Allow triggers to filter for CPU ids and
      process names" added a 'comm' filter that will filter events based on the
      current tasks struct 'comm'. But this now hides the ability to filter events
      that have a 'comm' field too. For example, sched_migrate_task trace event.
      That has a 'comm' field of the task to be migrated.
      
       echo 'comm == "bash"' > events/sched_migrate_task/filter
      
      will now filter all sched_migrate_task events for tasks named "bash" that
      migrates other tasks (in interrupt context), instead of seeing when "bash"
      itself gets migrated.
      
      This fix requires a couple of changes.
      
      1) Change the look up order for filter predicates to look at the events
         fields before looking at the generic filters.
      
      2) Instead of basing the filter function off of the "comm" name, have the
         generic "comm" filter have its own filter_type (FILTER_COMM). Test
         against the type instead of the name to assign the filter function.
      
      3) Add a new "COMM" filter that works just like "comm" but will filter based
         on the current task, even if the trace event contains a "comm" field.
      
      Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU".
      
      Cc: stable@vger.kernel.org # v4.3+
      Fixes: 9f616680 "tracing: Allow triggers to filter for CPU ids and process names"
      Reported-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e57cbaf0
  3. 26 10月, 2015 1 次提交
    • S
      tracing: Implement event pid filtering · 3fdaf80f
      Steven Rostedt (Red Hat) 提交于
      Add the necessary hooks to use the pids loaded in set_event_pid to filter
      all the events enabled in the tracing instance that match the pids listed.
      
      Two probes are added to both sched_switch and sched_wakeup tracepoints to be
      called before other probes are called and after the other probes are called.
      The first is used to set the necessary flags to let the probes know to test
      if they should be traced or not.
      
      The sched_switch pre probe will set the "ignore_pid" flag if neither the
      previous or next task has a matching pid.
      
      The sched_switch probe will set the "ignore_pid" flag if the next task
      does not match the matching pid.
      
      The pre probe allows for probes tracing sched_switch to be traced if
      necessary.
      
      The sched_wakeup pre probe will set the "ignore_pid" flag if neither the
      current task nor the wakee task has a matching pid.
      
      The sched_wakeup post probe will set the "ignore_pid" flag if the current
      task does not have a matching pid.
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      3fdaf80f
  4. 26 9月, 2015 2 次提交
  5. 07 8月, 2015 2 次提交
    • W
      tracing, perf: Implement BPF programs attached to uprobes · 04a22fae
      Wang Nan 提交于
      By copying BPF related operation to uprobe processing path, this patch
      allow users attach BPF programs to uprobes like what they are already
      doing on kprobes.
      
      After this patch, users are allowed to use PERF_EVENT_IOC_SET_BPF on a
      uprobe perf event. Which make it possible to profile user space programs
      and kernel events together using BPF.
      
      Because of this patch, CONFIG_BPF_EVENTS should be selected by
      CONFIG_UPROBE_EVENT to ensure trace_call_bpf() is compiled even if
      KPROBE_EVENT is not set.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1435716878-189507-3-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      04a22fae
    • W
      bpf: Use correct #ifdef controller for trace_call_bpf() · 098d2164
      Wang Nan 提交于
      Commit e1abf2cc ("bpf: Fix the build on
      BPF_SYSCALL=y && !CONFIG_TRACING kernels, make it more configurable")
      updated the building condition of bpf_trace.o from CONFIG_BPF_SYSCALL
      to CONFIG_BPF_EVENTS, but the corresponding #ifdef controller in
      trace_events.h for trace_call_bpf() was not changed. Which, in theory,
      is incorrect.
      
      With current Kconfigs, we can create a .config with CONFIG_BPF_SYSCALL=y
      and CONFIG_BPF_EVENTS=n by unselecting CONFIG_KPROBE_EVENT and
      selecting CONFIG_BPF_SYSCALL. With these options, trace_call_bpf() will
      be defined as an extern function, but if anyone calls it a symbol missing
      error will be triggered since bpf_trace.o was not built.
      
      This patch changes the #ifdef controller for trace_call_bpf() from
      CONFIG_BPF_SYSCALL to CONFIG_BPF_EVENTS. I'll show its correctness:
      
      Before this patch:
      
         BPF_SYSCALL   BPF_EVENTS   trace_call_bpf   bpf_trace.o
         y             y           normal           compiled
         n             n           inline           not compiled
         y             n           normal           not compiled (incorrect)
         n             y          impossible (BPF_EVENTS depends on BPF_SYSCALL)
      
      After this patch:
      
         BPF_SYSCALL   BPF_EVENTS   trace_call_bpf   bpf_trace.o
         y             y           normal           compiled
         n             n           inline           not compiled
         y             n           inline           not compiled (fixed)
         n             y          impossible (BPF_EVENTS depends on BPF_SYSCALL)
      
      So this patch doesn't break anything. QED.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1435716878-189507-2-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      098d2164
  6. 14 5月, 2015 13 次提交
  7. 13 5月, 2015 1 次提交
  8. 07 5月, 2015 1 次提交
  9. 08 4月, 2015 2 次提交
  10. 02 4月, 2015 2 次提交
    • A
      tracing, perf: Implement BPF programs attached to kprobes · 2541517c
      Alexei Starovoitov 提交于
      BPF programs, attached to kprobes, provide a safe way to execute
      user-defined BPF byte-code programs without being able to crash or
      hang the kernel in any way. The BPF engine makes sure that such
      programs have a finite execution time and that they cannot break
      out of their sandbox.
      
      The user interface is to attach to a kprobe via the perf syscall:
      
      	struct perf_event_attr attr = {
      		.type	= PERF_TYPE_TRACEPOINT,
      		.config	= event_id,
      		...
      	};
      
      	event_fd = perf_event_open(&attr,...);
      	ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
      
      'prog_fd' is a file descriptor associated with BPF program
      previously loaded.
      
      'event_id' is an ID of the kprobe created.
      
      Closing 'event_fd':
      
      	close(event_fd);
      
      ... automatically detaches BPF program from it.
      
      BPF programs can call in-kernel helper functions to:
      
        - lookup/update/delete elements in maps
      
        - probe_read - wraper of probe_kernel_read() used to access any
          kernel data structures
      
      BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
      architecture dependent) and return 0 to ignore the event and 1 to store
      kprobe event into the ring buffer.
      
      Note, kprobes are a fundamentally _not_ a stable kernel ABI,
      so BPF programs attached to kprobes must be recompiled for
      every kernel version and user must supply correct LINUX_VERSION_CODE
      in attr.kern_version during bpf_prog_load() call.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2541517c
    • A
      tracing: Add kprobe flag · 72cbbc89
      Alexei Starovoitov 提交于
      add TRACE_EVENT_FL_KPROBE flag to differentiate kprobe type of
      tracepoints, since bpf programs can only be attached to kprobe
      type of PERF_TYPE_TRACEPOINT perf events.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-3-git-send-email-ast@plumgrid.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      72cbbc89
  11. 28 1月, 2015 1 次提交
    • D
      tracing: Add array printing helper · 6ea22486
      Dave Martin 提交于
      If a trace event contains an array, there is currently no standard
      way to format this for text output.  Drivers are currently hacking
      around this by a) local hacks that use the trace_seq functionailty
      directly, or b) just not printing that information.  For fixed size
      arrays, formatting of the elements can be open-coded, but this gets
      cumbersome for arrays of non-trivial size.
      
      These approaches result in non-standard content of the event format
      description delivered to userspace, so userland tools needs to be
      taught to understand and parse each array printing method
      individually.
      
      This patch implements a __print_array() helper that tracepoint
      implementations can use instead of reinventing it.  A simple C-style
      syntax is used to delimit the array and its elements {like,this}.
      
      So that the helpers can be used with large static arrays as well as
      dynamic arrays, they take a pointer and element count: they can be
      used with __get_dynamic_array() for use with dynamic arrays.
      Link: http://lkml.kernel.org/r/1422449335-8289-2-git-send-email-javi.merino@arm.com
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Signed-off-by: NJavi Merino <javi.merino@arm.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      6ea22486
  12. 14 1月, 2015 1 次提交
    • P
      perf: Avoid horrible stack usage · 86038c5e
      Peter Zijlstra (Intel) 提交于
      Both Linus (most recent) and Steve (a while ago) reported that perf
      related callbacks have massive stack bloat.
      
      The problem is that software events need a pt_regs in order to
      properly report the event location and unwind stack. And because we
      could not assume one was present we allocated one on stack and filled
      it with minimal bits required for operation.
      
      Now, pt_regs is quite large, so this is undesirable. Furthermore it
      turns out that most sites actually have a pt_regs pointer available,
      making this even more onerous, as the stack space is pointless waste.
      
      This patch addresses the problem by observing that software events
      have well defined nesting semantics, therefore we can use static
      per-cpu storage instead of on-stack.
      
      Linus made the further observation that all but the scheduler callers
      of perf_sw_event() have a pt_regs available, so we change the regular
      perf_sw_event() to require a valid pt_regs (where it used to be
      optional) and add perf_sw_event_sched() for the scheduler.
      
      We have a scheduler specific call instead of a more generic _noregs()
      like construct because we can assume non-recursion from the scheduler
      and thereby simplify the code further (_noregs would have to put the
      recursion context call inline in order to assertain which __perf_regs
      element to use).
      
      One last note on the implementation of perf_trace_buf_prepare(); we
      allow .regs = NULL for those cases where we already have a pt_regs
      pointer available and do not need another.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      86038c5e
  13. 20 11月, 2014 1 次提交
  14. 08 8月, 2014 1 次提交
  15. 17 7月, 2014 3 次提交
  16. 15 5月, 2014 1 次提交
    • S
      tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks · 4449bf92
      Steven Rostedt (Red Hat) 提交于
      Being able to show a cpumask of events can be useful as some events
      may affect only some CPUs. There is no standard way to record the
      cpumask and converting it to a string is rather expensive during
      the trace as traces happen in hotpaths. It would be better to record
      the raw event mask and be able to parse it at print time.
      
      The following macros were added for use with the TRACE_EVENT() macro:
      
        __bitmask()
        __assign_bitmask()
        __get_bitmask()
      
      To test this, I added this to the sched_migrate_task event, which
      looked like this:
      
      TRACE_EVENT(sched_migrate_task,
      
      	TP_PROTO(struct task_struct *p, int dest_cpu, const struct cpumask *cpus),
      
      	TP_ARGS(p, dest_cpu, cpus),
      
      	TP_STRUCT__entry(
      		__array(	char,	comm,	TASK_COMM_LEN	)
      		__field(	pid_t,	pid			)
      		__field(	int,	prio			)
      		__field(	int,	orig_cpu		)
      		__field(	int,	dest_cpu		)
      		__bitmask(	cpumask, num_possible_cpus()	)
      	),
      
      	TP_fast_assign(
      		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
      		__entry->pid		= p->pid;
      		__entry->prio		= p->prio;
      		__entry->orig_cpu	= task_cpu(p);
      		__entry->dest_cpu	= dest_cpu;
      		__assign_bitmask(cpumask, cpumask_bits(cpus), num_possible_cpus());
      	),
      
      	TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d cpumask=%s",
      		  __entry->comm, __entry->pid, __entry->prio,
      		  __entry->orig_cpu, __entry->dest_cpu,
      		  __get_bitmask(cpumask))
      );
      
      With the output of:
      
              ksmtuned-3613  [003] d..2   485.220508: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=3 dest_cpu=2 cpumask=00000000,0000000f
           migration/1-13    [001] d..5   485.221202: sched_migrate_task: comm=ksmtuned pid=3614 prio=120 orig_cpu=1 dest_cpu=0 cpumask=00000000,0000000f
                   awk-3615  [002] d.H5   485.221747: sched_migrate_task: comm=rcu_preempt pid=7 prio=120 orig_cpu=0 dest_cpu=1 cpumask=00000000,000000ff
           migration/2-18    [002] d..5   485.222062: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=2 dest_cpu=3 cpumask=00000000,0000000f
      
      Link: http://lkml.kernel.org/r/1399377998-14870-6-git-send-email-javi.merino@arm.com
      Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.homeSuggested-by: NJavi Merino <javi.merino@arm.com>
      Tested-by: NJavi Merino <javi.merino@arm.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      4449bf92
  17. 09 4月, 2014 1 次提交
  18. 22 3月, 2014 1 次提交
  19. 21 3月, 2014 1 次提交
    • V
      tracing: Fix array size mismatch in format string · 87291347
      Vaibhav Nagarnaik 提交于
      In event format strings, the array size is reported in two locations.
      One in array subscript and then via the "size:" attribute. The values
      reported there have a mismatch.
      
      For e.g., in sched:sched_switch the prev_comm and next_comm character
      arrays have subscript values as [32] where as the actual field size is
      16.
      
      name: sched_switch
      ID: 301
      format:
              field:unsigned short common_type;       offset:0;       size:2; signed:0;
              field:unsigned char common_flags;       offset:2;       size:1; signed:0;
              field:unsigned char common_preempt_count;       offset:3;       size:1;signed:0;
              field:int common_pid;   offset:4;       size:4; signed:1;
      
              field:char prev_comm[32];       offset:8;       size:16;        signed:1;
              field:pid_t prev_pid;   offset:24;      size:4; signed:1;
              field:int prev_prio;    offset:28;      size:4; signed:1;
              field:long prev_state;  offset:32;      size:8; signed:1;
              field:char next_comm[32];       offset:40;      size:16;        signed:1;
              field:pid_t next_pid;   offset:56;      size:4; signed:1;
              field:int next_prio;    offset:60;      size:4; signed:1;
      
      After bisection, the following commit was blamed:
      92edca07 tracing: Use direct field, type and system names
      
      This commit removes the duplication of strings for field->name and
      field->type assuming that all the strings passed in
      __trace_define_field() are immutable. This is not true for arrays, where
      the type string is created in event_storage variable and field->type for
      all array fields points to event_storage.
      
      Use __stringify() to create a string constant for the type string.
      
      Also, get rid of event_storage and event_storage_mutex that are not
      needed anymore.
      
      also, an added benefit is that this reduces the overhead of events a bit more:
      
         text    data     bss     dec     hex filename
      8424787 2036472 1302528 11763787         b3804b vmlinux
      8420814 2036408 1302528 11759750         b37086 vmlinux.patched
      
      Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com
      
      Cc: Laurent Chavey <chavey@google.com>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      87291347
  20. 07 3月, 2014 3 次提交
    • S
      tracing: Use helper functions in event assignment to shrink macro size · 3fd40d1e
      Steven Rostedt 提交于
      The functions that assign the contents for the ftrace events are
      defined by the TRACE_EVENT() macros. Each event has its own unique
      way to assign data to its buffer. When you have over 500 events,
      that means there's 500 functions assigning data uniquely for each
      event (not really that many, as DECLARE_EVENT_CLASS() and multiple
      DEFINE_EVENT()s will only need a single function).
      
      By making helper functions in the core kernel to do some of the work
      instead, we can shrink the size of the kernel down a bit.
      
      With a kernel configured with 502 events, the change in size was:
      
         text    data     bss     dec     hex filename
      12987390        1913504 9785344 24686238        178ae9e /tmp/vmlinux
      12959102        1913504 9785344 24657950        178401e /tmp/vmlinux.patched
      
      That's a total of 28288 bytes, which comes down to 56 bytes per event.
      
      Link: http://lkml.kernel.org/r/20120810034708.370808175@goodmis.orgSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      3fd40d1e
    • S
      tracing: Move event storage for array from macro to standalone function · 35bb4399
      Steven Rostedt 提交于
      The code that shows array fields for events is defined for all events.
      This can add up quite a bit when you have over 500 events.
      
      By making helper functions in the core kernel to do the work
      instead, we can shrink the size of the kernel down a bit.
      
      With a kernel configured with 502 events, the change in size was:
      
         text    data     bss     dec     hex filename
      12990946        1913568 9785344 24689858        178bcc2 /tmp/vmlinux
      12987390        1913504 9785344 24686238        178ae9e /tmp/vmlinux.patched
      
      That's a total of 3556 bytes, which comes down to 7 bytes per event.
      Although it's not much, this code is just called at initialization of
      the events.
      
      Link: http://lkml.kernel.org/r/20120810034708.084036335@goodmis.orgSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      35bb4399
    • S
      tracing: Move raw output code from macro to standalone function · 1d6bae96
      Steven Rostedt 提交于
      The code for trace events to format the raw recorded event data
      into human readable format in the 'trace' file is repeated for every
      event in the system. When you have over 500 events, this can add up
      quite a bit.
      
      By making helper functions in the core kernel to do the work
      instead, we can shrink the size of the kernel down a bit.
      
      With a kernel configured with 502 events, the change in size was:
      
         text    data     bss     dec     hex filename
      12991007        1913568 9785344 24689919        178bcff /tmp/vmlinux.orig
      12990946        1913568 9785344 24689858        178bcc2 /tmp/vmlinux.patched
      
      Note, this version does not save as much as the version of this patch
      I had a few years ago. That is because in the mean time, commit
      f71130de ("tracing: Add a helper function for event print functions")
      did a lot of the work my original patch did. But this change helps
      slightly, and is part of a larger clean up to reduce the size much further.
      
      Link: http://lkml.kernel.org/r/20120810034707.378538034@goodmis.org
      
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      1d6bae96