1. 20 Jul 2017, 1 commit
  2. 28 Jun 2017, 1 commit
    • tracing: Add support for recording tgid of tasks · d914ba37
      Committed by Joel Fernandes
      In order to support recording of tgid, the following changes are made:
      
      * Introduce a new API (tracing_record_taskinfo) to additionally record the tgid
        along with the task's comm at the same time. This has the benefit of not
        setting trace_cmdline_save before all the information for a task is saved.
      * Add a new API tracing_record_taskinfo_sched_switch to record task information
        for 2 tasks at a time (previous and next) and use it from sched_switch probe.
      * Preserve the old API (tracing_record_cmdline) and create it as a wrapper
        around the new one so that existing callers aren't affected.
      * Reuse the existing sched_switch and sched_wakeup probes to record tgid
        information and add a new option 'record-tgid' to enable recording of tgid
      
      When the record-tgid option isn't enabled to begin with, we take care to make
      sure that there isn't any memory or runtime overhead.
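
      With the option enabled, usage is along these lines (an illustrative
      sketch; the tracefs mount point and the saved_tgids file come from
      related patches in this series, not from this patch alone):

        # cd /sys/kernel/tracing
        # echo 1 > options/record-tgid
        # echo 1 > events/sched/sched_switch/enable
        # cat saved_tgids        # cached pid -> tgid mappings recorded so far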
      
      Link: http://lkml.kernel.org/r/20170627020155.5139-1-joelaf@google.com
      
      Cc: kernel-team@android.com
      Cc: Ingo Molnar <mingo@redhat.com>
      Tested-by: Michael Sartain <mikesart@gmail.com>
      Signed-off-by: Joel Fernandes <joelaf@google.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      d914ba37
  3. 25 Mar 2017, 1 commit
    • tracing: Move trace_handle_return() out of line · af0009fc
      Committed by Steven Rostedt (VMware)
      Currently trace_handle_return() looks like this:
      
       static inline enum print_line_t trace_handle_return(struct trace_seq *s)
       {
              return trace_seq_has_overflowed(s) ?
                      TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
       }
      
      Where trace_seq_has_overflowed(s) is:
      
       static inline bool trace_seq_has_overflowed(struct trace_seq *s)
       {
      	return s->full || seq_buf_has_overflowed(&s->seq);
       }
      
      And seq_buf_has_overflowed(&s->seq) is:
      
       static inline bool
       seq_buf_has_overflowed(struct seq_buf *s)
       {
      	return s->len > s->size;
       }
      
      Making trace_handle_return() into:
      
        return (s->full || (s->seq.len > s->seq.size)) ?
                 TRACE_TYPE_PARTIAL_LINE :
                 TRACE_TYPE_HANDLED;
      
      One would think this is not an issue to keep as an inline, but because it
      is used in the TRACE_EVENT() macro, it is expanded for every tracepoint in
      the system. Take a single tracepoint, x86_irq_vector (the first one I
      chose at random): since trace_handle_return() is used in the
      trace_raw_output_##call() function generated by the TRACE_EVENT() macro,
      we disassemble trace_raw_output_x86_irq_vector and do a diff:
      
      - is the original
      + is the out-of-line code
      
      I removed identical lines that differed only because of different addresses.
      
      --- /tmp/irq-vec-orig	2017-03-16 09:12:48.569384851 -0400
      +++ /tmp/irq-vec-ool	2017-03-16 09:13:39.378153385 -0400
      @@ -6,27 +6,23 @@
              53                      push   %rbx
              48 89 fb                mov    %rdi,%rbx
              4c 8b a7 c0 20 00 00    mov    0x20c0(%rdi),%r12
              e8 f7 72 13 00          callq  ffffffff81155c80 <trace_raw_output_prep>
              83 f8 01                cmp    $0x1,%eax
              74 05                   je     ffffffff8101e993 <trace_raw_output_x86_irq_vector+0x23>
              5b                      pop    %rbx
              41 5c                   pop    %r12
              5d                      pop    %rbp
              c3                      retq
              41 8b 54 24 08          mov    0x8(%r12),%edx
      -       48 8d bb 98 10 00 00    lea    0x1098(%rbx),%rdi
      +       48 81 c3 98 10 00 00    add    $0x1098,%rbx
      -       48 c7 c6 7b 8a a0 81    mov    $0xffffffff81a08a7b,%rsi
      +       48 c7 c6 ab 8a a0 81    mov    $0xffffffff81a08aab,%rsi
      -       e8 c5 85 13 00          callq  ffffffff81156f70 <trace_seq_printf>
      
       === here's the start of the main difference ===
      
      +       48 89 df                mov    %rbx,%rdi
      +       e8 62 7e 13 00          callq  ffffffff81156810 <trace_seq_printf>
      -       8b 93 b8 20 00 00       mov    0x20b8(%rbx),%edx
      -       31 c0                   xor    %eax,%eax
      -       85 d2                   test   %edx,%edx
      -       75 11                   jne    ffffffff8101e9c8 <trace_raw_output_x86_irq_vector+0x58>
      -       48 8b 83 a8 20 00 00    mov    0x20a8(%rbx),%rax
      -       48 39 83 a0 20 00 00    cmp    %rax,0x20a0(%rbx)
      -       0f 93 c0                setae  %al
      +       48 89 df                mov    %rbx,%rdi
      +       e8 4a c5 12 00          callq  ffffffff8114af00 <trace_handle_return>
              5b                      pop    %rbx
      -       0f b6 c0                movzbl %al,%eax
      
       === end ===
      
              41 5c                   pop    %r12
              5d                      pop    %rbp
              c3                      retq
      
      Notice that the original has 22 bytes more text than the out-of-line
      version. As this applies to every TRACE_EVENT() defined in the system, the
      savings can become quite large.
      
         text	   data	    bss	    dec	    hex	filename
      8690305	5450490	1298432	15439227	 eb957b	vmlinux-orig
      8681725	5450490	1298432	15430647	 eb73f7	vmlinux-handle
      
      This change saves a total of 8580 bytes.
      
       $ objdump -dr /tmp/vmlinux-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
      324
      
      That's 324 tracepoints. But this does not include modules (which contain
      many more tracepoints). For an allyesconfig build:
      
       $ objdump -dr vmlinux-allyes-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
      1401
      
      That's 1401 tracepoints giving us:
      
         text    data     bss     dec     hex filename
      137920629       140221067       53264384        331406080       13c0db00 vmlinux-allyes-orig
      137827709       140221067       53264384        331313160       13bf7008 vmlinux-allyes-handle
      
      92920 bytes in savings!!!
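
      The change itself is small: the helper loses its 'static inline' and
      moves out of the header into the tracing core, exported so tracepoints in
      modules can still reach it. A sketch of the out-of-line version (the file
      placement and the GPL export are assumptions here, not quoted from the
      patch):

       enum print_line_t trace_handle_return(struct trace_seq *s)
       {
       	return trace_seq_has_overflowed(s) ?
       		TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
       }
       EXPORT_SYMBOL_GPL(trace_handle_return);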
      
      Link: http://lkml.kernel.org/r/20170315021431.13107-2-andi@firstfloor.org
      Reported-by: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      af0009fc
  4. 23 Feb 2017, 1 commit
    • tracing: add __print_flags_u64() · d3213e8f
      Committed by Ross Zwisler
      Patch series "DAX tracepoints, mm argument simplification", v4.
      
      This contains both my DAX tracepoint code and Dave Jiang's MM argument
      simplifications.  Dave's code was written with my tracepoint code as a
      baseline, so it seemed simplest to keep them together in a single series.
      
      This patch (of 7):
      
      Add __print_flags_u64() and the helper trace_print_flags_seq_u64() in the
      same spirit as __print_symbolic_u64() and trace_print_symbols_seq_u64().
      These functions allow us to print symbols associated with flags that are
      64 bits wide even on 32 bit machines.
      
      These will be used by the DAX code so that we can print the flags set in a
      pfn_t such as PFN_SG_CHAIN, PFN_SG_LAST, PFN_DEV and PFN_MAP.
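
      As a sketch of how a DAX tracepoint's TP_printk() can then use the new
      macro (the __entry field name here is hypothetical; the flag/name pairs
      mirror __print_flags() usage):

       	TP_printk("pfn flags %s",
       		  __print_flags_u64(__entry->pfn_flags, "|",
       			{ PFN_SG_CHAIN, "SG_CHAIN" },
       			{ PFN_SG_LAST,  "SG_LAST"  },
       			{ PFN_DEV,      "DEV"      },
       			{ PFN_MAP,      "MAP"      }))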
      
      Without this new function I was getting errors like the following when
      compiling for i386:
      
        include/linux/pfn_t.h:13:22: warning: large integer implicitly truncated to unsigned type [-Woverflow]
         #define PFN_SG_CHAIN (1ULL << (BITS_PER_LONG_LONG - 1))
          ^
      
      Link: http://lkml.kernel.org/r/1484085142-2297-2-git-send-email-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d3213e8f
  5. 04 Feb 2017, 1 commit
  6. 26 Jan 2017, 1 commit
  7. 03 May 2016, 1 commit
  8. 30 Apr 2016, 3 commits
  9. 27 Apr 2016, 2 commits
  10. 22 Apr 2016, 1 commit
  11. 20 Apr 2016, 2 commits
    • tracing: Add enable_hist/disable_hist triggers · d0bad49b
      Committed by Tom Zanussi
      Similar to enable_event/disable_event triggers, these triggers enable
      and disable the aggregation of events into maps rather than enabling
      and disabling their writing into the trace buffer.
      
      They can be used to automatically start and stop hist triggers based
      on a matching filter condition.
      
      If there's a paused hist trigger on system:event, the following would
      start it when the filter condition is hit:
      
        # echo enable_hist:system:event [ if filter] > event/trigger
      
      And the following would disable a running system:event hist trigger:
      
        # echo disable_hist:system:event [ if filter] > event/trigger
      
      See Documentation/trace/events.txt for real examples.
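
      For instance, a sequence along the lines of the documented example (paths,
      events and filter values are illustrative): set up a paused hist trigger on
      kmem:kmalloc, start it when wget is exec'd, and stop it when wget exits:

        # echo 'hist:keys=call_site:vals=bytes_req:pause' > \
              events/kmem/kmalloc/trigger
        # echo 'enable_hist:kmem:kmalloc if filename=="/usr/bin/wget"' > \
              events/sched/sched_process_exec/trigger
        # echo 'disable_hist:kmem:kmalloc if comm=="wget"' > \
              events/sched/sched_process_exit/trigger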
      
      Link: http://lkml.kernel.org/r/f812f086e52c8b7c8ad5443487375e03c96a601f.1457029949.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      d0bad49b
    • tracing: Add 'hist' event trigger command · 7ef224d1
      Committed by Tom Zanussi
      'hist' triggers allow users to continually aggregate trace events,
      which can then be viewed afterwards by simply reading a 'hist' file
      containing the aggregation in a human-readable format.
      
      The basic idea is very simple and boils down to a mechanism whereby
      trace events, rather than being exhaustively dumped in raw form and
      viewed directly, are automatically 'compressed' into meaningful tables
      completely defined by the user.
      
      This is done strictly via single-line command-line commands and
      without the aid of any kind of programming language or interpreter.
      
      A surprising number of typical use cases can be accomplished by users
      via this simple mechanism.  In fact, a large number of the tasks that
      users typically do using the more complicated script-based tracing
      tools, at least during the initial stages of an investigation, can be
      accomplished by simply specifying a set of keys and values to be used
      in the creation of a hash table.
      
      The Linux kernel trace event subsystem happens to provide an extensive
      list of keys and values ready-made for such a purpose in the form of
      the event format files associated with each trace event.  By simply
      consulting the format file for field names of interest and by plugging
      them into the hist trigger command, users can create an endless number
      of useful aggregations to help with investigating various properties
      of the system.  See Documentation/trace/events.txt for examples.
      
      hist triggers are implemented on top of the existing event trigger
      infrastructure, and as such are consistent with the existing triggers
      from a user's perspective as well.
      
      The basic syntax follows the existing trigger syntax.  Users start an
      aggregation by writing a 'hist' trigger to the event of interest's
      trigger file:
      
        # echo hist:keys=xxx [ if filter] > event/trigger
      
      Once a hist trigger has been set up, by default it continually
      aggregates every matching event into a hash table using the event key
      and a value field named 'hitcount'.
      
      To view the aggregation at any point in time, simply read the 'hist'
      file in the same directory as the 'trigger' file:
      
        # cat event/hist
      
      The detailed syntax provides additional options for user control, and
      is described exhaustively in Documentation/trace/events.txt and in the
      virtual tracing/README file in the tracing subsystem.
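
      As a quick illustrative example (field names taken from the sched_wakeup
      format file; paths relative to the tracefs mount point):

        # echo 'hist:keys=comm,prio:sort=hitcount' > events/sched/sched_wakeup/trigger
        # cat events/sched/sched_wakeup/hist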
      
      Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      7ef224d1
  12. 08 Apr 2016, 2 commits
  13. 16 Mar 2016, 1 commit
  14. 09 Mar 2016, 1 commit
  15. 04 Mar 2016, 1 commit
    • tracing: Do not have 'comm' filter override event 'comm' field · e57cbaf0
      Committed by Steven Rostedt (Red Hat)
      Commit 9f616680 "tracing: Allow triggers to filter for CPU ids and
      process names" added a 'comm' filter that filters events based on the
      current task's struct 'comm'. But this now hides the ability to filter
      events that have a 'comm' field of their own. For example, the
      sched_migrate_task trace event has a 'comm' field holding the task to be
      migrated.
      
       echo 'comm == "bash"' > events/sched_migrate_task/filter
      
      will now filter sched_migrate_task events down to those in which a task
      named "bash" migrates other tasks (in interrupt context), instead of
      showing when "bash" itself gets migrated.
      
      This fix requires a couple of changes.
      
      1) Change the look up order for filter predicates to look at the events
         fields before looking at the generic filters.
      
      2) Instead of basing the filter function off of the "comm" name, have the
         generic "comm" filter have its own filter_type (FILTER_COMM). Test
         against the type instead of the name to assign the filter function.
      
      3) Add a new "COMM" filter that works just like "comm" but will filter based
         on the current task, even if the trace event contains a "comm" field.
      
      Do the same for the "cpu" field, adding a FILTER_CPU type and a "CPU" filter.
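
      With the fix, both behaviors are available (illustrative commands, run
      from the tracefs events directory):

        # echo 'comm == "bash"' > sched/sched_migrate_task/filter   # the event's own 'comm' field
        # echo 'COMM == "bash"' > sched/sched_migrate_task/filter   # the currently running task's comm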
      
      Cc: stable@vger.kernel.org # v4.3+
      Fixes: 9f616680 "tracing: Allow triggers to filter for CPU ids and process names"
      Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      e57cbaf0
  16. 26 Oct 2015, 1 commit
    • tracing: Implement event pid filtering · 3fdaf80f
      Committed by Steven Rostedt (Red Hat)
      Add the necessary hooks to use the pids loaded in set_event_pid to filter
      all the events enabled in the tracing instance that match the pids listed.
      
      Two probes are added to both the sched_switch and sched_wakeup
      tracepoints: one called before the other probes and one called after
      them. The pre probe sets the necessary flags so the other probes know to
      test whether they should be traced.
      
      The sched_switch pre probe will set the "ignore_pid" flag if neither the
      previous nor the next task has a matching pid.
      
      The sched_switch post probe will set the "ignore_pid" flag if the next
      task does not have a matching pid.
      
      The pre probe allows for probes tracing sched_switch to be traced if
      necessary.
      
      The sched_wakeup pre probe will set the "ignore_pid" flag if neither the
      current task nor the wakee task has a matching pid.
      
      The sched_wakeup post probe will set the "ignore_pid" flag if the current
      task does not have a matching pid.
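
      The user-visible knob is the set_event_pid file in the tracing instance;
      an illustrative sequence (tracefs mount point assumed):

        # echo 1234 5678 > /sys/kernel/tracing/set_event_pid
        # echo 1 > /sys/kernel/tracing/events/sched/enable
        # cat /sys/kernel/tracing/trace     # only events from pids 1234 and 5678 are shown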
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      3fdaf80f
  17. 26 Sep 2015, 2 commits
  18. 07 Aug 2015, 2 commits
    • tracing, perf: Implement BPF programs attached to uprobes · 04a22fae
      Committed by Wang Nan
      By copying the BPF-related operations into the uprobe processing path,
      this patch allows users to attach BPF programs to uprobes the same way
      they already do for kprobes.
      
      After this patch, users are allowed to use PERF_EVENT_IOC_SET_BPF on a
      uprobe perf event, which makes it possible to profile user-space programs
      and kernel events together using BPF.
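
      From user space the attach step is just that ioctl; a minimal sketch
      (opening the uprobe perf event and loading the BPF program are outside
      the scope of this snippet):

       #include <sys/ioctl.h>
       #include <linux/perf_event.h>

       /* event_fd: fd of a perf event opened on a uprobe,
        * prog_fd:  fd of an already-loaded BPF program */
       static int attach_bpf_to_uprobe(int event_fd, int prog_fd)
       {
       	return ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
       }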
      
      Because of this patch, CONFIG_BPF_EVENTS should be selected by
      CONFIG_UPROBE_EVENT to ensure trace_call_bpf() is compiled even if
      CONFIG_KPROBE_EVENT is not set.
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1435716878-189507-3-git-send-email-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      04a22fae
    • bpf: Use correct #ifdef controller for trace_call_bpf() · 098d2164
      Committed by Wang Nan
      Commit e1abf2cc ("bpf: Fix the build on
      BPF_SYSCALL=y && !CONFIG_TRACING kernels, make it more configurable")
      updated the build condition of bpf_trace.o from CONFIG_BPF_SYSCALL
      to CONFIG_BPF_EVENTS, but the corresponding #ifdef controller in
      trace_events.h for trace_call_bpf() was not changed, which, in theory,
      is incorrect.
      
      With current Kconfigs, we can create a .config with CONFIG_BPF_SYSCALL=y
      and CONFIG_BPF_EVENTS=n by unselecting CONFIG_KPROBE_EVENT and
      selecting CONFIG_BPF_SYSCALL. With these options, trace_call_bpf() will
      be declared as an extern function, but if anyone calls it, a
      missing-symbol error will be triggered since bpf_trace.o was not built.
      
      This patch changes the #ifdef controller for trace_call_bpf() from
      CONFIG_BPF_SYSCALL to CONFIG_BPF_EVENTS. I'll show its correctness:
      
      Before this patch:
      
         BPF_SYSCALL   BPF_EVENTS   trace_call_bpf   bpf_trace.o
         y             y           normal           compiled
         n             n           inline           not compiled
         y             n           normal           not compiled (incorrect)
         n             y          impossible (BPF_EVENTS depends on BPF_SYSCALL)
      
      After this patch:
      
         BPF_SYSCALL   BPF_EVENTS   trace_call_bpf   bpf_trace.o
         y             y           normal           compiled
         n             n           inline           not compiled
         y             n           inline           not compiled (fixed)
         n             y          impossible (BPF_EVENTS depends on BPF_SYSCALL)
      
      So this patch doesn't break anything. QED.
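
      A simplified sketch of the resulting guard in include/linux/trace_events.h
      (the exact prototype may differ slightly from the version in the patch):

       #ifdef CONFIG_BPF_EVENTS
       unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx);
       #else
       static inline unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx)
       {
       	return 1;
       }
       #endif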
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1435716878-189507-2-git-send-email-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      098d2164
  19. 14 May 2015, 13 commits
  20. 13 May 2015, 1 commit
  21. 07 May 2015, 1 commit