1. 03 Sep, 2016 (2 commits)
    • tracing: Add NMI tracing in hwlat detector · 7b2c8625
      Committed by Steven Rostedt (Red Hat)
      As NMIs can also cause latency when interrupts are disabled, the hwlat
      detector has no way to know if the latency it detects is from an NMI, an
      SMI, or some other hardware glitch.
      
      As the ftrace_nmi_enter/exit() functions are no longer used (except for sh,
      which isn't supported anymore), I converted those to
      "arch_ftrace_nmi_enter/exit" and use ftrace_nmi_enter/exit() to check if the
      hwlat detector is tracing or not; if so, it calls into the hwlat utility.
      
      Since the hwlat detector only has a single kthread that is spinning with
      interrupts disabled, it marks what CPU it is on, and if the NMI callback
      happens on that CPU, it records the time spent in that NMI. This is added to
      the output that is generated by the hwlat detector as:
      
       #3     inner/outer(us):    9/9     ts:1470836488.206734548
       #4     inner/outer(us):    0/8     ts:1470836497.140808588
       #5     inner/outer(us):    0/6     ts:1470836499.140825168 nmi-total:5 nmi-count:1
       #6     inner/outer(us):    9/9     ts:1470836501.140841748
      
      All time is still tracked in microseconds.
      
      The NMI information is only shown when an NMI occurred during the sample.
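      
      A minimal sketch of the callback idea (assuming helpers like time_get() and
      the nmi_cpu marker used by the hwlat code; illustrative, not the exact patch):
      
      	void trace_hwlat_callback(bool enter)
      	{
      		/* Only account NMIs on the CPU the sampling kthread spins on */
      		if (smp_processor_id() != nmi_cpu)
      			return;
      
      		if (enter) {
      			nmi_ts_start = time_get();
      		} else {
      			/* accumulate the time spent in this NMI */
      			nmi_total_ts += time_get() - nmi_ts_start;
      			nmi_count++;
      		}
      	}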
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Added hardware latency tracer · e7c15cd8
      Committed by Steven Rostedt (Red Hat)
      The hardware latency tracer has been in the PREEMPT_RT patch for some time.
      It is used to detect possible SMIs or any other hardware interruptions
      that the kernel is unaware of. Note that NMIs may also be detected; that
      is useful information to have as well.
      
      The logic is pretty simple. It creates a thread that spins on a single
      CPU for a specified amount of time (width) within a periodic window
      (window). These numbers may be adjusted via their corresponding names in
      
         /sys/kernel/tracing/hwlat_detector/
      
      The defaults are window = 1000000 us (1 second)
                       width  =  500000 us (1/2 second)
      
      The loop consists of:
      
      	t1 = trace_clock_local();
      	t2 = trace_clock_local();
      
      Where trace_clock_local() is a variant of sched_clock().
      
      The difference t2 - t1 is recorded as the "inner" latency, and t1 - prev_t2
      (the time since the previous iteration's t2) is recorded as the "outer"
      latency. If either of these differences is greater than the time denoted in
      /sys/kernel/tracing/tracing_thresh then it records the event.
      
      When this tracer is started, and tracing_thresh is zero, it changes to the
      default threshold of 10 us.
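      
      The sampling loop then looks roughly like this (a sketch of the algorithm
      described above; variable names are illustrative):
      
      	start = last_t2 = trace_clock_local();
      	do {
      		t1 = trace_clock_local();
      		t2 = trace_clock_local();
      
      		inner = t2 - t1;	/* latency the two reads straddled */
      		outer = t1 - last_t2;	/* latency between loop iterations */
      		last_t2 = t2;
      
      		if (inner > tracing_thresh || outer > tracing_thresh)
      			record_sample(inner, outer);	/* hypothetical helper */
      	} while (t2 - start < width);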
      
      The hwlat tracer in the PREEMPT_RT patch was originally written by
      Jon Masters. I have modified it quite a bit and turned it into a
      tracer.
      Based-on-code-by: Jon Masters <jcm@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2. 23 Mar, 2016 (1 commit)
3. 01 Oct, 2015 (1 commit)
    • tracing: Move trace_flags from global to a trace_array field · 983f938a
      Committed by Steven Rostedt (Red Hat)
      In preparation to make trace options per instance, the global trace_flags
      needs to be moved from being a global variable to a field within the trace
      instance trace_array structure.
      
      There's still more work to do, as there are some functions that use
      trace_flags without passing in a way to get to the current_trace array. For
      those, the global_trace is used directly (from trace.c). This includes
      setting and clearing the trace_flags. This means that when a new instance is
      created, it just gets the trace_flags of the global_trace and will not be
      able to modify them. Depending on the functions that have access to the
      trace_array, the flags of an instance may not affect parts of its trace
      where the global_trace is used. These will be fixed in future changes.
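      
      After this change, code that has a trace_array at hand checks the flags per
      instance instead of globally; roughly (using TRACE_ITER_PRINTK as an example
      flag):
      
      	/* before: one global bitmask */
      	if (trace_flags & TRACE_ITER_PRINTK)
      		...
      
      	/* after: per-instance flags */
      	if (tr->trace_flags & TRACE_ITER_PRINTK)
      		...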
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
4. 28 Sep, 2015 (2 commits)
5. 21 Jul, 2015 (1 commit)
6. 14 May, 2015 (6 commits)
7. 07 May, 2015 (1 commit)
8. 28 Jan, 2015 (1 commit)
    • tracing: Add array printing helper · 6ea22486
      Committed by Dave Martin
      If a trace event contains an array, there is currently no standard
      way to format this for text output. Drivers currently work around this
      by a) local hacks that use the trace_seq functionality directly, or
      b) just not printing that information. For fixed size
      arrays, formatting of the elements can be open-coded, but this gets
      cumbersome for arrays of non-trivial size.
      
      These approaches result in non-standard content of the event format
      description delivered to userspace, so userland tools need to be
      taught to understand and parse each array printing method
      individually.
      
      This patch implements a __print_array() helper that tracepoint
      implementations can use instead of reinventing it.  A simple C-style
      syntax is used to delimit the array and its elements {like,this}.
      
      So that the helpers can be used with large static arrays as well as
      dynamic arrays, they take a pointer and an element count; for dynamic
      arrays the pointer can come from __get_dynamic_array(), as shown below.
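      
      For example, a hypothetical event carrying a dynamic u32 array could print it
      with __print_array(ptr, count, el_size); the field names here are made up:
      
      	TP_printk("samples=%s",
      		  __print_array(__get_dynamic_array(samples),
      				__entry->num_samples, sizeof(u32)))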
      Link: http://lkml.kernel.org/r/1422449335-8289-2-git-send-email-javi.merino@arm.com
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Javi Merino <javi.merino@arm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
9. 04 Dec, 2014 (1 commit)
    • tracing: Add additional marks to signal very large time deltas · 8e1e1df2
      Committed by Byungchul Park
      Currently, the function graph tracer prints "!" or "+" just before the
      function execution time to signal function overhead, depending on that
      time. Some tracers tracing latency also print "!" or "+" just after the
      time to signal overhead, depending on the interval between events. Even
      though this is usually enough, we sometimes need to signal execution
      times bigger than 100 microseconds.
      
      For example, I used the function graph tracer to detect whether there is
      any case where exit_mm() takes too much time. I did the following steps
      in /sys/kernel/debug/tracing. It was easier to detect very large
      execution times with the patched kernel than with the original kernel.
      
      $ echo exit_mm > set_graph_function
      $ echo function_graph > current_tracer
      $ echo > trace
      $ cat trace_pipe > $LOGFILE
       ... (do something and terminate logging)
      $ grep "\\$" $LOGFILE
       3) $ 22082032 us |                      } /* kernel_map_pages */
       3) $ 22082040 us |                    } /* free_pages_prepare */
       3) $ 22082113 us |                  } /* free_hot_cold_page */
       3) $ 22083455 us |                } /* free_hot_cold_page_list */
       3) $ 22083895 us |              } /* release_pages */
       3) $ 22177873 us |            } /* free_pages_and_swap_cache */
       3) $ 22178929 us |          } /* unmap_single_vma */
       3) $ 22198885 us |        } /* unmap_vmas */
       3) $ 22206949 us |      } /* exit_mmap */
       3) $ 22207659 us |    } /* mmput */
       3) $ 22207793 us |  } /* exit_mm */
      
      It was then easy to find out that a schedule-out occurred via
      sub_preempt_count() within kernel_map_pages().
      
      This patch is useful for detecting very large function execution times
      caused by either a problematic function implementation or scheduling
      issues.
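      
      The mark selection boils down to a small threshold table, roughly as below (a
      sketch of the scheme; see the patch for the exact thresholds and symbols):
      
      	static const struct trace_mark {
      		unsigned long long	val;	/* threshold, in nsec */
      		char			sym;	/* mark printed with the time */
      	} mark[] = {
      		{ 1000000000ULL,	'$' },	/* 1 sec */
      		{ 1000000ULL,		'#' },	/* 1000 usecs */
      		{ 100000ULL,		'!' },	/* 100 usecs */
      		{ 10000ULL,		'+' },	/* 10 usecs */
      	};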
      
      Link: http://lkml.kernel.org/r/1416789259-24038-1-git-send-email-byungchul.park@lge.com
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
10. 20 Nov, 2014 (2 commits)
11. 12 Nov, 2014 (1 commit)
12. 01 Jul, 2014 (2 commits)
13. 15 May, 2014 (1 commit)
    • tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks · 4449bf92
      Committed by Steven Rostedt (Red Hat)
      Being able to show a cpumask of events can be useful as some events
      may affect only some CPUs. There is no standard way to record the
      cpumask and converting it to a string is rather expensive during
      the trace as traces happen in hotpaths. It would be better to record
      the raw event mask and be able to parse it at print time.
      
      The following macros were added for use with the TRACE_EVENT() macro:
      
        __bitmask()
        __assign_bitmask()
        __get_bitmask()
      
      To test this, I added it to the sched_migrate_task event, which then
      looked like this:
      
      TRACE_EVENT(sched_migrate_task,
      
      	TP_PROTO(struct task_struct *p, int dest_cpu, const struct cpumask *cpus),
      
      	TP_ARGS(p, dest_cpu, cpus),
      
      	TP_STRUCT__entry(
      		__array(	char,	comm,	TASK_COMM_LEN	)
      		__field(	pid_t,	pid			)
      		__field(	int,	prio			)
      		__field(	int,	orig_cpu		)
      		__field(	int,	dest_cpu		)
      		__bitmask(	cpumask, num_possible_cpus()	)
      	),
      
      	TP_fast_assign(
      		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
      		__entry->pid		= p->pid;
      		__entry->prio		= p->prio;
      		__entry->orig_cpu	= task_cpu(p);
      		__entry->dest_cpu	= dest_cpu;
      		__assign_bitmask(cpumask, cpumask_bits(cpus), num_possible_cpus());
      	),
      
      	TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d cpumask=%s",
      		  __entry->comm, __entry->pid, __entry->prio,
      		  __entry->orig_cpu, __entry->dest_cpu,
      		  __get_bitmask(cpumask))
      );
      
      With the output of:
      
              ksmtuned-3613  [003] d..2   485.220508: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=3 dest_cpu=2 cpumask=00000000,0000000f
           migration/1-13    [001] d..5   485.221202: sched_migrate_task: comm=ksmtuned pid=3614 prio=120 orig_cpu=1 dest_cpu=0 cpumask=00000000,0000000f
                   awk-3615  [002] d.H5   485.221747: sched_migrate_task: comm=rcu_preempt pid=7 prio=120 orig_cpu=0 dest_cpu=1 cpumask=00000000,000000ff
           migration/2-18    [002] d..5   485.222062: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=2 dest_cpu=3 cpumask=00000000,0000000f
      
      Link: http://lkml.kernel.org/r/1399377998-14870-6-git-send-email-javi.merino@arm.com
      Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.home
      Suggested-by: Javi Merino <javi.merino@arm.com>
      Tested-by: Javi Merino <javi.merino@arm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
14. 09 Apr, 2014 (1 commit)
15. 22 Mar, 2014 (1 commit)
16. 07 Mar, 2014 (2 commits)
    • tracing: Move event storage for array from macro to standalone function · 35bb4399
      Committed by Steven Rostedt
      The code that shows array fields for events is defined for all events.
      This can add up quite a bit when you have over 500 events.
      
      By making helper functions in the core kernel do the work instead,
      we can shrink the size of the kernel down a bit.
      
      With a kernel configured with 502 events, the change in size was:
      
         text    data     bss     dec     hex filename
      12990946        1913568 9785344 24689858        178bcc2 /tmp/vmlinux
      12987390        1913504 9785344 24686238        178ae9e /tmp/vmlinux.patched
      
      That's a total of 3556 bytes, which comes down to about 7 bytes per event.
      Although it's not much, this code is only called at initialization of
      the events anyway.
      
      Link: http://lkml.kernel.org/r/20120810034708.084036335@goodmis.org
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Move raw output code from macro to standalone function · 1d6bae96
      Committed by Steven Rostedt
      The code for trace events to format the raw recorded event data
      into human readable format in the 'trace' file is repeated for every
      event in the system. When you have over 500 events, this can add up
      quite a bit.
      
      By making helper functions in the core kernel do the work
      instead, we can shrink the size of the kernel down a bit.
      
      With a kernel configured with 502 events, the change in size was:
      
         text    data     bss     dec     hex filename
      12991007        1913568 9785344 24689919        178bcff /tmp/vmlinux.orig
      12990946        1913568 9785344 24689858        178bcc2 /tmp/vmlinux.patched
      
      Note, this version does not save as much as the version of this patch
      I had a few years ago. That is because in the meantime, commit
      f71130de ("tracing: Add a helper function for event print functions")
      did a lot of the work my original patch did. But this change still helps
      slightly, and is part of a larger cleanup to reduce the size much further.
      
      Link: http://lkml.kernel.org/r/20120810034707.378538034@goodmis.org
      
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
17. 11 Nov, 2013 (1 commit)
18. 19 Jul, 2013 (1 commit)
19. 16 Mar, 2013 (1 commit)
20. 15 Mar, 2013 (3 commits)
    • tracing: Add trace_puts() for even faster trace_printk() tracing · 09ae7234
      Committed by Steven Rostedt (Red Hat)
      trace_printk() is extremely fast and very handy, as it can be
      used in any context (including NMIs!). But it still requires scanning
      the fmt string to parse the args. Even trace_bprintk() requires
      a scan to know what args will be saved, although it doesn't copy the
      format string itself.
      
      trace_printk() is often called with no args at all, wasting CPU cycles
      scanning the fmt string.
      
      Adding trace_puts() allows the developer to use an even faster
      tracing method that only saves the pointer to the string in the
      ring buffer, without doing any format parsing at all. This will
      help remove even more of the "Heisenbug" effect when debugging.
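      
      Usage is a one-liner; the string should be a literal, since only its pointer
      is saved in the ring buffer:
      
      	trace_puts("reached the fast path\n");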
      
      Also fixed up the F_printk()s for the ftrace internal bprint and print events.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Consolidate max_tr into main trace_array structure · 12883efb
      Committed by Steven Rostedt (Red Hat)
      Currently, the latency tracers and the snapshot feature work by having
      a separate trace_array called "max_tr" that holds the snapshot buffer.
      For latency tracers, the running buffer is swapped with this snapshot
      buffer to save the current max latency.
      
      The only items the max_tr really needs are a copy of the buffer itself,
      the per_cpu data pointers, the time_start timestamp that states
      when the max latency was triggered, and the cpu that the max latency
      was triggered on. All other fields in trace_array are unused by the
      max_tr, making the max_tr mostly bloat.
      
      This change removes the max_tr completely, and adds a new structure
      called trace_buffer, that holds the buffer pointer, the per_cpu data
      pointers, the time_start timestamp, and the cpu where the latency occurred.
      
      The trace_array now has two trace_buffers, one for the normal trace and
      one for the max trace or snapshot. By doing this, not only do we remove
      the bloat from the max_tr, but trace instances can now use their own
      snapshot feature, instead of only the top-level global_trace having the
      snapshot feature and latency tracers.
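      
      The new structure looks roughly like this (a sketch that matches the
      description above; treat the field types as approximate):
      
      	struct trace_buffer {
      		struct trace_array		*tr;
      		struct ring_buffer		*buffer;
      		struct trace_array_cpu __percpu	*data;
      		cycle_t				time_start;
      		int				cpu;
      	};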
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add a helper function for event print functions · f71130de
      Committed by Li Zefan
      Move duplicate code in event print functions to a helper function.
      
      This shrinks the size of the kernel by ~13K.
      
         text    data     bss     dec     hex filename
      6596137 1743966 10138672        18478775        119f6b7 vmlinux.o.old
      6583002 1743849 10138672        18465523        119c2f3 vmlinux.o.new
      
      Link: http://lkml.kernel.org/r/51258746.2060304@huawei.com
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
21. 28 Feb, 2013 (1 commit)
    • hlist: drop the node parameter from iterators · b67bfe0d
      Committed by Sasha Levin
      I'm not sure why, but the hlist for each entry iterators were conceived
      differently from the list ones. While the list ones are nice and elegant:
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      do they not really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
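      
      The conversion is mechanical; for a hypothetical caller:
      
      	/* before: a scratch 'pos' node had to be declared and passed */
      	struct hlist_node *pos;
      	hlist_for_each_entry(tpos, pos, head, member)
      		do_something(tpos);
      
      	/* after: the same shape as list_for_each_entry() */
      	hlist_for_each_entry(tpos, head, member)
      		do_something(tpos);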
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter; these
       were modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foundation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
22. 14 Nov, 2012 (1 commit)
    • tracing: Format non-nanosec times from tsc clock without a decimal point. · 8be0709f
      Committed by David Sharp
      With the addition of the "tsc" clock, formatting timestamps to look like
      fractional seconds is misleading. Mark clocks as either in nanoseconds or
      not, and format non-nanosecond timestamps as decimal integers.
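      
      A sketch of the idea (the clock table gains a flag; names follow the
      description rather than the exact patch):
      
      	static struct {
      		u64 (*func)(void);
      		const char *name;
      		int in_ns;	/* does this clock count in nanoseconds? */
      	} trace_clocks[] = {
      		{ trace_clock_local,	"local",	1 },
      		{ trace_clock_global,	"global",	1 },
      		/* a raw counter clock such as "tsc" would set in_ns = 0 */
      	};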
      
      Tested:
      $ cd /sys/kernel/debug/tracing/
      $ cat trace_clock
      [local] global tsc
      $ echo sched_switch > set_event
      $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
      $ cat trace
                <idle>-0     [000]  6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
                 sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      $ echo 1 > options/latency-format
      $ cat trace
        <idle>-0       0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
         sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      $ echo tsc > trace_clock
      $ cat trace
      $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
      $ echo 0 > options/latency-format
      $ cat trace
                <idle>-0     [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
                 sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      $ echo 1 > options/latency-format
      $ cat trace
        <idle>-0       0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
         sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      
      v2:
      Move arch-specific bits out of generic code.
      v4:
      Fix x86_32 build due to 64-bit division.
      
      Google-Bug-Id: 6980623
      Link: http://lkml.kernel.org/r/1352837903-32191-2-git-send-email-dhsharp@google.com
      
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: David Sharp <dhsharp@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
23. 15 Jun, 2012 (1 commit)
    • tracing: Register the ftrace internal events during early boot · 7374e827
      Committed by Steven Rostedt
      All trace events, including ftrace internal events (like trace_printk
      and function tracing), register functions that describe how to print
      their output. The events may be recorded as soon as the ring buffer
      is allocated, but they are just raw binary in the buffer. The mapping
      of event ids to how to print them is held within a structure that
      is registered on system boot.
      
      If a crash happens during boot up, before these functions are registered,
      then their output (via ftrace_dump_on_oops) will be useless:
      
      Dumping ftrace buffer:
      ---------------------------------
         <...>-1       0.... 319705us : Unknown type 6
      ---------------------------------
      
      This can be quite frustrating for a kernel developer trying to see
      what is going wrong.
      
      There's no reason to register them so late in the boot up process.
      They can be registered by early_initcall().
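      
      The change is then essentially a one-liner (sketched; assuming init_events
      is the registration function in kernel/trace/trace_output.c):
      
      	early_initcall(init_events);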
      Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
24. 20 Apr, 2012 (1 commit)
    • tracing: Fix stacktrace of latency tracers (irqsoff and friends) · db4c75cb
      Committed by Steven Rostedt
      While debugging a latency with someone on IRC (mirage335) on #linux-rt (OFTC),
      we discovered that the stacktrace output of the latency tracers
      (preemptirqsoff) was empty.
      
      This bug was again caused by the dynamic length stack trace (as was the
      bug fixed by commit 12b5da34, "tracing: Fix ent_size in trace output").
      
      The latency tracers require the next event to determine the time between
      the current event and the next. But by grabbing the next event,
      iter->ent_size is set to the size of the next event instead of the
      current one. As the stacktrace event is the last event, this makes the
      ent_size zero and causes nothing to be printed for the stack trace. The
      dynamic stacktrace uses the ent_size to determine how much of the stack
      can be printed. An ent_size of zero means no stack.
      
      The simple fix is to save the iter->ent_size before finding the next event.
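      
      Sketched, mirroring the shape of the fix (helper names as in
      kernel/trace/trace.c):
      
      	/* __find_next_entry() will clobber iter->ent_size */
      	int ent_size = iter->ent_size;
      	struct trace_entry *entry;
      
      	entry = __find_next_entry(iter, ent_cpu, NULL, ent_ts);
      
      	/* restore the current event's size for the stack trace output */
      	iter->ent_size = ent_size;
      
      	return entry;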
      
      Note, mirage335 asked to remain anonymous from LKML and git, so I will
      not add the Reported-by and Tested-by tags, even though he did report
      the issue and tested the fix.
      
      Cc: stable@vger.kernel.org # 3.1+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
25. 21 Mar, 2012 (1 commit)
26. 22 Feb, 2012 (1 commit)
27. 21 Feb, 2012 (1 commit)
28. 17 Nov, 2011 (1 commit)
    • tracing: Add irq, preempt-count and need resched info to default trace output · 77271ce4
      Committed by Steven Rostedt
      People keep asking how to get the preempt count, irq, and need-resched info,
      and we keep telling them to enable the latency format. Some developers think
      that traces without this info are completely useless, and for a lot of tasks
      they are.
      
      The first option was to enable the latency format as the default, but
      the header for the latency format is pretty useless for most tracers, and
      it also reports the timestamp in straight microseconds from the time the
      trace started. This is sometimes more difficult to read, as the default
      trace shows seconds from the start of boot up.
      
      Latency format:
      
       # tracer: nop
       #
       # nop latency trace v1.1.5 on 3.2.0-rc1-test+
       # --------------------------------------------------------------------
       # latency: 0 us, #159771/64234230, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
       #    -----------------
       #    | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
       #    -----------------
       #
       #                  _------=> CPU#
       #                 / _-----=> irqs-off
       #                | / _----=> need-resched
       #                || / _---=> hardirq/softirq
       #                ||| / _--=> preempt-depth
       #                |||| /     delay
       #  cmd     pid   ||||| time  |   caller
       #     \   /      |||||  \    |   /
       migratio-6       0...2 41778231us+: rcu_note_context_switch <-__schedule
       migratio-6       0...2 41778233us : trace_rcu_utilization <-rcu_note_context_switch
       migratio-6       0...2 41778235us+: rcu_sched_qs <-rcu_note_context_switch
       migratio-6       0d..2 41778236us+: rcu_preempt_qs <-rcu_note_context_switch
       migratio-6       0...2 41778238us : trace_rcu_utilization <-rcu_note_context_switch
       migratio-6       0...2 41778239us+: debug_lockdep_rcu_enabled <-__schedule
      
      default format:
      
       # tracer: nop
       #
       #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
       #              | |       |          |         |
            migration/0-6     [000]    50.025810: rcu_note_context_switch <-__schedule
            migration/0-6     [000]    50.025812: trace_rcu_utilization <-rcu_note_context_switch
            migration/0-6     [000]    50.025813: rcu_sched_qs <-rcu_note_context_switch
            migration/0-6     [000]    50.025815: rcu_preempt_qs <-rcu_note_context_switch
            migration/0-6     [000]    50.025817: trace_rcu_utilization <-rcu_note_context_switch
            migration/0-6     [000]    50.025818: debug_lockdep_rcu_enabled <-__schedule
            migration/0-6     [000]    50.025820: debug_lockdep_rcu_enabled <-__schedule
      
      The latency format header has latency information that is pretty meaningless
      for most tracers. Some of the header is useful, though, and we can add it
      to the default format later as well.
      
      What is really useful with the latency format is the irqs-off, need-resched
      hard/softirq context and the preempt count.
      
      This commit adds the irq-info option, on by default, which adds this
      information:
      
       # tracer: nop
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
                 <idle>-0     [000] d..2    49.309305: cpuidle_get_driver <-cpuidle_idle_call
                 <idle>-0     [000] d..2    49.309307: mwait_idle <-cpu_idle
                 <idle>-0     [000] d..2    49.309309: need_resched <-mwait_idle
                 <idle>-0     [000] d..2    49.309310: test_ti_thread_flag <-need_resched
                 <idle>-0     [000] d..2    49.309312: trace_power_start.constprop.13 <-mwait_idle
                 <idle>-0     [000] d..2    49.309313: trace_cpu_idle <-mwait_idle
                 <idle>-0     [000] d..2    49.309315: need_resched <-mwait_idle
      
      If a user wants the old format, they can disable the 'irq-info' option:
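      
       For example, from the tracing directory (the exact path may vary):
      
       $ echo 0 > options/irq-info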
      
       # tracer: nop
       #
       #           TASK-PID   CPU#      TIMESTAMP  FUNCTION
       #              | |       |          |         |
                 <idle>-0     [000]     49.309305: cpuidle_get_driver <-cpuidle_idle_call
                 <idle>-0     [000]     49.309307: mwait_idle <-cpu_idle
                 <idle>-0     [000]     49.309309: need_resched <-mwait_idle
                 <idle>-0     [000]     49.309310: test_ti_thread_flag <-need_resched
                 <idle>-0     [000]     49.309312: trace_power_start.constprop.13 <-mwait_idle
                 <idle>-0     [000]     49.309313: trace_cpu_idle <-mwait_idle
                 <idle>-0     [000]     49.309315: need_resched <-mwait_idle
      Requested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>