1. 15 March 2013 (2 commits)
    • tracing: Consolidate max_tr into main trace_array structure · 12883efb
      Steven Rostedt (Red Hat) authored
      Currently, the latency tracers and the snapshot feature work by
      keeping a separate trace_array called "max_tr" that holds the
      snapshot buffer. For latency tracers, the running buffer is swapped
      with this snapshot buffer to save the current max latency.
      
      The only items the max_tr really needs are a copy of the buffer
      itself, the per_cpu data pointers, the time_start timestamp that
      records when the max latency was triggered, and the cpu on which
      the max latency was triggered. All other fields in trace_array are
      unused by the max_tr, making the max_tr mostly bloat.
      
      This change removes the max_tr completely and adds a new structure
      called trace_buffer that holds the buffer pointer, the per_cpu data
      pointers, the time_start timestamp, and the cpu where the latency occurred.
      
      The trace_array now has two trace_buffers: one for the normal trace
      and one for the max trace or snapshot. Not only does this remove the
      bloat from the max_tr, but trace instances can now use their own
      snapshot feature, instead of only the top-level global_trace having
      the snapshot feature and latency tracers.
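      
      As a hedged sketch, the new structure might look like this (the
      field list follows the description above; the exact types and the
      back pointer are assumptions):
      
      	struct trace_buffer {
      		struct trace_array		*tr;	/* back pointer, assumed */
      		struct ring_buffer		*buffer;	/* the ring buffer itself */
      		struct trace_array_cpu __percpu	*data;	/* per_cpu data pointers */
      		u64				time_start;	/* when the max latency hit */
      		int				cpu;	/* cpu the latency occurred on */
      	};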
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Replace the static global per_cpu arrays with allocated per_cpu · a7603ff4
      Steven Rostedt authored
      The global trace and the max_tr currently use static per_cpu arrays
      for the CPU data descriptors. But newly allocated trace_arrays need
      allocated per_cpu arrays, so switch the global trace and the max_tr
      from the static arrays to allocated data.
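      
      A minimal sketch of the switch, assuming the usual percpu allocator
      (the field name is illustrative):
      
      	/* before: struct trace_array_cpu *data[NR_CPUS];  (static) */
      	tr->data = alloc_percpu(struct trace_array_cpu);
      	if (!tr->data)
      		return -ENOMEM;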
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  2. 30 January 2013 (1 commit)
    • tracing/fgraph: Adjust fgraph depth before calling trace return callback · 03274a3f
      Steven Rostedt (Red Hat) authored
      While debugging the virtual cputime with the function graph tracer
      with a max_depth of 1 (most common use of the max_depth so far),
      I found that I was missing kernel execution because of a race condition.
      
      The code for the return side of the function has a slight race:
      
      	ftrace_pop_return_trace(&trace, &ret, frame_pointer);
      	trace.rettime = trace_clock_local();
      	ftrace_graph_return(&trace);
      	barrier();
      	current->curr_ret_stack--;
      
      The ftrace_pop_return_trace() initializes the trace structure for
      the callback. The ftrace_graph_return() uses the trace structure
      for its own use, as that structure is on the stack and is local
      to this function. Then curr_ret_stack is decremented, which is
      what trace.depth is set to.
      
      If an interrupt comes in after the ftrace_graph_return() but
      before the curr_ret_stack decrement, then the called function will
      get a depth of 2. If max_depth is set to 1, this function will be
      ignored.
      
      The problem is that the trace has already been called, and the
      timestamp for that trace will not reflect the time the function
      was about to re-enter userspace. Calls within the interrupt will
      not be traced because the max_depth has prevented this.
      
      To solve this issue, the ftrace_graph_return() call can safely be
      moved to after current->curr_ret_stack has been updated. This way
      the timestamp for the return callback will reflect the actual time.
      
      If an interrupt comes in after the curr_ret_stack update and
      ftrace_graph_return(), it will be traced. It may look a little
      confusing to see it within the other function, but at least
      it will not be lost.
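      
      With the fix, the return path looks roughly like this (a sketch of
      the reordering described above, not the verbatim patch):
      
      	ftrace_pop_return_trace(&trace, &ret, frame_pointer);
      	trace.rettime = trace_clock_local();
      	barrier();
      	current->curr_ret_stack--;
      	/* the callback now runs after the depth has been adjusted */
      	ftrace_graph_return(&trace);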
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  3. 22 January 2013 (1 commit)
    • tracing/fgraph: Add max_graph_depth to limit function_graph depth · 8741db53
      Steven Rostedt authored
      Add the file max_graph_depth to the debug tracing directory that lets
      the user define the depth of the function graph.
      
      A very useful operation is to set the depth to 1. Then it traces only
      the first function that is called when entering the kernel. This can
      be used to determine what system operations interrupt a process.
      
      For example, to work on NOHZ processes (single tasks running without
      a timer tick), if any interrupt goes off and preempts that task, this
      code will show it happening.
      
        # cd /sys/kernel/debug/tracing
        # echo 1 > max_graph_depth
        # echo function_graph > current_tracer
        # cat per_cpu/cpu<cpu-of-process>/trace
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  4. 01 November 2012 (2 commits)
    • tracing: Cache comms only after an event occurred · 7ffbd48d
      Steven Rostedt authored
      Whenever an event is registered, the comms of tasks are saved at
      every task switch instead of at every event. But if an event isn't
      executed much, the comm cache will be filled up by tasks that did
      not record the event, and you lose the comms of the tasks that did.
      
      Here's an example, if you enable the following events:
      
      echo 1 > /debug/tracing/events/kvm/kvm_cr/enable
      echo 1 > /debug/tracing/events/net/net_dev_xmit/enable
      
      Note, there's no kvm running on this machine so the first event will
      never be triggered, but because it is enabled, the storing of comms
      will continue. If we now disable the network event:
      
      echo 0 > /debug/tracing/events/net/net_dev_xmit/enable
      
      and look at the trace:
      
      cat /debug/tracing/trace
                  sshd-2672  [001] ..s2   375.731616: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s1   375.731617: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s2   375.859356: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s1   375.859357: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s2   375.947351: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s1   375.947352: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s2   376.035383: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s1   376.035383: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s2   377.563806: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=226 rc=0
                  sshd-2672  [001] ..s1   377.563807: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=226 rc=0
                  sshd-2672  [001] ..s2   377.563834: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6be0 len=114 rc=0
                  sshd-2672  [001] ..s1   377.563842: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6be0 len=114 rc=0
      
      We see that process 2672, which triggered the events, has the comm
      "sshd". But if we run hackbench for a bit and look again:
      
      cat /debug/tracing/trace
                 <...>-2672  [001] ..s2   375.731616: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s1   375.731617: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s2   375.859356: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s1   375.859357: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s2   375.947351: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s1   375.947352: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s2   376.035383: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s1   376.035383: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s2   377.563806: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=226 rc=0
                 <...>-2672  [001] ..s1   377.563807: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=226 rc=0
                 <...>-2672  [001] ..s2   377.563834: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6be0 len=114 rc=0
                 <...>-2672  [001] ..s1   377.563842: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6be0 len=114 rc=0
      
      The stored "sshd" comm has been flushed out and we get a useless "<...>".
      
      But by only storing comms after a trace event has occurred, we can
      run hackbench all day and still get the same output.
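      
      A hedged sketch of the mechanism: each recorded event sets a per-cpu
      flag, and the task-switch hook saves the comm only when that flag is
      set (the flag name here is illustrative):
      
      	/* in the event commit path */
      	__this_cpu_write(trace_cmdline_save, true);
      
      	/* in the sched_switch probe */
      	if (__this_cpu_read(trace_cmdline_save)) {
      		__this_cpu_write(trace_cmdline_save, false);
      		tracing_record_cmdline(current);
      	}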
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Allow tracers to start at core initcall · 6f415672
      Steven Rostedt authored
      There are times during debugging when it is helpful to see traces of
      early boot functions. But the tracers are initialized at
      device_initcall(), which is quite late in the boot process. Setting
      the kernel command line parameter ftrace=function will not show
      anything until the function tracer is initialized, which prevents
      tracing functions before device_initcall().
      
      There's no reason for the tracers to be initialized so late in the
      boot process. Move them up to core_initcall(), as they still need to
      come after early_initcall(), which initializes the tracing buffers.
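      
      In initcall terms the move is simply (the init function name is
      illustrative):
      
      	/* before: device_initcall(init_function_trace); */
      	core_initcall(init_function_trace);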
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  5. 23 August 2012 (1 commit)
  6. 29 June 2012 (1 commit)
    • tracing: Remove NR_CPUS array from trace_iterator · 6d158a81
      Steven Rostedt authored
      Replace the NR_CPUS array of buffer_iter in the trace_iterator with
      an allocated array. This creates an array sized for the possible
      CPUs instead of the configured maximum.
      
      The use of NR_CPUS in that array caused allocation failures for
      machines that were tight on memory. This did not cause any failures
      of the system itself (no crashes), but caused unnecessary failures
      when reading the trace files.
      
      Added a helper function called trace_buffer_iter() that returns the
      buffer_iter item, or NULL if it is not defined or the array was not
      allocated. Some routines do not require the array
      (tracing_open_pipe() for one).
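      
      A sketch of the helper as described (its placement and exact
      signature are assumed):
      
      	static inline struct ring_buffer_iter *
      	trace_buffer_iter(struct trace_iterator *iter, int cpu)
      	{
      		if (iter->buffer_iter)
      			return iter->buffer_iter[cpu];
      		return NULL;
      	}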
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  7. 08 July 2011 (1 commit)
  8. 15 June 2011 (4 commits)
    • tracing, function_graph: Add context-info support for function_graph tracer · 749230b0
      Jiri Olsa authored
      The function_graph tracer does not honor the global context-info
      option. Add a TRACE_ITER_CONTEXT_INFO trace_flags check to enable it.
      
      With following commands:
      	# echo function_graph > ./current_tracer
      	# echo 0 > options/context-info
      	# cat trace
      
      This is what it looked like before:
      # tracer: function_graph
      #
      #     TIME        CPU  DURATION                  FUNCTION CALLS
      #      |          |     |   |                     |   |   |   |
       1)   0.079 us    |          } /* __vma_link_rb */
       1)   0.056 us    |          copy_page_range();
       1)               |          security_vm_enough_memory() {
      ...
      
      This is what it looks like now:
      # tracer: function_graph
      #
        } /* update_ts_time_stats */
        timekeeping_max_deferment();
      ...
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-6-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing, function_graph: Remove lock-depth from latency trace · 199abfab
      Jiri Olsa authored
      The lock_depth field was removed in commit
      e6e1e259 ("tracing: Remove lock_depth from event entry").
      
      Remove the lock_depth info from the function_graph latency header as well.
      
      With following commands:
      	# echo function_graph > ./current_tracer
      	# echo 1 > options/latency-format
      	# cat trace
      
      This is what it looked like before:
      # tracer: function_graph
      #
      # function_graph latency trace v1.1.5 on 3.0.0-rc1-tip+
      # --------------------------------------------------------------------
      # latency: 0 us, #59756/311298, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
      #    -----------------
      #    | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
      #    -----------------
      #
      #      _-----=> irqs-off
      #     / _----=> need-resched
      #    | / _---=> hardirq/softirq
      #    || / _--=> preempt-depth
      #    ||| / _-=> lock-depth
      #    |||| /
      # CPU|||||  DURATION                  FUNCTION CALLS
      # |  |||||   |   |                     |   |   |   |
       0)  ....  0.068 us    |    } /* __rcu_read_unlock */
      ...
      
      This is what it looks like now:
      # tracer: function_graph
      #
      # function_graph latency trace v1.1.5 on 3.0.0-rc1-tip+
      # --------------------------------------------------------------------
      # latency: 0 us, #59747/1744610, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
      #    -----------------
      #    | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
      #    -----------------
      #
      #      _-----=> irqs-off
      #     / _----=> need-resched
      #    | / _---=> hardirq/softirq
      #    || / _--=> preempt-depth
      #    ||| /
      # CPU||||  DURATION                  FUNCTION CALLS
      # |  ||||   |   |                     |   |   |   |
       0)  ..s.  1.641 us    |  } /* __rcu_process_callbacks */
      ...
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-5-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing, function_graph: Merge overhead and duration display functions · ffeb80fc
      Jiri Olsa authored
      The functions print_graph_overhead() and print_graph_duration()
      display data for a single field: DURATION.
      
      I merged them into a single function, print_graph_duration(),
      and added a way to display the empty parts of the field.
      
      This way the print_graph_irq() function can use this column to
      display the IRQ signs if needed, and the DURATION field details stay
      inside the print_graph_duration() function.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-3-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing, function_graph: Remove dependency of abstime and duration fields on latency · 321e68b0
      Jiri Olsa authored
      The display of the absolute time and duration fields is based on the
      latency field. This was added during the irqsoff/wakeup tracers'
      graph support changes.
      
      It's causing confusion about which fields will be displayed for the
      function_graph tracer itself. So I'm removing this dependency, and
      adding the absolute time and duration fields to the preemptirqsoff,
      preemptoff, irqsoff and wakeup tracers.
      
      With following commands:
      	# echo function_graph > ./current_tracer
      	# cat trace
      
      This is what it looked like before:
      # tracer: function_graph
      #
      #     TIME        CPU  DURATION                  FUNCTION CALLS
      #      |          |     |   |                     |   |   |   |
       0)   0.068 us    |          } /* page_add_file_rmap */
       0)               |          _raw_spin_unlock() {
      ...
      
      This is what it looks like now:
      # tracer: function_graph
      #
      # CPU  DURATION                  FUNCTION CALLS
      # |     |   |                     |   |   |   |
       0)   0.068 us    |                } /* add_preempt_count */
       0)   0.993 us    |              } /* vfsmount_lock_local_lock */
      ...
      
      For preemptirqsoff preemptoff irqsoff wakeup tracers,
      this is what it looked like before:
      SNIP
      #                       _-----=> irqs-off
      #                      / _----=> need-resched
      #                     | / _---=> hardirq/softirq
      #                     || / _--=> preempt-depth
      #                     ||| / _-=> lock-depth
      #                     |||| /
      # CPU  TASK/PID       |||||  DURATION                  FUNCTION CALLS
      # |     |    |        |||||   |   |                     |   |   |   |
       1)    <idle>-0    |  d..1  0.000 us    |  acpi_idle_enter_simple();
      ...
      
      This is what it looks like now:
      SNIP
      #
      #                                       _-----=> irqs-off
      #                                      / _----=> need-resched
      #                                     | / _---=> hardirq/softirq
      #                                     || / _--=> preempt-depth
      #                                     ||| /
      #     TIME        CPU  TASK/PID       ||||  DURATION                  FUNCTION CALLS
      #      |          |     |    |        ||||   |   |                     |   |   |   |
         19.847735 |   1)    <idle>-0    |  d..1  0.000 us    |  acpi_idle_enter_simple();
      ...
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-2-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  9. 31 March 2011 (1 commit)
  10. 18 October 2010 (3 commits)
    • tracing: Remove parent recording in latency tracer graph options · 78c89ba1
      Steven Rostedt authored
      Even though the parent is recorded with the normal function tracing
      of the latency tracers (irqsoff and wakeup), the function graph
      recording is bogus.
      
      This is due to the function graph tracer messing with the return
      stack. The latency tracers pass in CALLER_ADDR0 as the parent, which
      works fine for plain function tracing. But this causes bogus output
      with the graph tracer:
      
       3)    <idle>-0    |  d.s3.  0.000 us    |  return_to_handler();
       3)    <idle>-0    |  d.s3.  0.000 us    |  _raw_spin_unlock_irqrestore();
       3)    <idle>-0    |  d.s3.  0.000 us    |  return_to_handler();
       3)    <idle>-0    |  d.s3.  0.000 us    |  trace_hardirqs_on();
      
      The "return_to_handle()" call is the trampoline of the
      function graph tracer, and is meaningless in this context.
      
      Cc: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Make graph related irqs/preemptsoff functions global · 0a772620
      Jiri Olsa authored
      Move the trace_graph_function() and print_graph_headers_flags()
      functions to trace_functions_graph.c to make them globally available.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      LKML-Reference: <1285243253-7372-3-git-send-email-jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add proper check for irq_depth routines · a9d61173
      Jiri Olsa authored
      The check_irq_entry() and check_irq_return() functions can be called
      from graph event context, in which case no graph private data is
      allocated. Add checks to handle this case.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      LKML-Reference: <20100924154102.GB1818@jolsa.brq.redhat.com>
      
      [ Fixed some grammar in the comments ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  11. 13 October 2010 (1 commit)
    • tracing: Fix function-graph build warning on 32-bit · 14cae9bd
      Borislav Petkov authored
      Fix
      
      kernel/trace/trace_functions_graph.c: In function ‘trace_print_graph_duration’:
      kernel/trace/trace_functions_graph.c:652: warning: comparison of distinct pointer types lacks a cast
      
      when building 36-rc6 on a 32-bit machine, due to the strict type
      check failing in the min() macro.
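      
      min() type-checks its arguments, and on 32-bit size_t is not
      unsigned long, hence the warning. A hedged sketch of the usual fix
      (the surrounding context is assumed from the commit below):
      
      	size_t slen = min_t(size_t, sizeof(nsecs_str), 8UL);
      
      	snprintf(nsecs_str, slen, "%03lu", nsecs_rem);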
      Signed-off-by: Borislav Petkov <bp@alien8.de>
      Cc: Chase Douglas <chase.douglas@canonical.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <20100929080823.GA13595@liondog.tnic>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
  12. 15 September 2010 (2 commits)
    • tracing: Do not trace in irq when funcgraph-irq option is zero · b304d044
      Steven Rostedt authored
      When the function graph tracer funcgraph-irq option is zero, disable
      tracing in IRQs. This makes the option have two effects.
      
      1) When reading the trace file, do not display the functions that
         happen in interrupt context (when detected)
      
      2) [*new*] When recording a trace, skip those that are detected
         to be in interrupt by the 'in_irq()' function
      
      Note, in_irq() is updated at irq_enter() and irq_exit(). There are
      still functions recorded by the function graph tracer that are in
      interrupt context but outside the irq_enter()/irq_exit() routines.
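      
      A hedged sketch of the recording-side check (the helper name and
      flag variable are assumptions):
      
      	static inline int ftrace_graph_ignore_irqs(void)
      	{
      		if (!ftrace_graph_skip_irqs)
      			return 0;
      		return in_irq();
      	}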
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add funcgraph-irq option for function graph tracer. · 2bd16212
      Jiri Olsa authored
      It's handy to be able to disable the irq-related output and not have
      to jump over each piece of irq-related code when you have no
      interest in it.
      
      The option is enabled by default, so there's no change to the
      current behaviour. It affects only the final output, so all the
      irq-related data stays in the ring buffer.
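      
      A usage sketch (the option name is from this commit; the debugfs
      path assumes the usual mount point):
      
      	# cd /sys/kernel/debug/tracing
      	# echo function_graph > current_tracer
      	# echo 0 > options/funcgraph-irq
      	# cat trace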
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      LKML-Reference: <20100907145344.GC1912@jolsa.brq.redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  13. 19 August 2010 (1 commit)
  14. 07 August 2010 (1 commit)
  15. 29 June 2010 (1 commit)
    • tracing/function-graph: Use correct string size for snprintf · d62f85d1
      Chase Douglas authored
      The nsecs_str string is a local variable defined as:
      
      char nsecs_str[5];
      
      It is possible for the snprintf call to use a size value larger than
      the size of the string. This should not cause a buffer overrun as
      the code is written now, because the value given to the "%03lu"
      format cannot be larger than 1000. However, this change makes the
      call correct. By making the size correct, we guard against potential
      future changes that could actually cause a buffer overrun.
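      
      A hedged sketch of the bounded call (consistent with the note about
      the 8UL constant below):
      
      	char nsecs_str[5];
      
      	snprintf(nsecs_str, min(sizeof(nsecs_str), 8UL), "%03lu", nsecs_rem);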
      Signed-off-by: Chase Douglas <chase.douglas@canonical.com>
      LKML-Reference: <1276619355-18116-1-git-send-email-chase.douglas@canonical.com>
      
      [ added 'UL' to number 8 to fix gcc warning comparing it to sizeof() ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  16. 15 May 2010 (1 commit)
    • tracing: Allow events to share their print functions · a9a57763
      Steven Rostedt authored
      Multiple events may use the same method to print their data.
      Instead of every event having a pointer to its print functions, the
      trace_event structure now points to a trace_event_functions
      structure that holds the ways to print out the event.
      
      The event itself is now passed to the print function to let the print
      function know what kind of event it should print.
      
      This opens the door to consolidating the way several events print
      their output.
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4900382	1048964	 861512	6810858	 67ecea	vmlinux.init
      4900446	1049028	 861512	6810986	 67ed6a	vmlinux.preprint
      
      This change slightly increases the size but is needed for the next change.
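      
      A hedged sketch of the resulting indirection (the four print methods
      and the field names are assumptions based on the description):
      
      	struct trace_event_functions {
      		trace_print_func	trace;
      		trace_print_func	raw;
      		trace_print_func	hex;
      		trace_print_func	binary;
      	};
      
      	struct trace_event {
      		/* ... */
      		struct trace_event_functions	*funcs;
      	};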
      
      v3: Fix the branch tracer events to handle this change.
      
      v2: Fix the new function graph tracer event calls to handle this change.
      Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  17. 28 April 2010 (1 commit)
  18. 27 April 2010 (2 commits)
  19. 01 April 2010 (1 commit)
    • ring-buffer: Add place holder recording of dropped events · 66a8cb95
      Steven Rostedt authored
      Currently, when the ring buffer drops events, it does not record the
      fact that it did so. It does inform the writer that the event was
      dropped by returning a NULL event, but it does not put any
      placeholder where the event was dropped.
      
      This is not a trivial thing to add because the ring buffer mostly
      runs in overwrite (flight recorder) mode. That is, when the ring
      buffer is full, new data will overwrite old data.
      
      In a producer/consumer mode, where new data is simply dropped when
      the ring buffer is full, it is trivial to add the placeholder for
      dropped events. When there's more room to write new data, a special
      event can be added to notify the reader about the dropped events.
      
      But in overwrite mode, any new write can overwrite events. A
      placeholder cannot be inserted into the ring buffer since there may
      never be room. A reader could also come in at any time and miss the
      placeholder.
      
      Luckily, the way the ring buffer works, the read side can find out
      whether events were lost, and how many. Every time a write
      overwrites the header page (the next read), it updates an "overrun"
      variable that keeps track of the number of lost events. When a
      reader swaps out a page from the ring buffer, it can record this
      number, perform the swap, and then check whether the number changed;
      if it has, the difference is the number of events dropped. This can
      be stored by the reader and returned to callers of the reader.
      
      Since the reader page swap will fail if the writer moved the head
      page after the reader set up the swap, this gives room to record the
      overruns without worrying about races. If the reader sets up the
      pages, records the overrun, then performs the swap, and the swap
      succeeds, then the overrun variable has not been updated since the
      setup before the swap.
      
      For binary readers of the ring buffer, a flag is set in the header
      of each sub page (sub buffer) of the ring buffer. This flag is
      embedded in the size field of the data on the sub buffer, in the
      31st bit (the size can be 32 or 64 bits depending on the
      architecture), but only 27 bits are needed for the actual size
      (fewer, actually).
      
      We could add a new field in the sub buffer header to also record the
      number of events dropped since the last read, but this will change the
      format of the binary ring buffer a bit too much. Perhaps this change can
      be made if the information on the number of events dropped is considered
      important enough.
      
      Note, the notification of dropped events is only used by consuming reads
      or peeking at the ring buffer. Iterating over the ring buffer does not
      keep this information because the necessary data is only available when
      a page swap is made, and the iterator does not swap out pages.
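      
      A hedged usage sketch from the consuming-read side, assuming the
      lost-events count is returned through an out-parameter on the
      consume call:
      
      	u64 ts;
      	unsigned long lost_events;
      	struct ring_buffer_event *event;
      
      	event = ring_buffer_consume(buffer, cpu, &ts, &lost_events);
      	if (lost_events)
      		printk(KERN_INFO "dropped %lu events\n", lost_events);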
      
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  20. 30 March 2010 (1 commit)
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      The percpu.h -> slab.h dependency is about to be removed.  Prepare
      for this change by updating users of gfp and slab facilities to
      include those headers directly instead of assuming availability.
      As this conversion needs to touch a large number of source files,
      the following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  I.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order
        conforms to its surroundings.  It's put in the include block which
        contains core kernel includes, in the same order that the rest are
        ordered: alphabetical, Christmas tree, rev-Xmas-tree, or at the end
        if there doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints
        out an error message indicating which .h file needs to be added to
        the file.
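      
      As an illustration of the per-file change the script makes (a
      hypothetical .c file):
      
      	/* before: kmalloc() compiled only because percpu.h pulled in slab.h */
      	#include <linux/percpu.h>
      
      	/* after: include what is used directly */
      	#include <linux/slab.h>
      	#include <linux/percpu.h>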
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition, and for others adding it to an
         implementation .h or embedding .c file was more appropriate.
         This step added inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them, as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored, as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and
         failures were fixed.  CONFIG_GCOV_KERNEL was turned off for all
         tests (as my distributed build env didn't work with gcov compiles)
         and a few more options had to be turned off depending on archs to
         make things build (like ipr on powerpc/64, which failed due to a
         missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on
      step 6, I'm fairly confident about the coverage of this conversion
      patch.  If there is a breakage, it's likely to be something in one
      of the arch headers, which should be discoverable easily on most
      builds of the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  21. 06 March 2010 (2 commits)
    • function-graph: Add tracing_thresh support to function_graph tracer · 0e950173
      Tim Bird authored
      Add support for tracing_thresh to the function_graph tracer.  This
      version of this feature isolates the checks into new entry and
      return functions, to avoid adding more conditional code into the
      main function_graph paths.
      
      When the tracing_thresh is set and the function graph tracer is
      enabled, only the functions that took longer than the time in
      microseconds that was set in tracing_thresh are recorded. To do this
      efficiently, only the function exits are recorded:
      
       [tracing]# echo 100 > tracing_thresh
       [tracing]# echo function_graph > current_tracer
       [tracing]# cat trace
       # tracer: function_graph
       #
       # CPU  DURATION                  FUNCTION CALLS
       # |     |   |                     |   |   |   |
        1) ! 119.214 us  |  } /* smp_apic_timer_interrupt */
        1)   <========== |
        0) ! 101.527 us  |              } /* __rcu_process_callbacks */
        0) ! 126.461 us  |            } /* rcu_process_callbacks */
        0) ! 145.111 us  |          } /* __do_softirq */
        0) ! 149.667 us  |        } /* do_softirq */
        0) ! 168.817 us  |      } /* irq_exit */
        0) ! 248.254 us  |    } /* smp_apic_timer_interrupt */
      
      Also, add support for specifying tracing_thresh on the kernel
      command line.  When used like so: "tracing_thresh=200
      ftrace=function_graph", this can be used to analyse system startup.
      It is important to disable tracing soon after boot, in order to
      avoid losing the trace data.
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Tim Bird <tim.bird@am.sony.com>
      LKML-Reference: <4B87098B.4040308@am.sony.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • function-graph: Use comment notation for func names of dangling '}' · a094fe04
      Steven Rostedt authored
      When a '}' does not have a matching function start, the name is
      printed within parentheses. But this makes it hard to distinguish an
      ending '}' from a function start. This patch makes the function name
      appear in C comment notation instead.
      
      Old view:
       3)   1.281 us    |            } (might_fault)
       3)   3.620 us    |          } (filldir)
       3)   5.251 us    |        } (call_filldir)
       3)               |        call_filldir() {
       3)               |          filldir() {
      
      New view:
       3)   1.281 us    |            } /* might_fault */
       3)   3.620 us    |          } /* filldir */
       3)   5.251 us    |        } /* call_filldir */
       3)               |        call_filldir() {
       3)               |          filldir() {
      Requested-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  22. 27 February 2010 (1 commit)
    • ftrace: Add function names to dangling } in function graph tracer · f1c7f517
      Steven Rostedt authored
      The function graph tracer is currently the most invasive tracer
      in the ftrace family. It can easily overflow the buffer even with
      10megs per CPU. This means that events can often be lost.
      
      On start up, or after events are lost, if the function return is
      recorded but the function enter was lost, all we get to see is the
      exiting '}'.
      
      Here is how a typical trace output starts:
      
       [tracing] cat trace
       # tracer: function_graph
       #
       # CPU  DURATION                  FUNCTION CALLS
       # |     |   |                     |   |   |   |
        0) + 91.897 us   |                  }
        0) ! 567.961 us  |                }
        0)   <========== |
        0) ! 579.083 us  |                _raw_spin_lock_irqsave();
        0)   4.694 us    |                _raw_spin_unlock_irqrestore();
        0) ! 594.862 us  |              }
        0) ! 603.361 us  |            }
        0) ! 613.574 us  |          }
        0) ! 623.554 us  |        }
        0)   3.653 us    |        fget_light();
        0)               |        sock_poll() {
      
      There is a series of '}' with no matching "func() {". There's no
      information about which functions these ending brackets belong to.
      
      This patch adds a stack to the per-cpu structure used in outputting
      the function graph tracer to keep track of what function was output.
      Then on a function exit event, it checks the depth to see if the
      function exit has a matching entry event. If it does, then it only
      prints the '}'; otherwise it adds the function name after the '}'.
      
      This allows function exit events to show what function they belong
      to at trace output startup, when the entry was lost due to ring
      buffer overflow, or even after a new task is scheduled in.
      
      Here is what the above trace will look like after this patch:
      
       [tracing] cat trace
       # tracer: function_graph
       #
       # CPU  DURATION                  FUNCTION CALLS
       # |     |   |                     |   |   |   |
        0) + 91.897 us   |                  } (irq_exit)
        0) ! 567.961 us  |                } (smp_apic_timer_interrupt)
        0)   <========== |
        0) ! 579.083 us  |                _raw_spin_lock_irqsave();
        0)   4.694 us    |                _raw_spin_unlock_irqrestore();
        0) ! 594.862 us  |              } (add_wait_queue)
        0) ! 603.361 us  |            } (__pollwait)
        0) ! 613.574 us  |          } (tcp_poll)
        0) ! 623.554 us  |        } (sock_poll)
        0)   3.653 us    |        fget_light();
        0)               |        sock_poll() {
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  23. 25 February 2010 (1 commit)
  24. 29 January 2010 (1 commit)
    • tracing: Simplify test for function_graph tracing start point · ea2c68a0
      Lai Jiangshan authored
      In the function graph tracer, a calling function is to be traced
      only when it is enabled through the set_graph_function file,
      or when it is nested in an enabled function.
      
      Current code uses TSK_TRACE_FL_GRAPH to test whether it is nested or
      not. Looking at the code, we can see that:
      (trace->depth > 0) <==> (TSK_TRACE_FL_GRAPH is set)
      
      trace->depth is more explicit in telling that the call is nested.
      So use trace->depth directly and simplify the code.
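      
      A hedged sketch of the simplified entry test (the surrounding
      function and helper names are assumptions):
      
      	static int trace_graph_entry(struct ftrace_graph_ent *trace)
      	{
      		/* trace enabled functions, or calls nested in one */
      		if (!(trace->depth || ftrace_graph_addr(trace->func)))
      			return 0;
      		/* ... */
      	}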
      
      No functionality is changed.
      TSK_TRACE_FL_GRAPH is not removed yet; it is left for future usage.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <4B4DB0B6.7040607@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
  25. 17 January 2010 (1 commit)
  26. 10 December 2009 (1 commit)
    • tracing: Fix function graph trace_pipe to properly display failed entries · be1eca39
      Jiri Olsa authored
      There is a case where the graph tracer might get confused and omit
      displaying a single record.  This applies mostly to the trace_pipe,
      since it is unlikely that the trace_seq buffer will overflow with
      the trace file.
      
      As the function_graph tracer goes through the trace entries keeping a
      pointer to the current record:
      
      current ->  func1 ENTRY
                  func2 ENTRY
                  func2 RETURN
                  func1 RETURN
      
      When a function ENTRY is encountered, it moves the pointer to the
      next entry to check if the function is a nested or leaf function.
      
                  func1 ENTRY
      current ->  func2 ENTRY
                  func2 RETURN
                  func1 RETURN
      
      If the rest of the writing of the function fills the trace_seq
      buffer, then the trace_pipe read will ignore this entry. The next
      read will now start at the current location, but the first entry
      (func1) will be discarded.
      
      This patch keeps a copy of the current entry in the iterator private
      storage and keeps track of when the trace_seq buffer fills. When the
      trace_seq buffer fills, it will reuse the copy of the entry in the
      next iteration.
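      
      A hedged sketch of the iterator-private state (the field names are
      assumptions based on the description):
      
      	struct fgraph_data {
      		/* ... */
      		struct ftrace_graph_ent_entry	ent;	/* copy of last entry */
      		int				failed;	/* trace_seq filled up */
      		int				cpu;	/* cpu the copy is from */
      	};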
      
      [
        This patch has been largely modified by Steven Rostedt in order to
        clean it up and simplify it. The original idea and concept was from
        Jirka and for that, this patch will go under his name to give him
        the credit he deserves. But because this was modify by Steven Rostedt
        anything wrong with the patch should be blamed on Steven.
      ]
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  27. 29 October 2009 (1 commit)
  28. 13 October 2009 (1 commit)
  29. 18 September 2009 (1 commit)
  30. 12 September 2009 (1 commit)