1. 17 3月, 2011 1 次提交
  2. 10 3月, 2011 9 次提交
    • S
      tracing: Fix irqoff selftest expanding max buffer · 4a0b1665
      Steven Rostedt 提交于
      If the kernel command line declares a tracer "ftrace=sometracer" and
      that tracer is either not defined or is enabled after irqsoff,
      then the irqs off selftest will fail with the following error:
      
      Testing tracer irqsoff:
      ------------[ cut here ]------------
      WARNING: at /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/tra
      ce.c:713 update_max_tr_single+0xfa/0x11b()
      Hardware name:
      Modules linked in:
      Pid: 1, comm: swapper Not tainted 2.6.38-rc8-test #1
      Call Trace:
       [<c0441d9d>] ? warn_slowpath_common+0x65/0x7a
       [<c049adb2>] ? update_max_tr_single+0xfa/0x11b
       [<c0441dc1>] ? warn_slowpath_null+0xf/0x13
       [<c049adb2>] ? update_max_tr_single+0xfa/0x11b
       [<c049e454>] ? stop_critical_timing+0x154/0x204
       [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
       [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
       [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
       [<c049e529>] ? time_hardirqs_on+0x25/0x28
       [<c0468bca>] ? trace_hardirqs_on_caller+0x18/0x12f
       [<c0468cec>] ? trace_hardirqs_on+0xb/0xd
       [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
       [<c049b6b8>] ? register_tracer+0xf8/0x1a3
       [<c14e93fe>] ? init_irqsoff_tracer+0xd/0x11
       [<c040115e>] ? do_one_initcall+0x71/0x121
       [<c14e93f1>] ? init_irqsoff_tracer+0x0/0x11
       [<c14ce3a9>] ? kernel_init+0x13a/0x1b6
       [<c14ce26f>] ? kernel_init+0x0/0x1b6
       [<c0403842>] ? kernel_thread_helper+0x6/0x10
      ---[ end trace e93713a9d40cd06c ]---
      .. no entries found ..FAILED!
      
      What happens is the "ftrace=..." will expand the ring buffer to its
      default size (from its minimum size) but it will not expand the
      max ring buffer (the ring buffer to store maximum latencies).
      When the irqsoff test runs, it will call the ring buffer swap routine
      that checks if the max ring buffer is the same size as the normal
      ring buffer, and will fail if it is not. This causes the test to fail.
      
      The solution is to expand the max ring buffer before running the self
      test if the max ring buffer is used by that tracer and the normal ring
      buffer is expanded. The max ring buffer should be shrunk again after
      the test is done to save space.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      4a0b1665
    • S
      tracing: Align 4 byte ints together in struct tracer · 9a24470b
      Steven Rostedt 提交于
      Move elements in struct tracer for better alignment.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      9a24470b
    • Y
      tracing: Export trace_set_clr_event() · 56355b83
      Yuanhan Liu 提交于
      Trace events belonging to a module only exists when the module is
      loaded. Well, we can use trace_set_clr_event funtion to enable some
      trace event at the module init routine, so that we will not miss
      something while loading then module.
      
      So, Export the trace_set_clr_event function so that module can use it.
      Signed-off-by: NYuanhan Liu <yuanhan.liu@linux.intel.com>
      LKML-Reference: <1289196312-25323-1-git-send-email-yuanhan.liu@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      56355b83
    • J
      tracing: Explain about unstable clock on resume with ring buffer warning · 31274d72
      Jiri Olsa 提交于
      The "Delta way too big" warning might appear on a system with a
      unstable shed clock right after the system is resumed and tracing
      was enabled at time of suspend.
      
      Since it's not realy a bug, and the unstable sched clock is working
      fast and reliable otherwise, Steven suggested to keep using the
      sched clock in any case and just to make note in the warning itself.
      
      v2 changes:
      - added #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      LKML-Reference: <20110218145219.GD2604@jolsa.brq.redhat.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      31274d72
    • D
      tracing: Adjust conditional expression latency formatting. · 10da37a6
      David Sharp 提交于
      Formatting change only to improve code readability. No code changes except to
      introduce intermediate variables.
      Signed-off-by: NDavid Sharp <dhsharp@google.com>
      LKML-Reference: <1291421609-14665-13-git-send-email-dhsharp@google.com>
      
      [ Keep variable declarations and assignment separate ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      10da37a6
    • D
      tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup · 140e4f2d
      David Sharp 提交于
      Signed-off-by: NDavid Sharp <dhsharp@google.com>
      LKML-Reference: <1291421609-14665-6-git-send-email-dhsharp@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      140e4f2d
    • S
      tracing: Remove lock_depth from event entry · e6e1e259
      Steven Rostedt 提交于
      The lock_depth field in the event headers was added as a temporary
      data point for help in removing the BKL. Now that the BKL is pretty
      much been removed, we can remove this field.
      
      This in turn changes the header from 12 bytes to 8 bytes,
      removing the 4 byte buffer that gcc would insert if the first field
      in the data load was 8 bytes in size.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e6e1e259
    • D
      ring-buffer: Remove unused #include <linux/trace_irq.h> · de29be5e
      David Sharp 提交于
      Signed-off-by: NDavid Sharp <dhsharp@google.com>
      LKML-Reference: <1291421609-14665-3-git-send-email-dhsharp@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      de29be5e
    • D
      tracing: Add an 'overwrite' trace_option. · 750912fa
      David Sharp 提交于
      Add an "overwrite" trace_option for ftrace to control whether the buffer should
      be overwritten on overflow or not. The default remains to overwrite old events
      when the buffer is full. This patch adds the option to instead discard newest
      events when the buffer is full. This is useful to get a snapshot of traces just
      after enabling traces. Dropping the current event is also a simpler code path.
      Signed-off-by: NDavid Sharp <dhsharp@google.com>
      LKML-Reference: <1291844807-15481-1-git-send-email-dhsharp@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      750912fa
  3. 03 3月, 2011 1 次提交
    • T
      blktrace: Remove blk_fill_rwbs_rq. · 2d3a8497
      Tao Ma 提交于
      If we enable trace events to trace block actions, We use
      blk_fill_rwbs_rq to analyze the corresponding actions
      in request's cmd_flags, but we only choose the minor 2 bits
      from it, so most of other flags(e.g, REQ_SYNC) are missing.
      For example, with a sync write we get:
      write_test-2409  [001]   160.013869: block_rq_insert: 3,64 W 0 () 258135 + =
      8 [write_test]
      
      Since now we have integrated the flags of both bio and request,
      it is safe to pass rq->cmd_flags directly to blk_fill_rwbs and
      blk_fill_rwbs_rq isn't needed any more.
      
      With this patch, after a sync write we get:
      write_test-2417  [000]   226.603878: block_rq_insert: 3,64 WS 0 () 258135 +=
       8 [write_test]
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      2d3a8497
  4. 18 2月, 2011 1 次提交
  5. 14 2月, 2011 1 次提交
  6. 12 2月, 2011 1 次提交
    • S
      ftrace: Fix memory leak with function graph and cpu hotplug · 868baf07
      Steven Rostedt 提交于
      When the fuction graph tracer starts, it needs to make a special
      stack for each task to save the real return values of the tasks.
      All running tasks have this stack created, as well as any new
      tasks.
      
      On CPU hot plug, the new idle task will allocate a stack as well
      when init_idle() is called. The problem is that cpu hotplug does
      not create a new idle_task. Instead it uses the idle task that
      existed when the cpu went down.
      
      ftrace_graph_init_task() will add a new ret_stack to the task
      that is given to it. Because a clone will make the task
      have a stack of its parent it does not check if the task's
      ret_stack is already NULL or not. When the CPU hotplug code
      starts a CPU up again, it will allocate a new stack even
      though one already existed for it.
      
      The solution is to treat the idle_task specially. In fact, the
      function_graph code already does, just not at init_idle().
      Instead of using the ftrace_graph_init_task() for the idle task,
      which that function expects the task to be a clone, have a
      separate ftrace_graph_init_idle_task(). Also, we will create a
      per_cpu ret_stack that is used by the idle task. When we call
      ftrace_graph_init_idle_task() it will check if the idle task's
      ret_stack is NULL, if it is, then it will assign it the per_cpu
      ret_stack.
      Reported-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Suggested-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stable Tree <stable@kernel.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      868baf07
  7. 09 2月, 2011 3 次提交
  8. 08 2月, 2011 19 次提交
    • I
      tracing/syscalls: Early terminate search for sys_ni_syscall · ae07f551
      Ian Munsie 提交于
      Many system calls are unimplemented and mapped to sys_ni_syscall, but at
      boot ftrace would still search through every syscall metadata entry for
      a match which wouldn't be there.
      
      This patch adds causes the search to terminate early if the system call
      is not mapped.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      LKML-Reference: <1296703645-18718-7-git-send-email-imunsie@au1.ibm.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ae07f551
    • I
      tracing/syscalls: Allow arch specific syscall symbol matching · b2d55496
      Ian Munsie 提交于
      Some architectures have unusual symbol names and the generic code to
      match the symbol name with the function name for the syscall metadata
      will fail. For example, symbols on PPC64 start with a period and the
      generic code will fail to match them.
      
      This patch moves the match logic out into a separate function which an
      arch can override by defining ARCH_HAS_SYSCALL_MATCH_SYM_NAME in
      asm/ftrace.h and implementing arch_syscall_match_sym_name.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      LKML-Reference: <1296703645-18718-5-git-send-email-imunsie@au1.ibm.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      b2d55496
    • I
      tracing/syscalls: Make arch_syscall_addr weak · c763ba06
      Ian Munsie 提交于
      Some architectures use non-trivial system call tables and will not work
      with the generic arch_syscall_addr code. For example, PowerPC64 uses a
      table of twin long longs.
      
      This patch makes the generic arch_syscall_addr weak to allow
      architectures with non-trivial system call tables to override it.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      LKML-Reference: <1296703645-18718-4-git-send-email-imunsie@au1.ibm.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      c763ba06
    • I
      tracing/syscalls: Convert redundant syscall_nr checks into WARN_ON · 3773b389
      Ian Munsie 提交于
      With the ftrace events now checking if the syscall_nr is valid upon
      initialisation it should no longer be possible to register or unregister
      a syscall event without a valid syscall_nr since they should not be
      created. This adds a WARN_ON_ONCE in the register and unregister
      functions to locate potential regressions in the future.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      LKML-Reference: <1296703645-18718-3-git-send-email-imunsie@au1.ibm.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      3773b389
    • I
      tracing/syscalls: Don't add events for unmapped syscalls · ba976970
      Ian Munsie 提交于
      FTRACE_SYSCALLS would create events for each and every system call, even
      if it had failed to map the system call's name with it's number. This
      resulted in a number of events being created that would not behave as
      expected.
      
      This could happen, for example, on architectures who's symbol names are
      unusual and will not match the system call name. It could also happen
      with system calls which were mapped to sys_ni_syscall.
      
      This patch changes the default system call number in the metadata to -1.
      If the system call name from the metadata is not successfully mapped to
      a system call number during boot, than the event initialisation routine
      will now return an error, preventing the event from being created.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      LKML-Reference: <1296703645-18718-2-git-send-email-imunsie@au1.ibm.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ba976970
    • S
      tracing/filter: Remove synchronize_sched() from __alloc_preds() · 4defe682
      Steven Rostedt 提交于
      Because the filters are processed first and then activated
      (added to the call), we no longer need to worry about the preds
      of the filter in __alloc_preds() being used. As the filter that
      is allocating preds is not activated yet.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      4defe682
    • S
      tracing/filter: Swap entire filter of events · 75b8e982
      Steven Rostedt 提交于
      When creating a new filter, instead of allocating the filter to the
      event call first and then processing the filter, it is easier to
      process a temporary filter and then just swap it with the call filter.
      By doing this, it simplifies the code.
      
      A filter is allocated and processed, when it is done, it is
      swapped with the call filter, synchronize_sched() is called to make
      sure all callers are done with the old filter (filters are called
      with premption disabled), and then the old filter is freed.
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      75b8e982
    • S
      tracing/filter: Increase the max preds to 2^14 · bf93f9ed
      Steven Rostedt 提交于
      Now that the filter logic does not require to save the pred results
      on the stack, we can increase the max number of preds we allow.
      As the preds are index by a short value, and we use the MSBs as flags
      we can increase the max preds to 2^14 (16384) which should be way
      more than enough.
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      bf93f9ed
    • S
      tracing/filter: Move MAX_FILTER_PRED to local tracing directory · 4a3d27e9
      Steven Rostedt 提交于
      The MAX_FILTER_PRED is only needed by the kernel/trace/*.c files.
      Move it to kernel/trace/trace.h.
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      4a3d27e9
    • S
      tracing/filter: Optimize filter by folding the tree · 43cd4145
      Steven Rostedt 提交于
      There are many cases that a filter will contain multiple ORs or
      ANDs together near the leafs. Walking up and down the tree to get
      to the next compare can be a waste.
      
      If there are several ORs or ANDs together, fold them into a single
      pred and allocate an array of the conditions that they check.
      This will speed up the filter by linearly walking an array
      and can still break out if a short circuit condition is met.
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      43cd4145
    • S
      tracing/filter: Check the created pred tree · ec126cac
      Steven Rostedt 提交于
      Since the filter walks a tree to determine if a match is made or not,
      if the tree was incorrectly created, it could cause an infinite loop.
      
      Add a check to walk the entire tree before assigning it as a filter
      to make sure the tree is correct.
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ec126cac
    • S
      tracing/filter: Optimize short ciruit check · 55719274
      Steven Rostedt 提交于
      The test if we should break out early for OR and AND operations
      can be optimized by comparing the current result with
        (pred->op == OP_OR)
      
      That is if the result is true and the op is an OP_OR, or
      if the result is false and the op is not an OP_OR (thus an OP_AND)
      we can break out early in either case. Otherwise we continue
      processing.
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      55719274
    • S
      tracing/filter: Use a tree instead of stack for filter_match_preds() · 61e9dea2
      Steven Rostedt 提交于
      Currently the filter_match_preds() requires a stack to push
      and pop the preds to determine if the filter matches the record or not.
      This has two drawbacks:
      
      1) It requires a stack to store state information. As this is done
         in fast paths we can't allocate the storage for this stack, and
         we can't use a global as it must be re-entrant. The stack is stored
         on the kernel stack and this greatly limits how many preds we
         may allow.
      
      2) All conditions are calculated even when a short circuit exists.
         a || b  will always calculate a and b even though a was determined
         to be true.
      
      Using a tree we can walk a constant structure that will save
      the state as we go. The algorithm is simply:
      
        pred = root;
        do {
      	switch (move) {
      	case MOVE_DOWN:
      		if (OR or AND) {
      			pred = left;
      			continue;
      		}
      		if (pred == root)
      			break;
      		match = pred->fn();
      		pred = pred->parent;
      		move = left child ? MOVE_UP_FROM_LEFT : MOVE_UP_FROM_RIGHT;
      		continue;
      
      	case MOVE_UP_FROM_LEFT:
      		/* Only OR or AND can be a parent */
      		if (match && OR || !match && AND) {
      			/* short circuit */
      			if (pred == root)
      				break;
      			pred = pred->parent;
      			move = left child ?
      				MOVE_UP_FROM_LEFT :
      				MOVE_UP_FROM_RIGHT;
      			continue;
      		}
      		pred = pred->right;
      		move = MOVE_DOWN;
      		continue;
      
      	case MOVE_UP_FROM_RIGHT:
      		if (pred == root)
      			break;
      		pred = pred->parent;
      		move = left child ? MOVE_UP_FROM_LEFT : MOVE_UP_FROM_RIGHT;
      		continue;
      	}
      	done = 1;
        } while (!done);
      
      This way there's no strict limit to how many preds we allow
      and it also will short circuit the logical operations when possible.
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      61e9dea2
    • S
      tracing/filter: Free pred array on disabling of filter · f76690af
      Steven Rostedt 提交于
      When a filter is disabled, free the preds.
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f76690af
    • S
      tracing/filter: Allocate the preds in an array · 74e9e58c
      Steven Rostedt 提交于
      Currently we allocate an array of pointers to filter_preds, and then
      allocate a separate filter_pred for each item in the array.
      This adds slight overhead in the filters as it needs to derefernce
      twice to get to the op condition.
      
      Allocating the preds themselves in a single array removes a dereference
      as well as helps on the cache footprint.
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      74e9e58c
    • S
      tracing/filter: Call synchronize_sched() just once for system filters · 0fc3ca9a
      Steven Rostedt 提交于
      By separating out the reseting of the filter->n_preds to zero from
      the reallocation of preds for the filter, we can reset groups of
      filters first, call synchronize_sched() just once, and then reallocate
      each of the filters in the system group.
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0fc3ca9a
    • S
      tracing/filter: Dynamically allocate preds · c9c53ca0
      Steven Rostedt 提交于
      For every filter that is made, we create predicates to hold every
      operation within the filter. We have a max of 32 predicates that we
      can hold. Currently, we allocate all 32 even if we only need to
      use one.
      
      Part of the reason we do this is that the filter can be used at
      any moment by any event. Fortunately, the filter is only used
      with preemption disabled. By reseting the count of preds used "n_preds"
      to zero, then performing a synchronize_sched(), we can safely
      free and reallocate a new array of preds.
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      c9c53ca0
    • S
      tracing/filter: Move OR and AND logic out of fn() method · 58d9a597
      Steven Rostedt 提交于
      The ops OR and AND act different from the other ops, as they
      are the only ones to take other ops as their arguements.
      These ops als change the logic of the filter_match_preds.
      
      By removing the OR and AND fn's we can also remove the val1 and val2
      that is passed to all other fn's and are unused.
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      58d9a597
    • S
      tracing/filter: Have no filter return a match · 6d54057d
      Steven Rostedt 提交于
      The n_preds field of a file can change at anytime, and even can become
      zero, just as the filter is about to be processed by an event.
      In the case that is zero on entering the filter, return 1, telling
      the caller the event matchs and should be trace.
      
      Also use a variable and assign it with ACCESS_ONCE() such that the
      count stays consistent within the function.
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      6d54057d
  9. 07 2月, 2011 3 次提交
  10. 03 2月, 2011 1 次提交
    • S
      tracing: Replace syscall_meta_data struct array with pointer array · 3d56e331
      Steven Rostedt 提交于
      Currently the syscall_meta structures for the syscall tracepoints are
      placed in the __syscall_metadata section, and at link time, the linker
      makes one large array of all these syscall metadata structures. On boot
      up, this array is read (much like the initcall sections) and the syscall
      data is processed.
      
      The problem is that there is no guarantee that gcc will place complex
      structures nicely together in an array format. Two structures in the
      same file may be placed awkwardly, because gcc has no clue that they
      are suppose to be in an array.
      
      A hack was used previous to force the alignment to 4, to pack the
      structures together. But this caused alignment issues with other
      architectures (sparc).
      
      Instead of packing the structures into an array, the structures' addresses
      are now put into the __syscall_metadata section. As pointers are always the
      natural alignment, gcc should always pack them tightly together
      (otherwise initcall, extable, etc would also fail).
      
      By having the pointers to the structures in the section, we can still
      iterate the trace_events without causing unnecessary alignment problems
      with other architectures, or depending on the current behaviour of
      gcc that will likely change in the future just to tick us kernel developers
      off a little more.
      
      The __syscall_metadata section is also moved into the .init.data section
      as it is now only needed at boot up.
      Suggested-by: NDavid Miller <davem@davemloft.net>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      3d56e331