1. 17 Jun, 2009 5 commits
    • tracing/filters: fix race between filter setting and module unload · 00e95830
      Li Zefan committed
      Module unload is protected by event_mutex, while setting filter is
      protected by filter_mutex. This leads to the race:
      
      echo 'bar == 0 || bar == 10' \    |
                      > sample/filter   |
                                        |  insmod sample.ko
        add_pred("bar == 0")            |
          -> n_preds == 1               |
        add_pred("bar == 10")           |
          -> n_preds == 2               |
                                        |  rmmod sample.ko
                                        |  insmod sample.ko
        add_pred("||")                  |
          -> n_preds == 1 (should be 3) |
      
      Now event->filter->preds is corrupted. And then, when filter_match_preds()
      is called, the WARN_ON() in it is triggered.
      
      To avoid the race, we remove filter_mutex, and replace it with event_mutex.
      
      [ Impact: prevent corruption of filters by module removing and loading ]
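
      The interleaving above can be replayed deterministically in user space.
      This is only a sketch: struct filter, add_pred(), reload_module() and
      set_filter() are hypothetical stand-ins, not the kernel code. The racy
      flag models the two-mutex case, where the reload may land between two
      add_pred() calls; with one shared mutex the reload can only run after
      the whole filter has been written.

```c
#include <assert.h>
#include <string.h>

#define MAX_PREDS 8

/* Hypothetical stand-in for an event's filter state. */
struct filter {
	int n_preds;
	const char *preds[MAX_PREDS];
};

/* Append one predicate, as each add_pred() step in the timeline does. */
static void add_pred(struct filter *f, const char *pred)
{
	assert(f->n_preds < MAX_PREDS);
	f->preds[f->n_preds++] = pred;
}

/* rmmod followed by insmod: the event comes back with an empty filter. */
static void reload_module(struct filter *f)
{
	memset(f, 0, sizeof(*f));
}

/*
 * Write the filter "bar == 0 || bar == 10" and return the predicate
 * count seen by the writer after its last add_pred().
 * racy != 0: reload happens in the middle (two separate mutexes).
 * racy == 0: reload is serialized after the writer (one mutex).
 */
static int set_filter(struct filter *f, int racy)
{
	int seen;

	memset(f, 0, sizeof(*f));	/* freshly loaded module */
	add_pred(f, "bar == 0");
	add_pred(f, "bar == 10");
	if (racy)
		reload_module(f);
	add_pred(f, "||");
	seen = f->n_preds;
	if (!racy)
		reload_module(f);
	return seen;
}
```

      With racy set, the writer ends up with a count of 1 instead of 3, which
      is the corruption described above.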
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A375A4D.6000205@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing/filters: free filter_string in destroy_preds() · 57be8887
      Li Zefan committed
      filter->filter_string is not freed when unloading a module:
      
       # insmod trace-events-sample.ko
       # echo "bar < 100" > /mnt/tracing/events/sample/foo_bar/filter
       # rmmod trace-events-sample.ko
      
      [ Impact: fix memory leak when unloading module ]
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A375A30.9060802@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: use commit counters for commit pointer accounting · fa743953
      Steven Rostedt committed
      The ring buffer is made up of three sets of pointers.
      
      The head page pointer, which points to the next page for the reader to
      get.
      
      The commit pointer and commit index, which point to the page and index
      of the last committed write respectively.
      
      The tail pointer and tail index, which point to the page and the index
      of the last reserved data respectively (non committed).
      
      The commit pointer is only moved forward by the outer most writer.
      If a nested writer comes in, it will not move the pointer forward.
      
      The current implementation has a flaw. It assumes that the outer most
      writer successfully reserved data. There's a small race window where
      the outer most writer could find the tail pointer, but a nested
      writer could come in (via interrupt) and move the tail forward, and
      even the commit forward.
      
      The outer writer would not realize the commit moved forward and the
      accounting will break.
      
      This patch changes the design to use counters in the per cpu buffers
      to keep track of commits. The counters are incremented at the start
      of the commit, and decremented at the end. If the end commit counter
      is 1, then it moves the commit pointers. A loop is made to check for
      races between checking and moving the commit pointers. Only the outer
      commit should move the pointers anyway.
      
      The test for whether a reserve is equal to the last commit update
      is still needed for time keeping. The time code is much less
      racy than the commit updates.
      
      This change not only solves the mentioned race, but also makes the
      code simpler.
      
      [ Impact: fix commit race and simplify code ]
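
      A single-threaded sketch of the counter scheme, under stated assumptions:
      struct cpu_buffer and the helper names here are hypothetical, offsets
      stand in for the page/index pairs, and the retry loop that guards
      against races between checking and moving the commit pointer is left
      out because nothing can interrupt this model.

```c
#include <assert.h>

/* Hypothetical per-cpu buffer state, reduced to plain offsets. */
struct cpu_buffer {
	unsigned long write;	/* tail: end of reserved data */
	unsigned long commit;	/* end of committed data */
	int committing;		/* writers between start and end of a commit */
};

/* Every writer, nested or not, bumps the counter first. */
static void start_commit(struct cpu_buffer *b)
{
	b->committing++;
}

/* Reserve len bytes at the tail and return where they start. */
static unsigned long reserve(struct cpu_buffer *b, unsigned long len)
{
	unsigned long pos = b->write;

	b->write += len;
	return pos;
}

/* Only the writer that sees a count of 1 (the outermost) moves commit. */
static void end_commit(struct cpu_buffer *b)
{
	assert(b->committing > 0);
	if (b->committing == 1)
		b->commit = b->write;
	b->committing--;
}

/*
 * An outer writer is interrupted by a nested writer before it has
 * reserved anything. Returns the commit offset seen right after the
 * nested writer finished.
 */
static unsigned long nested_example(struct cpu_buffer *b)
{
	unsigned long after_nested;

	start_commit(b);	/* outer writer begins */
	start_commit(b);	/* interrupt: nested writer begins */
	reserve(b, 16);
	end_commit(b);		/* count is 2: commit stays where it was */
	after_nested = b->commit;
	reserve(b, 8);		/* outer writer resumes */
	end_commit(b);		/* count is 1: commit catches up with write */
	return after_nested;
}
```

      The outer writer does not need to know whether a nested writer moved
      the tail; the counter alone decides who publishes.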
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: remove unused variable · 263294f3
      Steven Rostedt committed
      Fix the compiler warning:
      
      kernel/trace/ring_buffer.c: In function 'rb_move_tail':
      kernel/trace/ring_buffer.c:1236: warning: unused variable 'event'
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: have benchmark test handle discarded events · 9086c7b9
      Steven Rostedt committed
      With the addition of commit:
      
        c7b09308
        ring-buffer: prevent adding write in discarded area
      
      The ring buffer may now add discarded events when a write passes
      the end of a buffer page. Before, a discarded event was only added
      when the tracer deliberately created one. The ring buffer benchmark
      test does not handle discarded events when it reads the buffer and
      fails when it encounters one.
      
      Also fix the increment for large data entries (luckily, the test did
      not add any yet).
      
      [ Impact: fix false failure of ring buffer self test ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  2. 15 Jun, 2009 6 commits
  3. 10 Jun, 2009 4 commits
    • tracing: add protection around module events unload · 110bf2b7
      Steven Rostedt committed
      When reading the trace buffer, there is a race: when a module
      is unloaded, it removes events that are still referenced in the buffers.
      This patch adds protection around the unloading of the events
      from modules and the reading of the trace buffers.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: add trace_seq_vprint interface · 725c624a
      Steven Rostedt committed
      The code to update the print formats for events requires a vprintf
      format in the trace_seq. This patch adds that interface.
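
      A user-space sketch of what a va_list flavour of a bounded sequence
      buffer can look like. struct seq, seq_vprintf() and seq_printf() are
      hypothetical names and the all-or-nothing return convention is an
      assumption of this sketch, not taken from the patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define SEQ_SIZE 64

/* Hypothetical bounded text buffer; len never reaches SEQ_SIZE. */
struct seq {
	char buf[SEQ_SIZE];
	size_t len;
};

/*
 * Append formatted output given as a va_list.
 * Returns 1 on success, 0 if the text does not fit (buffer unchanged).
 */
static int seq_vprintf(struct seq *s, const char *fmt, va_list args)
{
	size_t room;
	int n;

	assert(s->len < SEQ_SIZE);
	room = SEQ_SIZE - s->len;	/* includes space for the NUL */
	n = vsnprintf(s->buf + s->len, room, fmt, args);
	if (n < 0 || (size_t)n >= room) {
		s->buf[s->len] = '\0';	/* drop the truncated tail */
		return 0;
	}
	s->len += (size_t)n;
	return 1;
}

/* Variadic wrapper, so callers holding "..." can reuse the va_list form. */
static int seq_printf(struct seq *s, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = seq_vprintf(s, fmt, ap);
	va_end(ap);
	return ret;
}
```

      Code that already receives a va_list, such as a helper that rewrites a
      print format, can call the v-variant directly instead of re-packing
      its arguments.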
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing/events: convert block trace points to TRACE_EVENT() · 55782138
      Li Zefan committed
      TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
      these new capabilities to this tracepoint:
      
        - zero-copy and per-cpu splice() tracing
        - binary tracing without printf overhead
        - structured logging records exposed under /debug/tracing/events
        - trace events embedded in function tracer output and other plugins
        - user-defined, per tracepoint filter expressions
        ...
      
      Cons:
      
        - no dev_t info for the output of plug, unplug_timer and unplug_io events.
          no dev_t info for getrq and sleeprq events if bio == NULL.
          no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.
      
          This is mainly because we can't get the device from a request queue.
          But this may change in the future.
      
        - A packet command is converted to a string in TP_assign, not TP_print,
          whereas blktrace does the conversion just before output.
      
          Since pc requests should be rather rare, this is not a big issue.
      
        - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
          has a unique format, which means we have some unused data in a trace entry.
      
          The overhead is minimized by using __dynamic_array() instead of __array().
      
      I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:
      
            dd                   dd + ioctl blktrace       dd + TRACE_EVENT (splice)
      1     7.36s, 42.7 MB/s     7.50s, 42.0 MB/s          7.41s, 42.5 MB/s
      2     7.43s, 42.3 MB/s     7.48s, 42.1 MB/s          7.43s, 42.4 MB/s
      3     7.38s, 42.6 MB/s     7.45s, 42.2 MB/s          7.41s, 42.5 MB/s
      
      So the overhead of tracing is very small, and no regression when using
      those trace events vs blktrace.
      
      And the binary output of TRACE_EVENT is much smaller than blktrace:
      
       # ls -l -h
       -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
       -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
       -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out
      
      Following are some comparisons between TRACE_EVENT and blktrace:
      
      plug:
        kjournald-480   [000]   303.084981: block_plug: [kjournald]
        kjournald-480   [000]   303.084981:   8,0    P   N [kjournald]
      
      unplug_io:
        kblockd/0-118   [000]   300.052973: block_unplug_io: [kblockd/0] 1
        kblockd/0-118   [000]   300.052974:   8,0    U   N [kblockd/0] 1
      
      remap:
        kjournald-480   [000]   303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384
        kjournald-480   [000]   303.085043:   8,0    A   W 102736992 + 8 <- (8,8) 33384
      
      bio_backmerge:
        kjournald-480   [000]   303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald]
        kjournald-480   [000]   303.085086:   8,0    M   W 102737032 + 8 [kjournald]
      
      getrq:
        kjournald-480   [000]   303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084975:   8,0    G   W 102736984 + 8 [kjournald]
      
        bash-2066  [001]  1072.953770:   8,0    G   N [bash]
        bash-2066  [001]  1072.953773: block_getrq: 0,0 N 0 + 0 [bash]
      
      rq_complete:
        konsole-2065  [001]   300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0]
        konsole-2065  [001]   300.053191:   8,0    C   W 103669040 + 16 [0]
      
        ksoftirqd/1-7   [001]  1072.953811:   8,0    C   N (5a 00 08 00 00 00 00 00 24 00) [0]
        ksoftirqd/1-7   [001]  1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0]
      
      rq_insert:
        kjournald-480   [000]   303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084986:   8,0    I   W 102736984 + 8 [kjournald]
      
      Changelog from v2 -> v3:
      
      - use the newly introduced __dynamic_array().
      
      Changelog from v1 -> v2:
      
      - use __string() instead of __array() to minimize the memory required
        to store hex dump of rq->cmd().
      
      - support large pc requests.
      
      - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.
      
      - some cleanups.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: fix ret in rb_add_time_stamp · f57a8a19
      Steven Rostedt committed
      The update of ret got mistakenly added to the if statement of
      rb_try_to_discard. The variable ret should be 1 on commit and zero
      otherwise.
      
      [ Impact: fix compiler warning and real bug ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  4. 09 Jun, 2009 1 commit
    • ring-buffer: pass in lockdep class key for reader_lock · 1f8a6a10
      Peter Zijlstra committed
      On Sun, 7 Jun 2009, Ingo Molnar wrote:
      > Testing tracer sched_switch: <6>Starting ring buffer hammer
      > PASSED
      > Testing tracer sysprof: PASSED
      > Testing tracer function: PASSED
      > Testing tracer irqsoff:
      > =============================================
      > PASSED
      > Testing tracer preemptoff: PASSED
      > Testing tracer preemptirqsoff: [ INFO: possible recursive locking detected ]
      > PASSED
      > Testing tracer branch: 2.6.30-rc8-tip-01972-ge5b9078-dirty #5760
      > ---------------------------------------------
      > rb_consumer/431 is trying to acquire lock:
      >  (&cpu_buffer->reader_lock){......}, at: [<c109eef7>] ring_buffer_reset_cpu+0x37/0x70
      >
      > but task is already holding lock:
      >  (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
      >
      > other info that might help us debug this:
      > 1 lock held by rb_consumer/431:
      >  #0:  (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
      
      The ring buffer is a generic structure, and can be used outside of
      ftrace. If ftrace traces within the use of the ring buffer, it can produce
      false positives with lockdep.
      
      This patch passes in a static lock key into the allocation of the ring
      buffer, so that different ring buffers will have their own lock class.
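
      The idea of one static key per allocation site can be sketched in
      plain C. Everything here (struct lock_key, struct ring,
      ring_alloc_key(), RING_ALLOC()) is a hypothetical model; only the
      address of the key matters, as a stand-in for a lock class.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lock class key: identity is its address. */
struct lock_key {
	char dummy;
};

/* Hypothetical ring buffer descriptor that remembers its key. */
struct ring {
	size_t size;
	const struct lock_key *key;
};

/* Allocation takes the key explicitly and stores it in the descriptor. */
static struct ring *ring_alloc_key(size_t size, const struct lock_key *key)
{
	struct ring *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->size = size;
	r->key = key;
	return r;
}

/*
 * Each expansion of this macro declares its own static key, so every
 * call site hands a different key (and thus a different class) to the
 * allocator.
 */
#define RING_ALLOC(out, size)					\
	do {							\
		static struct lock_key site_key;		\
		(out) = ring_alloc_key((size), &site_key);	\
	} while (0)
```

      Two buffers allocated from different places then carry different keys,
      so a checker that groups locks by key no longer treats nesting of one
      buffer's lock inside another's as recursion.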
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1244477919.13761.9042.camel@twins>
      
      [ store key in ring buffer descriptor ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  5. 03 Jun, 2009 12 commits
    • tracing: add annotation to what type of stack trace is recorded · 563af16c
      Steven Rostedt committed
      The current method of printing out a stack trace is to add a new line
      and print out the trace:
      
          yum-updatesd-3120  [002]   573.691303:
       => do_softirq
       => irq_exit
       => smp_apic_timer_interrupt
       => apic_timer_interrupt
      
      This looks a bit awkward, and if we have both stack and user stack traces
      running, it would be nice to have a title to tell them apart, although
      it is easy to tell by the output.
      
      This patch adds an annotation to the start of the stack traces:
      
                  init-1     [003]   929.304979: <stack trace>
       => user_path_at
       => vfs_fstatat
       => vfs_stat
       => sys_newstat
       => system_call_fastpath
      
                   cat-3459  [002]  1016.824040: <user stack trace>
       =>  <0000003aae6c0250>
       =>  <00007ffff4b06ae4>
       =>  <69636172742f6775>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: fix multiple use of __print_flags and __print_symbolic · 56d8bd3f
      Steven Whitehouse committed
      Here is an updated patch to include the extra call to
      trace_seq_init() as requested. This is vs. the latest
      -tip tree and fixes the use of multiple __print_flags
      and __print_symbolic in a single tracer. Also tested
      to ensure it's working now:
      
      mount.gfs2-2534  [000]   235.850587: gfs2_glock_queue: 8.7 glock 1:2 dequeue PR
      mount.gfs2-2534  [000]   235.850591: gfs2_demote_rq: 8.7 glock 1:0 demote EX to NL flags:DI
      mount.gfs2-2534  [000]   235.850591: gfs2_glock_queue: 8.7 glock 1:0 dequeue EX
      glock_workqueue-2529  [000]   235.850666: gfs2_glock_state_change: 8.7 glock 1:0 state EX => NL tgt:NL dmt:NL flags:lDpI
      glock_workqueue-2529  [000]   235.850672: gfs2_glock_put: 8.7 glock 1:0 state NL => IV flags:I
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      LKML-Reference: <1244037123.29604.603.camel@localhost.localdomain>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing/events: fix output format of user stack · 048dc50c
      walimis committed
      According to "events/ftrace/user_stack/format", fix the output of
      user stack.
      
      before fix:
      
        sh-1073  [000]    31.137561:  <b7f274fe> <-  <0804e33c> <-  <080835c1>
      
      after fix:
      
        sh-1072  [000]    37.039329:
       =>  <b7f8a4fe>
       =>  <0804e33c>
       =>  <080835c1>
      Signed-off-by: walimis <walimisdev@gmail.com>
      LKML-Reference: <1244016090-7814-3-git-send-email-walimisdev@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing/events: fix output format of kernel stack · f11b3f4e
      walimis committed
      According to "events/ftrace/kernel_stack/format", output format of
      kernel stack should use "=>" instead of "<=".
      
      The second problem is that we shouldn't skip the first entry in the stack,
      although it seems to be duplicated when used in the "function" tracer,
      but events also use it. If we skip the first one, we will drop the topmost
      entry of the stack.
      
      The last problem is that if the last entry is ULONG_MAX(0xffffffff), we should
      drop it, otherwise it will print a NULL name line.
      
      before fix:
      
            sh-1072  [000]   26.957239: sched_process_fork: parent sh:1072 child sh:1073
            sh-1072  [000]   26.957262:
       <= syscall_call
       <=
            sh-1072  [000]   26.957744: sched_switch: task sh:1072 [120] (R) ==> sh:1073 [120]
            sh-1072  [000]   26.957752:
       <= preempt_schedule
       <= wake_up_new_task
       <= do_fork
       <= sys_clone
       <= syscall_call
       <=
      
      After fix:
      
            sh-1075  [000]    39.791848: sched_process_fork: parent sh:1075  child sh:1076
            sh-1075  [000]    39.791871:
       => sys_clone
       => syscall_call
            sh-1075  [000]    39.792713: sched_switch: task sh:1075 [120] (R) ==> sh:1076 [120]
            sh-1075  [000]    39.792722:
       => schedule
       => preempt_schedule
       => wake_up_new_task
       => do_fork
       => sys_clone
       => syscall_call
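
      The three rules above (use "=>", start at the first entry, stop at a
      trailing ULONG_MAX) can be sketched as a small formatter. This is a
      user-space illustration: format_stack() is a hypothetical helper and
      it prints raw addresses, since symbol lookup is not available here.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/*
 * Write one " => <address>" line per saved entry into out, beginning
 * with entries[0] and stopping at the ULONG_MAX terminator.
 * Returns the number of entries written.
 */
static int format_stack(const unsigned long *entries, int nr,
			char *out, size_t size)
{
	size_t used = 0;
	int i;

	if (size == 0)
		return 0;
	out[0] = '\0';
	for (i = 0; i < nr; i++) {
		int n;

		if (entries[i] == ULONG_MAX)
			break;		/* terminator, not a real entry */
		n = snprintf(out + used, size - used, " => <%08lx>\n",
			     entries[i]);
		if (n < 0 || (size_t)n >= size - used) {
			out[used] = '\0';	/* drop the partial line */
			break;
		}
		used += (size_t)n;
	}
	return i;
}
```

      For a saved stack of { 0x1000, 0x2000, ULONG_MAX } this emits two
      lines and no empty trailing line.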
      Signed-off-by: walimis <walimisdev@gmail.com>
      LKML-Reference: <1244016090-7814-2-git-send-email-walimisdev@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing/trace_stack: fix the number of entries in the header · 083a63b4
      walimis committed
      The last entry in the stack_dump_trace is ULONG_MAX, which is not
      a valid entry, but max_stack_trace.nr_entries has accounted for it.
      So when printing the header, we should decrease it by one.
      Before the fix, the header is printed as follows, for example:
      
      	Depth    Size   Location    (53 entries)	<--- should be 52
      	-----    ----   --------
        0)     3264     108   update_wall_time+0x4d5/0x9a0
        ...
       51)       80      80   syscall_call+0x7/0xb
       ^^^
         it's correct.
      Signed-off-by: walimis <walimisdev@gmail.com>
      LKML-Reference: <1244016090-7814-1-git-send-email-walimisdev@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: discard timestamps that are at the start of the buffer · ea05b57c
      Steven Rostedt committed
      Every buffer page in the ring buffer includes its own time stamp.
      When an event is recorded to the ring buffer with a delta time greater
      than what can be held in the event header, a time stamp event is created.
      
      If the created time stamp falls over to the next buffer page, it is
      redundant because the buffer page holds a full time stamp. This patch
      will try to discard the time stamp when it falls to the start of the
      next page.
      
      This change also fixes an issue with discarding events. If most events are
      discarded, timestamps will start to creep into the ring buffer. If we
      do not discard the timestamps then they can fill up the ring buffer over
      time and waste space.
      
      This change will keep time stamps from filling up over another page. If
      something is recorded in the buffer page, and the rest is filtered, then
      the time stamps can only fill up to the end of the page.
      
      [ Impact: prevent time stamps from filling ring buffer ]
      Reported-by: Tim Bird <tim.bird@am.sony.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: try to discard unneeded timestamps · edd813bf
      Steven Rostedt committed
      A race can sometimes cause a timestamp to be added in a nested
      write. Such a timestamp contains just a zero delta and serves
      no purpose.
      
      Now that we have a way to discard events, this patch will try to discard
      the timestamp instead of just wasting the space in the ring buffer.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: fix bug in ring_buffer_discard_commit · a2023556
      Tim Bird committed
      There's a bug in ring_buffer_discard_commit.  The wrong
      pointer is being compared in order to check if the event
      can be freed from the buffer rather than discarded
      (i.e. marked as PAD).
      
      I noticed this when I was working on duration filtering.
      The bug is not deadly - it just results in lots of wasted
      space in the buffer.  All filtered events are left in
      the buffer and marked as discarded, rather than being
      removed from the buffer to make space for other events.
      
      Unfortunately, when I fixed this bug, I got errors doing a
      filtered function trace.  Multiple TIME_EXTEND
      events pile up in the buffer, and trigger the
      following loop overage warning in rb_iter_peek():
      
      again:
      	...
      	if (RB_WARN_ON(cpu_buffer, ++nr_loops > 10))
      		return NULL;
      
      I'm not sure what the best way is to fix this. I don't
      know if I should extend the loop threshold, or if I should
      make the test more complex (ignore TIME_EXTEND
      events), or just get rid of this loop check completely.
      
      Note that if I implement a workaround for this, then I
      see another problem from rb_advance_iter().  I haven't
      tracked that one down yet.
      
      In general, it seems like the case of removing filtered
      events has not been working properly, and so some assumptions
      about buffer invariant conditions need to be revisited.
      
      Here's the patch for the simple fix:
      
      Compare correct pointer for checking if an event can be
      freed rather than left as discarded in the buffer.
      Signed-off-by: Tim Bird <tim.bird@am.sony.com>
      LKML-Reference: <4A25BE9E.5090909@am.sony.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • function-graph: always initialize task ret_stack · 84047e36
      Steven Rostedt committed
      On creating a new task while running the function graph tracer, if
      we fail to allocate the ret_stack, and then fail the fork, the
      code will free the parent ret_stack. This is because the child
      duplicated the parent and currently points to the parent's ret_stack.
      
      This patch always initializes the task's ret_stack to NULL.
      
      [ Impact: prevent crash of parent on low memory during fork ]
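
      A user-space sketch of the failure path, assuming a much reduced
      struct task with only a ret_stack pointer; fork_task() and free_child()
      are hypothetical helpers, and alloc_fails simulates running out of
      memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced task: only the field discussed here. */
struct task {
	unsigned long *ret_stack;
};

/* Error path of fork: release whatever the child owns. */
static void free_child(struct task *child)
{
	free(child->ret_stack);
	child->ret_stack = NULL;
}

/* Returns 0 on success, -1 if the ret_stack allocation failed. */
static int fork_task(const struct task *parent, struct task *child,
		     int alloc_fails)
{
	*child = *parent;		/* copy includes the parent's pointer */
	child->ret_stack = NULL;	/* the fix: never keep the inherited one */

	if (!alloc_fails)
		child->ret_stack = calloc(50, sizeof(unsigned long));
	if (!child->ret_stack) {
		free_child(child);	/* frees nothing, parent is untouched */
		return -1;
	}
	return 0;
}
```

      Without the unconditional reset, the error path would hand the
      parent's stack to free().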
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • function-graph: add memory barriers for accessing task's ret_stack · 26c01624
      Steven Rostedt committed
      The code that handles the task's ret_stack allocation for every task
      assumes that only an interrupt can cause issues (even though interrupts
      are disabled).
      
      In reality, the code is allocating the ret_stack for tasks that may be
      running on other CPUs and there are not sufficient memory barriers to
      handle this case.
      
      [ Impact: prevent crash due to use of uninitialized ret_stack variables ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • function-graph: enable the stack after initialization of other variables · 82310a32
      Steven Rostedt committed
      The function graph tracer checks if the task_struct has ret_stack defined
      to know if it is OK or not to use it. The initialization is done for
      all tasks by one process, but the idle tasks use the same initialization
      used by new tasks.
      
      If an interrupt happens on an idle task that just had the ret_stack
      created, but before the rest of the initialization took place, then
      we can corrupt the return address of the functions.
      
      This patch moves the setting of the task_struct's ret_stack to after
      the other variables have been initialized.
      
      [ Impact: prevent kernel panic on idle task when starting function graph ]
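
      The "initialize everything, then publish the pointer" ordering can be
      sketched with C11 atomics. Note the swap: this uses a release store
      and an acquire load in place of kernel barriers, and struct task,
      init_task_ret_stack() and push_return() are hypothetical reductions
      written for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define STACK_DEPTH 50

/* Hypothetical reduced task; a non-NULL ret_stack means "ready to use". */
struct task {
	int curr_ret_stack;			/* top index, -1 when empty */
	int trace_overrun;			/* pushes dropped when full */
	_Atomic(unsigned long *) ret_stack;
};

/* Set up the bookkeeping first and make the stack visible last. */
static void init_task_ret_stack(struct task *t, unsigned long *stack)
{
	t->curr_ret_stack = -1;
	t->trace_overrun = 0;
	/* Release ordering keeps the two stores above before this one. */
	atomic_store_explicit(&t->ret_stack, stack, memory_order_release);
}

/* Tracer side: returns 1 if ret was pushed, 0 if not ready or full. */
static int push_return(struct task *t, unsigned long ret)
{
	unsigned long *stack =
		atomic_load_explicit(&t->ret_stack, memory_order_acquire);

	if (!stack)
		return 0;	/* setup not finished, leave the task alone */
	if (t->curr_ret_stack >= STACK_DEPTH - 1) {
		t->trace_overrun++;
		return 0;
	}
	stack[++t->curr_ret_stack] = ret;
	return 1;
}
```

      A tracer that observes a non-NULL ret_stack is then guaranteed to see
      the initialized index as well, which is the property the reordering
      is after.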
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • function-graph: only allocate init tasks if it was not already done · 179c498a
      Steven Rostedt committed
      When the function graph tracer is enabled, it calls the initialization
      needed for the init tasks that would be called on all created tasks.
      
      The problem is that this is called every time the function graph tracer
      is enabled, and the ret_stack is allocated for the idle tasks each time.
      Thus, the old ret_stack is lost and a memory leak is created.
      
      This is also dangerous because if an interrupt happened on another CPU
      with the init task and the ret_stack is replaced, we then lose all the
      return pointers for the interrupt, and a crash would take place.
      
      [ Impact: fix memory leak and possible crash due to race ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  6. 02 Jun, 2009 12 commits
    • ftrace: do not profile functions when disabled · 0f6ce3de
      Steven Rostedt committed
      A race was found that if one were to enable and disable the function
      profiler repeatedly, then the system can panic. This was because a profiled
      function may be preempted just before disabling interrupts. While
      the profiler is disabled and then reenabled, the preempted function
      could start again, and access the hash as it is being initialized.
      
      This just adds a check in the irq disabled part to check if the profiler
      is enabled, and if it is not then it will just exit.
      
      When the profiler is disabled, the profile_enabled variable is cleared
      before calling the unregistering of the function profiler. This
      unregistering calls stop machine which also acts as a synchronize schedule.
      
      [ Impact: fix panic in enabling/disabling function profiler ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: make trace pipe recognize latency format flag · 112f38a7
      Steven Rostedt committed
      The trace_pipe did not recognize the latency format flag and would produce
      different output than the trace file. This was partly because the trace
      flags in the iterator were not set, and partly because trace_pipe zeros
      out part of the iterator (including the flags) to be able to use the same
      routines as the trace file. Leaving trace_flags of the iterator un-zeroed
      for trace_pipe should not cause any problems.
      Reported-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: add exports to use __print_symbolic and __print_flags from a module · ec081ddc
      Steven Whitehouse committed
      A patch to allow the use of __print_symbolic and __print_flags
      from a module. This allows the current GFS2 tracing patch to
      build.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      LKML-Reference: <1243868015.29604.542.camel@localhost.localdomain>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing/events: introduce __dynamic_array() · 7fcb7c47
      Li Zefan committed
      __string() is limited:
      
        - it's a char array, but we may want to define array with other types
        - a source string should be available, but we may just know the string size
      
      We introduce __dynamic_array() to break those limitations, and __string()
      becomes a wrapper of it. As a side effect, __get_str() can now be used
      in TP_fast_assign and not only in TP_printk.
      
      Take XFS for example, we have the string length in the dirent, but the
      string itself is not NULL-terminated, so __dynamic_array() can be used:
      
      TRACE_EVENT(xfs_dir2,
      	TP_PROTO(struct xfs_da_args *args),
      	TP_ARGS(args),
      
      	TP_STRUCT__entry(
      		__field(int, namelen)
      		__dynamic_array(char, name, args->namelen + 1)
      		...
      	),
      
      	TP_fast_assign(
      		char *name = __get_str(name);
      
      		if (args->namelen)
      			memcpy(name, args->name, args->namelen);
      		name[args->namelen] = '\0';
      
      		__entry->namelen = args->namelen;
      	),
      
      	TP_printk("name %.*s namelen %d",
      		  __entry->namelen,
      		  __entry->namelen ? __get_str(name) : NULL,
      		  __entry->namelen)
      );
      
      [ Impact: allow defining dynamic size arrays ]
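
      Outside the kernel, the same "fixed fields plus per-record sized
      array" layout can be sketched with a C99 flexible array member. This
      is a simplification for illustration only: struct entry and
      record_name() are hypothetical, and the sketch does not model how the
      trace infrastructure locates the dynamic data inside a record.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: fixed fields followed by variable-length data. */
struct entry {
	int namelen;
	char name[];	/* sized per record: namelen + 1 bytes */
};

/*
 * Copy a name that is NOT NUL-terminated in the source (only its
 * length is known) into a freshly sized record and terminate it.
 * Returns NULL on a negative length or allocation failure.
 */
static struct entry *record_name(const char *name, int namelen)
{
	struct entry *e;

	if (namelen < 0)
		return NULL;
	e = malloc(sizeof(*e) + (size_t)namelen + 1);
	if (!e)
		return NULL;
	if (namelen)
		memcpy(e->name, name, (size_t)namelen);
	e->name[namelen] = '\0';
	e->namelen = namelen;
	return e;
}
```

      Only the bytes actually needed are stored, which is the point of
      sizing the array at record time rather than at compile time.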
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A2384D2.3080403@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: combine the default tracers into one config · 897f17a6
      Steven Rostedt committed
      Both event tracer and sched switch plugin are selected by default
      by all generic tracers. But if no generic tracer is enabled, their options
      appear. But either one of them will select the other, thus it only
      makes sense to have the default tracers be selected by one option.
      
      [ Impact: clean up kconfig menu ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: fix config options to not show when automatically selected · 5e0a0939
      Steven Rostedt committed
      There are two options that are selected by all tracers, but we want
      to have those options available when no tracer is selected. These are
      
       The event tracer and sched switch tracer.
      
      They are enabled by all tracers, but if a tracer is not selected we want
      the options to appear. All tracers including them select TRACING.
      Thus what we would like to do is:
      
        config EVENT_TRACER
      	bool "prompt"
      	depends on TRACING
      	select TRACING
      
      But that gives us a bug in the kbuild system since we just created a
      circular dependency. We only want the prompt to show when TRACING is off.
      
      This patch adds GENERIC_TRACER that all tracers will select instead of
      TRACING. The two options (sched switch and event tracer) will select
      TRACING directly and depend on !GENERIC_TRACER. This solves the circular
      dependency.
      
      [ Impact: hide options that are selected by default ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: add kernel command line function filtering · 2af15d6a
      Steven Rostedt committed
      When using ftrace=function on the command line to trace functions
      on boot up, one can not filter out functions that are commonly called.
      
      This patch adds two new ftrace command line commands.
      
        ftrace_notrace=function-list
        ftrace_filter=function-list
      
      Where function-list is a comma separated list of functions to filter.
      The ftrace_notrace will make the functions listed not be included
      in the function tracing, and ftrace_filter will only trace the functions
      listed.
      
      These two act the same as the debugfs/tracing/set_ftrace_notrace and
      debugfs/tracing/set_ftrace_filter respectively.
      
      The simple glob expressions that are allowed by the filter files can also
      be used by the command line interface.
      
      	ftrace_notrace=rcu*,*lock,*spin*
      
      Will not trace any function that starts with rcu, ends with lock, or has
      the word spin in it.
      
      Note, if the self tests are enabled, they may interfere with the filtering
      set by the command lines.
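
      How a comma separated list of simple globs such as
      rcu*,*lock,*spin* selects functions can be sketched as follows.
      glob_match() and list_match() are hypothetical helpers written for
      this illustration; they support only a leading and/or trailing '*',
      which is an assumption about what "simple glob" covers.

```c
#include <assert.h>
#include <string.h>

/* Match func against one pattern: "name", "name*", "*name" or "*name*". */
static int glob_match(const char *pat, size_t plen, const char *func)
{
	size_t flen = strlen(func);
	int front, back;
	size_t i;

	front = plen > 0 && pat[0] == '*';
	if (front) {
		pat++;
		plen--;
	}
	back = plen > 0 && pat[plen - 1] == '*';
	if (back)
		plen--;

	if (plen > flen)
		return 0;
	if (front && back) {		/* "*name*": anywhere in func */
		for (i = 0; i + plen <= flen; i++)
			if (memcmp(func + i, pat, plen) == 0)
				return 1;
		return 0;
	}
	if (back)			/* "name*": func starts with name */
		return memcmp(func, pat, plen) == 0;
	if (front)			/* "*name": func ends with name */
		return memcmp(func + flen - plen, pat, plen) == 0;
	return flen == plen && memcmp(func, pat, plen) == 0;
}

/* Return 1 if func matches any entry of a comma separated list. */
static int list_match(const char *list, const char *func)
{
	while (*list) {
		size_t len = strcspn(list, ",");

		if (len && glob_match(list, len, func))
			return 1;
		list += len;
		if (*list == ',')
			list++;
	}
	return 0;
}
```

      With the list from the example, mutex_lock and rcu_read_lock_held
      would be excluded from tracing while schedule would still be traced.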
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing/stat: remove unappropriate safe walk on list · 43bd1236
      Frederic Weisbecker committed
      register_stat_tracer() uses list_for_each_entry_safe
      to check whether a tracer is already present in the list.
      But we don't delete anything from the list here, so
      we don't need the safe version
      
      [ Impact: cleanup list use in stat tracing ]
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • tracing/stat: do some cleanups · dbd3fbdf
      Li Zefan committed
      - remove duplicate code in stat_seq_init()
      - update comments to reflect the change from stat list to stat rbtree
      
      [ Impact: clean up ]
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • tracing/stat: remember to free root node · e1622806
      Li Zefan committed
      When closing a trace_stat file, we destroy the rbtree constructed during
      file open, but the root node is not freed, which leaks memory.
      
      [ Impact: fix memory leak when closing a trace_stat file ]
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • tracing/stat: change dummpy_cmp() to return -1 · b3dd7ba7
      Li Zefan committed
      Currently the output of trace_stat/workqueues is totally reversed:
      
       # cat /debug/tracing/trace_stat/workqueues
          ...
          1       17       17      210       37   `-blk_unplug_work+0x0/0x57
          1     3779     3779      181       11   |-cfq_kick_queue+0x0/0x2f
          1     3796     3796                     kblockd/1:120
          ...
      
      The correct output should be:
      
          1     3796     3796                     kblockd/1:120
          1     3779     3779      181       11   |-cfq_kick_queue+0x0/0x2f
          1       17       17      210       37   `-blk_unplug_work+0x0/0x57
      
      It's caused by "tracing/stat: replace linked list by an rbtree for
      sorting"
      (53059c9b67a62a3dc8c80204d3da42b9267ea5a0).
      
      dummpy_cmp() should return -1, so rb_node will always be inserted as
      right-most node in the rbtree, thus we sort the output in ascending
      order.
      
      [ Impact: fix the output of trace_stat/workqueues ]
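
      Why a comparison that always returns -1 preserves insertion order can
      be shown with a plain, unbalanced binary tree. This is a sketch:
      struct node, insert() and walk() are hypothetical, and the convention
      that a result >= 0 descends left while a negative result descends
      right is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node holding one stat value. */
struct node {
	int val;
	struct node *left, *right;
};

typedef int (*cmp_fn)(const struct node *a, const struct node *b);

/* No ordering supplied: claim the new node sorts after everything. */
static int dummy_cmp(const struct node *a, const struct node *b)
{
	(void)a;
	(void)b;
	return -1;
}

/* cmp(new, cur) >= 0 goes left, anything else goes right. */
static void insert(struct node **root, struct node *n, cmp_fn cmp)
{
	struct node **link = root;

	n->left = n->right = NULL;
	while (*link)
		link = cmp(n, *link) >= 0 ? &(*link)->left : &(*link)->right;
	*link = n;
}

/* In-order walk (left, node, right); returns the next free slot in out. */
static size_t walk(const struct node *n, int *out, size_t pos)
{
	if (!n)
		return pos;
	pos = walk(n->left, out, pos);
	out[pos++] = n->val;
	return walk(n->right, out, pos);
}
```

      Because every new node lands at the right-most position, the in-order
      walk returns entries in the order they were inserted; a comparison
      that always sent nodes left would reverse them.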
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • tracing/stat: replace linked list by an rbtree for sorting · 8f184f27
      Frederic Weisbecker committed
      When the stat tracing framework prepares the entries from a tracer
      to output them to the user, it starts by computing a linear sort
      through a linked list to give the entries ordered by relevance
      to the user.
      
      This is quite ugly and causes a small latency when we begin to
      read the file.
      
      This patch changes that by turning the linked list into a red-black
      tree. Although the whole iteration using the start and next tracer
      callbacks while opening the file remains the same, it is now much
      faster and more scalable.
      
      The rbtree guarantees O(log(n)) insertions whereas a linked
      list with linear sorting brought us an O(n) despair. Now the
      (visible) latency has disappeared.
      
      [ Impact: kill the latency while starting to read a stat tracer file ]
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>