1. 01 4月, 2010 3 次提交
    • S
      ring-buffer: Add lost event count to end of sub buffer · ff0ff84a
      Steven Rostedt 提交于
      Currently, binary readers of the ring buffer only know where events were
      lost, but not how many events were lost at that location.
      This information is available, but it would require adding another
      field to the sub buffer header to include it.
      
      But when a event can not fit at the end of a sub buffer, it is written
      to the next sub buffer. This means there is a good chance that the
      buffer may have room to hold this counter. If it does, write
      the counter at the end of the sub buffer and set another flag
      in the data size field that states that this information exists.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ff0ff84a
    • S
      tracing: Show the lost events in the trace_pipe output · bc21b478
      Steven Rostedt 提交于
      Now that the ring buffer can keep track of where events are lost.
      Use this information to the output of trace_pipe:
      
             hackbench-3588  [001]  1326.701660: lock_acquire: ffffffff816591e0 read rcu_read_lock
             hackbench-3588  [001]  1326.701661: lock_acquire: ffff88003f4091f0 &(&dentry->d_lock)->rlock
             hackbench-3588  [001]  1326.701664: lock_release: ffff88003f4091f0 &(&dentry->d_lock)->rlock
      CPU:1 [LOST 673 EVENTS]
             hackbench-3588  [001]  1326.702711: kmem_cache_free: call_site=ffffffff81102b85 ptr=ffff880026d96738
             hackbench-3588  [001]  1326.702712: lock_release: ffff88003e1480a8 &mm->mmap_sem
             hackbench-3588  [001]  1326.702713: lock_acquire: ffff88003e1480a8 &mm->mmap_sem
      
      Even works with the function graph tracer:
      
       2) ! 170.098 us  |                                            }
       2)   4.036 us    |                                            rcu_irq_exit();
       2)   3.657 us    |                                            idle_cpu();
       2) ! 190.301 us  |                                          }
      CPU:2 [LOST 2196 EVENTS]
       2)   0.853 us    |                            } /* cancel_dirty_page */
       2)               |                            remove_from_page_cache() {
       2)   1.578 us    |                              _raw_spin_lock_irq();
       2)               |                              __remove_from_page_cache() {
      
      Note, it does not work with the iterator "trace" file, since it requires
      the use of consuming the page from the ring buffer to determine how many
      events were lost, which the iterator does not do.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      bc21b478
    • S
      ring-buffer: Add place holder recording of dropped events · 66a8cb95
      Steven Rostedt 提交于
      Currently, when the ring buffer drops events, it does not record
      the fact that it did so. It does inform the writer that the event
      was dropped by returning a NULL event, but it does not put in any
      place holder where the event was dropped.
      
      This is not a trivial thing to add because the ring buffer mostly
      runs in overwrite (flight recorder) mode. That is, when the ring
      buffer is full, new data will overwrite old data.
      
      In a produce/consumer mode, where new data is simply dropped when
      the ring buffer is full, it is trivial to add the placeholder
      for dropped events. When there's more room to write new data, then
      a special event can be added to notify the reader about the dropped
      events.
      
      But in overwrite mode, any new write can overwrite events. A place
      holder can not be inserted into the ring buffer since there never
      may be room. A reader could also come in at anytime and miss the
      placeholder.
      
      Luckily, the way the ring buffer works, the read side can find out
      if events were lost or not, and how many events. Everytime a write
      takes place, if it overwrites the header page (the next read) it
      updates a "overrun" variable that keeps track of the number of
      lost events. When a reader swaps out a page from the ring buffer,
      it can record this number, perfom the swap, and then check to
      see if the number changed, and take the diff if it has, which would be
      the number of events dropped. This can be stored by the reader
      and returned to callers of the reader.
      
      Since the reader page swap will fail if the writer moved the head
      page since the time the reader page set up the swap, this gives room
      to record the overruns without worrying about races. If the reader
      sets up the pages, records the overrun, than performs the swap,
      if the swap succeeds, then the overrun variable has not been
      updated since the setup before the swap.
      
      For binary readers of the ring buffer, a flag is set in the header
      of each sub page (sub buffer) of the ring buffer. This flag is embedded
      in the size field of the data on the sub buffer, in the 31st bit (the size
      can be 32 or 64 bits depending on the architecture), but only 27
      bits needs to be used for the actual size (less actually).
      
      We could add a new field in the sub buffer header to also record the
      number of events dropped since the last read, but this will change the
      format of the binary ring buffer a bit too much. Perhaps this change can
      be made if the information on the number of events dropped is considered
      important enough.
      
      Note, the notification of dropped events is only used by consuming reads
      or peeking at the ring buffer. Iterating over the ring buffer does not
      keep this information because the necessary data is only available when
      a page swap is made, and the iterator does not swap out pages.
      
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      66a8cb95
  2. 19 3月, 2010 1 次提交
    • S
      ring-buffer: Do 8 byte alignment for 64 bit that can not handle 4 byte align · 2271048d
      Steven Rostedt 提交于
      The ring buffer uses 4 byte alignment while recording events into the
      buffer, even on 64bit machines. This saves space when there are lots
      of events being recorded at 4 byte boundaries.
      
      The ring buffer has a zero copy method to write into the buffer, with
      the reserving of space and then committing it. This may cause problems
      when writing an 8 byte word into a 4 byte alignment (not 8). For x86 and
      PPC this is not an issue, but on some architectures this would cause an
      out-of-alignment exception.
      
      This patch uses CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to determine
      if it is OK to use 4 byte alignments on 64 bit machines. If it is not,
      it forces the ring buffer event header to be 8 bytes and not 4,
      and will align the length of the data to be 8 byte aligned.
      This keeps the data payload at 8 byte alignments and will allow these
      machines to run without issue.
      
      The trick to this is that the header can be either 4 bytes or 8 bytes
      depending on the length of the data payload. The 4 byte header
      has a length field that supports up to 112 bytes. If the length of
      the data is more than 112, the length field is set to zero, and the actual
      length is stored in the next 4 bytes after the header.
      
      When CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set, the code forces
      zero in the 4 byte header forcing the length to be stored in the 4 byte
      array, even with a small data load. It also forces the length of the
      data load to be 8 byte aligned. The combination of these two guarantee
      that the data is always at 8 byte alignment.
      Tested-by: NFrederic Weisbecker <fweisbec@gmail.com>
                 (on sparc64)
      Reported-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2271048d
  3. 17 3月, 2010 1 次提交
    • F
      perf: Fix unexported generic perf_arch_fetch_caller_regs · dcd5c166
      Frederic Weisbecker 提交于
      perf_arch_fetch_caller_regs() is exported for the overriden x86
      version, but not for the generic weak version.
      
      As a general rule, weak functions should not have their symbol
      exported in the same file they are defined.
      
      So let's export it on trace_event_perf.c as it is used by trace
      events only.
      
      This fixes:
      
      	ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
      	ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!
      
      -v2: And also only build it if trace events are enabled.
      -v3: Fix changelog mistake
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dcd5c166
  4. 13 3月, 2010 5 次提交
    • S
      tracing: Do not record user stack trace from NMI context · b6345879
      Steven Rostedt 提交于
      A bug was found with Li Zefan's ftrace_stress_test that caused applications
      to segfault during the test.
      
      Placing a tracing_off() in the segfault code, and examining several
      traces, I found that the following was always the case. The lock tracer
      was enabled (lockdep being required) and userstack was enabled. Testing
      this out, I just enabled the two, but that was not good enough. I needed
      to run something else that could trigger it. Running a load like hackbench
      did not work, but executing a new program would. The following would
      trigger the segfault within seconds:
      
        # echo 1 > /debug/tracing/options/userstacktrace
        # echo 1 > /debug/tracing/events/lock/enable
        # while :; do ls > /dev/null ; done
      
      Enabling the function graph tracer and looking at what was happening
      I finally noticed that all cashes happened just after an NMI.
      
       1)               |    copy_user_handle_tail() {
       1)               |      bad_area_nosemaphore() {
       1)               |        __bad_area_nosemaphore() {
       1)               |          no_context() {
       1)               |            fixup_exception() {
       1)   0.319 us    |              search_exception_tables();
       1)   0.873 us    |            }
      [...]
       1)   0.314 us    |  __rcu_read_unlock();
       1)   0.325 us    |    native_apic_mem_write();
       1)   0.943 us    |  }
       1)   0.304 us    |  rcu_nmi_exit();
      [...]
       1)   0.479 us    |  find_vma();
       1)               |  bad_area() {
       1)               |    __bad_area() {
      
      After capturing several traces of failures, all of them happened
      after an NMI. Curious about this, I added a trace_printk() to the NMI
      handler to read the regs->ip to see where the NMI happened. In which I
      found out it was here:
      
      ffffffff8135b660 <page_fault>:
      ffffffff8135b660:       48 83 ec 78             sub    $0x78,%rsp
      ffffffff8135b664:       e8 97 01 00 00          callq  ffffffff8135b800 <error_entry>
      
      What was happening is that the NMI would happen at the place that a page
      fault occurred. It would call rcu_read_lock() which was traced by
      the lock events, and the user_stack_trace would run. This would trigger
      a page fault inside the NMI. I do not see where the CR2 register is
      saved or restored in NMI handling. This means that it would corrupt
      the page fault handling that the NMI interrupted.
      
      The reason the while loop of ls helped trigger the bug, was that
      each execution of ls would cause lots of pages to be faulted in, and
      increase the chances of the race happening.
      
      The simple solution is to not allow user stack traces in NMI context.
      After this patch, I ran the above "ls" test for a couple of hours
      without any issues. Without this patch, the bug would trigger in less
      than a minute.
      
      Cc: stable@kernel.org
      Reported-by: NLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      b6345879
    • S
      tracing: Disable buffer switching when starting or stopping trace · a2f80714
      Steven Rostedt 提交于
      When the trace iterator is read, tracing_start() and tracing_stop()
      is called to stop tracing while the iterator is processing the trace
      output.
      
      These functions disable both the standard buffer and the max latency
      buffer. But if the wakeup tracer is running, it can switch these
      buffers between the two disables:
      
        buffer = global_trace.buffer;
        if (buffer)
            ring_buffer_record_disable(buffer);
      
            <<<--------- swap happens here
      
        buffer = max_tr.buffer;
        if (buffer)
            ring_buffer_record_disable(buffer);
      
      What happens is that we disabled the same buffer twice. On tracing_start()
      we can enable the same buffer twice. All ring_buffer_record_disable()
      must be matched with a ring_buffer_record_enable() or the buffer
      can be disable permanently, or enable prematurely, and cause a bug
      where a reset happens while a trace is commiting.
      
      This patch protects these two by taking the ftrace_max_lock to prevent
      a switch from occurring.
      
      Found with Li Zefan's ftrace_stress_test.
      
      Cc: stable@kernel.org
      Reported-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a2f80714
    • S
      tracing: Use same local variable when resetting the ring buffer · 283740c6
      Steven Rostedt 提交于
      In the ftrace code that resets the ring buffer it references the
      buffer with a local variable, but then uses the tr->buffer as the
      parameter to reset. If the wakeup tracer is running, which can
      switch the tr->buffer with the max saved buffer, this can break
      the requirement of disabling the buffer before the reset.
      
         buffer = tr->buffer;
         ring_buffer_record_disable(buffer);
         synchronize_sched();
         __tracing_reset(tr->buffer, cpu);
      
      If the tr->buffer is swapped, then the reset is not happening to the
      buffer that was disabled. This will cause the ring buffer to fail.
      
      Found with Li Zefan's ftrace_stress_test.
      
      Cc: stable@kernel.org
      Reported-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      283740c6
    • S
      function-graph: Init curr_ret_stack with ret_stack · ea14eb71
      Steven Rostedt 提交于
      If the graph tracer is active, and a task is forked but the allocating of
      the processes graph stack fails, it can cause crash later on.
      
      This is due to the temporary stack being NULL, but the curr_ret_stack
      variable is copied from the parent. If it is not -1, then in
      ftrace_graph_probe_sched_switch() the following:
      
      	for (index = next->curr_ret_stack; index >= 0; index--)
      		next->ret_stack[index].calltime += timestamp;
      
      Will cause a kernel OOPS.
      
      Found with Li Zefan's ftrace_stress_test.
      
      Cc: stable@kernel.org
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ea14eb71
    • L
      ring-buffer: Move disabled check into preempt disable section · 52fbe9cd
      Lai Jiangshan 提交于
      The ring buffer resizing and resetting relies on a schedule RCU
      action. The buffers are disabled, a synchronize_sched() is called
      and then the resize or reset takes place.
      
      But this only works if the disabling of the buffers are within the
      preempt disabled section, otherwise a window exists that the buffers
      can be written to while a reset or resize takes place.
      
      Cc: stable@kernel.org
      Reported-by: NLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4B949E43.2010906@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      52fbe9cd
  5. 11 3月, 2010 2 次提交
  6. 10 3月, 2010 2 次提交
    • F
      perf: Drop the obsolete profile naming for trace events · 97d5a220
      Frederic Weisbecker 提交于
      Drop the obsolete "profile" naming used by perf for trace events.
      Perf can now do more than simple events counting, so generalize
      the API naming.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      97d5a220
    • F
      perf: Take a hot regs snapshot for trace events · c530665c
      Frederic Weisbecker 提交于
      We are taking a wrong regs snapshot when a trace event triggers.
      Either we use get_irq_regs(), which gives us the interrupted
      registers if we are in an interrupt, or we use task_pt_regs()
      which gives us the state before we entered the kernel, assuming
      we are lucky enough to be no kernel thread, in which case
      task_pt_regs() returns the initial set of regs when the kernel
      thread was started.
      
      What we want is different. We need a hot snapshot of the regs,
      so that we can get the instruction pointer to record in the
      sample, the frame pointer for the callchain, and some other
      things.
      
      Let's use the new perf_fetch_caller_regs() for that.
      
      Comparison with perf record -e lock: -R -a -f -g
      Before:
      
              perf  [kernel]                   [k] __do_softirq
                     |
                     --- __do_softirq
                        |
                        |--55.16%-- __open
                        |
                         --44.84%-- __write_nocancel
      
      After:
      
                  perf  [kernel]           [k] perf_tp_event
                     |
                     --- perf_tp_event
                        |
                        |--41.07%-- lock_acquire
                        |          |
                        |          |--39.36%-- _raw_spin_lock
                        |          |          |
                        |          |          |--7.81%-- hrtimer_interrupt
                        |          |          |          smp_apic_timer_interrupt
                        |          |          |          apic_timer_interrupt
      
      The old case was producing unreliable callchains. Now having
      right frame and instruction pointers, we have the trace we
      want.
      
      Also syscalls and kprobe events already have the right regs,
      let's use them instead of wasting a retrieval.
      
      v2: Follow the rename perf_save_regs() -> perf_fetch_caller_regs()
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Archs <linux-arch@vger.kernel.org>
      c530665c
  7. 06 3月, 2010 4 次提交
    • T
      function-graph: Add tracing_thresh support to function_graph tracer · 0e950173
      Tim Bird 提交于
      Add support for tracing_thresh to the function_graph tracer.  This
      version of this feature isolates the checks into new entry and
      return functions, to avoid adding more conditional code into the
      main function_graph paths.
      
      When the tracing_thresh is set and the function graph tracer is
      enabled, only the functions that took longer than the time in
      microseconds that was set in tracing_thresh are recorded. To do this
      efficiently, only the function exits are recorded:
      
       [tracing]# echo 100 > tracing_thresh
       [tracing]# echo function_graph > current_tracer
       [tracing]# cat trace
       # tracer: function_graph
       #
       # CPU  DURATION                  FUNCTION CALLS
       # |     |   |                     |   |   |   |
        1) ! 119.214 us  |  } /* smp_apic_timer_interrupt */
        1)   <========== |
        0) ! 101.527 us  |              } /* __rcu_process_callbacks */
        0) ! 126.461 us  |            } /* rcu_process_callbacks */
        0) ! 145.111 us  |          } /* __do_softirq */
        0) ! 149.667 us  |        } /* do_softirq */
        0) ! 168.817 us  |      } /* irq_exit */
        0) ! 248.254 us  |    } /* smp_apic_timer_interrupt */
      
      Also, add support for specifying tracing_thresh on the kernel
      command line.  When used like so: "tracing_thresh=200 ftrace=function_graph"
      this can be used to analyse system startup.  It is important to disable
      tracing soon after boot, in order to avoid losing the trace data.
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NTim Bird <tim.bird@am.sony.com>
      LKML-Reference: <4B87098B.4040308@am.sony.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0e950173
    • A
      tracing: Update the comm field in the right variable in update_max_tr · 1acaa1b2
      Arnaldo Carvalho de Melo 提交于
      The latency output showed:
      
       #    | task: -3 (uid:0 nice:0 policy:1 rt_prio:99)
      
      The comm is missing in the "task:" and it looks like a minus 3 is
      the output. The correct display should be:
      
       #    | task: migration/0-3 (uid:0 nice:0 policy:1 rt_prio:99)
      
      The problem is that the comm is being stored in the wrong data
      structure. The max_tr.data[cpu] is what stores the comm, not the
      tr->data[cpu].
      
      Before this patch the max_tr.data[cpu]->comm was zeroed and the /debug/trace
      ended up showing just the '-' sign followed by the pid.
      
      Also remove a needless initialization of max_data.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1267824230-23861-1-git-send-email-acme@infradead.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      1acaa1b2
    • S
      function-graph: Use comment notation for func names of dangling '}' · a094fe04
      Steven Rostedt 提交于
      When a '}' does not have a matching function start, the name is printed
      within parenthesis. But this makes it confusing between ending '}'
      and function starts. This patch makes the function name appear in C comment
      notation.
      
      Old view:
       3)   1.281 us    |            } (might_fault)
       3)   3.620 us    |          } (filldir)
       3)   5.251 us    |        } (call_filldir)
       3)               |        call_filldir() {
       3)               |          filldir() {
      
      New view:
       3)   1.281 us    |            } /* might_fault */
       3)   3.620 us    |          } /* filldir */
       3)   5.251 us    |        } /* call_filldir */
       3)               |        call_filldir() {
       3)               |          filldir() {
      Requested-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a094fe04
    • S
      function-graph: Fix unused reference to ftrace_set_func() · 801c29fd
      Steven Rostedt 提交于
      The declaration of ftrace_set_func() is at the start of the ftrace.c file
      and wrapped with a #ifdef CONFIG_FUNCTION_GRAPH condition. If function
      graph tracing is enabled but CONFIG_DYNAMIC_FTRACE is not, a warning
      about that function being declared static and unused is given.
      
      This really should have been placed within the CONFIG_FUNCTION_GRAPH
      condition that uses ftrace_set_func().
      
      Moving the declaration down fixes the warning and makes the code cleaner.
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      801c29fd
  8. 04 3月, 2010 1 次提交
  9. 03 3月, 2010 1 次提交
    • L
      tracing: Fix warning in s_next of trace file ops · ac91d854
      Lai Jiangshan 提交于
      This warning in s_next() can be triggered by lseek():
       [<c018b3f7>] ? s_next+0x77/0x80
       [<c013e3c1>] warn_slowpath_common+0x81/0xa0
       [<c018b3f7>] ? s_next+0x77/0x80
       [<c013e3fa>] warn_slowpath_null+0x1a/0x20
       [<c018b3f7>] s_next+0x77/0x80
       [<c01efa77>] traverse+0x117/0x200
       [<c01eff13>] seq_lseek+0xa3/0x120
       [<c01efe70>] ? seq_lseek+0x0/0x120
       [<c01d7081>] vfs_llseek+0x41/0x50
       [<c01d8116>] sys_llseek+0x66/0xa0
       [<c0102bd0>] sysenter_do_call+0x12/0x26
      
      The iterator "leftover" variable is zeroed in the opening of the trace
      file. But lseek can call s_start() which will call s_next() without
      reseting the "leftover" variable back to zero, which might trigger
      the WARN_ON_ONCE(iter->leftover) that is in s_next().
      
      Cc: stable@kernel.org
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4B8CE06A.9090207@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ac91d854
  10. 01 3月, 2010 2 次提交
  11. 27 2月, 2010 1 次提交
    • S
      ftrace: Add function names to dangling } in function graph tracer · f1c7f517
      Steven Rostedt 提交于
      The function graph tracer is currently the most invasive tracer
      in the ftrace family. It can easily overflow the buffer even with
      10megs per CPU. This means that events can often be lost.
      
      On start up, or after events are lost, if the function return is
      recorded but the function enter was lost, all we get to see is the
      exiting '}'.
      
      Here is how a typical trace output starts:
      
       [tracing] cat trace
       # tracer: function_graph
       #
       # CPU  DURATION                  FUNCTION CALLS
       # |     |   |                     |   |   |   |
        0) + 91.897 us   |                  }
        0) ! 567.961 us  |                }
        0)   <========== |
        0) ! 579.083 us  |                _raw_spin_lock_irqsave();
        0)   4.694 us    |                _raw_spin_unlock_irqrestore();
        0) ! 594.862 us  |              }
        0) ! 603.361 us  |            }
        0) ! 613.574 us  |          }
        0) ! 623.554 us  |        }
        0)   3.653 us    |        fget_light();
        0)               |        sock_poll() {
      
      There are a series of '}' with no matching "func() {". There's no information
      to what functions these ending brackets belong to.
      
      This patch adds a stack on the per cpu structure used in outputting
      the function graph tracer to keep track of what function was outputted.
      Then on a function exit event, it checks the depth to see if the
      function exit has a matching entry event. If it does, then it only
      prints the '}', otherwise it adds the function name after the '}'.
      
      This allows function exit events to show what function they belong to
      at trace output startup, when the entry was lost due to ring buffer
      overflow, or even after a new task is scheduled in.
      
      Here is what the above trace will look like after this patch:
      
       [tracing] cat trace
       # tracer: function_graph
       #
       # CPU  DURATION                  FUNCTION CALLS
       # |     |   |                     |   |   |   |
        0) + 91.897 us   |                  } (irq_exit)
        0) ! 567.961 us  |                } (smp_apic_timer_interrupt)
        0)   <========== |
        0) ! 579.083 us  |                _raw_spin_lock_irqsave();
        0)   4.694 us    |                _raw_spin_unlock_irqrestore();
        0) ! 594.862 us  |              } (add_wait_queue)
        0) ! 603.361 us  |            } (__pollwait)
        0) ! 613.574 us  |          } (tcp_poll)
        0) ! 623.554 us  |        } (sock_poll)
        0)   3.653 us    |        fget_light();
        0)               |        sock_poll() {
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f1c7f517
  12. 25 2月, 2010 6 次提交
  13. 17 2月, 2010 2 次提交
  14. 14 2月, 2010 1 次提交
  15. 12 2月, 2010 1 次提交
  16. 10 2月, 2010 1 次提交
    • S
      tracing: Add correct/incorrect to sort keys for branch annotation output · ede55c9d
      Steven Rostedt 提交于
      The branch annotation is a bit difficult to see the worst offenders
      because it only sorts by percentage:
      
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
             0      163 100 qdisc_restart                  sch_generic.c        179
             0      163 100 pfifo_fast_dequeue             sch_generic.c        447
             0        4 100 pskb_trim_rcsum                skbuff.h             1689
             0        4 100 llc_rcv                        llc_input.c          170
             0       18 100 psmouse_interrupt              psmouse-base.c       304
             0        3 100 atkbd_interrupt                atkbd.c              389
             0        5 100 usb_alloc_dev                  usb.c                437
             0       11 100 vsscanf                        vsprintf.c           1897
             0        2 100 IS_ERR                         err.h                34
             0       23 100 __rmqueue_fallback             page_alloc.c         865
             0        4 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 142
             0        3 100 move_masked_irq                migration.c          11
      
      Adding the incorrect and correct values as sort keys makes this file a
      bit more informative:
      
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
             0   366541 100 audit_syscall_entry            auditsc.c            1637
             0   366538 100 audit_syscall_exit             auditsc.c            1685
             0   115839 100 sched_info_switch              sched_stats.h        269
             0    74567 100 sched_info_queued              sched_stats.h        222
             0    66578 100 sched_info_dequeued            sched_stats.h        177
             0    15113 100 trace_workqueue_insertion      workqueue.h          38
             0    15107 100 trace_workqueue_execution      workqueue.h          45
             0     3622 100 syscall_trace_leave            ptrace.c             1772
             0     2750 100 sched_move_task                sched.c              10100
             0     2750 100 sched_move_task                sched.c              10110
             0     1815 100 pre_schedule_rt                sched_rt.c           1462
             0      837 100 audit_alloc                    auditsc.c            879
             0      814 100 tcp_mss_split_point            tcp_output.c         1302
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ede55c9d
  17. 05 2月, 2010 1 次提交
  18. 04 2月, 2010 3 次提交
    • A
      Fix misspellings of "truly" in comments. · c41b20e7
      Adam Buchbinder 提交于
      Some comments misspell "truly"; this fixes them. No code changes.
      Signed-off-by: NAdam Buchbinder <adam.buchbinder@gmail.com>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      c41b20e7
    • M
      ftrace: Remove record freezing · f24bb999
      Masami Hiramatsu 提交于
      Remove record freezing. Because kprobes never puts probe on
      ftrace's mcount call anymore, it doesn't need ftrace to check
      whether kprobes on it.
      Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Cc: systemtap <systemtap@sources.redhat.com>
      Cc: DLE <dle-develop@lists.sourceforge.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: przemyslaw@pawelczyk.it
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100202214925.4694.73469.stgit@dhcp-100-2-132.bos.redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f24bb999
    • M
      ftrace/alternatives: Introducing *_text_reserved functions · 2cfa1978
      Masami Hiramatsu 提交于
      Introducing *_text_reserved functions for checking the text
      address range is partially reserved or not. This patch provides
      checking routines for x86 smp alternatives and dynamic ftrace.
      Since both functions modify fixed pieces of kernel text, they
      should reserve and protect those from other dynamic text
      modifier, like kprobes.
      
      This will also be extended when introducing other subsystems
      which modify fixed pieces of kernel text. Dynamic text modifiers
      should avoid those.
      Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Cc: systemtap <systemtap@sources.redhat.com>
      Cc: DLE <dle-develop@lists.sourceforge.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: przemyslaw@pawelczyk.it
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <20100202214911.4694.16587.stgit@dhcp-100-2-132.bos.redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2cfa1978
  19. 02 2月, 2010 1 次提交
    • L
      tracing: Fix circular dead lock in stack trace · 4f48f8b7
      Lai Jiangshan 提交于
      When we cat <debugfs>/tracing/stack_trace, we may cause circular lock:
      sys_read()
        t_start()
           arch_spin_lock(&max_stack_lock);
      
        t_show()
           seq_printf(), vsnprintf() .... /* they are all trace-able,
             when they are traced, max_stack_lock may be required again. */
      
      The following script can trigger this circular dead lock very easy:
      #!/bin/bash
      
      echo 1 > /proc/sys/kernel/stack_tracer_enabled
      
      mount -t debugfs xxx /mnt > /dev/null 2>&1
      
      (
      # make check_stack() zealous to require max_stack_lock
      for ((; ;))
      {
      	echo 1 > /mnt/tracing/stack_max_size
      }
      ) &
      
      for ((; ;))
      {
      	cat /mnt/tracing/stack_trace > /dev/null
      }
      
      To fix this bug, we increase the percpu trace_active before
      require the lock.
      Reported-by: NLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4B67D4F9.9080905@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      4f48f8b7
  20. 29 1月, 2010 1 次提交