1. 09 Aug 2009, 1 commit
    • perf_counter: Fix/complete ftrace event records sampling · f413cdb8
      Frederic Weisbecker authored
      This patch implements the kernel side support for ftrace event
      record sampling.
      
      A new counter sampling attribute is added:
      
         PERF_SAMPLE_TP_RECORD
      
      which requests sampling of ftrace event records. In this case,
      if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
      fires, we emit the tracepoint's binary record into the
      perf counter event buffer, as a sample.
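
      For illustration, the userspace side would enable this with the new
      sample_type bit (a minimal sketch against the 2.6.31-era perf_counter
      ABI; reading the tracepoint id from debugfs is an assumption of the
      example, perf record does it for you):

       #include <linux/perf_counter.h>
       #include <sys/syscall.h>
       #include <unistd.h>

       static int open_tp_counter(int cpu, __u64 tracepoint_id)
       {
               struct perf_counter_attr attr = {
                       .type          = PERF_TYPE_TRACEPOINT,
                       .config        = tracepoint_id, /* the event's debugfs id */
                       .sample_period = 1,
                       .sample_type   = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
                                        PERF_SAMPLE_PERIOD | PERF_SAMPLE_TP_RECORD,
               };

               /* all tasks, one cpu, no group, no flags */
               return syscall(__NR_perf_counter_open, &attr, -1, cpu, -1, 0);
       }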
      
      Result, after setting the PERF_SAMPLE_TP_RECORD attribute from perf
      record:
      
       perf record -f -F 1 -a -e workqueue:workqueue_execution
       perf report -D
      
       0x21e18 [0x48]: event: 9
       .
       . ... raw event: size 72 bytes
       .  0000:  09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff  ......H........
       .  0010:  0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00  ........!......
       .  0020:  2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e  +...........eve
       .  0030:  74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00  ts/1...........
       .  0040:  e0 b1 31 81 ff ff ff ff                          .......
      .
      0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
      
      The raw ftrace binary record starts at offset 0020.
      
      Translation:
      
       struct trace_entry {
      	type		= 0x2b = 43;
      	flags		= 1;
      	preempt_count	= 2;
      	pid		= 0xa = 10;
      	tgid		= 0xa = 10;
       }
      
       thread_comm = "events/1"
       thread_pid  = 0xa = 10;
       func	    = 0xffffffff8131b1e0 = flush_to_ldisc()
      
      What will come next?
      
       - Userspace support ('perf trace'), 'flight data recorder' mode
         for perf trace, etc.
      
       - The unconditional copy in the profiling callback has a cost,
         however, even when no such sampling is wanted, and this needs
         to be fixed in the future. For that we need instant access to
         the perf counter attribute. This is a matter of adding a flag
         in struct ftrace_event.
      
       - Take care of event recursion! Don't ever try to record
         a lock event, for example: some locking is used in the
         profiling fast path and leads to tracing recursion.
         That will be fixed using raw spinlocks or recursion
         protection.
      
       - [...]
      
       - Profit! :-)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 08 Aug 2009, 3 commits
  3. 06 Aug 2009, 4 commits
    • ring-buffer: Fix advance of reader in rb_buffer_peek() · 469535a5
      Robert Richter authored
      When rb_buffer_peek() is called from ring_buffer_consume() and a
      padding event is returned, the function rb_advance_reader() is
      called twice. This may lead to missed samples or, under high
      workloads, to the warning below. This patch fixes this: if a padding
      event is returned by rb_buffer_peek(), it is now consumed by the
      calling function.
      
      Also, I simplified some code in ring_buffer_consume().
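
      A minimal sketch of the resulting consume-side pattern (illustrative
      control flow, not the literal patch; rb_buffer_peek() and
      rb_advance_reader() are the functions named above):

       /* in ring_buffer_consume(), sketched */
       again:
       event = rb_buffer_peek(buffer, cpu, ts);
       if (event && rb_event_is_padding(event)) {   /* predicate illustrative */
               rb_advance_reader(cpu_buffer);       /* consume padding here...     */
               goto again;                          /* ...exactly once, then retry */
       }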
      
      ------------[ cut here ]------------
      WARNING: at /dev/shm/.source/linux/kernel/trace/ring_buffer.c:2289 rb_advance_reader+0x2e/0xc5()
      Hardware name: Anaheim
      Modules linked in:
      Pid: 29, comm: events/2 Tainted: G        W  2.6.31-rc3-oprofile-x86_64-standard-00059-g5050dc2 #1
      Call Trace:
      [<ffffffff8106776f>] ? rb_advance_reader+0x2e/0xc5
      [<ffffffff81039ffe>] warn_slowpath_common+0x77/0x8f
      [<ffffffff8103a025>] warn_slowpath_null+0xf/0x11
      [<ffffffff8106776f>] rb_advance_reader+0x2e/0xc5
      [<ffffffff81068bda>] ring_buffer_consume+0xa0/0xd2
      [<ffffffff81326933>] op_cpu_buffer_read_entry+0x21/0x9e
      [<ffffffff810be3af>] ? __find_get_block+0x4b/0x165
      [<ffffffff8132749b>] sync_buffer+0xa5/0x401
      [<ffffffff810be3af>] ? __find_get_block+0x4b/0x165
      [<ffffffff81326c1b>] ? wq_sync_buffer+0x0/0x78
      [<ffffffff81326c76>] wq_sync_buffer+0x5b/0x78
      [<ffffffff8104aa30>] worker_thread+0x113/0x1ac
      [<ffffffff8104dd95>] ? autoremove_wake_function+0x0/0x38
      [<ffffffff8104a91d>] ? worker_thread+0x0/0x1ac
      [<ffffffff8104dc9a>] kthread+0x88/0x92
      [<ffffffff8100bdba>] child_rip+0xa/0x20
      [<ffffffff8104dc12>] ? kthread+0x0/0x92
      [<ffffffff8100bdb0>] ? child_rip+0x0/0x20
      ---[ end trace f561c0a58fcc89bd ]---
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: Fix perf-tracepoint OOPS · af6af30c
      Peter Zijlstra authored
      Not all tracepoints are created equal; specifically, the ftrace
      tracepoints are created with TRACE_EVENT_FORMAT(), which does
      not generate the bits needed to tie them into perf counters.
      
      For those events, don't create the 'id' file, and fail
      ->profile_enable when their ID is specified through other
      means.
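
      A sketch of the enable-side guard (assuming the event lookup of this
      era; locking elided):

       int ftrace_profile_enable(int event_id)
       {
               struct ftrace_event_call *event;

               list_for_each_entry(event, &ftrace_events, list) {
                       /* refuse events that lack the profile hooks */
                       if (event->id == event_id && event->profile_enable)
                               return event->profile_enable(event);
               }
               return -EINVAL;
       }
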
      Reported-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1249497664.5890.4.camel@laptop>
      [ v2: fix build error in the !CONFIG_EVENT_PROFILE case ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ring-buffer: do not disable ring buffer on oops_in_progress · 464e85eb
      Steven Rostedt authored
      The commit:
      
        commit e0fdace1
        Author: David Miller <davem@davemloft.net>
        Date:   Fri Aug 1 01:11:22 2008 -0700
      
          debug_locks: set oops_in_progress if we will log messages.
      
          Otherwise lock debugging messages on runqueue locks can deadlock the
          system due to the wakeups performed by printk().
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      
      will permanently set oops_in_progress on any lockdep failure.
      When this triggers, it causes any read from the ring buffer to
      permanently disable the ring buffer (not to mention that printk
      is left unlocked).
      
      This patch removes the oops_in_progress check but keeps the NMI
      check, which makes sense. This is probably OK, since the ring buffer
      should not cause something to set oops_in_progress anyway.
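
      A sketch of the resulting guard on the read side (helper name as used
      in ring_buffer.c; the body shown is illustrative):

       static inline int rb_ok_to_lock(void)
       {
               /*
                * Only a true NMI still refuses to take the spinlock;
                * oops_in_progress is no longer consulted, so a lockdep
                * splat cannot permanently disable the ring buffer.
                */
               return likely(!in_nmi());
       }
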
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: fix check of try_to_discard result · 0f2541d2
      Steven Rostedt authored
      The function ring_buffer_discard_commit() inverted the check on
      the result of try_to_discard. It should skip incrementing the
      entry counter if try_to_discard succeeded; instead, it incremented
      the entry counter when the discard succeeded and did not increment
      it when the discard failed.
      
      The result of this bug is that filtering makes the stat counters
      incorrect.
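
      A sketch of the corrected path (the accounting line is illustrative):

       /* in ring_buffer_discard_commit(), sketched */
       if (rb_try_to_discard(cpu_buffer, event))
               goto out;               /* discarded: do not count the entry */

       /* the discard failed, the event stays: account it normally */
       local_inc(&cpu_buffer->entries);
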
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  4. 29 Jul 2009, 2 commits
    • tracing: Fix missing function_graph events when we splice_read from trace_pipe · 74e7ff8c
      Lai Jiangshan authored
      About half of the events are missing when we splice_read
      from trace_pipe. They are unexpectedly consumed because we ignore
      the TRACE_TYPE_NO_CONSUME return value, which the function graph
      tracer uses when it needs to consume the events by itself to walk
      the ring buffer.
      
      The same problem appears with ftrace_dump().
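
      The consume-side fix amounts to honouring that return value before
      consuming (a sketch of the pattern; these names exist in the tracing
      code of this era):

       ret = print_trace_line(iter);
       if (ret != TRACE_TYPE_NO_CONSUME)
               trace_consume(iter);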
      
      Example of an output before this patch:
      
      1)               |      ktime_get_real() {
      1)   2.846 us    |          read_hpet();
      1)   4.558 us    |        }
      1)   6.195 us    |      }
      
      After this patch:
      
      0)               |      ktime_get_real() {
      0)               |        getnstimeofday() {
      0)   1.960 us    |          read_hpet();
      0)   3.597 us    |        }
      0)   5.196 us    |      }
      
      The fix also applies to 2.6.30.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: stable@kernel.org
      LKML-Reference: <4A6EEC52.90704@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • tracing: Fix invalid function_graph entry · 38ceb592
      Lai Jiangshan authored
      When print_graph_entry() computes a function call entry event, it also
      needs to check the next entry to guess whether it matches the return
      event of the current function entry.
      In order to look at this next event, it needs to consume the current
      entry before going ahead in the ring buffer.
      
      However, if the current event that gets consumed is the last one in the
      ring buffer head page, the ring buffer may reuse the page for writers.
      The consumed entry would then become invalid because of possible
      racy overwriting.
      
      We must therefore handle this entry by making a copy of it.
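
      A sketch of the copy (a struct assignment; the surrounding calls are
      the ones the graph tracer uses, the flow is illustrative):

       struct ftrace_graph_ent_entry saved = *entry;   /* stable copy */
       entry = &saved;

       /* now it is safe to consume; the reader page may be recycled */
       ring_buffer_consume(iter->tr->buffer, iter->cpu, NULL);
       /* continue with 'entry', which points at the copy, not the page */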
      
      The fix also applies to 2.6.30.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: stable@kernel.org
      LKML-Reference: <4A6EEAEC.3050508@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
  5. 23 Jul 2009, 5 commits
  6. 17 Jul 2009, 1 commit
    • tracing/function: Fix the return value of ftrace_trace_onoff_callback() · 04aef32d
      Xiao Guangrong authored
      ftrace_trace_onoff_callback() returns an error even when the
      operation succeeds, for example:
      
       # echo _spin_*:traceon:10 > set_ftrace_filter
       -bash: echo: write error: Invalid argument
       # cat set_ftrace_filter
       #### all functions enabled ####
       _spin_trylock_bh:traceon:count=10
       _spin_unlock_irq:traceon:count=10
       _spin_unlock_bh:traceon:count=10
       _spin_lock_irq:traceon:count=10
       _spin_unlock:traceon:count=10
       _spin_trylock:traceon:count=10
       _spin_unlock_irqrestore:traceon:count=10
       _spin_lock_irqsave:traceon:count=10
       _spin_lock_bh:traceon:count=10
       _spin_lock:traceon:count=10
      
      We wanted to write _spin_*:traceon:10 to set_ftrace_filter; it complains
      with "Invalid argument", but the operation actually succeeded.
      
      This is because ftrace_process_regex() returns the number of functions that
      matched the pattern. If the number is not 0, this value is returned
      by ftrace_regex_write(), whereas we want to return the number of bytes
      virtually written.
      Also, the file offset pointer is not updated in this case.
      
      If the number of matched functions is lower than the number of bytes written
      by the user, the string given by the user is reprocessed with a smaller
      size, leading to a malformed ftrace regex and then a returned -EINVAL.
      
      So, this patch fixes it by returning 0 if no error occurred.
      The fix also applies to 2.6.30.
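
      A sketch of the corrected return handling (the probe registration call
      is the one this callback makes; a positive value is only the number of
      matched functions):

       /* in ftrace_trace_onoff_callback(), sketched */
       ret = register_ftrace_function_probe(glob, ops, count);

       return ret < 0 ? ret : 0;       /* report success as 0, not the match count */
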
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: stable@kernel.org
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
  7. 13 Jul 2009, 2 commits
  8. 02 Jul 2009, 1 commit
  9. 29 Jun 2009, 1 commit
  10. 27 Jun 2009, 2 commits
  11. 26 Jun 2009, 1 commit
  12. 25 Jun 2009, 2 commits
    • ring-buffer: Make it generally available · 1155de47
      Paul Mundt authored
      In hunting down the cause of the hwlat_detector ring buffer spew in
      my failed -next builds, it became obvious that folks are now treating
      ring_buffer as something generic and independent of tracing, and thus
      suitable for public driver consumption.
      
      Given that there are only a few minor areas in ring_buffer that have any
      reliance on CONFIG_TRACING or CONFIG_FUNCTION_TRACER, provide stubs for
      those and make it generally available.
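
      With that, a driver can use the buffer through the usual public API
      (a minimal sketch; error handling elided, and process() is a
      placeholder of the example):

       #include <linux/ring_buffer.h>

       struct ring_buffer *rb;
       struct ring_buffer_event *ev;

       rb = ring_buffer_alloc(64 * 1024, RB_FL_OVERWRITE);  /* per-cpu size */
       ring_buffer_write(rb, sizeof(sample), &sample);

       ev = ring_buffer_consume(rb, cpu, NULL);
       if (ev)
               process(ring_buffer_event_data(ev));

       ring_buffer_free(rb);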
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Cc: Jon Masters <jcm@jonmasters.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <20090625053012.GB19944@linux-sh.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: Remove duplicate newline · 00e54d08
      Li Zefan authored
      Before:
        # echo 'sys_open:traceon:' > set_ftrace_filter
        # echo 'sys_close:traceoff:5' > set_ftrace_filter
        # cat set_ftrace_filter
        #### all functions enabled ####
        sys_open:traceon:unlimited
      
        sys_close:traceoff:count=0
      
      After:
        # cat set_ftrace_filter
        #### all functions enabled ####
        sys_open:traceon:unlimited
        sys_close:traceoff:count=0
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A4313A7.7030105@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 24 Jun 2009, 8 commits
  14. 20 Jun 2009, 2 commits
    • tracing/urgent: warn in case of ftrace_start_up inbalance · 9ea1a153
      Frederic Weisbecker authored
      Prevent further ftrace_start_up imbalances so that we avoid
      future nop patching omissions with dynamic ftrace.
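
      A sketch of the warning on the shutdown-side counter the changelog
      refers to:

       /* in ftrace_shutdown(), sketched */
       ftrace_start_up--;
       /*
        * Going negative means a later "last" unregister would never
        * reach zero, and call sites would never be nopped back.
        */
       if (WARN_ON_ONCE(ftrace_start_up < 0))
               ftrace_start_up = 0;
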
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
    • tracing/urgent: fix unbalanced ftrace_start_up · c85a17e2
      Frederic Weisbecker authored
      Perf counters report the following stats for a system-wide
      profiling run:
      
       #
       # (2364 samples)
       #
       # Overhead  Symbol
       # ........  ......
       #
          15.40%  [k] mwait_idle_with_hints
           8.29%  [k] read_hpet
           5.75%  [k] ftrace_caller
           3.60%  [k] ftrace_call
           [...]
      
      This snapshot was taken while neither the function tracer nor
      the function graph tracer was running.
      With dynamic ftrace, such results indicate wrong ftrace behaviour,
      because all calls to ftrace_caller or ftrace_graph_caller (the patched
      calls to mcount) are supposed to be patched into nops when neither of
      those tracers is running.
      
      The problem occurs after the first run of the function tracer. Once we
      launch it a second time, the callsites will never be nopped back,
      unless you set custom filters.
      For example it happens during the self tests at boot time.
      The function tracer selftest runs, and then the dynamic tracing is
      tested too. After that, the callsites are left un-nopped.
      
      This is because the reset callback of the function tracer tries to
      unregister two ftrace callbacks at once: the common function tracer
      and the function tracer with stack backtrace, regardless of which
      one is currently in use.
      This creates an imbalance in the ftrace_start_up value, which is
      expected to be zero when the last ftrace callback is unregistered. When
      it reaches zero, FTRACE_DISABLE_CALLS is set on the next ftrace
      command, triggering the patching into nops. But once it becomes
      unbalanced, i.e. drops below zero, then if the kernel functions
      are patched again (as on every further function tracer run), they
      won't ever be nopped back.
      
      Note that ftrace_call and ftrace_graph_call are still patched back
      to ftrace_stub in the off case, but not the callers of ftrace_call
      and ftrace_graph_caller. It means that tracing is indeed deactivated,
      but we waste a useless call into every kernel function.
      
      This patch just unregisters the right ftrace_ops for the function
      tracer in its reset callback and ignores the other one, which is
      not registered, fixing the imbalance. The problem also exists
      in 2.6.30.
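
      A sketch of the fixed reset path (trace_ops and trace_stack_ops are
      the two callbacks in question; the condition shown is illustrative):

       /* in the function tracer's reset callback, sketched */
       if (stack_tracing_was_selected)
               unregister_ftrace_function(&trace_stack_ops);
       else
               unregister_ftrace_function(&trace_ops);
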
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: stable@kernel.org
  15. 19 Jun 2009, 2 commits
    • function-graph: add stack frame test · 71e308a2
      Steven Rostedt authored
      In case gcc does something funny with the stack frames, or the return
      from function code, we would like to detect that.
      
      An arch may implement passing of a variable that is unique to the
      function and can be saved on entering a function and can be tested
      when exiting the function. Usually the frame pointer can be used for
      this purpose.
      
      This patch also implements this for x86, where it passes in the stack
      frame of the parent function and tests that frame on exit.
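
      A sketch of the test (the fp field is what this patch adds next to
      the saved return address; the surrounding bookkeeping is elided):

       /* entry side, in ftrace_push_return_trace(), sketched */
       current->ret_stack[index].fp = frame_pointer;

       /* exit side, in ftrace_return_to_handler(), sketched */
       if (unlikely(current->ret_stack[index].fp != frame_pointer)) {
               ftrace_graph_stop();    /* something mangled the frame */
               WARN(1, "Bad frame pointer: expected %lx, received %lx\n",
                    current->ret_stack[index].fp, frame_pointer);
       }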
      
      There was a case in x86_32 with optimize for size (-Os) where, for a
      few functions, gcc would align the stack frame and place a copy of the
      return address into it. The function graph tracer modified the copy and
      not the actual return address. On return from the function, it did not go
      to the tracer hook, but returned to the parent. This broke the function
      graph tracer, because the return of the parent (where gcc did not do
      this funky manipulation) went to the location that the child function
      was supposed to return to. This caused strange kernel crashes.
      
      This test detected the problem and pointed out where the issue was.
      
      This modifies the parameters of one of the functions that the arch
      specific code calls, so it includes changes to arch code to accommodate
      the new prototype.
      
      Note, I noticed that the parisc arch implements its own push_return_trace.
      This is now a generic function, and ftrace_push_return_trace() should be
      used instead. This patch does not touch that code.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • function-graph: disable when both x86_32 and optimize for size are configured · eb4a0378
      Steven Rostedt authored
      On x86_32, when optimize for size is set, gcc may align the frame pointer
      and make a copy of the return address inside the stack frame.
      The return address that is located in the stack frame may not be
      the one used to return to the calling function. This breaks the
      function graph tracer.
      
      The function graph tracer replaces the return address with a jump to a hook
      function that can trace the exit of the function. If it only replaces
      a copy, then the hook will not be called when the function returns.
      Worse yet, when the parent function returns, the function graph tracer
      will return to the location of the child function, which will
      easily crash the kernel with weird results.
      
      To see the problem, when i386 is compiled with -Os we get:
      
      c106be03:       57                      push   %edi
      c106be04:       8d 7c 24 08             lea    0x8(%esp),%edi
      c106be08:       83 e4 e0                and    $0xffffffe0,%esp
      c106be0b:       ff 77 fc                pushl  0xfffffffc(%edi)
      c106be0e:       55                      push   %ebp
      c106be0f:       89 e5                   mov    %esp,%ebp
      c106be11:       57                      push   %edi
      c106be12:       56                      push   %esi
      c106be13:       53                      push   %ebx
      c106be14:       81 ec 8c 00 00 00       sub    $0x8c,%esp
      c106be1a:       e8 f5 57 fb ff          call   c1021614 <mcount>
      
      When it is compiled with -O2 instead we get:
      
      c10896f0:       55                      push   %ebp
      c10896f1:       89 e5                   mov    %esp,%ebp
      c10896f3:       83 ec 28                sub    $0x28,%esp
      c10896f6:       89 5d f4                mov    %ebx,0xfffffff4(%ebp)
      c10896f9:       89 75 f8                mov    %esi,0xfffffff8(%ebp)
      c10896fc:       89 7d fc                mov    %edi,0xfffffffc(%ebp)
      c10896ff:       e8 d0 08 fa ff          call   c1029fd4 <mcount>
      
      The compile with -Os aligns the stack pointer, then sets up the
      frame pointer (%ebp), and copies the return address back into
      the stack frame. The change to the return address in mcount is done
      to the copy and not to the real placeholder of the return address.
      
      The compile with -O2 sets up the frame pointer first; this makes
      the change to the return address by mcount affect where the function
      will jump on exit.
      Reported-by: Jake Edge <jake@lwn.net>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  16. 18 Jun 2009, 3 commits