1. 14 Aug 2010, 1 commit
    • tracing: Sanitize value returned from write(trace_marker, "...", len) · 1aa54bca
      Authored by Marcin Slusarz
      When userspace writes a string that is not newline-terminated to the
      trace_marker file, the write handler appends a newline and returns
      the number of bytes written to the trace buffer, so
      write(fd, "abc", 3) will return 4
      
      That's unexpected and unfortunately it confuses glibc's fprintf function.
      
      Example:
      int main() {
        fprintf(stderr, "abc");
        return 0;
      }
      
      $ gcc test.c -o test
      $ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
      $ ./test 2>/sys/kernel/debug/tracing/trace_marker
      
      results in an infinite loop:
      write(fd, "abc", 3) = 4
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      (...)
      
      ...and a kernel trace buffer full of empty markers.
      
      Fix it by sanitizing the write return value; a sketch of the idea follows below.
      Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
      LKML-Reference: <20100727231801.GB2826@joi.lan>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      1aa54bca
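      A minimal sketch of the fix's idea, assuming the handler tracks how
      many bytes (including the newline it appended itself) went into the
      buffer; the variable names are illustrative, not taken from the
      actual patch:

        /* "written" may count the newline we appended ourselves; never
         * report more bytes than userspace actually passed in. */
        if (written > cnt)
                written = cnt;
        return written;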
  2. 13 Aug 2010, 1 commit
    • tracing/events: Convert format output to seq_file · 2a37a3df
      Authored by Steven Rostedt
      Two new events were added that broke the current format output.
      
      Both from the SCSI system: scsi_dispatch_cmd_done and scsi_dispatch_cmd_timeout
      
      The reason is that their print_fmt exceeds a page in size. Since the
      format output used simple_read_from_buffer() and trace_seq, it was
      limited to one page.
      
      This patch converts the printing of an event's format to seq_file,
      which allows output larger than a page to be shown (see the sketch
      below).
      
      I diffed all event formats comparing the output with and without this
      patch. All matched except for the above two, which showed just:
      
        FORMAT TOO BIG
      
      without this patch; with it, the full format is displayed properly.
      
      v2: Remove updating *pos in seq start function.
         [ Thanks to Li Zefan for pointing that out ]
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com>
      Cc: James Bottomley <James.Bottomley@suse.de>
      Cc: Tomohiro Kusumi <kusumi.tomohiro@jp.fujitsu.com>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      2a37a3df
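      A hedged sketch of the seq_file pattern the patch adopts; the helper
      names (nr_format_lines, format_line) are hypothetical. seq_read()
      drives the iterator and calls show() once per position, so the total
      output is no longer capped at one page. Per v2, start() does not
      advance *pos:

        static void *f_start(struct seq_file *m, loff_t *pos)
        {
                /* v2: do not advance *pos here; just map it to an item */
                return (*pos < nr_format_lines) ? pos : NULL;
        }

        static void *f_next(struct seq_file *m, void *v, loff_t *pos)
        {
                (*pos)++;
                return (*pos < nr_format_lines) ? pos : NULL;
        }

        static void f_stop(struct seq_file *m, void *v)
        {
        }

        static int f_show(struct seq_file *m, void *v)
        {
                /* emit one chunk of the format per iteration */
                seq_printf(m, "%s\n", format_line(*(loff_t *)v));
                return 0;
        }

        static const struct seq_operations format_seq_ops = {
                .start = f_start,
                .next  = f_next,
                .stop  = f_stop,
                .show  = f_show,
        };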
  3. 12 Aug 2010, 1 commit
  4. 08 Aug 2010, 3 commits
  5. 07 Aug 2010, 2 commits
    • tracing: Fix ring_buffer_read_page reading out of page boundary · 18fab912
      Authored by Huang Ying
      With CONFIG_DEBUG_PAGEALLOC=y and Shaohua's patch
      
      [PATCH] x86: make spurious_fault check correct pte bit
      
      applied, function graph tracing with the following will trigger a page fault:
      
      # cd /sys/kernel/debug/tracing/
      # echo function_graph > current_tracer
      # cat per_cpu/cpu1/trace_pipe_raw > /dev/null
      
      BUG: unable to handle kernel paging request at ffff880006e99000
      IP: [<ffffffff81085572>] rb_event_length+0x1/0x3f
      PGD 1b19063 PUD 1b1d063 PMD 3f067 PTE 6e99160
      Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      last sysfs file: /sys/devices/virtual/net/lo/operstate
      CPU 1
      Modules linked in:
      
      Pid: 1982, comm: cat Not tainted 2.6.35-rc6-aes+ #300 /Bochs
      RIP: 0010:[<ffffffff81085572>]  [<ffffffff81085572>] rb_event_length+0x1/0x3f
      RSP: 0018:ffff880006475e38  EFLAGS: 00010006
      RAX: 0000000000000ff0 RBX: ffff88000786c630 RCX: 000000000000001d
      RDX: ffff880006e98000 RSI: 0000000000000ff0 RDI: ffff880006e99000
      RBP: ffff880006475eb8 R08: 000000145d7008bd R09: 0000000000000000
      R10: 0000000000008000 R11: ffffffff815d9336 R12: ffff880006d08000
      R13: ffff880006e605d8 R14: 0000000000000000 R15: 0000000000000018
      FS:  00007f2b83e456f0(0000) GS:ffff880002100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: ffff880006e99000 CR3: 00000000064a8000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process cat (pid: 1982, threadinfo ffff880006474000, task ffff880006e40770)
      Stack:
       ffff880006475eb8 ffffffff8108730f 0000000000000ff0 000000145d7008bd
      <0> ffff880006e98010 ffff880006d08010 0000000000000296 ffff88000786c640
      <0> ffffffff81002956 0000000000000000 ffff8800071f4680 ffff8800071f4680
      Call Trace:
       [<ffffffff8108730f>] ? ring_buffer_read_page+0x15a/0x24a
       [<ffffffff81002956>] ? return_to_handler+0x15/0x2f
       [<ffffffff8108a575>] tracing_buffers_read+0xb9/0x164
       [<ffffffff810debfe>] vfs_read+0xaf/0x150
       [<ffffffff81002941>] return_to_handler+0x0/0x2f
       [<ffffffff810248b0>] __bad_area_nosemaphore+0x17e/0x1a1
       [<ffffffff81002941>] return_to_handler+0x0/0x2f
       [<ffffffff810248e6>] bad_area_nosemaphore+0x13/0x15
      Code: 80 25 b2 16 b3 00 fe c9 c3 55 48 89 e5 f0 80 0d a4 16 b3 00 02 c9 c3 55 31 c0 48 89 e5 48 83 3d 94 16 b3 00 01 c9 0f 94 c0 c3 55 <8a> 0f 48 89 e5 83 e1 1f b8 08 00 00 00 0f b6 d1 83 fa 1e 74 27
      RIP  [<ffffffff81085572>] rb_event_length+0x1/0x3f
       RSP <ffff880006475e38>
      CR2: ffff880006e99000
      ---[ end trace a6877bb92ccb36bb ]---
      
      The root cause is that ring_buffer_read_page() may read beyond the
      page boundary, because the boundary check is done after the read.
      Fix this by checking the boundary before reading (conceptual sketch
      below).
      Reported-by: Shaohua Li <shaohua.li@intel.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      LKML-Reference: <1280297641.2771.307.camel@yhuang-dev>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      18fab912
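      A conceptual sketch of the ordering bug and the fix; the cursor names
      (read, commit, event) echo the commit text, but the code is
      illustrative, not the actual patch:

        /* Buggy order: the event header is dereferenced first and only
         * then bounds-checked, so the read itself can land past the
         * committed part of the page. */
        size = rb_event_length(event);          /* may touch the next page */
        if (read + size > commit)
                break;

        /* Fixed order (sketch): prove the cursor is still inside the
         * committed region before dereferencing anything. */
        if (read >= commit)
                break;
        size = rb_event_length(event);          /* now known to be in-page */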
    • tracing: Fix an unallocated memory access in function_graph · 575570f0
      Authored by Shaohua Li
      With CONFIG_DEBUG_PAGEALLOC, I observed an access to unallocated
      memory in the function_graph tracer. It appears we find a small entry
      in the ring buffer but access it as a large one; the access runs past
      the page boundary and touches an unallocated page (see the sketch
      below).
      
      Cc: <stable@kernel.org>
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      LKML-Reference: <1280217994.32400.76.camel@sli10-desk.sh.intel.com>
      [ Added a comment to explain the problem - SDR ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      575570f0
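      A hedged sketch of the kind of check involved: verify a peeked
      ring-buffer entry's declared type before casting it to a larger
      structure. The flow mirrors how the graph tracer pairs entry/return
      events, but the surrounding code is illustrative:

        struct ring_buffer_event *event;
        struct trace_entry *ent;

        event = ring_buffer_iter_peek(ring_iter, NULL);
        if (!event)
                return NULL;

        ent = ring_buffer_event_data(event);
        /* A smaller entry must not be cast to the bigger struct, or the
         * read runs past the entry -- and possibly past the page. */
        if (ent->type != TRACE_GRAPH_RET)
                return NULL;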
  6. 05 Aug 2010, 2 commits
  7. 04 Aug 2010, 1 commit
  8. 02 Aug 2010, 1 commit
    • perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call · 669336e4
      Authored by Frederic Weisbecker
      We use synchronize_sched() to ensure a tracepoint won't be called
      while/after we release the perf buffers it references.
      
      But the tracepoint API has its own helper for that:
      tracepoint_synchronize_unregister(). Use it instead, as it is
      self-explanatory and eases maintenance (usage sketch below).
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      669336e4
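      A minimal usage sketch, assuming a probe was registered on some
      tracepoint; the probe and the cleanup helper are hypothetical:

        /* Stop new invocations of the probe. */
        unregister_trace_sched_switch(my_probe, my_data);

        /* Wait for callbacks already in flight -- the self-explanatory
         * spelling of synchronize_sched() for this use case. */
        tracepoint_synchronize_unregister();

        /* Now it is safe to free what the probe referenced. */
        free_perf_buffers(my_data);             /* hypothetical helper */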
  9. 27 Jul 2010, 1 commit
  10. 23 Jul 2010, 1 commit
  11. 21 Jul 2010, 4 commits
    • tracing: Shrink max latency ringbuffer if unnecessary · ef710e10
      Authored by KOSAKI Motohiro
      Documentation/trace/ftrace.txt says
      
        buffer_size_kb:
      
              This sets or displays the number of kilobytes each CPU
              buffer can hold. The tracer buffers are the same size
              for each CPU. The displayed number is the size of the
              CPU buffer and not total size of all buffers. The
              trace buffers are allocated in pages (blocks of memory
              that the kernel uses for allocation, usually 4 KB in size).
              If the last page allocated has room for more bytes
              than requested, the rest of the page will be used,
              making the actual allocation bigger than requested.
              ( Note, the size may not be a multiple of the page size
                due to buffer management overhead. )
      
              This can only be updated when the current_tracer
              is set to "nop".
      
      But it's incorrect: the total memory consumption is actually
      'buffer_size_kb x CPUs x 2'.
      
      Why the factor of two? Because ftrace implicitly allocates a second
      buffer for the max latency feature.
      
      That leads to sad results when an admin wants a large buffer (say,
      for full logging and detailed analysis). For example, on a 24-CPU
      machine, writing 200MB to buffer_size_kb makes the system consume
      ~10GB of memory (200MB x 24 x 2). Wasting ~5GB of memory this way is
      usually unacceptable.
      
      Fortunately, almost no users use the max latency feature, and its
      buffer can easily be disabled.
      
      This patch shrinks the max latency buffer when it is not needed
      (sketch of the approach below).
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      LKML-Reference: <20100701104554.DA2D.A69D9226@jp.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      ef710e10
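      A hedged sketch of the approach: tracers that never take latency
      snapshots let the max buffer shrink to a token size. The flag name
      and the shape of the resize call are assumptions, not the literal
      patch:

        /* In struct tracer (illustrative): only latency tracers set it. */
        bool use_max_tr;

        /* On tracer switch (conceptual): */
        if (!tracer->use_max_tr)
                ring_buffer_resize(max_tr.buffer, 1);   /* token size */
        else
                ring_buffer_resize(max_tr.buffer, trace_buf_size);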
    • tracing: Reduce latency and remove percpu trace_seq · bc289ae9
      Authored by Lai Jiangshan
      __print_flags() and __print_symbolic() use a percpu trace_seq:
      
      1) Its memory is allocated statically, so it wastes memory even when
         tracing is not used.
      2) It is percpu data, so it wastes even more memory on multi-CPU
         systems.
      3) It disables preemption while executing its core routine
         "trace_seq_printf(s, "%s: ", #call);", which introduces latency.
      
      So we move this trace_seq into struct trace_iterator (sketch below).
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4C078350.7090106@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      bc289ae9
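      A before/after sketch; the percpu variable and the iterator field
      name are illustrative:

        /* Before (conceptual): a static percpu scratch buffer, costing
         * memory on every CPU and forcing preemption off around use. */
        static DEFINE_PER_CPU(struct trace_seq, event_seq);

        struct trace_seq *s = &get_cpu_var(event_seq); /* preempt off */
        trace_seq_printf(s, "%s: ", name);
        put_cpu_var(event_seq);

        /* After (conceptual): the scratch space lives in the iterator
         * that is already passed around, so no percpu storage and no
         * preempt-off section. */
        struct trace_seq *s2 = &iter->tmp_seq;
        trace_seq_printf(s2, "%s: ", name);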
    • trace: Reorder struct ring_buffer_per_cpu to remove padding on 64bit · 985023de
      Authored by Richard Kennedy
      Reorder the structure to remove 8 bytes of padding on 64-bit builds.
      This shrinks the size to 128 bytes, allowing allocation from a
      smaller slab and needing one fewer cache line. A generic illustration
      of the technique follows below.
      Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
      LKML-Reference: <1269516456.2054.8.camel@localhost>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      985023de
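      A generic illustration of the technique (not the actual
      ring_buffer_per_cpu layout): on 64-bit, interleaving 4-byte and
      8-byte fields leaves alignment holes, which tools like pahole make
      visible and which reordering removes:

        struct before {
                int   a;        /* 4 bytes + 4-byte hole (p wants 8-alignment) */
                void *p;        /* 8 bytes */
                int   b;        /* 4 bytes + 4 bytes tail padding */
        };                      /* sizeof == 24 on 64-bit */

        struct after {
                void *p;        /* 8 bytes */
                int   a;        /* 4 bytes */
                int   b;        /* 4 bytes -- both holes gone */
        };                      /* sizeof == 16 on 64-bit */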
    • tracing: Allow to disable cmdline recording · e870e9a1
      Authored by Li Zefan
      We found that enabling even a single trace event that is rarely
      triggered can add significant overhead to context switching.
      
      (lmbench context switch test)
       -------------------------------------------------
       2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
       ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
      ------ ------ ------ ------ ------ ------- -------
        2.19   2.3   2.21   2.56   2.13     2.54    2.07
        2.39   2.51  2.35   2.75   2.27     2.81    2.24
      
      The overhead is 6% ~ 11%.
      
      This is because when any trace event is enabled, 3 tracepoints
      (sched_switch, sched_wakeup, sched_wakeup_new) are activated to map
      pids to command names.
      
      We'd like to avoid this overhead, so add a trace option
      '(no)record-cmd' that allows cmdline recording to be disabled (usage
      example below).
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4C2D57F4.2050204@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      e870e9a1
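      Assuming the option follows the usual trace_options convention that
      its '(no)record-cmd' name implies, usage would look like this (the
      first line disables cmdline recording, the second re-enables it):

        # echo norecord-cmd > /sys/kernel/debug/tracing/trace_options
        # echo record-cmd > /sys/kernel/debug/tracing/trace_options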
  12. 20 Jul 2010, 3 commits
  13. 16 Jul 2010, 1 commit
    • tracing: Remove ksym tracer · 5d550467
      Authored by Frederic Weisbecker
      The ksym (breakpoint) ftrace plugin has been superseded by the perf
      tools, which are much more powerful for using the CPU breakpoints.
      This tracer brings no additional features. It has been deprecated for
      a while now, so let's remove it. (An example of the perf replacement
      follows below.)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      5d550467
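      For reference, the perf hardware-breakpoint events that supersede
      ksym use the mem:<addr>[:access] event syntax; the address below is a
      placeholder:

        # perf record -e mem:0xffffffff81234567:w -a sleep 10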
  14. 09 Jul 2010, 1 commit
  15. 06 Jul 2010, 1 commit
    • tracing/kprobes: Support "string" type · e09c8614
      Authored by Masami Hiramatsu
      Support string type tracing and printing in the kprobe tracer.
      
      This allows users to trace string data in the kernel, including
      __user data. Note that __user data sometimes cannot be accessed if it
      is paged out (sorry, but kprobes must operate in atomic context; we
      cannot wait for a page-in). A usage example follows below.
      
      Committer note: Fixed up conflicts with b7e2ecef.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100519195724.2885.18788.stgit@localhost6.localdomain6>
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e09c8614
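      A usage sketch in the kprobe_events syntax; the register choice
      assumes x86-64, where do_sys_open()'s second argument (the filename
      pointer) arrives in %si:

        # echo 'p:myopen do_sys_open filename=+0(%si):string' > /sys/kernel/debug/tracing/kprobe_events
        # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myopen/enable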
  16. 29 Jun 2010, 8 commits
  17. 11 Jun 2010, 1 commit
    • perf/tracing: Fix regression of perf losing kprobe events · a8fb2608
      Authored by Steven Rostedt
      With the addition of the code to shrink the kernel tracepoint
      infrastructure, we lost kprobes being traced by perf. The reason
      is that I tested whether "tp_event->class->perf_probe" existed before
      enabling it. This prevents "ftrace only" events (like the function
      trace events) from being enabled by perf.
      
      Unfortunately, kprobe events do not use perf_probe. This causes
      kprobes to be missed by perf. To fix this, we also test whether
      "tp_event->class->reg" exists, as well as perf_probe (sketch below).
      
      Normal trace events have only "perf_probe" but no "reg" function,
      while kprobes and syscalls have "reg" but no "perf_probe".
      The ftrace-unique events have neither, so this is a valid
      test. If a kprobe or syscall is not to be probed by perf, the
      "reg" function is called anyway, and will return a failure and
      prevent perf from probing it.
      Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      a8fb2608
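      A sketch of the described test, using the field names from the
      commit text (the surrounding function is illustrative):

        /* Reject only events that have neither hook: those are the
         * ftrace-only events that perf cannot use. */
        if (!tp_event->class->perf_probe && !tp_event->class->reg)
                return -EINVAL;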
  18. 09 Jun 2010, 5 commits
    • tracing: Remove kmemtrace ftrace plugin · 039ca4e7
      Authored by Li Zefan
      We have been resisting new ftrace plugins and removing existing
      ones, and kmemtrace has been superseded by kmem trace events
      and perf-kmem, so we remove it.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      [ remove kmemtrace from the makefile, handle slob too ]
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      039ca4e7
    • sched_clock: Add local_clock() API and improve documentation · c676329a
      Authored by Peter Zijlstra
      For people who otherwise would write cpu_clock(smp_processor_id()),
      there is now local_clock() (usage below).
      
      Also, as per a suggestion from Andrew, provide some documentation on
      the various clock interfaces, and minimize the unsigned long long vs
      u64 mess.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jens Axboe <jaxboe@fusionio.com>
      LKML-Reference: <1275052414.1645.52.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c676329a
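      Both spellings below are real APIs; this just shows the shorthand the
      commit introduces:

        u64 t_old = cpu_clock(smp_processor_id());      /* old spelling */
        u64 t_new = local_clock();                      /* new shorthand */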
    • tracing: Remove boot tracer · 30dbb20e
      Authored by Américo Wang
      The boot tracer is useless. It simply logs the initcalls, but these
      initcalls are also logged through printk when the initcall_debug
      kernel parameter is used.
      
      Nobody seems to be using it, so just remove it.
      Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Chase Douglas <chase.douglas@canonical.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <20100526105753.GA5677@cr0.nay.redhat.com>
      [ remove the hooks in main.c, and the headers ]
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      30dbb20e
    • perf: Drop the skip argument from perf_arch_fetch_regs_caller · b0f82b81
      Authored by Frederic Weisbecker
      Drop this argument now that we always want to rewind only to the
      state of the first caller.
      This means frame pointers are no longer necessary to reliably get
      the source of an event. But it also means this helper must now be a
      macro: an inline function is not an option, since we need to know
      when to provide a default implementation (see the idiom below).
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      b0f82b81
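      The standard kernel idiom being referred to, sketched with
      illustrative names: an arch header may define the macro itself, and
      the generic header detects that with #ifndef, something an inline
      function cannot offer:

        /* generic header (sketch) */
        #ifndef perf_arch_fetch_caller_regs
        #define perf_arch_fetch_caller_regs(regs, ip) \
                do { /* default implementation */ } while (0)
        #endif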
    • x86: Unify dumpstack.h and stacktrace.h · c9cf4dbb
      Authored by Frederic Weisbecker
      arch/x86/include/asm/stacktrace.h and arch/x86/kernel/dumpstack.h
      declare interfaces that deal with the same topic. In fact, most of
      the files that include stacktrace.h also include dumpstack.h.
      
      Although dumpstack.h seems more reserved for the internals of stack
      traces, those internals are quite often needed to define specialized
      stack trace operations. And the perf event arch headers are going to
      need access to such low-level operations anyway. So don't continue to
      bother with dumpstack.h, as it is no longer about isolated deep
      internals.
      
      v2: fix struct stack_frame definition conflict in sysprof
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Soeren Sandmann <sandmann@daimi.au.dk>
      c9cf4dbb
  19. 04 Jun 2010, 1 commit
    • tracing: Remove ftrace_preempt_disable/enable · 5168ae50
      Authored by Steven Rostedt
      The ftrace_preempt_disable/enable functions were meant to address a
      recursive race caused by the function tracer. The function tracer
      traces all functions, which makes it easily susceptible to recursion.
      One such area was preempt_enable(): it would call the scheduler, the
      scheduler would call the function tracer, and loop.
      (Or so it was thought.)
      
      ftrace_preempt_disable/enable protected against recursion inside the
      scheduler by storing the NEED_RESCHED flag. If the flag was set before
      ftrace_preempt_disable(), then ftrace_preempt_enable() would not call
      schedule(), on the reasoning that if it was set before, scheduling
      would already have happened unless we were already in the scheduler.
      (A sketch of the removed helpers follows below.)
      
      This worked fine, except on SMP, where another task could set the
      NEED_RESCHED flag for a task on another CPU and then kick off an IPI
      to trigger it. The NEED_RESCHED flag could be saved at
      ftrace_preempt_disable() with the IPI arriving inside the
      preempt-disabled section. ftrace_preempt_enable() would then not call
      the scheduler, because the flag had already been set before entering
      the section.
      
      This bug caused a missed preemption check, resulting in higher
      latencies.
      
      Investigating further, I found that the recursion caused by the
      function tracer was not due to schedule(), but due to
      preempt_schedule(). Now that preempt_schedule() is completely
      annotated with notrace, the recursion is no longer an issue.
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      5168ae50
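      From memory, the removed helpers looked roughly like this (treat the
      exact spelling as illustrative): the saved need_resched() state
      decided which flavor of preempt_enable to use on exit:

        static inline int ftrace_preempt_disable(void)
        {
                int resched = need_resched();

                preempt_disable_notrace();
                return resched;
        }

        static inline void ftrace_preempt_enable(int resched)
        {
                if (resched)
                        preempt_enable_no_resched_notrace();
                else
                        preempt_enable_notrace();
        }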
  20. 31 May 2010, 1 commit
    • blktrace: Fix new kernel-doc warnings · 546cf44a
      Authored by Randy Dunlap
      Fix blktrace.c kernel-doc warnings (an example of the fix pattern follows below):
       Warning(kernel/trace/blktrace.c:858): No description found for parameter 'ignore'
       Warning(kernel/trace/blktrace.c:890): No description found for parameter 'ignore'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100529114507.c466fc1e.randy.dunlap@oracle.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      546cf44a
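      The warnings mean a kernel-doc comment omits one of the function's
      parameters. A generic sketch of the fix with a hypothetical probe
      (the actual blktrace functions differ); the @ignore line is the kind
      of description that was missing:

        /**
         * example_trace_probe - hypothetical tracepoint probe
         * @ignore: opaque callback data, unused by this probe
         * @q: request queue being traced
         */
        static void example_trace_probe(void *ignore, struct request_queue *q)
        {
        }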