1. 01 7月, 2014 1 次提交
    • S
      ftrace: Optimize function graph to be called directly · 79922b80
      Steven Rostedt (Red Hat) 提交于
      Function graph tracing is a bit different than the function tracers, as
      it is processed after either the ftrace_caller or ftrace_regs_caller
      and we only have one place to modify the jump to ftrace_graph_caller,
      the jump needs to happen after the restore of registeres.
      
      The function graph tracer is dependent on the function tracer, where
      even if the function graph tracing is going on by itself, the save and
      restore of registers is still done for function tracing regardless of
      if function tracing is happening, before it calls the function graph
      code.
      
      If there's no function tracing happening, it is possible to just call
      the function graph tracer directly, and avoid the wasted effort to save
      and restore regs for function tracing.
      
      This requires adding new flags to the dyn_ftrace records:
      
        FTRACE_FL_TRAMP
        FTRACE_FL_TRAMP_EN
      
      The first is set if the count for the record is one, and the ftrace_ops
      associated to that record has its own trampoline. That way the mcount code
      can call that trampoline directly.
      
      In the future, trampolines can be added to arbitrary ftrace_ops, where you
      can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
      and if they are not tracing the same functions, then instead of doing a
      loop to check all registered ftrace_ops against their hashes, just call the
      ftrace_ops trampoline directly, which would call the registered ftrace_ops
      function directly.
      
      Without this patch perf showed:
      
        0.05%  hackbench  [kernel.kallsyms]  [k] ftrace_caller
        0.05%  hackbench  [kernel.kallsyms]  [k] arch_local_irq_save
        0.05%  hackbench  [kernel.kallsyms]  [k] native_sched_clock
        0.04%  hackbench  [kernel.kallsyms]  [k] __buffer_unlock_commit
        0.04%  hackbench  [kernel.kallsyms]  [k] preempt_trace
        0.04%  hackbench  [kernel.kallsyms]  [k] prepare_ftrace_return
        0.04%  hackbench  [kernel.kallsyms]  [k] __this_cpu_preempt_check
        0.04%  hackbench  [kernel.kallsyms]  [k] ftrace_graph_caller
      
      See that the ftrace_caller took up more time than the ftrace_graph_caller
      did.
      
      With this patch:
      
        0.05%  hackbench  [kernel.kallsyms]  [k] __buffer_unlock_commit
        0.04%  hackbench  [kernel.kallsyms]  [k] call_filter_check_discard
        0.04%  hackbench  [kernel.kallsyms]  [k] ftrace_graph_caller
        0.04%  hackbench  [kernel.kallsyms]  [k] sched_clock
      
      The ftrace_caller is no where to be found and ftrace_graph_caller still
      takes up the same percentage.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      79922b80
  2. 30 6月, 2014 2 次提交
    • S
      ftrace: Add ftrace_rec_counter() macro to simplify the code · 0376bde1
      Steven Rostedt (Red Hat) 提交于
      The ftrace dynamic record has a flags element that also has a counter.
      Instead of hard coding "rec->flags & ~FTRACE_FL_MASK" all over the
      place. Use a macro instead.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0376bde1
    • S
      ftrace: Allow no regs if no more callbacks require it · 4fbb48cb
      Steven Rostedt (Red Hat) 提交于
      When registering a function callback for the function tracer, the ops
      can specify if it wants to save full regs (like an interrupt would)
      for each function that it traces, or if it does not care about regs
      and just wants to have the fastest return possible.
      
      Once a ops has registered a function, if other ops register that
      function they all will receive the regs too. That's because it does
      the work once, it does it for everyone.
      
      Now if the ops wanting regs unregisters the function so that there's
      only ops left that do not care about regs, those ops will still
      continue getting regs and going through the work for it on that
      function. This is because the disabling of the rec counter only
      sees the ops registered, and does not see the ops that are still
      attached, and does not know if the current ops that are still attached
      want regs or not. To play it safe, it just keeps regs being processed
      until no function is registered anymore.
      
      Instead of doing that, check the ops that are still registered for that
      function and if none want regs for it anymore, then disable the
      processing of regs.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      4fbb48cb
  3. 11 6月, 2014 2 次提交
  4. 10 6月, 2014 2 次提交
  5. 07 6月, 2014 1 次提交
    • S
      tracing: Fix memory leak on instance deletion · a9fcaaac
      Steven Rostedt (Red Hat) 提交于
      When an instance is created, it also gets a snapshot ring buffer
      allocated (with minimum of pages). But when it is deleted the snapshot
      buffer is not. There was a helper function added to match the allocation
      of these ring buffers to a way to free them, but it wasn't used by
      the deletion of an instance. Using that helper function solves this
      memory leak.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a9fcaaac
  6. 06 6月, 2014 6 次提交
  7. 05 6月, 2014 1 次提交
  8. 04 6月, 2014 1 次提交
  9. 03 6月, 2014 1 次提交
    • M
      tracing: Print max callstack on stacktrace bug · e3172181
      Minchan Kim 提交于
      While I played with my own feature(ex, something on the way to reclaim),
      the kernel would easily oops. I guessed that the reason had to do with
      stack overflow and wanted to prove it.
      
      I discovered the stack tracer which proved to be very useful for me but
      the kernel would oops before my user program gather the information via
      "watch cat /sys/kernel/debug/tracing/stack_trace" so I couldn't get any
      message from that. What I needed was to have the stack tracer emit the
      kernel stack usage before it does the oops so I could find what was
      hogging the stack.
      
      This patch shows the callstack of max stack usage right before an oops so
      we can find a culprit.
      
      So, the result is as follows.
      
      [ 1116.522206] init: lightdm main process (1246) terminated with status 1
      [ 1119.922916] init: failsafe-x main process (1272) terminated with status 1
      [ 3887.728131] kworker/u24:1 (6637) used greatest stack depth: 256 bytes left
      [ 6397.629227] cc1 (9554) used greatest stack depth: 128 bytes left
      [ 7174.467392]         Depth    Size   Location    (47 entries)
      [ 7174.467392]         -----    ----   --------
      [ 7174.467785]   0)     7248     256   get_page_from_freelist+0xa7/0x920
      [ 7174.468506]   1)     6992     352   __alloc_pages_nodemask+0x1cd/0xb20
      [ 7174.469224]   2)     6640       8   alloc_pages_current+0x10f/0x1f0
      [ 7174.469413]   3)     6632     168   new_slab+0x2c5/0x370
      [ 7174.469413]   4)     6464       8   __slab_alloc+0x3a9/0x501
      [ 7174.469413]   5)     6456      80   __kmalloc+0x1cb/0x200
      [ 7174.469413]   6)     6376     376   vring_add_indirect+0x36/0x200
      [ 7174.469413]   7)     6000     144   virtqueue_add_sgs+0x2e2/0x320
      [ 7174.469413]   8)     5856     288   __virtblk_add_req+0xda/0x1b0
      [ 7174.469413]   9)     5568      96   virtio_queue_rq+0xd3/0x1d0
      [ 7174.469413]  10)     5472     128   __blk_mq_run_hw_queue+0x1ef/0x440
      [ 7174.469413]  11)     5344      16   blk_mq_run_hw_queue+0x35/0x40
      [ 7174.469413]  12)     5328      96   blk_mq_insert_requests+0xdb/0x160
      [ 7174.469413]  13)     5232     112   blk_mq_flush_plug_list+0x12b/0x140
      [ 7174.469413]  14)     5120     112   blk_flush_plug_list+0xc7/0x220
      [ 7174.469413]  15)     5008      64   io_schedule_timeout+0x88/0x100
      [ 7174.469413]  16)     4944     128   mempool_alloc+0x145/0x170
      [ 7174.469413]  17)     4816      96   bio_alloc_bioset+0x10b/0x1d0
      [ 7174.469413]  18)     4720      48   get_swap_bio+0x30/0x90
      [ 7174.469413]  19)     4672     160   __swap_writepage+0x150/0x230
      [ 7174.469413]  20)     4512      32   swap_writepage+0x42/0x90
      [ 7174.469413]  21)     4480     320   shrink_page_list+0x676/0xa80
      [ 7174.469413]  22)     4160     208   shrink_inactive_list+0x262/0x4e0
      [ 7174.469413]  23)     3952     304   shrink_lruvec+0x3e1/0x6a0
      [ 7174.469413]  24)     3648      80   shrink_zone+0x3f/0x110
      [ 7174.469413]  25)     3568     128   do_try_to_free_pages+0x156/0x4c0
      [ 7174.469413]  26)     3440     208   try_to_free_pages+0xf7/0x1e0
      [ 7174.469413]  27)     3232     352   __alloc_pages_nodemask+0x783/0xb20
      [ 7174.469413]  28)     2880       8   alloc_pages_current+0x10f/0x1f0
      [ 7174.469413]  29)     2872     200   __page_cache_alloc+0x13f/0x160
      [ 7174.469413]  30)     2672      80   find_or_create_page+0x4c/0xb0
      [ 7174.469413]  31)     2592      80   ext4_mb_load_buddy+0x1e9/0x370
      [ 7174.469413]  32)     2512     176   ext4_mb_regular_allocator+0x1b7/0x460
      [ 7174.469413]  33)     2336     128   ext4_mb_new_blocks+0x458/0x5f0
      [ 7174.469413]  34)     2208     256   ext4_ext_map_blocks+0x70b/0x1010
      [ 7174.469413]  35)     1952     160   ext4_map_blocks+0x325/0x530
      [ 7174.469413]  36)     1792     384   ext4_writepages+0x6d1/0xce0
      [ 7174.469413]  37)     1408      16   do_writepages+0x23/0x40
      [ 7174.469413]  38)     1392      96   __writeback_single_inode+0x45/0x2e0
      [ 7174.469413]  39)     1296     176   writeback_sb_inodes+0x2ad/0x500
      [ 7174.469413]  40)     1120      80   __writeback_inodes_wb+0x9e/0xd0
      [ 7174.469413]  41)     1040     160   wb_writeback+0x29b/0x350
      [ 7174.469413]  42)      880     208   bdi_writeback_workfn+0x11c/0x480
      [ 7174.469413]  43)      672     144   process_one_work+0x1d2/0x570
      [ 7174.469413]  44)      528     112   worker_thread+0x116/0x370
      [ 7174.469413]  45)      416     240   kthread+0xf3/0x110
      [ 7174.469413]  46)      176     176   ret_from_fork+0x7c/0xb0
      [ 7174.469413] ------------[ cut here ]------------
      [ 7174.469413] kernel BUG at kernel/trace/trace_stack.c:174!
      [ 7174.469413] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ 7174.469413] Dumping ftrace buffer:
      [ 7174.469413]    (ftrace buffer empty)
      [ 7174.469413] Modules linked in:
      [ 7174.469413] CPU: 0 PID: 440 Comm: kworker/u24:0 Not tainted 3.14.0+ #212
      [ 7174.469413] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [ 7174.469413] Workqueue: writeback bdi_writeback_workfn (flush-253:0)
      [ 7174.469413] task: ffff880034170000 ti: ffff880029518000 task.ti: ffff880029518000
      [ 7174.469413] RIP: 0010:[<ffffffff8112336e>]  [<ffffffff8112336e>] stack_trace_call+0x2de/0x340
      [ 7174.469413] RSP: 0000:ffff880029518290  EFLAGS: 00010046
      [ 7174.469413] RAX: 0000000000000030 RBX: 000000000000002f RCX: 0000000000000000
      [ 7174.469413] RDX: 0000000000000000 RSI: 000000000000002f RDI: ffffffff810b7159
      [ 7174.469413] RBP: ffff8800295182f0 R08: ffffffffffffffff R09: 0000000000000000
      [ 7174.469413] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff82768dfc
      [ 7174.469413] R13: 000000000000f2e8 R14: ffff8800295182b8 R15: 00000000000000f8
      [ 7174.469413] FS:  0000000000000000(0000) GS:ffff880037c00000(0000) knlGS:0000000000000000
      [ 7174.469413] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 7174.469413] CR2: 00002acd0b994000 CR3: 0000000001c0b000 CR4: 00000000000006f0
      [ 7174.469413] Stack:
      [ 7174.469413]  0000000000000000 ffffffff8114fdb7 0000000000000087 0000000000001c50
      [ 7174.469413]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [ 7174.469413]  0000000000000002 ffff880034170000 ffff880034171028 0000000000000000
      [ 7174.469413] Call Trace:
      [ 7174.469413]  [<ffffffff8114fdb7>] ? get_page_from_freelist+0xa7/0x920
      [ 7174.469413]  [<ffffffff816eee3f>] ftrace_call+0x5/0x2f
      [ 7174.469413]  [<ffffffff81165065>] ? next_zones_zonelist+0x5/0x70
      [ 7174.469413]  [<ffffffff810a23fa>] ? __bfs+0x11a/0x270
      [ 7174.469413]  [<ffffffff81165065>] ? next_zones_zonelist+0x5/0x70
      [ 7174.469413]  [<ffffffff8114fdb7>] ? get_page_from_freelist+0xa7/0x920
      [ 7174.469413]  [<ffffffff8119092f>] ? alloc_pages_current+0x10f/0x1f0
      [ 7174.469413]  [<ffffffff811507fd>] __alloc_pages_nodemask+0x1cd/0xb20
      [ 7174.469413]  [<ffffffff810a4de6>] ? check_irq_usage+0x96/0xe0
      [ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
      [ 7174.469413]  [<ffffffff8119092f>] alloc_pages_current+0x10f/0x1f0
      [ 7174.469413]  [<ffffffff81199cd5>] ? new_slab+0x2c5/0x370
      [ 7174.469413]  [<ffffffff81199cd5>] new_slab+0x2c5/0x370
      [ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
      [ 7174.469413]  [<ffffffff816db002>] __slab_alloc+0x3a9/0x501
      [ 7174.469413]  [<ffffffff8119af8b>] ? __kmalloc+0x1cb/0x200
      [ 7174.469413]  [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200
      [ 7174.469413]  [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200
      [ 7174.469413]  [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200
      [ 7174.469413]  [<ffffffff8119af8b>] __kmalloc+0x1cb/0x200
      [ 7174.469413]  [<ffffffff8141de10>] ? vring_add_indirect+0x200/0x200
      [ 7174.469413]  [<ffffffff8141dc46>] vring_add_indirect+0x36/0x200
      [ 7174.469413]  [<ffffffff8141e402>] virtqueue_add_sgs+0x2e2/0x320
      [ 7174.469413]  [<ffffffff8148e35a>] __virtblk_add_req+0xda/0x1b0
      [ 7174.469413]  [<ffffffff8148e503>] virtio_queue_rq+0xd3/0x1d0
      [ 7174.469413]  [<ffffffff8134aa0f>] __blk_mq_run_hw_queue+0x1ef/0x440
      [ 7174.469413]  [<ffffffff8134b0d5>] blk_mq_run_hw_queue+0x35/0x40
      [ 7174.469413]  [<ffffffff8134b7bb>] blk_mq_insert_requests+0xdb/0x160
      [ 7174.469413]  [<ffffffff8134be5b>] blk_mq_flush_plug_list+0x12b/0x140
      [ 7174.469413]  [<ffffffff81342237>] blk_flush_plug_list+0xc7/0x220
      [ 7174.469413]  [<ffffffff816e60ef>] ? _raw_spin_unlock_irqrestore+0x3f/0x70
      [ 7174.469413]  [<ffffffff816e16e8>] io_schedule_timeout+0x88/0x100
      [ 7174.469413]  [<ffffffff816e1665>] ? io_schedule_timeout+0x5/0x100
      [ 7174.469413]  [<ffffffff81149415>] mempool_alloc+0x145/0x170
      [ 7174.469413]  [<ffffffff8109baf0>] ? __init_waitqueue_head+0x60/0x60
      [ 7174.469413]  [<ffffffff811e246b>] bio_alloc_bioset+0x10b/0x1d0
      [ 7174.469413]  [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
      [ 7174.469413]  [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
      [ 7174.469413]  [<ffffffff81184110>] get_swap_bio+0x30/0x90
      [ 7174.469413]  [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
      [ 7174.469413]  [<ffffffff81184660>] __swap_writepage+0x150/0x230
      [ 7174.469413]  [<ffffffff810ab405>] ? do_raw_spin_unlock+0x5/0xa0
      [ 7174.469413]  [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
      [ 7174.469413]  [<ffffffff81184515>] ? __swap_writepage+0x5/0x230
      [ 7174.469413]  [<ffffffff81184782>] swap_writepage+0x42/0x90
      [ 7174.469413]  [<ffffffff8115ae96>] shrink_page_list+0x676/0xa80
      [ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
      [ 7174.469413]  [<ffffffff8115b872>] shrink_inactive_list+0x262/0x4e0
      [ 7174.469413]  [<ffffffff8115c1c1>] shrink_lruvec+0x3e1/0x6a0
      [ 7174.469413]  [<ffffffff8115c4bf>] shrink_zone+0x3f/0x110
      [ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
      [ 7174.469413]  [<ffffffff8115c9e6>] do_try_to_free_pages+0x156/0x4c0
      [ 7174.469413]  [<ffffffff8115cf47>] try_to_free_pages+0xf7/0x1e0
      [ 7174.469413]  [<ffffffff81150db3>] __alloc_pages_nodemask+0x783/0xb20
      [ 7174.469413]  [<ffffffff8119092f>] alloc_pages_current+0x10f/0x1f0
      [ 7174.469413]  [<ffffffff81145c0f>] ? __page_cache_alloc+0x13f/0x160
      [ 7174.469413]  [<ffffffff81145c0f>] __page_cache_alloc+0x13f/0x160
      [ 7174.469413]  [<ffffffff81146c6c>] find_or_create_page+0x4c/0xb0
      [ 7174.469413]  [<ffffffff811463e5>] ? find_get_page+0x5/0x130
      [ 7174.469413]  [<ffffffff812837b9>] ext4_mb_load_buddy+0x1e9/0x370
      [ 7174.469413]  [<ffffffff81284c07>] ext4_mb_regular_allocator+0x1b7/0x460
      [ 7174.469413]  [<ffffffff81281070>] ? ext4_mb_use_preallocated+0x40/0x360
      [ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
      [ 7174.469413]  [<ffffffff81287eb8>] ext4_mb_new_blocks+0x458/0x5f0
      [ 7174.469413]  [<ffffffff8127d83b>] ext4_ext_map_blocks+0x70b/0x1010
      [ 7174.469413]  [<ffffffff8124e6d5>] ext4_map_blocks+0x325/0x530
      [ 7174.469413]  [<ffffffff81253871>] ext4_writepages+0x6d1/0xce0
      [ 7174.469413]  [<ffffffff812531a0>] ? ext4_journalled_write_end+0x330/0x330
      [ 7174.469413]  [<ffffffff811539b3>] do_writepages+0x23/0x40
      [ 7174.469413]  [<ffffffff811d2365>] __writeback_single_inode+0x45/0x2e0
      [ 7174.469413]  [<ffffffff811d36ed>] writeback_sb_inodes+0x2ad/0x500
      [ 7174.469413]  [<ffffffff811d39de>] __writeback_inodes_wb+0x9e/0xd0
      [ 7174.469413]  [<ffffffff811d40bb>] wb_writeback+0x29b/0x350
      [ 7174.469413]  [<ffffffff81057c3d>] ? __local_bh_enable_ip+0x6d/0xd0
      [ 7174.469413]  [<ffffffff811d6e9c>] bdi_writeback_workfn+0x11c/0x480
      [ 7174.469413]  [<ffffffff81070610>] ? process_one_work+0x170/0x570
      [ 7174.469413]  [<ffffffff81070672>] process_one_work+0x1d2/0x570
      [ 7174.469413]  [<ffffffff81070610>] ? process_one_work+0x170/0x570
      [ 7174.469413]  [<ffffffff81071bb6>] worker_thread+0x116/0x370
      [ 7174.469413]  [<ffffffff81071aa0>] ? manage_workers.isra.19+0x2e0/0x2e0
      [ 7174.469413]  [<ffffffff81078e53>] kthread+0xf3/0x110
      [ 7174.469413]  [<ffffffff81078d60>] ? flush_kthread_worker+0x150/0x150
      [ 7174.469413]  [<ffffffff816ef0ec>] ret_from_fork+0x7c/0xb0
      [ 7174.469413]  [<ffffffff81078d60>] ? flush_kthread_worker+0x150/0x150
      [ 7174.469413] Code: c0 49 bc fc 8d 76 82 ff ff ff ff e8 44 5a 5b 00 31 f6 8b 05 95 2b b3 00 48 39 c6 7d 0e 4c 8b 04 f5 20 5f c5 81 49 83 f8 ff 75 11 <0f> 0b 48 63 05 71 5a 64 01 48 29 c3 e9 d0 fd ff ff 48 8d 5e 01
      [ 7174.469413] RIP  [<ffffffff8112336e>] stack_trace_call+0x2de/0x340
      [ 7174.469413]  RSP <ffff880029518290>
      [ 7174.469413] ---[ end trace c97d325b36b718f3 ]---
      
      Link: http://lkml.kernel.org/p/1401683592-1651-1-git-send-email-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e3172181
  10. 31 5月, 2014 1 次提交
  11. 30 5月, 2014 4 次提交
    • S
      tracing: Try again for saved cmdline if failed due to locking · 379cfdac
      Steven Rostedt (Red Hat) 提交于
      In order to prevent the saved cmdline cache from being filled when
      tracing is not active, the comms are only recorded after a trace event
      is recorded.
      
      The problem is, a comm can fail to be recorded if the trace_cmdline_lock
      is held. That lock is taken via a trylock to allow it to happen from
      any context (including NMI). If the lock fails to be taken, the comm
      is skipped. No big deal, as we will try again later.
      
      But! Because of the code that was added to only record after an event,
      we may not try again later as the recording is made as a oneshot per
      event per CPU.
      
      Only disable the recording of the comm if the comm is actually recorded.
      
      Fixes: 7ffbd48d "tracing: Cache comms only after an event occurred"
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      379cfdac
    • Y
      tracing: Have saved_cmdlines use the seq_read infrastructure · 42584c81
      Yoshihiro YUNOMAE 提交于
      Current tracing_saved_cmdlines_read() implementation is naive; It allocates
      a large buffer, constructs output data to that buffer for each read
      operation, and then copies a portion of the buffer to the user space
      buffer. This has several issues such as slow memory allocation, high
      CPU usage, and even corruption of the output data.
      
      The seq_read infrastructure is made to handle this type of work.
      By converting it to use seq_read() the code becomes smaller, simplified,
      as well as correct.
      
      Link: http://lkml.kernel.org/p/20140220084431.3839.51793.stgit@yunodevelSigned-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NYoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      42584c81
    • S
      tracing: Add tracepoint benchmark tracepoint · 81dc9f0e
      Steven Rostedt (Red Hat) 提交于
      In order to help benchmark the time tracepoints take, a new config
      option is added called CONFIG_TRACEPOINT_BENCHMARK. When this option
      is set a tracepoint is created called "benchmark:benchmark_event".
      When the tracepoint is enabled, it kicks off a kernel thread that
      goes into an infinite loop (calling cond_sched() to let other tasks
      run), and calls the tracepoint. Each iteration will record the time
      it took to write to the tracepoint and the next iteration that
      data will be passed to the tracepoint itself. That is, the tracepoint
      will report the time it took to do the previous tracepoint.
      The string written to the tracepoint is a static string of 128 bytes
      to keep the time the same. The initial string is simply a write of
      "START". The second string records the cold cache time of the first
      write which is not added to the rest of the calculations.
      
      As it is a tight loop, it benchmarks as hot cache. That's fine because
      we care most about hot paths that are probably in cache already.
      
      An example of the output:
      
           START
           first=3672 [COLD CACHED]
           last=632 first=3672 max=632 min=632 avg=316 std=446 std^2=199712
           last=278 first=3672 max=632 min=278 avg=303 std=316 std^2=100337
           last=277 first=3672 max=632 min=277 avg=296 std=258 std^2=67064
           last=273 first=3672 max=632 min=273 avg=292 std=224 std^2=50411
           last=273 first=3672 max=632 min=273 avg=288 std=200 std^2=40389
           last=281 first=3672 max=632 min=273 avg=287 std=183 std^2=33666
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      81dc9f0e
    • S
      tracing: Print nasty banner when trace_printk() is in use · 2184db46
      Steven Rostedt 提交于
      trace_printk() is used to debug fast paths within the kernel. Places
      that gets called in any context (interrupt or NMI) or thousands of
      times a second. Something you do not want to do with a printk().
      
      In order to make it completely lockless as it needs a temporary buffer
      to handle some of the string formatting, a page is created per cpu for
      every context (four per cpu; normal, softirq, irq, NMI).
      
      Since trace_printk() should only be used for debugging purposes,
      there's no reason to waste memory on these buffers on a production
      system. That means, trace_printk() should never be used unless a
      developer is debugging their kernel. There's macro magic to allocate
      the buffers if trace_printk() is used anywhere in the kernel.
      
      To help enforce that trace_printk() isn't used outside of development,
      when it is used, a nasty banner is displayed on bootup (or when a module
      is loaded that uses trace_printk() and the kernel core does not).
      
      Here's the banner:
      
       **********************************************************
       **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
       **                                                      **
       ** trace_printk() being used. Allocating extra memory.  **
       **                                                      **
       ** This means that this is a DEBUG kernel and it is     **
       ** unsafe for produciton use.                           **
       **                                                      **
       ** If you see this message and you are not debugging    **
       ** the kernel, report this immediately to your vendor!  **
       **                                                      **
       **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
       **********************************************************
      
      That should hopefully keep developers from trying to sneak in a
      trace_printk() or two.
      
      Link: http://lkml.kernel.org/p/20140528131440.2283213c@gandalf.local.homeSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2184db46
  12. 21 5月, 2014 2 次提交
  13. 15 5月, 2014 1 次提交
    • S
      tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks · 4449bf92
      Steven Rostedt (Red Hat) 提交于
      Being able to show a cpumask of events can be useful as some events
      may affect only some CPUs. There is no standard way to record the
      cpumask and converting it to a string is rather expensive during
      the trace as traces happen in hotpaths. It would be better to record
      the raw event mask and be able to parse it at print time.
      
      The following macros were added for use with the TRACE_EVENT() macro:
      
        __bitmask()
        __assign_bitmask()
        __get_bitmask()
      
      To test this, I added this to the sched_migrate_task event, which
      looked like this:
      
      TRACE_EVENT(sched_migrate_task,
      
      	TP_PROTO(struct task_struct *p, int dest_cpu, const struct cpumask *cpus),
      
      	TP_ARGS(p, dest_cpu, cpus),
      
      	TP_STRUCT__entry(
      		__array(	char,	comm,	TASK_COMM_LEN	)
      		__field(	pid_t,	pid			)
      		__field(	int,	prio			)
      		__field(	int,	orig_cpu		)
      		__field(	int,	dest_cpu		)
      		__bitmask(	cpumask, num_possible_cpus()	)
      	),
      
      	TP_fast_assign(
      		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
      		__entry->pid		= p->pid;
      		__entry->prio		= p->prio;
      		__entry->orig_cpu	= task_cpu(p);
      		__entry->dest_cpu	= dest_cpu;
      		__assign_bitmask(cpumask, cpumask_bits(cpus), num_possible_cpus());
      	),
      
      	TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d cpumask=%s",
      		  __entry->comm, __entry->pid, __entry->prio,
      		  __entry->orig_cpu, __entry->dest_cpu,
      		  __get_bitmask(cpumask))
      );
      
      With the output of:
      
              ksmtuned-3613  [003] d..2   485.220508: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=3 dest_cpu=2 cpumask=00000000,0000000f
           migration/1-13    [001] d..5   485.221202: sched_migrate_task: comm=ksmtuned pid=3614 prio=120 orig_cpu=1 dest_cpu=0 cpumask=00000000,0000000f
                   awk-3615  [002] d.H5   485.221747: sched_migrate_task: comm=rcu_preempt pid=7 prio=120 orig_cpu=0 dest_cpu=1 cpumask=00000000,000000ff
           migration/2-18    [002] d..5   485.222062: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=2 dest_cpu=3 cpumask=00000000,0000000f
      
      Link: http://lkml.kernel.org/r/1399377998-14870-6-git-send-email-javi.merino@arm.com
      Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.homeSuggested-by: NJavi Merino <javi.merino@arm.com>
      Tested-by: NJavi Merino <javi.merino@arm.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      4449bf92
  14. 14 5月, 2014 6 次提交
    • S
      ftrace: Remove FTRACE_UPDATE_MODIFY_CALL_REGS flag · f1b2f2bd
      Steven Rostedt (Red Hat) 提交于
      As the decision to what needs to be done (converting a call to the
      ftrace_caller to ftrace_caller_regs or to convert from ftrace_caller_regs
      to ftrace_caller) can easily be determined from the rec->flags of
      FTRACE_FL_REGS and FTRACE_FL_REGS_EN, there's no need to have the
      ftrace_check_record() return either a UPDATE_MODIFY_CALL_REGS or a
      UPDATE_MODIFY_CALL. Just he latter is enough. This added flag causes
      more complexity than is required. Remove it.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f1b2f2bd
    • S
      ftrace: Use the ftrace_addr helper functions to find the ftrace_addr · 7c0868e0
      Steven Rostedt (Red Hat) 提交于
      With the moving of the functions that determine what the mcount call site
      should be replaced with into the generic code, there is a few places
      in the generic code that can use them instead of hard coding it as it
      does.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      7c0868e0
    • S
      ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global · 7413af1f
      Steven Rostedt (Red Hat) 提交于
      Move and rename get_ftrace_addr() and get_ftrace_addr_old() to
      ftrace_get_addr_new() and ftrace_get_addr_curr() respectively.
      
      This moves these two helper functions in the generic code out from
      the arch specific code, and renames them to have a better generic
      name. This will allow other archs to use them as well as makes it
      a bit easier to work on getting separate trampolines for different
      functions.
      
      ftrace_get_addr_new() returns the trampoline address that the mcount
      call address will be converted to.
      
      ftrace_get_addr_curr() returns the trampoline address of what the
      mcount call address currently jumps to.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      7413af1f
    • S
      ftrace: Always inline ftrace_hash_empty() helper function · 68f40969
      Steven Rostedt (Red Hat) 提交于
      The ftrace_hash_empty() function is a simple test:
      
      	return !hash || !hash->count;
      
      But gcc seems to want to make it a call. As this is in an extreme
      hot path of the function tracer, there's no reason it needs to be
      a call. I only wrote it to be a helper function anyway, otherwise
      it would have been inlined manually.
      
      Force gcc to inline it, as it could have also been a macro.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      68f40969
    • S
      ftrace: Write in missing comment from a very old commit · 19eab4a4
      Steven Rostedt (Red Hat) 提交于
      Back in 2011 Commit ed926f9b "ftrace: Use counters to enable
      functions to trace" changed the way ftrace accounts for enabled
      and disabled traced functions. There was a comment started as:
      
      	/*
      	 *
      	 */
      
      But never finished. Well, that's rather useless. I probably forgot
      to save the file before committing it. And it passed review from all
      this time.
      
      Anyway, better late than never. I updated the comment to express what
      is happening in that somewhat complex code.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      19eab4a4
    • S
      ftrace: Remove boolean of hash_enable and hash_disable · 66209a5b
      Steven Rostedt (Red Hat) 提交于
      Commit 4104d326 "ftrace: Remove global function list and call
      function directly" cleaned up the global_ops filtering and made
      the code simpler, but it left a variable "hash_enable" that was used
      to know if the hash functions should be updated or not. It was
      updated if the global_ops did not override them. As the global_ops
      are now no different than any other ftrace_ops, the hash always
      gets updated and there's no reason to use the hash_enable boolean.
      
      The same goes for hash_disable used in ftrace_shutdown().
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      66209a5b
  15. 06 5月, 2014 1 次提交
  16. 03 5月, 2014 1 次提交
    • S
      tracing: Use rcu_dereference_sched() for trace event triggers · 561a4fe8
      Steven Rostedt (Red Hat) 提交于
      As trace event triggers are now part of the mainline kernel, I added
      my trace event trigger tests to my test suite I run on all my kernels.
      Now these tests get run under different config options, and one of
      those options is CONFIG_PROVE_RCU, which checks under lockdep that
      the rcu locking primitives are being used correctly. This triggered
      the following splat:
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      3.15.0-rc2-test+ #11 Not tainted
      -------------------------------
      kernel/trace/trace_events_trigger.c:80 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 0
      4 locks held by swapper/1/0:
       #0:  ((&(&j_cdbs->work)->timer)){..-...}, at: [<ffffffff8104d2cc>] call_timer_fn+0x5/0x1be
       #1:  (&(&pool->lock)->rlock){-.-...}, at: [<ffffffff81059856>] __queue_work+0x140/0x283
       #2:  (&p->pi_lock){-.-.-.}, at: [<ffffffff8106e961>] try_to_wake_up+0x2e/0x1e8
       #3:  (&rq->lock){-.-.-.}, at: [<ffffffff8106ead3>] try_to_wake_up+0x1a0/0x1e8
      
      stack backtrace:
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc2-test+ #11
      Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
       0000000000000001 ffff88007e083b98 ffffffff819f53a5 0000000000000006
       ffff88007b0942c0 ffff88007e083bc8 ffffffff81081307 ffff88007ad96d20
       0000000000000000 ffff88007af2d840 ffff88007b2e701c ffff88007e083c18
      Call Trace:
       <IRQ>  [<ffffffff819f53a5>] dump_stack+0x4f/0x7c
       [<ffffffff81081307>] lockdep_rcu_suspicious+0x107/0x110
       [<ffffffff810ee51c>] event_triggers_call+0x99/0x108
       [<ffffffff810e8174>] ftrace_event_buffer_commit+0x42/0xa4
       [<ffffffff8106aadc>] ftrace_raw_event_sched_wakeup_template+0x71/0x7c
       [<ffffffff8106bcbf>] ttwu_do_wakeup+0x7f/0xff
       [<ffffffff8106bd9b>] ttwu_do_activate.constprop.126+0x5c/0x61
       [<ffffffff8106eadf>] try_to_wake_up+0x1ac/0x1e8
       [<ffffffff8106eb77>] wake_up_process+0x36/0x3b
       [<ffffffff810575cc>] wake_up_worker+0x24/0x26
       [<ffffffff810578bc>] insert_work+0x5c/0x65
       [<ffffffff81059982>] __queue_work+0x26c/0x283
       [<ffffffff81059999>] ? __queue_work+0x283/0x283
       [<ffffffff810599b7>] delayed_work_timer_fn+0x1e/0x20
       [<ffffffff8104d3a6>] call_timer_fn+0xdf/0x1be^M
       [<ffffffff8104d2cc>] ? call_timer_fn+0x5/0x1be
       [<ffffffff81059999>] ? __queue_work+0x283/0x283
       [<ffffffff8104d823>] run_timer_softirq+0x1a4/0x22f^M
       [<ffffffff8104696d>] __do_softirq+0x17b/0x31b^M
       [<ffffffff81046d03>] irq_exit+0x42/0x97
       [<ffffffff81a08db6>] smp_apic_timer_interrupt+0x37/0x44
       [<ffffffff81a07a2f>] apic_timer_interrupt+0x6f/0x80
       <EOI>  [<ffffffff8100a5d8>] ? default_idle+0x21/0x32
       [<ffffffff8100a5d6>] ? default_idle+0x1f/0x32
       [<ffffffff8100ac10>] arch_cpu_idle+0xf/0x11
       [<ffffffff8107b3a4>] cpu_startup_entry+0x1a3/0x213
       [<ffffffff8102a23c>] start_secondary+0x212/0x219
      
      The cause is that the triggers are protected by rcu_read_lock_sched() but
      the data is dereferenced with rcu_dereference() which expects it to
      be protected with rcu_read_lock(). The proper reference should be
      rcu_dereference_sched().
      
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: stable@vger.kernel.org # 3.14+
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      561a4fe8
  17. 02 5月, 2014 1 次提交
  18. 01 5月, 2014 2 次提交
  19. 30 4月, 2014 2 次提交
  20. 28 4月, 2014 1 次提交
    • S
      ftrace/module: Hardcode ftrace_module_init() call into load_module() · a949ae56
      Steven Rostedt (Red Hat) 提交于
      A race exists between module loading and enabling of function tracer.
      
      	CPU 1				CPU 2
      	-----				-----
        load_module()
         module->state = MODULE_STATE_COMING
      
      				register_ftrace_function()
      				 mutex_lock(&ftrace_lock);
      				 ftrace_startup()
      				  update_ftrace_function();
      				   ftrace_arch_code_modify_prepare()
      				    set_all_module_text_rw();
      				   <enables-ftrace>
      				    ftrace_arch_code_modify_post_process()
      				     set_all_module_text_ro();
      
      				[ here all module text is set to RO,
      				  including the module that is
      				  loading!! ]
      
         blocking_notifier_call_chain(MODULE_STATE_COMING);
          ftrace_init_module()
      
           [ tries to modify code, but it's RO, and fails!
             ftrace_bug() is called]
      
      When this race happens, ftrace_bug() will produces a nasty warning and
      all of the function tracing features will be disabled until reboot.
      
      The simple solution is to treate module load the same way the core
      kernel is treated at boot. To hardcode the ftrace function modification
      of converting calls to mcount into nops. This is done in init/main.c
      there's no reason it could not be done in load_module(). This gives
      a better control of the changes and doesn't tie the state of the
      module to its notifiers as much. Ftrace is special, it needs to be
      treated as such.
      
      The reason this would work, is that the ftrace_module_init() would be
      called while the module is in MODULE_STATE_UNFORMED, which is ignored
      by the set_all_module_text_ro() call.
      
      Link: http://lkml.kernel.org/r/1395637826-3312-1-git-send-email-indou.takao@jp.fujitsu.comReported-by: NTakao Indoh <indou.takao@jp.fujitsu.com>
      Acked-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: stable@vger.kernel.org # 2.6.38+
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a949ae56
  21. 25 4月, 2014 1 次提交