1. 04 5月, 2017 2 次提交
  2. 01 5月, 2017 1 次提交
    • S
      ring-buffer: Return reader page back into existing ring buffer · 73a757e6
      Steven Rostedt (VMware) 提交于
      When reading the ring buffer for consuming, it is optimized for splice,
      where a page is taken out of the ring buffer (zero copy) and sent to the
      reading consumer. When the read is finished with the page, it calls
      ring_buffer_free_read_page(), which simply frees the page. The next time the
      reader needs to get a page from the ring buffer, it must call
      ring_buffer_alloc_read_page() which allocates and initializes a reader page
      for the ring buffer to be swapped into the ring buffer for a new filled page
      for the reader.
      
      The problem is that there's no reason to actually free the page when it is
      passed back to the ring buffer. It can hold it off and reuse it for the next
      iteration. This completely removes the interaction with the page_alloc
      mechanism.
      
      Using the trace-cmd utility to record all events (causing trace-cmd to
      require reading lots of pages from the ring buffer, and calling
      ring_buffer_alloc/free_read_page() several times), and also assigning a
      stack trace trigger to the mm_page_alloc event, we can see how many times
      the ring_buffer_alloc_read_page() needed to allocate a page for the ring
      buffer.
      
      Before this change:
      
        # trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1
        # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
        9968
      
      After this change:
      
        # trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1
        # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
        4
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      73a757e6
  3. 21 4月, 2017 25 次提交
  4. 20 4月, 2017 2 次提交
    • S
      ring-buffer: Have ring_buffer_iter_empty() return true when empty · 78f7a45d
      Steven Rostedt (VMware) 提交于
      I noticed that reading the snapshot file when it is empty no longer gives a
      status. It suppose to show the status of the snapshot buffer as well as how
      to allocate and use it. For example:
      
       ># cat snapshot
       # tracer: nop
       #
       #
       # * Snapshot is allocated *
       #
       # Snapshot commands:
       # echo 0 > snapshot : Clears and frees snapshot buffer
       # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
       #                      Takes a snapshot of the main buffer.
       # echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
       #                      (Doesn't have to be '2' works with any number that
       #                       is not a '0' or '1')
      
      But instead it just showed an empty buffer:
      
       ># cat snapshot
       # tracer: nop
       #
       # entries-in-buffer/entries-written: 0/0   #P:4
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
      
      What happened was that it was using the ring_buffer_iter_empty() function to
      see if it was empty, and if it was, it showed the status. But that function
      was returning false when it was empty. The reason was that the iter header
      page was on the reader page, and the reader page was empty, but so was the
      buffer itself. The check only tested to see if the iter was on the commit
      page, but the commit page was no longer pointing to the reader page, but as
      all pages were empty, the buffer is also.
      
      Cc: stable@vger.kernel.org
      Fixes: 651e22f2 ("ring-buffer: Always reset iterator to reader page")
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      78f7a45d
    • S
      tracing: Allocate the snapshot buffer before enabling probe · df62db5b
      Steven Rostedt (VMware) 提交于
      Currently the snapshot trigger enables the probe and then allocates the
      snapshot. If the probe triggers before the allocation, it could cause the
      snapshot to fail and turn tracing off. It's best to allocate the snapshot
      buffer first, and then enable the trigger. If something goes wrong in the
      enabling of the trigger, the snapshot buffer is still allocated, but it can
      also be freed by the user by writting zero into the snapshot buffer file.
      
      Also add a check of the return status of alloc_snapshot().
      
      Cc: stable@vger.kernel.org
      Fixes: 77fd5c15 ("tracing: Add snapshot trigger to function probes")
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      df62db5b
  5. 19 4月, 2017 1 次提交
    • S
      ftrace: Move the probe function into the tracing directory · ec19b859
      Steven Rostedt (VMware) 提交于
      As nothing outside the tracing directory uses the function probes mechanism,
      I'm moving the prototypes out of the include/linux/ftrace.h and into the
      local kernel/trace/trace.h header. I plan on making them hook to the
      trace_array structure which is local to kernel/trace, and I do not want to
      expose it to the rest of the kernel. This requires that the probe functions
      must also be local to tracing. But luckily nothing else uses them.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      ec19b859
  6. 18 4月, 2017 3 次提交
  7. 17 4月, 2017 1 次提交
    • S
      ftrace: Fix indexing of t_hash_start() from t_next() · fcdc7125
      Steven Rostedt (VMware) 提交于
      t_hash_start() does not increment *pos, where as t_next() must. But when
      t_next() does increment *pos, it must still pass in the original *pos to
      t_hash_start() otherwise it will skip the first instance:
      
       # cd /sys/kernel/debug/tracing
       # echo schedule:traceoff > set_ftrace_filter
       # echo do_IRQ:traceoff > set_ftrace_filter
       # echo call_rcu > set_ftrace_filter
       # cat set_ftrace_filter
      call_rcu
      schedule:traceoff:unlimited
      do_IRQ:traceoff:unlimited
      
      The above called t_hash_start() from t_start() as there was only one
      function (call_rcu), but if we add another function:
      
       # echo xfrm_policy_destroy_rcu >> set_ftrace_filter
       # cat set_ftrace_filter
      call_rcu
      xfrm_policy_destroy_rcu
      do_IRQ:traceoff:unlimited
      
      The "schedule:traceoff" disappears.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      fcdc7125
  8. 16 4月, 2017 1 次提交
    • S
      ftrace: Fix removing of second function probe · acceb72e
      Steven Rostedt (VMware) 提交于
      When two function probes are added to set_ftrace_filter, and then one of
      them is removed, the update to the function locations is not performed, and
      the record keeping of the function states are corrupted, and causes an
      ftrace_bug() to occur.
      
      This is easily reproducable by adding two probes, removing one, and then
      adding it back again.
      
       # cd /sys/kernel/debug/tracing
       # echo schedule:traceoff > set_ftrace_filter
       # echo do_IRQ:traceoff > set_ftrace_filter
       # echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter
       # echo do_IRQ:traceoff > set_ftrace_filter
      
      Causes:
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220
       Modules linked in: [...]
       CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
       Call Trace:
        dump_stack+0x68/0x9f
        __warn+0x111/0x130
        ? trace_irq_work_interrupt+0xa0/0xa0
        warn_slowpath_null+0x1d/0x20
        ftrace_get_addr_curr+0x143/0x220
        ? __fentry__+0x10/0x10
        ftrace_replace_code+0xe3/0x4f0
        ? ftrace_int3_handler+0x90/0x90
        ? printk+0x99/0xb5
        ? 0xffffffff81000000
        ftrace_modify_all_code+0x97/0x110
        arch_ftrace_update_code+0x10/0x20
        ftrace_run_update_code+0x1c/0x60
        ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0
        register_ftrace_function_probe+0x4b6/0x590
        ? ftrace_startup+0x310/0x310
        ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30
        ? update_stack_state+0x88/0x110
        ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
        ? preempt_count_sub+0x18/0xd0
        ? mutex_lock_nested+0x104/0x800
        ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
        ? __unwind_start+0x1c0/0x1c0
        ? _mutex_lock_nest_lock+0x800/0x800
        ftrace_trace_probe_callback.isra.3+0xc0/0x130
        ? func_set_flag+0xe0/0xe0
        ? __lock_acquire+0x642/0x1790
        ? __might_fault+0x1e/0x20
        ? trace_get_user+0x398/0x470
        ? strcmp+0x35/0x60
        ftrace_trace_onoff_callback+0x48/0x70
        ftrace_regex_write.isra.43.part.44+0x251/0x320
        ? match_records+0x420/0x420
        ftrace_filter_write+0x2b/0x30
        __vfs_write+0xd7/0x330
        ? do_loop_readv_writev+0x120/0x120
        ? locks_remove_posix+0x90/0x2f0
        ? do_lock_file_wait+0x160/0x160
        ? __lock_is_held+0x93/0x100
        ? rcu_read_lock_sched_held+0x5c/0xb0
        ? preempt_count_sub+0x18/0xd0
        ? __sb_start_write+0x10a/0x230
        ? vfs_write+0x222/0x240
        vfs_write+0xef/0x240
        SyS_write+0xab/0x130
        ? SyS_read+0x130/0x130
        ? trace_hardirqs_on_caller+0x182/0x280
        ? trace_hardirqs_on_thunk+0x1a/0x1c
        entry_SYSCALL_64_fastpath+0x18/0xad
       RIP: 0033:0x7fe61c157c30
       RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
       RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30
       RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001
       RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700
       R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400
       R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c
        ? trace_hardirqs_off_caller+0xc0/0x110
       ---[ end trace 99fa09b3d9869c2c ]---
       Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150)
      
      Cc: stable@vger.kernel.org
      Fixes: 59df055f ("ftrace: trace different functions with a different tracer")
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      acceb72e
  9. 15 4月, 2017 1 次提交
    • S
      ftrace: Fix removing of second function probe · 82cc4fc2
      Steven Rostedt (VMware) 提交于
      When two function probes are added to set_ftrace_filter, and then one of
      them is removed, the update to the function locations is not performed, and
      the record keeping of the function states are corrupted, and causes an
      ftrace_bug() to occur.
      
      This is easily reproducable by adding two probes, removing one, and then
      adding it back again.
      
       # cd /sys/kernel/debug/tracing
       # echo schedule:traceoff > set_ftrace_filter
       # echo do_IRQ:traceoff > set_ftrace_filter
       # echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter
       # echo do_IRQ:traceoff > set_ftrace_filter
      
      Causes:
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220
       Modules linked in: [...]
       CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
       Call Trace:
        dump_stack+0x68/0x9f
        __warn+0x111/0x130
        ? trace_irq_work_interrupt+0xa0/0xa0
        warn_slowpath_null+0x1d/0x20
        ftrace_get_addr_curr+0x143/0x220
        ? __fentry__+0x10/0x10
        ftrace_replace_code+0xe3/0x4f0
        ? ftrace_int3_handler+0x90/0x90
        ? printk+0x99/0xb5
        ? 0xffffffff81000000
        ftrace_modify_all_code+0x97/0x110
        arch_ftrace_update_code+0x10/0x20
        ftrace_run_update_code+0x1c/0x60
        ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0
        register_ftrace_function_probe+0x4b6/0x590
        ? ftrace_startup+0x310/0x310
        ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30
        ? update_stack_state+0x88/0x110
        ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
        ? preempt_count_sub+0x18/0xd0
        ? mutex_lock_nested+0x104/0x800
        ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
        ? __unwind_start+0x1c0/0x1c0
        ? _mutex_lock_nest_lock+0x800/0x800
        ftrace_trace_probe_callback.isra.3+0xc0/0x130
        ? func_set_flag+0xe0/0xe0
        ? __lock_acquire+0x642/0x1790
        ? __might_fault+0x1e/0x20
        ? trace_get_user+0x398/0x470
        ? strcmp+0x35/0x60
        ftrace_trace_onoff_callback+0x48/0x70
        ftrace_regex_write.isra.43.part.44+0x251/0x320
        ? match_records+0x420/0x420
        ftrace_filter_write+0x2b/0x30
        __vfs_write+0xd7/0x330
        ? do_loop_readv_writev+0x120/0x120
        ? locks_remove_posix+0x90/0x2f0
        ? do_lock_file_wait+0x160/0x160
        ? __lock_is_held+0x93/0x100
        ? rcu_read_lock_sched_held+0x5c/0xb0
        ? preempt_count_sub+0x18/0xd0
        ? __sb_start_write+0x10a/0x230
        ? vfs_write+0x222/0x240
        vfs_write+0xef/0x240
        SyS_write+0xab/0x130
        ? SyS_read+0x130/0x130
        ? trace_hardirqs_on_caller+0x182/0x280
        ? trace_hardirqs_on_thunk+0x1a/0x1c
        entry_SYSCALL_64_fastpath+0x18/0xad
       RIP: 0033:0x7fe61c157c30
       RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
       RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30
       RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001
       RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700
       R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400
       R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c
        ? trace_hardirqs_off_caller+0xc0/0x110
       ---[ end trace 99fa09b3d9869c2c ]---
       Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150)
      
      Cc: stable@vger.kernel.org
      Fixes: 59df055f ("ftrace: trace different functions with a different tracer")
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      82cc4fc2
  10. 12 4月, 2017 1 次提交
  11. 11 4月, 2017 2 次提交
    • S
      rcu/tracing: Add rcu_disabled to denote when rcu_irq_enter() will not work · 03ecd3f4
      Steven Rostedt (VMware) 提交于
      Tracing uses rcu_irq_enter() as a way to make sure that RCU is watching when
      it needs to use rcu_read_lock() and friends. This is because tracing can
      happen as RCU is about to enter user space, or about to go idle, and RCU
      does not watch for RCU read side critical sections as it makes the
      transition.
      
      There is a small location within the RCU infrastructure that rcu_irq_enter()
      itself will not work. If tracing were to occur in that section it will break
      if it tries to use rcu_irq_enter().
      
      Originally, this happens with the stack_tracer, because it will call
      save_stack_trace when it encounters stack usage that is greater than any
      stack usage it had encountered previously. There was a case where that
      happened in the RCU section where rcu_irq_enter() did not work, and lockdep
      complained loudly about it. To fix it, stack tracing added a call to be
      disabled and RCU would disable stack tracing during the critical section
      that rcu_irq_enter() was inoperable. This solution worked, but there are
      other cases that use rcu_irq_enter() and it would be a good idea to let RCU
      give a way to let others know that rcu_irq_enter() will not work. For
      example, in trace events.
      
      Another helpful aspect of this change is that it also moves the per cpu
      variable called in the RCU critical section into a cache locale along with
      other RCU per cpu variables used in that same location.
      
      I'm keeping the stack_trace_disable() code, as that still could be used in
      the future by places that really need to disable it. And since it's only a
      static inline, it wont take up any kernel text if it is not used.
      
      Link: http://lkml.kernel.org/r/20170405093207.404f8deb@gandalf.local.homeAcked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      03ecd3f4
    • S
      tracing: Rename trace_active to disable_stack_tracer and inline its modification · 8aaf1ee7
      Steven Rostedt (VMware) 提交于
      In order to eliminate a function call, make "trace_active" into
      "disable_stack_tracer" and convert stack_tracer_disable() and friends into
      static inline functions.
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      8aaf1ee7