1. 26 Nov 2013 (1 commit)
    • ftrace: Fix function graph with loading of modules · 8a56d776
      Authored by Steven Rostedt (Red Hat)
      Commit 8c4f3c3f "ftrace: Check module functions being traced on reload"
      fixed module loading and unloading with respect to function tracing, but
      it missed the function graph tracer. If you perform the following
      
       # cd /sys/kernel/debug/tracing
       # echo function_graph > current_tracer
       # modprobe nfsd
       # echo nop > current_tracer
      
      You'll get the following oops message:
      
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9()
       Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt
       CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
        0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000
        0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668
        ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000
       Call Trace:
        [<ffffffff814fe193>] dump_stack+0x4f/0x7c
        [<ffffffff8103b80a>] warn_slowpath_common+0x81/0x9b
        [<ffffffff810b2b9a>] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9
        [<ffffffff8103b83e>] warn_slowpath_null+0x1a/0x1c
        [<ffffffff810b2b9a>] __ftrace_hash_rec_update.part.35+0x168/0x1b9
        [<ffffffff81502f89>] ? __mutex_lock_slowpath+0x364/0x364
        [<ffffffff810b2cc2>] ftrace_shutdown+0xd7/0x12b
        [<ffffffff810b47f0>] unregister_ftrace_graph+0x49/0x78
        [<ffffffff810c4b30>] graph_trace_reset+0xe/0x10
        [<ffffffff810bf393>] tracing_set_tracer+0xa7/0x26a
        [<ffffffff810bf5e1>] tracing_set_trace_write+0x8b/0xbd
        [<ffffffff810c501c>] ? ftrace_return_to_handler+0xb2/0xde
        [<ffffffff811240a8>] ? __sb_end_write+0x5e/0x5e
        [<ffffffff81122aed>] vfs_write+0xab/0xf6
        [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
        [<ffffffff81122dbd>] SyS_write+0x59/0x82
        [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
        [<ffffffff8150a2d2>] system_call_fastpath+0x16/0x1b
       ---[ end trace 940358030751eafb ]---
      
      The above-mentioned commit didn't go far enough. It covered the function
      tracer by adding checks in __register_ftrace_function(), but the function
      graph tracer circumvents that path (a shortcut taken for a slight
      efficiency gain when the function graph tracer runs together with a
      function tracer; the gain was not worth this bug).
      
      The root of the problem is ftrace_startup(), which should always be called
      after __register_ftrace_function() if this bug is to be completely fixed.
      
      Anyway, this solution moves __register_ftrace_function() inside of
      ftrace_startup() and removes the need to call them both.
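
      For reference, a simplified sketch of the resulting shape (error paths and
      the command handling are elided; this is not the complete kernel code):

        /* registration now happens inside ftrace_startup() itself */
        static int ftrace_startup(struct ftrace_ops *ops, int command)
        {
                int ret;

                if (unlikely(ftrace_disabled))
                        return -ENODEV;

                ret = __register_ftrace_function(ops);
                if (ret)
                        return ret;

                /* ... bump record ref counts and enable the trampoline ... */
                return 0;
        }

        /* register_ftrace_function() and the function graph registration now
         * only call ftrace_startup(); there is no separate
         * __register_ftrace_function() call left to forget. */
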
      Reported-by: Dave Wysochanski <dwysocha@redhat.com>
      Fixes: ed926f9b ("ftrace: Use counters to enable functions to trace")
      Cc: stable@vger.kernel.org # 3.0+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  2. 12 Nov 2013 (1 commit)
  3. 11 Nov 2013 (1 commit)
  4. 09 Nov 2013 (1 commit)
  5. 08 Nov 2013 (1 commit)
    • blktrace: Send BLK_TN_PROCESS events to all running traces · a404d557
      Authored by Jan Kara
      Currently each task sends BLK_TN_PROCESS event to the first traced
      device it interacts with after a new trace is started. When there are
      several traced devices and the task accesses more devices, this logic
      can result in BLK_TN_PROCESS being sent several times to some devices
      while it is never sent to other devices. Thus blkparse doesn't display
      command name when parsing some blktrace files.
      
      Fix the problem by sending BLK_TN_PROCESS event to all traced devices
      when a task interacts with any of them.
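
      A rough sketch of that idea, assuming a global list of running traces
      protected by a spinlock (the list and lock names here are illustrative):

        static void trace_note_tsk(struct task_struct *tsk)
        {
                unsigned long flags;
                struct blk_trace *bt;

                tsk->btrace_seq = blktrace_seq;
                spin_lock_irqsave(&running_trace_lock, flags);
                /* tell every running trace about this task, not just one */
                list_for_each_entry(bt, &running_trace_list, running_list)
                        trace_note(bt, tsk->pid, BLK_TN_PROCESS, tsk->comm,
                                   sizeof(tsk->comm));
                spin_unlock_irqrestore(&running_trace_lock, flags);
        }
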
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. 07 Nov 2013 (5 commits)
  7. 06 Nov 2013 (6 commits)
    • tracing: Open tracer when ftrace_dump_on_oops is used · b2f974d6
      Authored by Cody P Schafer
      With ftrace_dump_on_oops, we previously did not open the tracer in
      question, sometimes causing the trace output to be useless.
      
      For example, the function_graph tracer with tracing_thresh set dumped via
      ftrace_dump_on_oops would show a series of '}' indented at different levels,
      but no function names.
      
      Call trace->open() (and do a few other fixups copied from the normal dump
      path) to make the output more intelligible.
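
      Conceptually the dump path gains something like the following (a
      simplified sketch, not the literal diff; "iter" here stands for the
      simulated iterator the dump code builds):

        /* give the tracer the same open/close hooks a normal read would get */
        if (iter.trace->open)
                iter.trace->open(&iter);

        /* ... walk and print the buffer ... */

        if (iter.trace->close)
                iter.trace->close(&iter);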
      
      Link: http://lkml.kernel.org/r/1382554197-16961-1-git-send-email-cody@linux.vnet.ibm.com
      Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add support for SOFT_DISABLE to syscall events · d562aff9
      Authored by Tom Zanussi
      The original SOFT_DISABLE patches didn't add support for soft disable
      of syscall events; this adds it.
      
      Add an array of ftrace_event_file pointers indexed by syscall number
      to the trace array and remove the existing enabled bitmaps, which as a
      result are now redundant.  The ftrace_event_file structs in turn
      contain the soft disable flags we need for per-syscall soft disable
      accounting.
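
      Roughly, the trace array grows per-syscall file pointers along these
      lines (a sketch; the field names are illustrative, following the
      description above):

        struct trace_array {
                /* ... existing fields ... */
        #ifdef CONFIG_FTRACE_SYSCALLS
                struct ftrace_event_file *enter_syscall_files[NR_syscalls];
                struct ftrace_event_file *exit_syscall_files[NR_syscalls];
        #endif
        };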
      
      Adding ftrace_event_files also means we can remove the USE_CALL_FILTER
      bit, thus enabling multibuffer filter support for syscall events.
      
      Link: http://lkml.kernel.org/r/6e72b566e85d8df8042f133efbc6c30e21fb017e.1382620672.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Make register/unregister_ftrace_command __init · 38de93ab
      Authored by Tom Zanussi
      register/unregister_ftrace_command() are only ever called from __init
      functions, so can themselves be made __init.
      
      Also make register_snapshot_cmd() __init for the same reason.
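
      This is the usual __init pattern; a minimal sketch (not the exact kernel
      code, and the ftrace_snapshot_cmd name is assumed here for illustration):

        /* only ever called from __init context, so the text can be discarded
         * after boot */
        __init int register_ftrace_command(struct ftrace_func_command *cmd)
        {
                /* ... add cmd to the list of ftrace commands ... */
                return 0;
        }

        static __init int register_snapshot_cmd(void)
        {
                return register_ftrace_command(&ftrace_snapshot_cmd);
        }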
      
      Link: http://lkml.kernel.org/r/d4042c8cadb7ae6f843ac9a89a24e1c6a3099727.1382620672.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Update event filters for multibuffer · f306cc82
      Authored by Tom Zanussi
      The trace event filters are still tied to event calls rather than
      event files, which means you don't get what you'd expect when using
      filters in the multibuffer case:
      
      Before:
      
        # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # mkdir /sys/kernel/debug/tracing/instances/test1
        # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 2048
        # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        bytes_alloc > 2048
      
      Setting the filter in tracing/instances/test1/events shouldn't affect
      the same event in tracing/events as it does above.
      
      After:
      
        # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # mkdir /sys/kernel/debug/tracing/instances/test1
        # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        bytes_alloc > 2048
      
      We'd like to just move the filter directly from ftrace_event_call to
      ftrace_event_file, but there are a couple cases that don't yet have
      multibuffer support and therefore have to continue using the current
      event_call-based filters.  For those cases, a new USE_CALL_FILTER bit
      is added to the event_call flags, whose main purpose is to keep the
      old behavior for those cases until they can be updated with
      multibuffer support; at that point, the USE_CALL_FILTER flag (and the
      new associated call_filter_check_discard() function) can go away.
      
      The multibuffer support also made filter_current_check_discard()
      redundant, so this change removes that function as well and replaces
      it with filter_check_discard() (or call_filter_check_discard() as
      appropriate).
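
      Conceptually, a commit path now picks one of the two discard helpers
      (a simplified sketch; real callers use one helper or the other directly
      rather than branching at runtime, and the exact flag macro name is an
      assumption based on the USE_CALL_FILTER bit described above):

        if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
                /* legacy path: the filter still hangs off the event call */
                discard = call_filter_check_discard(call, entry, buffer, event);
        else
                /* multibuffer path: the filter lives on the per-instance file */
                discard = filter_check_discard(file, entry, buffer, event);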
      
      Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Have control op function callback only trace when RCU is watching · b5aa3a47
      Authored by Steven Rostedt (Red Hat)
      Dave Jones reported that trinity would be able to trigger the following
      back trace:
      
       ===============================
       [ INFO: suspicious RCU usage. ]
       3.10.0-rc2+ #38 Not tainted
       -------------------------------
       include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
       other info that might help us debug this:
      
       RCU used illegally from idle CPU!  rcu_scheduler_active = 1, debug_locks = 0
       RCU used illegally from extended quiescent state!
       1 lock held by trinity-child1/18786:
        #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8113dd48>] __perf_event_overflow+0x108/0x310
       stack backtrace:
       CPU: 3 PID: 18786 Comm: trinity-child1 Not tainted 3.10.0-rc2+ #38
        0000000000000000 ffff88020767bac8 ffffffff816e2f6b ffff88020767baf8
        ffffffff810b5897 ffff88021de92520 0000000000000000 ffff88020767bbf8
        0000000000000000 ffff88020767bb78 ffffffff8113ded4 ffffffff8113dd48
       Call Trace:
        [<ffffffff816e2f6b>] dump_stack+0x19/0x1b
        [<ffffffff810b5897>] lockdep_rcu_suspicious+0xe7/0x120
        [<ffffffff8113ded4>] __perf_event_overflow+0x294/0x310
        [<ffffffff8113dd48>] ? __perf_event_overflow+0x108/0x310
        [<ffffffff81309289>] ? __const_udelay+0x29/0x30
        [<ffffffff81076054>] ? __rcu_read_unlock+0x54/0xa0
        [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
        [<ffffffff8113dfa1>] perf_swevent_overflow+0x51/0xe0
        [<ffffffff8113e08f>] perf_swevent_event+0x5f/0x90
        [<ffffffff8113e1c9>] perf_tp_event+0x109/0x4f0
        [<ffffffff8113e36f>] ? perf_tp_event+0x2af/0x4f0
        [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
        [<ffffffff8112d79f>] perf_ftrace_function_call+0xbf/0xd0
        [<ffffffff8110e1e1>] ? ftrace_ops_control_func+0x181/0x210
        [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
        [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
        [<ffffffff8110e1e1>] ftrace_ops_control_func+0x181/0x210
        [<ffffffff816f4000>] ftrace_call+0x5/0x2f
        [<ffffffff8110e229>] ? ftrace_ops_control_func+0x1c9/0x210
        [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
        [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
        [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
        [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
        [<ffffffff8110112a>] rcu_eqs_enter+0x6a/0xb0
        [<ffffffff81103673>] rcu_user_enter+0x13/0x20
        [<ffffffff8114541a>] user_enter+0x6a/0xd0
        [<ffffffff8100f6d8>] syscall_trace_leave+0x78/0x140
        [<ffffffff816f46af>] int_check_syscall_exit_work+0x34/0x3d
       ------------[ cut here ]------------
      
      Perf uses rcu_read_lock() but as the function tracer can trace functions
      even when RCU is not currently active, this makes the rcu_read_lock()
      used by perf ineffective.
      
      As perf is currently the only user of the ftrace_ops_control_func() and
      perf is also the only function callback that actively uses rcu_read_lock(),
      the quick fix is to prevent the ftrace_ops_control_func() from calling
      its callbacks if RCU is not active.
      
      With Paul's new "rcu_is_watching()" we can tell if RCU is active or not.
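
      The added guard amounts to something like this at the top of the control
      callback (a sketch with a simplified signature; the rest of the function
      is unchanged):

        static void ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip,
                                            struct ftrace_ops *op, struct pt_regs *regs)
        {
                /* perf's callback relies on rcu_read_lock(), which does nothing
                 * useful while this CPU is in an RCU-idle state */
                if (!rcu_is_watching())
                        return;

                /* ... walk the control ops list and call their callbacks ... */
        }
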
      Reported-by: Dave Jones <davej@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • trace/trace_stat: use rbtree postorder iteration helper instead of opencoding · 9cd804ac
      Authored by Cody P Schafer
      Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
      of open-coding an alternate postorder iteration that modifies the tree.
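
      With the helper, the teardown reduces to roughly the following (a sketch,
      assuming the stat entries embed their rb_node in a field named "node" and
      the root is called stat_root):

        struct stat_node *snode, *n;

        rbtree_postorder_for_each_entry_safe(snode, n, &session->stat_root, node)
                kfree(snode);
        session->stat_root = RB_ROOT;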
      
      Link: http://lkml.kernel.org/r/1383345566-25087-2-git-send-email-cody@linux.vnet.ibm.com
      Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  8. 19 Oct 2013 (5 commits)
    • ftrace: Add set_graph_notrace filter · 29ad23b0
      Authored by Namhyung Kim
      The set_graph_notrace filter is analogous to set_ftrace_notrace and
      can be used for eliminating uninteresting part of function graph trace
      output.  It also works with set_graph_function nicely.
      
        # cd /sys/kernel/debug/tracing/
        # echo do_page_fault > set_graph_function
        # perf ftrace live true
         2)               |  do_page_fault() {
         2)               |    __do_page_fault() {
         2)   0.381 us    |      down_read_trylock();
         2)   0.055 us    |      __might_sleep();
         2)   0.696 us    |      find_vma();
         2)               |      handle_mm_fault() {
         2)               |        handle_pte_fault() {
         2)               |          __do_fault() {
         2)               |            filemap_fault() {
         2)               |              find_get_page() {
         2)   0.033 us    |                __rcu_read_lock();
         2)   0.035 us    |                __rcu_read_unlock();
         2)   1.696 us    |              }
         2)   0.031 us    |              __might_sleep();
         2)   2.831 us    |            }
         2)               |            _raw_spin_lock() {
         2)   0.046 us    |              add_preempt_count();
         2)   0.841 us    |            }
         2)   0.033 us    |            page_add_file_rmap();
         2)               |            _raw_spin_unlock() {
         2)   0.057 us    |              sub_preempt_count();
         2)   0.568 us    |            }
         2)               |            unlock_page() {
         2)   0.084 us    |              page_waitqueue();
         2)   0.126 us    |              __wake_up_bit();
         2)   1.117 us    |            }
         2)   7.729 us    |          }
         2)   8.397 us    |        }
         2)   8.956 us    |      }
         2)   0.085 us    |      up_read();
         2) + 12.745 us   |    }
         2) + 13.401 us   |  }
        ...
      
        # echo handle_mm_fault > set_graph_notrace
        # perf ftrace live true
         1)               |  do_page_fault() {
         1)               |    __do_page_fault() {
         1)   0.205 us    |      down_read_trylock();
         1)   0.041 us    |      __might_sleep();
         1)   0.344 us    |      find_vma();
         1)   0.069 us    |      up_read();
         1)   4.692 us    |    }
         1)   5.311 us    |  }
        ...
      
      Link: http://lkml.kernel.org/r/1381739066-7531-5-git-send-email-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Narrow down the protected area of graph_lock · 6a10108b
      Authored by Namhyung Kim
      The parser setup is just a generic utility that uses local variables
      allocated by the function. There's no need to hold the graph_lock for
      this setup.
      
      This also makes the code simpler.
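
      In other words, the write path can do its purely local parser setup before
      taking the lock; a simplified sketch (helper and variable names here are
      recalled from the surrounding ftrace code and may not match exactly):

        struct trace_parser parser;
        ssize_t read, ret = 0;

        /* local setup: no shared state is touched yet */
        if (trace_parser_get_init(&parser, FTRACE_BUFF_MAX))
                return -ENOMEM;

        read = trace_get_user(&parser, ubuf, cnt, ppos);
        if (read >= 0 && trace_parser_loaded(&parser)) {
                mutex_lock(&graph_lock);
                /* only the shared set_graph_function table needs graph_lock */
                ret = ftrace_set_func(ftrace_graph_funcs, &ftrace_graph_count,
                                      parser.buffer);
                mutex_unlock(&graph_lock);
        }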
      
      Link: http://lkml.kernel.org/r/1381739066-7531-4-git-send-email-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Introduce struct ftrace_graph_data · faf982a6
      Authored by Namhyung Kim
      The struct ftrace_graph_data generalizes access to the set_graph_function
      file.  This is a preparation for adding support for set_graph_notrace.
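
      The structure bundles what the open/read/write handlers need to know
      about the filter being operated on; roughly (a sketch based on the
      description above):

        struct ftrace_graph_data {
                unsigned long                *table;    /* filter table to operate on */
                size_t                       size;      /* number of entries in it */
                int                          *count;    /* how many entries are set */
                const struct seq_operations *seq_ops;   /* how to show it in the file */
        };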
      
      Link: http://lkml.kernel.org/r/1381739066-7531-3-git-send-email-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Get rid of ftrace_graph_filter_enabled · 9aa72b4b
      Authored by Namhyung Kim
      The ftrace_graph_filter_enabled flag means that the user has set a
      function filter; it always has the same meaning as ftrace_graph_count > 0,
      so it is redundant and can be removed.
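
      The test can therefore be expressed purely in terms of the count; a
      sketch of the resulting check (the table name is illustrative):

        static inline int ftrace_graph_addr(unsigned long addr)
        {
                int i;

                /* an empty table (count == 0) means "trace everything" */
                if (!ftrace_graph_count)
                        return 1;

                for (i = 0; i < ftrace_graph_count; i++) {
                        if (ftrace_graph_funcs[i] == addr)
                                return 1;
                }
                return 0;
        }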
      
      Link: http://lkml.kernel.org/r/1381739066-7531-2-git-send-email-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Fix potential out-of-bounds in trace_get_user() · 057db848
      Authored by Steven Rostedt
      Andrey reported the following:
      
      ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3
      ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3)
      Accessed by thread T13003:
        #0 ffffffff810dd2da (asan_report_error+0x32a/0x440)
        #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40)
        #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20)
        #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260)
        #4 ffffffff812a1065 (__fput+0x155/0x360)
        #5 ffffffff812a12de (____fput+0x1e/0x30)
        #6 ffffffff8111708d (task_work_run+0x10d/0x140)
        #7 ffffffff810ea043 (do_exit+0x433/0x11f0)
        #8 ffffffff810eaee4 (do_group_exit+0x84/0x130)
        #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30)
        #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b)
      
      Allocated by thread T5167:
        #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0)
        #1 ffffffff8128337c (__kmalloc+0xbc/0x500)
        #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90)
        #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0)
        #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40)
        #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430)
        #6 ffffffff8129b668 (finish_open+0x68/0xa0)
        #7 ffffffff812b66ac (do_last+0xb8c/0x1710)
        #8 ffffffff812b7350 (path_openat+0x120/0xb50)
        #9 ffffffff812b8884 (do_filp_open+0x54/0xb0)
        #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0)
        #11 ffffffff8129d4b7 (SyS_open+0x37/0x50)
        #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b)
      
      Shadow bytes around the buggy address:
        ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
        ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
        ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      =>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb
        ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
        ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
      Shadow byte legend (one shadow byte represents 8 application bytes):
        Addressable:           00
        Partially addressable: 01 02 03 04 05 06 07
        Heap redzone:          fa
        Heap kmalloc redzone:  fb
        Freed heap region:     fd
        Shadow gap:            fe
      
      The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;'
      
      Although the crash happened in ftrace_regex_open() the real bug
      occurred in trace_get_user() where there's an incrementation to
      parser->idx without a check against the size. The way it is triggered
      is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop
      that reads the last character stores it and then breaks out because
      there are no more characters. Then the last character is read to determine
      what to do next, and the index is incremented without checking size.
      
      Then the caller of trace_get_user() usually nulls out the last character
      with a zero, but since the index is equal to the size, it writes a nul
      character after the allocated space, which can corrupt memory.
      
      Luckily, only root user has write access to this file.
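
      The fix is essentially a stricter bound in the copy loop of
      trace_get_user(), along these lines (sketch):

        /* leave room for the terminating nul the caller will write */
        if (parser->idx < parser->size - 1)
                parser->buffer[parser->idx++] = ch;
        else {
                ret = -EINVAL;
                goto out;
        }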
      
      Link: http://lkml.kernel.org/r/20131009222323.04fd1a0d@gandalf.local.home
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  9. 10 Oct 2013 (1 commit)
  10. 04 Sep 2013 (1 commit)
    • ftrace: Fix a slight race in modifying what function callback gets traced · 59338f75
      Authored by Steven Rostedt (Red Hat)
      There's a slight race when going from a list function to a non-list
      function. That is, when only one callback is registered to the function
      tracer, it gets called directly by the mcount trampoline. But if this
      callback has filters, it may be called from functions it should not trace.
      
      The list ops callback, which handles the case of multiple callbacks
      registered with ftrace, also checks which functions each callback should
      be called for. While the transaction is taking place, always use the list
      function, and only after all the updates are finished (only the functions
      that should be traced are being traced) update the trampoline to call
      the function directly.
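
      Conceptually (illustrative pseudocode only, not the exact kernel change):

        /* during the update, every traced function goes through the list
         * function, which re-checks each callback's hash on every call */
        ftrace_trace_function = ftrace_ops_list_func;

        /* ... move the filter hashes / per-record flags into place ... */

        /* only once the records are consistent is the direct call installed */
        ftrace_trace_function = ops->func;
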
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  11. 23 Aug 2013 (1 commit)
  12. 22 Aug 2013 (4 commits)
  13. 03 Aug 2013 (3 commits)
  14. 02 Aug 2013 (1 commit)
  15. 01 Aug 2013 (3 commits)
    • tracing/kprobes: Fail to unregister if probe event files are in use · 40c32592
      Authored by Steven Rostedt (Red Hat)
      When a probe is being removed, it cleans up the event files that correspond
      to the probe. But there is a race between writing to one of these files
      and deleting the probe. This is especially true for the "enable" file.
      
      	CPU 0				CPU 1
      	-----				-----
      
      				  fd = open("enable",O_WRONLY);
      
        probes_open()
        release_all_trace_probes()
        unregister_trace_probe()
        if (trace_probe_is_enabled(tp))
      	return -EBUSY
      
      				   write(fd, "1", 1)
      				   __ftrace_set_clr_event()
      				   call->class->reg()
      				    (kprobe_register)
      				     enable_trace_probe(tp)
      
        __unregister_trace_probe(tp);
        list_del(&tp->list)
        unregister_probe_event(tp) <-- fails!
        free_trace_probe(tp)
      
      				   write(fd, "0", 1)
      				   __ftrace_set_clr_event()
      				   call->class->unreg
      				    (kprobe_register)
      				    disable_trace_probe(tp) <-- BOOM!
      
      A test program was written that used two threads to simulate the
      above scenario adding a nanosleep() interval to change the timings
      and after several thousand runs, it was able to trigger this bug
      and crash:
      
      BUG: unable to handle kernel paging request at 00000005000000f9
      IP: [<ffffffff810dee70>] probes_open+0x3b/0xa7
      PGD 7808a067 PUD 0
      Oops: 0000 [#1] PREEMPT SMP
      Dumping ftrace buffer:
      ---------------------------------
      Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6
      CPU: 1 PID: 2070 Comm: test-kprobe-rem Not tainted 3.11.0-rc3-test+ #47
      Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
      task: ffff880077756440 ti: ffff880076e52000 task.ti: ffff880076e52000
      RIP: 0010:[<ffffffff810dee70>]  [<ffffffff810dee70>] probes_open+0x3b/0xa7
      RSP: 0018:ffff880076e53c38  EFLAGS: 00010203
      RAX: 0000000500000001 RBX: ffff88007844f440 RCX: 0000000000000003
      RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff880076e52000
      RBP: ffff880076e53c58 R08: ffff880076e53bd8 R09: 0000000000000000
      R10: ffff880077756440 R11: 0000000000000006 R12: ffffffff810dee35
      R13: ffff880079250418 R14: 0000000000000000 R15: ffff88007844f450
      FS:  00007f87a276f700(0000) GS:ffff88007d480000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00000005000000f9 CR3: 0000000077262000 CR4: 00000000000007e0
      Stack:
       ffff880076e53c58 ffffffff81219ea0 ffff88007844f440 ffffffff810dee35
       ffff880076e53ca8 ffffffff81130f78 ffff8800772986c0 ffff8800796f93a0
       ffffffff81d1b5d8 ffff880076e53e04 0000000000000000 ffff88007844f440
      Call Trace:
       [<ffffffff81219ea0>] ? security_file_open+0x2c/0x30
       [<ffffffff810dee35>] ? unregister_trace_probe+0x4b/0x4b
       [<ffffffff81130f78>] do_dentry_open+0x162/0x226
       [<ffffffff81131186>] finish_open+0x46/0x54
       [<ffffffff8113f30b>] do_last+0x7f6/0x996
       [<ffffffff8113cc6f>] ? inode_permission+0x42/0x44
       [<ffffffff8113f6dd>] path_openat+0x232/0x496
       [<ffffffff8113fc30>] do_filp_open+0x3a/0x8a
       [<ffffffff8114ab32>] ? __alloc_fd+0x168/0x17a
       [<ffffffff81131f4e>] do_sys_open+0x70/0x102
       [<ffffffff8108f06e>] ? trace_hardirqs_on_caller+0x160/0x197
       [<ffffffff81131ffe>] SyS_open+0x1e/0x20
       [<ffffffff81522742>] system_call_fastpath+0x16/0x1b
      Code: e5 41 54 53 48 89 f3 48 83 ec 10 48 23 56 78 48 39 c2 75 6c 31 f6 48 c7
      RIP  [<ffffffff810dee70>] probes_open+0x3b/0xa7
       RSP <ffff880076e53c38>
      CR2: 00000005000000f9
      ---[ end trace 35f17d68fc569897 ]---
      
      The unregister_trace_probe() must be done first, and if it fails it must
      fail the removal of the kprobe.
      
      Several changes have already been made by Oleg Nesterov and Masami Hiramatsu
      to allow moving the unregister_probe_event() before the removal of
      the probe and exit the function if it fails. This prevents the tp
      structure from being used after it is freed.
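
      A sketch of the resulting ordering on the removal side (simplified; the
      helper names mirror the scenario above rather than the exact final code):

        static int unregister_trace_probe(struct trace_probe *tp)
        {
                /* enabled means an "enable" file write may be in flight */
                if (trace_probe_is_enabled(tp))
                        return -EBUSY;

                /* fails while the probe's event files are still in use */
                if (unregister_probe_event(tp) < 0)
                        return -EBUSY;

                __unregister_trace_probe(tp);
                list_del(&tp->list);
                return 0;
        }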
      
      Link: http://lkml.kernel.org/r/20130704034038.819592356@goodmis.org
      Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add comment to describe special break case in probe_remove_event_call() · 2ba64035
      Authored by Steven Rostedt (Red Hat)
      The "break" used in the do_for_each_event_file() is used as an optimization
      as the loop is really a double loop. The loop searches all event files
      for each trace_array. There's only one matching event file per trace_array
      and after we find the event file for the trace_array, the break is used
      to jump to the next trace_array and start the search there.
      
      As this is not a standard way of using "break" in C code, it requires
      a comment right before the break to let people know what is going on.
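
      The pattern being documented looks roughly like this:

        do_for_each_event_file(tr, file) {
                if (file->event_call != call)
                        continue;
                /* ... tear down this trace_array's event file ... */
                /*
                 * do_for_each_event_file() is a double loop and there is only
                 * one matching file per trace_array, so break out to the outer
                 * loop and continue with the next trace_array.
                 */
                break;
        } while_for_each_event_file();
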
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: trace_remove_event_call() should fail if call/file is in use · 2816c551
      Authored by Oleg Nesterov
      Change trace_remove_event_call(call) to return the error if this
      call is active. This is what the callers assume but can't verify
      outside of the tracing locks. Both trace_kprobe.c/trace_uprobe.c
      need the additional changes, unregister_trace_probe() should abort
      if trace_remove_event_call() fails.
      
      The caller is going to free this call/file so we must ensure that
      nobody can use them after trace_remove_event_call() succeeds.
      debugfs should be fine after the previous changes and event_remove()
      does TRACE_REG_UNREGISTER, but still there are 2 reasons why we need
      the additional checks:
      
      - There could be a perf_event(s) attached to this tp_event, so the
        patch checks ->perf_refcount.
      
      - TRACE_REG_UNREGISTER can be suppressed by FTRACE_EVENT_FL_SOFT_MODE,
        so we simply check FTRACE_EVENT_FL_ENABLED protected by event_mutex
        (both checks are sketched below).
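
      A brief sketch of those two in-use checks, shown out of context (the
      perf_refcount field and the enabled flag are named as in the description
      above; the surrounding function is omitted):

        #ifdef CONFIG_PERF_EVENTS
                if (call->perf_refcount)
                        return -EBUSY;  /* a perf event is still attached */
        #endif
                if (file->flags & FTRACE_EVENT_FL_ENABLED)
                        return -EBUSY;  /* SOFT_MODE may suppress TRACE_REG_UNREGISTER */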
      
      Link: http://lkml.kernel.org/r/20130729175033.GB26284@redhat.com
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  16. 31 Jul 2013 (1 commit)
    • ftrace: Check module functions being traced on reload · 8c4f3c3f
      Authored by Steven Rostedt (Red Hat)
      There's been a nasty bug that would show up and not give much info.
      The bug displayed the following warning:
      
       WARNING: at kernel/trace/ftrace.c:1529 __ftrace_hash_rec_update+0x1e3/0x230()
       Pid: 20903, comm: bash Tainted: G           O 3.6.11+ #38405.trunk
       Call Trace:
        [<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0
        [<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff810c2ee3>] __ftrace_hash_rec_update+0x1e3/0x230
        [<ffffffff810c4f28>] ftrace_hash_move+0x28/0x1d0
        [<ffffffff811401cc>] ? kfree+0x2c/0x110
        [<ffffffff810c68ee>] ftrace_regex_release+0x8e/0x150
        [<ffffffff81149f1e>] __fput+0xae/0x220
        [<ffffffff8114a09e>] ____fput+0xe/0x10
        [<ffffffff8105fa22>] task_work_run+0x72/0x90
        [<ffffffff810028ec>] do_notify_resume+0x6c/0xc0
        [<ffffffff8126596e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
        [<ffffffff815c0f88>] int_signal+0x12/0x17
       ---[ end trace 793179526ee09b2c ]---
      
      It was finally narrowed down to unloading a module that was being traced.
      
      It was actually more than that. When functions are being traced, there's
      a table of all functions that have a ref count of the number of active
      tracers attached to that function. When a function trace callback is
      registered to a function, the function's record ref count is incremented.
      When it is unregistered, the function's record ref count is decremented.
      If an inconsistency is detected (ref count goes below zero) the above
      warning is shown and the function tracing is permanently disabled until
      reboot.
      
      The ftrace callback ops holds a hash of functions that it filters on
      (and/or filters off). If the hash is empty, the default means to filter
      all functions (for the filter_hash) or to disable no functions (for the
      notrace_hash).
      
      When a module is unloaded, it frees the function records that represent
      the module functions. These records exist on their own pages, that is
      function records for one module will not exist on the same page as
      function records for other modules or even the core kernel.
      
      Now when a module unloads, the records that represents its functions are
      freed. When the module is loaded again, the records are recreated with
      a default ref count of zero (unless there's a callback that traces all
      functions, then they will also be traced, and the ref count will be
      incremented).
      
      The problem is that if an ftrace callback hash includes functions of the
      module being unloaded, those hash entries will not be removed. If the
      module is reloaded in the same location, the hash entries still point
      to the functions of the module but the module's ref counts do not reflect
      that.
      
      With the help of Steve and Joern, we found a reproducer:
      
       Using uinput module and uinput_release function.
      
       cd /sys/kernel/debug/tracing
       modprobe uinput
       echo uinput_release > set_ftrace_filter
       echo function > current_tracer
       rmmod uinput
       modprobe uinput
       # check /proc/modules to see if loaded in same addr, otherwise try again
       echo nop > current_tracer
      
       [BOOM]
      
      The above loads the uinput module, which creates a table of functions that
      can be traced within the module.
      
      We add uinput_release to the filter_hash to trace just that function.
      
      Enable function tracing, which increments the ref count of the record
      associated to uinput_release.
      
      Remove uinput, which frees the records including the one that represents
      uinput_release.
      
      Load the uinput module again (and make sure it's at the same address).
      This recreates the function records all with a ref count of zero,
      including uinput_release.
      
      Disable function tracing, which will decrement the ref count for uinput_release
      which is now zero because of the module removal and reload, and we have
      a mismatch (below zero ref count).
      
      The solution is to check all currently tracing ftrace callbacks to see if any
      are tracing any of the module's functions when a module is loaded (it already does
      that with callbacks that trace all functions). If a callback happens to have
      a module function being traced, it increments that record's ref count and
      starts tracing that function.
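
      A sketch of the load-time check (helper names here are illustrative; the
      point is to seed each new record's ref count from the callbacks that are
      already registered):

        static int referenced_filters(struct dyn_ftrace *rec)
        {
                struct ftrace_ops *ops;
                int cnt = 0;

                /* see whether any registered callback's filter hash matches
                 * this (module) function's address */
                for (ops = ftrace_ops_list; ops != &ftrace_list_end; ops = ops->next) {
                        if (ops_references_rec(ops, rec))
                                cnt++;
                }
                return cnt;
        }

        /* when the module's records are created, rec->flags is seeded with this
         * count and the function is enabled for tracing if the count is non-zero */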
      
      There may be a strange side effect with this, where tracing module functions
      on unload and then reloading a new module may have that new module's functions
      being traced. This may be something that confuses the user, but it's not
      a big deal. Another approach is to disable all callback hashes on module unload,
      but this leaves some ftrace callbacks that may not be registered, but can
      still have hashes tracing the module's function where ftrace doesn't know about
      it. That situation can cause the same bug. This solution solves that case too.
      Another benefit of this solution is that it makes it possible to trace a
      module's functions across unload and reload.
      
      Link: http://lkml.kernel.org/r/20130705142629.GA325@redhat.com
      Reported-by: Jörn Engel <joern@logfs.org>
      Reported-by: Dave Jones <davej@redhat.com>
      Reported-by: Steve Hodgson <steve@purestorage.com>
      Tested-by: Steve Hodgson <steve@purestorage.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  17. 30 Jul 2013 (4 commits)