• J
    ftrace: Fix unregister ftrace_ops accounting · 30fb6aa7
    Jiri Olsa 提交于
    Multiple users of the function tracer can register their functions
    with the ftrace_ops structure. The accounting within ftrace will
    update the counter on each function record that is being traced.
    When the ftrace_ops filtering adds or removes functions, the
    function records will be updated accordingly if the ftrace_ops is
    still registered.
    
    When a ftrace_ops is removed, the counter of the function records,
    that the ftrace_ops traces, are decremented. When they reach zero
    the functions that they represent are modified to stop calling the
    mcount code.
    
    When changes are made, the code is updated via stop_machine() with
    a command passed to the function to tell it what to do. There is an
    ENABLE and DISABLE command that tells the called function to enable
    or disable the functions. But the ENABLE is really a misnomer as it
    should just update the records, as records that have been enabled
    and now have a count of zero should be disabled.
    
    The DISABLE command is used to disable all functions regardless of
    their counter values. This is the big off switch and is not the
    complement of the ENABLE command.
    
    To make matters worse, when a ftrace_ops is unregistered and there
    is another ftrace_ops registered, neither the DISABLE nor the
    ENABLE command are set when calling into the stop_machine() function
    and the records will not be updated to match their counter. A command
    is passed to that function that will update the mcount code to call
    the registered callback directly if it is the only one left. This
    means that the ftrace_ops that is still registered will have its callback
    called by all functions that have been set for it as well as the ftrace_ops
    that was just unregistered.
    
    Here's a way to trigger this bug. Compile the kernel with
    CONFIG_FUNCTION_PROFILER set and with CONFIG_FUNCTION_GRAPH not set:
    
     CONFIG_FUNCTION_PROFILER=y
     # CONFIG_FUNCTION_GRAPH is not set
    
    This will force the function profiler to use the function tracer instead
    of the function graph tracer.
    
      # cd /sys/kernel/debug/tracing
      # echo schedule > set_ftrace_filter
      # echo function > current_tracer
      # cat set_ftrace_filter
     schedule
      # cat trace
     # tracer: nop
     #
     # entries-in-buffer/entries-written: 692/68108025   #P:4
     #
     #                              _-----=> irqs-off
     #                             / _----=> need-resched
     #                            | / _---=> hardirq/softirq
     #                            || / _--=> preempt-depth
     #                            ||| /     delay
     #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
     #              | |       |   ||||       |         |
          kworker/0:2-909   [000] ....   531.235574: schedule <-worker_thread
               <idle>-0     [001] .N..   531.235575: schedule <-cpu_idle
          kworker/0:2-909   [000] ....   531.235597: schedule <-worker_thread
                 sshd-2563  [001] ....   531.235647: schedule <-schedule_hrtimeout_range_clock
    
      # echo 1 > function_profile_enabled
      # echo 0 > function_porfile_enabled
      # cat set_ftrace_filter
     schedule
      # cat trace
     # tracer: function
     #
     # entries-in-buffer/entries-written: 159701/118821262   #P:4
     #
     #                              _-----=> irqs-off
     #                             / _----=> need-resched
     #                            | / _---=> hardirq/softirq
     #                            || / _--=> preempt-depth
     #                            ||| /     delay
     #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
     #              | |       |   ||||       |         |
               <idle>-0     [002] ...1   604.870655: local_touch_nmi <-cpu_idle
               <idle>-0     [002] d..1   604.870655: enter_idle <-cpu_idle
               <idle>-0     [002] d..1   604.870656: atomic_notifier_call_chain <-enter_idle
               <idle>-0     [002] d..1   604.870656: __atomic_notifier_call_chain <-atomic_notifier_call_chain
    
    The same problem could have happened with the trace_probe_ops,
    but they are modified with the set_frace_filter file which does the
    update at closure of the file.
    
    The simple solution is to change ENABLE to UPDATE and call it every
    time an ftrace_ops is unregistered.
    
    Link: http://lkml.kernel.org/r/1323105776-26961-3-git-send-email-jolsa@redhat.com
    
    Cc: stable@vger.kernel.org # 3.0+
    Signed-off-by: NJiri Olsa <jolsa@redhat.com>
    Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
    30fb6aa7
ftrace.c 89.9 KB