=================================
Using ftrace to hook to functions
=================================

.. Copyright 2017 VMware Inc.
..   Author:   Steven Rostedt <srostedt@goodmis.org>
..  License:   The GNU Free Documentation License, Version 1.2
..               (dual licensed under the GPL v2)

Written for: 4.14

Introduction
============

The ftrace infrastructure was originally created to attach callbacks to the
beginning of functions in order to record and trace the flow of the kernel.
But callbacks to the start of a function can have other use cases, such as
live kernel patching or security monitoring. This document describes how to
use ftrace to implement your own function callbacks.


The ftrace context
==================

WARNING: The ability to add a callback to almost any function within the
kernel comes with risks. A callback can be called from any context
(normal, softirq, irq, and NMI). Callbacks can also be called just before
going to idle, during CPU bring up and takedown, or going to user space.
This requires extra care as to what can be done inside a callback. A callback
can be called outside the protective scope of RCU.

The ftrace infrastructure has some protections against recursion and RCU,
but one must still be very careful about how the callbacks are used.


The ftrace_ops structure
========================

To register a function callback, an ftrace_ops structure is required. This
structure tells ftrace which function should be called as the callback, as
well as which protections the callback performs itself and therefore does
not require ftrace to handle.

There is only one field that needs to be set when registering
an ftrace_ops with ftrace:

.. code-block:: c

   struct ftrace_ops ops = {
         .func       = my_callback_func,
         .flags      = MY_FTRACE_FLAGS,
         .private    = any_private_data_structure,
   };

Both .flags and .private are optional. Only .func is required.

To enable tracing call::

   register_ftrace_function(&ops);

To disable tracing call::

   unregister_ftrace_function(&ops);

The above is defined by including the header::

   #include <linux/ftrace.h>

The registered callback will start being called some time after the
register_ftrace_function() is called and before it returns. The exact time
that callbacks start being called is dependent upon architecture and scheduling
of services. The callback itself will have to handle any synchronization if it
must begin at an exact moment.

unregister_ftrace_function() guarantees that the callback is no longer
being called by any function once unregister_ftrace_function() returns.
Note that to provide this guarantee, unregister_ftrace_function() may take
some time to finish.


The callback function
=====================

The prototype of the callback function is as follows (as of v4.14):

.. code-block:: c

   void callback_func(unsigned long ip, unsigned long parent_ip,
                      struct ftrace_ops *op, struct pt_regs *regs);

@ip
	This is the instruction pointer of the function that is being traced
	(where the fentry or mcount call is within the function).

@parent_ip
	This is the instruction pointer of the function that called the
	function being traced (where the call of the function occurred).

@op
	This is a pointer to the ftrace_ops that was used to register the
	callback. This can be used to pass data to the callback via the
	private pointer.

@regs
	If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
	flags are set in the ftrace_ops structure, then this will point
	to the pt_regs structure as it would be if a breakpoint were placed
	at the start of the function where ftrace was tracing. Otherwise it
	either contains garbage or NULL.
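
The following is a minimal userspace sketch, not kernel code. It shows how a callback can use @op to reach data stored in the .private field, and how it should treat @regs. The struct definitions and the model_func_t typedef are hypothetical stand-ins that only mimic the v4.14 prototype. A real callback would include <linux/ftrace.h> and be invoked by ftrace rather than by main(). struct my_stats is likewise a hypothetical name.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdio.h>

   /*
    * Userspace stand-ins only: the real definitions come from
    * <linux/ftrace.h> and the architecture's <asm/ptrace.h>.
    */
   struct pt_regs {
           unsigned long ip;
   };

   struct ftrace_ops;

   typedef void (*model_func_t)(unsigned long ip, unsigned long parent_ip,
                                struct ftrace_ops *op, struct pt_regs *regs);

   struct ftrace_ops {
           model_func_t func;
           unsigned long flags;
           void *private;
   };

   /* Hypothetical data handed to the callback through .private. */
   struct my_stats {
           unsigned long hits;
           unsigned long last_ip;
           unsigned long last_parent_ip;
           int saw_regs;
   };

   static void my_callback_func(unsigned long ip, unsigned long parent_ip,
                                struct ftrace_ops *op, struct pt_regs *regs)
   {
           /* @op is the registered ftrace_ops, so .private is reachable. */
           struct my_stats *stats = op->private;

           stats->hits++;
           stats->last_ip = ip;
           stats->last_parent_ip = parent_ip;
           /* With SAVE_REGS_IF_SUPPORTED, regs may be NULL: check first. */
           stats->saw_regs = (regs != NULL);
   }

   int main(void)
   {
           struct my_stats stats = { 0 };
           struct ftrace_ops ops = {
                   .func    = my_callback_func,
                   .private = &stats,
           };
           struct pt_regs regs = { .ip = 0x1000 };

           /* Simulate two hits, as the ftrace trampoline would deliver them. */
           ops.func(0x1000, 0x2000, &ops, &regs);
           ops.func(0x1010, 0x2010, &ops, NULL);

           assert(stats.hits == 2);
           assert(stats.last_ip == 0x1010);
           assert(stats.saw_regs == 0);
           printf("hits=%lu last_ip=%#lx\n", stats.hits, stats.last_ip);
           return 0;
   }

The NULL check only helps when one of the SAVE_REGS flags is set. Without them, @regs may contain garbage and should not be used at all.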


The ftrace FLAGS
================

The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
Some of the flags are used for internal infrastructure of ftrace, but the
ones that users should be aware of are the following:

FTRACE_OPS_FL_SAVE_REGS
	If the callback requires reading or modifying the pt_regs
	passed to the callback, then it must set this flag. Registering
	an ftrace_ops with this flag set on an architecture that does not
	support passing of pt_regs to the callback will fail.

FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
	Similar to SAVE_REGS, but registering an ftrace_ops with this flag
	set will not fail on an architecture that does not support passing
	of regs. Instead, the callback must check whether regs is NULL to
	determine if the architecture supports it.

FTRACE_OPS_FL_RECURSION_SAFE
	By default, a wrapper is added around the callback to
	make sure that recursion of the function does not occur. That is,
	if a function that is called as a result of the callback's execution
	is also traced, ftrace will prevent the callback from being called
	again. But this wrapper adds some overhead, and if the callback is
	safe from recursion, it can set this flag to disable the ftrace
	protection.

	Note, if this flag is set, and recursion does occur, it could cause
	the system to crash, and possibly reboot via a triple fault.

	It is OK if another callback traces a function that is called by a
	callback that is marked recursion safe. Recursion safe callbacks
	must never trace any function that is called by the callback
	itself, or any nested functions that those functions call.

	If this flag is set, it is possible that the callback will also
	be called with preemption enabled (when CONFIG_PREEMPT is set),
	but this is not guaranteed.

FTRACE_OPS_FL_IPMODIFY
	Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to "hijack"
	the traced function (have another function called instead of the
	traced function), it requires setting this flag. This is what live
	kernel patches use. Without this flag, pt_regs->ip cannot be
	modified.

	Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be
	registered to any given function at a time.

FTRACE_OPS_FL_RCU
	If this is set, then the callback will only be called by functions
	where RCU is "watching". This is required if the callback function
	performs any rcu_read_lock() operation.

	RCU stops watching when the system goes idle, while a CPU is being
	taken down or brought back online, and on transitions from kernel
	space to user space and back. During these transitions a callback
	may still be executed, and RCU synchronization will not protect it.
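
The recursion protection described under FTRACE_OPS_FL_RECURSION_SAFE can be made concrete with a userspace model of what the default wrapper guards against. This is only an illustration. All names below are hypothetical, a thread-local flag stands in for the kernel's own recursion tracking, and the real wrapper is provided by ftrace itself.

.. code-block:: c

   #include <assert.h>
   #include <stdio.h>

   /* Stand-in for the kernel's own recursion tracking. */
   static _Thread_local int in_callback;
   static int callback_runs;

   static void traced_function(void);

   /* A callback that is NOT recursion safe: it calls a traced function. */
   static void my_callback(void)
   {
           callback_runs++;
           traced_function();
   }

   /* Models the default wrapper ftrace adds around such a callback. */
   static void protected_trampoline(void)
   {
           if (in_callback)
                   return;         /* recursion detected: skip the callback */
           in_callback = 1;
           my_callback();
           in_callback = 0;
   }

   /* Every traced function enters the trampoline on entry. */
   static void traced_function(void)
   {
           protected_trampoline();
   }

   int main(void)
   {
           traced_function();
           /* The nested call was suppressed, so the callback ran once. */
           assert(callback_runs == 1);
           printf("callback ran %d time(s)\n", callback_runs);
           return 0;
   }

Without the guard in protected_trampoline(), this model would recurse until the stack overflowed. In the kernel, that is the kind of crash the flag description warns about. A callback that sets FTRACE_OPS_FL_RECURSION_SAFE takes over this responsibility itself.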

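The "hijack" described under FTRACE_OPS_FL_IPMODIFY can be pictured with the following userspace model. It is not how live patching is implemented. struct model_regs, the function names, and call_traced() are all hypothetical, and the saved ip is held as a function pointer rather than as the address stored in pt_regs->ip. The point is only that execution continues wherever the callback leaves the ip.

.. code-block:: c

   #include <assert.h>
   #include <stdio.h>

   typedef int (*func_t)(int);

   /* Stand-in for pt_regs: only the resume address, as a function pointer. */
   struct model_regs {
           func_t ip;
   };

   static int old_func(int x) { return x + 1; }    /* original version */
   static int new_func(int x) { return x + 2; }    /* replacement version */

   /* Live-patch style callback: send callers of old_func to new_func. */
   static void patch_callback(struct model_regs *regs)
   {
           if (regs->ip == old_func)
                   regs->ip = new_func;
   }

   /* Models the trampoline: save regs, run the callback, then resume at
    * whatever ip the callback left behind. */
   static int call_traced(func_t fn, int arg)
   {
           struct model_regs regs = { .ip = fn };

           patch_callback(&regs);
           return regs.ip(arg);
   }

   int main(void)
   {
           assert(call_traced(old_func, 40) == 42);        /* redirected */
           assert(call_traced(new_func, 40) == 42);        /* unchanged */
           printf("%d\n", call_traced(old_func, 1));
           return 0;
   }

In this model, two callbacks rewriting the same ip would simply overwrite each other. That is a plausible way to read the rule that only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY may be registered for a given function.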

Filtering which functions to trace
==================================

If a callback is only to be called from specific functions, a filter must be
set up. Filters are added by name, or by ip if it is known.

.. code-block:: c

   int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
                         int len, int reset);

@ops
	The ops to set the filter with

@buf
	The string that holds the function filter text.

@len
	The length of the string.

@reset
	Non-zero to reset all filters before applying this filter.

Filters denote which functions should be enabled when tracing is enabled.
If @buf is NULL and @reset is set, all functions will be enabled for tracing.

The @buf can also be a glob expression to enable all functions that
match a specific pattern.

See Filter Commands in :file:`Documentation/trace/ftrace.txt`.

To just trace the schedule function:

.. code-block:: c

   ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);

To add more functions, call the ftrace_set_filter() more than once with the
@reset parameter set to zero. To remove the current filter set and replace it
with new functions defined by @buf, have @reset be non-zero.

To remove all the filtered functions and trace all functions:

.. code-block:: c

   ret = ftrace_set_filter(&ops, NULL, 0, 1);


Sometimes more than one function has the same name. To trace just a specific
function in this case, ftrace_set_filter_ip() can be used.

.. code-block:: c

   ret = ftrace_set_filter_ip(&ops, ip, 0, 0);

Note that the ip must be the address where the call to fentry or mcount is
located in the function. This function is used by perf and kprobes, which
get the ip address from the user (usually using debug info from the kernel).

If a glob is used to set the filter, functions can be added to a "notrace"
list that will prevent those functions from calling the callback.
The "notrace" list takes precedence over the "filter" list. If the
two lists are non-empty and contain the same functions, the callback will not
be called by any function.

An empty "notrace" list means to allow all functions defined by the filter
to be traced.
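
The precedence rules above can be summarized with a small userspace model. It is an approximation, not ftrace's code. calls_callback() and matches_any() are hypothetical names, and POSIX fnmatch() stands in for ftrace's own glob matching, whose supported syntax may differ.

.. code-block:: c

   #include <assert.h>
   #include <fnmatch.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdio.h>

   /* True if func matches any glob in the list (fnmatch() as a stand-in). */
   static bool matches_any(const char *func, const char *const *globs, size_t n)
   {
           for (size_t i = 0; i < n; i++)
                   if (fnmatch(globs[i], func, 0) == 0)
                           return true;
           return false;
   }

   /* Would a call to func invoke the callback? */
   static bool calls_callback(const char *func,
                              const char *const *filter, size_t nfilter,
                              const char *const *notrace, size_t nnotrace)
   {
           /* The notrace list takes precedence over the filter list. */
           if (matches_any(func, notrace, nnotrace))
                   return false;
           /* An empty filter list means every function is traced. */
           if (nfilter == 0)
                   return true;
           return matches_any(func, filter, nfilter);
   }

   int main(void)
   {
           const char *const filter[] = { "sched*" };
           const char *const notrace[] = { "schedule_tail" };
           const char *const same[] = { "schedule" };

           assert(calls_callback("schedule", filter, 1, notrace, 1));
           assert(!calls_callback("schedule_tail", filter, 1, notrace, 1));
           assert(!calls_callback("try_to_wake_up", filter, 1, notrace, 1));
           /* Empty notrace list: everything the filter matches is traced. */
           assert(calls_callback("schedule_tail", filter, 1, notrace, 0));
           /* Same function in both lists: it never calls the callback. */
           assert(!calls_callback("schedule", same, 1, same, 1));
           puts("ok");
           return 0;
   }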

.. code-block:: c

   int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
                          int len, int reset);

This takes the same parameters as ftrace_set_filter() but will add the
functions it finds to not be traced. This is a separate list from the
filter list, and this function does not modify the filter list.

A non-zero @reset will clear the "notrace" list before adding functions
that match @buf to it.

Clearing the "notrace" list is the same as clearing the filter list:

.. code-block:: c

   ret = ftrace_set_notrace(&ops, NULL, 0, 1);

The filter and notrace lists may be changed at any time. If only a set of
functions should call the callback, it is best to set the filters before
registering the callback. But the changes may also happen after the callback
has been registered.

If a filter is in place, @reset is non-zero, and @buf contains a glob that
matches functions, the switch happens during the ftrace_set_filter() call.
At no time will all functions call the callback.

.. code-block:: c

   ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);
   register_ftrace_function(&ops);
   msleep(10);
   ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);

is not the same as:

.. code-block:: c

   ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);
   register_ftrace_function(&ops);
   msleep(10);
   ftrace_set_filter(&ops, NULL, 0, 1);
   ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);

The latter has a short window, between the reset and the setting of the
new filter, during which all functions will call the callback.
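
The difference between the two sequences above can be reproduced with a simplified model of a single filter list. model_set_filter() and is_traced() are hypothetical helpers that only mimic the @buf/@reset behavior described earlier, using exact names and no globs. They are not ftrace functions, and "kfree" is just an arbitrary function name.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdio.h>
   #include <string.h>

   #define MAX_NAMES 8

   /* Simplified filter list: exact names only; empty means "trace all". */
   struct filter {
           const char *names[MAX_NAMES];
           int n;
   };

   /* Mimics @buf/@reset: reset clears the list, a NULL buf adds nothing. */
   static void model_set_filter(struct filter *f, const char *buf, int reset)
   {
           if (reset)
                   f->n = 0;
           if (buf && f->n < MAX_NAMES)
                   f->names[f->n++] = buf;
   }

   static bool is_traced(const struct filter *f, const char *func)
   {
           if (f->n == 0)
                   return true;
           for (int i = 0; i < f->n; i++)
                   if (strcmp(f->names[i], func) == 0)
                           return true;
           return false;
   }

   int main(void)
   {
           struct filter a = { .n = 0 };
           struct filter b = { .n = 0 };
           bool window;

           /* First sequence: reset and new name in a single call. */
           model_set_filter(&a, "schedule", 1);
           model_set_filter(&a, "try_to_wake_up", 1);
           assert(!is_traced(&a, "kfree"));
           assert(!is_traced(&a, "schedule"));

           /* Second sequence: reset with NULL, then add. */
           model_set_filter(&b, "schedule", 1);
           model_set_filter(&b, NULL, 1);
           window = is_traced(&b, "kfree");        /* empty list: traces all */
           model_set_filter(&b, "try_to_wake_up", 0);

           assert(window);
           assert(!is_traced(&b, "kfree"));
           assert(is_traced(&b, "try_to_wake_up"));
           puts("ok");
           return 0;
   }

In the first sequence the list is never empty between calls, so no intermediate state traces every function. In the second, is_traced() returns true for any function between the two calls. That is the window described above.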