Commit 943f3d03 authored by: Ingo Molnar

Merge branches 'sched/core', 'core/core' and 'tracing/core' into cpus4096

...@@ -82,7 +82,7 @@ of ftrace. Here is a list of some of the key files:
	tracer is not adding more data, they will display
	the same information every time they are read.

-  iter_ctrl: This file lets the user control the amount of data
  trace_options: This file lets the user control the amount of data
	that is displayed in one of the above output
	files.
...@@ -94,10 +94,10 @@ of ftrace. Here is a list of some of the key files:
	only be recorded if the latency is greater than
	the value in this file. (in microseconds)

-  trace_entries: This sets or displays the number of bytes each CPU
  buffer_size_kb: This sets or displays the number of kilobytes each CPU
	buffer can hold. The tracer buffers are the same size
	for each CPU. The displayed number is the size of the
	CPU buffer and not total size of all buffers. The
	trace buffers are allocated in pages (blocks of memory
	that the kernel uses for allocation, usually 4 KB in size).
	If the last page allocated has room for more bytes
...@@ -316,23 +316,23 @@ The above is mostly meaningful for kernel developers.
The rest is the same as the 'trace' file.

-iter_ctrl
trace_options
-------------

-The iter_ctrl file is used to control what gets printed in the trace
The trace_options file is used to control what gets printed in the trace
output. To see what is available, simply cat the file:

-  cat /debug/tracing/iter_ctrl
  cat /debug/tracing/trace_options
  print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
-  noblock nostacktrace nosched-tree
  noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj

To disable one of the options, echo in the option prepended with "no".

-  echo noprint-parent > /debug/tracing/iter_ctrl
  echo noprint-parent > /debug/tracing/trace_options

To enable an option, leave off the "no".

-  echo sym-offset > /debug/tracing/iter_ctrl
  echo sym-offset > /debug/tracing/trace_options

Here are the available options:
...@@ -378,6 +378,20 @@ Here are the available options:
	When a trace is recorded, so is the stack of functions.
	This allows for back traces of trace sites.

  userstacktrace - This option changes the trace. It records a
	stacktrace of the current userspace thread.

  sym-userobj - when user stacktraces are enabled, look up which object the
	address belongs to, and print a relative address. This is
	especially useful when ASLR is on; otherwise you don't get a
	chance to resolve the address to object/file/line after the app
	is no longer running.

	The lookup is performed when you read trace, trace_pipe, or
	latency_trace. Example:

	a.out-1623  [000] 40874.465068: /root/a.out[+0x480] <- /root/a.out[+0x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]

  sched-tree - TBD (any users??)
...@@ -1299,41 +1313,29 @@ trace entries
-------------

Having too much or not enough data can be troublesome in diagnosing
-an issue in the kernel. The file trace_entries is used to modify
an issue in the kernel. The file buffer_size_kb is used to modify
the size of the internal trace buffers. The number listed
is the number of entries that can be recorded per CPU. To know
the full size, multiply the number of possible CPUS with the
number of entries.

- # cat /debug/tracing/trace_entries
-65620
 # cat /debug/tracing/buffer_size_kb
1408 (units kilobytes)

Note, to modify this, you must have tracing completely disabled. To do that,
echo "nop" into the current_tracer. If the current_tracer is not set
to "nop", an EINVAL error will be returned.

 # echo nop > /debug/tracing/current_tracer
- # echo 100000 > /debug/tracing/trace_entries
- # cat /debug/tracing/trace_entries
-100045
-
-Notice that we echoed in 100,000 but the size is 100,045. The entries
-are held in individual pages. It allocates the number of pages it takes
-to fulfill the request. If more entries may fit on the last page
-then they will be added.
-
- # echo 1 > /debug/tracing/trace_entries
- # cat /debug/tracing/trace_entries
-85
-
-This shows us that 85 entries can fit in a single page.
 # echo 10000 > /debug/tracing/buffer_size_kb
 # cat /debug/tracing/buffer_size_kb
10000 (units kilobytes)

The number of pages which will be allocated is limited to a percentage
of available memory. Allocating too much will produce an error.

- # echo 1000000000000 > /debug/tracing/trace_entries
 # echo 1000000000000 > /debug/tracing/buffer_size_kb
-bash: echo: write error: Cannot allocate memory
- # cat /debug/tracing/trace_entries
 # cat /debug/tracing/buffer_size_kb
85
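
For example, with 4 possible CPUs and buffer_size_kb set to 10000, the
per-CPU buffers add up to about 4 x 10000 KB, i.e. roughly 40 MB of
trace buffer in total.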
...@@ -750,6 +750,14 @@ and is between 256 and 4096 characters. It is defined in the file
			parameter will force ia64_sal_cache_flush to call
			ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
ftrace=[tracer]
[ftrace] will set and start the specified tracer
as early as possible in order to facilitate early
boot debugging.
ftrace_dump_on_oops
[ftrace] will dump the trace buffers on oops.
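			For example, booting with "ftrace=function
			ftrace_dump_on_oops" starts the function tracer at
			boot and dumps the trace buffers to the console if
			an oops occurs (assuming the function tracer is
			compiled in).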
	gamecon.map[2|3]=	[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
			support via parallel port (up to 5 devices per port)
......
...@@ -71,35 +71,50 @@ Look at the current lock statistics:

# less /proc/lock_stat

-01 lock_stat version 0.2
01 lock_stat version 0.3
02 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
03                               class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
04 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
05
-06    &inode->i_data.tree_lock-W:    15    21657    0.18    1093295.30    11547131054.85       58    10415    0.16     87.51     6387.60
-07    &inode->i_data.tree_lock-R:     0        0    0.00          0.00              0.00    23302   231198    0.25      8.45    98023.38
-08    --------------------------
-09    &inode->i_data.tree_lock       0    [<ffffffff8027c08f>] add_to_page_cache+0x5f/0x190
-10
-11 ...............................................................................................................................................................................................
-12
-13    dcache_lock:    1037    1161    0.38    45.32    774.51    6611    243371    0.15    306.48    77387.24
-14    -----------
-15    dcache_lock     180    [<ffffffff802c0d7e>] sys_getcwd+0x11e/0x230
-16    dcache_lock     165    [<ffffffff802c002a>] d_alloc+0x15a/0x210
-17    dcache_lock      33    [<ffffffff8035818d>] _atomic_dec_and_lock+0x4d/0x70
-18    dcache_lock       1    [<ffffffff802beef8>] shrink_dcache_parent+0x18/0x130
06    &mm->mmap_sem-W:    233    538    18446744073708    22924.27    607243.51    1342     45806    1.71      8595.89    1180582.34
07    &mm->mmap_sem-R:    205    587    18446744073708    28403.36    731975.00    1940    412426    0.58    187825.45    6307502.88
08    ---------------
09    &mm->mmap_sem    487    [<ffffffff8053491f>] do_page_fault+0x466/0x928
10    &mm->mmap_sem    179    [<ffffffff802a6200>] sys_mprotect+0xcd/0x21d
11    &mm->mmap_sem    279    [<ffffffff80210a57>] sys_mmap+0x75/0xce
12    &mm->mmap_sem     76    [<ffffffff802a490b>] sys_munmap+0x32/0x59
13    ---------------
14    &mm->mmap_sem    270    [<ffffffff80210a57>] sys_mmap+0x75/0xce
15    &mm->mmap_sem    431    [<ffffffff8053491f>] do_page_fault+0x466/0x928
16    &mm->mmap_sem    138    [<ffffffff802a490b>] sys_munmap+0x32/0x59
17    &mm->mmap_sem    145    [<ffffffff802a6200>] sys_mprotect+0xcd/0x21d
18
19 ...............................................................................................................................................................................................
20
21    dcache_lock:    621    623    0.52    118.26    1053.02    6745    91930    0.29    316.29    118423.41
22    -----------
23    dcache_lock    179    [<ffffffff80378274>] _atomic_dec_and_lock+0x34/0x54
24    dcache_lock    113    [<ffffffff802cc17b>] d_alloc+0x19a/0x1eb
25    dcache_lock     99    [<ffffffff802ca0dc>] d_rehash+0x1b/0x44
26    dcache_lock    104    [<ffffffff802cbca0>] d_instantiate+0x36/0x8a
27    -----------
28    dcache_lock    192    [<ffffffff80378274>] _atomic_dec_and_lock+0x34/0x54
29    dcache_lock     98    [<ffffffff802ca0dc>] d_rehash+0x1b/0x44
30    dcache_lock     72    [<ffffffff802cc17b>] d_alloc+0x19a/0x1eb
31    dcache_lock    112    [<ffffffff802cbca0>] d_instantiate+0x36/0x8a
This excerpt shows the first two lock class statistics. Line 01 shows the
output version - each time the format changes this will be updated. Line 02-04
-show the header with column descriptions. Lines 05-10 and 13-18 show the actual
show the header with column descriptions. Lines 05-18 and 20-31 show the actual
statistics. These statistics come in two parts; the actual stats separated by a
-short separator (line 08, 14) from the contention points.
short separator (line 08, 13) from the contention points.

-The first lock (05-10) is a read/write lock, and shows two lines above the
The first lock (05-18) is a read/write lock, and shows two lines above the
short separator. The contention points don't match the column descriptors,
-they have two: contentions and [<IP>] symbol.
they have two: contentions and [<IP>] symbol. The second set of contention
points are the points we're contending with.
The integer part of the time values is in us.
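
For example, in line 06 above the write side of &mm->mmap_sem bounced
between CPUs 233 times and was contended 538 times; its waittime-total
of 607243.51 us is roughly 0.6 s of accumulated waiting, and its
holdtime-total of 1180582.34 us is roughly 1.2 s.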

View the top contending locks:
......
...@@ -70,6 +70,20 @@ a printk warning which identifies the inconsistency:

"Format mismatch for probe probe_name (format), marker (format)"
Another way to use markers is to simply define the marker without generating any
function call to actually call into the marker. This is useful in combination
with tracepoint probes in a scheme like this:
void probe_tracepoint_name(unsigned int arg1, struct task_struct *tsk);
DEFINE_MARKER_TP(marker_eventname, tracepoint_name, probe_tracepoint_name,
"arg1 %u pid %d");
notrace void probe_tracepoint_name(unsigned int arg1, struct task_struct *tsk)
{
	struct marker *marker = &GET_MARKER(marker_eventname);
/* write data to trace buffers ... */
}
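
With this scheme the marker's data is written from the tracepoint
probe rather than from a trace_mark() call site, so the remaining step
is to connect the probe to the tracepoint. A minimal sketch, reusing
the register_trace_* convention from the tracepoints documentation
(the tracepoint "tracepoint_name" is assumed to be declared
elsewhere):

	/* in the module init function */
	register_trace_tracepoint_name(probe_tracepoint_name);

	/* in the module exit function */
	unregister_trace_tracepoint_name(probe_tracepoint_name);
	tracepoint_synchronize_unregister();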
* Probe / marker example
......
...@@ -3,28 +3,30 @@
			    Mathieu Desnoyers

This document introduces Linux Kernel Tracepoints and their use. It
provides examples of how to insert tracepoints in the kernel and
connect probe functions to them and provides some examples of probe
functions.

* Purpose of tracepoints

A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty
(checking a condition for a branch) and a space penalty (adding a few
bytes for the function call at the end of the instrumented function
and adding a data structure in a separate section). When a tracepoint
is "on", the function you provide is called each time the tracepoint
is executed, in the execution context of the caller. When the function
provided ends its execution, it returns to the caller (continuing from
the tracepoint site).

You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters,
whose prototypes are described in a tracepoint declaration placed in a
header file.

They can be used for tracing and performance accounting.
...@@ -42,7 +44,7 @@ In include/trace/subsys.h :

#include <linux/tracepoint.h>

-DEFINE_TRACE(subsys_eventname,
DECLARE_TRACE(subsys_eventname,
	TPPTOTO(int firstarg, struct task_struct *p),
	TPARGS(firstarg, p));
...@@ -50,6 +52,8 @@ In subsys/file.c (where the tracing statement must be added) :

#include <trace/subsys.h>

DEFINE_TRACE(subsys_eventname);

void somefct(void)
{
	...
...@@ -61,31 +65,41 @@ Where :
- subsys_eventname is an identifier unique to your event
- subsys is the name of your subsystem.
- eventname is the name of the event to trace.
- TPPTOTO(int firstarg, struct task_struct *p) is the prototype of the
  function called by this tracepoint.
- TPARGS(firstarg, p) are the parameters names, same as found in the
  prototype.

-Connecting a function (probe) to a tracepoint is done by providing a probe
-(function to call) for the specific tracepoint through
-register_trace_subsys_eventname(). Removing a probe is done through
-unregister_trace_subsys_eventname(); it will remove the probe and make sure
-there is no caller left using the probe when it returns. Probe removal is
-preempt-safe because preemption is disabled around the probe call. See the
-"Probe example" section below for a sample probe module.

Connecting a function (probe) to a tracepoint is done by providing a
probe (function to call) for the specific tracepoint through
register_trace_subsys_eventname(). Removing a probe is done through
unregister_trace_subsys_eventname(); it will remove the probe.

tracepoint_synchronize_unregister() must be called before the end of
the module exit function to make sure there is no caller left using
the probe. This, and the fact that preemption is disabled around the
probe call, make sure that probe removal and module unload are safe.
See the "Probe example" section below for a sample probe module.

The tracepoint mechanism supports inserting multiple instances of the
same tracepoint, but a single definition must be made of a given
tracepoint name over all the kernel to make sure no type conflict will
occur. Name mangling of the tracepoints is done using the prototypes
to make sure typing is correct. Verification of probe type correctness
is done at the registration site by the compiler. Tracepoints can be
put in inline functions, inlined static functions, and unrolled loops
as well as regular functions.

The naming scheme "subsys_event" is suggested here as a convention
intended to limit collisions. Tracepoint names are global to the
kernel: they are considered as being the same whether they are in the
core kernel image or in modules.

If the tracepoint has to be used in kernel modules, an
EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be
used to export the defined tracepoints.
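
Putting the declaration, definition and registration steps together, a
minimal end-to-end sketch (the subsystem "subsys", the event
"eventname", the arguments and the probe body are placeholders
following the conventions above):

In include/trace/subsys.h :

#include <linux/tracepoint.h>

DECLARE_TRACE(subsys_eventname,
	TPPTOTO(int firstarg, struct task_struct *p),
	TPARGS(firstarg, p));

In subsys/file.c :

#include <trace/subsys.h>

DEFINE_TRACE(subsys_eventname);

void somefct(void)
{
	...
	trace_subsys_eventname(arg, task);
	...
}

In the probe module :

static void probe_subsys_eventname(int firstarg, struct task_struct *p)
{
	/* called each time the tracepoint fires, preemption disabled */
}

static int __init tp_sample_init(void)
{
	return register_trace_subsys_eventname(probe_subsys_eventname);
}

static void __exit tp_sample_exit(void)
{
	unregister_trace_subsys_eventname(probe_subsys_eventname);
	tracepoint_synchronize_unregister();
}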
* Probe / tracepoint example
......
...@@ -37,7 +37,7 @@ $ echo mmiotrace > /debug/tracing/current_tracer
$ cat /debug/tracing/trace_pipe > mydump.txt &
Start X or whatever.
$ echo "X is up" > /debug/tracing/trace_marker
-$ echo none > /debug/tracing/current_tracer
$ echo nop > /debug/tracing/current_tracer
Check for lost events.

...@@ -66,7 +66,7 @@ which action. It is recommended to place descriptive markers about what you
do.

Shut down mmiotrace (requires root privileges):
-$ echo none > /debug/tracing/current_tracer
$ echo nop > /debug/tracing/current_tracer
The 'cat' process exits. If it does not, kill it by issuing 'fg' command and
pressing ctrl+c.

...@@ -81,7 +81,9 @@ are:
$ cat /debug/tracing/trace_entries
gives you a number. Approximately double this number and write it back, for
instance:
$ echo 0 > /debug/tracing/tracing_enabled
$ echo 128000 > /debug/tracing/trace_entries
$ echo 1 > /debug/tracing/tracing_enabled
Then start again from the top.

If you are doing a trace for a driver project, e.g. Nouveau, you should also
......
...@@ -33,6 +33,7 @@
#define LCD_CONN_TYPE(_x)	((_x) & 0x0f)
#define LCD_CONN_WIDTH(_x)	(((_x) >> 4) & 0x1f)

#define LCD_TYPE_MASK		0xf
#define LCD_TYPE_UNKNOWN	0
#define LCD_TYPE_MONO_STN	1
#define LCD_TYPE_MONO_DSTN	2
......
...@@ -90,12 +90,13 @@ void arch_reset(char mode)
		/* Jump into ROM at address 0 */
		cpu_reset(0);
		break;
-	case 'h':
-		do_hw_reset();
-		break;
	case 'g':
		do_gpio_reset();
		break;
	case 'h':
	default:
		do_hw_reset();
		break;
	}
}
...@@ -67,6 +67,7 @@
static unsigned long spitz_pin_config[] __initdata = {
	/* Chip Selects */
	GPIO78_nCS_2,	/* SCOOP #2 */
	GPIO79_nCS_3,	/* NAND */
	GPIO80_nCS_4,	/* SCOOP #1 */

	/* LCD - 16bpp Active TFT */
...@@ -97,10 +98,10 @@ static unsigned long spitz_pin_config[] __initdata = {
	GPIO51_nPIOW,
	GPIO85_nPCE_1,
	GPIO54_nPCE_2,
-	GPIO79_PSKTSEL,
	GPIO55_nPREG,
	GPIO56_nPWAIT,
	GPIO57_nIOIS16,
	GPIO104_PSKTSEL,

	/* MMC */
	GPIO32_MMC_CLK,
...@@ -686,7 +687,6 @@ static void __init akita_init(void)
	spitz_pcmcia_config.num_devs = 1;
	platform_scoop_config = &spitz_pcmcia_config;

-	pxa_set_i2c_info(NULL);
	i2c_register_board_info(0, ARRAY_AND_SIZE(akita_i2c_board_info));

	common_init();
......
...@@ -7,7 +7,19 @@

#ifndef __ASSEMBLY__
extern void _mcount(void);
-#endif

#ifdef CONFIG_DYNAMIC_FTRACE
static inline unsigned long ftrace_call_adjust(unsigned long addr)
{
	/* relocation of mcount call site is the same as the address */
	return addr;
}

struct dyn_arch_ftrace {
	struct module *mod;
};
#endif /* CONFIG_DYNAMIC_FTRACE */
#endif /* __ASSEMBLY__ */

#endif
......
...@@ -34,11 +34,19 @@ struct mod_arch_specific {
#ifdef __powerpc64__
	unsigned int stubs_section;	/* Index of stubs section in module */
	unsigned int toc_section;	/* What section is the TOC? */
-#else
#ifdef CONFIG_DYNAMIC_FTRACE
	unsigned long toc;
	unsigned long tramp;
#endif

#else /* powerpc64 */
	/* Indices of PLT sections within module. */
	unsigned int core_plt_section;
	unsigned int init_plt_section;
#ifdef CONFIG_DYNAMIC_FTRACE
	unsigned long tramp;
#endif
#endif /* powerpc64 */

	/* List of BUG addresses, source line numbers and filenames */
	struct list_head bug_list;
...@@ -68,6 +76,12 @@ struct mod_arch_specific {
#    endif	/* MODULE */
#endif

#ifdef CONFIG_DYNAMIC_FTRACE
#    ifdef MODULE
	asm(".section .ftrace.tramp,\"ax\",@nobits; .align 3; .previous");
#    endif	/* MODULE */
#endif

struct exception_table_entry;
void sort_ex_table(struct exception_table_entry *start,
......
...@@ -9,22 +9,30 @@

#include <linux/spinlock.h>
#include <linux/hardirq.h>
#include <linux/uaccess.h>
#include <linux/module.h>
#include <linux/ftrace.h>
#include <linux/percpu.h>
#include <linux/init.h>
#include <linux/list.h>

#include <asm/cacheflush.h>
#include <asm/code-patching.h>
#include <asm/ftrace.h>

#if 0
#define DEBUGP printk
#else
#define DEBUGP(fmt , ...)	do { } while (0)
#endif

-static unsigned int ftrace_nop = 0x60000000;
static unsigned int ftrace_nop = PPC_NOP_INSTR;

#ifdef CONFIG_PPC32
# define GET_ADDR(addr) addr
#else
/* PowerPC64's functions are data that points to the functions */
-# define GET_ADDR(addr) *(unsigned long *)addr
# define GET_ADDR(addr) (*(unsigned long *)addr)
#endif
...@@ -33,12 +41,12 @@ static unsigned int ftrace_calc_offset(long ip, long addr)
	return (int)(addr - ip);
}

-unsigned char *ftrace_nop_replace(void)
static unsigned char *ftrace_nop_replace(void)
{
	return (char *)&ftrace_nop;
}

-unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
{
	static unsigned int op;
...@@ -68,49 +76,434 @@ unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
# define _ASM_PTR	" .long "
#endif

-int
static int
ftrace_modify_code(unsigned long ip, unsigned char *old_code,
		   unsigned char *new_code)
{
-	unsigned replaced;
-	unsigned old = *(unsigned *)old_code;
-	unsigned new = *(unsigned *)new_code;
-	int faulted = 0;
	unsigned char replaced[MCOUNT_INSN_SIZE];

	/*
	 * Note: Due to modules and __init, code can
	 * disappear and change, we need to protect against faulting
-	 * as well as code changing.
	 * as well as code changing. We do this by using the
	 * probe_kernel_* functions.
	 *
	 * No real locking needed, this code is run through
-	 * kstop_machine.
	 * kstop_machine, or before SMP starts.
	 */
-	asm volatile (
-		"1: lwz		%1, 0(%2)\n"	/* read the text we want to modify */
-		"   cmpw	%1, %5\n"
-		"   bne		2f\n"
-		"   stwu	%3, 0(%2)\n"
-		"2:\n"
-		".section .fixup, \"ax\"\n"
-		"3:	li %0, 1\n"
-		"	b 2b\n"
-		".previous\n"
-		".section __ex_table,\"a\"\n"
-		_ASM_ALIGN "\n"
-		_ASM_PTR "1b, 3b\n"
-		".previous"
-		: "=r"(faulted), "=r"(replaced)
-		: "r"(ip), "r"(new),
-		  "0"(faulted), "r"(old)
-		: "memory");
-
-	if (replaced != old && replaced != new)
-		faulted = 2;
-
-	if (!faulted)
-		flush_icache_range(ip, ip + 8);
-
-	return faulted;

	/* read the text we want to modify */
	if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
		return -EFAULT;

	/* Make sure it is what we expect it to be */
	if (memcmp(replaced, old_code, MCOUNT_INSN_SIZE) != 0)
		return -EINVAL;

	/* replace the text with the new text */
	if (probe_kernel_write((void *)ip, new_code, MCOUNT_INSN_SIZE))
		return -EPERM;

	flush_icache_range(ip, ip + 8);

	return 0;
}

/*
 * Helper functions that are the same for both PPC64 and PPC32.
 */
static int test_24bit_addr(unsigned long ip, unsigned long addr)
{
	long diff;

	/*
* Can we get to addr from ip in 24 bits?
	 * (26 really, since we multiply by 4 for 4 byte alignment)
*/
diff = addr - ip;
/*
* Return true if diff is less than 1 << 25
* and greater than -1 << 26.
*/
return (diff < (1 << 25)) && (diff > (-1 << 26));
}
static int is_bl_op(unsigned int op)
{
return (op & 0xfc000003) == 0x48000001;
}
static int test_offset(unsigned long offset)
{
return (offset + 0x2000000 > 0x3ffffff) || ((offset & 3) != 0);
}
static unsigned long find_bl_target(unsigned long ip, unsigned int op)
{
static int offset;
offset = (op & 0x03fffffc);
/* make it signed */
if (offset & 0x02000000)
offset |= 0xfe000000;
return ip + (long)offset;
}
static unsigned int branch_offset(unsigned long offset)
{
/* return "bl ip+offset" */
return 0x48000001 | (offset & 0x03fffffc);
}
#ifdef CONFIG_PPC64
static int
__ftrace_make_nop(struct module *mod,
struct dyn_ftrace *rec, unsigned long addr)
{
unsigned char replaced[MCOUNT_INSN_SIZE * 2];
unsigned int *op = (unsigned *)&replaced;
unsigned char jmp[8];
unsigned long *ptr = (unsigned long *)&jmp;
unsigned long ip = rec->ip;
unsigned long tramp;
int offset;
/* read where this goes */
if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
return -EFAULT;
	/* Make sure that this is still a 24bit jump */
if (!is_bl_op(*op)) {
printk(KERN_ERR "Not expected bl: opcode is %x\n", *op);
return -EINVAL;
}
/* lets find where the pointer goes */
tramp = find_bl_target(ip, *op);
/*
* On PPC64 the trampoline looks like:
* 0x3d, 0x82, 0x00, 0x00, addis r12,r2, <high>
* 0x39, 0x8c, 0x00, 0x00, addi r12,r12, <low>
	 *   Where the bytes 2, 3, 6 and 7 make up the 32bit offset
	 *   to the TOC entry that holds the pointer to jump to.
	 * 0xf8, 0x41, 0x00, 0x28,	std	r2,40(r1)
	 * 0xe9, 0x6c, 0x00, 0x20,	ld	r11,32(r12)
	 *   The actual address is 32 bytes from the offset
	 *   into the TOC.
* 0xe8, 0x4c, 0x00, 0x28, ld r2,40(r12)
*/
DEBUGP("ip:%lx jumps to %lx r2: %lx", ip, tramp, mod->arch.toc);
/* Find where the trampoline jumps to */
if (probe_kernel_read(jmp, (void *)tramp, 8)) {
printk(KERN_ERR "Failed to read %lx\n", tramp);
return -EFAULT;
}
DEBUGP(" %08x %08x",
(unsigned)(*ptr >> 32),
(unsigned)*ptr);
offset = (unsigned)jmp[2] << 24 |
(unsigned)jmp[3] << 16 |
(unsigned)jmp[6] << 8 |
(unsigned)jmp[7];
DEBUGP(" %x ", offset);
	/* get the address this jumps to */
tramp = mod->arch.toc + offset + 32;
DEBUGP("toc: %lx", tramp);
if (probe_kernel_read(jmp, (void *)tramp, 8)) {
printk(KERN_ERR "Failed to read %lx\n", tramp);
return -EFAULT;
}
DEBUGP(" %08x %08x\n",
(unsigned)(*ptr >> 32),
(unsigned)*ptr);
/* This should match what was called */
if (*ptr != GET_ADDR(addr)) {
printk(KERN_ERR "addr does not match %lx\n", *ptr);
return -EINVAL;
}
/*
* We want to nop the line, but the next line is
* 0xe8, 0x41, 0x00, 0x28 ld r2,40(r1)
* This needs to be turned to a nop too.
*/
if (probe_kernel_read(replaced, (void *)(ip+4), MCOUNT_INSN_SIZE))
return -EFAULT;
if (*op != 0xe8410028) {
printk(KERN_ERR "Next line is not ld! (%08x)\n", *op);
return -EINVAL;
}
/*
* Milton Miller pointed out that we can not blindly do nops.
* If a task was preempted when calling a trace function,
* the nops will remove the way to restore the TOC in r2
* and the r2 TOC will get corrupted.
*/
/*
* Replace:
* bl <tramp> <==== will be replaced with "b 1f"
* ld r2,40(r1)
* 1:
*/
op[0] = 0x48000008; /* b +8 */
if (probe_kernel_write((void *)ip, replaced, MCOUNT_INSN_SIZE))
return -EPERM;
return 0;
}
#else /* !PPC64 */
static int
__ftrace_make_nop(struct module *mod,
struct dyn_ftrace *rec, unsigned long addr)
{
unsigned char replaced[MCOUNT_INSN_SIZE];
unsigned int *op = (unsigned *)&replaced;
unsigned char jmp[8];
unsigned int *ptr = (unsigned int *)&jmp;
unsigned long ip = rec->ip;
unsigned long tramp;
int offset;
if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
return -EFAULT;
	/* Make sure that this is still a 24bit jump */
if (!is_bl_op(*op)) {
printk(KERN_ERR "Not expected bl: opcode is %x\n", *op);
return -EINVAL;
}
/* lets find where the pointer goes */
tramp = find_bl_target(ip, *op);
/*
* On PPC32 the trampoline looks like:
* lis r11,sym@ha
* addi r11,r11,sym@l
* mtctr r11
* bctr
*/
DEBUGP("ip:%lx jumps to %lx", ip, tramp);
/* Find where the trampoline jumps to */
if (probe_kernel_read(jmp, (void *)tramp, 8)) {
printk(KERN_ERR "Failed to read %lx\n", tramp);
return -EFAULT;
}
DEBUGP(" %08x %08x ", ptr[0], ptr[1]);
tramp = (ptr[1] & 0xffff) |
((ptr[0] & 0xffff) << 16);
if (tramp & 0x8000)
tramp -= 0x10000;
DEBUGP(" %x ", tramp);
if (tramp != addr) {
printk(KERN_ERR
"Trampoline location %08lx does not match addr\n",
tramp);
return -EINVAL;
}
op[0] = PPC_NOP_INSTR;
if (probe_kernel_write((void *)ip, replaced, MCOUNT_INSN_SIZE))
return -EPERM;
return 0;
}
#endif /* PPC64 */
int ftrace_make_nop(struct module *mod,
struct dyn_ftrace *rec, unsigned long addr)
{
unsigned char *old, *new;
unsigned long ip = rec->ip;
/*
	 * If the calling address is more than 24 bits away,
* then we had to use a trampoline to make the call.
* Otherwise just update the call site.
*/
if (test_24bit_addr(ip, addr)) {
/* within range */
old = ftrace_call_replace(ip, addr);
new = ftrace_nop_replace();
return ftrace_modify_code(ip, old, new);
}
/*
* Out of range jumps are called from modules.
* We should either already have a pointer to the module
* or it has been passed in.
*/
if (!rec->arch.mod) {
if (!mod) {
printk(KERN_ERR "No module loaded addr=%lx\n",
addr);
return -EFAULT;
}
rec->arch.mod = mod;
} else if (mod) {
if (mod != rec->arch.mod) {
printk(KERN_ERR
"Record mod %p not equal to passed in mod %p\n",
rec->arch.mod, mod);
return -EINVAL;
}
/* nothing to do if mod == rec->arch.mod */
} else
mod = rec->arch.mod;
return __ftrace_make_nop(mod, rec, addr);
}
#ifdef CONFIG_PPC64
static int
__ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
{
unsigned char replaced[MCOUNT_INSN_SIZE * 2];
unsigned int *op = (unsigned *)&replaced;
unsigned long ip = rec->ip;
unsigned long offset;
/* read where this goes */
if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE * 2))
return -EFAULT;
/*
* It should be pointing to two nops or
* b +8; ld r2,40(r1)
*/
if (((op[0] != 0x48000008) || (op[1] != 0xe8410028)) &&
((op[0] != PPC_NOP_INSTR) || (op[1] != PPC_NOP_INSTR))) {
printk(KERN_ERR "Expected NOPs but have %x %x\n", op[0], op[1]);
return -EINVAL;
}
/* If we never set up a trampoline to ftrace_caller, then bail */
if (!rec->arch.mod->arch.tramp) {
printk(KERN_ERR "No ftrace trampoline\n");
return -EINVAL;
}
/* now calculate a jump to the ftrace caller trampoline */
offset = rec->arch.mod->arch.tramp - ip;
if (test_offset(offset)) {
printk(KERN_ERR "REL24 %li out of range!\n",
(long int)offset);
return -EINVAL;
}
/* Set to "bl addr" */
op[0] = branch_offset(offset);
/* ld r2,40(r1) */
op[1] = 0xe8410028;
DEBUGP("write to %lx\n", rec->ip);
if (probe_kernel_write((void *)ip, replaced, MCOUNT_INSN_SIZE * 2))
return -EPERM;
return 0;
}
#else
static int
__ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
{
unsigned char replaced[MCOUNT_INSN_SIZE];
unsigned int *op = (unsigned *)&replaced;
unsigned long ip = rec->ip;
unsigned long offset;
/* read where this goes */
if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
return -EFAULT;
/* It should be pointing to a nop */
if (op[0] != PPC_NOP_INSTR) {
printk(KERN_ERR "Expected NOP but have %x\n", op[0]);
return -EINVAL;
}
/* If we never set up a trampoline to ftrace_caller, then bail */
if (!rec->arch.mod->arch.tramp) {
printk(KERN_ERR "No ftrace trampoline\n");
return -EINVAL;
}
/* now calculate a jump to the ftrace caller trampoline */
offset = rec->arch.mod->arch.tramp - ip;
if (test_offset(offset)) {
printk(KERN_ERR "REL24 %li out of range!\n",
(long int)offset);
return -EINVAL;
}
/* Set to "bl addr" */
op[0] = branch_offset(offset);
DEBUGP("write to %lx\n", rec->ip);
if (probe_kernel_write((void *)ip, replaced, MCOUNT_INSN_SIZE))
return -EPERM;
return 0;
}
#endif /* CONFIG_PPC64 */
int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
{
unsigned char *old, *new;
unsigned long ip = rec->ip;
/*
	 * If the calling address is more than 24 bits away,
* then we had to use a trampoline to make the call.
* Otherwise just update the call site.
*/
if (test_24bit_addr(ip, addr)) {
/* within range */
old = ftrace_nop_replace();
new = ftrace_call_replace(ip, addr);
return ftrace_modify_code(ip, old, new);
}
/*
* Out of range jumps are called from modules.
* Being that we are converting from nop, it had better
* already have a module defined.
*/
if (!rec->arch.mod) {
printk(KERN_ERR "No module loaded\n");
return -EINVAL;
}
return __ftrace_make_call(rec, addr);
}

int ftrace_update_ftrace_func(ftrace_func_t func)
...@@ -128,10 +521,10 @@ int ftrace_update_ftrace_func(ftrace_func_t func)

int __init ftrace_dyn_arch_init(void *data)
{
-	/* This is running in kstop_machine */
	/* caller expects data to be zero */
	unsigned long *p = data;

-	ftrace_mcount_set(data);
	*p = 0;

	return 0;
}
...@@ -69,10 +69,15 @@ void cpu_idle(void)
				smp_mb();
				local_irq_disable();

				/* Don't trace irqs off for idle */
				stop_critical_timings();

				/* check again after disabling irqs */
				if (!need_resched() && !cpu_should_die())
					ppc_md.power_save();

				start_critical_timings();
				local_irq_enable();
				set_thread_flag(TIF_POLLING_NRFLAG);
......
...@@ -22,6 +22,7 @@
#include <linux/fs.h>
#include <linux/string.h>
#include <linux/kernel.h>
#include <linux/ftrace.h>
#include <linux/cache.h>
#include <linux/bug.h>
#include <linux/sort.h>
...@@ -53,6 +54,9 @@ static unsigned int count_relocs(const Elf32_Rela *rela, unsigned int num)
			r_addend = rela[i].r_addend;
	}

#ifdef CONFIG_DYNAMIC_FTRACE
	_count_relocs++;	/* add one for ftrace_caller */
#endif
	return _count_relocs;
}
...@@ -306,5 +310,11 @@ int apply_relocate_add(Elf32_Shdr *sechdrs,
			return -ENOEXEC;
		}
	}

#ifdef CONFIG_DYNAMIC_FTRACE
	module->arch.tramp =
		do_plt_call(module->module_core,
			    (unsigned long)ftrace_caller,
			    sechdrs, module);
#endif

	return 0;
}
...@@ -20,6 +20,7 @@
#include <linux/moduleloader.h>
#include <linux/err.h>
#include <linux/vmalloc.h>
#include <linux/ftrace.h>
#include <linux/bug.h>
#include <asm/module.h>
#include <asm/firmware.h>
...@@ -163,6 +164,11 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
		}
	}

#ifdef CONFIG_DYNAMIC_FTRACE
	/* make the trampoline to the ftrace_caller */
	relocs++;
#endif

	DEBUGP("Looks like a total of %lu stubs, max\n", relocs);
	return relocs * sizeof(struct ppc64_stub_entry);
}
...@@ -441,5 +447,12 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
		}
	}

#ifdef CONFIG_DYNAMIC_FTRACE
	me->arch.toc = my_r2(sechdrs, me);
	me->arch.tramp = stub_for_addr(sechdrs,
				       (unsigned long)ftrace_caller,
				       me);
#endif

	return 0;
}
...@@ -11,21 +11,21 @@ extern int get_signals(void);
extern void block_signals(void);
extern void unblock_signals(void);

-#define local_save_flags(flags) do { typecheck(unsigned long, flags); \
-				     (flags) = get_signals(); } while(0)
#define raw_local_save_flags(flags) do { typecheck(unsigned long, flags); \
					 (flags) = get_signals(); } while(0)

-#define local_irq_restore(flags) do { typecheck(unsigned long, flags); \
-				      set_signals(flags); } while(0)
#define raw_local_irq_restore(flags) do { typecheck(unsigned long, flags); \
					  set_signals(flags); } while(0)

-#define local_irq_save(flags) do { local_save_flags(flags); \
-				   local_irq_disable(); } while(0)
#define raw_local_irq_save(flags) do { raw_local_save_flags(flags); \
				       raw_local_irq_disable(); } while(0)

-#define local_irq_enable() unblock_signals()
#define raw_local_irq_enable() unblock_signals()
-#define local_irq_disable() block_signals()
#define raw_local_irq_disable() block_signals()

#define irqs_disabled()                 \
({                                      \
	unsigned long flags;            \
-	local_save_flags(flags);        \
	raw_local_save_flags(flags);    \
	(flags == 0);                   \
})
......
...@@ -29,11 +29,14 @@ config X86
	select HAVE_FTRACE_MCOUNT_RECORD
	select HAVE_DYNAMIC_FTRACE
	select HAVE_FUNCTION_TRACER
	select HAVE_FUNCTION_RET_TRACER if X86_32
	select HAVE_FUNCTION_TRACE_MCOUNT_TEST
	select HAVE_KVM if ((X86_32 && !X86_VOYAGER && !X86_VISWS && !X86_NUMAQ) || X86_64)
	select HAVE_ARCH_KGDB if !X86_VOYAGER
	select HAVE_ARCH_TRACEHOOK
	select HAVE_GENERIC_DMA_COHERENT if X86_32
	select HAVE_EFFICIENT_UNALIGNED_ACCESS
	select USER_STACKTRACE_SUPPORT

config ARCH_DEFCONFIG
	string
......
...@@ -186,14 +186,10 @@ config IOMMU_LEAK
	  Add a simple leak tracer to the IOMMU code. This is useful when you
	  are debugging a buggy device driver that leaks IOMMU mappings.

-config MMIOTRACE_HOOKS
-	bool
-
config MMIOTRACE
	bool "Memory mapped IO tracing"
	depends on DEBUG_KERNEL && PCI
	select TRACING
-	select MMIOTRACE_HOOKS
	help
	  Mmiotrace traces Memory Mapped I/O access and is meant for
	  debugging and reverse engineering. It is called from the ioremap
......
...@@ -17,8 +17,40 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
	 */
	return addr - 1;
}
-#endif

#ifdef CONFIG_DYNAMIC_FTRACE
struct dyn_arch_ftrace {
/* No extra data needed for x86 */
};
#endif /* CONFIG_DYNAMIC_FTRACE */
#endif /* __ASSEMBLY__ */
#endif /* CONFIG_FUNCTION_TRACER */
#ifdef CONFIG_FUNCTION_RET_TRACER
#ifndef __ASSEMBLY__
/*
* Stack of return addresses for functions
* of a thread.
* Used in struct thread_info
*/
struct ftrace_ret_stack {
unsigned long ret;
unsigned long func;
unsigned long long calltime;
};
/*
* Primary handler of a function return.
 * It relies on ftrace_return_to_handler.
 * Defined in entry_32.S
*/
extern void return_to_handler(void);
#endif /* __ASSEMBLY__ */
#endif /* CONFIG_FUNCTION_RET_TRACER */
#endif /* _ASM_X86_FTRACE_H */
...@@ -20,6 +20,8 @@
struct task_struct;
struct exec_domain;
#include <asm/processor.h>
#include <asm/ftrace.h>
#include <asm/atomic.h>

struct thread_info {
	struct task_struct	*task;		/* main task structure */
......
...@@ -157,6 +157,7 @@ extern int __get_user_bad(void);
	int __ret_gu;							\
	unsigned long __val_gu;						\
	__chk_user_ptr(ptr);						\
	might_fault();							\
	switch (sizeof(*(ptr))) {					\
	case 1:								\
		__get_user_x(1, __ret_gu, __val_gu, ptr);		\
...@@ -241,6 +242,7 @@ extern void __put_user_8(void);
	int __ret_pu;						\
	__typeof__(*(ptr)) __pu_val;				\
	__chk_user_ptr(ptr);					\
	might_fault();						\
	__pu_val = x;						\
	switch (sizeof(*(ptr))) {				\
	case 1:							\
......
...@@ -82,8 +82,8 @@ __copy_to_user_inatomic(void __user *to, const void *from, unsigned long n)
static __always_inline unsigned long __must_check
__copy_to_user(void __user *to, const void *from, unsigned long n)
{
-	might_sleep();
	might_fault();
	return __copy_to_user_inatomic(to, from, n);
}

static __always_inline unsigned long
...@@ -137,7 +137,7 @@ __copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
static __always_inline unsigned long
__copy_from_user(void *to, const void __user *from, unsigned long n)
{
-	might_sleep();
	might_fault();
	if (__builtin_constant_p(n)) {
		unsigned long ret;
...@@ -159,7 +159,7 @@ __copy_from_user(void *to, const void __user *from, unsigned long n)
static __always_inline unsigned long __copy_from_user_nocache(void *to,
				const void __user *from, unsigned long n)
{
-	might_sleep();
	might_fault();
	if (__builtin_constant_p(n)) {
		unsigned long ret;
......
...@@ -29,6 +29,8 @@ static __always_inline __must_check
int __copy_from_user(void *dst, const void __user *src, unsigned size)
{
	int ret = 0;

	might_fault();
	if (!__builtin_constant_p(size))
		return copy_user_generic(dst, (__force void *)src, size);
	switch (size) {
...@@ -71,6 +73,8 @@ static __always_inline __must_check
int __copy_to_user(void __user *dst, const void *src, unsigned size)
{
	int ret = 0;

	might_fault();
	if (!__builtin_constant_p(size))
		return copy_user_generic((__force void *)dst, src, size);
	switch (size) {
...@@ -113,6 +117,8 @@ static __always_inline __must_check
int __copy_in_user(void __user *dst, const void __user *src, unsigned size)
{
	int ret = 0;

	might_fault();
	if (!__builtin_constant_p(size))
		return copy_user_generic((__force void *)dst,
					 (__force void *)src, size);
......
...@@ -14,6 +14,11 @@ CFLAGS_REMOVE_paravirt-spinlocks.o = -pg
CFLAGS_REMOVE_ftrace.o = -pg
endif

ifdef CONFIG_FUNCTION_RET_TRACER
# Don't trace __switch_to() for the return tracer, but leave it for the function tracer
CFLAGS_REMOVE_process_32.o = -pg
endif

#
# vsyscalls (which work on the user stack) should have
# no stack-protector checks:
...@@ -65,6 +70,7 @@ obj-$(CONFIG_X86_LOCAL_APIC)	+= apic.o nmi.o
obj-$(CONFIG_X86_IO_APIC)	+= io_apic.o
obj-$(CONFIG_X86_REBOOTFIXUPS)	+= reboot_fixups_32.o
obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace.o
obj-$(CONFIG_FUNCTION_RET_TRACER)	+= ftrace.o
obj-$(CONFIG_KEXEC)		+= machine_kexec_$(BITS).o
obj-$(CONFIG_KEXEC)		+= relocate_kernel_$(BITS).o crash.o
obj-$(CONFIG_CRASH_DUMP)	+= crash_dump_$(BITS).o
......
...@@ -1157,6 +1157,9 @@ ENTRY(mcount)
END(mcount)

ENTRY(ftrace_caller)
cmpl $0, function_trace_stop
jne ftrace_stub
	pushl %eax
	pushl %ecx
	pushl %edx
...@@ -1180,8 +1183,15 @@ END(ftrace_caller)
#else /* ! CONFIG_DYNAMIC_FTRACE */

ENTRY(mcount)
cmpl $0, function_trace_stop
jne ftrace_stub
	cmpl $ftrace_stub, ftrace_trace_function
	jnz trace
#ifdef CONFIG_FUNCTION_RET_TRACER
cmpl $ftrace_stub, ftrace_function_return
jnz ftrace_return_caller
#endif
.globl ftrace_stub
ftrace_stub:
	ret
...@@ -1200,12 +1210,42 @@ trace:
	popl %edx
	popl %ecx
	popl %eax
	jmp ftrace_stub
END(mcount)
#endif /* CONFIG_DYNAMIC_FTRACE */
#endif /* CONFIG_FUNCTION_TRACER */
#ifdef CONFIG_FUNCTION_RET_TRACER
ENTRY(ftrace_return_caller)
cmpl $0, function_trace_stop
jne ftrace_stub
pushl %eax
pushl %ecx
pushl %edx
movl 0xc(%esp), %edx
lea 0x4(%ebp), %eax
call prepare_ftrace_return
popl %edx
popl %ecx
popl %eax
ret
END(ftrace_return_caller)
.globl return_to_handler
return_to_handler:
pushl $0
pushl %eax
pushl %ecx
pushl %edx
call ftrace_return_to_handler
movl %eax, 0xc(%esp)
popl %edx
popl %ecx
popl %eax
ret
#endif
.section .rodata,"a"
#include "syscall_table_32.S"
......
...@@ -68,6 +68,8 @@ ENTRY(mcount)
END(mcount)

ENTRY(ftrace_caller)
	cmpl $0, function_trace_stop
	jne  ftrace_stub

	/* taken from glibc */
	subq $0x38, %rsp
...@@ -103,6 +105,9 @@ END(ftrace_caller)
#else /* ! CONFIG_DYNAMIC_FTRACE */
ENTRY(mcount)
	cmpl $0, function_trace_stop
	jne  ftrace_stub

	cmpq $ftrace_stub, ftrace_trace_function
	jnz trace
.globl ftrace_stub
......
...@@ -14,14 +14,17 @@

#include <linux/uaccess.h>
#include <linux/ftrace.h>
#include <linux/percpu.h>
#include <linux/sched.h>
#include <linux/init.h>
#include <linux/list.h>

#include <asm/ftrace.h>
#include <linux/ftrace.h>
#include <asm/nops.h>
#include <asm/nmi.h>

-static unsigned char ftrace_nop[MCOUNT_INSN_SIZE];

#ifdef CONFIG_DYNAMIC_FTRACE

union ftrace_code_union {
	char code[MCOUNT_INSN_SIZE];
...@@ -31,18 +34,12 @@ union ftrace_code_union {
	} __attribute__((packed));
};

static int ftrace_calc_offset(long ip, long addr)
{
	return (int)(addr - ip);
}

-unsigned char *ftrace_nop_replace(void)
-{
-	return ftrace_nop;
-}
-
-unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
{
	static union ftrace_code_union calc;
...@@ -56,7 +53,143 @@ unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
	return calc.code;
}

/*
* Modifying code must take extra care. On an SMP machine, if
* the code being modified is also being executed on another CPU
* that CPU will have undefined results and possibly take a GPF.
 * We use kstop_machine to stop other CPUs from executing code.
* But this does not stop NMIs from happening. We still need
* to protect against that. We separate out the modification of
* the code to take care of this.
*
* Two buffers are added: An IP buffer and a "code" buffer.
*
* 1) Put the instruction pointer into the IP buffer
* and the new code into the "code" buffer.
* 2) Set a flag that says we are modifying code
* 3) Wait for any running NMIs to finish.
* 4) Write the code
* 5) clear the flag.
* 6) Wait for any running NMIs to finish.
*
* If an NMI is executed, the first thing it does is to call
* "ftrace_nmi_enter". This will check if the flag is set to write
* and if it is, it will write what is in the IP and "code" buffers.
*
* The trick is, it does not matter if everyone is writing the same
* content to the code location. Also, if a CPU is executing code
* it is OK to write to that code location if the contents being written
* are the same as what exists.
*/
static atomic_t in_nmi = ATOMIC_INIT(0);
static int mod_code_status; /* holds return value of text write */
static int mod_code_write; /* set when NMI should do the write */
static void *mod_code_ip; /* holds the IP to write to */
static void *mod_code_newcode; /* holds the text to write to the IP */
static unsigned nmi_wait_count;
static atomic_t nmi_update_count = ATOMIC_INIT(0);
int ftrace_arch_read_dyn_info(char *buf, int size)
{
int r;
r = snprintf(buf, size, "%u %u",
nmi_wait_count,
atomic_read(&nmi_update_count));
return r;
}
static void ftrace_mod_code(void)
{
/*
* Yes, more than one CPU process can be writing to mod_code_status.
* (and the code itself)
* But if one were to fail, then they all should, and if one were
* to succeed, then they all should.
*/
mod_code_status = probe_kernel_write(mod_code_ip, mod_code_newcode,
MCOUNT_INSN_SIZE);
}
void ftrace_nmi_enter(void)
{
atomic_inc(&in_nmi);
/* Must have in_nmi seen before reading write flag */
smp_mb();
if (mod_code_write) {
ftrace_mod_code();
atomic_inc(&nmi_update_count);
}
}
void ftrace_nmi_exit(void)
{
/* Finish all executions before clearing in_nmi */
smp_wmb();
atomic_dec(&in_nmi);
}
static void wait_for_nmi(void)
{
int waited = 0;
while (atomic_read(&in_nmi)) {
waited = 1;
cpu_relax();
}
if (waited)
nmi_wait_count++;
}
static int
do_ftrace_mod_code(unsigned long ip, void *new_code)
{
mod_code_ip = (void *)ip;
mod_code_newcode = new_code;
/* The buffers need to be visible before we let NMIs write them */
smp_wmb();
mod_code_write = 1;
/* Make sure write bit is visible before we wait on NMIs */
smp_mb();
wait_for_nmi();
/* Make sure all running NMIs have finished before we write the code */
smp_mb();
ftrace_mod_code();
/* Make sure the write happens before clearing the bit */
smp_wmb();
mod_code_write = 0;
/* make sure NMIs see the cleared bit */
smp_mb();
wait_for_nmi();
return mod_code_status;
}
static unsigned char ftrace_nop[MCOUNT_INSN_SIZE];
static unsigned char *ftrace_nop_replace(void)
{
return ftrace_nop;
}
static int
ftrace_modify_code(unsigned long ip, unsigned char *old_code, ftrace_modify_code(unsigned long ip, unsigned char *old_code,
unsigned char *new_code) unsigned char *new_code)
{ {
...@@ -81,7 +214,7 @@ ftrace_modify_code(unsigned long ip, unsigned char *old_code, ...@@ -81,7 +214,7 @@ ftrace_modify_code(unsigned long ip, unsigned char *old_code,
return -EINVAL; return -EINVAL;
/* replace the text with the new text */ /* replace the text with the new text */
if (probe_kernel_write((void *)ip, new_code, MCOUNT_INSN_SIZE)) if (do_ftrace_mod_code(ip, new_code))
return -EPERM; return -EPERM;
sync_core(); sync_core();
...@@ -89,6 +222,29 @@ ftrace_modify_code(unsigned long ip, unsigned char *old_code, ...@@ -89,6 +222,29 @@ ftrace_modify_code(unsigned long ip, unsigned char *old_code,
return 0; return 0;
} }
int ftrace_make_nop(struct module *mod,
struct dyn_ftrace *rec, unsigned long addr)
{
unsigned char *new, *old;
unsigned long ip = rec->ip;
old = ftrace_call_replace(ip, addr);
new = ftrace_nop_replace();
return ftrace_modify_code(rec->ip, old, new);
}
int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
{
unsigned char *new, *old;
unsigned long ip = rec->ip;
old = ftrace_nop_replace();
new = ftrace_call_replace(ip, addr);
return ftrace_modify_code(rec->ip, old, new);
}
int ftrace_update_ftrace_func(ftrace_func_t func) int ftrace_update_ftrace_func(ftrace_func_t func)
{ {
unsigned long ip = (unsigned long)(&ftrace_call); unsigned long ip = (unsigned long)(&ftrace_call);
...@@ -165,3 +321,139 @@ int __init ftrace_dyn_arch_init(void *data) ...@@ -165,3 +321,139 @@ int __init ftrace_dyn_arch_init(void *data)
return 0; return 0;
} }
#endif
#ifdef CONFIG_FUNCTION_RET_TRACER
#ifndef CONFIG_DYNAMIC_FTRACE
/*
* These functions are picked from those used on
* this page for dynamic ftrace. They have been
* simplified to ignore all traces in NMI context.
*/
static atomic_t in_nmi;
void ftrace_nmi_enter(void)
{
atomic_inc(&in_nmi);
}
void ftrace_nmi_exit(void)
{
atomic_dec(&in_nmi);
}
#endif /* !CONFIG_DYNAMIC_FTRACE */
/* Push a function return address onto the current task's return trace stack. */
static int push_return_trace(unsigned long ret, unsigned long long time,
unsigned long func)
{
int index;
if (!current->ret_stack)
return -EBUSY;
/* The return trace stack is full */
if (current->curr_ret_stack == FTRACE_RETFUNC_DEPTH - 1) {
atomic_inc(&current->trace_overrun);
return -EBUSY;
}
index = ++current->curr_ret_stack;
barrier();
current->ret_stack[index].ret = ret;
current->ret_stack[index].func = func;
current->ret_stack[index].calltime = time;
return 0;
}
/* Pop a function return address from the current task's return trace stack. */
static void pop_return_trace(unsigned long *ret, unsigned long long *time,
unsigned long *func, unsigned long *overrun)
{
int index;
index = current->curr_ret_stack;
*ret = current->ret_stack[index].ret;
*func = current->ret_stack[index].func;
*time = current->ret_stack[index].calltime;
*overrun = atomic_read(&current->trace_overrun);
current->curr_ret_stack--;
}
/*
* Send the trace to the ring-buffer.
* @return the original return address.
*/
unsigned long ftrace_return_to_handler(void)
{
struct ftrace_retfunc trace;
pop_return_trace(&trace.ret, &trace.calltime, &trace.func,
&trace.overrun);
trace.rettime = cpu_clock(raw_smp_processor_id());
ftrace_function_return(&trace);
return trace.ret;
}
/*
* Hook the return address and push it onto the stack of return
* addresses in the current task.
*/
void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
{
unsigned long old;
unsigned long long calltime;
int faulted;
unsigned long return_hooker = (unsigned long)
&return_to_handler;
/* NMIs are currently unsupported */
if (atomic_read(&in_nmi))
return;
/*
* Protect against a fault, even if it shouldn't
* happen. This tool is too intrusive to
* ignore such a protection.
*/
asm volatile(
"1: movl (%[parent_old]), %[old]\n"
"2: movl %[return_hooker], (%[parent_replaced])\n"
" movl $0, %[faulted]\n"
".section .fixup, \"ax\"\n"
"3: movl $1, %[faulted]\n"
".previous\n"
".section __ex_table, \"a\"\n"
" .long 1b, 3b\n"
" .long 2b, 3b\n"
".previous\n"
: [parent_replaced] "=r" (parent), [old] "=r" (old),
[faulted] "=r" (faulted)
: [parent_old] "0" (parent), [return_hooker] "r" (return_hooker)
: "memory"
);
if (WARN_ON(faulted)) {
unregister_ftrace_return();
return;
}
if (WARN_ON(!__kernel_text_address(old))) {
unregister_ftrace_return();
*parent = old;
return;
}
calltime = cpu_clock(raw_smp_processor_id());
if (push_return_trace(old, calltime, self_addr) == -EBUSY)
*parent = old;
}
#endif /* CONFIG_FUNCTION_RET_TRACER */
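The do_ftrace_mod_code()/ftrace_nmi_enter() pair above implements the six-step protocol from the comment. A minimal user-space model of that handshake, with C11 seq_cst atomics standing in for the kernel's smp_mb()/atomic_t (illustrative only; in the kernel the concurrent writes race benignly because every writer stores identical bytes):

#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

static atomic_int in_nmi;
static atomic_int mod_code_write;
static unsigned char code[5];            /* stands in for the patched text */
static const unsigned char *newcode;

static void mod_code(void)
{
        /* Idempotent: every writer stores the same bytes. */
        memcpy(code, newcode, sizeof(code));
}

static void nmi_handler(void)
{
        atomic_fetch_add(&in_nmi, 1);
        if (atomic_load(&mod_code_write))    /* help with the write */
                mod_code();
        atomic_fetch_sub(&in_nmi, 1);
}

static void do_mod_code(const unsigned char *new)
{
        newcode = new;
        atomic_store(&mod_code_write, 1);    /* publish buffers + flag */
        while (atomic_load(&in_nmi))         /* wait out running NMIs */
                ;
        mod_code();
        atomic_store(&mod_code_write, 0);
        while (atomic_load(&in_nmi))         /* and again after clearing */
                ;
}

int main(void)
{
        static const unsigned char nop5[5] = { 0x90, 0x90, 0x90, 0x90, 0x90 };

        do_mod_code(nop5);
        nmi_handler();                       /* flag clear: just a no-op */
        printf("first byte: %#x\n", code[0]);
        return 0;
}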
...@@ -6,6 +6,7 @@ ...@@ -6,6 +6,7 @@
#include <linux/sched.h> #include <linux/sched.h>
#include <linux/stacktrace.h> #include <linux/stacktrace.h>
#include <linux/module.h> #include <linux/module.h>
#include <linux/uaccess.h>
#include <asm/stacktrace.h> #include <asm/stacktrace.h>
static void save_stack_warning(void *data, char *msg) static void save_stack_warning(void *data, char *msg)
...@@ -83,3 +84,66 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace) ...@@ -83,3 +84,66 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
trace->entries[trace->nr_entries++] = ULONG_MAX; trace->entries[trace->nr_entries++] = ULONG_MAX;
} }
EXPORT_SYMBOL_GPL(save_stack_trace_tsk); EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
/* Userspace stacktrace - based on kernel/trace/trace_sysprof.c */
struct stack_frame {
const void __user *next_fp;
unsigned long ret_addr;
};
static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
{
int ret;
if (!access_ok(VERIFY_READ, fp, sizeof(*frame)))
return 0;
ret = 1;
pagefault_disable();
if (__copy_from_user_inatomic(frame, fp, sizeof(*frame)))
ret = 0;
pagefault_enable();
return ret;
}
static inline void __save_stack_trace_user(struct stack_trace *trace)
{
const struct pt_regs *regs = task_pt_regs(current);
const void __user *fp = (const void __user *)regs->bp;
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = regs->ip;
while (trace->nr_entries < trace->max_entries) {
struct stack_frame frame;
frame.next_fp = NULL;
frame.ret_addr = 0;
if (!copy_stack_frame(fp, &frame))
break;
if ((unsigned long)fp < regs->sp)
break;
if (frame.ret_addr) {
trace->entries[trace->nr_entries++] =
frame.ret_addr;
}
if (fp == frame.next_fp)
break;
fp = frame.next_fp;
}
}
void save_stack_trace_user(struct stack_trace *trace)
{
/*
* Trace user stack if we are not a kernel thread
*/
if (current->mm) {
__save_stack_trace_user(trace);
}
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = ULONG_MAX;
}
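The walk above relies on the i386/x86_64 frame layout: each frame begins with the saved frame pointer followed by the return address, which is exactly what struct stack_frame describes. A user-space sketch of the same walk (build with -O0 -fno-omit-frame-pointer; without the kernel's access_ok()/pagefault guards this is best-effort only):

#include <stdio.h>

struct stack_frame {
        const struct stack_frame *next_fp;
        unsigned long ret_addr;
};

static void __attribute__((noinline)) dump_user_stack(void)
{
        const struct stack_frame *fp = __builtin_frame_address(0);
        int depth;

        for (depth = 0; fp && depth < 16; depth++) {
                printf("return address: %#lx\n", fp->ret_addr);
                /* same sanity check as above: frames grow upward only */
                if (fp >= fp->next_fp)
                        break;
                fp = fp->next_fp;
        }
}

static void __attribute__((noinline)) level2(void) { dump_user_stack(); }
static void __attribute__((noinline)) level1(void) { level2(); }

int main(void)
{
        level1();
        return 0;
}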
...@@ -17,6 +17,9 @@ ...@@ -17,6 +17,9 @@
* want per guest time just set the kernel.vsyscall64 sysctl to 0. * want per guest time just set the kernel.vsyscall64 sysctl to 0.
*/ */
/* Disable profiling for userspace code: */
#define DISABLE_BRANCH_PROFILING
#include <linux/time.h> #include <linux/time.h>
#include <linux/init.h> #include <linux/init.h>
#include <linux/kernel.h> #include <linux/kernel.h>
......
...@@ -39,7 +39,7 @@ static inline int __movsl_is_ok(unsigned long a1, unsigned long a2, unsigned lon ...@@ -39,7 +39,7 @@ static inline int __movsl_is_ok(unsigned long a1, unsigned long a2, unsigned lon
#define __do_strncpy_from_user(dst, src, count, res) \ #define __do_strncpy_from_user(dst, src, count, res) \
do { \ do { \
int __d0, __d1, __d2; \ int __d0, __d1, __d2; \
might_sleep(); \ might_fault(); \
__asm__ __volatile__( \ __asm__ __volatile__( \
" testl %1,%1\n" \ " testl %1,%1\n" \
" jz 2f\n" \ " jz 2f\n" \
...@@ -126,7 +126,7 @@ EXPORT_SYMBOL(strncpy_from_user); ...@@ -126,7 +126,7 @@ EXPORT_SYMBOL(strncpy_from_user);
#define __do_clear_user(addr,size) \ #define __do_clear_user(addr,size) \
do { \ do { \
int __d0; \ int __d0; \
might_sleep(); \ might_fault(); \
__asm__ __volatile__( \ __asm__ __volatile__( \
"0: rep; stosl\n" \ "0: rep; stosl\n" \
" movl %2,%0\n" \ " movl %2,%0\n" \
...@@ -155,7 +155,7 @@ do { \ ...@@ -155,7 +155,7 @@ do { \
unsigned long unsigned long
clear_user(void __user *to, unsigned long n) clear_user(void __user *to, unsigned long n)
{ {
might_sleep(); might_fault();
if (access_ok(VERIFY_WRITE, to, n)) if (access_ok(VERIFY_WRITE, to, n))
__do_clear_user(to, n); __do_clear_user(to, n);
return n; return n;
...@@ -197,7 +197,7 @@ long strnlen_user(const char __user *s, long n) ...@@ -197,7 +197,7 @@ long strnlen_user(const char __user *s, long n)
unsigned long mask = -__addr_ok(s); unsigned long mask = -__addr_ok(s);
unsigned long res, tmp; unsigned long res, tmp;
might_sleep(); might_fault();
__asm__ __volatile__( __asm__ __volatile__(
" testl %0, %0\n" " testl %0, %0\n"
......
...@@ -15,7 +15,7 @@ ...@@ -15,7 +15,7 @@
#define __do_strncpy_from_user(dst,src,count,res) \ #define __do_strncpy_from_user(dst,src,count,res) \
do { \ do { \
long __d0, __d1, __d2; \ long __d0, __d1, __d2; \
might_sleep(); \ might_fault(); \
__asm__ __volatile__( \ __asm__ __volatile__( \
" testq %1,%1\n" \ " testq %1,%1\n" \
" jz 2f\n" \ " jz 2f\n" \
...@@ -64,7 +64,7 @@ EXPORT_SYMBOL(strncpy_from_user); ...@@ -64,7 +64,7 @@ EXPORT_SYMBOL(strncpy_from_user);
unsigned long __clear_user(void __user *addr, unsigned long size) unsigned long __clear_user(void __user *addr, unsigned long size)
{ {
long __d0; long __d0;
might_sleep(); might_fault();
/* no memory constraint because it doesn't change any memory gcc knows /* no memory constraint because it doesn't change any memory gcc knows
about */ about */
asm volatile( asm volatile(
......
...@@ -8,9 +8,8 @@ obj-$(CONFIG_X86_PTDUMP) += dump_pagetables.o ...@@ -8,9 +8,8 @@ obj-$(CONFIG_X86_PTDUMP) += dump_pagetables.o
obj-$(CONFIG_HIGHMEM) += highmem_32.o obj-$(CONFIG_HIGHMEM) += highmem_32.o
obj-$(CONFIG_MMIOTRACE_HOOKS) += kmmio.o
obj-$(CONFIG_MMIOTRACE) += mmiotrace.o obj-$(CONFIG_MMIOTRACE) += mmiotrace.o
mmiotrace-y := pf_in.o mmio-mod.o mmiotrace-y := kmmio.o pf_in.o mmio-mod.o
obj-$(CONFIG_MMIOTRACE_TEST) += testmmiotrace.o obj-$(CONFIG_MMIOTRACE_TEST) += testmmiotrace.o
obj-$(CONFIG_NUMA) += numa_$(BITS).o obj-$(CONFIG_NUMA) += numa_$(BITS).o
......
...@@ -53,7 +53,7 @@ ...@@ -53,7 +53,7 @@
static inline int kmmio_fault(struct pt_regs *regs, unsigned long addr) static inline int kmmio_fault(struct pt_regs *regs, unsigned long addr)
{ {
#ifdef CONFIG_MMIOTRACE_HOOKS #ifdef CONFIG_MMIOTRACE
if (unlikely(is_kmmio_active())) if (unlikely(is_kmmio_active()))
if (kmmio_handler(regs, addr) == 1) if (kmmio_handler(regs, addr) == 1)
return -1; return -1;
......
...@@ -9,6 +9,9 @@ ...@@ -9,6 +9,9 @@
* Also alternative() doesn't work. * Also alternative() doesn't work.
*/ */
/* Disable profiling for userspace code: */
#define DISABLE_BRANCH_PROFILING
#include <linux/kernel.h> #include <linux/kernel.h>
#include <linux/posix-timers.h> #include <linux/posix-timers.h>
#include <linux/time.h> #include <linux/time.h>
......
...@@ -274,6 +274,22 @@ static struct sysrq_key_op sysrq_showstate_blocked_op = { ...@@ -274,6 +274,22 @@ static struct sysrq_key_op sysrq_showstate_blocked_op = {
.enable_mask = SYSRQ_ENABLE_DUMP, .enable_mask = SYSRQ_ENABLE_DUMP,
}; };
#ifdef CONFIG_TRACING
#include <linux/ftrace.h>
static void sysrq_ftrace_dump(int key, struct tty_struct *tty)
{
ftrace_dump();
}
static struct sysrq_key_op sysrq_ftrace_dump_op = {
.handler = sysrq_ftrace_dump,
.help_msg = "dumpZ-ftrace-buffer",
.action_msg = "Dump ftrace buffer",
.enable_mask = SYSRQ_ENABLE_DUMP,
};
#else
#define sysrq_ftrace_dump_op (*(struct sysrq_key_op *)0)
#endif
static void sysrq_handle_showmem(int key, struct tty_struct *tty) static void sysrq_handle_showmem(int key, struct tty_struct *tty)
{ {
...@@ -406,7 +422,7 @@ static struct sysrq_key_op *sysrq_key_table[36] = { ...@@ -406,7 +422,7 @@ static struct sysrq_key_op *sysrq_key_table[36] = {
NULL, /* x */ NULL, /* x */
/* y: May be registered on sparc64 for global register dump */ /* y: May be registered on sparc64 for global register dump */
NULL, /* y */ NULL, /* y */
NULL /* z */ &sysrq_ftrace_dump_op, /* z */
}; };
/* key2index calculation, -1 on invalid index */ /* key2index calculation, -1 on invalid index */
......
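With the hunk above in the sysrq table, the ftrace ring buffer can be dumped on demand from a live (or dying) box, for example:

	echo z > /debug/../proc/sysrq-trigger

i.e. echo z > /proc/sysrq-trigger, or Alt-SysRq-z on a console; the buffer contents then appear in the kernel log.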
...@@ -804,6 +804,9 @@ static int pxafb_smart_thread(void *arg) ...@@ -804,6 +804,9 @@ static int pxafb_smart_thread(void *arg)
static int pxafb_smart_init(struct pxafb_info *fbi) static int pxafb_smart_init(struct pxafb_info *fbi)
{ {
if (!(fbi->lccr0 & LCCR0_LCDT))
return 0;
fbi->smart_thread = kthread_run(pxafb_smart_thread, fbi, fbi->smart_thread = kthread_run(pxafb_smart_thread, fbi,
"lcd_refresh"); "lcd_refresh");
if (IS_ERR(fbi->smart_thread)) { if (IS_ERR(fbi->smart_thread)) {
...@@ -1372,7 +1375,7 @@ static void pxafb_decode_mach_info(struct pxafb_info *fbi, ...@@ -1372,7 +1375,7 @@ static void pxafb_decode_mach_info(struct pxafb_info *fbi,
fbi->cmap_inverse = inf->cmap_inverse; fbi->cmap_inverse = inf->cmap_inverse;
fbi->cmap_static = inf->cmap_static; fbi->cmap_static = inf->cmap_static;
switch (lcd_conn & 0xf) { switch (lcd_conn & LCD_TYPE_MASK) {
case LCD_TYPE_MONO_STN: case LCD_TYPE_MONO_STN:
fbi->lccr0 = LCCR0_CMS; fbi->lccr0 = LCCR0_CMS;
break; break;
......
...@@ -357,7 +357,18 @@ int seq_printf(struct seq_file *m, const char *f, ...) ...@@ -357,7 +357,18 @@ int seq_printf(struct seq_file *m, const char *f, ...)
} }
EXPORT_SYMBOL(seq_printf); EXPORT_SYMBOL(seq_printf);
static char *mangle_path(char *s, char *p, char *esc) /**
* mangle_path - mangle and copy path to buffer beginning
* @s: buffer start
* @p: beginning of path in above buffer
* @esc: set of characters that need escaping
*
* Copy the path from @p to @s, replacing each occurrence of a character from
* @esc with the usual octal escape.
* Returns pointer past last written character in @s, or NULL in case of
* failure.
*/
char *mangle_path(char *s, char *p, char *esc)
{ {
while (s <= p) { while (s <= p) {
char c = *p++; char c = *p++;
...@@ -376,6 +387,7 @@ static char *mangle_path(char *s, char *p, char *esc) ...@@ -376,6 +387,7 @@ static char *mangle_path(char *s, char *p, char *esc)
} }
return NULL; return NULL;
} }
EXPORT_SYMBOL_GPL(mangle_path);
/* /*
* return the absolute path of 'dentry' residing in mount 'mnt'. * return the absolute path of 'dentry' residing in mount 'mnt'.
......
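Exporting mangle_path() lets tracers reuse seq_file's escaping. The calling convention is the one seq_path() uses: the raw string sits at the tail of the buffer and is rewritten, escaped, from the head. A sketch (kernel-context fragment; buffer contents and sizes are illustrative):

char buf[64];
char *p = buf + sizeof(buf) - sizeof("/tmp/a b");
char *end;

strcpy(p, "/tmp/a b");                  /* raw path at the buffer tail */
end = mangle_path(buf, p, " \t\n\\");   /* escape space, tab, nl, '\' */
if (end)
        /* buf now holds "/tmp/a\040b"; end points past the last byte */
        printk("%.*s\n", (int)(end - buf), buf);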
...@@ -45,6 +45,22 @@ ...@@ -45,6 +45,22 @@
#define MCOUNT_REC() #define MCOUNT_REC()
#endif #endif
#ifdef CONFIG_TRACE_BRANCH_PROFILING
#define LIKELY_PROFILE() VMLINUX_SYMBOL(__start_annotated_branch_profile) = .; \
*(_ftrace_annotated_branch) \
VMLINUX_SYMBOL(__stop_annotated_branch_profile) = .;
#else
#define LIKELY_PROFILE()
#endif
#ifdef CONFIG_PROFILE_ALL_BRANCHES
#define BRANCH_PROFILE() VMLINUX_SYMBOL(__start_branch_profile) = .; \
*(_ftrace_branch) \
VMLINUX_SYMBOL(__stop_branch_profile) = .;
#else
#define BRANCH_PROFILE()
#endif
/* .data section */ /* .data section */
#define DATA_DATA \ #define DATA_DATA \
*(.data) \ *(.data) \
...@@ -60,9 +76,12 @@ ...@@ -60,9 +76,12 @@
VMLINUX_SYMBOL(__start___markers) = .; \ VMLINUX_SYMBOL(__start___markers) = .; \
*(__markers) \ *(__markers) \
VMLINUX_SYMBOL(__stop___markers) = .; \ VMLINUX_SYMBOL(__stop___markers) = .; \
. = ALIGN(32); \
VMLINUX_SYMBOL(__start___tracepoints) = .; \ VMLINUX_SYMBOL(__start___tracepoints) = .; \
*(__tracepoints) \ *(__tracepoints) \
VMLINUX_SYMBOL(__stop___tracepoints) = .; VMLINUX_SYMBOL(__stop___tracepoints) = .; \
LIKELY_PROFILE() \
BRANCH_PROFILE()
#define RO_DATA(align) \ #define RO_DATA(align) \
. = ALIGN((align)); \ . = ALIGN((align)); \
......
...@@ -59,8 +59,88 @@ extern void __chk_io_ptr(const volatile void __iomem *); ...@@ -59,8 +59,88 @@ extern void __chk_io_ptr(const volatile void __iomem *);
* specific implementations come from the above header files * specific implementations come from the above header files
*/ */
#define likely(x) __builtin_expect(!!(x), 1) struct ftrace_branch_data {
#define unlikely(x) __builtin_expect(!!(x), 0) const char *func;
const char *file;
unsigned line;
union {
struct {
unsigned long correct;
unsigned long incorrect;
};
struct {
unsigned long miss;
unsigned long hit;
};
};
};
/*
* Note: DISABLE_BRANCH_PROFILING can be used by special lowlevel code
* to disable branch tracing on a per file basis.
*/
#if defined(CONFIG_TRACE_BRANCH_PROFILING) && !defined(DISABLE_BRANCH_PROFILING)
void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
#define likely_notrace(x) __builtin_expect(!!(x), 1)
#define unlikely_notrace(x) __builtin_expect(!!(x), 0)
#define __branch_check__(x, expect) ({ \
int ______r; \
static struct ftrace_branch_data \
__attribute__((__aligned__(4))) \
__attribute__((section("_ftrace_annotated_branch"))) \
______f = { \
.func = __func__, \
.file = __FILE__, \
.line = __LINE__, \
}; \
______r = likely_notrace(x); \
ftrace_likely_update(&______f, ______r, expect); \
______r; \
})
/*
* Using __builtin_constant_p(x) to ignore cases where the return
* value is always the same. This idea is taken from a similar patch
* written by Daniel Walker.
*/
# ifndef likely
# define likely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 1))
# endif
# ifndef unlikely
# define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
# endif
#ifdef CONFIG_PROFILE_ALL_BRANCHES
/*
* "Define 'is'", Bill Clinton
* "Define 'if'", Steven Rostedt
*/
#define if(cond) if (__builtin_constant_p((cond)) ? !!(cond) : \
({ \
int ______r; \
static struct ftrace_branch_data \
__attribute__((__aligned__(4))) \
__attribute__((section("_ftrace_branch"))) \
______f = { \
.func = __func__, \
.file = __FILE__, \
.line = __LINE__, \
}; \
______r = !!(cond); \
if (______r) \
______f.hit++; \
else \
______f.miss++; \
______r; \
}))
#endif /* CONFIG_PROFILE_ALL_BRANCHES */
#else
# define likely(x) __builtin_expect(!!(x), 1)
# define unlikely(x) __builtin_expect(!!(x), 0)
#endif
/* Optimization barrier */ /* Optimization barrier */
#ifndef barrier #ifndef barrier
......
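What __branch_check__ boils down to, modeled in user space: each annotated site gets its own static record keyed by func/file/line, and every evaluation bumps correct or incorrect depending on whether the outcome matched the annotation. The kernel gathers all such records through the _ftrace_annotated_branch section; this sketch just remembers a pointer to the one site:

#include <stdio.h>

struct branch_data {
        const char *func, *file;
        unsigned line;
        unsigned long correct, incorrect;
};

static struct branch_data *last_site;

#define my_likely(x) ({                                         \
        static struct branch_data __d = {                       \
                .func = __func__,                               \
                .file = __FILE__,                               \
                .line = __LINE__,                               \
        };                                                      \
        int __r = !!(x);                                        \
        if (__r)                                                \
                __d.correct++;   /* annotation was right */     \
        else                                                    \
                __d.incorrect++; /* annotation was wrong */     \
        last_site = &__d;                                       \
        __builtin_expect(__r, 1);                               \
})

int main(void)
{
        int hits = 0;

        for (int i = 0; i < 100; i++)
                if (my_likely(i % 10 != 0))   /* true 90% of the time */
                        hits++;

        printf("%s:%u correct=%lu incorrect=%lu (hits=%d)\n",
               last_site->file, last_site->line,
               last_site->correct, last_site->incorrect, hits);
        return 0;
}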
...@@ -17,7 +17,7 @@ extern int debug_locks_off(void); ...@@ -17,7 +17,7 @@ extern int debug_locks_off(void);
({ \ ({ \
int __ret = 0; \ int __ret = 0; \
\ \
if (unlikely(c)) { \ if (!oops_in_progress && unlikely(c)) { \
if (debug_locks_off() && !debug_locks_silent) \ if (debug_locks_off() && !debug_locks_silent) \
WARN_ON(1); \ WARN_ON(1); \
__ret = 1; \ __ret = 1; \
......
...@@ -23,6 +23,45 @@ struct ftrace_ops { ...@@ -23,6 +23,45 @@ struct ftrace_ops {
struct ftrace_ops *next; struct ftrace_ops *next;
}; };
extern int function_trace_stop;
/*
* Type of the current tracing.
*/
enum ftrace_tracing_type_t {
FTRACE_TYPE_ENTER = 0, /* Hook the call of the function */
FTRACE_TYPE_RETURN, /* Hook the return of the function */
};
/* Current tracing type, default is FTRACE_TYPE_ENTER */
extern enum ftrace_tracing_type_t ftrace_tracing_type;
/**
* ftrace_stop - stop function tracer.
*
* A quick way to stop the function tracer. Note this is an on/off switch,
* it is not something that is recursive like preempt_disable.
* This does not disable the calling of mcount; it only stops the
* calling of functions from mcount.
*/
static inline void ftrace_stop(void)
{
function_trace_stop = 1;
}
/**
* ftrace_start - start the function tracer.
*
* This function is the inverse of ftrace_stop. This does not enable
* function tracing if the function tracer is disabled. This only
* sets the function tracer flag to continue calling the functions
* from mcount.
*/
static inline void ftrace_start(void)
{
function_trace_stop = 0;
}
/* /*
* The ftrace_ops must be a static and should also * The ftrace_ops must be a static and should also
* be read_mostly. These functions do modify read_mostly variables * be read_mostly. These functions do modify read_mostly variables
...@@ -41,9 +80,13 @@ extern void ftrace_stub(unsigned long a0, unsigned long a1); ...@@ -41,9 +80,13 @@ extern void ftrace_stub(unsigned long a0, unsigned long a1);
# define unregister_ftrace_function(ops) do { } while (0) # define unregister_ftrace_function(ops) do { } while (0)
# define clear_ftrace_function(ops) do { } while (0) # define clear_ftrace_function(ops) do { } while (0)
static inline void ftrace_kill(void) { } static inline void ftrace_kill(void) { }
static inline void ftrace_stop(void) { }
static inline void ftrace_start(void) { }
#endif /* CONFIG_FUNCTION_TRACER */ #endif /* CONFIG_FUNCTION_TRACER */
#ifdef CONFIG_DYNAMIC_FTRACE #ifdef CONFIG_DYNAMIC_FTRACE
/* asm/ftrace.h must be defined for archs supporting dynamic ftrace */
#include <asm/ftrace.h>
enum { enum {
FTRACE_FL_FREE = (1 << 0), FTRACE_FL_FREE = (1 << 0),
...@@ -59,6 +102,7 @@ struct dyn_ftrace { ...@@ -59,6 +102,7 @@ struct dyn_ftrace {
struct list_head list; struct list_head list;
unsigned long ip; /* address of mcount call-site */ unsigned long ip; /* address of mcount call-site */
unsigned long flags; unsigned long flags;
struct dyn_arch_ftrace arch;
}; };
int ftrace_force_update(void); int ftrace_force_update(void);
...@@ -66,19 +110,43 @@ void ftrace_set_filter(unsigned char *buf, int len, int reset); ...@@ -66,19 +110,43 @@ void ftrace_set_filter(unsigned char *buf, int len, int reset);
/* defined in arch */ /* defined in arch */
extern int ftrace_ip_converted(unsigned long ip); extern int ftrace_ip_converted(unsigned long ip);
extern unsigned char *ftrace_nop_replace(void);
extern unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr);
extern int ftrace_dyn_arch_init(void *data); extern int ftrace_dyn_arch_init(void *data);
extern int ftrace_update_ftrace_func(ftrace_func_t func); extern int ftrace_update_ftrace_func(ftrace_func_t func);
extern void ftrace_caller(void); extern void ftrace_caller(void);
extern void ftrace_call(void); extern void ftrace_call(void);
extern void mcount_call(void); extern void mcount_call(void);
#ifdef CONFIG_FUNCTION_RET_TRACER
extern void ftrace_return_caller(void);
#endif
/**
* ftrace_make_nop - convert code into nop
* @mod: module structure if called by module load initialization
* @rec: the mcount call site record
* @addr: the address that the call site should be calling
*
* This is a very sensitive operation and great care needs
* to be taken by the arch. The operation should carefully
* read the location, check to see if what is read is indeed
* what we expect it to be, and then on success of the compare,
* it should write to the location.
*
* The code segment at @rec->ip should be a caller to @addr
*
* Return must be:
* 0 on success
* -EFAULT on error reading the location
* -EINVAL on a failed compare of the contents
* -EPERM on error writing to the location
* Any other value will be considered a failure.
*/
extern int ftrace_make_nop(struct module *mod,
struct dyn_ftrace *rec, unsigned long addr);
/** /**
* ftrace_modify_code - modify code segment * ftrace_make_call - convert a nop call site into a call to addr
* @ip: the address of the code segment * @rec: the mcount call site record
* @old_code: the contents of what is expected to be there * @addr: the address that the call site should call
* @new_code: the code to patch in
* *
* This is a very sensitive operation and great care needs * This is a very sensitive operation and great care needs
* to be taken by the arch. The operation should carefully * to be taken by the arch. The operation should carefully
...@@ -86,6 +154,8 @@ extern void mcount_call(void); ...@@ -86,6 +154,8 @@ extern void mcount_call(void);
* what we expect it to be, and then on success of the compare, * what we expect it to be, and then on success of the compare,
* it should write to the location. * it should write to the location.
* *
* The code segment at @rec->ip should be a nop
*
* Return must be: * Return must be:
* 0 on success * 0 on success
* -EFAULT on error reading the location * -EFAULT on error reading the location
...@@ -93,8 +163,11 @@ extern void mcount_call(void); ...@@ -93,8 +163,11 @@ extern void mcount_call(void);
* -EPERM on error writing to the location * -EPERM on error writing to the location
* Any other value will be considered a failure. * Any other value will be considered a failure.
*/ */
extern int ftrace_modify_code(unsigned long ip, unsigned char *old_code, extern int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr);
unsigned char *new_code);
/* May be defined in arch */
extern int ftrace_arch_read_dyn_info(char *buf, int size);
extern int skip_trace(unsigned long ip); extern int skip_trace(unsigned long ip);
...@@ -102,7 +175,6 @@ extern void ftrace_release(void *start, unsigned long size); ...@@ -102,7 +175,6 @@ extern void ftrace_release(void *start, unsigned long size);
extern void ftrace_disable_daemon(void); extern void ftrace_disable_daemon(void);
extern void ftrace_enable_daemon(void); extern void ftrace_enable_daemon(void);
#else #else
# define skip_trace(ip) ({ 0; }) # define skip_trace(ip) ({ 0; })
# define ftrace_force_update() ({ 0; }) # define ftrace_force_update() ({ 0; })
...@@ -181,6 +253,12 @@ static inline void __ftrace_enabled_restore(int enabled) ...@@ -181,6 +253,12 @@ static inline void __ftrace_enabled_restore(int enabled)
#endif #endif
#ifdef CONFIG_TRACING #ifdef CONFIG_TRACING
extern int ftrace_dump_on_oops;
extern void tracing_start(void);
extern void tracing_stop(void);
extern void ftrace_off_permanent(void);
extern void extern void
ftrace_special(unsigned long arg1, unsigned long arg2, unsigned long arg3); ftrace_special(unsigned long arg1, unsigned long arg2, unsigned long arg3);
...@@ -211,6 +289,9 @@ ftrace_special(unsigned long arg1, unsigned long arg2, unsigned long arg3) { } ...@@ -211,6 +289,9 @@ ftrace_special(unsigned long arg1, unsigned long arg2, unsigned long arg3) { }
static inline int static inline int
ftrace_printk(const char *fmt, ...) __attribute__ ((format (printf, 1, 0))); ftrace_printk(const char *fmt, ...) __attribute__ ((format (printf, 1, 0)));
static inline void tracing_start(void) { }
static inline void tracing_stop(void) { }
static inline void ftrace_off_permanent(void) { }
static inline int static inline int
ftrace_printk(const char *fmt, ...) ftrace_printk(const char *fmt, ...)
{ {
...@@ -221,33 +302,44 @@ static inline void ftrace_dump(void) { } ...@@ -221,33 +302,44 @@ static inline void ftrace_dump(void) { }
#ifdef CONFIG_FTRACE_MCOUNT_RECORD #ifdef CONFIG_FTRACE_MCOUNT_RECORD
extern void ftrace_init(void); extern void ftrace_init(void);
extern void ftrace_init_module(unsigned long *start, unsigned long *end); extern void ftrace_init_module(struct module *mod,
unsigned long *start, unsigned long *end);
#else #else
static inline void ftrace_init(void) { } static inline void ftrace_init(void) { }
static inline void static inline void
ftrace_init_module(unsigned long *start, unsigned long *end) { } ftrace_init_module(struct module *mod,
unsigned long *start, unsigned long *end) { }
#endif #endif
struct boot_trace { /*
pid_t caller; * Structure that defines a return function trace.
char func[KSYM_NAME_LEN]; */
int result; struct ftrace_retfunc {
unsigned long long duration; /* usecs */ unsigned long ret; /* Return address */
ktime_t calltime; unsigned long func; /* Current function */
ktime_t rettime; unsigned long long calltime;
unsigned long long rettime;
/* Number of functions that overran the depth limit for current task */
unsigned long overrun;
}; };
#ifdef CONFIG_BOOT_TRACER #ifdef CONFIG_FUNCTION_RET_TRACER
extern void trace_boot(struct boot_trace *it, initcall_t fn); #define FTRACE_RETFUNC_DEPTH 50
extern void start_boot_trace(void); #define FTRACE_RETSTACK_ALLOC_SIZE 32
extern void stop_boot_trace(void); /* Type of a callback handler of tracing return function */
#else typedef void (*trace_function_return_t)(struct ftrace_retfunc *);
static inline void trace_boot(struct boot_trace *it, initcall_t fn) { }
static inline void start_boot_trace(void) { }
static inline void stop_boot_trace(void) { }
#endif
extern int register_ftrace_return(trace_function_return_t func);
/* The current handler in use */
extern trace_function_return_t ftrace_function_return;
extern void unregister_ftrace_return(void);
extern void ftrace_retfunc_init_task(struct task_struct *t);
extern void ftrace_retfunc_exit_task(struct task_struct *t);
#else
static inline void ftrace_retfunc_init_task(struct task_struct *t) { }
static inline void ftrace_retfunc_exit_task(struct task_struct *t) { }
#endif
#endif /* _LINUX_FTRACE_H */ #endif /* _LINUX_FTRACE_H */
#ifndef _LINUX_FTRACE_IRQ_H
#define _LINUX_FTRACE_IRQ_H
#if defined(CONFIG_DYNAMIC_FTRACE) || defined(CONFIG_FUNCTION_RET_TRACER)
extern void ftrace_nmi_enter(void);
extern void ftrace_nmi_exit(void);
#else
static inline void ftrace_nmi_enter(void) { }
static inline void ftrace_nmi_exit(void) { }
#endif
#endif /* _LINUX_FTRACE_IRQ_H */
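Putting the new return-tracing API together: a handler of type trace_function_return_t is handed one ftrace_retfunc per function exit, and only one handler can be installed at a time (ftrace_function_return holds the current one). A hedged sketch of a module using it; the probe name and message are illustrative:

#include <linux/module.h>
#include <linux/ftrace.h>

static void my_return_probe(struct ftrace_retfunc *trace)
{
        unsigned long long delta = trace->rettime - trace->calltime;

        ftrace_printk("func %pF took %llu ns (overrun: %lu)\n",
                      (void *)trace->func, delta, trace->overrun);
}

static int __init retprobe_init(void)
{
        return register_ftrace_return(my_return_probe);
}

static void __exit retprobe_exit(void)
{
        unregister_ftrace_return();
}

module_init(retprobe_init);
module_exit(retprobe_exit);
MODULE_LICENSE("GPL");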
...@@ -164,6 +164,8 @@ union futex_key { ...@@ -164,6 +164,8 @@ union futex_key {
} both; } both;
}; };
#define FUTEX_KEY_INIT (union futex_key) { .both = { .ptr = NULL } }
#ifdef CONFIG_FUTEX #ifdef CONFIG_FUTEX
extern void exit_robust_list(struct task_struct *curr); extern void exit_robust_list(struct task_struct *curr);
extern void exit_pi_state_list(struct task_struct *curr); extern void exit_pi_state_list(struct task_struct *curr);
......
...@@ -4,6 +4,7 @@ ...@@ -4,6 +4,7 @@
#include <linux/preempt.h> #include <linux/preempt.h>
#include <linux/smp_lock.h> #include <linux/smp_lock.h>
#include <linux/lockdep.h> #include <linux/lockdep.h>
#include <linux/ftrace_irq.h>
#include <asm/hardirq.h> #include <asm/hardirq.h>
#include <asm/system.h> #include <asm/system.h>
...@@ -161,7 +162,17 @@ extern void irq_enter(void); ...@@ -161,7 +162,17 @@ extern void irq_enter(void);
*/ */
extern void irq_exit(void); extern void irq_exit(void);
#define nmi_enter() do { lockdep_off(); __irq_enter(); } while (0) #define nmi_enter() \
#define nmi_exit() do { __irq_exit(); lockdep_on(); } while (0) do { \
ftrace_nmi_enter(); \
lockdep_off(); \
__irq_enter(); \
} while (0)
#define nmi_exit() \
do { \
__irq_exit(); \
lockdep_on(); \
ftrace_nmi_exit(); \
} while (0)
#endif /* LINUX_HARDIRQ_H */ #endif /* LINUX_HARDIRQ_H */
...@@ -141,6 +141,15 @@ extern int _cond_resched(void); ...@@ -141,6 +141,15 @@ extern int _cond_resched(void);
(__x < 0) ? -__x : __x; \ (__x < 0) ? -__x : __x; \
}) })
#ifdef CONFIG_PROVE_LOCKING
void might_fault(void);
#else
static inline void might_fault(void)
{
might_sleep();
}
#endif
extern struct atomic_notifier_head panic_notifier_list; extern struct atomic_notifier_head panic_notifier_list;
extern long (*panic_blink)(long time); extern long (*panic_blink)(long time);
NORET_TYPE void panic(const char * fmt, ...) NORET_TYPE void panic(const char * fmt, ...)
...@@ -188,6 +197,8 @@ extern unsigned long long memparse(const char *ptr, char **retptr); ...@@ -188,6 +197,8 @@ extern unsigned long long memparse(const char *ptr, char **retptr);
extern int core_kernel_text(unsigned long addr); extern int core_kernel_text(unsigned long addr);
extern int __kernel_text_address(unsigned long addr); extern int __kernel_text_address(unsigned long addr);
extern int kernel_text_address(unsigned long addr); extern int kernel_text_address(unsigned long addr);
extern int func_ptr_is_kernel_text(void *ptr);
struct pid; struct pid;
extern struct pid *session_of_pgrp(struct pid *pgrp); extern struct pid *session_of_pgrp(struct pid *pgrp);
......
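This is the other half of the might_sleep() to might_fault() conversions in the usercopy files above: the plain fallback keeps the old might_sleep() behaviour, while the CONFIG_PROVE_LOCKING variant (defined elsewhere in this series) can additionally teach lockdep about the mmap_sem taken by the fault path. The call-site pattern, with an illustrative helper:

/* Any helper that touches user memory from sleepable context should
 * announce it: a fault here can sleep (and takes mmap_sem). */
static int report_status(int __user *up, int status)
{
        might_fault();                  /* was: might_sleep() */
        return put_user(status, up);
}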
...@@ -73,6 +73,8 @@ struct lock_class_key { ...@@ -73,6 +73,8 @@ struct lock_class_key {
struct lockdep_subclass_key subkeys[MAX_LOCKDEP_SUBCLASSES]; struct lockdep_subclass_key subkeys[MAX_LOCKDEP_SUBCLASSES];
}; };
#define LOCKSTAT_POINTS 4
/* /*
* The lock-class itself: * The lock-class itself:
*/ */
...@@ -119,7 +121,8 @@ struct lock_class { ...@@ -119,7 +121,8 @@ struct lock_class {
int name_version; int name_version;
#ifdef CONFIG_LOCK_STAT #ifdef CONFIG_LOCK_STAT
unsigned long contention_point[4]; unsigned long contention_point[LOCKSTAT_POINTS];
unsigned long contending_point[LOCKSTAT_POINTS];
#endif #endif
}; };
...@@ -144,6 +147,7 @@ enum bounce_type { ...@@ -144,6 +147,7 @@ enum bounce_type {
struct lock_class_stats { struct lock_class_stats {
unsigned long contention_point[4]; unsigned long contention_point[4];
unsigned long contending_point[4];
struct lock_time read_waittime; struct lock_time read_waittime;
struct lock_time write_waittime; struct lock_time write_waittime;
struct lock_time read_holdtime; struct lock_time read_holdtime;
...@@ -165,6 +169,7 @@ struct lockdep_map { ...@@ -165,6 +169,7 @@ struct lockdep_map {
const char *name; const char *name;
#ifdef CONFIG_LOCK_STAT #ifdef CONFIG_LOCK_STAT
int cpu; int cpu;
unsigned long ip;
#endif #endif
}; };
...@@ -356,7 +361,7 @@ struct lock_class_key { }; ...@@ -356,7 +361,7 @@ struct lock_class_key { };
#ifdef CONFIG_LOCK_STAT #ifdef CONFIG_LOCK_STAT
extern void lock_contended(struct lockdep_map *lock, unsigned long ip); extern void lock_contended(struct lockdep_map *lock, unsigned long ip);
extern void lock_acquired(struct lockdep_map *lock); extern void lock_acquired(struct lockdep_map *lock, unsigned long ip);
#define LOCK_CONTENDED(_lock, try, lock) \ #define LOCK_CONTENDED(_lock, try, lock) \
do { \ do { \
...@@ -364,13 +369,13 @@ do { \ ...@@ -364,13 +369,13 @@ do { \
lock_contended(&(_lock)->dep_map, _RET_IP_); \ lock_contended(&(_lock)->dep_map, _RET_IP_); \
lock(_lock); \ lock(_lock); \
} \ } \
lock_acquired(&(_lock)->dep_map); \ lock_acquired(&(_lock)->dep_map, _RET_IP_); \
} while (0) } while (0)
#else /* CONFIG_LOCK_STAT */ #else /* CONFIG_LOCK_STAT */
#define lock_contended(lockdep_map, ip) do {} while (0) #define lock_contended(lockdep_map, ip) do {} while (0)
#define lock_acquired(lockdep_map) do {} while (0) #define lock_acquired(lockdep_map, ip) do {} while (0)
#define LOCK_CONTENDED(_lock, try, lock) \ #define LOCK_CONTENDED(_lock, try, lock) \
lock(_lock) lock(_lock)
...@@ -481,4 +486,22 @@ static inline void print_irqtrace_events(struct task_struct *curr) ...@@ -481,4 +486,22 @@ static inline void print_irqtrace_events(struct task_struct *curr)
# define lock_map_release(l) do { } while (0) # define lock_map_release(l) do { } while (0)
#endif #endif
#ifdef CONFIG_PROVE_LOCKING
# define might_lock(lock) \
do { \
typecheck(struct lockdep_map *, &(lock)->dep_map); \
lock_acquire(&(lock)->dep_map, 0, 0, 0, 2, NULL, _THIS_IP_); \
lock_release(&(lock)->dep_map, 0, _THIS_IP_); \
} while (0)
# define might_lock_read(lock) \
do { \
typecheck(struct lockdep_map *, &(lock)->dep_map); \
lock_acquire(&(lock)->dep_map, 0, 0, 1, 2, NULL, _THIS_IP_); \
lock_release(&(lock)->dep_map, 0, _THIS_IP_); \
} while (0)
#else
# define might_lock(lock) do { } while (0)
# define might_lock_read(lock) do { } while (0)
#endif
#endif /* __LINUX_LOCKDEP_H */ #endif /* __LINUX_LOCKDEP_H */
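might_lock()/might_lock_read() do for locks what might_fault() does for user access: a function that only occasionally takes a lock can report the dependency to lockdep on every call, via a dummy acquire/release pair. A sketch under assumed types (my_dev and cfg_lock are illustrative):

struct my_dev {
        struct mutex cfg_lock;
        int cur_rate;
};

static int set_rate(struct my_dev *dev, int rate)
{
        might_lock(&dev->cfg_lock);     /* dependency visible to lockdep
                                           even on the fast path */

        if (rate == dev->cur_rate)
                return 0;               /* fast path: lock not taken */

        mutex_lock(&dev->cfg_lock);
        dev->cur_rate = rate;
        mutex_unlock(&dev->cfg_lock);
        return 0;
}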
...@@ -12,6 +12,7 @@ ...@@ -12,6 +12,7 @@
* See the file COPYING for more details. * See the file COPYING for more details.
*/ */
#include <stdarg.h>
#include <linux/types.h> #include <linux/types.h>
struct module; struct module;
...@@ -48,10 +49,28 @@ struct marker { ...@@ -48,10 +49,28 @@ struct marker {
void (*call)(const struct marker *mdata, void *call_private, ...); void (*call)(const struct marker *mdata, void *call_private, ...);
struct marker_probe_closure single; struct marker_probe_closure single;
struct marker_probe_closure *multi; struct marker_probe_closure *multi;
const char *tp_name; /* Optional tracepoint name */
void *tp_cb; /* Optional tracepoint callback */
} __attribute__((aligned(8))); } __attribute__((aligned(8)));
#ifdef CONFIG_MARKERS #ifdef CONFIG_MARKERS
#define _DEFINE_MARKER(name, tp_name_str, tp_cb, format) \
static const char __mstrtab_##name[] \
__attribute__((section("__markers_strings"))) \
= #name "\0" format; \
static struct marker __mark_##name \
__attribute__((section("__markers"), aligned(8))) = \
{ __mstrtab_##name, &__mstrtab_##name[sizeof(#name)], \
0, 0, marker_probe_cb, { __mark_empty_function, NULL},\
NULL, tp_name_str, tp_cb }
#define DEFINE_MARKER(name, format) \
_DEFINE_MARKER(name, NULL, NULL, format)
#define DEFINE_MARKER_TP(name, tp_name, tp_cb, format) \
_DEFINE_MARKER(name, #tp_name, tp_cb, format)
/* /*
* Note : the empty asm volatile with read constraint is used here instead of a * Note : the empty asm volatile with read constraint is used here instead of a
* "used" attribute to fix a gcc 4.1.x bug. * "used" attribute to fix a gcc 4.1.x bug.
...@@ -65,14 +84,7 @@ struct marker { ...@@ -65,14 +84,7 @@ struct marker {
*/ */
#define __trace_mark(generic, name, call_private, format, args...) \ #define __trace_mark(generic, name, call_private, format, args...) \
do { \ do { \
static const char __mstrtab_##name[] \ DEFINE_MARKER(name, format); \
__attribute__((section("__markers_strings"))) \
= #name "\0" format; \
static struct marker __mark_##name \
__attribute__((section("__markers"), aligned(8))) = \
{ __mstrtab_##name, &__mstrtab_##name[sizeof(#name)], \
0, 0, marker_probe_cb, \
{ __mark_empty_function, NULL}, NULL }; \
__mark_check_format(format, ## args); \ __mark_check_format(format, ## args); \
if (unlikely(__mark_##name.state)) { \ if (unlikely(__mark_##name.state)) { \
(*__mark_##name.call) \ (*__mark_##name.call) \
...@@ -80,14 +92,39 @@ struct marker { ...@@ -80,14 +92,39 @@ struct marker {
} \ } \
} while (0) } while (0)
#define __trace_mark_tp(name, call_private, tp_name, tp_cb, format, args...) \
do { \
void __check_tp_type(void) \
{ \
register_trace_##tp_name(tp_cb); \
} \
DEFINE_MARKER_TP(name, tp_name, tp_cb, format); \
__mark_check_format(format, ## args); \
(*__mark_##name.call)(&__mark_##name, call_private, \
## args); \
} while (0)
extern void marker_update_probe_range(struct marker *begin, extern void marker_update_probe_range(struct marker *begin,
struct marker *end); struct marker *end);
#define GET_MARKER(name) (__mark_##name)
#else /* !CONFIG_MARKERS */ #else /* !CONFIG_MARKERS */
#define DEFINE_MARKER(name, tp_name, tp_cb, format)
#define __trace_mark(generic, name, call_private, format, args...) \ #define __trace_mark(generic, name, call_private, format, args...) \
__mark_check_format(format, ## args) __mark_check_format(format, ## args)
#define __trace_mark_tp(name, call_private, tp_name, tp_cb, format, args...) \
do { \
void __check_tp_type(void) \
{ \
register_trace_##tp_name(tp_cb); \
} \
__mark_check_format(format, ## args); \
} while (0)
static inline void marker_update_probe_range(struct marker *begin, static inline void marker_update_probe_range(struct marker *begin,
struct marker *end) struct marker *end)
{ } { }
#define GET_MARKER(name)
#endif /* CONFIG_MARKERS */ #endif /* CONFIG_MARKERS */
/** /**
...@@ -116,6 +153,20 @@ static inline void marker_update_probe_range(struct marker *begin, ...@@ -116,6 +153,20 @@ static inline void marker_update_probe_range(struct marker *begin,
#define _trace_mark(name, format, args...) \ #define _trace_mark(name, format, args...) \
__trace_mark(1, name, NULL, format, ## args) __trace_mark(1, name, NULL, format, ## args)
/**
* trace_mark_tp - Marker in a tracepoint callback
* @name: marker name, not quoted.
* @tp_name: tracepoint name, not quoted.
* @tp_cb: tracepoint callback. Should have an associated global symbol so it
* is not optimized away by the compiler (should not be static).
* @format: format string
* @args...: variable argument list
*
* Places a marker in a tracepoint callback.
*/
#define trace_mark_tp(name, tp_name, tp_cb, format, args...) \
__trace_mark_tp(name, NULL, tp_name, tp_cb, format, ## args)
/** /**
* MARK_NOARGS - Format string for a marker with no argument. * MARK_NOARGS - Format string for a marker with no argument.
*/ */
...@@ -136,8 +187,6 @@ extern marker_probe_func __mark_empty_function; ...@@ -136,8 +187,6 @@ extern marker_probe_func __mark_empty_function;
extern void marker_probe_cb(const struct marker *mdata, extern void marker_probe_cb(const struct marker *mdata,
void *call_private, ...); void *call_private, ...);
extern void marker_probe_cb_noarg(const struct marker *mdata,
void *call_private, ...);
/* /*
* Connect a probe to a marker. * Connect a probe to a marker.
......
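Usage of the new marker-over-tracepoint glue: the callback passed as @tp_cb must match the tracepoint's prototype (the nested __check_tp_type() enforces this by pretending to register it) and must be a global symbol. A sketch against the sched_switch tracepoint declared later in this series:

/* Must not be static: DEFINE_MARKER_TP stores its address. */
void probe_sched_switch(struct rq *rq, struct task_struct *prev,
                        struct task_struct *next)
{
        trace_mark_tp(kernel_sched_switch, sched_switch,
                      probe_sched_switch, "prev_pid %d next_pid %d",
                      prev->pid, next->pid);
}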
...@@ -144,6 +144,8 @@ extern int __must_check mutex_lock_killable(struct mutex *lock); ...@@ -144,6 +144,8 @@ extern int __must_check mutex_lock_killable(struct mutex *lock);
/* /*
* NOTE: mutex_trylock() follows the spin_trylock() convention, * NOTE: mutex_trylock() follows the spin_trylock() convention,
* not the down_trylock() convention! * not the down_trylock() convention!
*
* Returns 1 if the mutex has been acquired successfully, and 0 on contention.
*/ */
extern int mutex_trylock(struct mutex *lock); extern int mutex_trylock(struct mutex *lock);
extern void mutex_unlock(struct mutex *lock); extern void mutex_unlock(struct mutex *lock);
......
...@@ -41,7 +41,7 @@ ...@@ -41,7 +41,7 @@
#include <linux/seqlock.h> #include <linux/seqlock.h>
#ifdef CONFIG_RCU_CPU_STALL_DETECTOR #ifdef CONFIG_RCU_CPU_STALL_DETECTOR
#define RCU_SECONDS_TILL_STALL_CHECK ( 3 * HZ) /* for rcp->jiffies_stall */ #define RCU_SECONDS_TILL_STALL_CHECK (10 * HZ) /* for rcp->jiffies_stall */
#define RCU_SECONDS_TILL_STALL_RECHECK (30 * HZ) /* for rcp->jiffies_stall */ #define RCU_SECONDS_TILL_STALL_RECHECK (30 * HZ) /* for rcp->jiffies_stall */
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */ #endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
......
...@@ -142,6 +142,7 @@ struct rcu_head { ...@@ -142,6 +142,7 @@ struct rcu_head {
* on the write-side to insure proper synchronization. * on the write-side to insure proper synchronization.
*/ */
#define rcu_read_lock_sched() preempt_disable() #define rcu_read_lock_sched() preempt_disable()
#define rcu_read_lock_sched_notrace() preempt_disable_notrace()
/* /*
* rcu_read_unlock_sched - marks the end of a RCU-classic critical section * rcu_read_unlock_sched - marks the end of a RCU-classic critical section
...@@ -149,6 +150,7 @@ struct rcu_head { ...@@ -149,6 +150,7 @@ struct rcu_head {
* See rcu_read_lock_sched for more information. * See rcu_read_lock_sched for more information.
*/ */
#define rcu_read_unlock_sched() preempt_enable() #define rcu_read_unlock_sched() preempt_enable()
#define rcu_read_unlock_sched_notrace() preempt_enable_notrace()
......
...@@ -122,6 +122,7 @@ void ring_buffer_normalize_time_stamp(int cpu, u64 *ts); ...@@ -122,6 +122,7 @@ void ring_buffer_normalize_time_stamp(int cpu, u64 *ts);
void tracing_on(void); void tracing_on(void);
void tracing_off(void); void tracing_off(void);
void tracing_off_permanent(void);
enum ring_buffer_flags { enum ring_buffer_flags {
RB_FL_OVERWRITE = 1 << 0, RB_FL_OVERWRITE = 1 << 0,
......
...@@ -1350,6 +1350,17 @@ struct task_struct { ...@@ -1350,6 +1350,17 @@ struct task_struct {
unsigned long default_timer_slack_ns; unsigned long default_timer_slack_ns;
struct list_head *scm_work_list; struct list_head *scm_work_list;
#ifdef CONFIG_FUNCTION_RET_TRACER
/* Index of current stored address in ret_stack */
int curr_ret_stack;
/* Stack of return addresses for return function tracing */
struct ftrace_ret_stack *ret_stack;
/*
* Number of functions that haven't been traced
* because of depth overrun.
*/
atomic_t trace_overrun;
#endif
}; };
/* /*
......
...@@ -34,6 +34,7 @@ struct seq_operations { ...@@ -34,6 +34,7 @@ struct seq_operations {
#define SEQ_SKIP 1 #define SEQ_SKIP 1
char *mangle_path(char *s, char *p, char *esc);
int seq_open(struct file *, const struct seq_operations *); int seq_open(struct file *, const struct seq_operations *);
ssize_t seq_read(struct file *, char __user *, size_t, loff_t *); ssize_t seq_read(struct file *, char __user *, size_t, loff_t *);
loff_t seq_lseek(struct file *, loff_t, int); loff_t seq_lseek(struct file *, loff_t, int);
......
...@@ -15,9 +15,17 @@ extern void save_stack_trace_tsk(struct task_struct *tsk, ...@@ -15,9 +15,17 @@ extern void save_stack_trace_tsk(struct task_struct *tsk,
struct stack_trace *trace); struct stack_trace *trace);
extern void print_stack_trace(struct stack_trace *trace, int spaces); extern void print_stack_trace(struct stack_trace *trace, int spaces);
#ifdef CONFIG_USER_STACKTRACE_SUPPORT
extern void save_stack_trace_user(struct stack_trace *trace);
#else
# define save_stack_trace_user(trace) do { } while (0)
#endif
#else #else
# define save_stack_trace(trace) do { } while (0) # define save_stack_trace(trace) do { } while (0)
# define save_stack_trace_tsk(tsk, trace) do { } while (0) # define save_stack_trace_tsk(tsk, trace) do { } while (0)
# define save_stack_trace_user(trace) do { } while (0)
# define print_stack_trace(trace, spaces) do { } while (0) # define print_stack_trace(trace, spaces) do { } while (0)
#endif #endif
......
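Capturing the current task's user stack with the new hook is the same dance as for kernel stacks; a sketch (kernel context, sizes illustrative, entries[] terminated with ULONG_MAX as save_stack_trace_user() does above):

unsigned long entries[16];
struct stack_trace trace = {
        .max_entries = 16,
        .entries     = entries,
};
int i;

save_stack_trace_user(&trace);
for (i = 0; i < trace.nr_entries; i++) {
        if (trace.entries[i] == ULONG_MAX)   /* terminator */
                break;
        printk(" user ip: %#lx\n", trace.entries[i]);
}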
...@@ -24,8 +24,12 @@ struct tracepoint { ...@@ -24,8 +24,12 @@ struct tracepoint {
const char *name; /* Tracepoint name */ const char *name; /* Tracepoint name */
int state; /* State. */ int state; /* State. */
void **funcs; void **funcs;
} __attribute__((aligned(8))); } __attribute__((aligned(32))); /*
* Aligned on 32 bytes because it is
* globally visible and gcc happily
* aligns these on the structure size.
* Keep in sync with vmlinux.lds.h.
*/
#define TPPROTO(args...) args #define TPPROTO(args...) args
#define TPARGS(args...) args #define TPARGS(args...) args
...@@ -40,14 +44,14 @@ struct tracepoint { ...@@ -40,14 +44,14 @@ struct tracepoint {
do { \ do { \
void **it_func; \ void **it_func; \
\ \
rcu_read_lock_sched(); \ rcu_read_lock_sched_notrace(); \
it_func = rcu_dereference((tp)->funcs); \ it_func = rcu_dereference((tp)->funcs); \
if (it_func) { \ if (it_func) { \
do { \ do { \
((void(*)(proto))(*it_func))(args); \ ((void(*)(proto))(*it_func))(args); \
} while (*(++it_func)); \ } while (*(++it_func)); \
} \ } \
rcu_read_unlock_sched(); \ rcu_read_unlock_sched_notrace(); \
} while (0) } while (0)
/* /*
...@@ -55,35 +59,40 @@ struct tracepoint { ...@@ -55,35 +59,40 @@ struct tracepoint {
* not add unwanted padding between the beginning of the section and the * not add unwanted padding between the beginning of the section and the
* structure. Force alignment to the same alignment as the section start. * structure. Force alignment to the same alignment as the section start.
*/ */
#define DEFINE_TRACE(name, proto, args) \ #define DECLARE_TRACE(name, proto, args) \
extern struct tracepoint __tracepoint_##name; \
static inline void trace_##name(proto) \ static inline void trace_##name(proto) \
{ \ { \
static const char __tpstrtab_##name[] \
__attribute__((section("__tracepoints_strings"))) \
= #name ":" #proto; \
static struct tracepoint __tracepoint_##name \
__attribute__((section("__tracepoints"), aligned(8))) = \
{ __tpstrtab_##name, 0, NULL }; \
if (unlikely(__tracepoint_##name.state)) \ if (unlikely(__tracepoint_##name.state)) \
__DO_TRACE(&__tracepoint_##name, \ __DO_TRACE(&__tracepoint_##name, \
TPPROTO(proto), TPARGS(args)); \ TPPROTO(proto), TPARGS(args)); \
} \ } \
static inline int register_trace_##name(void (*probe)(proto)) \ static inline int register_trace_##name(void (*probe)(proto)) \
{ \ { \
return tracepoint_probe_register(#name ":" #proto, \ return tracepoint_probe_register(#name, (void *)probe); \
(void *)probe); \
} \ } \
static inline void unregister_trace_##name(void (*probe)(proto))\ static inline int unregister_trace_##name(void (*probe)(proto)) \
{ \ { \
tracepoint_probe_unregister(#name ":" #proto, \ return tracepoint_probe_unregister(#name, (void *)probe);\
(void *)probe); \
} }
#define DEFINE_TRACE(name) \
static const char __tpstrtab_##name[] \
__attribute__((section("__tracepoints_strings"))) = #name; \
struct tracepoint __tracepoint_##name \
__attribute__((section("__tracepoints"), aligned(32))) = \
{ __tpstrtab_##name, 0, NULL }
#define EXPORT_TRACEPOINT_SYMBOL_GPL(name) \
EXPORT_SYMBOL_GPL(__tracepoint_##name)
#define EXPORT_TRACEPOINT_SYMBOL(name) \
EXPORT_SYMBOL(__tracepoint_##name)
extern void tracepoint_update_probe_range(struct tracepoint *begin, extern void tracepoint_update_probe_range(struct tracepoint *begin,
struct tracepoint *end); struct tracepoint *end);
#else /* !CONFIG_TRACEPOINTS */ #else /* !CONFIG_TRACEPOINTS */
#define DEFINE_TRACE(name, proto, args) \ #define DECLARE_TRACE(name, proto, args) \
static inline void _do_trace_##name(struct tracepoint *tp, proto) \ static inline void _do_trace_##name(struct tracepoint *tp, proto) \
{ } \ { } \
static inline void trace_##name(proto) \ static inline void trace_##name(proto) \
...@@ -92,8 +101,14 @@ extern void tracepoint_update_probe_range(struct tracepoint *begin, ...@@ -92,8 +101,14 @@ extern void tracepoint_update_probe_range(struct tracepoint *begin,
{ \ { \
return -ENOSYS; \ return -ENOSYS; \
} \ } \
static inline void unregister_trace_##name(void (*probe)(proto))\ static inline int unregister_trace_##name(void (*probe)(proto)) \
{ } { \
return -ENOSYS; \
}
#define DEFINE_TRACE(name)
#define EXPORT_TRACEPOINT_SYMBOL_GPL(name)
#define EXPORT_TRACEPOINT_SYMBOL(name)
static inline void tracepoint_update_probe_range(struct tracepoint *begin, static inline void tracepoint_update_probe_range(struct tracepoint *begin,
struct tracepoint *end) struct tracepoint *end)
...@@ -112,6 +127,10 @@ extern int tracepoint_probe_register(const char *name, void *probe); ...@@ -112,6 +127,10 @@ extern int tracepoint_probe_register(const char *name, void *probe);
*/ */
extern int tracepoint_probe_unregister(const char *name, void *probe); extern int tracepoint_probe_unregister(const char *name, void *probe);
extern int tracepoint_probe_register_noupdate(const char *name, void *probe);
extern int tracepoint_probe_unregister_noupdate(const char *name, void *probe);
extern void tracepoint_probe_update_all(void);
struct tracepoint_iter { struct tracepoint_iter {
struct module *module; struct module *module;
struct tracepoint *tracepoint; struct tracepoint *tracepoint;
......
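The net effect of the DEFINE_TRACE to DECLARE_TRACE split: headers only declare tracepoints, exactly one .c file instantiates each one (as kernel/exit.c does below for the sched_process_* points), and probe registration now keys on the bare name. Sketched with an illustrative tracepoint:

/* in some header */
DECLARE_TRACE(my_event,
        TPPROTO(int value),
        TPARGS(value));

/* in exactly one .c file */
DEFINE_TRACE(my_event);
EXPORT_TRACEPOINT_SYMBOL_GPL(my_event);

/* at the instrumented site */
trace_my_event(42);

/* in a tracer: register_trace_##name now also returns int */
static void probe_my_event(int value)
{
        /* ... */
}

/* somewhere in tracer init: */
if (register_trace_my_event(probe_my_event))
        printk(KERN_WARNING "could not register my_event probe\n");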
...@@ -78,7 +78,7 @@ static inline unsigned long __copy_from_user_nocache(void *to, ...@@ -78,7 +78,7 @@ static inline unsigned long __copy_from_user_nocache(void *to,
\ \
set_fs(KERNEL_DS); \ set_fs(KERNEL_DS); \
pagefault_disable(); \ pagefault_disable(); \
ret = __get_user(retval, (__force typeof(retval) __user *)(addr)); \ ret = __copy_from_user_inatomic(&(retval), (__force typeof(retval) __user *)(addr), sizeof(retval)); \
pagefault_enable(); \ pagefault_enable(); \
set_fs(old_fs); \ set_fs(old_fs); \
ret; \ ret; \
......
#ifndef _LINUX_TRACE_BOOT_H
#define _LINUX_TRACE_BOOT_H
/*
* Structure which defines the trace of an initcall
* while it is called.
* You don't have to fill the func field since it is
* only used internally by the tracer.
*/
struct boot_trace_call {
pid_t caller;
char func[KSYM_NAME_LEN];
};
/*
* Structure which defines the trace of an initcall
* while it returns.
*/
struct boot_trace_ret {
char func[KSYM_NAME_LEN];
int result;
unsigned long long duration; /* nsecs */
};
#ifdef CONFIG_BOOT_TRACER
/* Append the traces on the ring-buffer */
extern void trace_boot_call(struct boot_trace_call *bt, initcall_t fn);
extern void trace_boot_ret(struct boot_trace_ret *bt, initcall_t fn);
/* Tells the tracer that smp_pre_initcall is finished,
* so we can start the tracing.
*/
extern void start_boot_trace(void);
/* Resume the tracing of other necessary events
* such as sched switches
*/
extern void enable_boot_trace(void);
/* Suspend this tracing. Actually, only sched_switch tracing has
* to be suspended; initcalls don't need it.
*/
extern void disable_boot_trace(void);
#else
static inline
void trace_boot_call(struct boot_trace_call *bt, initcall_t fn) { }
static inline
void trace_boot_ret(struct boot_trace_ret *bt, initcall_t fn) { }
static inline void start_boot_trace(void) { }
static inline void enable_boot_trace(void) { }
static inline void disable_boot_trace(void) { }
#endif /* CONFIG_BOOT_TRACER */
#endif /* __LINUX_TRACE_BOOT_H */
...@@ -4,52 +4,52 @@ ...@@ -4,52 +4,52 @@
#include <linux/sched.h> #include <linux/sched.h>
#include <linux/tracepoint.h> #include <linux/tracepoint.h>
DEFINE_TRACE(sched_kthread_stop, DECLARE_TRACE(sched_kthread_stop,
TPPROTO(struct task_struct *t), TPPROTO(struct task_struct *t),
TPARGS(t)); TPARGS(t));
DEFINE_TRACE(sched_kthread_stop_ret, DECLARE_TRACE(sched_kthread_stop_ret,
TPPROTO(int ret), TPPROTO(int ret),
TPARGS(ret)); TPARGS(ret));
DEFINE_TRACE(sched_wait_task, DECLARE_TRACE(sched_wait_task,
TPPROTO(struct rq *rq, struct task_struct *p), TPPROTO(struct rq *rq, struct task_struct *p),
TPARGS(rq, p)); TPARGS(rq, p));
DEFINE_TRACE(sched_wakeup, DECLARE_TRACE(sched_wakeup,
TPPROTO(struct rq *rq, struct task_struct *p), TPPROTO(struct rq *rq, struct task_struct *p),
TPARGS(rq, p)); TPARGS(rq, p));
DEFINE_TRACE(sched_wakeup_new, DECLARE_TRACE(sched_wakeup_new,
TPPROTO(struct rq *rq, struct task_struct *p), TPPROTO(struct rq *rq, struct task_struct *p),
TPARGS(rq, p)); TPARGS(rq, p));
DEFINE_TRACE(sched_switch, DECLARE_TRACE(sched_switch,
TPPROTO(struct rq *rq, struct task_struct *prev, TPPROTO(struct rq *rq, struct task_struct *prev,
struct task_struct *next), struct task_struct *next),
TPARGS(rq, prev, next)); TPARGS(rq, prev, next));
DEFINE_TRACE(sched_migrate_task, DECLARE_TRACE(sched_migrate_task,
TPPROTO(struct rq *rq, struct task_struct *p, int dest_cpu), TPPROTO(struct rq *rq, struct task_struct *p, int dest_cpu),
TPARGS(rq, p, dest_cpu)); TPARGS(rq, p, dest_cpu));
DEFINE_TRACE(sched_process_free, DECLARE_TRACE(sched_process_free,
TPPROTO(struct task_struct *p), TPPROTO(struct task_struct *p),
TPARGS(p)); TPARGS(p));
DEFINE_TRACE(sched_process_exit, DECLARE_TRACE(sched_process_exit,
TPPROTO(struct task_struct *p), TPPROTO(struct task_struct *p),
TPARGS(p)); TPARGS(p));
DEFINE_TRACE(sched_process_wait, DECLARE_TRACE(sched_process_wait,
TPPROTO(struct pid *pid), TPPROTO(struct pid *pid),
TPARGS(pid)); TPARGS(pid));
DEFINE_TRACE(sched_process_fork, DECLARE_TRACE(sched_process_fork,
TPPROTO(struct task_struct *parent, struct task_struct *child), TPPROTO(struct task_struct *parent, struct task_struct *child),
TPARGS(parent, child)); TPARGS(parent, child));
DEFINE_TRACE(sched_signal_send, DECLARE_TRACE(sched_signal_send,
TPPROTO(int sig, struct task_struct *p), TPPROTO(int sig, struct task_struct *p),
TPARGS(sig, p)); TPARGS(sig, p));
......
...@@ -808,6 +808,7 @@ config TRACEPOINTS ...@@ -808,6 +808,7 @@ config TRACEPOINTS
config MARKERS config MARKERS
bool "Activate markers" bool "Activate markers"
depends on TRACEPOINTS
help help
Place an empty function call at each marker site. Can be Place an empty function call at each marker site. Can be
dynamically changed for a probe function. dynamically changed for a probe function.
......
...@@ -63,6 +63,7 @@ ...@@ -63,6 +63,7 @@
#include <linux/signal.h> #include <linux/signal.h>
#include <linux/idr.h> #include <linux/idr.h>
#include <linux/ftrace.h> #include <linux/ftrace.h>
#include <trace/boot.h>
#include <asm/io.h> #include <asm/io.h>
#include <asm/bugs.h> #include <asm/bugs.h>
@@ -703,31 +704,35 @@ core_param(initcall_debug, initcall_debug, bool, 0644);
 int do_one_initcall(initcall_t fn)
 {
 	int count = preempt_count();
-	ktime_t delta;
+	ktime_t calltime, delta, rettime;
 	char msgbuf[64];
-	struct boot_trace it;
+	struct boot_trace_call call;
+	struct boot_trace_ret ret;

 	if (initcall_debug) {
-		it.caller = task_pid_nr(current);
-		printk("calling %pF @ %i\n", fn, it.caller);
-		it.calltime = ktime_get();
+		call.caller = task_pid_nr(current);
+		printk("calling %pF @ %i\n", fn, call.caller);
+		calltime = ktime_get();
+		trace_boot_call(&call, fn);
+		enable_boot_trace();
 	}

-	it.result = fn();
+	ret.result = fn();

 	if (initcall_debug) {
-		it.rettime = ktime_get();
-		delta = ktime_sub(it.rettime, it.calltime);
-		it.duration = (unsigned long long) delta.tv64 >> 10;
+		disable_boot_trace();
+		rettime = ktime_get();
+		delta = ktime_sub(rettime, calltime);
+		ret.duration = (unsigned long long) ktime_to_ns(delta) >> 10;
+		trace_boot_ret(&ret, fn);
 		printk("initcall %pF returned %d after %Ld usecs\n", fn,
-			it.result, it.duration);
-		trace_boot(&it, fn);
+			ret.result, ret.duration);
 	}

 	msgbuf[0] = 0;
-	if (it.result && it.result != -ENODEV && initcall_debug)
-		sprintf(msgbuf, "error code %d ", it.result);
+	if (ret.result && ret.result != -ENODEV && initcall_debug)
+		sprintf(msgbuf, "error code %d ", ret.result);

 	if (preempt_count() != count) {
 		strlcat(msgbuf, "preemption imbalance ", sizeof(msgbuf));
@@ -741,7 +746,7 @@ int do_one_initcall(initcall_t fn)
 		printk("initcall %pF returned with %s\n", fn, msgbuf);
 	}

-	return it.result;
+	return ret.result;
 }
@@ -882,7 +887,7 @@ static int __init kernel_init(void * unused)
 	 * we're essentially up and running. Get rid of the
 	 * initmem segments and start the user-mode stuff..
 	 */
-	stop_boot_trace();
 	init_post();
 	return 0;
 }
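Both the old and the new duration math approximate microseconds by shifting the nanosecond delta right by 10 bits: a divide by 1024 instead of 1000, about 2.4% low but branch- and division-free. A standalone sketch of the calculation the new code performs, using only the ktime helpers visible above (the helper name is illustrative):

	static unsigned long long initcall_duration_usecs(ktime_t calltime,
							  ktime_t rettime)
	{
		ktime_t delta = ktime_sub(rettime, calltime);

		/* ns >> 10 == ns / 1024, a cheap stand-in for ns / 1000 */
		return (unsigned long long) ktime_to_ns(delta) >> 10;
	}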
@@ -20,6 +20,10 @@ CFLAGS_REMOVE_rtmutex-debug.o = -pg
 CFLAGS_REMOVE_cgroup-debug.o = -pg
 CFLAGS_REMOVE_sched_clock.o = -pg
 endif
+ifdef CONFIG_FUNCTION_RET_TRACER
+CFLAGS_REMOVE_extable.o = -pg # For __kernel_text_address()
+CFLAGS_REMOVE_module.o = -pg # For __module_text_address()
+endif

 obj-$(CONFIG_FREEZER) += freezer.o
 obj-$(CONFIG_PROFILING) += profile.o
...
@@ -53,6 +53,10 @@
 #include <asm/pgtable.h>
 #include <asm/mmu_context.h>

+DEFINE_TRACE(sched_process_free);
+DEFINE_TRACE(sched_process_exit);
+DEFINE_TRACE(sched_process_wait);
+
 static void exit_mm(struct task_struct * tsk);

 static inline int task_detached(struct task_struct *p)
@@ -1123,7 +1127,6 @@ NORET_TYPE void do_exit(long code)
 	preempt_disable();
 	/* causes final put_task_struct in finish_task_switch(). */
 	tsk->state = TASK_DEAD;
-	ftrace_retfunc_exit_task(tsk);
 	schedule();
 	BUG();
 	/* Avoid "noreturn function does return". */
@@ -1321,10 +1324,10 @@ static int wait_task_zombie(struct task_struct *p, int options,
 		 * group, which consolidates times for all threads in the
 		 * group including the group leader.
 		 */
-		thread_group_cputime(p, &cputime);
 		spin_lock_irq(&p->parent->sighand->siglock);
 		psig = p->parent->signal;
 		sig = p->signal;
+		thread_group_cputime(p, &cputime);
 		psig->cutime =
 			cputime_add(psig->cutime,
 			cputime_add(cputime.utime,
...
@@ -66,3 +66,19 @@ int kernel_text_address(unsigned long addr)
 		return 1;
 	return module_text_address(addr) != NULL;
 }
+
+/*
+ * On some architectures (PPC64, IA64) function pointers
+ * are actually only tokens to some data that then holds the
+ * real function address. As a result, to find if a function
+ * pointer is part of the kernel text, we need to do some
+ * special dereferencing first.
+ */
+int func_ptr_is_kernel_text(void *ptr)
+{
+	unsigned long addr;
+	addr = (unsigned long) dereference_function_descriptor(ptr);
+	if (core_kernel_text(addr))
+		return 1;
+	return module_text_address(addr) != NULL;
+}
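A hypothetical caller sketch of the helper added above (the callback type and -EINVAL policy are made up for illustration): on PPC64/IA64 a raw function pointer points at a descriptor rather than at text, so it must be dereferenced before any text-range check, which is exactly what func_ptr_is_kernel_text() hides from the caller.

	/* illustrative: refuse a callback whose resolved address is not text */
	static int validate_callback(void (*cb)(void))
	{
		if (!func_ptr_is_kernel_text((void *)cb))
			return -EINVAL;
		return 0;
	}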
@@ -47,6 +47,7 @@
 #include <linux/mount.h>
 #include <linux/audit.h>
 #include <linux/memcontrol.h>
+#include <linux/ftrace.h>
 #include <linux/profile.h>
 #include <linux/rmap.h>
 #include <linux/acct.h>
@@ -80,6 +81,8 @@ DEFINE_PER_CPU(unsigned long, process_counts) = 0;
 __cacheline_aligned DEFINE_RWLOCK(tasklist_lock);  /* outer */

+DEFINE_TRACE(sched_process_fork);
+
 int nr_processes(void)
 {
 	int cpu;
@@ -137,6 +140,7 @@ void free_task(struct task_struct *tsk)
 	prop_local_destroy_single(&tsk->dirties);
 	free_thread_info(tsk->stack);
 	rt_mutex_debug_task_free(tsk);
+	ftrace_retfunc_exit_task(tsk);
 	free_task_struct(tsk);
 }
 EXPORT_SYMBOL(free_task);
@@ -1267,6 +1271,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	total_forks++;
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
+	ftrace_retfunc_init_task(p);
 	proc_fork_connector(p);
 	cgroup_post_fork(p);
 	return p;
...
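Taken together, the two fork.c hooks bracket the lifetime of the per-task return-trace state: it is set up once the new task is fully linked in, and torn down at the last point the task struct exists. Sketched from the hunks above (the comments are assumptions about what the helpers allocate, not taken from this commit):

	/* copy_process(), after the task is visible to the system */
	ftrace_retfunc_init_task(p);	/* set up p's return-address stack */

	/* free_task(), matching teardown */
	ftrace_retfunc_exit_task(tsk);	/* release the return-address stack */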
@@ -122,24 +122,6 @@ struct futex_hash_bucket {

 static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];

-/*
- * Take mm->mmap_sem, when futex is shared
- */
-static inline void futex_lock_mm(struct rw_semaphore *fshared)
-{
-	if (fshared)
-		down_read(fshared);
-}
-
-/*
- * Release mm->mmap_sem, when the futex is shared
- */
-static inline void futex_unlock_mm(struct rw_semaphore *fshared)
-{
-	if (fshared)
-		up_read(fshared);
-}
-
 /*
  * We hash on the keys returned from get_futex_key (see below).
  */
@@ -161,6 +143,45 @@ static inline int match_futex(union futex_key *key1, union futex_key *key2)
 		&& key1->both.offset == key2->both.offset);
 }

+/*
+ * Take a reference to the resource addressed by a key.
+ * Can be called while holding spinlocks.
+ *
+ */
+static void get_futex_key_refs(union futex_key *key)
+{
+	if (!key->both.ptr)
+		return;
+
+	switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
+	case FUT_OFF_INODE:
+		atomic_inc(&key->shared.inode->i_count);
+		break;
+	case FUT_OFF_MMSHARED:
+		atomic_inc(&key->private.mm->mm_count);
+		break;
+	}
+}
+
+/*
+ * Drop a reference to the resource addressed by a key.
+ * The hash bucket spinlock must not be held.
+ */
+static void drop_futex_key_refs(union futex_key *key)
+{
+	if (!key->both.ptr)
+		return;
+
+	switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
+	case FUT_OFF_INODE:
+		iput(key->shared.inode);
+		break;
+	case FUT_OFF_MMSHARED:
+		mmdrop(key->private.mm);
+		break;
+	}
+}
+
 /**
  * get_futex_key - Get parameters which are the keys for a futex.
  * @uaddr: virtual address of the futex
@@ -179,12 +200,10 @@ static inline int match_futex(union futex_key *key1, union futex_key *key2)
  * For other futexes, it points to &current->mm->mmap_sem and
  * caller must have taken the reader lock. but NOT any spinlocks.
  */
-static int get_futex_key(u32 __user *uaddr, struct rw_semaphore *fshared,
-			 union futex_key *key)
+static int get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key)
 {
 	unsigned long address = (unsigned long)uaddr;
 	struct mm_struct *mm = current->mm;
-	struct vm_area_struct *vma;
 	struct page *page;
 	int err;
@@ -208,100 +227,50 @@ static int get_futex_key(u32 __user *uaddr, struct rw_semaphore *fshared,
 			return -EFAULT;
 		key->private.mm = mm;
 		key->private.address = address;
+		get_futex_key_refs(key);
 		return 0;
 	}
-	/*
-	 * The futex is hashed differently depending on whether
-	 * it's in a shared or private mapping. So check vma first.
-	 */
-	vma = find_extend_vma(mm, address);
-	if (unlikely(!vma))
-		return -EFAULT;

-	/*
-	 * Permissions.
-	 */
-	if (unlikely((vma->vm_flags & (VM_IO|VM_READ)) != VM_READ))
-		return (vma->vm_flags & VM_IO) ? -EPERM : -EACCES;
+again:
+	err = get_user_pages_fast(address, 1, 0, &page);
+	if (err < 0)
+		return err;
+
+	lock_page(page);
+	if (!page->mapping) {
+		unlock_page(page);
+		put_page(page);
+		goto again;
+	}

 	/*
 	 * Private mappings are handled in a simple way.
 	 *
 	 * NOTE: When userspace waits on a MAP_SHARED mapping, even if
 	 * it's a read-only handle, it's expected that futexes attach to
-	 * the object not the particular process. Therefore we use
-	 * VM_MAYSHARE here, not VM_SHARED which is restricted to shared
-	 * mappings of _writable_ handles.
+	 * the object not the particular process.
 	 */
-	if (likely(!(vma->vm_flags & VM_MAYSHARE))) {
-		key->both.offset |= FUT_OFF_MMSHARED; /* reference taken on mm */
+	if (PageAnon(page)) {
+		key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */
 		key->private.mm = mm;
 		key->private.address = address;
-		return 0;
+	} else {
+		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
+		key->shared.inode = page->mapping->host;
+		key->shared.pgoff = page->index;
 	}

-	/*
-	 * Linear file mappings are also simple.
-	 */
-	key->shared.inode = vma->vm_file->f_path.dentry->d_inode;
-	key->both.offset |= FUT_OFF_INODE; /* inode-based key. */
-	if (likely(!(vma->vm_flags & VM_NONLINEAR))) {
-		key->shared.pgoff = (((address - vma->vm_start) >> PAGE_SHIFT)
-				     + vma->vm_pgoff);
-		return 0;
-	}
+	get_futex_key_refs(key);

-	/*
-	 * We could walk the page table to read the non-linear
-	 * pte, and get the page index without fetching the page
-	 * from swap. But that's a lot of code to duplicate here
-	 * for a rare case, so we simply fetch the page.
-	 */
-	err = get_user_pages(current, mm, address, 1, 0, 0, &page, NULL);
-	if (err >= 0) {
-		key->shared.pgoff =
-			page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
-		put_page(page);
-		return 0;
-	}
-	return err;
-}
-
-/*
- * Take a reference to the resource addressed by a key.
- * Can be called while holding spinlocks.
- *
- */
-static void get_futex_key_refs(union futex_key *key)
-{
-	if (key->both.ptr == NULL)
-		return;
-	switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
-	case FUT_OFF_INODE:
-		atomic_inc(&key->shared.inode->i_count);
-		break;
-	case FUT_OFF_MMSHARED:
-		atomic_inc(&key->private.mm->mm_count);
-		break;
-	}
-}
-
-/*
- * Drop a reference to the resource addressed by a key.
- * The hash bucket spinlock must not be held.
- */
-static void drop_futex_key_refs(union futex_key *key)
-{
-	if (!key->both.ptr)
-		return;
-	switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
-	case FUT_OFF_INODE:
-		iput(key->shared.inode);
-		break;
-	case FUT_OFF_MMSHARED:
-		mmdrop(key->private.mm);
-		break;
-	}
+	unlock_page(page);
+	put_page(page);
+	return 0;
+}
+
+static inline
+void put_futex_key(int fshared, union futex_key *key)
+{
+	drop_futex_key_refs(key);
 }

 static u32 cmpxchg_futex_value_locked(u32 __user *uaddr, u32 uval, u32 newval)
@@ -328,10 +297,8 @@ static int get_futex_value_locked(u32 *dest, u32 __user *from)

 /*
  * Fault handling.
- * if fshared is non NULL, current->mm->mmap_sem is already held
  */
-static int futex_handle_fault(unsigned long address,
-			      struct rw_semaphore *fshared, int attempt)
+static int futex_handle_fault(unsigned long address, int attempt)
 {
 	struct vm_area_struct * vma;
 	struct mm_struct *mm = current->mm;
@@ -340,8 +307,7 @@ static int futex_handle_fault(unsigned long address,
 	if (attempt > 2)
 		return ret;

-	if (!fshared)
-		down_read(&mm->mmap_sem);
+	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, address);
 	if (vma && address >= vma->vm_start &&
 	    (vma->vm_flags & VM_WRITE)) {
@@ -361,8 +327,7 @@ static int futex_handle_fault(unsigned long address,
 			current->min_flt++;
 		}
 	}
-	if (!fshared)
-		up_read(&mm->mmap_sem);
+	up_read(&mm->mmap_sem);
 	return ret;
 }
@@ -385,6 +350,7 @@ static int refill_pi_state_cache(void)
 	/* pi_mutex gets initialized later */
 	pi_state->owner = NULL;
 	atomic_set(&pi_state->refcount, 1);
+	pi_state->key = FUTEX_KEY_INIT;

 	current->pi_state_cache = pi_state;
@@ -462,7 +428,7 @@ void exit_pi_state_list(struct task_struct *curr)
 	struct list_head *next, *head = &curr->pi_state_list;
 	struct futex_pi_state *pi_state;
 	struct futex_hash_bucket *hb;
-	union futex_key key;
+	union futex_key key = FUTEX_KEY_INIT;

 	if (!futex_cmpxchg_enabled)
 		return;
@@ -719,20 +685,17 @@ double_lock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
  * Wake up all waiters hashed on the physical page that is mapped
  * to this virtual address:
  */
-static int futex_wake(u32 __user *uaddr, struct rw_semaphore *fshared,
-		      int nr_wake, u32 bitset)
+static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
 {
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
 	struct plist_head *head;
-	union futex_key key;
+	union futex_key key = FUTEX_KEY_INIT;
 	int ret;

 	if (!bitset)
 		return -EINVAL;

-	futex_lock_mm(fshared);
-
 	ret = get_futex_key(uaddr, fshared, &key);
 	if (unlikely(ret != 0))
 		goto out;
@@ -760,7 +723,7 @@ static int futex_wake(u32 __user *uaddr, struct rw_semaphore *fshared,
 	spin_unlock(&hb->lock);
 out:
-	futex_unlock_mm(fshared);
+	put_futex_key(fshared, &key);
 	return ret;
 }
@@ -769,19 +732,16 @@ static int futex_wake(u32 __user *uaddr, struct rw_semaphore *fshared,
  * to this virtual address:
  */
 static int
-futex_wake_op(u32 __user *uaddr1, struct rw_semaphore *fshared,
-	      u32 __user *uaddr2,
+futex_wake_op(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
 	      int nr_wake, int nr_wake2, int op)
 {
-	union futex_key key1, key2;
+	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
 	struct futex_hash_bucket *hb1, *hb2;
 	struct plist_head *head;
 	struct futex_q *this, *next;
 	int ret, op_ret, attempt = 0;

retryfull:
-	futex_lock_mm(fshared);
-
 	ret = get_futex_key(uaddr1, fshared, &key1);
 	if (unlikely(ret != 0))
 		goto out;
@@ -826,18 +786,12 @@ futex_wake_op(u32 __user *uaddr1, struct rw_semaphore *fshared,
 		 */
 		if (attempt++) {
 			ret = futex_handle_fault((unsigned long)uaddr2,
-						 fshared, attempt);
+						 attempt);
 			if (ret)
 				goto out;
 			goto retry;
 		}

-		/*
-		 * If we would have faulted, release mmap_sem,
-		 * fault it in and start all over again.
-		 */
-		futex_unlock_mm(fshared);
-
 		ret = get_user(dummy, uaddr2);
 		if (ret)
 			return ret;
@@ -873,7 +827,8 @@ futex_wake_op(u32 __user *uaddr1, struct rw_semaphore *fshared,
 	if (hb1 != hb2)
 		spin_unlock(&hb2->lock);
 out:
-	futex_unlock_mm(fshared);
+	put_futex_key(fshared, &key2);
+	put_futex_key(fshared, &key1);
 	return ret;
 }
@@ -882,19 +837,16 @@ futex_wake_op(u32 __user *uaddr1, struct rw_semaphore *fshared,
  * Requeue all waiters hashed on one physical page to another
  * physical page.
  */
-static int futex_requeue(u32 __user *uaddr1, struct rw_semaphore *fshared,
-			 u32 __user *uaddr2,
+static int futex_requeue(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
 			 int nr_wake, int nr_requeue, u32 *cmpval)
 {
-	union futex_key key1, key2;
+	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
 	struct futex_hash_bucket *hb1, *hb2;
 	struct plist_head *head1;
 	struct futex_q *this, *next;
 	int ret, drop_count = 0;

 retry:
-	futex_lock_mm(fshared);
-
 	ret = get_futex_key(uaddr1, fshared, &key1);
 	if (unlikely(ret != 0))
 		goto out;
@@ -917,12 +869,6 @@ static int futex_requeue(u32 __user *uaddr1, struct rw_semaphore *fshared,
 			if (hb1 != hb2)
 				spin_unlock(&hb2->lock);

-			/*
-			 * If we would have faulted, release mmap_sem, fault
-			 * it in and start all over again.
-			 */
-			futex_unlock_mm(fshared);
-
 			ret = get_user(curval, uaddr1);

 			if (!ret)
@@ -974,7 +920,8 @@ static int futex_requeue(u32 __user *uaddr1, struct rw_semaphore *fshared,
 		drop_futex_key_refs(&key1);
out:
-	futex_unlock_mm(fshared);
+	put_futex_key(fshared, &key2);
+	put_futex_key(fshared, &key1);
 	return ret;
 }
@@ -1096,8 +1043,7 @@ static void unqueue_me_pi(struct futex_q *q)
  * private futexes.
  */
 static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
-				struct task_struct *newowner,
-				struct rw_semaphore *fshared)
+				struct task_struct *newowner, int fshared)
 {
 	u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
 	struct futex_pi_state *pi_state = q->pi_state;
@@ -1176,7 +1122,7 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
handle_fault:
 	spin_unlock(q->lock_ptr);

-	ret = futex_handle_fault((unsigned long)uaddr, fshared, attempt++);
+	ret = futex_handle_fault((unsigned long)uaddr, attempt++);

 	spin_lock(q->lock_ptr);
@@ -1200,7 +1146,7 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,

 static long futex_wait_restart(struct restart_block *restart);

-static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
+static int futex_wait(u32 __user *uaddr, int fshared,
 		      u32 val, ktime_t *abs_time, u32 bitset)
 {
 	struct task_struct *curr = current;
@@ -1218,8 +1164,7 @@ static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
 	q.pi_state = NULL;
 	q.bitset = bitset;
 retry:
-	futex_lock_mm(fshared);
-
+	q.key = FUTEX_KEY_INIT;
 	ret = get_futex_key(uaddr, fshared, &q.key);
 	if (unlikely(ret != 0))
 		goto out_release_sem;
@@ -1251,12 +1196,6 @@ static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
 	if (unlikely(ret)) {
 		queue_unlock(&q, hb);

-		/*
-		 * If we would have faulted, release mmap_sem, fault it in and
-		 * start all over again.
-		 */
-		futex_unlock_mm(fshared);
-
 		ret = get_user(uval, uaddr);

 		if (!ret)
@@ -1270,12 +1209,6 @@ static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
 	/* Only actually queue if *uaddr contained val. */
 	queue_me(&q, hb);

-	/*
-	 * Now the futex is queued and we have checked the data, we
-	 * don't want to hold mmap_sem while we sleep.
-	 */
-	futex_unlock_mm(fshared);
-
 	/*
 	 * There might have been scheduling since the queue_me(), as we
 	 * cannot hold a spinlock across the get_user() in case it
@@ -1363,7 +1296,7 @@ static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
 	queue_unlock(&q, hb);

 out_release_sem:
-	futex_unlock_mm(fshared);
+	put_futex_key(fshared, &q.key);
 	return ret;
 }
@@ -1371,13 +1304,13 @@ static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
 static long futex_wait_restart(struct restart_block *restart)
 {
 	u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
-	struct rw_semaphore *fshared = NULL;
+	int fshared = 0;
 	ktime_t t;

 	t.tv64 = restart->futex.time;
 	restart->fn = do_no_restart_syscall;
 	if (restart->futex.flags & FLAGS_SHARED)
-		fshared = &current->mm->mmap_sem;
+		fshared = 1;
 	return (long)futex_wait(uaddr, fshared, restart->futex.val, &t,
 				restart->futex.bitset);
 }
@@ -1389,7 +1322,7 @@ static long futex_wait_restart(struct restart_block *restart)
  * if there are waiters then it will block, it does PI, etc. (Due to
  * races the kernel might see a 0 value of the futex too.)
  */
-static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,
+static int futex_lock_pi(u32 __user *uaddr, int fshared,
 			 int detect, ktime_t *time, int trylock)
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
@@ -1412,8 +1345,7 @@ static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,

 	q.pi_state = NULL;
 retry:
-	futex_lock_mm(fshared);
-
+	q.key = FUTEX_KEY_INIT;
 	ret = get_futex_key(uaddr, fshared, &q.key);
 	if (unlikely(ret != 0))
 		goto out_release_sem;
@@ -1502,7 +1434,6 @@ static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,
 			 * exit to complete.
 			 */
 			queue_unlock(&q, hb);
-			futex_unlock_mm(fshared);
 			cond_resched();
 			goto retry;
@@ -1534,12 +1465,6 @@ static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,
 	 */
 	queue_me(&q, hb);

-	/*
-	 * Now the futex is queued and we have checked the data, we
-	 * don't want to hold mmap_sem while we sleep.
-	 */
-	futex_unlock_mm(fshared);
-
 	WARN_ON(!q.pi_state);
 	/*
 	 * Block on the PI mutex:
@@ -1552,7 +1477,6 @@ static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,
 		ret = ret ? 0 : -EWOULDBLOCK;
 	}

-	futex_lock_mm(fshared);
 	spin_lock(q.lock_ptr);

 	if (!ret) {
@@ -1618,7 +1542,6 @@ static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,

 	/* Unqueue and drop the lock */
 	unqueue_me_pi(&q);
-	futex_unlock_mm(fshared);

 	if (to)
 		destroy_hrtimer_on_stack(&to->timer);
@@ -1628,7 +1551,7 @@ static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,
 	queue_unlock(&q, hb);

 out_release_sem:
-	futex_unlock_mm(fshared);
+	put_futex_key(fshared, &q.key);
 	if (to)
 		destroy_hrtimer_on_stack(&to->timer);
 	return ret;
@@ -1645,15 +1568,12 @@ static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,
 	queue_unlock(&q, hb);

 	if (attempt++) {
-		ret = futex_handle_fault((unsigned long)uaddr, fshared,
-					 attempt);
+		ret = futex_handle_fault((unsigned long)uaddr, attempt);
 		if (ret)
 			goto out_release_sem;
 		goto retry_unlocked;
 	}

-	futex_unlock_mm(fshared);
-
 	ret = get_user(uval, uaddr);
 	if (!ret && (uval != -EFAULT))
 		goto retry;
@@ -1668,13 +1588,13 @@ static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,
  * This is the in-kernel slowpath: we look up the PI state (if any),
  * and do the rt-mutex unlock.
  */
-static int futex_unlock_pi(u32 __user *uaddr, struct rw_semaphore *fshared)
+static int futex_unlock_pi(u32 __user *uaddr, int fshared)
 {
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
 	u32 uval;
 	struct plist_head *head;
-	union futex_key key;
+	union futex_key key = FUTEX_KEY_INIT;
 	int ret, attempt = 0;

 retry:
@@ -1685,10 +1605,6 @@ static int futex_unlock_pi(u32 __user *uaddr, struct rw_semaphore *fshared)
 	 */
 	if ((uval & FUTEX_TID_MASK) != task_pid_vnr(current))
 		return -EPERM;
-	/*
-	 * First take all the futex related locks:
-	 */
-	futex_lock_mm(fshared);

 	ret = get_futex_key(uaddr, fshared, &key);
 	if (unlikely(ret != 0))
@@ -1747,7 +1663,7 @@ static int futex_unlock_pi(u32 __user *uaddr, struct rw_semaphore *fshared)
 out_unlock:
 	spin_unlock(&hb->lock);
 out:
-	futex_unlock_mm(fshared);
+	put_futex_key(fshared, &key);

 	return ret;
@@ -1763,16 +1679,13 @@ static int futex_unlock_pi(u32 __user *uaddr, struct rw_semaphore *fshared)
 	spin_unlock(&hb->lock);

 	if (attempt++) {
-		ret = futex_handle_fault((unsigned long)uaddr, fshared,
-					 attempt);
+		ret = futex_handle_fault((unsigned long)uaddr, attempt);
 		if (ret)
 			goto out;
 		uval = 0;
 		goto retry_unlocked;
 	}

-	futex_unlock_mm(fshared);
-
 	ret = get_user(uval, uaddr);
 	if (!ret && (uval != -EFAULT))
 		goto retry;
@@ -1898,8 +1811,7 @@ int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int pi)
 		 * PI futexes happens in exit_pi_state():
 		 */
 		if (!pi && (uval & FUTEX_WAITERS))
-			futex_wake(uaddr, &curr->mm->mmap_sem, 1,
-				   FUTEX_BITSET_MATCH_ANY);
+			futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
 	}
 	return 0;
 }
@@ -1995,10 +1907,10 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 {
 	int ret = -ENOSYS;
 	int cmd = op & FUTEX_CMD_MASK;
-	struct rw_semaphore *fshared = NULL;
+	int fshared = 0;

 	if (!(op & FUTEX_PRIVATE_FLAG))
-		fshared = &current->mm->mmap_sem;
+		fshared = 1;

 	switch (cmd) {
 	case FUTEX_WAIT:
...
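After this conversion every futex operation follows the same key lifecycle instead of bracketing itself with mmap_sem: initialize the key, resolve it (which pins the backing inode or mm), and drop it on every exit path. A condensed sketch of that pattern, using only the helpers visible in the hunks above:

	union futex_key key = FUTEX_KEY_INIT;
	int ret;

	ret = get_futex_key(uaddr, fshared, &key);	/* pins inode or mm */
	if (unlikely(ret != 0))
		return ret;
	/* ... hash the key, lock the bucket, queue or wake waiters ... */
	put_futex_key(fshared, &key);			/* drops the reference */
	return ret;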
@@ -21,6 +21,9 @@ static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
 struct task_struct *kthreadd_task;

+DEFINE_TRACE(sched_kthread_stop);
+DEFINE_TRACE(sched_kthread_stop_ret);
+
 struct kthread_create_info
 {
 	/* Information passed to kthread() from kthreadd. */
...
@@ -136,16 +136,16 @@ static inline struct lock_class *hlock_class(struct held_lock *hlock)
 #ifdef CONFIG_LOCK_STAT
 static DEFINE_PER_CPU(struct lock_class_stats[MAX_LOCKDEP_KEYS], lock_stats);

-static int lock_contention_point(struct lock_class *class, unsigned long ip)
+static int lock_point(unsigned long points[], unsigned long ip)
 {
 	int i;

-	for (i = 0; i < ARRAY_SIZE(class->contention_point); i++) {
-		if (class->contention_point[i] == 0) {
-			class->contention_point[i] = ip;
+	for (i = 0; i < LOCKSTAT_POINTS; i++) {
+		if (points[i] == 0) {
+			points[i] = ip;
 			break;
 		}
-		if (class->contention_point[i] == ip)
+		if (points[i] == ip)
 			break;
 	}
@@ -185,6 +185,9 @@ struct lock_class_stats lock_stats(struct lock_class *class)
 		for (i = 0; i < ARRAY_SIZE(stats.contention_point); i++)
 			stats.contention_point[i] += pcs->contention_point[i];

+		for (i = 0; i < ARRAY_SIZE(stats.contending_point); i++)
+			stats.contending_point[i] += pcs->contending_point[i];
+
 		lock_time_add(&pcs->read_waittime, &stats.read_waittime);
 		lock_time_add(&pcs->write_waittime, &stats.write_waittime);
@@ -209,6 +212,7 @@ void clear_lock_stats(struct lock_class *class)
 		memset(cpu_stats, 0, sizeof(struct lock_class_stats));
 	}
 	memset(class->contention_point, 0, sizeof(class->contention_point));
+	memset(class->contending_point, 0, sizeof(class->contending_point));
 }

 static struct lock_class_stats *get_lock_stats(struct lock_class *class)
@@ -2999,7 +3003,7 @@ __lock_contended(struct lockdep_map *lock, unsigned long ip)
 	struct held_lock *hlock, *prev_hlock;
 	struct lock_class_stats *stats;
 	unsigned int depth;
-	int i, point;
+	int i, contention_point, contending_point;

 	depth = curr->lockdep_depth;
 	if (DEBUG_LOCKS_WARN_ON(!depth))
@@ -3023,18 +3027,22 @@ __lock_contended(struct lockdep_map *lock, unsigned long ip)
found_it:
 	hlock->waittime_stamp = sched_clock();

-	point = lock_contention_point(hlock_class(hlock), ip);
+	contention_point = lock_point(hlock_class(hlock)->contention_point, ip);
+	contending_point = lock_point(hlock_class(hlock)->contending_point,
+				      lock->ip);

 	stats = get_lock_stats(hlock_class(hlock));
-	if (point < ARRAY_SIZE(stats->contention_point))
-		stats->contention_point[point]++;
+	if (contention_point < LOCKSTAT_POINTS)
+		stats->contention_point[contention_point]++;
+	if (contending_point < LOCKSTAT_POINTS)
+		stats->contending_point[contending_point]++;
 	if (lock->cpu != smp_processor_id())
 		stats->bounces[bounce_contended + !!hlock->read]++;
 	put_lock_stats(stats);
 }

 static void
-__lock_acquired(struct lockdep_map *lock)
+__lock_acquired(struct lockdep_map *lock, unsigned long ip)
 {
 	struct task_struct *curr = current;
 	struct held_lock *hlock, *prev_hlock;
@@ -3083,6 +3091,7 @@ __lock_acquired(struct lockdep_map *lock)
 	put_lock_stats(stats);

 	lock->cpu = cpu;
+	lock->ip = ip;
 }

 void lock_contended(struct lockdep_map *lock, unsigned long ip)
@@ -3104,7 +3113,7 @@ void lock_contended(struct lockdep_map *lock, unsigned long ip)
 }
 EXPORT_SYMBOL_GPL(lock_contended);

-void lock_acquired(struct lockdep_map *lock)
+void lock_acquired(struct lockdep_map *lock, unsigned long ip)
 {
 	unsigned long flags;

@@ -3117,7 +3126,7 @@ void lock_acquired(struct lockdep_map *lock)
 	raw_local_irq_save(flags);
 	check_flags(flags);
 	current->lockdep_recursion = 1;
-	__lock_acquired(lock);
+	__lock_acquired(lock, ip);
 	current->lockdep_recursion = 0;
 	raw_local_irq_restore(flags);
 }
@@ -3276,10 +3285,10 @@ void __init lockdep_info(void)
 {
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 	printk("... MAX_LOCKDEP_SUBCLASSES: %lu\n", MAX_LOCKDEP_SUBCLASSES);
 	printk("... MAX_LOCK_DEPTH: %lu\n", MAX_LOCK_DEPTH);
 	printk("... MAX_LOCKDEP_KEYS: %lu\n", MAX_LOCKDEP_KEYS);
 	printk("... CLASSHASH_SIZE: %lu\n", CLASSHASH_SIZE);
 	printk("... MAX_LOCKDEP_ENTRIES: %lu\n", MAX_LOCKDEP_ENTRIES);
 	printk("... MAX_LOCKDEP_CHAINS: %lu\n", MAX_LOCKDEP_CHAINS);
 	printk("... CHAINHASH_SIZE: %lu\n", CHAINHASH_SIZE);
...
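The net effect is that lock_stat now records two instruction pointers per contention event: where the waiter stalled (contention_point, keyed by the waiter's own ip) and where the current holder took the lock (contending_point, keyed by the ip that __lock_acquired() saved in lock->ip). A sketch of the pairing under CONFIG_LOCK_STAT, with illustrative functions:

	static DEFINE_SPINLOCK(demo_lock);

	void holder(void)
	{
		spin_lock(&demo_lock);	/* lock_acquired() saves this ip in lock->ip */
		/* ... long critical section ... */
		spin_unlock(&demo_lock);
	}

	void waiter(void)
	{
		spin_lock(&demo_lock);	/* on contention: contention_point[waiter ip]++
					 * and contending_point[holder ip]++ */
		spin_unlock(&demo_lock);
	}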
@@ -470,11 +470,12 @@ static void seq_line(struct seq_file *m, char c, int offset, int length)

 static void snprint_time(char *buf, size_t bufsiz, s64 nr)
 {
-	unsigned long rem;
+	s64 div;
+	s32 rem;

 	nr += 5; /* for display rounding */
-	rem = do_div(nr, 1000); /* XXX: do_div_signed */
-	snprintf(buf, bufsiz, "%lld.%02d", (long long)nr, (int)rem/10);
+	div = div_s64_rem(nr, 1000, &rem);
+	snprintf(buf, bufsiz, "%lld.%02d", (long long)div, (int)rem/10);
 }

 static void seq_time(struct seq_file *m, s64 time)
@@ -556,7 +557,7 @@ static void seq_stats(struct seq_file *m, struct lock_stat_data *data)
 	if (stats->read_holdtime.nr)
 		namelen += 2;

-	for (i = 0; i < ARRAY_SIZE(class->contention_point); i++) {
+	for (i = 0; i < LOCKSTAT_POINTS; i++) {
 		char sym[KSYM_SYMBOL_LEN];
 		char ip[32];
@@ -573,6 +574,23 @@ static void seq_stats(struct seq_file *m, struct lock_stat_data *data)
 			stats->contention_point[i],
 			ip, sym);
 	}
+
+	for (i = 0; i < LOCKSTAT_POINTS; i++) {
+		char sym[KSYM_SYMBOL_LEN];
+		char ip[32];
+
+		if (class->contending_point[i] == 0)
+			break;
+		if (!i)
+			seq_line(m, '-', 40-namelen, namelen);
+
+		sprint_symbol(sym, class->contending_point[i]);
+		snprintf(ip, sizeof(ip), "[<%p>]",
+				(void *)class->contending_point[i]);
+		seq_printf(m, "%40s %14lu %29s %s\n", name,
+				stats->contending_point[i],
+				ip, sym);
+	}
+
 	if (i) {
 		seq_puts(m, "\n");
 		seq_line(m, '.', 0, 40 + 1 + 10 * (14 + 1));
@@ -582,7 +600,7 @@ static void seq_stats(struct seq_file *m, struct lock_stat_data *data)

 static void seq_header(struct seq_file *m)
 {
-	seq_printf(m, "lock_stat version 0.2\n");
+	seq_printf(m, "lock_stat version 0.3\n");
 	seq_line(m, '-', 0, 40 + 1 + 10 * (14 + 1));
 	seq_printf(m, "%40s %14s %14s %14s %14s %14s %14s %14s %14s "
 			"%14s %14s\n",
...
...@@ -43,6 +43,7 @@ static DEFINE_MUTEX(markers_mutex); ...@@ -43,6 +43,7 @@ static DEFINE_MUTEX(markers_mutex);
*/ */
#define MARKER_HASH_BITS 6 #define MARKER_HASH_BITS 6
#define MARKER_TABLE_SIZE (1 << MARKER_HASH_BITS) #define MARKER_TABLE_SIZE (1 << MARKER_HASH_BITS)
static struct hlist_head marker_table[MARKER_TABLE_SIZE];
/* /*
* Note about RCU : * Note about RCU :
...@@ -64,11 +65,10 @@ struct marker_entry { ...@@ -64,11 +65,10 @@ struct marker_entry {
void *oldptr; void *oldptr;
int rcu_pending; int rcu_pending;
unsigned char ptype:1; unsigned char ptype:1;
unsigned char format_allocated:1;
char name[0]; /* Contains name'\0'format'\0' */ char name[0]; /* Contains name'\0'format'\0' */
}; };
static struct hlist_head marker_table[MARKER_TABLE_SIZE];
/** /**
* __mark_empty_function - Empty probe callback * __mark_empty_function - Empty probe callback
* @probe_private: probe private data * @probe_private: probe private data
...@@ -81,7 +81,7 @@ static struct hlist_head marker_table[MARKER_TABLE_SIZE]; ...@@ -81,7 +81,7 @@ static struct hlist_head marker_table[MARKER_TABLE_SIZE];
* though the function pointer change and the marker enabling are two distinct * though the function pointer change and the marker enabling are two distinct
* operations that modifies the execution flow of preemptible code. * operations that modifies the execution flow of preemptible code.
*/ */
void __mark_empty_function(void *probe_private, void *call_private, notrace void __mark_empty_function(void *probe_private, void *call_private,
const char *fmt, va_list *args) const char *fmt, va_list *args)
{ {
} }
...@@ -97,7 +97,8 @@ EXPORT_SYMBOL_GPL(__mark_empty_function); ...@@ -97,7 +97,8 @@ EXPORT_SYMBOL_GPL(__mark_empty_function);
* need to put a full smp_rmb() in this branch. This is why we do not use * need to put a full smp_rmb() in this branch. This is why we do not use
* rcu_dereference() for the pointer read. * rcu_dereference() for the pointer read.
*/ */
void marker_probe_cb(const struct marker *mdata, void *call_private, ...) notrace void marker_probe_cb(const struct marker *mdata,
void *call_private, ...)
{ {
va_list args; va_list args;
char ptype; char ptype;
...@@ -107,7 +108,7 @@ void marker_probe_cb(const struct marker *mdata, void *call_private, ...) ...@@ -107,7 +108,7 @@ void marker_probe_cb(const struct marker *mdata, void *call_private, ...)
* sure the teardown of the callbacks can be done correctly when they * sure the teardown of the callbacks can be done correctly when they
* are in modules and they insure RCU read coherency. * are in modules and they insure RCU read coherency.
*/ */
rcu_read_lock_sched(); rcu_read_lock_sched_notrace();
ptype = mdata->ptype; ptype = mdata->ptype;
if (likely(!ptype)) { if (likely(!ptype)) {
marker_probe_func *func; marker_probe_func *func;
...@@ -145,7 +146,7 @@ void marker_probe_cb(const struct marker *mdata, void *call_private, ...) ...@@ -145,7 +146,7 @@ void marker_probe_cb(const struct marker *mdata, void *call_private, ...)
va_end(args); va_end(args);
} }
} }
rcu_read_unlock_sched(); rcu_read_unlock_sched_notrace();
} }
EXPORT_SYMBOL_GPL(marker_probe_cb); EXPORT_SYMBOL_GPL(marker_probe_cb);
...@@ -157,12 +158,13 @@ EXPORT_SYMBOL_GPL(marker_probe_cb); ...@@ -157,12 +158,13 @@ EXPORT_SYMBOL_GPL(marker_probe_cb);
* *
* Should be connected to markers "MARK_NOARGS". * Should be connected to markers "MARK_NOARGS".
*/ */
void marker_probe_cb_noarg(const struct marker *mdata, void *call_private, ...) static notrace void marker_probe_cb_noarg(const struct marker *mdata,
void *call_private, ...)
{ {
va_list args; /* not initialized */ va_list args; /* not initialized */
char ptype; char ptype;
rcu_read_lock_sched(); rcu_read_lock_sched_notrace();
ptype = mdata->ptype; ptype = mdata->ptype;
if (likely(!ptype)) { if (likely(!ptype)) {
marker_probe_func *func; marker_probe_func *func;
...@@ -195,9 +197,8 @@ void marker_probe_cb_noarg(const struct marker *mdata, void *call_private, ...) ...@@ -195,9 +197,8 @@ void marker_probe_cb_noarg(const struct marker *mdata, void *call_private, ...)
multi[i].func(multi[i].probe_private, call_private, multi[i].func(multi[i].probe_private, call_private,
mdata->format, &args); mdata->format, &args);
} }
rcu_read_unlock_sched(); rcu_read_unlock_sched_notrace();
} }
EXPORT_SYMBOL_GPL(marker_probe_cb_noarg);
static void free_old_closure(struct rcu_head *head) static void free_old_closure(struct rcu_head *head)
{ {
...@@ -416,6 +417,7 @@ static struct marker_entry *add_marker(const char *name, const char *format) ...@@ -416,6 +417,7 @@ static struct marker_entry *add_marker(const char *name, const char *format)
e->single.probe_private = NULL; e->single.probe_private = NULL;
e->multi = NULL; e->multi = NULL;
e->ptype = 0; e->ptype = 0;
e->format_allocated = 0;
e->refcount = 0; e->refcount = 0;
e->rcu_pending = 0; e->rcu_pending = 0;
hlist_add_head(&e->hlist, head); hlist_add_head(&e->hlist, head);
...@@ -447,6 +449,8 @@ static int remove_marker(const char *name) ...@@ -447,6 +449,8 @@ static int remove_marker(const char *name)
if (e->single.func != __mark_empty_function) if (e->single.func != __mark_empty_function)
return -EBUSY; return -EBUSY;
hlist_del(&e->hlist); hlist_del(&e->hlist);
if (e->format_allocated)
kfree(e->format);
/* Make sure the call_rcu has been executed */ /* Make sure the call_rcu has been executed */
if (e->rcu_pending) if (e->rcu_pending)
rcu_barrier_sched(); rcu_barrier_sched();
...@@ -457,57 +461,34 @@ static int remove_marker(const char *name) ...@@ -457,57 +461,34 @@ static int remove_marker(const char *name)
/* /*
* Set the mark_entry format to the format found in the element. * Set the mark_entry format to the format found in the element.
*/ */
static int marker_set_format(struct marker_entry **entry, const char *format) static int marker_set_format(struct marker_entry *entry, const char *format)
{ {
struct marker_entry *e; entry->format = kstrdup(format, GFP_KERNEL);
size_t name_len = strlen((*entry)->name) + 1; if (!entry->format)
size_t format_len = strlen(format) + 1;
e = kmalloc(sizeof(struct marker_entry) + name_len + format_len,
GFP_KERNEL);
if (!e)
return -ENOMEM; return -ENOMEM;
memcpy(&e->name[0], (*entry)->name, name_len); entry->format_allocated = 1;
e->format = &e->name[name_len];
memcpy(e->format, format, format_len);
if (strcmp(e->format, MARK_NOARGS) == 0)
e->call = marker_probe_cb_noarg;
else
e->call = marker_probe_cb;
e->single = (*entry)->single;
e->multi = (*entry)->multi;
e->ptype = (*entry)->ptype;
e->refcount = (*entry)->refcount;
e->rcu_pending = 0;
hlist_add_before(&e->hlist, &(*entry)->hlist);
hlist_del(&(*entry)->hlist);
/* Make sure the call_rcu has been executed */
if ((*entry)->rcu_pending)
rcu_barrier_sched();
kfree(*entry);
*entry = e;
trace_mark(core_marker_format, "name %s format %s", trace_mark(core_marker_format, "name %s format %s",
e->name, e->format); entry->name, entry->format);
return 0; return 0;
} }
/* /*
* Sets the probe callback corresponding to one marker. * Sets the probe callback corresponding to one marker.
*/ */
static int set_marker(struct marker_entry **entry, struct marker *elem, static int set_marker(struct marker_entry *entry, struct marker *elem,
int active) int active)
{ {
int ret; int ret = 0;
WARN_ON(strcmp((*entry)->name, elem->name) != 0); WARN_ON(strcmp(entry->name, elem->name) != 0);
if ((*entry)->format) { if (entry->format) {
if (strcmp((*entry)->format, elem->format) != 0) { if (strcmp(entry->format, elem->format) != 0) {
printk(KERN_NOTICE printk(KERN_NOTICE
"Format mismatch for probe %s " "Format mismatch for probe %s "
"(%s), marker (%s)\n", "(%s), marker (%s)\n",
(*entry)->name, entry->name,
(*entry)->format, entry->format,
elem->format); elem->format);
return -EPERM; return -EPERM;
} }
...@@ -523,37 +504,67 @@ static int set_marker(struct marker_entry **entry, struct marker *elem, ...@@ -523,37 +504,67 @@ static int set_marker(struct marker_entry **entry, struct marker *elem,
* pass from a "safe" callback (with argument) to an "unsafe" * pass from a "safe" callback (with argument) to an "unsafe"
* callback (does not set arguments). * callback (does not set arguments).
*/ */
elem->call = (*entry)->call; elem->call = entry->call;
/* /*
* Sanity check : * Sanity check :
* We only update the single probe private data when the ptr is * We only update the single probe private data when the ptr is
* set to a _non_ single probe! (0 -> 1 and N -> 1, N != 1) * set to a _non_ single probe! (0 -> 1 and N -> 1, N != 1)
*/ */
WARN_ON(elem->single.func != __mark_empty_function WARN_ON(elem->single.func != __mark_empty_function
&& elem->single.probe_private && elem->single.probe_private != entry->single.probe_private
!= (*entry)->single.probe_private && && !elem->ptype);
!elem->ptype); elem->single.probe_private = entry->single.probe_private;
elem->single.probe_private = (*entry)->single.probe_private;
/* /*
* Make sure the private data is valid when we update the * Make sure the private data is valid when we update the
* single probe ptr. * single probe ptr.
*/ */
smp_wmb(); smp_wmb();
elem->single.func = (*entry)->single.func; elem->single.func = entry->single.func;
/* /*
* We also make sure that the new probe callbacks array is consistent * We also make sure that the new probe callbacks array is consistent
* before setting a pointer to it. * before setting a pointer to it.
*/ */
rcu_assign_pointer(elem->multi, (*entry)->multi); rcu_assign_pointer(elem->multi, entry->multi);
/* /*
* Update the function or multi probe array pointer before setting the * Update the function or multi probe array pointer before setting the
* ptype. * ptype.
*/ */
smp_wmb(); smp_wmb();
elem->ptype = (*entry)->ptype; elem->ptype = entry->ptype;
if (elem->tp_name && (active ^ elem->state)) {
WARN_ON(!elem->tp_cb);
/*
* It is ok to directly call the probe registration because type
* checking has been done in the __trace_mark_tp() macro.
*/
if (active) {
/*
* try_module_get should always succeed because we hold
* lock_module() to get the tp_cb address.
*/
ret = try_module_get(__module_text_address(
(unsigned long)elem->tp_cb));
BUG_ON(!ret);
ret = tracepoint_probe_register_noupdate(
elem->tp_name,
elem->tp_cb);
} else {
ret = tracepoint_probe_unregister_noupdate(
elem->tp_name,
elem->tp_cb);
/*
* tracepoint_probe_update_all() must be called
* before the module containing tp_cb is unloaded.
*/
module_put(__module_text_address(
(unsigned long)elem->tp_cb));
}
}
elem->state = active; elem->state = active;
return 0; return ret;
} }
/* /*
...@@ -564,7 +575,24 @@ static int set_marker(struct marker_entry **entry, struct marker *elem, ...@@ -564,7 +575,24 @@ static int set_marker(struct marker_entry **entry, struct marker *elem,
*/ */
static void disable_marker(struct marker *elem) static void disable_marker(struct marker *elem)
{ {
int ret;
/* leave "call" as is. It is known statically. */ /* leave "call" as is. It is known statically. */
if (elem->tp_name && elem->state) {
WARN_ON(!elem->tp_cb);
/*
* It is ok to directly call the probe registration because type
* checking has been done in the __trace_mark_tp() macro.
*/
ret = tracepoint_probe_unregister_noupdate(elem->tp_name,
elem->tp_cb);
WARN_ON(ret);
/*
* tracepoint_probe_update_all() must be called
* before the module containing tp_cb is unloaded.
*/
module_put(__module_text_address((unsigned long)elem->tp_cb));
}
elem->state = 0; elem->state = 0;
elem->single.func = __mark_empty_function; elem->single.func = __mark_empty_function;
/* Update the function before setting the ptype */ /* Update the function before setting the ptype */
...@@ -594,8 +622,7 @@ void marker_update_probe_range(struct marker *begin, ...@@ -594,8 +622,7 @@ void marker_update_probe_range(struct marker *begin,
for (iter = begin; iter < end; iter++) { for (iter = begin; iter < end; iter++) {
mark_entry = get_marker(iter->name); mark_entry = get_marker(iter->name);
if (mark_entry) { if (mark_entry) {
set_marker(&mark_entry, iter, set_marker(mark_entry, iter, !!mark_entry->refcount);
!!mark_entry->refcount);
/* /*
* ignore error, continue * ignore error, continue
*/ */
...@@ -629,6 +656,7 @@ static void marker_update_probes(void) ...@@ -629,6 +656,7 @@ static void marker_update_probes(void)
marker_update_probe_range(__start___markers, __stop___markers); marker_update_probe_range(__start___markers, __stop___markers);
/* Markers in modules. */ /* Markers in modules. */
module_update_markers(); module_update_markers();
tracepoint_probe_update_all();
} }
/** /**
...@@ -657,7 +685,7 @@ int marker_probe_register(const char *name, const char *format, ...@@ -657,7 +685,7 @@ int marker_probe_register(const char *name, const char *format,
ret = PTR_ERR(entry); ret = PTR_ERR(entry);
} else if (format) { } else if (format) {
if (!entry->format) if (!entry->format)
ret = marker_set_format(&entry, format); ret = marker_set_format(entry, format);
else if (strcmp(entry->format, format)) else if (strcmp(entry->format, format))
ret = -EPERM; ret = -EPERM;
} }
...@@ -676,10 +704,11 @@ int marker_probe_register(const char *name, const char *format, ...@@ -676,10 +704,11 @@ int marker_probe_register(const char *name, const char *format,
goto end; goto end;
} }
mutex_unlock(&markers_mutex); mutex_unlock(&markers_mutex);
marker_update_probes(); /* may update entry */ marker_update_probes();
mutex_lock(&markers_mutex); mutex_lock(&markers_mutex);
entry = get_marker(name); entry = get_marker(name);
WARN_ON(!entry); if (!entry)
goto end;
if (entry->rcu_pending) if (entry->rcu_pending)
rcu_barrier_sched(); rcu_barrier_sched();
entry->oldptr = old; entry->oldptr = old;
...@@ -720,7 +749,7 @@ int marker_probe_unregister(const char *name, ...@@ -720,7 +749,7 @@ int marker_probe_unregister(const char *name,
rcu_barrier_sched(); rcu_barrier_sched();
old = marker_entry_remove_probe(entry, probe, probe_private); old = marker_entry_remove_probe(entry, probe, probe_private);
mutex_unlock(&markers_mutex); mutex_unlock(&markers_mutex);
marker_update_probes(); /* may update entry */ marker_update_probes();
mutex_lock(&markers_mutex); mutex_lock(&markers_mutex);
entry = get_marker(name); entry = get_marker(name);
if (!entry) if (!entry)
...@@ -801,10 +830,11 @@ int marker_probe_unregister_private_data(marker_probe_func *probe, ...@@ -801,10 +830,11 @@ int marker_probe_unregister_private_data(marker_probe_func *probe,
rcu_barrier_sched(); rcu_barrier_sched();
old = marker_entry_remove_probe(entry, NULL, probe_private); old = marker_entry_remove_probe(entry, NULL, probe_private);
mutex_unlock(&markers_mutex); mutex_unlock(&markers_mutex);
marker_update_probes(); /* may update entry */ marker_update_probes();
mutex_lock(&markers_mutex); mutex_lock(&markers_mutex);
entry = get_marker_from_private_data(probe, probe_private); entry = get_marker_from_private_data(probe, probe_private);
WARN_ON(!entry); if (!entry)
goto end;
if (entry->rcu_pending) if (entry->rcu_pending)
rcu_barrier_sched(); rcu_barrier_sched();
entry->oldptr = old; entry->oldptr = old;
...@@ -848,8 +878,6 @@ void *marker_get_private_data(const char *name, marker_probe_func *probe, ...@@ -848,8 +878,6 @@ void *marker_get_private_data(const char *name, marker_probe_func *probe,
if (!e->ptype) { if (!e->ptype) {
if (num == 0 && e->single.func == probe) if (num == 0 && e->single.func == probe)
return e->single.probe_private; return e->single.probe_private;
else
break;
} else { } else {
struct marker_probe_closure *closure; struct marker_probe_closure *closure;
int match = 0; int match = 0;
...@@ -861,8 +889,42 @@ void *marker_get_private_data(const char *name, marker_probe_func *probe, ...@@ -861,8 +889,42 @@ void *marker_get_private_data(const char *name, marker_probe_func *probe,
return closure[i].probe_private; return closure[i].probe_private;
} }
} }
break;
} }
} }
return ERR_PTR(-ENOENT); return ERR_PTR(-ENOENT);
} }
EXPORT_SYMBOL_GPL(marker_get_private_data); EXPORT_SYMBOL_GPL(marker_get_private_data);
#ifdef CONFIG_MODULES
int marker_module_notify(struct notifier_block *self,
unsigned long val, void *data)
{
struct module *mod = data;
switch (val) {
case MODULE_STATE_COMING:
marker_update_probe_range(mod->markers,
mod->markers + mod->num_markers);
break;
case MODULE_STATE_GOING:
marker_update_probe_range(mod->markers,
mod->markers + mod->num_markers);
break;
}
return 0;
}
struct notifier_block marker_module_nb = {
.notifier_call = marker_module_notify,
.priority = 0,
};
static int init_markers(void)
{
return register_module_notifier(&marker_module_nb);
}
__initcall(init_markers);
#endif /* CONFIG_MODULES */
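For reference, a minimal sketch of how a probe attaches to a marker so that the notifier above keeps it live across module load and unload. The probe and marker names are hypothetical; the signatures follow this tree's marker API:

/* Hypothetical probe for a marker named "my_event" with format "value %d". */
static void my_probe(void *probe_private, void *call_private,
		     const char *fmt, va_list *args)
{
	/* pull the marker arguments out of *args according to fmt */
}

static int __init my_probe_init(void)
{
	/* the format string must match the one used at the marker site */
	return marker_probe_register("my_event", "value %d", my_probe, NULL);
}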
...@@ -2184,24 +2184,15 @@ static noinline struct module *load_module(void __user *umod, ...@@ -2184,24 +2184,15 @@ static noinline struct module *load_module(void __user *umod,
struct mod_debug *debug; struct mod_debug *debug;
unsigned int num_debug; unsigned int num_debug;
#ifdef CONFIG_MARKERS
marker_update_probe_range(mod->markers,
mod->markers + mod->num_markers);
#endif
debug = section_objs(hdr, sechdrs, secstrings, "__verbose", debug = section_objs(hdr, sechdrs, secstrings, "__verbose",
sizeof(*debug), &num_debug); sizeof(*debug), &num_debug);
dynamic_printk_setup(debug, num_debug); dynamic_printk_setup(debug, num_debug);
#ifdef CONFIG_TRACEPOINTS
tracepoint_update_probe_range(mod->tracepoints,
mod->tracepoints + mod->num_tracepoints);
#endif
} }
/* sechdrs[0].sh_size is always zero */ /* sechdrs[0].sh_size is always zero */
mseg = section_objs(hdr, sechdrs, secstrings, "__mcount_loc", mseg = section_objs(hdr, sechdrs, secstrings, "__mcount_loc",
sizeof(*mseg), &num_mcount); sizeof(*mseg), &num_mcount);
ftrace_init_module(mseg, mseg + num_mcount); ftrace_init_module(mod, mseg, mseg + num_mcount);
err = module_finalize(hdr, sechdrs, mod); err = module_finalize(hdr, sechdrs, mod);
if (err < 0) if (err < 0)
......
...@@ -59,7 +59,7 @@ EXPORT_SYMBOL(__mutex_init); ...@@ -59,7 +59,7 @@ EXPORT_SYMBOL(__mutex_init);
* We also put the fastpath first in the kernel image, to make sure the * We also put the fastpath first in the kernel image, to make sure the
* branch is predicted by the CPU as default-untaken. * branch is predicted by the CPU as default-untaken.
*/ */
static void noinline __sched static __used noinline void __sched
__mutex_lock_slowpath(atomic_t *lock_count); __mutex_lock_slowpath(atomic_t *lock_count);
/*** /***
...@@ -96,7 +96,7 @@ void inline __sched mutex_lock(struct mutex *lock) ...@@ -96,7 +96,7 @@ void inline __sched mutex_lock(struct mutex *lock)
EXPORT_SYMBOL(mutex_lock); EXPORT_SYMBOL(mutex_lock);
#endif #endif
static noinline void __sched __mutex_unlock_slowpath(atomic_t *lock_count); static __used noinline void __sched __mutex_unlock_slowpath(atomic_t *lock_count);
/*** /***
* mutex_unlock - release the mutex * mutex_unlock - release the mutex
...@@ -184,7 +184,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, ...@@ -184,7 +184,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
} }
done: done:
lock_acquired(&lock->dep_map); lock_acquired(&lock->dep_map, ip);
/* got the lock - rejoice! */ /* got the lock - rejoice! */
mutex_remove_waiter(lock, &waiter, task_thread_info(task)); mutex_remove_waiter(lock, &waiter, task_thread_info(task));
debug_mutex_set_owner(lock, task_thread_info(task)); debug_mutex_set_owner(lock, task_thread_info(task));
...@@ -268,7 +268,7 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested) ...@@ -268,7 +268,7 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
/* /*
* Release the lock, slowpath: * Release the lock, slowpath:
*/ */
static noinline void static __used noinline void
__mutex_unlock_slowpath(atomic_t *lock_count) __mutex_unlock_slowpath(atomic_t *lock_count)
{ {
__mutex_unlock_common_slowpath(lock_count, 1); __mutex_unlock_common_slowpath(lock_count, 1);
...@@ -313,7 +313,7 @@ int __sched mutex_lock_killable(struct mutex *lock) ...@@ -313,7 +313,7 @@ int __sched mutex_lock_killable(struct mutex *lock)
} }
EXPORT_SYMBOL(mutex_lock_killable); EXPORT_SYMBOL(mutex_lock_killable);
static noinline void __sched static __used noinline void __sched
__mutex_lock_slowpath(atomic_t *lock_count) __mutex_lock_slowpath(atomic_t *lock_count)
{ {
struct mutex *lock = container_of(lock_count, struct mutex, count); struct mutex *lock = container_of(lock_count, struct mutex, count);
......
...@@ -82,6 +82,14 @@ static int __kprobes notifier_call_chain(struct notifier_block **nl, ...@@ -82,6 +82,14 @@ static int __kprobes notifier_call_chain(struct notifier_block **nl,
while (nb && nr_to_call) { while (nb && nr_to_call) {
next_nb = rcu_dereference(nb->next); next_nb = rcu_dereference(nb->next);
#ifdef CONFIG_DEBUG_NOTIFIERS
if (unlikely(!func_ptr_is_kernel_text(nb->notifier_call))) {
WARN(1, "Invalid notifier called!");
nb = next_nb;
continue;
}
#endif
ret = nb->notifier_call(nb, val, v); ret = nb->notifier_call(nb, val, v);
if (nr_calls) if (nr_calls)
......
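The CONFIG_DEBUG_NOTIFIERS check above leans on func_ptr_is_kernel_text(); here is a sketch of that helper as it is expected to look in this tree, assuming the usual text-address predicates:

/* Sketch: nonzero if ptr points into core kernel or module text,
 * after unwrapping any architecture function descriptor. */
int func_ptr_is_kernel_text(void *ptr)
{
	unsigned long addr = (unsigned long) dereference_function_descriptor(ptr);

	if (core_kernel_text(addr))
		return 1;
	return is_module_text_address(addr);
}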
...@@ -58,21 +58,21 @@ void thread_group_cputime( ...@@ -58,21 +58,21 @@ void thread_group_cputime(
struct task_struct *tsk, struct task_struct *tsk,
struct task_cputime *times) struct task_cputime *times)
{ {
struct signal_struct *sig; struct task_cputime *totals, *tot;
int i; int i;
struct task_cputime *tot;
sig = tsk->signal; totals = tsk->signal->cputime.totals;
if (unlikely(!sig) || !sig->cputime.totals) { if (!totals) {
times->utime = tsk->utime; times->utime = tsk->utime;
times->stime = tsk->stime; times->stime = tsk->stime;
times->sum_exec_runtime = tsk->se.sum_exec_runtime; times->sum_exec_runtime = tsk->se.sum_exec_runtime;
return; return;
} }
times->stime = times->utime = cputime_zero; times->stime = times->utime = cputime_zero;
times->sum_exec_runtime = 0; times->sum_exec_runtime = 0;
for_each_possible_cpu(i) { for_each_possible_cpu(i) {
tot = per_cpu_ptr(tsk->signal->cputime.totals, i); tot = per_cpu_ptr(totals, i);
times->utime = cputime_add(times->utime, tot->utime); times->utime = cputime_add(times->utime, tot->utime);
times->stime = cputime_add(times->stime, tot->stime); times->stime = cputime_add(times->stime, tot->stime);
times->sum_exec_runtime += tot->sum_exec_runtime; times->sum_exec_runtime += tot->sum_exec_runtime;
......
...@@ -22,7 +22,6 @@ ...@@ -22,7 +22,6 @@
#include <linux/console.h> #include <linux/console.h>
#include <linux/cpu.h> #include <linux/cpu.h>
#include <linux/freezer.h> #include <linux/freezer.h>
#include <linux/ftrace.h>
#include "power.h" #include "power.h"
...@@ -257,7 +256,7 @@ static int create_image(int platform_mode) ...@@ -257,7 +256,7 @@ static int create_image(int platform_mode)
int hibernation_snapshot(int platform_mode) int hibernation_snapshot(int platform_mode)
{ {
int error, ftrace_save; int error;
/* Free memory before shutting down devices. */ /* Free memory before shutting down devices. */
error = swsusp_shrink_memory(); error = swsusp_shrink_memory();
...@@ -269,7 +268,6 @@ int hibernation_snapshot(int platform_mode) ...@@ -269,7 +268,6 @@ int hibernation_snapshot(int platform_mode)
goto Close; goto Close;
suspend_console(); suspend_console();
ftrace_save = __ftrace_enabled_save();
error = device_suspend(PMSG_FREEZE); error = device_suspend(PMSG_FREEZE);
if (error) if (error)
goto Recover_platform; goto Recover_platform;
...@@ -299,7 +297,6 @@ int hibernation_snapshot(int platform_mode) ...@@ -299,7 +297,6 @@ int hibernation_snapshot(int platform_mode)
Resume_devices: Resume_devices:
device_resume(in_suspend ? device_resume(in_suspend ?
(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE); (error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
__ftrace_enabled_restore(ftrace_save);
resume_console(); resume_console();
Close: Close:
platform_end(platform_mode); platform_end(platform_mode);
...@@ -370,11 +367,10 @@ static int resume_target_kernel(void) ...@@ -370,11 +367,10 @@ static int resume_target_kernel(void)
int hibernation_restore(int platform_mode) int hibernation_restore(int platform_mode)
{ {
int error, ftrace_save; int error;
pm_prepare_console(); pm_prepare_console();
suspend_console(); suspend_console();
ftrace_save = __ftrace_enabled_save();
error = device_suspend(PMSG_QUIESCE); error = device_suspend(PMSG_QUIESCE);
if (error) if (error)
goto Finish; goto Finish;
...@@ -389,7 +385,6 @@ int hibernation_restore(int platform_mode) ...@@ -389,7 +385,6 @@ int hibernation_restore(int platform_mode)
platform_restore_cleanup(platform_mode); platform_restore_cleanup(platform_mode);
device_resume(PMSG_RECOVER); device_resume(PMSG_RECOVER);
Finish: Finish:
__ftrace_enabled_restore(ftrace_save);
resume_console(); resume_console();
pm_restore_console(); pm_restore_console();
return error; return error;
...@@ -402,7 +397,7 @@ int hibernation_restore(int platform_mode) ...@@ -402,7 +397,7 @@ int hibernation_restore(int platform_mode)
int hibernation_platform_enter(void) int hibernation_platform_enter(void)
{ {
int error, ftrace_save; int error;
if (!hibernation_ops) if (!hibernation_ops)
return -ENOSYS; return -ENOSYS;
...@@ -417,7 +412,6 @@ int hibernation_platform_enter(void) ...@@ -417,7 +412,6 @@ int hibernation_platform_enter(void)
goto Close; goto Close;
suspend_console(); suspend_console();
ftrace_save = __ftrace_enabled_save();
error = device_suspend(PMSG_HIBERNATE); error = device_suspend(PMSG_HIBERNATE);
if (error) { if (error) {
if (hibernation_ops->recover) if (hibernation_ops->recover)
...@@ -452,7 +446,6 @@ int hibernation_platform_enter(void) ...@@ -452,7 +446,6 @@ int hibernation_platform_enter(void)
hibernation_ops->finish(); hibernation_ops->finish();
Resume_devices: Resume_devices:
device_resume(PMSG_RESTORE); device_resume(PMSG_RESTORE);
__ftrace_enabled_restore(ftrace_save);
resume_console(); resume_console();
Close: Close:
hibernation_ops->end(); hibernation_ops->end();
......
...@@ -22,7 +22,6 @@ ...@@ -22,7 +22,6 @@
#include <linux/freezer.h> #include <linux/freezer.h>
#include <linux/vmstat.h> #include <linux/vmstat.h>
#include <linux/syscalls.h> #include <linux/syscalls.h>
#include <linux/ftrace.h>
#include "power.h" #include "power.h"
...@@ -317,7 +316,7 @@ static int suspend_enter(suspend_state_t state) ...@@ -317,7 +316,7 @@ static int suspend_enter(suspend_state_t state)
*/ */
int suspend_devices_and_enter(suspend_state_t state) int suspend_devices_and_enter(suspend_state_t state)
{ {
int error, ftrace_save; int error;
if (!suspend_ops) if (!suspend_ops)
return -ENOSYS; return -ENOSYS;
...@@ -328,7 +327,6 @@ int suspend_devices_and_enter(suspend_state_t state) ...@@ -328,7 +327,6 @@ int suspend_devices_and_enter(suspend_state_t state)
goto Close; goto Close;
} }
suspend_console(); suspend_console();
ftrace_save = __ftrace_enabled_save();
suspend_test_start(); suspend_test_start();
error = device_suspend(PMSG_SUSPEND); error = device_suspend(PMSG_SUSPEND);
if (error) { if (error) {
...@@ -360,7 +358,6 @@ int suspend_devices_and_enter(suspend_state_t state) ...@@ -360,7 +358,6 @@ int suspend_devices_and_enter(suspend_state_t state)
suspend_test_start(); suspend_test_start();
device_resume(PMSG_RESUME); device_resume(PMSG_RESUME);
suspend_test_finish("resume devices"); suspend_test_finish("resume devices");
__ftrace_enabled_restore(ftrace_save);
resume_console(); resume_console();
Close: Close:
if (suspend_ops->end) if (suspend_ops->end)
......
...@@ -544,7 +544,7 @@ static const struct file_operations proc_profile_operations = { ...@@ -544,7 +544,7 @@ static const struct file_operations proc_profile_operations = {
}; };
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
static inline void profile_nop(void *unused) static void profile_nop(void *unused)
{ {
} }
......
...@@ -191,7 +191,7 @@ static void print_other_cpu_stall(struct rcu_ctrlblk *rcp) ...@@ -191,7 +191,7 @@ static void print_other_cpu_stall(struct rcu_ctrlblk *rcp)
/* OK, time to rat on our buddy... */ /* OK, time to rat on our buddy... */
printk(KERN_ERR "RCU detected CPU stalls:"); printk(KERN_ERR "INFO: RCU detected CPU stalls:");
for_each_possible_cpu(cpu) { for_each_possible_cpu(cpu) {
if (cpu_isset(cpu, rcp->cpumask)) if (cpu_isset(cpu, rcp->cpumask))
printk(" %d", cpu); printk(" %d", cpu);
...@@ -204,7 +204,7 @@ static void print_cpu_stall(struct rcu_ctrlblk *rcp) ...@@ -204,7 +204,7 @@ static void print_cpu_stall(struct rcu_ctrlblk *rcp)
{ {
unsigned long flags; unsigned long flags;
printk(KERN_ERR "RCU detected CPU %d stall (t=%lu/%lu jiffies)\n", printk(KERN_ERR "INFO: RCU detected CPU %d stall (t=%lu/%lu jiffies)\n",
smp_processor_id(), jiffies, smp_processor_id(), jiffies,
jiffies - rcp->gp_start); jiffies - rcp->gp_start);
dump_stack(); dump_stack();
......
...@@ -118,6 +118,12 @@ ...@@ -118,6 +118,12 @@
*/ */
#define RUNTIME_INF ((u64)~0ULL) #define RUNTIME_INF ((u64)~0ULL)
DEFINE_TRACE(sched_wait_task);
DEFINE_TRACE(sched_wakeup);
DEFINE_TRACE(sched_wakeup_new);
DEFINE_TRACE(sched_switch);
DEFINE_TRACE(sched_migrate_task);
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
/* /*
* Divide a load by a sched group cpu_power : (load / sg->__cpu_power) * Divide a load by a sched group cpu_power : (load / sg->__cpu_power)
...@@ -4171,7 +4177,6 @@ void account_steal_time(struct task_struct *p, cputime_t steal) ...@@ -4171,7 +4177,6 @@ void account_steal_time(struct task_struct *p, cputime_t steal)
if (p == rq->idle) { if (p == rq->idle) {
p->stime = cputime_add(p->stime, steal); p->stime = cputime_add(p->stime, steal);
account_group_system_time(p, steal);
if (atomic_read(&rq->nr_iowait) > 0) if (atomic_read(&rq->nr_iowait) > 0)
cpustat->iowait = cputime64_add(cpustat->iowait, tmp); cpustat->iowait = cputime64_add(cpustat->iowait, tmp);
else else
...@@ -4307,7 +4312,7 @@ void __kprobes sub_preempt_count(int val) ...@@ -4307,7 +4312,7 @@ void __kprobes sub_preempt_count(int val)
/* /*
* Underflow? * Underflow?
*/ */
if (DEBUG_LOCKS_WARN_ON(val > preempt_count())) if (DEBUG_LOCKS_WARN_ON(val > preempt_count() - (!!kernel_locked())))
return; return;
/* /*
* Is the spinlock portion underflowing? * Is the spinlock portion underflowing?
...@@ -5864,6 +5869,7 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu) ...@@ -5864,6 +5869,7 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
* The idle tasks have their own, simple scheduling class: * The idle tasks have their own, simple scheduling class:
*/ */
idle->sched_class = &idle_sched_class; idle->sched_class = &idle_sched_class;
ftrace_retfunc_init_task(idle);
} }
/* /*
......
...@@ -41,6 +41,8 @@ ...@@ -41,6 +41,8 @@
static struct kmem_cache *sigqueue_cachep; static struct kmem_cache *sigqueue_cachep;
DEFINE_TRACE(sched_signal_send);
static void __user *sig_handler(struct task_struct *t, int sig) static void __user *sig_handler(struct task_struct *t, int sig)
{ {
return t->sighand->action[sig - 1].sa.sa_handler; return t->sighand->action[sig - 1].sa.sa_handler;
......
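The DEFINE_TRACE() lines added to sched.c and signal.c pair with DECLARE_TRACE() declarations kept in headers; a hedged sketch of the pattern in this tree, with a hypothetical tracepoint name:

/* In a header: declares the tracepoint and its trace_my_event() inline. */
DECLARE_TRACE(my_event,
	TPPROTO(struct task_struct *p),
	TPARGS(p));

/* In exactly one .c file: emits the tracepoint instance. */
DEFINE_TRACE(my_event);

/* At the instrumentation site: */
trace_my_event(current);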
...@@ -164,7 +164,7 @@ unsigned long __read_mostly sysctl_hung_task_check_count = 1024; ...@@ -164,7 +164,7 @@ unsigned long __read_mostly sysctl_hung_task_check_count = 1024;
/* /*
* Zero means infinite timeout - no checking done: * Zero means infinite timeout - no checking done:
*/ */
unsigned long __read_mostly sysctl_hung_task_timeout_secs = 120; unsigned long __read_mostly sysctl_hung_task_timeout_secs = 480;
unsigned long __read_mostly sysctl_hung_task_warnings = 10; unsigned long __read_mostly sysctl_hung_task_warnings = 10;
......
...@@ -858,8 +858,8 @@ void do_sys_times(struct tms *tms) ...@@ -858,8 +858,8 @@ void do_sys_times(struct tms *tms)
struct task_cputime cputime; struct task_cputime cputime;
cputime_t cutime, cstime; cputime_t cutime, cstime;
spin_lock_irq(&current->sighand->siglock);
thread_group_cputime(current, &cputime); thread_group_cputime(current, &cputime);
spin_lock_irq(&current->sighand->siglock);
cutime = current->signal->cutime; cutime = current->signal->cutime;
cstime = current->signal->cstime; cstime = current->signal->cstime;
spin_unlock_irq(&current->sighand->siglock); spin_unlock_irq(&current->sighand->siglock);
......
...@@ -484,6 +484,16 @@ static struct ctl_table kern_table[] = { ...@@ -484,6 +484,16 @@ static struct ctl_table kern_table[] = {
.proc_handler = &ftrace_enable_sysctl, .proc_handler = &ftrace_enable_sysctl,
}, },
#endif #endif
#ifdef CONFIG_TRACING
{
.ctl_name = CTL_UNNUMBERED,
.procname = "ftrace_dump_on_oops",
.data = &ftrace_dump_on_oops,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = &proc_dointvec,
},
#endif
#ifdef CONFIG_MODULES #ifdef CONFIG_MODULES
{ {
.ctl_name = KERN_MODPROBE, .ctl_name = KERN_MODPROBE,
......
...@@ -3,12 +3,25 @@ ...@@ -3,12 +3,25 @@
# select HAVE_FUNCTION_TRACER: # select HAVE_FUNCTION_TRACER:
# #
config USER_STACKTRACE_SUPPORT
bool
config NOP_TRACER config NOP_TRACER
bool bool
config HAVE_FUNCTION_TRACER config HAVE_FUNCTION_TRACER
bool bool
config HAVE_FUNCTION_RET_TRACER
bool
config HAVE_FUNCTION_TRACE_MCOUNT_TEST
bool
help
This gets selected when the arch tests the function_trace_stop
variable at the mcount call site. Otherwise, this variable
is tested by the called function.
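When the architecture's mcount stub does not test the variable itself, the generic code is expected to wrap the traced function instead; a sketch of that fallback, hedged from this tree's ftrace internals:

/* Sketch: generic fallback test when the arch's mcount does not
 * check function_trace_stop on its own. */
static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
{
	if (!function_trace_stop)
		__ftrace_trace_function(ip, parent_ip);
}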
config HAVE_DYNAMIC_FTRACE config HAVE_DYNAMIC_FTRACE
bool bool
...@@ -47,6 +60,16 @@ config FUNCTION_TRACER ...@@ -47,6 +60,16 @@ config FUNCTION_TRACER
(the bootup default), then the overhead of the instructions is very (the bootup default), then the overhead of the instructions is very
small and not measurable even in micro-benchmarks. small and not measurable even in micro-benchmarks.
config FUNCTION_RET_TRACER
bool "Kernel Function return Tracer"
depends on HAVE_FUNCTION_RET_TRACER
depends on FUNCTION_TRACER
help
Enable the kernel to trace a function at its return.
Its first purpose is to trace the duration of functions.
This is done by setting the current return address on the thread
info structure of the current task.
config IRQSOFF_TRACER config IRQSOFF_TRACER
bool "Interrupts-off Latency Tracer" bool "Interrupts-off Latency Tracer"
default n default n
...@@ -138,6 +161,59 @@ config BOOT_TRACER ...@@ -138,6 +161,59 @@ config BOOT_TRACER
selected, because the self-tests are an initcall as well and that selected, because the self-tests are an initcall as well and that
would invalidate the boot trace. ) would invalidate the boot trace. )
config TRACE_BRANCH_PROFILING
bool "Trace likely/unlikely profiler"
depends on DEBUG_KERNEL
select TRACING
help
This tracer profiles all the likely and unlikely macros
in the kernel. It will display the results in:
/debugfs/tracing/profile_annotated_branch
Note: this will add a significant overhead; only turn this
on if you need to profile the system's use of these macros.
Say N if unsure.
config PROFILE_ALL_BRANCHES
bool "Profile all if conditionals"
depends on TRACE_BRANCH_PROFILING
help
This tracer profiles all branch conditions. Every if ()
taken in the kernel is recorded, whether it hit or missed.
The results will be displayed in:
/debugfs/tracing/profile_branch
This configuration, when enabled, will impose a great overhead
on the system. This should only be enabled when the system
is to be analyzed.
Say N if unsure.
config TRACING_BRANCHES
bool
help
Selected by tracers that will trace the likely and unlikely
conditions. This prevents the tracers themselves from being
profiled. Profiling the tracing infrastructure can only happen
when the likelys and unlikelys are not being traced.
config BRANCH_TRACER
bool "Trace likely/unlikely instances"
depends on TRACE_BRANCH_PROFILING
select TRACING_BRANCHES
help
This traces the events of likely and unlikely condition
calls in the kernel. The difference between this and the
"Trace likely/unlikely profiler" is that this is not a
histogram of the callers, but actually places the calling
events into a running trace buffer to see when and where the
events happened, as well as their results.
Say N if unsure.
config STACK_TRACER config STACK_TRACER
bool "Trace max stack" bool "Trace max stack"
depends on HAVE_FUNCTION_TRACER depends on HAVE_FUNCTION_TRACER
......
...@@ -10,6 +10,11 @@ CFLAGS_trace_selftest_dynamic.o = -pg ...@@ -10,6 +10,11 @@ CFLAGS_trace_selftest_dynamic.o = -pg
obj-y += trace_selftest_dynamic.o obj-y += trace_selftest_dynamic.o
endif endif
# If unlikely tracing is enabled, do not trace these files
ifdef CONFIG_TRACING_BRANCHES
KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
endif
obj-$(CONFIG_FUNCTION_TRACER) += libftrace.o obj-$(CONFIG_FUNCTION_TRACER) += libftrace.o
obj-$(CONFIG_RING_BUFFER) += ring_buffer.o obj-$(CONFIG_RING_BUFFER) += ring_buffer.o
...@@ -24,5 +29,7 @@ obj-$(CONFIG_NOP_TRACER) += trace_nop.o ...@@ -24,5 +29,7 @@ obj-$(CONFIG_NOP_TRACER) += trace_nop.o
obj-$(CONFIG_STACK_TRACER) += trace_stack.o obj-$(CONFIG_STACK_TRACER) += trace_stack.o
obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
obj-$(CONFIG_BOOT_TRACER) += trace_boot.o obj-$(CONFIG_BOOT_TRACER) += trace_boot.o
obj-$(CONFIG_FUNCTION_RET_TRACER) += trace_functions_return.o
obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
libftrace-y := ftrace.o libftrace-y := ftrace.o
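The -DDISABLE_BRANCH_PROFILING guard matters because, under CONFIG_TRACE_BRANCH_PROFILING, likely() and unlikely() are redefined to record every hit; a sketch of the mechanism along the lines of this tree's compiler.h:

#if defined(CONFIG_TRACE_BRANCH_PROFILING) \
	&& !defined(DISABLE_BRANCH_PROFILING) && !defined(__CHECKER__)
# define likely(x)	(__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 1))
# define unlikely(x)	(__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
#else
# define likely(x)	__builtin_expect(!!(x), 1)
# define unlikely(x)	__builtin_expect(!!(x), 0)
#endif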
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
...@@ -8,6 +8,7 @@ ...@@ -8,6 +8,7 @@
#include <linux/ring_buffer.h> #include <linux/ring_buffer.h>
#include <linux/mmiotrace.h> #include <linux/mmiotrace.h>
#include <linux/ftrace.h> #include <linux/ftrace.h>
#include <trace/boot.h>
enum trace_type { enum trace_type {
__TRACE_FIRST_TYPE = 0, __TRACE_FIRST_TYPE = 0,
...@@ -21,7 +22,11 @@ enum trace_type { ...@@ -21,7 +22,11 @@ enum trace_type {
TRACE_SPECIAL, TRACE_SPECIAL,
TRACE_MMIO_RW, TRACE_MMIO_RW,
TRACE_MMIO_MAP, TRACE_MMIO_MAP,
TRACE_BOOT, TRACE_BRANCH,
TRACE_BOOT_CALL,
TRACE_BOOT_RET,
TRACE_FN_RET,
TRACE_USER_STACK,
__TRACE_LAST_TYPE __TRACE_LAST_TYPE
}; };
...@@ -38,6 +43,7 @@ struct trace_entry { ...@@ -38,6 +43,7 @@ struct trace_entry {
unsigned char flags; unsigned char flags;
unsigned char preempt_count; unsigned char preempt_count;
int pid; int pid;
int tgid;
}; };
/* /*
...@@ -48,6 +54,16 @@ struct ftrace_entry { ...@@ -48,6 +54,16 @@ struct ftrace_entry {
unsigned long ip; unsigned long ip;
unsigned long parent_ip; unsigned long parent_ip;
}; };
/* Function return entry */
struct ftrace_ret_entry {
struct trace_entry ent;
unsigned long ip;
unsigned long parent_ip;
unsigned long long calltime;
unsigned long long rettime;
unsigned long overrun;
};
extern struct tracer boot_tracer; extern struct tracer boot_tracer;
/* /*
...@@ -85,6 +101,11 @@ struct stack_entry { ...@@ -85,6 +101,11 @@ struct stack_entry {
unsigned long caller[FTRACE_STACK_ENTRIES]; unsigned long caller[FTRACE_STACK_ENTRIES];
}; };
struct userstack_entry {
struct trace_entry ent;
unsigned long caller[FTRACE_STACK_ENTRIES];
};
/* /*
* ftrace_printk entry: * ftrace_printk entry:
*/ */
...@@ -112,9 +133,24 @@ struct trace_mmiotrace_map { ...@@ -112,9 +133,24 @@ struct trace_mmiotrace_map {
struct mmiotrace_map map; struct mmiotrace_map map;
}; };
struct trace_boot { struct trace_boot_call {
struct trace_entry ent; struct trace_entry ent;
struct boot_trace initcall; struct boot_trace_call boot_call;
};
struct trace_boot_ret {
struct trace_entry ent;
struct boot_trace_ret boot_ret;
};
#define TRACE_FUNC_SIZE 30
#define TRACE_FILE_SIZE 20
struct trace_branch {
struct trace_entry ent;
unsigned line;
char func[TRACE_FUNC_SIZE+1];
char file[TRACE_FILE_SIZE+1];
char correct;
}; };
/* /*
...@@ -172,7 +208,6 @@ struct trace_iterator; ...@@ -172,7 +208,6 @@ struct trace_iterator;
struct trace_array { struct trace_array {
struct ring_buffer *buffer; struct ring_buffer *buffer;
unsigned long entries; unsigned long entries;
long ctrl;
int cpu; int cpu;
cycle_t time_start; cycle_t time_start;
struct task_struct *waiter; struct task_struct *waiter;
...@@ -212,13 +247,17 @@ extern void __ftrace_bad_type(void); ...@@ -212,13 +247,17 @@ extern void __ftrace_bad_type(void);
IF_ASSIGN(var, ent, struct ctx_switch_entry, 0); \ IF_ASSIGN(var, ent, struct ctx_switch_entry, 0); \
IF_ASSIGN(var, ent, struct trace_field_cont, TRACE_CONT); \ IF_ASSIGN(var, ent, struct trace_field_cont, TRACE_CONT); \
IF_ASSIGN(var, ent, struct stack_entry, TRACE_STACK); \ IF_ASSIGN(var, ent, struct stack_entry, TRACE_STACK); \
IF_ASSIGN(var, ent, struct userstack_entry, TRACE_USER_STACK);\
IF_ASSIGN(var, ent, struct print_entry, TRACE_PRINT); \ IF_ASSIGN(var, ent, struct print_entry, TRACE_PRINT); \
IF_ASSIGN(var, ent, struct special_entry, 0); \ IF_ASSIGN(var, ent, struct special_entry, 0); \
IF_ASSIGN(var, ent, struct trace_mmiotrace_rw, \ IF_ASSIGN(var, ent, struct trace_mmiotrace_rw, \
TRACE_MMIO_RW); \ TRACE_MMIO_RW); \
IF_ASSIGN(var, ent, struct trace_mmiotrace_map, \ IF_ASSIGN(var, ent, struct trace_mmiotrace_map, \
TRACE_MMIO_MAP); \ TRACE_MMIO_MAP); \
IF_ASSIGN(var, ent, struct trace_boot, TRACE_BOOT); \ IF_ASSIGN(var, ent, struct trace_boot_call, TRACE_BOOT_CALL);\
IF_ASSIGN(var, ent, struct trace_boot_ret, TRACE_BOOT_RET);\
IF_ASSIGN(var, ent, struct trace_branch, TRACE_BRANCH); \
IF_ASSIGN(var, ent, struct ftrace_ret_entry, TRACE_FN_RET);\
__ftrace_bad_type(); \ __ftrace_bad_type(); \
} while (0) } while (0)
...@@ -229,29 +268,55 @@ enum print_line_t { ...@@ -229,29 +268,55 @@ enum print_line_t {
TRACE_TYPE_UNHANDLED = 2 /* Relay to other output functions */ TRACE_TYPE_UNHANDLED = 2 /* Relay to other output functions */
}; };
/*
* An option specific to a tracer. This is a boolean value.
* The bit is the bit mask that sets its value in the
* flags value of struct tracer_flags.
*/
struct tracer_opt {
const char *name; /* Will appear on the trace_options file */
u32 bit; /* Mask assigned in val field in tracer_flags */
};
/*
* The set of specific options for a tracer. Your tracer
* has to set the initial value of the flags val.
*/
struct tracer_flags {
u32 val;
struct tracer_opt *opts;
};
/* Makes it easier to define a tracer opt */
#define TRACER_OPT(s, b) .name = #s, .bit = b
/* /*
* A specific tracer, represented by methods that operate on a trace array: * A specific tracer, represented by methods that operate on a trace array:
*/ */
struct tracer { struct tracer {
const char *name; const char *name;
void (*init)(struct trace_array *tr); /* Your tracer should raise a warning if init fails */
int (*init)(struct trace_array *tr);
void (*reset)(struct trace_array *tr); void (*reset)(struct trace_array *tr);
void (*start)(struct trace_array *tr);
void (*stop)(struct trace_array *tr);
void (*open)(struct trace_iterator *iter); void (*open)(struct trace_iterator *iter);
void (*pipe_open)(struct trace_iterator *iter); void (*pipe_open)(struct trace_iterator *iter);
void (*close)(struct trace_iterator *iter); void (*close)(struct trace_iterator *iter);
void (*start)(struct trace_iterator *iter);
void (*stop)(struct trace_iterator *iter);
ssize_t (*read)(struct trace_iterator *iter, ssize_t (*read)(struct trace_iterator *iter,
struct file *filp, char __user *ubuf, struct file *filp, char __user *ubuf,
size_t cnt, loff_t *ppos); size_t cnt, loff_t *ppos);
void (*ctrl_update)(struct trace_array *tr);
#ifdef CONFIG_FTRACE_STARTUP_TEST #ifdef CONFIG_FTRACE_STARTUP_TEST
int (*selftest)(struct tracer *trace, int (*selftest)(struct tracer *trace,
struct trace_array *tr); struct trace_array *tr);
#endif #endif
enum print_line_t (*print_line)(struct trace_iterator *iter); enum print_line_t (*print_line)(struct trace_iterator *iter);
/* If you handled the flag setting, return 0 */
int (*set_flag)(u32 old_flags, u32 bit, int set);
struct tracer *next; struct tracer *next;
int print_max; int print_max;
struct tracer_flags *flags;
}; };
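Taken together, a minimal sketch of a tracer written against the reworked interface; all names here are hypothetical:

static struct tracer_opt my_opts[] = {
	{ TRACER_OPT(my-option, 0x1) },
	{ } /* empty terminator */
};

static struct tracer_flags my_tracer_flags = {
	.val	= 0,		/* initial state of the option bits */
	.opts	= my_opts,
};

static int my_tracer_init(struct trace_array *tr)
{
	return 0;		/* nonzero would signal a failed init */
}

static void my_tracer_reset(struct trace_array *tr)
{
}

static int my_set_flag(u32 old_flags, u32 bit, int set)
{
	return 0;		/* 0 means the flag change was handled */
}

static struct tracer my_tracer __read_mostly = {
	.name		= "my_tracer",
	.init		= my_tracer_init,
	.reset		= my_tracer_reset,
	.set_flag	= my_set_flag,
	.flags		= &my_tracer_flags,
};

register_tracer(&my_tracer) would then expose my-option in the trace_options file.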
struct trace_seq { struct trace_seq {
...@@ -279,8 +344,11 @@ struct trace_iterator { ...@@ -279,8 +344,11 @@ struct trace_iterator {
unsigned long iter_flags; unsigned long iter_flags;
loff_t pos; loff_t pos;
long idx; long idx;
cpumask_t started;
}; };
int tracing_is_enabled(void);
void trace_wake_up(void); void trace_wake_up(void);
void tracing_reset(struct trace_array *tr, int cpu); void tracing_reset(struct trace_array *tr, int cpu);
int tracing_open_generic(struct inode *inode, struct file *filp); int tracing_open_generic(struct inode *inode, struct file *filp);
...@@ -320,9 +388,14 @@ void trace_function(struct trace_array *tr, ...@@ -320,9 +388,14 @@ void trace_function(struct trace_array *tr,
unsigned long ip, unsigned long ip,
unsigned long parent_ip, unsigned long parent_ip,
unsigned long flags, int pc); unsigned long flags, int pc);
void
trace_function_return(struct ftrace_retfunc *trace);
void tracing_start_cmdline_record(void); void tracing_start_cmdline_record(void);
void tracing_stop_cmdline_record(void); void tracing_stop_cmdline_record(void);
void tracing_sched_switch_assign_trace(struct trace_array *tr);
void tracing_stop_sched_switch_record(void);
void tracing_start_sched_switch_record(void);
int register_tracer(struct tracer *type); int register_tracer(struct tracer *type);
void unregister_tracer(struct tracer *type); void unregister_tracer(struct tracer *type);
...@@ -383,12 +456,18 @@ extern int trace_selftest_startup_sched_switch(struct tracer *trace, ...@@ -383,12 +456,18 @@ extern int trace_selftest_startup_sched_switch(struct tracer *trace,
struct trace_array *tr); struct trace_array *tr);
extern int trace_selftest_startup_sysprof(struct tracer *trace, extern int trace_selftest_startup_sysprof(struct tracer *trace,
struct trace_array *tr); struct trace_array *tr);
extern int trace_selftest_startup_branch(struct tracer *trace,
struct trace_array *tr);
#endif /* CONFIG_FTRACE_STARTUP_TEST */ #endif /* CONFIG_FTRACE_STARTUP_TEST */
extern void *head_page(struct trace_array_cpu *data); extern void *head_page(struct trace_array_cpu *data);
extern int trace_seq_printf(struct trace_seq *s, const char *fmt, ...); extern int trace_seq_printf(struct trace_seq *s, const char *fmt, ...);
extern void trace_seq_print_cont(struct trace_seq *s, extern void trace_seq_print_cont(struct trace_seq *s,
struct trace_iterator *iter); struct trace_iterator *iter);
extern int
seq_print_ip_sym(struct trace_seq *s, unsigned long ip,
unsigned long sym_flags);
extern ssize_t trace_seq_to_user(struct trace_seq *s, char __user *ubuf, extern ssize_t trace_seq_to_user(struct trace_seq *s, char __user *ubuf,
size_t cnt); size_t cnt);
extern long ns2usecs(cycle_t nsec); extern long ns2usecs(cycle_t nsec);
...@@ -396,6 +475,17 @@ extern int trace_vprintk(unsigned long ip, const char *fmt, va_list args); ...@@ -396,6 +475,17 @@ extern int trace_vprintk(unsigned long ip, const char *fmt, va_list args);
extern unsigned long trace_flags; extern unsigned long trace_flags;
/* Standard output formatting function used for function return traces */
#ifdef CONFIG_FUNCTION_RET_TRACER
extern enum print_line_t print_return_function(struct trace_iterator *iter);
#else
static inline enum print_line_t
print_return_function(struct trace_iterator *iter)
{
return TRACE_TYPE_UNHANDLED;
}
#endif
/* /*
* trace_iterator_flags is an enumeration that defines bit * trace_iterator_flags is an enumeration that defines bit
* positions into trace_flags that controls the output. * positions into trace_flags that controls the output.
...@@ -415,8 +505,92 @@ enum trace_iterator_flags { ...@@ -415,8 +505,92 @@ enum trace_iterator_flags {
TRACE_ITER_STACKTRACE = 0x100, TRACE_ITER_STACKTRACE = 0x100,
TRACE_ITER_SCHED_TREE = 0x200, TRACE_ITER_SCHED_TREE = 0x200,
TRACE_ITER_PRINTK = 0x400, TRACE_ITER_PRINTK = 0x400,
TRACE_ITER_PREEMPTONLY = 0x800,
TRACE_ITER_BRANCH = 0x1000,
TRACE_ITER_ANNOTATE = 0x2000,
TRACE_ITER_USERSTACKTRACE = 0x4000,
TRACE_ITER_SYM_USEROBJ = 0x8000
}; };
/*
* TRACE_ITER_SYM_MASK masks the options in trace_flags that
* control the output of kernel symbols.
*/
#define TRACE_ITER_SYM_MASK \
(TRACE_ITER_PRINT_PARENT|TRACE_ITER_SYM_OFFSET|TRACE_ITER_SYM_ADDR)
extern struct tracer nop_trace; extern struct tracer nop_trace;
/**
* ftrace_preempt_disable - disable preemption scheduler safe
*
* When tracing can happen inside the scheduler, there exist
* cases where the tracing might happen before the need_resched
* flag is checked. If this happens and the tracer calls
* preempt_enable (after a disable), a schedule might take place
* causing an infinite recursion.
*
* To prevent this, we read the need_resched flag before
* disabling preemption. When we want to enable preemption we
* check the flag, if it is set, then we call preempt_enable_no_resched.
* Otherwise, we call preempt_enable.
*
* The rationale for doing the above is that if need_resched is set
* and we have yet to reschedule, we are either in an atomic location
* (where we do not need to check for scheduling) or we are inside
* the scheduler and do not want to resched.
*/
static inline int ftrace_preempt_disable(void)
{
int resched;
resched = need_resched();
preempt_disable_notrace();
return resched;
}
/**
* ftrace_preempt_enable - enable preemption scheduler safe
* @resched: the return value from ftrace_preempt_disable
*
* This is a scheduler safe way to enable preemption and not miss
* any preemption checks. The matching disable saved the preemption state.
* If resched is set, then we were either inside an atomic or
* are inside the scheduler (we would have already scheduled
* otherwise). In this case, we do not want to call normal
* preempt_enable, but preempt_enable_no_resched instead.
*/
static inline void ftrace_preempt_enable(int resched)
{
if (resched)
preempt_enable_no_resched_notrace();
else
preempt_enable_notrace();
}
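A usage sketch of the pair in a tracing callback (the callback itself is hypothetical):

static void my_trace_callback(unsigned long ip, unsigned long parent_ip)
{
	int resched;

	resched = ftrace_preempt_disable();
	/* record the event; safe even when called from inside the scheduler */
	ftrace_preempt_enable(resched);
}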
#ifdef CONFIG_BRANCH_TRACER
extern int enable_branch_tracing(struct trace_array *tr);
extern void disable_branch_tracing(void);
static inline int trace_branch_enable(struct trace_array *tr)
{
if (trace_flags & TRACE_ITER_BRANCH)
return enable_branch_tracing(tr);
return 0;
}
static inline void trace_branch_disable(void)
{
/* due to races, always disable */
disable_branch_tracing();
}
#else
static inline int trace_branch_enable(struct trace_array *tr)
{
return 0;
}
static inline void trace_branch_disable(void)
{
}
#endif /* CONFIG_BRANCH_TRACER */
#endif /* _LINUX_KERNEL_TRACE_H */ #endif /* _LINUX_KERNEL_TRACE_H */
This diff is collapsed.
This diff is collapsed.
...@@ -42,24 +42,20 @@ static void stop_function_trace(struct trace_array *tr) ...@@ -42,24 +42,20 @@ static void stop_function_trace(struct trace_array *tr)
tracing_stop_cmdline_record(); tracing_stop_cmdline_record();
} }
static void function_trace_init(struct trace_array *tr) static int function_trace_init(struct trace_array *tr)
{ {
if (tr->ctrl) start_function_trace(tr);
start_function_trace(tr); return 0;
} }
static void function_trace_reset(struct trace_array *tr) static void function_trace_reset(struct trace_array *tr)
{ {
if (tr->ctrl) stop_function_trace(tr);
stop_function_trace(tr);
} }
static void function_trace_ctrl_update(struct trace_array *tr) static void function_trace_start(struct trace_array *tr)
{ {
if (tr->ctrl) function_reset(tr);
start_function_trace(tr);
else
stop_function_trace(tr);
} }
static struct tracer function_trace __read_mostly = static struct tracer function_trace __read_mostly =
...@@ -67,7 +63,7 @@ static struct tracer function_trace __read_mostly = ...@@ -67,7 +63,7 @@ static struct tracer function_trace __read_mostly =
.name = "function", .name = "function",
.init = function_trace_init, .init = function_trace_init,
.reset = function_trace_reset, .reset = function_trace_reset,
.ctrl_update = function_trace_ctrl_update, .start = function_trace_start,
#ifdef CONFIG_FTRACE_SELFTEST #ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_function, .selftest = trace_selftest_startup_function,
#endif #endif
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
...@@ -261,27 +261,17 @@ static void stop_stack_trace(struct trace_array *tr) ...@@ -261,27 +261,17 @@ static void stop_stack_trace(struct trace_array *tr)
mutex_unlock(&sample_timer_lock); mutex_unlock(&sample_timer_lock);
} }
static void stack_trace_init(struct trace_array *tr) static int stack_trace_init(struct trace_array *tr)
{ {
sysprof_trace = tr; sysprof_trace = tr;
if (tr->ctrl) start_stack_trace(tr);
start_stack_trace(tr); return 0;
} }
static void stack_trace_reset(struct trace_array *tr) static void stack_trace_reset(struct trace_array *tr)
{ {
if (tr->ctrl) stop_stack_trace(tr);
stop_stack_trace(tr);
}
static void stack_trace_ctrl_update(struct trace_array *tr)
{
/* When starting a new trace, reset the buffers */
if (tr->ctrl)
start_stack_trace(tr);
else
stop_stack_trace(tr);
} }
static struct tracer stack_trace __read_mostly = static struct tracer stack_trace __read_mostly =
...@@ -289,7 +279,6 @@ static struct tracer stack_trace __read_mostly = ...@@ -289,7 +279,6 @@ static struct tracer stack_trace __read_mostly =
.name = "sysprof", .name = "sysprof",
.init = stack_trace_init, .init = stack_trace_init,
.reset = stack_trace_reset, .reset = stack_trace_reset,
.ctrl_update = stack_trace_ctrl_update,
#ifdef CONFIG_FTRACE_SELFTEST #ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_sysprof, .selftest = trace_selftest_startup_sysprof,
#endif #endif
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.