Commit a926021c authored by Linus Torvalds


Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (184 commits)
  perf probe: Clean up probe_point_lazy_walker() return value
  tracing: Fix irqoff selftest expanding max buffer
  tracing: Align 4 byte ints together in struct tracer
  tracing: Export trace_set_clr_event()
  tracing: Explain about unstable clock on resume with ring buffer warning
  ftrace/graph: Trace function entry before updating index
  ftrace: Add .ref.text as one of the safe areas to trace
  tracing: Adjust conditional expression latency formatting.
  tracing: Fix event alignment: skb:kfree_skb
  tracing: Fix event alignment: mce:mce_record
  tracing: Fix event alignment: kvm:kvm_hv_hypercall
  tracing: Fix event alignment: module:module_request
  tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup
  tracing: Remove lock_depth from event entry
  perf header: Stop using 'self'
  perf session: Use evlist/evsel for managing perf.data attributes
  perf top: Don't let events to eat up whole header line
  perf top: Fix events overflow in top command
  ring-buffer: Remove unused #include <linux/trace_irq.h>
  tracing: Add an 'overwrite' trace_option.
  ...
@@ -247,6 +247,13 @@ You need very few things to get the syscalls tracing in an arch.
 - Support the TIF_SYSCALL_TRACEPOINT thread flags.
 - Put the trace_sys_enter() and trace_sys_exit() tracepoints calls from ptrace
   in the ptrace syscalls tracing path.
+- If the system call table on this arch is more complicated than a simple array
+  of addresses of the system calls, implement an arch_syscall_addr to return
+  the address of a given system call.
+- If the symbol names of the system calls do not match the function names on
+  this arch, define ARCH_HAS_SYSCALL_MATCH_SYM_NAME in asm/ftrace.h and
+  implement arch_syscall_match_sym_name with the appropriate logic to return
+  true if the function name corresponds with the symbol name.
 - Tag this arch as HAVE_SYSCALL_TRACEPOINTS.
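
As an illustration of the arch_syscall_match_sym_name hook above, a minimal
sketch (not part of this commit; it assumes a hypothetical ABI that prefixes
function entry symbols with a dot -- the fallback used when
ARCH_HAS_SYSCALL_MATCH_SYM_NAME is undefined is a plain strcmp()):

        /* asm/ftrace.h -- illustrative sketch only */
        #define ARCH_HAS_SYSCALL_MATCH_SYM_NAME

        static inline bool arch_syscall_match_sym_name(const char *sym,
                                                       const char *name)
        {
                /* skip the assumed leading '.' decoration on sym */
                return !strcmp(sym + 1, name);
        }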
......
@@ -80,11 +80,11 @@ of ftrace. Here is a list of some of the key files:
        tracers listed here can be configured by
        echoing their name into current_tracer.
-  tracing_enabled:
-       This sets or displays whether the current_tracer
-       is activated and tracing or not. Echo 0 into this
-       file to disable the tracer or 1 to enable it.
+  tracing_on:
+       This sets or displays whether writing to the trace
+       ring buffer is enabled. Echo 0 into this file to disable
+       the tracer or 1 to enable it.
   trace:
@@ -202,10 +202,6 @@ Here is the list of current tracers that may be configured.
        to draw a graph of function calls similar to C code
        source.
-  "sched_switch"
-
-       Traces the context switches and wakeups between tasks.
-
   "irqsoff"
        Traces the areas that disable interrupts and saves
@@ -273,39 +269,6 @@ format, the function name that was traced "path_put" and the
 parent function that called this function "path_walk". The
 timestamp is the time at which the function was entered.
-The sched_switch tracer also includes tracing of task wakeups
-and context switches.
-
-     ksoftirqd/1-7     [01]  1453.070013:      7:115:R   +  2916:115:S
-     ksoftirqd/1-7     [01]  1453.070013:      7:115:R   +    10:115:S
-     ksoftirqd/1-7     [01]  1453.070013:      7:115:R ==>    10:115:R
-        events/1-10    [01]  1453.070013:     10:115:S ==>  2916:115:R
-    kondemand/1-2916   [01]  1453.070013:   2916:115:S ==>     7:115:R
-     ksoftirqd/1-7     [01]  1453.070013:      7:115:S ==>     0:140:R
-
-Wake ups are represented by a "+" and the context switches are
-shown as "==>". The format is:
-
- Context switches:
-
-       Previous task              Next Task
-
-    <pid>:<prio>:<state>  ==>  <pid>:<prio>:<state>
-
- Wake ups:
-
-       Current task               Task waking up
-
-    <pid>:<prio>:<state>    +  <pid>:<prio>:<state>
-
-The prio is the internal kernel priority, which is the inverse
-of the priority that is usually displayed by user-space tools.
-Zero represents the highest priority (99). Prio 100 starts the
-"nice" priorities with 100 being equal to nice -20 and 139 being
-nice 19. The prio "140" is reserved for the idle task which is
-the lowest priority thread (pid 0).

 Latency trace format
 --------------------
@@ -491,78 +454,10 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
              latencies, as described in "Latency
              trace format".
+  overwrite - This controls what happens when the trace buffer is
+              full. If "1" (default), the oldest events are
+              discarded and overwritten. If "0", then the newest
+              events are discarded.
-sched_switch
-------------
-
-This tracer simply records schedule switches. Here is an example
-of how to use it.
-
- # echo sched_switch > current_tracer
- # echo 1 > tracing_enabled
- # sleep 1
- # echo 0 > tracing_enabled
- # cat trace
-# tracer: sched_switch
-#
-#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
-#              | |      |          |         |
-        bash-3997  [01]   240.132281:   3997:120:R   +  4055:120:R
-        bash-3997  [01]   240.132284:   3997:120:R ==>  4055:120:R
-       sleep-4055  [01]   240.132371:   4055:120:S ==>  3997:120:R
-        bash-3997  [01]   240.132454:   3997:120:R   +  4055:120:S
-        bash-3997  [01]   240.132457:   3997:120:R ==>  4055:120:R
-       sleep-4055  [01]   240.132460:   4055:120:D ==>  3997:120:R
-        bash-3997  [01]   240.132463:   3997:120:R   +  4055:120:D
-        bash-3997  [01]   240.132465:   3997:120:R ==>  4055:120:R
-      <idle>-0     [00]   240.132589:      0:140:R   +     4:115:S
-      <idle>-0     [00]   240.132591:      0:140:R ==>     4:115:R
- ksoftirqd/0-4     [00]   240.132595:      4:115:S ==>     0:140:R
-      <idle>-0     [00]   240.132598:      0:140:R   +     4:115:S
-      <idle>-0     [00]   240.132599:      0:140:R ==>     4:115:R
- ksoftirqd/0-4     [00]   240.132603:      4:115:S ==>     0:140:R
-       sleep-4055  [01]   240.133058:   4055:120:S ==>  3997:120:R
- [...]
-
-As we have discussed previously about this format, the header
-shows the name of the trace and points to the options. The
-"FUNCTION" is a misnomer since here it represents the wake ups
-and context switches.
-
-The sched_switch file only lists the wake ups (represented with
-'+') and context switches ('==>') with the previous task or
-current task first followed by the next task or task waking up.
-The format for both of these is PID:KERNEL-PRIO:TASK-STATE.
-Remember that the KERNEL-PRIO is the inverse of the actual
-priority with zero (0) being the highest priority and the nice
-values starting at 100 (nice -20). Below is a quick chart to map
-the kernel priority to user land priorities.
-
-   Kernel Space                     User Space
- ===============================================================
-   0(high) to  98(low)     user RT priority 99(high) to 1(low)
-                           with SCHED_RR or SCHED_FIFO
- ---------------------------------------------------------------
-  99                       sched_priority is not used in scheduling
-                           decisions(it must be specified as 0)
- ---------------------------------------------------------------
- 100(high) to 139(low)     user nice -20(high) to 19(low)
- ---------------------------------------------------------------
- 140                       idle task priority
- ---------------------------------------------------------------
-
-The task states are:
-
- R - running : wants to run, may not actually be running
- S - sleep : process is waiting to be woken up (handles signals)
- D - disk sleep (uninterruptible sleep) : process must be woken up
-     (ignores signals)
- T - stopped : process suspended
- t - traced : process is being traced (with something like gdb)
- Z - zombie : process waiting to be cleaned up
- X - unknown

 ftrace_enabled
 --------------
@@ -607,10 +502,10 @@ an example:
 # echo irqsoff > current_tracer
 # echo latency-format > trace_options
 # echo 0 > tracing_max_latency
-# echo 1 > tracing_enabled
+# echo 1 > tracing_on
 # ls -ltr
 [...]
-# echo 0 > tracing_enabled
+# echo 0 > tracing_on
 # cat trace
 # tracer: irqsoff
 #
@@ -715,10 +610,10 @@ is much like the irqsoff tracer.
 # echo preemptoff > current_tracer
 # echo latency-format > trace_options
 # echo 0 > tracing_max_latency
-# echo 1 > tracing_enabled
+# echo 1 > tracing_on
 # ls -ltr
 [...]
-# echo 0 > tracing_enabled
+# echo 0 > tracing_on
 # cat trace
 # tracer: preemptoff
 #
@@ -863,10 +758,10 @@ tracers.
 # echo preemptirqsoff > current_tracer
 # echo latency-format > trace_options
 # echo 0 > tracing_max_latency
-# echo 1 > tracing_enabled
+# echo 1 > tracing_on
 # ls -ltr
 [...]
-# echo 0 > tracing_enabled
+# echo 0 > tracing_on
 # cat trace
 # tracer: preemptirqsoff
 #
@@ -1026,9 +921,9 @@ Instead of performing an 'ls', we will run 'sleep 1' under
 # echo wakeup > current_tracer
 # echo latency-format > trace_options
 # echo 0 > tracing_max_latency
-# echo 1 > tracing_enabled
+# echo 1 > tracing_on
 # chrt -f 5 sleep 1
-# echo 0 > tracing_enabled
+# echo 0 > tracing_on
 # cat trace
 # tracer: wakeup
 #
@@ -1140,9 +1035,9 @@ ftrace_enabled is set; otherwise this tracer is a nop.
 # sysctl kernel.ftrace_enabled=1
 # echo function > current_tracer
-# echo 1 > tracing_enabled
+# echo 1 > tracing_on
 # usleep 1
-# echo 0 > tracing_enabled
+# echo 0 > tracing_on
 # cat trace
 # tracer: function
 #
@@ -1180,7 +1075,7 @@ int trace_fd;
 [...]
 int main(int argc, char *argv[]) {
        [...]
-       trace_fd = open(tracing_file("tracing_enabled"), O_WRONLY);
+       trace_fd = open(tracing_file("tracing_on"), O_WRONLY);
        [...]
        if (condition_hit()) {
                write(trace_fd, "0", 1);
@@ -1631,9 +1526,9 @@ If I am only interested in sys_nanosleep and hrtimer_interrupt:
 # echo sys_nanosleep hrtimer_interrupt \
                > set_ftrace_filter
 # echo function > current_tracer
-# echo 1 > tracing_enabled
+# echo 1 > tracing_on
 # usleep 1
-# echo 0 > tracing_enabled
+# echo 0 > tracing_on
 # cat trace
 # tracer: ftrace
 #
@@ -1879,9 +1774,9 @@ different. The trace is live.
 # echo function > current_tracer
 # cat trace_pipe > /tmp/trace.out &
 [1] 4153
-# echo 1 > tracing_enabled
+# echo 1 > tracing_on
 # usleep 1
-# echo 0 > tracing_enabled
+# echo 0 > tracing_on
 # cat trace
 # tracer: function
 #
......
@@ -42,11 +42,25 @@ Synopsis of kprobe_events
  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
  NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
  FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
-                (u8/u16/u32/u64/s8/s16/s32/s64) and string are supported.
+                (u8/u16/u32/u64/s8/s16/s32/s64), "string" and bitfield
+                are supported.

  (*) only for return probe.
  (**) this is useful for fetching a field of data structures.

+Types
+-----
+Several types are supported for fetch-args. Kprobe tracer will access memory
+by given type. Prefix 's' and 'u' means those types are signed and unsigned
+respectively. Traced arguments are shown in decimal (signed) or hex (unsigned).
+String type is a special type, which fetches a "null-terminated" string from
+kernel space. This means it will fail and store NULL if the string container
+has been paged out.
+Bitfield is another special type, which takes 3 parameters, bit-width, bit-
+offset, and container-size (usually 32). The syntax is;
+
+ b<bit-width>@<bit-offset>/<container-size>
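
For illustration, the value of a bitfield fetch-arg can be modeled as follows
(a user-space sketch, not kernel code, assuming the bit-offset counts from the
container's least significant bit):

        #include <stdint.h>

        /* value of b<width>@<offset>/<csize>, given the container word
         * that was read from the probed address */
        static uint64_t fetch_bitfield(uint64_t container,
                                       unsigned int width, unsigned int offset)
        {
                return (container >> offset) & ((1ULL << width) - 1);
        }

        /* e.g. b4@4/32 -> (container >> 4) & 0xf */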
 Per-Probe Event Filtering
 -------------------------
......
@@ -25,6 +25,8 @@
 #define sysretl_audit ia32_ret_from_sys_call
 #endif

+        .section .entry.text, "ax"
+
 #define IA32_NR_syscalls ((ia32_syscall_end - ia32_sys_call_table)/8)

        .macro IA32_ARG_FIXUP noebp=0
......
@@ -160,6 +160,7 @@
 #define X86_FEATURE_NODEID_MSR (6*32+19) /* NodeId MSR */
 #define X86_FEATURE_TBM        (6*32+21) /* trailing bit manipulations */
 #define X86_FEATURE_TOPOEXT    (6*32+22) /* topology extensions CPUID leafs */
+#define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */

 /*
  * Auxiliary flags: Linux defined - For features scattered in various
@@ -279,6 +280,7 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_xsave          boot_cpu_has(X86_FEATURE_XSAVE)
 #define cpu_has_hypervisor     boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq      boot_cpu_has(X86_FEATURE_PCLMULQDQ)
+#define cpu_has_perfctr_core   boot_cpu_has(X86_FEATURE_PERFCTR_CORE)

 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg        1
......
@@ -13,7 +13,6 @@ enum die_val {
        DIE_PANIC,
        DIE_NMI,
        DIE_DIE,
-       DIE_NMIWATCHDOG,
        DIE_KERNELDEBUG,
        DIE_TRAP,
        DIE_GPF,
......
@@ -52,6 +52,9 @@
 #define MSR_IA32_MCG_STATUS            0x0000017a
 #define MSR_IA32_MCG_CTL               0x0000017b

+#define MSR_OFFCORE_RSP_0              0x000001a6
+#define MSR_OFFCORE_RSP_1              0x000001a7
+
 #define MSR_IA32_PEBS_ENABLE           0x000003f1
 #define MSR_IA32_DS_AREA               0x00000600
 #define MSR_IA32_PERF_CAPABILITIES     0x00000345
......
@@ -7,7 +7,6 @@
 #ifdef CONFIG_X86_LOCAL_APIC

-extern void die_nmi(char *str, struct pt_regs *regs, int do_panic);
 extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
 extern int reserve_perfctr_nmi(unsigned int);
 extern void release_perfctr_nmi(unsigned int);
......
@@ -17,10 +17,20 @@
 #endif
 #include <asm/thread_info.h>
 #include <asm/cpumask.h>
+#include <asm/cpufeature.h>

 extern int smp_num_siblings;
 extern unsigned int num_processors;

+static inline bool cpu_has_ht_siblings(void)
+{
+        bool has_siblings = false;
+#ifdef CONFIG_SMP
+        has_siblings = cpu_has_ht && smp_num_siblings > 1;
+#endif
+        return has_siblings;
+}
+
 DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_map);
 DECLARE_PER_CPU(u16, cpu_llc_id);
......
@@ -30,6 +30,7 @@
 #include <asm/stacktrace.h>
 #include <asm/nmi.h>
 #include <asm/compat.h>
+#include <asm/smp.h>

 #if 0
 #undef wrmsrl
@@ -93,6 +94,8 @@ struct amd_nb {
        struct event_constraint event_constraints[X86_PMC_IDX_MAX];
 };

+struct intel_percore;
+
 #define MAX_LBR_ENTRIES 16

 struct cpu_hw_events {
@@ -127,6 +130,13 @@ struct cpu_hw_events {
        struct perf_branch_stack        lbr_stack;
        struct perf_branch_entry        lbr_entries[MAX_LBR_ENTRIES];

+       /*
+        * Intel percore register state.
+        * Coordinate shared resources between HT threads.
+        */
+       int                             percore_used; /* Used by this CPU? */
+       struct intel_percore            *per_core;
+
        /*
         * AMD specific bits
         */
@@ -166,8 +176,10 @@ struct cpu_hw_events {
 /*
  * Constraint on the Event code + UMask
  */
-#define PEBS_EVENT_CONSTRAINT(c, n)    \
+#define INTEL_UEVENT_CONSTRAINT(c, n)  \
        EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK)
+#define PEBS_EVENT_CONSTRAINT(c, n)    \
+       INTEL_UEVENT_CONSTRAINT(c, n)

 #define EVENT_CONSTRAINT_END           \
        EVENT_CONSTRAINT(0, 0, 0)
@@ -175,6 +187,28 @@ struct cpu_hw_events {
 #define for_each_event_constraint(e, c)        \
        for ((e) = (c); (e)->weight; (e)++)

+/*
+ * Extra registers for specific events.
+ * Some events need large masks and require external MSRs.
+ * Define a mapping to these extra registers.
+ */
+struct extra_reg {
+       unsigned int    event;
+       unsigned int    msr;
+       u64             config_mask;
+       u64             valid_mask;
+};
+
+#define EVENT_EXTRA_REG(e, ms, m, vm) {        \
+       .event = (e),           \
+       .msr = (ms),            \
+       .config_mask = (m),     \
+       .valid_mask = (vm),     \
+       }
+#define INTEL_EVENT_EXTRA_REG(event, msr, vm)  \
+       EVENT_EXTRA_REG(event, msr, ARCH_PERFMON_EVENTSEL_EVENT, vm)
+#define EVENT_EXTRA_END EVENT_EXTRA_REG(0, 0, 0, 0)
+
 union perf_capabilities {
        struct {
                u64     lbr_format : 6;
@@ -219,6 +253,7 @@ struct x86_pmu {
        void            (*put_event_constraints)(struct cpu_hw_events *cpuc,
                                                 struct perf_event *event);
        struct event_constraint *event_constraints;
+       struct event_constraint *percore_constraints;
        void            (*quirks)(void);
        int             perfctr_second_write;
@@ -247,6 +282,11 @@ struct x86_pmu {
         */
        unsigned long   lbr_tos, lbr_from, lbr_to; /* MSR base regs */
        int             lbr_nr;                    /* hardware stack size */
+
+       /*
+        * Extra registers for events
+        */
+       struct extra_reg *extra_regs;
 };

 static struct x86_pmu x86_pmu __read_mostly;
@@ -271,6 +311,10 @@ static u64 __read_mostly hw_cache_event_ids
                                [PERF_COUNT_HW_CACHE_MAX]
                                [PERF_COUNT_HW_CACHE_OP_MAX]
                                [PERF_COUNT_HW_CACHE_RESULT_MAX];
+static u64 __read_mostly hw_cache_extra_regs
+                               [PERF_COUNT_HW_CACHE_MAX]
+                               [PERF_COUNT_HW_CACHE_OP_MAX]
+                               [PERF_COUNT_HW_CACHE_RESULT_MAX];

 /*
  * Propagate event elapsed time into the generic event.
@@ -298,7 +342,7 @@ x86_perf_event_update(struct perf_event *event)
         */
 again:
        prev_raw_count = local64_read(&hwc->prev_count);
-       rdmsrl(hwc->event_base + idx, new_raw_count);
+       rdmsrl(hwc->event_base, new_raw_count);

        if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
                                        new_raw_count) != prev_raw_count)
@@ -321,6 +365,49 @@ x86_perf_event_update(struct perf_event *event)
        return new_raw_count;
 }
+/* using X86_FEATURE_PERFCTR_CORE to later implement ALTERNATIVE() here */
+static inline int x86_pmu_addr_offset(int index)
+{
+       if (boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
+               return index << 1;
+       return index;
+}
+
+static inline unsigned int x86_pmu_config_addr(int index)
+{
+       return x86_pmu.eventsel + x86_pmu_addr_offset(index);
+}
+
+static inline unsigned int x86_pmu_event_addr(int index)
+{
+       return x86_pmu.perfctr + x86_pmu_addr_offset(index);
+}
+/*
+ * Find and validate any extra registers to set up.
+ */
+static int x86_pmu_extra_regs(u64 config, struct perf_event *event)
+{
+       struct extra_reg *er;
+
+       event->hw.extra_reg = 0;
+       event->hw.extra_config = 0;
+
+       if (!x86_pmu.extra_regs)
+               return 0;
+
+       for (er = x86_pmu.extra_regs; er->msr; er++) {
+               if (er->event != (config & er->config_mask))
+                       continue;
+               if (event->attr.config1 & ~er->valid_mask)
+                       return -EINVAL;
+               event->hw.extra_reg = er->msr;
+               event->hw.extra_config = event->attr.config1;
+               break;
+       }
+       return 0;
+}
 static atomic_t active_events;
 static DEFINE_MUTEX(pmc_reserve_mutex);
@@ -331,12 +418,12 @@ static bool reserve_pmc_hardware(void)
        int i;

        for (i = 0; i < x86_pmu.num_counters; i++) {
-               if (!reserve_perfctr_nmi(x86_pmu.perfctr + i))
+               if (!reserve_perfctr_nmi(x86_pmu_event_addr(i)))
                        goto perfctr_fail;
        }

        for (i = 0; i < x86_pmu.num_counters; i++) {
-               if (!reserve_evntsel_nmi(x86_pmu.eventsel + i))
+               if (!reserve_evntsel_nmi(x86_pmu_config_addr(i)))
                        goto eventsel_fail;
        }
@@ -344,13 +431,13 @@ static bool reserve_pmc_hardware(void)

 eventsel_fail:
        for (i--; i >= 0; i--)
-               release_evntsel_nmi(x86_pmu.eventsel + i);
+               release_evntsel_nmi(x86_pmu_config_addr(i));

        i = x86_pmu.num_counters;

 perfctr_fail:
        for (i--; i >= 0; i--)
-               release_perfctr_nmi(x86_pmu.perfctr + i);
+               release_perfctr_nmi(x86_pmu_event_addr(i));

        return false;
 }
@@ -360,8 +447,8 @@ static void release_pmc_hardware(void)
        int i;

        for (i = 0; i < x86_pmu.num_counters; i++) {
-               release_perfctr_nmi(x86_pmu.perfctr + i);
-               release_evntsel_nmi(x86_pmu.eventsel + i);
+               release_perfctr_nmi(x86_pmu_event_addr(i));
+               release_evntsel_nmi(x86_pmu_config_addr(i));
        }
 }
@@ -382,7 +469,7 @@ static bool check_hw_exists(void)
         * complain and bail.
         */
        for (i = 0; i < x86_pmu.num_counters; i++) {
-               reg = x86_pmu.eventsel + i;
+               reg = x86_pmu_config_addr(i);
                ret = rdmsrl_safe(reg, &val);
                if (ret)
                        goto msr_fail;
@@ -407,8 +494,8 @@ static bool check_hw_exists(void)
         * that don't trap on the MSR access and always return 0s.
         */
        val = 0xabcdUL;
-       ret = checking_wrmsrl(x86_pmu.perfctr, val);
-       ret |= rdmsrl_safe(x86_pmu.perfctr, &val_new);
+       ret = checking_wrmsrl(x86_pmu_event_addr(0), val);
+       ret |= rdmsrl_safe(x86_pmu_event_addr(0), &val_new);
        if (ret || val != val_new)
                goto msr_fail;
@@ -442,8 +529,9 @@ static inline int x86_pmu_initialized(void)
 }

 static inline int
-set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event_attr *attr)
+set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event *event)
 {
+       struct perf_event_attr *attr = &event->attr;
        unsigned int cache_type, cache_op, cache_result;
        u64 config, val;
@@ -470,8 +558,8 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event *event)
                return -EINVAL;

        hwc->config |= val;
-
-       return 0;
+       attr->config1 = hw_cache_extra_regs[cache_type][cache_op][cache_result];
+       return x86_pmu_extra_regs(val, event);
 }

 static int x86_setup_perfctr(struct perf_event *event)
@@ -496,10 +584,10 @@ static int x86_setup_perfctr(struct perf_event *event)
        }

        if (attr->type == PERF_TYPE_RAW)
-               return 0;
+               return x86_pmu_extra_regs(event->attr.config, event);

        if (attr->type == PERF_TYPE_HW_CACHE)
-               return set_ext_hw_attr(hwc, attr);
+               return set_ext_hw_attr(hwc, event);

        if (attr->config >= x86_pmu.max_events)
                return -EINVAL;
@@ -617,11 +705,11 @@ static void x86_pmu_disable_all(void)
                if (!test_bit(idx, cpuc->active_mask))
                        continue;
-               rdmsrl(x86_pmu.eventsel + idx, val);
+               rdmsrl(x86_pmu_config_addr(idx), val);
                if (!(val & ARCH_PERFMON_EVENTSEL_ENABLE))
                        continue;
                val &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
-               wrmsrl(x86_pmu.eventsel + idx, val);
+               wrmsrl(x86_pmu_config_addr(idx), val);
        }
 }
@@ -642,21 +730,26 @@ static void x86_pmu_disable(struct pmu *pmu)
        x86_pmu.disable_all();
 }

+static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
+                                         u64 enable_mask)
+{
+       if (hwc->extra_reg)
+               wrmsrl(hwc->extra_reg, hwc->extra_config);
+       wrmsrl(hwc->config_base, hwc->config | enable_mask);
+}
+
 static void x86_pmu_enable_all(int added)
 {
        struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
        int idx;

        for (idx = 0; idx < x86_pmu.num_counters; idx++) {
-               struct perf_event *event = cpuc->events[idx];
-               u64 val;
+               struct hw_perf_event *hwc = &cpuc->events[idx]->hw;

                if (!test_bit(idx, cpuc->active_mask))
                        continue;

-               val = event->hw.config;
-               val |= ARCH_PERFMON_EVENTSEL_ENABLE;
-               wrmsrl(x86_pmu.eventsel + idx, val);
+               __x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
        }
 }
@@ -821,15 +914,10 @@ static inline void x86_assign_hw_event(struct perf_event *event,
                hwc->event_base = 0;
        } else if (hwc->idx >= X86_PMC_IDX_FIXED) {
                hwc->config_base = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
-               /*
-                * We set it so that event_base + idx in wrmsr/rdmsr maps to
-                * MSR_ARCH_PERFMON_FIXED_CTR0 ... CTR2:
-                */
-               hwc->event_base =
-                       MSR_ARCH_PERFMON_FIXED_CTR0 - X86_PMC_IDX_FIXED;
+               hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0;
        } else {
-               hwc->config_base = x86_pmu.eventsel;
-               hwc->event_base  = x86_pmu.perfctr;
+               hwc->config_base = x86_pmu_config_addr(hwc->idx);
+               hwc->event_base  = x86_pmu_event_addr(hwc->idx);
        }
 }
@@ -915,17 +1003,11 @@ static void x86_pmu_enable(struct pmu *pmu)
        x86_pmu.enable_all(added);
 }

-static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
-                                         u64 enable_mask)
-{
-       wrmsrl(hwc->config_base + hwc->idx, hwc->config | enable_mask);
-}
-
 static inline void x86_pmu_disable_event(struct perf_event *event)
 {
        struct hw_perf_event *hwc = &event->hw;
-       wrmsrl(hwc->config_base + hwc->idx, hwc->config);
+       wrmsrl(hwc->config_base, hwc->config);
 }

 static DEFINE_PER_CPU(u64 [X86_PMC_IDX_MAX], pmc_prev_left);
@@ -978,7 +1060,7 @@ x86_perf_event_set_period(struct perf_event *event)
         */
        local64_set(&hwc->prev_count, (u64)-left);

-       wrmsrl(hwc->event_base + idx, (u64)(-left) & x86_pmu.cntval_mask);
+       wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);

        /*
         * Due to erratum on certan cpu we need
@@ -986,7 +1068,7 @@ x86_perf_event_set_period(struct perf_event *event)
         * is updated properly
         */
        if (x86_pmu.perfctr_second_write) {
-               wrmsrl(hwc->event_base + idx,
+               wrmsrl(hwc->event_base,
                        (u64)(-left) & x86_pmu.cntval_mask);
        }
@@ -1113,8 +1195,8 @@ void perf_event_print_debug(void)
        pr_info("CPU#%d: active:     %016llx\n", cpu, *(u64 *)cpuc->active_mask);

        for (idx = 0; idx < x86_pmu.num_counters; idx++) {
-               rdmsrl(x86_pmu.eventsel + idx, pmc_ctrl);
-               rdmsrl(x86_pmu.perfctr  + idx, pmc_count);
+               rdmsrl(x86_pmu_config_addr(idx), pmc_ctrl);
+               rdmsrl(x86_pmu_event_addr(idx), pmc_count);

                prev_left = per_cpu(pmc_prev_left[idx], cpu);
@@ -1389,7 +1471,7 @@ static void __init pmu_check_apic(void)
        pr_info("no hardware sampling interrupt available.\n");
 }

-int __init init_hw_perf_events(void)
+static int __init init_hw_perf_events(void)
 {
        struct event_constraint *c;
        int err;
@@ -1608,7 +1690,7 @@ static int validate_group(struct perf_event *event)
        return ret;
 }

-int x86_pmu_event_init(struct perf_event *event)
+static int x86_pmu_event_init(struct perf_event *event)
 {
        struct pmu *tmp;
        int err;
......
@@ -127,6 +127,11 @@ static int amd_pmu_hw_config(struct perf_event *event)
 /*
  * AMD64 events are detected based on their event codes.
  */
+static inline unsigned int amd_get_event_code(struct hw_perf_event *hwc)
+{
+       return ((hwc->config >> 24) & 0x0f00) | (hwc->config & 0x00ff);
+}
+
 static inline int amd_is_nb_event(struct hw_perf_event *hwc)
 {
        return (hwc->config & 0xe0) == 0xe0;
@@ -385,13 +390,181 @@ static __initconst const struct x86_pmu amd_pmu = {
        .cpu_dead               = amd_pmu_cpu_dead,
 };
+/* AMD Family 15h */
+#define AMD_EVENT_TYPE_MASK    0x000000F0ULL
+
+#define AMD_EVENT_FP           0x00000000ULL ... 0x00000010ULL
+#define AMD_EVENT_LS           0x00000020ULL ... 0x00000030ULL
+#define AMD_EVENT_DC           0x00000040ULL ... 0x00000050ULL
+#define AMD_EVENT_CU           0x00000060ULL ... 0x00000070ULL
+#define AMD_EVENT_IC_DE                0x00000080ULL ... 0x00000090ULL
+#define AMD_EVENT_EX_LS                0x000000C0ULL
+#define AMD_EVENT_DE           0x000000D0ULL
+#define AMD_EVENT_NB           0x000000E0ULL ... 0x000000F0ULL
+/*
+ * AMD family 15h event code/PMC mappings:
+ *
+ * type = event_code & 0x0F0:
+ *
+ * 0x000       FP      PERF_CTL[5:3]
+ * 0x010       FP      PERF_CTL[5:3]
+ * 0x020       LS      PERF_CTL[5:0]
+ * 0x030       LS      PERF_CTL[5:0]
+ * 0x040       DC      PERF_CTL[5:0]
+ * 0x050       DC      PERF_CTL[5:0]
+ * 0x060       CU      PERF_CTL[2:0]
+ * 0x070       CU      PERF_CTL[2:0]
+ * 0x080       IC/DE   PERF_CTL[2:0]
+ * 0x090       IC/DE   PERF_CTL[2:0]
+ * 0x0A0       ---
+ * 0x0B0       ---
+ * 0x0C0       EX/LS   PERF_CTL[5:0]
+ * 0x0D0       DE      PERF_CTL[2:0]
+ * 0x0E0       NB      NB_PERF_CTL[3:0]
+ * 0x0F0       NB      NB_PERF_CTL[3:0]
+ *
+ * Exceptions:
+ *
+ * 0x003       FP      PERF_CTL[3]
+ * 0x00B       FP      PERF_CTL[3]
+ * 0x00D       FP      PERF_CTL[3]
+ * 0x023       DE      PERF_CTL[2:0]
+ * 0x02D       LS      PERF_CTL[3]
+ * 0x02E       LS      PERF_CTL[3,0]
+ * 0x043       CU      PERF_CTL[2:0]
+ * 0x045       CU      PERF_CTL[2:0]
+ * 0x046       CU      PERF_CTL[2:0]
+ * 0x054       CU      PERF_CTL[2:0]
+ * 0x055       CU      PERF_CTL[2:0]
+ * 0x08F       IC      PERF_CTL[0]
+ * 0x187       DE      PERF_CTL[0]
+ * 0x188       DE      PERF_CTL[0]
+ * 0x0DB       EX      PERF_CTL[5:0]
+ * 0x0DC       LS      PERF_CTL[5:0]
+ * 0x0DD       LS      PERF_CTL[5:0]
+ * 0x0DE       LS      PERF_CTL[5:0]
+ * 0x0DF       LS      PERF_CTL[5:0]
+ * 0x1D6       EX      PERF_CTL[5:0]
+ * 0x1D8       EX      PERF_CTL[5:0]
+ */
+static struct event_constraint amd_f15_PMC0  = EVENT_CONSTRAINT(0, 0x01, 0);
+static struct event_constraint amd_f15_PMC20 = EVENT_CONSTRAINT(0, 0x07, 0);
+static struct event_constraint amd_f15_PMC3  = EVENT_CONSTRAINT(0, 0x08, 0);
+static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT(0, 0x09, 0);
+static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
+static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
+
+static struct event_constraint *
+amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+       unsigned int event_code = amd_get_event_code(&event->hw);
+
+       switch (event_code & AMD_EVENT_TYPE_MASK) {
+       case AMD_EVENT_FP:
+               switch (event_code) {
+               case 0x003:
+               case 0x00B:
+               case 0x00D:
+                       return &amd_f15_PMC3;
+               default:
+                       return &amd_f15_PMC53;
+               }
+       case AMD_EVENT_LS:
+       case AMD_EVENT_DC:
+       case AMD_EVENT_EX_LS:
+               switch (event_code) {
+               case 0x023:
+               case 0x043:
+               case 0x045:
+               case 0x046:
+               case 0x054:
+               case 0x055:
+                       return &amd_f15_PMC20;
+               case 0x02D:
+                       return &amd_f15_PMC3;
+               case 0x02E:
+                       return &amd_f15_PMC30;
+               default:
+                       return &amd_f15_PMC50;
+               }
+       case AMD_EVENT_CU:
+       case AMD_EVENT_IC_DE:
+       case AMD_EVENT_DE:
+               switch (event_code) {
+               case 0x08F:
+               case 0x187:
+               case 0x188:
+                       return &amd_f15_PMC0;
+               case 0x0DB ... 0x0DF:
+               case 0x1D6:
+               case 0x1D8:
+                       return &amd_f15_PMC50;
+               default:
+                       return &amd_f15_PMC20;
+               }
+       case AMD_EVENT_NB:
+               /* not yet implemented */
+               return &emptyconstraint;
+       default:
+               return &emptyconstraint;
+       }
+}
+
+static __initconst const struct x86_pmu amd_pmu_f15h = {
+       .name                   = "AMD Family 15h",
+       .handle_irq             = x86_pmu_handle_irq,
+       .disable_all            = x86_pmu_disable_all,
+       .enable_all             = x86_pmu_enable_all,
+       .enable                 = x86_pmu_enable_event,
+       .disable                = x86_pmu_disable_event,
+       .hw_config              = amd_pmu_hw_config,
+       .schedule_events        = x86_schedule_events,
+       .eventsel               = MSR_F15H_PERF_CTL,
+       .perfctr                = MSR_F15H_PERF_CTR,
+       .event_map              = amd_pmu_event_map,
+       .max_events             = ARRAY_SIZE(amd_perfmon_event_map),
+       .num_counters           = 6,
+       .cntval_bits            = 48,
+       .cntval_mask            = (1ULL << 48) - 1,
+       .apic                   = 1,
+       /* use highest bit to detect overflow */
+       .max_period             = (1ULL << 47) - 1,
+       .get_event_constraints  = amd_get_event_constraints_f15h,
+       /* northbridge counters not yet implemented: */
+#if 0
+       .put_event_constraints  = amd_put_event_constraints,
+
+       .cpu_prepare            = amd_pmu_cpu_prepare,
+       .cpu_starting           = amd_pmu_cpu_starting,
+       .cpu_dead               = amd_pmu_cpu_dead,
+#endif
+};
 static __init int amd_pmu_init(void)
 {
        /* Performance-monitoring supported from K7 and later: */
        if (boot_cpu_data.x86 < 6)
                return -ENODEV;

-       x86_pmu = amd_pmu;
+       /*
+        * If core performance counter extensions exists, it must be
+        * family 15h, otherwise fail. See x86_pmu_addr_offset().
+        */
+       switch (boot_cpu_data.x86) {
+       case 0x15:
+               if (!cpu_has_perfctr_core)
+                       return -ENODEV;
+               x86_pmu = amd_pmu_f15h;
+               break;
+       default:
+               if (cpu_has_perfctr_core)
+                       return -ENODEV;
+               x86_pmu = amd_pmu;
+               break;
+       }
        /* Events are common for all AMDs */
        memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
......
 #ifdef CONFIG_CPU_SUP_INTEL
+#define MAX_EXTRA_REGS 2
+
+/*
+ * Per register state.
+ */
+struct er_account {
+       int                     ref;            /* reference count */
+       unsigned int            extra_reg;      /* extra MSR number */
+       u64                     extra_config;   /* extra MSR config */
+};
+
+/*
+ * Per core state
+ * This used to coordinate shared registers for HT threads.
+ */
+struct intel_percore {
+       raw_spinlock_t          lock;           /* protect structure */
+       struct er_account       regs[MAX_EXTRA_REGS];
+       int                     refcnt;         /* number of threads */
+       unsigned                core_id;
+};
 /*
  * Intel PerfMon, used on Core and later.
  */
@@ -64,6 +86,18 @@ static struct event_constraint intel_nehalem_event_constraints[] =
        EVENT_CONSTRAINT_END
 };
+static struct extra_reg intel_nehalem_extra_regs[] =
+{
+       INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0xffff),
+       EVENT_EXTRA_END
+};
+
+static struct event_constraint intel_nehalem_percore_constraints[] =
+{
+       INTEL_EVENT_CONSTRAINT(0xb7, 0),
+       EVENT_CONSTRAINT_END
+};
 static struct event_constraint intel_westmere_event_constraints[] =
 {
        FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
@@ -76,6 +110,33 @@ static struct event_constraint intel_westmere_event_constraints[] =
        EVENT_CONSTRAINT_END
 };
+static struct event_constraint intel_snb_event_constraints[] =
+{
+       FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
+       FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
+       /* FIXED_EVENT_CONSTRAINT(0x013c, 2), CPU_CLK_UNHALTED.REF */
+       INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.PENDING */
+       INTEL_EVENT_CONSTRAINT(0xb7, 0x1), /* OFF_CORE_RESPONSE_0 */
+       INTEL_EVENT_CONSTRAINT(0xbb, 0x8), /* OFF_CORE_RESPONSE_1 */
+       INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
+       INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.LOAD_LATENCY */
+       EVENT_CONSTRAINT_END
+};
+
+static struct extra_reg intel_westmere_extra_regs[] =
+{
+       INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0xffff),
+       INTEL_EVENT_EXTRA_REG(0xbb, MSR_OFFCORE_RSP_1, 0xffff),
+       EVENT_EXTRA_END
+};
+
+static struct event_constraint intel_westmere_percore_constraints[] =
+{
+       INTEL_EVENT_CONSTRAINT(0xb7, 0),
+       INTEL_EVENT_CONSTRAINT(0xbb, 0),
+       EVENT_CONSTRAINT_END
+};
 static struct event_constraint intel_gen_event_constraints[] =
 {
        FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
@@ -89,6 +150,106 @@ static u64 intel_pmu_event_map(int hw_event)
        return intel_perfmon_event_map[hw_event];
 }
+static __initconst const u64 snb_hw_cache_event_ids
+                               [PERF_COUNT_HW_CACHE_MAX]
+                               [PERF_COUNT_HW_CACHE_OP_MAX]
+                               [PERF_COUNT_HW_CACHE_RESULT_MAX] =
+{
+ [ C(L1D) ] = {
+       [ C(OP_READ) ] = {
+               [ C(RESULT_ACCESS) ] = 0xf1d0, /* MEM_UOP_RETIRED.LOADS */
+               [ C(RESULT_MISS)   ] = 0x0151, /* L1D.REPLACEMENT */
+       },
+       [ C(OP_WRITE) ] = {
+               [ C(RESULT_ACCESS) ] = 0xf2d0, /* MEM_UOP_RETIRED.STORES */
+               [ C(RESULT_MISS)   ] = 0x0851, /* L1D.ALL_M_REPLACEMENT */
+       },
+       [ C(OP_PREFETCH) ] = {
+               [ C(RESULT_ACCESS) ] = 0x0,
+               [ C(RESULT_MISS)   ] = 0x024e, /* HW_PRE_REQ.DL1_MISS */
+       },
+ },
+ [ C(L1I ) ] = {
+       [ C(OP_READ) ] = {
+               [ C(RESULT_ACCESS) ] = 0x0,
+               [ C(RESULT_MISS)   ] = 0x0280, /* ICACHE.MISSES */
+       },
+       [ C(OP_WRITE) ] = {
+               [ C(RESULT_ACCESS) ] = -1,
+               [ C(RESULT_MISS)   ] = -1,
+       },
+       [ C(OP_PREFETCH) ] = {
+               [ C(RESULT_ACCESS) ] = 0x0,
+               [ C(RESULT_MISS)   ] = 0x0,
+       },
+ },
+ [ C(LL  ) ] = {
+       /*
+        * TBD: Need Off-core Response Performance Monitoring support
+        */
+       [ C(OP_READ) ] = {
+               /* OFFCORE_RESPONSE_0.ANY_DATA.LOCAL_CACHE */
+               [ C(RESULT_ACCESS) ] = 0x01b7,
+               /* OFFCORE_RESPONSE_1.ANY_DATA.ANY_LLC_MISS */
+               [ C(RESULT_MISS)   ] = 0x01bb,
+       },
+       [ C(OP_WRITE) ] = {
+               /* OFFCORE_RESPONSE_0.ANY_RFO.LOCAL_CACHE */
+               [ C(RESULT_ACCESS) ] = 0x01b7,
+               /* OFFCORE_RESPONSE_1.ANY_RFO.ANY_LLC_MISS */
+               [ C(RESULT_MISS)   ] = 0x01bb,
+       },
+       [ C(OP_PREFETCH) ] = {
+               /* OFFCORE_RESPONSE_0.PREFETCH.LOCAL_CACHE */
+               [ C(RESULT_ACCESS) ] = 0x01b7,
+               /* OFFCORE_RESPONSE_1.PREFETCH.ANY_LLC_MISS */
+               [ C(RESULT_MISS)   ] = 0x01bb,
+       },
+ },
+ [ C(DTLB) ] = {
+       [ C(OP_READ) ] = {
+               [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_UOP_RETIRED.ALL_LOADS */
+               [ C(RESULT_MISS)   ] = 0x0108, /* DTLB_LOAD_MISSES.CAUSES_A_WALK */
+       },
+       [ C(OP_WRITE) ] = {
+               [ C(RESULT_ACCESS) ] = 0x82d0, /* MEM_UOP_RETIRED.ALL_STORES */
+               [ C(RESULT_MISS)   ] = 0x0149, /* DTLB_STORE_MISSES.MISS_CAUSES_A_WALK */
+       },
+       [ C(OP_PREFETCH) ] = {
+               [ C(RESULT_ACCESS) ] = 0x0,
+               [ C(RESULT_MISS)   ] = 0x0,
+       },
+ },
+ [ C(ITLB) ] = {
+       [ C(OP_READ) ] = {
+               [ C(RESULT_ACCESS) ] = 0x1085, /* ITLB_MISSES.STLB_HIT */
+               [ C(RESULT_MISS)   ] = 0x0185, /* ITLB_MISSES.CAUSES_A_WALK */
+       },
+       [ C(OP_WRITE) ] = {
+               [ C(RESULT_ACCESS) ] = -1,
+               [ C(RESULT_MISS)   ] = -1,
+       },
+       [ C(OP_PREFETCH) ] = {
+               [ C(RESULT_ACCESS) ] = -1,
+               [ C(RESULT_MISS)   ] = -1,
+       },
+ },
+ [ C(BPU ) ] = {
+       [ C(OP_READ) ] = {
+               [ C(RESULT_ACCESS) ] = 0x00c4, /* BR_INST_RETIRED.ALL_BRANCHES */
+               [ C(RESULT_MISS)   ] = 0x00c5, /* BR_MISP_RETIRED.ALL_BRANCHES */
+       },
+       [ C(OP_WRITE) ] = {
+               [ C(RESULT_ACCESS) ] = -1,
+               [ C(RESULT_MISS)   ] = -1,
+       },
+       [ C(OP_PREFETCH) ] = {
+               [ C(RESULT_ACCESS) ] = -1,
+               [ C(RESULT_MISS)   ] = -1,
+       },
+ },
+};
 static __initconst const u64 westmere_hw_cache_event_ids
                                [PERF_COUNT_HW_CACHE_MAX]
                                [PERF_COUNT_HW_CACHE_OP_MAX]
@@ -124,16 +285,26 @@ static __initconst const u64 westmere_hw_cache_event_ids
 },
 [ C(LL  ) ] = {
        [ C(OP_READ) ] = {
-               [ C(RESULT_ACCESS) ] = 0x0324, /* L2_RQSTS.LOADS */
-               [ C(RESULT_MISS)   ] = 0x0224, /* L2_RQSTS.LD_MISS */
+               /* OFFCORE_RESPONSE_0.ANY_DATA.LOCAL_CACHE */
+               [ C(RESULT_ACCESS) ] = 0x01b7,
+               /* OFFCORE_RESPONSE_1.ANY_DATA.ANY_LLC_MISS */
+               [ C(RESULT_MISS)   ] = 0x01bb,
        },
+       /*
+        * Use RFO, not WRITEBACK, because a write miss would typically occur
+        * on RFO.
+        */
        [ C(OP_WRITE) ] = {
-               [ C(RESULT_ACCESS) ] = 0x0c24, /* L2_RQSTS.RFOS */
-               [ C(RESULT_MISS)   ] = 0x0824, /* L2_RQSTS.RFO_MISS */
+               /* OFFCORE_RESPONSE_1.ANY_RFO.LOCAL_CACHE */
+               [ C(RESULT_ACCESS) ] = 0x01bb,
+               /* OFFCORE_RESPONSE_0.ANY_RFO.ANY_LLC_MISS */
+               [ C(RESULT_MISS)   ] = 0x01b7,
        },
        [ C(OP_PREFETCH) ] = {
-               [ C(RESULT_ACCESS) ] = 0x4f2e, /* LLC Reference */
-               [ C(RESULT_MISS)   ] = 0x412e, /* LLC Misses */
+               /* OFFCORE_RESPONSE_0.PREFETCH.LOCAL_CACHE */
+               [ C(RESULT_ACCESS) ] = 0x01b7,
+               /* OFFCORE_RESPONSE_1.PREFETCH.ANY_LLC_MISS */
+               [ C(RESULT_MISS)   ] = 0x01bb,
        },
 },
 [ C(DTLB) ] = {
@@ -180,6 +351,39 @@ static __initconst const u64 westmere_hw_cache_event_ids
        },
 };
+/*
+ * OFFCORE_RESPONSE MSR bits (subset), See IA32 SDM Vol 3 30.6.1.3
+ */
+
+#define DMND_DATA_RD   (1 << 0)
+#define DMND_RFO       (1 << 1)
+#define DMND_WB                (1 << 3)
+#define PF_DATA_RD     (1 << 4)
+#define PF_DATA_RFO    (1 << 5)
+#define RESP_UNCORE_HIT (1 << 8)
+#define RESP_MISS      (0xf600) /* non uncore hit */
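/*
 * Worked example (editorial note, not part of this commit):
 * DMND_DATA_RD|RESP_UNCORE_HIT = 0x0101, i.e. demand data reads satisfied
 * by the uncore; that value becomes the LL read-access entry in the table
 * below and is ultimately written to MSR_OFFCORE_RSP_0 via the extra-reg
 * mechanism added in this commit.
 */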
+static __initconst const u64 nehalem_hw_cache_extra_regs
+                               [PERF_COUNT_HW_CACHE_MAX]
+                               [PERF_COUNT_HW_CACHE_OP_MAX]
+                               [PERF_COUNT_HW_CACHE_RESULT_MAX] =
+{
+ [ C(LL  ) ] = {
+       [ C(OP_READ) ] = {
+               [ C(RESULT_ACCESS) ] = DMND_DATA_RD|RESP_UNCORE_HIT,
+               [ C(RESULT_MISS)   ] = DMND_DATA_RD|RESP_MISS,
+       },
+       [ C(OP_WRITE) ] = {
+               [ C(RESULT_ACCESS) ] = DMND_RFO|DMND_WB|RESP_UNCORE_HIT,
+               [ C(RESULT_MISS)   ] = DMND_RFO|DMND_WB|RESP_MISS,
+       },
+       [ C(OP_PREFETCH) ] = {
+               [ C(RESULT_ACCESS) ] = PF_DATA_RD|PF_DATA_RFO|RESP_UNCORE_HIT,
+               [ C(RESULT_MISS)   ] = PF_DATA_RD|PF_DATA_RFO|RESP_MISS,
+       },
+ }
+};
 static __initconst const u64 nehalem_hw_cache_event_ids
                                [PERF_COUNT_HW_CACHE_MAX]
                                [PERF_COUNT_HW_CACHE_OP_MAX]
@@ -215,16 +419,26 @@ static __initconst const u64 nehalem_hw_cache_event_ids
 },
 [ C(LL  ) ] = {
        [ C(OP_READ) ] = {
-               [ C(RESULT_ACCESS) ] = 0x0324, /* L2_RQSTS.LOADS */
-               [ C(RESULT_MISS)   ] = 0x0224, /* L2_RQSTS.LD_MISS */
+               /* OFFCORE_RESPONSE.ANY_DATA.LOCAL_CACHE */
+               [ C(RESULT_ACCESS) ] = 0x01b7,
+               /* OFFCORE_RESPONSE.ANY_DATA.ANY_LLC_MISS */
+               [ C(RESULT_MISS)   ] = 0x01b7,
        },
+       /*
+        * Use RFO, not WRITEBACK, because a write miss would typically occur
+        * on RFO.
+        */
        [ C(OP_WRITE) ] = {
-               [ C(RESULT_ACCESS) ] = 0x0c24, /* L2_RQSTS.RFOS */
-               [ C(RESULT_MISS)   ] = 0x0824, /* L2_RQSTS.RFO_MISS */
+               /* OFFCORE_RESPONSE.ANY_RFO.LOCAL_CACHE */
+               [ C(RESULT_ACCESS) ] = 0x01b7,
+               /* OFFCORE_RESPONSE.ANY_RFO.ANY_LLC_MISS */
+               [ C(RESULT_MISS)   ] = 0x01b7,
        },
        [ C(OP_PREFETCH) ] = {
-               [ C(RESULT_ACCESS) ] = 0x4f2e, /* LLC Reference */
-               [ C(RESULT_MISS)   ] = 0x412e, /* LLC Misses */
+               /* OFFCORE_RESPONSE.PREFETCH.LOCAL_CACHE */
+               [ C(RESULT_ACCESS) ] = 0x01b7,
+               /* OFFCORE_RESPONSE.PREFETCH.ANY_LLC_MISS */
+               [ C(RESULT_MISS)   ] = 0x01b7,
        },
 },
 [ C(DTLB) ] = {
@@ -691,8 +905,8 @@ static void intel_pmu_reset(void)
        printk("clearing PMU state on CPU#%d\n", smp_processor_id());

        for (idx = 0; idx < x86_pmu.num_counters; idx++) {
-               checking_wrmsrl(x86_pmu.eventsel + idx, 0ull);
-               checking_wrmsrl(x86_pmu.perfctr  + idx, 0ull);
+               checking_wrmsrl(x86_pmu_config_addr(idx), 0ull);
+               checking_wrmsrl(x86_pmu_event_addr(idx),  0ull);
        }
        for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++)
                checking_wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + idx, 0ull);
@@ -793,6 +1007,67 @@ intel_bts_constraints(struct perf_event *event)
        return NULL;
 }
+static struct event_constraint *
+intel_percore_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+       struct hw_perf_event *hwc = &event->hw;
+       unsigned int e = hwc->config & ARCH_PERFMON_EVENTSEL_EVENT;
+       struct event_constraint *c;
+       struct intel_percore *pc;
+       struct er_account *era;
+       int i;
+       int free_slot;
+       int found;
+
+       if (!x86_pmu.percore_constraints || hwc->extra_alloc)
+               return NULL;
+
+       for (c = x86_pmu.percore_constraints; c->cmask; c++) {
+               if (e != c->code)
+                       continue;
+
+               /*
+                * Allocate resource per core.
+                */
+               pc = cpuc->per_core;
+               if (!pc)
+                       break;
+               c = &emptyconstraint;
+               raw_spin_lock(&pc->lock);
+               free_slot = -1;
+               found = 0;
+               for (i = 0; i < MAX_EXTRA_REGS; i++) {
+                       era = &pc->regs[i];
+                       if (era->ref > 0 && hwc->extra_reg == era->extra_reg) {
+                               /* Allow sharing same config */
+                               if (hwc->extra_config == era->extra_config) {
+                                       era->ref++;
+                                       cpuc->percore_used = 1;
+                                       hwc->extra_alloc = 1;
+                                       c = NULL;
+                               }
+                               /* else conflict */
+                               found = 1;
+                               break;
+                       } else if (era->ref == 0 && free_slot == -1)
+                               free_slot = i;
+               }
+               if (!found && free_slot != -1) {
+                       era = &pc->regs[free_slot];
+                       era->ref = 1;
+                       era->extra_reg = hwc->extra_reg;
+                       era->extra_config = hwc->extra_config;
+                       cpuc->percore_used = 1;
+                       hwc->extra_alloc = 1;
+                       c = NULL;
+               }
+               raw_spin_unlock(&pc->lock);
+               return c;
+       }
+
+       return NULL;
+}
 static struct event_constraint *
 intel_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 {
@@ -806,9 +1081,51 @@ intel_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event
        if (c)
                return c;

+       c = intel_percore_constraints(cpuc, event);
+       if (c)
+               return c;
+
        return x86_get_event_constraints(cpuc, event);
 }
+static void intel_put_event_constraints(struct cpu_hw_events *cpuc,
+                                       struct perf_event *event)
+{
+       struct extra_reg *er;
+       struct intel_percore *pc;
+       struct er_account *era;
+       struct hw_perf_event *hwc = &event->hw;
+       int i, allref;
+
+       if (!cpuc->percore_used)
+               return;
+
+       for (er = x86_pmu.extra_regs; er->msr; er++) {
+               if (er->event != (hwc->config & er->config_mask))
+                       continue;
+
+               pc = cpuc->per_core;
+               raw_spin_lock(&pc->lock);
+               for (i = 0; i < MAX_EXTRA_REGS; i++) {
+                       era = &pc->regs[i];
+                       if (era->ref > 0 &&
+                           era->extra_config == hwc->extra_config &&
+                           era->extra_reg == er->msr) {
+                               era->ref--;
+                               hwc->extra_alloc = 0;
+                               break;
+                       }
+               }
+               allref = 0;
+               for (i = 0; i < MAX_EXTRA_REGS; i++)
+                       allref += pc->regs[i].ref;
+               if (allref == 0)
+                       cpuc->percore_used = 0;
+               raw_spin_unlock(&pc->lock);
+               break;
+       }
+}
 static int intel_pmu_hw_config(struct perf_event *event)
 {
        int ret = x86_pmu_hw_config(event);
@@ -880,20 +1197,67 @@ static __initconst const struct x86_pmu core_pmu = {
         */
        .max_period             = (1ULL << 31) - 1,
        .get_event_constraints  = intel_get_event_constraints,
+       .put_event_constraints  = intel_put_event_constraints,
        .event_constraints      = intel_core_event_constraints,
 };
+static int intel_pmu_cpu_prepare(int cpu)
+{
+       struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
+
+       if (!cpu_has_ht_siblings())
+               return NOTIFY_OK;
+
+       cpuc->per_core = kzalloc_node(sizeof(struct intel_percore),
+                                     GFP_KERNEL, cpu_to_node(cpu));
+       if (!cpuc->per_core)
+               return NOTIFY_BAD;
+
+       raw_spin_lock_init(&cpuc->per_core->lock);
+       cpuc->per_core->core_id = -1;
+       return NOTIFY_OK;
+}
 static void intel_pmu_cpu_starting(int cpu)
 {
+       struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
+       int core_id = topology_core_id(cpu);
+       int i;
+
        init_debug_store_on_cpu(cpu);
        /*
         * Deal with CPUs that don't clear their LBRs on power-up.
         */
        intel_pmu_lbr_reset();

+       if (!cpu_has_ht_siblings())
+               return;
+
+       for_each_cpu(i, topology_thread_cpumask(cpu)) {
+               struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;
+
+               if (pc && pc->core_id == core_id) {
+                       kfree(cpuc->per_core);
+                       cpuc->per_core = pc;
+                       break;
+               }
+       }
+
+       cpuc->per_core->core_id = core_id;
+       cpuc->per_core->refcnt++;
 }
 static void intel_pmu_cpu_dying(int cpu)
 {
+       struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
+       struct intel_percore *pc = cpuc->per_core;
+
+       if (pc) {
+               if (pc->core_id == -1 || --pc->refcnt == 0)
+                       kfree(pc);
+               cpuc->per_core = NULL;
+       }
+
        fini_debug_store_on_cpu(cpu);
 }
@@ -918,7 +1282,9 @@ static __initconst const struct x86_pmu intel_pmu = {
         */
        .max_period             = (1ULL << 31) - 1,
        .get_event_constraints  = intel_get_event_constraints,
+       .put_event_constraints  = intel_put_event_constraints,

+       .cpu_prepare            = intel_pmu_cpu_prepare,
        .cpu_starting           = intel_pmu_cpu_starting,
        .cpu_dying              = intel_pmu_cpu_dying,
 };
@@ -1024,6 +1390,7 @@ static __init int intel_pmu_init(void)
                intel_pmu_lbr_init_core();

                x86_pmu.event_constraints = intel_core2_event_constraints;
+               x86_pmu.pebs_constraints = intel_core2_pebs_event_constraints;
                pr_cont("Core2 events, ");
                break;
@@ -1032,11 +1399,16 @@ static __init int intel_pmu_init(void)
        case 46: /* 45 nm nehalem-ex, "Beckton" */
                memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
                       sizeof(hw_cache_event_ids));
+               memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
+                      sizeof(hw_cache_extra_regs));

                intel_pmu_lbr_init_nhm();

                x86_pmu.event_constraints = intel_nehalem_event_constraints;
+               x86_pmu.pebs_constraints = intel_nehalem_pebs_event_constraints;
+               x86_pmu.percore_constraints = intel_nehalem_percore_constraints;
                x86_pmu.enable_all = intel_pmu_nhm_enable_all;
+               x86_pmu.extra_regs = intel_nehalem_extra_regs;
                pr_cont("Nehalem events, ");
                break;
@@ -1047,6 +1419,7 @@ static __init int intel_pmu_init(void)
                intel_pmu_lbr_init_atom();

                x86_pmu.event_constraints = intel_gen_event_constraints;
+               x86_pmu.pebs_constraints = intel_atom_pebs_event_constraints;
                pr_cont("Atom events, ");
                break;
@@ -1054,14 +1427,30 @@ static __init int intel_pmu_init(void)
        case 44: /* 32 nm nehalem, "Gulftown" */
                memcpy(hw_cache_event_ids, westmere_hw_cache_event_ids,
                       sizeof(hw_cache_event_ids));
+               memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
+                      sizeof(hw_cache_extra_regs));

                intel_pmu_lbr_init_nhm();

                x86_pmu.event_constraints = intel_westmere_event_constraints;
+               x86_pmu.percore_constraints = intel_westmere_percore_constraints;
                x86_pmu.enable_all = intel_pmu_nhm_enable_all;
+               x86_pmu.pebs_constraints = intel_westmere_pebs_event_constraints;
+               x86_pmu.extra_regs = intel_westmere_extra_regs;
                pr_cont("Westmere events, ");
                break;

+       case 42: /* SandyBridge */
+               memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
+                      sizeof(hw_cache_event_ids));
+
+               intel_pmu_lbr_init_nhm();
+
+               x86_pmu.event_constraints = intel_snb_event_constraints;
+               x86_pmu.pebs_constraints = intel_snb_pebs_events;
+               pr_cont("SandyBridge events, ");
+               break;
+
        default:
                /*
                 * default constraints for v2 and up
......
@@ -361,30 +361,88 @@ static int intel_pmu_drain_bts_buffer(void)
 /*
  * PEBS
  */
-static struct event_constraint intel_core_pebs_events[] = {
-	PEBS_EVENT_CONSTRAINT(0x00c0, 0x1), /* INSTR_RETIRED.ANY */
+static struct event_constraint intel_core2_pebs_event_constraints[] = {
+	PEBS_EVENT_CONSTRAINT(0x00c0, 0x1), /* INST_RETIRED.ANY */
 	PEBS_EVENT_CONSTRAINT(0xfec1, 0x1), /* X87_OPS_RETIRED.ANY */
 	PEBS_EVENT_CONSTRAINT(0x00c5, 0x1), /* BR_INST_RETIRED.MISPRED */
 	PEBS_EVENT_CONSTRAINT(0x1fc7, 0x1), /* SIMD_INST_RETURED.ANY */
-	PEBS_EVENT_CONSTRAINT(0x01cb, 0x1), /* MEM_LOAD_RETIRED.L1D_MISS */
-	PEBS_EVENT_CONSTRAINT(0x02cb, 0x1), /* MEM_LOAD_RETIRED.L1D_LINE_MISS */
-	PEBS_EVENT_CONSTRAINT(0x04cb, 0x1), /* MEM_LOAD_RETIRED.L2_MISS */
-	PEBS_EVENT_CONSTRAINT(0x08cb, 0x1), /* MEM_LOAD_RETIRED.L2_LINE_MISS */
-	PEBS_EVENT_CONSTRAINT(0x10cb, 0x1), /* MEM_LOAD_RETIRED.DTLB_MISS */
+	INTEL_EVENT_CONSTRAINT(0xcb, 0x1),  /* MEM_LOAD_RETIRED.* */
+	EVENT_CONSTRAINT_END
+};
+
+static struct event_constraint intel_atom_pebs_event_constraints[] = {
+	PEBS_EVENT_CONSTRAINT(0x00c0, 0x1), /* INST_RETIRED.ANY */
+	PEBS_EVENT_CONSTRAINT(0x00c5, 0x1), /* MISPREDICTED_BRANCH_RETIRED */
+	INTEL_EVENT_CONSTRAINT(0xcb, 0x1),  /* MEM_LOAD_RETIRED.* */
 	EVENT_CONSTRAINT_END
 };

-static struct event_constraint intel_nehalem_pebs_events[] = {
-	PEBS_EVENT_CONSTRAINT(0x00c0, 0xf), /* INSTR_RETIRED.ANY */
-	PEBS_EVENT_CONSTRAINT(0xfec1, 0xf), /* X87_OPS_RETIRED.ANY */
-	PEBS_EVENT_CONSTRAINT(0x00c5, 0xf), /* BR_INST_RETIRED.MISPRED */
-	PEBS_EVENT_CONSTRAINT(0x1fc7, 0xf), /* SIMD_INST_RETURED.ANY */
-	PEBS_EVENT_CONSTRAINT(0x01cb, 0xf), /* MEM_LOAD_RETIRED.L1D_MISS */
-	PEBS_EVENT_CONSTRAINT(0x02cb, 0xf), /* MEM_LOAD_RETIRED.L1D_LINE_MISS */
-	PEBS_EVENT_CONSTRAINT(0x04cb, 0xf), /* MEM_LOAD_RETIRED.L2_MISS */
-	PEBS_EVENT_CONSTRAINT(0x08cb, 0xf), /* MEM_LOAD_RETIRED.L2_LINE_MISS */
-	PEBS_EVENT_CONSTRAINT(0x10cb, 0xf), /* MEM_LOAD_RETIRED.DTLB_MISS */
+static struct event_constraint intel_nehalem_pebs_event_constraints[] = {
+	INTEL_EVENT_CONSTRAINT(0x0b, 0xf),  /* MEM_INST_RETIRED.* */
+	INTEL_EVENT_CONSTRAINT(0x0f, 0xf),  /* MEM_UNCORE_RETIRED.* */
+	PEBS_EVENT_CONSTRAINT(0x010c, 0xf), /* MEM_STORE_RETIRED.DTLB_MISS */
+	INTEL_EVENT_CONSTRAINT(0xc0, 0xf),  /* INST_RETIRED.ANY */
+	INTEL_EVENT_CONSTRAINT(0xc2, 0xf),  /* UOPS_RETIRED.* */
+	INTEL_EVENT_CONSTRAINT(0xc4, 0xf),  /* BR_INST_RETIRED.* */
+	PEBS_EVENT_CONSTRAINT(0x02c5, 0xf), /* BR_MISP_RETIRED.NEAR_CALL */
+	INTEL_EVENT_CONSTRAINT(0xc7, 0xf),  /* SSEX_UOPS_RETIRED.* */
+	PEBS_EVENT_CONSTRAINT(0x20c8, 0xf), /* ITLB_MISS_RETIRED */
+	INTEL_EVENT_CONSTRAINT(0xcb, 0xf),  /* MEM_LOAD_RETIRED.* */
+	INTEL_EVENT_CONSTRAINT(0xf7, 0xf),  /* FP_ASSIST.* */
+	EVENT_CONSTRAINT_END
+};
+
+static struct event_constraint intel_westmere_pebs_event_constraints[] = {
+	INTEL_EVENT_CONSTRAINT(0x0b, 0xf),  /* MEM_INST_RETIRED.* */
+	INTEL_EVENT_CONSTRAINT(0x0f, 0xf),  /* MEM_UNCORE_RETIRED.* */
+	PEBS_EVENT_CONSTRAINT(0x010c, 0xf), /* MEM_STORE_RETIRED.DTLB_MISS */
+	INTEL_EVENT_CONSTRAINT(0xc0, 0xf),  /* INSTR_RETIRED.* */
+	INTEL_EVENT_CONSTRAINT(0xc2, 0xf),  /* UOPS_RETIRED.* */
+	INTEL_EVENT_CONSTRAINT(0xc4, 0xf),  /* BR_INST_RETIRED.* */
+	INTEL_EVENT_CONSTRAINT(0xc5, 0xf),  /* BR_MISP_RETIRED.* */
+	INTEL_EVENT_CONSTRAINT(0xc7, 0xf),  /* SSEX_UOPS_RETIRED.* */
+	PEBS_EVENT_CONSTRAINT(0x20c8, 0xf), /* ITLB_MISS_RETIRED */
+	INTEL_EVENT_CONSTRAINT(0xcb, 0xf),  /* MEM_LOAD_RETIRED.* */
+	INTEL_EVENT_CONSTRAINT(0xf7, 0xf),  /* FP_ASSIST.* */
+	EVENT_CONSTRAINT_END
+};
+
+static struct event_constraint intel_snb_pebs_events[] = {
+	PEBS_EVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PRECDIST */
+	PEBS_EVENT_CONSTRAINT(0x01c2, 0xf), /* UOPS_RETIRED.ALL */
+	PEBS_EVENT_CONSTRAINT(0x02c2, 0xf), /* UOPS_RETIRED.RETIRE_SLOTS */
+	PEBS_EVENT_CONSTRAINT(0x01c4, 0xf), /* BR_INST_RETIRED.CONDITIONAL */
+	PEBS_EVENT_CONSTRAINT(0x02c4, 0xf), /* BR_INST_RETIRED.NEAR_CALL */
+	PEBS_EVENT_CONSTRAINT(0x04c4, 0xf), /* BR_INST_RETIRED.ALL_BRANCHES */
+	PEBS_EVENT_CONSTRAINT(0x08c4, 0xf), /* BR_INST_RETIRED.NEAR_RETURN */
+	PEBS_EVENT_CONSTRAINT(0x10c4, 0xf), /* BR_INST_RETIRED.NOT_TAKEN */
+	PEBS_EVENT_CONSTRAINT(0x20c4, 0xf), /* BR_INST_RETIRED.NEAR_TAKEN */
+	PEBS_EVENT_CONSTRAINT(0x40c4, 0xf), /* BR_INST_RETIRED.FAR_BRANCH */
+	PEBS_EVENT_CONSTRAINT(0x01c5, 0xf), /* BR_MISP_RETIRED.CONDITIONAL */
+	PEBS_EVENT_CONSTRAINT(0x02c5, 0xf), /* BR_MISP_RETIRED.NEAR_CALL */
+	PEBS_EVENT_CONSTRAINT(0x04c5, 0xf), /* BR_MISP_RETIRED.ALL_BRANCHES */
+	PEBS_EVENT_CONSTRAINT(0x10c5, 0xf), /* BR_MISP_RETIRED.NOT_TAKEN */
+	PEBS_EVENT_CONSTRAINT(0x20c5, 0xf), /* BR_MISP_RETIRED.TAKEN */
+	PEBS_EVENT_CONSTRAINT(0x01cd, 0x8), /* MEM_TRANS_RETIRED.LOAD_LATENCY */
+	PEBS_EVENT_CONSTRAINT(0x02cd, 0x8), /* MEM_TRANS_RETIRED.PRECISE_STORE */
+	PEBS_EVENT_CONSTRAINT(0x11d0, 0xf), /* MEM_UOP_RETIRED.STLB_MISS_LOADS */
+	PEBS_EVENT_CONSTRAINT(0x12d0, 0xf), /* MEM_UOP_RETIRED.STLB_MISS_STORES */
+	PEBS_EVENT_CONSTRAINT(0x21d0, 0xf), /* MEM_UOP_RETIRED.LOCK_LOADS */
+	PEBS_EVENT_CONSTRAINT(0x22d0, 0xf), /* MEM_UOP_RETIRED.LOCK_STORES */
+	PEBS_EVENT_CONSTRAINT(0x41d0, 0xf), /* MEM_UOP_RETIRED.SPLIT_LOADS */
+	PEBS_EVENT_CONSTRAINT(0x42d0, 0xf), /* MEM_UOP_RETIRED.SPLIT_STORES */
+	PEBS_EVENT_CONSTRAINT(0x81d0, 0xf), /* MEM_UOP_RETIRED.ANY_LOADS */
+	PEBS_EVENT_CONSTRAINT(0x82d0, 0xf), /* MEM_UOP_RETIRED.ANY_STORES */
+	PEBS_EVENT_CONSTRAINT(0x01d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L1_HIT */
+	PEBS_EVENT_CONSTRAINT(0x02d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L2_HIT */
+	PEBS_EVENT_CONSTRAINT(0x04d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.LLC_HIT */
+	PEBS_EVENT_CONSTRAINT(0x40d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.HIT_LFB */
+	PEBS_EVENT_CONSTRAINT(0x01d2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_MISS */
+	PEBS_EVENT_CONSTRAINT(0x02d2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT */
+	PEBS_EVENT_CONSTRAINT(0x04d2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM */
+	PEBS_EVENT_CONSTRAINT(0x08d2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_NONE */
+	PEBS_EVENT_CONSTRAINT(0x02d4, 0xf), /* MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS */
 	EVENT_CONSTRAINT_END
 };

@@ -695,20 +753,17 @@ static void intel_ds_init(void)
 			printk(KERN_CONT "PEBS fmt0%c, ", pebs_type);
 			x86_pmu.pebs_record_size = sizeof(struct pebs_record_core);
 			x86_pmu.drain_pebs = intel_pmu_drain_pebs_core;
-			x86_pmu.pebs_constraints = intel_core_pebs_events;
 			break;

 		case 1:
 			printk(KERN_CONT "PEBS fmt1%c, ", pebs_type);
 			x86_pmu.pebs_record_size = sizeof(struct pebs_record_nhm);
 			x86_pmu.drain_pebs = intel_pmu_drain_pebs_nhm;
-			x86_pmu.pebs_constraints = intel_nehalem_pebs_events;
 			break;

 		default:
 			printk(KERN_CONT "no PEBS fmt%d%c, ", format, pebs_type);
 			x86_pmu.pebs = 0;
-			break;
 		}
 	}
 }
......
@@ -764,9 +764,9 @@ static inline int p4_pmu_clear_cccr_ovf(struct hw_perf_event *hwc)
 	u64 v;

 	/* an official way for overflow indication */
-	rdmsrl(hwc->config_base + hwc->idx, v);
+	rdmsrl(hwc->config_base, v);
 	if (v & P4_CCCR_OVF) {
-		wrmsrl(hwc->config_base + hwc->idx, v & ~P4_CCCR_OVF);
+		wrmsrl(hwc->config_base, v & ~P4_CCCR_OVF);
 		return 1;
 	}

@@ -815,7 +815,7 @@ static inline void p4_pmu_disable_event(struct perf_event *event)
 	 * state we need to clear P4_CCCR_OVF, otherwise interrupt get
 	 * asserted again and again
 	 */
-	(void)checking_wrmsrl(hwc->config_base + hwc->idx,
+	(void)checking_wrmsrl(hwc->config_base,
 		(u64)(p4_config_unpack_cccr(hwc->config)) &
 			~P4_CCCR_ENABLE & ~P4_CCCR_OVF & ~P4_CCCR_RESERVED);
 }

@@ -885,7 +885,7 @@ static void p4_pmu_enable_event(struct perf_event *event)
 	p4_pmu_enable_pebs(hwc->config);

 	(void)checking_wrmsrl(escr_addr, escr_conf);
-	(void)checking_wrmsrl(hwc->config_base + hwc->idx,
+	(void)checking_wrmsrl(hwc->config_base,
 				(cccr & ~P4_CCCR_RESERVED) | P4_CCCR_ENABLE);
 }
......
@@ -68,7 +68,7 @@ p6_pmu_disable_event(struct perf_event *event)
 	if (cpuc->enabled)
 		val |= ARCH_PERFMON_EVENTSEL_ENABLE;

-	(void)checking_wrmsrl(hwc->config_base + hwc->idx, val);
+	(void)checking_wrmsrl(hwc->config_base, val);
 }

 static void p6_pmu_enable_event(struct perf_event *event)

@@ -81,7 +81,7 @@ static void p6_pmu_enable_event(struct perf_event *event)
 	if (cpuc->enabled)
 		val |= ARCH_PERFMON_EVENTSEL_ENABLE;

-	(void)checking_wrmsrl(hwc->config_base + hwc->idx, val);
+	(void)checking_wrmsrl(hwc->config_base, val);
 }

 static __initconst const struct x86_pmu p6_pmu = {
......
@@ -46,6 +46,8 @@ static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr)
 	/* returns the bit offset of the performance counter register */
 	switch (boot_cpu_data.x86_vendor) {
 	case X86_VENDOR_AMD:
+		if (msr >= MSR_F15H_PERF_CTR)
+			return (msr - MSR_F15H_PERF_CTR) >> 1;
 		return msr - MSR_K7_PERFCTR0;
 	case X86_VENDOR_INTEL:
 		if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))

@@ -70,6 +72,8 @@ static inline unsigned int nmi_evntsel_msr_to_bit(unsigned int msr)
 	/* returns the bit offset of the event selection register */
 	switch (boot_cpu_data.x86_vendor) {
 	case X86_VENDOR_AMD:
+		if (msr >= MSR_F15H_PERF_CTL)
+			return (msr - MSR_F15H_PERF_CTL) >> 1;
 		return msr - MSR_K7_EVNTSEL0;
 	case X86_VENDOR_INTEL:
 		if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
......
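A note on the two watchdog hunks above: they rely on the AMD Family 15h MSR layout, where each PERF_CTLi/PERF_CTRi pair occupies two consecutive MSRs starting at 0xc0010200, so halving the offset from the first counter MSR recovers the counter index. A minimal standalone sketch of that arithmetic (user-space C; the MSR constants are copied from asm/msr-index.h, everything else is invented for the example):

#include <assert.h>

#define MSR_F15H_PERF_CTL 0xc0010200 /* CTL0; CTL1 at +2, CTL2 at +4, ... */
#define MSR_F15H_PERF_CTR 0xc0010201 /* CTR0; CTR1 at +2, CTR2 at +4, ... */

/* mirrors the ">> 1" in nmi_perfctr_msr_to_bit() for Family 15h */
static unsigned int f15h_ctr_bit(unsigned int msr)
{
	return (msr - MSR_F15H_PERF_CTR) >> 1;
}

int main(void)
{
	assert(f15h_ctr_bit(0xc0010201) == 0); /* PERF_CTR0 */
	assert(f15h_ctr_bit(0xc0010203) == 1); /* PERF_CTR1 */
	assert(f15h_ctr_bit(0xc0010205) == 2); /* PERF_CTR2 */
	return 0;
}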
@@ -320,31 +320,6 @@ void die(const char *str, struct pt_regs *regs, long err)
 	oops_end(flags, regs, sig);
 }

-void notrace __kprobes
-die_nmi(char *str, struct pt_regs *regs, int do_panic)
-{
-	unsigned long flags;
-
-	if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) == NOTIFY_STOP)
-		return;
-
-	/*
-	 * We are in trouble anyway, lets at least try
-	 * to get a message out.
-	 */
-	flags = oops_begin();
-	printk(KERN_EMERG "%s", str);
-	printk(" on CPU%d, ip %08lx, registers:\n",
-		smp_processor_id(), regs->ip);
-	show_registers(regs);
-	oops_end(flags, regs, 0);
-	if (do_panic || panic_on_oops)
-		panic("Non maskable interrupt");
-	nmi_exit();
-	local_irq_enable();
-	do_exit(SIGBUS);
-}
-
 static int __init oops_setup(char *s)
 {
 	if (!s)
......
@@ -65,6 +65,8 @@
 #define sysexit_audit	syscall_exit_work
 #endif

+	.section .entry.text, "ax"
+
 /*
  * We use macros for low-level operations which need to be overridden
  * for paravirtualization.  The following will never clobber any registers:

@@ -788,7 +790,7 @@ ENDPROC(ptregs_clone)
  */
 .section .init.rodata,"a"
 ENTRY(interrupt)
-.text
+	.section .entry.text, "ax"
	.p2align 5
	.p2align CONFIG_X86_L1_CACHE_SHIFT
 ENTRY(irq_entries_start)

@@ -807,7 +809,7 @@ vector=FIRST_EXTERNAL_VECTOR
	.endif
	.previous
	.long 1b
-	.text
+	.section .entry.text, "ax"
 vector=vector+1
	.endif
	.endr
......
@@ -61,6 +61,8 @@
 #define __AUDIT_ARCH_LE	   0x40000000

	.code64
+	.section .entry.text, "ax"
+
 #ifdef CONFIG_FUNCTION_TRACER
 #ifdef CONFIG_DYNAMIC_FTRACE
 ENTRY(mcount)

@@ -744,7 +746,7 @@ END(stub_rt_sigreturn)
  */
	.section .init.rodata,"a"
 ENTRY(interrupt)
-	.text
+	.section .entry.text
	.p2align 5
	.p2align CONFIG_X86_L1_CACHE_SHIFT
 ENTRY(irq_entries_start)

@@ -763,7 +765,7 @@ vector=FIRST_EXTERNAL_VECTOR
	.endif
	.previous
	.quad 1b
-	.text
+	.section .entry.text
 vector=vector+1
	.endif
	.endr
......
@@ -437,18 +437,19 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 		return;
 	}

-	if (ftrace_push_return_trace(old, self_addr, &trace.depth,
-		    frame_pointer) == -EBUSY) {
-		*parent = old;
-		return;
-	}
-
 	trace.func = self_addr;
+	trace.depth = current->curr_ret_stack + 1;

 	/* Only trace if the calling function expects to */
 	if (!ftrace_graph_entry(&trace)) {
-		current->curr_ret_stack--;
 		*parent = old;
+		return;
+	}
+
+	if (ftrace_push_return_trace(old, self_addr, &trace.depth,
+		    frame_pointer) == -EBUSY) {
+		*parent = old;
+		return;
 	}
 }
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
@@ -533,15 +533,6 @@ static int __kgdb_notify(struct die_args *args, unsigned long cmd)
 		}
 		return NOTIFY_DONE;

-	case DIE_NMIWATCHDOG:
-		if (atomic_read(&kgdb_active) != -1) {
-			/* KGDB CPU roundup: */
-			kgdb_nmicallback(raw_smp_processor_id(), regs);
-			return NOTIFY_STOP;
-		}
-		/* Enter debugger: */
-		break;
-
 	case DIE_DEBUG:
 		if (atomic_read(&kgdb_cpu_doing_single_step) != -1) {
 			if (user_mode(regs))
......
@@ -1276,6 +1276,14 @@ static int __kprobes can_optimize(unsigned long paddr)
 	if (!kallsyms_lookup_size_offset(paddr, &size, &offset))
 		return 0;

+	/*
+	 * Do not optimize in the entry code due to the unstable
+	 * stack handling.
+	 */
+	if ((paddr >= (unsigned long)__entry_text_start) &&
+	    (paddr <  (unsigned long)__entry_text_end))
+		return 0;
+
 	/* Check there is enough space for a relative jump. */
 	if (size - offset < RELATIVEJUMP_SIZE)
 		return 0;
......
@@ -105,6 +105,7 @@ SECTIONS
 		SCHED_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
+		ENTRY_TEXT
 		IRQENTRY_TEXT
 		*(.fixup)
 		*(.gnu.warning)
......
@@ -62,21 +62,21 @@ TRACE_EVENT(kvm_hv_hypercall,
 	TP_ARGS(code, fast, rep_cnt, rep_idx, ingpa, outgpa),

 	TP_STRUCT__entry(
-		__field( __u16, code    )
-		__field( bool,  fast    )
 		__field( __u16, rep_cnt )
 		__field( __u16, rep_idx )
 		__field( __u64, ingpa   )
 		__field( __u64, outgpa  )
+		__field( __u16, code    )
+		__field( bool,  fast    )
 	),

 	TP_fast_assign(
-		__entry->code		= code;
-		__entry->fast		= fast;
 		__entry->rep_cnt	= rep_cnt;
 		__entry->rep_idx	= rep_idx;
 		__entry->ingpa		= ingpa;
 		__entry->outgpa		= outgpa;
+		__entry->code		= code;
+		__entry->fast		= fast;
 	),

 	TP_printk("code 0x%x %s cnt 0x%x idx 0x%x in 0x%llx out 0x%llx",
......
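This hunk, and the mce/module/skb reorders further down, implement the "Fix event alignment" commits from the log: fields are ordered so that each member's offset already satisfies its natural alignment, leaving no padding holes inside the ring-buffer record. A minimal illustration of the effect (sizes are for a typical LP64 target; these structs are invented for the example, not taken from the kernel):

#include <stdio.h>
#include <stdint.h>

struct with_hole {        /* narrow field first: hole before the u64 */
	uint8_t  flag;    /* 1 byte, then 7 bytes of padding */
	uint64_t big;
	uint32_t mid;     /* plus 4 bytes of tail padding */
};                        /* 24 bytes */

struct no_hole {          /* descending alignment: no interior hole */
	uint64_t big;
	uint32_t mid;
	uint8_t  flag;    /* only 3 bytes of tail padding */
};                        /* 16 bytes */

int main(void)
{
	printf("with hole: %zu bytes\n", sizeof(struct with_hole)); /* 24 */
	printf("no hole:   %zu bytes\n", sizeof(struct no_hole));   /* 16 */
	return 0;
}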
@@ -11,6 +11,7 @@ extern char _sinittext[], _einittext[];
 extern char _end[];
 extern char __per_cpu_load[], __per_cpu_start[], __per_cpu_end[];
 extern char __kprobes_text_start[], __kprobes_text_end[];
+extern char __entry_text_start[], __entry_text_end[];
 extern char __initdata_begin[], __initdata_end[];
 extern char __start_rodata[], __end_rodata[];
......
@@ -424,6 +424,12 @@
 		*(.kprobes.text) \
 		VMLINUX_SYMBOL(__kprobes_text_end) = .;

+#define ENTRY_TEXT \
+		ALIGN_FUNCTION(); \
+		VMLINUX_SYMBOL(__entry_text_start) = .; \
+		*(.entry.text) \
+		VMLINUX_SYMBOL(__entry_text_end) = .;
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 #define IRQENTRY_TEXT \
 		ALIGN_FUNCTION(); \
......
@@ -474,7 +474,8 @@ struct cgroup_subsys {
 			struct cgroup *old_cgrp, struct task_struct *tsk,
 			bool threadgroup);
 	void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
-	void (*exit)(struct cgroup_subsys *ss, struct task_struct *task);
+	void (*exit)(struct cgroup_subsys *ss, struct cgroup *cgrp,
+			struct cgroup *old_cgrp, struct task_struct *task);
 	int (*populate)(struct cgroup_subsys *ss,
 			struct cgroup *cgrp);
 	void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);

@@ -626,6 +627,7 @@ bool css_is_ancestor(struct cgroup_subsys_state *cg,
 /* Get id and depth of css */
 unsigned short css_id(struct cgroup_subsys_state *css);
 unsigned short css_depth(struct cgroup_subsys_state *css);
+struct cgroup_subsys_state *cgroup_css_from_dir(struct file *f, int id);

 #else /* !CONFIG_CGROUPS */
......
@@ -65,4 +65,8 @@ SUBSYS(net_cls)
 SUBSYS(blkio)
 #endif

+#ifdef CONFIG_CGROUP_PERF
+SUBSYS(perf)
+#endif
+
 /* */
@@ -428,6 +428,7 @@ extern void unregister_ftrace_graph(void);

 extern void ftrace_graph_init_task(struct task_struct *t);
 extern void ftrace_graph_exit_task(struct task_struct *t);
+extern void ftrace_graph_init_idle_task(struct task_struct *t, int cpu);

 static inline int task_curr_ret_stack(struct task_struct *t)
 {

@@ -451,6 +452,7 @@ static inline void unpause_graph_tracing(void)

 static inline void ftrace_graph_init_task(struct task_struct *t) { }
 static inline void ftrace_graph_exit_task(struct task_struct *t) { }
+static inline void ftrace_graph_init_idle_task(struct task_struct *t, int cpu) { }

 static inline int register_ftrace_graph(trace_func_graph_ret_t retfunc,
 					trace_func_graph_ent_t entryfunc)
......
@@ -37,7 +37,6 @@ struct trace_entry {
 	unsigned char		flags;
 	unsigned char		preempt_count;
 	int			pid;
-	int			lock_depth;
 };

 #define FTRACE_MAX_EVENT \

@@ -208,7 +207,6 @@ struct ftrace_event_call {
 #define PERF_MAX_TRACE_SIZE	2048

-#define MAX_FILTER_PRED		32
 #define MAX_FILTER_STR_VAL	256	/* Should handle KSYM_SYMBOL_LEN */

 extern void destroy_preds(struct ftrace_event_call *call);
......
@@ -225,8 +225,14 @@ struct perf_event_attr {
 	};

 	__u32			bp_type;
-	__u64			bp_addr;
-	__u64			bp_len;
+	union {
+		__u64		bp_addr;
+		__u64		config1; /* extension of config */
+	};
+	union {
+		__u64		bp_len;
+		__u64		config2; /* extension of config1 */
+	};
 };

 /*

@@ -464,6 +470,7 @@ enum perf_callchain_context {

 #define PERF_FLAG_FD_NO_GROUP	(1U << 0)
 #define PERF_FLAG_FD_OUTPUT	(1U << 1)
+#define PERF_FLAG_PID_CGROUP	(1U << 2) /* pid=cgroup id, per-cpu mode only */

 #ifdef __KERNEL__
 /*

@@ -471,6 +478,7 @@ enum perf_callchain_context {
  */
 #ifdef CONFIG_PERF_EVENTS
+# include <linux/cgroup.h>
 # include <asm/perf_event.h>
 # include <asm/local64.h>
 #endif

@@ -539,6 +547,9 @@ struct hw_perf_event {
 			unsigned long	event_base;
 			int		idx;
 			int		last_cpu;
+			unsigned int	extra_reg;
+			u64		extra_config;
+			int		extra_alloc;
 		};
 		struct { /* software */
 			struct hrtimer	hrtimer;

@@ -716,6 +727,22 @@ struct swevent_hlist {
 #define PERF_ATTACH_GROUP	0x02
 #define PERF_ATTACH_TASK	0x04

+#ifdef CONFIG_CGROUP_PERF
+/*
+ * perf_cgroup_info keeps track of time_enabled for a cgroup.
+ * This is a per-cpu dynamically allocated data structure.
+ */
+struct perf_cgroup_info {
+	u64 time;
+	u64 timestamp;
+};
+
+struct perf_cgroup {
+	struct cgroup_subsys_state css;
+	struct perf_cgroup_info *info;	/* timing info, one per cpu */
+};
+#endif
+
 /**
  * struct perf_event - performance event kernel representation:
  */

@@ -832,6 +859,11 @@ struct perf_event {
 	struct event_filter		*filter;
 #endif

+#ifdef CONFIG_CGROUP_PERF
+	struct perf_cgroup		*cgrp; /* cgroup event is attach to */
+	int				cgrp_defer_enabled;
+#endif
+
 #endif /* CONFIG_PERF_EVENTS */
 };

@@ -886,6 +918,7 @@ struct perf_event_context {
 	u64				generation;
 	int				pin_count;
 	struct rcu_head			rcu_head;
+	int				nr_cgroups; /* cgroup events present */
 };

 /*

@@ -905,6 +938,9 @@ struct perf_cpu_context {
 	struct list_head		rotation_list;
 	int				jiffies_interval;
 	struct pmu			*active_pmu;
+#ifdef CONFIG_CGROUP_PERF
+	struct perf_cgroup		*cgrp;
+#endif
 };

 struct perf_output_handle {

@@ -1040,11 +1076,11 @@ perf_sw_event(u32 event_id, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
 		__perf_sw_event(event_id, nr, nmi, regs, addr);
 }

-extern atomic_t perf_task_events;
+extern atomic_t perf_sched_events;

 static inline void perf_event_task_sched_in(struct task_struct *task)
 {
-	COND_STMT(&perf_task_events, __perf_event_task_sched_in(task));
+	COND_STMT(&perf_sched_events, __perf_event_task_sched_in(task));
 }

 static inline
@@ -1052,7 +1088,7 @@ void perf_event_task_sched_out(struct task_struct *task, struct task_struct *nex
 {
 	perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);

-	COND_STMT(&perf_task_events, __perf_event_task_sched_out(task, next));
+	COND_STMT(&perf_sched_events, __perf_event_task_sched_out(task, next));
 }

 extern void perf_event_mmap(struct vm_area_struct *vma);

@@ -1083,6 +1119,10 @@ extern int sysctl_perf_event_paranoid;
 extern int sysctl_perf_event_mlock;
 extern int sysctl_perf_event_sample_rate;

+extern int perf_proc_update_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *lenp,
+		loff_t *ppos);
+
 static inline bool perf_paranoid_tracepoint_raw(void)
 {
 	return sysctl_perf_event_paranoid > -1;
......
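The anonymous unions in perf_event_attr above keep the ABI layout unchanged while letting a raw hardware event carry two extra configuration words; this merge uses config1 to feed the new Nehalem/Westmere extra_regs. A hedged user-space sketch of the idea (the event and mask values are placeholders, not real model-specific encodings; perf_event_open has no libc wrapper, hence the raw syscall; a header predating this change will not have the config1 member):

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int open_raw_event(unsigned long long config,
			  unsigned long long extra)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size   = sizeof(attr);
	attr.type   = PERF_TYPE_RAW;
	attr.config = config;	/* event select + umask (placeholder) */
	attr.config1 = extra;	/* aliases bp_addr; extra-reg payload */

	/* pid = 0 (self), cpu = -1 (any), no group fd, no flags */
	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}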
@@ -100,6 +100,8 @@ void ring_buffer_free(struct ring_buffer *buffer);

 int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size);

+void ring_buffer_change_overwrite(struct ring_buffer *buffer, int val);
+
 struct ring_buffer_event *ring_buffer_lock_reserve(struct ring_buffer *buffer,
 						   unsigned long length);
 int ring_buffer_unlock_commit(struct ring_buffer *buffer,
......
@@ -2578,13 +2578,6 @@ static inline void inc_syscw(struct task_struct *tsk)
 #define TASK_SIZE_OF(tsk)	TASK_SIZE
 #endif

-/*
- * Call the function if the target task is executing on a CPU right now:
- */
-extern void task_oncpu_function_call(struct task_struct *p,
-				     void (*func) (void *info), void *info);
-
 #ifdef CONFIG_MM_OWNER
 extern void mm_update_next_owner(struct mm_struct *mm);
 extern void mm_init_owner(struct mm_struct *mm, struct task_struct *p);
......
@@ -133,11 +133,11 @@ extern struct trace_event_functions exit_syscall_print_funcs;
 		.class			= &event_class_syscall_enter, \
 		.event.funcs		= &enter_syscall_print_funcs, \
 		.data			= (void *)&__syscall_meta_##sname,\
+		.flags			= TRACE_EVENT_FL_CAP_ANY, \
 	}; \
 	static struct ftrace_event_call __used \
 	  __attribute__((section("_ftrace_events"))) \
-	 *__event_enter_##sname = &event_enter_##sname; \
-	__TRACE_EVENT_FLAGS(enter_##sname, TRACE_EVENT_FL_CAP_ANY)
+	 *__event_enter_##sname = &event_enter_##sname;

 #define SYSCALL_TRACE_EXIT_EVENT(sname) \
 	static struct syscall_metadata __syscall_meta_##sname; \

@@ -147,11 +147,11 @@ extern struct trace_event_functions exit_syscall_print_funcs;
 		.class			= &event_class_syscall_exit, \
 		.event.funcs		= &exit_syscall_print_funcs, \
 		.data			= (void *)&__syscall_meta_##sname,\
+		.flags			= TRACE_EVENT_FL_CAP_ANY, \
 	}; \
 	static struct ftrace_event_call __used \
 	  __attribute__((section("_ftrace_events"))) \
-	 *__event_exit_##sname = &event_exit_##sname; \
-	__TRACE_EVENT_FLAGS(exit_##sname, TRACE_EVENT_FL_CAP_ANY)
+	 *__event_exit_##sname = &event_exit_##sname;

 #define SYSCALL_METADATA(sname, nb) \
 	SYSCALL_TRACE_ENTER_EVENT(sname); \

@@ -159,6 +159,7 @@ extern struct trace_event_functions exit_syscall_print_funcs;
 	static struct syscall_metadata __used \
 	  __syscall_meta_##sname = { \
 		.name		= "sys"#sname, \
+		.syscall_nr	= -1,	/* Filled in at boot */ \
 		.nb_args	= nb, \
 		.types		= types_##sname, \
 		.args		= args_##sname, \

@@ -176,6 +177,7 @@ extern struct trace_event_functions exit_syscall_print_funcs;
 	static struct syscall_metadata __used \
 	  __syscall_meta__##sname = { \
 		.name		= "sys_"#sname, \
+		.syscall_nr	= -1,	/* Filled in at boot */ \
 		.nb_args	= 0, \
 		.enter_event	= &event_enter__##sname, \
 		.exit_event	= &event_exit__##sname, \
......
@@ -17,36 +17,36 @@ TRACE_EVENT(mce_record,

 	TP_STRUCT__entry(
 		__field( u64, mcgcap    )
 		__field( u64, mcgstatus )
-		__field( u8,  bank      )
 		__field( u64, status    )
 		__field( u64, addr      )
 		__field( u64, misc      )
 		__field( u64, ip        )
-		__field( u8,  cs        )
 		__field( u64, tsc       )
 		__field( u64, walltime  )
 		__field( u32, cpu       )
 		__field( u32, cpuid     )
 		__field( u32, apicid    )
 		__field( u32, socketid  )
+		__field( u8,  cs        )
+		__field( u8,  bank      )
 		__field( u8,  cpuvendor )
 	),

 	TP_fast_assign(
 		__entry->mcgcap		= m->mcgcap;
 		__entry->mcgstatus	= m->mcgstatus;
-		__entry->bank		= m->bank;
 		__entry->status		= m->status;
 		__entry->addr		= m->addr;
 		__entry->misc		= m->misc;
 		__entry->ip		= m->ip;
-		__entry->cs		= m->cs;
 		__entry->tsc		= m->tsc;
 		__entry->walltime	= m->time;
 		__entry->cpu		= m->extcpu;
 		__entry->cpuid		= m->cpuid;
 		__entry->apicid		= m->apicid;
 		__entry->socketid	= m->socketid;
+		__entry->cs		= m->cs;
+		__entry->bank		= m->bank;
 		__entry->cpuvendor	= m->cpuvendor;
 	),
......
@@ -108,14 +108,14 @@ TRACE_EVENT(module_request,
 	TP_ARGS(name, wait, ip),

 	TP_STRUCT__entry(
-		__field( bool,          wait )
 		__field( unsigned long, ip   )
+		__field( bool,          wait )
 		__string( name, name )
 	),

 	TP_fast_assign(
-		__entry->wait	= wait;
 		__entry->ip	= ip;
+		__entry->wait	= wait;
 		__assign_str(name, name);
 	),

@@ -129,4 +129,3 @@ TRACE_EVENT(module_request,

 /* This part must be outside protection */
 #include <trace/define_trace.h>
@@ -19,14 +19,14 @@ TRACE_EVENT(kfree_skb,

 	TP_STRUCT__entry(
 		__field( void *,         skbaddr  )
-		__field( unsigned short, protocol )
 		__field( void *,         location )
+		__field( unsigned short, protocol )
 	),

 	TP_fast_assign(
 		__entry->skbaddr = skb;
-		__entry->protocol = ntohs(skb->protocol);
 		__entry->location = location;
+		__entry->protocol = ntohs(skb->protocol);
 	),

 	TP_printk("skbaddr=%p protocol=%u location=%p",
......
@@ -695,6 +695,16 @@ config CGROUP_MEM_RES_CTLR_SWAP_ENABLED
	  select this option (if, for some reason, they need to disable it
	  then noswapaccount does the trick).

+config CGROUP_PERF
+	bool "Enable perf_event per-cpu per-container group (cgroup) monitoring"
+	depends on PERF_EVENTS && CGROUPS
+	help
+	  This option extends the per-cpu mode to restrict monitoring to
+	  threads which belong to the cgroup specified and run on the
+	  designated cpu.
+
+	  Say N if unsure.
+
 menuconfig CGROUP_SCHED
	bool "Group CPU scheduler"
	depends on EXPERIMENTAL
......
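Putting the new pieces together — CONFIG_CGROUP_PERF above, SUBSYS(perf) and PERF_FLAG_PID_CGROUP earlier — per-cgroup counting from user space looks roughly like the sketch below. The cgroup mount point and group name are assumptions about local setup, the flag is redefined only because a pre-merge libc header will not know it, and per the help text the cpu argument must name a real CPU:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

#define PERF_FLAG_PID_CGROUP (1U << 2) /* from the perf_event.h hunk above */

int main(void)
{
	struct perf_event_attr attr;
	int cgrp_fd, ev_fd;

	/* fd on the cgroup directory; path is a local-setup assumption */
	cgrp_fd = open("/sys/fs/cgroup/perf_event/mygroup", O_RDONLY);
	if (cgrp_fd < 0)
		return 1;

	memset(&attr, 0, sizeof(attr));
	attr.size   = sizeof(attr);
	attr.type   = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;

	/* count cycles for tasks of "mygroup" while they run on CPU 0 */
	ev_fd = syscall(__NR_perf_event_open, &attr, cgrp_fd, 0, -1,
			PERF_FLAG_PID_CGROUP);
	return ev_fd < 0 ? 1 : 0;
}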
@@ -4230,20 +4230,8 @@ void cgroup_post_fork(struct task_struct *child)
  */
 void cgroup_exit(struct task_struct *tsk, int run_callbacks)
 {
-	int i;
 	struct css_set *cg;
+	int i;

-	if (run_callbacks && need_forkexit_callback) {
-		/*
-		 * modular subsystems can't use callbacks, so no need to lock
-		 * the subsys array
-		 */
-		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
-			struct cgroup_subsys *ss = subsys[i];
-			if (ss->exit)
-				ss->exit(ss, tsk);
-		}
-	}
-
 	/*
 	 * Unlink from the css_set task list if necessary.

@@ -4261,7 +4249,24 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
 	task_lock(tsk);
 	cg = tsk->cgroups;
 	tsk->cgroups = &init_css_set;
+
+	if (run_callbacks && need_forkexit_callback) {
+		/*
+		 * modular subsystems can't use callbacks, so no need to lock
+		 * the subsys array
+		 */
+		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+			struct cgroup_subsys *ss = subsys[i];
+			if (ss->exit) {
+				struct cgroup *old_cgrp =
+					rcu_dereference_raw(cg->subsys[i])->cgroup;
+				struct cgroup *cgrp = task_cgroup(tsk, i);
+				ss->exit(ss, cgrp, old_cgrp, tsk);
+			}
+		}
+	}
 	task_unlock(tsk);
+
 	if (cg)
 		put_css_set_taskexit(cg);
 }

@@ -4813,6 +4818,29 @@ css_get_next(struct cgroup_subsys *ss, int id,
 	return ret;
 }

+/*
+ * get corresponding css from file open on cgroupfs directory
+ */
+struct cgroup_subsys_state *cgroup_css_from_dir(struct file *f, int id)
+{
+	struct cgroup *cgrp;
+	struct inode *inode;
+	struct cgroup_subsys_state *css;
+
+	inode = f->f_dentry->d_inode;
+	/* check in cgroup filesystem dir */
+	if (inode->i_op != &cgroup_dir_inode_operations)
+		return ERR_PTR(-EBADF);
+
+	if (id < 0 || id >= CGROUP_SUBSYS_COUNT)
+		return ERR_PTR(-EINVAL);
+
+	/* get cgroup */
+	cgrp = __d_cgrp(f->f_dentry);
+	css = cgrp->subsys[id];
+	return css ? css : ERR_PTR(-ENOENT);
+}
+
 #ifdef CONFIG_CGROUP_DEBUG
 static struct cgroup_subsys_state *debug_create(struct cgroup_subsys *ss,
 						   struct cgroup *cont)
......
(This diff is collapsed.)
@@ -606,9 +606,6 @@ static inline struct task_group *task_group(struct task_struct *p)
 	struct task_group *tg;
 	struct cgroup_subsys_state *css;

-	if (p->flags & PF_EXITING)
-		return &root_task_group;
-
 	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
 			lockdep_is_held(&task_rq(p)->lock));
 	tg = container_of(css, struct task_group, css);

@@ -2265,27 +2262,6 @@ void kick_process(struct task_struct *p)
 EXPORT_SYMBOL_GPL(kick_process);
 #endif /* CONFIG_SMP */

-/**
- * task_oncpu_function_call - call a function on the cpu on which a task runs
- * @p:		the task to evaluate
- * @func:	the function to be called
- * @info:	the function call argument
- *
- * Calls the function @func when the task is currently running. This might
- * be on the current CPU, which just calls the function directly
- */
-void task_oncpu_function_call(struct task_struct *p,
-			      void (*func) (void *info), void *info)
-{
-	int cpu;
-
-	preempt_disable();
-	cpu = task_cpu(p);
-	if (task_curr(p))
-		smp_call_function_single(cpu, func, info, 1);
-	preempt_enable();
-}
-
 #ifdef CONFIG_SMP
 /*
  * ->cpus_allowed is protected by either TASK_WAKING or rq->lock held.

@@ -2776,9 +2752,12 @@ static inline void
 prepare_task_switch(struct rq *rq, struct task_struct *prev,
 		    struct task_struct *next)
 {
+	sched_info_switch(prev, next);
+	perf_event_task_sched_out(prev, next);
 	fire_sched_out_preempt_notifiers(prev, next);
 	prepare_lock_switch(rq, next);
 	prepare_arch_switch(next);
+	trace_sched_switch(prev, next);
 }

 /**

@@ -2911,7 +2890,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	struct mm_struct *mm, *oldmm;

 	prepare_task_switch(rq, prev, next);
-	trace_sched_switch(prev, next);
+
 	mm = next->mm;
 	oldmm = prev->active_mm;
 	/*

@@ -3989,9 +3968,6 @@ asmlinkage void __sched schedule(void)
 	rq->skip_clock_update = 0;

 	if (likely(prev != next)) {
-		sched_info_switch(prev, next);
-		perf_event_task_sched_out(prev, next);
-
 		rq->nr_switches++;
 		rq->curr = next;
 		++*switch_count;

@@ -5572,7 +5548,7 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
 	 * The idle tasks have their own, simple scheduling class:
 	 */
 	idle->sched_class = &idle_sched_class;
-	ftrace_graph_init_task(idle);
+	ftrace_graph_init_idle_task(idle, cpu);
 }

 /*

@@ -8885,7 +8861,8 @@ cpu_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
 }

 static void
-cpu_cgroup_exit(struct cgroup_subsys *ss, struct task_struct *task)
+cpu_cgroup_exit(struct cgroup_subsys *ss, struct cgroup *cgrp,
+		struct cgroup *old_cgrp, struct task_struct *task)
 {
 	/*
 	 * cgroup_exit() is called in the copy_process() failure path.
......
@@ -948,7 +948,7 @@ static struct ctl_table kern_table[] = {
 		.data		= &sysctl_perf_event_sample_rate,
 		.maxlen		= sizeof(sysctl_perf_event_sample_rate),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= perf_proc_update_handler,
 	},
 #endif
 #ifdef CONFIG_KMEMCHECK
......
@@ -3328,7 +3328,7 @@ static int start_graph_tracing(void)
 	/* The cpu_boot init_task->ret_stack will never be freed */
 	for_each_online_cpu(cpu) {
 		if (!idle_task(cpu)->ret_stack)
-			ftrace_graph_init_task(idle_task(cpu));
+			ftrace_graph_init_idle_task(idle_task(cpu), cpu);
 	}

 	do {

@@ -3418,6 +3418,49 @@ void unregister_ftrace_graph(void)
 	mutex_unlock(&ftrace_lock);
 }

+static DEFINE_PER_CPU(struct ftrace_ret_stack *, idle_ret_stack);
+
+static void
+graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
+{
+	atomic_set(&t->tracing_graph_pause, 0);
+	atomic_set(&t->trace_overrun, 0);
+	t->ftrace_timestamp = 0;
+	/* make curr_ret_stack visable before we add the ret_stack */
+	smp_wmb();
+	t->ret_stack = ret_stack;
+}
+
+/*
+ * Allocate a return stack for the idle task. May be the first
+ * time through, or it may be done by CPU hotplug online.
+ */
+void ftrace_graph_init_idle_task(struct task_struct *t, int cpu)
+{
+	t->curr_ret_stack = -1;
+	/*
+	 * The idle task has no parent, it either has its own
+	 * stack or no stack at all.
+	 */
+	if (t->ret_stack)
+		WARN_ON(t->ret_stack != per_cpu(idle_ret_stack, cpu));
+
+	if (ftrace_graph_active) {
+		struct ftrace_ret_stack *ret_stack;
+
+		ret_stack = per_cpu(idle_ret_stack, cpu);
+		if (!ret_stack) {
+			ret_stack = kmalloc(FTRACE_RETFUNC_DEPTH
+					    * sizeof(struct ftrace_ret_stack),
+					    GFP_KERNEL);
+			if (!ret_stack)
+				return;
+			per_cpu(idle_ret_stack, cpu) = ret_stack;
+		}
+		graph_init_task(t, ret_stack);
+	}
+}
+
 /* Allocate a return stack for newly created task */
 void ftrace_graph_init_task(struct task_struct *t)
 {

@@ -3433,12 +3476,7 @@ void ftrace_graph_init_task(struct task_struct *t)
 						GFP_KERNEL);
 		if (!ret_stack)
 			return;
-		atomic_set(&t->tracing_graph_pause, 0);
-		atomic_set(&t->trace_overrun, 0);
-		t->ftrace_timestamp = 0;
-		/* make curr_ret_stack visable before we add the ret_stack */
-		smp_wmb();
-		t->ret_stack = ret_stack;
+		graph_init_task(t, ret_stack);
 	}
 }
......
@@ -5,7 +5,6 @@
  */
 #include <linux/ring_buffer.h>
 #include <linux/trace_clock.h>
-#include <linux/ftrace_irq.h>
 #include <linux/spinlock.h>
 #include <linux/debugfs.h>
 #include <linux/uaccess.h>

@@ -1429,6 +1428,17 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);

+void ring_buffer_change_overwrite(struct ring_buffer *buffer, int val)
+{
+	mutex_lock(&buffer->mutex);
+	if (val)
+		buffer->flags |= RB_FL_OVERWRITE;
+	else
+		buffer->flags &= ~RB_FL_OVERWRITE;
+	mutex_unlock(&buffer->mutex);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_change_overwrite);
+
 static inline void *
 __rb_data_page_index(struct buffer_data_page *bpage, unsigned index)
 {

@@ -2162,11 +2172,19 @@ rb_reserve_next_event(struct ring_buffer *buffer,
 	if (likely(ts >= cpu_buffer->write_stamp)) {
 		delta = diff;
 		if (unlikely(test_time_stamp(delta))) {
+			int local_clock_stable = 1;
+#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
+			local_clock_stable = sched_clock_stable;
+#endif
 			WARN_ONCE(delta > (1ULL << 59),
-				  KERN_WARNING "Delta way too big! %llu ts=%llu write stamp = %llu\n",
+				  KERN_WARNING "Delta way too big! %llu ts=%llu write stamp = %llu\n%s",
 				  (unsigned long long)delta,
 				  (unsigned long long)ts,
-				  (unsigned long long)cpu_buffer->write_stamp);
+				  (unsigned long long)cpu_buffer->write_stamp,
+				  local_clock_stable ? "" :
+				  "If you just came from a suspend/resume,\n"
+				  "please switch to the trace global clock:\n"
+				  "  echo global > /sys/kernel/debug/tracing/trace_clock\n");
 			add_timestamp = 1;
 		}
 	}
......
@@ -41,8 +41,8 @@
 #include "trace.h"
 #include "trace_output.h"

-#define TRACE_BUFFER_FLAGS	(RB_FL_OVERWRITE)
-
 /*
  * On boot up, the ring buffer is set to the minimum size, so that
  * we do not waste memory on systems that are not using tracing.

@@ -340,7 +338,7 @@ static DECLARE_WAIT_QUEUE_HEAD(trace_wait);
 /* trace_flags holds trace_options default values */
 unsigned long trace_flags = TRACE_ITER_PRINT_PARENT | TRACE_ITER_PRINTK |
 	TRACE_ITER_ANNOTATE | TRACE_ITER_CONTEXT_INFO | TRACE_ITER_SLEEP_TIME |
-	TRACE_ITER_GRAPH_TIME | TRACE_ITER_RECORD_CMD;
+	TRACE_ITER_GRAPH_TIME | TRACE_ITER_RECORD_CMD | TRACE_ITER_OVERWRITE;

 static int trace_stop_count;
 static DEFINE_SPINLOCK(tracing_start_lock);

@@ -425,6 +423,7 @@ static const char *trace_options[] = {
 	"sleep-time",
 	"graph-time",
 	"record-cmd",
+	"overwrite",
 	NULL
 };

@@ -780,6 +779,11 @@ __acquires(kernel_lock)
 		tracing_reset_online_cpus(tr);

 		current_trace = type;

+		/* If we expanded the buffers, make sure the max is expanded too */
+		if (ring_buffer_expanded && type->use_max_tr)
+			ring_buffer_resize(max_tr.buffer, trace_buf_size);
+
 		/* the test is responsible for initializing and enabling */
 		pr_info("Testing tracer %s: ", type->name);
 		ret = type->selftest(type, tr);

@@ -792,6 +796,10 @@ __acquires(kernel_lock)
 		/* Only reset on passing, to avoid touching corrupted buffers */
 		tracing_reset_online_cpus(tr);

+		/* Shrink the max buffer again */
+		if (ring_buffer_expanded && type->use_max_tr)
+			ring_buffer_resize(max_tr.buffer, 1);
+
 		printk(KERN_CONT "PASSED\n");
 	}
 #endif

@@ -1102,7 +1110,6 @@ tracing_generic_entry_update(struct trace_entry *entry, unsigned long flags,

 	entry->preempt_count	= pc & 0xff;
 	entry->pid		= (tsk) ? tsk->pid : 0;
-	entry->lock_depth	= (tsk) ? tsk->lock_depth : 0;
 	entry->flags =
 #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
 		(irqs_disabled_flags(flags) ? TRACE_FLAG_IRQS_OFF : 0) |

@@ -1749,10 +1756,9 @@ static void print_lat_help_header(struct seq_file *m)
 	seq_puts(m, "#                | / _----=> need-resched    \n");
 	seq_puts(m, "#                || / _---=> hardirq/softirq \n");
 	seq_puts(m, "#                ||| / _--=> preempt-depth   \n");
-	seq_puts(m, "#                |||| /_--=> lock-depth       \n");
-	seq_puts(m, "#                |||||/     delay             \n");
-	seq_puts(m, "#  cmd     pid   |||||| time  |   caller      \n");
-	seq_puts(m, "#     \\   /      ||||||   \\   |   /           \n");
+	seq_puts(m, "#                |||| /     delay             \n");
+	seq_puts(m, "#  cmd     pid   ||||| time  |   caller      \n");
+	seq_puts(m, "#     \\   /      |||||  \\    |   /           \n");
 }

 static void print_func_help_header(struct seq_file *m)

@@ -2529,6 +2535,9 @@ static void set_tracer_flags(unsigned int mask, int enabled)
 	if (mask == TRACE_ITER_RECORD_CMD)
 		trace_event_enable_cmd_record(enabled);
+
+	if (mask == TRACE_ITER_OVERWRITE)
+		ring_buffer_change_overwrite(global_trace.buffer, enabled);
 }

 static ssize_t

@@ -2710,6 +2719,10 @@ tracing_ctrl_write(struct file *filp, const char __user *ubuf,

 	mutex_lock(&trace_types_lock);
 	if (tracer_enabled ^ val) {
+
+		/* Only need to warn if this is used to change the state */
+		WARN_ONCE(1, "tracing_enabled is deprecated. Use tracing_on");
+
 		if (val) {
 			tracer_enabled = 1;
 			if (current_trace->start)

@@ -4551,9 +4564,11 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 __init static int tracer_alloc_buffers(void)
 {
 	int ring_buf_size;
+	enum ring_buffer_flags rb_flags;
 	int i;
 	int ret = -ENOMEM;

 	if (!alloc_cpumask_var(&tracing_buffer_mask, GFP_KERNEL))
 		goto out;

@@ -4566,12 +4581,13 @@ __init static int tracer_alloc_buffers(void)
 	else
 		ring_buf_size = 1;

+	rb_flags = trace_flags & TRACE_ITER_OVERWRITE ? RB_FL_OVERWRITE : 0;
+
 	cpumask_copy(tracing_buffer_mask, cpu_possible_mask);
 	cpumask_copy(tracing_cpumask, cpu_all_mask);

 	/* TODO: make the number of buffers hot pluggable with CPUS */
-	global_trace.buffer = ring_buffer_alloc(ring_buf_size,
-						   TRACE_BUFFER_FLAGS);
+	global_trace.buffer = ring_buffer_alloc(ring_buf_size, rb_flags);
 	if (!global_trace.buffer) {
 		printk(KERN_ERR "tracer: failed to allocate ring buffer!\n");
 		WARN_ON(1);

@@ -4581,7 +4597,7 @@ __init static int tracer_alloc_buffers(void)
 #ifdef CONFIG_TRACER_MAX_TRACE
-	max_tr.buffer = ring_buffer_alloc(1, TRACE_BUFFER_FLAGS);
+	max_tr.buffer = ring_buffer_alloc(1, rb_flags);
 	if (!max_tr.buffer) {
 		printk(KERN_ERR "tracer: failed to allocate max ring buffer!\n");
 		WARN_ON(1);
......
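The "overwrite" option added above is wired from set_tracer_flags() through the new ring_buffer_change_overwrite(): with the flag set (the default) the buffer acts as a flight recorder that discards the oldest events, and with it cleared the producer stops once the buffer is full. From user space the toggle is just a write to the options file; a small sketch (the debugfs mount point is the conventional one, adjust if yours differs):

#include <fcntl.h>
#include <unistd.h>

/* returns 0 on success; ends up in ring_buffer_change_overwrite() */
static int set_trace_overwrite(int on)
{
	int fd = open("/sys/kernel/debug/tracing/options/overwrite", O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, on ? "1" : "0", 1);
	close(fd);
	return n == 1 ? 0 : -1;
}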
@@ -272,8 +272,8 @@ struct tracer {
 	/* If you handled the flag setting, return 0 */
 	int			(*set_flag)(u32 old_flags, u32 bit, int set);
 	struct tracer		*next;
-	int			print_max;
 	struct tracer_flags	*flags;
+	int			print_max;
 	int			use_max_tr;
 };

@@ -606,6 +606,7 @@ enum trace_iterator_flags {
 	TRACE_ITER_SLEEP_TIME		= 0x40000,
 	TRACE_ITER_GRAPH_TIME		= 0x80000,
 	TRACE_ITER_RECORD_CMD		= 0x100000,
+	TRACE_ITER_OVERWRITE		= 0x200000,
 };

 /*

@@ -661,8 +662,10 @@ struct ftrace_event_field {
 };

 struct event_filter {
-	int			n_preds;
-	struct filter_pred	**preds;
+	int			n_preds;	/* Number assigned */
+	int			a_preds;	/* allocated */
+	struct filter_pred	*preds;
+	struct filter_pred	*root;
 	char			*filter_string;
 };

@@ -674,11 +677,23 @@ struct event_subsystem {
 	int			nr_events;
 };

+#define FILTER_PRED_INVALID	((unsigned short)-1)
+#define FILTER_PRED_IS_RIGHT	(1 << 15)
+#define FILTER_PRED_FOLD	(1 << 15)
+
+/*
+ * The max preds is the size of unsigned short with
+ * two flags at the MSBs. One bit is used for both the IS_RIGHT
+ * and FOLD flags. The other is reserved.
+ *
+ * 2^14 preds is way more than enough.
+ */
+#define MAX_FILTER_PRED		16384
+
 struct filter_pred;
 struct regex;

-typedef int (*filter_pred_fn_t) (struct filter_pred *pred, void *event,
-				 int val1, int val2);
+typedef int (*filter_pred_fn_t) (struct filter_pred *pred, void *event);

 typedef int (*regex_match_func)(char *str, struct regex *r, int len);

@@ -700,11 +715,23 @@ struct filter_pred {
 	filter_pred_fn_t	fn;
 	u64			val;
 	struct regex		regex;
-	char			*field_name;
+	/*
+	 * Leaf nodes use field_name, ops is used by AND and OR
+	 * nodes. The field_name is always freed when freeing a pred.
+	 * We can overload field_name for ops and have it freed
+	 * as well.
+	 */
+	union {
+		char		*field_name;
+		unsigned short	*ops;
+	};
 	int			offset;
 	int			not;
 	int			op;
-	int			pop_n;
+	unsigned short		index;
+	unsigned short		parent;
+	unsigned short		left;
+	unsigned short		right;
 };

 extern struct list_head ftrace_common_fields;
......
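With this change the filter predicates form a binary tree (leaf compares plus AND/OR nodes addressed by the index/parent/left/right fields) instead of a stack popped via pop_n. A small self-contained sketch of the index packing implied by the flag definitions above; the variable names are illustrative only:

        unsigned short enc, idx;

        enc = 100 | FILTER_PRED_IS_RIGHT;	/* index 100, reached as a right child */
        idx = enc & ~FILTER_PRED_IS_RIGHT;	/* strip the flag to recover 100 */
        /* Bit 15 is shared by IS_RIGHT/FOLD and one more bit is reserved,
         * which is why usable indices stop at MAX_FILTER_PRED (2^14). */
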
@@ -109,12 +109,12 @@ FTRACE_ENTRY(funcgraph_exit, ftrace_graph_ret_entry,
  */
 #define FTRACE_CTX_FIELDS					\
 	__field(	unsigned int,	prev_pid	)	\
+	__field(	unsigned int,	next_pid	)	\
+	__field(	unsigned int,	next_cpu	)	\
 	__field(	unsigned char,	prev_prio	)	\
 	__field(	unsigned char,	prev_state	)	\
-	__field(	unsigned int,	next_pid	)	\
 	__field(	unsigned char,	next_prio	)	\
-	__field(	unsigned char,	next_state	)	\
-	__field(	unsigned int,	next_cpu	)
+	__field(	unsigned char,	next_state	)

 FTRACE_ENTRY(context_switch, ctx_switch_entry,

...
@@ -116,7 +116,6 @@ static int trace_define_common_fields(void)
 	__common_field(unsigned char, flags);
 	__common_field(unsigned char, preempt_count);
 	__common_field(int, pid);
-	__common_field(int, lock_depth);

 	return ret;
 }

@@ -326,6 +325,7 @@ int trace_set_clr_event(const char *system, const char *event, int set)
 {
 	return __ftrace_set_clr_event(NULL, system, event, set);
 }
+EXPORT_SYMBOL_GPL(trace_set_clr_event);

 /* 128 should be much more than enough */
 #define EVENT_BUF_SIZE		127

...
This diff is collapsed.
@@ -353,6 +353,43 @@ static __kprobes void free_deref_fetch_param(struct deref_fetch_param *data)
 	kfree(data);
 }

+/* Bitfield fetch function */
+struct bitfield_fetch_param {
+	struct fetch_param	orig;
+	unsigned char		hi_shift;
+	unsigned char		low_shift;
+};
+
+#define DEFINE_FETCH_bitfield(type)					\
+static __kprobes void FETCH_FUNC_NAME(bitfield, type)(struct pt_regs *regs,\
+					    void *data, void *dest)	\
+{									\
+	struct bitfield_fetch_param *bprm = data;			\
+	type buf = 0;							\
+	call_fetch(&bprm->orig, regs, &buf);				\
+	if (buf) {							\
+		buf <<= bprm->hi_shift;					\
+		buf >>= bprm->low_shift;				\
+	}								\
+	*(type *)dest = buf;						\
+}
+
+DEFINE_BASIC_FETCH_FUNCS(bitfield)
+#define fetch_bitfield_string NULL
+#define fetch_bitfield_string_size NULL
+
+static __kprobes void
+free_bitfield_fetch_param(struct bitfield_fetch_param *data)
+{
+	/*
+	 * Don't check the bitfield itself, because this must be the
+	 * last fetch function.
+	 */
+	if (CHECK_FETCH_FUNCS(deref, data->orig.fn))
+		free_deref_fetch_param(data->orig.data);
+	else if (CHECK_FETCH_FUNCS(symbol, data->orig.fn))
+		free_symbol_cache(data->orig.data);
+	kfree(data);
+}
+
 /* Default (unsigned long) fetch type */
 #define __DEFAULT_FETCH_TYPE(t) u##t
 #define _DEFAULT_FETCH_TYPE(t) __DEFAULT_FETCH_TYPE(t)
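The two shifts in the fetch function above implement bitfield extraction without a mask: shift left to drop the bits above the field, then right to drop the bits below it. A standalone userspace illustration of the same arithmetic for a 4-bit field at bit offset 8 of a 32-bit value (the input value is just an example):

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
                uint32_t word = 0xabcdef12;
                unsigned char hi_shift  = 32 - (4 + 8); /* container bits - (width + offset) */
                unsigned char low_shift = hi_shift + 8; /* hi_shift + offset */

                uint32_t buf = word;
                buf <<= hi_shift;       /* bits above the field fall off the top */
                buf >>= low_shift;      /* bits below fall off; field lands at bit 0 */
                printf("0x%x\n", buf);  /* bits 8..11 of 0xabcdef12 -> prints 0xf */
                return 0;
        }
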
@@ -367,6 +404,7 @@ enum {
 	FETCH_MTD_memory,
 	FETCH_MTD_symbol,
 	FETCH_MTD_deref,
+	FETCH_MTD_bitfield,
 	FETCH_MTD_END,
 };

@@ -387,6 +425,7 @@ ASSIGN_FETCH_FUNC(retval, ftype),			\
 ASSIGN_FETCH_FUNC(memory, ftype),			\
 ASSIGN_FETCH_FUNC(symbol, ftype),			\
 ASSIGN_FETCH_FUNC(deref, ftype),			\
+ASSIGN_FETCH_FUNC(bitfield, ftype),			\
 	  }						\
 	}
@@ -430,9 +469,33 @@ static const struct fetch_type *find_fetch_type(const char *type)
 	if (!type)
 		type = DEFAULT_FETCH_TYPE_STR;

+	/* Special case: bitfield */
+	if (*type == 'b') {
+		unsigned long bs;
+		type = strchr(type, '/');
+		if (!type)
+			goto fail;
+		type++;
+		if (strict_strtoul(type, 0, &bs))
+			goto fail;
+		switch (bs) {
+		case 8:
+			return find_fetch_type("u8");
+		case 16:
+			return find_fetch_type("u16");
+		case 32:
+			return find_fetch_type("u32");
+		case 64:
+			return find_fetch_type("u64");
+		default:
+			goto fail;
+		}
+	}
+
 	for (i = 0; i < ARRAY_SIZE(fetch_type_table); i++)
 		if (strcmp(type, fetch_type_table[i].name) == 0)
 			return &fetch_type_table[i];
+fail:
 	return NULL;
 }
@@ -586,7 +649,9 @@ static struct trace_probe *alloc_trace_probe(const char *group,
 static void free_probe_arg(struct probe_arg *arg)
 {
-	if (CHECK_FETCH_FUNCS(deref, arg->fetch.fn))
+	if (CHECK_FETCH_FUNCS(bitfield, arg->fetch.fn))
+		free_bitfield_fetch_param(arg->fetch.data);
+	else if (CHECK_FETCH_FUNCS(deref, arg->fetch.fn))
 		free_deref_fetch_param(arg->fetch.data);
 	else if (CHECK_FETCH_FUNCS(symbol, arg->fetch.fn))
 		free_symbol_cache(arg->fetch.data);
@@ -767,16 +832,15 @@ static int __parse_probe_arg(char *arg, const struct fetch_type *t,
 		}
 		break;

 	case '+':	/* deref memory */
+		arg++;	/* Skip '+', because strict_strtol() rejects it. */
 	case '-':
 		tmp = strchr(arg, '(');
 		if (!tmp)
 			break;
 		*tmp = '\0';
-		ret = strict_strtol(arg + 1, 0, &offset);
+		ret = strict_strtol(arg, 0, &offset);
 		if (ret)
 			break;
-		if (arg[0] == '-')
-			offset = -offset;
 		arg = tmp + 1;
 		tmp = strrchr(arg, ')');
 		if (tmp) {
@@ -807,6 +871,41 @@ static int __parse_probe_arg(char *arg, const struct fetch_type *t,
 	return ret;
 }

+#define BYTES_TO_BITS(nb)	((BITS_PER_LONG * (nb)) / sizeof(long))
+
+/* Bitfield type needs to be parsed into a fetch function */
+static int __parse_bitfield_probe_arg(const char *bf,
+				      const struct fetch_type *t,
+				      struct fetch_param *f)
+{
+	struct bitfield_fetch_param *bprm;
+	unsigned long bw, bo;
+	char *tail;
+
+	if (*bf != 'b')
+		return 0;
+
+	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
+	if (!bprm)
+		return -ENOMEM;
+	bprm->orig = *f;
+	f->fn = t->fetch[FETCH_MTD_bitfield];
+	f->data = (void *)bprm;
+
+	bw = simple_strtoul(bf + 1, &tail, 0);	/* Use simple one */
+	if (bw == 0 || *tail != '@')
+		return -EINVAL;
+
+	bf = tail + 1;
+	bo = simple_strtoul(bf, &tail, 0);
+	if (tail == bf || *tail != '/')
+		return -EINVAL;
+
+	bprm->hi_shift = BYTES_TO_BITS(t->size) - (bw + bo);
+	bprm->low_shift = bprm->hi_shift + bo;
+
+	return (BYTES_TO_BITS(t->size) < (bw + bo)) ? -EINVAL : 0;
+}
+
 /* String length checking wrapper */
 static int parse_probe_arg(char *arg, struct trace_probe *tp,
 			   struct probe_arg *parg, int is_return)
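For a concrete feel of the parse: the spec 'b4@8/32' yields bw = 4 and bo = 8 against a 32-bit container, so hi_shift = 32 - (4 + 8) = 20 and low_shift = 20 + 8 = 28, and since 32 >= 4 + 8 the final range check passes. Note also that BYTES_TO_BITS(nb) arithmetically reduces to 8 * nb; it is written in terms of BITS_PER_LONG and sizeof(long) so it needs no extra header for a bits-per-byte constant.
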
@@ -836,6 +935,8 @@ static int parse_probe_arg(char *arg, struct trace_probe *tp,
 	parg->offset = tp->size;
 	tp->size += parg->type->size;
 	ret = __parse_probe_arg(arg, parg->type, &parg->fetch, is_return);
+	if (ret >= 0 && t != NULL)
+		ret = __parse_bitfield_probe_arg(t, parg->type, &parg->fetch);
 	if (ret >= 0) {
 		parg->fetch_size.fn = get_fetch_size_function(parg->type,
 							      parg->fetch.fn);

@@ -1130,7 +1231,7 @@ static int command_trace_probe(const char *buf)
 	return ret;
 }

-#define WRITE_BUFSIZE 128
+#define WRITE_BUFSIZE 4096

 static ssize_t probes_write(struct file *file, const char __user *buffer,
 			    size_t count, loff_t *ppos)

...
@@ -529,24 +529,34 @@ seq_print_ip_sym(struct trace_seq *s, unsigned long ip, unsigned long sym_flags)
  * @entry: The trace entry field from the ring buffer
  *
  * Prints the generic fields of irqs off, in hard or softirq, preempt
- * count and lock depth.
+ * count.
  */
 int trace_print_lat_fmt(struct trace_seq *s, struct trace_entry *entry)
 {
-	int hardirq, softirq;
+	char hardsoft_irq;
+	char need_resched;
+	char irqs_off;
+	int hardirq;
+	int softirq;
 	int ret;

 	hardirq = entry->flags & TRACE_FLAG_HARDIRQ;
 	softirq = entry->flags & TRACE_FLAG_SOFTIRQ;

+	irqs_off =
+		(entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' :
+		(entry->flags & TRACE_FLAG_IRQS_NOSUPPORT) ? 'X' :
+		'.';
+	need_resched =
+		(entry->flags & TRACE_FLAG_NEED_RESCHED) ? 'N' : '.';
+	hardsoft_irq =
+		(hardirq && softirq) ? 'H' :
+		hardirq ? 'h' :
+		softirq ? 's' :
+		'.';
+
 	if (!trace_seq_printf(s, "%c%c%c",
-			      (entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' :
-				(entry->flags & TRACE_FLAG_IRQS_NOSUPPORT) ?
-				  'X' : '.',
-			      (entry->flags & TRACE_FLAG_NEED_RESCHED) ?
-				'N' : '.',
-			      (hardirq && softirq) ? 'H' :
-				hardirq ? 'h' : softirq ? 's' : '.'))
+			      irqs_off, need_resched, hardsoft_irq))
 		return 0;

 	if (entry->preempt_count)
@@ -554,13 +564,7 @@ int trace_print_lat_fmt(struct trace_seq *s, struct trace_entry *entry)
 	else
 		ret = trace_seq_putc(s, '.');

-	if (!ret)
-		return 0;
-
-	if (entry->lock_depth < 0)
-		return trace_seq_putc(s, '.');
-
-	return trace_seq_printf(s, "%d", entry->lock_depth);
+	return ret;
 }

 static int

...
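With lock_depth gone, the latency-format columns reduce to the three characters built above plus the preempt count. For example, a field of 'd.h1' reads: interrupts disabled, no reschedule pending, in hardirq context, preempt_count of 1; a '.' in the last slot would mean a preempt_count of 0.
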
@@ -247,51 +247,3 @@ void tracing_sched_switch_assign_trace(struct trace_array *tr)
 	ctx_trace = tr;
 }
-
-static void stop_sched_trace(struct trace_array *tr)
-{
-	tracing_stop_sched_switch_record();
-}
-
-static int sched_switch_trace_init(struct trace_array *tr)
-{
-	ctx_trace = tr;
-	tracing_reset_online_cpus(tr);
-	tracing_start_sched_switch_record();
-	return 0;
-}
-
-static void sched_switch_trace_reset(struct trace_array *tr)
-{
-	if (sched_ref)
-		stop_sched_trace(tr);
-}
-
-static void sched_switch_trace_start(struct trace_array *tr)
-{
-	sched_stopped = 0;
-}
-
-static void sched_switch_trace_stop(struct trace_array *tr)
-{
-	sched_stopped = 1;
-}
-
-static struct tracer sched_switch_trace __read_mostly =
-{
-	.name		= "sched_switch",
-	.init		= sched_switch_trace_init,
-	.reset		= sched_switch_trace_reset,
-	.start		= sched_switch_trace_start,
-	.stop		= sched_switch_trace_stop,
-	.wait_pipe	= poll_wait_pipe,
-#ifdef CONFIG_FTRACE_SELFTEST
-	.selftest	= trace_selftest_startup_sched_switch,
-#endif
-};
-
-__init static int init_sched_switch_trace(void)
-{
-	return register_tracer(&sched_switch_trace);
-}
-device_initcall(init_sched_switch_trace);
@@ -60,6 +60,19 @@ extern struct syscall_metadata *__stop_syscalls_metadata[];

 static struct syscall_metadata **syscalls_metadata;

+#ifndef ARCH_HAS_SYSCALL_MATCH_SYM_NAME
+static inline bool arch_syscall_match_sym_name(const char *sym, const char *name)
+{
+	/*
+	 * Only compare after the "sys" prefix. Archs that use
+	 * syscall wrappers may have syscalls symbols aliases prefixed
+	 * with "SyS" instead of "sys", leading to an unwanted
+	 * mismatch.
+	 */
+	return !strcmp(sym + 3, name + 3);
+}
+#endif
+
 static __init struct syscall_metadata *
 find_syscall_meta(unsigned long syscall)
 {
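Since only the bytes after the three-character prefix are compared, wrapper aliases match their canonical names. A couple of illustrative calls:

        arch_syscall_match_sym_name("SyS_read", "sys_read");	/* true: "_read" == "_read" */
        arch_syscall_match_sym_name("sys_close", "sys_read");	/* false */
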
@@ -72,14 +85,11 @@ find_syscall_meta(unsigned long syscall)
 	stop = __stop_syscalls_metadata;
 	kallsyms_lookup(syscall, NULL, NULL, NULL, str);

+	if (arch_syscall_match_sym_name(str, "sys_ni_syscall"))
+		return NULL;
+
 	for ( ; start < stop; start++) {
-		/*
-		 * Only compare after the "sys" prefix. Archs that use
-		 * syscall wrappers may have syscalls symbols aliases prefixed
-		 * with "SyS" instead of "sys", leading to an unwanted
-		 * mismatch.
-		 */
-		if ((*start)->name && !strcmp((*start)->name + 3, str + 3))
+		if ((*start)->name && arch_syscall_match_sym_name(str, (*start)->name))
 			return *start;
 	}
 	return NULL;
@@ -359,7 +369,7 @@ int reg_event_syscall_enter(struct ftrace_event_call *call)
 	int num;

 	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (num < 0 || num >= NR_syscalls)
+	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return -ENOSYS;
 	mutex_lock(&syscall_trace_lock);
 	if (!sys_refcount_enter)
@@ -377,7 +387,7 @@ void unreg_event_syscall_enter(struct ftrace_event_call *call)
 	int num;

 	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (num < 0 || num >= NR_syscalls)
+	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return;
 	mutex_lock(&syscall_trace_lock);
 	sys_refcount_enter--;
@@ -393,7 +403,7 @@ int reg_event_syscall_exit(struct ftrace_event_call *call)
 	int num;

 	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (num < 0 || num >= NR_syscalls)
+	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return -ENOSYS;
 	mutex_lock(&syscall_trace_lock);
 	if (!sys_refcount_exit)
@@ -411,7 +421,7 @@ void unreg_event_syscall_exit(struct ftrace_event_call *call)
 	int num;

 	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (num < 0 || num >= NR_syscalls)
+	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return;
 	mutex_lock(&syscall_trace_lock);
 	sys_refcount_exit--;
@@ -424,6 +434,14 @@ void unreg_event_syscall_exit(struct ftrace_event_call *call)
 int init_syscall_trace(struct ftrace_event_call *call)
 {
 	int id;
+	int num;
+
+	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	if (num < 0 || num >= NR_syscalls) {
+		pr_debug("syscall %s metadata not mapped, disabling ftrace event\n",
+				((struct syscall_metadata *)call->data)->name);
+		return -ENOSYS;
+	}

 	if (set_syscall_print_fmt(call) < 0)
 		return -ENOMEM;
@@ -438,7 +456,7 @@ int init_syscall_trace(struct ftrace_event_call *call)
 	return id;
 }

-unsigned long __init arch_syscall_addr(int nr)
+unsigned long __init __weak arch_syscall_addr(int nr)
 {
 	return (unsigned long)sys_call_table[nr];
 }

...
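Making the default __weak lets an architecture with a nonstandard syscall table supply its own resolver instead. A hypothetical override for an arch whose table stores 32-bit entries rather than plain pointers (sketch only; the declaration below is illustrative, not any real arch's):

        extern const unsigned int sys_call_table[];	/* hypothetical 32-bit entries */

        unsigned long __init arch_syscall_addr(int nr)
        {
                return (unsigned long)sys_call_table[nr];
        }
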
 #!/usr/bin/perl -w
 #
-# Copywrite 2005-2009 - Steven Rostedt
+# Copyright 2005-2009 - Steven Rostedt
 # Licensed under the terms of the GNU GPL License version 2
 #
 #  It's simple enough to figure out how this works.

...
@@ -206,7 +206,8 @@ static uint32_t (*w2)(uint16_t);
 static int
 is_mcounted_section_name(char const *const txtname)
 {
 	return 0 == strcmp(".text",          txtname) ||
+		0 == strcmp(".ref.text",      txtname) ||
 		0 == strcmp(".sched.text",    txtname) ||
 		0 == strcmp(".spinlock.text", txtname) ||
 		0 == strcmp(".irqentry.text", txtname) ||

...
@@ -130,6 +130,7 @@ if ($inputfile =~ m,kernel/trace/ftrace\.o$,) {
 # Acceptable sections to record.
 my %text_sections = (
      ".text" => 1,
+     ".ref.text" => 1,
      ".sched.text" => 1,
      ".spinlock.text" => 1,
      ".irqentry.text" => 1,

...
-PERF-BUILD-OPTIONS
 PERF-CFLAGS
 PERF-GUI-VARS
 PERF-VERSION-FILE

...
@@ -178,8 +178,8 @@ install-pdf: pdf
 	$(INSTALL) -d -m 755 $(DESTDIR)$(pdfdir)
 	$(INSTALL) -m 644 user-manual.pdf $(DESTDIR)$(pdfdir)

-install-html: html
-	'$(SHELL_PATH_SQ)' ./install-webdoc.sh $(DESTDIR)$(htmldir)
+#install-html: html
+#	'$(SHELL_PATH_SQ)' ./install-webdoc.sh $(DESTDIR)$(htmldir)

 ../PERF-VERSION-FILE: .FORCE-PERF-VERSION-FILE
 	$(QUIET_SUBDIR0)../ $(QUIET_SUBDIR1) PERF-VERSION-FILE
@@ -288,15 +288,16 @@ $(patsubst %.txt,%.html,$(wildcard howto/*.txt)): %.html : %.txt
 	sed -e '1,/^$$/d' $< | $(ASCIIDOC) -b xhtml11 - >$@+ && \
 	mv $@+ $@

-install-webdoc : html
-	'$(SHELL_PATH_SQ)' ./install-webdoc.sh $(WEBDOC_DEST)
+# UNIMPLEMENTED
+#install-webdoc : html
+#	'$(SHELL_PATH_SQ)' ./install-webdoc.sh $(WEBDOC_DEST)

-quick-install: quick-install-man
+# quick-install: quick-install-man

-quick-install-man:
-	'$(SHELL_PATH_SQ)' ./install-doc-quick.sh $(DOC_REF) $(DESTDIR)$(mandir)
+# quick-install-man:
+#	'$(SHELL_PATH_SQ)' ./install-doc-quick.sh $(DOC_REF) $(DESTDIR)$(mandir)

-quick-install-html:
-	'$(SHELL_PATH_SQ)' ./install-doc-quick.sh $(HTML_REF) $(DESTDIR)$(htmldir)
+#quick-install-html:
+#	'$(SHELL_PATH_SQ)' ./install-doc-quick.sh $(HTML_REF) $(DESTDIR)$(htmldir)

 .PHONY: .FORCE-PERF-VERSION-FILE
@@ -8,7 +8,7 @@ perf-list - List all symbolic event types
 SYNOPSIS
 --------
 [verse]
-'perf list'
+'perf list' [hw|sw|cache|tracepoint|event_glob]

 DESCRIPTION
 -----------
@@ -63,7 +63,26 @@ details. Some of them are referenced in the SEE ALSO section below.

 OPTIONS
 -------
-None
+
+Without options all known events will be listed.
+
+To limit the list use:
+
+. 'hw' or 'hardware' to list hardware events such as cache-misses, etc.
+
+. 'sw' or 'software' to list software events such as context switches, etc.
+
+. 'cache' or 'hwcache' to list hardware cache events such as L1-dcache-loads, etc.
+
+. 'tracepoint' to list all tracepoint events, alternatively use
+  'subsys_glob:event_glob' to filter by tracepoint subsystems such as sched,
+  block, etc.
+
+. If none of the above is matched, it will apply the supplied glob to all
+  events, printing the ones that match.
+
+One or more types can be used at the same time, listing the events for the
+types specified.

 SEE ALSO
 --------
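As a quick illustration of the new filtering, 'perf list hw sched:sched_switch' would print the hardware events followed by that one scheduler tracepoint, and 'perf list "*fault*"' would apply the glob across all known events (the quotes keep the shell from expanding the asterisks).
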
...
@@ -24,8 +24,8 @@ and statistics with this 'perf lock' command.

 'perf lock report' reports statistical data.

-OPTIONS
--------
+COMMON OPTIONS
+--------------

 -i::
 --input=<file>::
@@ -39,6 +39,14 @@ OPTIONS
 --dump-raw-trace::
         Dump raw trace in ASCII.

+REPORT OPTIONS
+--------------
+
+-k::
+--key=<value>::
+        Sorting key. Possible values: acquired (default), contended,
+        wait_total, wait_max, wait_min.
+
 SEE ALSO
 --------
 linkperf:perf[1]
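For example, 'perf lock report -k wait_max' sorts the report by the longest single wait on each lock instead of by acquisition count.
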
@@ -16,7 +16,7 @@ or
 or
 'perf probe' --list
 or
-'perf probe' [options] --line='FUNC[:RLN[+NUM|:RLN2]]|SRC:ALN[+NUM|:ALN2]'
+'perf probe' [options] --line='LINE'
 or
 'perf probe' [options] --vars='PROBEPOINT'
@@ -73,6 +73,17 @@ OPTIONS
 	(Only for --vars) Show external defined variables in addition to local
 	variables.

+-F::
+--funcs::
+	Show available functions in given module or kernel.
+
+--filter=FILTER::
+	(Only for --vars and --funcs) Set filter. FILTER is a combination of glob
+	pattern, see FILTER PATTERN for detail.
+	Default FILTER is "!__k???tab_* & !__crc_*" for --vars, and "!_*"
+	for --funcs.
+	If several filters are specified, only the last filter is used.
+
 -f::
 --force::
 	Forcibly add events with existing name.
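For instance, 'perf probe --funcs --filter "sched*"' would list only the kernel functions whose names start with "sched", overriding the default "!_*" filter described above.
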
@@ -117,13 +128,14 @@ LINE SYNTAX
 -----------
 Line range is described by following syntax.

-"FUNC[:RLN[+NUM|-RLN2]]|SRC[:ALN[+NUM|-ALN2]]"
+"FUNC[@SRC][:RLN[+NUM|-RLN2]]|SRC[:ALN[+NUM|-ALN2]]"

 FUNC specifies the function name of showing lines. 'RLN' is the start line
 number from function entry line, and 'RLN2' is the end line number. As same as
 probe syntax, 'SRC' means the source file path, 'ALN' is start line number,
 and 'ALN2' is end line number in the file. It is also possible to specify how
-many lines to show by using 'NUM'.
+many lines to show by using 'NUM'. Moreover, 'FUNC@SRC' combination is good
+for searching a specific function when several functions share same name.
 So, "source.c:100-120" shows lines between 100th to 120th in source.c file. And "func:10+20" shows 20 lines from 10th line of func function.

 LAZY MATCHING
@@ -135,6 +147,14 @@ e.g.

 This provides some sort of flexibility and robustness to probe point definitions against minor code changes. For example, actual 10th line of schedule() can be moved easily by modifying schedule(), but the same line matching 'rq=cpu_rq*' may still exist in the function.)

+FILTER PATTERN
+--------------
+ The filter pattern is a glob matching pattern(s) to filter variables.
+ In addition, you can use "!" for specifying filter-out rule. You also can give several rules combined with "&" or "|", and fold those rules as one rule by using "(" ")".
+
+e.g.
+ With --filter "foo* | bar*", perf probe -V shows variables which start with "foo" or "bar".
+ With --filter "!foo* & *bar", perf probe -V shows variables which don't start with "foo" and end with "bar", like "fizzbar". But "foobar" is filtered out.
+
 EXAMPLES
 --------

...
@@ -137,6 +137,17 @@ Do not update the buildid cache. This saves some overhead in situations
 where the information in the perf.data file (which includes buildids)
 is sufficient.

+-G name,...::
+--cgroup name,...::
+	monitor only in the container (cgroup) called "name". This option is available only
+	in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
+	container "name" are monitored when they run on the monitored CPUs. Multiple cgroups
+	can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup
+	to first event, second cgroup to second event and so on. It is possible to provide
+	an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
+	corresponding events, i.e., they always refer to events defined earlier on the command
+	line.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
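For instance, 'perf record -e cycles -e instructions -G foo,bar -a sleep 1' would bind cycles to cgroup "foo" and instructions to "bar" per the first-cgroup-to-first-event rule above; 'perf record -e cycles -e instructions -e faults -G foo,,bar ...' would leave the middle event unrestricted.
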
@@ -83,6 +83,17 @@ This option is only valid in system-wide mode.
 print counts using a CSV-style output to make it easy to import directly into
 spreadsheets. Columns are separated by the string specified in SEP.

+-G name::
+--cgroup name::
+	monitor only in the container (cgroup) called "name". This option is available only
+	in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
+	container "name" are monitored when they run on the monitored CPUs. Multiple cgroups
+	can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup
+	to first event, second cgroup to second event and so on. It is possible to provide
+	an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
+	corresponding events, i.e., they always refer to events defined earlier on the command
+	line.
+
 EXAMPLES
 --------

...
This diff is collapsed.
@@ -55,7 +55,7 @@ int bench_sched_pipe(int argc, const char **argv,
 	 * discarding returned value of read(), write()
 	 * causes error in building environment for perf
 	 */
-	int ret, wait_stat;
+	int __used ret, wait_stat;
 	pid_t pid, retpid;

 	argc = parse_options(argc, argv, options,

...
This diff is collapsed.
@@ -30,13 +30,13 @@ static int hists__add_entry(struct hists *self,
 	return -ENOMEM;
 }

-static int diff__process_sample_event(event_t *event,
-				      struct sample_data *sample,
+static int diff__process_sample_event(union perf_event *event,
+				      struct perf_sample *sample,
 				      struct perf_session *session)
 {
 	struct addr_location al;

-	if (event__preprocess_sample(event, session, &al, sample, NULL) < 0) {
+	if (perf_event__preprocess_sample(event, session, &al, sample, NULL) < 0) {
 		pr_warning("problem processing %d event, skipping it.\n",
 			   event->header.type);
 		return -1;
@@ -56,11 +56,11 @@ static int diff__process_sample_event(event_t *event,
 static struct perf_event_ops event_ops = {
 	.sample	= diff__process_sample_event,
-	.mmap	= event__process_mmap,
-	.comm	= event__process_comm,
-	.exit	= event__process_task,
-	.fork	= event__process_task,
-	.lost	= event__process_lost,
+	.mmap	= perf_event__process_mmap,
+	.comm	= perf_event__process_comm,
+	.exit	= perf_event__process_task,
+	.fork	= perf_event__process_task,
+	.lost	= perf_event__process_lost,
 	.ordered_samples = true,
 	.ordering_requires_timestamps = true,
 };

...
This diff is collapsed.
@@ -275,9 +275,8 @@ static void process_free_event(void *data,
 	s_alloc->alloc_cpu = -1;
 }

-static void
-process_raw_event(event_t *raw_event __used, void *data,
-		  int cpu, u64 timestamp, struct thread *thread)
+static void process_raw_event(union perf_event *raw_event __used, void *data,
+			      int cpu, u64 timestamp, struct thread *thread)
 {
 	struct event *event;
 	int type;
@@ -304,7 +303,8 @@ process_raw_event(event_t *raw_event __used, void *data,
 	}
 }

-static int process_sample_event(event_t *event, struct sample_data *sample,
+static int process_sample_event(union perf_event *event,
+				struct perf_sample *sample,
 				struct perf_session *session)
 {
 	struct thread *thread = perf_session__findnew(session, event->ip.pid);
@@ -325,7 +325,7 @@ static int process_sample_event(event_t *event, struct sample_data *sample,
 static struct perf_event_ops event_ops = {
 	.sample			= process_sample_event,
-	.comm			= event__process_comm,
+	.comm			= perf_event__process_comm,
 	.ordered_samples	= true,
 };

...
@@ -5,6 +5,7 @@
  *
  * Copyright (C) 2009, Thomas Gleixner <tglx@linutronix.de>
  * Copyright (C) 2008-2009, Red Hat Inc, Ingo Molnar <mingo@redhat.com>
+ * Copyright (C) 2011, Red Hat Inc, Arnaldo Carvalho de Melo <acme@redhat.com>
  */
 #include "builtin.h"
@@ -13,9 +14,47 @@
 #include "util/parse-events.h"
 #include "util/cache.h"

-int cmd_list(int argc __used, const char **argv __used, const char *prefix __used)
+int cmd_list(int argc, const char **argv, const char *prefix __used)
 {
 	setup_pager();
-	print_events();
+
+	if (argc == 1)
+		print_events(NULL);
+	else {
+		int i;
+
+		for (i = 1; i < argc; ++i) {
+			if (i > 1)
+				putchar('\n');
+			if (strncmp(argv[i], "tracepoint", 10) == 0)
+				print_tracepoint_events(NULL, NULL);
+			else if (strcmp(argv[i], "hw") == 0 ||
+				 strcmp(argv[i], "hardware") == 0)
+				print_events_type(PERF_TYPE_HARDWARE);
+			else if (strcmp(argv[i], "sw") == 0 ||
+				 strcmp(argv[i], "software") == 0)
+				print_events_type(PERF_TYPE_SOFTWARE);
+			else if (strcmp(argv[i], "cache") == 0 ||
+				 strcmp(argv[i], "hwcache") == 0)
+				print_hwcache_events(NULL);
+			else {
+				char *sep = strchr(argv[i], ':'), *s;
+				int sep_idx;
+
+				if (sep == NULL) {
+					print_events(argv[i]);
+					continue;
+				}
+				sep_idx = sep - argv[i];
+				s = strdup(argv[i]);
+				if (s == NULL)
+					return -1;
+
+				s[sep_idx] = '\0';
+				print_tracepoint_events(s, s + sep_idx + 1);
+				free(s);
+			}
+		}
+	}
 	return 0;
 }
@@ -834,14 +834,14 @@ static void dump_info(void)
 		die("Unknown type of information\n");
 }

-static int process_sample_event(event_t *self, struct sample_data *sample,
+static int process_sample_event(union perf_event *event, struct perf_sample *sample,
 				struct perf_session *s)
 {
 	struct thread *thread = perf_session__findnew(s, sample->tid);

 	if (thread == NULL) {
 		pr_debug("problem processing %d event, skipping it.\n",
-			self->header.type);
+			event->header.type);
 		return -1;
 	}
@@ -852,7 +852,7 @@ static int process_sample_event(event_t *self, struct sample_data *sample,
 static struct perf_event_ops eops = {
 	.sample			= process_sample_event,
-	.comm			= event__process_comm,
+	.comm			= perf_event__process_comm,
 	.ordered_samples	= true,
 };
@@ -893,7 +893,7 @@ static const char * const report_usage[] = {
 static const struct option report_options[] = {
 	OPT_STRING('k', "key", &sort_key, "acquired",
-		    "key for sorting"),
+		    "key for sorting (acquired / contended / wait_total / wait_max / wait_min)"),
 	/* TODO: type */
 	OPT_END()
 };

...
(14 collapsed file diffs not shown.)
@@ -34,13 +34,14 @@ extern int pager_use_color;
 extern int use_browser;

 #ifdef NO_NEWT_SUPPORT
-static inline void setup_browser(void)
+static inline void setup_browser(bool fallback_to_pager)
 {
-	setup_pager();
+	if (fallback_to_pager)
+		setup_pager();
 }
 static inline void exit_browser(bool wait_for_ok __used) {}
 #else
-void setup_browser(void);
+void setup_browser(bool fallback_to_pager);
 void exit_browser(bool wait_for_ok);
 #endif

...
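Callers now state whether the pager is an acceptable fallback when newt (TUI) support is compiled out: a command that just streams formatted text would call setup_browser(true), while one that manages its own output would be expected to pass false and get neither browser nor pager.
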
(51 collapsed file diffs not shown.)