提交 36db171c 编写于 作者: L Linus Torvalds

Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Bigger kernel side changes:

   - Add backwards writing capability to the perf ring-buffer code,
     which is preparation for future advanced features like robust
     'overwrite support' and snapshot mode.  (Wang Nan)

   - Add pause and resume ioctls for the perf ringbuffer (Wang Nan)

   - x86 Intel cstate code cleanups and reorgnization (Thomas Gleixner)

   - x86 Intel uncore and CPU PMU driver updates (Kan Liang, Peter
     Zijlstra)

   - x86 AUX (Intel PT) related enhancements and updates (Alexander
     Shishkin)

   - x86 MSR PMU driver enhancements and updates (Huang Rui)

   - ... and lots of other changes spread out over 40+ commits.

  Biggest tooling side changes:

   - 'perf trace' features and enhancements.  (Arnaldo Carvalho de Melo)

   - BPF tooling updates (Wang Nan)

   - 'perf sched' updates (Jiri Olsa)

   - 'perf probe' updates (Masami Hiramatsu)

   - ... plus 200+ other enhancements, fixes and cleanups to tools/

  The merge commits, the shortlog and the changelogs contain a lot more
  details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (249 commits)
  perf/core: Disable the event on a truncated AUX record
  perf/x86/intel/pt: Generate PMI in the STOP region as well
  perf buildid-cache: Use lsdir() for looking up buildid caches
  perf symbols: Use lsdir() for the search in kcore cache directory
  perf tools: Use SBUILD_ID_SIZE where applicable
  perf tools: Fix lsdir to set errno correctly
  perf trace: Move seccomp args beautifiers to tools/perf/trace/beauty/
  perf trace: Move flock op beautifier to tools/perf/trace/beauty/
  perf build: Add build-test for debug-frame on arm/arm64
  perf build: Add build-test for libunwind cross-platforms support
  perf script: Fix export of callchains with recursion in db-export
  perf script: Fix callchain addresses in db-export
  perf script: Fix symbol insertion behavior in db-export
  perf symbols: Add dso__insert_symbol function
  perf scripting python: Use Py_FatalError instead of die()
  perf tools: Remove xrealloc and ALLOC_GROW
  perf help: Do not use ALLOC_GROW in add_cmd_list
  perf pmu: Make pmu_formats_string to check return value of strbuf
  perf header: Make topology checkers to check return value of strbuf
  perf tools: Make alias handler to check return value of strbuf
  ...
......@@ -60,6 +60,7 @@ show up in /proc/sys/kernel:
- panic_on_warn
- perf_cpu_time_max_percent
- perf_event_paranoid
- perf_event_max_stack
- pid_max
- powersave-nap [ PPC only ]
- printk
......@@ -654,6 +655,19 @@ users (without CAP_SYS_ADMIN). The default value is 2.
==============================================================
perf_event_max_stack:
Controls maximum number of stack frames to copy for (attr.sample_type &
PERF_SAMPLE_CALLCHAIN) configured events, for instance, when using
'perf record -g' or 'perf trace --call-graph fp'.
This can only be done when no events are in use that have callchains
enabled, otherwise writing to this file will return -EBUSY.
The default value is 127.
==============================================================
pid_max:
PID allocation wrap value. When the kernel's next PID value
......
......@@ -631,7 +631,7 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
info->address &= ~alignment_mask;
info->ctrl.len <<= offset;
if (!bp->overflow_handler) {
if (is_default_overflow_handler(bp)) {
/*
* Mismatch breakpoints are required for single-stepping
* breakpoints.
......@@ -754,7 +754,7 @@ static void watchpoint_handler(unsigned long addr, unsigned int fsr,
* mismatch breakpoint so we can single-step over the
* watchpoint trigger.
*/
if (!wp->overflow_handler)
if (is_default_overflow_handler(wp))
enable_single_step(wp, instruction_pointer(regs));
unlock:
......
......@@ -75,7 +75,7 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
tail = (struct frame_tail __user *)regs->ARM_fp - 1;
while ((entry->nr < PERF_MAX_STACK_DEPTH) &&
while ((entry->nr < sysctl_perf_event_max_stack) &&
tail && !((unsigned long)tail & 0x3))
tail = user_backtrace(tail, entry);
}
......
......@@ -616,7 +616,7 @@ static int breakpoint_handler(unsigned long unused, unsigned int esr,
perf_bp_event(bp, regs);
/* Do we need to handle the stepping? */
if (!bp->overflow_handler)
if (is_default_overflow_handler(bp))
step = 1;
unlock:
rcu_read_unlock();
......@@ -712,7 +712,7 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
perf_bp_event(wp, regs);
/* Do we need to handle the stepping? */
if (!wp->overflow_handler)
if (is_default_overflow_handler(wp))
step = 1;
unlock:
......
......@@ -122,7 +122,7 @@ void perf_callchain_user(struct perf_callchain_entry *entry,
tail = (struct frame_tail __user *)regs->regs[29];
while (entry->nr < PERF_MAX_STACK_DEPTH &&
while (entry->nr < sysctl_perf_event_max_stack &&
tail && !((unsigned long)tail & 0xf))
tail = user_backtrace(tail, entry);
} else {
......@@ -132,7 +132,7 @@ void perf_callchain_user(struct perf_callchain_entry *entry,
tail = (struct compat_frame_tail __user *)regs->compat_fp - 1;
while ((entry->nr < PERF_MAX_STACK_DEPTH) &&
while ((entry->nr < sysctl_perf_event_max_stack) &&
tail && !((unsigned long)tail & 0x3))
tail = compat_user_backtrace(tail, entry);
#endif
......
......@@ -65,7 +65,7 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
--frame;
while ((entry->nr < PERF_MAX_STACK_DEPTH) && frame)
while ((entry->nr < sysctl_perf_event_max_stack) && frame)
frame = user_backtrace(frame, entry);
}
......
......@@ -35,7 +35,7 @@ static void save_raw_perf_callchain(struct perf_callchain_entry *entry,
addr = *sp++;
if (__kernel_text_address(addr)) {
perf_callchain_store(entry, addr);
if (entry->nr >= PERF_MAX_STACK_DEPTH)
if (entry->nr >= sysctl_perf_event_max_stack)
break;
}
}
......@@ -59,7 +59,7 @@ void perf_callchain_kernel(struct perf_callchain_entry *entry,
}
do {
perf_callchain_store(entry, pc);
if (entry->nr >= PERF_MAX_STACK_DEPTH)
if (entry->nr >= sysctl_perf_event_max_stack)
break;
pc = unwind_stack(current, &sp, pc, &ra);
} while (pc);
......
......@@ -247,7 +247,7 @@ static void perf_callchain_user_64(struct perf_callchain_entry *entry,
sp = regs->gpr[1];
perf_callchain_store(entry, next_ip);
while (entry->nr < PERF_MAX_STACK_DEPTH) {
while (entry->nr < sysctl_perf_event_max_stack) {
fp = (unsigned long __user *) sp;
if (!valid_user_sp(sp, 1) || read_user_stack_64(fp, &next_sp))
return;
......@@ -453,7 +453,7 @@ static void perf_callchain_user_32(struct perf_callchain_entry *entry,
sp = regs->gpr[1];
perf_callchain_store(entry, next_ip);
while (entry->nr < PERF_MAX_STACK_DEPTH) {
while (entry->nr < sysctl_perf_event_max_stack) {
fp = (unsigned int __user *) (unsigned long) sp;
if (!valid_user_sp(sp, 0) || read_user_stack_32(fp, &next_sp))
return;
......
......@@ -1756,7 +1756,7 @@ void perf_callchain_kernel(struct perf_callchain_entry *entry,
}
}
#endif
} while (entry->nr < PERF_MAX_STACK_DEPTH);
} while (entry->nr < sysctl_perf_event_max_stack);
}
static inline int
......@@ -1790,7 +1790,7 @@ static void perf_callchain_user_64(struct perf_callchain_entry *entry,
pc = sf.callers_pc;
ufp = (unsigned long)sf.fp + STACK_BIAS;
perf_callchain_store(entry, pc);
} while (entry->nr < PERF_MAX_STACK_DEPTH);
} while (entry->nr < sysctl_perf_event_max_stack);
}
static void perf_callchain_user_32(struct perf_callchain_entry *entry,
......@@ -1822,7 +1822,7 @@ static void perf_callchain_user_32(struct perf_callchain_entry *entry,
ufp = (unsigned long)sf.fp;
}
perf_callchain_store(entry, pc);
} while (entry->nr < PERF_MAX_STACK_DEPTH);
} while (entry->nr < sysctl_perf_event_max_stack);
}
void
......
......@@ -164,10 +164,6 @@ config INSTRUCTION_DECODER
def_bool y
depends on KPROBES || PERF_EVENTS || UPROBES
config PERF_EVENTS_INTEL_UNCORE
def_bool y
depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
config OUTPUT_FORMAT
string
default "elf32-i386" if X86_32
......@@ -1046,6 +1042,8 @@ config X86_THERMAL_VECTOR
def_bool y
depends on X86_MCE_INTEL
source "arch/x86/events/Kconfig"
config X86_LEGACY_VM86
bool "Legacy VM86 support"
default n
......@@ -1210,15 +1208,6 @@ config MICROCODE_OLD_INTERFACE
def_bool y
depends on MICROCODE
config PERF_EVENTS_AMD_POWER
depends on PERF_EVENTS && CPU_SUP_AMD
tristate "AMD Processor Power Reporting Mechanism"
---help---
Provide power reporting mechanism support for AMD processors.
Currently, it leverages X86_FEATURE_ACC_POWER
(CPUID Fn8000_0007_EDX[12]) interface to calculate the
average power consumption on Family 15h processors.
config X86_MSR
tristate "/dev/cpu/*/msr - Model-specific register support"
---help---
......
menu "Performance monitoring"
config PERF_EVENTS_INTEL_UNCORE
tristate "Intel uncore performance events"
depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
default y
---help---
Include support for Intel uncore performance events. These are
available on NehalemEX and more modern processors.
config PERF_EVENTS_INTEL_RAPL
tristate "Intel rapl performance events"
depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
default y
---help---
Include support for Intel rapl performance events for power
monitoring on modern processors.
config PERF_EVENTS_INTEL_CSTATE
tristate "Intel cstate performance events"
depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
default y
---help---
Include support for Intel cstate performance events for power
monitoring on modern processors.
config PERF_EVENTS_AMD_POWER
depends on PERF_EVENTS && CPU_SUP_AMD
tristate "AMD Processor Power Reporting Mechanism"
---help---
Provide power reporting mechanism support for AMD processors.
Currently, it leverages X86_FEATURE_ACC_POWER
(CPUID Fn8000_0007_EDX[12]) interface to calculate the
average power consumption on Family 15h processors.
endmenu
......@@ -6,9 +6,6 @@ obj-$(CONFIG_X86_LOCAL_APIC) += amd/ibs.o msr.o
ifdef CONFIG_AMD_IOMMU
obj-$(CONFIG_CPU_SUP_AMD) += amd/iommu.o
endif
obj-$(CONFIG_CPU_SUP_INTEL) += intel/core.o intel/bts.o intel/cqm.o
obj-$(CONFIG_CPU_SUP_INTEL) += intel/cstate.o intel/ds.o intel/knc.o
obj-$(CONFIG_CPU_SUP_INTEL) += intel/lbr.o intel/p4.o intel/p6.o intel/pt.o
obj-$(CONFIG_CPU_SUP_INTEL) += intel/rapl.o msr.o
obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += intel/uncore.o intel/uncore_nhmex.o
obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += intel/uncore_snb.o intel/uncore_snbep.o
obj-$(CONFIG_CPU_SUP_INTEL) += msr.o
obj-$(CONFIG_CPU_SUP_INTEL) += intel/
......@@ -263,6 +263,7 @@ static const struct attribute_group *amd_uncore_attr_groups[] = {
};
static struct pmu amd_nb_pmu = {
.task_ctx_nr = perf_invalid_context,
.attr_groups = amd_uncore_attr_groups,
.name = "amd_nb",
.event_init = amd_uncore_event_init,
......@@ -274,6 +275,7 @@ static struct pmu amd_nb_pmu = {
};
static struct pmu amd_l2_pmu = {
.task_ctx_nr = perf_invalid_context,
.attr_groups = amd_uncore_attr_groups,
.name = "amd_l2",
.event_init = amd_uncore_event_init,
......
......@@ -360,6 +360,9 @@ int x86_add_exclusive(unsigned int what)
{
int i;
if (x86_pmu.lbr_pt_coexist)
return 0;
if (!atomic_inc_not_zero(&x86_pmu.lbr_exclusive[what])) {
mutex_lock(&pmc_reserve_mutex);
for (i = 0; i < ARRAY_SIZE(x86_pmu.lbr_exclusive); i++) {
......@@ -380,6 +383,9 @@ int x86_add_exclusive(unsigned int what)
void x86_del_exclusive(unsigned int what)
{
if (x86_pmu.lbr_pt_coexist)
return;
atomic_dec(&x86_pmu.lbr_exclusive[what]);
atomic_dec(&active_events);
}
......@@ -2277,7 +2283,7 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
fp = compat_ptr(ss_base + regs->bp);
pagefault_disable();
while (entry->nr < PERF_MAX_STACK_DEPTH) {
while (entry->nr < sysctl_perf_event_max_stack) {
unsigned long bytes;
frame.next_frame = 0;
frame.return_address = 0;
......@@ -2337,7 +2343,7 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
return;
pagefault_disable();
while (entry->nr < PERF_MAX_STACK_DEPTH) {
while (entry->nr < sysctl_perf_event_max_stack) {
unsigned long bytes;
frame.next_frame = NULL;
frame.return_address = 0;
......
obj-$(CONFIG_CPU_SUP_INTEL) += core.o bts.o cqm.o
obj-$(CONFIG_CPU_SUP_INTEL) += ds.o knc.o
obj-$(CONFIG_CPU_SUP_INTEL) += lbr.o p4.o p6.o pt.o
obj-$(CONFIG_PERF_EVENTS_INTEL_RAPL) += intel-rapl.o
intel-rapl-objs := rapl.o
obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += intel-uncore.o
intel-uncore-objs := uncore.o uncore_nhmex.o uncore_snb.o uncore_snbep.o
obj-$(CONFIG_PERF_EVENTS_INTEL_CSTATE) += intel-cstate.o
intel-cstate-objs := cstate.o
......@@ -171,18 +171,6 @@ static void bts_buffer_pad_out(struct bts_phys *phys, unsigned long head)
memset(page_address(phys->page) + index, 0, phys->size - index);
}
static bool bts_buffer_is_full(struct bts_buffer *buf, struct bts_ctx *bts)
{
if (buf->snapshot)
return false;
if (local_read(&buf->data_size) >= bts->handle.size ||
bts->handle.size - local_read(&buf->data_size) < BTS_RECORD_SIZE)
return true;
return false;
}
static void bts_update(struct bts_ctx *bts)
{
int cpu = raw_smp_processor_id();
......@@ -213,18 +201,15 @@ static void bts_update(struct bts_ctx *bts)
}
}
static int
bts_buffer_reset(struct bts_buffer *buf, struct perf_output_handle *handle);
static void __bts_event_start(struct perf_event *event)
{
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct bts_buffer *buf = perf_get_aux(&bts->handle);
u64 config = 0;
if (!buf || bts_buffer_is_full(buf, bts))
return;
event->hw.itrace_started = 1;
event->hw.state = 0;
if (!buf->snapshot)
config |= ARCH_PERFMON_EVENTSEL_INT;
if (!event->attr.exclude_kernel)
......@@ -241,16 +226,41 @@ static void __bts_event_start(struct perf_event *event)
wmb();
intel_pmu_enable_bts(config);
}
static void bts_event_start(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct bts_buffer *buf;
buf = perf_aux_output_begin(&bts->handle, event);
if (!buf)
goto fail_stop;
if (bts_buffer_reset(buf, &bts->handle))
goto fail_end_stop;
bts->ds_back.bts_buffer_base = cpuc->ds->bts_buffer_base;
bts->ds_back.bts_absolute_maximum = cpuc->ds->bts_absolute_maximum;
bts->ds_back.bts_interrupt_threshold = cpuc->ds->bts_interrupt_threshold;
event->hw.itrace_started = 1;
event->hw.state = 0;
__bts_event_start(event);
/* PMI handler: this counter is running and likely generating PMIs */
ACCESS_ONCE(bts->started) = 1;
return;
fail_end_stop:
perf_aux_output_end(&bts->handle, 0, false);
fail_stop:
event->hw.state = PERF_HES_STOPPED;
}
static void __bts_event_stop(struct perf_event *event)
......@@ -269,15 +279,32 @@ static void __bts_event_stop(struct perf_event *event)
static void bts_event_stop(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct bts_buffer *buf = perf_get_aux(&bts->handle);
/* PMI handler: don't restart this counter */
ACCESS_ONCE(bts->started) = 0;
__bts_event_stop(event);
if (flags & PERF_EF_UPDATE)
if (flags & PERF_EF_UPDATE) {
bts_update(bts);
if (buf) {
if (buf->snapshot)
bts->handle.head =
local_xchg(&buf->data_size,
buf->nr_pages << PAGE_SHIFT);
perf_aux_output_end(&bts->handle, local_xchg(&buf->data_size, 0),
!!local_xchg(&buf->lost, 0));
}
cpuc->ds->bts_index = bts->ds_back.bts_buffer_base;
cpuc->ds->bts_buffer_base = bts->ds_back.bts_buffer_base;
cpuc->ds->bts_absolute_maximum = bts->ds_back.bts_absolute_maximum;
cpuc->ds->bts_interrupt_threshold = bts->ds_back.bts_interrupt_threshold;
}
}
void intel_bts_enable_local(void)
......@@ -417,34 +444,14 @@ int intel_bts_interrupt(void)
static void bts_event_del(struct perf_event *event, int mode)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct bts_buffer *buf = perf_get_aux(&bts->handle);
bts_event_stop(event, PERF_EF_UPDATE);
if (buf) {
if (buf->snapshot)
bts->handle.head =
local_xchg(&buf->data_size,
buf->nr_pages << PAGE_SHIFT);
perf_aux_output_end(&bts->handle, local_xchg(&buf->data_size, 0),
!!local_xchg(&buf->lost, 0));
}
cpuc->ds->bts_index = bts->ds_back.bts_buffer_base;
cpuc->ds->bts_buffer_base = bts->ds_back.bts_buffer_base;
cpuc->ds->bts_absolute_maximum = bts->ds_back.bts_absolute_maximum;
cpuc->ds->bts_interrupt_threshold = bts->ds_back.bts_interrupt_threshold;
}
static int bts_event_add(struct perf_event *event, int mode)
{
struct bts_buffer *buf;
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
int ret = -EBUSY;
event->hw.state = PERF_HES_STOPPED;
......@@ -454,26 +461,10 @@ static int bts_event_add(struct perf_event *event, int mode)
if (bts->handle.event)
return -EBUSY;
buf = perf_aux_output_begin(&bts->handle, event);
if (!buf)
return -EINVAL;
ret = bts_buffer_reset(buf, &bts->handle);
if (ret) {
perf_aux_output_end(&bts->handle, 0, false);
return ret;
}
bts->ds_back.bts_buffer_base = cpuc->ds->bts_buffer_base;
bts->ds_back.bts_absolute_maximum = cpuc->ds->bts_absolute_maximum;
bts->ds_back.bts_interrupt_threshold = cpuc->ds->bts_interrupt_threshold;
if (mode & PERF_EF_START) {
bts_event_start(event, 0);
if (hwc->state & PERF_HES_STOPPED) {
bts_event_del(event, 0);
return -EBUSY;
}
if (hwc->state & PERF_HES_STOPPED)
return -EINVAL;
}
return 0;
......
......@@ -1465,6 +1465,140 @@ static __initconst const u64 slm_hw_cache_event_ids
},
};
static struct extra_reg intel_glm_extra_regs[] __read_mostly = {
/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x760005ffbfull, RSP_0),
INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x360005ffbfull, RSP_1),
EVENT_EXTRA_END
};
#define GLM_DEMAND_DATA_RD BIT_ULL(0)
#define GLM_DEMAND_RFO BIT_ULL(1)
#define GLM_ANY_RESPONSE BIT_ULL(16)
#define GLM_SNP_NONE_OR_MISS BIT_ULL(33)
#define GLM_DEMAND_READ GLM_DEMAND_DATA_RD
#define GLM_DEMAND_WRITE GLM_DEMAND_RFO
#define GLM_DEMAND_PREFETCH (SNB_PF_DATA_RD|SNB_PF_RFO)
#define GLM_LLC_ACCESS GLM_ANY_RESPONSE
#define GLM_SNP_ANY (GLM_SNP_NONE_OR_MISS|SNB_NO_FWD|SNB_HITM)
#define GLM_LLC_MISS (GLM_SNP_ANY|SNB_NON_DRAM)
static __initconst const u64 glm_hw_cache_event_ids
[PERF_COUNT_HW_CACHE_MAX]
[PERF_COUNT_HW_CACHE_OP_MAX]
[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
[C(L1D)] = {
[C(OP_READ)] = {
[C(RESULT_ACCESS)] = 0x81d0, /* MEM_UOPS_RETIRED.ALL_LOADS */
[C(RESULT_MISS)] = 0x0,
},
[C(OP_WRITE)] = {
[C(RESULT_ACCESS)] = 0x82d0, /* MEM_UOPS_RETIRED.ALL_STORES */
[C(RESULT_MISS)] = 0x0,
},
[C(OP_PREFETCH)] = {
[C(RESULT_ACCESS)] = 0x0,
[C(RESULT_MISS)] = 0x0,
},
},
[C(L1I)] = {
[C(OP_READ)] = {
[C(RESULT_ACCESS)] = 0x0380, /* ICACHE.ACCESSES */
[C(RESULT_MISS)] = 0x0280, /* ICACHE.MISSES */
},
[C(OP_WRITE)] = {
[C(RESULT_ACCESS)] = -1,
[C(RESULT_MISS)] = -1,
},
[C(OP_PREFETCH)] = {
[C(RESULT_ACCESS)] = 0x0,
[C(RESULT_MISS)] = 0x0,
},
},
[C(LL)] = {
[C(OP_READ)] = {
[C(RESULT_ACCESS)] = 0x1b7, /* OFFCORE_RESPONSE */
[C(RESULT_MISS)] = 0x1b7, /* OFFCORE_RESPONSE */
},
[C(OP_WRITE)] = {
[C(RESULT_ACCESS)] = 0x1b7, /* OFFCORE_RESPONSE */
[C(RESULT_MISS)] = 0x1b7, /* OFFCORE_RESPONSE */
},
[C(OP_PREFETCH)] = {
[C(RESULT_ACCESS)] = 0x1b7, /* OFFCORE_RESPONSE */
[C(RESULT_MISS)] = 0x1b7, /* OFFCORE_RESPONSE */
},
},
[C(DTLB)] = {
[C(OP_READ)] = {
[C(RESULT_ACCESS)] = 0x81d0, /* MEM_UOPS_RETIRED.ALL_LOADS */
[C(RESULT_MISS)] = 0x0,
},
[C(OP_WRITE)] = {
[C(RESULT_ACCESS)] = 0x82d0, /* MEM_UOPS_RETIRED.ALL_STORES */
[C(RESULT_MISS)] = 0x0,
},
[C(OP_PREFETCH)] = {
[C(RESULT_ACCESS)] = 0x0,
[C(RESULT_MISS)] = 0x0,
},
},
[C(ITLB)] = {
[C(OP_READ)] = {
[C(RESULT_ACCESS)] = 0x00c0, /* INST_RETIRED.ANY_P */
[C(RESULT_MISS)] = 0x0481, /* ITLB.MISS */
},
[C(OP_WRITE)] = {
[C(RESULT_ACCESS)] = -1,
[C(RESULT_MISS)] = -1,
},
[C(OP_PREFETCH)] = {
[C(RESULT_ACCESS)] = -1,
[C(RESULT_MISS)] = -1,
},
},
[C(BPU)] = {
[C(OP_READ)] = {
[C(RESULT_ACCESS)] = 0x00c4, /* BR_INST_RETIRED.ALL_BRANCHES */
[C(RESULT_MISS)] = 0x00c5, /* BR_MISP_RETIRED.ALL_BRANCHES */
},
[C(OP_WRITE)] = {
[C(RESULT_ACCESS)] = -1,
[C(RESULT_MISS)] = -1,
},
[C(OP_PREFETCH)] = {
[C(RESULT_ACCESS)] = -1,
[C(RESULT_MISS)] = -1,
},
},
};
static __initconst const u64 glm_hw_cache_extra_regs
[PERF_COUNT_HW_CACHE_MAX]
[PERF_COUNT_HW_CACHE_OP_MAX]
[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
[C(LL)] = {
[C(OP_READ)] = {
[C(RESULT_ACCESS)] = GLM_DEMAND_READ|
GLM_LLC_ACCESS,
[C(RESULT_MISS)] = GLM_DEMAND_READ|
GLM_LLC_MISS,
},
[C(OP_WRITE)] = {
[C(RESULT_ACCESS)] = GLM_DEMAND_WRITE|
GLM_LLC_ACCESS,
[C(RESULT_MISS)] = GLM_DEMAND_WRITE|
GLM_LLC_MISS,
},
[C(OP_PREFETCH)] = {
[C(RESULT_ACCESS)] = GLM_DEMAND_PREFETCH|
GLM_LLC_ACCESS,
[C(RESULT_MISS)] = GLM_DEMAND_PREFETCH|
GLM_LLC_MISS,
},
},
};
#define KNL_OT_L2_HITE BIT_ULL(19) /* Other Tile L2 Hit */
#define KNL_OT_L2_HITF BIT_ULL(20) /* Other Tile L2 Hit */
#define KNL_MCDRAM_LOCAL BIT_ULL(21)
......@@ -3447,7 +3581,7 @@ __init int intel_pmu_init(void)
memcpy(hw_cache_extra_regs, slm_hw_cache_extra_regs,
sizeof(hw_cache_extra_regs));
intel_pmu_lbr_init_atom();
intel_pmu_lbr_init_slm();
x86_pmu.event_constraints = intel_slm_event_constraints;
x86_pmu.pebs_constraints = intel_slm_pebs_event_constraints;
......@@ -3456,6 +3590,30 @@ __init int intel_pmu_init(void)
pr_cont("Silvermont events, ");
break;
case 92: /* 14nm Atom "Goldmont" */
case 95: /* 14nm Atom "Goldmont Denverton" */
memcpy(hw_cache_event_ids, glm_hw_cache_event_ids,
sizeof(hw_cache_event_ids));
memcpy(hw_cache_extra_regs, glm_hw_cache_extra_regs,
sizeof(hw_cache_extra_regs));
intel_pmu_lbr_init_skl();
x86_pmu.event_constraints = intel_slm_event_constraints;
x86_pmu.pebs_constraints = intel_glm_pebs_event_constraints;
x86_pmu.extra_regs = intel_glm_extra_regs;
/*
* It's recommended to use CPU_CLK_UNHALTED.CORE_P + NPEBS
* for precise cycles.
* :pp is identical to :ppp
*/
x86_pmu.pebs_aliases = NULL;
x86_pmu.pebs_prec_dist = true;
x86_pmu.lbr_pt_coexist = true;
x86_pmu.flags |= PMU_FL_HAS_RSP_1;
pr_cont("Goldmont events, ");
break;
case 37: /* 32nm Westmere */
case 44: /* 32nm Westmere-EP */
case 47: /* 32nm Westmere-EX */
......
此差异已折叠。
......@@ -645,6 +645,12 @@ struct event_constraint intel_slm_pebs_event_constraints[] = {
EVENT_CONSTRAINT_END
};
struct event_constraint intel_glm_pebs_event_constraints[] = {
/* Allow all events as PEBS with no flags */
INTEL_ALL_EVENT_CONSTRAINT(0, 0x1),
EVENT_CONSTRAINT_END
};
struct event_constraint intel_nehalem_pebs_event_constraints[] = {
INTEL_PLD_CONSTRAINT(0x100b, 0xf), /* MEM_INST_RETIRED.* */
INTEL_FLAGS_EVENT_CONSTRAINT(0x0f, 0xf), /* MEM_UNCORE_RETIRED.* */
......
......@@ -14,7 +14,8 @@ enum {
LBR_FORMAT_EIP_FLAGS = 0x03,
LBR_FORMAT_EIP_FLAGS2 = 0x04,
LBR_FORMAT_INFO = 0x05,
LBR_FORMAT_MAX_KNOWN = LBR_FORMAT_INFO,
LBR_FORMAT_TIME = 0x06,
LBR_FORMAT_MAX_KNOWN = LBR_FORMAT_TIME,
};
static enum {
......@@ -464,6 +465,16 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
abort = !!(info & LBR_INFO_ABORT);
cycles = (info & LBR_INFO_CYCLES);
}
if (lbr_format == LBR_FORMAT_TIME) {
mis = !!(from & LBR_FROM_FLAG_MISPRED);
pred = !mis;
skip = 1;
cycles = ((to >> 48) & LBR_INFO_CYCLES);
to = (u64)((((s64)to) << 16) >> 16);
}
if (lbr_flags & LBR_EIP_FLAGS) {
mis = !!(from & LBR_FROM_FLAG_MISPRED);
pred = !mis;
......@@ -1049,6 +1060,24 @@ void __init intel_pmu_lbr_init_atom(void)
pr_cont("8-deep LBR, ");
}
/* slm */
void __init intel_pmu_lbr_init_slm(void)
{
x86_pmu.lbr_nr = 8;
x86_pmu.lbr_tos = MSR_LBR_TOS;
x86_pmu.lbr_from = MSR_LBR_CORE_FROM;
x86_pmu.lbr_to = MSR_LBR_CORE_TO;
x86_pmu.lbr_sel_mask = LBR_SEL_MASK;
x86_pmu.lbr_sel_map = nhm_lbr_sel_map;
/*
* SW branch filter usage:
* - compensate for lack of HW filter
*/
pr_cont("8-deep LBR, ");
}
/* Knights Landing */
void intel_pmu_lbr_init_knl(void)
{
......
......@@ -67,11 +67,13 @@ static struct pt_cap_desc {
PT_CAP(max_subleaf, 0, CR_EAX, 0xffffffff),
PT_CAP(cr3_filtering, 0, CR_EBX, BIT(0)),
PT_CAP(psb_cyc, 0, CR_EBX, BIT(1)),
PT_CAP(ip_filtering, 0, CR_EBX, BIT(2)),
PT_CAP(mtc, 0, CR_EBX, BIT(3)),
PT_CAP(topa_output, 0, CR_ECX, BIT(0)),
PT_CAP(topa_multiple_entries, 0, CR_ECX, BIT(1)),
PT_CAP(single_range_output, 0, CR_ECX, BIT(2)),
PT_CAP(payloads_lip, 0, CR_ECX, BIT(31)),
PT_CAP(num_address_ranges, 1, CR_EAX, 0x3),
PT_CAP(mtc_periods, 1, CR_EAX, 0xffff0000),
PT_CAP(cycle_thresholds, 1, CR_EBX, 0xffff),
PT_CAP(psb_periods, 1, CR_EBX, 0xffff0000),
......@@ -125,9 +127,46 @@ static struct attribute_group pt_format_group = {
.attrs = pt_formats_attr,
};
static ssize_t
pt_timing_attr_show(struct device *dev, struct device_attribute *attr,
char *page)
{
struct perf_pmu_events_attr *pmu_attr =
container_of(attr, struct perf_pmu_events_attr, attr);
switch (pmu_attr->id) {
case 0:
return sprintf(page, "%lu\n", pt_pmu.max_nonturbo_ratio);
case 1:
return sprintf(page, "%u:%u\n",
pt_pmu.tsc_art_num,
pt_pmu.tsc_art_den);
default:
break;
}
return -EINVAL;
}
PMU_EVENT_ATTR(max_nonturbo_ratio, timing_attr_max_nonturbo_ratio, 0,
pt_timing_attr_show);
PMU_EVENT_ATTR(tsc_art_ratio, timing_attr_tsc_art_ratio, 1,
pt_timing_attr_show);
static struct attribute *pt_timing_attr[] = {
&timing_attr_max_nonturbo_ratio.attr.attr,
&timing_attr_tsc_art_ratio.attr.attr,
NULL,
};
static struct attribute_group pt_timing_group = {
.attrs = pt_timing_attr,
};
static const struct attribute_group *pt_attr_groups[] = {
&pt_cap_group,
&pt_format_group,
&pt_timing_group,
NULL,
};
......@@ -140,6 +179,23 @@ static int __init pt_pmu_hw_init(void)
int ret;
long i;
rdmsrl(MSR_PLATFORM_INFO, reg);
pt_pmu.max_nonturbo_ratio = (reg & 0xff00) >> 8;
/*
* if available, read in TSC to core crystal clock ratio,
* otherwise, zero for numerator stands for "not enumerated"
* as per SDM
*/
if (boot_cpu_data.cpuid_level >= CPUID_TSC_LEAF) {
u32 eax, ebx, ecx, edx;
cpuid(CPUID_TSC_LEAF, &eax, &ebx, &ecx, &edx);
pt_pmu.tsc_art_num = ebx;
pt_pmu.tsc_art_den = eax;
}
if (boot_cpu_has(X86_FEATURE_VMX)) {
/*
* Intel SDM, 36.5 "Tracing post-VMXON" says that
......@@ -263,6 +319,75 @@ static bool pt_event_valid(struct perf_event *event)
* These all are cpu affine and operate on a local PT
*/
/* Address ranges and their corresponding msr configuration registers */
static const struct pt_address_range {
unsigned long msr_a;
unsigned long msr_b;
unsigned int reg_off;
} pt_address_ranges[] = {
{
.msr_a = MSR_IA32_RTIT_ADDR0_A,
.msr_b = MSR_IA32_RTIT_ADDR0_B,
.reg_off = RTIT_CTL_ADDR0_OFFSET,
},
{
.msr_a = MSR_IA32_RTIT_ADDR1_A,
.msr_b = MSR_IA32_RTIT_ADDR1_B,
.reg_off = RTIT_CTL_ADDR1_OFFSET,
},
{
.msr_a = MSR_IA32_RTIT_ADDR2_A,
.msr_b = MSR_IA32_RTIT_ADDR2_B,
.reg_off = RTIT_CTL_ADDR2_OFFSET,
},
{
.msr_a = MSR_IA32_RTIT_ADDR3_A,
.msr_b = MSR_IA32_RTIT_ADDR3_B,
.reg_off = RTIT_CTL_ADDR3_OFFSET,
}
};
static u64 pt_config_filters(struct perf_event *event)
{
struct pt_filters *filters = event->hw.addr_filters;
struct pt *pt = this_cpu_ptr(&pt_ctx);
unsigned int range = 0;
u64 rtit_ctl = 0;
if (!filters)
return 0;
perf_event_addr_filters_sync(event);
for (range = 0; range < filters->nr_filters; range++) {
struct pt_filter *filter = &filters->filter[range];
/*
* Note, if the range has zero start/end addresses due
* to its dynamic object not being loaded yet, we just
* go ahead and program zeroed range, which will simply
* produce no data. Note^2: if executable code at 0x0
* is a concern, we can set up an "invalid" configuration
* such as msr_b < msr_a.
*/
/* avoid redundant msr writes */
if (pt->filters.filter[range].msr_a != filter->msr_a) {
wrmsrl(pt_address_ranges[range].msr_a, filter->msr_a);
pt->filters.filter[range].msr_a = filter->msr_a;
}
if (pt->filters.filter[range].msr_b != filter->msr_b) {
wrmsrl(pt_address_ranges[range].msr_b, filter->msr_b);
pt->filters.filter[range].msr_b = filter->msr_b;
}
rtit_ctl |= filter->config << pt_address_ranges[range].reg_off;
}
return rtit_ctl;
}
static void pt_config(struct perf_event *event)
{
u64 reg;
......@@ -272,7 +397,8 @@ static void pt_config(struct perf_event *event)
wrmsrl(MSR_IA32_RTIT_STATUS, 0);
}
reg = RTIT_CTL_TOPA | RTIT_CTL_BRANCH_EN | RTIT_CTL_TRACEEN;
reg = pt_config_filters(event);
reg |= RTIT_CTL_TOPA | RTIT_CTL_BRANCH_EN | RTIT_CTL_TRACEEN;
if (!event->attr.exclude_kernel)
reg |= RTIT_CTL_OS;
......@@ -921,24 +1047,80 @@ static void pt_buffer_free_aux(void *data)
kfree(buf);
}
/**
* pt_buffer_is_full() - check if the buffer is full
* @buf: PT buffer.
* @pt: Per-cpu pt handle.
*
* If the user hasn't read data from the output region that aux_head
* points to, the buffer is considered full: the user needs to read at
* least this region and update aux_tail to point past it.
*/
static bool pt_buffer_is_full(struct pt_buffer *buf, struct pt *pt)
static int pt_addr_filters_init(struct perf_event *event)
{
if (buf->snapshot)
return false;
struct pt_filters *filters;
int node = event->cpu == -1 ? -1 : cpu_to_node(event->cpu);
if (!pt_cap_get(PT_CAP_num_address_ranges))
return 0;
filters = kzalloc_node(sizeof(struct pt_filters), GFP_KERNEL, node);
if (!filters)
return -ENOMEM;
if (event->parent)
memcpy(filters, event->parent->hw.addr_filters,
sizeof(*filters));
event->hw.addr_filters = filters;
return 0;
}
static void pt_addr_filters_fini(struct perf_event *event)
{
kfree(event->hw.addr_filters);
event->hw.addr_filters = NULL;
}
static int pt_event_addr_filters_validate(struct list_head *filters)
{
struct perf_addr_filter *filter;
int range = 0;
list_for_each_entry(filter, filters, entry) {
/* PT doesn't support single address triggers */
if (!filter->range)
return -EOPNOTSUPP;
if (!filter->inode && !kernel_ip(filter->offset))
return -EINVAL;
if (++range > pt_cap_get(PT_CAP_num_address_ranges))
return -EOPNOTSUPP;
}
return 0;
}
static void pt_event_addr_filters_sync(struct perf_event *event)
{
struct perf_addr_filters_head *head = perf_event_addr_filters(event);
unsigned long msr_a, msr_b, *offs = event->addr_filters_offs;
struct pt_filters *filters = event->hw.addr_filters;
struct perf_addr_filter *filter;
int range = 0;
if (!filters)
return;
if (local_read(&buf->data_size) >= pt->handle.size)
return true;
list_for_each_entry(filter, &head->list, entry) {
if (filter->inode && !offs[range]) {
msr_a = msr_b = 0;
} else {
/* apply the offset */
msr_a = filter->offset + offs[range];
msr_b = filter->size + msr_a;
}
filters->filter[range].msr_a = msr_a;
filters->filter[range].msr_b = msr_b;
filters->filter[range].config = filter->filter ? 1 : 2;
range++;
}
return false;
filters->nr_filters = range;
}
/**
......@@ -955,7 +1137,7 @@ void intel_pt_interrupt(void)
* after PT has been disabled by pt_event_stop(). Make sure we don't
* do anything (particularly, re-enable) for this event here.
*/
if (!ACCESS_ONCE(pt->handle_nmi))
if (!READ_ONCE(pt->handle_nmi))
return;
/*
......@@ -1040,23 +1222,36 @@ EXPORT_SYMBOL_GPL(intel_pt_handle_vmx);
static void pt_event_start(struct perf_event *event, int mode)
{
struct hw_perf_event *hwc = &event->hw;
struct pt *pt = this_cpu_ptr(&pt_ctx);
struct pt_buffer *buf = perf_get_aux(&pt->handle);
struct pt_buffer *buf;
if (READ_ONCE(pt->vmx_on))
return;
if (!buf || pt_buffer_is_full(buf, pt)) {
event->hw.state = PERF_HES_STOPPED;
return;
buf = perf_aux_output_begin(&pt->handle, event);
if (!buf)
goto fail_stop;
pt_buffer_reset_offsets(buf, pt->handle.head);
if (!buf->snapshot) {
if (pt_buffer_reset_markers(buf, &pt->handle))
goto fail_end_stop;
}
ACCESS_ONCE(pt->handle_nmi) = 1;
event->hw.state = 0;
WRITE_ONCE(pt->handle_nmi, 1);
hwc->state = 0;
pt_config_buffer(buf->cur->table, buf->cur_idx,
buf->output_off);
pt_config(event);
return;
fail_end_stop:
perf_aux_output_end(&pt->handle, 0, true);
fail_stop:
hwc->state = PERF_HES_STOPPED;
}
static void pt_event_stop(struct perf_event *event, int mode)
......@@ -1067,7 +1262,7 @@ static void pt_event_stop(struct perf_event *event, int mode)
* Protect against the PMI racing with disabling wrmsr,
* see comment in intel_pt_interrupt().
*/
ACCESS_ONCE(pt->handle_nmi) = 0;
WRITE_ONCE(pt->handle_nmi, 0);
pt_config_stop(event);
......@@ -1090,19 +1285,7 @@ static void pt_event_stop(struct perf_event *event, int mode)
pt_handle_status(pt);
pt_update_head(pt);
}
}
static void pt_event_del(struct perf_event *event, int mode)
{
struct pt *pt = this_cpu_ptr(&pt_ctx);
struct pt_buffer *buf;
pt_event_stop(event, PERF_EF_UPDATE);
buf = perf_get_aux(&pt->handle);
if (buf) {
if (buf->snapshot)
pt->handle.head =
local_xchg(&buf->data_size,
......@@ -1112,9 +1295,13 @@ static void pt_event_del(struct perf_event *event, int mode)
}
}
static void pt_event_del(struct perf_event *event, int mode)
{
pt_event_stop(event, PERF_EF_UPDATE);
}
static int pt_event_add(struct perf_event *event, int mode)
{
struct pt_buffer *buf;
struct pt *pt = this_cpu_ptr(&pt_ctx);
struct hw_perf_event *hwc = &event->hw;
int ret = -EBUSY;
......@@ -1122,34 +1309,18 @@ static int pt_event_add(struct perf_event *event, int mode)
if (pt->handle.event)
goto fail;
buf = perf_aux_output_begin(&pt->handle, event);
ret = -EINVAL;
if (!buf)
goto fail_stop;
pt_buffer_reset_offsets(buf, pt->handle.head);
if (!buf->snapshot) {
ret = pt_buffer_reset_markers(buf, &pt->handle);
if (ret)
goto fail_end_stop;
}
if (mode & PERF_EF_START) {
pt_event_start(event, 0);
ret = -EBUSY;
ret = -EINVAL;
if (hwc->state == PERF_HES_STOPPED)
goto fail_end_stop;
goto fail;
} else {
hwc->state = PERF_HES_STOPPED;
}
return 0;
fail_end_stop:
perf_aux_output_end(&pt->handle, 0, true);
fail_stop:
hwc->state = PERF_HES_STOPPED;
ret = 0;
fail:
return ret;
}
......@@ -1159,6 +1330,7 @@ static void pt_event_read(struct perf_event *event)
static void pt_event_destroy(struct perf_event *event)
{
pt_addr_filters_fini(event);
x86_del_exclusive(x86_lbr_exclusive_pt);
}
......@@ -1173,6 +1345,11 @@ static int pt_event_init(struct perf_event *event)
if (x86_add_exclusive(x86_lbr_exclusive_pt))
return -EBUSY;
if (pt_addr_filters_init(event)) {
x86_del_exclusive(x86_lbr_exclusive_pt);
return -ENOMEM;
}
event->destroy = pt_event_destroy;
return 0;
......@@ -1192,7 +1369,7 @@ static __init int pt_init(void)
BUILD_BUG_ON(sizeof(struct topa) > PAGE_SIZE);
if (!test_cpu_cap(&boot_cpu_data, X86_FEATURE_INTEL_PT))
if (!boot_cpu_has(X86_FEATURE_INTEL_PT))
return -ENODEV;
get_online_cpus();
......@@ -1226,16 +1403,21 @@ static __init int pt_init(void)
PERF_PMU_CAP_AUX_NO_SG | PERF_PMU_CAP_AUX_SW_DOUBLEBUF;
pt_pmu.pmu.capabilities |= PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE;
pt_pmu.pmu.attr_groups = pt_attr_groups;
pt_pmu.pmu.task_ctx_nr = perf_sw_context;
pt_pmu.pmu.event_init = pt_event_init;
pt_pmu.pmu.add = pt_event_add;
pt_pmu.pmu.del = pt_event_del;
pt_pmu.pmu.start = pt_event_start;
pt_pmu.pmu.stop = pt_event_stop;
pt_pmu.pmu.read = pt_event_read;
pt_pmu.pmu.setup_aux = pt_buffer_setup_aux;
pt_pmu.pmu.free_aux = pt_buffer_free_aux;
pt_pmu.pmu.attr_groups = pt_attr_groups;
pt_pmu.pmu.task_ctx_nr = perf_sw_context;
pt_pmu.pmu.event_init = pt_event_init;
pt_pmu.pmu.add = pt_event_add;
pt_pmu.pmu.del = pt_event_del;
pt_pmu.pmu.start = pt_event_start;
pt_pmu.pmu.stop = pt_event_stop;
pt_pmu.pmu.read = pt_event_read;
pt_pmu.pmu.setup_aux = pt_buffer_setup_aux;
pt_pmu.pmu.free_aux = pt_buffer_free_aux;
pt_pmu.pmu.addr_filters_sync = pt_event_addr_filters_sync;
pt_pmu.pmu.addr_filters_validate = pt_event_addr_filters_validate;
pt_pmu.pmu.nr_addr_filters =
pt_cap_get(PT_CAP_num_address_ranges);
ret = perf_pmu_register(&pt_pmu.pmu, "intel_pt", -1);
return ret;
......
......@@ -19,6 +19,40 @@
#ifndef __INTEL_PT_H__
#define __INTEL_PT_H__
/*
* PT MSR bit definitions
*/
#define RTIT_CTL_TRACEEN BIT(0)
#define RTIT_CTL_CYCLEACC BIT(1)
#define RTIT_CTL_OS BIT(2)
#define RTIT_CTL_USR BIT(3)
#define RTIT_CTL_CR3EN BIT(7)
#define RTIT_CTL_TOPA BIT(8)
#define RTIT_CTL_MTC_EN BIT(9)
#define RTIT_CTL_TSC_EN BIT(10)
#define RTIT_CTL_DISRETC BIT(11)
#define RTIT_CTL_BRANCH_EN BIT(13)
#define RTIT_CTL_MTC_RANGE_OFFSET 14
#define RTIT_CTL_MTC_RANGE (0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
#define RTIT_CTL_CYC_THRESH_OFFSET 19
#define RTIT_CTL_CYC_THRESH (0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
#define RTIT_CTL_PSB_FREQ_OFFSET 24
#define RTIT_CTL_PSB_FREQ (0x0full << RTIT_CTL_PSB_FREQ_OFFSET)
#define RTIT_CTL_ADDR0_OFFSET 32
#define RTIT_CTL_ADDR0 (0x0full << RTIT_CTL_ADDR0_OFFSET)
#define RTIT_CTL_ADDR1_OFFSET 36
#define RTIT_CTL_ADDR1 (0x0full << RTIT_CTL_ADDR1_OFFSET)
#define RTIT_CTL_ADDR2_OFFSET 40
#define RTIT_CTL_ADDR2 (0x0full << RTIT_CTL_ADDR2_OFFSET)
#define RTIT_CTL_ADDR3_OFFSET 44
#define RTIT_CTL_ADDR3 (0x0full << RTIT_CTL_ADDR3_OFFSET)
#define RTIT_STATUS_FILTEREN BIT(0)
#define RTIT_STATUS_CONTEXTEN BIT(1)
#define RTIT_STATUS_TRIGGEREN BIT(2)
#define RTIT_STATUS_BUFFOVF BIT(3)
#define RTIT_STATUS_ERROR BIT(4)
#define RTIT_STATUS_STOPPED BIT(5)
/*
* Single-entry ToPA: when this close to region boundary, switch
* buffers to avoid losing data.
......@@ -48,15 +82,20 @@ struct topa_entry {
#define PT_CPUID_LEAVES 2
#define PT_CPUID_REGS_NUM 4 /* number of regsters (eax, ebx, ecx, edx) */
/* TSC to Core Crystal Clock Ratio */
#define CPUID_TSC_LEAF 0x15
enum pt_capabilities {
PT_CAP_max_subleaf = 0,
PT_CAP_cr3_filtering,
PT_CAP_psb_cyc,
PT_CAP_ip_filtering,
PT_CAP_mtc,
PT_CAP_topa_output,
PT_CAP_topa_multiple_entries,
PT_CAP_single_range_output,
PT_CAP_payloads_lip,
PT_CAP_num_address_ranges,
PT_CAP_mtc_periods,
PT_CAP_cycle_thresholds,
PT_CAP_psb_periods,
......@@ -66,6 +105,9 @@ struct pt_pmu {
struct pmu pmu;
u32 caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES];
bool vmx;
unsigned long max_nonturbo_ratio;
unsigned int tsc_art_num;
unsigned int tsc_art_den;
};
/**
......@@ -104,14 +146,40 @@ struct pt_buffer {
struct topa_entry *topa_index[0];
};
#define PT_FILTERS_NUM 4
/**
* struct pt_filter - IP range filter configuration
* @msr_a: range start, goes to RTIT_ADDRn_A
* @msr_b: range end, goes to RTIT_ADDRn_B
* @config: 4-bit field in RTIT_CTL
*/
struct pt_filter {
unsigned long msr_a;
unsigned long msr_b;
unsigned long config;
};
/**
* struct pt_filters - IP range filtering context
* @filter: filters defined for this context
* @nr_filters: number of defined filters in the @filter array
*/
struct pt_filters {
struct pt_filter filter[PT_FILTERS_NUM];
unsigned int nr_filters;
};
/**
* struct pt - per-cpu pt context
* @handle: perf output handle
* @filters: last configured filters
* @handle_nmi: do handle PT PMI on this cpu, there's an active event
* @vmx_on: 1 if VMX is ON on this cpu
*/
struct pt {
struct perf_output_handle handle;
struct pt_filters filters;
int handle_nmi;
int vmx_on;
};
......
......@@ -27,10 +27,14 @@
* event: rapl_energy_dram
* perf code: 0x3
*
* dram counter: consumption of the builtin-gpu domain (client only)
* gpu counter: consumption of the builtin-gpu domain (client only)
* event: rapl_energy_gpu
* perf code: 0x4
*
* psys counter: consumption of the builtin-psys domain (client only)
* event: rapl_energy_psys
* perf code: 0x5
*
* We manage those counters as free running (read-only). They may be
* use simultaneously by other tools, such as turbostat.
*
......@@ -53,6 +57,8 @@
#include <asm/cpu_device_id.h>
#include "../perf_event.h"
MODULE_LICENSE("GPL");
/*
* RAPL energy status counters
*/
......@@ -64,13 +70,16 @@
#define INTEL_RAPL_RAM 0x3 /* pseudo-encoding */
#define RAPL_IDX_PP1_NRG_STAT 3 /* gpu */
#define INTEL_RAPL_PP1 0x4 /* pseudo-encoding */
#define RAPL_IDX_PSYS_NRG_STAT 4 /* psys */
#define INTEL_RAPL_PSYS 0x5 /* pseudo-encoding */
#define NR_RAPL_DOMAINS 0x4
#define NR_RAPL_DOMAINS 0x5
static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
"pp0-core",
"package",
"dram",
"pp1-gpu",
"psys",
};
/* Clients have PP0, PKG */
......@@ -89,6 +98,13 @@ static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
1<<RAPL_IDX_RAM_NRG_STAT|\
1<<RAPL_IDX_PP1_NRG_STAT)
/* SKL clients have PP0, PKG, RAM, PP1, PSYS */
#define RAPL_IDX_SKL_CLN (1<<RAPL_IDX_PP0_NRG_STAT|\
1<<RAPL_IDX_PKG_NRG_STAT|\
1<<RAPL_IDX_RAM_NRG_STAT|\
1<<RAPL_IDX_PP1_NRG_STAT|\
1<<RAPL_IDX_PSYS_NRG_STAT)
/* Knights Landing has PKG, RAM */
#define RAPL_IDX_KNL (1<<RAPL_IDX_PKG_NRG_STAT|\
1<<RAPL_IDX_RAM_NRG_STAT)
......@@ -360,6 +376,10 @@ static int rapl_pmu_event_init(struct perf_event *event)
bit = RAPL_IDX_PP1_NRG_STAT;
msr = MSR_PP1_ENERGY_STATUS;
break;
case INTEL_RAPL_PSYS:
bit = RAPL_IDX_PSYS_NRG_STAT;
msr = MSR_PLATFORM_ENERGY_STATUS;
break;
default:
return -EINVAL;
}
......@@ -414,11 +434,13 @@ RAPL_EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01");
RAPL_EVENT_ATTR_STR(energy-pkg , rapl_pkg, "event=0x02");
RAPL_EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03");
RAPL_EVENT_ATTR_STR(energy-gpu , rapl_gpu, "event=0x04");
RAPL_EVENT_ATTR_STR(energy-psys, rapl_psys, "event=0x05");
RAPL_EVENT_ATTR_STR(energy-cores.unit, rapl_cores_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-pkg.unit , rapl_pkg_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-ram.unit , rapl_ram_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-gpu.unit , rapl_gpu_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-psys.unit, rapl_psys_unit, "Joules");
/*
* we compute in 0.23 nJ increments regardless of MSR
......@@ -427,6 +449,7 @@ RAPL_EVENT_ATTR_STR(energy-cores.scale, rapl_cores_scale, "2.3283064365386962890
RAPL_EVENT_ATTR_STR(energy-pkg.scale, rapl_pkg_scale, "2.3283064365386962890625e-10");
RAPL_EVENT_ATTR_STR(energy-ram.scale, rapl_ram_scale, "2.3283064365386962890625e-10");
RAPL_EVENT_ATTR_STR(energy-gpu.scale, rapl_gpu_scale, "2.3283064365386962890625e-10");
RAPL_EVENT_ATTR_STR(energy-psys.scale, rapl_psys_scale, "2.3283064365386962890625e-10");
static struct attribute *rapl_events_srv_attr[] = {
EVENT_PTR(rapl_cores),
......@@ -476,6 +499,27 @@ static struct attribute *rapl_events_hsw_attr[] = {
NULL,
};
static struct attribute *rapl_events_skl_attr[] = {
EVENT_PTR(rapl_cores),
EVENT_PTR(rapl_pkg),
EVENT_PTR(rapl_gpu),
EVENT_PTR(rapl_ram),
EVENT_PTR(rapl_psys),
EVENT_PTR(rapl_cores_unit),
EVENT_PTR(rapl_pkg_unit),
EVENT_PTR(rapl_gpu_unit),
EVENT_PTR(rapl_ram_unit),
EVENT_PTR(rapl_psys_unit),
EVENT_PTR(rapl_cores_scale),
EVENT_PTR(rapl_pkg_scale),
EVENT_PTR(rapl_gpu_scale),
EVENT_PTR(rapl_ram_scale),
EVENT_PTR(rapl_psys_scale),
NULL,
};
static struct attribute *rapl_events_knl_attr[] = {
EVENT_PTR(rapl_pkg),
EVENT_PTR(rapl_ram),
......@@ -592,6 +636,11 @@ static int rapl_cpu_notifier(struct notifier_block *self,
return NOTIFY_OK;
}
static struct notifier_block rapl_cpu_nb = {
.notifier_call = rapl_cpu_notifier,
.priority = CPU_PRI_PERF + 1,
};
static int rapl_check_hw_unit(bool apply_quirk)
{
u64 msr_rapl_power_unit_bits;
......@@ -660,7 +709,7 @@ static int __init rapl_prepare_cpus(void)
return 0;
}
static void __init cleanup_rapl_pmus(void)
static void cleanup_rapl_pmus(void)
{
int i;
......@@ -691,52 +740,92 @@ static int __init init_rapl_pmus(void)
return 0;
}
#define X86_RAPL_MODEL_MATCH(model, init) \
{ X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY, (unsigned long)&init }
struct intel_rapl_init_fun {
bool apply_quirk;
int cntr_mask;
struct attribute **attrs;
};
static const struct intel_rapl_init_fun snb_rapl_init __initconst = {
.apply_quirk = false,
.cntr_mask = RAPL_IDX_CLN,
.attrs = rapl_events_cln_attr,
};
static const struct intel_rapl_init_fun hsx_rapl_init __initconst = {
.apply_quirk = true,
.cntr_mask = RAPL_IDX_SRV,
.attrs = rapl_events_srv_attr,
};
static const struct intel_rapl_init_fun hsw_rapl_init __initconst = {
.apply_quirk = false,
.cntr_mask = RAPL_IDX_HSW,
.attrs = rapl_events_hsw_attr,
};
static const struct intel_rapl_init_fun snbep_rapl_init __initconst = {
.apply_quirk = false,
.cntr_mask = RAPL_IDX_SRV,
.attrs = rapl_events_srv_attr,
};
static const struct intel_rapl_init_fun knl_rapl_init __initconst = {
.apply_quirk = true,
.cntr_mask = RAPL_IDX_KNL,
.attrs = rapl_events_knl_attr,
};
static const struct intel_rapl_init_fun skl_rapl_init __initconst = {
.apply_quirk = false,
.cntr_mask = RAPL_IDX_SKL_CLN,
.attrs = rapl_events_skl_attr,
};
static const struct x86_cpu_id rapl_cpu_match[] __initconst = {
[0] = { .vendor = X86_VENDOR_INTEL, .family = 6 },
[1] = {},
X86_RAPL_MODEL_MATCH(42, snb_rapl_init), /* Sandy Bridge */
X86_RAPL_MODEL_MATCH(45, snbep_rapl_init), /* Sandy Bridge-EP */
X86_RAPL_MODEL_MATCH(58, snb_rapl_init), /* Ivy Bridge */
X86_RAPL_MODEL_MATCH(62, snbep_rapl_init), /* IvyTown */
X86_RAPL_MODEL_MATCH(60, hsw_rapl_init), /* Haswell */
X86_RAPL_MODEL_MATCH(63, hsx_rapl_init), /* Haswell-Server */
X86_RAPL_MODEL_MATCH(69, hsw_rapl_init), /* Haswell-Celeron */
X86_RAPL_MODEL_MATCH(70, hsw_rapl_init), /* Haswell GT3e */
X86_RAPL_MODEL_MATCH(61, hsw_rapl_init), /* Broadwell */
X86_RAPL_MODEL_MATCH(71, hsw_rapl_init), /* Broadwell-H */
X86_RAPL_MODEL_MATCH(79, hsx_rapl_init), /* Broadwell-Server */
X86_RAPL_MODEL_MATCH(86, hsx_rapl_init), /* Broadwell Xeon D */
X86_RAPL_MODEL_MATCH(87, knl_rapl_init), /* Knights Landing */
X86_RAPL_MODEL_MATCH(78, skl_rapl_init), /* Skylake */
X86_RAPL_MODEL_MATCH(94, skl_rapl_init), /* Skylake H/S */
{},
};
MODULE_DEVICE_TABLE(x86cpu, rapl_cpu_match);
static int __init rapl_pmu_init(void)
{
bool apply_quirk = false;
const struct x86_cpu_id *id;
struct intel_rapl_init_fun *rapl_init;
bool apply_quirk;
int ret;
if (!x86_match_cpu(rapl_cpu_match))
id = x86_match_cpu(rapl_cpu_match);
if (!id)
return -ENODEV;
switch (boot_cpu_data.x86_model) {
case 42: /* Sandy Bridge */
case 58: /* Ivy Bridge */
rapl_cntr_mask = RAPL_IDX_CLN;
rapl_pmu_events_group.attrs = rapl_events_cln_attr;
break;
case 63: /* Haswell-Server */
case 79: /* Broadwell-Server */
apply_quirk = true;
rapl_cntr_mask = RAPL_IDX_SRV;
rapl_pmu_events_group.attrs = rapl_events_srv_attr;
break;
case 60: /* Haswell */
case 69: /* Haswell-Celeron */
case 70: /* Haswell GT3e */
case 61: /* Broadwell */
case 71: /* Broadwell-H */
rapl_cntr_mask = RAPL_IDX_HSW;
rapl_pmu_events_group.attrs = rapl_events_hsw_attr;
break;
case 45: /* Sandy Bridge-EP */
case 62: /* IvyTown */
rapl_cntr_mask = RAPL_IDX_SRV;
rapl_pmu_events_group.attrs = rapl_events_srv_attr;
break;
case 87: /* Knights Landing */
apply_quirk = true;
rapl_cntr_mask = RAPL_IDX_KNL;
rapl_pmu_events_group.attrs = rapl_events_knl_attr;
break;
default:
return -ENODEV;
}
rapl_init = (struct intel_rapl_init_fun *)id->driver_data;
apply_quirk = rapl_init->apply_quirk;
rapl_cntr_mask = rapl_init->cntr_mask;
rapl_pmu_events_group.attrs = rapl_init->attrs;
ret = rapl_check_hw_unit(apply_quirk);
if (ret)
......@@ -756,7 +845,7 @@ static int __init rapl_pmu_init(void)
if (ret)
goto out;
__perf_cpu_notifier(rapl_cpu_notifier);
__register_cpu_notifier(&rapl_cpu_nb);
cpu_notifier_register_done();
rapl_advertise();
return 0;
......@@ -767,4 +856,14 @@ static int __init rapl_pmu_init(void)
cpu_notifier_register_done();
return ret;
}
device_initcall(rapl_pmu_init);
module_init(rapl_pmu_init);
static void __exit intel_rapl_exit(void)
{
cpu_notifier_register_begin();
__unregister_cpu_notifier(&rapl_cpu_nb);
perf_pmu_unregister(&rapl_pmus->pmu);
cleanup_rapl_pmus();
cpu_notifier_register_done();
}
module_exit(intel_rapl_exit);
#include <asm/cpu_device_id.h>
#include "uncore.h"
static struct intel_uncore_type *empty_uncore[] = { NULL, };
......@@ -21,6 +22,8 @@ static struct event_constraint uncore_constraint_fixed =
struct event_constraint uncore_constraint_empty =
EVENT_CONSTRAINT(0, 0, 0);
MODULE_LICENSE("GPL");
static int uncore_pcibus_to_physid(struct pci_bus *bus)
{
struct pci2phy_map *map;
......@@ -754,7 +757,7 @@ static void uncore_pmu_unregister(struct intel_uncore_pmu *pmu)
pmu->registered = false;
}
static void __init __uncore_exit_boxes(struct intel_uncore_type *type, int cpu)
static void __uncore_exit_boxes(struct intel_uncore_type *type, int cpu)
{
struct intel_uncore_pmu *pmu = type->pmus;
struct intel_uncore_box *box;
......@@ -770,7 +773,7 @@ static void __init __uncore_exit_boxes(struct intel_uncore_type *type, int cpu)
}
}
static void __init uncore_exit_boxes(void *dummy)
static void uncore_exit_boxes(void *dummy)
{
struct intel_uncore_type **types;
......@@ -787,7 +790,7 @@ static void uncore_free_boxes(struct intel_uncore_pmu *pmu)
kfree(pmu->boxes);
}
static void __init uncore_type_exit(struct intel_uncore_type *type)
static void uncore_type_exit(struct intel_uncore_type *type)
{
struct intel_uncore_pmu *pmu = type->pmus;
int i;
......@@ -804,7 +807,7 @@ static void __init uncore_type_exit(struct intel_uncore_type *type)
type->events_group = NULL;
}
static void __init uncore_types_exit(struct intel_uncore_type **types)
static void uncore_types_exit(struct intel_uncore_type **types)
{
for (; *types; types++)
uncore_type_exit(*types);
......@@ -989,46 +992,6 @@ static int __init uncore_pci_init(void)
size_t size;
int ret;
switch (boot_cpu_data.x86_model) {
case 45: /* Sandy Bridge-EP */
ret = snbep_uncore_pci_init();
break;
case 62: /* Ivy Bridge-EP */
ret = ivbep_uncore_pci_init();
break;
case 63: /* Haswell-EP */
ret = hswep_uncore_pci_init();
break;
case 79: /* BDX-EP */
case 86: /* BDX-DE */
ret = bdx_uncore_pci_init();
break;
case 42: /* Sandy Bridge */
ret = snb_uncore_pci_init();
break;
case 58: /* Ivy Bridge */
ret = ivb_uncore_pci_init();
break;
case 60: /* Haswell */
case 69: /* Haswell Celeron */
ret = hsw_uncore_pci_init();
break;
case 61: /* Broadwell */
ret = bdw_uncore_pci_init();
break;
case 87: /* Knights Landing */
ret = knl_uncore_pci_init();
break;
case 94: /* SkyLake */
ret = skl_uncore_pci_init();
break;
default:
return -ENODEV;
}
if (ret)
return ret;
size = max_packages * sizeof(struct pci_extra_dev);
uncore_extra_pci_dev = kzalloc(size, GFP_KERNEL);
if (!uncore_extra_pci_dev) {
......@@ -1060,7 +1023,7 @@ static int __init uncore_pci_init(void)
return ret;
}
static void __init uncore_pci_exit(void)
static void uncore_pci_exit(void)
{
if (pcidrv_registered) {
pcidrv_registered = false;
......@@ -1287,46 +1250,6 @@ static int __init uncore_cpu_init(void)
{
int ret;
switch (boot_cpu_data.x86_model) {
case 26: /* Nehalem */
case 30:
case 37: /* Westmere */
case 44:
nhm_uncore_cpu_init();
break;
case 42: /* Sandy Bridge */
case 58: /* Ivy Bridge */
case 60: /* Haswell */
case 69: /* Haswell */
case 70: /* Haswell */
case 61: /* Broadwell */
case 71: /* Broadwell */
snb_uncore_cpu_init();
break;
case 45: /* Sandy Bridge-EP */
snbep_uncore_cpu_init();
break;
case 46: /* Nehalem-EX */
case 47: /* Westmere-EX aka. Xeon E7 */
nhmex_uncore_cpu_init();
break;
case 62: /* Ivy Bridge-EP */
ivbep_uncore_cpu_init();
break;
case 63: /* Haswell-EP */
hswep_uncore_cpu_init();
break;
case 79: /* BDX-EP */
case 86: /* BDX-DE */
bdx_uncore_cpu_init();
break;
case 87: /* Knights Landing */
knl_uncore_cpu_init();
break;
default:
return -ENODEV;
}
ret = uncore_types_init(uncore_msr_uncores, true);
if (ret)
goto err;
......@@ -1376,11 +1299,105 @@ static int __init uncore_cpumask_init(bool msr)
return 0;
}
#define X86_UNCORE_MODEL_MATCH(model, init) \
{ X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY, (unsigned long)&init }
struct intel_uncore_init_fun {
void (*cpu_init)(void);
int (*pci_init)(void);
};
static const struct intel_uncore_init_fun nhm_uncore_init __initconst = {
.cpu_init = nhm_uncore_cpu_init,
};
static const struct intel_uncore_init_fun snb_uncore_init __initconst = {
.cpu_init = snb_uncore_cpu_init,
.pci_init = snb_uncore_pci_init,
};
static const struct intel_uncore_init_fun ivb_uncore_init __initconst = {
.cpu_init = snb_uncore_cpu_init,
.pci_init = ivb_uncore_pci_init,
};
static const struct intel_uncore_init_fun hsw_uncore_init __initconst = {
.cpu_init = snb_uncore_cpu_init,
.pci_init = hsw_uncore_pci_init,
};
static const struct intel_uncore_init_fun bdw_uncore_init __initconst = {
.cpu_init = snb_uncore_cpu_init,
.pci_init = bdw_uncore_pci_init,
};
static const struct intel_uncore_init_fun snbep_uncore_init __initconst = {
.cpu_init = snbep_uncore_cpu_init,
.pci_init = snbep_uncore_pci_init,
};
static const struct intel_uncore_init_fun nhmex_uncore_init __initconst = {
.cpu_init = nhmex_uncore_cpu_init,
};
static const struct intel_uncore_init_fun ivbep_uncore_init __initconst = {
.cpu_init = ivbep_uncore_cpu_init,
.pci_init = ivbep_uncore_pci_init,
};
static const struct intel_uncore_init_fun hswep_uncore_init __initconst = {
.cpu_init = hswep_uncore_cpu_init,
.pci_init = hswep_uncore_pci_init,
};
static const struct intel_uncore_init_fun bdx_uncore_init __initconst = {
.cpu_init = bdx_uncore_cpu_init,
.pci_init = bdx_uncore_pci_init,
};
static const struct intel_uncore_init_fun knl_uncore_init __initconst = {
.cpu_init = knl_uncore_cpu_init,
.pci_init = knl_uncore_pci_init,
};
static const struct intel_uncore_init_fun skl_uncore_init __initconst = {
.pci_init = skl_uncore_pci_init,
};
static const struct x86_cpu_id intel_uncore_match[] __initconst = {
X86_UNCORE_MODEL_MATCH(26, nhm_uncore_init), /* Nehalem */
X86_UNCORE_MODEL_MATCH(30, nhm_uncore_init),
X86_UNCORE_MODEL_MATCH(37, nhm_uncore_init), /* Westmere */
X86_UNCORE_MODEL_MATCH(44, nhm_uncore_init),
X86_UNCORE_MODEL_MATCH(42, snb_uncore_init), /* Sandy Bridge */
X86_UNCORE_MODEL_MATCH(58, ivb_uncore_init), /* Ivy Bridge */
X86_UNCORE_MODEL_MATCH(60, hsw_uncore_init), /* Haswell */
X86_UNCORE_MODEL_MATCH(69, hsw_uncore_init), /* Haswell Celeron */
X86_UNCORE_MODEL_MATCH(70, hsw_uncore_init), /* Haswell */
X86_UNCORE_MODEL_MATCH(61, bdw_uncore_init), /* Broadwell */
X86_UNCORE_MODEL_MATCH(71, bdw_uncore_init), /* Broadwell */
X86_UNCORE_MODEL_MATCH(45, snbep_uncore_init), /* Sandy Bridge-EP */
X86_UNCORE_MODEL_MATCH(46, nhmex_uncore_init), /* Nehalem-EX */
X86_UNCORE_MODEL_MATCH(47, nhmex_uncore_init), /* Westmere-EX aka. Xeon E7 */
X86_UNCORE_MODEL_MATCH(62, ivbep_uncore_init), /* Ivy Bridge-EP */
X86_UNCORE_MODEL_MATCH(63, hswep_uncore_init), /* Haswell-EP */
X86_UNCORE_MODEL_MATCH(79, bdx_uncore_init), /* BDX-EP */
X86_UNCORE_MODEL_MATCH(86, bdx_uncore_init), /* BDX-DE */
X86_UNCORE_MODEL_MATCH(87, knl_uncore_init), /* Knights Landing */
X86_UNCORE_MODEL_MATCH(94, skl_uncore_init), /* SkyLake */
{},
};
MODULE_DEVICE_TABLE(x86cpu, intel_uncore_match);
static int __init intel_uncore_init(void)
{
int pret, cret, ret;
const struct x86_cpu_id *id;
struct intel_uncore_init_fun *uncore_init;
int pret = 0, cret = 0, ret;
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
id = x86_match_cpu(intel_uncore_match);
if (!id)
return -ENODEV;
if (cpu_has_hypervisor)
......@@ -1388,8 +1405,17 @@ static int __init intel_uncore_init(void)
max_packages = topology_max_packages();
pret = uncore_pci_init();
cret = uncore_cpu_init();
uncore_init = (struct intel_uncore_init_fun *)id->driver_data;
if (uncore_init->pci_init) {
pret = uncore_init->pci_init();
if (!pret)
pret = uncore_pci_init();
}
if (uncore_init->cpu_init) {
uncore_init->cpu_init();
cret = uncore_cpu_init();
}
if (cret && pret)
return -ENODEV;
......@@ -1409,4 +1435,14 @@ static int __init intel_uncore_init(void)
cpu_notifier_register_done();
return ret;
}
device_initcall(intel_uncore_init);
module_init(intel_uncore_init);
static void __exit intel_uncore_exit(void)
{
cpu_notifier_register_begin();
__unregister_cpu_notifier(&uncore_cpu_nb);
uncore_types_exit(uncore_msr_uncores);
uncore_pci_exit();
cpu_notifier_register_done();
}
module_exit(intel_uncore_exit);
......@@ -6,6 +6,8 @@ enum perf_msr_id {
PERF_MSR_MPERF = 2,
PERF_MSR_PPERF = 3,
PERF_MSR_SMI = 4,
PERF_MSR_PTSC = 5,
PERF_MSR_IRPERF = 6,
PERF_MSR_EVENT_MAX,
};
......@@ -15,6 +17,16 @@ static bool test_aperfmperf(int idx)
return boot_cpu_has(X86_FEATURE_APERFMPERF);
}
static bool test_ptsc(int idx)
{
return boot_cpu_has(X86_FEATURE_PTSC);
}
static bool test_irperf(int idx)
{
return boot_cpu_has(X86_FEATURE_IRPERF);
}
static bool test_intel(int idx)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
......@@ -69,18 +81,22 @@ struct perf_msr {
bool (*test)(int idx);
};
PMU_EVENT_ATTR_STRING(tsc, evattr_tsc, "event=0x00");
PMU_EVENT_ATTR_STRING(aperf, evattr_aperf, "event=0x01");
PMU_EVENT_ATTR_STRING(mperf, evattr_mperf, "event=0x02");
PMU_EVENT_ATTR_STRING(pperf, evattr_pperf, "event=0x03");
PMU_EVENT_ATTR_STRING(smi, evattr_smi, "event=0x04");
PMU_EVENT_ATTR_STRING(tsc, evattr_tsc, "event=0x00");
PMU_EVENT_ATTR_STRING(aperf, evattr_aperf, "event=0x01");
PMU_EVENT_ATTR_STRING(mperf, evattr_mperf, "event=0x02");
PMU_EVENT_ATTR_STRING(pperf, evattr_pperf, "event=0x03");
PMU_EVENT_ATTR_STRING(smi, evattr_smi, "event=0x04");
PMU_EVENT_ATTR_STRING(ptsc, evattr_ptsc, "event=0x05");
PMU_EVENT_ATTR_STRING(irperf, evattr_irperf, "event=0x06");
static struct perf_msr msr[] = {
[PERF_MSR_TSC] = { 0, &evattr_tsc, NULL, },
[PERF_MSR_APERF] = { MSR_IA32_APERF, &evattr_aperf, test_aperfmperf, },
[PERF_MSR_MPERF] = { MSR_IA32_MPERF, &evattr_mperf, test_aperfmperf, },
[PERF_MSR_PPERF] = { MSR_PPERF, &evattr_pperf, test_intel, },
[PERF_MSR_SMI] = { MSR_SMI_COUNT, &evattr_smi, test_intel, },
[PERF_MSR_TSC] = { 0, &evattr_tsc, NULL, },
[PERF_MSR_APERF] = { MSR_IA32_APERF, &evattr_aperf, test_aperfmperf, },
[PERF_MSR_MPERF] = { MSR_IA32_MPERF, &evattr_mperf, test_aperfmperf, },
[PERF_MSR_PPERF] = { MSR_PPERF, &evattr_pperf, test_intel, },
[PERF_MSR_SMI] = { MSR_SMI_COUNT, &evattr_smi, test_intel, },
[PERF_MSR_PTSC] = { MSR_F15H_PTSC, &evattr_ptsc, test_ptsc, },
[PERF_MSR_IRPERF] = { MSR_F17H_IRPERF, &evattr_irperf, test_irperf, },
};
static struct attribute *events_attrs[PERF_MSR_EVENT_MAX + 1] = {
......
......@@ -601,6 +601,7 @@ struct x86_pmu {
u64 lbr_sel_mask; /* LBR_SELECT valid bits */
const int *lbr_sel_map; /* lbr_select mappings */
bool lbr_double_abort; /* duplicated lbr aborts */
bool lbr_pt_coexist; /* LBR may coexist with PT */
/*
* Intel PT/LBR/BTS are exclusive
......@@ -859,6 +860,8 @@ extern struct event_constraint intel_atom_pebs_event_constraints[];
extern struct event_constraint intel_slm_pebs_event_constraints[];
extern struct event_constraint intel_glm_pebs_event_constraints[];
extern struct event_constraint intel_nehalem_pebs_event_constraints[];
extern struct event_constraint intel_westmere_pebs_event_constraints[];
......@@ -907,6 +910,8 @@ void intel_pmu_lbr_init_nhm(void);
void intel_pmu_lbr_init_atom(void);
void intel_pmu_lbr_init_slm(void);
void intel_pmu_lbr_init_snb(void);
void intel_pmu_lbr_init_hsw(void);
......
......@@ -177,6 +177,7 @@
#define X86_FEATURE_PERFCTR_CORE ( 6*32+23) /* core performance counter extensions */
#define X86_FEATURE_PERFCTR_NB ( 6*32+24) /* NB performance counter extensions */
#define X86_FEATURE_BPEXT (6*32+26) /* data breakpoint extension */
#define X86_FEATURE_PTSC ( 6*32+27) /* performance time-stamp counter */
#define X86_FEATURE_PERFCTR_L2 ( 6*32+28) /* L2 performance counter extensions */
#define X86_FEATURE_MWAITX ( 6*32+29) /* MWAIT extension (MONITORX/MWAITX) */
......@@ -250,6 +251,7 @@
/* AMD-defined CPU features, CPUID level 0x80000008 (ebx), word 13 */
#define X86_FEATURE_CLZERO (13*32+0) /* CLZERO instruction */
#define X86_FEATURE_IRPERF (13*32+1) /* Instructions Retired Count */
/* Thermal and Power Management Leaf, CPUID level 0x00000006 (eax), word 14 */
#define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */
......
......@@ -89,27 +89,16 @@
#define MSR_PEBS_LD_LAT_THRESHOLD 0x000003f6
#define MSR_IA32_RTIT_CTL 0x00000570
#define RTIT_CTL_TRACEEN BIT(0)
#define RTIT_CTL_CYCLEACC BIT(1)
#define RTIT_CTL_OS BIT(2)
#define RTIT_CTL_USR BIT(3)
#define RTIT_CTL_CR3EN BIT(7)
#define RTIT_CTL_TOPA BIT(8)
#define RTIT_CTL_MTC_EN BIT(9)
#define RTIT_CTL_TSC_EN BIT(10)
#define RTIT_CTL_DISRETC BIT(11)
#define RTIT_CTL_BRANCH_EN BIT(13)
#define RTIT_CTL_MTC_RANGE_OFFSET 14
#define RTIT_CTL_MTC_RANGE (0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
#define RTIT_CTL_CYC_THRESH_OFFSET 19
#define RTIT_CTL_CYC_THRESH (0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
#define RTIT_CTL_PSB_FREQ_OFFSET 24
#define RTIT_CTL_PSB_FREQ (0x0full << RTIT_CTL_PSB_FREQ_OFFSET)
#define MSR_IA32_RTIT_STATUS 0x00000571
#define RTIT_STATUS_CONTEXTEN BIT(1)
#define RTIT_STATUS_TRIGGEREN BIT(2)
#define RTIT_STATUS_ERROR BIT(4)
#define RTIT_STATUS_STOPPED BIT(5)
#define MSR_IA32_RTIT_STATUS 0x00000571
#define MSR_IA32_RTIT_ADDR0_A 0x00000580
#define MSR_IA32_RTIT_ADDR0_B 0x00000581
#define MSR_IA32_RTIT_ADDR1_A 0x00000582
#define MSR_IA32_RTIT_ADDR1_B 0x00000583
#define MSR_IA32_RTIT_ADDR2_A 0x00000584
#define MSR_IA32_RTIT_ADDR2_B 0x00000585
#define MSR_IA32_RTIT_ADDR3_A 0x00000586
#define MSR_IA32_RTIT_ADDR3_B 0x00000587
#define MSR_IA32_RTIT_CR3_MATCH 0x00000572
#define MSR_IA32_RTIT_OUTPUT_BASE 0x00000560
#define MSR_IA32_RTIT_OUTPUT_MASK 0x00000561
......@@ -205,6 +194,8 @@
#define MSR_CONFIG_TDP_CONTROL 0x0000064B
#define MSR_TURBO_ACTIVATION_RATIO 0x0000064C
#define MSR_PLATFORM_ENERGY_STATUS 0x0000064D
#define MSR_PKG_WEIGHTED_CORE_C0_RES 0x00000658
#define MSR_PKG_ANY_CORE_C0_RES 0x00000659
#define MSR_PKG_ANY_GFXE_C0_RES 0x0000065A
......@@ -315,6 +306,9 @@
#define MSR_AMD64_IBSOPDATA4 0xc001103d
#define MSR_AMD64_IBS_REG_COUNT_MAX 8 /* includes MSR_AMD64_IBSBRTARGET */
/* Fam 17h MSRs */
#define MSR_F17H_IRPERF 0xc00000e9
/* Fam 16h MSRs */
#define MSR_F16H_L2I_PERF_CTL 0xc0010230
#define MSR_F16H_L2I_PERF_CTR 0xc0010231
......@@ -328,6 +322,7 @@
#define MSR_F15H_PERF_CTR 0xc0010201
#define MSR_F15H_NB_PERF_CTL 0xc0010240
#define MSR_F15H_NB_PERF_CTR 0xc0010241
#define MSR_F15H_PTSC 0xc0010280
#define MSR_F15H_IC_CFG 0xc0011021
/* Fam 10h MSRs */
......
......@@ -578,7 +578,7 @@ static void default_abort_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
riprel_post_xol(auprobe, regs);
}
static struct uprobe_xol_ops default_xol_ops = {
static const struct uprobe_xol_ops default_xol_ops = {
.pre_xol = default_pre_xol_op,
.post_xol = default_post_xol_op,
.abort = default_abort_op,
......@@ -695,7 +695,7 @@ static void branch_clear_offset(struct arch_uprobe *auprobe, struct insn *insn)
0, insn->immediate.nbytes);
}
static struct uprobe_xol_ops branch_xol_ops = {
static const struct uprobe_xol_ops branch_xol_ops = {
.emulate = branch_emulate_op,
.post_xol = branch_post_xol_op,
};
......
......@@ -332,14 +332,14 @@ static int callchain_trace(struct stackframe *frame, void *data)
void perf_callchain_kernel(struct perf_callchain_entry *entry,
struct pt_regs *regs)
{
xtensa_backtrace_kernel(regs, PERF_MAX_STACK_DEPTH,
xtensa_backtrace_kernel(regs, sysctl_perf_event_max_stack,
callchain_trace, NULL, entry);
}
void perf_callchain_user(struct perf_callchain_entry *entry,
struct pt_regs *regs)
{
xtensa_backtrace_user(regs, PERF_MAX_STACK_DEPTH,
xtensa_backtrace_user(regs, sysctl_perf_event_max_stack,
callchain_trace, entry);
}
......
......@@ -847,6 +847,14 @@ static int cpu_pmu_init(struct arm_pmu *cpu_pmu)
if (!platform_get_irq(cpu_pmu->plat_device, 0))
cpu_pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
/*
* This is a CPU PMU potentially in a heterogeneous configuration (e.g.
* big.LITTLE). This is not an uncore PMU, and we have taken ctx
* sharing into account (e.g. with our pmu::filter_match callback and
* pmu::event_init group validation).
*/
cpu_pmu->pmu.capabilities |= PERF_PMU_CAP_HETEROGENEOUS_CPUS;
return 0;
out_unregister:
......
......@@ -34,6 +34,9 @@
#include <asm/processor.h>
#include <asm/cpu_device_id.h>
/* Local defines */
#define MSR_PLATFORM_POWER_LIMIT 0x0000065C
/* bitmasks for RAPL MSRs, used by primitive access functions */
#define ENERGY_STATUS_MASK 0xffffffff
......@@ -86,6 +89,7 @@ enum rapl_domain_type {
RAPL_DOMAIN_PP0, /* core power plane */
RAPL_DOMAIN_PP1, /* graphics uncore */
RAPL_DOMAIN_DRAM,/* DRAM control_type */
RAPL_DOMAIN_PLATFORM, /* PSys control_type */
RAPL_DOMAIN_MAX,
};
......@@ -251,9 +255,11 @@ static const char * const rapl_domain_names[] = {
"core",
"uncore",
"dram",
"psys",
};
static struct powercap_control_type *control_type; /* PowerCap Controller */
static struct rapl_domain *platform_rapl_domain; /* Platform (PSys) domain */
/* caller to ensure CPU hotplug lock is held */
static struct rapl_package *find_package_by_id(int id)
......@@ -409,6 +415,14 @@ static const struct powercap_zone_ops zone_ops[] = {
.set_enable = set_domain_enable,
.get_enable = get_domain_enable,
},
/* RAPL_DOMAIN_PLATFORM */
{
.get_energy_uj = get_energy_counter,
.get_max_energy_range_uj = get_max_energy_counter,
.release = release_zone,
.set_enable = set_domain_enable,
.get_enable = get_domain_enable,
},
};
static int set_power_limit(struct powercap_zone *power_zone, int id,
......@@ -1160,6 +1174,13 @@ static int rapl_unregister_powercap(void)
powercap_unregister_zone(control_type,
&rd_package->power_zone);
}
if (platform_rapl_domain) {
powercap_unregister_zone(control_type,
&platform_rapl_domain->power_zone);
kfree(platform_rapl_domain);
}
powercap_unregister_control_type(control_type);
return 0;
......@@ -1239,6 +1260,47 @@ static int rapl_package_register_powercap(struct rapl_package *rp)
return ret;
}
static int rapl_register_psys(void)
{
struct rapl_domain *rd;
struct powercap_zone *power_zone;
u64 val;
if (rdmsrl_safe_on_cpu(0, MSR_PLATFORM_ENERGY_STATUS, &val) || !val)
return -ENODEV;
if (rdmsrl_safe_on_cpu(0, MSR_PLATFORM_POWER_LIMIT, &val) || !val)
return -ENODEV;
rd = kzalloc(sizeof(*rd), GFP_KERNEL);
if (!rd)
return -ENOMEM;
rd->name = rapl_domain_names[RAPL_DOMAIN_PLATFORM];
rd->id = RAPL_DOMAIN_PLATFORM;
rd->msrs[0] = MSR_PLATFORM_POWER_LIMIT;
rd->msrs[1] = MSR_PLATFORM_ENERGY_STATUS;
rd->rpl[0].prim_id = PL1_ENABLE;
rd->rpl[0].name = pl1_name;
rd->rpl[1].prim_id = PL2_ENABLE;
rd->rpl[1].name = pl2_name;
rd->rp = find_package_by_id(0);
power_zone = powercap_register_zone(&rd->power_zone, control_type,
"psys", NULL,
&zone_ops[RAPL_DOMAIN_PLATFORM],
2, &constraint_ops);
if (IS_ERR(power_zone)) {
kfree(rd);
return PTR_ERR(power_zone);
}
platform_rapl_domain = rd;
return 0;
}
static int rapl_register_powercap(void)
{
struct rapl_domain *rd;
......@@ -1255,6 +1317,10 @@ static int rapl_register_powercap(void)
list_for_each_entry(rp, &rapl_packages, plist)
if (rapl_package_register_powercap(rp))
goto err_cleanup_package;
/* Don't bail out if PSys is not supported */
rapl_register_psys();
return ret;
err_cleanup_package:
......@@ -1289,6 +1355,9 @@ static int rapl_check_domain(int cpu, int domain)
case RAPL_DOMAIN_DRAM:
msr = MSR_DRAM_ENERGY_STATUS;
break;
case RAPL_DOMAIN_PLATFORM:
/* PSYS(PLATFORM) is not a CPU domain, so avoid printng error */
return -EINVAL;
default:
pr_err("invalid domain id %d\n", domain);
return -EINVAL;
......
......@@ -58,7 +58,7 @@ struct perf_guest_info_callbacks {
struct perf_callchain_entry {
__u64 nr;
__u64 ip[PERF_MAX_STACK_DEPTH];
__u64 ip[0]; /* /proc/sys/kernel/perf_event_max_stack */
};
struct perf_raw_record {
......@@ -151,6 +151,15 @@ struct hw_perf_event {
*/
struct task_struct *target;
/*
* PMU would store hardware filter configuration
* here.
*/
void *addr_filters;
/* Last sync'ed generation of filters */
unsigned long addr_filters_gen;
/*
* hw_perf_event::state flags; used to track the PERF_EF_* state.
*/
......@@ -216,6 +225,7 @@ struct perf_event;
#define PERF_PMU_CAP_AUX_SW_DOUBLEBUF 0x08
#define PERF_PMU_CAP_EXCLUSIVE 0x10
#define PERF_PMU_CAP_ITRACE 0x20
#define PERF_PMU_CAP_HETEROGENEOUS_CPUS 0x40
/**
* struct pmu - generic performance monitoring unit
......@@ -240,6 +250,9 @@ struct pmu {
int task_ctx_nr;
int hrtimer_interval_ms;
/* number of address filters this PMU can do */
unsigned int nr_addr_filters;
/*
* Fully disable/enable this PMU, can be used to protect from the PMI
* as well as for lazy/batch writing of the MSRs.
......@@ -392,12 +405,71 @@ struct pmu {
*/
void (*free_aux) (void *aux); /* optional */
/*
* Validate address range filters: make sure the HW supports the
* requested configuration and number of filters; return 0 if the
* supplied filters are valid, -errno otherwise.
*
* Runs in the context of the ioctl()ing process and is not serialized
* with the rest of the PMU callbacks.
*/
int (*addr_filters_validate) (struct list_head *filters);
/* optional */
/*
* Synchronize address range filter configuration:
* translate hw-agnostic filters into hardware configuration in
* event::hw::addr_filters.
*
* Runs as a part of filter sync sequence that is done in ->start()
* callback by calling perf_event_addr_filters_sync().
*
* May (and should) traverse event::addr_filters::list, for which its
* caller provides necessary serialization.
*/
void (*addr_filters_sync) (struct perf_event *event);
/* optional */
/*
* Filter events for PMU-specific reasons.
*/
int (*filter_match) (struct perf_event *event); /* optional */
};
/**
* struct perf_addr_filter - address range filter definition
* @entry: event's filter list linkage
* @inode: object file's inode for file-based filters
* @offset: filter range offset
* @size: filter range size
* @range: 1: range, 0: address
* @filter: 1: filter/start, 0: stop
*
* This is a hardware-agnostic filter configuration as specified by the user.
*/
struct perf_addr_filter {
struct list_head entry;
struct inode *inode;
unsigned long offset;
unsigned long size;
unsigned int range : 1,
filter : 1;
};
/**
* struct perf_addr_filters_head - container for address range filters
* @list: list of filters for this event
* @lock: spinlock that serializes accesses to the @list and event's
* (and its children's) filter generations.
*
* A child event will use parent's @list (and therefore @lock), so they are
* bundled together; see perf_event_addr_filters().
*/
struct perf_addr_filters_head {
struct list_head list;
raw_spinlock_t lock;
};
/**
* enum perf_event_active_state - the states of a event
*/
......@@ -566,6 +638,12 @@ struct perf_event {
atomic_t event_limit;
/* address range filters */
struct perf_addr_filters_head addr_filters;
/* vma address array for file-based filders */
unsigned long *addr_filters_offs;
unsigned long addr_filters_gen;
void (*destroy)(struct perf_event *);
struct rcu_head rcu_head;
......@@ -834,9 +912,25 @@ extern int perf_event_overflow(struct perf_event *event,
struct perf_sample_data *data,
struct pt_regs *regs);
extern void perf_event_output_forward(struct perf_event *event,
struct perf_sample_data *data,
struct pt_regs *regs);
extern void perf_event_output_backward(struct perf_event *event,
struct perf_sample_data *data,
struct pt_regs *regs);
extern void perf_event_output(struct perf_event *event,
struct perf_sample_data *data,
struct pt_regs *regs);
struct perf_sample_data *data,
struct pt_regs *regs);
static inline bool
is_default_overflow_handler(struct perf_event *event)
{
if (likely(event->overflow_handler == perf_event_output_forward))
return true;
if (unlikely(event->overflow_handler == perf_event_output_backward))
return true;
return false;
}
extern void
perf_event_header__init_id(struct perf_event_header *header,
......@@ -977,9 +1071,11 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
extern int get_callchain_buffers(void);
extern void put_callchain_buffers(void);
extern int sysctl_perf_event_max_stack;
static inline int perf_callchain_store(struct perf_callchain_entry *entry, u64 ip)
{
if (entry->nr < PERF_MAX_STACK_DEPTH) {
if (entry->nr < sysctl_perf_event_max_stack) {
entry->ip[entry->nr++] = ip;
return 0;
} else {
......@@ -1001,6 +1097,8 @@ extern int perf_cpu_time_max_percent_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);
int perf_event_max_stack_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos);
static inline bool perf_paranoid_tracepoint_raw(void)
{
......@@ -1045,8 +1143,41 @@ static inline bool has_aux(struct perf_event *event)
return event->pmu->setup_aux;
}
static inline bool is_write_backward(struct perf_event *event)
{
return !!event->attr.write_backward;
}
static inline bool has_addr_filter(struct perf_event *event)
{
return event->pmu->nr_addr_filters;
}
/*
* An inherited event uses parent's filters
*/
static inline struct perf_addr_filters_head *
perf_event_addr_filters(struct perf_event *event)
{
struct perf_addr_filters_head *ifh = &event->addr_filters;
if (event->parent)
ifh = &event->parent->addr_filters;
return ifh;
}
extern void perf_event_addr_filters_sync(struct perf_event *event);
extern int perf_output_begin(struct perf_output_handle *handle,
struct perf_event *event, unsigned int size);
extern int perf_output_begin_forward(struct perf_output_handle *handle,
struct perf_event *event,
unsigned int size);
extern int perf_output_begin_backward(struct perf_output_handle *handle,
struct perf_event *event,
unsigned int size);
extern void perf_output_end(struct perf_output_handle *handle);
extern unsigned int perf_output_copy(struct perf_output_handle *handle,
const void *buf, unsigned int len);
......
......@@ -340,7 +340,8 @@ struct perf_event_attr {
comm_exec : 1, /* flag comm events that are due to an exec */
use_clockid : 1, /* use @clockid for time fields */
context_switch : 1, /* context switch data */
__reserved_1 : 37;
write_backward : 1, /* Write ring buffer from end to beginning */
__reserved_1 : 36;
union {
__u32 wakeup_events; /* wakeup every n events */
......@@ -401,6 +402,7 @@ struct perf_event_attr {
#define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
#define PERF_EVENT_IOC_ID _IOR('$', 7, __u64 *)
#define PERF_EVENT_IOC_SET_BPF _IOW('$', 8, __u32)
#define PERF_EVENT_IOC_PAUSE_OUTPUT _IOW('$', 9, __u32)
enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
......
......@@ -66,7 +66,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
value_size < 8 || value_size % 8 ||
value_size / 8 > PERF_MAX_STACK_DEPTH)
value_size / 8 > sysctl_perf_event_max_stack)
return ERR_PTR(-EINVAL);
/* hash table size must be power of 2 */
......@@ -124,8 +124,8 @@ static u64 bpf_get_stackid(u64 r1, u64 r2, u64 flags, u64 r4, u64 r5)
struct perf_callchain_entry *trace;
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
u32 max_depth = map->value_size / 8;
/* stack_map_alloc() checks that max_depth <= PERF_MAX_STACK_DEPTH */
u32 init_nr = PERF_MAX_STACK_DEPTH - max_depth;
/* stack_map_alloc() checks that max_depth <= sysctl_perf_event_max_stack */
u32 init_nr = sysctl_perf_event_max_stack - max_depth;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
u32 hash, id, trace_nr, trace_len;
bool user = flags & BPF_F_USER_STACK;
......@@ -143,7 +143,7 @@ static u64 bpf_get_stackid(u64 r1, u64 r2, u64 flags, u64 r4, u64 r5)
return -EFAULT;
/* get_perf_callchain() guarantees that trace->nr >= init_nr
* and trace-nr <= PERF_MAX_STACK_DEPTH, so trace_nr <= max_depth
* and trace-nr <= sysctl_perf_event_max_stack, so trace_nr <= max_depth
*/
trace_nr = trace->nr - init_nr;
......
......@@ -18,6 +18,14 @@ struct callchain_cpus_entries {
struct perf_callchain_entry *cpu_entries[0];
};
int sysctl_perf_event_max_stack __read_mostly = PERF_MAX_STACK_DEPTH;
static inline size_t perf_callchain_entry__sizeof(void)
{
return (sizeof(struct perf_callchain_entry) +
sizeof(__u64) * sysctl_perf_event_max_stack);
}
static DEFINE_PER_CPU(int, callchain_recursion[PERF_NR_CONTEXTS]);
static atomic_t nr_callchain_events;
static DEFINE_MUTEX(callchain_mutex);
......@@ -73,7 +81,7 @@ static int alloc_callchain_buffers(void)
if (!entries)
return -ENOMEM;
size = sizeof(struct perf_callchain_entry) * PERF_NR_CONTEXTS;
size = perf_callchain_entry__sizeof() * PERF_NR_CONTEXTS;
for_each_possible_cpu(cpu) {
entries->cpu_entries[cpu] = kmalloc_node(size, GFP_KERNEL,
......@@ -147,7 +155,8 @@ static struct perf_callchain_entry *get_callchain_entry(int *rctx)
cpu = smp_processor_id();
return &entries->cpu_entries[cpu][*rctx];
return (((void *)entries->cpu_entries[cpu]) +
(*rctx * perf_callchain_entry__sizeof()));
}
static void
......@@ -215,3 +224,25 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
return entry;
}
int perf_event_max_stack_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
int new_value = sysctl_perf_event_max_stack, ret;
struct ctl_table new_table = *table;
new_table.data = &new_value;
ret = proc_dointvec_minmax(&new_table, write, buffer, lenp, ppos);
if (ret || !write)
return ret;
mutex_lock(&callchain_mutex);
if (atomic_read(&nr_callchain_events))
ret = -EBUSY;
else
sysctl_perf_event_max_stack = new_value;
mutex_unlock(&callchain_mutex);
return ret;
}
此差异已折叠。
......@@ -11,13 +11,13 @@
struct ring_buffer {
atomic_t refcount;
struct rcu_head rcu_head;
struct irq_work irq_work;
#ifdef CONFIG_PERF_USE_VMALLOC
struct work_struct work;
int page_order; /* allocation order */
#endif
int nr_pages; /* nr of data pages */
int overwrite; /* can overwrite itself */
int paused; /* can write into ring buffer */
atomic_t poll; /* POLL_ for wakeups */
......@@ -65,6 +65,14 @@ static inline void rb_free_rcu(struct rcu_head *rcu_head)
rb_free(rb);
}
static inline void rb_toggle_paused(struct ring_buffer *rb, bool pause)
{
if (!pause && rb->nr_pages)
rb->paused = 0;
else
rb->paused = 1;
}
extern struct ring_buffer *
rb_alloc(int nr_pages, long watermark, int cpu, int flags);
extern void perf_event_wakeup(struct perf_event *event);
......
......@@ -102,8 +102,21 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
preempt_enable();
}
int perf_output_begin(struct perf_output_handle *handle,
struct perf_event *event, unsigned int size)
static bool __always_inline
ring_buffer_has_space(unsigned long head, unsigned long tail,
unsigned long data_size, unsigned int size,
bool backward)
{
if (!backward)
return CIRC_SPACE(head, tail, data_size) >= size;
else
return CIRC_SPACE(tail, head, data_size) >= size;
}
static int __always_inline
__perf_output_begin(struct perf_output_handle *handle,
struct perf_event *event, unsigned int size,
bool backward)
{
struct ring_buffer *rb;
unsigned long tail, offset, head;
......@@ -125,8 +138,11 @@ int perf_output_begin(struct perf_output_handle *handle,
if (unlikely(!rb))
goto out;
if (unlikely(!rb->nr_pages))
if (unlikely(rb->paused)) {
if (rb->nr_pages)
local_inc(&rb->lost);
goto out;
}
handle->rb = rb;
handle->event = event;
......@@ -143,9 +159,12 @@ int perf_output_begin(struct perf_output_handle *handle,
do {
tail = READ_ONCE(rb->user_page->data_tail);
offset = head = local_read(&rb->head);
if (!rb->overwrite &&
unlikely(CIRC_SPACE(head, tail, perf_data_size(rb)) < size))
goto fail;
if (!rb->overwrite) {
if (unlikely(!ring_buffer_has_space(head, tail,
perf_data_size(rb),
size, backward)))
goto fail;
}
/*
* The above forms a control dependency barrier separating the
......@@ -159,9 +178,17 @@ int perf_output_begin(struct perf_output_handle *handle,
* See perf_output_put_handle().
*/
head += size;
if (!backward)
head += size;
else
head -= size;
} while (local_cmpxchg(&rb->head, offset, head) != offset);
if (backward) {
offset = head;
head = (u64)(-head);
}
/*
* We rely on the implied barrier() by local_cmpxchg() to ensure
* none of the data stores below can be lifted up by the compiler.
......@@ -203,6 +230,26 @@ int perf_output_begin(struct perf_output_handle *handle,
return -ENOSPC;
}
int perf_output_begin_forward(struct perf_output_handle *handle,
struct perf_event *event, unsigned int size)
{
return __perf_output_begin(handle, event, size, false);
}
int perf_output_begin_backward(struct perf_output_handle *handle,
struct perf_event *event, unsigned int size)
{
return __perf_output_begin(handle, event, size, true);
}
int perf_output_begin(struct perf_output_handle *handle,
struct perf_event *event, unsigned int size)
{
return __perf_output_begin(handle, event, size,
unlikely(is_write_backward(event)));
}
unsigned int perf_output_copy(struct perf_output_handle *handle,
const void *buf, unsigned int len)
{
......@@ -221,8 +268,6 @@ void perf_output_end(struct perf_output_handle *handle)
rcu_read_unlock();
}
static void rb_irq_work(struct irq_work *work);
static void
ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
{
......@@ -243,16 +288,13 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
INIT_LIST_HEAD(&rb->event_list);
spin_lock_init(&rb->event_lock);
init_irq_work(&rb->irq_work, rb_irq_work);
}
static void ring_buffer_put_async(struct ring_buffer *rb)
{
if (!atomic_dec_and_test(&rb->refcount))
return;
rb->rcu_head.next = (void *)rb;
irq_work_queue(&rb->irq_work);
/*
* perf_output_begin() only checks rb->paused, therefore
* rb->paused must be true if we have no pages for output.
*/
if (!rb->nr_pages)
rb->paused = 1;
}
/*
......@@ -264,6 +306,10 @@ static void ring_buffer_put_async(struct ring_buffer *rb)
* The ordering is similar to that of perf_output_{begin,end}, with
* the exception of (B), which should be taken care of by the pmu
* driver, since ordering rules will differ depending on hardware.
*
* Call this from pmu::start(); see the comment in perf_aux_output_end()
* about its use in pmu callbacks. Both can also be called from the PMI
* handler if needed.
*/
void *perf_aux_output_begin(struct perf_output_handle *handle,
struct perf_event *event)
......@@ -287,6 +333,13 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
if (!rb_has_aux(rb) || !atomic_inc_not_zero(&rb->aux_refcount))
goto err;
/*
* If rb::aux_mmap_count is zero (and rb_has_aux() above went through),
* the aux buffer is in perf_mmap_close(), about to get freed.
*/
if (!atomic_read(&rb->aux_mmap_count))
goto err_put;
/*
* Nesting is not supported for AUX area, make sure nested
* writers are caught early
......@@ -328,10 +381,11 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
return handle->rb->aux_priv;
err_put:
/* can't be last */
rb_free_aux(rb);
err:
ring_buffer_put_async(rb);
ring_buffer_put(rb);
handle->event = NULL;
return NULL;
......@@ -342,6 +396,10 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
* aux_head and posting a PERF_RECORD_AUX into the perf buffer. It is the
* pmu driver's responsibility to observe ordering rules of the hardware,
* so that all the data is externally visible before this is called.
*
* Note: this has to be called from pmu::stop() callback, as the assumption
* of the AUX buffer management code is that after pmu::stop(), the AUX
* transaction must be stopped and therefore drop the AUX reference count.
*/
void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
bool truncated)
......@@ -389,8 +447,9 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
handle->event = NULL;
local_set(&rb->aux_nest, 0);
/* can't be last */
rb_free_aux(rb);
ring_buffer_put_async(rb);
ring_buffer_put(rb);
}
/*
......@@ -471,6 +530,14 @@ static void __rb_free_aux(struct ring_buffer *rb)
{
int pg;
/*
* Should never happen, the last reference should be dropped from
* perf_mmap_close() path, which first stops aux transactions (which
* in turn are the atomic holders of aux_refcount) and then does the
* last rb_free_aux().
*/
WARN_ON_ONCE(in_atomic());
if (rb->aux_priv) {
rb->free_aux(rb->aux_priv);
rb->free_aux = NULL;
......@@ -582,18 +649,7 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
void rb_free_aux(struct ring_buffer *rb)
{
if (atomic_dec_and_test(&rb->aux_refcount))
irq_work_queue(&rb->irq_work);
}
static void rb_irq_work(struct irq_work *work)
{
struct ring_buffer *rb = container_of(work, struct ring_buffer, irq_work);
if (!atomic_read(&rb->aux_refcount))
__rb_free_aux(rb);
if (rb->rcu_head.next == (void *)rb)
call_rcu(&rb->rcu_head, rb_free_rcu);
}
#ifndef CONFIG_PERF_USE_VMALLOC
......
......@@ -130,6 +130,9 @@ static int one_thousand = 1000;
#ifdef CONFIG_PRINTK
static int ten_thousand = 10000;
#endif
#ifdef CONFIG_PERF_EVENTS
static int six_hundred_forty_kb = 640 * 1024;
#endif
/* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
......@@ -1144,6 +1147,15 @@ static struct ctl_table kern_table[] = {
.extra1 = &zero,
.extra2 = &one_hundred,
},
{
.procname = "perf_event_max_stack",
.data = NULL, /* filled in by handler */
.maxlen = sizeof(sysctl_perf_event_max_stack),
.mode = 0644,
.proc_handler = perf_event_max_stack_handler,
.extra1 = &zero,
.extra2 = &six_hundred_forty_kb,
},
#endif
#ifdef CONFIG_KMEMCHECK
{
......
......@@ -47,6 +47,9 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
if (perf_paranoid_tracepoint_raw() && !capable(CAP_SYS_ADMIN))
return -EPERM;
if (!is_sampling_event(p_event))
return 0;
/*
* We don't allow user space callchains for function trace
* event, due to issues with page faults while tracing page
......
......@@ -137,7 +137,8 @@ libsubcmd_clean:
$(call descend,lib/subcmd,clean)
perf_clean:
$(call descend,$(@:_clean=),clean)
$(Q)mkdir -p $(PERF_O) .
$(Q)$(MAKE) --no-print-directory -C perf O=$(PERF_O) subdir= clean
selftests_clean:
$(call descend,testing/$(@:_clean=),clean)
......
......@@ -49,6 +49,10 @@ FEATURE_TESTS_BASIC := \
libslang \
libcrypto \
libunwind \
libunwind-x86 \
libunwind-x86_64 \
libunwind-arm \
libunwind-aarch64 \
pthread-attr-setaffinity-np \
stackprotector-all \
timerfd \
......@@ -69,7 +73,9 @@ FEATURE_TESTS_EXTRA := \
libbabeltrace \
liberty \
liberty-z \
libunwind-debug-frame
libunwind-debug-frame \
libunwind-debug-frame-arm \
libunwind-debug-frame-aarch64
FEATURE_TESTS ?= $(FEATURE_TESTS_BASIC)
......
......@@ -27,6 +27,12 @@ FILES= \
test-libcrypto.bin \
test-libunwind.bin \
test-libunwind-debug-frame.bin \
test-libunwind-x86.bin \
test-libunwind-x86_64.bin \
test-libunwind-arm.bin \
test-libunwind-aarch64.bin \
test-libunwind-debug-frame-arm.bin \
test-libunwind-debug-frame-aarch64.bin \
test-pthread-attr-setaffinity-np.bin \
test-stackprotector-all.bin \
test-timerfd.bin \
......@@ -103,6 +109,23 @@ $(OUTPUT)test-libunwind.bin:
$(OUTPUT)test-libunwind-debug-frame.bin:
$(BUILD) -lelf
$(OUTPUT)test-libunwind-x86.bin:
$(BUILD) -lelf -lunwind-x86
$(OUTPUT)test-libunwind-x86_64.bin:
$(BUILD) -lelf -lunwind-x86_64
$(OUTPUT)test-libunwind-arm.bin:
$(BUILD) -lelf -lunwind-arm
$(OUTPUT)test-libunwind-aarch64.bin:
$(BUILD) -lelf -lunwind-aarch64
$(OUTPUT)test-libunwind-debug-frame-arm.bin:
$(BUILD) -lelf -lunwind-arm
$(OUTPUT)test-libunwind-debug-frame-aarch64.bin:
$(BUILD) -lelf -lunwind-aarch64
$(OUTPUT)test-libaudit.bin:
$(BUILD) -laudit
......
......@@ -27,10 +27,9 @@ int main(void)
attr.log_level = 0;
attr.kern_version = 0;
attr = attr;
/*
* Test existence of __NR_bpf and BPF_PROG_LOAD.
* This call should fail if we run the testcase.
*/
return syscall(__NR_bpf, BPF_PROG_LOAD, attr, sizeof(attr));
return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
#include <libunwind-aarch64.h>
#include <stdlib.h>
extern int UNW_OBJ(dwarf_search_unwind_table) (unw_addr_space_t as,
unw_word_t ip,
unw_dyn_info_t *di,
unw_proc_info_t *pi,
int need_unwind_info, void *arg);
#define dwarf_search_unwind_table UNW_OBJ(dwarf_search_unwind_table)
static unw_accessors_t accessors;
int main(void)
{
unw_addr_space_t addr_space;
addr_space = unw_create_addr_space(&accessors, 0);
if (addr_space)
return 0;
unw_init_remote(NULL, addr_space, NULL);
dwarf_search_unwind_table(addr_space, 0, NULL, NULL, 0, NULL);
return 0;
}
#include <libunwind-arm.h>
#include <stdlib.h>
extern int UNW_OBJ(dwarf_search_unwind_table) (unw_addr_space_t as,
unw_word_t ip,
unw_dyn_info_t *di,
unw_proc_info_t *pi,
int need_unwind_info, void *arg);
#define dwarf_search_unwind_table UNW_OBJ(dwarf_search_unwind_table)
static unw_accessors_t accessors;
int main(void)
{
unw_addr_space_t addr_space;
addr_space = unw_create_addr_space(&accessors, 0);
if (addr_space)
return 0;
unw_init_remote(NULL, addr_space, NULL);
dwarf_search_unwind_table(addr_space, 0, NULL, NULL, 0, NULL);
return 0;
}
#include <libunwind-aarch64.h>
#include <stdlib.h>
extern int
UNW_OBJ(dwarf_find_debug_frame) (int found, unw_dyn_info_t *di_debug,
unw_word_t ip, unw_word_t segbase,
const char *obj_name, unw_word_t start,
unw_word_t end);
#define dwarf_find_debug_frame UNW_OBJ(dwarf_find_debug_frame)
int main(void)
{
dwarf_find_debug_frame(0, NULL, 0, 0, NULL, 0, 0);
return 0;
}
#include <libunwind-arm.h>
#include <stdlib.h>
extern int
UNW_OBJ(dwarf_find_debug_frame) (int found, unw_dyn_info_t *di_debug,
unw_word_t ip, unw_word_t segbase,
const char *obj_name, unw_word_t start,
unw_word_t end);
#define dwarf_find_debug_frame UNW_OBJ(dwarf_find_debug_frame)
int main(void)
{
dwarf_find_debug_frame(0, NULL, 0, 0, NULL, 0, 0);
return 0;
}
#include <libunwind-x86.h>
#include <stdlib.h>
extern int UNW_OBJ(dwarf_search_unwind_table) (unw_addr_space_t as,
unw_word_t ip,
unw_dyn_info_t *di,
unw_proc_info_t *pi,
int need_unwind_info, void *arg);
#define dwarf_search_unwind_table UNW_OBJ(dwarf_search_unwind_table)
static unw_accessors_t accessors;
int main(void)
{
unw_addr_space_t addr_space;
addr_space = unw_create_addr_space(&accessors, 0);
if (addr_space)
return 0;
unw_init_remote(NULL, addr_space, NULL);
dwarf_search_unwind_table(addr_space, 0, NULL, NULL, 0, NULL);
return 0;
}
#include <libunwind-x86_64.h>
#include <stdlib.h>
extern int UNW_OBJ(dwarf_search_unwind_table) (unw_addr_space_t as,
unw_word_t ip,
unw_dyn_info_t *di,
unw_proc_info_t *pi,
int need_unwind_info, void *arg);
#define dwarf_search_unwind_table UNW_OBJ(dwarf_search_unwind_table)
static unw_accessors_t accessors;
int main(void)
{
unw_addr_space_t addr_space;
addr_space = unw_create_addr_space(&accessors, 0);
if (addr_space)
return 0;
unw_init_remote(NULL, addr_space, NULL);
dwarf_search_unwind_table(addr_space, 0, NULL, NULL, 0, NULL);
return 0;
}
......@@ -351,6 +351,19 @@ int filename__read_str(const char *filename, char **buf, size_t *sizep)
return err;
}
int procfs__read_str(const char *entry, char **buf, size_t *sizep)
{
char path[PATH_MAX];
const char *procfs = procfs__mountpoint();
if (!procfs)
return -1;
snprintf(path, sizeof(path), "%s/%s", procfs, entry);
return filename__read_str(path, buf, sizep);
}
int sysfs__read_ull(const char *entry, unsigned long long *value)
{
char path[PATH_MAX];
......
......@@ -29,6 +29,8 @@ int filename__read_int(const char *filename, int *value);
int filename__read_ull(const char *filename, unsigned long long *value);
int filename__read_str(const char *filename, char **buf, size_t *sizep);
int procfs__read_str(const char *entry, char **buf, size_t *sizep);
int sysctl__read_int(const char *sysctl, int *value);
int sysfs__read_int(const char *entry, int *value);
int sysfs__read_ull(const char *entry, unsigned long long *value);
......
......@@ -672,6 +672,7 @@ The letters are:
d create a debug log
g synthesize a call chain (use with i or x)
l synthesize last branch entries (use with i or x)
s skip initial number of events
"Instructions" events look like they were recorded by "perf record -e
instructions".
......@@ -730,6 +731,12 @@ from one sample to the next.
To disable trace decoding entirely, use the option --no-itrace.
It is also possible to skip events generated (instructions, branches, transactions)
at the beginning. This is useful to ignore initialization code.
--itrace=i0nss1000000
skips the first million instructions.
dump option
-----------
......
......@@ -7,6 +7,7 @@
d create a debug log
g synthesize a call chain (use with i or x)
l synthesize last branch entries (use with i or x)
s skip initial number of events
The default is all events i.e. the same as --itrace=ibxe
......@@ -24,3 +25,10 @@
Also the number of last branch entries (default 64, max. 1024) for
instructions or transactions events can be specified.
It is also possible to skip events generated (instructions, branches, transactions)
at the beginning. This is useful to ignore initialization code.
--itrace=i0nss1000000
skips the first million instructions.
......@@ -33,7 +33,7 @@ OPTIONS
-f::
--force::
Don't complain, do it.
Don't do ownership validation.
-v::
--verbose::
......
......@@ -75,7 +75,7 @@ OPTIONS
-f::
--force::
Don't complain, do it.
Don't do ownership validation.
--symfs=<directory>::
Look for files with symbols relative to this directory.
......
......@@ -93,6 +93,67 @@ raw encoding of 0x1A8 can be used:
You should refer to the processor specific documentation for getting these
details. Some of them are referenced in the SEE ALSO section below.
ARBITRARY PMUS
--------------
perf also supports an extended syntax for specifying raw parameters
to PMUs. Using this typically requires looking up the specific event
in the CPU vendor specific documentation.
The available PMUs and their raw parameters can be listed with
ls /sys/devices/*/format
For example the raw event "LSD.UOPS" core pmu event above could
be specified as
perf stat -e cpu/event=0xa8,umask=0x1,name=LSD.UOPS_CYCLES,cmask=1/ ...
PER SOCKET PMUS
---------------
Some PMUs are not associated with a core, but with a whole CPU socket.
Events on these PMUs generally cannot be sampled, but only counted globally
with perf stat -a. They can be bound to one logical CPU, but will measure
all the CPUs in the same socket.
This example measures memory bandwidth every second
on the first memory controller on socket 0 of a Intel Xeon system
perf stat -C 0 -a uncore_imc_0/cas_count_read/,uncore_imc_0/cas_count_write/ -I 1000 ...
Each memory controller has its own PMU. Measuring the complete system
bandwidth would require specifying all imc PMUs (see perf list output),
and adding the values together.
This example measures the combined core power every second
perf stat -I 1000 -e power/energy-cores/ -a
ACCESS RESTRICTIONS
-------------------
For non root users generally only context switched PMU events are available.
This is normally only the events in the cpu PMU, the predefined events
like cycles and instructions and some software events.
Other PMUs and global measurements are normally root only.
Some event qualifiers, such as "any", are also root only.
This can be overriden by setting the kernel.perf_event_paranoid
sysctl to -1, which allows non root to use these events.
For accessing trace point events perf needs to have read access to
/sys/kernel/debug/tracing, even when perf_event_paranoid is in a relaxed
setting.
TRACING
-------
Some PMUs control advanced hardware tracing capabilities, such as Intel PT,
that allows low overhead execution tracing. These are described in a separate
intel-pt.txt document.
PARAMETERIZED EVENTS
--------------------
......@@ -106,6 +167,50 @@ also be supplied. For example:
perf stat -C 0 -e 'hv_gpci/dtbp_ptitc,phys_processor_idx=0x2/' ...
EVENT GROUPS
------------
Perf supports time based multiplexing of events, when the number of events
active exceeds the number of hardware performance counters. Multiplexing
can cause measurement errors when the workload changes its execution
profile.
When metrics are computed using formulas from event counts, it is useful to
ensure some events are always measured together as a group to minimize multiplexing
errors. Event groups can be specified using { }.
perf stat -e '{instructions,cycles}' ...
The number of available performance counters depend on the CPU. A group
cannot contain more events than available counters.
For example Intel Core CPUs typically have four generic performance counters
for the core, plus three fixed counters for instructions, cycles and
ref-cycles. Some special events have restrictions on which counter they
can schedule, and may not support multiple instances in a single group.
When too many events are specified in the group none of them will not
be measured.
Globally pinned events can limit the number of counters available for
other groups. On x86 systems, the NMI watchdog pins a counter by default.
The nmi watchdog can be disabled as root with
echo 0 > /proc/sys/kernel/nmi_watchdog
Events from multiple different PMUs cannot be mixed in a group, with
some exceptions for software events.
LEADER SAMPLING
---------------
perf also supports group leader sampling using the :S specifier.
perf record -e '{cycles,instructions}:S' ...
perf report --group
Normally all events in a event group sample, but with :S only
the first event (the leader) samples, and it only reads the values of the
other events in the group.
OPTIONS
-------
......@@ -143,5 +248,5 @@ SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-top[1],
linkperf:perf-record[1],
http://www.intel.com/Assets/PDF/manual/253669.pdf[Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide],
http://www.intel.com/sdm/[Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide],
http://support.amd.com/us/Processor_TechDocs/24593_APM_v2.pdf[AMD64 Architecture Programmer’s Manual Volume 2: System Programming]
......@@ -48,6 +48,14 @@ OPTIONS
option can be passed in record mode. It will be interpreted the same way as perf
record.
-K::
--all-kernel::
Configure all used events to run in kernel space.
-U::
--all-user::
Configure all used events to run in user space.
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-report[1]
......@@ -177,7 +177,7 @@ Default is to monitor all CPUS.
between information loss and faster processing especially for
workloads that can have a very long callchain stack.
Default: 127
Default: /proc/sys/kernel/perf_event_max_stack when present, 127 otherwise.
--ignore-callees=<regex>::
Ignore callees of the function(s) matching the given regex.
......
此差异已折叠。
......@@ -3,4 +3,5 @@ PERF_HAVE_DWARF_REGS := 1
endif
HAVE_KVM_STAT_SUPPORT := 1
PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET := 1
PERF_HAVE_JITDUMP := 1
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册