Commit 90405aa0 authored by Andi Kleen, committed by Ingo Molnar

perf/x86/intel/lbr: Limit LBR accesses to TOS in callstack mode

In callstack mode the LBR is not a ring buffer, but a stack that grows up
and down. This means we don't need to access all LBRs, only the ones up to
TOS. Do this optimization for the normal LBR read, and for the context
switch save/restore code. For save/restore it can be done unconditionally,
as that code only runs when call stack mode is active.

This recovers some of the cost of going to 32 LBRs on Skylake.
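For reference, the tos value driving the loops below comes from intel_pmu_lbr_tos(), which in kernels of this era is roughly the following sketch (reconstructed for context, not part of this change):

	/* Read the top-of-stack index from the model-specific LBR TOS MSR
	 * recorded in x86_pmu.lbr_tos during PMU setup. */
	static inline u64 intel_pmu_lbr_tos(void)
	{
		u64 tos;

		rdmsrl(x86_pmu.lbr_tos, tos);
		return tos;
	}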
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: eranian@google.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1432786398-23861-6-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Parent e0573364
@@ -240,7 +240,7 @@ static void __intel_pmu_lbr_restore(struct x86_perf_task_context *task_ctx)
 	mask = x86_pmu.lbr_nr - 1;
 	tos = intel_pmu_lbr_tos();
-	for (i = 0; i < x86_pmu.lbr_nr; i++) {
+	for (i = 0; i < tos; i++) {
 		lbr_idx = (tos - i) & mask;
 		wrmsrl(x86_pmu.lbr_from + lbr_idx, task_ctx->lbr_from[i]);
 		wrmsrl(x86_pmu.lbr_to + lbr_idx, task_ctx->lbr_to[i]);
@@ -263,7 +263,7 @@ static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx)
 	mask = x86_pmu.lbr_nr - 1;
 	tos = intel_pmu_lbr_tos();
-	for (i = 0; i < x86_pmu.lbr_nr; i++) {
+	for (i = 0; i < tos; i++) {
 		lbr_idx = (tos - i) & mask;
 		rdmsrl(x86_pmu.lbr_from + lbr_idx, task_ctx->lbr_from[i]);
 		rdmsrl(x86_pmu.lbr_to + lbr_idx, task_ctx->lbr_to[i]);
@@ -425,8 +425,12 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
 	u64 tos = intel_pmu_lbr_tos();
 	int i;
 	int out = 0;
+	int num = x86_pmu.lbr_nr;
 
-	for (i = 0; i < x86_pmu.lbr_nr; i++) {
+	if (cpuc->lbr_sel->config & LBR_CALL_STACK)
+		num = tos;
+
+	for (i = 0; i < num; i++) {
 		unsigned long lbr_idx = (tos - i) & mask;
 		u64 from, to, mis = 0, pred = 0, in_tx = 0, abort = 0;
 		int skip = 0;
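To illustrate why bounding the loop at TOS is safe, here is a minimal userspace model of the (tos - i) & mask walk used above (hypothetical values; illustrative only, not kernel code):

	#include <stdio.h>

	#define LBR_NR 32	/* Skylake exposes 32 LBR entries */

	int main(void)
	{
		unsigned long mask = LBR_NR - 1;
		unsigned long tos = 5;	/* pretend only 5 calls are live on the stack */

		/*
		 * In ring-buffer mode any of the 32 slots may hold a valid
		 * branch, so all of them must be read. In callstack mode the
		 * slots above TOS are unused; stopping at TOS skips 27 MSR
		 * read/write pairs in this example.
		 */
		for (unsigned long i = 0; i < tos; i++) {
			unsigned long lbr_idx = (tos - i) & mask;
			printf("read slot %lu (newest first)\n", lbr_idx);
		}
		return 0;
	}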