- 13 5月, 2010 1 次提交
-
-
由 Cyrill Gorcunov 提交于
Linear search over all p4 MSRs should be fine if only we would not use it in events scheduling routine which is pretty time critical. Lets use hashes. It should speed scheduling up significantly. v2: Steven proposed to use more gentle approach than issue BUG on error, so we use WARN_ONCE now Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100512174242.GA5190@lenovo> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 08 5月, 2010 4 次提交
-
-
由 Cyrill Gorcunov 提交于
RAW events are special and we should be ready for user passing in insane event index values. Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100508112717.315897547@openvz.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Cyrill Gorcunov 提交于
The caller already has done such a check. And it was wrong anyway, it had to be '>=' rather than '>' Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100508112717.130386882@openvz.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Cyrill Gorcunov 提交于
Steven reported: | | I'm getting: | | Pid: 3477, comm: perf Not tainted 2.6.34-rc6 #2727 | Call Trace: | [<ffffffff811c7565>] debug_smp_processor_id+0xd5/0xf0 | [<ffffffff81019874>] p4_hw_config+0x2b/0x15c | [<ffffffff8107acbc>] ? trace_hardirqs_on_caller+0x12b/0x14f | [<ffffffff81019143>] hw_perf_event_init+0x468/0x7be | [<ffffffff810782fd>] ? debug_mutex_init+0x31/0x3c | [<ffffffff810c68b2>] T.850+0x273/0x42e | [<ffffffff810c6cab>] sys_perf_event_open+0x23e/0x3f1 | [<ffffffff81009e6a>] ? sysret_check+0x2e/0x69 | [<ffffffff81009e32>] system_call_fastpath+0x16/0x1b | | When running perf record in latest tip/perf/core | Due to the fact that p4 counters are shared between HT threads we synthetically divide the whole set of counters into two non-intersected subsets. And while we're "borrowing" counters from these subsets we should not be preempted (well, strictly speaking in p4_hw_config we just pre-set reference to the subset which allow to save some cycles in schedule routine if it happens on the same cpu). So use get_cpu/put_cpu pair. Also p4_pmu_schedule_events should use smp_processor_id rather than raw_ version. This allow us to catch up preemption issue (if there will ever be). Reported-by: NSteven Rostedt <rostedt@goodmis.org> Tested-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100508112716.963478928@openvz.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Cyrill Gorcunov 提交于
If an event is not RAW we should not exit p4_hw_config early but call x86_setup_perfctr as well. Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Robert Richter <robert.richter@amd.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 07 5月, 2010 1 次提交
-
-
由 Robert Richter 提交于
The perfctr setup calls are in the corresponding .hw_config() functions now. This makes it possible to introduce config functions for other pmu events that are not perfctr specific. Also, all of a sudden the code looks much nicer. Signed-off-by: NRobert Richter <robert.richter@amd.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1271190201-25705-4-git-send-email-robert.richter@amd.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 03 4月, 2010 3 次提交
-
-
由 Peter Zijlstra 提交于
All variables that have __initconst should also be const. Suggested-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Stephane noticed that the ANY flag was in generic arch code, and Cyrill reported that it broke the P4 code. Solve this by merging x86_pmu::raw_event into x86_pmu::hw_config and provide intel_pmu and amd_pmu specific versions of this callback. The intel_pmu one deals with the ANY flag, the amd_pmu adds the few extra event bits AMD64 has. Reported-by: NStephane Eranian <eranian@google.com> Reported-by: NCyrill Gorcunov <gorcunov@gmail.com> Acked-by: NRobert Richter <robert.richter@amd.com> Acked-by: NCyrill Gorcunov <gorcunov@gmail.com> Acked-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1269968113.5258.442.camel@laptop> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Robert Richter 提交于
The big rename: cdd6c482 perf: Do the big rename: Performance Counters -> Performance Events accidentally renamed some members of stucts that were named after registers in the spec. To avoid confusion this patch reverts some changes. The related specs are MSR descriptions in AMD's BKDGs and the ARCHITECTURAL PERFORMANCE MONITORING section in the Intel 64 and IA-32 Architectures Software Developer's Manuals. This patch does: $ sed -i -e 's:num_events:num_counters:g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event_amd.c \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_p6.c \ arch/x86/kernel/cpu/perf_event_p4.c \ arch/x86/oprofile/op_model_ppro.c $ sed -i -e 's:event_bits:cntval_bits:g' -e 's:event_mask:cntval_mask:g' \ arch/x86/kernel/cpu/perf_event_amd.c \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_p6.c \ arch/x86/kernel/cpu/perf_event_p4.c Signed-off-by: NRobert Richter <robert.richter@amd.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1269880612-25800-2-git-send-email-robert.richter@amd.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 26 3月, 2010 2 次提交
-
-
由 Peter Zijlstra 提交于
Implement the workaround for Intel Errata AAK100 and AAP53. Also, remove the Core-i7 name for Nehalem events since there are also Westmere based i7 chips. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <1269608924.12097.147.camel@laptop> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Cyrill Gorcunov 提交于
The adding of raw event support lead to complete code refactoring. I hope is became more readable then it was. The list of changes: 1) The 64bit config field is enough to hold all information we need to track event details. To achieve it we used *own* enum for events selection in ESCR register and map this key into proper value at moment of event enabling. For the same reason we use 12LSB bits in CCCR register -- to track which exactly cache trace event was requested. And we cear this bits at real 'write' moment. 2) There is no per-cpu area reserved for P4 PMU anymore. We don't need it. All is held by config. 3) Now we may use any available counter, ie we try to grab any possible counter. v2: - Lin Ming reported the lack of ESCR selector in CCCR for cache events v3: - Don't loose cache event codes at config unpacking procedure, we may need it one day so no obscure hack behind our back, better to clear reserved bits explicitly when needed (thanks Ming for pointing out) - Lin Ming fixed misplaced opcodes in cache events Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Tested-by: NLin Ming <ming.m.lin@intel.com> Signed-off-by: NLin Ming <ming.m.lin@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1269403766.3409.6.camel@minggr.sh.intel.com> [ v4: did a few whitespace fixlets ] Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 19 3月, 2010 4 次提交
-
-
由 Cyrill Gorcunov 提交于
- A few ESCR have escaped fixing at previous attempt. - p4_escr_map is read only, make it const. Nothing serious. Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100318211256.GH5062@lenovo> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Lin Ming 提交于
Move the HT bit setting code from p4_pmu_event_map to p4_hw_config. So the cache events can get HT bit set correctly. Tested on my P4 desktop, below 6 cache events work: L1-dcache-load-misses LLC-load-misses dTLB-load-misses dTLB-store-misses iTLB-loads iTLB-load-misses Signed-off-by: NLin Ming <ming.m.lin@intel.com> Reviewed-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1268908392.13901.128.camel@minggr.sh.intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Lin Ming 提交于
Currently, we use opcode(Event and Event-Selector) + emask to look up template in p4_templates. But cache events (L1-dcache-load-misses, LLC-load-misses, etc) use the same event(P4_REPLAY_EVENT) to do the counting, ie, they have the same opcode and emask. So we can not use current lookup mechanism to find the template for cache events. This patch introduces a "key", which is the index into p4_templates. The low 12 bits of CCCR are reserved, so we can hide the "key" in the low 12 bits of hwc->config. We extract the key from hwc->config and then quickly find the template. Signed-off-by: NLin Ming <ming.m.lin@intel.com> Reviewed-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1268908387.13901.127.camel@minggr.sh.intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Cyrill Gorcunov 提交于
Since apic_write() maps to a plain noop in the !CONFIG_X86_LOCAL_APIC case we're safe to remove this conditional compilation and clean up the code a bit. Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: fweisbec@gmail.com Cc: acme@redhat.com Cc: eranian@google.com Cc: peterz@infradead.org LKML-Reference: <20100317104356.232371479@openvz.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 15 3月, 2010 1 次提交
-
-
由 Cyrill Gorcunov 提交于
This should turn on instruction counting on P4s, which was missing in the first version of the new PMU driver. It's inaccurate for now, we still need dependant event to tag mops before we can count them precisely. The result is that the number of instruction may be lifted up. Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: NLin Ming <ming.m.lin@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1268629102.3355.11.camel@minggr.sh.intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 13 3月, 2010 1 次提交
-
-
由 Cyrill Gorcunov 提交于
Ingo reported: | | There's a build failure on -tip with the P4 driver, on UP 32-bit, if | PERF_EVENTS is enabled but UP_APIC is disabled: | | arch/x86/built-in.o: In function `p4_pmu_handle_irq': | perf_event.c:(.text+0xa756): undefined reference to `apic' | perf_event.c:(.text+0xa76e): undefined reference to `apic' | So we have to unmask LVTPC only if we're configured to have one. Reported-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> CC: Lin Ming <ming.m.lin@intel.com> CC: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20100313081116.GA5179@lenovo> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 12 3月, 2010 1 次提交
-
-
由 Cyrill Gorcunov 提交于
The netburst PMU is way different from the "architectural perfomance monitoring" specification that current CPUs use. P4 uses a tuple of ESCR+CCCR+COUNTER MSR registers to handle perfomance monitoring events. A few implementational details: 1) We need a separate x86_pmu::hw_config helper in struct x86_pmu since register bit-fields are quite different from P6, Core and later cpu series. 2) For the same reason is a x86_pmu::schedule_events helper introduced. 3) hw_perf_event::config consists of packed ESCR+CCCR values. It's allowed since in reality both registers only use a half of their size. Of course before making a real write into a particular MSR we need to unpack the value and extend it to a proper size. 4) The tuple of packed ESCR+CCCR in hw_perf_event::config doesn't describe the memory address of ESCR MSR register so that we need to keep a mapping between these tuples used and available ESCR (various P4 events may use same ESCRs but not simultaneously), for this sake every active event has a per-cpu map of hw_perf_event::idx <--> ESCR addresses. 5) Since hw_perf_event::idx is an offset to counter/control register we need to lift X86_PMC_MAX_GENERIC up, otherwise kernel strips it down to 8 registers and event armed may never be turned off (ie the bit in active_mask is set but the loop never reaches this index to check), thanks to Peter Zijlstra Restrictions: - No cascaded counters support (do we ever need them?) - No dependent events support (so PERF_COUNT_HW_INSTRUCTIONS doesn't work for now) - There are events with same counters which can't work simultaneously (need to use intersected ones due to broken counter 1) - No PERF_COUNT_HW_CACHE_ events yet Todo: - Implement dependent events - Need proper hashing for event opcodes (no linear search, good for debugging stage but not in real loads) - Some events counted during a clock cycle -- need to set threshold for them and count every clock cycle just to get summary statistics (ie to behave the same way as other PMUs do) - Need to swicth to use event_constraints - To support RAW events we need to encode a global list of P4 events into p4_templates - Cache events need to be added Event support status matrix: Event status ----------------------------- cycles works cache-references works cache-misses works branch-misses works bus-cycles partially (does not work on 64bit cpu with HT enabled) instruction doesnt work (needs dependent event [mop tagging]) branches doesnt work Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: NLin Ming <ming.m.lin@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100311165439.GB5129@lenovo> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-