1. 19 Mar, 2010: 2 commits
    • perf, x86: Add a key to simplify template lookup in Pentium-4 PMU · f34edbc1
      Committed by Lin Ming
      Currently, we use the opcode (Event and Event-Selector) + emask to
      look up the template in p4_templates.
      
      However, cache events (L1-dcache-load-misses, LLC-load-misses, etc.)
      all use the same event (P4_REPLAY_EVENT) for counting, i.e. they
      have the same opcode and emask. So we cannot use the current lookup
      mechanism to find the template for cache events.
      
      This patch introduces a "key", which is the index into
      p4_templates. The low 12 bits of CCCR are reserved, so we can
      hide the "key" in the low 12 bits of hwc->config.
      
      We extract the key from hwc->config and then quickly find the
      template.
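The key scheme described above can be sketched in C. This is a minimal, hypothetical illustration: the struct layout, the opcode/emask values, and the names P4_KEY_MASK, config_set_key(), lookup_by_opcode() and lookup_by_key() are assumptions for the example, not the identifiers used in the actual patch. It shows why a linear opcode+emask search cannot tell two cache events apart, while a 12-bit index kept in the low bits of the config (the reserved low CCCR bits) selects the right template directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical template layout; field names are illustrative only. */
struct p4_template {
    uint32_t opcode;   /* Event + Event-Selector */
    uint32_t emask;    /* event mask */
    const char *name;
};

/* Made-up values: the two cache events share opcode and emask. */
static const struct p4_template p4_templates[] = {
    { 0x0901, 0x1, "cycles" },
    { 0x0902, 0x2, "L1-dcache-load-misses" },
    { 0x0902, 0x2, "LLC-load-misses" },
};

/* The low 12 CCCR bits are reserved, so the key can live there. */
#define P4_KEY_MASK 0xfffULL

/* Old scheme: first match wins, so entry 2 can never be found. */
static const struct p4_template *lookup_by_opcode(uint32_t opcode, uint32_t emask)
{
    for (size_t i = 0; i < ARRAY_SIZE(p4_templates); i++)
        if (p4_templates[i].opcode == opcode && p4_templates[i].emask == emask)
            return &p4_templates[i];
    return NULL;
}

/* Store the template index in the low 12 bits, leaving the rest intact. */
static uint64_t config_set_key(uint64_t config, unsigned int key)
{
    assert(key <= P4_KEY_MASK);  /* key must fit in 12 bits */
    return (config & ~P4_KEY_MASK) | key;
}

/* New scheme: extract the key and index p4_templates directly. */
static const struct p4_template *lookup_by_key(uint64_t config)
{
    uint64_t key = config & P4_KEY_MASK;

    if (key >= ARRAY_SIZE(p4_templates))
        return NULL;
    return &p4_templates[key];
}
```

The design trade-off is that a direct index costs nothing at lookup time but relies on those 12 bits staying reserved and being masked off before the CCCR value is written to hardware.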
      Signed-off-by: Lin Ming <ming.m.lin@intel.com>
      Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1268908387.13901.127.camel@minggr.sh.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, perf: Use apic_write unconditionally · 7335f75e
      Committed by Cyrill Gorcunov
      Since apic_write() maps to a plain no-op in the !CONFIG_X86_LOCAL_APIC
      case, it is safe to remove this conditional compilation and clean up
      the code a bit.
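The reasoning above relies on a common kernel pattern: when a feature is configured out, the header provides an empty inline stub, so callers need no #ifdef of their own. Below is a simplified, self-contained sketch of that pattern, not the kernel's actual header; the apic_writes counter and pmu_unmask_lvtpc() are hypothetical, added only to make the effect observable. APIC_LVTPC (0x340) and APIC_DM_NMI (0x400) are the usual local APIC register offset and delivery-mode value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter so the demo can observe whether a write happened. */
static unsigned int apic_writes;

#ifdef CONFIG_X86_LOCAL_APIC
static inline void apic_write(uint32_t reg, uint32_t val)
{
    (void)reg;
    (void)val;
    apic_writes++;   /* real code would write the local APIC register */
}
#else
/* !CONFIG_X86_LOCAL_APIC: the call compiles away to nothing. */
static inline void apic_write(uint32_t reg, uint32_t val)
{
    (void)reg;
    (void)val;
}
#endif

#define APIC_LVTPC  0x340     /* LVT performance counter register */
#define APIC_DM_NMI 0x00400   /* deliver as NMI */

/* Hypothetical caller: no #ifdef CONFIG_X86_LOCAL_APIC needed here. */
static void pmu_unmask_lvtpc(void)
{
    apic_write(APIC_LVTPC, APIC_DM_NMI);
}
```

Built without CONFIG_X86_LOCAL_APIC defined, pmu_unmask_lvtpc() still compiles and simply does nothing.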
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: fweisbec@gmail.com
      Cc: acme@redhat.com
      Cc: eranian@google.com
      Cc: peterz@infradead.org
      LKML-Reference: <20100317104356.232371479@openvz.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 15 Mar, 2010: 1 commit
  3. 13 Mar, 2010: 1 commit
  4. 12 Mar, 2010: 1 commit
    • perf, x86: Implement initial P4 PMU driver · a072738e
      Committed by Cyrill Gorcunov
      The NetBurst PMU is very different from the "architectural
      performance monitoring" specification that current CPUs use.
      P4 uses a tuple of ESCR+CCCR+COUNTER MSRs to handle
      performance monitoring events.
      
      A few implementation details:
      
      1) We need a separate x86_pmu::hw_config helper in struct
         x86_pmu since the register bit-fields are quite different from
         those of the P6, Core and later CPU series.
      
      2) For the same reason, an x86_pmu::schedule_events helper is
         introduced.
      
      3) hw_perf_event::config consists of packed ESCR+CCCR values.
         This works because, in practice, each of these registers uses
         only half of its width. Of course, before actually writing a
         particular MSR, we need to unpack the value and extend it to
         the proper size.
      
      4) The tuple of packed ESCR+CCCR in hw_perf_event::config
         does not encode the address of the ESCR MSR, so we need to
         keep a mapping between the tuples in use and the available
         ESCRs (different P4 events may use the same ESCR, but not
         simultaneously). For this purpose, every active event has a
         per-cpu map of hw_perf_event::idx <--> ESCR addresses.
      
      5) Since hw_perf_event::idx is an offset to a counter/control register,
         we need to raise X86_PMC_MAX_GENERIC; otherwise the kernel limits
         it to 8 registers and an armed event may never be turned off
         (i.e. the bit in active_mask is set, but the loop never reaches
         that index to check it). Thanks to Peter Zijlstra.
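Points 3 and 4 above can be illustrated with a small C sketch. Everything here is hypothetical: the helper names, DEMO_NR_CPUS, and the static 2-D array standing in for real per-cpu data are assumptions, and the ESCR address used in the usage note is an arbitrary example value. It packs a 32-bit ESCR value into the upper half and a 32-bit CCCR value into the lower half of one 64-bit config, unpacks each half back to a 64-bit MSR-sized value, and keeps a per-cpu idx -> ESCR address table because the packed tuple itself carries no ESCR address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: ESCR value in the upper 32 bits, CCCR in the lower 32. */
static inline uint64_t p4_config_pack(uint32_t escr, uint32_t cccr)
{
    return ((uint64_t)escr << 32) | cccr;
}

/* Unpacking zero-extends each half to a 64-bit value ready for an MSR write. */
static inline uint64_t p4_config_unpack_escr(uint64_t config)
{
    return config >> 32;
}

static inline uint64_t p4_config_unpack_cccr(uint64_t config)
{
    return config & 0xffffffffULL;
}

/* Demo sizes; the kernel would use real per-cpu data instead of this array. */
#define DEMO_NR_CPUS     4
#define P4_MAX_COUNTERS  18   /* NetBurst has 18 counter/CCCR pairs */

/* Per-cpu map: hw_perf_event::idx -> ESCR MSR address (0 = unused). */
static uint32_t escr_map[DEMO_NR_CPUS][P4_MAX_COUNTERS];

static void p4_bind_escr(int cpu, int idx, uint32_t escr_addr)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    assert(idx >= 0 && idx < P4_MAX_COUNTERS);
    escr_map[cpu][idx] = escr_addr;
}

static uint32_t p4_escr_for(int cpu, int idx)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    assert(idx >= 0 && idx < P4_MAX_COUNTERS);
    return escr_map[cpu][idx];
}
```

For example, after p4_bind_escr(1, 5, 0x3a0), p4_escr_for(1, 5) returns 0x3a0 while the same idx on another CPU is still unbound, which reflects how one ESCR can be reused by different events, just not at the same time on the same CPU.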
      
      Restrictions:
      
       - No cascaded counters support (do we ever need them?)
       - No dependent events support (so PERF_COUNT_HW_INSTRUCTIONS
         doesn't work for now)
       - There are events sharing the same counters that can't work
         simultaneously (intersecting ones must be used due to broken counter 1)
       - No PERF_COUNT_HW_CACHE_ events yet
      
      Todo:
      
       - Implement dependent events
       - Need proper hashing for event opcodes (linear search is fine for
         the debugging stage, but not for real workloads)
       - Some events are counted during a clock cycle: we need to set a
         threshold for them and count every clock cycle just to get summary
         statistics (i.e. to behave the same way as other PMUs do)
       - Need to switch to using event_constraints
       - To support RAW events we need to encode a global list of P4 events
         into p4_templates
       - Cache events need to be added
      
      Event support status matrix:
      
       Event			status
       -----------------------------
       cycles			works
       cache-references	works
       cache-misses		works
       branch-misses		works
       bus-cycles		partially (does not work on 64-bit CPUs with HT enabled)
       instructions		does not work (needs dependent event [mop tagging])
       branches		does not work
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Lin Ming <ming.m.lin@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100311165439.GB5129@lenovo>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>