1. 13 9月, 2016 2 次提交
  2. 10 3月, 2011 1 次提交
  3. 13 10月, 2009 1 次提交
    • H
      perf_event, x86, mce: Use TRACE_EVENT() for MCE logging · 8968f9d3
      Hidetoshi Seto 提交于
      This approach is the first baby step towards solving many of the
      structural problems the x86 MCE logging code is having today:
      
       - It has a private ring-buffer implementation that has a number
         of limitations and has been historically fragile and buggy.
      
       - It is using a quirky /dev/mcelog ioctl driven ABI that is MCE
         specific. /dev/mcelog is not part of any larger logging
         framework and hence has remained on the fringes for many years.
      
       - The MCE logging code is still very unclean partly due to its ABI
         limitations. Fields are being reused for multiple purposes, and
         the whole message structure is limited and x86 specific to begin
         with.
      
      All in one, the x86 tree would like to move away from this private
      implementation of an event logging facility to a broader framework.
      
      By using perf events we gain the following advantages:
      
       - Multiple user-space agents can access MCE events. We can have an
         mcelog daemon running but also a system-wide tracer capturing
         important events in flight-recorder mode.
      
       - Sampling support: the kernel and the user-space call-chain of MCE
         events can be stored and analyzed as well. This way actual patterns
         of bad behavior can be matched to precisely what kind of activity
         happened in the kernel (and/or in the app) around that moment in
         time.
      
       - Coupling with other hardware and software events: the PMU can track a
         number of other anomalies - monitoring software might chose to
         monitor those plus the MCE events as well - in one coherent stream of
         events.
      
       - Discovery of MCE sources - tracepoints are enumerated and tools can
         act upon the existence (or non-existence) of various channels of MCE
         information.
      
       - Filtering support: we just subscribe to and act upon the events we
         are interested in. Then even on a per event source basis there's
         in-kernel filter expressions available that can restrict the amount
         of data that hits the event channel.
      
       - Arbitrary deep per cpu buffering of events - we can buffer 32
         entries or we can buffer as much as we want, as long as we have
         the RAM.
      
       - An NMI-safe ring-buffer implementation - mappable to user-space.
      
       - Built-in support for timestamping of events, PID markers, CPU
         markers, etc.
      
       - A rich ABI accessible over system call interface. Per cpu, per task
         and per workload monitoring of MCE events can be done this way. The
         ABI itself has a nice, meaningful structure.
      
       - Extensible ABI: new fields can be added without breaking tooling.
         New tracepoints can be added as the hardware side evolves. There's
         various parsers that can be used.
      
       - Lots of scheduling/buffering/batching modes of operandi for MCE
         events. poll() support. mmap() support. read() support. You name it.
      
       - Rich tooling support: even without any MCE specific extensions added
         the 'perf' tool today offers various views of MCE data: perf report,
         perf stat, perf trace can all be used to view logged MCE events and
         perhaps correlate them to certain user-space usage patterns. But it
         can be used directly as well, for user-space agents and policy action
         in mcelog, etc.
      
      With this we hope to achieve significant code cleanup and feature
      improvements in the MCE code, and we hope to be able to drop the
      /dev/mcelog facility in the end.
      
      This patch is just a plain dumb dump of mce_log() records to
      the tracepoints / perf events framework - a first proof of
      concept step.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <4AD42A0D.7050104@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8968f9d3