1. 16 Mar, 2013 1 commit
  2. 05 Mar, 2013 1 commit
  3. 26 Feb, 2013 9 commits
    • i5100_edac: convert to use simple_open() · b0769891
      Wei Yongjun committed
      This removes an open-coded simple_open() function and
      replaces the file operations' references to that function
      with simple_open() instead.
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • ghes_edac: fix to use list_for_each_entry_safe() when delete list items · 5dae92a7
      Wei Yongjun committed
      Since we will remove items from the list using list_del(), we need
      to use the safe version of the list_for_each_entry() macro, aptly named
      list_for_each_entry_safe().
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
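
The reason a "safe" iterator is needed when deleting can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel macro itself: `struct node`, `push()` and `free_all()` are hypothetical names, and a plain singly linked list stands in for `struct list_head`. The point is only that the successor is saved before the current node is released.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type; the kernel embeds struct list_head instead. */
struct node {
    int value;
    struct node *next;
};

/* Push a new node at the head of the list and return the new head. */
struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return head;    /* allocation failed: leave the list unchanged */
    n->value = value;
    n->next = head;
    return n;
}

/*
 * Free every node and return how many were freed.
 * The successor is stored in 'tmp' BEFORE the current node is freed.
 * Reading pos->next after free(pos) would be a use-after-free, which is
 * the class of bug the _safe iteration variant avoids.
 */
int free_all(struct node *head)
{
    int freed = 0;
    struct node *pos, *tmp;

    for (pos = head; pos != NULL; pos = tmp) {
        tmp = pos->next;    /* read the link while 'pos' is still valid */
        free(pos);
        freed++;
    }
    return freed;
}
```

Calling `free_all()` on a list built with three `push()` calls releases all three nodes without touching freed memory.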
    • ghes_edac: Fix RAS tracing · 8ae8f50a
      Mauro Carvalho Chehab committed
      With the current version of CPER, there's no way to associate an
      error with the memory error. So, the error location in EDAC
      layers is unused.
      
      As CPER has its own idea about memory architectural layers, just
      output whatever is there inside the driver's detail at the RAS
      tracepoint.
      
      The EDAC location is left untouched, in case we can, at some point in
      the future, actually map the error onto the DIMM labels.
      
      Now, the error message:
      
      [   72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
      [   72.396627] {1}[Hardware Error]: APEI generic hardware error status
      [   72.396628] {1}[Hardware Error]: severity: 2, corrected
      [   72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected
      [   72.396632] {1}[Hardware Error]: flags: 0x01
      [   72.396634] {1}[Hardware Error]: primary
      [   72.396635] {1}[Hardware Error]: section_type: memory error
      [   72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400
      [   72.396638] {1}[Hardware Error]: node: 3
      [   72.396639] {1}[Hardware Error]: card: 0
      [   72.396640] {1}[Hardware Error]: module: 0
      [   72.396641] {1}[Hardware Error]: device: 0
      [   72.396643] {1}[Hardware Error]: error_type: 18, unknown
      [   72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)
      
      is now properly represented in the trace event:
      
           kworker/0:2-584   [000] ....    72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location:-1:-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory)
      
      Tested on a 4-socket E5-4650 Sandy Bridge machine.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • ghes_edac: Make it compliant with UEFI spec 2.3.1 · 689c9cd8
      Mauro Carvalho Chehab committed
      The UEFI spec defines the memory error types and the bits that
      validate each field of the memory error record in
      Appendix N, items N.2.5 (Memory Error Section) and
      N.2.11 (Error Status). Make the error description compliant with
      it, showing only the valid fields.
      
      The EDAC error log is now properly reporting the error:
      
      [  281.556854] mce: [Hardware Error]: Machine check events logged
      [  281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
      [  281.557044] {2}[Hardware Error]: APEI generic hardware error status
      [  281.557046] {2}[Hardware Error]: severity: 2, corrected
      [  281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected
      [  281.557050] {2}[Hardware Error]: flags: 0x01
      [  281.557052] {2}[Hardware Error]: primary
      [  281.557053] {2}[Hardware Error]: section_type: memory error
      [  281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400
      [  281.557056] {2}[Hardware Error]: node: 3
      [  281.557057] {2}[Hardware Error]: card: 0
      [  281.557058] {2}[Hardware Error]: module: 1
      [  281.557059] {2}[Hardware Error]: device: 0
      [  281.557061] {2}[Hardware Error]: error_type: 18, unknown
      [  281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9
      [  281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)
      
      Tested on a 4-CPU E5-4650 Sandy Bridge machine.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
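
To make "showing only the valid fields" concrete, here is a small userspace sketch that decodes the `validation_bits` value printed in the debug line above (0x000040b9). The bit assignments reflect my reading of the Memory Error Section in UEFI Appendix N and should be checked against the spec; the table, the field names and `list_valid_fields()` are hypothetical and are not the driver's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed bit layout of the memory error record's validation bits. */
struct field_name {
    uint64_t mask;
    const char *name;
};

static const struct field_name fields[] = {
    { 1ull << 0,  "error_status" },
    { 1ull << 1,  "physical_address" },
    { 1ull << 2,  "physical_address_mask" },
    { 1ull << 3,  "node" },
    { 1ull << 4,  "card" },
    { 1ull << 5,  "module" },
    { 1ull << 6,  "bank" },
    { 1ull << 7,  "device" },
    { 1ull << 8,  "row" },
    { 1ull << 9,  "column" },
    { 1ull << 10, "bit_position" },
    { 1ull << 11, "requestor_id" },
    { 1ull << 12, "responder_id" },
    { 1ull << 13, "target_id" },
    { 1ull << 14, "error_type" },
};

/*
 * Write the space-separated names of the valid fields into buf
 * (truncating if buf is too small) and return how many are valid.
 */
int list_valid_fields(uint64_t validation_bits, char *buf, size_t len)
{
    int count = 0;
    size_t used = 0;
    size_t i;

    if (len > 0)
        buf[0] = '\0';

    for (i = 0; i < sizeof(fields) / sizeof(fields[0]); i++) {
        int n;

        if (!(validation_bits & fields[i].mask))
            continue;           /* field not marked valid: skip it */
        count++;
        if (used >= len)
            continue;           /* no room left, keep counting only */
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? " " : "", fields[i].name);
        if (n < 0)
            continue;
        /* snprintf reports the untruncated length; clamp to the buffer */
        used += ((size_t)n < len - used) ? (size_t)n : len - used;
    }
    return count;
}
```

With 0x40b9, bits 0, 3, 4, 5, 7 and 14 are set, which lines up with the fields that actually appear in the log above (error_status, node, card, module, device and error_type).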
    • ghes_edac: Improve driver's printk messages · d2a68566
      Mauro Carvalho Chehab committed
      Provide a better infrastructure for printk's inside the driver:
      	- use edac_dbg() for debug messages;
      	- standardize the usage of pr_info();
      	- provide a warning about the risk of relying on this
      	  driver.

      While here, change the size of the fake memory to 1 page. This is
      as good or as bad as 1000 pages, but it is easier for userspace to
      detect, as I don't expect that any machine implementing GHES would
      provide just 1 page available ;)
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

      Conflicts:
      	drivers/edac/ghes_edac.c
    • ghes_edac: Don't credit the same memory dimm twice · 5ee726db
      Mauro Carvalho Chehab committed
      In my tests on a system with 4 E5-4650 CPUs, the GHES
      EDAC driver is called twice. As the SMBIOS DMI enumeration
      call scans all the DIMM sockets in the system, on
      this machine, equipped with 128 GB of RAM, the memory is
      displayed twice:
      
                +-----------------------+
                |    mc0    |    mc1    |
      ----------+-----------------------+
      memory45: |  8192 MB  |  8192 MB  |
      memory44: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory43: |     0 MB  |     0 MB  |
      memory42: |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      memory41: |     0 MB  |     0 MB  |
      memory40: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory39: |  8192 MB  |  8192 MB  |
      memory38: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory37: |     0 MB  |     0 MB  |
      memory36: |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      memory35: |     0 MB  |     0 MB  |
      memory34: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory33: |  8192 MB  |  8192 MB  |
      memory32: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory31: |     0 MB  |     0 MB  |
      memory30: |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      memory29: |     0 MB  |     0 MB  |
      memory28: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory27: |  8192 MB  |  8192 MB  |
      memory26: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory25: |     0 MB  |     0 MB  |
      memory24: |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      memory23: |     0 MB  |     0 MB  |
      memory22: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory21: |  8192 MB  |  8192 MB  |
      memory20: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory19: |     0 MB  |     0 MB  |
      memory18: |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      memory17: |     0 MB  |     0 MB  |
      memory16: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory15: |  8192 MB  |  8192 MB  |
      memory14: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory13: |     0 MB  |     0 MB  |
      memory12: |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      memory11: |     0 MB  |     0 MB  |
      memory10: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory9:  |  8192 MB  |  8192 MB  |
      memory8:  |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory7:  |     0 MB  |     0 MB  |
      memory6:  |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      memory5:  |     0 MB  |     0 MB  |
      memory4:  |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory3:  |  8192 MB  |  8192 MB  |
      memory2:  |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory1:  |     0 MB  |     0 MB  |
      memory0:  |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      
      Total sum of 256 GB.
      
      As there's no reliable way to credit DIMMs to the right memory
      controller, just put everything on memory controller 0 (which should
      always exist).
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • ghes_edac: do a better job of filling EDAC DIMM info · 32fa1f53
      Mauro Carvalho Chehab committed
      Instead of just faking a random value for the DIMM data, get
      the information that is available via the DMI table.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • ghes_edac: add support for reporting errors via EDAC · f04c62a7
      Mauro Carvalho Chehab committed
      Now that the EDAC core is capable of just forwarding the errors via
      the userspace API, add a reporting mechanism for the GHES errors.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • ghes_edac: Register at EDAC core the BIOS report · 77c5f5d2
      Mauro Carvalho Chehab committed
      Register GHES with the EDAC MC core, in order to prevent other
      drivers from also handling errors and mangling the error data.

      The EDAC core will guarantee that just one driver is used,
      so the first one to register (BIOS first) will be the one
      reporting the hardware errors.

      For now, the EDAC driver does nothing but register with the
      EDAC core, preventing the hardware-driven mechanism from
      interfering with GHES.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
  4. 22 Feb, 2013 2 commits
  5. 21 Feb, 2013 11 commits
    • edac: lock module owner to avoid error report conflicts · 80cc7d87
      Mauro Carvalho Chehab committed
      APEI GHES and i7core_edac/sb_edac can currently be loaded at
      the same time, but those are Highlander modules:
      	"There can be only one".

      There are two reasons for that:

      1) Each driver assumes that it is the only one registering with
         the EDAC core, as it is the driver's responsibility to number
         the memory controllers, and all of them start from 0;

      2) If the BIOS is handling the memory errors, the OS can't also be
         doing it, as one will interfere with the other.

      So, we need to add a module owner lock to the EDAC core,
      in order to avoid having two different modules handling memory
      errors at the same time. The best way to implement this lock seems
      to be to use the driver's name, as it is unique and won't require
      changes to every driver.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
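
The "owner lock keyed on the driver's name" idea can be sketched in userspace as follows. This is only an illustration under stated assumptions: `claim_owner()`, `release_owner()`, the reference count and the `-EBUSY` return value are all hypothetical choices and are not taken from the EDAC core, and the sketch is single-threaded (real code would need locking around the shared state).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Name of the module that currently owns error reporting; NULL when free. */
static const char *owner;
/* Number of controllers the owning module has registered. */
static int owner_refs;

/*
 * Try to register a controller on behalf of mod_name.
 * The first caller becomes the owner; the same module may register
 * again (e.g. mc0, mc1), while any other module is refused.
 * mod_name must outlive the registration (a string literal is fine).
 */
int claim_owner(const char *mod_name)
{
    if (owner && strcmp(owner, mod_name) != 0)
        return -EBUSY;      /* hypothetical error code for this sketch */
    owner = mod_name;
    owner_refs++;
    return 0;
}

/* Drop one registration; ownership is released with the last one. */
void release_owner(const char *mod_name)
{
    if (!owner || strcmp(owner, mod_name) != 0)
        return;             /* not the owner: nothing to release */
    if (--owner_refs == 0)
        owner = NULL;
}
```

In this sketch, once "ghes_edac" has claimed ownership, a later `claim_owner("sb_edac")` fails until every "ghes_edac" registration has been released, and no per-driver changes are needed beyond passing the module name.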
    • edac: add a new memory layer type · c66b5a79
      Mauro Carvalho Chehab committed
      There are some cases where the memory controller layout is
      completely hidden. This is the case with firmware-driven error
      code, such as the one provided by GHES. Add a new layer to be
      used by such memory error reporting mechanisms.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • edac: initialize the core earlier · 4ab19b06
      Mauro Carvalho Chehab committed
      In order for ghes_edac to work when built in, the EDAC core should
      be initialized earlier; otherwise the ghes_edac driver initializes
      before edac_mc_sysfs_init() is called:
      
      ...
      [    4.998373] EDAC MC0: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
      ...
      [    4.998373] EDAC MC1: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
      [    6.519495] EDAC MC: Ver: 3.0.0
      [    6.523749] EDAC DEBUG: edac_mc_sysfs_init: device mc created
      
      The net result is that no EDAC sysfs nodes will appear.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • edac: better report error conditions in debug mode · 3d958823
      Mauro Carvalho Chehab committed
      It is hard to find what's wrong without a proper error
      report. Improve the reporting in debug mode.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • i5100_edac: Remove two checkpatch warnings · 59b9796d
      Mauro Carvalho Chehab committed
      The last changeset introduced a few checkpatch warnings:
      
      WARNING: debugfs_remove_recursive(NULL) is safe this check is probably not required
      261: FILE: drivers/edac/i5100_edac.c:1207:
      +       if (priv->debugfs)
      +               debugfs_remove_recursive(priv->debugfs);
      
      WARNING: debugfs_remove(NULL) is safe this check is probably not required
      290: FILE: drivers/edac/i5100_edac.c:1250:
      +       if (i5100_debugfs)
      +               debugfs_remove(i5100_debugfs);
      
      Get rid of them.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • i5100_edac: connect fault injection to debugfs node · 9cbc6d38
      Niklas Söderlund committed
      Create a debugfs directory i5100_edac/mcX for each memory controller and
      add nodes to control how fault injection is performed.

      After configuring an injection using inject_channel, inject_deviceptr1,
      inject_deviceptr2, inject_eccmask1, inject_eccmask2 and inject_hlinesel,
      trigger the injection by writing anything to inject_enable.
      
      Example of a CE injection:
      
      echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
      echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
      echo 61440 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
      echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable
      
      Example of UE injection:
      
      echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
      echo 2 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
      echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
      echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask2
      echo 17 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr1
      echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr2
      echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable
      
      Sometimes the injection has to be enabled more than once (by echoing to
      the inject_enable node) before it happens; I am not sure why.
      Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • i5100_edac: add fault injection code · 53ceafd6
      Niklas Söderlund committed
      Add fault injection based on information in the i5100 datasheet, see [1].
      In addition to the i5100 datasheet, some missing information on the
      injection functions was found through experimentation and in the i7300
      datasheet, see [2].

      [1] Intel 5100 Memory Controller Hub Chipset
          Doc.Nr: 318378
          http://www.intel.com/content/dam/doc/datasheet/5100-memory-controller-hub-chipset-datasheet.pdf

      [2] Intel 7300 Chipset Memory Controller Hub (MCH)
          Doc.Nr: 318082
          http://www.intel.com/assets/pdf/datasheet/318082.pdf
      Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • i5100_edac: probe for device 19 function 0 · 52608ba2
      Niklas Söderlund committed
      Probe and store the device handle for the device 19 function 0 during
      driver initialization. The device is used during fault injection.
      Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • edac: only create sdram_scrub_rate where supported · e7100478
      Mauro Carvalho Chehab committed
      Currently, the sdram_scrub_rate sysfs node is created even if the device
      doesn't support getting/setting the scrub rate. Change the logic to only
      create this device node when the operation is supported.
      Reported-by: Felipe Balbi <balbi@ti.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Felipe Balbi <balbi@ti.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
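
One way to picture "only create the node when the operation is supported" is to derive the node's permissions from which callbacks a driver provides, and to skip creation when the result is zero. The sketch below is a userspace illustration only: `struct mem_ctl`, the stub callbacks, `scrub_rate_node_mode()` and the specific permission bits are hypothetical and are not claimed to match the EDAC core.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical controller descriptor with optional scrub-rate callbacks. */
struct mem_ctl {
    int (*get_scrub_rate)(struct mem_ctl *mci);
    int (*set_scrub_rate)(struct mem_ctl *mci, unsigned int bandwidth);
};

/* Stub callbacks, present only so the sketch can be exercised. */
int stub_get_rate(struct mem_ctl *mci)
{
    (void)mci;
    return 0;
}

int stub_set_rate(struct mem_ctl *mci, unsigned int bandwidth)
{
    (void)mci;
    (void)bandwidth;
    return 0;
}

/*
 * Return the permission bits the node should get:
 * readable if the rate can be queried, writable by the owner if it can
 * be set, and 0 (meaning: do not create the node) if neither is supported.
 */
unsigned int scrub_rate_node_mode(const struct mem_ctl *mci)
{
    unsigned int mode = 0;

    if (mci->get_scrub_rate)
        mode |= 0444;
    if (mci->set_scrub_rate)
        mode |= 0200;
    return mode;
}
```

A caller would create the node only when `scrub_rate_node_mode()` returns a non-zero value, so drivers without either callback never expose a node that cannot work.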
    • i3200_edac: Fix the logic that detects filled memories · 61734e18
      Mauro Carvalho Chehab committed
      After running a series of tests on an HP DL320, filled with different
      memory sizes, it was noticed that, when filled with just one DIMM
      on such hardware, the driver wrongly detects twice the amount of memory,
      and thinks that both channels 0 and 1 are filled.
      
      It seems to be partially caused by the BIOS and partially by the driver.
      
      The current i3200_edac logic would work fine if the BIOS disabled
      the unused second channel when just one DIMM is connected,
      in order to save power, as recommended in this chipset's datasheet.

      However, the BIOS on this particular machine doesn't do that:
      
      [   16.741421] EDAC DEBUG: how_many_channels: In dual channel mode
      [   16.741424] EDAC DEBUG: how_many_channels: 2 DIMMS per channel enabled
      
      So, the driver was assuming that 2 channels are enabled (well, they are,
      but the second is unused).

      Combined with that, I found two issues in the logic that creates the
      EDAC data, which failed when the two channels are not equally
      filled (AFAICT, that happens only when just 1 DIMM is plugged in).

      The first one is that a 0 in a DRB means that nothing is filled. The
      driver's logic, however, does some calculation with that value.

      The second one is that the logic that fills the DIMM data currently
      assumes that both channels are equally filled.

      I have tested the system with the current configuration and my
      patch, and it is now working fine. So, for a single 2R 2 GB DIMM
      in DIMM slot 01 (channel 0), it is now displaying:
      
      [   16.741406] EDAC DEBUG: i3200_get_drbs: drb[0][0] = 16, drb[1][0] = 0
      [   16.741410] EDAC DEBUG: i3200_get_drbs: drb[0][1] = 32, drb[1][1] = 0
      [   16.741413] EDAC DEBUG: i3200_get_drbs: drb[0][2] = 32, drb[1][2] = 0
      [   16.741416] EDAC DEBUG: i3200_get_drbs: drb[0][3] = 32, drb[1][3] = 0
      ...
      [   16.741896] EDAC DEBUG: i3200_probe1: csrow 0, channel 0, size = 1024 Mb
      [   16.741899] EDAC DEBUG: i3200_probe1: csrow 1, channel 0, size = 1024 Mb
      
      and the corresponding sysfs nodes are now properly filled.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
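
The debug output above can be reproduced with a little arithmetic. The sketch below assumes, based only on that log, that the DRB values are cumulative rank boundaries within a channel and that one DRB unit corresponds to 64 MB (16 units giving 1024 Mb); both are inferences, not statements from the datasheet. `rank_size_mb()` and the constants are hypothetical names, and stacked channel mode is ignored.

```c
#include <assert.h>

#define RANKS_PER_CHANNEL 4
#define DRB_UNIT_MB 64  /* assumption inferred from the log: 16 -> 1024 */

/*
 * Size in MB of one rank, given the cumulative DRB boundaries of its
 * channel. Boundaries are assumed to be non-decreasing.
 */
unsigned int rank_size_mb(const unsigned int drb[RANKS_PER_CHANNEL], int rank)
{
    unsigned int n = drb[rank];

    if (n == 0)
        return 0;               /* a DRB of 0 means nothing is populated */
    if (rank > 0)
        n -= drb[rank - 1];     /* boundaries are cumulative */
    return n * DRB_UNIT_MB;
}
```

For channel 0 in the log (16, 32, 32, 32) this gives 1024, 1024, 0 and 0 MB, matching the two csrow lines, while channel 1 (all zeros) contributes nothing instead of being treated as a mirror of channel 0.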
    • i3200_edac: Add more debug to the driver · 5f466cb0
      Mauro Carvalho Chehab committed
      Currently, it is not possible to know, when debug is enabled,
      whether the driver is using 2 DIMMs per channel mode or not. Nor is
      it possible to know the values of the DRB registers, which are used
      to identify the memory rank sizes.

      Add debug output for both, as it helps to track down issues in the driver.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
  6. 10 Feb, 2013 1 commit
  7. 30 Jan, 2013 2 commits
  8. 23 Jan, 2013 4 commits
  9. 18 Jan, 2013 1 commit
  10. 10 Jan, 2013 4 commits
  11. 08 Jan, 2013 3 commits
  12. 04 Jan, 2013 1 commit
    • Drivers: edac: remove __dev* attributes. · 9b3c6e85
      Greg Kroah-Hartman committed
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, and __devexit
      from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: Doug Thompson <dougthompson@xmission.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: Jason Uhlenkott <juhlenko@akamai.com>
      Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
      Cc: Tim Small <tim@buttersideup.com>
      Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
      Cc: "Arvind R." <arvino55@gmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Egor Martovetsky <egor@pasemi.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>