1. 15 Nov, 2013 · 7 commits
  2. 27 Aug, 2013 · 2 commits
  3. 14 Aug, 2013 · 1 commit
  4. 12 Aug, 2013 · 2 commits
    • amd64_edac: Get rid of boot_cpu_data accesses · a4b4bedc
      Borislav Petkov committed
      Now that we cache (family, model, stepping) locally, use them instead of
      boot_cpu_data.
      
      No functionality change.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      a4b4bedc
    • amd64_edac: Add ECC decoding support for newer F15h models · 18b94f66
      Aravind Gopalakrishnan committed
      On newer models, support has been included for up to 4 DCTs; however,
      only DCT0 and DCT3 are currently configured (cf. BKDG Section 2.10).
      Also, the algorithm for routing DRAM requests is different for F15h M30h.
      Thus it is cleaner to use a brand new function rather than adding quirks
      to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM
      Routing Requests" in the BKDG for further info.
      
      Tested on Fam15h M30h with ECC turned on using mce_amd_inj facility and
      verified to be functionally correct.
      
      While at it, verify if erratum workarounds for E505 and E637 still hold.
      From email conversations within AMD, the current status of the errata
      is:
      
            * Erratum 505: fixed in model 0x1, stepping 0x1 and later.
            * Erratum 637: not fixed.
      Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      [ Cleanups, corrections ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      18b94f66
  5. 09 Aug, 2013 · 2 commits
  6. 29 Jul, 2013 · 1 commit
    • amd64_edac: Fix single-channel setups · f0a56c48
      Borislav Petkov committed
      A system can run in single-channel mode even with a dual-channel
      memory controller, for example when the DIMMs are placed only on one
      channel and the other is left empty. This causes a problem in
      init_csrows, which implicitly assumes that when the second channel
      (channel 1) is enabled, the struct dimm hierarchy is present. In this
      case it is not.
      
      So always allocate two channels unconditionally.
      
      This provides for the nice side effect that the data structures are
      initialized so some day, when memory hotplug is supported, it should
      just work out of the box when all of a sudden a second channel appears.
      Reported-and-tested-by: Roger Leigh <rleigh@debian.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      f0a56c48
  7. 24 Jul, 2013 · 2 commits
    • EDAC: Replace strict_strtol() with kstrtol() · c542b53d
      Jingoo Han committed
      The usage of strict_strtol() is not preferred, because strict_strtol()
      is obsolete. Thus, kstrtol() should be used.
      Signed-off-by: Jingoo Han <jg1.han@samsung.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      c542b53d
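For readers unfamiliar with the kstrtol() convention (the result is stored through a pointer and the return value is 0 or a negative errno), here is a minimal userspace sketch of that style of strict parsing. It is not the kernel implementation: `parse_long_strict` is a hypothetical name, it is built on the C library's strtol(), and the choice to accept one trailing newline is an assumption modelled on how sysfs input usually arrives.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for a kstrtol()-style helper.
 * Returns 0 and stores the value in *res on success, or a negative
 * errno value. The whole string must be a number, apart from one
 * optional trailing newline.
 */
int parse_long_strict(const char *s, int base, long *res)
{
    char *end;
    long val;

    if (!s || !res)
        return -EINVAL;
    if (base != 0 && (base < 2 || base > 36))
        return -EINVAL;          /* strtol() does not define this case well */
    if (isspace((unsigned char)*s))
        return -EINVAL;          /* strtol() would silently skip whitespace */

    errno = 0;
    val = strtol(s, &end, base);
    if (end == s)
        return -EINVAL;          /* no digits consumed */
    if (errno == ERANGE)
        return -ERANGE;          /* value does not fit in a long */
    if (*end == '\n')
        end++;                   /* tolerate one trailing newline */
    if (*end != '\0')
        return -EINVAL;          /* trailing garbage */

    *res = val;
    return 0;
}
```

A caller checks the return value first and only then uses the stored result, which is the pattern the conversion from strict_strtol() keeps intact.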
    • EDAC: Fix lockdep splat · 88d84ac9
      Borislav Petkov committed
      Fix the following:
      
      BUG: key ffff88043bdd0330 not in .data!
      ------------[ cut here ]------------
      WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
      DEBUG_LOCKS_WARN_ON(1)
      Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
      CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
      Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
       0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
       ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
       ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
      Call Trace:
        dump_stack
        warn_slowpath_common
        warn_slowpath_fmt
        lockdep_init_map
        ? trace_hardirqs_on_caller
        ? trace_hardirqs_on
        debug_mutex_init
        __mutex_init
        bus_register
        edac_create_sysfs_mci_device
        edac_mc_add_mc
        sbridge_probe
        pci_device_probe
        driver_probe_device
        __driver_attach
        ? driver_probe_device
        bus_for_each_dev
        driver_attach
        bus_add_driver
        driver_register
        __pci_register_driver
        ? 0xffffffffa0010fff
        sbridge_init
        ? 0xffffffffa0010fff
        do_one_initcall
        load_module
        ? unset_module_init_ro_nx
        SyS_init_module
        tracesys
      ---[ end trace d24a70b0d3ddf733 ]---
      EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
      EDAC sbridge: Driver loaded.
      
      What happens is that bus_register needs a statically allocated lock_key
      because the latter is handed to lockdep. However, struct mem_ctl_info
      embeds struct bus_type (the whole struct, not a pointer to it), and the
      whole thing gets dynamically allocated.
      
      Fix this by using a statically allocated struct bus_type for the MC bus.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: stable@kernel.org # v3.10
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      88d84ac9
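The shape of the fix described above (a pointer to statically allocated storage instead of an embedded struct inside a heap object) can be illustrated with a rough userspace sketch. This is not the actual patch: every name below (`bus_type_sketch`, `mem_ctl_info_sketch`, `mc_bus_sketch`, `MAX_MCS_SKETCH`) is a hypothetical stand-in, and the limit of 16 controllers is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MCS_SKETCH 16          /* assumed upper bound on controllers */

struct bus_type_sketch {           /* stand-in for struct bus_type */
    const char *name;
};

struct mem_ctl_info_sketch {       /* stand-in for struct mem_ctl_info */
    int mc_idx;
    struct bus_type_sketch *bus;   /* a pointer, not an embedded struct */
};

/* Static storage: these objects exist for the whole program lifetime. */
static struct bus_type_sketch mc_bus_sketch[MAX_MCS_SKETCH];

/* Point a (possibly heap-allocated) controller at its static bus object. */
int mci_attach_bus(struct mem_ctl_info_sketch *mci, int idx)
{
    if (!mci || idx < 0 || idx >= MAX_MCS_SKETCH)
        return -1;
    mci->mc_idx = idx;
    mci->bus = &mc_bus_sketch[idx];
    return 0;
}

/* Returns 1 if the bus object lies inside the static array, else 0. */
int bus_is_static(const struct bus_type_sketch *bus)
{
    uintptr_t p  = (uintptr_t)bus;
    uintptr_t lo = (uintptr_t)mc_bus_sketch;
    uintptr_t hi = (uintptr_t)(mc_bus_sketch + MAX_MCS_SKETCH);

    return p >= lo && p < hi;
}
```

The address-range check plays the role of the lockdep complaint in the splat ("key ... not in .data!"): an object embedded in a dynamically allocated structure would fail it, while one taken from the static array passes.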
  8. 18 Jul, 2013 · 1 commit
  9. 11 Jun, 2013 · 1 commit
  10. 08 Jun, 2013 · 2 commits
  11. 04 Jun, 2013 · 1 commit
  12. 21 May, 2013 · 1 commit
  13. 09 May, 2013 · 1 commit
    • EDAC: Don't give write permission to read-only files · c8c64d16
      Srivatsa S. Bhat committed
      I get the following warning on boot:
      
      ------------[ cut here ]------------
      WARNING: at drivers/base/core.c:575 device_create_file+0x9a/0xa0()
      Hardware name:  -[8737R2A]-
      Write permission without 'store'
      ...
      </snip>
      
      Drilling down, this is related to dynamic channel ce_count attribute
      files sporting a S_IWUSR mode without a ->store() function. Looking
      around, it appears that they aren't supposed to have a ->store()
      function. So remove the bogus write permission to get rid of the
      warning.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
      Cc: <stable@vger.kernel.org> # 3.[89]
      [ shorten commit message ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      c8c64d16
  14. 29 Apr, 2013 · 2 commits
    • edac: sb_edac.c should not require prescence of IMC_DDRIO device · de4772c6
      Luck, Tony committed
      The Sandy Bridge EDAC driver uses a register in the IMC_DDRIO CSR
      space to determine the type of DIMMs (registered or unregistered).
      But this device does not exist on some single socket Sandy Bridge
      servers.  While the type of DIMMs is nice to know, it is not essential
      for this driver's other functions. So it seems harsh to have it
      refuse to load at all when it cannot find this device.
      
      Make the check for this device optional. If it isn't present,
      just report the memory type as "MEM_UNKNOWN".
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      de4772c6
    • i7300_edac: Fix memory detection in single mode · 33ad4126
      Mauro Carvalho Chehab committed
      When the machine is in single mode, only branch 0, channel 0
      is valid. However, the code does not honour this:
      
      [ 1952.639341] EDAC DEBUG: i7300_get_mc_regs: Memory controller operating on single mode
      ...
      [ 1952.639351] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH0 = 0x1:
      [ 1952.639353] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH1 = 0x0:
      [ 1952.639355] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH2 = 0x0:
      [ 1952.639358] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH3 = 0x0:
      ...
      [ 1952.639360] EDAC DEBUG: decode_mtr: 	MTR0 CH0: DIMMs are Present (mtr)
      [ 1952.639362] EDAC DEBUG: decode_mtr: 		WIDTH: x8
      [ 1952.639363] EDAC DEBUG: decode_mtr: 		ELECTRICAL THROTTLING is enabled
      [ 1952.639364] EDAC DEBUG: decode_mtr: 		NUMBANK: 4 bank(s)
      [ 1952.639366] EDAC DEBUG: decode_mtr: 		NUMRANK: single
      [ 1952.639367] EDAC DEBUG: decode_mtr: 		NUMROW: 16,384 - 14 rows
      [ 1952.639368] EDAC DEBUG: decode_mtr: 		NUMCOL: 1,024 - 10 columns
      [ 1952.639370] EDAC DEBUG: decode_mtr: 		SIZE: 512 MB
      [ 1952.639371] EDAC DEBUG: decode_mtr: 		ECC code is 8-byte-over-32-byte SECDED+ code
      [ 1952.639373] EDAC DEBUG: decode_mtr: 		Scrub algorithm for x8 is on enhanced mode
      [ 1952.639374] EDAC DEBUG: decode_mtr: 	MTR0 CH1: DIMMs are Present (mtr)
      [ 1952.639376] EDAC DEBUG: decode_mtr: 		WIDTH: x8
      [ 1952.639377] EDAC DEBUG: decode_mtr: 		ELECTRICAL THROTTLING is enabled
      [ 1952.639379] EDAC DEBUG: decode_mtr: 		NUMBANK: 4 bank(s)
      [ 1952.639380] EDAC DEBUG: decode_mtr: 		NUMRANK: single
      [ 1952.639381] EDAC DEBUG: decode_mtr: 		NUMROW: 16,384 - 14 rows
      [ 1952.639383] EDAC DEBUG: decode_mtr: 		NUMCOL: 1,024 - 10 columns
      [ 1952.639384] EDAC DEBUG: decode_mtr: 		SIZE: 512 MB
      [ 1952.639385] EDAC DEBUG: decode_mtr: 		ECC code is 8-byte-over-32-byte SECDED+ code
      [ 1952.639387] EDAC DEBUG: decode_mtr: 		Scrub algorithm for x8 is on enhanced mode
      ...
      [ 1952.639449] EDAC DEBUG: print_dimm_size:               channel 0 | channel 1 | channel 2 | channel 3 |
      [ 1952.639451] EDAC DEBUG: print_dimm_size: -------------------------------------------------------------
      [ 1952.639453] EDAC DEBUG: print_dimm_size: csrow/SLOT 0   512 MB   |  512 MB   |    0 MB   |    0 MB   |
      [ 1952.639456] EDAC DEBUG: print_dimm_size: csrow/SLOT 1     0 MB   |    0 MB   |    0 MB   |    0 MB   |
      [ 1952.639458] EDAC DEBUG: print_dimm_size: csrow/SLOT 2     0 MB   |    0 MB   |    0 MB   |    0 MB   |
      [ 1952.639460] EDAC DEBUG: print_dimm_size: csrow/SLOT 3     0 MB   |    0 MB   |    0 MB   |    0 MB   |
      [ 1952.639462] EDAC DEBUG: print_dimm_size: csrow/SLOT 4     0 MB   |    0 MB   |    0 MB   |    0 MB   |
      [ 1952.639464] EDAC DEBUG: print_dimm_size: csrow/SLOT 5     0 MB   |    0 MB   |    0 MB   |    0 MB   |
      [ 1952.639466] EDAC DEBUG: print_dimm_size: csrow/SLOT 6     0 MB   |    0 MB   |    0 MB   |    0 MB   |
      [ 1952.639468] EDAC DEBUG: print_dimm_size: csrow/SLOT 7     0 MB   |    0 MB   |    0 MB   |    0 MB   |
      [ 1952.639470] EDAC DEBUG: print_dimm_size: -------------------------------------------------------------
      
      Instead of detecting a single memory module at channel 0, the driver
      reports twice the memory.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      33ad4126
  15. 19 Apr, 2013 · 1 commit
  16. 25 Mar, 2013 · 1 commit
  17. 16 Mar, 2013 · 2 commits
  18. 05 Mar, 2013 · 1 commit
  19. 26 Feb, 2013 · 9 commits
    • i5100_edac: convert to use simple_open() · b0769891
      Wei Yongjun committed
      This removes an open coded simple_open() function and
      replaces file operations references to the function
      with simple_open() instead.
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      b0769891
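To show what "open coded" means here, the following userspace sketch contrasts a per-driver open helper with a shared one. The removed i5100 function is not shown in the commit message, so its shape below is an assumption; all types and names ending in `_sketch` are hypothetical stand-ins for the kernel's inode, file and file_operations structures.

```c
#include <stddef.h>

struct inode_sketch { void *i_private; };      /* stand-in for struct inode */
struct file_sketch  { void *private_data; };   /* stand-in for struct file */

/* The kind of per-driver helper that ends up duplicated across drivers. */
int driver_open_sketch(struct inode_sketch *inode, struct file_sketch *file)
{
    file->private_data = inode->i_private;
    return 0;
}

/* One shared helper doing the same job for every driver. */
int simple_open_sketch(struct inode_sketch *inode, struct file_sketch *file)
{
    if (inode->i_private)
        file->private_data = inode->i_private;
    return 0;
}

struct file_ops_sketch {
    int (*open)(struct inode_sketch *, struct file_sketch *);
};

/* After the conversion, the operations table points at the shared helper. */
const struct file_ops_sketch debug_fops_sketch = {
    .open = simple_open_sketch,    /* previously: driver_open_sketch */
};
```

The conversion changes only which function the operations table references; the private data handed to the read and write handlers stays the same.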
    • ghes_edac: fix to use list_for_each_entry_safe() when delete list items · 5dae92a7
      Wei Yongjun committed
      Since we will remove items off the list using list_del() we need
      to use a safe version of the list_for_each_entry() macro aptly named
      list_for_each_entry_safe().
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      5dae92a7
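The reason a "safe" iterator is needed when deleting can be shown with a small userspace sketch. It uses a plain singly linked list rather than the kernel's list_head, and the names `node`, `push` and `free_all` are hypothetical; the point is only the extra cursor that remembers the successor before the current entry is freed.

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Push a new node at the head; returns the new head, or the old head
 * unchanged if the allocation fails. */
struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return head;
    n->value = value;
    n->next = head;
    return n;
}

/*
 * Free every node and return how many were freed. The successor is
 * saved in tmp before pos is released, so the loop never reads a
 * pointer out of freed memory. A loop that advanced with
 * "pos = pos->next" after free(pos) would be a use-after-free.
 */
int free_all(struct node *head)
{
    struct node *pos, *tmp;
    int freed = 0;

    for (pos = head; pos; pos = tmp) {
        tmp = pos->next;     /* remember the next entry first */
        free(pos);
        freed++;
    }
    return freed;
}
```

list_for_each_entry_safe() follows the same idea: it takes an additional cursor argument that always holds the next entry, so the loop body may delete the current one.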
    • ghes_edac: Fix RAS tracing · 8ae8f50a
      Mauro Carvalho Chehab committed
      With the current version of CPER, there's no way to associate an
      error with the memory error. So, the error location in EDAC
      layers is unused.
      
      As CPER has its own idea about memory architectural layers, just
      output whatever is there inside the driver's detail at the RAS
      tracepoint.
      
      The EDAC location is left untouched in case, at some point in the
      future, we can actually map the error onto the DIMM labels.
      
      Now, the error message:
      
      [   72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
      [   72.396627] {1}[Hardware Error]: APEI generic hardware error status
      [   72.396628] {1}[Hardware Error]: severity: 2, corrected
      [   72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected
      [   72.396632] {1}[Hardware Error]: flags: 0x01
      [   72.396634] {1}[Hardware Error]: primary
      [   72.396635] {1}[Hardware Error]: section_type: memory error
      [   72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400
      [   72.396638] {1}[Hardware Error]: node: 3
      [   72.396639] {1}[Hardware Error]: card: 0
      [   72.396640] {1}[Hardware Error]: module: 0
      [   72.396641] {1}[Hardware Error]: device: 0
      [   72.396643] {1}[Hardware Error]: error_type: 18, unknown
      [   72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)
      
      is properly represented in the trace event:
      
           kworker/0:2-584   [000] ....    72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location:-1:-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory)
      
      Tested on a 4-socket E5-4650 Sandy Bridge machine.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      8ae8f50a
    • ghes_edac: Make it compliant with UEFI spec 2.3.1 · 689c9cd8
      Mauro Carvalho Chehab committed
      The UEFI spec defines the memory error types and the bits that
      validate each field of the memory error record in Appendix N,
      items N.2.5 (Memory Error Section) and N.2.11 (Error Status).
      Make the error description compliant with it, showing only the
      valid fields.
      
      The EDAC error log is now properly reporting the error:
      
      [  281.556854] mce: [Hardware Error]: Machine check events logged
      [  281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
      [  281.557044] {2}[Hardware Error]: APEI generic hardware error status
      [  281.557046] {2}[Hardware Error]: severity: 2, corrected
      [  281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected
      [  281.557050] {2}[Hardware Error]: flags: 0x01
      [  281.557052] {2}[Hardware Error]: primary
      [  281.557053] {2}[Hardware Error]: section_type: memory error
      [  281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400
      [  281.557056] {2}[Hardware Error]: node: 3
      [  281.557057] {2}[Hardware Error]: card: 0
      [  281.557058] {2}[Hardware Error]: module: 1
      [  281.557059] {2}[Hardware Error]: device: 0
      [  281.557061] {2}[Hardware Error]: error_type: 18, unknown
      [  281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9
      [  281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)
      
      Tested on a 4-CPU E5-4650 Sandy Bridge machine.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      689c9cd8
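As a worked example of "only showing the valid fields", the sketch below decodes the validation_bits value from the log above (0x000040b9) into field names. The bit assignments follow my reading of the Memory Error Section layout (bit 0 error status through bit 14 error type) and should be checked against the spec; the function and table names are hypothetical and this is not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct valid_field {
    uint64_t bit;
    const char *name;
};

/* Assumed bit layout of the memory error record's validation bits. */
static const struct valid_field mem_err_fields[] = {
    { 1ULL << 0,  "error_status" },
    { 1ULL << 1,  "physical_address" },
    { 1ULL << 2,  "physical_address_mask" },
    { 1ULL << 3,  "node" },
    { 1ULL << 4,  "card" },
    { 1ULL << 5,  "module" },
    { 1ULL << 6,  "bank" },
    { 1ULL << 7,  "device" },
    { 1ULL << 8,  "row" },
    { 1ULL << 9,  "column" },
    { 1ULL << 10, "bit_position" },
    { 1ULL << 11, "requestor_id" },
    { 1ULL << 12, "responder_id" },
    { 1ULL << 13, "target_id" },
    { 1ULL << 14, "error_type" },
};

/* Write the space-separated names of all valid fields into buf and
 * return how many were written. Stops early if buf is too small. */
int list_valid_fields(uint64_t validation_bits, char *buf, size_t len)
{
    size_t i, used = 0;
    int count = 0;

    if (!buf || len == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < sizeof(mem_err_fields) / sizeof(mem_err_fields[0]); i++) {
        int n;

        if (!(validation_bits & mem_err_fields[i].bit))
            continue;            /* field not valid: do not report it */
        n = snprintf(buf + used, len - used, "%s%s",
                     count ? " " : "", mem_err_fields[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';    /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Under the assumed layout, 0x40b9 has bits 0, 3, 4, 5, 7 and 14 set, which corresponds to the fields that appear in the log: error_status, node, card, module, device and error_type.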
    • ghes_edac: Improve driver's printk messages · d2a68566
      Mauro Carvalho Chehab committed
      Provide a better infrastructure for printk's inside the driver:
      	- use edac_dbg() for debug messages;
      	- standardize the usage of pr_info();
      	- provide warning about the risk of relying on this
      	  driver.
      
      While here, change the size of the fake memory to 1 page. This is
      as good or as bad as 1000 pages, but it is easier for userspace to
      detect, as I don't expect that any machine implementing GHES would
      have just 1 page available ;)
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      
      Conflicts:
      	drivers/edac/ghes_edac.c
      d2a68566
    • ghes_edac: Don't credit the same memory dimm twice · 5ee726db
      Mauro Carvalho Chehab committed
      In my tests on a 4x E5-4650 CPU system, the GHES EDAC driver is
      called twice. As the SMBIOS DMI enumeration call scans all the
      DIMM sockets in the system, on this machine, equipped with
      128 GB of RAM, the memory is displayed twice:
      
                +-----------------------+
                |    mc0    |    mc1    |
      ----------+-----------------------+
      memory45: |  8192 MB  |  8192 MB  |
      memory44: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory43: |     0 MB  |     0 MB  |
      memory42: |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      memory41: |     0 MB  |     0 MB  |
      memory40: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory39: |  8192 MB  |  8192 MB  |
      memory38: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory37: |     0 MB  |     0 MB  |
      memory36: |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      memory35: |     0 MB  |     0 MB  |
      memory34: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory33: |  8192 MB  |  8192 MB  |
      memory32: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory31: |     0 MB  |     0 MB  |
      memory30: |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      memory29: |     0 MB  |     0 MB  |
      memory28: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory27: |  8192 MB  |  8192 MB  |
      memory26: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory25: |     0 MB  |     0 MB  |
      memory24: |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      memory23: |     0 MB  |     0 MB  |
      memory22: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory21: |  8192 MB  |  8192 MB  |
      memory20: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory19: |     0 MB  |     0 MB  |
      memory18: |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      memory17: |     0 MB  |     0 MB  |
      memory16: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory15: |  8192 MB  |  8192 MB  |
      memory14: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory13: |     0 MB  |     0 MB  |
      memory12: |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      memory11: |     0 MB  |     0 MB  |
      memory10: |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory9:  |  8192 MB  |  8192 MB  |
      memory8:  |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory7:  |     0 MB  |     0 MB  |
      memory6:  |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      memory5:  |     0 MB  |     0 MB  |
      memory4:  |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory3:  |  8192 MB  |  8192 MB  |
      memory2:  |     0 MB  |     0 MB  |
      ----------+-----------------------+
      memory1:  |     0 MB  |     0 MB  |
      memory0:  |  8192 MB  |  8192 MB  |
      ----------+-----------------------+
      
      Total sum of 256 GB.
      
      As there's no reliable way to credit DIMMs to the right memory
      controller, just put everything on memory controller 0 (which should
      always exist).
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      5ee726db
    • ghes_edac: do a better job of filling EDAC DIMM info · 32fa1f53
      Mauro Carvalho Chehab committed
      Instead of just faking a random value for the DIMM data, get
      the information that is available via the DMI table.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      32fa1f53
    • ghes_edac: add support for reporting errors via EDAC · f04c62a7
      Mauro Carvalho Chehab committed
      Now that the EDAC core is capable of just forwarding the errors via
      the userspace API, add a report mechanism for the GHES errors.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      f04c62a7
    • ghes_edac: Register at EDAC core the BIOS report · 77c5f5d2
      Mauro Carvalho Chehab committed
      Register GHES with the EDAC MC core, in order to prevent other
      drivers from also handling errors and mangling the error data.
      
      The EDAC core guarantees that just one driver is used, so the
      first one to register (BIOS first) will be the one that reports
      the hardware errors.
      
      For now, the EDAC driver does nothing but register with the
      EDAC core, preventing the hardware-driven mechanism from
      interfering with GHES.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      77c5f5d2