1. 03 12月, 2014 2 次提交
  2. 02 12月, 2014 3 次提交
    • T
      sb_edac: Fix discovery of top-of-low-memory for Haswell · f7cf2a22
      Tony Luck 提交于
      Haswell moved the TOLM/TOHM registers to a different device and offset.
      The sb_edac driver accounted for the change of device, but not for the
      new offset.  There was also a typo in the constant to fill in the low
      26 bits (was 0x1ffffff, should be 0x3ffffff).
      
      This resulted in a bogus value for the top of low memory:
      
        EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff)
      
      which would result in EDAC refusing to translate addresses for
      errors above the bogus value and below 4GB:
      
         sbridge MC3: HANDLING MCE MEMORY ERROR
         sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
         sbridge MC3: TSC 0
         sbridge MC3: ADDR 2000000
         sbridge MC3: MISC 523eac86
         sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0
         MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0)
      
      With the fix we see the correct TOLM value:
      
         DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff)
      
      and we decode address 2000000 correctly:
      
         sbridge MC3: HANDLING MCE MEMORY ERROR
         sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
         sbridge MC3: TSC 0
         sbridge MC3: ADDR 2000000
         sbridge MC3: MISC 523e1086
         sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0
         DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0
         DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01
         DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1
         DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0
         DEBUG: sbridge_mce_output_error:  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0
         MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
      f7cf2a22
    • J
    • J
      sb_edac: Fix off-by-one error in number of channels · 50043e25
      Jim Snow 提交于
      This prevented edac sysfs code from properly handling 6 channels
      per memory controller.
      Signed-off-by: NJim Snow <jim.snow@intel.com>
      Signed-off-by: NLukasz Anaczkowski <lukasz.anaczkowski@intel.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
      50043e25
  3. 23 10月, 2014 4 次提交
  4. 09 10月, 2014 3 次提交
  5. 30 9月, 2014 1 次提交
    • B
      mpc85xx_edac: Make L2 interrupt shared too · a18c3f16
      Borislav Petkov 提交于
      The other two interrupt handlers in this driver are shared, except this
      one. When loading the driver, it fails like this.
      
      So make the IRQ line shared.
      
      Freescale(R) MPC85xx EDAC driver, (C) 2006 Montavista Software
      mpc85xx_mc_err_probe: No ECC DIMMs discovered
      EDAC DEVICE0: Giving out device to module MPC85xx_edac controller mpc85xx_l2_err: DEV mpc85xx_l2_err (INTERRUPT)
      genirq: Flags mismatch irq 16. 00000000 ([EDAC] L2 err) vs. 00000080 ([EDAC] PCI err)
      mpc85xx_l2_err_probe: Unable to request irq 16 for MPC85xx L2 err
      remove_proc_entry: removing non-empty directory 'irq/16', leaking at least 'aerdrv'
      ------------[ cut here ]------------
      WARNING: at fs/proc/generic.c:521
      Modules linked in:
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-rc5-dirty #1
      task: ee058000 ti: ee046000 task.ti: ee046000
      NIP: c016c0c4 LR: c016c0c4 CTR: c037b51c
      REGS: ee047c10 TRAP: 0700 Not tainted (3.17.0-rc5-dirty)
      MSR: 00029000 <CE,EE,ME> CR: 22008022 XER: 20000000
      
      GPR00: c016c0c4 ee047cc0 ee058000 00000053 00029000 00000000 c037c744 00000003
      GPR08: c09aab28 c09aab24 c09aab28 00000156 20008028 00000000 c0002ac8 00000000
      GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000139 c0950394
      GPR24: c09f0000 ee5585b0 ee047d08 c0a10000 ee047d08 ee15f808 00000002 ee03f660
      NIP [c016c0c4] remove_proc_entry
      LR [c016c0c4] remove_proc_entry
      Call Trace:
      remove_proc_entry (unreliable)
      unregister_irq_proc
      free_desc
      irq_free_descs
      mpc85xx_l2_err_probe
      platform_drv_probe
      really_probe
      __driver_attach
      bus_for_each_dev
      bus_add_driver
      driver_register
      mpc85xx_mc_init
      do_one_initcall
      kernel_init_freeable
      kernel_init
      ret_from_kernel_thread
      Instruction dump: ...
      
      Reported-and-tested-by: <lpb_098@163.com>
      Acked-by: NJohannes Thumshirn <johannes.thumshirn@men.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      a18c3f16
  6. 23 9月, 2014 1 次提交
    • A
      amd64_edac: Modify usage of amd64_read_dct_pci_cfg() · 7981a28f
      Aravind Gopalakrishnan 提交于
      Rationale behind this change:
       - F2x1xx addresses were stopped from being mapped explicitly to DCT1
         from F15h (OR) onwards. They use _dct[0:1] mechanism to access the
         registers. So we should move away from using address ranges to select
         DCT for these families.
       - On newer processors, the address ranges used to indicate DCT1 (0x140,
         0x1a0) have different meanings than what is assumed currently.
      
      Changes introduced:
       - amd64_read_dct_pci_cfg() now takes in dct value and uses it for
         'selecting the dct'
       - Update usage of the function. Keep in mind that different families
         have specific handling requirements
       - Remove [k8|f10]_read_dct_pci_cfg() as they don't do much different
         from amd64_read_pci_cfg()
         - Move the k8 specific check to amd64_read_pci_cfg
       - Remove f15_read_dct_pci_cfg() and move logic to amd64_read_dct_pci_cfg()
       - Remove now needless .read_dct_pci_cfg
      
      Testing:
       - Tested on Fam 10h; Fam15h Models: 00h, 30h; Fam16h using 'EDAC_DEBUG'
         and mce_amd_inj
       - driver obtains info from F2x registers and caches it in pvt
         structures correctly
       - ECC decoding works fine
      Signed-off-by: NAravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
      Link: http://lkml.kernel.org/r/1410799058-3149-1-git-send-email-aravind.gopalakrishnan@amd.comSigned-off-by: NBorislav Petkov <bp@suse.de>
      7981a28f
  7. 15 9月, 2014 1 次提交
  8. 05 9月, 2014 1 次提交
  9. 02 9月, 2014 1 次提交
  10. 14 7月, 2014 1 次提交
  11. 10 7月, 2014 1 次提交
  12. 04 7月, 2014 2 次提交
  13. 27 6月, 2014 12 次提交
  14. 24 6月, 2014 2 次提交
  15. 31 5月, 2014 1 次提交
  16. 30 5月, 2014 1 次提交
  17. 09 5月, 2014 2 次提交
  18. 01 4月, 2014 1 次提交