1. 03 6月, 2015 2 次提交
  2. 21 3月, 2015 1 次提交
  3. 12 3月, 2015 1 次提交
  4. 23 2月, 2015 11 次提交
  5. 17 2月, 2015 1 次提交
    • D
      EDAC, amd64_edac: Prevent OOPS with >16 memory controllers · 0c510cc8
      Daniel J Blueman 提交于
      When DRAM errors occur on memory controllers after EDAC_MAX_MCS (16),
      the kernel fatally dereferences unallocated structures, see splat below;
      this occurs on at least NumaConnect systems.
      
      Fix by checking if a memory controller info structure was found.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000320
      IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
      PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0
      Oops: 0000 [#2] SMP
      Modules linked in:
      CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G   D    3.19.0 #1
      Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b    01/28/2015
      task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000
      RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
      RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297
      RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6
      RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c
      RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022
      R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008
      R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000
      FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0
      Stack:
       0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13
       000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a
       ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010
      Call Trace:
       <IRQ>
       ? vprintk_default
       ? printk
       amd_decode_mce
       notifier_call_chain
       atomic_notifier_call_chain
       mce_log
       machine_check_poll
       mce_timer_fn
       ? mce_cpu_restart
       call_timer_fn.isra.29
       run_timer_softirq
       __do_softirq
       irq_exit
       smp_apic_timer_interrupt
       apic_timer_interrupt
       <EOI>
       ? down_read_trylock
       __do_page_fault
       ? __schedule
       do_page_fault
       page_fault
      Signed-off-by: NDaniel J Blueman <daniel@numascale.com>
      Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com
      Cc: stable@vger.kernel.org
      [ Boris: massage commit message ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      0c510cc8
  6. 09 2月, 2015 1 次提交
    • B
      sb_edac: Fix detection on SNB machines · 11249e73
      Borislav Petkov 提交于
      d0585cd8 ("sb_edac: Claim a different PCI device") changed the
      probing of sb_edac to look for PCI device 0x3ca0:
      
      3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07)
      00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00
      ...
      
      but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA
      in sbridge_probe() therefore the probing fails.
      
      Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0),
      .i.e., the 14.0 device, fixes the issue and driver loads successfully
      again:
      
      [ 2449.013120] EDAC DEBUG: sbridge_init:
      [ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
      [ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0
      [ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
      [ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
      [ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8
      [ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
      ...
      
      Add a debug printk while at it to be able to catch the failure in the
      future and dump driver version on successful load.
      
      Fixes: d0585cd8 ("sb_edac: Claim a different PCI device")
      Cc: stable@vger.kernel.org # 3.18
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Acked-by: NAndy Lutomirski <luto@amacapital.net>
      Acked-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      11249e73
  7. 31 1月, 2015 1 次提交
  8. 30 1月, 2015 2 次提交
  9. 12 1月, 2015 1 次提交
  10. 07 1月, 2015 1 次提交
  11. 02 1月, 2015 1 次提交
  12. 21 12月, 2014 1 次提交
  13. 03 12月, 2014 2 次提交
  14. 02 12月, 2014 3 次提交
    • T
      sb_edac: Fix discovery of top-of-low-memory for Haswell · f7cf2a22
      Tony Luck 提交于
      Haswell moved the TOLM/TOHM registers to a different device and offset.
      The sb_edac driver accounted for the change of device, but not for the
      new offset.  There was also a typo in the constant to fill in the low
      26 bits (was 0x1ffffff, should be 0x3ffffff).
      
      This resulted in a bogus value for the top of low memory:
      
        EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff)
      
      which would result in EDAC refusing to translate addresses for
      errors above the bogus value and below 4GB:
      
         sbridge MC3: HANDLING MCE MEMORY ERROR
         sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
         sbridge MC3: TSC 0
         sbridge MC3: ADDR 2000000
         sbridge MC3: MISC 523eac86
         sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0
         MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0)
      
      With the fix we see the correct TOLM value:
      
         DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff)
      
      and we decode address 2000000 correctly:
      
         sbridge MC3: HANDLING MCE MEMORY ERROR
         sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
         sbridge MC3: TSC 0
         sbridge MC3: ADDR 2000000
         sbridge MC3: MISC 523e1086
         sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0
         DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0
         DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01
         DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1
         DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0
         DEBUG: sbridge_mce_output_error:  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0
         MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
      f7cf2a22
    • J
    • J
      sb_edac: Fix off-by-one error in number of channels · 50043e25
      Jim Snow 提交于
      This prevented edac sysfs code from properly handling 6 channels
      per memory controller.
      Signed-off-by: NJim Snow <jim.snow@intel.com>
      Signed-off-by: NLukasz Anaczkowski <lukasz.anaczkowski@intel.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
      50043e25
  15. 25 11月, 2014 5 次提交
  16. 20 11月, 2014 1 次提交
  17. 19 11月, 2014 1 次提交
  18. 12 11月, 2014 2 次提交
  19. 05 11月, 2014 2 次提交