1. 17 2月, 2015 1 次提交
    • D
      EDAC, amd64_edac: Prevent OOPS with >16 memory controllers · 0c510cc8
      Daniel J Blueman 提交于
      When DRAM errors occur on memory controllers after EDAC_MAX_MCS (16),
      the kernel fatally dereferences unallocated structures, see splat below;
      this occurs on at least NumaConnect systems.
      
      Fix by checking if a memory controller info structure was found.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000320
      IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
      PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0
      Oops: 0000 [#2] SMP
      Modules linked in:
      CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G   D    3.19.0 #1
      Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b    01/28/2015
      task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000
      RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
      RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297
      RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6
      RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c
      RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022
      R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008
      R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000
      FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0
      Stack:
       0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13
       000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a
       ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010
      Call Trace:
       <IRQ>
       ? vprintk_default
       ? printk
       amd_decode_mce
       notifier_call_chain
       atomic_notifier_call_chain
       mce_log
       machine_check_poll
       mce_timer_fn
       ? mce_cpu_restart
       call_timer_fn.isra.29
       run_timer_softirq
       __do_softirq
       irq_exit
       smp_apic_timer_interrupt
       apic_timer_interrupt
       <EOI>
       ? down_read_trylock
       __do_page_fault
       ? __schedule
       do_page_fault
       page_fault
      Signed-off-by: NDaniel J Blueman <daniel@numascale.com>
      Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com
      Cc: stable@vger.kernel.org
      [ Boris: massage commit message ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      0c510cc8
  2. 05 11月, 2014 1 次提交
  3. 30 10月, 2014 1 次提交
    • A
      amd64_edac: Add F15h M60h support · a597d2a5
      Aravind Gopalakrishnan 提交于
      This patch adds support for ECC error decoding for F15h M60h processor.
      Aside from the usual changes, the patch adds support for some new features
      in the processor:
       - DDR4(unbuffered, registered); LRDIMM DDR3 support
         - relevant debug messages have been modified/added to report these
           memory types
       - new dbam_to_cs mappers
         - if (F15h M60h && LRDIMM); we need a 'multiplier' value to find
           cs_size. This multiplier value is obtained from the per-dimm
           DCSM register. So, change the interface to accept a 'cs_mask_nr'
           value to facilitate this calculation
       - switch-casing determine_memory_type()
         - done to cleanse the function of too many if-else statements
           and improve readability
         - This is now called early in read_mc_regs() to cache dram_type
      
      Misc cleanup:
       - amd64_pci_table[] is condensed by using PCI_VDEVICE macro.
      
      Testing details:
      Tested the patch by injecting 'ECC' type errors using mce_amd_inj
      and error decoding works fine.
      Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Link: http://lkml.kernel.org/r/1414617483-4941-1-git-send-email-Aravind.Gopalakrishnan@amd.com
      [ Boris: determine_memory_type() cleanups ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      a597d2a5
  4. 23 9月, 2014 1 次提交
    • A
      amd64_edac: Modify usage of amd64_read_dct_pci_cfg() · 7981a28f
      Aravind Gopalakrishnan 提交于
      Rationale behind this change:
       - F2x1xx addresses were stopped from being mapped explicitly to DCT1
         from F15h (OR) onwards. They use _dct[0:1] mechanism to access the
         registers. So we should move away from using address ranges to select
         DCT for these families.
       - On newer processors, the address ranges used to indicate DCT1 (0x140,
         0x1a0) have different meanings than what is assumed currently.
      
      Changes introduced:
       - amd64_read_dct_pci_cfg() now takes in dct value and uses it for
         'selecting the dct'
       - Update usage of the function. Keep in mind that different families
         have specific handling requirements
       - Remove [k8|f10]_read_dct_pci_cfg() as they don't do much different
         from amd64_read_pci_cfg()
         - Move the k8 specific check to amd64_read_pci_cfg
       - Remove f15_read_dct_pci_cfg() and move logic to amd64_read_dct_pci_cfg()
       - Remove now needless .read_dct_pci_cfg
      
      Testing:
       - Tested on Fam 10h; Fam15h Models: 00h, 30h; Fam16h using 'EDAC_DEBUG'
         and mce_amd_inj
       - driver obtains info from F2x registers and caches it in pvt
         structures correctly
       - ECC decoding works fine
      Signed-off-by: NAravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
      Link: http://lkml.kernel.org/r/1410799058-3149-1-git-send-email-aravind.gopalakrishnan@amd.comSigned-off-by: NBorislav Petkov <bp@suse.de>
      7981a28f
  5. 28 2月, 2014 1 次提交
  6. 07 2月, 2014 1 次提交
  7. 16 12月, 2013 3 次提交
  8. 06 12月, 2013 2 次提交
  9. 22 10月, 2013 1 次提交
  10. 27 8月, 2013 2 次提交
  11. 12 8月, 2013 2 次提交
    • B
      amd64_edac: Get rid of boot_cpu_data accesses · a4b4bedc
      Borislav Petkov 提交于
      Now that we cache (family, model, stepping) locally, use them instead of
      boot_cpu_data.
      
      No functionality change.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      a4b4bedc
    • A
      amd64_edac: Add ECC decoding support for newer F15h models · 18b94f66
      Aravind Gopalakrishnan 提交于
      On newer models, support has been included for upto 4 DCT's, however,
      only DCT0 and DCT3 are currently configured (cf BKDG Section 2.10).
      Also, the routing DRAM Requests algorithm is different for F15h M30h.
      Thus it is cleaner to use a brand new function rather than adding quirks
      to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM
      Routing Requests" in the BKDG for further info.
      
      Tested on Fam15h M30h with ECC turned on using mce_amd_inj facility and
      verified to be functionally correct.
      
      While at it, verify if erratum workarounds for E505 and E637 still hold.
      From email conversations within AMD, the current status of the errata
      is:
      
            * Erratum 505: fixed in model 0x1, stepping 0x1 and later.
            * Erratum 637: not fixed.
      Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      [ Cleanups, corrections ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      18b94f66
  12. 29 7月, 2013 1 次提交
    • B
      amd64_edac: Fix single-channel setups · f0a56c48
      Borislav Petkov 提交于
      It can happen that configurations are running in a single-channel mode
      even with a dual-channel memory controller, by, say, putting the DIMMs
      only on the one channel and leaving the other empty. This causes a
      problem in init_csrows which implicitly assumes that when the second
      channel is enabled, i.e. channel 1, the struct dimm hierarchy will be
      present. Which is not.
      
      So always allocate two channels unconditionally.
      
      This provides for the nice side effect that the data structures are
      initialized so some day, when memory hotplug is supported, it should
      just work out of the box when all of a sudden a second channel appears.
      Reported-and-tested-by: NRoger Leigh <rleigh@debian.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      f0a56c48
  13. 19 4月, 2013 1 次提交
  14. 16 3月, 2013 2 次提交
  15. 23 1月, 2013 1 次提交
  16. 10 1月, 2013 4 次提交
  17. 04 1月, 2013 1 次提交
    • G
      Drivers: edac: remove __dev* attributes. · 9b3c6e85
      Greg Kroah-Hartman 提交于
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, and __devexit
      from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: Doug Thompson <dougthompson@xmission.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: Jason Uhlenkott <juhlenko@akamai.com>
      Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
      Cc: Tim Small <tim@buttersideup.com>
      Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
      Cc: "Arvind R." <arvino55@gmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Egor Martovetsky <egor@pasemi.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b3c6e85
  18. 28 11月, 2012 9 次提交
  19. 24 10月, 2012 1 次提交
  20. 12 6月, 2012 4 次提交