1. 03 6月, 2016 2 次提交
  2. 03 5月, 2016 1 次提交
    • T
      EDAC, sb_edac: Use cpu family/model in driver detection · 2c1ea4c7
      Tony Luck 提交于
      Instead of picking a random PCI ID from the dozen or so we need to
      access, just use x86_match_cpu() to pick based on CPU model number. The
      choosing of PCI devices has been problematic in the past, see
      
        11249e73 ("sb_edac: Fix detection on SNB machines")
      
      which fixed problems introduced by
      
        d0585cd8 ("sb_edac: Claim a different PCI device").
      
      This is especially ugly if future hardware might not even have
      EDAC-relevant registers in PCI config space and we would still be
      required to choose some "random" PCI devices to scan for just so our
      driver loads.
      
      Is this cleaner/clearer? It deletes much more code than it adds. Only
      tested on Broadwell. The driver loads/unloads and loads again. Still
      decodes errors too.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Suggested-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      2c1ea4c7
  3. 29 4月, 2016 1 次提交
  4. 23 4月, 2016 1 次提交
  5. 22 4月, 2016 2 次提交
  6. 11 3月, 2016 1 次提交
    • L
      EDAC/sb_edac: Fix computation of channel address · eb1af3b7
      Luck, Tony 提交于
      Large memory Haswell-EX systems with multiple DIMMs per channel were
      sometimes reporting the wrong DIMM.
      
      Found three problems:
      
       1) Debug printouts for socket and channel interleave were not interpreting
          the register fields correctly. The socket interleave field is a 2^X
          value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2,
          2=3. 3=4).
      
       2) Actual use of the socket interleave value didn't interpret as 2^X
      
       3) Conversion of address to channel address was complicated, and wrong.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Acked-by: NAristeu Rozanski <arozansk@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-edac@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      eb1af3b7
  7. 08 3月, 2016 1 次提交
  8. 11 12月, 2015 1 次提交
  9. 06 12月, 2015 3 次提交
  10. 25 9月, 2015 1 次提交
    • S
      EDAC, sb_edac: Fix TAD presence check for sbridge_mci_bind_devs() · 2900ea60
      Seth Jennings 提交于
      In commit
      
        7d375bff ("sb_edac: Fix support for systems with two home agents per socket")
      
      NUM_CHANNELS was changed to 8 and the channel space was renumerated to
      handle EN, EP, and EX configurations.
      
      The *_mci_bind_devs() functions - except for sbridge_mci_bind_devs() -
      got a new device presence check in the form of saw_chan_mask. However,
      sbridge_mci_bind_devs() still uses the NUM_CHANNELS for loop.
      
      With the increase in NUM_CHANNELS, this loop fails at index 4 since
      SB only has 4 TADs.  This results in the following error on SB machines:
      
        EDAC sbridge: Some needed devices are missing
        EDAC sbridge: Couldn't find mci handler
        EDAC sbridge: Couldn't find mci handle
      
      This patch adapts the saw_chan_mask logic for sbridge_mci_bind_devs() as
      well.
      
      After this patch:
      
        EDAC MC0: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#0: DEV 0000:3f:0e.0 (POLLED)
        EDAC MC1: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#1: DEV 0000:7f:0e.0 (POLLED)
      Signed-off-by: NSeth Jennings <sjenning@redhat.com>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Acked-by: NTony Luck <tony.luck@intel.com>
      Tested-by: NBorislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # v4.2
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1438798561-10180-1-git-send-email-sjenning@redhat.comSigned-off-by: NBorislav Petkov <bp@suse.de>
      2900ea60
  11. 09 9月, 2015 2 次提交
  12. 13 8月, 2015 2 次提交
  13. 03 6月, 2015 3 次提交
    • T
      sb_edac: support for Broadwell -EP and -EX · fa2ce64f
      Tony Luck 提交于
      Basic support for the single socket Broadwell-DE processor
      was added back in commit 1f39581a
         sb_edac: Add support for Broadwell-DE processor
      This patch extends Broadwell support to cover the two
      socket "-EP" and four socket "-EX" versions of Broadwell.
      Only tested on the 2 socket - but this code is largely
      cloned from the Haswell path.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
      fa2ce64f
    • T
      sb_edac: Fix support for systems with two home agents per socket · 7d375bff
      Tony Luck 提交于
      First noticed a problem on a 4 socket machine where EDAC only reported
      half the DIMMS.  Tracked this down to the code that assumes that systems
      with two home agents only have two memory channels on each agent. This
      is true on 2 sockect ("-EP") machines. But four socket ("-EX") machines
      have four memory channels on each home agent.
      
      The old code would have had problems on two socket systems as it did
      a shuffling trick to make the internals of the code think that the
      channels from the first agent were '0' and '1', with the second agent
      providing '2' and '3'. But the code didn't uniformly convert from
      {ha,channel} tuples to this internal representation.
      
      New code always considers up to eight channels.
      On a machine with a single home agent these map easily to edac channels
      0, 1, 2, 3. On machines with two home agents we map using:
        edac_channel = 4*ha# + channel
      So on a -EP machine where each home agent supports only two channels
      we'll fill in channels 0, 1, 4, 5, and on a -EX machine we use all of 0,
      1, 2, 3, 4, 5, 6, 7.
      
      [mchehab@osg.samsung.com: fold a fixup patch as per Tony's request and fixed
       a few CodingStyle issues]
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
      7d375bff
    • T
      sb_edac: Fix a typo and a thinko in address handling for Haswell · bb89e714
      Tony Luck 提交于
      typo: "a7mode" chooses whether to use bits {8, 7, 9} or {8, 7, 6}
      in the algorithm to spread access between memory resources. But
      the non-a7mode path was incorrectly using GET_BITFIELD(addr, 7, 9)
      and so picking bits {9, 8, 7}
      
      thinko: BIT(1) of the dram_rule registers chooses whether to just
      use the {8, 7, 6} (or {8, 7, 9}) bits mentioned above as they are,
      or to XOR them with bits {18, 17, 16} but the code inverted the
      test. We need the additional XOR when dram_rule{1} == 0.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
      bb89e714
  14. 09 2月, 2015 1 次提交
    • B
      sb_edac: Fix detection on SNB machines · 11249e73
      Borislav Petkov 提交于
      d0585cd8 ("sb_edac: Claim a different PCI device") changed the
      probing of sb_edac to look for PCI device 0x3ca0:
      
      3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07)
      00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00
      ...
      
      but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA
      in sbridge_probe() therefore the probing fails.
      
      Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0),
      .i.e., the 14.0 device, fixes the issue and driver loads successfully
      again:
      
      [ 2449.013120] EDAC DEBUG: sbridge_init:
      [ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
      [ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0
      [ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
      [ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
      [ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8
      [ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
      ...
      
      Add a debug printk while at it to be able to catch the failure in the
      future and dump driver version on successful load.
      
      Fixes: d0585cd8 ("sb_edac: Claim a different PCI device")
      Cc: stable@vger.kernel.org # 3.18
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Acked-by: NAndy Lutomirski <luto@amacapital.net>
      Acked-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      11249e73
  15. 03 12月, 2014 2 次提交
  16. 02 12月, 2014 2 次提交
    • T
      sb_edac: Fix discovery of top-of-low-memory for Haswell · f7cf2a22
      Tony Luck 提交于
      Haswell moved the TOLM/TOHM registers to a different device and offset.
      The sb_edac driver accounted for the change of device, but not for the
      new offset.  There was also a typo in the constant to fill in the low
      26 bits (was 0x1ffffff, should be 0x3ffffff).
      
      This resulted in a bogus value for the top of low memory:
      
        EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff)
      
      which would result in EDAC refusing to translate addresses for
      errors above the bogus value and below 4GB:
      
         sbridge MC3: HANDLING MCE MEMORY ERROR
         sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
         sbridge MC3: TSC 0
         sbridge MC3: ADDR 2000000
         sbridge MC3: MISC 523eac86
         sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0
         MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0)
      
      With the fix we see the correct TOLM value:
      
         DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff)
      
      and we decode address 2000000 correctly:
      
         sbridge MC3: HANDLING MCE MEMORY ERROR
         sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
         sbridge MC3: TSC 0
         sbridge MC3: ADDR 2000000
         sbridge MC3: MISC 523e1086
         sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0
         DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0
         DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01
         DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1
         DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0
         DEBUG: sbridge_mce_output_error:  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0
         MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
      f7cf2a22
    • J
  17. 09 10月, 2014 3 次提交
  18. 27 6月, 2014 9 次提交
  19. 13 3月, 2014 2 次提交