  1. 24 Dec, 2021 1 commit
  2. 10 Dec, 2021 1 commit
  3. 15 Nov, 2021 4 commits
  4. 07 Oct, 2021 1 commit
    • EDAC/amd64: Handle three rank interleaving mode · 9f4873fb
      Yazen Ghannam authored
      AMD Rome systems and later support interleaving between three identical
      ranks within a channel.
      
      Check for this mode by counting the number of enabled chip selects and
      comparing their masks. If there are exactly three enabled chip selects
      and their masks are identical, then three rank interleaving is enabled.
      
      The size of a rank is determined from its mask value. However, three
      rank interleaving doesn't follow the method of swapping an interleave
      bit with the most significant bit. Rather, the interleave bit is flipped
      and the most significant bit remains the same. There is only a single
      interleave bit in this case.
      
      Account for this when determining the chip select size by keeping the
      most significant bit at its original value and ignoring any zero bits.
      This will return a full bitmask in [MSB:1].
      
      Fixes: e53a3b26 ("EDAC/amd64: Find Chip Select memory size using Address Mask")
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20211005154419.2060504-1-yazen.ghannam@amd.com
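The detection rule and the adjusted mask handling described in this commit can be sketched as follows. This is a minimal Python illustration of the bit arithmetic only, not the driver's C code. The function names and the sample mask `0x3FFFDFE` are hypothetical, and the sketch assumes that bit 0 of an address mask is always clear, so a full mask occupies bits [MSB:1].

```python
def genmask(high, low):
    """Return an integer with bits high..low (inclusive) set."""
    return ((1 << (high + 1)) - 1) & ~((1 << low) - 1)


def is_three_rank_interleaved(enabled_masks):
    """True if exactly three chip selects are enabled and their masks match."""
    return len(enabled_masks) == 3 and len(set(enabled_masks)) == 1


def deinterleave_mask(addr_mask, three_rank):
    """Return the full mask in [MSB:1] that corresponds to addr_mask.

    Assumption: bit 0 is always clear, so a full mask with most significant
    bit `msb` has exactly `msb` set bits.
    """
    if addr_mask == 0:
        return 0
    msb = addr_mask.bit_length() - 1
    weight = bin(addr_mask).count("1")
    # Zero bits inside the mask = bits in a full mask minus bits actually set.
    num_zero_bits = msb - weight
    if three_rank:
        # The single flipped interleave bit is ignored and the MSB stays put.
        num_zero_bits -= 1
    # Take the remaining zero bits off the top of the mask.
    return genmask(msb - num_zero_bits, 1)
```

With the hypothetical mask `0x3FFFDFE` (one cleared bit below the MSB) on three enabled chip selects, `deinterleave_mask(0x3FFFDFE, True)` keeps the most significant bit at bit 25 and yields `0x3FFFFFE`.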
  5. 14 Jul, 2021 1 commit
  6. 10 May, 2021 1 commit
  7. 22 Jan, 2021 1 commit
    • EDAC/amd64: Issue probing messages only on properly detected hardware · 4cbcb73b
      Borislav Petkov authored
      amd64_edac was converted to CPU family autoprobing (from PCI device
      IDs) to not have to add a new PCI device ID each time a new platform is
      shipped but to support the whole family out-of-the-box.
      
      However, this caused a lot of noise in dmesg even when the machine
      doesn't have ECC DIMMs or ECC has been disabled in the BIOS:
      
        EDAC MC: Ver: 3.0.0
        EDAC amd64: F17h detected (node 0).
        EDAC amd64: Node 0: DRAM ECC disabled.
        EDAC amd64: F17h detected (node 1).
        EDAC amd64: Node 1: DRAM ECC disabled.
        EDAC amd64: F17h detected (node 2).
        EDAC amd64: Node 2: DRAM ECC disabled.
        EDAC amd64: F17h detected (node 3).
        EDAC amd64: Node 3: DRAM ECC disabled.
        EDAC amd64: F17h detected (node 4).
        EDAC amd64: Node 4: DRAM ECC disabled.
        EDAC amd64: F17h detected (node 5).
        EDAC amd64: Node 5: DRAM ECC disabled.
        EDAC amd64: F17h detected (node 6).
        EDAC amd64: Node 6: DRAM ECC disabled.
        EDAC amd64: F17h detected (node 7).
        EDAC amd64: Node 7: DRAM ECC disabled.
      
      or even
      
      $ grep EDAC dmesg.log | sed 's/\[.*\] //' | sort | uniq -c
          128 EDAC amd64: F17h detected (node 0).
          128 EDAC amd64: Node 0: DRAM ECC disabled.
            1 EDAC MC: Ver: 3.0.0
      
      on a big machine. Yap, that's once per CPU for 128 of them.
      
      So move the init messages after all probing has succeeded to avoid
      unnecessary spew in dmesg.
      
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20210119164141.17417-1-bp@alien8.de
  8. 29 Dec, 2020 4 commits
  9. 28 Dec, 2020 1 commit
  10. 27 Nov, 2020 1 commit
    • EDAC/amd64: Fix PCI component registration · 706657b1
      Borislav Petkov authored
      In order to set up its PCI component, the driver needs any node private
      instance in order to get a reference to the PCI device and hand that
      into edac_pci_create_generic_ctl(). For convenience, it uses the 0th
      memory controller descriptor under the assumption that if any, the 0th
      will be always present.
      
      However, this assumption goes wrong when the 0th node doesn't have
      memory and the driver doesn't initialize an instance for it:
      
        EDAC amd64: F17h detected (node 0).
        ...
        EDAC amd64: Node 0: No DIMMs detected.
      
      But looking up node instances is not really needed - all one needs is
      the pointer to the proper device which gets discovered during instance
      init.
      
      So stash that pointer into a variable and use it when setting up the
      EDAC PCI component.
      
      Clear that variable when the driver needs to unwind due to some
      instances failing init to avoid any registration imbalance.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20201122150815.13808-1-bp@alien8.de
  11. 19 Nov, 2020 1 commit
  12. 26 Oct, 2020 1 commit
  13. 10 Oct, 2020 1 commit
  14. 24 Aug, 2020 1 commit
  15. 19 Jun, 2020 1 commit
  16. 29 May, 2020 1 commit
  17. 23 May, 2020 1 commit
  18. 14 Apr, 2020 1 commit
  19. 25 Mar, 2020 1 commit
  20. 17 Jan, 2020 3 commits
  21. 09 Nov, 2019 1 commit
    • EDAC/amd64: Get rid of the ECC disabled long message · 7fdfee92
      Borislav Petkov authored
      This message keeps flooding dmesg on boxes where ECC is disabled or the
      DIMMs do not support ECC but the module gets auto-probed. What's even
      worse is that autoprobing happens on every CPU due to the CPU-family
      matching the driver does and uevent being generated for each CPU device.
      
      What is more, this message is becoming even more useless on newer
      systems where forcing ECC is not recommended and it should be done in
      the BIOS so the BIOS can do all the necessary work, i.e., just setting a
      bit in an MSR is not enough anymore.
      
      So get rid of it.
      
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Yazen Ghannam <yazen.ghannam@amd.com>
      Cc: linux-edac@vger.kernel.org
      Link: https://lkml.kernel.org/r/20191106160607.GC28380@zn.tnic
  22. 06 Nov, 2019 5 commits
  23. 25 Oct, 2019 1 commit
  24. 07 Sep, 2019 1 commit
  25. 23 Aug, 2019 4 commits
    • EDAC/amd64: Support asymmetric dual-rank DIMMs · 81f5090d
      Yazen Ghannam authored
      Future AMD systems will support asymmetric dual-rank DIMMs. These are
      DIMMs where the ranks are of different sizes.
      
      The even rank will use the Primary Even Chip Select registers and the
      odd rank will use the Secondary Odd Chip Select registers.
      
      Recognize if a Secondary Odd Chip Select is being used. Use the
      Secondary Odd Address Mask when calculating the chip select size.
      
       [ bp: move csrow_sec_enabled() to the header, fix CS_ODD define and
         tone-down the capitalized words spelling. ]
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: https://lkml.kernel.org/r/20190821235938.118710-8-Yazen.Ghannam@amd.com
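The mask selection this commit describes could look roughly like the sketch below. It is a Python illustration with hypothetical names; in particular, passing the primary and secondary masks as plain arguments is an assumption made to keep the example self-contained, and says nothing about how the driver stores them.

```python
def pick_rank_mask(csrow_nr, odd_secondary_in_use, primary_mask, secondary_mask):
    """Choose the address mask used to size the given chip select.

    csrow_nr:              chip select number (even = even rank, odd = odd rank)
    odd_secondary_in_use:  True if a Secondary Odd Chip Select is being used
    primary_mask:          primary address mask (illustrative argument)
    secondary_mask:        secondary odd address mask (illustrative argument)
    """
    is_odd_rank = bool(csrow_nr & 1)
    if is_odd_rank and odd_secondary_in_use:
        # Asymmetric dual-rank DIMM: the odd rank has its own size,
        # described by the Secondary Odd Address Mask.
        return secondary_mask
    # Even ranks, and odd ranks on symmetric DIMMs, use the primary mask.
    return primary_mask
```

Only the odd rank ever switches to the secondary mask; the even rank keeps using the primary registers, matching the description above.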
    • EDAC/amd64: Cache secondary Chip Select registers · 7574729e
      Yazen Ghannam authored
      AMD Family 17h systems have a set of secondary Chip Select Base
      Addresses and Address Masks. These do not represent unique Chip
      Selects, rather they are used in conjunction with the primary
      Chip Select registers in certain cases.
      
      Cache these secondary Chip Select registers for future use.
      
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: https://lkml.kernel.org/r/20190821235938.118710-7-Yazen.Ghannam@amd.com
    • EDAC/amd64: Decode syndrome before translating address · 8a2eaab7
      Yazen Ghannam authored
      AMD Family 17h systems currently require address translation in order to
      report the system address of a DRAM ECC error. This is currently done
      before decoding the syndrome information. The syndrome information does
      not depend on the address translation, so the proper EDAC csrow/channel
      reporting can function without the address. However, the syndrome
      information will not be decoded if the address translation fails.
      
      Decode the syndrome information before doing the address translation.
      The syndrome information is architecturally defined in MCA_SYND and can
      be considered robust. The address translation is system-specific and may
      fail on newer systems without proper updates to the translation
      algorithm.
      
      Fixes: 713ad546 ("EDAC, amd64: Define and register UMC error decode function")
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: https://lkml.kernel.org/r/20190821235938.118710-6-Yazen.Ghannam@amd.com
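The reordering argued for in this commit can be illustrated with a small Python sketch. Everything here is hypothetical (the function names, the report dictionary, and the use of `ValueError` to signal a failed translation); it only shows why decoding the syndrome first keeps that information available when address translation fails.

```python
def report_dram_ecc_error(raw_syndrome, normalized_addr,
                          decode_syndrome, translate_address):
    """Build an error report, decoding the syndrome before translating."""
    # Step 1: decode the syndrome. It does not depend on the address,
    # so it is done first and is never lost.
    report = {
        "syndrome": decode_syndrome(raw_syndrome),
        "sys_addr": None,
        "translation_error": None,
    }
    # Step 2: attempt the system-specific address translation.
    try:
        report["sys_addr"] = translate_address(normalized_addr)
    except ValueError as exc:
        # Translation failed; the decoded syndrome is still reported.
        report["translation_error"] = str(exc)
    return report


def example_decode(raw):
    """Hypothetical stand-in for a syndrome decoder."""
    return {"raw": raw}


def example_translate_unsupported(addr):
    """Hypothetical translator that fails, as on an unsupported system."""
    raise ValueError("translation not supported on this system")
```

Calling `report_dram_ecc_error(0x5A, 0x1000, example_decode, example_translate_unsupported)` returns a report whose `syndrome` field is populated even though `sys_addr` stays `None`.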
    • EDAC/amd64: Find Chip Select memory size using Address Mask · e53a3b26
      Yazen Ghannam authored
      Chip Select memory size reporting on AMD Family 17h was recently fixed
      in order to account for interleaving. However, the current method is not
      robust.
      
      The Chip Select Address Mask can be used to find the memory size. There
      are a couple of cases.
      
      1) For single-rank and dual-rank non-interleaved, use the address mask
      plus 1 as the size.
      
      2) For dual-rank interleaved, do #1 but "de-interleave" the address mask
      first.
      
      Always "de-interleave" the address mask in order to simplify the code
      flow. Bit mask manipulation is necessary to check for interleaving, so
      just go ahead and do the de-interleaving. In the non-interleaved case,
      the original and de-interleaved address masks will be the same.
      
      To de-interleave the mask, count the number of zero bits in the middle
      of the mask and swap them with the most significant bits.
      
      For example,
      Original=0xFFFF9FE, De-interleaved=0x3FFFFFE
      
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: https://lkml.kernel.org/r/20190821235938.118710-5-Yazen.Ghannam@amd.com
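The de-interleaving step and the example from this commit message can be checked with a short Python sketch. The function name is hypothetical, and the sketch assumes bit 0 of the mask is always clear (as in the example values), so that the number of zero bits inside the mask equals the MSB position minus the number of set bits. It does not model how a mask maps to a size in bytes, which the message does not spell out.

```python
def deinterleave_addr_mask(addr_mask):
    """Move the zero bits in the middle of the mask to the top.

    Assumption: bit 0 is always clear, so a full mask whose most
    significant bit is `msb` has exactly `msb` set bits.
    """
    if addr_mask == 0:
        return 0
    msb = addr_mask.bit_length() - 1
    weight = bin(addr_mask).count("1")
    # Zero bits inside the mask = bits in a full mask minus bits set.
    num_zero_bits = msb - weight
    # Swapping those zeros with the most significant bits is the same as
    # building a full mask that is num_zero_bits shorter at the top.
    new_msb = msb - num_zero_bits
    return ((1 << (new_msb + 1)) - 1) & ~1
```

For the example above, `0xFFFF9FE` has its MSB at bit 27 and 25 set bits, so two zero bits are removed from the top and the result is `0x3FFFFFE`. A non-interleaved mask such as `0x3FFFFFE` has no zero bits in the middle and is returned unchanged.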