1. 29 May 2012, 2 commits
    • edac: move dimm properties to struct dimm_info · 084a4fcc
      Committed by Mauro Carvalho Chehab
      On systems based on chip select rows, all channels need to use memories
      with the same properties, otherwise the memories on channels A and B
      won't be recognized.
      
      However, this assumption does not hold for all types of memory
      controllers.
      
      Controllers for FB-DIMMs don't have such requirements.
      
      Also, modern Intel controllers seem to be capable of handling such
      differences.
      
      So, we need to stop storing the DIMM information in per-csrow data
      and store it in the right place instead.
      
      The first step is to move grain, mtype, dtype and edac_mode to the
      per-dimm struct.
      Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
      Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: Jason Uhlenkott <juhlenko@akamai.com>
      Cc: Tim Small <tim@buttersideup.com>
      Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
      Cc: "Arvind R." <arvino55@gmail.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Egor Martovetsky <egor@pasemi.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Joe Perches <joe@perches.com>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hitoshi Mitake <h.mitake@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: James Bottomley <James.Bottomley@parallels.com>
      Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
      Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
      Cc: Josh Boyer <jwboyer@gmail.com>
      Cc: Mike Williams <mike@mikebwilliams.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
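To make the layout change concrete, here is a minimal userspace sketch in C. The struct and field names follow the commit message (`dimm_info`, `grain`, `mtype`, `dtype`, `edac_mode`), but the enums are hypothetical, trimmed-down stand-ins and the csrow layout is simplified; this is not the kernel's actual definition. It shows that once the properties live per DIMM, the two channels of one csrow can hold DIMMs that differ.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the kernel's memory enums. */
enum mem_kind { MEM_KIND_DDR2, MEM_KIND_DDR3, MEM_KIND_FB_DDR2 };
enum dev_kind { DEV_KIND_X4, DEV_KIND_X8 };
enum ecc_mode { ECC_NONE, ECC_SECDED, ECC_S4ECD4ED };

/* After the change: each DIMM carries its own properties. */
struct dimm_info {
	uint32_t grain;          /* error reporting granularity, in bytes */
	enum mem_kind mtype;     /* memory type */
	enum dev_kind dtype;     /* DRAM device width */
	enum ecc_mode edac_mode; /* ECC scheme in use */
};

/* In this simplified model, a csrow only groups per-channel DIMM pointers. */
#define NR_CHANNELS 2
struct csrow_info {
	struct dimm_info *dimm[NR_CHANNELS];
};

/* Returns 1 if every channel of the csrow holds DIMMs with equal properties. */
static int csrow_is_homogeneous(const struct csrow_info *row)
{
	for (int ch = 1; ch < NR_CHANNELS; ch++) {
		const struct dimm_info *a = row->dimm[0];
		const struct dimm_info *b = row->dimm[ch];

		if (a->mtype != b->mtype || a->dtype != b->dtype ||
		    a->edac_mode != b->edac_mode || a->grain != b->grain)
			return 0;
	}
	return 1;
}
```

With per-csrow properties a mixed row like this could not be described at all; with per-DIMM properties it is representable, and a driver can still check for homogeneity when its hardware requires it.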
    • edac: Create a dimm struct and move the labels into it · a7d7d2e1
      Committed by Mauro Carvalho Chehab
      The way a DIMM is currently represented implies that it is
      linked into a per-csrow struct. However, some drivers don't see
      csrows, as they're hidden behind some chip, such as the AMBs
      on FB-DIMMs.
      
      This forced drivers to fake^Wvirtualize a csrow struct, and to make
      a mess of the original csrow/channel concept.
      
      Move the DIMM labels into a per-DIMM struct, and add to it
      the real location of the socket, in terms of csrow/channel.
      Later patches will modify the location to properly represent the
      memory architecture.
      
      All other drivers will use a per-csrow type of location.
      Some of those drivers will require a later conversion, as
      they also fake the csrows internally.
      
      TODO: While this patch doesn't change the existing behavior, on
      csrow-based memory controllers, a csrow/channel pair points to a memory
      rank. There's a known bug in the EDAC core that allows different
      labels for the same DIMM if it has more than one rank. A later patch
      is needed to merge the several ranks of a DIMM into the same dimm_info
      struct, in order to avoid having different labels for the same DIMM.
      
      edac_mc_alloc() will now contain a per-dimm initialization loop that
      will be changed by later patches in order to match other types of
      memory architectures.
      Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
      Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
      Cc: "Arvind R." <arvino55@gmail.com>
      Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
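A hedged sketch of the per-dimm initialization loop described in this commit, assuming a simple controller with `nr_csrows * nr_channels` slots. `struct mem_ctl`, `mc_alloc()`, `mc_free()`, `struct dimm_location`, `EDAC_LABEL_LEN`, and the default label format are illustrative names for this sketch, not the real `edac_mc_alloc()` code. Each `dimm_info` records its own label and its csrow/channel location, matching the per-csrow type of location that most drivers keep for now.

```c
#include <stdio.h>
#include <stdlib.h>

#define EDAC_LABEL_LEN 32 /* hypothetical label size for this sketch */

/* Where the DIMM socket physically sits, in csrow/channel terms. */
struct dimm_location {
	int csrow;
	int channel;
};

struct dimm_info {
	char label[EDAC_LABEL_LEN];
	struct dimm_location location;
};

struct mem_ctl {
	int nr_csrows;
	int nr_channels;
	int nr_dimms;
	struct dimm_info *dimms;
};

/* Hypothetical allocator: one dimm_info per csrow/channel pair. */
static struct mem_ctl *mc_alloc(int nr_csrows, int nr_channels)
{
	struct mem_ctl *mc = malloc(sizeof(*mc));

	if (!mc)
		return NULL;
	mc->nr_csrows = nr_csrows;
	mc->nr_channels = nr_channels;
	mc->nr_dimms = nr_csrows * nr_channels;
	mc->dimms = calloc(mc->nr_dimms, sizeof(*mc->dimms));
	if (!mc->dimms) {
		free(mc);
		return NULL;
	}

	/* Per-dimm initialization loop, filling the array row by row. */
	for (int row = 0; row < nr_csrows; row++) {
		for (int ch = 0; ch < nr_channels; ch++) {
			struct dimm_info *d = &mc->dimms[row * nr_channels + ch];

			d->location.csrow = row;
			d->location.channel = ch;
			snprintf(d->label, sizeof(d->label),
				 "mc#0csrow#%dchannel#%d", row, ch);
		}
	}
	return mc;
}

static void mc_free(struct mem_ctl *mc)
{
	if (mc) {
		free(mc->dimms);
		free(mc);
	}
}
```

With 4 csrows and 2 channels, `dimms[5]` sits at csrow 2, channel 1. A driver whose hardware exposes DIMM slots directly would replace this loop with one that fills `location` from its own topology.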
  2. 22 March 2012, 1 commit
    • edac: rename channel_info to rank_info · a4b4be3f
      Committed by Mauro Carvalho Chehab
      What a csrow/channel vector points to is rank information, not
      channel information.
      
      On a traditional architecture, the memory controller directly accesses
      the memory ranks via chip select rows. Different ranks on the same DIMM
      are selected via different chip select rows. So, typically, one
      csrow/channel pair means one different DIMM.
      
      On FB-DIMMs, there's a microcontroller chip at the DIMM, called Advanced
      Memory Buffer (AMB) that serves as the interface between the memory
      controller and the memory chips.
      
      The AMB selection is via the DIMM slot, and not via a csrow.
      
      It is up to the AMB to talk with the csrows of the DRAM chips.
      
      So, the FB-DIMM memory controllers see the DIMM slot, and not the DIMM
      rank. RAMBUS is similar.
      
      Newer memory controllers, like the ones found on Intel Sandy Bridge and
      Nehalem, even when working with normal DDR3 DIMMs, don't use the usual
      channel A/channel B interleaving scheme to provide 128-bit data access.
      
      Instead, they have more channels (3 or 4), and they can use
      several interleaving schemes. Such memory controllers see the DIMMs
      directly in their registers, instead of the ranks, which is better for
      the driver, as its main usage is to point to a broken DIMM stick (the
      Field Replaceable Unit), and not to point to a broken DRAM chip.
      
      The drivers that support such newer memory architecture models
      currently need to fake information and to abuse the EDAC structures, as
      the subsystem was conceived with the idea that the csrow would always be
      visible to the CPU.
      
      To make things a little worse, those drivers don't currently fake
      csrows/channels in a consistent way, as the concepts there don't apply
      to the memory controllers they're talking to. So, each driver author
      interpreted the concepts using a different logic.
      
      In order to fix it, let's rename the data structure that points into a
      DIMM rank to "rank_info", in order to be clearer about what's stored
      there.
      
      Later patches will provide a better way to represent the memory
      hierarchy for the other types of memory controllers.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
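The renamed structure can be sketched as follows, assuming a two-channel controller. The `rank_info` back-pointer to its csrow mirrors the idea that a csrow/channel pair addresses a rank; `rank_to_dimm_slot()` and `csrow_init()` are hypothetical helpers, and the mapping assumes each DIMM occupies `ranks_per_dimm` consecutive chip-select rows, which is common but not guaranteed. It illustrates why two csrow/channel pairs can refer to ranks of the same physical DIMM.

```c
/* Forward declaration so a rank can point back to its csrow. */
struct csrow_info;

/* What a csrow/channel pair addresses: one rank, not a whole channel. */
struct rank_info {
	int chan_idx;
	struct csrow_info *csrow; /* owning chip-select row */
};

#define NR_CHANNELS 2
struct csrow_info {
	int csrow_idx;
	struct rank_info channels[NR_CHANNELS];
};

/*
 * Hypothetical mapping: assumes ranks_per_dimm consecutive chip selects
 * belong to one DIMM, so ranks of a multi-rank DIMM share a slot number.
 */
static int rank_to_dimm_slot(const struct rank_info *rank, int ranks_per_dimm)
{
	return rank->csrow->csrow_idx / ranks_per_dimm;
}

static void csrow_init(struct csrow_info *row, int idx)
{
	row->csrow_idx = idx;
	for (int ch = 0; ch < NR_CHANNELS; ch++) {
		row->channels[ch].chan_idx = ch;
		row->channels[ch].csrow = row;
	}
}
```

Under this assumption, csrows 0 and 1 on a dual-rank layout are two ranks of DIMM slot 0, which is exactly the case where per-rank labels can disagree.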
  3. 20 March 2012, 1 commit
  4. 15 December 2011, 1 commit
  5. 27 May 2011, 1 commit
  6. 31 March 2011, 1 commit
  7. 07 January 2011, 1 commit
  8. 09 December 2010, 1 commit
    • EDAC: Fix workqueue-related crashes · bb31b312
      Committed by Borislav Petkov
      00740c58 changed edac_core to
      un-/register a workqueue item only if a lowlevel driver supplies a
      polling routine. Normally, when we remove a polling low-level driver, we
      go and cancel all the queued work. However, the workqueue unreg happens
      based on the ->op_state setting, and edac_mc_del_mc() sets this to
      OP_OFFLINE _before_ we cancel the work item, leading to NULL ptr oops on
      the workqueue list.
      
      Fix it by putting the unreg stuff in proper order.
      
      Cc: <stable@kernel.org> #36.x
      Reported-and-tested-by: Tobias Karnat <tobias.karnat@googlemail.com>
      LKML-Reference: <1291201307.3029.21.camel@Tobias-Karnat>
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
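A userspace model of the ordering problem, with hypothetical simplified types: `work_queued` stands in for a pending delayed-work item and `workq_teardown()` for cancelling it; no real workqueue API is used. The state names echo EDAC's `OP_RUNNING_POLL` and `OP_OFFLINE`, but the enum here is an assumption for illustration. In the buggy order the teardown check never fires because `op_state` has already been overwritten.

```c
#include <stdbool.h>

/* Simplified stand-ins for the EDAC operating states. */
enum op_state { OP_ALLOC, OP_RUNNING_POLL, OP_RUNNING_INTERRUPT, OP_OFFLINE };

struct mem_ctl {
	enum op_state op_state;
	bool work_queued; /* models a pending delayed-work item */
};

/* Models cancelling the pending poll work item. */
static void workq_teardown(struct mem_ctl *mc)
{
	mc->work_queued = false;
}

/* Buggy order: the state is cleared before the check that depends on it. */
static void del_mc_buggy(struct mem_ctl *mc)
{
	mc->op_state = OP_OFFLINE;
	if (mc->op_state == OP_RUNNING_POLL) /* never true at this point */
		workq_teardown(mc);
}

/* Fixed order: cancel the work while op_state still says we are polling. */
static void del_mc_fixed(struct mem_ctl *mc)
{
	if (mc->op_state == OP_RUNNING_POLL)
		workq_teardown(mc);
	mc->op_state = OP_OFFLINE;
}
```

In the kernel, the work item left queued by the buggy order is what later dereferences freed state; in this model it simply stays marked as queued.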
  9. 24 October 2010, 4 commits
    • i7core_edac: don't use a freed mci struct · accf74ff
      Committed by Mauro Carvalho Chehab
      This is a nasty bug. Since the kobject count will be reduced to zero by
      edac_mc_del_mc(), and this triggers the kobj release method, the
      mci memory will be freed automatically. So, all we have left is ctl_name,
      as shown by enabling debug:
      
      [   80.822186] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device()  remove_link
      [   80.832590] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device()  remove_mci_instance
      [   80.843776] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
      [   80.855163] EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:3f:03.0
      [   80.862936] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2089: (null): free structs
      [   80.871134] EDAC DEBUG: in drivers/edac/edac_mc.c, line at 238: edac_mc_free()
      [   80.878379] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 726: edac_mc_unregister_sysfs_main_kobj()
      [   80.888043] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1232: drivers/edac/i7core_edac.c: i7core_put_devices()
      
      Also, kfree(mci) shouldn't happen in kobj.release, since that runs
      when edac_remove_sysfs_mci_device() is called, but the logic is:
      	edac_remove_sysfs_mci_device(mci);
      	edac_printk(KERN_INFO, EDAC_MC,
      		"Removed device %d for %s %s: DEV %s\n", mci->mc_idx,
      		mci->mod_name, mci->ctl_name, edac_dev_name(mci));
      So, as the edac_printk() needs the mci struct, this generates an OOPS.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
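The commit message does not show the final fix, so the following is only a generic userspace sketch of the ordering rule it points to: anything that still needs fields of a refcounted object must run before the call that may drop its last reference. `struct mci`, `mci_alloc()`, `mci_put()`, `remove_device()`, and `MOD_NAME_LEN` are hypothetical stand-ins for the mci structure and its kobject release path, not EDAC code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MOD_NAME_LEN 32 /* hypothetical buffer size for this sketch */

/* Hypothetical refcounted controller, standing in for mci + its kobject. */
struct mci {
	int refcount;
	int mc_idx;
	char mod_name[MOD_NAME_LEN];
};

static struct mci *mci_alloc(int idx, const char *mod_name)
{
	if (strlen(mod_name) >= MOD_NAME_LEN)
		return NULL;
	struct mci *m = malloc(sizeof(*m));
	if (!m)
		return NULL;
	m->refcount = 1;
	m->mc_idx = idx;
	strcpy(m->mod_name, mod_name);
	return m;
}

/* Drops a reference; the "release method" frees the object at zero. */
static int mci_put(struct mci *m)
{
	if (--m->refcount == 0) {
		free(m);
		return 1; /* freed: m must not be touched again */
	}
	return 0;
}

/*
 * Safe removal: format the message while m is still valid, then drop the
 * reference. Swapping the two statements reproduces the use-after-free.
 */
static int remove_device(struct mci *m, char *msg, size_t len)
{
	snprintf(msg, len, "Removed device %d for %s", m->mc_idx, m->mod_name);
	return mci_put(m);
}
```

Usage: `remove_device(m, buf, sizeof buf)` leaves "Removed device 0 for i7core_edac.c" in `buf` for a controller allocated as `mci_alloc(0, "i7core_edac.c")`, and `m` is invalid afterwards.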
    • edac_core: Print debug messages at release calls · bbc560ae
      Committed by Mauro Carvalho Chehab
      This is important for tracking down a nasty bug in the free logic.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • edac_core: Do a better job with node removal · 6fe1108f
      Committed by Mauro Carvalho Chehab
      Make sure we remove groups in the right order.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • i7core_edac: Be sure that the edac pci handler will be properly released · 939747bd
      Committed by Mauro Carvalho Chehab
      On multi-socket systems, more than one edac pci handler is enabled. Be
      sure to unregister all instances.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
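A minimal sketch of the rule in this commit, assuming one handler per socket kept in a fixed-size table: teardown walks every registered instance instead of releasing a single one. `struct pci_handler`, `register_handler()`, `release_all_handlers()`, and `MAX_SOCKETS` are hypothetical names for illustration, not the i7core_edac code.

```c
#include <stdlib.h>

#define MAX_SOCKETS 8 /* hypothetical upper bound for this sketch */

/* Hypothetical per-socket PCI error-handler handle. */
struct pci_handler {
	int socket;
};

static struct pci_handler *handlers[MAX_SOCKETS];
static int nr_handlers;

/* Registers one handler per socket; returns 0 on success, -1 on failure. */
static int register_handler(int socket)
{
	if (nr_handlers >= MAX_SOCKETS)
		return -1;
	struct pci_handler *h = malloc(sizeof(*h));
	if (!h)
		return -1;
	h->socket = socket;
	handlers[nr_handlers++] = h;
	return 0;
}

/* Releases every registered instance, not just the first or the last. */
static void release_all_handlers(void)
{
	while (nr_handlers > 0) {
		nr_handlers--;
		free(handlers[nr_handlers]);
		handlers[nr_handlers] = NULL;
	}
}
```

Keeping all handles in one table means the unload path cannot forget a socket, regardless of how many were probed.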
  10. 27 September 2010, 1 commit
    • amd64_edac: Fix driver module removal · 00740c58
      Committed by Borislav Petkov
      f4347553 removed the edac polling
      mechanism in favor of using a notifier chain for conveying MCE
      information to edac. However, the module removal path didn't test
      whether the driver had set up the polling function workqueue at all and
      the rmmod process was hanging in the kernel at try_to_del_timer_sync()
      in the cancel_delayed_work() path, trying to cancel an uninitialized
      work struct.
      
      Fix that by adding a balancing check to the workqueue removal path.
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
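The balancing check can be modeled as below, under stated assumptions: `edac_check` is named after the EDAC core's optional polling callback, while `work_initialized` and `cancel_calls` are hypothetical fields standing in for the delayed-work setup and for `cancel_delayed_work()`. `add_mc()`, `del_mc()`, and `dummy_check()` are illustrative helpers. Teardown only cancels work that setup actually initialized, so a notifier-driven driver with no polling routine never touches an uninitialized work struct.

```c
#include <stdbool.h>
#include <stddef.h>

struct mem_ctl;
typedef void (*check_fn)(struct mem_ctl *);

struct mem_ctl {
	check_fn edac_check;   /* optional polling routine; NULL if not polled */
	bool work_initialized; /* models the delayed-work item having been set up */
	int cancel_calls;      /* counts cancellations, for illustration */
};

static void add_mc(struct mem_ctl *mc)
{
	/* Only polling drivers get a work item. */
	if (mc->edac_check)
		mc->work_initialized = true;
}

static void del_mc(struct mem_ctl *mc)
{
	/* Balancing check: cancel only what add_mc() actually set up. */
	if (mc->work_initialized) {
		mc->cancel_calls++;
		mc->work_initialized = false;
	}
}

/* Trivial polling routine used to mark a controller as polled. */
static void dummy_check(struct mem_ctl *mc)
{
	(void)mc;
}
```

Tying setup and teardown to the same condition keeps the two paths symmetric, which is the property the module removal path was missing.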
  11. 08 December 2009, 1 commit
  12. 24 September 2009, 1 commit
  13. 14 April 2009, 1 commit
  14. 07 January 2009, 1 commit
  15. 06 May 2008, 1 commit
  16. 29 April 2008, 2 commits
  17. 27 July 2007, 1 commit
  18. 20 July 2007, 18 commits