1. 12 6月, 2012 11 次提交
    • M
      edac: edac_mc_handle_error(): add an error_count parameter · 9eb07a7f
      Mauro Carvalho Chehab 提交于
      In order to avoid loosing error events, it is desirable to group
      error events together and generate a single trace for several identical
      errors.
      
      The trace API already allows reporting multiple errors. Change the
      handle_error function to also allow that.
      
      The changes at the drivers were made by this small script:
      
      	$file .=$_ while (<>);
      	$file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g;
      	print $file;
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      9eb07a7f
    • M
      edac: remove arch-specific parameter for the error handler · 03f7eae8
      Mauro Carvalho Chehab 提交于
      Remove the arch-dependent parameter, as it were not used,
      as the MCE tracepoint weren't implemented. It probably doesn't
      make sense to have an MCE-specific tracepoint, as this will
      cost more bytes at the tracepoint, and tracepoint is not free.
      
      The changes at the EDAC drivers were done by this small perl script:
      
      	$file .=$_ while (<>);
      	$file =~ s/(edac_mc_handle_error)\s*\(([^\;]+)\,([^\,\)]+)\s*\)/$1($2)/g;
      	print $file;
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      03f7eae8
    • M
      edac_mc: Cleanup per-dimm_info debug messages · 6e84d359
      Mauro Carvalho Chehab 提交于
      The edac_mc_alloc() routine allocates one dimm_info device for all
      possible memories, including the non-filled ones. The debug messages
      there are somewhat confusing. So, cleans them, by moving the code
      that prints the memory location to edac_mc, and using it on both
      edac_mc_sysfs and edac_mc.
      
      Also, only dumps information when DIMM/ranks are actually
      filled.
      
      After this patch, a dimm-based memory controller will print the debug
      info as:
      
      [ 1011.380027] EDAC DEBUG: edac_mc_dump_csrow: csrow->csrow_idx = 0
      [ 1011.380029] EDAC DEBUG: edac_mc_dump_csrow:   csrow = ffff8801169be000
      [ 1011.380031] EDAC DEBUG: edac_mc_dump_csrow:   csrow->first_page = 0x0
      [ 1011.380032] EDAC DEBUG: edac_mc_dump_csrow:   csrow->last_page = 0x0
      [ 1011.380034] EDAC DEBUG: edac_mc_dump_csrow:   csrow->page_mask = 0x0
      [ 1011.380035] EDAC DEBUG: edac_mc_dump_csrow:   csrow->nr_channels = 3
      [ 1011.380037] EDAC DEBUG: edac_mc_dump_csrow:   csrow->channels = ffff8801149c2840
      [ 1011.380039] EDAC DEBUG: edac_mc_dump_csrow:   csrow->mci = ffff880117426000
      [ 1011.380041] EDAC DEBUG: edac_mc_dump_channel:   channel->chan_idx = 0
      [ 1011.380042] EDAC DEBUG: edac_mc_dump_channel:     channel = ffff8801149c2860
      [ 1011.380044] EDAC DEBUG: edac_mc_dump_channel:     channel->csrow = ffff8801169be000
      [ 1011.380046] EDAC DEBUG: edac_mc_dump_channel:     channel->dimm = ffff88010fe90400
      ...
      [ 1011.380095] EDAC DEBUG: edac_mc_dump_dimm: dimm0: channel 0 slot 0 mapped as virtual row 0, chan 0
      [ 1011.380097] EDAC DEBUG: edac_mc_dump_dimm:   dimm = ffff88010fe90400
      [ 1011.380099] EDAC DEBUG: edac_mc_dump_dimm:   dimm->label = 'CPU#0Channel#0_DIMM#0'
      [ 1011.380101] EDAC DEBUG: edac_mc_dump_dimm:   dimm->nr_pages = 0x40000
      [ 1011.380103] EDAC DEBUG: edac_mc_dump_dimm:   dimm->grain = 8
      [ 1011.380104] EDAC DEBUG: edac_mc_dump_dimm:   dimm->nr_pages = 0x40000
      ...
      
      (a rank-based memory controller would print, instead of "dimm?", "rank?"
       on the above debug info)
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      6e84d359
    • J
      edac: Convert debugfX to edac_dbg(X, · 956b9ba1
      Joe Perches 提交于
      Use a more common debugging style.
      
      Remove __FILE__ uses, add missing newlines,
      coalesce formats and align arguments.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      956b9ba1
    • M
      edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs · dd23cd6e
      Mauro Carvalho Chehab 提交于
      The debug macro already adds that. Most of the work here was
      made by this small script:
      
      $f .=$_ while (<>);
      
      $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*": /\1"/g;
      $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*/\1/g;
      $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*"MC: /\1"/g;
      
      $f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g;
      $f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g;
      $f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g;
      $f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g;
      
      $f =~ s/\"MC\: \\n\"/"MC:\\n"/g;
      
      print $f;
      
      After running the script, manual cleanups were done to fix it the remaining
      places.
      
      While here, removed the __LINE__ on most places, as it doesn't actually give
      useful info on most places.
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      dd23cd6e
    • M
      edac: change the mem allocation scheme to make Documentation/kobject.txt happy · de3910eb
      Mauro Carvalho Chehab 提交于
      Kernel kobjects have rigid rules: each container object should be
      dynamically allocated, and can't be allocated into a single kmalloc.
      
      EDAC never obeyed this rule: it has a single malloc function that
      allocates all needed data into a single kzalloc.
      
      As this is not accepted anymore, change the allocation schema of the
      EDAC *_info structs to enforce this Kernel standard.
      Acked-by: NChris Metcalf <cmetcalf@tilera.com>
      Cc: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Greg K H <gregkh@linuxfoundation.org>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: Tim Small <tim@buttersideup.com>
      Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
      Cc: "Arvind R." <arvino55@gmail.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Egor Martovetsky <egor@pasemi.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hitoshi Mitake <h.mitake@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      de3910eb
    • M
      edac: Only expose csrows/channels on legacy API if they're populated · e39f4ea9
      Mauro Carvalho Chehab 提交于
      This patch actually fixes a bug with the legacy API, where, at the
      same csrow, some channels may have different DIMMs. This can happen
      on FB-DIMM/RAMBUS and modern Intel controllers.
      
      This is the case, for example, of Nehalem machines:
      
      $ ./edac-ctl --layout
             +-----------------------------------+
             |                mc0                |
             | channel0  | channel1  | channel2  |
      -------+-----------------------------------+
      slot2: |     0 MB  |     0 MB  |     0 MB  |
      slot1: |  1024 MB  |     0 MB  |     0 MB  |
      slot0: |  1024 MB  |  1024 MB  |  1024 MB  |
      -------+-----------------------------------+
      
      Before this patch, non-filled memories were shown. Now, only what's
      filled is there:
      
      grep . /sys/devices/system/edac/mc/mc0/csrow*/ch?*
      /sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:0
      /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label:CPU#0Channel#0_DIMM#0
      /sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:0
      /sys/devices/system/edac/mc/mc0/csrow0/ch1_dimm_label:CPU#0Channel#0_DIMM#1
      /sys/devices/system/edac/mc/mc0/csrow1/ch0_ce_count:0
      /sys/devices/system/edac/mc/mc0/csrow1/ch0_dimm_label:CPU#0Channel#1_DIMM#0
      /sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count:0
      /sys/devices/system/edac/mc/mc0/csrow2/ch0_dimm_label:CPU#0Channel#2_DIMM#0
      
      Thanks-to: Aristeu Rozanski Filho <arozansk@redhat.com>
      Reviewed-by: NAristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      e39f4ea9
    • M
      edac: Add debufs nodes to allow doing fake error inject · 452a6bf9
      Mauro Carvalho Chehab 提交于
      Sometimes, it is useful to have a mechanism that generates fake
      errors, in order to test the EDAC core code, and the userspace
      tools.
      
      Provide such mechanism by adding a few debugfs nodes.
      Reviewed-by: NAristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      452a6bf9
    • M
      edac: add a sysfs node to report the maximum location for the system · 8ad6c78a
      Mauro Carvalho Chehab 提交于
      The userspace tools need to know what's the maximum location on each
      system, as it helps to create nice maps showing how the memory was
      filled at the system.
      Reviewed-by: NAristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      8ad6c78a
    • M
      edac: add a new per-dimm API and make the old per-virtual-rank API obsolete · 19974710
      Mauro Carvalho Chehab 提交于
      The old EDAC API is broken. It only works fine for systems manufatured
      before 2005 and for AMD 64. The reason is that it forces all memory
      controller drivers to discover rank info.
      
      Also, it doesn't allow grouping the several ranks into a DIMM.
      
      So, what almost all modern drivers do is to create a fake virtual-rank
      information, and use it to cheat the EDAC core to accept the driver.
      
      While this works if the user has enough time to discover what DIMM slot
      corresponds to each "virtual-rank" information, it prevents EDAC usage
      for users with less available time. It also makes life hard for vendors
      that may want to provide a table with their motherboards to the userspace
      tool (edac-utils) as each driver has its own logic for the virtual
      mapping.
      
      So, the old API should be removed, in favor of a more flexible API that
      allows newer drivers to not lie to the EDAC core.
      Reviewed-by: NAristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: Hui Wang <jason77.wang@gmail.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      19974710
    • M
      edac: rewrite the sysfs code to use struct device · 7a623c03
      Mauro Carvalho Chehab 提交于
      The EDAC subsystem uses the old struct sysdev approach,
      creating all nodes using the raw sysfs API. This is bad,
      as the API is deprecated.
      
      As we'll be changing the EDAC API, let's first port the existing
      code to struct device.
      
      There's one drawback on this patch: driver-specific sysfs
      nodes, used by mpc85xx_edac, amd64_edac and i7core_edac
       won't be created anymore. While it would be possible to
      also port the device-specific code, that would mix kobj with
      struct device, with is not recommended. Also, it is easier and nicer
      to move the code to the drivers, instead, as the core can get rid
      of some complex logic that just emulates what the device_add()
      and device_create_file() already does.
      
      The next patches will convert the driver-specific code to use
      the device-specific calls. Then, the remaining bits of the old
      sysfs API will be removed.
      
      NOTE: a per-MC bus is required, otherwise devices with more than
      one memory controller will hit a bug like the one below:
      
      [  819.094946] EDAC DEBUG: find_mci_by_dev: find_mci_by_dev()
      [  819.094948] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device() idx=1
      [  819.094952] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device(): creating device mc1
      [  819.094967] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device creating dimm0, located at channel 0 slot 0
      [  819.094984] ------------[ cut here ]------------
      [  819.100142] WARNING: at fs/sysfs/dir.c:481 sysfs_add_one+0xc1/0xf0()
      [  819.107282] Hardware name: S2600CP
      [  819.111078] sysfs: cannot create duplicate filename '/bus/edac/devices/dimm0'
      [  819.119062] Modules linked in: sb_edac(+) edac_core ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun kvm microcode pcspkr iTCO_wdt iTCO_vendor_support igb i2c_i801 i2c_core sg ioatdma dca sr_mod cdrom sd_mod crc_t10dif ahci libahci isci libsas libata scsi_transport_sas scsi_mod wmi dm_mod [last unloaded: scsi_wait_scan]
      [  819.175748] Pid: 10902, comm: modprobe Not tainted 3.3.0-0.11.el7.v12.2.x86_64 #1
      [  819.184113] Call Trace:
      [  819.186868]  [<ffffffff8105adaf>] warn_slowpath_common+0x7f/0xc0
      [  819.193573]  [<ffffffff8105aea6>] warn_slowpath_fmt+0x46/0x50
      [  819.200000]  [<ffffffff811f53d1>] sysfs_add_one+0xc1/0xf0
      [  819.206025]  [<ffffffff811f5cf5>] sysfs_do_create_link+0x135/0x220
      [  819.212944]  [<ffffffff811f7023>] ? sysfs_create_group+0x13/0x20
      [  819.219656]  [<ffffffff811f5df3>] sysfs_create_link+0x13/0x20
      [  819.226109]  [<ffffffff813b04f6>] bus_add_device+0xe6/0x1b0
      [  819.232350]  [<ffffffff813ae7cb>] device_add+0x2db/0x460
      [  819.238300]  [<ffffffffa0325634>] edac_create_dimm_object+0x84/0xf0 [edac_core]
      [  819.246460]  [<ffffffffa0325e18>] edac_create_sysfs_mci_device+0xe8/0x290 [edac_core]
      [  819.255215]  [<ffffffffa0322e2a>] edac_mc_add_mc+0x5a/0x2c0 [edac_core]
      [  819.262611]  [<ffffffffa03412df>] sbridge_register_mci+0x1bc/0x279 [sb_edac]
      [  819.270493]  [<ffffffffa03417a3>] sbridge_probe+0xef/0x175 [sb_edac]
      [  819.277630]  [<ffffffff813ba4e8>] ? pm_runtime_enable+0x58/0x90
      [  819.284268]  [<ffffffff812f430c>] local_pci_probe+0x5c/0xd0
      [  819.290508]  [<ffffffff812f5ba1>] __pci_device_probe+0xf1/0x100
      [  819.297117]  [<ffffffff812f5bea>] pci_device_probe+0x3a/0x60
      [  819.303457]  [<ffffffff813b1003>] really_probe+0x73/0x270
      [  819.309496]  [<ffffffff813b138e>] driver_probe_device+0x4e/0xb0
      [  819.316104]  [<ffffffff813b149b>] __driver_attach+0xab/0xb0
      [  819.322337]  [<ffffffff813b13f0>] ? driver_probe_device+0xb0/0xb0
      [  819.329151]  [<ffffffff813af5d6>] bus_for_each_dev+0x56/0x90
      [  819.335489]  [<ffffffff813b0d7e>] driver_attach+0x1e/0x20
      [  819.341534]  [<ffffffff813b0980>] bus_add_driver+0x1b0/0x2a0
      [  819.347884]  [<ffffffffa0347000>] ? 0xffffffffa0346fff
      [  819.353641]  [<ffffffff813b19f6>] driver_register+0x76/0x140
      [  819.359980]  [<ffffffff8159f18b>] ? printk+0x51/0x53
      [  819.365524]  [<ffffffffa0347000>] ? 0xffffffffa0346fff
      [  819.371291]  [<ffffffff812f5896>] __pci_register_driver+0x56/0xd0
      [  819.378096]  [<ffffffffa0347054>] sbridge_init+0x54/0x1000 [sb_edac]
      [  819.385231]  [<ffffffff8100203f>] do_one_initcall+0x3f/0x170
      [  819.391577]  [<ffffffff810bcd2e>] sys_init_module+0xbe/0x230
      [  819.397926]  [<ffffffff815bb529>] system_call_fastpath+0x16/0x1b
      [  819.404633] ---[ end trace 1654fdd39556689f ]---
      
      This happens because the bus is not being properly initialized.
      Instead of putting the memory sub-devices inside the memory controller,
      it is putting everything under the same directory:
      
      $ tree /sys/bus/edac/
      /sys/bus/edac/
      ├── devices
      │   ├── all_channel_counts -> ../../../devices/system/edac/mc/mc0/all_channel_counts
      │   ├── csrow0 -> ../../../devices/system/edac/mc/mc0/csrow0
      │   ├── csrow1 -> ../../../devices/system/edac/mc/mc0/csrow1
      │   ├── csrow2 -> ../../../devices/system/edac/mc/mc0/csrow2
      │   ├── dimm0 -> ../../../devices/system/edac/mc/mc0/dimm0
      │   ├── dimm1 -> ../../../devices/system/edac/mc/mc0/dimm1
      │   ├── dimm3 -> ../../../devices/system/edac/mc/mc0/dimm3
      │   ├── dimm6 -> ../../../devices/system/edac/mc/mc0/dimm6
      │   ├── inject_addrmatch -> ../../../devices/system/edac/mc/mc0/inject_addrmatch
      │   ├── mc -> ../../../devices/system/edac/mc
      │   └── mc0 -> ../../../devices/system/edac/mc/mc0
      ├── drivers
      ├── drivers_autoprobe
      ├── drivers_probe
      └── uevent
      
      On a multi-memory controller system, the names "csrow%d" and "dimm%d"
      should be under "mc%d", and not at the main hierarchy level.
      
      So, we need to create a per-MC bus, in order to have its own namespace.
      Reviewed-by: NAristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Greg K H <gregkh@linuxfoundation.org>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      7a623c03
  2. 11 6月, 2012 1 次提交
    • M
      edac: Rename the parent dev to pdev · fd687502
      Mauro Carvalho Chehab 提交于
      As EDAC doesn't use struct device itself, it created a parent dev
      pointer called as "pdev".  Now that we'll be converting it to use
      struct device, instead of struct devsys, this needs to be fixed.
      
      No functional changes.
      Reviewed-by: NAristeu Rozanski <arozansk@redhat.com>
      Acked-by: NChris Metcalf <cmetcalf@tilera.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: Jason Uhlenkott <juhlenko@akamai.com>
      Cc: Tim Small <tim@buttersideup.com>
      Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
      Cc: "Arvind R." <arvino55@gmail.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Egor Martovetsky <egor@pasemi.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Joe Perches <joe@perches.com>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hitoshi Mitake <h.mitake@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
      Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
      Cc: Josh Boyer <jwboyer@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      fd687502
  3. 29 5月, 2012 4 次提交
    • M
      edac: Initialize the dimm label with the known information · 5926ff50
      Mauro Carvalho Chehab 提交于
      While userspace doesn't fill the dimm labels, add there the dimm location,
      as described by the used memory model. This could eventually match what
      is described at the dmidecode, making easier for people to identify the
      memory.
      
      For example, on an Intel motherboard where the DMI table is reliable,
      the first memory stick is described as:
      
      Memory Device
      	Array Handle: 0x0029
      	Error Information Handle: Not Provided
      	Total Width: 64 bits
      	Data Width: 64 bits
      	Size: 2048 MB
      	Form Factor: DIMM
      	Set: 1
      	Locator: A1_DIMM0
      	Bank Locator: A1_Node0_Channel0_Dimm0
      	Type: <OUT OF SPEC>
      	Type Detail: Synchronous
      	Speed: 800 MHz
      	Manufacturer: A1_Manufacturer0
      	Serial Number: A1_SerNum0
      	Asset Tag: A1_AssetTagNum0
      	Part Number: A1_PartNum0
      
      The memory named as "A1_DIMM0" is physically located at the first
      memory controller (node 0), at channel 0, dimm slot 0.
      
      After this patch, the memory label will be filled with:
      	/sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0
      
      And (after the new EDAC API patches) as:
      	/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0
      
      So, even if the memory label is not initialized on userspace, an useful
      information with the error location is filled there, expecially since
      several systems/motherboards are provided with enough info to map from
      channel/slot (or branch/channel/slot) into the DIMM label. So, letting the
      EDAC core fill it by default is a good thing.
      
      It should noticed that, as the label filling happens at the
      edac_mc_alloc(), drivers can override it to better describe the memories
      (and some actually do it).
      
      Cc: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      5926ff50
    • M
      edac: move nr_pages to dimm struct · a895bf8b
      Mauro Carvalho Chehab 提交于
      The number of pages is a dimm property. Move it to the dimm struct.
      
      After this change, it is possible to add sysfs nodes for the DIMM's that
      will properly represent the DIMM stick properties, including its size.
      
      A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when
      the memory controller represents the memory via chip select rows.
      Reviewed-by: NAristeu Rozanski <arozansk@redhat.com>
      Acked-by: NBorislav Petkov <borislav.petkov@amd.com>
      Acked-by: NChris Metcalf <cmetcalf@tilera.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: Jason Uhlenkott <juhlenko@akamai.com>
      Cc: Tim Small <tim@buttersideup.com>
      Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
      Cc: "Arvind R." <arvino55@gmail.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Egor Martovetsky <egor@pasemi.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Joe Perches <joe@perches.com>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hitoshi Mitake <h.mitake@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
      Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
      Cc: Josh Boyer <jwboyer@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      a895bf8b
    • M
      edac: move dimm properties to struct dimm_info · 084a4fcc
      Mauro Carvalho Chehab 提交于
      On systems based on chip select rows, all channels need to use memories
      with the same properties, otherwise the memories on channels A and B
      won't be recognized.
      
      However, such assumption is not true for all types of memory
      controllers.
      
      Controllers for FB-DIMM's don't have such requirements.
      
      Also, modern Intel controllers seem to be capable of handling such
      differences.
      
      So, we need to get rid of storing the DIMM information into a per-csrow
      data, storing it, instead at the right place.
      
      The first step is to move grain, mtype, dtype and edac_mode to the
      per-dimm struct.
      Reviewed-by: NAristeu Rozanski <arozansk@redhat.com>
      Reviewed-by: NBorislav Petkov <borislav.petkov@amd.com>
      Acked-by: NChris Metcalf <cmetcalf@tilera.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: Jason Uhlenkott <juhlenko@akamai.com>
      Cc: Tim Small <tim@buttersideup.com>
      Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
      Cc: "Arvind R." <arvino55@gmail.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Egor Martovetsky <egor@pasemi.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Joe Perches <joe@perches.com>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hitoshi Mitake <h.mitake@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: James Bottomley <James.Bottomley@parallels.com>
      Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
      Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
      Cc: Josh Boyer <jwboyer@gmail.com>
      Cc: Mike Williams <mike@mikebwilliams.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      084a4fcc
    • M
      edac: Create a dimm struct and move the labels into it · a7d7d2e1
      Mauro Carvalho Chehab 提交于
      The way a DIMM is currently represented implies that they're
      linked into a per-csrow struct. However, some drivers don't see
      csrows, as they're ridden behind some chip like the AMB's
      on FBDIMM's, for example.
      
      This forced drivers to fake^Wvirtualize a csrow struct, and to create
      a mess under csrow/channel original's concept.
      
      Move the DIMM labels into a per-DIMM struct, and add there
      the real location of the socket, in terms of csrow/channel.
      Latter patches will modify the location to properly represent the
      memory architecture.
      
      All other drivers will use a per-csrow type of location.
      Some of those drivers will require a latter conversion, as
      they also fake the csrows internally.
      
      TODO: While this patch doesn't change the existing behavior, on
      csrows-based memory controllers, a csrow/channel pair points to a memory
      rank. There's a known bug at the EDAC core that allows having different
      labels for the same DIMM, if it has more than one rank. A latter patch
      is need to merge the several ranks for a DIMM into the same dimm_info
      struct, in order to avoid having different labels for the same DIMM.
      
      The edac_mc_alloc() will now contain a per-dimm initialization loop that
      will be changed by latter patches in order to match other types of
      memory architectures.
      Reviewed-by: NAristeu Rozanski <arozansk@redhat.com>
      Reviewed-by: NBorislav Petkov <borislav.petkov@amd.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
      Cc: "Arvind R." <arvino55@gmail.com>
      Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      a7d7d2e1
  4. 19 3月, 2012 1 次提交
    • B
      EDAC: Correct scrub rate API · 5e8e19bf
      Borislav Petkov 提交于
      The original scrub rate API definition states that if scrub rate
      accessors are not implemented, a negative value (-1) should be written
      to the sysfs file (/sys/devices/system/edac/mc/mc<N>/sdram_scrub_rate,
      where N is the memory controller number on the system). This is
      counter-intuitive and awkward at the very least because, when setting
      the scrub rate, userspace has to write to sysfs and then read it back to
      check error status of the operation.
      
      As Tony notes, best it would be to not have the sdram_scrub_rate in
      sysfs if scrub rate support is not implemented. It is too late about
      that and a bunch of drivers on a bunch of arches would need to be
      changed and tested which is not a trivial task ATM.
      
      Instead, settle for the next best thing of returning -ENODEV when
      implementation is missing and -EINVAL when there was an error
      encountered while setting the scrub rate.
      Reported-by: NHan Pingtian <phan@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/20110916105856.GA13253@hpt.nay.redhat.comSigned-off-by: NBorislav Petkov <borislav.petkov@amd.com>
      5e8e19bf
  5. 15 12月, 2011 1 次提交
  6. 21 4月, 2011 1 次提交
  7. 31 3月, 2011 1 次提交
  8. 17 3月, 2011 1 次提交
  9. 07 1月, 2011 1 次提交
  10. 24 10月, 2010 5 次提交
    • M
      i7core_edac: don't use a freed mci struct · accf74ff
      Mauro Carvalho Chehab 提交于
      This is a nasty bug. Since kobject count will be reduced by zero by
      edac_mc_del_mc(), and this triggers the kobj release method, the
      mci memory will be freed automatically. So, all we have left is ctl_name,
      as shown by enabling debug:
      
      [   80.822186] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device()  remove_link
      [   80.832590] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device()  remove_mci_instance
      [   80.843776] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
      [   80.855163] EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:3f:03.0
      [   80.862936] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2089: (null): free structs
      [   80.871134] EDAC DEBUG: in drivers/edac/edac_mc.c, line at 238: edac_mc_free()
      [   80.878379] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 726: edac_mc_unregister_sysfs_main_kobj()
      [   80.888043] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1232: drivers/edac/i7core_edac.c: i7core_put_devices()
      
      Also, kfree(mci) shouldn't happen at the kobj.release, as it happens
      when edac_remove_sysfs_mci_device() is called, but the logic is:
      	edac_remove_sysfs_mci_device(mci);
      	edac_printk(KERN_INFO, EDAC_MC,
      		"Removed device %d for %s %s: DEV %s\n", mci->mc_idx,
      		mci->mod_name, mci->ctl_name, edac_dev_name(mci));
      So, as the edac_printk() needs the mci struct, this generates an OOPS.
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      accf74ff
    • M
      edac_core: Print debug messages at release calls · bbc560ae
      Mauro Carvalho Chehab 提交于
      This is important to track a nasty bug at the free logic.
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      bbc560ae
    • M
      edac_core: Don't let free(mci) happen while using it · ac99768c
      Mauro Carvalho Chehab 提交于
      A very nasty bug were happening on edac core, due to the way mci objects are
      freed. mci memory is freed when kobject count reaches zero, by
      edac_mci_control_release(). However, from the logs, this is clearly happening
      before the final usage of mci struct:
      
      [15799.607454] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
      [15799.618773] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 769: edac_inst_grp_release()
      [15799.627326] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 894: edac_remove_mci_instance_attributes() end of seeking for group all_channel_counts
      [15799.640887] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 877: edac_remove_mci_instance_attributes() sysfs_attrib = ffffffffa01d7240
      [15799.653412] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device()  remove_link
      [15799.663753] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device()  remove_mci_instance
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      ac99768c
    • M
      edac_core: Do a better job with node removal · 6fe1108f
      Mauro Carvalho Chehab 提交于
      Make sure we remove groups at the right order
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      6fe1108f
    • M
      i7core_edac: Properly mark const static vars as such · 1288c18f
      Mauro Carvalho Chehab 提交于
      There are two groups of sysfs attributes: one for rdimm and another
      for udimm. Instead of changing dynamically the unique static struct
      for handling udimm's, declare two vars and make them constant.
      
      This avoids the risk of having two or more memory controllers, each
      needing a different set of attributes.
      
      While here, use const on all places where it is applicable.
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      
      edac_core: use const for constant sysfs arguments
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      1288c18f
  11. 21 10月, 2010 2 次提交
  12. 03 8月, 2010 1 次提交
    • B
      edac, mc: Improve scrub rate handling · eba042a8
      Borislav Petkov 提交于
      Fortify the interface to not accept negative values, remove
      memctrl_int_store() as a result. Also, sanitize bandwidth setting by
      making the argument a simple u32 instead of strange u32 pointer being
      passed around for no obvious reason. Then, fix error handling and teach
      it to return proper error values. Finally, make code more readable,
      simplify debug messages.
      
      Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
      Cc: Arthur Jones <ajones@riverbed.com>
      Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
      Acked-by: NDoug Thompson <dougthompson@xmission.com>
      eba042a8
  13. 10 5月, 2010 4 次提交
  14. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  15. 08 3月, 2010 1 次提交
  16. 01 7月, 2009 1 次提交
  17. 26 7月, 2008 3 次提交