1. 12 Dec, 2012 1 commit
    • R
      MIPS: Cavium: Add EDAC support. · f65aad41
      Ralf Baechle committed
      Drivers for EDAC on Cavium.  Supported subsystems are:
      
       o CPU primary caches.  These are parity protected only, so only error
         reporting.
       o Second level cache - ECC protected, provides SECDED.
       o Memory: ECC / SECDED if used with suitable DRAM modules.  The driver
         will only initialize if ECC is enabled on a system, so it is safe to
         run on non-ECC memory.
       o PCI: Parity error reporting
      
      Since it is very hard to test this sort of code, the implementation is
      very conservative and uses polling where possible for now.
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
  2. 30 Oct, 2012 1 commit
  3. 24 Oct, 2012 1 commit
  4. 25 Sep, 2012 3 commits
    • M
      sb_edac: Avoid overflow errors at memory size calculation · deb09dda
      Mauro Carvalho Chehab committed
      The Sandy Bridge EDAC driver overflows when calculating the memory size.
      Both the size field and the integer arithmetic use 32 bits, and more
      bits are needed when the DIMMs have high density.
      
      The net result is that memories are improperly reported there, when
      high-density DIMMs are used:
      
      EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 0, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800
      EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 1, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800
      
      As the EDAC core handles the number of pages as an unsigned int, the
      driver shows the 16 GB memories in the sysfs interface as 16760832 MB!
      The fix is simple: calculate the number of pages as an unsigned 64-bit
      integer.
      
      After the patch, the memory size (16 GB) is properly detected:
      
      EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 0, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800
      EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 1, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800
      
      Cc: stable@kernel.org
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
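The arithmetic behind the log lines above can be reproduced in a small userspace sketch. It uses the geometry from the debug output (8 banks, 2 ranks, 0x10000 rows, 0x800 columns) and assumes 8 bytes per addressed cell and 4 KiB pages. The helper names are hypothetical and this is not the driver's code.

```c
#include <stdint.h>

/* Assumption: each (bank, rank, row, col) cell holds 8 bytes, so
 * dividing the cell count by 2^17 yields MiB (2^20 / 2^3 = 2^17). */
#define CELLS_PER_MIB_SHIFT 17
/* Assumption: 4 KiB pages, i.e. 256 pages per MiB. */
#define PAGES_PER_MIB 256

/* Buggy variant: the result is squeezed into a signed 32-bit value. */
static int32_t dimm_size_mib_32(uint32_t banks, uint32_t ranks,
                                uint32_t rows, uint32_t cols)
{
    uint32_t cells = banks * ranks * rows * cols; /* 2^31 for the log's DIMM */

    /* 2^31 does not fit in a signed 32-bit int; on two's-complement
     * compilers the conversion yields a negative number, which is the
     * negative size seen in the "before" log. */
    return (int32_t)cells / (1 << CELLS_PER_MIB_SHIFT);
}

/* Fixed variant: widen to 64 bits before multiplying. */
static uint64_t dimm_size_mib_64(uint32_t banks, uint32_t ranks,
                                 uint32_t rows, uint32_t cols)
{
    uint64_t cells = (uint64_t)banks * ranks * rows * cols;

    return cells >> CELLS_PER_MIB_SHIFT;
}
```

Under these assumptions the 32-bit variant gives -16384 MiB for the DIMM in the log, and reinterpreting the resulting negative page count as an unsigned 32-bit value and converting back to MiB gives the 16760832 figure quoted in the message. The 64-bit variant gives 16384 MiB.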
    • M
      i5000: Fix the memory size calculation with 2R memories · b70f8333
      Mauro Carvalho Chehab committed
      When 2R memories are found, the memory size should be multiplied
      by two, otherwise, it will report half of the memory size:
      
             +-----------------------------------------------+
             |                      mc0                      |
             |        branch0        |        branch1        |
             | channel0  | channel1  | channel0  | channel1  |
      -------+-----------------------------------------------+
      slot3: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
      slot2: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
      -------+-----------------------------------------------+
      slot1: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
      slot0: |  1024 MB  |  1024 MB  |  1024 MB  |  1024 MB  |
      -------+-----------------------------------------------+
      
      (the above machine has 4 x 2GB 2R memories)
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      i3200_edac: Fix memory rank size · 582a8996
      Mauro Carvalho Chehab committed
      Commit a895bf8b incorrectly changed the logic that fills the memory
      bank size. Fix it.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
  5. 24 Sep, 2012 2 commits
    • S
      edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs. · faa2ad09
      Shaun Ruffell committed
      Fix potential NULL pointer dereference in edac_unregister_sysfs() on
      system boot introduced in 3.6-rc1.
      
      Since commit 7a623c03 ("edac: rewrite the sysfs code to use struct
      device") edac_mc_alloc() no longer initializes embedded kobjects in
      struct mem_ctl_info.  Therefore edac_mc_free() can no longer simply
      decrement a kobject reference count to free the allocated memory unless
      the memory controller driver module had also called edac_mc_add_mc().
      
      Now edac_mc_free() will check if the newly embedded struct device has
      been registered with sysfs before using either the standard device
      release functions or freeing the data structures itself with logic
      pulled out of the error path of edac_mc_alloc().
      
      The BUG this patch resolves for me:
      
        BUG: unable to handle kernel NULL pointer dereference at   (null)
        EIP is at __wake_up_common+0x1a/0x6a
        Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 task.ti=f3dc6000)
        Call Trace:
          complete_all+0x3f/0x50
          device_pm_remove+0x23/0xa2
          device_del+0x34/0x142
          edac_unregister_sysfs+0x3b/0x5c [edac_core]
          edac_mc_free+0x29/0x2f [edac_core]
          e7xxx_probe1+0x268/0x311 [e7xxx_edac]
          e7xxx_init_one+0x56/0x61 [e7xxx_edac]
          local_pci_probe+0x13/0x15
        ...
      
      Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
      Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
      Signed-off-by: Shaun Ruffell <sruffell@digium.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
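The decision described above (free directly if the controller was never registered, otherwise go through the unregister path first) can be modeled in userspace. This is only an illustrative sketch: `struct fake_mci`, the `registered` flag and all function names are hypothetical stand-ins, not the kernel's structures or API.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memory controller descriptor. */
struct fake_mci {
    bool registered;  /* set once the object was added to "sysfs" */
    int *payload;     /* stands in for csrows, dimms, and so on */
};

static int unregister_calls; /* counts trips through the registered path */

static struct fake_mci *fake_mc_alloc(void)
{
    struct fake_mci *mci = calloc(1, sizeof(*mci));

    if (!mci)
        return NULL;
    mci->payload = calloc(4, sizeof(*mci->payload));
    if (!mci->payload) {
        free(mci);
        return NULL;
    }
    return mci;
}

/* Releases the memory itself; shared by both paths in this model. */
static void fake_mc_release(struct fake_mci *mci)
{
    free(mci->payload);
    free(mci);
}

static void fake_mc_free(struct fake_mci *mci)
{
    if (!mci->registered) {
        /* Never registered: there is nothing to unregister, so the
         * data structures are freed directly. */
        fake_mc_release(mci);
        return;
    }

    /* Registered: unregister first, then release. */
    unregister_calls++;
    mci->registered = false;
    fake_mc_release(mci);
}
```

A probe routine that fails before registration would then call `fake_mc_free()` without ever touching the unregister path, which is the situation the backtrace above came from.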
    • F
      edac_mc: fix messy kfree calls in the error path · ef6e7816
      Fengguang Wu committed
      coccinelle warns about:
      
      + drivers/edac/edac_mc.c:429:9-23: ERROR: reference preceded by free on line 429
      
         421         if (mci->csrows) {
       > 422                 for (chn = 0; chn < tot_channels; chn++) {
         423                         csr = mci->csrows[chn];
         424                         if (csr) {
       > 425                                 for (chn = 0; chn < tot_channels; chn++)
         426                                          kfree(csr->channels[chn]);
         427                                  kfree(csr);
         428                          }
       > 429                          kfree(mci->csrows[i]);
         430                  }
         431                  kfree(mci->csrows);
         432          }
      
      and that code block seems to mess things up in several ways (double free,
      memory leak, out-of-bound reads, etc.):

      L422: The iterator "chn" and bound "tot_channels" are totally wrong. They
            should be "row" and "tot_csrows" respectively. This means either a
            memory leak or out-of-bound reads (which, if they do not trigger an
            immediate page fault, will further lead to kfree() on random
            addresses).

      L425: The inner loop reuses the same iterator "chn" as the outer loop,
            which could end the outer loop prematurely and hence leak memory.

      L429: The array index 'i' in mci->csrows[i] is a temporary value used in
            previous loops and won't change at all in the current loop. This
            means either an out-of-bound read and possibly kfree(random number),
            or the same mci->csrows[i] gets freed again and again, possibly
            double freeing the csr already released by kfree(csr) in L427.
      
      L426/L427: a kfree(csr->channels) is needed in between to avoid leaking the memory.
      
      The buggy code was introduced by commit de3910eb ("edac: change the mem
      allocation scheme to make Documentation/kobject.txt happy") in the 3.6-rc1
      merge window. Fix it by freeing up resources in this order:
      
        free csrows[i]->channels[j]
        free csrows[i]->channels
        free csrows[i]
        free csrows
      
      CC: Mauro Carvalho Chehab <mchehab@redhat.com>
      CC: Shaun Ruffell <sruffell@digium.com>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
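The freeing order listed above (channels[j], then channels, then csrows[i], then csrows) can be sketched in userspace C. The structures and function names below are simplified, hypothetical stand-ins for the kernel ones, with `free()` in place of `kfree()` and a counter added purely so the order can be checked.

```c
#include <stdlib.h>

struct chan { int idx; };

struct csrow {
    struct chan **channels;
    unsigned nr_channels;
};

static unsigned free_calls; /* counts non-NULL frees, for illustration */

static void counted_free(void *p)
{
    if (p)
        free_calls++;
    free(p);
}

/* Free innermost objects first, each with its own iterator and bound. */
static void free_csrows(struct csrow **csrows, unsigned tot_csrows)
{
    unsigned row, chn;

    if (!csrows)
        return;
    for (row = 0; row < tot_csrows; row++) {
        struct csrow *csr = csrows[row];

        if (!csr)
            continue;
        if (csr->channels) {
            for (chn = 0; chn < csr->nr_channels; chn++)
                counted_free(csr->channels[chn]); /* csrows[i]->channels[j] */
            counted_free(csr->channels);          /* csrows[i]->channels */
        }
        counted_free(csr);                        /* csrows[i] */
    }
    counted_free(csrows);                         /* csrows */
}

/* Allocate every object separately; on failure, unwind with free_csrows(). */
static struct csrow **alloc_csrows(unsigned tot_csrows, unsigned tot_channels)
{
    struct csrow **csrows = calloc(tot_csrows, sizeof(*csrows));
    unsigned row, chn;

    if (!csrows)
        return NULL;
    for (row = 0; row < tot_csrows; row++) {
        struct csrow *csr = calloc(1, sizeof(*csr));

        if (!csr)
            goto fail;
        csrows[row] = csr;
        csr->channels = calloc(tot_channels, sizeof(*csr->channels));
        if (!csr->channels)
            goto fail;
        csr->nr_channels = tot_channels;
        for (chn = 0; chn < tot_channels; chn++) {
            csr->channels[chn] = calloc(1, sizeof(struct chan));
            if (!csr->channels[chn])
                goto fail;
        }
    }
    return csrows;

fail:
    free_csrows(csrows, tot_csrows);
    return NULL;
}
```

Because the arrays are zero-initialized, the same `free_csrows()` routine also works on a partially built structure, so the error path needs no special cases.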
  6. 13 Sep, 2012 1 commit
  7. 14 Aug, 2012 1 commit
    • T
      workqueue: use mod_delayed_work() instead of cancel + queue · 41f63c53
      Tejun Heo committed
      Convert delayed_work users doing cancel_delayed_work() followed by
      queue_delayed_work() to mod_delayed_work().
      
      Most conversions are straight-forward.  Ones worth mentioning are,
      
      * drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
        use mod_delayed_work() and cancel loop in
        edac_mc_reset_delay_period() is dropped.
      
      * drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
        watchdog is active or not.  @fan_watchdog_active and related code
        dropped.
      
      * drivers/power/charger-manager.c: Seemingly a lot of
        delayed_work_pending() abuse going on here.
        [delayed_]work_pending() are unsynchronized and racy when used like
        this.  I converted one instance in fullbatt_handler().  Please
        convert the rest so that it invokes workqueue APIs for the intended
        target state rather than trying to game work item pending state
        transitions.  e.g. if timer should be modified - call
        mod_delayed_work(), canceled - call cancel_delayed_work[_sync]().
      
      * drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
        simplified.  Note that round_jiffies() calls in this function are
        meaningless.  round_jiffies() works on absolute jiffies, not the delta
        delay used by delayed_work.
      
      v2: Tomi pointed out that __cancel_delayed_work() users can't be
          safely converted to mod_delayed_work().  They could be calling it
          from irq context and if that happens while delayed_work_timer_fn()
          is running, it could deadlock.  __cancel_delayed_work() users are
          dropped.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Doug Thompson <dougthompson@xmission.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: "John W. Linville" <linville@tuxdriver.com>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
  8. 27 Jun, 2012 4 commits
  9. 12 Jun, 2012 24 commits
    • R
      edac: create top-level debugfs directory · e7930ba4
      Rob Herring committed
      Create a single, top-level "edac" directory for debugfs. An "mc[0-N]"
      directory is then created for each memory controller. Individual drivers
      can create additional entries such as h/w error injection control.
      Signed-off-by: Rob Herring <rob.herring@calxeda.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      sb_edac: properly handle error count · c1053839
      Mauro Carvalho Chehab committed
      Instead of reporting the error count via driver-specific details,
      use the new way provided by edac_mc_handle_error.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      i7core_edac: properly handle error count · 00d18339
      Mauro Carvalho Chehab committed
      Instead of generating a burst of errors or reporting the error
      count via driver-specific details, use the new way provided by
      edac_mc_handle_error.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      edac: edac_mc_handle_error(): add an error_count parameter · 9eb07a7f
      Mauro Carvalho Chehab committed
      In order to avoid losing error events, it is desirable to group
      error events together and generate a single trace for several identical
      errors.
      
      The trace API already allows reporting multiple errors. Change the
      handle_error function to also allow that.
      
      The changes at the drivers were made by this small script:
      
      	$file .=$_ while (<>);
      	$file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g;
      	print $file;
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
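The idea of reporting several identical errors in one call can be sketched as follows. This is a simplified userspace model with hypothetical names (`handle_error`, `struct err_counters`), not the kernel's `edac_mc_handle_error()`; it only shows how a count parameter lets one report stand in for many events.

```c
#include <stdint.h>

enum err_type { ERR_CORRECTED, ERR_UNCORRECTED };

/* Hypothetical per-controller counters. */
struct err_counters {
    uint32_t ce_count; /* corrected errors seen */
    uint32_t ue_count; /* uncorrected errors seen */
    uint32_t reports;  /* number of reports (traces) generated */
};

/* Account for error_count identical errors with a single report. */
static void handle_error(struct err_counters *c, enum err_type type,
                         uint16_t error_count)
{
    if (error_count == 0)
        return;

    if (type == ERR_CORRECTED)
        c->ce_count += error_count;
    else
        c->ue_count += error_count;

    c->reports++; /* one report, however many errors it covers */
}
```

A driver that reads a hardware counter showing five corrected errors since the last poll could then make one call with a count of 5 instead of five separate calls. Callers that always report a single error pass 1, which matches what the script above inserts.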
    • M
      edac: remove arch-specific parameter for the error handler · 03f7eae8
      Mauro Carvalho Chehab committed
      Remove the arch-dependent parameter, as it was not used because the
      MCE tracepoint was never implemented. It probably doesn't make sense
      to have an MCE-specific tracepoint, as this would cost more bytes at
      the tracepoint, and tracepoints are not free.
      
      The changes at the EDAC drivers were done by this small perl script:
      
      	$file .=$_ while (<>);
      	$file =~ s/(edac_mc_handle_error)\s*\(([^\;]+)\,([^\,\)]+)\s*\)/$1($2)/g;
      	print $file;
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      amd64_edac: Don't pass driver name as an error parameter · 075f3090
      Mauro Carvalho Chehab committed
      The EDAC driver name doesn't help to handle EDAC errors. So,
      remove it from the EDAC error messages, preserving only the
      error_message.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • D
      edac_mc: check for allocation failure in edac_mc_alloc() · 08a4a136
      Dan Carpenter committed
      Add a check here for kzalloc() failure.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      edac: Increase version to 3.0.0 · 5156a5f4
      Mauro Carvalho Chehab committed
      There were lots of changes introduced to justify renaming it to
      3.0.0:

        - The EDAC core was redesigned to represent all types of
          memory controllers;

        - The EDAC API was redesigned to properly represent the memory
          controller hierarchy;

        - A tracepoint-based API was added to report memory errors.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      edac_mc: Cleanup per-dimm_info debug messages · 6e84d359
      Mauro Carvalho Chehab committed
      The edac_mc_alloc() routine allocates one dimm_info device for all
      possible memories, including the non-filled ones. The debug messages
      there are somewhat confusing. So, clean them up by moving the code
      that prints the memory location to edac_mc, and using it in both
      edac_mc_sysfs and edac_mc.

      Also, only dump information when DIMMs/ranks are actually
      filled.
      
      After this patch, a dimm-based memory controller will print the debug
      info as:
      
      [ 1011.380027] EDAC DEBUG: edac_mc_dump_csrow: csrow->csrow_idx = 0
      [ 1011.380029] EDAC DEBUG: edac_mc_dump_csrow:   csrow = ffff8801169be000
      [ 1011.380031] EDAC DEBUG: edac_mc_dump_csrow:   csrow->first_page = 0x0
      [ 1011.380032] EDAC DEBUG: edac_mc_dump_csrow:   csrow->last_page = 0x0
      [ 1011.380034] EDAC DEBUG: edac_mc_dump_csrow:   csrow->page_mask = 0x0
      [ 1011.380035] EDAC DEBUG: edac_mc_dump_csrow:   csrow->nr_channels = 3
      [ 1011.380037] EDAC DEBUG: edac_mc_dump_csrow:   csrow->channels = ffff8801149c2840
      [ 1011.380039] EDAC DEBUG: edac_mc_dump_csrow:   csrow->mci = ffff880117426000
      [ 1011.380041] EDAC DEBUG: edac_mc_dump_channel:   channel->chan_idx = 0
      [ 1011.380042] EDAC DEBUG: edac_mc_dump_channel:     channel = ffff8801149c2860
      [ 1011.380044] EDAC DEBUG: edac_mc_dump_channel:     channel->csrow = ffff8801169be000
      [ 1011.380046] EDAC DEBUG: edac_mc_dump_channel:     channel->dimm = ffff88010fe90400
      ...
      [ 1011.380095] EDAC DEBUG: edac_mc_dump_dimm: dimm0: channel 0 slot 0 mapped as virtual row 0, chan 0
      [ 1011.380097] EDAC DEBUG: edac_mc_dump_dimm:   dimm = ffff88010fe90400
      [ 1011.380099] EDAC DEBUG: edac_mc_dump_dimm:   dimm->label = 'CPU#0Channel#0_DIMM#0'
      [ 1011.380101] EDAC DEBUG: edac_mc_dump_dimm:   dimm->nr_pages = 0x40000
      [ 1011.380103] EDAC DEBUG: edac_mc_dump_dimm:   dimm->grain = 8
      [ 1011.380104] EDAC DEBUG: edac_mc_dump_dimm:   dimm->nr_pages = 0x40000
      ...
      
      (a rank-based memory controller would print, instead of "dimm?", "rank?"
       on the above debug info)
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • J
      edac: Convert debugfX to edac_dbg(X, · 956b9ba1
      Joe Perches committed
      Use a more common debugging style.

      Remove __FILE__ uses, add missing newlines,
      coalesce formats and align arguments.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • J
      edac: Use more normal debugging macro style · 7e881856
      Joe Perches committed
      Convert macros to a simpler style and enforce appropriate
      format checking when not CONFIG_EDAC_DEBUG.

      Use fmt and __VA_ARGS__, neaten macros.

      Move some string arrays to the debugfx uses and remove the
      now unnecessary CONFIG_EDAC_DEBUG variable block definitions.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
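The macro style described above (a `fmt` parameter plus `__VA_ARGS__`, with format checking preserved even when debugging is compiled out) can be illustrated with a userspace sketch. All names here (`sketch_dbg`, `dbg_emit`, `dbg_level`, `SKETCH_DEBUG`) are hypothetical, and the sketch relies on the GNU `##__VA_ARGS__` extension and the GCC/Clang `format` attribute; it is not the kernel's macro.

```c
#include <stdarg.h>
#include <stdio.h>

#ifndef SKETCH_DEBUG
#define SKETCH_DEBUG 1 /* assumption: debugging enabled for this demo */
#endif

static int dbg_level;   /* hypothetical runtime verbosity threshold */
static int dbg_emitted; /* number of messages actually printed */

/* printf-style sink; the attribute lets the compiler check arguments. */
__attribute__((format(printf, 1, 2)))
static void dbg_emit(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    dbg_emitted++;
}

#if SKETCH_DEBUG
/* Debugging on: print when the message level is within the threshold.
 * The function name is prefixed here, so callers need not pass it. */
#define sketch_dbg(level, fmt, ...)                             \
do {                                                            \
    if ((level) <= dbg_level)                                   \
        dbg_emit("%s: " fmt, __func__, ##__VA_ARGS__);          \
} while (0)
#else
/* Debugging off: "if (0)" keeps the call visible to the compiler, so
 * format strings are still checked, but no output is ever produced. */
#define sketch_dbg(level, fmt, ...)                             \
do {                                                            \
    if (0)                                                      \
        dbg_emit("%s: " fmt, __func__, ##__VA_ARGS__);          \
} while (0)
#endif
```

A call such as `sketch_dbg(0, "value=%d\n", 42)` prints when `dbg_level` is 0 or higher, while a mismatched argument type is flagged at compile time in both configurations.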
    • M
      edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs · dd23cd6e
      Mauro Carvalho Chehab committed
      The debug macro already adds that. Most of the work here was
      made by this small script:
      
      $f .=$_ while (<>);
      
      $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*": /\1"/g;
      $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*/\1/g;
      $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*"MC: /\1"/g;
      
      $f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g;
      $f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g;
      $f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g;
      $f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g;
      
      $f =~ s/\"MC\: \\n\"/"MC:\\n"/g;
      
      print $f;
      
      After running the script, manual cleanups were done to fix the remaining
      places.

      While here, removed the __LINE__ in most places, as it doesn't actually
      give useful info there.
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy · 356f0a30
      Mauro Carvalho Chehab committed
      Kernel kobjects have rigid rules: each container object should be
      dynamically allocated, and can't be allocated into a single kmalloc.
      
      EDAC never obeyed this rule: it has a single malloc function that
      allocates all needed data into a single kzalloc.
      
      As this is not accepted anymore, change the allocation schema of the
      EDAC *_info structs to enforce this Kernel standard.
      
      Cc: Aristeu Rozanski <arozansk@redhat.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      edac: change the mem allocation scheme to make Documentation/kobject.txt happy · de3910eb
      Mauro Carvalho Chehab committed
      Kernel kobjects have rigid rules: each container object should be
      dynamically allocated, and can't be allocated into a single kmalloc.
      
      EDAC never obeyed this rule: it has a single malloc function that
      allocates all needed data into a single kzalloc.
      
      As this is not accepted anymore, change the allocation schema of the
      EDAC *_info structs to enforce this Kernel standard.
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Greg K H <gregkh@linuxfoundation.org>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: Tim Small <tim@buttersideup.com>
      Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
      Cc: "Arvind R." <arvino55@gmail.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Egor Martovetsky <egor@pasemi.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hitoshi Mitake <h.mitake@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      edac: Only expose csrows/channels on legacy API if they're populated · e39f4ea9
      Mauro Carvalho Chehab committed
      This patch actually fixes a bug with the legacy API, where, at the
      same csrow, some channels may have different DIMMs. This can happen
      on FB-DIMM/RAMBUS and modern Intel controllers.
      
      This is the case, for example, of Nehalem machines:
      
      $ ./edac-ctl --layout
             +-----------------------------------+
             |                mc0                |
             | channel0  | channel1  | channel2  |
      -------+-----------------------------------+
      slot2: |     0 MB  |     0 MB  |     0 MB  |
      slot1: |  1024 MB  |     0 MB  |     0 MB  |
      slot0: |  1024 MB  |  1024 MB  |  1024 MB  |
      -------+-----------------------------------+
      
      Before this patch, non-filled memories were shown. Now, only what's
      filled is there:
      
      grep . /sys/devices/system/edac/mc/mc0/csrow*/ch?*
      /sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:0
      /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label:CPU#0Channel#0_DIMM#0
      /sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:0
      /sys/devices/system/edac/mc/mc0/csrow0/ch1_dimm_label:CPU#0Channel#0_DIMM#1
      /sys/devices/system/edac/mc/mc0/csrow1/ch0_ce_count:0
      /sys/devices/system/edac/mc/mc0/csrow1/ch0_dimm_label:CPU#0Channel#1_DIMM#0
      /sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count:0
      /sys/devices/system/edac/mc/mc0/csrow2/ch0_dimm_label:CPU#0Channel#2_DIMM#0
      
      Thanks-to: Aristeu Rozanski Filho <arozansk@redhat.com>
      Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
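The filtering shown in the sysfs listing above can be sketched in userspace C. The sketch assumes, reading off that listing, that each channel maps to a legacy csrow and each slot to a legacy "ch" entry, and that an empty slot is one with a size of 0 MB. The types and function name are hypothetical.

```c
#include <stddef.h>

#define N_CHANNELS 3
#define N_SLOTS 3

/* Hypothetical legacy node: csrowN/chM as seen in the sysfs listing. */
struct legacy_node {
    unsigned csrow;
    unsigned ch;
};

/* Collect only the populated (channel, slot) pairs.
 * Returns the number of nodes written to out, at most max_out. */
static size_t list_populated(const unsigned size_mb[N_CHANNELS][N_SLOTS],
                             struct legacy_node *out, size_t max_out)
{
    size_t n = 0;

    for (unsigned channel = 0; channel < N_CHANNELS; channel++) {
        for (unsigned slot = 0; slot < N_SLOTS; slot++) {
            if (size_mb[channel][slot] == 0)
                continue; /* empty slot: do not expose it */
            if (n == max_out)
                return n; /* output buffer is full */
            out[n].csrow = channel; /* assumed mapping: channel -> csrow */
            out[n].ch = slot;       /* assumed mapping: slot -> ch */
            n++;
        }
    }
    return n;
}
```

For the Nehalem layout above (two DIMMs on channel 0, one each on channels 1 and 2), this yields four nodes, matching the four ce_count/dimm_label pairs in the listing.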
    • M
      edac: Move grain/dtype/edac_type calculus to be out of channel loop · fd63312d
      Mauro Carvalho Chehab committed
      The 3e7bddc changeset (edac: move dimm properties to struct memset_info)
      moved the calculation inside a loop. However, as those values are common
      to all channels on several drivers, it is better to do the calculation
      outside the loop, to optimize the code.
      Reported-by: Aristeu Rozanski Filho <arozansk@redhat.com>
      Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      edac: Add debufs nodes to allow doing fake error inject · 452a6bf9
      Mauro Carvalho Chehab committed
      Sometimes, it is useful to have a mechanism that generates fake
      errors, in order to test the EDAC core code, and the userspace
      tools.

      Provide such a mechanism by adding a few debugfs nodes.
      Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      edac: add a sysfs node to report the maximum location for the system · 8ad6c78a
      Mauro Carvalho Chehab committed
      The userspace tools need to know the maximum location on each
      system, as it helps to create nice maps showing how the memory is
      populated on the system.
      Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      edac: add a new per-dimm API and make the old per-virtual-rank API obsolete · 19974710
      Mauro Carvalho Chehab committed
      The old EDAC API is broken. It only works fine for systems manufactured
      before 2005 and for AMD 64. The reason is that it forces all memory
      controller drivers to discover rank info.
      
      Also, it doesn't allow grouping the several ranks into a DIMM.
      
      So, what almost all modern drivers do is to create a fake virtual-rank
      information, and use it to cheat the EDAC core to accept the driver.
      
      While this works if the user has enough time to discover what DIMM slot
      corresponds to each "virtual-rank" information, it prevents EDAC usage
      for users with less available time. It also makes life hard for vendors
      that may want to provide a table with their motherboards to the userspace
      tool (edac-utils) as each driver has its own logic for the virtual
      mapping.
      
      So, the old API should be removed, in favor of a more flexible API that
      allows newer drivers to not lie to the EDAC core.
      Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: Hui Wang <jason77.wang@gmail.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      edac: Get rid of the old kobj's from the edac mc code · d90c0089
      Mauro Carvalho Chehab committed
      Now that all users of the old kobj raw access are gone,
      we can get rid of the legacy kobj-based structures and
      data.
      Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      i7core_edac: convert it to use struct device · 5c4cdb5a
      Mauro Carvalho Chehab committed
      Instead of relying on complex logic inside the edac core to create
      a "device tree-like" sysfs struct, just use device_add.
      Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      amd64_edac: convert sysfs logic to use struct device · c5608759
      Mauro Carvalho Chehab committed
      Now that the EDAC core supports struct device, there's no sense
      in having any logic in the EDAC core to simulate it. So, instead
      of adding such logic there, change the logic in amd64_edac to
      use it.
      Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      mpc85xx_edac: convert sysfs logic to use struct device · ba004239
      Mauro Carvalho Chehab committed
      Now that the EDAC core supports struct device, there's no sense in
      having any logic in the EDAC core to simulate it. So, instead of adding
      such logic there, change the logic in mpc85xx_edac to use it.

      Compile-tested only.
      Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • M
      edac: rewrite the sysfs code to use struct device · 7a623c03
      Mauro Carvalho Chehab committed
      The EDAC subsystem uses the old struct sysdev approach,
      creating all nodes using the raw sysfs API. This is bad,
      as the API is deprecated.
      
      As we'll be changing the EDAC API, let's first port the existing
      code to struct device.
      
      There's one drawback to this patch: driver-specific sysfs
      nodes, used by mpc85xx_edac, amd64_edac and i7core_edac,
      won't be created anymore. While it would be possible to
      also port the device-specific code, that would mix kobj with
      struct device, which is not recommended. Also, it is easier and nicer
      to move the code to the drivers instead, as the core can get rid
      of some complex logic that just emulates what device_add()
      and device_create_file() already do.
      
      The next patches will convert the driver-specific code to use
      the device-specific calls. Then, the remaining bits of the old
      sysfs API will be removed.
      
      NOTE: a per-MC bus is required, otherwise devices with more than
      one memory controller will hit a bug like the one below:
      
      [  819.094946] EDAC DEBUG: find_mci_by_dev: find_mci_by_dev()
      [  819.094948] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device() idx=1
      [  819.094952] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device(): creating device mc1
      [  819.094967] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device creating dimm0, located at channel 0 slot 0
      [  819.094984] ------------[ cut here ]------------
      [  819.100142] WARNING: at fs/sysfs/dir.c:481 sysfs_add_one+0xc1/0xf0()
      [  819.107282] Hardware name: S2600CP
      [  819.111078] sysfs: cannot create duplicate filename '/bus/edac/devices/dimm0'
      [  819.119062] Modules linked in: sb_edac(+) edac_core ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun kvm microcode pcspkr iTCO_wdt iTCO_vendor_support igb i2c_i801 i2c_core sg ioatdma dca sr_mod cdrom sd_mod crc_t10dif ahci libahci isci libsas libata scsi_transport_sas scsi_mod wmi dm_mod [last unloaded: scsi_wait_scan]
      [  819.175748] Pid: 10902, comm: modprobe Not tainted 3.3.0-0.11.el7.v12.2.x86_64 #1
      [  819.184113] Call Trace:
      [  819.186868]  [<ffffffff8105adaf>] warn_slowpath_common+0x7f/0xc0
      [  819.193573]  [<ffffffff8105aea6>] warn_slowpath_fmt+0x46/0x50
      [  819.200000]  [<ffffffff811f53d1>] sysfs_add_one+0xc1/0xf0
      [  819.206025]  [<ffffffff811f5cf5>] sysfs_do_create_link+0x135/0x220
      [  819.212944]  [<ffffffff811f7023>] ? sysfs_create_group+0x13/0x20
      [  819.219656]  [<ffffffff811f5df3>] sysfs_create_link+0x13/0x20
      [  819.226109]  [<ffffffff813b04f6>] bus_add_device+0xe6/0x1b0
      [  819.232350]  [<ffffffff813ae7cb>] device_add+0x2db/0x460
      [  819.238300]  [<ffffffffa0325634>] edac_create_dimm_object+0x84/0xf0 [edac_core]
      [  819.246460]  [<ffffffffa0325e18>] edac_create_sysfs_mci_device+0xe8/0x290 [edac_core]
      [  819.255215]  [<ffffffffa0322e2a>] edac_mc_add_mc+0x5a/0x2c0 [edac_core]
      [  819.262611]  [<ffffffffa03412df>] sbridge_register_mci+0x1bc/0x279 [sb_edac]
      [  819.270493]  [<ffffffffa03417a3>] sbridge_probe+0xef/0x175 [sb_edac]
      [  819.277630]  [<ffffffff813ba4e8>] ? pm_runtime_enable+0x58/0x90
      [  819.284268]  [<ffffffff812f430c>] local_pci_probe+0x5c/0xd0
      [  819.290508]  [<ffffffff812f5ba1>] __pci_device_probe+0xf1/0x100
      [  819.297117]  [<ffffffff812f5bea>] pci_device_probe+0x3a/0x60
      [  819.303457]  [<ffffffff813b1003>] really_probe+0x73/0x270
      [  819.309496]  [<ffffffff813b138e>] driver_probe_device+0x4e/0xb0
      [  819.316104]  [<ffffffff813b149b>] __driver_attach+0xab/0xb0
      [  819.322337]  [<ffffffff813b13f0>] ? driver_probe_device+0xb0/0xb0
      [  819.329151]  [<ffffffff813af5d6>] bus_for_each_dev+0x56/0x90
      [  819.335489]  [<ffffffff813b0d7e>] driver_attach+0x1e/0x20
      [  819.341534]  [<ffffffff813b0980>] bus_add_driver+0x1b0/0x2a0
      [  819.347884]  [<ffffffffa0347000>] ? 0xffffffffa0346fff
      [  819.353641]  [<ffffffff813b19f6>] driver_register+0x76/0x140
      [  819.359980]  [<ffffffff8159f18b>] ? printk+0x51/0x53
      [  819.365524]  [<ffffffffa0347000>] ? 0xffffffffa0346fff
      [  819.371291]  [<ffffffff812f5896>] __pci_register_driver+0x56/0xd0
      [  819.378096]  [<ffffffffa0347054>] sbridge_init+0x54/0x1000 [sb_edac]
      [  819.385231]  [<ffffffff8100203f>] do_one_initcall+0x3f/0x170
      [  819.391577]  [<ffffffff810bcd2e>] sys_init_module+0xbe/0x230
      [  819.397926]  [<ffffffff815bb529>] system_call_fastpath+0x16/0x1b
      [  819.404633] ---[ end trace 1654fdd39556689f ]---
      
      This happens because the bus is not being properly initialized.
      Instead of placing the memory sub-devices under their memory controller,
      it puts everything in the same directory:
      
      $ tree /sys/bus/edac/
      /sys/bus/edac/
      ├── devices
      │   ├── all_channel_counts -> ../../../devices/system/edac/mc/mc0/all_channel_counts
      │   ├── csrow0 -> ../../../devices/system/edac/mc/mc0/csrow0
      │   ├── csrow1 -> ../../../devices/system/edac/mc/mc0/csrow1
      │   ├── csrow2 -> ../../../devices/system/edac/mc/mc0/csrow2
      │   ├── dimm0 -> ../../../devices/system/edac/mc/mc0/dimm0
      │   ├── dimm1 -> ../../../devices/system/edac/mc/mc0/dimm1
      │   ├── dimm3 -> ../../../devices/system/edac/mc/mc0/dimm3
      │   ├── dimm6 -> ../../../devices/system/edac/mc/mc0/dimm6
      │   ├── inject_addrmatch -> ../../../devices/system/edac/mc/mc0/inject_addrmatch
      │   ├── mc -> ../../../devices/system/edac/mc
      │   └── mc0 -> ../../../devices/system/edac/mc/mc0
      ├── drivers
      ├── drivers_autoprobe
      ├── drivers_probe
      └── uevent
      
      On a system with multiple memory controllers, the names "csrow%d" and
      "dimm%d" should live under "mc%d", not at the top level of the hierarchy.
      
      So we need to create a per-MC bus, so that each memory controller gets
      its own namespace.
      Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Greg K H <gregkh@linuxfoundation.org>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      7a623c03
  10. 11 Jun, 2012 2 commits
    • C
      edac: Do alignment logic properly in edac_align_ptr() · 8447c4d1
      Chris Metcalf committed
      The logic was checking the sizeof the structure being allocated to
      determine whether an alignment fixup was required.  This isn't right;
      what we actually care about is the alignment of the actual pointer that's
      about to be returned.  This became an issue recently because struct
      edac_mc_layer has a size that is not zero modulo eight, so we were
      taking the correctly-aligned pointer and forcing it to be misaligned.
      On Tile this caused an alignment exception.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      8447c4d1
    • M
      edac: Rename the parent dev to pdev · fd687502
      Mauro Carvalho Chehab committed
      As EDAC doesn't use struct device itself, it created a parent dev
      pointer called "dev".  Now that we'll be converting it to use
      struct device, instead of struct devsys, this needs to be fixed.
      
      No functional changes.
      Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: Jason Uhlenkott <juhlenko@akamai.com>
      Cc: Tim Small <tim@buttersideup.com>
      Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
      Cc: "Arvind R." <arvino55@gmail.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Egor Martovetsky <egor@pasemi.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Joe Perches <joe@perches.com>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hitoshi Mitake <h.mitake@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
      Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
      Cc: Josh Boyer <jwboyer@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      fd687502