1. 10 4月, 2017 4 次提交
  2. 28 1月, 2017 1 次提交
  3. 25 12月, 2016 1 次提交
  4. 15 12月, 2016 2 次提交
  5. 14 11月, 2016 1 次提交
  6. 03 6月, 2016 1 次提交
    • N
      EDAC: Fix workqueues poll period resetting · fbedcaf4
      Nicholas Krause 提交于
      After the workqueue cleanup, we're registering workqueues based on
      the presence of an ->edac_check function. When that is the case,
      we're setting OP_RUNNING_POLL. But we forgot to check that in
      edac_mc_reset_delay_period(), leading to:
      
        BUG: unable to handle kernel paging request at 0000000000015d10
        IP: [ .. ] queued_spin_lock_slowpath
        PGD 3ffcc8067 PUD 3ffc56067 PMD 0
        Oops: 0002 [#1] SMP
        Modules linked in: ...
        CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1
        Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
        Stack:
        Call Trace:
          ? _raw_spin_lock_irqsave
          ? lock_timer_base.isra.34
          ? del_timer
          ? try_to_grab_pending
          ? mod_delayed_work_on
          ? edac_mc_reset_delay_period
          ? edac_set_poll_msec
          ? param_attr_store
          ? module_attr_store
          ? kernfs_fop_write
          ? __vfs_write
          ? __vfs_read
          ? __alloc_fd
          ? vfs_write
          ? SyS_write
          ? entry_SYSCALL_64_fastpath
        Code:
        RIP  [ .. ] queued_spin_lock_slowpath
         RSP <>
        CR2: 0000000000015d10
        ---[ end trace 3f286bc71cca15d1 ]---
        Kernel panic - not syncing: Fatal exception
      
      Fix it.
      Signed-off-by: NNicholas Krause <xerofoify@gmail.com>
      Cc: <stable@vger.kernel.org> # 4.5
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1463697958-13406-1-git-send-email-xerofoify@gmail.com
      [ Rewrite commit message. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      fbedcaf4
  7. 24 4月, 2016 1 次提交
  8. 02 2月, 2016 3 次提交
  9. 11 12月, 2015 2 次提交
    • B
      EDAC: Rework workqueue handling · c4cf3b45
      Borislav Petkov 提交于
      Hide the EDAC workqueue pointer in a separate compilation unit and add
      accessors for the workqueue manipulations needed.
      
      Remove edac_pci_reset_delay_period() which wasn't used by anything. It
      seems it got added without a user with
      
        91b99041 ("drivers/edac: updated PCI monitoring")
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      c4cf3b45
    • B
      EDAC: Robustify workqueues destruction · fcd5c4dd
      Borislav Petkov 提交于
      EDAC workqueue destruction is really fragile. We cancel delayed work
      but if it is still running and requeues itself, we still go ahead and
      destroy the workqueue and the queued work explodes when workqueue core
      attempts to run it.
      
      Make the destruction more robust by switching op_state to offline so
      that requeuing stops. Cancel any pending work *synchronously* too.
      
        EDAC i7core: Driver loaded.
        general protection fault: 0000 [#1] SMP
        CPU 12
        Modules linked in:
        Supported: Yes
        Pid: 0, comm: kworker/0:1 Tainted: G          IE   3.0.101-0-default #1 HP ProLiant DL380 G7
        RIP: 0010:[<ffffffff8107dcd7>]  [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0
        < ... regs ...>
        Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600)
        Stack:
         ...
        Call Trace:
         call_timer_fn
         run_timer_softirq
         __do_softirq
         call_softirq
         do_softirq
         irq_exit
         smp_apic_timer_interrupt
         apic_timer_interrupt
         intel_idle
         cpuidle_idle_call
         cpu_idle
        Code: ...
        RIP  __queue_work
         RSP <...>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      fcd5c4dd
  10. 23 10月, 2015 1 次提交
  11. 28 5月, 2015 1 次提交
    • B
      EDAC: Cleanup atomic_scrub mess · b01aec9b
      Borislav Petkov 提交于
      So first of all, this atomic_scrub() function's naming is bad. It looks
      like an atomic_t helper. Change it to edac_atomic_scrub().
      
      The bigger problem is that this function is arch-specific and every new
      arch which doesn't necessarily need that functionality still needs to
      define it, otherwise EDAC doesn't compile.
      
      So instead of doing that and including arch-specific headers, have each
      arch define an EDAC_ATOMIC_SCRUB symbol which can be used in edac_mc.c
      for ifdeffery. Much cleaner.
      
      And we already are doing this with another symbol - EDAC_SUPPORT. This
      is also much cleaner than having CONFIG_EDAC enumerate all the arches
      which need/have EDAC support and drivers.
      
      This way I can kill the useless edac.h header in tile too.
      Acked-by: NRalf Baechle <ralf@linux-mips.org>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: NChris Metcalf <cmetcalf@ezchip.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Doug Thompson <dougthompson@xmission.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-edac@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: "Maciej W. Rozycki" <macro@codesourcery.com>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Steven J. Hill" <Steven.Hill@imgtec.com>
      Cc: x86@kernel.org
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      b01aec9b
  12. 23 2月, 2015 1 次提交
  13. 20 10月, 2014 2 次提交
  14. 02 9月, 2014 1 次提交
  15. 24 6月, 2014 1 次提交
  16. 09 5月, 2014 1 次提交
  17. 14 2月, 2014 2 次提交
  18. 05 11月, 2013 1 次提交
  19. 24 7月, 2013 1 次提交
    • B
      EDAC: Fix lockdep splat · 88d84ac9
      Borislav Petkov 提交于
      Fix the following:
      
      BUG: key ffff88043bdd0330 not in .data!
      ------------[ cut here ]------------
      WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
      DEBUG_LOCKS_WARN_ON(1)
      Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
      CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
      Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
       0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
       ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
       ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
      Call Trace:
        dump_stack
        warn_slowpath_common
        warn_slowpath_fmt
        lockdep_init_map
        ? trace_hardirqs_on_caller
        ? trace_hardirqs_on
        debug_mutex_init
        __mutex_init
        bus_register
        edac_create_sysfs_mci_device
        edac_mc_add_mc
        sbridge_probe
        pci_device_probe
        driver_probe_device
        __driver_attach
        ? driver_probe_device
        bus_for_each_dev
        driver_attach
        bus_add_driver
        driver_register
        __pci_register_driver
        ? 0xffffffffa0010fff
        sbridge_init
        ? 0xffffffffa0010fff
        do_one_initcall
        load_module
        ? unset_module_init_ro_nx
        SyS_init_module
        tracesys
      ---[ end trace d24a70b0d3ddf733 ]---
      EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
      EDAC sbridge: Driver loaded.
      
      What happens is that bus_register needs a statically allocated lock_key
      because the last is handed in to lockdep. However, struct mem_ctl_info
      embeds struct bus_type (the whole struct, not a pointer to it) and the
      whole thing gets dynamically allocated.
      
      Fix this by using a statically allocated struct bus_type for the MC bus.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NMauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: stable@kernel.org # v3.10
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      88d84ac9
  20. 16 3月, 2013 1 次提交
  21. 22 2月, 2013 2 次提交
  22. 21 2月, 2013 2 次提交
    • M
      edac: lock module owner to avoid error report conflicts · 80cc7d87
      Mauro Carvalho Chehab 提交于
      APEI GHES and i7core_edac/sb_edac currently can be loaded at
      the same time, but those are Highlander modules:
      	"There can be only one".
      
      There are two reasons for that:
      
      1) Each driver assumes that it is the only one registering at
         the EDAC core, as it is driver's responsibility to number
         the memory controllers, and all of them start from 0;
      
      2) If BIOS is handling the memory errors, the OS can't also be
         doing it, as one will mangle with the other.
      
      So, we need to add an module owner's lock at the EDAC core,
      in order to avoid having two different modules handling memory
      errors at the same time. The best way for doing this lock seems
      to use the driver's name, as this is unique, and won't require
      changes on every driver.
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      80cc7d87
    • M
      edac: add a new memory layer type · c66b5a79
      Mauro Carvalho Chehab 提交于
      There are some cases where the memory controller layout is
      completely hidden. This is the case of firmware-driven error
      code, like the one provided by GHES. Add a new layer to be
      used on such memory error report mechanisms.
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      c66b5a79
  23. 30 1月, 2013 1 次提交
  24. 21 12月, 2012 1 次提交
  25. 28 11月, 2012 2 次提交
  26. 25 10月, 2012 1 次提交
    • M
      edac: Fix the dimm filling for csrows-based layouts · 24bef66e
      Mauro Carvalho Chehab 提交于
      The driver is currently filling data in a wrong way, on drivers
      for csrows-based memory controller, when the first layer is a
      csrow.
      
      This is not easily to notice, as, in general, memories are
      filed in dual, interleaved, symetric mode, as very few memory
      controllers support asymetric modes.
      
      While digging into a bug for i82795_edac driver, the asymetric
      mode there is now working, allowing us to fill the machine with
      4x1GB ranks at channel 0, and 2x512GB at channel 1:
      
      Channel 0 ranks:
      EDAC DEBUG: i82975x_init_csrows: DIMM A0: from page 0x00000000 to 0x0003ffff (size: 0x00040000 pages)
      EDAC DEBUG: i82975x_init_csrows: DIMM A1: from page 0x00040000 to 0x0007ffff (size: 0x00040000 pages)
      EDAC DEBUG: i82975x_init_csrows: DIMM A2: from page 0x00080000 to 0x000bffff (size: 0x00040000 pages)
      EDAC DEBUG: i82975x_init_csrows: DIMM A3: from page 0x000c0000 to 0x000fffff (size: 0x00040000 pages)
      
      Channel 1 ranks:
      EDAC DEBUG: i82975x_init_csrows: DIMM B0: from page 0x00100000 to 0x0011ffff (size: 0x00020000 pages)
      EDAC DEBUG: i82975x_init_csrows: DIMM B1: from page 0x00120000 to 0x0013ffff (size: 0x00020000 pages)
      
      Instead of properly showing the memories as such, before this patch, it
      shows the memory layout as:
      
                +-----------------------------------+
                |                mc0                |
                |  csrow0   |  csrow1   |  csrow2   |
      ----------+-----------------------------------+
      channel1: |  1024 MB  |  1024 MB  |   512 MB  |
      channel0: |  1024 MB  |  1024 MB  |   512 MB  |
      ----------+-----------------------------------+
      
      as if both channels were symetric, grouping the DIMMs on a wrong
      layout.
      
      After this patch, the memory is correctly represented.
      So, for csrows at layers[0], it shows:
      
                +-----------------------------------------------+
                |                      mc0                      |
                |  csrow0   |  csrow1   |  csrow2   |  csrow3   |
      ----------+-----------------------------------------------+
      channel1: |   512 MB  |   512 MB  |     0 MB  |     0 MB  |
      channel0: |  1024 MB  |  1024 MB  |  1024 MB  |  1024 MB  |
      ----------+-----------------------------------------------+
      
      For csrows at layers[1], it shows:
      
              +-----------------------+
              |          mc0          |
              | channel0  | channel1  |
      --------+-----------------------+
      csrow3: |  1024 MB  |     0 MB  |
      csrow2: |  1024 MB  |     0 MB  |
      --------+-----------------------+
      csrow1: |  1024 MB  |   512 MB  |
      csrow0: |  1024 MB  |   512 MB  |
      --------+-----------------------+
      
      So, no matter of what comes first, the information between
      channel and csrow will be properly represented.
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      24bef66e
  27. 24 9月, 2012 2 次提交
    • S
      edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs. · faa2ad09
      Shaun Ruffell 提交于
      Fix potential NULL pointer dereference in edac_unregister_sysfs() on
      system boot introduced in 3.6-rc1.
      
      Since commit 7a623c03 ("edac: rewrite the sysfs code to use struct
      device") edac_mc_alloc() no longer initializes embedded kobjects in
      struct mem_ctl_info.  Therefore edac_mc_free() can no longer simply
      decrement a kobject reference count to free the allocated memory unless
      the memory controller driver module had also called edac_mc_add_mc().
      
      Now edac_mc_free() will check if the newly embedded struct device has
      been registered with sysfs before using either the standard device
      release functions or freeing the data structures itself with logic
      pulled out of the error path of edac_mc_alloc().
      
      The BUG this patch resolves for me:
      
        BUG: unable to handle kernel NULL pointer dereference at   (null)
        EIP is at __wake_up_common+0x1a/0x6a
        Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 task.ti=f3dc6000)
        Call Trace:
          complete_all+0x3f/0x50
          device_pm_remove+0x23/0xa2
          device_del+0x34/0x142
          edac_unregister_sysfs+0x3b/0x5c [edac_core]
          edac_mc_free+0x29/0x2f [edac_core]
          e7xxx_probe1+0x268/0x311 [e7xxx_edac]
          e7xxx_init_one+0x56/0x61 [e7xxx_edac]
          local_pci_probe+0x13/0x15
        ...
      
      Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
      Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
      Signed-off-by: NShaun Ruffell <sruffell@digium.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      faa2ad09
    • F
      edac_mc: fix messy kfree calls in the error path · ef6e7816
      Fengguang Wu 提交于
      coccinelle warns about:
      
      + drivers/edac/edac_mc.c:429:9-23: ERROR: reference preceded by free on line 429
      
         421         if (mci->csrows) {
       > 422                 for (chn = 0; chn < tot_channels; chn++) {
         423                         csr = mci->csrows[chn];
         424                         if (csr) {
       > 425                                 for (chn = 0; chn < tot_channels; chn++)
         426                                          kfree(csr->channels[chn]);
         427                                  kfree(csr);
         428                          }
       > 429                          kfree(mci->csrows[i]);
         430                  }
         431                  kfree(mci->csrows);
         432          }
      
      and that code block seem to mess things up in several ways (double free, memory
      leak, out-of-bound reads etc.):
      
      L422: The iterator "chn" and bound "tot_channels" are totally wrong. Should be
            "row" and "tot_csrows" respectively. Which means either memory leak, or
            out-of-bound reads (which if does not trigger an immediate page fault
            error, will further lead to kfree() on random addresses).
      
      L425: The inner loop is reusing the same iterator "chn" as the outer loop,
            which could lead to premature end of the outer loop, and hence memory leak.
      
      L429: The array index 'i' in mci->csrows[i] is a temporary value used in
            previous loops, and won't change at all in the current loop. Which
            means either out-of-bound read and possibly kfree(random number), or the
            same mci->csrows[i] get freed once and again, and possibly double free
            for the kfree(csr) in L427.
      
      L426/L427: a kfree(csr->channels) is needed in between to avoid leaking the memory.
      
      The buggy code was introduced by commit de3910eb ("edac: change the mem
      allocation scheme to make Documentation/kobject.txt happy") in the 3.6-rc1
      merge window. Fix it by freeing up resources in this order:
      
        free csrows[i]->channels[j]
        free csrows[i]->channels
        free csrows[i]
        free csrows
      
      CC: Mauro Carvalho Chehab <mchehab@redhat.com>
      CC: Shaun Ruffell <sruffell@digium.com>
      Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ef6e7816