1. 11 Mar, 2016 1 commit
    • EDAC/sb_edac: Fix computation of channel address · eb1af3b7
      Luck, Tony committed
      Large memory Haswell-EX systems with multiple DIMMs per channel were
      sometimes reporting the wrong DIMM.
      
      Found three problems:
      
       1) Debug printouts for socket and channel interleave were not interpreting
          the register fields correctly. The socket interleave field is a 2^X
          value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2,
          2=3, 3=4).
      
       2) Actual use of the socket interleave value didn't interpret it as 2^X.
      
       3) Conversion of address to channel address was complicated, and wrong.
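
The two field encodings listed above can be sketched as a pair of decoders. This is a minimal illustration, not the driver's code: the names `sck_ways_decode()` and `ch_ways_decode()` are hypothetical, and both assume the field value (0..3) has already been extracted from its register.

```c
#include <assert.h>

/* Hypothetical helper: the socket interleave field is a 2^X value
 * (0=1, 1=2, 2=4, 3=8). */
unsigned int sck_ways_decode(unsigned int field)
{
	assert(field <= 3);
	return 1u << field;
}

/* Hypothetical helper: the channel interleave field is X+1
 * (0=1, 1=2, 2=3, 3=4). */
unsigned int ch_ways_decode(unsigned int field)
{
	assert(field <= 3);
	return field + 1;
}
```

Keeping the two decoders separate makes it harder to apply the wrong interpretation to a field, which is the kind of mix-up problems 1) and 2) describe.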
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-edac@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 08 Mar, 2016 1 commit
  3. 02 Jan, 2016 1 commit
  4. 11 Dec, 2015 9 commits
    • EDAC, sb_edac: Set fixed DIMM width on Xeon Knights Landing · 45f4d3ab
      Hubert Chrzaniuk committed
      Knights Landing does not come with a register that could be used to
      fetch the DIMM width. However, the value is fixed for this architecture,
      so it can be hardcoded.
      Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
      Cc: Doug Thompson <dougthompson@xmission.com>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: lukasz.anaczkowski@intel.com
      Link: http://lkml.kernel.org/r/1449840082-18673-1-git-send-email-hubert.chrzaniuk@intel.com
      Signed-off-by: Borislav Petkov <bp@suse.de>
    • EDAC: Rework workqueue handling · c4cf3b45
      Borislav Petkov committed
      Hide the EDAC workqueue pointer in a separate compilation unit and add
      accessors for the workqueue manipulations needed.
      
      Remove edac_pci_reset_delay_period() which wasn't used by anything. It
      seems it got added without a user with
      
        91b99041 ("drivers/edac: updated PCI monitoring")
      Signed-off-by: Borislav Petkov <bp@suse.de>
    • EDAC: Make edac_device workqueue setup/teardown functions static · e136fa01
      Borislav Petkov committed
      They're not used anywhere else.
      Signed-off-by: Borislav Petkov <bp@suse.de>
    • EDAC: Remove edac_get_sysfs_subsys() error handling · d4538000
      Borislav Petkov committed
      It cannot fail now. We either load EDAC core after having successfully
      initialized edac_subsys or we don't.
      Signed-off-by: Borislav Petkov <bp@suse.de>
    • EDAC: Unexport and make edac_subsys static · a97d2627
      Borislav Petkov committed
      ... and use the accessor instead.
      Signed-off-by: Borislav Petkov <bp@suse.de>
    • EDAC: Rip out the edac_subsys reference counting · 733476cf
      Borislav Petkov committed
      This was really dumb: reference counting for the main EDAC sysfs
      object, when we could've simply registered it as the first thing in the
      module init path and then handed it around to whatever needs it.
      
      Do that and rip out all the code around it, thus simplifying the whole
      handling significantly.
      
      Move the edac_subsys node back to edac_module.c.
      Signed-off-by: Borislav Petkov <bp@suse.de>
    • EDAC: Robustify workqueues destruction · fcd5c4dd
      Borislav Petkov committed
      EDAC workqueue destruction is really fragile. We cancel delayed work
      but if it is still running and requeues itself, we still go ahead and
      destroy the workqueue and the queued work explodes when workqueue core
      attempts to run it.
      
      Make the destruction more robust by switching op_state to offline so
      that requeuing stops. Cancel any pending work *synchronously* too.
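
The ordering described above can be modeled in plain userspace C. This is only a single-threaded sketch under stated assumptions: `struct poller`, `poll_work()` and `poller_stop()` are hypothetical names, the `queued` flag stands in for "delayed work is pending", and no real workqueue API is used.

```c
#include <assert.h>
#include <stdbool.h>

enum op_state { OP_RUNNING_POLL, OP_OFFLINE };

/* Hypothetical model of a device that polls via self-requeuing work. */
struct poller {
	enum op_state op_state;
	bool queued;        /* stands in for "delayed work is pending" */
	unsigned int runs;  /* how many times the work item has executed */
};

/* The periodic work item: requeues itself only while the device is online. */
void poll_work(struct poller *p)
{
	assert(p);
	p->queued = false;
	p->runs++;
	if (p->op_state == OP_RUNNING_POLL)
		p->queued = true;
}

/* Teardown: go offline first so a still-running work item cannot requeue
 * itself, then drop whatever is pending. In the kernel the second step
 * would be a synchronous cancel; here it is just clearing the flag. */
void poller_stop(struct poller *p)
{
	assert(p);
	p->op_state = OP_OFFLINE;
	p->queued = false;
}
```

In this model, a work item that runs after `poller_stop()` leaves `queued` false, so nothing remains pending when the queue is destroyed.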
      
        EDAC i7core: Driver loaded.
        general protection fault: 0000 [#1] SMP
        CPU 12
        Modules linked in:
        Supported: Yes
        Pid: 0, comm: kworker/0:1 Tainted: G          IE   3.0.101-0-default #1 HP ProLiant DL380 G7
        RIP: 0010:[<ffffffff8107dcd7>]  [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0
        < ... regs ...>
        Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600)
        Stack:
         ...
        Call Trace:
         call_timer_fn
         run_timer_softirq
         __do_softirq
         call_softirq
         do_softirq
         irq_exit
         smp_apic_timer_interrupt
         apic_timer_interrupt
         intel_idle
         cpuidle_idle_call
         cpu_idle
        Code: ...
        RIP  __queue_work
         RSP <...>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
    • EDAC, mc_sysfs: Fix freeing bus' name · 12e26969
      Borislav Petkov committed
      I get the splat below when modprobing/rmmoding EDAC drivers. It happens
      because bus->name is invalid after bus_unregister() has run. The Code: section
      below corresponds to:
      
        .loc 1 1108 0
        movq    672(%rbx), %rax # mci_1(D)->bus, mci_1(D)->bus
        .loc 1 1109 0
        popq    %rbx    #
      
        .loc 1 1108 0
        movq    (%rax), %rdi    # _7->name,
        jmp     kfree   #
      
      and %rax has some funky stuff 2030203020312030 which looks a lot like
      something walked over it.
      
      Fix that by saving the name pointer before unregistering the bus, and
      freeing the saved pointer afterwards.
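
The save-then-unregister ordering can be sketched in userspace C. This is a model, not the kernel code: `struct bus`, `model_bus_unregister()` and `teardown()` are hypothetical, and the stand-in unregister simply clears the field to represent "bus->name is invalid afterwards".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the bus object; name is heap-allocated. */
struct bus {
	char *name;
};

/* Models bus_unregister(): after it returns, bus->name must not be
 * trusted. The model represents that by clearing the field. */
void model_bus_unregister(struct bus *bus)
{
	assert(bus);
	bus->name = NULL;
}

/* Saves the name pointer first, unregisters, then frees the saved copy.
 * Returns 1 if a real name was released, 0 otherwise. */
int teardown(struct bus *bus)
{
	char *name;
	int had_name;

	assert(bus);
	name = bus->name;            /* save before unregistering */
	had_name = (name != NULL);
	model_bus_unregister(bus);
	free(name);                  /* never dereferences bus->name again */
	return had_name;
}
```

Reading `bus->name` after the unregister call instead would hand `free()` whatever the field holds by then, which corresponds to the fault shown in the splat.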
      
        general protection fault: 0000 [#1] SMP
        Modules linked in: ...
        CPU: 4 PID: 10318 Comm: modprobe Tainted: G          I EN  3.12.51-11-default+ #48
        Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011
        task: ffff880311320280 ti: ffff88030da3e000 task.ti: ffff88030da3e000
        RIP: 0010:[<ffffffffa019da92>]  [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core]
        RSP: 0018:ffff88030da3fe28  EFLAGS: 00010292
        RAX: 2030203020312030 RBX: ffff880311b4e000 RCX: 000000000000095c
        RDX: 0000000000000001 RSI: ffff880327bb9600 RDI: 0000000000000286
        RBP: ffff880311b4e750 R08: 0000000000000000 R09: ffffffff81296110
        R10: 0000000000000400 R11: 0000000000000000 R12: ffff88030ba1ac68
        R13: 0000000000000001 R14: 00000000011b02f0 R15: 0000000000000000
        FS:  00007fc9bf8f5700(0000) GS:ffff8801a7c40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 0000000000403c90 CR3: 000000019ebdf000 CR4: 00000000000007e0
        Stack:
        Call Trace:
          i7core_unregister_mci.isra.9
          i7core_remove
          pci_device_remove
          __device_release_driver
          driver_detach
          bus_remove_driver
          pci_unregister_driver
          i7core_exit
          SyS_delete_module
          system_call_fastpath
          0x7fc9bf426536
        Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb e8 52 2a 1f e1 48 8b bb a0 02 00 00 e8 46 59 1f e1 48 8b 83 a0 02 00 00 5b <48> 8b 38 e9 26 9a fe e0 66 0f 1f 44 00 00 66 66 66 66 90 48 8b
        RIP  [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core]
         RSP <ffff88030da3fe28>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: <stable@vger.kernel.org> # v3.6..
      Fixes: 7a623c03 ("edac: rewrite the sysfs code to use struct device")
    • EDAC, mpc85xx: Make mpc85xx-pci-edac a platform device · 666db563
      Scott Wood committed
      Originally the mpc85xx-pci-edac driver bound directly to the PCI
      controller node.
      
      Commit
      
        905e75c4 ("powerpc/fsl-pci: Unify pci/pcie initialization code")
      
      turned the PCI controller code into a platform device. Since we can't
      have two drivers binding to the same device, the EDAC code was changed
      to be called into as a library-style submodule. However, this doesn't
      work if the EDAC driver is built as a module.
      
      Commit
      
        8d8fcba6d1ea ("EDAC: Rip out the edac_subsys reference counting")
      
      exposed another problem with this approach -- mpc85xx_pci_err_probe()
      was being called in the same early boot phase that the PCI controller
      is initialized, rather than in the device_initcall phase that the EDAC
      layer expects. This caused a crash on boot.
      
      To fix this, the PCI controller code now creates a child platform device
      specifically for EDAC, which the mpc85xx-pci-edac driver binds to.
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Doug Thompson <dougthompson@xmission.com>
      Cc: Jia Hongtao <B38951@freescale.com>
      Cc: Jiri Kosina <jkosina@suse.com>
      Cc: Kim Phillips <kim.phillips@freescale.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Masanari Iida <standby24x7@gmail.com>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Rob Herring <robh@kernel.org>
      Link: http://lkml.kernel.org/r/1449774432-18593-1-git-send-email-scottwood@freescale.com
      Signed-off-by: Borislav Petkov <bp@suse.de>
  5. 06 Dec, 2015 3 commits
  6. 03 Dec, 2015 2 commits
  7. 18 Nov, 2015 1 commit
  8. 23 Oct, 2015 1 commit
  9. 21 Oct, 2015 1 commit
  10. 15 Oct, 2015 3 commits
  11. 03 Oct, 2015 1 commit
  12. 29 Sep, 2015 2 commits
  13. 28 Sep, 2015 1 commit
  14. 26 Sep, 2015 2 commits
    • EDAC: Fix sysfs dimm_label store operation · 438470b8
      Toshi Kani committed
      Sysfs "dimm_label" and "chX_dimm_label" nodes have the following issues
      in their store operation:
      
       1) A newline-terminated input string causes redundant newlines:
      
        # echo "test" > /sys/bus/mc0/devices/dimm0/dimm_label
        # cat  /sys/bus/mc0/devices/dimm0/dimm_label
        test
      
        #  od -bc /sys/bus/mc0/devices/dimm0/dimm_label
        0000000 164 145 163 164 012 012
                  t   e   s   t  \n  \n
        0000006
      
       2) The original label string (31 characters) cannot be stored due to
          an improper size check:
      
        # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0" > /sys/bus/mc0/devices/dimm0/dimm_label
        # cat /sys/bus/mc0/devices/dimm0/dimm_label
      
        # od -bc /sys/bus/mc0/devices/dimm0/dimm_label
         0000000 012 012
                  \n  \n
         0000002
      
       3) An input string longer than the buffer size results in wrong label
          info, as it allows a retry with the remaining string:
      
        # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0_TEST" > /sys/bus/mc0/devices/dimm0/dimm_label
        # cat  /sys/bus/mc0/devices/dimm0/dimm_label
        _TEST
      
      Fix these issues by making the following changes:
       1) Replace a newline character at the end by setting a null. It also
          assures that the string is null-terminated in the label buffer.
       2) Check the label buffer size with 'sizeof(dimm->label)'.
       3) Fail a request if its string exceeds the label buffer size.
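
The three changes above can be sketched as a userspace store function. This is an illustration under stated assumptions, not the kernel implementation: `struct dimm`, `label_store()` and the buffer size `LABEL_BUF_LEN` (32, enough for a 31-character label plus the null) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define LABEL_BUF_LEN 32 /* assumed: room for 31 characters plus a null */

/* Hypothetical stand-in for the DIMM object holding the label buffer. */
struct dimm {
	char label[LABEL_BUF_LEN];
};

/* Returns count on success, -EINVAL if the input is empty or too long. */
long label_store(struct dimm *dimm, const char *data, size_t count)
{
	size_t copy_count = count;

	assert(dimm && data);
	if (count == 0)
		return -EINVAL;

	/* 1) Drop a trailing newline (or null) instead of storing it. */
	if (data[count - 1] == '\n' || data[count - 1] == '\0')
		copy_count--;

	/* 2) + 3) Size check against the real buffer; reject oversized
	 * input outright rather than accepting part of it. */
	if (copy_count == 0 || copy_count >= sizeof(dimm->label))
		return -EINVAL;

	memcpy(dimm->label, data, copy_count);
	dimm->label[copy_count] = '\0';
	return (long)count;
}
```

Returning the full `count` on success tells the writer that all input was consumed, so there is no retry with a leftover tail such as `_TEST`.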
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: Robert Elliott <elliott@hpe.com>
      Link: http://lkml.kernel.org/r/1443121564.25474.160.camel@hpe.com
      Signed-off-by: Borislav Petkov <bp@suse.de>
    • EDAC: Fix sysfs dimm_label show operation · 1ea62c59
      Toshi Kani committed
      After
      
        7d375bff ("sb_edac: Fix support for systems with two home agents per socket")
      
      sysfs "dimm_label" and "chX_dimm_label" show their label string without a
      newline "\n" at the end.
      
        [root@orange ~]# cat /sys/bus/mc0/devices/dimm0/dimm_label
        CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#
      
        [root@orange ~]# cat /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label
        CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#
      
      The label strings are now 31 characters long, which is the same as
      EDAC_MC_LABEL_LEN. Since the snprintf()s in channel_dimm_label_show()
      and dimmdev_label_show() limit the whole length by EDAC_MC_LABEL_LEN,
      the newline in the format "%s\n" is ignored.
      
        [root@orange ~]# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
        0000000 103 120 125 137 123 162 143 111 104 043 060 137 110 141 043 060
                  C   P   U   _   S   r   c   I   D   #   0   _   H   a   #   0
        0000020 137 103 150 141 156 043 060 137 104 111 115 115 043 060 000
                  _   C   h   a   n   #   0   _   D   I   M   M   #   0  \0
        0000037
      
      Fix it by using 'sizeof(dimm->label) + 1' as the whole length in the
      snprintf()s in channel_dimm_label_show() and dimmdev_label_show().
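
The truncation can be reproduced with plain `snprintf()` in userspace. This is a sketch, not the kernel functions: `struct dimm`, `label_show_old()` and `label_show_new()` are hypothetical, `LABEL_LEN` models EDAC_MC_LABEL_LEN as 31, the label buffer is assumed to be one byte larger, and the caller's `buf` is assumed to hold at least 33 bytes.

```c
#include <assert.h>
#include <stdio.h>

#define LABEL_LEN 31 /* models EDAC_MC_LABEL_LEN */

/* Hypothetical stand-in; the buffer is assumed to be LABEL_LEN + 1 bytes. */
struct dimm {
	char label[LABEL_LEN + 1];
};

/* Old behaviour: the whole output, including the terminating null, is
 * limited to LABEL_LEN bytes, so a long label pushes the "\n" out. */
int label_show_old(const struct dimm *dimm, char *buf)
{
	assert(dimm && buf);
	return snprintf(buf, LABEL_LEN, "%s\n", dimm->label);
}

/* Fixed behaviour: leave room for the label, the newline and the null. */
int label_show_new(const struct dimm *dimm, char *buf)
{
	assert(dimm && buf);
	return snprintf(buf, sizeof(dimm->label) + 1, "%s\n", dimm->label);
}
```

With the label from the transcript above, the old variant leaves no newline in `buf`, while the new variant ends the output with one.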
      Reported-by: Robert Elliott <elliott@hpe.com>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Link: http://lkml.kernel.org/r/1442933883-21587-2-git-send-email-toshi.kani@hpe.com
      Signed-off-by: Borislav Petkov <bp@suse.de>
  15. 25 Sep, 2015 4 commits
  16. 23 Sep, 2015 5 commits
  17. 22 Sep, 2015 1 commit
  18. 09 Sep, 2015 1 commit