1. 17 Feb, 2020 6 commits
  2. 13 Feb, 2020 1 commit
    • EDAC/mc: Fix use-after-free and memleaks during device removal · 216aa145
      Committed by Robert Richter
      A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and
      DEBUG_KMEMLEAK set, revealed several issues when removing an mci device:
      
      1) Use-after-free:
      
      On 27.11.19 17:07:33, John Garry wrote:
      > [   22.104498] BUG: KASAN: use-after-free in
      > edac_remove_sysfs_mci_device+0x148/0x180
      
      The use-after-free is caused by the mci_for_each_dimm() macro called in
      edac_remove_sysfs_mci_device(). The iterator was introduced with
      
        c498afaf ("EDAC: Introduce an mci_for_each_dimm() iterator").
      
      The iterator loop calls device_unregister(&dimm->dev), which removes
      the sysfs entry of the device, but also frees the dimm struct in
      dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(),
      the dimm struct is accessed again, after having been freed already.
      
      The fix is to free all the mci device's subsequent dimm and csrow
      objects at a later point, in _edac_mc_free(), when the mci device itself
      is being freed.
      
      This keeps the data structures intact and the mci device can be
      fully used until its removal. The change allows the safe usage of
      mci_for_each_dimm() to release dimm devices from sysfs.
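
      As a rough userspace illustration of this fix (not the kernel code), the
      sketch below models an iterator that reads the current element in order
      to find the next one. Every name ending in _sketch is hypothetical.
      Removal only marks each dimm as unregistered; the memory is released
      later, in one place, so the iterator never touches freed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's dimm and mci structures. */
struct dimm_sketch {
	unsigned int idx;	/* position in mci->dimms[] */
	int registered;		/* models "has a sysfs entry" */
};

struct mci_sketch {
	unsigned int tot_dimms;
	struct dimm_sketch **dimms;
};

/* Advancing reads (dimm)->idx, so the current element must still be
 * valid when the loop steps to the next one. */
#define for_each_dimm_sketch(mci, dimm)					\
	for ((dimm) = (mci)->dimms[0]; (dimm);				\
	     (dimm) = (dimm)->idx + 1 < (mci)->tot_dimms		\
			? (mci)->dimms[(dimm)->idx + 1] : NULL)

/* The single place where dimm objects are released. */
void mci_free_sketch(struct mci_sketch *mci)
{
	unsigned int i;

	if (!mci)
		return;
	if (mci->dimms)
		for (i = 0; i < mci->tot_dimms; i++)
			free(mci->dimms[i]);
	free(mci->dimms);
	free(mci);
}

struct mci_sketch *mci_alloc_sketch(unsigned int n)
{
	struct mci_sketch *mci;
	unsigned int i;

	if (n == 0)
		return NULL;
	mci = calloc(1, sizeof(*mci));
	if (!mci)
		return NULL;
	mci->dimms = calloc(n, sizeof(*mci->dimms));
	if (!mci->dimms) {
		free(mci);
		return NULL;
	}
	mci->tot_dimms = n;
	for (i = 0; i < n; i++) {
		mci->dimms[i] = calloc(1, sizeof(*mci->dimms[i]));
		if (!mci->dimms[i]) {
			mci_free_sketch(mci);
			return NULL;
		}
		mci->dimms[i]->idx = i;
		mci->dimms[i]->registered = 1;
	}
	return mci;
}

/* Models sysfs removal: unregisters every dimm but frees nothing. */
unsigned int mci_remove_sysfs_sketch(struct mci_sketch *mci)
{
	struct dimm_sketch *dimm;
	unsigned int removed = 0;

	assert(mci && mci->tot_dimms > 0);
	for_each_dimm_sketch(mci, dimm) {
		dimm->registered = 0;
		removed++;
	}
	return removed;
}
```

      If the loop body called free(dimm) instead, the next step of the
      iterator would read dimm->idx from freed memory, which is the pattern
      KASAN reported.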
      
      2) Memory leaks:
      
      The following memory leaks have been detected:
      
       # grep edac /sys/kernel/debug/kmemleak | sort | uniq -c
             1     [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0      # mci->csrows
            16     [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0      # csr->channels
            16     [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0      # csr->channels[chn]
             1     [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0      # mci->dimms
            34     [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc()
      
      All leaks are from memory allocated by edac_mc_alloc().
      
      Note: The test above shows that edac_mc_alloc() was called here from
      ghes_edac_register(), thus both functions show up in the stack trace
      but the module causing the leaks is edac_mc. The comments with the data
      structures involved were made manually by analyzing the objdump.
      
      The data structures listed above and created by edac_mc_alloc() are
      not properly removed during device removal, which is done in
      edac_mc_free().
      
      There are two paths implemented to remove the device, depending on
      device registration: _edac_mc_free() is called if the device is not
      registered and edac_unregister_sysfs() otherwise.
      
      The implementations differ. For the sysfs case, the mci device removal
      lacks the removal of subsequent data structures (csrows, channels,
      dimms). This causes the memory leaks (see mci_attr_release()).
      
       [ bp: Massage commit message. ]
      
      Fixes: c498afaf ("EDAC: Introduce an mci_for_each_dimm() iterator")
      Fixes: faa2ad09 ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.")
      Fixes: 7a623c03 ("edac: rewrite the sysfs code to use struct device")
      Reported-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Robert Richter <rrichter@marvell.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: John Garry <john.garry@huawei.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com
  3. 10 Nov, 2019 5 commits
  4. 09 Nov, 2019 1 commit
  5. 04 Sep, 2019 1 commit
  6. 15 Aug, 2019 1 commit
  7. 03 Aug, 2019 1 commit
    • EDAC/mc: Fix grain_bits calculation · 3724ace5
      Committed by Robert Richter
      The grain in EDAC is defined as "minimum granularity for an error
      report, in bytes". The following calculation of the grain_bits in
      edac_mc is wrong:
      
      	grain_bits = fls_long(e->grain) + 1;
      
      Where grain_bits is defined as:
      
      	grain = 1 << grain_bits
      
      Example:
      
      	grain = 8	# 64 bit (8 bytes)
      	grain_bits = fls_long(8) + 1
      	grain_bits = 4 + 1 = 5
      
      	grain = 1 << grain_bits
      	grain = 1 << 5 = 32
      
      Replace it with the correct calculation:
      
      	grain_bits = fls_long(e->grain - 1);
      
      The example now gives:
      
      	grain_bits = fls_long(8 - 1)
      	grain_bits = fls_long(7)
      	grain_bits = 3
      
      	grain = 1 << 3 = 8
      
      Also, check that the hardware reports a reasonable grain != 0 and
      otherwise fall back to 1-byte granularity with a warning.
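
      The arithmetic above can be checked in userspace. This is a minimal
      sketch, not the kernel implementation: fls_long_sketch() is a
      hypothetical stand-in for the kernel's fls_long() (1-based index of the
      most significant set bit, 0 for an input of 0), and the other helper
      names are made up for illustration. The warning on a zero grain is
      omitted.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for fls_long(): position of the highest set bit, counted
 * from 1, or 0 when no bit is set. */
unsigned int fls_long_sketch(unsigned long x)
{
	unsigned int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* The calculation the commit message calls wrong. */
unsigned int grain_bits_old_sketch(unsigned long grain)
{
	return fls_long_sketch(grain) + 1;
}

/* The replacement calculation, with the 1-byte fallback for grain == 0. */
unsigned int grain_bits_fixed_sketch(unsigned long grain)
{
	if (grain == 0)
		grain = 1;
	return fls_long_sketch(grain - 1);
}

/* Inverse direction: grain = 1 << grain_bits. */
unsigned long grain_from_bits_sketch(unsigned int bits)
{
	assert(bits < sizeof(unsigned long) * CHAR_BIT);
	return 1UL << bits;
}
```

      For grain = 8 the old helper yields 5 bits (a grain of 32), while the
      fixed helper yields 3 bits (a grain of 8). For a grain that is not a
      power of two, the fixed formula rounds up to the next power of two.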
      
       [ bp: massage a bit. ]
      Signed-off-by: Robert Richter <rrichter@marvell.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: https://lkml.kernel.org/r/20190624150758.6695-2-rrichter@marvell.com
  8. 14 May, 2019 1 commit
  9. 14 Nov, 2018 1 commit
  10. 17 Aug, 2018 1 commit
  11. 14 Mar, 2018 2 commits
    • EDAC: Add new memory type for non-volatile DIMMs · 001f8613
      Committed by Tony Luck
      There are now non-volatile versions of DIMMs. Add a new entry to "enum
      mem_type" and a new string in edac_mem_types[].
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Aristeu Rozanski <aris@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jean Delvare <jdelvare@suse.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
      Cc: linux-acpi@vger.kernel.org
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/20180312182430.10335-3-tony.luck@intel.com
      Signed-off-by: Borislav Petkov <bp@suse.de>
    • EDAC: Drop duplicated array of strings for memory type names · d6dd77eb
      Committed by Tony Luck
      Somehow we ended up with two separate arrays of strings to describe the
      "enum mem_type" values.
      
      In edac_mc.c we have an exported list edac_mem_types[] that is used
      by a couple of drivers in debug messages.
      
      In edac_mc_sysfs.c we have a private list that is used to display
      values in:
        /sys/devices/system/edac/mc/mc*/dimm*/dimm_mem_type
        /sys/devices/system/edac/mc/mc*/csrow*/mem_type
      
      This list was missing a value for MEM_LRDDR3.
      
      The string values in the two lists were different :-(
      
      Combining the lists, I kept the values so that the sysfs output
      will be unchanged as some scripts may depend on that.
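
      One way to keep a single table that cannot drift from the enum is to
      index it with designated initializers. The sketch below assumes a
      cut-down, hypothetical enum and made-up strings; it does not reproduce
      the real "enum mem_type" values or the real sysfs output.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, shortened enum; the real "enum mem_type" is larger. */
enum mem_kind_sketch {
	KIND_EMPTY_SKETCH,
	KIND_DDR3_SKETCH,
	KIND_LRDDR3_SKETCH,
	KIND_NVDIMM_SKETCH,
	KIND_COUNT_SKETCH
};

/* One shared table. Designated initializers tie each string to its
 * enum value, so reordering the enum cannot silently shift the names. */
static const char * const mem_kind_names_sketch[KIND_COUNT_SKETCH] = {
	[KIND_EMPTY_SKETCH]  = "Empty",
	[KIND_DDR3_SKETCH]   = "Unbuffered-DDR3",
	[KIND_LRDDR3_SKETCH] = "Load-Reduced-DDR3",
	[KIND_NVDIMM_SKETCH] = "Non-volatile",
};

/* Forward lookup; out-of-range values and holes map to "Unknown". */
const char *mem_kind_name_sketch(unsigned int kind)
{
	if (kind >= KIND_COUNT_SKETCH || !mem_kind_names_sketch[kind])
		return "Unknown";
	return mem_kind_names_sketch[kind];
}

/* Reverse lookup, as a script-facing parser might need; -1 if absent. */
int mem_kind_from_name_sketch(const char *name)
{
	unsigned int i;

	assert(name != NULL);
	for (i = 0; i < KIND_COUNT_SKETCH; i++)
		if (mem_kind_names_sketch[i] &&
		    strcmp(mem_kind_names_sketch[i], name) == 0)
			return (int)i;
	return -1;
}
```

      With a single table, an enum value without a string shows up as
      "Unknown" in every consumer at once instead of only in one of two
      lists.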
      Reported-by: Borislav Petkov <bp@suse.de>
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Aristeu Rozanski <aris@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jean Delvare <jdelvare@suse.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
      Cc: linux-acpi@vger.kernel.org
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/20180312182430.10335-2-tony.luck@intel.com
      Signed-off-by: Borislav Petkov <bp@suse.de>
  12. 25 Sep, 2017 1 commit
  13. 10 Apr, 2017 6 commits
  14. 28 Jan, 2017 1 commit
  15. 25 Dec, 2016 1 commit
  16. 15 Dec, 2016 2 commits
  17. 14 Nov, 2016 1 commit
  18. 03 Jun, 2016 1 commit
    • EDAC: Fix workqueues poll period resetting · fbedcaf4
      Committed by Nicholas Krause
      After the workqueue cleanup, we're registering workqueues based on
      the presence of an ->edac_check function. When that is the case,
      we're setting OP_RUNNING_POLL. But we forgot to check that in
      edac_mc_reset_delay_period(), leading to:
      
        BUG: unable to handle kernel paging request at 0000000000015d10
        IP: [ .. ] queued_spin_lock_slowpath
        PGD 3ffcc8067 PUD 3ffc56067 PMD 0
        Oops: 0002 [#1] SMP
        Modules linked in: ...
        CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1
        Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
        Stack:
        Call Trace:
          ? _raw_spin_lock_irqsave
          ? lock_timer_base.isra.34
          ? del_timer
          ? try_to_grab_pending
          ? mod_delayed_work_on
          ? edac_mc_reset_delay_period
          ? edac_set_poll_msec
          ? param_attr_store
          ? module_attr_store
          ? kernfs_fop_write
          ? __vfs_write
          ? __vfs_read
          ? __alloc_fd
          ? vfs_write
          ? SyS_write
          ? entry_SYSCALL_64_fastpath
        Code:
        RIP  [ .. ] queued_spin_lock_slowpath
         RSP <>
        CR2: 0000000000015d10
        ---[ end trace 3f286bc71cca15d1 ]---
        Kernel panic - not syncing: Fatal exception
      
      Fix it.
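
      The message does not show the change itself. As a hedged illustration
      of the missing check it describes, the userspace sketch below only
      re-arms controllers that are in polling mode. All types and names are
      hypothetical; the delay field merely stands in for modifying queued
      delayed work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical controller states: polled, or interrupt-driven with no
 * polling work set up at all. */
enum ctl_state_sketch {
	CTL_POLL_SKETCH,
	CTL_INT_SKETCH
};

struct ctl_sketch {
	enum ctl_state_sketch op_state;
	unsigned long delay_ms;	/* only meaningful while polling */
};

/* Apply a new poll period. Controllers that are not polling are
 * skipped, because their work item was never initialized and must
 * not be touched. Returns the number of controllers re-armed. */
size_t reset_delay_sketch(struct ctl_sketch *ctls, size_t n,
			  unsigned long ms)
{
	size_t i, rearmed = 0;

	assert(ctls != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (ctls[i].op_state != CTL_POLL_SKETCH)
			continue;
		ctls[i].delay_ms = ms;
		rearmed++;
	}
	return rearmed;
}
```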
      Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
      Cc: <stable@vger.kernel.org> # 4.5
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1463697958-13406-1-git-send-email-xerofoify@gmail.com
      [ Rewrite commit message. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
  19. 24 Apr, 2016 1 commit
  20. 02 Feb, 2016 3 commits
  21. 11 Dec, 2015 2 commits
    • EDAC: Rework workqueue handling · c4cf3b45
      Committed by Borislav Petkov
      Hide the EDAC workqueue pointer in a separate compilation unit and add
      accessors for the workqueue manipulations needed.
      
      Remove edac_pci_reset_delay_period() which wasn't used by anything. It
      seems it got added without a user with
      
        91b99041 ("drivers/edac: updated PCI monitoring")
      Signed-off-by: Borislav Petkov <bp@suse.de>
    • EDAC: Robustify workqueues destruction · fcd5c4dd
      Committed by Borislav Petkov
      EDAC workqueue destruction is really fragile. We cancel delayed work
      but if it is still running and requeues itself, we still go ahead and
      destroy the workqueue and the queued work explodes when workqueue core
      attempts to run it.
      
      Make the destruction more robust by switching op_state to offline so
      that requeuing stops. Cancel any pending work *synchronously* too.
      
        EDAC i7core: Driver loaded.
        general protection fault: 0000 [#1] SMP
        CPU 12
        Modules linked in:
        Supported: Yes
        Pid: 0, comm: kworker/0:1 Tainted: G          IE   3.0.101-0-default #1 HP ProLiant DL380 G7
        RIP: 0010:[<ffffffff8107dcd7>]  [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0
        < ... regs ...>
        Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600)
        Stack:
         ...
        Call Trace:
         call_timer_fn
         run_timer_softirq
         __do_softirq
         call_softirq
         do_softirq
         irq_exit
         smp_apic_timer_interrupt
         apic_timer_interrupt
         intel_idle
         cpuidle_idle_call
         cpu_idle
        Code: ...
        RIP  __queue_work
         RSP <...>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>