1. 17 Feb, 2020 7 commits
  2. 13 Feb, 2020 2 commits
    • EDAC/sysfs: Remove csrow objects on errors · 4d59588c
      Committed by Robert Richter
      All created csrow objects must be removed in the error path of
      edac_create_csrow_objects(). The objects have been added as devices.
      
      They need to be removed by doing a device_del() *and* put_device() call
      to also free their memory. The missing put_device() leaves a memory
      leak. Use device_unregister() instead of device_del() which properly
      unregisters the device doing both.
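
The difference between the two removal calls can be illustrated with a small userspace model. This is only a sketch: `toy_device` and the `toy_*` helpers are hypothetical stand-ins that mimic the reference-counting behavior described above, not the kernel's `struct device` API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device; not the kernel type. */
struct toy_device {
    int refcount;    /* models the device's reference count */
    bool visible;    /* models the device being present in sysfs */
    bool *released;  /* set to true once the release step has run */
};

/* Models registration: the device starts visible with one reference. */
static struct toy_device *toy_device_register(bool *released)
{
    struct toy_device *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    dev->refcount = 1;
    dev->visible = true;
    dev->released = released;
    *released = false;
    return dev;
}

/* Models device_del(): hides the device but keeps the reference. */
static void toy_device_del(struct toy_device *dev)
{
    dev->visible = false;
}

/* Models put_device(): drops a reference and frees on the last one. */
static void toy_put_device(struct toy_device *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount == 0) {
        *dev->released = true;
        free(dev);
    }
}

/* Models device_unregister(): performs both steps. */
static void toy_device_unregister(struct toy_device *dev)
{
    toy_device_del(dev);
    toy_put_device(dev);
}
```

In this model, calling only `toy_device_del()` leaves the memory allocated, while `toy_device_unregister()` also drops the last reference and frees it.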
      
      Fixes: 7adc05d2 ("EDAC/sysfs: Drop device references properly")
      Signed-off-by: Robert Richter <rrichter@marvell.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: John Garry <john.garry@huawei.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200212120340.4764-4-rrichter@marvell.com
    • EDAC/mc: Fix use-after-free and memleaks during device removal · 216aa145
      Committed by Robert Richter
      A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and
      DEBUG_KMEMLEAK set, revealed several issues when removing an mci device:
      
      1) Use-after-free:
      
      On 27.11.19 17:07:33, John Garry wrote:
      > [   22.104498] BUG: KASAN: use-after-free in
      > edac_remove_sysfs_mci_device+0x148/0x180
      
      The use-after-free is caused by the mci_for_each_dimm() macro called in
      edac_remove_sysfs_mci_device(). The iterator was introduced with
      
        c498afaf ("EDAC: Introduce an mci_for_each_dimm() iterator").
      
      The iterator loop calls device_unregister(&dimm->dev), which removes
      the sysfs entry of the device, but also frees the dimm struct in
      dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(),
      the dimm struct is accessed again, after having been freed already.
      
      The fix is to free all the mci device's subsequent dimm and csrow
      objects at a later point, in _edac_mc_free(), when the mci device itself
      is being freed.
      
      This keeps the data structures intact and the mci device can be
      fully used until its removal. The change allows the safe usage of
      mci_for_each_dimm() to release dimm devices from sysfs.
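
The deferred-free idea can be sketched in userspace as follows. All names (`toy_mci`, `toy_dimm`, `toy_remove_sysfs()`, `toy_mc_free()`) are hypothetical, and the loop only loosely imitates an iterator that advances by reading the current element's index; it is not the kernel macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_NR_DIMMS 4

/* Hypothetical stand-ins for the dimm and mci structures. */
struct toy_dimm {
    int idx;        /* position in the parent's dimms[] array */
    bool in_sysfs;  /* models the sysfs entry being present */
};

struct toy_mci {
    struct toy_dimm *dimms[TOY_NR_DIMMS];
};

/* Frees the child objects together with the parent, and only here. */
static void toy_mc_free(struct toy_mci *mci)
{
    if (!mci)
        return;
    for (int i = 0; i < TOY_NR_DIMMS; i++)
        free(mci->dimms[i]);  /* free(NULL) is a no-op */
    free(mci);
}

static struct toy_mci *toy_mc_alloc(void)
{
    struct toy_mci *mci = calloc(1, sizeof(*mci));

    if (!mci)
        return NULL;
    for (int i = 0; i < TOY_NR_DIMMS; i++) {
        mci->dimms[i] = calloc(1, sizeof(*mci->dimms[i]));
        if (!mci->dimms[i]) {
            toy_mc_free(mci);
            return NULL;
        }
        mci->dimms[i]->idx = i;
        mci->dimms[i]->in_sysfs = true;
    }
    return mci;
}

/*
 * Removes every dimm from "sysfs" without freeing it. The loop step
 * reads d->idx after the body has run, which is only safe because the
 * dimm memory stays valid until toy_mc_free().
 */
static int toy_remove_sysfs(struct toy_mci *mci)
{
    int removed = 0;

    for (struct toy_dimm *d = mci->dimms[0]; d;
         d = d->idx + 1 < TOY_NR_DIMMS ? mci->dimms[d->idx + 1] : NULL) {
        d->in_sysfs = false;
        removed++;
    }
    return removed;
}
```

If the loop body freed `d` instead of just marking it removed, the step expression would read freed memory, which is the pattern reported by KASAN above.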
      
      2) Memory leaks:
      
      The following memory leaks have been detected:
      
       # grep edac /sys/kernel/debug/kmemleak | sort | uniq -c
             1     [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0      # mci->csrows
            16     [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0      # csr->channels
            16     [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0      # csr->channels[chn]
             1     [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0      # mci->dimms
            34     [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc()
      
      All leaks are from memory allocated by edac_mc_alloc().
      
      Note: The test above shows that edac_mc_alloc() was called here from
      ghes_edac_register(), thus both functions show up in the stack trace
      but the module causing the leaks is edac_mc. The comments with the data
      structures involved were made manually by analyzing the objdump.
      
      The data structures listed above and created by edac_mc_alloc() are
      not properly removed during device removal, which is done in
      edac_mc_free().
      
      There are two paths implemented to remove the device, depending on
      device registration: _edac_mc_free() is called if the device is not
      registered and edac_unregister_sysfs() otherwise.
      
      The implementations differ. For the sysfs case, the mci device removal
      lacks the removal of subsequent data structures (csrows, channels,
      dimms). This causes the memory leaks (see mci_attr_release()).
      
       [ bp: Massage commit message. ]
      
      Fixes: c498afaf ("EDAC: Introduce an mci_for_each_dimm() iterator")
      Fixes: faa2ad09 ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.")
      Fixes: 7a623c03 ("edac: rewrite the sysfs code to use struct device")
      Reported-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Robert Richter <rrichter@marvell.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: John Garry <john.garry@huawei.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com
  3. 17 Jan, 2020 7 commits
  4. 13 Jan, 2020 1 commit
  5. 06 Jan, 2020 1 commit
  6. 20 Dec, 2019 1 commit
    • riscv: move sifive_l2_cache.c to drivers/soc · 9209fb51
      Committed by Christoph Hellwig
      The sifive_l2_cache.c is in no way related to RISC-V architecture
      memory management.  It is a little stub driver working around the fact
      that the EDAC maintainers prefer their drivers to be structured in a
      certain way that doesn't fit the SiFive SOCs.
      
      Move the file to drivers/soc and add a Kconfig option for it, as well
      as the whole drivers/soc boilerplate for CONFIG_SOC_SIFIVE.
      
      Fixes: a967a289 ("RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      [paul.walmsley@sifive.com: keep the MAINTAINERS change specific to the L2$ controller code]
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
  7. 19 Dec, 2019 1 commit
  8. 17 Dec, 2019 1 commit
  9. 11 Dec, 2019 1 commit
  10. 10 Dec, 2019 1 commit
  11. 22 Nov, 2019 4 commits
  12. 10 Nov, 2019 8 commits
  13. 09 Nov, 2019 3 commits
  14. 08 Nov, 2019 1 commit
    • EDAC/ghes: Fix locking and memory barrier issues · 23f61b9f
      Committed by Robert Richter
      The ghes registration and refcount handling are broken in several ways:
      
       * ghes_edac_register() returns with success for a 2nd instance
         even if a first instance's registration is still running. This is
         not correct as the first instance may fail later. A subsequent
         registration may not finish before the first. Parallel registrations
         must be avoided.
      
       * The refcount was increased even if a registration failed. This
         leads to stale counters preventing the device from being released.
      
       * The ghes refcount may not be decremented properly on unregistration.
         Always decrement the refcount once ghes_edac_unregister() is called to
         keep the refcount sane.
      
       * The ghes_pvt pointer is handed to the irq handler before registration
         finished.
      
       * The mci structure could be freed while the irq handler is running.
      
      Fix this by adding a mutex to ghes_edac_register(). This mutex
      serializes instances to register and unregister. The refcount is only
      increased if the registration succeeded. This makes sure the refcount is
      in a consistent state after registering or unregistering a device.
      
      Note: A spinlock cannot be used here as the code section may sleep.
      
      The ghes_pvt is protected by ghes_lock now. This ensures the pointer is
      not updated before registration was finished or while the irq handler is
      running. It is unset before unregistering the device including necessary
      (implicit) memory barriers making the changes visible to other CPUs.
      Thus, the device can not be used anymore by an interrupt.
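
A simplified userspace sketch of the serialized registration follows. It assumes POSIX threads and uses hypothetical names (`toy_register()`, `toy_unregister()`, `toy_pvt_ptr`). It models only the mutex and the refcount rule. The separate lock that protects the pointer against the interrupt handler is not modeled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical private data; stands in for the driver's pvt structure. */
struct toy_pvt {
    int id;
};

static pthread_mutex_t toy_reg_mutex = PTHREAD_MUTEX_INITIALIZER;
static int toy_refcount;             /* number of registered instances */
static struct toy_pvt *toy_pvt_ptr;  /* published only after success */
static struct toy_pvt toy_instance = { .id = 1 };

/* hw_ok simulates whether the underlying setup succeeds. */
static int toy_register(bool hw_ok)
{
    int rc = 0;

    pthread_mutex_lock(&toy_reg_mutex);

    /* A later instance shares the existing device. */
    if (toy_refcount > 0) {
        toy_refcount++;
        goto unlock;
    }

    /* On failure the refcount stays untouched. */
    if (!hw_ok) {
        rc = -1;
        goto unlock;
    }

    /* Publish the pointer and count only once setup has succeeded. */
    toy_pvt_ptr = &toy_instance;
    toy_refcount = 1;

unlock:
    pthread_mutex_unlock(&toy_reg_mutex);
    return rc;
}

static void toy_unregister(void)
{
    pthread_mutex_lock(&toy_reg_mutex);

    /* Only the last instance tears the device down. */
    if (toy_refcount > 0 && --toy_refcount == 0)
        toy_pvt_ptr = NULL;

    pthread_mutex_unlock(&toy_reg_mutex);
}
```

Because both paths take the same mutex, a second registration cannot observe a half-finished first one, and a failed registration leaves the counter at zero.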
      
      Also, rename ghes_init to ghes_refcount for better readability and
      switch to refcount API.
      
      A refcount is needed because there can be multiple GHES structures being
      defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error
      Source, "Some platforms may describe multiple Generic Hardware Error
      Source structures with different notification types, ...").
      
      An alternative approach, using the mci's device refcount (get_device())
      together with a release function, does not work here. A release function
      is called only from device_release() on the last put_device() call. The
      device must be deleted *before* that with device_del(). This is only
      possible by maintaining a separate refcount.
      
       [ bp: touchups. ]
      
      Fixes: 0fe5f281 ("EDAC, ghes: Model a single, logical memory controller")
      Fixes: 1e72e673 ("EDAC/ghes: Fix Use after free in ghes_edac remove path")
      Co-developed-by: James Morse <james.morse@arm.com>
      Signed-off-by: James Morse <james.morse@arm.com>
      Co-developed-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Robert Richter <rrichter@marvell.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
  15. 06 Nov, 2019 1 commit
    • EDAC/amd64: Check for memory before fully initializing an instance · 582f94b5
      Committed by Yazen Ghannam
      Return early before checking for ECC if the node does not have any
      populated memory.
      
      Free any cached hardware data before returning. Also, return 0 in this
      case since this is not a failure. Other nodes may have memory and the
      module should attempt to load an instance for them.
      
      Move printing of hardware information to after the instance is
      initialized, so that the information is only printed for nodes with
      memory.
      
      Return an error code when ECC is disabled. This check happens after
      checking for memory. The module should explicitly fail to load if memory
      is populated on a node and ECC is disabled.
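
The resulting order of checks can be sketched as below. This is a userspace illustration with hypothetical names (`toy_node`, `toy_probe_one_instance()`). The choice of `-ENODEV` as the error code is an assumption, since the message above does not name one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-node state; not the driver's real structure. */
struct toy_node {
    bool has_memory;       /* any DIMMs populated on this node */
    bool ecc_enabled;      /* ECC turned on for this node */
    bool instance_loaded;  /* set once an instance is initialized */
};

static int toy_probe_one_instance(struct toy_node *node)
{
    node->instance_loaded = false;

    /* No memory is not a failure: skip the node and let others load. */
    if (!node->has_memory)
        return 0;

    /* Memory without ECC is an explicit error (code is an assumption). */
    if (!node->ecc_enabled)
        return -ENODEV;

    /* Only nodes with memory and ECC get a fully initialized instance. */
    node->instance_loaded = true;
    return 0;
}
```

The memory check comes first, so an empty node never reaches the ECC check and never causes the module load to fail.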
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Robert Richter <rrichter@marvell.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: https://lkml.kernel.org/r/20191106012448.243970-6-Yazen.Ghannam@amd.com