1. 13 Feb, 2020 1 commit
    • R
      EDAC/mc: Fix use-after-free and memleaks during device removal · 216aa145
      Robert Richter committed
      A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and
      DEBUG_KMEMLEAK set, revealed several issues when removing an mci device:
      
      1) Use-after-free:
      
      On 27.11.19 17:07:33, John Garry wrote:
      > [   22.104498] BUG: KASAN: use-after-free in
      > edac_remove_sysfs_mci_device+0x148/0x180
      
      The use-after-free is caused by the mci_for_each_dimm() macro called in
      edac_remove_sysfs_mci_device(). The iterator was introduced with
      
        c498afaf ("EDAC: Introduce an mci_for_each_dimm() iterator").
      
      The iterator loop calls device_unregister(&dimm->dev), which removes
      the sysfs entry of the device, but also frees the dimm struct in
      dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(),
      the dimm struct is accessed again, after having been freed already.
      
      The fix is to free all the mci device's subsequent dimm and csrow
      objects at a later point, in _edac_mc_free(), when the mci device itself
      is being freed.
      
      This keeps the data structures intact and the mci device can be
      fully used until its removal. The change allows the safe usage of
      mci_for_each_dimm() to release dimm devices from sysfs.
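The use-after-free pattern and the deferred-free fix can be sketched in user-space C. This is only an illustrative model, not the kernel code: `struct dimm`, `struct mci`, `for_each_dimm()`, `remove_sysfs()`, `mci_alloc()` and `mci_free()` are hypothetical stand-ins, and the `in_sysfs` flag merely represents the sysfs entry. The iterator's step expression reads `d->idx`, so freeing `d` inside the loop body would touch freed memory on the next step; the sketch instead leaves the objects alive until the parent is freed.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the EDAC structures (not the kernel definitions). */
struct dimm {
    unsigned int idx;
    int in_sysfs;               /* models "has a sysfs entry" */
};

struct mci {
    unsigned int tot_dimms;
    struct dimm **dimms;
};

/* Simplified iterator: the step expression dereferences the current element. */
#define for_each_dimm(mci, d)                                  \
    for ((d) = (mci)->dimms[0]; (d);                           \
         (d) = (d)->idx + 1 < (mci)->tot_dimms                 \
                   ? (mci)->dimms[(d)->idx + 1] : NULL)

/* Frees the parent and all child objects in one place. */
void mci_free(struct mci *mci)
{
    if (!mci)
        return;
    if (mci->dimms)
        for (unsigned int i = 0; i < mci->tot_dimms; i++)
            free(mci->dimms[i]);
    free(mci->dimms);
    free(mci);
}

/* Allocates a parent with n children; n must be greater than zero. */
struct mci *mci_alloc(unsigned int n)
{
    struct mci *mci;

    if (n == 0)
        return NULL;
    mci = calloc(1, sizeof(*mci));
    if (!mci)
        return NULL;
    mci->dimms = calloc(n, sizeof(*mci->dimms));
    if (!mci->dimms) {
        free(mci);
        return NULL;
    }
    mci->tot_dimms = n;
    for (unsigned int i = 0; i < n; i++) {
        mci->dimms[i] = calloc(1, sizeof(struct dimm));
        if (!mci->dimms[i]) {
            mci_free(mci);
            return NULL;
        }
        mci->dimms[i]->idx = i;
        mci->dimms[i]->in_sysfs = 1;
    }
    return mci;
}

/*
 * Removes the "sysfs entries" only. A buggy variant would call free(d)
 * here, and the iterator step would then read d->idx from freed memory.
 */
unsigned int remove_sysfs(struct mci *mci)
{
    struct dimm *d;
    unsigned int removed = 0;

    for_each_dimm(mci, d) {
        if (!d->in_sysfs)
            continue;
        d->in_sysfs = 0;
        removed++;
    }
    return removed;
}
```

Because `remove_sysfs()` never frees a child, the iterator stays valid for the whole loop, and the parent can still be inspected until `mci_free()` runs.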
      
      2) Memory leaks:
      
      The following memory leaks have been detected:
      
       # grep edac /sys/kernel/debug/kmemleak | sort | uniq -c
             1     [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0      # mci->csrows
            16     [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0      # csr->channels
            16     [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0      # csr->channels[chn]
             1     [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0      # mci->dimms
            34     [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc()
      
      All leaks are from memory allocated by edac_mc_alloc().
      
      Note: The test above shows that edac_mc_alloc() was called here from
      ghes_edac_register(), thus both functions show up in the stack trace
      but the module causing the leaks is edac_mc. The comments with the data
      structures involved were made manually by analyzing the objdump.
      
      The data structures listed above and created by edac_mc_alloc() are
      not properly removed during device removal, which is done in
      edac_mc_free().
      
      There are two paths implemented to remove the device, depending on
      device registration: _edac_mc_free() is called if the device is not
      registered, and edac_unregister_sysfs() otherwise.
      
      The implementations differ. For the sysfs case, the mci device removal
      lacks the removal of subsequent data structures (csrows, channels,
      dimms). This causes the memory leaks (see mci_attr_release()).
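One way to keep two removal paths from diverging is to route both through a single free routine. The following user-space sketch assumes that approach; it is not the kernel implementation. All names (`xcalloc()`, `xfree()`, `live_allocs`, `mci_free_all()`, `mci_release()`, and the simplified `struct csrow` and `struct mci`) are hypothetical, and `live_allocs` is a crude counter standing in for a leak detector.

```c
#include <stdlib.h>

long live_allocs;               /* crude leak counter for this sketch */

static void *xcalloc(size_t n, size_t size)
{
    void *p = calloc(n, size);

    if (p)
        live_allocs++;
    return p;
}

static void xfree(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* Hypothetical, simplified structures: parent -> csrows -> channels. */
struct csrow { unsigned int nr_channels; int **channels; };
struct mci { int registered; unsigned int nr_csrows; struct csrow **csrows; };

/* The single place where the parent and every child object are freed. */
void mci_free_all(struct mci *mci)
{
    if (!mci)
        return;
    for (unsigned int r = 0; mci->csrows && r < mci->nr_csrows; r++) {
        struct csrow *csr = mci->csrows[r];

        if (!csr)
            continue;
        for (unsigned int c = 0; csr->channels && c < csr->nr_channels; c++)
            xfree(csr->channels[c]);
        xfree(csr->channels);
        xfree(csr);
    }
    xfree(mci->csrows);
    xfree(mci);
}

/* Allocates the whole tree; rows and chans must be greater than zero. */
struct mci *mci_alloc(unsigned int rows, unsigned int chans)
{
    struct mci *mci = xcalloc(1, sizeof(*mci));

    if (!mci)
        return NULL;
    mci->nr_csrows = rows;
    mci->csrows = xcalloc(rows, sizeof(*mci->csrows));
    if (!mci->csrows)
        goto fail;
    for (unsigned int r = 0; r < rows; r++) {
        struct csrow *csr = xcalloc(1, sizeof(*csr));

        if (!csr)
            goto fail;
        mci->csrows[r] = csr;
        csr->nr_channels = chans;
        csr->channels = xcalloc(chans, sizeof(*csr->channels));
        if (!csr->channels)
            goto fail;
        for (unsigned int c = 0; c < chans; c++) {
            csr->channels[c] = xcalloc(1, sizeof(int));
            if (!csr->channels[c])
                goto fail;
        }
    }
    return mci;
fail:
    mci_free_all(mci);
    return NULL;
}

/* Models a device release callback: it must free the children too. */
void mci_release(struct mci *mci)
{
    mci_free_all(mci);
}

/* Both the unregistered and the registered path end in mci_free_all(). */
void mci_free(struct mci *mci)
{
    if (!mci->registered) {
        mci_free_all(mci);
        return;
    }
    mci->registered = 0;        /* stands in for the sysfs unregistration */
    mci_release(mci);
}
```

With one shared routine, a structure added to the allocation path only needs to be released in one place, whichever removal path is taken.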
      
       [ bp: Massage commit message. ]
      
      Fixes: c498afaf ("EDAC: Introduce an mci_for_each_dimm() iterator")
      Fixes: faa2ad09 ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.")
      Fixes: 7a623c03 ("edac: rewrite the sysfs code to use struct device")
      Reported-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Robert Richter <rrichter@marvell.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: John Garry <john.garry@huawei.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com
  2. 17 Jan, 2020 7 commits
  3. 13 Jan, 2020 1 commit
  4. 06 Jan, 2020 1 commit
  5. 20 Dec, 2019 1 commit
    • C
      riscv: move sifive_l2_cache.c to drivers/soc · 9209fb51
      Christoph Hellwig committed
      The sifive_l2_cache.c is in no way related to RISC-V architecture
      memory management.  It is a little stub driver working around the fact
      that the EDAC maintainers prefer their drivers to be structured in a
      certain way that doesn't fit the SiFive SOCs.
      
      Move the file to drivers/soc and add a Kconfig option for it, as well
      as the whole drivers/soc boilerplate for CONFIG_SOC_SIFIVE.
      
      Fixes: a967a289 ("RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      [paul.walmsley@sifive.com: keep the MAINTAINERS change specific to the L2$ controller code]
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
  6. 19 Dec, 2019 1 commit
  7. 17 Dec, 2019 1 commit
  8. 11 Dec, 2019 1 commit
  9. 10 Dec, 2019 1 commit
  10. 22 Nov, 2019 4 commits
  11. 10 Nov, 2019 8 commits
  12. 09 Nov, 2019 3 commits
  13. 08 Nov, 2019 1 commit
    • R
      EDAC/ghes: Fix locking and memory barrier issues · 23f61b9f
      Robert Richter committed
      The ghes registration and refcount is broken in several ways:
      
       * ghes_edac_register() returns with success for a 2nd instance
         even if a first instance's registration is still running. This is
         not correct as the first instance may fail later. A subsequent
         registration may not finish before the first. Parallel registrations
         must be avoided.
      
       * The refcount was increased even if a registration failed. This
         leads to stale counters preventing the device from being released.
      
       * The ghes refcount may not be decremented properly on unregistration.
         Always decrement the refcount once ghes_edac_unregister() is called to
         keep the refcount sane.
      
       * The ghes_pvt pointer is handed to the irq handler before registration
         finished.
      
       * The mci structure could be freed while the irq handler is running.
      
      Fix this by adding a mutex to ghes_edac_register(). This mutex
      serializes instances to register and unregister. The refcount is only
      increased if the registration succeeded. This makes sure the refcount is
      in a consistent state after registering or unregistering a device.
      
      Note: A spinlock cannot be used here as the code section may sleep.
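The registration scheme described above can be sketched in user-space C with a pthread mutex. This is a model under stated assumptions, not the driver code: `demo_register()`, `demo_unregister()`, `demo_refcount`, `demo_pvt` and the `setup_should_fail` switch are hypothetical, and `malloc()` stands in for the real device setup. The points it illustrates are that registration and unregistration are serialized, and that the count only goes up after setup has succeeded.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical state; both variables are protected by reg_mutex. */
pthread_mutex_t reg_mutex = PTHREAD_MUTEX_INITIALIZER;
unsigned int demo_refcount;
void *demo_pvt;                 /* published only after setup succeeded */

/* Returns 0 on success, -1 if the (simulated) setup failed. */
int demo_register(int setup_should_fail)
{
    void *p;
    int ret = 0;

    pthread_mutex_lock(&reg_mutex);

    /* A later instance only takes a reference on the existing device. */
    if (demo_refcount > 0) {
        demo_refcount++;
        goto unlock;
    }

    /* First instance: the setup may sleep, which is why a mutex is used. */
    p = setup_should_fail ? NULL : malloc(64);
    if (!p) {
        ret = -1;               /* failure leaves the count untouched */
        goto unlock;
    }

    demo_pvt = p;
    demo_refcount = 1;

unlock:
    pthread_mutex_unlock(&reg_mutex);
    return ret;
}

/* Drops one reference; the last one tears the device down. */
void demo_unregister(void)
{
    pthread_mutex_lock(&reg_mutex);
    if (demo_refcount > 0 && --demo_refcount == 0) {
        free(demo_pvt);
        demo_pvt = NULL;
    }
    pthread_mutex_unlock(&reg_mutex);
}
```

A second caller blocks on the mutex until the first registration has either completed or failed, so it never observes a half-initialized device.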
      
      The ghes_pvt is protected by ghes_lock now. This ensures the pointer is
      not updated before registration was finished or while the irq handler is
      running. It is unset before unregistering the device including necessary
      (implicit) memory barriers making the changes visible to other CPUs.
      Thus, the device cannot be used anymore by an interrupt.
      
      Also, rename ghes_init to ghes_refcount for better readability and
      switch to refcount API.
      
      A refcount is needed because there can be multiple GHES structures being
      defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error
      Source, "Some platforms may describe multiple Generic Hardware Error
      Source structures with different notification types, ...").
      
      Another approach, using the mci's device refcount (get_device()) with
      a release function, does not work here. A release function will be
      called only for device_release() with the last put_device() call. The
      device must be deleted *before* that with device_del(). This is only
      possible by maintaining a separate refcount.
      
       [ bp: touchups. ]
      
      Fixes: 0fe5f281 ("EDAC, ghes: Model a single, logical memory controller")
      Fixes: 1e72e673 ("EDAC/ghes: Fix Use after free in ghes_edac remove path")
      Co-developed-by: James Morse <james.morse@arm.com>
      Signed-off-by: James Morse <james.morse@arm.com>
      Co-developed-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Robert Richter <rrichter@marvell.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
  14. 06 Nov, 2019 5 commits
  15. 25 Oct, 2019 1 commit
  16. 24 Oct, 2019 1 commit
  17. 19 Oct, 2019 2 commits
    • T
      EDAC, skx: Retrieve and print retry_rd_err_log registers · e80634a7
      Tony Luck committed
      Skylake logs some additional useful information in per-channel
      registers in addition to the architectural status/addr/misc
      logged in the machine check bank.
      
      Pick up this information and add it to the EDAC log:
      
      	retry_rd_err_[five 32-bit register values]
      
      Sorry, no definitions for these registers. OEMs and DIMM vendors
      will be able to use them to isolate which cells in the DIMM are
      causing problems.
      
      	correrrcnt[per rank corrected error counts]
      
      Note that if additional errors are logged while these registers are
      being read, you may see a jumble of values, some from earlier errors
      and others from later errors (since the registers report the most
      recently logged error). The correrrcnt registers provide error counts
      per possible rank. If these counts only change by one since the previous
      error logged for this channel, then it is safe to assume that the
      registers logged provide a coherent view of one error.
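The coherence check described above can be expressed as a small helper. This is a sketch only: `snapshot_is_coherent()` and `NUM_RANKS` are hypothetical names, and both the eight ranks and the 16-bit counter width are assumptions based on the example log line, which prints eight four-digit hex values per channel.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_RANKS 8             /* assumption: eight per-rank counters */

/*
 * Returns true if exactly one corrected error was counted between two
 * snapshots of the per-rank counters, meaning the logged registers can
 * be assumed to describe a single error.
 */
bool snapshot_is_coherent(const uint16_t prev[NUM_RANKS],
                          const uint16_t cur[NUM_RANKS])
{
    unsigned int total = 0;

    /* The cast keeps each per-rank delta correct across a 16-bit wrap. */
    for (int i = 0; i < NUM_RANKS; i++)
        total += (uint16_t)(cur[i] - prev[i]);

    return total == 1;
}
```

A caller would keep the previous snapshot per channel and treat the retry_rd_err_log values as possibly mixed whenever this returns false.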
      
      With this change EDAC logs look like this:
      
      EDAC MC4: 1 CE memory read error on CPU_SrcID#2_MC#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x8f26018 offset:0x0 grain:32 syndrome:0x0 -  err_code:0x0101:0x0091 socket:2 imc:0 rank:0 bg:0 ba:0 row:0x1f880 col:0x200 retry_rd_err_log[0001a209 00000000 00000001 04800001 0001f880] correrrcnt[0001 0000 0000 0000 0000 0000 0000 0000])
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • T
      EDAC, skx_common: Refactor so that we initialize "dev" in result of adxl decode. · 29b8e84f
      Tony Luck committed
      Simplifies the code a little.
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>