1. 02 9月, 2020 5 次提交
  2. 18 3月, 2020 1 次提交
    • T
      EDAC, skx: Retrieve and print retry_rd_err_log registers · a20889d9
      Tony Luck 提交于
      commit e80634a75aba90e7485cd1fdb463fcac5d45f14d upstream
      
      Skylake logs some additional useful information in per-channel
      registers in addition the the architectural status/addr/misc
      logged in the machine check bank.
      
      Pick up this information and add it to the EDAC log:
      
      	retry_rd_err_[five 32-bit register values]
      
      Sorry, no definitions for these registers. OEMs and DIMM vendors
      will be able to use them to isolate which cells in the DIMM are
      causing problems.
      
      	correrrcnt[per rank corrected error counts]
      
      Note that if additional errors are logged while these registers are
      being read, you may see a jumble of values some from earlier errors,
      others from later errors (since the registers report the most recent
      logged error). The correrrcnt registers provide error counts per possible
      rank. If these counts only change by one since the previous error logged
      for this channel, then it is safe to assume that the registers logged
      provide a coherent view of one error.
      
      With this change EDAC logs look like this:
      
      EDAC MC4: 1 CE memory read error on CPU_SrcID#2_MC#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x8f26018 offset:0x0 grain:32 syndrome:0x0 -  err_code:0x0101:0x0091 socket:2 imc:0 rank:0 bg:0 ba:0 row:0x1f880 col:0x200 retry_rd_err_log[0001a209 00000000 00000001 04800001 0001f880] correrrcnt[0001 0000 0000 0000 0000 0000 0000 0000])
      Acked-by: NAristeu Rozanski <aris@redhat.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: Nzhaobing <zhaobing@linux.alibaba.com>
      Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>
      a20889d9
  3. 17 1月, 2020 13 次提交
  4. 15 1月, 2020 10 次提交
  5. 01 12月, 2019 1 次提交
  6. 21 11月, 2019 2 次提交
  7. 29 10月, 2019 1 次提交
    • J
      EDAC/ghes: Fix Use after free in ghes_edac remove path · d97e4a6d
      James Morse 提交于
      commit 1e72e673b9d102ff2e8333e74b3308d012ddf75b upstream.
      
      ghes_edac models a single logical memory controller, and uses a global
      ghes_init variable to ensure only the first ghes_edac_register() will
      do anything.
      
      ghes_edac is registered the first time a GHES entry in the HEST is
      probed. There may be multiple entries, so subsequent attempts to
      register ghes_edac are silently ignored as the work has already been
      done.
      
      When a GHES entry is unregistered, it calls ghes_edac_unregister(),
      which free()s the memory behind the global variables in ghes_edac.
      
      But there may be multiple GHES entries, the next call to
      ghes_edac_unregister() will dereference the free()d memory, and attempt
      to free it a second time.
      
      This may also be triggered on a platform with one GHES entry, if the
      driver is unbound/re-bound and unbound. The re-bind step will do
      nothing because of ghes_init, the second unbind will then do the same
      work as the first.
      
      Doing the unregister work on the first call is unsafe, as another
      CPU may be processing a notification in ghes_edac_report_mem_error(),
      using the memory we are about to free.
      
      ghes_init is already half of the reference counting. We only need
      to do the register work for the first call, and the unregister work
      for the last. Add the unregister check.
      
      This means we no longer free ghes_edac's memory while there are
      GHES entries that may receive a notification.
      
      This was detected by KASAN and DEBUG_TEST_DRIVER_REMOVE.
      
       [ bp: merge into a single patch. ]
      
      Fixes: 0fe5f281 ("EDAC, ghes: Model a single, logical memory controller")
      Reported-by: NJohn Garry <john.garry@huawei.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Robert Richter <rrichter@marvell.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20191014171919.85044-2-james.morse@arm.com
      Link: https://lkml.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d97e4a6d
  8. 05 10月, 2019 5 次提交
  9. 26 7月, 2019 2 次提交
    • E
      EDAC: Fix global-out-of-bounds write when setting edac_mc_poll_msec · e78d1d23
      Eiichi Tsukata 提交于
      [ Upstream commit d8655e7630dafa88bc37f101640e39c736399771 ]
      
      Commit 9da21b15 ("EDAC: Poll timeout cannot be zero, p2") assumes
      edac_mc_poll_msec to be unsigned long, but the type of the variable still
      remained as int. Setting edac_mc_poll_msec can trigger out-of-bounds
      write.
      
      Reproducer:
      
        # echo 1001 > /sys/module/edac_core/parameters/edac_mc_poll_msec
      
      KASAN report:
      
        BUG: KASAN: global-out-of-bounds in edac_set_poll_msec+0x140/0x150
        Write of size 8 at addr ffffffffb91b2d00 by task bash/1996
      
        CPU: 1 PID: 1996 Comm: bash Not tainted 5.2.0-rc6+ #23
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
        Call Trace:
         dump_stack+0xca/0x13e
         print_address_description.cold+0x5/0x246
         __kasan_report.cold+0x75/0x9a
         ? edac_set_poll_msec+0x140/0x150
         kasan_report+0xe/0x20
         edac_set_poll_msec+0x140/0x150
         ? dimmdev_location_show+0x30/0x30
         ? vfs_lock_file+0xe0/0xe0
         ? _raw_spin_lock+0x87/0xe0
         param_attr_store+0x1b5/0x310
         ? param_array_set+0x4f0/0x4f0
         module_attr_store+0x58/0x80
         ? module_attr_show+0x80/0x80
         sysfs_kf_write+0x13d/0x1a0
         kernfs_fop_write+0x2bc/0x460
         ? sysfs_kf_bin_read+0x270/0x270
         ? kernfs_notify+0x1f0/0x1f0
         __vfs_write+0x81/0x100
         vfs_write+0x1e1/0x560
         ksys_write+0x126/0x250
         ? __ia32_sys_read+0xb0/0xb0
         ? do_syscall_64+0x1f/0x390
         do_syscall_64+0xc1/0x390
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        RIP: 0033:0x7fa7caa5e970
        Code: 73 01 c3 48 8b 0d 28 d5 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 99 2d 2c 00 00 75 10 b8 01 00 00 00 04
        RSP: 002b:00007fff6acfdfe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
        RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa7caa5e970
        RDX: 0000000000000005 RSI: 0000000000e95c08 RDI: 0000000000000001
        RBP: 0000000000e95c08 R08: 00007fa7cad1e760 R09: 00007fa7cb36a700
        R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000005
        R13: 0000000000000001 R14: 00007fa7cad1d600 R15: 0000000000000005
      
        The buggy address belongs to the variable:
         edac_mc_poll_msec+0x0/0x40
      
        Memory state around the buggy address:
         ffffffffb91b2c00: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
         ffffffffb91b2c80: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
        >ffffffffb91b2d00: 04 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
                           ^
         ffffffffb91b2d80: 04 fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
         ffffffffb91b2e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      
      Fix it by changing the type of edac_mc_poll_msec to unsigned int.
      The reason why this patch adopts unsigned int rather than unsigned long
      is msecs_to_jiffies() assumes arg to be unsigned int. We can avoid
      integer conversion bugs and unsigned int will be large enough for
      edac_mc_poll_msec.
      Reviewed-by: NJames Morse <james.morse@arm.com>
      Fixes: 9da21b15 ("EDAC: Poll timeout cannot be zero, p2")
      Signed-off-by: NEiichi Tsukata <devel@etsukata.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      e78d1d23
    • P
      EDAC/sysfs: Fix memory leak when creating a csrow object · 10cc3a65
      Pan Bian 提交于
      [ Upstream commit 585fb3d93d32dbe89e718b85009f9c322cc554cd ]
      
      In edac_create_csrow_object(), the reference to the object is not
      released when adding the device to the device hierarchy fails
      (device_add()). This may result in a memory leak.
      Signed-off-by: NPan Bian <bianpan2016@163.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: https://lkml.kernel.org/r/1555554438-103953-1-git-send-email-bianpan2016@163.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      10cc3a65