- 17 2月, 2020 6 次提交
-
-
由 Robert Richter 提交于
Carve out the error_count increment into a separate function edac_inc_csrow(). This better separates code and reduces the indentation level. Implementation note: The function edac_inc_csrow() counts the same as before, ->ce_count is only incremented if row >= 0. This is esp. true for the case of (!e->enable_per_layer_report). Here, a DIMM was not found, variable row still has a value of -1 and ->ce_count is not incremented. [ bp: Massage commit message. ] Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab@kernel.org> Acked-by: NAristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200214141757.8976-1-rrichter@marvell.com
-
由 Robert Richter 提交于
Each struct mci has its own error descriptor. Create a function error_desc_to_mci() to determine the corresponding mci from an error descriptor. This removes @mci from the parameter list of edac_raw_mc_handle_error() as the mci pointer does not need to be passed any longer. [ bp: Massage commit message. ] Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: NAristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-5-rrichter@marvell.com
-
由 Robert Richter 提交于
Store the error type in struct edac_raw_error_desc. This makes the type parameter of edac_raw_mc_handle_error() obsolete. [ kernel-doc typo ] Reported-by: Nkbuild test robot <lkp@intel.com> Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab@kernel.org> Acked-by: NAristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-4-rrichter@marvell.com
-
由 Robert Richter 提交于
Reorder the new created functions edac_mc_alloc_csrows() and edac_mc_alloc_dimms() and move them before edac_mc_alloc(). No further code changes. Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: NAristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-3-rrichter@marvell.com
-
由 Robert Richter 提交于
edac_mc_alloc() is huge. Factor out code by moving it to the two new functions edac_mc_alloc_csrows() and edac_mc_alloc_dimms(). Do not move code yet for better review. [ bp: sort local args in reversed fir tree order. ] Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: NAristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-2-rrichter@marvell.com
-
由 Robert Richter 提交于
There are dimm and csrow devices linked to the mci device esp. to show up in sysfs. It must be granted that children devices are removed before its mci parent. Thus, the release functions must be called in the correct order and may not miss any child before releasing its parent. In the current implementation this is only granted by the correct order of release functions. A much better approach is to use put_device() that releases the device only after all users are gone. It is the recommended way to release a device and free its memory. The function uses the device's refcount and only frees it if there are no users of it anymore such as children. So implement a mci_release() function to remove mci devices, use put_device() to free them and early initialize the mci device right after its struct has been allocated. Change the release function so that it can be universally used no matter if the device is registered or not. Since subsequent dimm and csrow sysfs links are implemented as children devices, their refcounts will keep the parent mci device from being removed as long as sysfs entries exist and until all users have been unregistered in edac_remove_sysfs_mci_device(). Remove edac_unregister_sysfs() and merge mci sysfs removal into edac_remove_sysfs_mci_device(). There is only a single instance now that removes the sysfs entries. The function can now be used in the error paths for cleanup. Also, create device release functions for all involved devices (dev->release), remove device_type release functions (dev_type-> release) and also use dev->init_name instead of dev_set_name(). [ bp: Massage commit message and comments. ] Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NAristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200212120340.4764-5-rrichter@marvell.com
-
- 13 2月, 2020 1 次提交
-
-
由 Robert Richter 提交于
A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and DEBUG_KMEMLEAK set, revealed several issues when removing an mci device: 1) Use-after-free: On 27.11.19 17:07:33, John Garry wrote: > [ 22.104498] BUG: KASAN: use-after-free in > edac_remove_sysfs_mci_device+0x148/0x180 The use-after-free is caused by the mci_for_each_dimm() macro called in edac_remove_sysfs_mci_device(). The iterator was introduced with c498afaf ("EDAC: Introduce an mci_for_each_dimm() iterator"). The iterator loop calls device_unregister(&dimm->dev), which removes the sysfs entry of the device, but also frees the dimm struct in dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(), the dimm struct is accessed again, after having been freed already. The fix is to free all the mci device's subsequent dimm and csrow objects at a later point, in _edac_mc_free(), when the mci device itself is being freed. This keeps the data structures intact and the mci device can be fully used until its removal. The change allows the safe usage of mci_for_each_dimm() to release dimm devices from sysfs. 2) Memory leaks: Following memory leaks have been detected: # grep edac /sys/kernel/debug/kmemleak | sort | uniq -c 1 [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0 # mci->csrows 16 [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0 # csr->channels 16 [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0 # csr->channels[chn] 1 [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0 # mci->dimms 34 [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc() All leaks are from memory allocated by edac_mc_alloc(). Note: The test above shows that edac_mc_alloc() was called here from ghes_edac_register(), thus both functions show up in the stack trace but the module causing the leaks is edac_mc. The comments with the data structures involved were made manually by analyzing the objdump. The data structures listed above and created by edac_mc_alloc() are not properly removed during device removal, which is done in edac_mc_free(). There are two paths implemented to remove the device depending on device registration, _edac_mc_free() is called if the device is not registered and edac_unregister_sysfs() otherwise. The implemenations differ. For the sysfs case, the mci device removal lacks the removal of subsequent data structures (csrows, channels, dimms). This causes the memory leaks (see mci_attr_release()). [ bp: Massage commit message. ] Fixes: c498afaf ("EDAC: Introduce an mci_for_each_dimm() iterator") Fixes: faa2ad09 ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.") Fixes: 7a623c03 ("edac: rewrite the sysfs code to use struct device") Reported-by: NJohn Garry <john.garry@huawei.com> Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Tested-by: NJohn Garry <john.garry@huawei.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com
-
- 10 11月, 2019 5 次提交
-
-
由 Robert Richter 提交于
The code in ghes_edac.c and edac_mc.c for grain_bits calculation and calling trace_mc_event() is now the same. Move it to a single location in edac_raw_mc_handle_error(). The only difference is the missing IS_ENABLED(CONFIG_RAS) switch, but this is needed for ghes too. Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-13-rrichter@marvell.com
-
由 Robert Richter 提交于
Reduce the indentation level in edac_mc_handle_error() a bit. No functional changes. Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-7-rrichter@marvell.com
-
由 Robert Richter 提交于
The e string to which this is pointing to has already been cleared earlier in the function so remove the needless zero string termination. [ bp: Correct the commit message. ] Suggested-by: NJoe Perches <joe@perches.com> Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-6-rrichter@marvell.com
-
由 Robert Richter 提交于
No need to crash the system in case edac_mc_alloc() is called with invalid arguments, just warn and return. This would cause a checkpatch warning when touching the code later, so just fix it. Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-5-rrichter@marvell.com
-
由 Robert Richter 提交于
Introduce an mci_for_each_dimm() iterator. It returns a pointer to a struct dimm_info. This makes the declaration and use of an index obsolete and avoids access to internal data of struct mci (direct array access etc). [ bp: push the struct dimm_info *dimm; declaration into the CONFIG_EDAC_DEBUG block. ] Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-4-rrichter@marvell.com
-
- 09 11月, 2019 1 次提交
-
-
由 Robert Richter 提交于
The EDAC_DIMM_OFF() macro takes 5 arguments to get the DIMM's index. Simplify this by storing the index in struct dimm_info to avoid its calculation and remove the EDAC_DIMM_OFF() macro. The index can be directly used then. Another advantage is that edac_mc_alloc() could be used even if the exact size of the layers is unknown. Only the number of DIMMs would be needed. Rename iterator variable to idx, while at it. The name is more handy, esp. when searching for it in the code. Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-3-rrichter@marvell.com
-
- 04 9月, 2019 1 次提交
-
-
由 Robert Richter 提交于
Use of 'unsigned int' instead of bare use of 'unsigned'. Fix this for edac_mc*, ghes and the i5100 driver as reported by checkpatch.pl. While at it, struct member dev_ch_attribute->channel is always used as unsigned int. Change type to unsigned int to avoid type casts. [ bp: Massage. ] Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190902123216.9809-2-rrichter@marvell.com
-
- 15 8月, 2019 1 次提交
-
-
由 Robert Richter 提交于
Remove needless and boilerplate variable declarations. No functional changes. [ bp: Add newlines for better readability. ] Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190624150758.6695-10-rrichter@marvell.com
-
- 03 8月, 2019 1 次提交
-
-
由 Robert Richter 提交于
The grain in EDAC is defined as "minimum granularity for an error report, in bytes". The following calculation of the grain_bits in edac_mc is wrong: grain_bits = fls_long(e->grain) + 1; Where grain_bits is defined as: grain = 1 << grain_bits Example: grain = 8 # 64 bit (8 bytes) grain_bits = fls_long(8) + 1 grain_bits = 4 + 1 = 5 grain = 1 << grain_bits grain = 1 << 5 = 32 Replace it with the correct calculation: grain_bits = fls_long(e->grain - 1); The example gives now: grain_bits = fls_long(8 - 1) grain_bits = fls_long(7) grain_bits = 3 grain = 1 << 3 = 8 Also, check if the hardware reports a reasonable grain != 0 and fallback with a warning to 1 byte granularity otherwise. [ bp: massage a bit. ] Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190624150758.6695-2-rrichter@marvell.com
-
- 14 5月, 2019 1 次提交
-
-
由 Robert Richter 提交于
The function should return NULL in case no device is found, but it always returns the last checked mc device from the list even if the index did not match. Fix that. I did some analysis why this did not raise any issues for about 3 years and the reason is that edac_mc_find() is mostly used to search for existing devices. Thus, the bug is not triggered. [ bp: Drop the if (mci->mc_idx > idx) test in favor of readability. ] Fixes: c73e8833 ("EDAC, mc: Fix locking around mc_devices list") Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com
-
- 14 11月, 2018 1 次提交
-
-
由 Borislav Petkov 提交于
... and use the single edac_subsys object returned from subsys_system_register(). The idea is to have a single bus and multiple devices on it. Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> CC: Aristeu Rozanski Filho <arozansk@redhat.com> CC: Greg KH <gregkh@linuxfoundation.org> CC: Justin Ernst <justin.ernst@hpe.com> CC: linux-edac <linux-edac@vger.kernel.org> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: Russ Anderson <rja@hpe.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20180926152752.GG5584@zn.tnic
-
- 17 8月, 2018 1 次提交
-
-
由 Takashi Iwai 提交于
The edac_mem_types[] array misses a MEM_LRDDR4 entry, which leads to NULL pointer dereference when accessed via sysfs or such. Signed-off-by: NTakashi Iwai <tiwai@suse.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20180810141426.8918-1-tiwai@suse.de Fixes: 1e8096bb ("EDAC: Add LRDDR4 DRAM type") Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 14 3月, 2018 2 次提交
-
-
由 Tony Luck 提交于
There are now non-volatile versions of DIMMs. Add a new entry to "enum mem_type" and a new string in edac_mem_types[]. Signed-off-by: NTony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jean Delvare <jdelvare@suse.com> Cc: Len Brown <lenb@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-acpi@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/20180312182430.10335-3-tony.luck@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Tony Luck 提交于
Somehow we ended up with two separate arrays of strings to describe the "enum mem_type" values. In edac_mc.c we have an exported list edac_mem_types[] that is used by a couple of drivers in debug messaged. In edac_mc_sysfs.c we have a private list that is used to display values in: /sys/devices/system/edac/mc/mc*/dimm*/dimm_mem_type /sys/devices/system/edac/mc/mc*/csrow*/mem_type This list was missing a value for MEM_LRDDR3. The string values in the two lists were different :-( Combining the lists, I kept the values so that the sysfs output will be unchanged as some scripts may depend on that. Reported-by: NBorislav Petkov <bp@suse.de> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NTony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jean Delvare <jdelvare@suse.com> Cc: Len Brown <lenb@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-acpi@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/20180312182430.10335-2-tony.luck@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 25 9月, 2017 1 次提交
-
-
由 Toshi Kani 提交于
Only a single EDAC platform driver can be loaded. When ghes_edac is enabled, an EDAC platform driver still attempts to register itself and fails in edac_mc_add_mc(). Add edac_get_owner() so that EDAC platform drivers can check the owner first. Signed-off-by: NToshi Kani <toshi.kani@hpe.com> Suggested-by: NBorislav Petkov <bp@alien8.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170823225447.15608-5-toshi.kani@hpe.com [ Massage commit message. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 10 4月, 2017 6 次提交
-
-
由 Borislav Petkov 提交于
Change them to have the edac_ prefix. No functionality change. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
Move the remaining functionality to edac_mc.c. Convert "edac_report=" to a module parameter. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
... and this happens only when CONFIG_RAS is enabled. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
... as part of moving stuff away from edac_stub.c Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
... and the glue around it. It is not needed anymore. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
Use mc_devices list instead to check whether we have EDAC driver instances successfully registered with EDAC core. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 28 1月, 2017 1 次提交
-
-
由 Yazen Ghannam 提交于
We need to know if any MC devices have been allocated. Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-7-git-send-email-Yazen.Ghannam@amd.com [ Prettify text. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 25 12月, 2016 1 次提交
-
-
由 Linus Torvalds 提交于
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 12月, 2016 2 次提交
-
-
由 Mauro Carvalho Chehab 提交于
Several functions are documented at edac_mc.c. As we'll be including edac_core.h at drivers-api book, move those, in order for the kernel-doc markups be part of the API documentation book. Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>
-
由 Mauro Carvalho Chehab 提交于
Now, all left at edac_core.h are at drivers/edac/edac_mc.c, so rename it to edac_mc.h. Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>
-
- 14 11月, 2016 1 次提交
-
-
由 Borislav Petkov 提交于
When accessing the mc_devices list of memory controller descriptors, we need to hold mem_ctls_mutex. This was not always the case, fix that. Make all external callers call a version which grabs the mutex since the last is local to edac_mc.c. Reported-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 03 6月, 2016 1 次提交
-
-
由 Nicholas Krause 提交于
After the workqueue cleanup, we're registering workqueues based on the presence of an ->edac_check function. When that is the case, we're setting OP_RUNNING_POLL. But we forgot to check that in edac_mc_reset_delay_period(), leading to: BUG: unable to handle kernel paging request at 0000000000015d10 IP: [ .. ] queued_spin_lock_slowpath PGD 3ffcc8067 PUD 3ffc56067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: ... CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1 Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013 Stack: Call Trace: ? _raw_spin_lock_irqsave ? lock_timer_base.isra.34 ? del_timer ? try_to_grab_pending ? mod_delayed_work_on ? edac_mc_reset_delay_period ? edac_set_poll_msec ? param_attr_store ? module_attr_store ? kernfs_fop_write ? __vfs_write ? __vfs_read ? __alloc_fd ? vfs_write ? SyS_write ? entry_SYSCALL_64_fastpath Code: RIP [ .. ] queued_spin_lock_slowpath RSP <> CR2: 0000000000015d10 ---[ end trace 3f286bc71cca15d1 ]--- Kernel panic - not syncing: Fatal exception Fix it. Signed-off-by: NNicholas Krause <xerofoify@gmail.com> Cc: <stable@vger.kernel.org> # 4.5 Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1463697958-13406-1-git-send-email-xerofoify@gmail.com [ Rewrite commit message. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 24 4月, 2016 1 次提交
-
-
由 Emmanouil Maroudas 提交于
Fix typo in edac_inc_ue_error() to increment ue_noinfo_count instead of ce_noinfo_count. Signed-off-by: NEmmanouil Maroudas <emmanouil.maroudas@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: 4275be63 ("edac: Change internal representation to work with layers") Link: http://lkml.kernel.org/r/1461425580-5898-1-git-send-email-emmanouil.maroudas@gmail.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 02 2月, 2016 3 次提交
-
-
由 Borislav Petkov 提交于
They're both running only when ->edac_check is initialized so remove that check from the workqueue function itself. Synchronize/generalize the ->op_state check between the two. Kill useless comments, while at it. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
We have the generic wrappers now, use those. edac_pci_workq_setup() had an unused argument anyway. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
We use the ->edac_check function pointers to determine whether we need to setup a polling workqueue. However, the destroy path is not balanced and we might try to teardown an unitialized workqueue. Balance init and destroy paths by looking at ->edac_check in both cases. Set op_state to OP_OFFLINE *before* destroying anything. Reported-by: NZhiqiang Hou <Zhiqiang.Hou@freescale.com> Cc: Varun Sethi <Varun.Sethi@freescale.com> Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 11 12月, 2015 2 次提交
-
-
由 Borislav Petkov 提交于
Hide the EDAC workqueue pointer in a separate compilation unit and add accessors for the workqueue manipulations needed. Remove edac_pci_reset_delay_period() which wasn't used by anything. It seems it got added without a user with 91b99041 ("drivers/edac: updated PCI monitoring") Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
EDAC workqueue destruction is really fragile. We cancel delayed work but if it is still running and requeues itself, we still go ahead and destroy the workqueue and the queued work explodes when workqueue core attempts to run it. Make the destruction more robust by switching op_state to offline so that requeuing stops. Cancel any pending work *synchronously* too. EDAC i7core: Driver loaded. general protection fault: 0000 [#1] SMP CPU 12 Modules linked in: Supported: Yes Pid: 0, comm: kworker/0:1 Tainted: G IE 3.0.101-0-default #1 HP ProLiant DL380 G7 RIP: 0010:[<ffffffff8107dcd7>] [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0 < ... regs ...> Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600) Stack: ... Call Trace: call_timer_fn run_timer_softirq __do_softirq call_softirq do_softirq irq_exit smp_apic_timer_interrupt apic_timer_interrupt intel_idle cpuidle_idle_call cpu_idle Code: ... RIP __queue_work RSP <...> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org>
-