Commits · be1d162948f5bb0ced260e60208e7dc06cd45cab · openeuler / Kernel

10 Apr, 2017 4 commits

EDAC: Issue tracepoint only when it is defined · be1d1629
Authored by Borislav Petkov on Feb 03, 2017
```
... and this happens only when CONFIG_RAS is enabled.
Signed-off-by: Borislav Petkov <bp@suse.de>
```
be1d1629
EDAC: Move edac_op_state to edac_mc.c · 8c22b4fe
Authored by Borislav Petkov on Jan 26, 2017
```
... as part of moving stuff away from edac_stub.c
Signed-off-by: Borislav Petkov <bp@suse.de>
```
8c22b4fe

Authored by Borislav Petkov on Jan 26, 2017

... and the glue around it. It is not needed anymore.
Signed-off-by: Borislav Petkov <bp@suse.de>

d3116a08

EDAC: Get rid of edac_handlers · 97bb6c17

Authored by Borislav Petkov on Jan 26, 2017

Use mc_devices list instead to check whether we have EDAC driver
instances successfully registered with EDAC core.
Signed-off-by: Borislav Petkov <bp@suse.de>

97bb6c17

28 Jan, 2017 1 commit

EDAC: Add routine to check if MC devices list is empty · d7fc9d77

Authored by Yazen Ghannam on Jan 27, 2017

We need to know if any MC devices have been allocated.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-7-git-send-email-Yazen.Ghannam@amd.com
[ Prettify text. ]
Signed-off-by: Borislav Petkov <bp@suse.de>

d7fc9d77

25 Dec, 2016 1 commit

Replace <asm/uaccess.h> with <linux/uaccess.h> globally · 7c0f6ba6

Authored by Linus Torvalds on Dec 24, 2016

This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7c0f6ba6

15 Dec, 2016 2 commits

edac: move documentation from edac_mc.c to edac_core.h · e01aa14c

Authored by Mauro Carvalho Chehab on Oct 26, 2016

Several functions are documented at edac_mc.c.

As we'll be including edac_core.h at drivers-api book, move
those, in order for the kernel-doc markups be part of the API
documentation book.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>

e01aa14c

edac: rename edac_core.h to edac_mc.h · 78d88e8a

Authored by Mauro Carvalho Chehab on Oct 29, 2016

Now, all left at edac_core.h are at drivers/edac/edac_mc.c,
so rename it to edac_mc.h.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>

78d88e8a

14 Nov, 2016 1 commit

EDAC, mc: Fix locking around mc_devices list · c73e8833

Authored by Borislav Petkov on Nov 14, 2016

When accessing the mc_devices list of memory controller descriptors, we
need to hold mem_ctls_mutex. This was not always the case, fix that.

Make all external callers call a version which grabs the mutex since the
last is local to edac_mc.c.
Reported-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

c73e8833

03 Jun, 2016 1 commit

EDAC: Fix workqueues poll period resetting · fbedcaf4

Authored by Nicholas Krause on May 19, 2016

After the workqueue cleanup, we're registering workqueues based on
the presence of an ->edac_check function. When that is the case,
we're setting OP_RUNNING_POLL. But we forgot to check that in
edac_mc_reset_delay_period(), leading to:

  BUG: unable to handle kernel paging request at 0000000000015d10
  IP: [ .. ] queued_spin_lock_slowpath
  PGD 3ffcc8067 PUD 3ffc56067 PMD 0
  Oops: 0002 [#1] SMP
  Modules linked in: ...
  CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1
  Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
  Stack:
  Call Trace:
    ? _raw_spin_lock_irqsave
    ? lock_timer_base.isra.34
    ? del_timer
    ? try_to_grab_pending
    ? mod_delayed_work_on
    ? edac_mc_reset_delay_period
    ? edac_set_poll_msec
    ? param_attr_store
    ? module_attr_store
    ? kernfs_fop_write
    ? __vfs_write
    ? __vfs_read
    ? __alloc_fd
    ? vfs_write
    ? SyS_write
    ? entry_SYSCALL_64_fastpath
  Code:
  RIP  [ .. ] queued_spin_lock_slowpath
   RSP <>
  CR2: 0000000000015d10
  ---[ end trace 3f286bc71cca15d1 ]---
  Kernel panic - not syncing: Fatal exception

Fix it.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Cc: <stable@vger.kernel.org> # 4.5
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1463697958-13406-1-git-send-email-xerofoify@gmail.com
[ Rewrite commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>

fbedcaf4

24 Apr, 2016 1 commit

EDAC: Increment correct counter in edac_inc_ue_error() · 993f88f1

Authored by Emmanouil Maroudas on Apr 23, 2016

Fix typo in edac_inc_ue_error() to increment ue_noinfo_count instead of
ce_noinfo_count.
Signed-off-by: Emmanouil Maroudas <emmanouil.maroudas@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 4275be63 ("edac: Change internal representation to work with layers")
Link: http://lkml.kernel.org/r/1461425580-5898-1-git-send-email-emmanouil.maroudas@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>

993f88f1

02 Feb, 2016 3 commits

EDAC: Cleanup/sync workqueue functions · 06e912d4

Authored by Borislav Petkov on Feb 02, 2016

They're both running only when ->edac_check is initialized so remove
that check from the workqueue function itself. Synchronize/generalize
the ->op_state check between the two.

Kill useless comments, while at it.
Signed-off-by: Borislav Petkov <bp@suse.de>

06e912d4

EDAC: Kill workqueue setup/teardown functions · 626a7a4d

Authored by Borislav Petkov on Feb 02, 2016

We have the generic wrappers now, use those. edac_pci_workq_setup() had
an unused argument anyway.
Signed-off-by: Borislav Petkov <bp@suse.de>

626a7a4d

EDAC: Balance workqueue setup and teardown · 09667606

Authored by Borislav Petkov on Feb 02, 2016

We use the ->edac_check function pointers to determine whether we need
to setup a polling workqueue. However, the destroy path is not balanced
and we might try to tear down an uninitialized workqueue.

Balance init and destroy paths by looking at ->edac_check in both cases.
Set op_state to OP_OFFLINE *before* destroying anything.
Reported-by: Zhiqiang Hou <Zhiqiang.Hou@freescale.com>
Cc: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

09667606

11 Dec, 2015 2 commits

EDAC: Rework workqueue handling · c4cf3b45

Authored by Borislav Petkov on Nov 30, 2015

Hide the EDAC workqueue pointer in a separate compilation unit and add
accessors for the workqueue manipulations needed.

Remove edac_pci_reset_delay_period() which wasn't used by anything. It
seems it got added without a user with

  91b99041 ("drivers/edac: updated PCI monitoring")
Signed-off-by: Borislav Petkov <bp@suse.de>

c4cf3b45

EDAC: Robustify workqueues destruction · fcd5c4dd

Authored by Borislav Petkov on Nov 27, 2015

EDAC workqueue destruction is really fragile. We cancel delayed work
but if it is still running and requeues itself, we still go ahead and
destroy the workqueue and the queued work explodes when workqueue core
attempts to run it.

Make the destruction more robust by switching op_state to offline so
that requeuing stops. Cancel any pending work *synchronously* too.

  EDAC i7core: Driver loaded.
  general protection fault: 0000 [#1] SMP
  CPU 12
  Modules linked in:
  Supported: Yes
  Pid: 0, comm: kworker/0:1 Tainted: G          IE   3.0.101-0-default #1 HP ProLiant DL380 G7
  RIP: 0010:[<ffffffff8107dcd7>]  [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0
  < ... regs ...>
  Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600)
  Stack:
   ...
  Call Trace:
   call_timer_fn
   run_timer_softirq
   __do_softirq
   call_softirq
   do_softirq
   irq_exit
   smp_apic_timer_interrupt
   apic_timer_interrupt
   intel_idle
   cpuidle_idle_call
   cpu_idle
  Code: ...
  RIP  __queue_work
   RSP <...>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>

fcd5c4dd

23 Oct, 2015 1 commit

EDAC: Fix PAGES_TO_MiB macro misuse · 990995ba

Authored by Tan Xiaojun on Oct 20, 2015

The PAGES_TO_MiB macro is used for unit conversion but the
trace_mc_event() tracepoint expects a page address. Fix that.
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1445341538-24271-1-git-send-email-tanxiaojun@huawei.com
Signed-off-by: Borislav Petkov <bp@suse.de>

990995ba
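
Note on 990995ba: the message contrasts a unit conversion (pages to MiB) with a page address. A minimal user-space sketch of that difference follows. The names `pages_to_mib`, `pfn_to_addr` and `SKETCH_PAGE_SHIFT` are hypothetical, 4 KiB pages are assumed, and rebuilding the address as a shifted page frame number plus offset is an assumption about the fix, not a quote from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Unit conversion: how many MiB a given number of pages occupies. */
uint64_t pages_to_mib(uint64_t pages)
{
	return pages >> (20 - SKETCH_PAGE_SHIFT);
}

/* Address reconstruction: page frame number plus offset within the page. */
uint64_t pfn_to_addr(uint64_t pfn, uint64_t offset_in_page)
{
	assert(offset_in_page < (UINT64_C(1) << SKETCH_PAGE_SHIFT));
	return (pfn << SKETCH_PAGE_SHIFT) | offset_in_page;
}
```

Feeding a page frame number to the first helper yields a size in MiB, which is unrelated to where the error occurred; only the second helper yields an address.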

28 May, 2015 1 commit

EDAC: Cleanup atomic_scrub mess · b01aec9b

Authored by Borislav Petkov on May 21, 2015

So first of all, this atomic_scrub() function's naming is bad. It looks
like an atomic_t helper. Change it to edac_atomic_scrub().

The bigger problem is that this function is arch-specific and every new
arch which doesn't necessarily need that functionality still needs to
define it, otherwise EDAC doesn't compile.

So instead of doing that and including arch-specific headers, have each
arch define an EDAC_ATOMIC_SCRUB symbol which can be used in edac_mc.c
for ifdeffery. Much cleaner.

And we already are doing this with another symbol - EDAC_SUPPORT. This
is also much cleaner than having CONFIG_EDAC enumerate all the arches
which need/have EDAC support and drivers.

This way I can kill the useless edac.h header in tile too.
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Chris Metcalf <cmetcalf@ezchip.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "Maciej W. Rozycki" <macro@codesourcery.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Steven J. Hill" <Steven.Hill@imgtec.com>
Cc: x86@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>

b01aec9b

23 Feb, 2015 1 commit

EDAC: Allow to pass driver-specific attribute groups · 4e8d230d

Authored by Takashi Iwai on Feb 04, 2015

Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info
object with the optional attribute groups.  This allows drivers to
pass additional sysfs entries without manual (and racy)
device_create_file() and co calls.

edac_mc_add_mc() is kept as is, just calling edac_mc_add_mc_with_groups()
with NULL groups.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>

4e8d230d

20 Oct, 2014 2 commits

EDAC: Sync memory types and names · 4cfc3a40

Authored by Borislav Petkov on Sep 30, 2014

Make keeping the sync between the mem_types enum and the actual string
names simpler by using designated initializers.
Signed-off-by: Borislav Petkov <bp@suse.de>

4cfc3a40
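
Note on 4cfc3a40: designated initializers tie each string to its enumerator, so reordering or extending the enum cannot silently shift the names. A minimal sketch of the technique follows. The enum and its entries are hypothetical and far smaller than the kernel's list of memory types.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative subset; hypothetical names, not the kernel's enum. */
enum mem_kind {
	KIND_EMPTY,
	KIND_DDR3,
	KIND_DDR4,
	KIND_COUNT
};

/* Each slot is addressed by its enumerator, not by its position. */
static const char * const mem_kind_names[KIND_COUNT] = {
	[KIND_EMPTY] = "Empty",
	[KIND_DDR3]  = "DDR3",
	[KIND_DDR4]  = "DDR4",
};

const char *mem_kind_name(enum mem_kind kind)
{
	assert((unsigned int)kind < KIND_COUNT);
	return mem_kind_names[kind];
}
```

With positional initializers, inserting a new enumerator in the middle would require editing the string table at the matching position; here the index is explicit on every line.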

EDAC: Add DDR3 LRDIMM entries to edac_mem_types · 348fec70

Authored by Aravind Gopalakrishnan on Sep 18, 2014

F15hM60h adds support for DDR4 and DDR3 LRDIMMs. Add them here.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1411070218-10258-1-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Boris: improve comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>

348fec70

02 Sep, 2014 1 commit

EDAC: Fix mem_types strings type · f4ce6eca

Authored by Borislav Petkov on Aug 13, 2014

This one got forgotten during an earlier cleanup.
Signed-off-by: Borislav Petkov <bp@suse.de>

f4ce6eca

24 Jun, 2014 1 commit

trace, RAS: Add basic RAS trace event · 76ac8275

Authored by Chen, Gong on Jun 11, 2014

To avoid confusion and conflict of usage for RAS related trace event,
add a unified RAS trace event stub.

Start a RAS subsystem menu which will be fleshed out in time, when more
features get added to it.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1402475691-30045-2-git-send-email-gong.chen@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>

76ac8275

09 May, 2014 1 commit

EDAC: Fix MC scrub mode comparsion bug for correctable errors · aa2064d7

Authored by Loc Ho on May 08, 2014

The MC structure field scrub_mode is of integer type - not bit field.
Use it accordingly.
Signed-off-by: Loc Ho <lho@apm.com>
Link: http://lkml.kernel.org/r/1399590199-12256-2-git-send-email-lho@apm.com
Signed-off-by: Borislav Petkov <bp@suse.de>

aa2064d7
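
Note on aa2064d7: when a mode field holds sequential enumerators rather than bit flags, testing it with a bitwise AND matches unrelated modes. A minimal sketch of the pitfall follows. The enum `scrub_kind` and its values are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

/* Sequential values, as in a plain integer-typed mode field. */
enum scrub_kind {
	SK_UNKNOWN = 0,
	SK_NONE,		/* 1 */
	SK_SW_PROG,		/* 2 */
	SK_SW_SRC,		/* 3 */
	SK_SW_PROG_SRC		/* 4 */
};

/* Wrong: treats the value as a bit mask, so 1 and 2 also "match" 3. */
bool is_sw_src_bitwise(enum scrub_kind mode)
{
	return (mode & SK_SW_SRC) != 0;
}

/* Right: an integer-typed mode is compared for equality. */
bool is_sw_src(enum scrub_kind mode)
{
	return mode == SK_SW_SRC;
}
```

In this sketch `is_sw_src_bitwise(SK_NONE)` is true because 1 & 3 is nonzero, while the equality test correctly reports false.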

14 Feb, 2014 2 commits

EDAC: Correct workqueue setup path · cb6ef42e

Authored by Borislav Petkov on Feb 12, 2014

We're using edac_mc_workq_setup() both on the init path, when
we load an edac driver and when we change the polling period
(edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec.

On that second path we don't need to init the workqueue which has been
initialized already.

Thanks to Tejun for workqueue insights.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: <stable@vger.kernel.org>

cb6ef42e

EDAC: Poll timeout cannot be zero, p2 · 9da21b15

Authored by Borislav Petkov on Feb 03, 2014

Sanitize code even more to accept unsigned longs only and to not allow
polling intervals below 1 second as this is unnecessary and doesn't make
much sense anyway for polling errors.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@vger.kernel.org>

9da21b15
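
Note on 9da21b15: the rule described is "unsigned longs only, nothing below one second". A minimal user-space sketch of such validation follows, using the standard `strtoul()`. The function name `parse_poll_msec` and the constant `MIN_POLL_MSEC` are hypothetical, and the kernel's own parsing helpers are not reproduced here.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

#define MIN_POLL_MSEC 1000UL	/* one second, per the commit message */

/* Returns 0 and stores the value on success, -EINVAL otherwise. */
int parse_poll_msec(const char *s, unsigned long *out)
{
	char *end;
	unsigned long v;

	assert(out != NULL);
	/* strtoul() accepts signs and leading blanks; insist on a digit. */
	if (!s || !isdigit((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno == ERANGE)
		return -EINVAL;
	/* Allow one trailing newline, as produced by `echo`. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (v < MIN_POLL_MSEC)
		return -EINVAL;

	*out = v;
	return 0;
}
```

Rejecting a leading minus sign explicitly matters because `strtoul()` would otherwise negate the value and return a very large number that passes the lower bound.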

05 Nov, 2013 1 commit

edac: Unify reporting of device info for device, mc and pci · 7270a608

Authored by Robert Richter on Oct 10, 2013

Log messages slightly differ between edac subsystems. Unifying it.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Robert Richter <rric@kernel.org>

7270a608

24 Jul, 2013 1 commit

EDAC: Fix lockdep splat · 88d84ac9

Authored by Borislav Petkov on Jul 19, 2013

Fix the following:

BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
 0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
 ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
 ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
  dump_stack
  warn_slowpath_common
  warn_slowpath_fmt
  lockdep_init_map
  ? trace_hardirqs_on_caller
  ? trace_hardirqs_on
  debug_mutex_init
  __mutex_init
  bus_register
  edac_create_sysfs_mci_device
  edac_mc_add_mc
  sbridge_probe
  pci_device_probe
  driver_probe_device
  __driver_attach
  ? driver_probe_device
  bus_for_each_dev
  driver_attach
  bus_add_driver
  driver_register
  __pci_register_driver
  ? 0xffffffffa0010fff
  sbridge_init
  ? 0xffffffffa0010fff
  do_one_initcall
  load_module
  ? unset_module_init_ro_nx
  SyS_init_module
  tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.

What happens is that bus_register needs a statically allocated lock_key
because the last is handed in to lockdep. However, struct mem_ctl_info
embeds struct bus_type (the whole struct, not a pointer to it) and the
whole thing gets dynamically allocated.

Fix this by using a statically allocated struct bus_type for the MC bus.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: stable@kernel.org # v3.10
Signed-off-by: Tony Luck <tony.luck@intel.com>

88d84ac9

16 Mar, 2013 1 commit

EDAC: Merge mci.mem_is_per_rank with mci.csbased · 9713faec

Authored by Mauro Carvalho Chehab on Mar 11, 2013

Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.

There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:

	if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
			per_rank = true;
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

9713faec
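
Note on 9713faec: the core derives the per-rank property by scanning the layer descriptions for a chip-select layer. A self-contained sketch of that scan follows. The types `layer_kind` and `layer` and the function `layers_are_per_rank` are hypothetical stand-ins for the kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layer kinds, modelled on the snippet in the message. */
enum layer_kind {
	LAYER_BRANCH,
	LAYER_CHANNEL,
	LAYER_SLOT,
	LAYER_CHIP_SELECT
};

struct layer {
	enum layer_kind type;
	unsigned int size;
};

/* True if any layer of the hierarchy is a chip-select (csrow) layer. */
bool layers_are_per_rank(const struct layer *layers, size_t n_layers)
{
	size_t i;

	assert(layers != NULL || n_layers == 0);
	for (i = 0; i < n_layers; i++)
		if (layers[i].type == LAYER_CHIP_SELECT)
			return true;
	return false;
}
```

Deriving the flag from the layers means a driver cannot set it inconsistently with the hierarchy it registered.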

22 Feb, 2013 2 commits

edac: add support for raw error reports · e7e24830

Authored by Mauro Carvalho Chehab on Oct 31, 2012

That allows APEI GHES driver to report errors directly, using
the EDAC error report API.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

e7e24830

edac: reduce stack pressure by using a pre-allocated buffer · c7ef7645

Authored by Mauro Carvalho Chehab on Feb 21, 2013

The number of variables at the stack is too big.
Reduces the stack usage by using a pre-allocated error
buffer.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

c7ef7645

21 Feb, 2013 2 commits

edac: lock module owner to avoid error report conflicts · 80cc7d87

Authored by Mauro Carvalho Chehab on Oct 31, 2012

APEI GHES and i7core_edac/sb_edac currently can be loaded at
the same time, but those are Highlander modules:
	"There can be only one".

There are two reasons for that:

1) Each driver assumes that it is the only one registering at
   the EDAC core, as it is driver's responsibility to number
   the memory controllers, and all of them start from 0;

2) If BIOS is handling the memory errors, the OS can't also be
   doing it, as one will mangle with the other.

So, we need to add a module owner's lock at the EDAC core,
in order to avoid having two different modules handling memory
errors at the same time. The best way for doing this lock seems
to use the driver's name, as this is unique, and won't require
changes on every driver.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

80cc7d87
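
Note on 80cc7d87: the idea is an ownership record keyed on the driver name, so that a second, different module is refused while the first still has controllers registered. A single-threaded sketch of that idea follows. The names `owner_claim` and `owner_release`, the reference count and the `-EBUSY` return value are all assumptions made for illustration; the kernel's actual bookkeeping and locking are not shown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* NULL while no module is registered. The pointer is stored, not
 * copied, so the caller must keep the string alive. */
static const char *owner_name;
static unsigned int owner_refs;

/* Returns 0 if mod_name may register, -EBUSY if another module owns EDAC. */
int owner_claim(const char *mod_name)
{
	assert(mod_name != NULL);
	if (owner_name && strcmp(owner_name, mod_name) != 0)
		return -EBUSY;
	owner_name = mod_name;
	owner_refs++;
	return 0;
}

/* Drops one registration; ownership ends when the last one is gone. */
void owner_release(const char *mod_name)
{
	assert(owner_name && strcmp(owner_name, mod_name) == 0);
	assert(owner_refs > 0);
	if (--owner_refs == 0)
		owner_name = NULL;
}
```

Keying on the name lets one driver register several memory controllers while still excluding every other driver, and needs no per-driver changes.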

edac: add a new memory layer type · c66b5a79

Authored by Mauro Carvalho Chehab on Feb 15, 2013

There are some cases where the memory controller layout is
completely hidden. This is the case of firmware-driven error
code, like the one provided by GHES. Add a new layer to be
used on such memory error report mechanisms.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

c66b5a79

30 Jan, 2013 1 commit

EDAC: Fix kcalloc argument order · d3d09e18

Authored by Joe Perches on Jan 26, 2013

First number, then size.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>

d3d09e18
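
Note on d3d09e18: kcalloc() takes the element count first and the element size second, the same order as the standard calloc(). A user-space sketch with calloc() follows. The struct `chan_desc` and the helper `alloc_chans` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type, standing in for a per-channel descriptor. */
struct chan_desc {
	uint32_t ce_count;
	uint32_t idx;
};

/* Allocates `count` zeroed descriptors: count first, element size second. */
struct chan_desc *alloc_chans(size_t count)
{
	assert(count > 0);
	return calloc(count, sizeof(struct chan_desc));
}
```

Swapping the two arguments usually yields the same byte count, which is why the mistake survives testing; keeping the documented order keeps the call readable and lets the allocator's overflow check apply as intended.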

21 Dec, 2012 1 commit

edac: edac_mc no longer deals with kobjects directly · 80f5ab09

Authored by Shaun Ruffell on Aug 19, 2012

There are no more embedded kobjects in struct mem_ctl_info. Remove a header and
a comment that does not reflect the code anymore.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

80f5ab09

28 Nov, 2012 2 commits

EDAC: Handle empty msg strings when reporting errors · f430d570

Authored by Borislav Petkov on Sep 10, 2012

A reported error could look like this

[  226.178315] EDAC MC0: 1 CE  on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x427c0d offset:0xde0 grain:0 syndrome:0x1c6)

with two spaces back-to-back due to the msg argument of
edac_mc_handle_error being passed on empty by the specific drivers.
Handle that.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

f430d570
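
Note on f430d570: the doubled space appears when an empty message is printed between two fixed separators. One way to avoid it is to emit the separator only when the message is non-empty. The sketch below shows that technique in user space. The function `format_report` and its simplified format string are hypothetical and do not reproduce the kernel's report format.

```c
#include <assert.h>
#include <stdio.h>

/* Builds "<count> <type> [<msg> ]on <location>"; the space after msg is
 * emitted only when msg is non-empty. Returns snprintf()'s result. */
int format_report(char *buf, size_t len, unsigned int count,
		  const char *type, const char *msg, const char *location)
{
	assert(buf && type && location);
	if (!msg)
		msg = "";
	return snprintf(buf, len, "%u %s %s%son %s",
			count, type, msg, *msg ? " " : "", location);
}
```

With an empty message the output contains exactly one space between the error type and "on"; with a non-empty message the text is inserted together with its own trailing space.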

EDAC: Remove useless assignment of error type · 4da1b7bf

Authored by Borislav Petkov on Sep 10, 2012

The tracepoint decodes the error type later anyway so remove a useless
assignment to the temporary p which gets overwritten later anyway.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

4da1b7bf

25 Oct, 2012 1 commit

edac: Fix the dimm filling for csrows-based layouts · 24bef66e

Authored by Mauro Carvalho Chehab on Oct 24, 2012

The driver is currently filling data in a wrong way, on drivers
for csrows-based memory controller, when the first layer is a
csrow.

This is not easy to notice, as, in general, memories are
filled in dual, interleaved, symmetric mode, as very few memory
controllers support asymmetric modes.

While digging into a bug for i82975x_edac driver, the asymmetric
mode there is now working, allowing us to fill the machine with
4x1GB ranks at channel 0, and 2x512MB at channel 1:

Channel 0 ranks:
EDAC DEBUG: i82975x_init_csrows: DIMM A0: from page 0x00000000 to 0x0003ffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A1: from page 0x00040000 to 0x0007ffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A2: from page 0x00080000 to 0x000bffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A3: from page 0x000c0000 to 0x000fffff (size: 0x00040000 pages)

Channel 1 ranks:
EDAC DEBUG: i82975x_init_csrows: DIMM B0: from page 0x00100000 to 0x0011ffff (size: 0x00020000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM B1: from page 0x00120000 to 0x0013ffff (size: 0x00020000 pages)

Instead of properly showing the memories as such, before this patch, it
shows the memory layout as:

          +-----------------------------------+
          |                mc0                |
          |  csrow0   |  csrow1   |  csrow2   |
----------+-----------------------------------+
channel1: |  1024 MB  |  1024 MB  |   512 MB  |
channel0: |  1024 MB  |  1024 MB  |   512 MB  |
----------+-----------------------------------+

as if both channels were symmetric, grouping the DIMMs on a wrong
layout.

After this patch, the memory is correctly represented.
So, for csrows at layers[0], it shows:

          +-----------------------------------------------+
          |                      mc0                      |
          |  csrow0   |  csrow1   |  csrow2   |  csrow3   |
----------+-----------------------------------------------+
channel1: |   512 MB  |   512 MB  |     0 MB  |     0 MB  |
channel0: |  1024 MB  |  1024 MB  |  1024 MB  |  1024 MB  |
----------+-----------------------------------------------+

For csrows at layers[1], it shows:

        +-----------------------+
        |          mc0          |
        | channel0  | channel1  |
--------+-----------------------+
csrow3: |  1024 MB  |     0 MB  |
csrow2: |  1024 MB  |     0 MB  |
--------+-----------------------+
csrow1: |  1024 MB  |   512 MB  |
csrow0: |  1024 MB  |   512 MB  |
--------+-----------------------+

So, no matter of what comes first, the information between
channel and csrow will be properly represented.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

24bef66e

24 Sep, 2012 2 commits

edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs. · faa2ad09

Authored by Shaun Ruffell on Sep 22, 2012

Fix potential NULL pointer dereference in edac_unregister_sysfs() on
system boot introduced in 3.6-rc1.

Since commit 7a623c03 ("edac: rewrite the sysfs code to use struct
device") edac_mc_alloc() no longer initializes embedded kobjects in
struct mem_ctl_info.  Therefore edac_mc_free() can no longer simply
decrement a kobject reference count to free the allocated memory unless
the memory controller driver module had also called edac_mc_add_mc().

Now edac_mc_free() will check if the newly embedded struct device has
been registered with sysfs before using either the standard device
release functions or freeing the data structures itself with logic
pulled out of the error path of edac_mc_alloc().

The BUG this patch resolves for me:

  BUG: unable to handle kernel NULL pointer dereference at   (null)
  EIP is at __wake_up_common+0x1a/0x6a
  Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 task.ti=f3dc6000)
  Call Trace:
    complete_all+0x3f/0x50
    device_pm_remove+0x23/0xa2
    device_del+0x34/0x142
    edac_unregister_sysfs+0x3b/0x5c [edac_core]
    edac_mc_free+0x29/0x2f [edac_core]
    e7xxx_probe1+0x268/0x311 [e7xxx_edac]
    e7xxx_init_one+0x56/0x61 [e7xxx_edac]
    local_pci_probe+0x13/0x15
  ...

Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

faa2ad09

edac_mc: fix messy kfree calls in the error path · ef6e7816

Authored by Fengguang Wu on Sep 23, 2012

coccinelle warns about:

+ drivers/edac/edac_mc.c:429:9-23: ERROR: reference preceded by free on line 429

   421         if (mci->csrows) {
 > 422                 for (chn = 0; chn < tot_channels; chn++) {
   423                         csr = mci->csrows[chn];
   424                         if (csr) {
 > 425                                 for (chn = 0; chn < tot_channels; chn++)
   426                                          kfree(csr->channels[chn]);
   427                                  kfree(csr);
   428                          }
 > 429                          kfree(mci->csrows[i]);
   430                  }
   431                  kfree(mci->csrows);
   432          }

and that code block seem to mess things up in several ways (double free, memory
leak, out-of-bound reads etc.):

L422: The iterator "chn" and bound "tot_channels" are totally wrong. Should be
      "row" and "tot_csrows" respectively. Which means either memory leak, or
      out-of-bound reads (which if does not trigger an immediate page fault
      error, will further lead to kfree() on random addresses).

L425: The inner loop is reusing the same iterator "chn" as the outer loop,
      which could lead to premature end of the outer loop, and hence memory leak.

L429: The array index 'i' in mci->csrows[i] is a temporary value used in
      previous loops, and won't change at all in the current loop. Which
      means either out-of-bound read and possibly kfree(random number), or the
      same mci->csrows[i] get freed once and again, and possibly double free
      for the kfree(csr) in L427.

L426/L427: a kfree(csr->channels) is needed in between to avoid leaking the memory.

The buggy code was introduced by commit de3910eb ("edac: change the mem
allocation scheme to make Documentation/kobject.txt happy") in the 3.6-rc1
merge window. Fix it by freeing up resources in this order:

  free csrows[i]->channels[j]
  free csrows[i]->channels
  free csrows[i]
  free csrows

CC: Mauro Carvalho Chehab <mchehab@redhat.com>
CC: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ef6e7816
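
Note on ef6e7816: the freeing order listed in the message (innermost objects first, each loop with its own iterator and bound) can be sketched in user space as follows. The structs `chan`, `row` and `ctl` and both functions are hypothetical simplifications, not the kernel's mem_ctl_info layout.

```c
#include <assert.h>
#include <stdlib.h>

struct chan { int idx; };
struct row  { struct chan **channels; size_t nr_channels; };
struct ctl  { struct row **rows; size_t nr_rows; };

/* Tears down innermost objects first; tolerates partially built trees. */
void ctl_free(struct ctl *c)
{
	size_t r, ch;	/* separate iterators for the two loops */

	if (!c)
		return;
	if (c->rows) {
		for (r = 0; r < c->nr_rows; r++) {
			struct row *row = c->rows[r];

			if (!row)
				continue;
			if (row->channels) {
				for (ch = 0; ch < row->nr_channels; ch++)
					free(row->channels[ch]);
				free(row->channels);
			}
			free(row);
		}
		free(c->rows);
	}
	free(c);
}

/* Builds the tree; on any failure, frees whatever was already built. */
struct ctl *ctl_alloc(size_t nr_rows, size_t nr_channels)
{
	struct ctl *c;
	size_t r, ch;

	assert(nr_rows > 0 && nr_channels > 0);
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->rows = calloc(nr_rows, sizeof(*c->rows));
	if (!c->rows)
		goto fail;
	c->nr_rows = nr_rows;

	for (r = 0; r < nr_rows; r++) {
		struct row *row = calloc(1, sizeof(*row));

		if (!row)
			goto fail;
		c->rows[r] = row;
		row->channels = calloc(nr_channels, sizeof(*row->channels));
		if (!row->channels)
			goto fail;
		row->nr_channels = nr_channels;
		for (ch = 0; ch < nr_channels; ch++) {
			row->channels[ch] = calloc(1, sizeof(struct chan));
			if (!row->channels[ch])
				goto fail;
			row->channels[ch]->idx = (int)ch;
		}
	}
	return c;

fail:
	ctl_free(c);
	return NULL;
}
```

Because every array comes from calloc(), slots that were never filled read as NULL on common platforms, so the same teardown routine serves both the normal path and the error path.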

openeuler / Kernel synced successfully more than 1 year ago