• C
    x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll · fa92c586
    Committed by Chen Yucong
    Uncorrected no action required (UCNA) - is an uncorrected recoverable
    machine check error that is not signaled via a machine check exception
    and, instead, is reported to system software as a corrected machine
    check error. UCNA errors indicate that some data in the system is
    corrupted, but the data has not been consumed and the processor state
    is valid and you may continue execution on this processor. UCNA errors
    require no action from system software to continue execution. Note that
    UCNA errors are supported by the processor only when IA32_MCG_CAP[24]
    (MCG_SER_P) is set.
                                                   -- Intel SDM Volume 3B
    
    Deferred errors are errors that cannot be corrected by hardware, but
    do not cause an immediate interruption in program flow, loss of data
    integrity, or corruption of processor state. These errors indicate
    that data has been corrupted but not consumed. Hardware writes information
    to the status and address registers in the corresponding bank that
    identifies the source of the error if deferred errors are enabled for
    logging. Deferred errors are not reported via machine check exceptions;
    they can be seen by polling the MCi_STATUS registers.
                                                    -- AMD64 APM Volume 2
    
    Both UCNA and Deferred errors are detected errors that hardware cannot
    correct, which makes them very similar to Software Recoverable Action
    Optional (SRAO) errors. Therefore, we can reuse some of the actions
    already used for handling SRAO errors to handle UCNA and Deferred
    errors in machine_check_poll.
    Acked-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: Chen Yucong <slaoub@gmail.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>
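
The classification described in the message above can be sketched in user-space C. This is a minimal illustration, not the kernel's code: the enum names and the helpers classify_polled_status() and should_offline_page() are hypothetical, the bit positions are the MCi_STATUS bits as documented in the Intel SDM and AMD64 APM, and the check that the error is actually a memory error (which a real handler would also need) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* MCi_STATUS bits (Intel SDM Vol. 3B; bit 44 is the AMD Deferred flag). */
#define MCI_STATUS_VAL      (1ULL << 63)  /* register contents are valid */
#define MCI_STATUS_UC       (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_ADDRV    (1ULL << 58)  /* MCi_ADDR holds a valid address */
#define MCI_STATUS_PCC      (1ULL << 57)  /* processor context corrupt */
#define MCI_STATUS_S        (1ULL << 56)  /* signaled via machine check exception */
#define MCI_STATUS_AR       (1ULL << 55)  /* action required */
#define MCI_STATUS_DEFERRED (1ULL << 44)  /* AMD only: deferred error */

/* Hypothetical names, introduced only for this sketch. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD };
enum poll_class {
    POLL_IGNORE,     /* nothing logged in this bank */
    POLL_CORRECTED,  /* corrected error: log only */
    POLL_UCNA,       /* Intel: uncorrected, no action required */
    POLL_DEFERRED,   /* AMD: uncorrected, data not yet consumed */
    POLL_SIGNALED    /* left to the machine check exception handler */
};

/* Classify one polled MCi_STATUS value. ser_p mirrors IA32_MCG_CAP[24]. */
static enum poll_class classify_polled_status(uint64_t status,
                                              enum cpu_vendor vendor,
                                              bool ser_p)
{
    if (!(status & MCI_STATUS_VAL))
        return POLL_IGNORE;

    if (vendor == VENDOR_AMD) {
        if (status & MCI_STATUS_UC)
            return POLL_SIGNALED;
        return (status & MCI_STATUS_DEFERRED) ? POLL_DEFERRED
                                              : POLL_CORRECTED;
    }

    /* Intel: bit 44 has no Deferred meaning here, so it is not consulted. */
    if (!(status & MCI_STATUS_UC))
        return POLL_CORRECTED;

    /* UCNA: UC=1 with PCC=0, S=0, AR=0, and only when MCG_SER_P is set. */
    if (ser_p &&
        !(status & (MCI_STATUS_PCC | MCI_STATUS_S | MCI_STATUS_AR)))
        return POLL_UCNA;

    return POLL_SIGNALED;
}

/*
 * SRAO-style recovery (taking the affected page out of service) only
 * makes sense for UCNA/Deferred errors that report a valid address.
 */
static bool should_offline_page(enum poll_class cls, uint64_t status)
{
    return (cls == POLL_UCNA || cls == POLL_DEFERRED) &&
           (status & MCI_STATUS_ADDRV) != 0;
}
```

A polling loop would call classify_polled_status() once per bank and, when should_offline_page() returns true, hand the page frame derived from MCi_ADDR to the same recovery path used for SRAO errors.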
mce.c 60.1 KB