    corelockup: Add support of cpu core hang check · 69a48b7f
    Committed by Dong Kai
    ascend inclusion
    category: feature
    bugzilla: NA
    CVE: NA
    
    --------------------------------
    
    The softlockup and hardlockup detectors only check the status
    of the cpu on which they run. If a cpu core is suspended,
    neither of them works. No valid log is produced even though
    the cpu is already abnormal, which causes many problems for the
    system. To detect this case, we add the corelockup detector.
    
    First, we use whether a cpu core can respond to nmi as the
    criterion to determine if it is suspended. Then things are
    simple. Each cpu core maintains its own nmi interrupt count and
    checks the nmi count of the next cpu core. If that count no
    longer changes, the next core can't respond to nmi normally,
    and we regard it as suspended.
    
    To ensure robustness, the warning is triggered only after nmi
    is lost more than two consecutive times.
    
    The detection chain is as follows:
    cpu0->cpu1->...->cpuN->cpu0
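
The ring check described above can be sketched as a small userspace simulation. This is only an illustration of the idea, not the code added by this patch: the names `core_state`, `nmi_tick`, `check_next_core`, `NR_CORES` and `MISS_THRESHOLD` are hypothetical, and a real detector would use per-cpu variables and run the check from the NMI handler.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CORES 4        /* hypothetical core count for this simulation */
#define MISS_THRESHOLD 2  /* report only after more than two consecutive misses */

/* Hypothetical per-core bookkeeping; not the structures used by the patch. */
struct core_state {
    unsigned long nmi_count;   /* bumped each time this core handles an NMI */
    unsigned long saved_next;  /* last nmi_count observed on the next core */
    unsigned int missed;       /* consecutive checks without progress on the next core */
};

static struct core_state cores[NR_CORES];

/* Next core in the chain cpu0->cpu1->...->cpuN->cpu0. */
static size_t next_core(size_t cpu)
{
    return (cpu + 1) % NR_CORES;
}

/* Simulates an NMI being handled on a responsive core. */
static void nmi_tick(size_t cpu)
{
    cores[cpu].nmi_count++;
}

/*
 * Run by `cpu` on each of its own checks. Returns true once the next core's
 * NMI count has stayed unchanged for more than MISS_THRESHOLD consecutive
 * checks, i.e. when the next core should be reported as suspended.
 */
static bool check_next_core(size_t cpu)
{
    struct core_state *self = &cores[cpu];
    unsigned long seen = cores[next_core(cpu)].nmi_count;

    if (seen != self->saved_next) {
        /* The next core made progress: remember the count, clear misses. */
        self->saved_next = seen;
        self->missed = 0;
        return false;
    }

    self->missed++;
    return self->missed > MISS_THRESHOLD;
}
```

In this sketch, if core 1 stops calling `nmi_tick(1)`, the third consecutive `check_next_core(0)` that sees no change returns true. Any later tick on core 1 resets the miss counter.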
    Signed-off-by: Dong Kai <dongkai11@huawei.com>
    Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
    Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
nmi.h 7.7 KB