• T
    kernel/watchdog: Prevent false positives with turbo modes · 7edaeb68
    Thomas Gleixner 提交于
    The hardlockup detector on x86 uses a performance counter based on unhalted
    CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the
    performance counter period, so the hrtimer should fire 2-3 times before the
    performance counter NMI fires. The NMI code checks whether the hrtimer
    fired since the last invocation. If not, it assumess a hard lockup.
    
    The calculation of those periods is based on the nominal CPU
    frequency. Turbo modes increase the CPU clock frequency and therefore
    shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x
    nominal frequency) the perf/NMI period is shorter than the hrtimer period
    which leads to false positives.
    
    A simple fix would be to shorten the hrtimer period, but that comes with
    the side effect of more frequent hrtimer and softlockup thread wakeups,
    which is not desired.
    
    Implement a low pass filter, which checks the perf/NMI period against
    kernel time. If the perf/NMI fires before 4/5 of the watchdog period has
    elapsed then the event is ignored and postponed to the next perf/NMI.
    
    That solves the problem and avoids the overhead of shorter hrtimer periods
    and more frequent softlockup thread wakeups.
    
    Fixes: 58687acb ("lockup_detector: Combine nmi_watchdog and softlockup detector")
    Reported-and-tested-by: NKan Liang <Kan.liang@intel.com>
    Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
    Cc: dzickus@redhat.com
    Cc: prarit@redhat.com
    Cc: ak@linux.intel.com
    Cc: babu.moger@oracle.com
    Cc: peterz@infradead.org
    Cc: eranian@google.com
    Cc: acme@redhat.com
    Cc: stable@vger.kernel.org
    Cc: atomlin@redhat.com
    Cc: akpm@linux-foundation.org
    Cc: torvalds@linux-foundation.org
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos
    7edaeb68
watchdog.c 25.3 KB