Commit 42f930da, authored by Don Zickus, committed by Thomas Gleixner

watchdog/hardlockup/perf: Use atomics to track in-use cpu counter

Guenter reported:
  There is still a problem. When running 
    echo 6 > /proc/sys/kernel/watchdog_thresh
    echo 5 > /proc/sys/kernel/watchdog_thresh
  repeatedly, the message
 
   NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
 
  stops after a while (after ~10-30 iterations, with fluctuations).
  Maybe watchdog_cpus needs to be atomic ?

That's correct: this is again caused by the asynchronous nature of the
smpboot thread unpark mechanism.

CPU 0				CPU1			CPU2
write(watchdog_thresh, 6)	
  stop()
    park()
  update()
  start()
    unpark()
				thread->unpark()
				  cnt++;
write(watchdog_thresh, 5)				thread->unpark()
  stop()
    park()			thread->park()
				   cnt--;		  cnt++;
  update()
  start()
    unpark()

That's not a functional problem; it only affects the informational message.

Convert watchdog_cpus to atomic_t to prevent the problem.

Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20171101181126.j727fqjmdthjz4xk@redhat.com
Parent: 9c388a5e
@@ -12,6 +12,7 @@
 #define pr_fmt(fmt) "NMI watchdog: " fmt
 #include <linux/nmi.h>
+#include <linux/atomic.h>
 #include <linux/module.h>
 #include <linux/sched/debug.h>
@@ -25,7 +26,7 @@ static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 static unsigned long hardlockup_allcpu_dumped;
-static unsigned int watchdog_cpus;
+static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 void arch_touch_nmi_watchdog(void)
 {
@@ -189,7 +190,8 @@ void hardlockup_detector_perf_enable(void)
 	if (hardlockup_detector_event_create())
 		return;
-	if (!watchdog_cpus++)
+	/* use original value for check */
+	if (!atomic_fetch_inc(&watchdog_cpus))
 		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
 	perf_event_enable(this_cpu_read(watchdog_ev));
@@ -207,7 +209,7 @@ void hardlockup_detector_perf_disable(void)
 		this_cpu_write(watchdog_ev, NULL);
 		this_cpu_write(dead_event, event);
 		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
-		watchdog_cpus--;
+		atomic_dec(&watchdog_cpus);
 	}
 }