1. 04 Oct, 2017 1 commit
  2. 28 Sep, 2017 1 commit
  3. 26 Sep, 2017 1 commit
  4. 14 Sep, 2017 26 commits
    • watchdog/hardlockup: Clean up hotplug locking mess · ab5fe3ff
      Thomas Gleixner committed
      All watchdog thread related functions are delegated to the smpboot thread
      infrastructure, which handles serialization against CPU hotplug correctly.
      
      The sysctl interface is completely decoupled from anything which requires
      CPU hotplug protection.
      
      No need to protect the sysctl writes against cpu hotplug anymore. Remove it
      and add the now required protection to the powerpc arch_nmi_watchdog
      implementation.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20170912194148.418497420@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/hardlockup/perf: Simplify deferred event destroy · a33d4484
      Thomas Gleixner committed
      Now that all functionality is properly serialized against CPU hotplug,
      remove the extra per cpu storage which holds the disabled events for
      cleanup. The core makes sure that cleanup happens before new events are
      created.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194148.340708074@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/hardlockup/perf: Use new perf CPU enable mechanism · 146c9d0e
      Thomas Gleixner committed
      Get rid of the hodgepodge which tries to be smart about perf being
      unavailable and error printout rate limiting.
      
      That's all not required simply because this is never invoked when the perf
      NMI watchdog is not functional.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194148.259651788@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/hardlockup/perf: Implement CPU enable replacement · 2a1b8ee4
      Thomas Gleixner committed
      watchdog_nmi_enable() is an unparseable mess. Provide a clean perf specific
      implementation, which will be used when the existing setup/teardown mess is
      replaced.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194148.180215498@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/hardlockup/perf: Implement init time detection of perf · a994a314
      Thomas Gleixner committed
      Use the init time detection of the perf NMI watchdog to determine whether
      the perf NMI watchdog is functional. If not, disable it permanently. It
      won't come back magically at runtime.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194148.099799541@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/hardlockup/perf: Implement init time perf validation · 178b9f7a
      Thomas Gleixner committed
      The watchdog tries to create perf events even after it figured out that
      perf is not functional or the requested event is not supported.
      
      That's braindead as this can be done once at init time and if not supported
      the NMI watchdog can be turned off unconditionally.
      
      Implement the perf hardlockup detector functionality for that. This creates
      a new event create function, which will replace the unholy mess of the
      existing one in later patches.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194148.019090547@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
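The "validate once at init time, then never probe again" idea from the commit above can be sketched as a small userspace model. This is not the kernel code: `try_create_event()`, `watchdog_init_probe()`, `watchdog_enable_request()` and the two flags are hypothetical names, and the probe merely stands in for an attempt to create a perf event.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the hardware probe; counts how often it runs. */
static int probe_calls;
static bool hw_supported;            /* set by the caller for the demo */

static bool try_create_event(void)
{
	probe_calls++;
	return hw_supported;
}

/* Verdict cached at init time; never re-evaluated afterwards. */
static bool nmi_watchdog_available;

/* Run exactly once during initialisation. */
static void watchdog_init_probe(void)
{
	nmi_watchdog_available = try_create_event();
}

/* Later enable requests only consult the cached verdict. */
static bool watchdog_enable_request(void)
{
	return nmi_watchdog_available;
}
```

In this model a failed probe turns the watchdog off for good: later enable requests are refused without touching the probe again, which mirrors the reasoning that perf does not appear at runtime.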
    • watchdog/core: Get rid of the racy update loop · 09154985
      Thomas Gleixner committed
      Letting user space poke directly at variables which are used at run time is
      stupid and causes a lot of race conditions and other issues.
      
      Separate the user variables and on change invoke the reconfiguration, which
      then stops the watchdogs, reevaluates the new user value and restarts the
      watchdogs with the new parameters.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194147.939985640@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage · 6592ad2f
      Thomas Gleixner committed
      Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure()
      need to be done in two steps.
      
           1) Stop all NMIs
           2) Read the new parameters and start NMIs
      
      Right now watchdog_nmi_reconfigure() is a combination of both. To allow a
      clean reconfiguration add a 'run' argument and split the functionality in
      powerpc.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20170912194147.862865570@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
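The two-stage reconfiguration described in the commit above (stop all NMIs, then read the new parameters and start NMIs) can be modelled roughly as follows. This is a userspace sketch under assumptions: `nmi_reconfigure()`, `set_thresh()` and the variables are hypothetical and only mimic the shape of a function taking a 'run' argument.

```c
#include <stdbool.h>

static int user_thresh = 10;  /* hypothetical user-visible parameter */
static int nmi_thresh;        /* value the NMI side actually uses */
static bool nmi_active;

/*
 * run == false: stage 1, stop all NMIs.
 * run == true:  stage 2, read the new parameters and start NMIs.
 */
static void nmi_reconfigure(bool run)
{
	if (!run) {
		nmi_active = false;
		return;
	}
	nmi_thresh = user_thresh;
	nmi_active = nmi_thresh > 0;
}

/* The user value only changes while the NMI side is stopped. */
static void set_thresh(int new_thresh)
{
	nmi_reconfigure(false);
	user_thresh = new_thresh;
	nmi_reconfigure(true);
}
```

Because the parameter is written between the two stages, the NMI side in this model never runs with a value that differs from the one it was started with.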
    • watchdog/sysctl: Clean up sysctl variable name space · 7feeb9cd
      Thomas Gleixner committed
      Reflect that these variables are user interface related and remove the
      whitespace damage in the sysctl table while at it.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194147.783210221@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/sysctl: Get rid of the #ifdeffery · 51d4052b
      Thomas Gleixner committed
      The sysctl of the nmi_watchdog file prevents writes by setting:
      
          min = max = 0
      
      if none of the users is enabled. That involves ifdeffery and is completely
      non obvious.

      If none of the facilities is enabled, then the file can simply be made read
      only. Move the ifdeffery into the header and use a constant for file
      permissions.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194147.706073616@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
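Moving the conditional into a header and using a permission constant, as the commit above describes, might look roughly like this. The sketch is illustrative only: `HAVE_HARDLOCKUP_DETECTOR`, `NMI_WATCHDOG_PERM`, `struct ctl_entry` and `entry_is_writable()` are assumed names for the example, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical config switch standing in for the kernel's Kconfig options. */
#ifndef HAVE_HARDLOCKUP_DETECTOR
#define HAVE_HARDLOCKUP_DETECTOR 0
#endif

/* The only conditional lives here, as it would in a header. */
#if HAVE_HARDLOCKUP_DETECTOR
# define NMI_WATCHDOG_PERM 0644   /* writable by the owner */
#else
# define NMI_WATCHDOG_PERM 0444   /* read only */
#endif

/* Simplified table entry; the table itself needs no #ifdef. */
struct ctl_entry {
	const char *name;
	unsigned int mode;
};

static const struct ctl_entry nmi_watchdog_entry = {
	.name = "nmi_watchdog",
	.mode = NMI_WATCHDOG_PERM,
};

static bool entry_is_writable(const struct ctl_entry *e)
{
	return (e->mode & 0222) != 0;
}
```

With no facility configured, the entry is plainly read only, which is easier to follow than clamping the accepted range to min = max = 0.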
    • watchdog/core: Further simplify sysctl handling · e8b62b2d
      Thomas Gleixner committed
      Use a single function to update sysctl changes. This is not a high
      frequency user space interface and it's root only.
      
      Preparatory patch to clean up the sysctl variable handling.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194147.549114957@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/core: Get rid of the thread teardown/setup dance · d57108d4
      Thomas Gleixner committed
      The lockup detector reconfiguration tears down all watchdog threads when
      the watchdog is disabled and sets them up again when it's enabled.
      
      That's a pointless exercise. The watchdog threads are not consuming an
      insane amount of resources, so it's enough to set them up at init time and
      keep them in parked position when the watchdog is disabled and unpark them
      when it is reenabled. The smpboot thread infrastructure takes care of
      keeping the force parked threads in place even across cpu hotplug.
      
      Aside of that the code implements the park/unpark facility of smp hotplug
      threads on its own, which is even more pointless. We have functionality in
      the smpboot thread code to do so.
      
      Use the new thread management functions and get rid of the unholy mess.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194147.470370113@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/core: Create new thread handling infrastructure · 2eb2527f
      Thomas Gleixner committed
      The lockup detector reconfiguration tears down all watchdog threads when
      the watchdog is disabled and sets them up again when it's enabled.
      
      That's a pointless exercise. The watchdog threads are not consuming an
      insane amount of resources, so it's enough to set them up at init time and
      keep them in parked position when the watchdog is disabled and unpark them
      when it is reenabled. The smpboot thread infrastructure takes care of
      keeping the force parked threads in place even across cpu hotplug.
      
      Another horrible mechanism are the open coded park/unpark loops which are
      used for reconfiguration of the watchdog. The smpboot infrastructure allows
      exactly the same via smpboot_update_cpumask_thread_percpu(), which is cpu
      hotplug safe. Using that instead of the open coded loops allows to get rid
      of the hotplug locking mess in the watchdog code.
      
      Implement a clean infrastructure which allows to replace the open coded
      nonsense.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194147.377182587@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • smpboot/threads, watchdog/core: Avoid runtime allocation · 0d85923c
      Thomas Gleixner committed
      smpboot_update_cpumask_threads_percpu() allocates a temporary cpumask at
      runtime. This is suboptimal because the call site needs more code size for
      proper error handling than a statically allocated temporary mask requires
      data size.
      
      Add a static temporary cpumask. The function is globally serialized, so no
      further protection is required.

      Remove the half-baked error handling in the watchdog code and get rid of
      the export as there are no in tree modular users of that function.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194147.297288838@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
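The trade-off in the commit above, a static scratch mask instead of a runtime allocation that can fail, can be illustrated with a toy model. Everything here is assumed for the example: `cpumask_t` is a plain 64-bit integer and `update_cpumask()` is a hypothetical helper, not the smpboot function.

```c
#include <stdint.h>

typedef uint64_t cpumask_t;   /* toy mask: one bit per CPU, at most 64 CPUs */

/* Static scratch space; safe only because callers are globally serialized. */
static cpumask_t tmp_mask;

/*
 * Replace *cur with new_mask and report which CPUs were added and which
 * were dropped. Nothing is allocated, so there is no failure path and
 * callers need no error handling.
 */
static void update_cpumask(cpumask_t *cur, cpumask_t new_mask,
			   cpumask_t *added, cpumask_t *dropped)
{
	tmp_mask = *cur & ~new_mask;   /* CPUs whose threads get parked */
	*dropped = tmp_mask;
	tmp_mask = new_mask & ~*cur;   /* CPUs whose threads get unparked */
	*added = tmp_mask;
	*cur = new_mask;
}
```

The function returns `void` on purpose: with the allocation gone, the call site loses its error branch, which is the code size argument made in the commit message.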
    • watchdog/core: Split out cpumask write function · 05ba3de7
      Thomas Gleixner committed
      Split the write part of the cpumask proc handler out into a separate helper
      to avoid deep indentation. This also reduces the patch complexity in the
      following cleanups.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194147.218075991@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/core: Clean up the #ifdef maze · 368a7e2c
      Thomas Gleixner committed
      The #ifdef maze in this file is horrible, group stuff at least a bit so one
      can figure out what belongs to what.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194147.139629546@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/core: Clean up stub functions · 2b9d7f23
      Thomas Gleixner committed
      Having stub functions which take a full page is not helping the
      readability of code.

      Condense them and move the doubled #ifdef variant into the SYSFS section.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194147.045545271@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/core: Remove the park_in_progress obfuscation · 01f0a027
      Thomas Gleixner committed
      Commit:
      
        b94f5118 ("kernel/watchdog: prevent false hardlockup on overloaded system")
      
      tries to fix the following issue:
      
      proc_write()
         set_sample_period()    <--- New sample period becomes visible
      			  <----- Broken starts
         proc_watchdog_update()
           watchdog_enable_all_cpus()		watchdog_hrtimer_fn()
           update_watchdog_all_cpus()		   restart_timer(sample_period)
              watchdog_park_threads()
      
      					thread->park()
      					  disable_nmi()
      			  <----- Broken ends
      
      The reason why this is broken is that the update of the watchdog threshold
      becomes immediately effective and visible for the hrtimer function which
      uses that value to rearm the timer. But the NMI/perf side still uses the
      old value up to the point where it is disabled. If the rate has been
      lowered then the NMI can run fast enough to 'detect' a hard lockup because
      the timer has not fired due to the longer period.
      
      The patch 'fixed' this by adding a variable:
      
      proc_write()
         set_sample_period()
      					<----- Broken starts
         proc_watchdog_update()
           watchdog_enable_all_cpus()		watchdog_hrtimer_fn()
           update_watchdog_all_cpus()		   restart_timer(sample_period)
               watchdog_park_threads()
      	  park_in_progress = 1
      					<----- Broken ends
      				        nmi_watchdog()
      					  if (park_in_progress)
      					     return;
      
      The only effect of this variable was to make the window where the breakage
      can hit small enough that it was no longer observable in testing. From a
      correctness point of view it is a pointless bandaid which merely papers
      over the root cause: the unsynchronized update of the variable.

      Looking deeper into the related code paths unearthed similar problems in
      the watchdog_start()/stop() functions.
      
       watchdog_start()
      	perf_nmi_event_start()
      	hrtimer_start()
      
       watchdog_stop()
      	hrtimer_cancel()
      	perf_nmi_event_stop()
      
      In both cases the call order is wrong because if the task gets preempted
      or the VM gets scheduled out long enough after the first call, then there is
      a chance that the next NMI will see a stale hrtimer interrupt count and
      trigger a false positive hard lockup splat.
      
      Get rid of park_in_progress so the code can be gradually deobfuscated and
      pruned from several layers of duct tape papering over the root cause,
      which has been either ignored or not understood at all.
      
      Once this is removed the underlying problem will be fixed by rewriting the
      proc interface to do a proper synchronized update.
      
      Address the start/stop() ordering problem as well by reversing the call
      order, so this part is at least correct now.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1709052038270.2393@nanos
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
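The start ordering problem described in the commit above can be reproduced in a small deterministic model. This is only a sketch with assumed names (`timer_tick()`, `nmi_check()`, `long_delay()`, `watchdog_start()` and a `timer_first` flag); it simulates a long preemption between the two calls instead of using real timers or NMIs.

```c
#include <stdbool.h>

static bool timer_armed, nmi_armed;
static unsigned long hrtimer_interrupts, saved_interrupts;
static int false_positives;

static void model_reset(void)
{
	timer_armed = nmi_armed = false;
	hrtimer_interrupts = saved_interrupts = 0;
	false_positives = 0;
}

/* The timer only counts while it is armed. */
static void timer_tick(void)
{
	if (timer_armed)
		hrtimer_interrupts++;
}

/* The NMI reports a lockup when the timer count has not moved. */
static bool nmi_check(void)
{
	if (!nmi_armed)
		return false;
	if (hrtimer_interrupts == saved_interrupts)
		return true;
	saved_interrupts = hrtimer_interrupts;
	return false;
}

/* Simulates being preempted for several NMI periods between the two calls. */
static void long_delay(void)
{
	for (int i = 0; i < 3; i++) {
		timer_tick();
		if (nmi_check())
			false_positives++;
	}
}

/* timer_first == false models the old order, true the reversed one. */
static void watchdog_start(bool timer_first)
{
	if (timer_first) {
		timer_armed = true;
		long_delay();
		nmi_armed = true;
	} else {
		nmi_armed = true;
		long_delay();
		timer_armed = true;
	}
}
```

With the NMI armed first, every simulated NMI during the delay sees an unchanged timer count and reports a lockup. With the timer armed first, the NMI is not yet active during the delay, so nothing is reported. The stop path is the mirror image: the NMI side has to go quiet before the timer is cancelled.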
    • watchdog/hardlockup/perf: Prevent CPU hotplug deadlock · 941154bd
      Thomas Gleixner committed
      The following deadlock is possible in the watchdog hotplug code:
      
        cpus_write_lock()
          ...
            takedown_cpu()
              smpboot_park_threads()
                smpboot_park_thread()
                  kthread_park()
                    ->park() := watchdog_disable()
                      watchdog_nmi_disable()
                        perf_event_release_kernel();
                          put_event()
                            _free_event()
                              ->destroy() := hw_perf_event_destroy()
                                x86_release_hardware()
                                  release_ds_buffers()
                                    get_online_cpus()
      
      when a per cpu watchdog perf event is destroyed which drops the last
      reference to the PMU hardware. The cleanup code there invokes
      get_online_cpus() which instantly deadlocks because the hotplug percpu
      rwsem is write locked.
      
      To solve this add a deferring mechanism:
      
        cpus_write_lock()
      			   kthread_park()
      			    watchdog_nmi_disable(deferred)
      			      perf_event_disable(event);
      			      move_event_to_deferred(event);
      			   ....
        cpus_write_unlock()
        cleanup_deferred_events()
          perf_event_release_kernel()
      
      This is still properly serialized against concurrent hotplug via the
      cpu_add_remove_lock, which is held by the task which initiated the hotplug
      event.
      
      This is also used to handle event destruction when the watchdog threads are
      parked via other mechanisms than CPU hotplug.
      Analyzed-by: Peter Zijlstra <peterz@infradead.org>
      Reported-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194146.884469246@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
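The deferring mechanism from the commit above (disable the event while the hotplug lock is write held, release it only after the lock is dropped) can be sketched as a userspace model. The names `struct event`, `event_release()`, `watchdog_nmi_disable_deferred()` and the `hotplug_write_locked` flag are assumptions made for this example; a counter stands in for the deadlock.

```c
#include <stdbool.h>
#include <stddef.h>

struct event {
	bool enabled;
	bool released;
	struct event *next_deferred;
};

static bool hotplug_write_locked;   /* models cpus_write_lock() being held */
static struct event *deferred_list;
static int deadlocks;               /* releases attempted under the lock */

/* Models the final release, which needs the hotplug lock for reading. */
static void event_release(struct event *ev)
{
	if (hotplug_write_locked) {
		deadlocks++;        /* would block forever in the real case */
		return;
	}
	ev->released = true;
}

/* Called with the hotplug lock held: disable only, defer the release. */
static void watchdog_nmi_disable_deferred(struct event *ev)
{
	ev->enabled = false;
	ev->next_deferred = deferred_list;
	deferred_list = ev;
}

/* Called after the hotplug lock has been dropped. */
static void cleanup_deferred_events(void)
{
	while (deferred_list) {
		struct event *ev = deferred_list;

		deferred_list = ev->next_deferred;
		ev->next_deferred = NULL;
		event_release(ev);
	}
}
```

In the model, the release never runs while `hotplug_write_locked` is set, so the `deadlocks` counter stays at zero as long as cleanup is only invoked after the unlock.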
    • watchdog/hardlockup/perf: Remove broken self disable on failure · 20d853fd
      Thomas Gleixner committed
      The self disabling feature is broken vs. CPU hotplug locking:
      
      CPU 0			   CPU 1
      cpus_write_lock();
       cpu_up(1)
         wait_for_completion()
      			   ....
      			   unpark_watchdog()
      			   ->unpark()
      			     perf_event_create() <- fails
      			       watchdog_enable &= ~NMI_WATCHDOG;
      			   ....
      cpus_write_unlock();
      			   CPU 2
      cpus_write_lock()
       cpu_down(2)
         wait_for_completion()
      			   wakeup(watchdog);
      			     watchdog()
      			     if (!(watchdog_enable & NMI_WATCHDOG))
      				watchdog_nmi_disable()
      				  perf_event_disable()
      				  ....
      				  cpus_read_lock();
      
      			   stop_smpboot_threads()
      			     park_watchdog();
      			       wait_for_completion(watchdog->parked);
      
      Result: End of hotplug and instantaneous full lockup of the machine.
      
      There is a similar problem with disabling the watchdog via the user space
      interface as the sysctl function fiddles with watchdog_enable directly.
      
      It's very debatable whether this is required at all. If the watchdog works
      nicely on N CPUs and it fails to enable on the N + 1 CPU either during
      hotplug or because the user space interface disabled it via sysctl cpumask
      and then some perf user grabbed the counter which is then unavailable for
      the watchdog when the sysctl cpumask gets changed back.
      
      There is no real justification for this.
      
      One of the reasons WHY this is done is the utter stupidity of the init code
      of the perf NMI watchdog. Instead of checking upfront at boot whether PERF
      is available and functional at all, it just does this check at run time
      over and over when user space fiddles with the sysctl. That's broken beyond
      repair along with the idiotic error code dependent warn level printks and
      the even more silly printk rate limiting.
      
      If the init code checks whether perf works at boot time, then this mess can
      be more or less avoided completely. Perf does not come magically into life
      at runtime. Brain usage while coding is overrated.
      
      Remove the cruft and add a temporary safeguard which gets removed later.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194146.806708429@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/core: Mark hardlockup_detector_disable() __init · 7a355820
      Thomas Gleixner committed
      The function is only used by the KVM init code. Mark it __init to prevent
      creative abuse.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194146.727134632@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/core: Rename watchdog_proc_mutex · 946d1977
      Thomas Gleixner committed
      Following patches will use the mutex for other purposes as well. Rename it
      as it is no longer a proc specific thing.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194146.647714850@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/core: Rework CPU hotplug locking · b7a34981
      Thomas Gleixner committed
      The watchdog proc interface causes extensive recursive locking of the CPU
      hotplug percpu rwsem, which is deadlock prone.
      
      Replace the get/put_online_cpus() pairs with cpu_hotplug_disable()/enable()
      calls for now. Later patches will remove that requirement completely.
      Reported-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194146.568079057@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/core: Remove broken suspend/resume interfaces · 5490125d
      Thomas Gleixner committed
      This interface has several issues:
      
       - It's causing recursive locking of the hotplug lock.
      
       - It's complete overkill to teardown all threads and then recreate them
      
      The same can be achieved with the simple hardlockup_detector_perf_stop /
      restart() interfaces. The abuse from the busy looping poweroff() loop of
      PARISC has been solved as well.
      
      Remove the cruft.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194146.487537732@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5490125d
    • T
      watchdog/core: Provide interface to stop from poweroff() · 6554fd8c
      Thomas Gleixner authored
      PARISC has a busy-looping power off routine. If the watchdog is enabled,
      the watchdog timer will still fire, but the thread is not running, which
      causes the softlockup watchdog to trigger.
      
      Provide an interface which allows turning the watchdog off.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: linux-parisc@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170912194146.327343752@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6554fd8c
    • P
      watchdog/hardlockup: Provide interface to stop/restart perf events · d0b6e0a8
      Peter Zijlstra authored
      Provide an interface to stop and restart perf NMI watchdog events on all
      CPUs. This is only usable during init and especially for handling the perf
      HT bug on Intel machines. It's safe to use it this way as nothing can
      start/stop the NMI watchdog in parallel.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194146.167649596@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d0b6e0a8
  5. 12 Sep, 2017 4 commits
  6. 11 Sep, 2017 1 commit
  7. 09 Sep, 2017 6 commits
    • J
      bpf: devmap, use cond_resched instead of cpu_relax · 374fb014
      John Fastabend authored
      Be a bit more friendly about waiting for flush bits to complete.
      Replace the cpu_relax() with a cond_resched().
      Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      374fb014
    • J
      bpf: add support for sockmap detach programs · 5a67da2a
      John Fastabend authored
      The bpf map sockmap supports adding programs via attach commands. This
      patch adds the detach command to keep the API symmetric and allow
      users to remove previously added programs. Otherwise the user would
      have to delete the map and re-add it to get in this state.
      
      This also adds a series of additional tests to capture detach operation
      and also attaching/detaching invalid prog types.
      
      API note: socks will run (or not run) programs depending on the state
      of the map at the time the sock is added. We do not for example walk
      the map and remove programs from previously attached socks.
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a67da2a
    • D
      bpf: don't select potentially stale ri->map from buggy xdp progs · 109980b8
      Daniel Borkmann authored
      We can potentially run into a couple of issues with the XDP
      bpf_redirect_map() helper. The ri->map in the per CPU storage
      can become stale in several ways, mostly due to misuse, where
      we can then trigger a use after free on the map:
      
      i) prog A is calling bpf_redirect_map(), returning XDP_REDIRECT
      and running on a driver not supporting XDP_REDIRECT yet. The
      ri->map on that CPU becomes stale when the XDP program is unloaded
      on the driver, and a prog B loaded on a different driver which
      supports XDP_REDIRECT return code. prog B would have to omit
      calling to bpf_redirect_map() and just return XDP_REDIRECT, which
      would then access the freed map in xdp_do_redirect() since not
      cleared for that CPU.
      
      ii) prog A is calling bpf_redirect_map(), returning a code other
      than XDP_REDIRECT. prog A is then detached, which triggers release
      of the map. prog B is attached which, similarly as in i), would
      just return XDP_REDIRECT without having called bpf_redirect_map()
      and thus be accessing the freed map in xdp_do_redirect() since
      not cleared for that CPU.
      
      iii) prog A is attached to generic XDP, calling the bpf_redirect_map()
      helper and returning XDP_REDIRECT. xdp_do_generic_redirect() is
      currently not handling ri->map (will be fixed by Jesper), so it's
      not being reset. Later loading e.g. a native prog B which would,
      say, call bpf_xdp_redirect() and then return XDP_REDIRECT would
      find in xdp_do_redirect() that a map was set and use it, causing
      a use after free on map access.
      
      Fix thus needs to avoid accessing stale ri->map pointers, naive
      way would be to call a BPF function from drivers that just resets
      it to NULL for all XDP return codes but XDP_REDIRECT and including
      XDP_REDIRECT for drivers not supporting it yet (and let ri->map
      being handled in xdp_do_generic_redirect()). There is a less
      intrusive way w/o letting drivers call a reset for each BPF run.
      
      The verifier knows we're calling into bpf_xdp_redirect_map()
      helper, so it can do a small insn rewrite transparent to the prog
      itself in the sense that it fills R4 with a pointer to the own
      bpf_prog. We have that pointer at verification time anyway and
      R4 is allowed to be used as per calling convention we scratch
      R0 to R5 anyway, so they become inaccessible and program cannot
      read them prior to a write. Then, the helper would store the prog
      pointer in the current CPUs struct redirect_info. Later in
      xdp_do_*_redirect() we check whether the redirect_info's prog
      pointer is the same as passed xdp_prog pointer, and if that's
      the case then all good, since the prog holds a ref on the map
      anyway, so it is always valid at that point in time and must
      have a reference count of at least 1. If in the unlikely case
      they are not equal, it means we got a stale pointer, so we clear
      and bail out right there. Also do reset map and the owning prog
      in bpf_xdp_redirect(), so that bpf_xdp_redirect_map() and
      bpf_xdp_redirect() won't get mixed up, only the last call should
      take precedence. A tc bpf_redirect() doesn't use map anywhere
      yet, so no need to clear it there since never accessed in that
      layer.
      
      Note that in case the prog is released, and thus the map as
      well we're still under RCU read critical section at that time
      and have preemption disabled as well. Once we commit with the
      __dev_map_insert_ctx() from xdp_do_redirect_map() and set the
      map to ri->map_to_flush, we still wait for a xdp_do_flush_map()
      to finish in devmap dismantle time once flush_needed bit is set,
      so that is fine.
      
      Fixes: 97f91a7c ("bpf: add bpf_redirect_map helper routine")
      Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      109980b8
    • D
      kcov: support compat processes · 7483e5d4
      Dmitry Vyukov authored
      Support compat processes in KCOV by providing compat_ioctl callback.
      Compat mode uses the same ioctl callback: we have 2 commands that do not
      use the argument and 1 that already checks that the arg does not overflow
      INT_MAX.  This allows using KCOV-guided fuzzing in compat processes.
      
      Link: http://lkml.kernel.org/r/20170823100553.55812-1-dvyukov@google.com
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: <syzkaller@googlegroups.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7483e5d4
    • R
      drivers/pps: aesthetic tweaks to PPS-related content · a2d81803
      Robert P. J. Day authored
      Collection of aesthetic adjustments to various PPS-related files,
      directories and Documentation, some quite minor just for the sake of
      consistency, including:
      
       * Updated example of pps device tree node (courtesy Rodolfo G.)
       * "PPS-API" -> "PPS API"
       * "pps_source_info_s" -> "pps_source_info"
       * "ktimer driver" -> "pps-ktimer driver"
       * "ppstest /dev/pps0" -> "ppstest /dev/pps1" to match example
       * Add missing PPS-related entries to MAINTAINERS file
       * Other trivialities
      
      Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1708261048220.8106@localhost.localdomain
      Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
      Acked-by: Rodolfo Giometti <giometti@enneenne.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a2d81803
    • L
      kmod: move #ifdef CONFIG_MODULES wrapper to Makefile · 0ce2c202
      Luis R. Rodriguez authored
      The entire file is now conditionally compiled only when CONFIG_MODULES is
      enabled, and this is a bool.  Just move this conditional to the
      Makefile as it's easier to read this way.
      
      Link: http://lkml.kernel.org/r/20170810180618.22457-5-mcgrof@kernel.org
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Cc: David Binderman <dcb314@hotmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0ce2c202