1. 13 Mar, 2015 8 commits
    • J
      clocksource: Rename __clocksource_updatefreq_*() to __clocksource_update_freq_*() · fba9e072
      John Stultz committed
      Ingo requested this function be renamed to improve readability,
      so I've renamed __clocksource_updatefreq_scale() as well as the
      __clocksource_updatefreq_hz/khz() functions to avoid
      squishedtogethernames.
      
      This touches some of the sh clocksources, which I've not tested.
      
      The arch/arm/plat-omap change is just a comment change for
      consistency.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-13-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fba9e072
    • J
      clocksource: Add some debug info about clocksources being registered · 8cc8c525
      John Stultz committed
      Print the mask, max_cycles, and max_idle_ns values for
      clocksources being registered.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-12-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8cc8c525
    • J
      clocksource: Mostly kill clocksource_register() · f8935983
      John Stultz committed
      A long running project has been to clean up remaining uses
      of clocksource_register(), replacing it with the simpler
      clocksource_register_khz/hz() functions.
      
      However, there are a few cases where we need to self-define
      our mult/shift values, so switch the function to a more
      obviously internal __clocksource_register() name, and
      consolidate much of the internal logic so we don't have
      duplication.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-10-git-send-email-john.stultz@linaro.org
      [ Minor cleanups. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f8935983
    • J
      clocksource: Improve clocksource watchdog reporting · 0b046b21
      John Stultz committed
      The clocksource watchdog reporting has been less helpful
      than desired, as it just printed the delta between
      the two clocksources. This prevents any useful analysis
      of why the skew occurred.
      
      Thus this patch tries to improve the output when we
      mark a clocksource as unstable, printing out the cycle
      last and now values for both the current clocksource
      and the watchdog clocksource. This will allow us to see
      if the result was due to a false positive caused by
      a problematic watchdog.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-9-git-send-email-john.stultz@linaro.org
      [ Minor cleanups of kernel messages. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0b046b21
    • J
      timekeeping: Add warnings when overflows or underflows are observed · 4ca22c26
      John Stultz committed
      It was suggested that the underflow/overflow protection
      should probably throw some sort of warning out, rather
      than just silently fixing the issue.
      
      So this patch adds some warnings here. The flag variables
      used are not protected by locks, but since we can't print
      from the reading functions, just being able to say we
      saw an issue in the update interval is useful enough,
      and can be slightly racy without real consequence.
      
      The big complication is that we're only under a read
      seqlock, so the data could shift under us during
      our calculation to see if there was a problem. This
      patch avoids this issue by nesting another seqlock
      which allows us to snapshot the just required values
      atomically. So we shouldn't see false positives.
      
      I also added some basic rate-limiting here, since
      on one build machine w/ skewed TSCs it was fairly
      noisy at bootup.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-8-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4ca22c26
    • J
      timekeeping: Try to catch clocksource delta underflows · 057b87e3
      John Stultz committed
      In the case where there is a broken clocksource
      where there are multiple actual clocks that
      aren't perfectly aligned, we may see small "negative"
      deltas when we subtract 'cycle_last' from 'now'.
      
      The values are actually negative with respect to the
      clocksource mask value, not necessarily negative
      if cast to a s64, but we can check by checking the
      delta to see if it is a small (relative to the mask)
      negative value (again negative relative to the mask).
      
      If so, we assume we jumped backwards somehow and
      instead use zero for our delta.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-7-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      057b87e3
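The mask-relative check described in this commit can be sketched in user-space C as follows. This is a minimal illustration rather than the kernel code: the name `delta_or_zero` is hypothetical, and treating anything within one eighth of the mask as "small" is an assumption made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the masked cycle delta and clamp it to
 * zero when it looks like a small step backwards.
 */
uint64_t delta_or_zero(uint64_t now, uint64_t cycle_last, uint64_t mask)
{
    uint64_t delta = (now - cycle_last) & mask;

    /*
     * A small negative value wraps to just below the mask, so its
     * complement within the mask is small. The mask >> 3 threshold is
     * an assumption chosen for this sketch.
     */
    if ((~delta & mask) < (mask >> 3))
        return 0;

    return delta;
}
```

With a 32-bit mask, a reading that is two cycles behind `cycle_last` is clamped to zero, while a genuine forward wrap of the counter still yields its positive delta.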
    • J
      timekeeping: Add checks to cap clocksource reads to the 'max_cycles' value · a558cd02
      John Stultz committed
      When calculating the current delta since the last tick, we
      currently have no hard protections to prevent a multiplication
      overflow from occurring.
      
      This patch introduces infrastructure to allow a cap that
      limits the clocksource read delta value to the 'max_cycles' value,
      which is where an overflow would occur.
      
      Since this is in the hotpath, it adds the extra checking under
      CONFIG_DEBUG_TIMEKEEPING=y.
      
      There was some concern that capping time like this could cause
      problems as we may stop expiring timers, which could go circular
      if the timer that triggers time accumulation were mis-scheduled
      too far in the future, which would cause time to stop.
      
      However, since the mult overflow would result in a smaller time
      value, we would effectively have the same problem there.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-6-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a558cd02
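To illustrate why the cap matters, here is a small user-space sketch of a cycles-to-nanoseconds conversion that clamps the delta first. The function name `capped_cycles_to_ns` and its parameters are hypothetical; the kernel applies its check only under CONFIG_DEBUG_TIMEKEEPING, which this sketch does not model.

```c
#include <stdint.h>

/* Hypothetical helper: convert a cycle delta to nanoseconds, capping first. */
uint64_t capped_cycles_to_ns(uint64_t delta, uint64_t max_cycles,
                             uint32_t mult, uint32_t shift)
{
    /* Without the cap, delta * mult could wrap and yield a smaller time. */
    if (delta > max_cycles)
        delta = max_cycles;

    return (delta * mult) >> shift;
}
```

If `max_cycles` is chosen as `UINT64_MAX / mult`, a delta one cycle past that point converts to the largest representable interval instead of wrapping around to a tiny value.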
    • J
      timekeeping: Add debugging checks to warn if we see delays · 3c17ad19
      John Stultz committed
      Recently there have been requests for better sanity
      checking in the time code, so that it's clearer
      when something is going wrong, since timekeeping issues
      could manifest in a large number of strange ways in
      various subsystems.
      
      Thus, this patch adds some extra infrastructure to
      add a check to update_wall_time() to print two new
      warnings:
      
       1) if we see the call delayed beyond the 'max_cycles'
          overflow point,
      
       2) or if we see the call delayed beyond the clocksource's
          'max_idle_ns' value, which is currently 50% of the
          overflow point.
      
      This extra infrastructure is conditional on
      a new CONFIG_DEBUG_TIMEKEEPING option, also
      added in this patch - default off.
      
      Tested this a bit by halting qemu for specified
      lengths of time to trigger the warnings.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-5-git-send-email-john.stultz@linaro.org
      [ Improved the changelog and the messages a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3c17ad19
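The two warning conditions above can be sketched as a simple classification. This is an illustrative user-space version only: the enum and function names are hypothetical, and the comparison is done in cycles against half of `max_cycles`, relying on the commit's statement that 'max_idle_ns' is currently 50% of the overflow point.

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum delay_check {
    DELAY_OK,
    DELAY_PAST_MAX_IDLE,    /* beyond the 50% safety margin */
    DELAY_PAST_MAX_CYCLES   /* beyond the overflow point */
};

enum delay_check classify_update_delay(uint64_t offset, uint64_t max_cycles)
{
    if (offset > max_cycles)
        return DELAY_PAST_MAX_CYCLES;

    /* Assumption: the idle limit is modelled as max_cycles / 2. */
    if (offset > (max_cycles >> 1))
        return DELAY_PAST_MAX_IDLE;

    return DELAY_OK;
}
```

A caller would print the corresponding warning for either non-OK result; checking the overflow case first ensures only the more severe message is reported.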
  2. 12 Mar, 2015 3 commits
    • J
      clocksource: Add 'max_cycles' to 'struct clocksource' · fb82fe2f
      John Stultz committed
      In order to facilitate clocksource validation, add a
      'max_cycles' field to the clocksource structure which
      will hold the maximum cycle value that can safely be
      multiplied without potentially causing an overflow.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-4-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fb82fe2f
    • J
      clocksource: Simplify the logic around clocksource wrapping safety margins · 362fde04
      John Stultz committed
      The clocksource logic has a number of places where we try to
      include a safety margin. Most of these are 12.5% safety margins,
      but they are inconsistently applied and sometimes are applied
      on top of each other.
      
      Additionally, in the previous patch, we corrected an issue
      where we unintentionally in effect created a 50% safety margin,
      which these 12.5% margins were then added to.
      
      So to simplify the logic here, this patch removes the various
      12.5% margins, and consolidates adding the margin in one place:
      clocks_calc_max_nsecs().
      
      Additionally, Linus prefers a 50% safety margin, as it allows
      bad clock values to be more easily caught. This should really
      have no net effect, due to the corrected issue earlier which
      caused greater than 50% margins to be used w/o issue.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Stephen Boyd <sboyd@codeaurora.org> (for the sched_clock.c bit)
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-3-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      362fde04
    • J
      clocksource: Simplify the clocks_calc_max_nsecs() logic · 6086e346
      John Stultz committed
      The previous clocks_calc_max_nsecs() code had some unnecessarily
      complex bit logic to find the max interval that could cause
      multiplication overflows. Since this is not in the hot
      path, just do the divide to make it easier to read.
      
      The previous implementation also had a subtle issue
      that it avoided overflows with signed 64-bit values, whereas
      the intervals are always unsigned. This resulted in
      overly conservative intervals, which other safety margins
      were then added to, reducing the intended interval length.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1426133800-29329-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6086e346
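The "just do the divide" approach can be sketched in user-space C like this. It is a simplified illustration, not the kernel's clocks_calc_max_nsecs(): the function names are hypothetical, the arithmetic is unsigned 64-bit as the commit describes, and any headroom for frequency adjustment or additional safety margin is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest cycle count that can be multiplied safely. */
uint64_t max_safe_cycles(uint32_t mult, uint64_t mask)
{
    uint64_t max_cycles;

    assert(mult != 0);  /* avoid dividing by zero */

    /* Largest value whose product with mult still fits in 64 bits. */
    max_cycles = UINT64_MAX / mult;

    /* The counter cannot report more cycles than its mask allows. */
    return max_cycles < mask ? max_cycles : mask;
}

/* Hypothetical helper: the same limit expressed in nanoseconds. */
uint64_t max_safe_nsecs(uint32_t mult, uint32_t shift, uint64_t mask)
{
    return (max_safe_cycles(mult, mask) * mult) >> shift;
}
```

Because the limit comes from an unsigned division, the full 64-bit range is used; a signed bound would halve the interval before any margin is applied.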
  3. 18 Feb, 2015 2 commits
    • V
      clockevents: Introduce mode specific callbacks · bd624d75
      Viresh Kumar committed
      It is not possible for the clockevents core to know which modes (other than
      those with a corresponding feature flag) are supported by a particular
      implementation. And drivers are expected to handle transition to all modes
      elegantly, as ->set_mode() would be issued for them unconditionally.
      
      Now, adding support for a new mode complicates things a bit if we want to use
      the legacy ->set_mode() callback. We need to closely review all clockevents
      drivers to see if they would break on addition of a new mode. And after such
      reviews, it is found that we have to do non-trivial changes to most of the
      drivers [1].
      
      Introduce mode-specific set_mode_*() callbacks, some of which the drivers may or
      may not implement. A missing callback would clearly convey the message that the
      corresponding mode isn't supported.
      
      A driver may still choose to keep supporting the legacy ->set_mode() callback,
      but ->set_mode() wouldn't be supporting any new modes beyond RESUME. If a driver
      wants to benefit from using a new mode, it would be required to migrate to
      the mode specific callbacks.
      
      The legacy ->set_mode() callback and the newly introduced mode-specific
      callbacks are mutually exclusive. Only one of them should be supported by the
      driver.
      
      Sanity check is done at the time of registration to distinguish between optional
      and required callbacks and to make error recovery and handling simpler. If the
      legacy ->set_mode() callback is provided, all mode specific ones would be
      ignored by the core but a warning is thrown if they are present.
      
      Call sites calling ->set_mode() directly are also updated to use
      __clockevents_set_mode() instead, as ->set_mode() may not be available anymore
      for a few drivers.
      
       [1] https://lkml.org/lkml/2014/12/9/605
       [2] https://lkml.org/lkml/2015/1/23/255
      
      Suggested-by: Thomas Gleixner <tglx@linutronix.de> [2]
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: linaro-kernel@lists.linaro.org
      Cc: linaro-networking@linaro.org
      Link: http://lkml.kernel.org/r/792d59a40423f0acffc9bb0bec9de1341a06fa02.1423788565.git.viresh.kumar@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      bd624d75
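The registration-time rule for legacy versus mode-specific callbacks can be sketched as follows. Everything here is hypothetical and trimmed down for illustration: the struct, the enum, and `check_callbacks` are not the kernel's definitions, and the distinction between required and optional callbacks is not modelled.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down device descriptor for this sketch. */
struct clock_event_sketch {
    void (*set_mode)(int mode);      /* legacy catch-all callback */
    int (*set_mode_periodic)(void);  /* mode-specific callbacks */
    int (*set_mode_oneshot)(void);
    int (*set_mode_shutdown)(void);
};

enum callback_style {
    USE_LEGACY,        /* only ->set_mode() is present */
    USE_LEGACY_WARN,   /* ->set_mode() wins; extra callbacks ignored, warn */
    USE_MODE_SPECIFIC  /* no legacy callback provided */
};

enum callback_style check_callbacks(const struct clock_event_sketch *dev)
{
    int has_specific = dev->set_mode_periodic != NULL ||
                       dev->set_mode_oneshot != NULL ||
                       dev->set_mode_shutdown != NULL;

    /* The two styles are mutually exclusive; legacy takes precedence. */
    if (dev->set_mode != NULL)
        return has_specific ? USE_LEGACY_WARN : USE_LEGACY;

    return USE_MODE_SPECIFIC;
}

/* Example driver callbacks used to exercise the check. */
void legacy_stub(int mode) { (void)mode; }
int periodic_stub(void) { return 0; }
```

A driver that sets both `set_mode` and `set_mode_periodic` lands in the warning case, matching the commit's description that the mode-specific callbacks are ignored when the legacy one is present.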
    • J
      ntp: Fixup adjtimex freq validation on 32-bit systems · 29183a70
      John Stultz committed
      Additional validation of adjtimex freq values to avoid
      potential multiplication overflows was added in commit
      5e5aeb43 (time: adjtimex: Validate the ADJ_FREQUENCY values).
      
      Unfortunately the patch used LONG_MAX/MIN instead of
      LLONG_MAX/MIN, which was fine on 64-bit systems, but being
      much smaller on 32-bit systems caused false positives
      resulting in most direct frequency adjustments to fail w/
      EINVAL.
      
      ntpd only does direct frequency adjustments at startup, so
      the issue was not as easily observed there, but other time
      sync applications like ptpd and chrony were more affected by
      the bug.
      
      See bugs:
      
        https://bugzilla.kernel.org/show_bug.cgi?id=92481
        https://bugzilla.redhat.com/show_bug.cgi?id=1188074
      
      This patch changes the checks to use LLONG_MAX for
      clarity, and additionally the checks are disabled
      on 32-bit systems since LLONG_MAX/PPM_SCALE is always
      larger than the 32-bit long freq value, so multiplication
      overflows aren't possible there.
      Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
      Reported-by: George Joseph <george.joseph@fairview5.com>
      Tested-by: George Joseph <george.joseph@fairview5.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # v3.19+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Link: http://lkml.kernel.org/r/1423553436-29747-1-git-send-email-john.stultz@linaro.org
      [ Prettified the changelog and the comments a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      29183a70
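The effect of bounding the check by a 32-bit long instead of a 64-bit value can be shown with a short user-space sketch. The function names are hypothetical, and the `PPM_SCALE_SKETCH` value of 65536000 is an assumption used for illustration; the point is only how much tighter the accepted range becomes with the smaller limits.

```c
#include <stdint.h>

/* Assumed scale factor for illustration only. */
#define PPM_SCALE_SKETCH 65536000LL

/* Range check with bounds derived from a 32-bit long: far too tight. */
int freq_in_range_long32(int32_t freq)
{
    return freq >= INT32_MIN / PPM_SCALE_SKETCH &&
           freq <= INT32_MAX / PPM_SCALE_SKETCH;
}

/* Range check with bounds derived from a 64-bit long long. */
int freq_in_range_llong(int64_t freq)
{
    return freq >= INT64_MIN / PPM_SCALE_SKETCH &&
           freq <= INT64_MAX / PPM_SCALE_SKETCH;
}
```

Under this assumed scale, the 32-bit bounds accept only values between -32 and 32, so an ordinary adjustment such as 100000 is rejected, while every value a 32-bit `freq` can hold passes the 64-bit check.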
  4. 16 Feb, 2015 2 commits
    • R
      PM / sleep: Make it possible to quiesce timers during suspend-to-idle · 124cf911
      Rafael J. Wysocki committed
      The efficiency of suspend-to-idle depends on being able to keep CPUs
      in the deepest available idle states for as much time as possible.
      Ideally, they should only be brought out of idle by system wakeup
      interrupts.
      
      However, timer interrupts occurring periodically prevent that from
      happening and it is not practical to chase all of the "misbehaving"
      timers in a whack-a-mole fashion.  A much more effective approach is
      to suspend the local ticks for all CPUs and the entire timekeeping
      along the lines of what is done during full suspend, which also
      helps to keep suspend-to-idle and full suspend reasonably similar.
      
      The idea is to suspend the local tick on each CPU executing
      cpuidle_enter_freeze() and to make the last of them suspend the
      entire timekeeping.  That should prevent timer interrupts from
      triggering until an IO interrupt wakes up one of the CPUs.  It
      needs to be done with interrupts disabled on all of the CPUs,
      though, because otherwise the suspended clocksource might be
      accessed by an interrupt handler which might lead to fatal
      consequences.
      
      Unfortunately, the existing ->enter callbacks provided by cpuidle
      drivers generally cannot be used for implementing that, because some
      of them re-enable interrupts temporarily and some idle entry methods
      cause interrupts to be re-enabled automatically on exit.  Also some
      of these callbacks manipulate local clock event devices of the CPUs
      which really shouldn't be done after suspending their ticks.
      
      To overcome that difficulty, introduce a new cpuidle state callback,
      ->enter_freeze, that will be guaranteed (1) to keep interrupts
      disabled all the time (and return with interrupts disabled) and (2)
      not to touch the CPU timer devices.  Modify cpuidle_enter_freeze() to
      look for the deepest available idle state with ->enter_freeze present
      and to make the CPU execute that callback with suspended tick (and the
      last of the online CPUs to execute it with suspended timekeeping).
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      124cf911
    • R
      timekeeping: Make it safe to use the fast timekeeper while suspended · 060407ae
      Rafael J. Wysocki committed
      Theoretically, ktime_get_mono_fast_ns() may be executed after
      timekeeping has been suspended (or before it is resumed) which
      in turn may lead to undefined behavior, for example, when the
      clocksource read from timekeeping_get_ns() called by it is
      not accessible at that time.
      
      Prevent that from happening by setting up a dummy readout base for
      the fast timekeeper during timekeeping_suspend() such that it will
      always return the same number of cycles.
      
      After the last timekeeping_update() in timekeeping_suspend() the
      clocksource is read and the result is stored as cycles_at_suspend.
      The readout base from the current timekeeper is copied onto the
      dummy and the ->read pointer of the dummy is set to a routine
      unconditionally returning cycles_at_suspend.  Next, the dummy is
      passed to update_fast_timekeeper().
      
      Then, ktime_get_mono_fast_ns() will work until the subsequent
      timekeeping_resume() and the proper readout base for the fast
      timekeeper will be restored by the timekeeping_update() called
      right after clearing timekeeping_suspended.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      060407ae
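The dummy readout base idea can be sketched in user-space C as below. This is a loose model under stated assumptions: the struct layout, the `sketch_suspend()`/`sketch_resume()` helpers, and the one-cycle-equals-one-nanosecond conversion are all hypothetical simplifications, and no locking or sequence counting is shown.

```c
#include <stdint.h>

/* Hypothetical, simplified readout base. */
struct readout_base {
    uint64_t (*read)(void);  /* clocksource read routine */
    uint64_t cycle_last;
    uint64_t base_ns;
};

uint64_t hw_cycles;             /* stand-in for a hardware counter */
uint64_t cycles_at_suspend;     /* snapshot taken at suspend time */
struct readout_base fast_base;  /* what the fast path reads from */

uint64_t read_hw(void) { return hw_cycles; }
uint64_t read_frozen(void) { return cycles_at_suspend; }

/* Fast path; assumes 1 cycle == 1 ns to keep the sketch short. */
uint64_t fast_ns(void)
{
    return fast_base.base_ns + (fast_base.read() - fast_base.cycle_last);
}

void sketch_suspend(const struct readout_base *tk)
{
    struct readout_base dummy = *tk;  /* copy the current readout base */

    cycles_at_suspend = tk->read();   /* last real clocksource read */
    dummy.read = read_frozen;         /* always return the snapshot */
    fast_base = dummy;
}

void sketch_resume(const struct readout_base *tk)
{
    fast_base = *tk;                  /* restore the real readout base */
}

/* Usage: the fast read stays frozen while suspended. */
uint64_t demo_read_while_suspended(void)
{
    struct readout_base real = { read_hw, 0, 0 };

    hw_cycles = 100;
    sketch_resume(&real);
    sketch_suspend(&real);
    hw_cycles = 500;                  /* hardware moves on, or is inaccessible */
    return fast_ns();                 /* still reports the suspend-time value */
}
```

Swapping only the `read` pointer keeps the fast path unchanged: it never needs to test a "suspended" flag, it simply sees a clocksource that has stopped advancing.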
  5. 14 Feb, 2015 2 commits
  6. 13 Feb, 2015 1 commit
    • A
      all arches, signal: move restart_block to struct task_struct · f56141e3
      Andy Lutomirski committed
      If an attacker can cause a controlled kernel stack overflow, overwriting
      the restart block is a very juicy exploit target.  This is because the
      restart_block is held in the same memory allocation as the kernel stack.
      
      Moving the restart block to struct task_struct prevents this exploit by
      making the restart_block harder to locate.
      
      Note that there are other fields in thread_info that are also easy
      targets, at least on some architectures.
      
      It's also a decent simplification, since the restart code is more or less
      identical on all architectures.
      
      [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Miller <davem@davemloft.net>
      Acked-by: Richard Weinberger <richard@nod.at>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f56141e3
  7. 05 Feb, 2015 1 commit
  8. 24 Jan, 2015 4 commits
  9. 23 Jan, 2015 1 commit
    • T
      hrtimer: Prevent stale expiry time in hrtimer_interrupt() · 9bc74919
      Thomas Gleixner committed
      hrtimer_interrupt() has the following subtle issue:
      
      hrtimer_interrupt()
        lock(cpu_base);
        expires_next = KTIME_MAX;
      
        expire_timers(CLOCK_MONOTONIC);
        expires = get_next_timer(CLOCK_MONOTONIC);
        if (expires < expires_next)
          expires_next = expires;
      
        expire_timers(CLOCK_REALTIME);
          unlock(cpu_base);
          wakeup()
          hrtimer_start(CLOCK_MONOTONIC, newtimer);
          lock(cpu_base);
        expires = get_next_timer(CLOCK_REALTIME);
        if (expires < expires_next)
          expires_next = expires;
      
      So because we already evaluated the next expiring timer of
      CLOCK_MONOTONIC we ignore that the expiry time of newtimer might be
      earlier than the overall next expiry time in hrtimer_interrupt().
      
      To solve this, remove the caching of the next expiry value from
      hrtimer_interrupt() and reevaluate all active clock bases for the next
      expiry value. To avoid another code duplication, create a shared
      evaluation function and use it for hrtimer_get_next_event(),
      hrtimer_force_reprogram() and hrtimer_interrupt().
      
      There is another subtlety in this mechanism:
      
      While hrtimer_interrupt() is running, we want to avoid touching the
      hardware device because we will reprogram it anyway at the end of
      hrtimer_interrupt(). This works nicely for hrtimers which get rearmed
      via the HRTIMER_RESTART mechanism, because we drop out when the
      callback on that CPU is running. But that fails, if a new timer gets
      enqueued like in the example above.
      
      This has another implication: While hrtimer_interrupt() is running we
      refuse remote enqueueing of timers - see hrtimer_interrupt() and
      hrtimer_check_target().
      
      hrtimer_interrupt() tries to prevent this by setting cpu_base->expires
      to KTIME_MAX, but that fails if a new timer gets queued.
      
      Prevent both the hardware access and the remote enqueue
      explicitly. We can loosen the restriction on the remote enqueue now
      due to reevaluation of the next expiry value, but that needs a
      separate patch.
      
      Folded in a fix from Vignesh Radhakrishnan.
      Reported-and-tested-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
      Based-on-patch-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: vigneshr@codeaurora.org
      Cc: john.stultz@linaro.org
      Cc: viresh.kumar@linaro.org
      Cc: fweisbec@gmail.com
      Cc: cl@linux.com
      Cc: stuart.w.hayes@gmail.com
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1501202049190.5526@nanos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      9bc74919
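The fix amounts to recomputing the next expiry across all active clock bases instead of trusting a value cached earlier in the interrupt. A minimal user-space sketch of such a shared evaluation function follows; the struct, the `EXPIRES_NEVER` constant, and `next_event()` are hypothetical names, and locking is omitted.

```c
#include <stdint.h>

#define EXPIRES_NEVER INT64_MAX  /* hypothetical "no timer queued" value */

/* Hypothetical per-clock state: whether it has timers and the earliest one. */
struct clock_base_sketch {
    int active;
    int64_t next_expiry;
};

/* Scan every active base and return the earliest expiry time. */
int64_t next_event(const struct clock_base_sketch *bases, int nbases)
{
    int64_t expires_next = EXPIRES_NEVER;

    for (int i = 0; i < nbases; i++) {
        if (bases[i].active && bases[i].next_expiry < expires_next)
            expires_next = bases[i].next_expiry;
    }

    return expires_next;
}
```

Calling this once after all bases have been processed means a timer enqueued on an already-visited base, such as `newtimer` in the example above, is still taken into account when the hardware is reprogrammed.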
  10. 08 Jan, 2015 2 commits
  11. 31 Dec, 2014 2 commits
  12. 19 Dec, 2014 1 commit
    • T
      tick/powerclamp: Remove tick_nohz_idle abuse · a5fd9733
      Thomas Gleixner committed
      commit 4dbd2771 "tick: export nohz tick idle symbols for module
      use" was merged via the thermal tree without an explicit ack from the
      relevant maintainers.
      
      The exports are abused by the intel powerclamp driver which implements
      a fake idle state from a sched FIFO task. This causes all kinds of
      wreckage in the NOHZ core code which rightfully assumes that
      tick_nohz_idle_enter/exit() are only called from the idle task itself.
      
      Recent changes in the NOHZ core lead to a failure of the powerclamp
      driver and now people try to hack completely broken and backwards
      workarounds into the NOHZ core code. This is completely unacceptable
      and just papers over the real problem. There are way more subtle
      issues lurking around the corner.
      
      The real solution is to fix the powerclamp driver by rewriting it with
      a sane concept, but that's beyond the scope of this.
      
      So the only solution for now is to remove the calls into the core NOHZ
      code from the powerclamp trainwreck along with the exports. 
      
      Fixes: d6d71ee4 "PM: Introduce Intel PowerClamp Driver"
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Pan Jacob jun <jacob.jun.pan@intel.com>
      Cc: LKP <lkp@01.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a5fd9733
  13. 05 Dec, 2014 1 commit
  14. 25 Nov, 2014 1 commit
  15. 22 Nov, 2014 9 commits