1. 18 8月, 2015 1 次提交
    • J
      timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers · 38bf985b
      John Stultz 提交于
      I noticed for non-monotonic timers in timer_list, some of the
      output looked a little confusing.
      
      For example:
       #1: <0000000000000000>, posix_timer_fn, S:01, hrtimer_start_range_ns, leap-a-day/2360
       # expires at 1434412800000000000-1434412800000000000 nsecs [in 1434410725062375469 to 1434410725062375469 nsecs]
      
      You'll note the relative time till the expiration "[in xxx to
      yyy nsecs]" is incorrect. This is because its printing the delta
      between CLOCK_MONOTONIC time to the CLOCK_REALTIME expiration.
      
      This patch fixes this issue by adding the clock offset to the
      "now" time which we use to calculate the delta.
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      38bf985b
  2. 19 6月, 2015 1 次提交
    • T
      timer: Reduce timer migration overhead if disabled · bc7a34b8
      Thomas Gleixner 提交于
      Eric reported that the timer_migration sysctl is not really nice
      performance wise as it needs to check at every timer insertion whether
      the feature is enabled or not. Further the check does not live in the
      timer code, so we have an extra function call which checks an extra
      cache line to figure out that it is disabled.
      
      We can do better and store that information in the per cpu (hr)timer
      bases. I pondered to use a static key, but that's a nightmare to
      update from the nohz code and the timer base cache line is hot anyway
      when we select a timer base.
      
      The old logic enabled the timer migration unconditionally if
      CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
      line.
      
      With this modification, we start off with migration disabled. The user
      visible sysctl is still set to enabled. If the kernel switches to NOHZ
      migration is enabled, if the user did not disable it via the sysctl
      prior to the switch. If nohz=off is on the kernel command line,
      migration stays disabled no matter what.
      
      Before:
        47.76%  hog       [.] main
        14.84%  [kernel]  [k] _raw_spin_lock_irqsave
         9.55%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.71%  [kernel]  [k] mod_timer
         6.24%  [kernel]  [k] lock_timer_base.isra.38
         3.76%  [kernel]  [k] detach_if_pending
         3.71%  [kernel]  [k] del_timer
         2.50%  [kernel]  [k] internal_add_timer
         1.51%  [kernel]  [k] get_nohz_timer_target
         1.28%  [kernel]  [k] __internal_add_timer
         0.78%  [kernel]  [k] timerfn
         0.48%  [kernel]  [k] wake_up_nohz_cpu
      
      After:
        48.10%  hog       [.] main
        15.25%  [kernel]  [k] _raw_spin_lock_irqsave
         9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.50%  [kernel]  [k] mod_timer
         6.44%  [kernel]  [k] lock_timer_base.isra.38
         3.87%  [kernel]  [k] detach_if_pending
         3.80%  [kernel]  [k] del_timer
         2.67%  [kernel]  [k] internal_add_timer
         1.33%  [kernel]  [k] __internal_add_timer
         0.73%  [kernel]  [k] timerfn
         0.54%  [kernel]  [k] wake_up_nohz_cpu
      Reported-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      bc7a34b8
  3. 19 5月, 2015 1 次提交
    • V
      clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state · 8fff52fd
      Viresh Kumar 提交于
      When no timers/hrtimers are pending, the expiry time is set to a
      special value: 'KTIME_MAX'. This normally happens with
      NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
      
      When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer
      (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device
      (NOHZ_MODE_LOWRES).  But, the clockevent device is already
      reprogrammed from the tick-handler for next tick.
      
      As the clock event device is programmed in ONESHOT mode it will at
      least fire one more time (unnecessarily). Timers on few
      implementations (like arm_arch_timer, etc.) only support PERIODIC mode
      and their drivers emulate ONESHOT over that. Which means that on these
      platforms we will get spurious interrupts periodically (at last
      programmed interval rate, normally tick rate).
      
      In order to avoid spurious interrupts, the clockevent device should be
      stopped or its interrupts should be masked.
      
      A simple (yet hacky) solution to get this fixed could be: update
      hrtimer_force_reprogram() to always reprogram clockevent device and
      update clockevent drivers to STOP generating events (or delay it to
      max time) when 'expires' is set to KTIME_MAX. But the drawback here is
      that every clockevent driver has to be hacked for this particular case
      and its very easy for new ones to miss this.
      
      However, Thomas suggested to add an optional state ONESHOT_STOPPED to
      solve this problem: lkml.org/lkml/2014/5/9/508.
      
      This patch adds support for ONESHOT_STOPPED state in clockevents
      core. It will only be available to drivers that implement the
      state-specific callbacks instead of the legacy ->set_mode() callback.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: NPreeti U. Murthy <preeti@linux.vnet.ibm.com>
      Cc: linaro-kernel@lists.linaro.org
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/b8b383a03ac07b13312c16850b5106b82e4245b5.1428031396.git.viresh.kumar@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      8fff52fd
  4. 05 5月, 2015 1 次提交
  5. 22 4月, 2015 4 次提交
  6. 01 4月, 2015 1 次提交
  7. 27 3月, 2015 2 次提交
  8. 18 2月, 2015 1 次提交
    • V
      clockevents: Introduce mode specific callbacks · bd624d75
      Viresh Kumar 提交于
      It is not possible for the clockevents core to know which modes (other than
      those with a corresponding feature flag) are supported by a particular
      implementation. And drivers are expected to handle transition to all modes
      elegantly, as ->set_mode() would be issued for them unconditionally.
      
      Now, adding support for a new mode complicates things a bit if we want to use
      the legacy ->set_mode() callback. We need to closely review all clockevents
      drivers to see if they would break on addition of a new mode. And after such
      reviews, it is found that we have to do non-trivial changes to most of the
      drivers [1].
      
      Introduce mode-specific set_mode_*() callbacks, some of which the drivers may or
      may not implement. A missing callback would clearly convey the message that the
      corresponding mode isn't supported.
      
      A driver may still choose to keep supporting the legacy ->set_mode() callback,
      but ->set_mode() wouldn't be supporting any new modes beyond RESUME. If a driver
      wants to benefit from using a new mode, it would be required to migrate to
      the mode specific callbacks.
      
      The legacy ->set_mode() callback and the newly introduced mode-specific
      callbacks are mutually exclusive. Only one of them should be supported by the
      driver.
      
      Sanity check is done at the time of registration to distinguish between optional
      and required callbacks and to make error recovery and handling simpler. If the
      legacy ->set_mode() callback is provided, all mode specific ones would be
      ignored by the core but a warning is thrown if they are present.
      
      Call sites calling ->set_mode() directly are also updated to use
      __clockevents_set_mode() instead, as ->set_mode() may not be available anymore
      for few drivers.
      
       [1] https://lkml.org/lkml/2014/12/9/605
       [2] https://lkml.org/lkml/2015/1/23/255
      
      Suggested-by: Thomas Gleixner <tglx@linutronix.de> [2]
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: linaro-kernel@lists.linaro.org
      Cc: linaro-networking@linaro.org
      Link: http://lkml.kernel.org/r/792d59a40423f0acffc9bb0bec9de1341a06fa02.1423788565.git.viresh.kumar@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bd624d75
  9. 29 8月, 2013 1 次提交
  10. 18 4月, 2013 2 次提交
  11. 12 6月, 2012 1 次提交
    • F
      nohz: Rename ts->idle_tick to ts->last_tick · f5d411c9
      Frederic Weisbecker 提交于
      Now that idle and nohz logics are going to be independant each others,
      ts->idle_tick becomes too much a biased name to describe the field that
      saves the last scheduled tick on top of which we re-calculate the next
      tick to schedule when the timer is restarted.
      
      We want to reuse this even to stop the tick outside idle cases. So let's
      rename it to some more generic name: ts->last_tick.
      
      This changes a bit the timer list stat export so we need to increase its
      version.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      f5d411c9
  12. 12 2月, 2011 1 次提交
  13. 11 12月, 2010 1 次提交
  14. 10 5月, 2010 1 次提交
  15. 13 3月, 2010 1 次提交
  16. 17 12月, 2009 1 次提交
  17. 15 12月, 2009 1 次提交
  18. 10 12月, 2009 1 次提交
    • T
      hrtimer: Tune hrtimer_interrupt hang logic · 41d2e494
      Thomas Gleixner 提交于
      The hrtimer_interrupt hang logic adjusts min_delta_ns based on the
      execution time of the hrtimer callbacks.
      
      This is error-prone for virtual machines, where a guest vcpu can be
      scheduled out during the execution of the callbacks (and the callbacks
      themselves can do operations that translate to blocking operations in
      the hypervisor), which in can lead to large min_delta_ns rendering the
      system unusable.
      
      Replace the current heuristics with something more reliable. Allow the
      interrupt code to try 3 times to catch up with the lost time. If that
      fails use the total time spent in the interrupt handler to defer the
      next timer interrupt so the system can catch up with other things
      which got delayed. Limit that deferment to 100ms.
      
      The retry events and the maximum time spent in the interrupt handler
      are recorded and exposed via /proc/timer_list
      
      Inspired by a patch from Marcelo.
      Reported-by: NMichael Tokarev <mjt@tls.msk.ru>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Cc: kvm@vger.kernel.org
      41d2e494
  19. 14 11月, 2009 2 次提交
    • J
      nohz: Allow 32-bit machines to sleep for more than 2.15 seconds · 97813f2f
      Jon Hunter 提交于
      In the dynamic tick code, "max_delta_ns" (member of the
      "clock_event_device" structure) represents the maximum sleep time
      that can occur between timer events in nanoseconds.
      
      The variable, "max_delta_ns", is defined as an unsigned long
      which is a 32-bit integer for 32-bit machines and a 64-bit
      integer for 64-bit machines (if -m64 option is used for gcc).
      The value of max_delta_ns is set by calling the function
      "clockevent_delta2ns()" which returns a maximum value of LONG_MAX.
      For a 32-bit machine LONG_MAX is equal to 0x7fffffff and in
      nanoseconds this equates to ~2.15 seconds. Hence, the maximum
      sleep time for a 32-bit machine is ~2.15 seconds, where as for
      a 64-bit machine it will be many years.
      
      This patch changes the type of max_delta_ns to be "u64" instead of
      "unsigned long" so that this variable is a 64-bit type for both 32-bit
      and 64-bit machines. It also changes the maximum value returned by
      clockevent_delta2ns() to KTIME_MAX.  Hence this allows a 32-bit
      machine to sleep for longer than ~2.15 seconds. Please note that this
      patch also changes "min_delta_ns" to be "u64" too and although this is
      unnecessary, it makes the patch simpler as it avoids to fixup all
      callers of clockevent_delta2ns().
      
      [ tglx: changed "unsigned long long" to u64 as we use this data type
        	through out the time code ]
      Signed-off-by: NJon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1250617512-23567-3-git-send-email-jon-hunter@ti.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      97813f2f
    • T
      clockevents: Use u32 for mult and shift factors · 23af368e
      Thomas Gleixner 提交于
      The mult and shift factors of clock events differ in their data type
      from those of clock sources for no reason. u32 is sufficient for
      both. shift is always <= 32 and mult is limited to 2^32-1 to avoid
      64bit multiplication overflows in the conversion.
      
      Preparatory patch for a generic mult/shift factor calculation
      function.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NMikael Pettersson <mikpe@it.uu.se>
      Acked-by: NRalf Baechle <ralf@linux-mips.org>
      Acked-by: NLinus Walleij <linus.walleij@stericsson.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20091111134229.725664788@linutronix.de>
      23af368e
  20. 02 10月, 2009 1 次提交
  21. 17 8月, 2009 1 次提交
    • A
      timers: Drop write permission on /proc/timer_list · de809347
      Amerigo Wang 提交于
      /proc/timer_list and /proc/slabinfo are not supposed to be
      written, so there should be no write permissions on it.
      Signed-off-by: NWANG Cong <amwang@redhat.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: linux-mm@kvack.org
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Amerigo Wang <amwang@redhat.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      LKML-Reference: <20090817094525.6355.88682.sendpatchset@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      de809347
  22. 20 10月, 2008 3 次提交
  23. 08 9月, 2008 1 次提交
  24. 06 9月, 2008 1 次提交
  25. 29 4月, 2008 1 次提交
  26. 18 2月, 2008 1 次提交
  27. 02 2月, 2008 1 次提交
  28. 29 10月, 2007 1 次提交
  29. 01 8月, 2007 1 次提交
  30. 18 7月, 2007 1 次提交
    • T
      kallsyms: make KSYM_NAME_LEN include space for trailing '\0' · 9281acea
      Tejun Heo 提交于
      KSYM_NAME_LEN is peculiar in that it does not include the space for the
      trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating
      buffer.  This is nonsense and error-prone.  Moreover, when the caller
      forgets that it's very likely to subtly bite back by corrupting the stack
      because the last position of the buffer is always cleared to zero.
      
      This patch increments KSYM_NAME_LEN by one and updates code accordingly.
      
      * off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
        is fixed.
      
      * Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
        MODULE_NAME_LEN was treated as if it didn't include space for the
        trailing '\0'.  Fix it.
      Signed-off-by: NTejun Heo <htejun@gmail.com>
      Acked-by: NPaulo Marques <pmarques@grupopie.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9281acea
  31. 10 5月, 2007 1 次提交
  32. 09 5月, 2007 1 次提交