1. 18 8月, 2015 2 次提交
    • K
      time: Fix nanosecond file time rounding in timespec_trunc() · de4a95fa
      Karsten Blees 提交于
      timespec_trunc() avoids rounding if granularity <= nanoseconds-per-jiffie
      (or TICK_NSEC). This optimization assumes that:
      
       1. current_kernel_time().tv_nsec is already rounded to TICK_NSEC (i.e.
          with HZ=1000 you'd get 1000000, 2000000, 3000000... but never 1000001).
          This is no longer true (probably since hrtimers introduced in 2.6.16).
      
       2. TICK_NSEC is evenly divisible by all possible granularities. This may
          be true for HZ=100, 250, 1000, but obviously not for HZ=300 /
          TICK_NSEC=3333333 (introduced in 2.6.20).
      
      Thus, sub-second portions of in-core file times are not rounded to on-disk
      granularity. I.e. file times may change when the inode is re-read from disk
      or when the file system is remounted.
      
      This affects all file systems with file time granularities > 1 ns and < 1s,
      e.g. CEPH (1000 ns), UDF (1000 ns), CIFS (100 ns), NTFS (100 ns) and FUSE
      (configurable from user mode via struct fuse_init_out.time_gran).
      
      Steps to reproduce with e.g. UDF:
      
        $ dd if=/dev/zero of=udfdisk count=10000 && mkudffs udfdisk
        $ mkdir udf && mount udfdisk udf
        $ touch udf/test && stat -c %y udf/test
        2015-06-09 10:22:56.130006767 +0200
        $ umount udf && mount udfdisk udf
        $ stat -c %y udf/test
        2015-06-09 10:22:56.130006000 +0200
      
      Remounting truncates the mtime to 1 µs.
      
      Fix the rounding in timespec_trunc() and update the documentation.
      
      timespec_trunc() is exclusively used to calculate inode's [acm]time (mostly
      via current_fs_time()), and always with super_block.s_time_gran as second
      argument. So this can safely be changed without side effects.
      
      Note: This does _not_ fix the issue for FAT's 2 second mtime resolution,
      as super_block.s_time_gran isn't prepared to handle different ctime /
      mtime / atime resolutions nor resolutions > 1 second.
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NKarsten Blees <blees@dcon.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      de4a95fa
    • J
      timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers · 38bf985b
      John Stultz 提交于
      I noticed for non-monotonic timers in timer_list, some of the
      output looked a little confusing.
      
      For example:
       #1: <0000000000000000>, posix_timer_fn, S:01, hrtimer_start_range_ns, leap-a-day/2360
       # expires at 1434412800000000000-1434412800000000000 nsecs [in 1434410725062375469 to 1434410725062375469 nsecs]
      
      You'll note the relative time till the expiration "[in xxx to
      yyy nsecs]" is incorrect. This is because its printing the delta
      between CLOCK_MONOTONIC time to the CLOCK_REALTIME expiration.
      
      This patch fixes this issue by adding the clock offset to the
      "now" time which we use to calculate the delta.
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      38bf985b
  2. 01 7月, 2015 1 次提交
  3. 27 6月, 2015 1 次提交
    • T
      timer: Fix hotplug regression · 24bfcb10
      Thomas Gleixner 提交于
      The recent timer wheel rework removed the get/put_cpu_var() pair in
      the hotplug migration code, which results in:
      
      BUG: using smp_processor_id() in preemptible [00000000] code: hib.sh/2845
      ...
      [<ffffffff810d4fa3>] timer_cpu_notify+0x53/0x12
      
      That hunk is a leftover from an earlier iteration and went unnoticed
      so far.
      
      Restore the previous code which was obviously correct.
      
      Fixes: 0eeda71b 'timer: Replace timer base by a cpu index'
      Reported-and_tested-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      24bfcb10
  4. 19 6月, 2015 10 次提交
    • T
      timer: Minimize nohz off overhead · 683be13a
      Thomas Gleixner 提交于
      If nohz is disabled on the kernel command line the [hr]timer code
      still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty
      pointless exercise. Cache nohz_active in [hr]timer per cpu bases and
      avoid the overhead.
      
      Before:
        48.10%  hog       [.] main
        15.25%  [kernel]  [k] _raw_spin_lock_irqsave
         9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.50%  [kernel]  [k] mod_timer
         6.44%  [kernel]  [k] lock_timer_base.isra.38
         3.87%  [kernel]  [k] detach_if_pending
         3.80%  [kernel]  [k] del_timer
         2.67%  [kernel]  [k] internal_add_timer
         1.33%  [kernel]  [k] __internal_add_timer
         0.73%  [kernel]  [k] timerfn
         0.54%  [kernel]  [k] wake_up_nohz_cpu
      
      After:
        48.73%  hog       [.] main
        15.36%  [kernel]  [k] _raw_spin_lock_irqsave
         9.77%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.61%  [kernel]  [k] lock_timer_base.isra.38
         6.42%  [kernel]  [k] mod_timer
         3.90%  [kernel]  [k] detach_if_pending
         3.76%  [kernel]  [k] del_timer
         2.41%  [kernel]  [k] internal_add_timer
         1.39%  [kernel]  [k] __internal_add_timer
         0.76%  [kernel]  [k] timerfn
      
      We probably should have a cached value for nohz full in the per cpu
      bases as well to avoid the cpumask check. The base cache line is hot
      already, the cpumask not necessarily.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      683be13a
    • T
      timer: Reduce timer migration overhead if disabled · bc7a34b8
      Thomas Gleixner 提交于
      Eric reported that the timer_migration sysctl is not really nice
      performance wise as it needs to check at every timer insertion whether
      the feature is enabled or not. Further the check does not live in the
      timer code, so we have an extra function call which checks an extra
      cache line to figure out that it is disabled.
      
      We can do better and store that information in the per cpu (hr)timer
      bases. I pondered to use a static key, but that's a nightmare to
      update from the nohz code and the timer base cache line is hot anyway
      when we select a timer base.
      
      The old logic enabled the timer migration unconditionally if
      CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
      line.
      
      With this modification, we start off with migration disabled. The user
      visible sysctl is still set to enabled. If the kernel switches to NOHZ
      migration is enabled, if the user did not disable it via the sysctl
      prior to the switch. If nohz=off is on the kernel command line,
      migration stays disabled no matter what.
      
      Before:
        47.76%  hog       [.] main
        14.84%  [kernel]  [k] _raw_spin_lock_irqsave
         9.55%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.71%  [kernel]  [k] mod_timer
         6.24%  [kernel]  [k] lock_timer_base.isra.38
         3.76%  [kernel]  [k] detach_if_pending
         3.71%  [kernel]  [k] del_timer
         2.50%  [kernel]  [k] internal_add_timer
         1.51%  [kernel]  [k] get_nohz_timer_target
         1.28%  [kernel]  [k] __internal_add_timer
         0.78%  [kernel]  [k] timerfn
         0.48%  [kernel]  [k] wake_up_nohz_cpu
      
      After:
        48.10%  hog       [.] main
        15.25%  [kernel]  [k] _raw_spin_lock_irqsave
         9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.50%  [kernel]  [k] mod_timer
         6.44%  [kernel]  [k] lock_timer_base.isra.38
         3.87%  [kernel]  [k] detach_if_pending
         3.80%  [kernel]  [k] del_timer
         2.67%  [kernel]  [k] internal_add_timer
         1.33%  [kernel]  [k] __internal_add_timer
         0.73%  [kernel]  [k] timerfn
         0.54%  [kernel]  [k] wake_up_nohz_cpu
      Reported-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      bc7a34b8
    • T
      timer: Stats: Simplify the flags handling · c74441a1
      Thomas Gleixner 提交于
      Simplify the handling of the flag storage for the timer statistics. No
      intermediate storage anymore. Just hand over the flags field.
      
      I left the printout of 'deferrable' for now because changing this
      would be an ABI update and I have no idea how strong people feel about
      that. OTOH, I wonder whether we should kill the whole timer stats
      stuff because all of that information can be retrieved via ftrace/perf
      as well.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224512.046626248@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      c74441a1
    • T
      timer: Replace timer base by a cpu index · 0eeda71b
      Thomas Gleixner 提交于
      Instead of storing a pointer to the per cpu tvec_base we can simply
      cache a CPU index in the timer_list and use that to get hold of the
      correct per cpu tvec_base. This is only used in lock_timer_base() and
      the slightly larger code is peanuts versus the spinlock operation and
      the d-cache foot print of the timer wheel.
      
      Aside of that this allows to get rid of following nuisances:
      
       - boot_tvec_base
      
         That statically allocated 4k bss data is just kept around so the
         timer has a home when it gets statically initialized. It serves no
         other purpose.
      
         With the CPU index we assign the timer to CPU0 at static
         initialization time and therefor can avoid the whole boot_tvec_base
         dance.  That also simplifies the init code, which just can use the
         per cpu base.
      
         Before:
           text	   data	    bss	    dec	    hex	filename
          17491	   9201	   4160	  30852	   7884	../build/kernel/time/timer.o
         After:
           text	   data	    bss	    dec	    hex	filename
          17440	   9193	      0	  26633	   6809	../build/kernel/time/timer.o
      
       - Overloading the base pointer with various flags
      
         The CPU index has enough space to hold the flags (deferrable,
         irqsafe) so we can get rid of the extra masking and bit fiddling
         with the base pointer.
      
      As a benefit we reduce the size of struct timer_list on 64 bit
      machines. 4 - 8 bytes, a size reduction up to 15% per struct timer_list,
      which is a real win as we have tons of them embedded in other structs.
      
      This changes also the newly added deferrable printout of the timer
      start trace point to capture and print all timer->flags, which allows
      us to decode the target cpu of the timer as well.
      
      We might have used bitfields for this, but that would change the
      static initializers and the init function for no value to accomodate
      big endian bitfields.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Badhri Jagan Sridharan <Badhri@google.com>
      Link: http://lkml.kernel.org/r/20150526224511.950084301@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      0eeda71b
    • T
      timer: Use hlist for the timer wheel hash buckets · 1dabbcec
      Thomas Gleixner 提交于
      This reduces the size of struct tvec_base by 50% and results in
      slightly smaller code as well.
      
      Before:
         struct tvec_base: size: 8256, cachelines: 129
      
         text	   data	    bss	    dec	    hex	filename
        17698	  13297	   8256	  39251	   9953	../build/kernel/time/timer.o
      
      After:
        struct tvec_base: 4160, cachelines: 65
      
         text	   data	    bss	    dec	    hex	filename
        17491	   9201	   4160	  30852	   7884	../build/kernel/time/timer.o
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NViresh Kumar <viresh.kumar@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224511.854731214@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      1dabbcec
    • T
      timer: Remove FIFO "guarantee" · 1bd04bf6
      Thomas Gleixner 提交于
      The FIFO guarantee is only there if two timers are queued into the
      same bucket at the same jiffie on the same cpu:
      
       - The slack value depends on the delta between expiry and enqueue
         time, so the resulting expiry time can be different for timers
         which are queued in different jiffies.
      
       - Timers which are queued into the secondary array end up after a
         later queued timer which was queued into the primary array due to
         cascading.
      
       - Timers can end up on different cpus due to the NOHZ target moving
         around. Obviously there is no guarantee of expiry ordering between
         cpus.
      
      So anything which relies on FIFO behaviour of the timer wheel is
      broken already.
      
      This is a preparatory patch for converting the timer wheel to hlist
      which reduces the memory foot print of the wheel by 50%.
      
      It's a seperate patch so any (unlikely to happen) regression caused by
      this can be identified clearly.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NViresh Kumar <viresh.kumar@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Cc: George Spelvin <linux@horizon.com>
      Link: http://lkml.kernel.org/r/20150526224511.757520403@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      1bd04bf6
    • T
      timers: Sanitize catchup_timer_jiffies() usage · 3bb475a3
      Thomas Gleixner 提交于
      catchup_timer_jiffies() has been applied blindly to several functions
      without looking for possible better ways to do it.
      
      1) internal_add_timer()
      
         Move the update to base->all_timers before we actually insert the
         timer into the wheel.
      
      2) detach_if_pending()
      
         Again the update to base->all_timers allows us to explicitely do
         the timer_jiffies update in place, if this was the last timer which
         got removed.
      
      3) __run_timers()
      
         We only check on entry, which is silly, because base->timer_jiffies
         can be behind - especially on NOHZ kernels - and if there is a
         single deferrable timer somewhere between base->timer_jiffies and
         jiffies we expire it and then loop until base->timer_jiffies ==
         jiffies.
      
         Move it into the loop.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224511.662994644@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      3bb475a3
    • P
      hrtimer: Allow hrtimer::function() to free the timer · 887d9dc9
      Peter Zijlstra 提交于
      Currently an hrtimer callback function cannot free its own timer
      because __run_hrtimer() still needs to clear HRTIMER_STATE_CALLBACK
      after it. Freeing the timer would result in a clear use-after-free.
      
      Solve this by using a scheme similar to regular timers; track the
      current running timer in hrtimer_clock_base::running.
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: wanpeng.li@linux.intel.com
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.471563047@infradead.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      887d9dc9
    • P
      hrtimer: Fix hrtimer_is_queued() hole · 8edfb036
      Peter Zijlstra 提交于
      A queued hrtimer that gets restarted (hrtimer_start*() while
      hrtimer_is_queued()) will briefly appear as unqueued/inactive, even
      though the timer has always been active, we just moved it.
      
      Close this hole by preserving timer->state in
      hrtimer_start_range_ns()'s remove_hrtimer() call.
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.175989138@infradead.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      8edfb036
    • O
      hrtimer: Remove HRTIMER_STATE_MIGRATE · c04dca02
      Oleg Nesterov 提交于
      I do not understand HRTIMER_STATE_MIGRATE. Unless I am totally
      confused it looks buggy and simply unneeded.
      
      migrate_hrtimer_list() sets it to keep hrtimer_active() == T, but this
      is not enough: this can fool, say, hrtimer_is_queued() in
      dequeue_signal().
      
      Can't migrate_hrtimer_list() simply use HRTIMER_STATE_ENQUEUED?
      This fixes the race and we can kill STATE_MIGRATE.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.072387650@infradead.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      c04dca02
  5. 18 6月, 2015 2 次提交
  6. 12 6月, 2015 4 次提交
    • J
      ntp: Do leapsecond adjustment in adjtimex read path · 96efdcf2
      John Stultz 提交于
      Since the leapsecond is applied at tick-time, this means there is a
      small window of time at the start of a leap-second where we cross into
      the next second before applying the leap.
      
      This patch modified adjtimex so that the leap-second is applied on the
      second edge. Providing more correct leapsecond behavior.
      
      This does make it so that adjtimex()'s returned time values can be
      inconsistent with time values read from gettimeofday() or
      clock_gettime(CLOCK_REALTIME,...)  for a brief period of one tick at
      the leapsecond.  However, those other interfaces do not provide the
      TIME_OOP time_state return that adjtimex() provides, which allows the
      leapsecond to be properly represented. They instead only see a time
      discontinuity, and cannot tell the first 23:59:59 from the repeated
      23:59:59 leap second.
      
      This seems like a reasonable tradeoff given clock_gettime() /
      gettimeofday() cannot properly represent a leapsecond, and users
      likely care more about performance, while folks who are using
      adjtimex() more likely care about leap-second correctness.
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1434063297-28657-5-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      96efdcf2
    • J
      time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge · 833f32d7
      John Stultz 提交于
      Currently, leapsecond adjustments are done at tick time. As a result,
      the leapsecond was applied at the first timer tick *after* the
      leapsecond (~1-10ms late depending on HZ), rather then exactly on the
      second edge.
      
      This was in part historical from back when we were always tick based,
      but correcting this since has been avoided since it adds extra
      conditional checks in the gettime fastpath, which has performance
      overhead.
      
      However, it was recently pointed out that ABS_TIME CLOCK_REALTIME
      timers set for right after the leapsecond could fire a second early,
      since some timers may be expired before we trigger the timekeeping
      timer, which then applies the leapsecond.
      
      This isn't quite as bad as it sounds, since behaviorally it is similar
      to what is possible w/ ntpd made leapsecond adjustments done w/o using
      the kernel discipline. Where due to latencies, timers may fire just
      prior to the settimeofday call. (Also, one should note that all
      applications using CLOCK_REALTIME timers should always be careful,
      since they are prone to quirks from settimeofday() disturbances.)
      
      However, the purpose of having the kernel do the leap adjustment is to
      avoid such latencies, so I think this is worth fixing.
      
      So in order to properly keep those timers from firing a second early,
      this patch modifies the ntp and timekeeping logic so that we keep
      enough state so that the update_base_offsets_now accessor, which
      provides the hrtimer core the current time, can check and apply the
      leapsecond adjustment on the second edge. This prevents the hrtimer
      core from expiring timers too early.
      
      This patch does not modify any other time read path, so no additional
      overhead is incurred. However, this also means that the leap-second
      continues to be applied at tick time for all other read-paths.
      
      Apologies to Richard Cochran, who pushed for similar changes years
      ago, which I resisted due to the concerns about the performance
      overhead.
      
      While I suspect this isn't extremely critical, folks who care about
      strict leap-second correctness will likely want to watch
      this. Potentially a -stable candidate eventually.
      Originally-suggested-by: NRichard Cochran <richardcochran@gmail.com>
      Reported-by: NDaniel Bristot de Oliveira <bristot@redhat.com>
      Reported-by: NPrarit Bhargava <prarit@redhat.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      833f32d7
    • J
      ntp: Introduce and use SECS_PER_DAY macro instead of 86400 · 90bf361c
      John Stultz 提交于
      Currently the leapsecond logic uses what looks like magic values.
      
      Improve this by defining SECS_PER_DAY and using that macro
      to make the logic more clear.
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1434063297-28657-3-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      90bf361c
    • J
      time: Move clock_was_set_seq update before updating shadow-timekeeper · d1518326
      John Stultz 提交于
      It was reported that 868a3e91 (hrtimer: Make offset
      update smarter) was causing timer problems after suspend/resume.
      
      The problem with that change is the modification to
      clock_was_set_seq in timekeeping_update is done prior to
      mirroring the time state to the shadow-timekeeper. Thus the
      next time we do update_wall_time() the updated sequence is
      overwritten by whats in the shadow copy.
      
      This patch moves the shadow-timekeeper mirroring to the end
      of the function, after all updates have been made, so all data
      is kept in sync.
      
      (This patch also affects the update_fast_timekeeper calls which
      were also problematically done prior to the mirroring).
      Reported-and-tested-by: NJeremiah Mahler <jmmahler@gmail.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/1434063297-28657-2-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      d1518326
  7. 10 6月, 2015 2 次提交
  8. 02 6月, 2015 4 次提交
  9. 28 5月, 2015 2 次提交
  10. 27 5月, 2015 1 次提交
  11. 23 5月, 2015 5 次提交
  12. 19 5月, 2015 5 次提交
    • V
      clockevents: Stop unused clockevent devices · d2540875
      Viresh Kumar 提交于
      To avoid getting spurious interrupts on a tickless CPU, clockevent
      device can now be stopped by switching to ONESHOT_STOPPED state.
      
      The natural place for handling this transition is tick_program_event().
      
      On 'expires == KTIME_MAX', we skip programming the event and so we need
      to fix such call sites as well, to always call tick_program_event()
      irrespective of the expires value.
      
      Once the clockevent device is required again, check if it was earlier
      put into ONESHOT_STOPPED state. If yes, switch its state to ONESHOT
      before programming its event.
      
      To make sure we haven't missed any corner case, add a WARN() for the
      case where we try to reprogram clockevent device while we aren't
      configured in ONESHOT_STOPPED state.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Cc: linaro-kernel@lists.linaro.org
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/5146b07be7f0bc497e0ebae036590ec2fa73e540.1428031396.git.viresh.kumar@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      d2540875
    • V
      clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state · 8fff52fd
      Viresh Kumar 提交于
      When no timers/hrtimers are pending, the expiry time is set to a
      special value: 'KTIME_MAX'. This normally happens with
      NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
      
      When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer
      (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device
      (NOHZ_MODE_LOWRES).  But, the clockevent device is already
      reprogrammed from the tick-handler for next tick.
      
      As the clock event device is programmed in ONESHOT mode it will at
      least fire one more time (unnecessarily). Timers on few
      implementations (like arm_arch_timer, etc.) only support PERIODIC mode
      and their drivers emulate ONESHOT over that. Which means that on these
      platforms we will get spurious interrupts periodically (at last
      programmed interval rate, normally tick rate).
      
      In order to avoid spurious interrupts, the clockevent device should be
      stopped or its interrupts should be masked.
      
      A simple (yet hacky) solution to get this fixed could be: update
      hrtimer_force_reprogram() to always reprogram clockevent device and
      update clockevent drivers to STOP generating events (or delay it to
      max time) when 'expires' is set to KTIME_MAX. But the drawback here is
      that every clockevent driver has to be hacked for this particular case
      and its very easy for new ones to miss this.
      
      However, Thomas suggested to add an optional state ONESHOT_STOPPED to
      solve this problem: lkml.org/lkml/2014/5/9/508.
      
      This patch adds support for ONESHOT_STOPPED state in clockevents
      core. It will only be available to drivers that implement the
      state-specific callbacks instead of the legacy ->set_mode() callback.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: NPreeti U. Murthy <preeti@linux.vnet.ibm.com>
      Cc: linaro-kernel@lists.linaro.org
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/b8b383a03ac07b13312c16850b5106b82e4245b5.1428031396.git.viresh.kumar@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      8fff52fd
    • N
      time: Refactor msecs_to_jiffies · ca42aaf0
      Nicholas Mc Guire 提交于
      Refactor the msecs_to_jiffies conditional code part in time.c and 
      jiffies.h putting it into conditional functions rather than #ifdefs
      to improve readability.
      
      [ tglx: Verified that there is no binary code change ]
      Signed-off-by: NNicholas Mc Guire <hofrat@osadl.org>
      Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Link: http://lkml.kernel.org/r/1431951554-5563-2-git-send-email-hofrat@osadl.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      ca42aaf0
    • N
      time: Move timeconst.h into include/generated · 0a227985
      Nicholas Mc Guire 提交于
      kernel/time/timeconst.h is moved to include/generated/ and generated 
      by the top level Kbuild. This allows using timeconst.h in an earlier
      build stage.
      Signed-off-by: NNicholas Mc Guire <hofrat@osadl.org>
      Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Link: http://lkml.kernel.org/r/1431951554-5563-1-git-send-email-hofrat@osadl.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      0a227985
    • R
      PM / sleep: Make suspend-to-idle-specific code depend on CONFIG_SUSPEND · 87e9b9f1
      Rafael J. Wysocki 提交于
      Since idle_should_freeze() is defined to always return 'false'
      for CONFIG_SUSPEND unset, all of the code depending on it in
      cpuidle_idle_call() is not necessary in that case.
      
      Make that code depend on CONFIG_SUSPEND too to avoid building it
      when it is not going to be used.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      87e9b9f1
  13. 15 5月, 2015 1 次提交