Commits · 683be13a284720205228e29207ef11a1c3c322b9 · openanolis / cloud-kernel

19 Jun, 2015 12 commits

timer: Minimize nohz off overhead · 683be13a

Committed by Thomas Gleixner on May 26, 2015

If nohz is disabled on the kernel command line the [hr]timer code
still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty
pointless exercise. Cache nohz_active in [hr]timer per cpu bases and
avoid the overhead.

Before:
  48.10%  hog       [.] main
  15.25%  [kernel]  [k] _raw_spin_lock_irqsave
   9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.50%  [kernel]  [k] mod_timer
   6.44%  [kernel]  [k] lock_timer_base.isra.38
   3.87%  [kernel]  [k] detach_if_pending
   3.80%  [kernel]  [k] del_timer
   2.67%  [kernel]  [k] internal_add_timer
   1.33%  [kernel]  [k] __internal_add_timer
   0.73%  [kernel]  [k] timerfn
   0.54%  [kernel]  [k] wake_up_nohz_cpu

After:
  48.73%  hog       [.] main
  15.36%  [kernel]  [k] _raw_spin_lock_irqsave
   9.77%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.61%  [kernel]  [k] lock_timer_base.isra.38
   6.42%  [kernel]  [k] mod_timer
   3.90%  [kernel]  [k] detach_if_pending
   3.76%  [kernel]  [k] del_timer
   2.41%  [kernel]  [k] internal_add_timer
   1.39%  [kernel]  [k] __internal_add_timer
   0.76%  [kernel]  [k] timerfn

We probably should have a cached value for nohz full in the per cpu
bases as well to avoid the cpumask check. The base cache line is hot
already, the cpumask not necessarily.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

timer: Reduce timer migration overhead if disabled · bc7a34b8

Committed by Thomas Gleixner on May 26, 2015

Eric reported that the timer_migration sysctl is not really nice
performance wise as it needs to check at every timer insertion whether
the feature is enabled or not. Further the check does not live in the
timer code, so we have an extra function call which checks an extra
cache line to figure out that it is disabled.

We can do better and store that information in the per cpu (hr)timer
bases. I pondered to use a static key, but that's a nightmare to
update from the nohz code and the timer base cache line is hot anyway
when we select a timer base.

The old logic enabled the timer migration unconditionally if
CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
line.

With this modification, we start off with migration disabled. The user
visible sysctl is still set to enabled. If the kernel switches to NOHZ,
migration is enabled, unless the user disabled it via the sysctl prior
to the switch. If nohz=off is on the kernel command line, migration
stays disabled no matter what.

Before:
  47.76%  hog       [.] main
  14.84%  [kernel]  [k] _raw_spin_lock_irqsave
   9.55%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.71%  [kernel]  [k] mod_timer
   6.24%  [kernel]  [k] lock_timer_base.isra.38
   3.76%  [kernel]  [k] detach_if_pending
   3.71%  [kernel]  [k] del_timer
   2.50%  [kernel]  [k] internal_add_timer
   1.51%  [kernel]  [k] get_nohz_timer_target
   1.28%  [kernel]  [k] __internal_add_timer
   0.78%  [kernel]  [k] timerfn
   0.48%  [kernel]  [k] wake_up_nohz_cpu

After:
  48.10%  hog       [.] main
  15.25%  [kernel]  [k] _raw_spin_lock_irqsave
   9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.50%  [kernel]  [k] mod_timer
   6.44%  [kernel]  [k] lock_timer_base.isra.38
   3.87%  [kernel]  [k] detach_if_pending
   3.80%  [kernel]  [k] del_timer
   2.67%  [kernel]  [k] internal_add_timer
   1.33%  [kernel]  [k] __internal_add_timer
   0.73%  [kernel]  [k] timerfn
   0.54%  [kernel]  [k] wake_up_nohz_cpu
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

timer: Stats: Simplify the flags handling · c74441a1

Committed by Thomas Gleixner on May 26, 2015

Simplify the handling of the flag storage for the timer statistics. No
intermediate storage anymore. Just hand over the flags field.

I left the printout of 'deferrable' for now because changing this
would be an ABI update and I have no idea how strong people feel about
that. OTOH, I wonder whether we should kill the whole timer stats
stuff because all of that information can be retrieved via ftrace/perf
as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.046626248@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

timer: Replace timer base by a cpu index · 0eeda71b

Committed by Thomas Gleixner on May 26, 2015

Instead of storing a pointer to the per cpu tvec_base we can simply
cache a CPU index in the timer_list and use that to get hold of the
correct per cpu tvec_base. This is only used in lock_timer_base() and
the slightly larger code is peanuts versus the spinlock operation and
the d-cache footprint of the timer wheel.

Aside from that, this allows us to get rid of the following nuisances:

 - boot_tvec_base

   That statically allocated 4k bss data is just kept around so the
   timer has a home when it gets statically initialized. It serves no
   other purpose.

   With the CPU index we assign the timer to CPU0 at static
   initialization time and therefore can avoid the whole boot_tvec_base
   dance.  That also simplifies the init code, which can just use the
   per cpu base.

   Before:
     text	   data	    bss	    dec	    hex	filename
    17491	   9201	   4160	  30852	   7884	../build/kernel/time/timer.o
   After:
     text	   data	    bss	    dec	    hex	filename
    17440	   9193	      0	  26633	   6809	../build/kernel/time/timer.o

 - Overloading the base pointer with various flags

   The CPU index has enough space to hold the flags (deferrable,
   irqsafe) so we can get rid of the extra masking and bit fiddling
   with the base pointer.

As a benefit we reduce the size of struct timer_list on 64 bit
machines by 4 - 8 bytes, a size reduction of up to 15% per struct
timer_list, which is a real win as we have tons of them embedded in
other structs.

This also changes the newly added deferrable printout of the timer
start trace point to capture and print all timer->flags, which allows
us to decode the target cpu of the timer as well.

We might have used bitfields for this, but that would change the
static initializers and the init function for no value to accommodate
big endian bitfields.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Badhri Jagan Sridharan <Badhri@google.com>
Link: http://lkml.kernel.org/r/20150526224511.950084301@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
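
Note: the idea of keeping a CPU index and the flag bits in one 32-bit word can be sketched in plain C as below. This is only an illustration; the sketch_* names and the bit positions are assumptions made for the example and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for this sketch: low 20 bits = CPU index, flags above. */
#define SKETCH_CPU_MASK    0x000FFFFFu
#define SKETCH_DEFERRABLE  0x00100000u
#define SKETCH_IRQSAFE     0x00200000u

struct sketch_timer {
	unsigned long expires;
	uint32_t flags;		/* CPU index and flag bits share one word */
};

/* Static-style initialization: flags only, which leaves the timer on CPU 0. */
static inline void sketch_timer_init(struct sketch_timer *t, uint32_t flags)
{
	assert((flags & SKETCH_CPU_MASK) == 0);
	t->expires = 0;
	t->flags = flags;
}

/* Extract the CPU index; a lock_timer_base()-like lookup would index
 * the per cpu base array with this value. */
static inline uint32_t sketch_timer_cpu(const struct sketch_timer *t)
{
	return t->flags & SKETCH_CPU_MASK;
}

/* Move the timer to another CPU without disturbing the flag bits. */
static inline void sketch_timer_set_cpu(struct sketch_timer *t, uint32_t cpu)
{
	assert(cpu <= SKETCH_CPU_MASK);
	t->flags = (t->flags & ~SKETCH_CPU_MASK) | cpu;
}

static inline int sketch_timer_deferrable(const struct sketch_timer *t)
{
	return (t->flags & SKETCH_DEFERRABLE) != 0;
}
```

No pointer masking is needed to read either the CPU or the flags, which is the bit fiddling the message says goes away.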

timer: Use hlist for the timer wheel hash buckets · 1dabbcec

Committed by Thomas Gleixner on May 26, 2015

This reduces the size of struct tvec_base by 50% and results in
slightly smaller code as well.

Before:
   struct tvec_base: size: 8256, cachelines: 129

   text	   data	    bss	    dec	    hex	filename
  17698	  13297	   8256	  39251	   9953	../build/kernel/time/timer.o

After:
   struct tvec_base: size: 4160, cachelines: 65

   text	   data	    bss	    dec	    hex	filename
  17491	   9201	   4160	  30852	   7884	../build/kernel/time/timer.o
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224511.854731214@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
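
Note: the quoted struct sizes are consistent with each bucket head shrinking from two pointers to one. The sketch below reproduces the arithmetic under two assumptions that are not stated in the message: a wheel of one 256-bucket array plus four 64-bucket arrays, and 8-byte pointers. The sketch_* types are stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Doubly linked bucket head: two pointers per bucket. */
struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

/* hlist: the node keeps two pointers, but the bucket head needs only one. */
struct sketch_hlist_node {
	struct sketch_hlist_node *next, **pprev;
};
struct sketch_hlist_head {
	struct sketch_hlist_node *first;
};

/* Assumed wheel geometry: 256 + 4 * 64 = 512 buckets. */
enum { SKETCH_BUCKETS = 256 + 4 * 64 };

/* Bytes used by the bucket arrays alone for a given head size. */
static inline size_t sketch_wheel_bytes(size_t head_size)
{
	assert(head_size > 0);
	return (size_t)SKETCH_BUCKETS * head_size;
}
```

With 16-byte heads the buckets take 8192 bytes and with 8-byte heads 4096 bytes; both leave the same 64 bytes for the remaining fields when compared with the 8256 and 4160 byte totals quoted above.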

timer: Remove FIFO "guarantee" · 1bd04bf6

Committed by Thomas Gleixner on May 26, 2015

The FIFO guarantee is only there if two timers are queued into the
same bucket at the same jiffy on the same cpu:

 - The slack value depends on the delta between expiry and enqueue
   time, so the resulting expiry time can be different for timers
   which are queued in different jiffies.

 - Timers which are queued into the secondary array end up after a
   later queued timer which was queued into the primary array due to
   cascading.

 - Timers can end up on different cpus due to the NOHZ target moving
   around. Obviously there is no guarantee of expiry ordering between
   cpus.

So anything which relies on FIFO behaviour of the timer wheel is
broken already.

This is a preparatory patch for converting the timer wheel to hlist
which reduces the memory footprint of the wheel by 50%.

It's a separate patch so any (unlikely to happen) regression caused by
this can be identified clearly.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Cc: George Spelvin <linux@horizon.com>
Link: http://lkml.kernel.org/r/20150526224511.757520403@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

timers: Sanitize catchup_timer_jiffies() usage · 3bb475a3

Committed by Thomas Gleixner on May 26, 2015

catchup_timer_jiffies() has been applied blindly to several functions
without looking for possible better ways to do it.

1) internal_add_timer()

   Move the update to base->all_timers before we actually insert the
   timer into the wheel.

2) detach_if_pending()

   Again the update to base->all_timers allows us to explicitly do
   the timer_jiffies update in place, if this was the last timer which
   got removed.

3) __run_timers()

   We only check on entry, which is silly, because base->timer_jiffies
   can be behind - especially on NOHZ kernels - and if there is a
   single deferrable timer somewhere between base->timer_jiffies and
   jiffies we expire it and then loop until base->timer_jiffies ==
   jiffies.

   Move it into the loop.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224511.662994644@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

hrtimer: Allow hrtimer::function() to free the timer · 887d9dc9

Committed by Peter Zijlstra on Jun 11, 2015

Currently an hrtimer callback function cannot free its own timer
because __run_hrtimer() still needs to clear HRTIMER_STATE_CALLBACK
after it. Freeing the timer would result in a clear use-after-free.

Solve this by using a scheme similar to regular timers; track the
current running timer in hrtimer_clock_base::running.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: wanpeng.li@linux.intel.com
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.471563047@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
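
Note: the ownership change described here can be illustrated with a single-threaded userspace sketch. The "currently running" marker lives in the base rather than in the timer, so the runner never touches the timer after its callback returns. The sketch_* names are hypothetical, and the locking and sequence counting a real implementation needs are deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_timer;

struct sketch_base {
	struct sketch_timer *running;	/* timer whose callback is executing */
};

struct sketch_timer {
	void (*function)(struct sketch_timer *t);
};

/* Run one timer. All state that must be cleared afterwards is kept in
 * the base, so the callback is free to release the timer itself. */
static inline void sketch_run_timer(struct sketch_base *base,
				    struct sketch_timer *timer)
{
	void (*fn)(struct sketch_timer *) = timer->function;

	assert(fn != NULL);
	base->running = timer;
	fn(timer);		/* may free 'timer' */
	/* 'timer' must not be dereferenced from here on. */
	base->running = NULL;
}

/* Example callback that frees its own timer. */
static inline void sketch_free_self(struct sketch_timer *t)
{
	free(t);
}
```

If the running marker were a state bit inside the timer, the final clear would write to freed memory, which is the use-after-free the message describes.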

seqcount: Introduce raw_write_seqcount_barrier() · c4bfa3f5

Committed by Peter Zijlstra on Jun 17, 2015

Introduce raw_write_seqcount_barrier(), a new construct that can be
used to provide write barrier semantics in seqcount read loops instead
of the usual consistency guarantee.

raw_write_seqcount_barrier() is equivalent to:

	raw_write_seqcount_begin();
	raw_write_seqcount_end();

But avoids issuing two back-to-back smp_wmb() instructions.

This construct works because the read side will 'stall' when observing
odd values. This means that -- referring to the example in the comment
below -- even though there is no (matching) read barrier between the
loads of X and Y, we cannot observe !x && !y, because:

 - if we observe Y == false we must observe the first sequence
   increment, which makes us loop, until

 - we observe !(seq & 1) -- the second sequence increment -- at which
   time we must also observe T == true.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: umgwanakikbuti@gmail.com
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: oleg@redhat.com
Cc: wanpeng.li@linux.intel.com
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20150617122924.GP3644@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
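
Note: the counter behaviour (not the memory ordering) of the construct can be shown with a single-threaded sketch. It assumes a plain unsigned counter and hypothetical sketch_* helpers; the real primitive additionally needs a write barrier between the two increments, which is only marked by a comment here.

```c
#include <assert.h>

struct sketch_seqcount {
	unsigned int sequence;
};

static inline void sketch_write_begin(struct sketch_seqcount *s)
{
	s->sequence++;		/* odd: write in progress */
}

static inline void sketch_write_end(struct sketch_seqcount *s)
{
	s->sequence++;		/* even: write finished */
}

/* Same net effect on the counter as begin() followed by end(). */
static inline void sketch_write_barrier(struct sketch_seqcount *s)
{
	assert((s->sequence & 1) == 0);
	s->sequence++;
	/* a single write barrier would sit here in a real implementation */
	s->sequence++;
}

static inline unsigned int sketch_read_begin(const struct sketch_seqcount *s)
{
	return s->sequence;
}

/* A reader retries if it started on an odd value or the counter moved. */
static inline int sketch_read_retry(const struct sketch_seqcount *s,
				    unsigned int start)
{
	return (start & 1) || s->sequence != start;
}
```

A reader that sampled the counter before the barrier sees it change and loops, which is the 'stall' the message relies on.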

seqcount: Rename write_seqcount_barrier() · a7c6f571

Committed by Peter Zijlstra on Jun 11, 2015

I'll shortly be introducing another seqcount primitive that's useful
to provide ordering semantics and would like to use the
write_seqcount_barrier() name for that.

Seeing how there's only one user of the current primitive, let's rename
it to invalidate, as that appears to be what it's doing.

While there, employ lockdep_assert_held() instead of
assert_spin_locked() to not generate debug code for regular kernels.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: wanpeng.li@linux.intel.com
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.279926217@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

hrtimer: Fix hrtimer_is_queued() hole · 8edfb036

Committed by Peter Zijlstra on Jun 11, 2015

A queued hrtimer that gets restarted (hrtimer_start*() while
hrtimer_is_queued()) will briefly appear as unqueued/inactive, even
though the timer has always been active, we just moved it.

Close this hole by preserving timer->state in
hrtimer_start_range_ns()'s remove_hrtimer() call.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: wanpeng.li@linux.intel.com
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.175989138@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

hrtimer: Remove HRTIMER_STATE_MIGRATE · c04dca02

Committed by Oleg Nesterov on Jun 11, 2015

I do not understand HRTIMER_STATE_MIGRATE. Unless I am totally
confused it looks buggy and simply unneeded.

migrate_hrtimer_list() sets it to keep hrtimer_active() == T, but this
is not enough: this can fool, say, hrtimer_is_queued() in
dequeue_signal().

Can't migrate_hrtimer_list() simply use HRTIMER_STATE_ENQUEUED?
This fixes the race and we can kill STATE_MIGRATE.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: wanpeng.li@linux.intel.com
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.072387650@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

18 Jun, 2015 3 commits

selftest: Timers: Avoid signal deadlock in leap-a-day · 51a16c1e

Committed by John Stultz on Jun 17, 2015

In 0c4a5fc9 (Add leap-second timer edge testing to
leap-a-day.c), we added a timer to the test which checks to make
sure timers near the leapsecond edge behave correctly.

However, the output generated from the timer uses ctime_r, which
isn't async-signal safe, and should that signal land while the
main test is using ctime_r to print its output, it's possible for
the test to deadlock on glibc internal locks.

Thus this patch reworks the output to avoid using ctime_r in
the signal handler.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434565003-3386-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
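
Note: one common way to keep a handler async-signal-safe is to record a flag in the handler and do all formatting from the main flow. The sketch below shows that pattern; it is not necessarily how the patch reworks the test, and the sketch_* names and the choice of SIGUSR1 are assumptions for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>

static volatile sig_atomic_t sketch_fired;

/* The handler only sets a flag: no ctime_r(), no stdio, no locks. */
static void sketch_handler(int sig)
{
	(void)sig;
	sketch_fired = 1;
}

static inline int sketch_install(void)
{
	struct sigaction sa;

	sa.sa_handler = sketch_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	return sigaction(SIGUSR1, &sa, NULL);
}

/* Called from the main flow, where using ctime_r() is fine.
 * Returns 1 and fills 'buf' if the signal fired since the last call. */
static inline int sketch_report(char *buf, size_t len)
{
	char tbuf[26];		/* ctime_r() needs at least 26 bytes */
	time_t now;

	assert(buf != NULL && len > 0);
	if (!sketch_fired)
		return 0;
	sketch_fired = 0;
	now = time(NULL);
	if (ctime_r(&now, tbuf) == NULL)
		return 0;
	snprintf(buf, len, "timer fired at %s", tbuf);
	return 1;
}
```

Because the handler never enters libc formatting code, it cannot block on a lock that the interrupted main thread already holds.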

timekeeping: Copy the shadow-timekeeper over the real timekeeper last · 906c5557

Committed by John Stultz on Jun 17, 2015

The fix in d1518326 (time: Move clock_was_set_seq update
before updating shadow-timekeeper) was unfortunately incomplete.

The main gist of that change was to do the shadow-copy update
last, so that any state changes were properly duplicated, and
we wouldn't accidentally have stale data in the shadow.

Unfortunately in the main update_wall_time() logic, we use the
shadow-timekeeper to calculate the next update values, then while
holding the lock, copy the shadow-timekeeper over, then call
timekeeping_update() to do some additional bookkeeping (skipping
the shadow mirror). The bug with this is that the additional
bookkeeping isn't all read-only, and some of it changes timekeeper
state. Thus we might then overwrite this state change on the next
update.

To avoid this problem, do the timekeeping_update() on the
shadow-timekeeper prior to copying the full state over to
the real-timekeeper.

This avoids problems with both the clock_was_set_seq and
next_leap_ktime being overwritten and possibly the
fast-timekeepers as well.

Many thanks to Prarit for his rigorous testing, which discovered
this problem, along with Prarit and Daniel's work validating this
fix.
Reported-by: Prarit Bhargava <prarit@redhat.com>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434560753-7441-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

clockevents: Check state instead of mode in suspend/resume path · a9d20988

Committed by Viresh Kumar on Jun 17, 2015

CLOCK_EVT_MODE_* macros are present for backward compatibility (as most
of the drivers are still using old ->set_mode() interface).

These macros shouldn't be used anymore in code that is common to both
driver interfaces, i.e. ->set_mode() and ->set_state_*().

Drivers implementing ->set_state_*() interface, which have their
clkevt->mode set to 0 (clkevt device structures are normally globally
defined), will not participate in suspend/resume as they will always be
marked as UNUSED.

Fix this by checking state of the clockevent device instead of mode,
which is updated for both the interfaces.

Fixes: ac34ad27 ("clockevents: Do not suspend/resume if unused")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: alexandre.belloni@free-electrons.com
Cc: sylvain.rochet@finsecur.com
Link: http://lkml.kernel.org/r/a1964eef6e8a47d02b1ff9083c6c91f73f0ff643.1434537215.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

12 Jun, 2015 5 commits

selftests: timers: Add leap-second timer edge testing to leap-a-day.c · 0c4a5fc9

Committed by John Stultz on Jun 11, 2015

Prarit reported an issue w/ timers around the leapsecond, where a
timer set for Midnight UTC (00:00:00) might fire a second early right
before the leapsecond (23:59:60 - though it appears as a repeated
23:59:59) is applied.

So I've updated the leap-a-day.c test to integrate a similar test,
where we set a timer and check if it triggers at the right time, and
if the ntp state transition is managed properly.
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-6-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

ntp: Do leapsecond adjustment in adjtimex read path · 96efdcf2

Committed by John Stultz on Jun 11, 2015

Since the leapsecond is applied at tick-time, this means there is a
small window of time at the start of a leap-second where we cross into
the next second before applying the leap.

This patch modifies adjtimex() so that the leap-second is applied on
the second edge, providing more correct leapsecond behavior.

This does make it so that adjtimex()'s returned time values can be
inconsistent with time values read from gettimeofday() or
clock_gettime(CLOCK_REALTIME,...)  for a brief period of one tick at
the leapsecond.  However, those other interfaces do not provide the
TIME_OOP time_state return that adjtimex() provides, which allows the
leapsecond to be properly represented. They instead only see a time
discontinuity, and cannot tell the first 23:59:59 from the repeated
23:59:59 leap second.

This seems like a reasonable tradeoff given clock_gettime() /
gettimeofday() cannot properly represent a leapsecond, and users
likely care more about performance, while folks who are using
adjtimex() more likely care about leap-second correctness.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge · 833f32d7

Committed by John Stultz on Jun 11, 2015

Currently, leapsecond adjustments are done at tick time. As a result,
the leapsecond is applied at the first timer tick *after* the
leapsecond (~1-10ms late depending on HZ), rather than exactly on the
second edge.

This was in part historical from back when we were always tick based,
but correcting this has been avoided since it adds extra conditional
checks in the gettime fastpath, which has performance overhead.

However, it was recently pointed out that ABS_TIME CLOCK_REALTIME
timers set for right after the leapsecond could fire a second early,
since some timers may be expired before we trigger the timekeeping
timer, which then applies the leapsecond.

This isn't quite as bad as it sounds, since behaviorally it is similar
to what is possible with ntpd-made leapsecond adjustments done without
using the kernel discipline, where due to latencies, timers may fire
just prior to the settimeofday call. (Also, one should note that all
applications using CLOCK_REALTIME timers should always be careful,
since they are prone to quirks from settimeofday() disturbances.)

However, the purpose of having the kernel do the leap adjustment is to
avoid such latencies, so I think this is worth fixing.

So in order to properly keep those timers from firing a second early,
this patch modifies the ntp and timekeeping logic so that we keep
enough state so that the update_base_offsets_now accessor, which
provides the hrtimer core the current time, can check and apply the
leapsecond adjustment on the second edge. This prevents the hrtimer
core from expiring timers too early.

This patch does not modify any other time read path, so no additional
overhead is incurred. However, this also means that the leap-second
continues to be applied at tick time for all other read-paths.

Apologies to Richard Cochran, who pushed for similar changes years
ago, which I resisted due to the concerns about the performance
overhead.

While I suspect this isn't extremely critical, folks who care about
strict leap-second correctness will likely want to watch
this. Potentially a -stable candidate eventually.
Originally-suggested-by: Richard Cochran <richardcochran@gmail.com>
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

ntp: Introduce and use SECS_PER_DAY macro instead of 86400 · 90bf361c

Committed by John Stultz on Jun 11, 2015

Currently the leapsecond logic uses what looks like magic values.

Improve this by defining SECS_PER_DAY and using that macro
to make the logic more clear.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
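
Note: a small sketch of why a named constant reads better than a bare 86400 in day-boundary arithmetic. The helper below is hypothetical and only illustrates the style; it is not the ntp code touched by the patch.

```c
#include <assert.h>

#define SKETCH_SECS_PER_DAY 86400

/* Seconds value of the first UTC midnight strictly after 'secs'
 * (seconds since the epoch, leap seconds ignored). */
static inline long long sketch_next_midnight(long long secs)
{
	assert(secs >= 0);
	return secs + SKETCH_SECS_PER_DAY - (secs % SKETCH_SECS_PER_DAY);
}
```

For example, sketch_next_midnight(86399) yields 86400, the day boundary right after 23:59:59 of the first day.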

time: Move clock_was_set_seq update before updating shadow-timekeeper · d1518326

Committed by John Stultz on Jun 11, 2015

It was reported that 868a3e91 (hrtimer: Make offset
update smarter) was causing timer problems after suspend/resume.

The problem with that change is the modification to
clock_was_set_seq in timekeeping_update is done prior to
mirroring the time state to the shadow-timekeeper. Thus the
next time we do update_wall_time() the updated sequence is
overwritten by what's in the shadow copy.

This patch moves the shadow-timekeeper mirroring to the end
of the function, after all updates have been made, so all data
is kept in sync.

(This patch also affects the update_fast_timekeeper calls which
were also problematically done prior to the mirroring).
Reported-and-tested-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1434063297-28657-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

10 Jun, 2015 3 commits

clocksource: Use current logging style · 45bbfe64

Committed by Joe Perches on May 25, 2015

clocksource messages aren't prefixed in dmesg so it's a bit unclear
what subsystem emits the messages.

Use pr_fmt and pr_<level> to auto-prefix the messages appropriately.

Miscellanea:

o Remove "Warning" from KERN_WARNING level messages
o Align "timekeeping watchdog: " messages
o Coalesce formats
o Align multiline arguments
Signed-off-by: Joe Perches <joe@perches.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1432579795.2846.75.camel@perches.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
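
Note: the auto-prefixing mechanism can be mimicked in userspace with string literal concatenation. In the sketch below, pr_fmt() is defined the way a source file would define it, while sketch_pr_warn() and sketch_report_unstable() are hypothetical stand-ins that format into a buffer instead of the kernel log; the message text is made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Per-file prefix: every format string passed through pr_fmt() gets it. */
#define pr_fmt(fmt) "clocksource: " fmt

/* Userspace stand-in for a pr_<level>() macro. */
#define sketch_pr_warn(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), __VA_ARGS__)

static inline int sketch_report_unstable(char *buf, size_t len,
					 const char *name)
{
	assert(buf != NULL && len > 0 && name != NULL);
	return sketch_pr_warn(buf, len,
			      "timekeeping watchdog: '%s' looks unstable\n",
			      name);
}
```

The call site carries only the message; the subsystem prefix is added once, at the top of the file.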

time: Allow gcc to fold usecs_to_jiffies(constant) · c569a23d

Committed by Nicholas Mc Guire on May 28, 2015

To allow constant folding, usecs_to_jiffies() conditionally calls
the HZ dependent _usecs_to_jiffies() helpers or, when gcc cannot
figure out constant folding, __usecs_to_jiffies(), which is the renamed
original usecs_to_jiffies() function.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Joe Perches <joe@perches.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Paul Turner <pjt@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Link: http://lkml.kernel.org/r/1432832996-12129-2-git-send-email-hofrat@osadl.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
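
Note: the dispatch pattern described here can be sketched with the GCC/Clang builtin __builtin_constant_p(). The sketch assumes HZ = 250 and uses hypothetical sketch_* names; the overflow clamping a real conversion needs is omitted.

```c
#include <assert.h>

#define SKETCH_HZ           250UL	/* assumed tick rate */
#define SKETCH_USEC_PER_SEC 1000000UL
#define SKETCH_USEC_PER_JIFFY (SKETCH_USEC_PER_SEC / SKETCH_HZ)

static_assert(SKETCH_USEC_PER_SEC % SKETCH_HZ == 0,
	      "sketch assumes HZ divides 1000000");

/* Inline helper: with a constant argument the compiler can fold the
 * whole expression at build time. Rounds up to the next jiffy. */
static inline unsigned long sketch_usecs_to_jiffies_inline(unsigned int u)
{
	return (u + SKETCH_USEC_PER_JIFFY - 1) / SKETCH_USEC_PER_JIFFY;
}

/* Out-of-line version, standing in for the renamed original function. */
unsigned long sketch_usecs_to_jiffies_slow(unsigned int u)
{
	return sketch_usecs_to_jiffies_inline(u);
}

/* Dispatcher: constant arguments take the foldable inline path,
 * everything else calls the out-of-line function. */
static inline unsigned long sketch_usecs_to_jiffies(unsigned int u)
{
	if (__builtin_constant_p(u))
		return sketch_usecs_to_jiffies_inline(u);
	return sketch_usecs_to_jiffies_slow(u);
}
```

Both paths return the same value; the only difference is whether the division happens at compile time or at run time.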

time: Refactor usecs_to_jiffies · ae60d6a0

Committed by Nicholas Mc Guire on May 28, 2015

Refactor the usecs_to_jiffies conditional code part in time.c and
jiffies.h putting it into conditional functions rather than #ifdefs
to improve readability. This is analogous to the msecs_to_jiffies()
cleanup in commit ca42aaf0 ("time: Refactor msecs_to_jiffies")
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Joe Perches <joe@perches.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Paul Turner <pjt@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Link: http://lkml.kernel.org/r/1432832996-12129-1-git-send-email-hofrat@osadl.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

08 Jun, 2015 1 commit

hrtimers: Make sure hrtimer_resolution is unsigned int · d711b8b3

Committed by Borislav Petkov on Jun 06, 2015

... in the !CONFIG_HIGH_RES_TIMERS case too. And thus fix warnings like
this one:

net/sched/sch_api.c: In function ‘psched_show’:
net/sched/sch_api.c:1891:6: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 6 has type ‘long int’ [-Wformat=]
      (u32)NSEC_PER_SEC / hrtimer_resolution);
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1433583000-32090-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>

02 Jun, 2015 16 commits

Merge branch 'clockevents/4.2' of... · 09cbbf0c

Committed by Thomas Gleixner on Jun 02, 2015

Merge branch 'clockevents/4.2' of http://git.linaro.org/people/daniel.lezcano/linux into timers/core

Pull clockevents/clocksource changes from Daniel Lezcano:

  - Removed dead code in the files related to mach-msm for qcom (Stephen Boyd)
  - Cleaned up code for exynos_mct (Krzysztof Kozlowski)
  - Added the new timer lpc3220 (Joachim Eastwood)
  - Added the new timer STM32 and ARM system timer (Maxime Coquelin)

09cbbf0c

clockevents: Rename state to state_use_accessors · be3ef76e

Committed by Thomas Gleixner on Jun 02, 2015

The only sensible way to make abuse of core internal fields obvious
and easy to grep for.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>

be3ef76e

clockevents: Use set/get state helper functions · 051ebd10

Committed by Thomas Gleixner on Jun 02, 2015

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>

051ebd10

clockevents: Provide functions to set and get the state · d7eb231c

Committed by Thomas Gleixner on Jun 02, 2015

We want to rename dev->state, so provide proper get and set
functions. Rename clockevents_set_state() to
clockevents_switch_state() to avoid confusion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>

d7eb231c

clockevents: Use helpers to check the state of a clockevent device · 472c4a94

Committed by Viresh Kumar on May 21, 2015

Use accessor functions to check the state of clockevent devices in
core code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/fa2b9869fd17f210eaa156ec2b594efd0230b6c7.1432192527.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

472c4a94

clockevents: Add helpers to check the state of a clockevent device · 3434d23b

Committed by Viresh Kumar on May 21, 2015

Some clockevent drivers, once migrated to use per-state callbacks,
need to check the state of the clockevent device in their callbacks or
interrupt handler.

Add accessor functions clockevent_state_*() to get this information.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/04a717d490335c688dd7af899fbcede97e1bb8ee.1432192527.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

3434d23b
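
The accessor approach taken in this series can be sketched in plain C. This is a userspace illustration, not the kernel header: all `sketch_` names are hypothetical, the set of states is reduced, and the driver-side helper at the end is an invented example of a caller. The state field carries the deliberately awkward name state_use_accessors so that any code touching it directly is easy to find with grep.

```c
#include <stdbool.h>

/* Reduced set of device states for this sketch. */
enum sketch_clock_event_state {
	SKETCH_STATE_DETACHED,
	SKETCH_STATE_SHUTDOWN,
	SKETCH_STATE_PERIODIC,
	SKETCH_STATE_ONESHOT,
};

struct sketch_clock_event_device {
	/* Awkward name on purpose: direct users stand out in a grep. */
	enum sketch_clock_event_state state_use_accessors;
};

/* Read-only accessors, one per state a driver may need to test. */
static inline bool sketch_state_periodic(const struct sketch_clock_event_device *dev)
{
	return dev->state_use_accessors == SKETCH_STATE_PERIODIC;
}

static inline bool sketch_state_oneshot(const struct sketch_clock_event_device *dev)
{
	return dev->state_use_accessors == SKETCH_STATE_ONESHOT;
}

/* Hypothetical interrupt-path check: only a oneshot timer needs stopping. */
bool sketch_irq_needs_stop(const struct sketch_clock_event_device *dev)
{
	return sketch_state_oneshot(dev);
}
```

A driver written this way never names the field itself, so renaming or restructuring the field later only affects the accessors.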

clockevents/drivers/timer-stm32: Fix build warning spotted by kbuild test robot · d4688bdc

Committed by Maxime Coquelin on May 28, 2015

This patch fixes the warning below, spotted by the kbuild test robot when
building with ARCH=powerpc:

   drivers/clocksource/timer-stm32.c: In function 'stm32_clockevent_init':
>> drivers/clocksource/timer-stm32.c:140:9: warning: large integer implicitly
	truncated to unsigned type [-Woverflow]

     writel_relaxed(~0UL, data->base + TIM_ARR);

The fix consists of using ~0U instead of ~0UL.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

d4688bdc
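
A userspace sketch of the truncation issue follows. `sketch_writel()` is a hypothetical stand-in for writel_relaxed() and the register is simulated by a local variable; the sketch assumes a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Stand-in for writel_relaxed(): the value parameter is 32 bits wide. */
static void sketch_writel(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

uint32_t sketch_set_max_reload(void)
{
	volatile uint32_t reg = 0;   /* simulated auto-reload register */

	/*
	 * ~0UL is 64 bits wide where unsigned long is 64 bits, so passing
	 * it here would be implicitly truncated (hence -Woverflow). ~0U
	 * already matches the 32-bit parameter, so nothing is truncated.
	 */
	sketch_writel(~0U, &reg);
	return reg;
}
```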

clockevents/drivers: Add STM32 Timer driver · e37e4593

Committed by Maxime Coquelin on May 22, 2015

STM32 MCUs feature 16-bit and 32-bit general-purpose timers with prescalers.
The driver detects whether the timer is 16 or 32 bits wide, and applies a
prescaler value of 1024 if it is 16 bits.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

e37e4593
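
One way such a width check can work is sketched below against a simulated register. The commit message does not say how the driver performs the detection, so the write-all-ones-and-read-back probe, the `sketch_` names, and the prescaler of 1 for the 32-bit case are all assumptions; only the 1024 prescaler for 16-bit timers comes from the message above.

```c
#include <stdint.h>

struct sketch_timer {
	uint32_t width_mask;  /* 0xFFFF for a 16-bit counter, 0xFFFFFFFF for 32-bit */
	uint32_t arr;         /* simulated auto-reload register */
};

/* The simulated hardware drops bits that the counter does not implement. */
static void sketch_write_arr(struct sketch_timer *t, uint32_t val)
{
	t->arr = val & t->width_mask;
}

/* Returns the prescaler to apply and stores the counter width in *bits. */
unsigned int sketch_detect_width(struct sketch_timer *t, unsigned int *bits)
{
	sketch_write_arr(t, UINT32_MAX);
	if (t->arr == UINT32_MAX) {
		*bits = 32;
		return 1;      /* assumed: no extra prescaling needed */
	}
	*bits = 16;
	return 1024;           /* from the commit message: 16-bit timers get 1024 */
}
```

The larger prescaler slows the input clock so that a 16-bit counter still covers a usable time span before it wraps.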

dt-bindings: Document the STM32 timer bindings · 4853914f

Committed by Maxime Coquelin on May 22, 2015

This adds documentation of device tree bindings for the
STM32 timer.

Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

4853914f

clocksource/drivers/armv7m_systick: Add ARM System timer driver · 4958ebb3

Committed by Maxime Coquelin on May 09, 2015

This patch adds clocksource support for ARMv7-M's System timer,
also known as SysTick.

Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

4958ebb3

dt-bindings: Document the ARM System timer bindings · 571fc8e8

Committed by Maxime Coquelin on May 09, 2015

This adds documentation of device tree bindings for the
ARM System timer.

Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

571fc8e8

doc: dt: Add documentation for lpc3220-timer · 5fc9b49d

Committed by Joachim Eastwood on May 12, 2015

Add DT bindings documentation for lpc3220-timer. This timer is
used as clocksource on many NXP platforms.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>

5fc9b49d

clocksource/drivers/lpc32xx: Add the lpc32xx timer driver · 050dd322

Committed by Joachim Eastwood on May 12, 2015

Add support for using the NXP LPC timer as clocksource and clock
event. These timers are present on many NXP devices including
LPC32xx, LPC17xx, LPC18xx and LPC43xx.

The timer has a 32-bit timer counter register with a programmable
32-bit prescaler. It supports up to 4 compare match values with
interrupt generation and reset/stop timer counter action.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Acked-by: Arnd Bergmann <arnd@arndb.de>

050dd322

clocksource/drivers/exynos_mct: Remove old platform mct_init() · 65ec7b27

Committed by Krzysztof Kozlowski on Apr 30, 2015

Since commit 228e3023 ("Merge tag 'mct-exynos-for-v3.10' of ...")
mct_init() has been superseded by mct_init_dt() and is not referenced
anywhere. Remove it.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

65ec7b27

clocksource/drivers/exynos_mct: Staticize struct clocksource · 6c10bf63

Committed by Krzysztof Kozlowski on Apr 30, 2015

The struct clocksource 'mct_frc' is neither exported nor used outside
this file, so make it static.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

6c10bf63

clocksource/drivers/exynos_mct: Change exynos4_mct_tick_clear return type to void · 37285674

Committed by Krzysztof Kozlowski on Apr 30, 2015

The return value of exynos4_mct_tick_clear() was never checked, so it
can be safely changed to void.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

37285674

openanolis / cloud-kernel synced successfully more than 1 year ago