- 13 9月, 2014 4 次提交
-
-
由 Richard Larocque 提交于
Locks the k_itimer's it_lock member when handling the alarm timer's expiry callback. The regular posix timers defined in posix-timers.c have this lock held during timout processing because their callbacks are routed through posix_timer_fn(). The alarm timers follow a different path, so they ought to grab the lock somewhere else. Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by: NRichard Larocque <rlarocque@google.com> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Richard Larocque 提交于
Avoids sending a signal to alarm timers created with sigev_notify set to SIGEV_NONE by checking for that special case in the timeout callback. The regular posix timers avoid sending signals to SIGEV_NONE timers by not scheduling any callbacks for them in the first place. Although it would be possible to do something similar for alarm timers, it's simpler to handle this as a special case in the timeout. Prior to this patch, the alarm timer would ignore the sigev_notify value and try to deliver signals to the process anyway. Even worse, the sanity check for the value of sigev_signo is skipped when SIGEV_NONE was specified, so the signal number could be bogus. If sigev_signo was an unitialized value (as it often would be if SIGEV_NONE is used), then it's hard to predict which signal will be sent. Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by: NRichard Larocque <rlarocque@google.com> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Richard Larocque 提交于
Returns the time remaining for an alarm timer, rather than the time at which it is scheduled to expire. If the timer has already expired or it is not currently scheduled, the it_value's members are set to zero. This new behavior matches that of the other posix-timers and the POSIX specifications. This is a change in user-visible behavior, and may break existing applications. Hopefully, few users rely on the old incorrect behavior. Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by: NRichard Larocque <rlarocque@google.com> [jstultz: minor style tweak] Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Andrew Hunter 提交于
timeval_to_jiffies tried to round a timeval up to an integral number of jiffies, but the logic for doing so was incorrect: intervals corresponding to exactly N jiffies would become N+1. This manifested itself particularly repeatedly stopping/starting an itimer: setitimer(ITIMER_PROF, &val, NULL); setitimer(ITIMER_PROF, NULL, &val); would add a full tick to val, _even if it was exactly representable in terms of jiffies_ (say, the result of a previous rounding.) Doing this repeatedly would cause unbounded growth in val. So fix the math. Here's what was wrong with the conversion: we essentially computed (eliding seconds) jiffies = usec * (NSEC_PER_USEC/TICK_NSEC) by using scaling arithmetic, which took the best approximation of NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC = x/(2^USEC_JIFFIE_SC), and computed: jiffies = (usec * x) >> USEC_JIFFIE_SC and rounded this calculation up in the intermediate form (since we can't necessarily exactly represent TICK_NSEC in usec.) But the scaling arithmetic is a (very slight) *over*approximation of the true value; that is, instead of dividing by (1 usec/ 1 jiffie), we effectively divided by (1 usec/1 jiffie)-epsilon (rounding down). This would normally be fine, but we want to round timeouts up, and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this would be fine if our division was exact, but dividing this by the slightly smaller factor was equivalent to adding just _over_ 1 to the final result (instead of just _under_ 1, as desired.) In particular, with HZ=1000, we consistently computed that 10000 usec was 11 jiffies; the same was true for any exact multiple of TICK_NSEC. We could possibly still round in the intermediate form, adding something less than 2^USEC_JIFFIE_SC - 1, but easier still is to convert usec->nsec, round in nanoseconds, and then convert using time*spec*_to_jiffies. This adds one constant multiplication, and is not observably slower in microbenchmarks on recent x86 hardware. Tested: the following program: int main() { struct itimerval zero = {{0, 0}, {0, 0}}; /* Initially set to 10 ms. */ struct itimerval initial = zero; initial.it_interval.tv_usec = 10000; setitimer(ITIMER_PROF, &initial, NULL); /* Save and restore several times. */ for (size_t i = 0; i < 10; ++i) { struct itimerval prev; setitimer(ITIMER_PROF, &zero, &prev); /* on old kernels, this goes up by TICK_USEC every iteration */ printf("previous value: %ld %ld %ld %ld\n", prev.it_interval.tv_sec, prev.it_interval.tv_usec, prev.it_value.tv_sec, prev.it_value.tv_usec); setitimer(ITIMER_PROF, &prev, NULL); } return 0; } Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Turner <pjt@google.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Reviewed-by: NPaul Turner <pjt@google.com> Reported-by: NAaron Jacobs <jacobsa@google.com> Signed-off-by: NAndrew Hunter <ahh@google.com> [jstultz: Tweaked to apply to 3.17-rc] Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
- 06 9月, 2014 1 次提交
-
-
由 Thomas Gleixner 提交于
The update_walltime() code works on the shadow timekeeper to make the seqcount protected region as short as possible. But that update to the shadow timekeeper does not update all timekeeper fields because it's sufficient to do that once before it becomes life. One of these fields is tkr.base_mono. That stays stale in the shadow timekeeper unless an operation happens which copies the real timekeeper to the shadow. The update function is called after the update calls to vsyscall and pvclock. While not correct, it did not cause any problems because none of the invoked update functions used base_mono. commit cbcf2dd3 (x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based) changed that in the kvm pvclock update function, so the stale mono_base value got used and caused kvm-clock to malfunction. Put the update where it belongs and fix the issue. Reported-by: NChris J Arges <chris.j.arges@canonical.com> Reported-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409050000570.3333@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 05 9月, 2014 1 次提交
-
-
由 Frederic Weisbecker 提交于
The local nohz kick is currently used by perf which needs it to be NMI-safe. Recent commit though (7d1311b9) changed its implementation to fire the local kick using the remote kick API. It was convenient to make the code more generic but the remote kick isn't NMI-safe. As a result: WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140() CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34 0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37 0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180 0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000 Call Trace: <NMI> [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a [<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0 [<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20 [<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140 [<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90 [<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350 [<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0 [<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150 [<ffffffff9a187934>] perf_event_overflow+0x14/0x20 [<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410 [<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130 [<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50 [<ffffffff9a007b72>] nmi_handle+0xd2/0x390 [<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390 [<ffffffff9a0d131b>] ? lock_release+0xab/0x330 [<ffffffff9a008062>] default_do_nmi+0x72/0x1c0 [<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200 [<ffffffff9a008268>] do_nmi+0xb8/0x100 Lets fix this by restoring the use of local irq work for the nohz local kick. Reported-by: NCatalin Iacob <iacobcatalin@gmail.com> Reported-and-tested-by: NDave Jones <davej@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
- 15 8月, 2014 1 次提交
-
-
由 John Stultz 提交于
Benjamin Herrenschmidt pointed out that I further missed modifying update_vsyscall after the wall_to_mono value was changed to a timespec64. This causes issues on powerpc32, which expects a 32bit timespec. This patch fixes the problem by properly converting from a timespec64 to a timespec before passing the value on to the arch-specific vsyscall logic. [ Thomas is currently on vacation, but reviewed it and wanted me to send this fix on to you directly. ] Cc: LKML <linux-kernel@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reported-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 8月, 2014 1 次提交
-
-
由 Jan Kara 提交于
clockevents_increase_min_delta() calls printk() from under hrtimer_bases.lock. That causes lock inversion on scheduler locks because printk() can call into the scheduler. Lockdep puts it as: ====================================================== [ INFO: possible circular locking dependency detected ] 3.15.0-rc8-06195-g939f04be #2 Not tainted ------------------------------------------------------- trinity-main/74 is trying to acquire lock: (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c but task is already holding lock: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #5 (hrtimer_bases.lock){-.-...}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197 [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85 [<81080792>] task_clock_event_start+0x3a/0x3f [<810807a4>] task_clock_event_add+0xd/0x14 [<8108259a>] event_sched_in+0xb6/0x17a [<810826a2>] group_sched_in+0x44/0x122 [<81082885>] ctx_sched_in.isra.67+0x105/0x11f [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b [<81082bf6>] __perf_install_in_context+0x8b/0xa3 [<8107eb8e>] remote_function+0x12/0x2a [<8105f5af>] smp_call_function_single+0x2d/0x53 [<8107e17d>] task_function_call+0x30/0x36 [<8107fb82>] perf_install_in_context+0x87/0xbb [<810852c9>] SYSC_perf_event_open+0x5c6/0x701 [<810856f9>] SyS_perf_event_open+0x17/0x19 [<8142f8ee>] syscall_call+0x7/0xb -> #4 (&ctx->lock){......}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f04c>] _raw_spin_lock+0x21/0x30 [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f [<8142cacc>] __schedule+0x4c6/0x4cb [<8142cae0>] schedule+0xf/0x11 [<8142f9a6>] work_resched+0x5/0x30 -> #3 (&rq->lock){-.-.-.}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f04c>] _raw_spin_lock+0x21/0x30 [<81040873>] __task_rq_lock+0x33/0x3a [<8104184c>] wake_up_new_task+0x25/0xc2 [<8102474b>] do_fork+0x15c/0x2a0 [<810248a9>] kernel_thread+0x1a/0x1f [<814232a2>] rest_init+0x1a/0x10e [<817af949>] start_kernel+0x303/0x308 [<817af2ab>] i386_start_kernel+0x79/0x7d -> #2 (&p->pi_lock){-.-...}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<810413dd>] try_to_wake_up+0x1d/0xd6 [<810414cd>] default_wake_function+0xb/0xd [<810461f3>] __wake_up_common+0x39/0x59 [<81046346>] __wake_up+0x29/0x3b [<811b8733>] tty_wakeup+0x49/0x51 [<811c3568>] uart_write_wakeup+0x17/0x19 [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb [<811c5f28>] serial8250_handle_irq+0x54/0x6a [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c [<811c56d8>] serial8250_interrupt+0x38/0x9e [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2 [<81051296>] handle_irq_event+0x2c/0x43 [<81052cee>] handle_level_irq+0x57/0x80 [<81002a72>] handle_irq+0x46/0x5c [<810027df>] do_IRQ+0x32/0x89 [<8143036e>] common_interrupt+0x2e/0x33 [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49 [<811c25a4>] uart_start+0x2d/0x32 [<811c2c04>] uart_write+0xc7/0xd6 [<811bc6f6>] n_tty_write+0xb8/0x35e [<811b9beb>] tty_write+0x163/0x1e4 [<811b9cd9>] redirected_tty_write+0x6d/0x75 [<810b6ed6>] vfs_write+0x75/0xb0 [<810b7265>] SyS_write+0x44/0x77 [<8142f8ee>] syscall_call+0x7/0xb -> #1 (&tty->write_wait){-.....}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<81046332>] __wake_up+0x15/0x3b [<811b8733>] tty_wakeup+0x49/0x51 [<811c3568>] uart_write_wakeup+0x17/0x19 [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb [<811c5f28>] serial8250_handle_irq+0x54/0x6a [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c [<811c56d8>] serial8250_interrupt+0x38/0x9e [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2 [<81051296>] handle_irq_event+0x2c/0x43 [<81052cee>] handle_level_irq+0x57/0x80 [<81002a72>] handle_irq+0x46/0x5c [<810027df>] do_IRQ+0x32/0x89 [<8143036e>] common_interrupt+0x2e/0x33 [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49 [<811c25a4>] uart_start+0x2d/0x32 [<811c2c04>] uart_write+0xc7/0xd6 [<811bc6f6>] n_tty_write+0xb8/0x35e [<811b9beb>] tty_write+0x163/0x1e4 [<811b9cd9>] redirected_tty_write+0x6d/0x75 [<810b6ed6>] vfs_write+0x75/0xb0 [<810b7265>] SyS_write+0x44/0x77 [<8142f8ee>] syscall_call+0x7/0xb -> #0 (&port_lock_key){-.....}: [<8104a62d>] __lock_acquire+0x9ea/0xc6d [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<811c60be>] serial8250_console_write+0x8c/0x10c [<8104e402>] call_console_drivers.constprop.31+0x87/0x118 [<8104f5d5>] console_unlock+0x1d7/0x398 [<8104fb70>] vprintk_emit+0x3da/0x3e4 [<81425f76>] printk+0x17/0x19 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116 [<8105c548>] clockevents_program_event+0xe7/0xf3 [<8105cc1c>] tick_program_event+0x1e/0x23 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f [<8103c49e>] __remove_hrtimer+0x5b/0x79 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66 [<8103cb4b>] hrtimer_cancel+0xd/0x18 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30 [<81080705>] task_clock_event_stop+0x20/0x64 [<81080756>] task_clock_event_del+0xd/0xf [<81081350>] event_sched_out+0xab/0x11e [<810813e0>] group_sched_out+0x1d/0x66 [<81081682>] ctx_sched_out+0xaf/0xbf [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f [<8142cacc>] __schedule+0x4c6/0x4cb [<8142cae0>] schedule+0xf/0x11 [<8142f9a6>] work_resched+0x5/0x30 other info that might help us debug this: Chain exists of: &port_lock_key --> &ctx->lock --> hrtimer_bases.lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(hrtimer_bases.lock); lock(&ctx->lock); lock(hrtimer_bases.lock); lock(&port_lock_key); *** DEADLOCK *** 4 locks held by trinity-main/74: #0: (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb #1: (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f #2: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66 #3: (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4 stack backtrace: CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04be #2 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003 Call Trace: [<81426f69>] dump_stack+0x16/0x18 [<81425a99>] print_circular_bug+0x18f/0x19c [<8104a62d>] __lock_acquire+0x9ea/0xc6d [<8104a942>] lock_acquire+0x92/0x101 [<811c60be>] ? serial8250_console_write+0x8c/0x10c [<811c6032>] ? wait_for_xmitr+0x76/0x76 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<811c60be>] ? serial8250_console_write+0x8c/0x10c [<811c60be>] serial8250_console_write+0x8c/0x10c [<8104af87>] ? lock_release+0x191/0x223 [<811c6032>] ? wait_for_xmitr+0x76/0x76 [<8104e402>] call_console_drivers.constprop.31+0x87/0x118 [<8104f5d5>] console_unlock+0x1d7/0x398 [<8104fb70>] vprintk_emit+0x3da/0x3e4 [<81425f76>] printk+0x17/0x19 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116 [<8105cc1c>] tick_program_event+0x1e/0x23 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f [<8103c49e>] __remove_hrtimer+0x5b/0x79 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66 [<8103cb4b>] hrtimer_cancel+0xd/0x18 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30 [<81080705>] task_clock_event_stop+0x20/0x64 [<81080756>] task_clock_event_del+0xd/0xf [<81081350>] event_sched_out+0xab/0x11e [<810813e0>] group_sched_out+0x1d/0x66 [<81081682>] ctx_sched_out+0xaf/0xbf [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f [<8104416d>] ? __dequeue_entity+0x23/0x27 [<81044505>] ? pick_next_task_fair+0xb1/0x120 [<8142cacc>] __schedule+0x4c6/0x4cb [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108 [<810475b0>] ? trace_hardirqs_off+0xb/0xd [<81056346>] ? rcu_irq_exit+0x64/0x77 Fix the problem by using printk_deferred() which does not call into the scheduler. Reported-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NJan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 24 7月, 2014 32 次提交
-
-
由 Stephen Boyd 提交于
During suspend we call sched_clock_poll() to update the epoch and accumulated time and reprogram the sched_clock_timer to fire before the next wrap-around time. Unfortunately, sched_clock_poll() doesn't restart the timer, instead it relies on the hrtimer layer to do that and during suspend we aren't calling that function from the hrtimer layer. Instead, we're reprogramming the expires time while the hrtimer is enqueued, which can cause the hrtimer tree to be corrupted. Furthermore, we restart the timer during suspend but we update the epoch during resume which seems counter-intuitive. Let's fix this by saving the accumulated state and canceling the timer during suspend. On resume we can update the epoch and restart the timer similar to what we would do if we were starting the clock for the first time. Fixes: a08ca5d1 "sched_clock: Use an hrtimer instead of timer" Signed-off-by: NStephen Boyd <sboyd@codeaurora.org> Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1406174630-23458-1-git-send-email-john.stultz@linaro.org Cc: Ingo Molnar <mingo@kernel.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 John Stultz 提交于
By caching the ntp_tick_length() when we correct the frequency error, and then using that cached value to accumulate error, we avoid large initial errors when the tick length is changed. This makes convergence happen much faster in the simulator, since the initial error doesn't have to be slowly whittled away. This initially seems like an accounting error, but Miroslav pointed out that ntp_tick_length() can change mid-tick, so when we apply it in the error accumulation, we are applying any recent change to the entire tick. This approach chooses to apply changes in the ntp_tick_length() only to the next tick, which allows us to calculate the freq correction before using the new tick length, which avoids accummulating error. Credit to Miroslav for pointing this out and providing the original patch this functionality has been pulled out from, along with the rational. Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Reported-by: NMiroslav Lichvar <mlichvar@redhat.com> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 John Stultz 提交于
The existing timekeeping_adjust logic has always been complicated to understand. Further, since it was developed prior to NOHZ becoming common, its not surprising it performs poorly when NOHZ is enabled. Since Miroslav pointed out the problematic nature of the existing code in the NOHZ case, I've tried to refactor the code to perform better. The problem with the previous approach was that it tried to adjust for the total cumulative error using a scaled dampening factor. This resulted in large errors to be corrected slowly, while small errors were corrected quickly. With NOHZ the timekeeping code doesn't know how far out the next tick will be, so this results in bad over-correction to small errors, and insufficient correction to large errors. Inspired by Miroslav's patch, I've refactored the code to try to address the correction in two steps. 1) Check the future freq error for the next tick, and if the frequency error is large, try to make sure we correct it so it doesn't cause much accumulated error. 2) Then make a small single unit adjustment to correct any cumulative error that has collected over time. This method performs fairly well in the simulator Miroslav created. Major credit to Miroslav for pointing out the issue, providing the original patch to resolve this, a simulator for testing, as well as helping debug and resolve issues in my implementation so that it performed closer to his original implementation. Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Reported-by: NMiroslav Lichvar <mlichvar@redhat.com> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 John Stultz 提交于
In the GENERIC_TIME_VSYSCALL_OLD update_vsyscall implementation, we take the tk_xtime() value, which returns a timespec64, and store it in a timespec. This luckily is ok, since the only architectures that use GENERIC_TIME_VSYSCALL_OLD are ia64 and ppc64, which are both 64 bit systems where timespec64 is the same as a timespec. Even so, for cleanliness reasons, use the conversion function to assign the proper type. Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
Tracers want a correlated time between the kernel instrumentation and user space. We really do not want to export sched_clock() to user space, so we need to provide something sensible for this. Using separate data structures with an non blocking sequence count based update mechanism allows us to do that. The data structure required for the readout has a sequence counter and two copies of the timekeeping data. On the update side: smp_wmb(); tkf->seq++; smp_wmb(); update(tkf->base[0], tk); smp_wmb(); tkf->seq++; smp_wmb(); update(tkf->base[1], tk); On the reader side: do { seq = tkf->seq; smp_rmb(); idx = seq & 0x01; now = now(tkf->base[idx]); smp_rmb(); } while (seq != tkf->seq) So if a NMI hits the update of base[0] it will use base[1] which is still consistent, but this timestamp is not guaranteed to be monotonic across an update. The timestamp is calculated by: now = base_mono + clock_delta * slope So if the update lowers the slope, readers who are forced to the not yet updated second array are still using the old steeper slope. tmono ^ | o n | o n | u | o |o |12345678---> reader order o = old slope u = update n = new slope So reader 6 will observe time going backwards versus reader 5. While other CPUs are likely to be able observe that, the only way for a CPU local observation is when an NMI hits in the middle of the update. Timestamps taken from that NMI context might be ahead of the following timestamps. Callers need to be aware of that and deal with it. V2: Got rid of clock monotonic raw and reorganized the data structures. Folded in the barrier fix from Mathieu. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
All the function needs is in the tk_read_base struct. No functional change for the current code, just a preparatory patch for the NMI safe accessor to clock monotonic which will use struct tk_read_base as well. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
The members of the new struct are the required ones for the new NMI safe accessor to clcok monotonic. In order to reuse the existing timekeeping code and to make the update of the fast NMI safe timekeepers a simple memcpy use the struct for the timekeeper as well and convert all users. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
Access to time requires to touch two cachelines at minimum 1) The timekeeper data structure 2) The clocksource data structure The access to the clocksource data structure can be avoided as almost all clocksource implementations ignore the argument to the read callback, which is a pointer to the clocksource. But the core needs to touch it to access the members @read and @mask. So we are better off by copying the @read function pointer and the @mask from the clocksource to the core data structure itself. For the most used ktime_get() access all required data including the @read and @mask copies fits together with the sequence counter into a single 64 byte cacheline. For the other time access functions we touch in the current code three cache lines in the worst case. But with the clocksource data copies we can reduce that to two adjacent cachelines, which is more efficient than disjunct cache lines. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
cycle_last was added to the clocksource to support the TSC validation. We moved that to the core code, so we can get rid of the extra copy. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
The only user of the cycle_last validation is the x86 TSC. In order to provide NMI safe accessor functions for clock monotonic and monotonic_raw we need to do that in the core. We can't do the TSC specific if (now < cycle_last) now = cycle_last; for the other wrapping around clocksources, but TSC has CLOCKSOURCE_MASK(64) which actually does not mask out anything so if now is less than cycle_last the subtraction will give a negative result. So we can check for that in clocksource_delta() and return 0 for that case. Implement and enable it for x86 Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
We want to move the TSC sanity check into core code to make NMI safe accessors to clock monotonic[_raw] possible. For this we need to sanity check the delta calculation. Create a helper function and convert all sites to use it. [ Build fix from jstultz ] Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
Provide a ktime_t based interface for raw monotonic time. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
timekeeping_clocktai() is not used in fast pathes, so the extra timespec conversion is not problematic. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
No more users. Remove it Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
Subtracting plain nsec values and converting to timespec is simpler than the whole timespec math. Not really fastpath code, so the division is not an issue. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
get_monotonic_boottime() is not used in fast pathes, so the extra timespec conversion is not problematic. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
No more users. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
Required for moving drivers to the nanosecond based interfaces. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
No more users. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
ktime based conversion function to map a monotonic time stamp to a different CLOCK. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
No need to juggle with timespecs. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
No need to juggle with timespecs. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
Speed up the readout. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
Provide a helper function which lets us implement ktime_t based interfaces for real, boot and tai clocks. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
Speed up ktime_get() by using ktime_t based data. Text size shrinks by 64 bytes on x8664. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
The ktime_t based interfaces are used a lot in performance critical code pathes. Add ktime_t based data so the interfaces don't have to convert from the xtime/timespec based data. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
We already have a function which does the right thing, that also makes sure that the coming ktime_t based cached values are getting updated. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
struct timekeeper is quite badly sorted for the hot readout path. Most time access functions need to load two cache lines. Rearrange it so ktime_get() and getnstimeofday() are happy with a single cache line. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
No users outside of the core. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
To convert callers of the core code to timespec64 we need to provide the proper interfaces. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-