1. 24 7月, 2014 29 次提交
  2. 05 6月, 2014 1 次提交
    • J
      timekeeping: use printk_deferred when holding timekeeping seqlock · 6d9bcb62
      John Stultz 提交于
      Jiri Bohac pointed out that there are rare but potential deadlock
      possibilities when calling printk while holding the timekeeping
      seqlock.
      
      This is due to printk() triggering console sem wakeup, which can
      cause scheduling code to trigger hrtimers which may try to read
      the time.
      
      Specifically, as Jiri pointed out, that path is:
        printk
          vprintk_emit
            console_unlock
              up(&console_sem)
                __up
      	    wake_up_process
      	      try_to_wake_up
      	        ttwu_do_activate
      		  ttwu_activate
      		    activate_task
      		      enqueue_task
      		        enqueue_task_fair
      			  hrtick_update
      			    hrtick_start_fair
      			      hrtick_start_fair
      			        get_time
      				  ktime_get
      				    --> endless loop on
      				    read_seqcount_retry(&timekeeper_seq, ...)
      
      This patch tries to avoid this issue by using printk_deferred (previously
      named printk_sched) which should defer printing via a irq_work_queue.
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Reported-by: NJiri Bohac <jbohac@suse.cz>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d9bcb62
  3. 08 4月, 2014 1 次提交
  4. 28 3月, 2014 1 次提交
  5. 24 12月, 2013 8 次提交
    • J
      timekeeping: Remove comment that's mostly out of date · 38aef31c
      John Stultz 提交于
      Prior to 92bb1fcf (Only
      do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD
      systems), the comment here was accuate, but now we can
      mostly avoid the extra rounding which causes the unlikey
      to be actually likely here.
      
      So remove the out of date comment.
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      38aef31c
    • Y
      timekeeper: fix comment typo for tk_setup_internals() · d26e4fe0
      Yijing Wang 提交于
      Fix trivial comment typo for tk_setup_internals().
      Signed-off-by: NYijing Wang <wangyijing@huawei.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      d26e4fe0
    • J
      timekeeping: Fix missing timekeeping_update in suspend path · 330a1617
      John Stultz 提交于
      Since 48cdc135 (Implement a shadow timekeeper), we have to
      call timekeeping_update() after any adjustment to the timekeeping
      structure in order to make sure that any adjustments to the structure
      persist.
      
      In the timekeeping suspend path, we udpate the timekeeper
      structure, so we should be sure to update the shadow-timekeeper
      before releasing the timekeeping locks. Currently this isn't done.
      
      In most cases, the next time related code to run would be
      timekeeping_resume, which does update the shadow-timekeeper, but
      in an abundence of caution, this patch adds the call to
      timekeeping_update() in the suspend path.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable <stable@vger.kernel.org> #3.10+
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      330a1617
    • J
      timekeeping: Fix CLOCK_TAI timer/nanosleep delays · 04005f60
      John Stultz 提交于
      A think-o in the calculation of the monotonic -> tai time offset
      results in CLOCK_TAI timers and nanosleeps to expire late (the
      latency is ~2x the tai offset).
      
      Fix this by adding the tai offset from the realtime offset instead
      of subtracting.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable <stable@vger.kernel.org> #3.10+
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      04005f60
    • J
      tick/timekeeping: Call update_wall_time outside the jiffies lock · 47a1b796
      John Stultz 提交于
      Since the xtime lock was split into the timekeeping lock and
      the jiffies lock, we no longer need to call update_wall_time()
      while holding the jiffies lock.
      
      Thus, this patch splits update_wall_time() out from do_timer().
      
      This allows us to get away from calling clock_was_set_delayed()
      in update_wall_time() and instead use the standard clock_was_set()
      call that previously would deadlock, as it causes the jiffies lock
      to be acquired.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      47a1b796
    • J
      timekeeping: Avoid possible deadlock from clock_was_set_delayed · 6fdda9a9
      John Stultz 提交于
      As part of normal operaions, the hrtimer subsystem frequently calls
      into the timekeeping code, creating a locking order of
        hrtimer locks -> timekeeping locks
      
      clock_was_set_delayed() was suppoed to allow us to avoid deadlocks
      between the timekeeping the hrtimer subsystem, so that we could
      notify the hrtimer subsytem the time had changed while holding
      the timekeeping locks. This was done by scheduling delayed work
      that would run later once we were out of the timekeeing code.
      
      But unfortunately the lock chains are complex enoguh that in
      scheduling delayed work, we end up eventually trying to grab
      an hrtimer lock.
      
      Sasha Levin noticed this in testing when the new seqlock lockdep
      enablement triggered the following (somewhat abrieviated) message:
      
      [  251.100221] ======================================================
      [  251.100221] [ INFO: possible circular locking dependency detected ]
      [  251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
      [  251.101967] -------------------------------------------------------
      [  251.101967] kworker/10:1/4506 is trying to acquire lock:
      [  251.101967]  (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70
      [  251.101967]
      [  251.101967] but task is already holding lock:
      [  251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
      [  251.101967]
      [  251.101967] which lock already depends on the new lock.
      [  251.101967]
      [  251.101967]
      [  251.101967] the existing dependency chain (in reverse order) is:
      [  251.101967]
      -> #5 (hrtimer_bases.lock#11){-.-...}:
      [snipped]
      -> #4 (&rt_b->rt_runtime_lock){-.-...}:
      [snipped]
      -> #3 (&rq->lock){-.-.-.}:
      [snipped]
      -> #2 (&p->pi_lock){-.-.-.}:
      [snipped]
      -> #1 (&(&pool->lock)->rlock){-.-...}:
      [  251.101967]        [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
      [  251.101967]        [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
      [  251.101967]        [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
      [  251.101967]        [<ffffffff84398500>] _raw_spin_lock+0x40/0x80
      [  251.101967]        [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0
      [  251.101967]        [<ffffffff81154168>] queue_work_on+0x98/0x120
      [  251.101967]        [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
      [  251.101967]        [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160
      [  251.101967]        [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
      [  251.101967]        [<ffffffff843a4b49>] ia32_sysret+0x0/0x5
      [  251.101967]
      -> #0 (timekeeper_seq){----..}:
      [snipped]
      [  251.101967] other info that might help us debug this:
      [  251.101967]
      [  251.101967] Chain exists of:
        timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11
      
      [  251.101967]  Possible unsafe locking scenario:
      [  251.101967]
      [  251.101967]        CPU0                    CPU1
      [  251.101967]        ----                    ----
      [  251.101967]   lock(hrtimer_bases.lock#11);
      [  251.101967]                                lock(&rt_b->rt_runtime_lock);
      [  251.101967]                                lock(hrtimer_bases.lock#11);
      [  251.101967]   lock(timekeeper_seq);
      [  251.101967]
      [  251.101967]  *** DEADLOCK ***
      [  251.101967]
      [  251.101967] 3 locks held by kworker/10:1/4506:
      [  251.101967]  #0:  (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
      [  251.101967]  #1:  (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
      [  251.101967]  #2:  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
      [  251.101967]
      [  251.101967] stack backtrace:
      [  251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
      [  251.101967] Workqueue: events clock_was_set_work
      
      So the best solution is to avoid calling clock_was_set_delayed() while
      holding the timekeeping lock, and instead using a flag variable to
      decide if we should call clock_was_set() once we've released the locks.
      
      This works for the case here, where the do_adjtimex() was the deadlock
      trigger point. Unfortuantely, in update_wall_time() we still hold
      the jiffies lock, which would deadlock with the ipi triggered by
      clock_was_set(), preventing us from calling it even after we drop the
      timekeeping lock. So instead call clock_was_set_delayed() at that point.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: stable <stable@vger.kernel.org> #3.10+
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      6fdda9a9
    • J
      timekeeping: Fix potential lost pv notification of time change · 5258d3f2
      John Stultz 提交于
      In 780427f0 (Indicate that clock was set in the pvclock
      gtod notifier), logic was added to pass a CLOCK_WAS_SET
      notification to the pvclock notifier chain.
      
      While that patch added a action flag returned from
      accumulate_nsecs_to_secs(), it only uses the returned value
      in one location, and not in the logarithmic accumulation.
      
      This means if a leap second triggered during the logarithmic
      accumulation (which is most likely where it would happen),
      the notification that the clock was set would not make it to
      the pv notifiers.
      
      This patch extends the logarithmic_accumulation pass down
      that action flag so proper notification will occur.
      
      This patch also changes the varialbe action -> clock_set
      per Ingo's suggestion.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: <xen-devel@lists.xen.org>
      Cc: stable <stable@vger.kernel.org> #3.11+
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      5258d3f2
    • J
      timekeeping: Fix lost updates to tai adjustment · f55c0760
      John Stultz 提交于
      Since 48cdc135 (Implement a shadow timekeeper), we have to
      call timekeeping_update() after any adjustment to the timekeeping
      structure in order to make sure that any adjustments to the structure
      persist.
      
      Unfortunately, the updates to the tai offset via adjtimex do not
      trigger this update, causing adjustments to the tai offset to be
      made and then over-written by the previous value at the next
      update_wall_time() call.
      
      This patch resovles the issue by calling timekeeping_update()
      right after setting the tai offset.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable <stable@vger.kernel.org> #3.10+
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      f55c0760