1. 20 Feb 2014, 1 commit
    • sched_clock: Prevent callers from seeing half-updated data · 5ae8aabe
      Authored by Stephen Boyd
      The generic sched_clock registration function was previously
      done lockless, due to the fact that it was expected to be called
      only once. However, now there are systems that may register
      multiple sched_clock sources, for which the lack of locking has
      caused problems:
      
      If two sched_clock sources are registered we may end up in a
      situation where a call to sched_clock() may be accessing the
      epoch cycle count for the old counter and the cycle count for the
      new counter. This can lead to confusing results where
      sched_clock() values jump and then are reset to 0 (due to the way
      the registration function forces the epoch_ns to be 0).
      
      Fix this by reorganizing the registration function to hold the
      seqlock for as short a time as possible while we update the
      clock_data structure for a new counter. We also put any
      accumulated time into epoch_ns instead of resetting the time to
      0 so that the clock doesn't reset after each successful
      registration.
      
      [jstultz: Added extra context to the commit message]
      Reported-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Josh Cartwright <joshc@codeaurora.org>
      Link: http://lkml.kernel.org/r/1392662736-7803-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5ae8aabe
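
      Below is a minimal userspace C model of the fixed pattern (a sketch,
      not the kernel code: clock_data_model, register_clock_model and
      friends are illustrative names, and C11 atomics stand in for the
      kernel's seqcount). Readers retry when they race a writer, and the
      writer folds the time already accumulated on the old counter into
      epoch_ns instead of zeroing it.

        #include <stdatomic.h>
        #include <stdint.h>

        struct clock_data_model {
                uint64_t epoch_ns;              /* ns accumulated up to epoch_cyc */
                uint64_t epoch_cyc;             /* counter value at the epoch */
                uint64_t (*read)(void);         /* current counter value */
                uint32_t mult, shift;
        };

        static struct clock_data_model cd;
        static atomic_uint seq;                 /* even: stable, odd: write in flight */

        static uint64_t cyc_to_ns(uint64_t cyc, uint32_t mult, uint32_t shift)
        {
                return (cyc * mult) >> shift;
        }

        uint64_t sched_clock_model(void)
        {
                uint64_t ns;
                unsigned int s;

                do {    /* seqcount-style read loop: retry on concurrent update */
                        s = atomic_load_explicit(&seq, memory_order_acquire);
                        ns = cd.epoch_ns +
                             cyc_to_ns(cd.read() - cd.epoch_cyc, cd.mult, cd.shift);
                } while ((s & 1) || s != atomic_load_explicit(&seq, memory_order_acquire));

                return ns;
        }

        void register_clock_model(uint64_t (*read)(void), uint32_t mult, uint32_t shift)
        {
                /* compute outside the critical section, keep the write side short */
                uint64_t ns = cd.read ? cd.epoch_ns +
                        cyc_to_ns(cd.read() - cd.epoch_cyc, cd.mult, cd.shift) : 0;
                uint64_t cyc = read();

                atomic_fetch_add_explicit(&seq, 1, memory_order_release); /* -> odd */
                cd.epoch_ns  = ns;      /* carry accumulated time, don't reset to 0 */
                cd.epoch_cyc = cyc;
                cd.read      = read;
                cd.mult      = mult;
                cd.shift     = shift;
                atomic_fetch_add_explicit(&seq, 1, memory_order_release); /* -> even */
        }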
  2. 14 Feb 2014, 1 commit
    • tick: Clear broadcast pending bit when switching to oneshot · dd5fd9b9
      Authored by Thomas Gleixner
      AMD systems which use the C1E workaround in the amd_e400_idle routine
      trigger the WARN_ON_ONCE in the broadcast code when onlining a CPU.
      
      The reason is that the idle routine of those AMD systems switches the
      cpu into forced broadcast mode early on before the newly brought up
      CPU can switch over to high resolution / NOHZ mode. The timer related
      CPU1 bringup looks like this:
      
        clockevent_register_device(local_apic);
        tick_setup(local_apic);
        ...
        idle()
      	tick_broadcast_on_off(FORCE);
      	tick_broadcast_oneshot_control(ENTER)
      	  cpumask_set(cpu, broadcast_oneshot_mask);
      	halt();
      
      Now the broadcast interrupt on CPU0 sets CPU1 in the
      broadcast_pending_mask and wakes CPU1. So CPU1 continues:
      
      	local_apic_timer_interrupt()
      	   tick_handle_periodic();
      	   softirq()
      	     tick_init_highres();
      	       cpumask_clr(cpu, broadcast_oneshot_mask);
      	
      	tick_broadcast_oneshot_control(ENTER)
      	   WARN_ON(cpumask_test(cpu, broadcast_pending_mask));
      
      So while we remove CPU1 from the broadcast_oneshot_mask when we switch
      over to highres mode, we do not clear the pending bit, which then
      triggers the warning when we go back to idle.
      
      The reason why this is only visible on C1E affected AMD systems is
      that the other machines enter the deep sleep states via
      acpi_idle/intel_idle and exit the broadcast mode before executing the
      remote triggered local_apic_timer_interrupt. So the pending bit is
      already cleared when the switch over to highres mode is clearing the
      oneshot mask.
      
      The solution is simple: Clear the pending bit together with the mask
      bit when we switch over to highres mode.
      
      Stanislaw came up independently with the same patch by enforcing the
      C1E workaround and debugging the fallout. I picked mine, because mine
      has a changelog :)
      Reported-by: poma <pomidorabelisima@gmail.com>
      Debugged-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Olaf Hering <olaf@aepfle.de>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Justin M. Forbes <jforbes@redhat.com>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402111434180.21991@ionos.tec.linutronix.de
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      dd5fd9b9
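
      A short sketch of the fix's idea, with plain bitmasks standing in for
      the kernel's cpumasks (names illustrative): when a CPU leaves
      broadcast oneshot mode, its pending bit must be cleared together with
      its oneshot-mask bit, or the stale pending bit trips the WARN_ON on
      the next idle entry.

        #include <stdint.h>

        static uint64_t broadcast_oneshot_mask;
        static uint64_t broadcast_pending_mask;

        void broadcast_oneshot_exit_model(int cpu)
        {
                uint64_t bit = 1ULL << cpu;

                broadcast_oneshot_mask &= ~bit;
                broadcast_pending_mask &= ~bit; /* the previously missing clear */
        }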
  3. 06 Feb 2014, 1 commit
  4. 16 Jan 2014, 3 commits
  5. 13 Jan 2014, 1 commit
    • sched/clock, x86: Use a static_key for sched_clock_stable · 35af99e6
      Authored by Peter Zijlstra
      In order to avoid the runtime condition and variable load turn
      sched_clock_stable into a static_key.
      
      Also provide a shorter implementation of local_clock() and
      cpu_clock(int) when sched_clock_stable==1.
      
                              MAINLINE   PRE       POST
      
          sched_clock_stable: 1          1         1
          (cold) sched_clock: 329841     221876    215295
          (cold) local_clock: 301773     234692    220773
          (warm) sched_clock: 38375      25602     25659
          (warm) local_clock: 100371     33265     27242
          (warm) rdtsc:       27340      24214     24208
          sched_clock_stable: 0          0         0
          (cold) sched_clock: 382634     235941    237019
          (cold) local_clock: 396890     297017    294819
          (warm) sched_clock: 38194      25233     25609
          (warm) local_clock: 143452     71234     71232
          (warm) rdtsc:       27345      24245     24243
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      35af99e6
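
      A kernel-style sketch of the technique (function names are
      approximate, not the actual patch): the runtime sched_clock_stable
      test becomes a static key, so the common case costs a patched-in
      no-op rather than a memory load plus conditional branch.

        #include <linux/jump_label.h>
        #include <linux/sched.h>
        #include <linux/smp.h>

        static struct static_key sched_clock_stable_key = STATIC_KEY_INIT_FALSE;

        u64 local_clock_model(void)
        {
                if (static_key_false(&sched_clock_stable_key))
                        return sched_clock();   /* short, stable fast path */
                return sched_clock_cpu(raw_smp_processor_id());
        }

        void set_sched_clock_stable_model(void)
        {
                /* flips the key and patches all branch sites once */
                static_key_slow_inc(&sched_clock_stable_key);
        }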
  6. 12 Jan 2014, 1 commit
  7. 24 Dec 2013, 8 commits
    • timekeeping: Remove comment that's mostly out of date · 38aef31c
      Authored by John Stultz
      Prior to 92bb1fcf (Only
      do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD
      systems), the comment here was accurate, but now we can
      mostly avoid the extra rounding, which makes the unlikely()
      branch here actually likely.
      
      So remove the out of date comment.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      38aef31c
    • timekeeper: fix comment typo for tk_setup_internals() · d26e4fe0
      Authored by Yijing Wang
      Fix trivial comment typo for tk_setup_internals().
      Signed-off-by: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      d26e4fe0
    • timekeeping: Fix missing timekeeping_update in suspend path · 330a1617
      Authored by John Stultz
      Since 48cdc135 (Implement a shadow timekeeper), we have to
      call timekeeping_update() after any adjustment to the timekeeping
      structure in order to make sure that any adjustments to the structure
      persist.
      
      In the timekeeping suspend path, we update the timekeeper
      structure, so we should be sure to update the shadow-timekeeper
      before releasing the timekeeping locks. Currently this isn't done.
      
      In most cases, the next time related code to run would be
      timekeeping_resume, which does update the shadow-timekeeper, but
      in an abundance of caution, this patch adds the call to
      timekeeping_update() in the suspend path.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable <stable@vger.kernel.org> #3.10+
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      330a1617
    • timekeeping: Fix CLOCK_TAI timer/nanosleep delays · 04005f60
      Authored by John Stultz
      A think-o in the calculation of the monotonic -> tai time offset
      causes CLOCK_TAI timers and nanosleeps to expire late (the
      latency is ~2x the tai offset).
      
      Fix this by adding the tai offset to the realtime offset instead
      of subtracting it.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable <stable@vger.kernel.org> #3.10+
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      04005f60
    • tick/timekeeping: Call update_wall_time outside the jiffies lock · 47a1b796
      Authored by John Stultz
      Since the xtime lock was split into the timekeeping lock and
      the jiffies lock, we no longer need to call update_wall_time()
      while holding the jiffies lock.
      
      Thus, this patch splits update_wall_time() out from do_timer().
      
      This allows us to get away from calling clock_was_set_delayed()
      in update_wall_time() and instead use the standard clock_was_set()
      call that previously would deadlock, as it causes the jiffies lock
      to be acquired.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      47a1b796
    • timekeeping: Avoid possible deadlock from clock_was_set_delayed · 6fdda9a9
      Authored by John Stultz
      As part of normal operations, the hrtimer subsystem frequently calls
      into the timekeeping code, creating a locking order of
        hrtimer locks -> timekeeping locks
      
      clock_was_set_delayed() was supposed to allow us to avoid deadlocks
      between the timekeeping and hrtimer subsystems, so that we could
      notify the hrtimer subsystem that the time had changed while holding
      the timekeeping locks. This was done by scheduling delayed work
      that would run later, once we were out of the timekeeping code.
      
      But unfortunately the lock chains are complex enough that in
      scheduling delayed work, we end up eventually trying to grab
      an hrtimer lock.
      
      Sasha Levin noticed this in testing when the new seqlock lockdep
      enablement triggered the following (somewhat abbreviated) message:
      
      [  251.100221] ======================================================
      [  251.100221] [ INFO: possible circular locking dependency detected ]
      [  251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
      [  251.101967] -------------------------------------------------------
      [  251.101967] kworker/10:1/4506 is trying to acquire lock:
      [  251.101967]  (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70
      [  251.101967]
      [  251.101967] but task is already holding lock:
      [  251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
      [  251.101967]
      [  251.101967] which lock already depends on the new lock.
      [  251.101967]
      [  251.101967]
      [  251.101967] the existing dependency chain (in reverse order) is:
      [  251.101967]
      -> #5 (hrtimer_bases.lock#11){-.-...}:
      [snipped]
      -> #4 (&rt_b->rt_runtime_lock){-.-...}:
      [snipped]
      -> #3 (&rq->lock){-.-.-.}:
      [snipped]
      -> #2 (&p->pi_lock){-.-.-.}:
      [snipped]
      -> #1 (&(&pool->lock)->rlock){-.-...}:
      [  251.101967]        [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
      [  251.101967]        [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
      [  251.101967]        [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
      [  251.101967]        [<ffffffff84398500>] _raw_spin_lock+0x40/0x80
      [  251.101967]        [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0
      [  251.101967]        [<ffffffff81154168>] queue_work_on+0x98/0x120
      [  251.101967]        [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
      [  251.101967]        [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160
      [  251.101967]        [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
      [  251.101967]        [<ffffffff843a4b49>] ia32_sysret+0x0/0x5
      [  251.101967]
      -> #0 (timekeeper_seq){----..}:
      [snipped]
      [  251.101967] other info that might help us debug this:
      [  251.101967]
      [  251.101967] Chain exists of:
        timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11
      
      [  251.101967]  Possible unsafe locking scenario:
      [  251.101967]
      [  251.101967]        CPU0                    CPU1
      [  251.101967]        ----                    ----
      [  251.101967]   lock(hrtimer_bases.lock#11);
      [  251.101967]                                lock(&rt_b->rt_runtime_lock);
      [  251.101967]                                lock(hrtimer_bases.lock#11);
      [  251.101967]   lock(timekeeper_seq);
      [  251.101967]
      [  251.101967]  *** DEADLOCK ***
      [  251.101967]
      [  251.101967] 3 locks held by kworker/10:1/4506:
      [  251.101967]  #0:  (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
      [  251.101967]  #1:  (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
      [  251.101967]  #2:  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
      [  251.101967]
      [  251.101967] stack backtrace:
      [  251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
      [  251.101967] Workqueue: events clock_was_set_work
      
      So the best solution is to avoid calling clock_was_set_delayed() while
      holding the timekeeping lock, and to instead use a flag variable to
      decide if we should call clock_was_set() once we've released the locks.
      
      This works for the case here, where the do_adjtimex() was the deadlock
      trigger point. Unfortunately, in update_wall_time() we still hold
      the jiffies lock, which would deadlock with the ipi triggered by
      clock_was_set(), preventing us from calling it even after we drop the
      timekeeping lock. So instead call clock_was_set_delayed() at that point.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: stable <stable@vger.kernel.org> #3.10+
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      6fdda9a9
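
      A kernel-style sketch of the pattern the fix adopts, written as if
      inside kernel/time/timekeeping.c (simplified; __do_adjtimex_model and
      the CLOCK_SET_FLAG_MODEL condition are illustrative): decide under
      the timekeeping locks, notify only after dropping them.

        int do_adjtimex_model(struct timex *txc)
        {
                bool clock_set = false;
                int ret;

                raw_spin_lock_irq(&timekeeper_lock);
                write_seqcount_begin(&timekeeper_seq);
                ret = __do_adjtimex_model(txc);         /* may step the clock */
                if (ret & CLOCK_SET_FLAG_MODEL)         /* illustrative condition */
                        clock_set = true;
                write_seqcount_end(&timekeeper_seq);
                raw_spin_unlock_irq(&timekeeper_lock);

                if (clock_set)
                        clock_was_set();        /* no timekeeping locks held here */
                return ret;
        }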
    • timekeeping: Fix potential lost pv notification of time change · 5258d3f2
      Authored by John Stultz
      In 780427f0 (Indicate that clock was set in the pvclock
      gtod notifier), logic was added to pass a CLOCK_WAS_SET
      notification to the pvclock notifier chain.
      
      While that patch added an action flag returned from
      accumulate_nsecs_to_secs(), it only uses the returned value
      in one location, and not in the logarithmic accumulation.
      
      This means if a leap second triggered during the logarithmic
      accumulation (which is most likely where it would happen),
      the notification that the clock was set would not make it to
      the pv notifiers.
      
      This patch extends logarithmic_accumulation() to pass down that
      action flag so proper notification will occur (see the sketch after
      this entry).
      
      This patch also renames the variable from 'action' to 'clock_set',
      per Ingo's suggestion.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: <xen-devel@lists.xen.org>
      Cc: stable <stable@vger.kernel.org> #3.11+
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      5258d3f2
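
      A compact C model of the flag plumbing (accumulate_nsecs_to_secs_model
      and pvclock_notify_model are illustrative stand-ins, and the
      logarithmic shifting is simplified to a plain loop): each
      accumulation step reports whether it set the clock, and the loop ORs
      the reports together so a leap second in any iteration survives to
      the notification.

        #include <stdint.h>

        extern unsigned int accumulate_nsecs_to_secs_model(void); /* nonzero on leap second */
        extern void pvclock_notify_model(void);

        void update_wall_time_model(uint64_t offset, uint64_t interval)
        {
                unsigned int clock_set = 0;

                while (offset >= interval) {
                        offset -= interval;
                        clock_set |= accumulate_nsecs_to_secs_model(); /* OR, don't overwrite */
                }

                if (clock_set)
                        pvclock_notify_model();
        }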
    • timekeeping: Fix lost updates to tai adjustment · f55c0760
      Authored by John Stultz
      Since 48cdc135 (Implement a shadow timekeeper), we have to
      call timekeeping_update() after any adjustment to the timekeeping
      structure in order to make sure that any adjustments to the structure
      persist.
      
      Unfortunately, the updates to the tai offset via adjtimex do not
      trigger this update, causing adjustments to the tai offset to be
      made and then over-written by the previous value at the next
      update_wall_time() call.
      
      This patch resolves the issue by calling timekeeping_update()
      right after setting the tai offset.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable <stable@vger.kernel.org> #3.10+
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      f55c0760
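
      A kernel-style sketch of the fixed writer, written as if inside
      kernel/time/timekeeping.c (simplified from the patch; the exact
      update flags are an assumption): with a shadow timekeeper, every
      writer must end with timekeeping_update(), as in the suspend-path fix
      above, or the change is overwritten by the stale shadow copy on the
      next update_wall_time().

        void timekeeping_set_tai_offset_model(s32 tai_offset)
        {
                unsigned long flags;

                raw_spin_lock_irqsave(&timekeeper_lock, flags);
                write_seqcount_begin(&timekeeper_seq);
                __timekeeping_set_tai_offset(&timekeeper, tai_offset);
                /* the fix: refresh the shadow timekeeper before unlocking */
                timekeeping_update(&timekeeper, TK_MIRROR | TK_CLOCK_WAS_SET);
                write_seqcount_end(&timekeeper_seq);
                raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
        }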
  8. 03 Dec 2013, 1 commit
    • nohz: Convert a few places to use local per cpu accesses · e8fcaa5c
      Authored by Frederic Weisbecker
      A few functions use remote per CPU access APIs when they
      deal with local values.
      
      Just do the right conversion to improve performance, code
      readability and debug checks.
      
      While at it, let's extend some of these function names with a
      *_this_cpu() suffix in order to display their purpose more clearly.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      e8fcaa5c
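
      A one-line example of the kind of conversion meant here (kernel-style
      sketch; tick_cpu_sched is a real per-cpu variable, the surrounding
      code is elided):

        /* before: remote accessor used for a value owned by this CPU */
        struct tick_sched *ts = &per_cpu(tick_cpu_sched, smp_processor_id());

        /* after: local accessor, cheaper and with preemption debug checks */
        struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);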
  9. 29 Nov 2013, 1 commit
  10. 23 Nov 2013, 1 commit
    • time: Fix 1ns/tick drift w/ GENERIC_TIME_VSYSCALL_OLD · 4be77398
      Authored by Martin Schwidefsky
      Since commit 1e75fa8b (time: Condense timekeeper.xtime
      into xtime_sec - merged in v3.6), there has been a problem
      with the error accounting in the timekeeping code, such that
      when truncating to nanoseconds, we round up to the next nsec,
      but the balancing adjustment to the ntp_error value was dropped.
      
      This causes 1ns per tick drift forward of the clock.
      
      In 3.7, this logic was isolated to only GENERIC_TIME_VSYSCALL_OLD
      architectures (s390, ia64, powerpc).
      
      The fix is simply to balance the accounting and to subtract the
      added nanosecond from ntp_error. This allows the internal long-term
      clock steering to keep the clock accurate.
      
      While this fix removes the regression added in 1e75fa8b, the
      ideal solution is to move away from GENERIC_TIME_VSYSCALL_OLD
      and use the new VSYSCALL method, which avoids entirely the
      nanosecond granular rounding, and the resulting short-term clock
      adjustment oscillation needed to keep long term accurate time.
      
      [ jstultz: Many thanks to Martin for his efforts identifying this
        	   subtle bug, and providing the fix. ]
      
      Originally-from: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Paul Turner <pjt@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable <stable@vger.kernel.org>  #v3.6+
      Link: http://lkml.kernel.org/r/1385149491-20307-1-git-send-email-john.stultz@linaro.org
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      4be77398
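
      A self-contained C model of the balanced rounding (a sketch of the
      old_vsyscall_fixup() logic; struct tk_model is illustrative): when
      the sub-nanosecond remainder is rounded up to a whole nanosecond, the
      same amount is charged to ntp_error so the long-term steering takes
      it back out instead of letting it drift.

        #include <stdint.h>

        struct tk_model {
                uint64_t xtime_nsec;    /* nanoseconds, shifted left by 'shift' */
                int64_t  ntp_error;     /* error, shifted by 'ntp_error_shift' */
                uint32_t shift, ntp_error_shift;
        };

        void old_vsyscall_fixup_model(struct tk_model *tk)
        {
                uint64_t remainder = tk->xtime_nsec & ((1ULL << tk->shift) - 1);

                if (remainder) {
                        tk->xtime_nsec -= remainder;         /* drop the fraction */
                        tk->xtime_nsec += 1ULL << tk->shift; /* round up one nsec */
                        /* the fix: balance the accounting for the rounding above */
                        tk->ntp_error += remainder << tk->ntp_error_shift;
                        tk->ntp_error -= (1ULL << tk->shift) << tk->ntp_error_shift;
                }
        }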
  11. 19 Nov 2013, 2 commits
  12. 23 Oct 2013, 1 commit
    • clockevents: Sanitize ticks to nsec conversion · 97b94106
      Authored by Thomas Gleixner
      Marc Kleine-Budde pointed out that commit 77cc982f "clocksource: use
      clockevents_config_and_register() where possible" caused a regression
      for some of the converted subarchs.
      
      The reason is, that the clockevents core code converts the minimal
      hardware tick delta to a nanosecond value for core internal
      usage. This conversion is affected by integer math rounding loss, so
      the backwards conversion to hardware ticks will likely result in a
      value which is less than the configured hardware limitation. The
      affected subarchs used their own workaround (SIGH!) which got lost in
      the conversion.
      
      The solution for the issue at hand is simple: adding evt->mult - 1 to
      the shifted value before the integer division in the core conversion
      function takes care of it. But this only works for the case where for
      the scaled math mult/shift pair "mult <= 1 << shift" is true. For the
      case where "mult > 1 << shift" we can apply the rounding add only for
      the minimum delta value to make sure that the backward conversion is
      not less than the given hardware limit. For the upper bound we need to
      omit the rounding add, because the backwards conversion is always
      larger than the original latch value. That would violate the upper
      bound of the hardware device.
      
      Though looking closer at the details of that function reveals another
      bogosity: The upper bounds check is broken as well. Checking for a
      resulting "clc" value greater than KTIME_MAX after the conversion is
      pointless. The conversion does:
      
            u64 clc = (latch << evt->shift) / evt->mult;
      
      So there is no sanity check for (latch << evt->shift) exceeding the
      64bit boundary. The latch argument is "unsigned long", so on a 64bit
      arch the handed in argument could easily lead to an unnoticed shift
      overflow. With the above rounding fix applied the calculation before
      the division is:
      
             u64 clc = (latch << evt->shift) + evt->mult - 1;
      
      So we need to make sure, that neither the shift nor the rounding add
      is overflowing the u64 boundary.
      
      [ukl: move assignment to rnd after eventually changing mult, fix build
       issue and correct comment with the right math]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Cc: Marc Kleine-Budde <mkl@pengutronix.de>
      Cc: nicolas.ferre@atmel.com
      Cc: Marc Pignat <marc.pignat@hevs.ch>
      Cc: john.stultz@linaro.org
      Cc: kernel@pengutronix.de
      Cc: Ronald Wahl <ronald.wahl@raritan.com>
      Cc: LAK <linux-arm-kernel@lists.infradead.org>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1380052223-24139-1-git-send-email-u.kleine-koenig@pengutronix.de
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      97b94106
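
      A self-contained model of the sanitized conversion (simplified from
      the description above; do_div is replaced by plain division and the
      final clamp is omitted): the shift is checked for u64 overflow, and
      the mult - 1 rounding is applied only where rounding up cannot
      violate the hardware bound.

        #include <stdbool.h>
        #include <stdint.h>

        struct clock_event_model {
                uint32_t mult, shift;
        };

        static uint64_t cev_delta2ns_model(unsigned long latch,
                                           const struct clock_event_model *evt,
                                           bool ismax)
        {
                uint64_t clc = (uint64_t)latch << evt->shift;
                uint64_t rnd = (uint64_t)evt->mult - 1;

                /* did the shift push bits off the top of the u64? */
                if ((clc >> evt->shift) != (uint64_t)latch)
                        clc = ~0ULL;

                /*
                 * Round up for the minimum delta; for the maximum only when
                 * mult <= 1 << shift, so the result never exceeds the limit.
                 * The first test keeps clc + rnd inside the u64 range.
                 */
                if ((~0ULL - clc > rnd) &&
                    (!ismax || evt->mult <= (1ULL << evt->shift)))
                        clc += rnd;

                return clc / evt->mult;
        }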
  13. 19 Oct 2013, 3 commits
  14. 18 Oct 2013, 1 commit
  15. 10 Oct 2013, 2 commits
  16. 02 Oct 2013, 1 commit
    • tick: broadcast: Deny per-cpu clockevents from being broadcast sources · 245a3496
      Authored by Soren Brinkmann
      On most ARM systems the per-cpu clockevents are truly per-cpu in
      the sense that they can't be controlled on any other CPU besides
      the CPU that they interrupt. If one of these clockevents were to
      become a broadcast source we will run into a lot of trouble
      because the broadcast source is enabled on the first CPU to go
      into deep idle (if that CPU suffers from FEAT_C3_STOP) and that
      could be a different CPU than what the clockevent is interrupting
      (or even worse the CPU that the clockevent interrupts could be
      offline).
      
      Theoretically it's possible to support per-cpu clockevents as the
      broadcast source but so far we haven't needed this and supporting
      it is rather complicated. Let's just deny the possibility for now
      until this becomes a reality (let's hope it never does!).
      Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Michal Simek <michal.simek@xilinx.com>
      245a3496
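
      A tiny C model of the refusal (FEAT_PERCPU_MODEL is an illustrative
      stand-in for a per-cpu feature flag on the device): a device that can
      only interrupt one fixed CPU is never accepted as the broadcast
      source.

        #include <stdbool.h>

        #define FEAT_PERCPU_MODEL 0x1   /* device only reaches "its" CPU */

        struct clock_event_model {
                unsigned int features;
        };

        static bool can_be_broadcast_source_model(const struct clock_event_model *dev)
        {
                /* a truly per-cpu timer can't be (re)programmed from the
                 * CPU that goes idle last, so deny it outright */
                return !(dev->features & FEAT_PERCPU_MODEL);
        }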
  17. 30 Sep 2013, 2 commits
    • nohz: Drop generic vtime obsolete dependency on CONFIG_64BIT · ff3fb254
      Authored by Kevin Hilman
      The CONFIG_64BIT requirement on vtime can finally be removed
      since we now depend on HAVE_VIRT_CPU_ACCOUNTING_GEN which
      already takes care of the arch ability to handle nsecs based
      cputime_t safely.
      Signed-off-by: Kevin Hilman <khilman@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Arm Linux <linux-arm-kernel@lists.infradead.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      ff3fb254
    • vtime: Add HAVE_VIRT_CPU_ACCOUNTING_GEN Kconfig · 554b0004
      Authored by Kevin Hilman
      With VIRT_CPU_ACCOUNTING_GEN, cputime_t becomes 64-bit. In order
      to use that feature, arch code should be audited to ensure there are no
      races in concurrent read/write of cputime_t. For example,
      reading/writing 64-bit cputime_t on some 32-bit arches may require
      multiple accesses for low and high value parts, so proper locking
      is needed to protect against concurrent accesses.
      
      Therefore, add CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN which arches can
      enable after they've been audited for potential races.
      
      This option is automatically enabled on 64-bit platforms.
      
      Feature requested by Frederic Weisbecker.
      Signed-off-by: Kevin Hilman <khilman@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Arm Linux <linux-arm-kernel@lists.infradead.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      554b0004
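
      A tiny C model of the race the audit must rule out (illustrative
      userspace code, not kernel code): on a 32-bit machine a plain 64-bit
      cputime access compiles to two 32-bit accesses, so an unprotected
      concurrent reader can see a torn value, one old half and one new
      half.

        #include <stdint.h>

        static uint64_t cputime_model;  /* two machine words on 32-bit */

        /* UNSAFE on 32-bit without a seqcount or lock: the two halves
         * may come from different writes (a torn read). */
        uint64_t read_cputime_unsafe_model(void)
        {
                return cputime_model;
        }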
  18. 18 Sep 2013, 1 commit
  19. 12 Sep 2013, 1 commit
    • timekeeping: Fix HRTICK related deadlock from ntp lock changes · 7bd36014
      Authored by John Stultz
      Gerlando Falauto reported that when HRTICK is enabled, it is
      possible to trigger system deadlocks. These were hard to
      reproduce, as HRTICK has been broken in the past, but seemed
      to be connected to the timekeeping_seq lock.
      
      Since seqlocks/seqcounts aren't supported w/ lockdep, I added
      some extra spinlock based locking and triggered the following
      lockdep output:
      
      [   15.849182] ntpd/4062 is trying to acquire lock:
      [   15.849765]  (&(&pool->lock)->rlock){..-...}, at: [<ffffffff810aa9b5>] __queue_work+0x145/0x480
      [   15.850051]
      [   15.850051] but task is already holding lock:
      [   15.850051]  (timekeeper_lock){-.-.-.}, at: [<ffffffff810df6df>] do_adjtimex+0x7f/0x100
      
      <snip>
      
      [   15.850051] Chain exists of: &(&pool->lock)->rlock --> &p->pi_lock --> timekeeper_lock
      [   15.850051]  Possible unsafe locking scenario:
      [   15.850051]
      [   15.850051]        CPU0                    CPU1
      [   15.850051]        ----                    ----
      [   15.850051]   lock(timekeeper_lock);
      [   15.850051]                                lock(&p->pi_lock);
      [   15.850051]                                lock(timekeeper_lock);
      [   15.850051]   lock(&(&pool->lock)->rlock);
      [   15.850051]
      [   15.850051]  *** DEADLOCK ***
      
      The deadlock was introduced by 06c017fd ("timekeeping:
      Hold timekeepering locks in do_adjtimex and hardpps") in 3.10
      
      This patch avoids this deadlock, by moving the call to
      schedule_delayed_work() outside of the timekeeper lock
      critical section.
      Reported-by: Gerlando Falauto <gerlando.falauto@keymile.com>
      Tested-by: Lin Ming <minggr@gmail.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: stable <stable@vger.kernel.org> #3.11, 3.10
      Link: http://lkml.kernel.org/r/1378943457-27314-1-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7bd36014
  20. 01 Sep 2013, 1 commit
    • nohz_full: Add full-system-idle state machine · 0edd1b17
      Authored by Paul E. McKenney
      This commit adds the state machine that takes the per-CPU idle data
      as input and produces a full-system-idle indication as output.  This
      state machine is driven out of RCU's quiescent-state-forcing
      mechanism, which invokes rcu_sysidle_check_cpu() to collect per-CPU
      idle state and then rcu_sysidle_report() to drive the state machine.
      
      The full-system-idle state is sampled using rcu_sys_is_idle(), which
      also drives the state machine if RCU is idle (and does so by forcing
      RCU to become non-idle).  This function returns true if all but the
      timekeeping CPU (tick_do_timer_cpu) are idle and have been idle long
      enough to avoid memory contention on the full_sysidle_state state
      variable.  The rcu_sysidle_force_exit() may be called externally
      to reset the state machine back into non-idle state.
      
      For large systems the state machine is driven out of RCU's
      force-quiescent-state logic, which provides good scalability at the price
      of millisecond-scale latencies on the transition to full-system-idle
      state.  This is not so good for battery-powered systems, which are usually
      small enough that they don't need to care about scalability, but which
      do care deeply about energy efficiency.  Small systems therefore drive
      the state machine directly out of the idle-entry code.  The number of
      CPUs in a "small" system is defined by a new NO_HZ_FULL_SYSIDLE_SMALL
      Kconfig parameter, which defaults to 8.  Note that this is a build-time
      definition.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      [ paulmck: Use true and false for boolean constants per Lai Jiangshan. ]
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      [ paulmck: Simplify logic and provide better comments for memory barriers,
        based on review comments and questions by Lai Jiangshan. ]
      0edd1b17
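
      A compact sketch of the state progression (state names modeled on the
      commit text, not the actual enum; the real machine is driven from
      RCU's force-quiescent-state code): the system advances toward the
      FULL state only while every non-timekeeping CPU stays idle, and any
      wakeup drops it straight back.

        enum sysidle_state_model {
                SYSIDLE_NOT,    /* some non-timekeeping CPU is busy */
                SYSIDLE_SHORT,  /* all idle, but not long enough yet */
                SYSIDLE_LONG,   /* all idle long enough */
                SYSIDLE_FULL,   /* full-system-idle declared */
        };

        static enum sysidle_state_model state = SYSIDLE_NOT;

        void sysidle_advance_model(void)  /* from quiescent-state forcing */
        {
                if (state < SYSIDLE_FULL)
                        state++;
        }

        void sysidle_reset_model(void)    /* a non-timekeeping CPU woke up */
        {
                state = SYSIDLE_NOT;
        }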
  21. 29 Aug 2013, 1 commit
  22. 23 Aug 2013, 1 commit
    • ntp: Make periodic RTC update more reliable · a97ad0c4
      Authored by Miroslav Lichvar
      The current code requires that the scheduled update of the RTC happens
      in the tick closest to the middle of the second. This seems to be
      difficult to achieve reliably. The scheduled work may be missing the
      target time by a tick or two and be constantly rescheduled every second.
      
      Relax the limit to 10 ticks. As a typical RTC drifts in the 11-minute
      update interval by several milliseconds, this shouldn't affect the
      overall accuracy of the RTC much.
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      a97ad0c4
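
      A small C model of the relaxed scheduling test (one plausible reading
      of the 10-tick window, with HZ assumed to be 100): the update is
      accepted anywhere within ten ticks around the mid-second target
      rather than only in the single closest tick.

        #include <stdbool.h>
        #include <stdint.h>

        #define NSEC_PER_SEC 1000000000LL
        #define TICK_NSEC    (NSEC_PER_SEC / 100)       /* assumes HZ = 100 */

        static bool rtc_update_due_model(int64_t nsec_into_second)
        {
                int64_t d = nsec_into_second - NSEC_PER_SEC / 2;

                if (d < 0)
                        d = -d;
                /* within +/- 5 ticks of the half-second target */
                return d <= 5 * TICK_NSEC;
        }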
  23. 19 Aug 2013, 1 commit
  24. 16 Aug 2013, 1 commit
  25. 14 Aug 2013, 2 commits
    • nohz: Optimize full dynticks's sched hooks with static keys · d13508f9
      Authored by Frederic Weisbecker
      Scheduler IPIs and task context switches are serious fast paths.
      Let's try to hide, as much as we can, the impact of the full
      dynticks APIs' off case at these call sites through the use of
      static keys.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      d13508f9
    • nohz: Optimize full dynticks state checks with static keys · 460775df
      Authored by Frederic Weisbecker
      These APIs are frequently accessed, and priority is given
      to optimizing the full dynticks off case in order to let
      distros enable this feature without suffering from
      significant performance regressions.
      
      Let's inline these APIs and optimize them with static keys.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      460775df
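
      A kernel-style sketch of one such inlined check (helper and key names
      are illustrative, not the real tick.h API): the state test lives in a
      header and is guarded by a static key, so with full dynticks off it
      compiles down to a patched no-op.

        #include <linux/jump_label.h>
        #include <linux/types.h>

        static struct static_key nohz_full_key = STATIC_KEY_INIT_FALSE;

        static inline bool tick_nohz_full_enabled_model(void)
        {
                return static_key_false(&nohz_full_key);  /* no-op when off */
        }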