1. 23 10月, 2014 1 次提交
  2. 24 7月, 2014 1 次提交
    • T
      clocksource: Move cycle_last validation to core code · 09ec5442
      Thomas Gleixner 提交于
      The only user of the cycle_last validation is the x86 TSC. In order to
      provide NMI safe accessor functions for clock monotonic and
      monotonic_raw we need to do that in the core.
      
      We can't do the TSC specific
      
          if (now < cycle_last)
             	    now = cycle_last;
      
      for the other wrapping around clocksources, but TSC has
      CLOCKSOURCE_MASK(64) which actually does not mask out anything so if
      now is less than cycle_last the subtraction will give a negative
      result. So we can check for that in clocksource_delta() and return 0
      for that case.
      
      Implement and enable it for x86
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      09ec5442
  3. 02 7月, 2014 1 次提交
    • P
      x86, tsc: Fix cpufreq lockup · 3896c329
      Peter Zijlstra 提交于
      Mauro reported that his AMD X2 using the powernow-k8 cpufreq driver
      locked up when doing cpu hotplug.
      
      Because we called set_cyc2ns_scale() from the time_cpufreq_notifier()
      unconditionally, it gets called multiple times for each freq change,
      instead of only the once, when the tsc_khz value actually changes.
      
      Because it gets called more than once, we run out of cyc2ns data slots
      and stall, waiting for a free one, but because we're half way offline,
      there's no consumers to free slots.
      
      By placing the call inside the condition that actually changes tsc_khz
      we avoid superfluous calls and avoid the problem.
      Reported-by: NMauro <registosites@hotmail.com>
      Tested-by: NMauro <registosites@hotmail.com>
      Fixes: 20d1c86a ("sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs")
      Cc: <stable@vger.kernel.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Bin Gao <bin.gao@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Stefani Seibold <stefani@seibold.net>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      3896c329
  4. 20 6月, 2014 1 次提交
  5. 19 3月, 2014 2 次提交
  6. 20 2月, 2014 1 次提交
    • T
      x86, tsc: Fallback to normal calibration if fast MSR calibration fails · 5f0e0309
      Thomas Gleixner 提交于
      If we cannot calibrate TSC via MSR based calibration
      try_msr_calibrate_tsc() stores zero to fast_calibrate and returns that
      to the caller. This value gets then propagated further to clockevents
      code resulting division by zero oops like the one below:
      
       divide error: 0000 [#1] PREEMPT SMP
       Modules linked in:
       CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0+ #47
       task: ffff880075508000 ti: ffff880075506000 task.ti: ffff880075506000
       RIP: 0010:[<ffffffff810aec14>]  [<ffffffff810aec14>] clockevents_config.part.3+0x24/0xa0
       RSP: 0000:ffff880075507e58  EFLAGS: 00010246
       RAX: ffffffffffffffff RBX: ffff880079c0cd80 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffff
       RBP: ffff880075507e70 R08: 0000000000000001 R09: 00000000000000be
       R10: 00000000000000bd R11: 0000000000000003 R12: 000000000000b008
       R13: 0000000000000008 R14: 000000000000b010 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff880079c00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: ffff880079fff000 CR3: 0000000001c0b000 CR4: 00000000001006f0
       Stack:
        ffff880079c0cd80 000000000000b008 0000000000000008 ffff880075507e88
        ffffffff810aecb0 ffff880079c0cd80 ffff880075507e98 ffffffff81030168
        ffff880075507ed8 ffffffff81d1104f 00000000000000c3 0000000000000000
       Call Trace:
        [<ffffffff810aecb0>] clockevents_config_and_register+0x20/0x30
        [<ffffffff81030168>] setup_APIC_timer+0xc8/0xd0
        [<ffffffff81d1104f>] setup_boot_APIC_clock+0x4cc/0x4d8
        [<ffffffff81d0f5de>] native_smp_prepare_cpus+0x3dd/0x3f0
        [<ffffffff81d02ee9>] kernel_init_freeable+0xc3/0x205
        [<ffffffff8177c910>] ? rest_init+0x90/0x90
        [<ffffffff8177c91e>] kernel_init+0xe/0x120
        [<ffffffff8178deec>] ret_from_fork+0x7c/0xb0
        [<ffffffff8177c910>] ? rest_init+0x90/0x90
      
      Prevent this from happening by:
       1) Modifying try_msr_calibrate_tsc() to return calibration value or zero
          if it fails.
       2) Check this return value in native_calibrate_tsc() and in case of zero
          fallback to use normal non-MSR based calibration.
      
      [mw: Added subject and changelog]
      Reported-and-tested-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Bin Gao <bin.gao@linux.intel.com>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/1392810750-18660-1-git-send-email-mika.westerberg@linux.intel.comSigned-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      5f0e0309
  7. 09 2月, 2014 1 次提交
  8. 23 1月, 2014 1 次提交
  9. 16 1月, 2014 1 次提交
  10. 13 1月, 2014 4 次提交
  11. 23 7月, 2013 1 次提交
  12. 15 7月, 2013 1 次提交
    • P
      x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8
  13. 16 3月, 2013 1 次提交
  14. 16 1月, 2013 1 次提交
  15. 24 10月, 2012 1 次提交
  16. 06 6月, 2012 1 次提交
  17. 20 3月, 2012 1 次提交
    • M
      x86: kvmclock: abstract save/restore sched_clock_state · b74f05d6
      Marcelo Tosatti 提交于
      Upon resume from hibernation, CPU 0's hvclock area contains the old
      values for system_time and tsc_timestamp. It is necessary for the
      hypervisor to update these values with uptodate ones before the CPU uses
      them.
      
      Abstract TSC's save/restore sched_clock_state functions and use
      restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update.
      
      Also move restore_sched_clock_state before __restore_processor_state,
      since the later calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC).
      Thanks to Igor Mammedov for tracking it down.
      
      Fixes suspend-to-disk with kvmclock.
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      b74f05d6
  18. 16 3月, 2012 1 次提交
    • A
      x86, tsc: Skip refined tsc calibration on systems with reliable TSC · 57779dc2
      Alok Kataria 提交于
      While running the latest Linux as guest under VMware in highly
      over-committed situations, we have seen cases when the refined TSC
      algorithm fails to get a valid tsc_start value in
      tsc_refine_calibration_work from multiple attempts. As a result the
      kernel keeps on scheduling the tsc_irqwork task for later. Subsequently
      after several attempts when it gets a valid start value it goes through
      the refined calibration and either bails out or uses the new results.
      Given that the kernel originally read the TSC frequency from the
      platform, which is the best it can get, I don't think there is much
      value in refining it.
      
      So  for systems which get the TSC frequency from the platform we
      should skip the refined tsc algorithm.
      
      We can use the TSC_RELIABLE cpu cap flag to detect this, right now it is
      set only on VMware and for Moorestown Penwell both of which have there
      own TSC calibration methods.
      Signed-off-by: NAlok N Kataria <akataria@vmware.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Dirk Brandewie <dirk.brandewie@gmail.com>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: stable@kernel.org
      [jstultz: Reworked to simply not schedule the refining work,
      rather then scheduling the work and bombing out later]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      57779dc2
  19. 13 3月, 2012 1 次提交
  20. 18 1月, 2012 1 次提交
    • L
      x86, tsc: Fix SMI induced variation in quick_pit_calibrate() · 68f30fbe
      Linus Torvalds 提交于
      pit_expect_msb() returns success wrongly in the below SMI scenario:
      
      a. pit_verify_msb() has not yet seen the MSB transition.
      
      b. we are close to the MSB transition though and got a SMI immediately after
         returning from pit_verify_msb() which didn't see the MSB transition. PIT MSB
         transition has happened somewhere during SMI execution.
      
      c. returned from SMI and we noted down the 'tsc', saw the pit MSB change now and
         exited the loop to calculate 'deltatsc'. Instead of noting the TSC at the MSB
         transition, we are way off because of the SMI.  And as the SMI happened
         between the pit_verify_msb() and before the 'tsc' is recorded in the
         for loop, 'delattsc' (d1/d2 in quick_pit_calibrate()) will be small and
         quick_pit_calibrate() will not notice this error.
      
      Depending on whether SMI disturbance happens while computing d1 or d2, we will
      see the TSC calibrated value smaller or bigger than the expected value. As a
      result, in a cluster we were seeing a variation of approximately +/- 20MHz in
      the calibrated values, resulting in NTP failures.
      
        [ As far as the SMI source is concerned, this is a periodic SMI that gets
          disabled after ACPI is enabled by the OS. But the TSC calibration happens
          before the ACPI is enabled. ]
      
      To address this, change pit_expect_msb() so that
      
       - the 'tsc' is the TSC in between the two reads that read the MSB
      change from the PIT (same as before)
      
       - the 'delta' is the difference in TSC from *before* the MSB changed
      to *after* the MSB changed.
      
      Now the delta is twice as big as before (it covers four PIT accesses,
      roughly 4us) and quick_pit_calibrate() will loop a bit longer to get
      the calibrated value with in the 500ppm precision. As the delta (d1/d2)
      covers four PIT accesses, actual calibrated result might be closer to
      250ppm precision.
      
      As the loop now takes longer to stabilize, double MAX_QUICK_PIT_MS to 50.
      
      SMI disturbance will showup as much larger delta's and the loop will take
      longer than usual for the result to be with in the accepted precision. Or will
      fallback to slow PIT calibration if it takes more than 50msec.
      
      Also while we are at this, remove the calibration correction that aims to
      get the result to the middle of the error bars. We really don't know which
      direction to correct into, so remove it.
      Reported-and-tested-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/1326843337.5291.4.camel@sbsiddha-mobl2Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      68f30fbe
  21. 06 12月, 2011 2 次提交
  22. 22 11月, 2011 1 次提交
  23. 15 7月, 2011 1 次提交
  24. 14 7月, 2011 1 次提交
  25. 01 6月, 2011 1 次提交
  26. 24 5月, 2011 4 次提交
  27. 18 3月, 2011 1 次提交
  28. 15 1月, 2011 1 次提交
    • J
      x86: tsc: Fix calibration refinement conditionals to avoid divide by zero · 62627bec
      John Stultz 提交于
      Konrad Wilk reported that the new delayed calibration crashes with a
      divide by zero on Xen. The reason is that Xen sets the pmtimer
      address, but reading from it returns 0xffffff. That results in the
      ref_start and ref_stop value being the same, so the delta is zero
      which causes the divide by zero later in the calculation.
      
      The conditional (!hpet && !ref_start && !ref_stop) which sanity checks
      the calibration reference values doesn't really make sense. If the
      refs are null, but hpet is on, we still want to break out.
      
      The div by zero would be possible to trigger by chance if both reads
      from the hardware provided the exact same value (due to hardware
      wrapping).
      
      So checking if both the ref values are the same should handle if we
      don't have hardware (both null) or if they are the same value (either by
      invalid hardware, or by chance), avoiding the div by zero issue.
      
      [ tglx: Applied the same fix to native_calibrate_tsc() where this
        	check was copied from ]
      Reported-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1295024788-15619-1-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      62627bec
  29. 11 1月, 2011 1 次提交
  30. 30 12月, 2010 1 次提交
  31. 13 12月, 2010 1 次提交
    • T
      x86: Check tsc available/disabled in the delayed init function · a8760eca
      Thomas Gleixner 提交于
      The delayed TSC init function does not check whether the system has no
      TSC or TSC is disabled at the kernel command line, which results in a
      crash in the work queue based extended calibration due to division by
      zero because the basic calibration never happened.
      
      Add the missing checks and do not touch TSC when not available or
      disabled.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <johnstul@us.ibm.com>
      a8760eca
  32. 03 12月, 2010 1 次提交
    • J
      x86: Improve TSC calibration using a delayed workqueue · 08ec0c58
      John Stultz 提交于
      Boot to boot the TSC calibration may vary by quite a large amount.
      
      While normal variance of 50-100ppm can easily be seen, the quick
      calibration code only requires 500ppm accuracy, which is the limit
      of what NTP can correct for.
      
      This can cause problems for systems being used as NTP servers, as
      every time they reboot it can take hours for them to calculate the
      new drift error caused by the calibration.
      
      The classic trade-off here is calibration accuracy vs slow boot times,
      as during the calibration nothing else can run.
      
      This patch uses a delayed workqueue  to calibrate the TSC over the
      period of a second. This allows very accurate calibration (in my
      tests only varying by 1khz or 0.4ppm boot to boot). Additionally this
      refined calibration step does not block the boot process, and only
      delays the TSC clocksoure registration by a few seconds in early boot.
      If the refined calibration strays 1% from the early boot calibration
      value, the system will fall back to already calculated early boot
      calibration.
      
      Credit to Andi Kleen who suggested using a timer quite awhile back,
      but I dismissed it thinking the timer calibration would be done after
      the clocksource was registered (which would break things). Forgive
      me for my short-sightedness.
      
      This patch has worked very well in my testing, but TSC hardware is
      quite varied so it would probably be good to get some extended
      testing, possibly pushing inclusion out to 2.6.39.
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1289003985-29060-1-git-send-email-johnstul@us.ibm.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
      CC: Clark Williams <williams@redhat.com>
      CC: Andi Kleen <andi@firstfloor.org>
      08ec0c58