1. 19 6月, 2008 1 次提交
    • M
      x86, 32-bit: fix boot failure on TSC-less processors · df17b1d9
      Mikael Pettersson 提交于
      Booting 2.6.26-rc6 on my 486 DX/4 fails with a "BUG: Int 6"
      (invalid opcode) and a kernel halt immediately after the
      kernel has been uncompressed. The BUG shows EIP pointing
      to an rdtsc instruction in native_read_tsc(), invoked from
      native_sched_clock().
      
      (This error occurs so early that not even the serial console
      can capture it.)
      
      A bisection showed that this bug first occurs in 2.6.26-rc3-git7,
      via commit 9ccc906c:
      
      >x86: distangle user disabled TSC from unstable
      >
      >tsc_enabled is set to 0 from the command line switch "notsc" and from
      >the mark_tsc_unstable code. Seperate those functionalities and replace
      >tsc_enable with tsc_disable. This makes also the native_sched_clock()
      >decision when to use TSC understandable.
      >
      >Preparatory patch to solve the sched_clock() issue on 32 bit.
      >
      >Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      The core reason for this bug is that native_sched_clock() gets
      called before tsc_init().
      
      Before the commit above, tsc_32.c used a "tsc_enabled" variable
      which defaulted to 0 == disabled, and which only got enabled late
      in tsc_init(). Thus early calls to native_sched_clock() would skip
      the TSC and use jiffies instead.
      
      After the commit above, tsc_32.c uses a "tsc_disabled" variable
      which defaults to 0, meaning that the TSC is Ok to use. Early calls
      to native_sched_clock() now erroneously try to use the TSC on
      !cpu_has_tsc processors, leading to invalid opcode exceptions.
      
      My proposed fix is to initialise tsc_disabled to a "soft disabled"
      state distinct from the hard disabled state set up by the "notsc"
      kernel option. This fixes the native_sched_clock() problem. It also
      allows tsc_init() to be simplified: instead of setting tsc_disabled = 1
      on every error return, we just set tsc_disabled = 0 once when all
      checks have succeeded.
      
      I've verified that this lets my 486 boot again. I've also verified
      that a Core2 machine still uses the TSC as clocksource after the patch.
      Signed-off-by: NMikael Pettersson <mikpe@it.uu.se>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      df17b1d9
  2. 23 5月, 2008 2 次提交
  3. 20 4月, 2008 2 次提交
    • T
      x86: tsc prevent time going backwards · d8bb6f4c
      Thomas Gleixner 提交于
      We already catch most of the TSC problems by sanity checks, but there
      is a subtle bug which has been in the code forever. This can cause
      time jumps in the range of hours.
      
      This was reported in:
           http://lkml.org/lkml/2007/8/23/96
      and
           http://lkml.org/lkml/2008/3/31/23
      
      I was able to reproduce the problem with a gettimeofday loop test on a
      dual core and a quad core machine which both have sychronized
      TSCs. The TSCs seems not to be perfectly in sync though, but the
      kernel is not able to detect the slight delta in the sync check. Still
      there exists an extremly small window where this delta can be observed
      with a real big time jump. So far I was only able to reproduce this
      with the vsyscall gettimeofday implementation, but in theory this
      might be observable with the syscall based version as well.
      
      CPU 0 updates the clock source variables under xtime/vyscall lock and
      CPU1, where the TSC is slighty behind CPU0, is reading the time right
      after the seqlock was unlocked.
      
      The clocksource reference data was updated with the TSC from CPU0 and
      the value which is read from TSC on CPU1 is less than the reference
      data. This results in a huge delta value due to the unsigned
      subtraction of the TSC value and the reference value. This algorithm
      can not be changed due to the support of wrapping clock sources like
      pm timer.
      
      The huge delta is converted to nanoseconds and added to xtime, which
      is then observable by the caller. The next gettimeofday call on CPU1
      will show the correct time again as now the TSC has advanced above the
      reference value.
      
      To prevent this TSC specific wreckage we need to compare the TSC value
      against the reference value and return the latter when it is larger
      than the actual TSC value.
      
      I pondered to mark the TSC unstable when the readout is smaller than
      the reference value, but this would render an otherwise good and fast
      clocksource unusable without a real good reason.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d8bb6f4c
    • P
      4bd01600
  4. 17 4月, 2008 2 次提交
  5. 08 4月, 2008 2 次提交
  6. 05 4月, 2008 1 次提交
    • T
      x86: tsc prevent time going backwards · 47001d60
      Thomas Gleixner 提交于
      We already catch most of the TSC problems by sanity checks, but there
      is a subtle bug which has been in the code for ever. This can cause
      time jumps in the range of hours.
      
      This was reported in:
           http://lkml.org/lkml/2007/8/23/96
      and
           http://lkml.org/lkml/2008/3/31/23
      
      I was able to reproduce the problem with a gettimeofday loop test on a
      dual core and a quad core machine which both have sychronized
      TSCs. The TSCs seems not to be perfectly in sync though, but the
      kernel is not able to detect the slight delta in the sync check. Still
      there exists an extremly small window where this delta can be observed
      with a real big time jump. So far I was only able to reproduce this
      with the vsyscall gettimeofday implementation, but in theory this
      might be observable with the syscall based version as well.
      
      CPU 0 updates the clock source variables under xtime/vyscall lock and
      CPU1, where the TSC is slighty behind CPU0, is reading the time right
      after the seqlock was unlocked.
      
      The clocksource reference data was updated with the TSC from CPU0 and
      the value which is read from TSC on CPU1 is less than the reference
      data. This results in a huge delta value due to the unsigned
      subtraction of the TSC value and the reference value. This algorithm
      can not be changed due to the support of wrapping clock sources like
      pm timer.
      
      The huge delta is converted to nanoseconds and added to xtime, which
      is then observable by the caller. The next gettimeofday call on CPU1
      will show the correct time again as now the TSC has advanced above the
      reference value.
      
      To prevent this TSC specific wreckage we need to compare the TSC value
      against the reference value and return the latter when it is larger
      than the actual TSC value.
      
      I pondered to mark the TSC unstable when the readout is smaller than
      the reference value, but this would render an otherwise good and fast
      clocksource unusable without a real good reason.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      47001d60
  7. 26 2月, 2008 1 次提交
  8. 30 1月, 2008 3 次提交
    • A
      x86: convert TSC disabling to generic cpuid disable bitmap · 404ee5b1
      Andi Kleen 提交于
      Fix from: Ian Campbell <ijc@hellion.org.uk>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      404ee5b1
    • A
      x86: allow TSC clock source on AMD Fam10h and some cleanup · 51fc97b9
      Andi Kleen 提交于
      After a lot of discussions with AMD it turns out that TSC
      on Fam10h CPUs is synchronized when the CONSTANT_TSC cpuid bit is set.
      Or rather that if there are ever systems where that is not
      true it would be their BIOS' task to disable the bit.
      
      So finally use TSC gettimeofday on Fam10h by default.
      
      Or rather it is always used now on CPUs where the AMD
      specific CONSTANT_TSC bit is set.
      
      This gives a nice speed bost for gettimeofday() on these systems
      which tends to be by far the most common v/syscall.
      
      On a Fam10h system here TSC gtod uses about 20% of the CPU time of
      acpi_pm based gtod(). This was measured on 32bit, on 64bit
      it is even better because TSC gtod() can use a vsyscall
      and stay in ring 3, which acpi_pm doesn't.
      
      The Intel check simply checks for CONSTANT_TSC too without hardcoding
      Intel vendor. This is equivalent on 64bit because all 64bit capable Intel
      CPUs will have CONSTANT_TSC set.
      
      On Intel there is no CPU supplied CONSTANT_TSC bit currently,
      but we synthesize one based on hardcoded knowledge which steppings
      have p-state invariant TSC.
      
      So the new logic is now: On CPUs which have the AMD specific
      CONSTANT_TSC bit set or on Intel CPUs which are new enough
      to be known to have p-state invariant TSC always use
      TSC based gettimeofday()
      
      Cc: lenb@kernel.org
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      51fc97b9
    • G
      x86: scale cyc_2_nsec according to CPU frequency · 53d517cd
      Guillaume Chazarain 提交于
      scale the sched_clock() cyc_2_nsec scaling factor according to
      CPU frequency changes.
      
      [ mingo@elte.hu: simplified it and fixed it for SMP. ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      53d517cd
  9. 24 10月, 2007 2 次提交
    • D
      x86: fix more TSC clock source calibration errors · 8c660065
      Dave Johnson 提交于
      The previous patch wasn't correctly handling the 'count' variable.  If
      a CPU gave bad results on the 1st or 2nd run but good results on the
      3rd, it wouldn't do the correct thing.  No idea if any such CPU
      exists, but the patch below handles that case by discarding the bad
      runs.
      
      If a bad result (too quick, or too slow) occurs on any of the 3 runs
      it will be discarded.
      
      Also updated some comments to explain what's going on.
      Signed-off-by: NDave Johnson <djohnson@sw.starentnetworks.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      8c660065
    • D
      x86: fix TSC clock source calibration error · edaf420f
      Dave Johnson 提交于
      I ran into this problem on a system that was unable to obtain NTP sync
      because the clock was running very slow (over 10000ppm slow). ntpd had
      declared all of its peers 'reject' with 'peer_dist' reason.
      
      On investigation, the tsc_khz variable was significantly incorrect
      causing xtime to run slow.  After a reboot tsc_khz was correct so I
      did a reboot test to see how often the problem occurred:
      
      Test was done on a 2000 Mhz Xeon system.  Of 689 reboots, 8 of them
      had unacceptable tsc_khz values (>500ppm):
      
       range of tsc_khz  # of boots  % of boots
       ----------------  ----------  ----------
              < 1999750           0      0.000%
      1999750 - 1999800          21      3.048%
      1999800 - 1999850         166     24.128%
      1999850 - 1999900         241     35.029%
      1999900 - 1999950         211     30.669%
      1999950 - 2000000          42      6.105%
      2000000 - 2000000           0      0.000%
      2000050 - 2000100           0      0.000%
                         [...]
      2000100 - 2015000           1      0.145%  << BAD
      2015000 - 2030000           6      0.872%  << BAD
      2030000 - 2045000           1      0.145%  << BAD
      2045000 <                   0      0.000%
      
      The worst boot was 2032.577 Mhz, over 1.5% off!
      
      It appears that on rare occasions, mach_countup() is taking longer to
      complete than necessary.
      
      I suspect that this is caused by the CPU taking a periodic SMI
      interrupt right at the end of the 30ms calibration loop.  This would
      cause the loop to delay while the SMI BIOS hander runs. The resulting
      TSC value is beyond what it actually should be resulting in a higher
      tsc_khz.
      
      The below patch makes native_calculate_cpu_khz() take the best
      (shortest duration, lowest khz) run of it's 3 calibration loops.  If a
      SMI goes off causing a bad result (long duration, higher khz) it will
      be discarded.
      
      With the patch applied, 300 boots of the same system produce good
      results:
      
       range of tsc_khz  # of boots  % of boots
       ----------------  ----------  ----------
              < 1999750           0      0.000%
      1999750 - 1999800          30     10.000%
      1999800 - 1999850         166     55.333%
      1999850 - 1999900          89     29.667%
      1999900 - 1999950          15      5.000%
      1999950 <                   0      0.000%
      
      Problem was found and tested against 2.6.18.  Patch is against 2.6.22.
      Signed-off-by: NDave Johnson <djohnson@sw.starentnetworks.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      edaf420f
  10. 20 10月, 2007 3 次提交
  11. 18 10月, 2007 1 次提交
    • I
      x86: do not crash on non-Geode PCs in TSC probe · f97586b6
      Ingo Molnar 提交于
      with this fix Geode kernels can be booted (and QA-ed) on generic PCs.
      
      otherwise it crashes and burns during early bootup:
      
      Detected 2160.212 MHz processor.
      general protection fault: 0000 [#1]
      PREEMPT SMP
      Modules linked in:
      CPU:    0
      EIP:    0060:[<c09071f6>]    Not tainted VLI
      EFLAGS: 00010002   (2.6.23-rc9 #90)
      EIP is at tsc_init+0xa6/0x150
      eax: 00000001   ebx: c1dce000   ecx: 00001900   edx: 00000001
      esi: 00051000   edi: 00051000   ebp: c08fdfc4   esp: c08fdfa4
      ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
      Process swapper (pid: 0, ti=c08fc000 task=c082a180 task.ti=c08fc000)
      Stack: c076b870 00000870 000000d4 0000001d c0831e80 c1dce000 00051000 00051000
             c08fdfcc c09053f8 c08fdff8 c09045ff 000001e2 c09040a0 00051000 00000020
             0004e500 c0932140 00020800 00099800 c08ed000 01409007 00000000
      Call Trace:
       [<c010517a>] show_trace_log_lvl+0x1a/0x30
       [<c0105246>] show_stack_log_lvl+0xb6/0x100
       [<c0105732>] show_registers+0x212/0x3a0
       [<c0105aa4>] die+0x104/0x220
       [<c0105f5f>] do_general_protection+0x1ef/0x2b0
       [<c06699f2>] error_code+0x72/0x78
       [<c09053f8>] time_init+0x8/0x20
       [<c09045ff>] start_kernel+0x1af/0x320
       [<00000000>] 0x0
       =======================
      Code: 31 d2 b8 00 00 09 3d f7 35 2c 70 9b c0 a3 04 95 8f c0 e8 ce 4e 99 ff b8 e0 45 93 c0 e8 94 b1 c5 ff e8 7f 3d 80 ff b9 00 19 00 00 <0f> 32 f6 c4 01 74 07 83 25 24 ce 82 c0 fd 8b 0d 20 ce 82 c0 b8
      EIP: [<c09071f6>] tsc_init+0xa6/0x150 SS:ESP 0068:c08fdfa4
      Kernel panic - not syncing: Attempted to kill the idle task!
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      f97586b6
  12. 14 10月, 2007 1 次提交
    • D
      Delete filenames in comments. · 835c34a1
      Dave Jones 提交于
      Since the x86 merge, lots of files that referenced their own filenames
      are no longer correct.  Rather than keep them up to date, just delete
      them, as they add no real value.
      
      Additionally:
      - fix up comment formatting in scx200_32.c
      - Remove a credit from myself in setup_64.c from a time when we had no SCM
      - remove longwinded history from tsc_32.c which can be figured out from
        git.
      Signed-off-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      835c34a1
  13. 11 10月, 2007 2 次提交
  14. 10 10月, 2007 1 次提交
    • J
      drivers/firmware: const-ify DMI API and internals · 1855256c
      Jeff Garzik 提交于
      Three main sets of changes:
      
      1) dmi_get_system_info() return value should have been marked const,
         since callers should not be changing that data.
      
      2) const-ify DMI internals, since DMI firmware tables should,
         whenever possible, be marked const to ensure we never ever write to
         that data area.
      
      3) const-ify DMI API, to enable marking tables const where possible
         in low-level drivers.
      
      And if we're really lucky, this might enable some additional
      optimizations on the part of the compiler.
      
      The bulk of the changes are #2 and #3, which are interrelated.  #1 could
      have been a separate patch, but it was so small compared to the others,
      it was easier to roll it into this changeset.
      Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
      1855256c
  15. 23 8月, 2007 1 次提交
    • I
      sched: sched_clock_idle_[sleep|wakeup]_event() · 2aa44d05
      Ingo Molnar 提交于
      construct a more or less wall-clock time out of sched_clock(), by
      using ACPI-idle's existing knowledge about how much time we spent
      idling. This allows the rq clock to work around TSC-stops-in-C2,
      TSC-gets-corrupted-in-C3 type of problems.
      
      ( Besides the scheduler's statistics this also benefits blktrace and
        printk-timestamps as well. )
      
      Furthermore, the precise before-C2/C3-sleep and after-C2/C3-wakeup
      callbacks allow the scheduler to get out the most of the period where
      the CPU has a reliable TSC. This results in slightly more precise
      task statistics.
      
      the ACPI bits were acked by Len.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NLen Brown <len.brown@intel.com>
      2aa44d05
  16. 20 7月, 2007 1 次提交
  17. 18 7月, 2007 1 次提交
    • J
      Add a sched_clock paravirt_op · 688340ea
      Jeremy Fitzhardinge 提交于
      The tsc-based get_scheduled_cycles interface is not a good match for
      Xen's runstate accounting, which reports everything in nanoseconds.
      
      This patch replaces this interface with a sched_clock interface, which
      matches both Xen and VMI's requirements.
      
      In order to do this, we:
         1. replace get_scheduled_cycles with sched_clock
         2. hoist cycles_2_ns into a common header
         3. update vmi accordingly
      
      One thing to note: because sched_clock is implemented as a weak
      function in kernel/sched.c, we must define a real function in order to
      override this weak binding.  This means the usual paravirt_ops
      technique of using an inline function won't work in this case.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Dan Hecht <dhecht@vmware.com>
      Cc: john stultz <johnstul@us.ibm.com>
      688340ea
  18. 10 7月, 2007 1 次提交
  19. 03 5月, 2007 2 次提交
    • D
      [PATCH] i386: remove xtime_lock'ing around cpufreq notifier · df3624aa
      Daniel Walker 提交于
      The locking of the xtime_lock around the cpu notifier is unessesary now.
      At one time the tsc was used after a frequency change for timekeeping, but
      the re-write of timekeeping no longer uses the TSC unless the frequency is
      constant.
      
      The variables that are changed in this section of code had also once been
      used for timekeeping, but not any longer ..
      Signed-off-by: NDaniel Walker <dwalker@mvista.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      df3624aa
    • J
      [PATCH] x86: Log reason why TSC was marked unstable · 5a90cf20
      john stultz 提交于
      Change mark_tsc_unstable() so it takes a string argument, which holds the
      reason the TSC was marked unstable.
      
      This is then displayed the first time mark_tsc_unstable is called.
      
      This should help us better debug why the TSC was marked unstable on certain
      systems and allow us to make sure we're not being overly paranoid when
      throwing out this troublesome clocksource.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      5a90cf20
  20. 25 3月, 2007 1 次提交
  21. 17 3月, 2007 1 次提交
  22. 07 3月, 2007 1 次提交
  23. 05 3月, 2007 3 次提交
    • J
      [PATCH] clocksource init adjustments (fix bug #7426) · 6bb74df4
      john stultz 提交于
      This patch resolves the issue found here:
      http://bugme.osdl.org/show_bug.cgi?id=7426
      
      The basic summary is:
      Currently we register most of i386/x86_64 clocksources at module_init
      time. Then we enable clocksource selection at late_initcall time. This
      causes some problems for drivers that use gettimeofday for init
      calibration routines (specifically the es1968 driver in this case),
      where durring module_init, the only clocksource available is the low-res
      jiffies clocksource. This may cause slight calibration errors, due to
      the small sampling time used.
      
      It should be noted that drivers that require fine grained time may not
      function on architectures that do not have better then jiffies
      resolution timekeeping (there are a few). However, this does not
      discount the reasonable need for such fine-grained timekeeping at init
      time.
      
      Thus the solution here is to register clocksources earlier (ideally when
      the hardware is being initialized), and then we enable clocksource
      selection at fs_initcall (before device_initcall).
      
      This patch should probably get some testing time in -mm, since
      clocksource selection is one of the most important issues for correct
      timekeeping, and I've only been able to test this on a few of my own
      boxes.
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6bb74df4
    • Z
      [PATCH] vmi: cpu cycles fix · 1182d852
      Zachary Amsden 提交于
      In order to share the common code in tsc.c which does CPU Khz calibration, we
      need to make an accurate value of CPU speed available to the tsc.c code.  This
      value loses a lot of precision in a VM because of the timing differences with
      real hardware, but we need it to be as precise as possible so the guest can
      make accurate time calculations with the cycle counters.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1182d852
    • Z
      [PATCH] vmi: sched clock paravirt op fix · 6cb9a835
      Zachary Amsden 提交于
      The custom_sched_clock hook is broken.  The result from sched_clock needs to
      be in nanoseconds, not in CPU cycles.  The TSC is insufficient for this
      purpose, because TSC is poorly defined in a virtual environment, and mostly
      represents real world time instead of scheduled process time (which can be
      interrupted without notice when a virtual machine is descheduled).
      
      To make the scheduler consistent, we must expose a different nature of time,
      that is scheduled time.  So deprecate this custom_sched_clock hack and turn it
      into a paravirt-op, as it should have been all along.  This allows the tsc.c
      code which converts cycles to nanoseconds to be shared by all paravirt-ops
      backends.
      
      It is unfortunate to add a new paravirt-op, but this is a very distinct
      abstraction which is clearly different for all virtual machine
      implementations, and it gets rid of an ugly indirect function which I
      ashamedly admit I hacked in to try to get this to work earlier, and then even
      got in the wrong units.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6cb9a835
  24. 17 2月, 2007 4 次提交