1. 24 Oct, 2007 · 3 commits
    • x86: fix more TSC clock source calibration errors · 8c660065
      Dave Johnson committed
      The previous patch wasn't correctly handling the 'count' variable.  If
      a CPU gave bad results on the 1st or 2nd run but good results on the
      3rd, it wouldn't do the correct thing.  No idea if any such CPU
      exists, but the patch below handles that case by discarding the bad
      runs.
      
      If a bad result (too quick, or too slow) occurs on any of the 3 runs
      it will be discarded.
      
      Also updated some comments to explain what's going on.
      Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: fix TSC clock source calibration error · edaf420f
      Dave Johnson committed
      I ran into this problem on a system that was unable to obtain NTP sync
      because the clock was running very slow (over 10000ppm slow). ntpd had
      declared all of its peers 'reject' with 'peer_dist' reason.
      
      On investigation, the tsc_khz variable was significantly incorrect
      causing xtime to run slow.  After a reboot tsc_khz was correct so I
      did a reboot test to see how often the problem occurred:
      
      The test was done on a 2000 MHz Xeon system.  Of 689 reboots, 8 of them
      had unacceptable tsc_khz values (>500ppm):
      
       range of tsc_khz  # of boots  % of boots
       ----------------  ----------  ----------
              < 1999750           0      0.000%
      1999750 - 1999800          21      3.048%
      1999800 - 1999850         166     24.128%
      1999850 - 1999900         241     35.029%
      1999900 - 1999950         211     30.669%
      1999950 - 2000000          42      6.105%
      2000000 - 2000050           0      0.000%
      2000050 - 2000100           0      0.000%
                         [...]
      2000100 - 2015000           1      0.145%  << BAD
      2015000 - 2030000           6      0.872%  << BAD
      2030000 - 2045000           1      0.145%  << BAD
      2045000 <                   0      0.000%
      
      The worst boot was 2032.577 MHz, over 1.5% off!
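
      As a rough check of the figures above, the error can be expressed in
      parts per million.  This is only an illustrative sketch: the function
      name khz_error_ppm is made up for this example, and the nominal value
      of 2000000 kHz is an assumption taken from the "2000 MHz" description.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Absolute error of a measured frequency against a nominal one, in
       * parts per million (rounded down).  Both arguments are in kHz.
       * Hypothetical helper, not part of the kernel. */
      long long khz_error_ppm(long long measured_khz, long long nominal_khz)
      {
          assert(nominal_khz > 0);
          return llabs(measured_khz - nominal_khz) * 1000000LL / nominal_khz;
      }
      ```

      With these assumptions, khz_error_ppm(2032577, 2000000) gives 16288,
      i.e. about 1.6%, and a reading of 2001000 kHz sits exactly at the
      500ppm threshold used in the table.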
      
      It appears that on rare occasions, mach_countup() is taking longer to
      complete than necessary.
      
      I suspect that this is caused by the CPU taking a periodic SMI
      right at the end of the 30ms calibration loop.  This would delay
      the loop while the BIOS SMI handler runs.  The resulting TSC value
      is beyond what it actually should be, resulting in a higher
      tsc_khz.
      
      The patch below makes native_calculate_cpu_khz() take the best
      (shortest duration, lowest khz) run of its 3 calibration loops.  If
      an SMI goes off causing a bad result (long duration, higher khz) it
      will be discarded.
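
      The selection rule can be sketched as follows.  This is not the
      patch itself: best_calibration_khz is a hypothetical helper that
      works on already-computed kHz estimates, whereas the real code
      operates inside native_calculate_cpu_khz().

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Return the lowest kHz estimate among n calibration runs.  A run
       * that was stretched by an SMI reports a higher kHz value, so
       * taking the minimum discards it.  Hypothetical helper for
       * illustration only. */
      uint64_t best_calibration_khz(const uint64_t *runs_khz, size_t n)
      {
          assert(runs_khz != NULL && n > 0);

          uint64_t best = runs_khz[0];
          for (size_t i = 1; i < n; i++) {
              if (runs_khz[i] < best)
                  best = runs_khz[i];
          }
          return best;
      }
      ```

      Given the three estimates 2032577, 1999850 and 1999900, the helper
      returns 1999850 and the SMI-inflated first run is ignored.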
      
      With the patch applied, 300 boots of the same system produce good
      results:
      
       range of tsc_khz  # of boots  % of boots
       ----------------  ----------  ----------
              < 1999750           0      0.000%
      1999750 - 1999800          30     10.000%
      1999800 - 1999850         166     55.333%
      1999850 - 1999900          89     29.667%
      1999900 - 1999950          15      5.000%
      1999950 <                   0      0.000%
      
      Problem was found and tested against 2.6.18.  Patch is against 2.6.22.
      Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: add instrumentation menu · ea580655
      Adrian Bunk committed
      It seems commit 09cadedb was incomplete 
      due to a clash with the x86 architecture merge.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  2. 23 Oct, 2007 · 11 commits
    • Revert lguest magic and use hook in head.S · 814a0e5c
      Rusty Russell committed
      Version 2.07 of the boot protocol uses 0x23C for the hardware_subarch
      field, which for lguest is "1".  This allows us to use the standard
      boot entry point rather than the "GenuineLguest" string hack.
      
      The standard entry point also clears the BSS and copies the boot parameters
      and commandline for us, saving more code.
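
      A minimal sketch of reading that field from a raw copy of the boot
      parameters.  The 0x23C offset comes from the text above; the helper
      name read_hardware_subarch, the buffer-based interface and the
      32-bit field width are assumptions made for this example.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      #define HARDWARE_SUBARCH_OFFSET 0x23C  /* offset named in the commit text */

      /* Read hardware_subarch from a raw boot-parameter buffer.  The field
       * is assumed to be 32 bits wide and stored in host byte order.
       * Hypothetical helper for illustration only. */
      uint32_t read_hardware_subarch(const uint8_t *boot_params, size_t len)
      {
          uint32_t value;

          assert(boot_params != NULL);
          assert(len >= HARDWARE_SUBARCH_OFFSET + sizeof(value));

          /* memcpy avoids an unaligned or type-punned load. */
          memcpy(&value, boot_params + HARDWARE_SUBARCH_OFFSET, sizeof(value));
          return value;
      }
      ```

      Under these assumptions, an lguest Guest would see the value 1 here
      and could branch to its own startup hook.
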
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • Lguest support for Virtio · 19f1537b
      Rusty Russell committed
      This makes lguest able to use the virtio devices.
      
      We change the device descriptor page from a simple array to a variable
      length "type, config_len, status, config data..." format, and
      implement virtio_config_ops to read from that config data.
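
      To illustrate the variable-length format, here is a sketch that
      walks such a page and counts the entries.  The one-byte width of
      type, config_len and status, the use of type 0 as an end marker and
      the name count_devices are all assumptions for this example, not
      details taken from the patch.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      #define DESC_HEADER_BYTES 3  /* assumed: 1 byte each for type, config_len, status */

      /* Count the descriptors in a page laid out as repeated
       * "type, config_len, status, config data..." records.  Stops at a
       * record whose type is 0 (assumed terminator) or at a record that
       * would run past the end of the page. */
      size_t count_devices(const uint8_t *page, size_t page_len)
      {
          size_t off = 0;
          size_t count = 0;

          assert(page != NULL);

          while (off + DESC_HEADER_BYTES <= page_len && page[off] != 0) {
              size_t config_len = page[off + 1];

              if (off + DESC_HEADER_BYTES + config_len > page_len)
                  break;  /* truncated record: do not read past the page */

              count++;
              off += DESC_HEADER_BYTES + config_len;
          }
          return count;
      }
      ```

      Each record is skipped by its own config_len, which is what lets
      entries carry different amounts of config data.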
      
      We use the virtio ring implementation for an efficient Guest <-> Host
      virtqueue mechanism, and the new LHCALL_NOTIFY hypercall to kick the
      host when it changes.
      
      We also use LHCALL_NOTIFY on kernel addresses for very very early
      console output.  We could have another hypercall, but this hack works
      quite well.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • Remove old lguest bus and drivers. · 0ca49ca9
      Rusty Russell committed
      This gets rid of the lguest bus, drivers and DMA mechanism, to make
      way for a generic virtio mechanism.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • Virtio helper routines for a descriptor ringbuffer implementation · 0a8a69dd
      Rusty Russell committed
      These helper routines supply most of the virtqueue_ops for hypervisors
      which want to use a ring for virtio.  Unlike the previous lguest
      implementation:
      
      1) The rings are variable sized (2^n-1 elements).
      2) They have an unfortunate limit of 65535 bytes per sg element.
      3) The page numbers are always 64 bit (PAE anyone?)
      4) They no longer place used[] on a separate page, just a separate
         cacheline.
      5) We do a modulo on a variable.  We could be tricky if we cared.
      6) Interrupts and notifies are suppressed using flags within the rings.
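
      The limit in item 2 is what one would expect if each element's
      length is stored in a 16-bit field; that is an assumption, since the
      field width is not stated here.  Under it, a caller with a larger
      buffer has to split it, as this sketch shows (sg_elements_needed is
      a hypothetical helper, not part of the patch):

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      #define SG_ELEM_MAX_BYTES 65535u  /* per-element limit from item 2 */

      /* Number of sg elements needed to describe len bytes when no single
       * element may exceed SG_ELEM_MAX_BYTES.  Hypothetical helper for
       * illustration only. */
      size_t sg_elements_needed(size_t len)
      {
          /* Guard the rounding addition against overflow. */
          assert(len <= SIZE_MAX - (SG_ELEM_MAX_BYTES - 1));
          return (len + SG_ELEM_MAX_BYTES - 1) / SG_ELEM_MAX_BYTES;
      }
      ```

      A 65535-byte buffer fits in one element, while 65536 bytes already
      needs two.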
      
      Users need only get the ring pages and provide a notify hook (KVM
      wants the guest to allocate the rings, lguest does it sanely).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Dor Laor <dor.laor@qumranet.com>
    • Boot with virtual == physical to get closer to native Linux. · 47436aa4
      Rusty Russell committed
      1) This allows us to get a lot closer to booting bzImages.
      
      2) It means we don't have to know page_offset.
      
      3) The Guest needs to modify the boot pagetables to create the
         PAGE_OFFSET mapping before jumping to C code.
      
      4) guest_pa() walks the page tables rather than using page_offset.
      
      5) We don't use page_offset to figure out whether to emulate: it was
         always kinda questionable, and won't work for instructions done
         before remapping (bzImage unpacking in particular).
      
      6) We still want the kernel address for tlb flushing: have the initial
         hypercall give us that, too.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • Allow guest to specify syscall vector to use. · c18acd73
      Rusty Russell committed
      (Based on Ron Minnich's LGUEST_PLAN9_SYSCALL patch).
      
      This patch allows Guests to specify what system call vector they want,
      and we try to reserve it.  We only allow one non-Linux system call
      vector, to try to avoid DoS on the Host.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • Make hypercalls arch-independent. · b410e7b1
      Jes Sorensen committed
      Clean up the hypercall code to make the code in hypercalls.c
      architecture independent. First process the common hypercalls and
      then call lguest_arch_do_hcall() if the call hasn't been handled.
      Rename struct hcall_ring to hcall_args.
      
      This patch requires the previous patch, which reorganizes the layout
      of struct lguest_regs on i386 so that it matches the layout of struct
      hcall_args.
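
      The "common first, then arch" dispatch order can be sketched like
      this.  Only the ordering is taken from the description above; the
      struct layout, the call numbers and every function name in the
      sketch are placeholders invented for this example.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Placeholder argument block; the real struct hcall_args layout is
       * not shown in this commit message. */
      struct hcall_args {
          unsigned long arg0;  /* assumed to carry the call number */
          unsigned long arg1;
      };

      enum { CALL_COMMON_EXAMPLE = 1, CALL_ARCH_EXAMPLE = 100 };  /* made-up numbers */
      enum hcall_result { HCALL_COMMON, HCALL_ARCH, HCALL_UNKNOWN };

      /* Stand-in for the architecture-independent handlers. */
      static bool handle_common(const struct hcall_args *args)
      {
          return args->arg0 == CALL_COMMON_EXAMPLE;
      }

      /* Stand-in for the role played by lguest_arch_do_hcall(). */
      static bool handle_arch(const struct hcall_args *args)
      {
          return args->arg0 == CALL_ARCH_EXAMPLE;
      }

      /* Try the common handlers first and fall back to the arch hook only
       * if the call has not been handled. */
      enum hcall_result do_hcall(const struct hcall_args *args)
      {
          assert(args != NULL);

          if (handle_common(args))
              return HCALL_COMMON;
          if (handle_arch(args))
              return HCALL_ARCH;
          return HCALL_UNKNOWN;
      }
      ```

      Keeping the fallback in one place means hypercalls.c needs no
      architecture-specific cases of its own.
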
      Signed-off-by: Jes Sorensen <jes@sgi.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • Move i386 part of core.c to x86/core.c. · 625efab1
      Jes Sorensen committed
      Separate the i386 architecture-specific code from core.c, move it to
      x86/core.c, and add an x86/lguest.h header file to match.
      Signed-off-by: Jes Sorensen <jes@sgi.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • Move lguest guest support to arch/x86. · 34b8867a
      Rusty Russell committed
      Lguest has two sides: host support (to launch guests) and guest
      support (replacement boot path and paravirt_ops).  This moves the
      guest side to arch/x86/lguest where it's closer to related code.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
    • Normalize config options for guest support · d3d1c4bd
      Rusty Russell committed
      1) Group all the "guest OS" support options together, under a PARAVIRT_GUEST
         menu.
      2) Make those options select CONFIG_PARAVIRT, as suggested by Andi.
      3) Make kconfig help titles consistent.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Zach Amsden <zach@vmware.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
    • Update arch/ to use sg helpers · 58b053e4
      Jens Axboe committed
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  3. 22 Oct, 2007 · 7 commits
  4. 20 Oct, 2007 · 19 commits