1. 30 1月, 2008 6 次提交
  2. 17 10月, 2007 2 次提交
    • J
      paravirt: clean up lazy mode handling · 8965c1c0
      Jeremy Fitzhardinge 提交于
      Currently, the set_lazy_mode pv_op is overloaded with 5 functions:
       1. enter lazy cpu mode
       2. leave lazy cpu mode
       3. enter lazy mmu mode
       4. leave lazy mmu mode
       5. flush pending batched operations
      
      This complicates each paravirt backend, since it needs to deal with
      all the possible state transitions, handling flushing, etc. In
      particular, flushing is quite distinct from the other 4 functions, and
      seems to just cause complication.
      
      This patch removes the set_lazy_mode operation, and adds "enter" and
      "leave" lazy mode operations on mmu_ops and cpu_ops.  All the logic
      associated with enter and leaving lazy states is now in common code
      (basically BUG_ONs to make sure that no mode is current when entering
      a lazy mode, and make sure that the mode is current when leaving).
      Also, flush is handled in a common way, by simply leaving and
      re-entering the lazy mode.
      
      The result is that the Xen, lguest and VMI lazy mode implementations
      are much simpler.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Zach Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Anthony Liguory <aliguori@us.ibm.com>
      Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      8965c1c0
    • J
      paravirt: refactor struct paravirt_ops into smaller pv_*_ops · 93b1eab3
      Jeremy Fitzhardinge 提交于
      This patch refactors the paravirt_ops structure into groups of
      functionally related ops:
      
      pv_info - random info, rather than function entrypoints
      pv_init_ops - functions used at boot time (some for module_init too)
      pv_misc_ops - lazy mode, which didn't fit well anywhere else
      pv_time_ops - time-related functions
      pv_cpu_ops - various privileged instruction ops
      pv_irq_ops - operations for managing interrupt state
      pv_apic_ops - APIC operations
      pv_mmu_ops - operations for managing pagetables
      
      There are several motivations for this:
      
      1. Some of these ops will be general to all x86, and some will be
         i386/x86-64 specific.  This makes it easier to share common stuff
         while allowing separate implementations where needed.
      
      2. At the moment we must export all of paravirt_ops, but modules only
         need selected parts of it.  This allows us to export on a case by case
         basis (and also choose which export license we want to apply).
      
      3. Functional groupings make things a bit more readable.
      
      Struct paravirt_ops is now only used as a template to generate
      patch-site identifiers, and to extract function pointers for inserting
      into jmp/calls when patching.  It is only instantiated when needed.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Zach Amsden <zach@vmware.com>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Anthony Liguory <aliguori@us.ibm.com>
      Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      93b1eab3
  3. 11 10月, 2007 2 次提交
  4. 12 8月, 2007 1 次提交
    • A
      i386: Make patching more robust, fix paravirt issue · ab144f5e
      Andi Kleen 提交于
      Commit 19d36ccd "x86: Fix alternatives
      and kprobes to remap write-protected kernel text" uses code which is
      being patched for patching.
      
      In particular, paravirt_ops does patching in two stages: first it
      calls paravirt_ops.patch, then it fills any remaining instructions
      with nop_out().  nop_out calls text_poke() which calls
      lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val):
      that call site is one of the places we patch.
      
      If we always do patching as one single call to text_poke(), we only
      need make sure we're not patching the memcpy in text_poke itself.
      This means the prototype to paravirt_ops.patch needs to change, to
      marshal the new code into a buffer rather than patching in place as it
      does now.  It also means all patching goes through text_poke(), which
      is known to be safe (apply_alternatives is also changed to make a
      single patch).
      
      AK: fix compilation on x86-64 (bad rusty!)
      AK: fix boot on x86-64 (sigh)
      AK: merged with other patches
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ab144f5e
  5. 18 7月, 2007 2 次提交
    • J
      Add a sched_clock paravirt_op · 688340ea
      Jeremy Fitzhardinge 提交于
      The tsc-based get_scheduled_cycles interface is not a good match for
      Xen's runstate accounting, which reports everything in nanoseconds.
      
      This patch replaces this interface with a sched_clock interface, which
      matches both Xen and VMI's requirements.
      
      In order to do this, we:
         1. replace get_scheduled_cycles with sched_clock
         2. hoist cycles_2_ns into a common header
         3. update vmi accordingly
      
      One thing to note: because sched_clock is implemented as a weak
      function in kernel/sched.c, we must define a real function in order to
      override this weak binding.  This means the usual paravirt_ops
      technique of using an inline function won't work in this case.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Dan Hecht <dhecht@vmware.com>
      Cc: john stultz <johnstul@us.ibm.com>
      688340ea
    • J
      paravirt: add an "mm" argument to alloc_pt · fdb4c338
      Jeremy Fitzhardinge 提交于
      It's useful to know which mm is allocating a pagetable.  Xen uses this
      to determine whether the pagetable being added to is pinned or not.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      fdb4c338
  6. 01 6月, 2007 1 次提交
  7. 03 5月, 2007 10 次提交
  8. 15 4月, 2007 1 次提交
  9. 09 4月, 2007 1 次提交
  10. 17 3月, 2007 1 次提交
  11. 05 3月, 2007 9 次提交
    • A
      [PATCH] arch/i386/kernel/vmi.c must #include <asm/kmap_types.h> · 8f485612
      Adrian Bunk 提交于
        CC      arch/i386/kernel/vmi.o
      /home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c: In function 'vmi_map_pt_hook':
      /home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c:387: error: 'KM_PTE0' undeclared (first use in this function)
      /home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c:387: error: (Each undeclared identifier is reported only once
      /home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c:387: error: for each function it appears in.)
      /home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c:387: error: 'KM_PTE1' undeclared (first use in this function)
      make[2]: *** [arch/i386/kernel/vmi.o] Error 1
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Acked-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8f485612
    • Z
      [PATCH] vmi: smp fixes · c6b36e9a
      Zachary Amsden 提交于
      Critical fixes for SMP.
      
      Fix a couple functions which needed to be __devinit and fix a bogus parameter
      to AP startup that just so happened to work because the low virtual mapping of
      memory was still established.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c6b36e9a
    • Z
      [PATCH] vmi: apic ops · 772205f6
      Zachary Amsden 提交于
      Use para_fill instead of directly setting the APIC ops to the result of the
      vmi_get_function call - this allows one to implement a VMI ROM without
      implementing APIC functions, just using the native APIC functions.
      
      While doing this, I realized that there is a lot more cleanup that should have
      been done.  Basically, we should never assume that the ROM implements a
      specific set of functions, and always allow fallback to the native
      implementation.
      
      This is critical for future compatibility.
      Signed-off-by: NAnthony Liguori <anthony@codemonkey.ws>
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      772205f6
    • Z
      [PATCH] vmi: fix nohz compile · a9eddc95
      Zachary Amsden 提交于
      More goo from hrtimers integration.  We do compile and run properly with NO_HZ
      enabled.  There was a period when we didn't because of a missing export, but
      that was since fixed.
      
      And with the clocksource code now firmly in place, we can get rid of code that
      fixes up the wallclock, since this is done in the common infrastructure.  This
      actually fixes a timer bug as well, that was caused by do_settimeofday no
      longer being callable with interrupts disabled due to the use of
      on_each_cpu().
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9eddc95
    • Z
      [PATCH] vmi: paravirt drop udelay op · eda08b1b
      Zachary Amsden 提交于
      Not respecting udelay causes problems with any virtual hardware that is passed
      through to real hardware.  This can be noticed by any device that interacts
      with the real world in real time - like AP startup, which takes real time.  Or
      keyboard LEDs, which should blink in real-time.  Or floppy drives, but only
      when passed through to a real floppy controller on OSes which can't
      sufficiently buffer the floppy commands to emulate a zero latency floppy.  Or
      IDE drives, when connecting to a physical CDROM.
      
      This was mostly a hack to get the kernel to boot faster, but it introduced a
      number of misvirtualization bugs, and Alan and Pavel argued pretty strongly
      against it.  We were the only client, and now want to clean up this cruft.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eda08b1b
    • Z
      [PATCH] vmi: fix highpte · 9a1c13e9
      Zachary Amsden 提交于
      Provide a PT map hook for HIGHPTE kernels to designate where they are mapping
      page tables.  This information is required so the physical address of PTE
      updates can be determined; otherwise, the mm layer would have to carry the
      physical address all the way to each PTE modification callsite, which is even
      more hideous that the macros required to provide the proper hooks.
      
      So lets not mess up arch neutral code to achieve this, but keep the horror in
      an #ifdef HIGHPTE in include/asm-i386/pgtable.h.  I had to use macros here
      because some types are not yet defined in all the include paths for this
      header.
      
      This patch is absolutely required for HIGHPTE kernels to operate properly with
      VMI.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a1c13e9
    • Z
      [PATCH] vmi: cpu cycles fix · 1182d852
      Zachary Amsden 提交于
      In order to share the common code in tsc.c which does CPU Khz calibration, we
      need to make an accurate value of CPU speed available to the tsc.c code.  This
      value loses a lot of precision in a VM because of the timing differences with
      real hardware, but we need it to be as precise as possible so the guest can
      make accurate time calculations with the cycle counters.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1182d852
    • Z
      [PATCH] vmi: sched clock paravirt op fix · 6cb9a835
      Zachary Amsden 提交于
      The custom_sched_clock hook is broken.  The result from sched_clock needs to
      be in nanoseconds, not in CPU cycles.  The TSC is insufficient for this
      purpose, because TSC is poorly defined in a virtual environment, and mostly
      represents real world time instead of scheduled process time (which can be
      interrupted without notice when a virtual machine is descheduled).
      
      To make the scheduler consistent, we must expose a different nature of time,
      that is scheduled time.  So deprecate this custom_sched_clock hack and turn it
      into a paravirt-op, as it should have been all along.  This allows the tsc.c
      code which converts cycles to nanoseconds to be shared by all paravirt-ops
      backends.
      
      It is unfortunate to add a new paravirt-op, but this is a very distinct
      abstraction which is clearly different for all virtual machine
      implementations, and it gets rid of an ugly indirect function which I
      ashamedly admit I hacked in to try to get this to work earlier, and then even
      got in the wrong units.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6cb9a835
    • Z
      [PATCH] vmi: timer fixes round two · 7507ba34
      Zachary Amsden 提交于
      Critical bugfixes for the VMI-Timer code.
      
      1) Do not setup a one shot alarm if we are keeping the periodic alarm
         armed.  Additionally, since the periodic alarm can be run at a lower rate
         than HZ, let's fixup the guard to the no-idle-hz mode appropriately.  This
         fixes the bug where the no-idle-hz mode might have a higher interrupt rate
         than the non-idle case.
      
      2) The interrupt handler can no longer adjust xtime due to nested lock
         acquisition.  Drop this.  We don't need to check for wallclock time at
         every tick, it can be done in userspace instead.
      
      3) Add a bypass to disable noidle operation.  This is useful as a last
         minute workaround, or testing measure.
      
      4) The code to skip the IO_APIC timer testing (no_timer_check) should be
         conditional on IO_APIC, not SMP, since UP kernels can have this configured
         in as well.
      Signed-off-by: NDan Hecht <dhecht@vmware.com>
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7507ba34
  12. 13 2月, 2007 2 次提交
    • Z
      [PATCH] i386: vMI timer patches · bbab4f3b
      Zachary Amsden 提交于
      VMI timer code.  It works by taking over the local APIC clock when APIC is
      configured, which requires a couple hooks into the APIC code.  The backend
      timer code could be commonized into the timer infrastructure, but there are
      some pieces missing (stolen time, in particular), and the exact semantics of
      when to do accounting for NO_IDLE need to be shared between different
      hypervisors as well.  So for now, VMI timer is a separate module.
      
      [Adrian Bunk: cleanups]
      
      Subject: VMI timer patches
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      bbab4f3b
    • Z
      [PATCH] i386: vMI backend for paravirt-ops · 7ce0bcfd
      Zachary Amsden 提交于
      Fairly straightforward implementation of VMI backend for paravirt-ops.
      
      [Adrian Bunk: some cleanups]
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      7ce0bcfd