1. 30 1月, 2008 9 次提交
  2. 30 11月, 2007 1 次提交
    • J
      x86/paravirt: revert exports to restore old behaviour · f97b8954
      Jeremy Fitzhardinge 提交于
      Subdividing the paravirt_ops structure caused a regression in certain
      non-GPL modules which try to use mmu_ops and cpu_ops.  This restores the
      old behaviour, and makes it consistent with the non-CONFIG_PARAVIRT case.
      
      Takashi Iwai <tiwai@suse.de> adds:
      > I took at this problem (as I have an nvidia card on one of my
      > workstations), and found out that the following suffer from
      > EXPORT_SYMBOL_GPL changes:
      >
      > * local_disable_irq(), local_irq_save*(), etc.
      > * MSR-related macros like rdmsr(), wrmsr(), read_cr0(), etc.
      >   wbinvd(), too.
      > * pmd_val(), pgd_val(), etc are all involved with pv_mm_ops.
      >   pmd_large() and pmd_bad() is also indirectly involved.
      >   __flush_tlb() and friends suffer, too.
      
      Christoph Hellwig objects to this patch on the grounds that modules
      shouldn't be using these operations anyway.  I don't think this is a
      particularly good reason to reject the patch, for several reasons:
      
      1. These operations are still available to modules when not using
         CONFIG_PARAVIRT, since they are implicitly exported as inline
         functions via the kernel headers.  Exporting the same functionality as
         GPL-only symbols just adds a gratuitious difference between
         CONFIG_PARAVIRT and non-CONFIG_PARAVIRT configurations.  If we really
         think these operations are not for module use (or non-GPL module use),
         then we should solve the problem in a general way.
      
      2. It's a regression from previous kernels, which would work these
         modules even with CONFIG_PARAVIRT enabled.
      
      3. The operations in question seem pretty reasonable for modules to
         use.  The control registers/MSRs can be accessed directly anyway, so there's
         no benefit in preventing modules from using standard interfaces.  And it seems
         reasonable to allow a graphics driver to create its own mappings if it wants.
      
      Therefore, I think this patch should go in for 2.6.24.  If people
      really think that these operations should not be available to modules,
      then we can address that separately.
      Signed-off-by: NJeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
      Cc: Tobias Powalowski <t.powa@gmx.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f97b8954
  3. 17 10月, 2007 2 次提交
    • J
      paravirt: clean up lazy mode handling · 8965c1c0
      Jeremy Fitzhardinge 提交于
      Currently, the set_lazy_mode pv_op is overloaded with 5 functions:
       1. enter lazy cpu mode
       2. leave lazy cpu mode
       3. enter lazy mmu mode
       4. leave lazy mmu mode
       5. flush pending batched operations
      
      This complicates each paravirt backend, since it needs to deal with
      all the possible state transitions, handling flushing, etc. In
      particular, flushing is quite distinct from the other 4 functions, and
      seems to just cause complication.
      
      This patch removes the set_lazy_mode operation, and adds "enter" and
      "leave" lazy mode operations on mmu_ops and cpu_ops.  All the logic
      associated with enter and leaving lazy states is now in common code
      (basically BUG_ONs to make sure that no mode is current when entering
      a lazy mode, and make sure that the mode is current when leaving).
      Also, flush is handled in a common way, by simply leaving and
      re-entering the lazy mode.
      
      The result is that the Xen, lguest and VMI lazy mode implementations
      are much simpler.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Zach Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Anthony Liguory <aliguori@us.ibm.com>
      Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      8965c1c0
    • J
      paravirt: refactor struct paravirt_ops into smaller pv_*_ops · 93b1eab3
      Jeremy Fitzhardinge 提交于
      This patch refactors the paravirt_ops structure into groups of
      functionally related ops:
      
      pv_info - random info, rather than function entrypoints
      pv_init_ops - functions used at boot time (some for module_init too)
      pv_misc_ops - lazy mode, which didn't fit well anywhere else
      pv_time_ops - time-related functions
      pv_cpu_ops - various privileged instruction ops
      pv_irq_ops - operations for managing interrupt state
      pv_apic_ops - APIC operations
      pv_mmu_ops - operations for managing pagetables
      
      There are several motivations for this:
      
      1. Some of these ops will be general to all x86, and some will be
         i386/x86-64 specific.  This makes it easier to share common stuff
         while allowing separate implementations where needed.
      
      2. At the moment we must export all of paravirt_ops, but modules only
         need selected parts of it.  This allows us to export on a case by case
         basis (and also choose which export license we want to apply).
      
      3. Functional groupings make things a bit more readable.
      
      Struct paravirt_ops is now only used as a template to generate
      patch-site identifiers, and to extract function pointers for inserting
      into jmp/calls when patching.  It is only instantiated when needed.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Zach Amsden <zach@vmware.com>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Anthony Liguory <aliguori@us.ibm.com>
      Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      93b1eab3
  4. 11 10月, 2007 2 次提交
  5. 12 8月, 2007 1 次提交
    • A
      i386: Make patching more robust, fix paravirt issue · ab144f5e
      Andi Kleen 提交于
      Commit 19d36ccd "x86: Fix alternatives
      and kprobes to remap write-protected kernel text" uses code which is
      being patched for patching.
      
      In particular, paravirt_ops does patching in two stages: first it
      calls paravirt_ops.patch, then it fills any remaining instructions
      with nop_out().  nop_out calls text_poke() which calls
      lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val):
      that call site is one of the places we patch.
      
      If we always do patching as one single call to text_poke(), we only
      need make sure we're not patching the memcpy in text_poke itself.
      This means the prototype to paravirt_ops.patch needs to change, to
      marshal the new code into a buffer rather than patching in place as it
      does now.  It also means all patching goes through text_poke(), which
      is known to be safe (apply_alternatives is also changed to make a
      single patch).
      
      AK: fix compilation on x86-64 (bad rusty!)
      AK: fix boot on x86-64 (sigh)
      AK: merged with other patches
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ab144f5e
  6. 23 7月, 2007 2 次提交
  7. 18 7月, 2007 2 次提交
    • J
      Add a sched_clock paravirt_op · 688340ea
      Jeremy Fitzhardinge 提交于
      The tsc-based get_scheduled_cycles interface is not a good match for
      Xen's runstate accounting, which reports everything in nanoseconds.
      
      This patch replaces this interface with a sched_clock interface, which
      matches both Xen and VMI's requirements.
      
      In order to do this, we:
         1. replace get_scheduled_cycles with sched_clock
         2. hoist cycles_2_ns into a common header
         3. update vmi accordingly
      
      One thing to note: because sched_clock is implemented as a weak
      function in kernel/sched.c, we must define a real function in order to
      override this weak binding.  This means the usual paravirt_ops
      technique of using an inline function won't work in this case.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Dan Hecht <dhecht@vmware.com>
      Cc: john stultz <johnstul@us.ibm.com>
      688340ea
    • J
      paravirt: helper to disable all IO space · d572929c
      Jeremy Fitzhardinge 提交于
      In a virtual environment, device drivers such as legacy IDE will waste
      quite a lot of time probing for their devices which will never appear.
      This helper function allows a paravirt implementation to lay claim to
      the whole iomem and ioport space, thereby disabling all device drivers
      trying to claim IO resources.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NChris Wright <chrisw@sous-sol.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      d572929c
  8. 11 5月, 2007 1 次提交
    • E
      Revert "[PATCH] paravirt: Add startup infrastructure for paravirtualization" · 5a18c92a
      Eric W. Biederman 提交于
      This reverts commit c9ccf30d.
      
      Entering the kernel at startup_32 without passing our real mode data in
      %esi, and without guaranteeing that physical and virtual addresses are
      identity mapped makes head.S impossible to maintain.
      
      The only user of this infrastructure is lguest which is not merged so
      nothing we currently support will break by removing this over designed
      nightmare, and only the pending lguest patches will be affected.  The
      pending Xen patches have a different entry point that they use.
      
      We are currently discussing what Xen and lguest need to do to boot the
      kernel in a more normal fashion so using startup_32 in this weird manner is
      clearly not their long term direction.
      
      So let's remove this code in head.S before it causes brain damage to people
      trying to maintain head.S
      
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Zachary Amsden <zach@vmware.com>
      CC: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5a18c92a
  9. 03 5月, 2007 14 次提交
    • J
      [PATCH] i386: PARAVIRT: fix startup_ipi_hook config dependency · 0260c196
      Jeremy Fitzhardinge 提交于
      startup_ipi_hook depends on CONFIG_X86_LOCAL_APIC, so move it to the
      right part of the paravirt_ops initialization.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      0260c196
    • A
      [PATCH] i386: PARAVIRT: Export paravirt_ops for non GPL modules too · 21564fd2
      Andi Kleen 提交于
      Otherwise non GPL modules cannot even do basic operations
      like disabling interrupts anymore, which would be excessive.
      
      Longer term should split the single structure up into
      internal and external symbols and not export the internal
      ones at all.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      21564fd2
    • J
      [PATCH] i386: PARAVIRT: drop unused ptep_get_and_clear · 4cdd9c89
      Jeremy Fitzhardinge 提交于
      In shadow mode hypervisors, ptep_get_and_clear achieves the desired
      purpose of keeping the shadows in sync by issuing a native_get_and_clear,
      followed by a call to pte_update, which indicates the PTE has been
      modified.
      
      Direct mode hypervisors (Xen) have no need for this anyway, and will trap
      the update using writable pagetables.
      
      This means no hypervisor makes use of ptep_get_and_clear; there is no
      reason to have it in the paravirt-ops structure.  Change confusing
      terminology about raw vs. native functions into consistent use of
      native_pte_xxx for operations which do not invoke paravirt-ops.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      4cdd9c89
    • J
      [PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages · ce6234b5
      Jeremy Fitzhardinge 提交于
      Xen and VMI both have special requirements when mapping a highmem pte
      page into the kernel address space.  These can be dealt with by adding
      a new kmap_atomic_pte() function for mapping highptes, and hooking it
      into the paravirt_ops infrastructure.
      
      Xen specifically wants to map the pte page RO, so this patch exposes a
      helper function, kmap_atomic_prot, which maps the page with the
      specified page protections.
      
      This also adds a kmap_flush_unused() function to clear out the cached
      kmap mappings.  Xen needs this to clear out any potential stray RW
      mappings of pages which will become part of a pagetable.
      
      [ Zach - vmi.c will need some attention after this patch.  It wasn't
        immediately obvious to me what needs to be done. ]
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      ce6234b5
    • J
      [PATCH] i386: PARAVIRT: revert map_pt_hook. · a27fe809
      Jeremy Fitzhardinge 提交于
      Back out the map_pt_hook to clear the way for kmap_atomic_pte.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      a27fe809
    • J
      [PATCH] i386: PARAVIRT: add flush_tlb_others paravirt_op · d4c10477
      Jeremy Fitzhardinge 提交于
      This patch adds a pv_op for flush_tlb_others.  Linux running on native
      hardware uses cross-CPU IPIs to flush the TLB on any CPU which may
      have a particular mm's pagetable entries cached in its TLB.  This is
      inefficient in a paravirtualized environment, since the hypervisor
      knows which real CPUs actually contain cached mappings, which may be a
      small subset of a guest's VCPUs.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      d4c10477
    • J
      [PATCH] i386: PARAVIRT: add common patching machinery · 63f70270
      Jeremy Fitzhardinge 提交于
      Implement the actual patching machinery.  paravirt_patch_default()
      contains the logic to automatically patch a callsite based on a few
      simple rules:
      
       - if the paravirt_op function is paravirt_nop, then patch nops
       - if the paravirt_op function is a jmp target, then jmp to it
       - if the paravirt_op function is callable and doesn't clobber too much
          for the callsite, call it directly
      
      paravirt_patch_default is suitable as a default implementation of
      paravirt_ops.patch, will remove most of the expensive indirect calls
      in favour of either a direct call or a pile of nops.
      
      Backends may implement their own patcher, however.  There are several
      helper functions to help with this:
      
      paravirt_patch_nop	nop out a callsite
      paravirt_patch_ignore	leave the callsite as-is
      paravirt_patch_call	patch a call if the caller and callee
      			have compatible clobbers
      paravirt_patch_jmp	patch in a jmp
      paravirt_patch_insns	patch some literal instructions over
      			the callsite, if they fit
      
      This patch also implements more direct patches for the native case, so
      that when running on native hardware many common operations are
      implemented inline.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Anthony Liguori <anthony@codemonkey.ws>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      63f70270
    • J
      [PATCH] i386: PARAVIRT: Use patch site IDs computed from offset in paravirt_ops structure · d5822035
      Jeremy Fitzhardinge 提交于
      Use patch type identifiers derived from the offset of the operation in
      the paravirt_ops structure.  This avoids having to maintain a separate
      enum for patch site types.
      
      Also, since the identifier is derived from the offset into
      paravirt_ops, the offset can be derived from the identifier.  This is
      used to remove replicated information in the various callsite macros,
      which has been a source of bugs in the past.
      
      This patch also drops the fused save_fl+cli operation, which doesn't
      really add much and makes things more complex - specifically because
      it breaks the 1:1 relationship between identifiers and offsets.  If
      this operation turns out to be particularly beneficial, then the right
      answer is to define a new entrypoint for it.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      d5822035
    • J
      [PATCH] x86: PARAVIRT: add hooks to intercept mm creation and destruction · d6dd61c8
      Jeremy Fitzhardinge 提交于
      Add hooks to allow a paravirt implementation to track the lifetime of
      an mm.  Paravirtualization requires three hooks, but only two are
      needed in common code.  They are:
      
      arch_dup_mmap, which is called when a new mmap is created at fork
      
      arch_exit_mmap, which is called when the last process reference to an
        mm is dropped, which typically happens on exit and exec.
      
      The third hook is activate_mm, which is called from the arch-specific
      activate_mm() macro/function, and so doesn't need stub versions for
      other architectures.  It's called when an mm is first used.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: linux-arch@vger.kernel.org
      Cc: James Bottomley <James.Bottomley@SteelEye.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      d6dd61c8
    • J
      [PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing · 5311ab62
      Jeremy Fitzhardinge 提交于
      Normally when running in PAE mode, the 4th PMD maps the kernel address space,
      which can be shared among all processes (since they all need the same kernel
      mappings).
      
      Xen, however, does not allow guests to have the kernel pmd shared between page
      tables, so parameterize pgtable.c to allow both modes of operation.
      
      There are several side-effects of this.  One is that vmalloc will update the
      kernel address space mappings, and those updates need to be propagated into
      all processes if the kernel mappings are not intrinsically shared.  In the
      non-PAE case, this is done by maintaining a pgd_list of all processes; this
      list is used when all process pagetables must be updated.  pgd_list is
      threaded via otherwise unused entries in the page structure for the pgd, which
      means that the pgd must be page-sized for this to work.
      
      Normally the PAE pgd is only 4x64 byte entries large, but Xen requires the PAE
      pgd to page aligned anyway, so this patch forces the pgd to be page
      aligned+sized when the kernel pmd is unshared, to accomodate both these
      requirements.
      
      Also, since there may be several distinct kernel pmds (if the user/kernel
      split is below 3G), there's no point in allocating them from a slab cache;
      they're just allocated with get_free_page and initialized appropriately.  (Of
      course the could be cached if there is just a single kernel pmd - which is the
      default with a 3G user/kernel split - but it doesn't seem worthwhile to add
      yet another case into this code).
      
      [ Many thanks to wli for review comments. ]
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NWilliam Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      5311ab62
    • J
      [PATCH] i386: PARAVIRT: Hooks to set up initial pagetable · b239fb25
      Jeremy Fitzhardinge 提交于
      This patch introduces paravirt_ops hooks to control how the kernel's
      initial pagetable is set up.
      
      In the case of a native boot, the very early bootstrap code creates a
      simple non-PAE pagetable to map the kernel and physical memory.  When
      the VM subsystem is initialized, it creates a proper pagetable which
      respects the PAE mode, large pages, etc.
      
      When booting under a hypervisor, there are many possibilities for what
      paging environment the hypervisor establishes for the guest kernel, so
      the constructon of the kernel's pagetable depends on the hypervisor.
      
      In the case of Xen, the hypervisor boots the kernel with a fully
      constructed pagetable, which is already using PAE if necessary.  Also,
      Xen requires particular care when constructing pagetables to make sure
      all pagetables are always mapped read-only.
      
      In order to make this easier, kernel's initial pagetable construction
      has been changed to only allocate and initialize a pagetable page if
      there's no page already present in the pagetable.  This allows the Xen
      paravirt backend to make a copy of the hypervisor-provided pagetable,
      allowing the kernel to establish any more mappings it needs while
      keeping the existing ones.
      
      A slightly subtle point which is worth highlighting here is that Xen
      requires all kernel mappings to share the same pte_t pages between all
      pagetables, so that updating a kernel page's mapping in one pagetable
      is reflected in all other pagetables.  This makes it possible to
      allocate a page and attach it to a pagetable without having to
      explicitly enumerate that page's mapping in all pagetables.
      
      And:
      
      +From: "Eric W. Biederman" <ebiederm@xmission.com>
      
      If we don't set the leaf page table entries it is quite possible that
      will inherit and incorrect page table entry from the initial boot
      page table setup in head.S.  So we need to redo the effort here,
      so we pick up PSE, PGE and the like.
      
      Hypervisors like Xen require that their page tables be read-only,
      which is slightly incompatible with our low identity mappings, however
      I discussed this with Jeremy he has modified the Xen early set_pte
      function to avoid problems in this area.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Acked-by: NWilliam Irwin <bill.irwin@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      b239fb25
    • J
      [PATCH] i386: PARAVIRT: Add pagetable accessors to pack and unpack pagetable entries · 3dc494e8
      Jeremy Fitzhardinge 提交于
      Add a set of accessors to pack, unpack and modify page table entries
      (at all levels).  This allows a paravirt implementation to control the
      contents of pgd/pmd/pte entries.  For example, Xen uses this to
      convert the (pseudo-)physical address into a machine address when
      populating a pagetable entry, and converting back to pphys address
      when an entry is read.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      3dc494e8
    • J
      [PATCH] i386: PARAVIRT: use paravirt_nop to consistently mark no-op operations · 45876233
      Jeremy Fitzhardinge 提交于
      Add a _paravirt_nop function for use as a stub for no-op operations,
      and paravirt_nop #defined void * version to make using it easier
      (since all its uses are as a void *).
      
      This is useful to allow the patcher to automatically identify noop
      operations so it can simply nop out the callsite.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      [mingo] but only as a cleanup of the current open-coded (void *) casts.
      My problem with this is that it loses the types. Not that there is much
      to check for, but still, this adds some assumptions about how function
      calls look like
      45876233
    • R
      [PATCH] i386: rationalize paravirt wrappers · 90a0a06a
      Rusty Russell 提交于
      paravirt.c used to implement native versions of all low-level
      functions.  Far cleaner is to have the native versions exposed in the
      headers and as inline native_XXX, and if !CONFIG_PARAVIRT, then simply
      #define XXX native_XXX.
      
      There are several nice side effects:
      
      1) write_dt_entry() now takes the correct "struct Xgt_desc_struct *"
         not "void *".
      
      2) load_TLS is reintroduced to the for loop, not manually unrolled
         with a #error in case the bounds ever change.
      
      3) Macros become inlines, with type checking.
      
      4) Access to the native versions is trivial for KVM, lguest, Xen and
         others who might want it.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Avi Kivity <avi@qumranet.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      90a0a06a
  10. 05 3月, 2007 5 次提交
    • Z
      [PATCH] vmi: pit override · e30fab3a
      Zachary Amsden 提交于
      The time_init_hook in paravirt-ops no longer functions in the correct manner
      after the integration of the hrtimers code.  The problem is that now the call
      path for time initialization is:
      
        time_init :
             late_time_init = hpet_time_init;
      
        late_time_init -> hpet_time_init:
             setup_pit_timer (BAD)
             do_time_init --> (via paravirt.h)
                time_init_hook --> (via arch_hooks.h)
                    time_init_hook (in SUBARCH/setup.c)
      
      If this isn't confusing enough, the paravirt case goes through an indirect
      function pointer in the paravirt-ops table.  The problem is, by the time the
      paravirt hook is called, the pit timer is already enabled.
      
      But paravirt guests have their own timer, and don't want to use the PIT.
      Rather than intensify the struggle for power going on here, just make it all
      nice and simple and just unconditionally do all timer setup in the
      late_time_init hook.  This also has the advantage of enabling timers in the
      same place in all code paths, so everyone has the same bugs and we don't have
      outliers who break other code because they turn on timer too early or too
      late.
      
      So the paravirt-ops time init function is now by default hpet_time_init, which
      is the time init function used for native hardware.  Paravirt guests have the
      chance to override this when they setup the paravirt-ops table, and should
      need no change.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e30fab3a
    • Z
      [PATCH] vmi: paravirt drop udelay op · eda08b1b
      Zachary Amsden 提交于
      Not respecting udelay causes problems with any virtual hardware that is passed
      through to real hardware.  This can be noticed by any device that interacts
      with the real world in real time - like AP startup, which takes real time.  Or
      keyboard LEDs, which should blink in real-time.  Or floppy drives, but only
      when passed through to a real floppy controller on OSes which can't
      sufficiently buffer the floppy commands to emulate a zero latency floppy.  Or
      IDE drives, when connecting to a physical CDROM.
      
      This was mostly a hack to get the kernel to boot faster, but it introduced a
      number of misvirtualization bugs, and Alan and Pavel argued pretty strongly
      against it.  We were the only client, and now want to clean up this cruft.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eda08b1b
    • Z
      [PATCH] vmi: fix highpte · 9a1c13e9
      Zachary Amsden 提交于
      Provide a PT map hook for HIGHPTE kernels to designate where they are mapping
      page tables.  This information is required so the physical address of PTE
      updates can be determined; otherwise, the mm layer would have to carry the
      physical address all the way to each PTE modification callsite, which is even
      more hideous that the macros required to provide the proper hooks.
      
      So lets not mess up arch neutral code to achieve this, but keep the horror in
      an #ifdef HIGHPTE in include/asm-i386/pgtable.h.  I had to use macros here
      because some types are not yet defined in all the include paths for this
      header.
      
      This patch is absolutely required for HIGHPTE kernels to operate properly with
      VMI.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a1c13e9
    • Z
      [PATCH] vmi: cpu cycles fix · 1182d852
      Zachary Amsden 提交于
      In order to share the common code in tsc.c which does CPU Khz calibration, we
      need to make an accurate value of CPU speed available to the tsc.c code.  This
      value loses a lot of precision in a VM because of the timing differences with
      real hardware, but we need it to be as precise as possible so the guest can
      make accurate time calculations with the cycle counters.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1182d852
    • Z
      [PATCH] vmi: sched clock paravirt op fix · 6cb9a835
      Zachary Amsden 提交于
      The custom_sched_clock hook is broken.  The result from sched_clock needs to
      be in nanoseconds, not in CPU cycles.  The TSC is insufficient for this
      purpose, because TSC is poorly defined in a virtual environment, and mostly
      represents real world time instead of scheduled process time (which can be
      interrupted without notice when a virtual machine is descheduled).
      
      To make the scheduler consistent, we must expose a different nature of time,
      that is scheduled time.  So deprecate this custom_sched_clock hack and turn it
      into a paravirt-op, as it should have been all along.  This allows the tsc.c
      code which converts cycles to nanoseconds to be shared by all paravirt-ops
      backends.
      
      It is unfortunate to add a new paravirt-op, but this is a very distinct
      abstraction which is clearly different for all virtual machine
      implementations, and it gets rid of an ugly indirect function which I
      ashamedly admit I hacked in to try to get this to work earlier, and then even
      got in the wrong units.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6cb9a835
  11. 13 2月, 2007 1 次提交
    • R
      [PATCH] i386: paravirt unhandled fallthrough · 992af681
      Rusty Russell 提交于
      The current code simply calls "start_kernel" directly if we're under a
      hypervisor and no paravirt_ops backend wants us, because paravirt.c
      registers that as a backend.
      
      This was always a vain hope; start_kernel won't get far without setup.
      It's also impossible for paravirt_ops backends which don't sit in the
      arch/i386/kernel directory: they can't link before paravirt.o anyway.
      
      Keep it simple: if we pass all the registered paravirt probes, BUG().
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      992af681