1. 30 Jan, 2008 15 commits
  2. 17 Oct, 2007 2 commits
    • paravirt: clean up lazy mode handling · 8965c1c0
      Committed by Jeremy Fitzhardinge
      Currently, the set_lazy_mode pv_op is overloaded with 5 functions:
       1. enter lazy cpu mode
       2. leave lazy cpu mode
       3. enter lazy mmu mode
       4. leave lazy mmu mode
       5. flush pending batched operations
      
      This complicates each paravirt backend, since it needs to deal with
      all the possible state transitions, handling flushing, etc. In
      particular, flushing is quite distinct from the other 4 functions, and
      seems to just cause complication.
      
      This patch removes the set_lazy_mode operation, and adds "enter" and
      "leave" lazy mode operations on mmu_ops and cpu_ops.  All the logic
      associated with entering and leaving lazy states is now in common code
      (basically BUG_ONs to make sure that no mode is current when entering
      a lazy mode, and make sure that the mode is current when leaving).
      Also, flush is handled in a common way, by simply leaving and
      re-entering the lazy mode.
      
      The result is that the Xen, lguest and VMI lazy mode implementations
      are much simpler.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Zach Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Anthony Liguory <aliguori@us.ibm.com>
      Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
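The common enter/leave logic described in this commit can be sketched in userspace C as follows. This is only an illustration: `enter_lazy`, `leave_lazy`, `flush_lazy`, `struct lazy_hooks` and the counters are hypothetical names, and `assert` stands in for the kernel's BUG_ON.

```c
#include <assert.h>

/* Hypothetical names throughout; this is a sketch, not the kernel's code. */
enum lazy_mode { LAZY_NONE, LAZY_MMU, LAZY_CPU };

/* A backend only supplies simple enter/leave hooks. */
struct lazy_hooks {
    void (*enter)(void);
    void (*leave)(void);
};

enum lazy_mode current_mode = LAZY_NONE;
int enter_calls, leave_calls;

void count_enter(void) { enter_calls++; }
void count_leave(void) { leave_calls++; }

struct lazy_hooks mmu_hooks = { count_enter, count_leave };

/* Common code: no lazy mode may be current when entering one. */
void enter_lazy(enum lazy_mode mode, const struct lazy_hooks *hooks)
{
    assert(current_mode == LAZY_NONE);
    current_mode = mode;
    hooks->enter();
}

/* Common code: the mode being left must be the current one. */
void leave_lazy(enum lazy_mode mode, const struct lazy_hooks *hooks)
{
    assert(current_mode == mode);
    hooks->leave();
    current_mode = LAZY_NONE;
}

/* Flushing is simply leaving and re-entering the current lazy mode. */
void flush_lazy(const struct lazy_hooks *hooks)
{
    enum lazy_mode mode = current_mode;

    if (mode != LAZY_NONE) {
        leave_lazy(mode, hooks);
        enter_lazy(mode, hooks);
    }
}
```

With this shape a backend never sees a separate "flush" request, which is the simplification the commit message describes.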
    • paravirt: refactor struct paravirt_ops into smaller pv_*_ops · 93b1eab3
      Committed by Jeremy Fitzhardinge
      This patch refactors the paravirt_ops structure into groups of
      functionally related ops:
      
      pv_info - random info, rather than function entrypoints
      pv_init_ops - functions used at boot time (some for module_init too)
      pv_misc_ops - lazy mode, which didn't fit well anywhere else
      pv_time_ops - time-related functions
      pv_cpu_ops - various privileged instruction ops
      pv_irq_ops - operations for managing interrupt state
      pv_apic_ops - APIC operations
      pv_mmu_ops - operations for managing pagetables
      
      There are several motivations for this:
      
      1. Some of these ops will be general to all x86, and some will be
         i386/x86-64 specific.  This makes it easier to share common stuff
         while allowing separate implementations where needed.
      
      2. At the moment we must export all of paravirt_ops, but modules only
         need selected parts of it.  This allows us to export on a case by case
         basis (and also choose which export license we want to apply).
      
      3. Functional groupings make things a bit more readable.
      
      Struct paravirt_ops is now only used as a template to generate
      patch-site identifiers, and to extract function pointers for inserting
      into jmp/calls when patching.  It is only instantiated when needed.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Zach Amsden <zach@vmware.com>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Anthony Liguory <aliguori@us.ibm.com>
      Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
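A rough sketch of what splitting one large ops table into functional groups looks like. The group names `pv_time_ops` and `pv_irq_ops` come from the commit message; the member lists, the `native_*` and `hv_*` functions and the `irqs_enabled` flag are assumptions made for illustration.

```c
#include <assert.h>

/* Members are illustrative assumptions, not the kernel's actual field lists. */
struct pv_time_ops {
    unsigned long long (*sched_clock)(void);
};

struct pv_irq_ops {
    void (*irq_disable)(void);
    void (*irq_enable)(void);
};

int irqs_enabled = 1;   /* stand-in for the CPU interrupt flag */

unsigned long long native_sched_clock(void) { return 0; }
void native_irq_disable(void) { irqs_enabled = 0; }
void native_irq_enable(void) { irqs_enabled = 1; }

/* Each group is its own object, so it could be exported separately. */
struct pv_time_ops pv_time_ops = {
    .sched_clock = native_sched_clock,
};

struct pv_irq_ops pv_irq_ops = {
    .irq_disable = native_irq_disable,
    .irq_enable = native_irq_enable,
};

/* A hypothetical backend overrides only the group it cares about. */
unsigned long long hv_sched_clock(void) { return 12345; }

void hv_init(void)
{
    pv_time_ops.sched_clock = hv_sched_clock;
}
```

Because the groups are independent objects, a module that only needs interrupt-state operations would depend on `pv_irq_ops` alone.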
  3. 11 Oct, 2007 1 commit
  4. 12 Aug, 2007 1 commit
    • i386: Make patching more robust, fix paravirt issue · ab144f5e
      Committed by Andi Kleen
      Commit 19d36ccd "x86: Fix alternatives and kprobes to remap
      write-protected kernel text" uses code which is being patched for
      patching.
      
      In particular, paravirt_ops does patching in two stages: first it
      calls paravirt_ops.patch, then it fills any remaining instructions
      with nop_out().  nop_out calls text_poke() which calls
      lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val):
      that call site is one of the places we patch.
      
      If we always do patching as one single call to text_poke(), we only
      need to make sure we're not patching the memcpy in text_poke itself.
      This means the prototype to paravirt_ops.patch needs to change, to
      marshal the new code into a buffer rather than patching in place as it
      does now.  It also means all patching goes through text_poke(), which
      is known to be safe (apply_alternatives is also changed to make a
      single patch).
      
      AK: fix compilation on x86-64 (bad rusty!)
      AK: fix boot on x86-64 (sigh)
      AK: merged with other patches
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
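The "marshal into a buffer, then patch once" idea can be sketched like this. All names here (`poke`, `patch_fn`, `example_patch`, `apply_patch`, `MAX_PATCH_LEN`) are hypothetical, `poke` is only a stand-in for text_poke(), and padding with single-byte 0x90 nops is a simplification.

```c
#include <assert.h>
#include <string.h>

#define NOP 0x90            /* single-byte x86 nop */
#define MAX_PATCH_LEN 16    /* assumed upper bound on a call site's length */

int poke_count;

/* Stand-in for text_poke(): the only function that writes to the call site. */
void poke(unsigned char *site, const unsigned char *buf, unsigned len)
{
    memcpy(site, buf, len);
    poke_count++;
}

/* A backend patcher fills buf with replacement code and returns bytes used. */
typedef unsigned (*patch_fn)(unsigned char *buf, unsigned len);

/* Example patcher: emits "pushf; pop %eax" (0x9c 0x58) if it fits. */
unsigned example_patch(unsigned char *buf, unsigned len)
{
    if (len < 2)
        return 0;
    buf[0] = 0x9c;
    buf[1] = 0x58;
    return 2;
}

/* Build the whole replacement in a buffer, then write it in one step. */
void apply_patch(unsigned char *site, unsigned len, patch_fn patch)
{
    unsigned char buf[MAX_PATCH_LEN];
    unsigned used;

    assert(len <= MAX_PATCH_LEN);
    used = patch(buf, len);
    assert(used <= len);
    memset(buf + used, NOP, len - used);   /* pad the remainder with nops */
    poke(site, buf, len);                  /* single write to the call site */
}
```

The point of the design is that the call site is never left half-patched between the backend's code and the nop padding.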
  5. 18 Jul, 2007 4 commits
    • Add a sched_clock paravirt_op · 688340ea
      Committed by Jeremy Fitzhardinge
      The tsc-based get_scheduled_cycles interface is not a good match for
      Xen's runstate accounting, which reports everything in nanoseconds.
      
      This patch replaces this interface with a sched_clock interface, which
      matches both Xen and VMI's requirements.
      
      In order to do this, we:
         1. replace get_scheduled_cycles with sched_clock
         2. hoist cycles_2_ns into a common header
         3. update vmi accordingly
      
      One thing to note: because sched_clock is implemented as a weak
      function in kernel/sched.c, we must define a real function in order to
      override this weak binding.  This means the usual paravirt_ops
      technique of using an inline function won't work in this case.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Dan Hecht <dhecht@vmware.com>
      Cc: john stultz <johnstul@us.ibm.com>
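For readers unfamiliar with converting TSC cycles to nanoseconds, here is a minimal fixed-point sketch of that kind of conversion. The scale factor of 10 bits, the function signatures and the name `cyc2ns_scale_for` are assumptions chosen for illustration; the commit itself only says that cycles_2_ns was hoisted into a common header.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point precision: scale values carry 10 fractional bits. */
#define CYC2NS_SCALE_FACTOR 10

/* Precompute nanoseconds-per-cycle as a fixed-point number.
 * ns per cycle = 10^6 / cpu_khz, so scale = (10^6 << 10) / cpu_khz. */
uint64_t cyc2ns_scale_for(uint64_t cpu_khz)
{
    return (UINT64_C(1000000) << CYC2NS_SCALE_FACTOR) / cpu_khz;
}

/* Convert a cycle count to nanoseconds with one multiply and one shift. */
uint64_t cycles_2_ns(uint64_t cycles, uint64_t scale)
{
    return (cycles * scale) >> CYC2NS_SCALE_FACTOR;
}
```

For a 2 GHz CPU (`cpu_khz` = 2000000) the scale is 512, so 2000 cycles convert to 1000 ns.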
    • paravirt: helper to disable all IO space · d572929c
      Committed by Jeremy Fitzhardinge
      In a virtual environment, device drivers such as legacy IDE will waste
      quite a lot of time probing for their devices which will never appear.
      This helper function allows a paravirt implementation to lay claim to
      the whole iomem and ioport space, thereby disabling all device drivers
      trying to claim IO resources.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Chris Wright <chrisw@sous-sol.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
    • paravirt: add a hook for once the allocator is ready · 6996d3b6
      Committed by Jeremy Fitzhardinge
      Add a hook so that the paravirt backend knows when the allocator is
      ready.  This is useful for the obvious reason that the allocator is
      available, but the other side-effect of having the bootmem allocator
      available is that each page now has an associated "struct page".
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
    • paravirt: add an "mm" argument to alloc_pt · fdb4c338
      Committed by Jeremy Fitzhardinge
      It's useful to know which mm is allocating a pagetable.  Xen uses this
      to determine whether the pagetable being added to is pinned or not.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
  6. 26 Jun, 2007 1 commit
  7. 11 May, 2007 1 commit
    • Revert "[PATCH] paravirt: Add startup infrastructure for paravirtualization" · 5a18c92a
      Committed by Eric W. Biederman
      This reverts commit c9ccf30d.
      
      Entering the kernel at startup_32 without passing our real mode data in
      %esi, and without guaranteeing that physical and virtual addresses are
      identity mapped makes head.S impossible to maintain.
      
      The only user of this infrastructure is lguest which is not merged so
      nothing we currently support will break by removing this over-designed
      nightmare, and only the pending lguest patches will be affected.  The
      pending Xen patches have a different entry point that they use.
      
      We are currently discussing what Xen and lguest need to do to boot the
      kernel in a more normal fashion so using startup_32 in this weird manner is
      clearly not their long term direction.
      
      So let's remove this code in head.S before it causes brain damage to people
      trying to maintain head.S
      
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Zachary Amsden <zach@vmware.com>
      CC: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 10 May, 2007 1 commit
  9. 03 May, 2007 14 commits
    • [PATCH] i386: PARAVIRT: drop unused ptep_get_and_clear · 4cdd9c89
      Committed by Jeremy Fitzhardinge
      In shadow mode hypervisors, ptep_get_and_clear achieves the desired
      purpose of keeping the shadows in sync by issuing a native_get_and_clear,
      followed by a call to pte_update, which indicates the PTE has been
      modified.
      
      Direct mode hypervisors (Xen) have no need for this anyway, and will trap
      the update using writable pagetables.
      
      This means no hypervisor makes use of ptep_get_and_clear; there is no
      reason to have it in the paravirt-ops structure.  Change confusing
      terminology about raw vs. native functions into consistent use of
      native_pte_xxx for operations which do not invoke paravirt-ops.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: PARAVIRT: Clean up paravirt patchable wrappers · 1a45b7aa
      Committed by Jeremy Fitzhardinge
      Replace all the open-coded macros for generating calls with a pair of
      more general macros (__PVOP_CALL/VCALL), and redefine all the
      PVOP_V?CALL[0-4] in terms of them.
      
      [ Andrew, Andi: this should slot in immediately after "Document asm-i386/paravirt.h"
        (paravirt_ops-document-asm-i386-paravirth.patch) ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
    • [PATCH] i386: PARAVIRT: Use enums for paravirt lazy flush modi · 4e0fa856
      Committed by Jeremy Fitzhardinge
      Remove #defines, add enum for PARAVIRT_LAZY_FLUSH.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages · ce6234b5
      Committed by Jeremy Fitzhardinge
      Xen and VMI both have special requirements when mapping a highmem pte
      page into the kernel address space.  These can be dealt with by adding
      a new kmap_atomic_pte() function for mapping highptes, and hooking it
      into the paravirt_ops infrastructure.
      
      Xen specifically wants to map the pte page RO, so this patch exposes a
      helper function, kmap_atomic_prot, which maps the page with the
      specified page protections.
      
      This also adds a kmap_flush_unused() function to clear out the cached
      kmap mappings.  Xen needs this to clear out any potential stray RW
      mappings of pages which will become part of a pagetable.
      
      [ Zach - vmi.c will need some attention after this patch.  It wasn't
        immediately obvious to me what needs to be done. ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
    • [PATCH] i386: PARAVIRT: revert map_pt_hook. · a27fe809
      Committed by Jeremy Fitzhardinge
      Back out the map_pt_hook to clear the way for kmap_atomic_pte.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
    • [PATCH] i386: PARAVIRT: add flush_tlb_others paravirt_op · d4c10477
      Committed by Jeremy Fitzhardinge
      This patch adds a pv_op for flush_tlb_others.  Linux running on native
      hardware uses cross-CPU IPIs to flush the TLB on any CPU which may
      have a particular mm's pagetable entries cached in its TLB.  This is
      inefficient in a paravirtualized environment, since the hypervisor
      knows which real CPUs actually contain cached mappings, which may be a
      small subset of a guest's VCPUs.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
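To make the motivation concrete, the sketch below contrasts a native-style implementation that signals every CPU in a mask with a hypothetical hypervisor backend that hands the whole mask over in one request. The `cpumask_t` typedef, the counters and the `hv_flush_tlb_others` function are assumptions for illustration and do not reflect the kernel's real signatures.

```c
#include <assert.h>

typedef unsigned long cpumask_t;   /* assumed: one bit per virtual CPU */

int ipis_sent;          /* counts simulated cross-CPU interrupts */
int hypercalls_made;    /* counts simulated hypervisor requests */

/* Native-style behaviour: one IPI for every CPU set in the mask. */
void native_flush_tlb_others(cpumask_t mask)
{
    for (; mask; mask >>= 1)
        if (mask & 1)
            ipis_sent++;
}

/* Hypothetical backend: pass the mask to the hypervisor in one call and
 * let it decide which physical CPUs really need flushing. */
void hv_flush_tlb_others(cpumask_t mask)
{
    (void)mask;
    hypercalls_made++;
}

/* The pv_op is a function pointer that a backend may replace. */
void (*flush_tlb_others)(cpumask_t mask) = native_flush_tlb_others;
```

Callers go through `flush_tlb_others` and do not need to know which implementation is installed.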
    • [PATCH] i386: PARAVIRT: add common patching machinery · 63f70270
      Committed by Jeremy Fitzhardinge
      Implement the actual patching machinery.  paravirt_patch_default()
      contains the logic to automatically patch a callsite based on a few
      simple rules:
      
       - if the paravirt_op function is paravirt_nop, then patch nops
       - if the paravirt_op function is a jmp target, then jmp to it
       - if the paravirt_op function is callable and doesn't clobber too much
          for the callsite, call it directly
      
      paravirt_patch_default is suitable as a default implementation of
      paravirt_ops.patch, and will remove most of the expensive indirect calls
      in favour of either a direct call or a pile of nops.
      
      Backends may implement their own patcher, however.  There are several
      helper functions to help with this:
      
      paravirt_patch_nop     nop out a callsite
      paravirt_patch_ignore  leave the callsite as-is
      paravirt_patch_call    patch a call if the caller and callee
                             have compatible clobbers
      paravirt_patch_jmp     patch in a jmp
      paravirt_patch_insns   patch some literal instructions over
                             the callsite, if they fit
      
      This patch also implements more direct patches for the native case, so
      that when running on native hardware many common operations are
      implemented inline.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Anthony Liguori <anthony@codemonkey.ws>
      Acked-by: Ingo Molnar <mingo@elte.hu>
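The three rules listed for paravirt_patch_default() amount to a small decision function. The sketch below models only the decision, not the byte-level patching; `choose_patch`, `enum patch_action`, the `CLBR_*` bit values and the `is_jmp_target` flag are illustrative assumptions.

```c
#include <assert.h>

/* Illustrative clobber bits: which registers a piece of code may modify. */
#define CLBR_EAX 1u
#define CLBR_ECX 2u
#define CLBR_EDX 4u

enum patch_action { PATCH_NOPS, PATCH_JMP, PATCH_CALL, PATCH_LEAVE };

int example_counter;

void paravirt_nop(void) { }
void example_op(void) { example_counter++; }

/* Decide how to patch a call site, following the rules in the commit text. */
enum patch_action choose_patch(void (*op)(void), int is_jmp_target,
                               unsigned callee_clobbers,
                               unsigned site_clobbers)
{
    if (op == paravirt_nop)
        return PATCH_NOPS;      /* nothing to do: fill the site with nops */
    if (is_jmp_target)
        return PATCH_JMP;       /* op never returns here: jmp to it */
    if ((callee_clobbers & ~site_clobbers) == 0)
        return PATCH_CALL;      /* callee clobbers nothing extra: call directly */
    return PATCH_LEAVE;         /* otherwise keep the indirect call */
}
```

The last case covers a callee that would clobber registers the call site expects to survive, where a direct call would be unsafe.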
    • [PATCH] i386: PARAVIRT: Document asm-i386/paravirt.h · 294688c0
      Committed by Jeremy Fitzhardinge
      Clean things up, and broadly document:
       - the paravirt_ops functions themselves
       - the patching mechanism
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
    • [PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make them patchable · f8822f42
      Committed by Jeremy Fitzhardinge
      Wrap a set of interesting paravirt_ops calls in a wrapper which makes
      the callsites available for patching.  Unfortunately this is pretty
      ugly because there's no way to get gcc to generate a function call,
      but also wrap just the callsite itself with the necessary labels.
      
      This patch supports functions with 0-4 arguments, and either void or
      returning a value.  64-bit arguments must be split into a pair of
      32-bit arguments (lower word first).  Small structures are returned in
      registers.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Anthony Liguori <anthony@codemonkey.ws>
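The rule that 64-bit arguments are passed as a pair of 32-bit arguments, lower word first, can be illustrated with two small helpers. `struct u32_pair`, `split_u64` and `join_u64` are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* The two halves of a 64-bit value, in the order they would be passed. */
struct u32_pair {
    uint32_t lo;   /* lower word, passed first */
    uint32_t hi;   /* upper word, passed second */
};

/* Caller side: split a 64-bit argument into two 32-bit arguments. */
struct u32_pair split_u64(uint64_t value)
{
    struct u32_pair pair;

    pair.lo = (uint32_t)value;
    pair.hi = (uint32_t)(value >> 32);
    return pair;
}

/* Callee side: reassemble the original 64-bit value. */
uint64_t join_u64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```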
    • [PATCH] i386: PARAVIRT: Fix patch site clobbers to include return register · 42c24fa2
      Committed by Jeremy Fitzhardinge
      Fix a few clobbers to include the return register.  The clobbers set
      is the set of all registers that are, or may be, modified by the code
      snippet, regardless of whether it was deliberate or accidental.
      
      Also, make sure that callsites which are used in contexts which don't
      allow clobbers actually save and restore all clobberable registers.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
    • [PATCH] i386: PARAVIRT: Use patch site IDs computed from offset in paravirt_ops structure · d5822035
      Committed by Jeremy Fitzhardinge
      Use patch type identifiers derived from the offset of the operation in
      the paravirt_ops structure.  This avoids having to maintain a separate
      enum for patch site types.
      
      Also, since the identifier is derived from the offset into
      paravirt_ops, the offset can be derived from the identifier.  This is
      used to remove replicated information in the various callsite macros,
      which has been a source of bugs in the past.
      
      This patch also drops the fused save_fl+cli operation, which doesn't
      really add much and makes things more complex - specifically because
      it breaks the 1:1 relationship between identifiers and offsets.  If
      this operation turns out to be particularly beneficial, then the right
      answer is to define a new entrypoint for it.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
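Deriving an identifier from a member's offset, and recovering the member from the identifier, can be sketched as below. `struct example_ops`, `PATCH_ID`, `op_from_id` and the `op_*` functions are hypothetical; the sketch assumes every member is a function pointer of the same type, so that offsets and identifiers stay in a 1:1 relationship.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*op_fn)(void);

/* Assumed layout: nothing but same-sized function pointers. */
struct example_ops {
    op_fn irq_disable;
    op_fn irq_enable;
    op_fn flush_tlb;
};

/* Identifier = index of the member within the structure. */
#define PATCH_ID(member) \
    (offsetof(struct example_ops, member) / sizeof(op_fn))

/* Reverse mapping: fetch the function pointer stored at a given identifier. */
op_fn op_from_id(const struct example_ops *ops, size_t id)
{
    op_fn fn;

    memcpy(&fn, (const char *)ops + id * sizeof(op_fn), sizeof(fn));
    return fn;
}

int last_op;

void op_disable(void) { last_op = 1; }
void op_enable(void) { last_op = 2; }
void op_flush(void) { last_op = 3; }
```

Since the identifier is computed from the structure itself, there is no separate enum that can drift out of sync with the member list.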
    • [PATCH] i386: PARAVIRT: rename struct paravirt_patch to paravirt_patch_site for clarity · 98de032b
      Committed by Jeremy Fitzhardinge
      Rename struct paravirt_patch to paravirt_patch_site, so that it
      clearly refers to a callsite, and not the patch which may be applied
      to that callsite.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
    • [PATCH] x86: PARAVIRT: add hooks to intercept mm creation and destruction · d6dd61c8
      Committed by Jeremy Fitzhardinge
      Add hooks to allow a paravirt implementation to track the lifetime of
      an mm.  Paravirtualization requires three hooks, but only two are
      needed in common code.  They are:
      
      arch_dup_mmap, which is called when a new mmap is created at fork
      
      arch_exit_mmap, which is called when the last process reference to an
        mm is dropped, which typically happens on exit and exec.
      
      The third hook is activate_mm, which is called from the arch-specific
      activate_mm() macro/function, and so doesn't need stub versions for
      other architectures.  It's called when an mm is first used.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: linux-arch@vger.kernel.org
      Cc: James Bottomley <James.Bottomley@SteelEye.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
    • [PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing · 5311ab62
      Committed by Jeremy Fitzhardinge
      Normally when running in PAE mode, the 4th PMD maps the kernel address space,
      which can be shared among all processes (since they all need the same kernel
      mappings).
      
      Xen, however, does not allow guests to have the kernel pmd shared between page
      tables, so parameterize pgtable.c to allow both modes of operation.
      
      There are several side-effects of this.  One is that vmalloc will update the
      kernel address space mappings, and those updates need to be propagated into
      all processes if the kernel mappings are not intrinsically shared.  In the
      non-PAE case, this is done by maintaining a pgd_list of all processes; this
      list is used when all process pagetables must be updated.  pgd_list is
      threaded via otherwise unused entries in the page structure for the pgd, which
      means that the pgd must be page-sized for this to work.
      
      Normally the PAE pgd is only four 64-bit entries (32 bytes) large, but Xen
      requires the PAE pgd to be page aligned anyway, so this patch forces the pgd
      to be page aligned+sized when the kernel pmd is unshared, to accommodate both
      these requirements.
      
      Also, since there may be several distinct kernel pmds (if the user/kernel
      split is below 3G), there's no point in allocating them from a slab cache;
      they're just allocated with get_free_page and initialized appropriately.  (Of
      course they could be cached if there is just a single kernel pmd - which is the
      default with a 3G user/kernel split - but it doesn't seem worthwhile to add
      yet another case into this code).
      
      [ Many thanks to wli for review comments. ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>