1. 05 April 2008, 2 commits
  2. 27 March 2008, 2 commits
    • xen: fix UP setup of shared_info · 2e8fe719
      Jeremy Fitzhardinge authored
      We need to set up the shared_info pointer once we've mapped the real
      shared_info into its fixmap slot.  That needs to happen once the general
      pagetable setup has been done.  Previously, the UP shared_info was set
      up once in xen_start_kernel, but that was left pointing to the dummy
      shared info.  Unfortunately there's no really good place to do a later
      setup of the shared_info in UP, so just do it once the pagetable setup
      has been done.
      
      [ Stable: needed in 2.6.24.x ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stable Kernel <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: fix RMW when unmasking events · 04c44a08
      Jeremy Fitzhardinge authored
      xen_irq_enable_direct and xen_sysexit were using "andw $0x00ff,
      XEN_vcpu_info_pending(vcpu)" to unmask events and test for pending ones
      in one instruction.
      
      Unfortunately, the pending flag must be modified with a locked operation
      since it can be set by another CPU, and the unlocked form of this
      operation was causing the pending flag to get lost, allowing the processor
      to return to usermode with pending events and ultimately deadlock.
      
      The simple fix would be to make it a locked operation, but that's rather
      costly and unnecessary.  The fix here is to split the mask-clearing and
      pending-testing into two instructions; the interrupt window between
      them is of no concern because either way pending or new events will
      be processed.
      
      This should fix lingering bugs in using direct vcpu structure access too.
      
      [ Stable: needed in 2.6.24.x ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stable <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
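      A minimal sketch of the split described above, in C rather than the actual
      xen-asm code.  The two fields mirror struct vcpu_info in the Xen public
      headers; force_evtchn_callback() and xen_unmask_sketch() are illustrative
      stand-ins, not the kernel's functions.

      /* Step 1 unmasks with a plain one-byte store (no read-modify-write that
       * could clobber a pending bit set by another CPU); step 2 tests the
       * pending byte separately. */
      struct vcpu_flags {
          volatile unsigned char evtchn_upcall_pending;  /* set by other CPUs  */
          volatile unsigned char evtchn_upcall_mask;     /* owned by this vcpu */
      };

      static void force_evtchn_callback(void) { /* placeholder upcall path */ }

      static void xen_unmask_sketch(struct vcpu_flags *vi)
      {
          vi->evtchn_upcall_mask = 0;              /* step 1: unmask           */
          __asm__ __volatile__("" ::: "memory");   /* compiler barrier         */
          if (vi->evtchn_upcall_pending)           /* step 2: test separately  */
              force_evtchn_callback();
      }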
  3. 01 March 2008, 1 commit
  4. 13 February 2008, 1 commit
  5. 30 January 2008, 10 commits
  6. 24 January 2008, 1 commit
  7. 11 December 2007, 1 commit
  8. 18 October 2007, 1 commit
  9. 17 October 2007, 8 commits
    • [x86] remove uses of magic macros for boot_params access · 30c82645
      H. Peter Anvin authored
      Instead of using magic macros for boot_params access, simply use the
      boot_params structure.
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
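      A hedged before/after illustration of the idea.  The structures and the
      PARAM/SCREEN_INFO macros below are cut-down reconstructions of the old
      i386 setup headers, not the exact definitions.

      #include <stdio.h>

      struct screen_info { unsigned char orig_video_lines; };
      struct boot_params { struct screen_info screen_info; } boot_params;

      /* Old style: a "magic macro" casting an offset into the raw boot page. */
      #define PARAM        ((char *)&boot_params)
      #define SCREEN_INFO  (*(struct screen_info *)(PARAM + 0))

      int main(void)
      {
          SCREEN_INFO.orig_video_lines = 25;                          /* before */
          printf("%d\n", boot_params.screen_info.orig_video_lines);   /* after  */
          return 0;
      }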
    • xen: fix incorrect vcpu_register_vcpu_info hypercall argument · e3d26976
      Jeremy Fitzhardinge authored
      The kernel's copy of struct vcpu_register_vcpu_info was out of date,
      at best causing the hypercall to fail and the guest kernel to fall
      back to the old mechanism, or worse, causing random memory corruption.
      
      [ Stable folks: applies to 2.6.23 ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Stable Kernel <stable@kernel.org>
      Cc: Morten Bøgeskov <xen-users@morten.bogeskov.dk>
      Cc: Mark Williamson <mark.williamson@cl.cam.ac.uk>
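      For reference, the hypercall argument layout the kernel's copy is brought
      back in line with.  The fields below follow the Xen public interface
      (xen/include/public/vcpu.h) as I recall it, so treat this as a sketch
      rather than an authoritative definition.

      #include <stdint.h>

      /* Argument to the VCPUOP_register_vcpu_info vcpu_op hypercall. */
      struct vcpu_register_vcpu_info {
          uint64_t mfn;     /* machine frame of the page holding vcpu_info */
          uint32_t offset;  /* offset of vcpu_info within that page        */
          uint32_t rsvd;    /* unused, must be zero                        */
      };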
    • xen: ask the hypervisor how much space it needs reserved · fb1d8404
      Jeremy Fitzhardinge authored
      Ask the hypervisor how much space it needs reserved, since 32-on-64
      doesn't need any space, and it may change in future.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
    • xen: lock pte pages while pinning/unpinning · 74260714
      Jeremy Fitzhardinge authored
      When a pagetable is created, it is made globally visible in the rmap
      prio tree before it is pinned via arch_dup_mmap(), and remains in the
      rmap tree while it is unpinned with arch_exit_mmap().
      
      This means that other CPUs may race with the pinning/unpinning
      process, and see a pte between when it gets marked RO and actually
      pinned, causing any pte updates to fail with write-protect faults.
      
      As a result, all pte pages must be properly locked, and only unlocked
      once the pinning/unpinning process has finished.
      
      In order to avoid taking spinlocks for the whole pagetable - which may
      overflow the PREEMPT_BITS portion of preempt counter - it locks and pins
      each pte page individually, and then finally pins the whole pagetable.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Keir Fraser <keir@xensource.com>
      Cc: Jan Beulich <jbeulich@novell.com>
    • xen: deal with stale cr3 values when unpinning pagetables · 9f79991d
      Jeremy Fitzhardinge authored
      When a pagetable is no longer in use, it must be unpinned so that its
      pages can be freed.  However, this is only possible if there are no
      stray uses of the pagetable.  The code currently deals with all the
      usual cases, but there's a rare case where a vcpu is changing cr3, but
      is doing so lazily, and the change hasn't actually happened by the time
      the pagetable is unpinned, even though it appears to have been completed.
      
      This change adds a second per-cpu cr3 variable - xen_current_cr3 -
      which tracks the actual state of the vcpu cr3.  It is only updated once
      the actual hypercall to set cr3 has been completed.  Other processors
      wishing to unpin a pagetable can check the other vcpus' xen_current_cr3
      values to see if any cross-cpu IPIs are needed to clean things up.
      
      [ Stable folks: 2.6.23 bugfix ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Stable Kernel <stable@kernel.org>
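      A self-contained model of the two-variable scheme.  The names and types
      are illustrative; the kernel keeps these as per-cpu variables and uses a
      cross-cpu IPI rather than a direct scan-and-call.

      #include <stdbool.h>
      #include <stddef.h>

      #define NR_VCPUS 4

      struct vcpu_state {
          unsigned long cr3;          /* value this vcpu intends to load (lazily) */
          unsigned long current_cr3;  /* value the hypervisor has actually loaded */
      };

      static struct vcpu_state vcpus[NR_VCPUS];

      static void hypercall_set_cr3(unsigned long cr3) { (void)cr3; /* stub */ }

      static void set_cr3(struct vcpu_state *v, unsigned long new_cr3)
      {
          v->cr3 = new_cr3;
          hypercall_set_cr3(new_cr3);
          v->current_cr3 = new_cr3;   /* updated only after the hypercall completes */
      }

      /* Before unpinning a pagetable, check whether any vcpu still has it
       * loaded; such vcpus need to be asked (via IPI) to drop the reference. */
      static bool pagetable_in_use(unsigned long cr3)
      {
          for (size_t i = 0; i < NR_VCPUS; i++)
              if (vcpus[i].current_cr3 == cr3)
                  return true;
          return false;
      }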
    • Clean up duplicate includes in arch/i386/xen/ · d626a1f1
      Jesper Juhl authored
      This patch cleans up duplicate includes in
      	arch/i386/xen/
      Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
    • paravirt: clean up lazy mode handling · 8965c1c0
      Jeremy Fitzhardinge authored
      Currently, the set_lazy_mode pv_op is overloaded with 5 functions:
       1. enter lazy cpu mode
       2. leave lazy cpu mode
       3. enter lazy mmu mode
       4. leave lazy mmu mode
       5. flush pending batched operations
      
      This complicates each paravirt backend, since it needs to deal with
      all the possible state transitions, handling flushing, etc. In
      particular, flushing is quite distinct from the other 4 functions, and
      seems to just cause complication.
      
      This patch removes the set_lazy_mode operation, and adds "enter" and
      "leave" lazy mode operations on mmu_ops and cpu_ops.  All the logic
      associated with entering and leaving lazy states is now in common code
      (basically BUG_ONs to make sure that no mode is current when entering
      a lazy mode, and to make sure that the mode is current when leaving).
      Also, flush is handled in a common way, by simply leaving and
      re-entering the lazy mode.
      
      The result is that the Xen, lguest and VMI lazy mode implementations
      are much simpler.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Zach Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Anthony Liguori <aliguori@us.ibm.com>
      Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
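      The common logic, reduced to a hedged user-space model: assert() stands
      in for BUG_ON(), the global stands in for a per-cpu variable, and the
      backend hooks are only comments.

      #include <assert.h>

      enum lazy_mode { LAZY_NONE, LAZY_MMU, LAZY_CPU };

      static enum lazy_mode current_lazy;   /* per-cpu in the real code */

      static void enter_lazy(enum lazy_mode mode)
      {
          assert(current_lazy == LAZY_NONE);    /* no nesting of lazy modes */
          current_lazy = mode;
          /* backend hook: start batching operations for this mode */
      }

      static void leave_lazy(enum lazy_mode mode)
      {
          assert(current_lazy == mode);         /* must leave the mode we're in */
          /* backend hook: issue whatever has been batched */
          current_lazy = LAZY_NONE;
      }

      /* Flushing pending work is simply leaving and re-entering the mode. */
      static void flush_lazy(enum lazy_mode mode)
      {
          leave_lazy(mode);
          enter_lazy(mode);
      }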
    • paravirt: refactor struct paravirt_ops into smaller pv_*_ops · 93b1eab3
      Jeremy Fitzhardinge authored
      This patch refactors the paravirt_ops structure into groups of
      functionally related ops:
      
      pv_info - random info, rather than function entrypoints
      pv_init_ops - functions used at boot time (some for module_init too)
      pv_misc_ops - lazy mode, which didn't fit well anywhere else
      pv_time_ops - time-related functions
      pv_cpu_ops - various privileged instruction ops
      pv_irq_ops - operations for managing interrupt state
      pv_apic_ops - APIC operations
      pv_mmu_ops - operations for managing pagetables
      
      There are several motivations for this:
      
      1. Some of these ops will be general to all x86, and some will be
         i386/x86-64 specific.  This makes it easier to share common stuff
         while allowing separate implementations where needed.
      
      2. At the moment we must export all of paravirt_ops, but modules only
         need selected parts of it.  This allows us to export on a case by case
         basis (and also choose which export license we want to apply).
      
      3. Functional groupings make things a bit more readable.
      
      Struct paravirt_ops is now only used as a template to generate
      patch-site identifiers, and to extract function pointers for inserting
      into jmp/calls when patching.  It is only instantiated when needed.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Zach Amsden <zach@vmware.com>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Anthony Liguori <aliguori@us.ibm.com>
      Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
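      The shape of the split for two of the groups.  The member lists are
      illustrative only; the real structures carry many more function pointers
      and use the proper pte_t and processor-state types.

      struct pv_irq_ops {
          unsigned long (*save_fl)(void);
          void (*restore_fl)(unsigned long flags);
          void (*irq_disable)(void);
          void (*irq_enable)(void);
      };

      struct pv_mmu_ops {
          void (*set_pte)(void *ptep, unsigned long pteval);
          void (*flush_tlb_user)(void);
      };

      /* Each group is a separate global, so it can be exported (or not, and
       * under whichever license) independently, instead of exporting one
       * monolithic paravirt_ops to every module. */
      extern struct pv_irq_ops pv_irq_ops;
      extern struct pv_mmu_ops pv_mmu_ops;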
  10. 11 October 2007, 1 commit
  11. 20 September 2007, 1 commit
  12. 12 August 2007, 1 commit
    • i386: Make patching more robust, fix paravirt issue · ab144f5e
      Andi Kleen authored
      Commit 19d36ccd "x86: Fix alternatives
      and kprobes to remap write-protected kernel text" uses code which is
      being patched for patching.
      
      In particular, paravirt_ops does patching in two stages: first it
      calls paravirt_ops.patch, then it fills any remaining instructions
      with nop_out().  nop_out calls text_poke() which calls
      lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val):
      that call site is one of the places we patch.
      
      If we always do patching as one single call to text_poke(), we only
      need to make sure we're not patching the memcpy in text_poke itself.
      This means the prototype to paravirt_ops.patch needs to change, to
      marshal the new code into a buffer rather than patching in place as it
      does now.  It also means all patching goes through text_poke(), which
      is known to be safe (apply_alternatives is also changed to make a
      single patch).
      
      AK: fix compilation on x86-64 (bad rusty!)
      AK: fix boot on x86-64 (sigh)
      AK: merged with other patches
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
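      A hedged model of the new flow: the backend writes its replacement into a
      caller-supplied buffer, and the only write to kernel text is one
      text_poke()-style call.  text_poke_stub() and apply_patch() are local
      stand-ins, not the kernel's functions.

      #include <assert.h>
      #include <string.h>

      /* Stand-in for text_poke(), the one known-safe way to write kernel text
       * (the real one remaps the read-only text writable for the copy). */
      static void text_poke_stub(void *addr, const void *opcode, size_t len)
      {
          memcpy(addr, opcode, len);
      }

      /* Patch one call site: marshal the replacement bytes into insnbuf first,
       * then install them in a single write, instead of poking the live
       * instructions piecemeal (which may execute code being patched). */
      static void apply_patch(void *site, const void *replacement, size_t len)
      {
          unsigned char insnbuf[16];

          assert(len <= sizeof(insnbuf));
          memcpy(insnbuf, replacement, len);
          /* ...a paravirt backend could rewrite or relocate insnbuf here... */
          text_poke_stub(site, insnbuf, len);
      }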
  13. 18 July 2007, 10 commits
    • xen: use iret directly when possible · 9ec2b804
      Jeremy Fitzhardinge authored
      Most of the time we can simply use the iret instruction to exit the
      kernel, rather than having to use the iret hypercall - the only
      exception is if we're returning into vm86 mode, or from delivering an
      NMI (which we don't support yet).
      
      When running native, iret has the behaviour of testing for a pending
      interrupt atomically with re-enabling interrupts.  Unfortunately
      there's no way to do this with Xen, so there's a window in which we
      could get a recursive exception after enabling events but before
      actually returning to userspace.
      
      This causes a problem: if the nested interrupt causes one of the
      task's TIF_WORK_MASK flags to be set, they will not be checked again
      before returning to userspace.  This means that pending work may be
      left pending indefinitely, until the process enters and leaves the
      kernel again.  The net effect is that a pending signal or reschedule
      event could be delayed for an unbounded amount of time.
      
      To deal with this, the xen event upcall handler checks to see if the
      EIP is within the critical section of the iret code, after events
      are (potentially) enabled up to the iret itself.  If it's within this
      range, it calls the iret critical section fixup, which adjusts the
      stack to deal with any unrestored registers, and then shifts the
      stack frame up to replace the previous invocation.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
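      The range check at the heart of that fixup, as a hedged sketch.  The two
      marker arrays are local stand-ins for the labels the assembly places
      around the critical section, and the stack-shifting fixup itself is
      omitted.

      /* Dummy markers standing in for the start/end labels in xen-asm. */
      static const char xen_iret_start_crit[1], xen_iret_end_crit[1];

      /* Called from the event upcall handler with the interrupted EIP: if we
       * were inside the window (events re-enabled but iret not yet executed),
       * the critical-section fixup path must be taken. */
      static int in_iret_critical_section(unsigned long eip)
      {
          return eip >= (unsigned long)xen_iret_start_crit &&
                 eip <  (unsigned long)xen_iret_end_crit;
      }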
    • xen: Attempt to patch inline versions of common operations · 6487673b
      Jeremy Fitzhardinge authored
      This patch adds the mechanism to allow us to patch inline versions of
      common operations.
      
      The implementations of the direct-access versions save_fl, restore_fl,
      irq_enable and irq_disable are now in assembler, and the same code is
      used for both out of line and inline uses.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Keir Fraser <keir@xensource.com>
    • xen: Place vcpu_info structure into per-cpu memory · 60223a32
      Jeremy Fitzhardinge authored
      An experimental patch for Xen allows guests to place their vcpu_info
      structs anywhere.  We try to use this to place the vcpu_info into the
      PDA, which allows direct access.
      
      If this works, then switch to using direct access operations for
      irq_enable, disable, save_fl and restore_fl.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Keir Fraser <keir@xensource.com>
    • xen: machine operations · fefa629a
      Jeremy Fitzhardinge authored
      Make the appropriate hypercalls to halt and reboot the virtual machine.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Acked-by: Chris Wright <chrisw@sous-sol.org>
    • xen: hack to prevent bad segment register reload · 8b84ad94
      Jeremy Fitzhardinge authored
      The hypervisor saves and restores the segment registers as part of the
      state it saves while context switching.  If, during a context switch,
      the next process doesn't use the TLS segments, it invalidates the GDT
      entry, causing the segment register reload to fault.  This fault
      effectively doubles the cost of a context switch.
      
      This patch is a band-aid workaround which clears the usermode %gs
      after it has been saved for the previous process, but before it gets
      reloaded for the next, and it avoids having the hypervisor attempt to
      erroneously reload it.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Chris Wright <chrisw@sous-sol.org>
    • xen: lazy-mmu operations · d66bf8fc
      Jeremy Fitzhardinge authored
      This patch uses the lazy-mmu hooks to batch mmu operations where
      possible.  This is primarily useful for batching operations applied to
      active pagetables, which happens during mprotect, munmap, mremap and
      the like (mmap does not do bulk pagetable operations, so it isn't
      helped).
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Acked-by: Chris Wright <chrisw@sous-sol.org>
    • xen: Add support for preemption · f120f13e
      Jeremy Fitzhardinge authored
      Add Xen support for preemption.  This is mostly a cleanup of existing
      preempt_enable/disable calls, or just comments to explain the current
      usage.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Chris Wright <chrisw@sous-sol.org>
    • xen: SMP guest support · f87e4cac
      Jeremy Fitzhardinge authored
      This is a fairly straightforward Xen implementation of smp_ops.
      
      Xen has its own IPI mechanisms, and has no dependency on any
      APIC-based IPI.  The smp_ops hooks and the flush_tlb_others pv_op
      allow a Xen guest to avoid all APIC code in arch/i386 (the only apic
      operation is a single apic_read for the apic version number).
      
      One subtle point which needs to be addressed is unpinning pagetables
      when another cpu may have a lazy tlb reference to the pagetable. Xen
      will not allow an in-use pagetable to be unpinned, so we must find any
      other cpus with a reference to the pagetable and get them to shoot
      down their references.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Chris Wright <chrisw@sous-sol.org>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
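      A toy model of the unpin prerequisite described in the last paragraph:
      every other cpu still holding a (possibly lazy) reference to the mm is
      made to drop it first.  The direct call stands in for the cross-cpu IPI,
      and all names here are illustrative.

      #include <stddef.h>

      #define NR_CPUS 4

      struct mm { int id; };

      static struct mm *active_mm[NR_CPUS];   /* what each cpu has loaded */

      /* Stand-in for the IPI handler: the target cpu drops its reference,
       * e.g. by switching to the idle pagetable. */
      static void drop_mm_ref(int cpu, struct mm *mm)
      {
          if (active_mm[cpu] == mm)
              active_mm[cpu] = NULL;
      }

      static void drop_other_mm_refs(struct mm *mm, int self)
      {
          for (int cpu = 0; cpu < NR_CPUS; cpu++)
              if (cpu != self && active_mm[cpu] == mm)
                  drop_mm_ref(cpu, mm);     /* an IPI in the real code */
          /* Only once no cpu references the pagetable will Xen allow it
           * to be unpinned. */
      }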
    • xen: Implement sched_clock · ab550288
      Jeremy Fitzhardinge authored
      Implement xen_sched_clock, which returns the number of ns the current
      vcpu has actually been in an unstolen state (i.e., running or blocked,
      vs runnable-but-not-running, or offline) since boot.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Acked-by: Chris Wright <chrisw@sous-sol.org>
      Cc: john stultz <johnstul@us.ibm.com>
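      A hedged sketch of the calculation.  The struct mirrors the vcpu runstate
      record in the Xen interface as I recall it, and the real xen_sched_clock()
      additionally has to read the record consistently against concurrent
      updates by the hypervisor.

      #include <stdint.h>

      enum { RUNSTATE_running, RUNSTATE_runnable, RUNSTATE_blocked, RUNSTATE_offline };

      struct vcpu_runstate_info {
          int      state;             /* current RUNSTATE_* value            */
          uint64_t state_entry_time;  /* when the current state was entered  */
          uint64_t time[4];           /* ns accumulated in each state        */
      };

      /* "Unstolen" time: running or blocked, but not runnable-yet-preempted
       * and not offline. */
      static uint64_t unstolen_ns(const struct vcpu_runstate_info *rs)
      {
          return rs->time[RUNSTATE_running] + rs->time[RUNSTATE_blocked];
      }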
    • xen: ignore RW mapping of RO pages in pagetable_init · 9a4029fd
      Jeremy Fitzhardinge authored
      When setting up the initial pagetable, which includes mappings of all
      low physical memory, ignore a mapping which tries to set the RW bit on
      an RO pte.  An RO pte indicates a page which is part of the current
      pagetable, and so it cannot be allowed to become RW.
      
      Once xen_pagetable_setup_done is called, set_pte reverts to its normal
      behaviour.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Acked-by: Chris Wright <chrisw@sous-sol.org>
      Cc: ebiederm@xmission.com (Eric W. Biederman)
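      The filter described above, reduced to a hedged sketch on raw pte bits.
      The real code works on pte_t values and consults the live pagetable
      entry; mask_rw_pte() here is only a model of the rule.

      #include <stdint.h>

      #define X86_PAGE_RW (1u << 1)   /* the x86 _PAGE_RW bit */

      /* Early set_pte filter: if the slot currently maps the page read-only,
       * the page is part of the pagetable itself, so refuse to hand back a
       * writable mapping for it. */
      static uint32_t mask_rw_pte(uint32_t old_pte, uint32_t new_pte)
      {
          if (!(old_pte & X86_PAGE_RW))
              new_pte &= ~X86_PAGE_RW;
          return new_pte;
      }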