1. 21 Aug, 2008 3 commits
  2. 20 Aug, 2008 3 commits
  3. 31 Jul, 2008 3 commits
  4. 28 Jul, 2008 2 commits
  5. 26 Jul, 2008 1 commit
  6. 24 Jul, 2008 3 commits
  7. 22 Jul, 2008 2 commits
  8. 18 Jul, 2008 3 commits
    • xen: report hypervisor version · 95c7c23b
      Committed by Jeremy Fitzhardinge
      Various versions of the hypervisor have differences in what ABIs and
      features they support.  Print some details into the boot log to help
      with remote debugging.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: APIC: remove apic_write_around(); use alternatives · 593f4a78
      Committed by Maciej W. Rozycki
      Use alternatives to select the workaround for the 11AP Pentium erratum
      for the affected steppings on the fly rather than build time.  Remove the
      X86_GOOD_APIC configuration option and replace all the calls to
      apic_write_around() with plain apic_write(), protecting accesses to the
      ESR as appropriate due to the 3AP Pentium erratum.  Remove
      apic_read_around() and all its invocations altogether as not needed.
      Remove apic_write_atomic() and all its implementing backends.  The use of
      ASM_OUTPUT2() is not strictly needed for input constraints, but I have
      used it for readability's sake.
      
      I had the feeling no one else was brave enough to do it, so I went ahead
      and here it is.  Verified by checking the generated assembly and tested
      with both a 32-bit and a 64-bit configuration, also with the 11AP
      "feature" forced on and verified with gdb on /proc/kcore to work as
      expected (as 11AP machines are quite hard to get hold of these days).
      Some script complained about the use of "volatile", but apic_write() needs
      it for the same reason and is effectively a replacement for writel(), so I
      have disregarded it.
      
      I am not sure what the policy wrt defconfig files is, they are generated
      and there is risk of a conflict resulting from an unrelated change, so I
      have left changes to them out.  The option will get removed from them at
      the next run.
      
      Some testing with machines other than mine will be needed to avoid some
      stupid mistake, but despite its volume, the change is not really that
      intrusive, so I am fairly confident that because it works for me, it will
      work everywhere.
      Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, xen, power: fix up config dependencies on PM · 93a0886e
      Committed by Jeremy Fitzhardinge
      Xen save/restore needs bits of code enabled by PM_SLEEP, and PM_SLEEP
      depends on PM.  So make XEN_SAVE_RESTORE depend on PM and PM_SLEEP
      depend on XEN_SAVE_RESTORE.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 16 Jul, 2008 20 commits
    • xen: implement Xen-specific spinlocks · 2d9e1e2f
      Committed by Jeremy Fitzhardinge
      The standard ticket spinlocks are very expensive in a virtual
      environment, because their performance depends on Xen's scheduler
      giving vcpus time in the order that they're supposed to take the
      spinlock.
      
      This implements a Xen-specific spinlock, which should be much more
      efficient.
      
      The fast-path is essentially the old Linux-x86 locks, using a single
      lock byte.  The locker decrements the byte; if the result is 0, then
      they have the lock.  If the result is negative, then the locker must spin
      until the lock is positive again.
      
      When there's contention, the locker spins for 2^16[*] iterations waiting
      to get the lock.  If it fails to get the lock in that time, it adds
      itself to the contention count in the lock and blocks on a per-cpu
      event channel.
      
      When unlocking the spinlock, the locker looks to see if there's anyone
      blocked waiting for the lock by checking for a non-zero waiter count.
      If there's a waiter, it traverses the per-cpu "lock_spinners"
      variable, which contains which lock each CPU is waiting on.  It picks
      one CPU waiting on the lock and sends it an event to wake it up.
      
      This allows efficient fast-path spinlock operation, while allowing
      spinning vcpus to give up their processor time while waiting for a
      contended lock.
      
      [*] 2^16 iterations is the threshold at which 98% of locks have been
      taken, according to Thomas Friebel's Xen Summit talk "Preventing Guests
      from Spinning Around".  Therefore, we'd expect the lock and unlock slow
      paths to be entered only 2% of the time.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <clameter@linux-foundation.org>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: Virtualization <virtualization@lists.linux-foundation.org>
      Cc: Xen devel <xen-devel@lists.xensource.com>
      Cc: Thomas Friebel <thomas.friebel@amd.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
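The spin-then-block scheme described in the spinlock commit above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: `struct byte_lock`, `byte_lock_spin()` and `byte_lock_release()` are hypothetical names, C11 atomics stand in for the kernel's locked instructions, and the blocking on a per-cpu event channel is reduced to a waiter count that the caller would act on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SPIN_THRESHOLD (1u << 16)   /* spin budget quoted in the commit message */

/* Hypothetical layout: one lock byte plus a count of lockers that gave up. */
struct byte_lock {
    atomic_schar lock;     /* 1 = free, <= 0 = held */
    atomic_uint  waiters;  /* lockers that exhausted their spin budget */
};

static void byte_lock_init(struct byte_lock *l)
{
    atomic_init(&l->lock, 1);
    atomic_init(&l->waiters, 0);
}

/*
 * Fast path: decrement the byte; a result of 0 means we own the lock.
 * The byte is only decremented when it looks free, so failed spins do
 * not push it further negative.
 * Returns true if the lock was taken while spinning.  Returns false once
 * the budget is exhausted; the caller is then counted as a waiter, and
 * real code would block on an event channel at this point.
 */
static bool byte_lock_spin(struct byte_lock *l)
{
    assert(l != NULL);
    for (uint32_t i = 0; i < SPIN_THRESHOLD; i++) {
        if (atomic_load(&l->lock) > 0 &&
            atomic_fetch_sub(&l->lock, 1) == 1)
            return true;
    }
    atomic_fetch_add(&l->waiters, 1);
    return false;
}

/*
 * Release the lock.  Returns true when at least one waiter is registered,
 * i.e. when the unlocker should go and wake one of them.
 */
static bool byte_lock_release(struct byte_lock *l)
{
    assert(l != NULL);
    atomic_store(&l->lock, 1);
    return atomic_load(&l->waiters) != 0;
}
```

The sketch never decrements the waiter count and never actually sleeps; it only shows where the fast path ends and where the slow path would begin.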
    • xen: use lock-byte spinlock implementation · 56397f8d
      Committed by Jeremy Fitzhardinge
      Switch to using the lock-byte spinlock implementation, to avoid the
      worst of the performance hit from ticket locks.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <clameter@linux-foundation.org>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: Virtualization <virtualization@lists.linux-foundation.org>
      Cc: Xen devel <xen-devel@lists.xensource.com>
      Cc: Thomas Friebel <thomas.friebel@amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: xen: no need to disable vdso32 · d5303b81
      Committed by Jeremy Fitzhardinge
      Now that the vdso32 code can cope with both syscall and sysenter
      missing for 32-bit compat processes, just disable the features without
      disabling vdso altogether.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86_64: further cleanup of 32-bit compat syscall mechanisms · 6a52e4b1
      Committed by Jeremy Fitzhardinge
      AMD only supports "syscall" from 32-bit compat usermode.
      Intel and Centaur(?) only support "sysenter" from 32-bit compat usermode.
      
      Set the X86 feature bits accordingly, and set up the vdso in
      accordance with those bits.  On the off chance that we run in a 64-bit
      environment which supports neither syscall nor sysenter from 32-bit
      mode, fall back to the int $0x80 vdso.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
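The selection logic described in the compat syscall commit above can be sketched as a small decision function. This is a minimal illustration, not the kernel's code: the enums, `compat_features_for()` and `select_vdso32()` are hypothetical names, and checking syscall before sysenter is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three 32-bit vdso flavours. */
enum vdso32_image { VDSO32_INT80, VDSO32_SYSENTER, VDSO32_SYSCALL };

enum cpu_vendor { VENDOR_AMD, VENDOR_INTEL, VENDOR_CENTAUR, VENDOR_OTHER };

/* Stand-in for the X86 feature bits mentioned in the commit message. */
struct compat_features {
    bool syscall32;   /* "syscall" usable from 32-bit compat usermode */
    bool sysenter32;  /* "sysenter" usable from 32-bit compat usermode */
};

/* Set the feature bits per vendor, as the commit message describes. */
static struct compat_features compat_features_for(enum cpu_vendor v)
{
    struct compat_features f = { false, false };

    switch (v) {
    case VENDOR_AMD:
        f.syscall32 = true;
        break;
    case VENDOR_INTEL:
    case VENDOR_CENTAUR:
        f.sysenter32 = true;
        break;
    default:
        break;          /* neither mechanism: int $0x80 only */
    }
    return f;
}

/* Pick the vdso image from the feature bits, falling back to int $0x80. */
static enum vdso32_image select_vdso32(struct compat_features f)
{
    if (f.syscall32)
        return VDSO32_SYSCALL;
    if (f.sysenter32)
        return VDSO32_SYSENTER;
    return VDSO32_INT80;
}
```

Keeping the decision in terms of feature bits rather than vendor checks is what lets a hypervisor clear a bit it cannot support and still get a working fallback.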
    • x86, xen, vdso: fix build error · 71415c6a
      Committed by Ingo Molnar
      fix:
      
         arch/x86/xen/built-in.o: In function `xen_enable_syscall':
         (.cpuinit.text+0xdb): undefined reference to `sysctl_vsyscall32'
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: disable 32-bit syscall/sysenter if not supported. · 62541c37
      Committed by Jeremy Fitzhardinge
      Old versions of Xen (3.1 and before) don't support sysenter or syscall
      from 32-bit compat userspaces.  If we can't set the appropriate
      syscall callback, then disable the corresponding feature bit, which
      will cause the vdso32 setup to fall back appropriately.
      
      Linux assumes that syscall is always available to 32-bit userspace,
      and installs it by default if sysenter isn't available.  In that case,
      we just disable vdso altogether, forcing userspace libc to fall back
      to int $0x80.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: fix build error on 32-bit + !HIGHMEM · b3fe1243
      Committed by Ingo Molnar
      fix:
      
      arch/x86/xen/enlighten.c: In function 'xen_set_fixmap':
      arch/x86/xen/enlighten.c:1127: error: 'FIX_KMAP_BEGIN' undeclared (first use in this function)
      arch/x86/xen/enlighten.c:1127: error: (Each undeclared identifier is reported only once
      arch/x86/xen/enlighten.c:1127: error: for each function it appears in.)
      arch/x86/xen/enlighten.c:1127: error: 'FIX_KMAP_END' undeclared (first use in this function)
      make[1]: *** [arch/x86/xen/enlighten.o] Error 1
      make: *** [arch/x86/xen/enlighten.o] Error 2
      
      FIX_KMAP_BEGIN is only available on HIGHMEM.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: update Kconfig to allow 64-bit Xen · 51dd660a
      Committed by Jeremy Fitzhardinge
      Allow Xen to be enabled on 64-bit.
      
      Also extend domain size limit from 8 GB (on 32-bit) to 32 GB on 64-bit.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: implement Xen write_msr operation · 1153968a
      Committed by Jeremy Fitzhardinge
      64-bit uses MSRs for important things like the base for fs and
      gs-prefixed addresses.  It's more efficient to use a hypercall to
      update these, rather than go via the trap and emulate path.
      
      Other MSR writes are just passed through; in an unprivileged domain
      they do nothing, but it might be useful later.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
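The dispatch described in the write_msr commit above (segment-base MSRs go through a hypercall, everything else is passed through) might look roughly like the following. This is a userspace sketch under assumptions: `struct fake_hypervisor`, `fake_set_segment_base()` and `sketch_write_msr()` are hypothetical stand-ins that only record what would have been requested, and the mapping of each MSR to a segment selector is this sketch's reading of the message. The three MSR numbers are the architectural ones.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural MSR numbers for the 64-bit segment bases. */
#define MSR_FS_BASE        0xc0000100u
#define MSR_GS_BASE        0xc0000101u
#define MSR_KERNEL_GS_BASE 0xc0000102u

/* Hypothetical selector for which segment base the hypercall updates. */
enum seg_base { SEG_NONE, SEG_FS, SEG_GS_USER, SEG_GS_KERNEL };

/* Records requests so the sketch can run without a hypervisor. */
struct fake_hypervisor {
    enum seg_base last_seg;       /* segment of the last "hypercall" */
    uint64_t      last_base;      /* base passed to the last "hypercall" */
    unsigned      passthrough;    /* number of plain MSR writes */
};

/* Stand-in for a set-segment-base hypercall. */
static void fake_set_segment_base(struct fake_hypervisor *hv,
                                  enum seg_base which, uint64_t base)
{
    hv->last_seg = which;
    hv->last_base = base;
}

/* Route segment-base MSRs to the hypercall; pass other writes through. */
static void sketch_write_msr(struct fake_hypervisor *hv,
                             uint32_t msr, uint64_t value)
{
    switch (msr) {
    case MSR_FS_BASE:
        fake_set_segment_base(hv, SEG_FS, value);
        break;
    case MSR_KERNEL_GS_BASE:
        /* While in the kernel, this MSR holds the inactive (user) gs base. */
        fake_set_segment_base(hv, SEG_GS_USER, value);
        break;
    case MSR_GS_BASE:
        fake_set_segment_base(hv, SEG_GS_KERNEL, value);
        break;
    default:
        hv->passthrough++;   /* an ordinary (possibly ignored) MSR write */
        break;
    }
}
```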
    • xen64: set up userspace syscall patch · bf18bf94
      Committed by Jeremy Fitzhardinge
      64-bit userspace expects the vdso to be mapped at a specific fixed
      address, which happens to be in the middle of the kernel address
      space.  Because we have split user and kernel pagetables, we need to
      make special arrangements for the vsyscall mapping to appear in the
      kernel part of the user pagetable.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: set up syscall and sysenter entrypoints for 64-bit · 6fcac6d3
      Committed by Jeremy Fitzhardinge
      We set up entrypoints for syscall and sysenter.  sysenter is only used
      for 32-bit compat processes, whereas syscall can be used by both 32-bit
      and 64-bit processes.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: allocate and manage user pagetables · d6182fbf
      Committed by Jeremy Fitzhardinge
      Because the x86_64 architecture does not enforce segment limits, Xen
      cannot protect itself with them as it does in 32-bit mode.  Therefore,
      to protect itself, it runs the guest kernel in ring 3.  Since it also
      runs the guest userspace in ring 3, the guest kernel must maintain a
      second pagetable for its userspace, which does not map kernel space.
      Naturally, the guest kernel pagetables map both kernel and userspace.
      
      The userspace pagetable is attached to the corresponding kernel
      pagetable via the pgd's page->private field.  It is allocated and
      freed at the same time as the kernel pgd via the
      paravirt_pgd_alloc/free hooks.
      
      Fortunately, the user pagetable is almost entirely shared with the
      kernel pagetable; the only difference is the pgd page itself.  set_pgd
      will populate all entries in the kernel pagetable, and also set the
      corresponding user pgd entry if the address is less than
      STACK_TOP_MAX.
      
      The user pagetable must be pinned and unpinned with the kernel one,
      but because the pagetables are aliased, pgd_walk() only needs to be
      called on the kernel pagetable.  The user pgd page is then
      pinned/unpinned along with the kernel pgd page.
      
      xen_write_cr3 must write both the kernel and user cr3s.
      
      The init_mm.pgd pagetable never has a user pagetable allocated for it,
      because it can never be used while running usermode.
      
      One awkward area is that early in boot the page structures are not
      available.  No user pagetable can exist at that point, but it
      complicates the logic to avoid looking at the page structure.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
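The set_pgd behaviour described in the user pagetable commit above (always write the kernel pgd, mirror the entry into the user pgd only for addresses below STACK_TOP_MAX) can be sketched with plain arrays. Everything here is illustrative: `struct pgd_pair`, `sketch_set_pgd()` and the value chosen for `SKETCH_STACK_TOP_MAX` are assumptions of the sketch, and a simple pointer stands in for the page->private link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PGD 512
#define PGDIR_SHIFT  39

/* Assumed upper bound of userspace for this sketch (2^47 minus one page). */
#define SKETCH_STACK_TOP_MAX 0x00007ffffffff000ULL

typedef uint64_t pgd_entry;

/* Hypothetical pairing of a kernel pgd page with its user pgd page. */
struct pgd_pair {
    pgd_entry  kernel[PTRS_PER_PGD];
    pgd_entry *user;   /* NULL when no user pagetable exists (e.g. init_mm) */
};

/*
 * Write a top-level entry.  The kernel pgd always gets it; the user pgd
 * gets a copy only when the slot covers addresses below the user limit.
 */
static void sketch_set_pgd(struct pgd_pair *p, unsigned idx, pgd_entry val)
{
    assert(p != NULL && idx < PTRS_PER_PGD);

    p->kernel[idx] = val;

    /* Lowest address covered by this slot, ignoring sign extension. */
    uint64_t base = (uint64_t)idx << PGDIR_SHIFT;

    if (p->user != NULL && base < SKETCH_STACK_TOP_MAX)
        p->user[idx] = val;
}
```

Because only the top-level page differs, the lower levels of the tree are shared automatically once the same entry value sits in both pgd pages.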
    • xen64: Clear %fs on xen_load_tls() · 8a95408e
      Committed by Eduardo Habkost
      We need to do this, otherwise we can get a GPF on hypercall return
      after the TLS descriptor is cleared but %fs is still pointing to it.
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: make sure the kernel command line is right · b7c3c5c1
      Committed by Jeremy Fitzhardinge
      Point the boot params cmd_line_ptr to the domain-builder-provided
      command line.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: rework pgd_walk to deal with 32/64 bit · 5deb30d1
      Committed by Jeremy Fitzhardinge
      Rewrite pgd_walk to deal with 64-bit address spaces.  There are two
      notable features of 64-bit address spaces:
      
       1. The virtual address is only 48 bits wide, with the upper 16 bits
          being sign extension; kernel addresses are negative, and userspace is
          positive.
      
       2. The Xen hypervisor mapping is at the negative-most address, just above
          the sign-extension hole.
      
      Point 1 means that we can't easily use addresses when traversing the
      space, since we must deal with sign extension.  This rewrite expresses
      everything in terms of pgd/pud/pmd indices, which means we don't need
      to worry about the exact configuration of the virtual memory space.
      This approach works equally well in 32-bit.
      
      To deal with point 2, assume the hole is between the uppermost userspace
      address and PAGE_OFFSET.  For 64-bit this skips the Xen mapping hole.
      For 32-bit, the hole is zero-sized.
      
      In all cases, the uppermost kernel address is FIXADDR_TOP.
      
      A side-effect of this patch is that the upper boundary is actually
      handled properly, exposing a long-standing bug in 32-bit, which failed
      to pin the kernel pmd page.  The kernel pmd is not shared, and so must be
      explicitly pinned, even though the kernel ptes are shared and don't
      need pinning.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
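The reason the pgd_walk commit above prefers indices over addresses can be seen with a little arithmetic. The sketch below assumes 4-level x86-64 paging (9 index bits per level, 48-bit virtual addresses); the helper names are hypothetical. It shows that the last userspace slot and the first kernel slot are adjacent in index space even though their addresses sit on opposite sides of the sign-extension hole.

```c
#include <assert.h>
#include <stdint.h>

#define PGDIR_SHIFT 39
#define PUD_SHIFT   30
#define PMD_SHIFT   21
#define INDEX_MASK  511u   /* 9 bits of index per pagetable level */

/* Sign-extend bit 47 into the upper 16 bits to form a canonical address. */
static uint64_t canonical(uint64_t va)
{
    va &= (1ULL << 48) - 1;
    if (va & (1ULL << 47))
        va |= 0xffff000000000000ULL;
    return va;
}

/* Per-level indices; the sign-extension bits are simply masked away. */
static unsigned pgd_idx(uint64_t va) { return (unsigned)(va >> PGDIR_SHIFT) & INDEX_MASK; }
static unsigned pud_idx(uint64_t va) { return (unsigned)(va >> PUD_SHIFT) & INDEX_MASK; }
static unsigned pmd_idx(uint64_t va) { return (unsigned)(va >> PMD_SHIFT) & INDEX_MASK; }

/* Lowest canonical address covered by a given top-level slot. */
static uint64_t pgd_idx_to_addr(unsigned idx)
{
    return canonical((uint64_t)(idx & INDEX_MASK) << PGDIR_SHIFT);
}
```

With these helpers, `pgd_idx(0x00007fffffffffff)` is 255 and `pgd_idx(0xffff800000000000)` is 256, so a walker that iterates over slots 0 to 511 never has to reason about where the hole falls.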
    • xen64: implement xen_load_gs_index() · a8fc1089
      Committed by Eduardo Habkost
      xen-64: implement xen_load_gs_index()
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: add identity irq->vector map · 0725cbb9
      Committed by Jeremy Fitzhardinge
      The x86_64 interrupt subsystem is oriented towards vectors, as opposed
      to a flat irq space as it is in x86-32.  This patch adds a simple
      identity irq->vector mapping so that we can continue to feed irqs into
      do_IRQ() and get a good result.
      
      Ideally x86_32 will unify with the 64-bit code and use vectors too.
      At that point we can move to mapping event channels to vectors, which
      will allow us to economise on irqs (so per-cpu event channels can
      share irqs, rather than having to allocate one per cpu, for example).
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: register callbacks in arch-independent way · 88459d4c
      Committed by Jeremy Fitzhardinge
      Use callback_op hypercall to register callbacks in a 32/64-bit
      independent way (64-bit doesn't need a code segment, but that detail
      is hidden in XEN_CALLBACK).
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: add pvop for swapgs · 952d1d70
      Committed by Jeremy Fitzhardinge
      swapgs is a no-op under Xen, because the hypervisor makes sure the
      right version of %gs is current when switching between user and kernel
      modes.  This means that the swapgs "implementation" can be inlined and
      used when the stack is unsafe (usermode).  Unfortunately, it means
      that disabling patching will result in a non-booting kernel...
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: deal with extra words Xen pushes onto exception frames · 997409d3
      Committed by Jeremy Fitzhardinge
      Xen pushes two extra words containing the values of rcx and r11.  This
      pvop hook copies the words back into their appropriate registers, and
      cleans them off the stack.  This leaves the stack in native form, so
      the normal handler can run unchanged.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
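The frame fix-up described in the exception frame commit above can be modelled in plain C by treating the stack as an array of 64-bit words. This is only a simulation: the real hook is a few instructions of assembly, `struct sketch_regs` and `sketch_adjust_exception_frame()` are hypothetical names, and placing rcx at the top of the stack with r11 directly after it is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* The two registers whose saved values Xen pushes on top of the frame. */
struct sketch_regs {
    uint64_t rcx;
    uint64_t r11;
};

/*
 * sp points at the top-of-stack word of the Xen-style frame.  Copy the two
 * extra words into the register image and return the new top of stack,
 * which now points at a native-looking frame (rip, cs, rflags, rsp, ss).
 */
static const uint64_t *
sketch_adjust_exception_frame(const uint64_t *sp, struct sketch_regs *regs)
{
    assert(sp != 0 && regs != 0);
    regs->rcx = sp[0];   /* assumed order: rcx first ... */
    regs->r11 = sp[1];   /* ... then r11 */
    return sp + 2;       /* drop both words from the stack */
}
```

After the call, the remaining words are laid out exactly as a native handler expects, which is why the normal handler can run unchanged.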