1. 23 7月, 2010 1 次提交
  2. 04 12月, 2009 2 次提交
    • I
      xen: correctly restore pfn_to_mfn_list_list after resume · fa24ba62
      Ian Campbell 提交于
      pvops kernels >= 2.6.30 can currently only be saved and restored once. The
      second attempt to save results in:
      
          ERROR Internal error: Frame# in pfn-to-mfn frame list is not in pseudophys
          ERROR Internal error: entry 0: p2m_frame_list[0] is 0xf2c2c2c2, max 0x120000
          ERROR Internal error: Failed to map/save the p2m frame list
      
      I finally narrowed it down to:
      
          commit cdaead6b
              Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
              Date:   Fri Feb 27 15:34:59 2009 -0800
      
                  xen: split construction of p2m mfn tables from registration
      
                  Build the p2m_mfn_list_list early with the rest of the p2m table, but
                  register it later when the real shared_info structure is in place.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      
      The unforeseen side-effect of this change was to cause the mfn list list to not
      be rebuilt on resume. Prior to this change it would have been rebuilt via
      xen_post_suspend() -> xen_setup_shared_info() -> xen_setup_mfn_list_list().
      
      Fix by explicitly calling xen_build_mfn_list_list() from xen_post_suspend().
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stable Kernel <stable@kernel.org>
      fa24ba62
    • I
      xen: re-register runstate area earlier on resume. · be012920
      Ian Campbell 提交于
      This is necessary to ensure the runstate area is available to
      xen_sched_clock before any calls to printk which will require it in
      order to provide a timestamp.
      
      I chose to pull the xen_setup_runstate_info out of xen_time_init into
      the caller in order to maintain parity with calling
      xen_setup_runstate_info separately from calling xen_time_resume.
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stable Kernel <stable@kernel.org>
      be012920
  3. 31 8月, 2009 1 次提交
  4. 16 5月, 2009 1 次提交
    • J
      x86: Fix performance regression caused by paravirt_ops on native kernels · b4ecc126
      Jeremy Fitzhardinge 提交于
      Xiaohui Xin and some other folks at Intel have been looking into what's
      behind the performance hit of paravirt_ops when running native.
      
      It appears that the hit is entirely due to the paravirtualized
      spinlocks introduced by:
      
       | commit 8efcbab6
       | Date:   Mon Jul 7 12:07:51 2008 -0700
       |
       |     paravirt: introduce a "lock-byte" spinlock implementation
      
      The extra call/return in the spinlock path is somehow
      causing an increase in the cycles/instruction of somewhere around 2-7%
      (seems to vary quite a lot from test to test).  The working theory is
      that the CPU's pipeline is getting upset about the
      call->call->locked-op->return->return, and seems to be failing to
      speculate (though I haven't seen anything definitive about the precise
      reasons).  This doesn't entirely make sense, because the performance
      hit is also visible on unlock and other operations which don't involve
      locked instructions.  But spinlock operations clearly swamp all the
      other pvops operations, even though I can't imagine that they're
      nearly as common (there's only a .05% increase in instructions
      executed).
      
      If I disable just the pv-spinlock calls, my tests show that pvops is
      identical to non-pvops performance on native (my measurements show that
      it is actually about .1% faster, but Xiaohui shows a .05% slowdown).
      
      Summary of results, averaging 10 runs of the "mmperf" test, using a
      no-pvops build as baseline:
      
      		nopv		Pv-nospin	Pv-spin
      CPU cycles	100.00%		99.89%		102.18%
      instructions	100.00%		100.10%		100.15%
      CPI		100.00%		99.79%		102.03%
      cache ref	100.00%		100.84%		100.28%
      cache miss	100.00%		90.47%		88.56%
      cache miss rate	100.00%		89.72%		88.31%
      branches	100.00%		99.93%		100.04%
      branch miss	100.00%		103.66%		107.72%
      branch miss rt	100.00%		103.73%		107.67%
      wallclock	100.00%		99.90%		102.20%
      
      The clear effect here is that the 2% increase in CPI is
      directly reflected in the final wallclock time.
      
      (The other interesting effect is that the more ops are
      out of line calls via pvops, the lower the cache access
      and miss rates.  Not too surprising, but it suggests that
      the non-pvops kernel is over-inlined.  On the flipside,
      the branch misses go up correspondingly...)
      
      So, what's the fix?
      
      Paravirt patching turns all the pvops calls into direct calls, so
      _spin_lock etc do end up having direct calls.  For example, the compiler
      generated code for paravirtualized _spin_lock is:
      
      <_spin_lock+0>:		mov    %gs:0xb4c8,%rax
      <_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
      <_spin_lock+15>:	callq  *0xffffffff805a5b30
      <_spin_lock+22>:	retq
      
      The indirect call will get patched to:
      <_spin_lock+0>:		mov    %gs:0xb4c8,%rax
      <_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
      <_spin_lock+15>:	callq <__ticket_spin_lock>
      <_spin_lock+20>:	nop; nop		/* or whatever 2-byte nop */
      <_spin_lock+22>:	retq
      
      One possibility is to inline _spin_lock, etc, when building an
      optimised kernel (ie, when there's no spinlock/preempt
      instrumentation/debugging enabled).  That will remove the outer
      call/return pair, returning the instruction stream to a single
      call/return, which will presumably execute the same as the non-pvops
      case.  The downsides arel 1) it will replicate the
      preempt_disable/enable code at eack lock/unlock callsite; this code is
      fairly small, but not nothing; and 2) the spinlock definitions are
      already a very heavily tangled mass of #ifdefs and other preprocessor
      magic, and making any changes will be non-trivial.
      
      The other obvious answer is to disable pv-spinlocks.  Making them a
      separate config option is fairly easy, and it would be trivial to
      enable them only when Xen is enabled (as the only non-default user).
      But it doesn't really address the common case of a distro build which
      is going to have Xen support enabled, and leaves the open question of
      whether the native performance cost of pv-spinlocks is worth the
      performance improvement on a loaded Xen system (10% saving of overall
      system CPU when guests block rather than spin).  Still it is a
      reasonable short-term workaround.
      
      [ Impact: fix pvops performance regression when running native ]
      Analysed-by: N"Xin Xiaohui" <xiaohui.xin@intel.com>
      Analysed-by: N"Li Xin" <xin.li@intel.com>
      Analysed-by: N"Nakajima Jun" <jun.nakajima@intel.com>
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Xen-devel <xen-devel@lists.xensource.com>
      LKML-Reference: <4A0B62F7.5030802@goop.org>
      [ fixed the help text ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b4ecc126
  5. 09 4月, 2009 1 次提交
  6. 31 3月, 2009 1 次提交
  7. 30 3月, 2009 1 次提交
  8. 05 2月, 2009 1 次提交
  9. 31 1月, 2009 1 次提交
  10. 17 12月, 2008 1 次提交
  11. 01 12月, 2008 1 次提交
  12. 09 9月, 2008 1 次提交
  13. 25 8月, 2008 1 次提交
    • A
      xen: implement CPU hotplugging · d68d82af
      Alex Nixon 提交于
      Note the changes from 2.6.18-xen CPU hotplugging:
      
      A vcpu_down request from the remote admin via Xenbus both hotunplugs the
      CPU, and disables it by removing it from the cpu_present map, and removing
      its entry in /sys.
      
      A vcpu_up request from the remote admin only re-enables the CPU, and does
      not immediately bring the CPU up. A udev event is emitted, which can be
      caught by the user if he wishes to automatically re-up CPUs when available,
      or implement a more complex policy.
      Signed-off-by: NAlex Nixon <alex.nixon@citrix.com>
      Acked-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d68d82af
  14. 22 8月, 2008 1 次提交
  15. 31 7月, 2008 1 次提交
  16. 24 7月, 2008 1 次提交
  17. 16 7月, 2008 5 次提交
  18. 09 7月, 2008 1 次提交
  19. 26 6月, 2008 1 次提交
  20. 02 6月, 2008 2 次提交
  21. 27 5月, 2008 4 次提交
  22. 25 4月, 2008 3 次提交
  23. 17 4月, 2008 1 次提交
  24. 17 10月, 2007 3 次提交
    • J
      xen: deal with stale cr3 values when unpinning pagetables · 9f79991d
      Jeremy Fitzhardinge 提交于
      When a pagetable is no longer in use, it must be unpinned so that its
      pages can be freed.  However, this is only possible if there are no
      stray uses of the pagetable.  The code currently deals with all the
      usual cases, but there's a rare case where a vcpu is changing cr3, but
      is doing so lazily, and the change hasn't actually happened by the time
      the pagetable is unpinned, even though it appears to have been completed.
      
      This change adds a second per-cpu cr3 variable - xen_current_cr3 -
      which tracks the actual state of the vcpu cr3.  It is only updated once
      the actual hypercall to set cr3 has been completed.  Other processors
      wishing to unpin a pagetable can check other vcpu's xen_current_cr3
      values to see if any cross-cpu IPIs are needed to clean things up.
      
      [ Stable folks: 2.6.23 bugfix ]
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Stable Kernel <stable@kernel.org>
      9f79991d
    • J
      xen: yield to IPI target if necessary · f0d73394
      Jeremy Fitzhardinge 提交于
      When sending a call-function IPI to a vcpu, yield if the vcpu isn't
      running.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      f0d73394
    • J
      paravirt: clean up lazy mode handling · 8965c1c0
      Jeremy Fitzhardinge 提交于
      Currently, the set_lazy_mode pv_op is overloaded with 5 functions:
       1. enter lazy cpu mode
       2. leave lazy cpu mode
       3. enter lazy mmu mode
       4. leave lazy mmu mode
       5. flush pending batched operations
      
      This complicates each paravirt backend, since it needs to deal with
      all the possible state transitions, handling flushing, etc. In
      particular, flushing is quite distinct from the other 4 functions, and
      seems to just cause complication.
      
      This patch removes the set_lazy_mode operation, and adds "enter" and
      "leave" lazy mode operations on mmu_ops and cpu_ops.  All the logic
      associated with enter and leaving lazy states is now in common code
      (basically BUG_ONs to make sure that no mode is current when entering
      a lazy mode, and make sure that the mode is current when leaving).
      Also, flush is handled in a common way, by simply leaving and
      re-entering the lazy mode.
      
      The result is that the Xen, lguest and VMI lazy mode implementations
      are much simpler.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Zach Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Anthony Liguory <aliguori@us.ibm.com>
      Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      8965c1c0
  25. 11 10月, 2007 1 次提交
  26. 18 7月, 2007 2 次提交
    • J
      xen: use iret directly when possible · 9ec2b804
      Jeremy Fitzhardinge 提交于
      Most of the time we can simply use the iret instruction to exit the
      kernel, rather than having to use the iret hypercall - the only
      exception is if we're returning into vm86 mode, or from delivering an
      NMI (which we don't support yet).
      
      When running native, iret has the behaviour of testing for a pending
      interrupt atomically with re-enabling interrupts.  Unfortunately
      there's no way to do this with Xen, so there's a window in which we
      could get a recursive exception after enabling events but before
      actually returning to userspace.
      
      This causes a problem: if the nested interrupt causes one of the
      task's TIF_WORK_MASK flags to be set, they will not be checked again
      before returning to userspace.  This means that pending work may be
      left pending indefinitely, until the process enters and leaves the
      kernel again.  The net effect is that a pending signal or reschedule
      event could be delayed for an unbounded amount of time.
      
      To deal with this, the xen event upcall handler checks to see if the
      EIP is within the critical section of the iret code, after events
      are (potentially) enabled up to the iret itself.  If its within this
      range, it calls the iret critical section fixup, which adjusts the
      stack to deal with any unrestored registers, and then shifts the
      stack frame up to replace the previous invocation.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      9ec2b804
    • J
      xen: Attempt to patch inline versions of common operations · 6487673b
      Jeremy Fitzhardinge 提交于
      This patchs adds the mechanism to allow us to patch inline versions of
      common operations.
      
      The implementations of the direct-access versions save_fl, restore_fl,
      irq_enable and irq_disable are now in assembler, and the same code is
      used for both out of line and inline uses.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Keir Fraser <keir@xensource.com>
      6487673b