1. 16 Jul 2008: 4 commits
  2. 04 Jul 2008: 1 commit
  3. 26 Jun 2008: 1 commit
  4. 25 Jun 2008: 2 commits
    • xen: add mechanism to extend existing multicalls · 400d3494
      Jeremy Fitzhardinge committed
      Some Xen hypercalls accept an array of operations to work on.  In
      general this is because it's more efficient for the hypercall to do the
      work all at once rather than as separate hypercalls (even batched as a
      multicall).
      
      This patch adds a mechanism (xen_mc_extend_args()) to allocate more
      argument space to the last-issued multicall, in order to extend its
      argument list.
      
      The user of this mechanism is xen/mmu.c, which uses it to extend the
      args array of mmu_update.  This is particularly valuable when doing
      the update for a large mprotect, which goes via
      ptep_modify_prot_commit(), but it also manages to batch updates to
      pgds/pmds.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
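As a rough illustration of the idea (not the kernel's code), here is a model of a multicall batch that has a shared argument buffer. The names struct mc_batch, mc_entry(), mc_extend_args() and queue_mmu_update() are hypothetical, and so are the buffer sizes and the opcode value. The one behaviour taken from the commit is this: a new mmu_update request first tries to grow the argument list of the last-issued call, and it queues a fresh call only when that fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes for a simplified per-CPU multicall batch. */
#define MC_BATCH      32
#define MC_ARGS_BYTES 512
#define OP_MMU_UPDATE 1u /* hypothetical opcode for mmu_update */

struct multicall_entry {
    unsigned op;      /* which hypercall this entry issues */
    size_t args_off;  /* start of this call's args in argbuf */
    size_t args_len;  /* bytes of args currently owned by this call */
};

struct mc_batch {
    struct multicall_entry entries[MC_BATCH];
    unsigned n;                          /* entries queued */
    unsigned char argbuf[MC_ARGS_BYTES]; /* shared argument space */
    size_t argidx;                       /* next free byte in argbuf */
};

/* Simplified stand-in for an mmu_update request: pte address + new value. */
struct mmu_update {
    uint64_t ptr;
    uint64_t val;
};

/* Queue a new call with `len` bytes of args; NULL if the batch is full
   (a real implementation would flush the batch instead). */
static void *mc_entry(struct mc_batch *b, unsigned op, size_t len)
{
    if (b->n == MC_BATCH || b->argidx + len > MC_ARGS_BYTES)
        return NULL;
    struct multicall_entry *e = &b->entries[b->n++];
    e->op = op;
    e->args_off = b->argidx;
    e->args_len = len;
    b->argidx += len;
    return &b->argbuf[e->args_off];
}

/* Grow the args of the last-issued call, but only if it is the same op. */
static void *mc_extend_args(struct mc_batch *b, unsigned op, size_t len)
{
    if (b->n == 0)
        return NULL;
    struct multicall_entry *last = &b->entries[b->n - 1];
    /* Args are appended linearly, so the last call always ends at argidx. */
    assert(last->args_off + last->args_len == b->argidx);
    if (last->op != op || b->argidx + len > MC_ARGS_BYTES)
        return NULL;
    void *p = &b->argbuf[b->argidx];
    last->args_len += len;
    b->argidx += len;
    return p;
}

/* Add one pte update, preferring to extend the previous mmu_update call. */
static int queue_mmu_update(struct mc_batch *b, uint64_t ptr, uint64_t val)
{
    struct mmu_update u = { ptr, val };
    void *dst = mc_extend_args(b, OP_MMU_UPDATE, sizeof u);
    if (!dst)
        dst = mc_entry(b, OP_MMU_UPDATE, sizeof u);
    if (!dst)
        return -1;
    memcpy(dst, &u, sizeof u); /* argbuf has no particular alignment */
    return 0;
}
```

Three consecutive calls to queue_mmu_update() end up as a single queued call that carries three requests. If a different operation is queued in between, the next update has to start a new call.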
    • xen: implement ptep_modify_prot_start/commit · e57778a1
      Jeremy Fitzhardinge committed
      Xen has a pte update function which will update a pte while preserving
      its accessed and dirty bits.  This means that ptep_modify_prot_start() can be
      implemented as a simple read of the pte value.  The hardware may
      update the pte in the meantime, but ptep_modify_prot_commit() updates it while
      preserving any changes that may have happened in the meantime.
      
      The updates in ptep_modify_prot_commit() are batched if we're currently in lazy
      mmu mode.
      
      The mmu_update hypercall can take a batch of updates to perform, but
      this code doesn't make particular use of that feature, in favour of
      using generic multicall batching to get them all into the hypervisor.
      
      The net effect of this is that each mprotect pte update turns from two
      expensive trap-and-emulate faults into the hypervisor into a single
      hypercall whose cost is amortized in a batched multicall.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
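This is a minimal single-CPU sketch of the start/commit pair. It assumes that the A/D-preserving update ORs the current accessed and dirty bits into the new value. The hypervisor's update is modelled with a C11 compare-and-swap loop, and hw_mark_dirty() stands in for the MMU setting bits between the two calls. All function names are hypothetical, and lazy-mode batching is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* x86 PTE bits: present (0), writable (1), accessed (5), dirty (6). */
#define PTE_PRESENT  ((uint64_t)1 << 0)
#define PTE_RW       ((uint64_t)1 << 1)
#define PTE_ACCESSED ((uint64_t)1 << 5)
#define PTE_DIRTY    ((uint64_t)1 << 6)
#define PTE_AD       (PTE_ACCESSED | PTE_DIRTY)

typedef _Atomic uint64_t pte_t;

/* start: modelled as a plain read of the current PTE value. */
static uint64_t modify_prot_start(pte_t *ptep)
{
    return atomic_load(ptep);
}

/* commit: install newval, keeping any A/D bits set since start
   (models an A/D-preserving update done by the hypervisor). */
static void modify_prot_commit(pte_t *ptep, uint64_t newval)
{
    uint64_t cur = atomic_load(ptep);
    uint64_t merged;
    do {
        merged = newval | (cur & PTE_AD);
    } while (!atomic_compare_exchange_weak(ptep, &cur, merged));
}

/* Simulated MMU: a write through the old mapping sets A and D. */
static void hw_mark_dirty(pte_t *ptep)
{
    atomic_fetch_or(ptep, PTE_ACCESSED | PTE_DIRTY);
}

/* mprotect-style change that drops write permission. */
static void make_readonly(pte_t *ptep, int racing_write)
{
    uint64_t old = modify_prot_start(ptep);
    if (racing_write)
        hw_mark_dirty(ptep); /* happens between start and commit */
    modify_prot_commit(ptep, old & ~PTE_RW);
}
```

In the racing case, a plain store of `old & ~PTE_RW` would have lost the dirty bit. The merge in modify_prot_commit() keeps it.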
  5. 24 Jun 2008: 1 commit
  6. 20 Jun 2008: 4 commits
  7. 02 Jun 2008: 2 commits
  8. 28 May 2008: 1 commit
    • xen: fix early bootup crash on native hardware · b20aeccd
      Ingo Molnar committed
      -tip tree auto-testing found the following early bootup hang:
      
      -------------->
      get_memcfg_from_srat: assigning address to rsdp
      RSD PTR  v0 [Nvidia]
      BUG: Int 14: CR2 ffd00040
           EDI 8092fbfe  ESI ffd00040  EBP 80b0aee8  ESP 80b0aed0
           EBX 000f76f0  EDX 0000000e  ECX 00000003  EAX ffd00040
           err 00000000  EIP 802c055a   CS 00000060  flg 00010006
      Stack: ffd00040 80bc78d0 80b0af6c 80b1dbfe 8093d8ba 00000008 80b42810 80b4ddb4
             80b42842 00000000 80b0af1c 801079c8 808e724e 00000000 80b42871 802c0531
             00000100 00000000 0003fff0 80b0af40 80129999 00040100 00040100 00000000
      Pid: 0, comm: swapper Not tainted 2.6.26-rc4-sched-devel.git #570
       [<802c055a>] ? strncmp+0x11/0x25
       [<80b1dbfe>] ? get_memcfg_from_srat+0xb4/0x568
       [<801079c8>] ? mcount_call+0x5/0x9
       [<802c0531>] ? strcmp+0xa/0x22
       [<80129999>] ? printk+0x38/0x3a
       [<80129999>] ? printk+0x38/0x3a
       [<8011b122>] ? memory_present+0x66/0x6f
       [<80b216b4>] ? setup_memory+0x13/0x40c
       [<80b16b47>] ? propagate_e820_map+0x80/0x97
       [<80b1622a>] ? setup_arch+0x248/0x477
       [<80129999>] ? printk+0x38/0x3a
       [<80b11759>] ? start_kernel+0x6e/0x2eb
       [<80b110fc>] ? i386_start_kernel+0xeb/0xf2
       =======================
      <------
      
      with this config:
      
         http://redhat.com/~mingo/misc/config-Wed_May_28_01_33_33_CEST_2008.bad
      
      The thing is, the crash makes little sense at first sight. We crash on a
      benign-looking printk. The code around it got changed in -tip but
      checking those topic branches individually did not reproduce the bug.
      
      Bisection led to this commit:
      
      |   d5edbc1f is first bad commit
      |   commit d5edbc1f
      |   Author: Jeremy Fitzhardinge <jeremy@goop.org>
      |   Date:   Mon May 26 23:31:22 2008 +0100
      |
      |   xen: add p2m mfn_list_list
      
      This is somewhat surprising, as on native hardware the Xen client-side
      code should have few or no side effects.
      
      After some head scratching, it turns out the following happened:
      randconfig enabled the following Xen options:
      
        CONFIG_XEN=y
        CONFIG_XEN_MAX_DOMAIN_MEMORY=8
        # CONFIG_XEN_BLKDEV_FRONTEND is not set
        # CONFIG_XEN_NETDEV_FRONTEND is not set
        CONFIG_HVC_XEN=y
        # CONFIG_XEN_BALLOON is not set
      
      which activated this piece of code in arch/x86/xen/mmu.c:
      
      > @@ -69,6 +69,13 @@
      >  	__attribute__((section(".data.page_aligned"))) =
      >  		{ [ 0 ... TOP_ENTRIES - 1] = &p2m_missing[0] };
      >
      > +/* Arrays of p2m arrays expressed in mfns used for save/restore */
      > +static unsigned long p2m_top_mfn[TOP_ENTRIES]
      > +	__attribute__((section(".bss.page_aligned")));
      > +
      > +static unsigned long p2m_top_mfn_list[TOP_ENTRIES / P2M_ENTRIES_PER_PAGE]
      > +	__attribute__((section(".bss.page_aligned")));
      
      The problem is, you must only put variables into .bss.page_aligned that
      have a _size_ that is _exactly_ page aligned. In this case the size of
      p2m_top_mfn_list is not page aligned:
      
       80b8d000 b p2m_top_mfn
       80b8f000 b p2m_top_mfn_list
       80b8f008 b softirq_stack
       80b97008 b hardirq_stack
       80b9f008 b bm_pte
      
      So all subsequent variables become misaligned, which, depending on luck,
      breaks the kernel in various funny ways. In this case what killed the
      kernel first was the misaligned bootmap pte page, resulting in the
      creative crash above.
      
      Anyway, this was a fun bug to track down :-)
      
      I think the moral is that .bss.page_aligned is a dangerous construct in
      its current form, and the symptoms of breakage are very non-trivial, so
      we need build-time checks to make sure all symbols in
      .bss.page_aligned are truly page aligned.
      
      The Xen fix in this commit gets the kernel booting again.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
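The arithmetic behind the symbol listing can be checked in a few lines of C. The macro definitions are a reconstruction that rests on assumptions: 4 KiB pages, a 4-byte unsigned long on i386, and CONFIG_XEN_MAX_DOMAIN_MEMORY counted in GiB. The results do match the listing, where p2m_top_mfn spans 0x2000 bytes and p2m_top_mfn_list only 8. ASSERT_PAGE_MULTIPLE is a hypothetical example of the build-time check proposed in the commit message. It is not a check the kernel is known to use.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed i386 parameters; LONG_SIZE is fixed at 4 rather than
   sizeof(unsigned long) so the numbers don't depend on the host. */
#define PAGE_SIZE            4096UL
#define LONG_SIZE            4UL
#define MAX_DOMAIN_MEMORY_GB 8ULL /* CONFIG_XEN_MAX_DOMAIN_MEMORY=8 */

#define MAX_DOMAIN_PAGES     (MAX_DOMAIN_MEMORY_GB * 1024 * 1024 * 1024 / PAGE_SIZE)
#define P2M_ENTRIES_PER_PAGE (PAGE_SIZE / LONG_SIZE)
#define TOP_ENTRIES          (MAX_DOMAIN_PAGES / P2M_ENTRIES_PER_PAGE)

/* Byte sizes of the two arrays placed in .bss.page_aligned. */
#define P2M_TOP_MFN_BYTES      (TOP_ENTRIES * LONG_SIZE)
#define P2M_TOP_MFN_LIST_BYTES ((TOP_ENTRIES / P2M_ENTRIES_PER_PAGE) * LONG_SIZE)

/* Hypothetical build-time check: objects in .bss.page_aligned must
   occupy a whole number of pages. */
#define ASSERT_PAGE_MULTIPLE(bytes) \
    _Static_assert((bytes) % PAGE_SIZE == 0, #bytes " is not a multiple of PAGE_SIZE")

ASSERT_PAGE_MULTIPLE(P2M_TOP_MFN_BYTES); /* 8192 bytes: passes */
/* ASSERT_PAGE_MULTIPLE(P2M_TOP_MFN_LIST_BYTES);  8 bytes: would fail */

/* Offset of an address within its page, as read off the symbol listing. */
static unsigned long page_offset(uint64_t addr)
{
    return (unsigned long)(addr % PAGE_SIZE);
}
```

If the second ASSERT_PAGE_MULTIPLE line is uncommented, compilation stops. That is the early warning the commit message asks for, and it would replace a boot crash that is hard to trace. page_offset(0x80b8f008) gives 8, which is the misalignment that every later symbol inherits.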
  9. 27 May 2008: 5 commits
  10. 23 May 2008: 2 commits
  11. 28 Apr 2008: 1 commit
  12. 25 Apr 2008: 4 commits
  13. 05 Apr 2008: 1 commit
  14. 10 Feb 2008: 1 commit
  15. 30 Jan 2008: 4 commits
  16. 30 Nov 2007: 1 commit
  17. 18 Oct 2007: 1 commit
  18. 17 Oct 2007: 4 commits
    • xen: lock pte pages while pinning/unpinning · 74260714
      Jeremy Fitzhardinge committed
      When a pagetable is created, it is made globally visible in the rmap
      prio tree before it is pinned via arch_dup_mmap(), and remains in the
      rmap tree while it is unpinned with arch_exit_mmap().
      
      This means that other CPUs may race with the pinning/unpinning
      process, and see a pte between when it gets marked RO and actually
      pinned, causing any pte updates to fail with write-protect faults.
      
      As a result, all pte pages must be properly locked, and only unlocked
      once the pinning/unpinning process has finished.
      
      In order to avoid taking spinlocks for the whole pagetable (which may
      overflow the PREEMPT_BITS portion of the preempt counter), it locks and
      pins each pte page individually, and then finally pins the whole pagetable.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Keir Fraser <keir@xensource.com>
      Cc: Jan Beulich <jbeulich@novell.com>
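Here is one reading of the last paragraph, sketched with POSIX mutexes in place of the kernel's pte spinlocks. Each pte page is locked, made read-only, pinned, and unlocked before the next page is processed, so at most one pte lock is held at any time. struct pte_page, pin_pagetable() and update_pte() are hypothetical. In the real code, making a page read-only and pinning it are separate hypervisor operations, and that gap is the window a racing CPU could otherwise observe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pte page: a lock plus two states that must change together. */
struct pte_page {
    pthread_mutex_t lock;
    bool readonly;
    bool pinned;
};

struct pagetable {
    struct pte_page *pages;
    size_t npages;
    bool pgd_pinned;
};

/* Pin one pte page under its own lock, so no other thread can see it
   read-only but not yet pinned. */
static void pin_pte_page(struct pte_page *p)
{
    pthread_mutex_lock(&p->lock);
    p->readonly = true; /* first step: mark RO */
    p->pinned = true;   /* second step: pin */
    pthread_mutex_unlock(&p->lock);
}

/* Only one pte lock is held at a time, rather than one per page at once. */
static void pin_pagetable(struct pagetable *pt)
{
    for (size_t i = 0; i < pt->npages; i++)
        pin_pte_page(&pt->pages[i]);
    pt->pgd_pinned = true; /* finally pin the whole pagetable */
}

/* A racing pte update takes the same lock. It returns true if the page is
   still writable (direct write) and false if it is pinned (update via the
   hypervisor). */
static bool update_pte(struct pte_page *p)
{
    pthread_mutex_lock(&p->lock);
    assert(p->readonly == p->pinned); /* never half-pinned under the lock */
    bool direct = !p->readonly;
    pthread_mutex_unlock(&p->lock);
    return direct;
}
```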
    • xen: deal with stale cr3 values when unpinning pagetables · 9f79991d
      Jeremy Fitzhardinge committed
      When a pagetable is no longer in use, it must be unpinned so that its
      pages can be freed.  However, this is only possible if there are no
      stray uses of the pagetable.  The code currently deals with all the
      usual cases, but there's a rare case where a vcpu is changing cr3, but
      is doing so lazily, and the change hasn't actually happened by the time
      the pagetable is unpinned, even though it appears to have been completed.
      
      This change adds a second per-cpu cr3 variable - xen_current_cr3 -
      which tracks the actual state of the vcpu cr3.  It is only updated once
      the actual hypercall to set cr3 has been completed.  Other processors
      wishing to unpin a pagetable can check other vcpus' xen_current_cr3
      values to see if any cross-cpu IPIs are needed to clean things up.
      
      [ Stable folks: 2.6.23 bugfix ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Stable Kernel <stable@kernel.org>
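This is a minimal model of the two per-CPU cr3 values, with plain arrays standing in for per-cpu variables. Only xen_current_cr3 comes from the commit. The first variable is assumed here to be called xen_cr3, and NR_CPUS, lazy_write_cr3(), flush_batch() and cpus_needing_ipi() are hypothetical. The point is that the unpinning side reads the value that changes only after the hypercall has completed.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* assumed for the sketch */

/* xen_cr3: the cr3 the vcpu has asked for (may still be queued lazily).
   xen_current_cr3: the cr3 the hypervisor has actually loaded. */
static unsigned long xen_cr3[NR_CPUS];
static unsigned long xen_current_cr3[NR_CPUS];
static bool cr3_pending[NR_CPUS];

/* Lazy cr3 write: only the requested value changes for now. */
static void lazy_write_cr3(int cpu, unsigned long cr3)
{
    xen_cr3[cpu] = cr3;
    cr3_pending[cpu] = true;
}

/* Flushing the batch issues the set-cr3 hypercall; only after it
   completes is xen_current_cr3 updated. */
static void flush_batch(int cpu)
{
    if (cr3_pending[cpu]) {
        /* ... hypercall would be issued here ... */
        xen_current_cr3[cpu] = xen_cr3[cpu];
        cr3_pending[cpu] = false;
    }
}

/* Before unpinning pgd, collect the CPUs still really running on it.
   Looking at xen_cr3 alone would miss a switch that is still queued. */
static unsigned cpus_needing_ipi(unsigned long pgd)
{
    unsigned mask = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (xen_current_cr3[cpu] == pgd)
            mask |= 1u << cpu;
    return mask;
}
```

Suppose CPU 1 has asked to switch away from 0x1000, but flush_batch(1) has not run yet. CPU 1 still appears in the IPI mask, because the hypervisor has not switched it.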
    • Clean up duplicate includes in arch/i386/xen/ · d626a1f1
      Jesper Juhl committed
      This patch cleans up duplicate includes in
      	arch/i386/xen/
      Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
    • paravirt: clean up lazy mode handling · 8965c1c0
      Jeremy Fitzhardinge committed
      Currently, the set_lazy_mode pv_op is overloaded with 5 functions:
       1. enter lazy cpu mode
       2. leave lazy cpu mode
       3. enter lazy mmu mode
       4. leave lazy mmu mode
       5. flush pending batched operations
      
      This complicates each paravirt backend, since it needs to deal with
      all the possible state transitions, handling flushing, etc. In
      particular, flushing is quite distinct from the other 4 functions, and
      seems to just cause complication.
      
      This patch removes the set_lazy_mode operation, and adds "enter" and
      "leave" lazy mode operations on mmu_ops and cpu_ops.  All the logic
      associated with entering and leaving lazy states is now in common code
      (basically BUG_ONs to make sure that no mode is current when entering
      a lazy mode, and that the mode is current when leaving).
      Also, flush is handled in a common way, by simply leaving and
      re-entering the lazy mode.
      
      The result is that the Xen, lguest and VMI lazy mode implementations
      are much simpler.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Zach Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Anthony Liguori <aliguori@us.ibm.com>
      Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
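Here is a sketch of the common enter/leave logic this commit describes, with assert() standing in for BUG_ON. To keep it short, a single struct lazy_ops is passed in instead of separate hooks on mmu_ops and cpu_ops. The toy backend is hypothetical: it issues its queued operations when lazy mode is left. Flushing then reduces to a leave followed by a re-enter, so the backend never needs its own flush hook.

```c
#include <assert.h>

enum lazy_mode { LAZY_NONE, LAZY_MMU, LAZY_CPU };

/* Hypothetical backend hooks: only enter and leave, no flush. */
struct lazy_ops {
    void (*enter)(void);
    void (*leave)(void);
};

static enum lazy_mode current_mode = LAZY_NONE;

/* Common code owns the state checks (BUG_ON in the kernel). */
static void enter_lazy(const struct lazy_ops *ops, enum lazy_mode mode)
{
    assert(current_mode == LAZY_NONE);
    ops->enter();
    current_mode = mode;
}

static void leave_lazy(const struct lazy_ops *ops, enum lazy_mode mode)
{
    assert(current_mode == mode);
    ops->leave();
    current_mode = LAZY_NONE;
}

/* Flush = leave the active mode and immediately re-enter it. */
static void flush_lazy(const struct lazy_ops *ops)
{
    enum lazy_mode mode = current_mode;
    if (mode != LAZY_NONE) {
        leave_lazy(ops, mode);
        enter_lazy(ops, mode);
    }
}

/* Toy batching backend: operations queue up while lazy and are
   issued as one batch on leave. */
static int queued, issued, batches;

static void toy_enter(void) { }

static void toy_leave(void)
{
    if (queued) {
        issued += queued;
        queued = 0;
        batches++;
    }
}

static const struct lazy_ops toy_ops = { toy_enter, toy_leave };

static void queue_op(void)
{
    if (current_mode == LAZY_NONE) { /* not lazy: issue immediately */
        issued++;
        batches++;
    } else {
        queued++;
    }
}
```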