1. 04 Dec, 2014 (5 commits)
  2. 23 Oct, 2014 (3 commits)
    •
      x86/xen: avoid race in p2m handling · 3a0e94f8
      Committed by Juergen Gross
      When a new p2m leaf is allocated this leaf is linked into the p2m tree
      via cmpxchg. Unfortunately the compare value for checking the success
      of the update is read after checking for the need of a new leaf. It is
      possible that a new leaf has been linked into the tree concurrently
      in between. This could lead to a leaked memory page and to the loss of
      some p2m entries.
      
      Avoid the race by using the read compare value for checking the need
      of a new p2m leaf and use ACCESS_ONCE() to get it.
      
      There are other places which seem to need ACCESS_ONCE() to ensure
      proper operation. Change them accordingly.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      3a0e94f8
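The race fixed above can be sketched in userspace C. This is a hypothetical analog, not the kernel code: the slot is read exactly once, and that single snapshot is used both for the "is a new leaf needed?" check and as the cmpxchg compare value.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace analog of the p2m fix. The key point from the
 * commit: read the slot ONCE (the ACCESS_ONCE()-style read), then use
 * that same snapshot both to decide whether a new leaf is needed and as
 * the cmpxchg compare value. Re-reading in between reopens the race and
 * leaks the newly allocated page. */
static _Atomic(void *) slot;

void *install_leaf(void *new_leaf)
{
    void *expected = atomic_load(&slot);   /* single read of the slot */

    if (expected != NULL)
        return expected;                   /* leaf already linked: reuse it */

    if (atomic_compare_exchange_strong(&slot, &expected, new_leaf))
        return new_leaf;                   /* we linked our leaf */

    return expected;                       /* lost the race: caller frees new_leaf */
}
```

If the load were repeated between the NULL check and the cmpxchg, a concurrently linked leaf could be silently replaced, which is exactly the leak the commit describes.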
    •
      x86/xen: delay construction of mfn_list_list · 2c185687
      Committed by Juergen Gross
      The 3 level p2m tree for the Xen tools is constructed very early at
      boot by calling xen_build_mfn_list_list(). Memory needed for this tree
      is allocated via extend_brk().
      
      As this tree (unlike the kernel-internal p2m tree) is only needed
      for domain save/restore, live migration and crash dump analysis, it
      doesn't matter whether it is constructed very early or just some
      milliseconds later when memory allocation is possible by other means.
      
      This patch moves the call of xen_build_mfn_list_list() just after
      calling xen_pagetable_p2m_copy() simplifying this function, too, as it
      doesn't have to bother with two parallel trees now. The same applies
      for some other internal functions.
      
      While simplifying code, make early_can_reuse_p2m_middle() static and
      drop the unused second parameter. p2m_mid_identity_mfn can be removed
      as well, it isn't used either.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      2c185687
    •
      x86/xen: avoid writing to freed memory after race in p2m handling · 239af7c7
      Committed by Juergen Gross
      When a race is detected during allocation of a new p2m tree
      element in alloc_p2m(), the newly allocated mid_mfn page is freed
      without updating the pointer to the value found in the tree. This
      results in the just-freed page being overwritten with the mfn of
      the p2m leaf.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      239af7c7
  3. 23 Sep, 2014 (1 commit)
    •
      xen/setup: Remap Xen Identity Mapped RAM · 4fbb67e3
      Committed by Matt Rushton
      Instead of ballooning dom0 memory up and down, this remaps the existing
      mfns that were replaced by the identity map. The reason for this is that
      the existing implementation ballooned memory up and down, which caused
      dom0 to have discontiguous pages. In some cases this resulted in the use
      of bounce buffers, which reduced network I/O performance significantly.
      This change honors the existing order of the pages, with the exception
      of some boundary conditions.
      
      To do this we need to update both the Linux p2m table and the Xen m2p table.
      Particular care must be taken when updating the p2m table since it's important
      to limit table memory consumption and reuse the existing leaf pages which get
      freed when an entire leaf page is set to the identity map. To implement this,
      mapping updates are grouped into blocks with table entries getting cached
      temporarily and then released.
      
      On my test system before:
      Total pages: 2105014
      Total contiguous: 1640635
      
      After:
      Total pages: 2105014
      Total contiguous: 2098904
      Signed-off-by: Matthew Rushton <mrushton@amazon.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      4fbb67e3
  4. 01 Aug, 2014 (1 commit)
  5. 15 May, 2014 (4 commits)
  6. 18 Mar, 2014 (1 commit)
  7. 03 Feb, 2014 (1 commit)
  8. 31 Jan, 2014 (1 commit)
    •
      xen/grant-table: Avoid m2p_override during mapping · 08ece5bb
      Committed by Zoltan Kiss
      The grant mapping API does m2p_override unnecessarily: only gntdev needs
      it; for blkback and future netback patches it just causes lock contention,
      as those pages never go to userspace. Therefore this series does the following:
      - the original functions were renamed to __gnttab_[un]map_refs, with a new
        parameter m2p_override
      - based on m2p_override either they follow the original behaviour, or just set
        the private flag and call set_phys_to_machine
      - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
        m2p_override false
      - a new function gnttab_[un]map_refs_userspace provides the old behaviour
      
      It also removes a stray space from page.h and changes ret to 0 if
      XENFEAT_auto_translated_physmap, as that is the only possible return
      value there.
      
      v2:
      - move the storing of the old mfn in page->index to gnttab_map_refs
      - move the function header update to a separate patch
      
      v3:
      - a new approach to retain the old behaviour where it is needed
      - squash the patches into one
      
      v4:
      - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
      - clear page->private before doing anything with the page, so m2p_find_override
        won't race with this
      
      v5:
      - change return value handling in __gnttab_[un]map_refs
      - remove a stray space in page.h
      - add detail why ret = 0 now at some places
      
      v6:
      - don't pass pfn to m2p* functions, just get it locally
      Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
      Suggested-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      08ece5bb
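The wrapper split described in the series can be sketched as follows. The function names follow the commit message; the bodies are illustrative stubs (returning which path was taken, for demonstration only), not the kernel implementation.

```c
#include <stdbool.h>

/* Sketch of the __gnttab_[un]map_refs split: one internal worker taking
 * an m2p_override flag, plus two thin entry points. Kernel users that
 * never expose pages to userspace (blkback, netback) take the fast path
 * and avoid the m2p_override lock contention the commit describes. */
enum map_path { PATH_FAST = 0, PATH_M2P_OVERRIDE = 1 };

static enum map_path __gnttab_map_refs_sketch(bool m2p_override)
{
    if (!m2p_override) {
        /* blkback/netback: just set the private flag and call
         * set_phys_to_machine -- no m2p_override lock taken */
        return PATH_FAST;
    }
    /* gntdev (userspace mappings): original m2p_override behaviour */
    return PATH_M2P_OVERRIDE;
}

enum map_path gnttab_map_refs_sketch(void)
{
    return __gnttab_map_refs_sketch(false);
}

enum map_path gnttab_map_refs_userspace_sketch(void)
{
    return __gnttab_map_refs_sketch(true);
}
```

Callers that previously used gnttab_map_refs for userspace mappings would switch to the _userspace variant; everyone else gets the cheaper path by default.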
  9. 06 Jan, 2014 (2 commits)
    •
      xen/pvh: Setup up shared_info. · 4dd322bc
      Committed by Mukesh Rathor
      For PVHVM the shared_info structure is provided in the same way
      as for normal PV guests (see include/xen/interface/xen.h).
      
      That is during bootup we get 'xen_start_info' via the %esi register
      in startup_xen. Then later we extract the 'shared_info' from said
      structure (in xen_setup_shared_info) and start using it.
      
      The 'xen_setup_shared_info' is all set up to work with auto-xlat
      guests, but there are two functions which it calls that are not:
      xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
      This patch modifies the P2M code (xen_setup_mfn_list_list)
      while the "Piggyback on PVHVM for event channels" patch modifies
      xen_setup_vcpu_info_placement.
      Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      4dd322bc
    •
      xen/pvh: Don't setup P2M tree. · 696fd7c5
      Committed by Konrad Rzeszutek Wilk
      P2M is not available for PVH. Fortunately for us the
      P2M code already has most of the support for auto-xlat guests, thanks to
      commit 3d24bbd7
      "grant-table: call set_phys_to_machine after mapping grant refs"
      which: "
      introduces set_phys_to_machine calls for auto_translated guests
      (even on x86) in gnttab_map_refs and gnttab_unmap_refs.
      translated by swiotlb-xen... " so we don't need to muck much.
      
      With the above mentioned commit "you'll get set_phys_to_machine calls
      from gnttab_map_refs and gnttab_unmap_refs but PVH guests won't do
      anything with them" (Stefano Stabellini), which is OK - we want
      them to be NOPs.
      
      This is because we assume that an "IOMMU is always present on the
      platform and Xen is going to make the appropriate IOMMU pagetable
      changes in the hypercall implementation of GNTTABOP_map_grant_ref
      and GNTTABOP_unmap_grant_ref, then everything should be transparent
      from the PVH privileged point of view and DMA transfers involving
      foreign pages keep working with no issues.

      Otherwise we would need a P2M (and an M2P) for the PVH privileged
      domain to track these foreign pages (see arch/arm/xen/p2m.c)."
      (Stefano Stabellini).
      
      We still have to inhibit the building of the P2M tree.
      That had been done in the past by not calling
      xen_build_dynamic_phys_to_machine (which sets up the P2M tree
      and gives us virtual addresses to access it). But we were missing
      a check for xen_build_mfn_list_list - which would continue to set up
      the P2M tree and would blow up trying to get the virtual
      address of p2m_missing (which would have been set up by
      xen_build_dynamic_phys_to_machine).
      
      Hence a check is needed to not call xen_build_mfn_list_list when
      running in auto-xlat mode.
      
      Instead of replicating the check for auto-xlat in enlighten.c,
      do it in the p2m.c code. The reason is that xen_build_mfn_list_list
      is also called in xen_arch_post_suspend without any checks for
      auto-xlat. So for PVH or PV with auto-xlat we would needlessly
      allocate space for a P2M tree.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      696fd7c5
  10. 10 Oct, 2013 (1 commit)
  11. 25 Sep, 2013 (1 commit)
    •
      xen/p2m: check MFN is in range before using the m2p table · 0160676b
      Committed by David Vrabel
      On hosts with more than 168 GB of memory, a 32-bit guest may attempt
      to grant map an MFN that it cannot look up in its mapping of the
      m2p table.  There is an m2p lookup as part of m2p_add_override() and
      m2p_remove_override().  The lookup falls off the end of the mapped
      portion of the m2p and (because the mapping is at the highest virtual
      address) wraps around, and the lookup causes a fault on what appears
      to be a user space address.
      
      do_page_fault() (thinking it's a fault to a userspace address) tries
      to lock mm->mmap_sem.  If the gntdev device is used for the grant map,
      m2p_add_override() is called from gnttab_mmap() with mm->mmap_sem
      already locked.  do_page_fault() then deadlocks.
      
      The deadlock would most commonly occur when a 64-bit guest is started
      and xenconsoled attempts to grant map its console ring.
      
      Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the
      mapped portion of the m2p table before accessing the table and use
      this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn()
      (which already had the correct range check).
      
      All faults caused by accessing the non-existent parts of the m2p are
      thus within the kernel address space and exception_fixup() is called
      without trying to lock mm->mmap_sem.
      
      This means that for MFNs that are outside the mapped range of the m2p
      then mfn_to_pfn() will always look in the m2p overrides.  This is
      correct because it must be a foreign MFN (and the PFN in the m2p in
      this case is only relevant for the other domain).
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      --
      v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
      v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of
          range as it's probably foreign.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      0160676b
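The range check the commit introduces can be sketched like this. The table size and the INVALID sentinel are toy values for illustration, not the kernel constants.

```c
/* Sketch of the mfn_to_pfn_no_overrides() idea: bounds-check the MFN
 * against the mapped portion of the m2p before indexing it, so an
 * out-of-range (foreign) MFN returns INVALID instead of faulting at a
 * wrapped-around, userspace-looking address. */
#define M2P_ENTRIES  1024UL     /* assumed mapped m2p size, toy value */
#define INVALID_PFN  (~0UL)

static unsigned long m2p_table[M2P_ENTRIES];

unsigned long mfn_to_pfn_no_overrides_sketch(unsigned long mfn)
{
    if (mfn >= M2P_ENTRIES)
        return INVALID_PFN;     /* foreign MFN: caller consults the overrides */
    return m2p_table[mfn];
}
```

With this shape, the only faults possible are for MFNs inside the mapped range, which is what lets the kernel handle them via exception fixup instead of the userspace page-fault path.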
  12. 09 Sep, 2013 (1 commit)
  13. 20 Aug, 2013 (1 commit)
  14. 09 Aug, 2013 (1 commit)
  15. 12 Sep, 2012 (1 commit)
    •
      xen/m2p: do not reuse kmap_op->dev_bus_addr · 2fc136ee
      Committed by Stefano Stabellini
      If the caller passes a valid kmap_op to m2p_add_override, we use
      kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is
      part of the interface with Xen and if we are batching the hypercalls it
      might not have been written by the hypervisor yet. That means that later
      on Xen will write to it and we'll think that the original mfn is
      actually what Xen has written to it.
      
      Rather than "stealing" struct members from kmap_op, keep using
      page->index to store the original mfn and add another parameter to
      m2p_remove_override to get the corresponding kmap_op instead.
      It is now the responsibility of the caller to keep track of which kmap_op
      corresponds to a particular page in the m2p_override (gntdev, the only
      user of this interface that passes a valid kmap_op, is already doing that).
      
      CC: stable@kernel.org
      Reported-and-Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2fc136ee
  16. 05 Sep, 2012 (1 commit)
    •
      xen/p2m: Fix one-off error in checking the P2M tree directory. · 50e90041
      Committed by Konrad Rzeszutek Wilk
      We would traverse the full P2M top directory (from 0->MAX_DOMAIN_PAGES
      inclusive) when trying to figure out whether we can re-use some of the
      P2M middle leafs.
      
      Which meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512
      we would try to use the 512th entry. Fortunately for us the p2m_top_index
      has a check for this:
      
       BUG_ON(pfn >= MAX_P2M_PFN);
      
      which we hit and saw this:
      
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.1.2-OVM  x86_64  debug=n  Tainted:    C ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff819cadeb>]
      (XEN) RFLAGS: 0000000000000212   EM: 1   CONTEXT: pv guest
      (XEN) rax: ffffffff81db5000   rbx: ffffffff81db4000   rcx: 0000000000000000
      (XEN) rdx: 0000000000480211   rsi: 0000000000000000   rdi: ffffffff81db4000
      (XEN) rbp: ffffffff81793db8   rsp: ffffffff81793d38   r8:  0000000008000000
      (XEN) r9:  4000000000000000   r10: 0000000000000000   r11: ffffffff81db7000
      (XEN) r12: 0000000000000ff8   r13: ffffffff81df1ff8   r14: ffffffff81db6000
      (XEN) r15: 0000000000000ff8   cr0: 000000008005003b   cr4: 00000000000026f0
      (XEN) cr3: 0000000661795000   cr2: 0000000000000000
      
      Fixes-Oracle-Bug: 14570662
      CC: stable@vger.kernel.org # only for v3.5
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      50e90041
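The off-by-one in miniature: with a 512-entry directory the valid indices are 0..511, but an inclusive upper bound walks a 513th slot, which is what the kernel's BUG_ON(pfn >= MAX_P2M_PFN) caught. The constant below is a toy stand-in for MAX_DOMAIN_PAGES.

```c
/* Sketch of the loop-bound bug: an inclusive bound (i <= N) visits one
 * entry past the end of an N-entry directory; the exclusive bound
 * (i < N) is correct. */
#define TOP_ENTRIES 512          /* stand-in for MAX_DOMAIN_PAGES */

int entries_visited(int inclusive_bound)
{
    int visited = 0;
    for (int i = 0; inclusive_bound ? i <= TOP_ENTRIES : i < TOP_ENTRIES; i++)
        visited++;               /* the buggy variant touches index 512 too */
    return visited;
}
```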
  17. 23 Aug, 2012 (3 commits)
    •
      xen/p2m: When revectoring deal with holes in the P2M array. · 3fc509fc
      Committed by Konrad Rzeszutek Wilk
      When we free the PFNs and then subsequently populate them back
      during bootup:
      
      Freeing 20000-20200 pfn range: 512 pages freed
      1-1 mapping on 20000->20200
      Freeing 40000-40200 pfn range: 512 pages freed
      1-1 mapping on 40000->40200
      Freeing bad80-badf4 pfn range: 116 pages freed
      1-1 mapping on bad80->badf4
      Freeing badf6-bae7f pfn range: 137 pages freed
      1-1 mapping on badf6->bae7f
      Freeing bb000-100000 pfn range: 282624 pages freed
      1-1 mapping on bb000->100000
      Released 283999 pages of unused memory
      Set 283999 page(s) to 1-1 mapping
      Populating 1acb8a-1f20e9 pfn range: 283999 pages added
      
      We end up having the P2M array (that is, the one that was
      grafted on the P2M tree) filled with IDENTITY_FRAME or
      INVALID_P2M_ENTRY entries. The patch titled
      
      "xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID."
      recycles said slots and replaces the P2M tree leaf's with
       &mfn_list[xx] with p2m_identity or p2m_missing.
      
      And re-uses the P2M array sections for other P2M tree leaf's.
      For the above mentioned bootup excerpt, the PFNs at
      0x20000->0x20200 are going to be IDENTITY based:
      
      P2M[0][256][0] -> P2M[0][257][0] get turned into IDENTITY_FRAME.
      
      We can re-use that and replace P2M[0][256] to point to p2m_identity.
      The "old" page (the grafted P2M array provided by Xen) that was at
      P2M[0][256] gets put somewhere else. Specifically at P2M[6][358],
      b/c when we populate back:
      
      Populating 1acb8a-1f20e9 pfn range: 283999 pages added
      
      we fill P2M[6][358][0] (and P2M[6][358], P2M[6][359], ...) with
      the new MFNs.
      
      That is all OK, except when we revector we assume that the PFN
      count would be the same in the grafted P2M array and in the
      newly allocated. Since that is no longer the case, as we have
      holes in the P2M that point to p2m_missing or p2m_identity we
      have to take that into account.
      
      [v2: Check for overflow]
      [v3: Move within the __va check]
      [v4: Fix the computation]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3fc509fc
    •
      xen/p2m: Add logic to revector a P2M tree to use __va leafs. · 357a3cfb
      Committed by Konrad Rzeszutek Wilk
      During bootup Xen supplies us with a P2M array. It sticks
      it right after the ramdisk, as can be seen with a 128GB PV guest:
      
      (certain parts removed for clarity):
      xc_dom_build_image: called
      xc_dom_alloc_segment:   kernel       : 0xffffffff81000000 -> 0xffffffff81e43000  (pfn 0x1000 + 0xe43 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
      xc_dom_alloc_segment:   ramdisk      : 0xffffffff81e43000 -> 0xffffffff925c7000  (pfn 0x1e43 + 0x10784 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
      xc_dom_alloc_segment:   phys2mach    : 0xffffffff925c7000 -> 0xffffffffa25c7000  (pfn 0x125c7 + 0x10000 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000
      xc_dom_alloc_page   :   start info   : 0xffffffffa25c7000 (pfn 0x225c7)
      xc_dom_alloc_page   :   xenstore     : 0xffffffffa25c8000 (pfn 0x225c8)
      xc_dom_alloc_page   :   console      : 0xffffffffa25c9000 (pfn 0x225c9)
      nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
      nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
      nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
      nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s)
      xc_dom_alloc_segment:   page tables  : 0xffffffffa25ca000 -> 0xffffffffa26e1000  (pfn 0x225ca + 0x117 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
      xc_dom_alloc_page   :   boot stack   : 0xffffffffa26e1000 (pfn 0x226e1)
      xc_dom_build_image  : virt_alloc_end : 0xffffffffa26e2000
      xc_dom_build_image  : virt_pgtab_end : 0xffffffffa2800000
      
      So the physical memory and virtual (using __START_KERNEL_map addresses)
      layout looks as so:
      
        phys                             __ka
      /------------\                   /-------------------\
      | 0          | empty             | 0xffffffff80000000|
      | ..         |                   | ..                |
      | 16MB       | <= kernel starts  | 0xffffffff81000000|
      | ..         |                   |                   |
      | 30MB       | <= kernel ends => | 0xffffffff81e43000|
      | ..         |  & ramdisk starts | ..                |
      | 293MB      | <= ramdisk ends=> | 0xffffffff925c7000|
      | ..         |  & P2M starts     | ..                |
      | ..         |                   | ..                |
      | 549MB      | <= P2M ends    => | 0xffffffffa25c7000|
      | ..         | start_info        | 0xffffffffa25c7000|
      | ..         | xenstore          | 0xffffffffa25c8000|
      | ..         | console           | 0xffffffffa25c9000|
      | 549MB      | <= page tables => | 0xffffffffa25ca000|
      | ..         |                   |                   |
      | 550MB      | <= PGT end     => | 0xffffffffa26e1000|
      | ..         | boot stack        |                   |
      \------------/                   \-------------------/
      
      As can be seen, the ramdisk, P2M and pagetables are taking
      a bit of __ka address space. This is a problem since
      MODULES_VADDR starts at 0xffffffffa0000000 - and the P2M sits
      right in there! During bootup this results in the inability to
      load modules, with this error:
      
      ------------[ cut here ]------------
      WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370()
      Call Trace:
       [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0
       [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
       [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20
       [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370
       [<ffffffff81130c4d>] map_vm_area+0x2d/0x50
       [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250
       [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c6186>] ? load_module+0x66/0x19c0
       [<ffffffff8105cadc>] module_alloc+0x5c/0x60
       [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c70c3>] load_module+0xfa3/0x19c0
       [<ffffffff812491f6>] ? security_file_permission+0x86/0x90
       [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220
       [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b
      ---[ end trace fd8f7704fdea0291 ]---
      vmalloc: allocation failure, allocated 16384 of 20480 bytes
      modprobe: page allocation failure: order:0, mode:0xd2
      
      Since the __va and __ka are 1:1 up to MODULES_VADDR and
      cleanup_highmap rids __ka of the ramdisk mapping, what
      we want to do is similar - get rid of the P2M in the __ka
      address space. There are two ways of fixing this:
      
       1) All P2M lookups instead of using the __ka address would
          use the __va address. This means we can safely erase from
          __ka space the PMD pointers that point to the PFNs for
          P2M array and be OK.
       2) Allocate a new array, copy the existing P2M into it,
          revector the P2M tree to use that, and return the old
          P2M to the memory allocator. This has the advantage that
          it sets the stage for using the XEN_ELF_NOTE_INIT_P2M
          feature. That feature allows us to set the exact virtual
          address space we want for the P2M - and allows us to
          boot as initial domain on large machines.
      
      So we pick option 2).
      
      This patch only lays the groundwork in the P2M code. The patch
      that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      357a3cfb
    •
      Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas." · 51faaf2b
      Committed by Konrad Rzeszutek Wilk
      
      This reverts commit 806c312e and
      commit 59b29440.
      
      And also documents setup.c and why we want to do it that way, which
      is that we tried to make the memblock_reserve more selective so
      that it would be clear what region is reserved. Sadly we ran
      into the problem wherein on a 64-bit hypervisor with a 32-bit
      initial domain, the pt_base has the cr3 value, which is not
      necessarily where the pagetable starts! As Jan put it: "
      Actually, the adjustment turns out to be correct: The page
      tables for a 32-on-64 dom0 get allocated in the order "first L1",
      "first L2", "first L3", so the offset to the page table base is
      indeed 2. When reading xen/include/public/xen.h's comment
      very strictly, this is not a violation (since there nothing is said
      that the first thing in the page table space is pointed to by
      pt_base; I admit that this seems to be implied though, namely
      do I think that it is implied that the page table space is the
      range [pt_base, pt_base + nt_pt_frames), whereas that
      range here indeed is [pt_base - 2, pt_base - 2 + nt_pt_frames),
      which - without a priori knowledge - the kernel would have
      difficulty to figure out)." - so lets just fall back to the
      easy way and reserve the whole region.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      51faaf2b
  18. 22 Aug, 2012 (2 commits)
  19. 17 Aug, 2012 (1 commit)
    •
      xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID. · 250a41e0
      Committed by Konrad Rzeszutek Wilk
      If a P2M leaf is completely packed with INVALID_P2M_ENTRY or with
      1:1 PFNs (so IDENTITY_FRAME type PFNs), we can swap the P2M leaf
      with either a p2m_missing or p2m_identity respectively. The old
      page (which was created via extend_brk or was grafted on from the
      mfn_list) can be re-used for setting new PFNs.
      
      This also means we can remove git commit:
      5bc6f988
      "xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back"
      which tried to fix this, and make the amount that is required to be
      reserved much smaller.
      
      CC: stable@vger.kernel.org # for 3.5 only.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      250a41e0
  20. 02 Aug, 2012 (1 commit)
    •
      xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back. · 5bc6f988
      Committed by Konrad Rzeszutek Wilk
      When we release pages back during bootup:
      
      Freeing  9d-100 pfn range: 99 pages freed
      Freeing  9cf36-9d0d2 pfn range: 412 pages freed
      Freeing  9f6bd-9f6bf pfn range: 2 pages freed
      Freeing  9f714-9f7bf pfn range: 171 pages freed
      Freeing  9f7e0-9f7ff pfn range: 31 pages freed
      Freeing  9f800-100000 pfn range: 395264 pages freed
      Released 395979 pages of unused memory
      
      We then try to populate those pages back. In the P2M tree however
      the space for those leafs must be reserved - as such we use extend_brk.
      We reserve 8MB of _brk space, which means we can fit over
      1048576 PFNs - which is more than we should ever need.
      
      Without this, on certain compilation of the kernel we would hit:
      
      (XEN) domain_crash_sync called from entry.S
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff818aad3b>]
      (XEN) RFLAGS: 0000000000000206   EM: 1   CONTEXT: pv guest
      (XEN) rax: ffffffff81a7c000   rbx: 000000000000003d   rcx: 0000000000001000
      (XEN) rdx: ffffffff81a7b000   rsi: 0000000000001000   rdi: 0000000000001000
      (XEN) rbp: ffffffff81801cd8   rsp: ffffffff81801c98   r8:  0000000000100000
      (XEN) r9:  ffffffff81a7a000   r10: 0000000000000001   r11: 0000000000000003
      (XEN) r12: 0000000000000004   r13: 0000000000000004   r14: 000000000000003d
      (XEN) r15: 00000000000001e8   cr0: 000000008005003b   cr4: 00000000000006f0
      (XEN) cr3: 0000000125803000   cr2: 0000000000000000
      (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
      (XEN) Guest stack trace from rsp=ffffffff81801c98:
      
      .. which is extend_brk hitting a BUG_ON.
      
      Interestingly enough, most of the time we are not going to hit this
      b/c the _brk space is quite large (v3.5):
       ffffffff81a25000 B __brk_base
       ffffffff81e43000 B __brk_limit
      = ~4MB.
      
      vs earlier kernels (with this back-ported), the space is smaller:
       ffffffff81a25000 B __brk_base
       ffffffff81a7b000 B __brk_limit
      = 344 kBytes.
      
      where we would certainly hit the BUG_ON in extend_brk.
      
      Note that git commit c3d93f88
      (xen: populate correct number of pages when across mem boundary (v2))
      exposed this bug.
      
      [v1: Made it 8MB of _brk space instead of 4MB per Jan's suggestion]
      
      CC: stable@vger.kernel.org #only for 3.5
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      5bc6f988
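The sizing arithmetic behind the 8 MB figure, assuming x86-64 constants (4 KiB pages, 8-byte p2m entries): each 4 KiB leaf page holds 512 entries, so 8 MB of _brk is 2048 leaf pages covering 1048576 PFNs, matching the "over 1048576 PFNs" in the message.

```c
/* Sketch of the reservation arithmetic: how many PFNs a given amount of
 * _brk space can cover when used as p2m leaf pages. Constants assume
 * x86-64 (4 KiB pages, sizeof(unsigned long) == 8). */
#define PAGE_SIZE_SKETCH  4096UL
#define ENTRIES_PER_LEAF  (PAGE_SIZE_SKETCH / sizeof(unsigned long))  /* 512 */

unsigned long pfns_covered(unsigned long brk_bytes)
{
    /* leaf pages that fit in the reservation, times PFNs per leaf */
    return (brk_bytes / PAGE_SIZE_SKETCH) * ENTRIES_PER_LEAF;
}
```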
  21. 15 Jun, 2012 (1 commit)
    •
      xen: mark local pages as FOREIGN in the m2p_override · b9e0d95c
      Committed by Stefano Stabellini
      When the frontend and the backend reside on the same domain, even if we
      add pages to the m2p_override, these pages will never be returned by
      mfn_to_pfn because the check "get_phys_to_machine(pfn) != mfn" will
      always fail, so the pfn of the frontend will be returned instead
      (resulting in a deadlock because the frontend pages are already locked).
      
      INFO: task qemu-system-i38:1085 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      qemu-system-i38 D ffff8800cfc137c0     0  1085      1 0x00000000
       ffff8800c47ed898 0000000000000282 ffff8800be4596b0 00000000000137c0
       ffff8800c47edfd8 ffff8800c47ec010 00000000000137c0 00000000000137c0
       ffff8800c47edfd8 00000000000137c0 ffffffff82213020 ffff8800be4596b0
      Call Trace:
       [<ffffffff81101ee0>] ? __lock_page+0x70/0x70
       [<ffffffff81a0fdd9>] schedule+0x29/0x70
       [<ffffffff81a0fe80>] io_schedule+0x60/0x80
       [<ffffffff81101eee>] sleep_on_page+0xe/0x20
       [<ffffffff81a0e1ca>] __wait_on_bit_lock+0x5a/0xc0
       [<ffffffff81101ed7>] __lock_page+0x67/0x70
       [<ffffffff8106f750>] ? autoremove_wake_function+0x40/0x40
       [<ffffffff811867e6>] ? bio_add_page+0x36/0x40
       [<ffffffff8110b692>] set_page_dirty_lock+0x52/0x60
       [<ffffffff81186021>] bio_set_pages_dirty+0x51/0x70
       [<ffffffff8118c6b4>] do_blockdev_direct_IO+0xb24/0xeb0
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff8118ca95>] __blockdev_direct_IO+0x55/0x60
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff811e91c8>] ext3_direct_IO+0xf8/0x390
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff81004b60>] ? xen_mc_flush+0xb0/0x1b0
       [<ffffffff81104027>] generic_file_aio_read+0x737/0x780
       [<ffffffff813bedeb>] ? gnttab_map_refs+0x15b/0x1e0
       [<ffffffff811038f0>] ? find_get_pages+0x150/0x150
       [<ffffffff8119736c>] aio_rw_vect_retry+0x7c/0x1d0
       [<ffffffff811972f0>] ? lookup_ioctx+0x90/0x90
       [<ffffffff81198856>] aio_run_iocb+0x66/0x1a0
       [<ffffffff811998b8>] do_io_submit+0x708/0xb90
       [<ffffffff81199d50>] sys_io_submit+0x10/0x20
       [<ffffffff81a18d69>] system_call_fastpath+0x16/0x1b
      
      The explanation is in the comment within the code:
      
      We need to do this because the pages shared by the frontend
      (xen-blkfront) can be already locked (lock_page, called by
      do_read_cache_page); when the userspace backend tries to use them
      with direct_IO, mfn_to_pfn returns the pfn of the frontend, so
      do_blockdev_direct_IO is going to try to lock the same pages
      again resulting in a deadlock.
      
      A simplified call graph looks like this:
      
      pygrub                          QEMU
      -----------------------------------------------
      do_read_cache_page              io_submit
        |                              |
      lock_page                       ext3_direct_IO
                                       |
                                      bio_add_page
                                       |
                                      lock_page
      
      Internally the xen-blkback uses m2p_add_override to swizzle (temporarily)
      a 'struct page' to have a different MFN (so that it can point to another
      guest). It can also easily find out whether another pfn corresponding
      to the mfn exists in the m2p, and can set the FOREIGN bit
      in the p2m, making sure that mfn_to_pfn returns the pfn of the backend.
      
      This allows the backend to perform direct_IO on these pages, but as a
      side effect prevents the frontend from using get_user_pages_fast on
      them while they are being shared with the backend.
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b9e0d95c
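A toy model of the lookup logic this commit relies on: mfn_to_pfn trusts the plain m2p answer only when the p2m round trip agrees, and setting the FOREIGN bit on the frontend's p2m entry deliberately breaks that round trip, steering the lookup to the m2p_override (the backend's pfn). Tables and indices below are illustrative, not the kernel layout.

```c
/* Toy m2p / p2m / override tables. In this sketch mfn 3 maps to the
 * frontend's pfn 7; the backend's override entry maps mfn 3 to pfn 9. */
#define FOREIGN_FRAME_BIT  (1UL << 63)
#define FOREIGN_FRAME(m)   ((m) | FOREIGN_FRAME_BIT)

static unsigned long m2p[16], p2m[16], m2p_override_tbl[16];

unsigned long mfn_to_pfn_sketch(unsigned long mfn)
{
    unsigned long pfn = m2p[mfn];
    if (p2m[pfn] == mfn)            /* round trip holds: local page */
        return pfn;
    return m2p_override_tbl[mfn];   /* mismatch (FOREIGN set): backend pfn */
}
```

Without the FOREIGN bit the round-trip check passes and the frontend's pfn comes back, reproducing the same-domain deadlock described above; with it set, the backend's pfn is returned and direct_IO proceeds.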
  22. 20 Apr, 2012 (1 commit)
  23. 18 Apr, 2012 (1 commit)
  24. 07 Apr, 2012 (4 commits)