1. 03 Feb 2014: 1 commit
  2. 31 Jan 2014: 1 commit
    • xen/grant-table: Avoid m2p_override during mapping · 08ece5bb
      Zoltan Kiss committed
      The grant mapping API does m2p_override unnecessarily: only gntdev needs it.
      For blkback and future netback patches it just causes lock contention, as
      those pages never go to userspace. Therefore this series does the following:
      - the original functions were renamed to __gnttab_[un]map_refs, with a new
        parameter m2p_override
      - based on m2p_override either they follow the original behaviour, or just set
        the private flag and call set_phys_to_machine
      - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
        m2p_override false
      - a new function gnttab_[un]map_refs_userspace provides the old behaviour
      
      It also removes a stray space from page.h and changes ret to 0 if
      XENFEAT_auto_translated_physmap is set, as that is the only possible
      return value there.
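      A minimal user-space sketch of the wrapper split described above. It is
      not the kernel code. The real functions take arrays of grant-map ops,
      kmap ops and pages, and the signatures, struct page fields and *_stub
      helpers here are hypothetical simplifications. The sketch only shows how
      one boolean chooses between the locked m2p_override path and the cheap
      "set the private flag and call set_phys_to_machine" path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only what the sketch needs. */
struct page {
	bool private_flag;  /* "this page is foreign-mapped" marker */
	unsigned long mfn;  /* machine frame the page now points at */
};

static int m2p_override_calls; /* counts trips through the locked path */

/* Stand-in for m2p_add_override(): the lock-taking path gntdev needs. */
static void m2p_add_override_stub(struct page *pg, unsigned long mfn)
{
	pg->private_flag = true;
	pg->mfn = mfn;
	m2p_override_calls++;
}

/* Stand-in for "set the private flag and call set_phys_to_machine". */
static void set_phys_to_machine_stub(struct page *pg, unsigned long mfn)
{
	pg->private_flag = true;
	pg->mfn = mfn;
}

static int __gnttab_map_refs(struct page **pages, const unsigned long *mfns,
			     size_t count, bool m2p_override)
{
	assert(pages != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (m2p_override)
			m2p_add_override_stub(pages[i], mfns[i]);
		else
			set_phys_to_machine_stub(pages[i], mfns[i]);
	}
	return 0;
}

/* Kernel-only users such as blkback: skip the m2p override. */
int gnttab_map_refs(struct page **pages, const unsigned long *mfns,
		    size_t count)
{
	return __gnttab_map_refs(pages, mfns, count, false);
}

/* gntdev: the pages can reach userspace, so keep the old behaviour. */
int gnttab_map_refs_userspace(struct page **pages, const unsigned long *mfns,
			      size_t count)
{
	return __gnttab_map_refs(pages, mfns, count, true);
}
```

      Keeping one worker with a flag means the two public entry points cannot
      drift apart. Only the choice of path differs between them.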
      
      v2:
      - move the storing of the old mfn in page->index to gnttab_map_refs
      - move the function header update to a separate patch
      
      v3:
      - a new approach to retain old behaviour where it needed
      - squash the patches into one
      
      v4:
      - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
      - clear page->private before doing anything with the page, so m2p_find_override
        won't race with this
      
      v5:
      - change return value handling in __gnttab_[un]map_refs
      - remove a stray space in page.h
      - explain why ret = 0 now in some places
      
      v6:
      - don't pass pfn to m2p* functions, just get it locally
      Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
      Suggested-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  3. 06 Jan 2014: 2 commits
    • xen/pvh: Setup up shared_info. · 4dd322bc
      Mukesh Rathor committed
      For PVHVM the shared_info structure is provided via the same way
      as for normal PV guests (see include/xen/interface/xen.h).
      
      That is, during bootup we get 'xen_start_info' via the %esi register
      in startup_xen. Then later we extract the 'shared_info' from said
      structure (in xen_setup_shared_info) and start using it.
      
      The 'xen_setup_shared_info' is already set up to work with auto-xlat
      guests, but there are two functions it calls that are not:
      xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
      This patch modifies the P2M code (xen_setup_mfn_list_list),
      while the "Piggyback on PVHVM for event channels" patch modifies
      xen_setup_vcpu_info_placement.
      Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/pvh: Don't setup P2M tree. · 696fd7c5
      Konrad Rzeszutek Wilk committed
      P2M is not available for PVH. Fortunately for us, the
      P2M code already mostly supports auto-xlat guests thanks to
      commit 3d24bbd7
      "grant-table: call set_phys_to_machine after mapping grant refs"
      which: "
      introduces set_phys_to_machine calls for auto_translated guests
      (even on x86) in gnttab_map_refs and gnttab_unmap_refs.
      translated by swiotlb-xen... " so we don't need to muck much.
      
      With the above-mentioned commit "you'll get set_phys_to_machine calls
      from gnttab_map_refs and gnttab_unmap_refs but PVH guests won't do
      anything with them" (Stefano Stabellini), which is OK: we want
      them to be NOPs.
      
      This is because we assume that an "IOMMU is always present on the
      platform and Xen is going to make the appropriate IOMMU pagetable
      changes in the hypercall implementation of GNTTABOP_map_grant_ref
      and GNTTABOP_unmap_grant_ref, then everything should be transparent
      from PVH privileged point of view and DMA transfers involving
      foreign pages keep working with no issues[sp]

      Otherwise we would need a P2M (and an M2P) for PVH privileged to
      track these foreign pages .. (see arch/arm/xen/p2m.c)."
      (Stefano Stabellini).
      
      We still have to inhibit the building of the P2M tree.
      That had been done in the past by not calling
      xen_build_dynamic_phys_to_machine (which sets up the P2M tree
      and gives us virtual addresses to access it). But we are missing
      a check for xen_build_mfn_list_list, which kept on setting up
      the P2M tree and would blow up when trying to get the virtual
      address of p2m_missing (which would have been set up by
      xen_build_dynamic_phys_to_machine).
      
      Hence a check is needed to not call xen_build_mfn_list_list when
      running in auto-xlat mode.
      
      Instead of replicating the check for auto-xlat in enlighten.c,
      do it in the p2m.c code. The reason is that xen_build_mfn_list_list
      is also called in xen_arch_post_suspend without any checks for
      auto-xlat. So for PVH or PV with auto-xlat we would needlessly
      allocate space for a P2M tree.
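      A small sketch of why the guard belongs inside the P2M builder instead of
      in each caller. The flag auto_xlat stands in for
      xen_feature(XENFEAT_auto_translated_physmap). alloc_mfn_list_list(),
      boot_setup() and arch_post_suspend() are hypothetical names, and assert()
      models the blow-up on the missing p2m_missing setup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for xen_feature(XENFEAT_auto_translated_physmap). */
static bool auto_xlat;
static int p2m_tree_allocations;

/* Models the failure mode: under auto-xlat, p2m_missing was never set up,
 * so building the tree would blow up. assert() plays the role of the crash. */
static void alloc_mfn_list_list(void)
{
	assert(!auto_xlat);
	p2m_tree_allocations++;
}

/* Sketch of the fix: the guard sits inside the P2M builder itself. */
void xen_build_mfn_list_list_sketch(void)
{
	if (auto_xlat)
		return; /* PVH or auto-xlat PV: no P2M tree to build */
	alloc_mfn_list_list();
}

/* Both callers are covered without repeating the check. */
void boot_setup(void)        { xen_build_mfn_list_list_sketch(); }
void arch_post_suspend(void) { xen_build_mfn_list_list_sketch(); }
```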
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  4. 10 Oct 2013: 1 commit
  5. 25 Sep 2013: 1 commit
    • xen/p2m: check MFN is in range before using the m2p table · 0160676b
      David Vrabel committed
      On hosts with more than 168 GB of memory, a 32-bit guest may attempt
      to grant map an MFN that it cannot look up in its mapping of the
      m2p table.  There is an m2p lookup as part of m2p_add_override() and
      m2p_remove_override().  The lookup falls off the end of the mapped
      portion of the m2p and (because the mapping is at the highest virtual
      address) wraps around and the lookup causes a fault on what appears to
      be a user space address.
      
      do_page_fault() (thinking it's a fault to a userspace address) tries
      to lock mm->mmap_sem.  If the gntdev device is used for the grant map,
      m2p_add_override() is called from gnttab_mmap() with mm->mmap_sem
      already locked.  do_page_fault() then deadlocks.
      
      The deadlock would most commonly occur when a 64-bit guest is started
      and xenconsoled attempts to grant map its console ring.
      
      Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the
      mapped portion of the m2p table before accessing the table and use
      this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn()
      (which already had the correct range check).
      
      All faults caused by accessing the non-existent parts of the m2p are
      thus within the kernel address space and exception_fixup() is called
      without trying to lock mm->mmap_sem.

      This means that for MFNs that are outside the mapped range of the m2p,
      mfn_to_pfn() will always look in the m2p overrides.  This is
      correct because it must be a foreign MFN (and the PFN in the m2p in
      this case is only relevant for the other domain).
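      A simplified sketch of the range check. It uses a hypothetical 8-entry
      m2p and a single hypothetical override entry. The kernel version also
      reads the table with a fault-safe accessor and still consults the
      overrides when an in-range m2p entry disagrees with the p2m. The sketch
      shows only the "check the bound before indexing, and treat out-of-range
      MFNs as foreign" part.

```c
#include <assert.h>

#define INVALID_P2M_ENTRY (~0UL)

/* Hypothetical miniature m2p: only machine_to_phys_nr entries are mapped. */
static const unsigned long m2p[] = {0, 1, 2, 3, 4, 5, 6, 7};
static const unsigned long machine_to_phys_nr = sizeof(m2p) / sizeof(m2p[0]);

/* Hypothetical single-entry override table for foreign MFNs. */
static const unsigned long override_mfn = 100, override_pfn = 42;

static unsigned long m2p_find_override_pfn(unsigned long mfn)
{
	return mfn == override_mfn ? override_pfn : INVALID_P2M_ENTRY;
}

/* Range-check first, so an out-of-range MFN never becomes a wrapped
 * access into some unrelated (user-looking) address. */
static unsigned long mfn_to_pfn_no_overrides(unsigned long mfn)
{
	if (mfn >= machine_to_phys_nr)
		return INVALID_P2M_ENTRY;
	assert(mfn < sizeof(m2p) / sizeof(m2p[0]));
	return m2p[mfn];
}

unsigned long mfn_to_pfn(unsigned long mfn)
{
	unsigned long pfn = mfn_to_pfn_no_overrides(mfn);

	/* Outside the mapped m2p: it can only be foreign, so ask the overrides. */
	if (pfn == INVALID_P2M_ENTRY)
		pfn = m2p_find_override_pfn(mfn);
	return pfn;
}
```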
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      --
      v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
      v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of
          range as it's probably foreign.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  6. 09 Sep 2013: 1 commit
  7. 20 Aug 2013: 1 commit
  8. 09 Aug 2013: 1 commit
  9. 12 Sep 2012: 1 commit
    • xen/m2p: do not reuse kmap_op->dev_bus_addr · 2fc136ee
      Stefano Stabellini committed
      If the caller passes a valid kmap_op to m2p_add_override, we use
      kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is
      part of the interface with Xen and if we are batching the hypercalls it
      might not have been written by the hypervisor yet. That means that later
      on Xen will write to it and we'll think that the original mfn is
      actually what Xen has written to it.
      
      Rather than "stealing" struct members from kmap_op, keep using
      page->index to store the original mfn and add another parameter to
      m2p_remove_override to get the corresponding kmap_op instead.
      It is now the responsibility of the caller to keep track of which kmap_op
      corresponds to a particular page in the m2p_override (gntdev, the only
      user of this interface that passes a valid kmap_op, is already doing that).
      
      CC: stable@kernel.org
      Reported-and-Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  10. 05 Sep 2012: 1 commit
    • xen/p2m: Fix one-off error in checking the P2M tree directory. · 50e90041
      Konrad Rzeszutek Wilk committed
      We would traverse the full P2M top directory (from 0->MAX_DOMAIN_PAGES
      inclusive) when trying to figure out whether we can re-use some of the
      P2M middle leafs.
      
      Which meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512
      we would try to use the 512th entry. Fortunately for us the p2m_top_index
      has a check for this:
      
       BUG_ON(pfn >= MAX_P2M_PFN);
      
      which we hit and saw this:
      
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.1.2-OVM  x86_64  debug=n  Tainted:    C ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff819cadeb>]
      (XEN) RFLAGS: 0000000000000212   EM: 1   CONTEXT: pv guest
      (XEN) rax: ffffffff81db5000   rbx: ffffffff81db4000   rcx: 0000000000000000
      (XEN) rdx: 0000000000480211   rsi: 0000000000000000   rdi: ffffffff81db4000
      (XEN) rbp: ffffffff81793db8   rsp: ffffffff81793d38   r8:  0000000008000000
      (XEN) r9:  4000000000000000   r10: 0000000000000000   r11: ffffffff81db7000
      (XEN) r12: 0000000000000ff8   r13: ffffffff81df1ff8   r14: ffffffff81db6000
      (XEN) r15: 0000000000000ff8   cr0: 000000008005003b   cr4: 00000000000026f0
      (XEN) cr3: 0000000661795000   cr2: 0000000000000000
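      A generic illustration of this kind of one-off error, not the kernel's
      actual loop. A bound written as "<=" against an N-entry directory
      touches slot N. NR_TOP_ENTRIES, p2m_top_entry() and count_empty_slots()
      are hypothetical names, and assert() stands in for the BUG_ON quoted
      above.

```c
#include <assert.h>

#define NR_TOP_ENTRIES 512 /* hypothetical: size of the top directory */

static unsigned long p2m_top[NR_TOP_ENTRIES];

/* assert() plays the role of BUG_ON(pfn >= MAX_P2M_PFN). */
static unsigned long *p2m_top_entry(unsigned int idx)
{
	assert(idx < NR_TOP_ENTRIES);
	return &p2m_top[idx];
}

/* Walk every top-level slot looking for reusable (empty) ones. */
int count_empty_slots(void)
{
	int n = 0;

	/* The bug was the equivalent of "idx <= NR_TOP_ENTRIES", which
	 * touches slot 512 of a 512-entry array; "<" stops at 511. */
	for (unsigned int idx = 0; idx < NR_TOP_ENTRIES; idx++)
		n += (*p2m_top_entry(idx) == 0);
	return n;
}
```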
      
      Fixes-Oracle-Bug: 14570662
      CC: stable@vger.kernel.org # only for v3.5
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  11. 23 Aug 2012: 3 commits
    • xen/p2m: When revectoring deal with holes in the P2M array. · 3fc509fc
      Konrad Rzeszutek Wilk committed
      When we free the PFNs and then subsequently populate them back
      during bootup:
      
      Freeing 20000-20200 pfn range: 512 pages freed
      1-1 mapping on 20000->20200
      Freeing 40000-40200 pfn range: 512 pages freed
      1-1 mapping on 40000->40200
      Freeing bad80-badf4 pfn range: 116 pages freed
      1-1 mapping on bad80->badf4
      Freeing badf6-bae7f pfn range: 137 pages freed
      1-1 mapping on badf6->bae7f
      Freeing bb000-100000 pfn range: 282624 pages freed
      1-1 mapping on bb000->100000
      Released 283999 pages of unused memory
      Set 283999 page(s) to 1-1 mapping
      Populating 1acb8a-1f20e9 pfn range: 283999 pages added
      
      We end up with the P2M array (that is, the one that was
      grafted onto the P2M tree) filled with IDENTITY_FRAME or
      INVALID_P2M_ENTRY entries. The patch titled
      
      "xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID."
      recycles said slots and replaces the P2M tree leaves that point to
      &mfn_list[xx] with p2m_identity or p2m_missing.

      It then re-uses the P2M array sections for other P2M tree leaves.
      For the above mentioned bootup excerpt, the PFNs at
      0x20000->0x20200 are going to be IDENTITY based:
      
      P2M[0][256][0] -> P2M[0][257][0] get turned into IDENTITY_FRAME.
      
      We can re-use that and replace P2M[0][256] to point to p2m_identity.
      The "old" page (the grafted P2M array provided by Xen) that was at
      P2M[0][256] gets put somewhere else. Specifically at P2M[6][358],
      because when we populate back:
      
      Populating 1acb8a-1f20e9 pfn range: 283999 pages added
      
      we fill P2M[6][358][0] (and P2M[6][358], P2M[6][359], ...) with
      the new MFNs.
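      The P2M[top][mid][idx] coordinates above follow from splitting a PFN
      into three indices. This sketch assumes x86-64, where a 4 KiB page holds
      512 8-byte entries at every level. The helper names mirror
      p2m_top_index() mentioned elsewhere in this log, but they are
      illustrative reimplementations, and p2m_pfn() is a hypothetical inverse
      added for checking.

```c
#include <assert.h>

/* Assumes x86-64: a 4 KiB page holds 512 8-byte entries at every level. */
#define P2M_PER_PAGE     512UL
#define P2M_MID_PER_PAGE 512UL

static unsigned long p2m_top_index(unsigned long pfn)
{
	return pfn / (P2M_MID_PER_PAGE * P2M_PER_PAGE);
}

static unsigned long p2m_mid_index(unsigned long pfn)
{
	return (pfn / P2M_PER_PAGE) % P2M_MID_PER_PAGE;
}

static unsigned long p2m_index(unsigned long pfn)
{
	return pfn % P2M_PER_PAGE;
}

/* Inverse: rebuild the PFN from P2M[top][mid][idx]. */
static unsigned long p2m_pfn(unsigned long top, unsigned long mid,
			     unsigned long idx)
{
	assert(mid < P2M_MID_PER_PAGE && idx < P2M_PER_PAGE);
	return (top * P2M_MID_PER_PAGE + mid) * P2M_PER_PAGE + idx;
}
```

      With these formulas, PFN 0x20000 lands at P2M[0][256][0] and 0x20200 at
      P2M[0][257][0], matching the text. The start of the populated range,
      0x1acb8a, lands at P2M[6][357][394], so the first leaf filled from slot 0
      is P2M[6][358], which begins at PFN 0x1acc00.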
      
      That is all OK, except that when we revector we assume that the PFN
      count would be the same in the grafted P2M array and in the
      newly allocated one. Since that is no longer the case (we have
      holes in the P2M that point to p2m_missing or p2m_identity), we
      have to take that into account.
      
      [v2: Check for overflow]
      [v3: Move within the __va check]
      [v4: Fix the computation]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/p2m: Add logic to revector a P2M tree to use __va leafs. · 357a3cfb
      Konrad Rzeszutek Wilk committed
      During bootup Xen supplies us with a P2M array. It places
      it right after the ramdisk, as can be seen with a 128GB PV guest:
      
      (certain parts removed for clarity):
      xc_dom_build_image: called
      xc_dom_alloc_segment:   kernel       : 0xffffffff81000000 -> 0xffffffff81e43000  (pfn 0x1000 + 0xe43 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
      xc_dom_alloc_segment:   ramdisk      : 0xffffffff81e43000 -> 0xffffffff925c7000  (pfn 0x1e43 + 0x10784 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
      xc_dom_alloc_segment:   phys2mach    : 0xffffffff925c7000 -> 0xffffffffa25c7000  (pfn 0x125c7 + 0x10000 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000
      xc_dom_alloc_page   :   start info   : 0xffffffffa25c7000 (pfn 0x225c7)
      xc_dom_alloc_page   :   xenstore     : 0xffffffffa25c8000 (pfn 0x225c8)
      xc_dom_alloc_page   :   console      : 0xffffffffa25c9000 (pfn 0x225c9)
      nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
      nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
      nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
      nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s)
      xc_dom_alloc_segment:   page tables  : 0xffffffffa25ca000 -> 0xffffffffa26e1000  (pfn 0x225ca + 0x117 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
      xc_dom_alloc_page   :   boot stack   : 0xffffffffa26e1000 (pfn 0x226e1)
      xc_dom_build_image  : virt_alloc_end : 0xffffffffa26e2000
      xc_dom_build_image  : virt_pgtab_end : 0xffffffffa2800000
      
      So the physical memory and virtual (using __START_KERNEL_map addresses)
      layout looks like this:
      
        phys                             __ka
      /------------\                   /-------------------\
      | 0          | empty             | 0xffffffff80000000|
      | ..         |                   | ..                |
      | 16MB       | <= kernel starts  | 0xffffffff81000000|
      | ..         |                   |                   |
      | 30MB       | <= kernel ends => | 0xffffffff81e43000|
      | ..         |  & ramdisk starts | ..                |
      | 293MB      | <= ramdisk ends=> | 0xffffffff925c7000|
      | ..         |  & P2M starts     | ..                |
      | ..         |                   | ..                |
      | 549MB      | <= P2M ends    => | 0xffffffffa25c7000|
      | ..         | start_info        | 0xffffffffa25c7000|
      | ..         | xenstore          | 0xffffffffa25c8000|
      | ..         | console           | 0xffffffffa25c9000|
      | 549MB      | <= page tables => | 0xffffffffa25ca000|
      | ..         |                   |                   |
      | 550MB      | <= PGT end     => | 0xffffffffa26e1000|
      | ..         | boot stack        |                   |
      \------------/                   \-------------------/
      
      As can be seen, the ramdisk, P2M and pagetables take up
      a sizeable chunk of the __ka address space. This is a problem since
      MODULES_VADDR starts at 0xffffffffa0000000, and the P2M sits
      right in there! During bootup this results in the inability to
      load modules, with this error:
      
      ------------[ cut here ]------------
      WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370()
      Call Trace:
       [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0
       [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
       [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20
       [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370
       [<ffffffff81130c4d>] map_vm_area+0x2d/0x50
       [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250
       [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c6186>] ? load_module+0x66/0x19c0
       [<ffffffff8105cadc>] module_alloc+0x5c/0x60
       [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c70c3>] load_module+0xfa3/0x19c0
       [<ffffffff812491f6>] ? security_file_permission+0x86/0x90
       [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220
       [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b
      ---[ end trace fd8f7704fdea0291 ]---
      vmalloc: allocation failure, allocated 16384 of 20480 bytes
      modprobe: page allocation failure: order:0, mode:0xd2
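      Some back-of-the-envelope arithmetic shows why a 128GB guest's P2M
      reaches into the module area. It assumes 4 KiB pages and 8-byte P2M
      entries: 128 GiB / 4 KiB = 32Mi entries × 8 bytes = 256 MiB = 0x10000
      pages, which matches the phys2mach segment in the log. The function
      names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE     4096ULL
#define MODULES_VADDR 0xffffffffa0000000ULL /* value quoted above */

/* Size of a flat P2M array: one 8-byte MFN per guest page. */
static uint64_t p2m_bytes(uint64_t guest_bytes)
{
	assert(guest_bytes % PAGE_SIZE == 0);
	return guest_bytes / PAGE_SIZE * sizeof(uint64_t);
}

/* Modules are mapped from MODULES_VADDR upwards, so any __ka range
 * ending above it collides with them. */
static int hits_module_area(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return end > MODULES_VADDR;
}
```

      The P2M segment [0xffffffff925c7000, 0xffffffffa25c7000) ends about
      37 MiB past MODULES_VADDR, while the kernel image itself stays well
      below it.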
      
      Since the __va and __ka are 1:1 up to MODULES_VADDR and
      cleanup_highmap rids __ka of the ramdisk mapping, what
      we want to do is similar - get rid of the P2M in the __ka
      address space. There are two ways of fixing this:
      
       1) All P2M lookups instead of using the __ka address would
          use the __va address. This means we can safely erase from
          __ka space the PMD pointers that point to the PFNs for
          P2M array and be OK.
       2) Allocate a new array, copy the existing P2M into it,
          revector the P2M tree to use that, and return the old
          P2M to the memory allocator. This has the advantage that
          it sets the stage for using XEN_ELF_NOTE_INIT_P2M
          feature. That feature allows us to set the exact virtual
          address space we want for the P2M - and allows us to
          boot as initial domain on large machines.
      
      So we pick option 2).
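      Option 2 can be sketched in user space as "allocate, copy, repoint,
      release". struct p2m, MAX_LEAVES and revector() are hypothetical. The
      sketch assumes a hole-free tree in which leaf i is simply slice i of the
      boot-time array, and it ignores the __va/__ka mapping details that the
      real MMU patch handles.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define P2M_PER_PAGE 512
#define MAX_LEAVES   4

/* Hypothetical mini P2M: leaf pointers that index into one flat MFN array. */
struct p2m {
	unsigned long *leaf[MAX_LEAVES];
	size_t nr_leaves;
};

/* Copy the boot-time array into a fresh allocation and repoint every
 * leaf at the copy. Returns the copy (NULL on allocation failure); the
 * caller may then release the old array. */
static unsigned long *revector(struct p2m *p, const unsigned long *old)
{
	size_t n = p->nr_leaves * P2M_PER_PAGE;
	unsigned long *copy;

	assert(p->nr_leaves <= MAX_LEAVES);
	copy = malloc(n * sizeof(*copy));
	if (copy == NULL)
		return NULL;
	memcpy(copy, old, n * sizeof(*copy));
	for (size_t i = 0; i < p->nr_leaves; i++) {
		/* Assumes a hole-free tree: leaf i is slice i of the old array. */
		assert(p->leaf[i] == old + i * P2M_PER_PAGE);
		p->leaf[i] = copy + i * P2M_PER_PAGE;
	}
	return copy;
}
```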
      
      This patch only lays the groundwork in the P2M code. The patch
      that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and... · 51faaf2b
      Konrad Rzeszutek Wilk committed
      Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas."
      
      This reverts commit 806c312e and
      commit 59b29440.
      
      It also adds documentation to setup.c explaining why we do it this way:
      we tried to make the memblock_reserve more selective so
      that it would be clear what region is reserved. Sadly we ran
      into the problem that on a 64-bit hypervisor with a 32-bit
      initial domain, the pt_base has the cr3 value, which is not
      necessarily where the pagetable starts! As Jan put it: "
      Actually, the adjustment turns out to be correct: The page
      tables for a 32-on-64 dom0 get allocated in the order "first L1",
      "first L2", "first L3", so the offset to the page table base is
      indeed 2. When reading xen/include/public/xen.h's comment
      very strictly, this is not a violation (since there nothing is said
      that the first thing in the page table space is pointed to by
      pt_base; I admit that this seems to be implied though, namely
      do I think that it is implied that the page table space is the
      range [pt_base, pt_base + nr_pt_frames), whereas that
      range here indeed is [pt_base - 2, pt_base - 2 + nr_pt_frames),
      which - without a priori knowledge - the kernel would have
      difficulty to figure out)." - so let's just fall back to the
      easy way and reserve the whole region.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  12. 22 Aug 2012: 2 commits
  13. 17 Aug 2012: 1 commit
    • xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID. · 250a41e0
      Konrad Rzeszutek Wilk committed
      If a P2M leaf is completely packed with INVALID_P2M_ENTRY or with
      1:1 PFNs (so IDENTITY_FRAME type PFNs), we can swap the P2M leaf
      with either a p2m_missing or p2m_identity respectively. The old
      page (which was created via extend_brk or was grafted on from the
      mfn_list) can be re-used for setting new PFNs.
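      A sketch of the leaf classification this relies on. The bit layout
      (IDENTITY_FRAME_BIT as the second-highest bit of an unsigned long,
      INVALID_P2M_ENTRY as all ones) follows the kernel's definitions of that
      era as best understood here, so treat it as an assumption.
      classify_leaf() and enum leaf_kind are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define P2M_PER_PAGE       512
#define BITS_PER_LONG      (sizeof(unsigned long) * 8)
#define INVALID_P2M_ENTRY  (~0UL)
#define IDENTITY_FRAME_BIT (1UL << (BITS_PER_LONG - 2))
#define IDENTITY_FRAME(m)  ((m) | IDENTITY_FRAME_BIT)

enum leaf_kind { LEAF_MIXED, LEAF_MISSING, LEAF_IDENTITY };

/* Classify a leaf covering PFNs [first_pfn, first_pfn + P2M_PER_PAGE). */
static enum leaf_kind classify_leaf(const unsigned long *leaf,
				    unsigned long first_pfn)
{
	bool missing = true, identity = true;

	for (size_t i = 0; i < P2M_PER_PAGE; i++) {
		if (leaf[i] != INVALID_P2M_ENTRY)
			missing = false;
		if (leaf[i] != IDENTITY_FRAME(first_pfn + i))
			identity = false;
	}
	assert(!(missing && identity));
	if (missing)
		return LEAF_MISSING;   /* can point at p2m_missing */
	if (identity)
		return LEAF_IDENTITY;  /* can point at p2m_identity */
	return LEAF_MIXED;         /* must keep its own page */
}
```

      A leaf classified as LEAF_MISSING or LEAF_IDENTITY can be swapped for the
      shared page, which frees its own page for reuse.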
      
      This also means we can remove git commit
      5bc6f988
      ("xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back"),
      which tried to fix this, and make the amount that has to be reserved
      much smaller.
      
      CC: stable@vger.kernel.org # for 3.5 only.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  14. 02 Aug 2012: 1 commit
    • xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back. · 5bc6f988
      Konrad Rzeszutek Wilk committed
      When we release pages back during bootup:
      
      Freeing  9d-100 pfn range: 99 pages freed
      Freeing  9cf36-9d0d2 pfn range: 412 pages freed
      Freeing  9f6bd-9f6bf pfn range: 2 pages freed
      Freeing  9f714-9f7bf pfn range: 171 pages freed
      Freeing  9f7e0-9f7ff pfn range: 31 pages freed
      Freeing  9f800-100000 pfn range: 395264 pages freed
      Released 395979 pages of unused memory
      
      We then try to populate those pages back. In the P2M tree, however,
      the space for those leafs must be reserved, so we use extend_brk.
      We reserve 8MB of _brk space, which holds leafs for
      1048576 PFNs, which is more than we should ever need.
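      The 1048576 figure follows from the leaf size. This assumes 4 KiB leaf
      pages holding 8-byte entries (x86-64): 8 MiB / 4 KiB = 2048 leaf pages,
      and each page covers 512 PFNs. pfns_covered() is a hypothetical helper.

```c
#include <assert.h>

#define PAGE_SIZE    4096UL
#define P2M_PER_PAGE (PAGE_SIZE / 8) /* assumes 8-byte entries (x86-64) */

/* Number of PFNs that leaf pages carved out of brk_bytes can describe. */
static unsigned long pfns_covered(unsigned long brk_bytes)
{
	assert(brk_bytes % PAGE_SIZE == 0);
	return brk_bytes / PAGE_SIZE * P2M_PER_PAGE;
}
```

      1048576 PFNs of 4 KiB each describe 4 GiB of guest memory.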
      
      Without this, with certain kernel builds we would hit:
      
      (XEN) domain_crash_sync called from entry.S
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff818aad3b>]
      (XEN) RFLAGS: 0000000000000206   EM: 1   CONTEXT: pv guest
      (XEN) rax: ffffffff81a7c000   rbx: 000000000000003d   rcx: 0000000000001000
      (XEN) rdx: ffffffff81a7b000   rsi: 0000000000001000   rdi: 0000000000001000
      (XEN) rbp: ffffffff81801cd8   rsp: ffffffff81801c98   r8:  0000000000100000
      (XEN) r9:  ffffffff81a7a000   r10: 0000000000000001   r11: 0000000000000003
      (XEN) r12: 0000000000000004   r13: 0000000000000004   r14: 000000000000003d
      (XEN) r15: 00000000000001e8   cr0: 000000008005003b   cr4: 00000000000006f0
      (XEN) cr3: 0000000125803000   cr2: 0000000000000000
      (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
      (XEN) Guest stack trace from rsp=ffffffff81801c98:
      
      .. which is extend_brk hitting a BUG_ON.
      
      Interestingly enough, most of the time we are not going to hit this
      b/c the _brk space is quite large (v3.5):
       ffffffff81a25000 B __brk_base
       ffffffff81e43000 B __brk_limit
      = ~4MB.
      
      vs earlier kernels (with this back-ported), the space is smaller:
       ffffffff81a25000 B __brk_base
       ffffffff81a7b000 B __brk_limit
      = 344 kBytes.
      
      where we would certainly hit this BUG_ON in extend_brk.
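      A quick check of the two brk sizes against the release log above. The
      sketch assumes 512 PFNs per 4 KiB leaf page and counts only a lower
      bound, because ranges that straddle leaf boundaries can need a few more
      pages. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE    4096ULL
#define P2M_PER_PAGE 512ULL /* x86-64: 4 KiB / 8-byte entries */

/* Lower bound on leaf memory needed to repopulate nr_pfns PFNs; ranges
 * that straddle leaf boundaries can need a few more pages. */
static uint64_t leaf_bytes_needed(uint64_t nr_pfns)
{
	return (nr_pfns + P2M_PER_PAGE - 1) / P2M_PER_PAGE * PAGE_SIZE;
}

/* Size of the brk area from the __brk_base/__brk_limit symbols. */
static uint64_t brk_bytes(uint64_t base, uint64_t limit)
{
	assert(limit >= base);
	return limit - base;
}
```

      The 395979 released PFNs need at least 774 leaf pages, about 3 MiB. That
      is far more than the 344 KiB brk of the older kernels and uncomfortably
      close to the roughly 4.1 MiB brk of v3.5.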
      
      Note that git commit c3d93f88
      ("xen: populate correct number of pages when across mem boundary (v2)")
      exposed this bug.
      
      [v1: Made it 8MB of _brk space instead of 4MB per Jan's suggestion]
      
      CC: stable@vger.kernel.org #only for 3.5
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  15. 15 Jun 2012: 1 commit
    • xen: mark local pages as FOREIGN in the m2p_override · b9e0d95c
      Stefano Stabellini committed
      When the frontend and the backend reside on the same domain, even if we
      add pages to the m2p_override, these pages will never be returned by
      mfn_to_pfn because the check "get_phys_to_machine(pfn) != mfn" will
      always fail, so the pfn of the frontend will be returned instead
      (resulting in a deadlock because the frontend pages are already locked).
      
      INFO: task qemu-system-i38:1085 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      qemu-system-i38 D ffff8800cfc137c0     0  1085      1 0x00000000
       ffff8800c47ed898 0000000000000282 ffff8800be4596b0 00000000000137c0
       ffff8800c47edfd8 ffff8800c47ec010 00000000000137c0 00000000000137c0
       ffff8800c47edfd8 00000000000137c0 ffffffff82213020 ffff8800be4596b0
      Call Trace:
       [<ffffffff81101ee0>] ? __lock_page+0x70/0x70
       [<ffffffff81a0fdd9>] schedule+0x29/0x70
       [<ffffffff81a0fe80>] io_schedule+0x60/0x80
       [<ffffffff81101eee>] sleep_on_page+0xe/0x20
       [<ffffffff81a0e1ca>] __wait_on_bit_lock+0x5a/0xc0
       [<ffffffff81101ed7>] __lock_page+0x67/0x70
       [<ffffffff8106f750>] ? autoremove_wake_function+0x40/0x40
       [<ffffffff811867e6>] ? bio_add_page+0x36/0x40
       [<ffffffff8110b692>] set_page_dirty_lock+0x52/0x60
       [<ffffffff81186021>] bio_set_pages_dirty+0x51/0x70
       [<ffffffff8118c6b4>] do_blockdev_direct_IO+0xb24/0xeb0
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff8118ca95>] __blockdev_direct_IO+0x55/0x60
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff811e91c8>] ext3_direct_IO+0xf8/0x390
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff81004b60>] ? xen_mc_flush+0xb0/0x1b0
       [<ffffffff81104027>] generic_file_aio_read+0x737/0x780
       [<ffffffff813bedeb>] ? gnttab_map_refs+0x15b/0x1e0
       [<ffffffff811038f0>] ? find_get_pages+0x150/0x150
       [<ffffffff8119736c>] aio_rw_vect_retry+0x7c/0x1d0
       [<ffffffff811972f0>] ? lookup_ioctx+0x90/0x90
       [<ffffffff81198856>] aio_run_iocb+0x66/0x1a0
       [<ffffffff811998b8>] do_io_submit+0x708/0xb90
       [<ffffffff81199d50>] sys_io_submit+0x10/0x20
       [<ffffffff81a18d69>] system_call_fastpath+0x16/0x1b
      
      The explanation is in the comment within the code:
      
      We need to do this because the pages shared by the frontend
      (xen-blkfront) can be already locked (lock_page, called by
      do_read_cache_page); when the userspace backend tries to use them
      with direct_IO, mfn_to_pfn returns the pfn of the frontend, so
      do_blockdev_direct_IO is going to try to lock the same pages
      again resulting in a deadlock.
      
      A simplified call graph looks like this:
      
      pygrub                          QEMU
      -----------------------------------------------
      do_read_cache_page              io_submit
        |                              |
      lock_page                       ext3_direct_IO
                                       |
                                      bio_add_page
                                       |
                                      lock_page
      
      Internally the xen-blkback uses m2p_add_override to swizzle (temporarily)
      a 'struct page' to have a different MFN (so that it can point to another
      guest). It also can easily find out whether another pfn corresponding
      to the mfn exists in the m2p, and can set the FOREIGN bit
      in the p2m, making sure that mfn_to_pfn returns the pfn of the backend.
      
      This allows the backend to perform direct_IO on these pages, but as a
      side effect prevents the frontend from using get_user_pages_fast on
      them while they are being shared with the backend.
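      A tiny model of why tagging the local PFN works. The tables, sizes and
      *_sketch name are hypothetical, and "0 means no override" is a
      simplification. FOREIGN_FRAME_BIT as the top bit of an unsigned long is
      assumed to match the kernel's definition of that era. Once the frontend's
      p2m entry carries the bit, the "p2m[pfn] != mfn" check no longer matches,
      so mfn_to_pfn falls through to the override and returns the backend's
      PFN.

```c
#include <assert.h>

#define BITS_PER_LONG     (sizeof(unsigned long) * 8)
#define FOREIGN_FRAME_BIT (1UL << (BITS_PER_LONG - 1))
#define FOREIGN_FRAME(m)  ((m) | FOREIGN_FRAME_BIT)
#define NR_FRAMES         16

/* Hypothetical tables for one domain; 0 in override_pfn means "none". */
static unsigned long p2m[NR_FRAMES];          /* pfn -> mfn */
static unsigned long m2p[NR_FRAMES];          /* mfn -> pfn */
static unsigned long override_pfn[NR_FRAMES]; /* mfn -> backend pfn */

/* Trust the m2p answer only if the p2m agrees; otherwise use the override. */
static unsigned long mfn_to_pfn(unsigned long mfn)
{
	assert(mfn < NR_FRAMES);
	unsigned long pfn = m2p[mfn];
	assert(pfn < NR_FRAMES);
	if (p2m[pfn] != mfn && override_pfn[mfn] != 0)
		pfn = override_pfn[mfn];
	return pfn;
}

/* If a local (frontend) pfn already maps this mfn, tag it FOREIGN so the
 * "p2m[pfn] != mfn" check above stops matching it. */
static void m2p_add_override_sketch(unsigned long mfn, unsigned long backend_pfn)
{
	assert(mfn < NR_FRAMES);
	unsigned long local = m2p[mfn];
	assert(local < NR_FRAMES);
	override_pfn[mfn] = backend_pfn;
	if (p2m[local] == mfn)
		p2m[local] = FOREIGN_FRAME(mfn);
}
```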
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  16. 20 Apr 2012: 1 commit
  17. 18 Apr 2012: 1 commit
  18. 07 Apr 2012: 4 commits
  19. 20 Oct 2011: 2 commits
  20. 29 Sep 2011: 1 commit
    • xen: modify kernel mappings corresponding to granted pages · 0930bba6
      Stefano Stabellini committed
      If we want to use granted pages for AIO, changing the mappings of a user
      vma and the corresponding p2m is not enough, we also need to update the
      kernel mappings accordingly.
      Currently this is only needed for pages that are created for userspace use
      through /dev/xen/gntdev. That is, pages that have been in use by the
      kernel and use the P2M will not need this special mapping.
      However, there are no guarantees that in the future the kernel won't
      start accessing pages through the 1:1 mapping even for internal usage.

      In order to avoid the complexity of dealing with highmem, we allocate
      the pages in lowmem.
      We issue a HYPERVISOR_grant_table_op right away in
      m2p_add_override and we remove the mappings using another
      HYPERVISOR_grant_table_op in m2p_remove_override.
      Considering that m2p_add_override and m2p_remove_override are called
      once per page, we use multicalls and hypercall batching.

      Use the kmap_op pointer directly as argument to do the mapping as it is
      guaranteed to be present up until the unmapping is done.
      Before issuing any unmapping multicalls, we need to make sure that the
      mapping has already been done, because we need the kmap->handle to be
      set correctly.
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      [v1: Removed GRANT_FRAME_BIT usage]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  21. 24 Sep 2011: 2 commits
  22. 19 May 2011: 2 commits
    • xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions. · c9ce9e43
      Konrad Rzeszutek Wilk committed
      If the backends, which use these two functions, are compiled as
      modules, we need these two functions to be exported.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. · d5431d52
      Konrad Rzeszutek Wilk committed
      We only supported the M2P (and P2M) override for
      GNTMAP_contains_pte type mappings, meaning that the grant
      operations would "contain the machine address of the PTE to update".
      If the flag is unset, then the grant operation
      "contains a host virtual address". The latter case means that
      the Hypervisor takes care of updating our page table
      (specifically the PTE entry) with the guest's MFN. As such we should
      not try to do anything with the PTE. Previous to this patch
      we would try to clear the PTE which resulted in Xen hypervisor
      being upset with us:
      
      (XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
      (XEN) domain_crash called from mm.c:1067
      (XEN) Domain 0 (vcpu#0) crashed on cpu#3:
      (XEN) ----[ Xen-4.0-110228  x86_64  debug=y  Not tainted ]----
      
      and crashing us.
      
      This patch allows us to inhibit the PTE clearing in the PV guest
      if the GNTMAP_contains_pte is not set.
      
      On the m2p_remove_override path we provide the same parameter.
      
      Sadly, in the grant-table driver we do not have a mechanism to
      tell m2p_remove_override whether to clear the PTE or not. Since
      the grant-table driver is used by user space, we can safely assume
      that it operates only on PTEs. Hence, for !GNTMAP_contains_pte
      mappings it returns -EOPNOTSUPP. In the future we can implement
      support for this; it will require some extra accounting structure
      to keep track of page[i] and the flag.
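A minimal user-space sketch of the decision this patch introduces. The flag values are assumed to mirror Xen's public grant_table.h and are illustrative only; guest_should_clear_pte() and model_gnttab_unmap() are hypothetical names, not kernel functions. The guest clears the PTE only for GNTMAP_contains_pte mappings, and the grant-table driver path rejects the other kind with -EOPNOTSUPP.

```c
#include <errno.h>
#include <stdbool.h>

/* Values assumed to mirror Xen's public grant_table.h; illustrative only. */
#define GNTMAP_host_map     (1u << 1)
#define GNTMAP_contains_pte (1u << 4)

/* Hypothetical helper: should the guest clear the PTE itself on unmap?
 * Only when the grant op carried the machine address of the PTE; for a
 * plain host virtual address the hypervisor updates the PTE. */
static bool guest_should_clear_pte(unsigned int flags)
{
    return (flags & GNTMAP_contains_pte) != 0;
}

/* Hypothetical model of the grant-table driver's unmap path. It cannot
 * tell the override code which kind of mapping it holds, so it assumes
 * PTE-based mappings and rejects the other kind. */
static int model_gnttab_unmap(unsigned int flags)
{
    if (!guest_should_clear_pte(flags))
        return -EOPNOTSUPP;
    return 0; /* the real path would remove the override, clearing the PTE */
}
```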
      
      [v1: Added documentation details, made it return -EOPNOTSUPP instead
       of trying to do a half-way implementation]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  23. 13 May, 2011: 1 commit
  24. 20 Apr, 2011: 1 commit
  25. 18 Apr, 2011: 1 commit
    • xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. · cf8d9163
      Konrad Rzeszutek Wilk committed
      We only supported the M2P (and P2M) override for
      GNTMAP_contains_pte type mappings, meaning that the grant
      operations would "contain the machine address of the PTE to update".
      If the flag is unset, the grant operation
      "contains a host virtual address". The latter case means that
      the hypervisor takes care of updating our page table
      (specifically the PTE entry) with the guest's MFN. As such, we should
      not try to do anything with the PTE. Prior to this patch
      we would try to clear the PTE, which resulted in the Xen hypervisor
      being upset with us:
      
      (XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
      (XEN) domain_crash called from mm.c:1067
      (XEN) Domain 0 (vcpu#0) crashed on cpu#3:
      (XEN) ----[ Xen-4.0-110228  x86_64  debug=y  Not tainted ]----
      
      and crashing us.
      
      This patch allows us to inhibit the PTE clearing in the PV guest
      if GNTMAP_contains_pte is not set.
      
      On the m2p_remove_override path we provide the same parameter.
      
      Sadly, in the grant-table driver we do not have a mechanism to
      tell m2p_remove_override whether to clear the PTE or not. Since
      the grant-table driver is used by user space, we can safely assume
      that it operates only on PTEs. Hence, for !GNTMAP_contains_pte
      mappings it returns -EOPNOTSUPP. In the future we can implement
      support for this; it will require some extra accounting structure
      to keep track of page[i] and the flag.
      
      [v1: Added documentation details, made it return -EOPNOTSUPP instead
       of trying to do a half-way implementation]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  26. 29 Mar, 2011: 1 commit
    • xen: fix p2m section mismatches · b83c6e55
      Randy Dunlap committed
      Fix section mismatch warnings:
      set_phys_range_identity() is called by __init xen_set_identity(),
      so also mark set_phys_range_identity() as __init.
      Then:
      __early_alloc_p2m() is called by set_phys_range_identity(), so also mark
      __early_alloc_p2m() as __init.
      
      WARNING: arch/x86/built-in.o(.text+0x7856): Section mismatch in reference from the function __early_alloc_p2m() to the function .init.text:extend_brk()
      The function __early_alloc_p2m() references
      the function __init extend_brk().
      This is often because __early_alloc_p2m lacks a __init
      annotation or the annotation of extend_brk is wrong.
      
      WARNING: arch/x86/built-in.o(.text+0x7967): Section mismatch in reference from the function set_phys_range_identity() to the function .init.text:extend_brk()
      The function set_phys_range_identity() references
      the function __init extend_brk().
      This is often because set_phys_range_identity lacks a __init
      annotation or the annotation of extend_brk is wrong.
      
      [v2: Per Stephen Hemming's recommendation, made __early_alloc_p2m static]
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  27. 24 Mar, 2011: 1 commit
  28. 14 Mar, 2011: 3 commits
    • xen/debugfs: Add 'p2m' file for printing out the P2M layout. · 2222e71b
      Konrad Rzeszutek Wilk committed
      We walk over the whole P2M tree and construct a simplified view of
      which PFN regions belong to what level and what type they are.
      
      Only enabled if CONFIG_XEN_DEBUG_FS is set.
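As a rough illustration of the kind of summary such a file could print, the sketch below flattens the tree walk into a per-PFN array of types (a simplification; the real code walks the P2M levels) and collapses it into runs of equal type. The enum values, struct p2m_range and collapse_ranges() are hypothetical.

```c
#include <stddef.h>

/* Hypothetical entry classes for the debug view. */
enum p2m_type { P2M_TYPE_MISSING, P2M_TYPE_IDENTITY, P2M_TYPE_PFN };

struct p2m_range {
    unsigned long start;  /* first PFN in the run */
    unsigned long end;    /* one past the last PFN */
    enum p2m_type type;
};

/* Collapse per-PFN types into maximal runs of equal type.
 * Returns the number of ranges written (at most max_out). */
static size_t collapse_ranges(const enum p2m_type *types, unsigned long n,
                              struct p2m_range *out, size_t max_out)
{
    size_t count = 0;
    unsigned long pfn = 0;

    while (pfn < n && count < max_out) {
        unsigned long start = pfn;
        enum p2m_type t = types[pfn];

        while (pfn < n && types[pfn] == t)
            pfn++;
        out[count].start = start;
        out[count].end = pfn;
        out[count].type = t;
        count++;
    }
    return count;
}
```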
      
      [v2: UNKN->UNKNOWN, use uninitialized_var]
      [v3: Rebased on top of mmu->p2m code split]
      [v4: Fixed the else if]
      Reviewed-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mmu: WARN_ON when racing to swap middle leaf. · c7617798
      Konrad Rzeszutek Wilk committed
      The initial bootup code uses set_phys_to_machine quite a lot, and after
      bootup it would be used by the balloon driver. The balloon driver does have
      a mutex lock, so this should not be necessary, but just in case, add
      a WARN_ON if we do hit this scenario. If we do hit it, it is OK
      to continue, as there is a backup mechanism (VM_IO) that can bypass
      the P2M and still set the _PAGE_IOMAP flags.
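The race being guarded against can be modeled as a compare-and-swap on the middle-level slot. This sketch uses C11 <stdatomic.h> in place of the kernel's own primitives, and leaf_slot and swap_in_leaf() are hypothetical names. Losing the race only prints a warning and reports failure, mirroring the point that the caller can carry on.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical middle-level slot holding a pointer to a leaf page. */
typedef _Atomic(unsigned long *) leaf_slot;

/* Try to replace the shared "missing" leaf with a new one. If another
 * CPU already changed the slot, warn (the WARN_ON analogue) and return
 * false; the slot keeps whatever the winner stored. */
static bool swap_in_leaf(leaf_slot *slot, unsigned long *missing,
                         unsigned long *fresh)
{
    unsigned long *expected = missing;

    if (atomic_compare_exchange_strong(slot, &expected, fresh))
        return true;

    fprintf(stderr, "WARNING: lost race swapping middle leaf\n");
    return false;
}
```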
      
      [v2: Change from WARN to BUG_ON]
      [v3: Rebased on top of xen->p2m code split]
      [v4: Change from BUG_ON to WARN]
      Reviewed-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mmu: Add the notion of identity (1-1) mapping. · f4cec35b
      Konrad Rzeszutek Wilk committed
      Our P2M tree structure has three levels. On the leaf nodes
      we set the Machine Frame Number (MFN) of the PFN. What this means
      is that when one does pfn_to_mfn(pfn), which is used when creating
      PTE entries, you get the real MFN of the hardware. When Xen sets
      up a guest it initially populates an array which has descending
      (or ascending) MFN values, like so:
      
       idx: 0,  1,       2
       [0x290F, 0x290E, 0x290D, ..]
      
      so pfn_to_mfn(2)==0x290D. If you start and restart many guests, that list
      starts looking quite random.
      
      We graft this structure onto our P2M tree structure and stick
      those MFNs in the leaves. But for all other leaf entries, or for the top
      root or middle ones, for which there is a void entry, we assume it is
      "missing". So
       pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY.
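The three-level lookup can be sketched as follows, assuming a 64-bit build with 4 KiB pages (512 entries per P2M page, so one top entry spans 1GB). Missing levels are modeled here as NULL pointers, whereas the kernel uses shared p2m_missing and p2m_mid_missing pages; the helper names are modeled on, not copied from, the kernel.

```c
#include <stddef.h>

#define P2M_PER_PAGE      512UL   /* PFNs per leaf (assumed 64-bit layout) */
#define P2M_MID_PER_PAGE  512UL   /* leaves per middle page */
#define INVALID_P2M_ENTRY (~0UL)

static unsigned long p2m_top_index(unsigned long pfn)
{
    return pfn / (P2M_MID_PER_PAGE * P2M_PER_PAGE);
}

static unsigned long p2m_mid_index(unsigned long pfn)
{
    return (pfn / P2M_PER_PAGE) % P2M_MID_PER_PAGE;
}

static unsigned long p2m_index(unsigned long pfn)
{
    return pfn % P2M_PER_PAGE;
}

/* top[t] points to an array of P2M_MID_PER_PAGE leaf pointers; a NULL at
 * either level stands for a "missing" region in this model. */
static unsigned long p2m_lookup(unsigned long ***top, unsigned long ntop,
                                unsigned long pfn)
{
    unsigned long t = p2m_top_index(pfn);
    unsigned long *leaf;

    if (t >= ntop || top[t] == NULL)
        return INVALID_P2M_ENTRY;
    leaf = top[t][p2m_mid_index(pfn)];
    if (leaf == NULL)
        return INVALID_P2M_ENTRY;
    return leaf[p2m_index(pfn)];
}
```

With these helpers, PFN 263424 resolves to p2m[1][2][256] and PFN 512256 to p2m[1][488][256], the same slots the walkthrough below uses.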
      
      We add the possibility of setting 1-1 mappings on certain regions, so
      that:
       pfn_to_mfn(0xc0000)=0xc0000
      
      The benefit of this is that for non-RAM regions (think PCI BARs or
      ACPI spaces) we can create mappings easily, because the PFN value
      matches the MFN.
      
      For this to work efficiently we introduce one new page p2m_identity and
      allocate (via reserved_brk) any other pages we need to cover the sides
      (1GB or 4MB boundary violations). All entries in p2m_identity are set to
      INVALID_P2M_ENTRY type (Xen toolstack only recognizes that and MFNs,
      no other fancy value).
      
      On lookup we spot that the entry points to p2m_identity and return the identity
      value instead of dereferencing and returning INVALID_P2M_ENTRY. If the entry
      points to an allocated page, we just proceed as before and return the PFN.
      If the PFN has IDENTITY_FRAME_BIT set we unmask that in appropriate functions
      (pfn_to_mfn).
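A sketch of that lookup rule, under the same 64-bit assumption: both sentinel leaves are filled with INVALID_P2M_ENTRY, and the identity case is recognized by comparing the leaf pointer with p2m_identity rather than by reading its contents. The exact position of IDENTITY_FRAME_BIT and the helpers init_sentinels() and leaf_lookup() are assumptions for this sketch.

```c
#define P2M_PER_PAGE       512UL
#define INVALID_P2M_ENTRY  (~0UL)
#define BITS_PER_LONG      (sizeof(unsigned long) * 8)
/* Assumed position of the marker: a high bit that no real MFN uses. */
#define IDENTITY_FRAME_BIT (1UL << (BITS_PER_LONG - 2))

/* Shared sentinel leaves. Their contents are all INVALID_P2M_ENTRY,
 * the only non-MFN value the toolstack understands. */
static unsigned long p2m_missing[P2M_PER_PAGE];
static unsigned long p2m_identity[P2M_PER_PAGE];

static void init_sentinels(void)
{
    for (unsigned long i = 0; i < P2M_PER_PAGE; i++)
        p2m_missing[i] = p2m_identity[i] = INVALID_P2M_ENTRY;
}

/* Resolve one PFN given the leaf its middle entry points at. */
static unsigned long leaf_lookup(const unsigned long *leaf, unsigned long pfn)
{
    if (leaf == p2m_identity)          /* pointer comparison, not contents */
        return pfn | IDENTITY_FRAME_BIT;
    return leaf[pfn % P2M_PER_PAGE];   /* real leaf, or p2m_missing */
}
```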
      
      The reason for having the IDENTITY_FRAME_BIT instead of just returning the
      PFN is that we could find ourselves in a situation where pfn_to_mfn(pfn)==pfn
      for a non-identity pfn. To protect ourselves against this, we elect to set
      (and get) the IDENTITY_FRAME_BIT on all identity-mapped PFNs.
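The ambiguity the bit resolves fits in a few lines: a RAM page whose MFN happens to equal its PFN and an identity-mapped PFN yield the same MFN once the bit is masked off, but only the latter carries the marker. Because INVALID_P2M_ENTRY (~0) also has every bit set, it has to be excluded explicitly. The helper names and the bit position are assumptions for this sketch.

```c
#include <stdbool.h>

#define INVALID_P2M_ENTRY  (~0UL)
#define BITS_PER_LONG      (sizeof(unsigned long) * 8)
#define IDENTITY_FRAME_BIT (1UL << (BITS_PER_LONG - 2))  /* assumed position */

/* Did this entry come from the 1-1 mapping, as opposed to a RAM page
 * that merely happens to have mfn == pfn? */
static bool entry_is_identity(unsigned long entry)
{
    return entry != INVALID_P2M_ENTRY && (entry & IDENTITY_FRAME_BIT) != 0;
}

/* Raw P2M entry -> MFN usable in a PTE: strip the marker bit. */
static unsigned long entry_to_mfn(unsigned long entry)
{
    if (entry == INVALID_P2M_ENTRY)
        return INVALID_P2M_ENTRY;
    return entry & ~IDENTITY_FRAME_BIT;
}
```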
      
      This simplistic diagram is used to explain the more subtle pieces of code.
      There is also a diagram of the P2M at the end that can help.
      Imagine your E820 looking like so:
      
                         1GB                                           2GB
      /-------------------+---------\/----\         /----------\    /---+-----\
      | System RAM        | Sys RAM ||ACPI|         | reserved |    | Sys RAM |
      \-------------------+---------/\----/         \----------/    \---+-----/
                                    ^- 1029MB                       ^- 2001MB
      
      [1029MB = 263424 (0x40500), 2001MB = 512256 (0x7D100), 2048MB = 524288 (0x80000)]
      
      And dom0_mem=max:3GB,1GB is passed in to the guest, meaning memory past 1GB
      is actually not present (would have to kick the balloon driver to put it in).
      
      When we are told to set the PFNs for identity mapping (see patch: "xen/setup:
      Set identity mapping for non-RAM E820 and E820 gaps.") we pass in the start
      PFN and the end PFN (263424 and 512256 respectively). The first step is
      to reserve_brk a top leaf page if p2m[1] is missing. The top leaf page
      covers 512^2 pages (1GB), and in case the start or end PFN is not
      aligned on 512^2*PAGE_SIZE (1GB) we loop over aligned 1GB PFNs from start
      pfn to end pfn. We reserve_brk top leaf pages if they are missing (meaning
      they point to p2m_mid_missing).
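The arithmetic for this top-level step, as a sketch assuming 4 KiB pages and 512 entries per P2M page: it converts the example's MB figures to PFNs and counts how many 1GB top slots overlapping the range still lack a leaf page. The present[] array is a hypothetical stand-in for "not pointing to p2m_mid_missing", and the end PFN is treated as exclusive.

```c
#include <stdbool.h>

#define PAGE_SHIFT   12                            /* 4 KiB pages assumed */
#define P2M_PER_PAGE 512UL
#define PFNS_PER_TOP (P2M_PER_PAGE * P2M_PER_PAGE) /* 262144 PFNs = 1GB */

static unsigned long mb_to_pfn(unsigned long mb)
{
    return mb << (20 - PAGE_SHIFT);                /* 256 PFNs per MB */
}

static bool top_aligned(unsigned long pfn)
{
    return pfn % PFNS_PER_TOP == 0;
}

/* Visit each 1GB-aligned top slot overlapping [pfn_s, pfn_e) and count
 * the ones that would need a newly reserved top leaf page. */
static unsigned count_top_pages_needed(unsigned long pfn_s, unsigned long pfn_e,
                                       const bool *present)
{
    unsigned needed = 0;

    for (unsigned long pfn = pfn_s - pfn_s % PFNS_PER_TOP; pfn < pfn_e;
         pfn += PFNS_PER_TOP)
        if (!present[pfn / PFNS_PER_TOP])
            needed++;
    return needed;
}
```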
      
      With the E820 example above, 263424 is not 1GB aligned, so we allocate a
      reserve_brk page which will cover the PFN estate from 0x40000 to 0x80000.
      Each entry in the allocated page is "missing" (points to p2m_missing).
      
      The next stage is to determine if we need to do a more granular boundary
      check on the 4MB (or 2MB, depending on architecture) boundaries of the start
      and end PFNs. We check if the start pfn and end pfn violate that boundary
      check, and if so reserve_brk a middle (p2m[x][y]) leaf page. This way we have
      much finer granularity in setting which PFNs are missing and which ones are
      identity. In our example 263424 and 512256 both fail the check, so we
      reserve_brk two pages, populate them with INVALID_P2M_ENTRY (so they both
      have "missing" values) and assign them to p2m[1][2] and p2m[1][488] respectively.
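The middle-level boundary check for the example, sketched under the 64-bit assumption of 512 PFNs per leaf (2MB), which is the layout the p2m[1][2] and p2m[1][488] indices correspond to. needs_mid_page(), slot_of() and struct mid_slot are hypothetical.

```c
#include <stdbool.h>

#define P2M_PER_PAGE     512UL  /* PFNs per leaf: 2MB with 4 KiB pages */
#define P2M_MID_PER_PAGE 512UL  /* leaves per middle page */

struct mid_slot {
    unsigned long top;  /* index into p2m[] */
    unsigned long mid;  /* index into p2m[top][] */
};

/* A boundary PFN that is not leaf-aligned needs its own leaf page, so
 * that part of the leaf can stay "missing" and part can become identity. */
static bool needs_mid_page(unsigned long pfn)
{
    return pfn % P2M_PER_PAGE != 0;
}

static struct mid_slot slot_of(unsigned long pfn)
{
    struct mid_slot s;

    s.top = pfn / (P2M_PER_PAGE * P2M_MID_PER_PAGE);
    s.mid = (pfn / P2M_PER_PAGE) % P2M_MID_PER_PAGE;
    return s;
}
```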
      
      At this point we would have reserved at minimum one page with reserve_brk,
      but possibly up to three. Each call to set_phys_range_identity has at most
      a three-page cost. If we were to query the P2M at this stage, all those
      entries from start PFN through end PFN (so 1029MB -> 2001MB) would return
      INVALID_P2M_ENTRY ("missing").
      
      The next step is to walk from the start pfn to the end pfn, setting
      the IDENTITY_FRAME_BIT on each PFN. This is done in 'set_phys_range_identity'.
      If we find that the middle leaf is pointing to p2m_missing, we can swap it
      over to p2m_identity, this way covering 4MB (or 2MB) of PFN space. At this
      point we do not need to worry about boundary alignment (so there is no need to
      reserve_brk a middle page or figure out which PFNs are "missing" and which
      ones are identity), as that has been done earlier. If we find that the middle
      leaf is not occupied by p2m_identity or p2m_missing, we dereference that page
      (which covers 512 PFNs) and set the appropriate PFN with IDENTITY_FRAME_BIT.
      In our example 263424 and 512256 end up there, and we set p2m[1][2][256->511]
      and p2m[1][488][0->256] with IDENTITY_FRAME_BIT set.
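A simplified model of how this walk splits the example range: interior leaves are covered by swapping one pointer, while the two boundary leaves (which already have their own pages from the previous step) are set PFN by PFN. The real code decides by inspecting which page the middle entry points to; this sketch decides by alignment instead, which gives the same split for this example. It treats pfn_e as exclusive, which is an assumption; with an inclusive end the last boundary leaf gains one more PFN, matching the [0->256] range quoted above. struct walk_stats and classify_walk() are hypothetical.

```c
#define P2M_PER_PAGE 512UL  /* PFNs per leaf (assumed 64-bit layout) */

struct walk_stats {
    unsigned long whole_leaves;  /* leaves swapped p2m_missing -> p2m_identity */
    unsigned long partial_pfns;  /* PFNs set one by one with IDENTITY_FRAME_BIT */
};

/* Classify [pfn_s, pfn_e): a PFN at the start of a leaf that lies wholly
 * inside the range lets us skip the whole leaf; otherwise step by one PFN. */
static struct walk_stats classify_walk(unsigned long pfn_s, unsigned long pfn_e)
{
    struct walk_stats st = {0, 0};
    unsigned long pfn = pfn_s;

    while (pfn < pfn_e) {
        unsigned long leaf_start = pfn - pfn % P2M_PER_PAGE;
        unsigned long leaf_end = leaf_start + P2M_PER_PAGE;

        if (pfn == leaf_start && leaf_end <= pfn_e) {
            st.whole_leaves++;
            pfn = leaf_end;
        } else {
            st.partial_pfns++;
            pfn++;
        }
    }
    return st;
}
```

For the example this yields 485 whole leaves (p2m[1][3] through p2m[1][487]) plus 256 PFNs in each of the two boundary leaves.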
      
      All other regions that are void (or not filled) either point to p2m_missing
      (considered missing) or have the default value of INVALID_P2M_ENTRY (also
      considered missing). In our case, p2m[1][2][0->255] and p2m[1][488][257->511]
      contain the INVALID_P2M_ENTRY value and are considered "missing."
      
      This is what the p2m ends up looking like (for the E820 above), in this
      fabulous drawing:
      
         p2m         /--------------\
       /-----\       | &mfn_list[0],|                           /-----------------\
       |  0  |------>| &mfn_list[1],|    /---------------\      | ~0, ~0, ..      |
       |-----|       |  ..., ~0, ~0 |    | ~0, ~0, [x]---+----->| IDENTITY [@256] |
       |  1  |---\   \--------------/    | [p2m_identity]+\     | IDENTITY [@257] |
       |-----|    \                      | [p2m_identity]+\\    | ....            |
       |  2  |--\  \-------------------->|  ...          | \\   \----------------/
       |-----|   \                       \---------------/  \\
       |  3  |\   \                                          \\  p2m_identity
       |-----| \   \-------------------->/---------------\   /-----------------\
       | ..  +->+                        | [p2m_identity]+-->| ~0, ~0, ~0, ... |
       \-----/ /                         | [p2m_identity]+-->| ..., ~0         |
              / /---------------\        | ....          |   \-----------------/
             /  | IDENTITY[@0]  |      /-+-[x], ~0, ~0.. |
            /   | IDENTITY[@256]|<----/  \---------------/
           /    | ~0, ~0, ....  |
          |     \---------------/
          |
          p2m_missing             p2m_missing
      /------------------\     /------------\
      | [p2m_mid_missing]+---->| ~0, ~0, ~0 |
      | [p2m_mid_missing]+---->| ..., ~0    |
      \------------------/     \------------/
      
      where ~0 is INVALID_P2M_ENTRY. IDENTITY is (PFN | IDENTITY_BIT)
      Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
      [v5: Changed code to use ranges, added ASCII art]
      [v6: Rebased on top of xen->p2m code split]
      [v4: Squished patches in just this one]
      [v7: Added RESERVE_BRK for potentially allocated pages]
      [v8: Fixed alignment problem]
      [v9: Changed 1<<3X to 1<<BITS_PER_LONG-X]
      [v10: Copied git commit description in the p2m code + Add Review tag]
      [v11: Title had '2-1' - should be '1-1' mapping]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>