1. 25 9月, 2013 1 次提交
    • D
      xen/p2m: check MFN is in range before using the m2p table · 0160676b
      David Vrabel 提交于
      On hosts with more than 168 GB of memory, a 32-bit guest may attempt
      to grant map an MFN that is error cannot lookup in its mapping of the
      m2p table.  There is an m2p lookup as part of m2p_add_override() and
      m2p_remove_override().  The lookup falls off the end of the mapped
      portion of the m2p and (because the mapping is at the highest virtual
      address) wraps around and the lookup causes a fault on what appears to
      be a user space address.
      
      do_page_fault() (thinking it's a fault to a userspace address), tries
      to lock mm->mmap_sem.  If the gntdev device is used for the grant map,
      m2p_add_override() is called from from gnttab_mmap() with mm->mmap_sem
      already locked.  do_page_fault() then deadlocks.
      
      The deadlock would most commonly occur when a 64-bit guest is started
      and xenconsoled attempts to grant map its console ring.
      
      Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the
      mapped portion of the m2p table before accessing the table and use
      this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn()
      (which already had the correct range check).
      
      All faults caused by accessing the non-existant parts of the m2p are
      thus within the kernel address space and exception_fixup() is called
      without trying to lock mm->mmap_sem.
      
      This means that for MFNs that are outside the mapped range of the m2p
      then mfn_to_pfn() will always look in the m2p overrides.  This is
      correct because it must be a foreign MFN (and the PFN in the m2p in
      this case is only relevant for the other domain).
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      --
      v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
      v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of
          range as it's probably foreign.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      0160676b
  2. 09 9月, 2013 1 次提交
  3. 20 8月, 2013 1 次提交
  4. 09 8月, 2013 1 次提交
  5. 12 9月, 2012 1 次提交
    • S
      xen/m2p: do not reuse kmap_op->dev_bus_addr · 2fc136ee
      Stefano Stabellini 提交于
      If the caller passes a valid kmap_op to m2p_add_override, we use
      kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is
      part of the interface with Xen and if we are batching the hypercalls it
      might not have been written by the hypervisor yet. That means that later
      on Xen will write to it and we'll think that the original mfn is
      actually what Xen has written to it.
      
      Rather than "stealing" struct members from kmap_op, keep using
      page->index to store the original mfn and add another parameter to
      m2p_remove_override to get the corresponding kmap_op instead.
      It is now responsibility of the caller to keep track of which kmap_op
      corresponds to a particular page in the m2p_override (gntdev, the only
      user of this interface that passes a valid kmap_op, is already doing that).
      
      CC: stable@kernel.org
      Reported-and-Tested-By: NSander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2fc136ee
  6. 05 9月, 2012 1 次提交
    • K
      xen/p2m: Fix one-off error in checking the P2M tree directory. · 50e90041
      Konrad Rzeszutek Wilk 提交于
      We would traverse the full P2M top directory (from 0->MAX_DOMAIN_PAGES
      inclusive) when trying to figure out whether we can re-use some of the
      P2M middle leafs.
      
      Which meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512
      we would try to use the 512th entry. Fortunately for us the p2m_top_index
      has a check for this:
      
       BUG_ON(pfn >= MAX_P2M_PFN);
      
      which we hit and saw this:
      
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.1.2-OVM  x86_64  debug=n  Tainted:    C ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff819cadeb>]
      (XEN) RFLAGS: 0000000000000212   EM: 1   CONTEXT: pv guest
      (XEN) rax: ffffffff81db5000   rbx: ffffffff81db4000   rcx: 0000000000000000
      (XEN) rdx: 0000000000480211   rsi: 0000000000000000   rdi: ffffffff81db4000
      (XEN) rbp: ffffffff81793db8   rsp: ffffffff81793d38   r8:  0000000008000000
      (XEN) r9:  4000000000000000   r10: 0000000000000000   r11: ffffffff81db7000
      (XEN) r12: 0000000000000ff8   r13: ffffffff81df1ff8   r14: ffffffff81db6000
      (XEN) r15: 0000000000000ff8   cr0: 000000008005003b   cr4: 00000000000026f0
      (XEN) cr3: 0000000661795000   cr2: 0000000000000000
      
      Fixes-Oracle-Bug: 14570662
      CC: stable@vger.kernel.org # only for v3.5
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      50e90041
  7. 23 8月, 2012 3 次提交
    • K
      xen/p2m: When revectoring deal with holes in the P2M array. · 3fc509fc
      Konrad Rzeszutek Wilk 提交于
      When we free the PFNs and then subsequently populate them back
      during bootup:
      
      Freeing 20000-20200 pfn range: 512 pages freed
      1-1 mapping on 20000->20200
      Freeing 40000-40200 pfn range: 512 pages freed
      1-1 mapping on 40000->40200
      Freeing bad80-badf4 pfn range: 116 pages freed
      1-1 mapping on bad80->badf4
      Freeing badf6-bae7f pfn range: 137 pages freed
      1-1 mapping on badf6->bae7f
      Freeing bb000-100000 pfn range: 282624 pages freed
      1-1 mapping on bb000->100000
      Released 283999 pages of unused memory
      Set 283999 page(s) to 1-1 mapping
      Populating 1acb8a-1f20e9 pfn range: 283999 pages added
      
      We end up having the P2M array (that is the one that was
      grafted on the P2M tree) filled with IDENTITY_FRAME or
      INVALID_P2M_ENTRY) entries. The patch titled
      
      "xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID."
      recycles said slots and replaces the P2M tree leaf's with
       &mfn_list[xx] with p2m_identity or p2m_missing.
      
      And re-uses the P2M array sections for other P2M tree leaf's.
      For the above mentioned bootup excerpt, the PFNs at
      0x20000->0x20200 are going to be IDENTITY based:
      
      P2M[0][256][0] -> P2M[0][257][0] get turned in IDENTITY_FRAME.
      
      We can re-use that and replace P2M[0][256] to point to p2m_identity.
      The "old" page (the grafted P2M array provided by Xen) that was at
      P2M[0][256] gets put somewhere else. Specifically at P2M[6][358],
      b/c when we populate back:
      
      Populating 1acb8a-1f20e9 pfn range: 283999 pages added
      
      we fill P2M[6][358][0] (and P2M[6][358], P2M[6][359], ...) with
      the new MFNs.
      
      That is all OK, except when we revector we assume that the PFN
      count would be the same in the grafted P2M array and in the
      newly allocated. Since that is no longer the case, as we have
      holes in the P2M that point to p2m_missing or p2m_identity we
      have to take that into account.
      
      [v2: Check for overflow]
      [v3: Move within the __va check]
      [v4: Fix the computation]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3fc509fc
    • K
      xen/p2m: Add logic to revector a P2M tree to use __va leafs. · 357a3cfb
      Konrad Rzeszutek Wilk 提交于
      During bootup Xen supplies us with a P2M array. It sticks
      it right after the ramdisk, as can be seen with a 128GB PV guest:
      
      (certain parts removed for clarity):
      xc_dom_build_image: called
      xc_dom_alloc_segment:   kernel       : 0xffffffff81000000 -> 0xffffffff81e43000  (pfn 0x1000 + 0xe43 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
      xc_dom_alloc_segment:   ramdisk      : 0xffffffff81e43000 -> 0xffffffff925c7000  (pfn 0x1e43 + 0x10784 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
      xc_dom_alloc_segment:   phys2mach    : 0xffffffff925c7000 -> 0xffffffffa25c7000  (pfn 0x125c7 + 0x10000 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000
      xc_dom_alloc_page   :   start info   : 0xffffffffa25c7000 (pfn 0x225c7)
      xc_dom_alloc_page   :   xenstore     : 0xffffffffa25c8000 (pfn 0x225c8)
      xc_dom_alloc_page   :   console      : 0xffffffffa25c9000 (pfn 0x225c9)
      nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
      nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
      nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
      nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s)
      xc_dom_alloc_segment:   page tables  : 0xffffffffa25ca000 -> 0xffffffffa26e1000  (pfn 0x225ca + 0x117 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
      xc_dom_alloc_page   :   boot stack   : 0xffffffffa26e1000 (pfn 0x226e1)
      xc_dom_build_image  : virt_alloc_end : 0xffffffffa26e2000
      xc_dom_build_image  : virt_pgtab_end : 0xffffffffa2800000
      
      So the physical memory and virtual (using __START_KERNEL_map addresses)
      layout looks as so:
      
        phys                             __ka
      /------------\                   /-------------------\
      | 0          | empty             | 0xffffffff80000000|
      | ..         |                   | ..                |
      | 16MB       | <= kernel starts  | 0xffffffff81000000|
      | ..         |                   |                   |
      | 30MB       | <= kernel ends => | 0xffffffff81e43000|
      | ..         |  & ramdisk starts | ..                |
      | 293MB      | <= ramdisk ends=> | 0xffffffff925c7000|
      | ..         |  & P2M starts     | ..                |
      | ..         |                   | ..                |
      | 549MB      | <= P2M ends    => | 0xffffffffa25c7000|
      | ..         | start_info        | 0xffffffffa25c7000|
      | ..         | xenstore          | 0xffffffffa25c8000|
      | ..         | cosole            | 0xffffffffa25c9000|
      | 549MB      | <= page tables => | 0xffffffffa25ca000|
      | ..         |                   |                   |
      | 550MB      | <= PGT end     => | 0xffffffffa26e1000|
      | ..         | boot stack        |                   |
      \------------/                   \-------------------/
      
      As can be seen, the ramdisk, P2M and pagetables are taking
      a bit of __ka addresses space. Which is a problem since the
      MODULES_VADDR starts at 0xffffffffa0000000 - and P2M sits
      right in there! This results during bootup with the inability to
      load modules, with this error:
      
      ------------[ cut here ]------------
      WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370()
      Call Trace:
       [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0
       [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
       [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20
       [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370
       [<ffffffff81130c4d>] map_vm_area+0x2d/0x50
       [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250
       [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c6186>] ? load_module+0x66/0x19c0
       [<ffffffff8105cadc>] module_alloc+0x5c/0x60
       [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c70c3>] load_module+0xfa3/0x19c0
       [<ffffffff812491f6>] ? security_file_permission+0x86/0x90
       [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220
       [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b
      ---[ end trace fd8f7704fdea0291 ]---
      vmalloc: allocation failure, allocated 16384 of 20480 bytes
      modprobe: page allocation failure: order:0, mode:0xd2
      
      Since the __va and __ka are 1:1 up to MODULES_VADDR and
      cleanup_highmap rids __ka of the ramdisk mapping, what
      we want to do is similar - get rid of the P2M in the __ka
      address space. There are two ways of fixing this:
      
       1) All P2M lookups instead of using the __ka address would
          use the __va address. This means we can safely erase from
          __ka space the PMD pointers that point to the PFNs for
          P2M array and be OK.
       2). Allocate a new array, copy the existing P2M into it,
          revector the P2M tree to use that, and return the old
          P2M to the memory allocate. This has the advantage that
          it sets the stage for using XEN_ELF_NOTE_INIT_P2M
          feature. That feature allows us to set the exact virtual
          address space we want for the P2M - and allows us to
          boot as initial domain on large machines.
      
      So we pick option 2).
      
      This patch only lays the groundwork in the P2M code. The patch
      that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      357a3cfb
    • K
      Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and... · 51faaf2b
      Konrad Rzeszutek Wilk 提交于
      Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas."
      
      This reverts commit 806c312e and
      commit 59b29440.
      
      And also documents setup.c and why we want to do it that way, which
      is that we tried to make the the memblock_reserve more selective so
      that it would be clear what region is reserved. Sadly we ran
      in the problem wherein on a 64-bit hypervisor with a 32-bit
      initial domain, the pt_base has the cr3 value which is not
      neccessarily where the pagetable starts! As Jan put it: "
      Actually, the adjustment turns out to be correct: The page
      tables for a 32-on-64 dom0 get allocated in the order "first L1",
      "first L2", "first L3", so the offset to the page table base is
      indeed 2. When reading xen/include/public/xen.h's comment
      very strictly, this is not a violation (since there nothing is said
      that the first thing in the page table space is pointed to by
      pt_base; I admit that this seems to be implied though, namely
      do I think that it is implied that the page table space is the
      range [pt_base, pt_base + nt_pt_frames), whereas that
      range here indeed is [pt_base - 2, pt_base - 2 + nt_pt_frames),
      which - without a priori knowledge - the kernel would have
      difficulty to figure out)." - so lets just fall back to the
      easy way and reserve the whole region.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      51faaf2b
  8. 22 8月, 2012 2 次提交
  9. 17 8月, 2012 1 次提交
    • K
      xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID. · 250a41e0
      Konrad Rzeszutek Wilk 提交于
      If P2M leaf is completly packed with INVALID_P2M_ENTRY or with
      1:1 PFNs (so IDENTITY_FRAME type PFNs), we can swap the P2M leaf
      with either a p2m_missing or p2m_identity respectively. The old
      page (which was created via extend_brk or was grafted on from the
      mfn_list) can be re-used for setting new PFNs.
      
      This also means we can remove git commit:
      5bc6f988
      xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back
      which tried to fix this.
      
      and make the amount that is required to be reserved much smaller.
      
      CC: stable@vger.kernel.org # for 3.5 only.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      250a41e0
  10. 02 8月, 2012 1 次提交
    • K
      xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back. · 5bc6f988
      Konrad Rzeszutek Wilk 提交于
      When we release pages back during bootup:
      
      Freeing  9d-100 pfn range: 99 pages freed
      Freeing  9cf36-9d0d2 pfn range: 412 pages freed
      Freeing  9f6bd-9f6bf pfn range: 2 pages freed
      Freeing  9f714-9f7bf pfn range: 171 pages freed
      Freeing  9f7e0-9f7ff pfn range: 31 pages freed
      Freeing  9f800-100000 pfn range: 395264 pages freed
      Released 395979 pages of unused memory
      
      We then try to populate those pages back. In the P2M tree however
      the space for those leafs must be reserved - as such we use extend_brk.
      We reserve 8MB of _brk space, which means we can fit over
      1048576 PFNs - which is more than we should ever need.
      
      Without this, on certain compilation of the kernel we would hit:
      
      (XEN) domain_crash_sync called from entry.S
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff818aad3b>]
      (XEN) RFLAGS: 0000000000000206   EM: 1   CONTEXT: pv guest
      (XEN) rax: ffffffff81a7c000   rbx: 000000000000003d   rcx: 0000000000001000
      (XEN) rdx: ffffffff81a7b000   rsi: 0000000000001000   rdi: 0000000000001000
      (XEN) rbp: ffffffff81801cd8   rsp: ffffffff81801c98   r8:  0000000000100000
      (XEN) r9:  ffffffff81a7a000   r10: 0000000000000001   r11: 0000000000000003
      (XEN) r12: 0000000000000004   r13: 0000000000000004   r14: 000000000000003d
      (XEN) r15: 00000000000001e8   cr0: 000000008005003b   cr4: 00000000000006f0
      (XEN) cr3: 0000000125803000   cr2: 0000000000000000
      (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
      (XEN) Guest stack trace from rsp=ffffffff81801c98:
      
      .. which is extend_brk hitting a BUG_ON.
      
      Interestingly enough, most of the time we are not going to hit this
      b/c the _brk space is quite large (v3.5):
       ffffffff81a25000 B __brk_base
       ffffffff81e43000 B __brk_limit
      = ~4MB.
      
      vs earlier kernels (with this back-ported), the space is smaller:
       ffffffff81a25000 B __brk_base
       ffffffff81a7b000 B __brk_limit
      = 344 kBytes.
      
      where we would certainly hit this and hit extend_brk.
      
      Note that git commit c3d93f88
      (xen: populate correct number of pages when across mem boundary (v2))
      exposed this bug).
      
      [v1: Made it 8MB of _brk space instead of 4MB per Jan's suggestion]
      
      CC: stable@vger.kernel.org #only for 3.5
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      5bc6f988
  11. 15 6月, 2012 1 次提交
    • S
      xen: mark local pages as FOREIGN in the m2p_override · b9e0d95c
      Stefano Stabellini 提交于
      When the frontend and the backend reside on the same domain, even if we
      add pages to the m2p_override, these pages will never be returned by
      mfn_to_pfn because the check "get_phys_to_machine(pfn) != mfn" will
      always fail, so the pfn of the frontend will be returned instead
      (resulting in a deadlock because the frontend pages are already locked).
      
      INFO: task qemu-system-i38:1085 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      qemu-system-i38 D ffff8800cfc137c0     0  1085      1 0x00000000
       ffff8800c47ed898 0000000000000282 ffff8800be4596b0 00000000000137c0
       ffff8800c47edfd8 ffff8800c47ec010 00000000000137c0 00000000000137c0
       ffff8800c47edfd8 00000000000137c0 ffffffff82213020 ffff8800be4596b0
      Call Trace:
       [<ffffffff81101ee0>] ? __lock_page+0x70/0x70
       [<ffffffff81a0fdd9>] schedule+0x29/0x70
       [<ffffffff81a0fe80>] io_schedule+0x60/0x80
       [<ffffffff81101eee>] sleep_on_page+0xe/0x20
       [<ffffffff81a0e1ca>] __wait_on_bit_lock+0x5a/0xc0
       [<ffffffff81101ed7>] __lock_page+0x67/0x70
       [<ffffffff8106f750>] ? autoremove_wake_function+0x40/0x40
       [<ffffffff811867e6>] ? bio_add_page+0x36/0x40
       [<ffffffff8110b692>] set_page_dirty_lock+0x52/0x60
       [<ffffffff81186021>] bio_set_pages_dirty+0x51/0x70
       [<ffffffff8118c6b4>] do_blockdev_direct_IO+0xb24/0xeb0
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff8118ca95>] __blockdev_direct_IO+0x55/0x60
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff811e91c8>] ext3_direct_IO+0xf8/0x390
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff81004b60>] ? xen_mc_flush+0xb0/0x1b0
       [<ffffffff81104027>] generic_file_aio_read+0x737/0x780
       [<ffffffff813bedeb>] ? gnttab_map_refs+0x15b/0x1e0
       [<ffffffff811038f0>] ? find_get_pages+0x150/0x150
       [<ffffffff8119736c>] aio_rw_vect_retry+0x7c/0x1d0
       [<ffffffff811972f0>] ? lookup_ioctx+0x90/0x90
       [<ffffffff81198856>] aio_run_iocb+0x66/0x1a0
       [<ffffffff811998b8>] do_io_submit+0x708/0xb90
       [<ffffffff81199d50>] sys_io_submit+0x10/0x20
       [<ffffffff81a18d69>] system_call_fastpath+0x16/0x1b
      
      The explanation is in the comment within the code:
      
      We need to do this because the pages shared by the frontend
      (xen-blkfront) can be already locked (lock_page, called by
      do_read_cache_page); when the userspace backend tries to use them
      with direct_IO, mfn_to_pfn returns the pfn of the frontend, so
      do_blockdev_direct_IO is going to try to lock the same pages
      again resulting in a deadlock.
      
      A simplified call graph looks like this:
      
      pygrub                          QEMU
      -----------------------------------------------
      do_read_cache_page              io_submit
        |                              |
      lock_page                       ext3_direct_IO
                                       |
                                      bio_add_page
                                       |
                                      lock_page
      
      Internally the xen-blkback uses m2p_add_override to swizzle (temporarily)
      a 'struct page' to have a different MFN (so that it can point to another
      guest). It also can easily find out whether another pfn corresponding
      to the mfn exists in the m2p, and can set the FOREIGN bit
      in the p2m, making sure that mfn_to_pfn returns the pfn of the backend.
      
      This allows the backend to perform direct_IO on these pages, but as a
      side effect prevents the frontend from using get_user_pages_fast on
      them while they are being shared with the backend.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b9e0d95c
  12. 20 4月, 2012 1 次提交
  13. 18 4月, 2012 1 次提交
  14. 07 4月, 2012 4 次提交
  15. 20 10月, 2011 2 次提交
  16. 29 9月, 2011 1 次提交
    • S
      xen: modify kernel mappings corresponding to granted pages · 0930bba6
      Stefano Stabellini 提交于
      If we want to use granted pages for AIO, changing the mappings of a user
      vma and the corresponding p2m is not enough, we also need to update the
      kernel mappings accordingly.
      Currently this is only needed for pages that are created for user usages
      through /dev/xen/gntdev. As in, pages that have been in use by the
      kernel and use the P2M will not need this special mapping.
      However there are no guarantees that in the future the kernel won't
      start accessing pages through the 1:1 even for internal usage.
      
      In order to avoid the complexity of dealing with highmem, we allocated
      the pages lowmem.
      We issue a HYPERVISOR_grant_table_op right away in
      m2p_add_override and we remove the mappings using another
      HYPERVISOR_grant_table_op in m2p_remove_override.
      Considering that m2p_add_override and m2p_remove_override are called
      once per page we use multicalls and hypercall batching.
      
      Use the kmap_op pointer directly as argument to do the mapping as it is
      guaranteed to be present up until the unmapping is done.
      Before issuing any unmapping multicalls, we need to make sure that the
      mapping has already being done, because we need the kmap->handle to be
      set correctly.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      [v1: Removed GRANT_FRAME_BIT usage]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      0930bba6
  17. 24 9月, 2011 2 次提交
  18. 19 5月, 2011 2 次提交
    • K
      xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions. · c9ce9e43
      Konrad Rzeszutek Wilk 提交于
      If the backends, which use these two functions, are compiled as
      a module we need these two functions to be exported.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c9ce9e43
    • K
      xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. · d5431d52
      Konrad Rzeszutek Wilk 提交于
      We only supported the M2P (and P2M) override only for the
      GNTMAP_contains_pte type mappings. Meaning that we grants
      operations would "contain the machine address of the PTE to update"
      If the flag is unset, then the grant operation is
      "contains a host virtual address". The latter case means that
      the Hypervisor takes care of updating our page table
      (specifically the PTE entry) with the guest's MFN. As such we should
      not try to do anything with the PTE. Previous to this patch
      we would try to clear the PTE which resulted in Xen hypervisor
      being upset with us:
      
      (XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
      (XEN) domain_crash called from mm.c:1067
      (XEN) Domain 0 (vcpu#0) crashed on cpu#3:
      (XEN) ----[ Xen-4.0-110228  x86_64  debug=y  Not tainted ]----
      
      and crashing us.
      
      This patch allows us to inhibit the PTE clearing in the PV guest
      if the GNTMAP_contains_pte is not set.
      
      On the m2p_remove_override path we provide the same parameter.
      
      Sadly in the grant-table driver we do not have a mechanism to
      tell m2p_remove_override whether to clear the PTE or not. Since
      the grant-table driver is used by user-space, we can safely assume
      that it operates only on PTE's. Hence the implementation for
      it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
      we can implement the support for this. It will require some extra
      accounting structure to keep track of the page[i], and the flag.
      
      [v1: Added documentation details, made it return -EOPNOTSUPP instead
       of trying to do a half-way implementation]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      d5431d52
  19. 13 5月, 2011 1 次提交
  20. 20 4月, 2011 1 次提交
  21. 18 4月, 2011 1 次提交
    • K
      xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. · cf8d9163
      Konrad Rzeszutek Wilk 提交于
      We only supported the M2P (and P2M) override only for the
      GNTMAP_contains_pte type mappings. Meaning that we grants
      operations would "contain the machine address of the PTE to update"
      If the flag is unset, then the grant operation is
      "contains a host virtual address". The latter case means that
      the Hypervisor takes care of updating our page table
      (specifically the PTE entry) with the guest's MFN. As such we should
      not try to do anything with the PTE. Previous to this patch
      we would try to clear the PTE which resulted in Xen hypervisor
      being upset with us:
      
      (XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
      (XEN) domain_crash called from mm.c:1067
      (XEN) Domain 0 (vcpu#0) crashed on cpu#3:
      (XEN) ----[ Xen-4.0-110228  x86_64  debug=y  Not tainted ]----
      
      and crashing us.
      
      This patch allows us to inhibit the PTE clearing in the PV guest
      if the GNTMAP_contains_pte is not set.
      
      On the m2p_remove_override path we provide the same parameter.
      
      Sadly in the grant-table driver we do not have a mechanism to
      tell m2p_remove_override whether to clear the PTE or not. Since
      the grant-table driver is used by user-space, we can safely assume
      that it operates only on PTE's. Hence the implementation for
      it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
      we can implement the support for this. It will require some extra
      accounting structure to keep track of the page[i], and the flag.
      
      [v1: Added documentation details, made it return -EOPNOTSUPP instead
       of trying to do a half-way implementation]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      cf8d9163
  22. 29 3月, 2011 1 次提交
    • R
      xen: fix p2m section mismatches · b83c6e55
      Randy Dunlap 提交于
      Fix section mismatch warnings:
      set_phys_range_identity() is called by __init xen_set_identity(),
      so also mark set_phys_range_identity() as __init.
      then:
      __early_alloc_p2m() is called set_phys_range_identity(), so also mark
      __early_alloc_p2m() as __init.
      
      WARNING: arch/x86/built-in.o(.text+0x7856): Section mismatch in reference from the function __early_alloc_p2m() to the function .init.text:extend_brk()
      The function __early_alloc_p2m() references
      the function __init extend_brk().
      This is often because __early_alloc_p2m lacks a __init
      annotation or the annotation of extend_brk is wrong.
      
      WARNING: arch/x86/built-in.o(.text+0x7967): Section mismatch in reference from the function set_phys_range_identity() to the function .init.text:extend_brk()
      The function set_phys_range_identity() references
      the function __init extend_brk().
      This is often because set_phys_range_identity lacks a __init
      annotation or the annotation of extend_brk is wrong.
      
      [v2: Per Stephen Hemming recommonedation made __early_alloc_p2m static]
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b83c6e55
  23. 24 3月, 2011 1 次提交
  24. 14 3月, 2011 3 次提交
    • K
      xen/debugfs: Add 'p2m' file for printing out the P2M layout. · 2222e71b
      Konrad Rzeszutek Wilk 提交于
      We walk over the whole P2M tree and construct a simplified view of
      which PFN regions belong to what level and what type they are.
      
      Only enabled if CONFIG_XEN_DEBUG_FS is set.
      
      [v2: UNKN->UNKNOWN, use uninitialized_var]
      [v3: Rebased on top of mmu->p2m code split]
      [v4: Fixed the else if]
      Reviewed-by: NIan Campbell <Ian.Campbell@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2222e71b
    • K
      xen/mmu: WARN_ON when racing to swap middle leaf. · c7617798
      Konrad Rzeszutek Wilk 提交于
      The initial bootup code uses set_phys_to_machine quite a lot, and after
      bootup it would be used by the balloon driver. The balloon driver does have
      mutex lock so this should not be necessary - but just in case, add
      a WARN_ON if we do hit this scenario. If we do fail this, it is OK
      to continue as there is a backup mechanism (VM_IO) that can bypass
      the P2M and still set the _PAGE_IOMAP flags.
      
      [v2: Change from WARN to BUG_ON]
      [v3: Rebased on top of xen->p2m code split]
      [v4: Change from BUG_ON to WARN]
      Reviewed-by: NIan Campbell <Ian.Campbell@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c7617798
    • K
      xen/mmu: Add the notion of identity (1-1) mapping. · f4cec35b
      Konrad Rzeszutek Wilk 提交于
      Our P2M tree structure is a three-level. On the leaf nodes
      we set the Machine Frame Number (MFN) of the PFN. What this means
      is that when one does: pfn_to_mfn(pfn), which is used when creating
      PTE entries, you get the real MFN of the hardware. When Xen sets
      up a guest it initially populates a array which has descending
      (or ascending) MFN values, as so:
      
       idx: 0,  1,       2
       [0x290F, 0x290E, 0x290D, ..]
      
      so pfn_to_mfn(2)==0x290D. If you start, restart many guests that list
      starts looking quite random.
      
      We graft this structure on our P2M tree structure and stick in
      those MFN in the leafs. But for all other leaf entries, or for the top
      root, or middle one, for which there is a void entry, we assume it is
      "missing". So
       pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY.
      
      We add the possibility of setting 1-1 mappings on certain regions, so
      that:
       pfn_to_mfn(0xc0000)=0xc0000
      
      The benefit of this is, that we can assume for non-RAM regions (think
      PCI BARs, or ACPI spaces), we can create mappings easily b/c we
      get the PFN value to match the MFN.
      
      For this to work efficiently we introduce one new page p2m_identity and
      allocate (via reserved_brk) any other pages we need to cover the sides
      (1GB or 4MB boundary violations). All entries in p2m_identity are set to
      INVALID_P2M_ENTRY type (Xen toolstack only recognizes that and MFNs,
      no other fancy value).
      
      On lookup we spot that the entry points to p2m_identity and return the identity
      value instead of dereferencing and returning INVALID_P2M_ENTRY. If the entry
      points to an allocated page, we just proceed as before and return the PFN.
      If the PFN has IDENTITY_FRAME_BIT set we unmask that in appropriate functions
      (pfn_to_mfn).
      
      The reason for having the IDENTITY_FRAME_BIT instead of just returning the
      PFN is that we could find ourselves where pfn_to_mfn(pfn)==pfn for a
      non-identity pfn. To protect ourselves against we elect to set (and get) the
      IDENTITY_FRAME_BIT on all identity mapped PFNs.
      
      This simplistic diagram is used to explain the more subtle piece of code.
      There is also a digram of the P2M at the end that can help.
      Imagine your E820 looking as so:
      
                         1GB                                           2GB
      /-------------------+---------\/----\         /----------\    /---+-----\
      | System RAM        | Sys RAM ||ACPI|         | reserved |    | Sys RAM |
      \-------------------+---------/\----/         \----------/    \---+-----/
                                    ^- 1029MB                       ^- 2001MB
      
      [1029MB = 263424 (0x40500), 2001MB = 512256 (0x7D100), 2048MB = 524288 (0x80000)]
      
      And dom0_mem=max:3GB,1GB is passed in to the guest, meaning memory past 1GB
      is actually not present (would have to kick the balloon driver to put it in).
      
      When we are told to set the PFNs for identity mapping (see patch: "xen/setup:
      Set identity mapping for non-RAM E820 and E820 gaps.") we pass in the start
      of the PFN and the end PFN (263424 and 512256 respectively). The first step is
      to reserve_brk a top leaf page if the p2m[1] is missing. The top leaf page
      covers 512^2 of page estate (1GB) and in case the start or end PFN is not
      aligned on 512^2*PAGE_SIZE (1GB) we loop on aligned 1GB PFNs from start pfn to
      end pfn.  We reserve_brk top leaf pages if they are missing (means they point
      to p2m_mid_missing).
      
      With the E820 example above, 263424 is not 1GB aligned so we allocate a
      reserve_brk page which will cover the PFNs estate from 0x40000 to 0x80000.
      Each entry in the allocate page is "missing" (points to p2m_missing).
      
      Next stage is to determine if we need to do a more granular boundary check
      on the 4MB (or 2MB depending on architecture) off the start and end pfn's.
      We check if the start pfn and end pfn violate that boundary check, and if
      so reserve_brk a middle (p2m[x][y]) leaf page. This way we have a much finer
      granularity of setting which PFNs are missing and which ones are identity.
      In our example 263424 and 512256 both fail the check so we reserve_brk two
      pages. Populate them with INVALID_P2M_ENTRY (so they both have "missing" values)
      and assign them to p2m[1][2] and p2m[1][488] respectively.
      
      At this point we would at minimum reserve_brk one page, but could be up to
      three. Each call to set_phys_range_identity has at maximum a three page
      cost. If we were to query the P2M at this stage, all those entries from
      start PFN through end PFN (so 1029MB -> 2001MB) would return INVALID_P2M_ENTRY
      ("missing").
      
      The next step is to walk from the start pfn to the end pfn setting
      the IDENTITY_FRAME_BIT on each PFN. This is done in 'set_phys_range_identity'.
      If we find that the middle leaf is pointing to p2m_missing we can swap it over
      to p2m_identity - this way covering 4MB (or 2MB) PFN space.  At this point we
      do not need to worry about boundary aligment (so no need to reserve_brk a middle
      page, figure out which PFNs are "missing" and which ones are identity), as that
      has been done earlier.  If we find that the middle leaf is not occupied by
      p2m_identity or p2m_missing, we dereference that page (which covers
      512 PFNs) and set the appropriate PFN with IDENTITY_FRAME_BIT. In our example
      263424 and 512256 end up there, and we set from p2m[1][2][256->511] and
      p2m[1][488][0->256] with IDENTITY_FRAME_BIT set.
      
      All other regions that are void (or not filled) either point to p2m_missing
      (considered missing) or have the default value of INVALID_P2M_ENTRY (also
      considered missing). In our case, p2m[1][2][0->255] and p2m[1][488][257->511]
      contain the INVALID_P2M_ENTRY value and are considered "missing."
      
      This is what the p2m ends up looking (for the E820 above) with this
      fabulous drawing:
      
         p2m         /--------------\
       /-----\       | &mfn_list[0],|                           /-----------------\
       |  0  |------>| &mfn_list[1],|    /---------------\      | ~0, ~0, ..      |
       |-----|       |  ..., ~0, ~0 |    | ~0, ~0, [x]---+----->| IDENTITY [@256] |
       |  1  |---\   \--------------/    | [p2m_identity]+\     | IDENTITY [@257] |
       |-----|    \                      | [p2m_identity]+\\    | ....            |
       |  2  |--\  \-------------------->|  ...          | \\   \----------------/
       |-----|   \                       \---------------/  \\
       |  3  |\   \                                          \\  p2m_identity
       |-----| \   \-------------------->/---------------\   /-----------------\
       | ..  +->+                        | [p2m_identity]+-->| ~0, ~0, ~0, ... |
       \-----/ /                         | [p2m_identity]+-->| ..., ~0         |
              / /---------------\        | ....          |   \-----------------/
             /  | IDENTITY[@0]  |      /-+-[x], ~0, ~0.. |
            /   | IDENTITY[@256]|<----/  \---------------/
           /    | ~0, ~0, ....  |
          |     \---------------/
          |
          p2m_missing             p2m_missing
      /------------------\     /------------\
      | [p2m_mid_missing]+---->| ~0, ~0, ~0 |
      | [p2m_mid_missing]+---->| ..., ~0    |
      \------------------/     \------------/
      
      where ~0 is INVALID_P2M_ENTRY. IDENTITY is (PFN | IDENTITY_BIT)
      Reviewed-by: NIan Campbell <ian.campbell@citrix.com>
      [v5: Changed code to use ranges, added ASCII art]
      [v6: Rebased on top of xen->p2m code split]
      [v4: Squished patches in just this one]
      [v7: Added RESERVE_BRK for potentially allocated pages]
      [v8: Fixed alignment problem]
      [v9: Changed 1<<3X to 1<<BITS_PER_LONG-X]
      [v10: Copied git commit description in the p2m code + Add Review tag]
      [v11: Title had '2-1' - should be '1-1' mapping]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      f4cec35b
  25. 04 3月, 2011 1 次提交
    • K
      xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY. · 6eaa412f
      Konrad Rzeszutek Wilk 提交于
      With this patch, we diligently set regions that will be used by the
      balloon driver to be INVALID_P2M_ENTRY and under the ownership
      of the balloon driver. We are OK using the __set_phys_to_machine
      as we do not expect to be allocating any P2M middle or entries pages.
      The set_phys_to_machine has the side-effect of potentially allocating
      new pages and we do not want that at this stage.
      
      We can do this because xen_build_mfn_list_list will have already
      allocated all such pages up to xen_max_p2m_pfn.
      
      We also move the check for auto translated physmap down the
      stack so it is present in __set_phys_to_machine.
      
      [v2: Rebased with mmu->p2m code split]
      Reviewed-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      6eaa412f
  26. 12 2月, 2011 2 次提交
    • I
      xen: annotate functions which only call into __init at start of day · 44b46c3e
      Ian Campbell 提交于
      Both xen_hvm_init_shared_info and xen_build_mfn_list_list can be
      called at resume time as well as at start of day but only reference
      __init functions (extend_brk) at start of day. Hence annotate with
      __ref.
      
          WARNING: arch/x86/built-in.o(.text+0x4f1): Section mismatch in reference
              from the function xen_hvm_init_shared_info() to the function
              .init.text:extend_brk()
          The function xen_hvm_init_shared_info() references
          the function __init extend_brk().
          This is often because xen_hvm_init_shared_info lacks a __init
          annotation or the annotation of extend_brk is wrong.
      
      xen_hvm_init_shared_info calls extend_brk() iff !shared_info_page and
      initialises shared_info_page with the result. This happens at start of
      day only.
      
          WARNING: arch/x86/built-in.o(.text+0x599b): Section mismatch in reference
              from the function xen_build_mfn_list_list() to the function
              .init.text:extend_brk()
          The function xen_build_mfn_list_list() references
          the function __init extend_brk().
          This is often because xen_build_mfn_list_list lacks a __init
          annotation or the annotation of extend_brk is wrong.
      
      (this warning occurs multiple times)
      
      xen_build_mfn_list_list only calls extend_brk() at boot time, while
      building the initial mfn list list
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      44b46c3e
    • I
      xen p2m: annotate variable which appears unused · 6b08cfeb
      Ian Campbell 提交于
       CC      arch/x86/xen/p2m.o
      arch/x86/xen/p2m.c: In function 'm2p_remove_override':
      arch/x86/xen/p2m.c:460: warning: 'address' may be used uninitialized in this function
      arch/x86/xen/p2m.c: In function 'm2p_add_override':
      arch/x86/xen/p2m.c:426: warning: 'address' may be used uninitialized in this function
      
      In actual fact address is inialised in one "if (!PageHighMem(page))"
      statement and used in a second and so is always initialised before
      use.
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      6b08cfeb
  27. 27 1月, 2011 1 次提交
  28. 22 1月, 2011 1 次提交
    • S
      xen: p2m: correctly initialize partial p2m leaf · 8e1b4cf2
      Stefan Bader 提交于
      After changing the p2m mapping to a tree by
      
        commit 58e05027
          xen: convert p2m to a 3 level tree
      
      and trying to boot a DomU with 615MB of memory, the following crash was
      observed in the dump:
      
      kernel direct mapping tables up to 26f00000 @ 1ec4000-1fff000
      BUG: unable to handle kernel NULL pointer dereference at (null)
      IP: [<c0107397>] xen_set_pte+0x27/0x60
      *pdpt = 0000000000000000 *pde = 0000000000000000
      
      Adding further debug statements showed that when trying to set up
      pfn=0x26700 the returned mapping was invalid.
      
      pfn=0x266ff calling set_pte(0xc1fe77f8, 0x6b3003)
      pfn=0x26700 calling set_pte(0xc1fe7800, 0x3)
      
      Although the last_pfn obtained from the startup info is 0x26700, which
      should in turn not be hit, the additional 8MB which are added as extra
      memory normally seem to be ok. This lead to looking into the initial
      p2m tree construction, which uses the smaller value and assuming that
      there is other code handling the extra memory.
      
      When the p2m tree is set up, the leaves are directly pointed to the
      array which the domain builder set up. But if the mapping is not on a
      boundary that fits into one p2m page, this will result in the last leaf
      being only partially valid. And as the invalid entries are not
      initialized in that case, things go badly wrong.
      
      I am trying to fix that by checking whether the current leaf is a
      complete map and if not, allocate a completely new page and copy only
      the valid pointers there. This may not be the most efficient or elegant
      solution, but at least it seems to allow me booting DomUs with memory
      assignments all over the range.
      
      BugLink: http://bugs.launchpad.net/bugs/686692
      [v2: Redid a bit of commit wording and fixed a compile warning]
      Signed-off-by: NStefan Bader <stefan.bader@canonical.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      8e1b4cf2