1. 24 Sep 2012, 1 commit
  2. 18 Sep 2012, 2 commits
  3. 12 Sep 2012, 4 commits
  4. 06 Sep 2012, 1 commit
  5. 05 Sep 2012, 2 commits
    • A
      xen: fix logical error in tlb flushing · ce7184bd
      Alex Shi committed
      While TLB_FLUSH_ALL gets passed as the 'end' argument to
      flush_tlb_others(), the Xen code was checking its 'start'
      parameter instead. That could yield an incorrect op.cmd of
      MMUEXT_INVLPG_MULTI instead of MMUEXT_TLB_FLUSH_MULTI, causing
      some pages to not be flushed from the TLB.
      
      This patch fixes the issue.
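
      A minimal sketch of the corrected check (paraphrased from the shape
      of the flush_tlb_others() handling in arch/x86/xen/mmu.c, not the
      literal diff):

       /* TLB_FLUSH_ALL arrives in 'end', so that is what must be tested: */
       if (end != TLB_FLUSH_ALL && (end - start) <= PAGE_SIZE) {
               args->op.cmd = MMUEXT_INVLPG_MULTI;    /* single-page flush */
               args->op.arg1.linear_addr = start;
       } else {
               args->op.cmd = MMUEXT_TLB_FLUSH_MULTI; /* flush everything */
       }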
      Reported-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Alex Shi <alex.shi@intel.com>
      Acked-by: Jan Beulich <jbeulich@suse.com>
      Tested-by: Yongjie Ren <yongjie.ren@intel.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ce7184bd
    • K
      xen/p2m: Fix one-off error in checking the P2M tree directory. · 50e90041
      Konrad Rzeszutek Wilk committed
      We would traverse the full P2M top directory (from 0 to
      MAX_DOMAIN_PAGES, inclusive) when trying to figure out whether we
      could re-use some of the P2M middle leafs.
      
      That meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512
      we would try to use the 512th entry. Fortunately for us,
      p2m_top_index has a check for this:
      
       BUG_ON(pfn >= MAX_P2M_PFN);
      
      which we hit and saw this:
      
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.1.2-OVM  x86_64  debug=n  Tainted:    C ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff819cadeb>]
      (XEN) RFLAGS: 0000000000000212   EM: 1   CONTEXT: pv guest
      (XEN) rax: ffffffff81db5000   rbx: ffffffff81db4000   rcx: 0000000000000000
      (XEN) rdx: 0000000000480211   rsi: 0000000000000000   rdi: ffffffff81db4000
      (XEN) rbp: ffffffff81793db8   rsp: ffffffff81793d38   r8:  0000000008000000
      (XEN) r9:  4000000000000000   r10: 0000000000000000   r11: ffffffff81db7000
      (XEN) r12: 0000000000000ff8   r13: ffffffff81df1ff8   r14: ffffffff81db6000
      (XEN) r15: 0000000000000ff8   cr0: 000000008005003b   cr4: 00000000000026f0
      (XEN) cr3: 0000000661795000   cr2: 0000000000000000
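
      An illustrative loop-bound sketch (a hypothetical simplification of
      the traversal, not the exact kernel code):

       unsigned long pfn;

       /* Stop short of MAX_DOMAIN_PAGES: the inclusive '<=' variant
        * walks one entry too far and trips the BUG_ON above. */
       for (pfn = 0; pfn < MAX_DOMAIN_PAGES; pfn += P2M_PER_PAGE * P2M_MID_PER_PAGE)
               check_mid_leafs(pfn);   /* hypothetical helper */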
      
      Fixes-Oracle-Bug: 14570662
      CC: stable@vger.kernel.org # only for v3.5
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      50e90041
  6. 23 Aug 2012, 15 commits
    • K
      xen/mmu: If the revector fails, don't attempt to revector anything else. · 32873187
      Konrad Rzeszutek Wilk committed
      If the P2M revectoring fails, we would try to continue by cleaning
      up the PMDs for the L1 (PTE) page-tables. But xen_cleanhighmap is
      greedy and erases the PMD on both boundaries. Since the P2M array
      can share the PMD, we would wipe out part of the __ka that is
      still used in the P2M tree to point to P2M leafs.
      
      This fixes it by bypassing the revectoring and continuing on.
      If the revector fails, a nice WARN is printed so we can still
      troubleshoot this.
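
      A sketch of the intended control flow (assumed shape; 'addr' and
      'size' covering the old array are illustrative):

       new_mfn_list = xen_revector_p2m_tree();
       if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
               /* Revector succeeded: the P2M no longer references the
                * old __ka range, so tearing down those PMDs is safe. */
               xen_cleanhighmap(addr, addr + size);
       } else
               WARN(1, "P2M revector failed; skipping __ka cleanup\n");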
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      32873187
    • K
      xen/p2m: When revectoring deal with holes in the P2M array. · 3fc509fc
      Konrad Rzeszutek Wilk committed
      When we free the PFNs and then subsequently populate them back
      during bootup:
      
      Freeing 20000-20200 pfn range: 512 pages freed
      1-1 mapping on 20000->20200
      Freeing 40000-40200 pfn range: 512 pages freed
      1-1 mapping on 40000->40200
      Freeing bad80-badf4 pfn range: 116 pages freed
      1-1 mapping on bad80->badf4
      Freeing badf6-bae7f pfn range: 137 pages freed
      1-1 mapping on badf6->bae7f
      Freeing bb000-100000 pfn range: 282624 pages freed
      1-1 mapping on bb000->100000
      Released 283999 pages of unused memory
      Set 283999 page(s) to 1-1 mapping
      Populating 1acb8a-1f20e9 pfn range: 283999 pages added
      
      We end up having the P2M array (that is, the one that was
      grafted onto the P2M tree) filled with IDENTITY_FRAME or
      INVALID_P2M_ENTRY entries. The patch titled
      
      "xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID."
      
      recycles said slots: it replaces the P2M tree leafs that pointed at
      &mfn_list[xx] with p2m_identity or p2m_missing, and re-uses those
      P2M array sections for other P2M tree leafs.
      For the above mentioned bootup excerpt, the PFNs at
      0x20000->0x20200 are going to be IDENTITY based:
      
      P2M[0][256][0] -> P2M[0][257][0] get turned into IDENTITY_FRAME.
      
      We can re-use that and make P2M[0][256] point to p2m_identity.
      The "old" page (the grafted P2M array provided by Xen) that was at
      P2M[0][256] gets put somewhere else. Specifically at P2M[6][358],
      because when we populate back:
      
      Populating 1acb8a-1f20e9 pfn range: 283999 pages added
      
      we fill P2M[6][358][0] (and P2M[6][358], P2M[6][359], ...) with
      the new MFNs.
      
      That is all OK, except that when we revector we assume the PFN
      count is the same in the grafted P2M array and in the newly
      allocated one. Since that is no longer the case (we have holes in
      the P2M that point to p2m_missing or p2m_identity), we have to
      take that into account.
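
      A sketch of the extra accounting (assumed shape): while walking the
      tree during revectoring, slots that no longer point into the grafted
      array must be skipped rather than counted.

       /* Hole: this slot points at the shared p2m_missing or
        * p2m_identity page, not at a leaf inside the old array. */
       if (p2m_top[topidx][mididx] == p2m_missing ||
           p2m_top[topidx][mididx] == p2m_identity)
               continue;       /* nothing to copy, no new-array slot used */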
      
      [v2: Check for overflow]
      [v3: Move within the __va check]
      [v4: Fix the computation]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3fc509fc
    • K
      xen/mmu: Release just the MFN list, not MFN list and part of pagetables. · 785f6231
      Konrad Rzeszutek Wilk committed
      We call memblock_reserve for [start of mfn list] -> [PMD-aligned end
      of mfn list] instead of [start of mfn list] -> [page-aligned end of
      mfn list].
      
      This has the disastrous effect that if at bootup the end of mfn_list
      is not PMD aligned, we end up handing back to memblock parts of the
      region past the mfn_list array. Those parts are the PTE tables, with
      the result that we see this at bootup:
      
      Write protecting the kernel read-only data: 10240k
      Freeing unused kernel memory: 1860k freed
      Freeing unused kernel memory: 200k freed
      (XEN) mm.c:2429:d0 Bad type (saw 1400000000000002 != exp 7000000000000000) for mfn 116a80 (pfn 14e26)
      ...
      (XEN) mm.c:908:d0 Error getting mfn 116a83 (pfn 14e2a) from L1 entry 8000000116a83067 for l1e_owner=0, pg_owner=0
      (XEN) mm.c:908:d0 Error getting mfn 4040 (pfn 5555555555555555) from L1 entry 0000000004040601 for l1e_owner=0, pg_owner=0
      .. and so on.
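
      The shape of the fix (paraphrased): size the reservation from the
      actual mfn_list length, rounded to a page rather than to a PMD.

       /* Reserve exactly the mfn_list pages, page-aligned, not PMD-aligned: */
       unsigned long size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
       memblock_reserve(__pa(xen_start_info->mfn_list), size);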
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      785f6231
    • K
      xen/mmu: Remove from __ka space PMD entries for pagetables. · 3aca7fbc
      Konrad Rzeszutek Wilk committed
      Please first read the description in "xen/mmu: Copy and revector the
      P2M tree."
      
      At this stage, the __ka address space (which is what the old
      P2M tree was using) is partially disassembled. The cleanup_highmap
      has removed the PMD entries from 0-16MB and anything past _brk_end
      up to the max_pfn_mapped (which is the end of the ramdisk).
      
      The xen_remove_p2m_tree code and the code around it have ripped out
      the __ka for the old P2M array.
      
      Here we continue doing the same to where the Xen page-tables were.
      It is safe to do, as the page-tables are addressed using __va.
      For good measure we delete anything that is within MODULES_VADDR
      up to the end of the PMD.
      
      At this point the __ka only contains PMD entries for the start
      of the kernel up to __brk.
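
      A sketch of the teardown (assumed shape; per the note below, the
      MODULES_VADDR sweep sits under a debug ifdef):

       /* Rip out __ka PMD entries covering the Xen-provided pagetables;
        * they are reachable through __va from here on. */
       xen_cleanhighmap(xen_start_info->pt_base,
                        xen_start_info->pt_base +
                        xen_start_info->nr_pt_frames * PAGE_SIZE);
       #ifdef DEBUG
       xen_cleanhighmap(MODULES_VADDR, roundup(MODULES_VADDR, PMD_SIZE));
       #endif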
      
      [v1: Per Stefano's suggestion wrapped the MODULES_VADDR in debug]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3aca7fbc
    • K
      xen/mmu: Copy and revector the P2M tree. · 7f914062
      Konrad Rzeszutek Wilk committed
      Please first read the description in "xen/p2m: Add logic to revector a
      P2M tree to use __va leafs" patch.
      
      The 'xen_revector_p2m_tree()' function allocates a new P2M tree,
      copies the contents of the old one into it, and returns the new one.
      
      At this stage, the __ka address space (which is what the old
      P2M tree was using) is partially disassembled. The cleanup_highmap
      has removed the PMD entries from 0-16MB and anything past _brk_end
      up to the max_pfn_mapped (which is the end of the ramdisk).
      
      We have revectored the P2M tree (and the one for save/restore as
      well) to use new __va addresses pointing at the new MFNs. The
      xen_start_info has been taken care of already in
      'xen_setup_kernel_pagetable()' and xen_start_info->shared_info in
      'xen_setup_shared_info()', so we are free to roam and delete PMD
      entries, which is exactly what we are going to do. We rip out the
      __ka for the old P2M array.
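
      A sketch of the rip-out step (assumed shape; the bounds are
      illustrative):

       /* The tree now uses __va leafs, so the __ka PMDs that covered
        * the Xen-provided mfn_list can go. */
       unsigned long size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
       xen_cleanhighmap(xen_start_info->mfn_list,
                        xen_start_info->mfn_list + size);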
      
      [v1: Fix smatch warnings]
      [v2: memset was doing 0 instead of 0xff]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      7f914062
    • K
      xen/p2m: Add logic to revector a P2M tree to use __va leafs. · 357a3cfb
      Konrad Rzeszutek Wilk committed
      During bootup Xen supplies us with a P2M array. It sticks
      it right after the ramdisk, as can be seen with a 128GB PV guest:
      
      (certain parts removed for clarity):
      xc_dom_build_image: called
      xc_dom_alloc_segment:   kernel       : 0xffffffff81000000 -> 0xffffffff81e43000  (pfn 0x1000 + 0xe43 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
      xc_dom_alloc_segment:   ramdisk      : 0xffffffff81e43000 -> 0xffffffff925c7000  (pfn 0x1e43 + 0x10784 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
      xc_dom_alloc_segment:   phys2mach    : 0xffffffff925c7000 -> 0xffffffffa25c7000  (pfn 0x125c7 + 0x10000 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000
      xc_dom_alloc_page   :   start info   : 0xffffffffa25c7000 (pfn 0x225c7)
      xc_dom_alloc_page   :   xenstore     : 0xffffffffa25c8000 (pfn 0x225c8)
      xc_dom_alloc_page   :   console      : 0xffffffffa25c9000 (pfn 0x225c9)
      nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
      nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
      nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
      nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s)
      xc_dom_alloc_segment:   page tables  : 0xffffffffa25ca000 -> 0xffffffffa26e1000  (pfn 0x225ca + 0x117 pages)
      xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
      xc_dom_alloc_page   :   boot stack   : 0xffffffffa26e1000 (pfn 0x226e1)
      xc_dom_build_image  : virt_alloc_end : 0xffffffffa26e2000
      xc_dom_build_image  : virt_pgtab_end : 0xffffffffa2800000
      
      So the physical memory and virtual (using __START_KERNEL_map
      addresses) layout look as follows:
      
        phys                             __ka
      /------------\                   /-------------------\
      | 0          | empty             | 0xffffffff80000000|
      | ..         |                   | ..                |
      | 16MB       | <= kernel starts  | 0xffffffff81000000|
      | ..         |                   |                   |
      | 30MB       | <= kernel ends => | 0xffffffff81e43000|
      | ..         |  & ramdisk starts | ..                |
      | 293MB      | <= ramdisk ends=> | 0xffffffff925c7000|
      | ..         |  & P2M starts     | ..                |
      | ..         |                   | ..                |
      | 549MB      | <= P2M ends    => | 0xffffffffa25c7000|
      | ..         | start_info        | 0xffffffffa25c7000|
      | ..         | xenstore          | 0xffffffffa25c8000|
      | ..         | console           | 0xffffffffa25c9000|
      | 549MB      | <= page tables => | 0xffffffffa25ca000|
      | ..         |                   |                   |
      | 550MB      | <= PGT end     => | 0xffffffffa26e1000|
      | ..         | boot stack        |                   |
      \------------/                   \-------------------/
      
      As can be seen, the ramdisk, P2M and pagetables take up a bit of
      the __ka address space. That is a problem, since MODULES_VADDR
      starts at 0xffffffffa0000000 and the P2M sits right in there!
      During bootup this results in the inability to load modules, with
      this error:
      
      ------------[ cut here ]------------
      WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370()
      Call Trace:
       [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0
       [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
       [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20
       [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370
       [<ffffffff81130c4d>] map_vm_area+0x2d/0x50
       [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250
       [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c6186>] ? load_module+0x66/0x19c0
       [<ffffffff8105cadc>] module_alloc+0x5c/0x60
       [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80
       [<ffffffff810c70c3>] load_module+0xfa3/0x19c0
       [<ffffffff812491f6>] ? security_file_permission+0x86/0x90
       [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220
       [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b
      ---[ end trace fd8f7704fdea0291 ]---
      vmalloc: allocation failure, allocated 16384 of 20480 bytes
      modprobe: page allocation failure: order:0, mode:0xd2
      
      Since __va and __ka are 1:1 up to MODULES_VADDR, and
      cleanup_highmap rids __ka of the ramdisk mapping, what we want to
      do is similar: get rid of the P2M in the __ka address space. There
      are two ways of fixing this:
      
       1) All P2M lookups instead of using the __ka address would
          use the __va address. This means we can safely erase from
          __ka space the PMD pointers that point to the PFNs for
          P2M array and be OK.
       2) Allocate a new array, copy the existing P2M into it,
          revector the P2M tree to use that, and return the old
          P2M to the memory allocator. This has the advantage that
          it sets the stage for using XEN_ELF_NOTE_INIT_P2M
          feature. That feature allows us to set the exact virtual
          address space we want for the P2M - and allows us to
          boot as initial domain on large machines.
      
      So we pick option 2).
      
      This patch only lays the groundwork in the P2M code. The patch
      that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."
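
      A sketch of option 2)'s core step (assumed shape; 'new_leaf' is an
      illustrative name):

       /* Copy each leaf out of the grafted (__ka) array into freshly
        * allocated memory, then repoint the middle directory at the
        * copy, which is addressed via __va. */
       copy_page(new_leaf, p2m_top[topidx][mididx]);
       p2m_top[topidx][mididx] = new_leaf;     /* now a __va pointer */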
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      357a3cfb
    • K
      xen/mmu: Recycle the Xen provided L4, L3, and L2 pages · 488f046d
      Konrad Rzeszutek Wilk committed
      We are not using them; we end up using only the L1 pagetables
      and grafting those onto our page-tables.
      
      [v1: Per Stefano's suggestion squashed two commits]
      [v2: Per Stefano's suggestion simplified loop]
      [v3: Fix smatch warnings]
      [v4: Add more comments]
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      488f046d
    • K
      xen/mmu: For 64-bit do not call xen_map_identity_early · caaf9ecf
      Konrad Rzeszutek Wilk committed
      Because we do not need it. During startup Xen provides us with all
      the initial memory mappings that we need to function.
      
      The initial mapping extends up to the bootstrap stack, which means
      we can reference using __ka up to 4.f):
      
      (from xen/interface/xen.h):
      
       4. This is the order of bootstrap elements in the initial virtual region:
         a. relocated kernel image
         b. initial ram disk              [mod_start, mod_len]
         c. list of allocated page frames [mfn_list, nr_pages]
         d. start_info_t structure        [register ESI (x86)]
         e. bootstrap page tables         [pt_base, CR3 (x86)]
         f. bootstrap stack               [register ESP (x86)]
      
      (initial ram disk may be omitted).
      
      [v1: More comments in git commit]
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      caaf9ecf
    • K
      xen/mmu: use copy_page instead of memcpy. · ae895ed7
      Konrad Rzeszutek Wilk committed
      After all, this is what it is there for.
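
      The pattern, sketched with illustrative names:

       - memcpy(dst_pgt, src_pgt, PAGE_SIZE);  /* open-coded page copy */
       + copy_page(dst_pgt, src_pgt);          /* same thing, optimized */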
      Acked-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ae895ed7
    • K
      xen/mmu: Provide comments describing the _ka and _va aliasing issue · 4fac153a
      Konrad Rzeszutek Wilk committed
      Namely, that the level2_kernel_pgt (__ka virtual addresses) and
      level2_ident_pgt (__va virtual addresses) contain the same PMD
      entries. So if you modify a PTE in __ka, it will be reflected
      in __va (and vice versa).
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      4fac153a
    • K
      xen/mmu: The xen_setup_kernel_pagetable doesn't need to return anything. · 3699aad0
      Konrad Rzeszutek Wilk committed
      We don't need to return the new PGD, as we do not use it.
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3699aad0
    • K
      Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and... · 51faaf2b
      Konrad Rzeszutek Wilk committed
      Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas."
      
      This reverts commit 806c312e and
      commit 59b29440.
      
      And also documents setup.c and why we want to do it that way, which
      is that we tried to make the memblock_reserve more selective so
      that it would be clear what region is reserved. Sadly we ran
      into the problem wherein on a 64-bit hypervisor with a 32-bit
      initial domain, the pt_base has the cr3 value, which is not
      necessarily where the pagetable starts! As Jan put it: "
      Actually, the adjustment turns out to be correct: The page
      tables for a 32-on-64 dom0 get allocated in the order "first L1",
      "first L2", "first L3", so the offset to the page table base is
      indeed 2. When reading xen/include/public/xen.h's comment
      very strictly, this is not a violation (since there nothing is said
      that the first thing in the page table space is pointed to by
      pt_base; I admit that this seems to be implied though, namely
      do I think that it is implied that the page table space is the
      range [pt_base, pt_base + nt_pt_frames), whereas that
      range here indeed is [pt_base - 2, pt_base - 2 + nt_pt_frames),
      which - without a priori knowledge - the kernel would have
      difficulty to figure out)." - so let's just fall back to the
      easy way and reserve the whole region.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      51faaf2b
    • K
      xen/swiotlb: Fix compile warnings when using plain integer instead of NULL pointer. · 6d7083ee
      Konrad Rzeszutek Wilk committed
      arch/x86/xen/pci-swiotlb-xen.c:96:1: warning: Using plain integer as NULL pointer
      arch/x86/xen/pci-swiotlb-xen.c:96:1: warning: Using plain integer as NULL pointer
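
      The warning pattern and its fix, sketched generically (the field
      name here is illustrative, not the actual line in the file):

       - .some_callback = 0,    /* sparse: plain integer as NULL pointer */
       + .some_callback = NULL,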
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      6d7083ee
    • S
      xen: allow privcmd for HVM guests · 1a1d4331
      Stefano Stabellini committed
      This patch removes the "return -ENOSYS" for auto_translated_physmap
      guests from privcmd_mmap, thus allowing ARM guests to issue privcmd
      mmap calls. However, privcmd mmap calls are still going to fail for
      HVM and hybrid guests on x86 because the xen_remap_domain_mfn_range
      implementation is currently PV-only.
      
      Changes in v2:
      
      - better commit message;
      - return -EINVAL from xen_remap_domain_mfn_range if
        auto_translated_physmap (see the sketch below).
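
      A sketch of that v2 guard (assumed shape):

       /* in xen_remap_domain_mfn_range(): */
       if (xen_feature(XENFEAT_auto_translated_physmap))
               return -EINVAL;         /* PV-only implementation for now */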
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      1a1d4331
    • K
      xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M. · c96aae1f
      Konrad Rzeszutek Wilk committed
      When we have finished returning PFNs to the hypervisor, populating
      them back, and also marking the E820 MMIO regions and E820 gaps
      as IDENTITY_FRAMEs, we call P2M to set the areas that can
      be used for ballooning. We were off by one, and ended up
      overwriting a P2M entry that most likely was an IDENTITY_FRAME.
      For example:
      
      1-1 mapping on 40000->40200
      1-1 mapping on bc558->bc5ac
      1-1 mapping on bc5b4->bc8c5
      1-1 mapping on bc8c6->bcb7c
      1-1 mapping on bcd00->100000
      Released 614 pages of unused memory
      Set 277889 page(s) to 1-1 mapping
      Populating 40200-40466 pfn range: 614 pages added
      
      => here we set the P2M tree from 40466 up to bc559 to
      INVALID_P2M_ENTRY. We should have gone only up to bc558.
      
      The end result is that if anybody tries to construct a PTE for
      PFN bc558 they end up with ~PAGE_PRESENT.
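
      A sketch of the corrected bound (assumed shape): the balloon range
      is half-open, so the walk must use '<', not '<='.

       /* Mark [start_pfn, end_pfn) as ballooned-out; the identity
        * entry at end_pfn (bc558 above) is left alone. */
       for (pfn = start_pfn; pfn < end_pfn; pfn++)
               set_phys_to_machine(pfn, INVALID_P2M_ENTRY);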
      
      CC: stable@vger.kernel.org
      Reported-by-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c96aae1f
  7. 22 Aug 2012, 6 commits
  8. 17 Aug 2012, 2 commits
  9. 02 Aug 2012, 1 commit
    • K
      xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back. · 5bc6f988
      Konrad Rzeszutek Wilk committed
      When we release pages back during bootup:
      
      Freeing  9d-100 pfn range: 99 pages freed
      Freeing  9cf36-9d0d2 pfn range: 412 pages freed
      Freeing  9f6bd-9f6bf pfn range: 2 pages freed
      Freeing  9f714-9f7bf pfn range: 171 pages freed
      Freeing  9f7e0-9f7ff pfn range: 31 pages freed
      Freeing  9f800-100000 pfn range: 395264 pages freed
      Released 395979 pages of unused memory
      
      We then try to populate those pages back. In the P2M tree, however,
      the space for those leafs must be reserved; for that we use
      extend_brk. We reserve 8MB of _brk space, which means we can fit
      1048576 PFNs - more than we should ever need.
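
      A sketch of the reservation (the name is illustrative; RESERVE_BRK
      is the kernel's macro): four PMD-sized chunks = 4 * 2MB = 8MB, and
      8MB / 8 bytes per entry = 1048576 PFN slots.

       /* 8MB of _brk space for P2M leafs used when populating back. */
       RESERVE_BRK(p2m_populated, PMD_SIZE * 4);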
      
      Without this, on certain compilations of the kernel we would hit:
      
      (XEN) domain_crash_sync called from entry.S
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff818aad3b>]
      (XEN) RFLAGS: 0000000000000206   EM: 1   CONTEXT: pv guest
      (XEN) rax: ffffffff81a7c000   rbx: 000000000000003d   rcx: 0000000000001000
      (XEN) rdx: ffffffff81a7b000   rsi: 0000000000001000   rdi: 0000000000001000
      (XEN) rbp: ffffffff81801cd8   rsp: ffffffff81801c98   r8:  0000000000100000
      (XEN) r9:  ffffffff81a7a000   r10: 0000000000000001   r11: 0000000000000003
      (XEN) r12: 0000000000000004   r13: 0000000000000004   r14: 000000000000003d
      (XEN) r15: 00000000000001e8   cr0: 000000008005003b   cr4: 00000000000006f0
      (XEN) cr3: 0000000125803000   cr2: 0000000000000000
      (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
      (XEN) Guest stack trace from rsp=ffffffff81801c98:
      
      .. which is extend_brk hitting a BUG_ON.
      
      Interestingly enough, most of the time we are not going to hit this,
      because the _brk space is quite large (v3.5):
       ffffffff81a25000 B __brk_base
       ffffffff81e43000 B __brk_limit
      = ~4MB.
      
      vs earlier kernels (with this back-ported), where the space is smaller:
       ffffffff81a25000 B __brk_base
       ffffffff81a7b000 B __brk_limit
      = 344 kBytes.
      
      where we would certainly hit this and trip extend_brk's BUG_ON.
      
      Note that git commit c3d93f88
      (xen: populate correct number of pages when across mem boundary (v2))
      exposed this bug.
      
      [v1: Made it 8MB of _brk space instead of 4MB per Jan's suggestion]
      
      CC: stable@vger.kernel.org #only for 3.5
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      5bc6f988
  10. 31 Jul 2012, 1 commit
  11. 20 Jul 2012, 5 commits
    • Z
      xen: populate correct number of pages when across mem boundary (v2) · c3d93f88
      zhenzhong.duan committed
      When populating pages across a memory boundary at bootup, the
      populated page count isn't correct. This is because pages populated
      into a non-memory region are ignored.
      
      The PFN range is also wrongly aligned when the memory boundary isn't
      page aligned.
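
      A sketch of the alignment fix (assumed shape; 'entry' stands for an
      E820 RAM range): round the range inward, so a boundary that is not
      page aligned cannot push the populate past the region.

       unsigned long start_pfn = PFN_UP(entry->addr);                 /* round up   */
       unsigned long end_pfn   = PFN_DOWN(entry->addr + entry->size); /* round down */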
      
      For a dom0 booted with dom_mem=3368952K (0xcd9ff000-4k), the dmesg diff is:
       [    0.000000] Freeing 9e-100 pfn range: 98 pages freed
       [    0.000000] 1-1 mapping on 9e->100
       [    0.000000] 1-1 mapping on cd9ff->100000
       [    0.000000] Released 98 pages of unused memory
       [    0.000000] Set 206435 page(s) to 1-1 mapping
      -[    0.000000] Populating cd9fe-cda00 pfn range: 1 pages added
      +[    0.000000] Populating cd9fe-cd9ff pfn range: 1 pages added
      +[    0.000000] Populating 100000-100061 pfn range: 97 pages added
       [    0.000000] BIOS-provided physical RAM map:
       [    0.000000] Xen: 0000000000000000 - 000000000009e000 (usable)
       [    0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
       [    0.000000] Xen: 0000000000100000 - 00000000cd9ff000 (usable)
       [    0.000000] Xen: 00000000cd9ffc00 - 00000000cda53c00 (ACPI NVS)
      ...
       [    0.000000] Xen: 0000000100000000 - 0000000100061000 (usable)
       [    0.000000] Xen: 0000000100061000 - 000000012c000000 (unusable)
      ...
       [    0.000000] MEMBLOCK configuration:
      ...
      -[    0.000000]  reserved[0x4]       [0x000000cd9ff000-0x000000cd9ffbff], 0xc00 bytes
      -[    0.000000]  reserved[0x5]       [0x00000100000000-0x00000100060fff], 0x61000 bytes
      
      Related xen memory layout:
      (XEN) Xen-e820 RAM map:
      (XEN)  0000000000000000 - 000000000009ec00 (usable)
      (XEN)  00000000000f0000 - 0000000000100000 (reserved)
      (XEN)  0000000000100000 - 00000000cd9ffc00 (usable)
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      [v2: If xen_do_chunk fails to populate, abort this chunk and any others]
      Suggested by David, thanks.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c3d93f88
    • O
      xen PVonHVM: move shared_info to MMIO before kexec · 00e37bdb
      Olaf Hering committed
      Currently kexec in a PVonHVM guest fails with a triple fault because the
      new kernel overwrites the shared info page. The exact failure depends on
      the size of the kernel image. This patch moves the pfn from RAM into
      MMIO space before the kexec boot.
      
      The pfn containing the shared_info is located somewhere in RAM. This
      will cause trouble if the current kernel is doing a kexec boot into a
      new kernel. The new kernel (and its startup code) cannot know where
      the pfn is, so it cannot reserve the page. The hypervisor will
      continue to update the pfn, and as a result memory corruption occurs
      in the new kernel.
      
      One way to work around this issue is to allocate a page in the
      xen-platform pci device's BAR memory range. But pci init is done very
      late and the shared_info page is already in use very early to read the
      pvclock. So moving the pfn from RAM to MMIO is racy because some code
      paths on other vcpus could access the pfn during the small window
      when the old pfn is moved to the new pfn. There is even a small
      window where the old pfn is not backed by an mfn, and during that
      time all reads return -1.
      
      Because it is not known upfront where the MMIO region is located,
      it cannot be used right from the start in xen_hvm_init_shared_info.
      
      To minimise trouble the move of the pfn is done shortly before kexec.
      This does not eliminate the race, because all vcpus are still online
      when the syscore_ops will be called. But hopefully there is no work
      pending at this point in time. Also, the syscore_op is run last,
      which reduces the risk further.
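
      A sketch of the hook (assumed shape; the function name is
      illustrative):

       static struct syscore_ops xen_hvm_syscore_ops = {
               /* syscore shutdown runs last, just before the kexec jump */
               .shutdown = xen_hvm_shutdown,
       };
       register_syscore_ops(&xen_hvm_syscore_ops);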
      Signed-off-by: Olaf Hering <olaf@aepfle.de>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      00e37bdb
    • O
      xen: simplify init_hvm_pv_info · 4ff2d062
      Olaf Hering committed
      init_hvm_pv_info is called only in a PVonHVM context; move it into
      the ifdef. init_hvm_pv_info does not fail; make it a void function.
      Remove the arguments from init_hvm_pv_info, because they are not
      used by the caller.
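
      The resulting signature change, sketched (the old argument list is
      paraphrased):

       -static int init_hvm_pv_info(int *major, int *minor)
       +static void init_hvm_pv_info(void)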
      Signed-off-by: Olaf Hering <olaf@aepfle.de>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      4ff2d062
    • O
      xen: remove cast from HYPERVISOR_shared_info assignment · 4648da7c
      Olaf Hering committed
      Both have type struct shared_info so no cast is needed.
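
      Sketched (the operand is assumed to be the dummy shared_info used
      before the real one is mapped):

       - HYPERVISOR_shared_info = (struct shared_info *)&xen_dummy_shared_info;
       + HYPERVISOR_shared_info = &xen_dummy_shared_info;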
      Signed-off-by: Olaf Hering <olaf@aepfle.de>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      4648da7c
    • D
      xen/x86: avoid updating TLS descriptors if they haven't changed · 1c32cdc6
      David Vrabel committed
      When switching tasks in a Xen PV guest, avoid updating the TLS
      descriptors if they haven't changed.  This improves the speed of
      context switches by almost 10% as much of the time the descriptors are
      the same or only one is different.
      
      The descriptors written into the GDT by Xen are modified from the
      values passed in the update_descriptor hypercall so we keep shadow
      copies of the three TLS descriptors to compare against.
      
      lmbench3 test     Before  After  Improvement
      --------------------------------------------
      lat_ctx -s 32 24   7.19    6.52  9%
      lat_pipe          12.56   11.66  7%
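
      A sketch of the shadow-compare (assumed shape; names are
      illustrative):

       struct tls_descs {
               struct desc_struct desc[3];
       };
       static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);

       static void load_TLS_descriptor(struct thread_struct *t,
                                       unsigned int cpu, unsigned int i)
       {
               struct desc_struct *shadow = &per_cpu(shadow_tls_desc, cpu).desc[i];

               /* Skip the update_descriptor hypercall when the GDT entry
                * would not change. */
               if (!memcmp(shadow, &t->tls_array[i], sizeof(*shadow)))
                       return;
               *shadow = t->tls_array[i];
               /* ... issue the update_descriptor multicall as before ... */
       }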
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      1c32cdc6