1. 21 5月, 2011 3 次提交
    • J
      xen: condense everything onto xen_set_pte · 4a35c13c
      Jeremy Fitzhardinge 提交于
      xen_set_pte_at and xen_clear_pte are essentially identical to
      xen_set_pte, so just make them all common.
      
      When batched set_pte and pte_clear are the same, but the unbatch operation
      must be different: they need to update the two halves of the pte in
      different order.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      4a35c13c
    • J
      xen: use mmu_update for xen_set_pte_at() · a99ac5e8
      Jeremy Fitzhardinge 提交于
      In principle update_va_mapping is a good match for set_pte_at, since
      it gets the address being mapped, which allows Xen to use its linear
      pagetable mapping.
      
      However that assumes that the pmd for the address is attached to the
      current pagetable, which may not be true for a given user address space
      because the kernel pmd is not shared (at least on 32-bit guests).
      Normally the kernel will automatically sync a missing part of the
      pagetable with the init_mm pagetable transparently via faults, but that
      fails when a missing address is passed to Xen.
      
      And while the linear pagetable mapping is very useful for 32-bit Xen
      (as it avoids an explicit domain mapping), 32-bit Xen is deprecated.
      64-bit Xen has all memory mapped all the time, so it makes no real
      difference.
      
      The upshot is that we should use mmu_update, since it can operate on
      non-current pagetables or detached pagetables.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      a99ac5e8
    • J
      xen: drop all the special iomap pte paths. · 331468b1
      Jeremy Fitzhardinge 提交于
      Xen can work out when we're doing IO mappings for itself, so we don't
      need to do anything special, and the extra tests just clog things up.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      331468b1
  2. 13 5月, 2011 2 次提交
    • S
      x86,xen: introduce x86_init.mapping.pagetable_reserve · 279b706b
      Stefano Stabellini 提交于
      Introduce a new x86_init hook called pagetable_reserve that at the end
      of init_memory_mapping is used to reserve a range of memory addresses for
      the kernel pagetable pages we used and free the other ones.
      
      On native it just calls memblock_x86_reserve_range while on xen it also
      takes care of setting the spare memory previously allocated
      for kernel pagetable pages from RO to RW, so that it can be used for
      other purposes.
      
      A detailed explanation of the reason why this hook is needed follows.
      
      As a consequence of the commit:
      
      commit 4b239f45
      Author: Yinghai Lu <yinghai@kernel.org>
      Date:   Fri Dec 17 16:58:28 2010 -0800
      
          x86-64, mm: Put early page table high
      
      at some point init_memory_mapping is going to reach the pagetable pages
      area and map those pages too (mapping them as normal memory that falls
      in the range of addresses passed to init_memory_mapping as argument).
      Some of those pages are already pagetable pages (they are in the range
      pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
      everything is fine.
      Some of these pages are not pagetable pages yet (they fall in the range
      pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
      are going to be mapped RW.  When these pages become pagetable pages and
      are hooked into the pagetable, xen will find that the guest has already
      a RW mapping of them somewhere and fail the operation.
      The reason Xen requires pagetables to be RO is that the hypervisor needs
      to verify that the pagetables are valid before using them. The validation
      operations are called "pinning" (more details in arch/x86/xen/mmu.c).
      
      In order to fix the issue we mark all the pages in the entire range
      pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
      is completed only the range pgt_buf_start-pgt_buf_end is reserved by
      init_memory_mapping. Hence the kernel is going to crash as soon as one
      of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
      ranges are RO).
      
      For this reason we need a hook to reserve the kernel pagetable pages we
      used and free the other ones so that they can be reused for other
      purposes.
      On native it just means calling memblock_x86_reserve_range, on Xen it
      also means marking RW the pagetable pages that we allocated before but
      that haven't been used before.
      
      Another way to fix this is without using the hook is by adding a 'if
      (xen_pv_domain)' in the 'init_memory_mapping' code and calling the Xen
      counterpart, but that is just nasty.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      279b706b
    • K
      Revert "xen/mmu: Add workaround "x86-64, mm: Put early page table high"" · 92bdaef7
      Konrad Rzeszutek Wilk 提交于
      This reverts commit a3864783.
      
      It does not work with certain AMD machines.
      
      last_pfn = 0x100000 max_arch_pfn = 0x400000000
      initial memory mapped : 0 - 02c3a000
      Base memory trampoline at [ffff88000009b000] 9b000 size 20480
      init_memory_mapping: 0000000000000000-0000000100000000
       0000000000 - 0100000000 page 4k
      kernel direct mapping tables up to 100000000 @ ff7fb000-100000000
      init_memory_mapping: 0000000100000000-00000001e0800000
       0100000000 - 01e0800000 page 4k
      kernel direct mapping tables up to 1e0800000 @ 1df0f3000-1e0000000
      xen: setting RW the range fffdc000 - 100000000
      RAMDISK: 0203b000 - 02c3a000
      No NUMA configuration found
      Faking a node at 0000000000000000-00000001e0800000
      NUMA: Using 63 for the hash shift.
      Initmem setup node 0 0000000000000000-00000001e0800000
        NODE_DATA [00000001dfffb000 - 00000001dfffffff]
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
      PGD 0
      Oops: 0003 [#1] SMP
      last sysfs file:
      CPU 0
      Modules linked in:
      
      Pid: 0, comm: swapper Not tainted 2.6.39-0-virtual #6~smb1
      RIP: e030:[<ffffffff81cf6a75>]  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
      RSP: e02b:ffffffff81c01e38  EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 00000001e0800000 RCX: 0000000000001040
      RDX: 0000000000004100 RSI: 0000000000000000 RDI: ffff8801dfffb000
      RBP: ffffffff81c01e58 R08: 0000000000000020 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000bfe400
      FS:  0000000000000000(0000) GS:ffffffff81cca000(0000) knlGS:0000000000000000
      CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000001c03000 CR4: 0000000000000660
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0b020)
      Stack:
       0000000000000040 0000000000000001 0000000000000000 ffffffffffffffff
       ffffffff81c01e88 ffffffff81cf6c25 0000000000000000 0000000000000000
       ffffffff81cf687f 0000000000000000 ffffffff81c01ea8 ffffffff81cf6e45
      Call Trace:
       [<ffffffff81cf6c25>] numa_register_memblks.constprop.3+0x150/0x181
       [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
       [<ffffffff81cf6e45>] numa_init.part.2+0x1c/0x7c
       [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
       [<ffffffff81cf6f67>] numa_init+0x6c/0x70
       [<ffffffff81cf7057>] initmem_init+0x39/0x3b
       [<ffffffff81ce5865>] setup_arch+0x64e/0x769
       [<ffffffff815e43c1>] ? printk+0x51/0x53
       [<ffffffff81cdf92b>] start_kernel+0xd4/0x3f3
       [<ffffffff81cdf388>] x86_64_start_reservations+0x132/0x136
       [<ffffffff81ce2ed4>] xen_start_kernel+0x588/0x58f
      Code: 41 00 00 48 8b 3c c5 a0 24 cc 81 31 c0 40 f6 c7 01 74 05 aa 66 ba ff 40 40 f6 c7 02 74 05 66 ab 83 ea 02 89 d1 c1 e9 02 f6 c2 02 <f3> ab 74 02 66 ab 80 e2 01 74 01 aa 49 63 c4 48 c1 eb 0c 44 89
      RIP  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
       RSP <ffffffff81c01e38>
      CR2: 0000000000000000
      ---[ end trace a7919e7f17c0a725 ]---
      Kernel panic - not syncing: Attempted to kill the idle task!
      Pid: 0, comm: swapper Tainted: G      D     2.6.39-0-virtual #6~smb1
      Reported-by: NStefan Bader <stefan.bader@canonical.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      92bdaef7
  3. 03 5月, 2011 2 次提交
    • S
      xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top · b9269dc7
      Stefano Stabellini 提交于
      mask_rw_pte is currently checking if a pfn is a pagetable page if it
      falls in the range pgt_buf_start - pgt_buf_end but that is incorrect
      because pgt_buf_end is a moving target: pgt_buf_top is the real
      boundary.
      Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b9269dc7
    • K
      xen/mmu: Add workaround "x86-64, mm: Put early page table high" · a3864783
      Konrad Rzeszutek Wilk 提交于
      As a consequence of the commit:
      
      commit 4b239f45
      Author: Yinghai Lu <yinghai@kernel.org>
      Date:   Fri Dec 17 16:58:28 2010 -0800
      
          x86-64, mm: Put early page table high
      
      it causes the Linux kernel to crash under Xen:
      
      mapping kernel into physical memory
      Xen: setup ISA identity maps
      about to get started...
      (XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7)
      (XEN) mm.c:3027:d0 Error while pinning mfn b1d89
      (XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      ...
      
      The reason is that at some point init_memory_mapping is going to reach
      the pagetable pages area and map those pages too (mapping them as normal
      memory that falls in the range of addresses passed to init_memory_mapping
      as argument). Some of those pages are already pagetable pages (they are
      in the range pgt_buf_start-pgt_buf_end) therefore they are going to be
      mapped RO and everything is fine.
      Some of these pages are not pagetable pages yet (they fall in the range
      pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
      are going to be mapped RW.  When these pages become pagetable pages and
      are hooked into the pagetable, xen will find that the guest has already
      a RW mapping of them somewhere and fail the operation.
      The reason Xen requires pagetables to be RO is that the hypervisor needs
      to verify that the pagetables are valid before using them. The validation
      operations are called "pinning" (more details in arch/x86/xen/mmu.c).
      
      In order to fix the issue we mark all the pages in the entire range
      pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
      is completed only the range pgt_buf_start-pgt_buf_end is reserved by
      init_memory_mapping. Hence the kernel is going to crash as soon as one
      of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
      ranges are RO).
      
      For this reason, this function is introduced which is called _after_
      the init_memory_mapping has completed (in a perfect world we would
      call this function from init_memory_mapping, but lets ignore that).
      
      Because we are called _after_ init_memory_mapping the pgt_buf_[start,
      end,top] have all changed to new values (b/c another init_memory_mapping
      is called). Hence, the first time we enter this function, we save
      away the pgt_buf_start value and update the pgt_buf_[end,top].
      
      When we detect that the "old" pgt_buf_start through pgt_buf_end
      PFNs have been reserved (so memblock_x86_reserve_range has been called),
      we immediately set out to RW the "old" pgt_buf_end through pgt_buf_top.
      
      And then we update those "old" pgt_buf_[end|top] with the new ones
      so that we can redo this on the next pagetable.
      Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
      Reviewed-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      [v1: Updated with Jeremy's comments]
      [v2: Added the crash output]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      a3864783
  4. 20 4月, 2011 2 次提交
  5. 12 4月, 2011 1 次提交
  6. 06 4月, 2011 2 次提交
  7. 05 4月, 2011 1 次提交
  8. 29 3月, 2011 1 次提交
    • R
      xen: fix p2m section mismatches · b83c6e55
      Randy Dunlap 提交于
      Fix section mismatch warnings:
      set_phys_range_identity() is called by __init xen_set_identity(),
      so also mark set_phys_range_identity() as __init.
      then:
      __early_alloc_p2m() is called set_phys_range_identity(), so also mark
      __early_alloc_p2m() as __init.
      
      WARNING: arch/x86/built-in.o(.text+0x7856): Section mismatch in reference from the function __early_alloc_p2m() to the function .init.text:extend_brk()
      The function __early_alloc_p2m() references
      the function __init extend_brk().
      This is often because __early_alloc_p2m lacks a __init
      annotation or the annotation of extend_brk is wrong.
      
      WARNING: arch/x86/built-in.o(.text+0x7967): Section mismatch in reference from the function set_phys_range_identity() to the function .init.text:extend_brk()
      The function set_phys_range_identity() references
      the function __init extend_brk().
      This is often because set_phys_range_identity lacks a __init
      annotation or the annotation of extend_brk is wrong.
      
      [v2: Per Stephen Hemming recommonedation made __early_alloc_p2m static]
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b83c6e55
  9. 24 3月, 2011 1 次提交
  10. 20 3月, 2011 2 次提交
    • S
      xen: update mask_rw_pte after kernel page tables init changes · d8aa5ec3
      Stefano Stabellini 提交于
      After "x86-64, mm: Put early page table high" already existing kernel
      page table pages can be mapped using early_ioremap too so we need to
      update mask_rw_pte to make sure these pages are still mapped RO.
      The reason why we have to do that is explain by the commit message of
      fef5ba79:
      
      "Xen requires that all pages containing pagetable entries to be mapped
      read-only.  If pages used for the initial pagetable are already mapped
      then we can change the mapping to RO.  However, if they are initially
      unmapped, we need to make sure that when they are later mapped, they
      are also mapped RO.
      
      ..SNIP..
      
      the pagetable setup code early_ioremaps the pages to write their
      entries, so we must make sure that mappings created in the early_ioremap
      fixmap area are mapped RW.  (Those mappings are removed before the pages
      are presented to Xen as pagetable pages.)"
      
      We accomplish all this in mask_rw_pte by mapping RO all the pages mapped
      using early_ioremap apart from the last one that has been allocated
      because it is not a page table page yet (it has not been hooked into the
      page tables yet).
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      d8aa5ec3
    • S
      xen: set max_pfn_mapped to the last pfn mapped · 14988a4d
      Stefano Stabellini 提交于
      Do not set max_pfn_mapped to the end of the initial memory mappings,
      that also contain pages that don't belong in pfn space (like the mfn
      list).
      
      Set max_pfn_mapped to the last real pfn mapped in the initial memory
      mappings that is the pfn backing _end.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      14988a4d
  11. 18 3月, 2011 1 次提交
  12. 15 3月, 2011 1 次提交
    • R
      PM: Make CONFIG_PM depend on (CONFIG_PM_SLEEP || CONFIG_PM_RUNTIME) · 1eb208ae
      Rafael J. Wysocki 提交于
      From the users' point of view CONFIG_PM is really only used for
      making it possible to set CONFIG_SUSPEND, CONFIG_HIBERNATION,
      CONFIG_PM_RUNTIME and (surprisingly enough) CONFIG_XEN_SAVE_RESTORE
      (CONFIG_PM_OPP also depends on CONFIG_PM, but quite artificially).
      However, both CONFIG_SUSPEND and CONFIG_HIBERNATION require platform
      support (independent of CONFIG_PM) and it is not quite obvious that
      CONFIG_PM has to be set for CONFIG_XEN_SAVE_RESTORE to be available.
      Thus, from the users' point of view, it would be more logical to
      automatically select CONFIG_PM if any of the above options depending
      on it are set.
      
      Make CONFIG_PM depend on (CONFIG_PM_SLEEP || CONFIG_PM_RUNTIME),
      which will cause it to be selected when any of CONFIG_SUSPEND,
      CONFIG_HIBERNATION, CONFIG_PM_RUNTIME, CONFIG_XEN_SAVE_RESTORE is
      set and will clarify its meaning.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      1eb208ae
  13. 14 3月, 2011 7 次提交
    • D
      xen/balloon: Removal of driver_pages · 06f521d5
      Daniel Kiper 提交于
      Removal of driver_pages (I do not have seen any references to it).
      Signed-off-by: NDaniel Kiper <dkiper@net-space.pl>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      06f521d5
    • K
      xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set. · fc25151d
      Konrad Rzeszutek Wilk 提交于
      Only enabled if XEN_DEBUG is enabled. We print a warning
      when:
      
       pfn_to_mfn(pfn) == pfn, but no VM_IO (_PAGE_IOMAP) flag set
      	(and pfn is an identity mapped pfn)
       pfn_to_mfn(pfn) != pfn, and VM_IO flag is set.
      	(ditto, pfn is an identity mapped pfn)
      
      [v2: Make it dependent on CONFIG_XEN_DEBUG instead of ..DEBUG_FS]
      [v3: Fix compiler warning]
      Reviewed-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      fc25151d
    • K
      xen/debugfs: Add 'p2m' file for printing out the P2M layout. · 2222e71b
      Konrad Rzeszutek Wilk 提交于
      We walk over the whole P2M tree and construct a simplified view of
      which PFN regions belong to what level and what type they are.
      
      Only enabled if CONFIG_XEN_DEBUG_FS is set.
      
      [v2: UNKN->UNKNOWN, use uninitialized_var]
      [v3: Rebased on top of mmu->p2m code split]
      [v4: Fixed the else if]
      Reviewed-by: NIan Campbell <Ian.Campbell@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2222e71b
    • K
      xen/setup: Set identity mapping for non-RAM E820 and E820 gaps. · 68df0da7
      Konrad Rzeszutek Wilk 提交于
      We walk the E820 region and start at 0 (for PV guests we start
      at ISA_END_ADDRESS) and skip any E820 RAM regions. For all other
      regions and as well the gaps we set them to be identity mappings.
      
      The reasons we do not want to set the identity mapping from 0->
      ISA_END_ADDRESS when running as PV is b/c that the kernel would
      try to read DMI information and fail (no permissions to read that).
      There is a lot of gnarly code to deal with that weird region so
      we won't try to do a cleanup in this patch.
      
      This code ends up calling 'set_phys_to_identity' with the start
      and end PFN of the the E820 that are non-RAM or have gaps.
      On 99% of machines that means one big region right underneath the
      4GB mark. Usually starts at 0xc0000 (or 0x80000) and goes to
      0x100000.
      
      [v2: Fix for E820 crossing 1MB region and clamp the start]
      [v3: Squshed in code that does this over ranges]
      [v4: Moved the comment to the correct spot]
      [v5: Use the "raw" E820 from the hypervisor]
      [v6: Added Review-by tag]
      Reviewed-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      68df0da7
    • K
      xen/mmu: WARN_ON when racing to swap middle leaf. · c7617798
      Konrad Rzeszutek Wilk 提交于
      The initial bootup code uses set_phys_to_machine quite a lot, and after
      bootup it would be used by the balloon driver. The balloon driver does have
      mutex lock so this should not be necessary - but just in case, add
      a WARN_ON if we do hit this scenario. If we do fail this, it is OK
      to continue as there is a backup mechanism (VM_IO) that can bypass
      the P2M and still set the _PAGE_IOMAP flags.
      
      [v2: Change from WARN to BUG_ON]
      [v3: Rebased on top of xen->p2m code split]
      [v4: Change from BUG_ON to WARN]
      Reviewed-by: NIan Campbell <Ian.Campbell@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c7617798
    • K
      xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN. · fb38923e
      Konrad Rzeszutek Wilk 提交于
      If we find that the PFN is within the P2M as an identity
      PFN make sure to tack on the _PAGE_IOMAP flag.
      Reviewed-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      fb38923e
    • K
      xen/mmu: Add the notion of identity (1-1) mapping. · f4cec35b
      Konrad Rzeszutek Wilk 提交于
      Our P2M tree structure is a three-level. On the leaf nodes
      we set the Machine Frame Number (MFN) of the PFN. What this means
      is that when one does: pfn_to_mfn(pfn), which is used when creating
      PTE entries, you get the real MFN of the hardware. When Xen sets
      up a guest it initially populates a array which has descending
      (or ascending) MFN values, as so:
      
       idx: 0,  1,       2
       [0x290F, 0x290E, 0x290D, ..]
      
      so pfn_to_mfn(2)==0x290D. If you start, restart many guests that list
      starts looking quite random.
      
      We graft this structure on our P2M tree structure and stick in
      those MFN in the leafs. But for all other leaf entries, or for the top
      root, or middle one, for which there is a void entry, we assume it is
      "missing". So
       pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY.
      
      We add the possibility of setting 1-1 mappings on certain regions, so
      that:
       pfn_to_mfn(0xc0000)=0xc0000
      
      The benefit of this is, that we can assume for non-RAM regions (think
      PCI BARs, or ACPI spaces), we can create mappings easily b/c we
      get the PFN value to match the MFN.
      
      For this to work efficiently we introduce one new page p2m_identity and
      allocate (via reserved_brk) any other pages we need to cover the sides
      (1GB or 4MB boundary violations). All entries in p2m_identity are set to
      INVALID_P2M_ENTRY type (Xen toolstack only recognizes that and MFNs,
      no other fancy value).
      
      On lookup we spot that the entry points to p2m_identity and return the identity
      value instead of dereferencing and returning INVALID_P2M_ENTRY. If the entry
      points to an allocated page, we just proceed as before and return the PFN.
      If the PFN has IDENTITY_FRAME_BIT set we unmask that in appropriate functions
      (pfn_to_mfn).
      
      The reason for having the IDENTITY_FRAME_BIT instead of just returning the
      PFN is that we could find ourselves where pfn_to_mfn(pfn)==pfn for a
      non-identity pfn. To protect ourselves against we elect to set (and get) the
      IDENTITY_FRAME_BIT on all identity mapped PFNs.
      
      This simplistic diagram is used to explain the more subtle piece of code.
      There is also a digram of the P2M at the end that can help.
      Imagine your E820 looking as so:
      
                         1GB                                           2GB
      /-------------------+---------\/----\         /----------\    /---+-----\
      | System RAM        | Sys RAM ||ACPI|         | reserved |    | Sys RAM |
      \-------------------+---------/\----/         \----------/    \---+-----/
                                    ^- 1029MB                       ^- 2001MB
      
      [1029MB = 263424 (0x40500), 2001MB = 512256 (0x7D100), 2048MB = 524288 (0x80000)]
      
      And dom0_mem=max:3GB,1GB is passed in to the guest, meaning memory past 1GB
      is actually not present (would have to kick the balloon driver to put it in).
      
      When we are told to set the PFNs for identity mapping (see patch: "xen/setup:
      Set identity mapping for non-RAM E820 and E820 gaps.") we pass in the start
      of the PFN and the end PFN (263424 and 512256 respectively). The first step is
      to reserve_brk a top leaf page if the p2m[1] is missing. The top leaf page
      covers 512^2 of page estate (1GB) and in case the start or end PFN is not
      aligned on 512^2*PAGE_SIZE (1GB) we loop on aligned 1GB PFNs from start pfn to
      end pfn.  We reserve_brk top leaf pages if they are missing (means they point
      to p2m_mid_missing).
      
      With the E820 example above, 263424 is not 1GB aligned so we allocate a
      reserve_brk page which will cover the PFNs estate from 0x40000 to 0x80000.
      Each entry in the allocate page is "missing" (points to p2m_missing).
      
      Next stage is to determine if we need to do a more granular boundary check
      on the 4MB (or 2MB depending on architecture) off the start and end pfn's.
      We check if the start pfn and end pfn violate that boundary check, and if
      so reserve_brk a middle (p2m[x][y]) leaf page. This way we have a much finer
      granularity of setting which PFNs are missing and which ones are identity.
      In our example 263424 and 512256 both fail the check so we reserve_brk two
      pages. Populate them with INVALID_P2M_ENTRY (so they both have "missing" values)
      and assign them to p2m[1][2] and p2m[1][488] respectively.
      
      At this point we would at minimum reserve_brk one page, but could be up to
      three. Each call to set_phys_range_identity has at maximum a three page
      cost. If we were to query the P2M at this stage, all those entries from
      start PFN through end PFN (so 1029MB -> 2001MB) would return INVALID_P2M_ENTRY
      ("missing").
      
      The next step is to walk from the start pfn to the end pfn setting
      the IDENTITY_FRAME_BIT on each PFN. This is done in 'set_phys_range_identity'.
      If we find that the middle leaf is pointing to p2m_missing we can swap it over
      to p2m_identity - this way covering 4MB (or 2MB) PFN space.  At this point we
      do not need to worry about boundary aligment (so no need to reserve_brk a middle
      page, figure out which PFNs are "missing" and which ones are identity), as that
      has been done earlier.  If we find that the middle leaf is not occupied by
      p2m_identity or p2m_missing, we dereference that page (which covers
      512 PFNs) and set the appropriate PFN with IDENTITY_FRAME_BIT. In our example
      263424 and 512256 end up there, and we set from p2m[1][2][256->511] and
      p2m[1][488][0->256] with IDENTITY_FRAME_BIT set.
      
      All other regions that are void (or not filled) either point to p2m_missing
      (considered missing) or have the default value of INVALID_P2M_ENTRY (also
      considered missing). In our case, p2m[1][2][0->255] and p2m[1][488][257->511]
      contain the INVALID_P2M_ENTRY value and are considered "missing."
      
      This is what the p2m ends up looking (for the E820 above) with this
      fabulous drawing:
      
         p2m         /--------------\
       /-----\       | &mfn_list[0],|                           /-----------------\
       |  0  |------>| &mfn_list[1],|    /---------------\      | ~0, ~0, ..      |
       |-----|       |  ..., ~0, ~0 |    | ~0, ~0, [x]---+----->| IDENTITY [@256] |
       |  1  |---\   \--------------/    | [p2m_identity]+\     | IDENTITY [@257] |
       |-----|    \                      | [p2m_identity]+\\    | ....            |
       |  2  |--\  \-------------------->|  ...          | \\   \----------------/
       |-----|   \                       \---------------/  \\
       |  3  |\   \                                          \\  p2m_identity
       |-----| \   \-------------------->/---------------\   /-----------------\
       | ..  +->+                        | [p2m_identity]+-->| ~0, ~0, ~0, ... |
       \-----/ /                         | [p2m_identity]+-->| ..., ~0         |
              / /---------------\        | ....          |   \-----------------/
             /  | IDENTITY[@0]  |      /-+-[x], ~0, ~0.. |
            /   | IDENTITY[@256]|<----/  \---------------/
           /    | ~0, ~0, ....  |
          |     \---------------/
          |
          p2m_missing             p2m_missing
      /------------------\     /------------\
      | [p2m_mid_missing]+---->| ~0, ~0, ~0 |
      | [p2m_mid_missing]+---->| ..., ~0    |
      \------------------/     \------------/
      
      where ~0 is INVALID_P2M_ENTRY. IDENTITY is (PFN | IDENTITY_BIT)
      Reviewed-by: NIan Campbell <ian.campbell@citrix.com>
      [v5: Changed code to use ranges, added ASCII art]
      [v6: Rebased on top of xen->p2m code split]
      [v4: Squished patches in just this one]
      [v7: Added RESERVE_BRK for potentially allocated pages]
      [v8: Fixed alignment problem]
      [v9: Changed 1<<3X to 1<<BITS_PER_LONG-X]
      [v10: Copied git commit description in the p2m code + Add Review tag]
      [v11: Title had '2-1' - should be '1-1' mapping]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      f4cec35b
  14. 12 3月, 2011 1 次提交
    • K
      xen/e820: Don't mark balloon memory as E820_UNUSABLE when running as guest and fix overflow. · 86b32122
      Konrad Rzeszutek Wilk 提交于
      If we have a guest that asked for:
      
      memory=1024
      maxmem=2048
      
      Which means we want 1GB now, and create pagetables so that we can expand
      up to 2GB, we would have this E820 layout:
      
      [    0.000000] BIOS-provided physical RAM map:
      [    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
      [    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
      [    0.000000]  Xen: 0000000000100000 - 0000000080800000 (usable)
      
      Due to patch: "xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps."
      we would mark the memory past the 1GB mark as unusuable resulting in:
      
      [    0.000000] BIOS-provided physical RAM map:
      [    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
      [    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
      [    0.000000]  Xen: 0000000000100000 - 0000000040000000 (usable)
      [    0.000000]  Xen: 0000000040000000 - 0000000080800000 (unusable)
      
      which meant that we could not balloon up anymore. We could
      balloon the guest down. The fix is to run the code introduced
      by the above mentioned patch only for the initial domain.
      
      We will have to revisit this once we start introducing a modified
      E820 for PCI passthrough so that we can utilize the P2M identity code.
      
      We also fix an overflow by having UL instead of ULL on 32-bit machines.
      
      [v2: Ian pointed to the overflow issue]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      86b32122
  15. 10 3月, 2011 1 次提交
  16. 04 3月, 2011 2 次提交
  17. 26 2月, 2011 5 次提交
  18. 24 2月, 2011 1 次提交
    • Y
      x86: Rename e820_table_* to pgt_buf_* · d1b19426
      Yinghai Lu 提交于
      e820_table_{start|end|top}, which are used to buffer page table
      allocation during early boot, are now derived from memblock and don't
      have much to do with e820.  Change the names so that they reflect what
      they're used for.
      
      This patch doesn't introduce any behavior change.
      
      -v2: Ingo found that earlier patch "x86: Use early pre-allocated page
           table buffer top-down" caused crash on 32bit and needed to be
           dropped.  This patch was updated to reflect the change.
      
      -tj: Updated commit description.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      d1b19426
  19. 23 2月, 2011 1 次提交
    • Z
      xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps. · 2f14ddc3
      Zhang, Fengzhe 提交于
      With the hypervisor argument of dom0_mem=X we iterate over the physical
      (only for the initial domain) E820 and subtract the the size from each
      E820_RAM region the delta so that the cumulative size of all E820_RAM regions
      is equal to 'X'. This sometimes ends up with E820_RAM regions with zero size
      (which are removed by e820_sanitize) and E820_RAM that are smaller
      than physically.
      
      Later on the PCI API looks at the E820 and attempts to set up an
      resource region for the "PCI mem". The E820 (assume dom0_mem=1GB is
      set) compared to the physical looks as so:
      
       [    0.000000] BIOS-provided physical RAM map:
       [    0.000000]  Xen: 0000000000000000 - 0000000000097c00 (usable)
       [    0.000000]  Xen: 0000000000097c00 - 0000000000100000 (reserved)
      -[    0.000000]  Xen: 0000000000100000 - 00000000defafe00 (usable)
      +[    0.000000]  Xen: 0000000000100000 - 0000000040000000 (usable)
       [    0.000000]  Xen: 00000000defafe00 - 00000000defb1ea0 (ACPI NVS)
       [    0.000000]  Xen: 00000000defb1ea0 - 00000000e0000000 (reserved)
       [    0.000000]  Xen: 00000000f4000000 - 00000000f8000000 (reserved)
      ..
      And we get
      [    0.000000] Allocating PCI resources starting at 40000000 (gap: 40000000:9efafe00)
      
      while it should have started at e0000000 (a nice big gap up to
      f4000000 exists). The "Allocating PCI" is part of the resource API.
      
      The users that end up using those PCI I/O regions usually supply their
      own BARs when calling the resource API (request_resource, or allocate_resource),
      but there are exceptions which provide an empty 'struct resource' and
      expect the API to provide the 'struct resource' to be populated with valid values.
      The one that triggered this bug was the intel AGP driver that requested
      a region for the flush page (intel_i9xx_setup_flush).
      
      Before this patch, when running under Xen hypervisor, the 'struct resource'
      returned could have (depending on the dom0_mem size) physical ranges of a 'System RAM'
      instead of 'I/O' regions. This ended up with the Hypervisor failing a request
      to populate PTE's with those PFNs as the domain did not have access to those
      'System RAM' regions (rightly so).
      
      After this patch, the left-over E820_RAM region from the truncation, will be
      labeled as E820_UNUSABLE. The E820 will look as so:
      
       [    0.000000] BIOS-provided physical RAM map:
       [    0.000000]  Xen: 0000000000000000 - 0000000000097c00 (usable)
       [    0.000000]  Xen: 0000000000097c00 - 0000000000100000 (reserved)
      -[    0.000000]  Xen: 0000000000100000 - 00000000defafe00 (usable)
      +[    0.000000]  Xen: 0000000000100000 - 0000000040000000 (usable)
      +[    0.000000]  Xen: 0000000040000000 - 00000000defafe00 (unusable)
       [    0.000000]  Xen: 00000000defafe00 - 00000000defb1ea0 (ACPI NVS)
       [    0.000000]  Xen: 00000000defb1ea0 - 00000000e0000000 (reserved)
       [    0.000000]  Xen: 00000000f4000000 - 00000000f8000000 (reserved)
      
      For more information:
      http://mid.gmane.org/1A42CE6F5F474C41B63392A5F80372B2335E978C@shsmsx501.ccr.corp.intel.com
      
      BugLink: http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1726Signed-off-by: NFengzhe Zhang <fengzhe.zhang@intel.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2f14ddc3
  20. 12 2月, 2011 2 次提交
    • I
      xen: annotate functions which only call into __init at start of day · 44b46c3e
      Ian Campbell 提交于
      Both xen_hvm_init_shared_info and xen_build_mfn_list_list can be
      called at resume time as well as at start of day but only reference
      __init functions (extend_brk) at start of day. Hence annotate with
      __ref.
      
          WARNING: arch/x86/built-in.o(.text+0x4f1): Section mismatch in reference
              from the function xen_hvm_init_shared_info() to the function
              .init.text:extend_brk()
          The function xen_hvm_init_shared_info() references
          the function __init extend_brk().
          This is often because xen_hvm_init_shared_info lacks a __init
          annotation or the annotation of extend_brk is wrong.
      
      xen_hvm_init_shared_info calls extend_brk() iff !shared_info_page and
      initialises shared_info_page with the result. This happens at start of
      day only.
      
          WARNING: arch/x86/built-in.o(.text+0x599b): Section mismatch in reference
              from the function xen_build_mfn_list_list() to the function
              .init.text:extend_brk()
          The function xen_build_mfn_list_list() references
          the function __init extend_brk().
          This is often because xen_build_mfn_list_list lacks a __init
          annotation or the annotation of extend_brk is wrong.
      
      (this warning occurs multiple times)
      
      xen_build_mfn_list_list only calls extend_brk() at boot time, while
      building the initial mfn list list
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      44b46c3e
    • I
      xen p2m: annotate variable which appears unused · 6b08cfeb
      Ian Campbell 提交于
       CC      arch/x86/xen/p2m.o
      arch/x86/xen/p2m.c: In function 'm2p_remove_override':
      arch/x86/xen/p2m.c:460: warning: 'address' may be used uninitialized in this function
      arch/x86/xen/p2m.c: In function 'm2p_add_override':
      arch/x86/xen/p2m.c:426: warning: 'address' may be used uninitialized in this function
      
      In actual fact address is inialised in one "if (!PageHighMem(page))"
      statement and used in a second and so is always initialised before
      use.
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      6b08cfeb
  21. 28 1月, 2011 1 次提交