1. 20 8月, 2015 2 次提交
    • J
      xen: remove no longer needed p2m.h · cb3eb850
      Juergen Gross 提交于
      Cleanup by removing arch/x86/xen/p2m.h as it isn't needed any more.
      
      Most definitions in this file are used in p2m.c only. Move those into
      p2m.c.
      
      set_phys_range_identity() is already declared in
      arch/x86/include/asm/xen/page.h, add __init annotation there.
      
      MAX_REMAP_RANGES isn't used at all, just delete it.
      
      The only define left is P2M_PER_PAGE which is moved to page.h as well.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Acked-by: NKonrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      cb3eb850
    • J
      xen: allow more than 512 GB of RAM for 64 bit pv-domains · c70727a5
      Juergen Gross 提交于
      64 bit pv-domains under Xen are limited to 512 GB of RAM today. The
      main reason has been the 3 level p2m tree, which was replaced by the
      virtual mapped linear p2m list. Parallel to the p2m list which is
      being used by the kernel itself there is a 3 level mfn tree for usage
      by the Xen tools and eventually for crash dump analysis. For this tree
      the linear p2m list can serve as a replacement, too. As the kernel
      can't know whether the tools are capable of dealing with the p2m list
      instead of the mfn tree, the limit of 512 GB can't be dropped in all
      cases.
      
      This patch replaces the hard limit by a kernel parameter which tells
      the kernel to obey the 512 GB limit or not. The default is selected by
      a configuration parameter which specifies whether the 512 GB limit
      should be active per default for domUs (domain save/restore/migration
      and crash dump analysis are affected).
      
      Memory above the domain limit is returned to the hypervisor instead of
      being identity mapped, which was wrong anyway.
      
      The kernel configuration parameter to specify the maximum size of a
      domain can be deleted, as it is not relevant any more.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Acked-by: NKonrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      c70727a5
  2. 06 5月, 2015 1 次提交
  3. 28 1月, 2015 2 次提交
  4. 08 12月, 2014 1 次提交
  5. 04 12月, 2014 6 次提交
    • J
      xen: switch to linear virtual mapped sparse p2m list · 054954eb
      Juergen Gross 提交于
      At start of the day the Xen hypervisor presents a contiguous mfn list
      to a pv-domain. In order to support sparse memory this mfn list is
      accessed via a three level p2m tree built early in the boot process.
      Whenever the system needs the mfn associated with a pfn this tree is
      used to find the mfn.
      
      Instead of using a software walked tree for accessing a specific mfn
      list entry this patch is creating a virtual address area for the
      entire possible mfn list including memory holes. The holes are
      covered by mapping a pre-defined  page consisting only of "invalid
      mfn" entries. Access to a mfn entry is possible by just using the
      virtual base address of the mfn list and the pfn as index into that
      list. This speeds up the (hot) path of determining the mfn of a
      pfn.
      
      Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
      showed following improvements:
      
      Elapsed time: 32:50 ->  32:35
      System:       18:07 ->  17:47
      User:        104:00 -> 103:30
      
      Tested with following configurations:
      - 64 bit dom0, 8GB RAM
      - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB
      - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without
                                          the patch)
      - 32 bit domU, ballooning up and down
      - 32 bit domU, save and restore
      - 32 bit domU with PCI passthrough
      - 64 bit domU, 8 GB, 2049 MB, 5000 MB
      - 64 bit domU, ballooning up and down
      - 64 bit domU, save and restore
      - 64 bit domU with PCI passthrough
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      054954eb
    • J
      xen: Hide get_phys_to_machine() to be able to tune common path · 0aad5689
      Juergen Gross 提交于
      Today get_phys_to_machine() is always called when the mfn for a pfn
      is to be obtained. Add a wrapper __pfn_to_mfn() as inline function
      to be able to avoid calling get_phys_to_machine() when possible as
      soon as the switch to a linear mapped p2m list has been done.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      0aad5689
    • J
      xen: Delay invalidating extra memory · 5b8e7d80
      Juergen Gross 提交于
      When the physical memory configuration is initialized the p2m entries
      for not pouplated memory pages are set to "invalid". As those pages
      are beyond the hypervisor built p2m list the p2m tree has to be
      extended.
      
      This patch delays processing the extra memory related p2m entries
      during the boot process until some more basic memory management
      functions are callable. This removes the need to create new p2m
      entries until virtual memory management is available.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      5b8e7d80
    • J
      xen: Delay remapping memory of pv-domain · 1f3ac86b
      Juergen Gross 提交于
      Early in the boot process the memory layout of a pv-domain is changed
      to match the E820 map (either the host one for Dom0 or the Xen one)
      regarding placement of RAM and PCI holes. This requires removing memory
      pages initially located at positions not suitable for RAM and adding
      them later at higher addresses where no restrictions apply.
      
      To be able to operate on the hypervisor supported p2m list until a
      virtual mapped linear p2m list can be constructed, remapping must
      be delayed until virtual memory management is initialized, as the
      initial p2m list can't be extended unlimited at physical memory
      initialization time due to it's fixed structure.
      
      A further advantage is the reduction in complexity and code volume as
      we don't have to be careful regarding memory restrictions during p2m
      updates.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      1f3ac86b
    • J
      xen: Make functions static · 820c4db2
      Juergen Gross 提交于
      Some functions in arch/x86/xen/p2m.c are used locally only. Make them
      static. Rearrange the functions in p2m.c to avoid forward declarations.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      820c4db2
    • S
      xen/arm/arm64: introduce xen_arch_need_swiotlb · a4dba130
      Stefano Stabellini 提交于
      Introduce an arch specific function to find out whether a particular dma
      mapping operation needs to bounce on the swiotlb buffer.
      
      On ARM and ARM64, if the page involved is a foreign page and the device
      is not coherent, we need to bounce because at unmap time we cannot
      execute any required cache maintenance operations (we don't know how to
      find the pfn from the mfn).
      
      No change of behaviour for x86.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: NIan Campbell <ian.campbell@citrix.com>
      Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      a4dba130
  6. 18 3月, 2014 1 次提交
  7. 03 2月, 2014 1 次提交
  8. 31 1月, 2014 1 次提交
    • Z
      xen/grant-table: Avoid m2p_override during mapping · 08ece5bb
      Zoltan Kiss 提交于
      The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
      for blkback and future netback patches it just cause a lock contention, as
      those pages never go to userspace. Therefore this series does the following:
      - the original functions were renamed to __gnttab_[un]map_refs, with a new
        parameter m2p_override
      - based on m2p_override either they follow the original behaviour, or just set
        the private flag and call set_phys_to_machine
      - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
        m2p_override false
      - a new function gnttab_[un]map_refs_userspace provides the old behaviour
      
      It also removes a stray space from page.h and change ret to 0 if
      XENFEAT_auto_translated_physmap, as that is the only possible return value
      there.
      
      v2:
      - move the storing of the old mfn in page->index to gnttab_map_refs
      - move the function header update to a separate patch
      
      v3:
      - a new approach to retain old behaviour where it needed
      - squash the patches into one
      
      v4:
      - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
      - clear page->private before doing anything with the page, so m2p_find_override
        won't race with this
      
      v5:
      - change return value handling in __gnttab_[un]map_refs
      - remove a stray space in page.h
      - add detail why ret = 0 now at some places
      
      v6:
      - don't pass pfn to m2p* functions, just get it locally
      Signed-off-by: NZoltan Kiss <zoltan.kiss@citrix.com>
      Suggested-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      08ece5bb
  9. 06 1月, 2014 2 次提交
    • K
      xen/grant: Implement an grant frame array struct (v3). · efaf30a3
      Konrad Rzeszutek Wilk 提交于
      The 'xen_hvm_resume_frames' used to be an 'unsigned long'
      and contain the virtual address of the grants. That was OK
      for most architectures (PVHVM, ARM) were the grants are contiguous
      in memory. That however is not the case for PVH - in which case
      we will have to do a lookup for each virtual address for the PFN.
      
      Instead of doing that, lets make it a structure which will contain
      the array of PFNs, the virtual address and the count of said PFNs.
      
      Also provide a generic functions: gnttab_setup_auto_xlat_frames and
      gnttab_free_auto_xlat_frames to populate said structure with
      appropriate values for PVHVM and ARM.
      
      To round it off, change the name from 'xen_hvm_resume_frames' to
      a more descriptive one - 'xen_auto_xlat_grant_frames'.
      
      For PVH, in patch "xen/pvh: Piggyback on PVHVM for grant driver"
      we will populate the 'xen_auto_xlat_grant_frames' by ourselves.
      
      v2 moves the xen_remap in the gnttab_setup_auto_xlat_frames
      and also introduces xen_unmap for gnttab_free_auto_xlat_frames.
      Suggested-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [v3: Based on top of 'asm/xen/page.h: remove redundant semicolon']
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      efaf30a3
    • M
      xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn. · fc590efe
      Mukesh Rathor 提交于
      Most of the functions in page.h are prefaced with
      	if (xen_feature(XENFEAT_auto_translated_physmap))
      		return mfn;
      
      Except the mfn_to_local_pfn. At a first sight, the function
      should work without this patch - as the 'mfn_to_mfn' has
      a similar check. But there are no such check in the
      'get_phys_to_machine' function - so we would crash in there.
      
      This fixes it by following the convention of having the
      check for auto-xlat in these static functions.
      Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      fc590efe
  10. 25 9月, 2013 1 次提交
    • D
      xen/p2m: check MFN is in range before using the m2p table · 0160676b
      David Vrabel 提交于
      On hosts with more than 168 GB of memory, a 32-bit guest may attempt
      to grant map an MFN that is error cannot lookup in its mapping of the
      m2p table.  There is an m2p lookup as part of m2p_add_override() and
      m2p_remove_override().  The lookup falls off the end of the mapped
      portion of the m2p and (because the mapping is at the highest virtual
      address) wraps around and the lookup causes a fault on what appears to
      be a user space address.
      
      do_page_fault() (thinking it's a fault to a userspace address), tries
      to lock mm->mmap_sem.  If the gntdev device is used for the grant map,
      m2p_add_override() is called from from gnttab_mmap() with mm->mmap_sem
      already locked.  do_page_fault() then deadlocks.
      
      The deadlock would most commonly occur when a 64-bit guest is started
      and xenconsoled attempts to grant map its console ring.
      
      Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the
      mapped portion of the m2p table before accessing the table and use
      this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn()
      (which already had the correct range check).
      
      All faults caused by accessing the non-existant parts of the m2p are
      thus within the kernel address space and exception_fixup() is called
      without trying to lock mm->mmap_sem.
      
      This means that for MFNs that are outside the mapped range of the m2p
      then mfn_to_pfn() will always look in the m2p overrides.  This is
      correct because it must be a foreign MFN (and the PFN in the m2p in
      this case is only relevant for the other domain).
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      --
      v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
      v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of
          range as it's probably foreign.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      0160676b
  11. 20 2月, 2013 1 次提交
  12. 12 9月, 2012 1 次提交
    • S
      xen/m2p: do not reuse kmap_op->dev_bus_addr · 2fc136ee
      Stefano Stabellini 提交于
      If the caller passes a valid kmap_op to m2p_add_override, we use
      kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is
      part of the interface with Xen and if we are batching the hypercalls it
      might not have been written by the hypervisor yet. That means that later
      on Xen will write to it and we'll think that the original mfn is
      actually what Xen has written to it.
      
      Rather than "stealing" struct members from kmap_op, keep using
      page->index to store the original mfn and add another parameter to
      m2p_remove_override to get the corresponding kmap_op instead.
      It is now responsibility of the caller to keep track of which kmap_op
      corresponds to a particular page in the m2p_override (gntdev, the only
      user of this interface that passes a valid kmap_op, is already doing that).
      
      CC: stable@kernel.org
      Reported-and-Tested-By: NSander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2fc136ee
  13. 07 4月, 2012 1 次提交
  14. 29 9月, 2011 1 次提交
    • S
      xen: modify kernel mappings corresponding to granted pages · 0930bba6
      Stefano Stabellini 提交于
      If we want to use granted pages for AIO, changing the mappings of a user
      vma and the corresponding p2m is not enough, we also need to update the
      kernel mappings accordingly.
      Currently this is only needed for pages that are created for user usages
      through /dev/xen/gntdev. As in, pages that have been in use by the
      kernel and use the P2M will not need this special mapping.
      However there are no guarantees that in the future the kernel won't
      start accessing pages through the 1:1 even for internal usage.
      
      In order to avoid the complexity of dealing with highmem, we allocated
      the pages lowmem.
      We issue a HYPERVISOR_grant_table_op right away in
      m2p_add_override and we remove the mappings using another
      HYPERVISOR_grant_table_op in m2p_remove_override.
      Considering that m2p_add_override and m2p_remove_override are called
      once per page we use multicalls and hypercall batching.
      
      Use the kmap_op pointer directly as argument to do the mapping as it is
      guaranteed to be present up until the unmapping is done.
      Before issuing any unmapping multicalls, we need to make sure that the
      mapping has already being done, because we need the kmap->handle to be
      set correctly.
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      [v1: Removed GRANT_FRAME_BIT usage]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      0930bba6
  15. 24 9月, 2011 1 次提交
  16. 17 8月, 2011 1 次提交
    • J
      xen/x86: replace order-based range checking of M2P table by linear one · ccbcdf7c
      Jan Beulich 提交于
      The order-based approach is not only less efficient (requiring a shift
      and a compare, typical generated code looking like this
      
      	mov	eax, [machine_to_phys_order]
      	mov	ecx, eax
      	shr	ebx, cl
      	test	ebx, ebx
      	jnz	...
      
      whereas a direct check requires just a compare, like in
      
      	cmp	ebx, [machine_to_phys_nr]
      	jae	...
      
      ), but also slightly dangerous in the 32-on-64 case - the element
      address calculation can wrap if the next power of two boundary is
      sufficiently far away from the actual upper limit of the table, and
      hence can result in user space addresses being accessed (with it being
      unknown what may actually be mapped there).
      
      Additionally, the elimination of the mistaken use of fls() here (should
      have been __fls()) fixes a latent issue on x86-64 that would trigger
      if the code was run on a system with memory extending beyond the 44-bit
      boundary.
      
      CC: stable@kernel.org
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      [v1: Based on Jeremy's feedback]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ccbcdf7c
  17. 19 5月, 2011 1 次提交
    • K
      xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. · d5431d52
      Konrad Rzeszutek Wilk 提交于
      We only supported the M2P (and P2M) override only for the
      GNTMAP_contains_pte type mappings. Meaning that we grants
      operations would "contain the machine address of the PTE to update"
      If the flag is unset, then the grant operation is
      "contains a host virtual address". The latter case means that
      the Hypervisor takes care of updating our page table
      (specifically the PTE entry) with the guest's MFN. As such we should
      not try to do anything with the PTE. Previous to this patch
      we would try to clear the PTE which resulted in Xen hypervisor
      being upset with us:
      
      (XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
      (XEN) domain_crash called from mm.c:1067
      (XEN) Domain 0 (vcpu#0) crashed on cpu#3:
      (XEN) ----[ Xen-4.0-110228  x86_64  debug=y  Not tainted ]----
      
      and crashing us.
      
      This patch allows us to inhibit the PTE clearing in the PV guest
      if the GNTMAP_contains_pte is not set.
      
      On the m2p_remove_override path we provide the same parameter.
      
      Sadly in the grant-table driver we do not have a mechanism to
      tell m2p_remove_override whether to clear the PTE or not. Since
      the grant-table driver is used by user-space, we can safely assume
      that it operates only on PTE's. Hence the implementation for
      it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
      we can implement the support for this. It will require some extra
      accounting structure to keep track of the page[i], and the flag.
      
      [v1: Added documentation details, made it return -EOPNOTSUPP instead
       of trying to do a half-way implementation]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      d5431d52
  18. 18 4月, 2011 1 次提交
    • K
      xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. · cf8d9163
      Konrad Rzeszutek Wilk 提交于
      We only supported the M2P (and P2M) override only for the
      GNTMAP_contains_pte type mappings. Meaning that we grants
      operations would "contain the machine address of the PTE to update"
      If the flag is unset, then the grant operation is
      "contains a host virtual address". The latter case means that
      the Hypervisor takes care of updating our page table
      (specifically the PTE entry) with the guest's MFN. As such we should
      not try to do anything with the PTE. Previous to this patch
      we would try to clear the PTE which resulted in Xen hypervisor
      being upset with us:
      
      (XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
      (XEN) domain_crash called from mm.c:1067
      (XEN) Domain 0 (vcpu#0) crashed on cpu#3:
      (XEN) ----[ Xen-4.0-110228  x86_64  debug=y  Not tainted ]----
      
      and crashing us.
      
      This patch allows us to inhibit the PTE clearing in the PV guest
      if the GNTMAP_contains_pte is not set.
      
      On the m2p_remove_override path we provide the same parameter.
      
      Sadly in the grant-table driver we do not have a mechanism to
      tell m2p_remove_override whether to clear the PTE or not. Since
      the grant-table driver is used by user-space, we can safely assume
      that it operates only on PTE's. Hence the implementation for
      it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
      we can implement the support for this. It will require some extra
      accounting structure to keep track of the page[i], and the flag.
      
      [v1: Added documentation details, made it return -EOPNOTSUPP instead
       of trying to do a half-way implementation]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      cf8d9163
  19. 14 3月, 2011 4 次提交
    • S
      xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set.. · 706cc9d2
      Stefano Stabellini 提交于
      If there is no proper PFN value in the M2P for the MFN
      (so we get 0xFFFFF.. or 0x55555, or 0x0), we should
      consult the M2P override to see if there is an entry for this.
      [Note: we also consult the M2P override if the MFN
      is past our machine_to_phys size].
      
      We consult the P2M with the PFN. In case the returned
      MFN is one of the special values: 0xFFF.., 0x5555
      (which signify that the MFN can be either "missing" or it
      belongs to DOMID_IO) or the p2m(m2p(mfn)) != mfn, we check
      the M2P override. If we fail the M2P override check, we reset
      the PFN value to INVALID_P2M_ENTRY.
      
      Next we try to find the MFN in the P2M using the MFN
      value (not the PFN value) and if found, we know
      that this MFN is an identity value and return it as so.
      
      Otherwise we have exhausted all the posibilities and we
      return the PFN, which at this stage can either be a real
      PFN value found in the machine_to_phys.. array, or
      INVALID_P2M_ENTRY value.
      
      [v1: Added Review-by tag]
      Reviewed-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      706cc9d2
    • K
      xen/m2p: No need to catch exceptions when we know that there is no RAM · 146c4e51
      Konrad Rzeszutek Wilk 提交于
      .. beyound what we think is the end of memory. However there might
      be more System RAM - but assigned to a guest. Hence jump to the
      M2P override check and consult.
      
      [v1: Added Review-by tag]
      Reviewed-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      146c4e51
    • K
      xen/debugfs: Add 'p2m' file for printing out the P2M layout. · 2222e71b
      Konrad Rzeszutek Wilk 提交于
      We walk over the whole P2M tree and construct a simplified view of
      which PFN regions belong to what level and what type they are.
      
      Only enabled if CONFIG_XEN_DEBUG_FS is set.
      
      [v2: UNKN->UNKNOWN, use uninitialized_var]
      [v3: Rebased on top of mmu->p2m code split]
      [v4: Fixed the else if]
      Reviewed-by: NIan Campbell <Ian.Campbell@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2222e71b
    • K
      xen/mmu: Add the notion of identity (1-1) mapping. · f4cec35b
      Konrad Rzeszutek Wilk 提交于
      Our P2M tree structure is a three-level. On the leaf nodes
      we set the Machine Frame Number (MFN) of the PFN. What this means
      is that when one does: pfn_to_mfn(pfn), which is used when creating
      PTE entries, you get the real MFN of the hardware. When Xen sets
      up a guest it initially populates a array which has descending
      (or ascending) MFN values, as so:
      
       idx: 0,  1,       2
       [0x290F, 0x290E, 0x290D, ..]
      
      so pfn_to_mfn(2)==0x290D. If you start, restart many guests that list
      starts looking quite random.
      
      We graft this structure on our P2M tree structure and stick in
      those MFN in the leafs. But for all other leaf entries, or for the top
      root, or middle one, for which there is a void entry, we assume it is
      "missing". So
       pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY.
      
      We add the possibility of setting 1-1 mappings on certain regions, so
      that:
       pfn_to_mfn(0xc0000)=0xc0000
      
      The benefit of this is, that we can assume for non-RAM regions (think
      PCI BARs, or ACPI spaces), we can create mappings easily b/c we
      get the PFN value to match the MFN.
      
      For this to work efficiently we introduce one new page p2m_identity and
      allocate (via reserved_brk) any other pages we need to cover the sides
      (1GB or 4MB boundary violations). All entries in p2m_identity are set to
      INVALID_P2M_ENTRY type (Xen toolstack only recognizes that and MFNs,
      no other fancy value).
      
      On lookup we spot that the entry points to p2m_identity and return the identity
      value instead of dereferencing and returning INVALID_P2M_ENTRY. If the entry
      points to an allocated page, we just proceed as before and return the PFN.
      If the PFN has IDENTITY_FRAME_BIT set we unmask that in appropriate functions
      (pfn_to_mfn).
      
      The reason for having the IDENTITY_FRAME_BIT instead of just returning the
      PFN is that we could find ourselves where pfn_to_mfn(pfn)==pfn for a
      non-identity pfn. To protect ourselves against we elect to set (and get) the
      IDENTITY_FRAME_BIT on all identity mapped PFNs.
      
      This simplistic diagram is used to explain the more subtle piece of code.
      There is also a digram of the P2M at the end that can help.
      Imagine your E820 looking as so:
      
                         1GB                                           2GB
      /-------------------+---------\/----\         /----------\    /---+-----\
      | System RAM        | Sys RAM ||ACPI|         | reserved |    | Sys RAM |
      \-------------------+---------/\----/         \----------/    \---+-----/
                                    ^- 1029MB                       ^- 2001MB
      
      [1029MB = 263424 (0x40500), 2001MB = 512256 (0x7D100), 2048MB = 524288 (0x80000)]
      
      And dom0_mem=max:3GB,1GB is passed in to the guest, meaning memory past 1GB
      is actually not present (would have to kick the balloon driver to put it in).
      
      When we are told to set the PFNs for identity mapping (see patch: "xen/setup:
      Set identity mapping for non-RAM E820 and E820 gaps.") we pass in the start
      of the PFN and the end PFN (263424 and 512256 respectively). The first step is
      to reserve_brk a top leaf page if the p2m[1] is missing. The top leaf page
      covers 512^2 of page estate (1GB) and in case the start or end PFN is not
      aligned on 512^2*PAGE_SIZE (1GB) we loop on aligned 1GB PFNs from start pfn to
      end pfn.  We reserve_brk top leaf pages if they are missing (means they point
      to p2m_mid_missing).
      
      With the E820 example above, 263424 is not 1GB aligned so we allocate a
      reserve_brk page which will cover the PFNs estate from 0x40000 to 0x80000.
      Each entry in the allocate page is "missing" (points to p2m_missing).
      
      Next stage is to determine if we need to do a more granular boundary check
      on the 4MB (or 2MB depending on architecture) off the start and end pfn's.
      We check if the start pfn and end pfn violate that boundary check, and if
      so reserve_brk a middle (p2m[x][y]) leaf page. This way we have a much finer
      granularity of setting which PFNs are missing and which ones are identity.
      In our example 263424 and 512256 both fail the check so we reserve_brk two
      pages. Populate them with INVALID_P2M_ENTRY (so they both have "missing" values)
      and assign them to p2m[1][2] and p2m[1][488] respectively.
      
      At this point we would at minimum reserve_brk one page, but could be up to
      three. Each call to set_phys_range_identity has at maximum a three page
      cost. If we were to query the P2M at this stage, all those entries from
      start PFN through end PFN (so 1029MB -> 2001MB) would return INVALID_P2M_ENTRY
      ("missing").
      
      The next step is to walk from the start pfn to the end pfn setting
      the IDENTITY_FRAME_BIT on each PFN. This is done in 'set_phys_range_identity'.
      If we find that the middle leaf is pointing to p2m_missing we can swap it over
      to p2m_identity - this way covering 4MB (or 2MB) PFN space.  At this point we
      do not need to worry about boundary aligment (so no need to reserve_brk a middle
      page, figure out which PFNs are "missing" and which ones are identity), as that
      has been done earlier.  If we find that the middle leaf is not occupied by
      p2m_identity or p2m_missing, we dereference that page (which covers
      512 PFNs) and set the appropriate PFN with IDENTITY_FRAME_BIT. In our example
      263424 and 512256 end up there, and we set from p2m[1][2][256->511] and
      p2m[1][488][0->256] with IDENTITY_FRAME_BIT set.
      
      All other regions that are void (or not filled) either point to p2m_missing
      (considered missing) or have the default value of INVALID_P2M_ENTRY (also
      considered missing). In our case, p2m[1][2][0->255] and p2m[1][488][257->511]
      contain the INVALID_P2M_ENTRY value and are considered "missing."
      
      This is what the p2m ends up looking (for the E820 above) with this
      fabulous drawing:
      
         p2m         /--------------\
       /-----\       | &mfn_list[0],|                           /-----------------\
       |  0  |------>| &mfn_list[1],|    /---------------\      | ~0, ~0, ..      |
       |-----|       |  ..., ~0, ~0 |    | ~0, ~0, [x]---+----->| IDENTITY [@256] |
       |  1  |---\   \--------------/    | [p2m_identity]+\     | IDENTITY [@257] |
       |-----|    \                      | [p2m_identity]+\\    | ....            |
       |  2  |--\  \-------------------->|  ...          | \\   \----------------/
       |-----|   \                       \---------------/  \\
       |  3  |\   \                                          \\  p2m_identity
       |-----| \   \-------------------->/---------------\   /-----------------\
       | ..  +->+                        | [p2m_identity]+-->| ~0, ~0, ~0, ... |
       \-----/ /                         | [p2m_identity]+-->| ..., ~0         |
              / /---------------\        | ....          |   \-----------------/
             /  | IDENTITY[@0]  |      /-+-[x], ~0, ~0.. |
            /   | IDENTITY[@256]|<----/  \---------------/
           /    | ~0, ~0, ....  |
          |     \---------------/
          |
          p2m_missing             p2m_missing
      /------------------\     /------------\
      | [p2m_mid_missing]+---->| ~0, ~0, ~0 |
      | [p2m_mid_missing]+---->| ..., ~0    |
      \------------------/     \------------/
      
      where ~0 is INVALID_P2M_ENTRY. IDENTITY is (PFN | IDENTITY_BIT)
      Reviewed-by: NIan Campbell <ian.campbell@citrix.com>
      [v5: Changed code to use ranges, added ASCII art]
      [v6: Rebased on top of xen->p2m code split]
      [v4: Squished patches in just this one]
      [v7: Added RESERVE_BRK for potentially allocated pages]
      [v8: Fixed alignment problem]
      [v9: Changed 1<<3X to 1<<BITS_PER_LONG-X]
      [v10: Copied git commit description in the p2m code + Add Review tag]
      [v11: Title had '2-1' - should be '1-1' mapping]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      f4cec35b
  20. 04 3月, 2011 1 次提交
    • K
      xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY. · 6eaa412f
      Konrad Rzeszutek Wilk 提交于
      With this patch, we diligently set regions that will be used by the
      balloon driver to be INVALID_P2M_ENTRY and under the ownership
      of the balloon driver. We are OK using the __set_phys_to_machine
      as we do not expect to be allocating any P2M middle or entries pages.
      The set_phys_to_machine has the side-effect of potentially allocating
      new pages and we do not want that at this stage.
      
      We can do this because xen_build_mfn_list_list will have already
      allocated all such pages up to xen_max_p2m_pfn.
      
      We also move the check for auto translated physmap down the
      stack so it is present in __set_phys_to_machine.
      
      [v2: Rebased with mmu->p2m code split]
      Reviewed-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      6eaa412f
  21. 12 1月, 2011 2 次提交
  22. 13 11月, 2010 1 次提交
  23. 23 10月, 2010 2 次提交
  24. 21 10月, 2010 1 次提交
  25. 08 6月, 2010 1 次提交
  26. 09 4月, 2009 1 次提交
  27. 31 3月, 2009 1 次提交