1. 04 Dec, 2014 8 commits
    • J
      xen: switch to linear virtual mapped sparse p2m list · 054954eb
      Juergen Gross committed
      At start of the day the Xen hypervisor presents a contiguous mfn list
      to a pv-domain. In order to support sparse memory this mfn list is
      accessed via a three level p2m tree built early in the boot process.
      Whenever the system needs the mfn associated with a pfn this tree is
      used to find the mfn.
      
      Instead of using a software-walked tree to access a specific mfn
      list entry, this patch creates a virtual address area for the
      entire possible mfn list, including memory holes. The holes are
      covered by mapping a pre-defined page consisting only of "invalid
      mfn" entries. An mfn entry can then be accessed using just the
      virtual base address of the mfn list and the pfn as an index into that
      list. This speeds up the (hot) path of determining the mfn of a
      pfn.
      
      A kernel build on a Dell Latitude E6440 (2 cores, HT) in a 64 bit Dom0
      showed the following improvements:
      
      Elapsed time: 32:50 ->  32:35
      System:       18:07 ->  17:47
      User:        104:00 -> 103:30
      
      Tested with the following configurations:
      - 64 bit dom0, 8GB RAM
      - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB
      - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without
                                          the patch)
      - 32 bit domU, ballooning up and down
      - 32 bit domU, save and restore
      - 32 bit domU with PCI passthrough
      - 64 bit domU, 8 GB, 2049 MB, 5000 MB
      - 64 bit domU, ballooning up and down
      - 64 bit domU, save and restore
      - 64 bit domU with PCI passthrough
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      054954eb
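The lookup described in this commit message can be illustrated with a minimal userspace sketch. It is not the kernel code: the names `p2m_list`, `p2m_init`, `p2m_set`, `p2m_lookup` and the tiny `MAX_PFN` limit are hypothetical, and the holes are modelled by filling a plain array with an "invalid" marker rather than by mapping a shared page of invalid entries into a virtual address area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marker for "no mfn behind this pfn" (assumed value for this sketch). */
#define INVALID_P2M_ENTRY (~(uintptr_t)0)
/* Tiny illustrative limit; a real list covers every possible pfn. */
#define MAX_PFN 16u

/* Flat list with one slot per possible pfn, holes included. */
static uintptr_t p2m_list[MAX_PFN];

/* Mark every slot as a hole until it is populated. */
void p2m_init(void)
{
    for (size_t i = 0; i < MAX_PFN; i++)
        p2m_list[i] = INVALID_P2M_ENTRY;
}

/* Record the mfn backing a pfn. */
void p2m_set(size_t pfn, uintptr_t mfn)
{
    assert(pfn < MAX_PFN);
    p2m_list[pfn] = mfn;
}

/* Lookup is a bounds check plus one indexed load: no tree walk. */
uintptr_t p2m_lookup(size_t pfn)
{
    return pfn < MAX_PFN ? p2m_list[pfn] : INVALID_P2M_ENTRY;
}
```

After `p2m_init()` and `p2m_set(3, 0x1234)`, `p2m_lookup(3)` returns the stored value, while any unpopulated or out-of-range pfn yields the invalid marker.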
    • J
      xen: Hide get_phys_to_machine() to be able to tune common path · 0aad5689
      Juergen Gross committed
      Today get_phys_to_machine() is always called when the mfn for a pfn
      is to be obtained. Add a wrapper __pfn_to_mfn() as an inline function
      to be able to avoid calling get_phys_to_machine() when possible as
      soon as the switch to a linear mapped p2m list has been done.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      0aad5689
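A rough sketch of the shape such a wrapper might eventually take: an inline fast path that answers from a flat list and only falls back to an out-of-line function when it has to. All names here (`fast_list`, `slow_get_phys_to_machine`, `pfn_to_mfn_sketch`, `slow_calls`) are hypothetical stand-ins, and the fast-path condition is an assumption; the commit itself only introduces the wrapper.

```c
#include <assert.h> /* assert() is only needed by callers that check results */
#include <stddef.h>
#include <stdint.h>

#define INVALID_P2M_ENTRY (~(uintptr_t)0)
#define LIST_PFNS 8u

/* Hypothetical flat list covering the first LIST_PFNS pfns. */
static uintptr_t fast_list[LIST_PFNS] = {100, 101, 102, 103, 104, 105, 106, 107};

/* Counts how often the out-of-line slow path was taken. */
unsigned long slow_calls;

/* Stand-in for the general, out-of-line lookup function. */
uintptr_t slow_get_phys_to_machine(size_t pfn)
{
    slow_calls++;
    return pfn < LIST_PFNS ? fast_list[pfn] : INVALID_P2M_ENTRY;
}

/* Inline wrapper: common case handled without a function call. */
static inline uintptr_t pfn_to_mfn_sketch(size_t pfn)
{
    if (pfn < LIST_PFNS)
        return fast_list[pfn];
    return slow_get_phys_to_machine(pfn);
}
```

Hiding the slow function behind the wrapper means callers do not change when the fast path is added later.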
    • J
      xen: Delay invalidating extra memory · 5b8e7d80
      Juergen Gross committed
      When the physical memory configuration is initialized the p2m entries
      for unpopulated memory pages are set to "invalid". As those pages
      are beyond the hypervisor-built p2m list the p2m tree has to be
      extended.
      
      This patch delays processing the extra memory related p2m entries
      during the boot process until some more basic memory management
      functions are callable. This removes the need to create new p2m
      entries until virtual memory management is available.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      5b8e7d80
    • J
      xen: Delay m2p_override initialization · 97f4533a
      Juergen Gross committed
      The m2p overrides are used to be able to find the local pfn for a
      foreign mfn mapped into the domain. They are used by driver backends
      having to access frontend data.
      
      As this functionality isn't used in early boot it makes no sense to
      initialize the m2p override functions very early. It can be done
      later without doing any harm, removing the need for allocating memory
      via extend_brk().
      
      While at it make some m2p override functions static as they are only
      used internally.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      97f4533a
    • J
      xen: Delay remapping memory of pv-domain · 1f3ac86b
      Juergen Gross committed
      Early in the boot process the memory layout of a pv-domain is changed
      to match the E820 map (either the host one for Dom0 or the Xen one)
      regarding placement of RAM and PCI holes. This requires removing memory
      pages initially located at positions not suitable for RAM and adding
      them later at higher addresses where no restrictions apply.
      
      To be able to operate on the hypervisor supported p2m list until a
      virtual mapped linear p2m list can be constructed, remapping must
      be delayed until virtual memory management is initialized, as the
      initial p2m list cannot be extended without limit at physical memory
      initialization time due to its fixed structure.
      
      A further advantage is the reduction in complexity and code volume as
      we don't have to be careful regarding memory restrictions during p2m
      updates.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      1f3ac86b
    • J
      xen: use common page allocation function in p2m.c · 7108c9ce
      Juergen Gross committed
      In arch/x86/xen/p2m.c three different allocation functions for
      obtaining a memory page are used: extend_brk(), alloc_bootmem_align()
      or __get_free_page().  Which of those functions is used depends on the
      progress of the boot process of the system.
      
      Introduce a common allocation routine that dynamically selects the
      allocation function to call based on the boot progress. This allows
      moving initialization steps without having to care about changing
      allocation calls.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      7108c9ce
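The idea of one allocation entry point that dispatches on boot progress can be modelled in userspace as below. This is only a sketch under stated assumptions: the stage enum, the three stand-in allocators and `alloc_page_sketch()` are hypothetical names, and `malloc()` plus a static pool take the place of the kernel allocators named in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SZ 4096u
#define EARLY_POOL_PAGES 4u

/* Hypothetical boot phases, one per allocator that is usable in it. */
enum boot_stage { STAGE_EARLY, STAGE_BOOTMEM, STAGE_RUNNING };

enum boot_stage current_stage = STAGE_EARLY;
unsigned calls_per_stage[3];

/* Stand-in for the earliest allocator: hands out pages from a fixed pool. */
static void *early_alloc(void)
{
    static unsigned char pool[EARLY_POOL_PAGES][PAGE_SZ];
    static size_t next;

    assert(next < EARLY_POOL_PAGES);
    return pool[next++];
}

/* Stand-ins for the later allocators. */
static void *bootmem_alloc(void) { return malloc(PAGE_SZ); }
static void *runtime_alloc(void) { return malloc(PAGE_SZ); }

/* Single entry point: callers never name a specific allocator. */
void *alloc_page_sketch(void)
{
    calls_per_stage[current_stage]++;
    switch (current_stage) {
    case STAGE_EARLY:   return early_alloc();
    case STAGE_BOOTMEM: return bootmem_alloc();
    case STAGE_RUNNING: return runtime_alloc();
    }
    return NULL;
}
```

Because callers only use the single entry point, an initialization step can move to a different boot phase without touching its allocation calls.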
    • J
      xen: Make functions static · 820c4db2
      Juergen Gross committed
      Some functions in arch/x86/xen/p2m.c are used locally only. Make them
      static. Rearrange the functions in p2m.c to avoid forward declarations.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      820c4db2
    • J
      xen: fix some style issues in p2m.c · 6f58d89e
      Juergen Gross committed
      The source arch/x86/xen/p2m.c has some coding style issues. Fix them.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      6f58d89e
  2. 10 Nov, 2014 1 commit
  3. 23 Oct, 2014 5 commits
  4. 06 Oct, 2014 1 commit
  5. 03 Oct, 2014 1 commit
  6. 23 Sep, 2014 4 commits
    • D
      x86: remove the Xen-specific _PAGE_IOMAP PTE flag · f955371c
      David Vrabel committed
      The _PAGE_IOMAP PTE flag was only used by Xen PV guests to mark PTEs
      that were used to map I/O regions that are 1:1 in the p2m.  This
      allowed Xen to obtain the correct PFN when converting the MFNs read
      from a PTE back to their PFN.
      
      Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn()
      returns the correct PFN by using a combination of the m2p and p2m to
      determine if an MFN corresponds to a 1:1 mapping in the p2m.
      
      Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for
      future uses of the PTE flag.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: "H. Peter Anvin" <hpa@zytor.com>
      f955371c
    • D
      x86/xen: do not use _PAGE_IOMAP PTE flag for I/O mappings · 7f2f8822
      David Vrabel committed
      Since mfn_to_pfn() returns the correct PFN for identity mappings (as
      used for MMIO regions), the use of _PAGE_IOMAP is not required in
      pte_mfn_to_pfn().
      
      Do not set the _PAGE_IOMAP flag in pte_pfn_to_mfn() and do not use it
      in pte_mfn_to_pfn().
      
      This will allow _PAGE_IOMAP to be removed, making it available for
      future use.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      7f2f8822
    • D
      xen/efi: Directly include needed headers · 342cd340
      Daniel Kiper committed
      I discovered that some needed definitions and declarations come from
      headers which are not included directly. This currently works, but if
      somebody removes the required headers from the currently included headers,
      the build will break. So, just in case, directly include all needed headers.
      Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      342cd340
    • M
      xen/setup: Remap Xen Identity Mapped RAM · 4fbb67e3
      Matt Rushton committed
      Instead of ballooning dom0 memory up and down, this remaps the existing mfns
      that were replaced by the identity map. The reason for this is that the
      existing implementation ballooned memory up and down, which caused dom0
      to have discontiguous pages. In some cases this resulted in the use of bounce
      buffers, which reduced network I/O performance significantly. This change will
      honor the existing order of the pages with the exception of some boundary
      conditions.
      
      To do this we need to update both the Linux p2m table and the Xen m2p table.
      Particular care must be taken when updating the p2m table since it's important
      to limit table memory consumption and reuse the existing leaf pages which get
      freed when an entire leaf page is set to the identity map. To implement this,
      mapping updates are grouped into blocks with table entries getting cached
      temporarily and then released.
      
      On my test system before:
      Total pages: 2105014
      Total contiguous: 1640635
      
      After:
      Total pages: 2105014
      Total contiguous: 2098904
      Signed-off-by: Matthew Rushton <mrushton@amazon.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      4fbb67e3
  7. 16 Sep, 2014 1 commit
  8. 10 Sep, 2014 1 commit
    • S
      x86/xen: don't copy bogus duplicate entries into kernel page tables · 0b5a5063
      Stefan Bader committed
      When RANDOMIZE_BASE (KASLR) is enabled, or the total size of all loaded
      modules exceeds 512 MiB, loading modules fails with a warning
      (and hence a vmalloc allocation failure) because the PTEs for the
      newly-allocated vmalloc address space are not zero.
      
        WARNING: CPU: 0 PID: 494 at linux/mm/vmalloc.c:128
                 vmap_page_range_noflush+0x2a1/0x360()
      
      This is caused by xen_setup_kernel_pagetables() copying
      level2_kernel_pgt into level2_fixmap_pgt, overwriting many non-present
      entries.
      
      Without KASLR, the normal kernel image size only covers the first half
      of level2_kernel_pgt and module space starts after that.
      
      L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[  0..255]->kernel
                                                        [256..511]->module
                                [511]->level2_fixmap_pgt[  0..505]->module
      
      This allows 512 MiB of module vmalloc space to be used before
      having to use the corrupted level2_fixmap_pgt entries.
      
      With KASLR enabled, the kernel image uses the full PUD range of 1G and
      module space starts in the level2_fixmap_pgt. So basically:
      
      L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[0..511]->kernel
                                [511]->level2_fixmap_pgt[0..505]->module
      
      And now no module vmalloc space can be used without using the corrupt
      level2_fixmap_pgt entries.
      
      Fix this by properly converting the level2_fixmap_pgt entries to MFNs,
      and setting level1_fixmap_pgt as read-only.
      
      A number of comments were also using the wrong L3 offset for
      level2_kernel_pgt.  These have been corrected.
      Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: stable@vger.kernel.org
      0b5a5063
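The 512 MiB and 1G figures in this commit message follow from the size of a level-2 page table on x86-64, where each of the 512 entries maps 2 MiB. A small sketch of that arithmetic, with the hypothetical helper name `pmd_span_bytes()`:

```c
#include <assert.h>
#include <stdint.h>

/* On x86-64 each level-2 (PMD) entry maps 2 MiB; a table has 512 entries. */
#define PMD_ENTRY_BYTES   (2ull << 20)
#define ENTRIES_PER_TABLE 512u

/* Bytes covered by the inclusive entry range [first, last] of one table. */
uint64_t pmd_span_bytes(unsigned first, unsigned last)
{
    assert(first <= last && last < ENTRIES_PER_TABLE);
    return (uint64_t)(last - first + 1) * PMD_ENTRY_BYTES;
}
```

Entries 256..511 of level2_kernel_pgt therefore cover 256 x 2 MiB = 512 MiB of module space, and entries 0..511 cover the full 1 GiB that the kernel image range occupies with KASLR.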
  9. 27 Aug, 2014 1 commit
    • C
      x86: Replace __get_cpu_var uses · 89cbc767
      Christoph Lameter committed
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processor's percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as:
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and fewer registers
      are used when code is generated.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processor's instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y);
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++;
      
         Converts to
      
      	__this_cpu_inc(y);
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86@kernel.org
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      89cbc767
  10. 11 Aug, 2014 2 commits
    • D
      x86/xen: use vmap() to map grant table pages in PVH guests · 7d951f3c
      David Vrabel committed
      Commit b7dd0e35 (x86/xen: safely map and unmap grant frames when
      in atomic context) causes PVH guests to crash in
      arch_gnttab_map_shared() when they attempt to map the pages for the
      grant table.
      
      This use of a PV-specific function during the PVH grant table setup is
      non-obvious and not needed.  The standard vmap() function does the
      right thing.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reported-by: Mukesh Rathor <mukesh.rathor@oracle.com>
      Tested-by: Mukesh Rathor <mukesh.rathor@oracle.com>
      Cc: stable@vger.kernel.org
      7d951f3c
    • D
      x86/xen: resume timer irqs early · 8d5999df
      David Vrabel committed
      If the timer irqs are resumed during device resume it is possible in
      certain circumstances for the resume to hang early on, before device
      interrupts are resumed.  For an Ubuntu 14.04 PVHVM guest this would
      occur in ~0.5% of resume attempts.
      
      It is not entirely clear what is occurring at the point of the hang but I
      think a task necessary for the resume calls schedule_timeout(),
      waiting for a timer interrupt (which never arrives).  This failure may
      require specific tasks to be running on the other VCPUs to trigger
      (processes are not frozen during a suspend/resume if PREEMPT is
      disabled).
      
      Add IRQF_EARLY_RESUME to the timer interrupts so they are resumed in
      syscore_resume().
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: stable@vger.kernel.org
      8d5999df
  11. 01 Aug, 2014 1 commit
  12. 30 Jul, 2014 1 commit
    • D
      x86/xen: safely map and unmap grant frames when in atomic context · b7dd0e35
      David Vrabel committed
      arch_gnttab_map_frames() and arch_gnttab_unmap_frames() are called in
      atomic context but were calling alloc_vm_area() which might sleep.
      
      Also, if a driver attempts to allocate a grant ref from an interrupt
      and the table needs expanding, then the CPU may already be in lazy MMU
      mode and apply_to_page_range() will BUG when it tries to re-enable
      lazy MMU mode.
      
      These two functions are only used in PV guests.
      
      Introduce arch_gnttab_init() to allocate the virtual address space in
      advance.
      
      Avoid the use of apply_to_page_range() by saving and using the
      array of PTE addresses from the alloc_vm_area() call (which ensures
      that the required page tables are pre-allocated).
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      b7dd0e35
  13. 19 Jul, 2014 2 commits
    • D
      arch/x86/xen: Silence compiler warnings · c7341d6a
      Daniel Kiper committed
      The compiler complains in the following way when an x86 32-bit kernel
      with Xen support is built:
      
        CC      arch/x86/xen/enlighten.o
      arch/x86/xen/enlighten.c: In function ‘xen_start_kernel’:
      arch/x86/xen/enlighten.c:1726:3: warning: right shift count >= width of type [enabled by default]
      
      That line contains the following EFI initialization code:
      
      boot_params.efi_info.efi_systab_hi = (__u32)(__pa(efi_systab_xen) >> 32);
      
      There is no issue if an x86 64-bit kernel is built. However, the 32-bit case
      generates a warning (even though that code will not be executed, because Xen
      does not work on 32-bit EFI platforms) because __pa() returns an unsigned long,
      which is 32 bits wide there. So move the whole EFI initialization code
      to a separate function and build it conditionally to avoid the above-mentioned
      warning on the x86 32-bit architecture.
      Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
      Reviewed-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      c7341d6a
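The warning arises because shifting a 32-bit value right by 32 bits is not a valid operation in C. The commit avoids it by building the EFI code conditionally; the sketch below shows a different, generic idiom that is not what the patch does: widen the value to 64 bits before shifting. The helper names `phys_hi()` and `phys_lo()` are hypothetical.

```c
#include <assert.h> /* assert() is only needed by callers that check results */
#include <stdint.h>

/*
 * Upper 32 bits of an address held in an unsigned long. The cast to
 * uint64_t happens before the shift, so the shift count is always
 * smaller than the operand width, even where unsigned long is 32 bits.
 */
uint32_t phys_hi(unsigned long pa)
{
    return (uint32_t)((uint64_t)pa >> 32);
}

/* Lower 32 bits of the same address. */
uint32_t phys_lo(unsigned long pa)
{
    return (uint32_t)pa;
}
```

On a platform with a 32-bit unsigned long, `phys_hi()` simply returns 0 instead of triggering the "right shift count >= width of type" diagnostic.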
    • D
      xen: Put EFI machinery in place · be81c8a1
      Daniel Kiper committed
      This patch enables EFI usage under Xen dom0. The standard EFI Linux
      kernel infrastructure cannot be used because it requires direct
      access to EFI data and code. However, in the dom0 case this is not possible
      because the above-mentioned EFI data and code are fully owned and controlled
      by the Xen hypervisor. In this case all calls from dom0 to EFI must
      be requested via a special hypercall, which in turn executes the relevant
      EFI code on behalf of dom0.
      
      When the dom0 kernel boots it checks for EFI availability on the machine.
      If EFI is detected, an artificial EFI system table is filled in.
      Native EFI calls are replaced by functions which mimic them
      by calling the relevant hypercall. Later a pointer to the EFI system table
      is passed to the standard EFI machinery, which continues EFI subsystem
      initialization taking into account that there is no direct access
      to EFI boot services, runtime, tables, structures, etc. After that
      the system runs as usual.
      
      This patch is based on work by Jan Beulich and Tang Liang.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Tang Liang <liang.tang@oracle.com>
      Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      be81c8a1
  14. 15 Jul, 2014 3 commits
    • D
      xen/grant-table: remove support for V2 tables · 438b33c7
      David Vrabel committed
      Since 11c7ff17 (xen/grant-table: Force
      to use v1 of grants.) the code for V2 grant tables is not used.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      438b33c7
    • D
      x86/xen: safely map and unmap grant frames when in atomic context · 162e3717
      David Vrabel committed
      arch_gnttab_map_frames() and arch_gnttab_unmap_frames() are called in
      atomic context but were calling alloc_vm_area() which might sleep.
      
      Also, if a driver attempts to allocate a grant ref from an interrupt
      and the table needs expanding, then the CPU may already be in lazy MMU
      mode and apply_to_page_range() will BUG when it tries to re-enable
      lazy MMU mode.
      
      These two functions are only used in PV guests.
      
      Introduce arch_gnttab_init() to allocate the virtual address space in
      advance.
      
      Avoid the use of apply_to_page_range() by saving and using the
      array of PTE addresses from the alloc_vm_area() call.
      
      N.B. 'alloc_vm_area' pre-allocates the pagetable so there is no need
      to worry about having to do a PGD/PUD/PMD walk (like apply_to_page_range
      does) and we can instead do set_pte.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ----
      [v2: Add comment about alloc_vm_area]
      [v3: Fix compile error found by 0-day bot]
      162e3717
    • K
      xen: Introduce 'xen_nopv' to disable PV extensions for HVM guests. · 8d693b91
      Konrad Rzeszutek Wilk committed
      By default when CONFIG_XEN and CONFIG_XEN_PVHVM kernels are
      run, they will enable the PV extensions (drivers, interrupts, timers,
      etc) - which is the best option for the majority of use cases.
      
      However, in some cases (kexec not fully working, benchmarking)
      we want to disable Xen PV extensions. As such introduce the
      'xen_nopv' parameter that will do it.
      
      This parameter is intended only for HVM guests as the Xen PV
      guests MUST boot with PV extensions. However, even if you use
      'xen_nopv' on Xen PV guests it will be ignored.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      ---
      [v2: s/off/xen_nopv/ per Boris Ostrovsky recommendation.]
      [v3: Add Reviewed-by]
      [v4: Clarify that this is only for HVM guests]
      8d693b91
  15. 18 Jun, 2014 1 commit
  16. 05 Jun, 2014 2 commits
  17. 27 May, 2014 1 commit
  18. 15 May, 2014 4 commits