1. 12 Jan, 2015 1 commit
  2. 08 Jan, 2015 2 commits
  3. 08 Dec, 2014 1 commit
  4. 04 Dec, 2014 2 commits
    • xen: Delay invalidating extra memory · 5b8e7d80
      Juergen Gross authored
      When the physical memory configuration is initialized, the p2m entries
      for memory pages that are not populated are set to "invalid". As those
      pages are beyond the hypervisor-built p2m list, the p2m tree has to be
      extended.
      
      This patch delays processing the extra memory related p2m entries
      during the boot process until some more basic memory management
      functions are callable. This removes the need to create new p2m
      entries until virtual memory management is available.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    • xen: Delay remapping memory of pv-domain · 1f3ac86b
      Juergen Gross authored
      Early in the boot process the memory layout of a pv-domain is changed
      to match the E820 map (either the host one for Dom0 or the Xen one)
      regarding placement of RAM and PCI holes. This requires removing memory
      pages initially located at positions not suitable for RAM and adding
      them later at higher addresses where no restrictions apply.
      
      To be able to operate on the hypervisor-supported p2m list until a
      virtually mapped linear p2m list can be constructed, remapping must
      be delayed until virtual memory management is initialized, as the
      initial p2m list can't be extended without limit at physical memory
      initialization time due to its fixed structure.
      
      A further advantage is the reduction in complexity and code volume as
      we don't have to be careful regarding memory restrictions during p2m
      updates.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  5. 23 Oct, 2014 1 commit
  6. 23 Sep, 2014 1 commit
    • xen/setup: Remap Xen Identity Mapped RAM · 4fbb67e3
      Matt Rushton authored
      Instead of ballooning dom0 memory up and down, this remaps the existing mfns
      that were replaced by the identity map. The reason for this is that the
      existing implementation ballooned memory up and down, which caused dom0
      to have discontiguous pages. In some cases this resulted in the use of bounce
      buffers which reduced network I/O performance significantly. This change will
      honor the existing order of the pages with the exception of some boundary
      conditions.
      
      To do this we need to update both the Linux p2m table and the Xen m2p table.
      Particular care must be taken when updating the p2m table since it's important
      to limit table memory consumption and reuse the existing leaf pages which get
      freed when an entire leaf page is set to the identity map. To implement this,
      mapping updates are grouped into blocks with table entries getting cached
      temporarily and then released.
      
      On my test system before:
      Total pages: 2105014
      Total contiguous: 1640635
      
      After:
      Total pages: 2105014
      Total contiguous: 2098904
      Signed-off-by: Matthew Rushton <mrushton@amazon.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
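As a quick check of the figures quoted in commit 4fbb67e3, the share of contiguous pages can be computed directly from the two counts. This is only illustrative arithmetic in Python; the helper name `contiguous_fraction` is made up for this sketch and is not part of the patch.

```python
def contiguous_fraction(total_pages: int, contiguous_pages: int) -> float:
    """Return the fraction of pages that are contiguous."""
    return contiguous_pages / total_pages


# Counts copied from the commit message ("On my test system ...").
TOTAL_PAGES = 2105014
before = contiguous_fraction(TOTAL_PAGES, 1640635)
after = contiguous_fraction(TOTAL_PAGES, 2098904)

print(f"before: {before:.1%}, after: {after:.1%}")
```

On these numbers the contiguous share rises from roughly 78% to above 99%, which is the improvement the commit message reports.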
  7. 18 Jun, 2014 1 commit
  8. 05 Jun, 2014 2 commits
  9. 15 May, 2014 2 commits
  10. 06 May, 2014 1 commit
    • x86, vdso: Reimplement vdso.so preparation in build-time C · 6f121e54
      Andy Lutomirski authored
      Currently, vdso.so files are prepared and analyzed by a combination
      of objcopy, nm, some linker script tricks, and some simple ELF
      parsers in the kernel.  Replace all of that with plain C code that
      runs at build time.
      
      All five vdso images now generate .c files that are compiled and
      linked in to the kernel image.
      
      This should cause only one userspace-visible change: the loaded vDSO
      images are stripped more heavily than they used to be.  Everything
      outside the loadable segment is dropped.  In particular, this causes
      the section table and section name strings to be missing.  This
      should be fine: real dynamic loaders don't load or inspect these
      tables anyway.  The result is roughly equivalent to eu-strip's
      --strip-sections option.
      
      The purpose of this change is to enable the vvar and hpet mappings
      to be moved to the page following the vDSO load segment.  Currently,
      it is possible for the section table to extend into the page after
      the load segment, so, if we map it, it risks overlapping the vvar or
      hpet page.  This happens whenever the load segment is just under a
      multiple of PAGE_SIZE.
      
      The only real subtlety here is that the old code had a C file with
      inline assembler that did 'call VDSO32_vsyscall' and a linker script
      that defined 'VDSO32_vsyscall = __kernel_vsyscall'.  This most
      likely worked by accident: the linker script entry defines a symbol
      associated with an address as opposed to an alias for the real
      dynamic symbol __kernel_vsyscall.  That caused ld to relocate the
      reference at link time instead of leaving an interposable dynamic
      relocation.  Since the VDSO32_vsyscall hack is no longer needed, I
      now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it
      work.  vdso2c will generate an error and abort the build if the
      resulting image contains any dynamic relocations, so we won't
      silently generate bad vdso images.
      
      (Dynamic relocations are a problem because nothing will even attempt
      to relocate the vdso.)
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
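The page-boundary problem described in commit 6f121e54 (a section table spilling into the page after the load segment) can be modeled with a few lines of arithmetic. This is a minimal sketch, not the vdso2c logic: it assumes the section table sits immediately after the load segment, and the sizes and function names are hypothetical.

```python
PAGE_SIZE = 4096  # typical x86 page size, assumed for this sketch


def pages_spanned(size_bytes: int) -> int:
    """Number of pages needed to hold size_bytes, rounding up."""
    return (size_bytes + PAGE_SIZE - 1) // PAGE_SIZE


def section_table_spills(load_segment_size: int, section_table_size: int) -> bool:
    """True if data placed right after the load segment reaches a new page."""
    image_size = load_segment_size + section_table_size
    return pages_spanned(image_size) > pages_spanned(load_segment_size)


# Hypothetical sizes: a load segment just under two pages, and one mid-page.
near_boundary = section_table_spills(2 * PAGE_SIZE - 16, 256)
mid_page = section_table_spills(PAGE_SIZE + 100, 256)

print(near_boundary, mid_page)
```

Under these assumptions, only the load segment that ends just below a multiple of `PAGE_SIZE` pushes trailing data into the following page. That is the page the commit wants to reserve for the vvar and hpet mappings.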
  11. 30 Jan, 2014 1 commit
  12. 06 Jan, 2014 2 commits
  13. 09 Nov, 2013 1 commit
  14. 20 Aug, 2013 2 commits
    • x86/xen: during early setup, only 1:1 map the ISA region · e201bfcc
      David Vrabel authored
      During early setup, when the reserved regions and MMIO holes are being
      set up as 1:1 in the p2m, clear any mappings instead of making them 1:1
      (except for the ISA region which is expected to be mapped).
      
      This fixes a regression introduced in 3.5 by 83d51ab4 (xen/setup:
      update VA mapping when releasing memory during setup) which caused
      hosts with tboot to fail to boot.
      
      tboot marks a region in the e820 map as unusable and the dom0 kernel
      would attempt to map this region and Xen does not permit unusable
      regions to be mapped by guests.
      
      (XEN)  0000000000000000 - 0000000000060000 (usable)
      (XEN)  0000000000060000 - 0000000000068000 (reserved)
      (XEN)  0000000000068000 - 000000000009e000 (usable)
      (XEN)  0000000000100000 - 0000000000800000 (usable)
      (XEN)  0000000000800000 - 0000000000972000 (unusable)
      
      tboot marked this region as unusable.
      
      (XEN)  0000000000972000 - 00000000cf200000 (usable)
      (XEN)  00000000cf200000 - 00000000cf38f000 (reserved)
      (XEN)  00000000cf38f000 - 00000000cf3ce000 (ACPI data)
      (XEN)  00000000cf3ce000 - 00000000d0000000 (reserved)
      (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
      (XEN)  00000000fe000000 - 0000000100000000 (reserved)
      (XEN)  0000000100000000 - 0000000630000000 (usable)
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • x86/xen: do not identity map UNUSABLE regions in the machine E820 · 3bc38cbc
      David Vrabel authored
      If there are UNUSABLE regions in the machine memory map, dom0 will
      attempt to map them 1:1 which is not permitted by Xen and the kernel
      will crash.
      
      There isn't anything interesting in the UNUSABLE region that the dom0
      kernel needs access to so we can avoid making the 1:1 mapping and
      treat it as RAM.
      
      We only do this for dom0, as that is where the tboot case shows up.
      A PV domU could have an UNUSABLE region in its pseudo-physical map
      and would need to be handled in another patch.
      
      This fixes a boot failure on hosts with tboot.
      
      tboot marks a region in the e820 map as unusable and the dom0 kernel
      would attempt to map this region and Xen does not permit unusable
      regions to be mapped by guests.
      
        (XEN)  0000000000000000 - 0000000000060000 (usable)
        (XEN)  0000000000060000 - 0000000000068000 (reserved)
        (XEN)  0000000000068000 - 000000000009e000 (usable)
        (XEN)  0000000000100000 - 0000000000800000 (usable)
        (XEN)  0000000000800000 - 0000000000972000 (unusable)
      
      tboot marked this region as unusable.
      
        (XEN)  0000000000972000 - 00000000cf200000 (usable)
        (XEN)  00000000cf200000 - 00000000cf38f000 (reserved)
        (XEN)  00000000cf38f000 - 00000000cf3ce000 (ACPI data)
        (XEN)  00000000cf3ce000 - 00000000d0000000 (reserved)
        (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
        (XEN)  00000000fe000000 - 0000000100000000 (reserved)
        (XEN)  0000000100000000 - 0000000630000000 (usable)
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      [v1: Altered the patch and description with domU's with UNUSABLE regions]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
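To make the E820 dump in commit 3bc38cbc easier to inspect, the log lines can be parsed and the UNUSABLE region picked out. This is a small Python sketch for reading the log, not kernel code; the regular expression and the helper name `parse_e820` are assumptions made for this example, and a 4 KiB page size is assumed.

```python
import re

PAGE_SIZE = 4096  # assumed 4 KiB pages

# E820 lines copied from the commit message.
E820_LOG = """\
(XEN)  0000000000000000 - 0000000000060000 (usable)
(XEN)  0000000000060000 - 0000000000068000 (reserved)
(XEN)  0000000000068000 - 000000000009e000 (usable)
(XEN)  0000000000100000 - 0000000000800000 (usable)
(XEN)  0000000000800000 - 0000000000972000 (unusable)
(XEN)  0000000000972000 - 00000000cf200000 (usable)
(XEN)  00000000cf200000 - 00000000cf38f000 (reserved)
(XEN)  00000000cf38f000 - 00000000cf3ce000 (ACPI data)
(XEN)  00000000cf3ce000 - 00000000d0000000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000fe000000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000630000000 (usable)
"""

LINE_RE = re.compile(r"\(XEN\)\s+([0-9a-f]{16}) - ([0-9a-f]{16}) \((.+)\)")


def parse_e820(log: str) -> list[tuple[int, int, str]]:
    """Return (start, end, type) tuples for every E820 line in the log."""
    regions = []
    for line in log.splitlines():
        match = LINE_RE.search(line)
        if match:
            start, end, kind = match.groups()
            regions.append((int(start, 16), int(end, 16), kind))
    return regions


regions = parse_e820(E820_LOG)
unusable = [(start, end) for start, end, kind in regions if kind == "unusable"]
unusable_pages = sum((end - start) // PAGE_SIZE for start, end in unusable)

for start, end in unusable:
    print(f"unusable: {start:#x}-{end:#x} ({unusable_pages} pages)")
```

The single unusable range found this way is the one tboot marked, which dom0 must not identity map.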
  15. 09 Aug, 2013 1 commit
    • xen: Support 64-bit PV guest receiving NMIs · 6efa20e4
      Konrad Rzeszutek Wilk authored
      This is based on a patch that Zhenzhong Duan had sent - which
      was missing some of the remaining pieces. The kernel has the
      logic to handle Xen-type-exceptions using the paravirt interface
      in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME -
      pv_irq_ops.adjust_exception_frame and INTERRUPT_RETURN -
      pv_cpu_ops.iret).
      
      That means the nmi handler (and other exception handlers) use
      the hypervisor iret.
      
      The other changes that would be necessary for this would
      be to translate the NMI_VECTOR to one of the entries on the
      ipi_vector and make xen_send_IPI_mask_allbutself use different
      events.
      
      Fortunately for us commit 1db01b49
      (xen: Clean up apic ipi interface) implemented this and we piggyback
      on the cleanup such that the apic IPI interface will pass the right
      vector value for NMI.
      
      With this patch we can trigger NMIs within a PV guest (only tested
      x86_64).
      
      For this to work with normal PV guests (not initial domain)
      we need the domain to be able to use the APIC ops - they are
      already implemented to use the Xen event channels. For that
      to be turned on in a PV domU we need to remove the masking
      of X86_FEATURE_APIC.
      
      Incidentally that means kgdb will also now work within
      a PV guest without using the 'nokgdbroundup' workaround.
      
      Note that the 32-bit version is different and this patch
      does not enable that.
      
      CC: Lisa Nguyen <lisa@xenapiadmin.com>
      CC: Ben Guthro <benjamin.guthro@citrix.com>
      CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [v1: Fixed up per David Vrabel comments]
      Reviewed-by: Ben Guthro <benjamin.guthro@citrix.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
  16. 15 Jul, 2013 1 commit
    • x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Paul Gortmaker authored
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up(), which are arch independent
      (kernel/cpu.c), are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
  17. 10 Feb, 2013 2 commits
    • x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag · 27be4570
      Len Brown authored
      Remove the 32-bit x86 cmdline param "no-hlt",
      and the cpuinfo_x86.hlt_works_ok that it sets.
      
      If a user wants to avoid HLT, then "idle=poll"
      is much more useful, as it avoids invocation of HLT
      in idle, while "no-hlt" failed to do so.
      
      Indeed, hlt_works_ok was consulted in only 3 places.
      
      First, in /proc/cpuinfo where "hlt_bug yes"
      would be printed if and only if the user booted
      the system with "no-hlt" -- as there was no other code
      to set that flag.
      
      Second, check_hlt() would not invoke halt() if "no-hlt"
      were on the cmdline.
      
      Third, it was consulted in stop_this_cpu(), which is invoked
      by native_machine_halt()/reboot_interrupt()/smp_stop_nmi_callback() --
      all cases where the machine is being shutdown/reset.
      The flag was not consulted in the more frequently invoked
      play_dead()/hlt_play_dead() used in processor offline and suspend.
      
      Since Linux-3.0 there has been a run-time notice upon "no-hlt" invocations
      indicating that it would be removed in 2012.
      Signed-off-by: Len Brown <len.brown@intel.com>
      Cc: x86@kernel.org
    • xen idle: make xen-specific macro xen-specific · 6a377ddc
      Len Brown authored
      This macro is only invoked by Xen,
      so make its definition specific to Xen.
      
      > set_pm_idle_to_default()
      < xen_set_default_idle()
      Signed-off-by: Len Brown <len.brown@intel.com>
      Cc: xen-devel@lists.xensource.com
  18. 24 Sep, 2012 1 commit
    • xen/boot: Disable NUMA for PV guests. · 8d54db79
      Konrad Rzeszutek Wilk authored
      The hypervisor is in charge of allocating the proper "NUMA" memory
      and dealing with the CPU scheduler to keep them bound to the proper
      NUMA node. The PV guests (and PVHVM) have no inkling of where they
      run and do not need to know that right now. In the future we will
      need to inject NUMA configuration data (if a guest spans two or more
      NUMA nodes) so that the kernel can make the right choices. But those
      patches are not yet present.
      
      In the meantime, disable the NUMA capability in the PV guest, which
      also fixes a bootup issue. Andre says:
      
      "we see Dom0 crashes due to the kernel detecting the NUMA topology not
      by ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
      
      This will detect the actual NUMA config of the physical machine, but
      will crash about the mismatch with Dom0's virtual memory. Variation of
      the theme: Dom0 sees what it's not supposed to see.
      
      This happens with the said config option enabled and on a machine where
      this scanning is still enabled (K8 and Fam10h, not Bulldozer class)
      
      We have this dump then:
      NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
      Scanning NUMA topology in Northbridge 24
      Number of physical nodes 4
      Node 0 MemBase 0000000000000000 Limit 0000000040000000
      Node 1 MemBase 0000000040000000 Limit 0000000138000000
      Node 2 MemBase 0000000138000000 Limit 00000001f8000000
      Node 3 MemBase 00000001f8000000 Limit 0000000238000000
      Initmem setup node 0 0000000000000000-0000000040000000
        NODE_DATA [000000003ffd9000 - 000000003fffffff]
      Initmem setup node 1 0000000040000000-0000000138000000
        NODE_DATA [0000000137fd9000 - 0000000137ffffff]
      Initmem setup node 2 0000000138000000-00000001f8000000
        NODE_DATA [00000001f095e000 - 00000001f0984fff]
      Initmem setup node 3 00000001f8000000-0000000238000000
      Cannot find 159744 bytes in node 3
      BUG: unable to handle kernel NULL pointer dereference at (null)
      IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
      Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
      RIP: e030:[<ffffffff81d220e6>]  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
      .. snip..
        [<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
        [<ffffffff81d23348>] sparse_init+0xe4/0x25a
        [<ffffffff81d16840>] paging_init+0x13/0x22
        [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
        [<ffffffff81683954>] ? printk+0x3c/0x3e
        [<ffffffff81d01a38>] start_kernel+0xe5/0x468
        [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
        [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
        [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
      "
      
      so we just disable NUMA scanning by setting numa_off=1.
      
      CC: stable@vger.kernel.org
      Reported-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
      Acked-by: Andre Przywara <andre.przywara@amd.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  19. 23 Aug, 2012 2 commits
    • Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and... · 51faaf2b
      Konrad Rzeszutek Wilk authored
      Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas."
      
      This reverts commit 806c312e and
      commit 59b29440.
      
      And also documents setup.c and why we want to do it that way, which
      is that we tried to make the memblock_reserve more selective so
      that it would be clear what region is reserved. Sadly we ran
      into the problem wherein on a 64-bit hypervisor with a 32-bit
      initial domain, the pt_base has the cr3 value which is not
      necessarily where the pagetable starts! As Jan put it: "
      Actually, the adjustment turns out to be correct: The page
      tables for a 32-on-64 dom0 get allocated in the order "first L1",
      "first L2", "first L3", so the offset to the page table base is
      indeed 2. When reading xen/include/public/xen.h's comment
      very strictly, this is not a violation (since there nothing is said
      that the first thing in the page table space is pointed to by
      pt_base; I admit that this seems to be implied though, namely
      do I think that it is implied that the page table space is the
      range [pt_base, pt_base + nt_pt_frames), whereas that
      range here indeed is [pt_base - 2, pt_base - 2 + nt_pt_frames),
      which - without a priori knowledge - the kernel would have
      difficulty to figure out)." - so let's just fall back to the
      easy way and reserve the whole region.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M. · c96aae1f
      Konrad Rzeszutek Wilk authored
      When we are finished with returning PFNs to the hypervisor, then
      populating them back, and also marking the E820 MMIO and E820 gaps
      as IDENTITY_FRAMEs, we then call P2M to set areas that can
      be used for ballooning. We were off by one, and ended up
      overwriting a P2M entry that most likely was an IDENTITY_FRAME.
      For example:
      
      1-1 mapping on 40000->40200
      1-1 mapping on bc558->bc5ac
      1-1 mapping on bc5b4->bc8c5
      1-1 mapping on bc8c6->bcb7c
      1-1 mapping on bcd00->100000
      Released 614 pages of unused memory
      Set 277889 page(s) to 1-1 mapping
      Populating 40200-40466 pfn range: 614 pages added
      
      => here we set from 40466 up to bc559 P2M tree to be
      INVALID_P2M_ENTRY. We should have done it up to bc558.
      
      The end result is that if anybody is trying to construct
      a PTE for PFN bc558 they end up with ~PAGE_PRESENT.
      
      CC: stable@vger.kernel.org
      Reported-by-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
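The off-by-one described in commit c96aae1f can be reproduced with a toy model of the P2M. This is a minimal Python sketch, not the kernel's P2M tree: a plain dictionary stands in for the table, and `mark_invalid` and the constants are names invented for this illustration. The PFN values come from the log in the commit message.

```python
IDENTITY = "identity"
INVALID = "invalid"


def mark_invalid(p2m: dict[int, str], start_pfn: int, end_pfn: int) -> None:
    """Mark the half-open PFN range [start_pfn, end_pfn) as invalid."""
    for pfn in range(start_pfn, end_pfn):
        p2m[pfn] = INVALID


BALLOON_START = 0x40466   # first PFN after the populated 40200-40466 range
IDENTITY_START = 0xBC558  # first PFN of the next 1-1 mapped range

# Buggy behaviour: the range ends one PFN too late and clobbers bc558.
buggy = {IDENTITY_START: IDENTITY}
mark_invalid(buggy, BALLOON_START, IDENTITY_START + 1)

# Fixed behaviour: the range stops at bc558, leaving the identity entry alone.
fixed = {IDENTITY_START: IDENTITY}
mark_invalid(fixed, BALLOON_START, IDENTITY_START)

print(buggy[IDENTITY_START], fixed[IDENTITY_START])
```

Treating the range as half-open makes the boundary explicit: the end PFN is the first entry that must not be touched.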
  20. 22 Aug, 2012 1 commit
  21. 20 Jul, 2012 1 commit
    • xen: populate correct number of pages when across mem boundary (v2) · c3d93f88
      zhenzhong.duan authored
      When populating pages across a mem boundary at bootup, the populated
      page count isn't correct. This is because mem populated into a non-mem
      region is ignored.
      
      The pfn range is also wrongly aligned when the mem boundary isn't page aligned.
      
      For a dom0 booted with dom_mem=3368952K(0xcd9ff000-4k) dmesg diff is:
       [    0.000000] Freeing 9e-100 pfn range: 98 pages freed
       [    0.000000] 1-1 mapping on 9e->100
       [    0.000000] 1-1 mapping on cd9ff->100000
       [    0.000000] Released 98 pages of unused memory
       [    0.000000] Set 206435 page(s) to 1-1 mapping
      -[    0.000000] Populating cd9fe-cda00 pfn range: 1 pages added
      +[    0.000000] Populating cd9fe-cd9ff pfn range: 1 pages added
      +[    0.000000] Populating 100000-100061 pfn range: 97 pages added
       [    0.000000] BIOS-provided physical RAM map:
       [    0.000000] Xen: 0000000000000000 - 000000000009e000 (usable)
       [    0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
       [    0.000000] Xen: 0000000000100000 - 00000000cd9ff000 (usable)
       [    0.000000] Xen: 00000000cd9ffc00 - 00000000cda53c00 (ACPI NVS)
      ...
       [    0.000000] Xen: 0000000100000000 - 0000000100061000 (usable)
       [    0.000000] Xen: 0000000100061000 - 000000012c000000 (unusable)
      ...
       [    0.000000] MEMBLOCK configuration:
      ...
      -[    0.000000]  reserved[0x4]       [0x000000cd9ff000-0x000000cd9ffbff], 0xc00 bytes
      -[    0.000000]  reserved[0x5]       [0x00000100000000-0x00000100060fff], 0x61000 bytes
      
      Related xen memory layout:
      (XEN) Xen-e820 RAM map:
      (XEN)  0000000000000000 - 000000000009ec00 (usable)
      (XEN)  00000000000f0000 - 0000000000100000 (reserved)
      (XEN)  0000000000100000 - 00000000cd9ffc00 (usable)
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      [v2: If xen_do_chunk fail(populate), abort this chunk and any others]
      Suggested by David, thanks.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
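The corrected behaviour in commit c3d93f88 (populate only inside RAM regions, with PFN bounds rounded inward) can be sketched as follows. This is a simplified Python model, not the code in setup.c: `populate_plan` and the helpers are hypothetical names, the first region is taken from the Xen-e820 map in the commit message, and the second region's end address is an assumption based on the dmesg lines above 4 GiB.

```python
PAGE_SHIFT = 12  # 4 KiB pages


def pfn_up(addr: int) -> int:
    """Round a byte address up to the next page frame number."""
    return (addr + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT


def pfn_down(addr: int) -> int:
    """Round a byte address down to a page frame number."""
    return addr >> PAGE_SHIFT


def populate_plan(ram_regions, first_free_pfn, pages_to_add):
    """Split pages_to_add over RAM regions, never leaving a region."""
    plan = []
    for start, end in ram_regions:
        if pages_to_add == 0:
            break
        start_pfn = max(pfn_up(start), first_free_pfn)
        end_pfn = pfn_down(end)  # a partial trailing page is not usable RAM
        if start_pfn >= end_pfn:
            continue
        count = min(end_pfn - start_pfn, pages_to_add)
        plan.append((start_pfn, start_pfn + count))
        pages_to_add -= count
    return plan


RAM_REGIONS = [
    (0x100000, 0xCD9FFC00),        # from the Xen-e820 RAM map in the log
    (0x100000000, 0x12C000000),    # assumed extent of RAM above 4 GiB
]

# dom0 ends at PFN 0xcd9fe and 98 pages were released earlier in boot.
plan = populate_plan(RAM_REGIONS, first_free_pfn=0xCD9FE, pages_to_add=98)
for start_pfn, end_pfn in plan:
    print(f"Populating {start_pfn:x}-{end_pfn:x} pfn range: "
          f"{end_pfn - start_pfn} pages added")
```

With these inputs the plan yields one page below the boundary and 97 pages starting at PFN 0x100000, matching the two "+" lines in the dmesg diff.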
  22. 30 May, 2012 1 commit
    • xen/balloon: Subtract from xen_released_pages the count that is populated. · 58b7b53a
      Konrad Rzeszutek Wilk authored
      We did not take into account that xen_released_pages would be
      used outside the initial E820 parsing code. As such we
      did not subtract from xen_released_pages the count of pages
      that we had populated back (instead we just did a simple
      extra_pages = released - populated).
      
      The balloon driver uses xen_released_pages to set the initial
      current_pages count.  If this is wrong (too low) then when a new
      (higher) target is set, the balloon driver will request too many pages
      from Xen.
      
      This fixes errors such as:
      
      (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (51 of 512)
      during bootup and
      free_memory            : 0
      
      where the free_memory should be 128.
      Acked-by: David Vrabel <david.vrabel@citrix.com>
      [v1: Per David's review made the git commit better]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
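The accounting problem in commit 58b7b53a can be illustrated with a small model. This is a sketch under stated assumptions, not the balloon driver: it assumes the initial page count is derived as `nr_pages - released_pages`, and all numbers and function names here are hypothetical.

```python
def initial_current_pages(nr_pages: int, released_pages: int) -> int:
    """Assumed model of the balloon driver's initial current_pages."""
    return nr_pages - released_pages


def pages_requested(target_pages: int, current_pages: int) -> int:
    """Pages the balloon driver would ask Xen for to reach the target."""
    return max(target_pages - current_pages, 0)


# Hypothetical figures: 100 pages released, then all 100 populated back.
NR_PAGES = 1000
RELEASED = 100
POPULATED = 100
TARGET = 1128

# Before the fix: populated pages are not subtracted from the released count.
buggy_current = initial_current_pages(NR_PAGES, RELEASED)
# After the fix: the released count reflects what was populated back.
fixed_current = initial_current_pages(NR_PAGES, RELEASED - POPULATED)

buggy_request = pages_requested(TARGET, buggy_current)
fixed_request = pages_requested(TARGET, fixed_current)
print(buggy_request, fixed_request)
```

In this model the buggy request exceeds the correct one by exactly the number of populated pages, which is the over-request the commit message describes.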
  23. 08 May, 2012 4 commits
    • xen/setup: update VA mapping when releasing memory during setup · 83d51ab4
      David Vrabel authored
      In xen_memory_setup(), if a page that is being released has a VA
      mapping this must also be updated.  Otherwise, the page will not be
      released completely -- it will still be referenced in Xen and won't be
      freed until the mapping is removed and this prevents it from being
      reallocated at a different PFN.
      
      This was already being done for the ISA memory region in
      xen_ident_map_ISA() but on many systems this was omitting a few pages
      as many systems marked a few pages below the ISA memory region as
      reserved in the e820 map.
      
      This fixes errors such as:
      
      (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
      (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/setup: Combine the two hypercall functions - since they are quite similar. · 96dc08b3
      Konrad Rzeszutek Wilk authored
      They use the same set of arguments, so it is just the matter
      of using the proper hypercall.
      Acked-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM · 2e2fb754
      Konrad Rzeszutek Wilk authored
      When the Xen hypervisor boots a PV kernel it hands it two pieces
      of information: nr_pages and a made up E820 entry.
      
      The nr_pages value defines the range from zero to nr_pages of PFNs
      which have a valid Machine Frame Number (MFN) underneath it. The
      E820 mirrors that (with the VGA hole):
      BIOS-provided physical RAM map:
       Xen: 0000000000000000 - 00000000000a0000 (usable)
       Xen: 00000000000a0000 - 0000000000100000 (reserved)
       Xen: 0000000000100000 - 0000000080800000 (usable)
      
      The fun comes when a PV guest is run with a machine E820 - that
      can either be the initial domain or a PCI PV guest, where the E820
      looks like the normal thing:
      
      BIOS-provided physical RAM map:
       Xen: 0000000000000000 - 000000000009e000 (usable)
       Xen: 000000000009ec00 - 0000000000100000 (reserved)
       Xen: 0000000000100000 - 0000000020000000 (usable)
       Xen: 0000000020000000 - 0000000020200000 (reserved)
       Xen: 0000000020200000 - 0000000040000000 (usable)
       Xen: 0000000040000000 - 0000000040200000 (reserved)
       Xen: 0000000040200000 - 00000000bad80000 (usable)
       Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS)
      ..
      With that overlaying the nr_pages directly on the E820 does not
      work as there are gaps and non-RAM regions that won't be used
      by the memory allocator. The 'xen_release_chunk' helps with that
      by punching holes in the P2M (PFN to MFN lookup tree) for those
      regions and tells us that:
      
      Freeing  20000-20200 pfn range: 512 pages freed
      Freeing  40000-40200 pfn range: 512 pages freed
      Freeing  bad80-badf4 pfn range: 116 pages freed
      Freeing  badf6-bae7f pfn range: 137 pages freed
      Freeing  bb000-100000 pfn range: 282624 pages freed
      Released 283999 pages of unused memory
      
      Those 283999 pages are subtracted from the nr_pages and are returned
      to the hypervisor. The end result is that the initial domain
      boots with 1GB less memory as the nr_pages has been subtracted by
      the amount of pages residing within the PCI hole. It can balloon up
      to that if desired using 'xl mem-set 0 8092', but the balloon driver
      is not always compiled in for the initial domain.
      
      This patch implements the populate hypercall (XENMEM_populate_physmap),
      which increases the domain's memory by the same number of pages that
      were released.
      
      The other solution (that did not work) was to transplant the MFN in
      the P2M tree - the ones that were going to be freed were put in
      the E820_RAM regions past the nr_pages. But the modifications to the
      M2P array (the other side of creating PTEs) were not carried away.
      As the hypervisor is the only one capable of modifying that and the
      only two hypercalls that would do this are: the update_va_mapping
      (which won't work, as during initial bootup only PFNs up to nr_pages
      are mapped in the guest) or via the populate hypercall.
      
      The end result is that the kernel can now boot with the
      nr_pages without having to subtract the 283999 pages.
      
      On a 8GB machine, with various dom0_mem= parameters this is what we get:
      
      no dom0_mem
      -Memory: 6485264k/9435136k available (5817k kernel code, 1136060k absent, 1813812k reserved, 2899k data, 696k init)
      +Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init)
      
      dom0_mem=3G
      -Memory: 2616536k/9435136k available (5817k kernel code, 1136060k absent, 5682540k reserved, 2899k data, 696k init)
      +Memory: 2703776k/9435136k available (5817k kernel code, 1136060k absent, 5595300k reserved, 2899k data, 696k init)
      
      dom0_mem=max:3G
      -Memory: 2696732k/4281724k available (5817k kernel code, 1136060k absent, 448932k reserved, 2899k data, 696k init)
      +Memory: 2702204k/4281724k available (5817k kernel code, 1136060k absent, 443460k reserved, 2899k data, 696k init)
      
      And the 'xm list' or 'xl list' now reflect what the dom0_mem=
      argument is.
      Acked-by: David Vrabel <david.vrabel@citrix.com>
      [v2: Use populate hypercall]
      [v3: Remove debug printks]
      [v4: Simplify code]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
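The "Freeing ... pfn range" lines quoted in commit 2e2fb754 follow directly from the PFN bounds, and the released total explains the "1GB less memory" remark. The Python sketch below only redoes that arithmetic; `pages_in_range` is a name made up for this example, 4 KiB pages are assumed, and the released total is taken from the log as printed rather than recomputed.

```python
PAGE_SIZE_KIB = 4  # assumed 4 KiB pages

# (start_pfn, end_pfn) pairs copied from the "Freeing" lines in the log.
FREED_RANGES = [
    (0x20000, 0x20200),
    (0x40000, 0x40200),
    (0xBAD80, 0xBADF4),
    (0xBADF6, 0xBAE7F),
    (0xBB000, 0x100000),
]


def pages_in_range(start_pfn: int, end_pfn: int) -> int:
    """Number of pages in the half-open PFN range [start_pfn, end_pfn)."""
    return end_pfn - start_pfn


counts = [pages_in_range(start, end) for start, end in FREED_RANGES]

# Total reported by the kernel ("Released 283999 pages of unused memory").
RELEASED_PAGES = 283999
released_mib = RELEASED_PAGES * PAGE_SIZE_KIB / 1024

print(counts)
print(f"released: about {released_mib:.0f} MiB")
```

Each computed count matches the figure printed on the corresponding log line, and the released total comes to a little over 1 GiB.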
    • xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0 · ca118238
      Konrad Rzeszutek Wilk authored
      Otherwise we can get these meaningless:
      Freeing  bad80-badf4 pfn range: 0 pages freed
      
      We also can do this for the summary ones - no point of printing
      "Set 0 page(s) to 1-1 mapping"
      Acked-by: David Vrabel <david.vrabel@citrix.com>
      [v1: Extended to the summary printks]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  24. 21 Mar, 2012 1 commit
  25. 11 Mar, 2012 1 commit
  26. 16 Dec, 2011 1 commit
    • xen: only limit memory map to maximum reservation for domain 0. · d3db7281
      Ian Campbell authored
      d312ae87 "xen: use maximum reservation to limit amount of usable RAM"
      clamped the total amount of RAM to the current maximum reservation. This is
      correct for dom0 but is not correct for guest domains. To boot a guest
      "pre-ballooned" (e.g. with memory=1G but maxmem=2G) and so allow for future
      memory expansion, the guest must derive max_pfn from the e820 provided by
      the toolstack and not from the current maximum reservation (which can reflect
      only the current maximum, not the guest lifetime max). The existing algorithm
      already behaves correctly if we do not artificially limit the maximum
      number of pages for the guest case.
      
      For a guest booted with maxmem=512, memory=128 this results in:
       [    0.000000] BIOS-provided physical RAM map:
       [    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
       [    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
      -[    0.000000]  Xen: 0000000000100000 - 0000000008100000 (usable)
      -[    0.000000]  Xen: 0000000008100000 - 0000000020800000 (unusable)
      +[    0.000000]  Xen: 0000000000100000 - 0000000020800000 (usable)
      ...
       [    0.000000] NX (Execute Disable) protection: active
       [    0.000000] DMI not present or invalid.
       [    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
       [    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
      -[    0.000000] last_pfn = 0x8100 max_arch_pfn = 0x1000000
      +[    0.000000] last_pfn = 0x20800 max_arch_pfn = 0x1000000
       [    0.000000] initial memory mapped : 0 - 027ff000
       [    0.000000] Base memory trampoline at [c009f000] 9f000 size 4096
      -[    0.000000] init_memory_mapping: 0000000000000000-0000000008100000
      -[    0.000000]  0000000000 - 0008100000 page 4k
      -[    0.000000] kernel direct mapping tables up to 8100000 @ 27bb000-27ff000
      +[    0.000000] init_memory_mapping: 0000000000000000-0000000020800000
      +[    0.000000]  0000000000 - 0020800000 page 4k
      +[    0.000000] kernel direct mapping tables up to 20800000 @ 26f8000-27ff000
       [    0.000000] xen: setting RW the range 27e8000 - 27ff000
       [    0.000000] 0MB HIGHMEM available.
      -[    0.000000] 129MB LOWMEM available.
      -[    0.000000]   mapped low ram: 0 - 08100000
      -[    0.000000]   low ram: 0 - 08100000
      +[    0.000000] 520MB LOWMEM available.
      +[    0.000000]   mapped low ram: 0 - 20800000
      +[    0.000000]   low ram: 0 - 20800000
      
      With this change "xl mem-set <domain> 512M" will successfully increase the
      guest RAM (by reducing the balloon).
      
      There is no change for dom0.
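
      The effect on last_pfn can be sketched with a small model. This is not the
      kernel code: page_limit and last_pfn are hypothetical helpers, and the
      reservation of 0x8100 pages is read off the "before" log on the assumption
      that it equals the guest's maximum reservation at boot.

```python
PAGE_SIZE = 4096  # 4 KiB pages, as in the "page 4k" log lines


def pfn_to_mib(pfn):
    """Size in MiB of the memory below the given page frame number."""
    return pfn * PAGE_SIZE // (1024 * 1024)


def page_limit(is_dom0, max_reservation_pages, arch_limit_pages, clamp_guests=False):
    """Upper bound on usable pages.

    clamp_guests=True models the behaviour before this change, where every
    domain was clamped to its current maximum reservation.
    """
    if is_dom0 or clamp_guests:
        return min(max_reservation_pages, arch_limit_pages)
    return arch_limit_pages


def last_pfn(e820_end_pfn, limit_pages):
    """Last usable PFN: the end of e820 RAM, cut off at the page limit."""
    return min(e820_end_pfn, limit_pages)


# Values taken from the log above (maxmem=512, memory=128 guest).
E820_END_PFN = 0x20800      # end of the toolstack-provided usable region
RESERVATION = 0x8100        # assumed current maximum reservation, in pages
MAX_ARCH_PFN = 0x1000000    # max_arch_pfn from the log

OLD = last_pfn(E820_END_PFN, page_limit(False, RESERVATION, MAX_ARCH_PFN, clamp_guests=True))
NEW = last_pfn(E820_END_PFN, page_limit(False, RESERVATION, MAX_ARCH_PFN))
DOM0 = last_pfn(E820_END_PFN, page_limit(True, RESERVATION, MAX_ARCH_PFN))

if __name__ == "__main__":
    print(hex(OLD), pfn_to_mib(OLD), "MiB")
    print(hex(NEW), pfn_to_mib(NEW), "MiB")
```

      With these inputs the model gives the same last_pfn values and LOWMEM sizes
      as the log: 0x8100 (129MB) before and 0x20800 (520MB) after, while the dom0
      case is still clamped.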
      Reported-and-Tested-by: George Shuklin <george.shuklin@gmail.com>
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Cc: stable@kernel.org
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      d3db7281
  27. 04 Dec, 2011 1 commit
    • K
      xen/pm_idle: Make pm_idle be default_idle under Xen. · e5fd47bf
      Konrad Rzeszutek Wilk committed
      The idea behind commit d91ee586 ("cpuidle: replace xen access to x86
      pm_idle and default_idle") was to have one call - disable_cpuidle() -
      which would keep other code from changing pm_idle.  It prevents pm_idle
      from being set to cpuidle_idle_call (which is excellent).
      
      But in select_idle_routine() and idle_setup(), pm_idle can still be set
      to one of amd_e400_idle, mwait_idle or default_idle.  The choice depends
      on CPU flags (MWAIT) and, in the AMD case, on the type of CPU.
      
      In the case of mwait_idle we can hit some instances where the hypervisor
      (Amazon EC2 specifically) sets the MWAIT flag and we get:
      
        Brought up 2 CPUs
        invalid opcode: 0000 [#1] SMP
      
        Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1
        RIP: e030:[<ffffffff81015d1d>]  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
        ...
        Call Trace:
         [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8
         [<ffffffff8149ee78>] cpu_bringup_and_idle+0xe/0x10
        RIP  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
         RSP <ffff8801d28ddf10>
      
      In the case of amd_e400_idle we don't get such spectacular crashes, but we
      do end up making an MSR access which is trapped in the hypervisor, and then
      follow it up with a yield hypercall.  This means we end up going to the
      hypervisor twice instead of just once.
      
      Before v3.0, pm_idle was set to default_idle regardless of
      select_idle_routine/idle_setup.
      
      We want to do that, but only for one specific case: Xen.  This patch
      does that.
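
      The intended outcome can be modelled as follows. This is only a sketch of
      the decision described above, not the kernel implementation: the function
      select_idle and its boolean parameters are hypothetical, and the order in
      which the non-Xen cases are checked is an assumption.

```python
def select_idle(has_mwait, is_amd_e400, running_on_xen):
    """Return the name of the idle routine pm_idle should end up pointing at.

    Hypothetical model: under Xen the CPU flags are ignored and default_idle
    is always used; otherwise the choice follows the CPU flags.
    """
    if running_on_xen:
        # The hypervisor may advertise MWAIT even though executing it in the
        # guest faults, so never trust the flag here.
        return "default_idle"
    if has_mwait:
        return "mwait_idle"
    if is_amd_e400:
        return "amd_e400_idle"
    return "default_idle"


if __name__ == "__main__":
    # The EC2 case from the oops above: MWAIT advertised, running under Xen.
    print(select_idle(has_mwait=True, is_amd_e400=False, running_on_xen=True))
```

      Bare-metal behaviour is unchanged in this model; only the Xen case is
      forced to default_idle.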
      
      Fixes RH BZ #739499 and Ubuntu #881076
      Reported-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e5fd47bf
  28. 29 Sep, 2011 2 commits
    • D
      xen: release all pages within 1-1 p2m mappings · f3f436e3
      David Vrabel committed
      In xen_memory_setup() all reserved regions and gaps are set to an
      identity (1-1) p2m mapping.  If an available page has a PFN within one
      of these 1-1 mappings it will become inaccessible (as its MFN is lost)
      so release them before setting up the mapping.
      
      This can make an additional 256 MiB or more of RAM available
      (depending on the size of the reserved regions in the memory map) if
      the initial pages overlap with reserved regions.
      
      The 1:1 p2m mappings are also extended to cover partial pages.  This
      fixes an issue with (for example) systems with a BIOS that puts the
      DMI tables in a reserved region that begins on a non-page boundary.
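
      A minimal sketch of the two ideas above: rounding a reserved byte range
      outward to whole pages, and finding the populated PFNs that fall inside it
      and so need releasing first. The helper names and the example addresses
      are illustrative assumptions, not taken from the patch.

```python
PAGE_SHIFT = 12  # 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


def pfn_down(addr):
    """PFN of the page containing addr (round down)."""
    return addr >> PAGE_SHIFT


def pfn_up(addr):
    """First PFN at or above addr (round up)."""
    return (addr + PAGE_SIZE - 1) >> PAGE_SHIFT


def identity_pfn_range(start, end):
    """Half-open PFN range covering every page touched by bytes [start, end).

    Rounding the start down and the end up is what makes partial pages part
    of the 1-1 mapping.
    """
    return pfn_down(start), pfn_up(end)


def pfns_to_release(identity_range, populated_pfns):
    """Populated PFNs inside the 1-1 range; release these before mapping."""
    first, last = identity_range
    return [pfn for pfn in populated_pfns if first <= pfn < last]


if __name__ == "__main__":
    # Hypothetical reserved region that starts in the middle of a page.
    rng = identity_pfn_range(0x9F800, 0x100000)
    print([hex(p) for p in rng])
    print([hex(p) for p in pfns_to_release(rng, range(0x9C, 0xA4))])
```

      For the hypothetical region 0x9f800-0x100000 the range becomes PFNs
      0x9f-0x100, so the partially covered page 0x9f is included.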
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      f3f436e3
    • D
      xen: allow extra memory to be in multiple regions · dc91c728
      David Vrabel committed
      Allow the extra memory (used by the balloon driver) to be in multiple
      regions (typically two regions, one for low memory and one for high
      memory).  This allows the balloon driver to increase the number of
      available low pages (if the initial number of pages is small).
      
      As a side effect, the algorithm for building the e820 memory map is
      simpler and more obviously correct as the map supplied by the
      hypervisor is (almost) used as is (in particular, all reserved regions
      and gaps are preserved).  Only RAM regions are altered and RAM regions
      above max_pfn + extra_pages are marked as unused (the region is split
      in two if necessary).
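
      The map-building rule described above can be sketched like this. It is a
      simplified model, not the kernel code: regions are plain (start, end, type)
      tuples, trim_ram is a hypothetical helper, and the example map and page
      counts are illustrative.

```python
PAGE_SIZE = 4096


def trim_ram(regions, max_pfn, extra_pages):
    """Copy the hypervisor map, marking RAM above the limit as unused.

    regions is a list of (start, end, type) byte ranges with end exclusive.
    Non-RAM regions are passed through untouched; a RAM region that straddles
    the limit is split in two.
    """
    limit = (max_pfn + extra_pages) * PAGE_SIZE
    result = []
    for start, end, kind in regions:
        if kind != "usable" or end <= limit:
            result.append((start, end, kind))        # keep as is
        elif start >= limit:
            result.append((start, end, "unused"))    # entirely above the limit
        else:
            result.append((start, limit, "usable"))  # split at the limit
            result.append((limit, end, "unused"))
    return result


EXAMPLE_MAP = [
    (0x0, 0xA0000, "usable"),
    (0xA0000, 0x100000, "reserved"),
    (0x100000, 0x20800000, "usable"),
]

TRIMMED = trim_ram(EXAMPLE_MAP, max_pfn=0x8000, extra_pages=0x100)

if __name__ == "__main__":
    for start, end, kind in TRIMMED:
        print(f"{start:#012x} - {end:#012x} ({kind})")
```

      Here the reserved region survives unchanged and only the last RAM region
      is split, at (0x8000 + 0x100) pages = 0x8100000.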
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      dc91c728