- 06 May, 2014: 1 commit

Committed by Andy Lutomirski
Currently, vdso.so files are prepared and analyzed by a combination of objcopy, nm, some linker script tricks, and some simple ELF parsers in the kernel. Replace all of that with plain C code that runs at build time.

All five vdso images now generate .c files that are compiled and linked into the kernel image.

This should cause only one userspace-visible change: the loaded vDSO images are stripped more heavily than they used to be. Everything outside the loadable segment is dropped. In particular, this causes the section table and section name strings to be missing. This should be fine: real dynamic loaders don't load or inspect these tables anyway. The result is roughly equivalent to eu-strip's --strip-sections option.

The purpose of this change is to enable the vvar and hpet mappings to be moved to the page following the vDSO load segment. Currently, it is possible for the section table to extend into the page after the load segment, so, if we map it, it risks overlapping the vvar or hpet page. This happens whenever the load segment is just under a multiple of PAGE_SIZE.

The only real subtlety here is that the old code had a C file with inline assembler that did 'call VDSO32_vsyscall' and a linker script that defined 'VDSO32_vsyscall = __kernel_vsyscall'. This most likely worked by accident: the linker script entry defines a symbol associated with an address as opposed to an alias for the real dynamic symbol __kernel_vsyscall. That caused ld to relocate the reference at link time instead of leaving an interposable dynamic relocation. Since the VDSO32_vsyscall hack is no longer needed, I now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it work. vdso2c will generate an error and abort the build if the resulting image contains any dynamic relocations, so we won't silently generate bad vdso images. (Dynamic relocations are a problem because nothing will even attempt to relocate the vdso.)

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
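The "abort the build if the image contains any dynamic relocations" check can be sketched as a scan of the dynamic section. This is a minimal sketch, not vdso2c's actual code. It assumes a 64-bit image whose dynamic section has already been located and is terminated by DT_NULL. The helper name has_dynamic_relocs() and the exact set of rejected tags are illustrative assumptions.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: return true if the DT_NULL-terminated dynamic
 * section contains any tag that describes dynamic relocations. The set
 * of tags checked here is an assumption made for illustration.
 */
static bool has_dynamic_relocs(const Elf64_Dyn *dyn, size_t max_entries)
{
	for (size_t i = 0; i < max_entries && dyn[i].d_tag != DT_NULL; i++) {
		switch (dyn[i].d_tag) {
		case DT_REL:
		case DT_RELA:
		case DT_JMPREL:
		case DT_TEXTREL:
			return true;	/* something would need relocating */
		default:
			break;
		}
	}
	return false;
}
```

A build tool would call this after finding the PT_DYNAMIC segment and exit with an error when it returns true. Nothing relocates the vdso at load time, so any such entry means the image is broken.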

- 30 Jan, 2014: 1 commit

Committed by Andi Kleen
LTO requires symbol types to be consistent across all files, so "nmi" cannot be declared as a char [] here; it needs to use the correct function type.

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1382458079-24450-8-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

- 06 Jan, 2014: 2 commits

Committed by Mukesh Rathor
In xen_add_extra_mem() we can skip updating the P2M, as it is managed by Xen. PVH maps the entire I/O space, but only RAM pages need to be repopulated.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Committed by Mukesh Rathor
We don't use the filtering that 'xen_cpuid' does, because the hypervisor treats 'XEN_EMULATE_PREFIX' as an invalid instruction. This means that all of the filtering will have to be done in the hypervisor/toolstack.

Without the filtering we expose to the guest:
 - the CPU topology (sockets, cores, etc.);
 - APERF (which the generic scheduler likes to use), see 5e626254 "xen/setup: filter APERFMPERF cpuid feature out";
 - the inability to determine whether MWAIT_LEAF should be exposed or not, see df88b2d9 "xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded.";
 - x2APIC, see 4ea9b9ac "xen: mask x2APIC feature in PV".

We also check for the vector callback early on, as it is a required feature. PVH also runs at the default kernel IOPL.

Finally, pure PV settings are moved to a separate function that is only called for pure PV, i.e., PV with pvmmu. They are also wrapped in #ifdef CONFIG_XEN_PVMMU.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
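For comparison, the kind of feature masking that the PV cpuid path performs can be sketched as follows. Per the commit, this masking must now happen in the hypervisor/toolstack for PVH. The bit positions are the architectural CPUID ones: leaf 1 ECX bit 3 is MONITOR/MWAIT, leaf 1 ECX bit 21 is x2APIC, and leaf 6 ECX bit 0 is APERF/MPERF. The struct and the filter_cpuid() name are hypothetical, and topology masking is omitted.

```c
#include <stdint.h>

#define CPUID1_ECX_MWAIT	(1u << 3)	/* MONITOR/MWAIT */
#define CPUID1_ECX_X2APIC	(1u << 21)	/* x2APIC */
#define CPUID6_ECX_APERFMPERF	(1u << 0)	/* APERF/MPERF MSRs */

/* Hypothetical container for the four CPUID output registers. */
struct cpuid_regs {
	uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical filter: hide features a PV guest should not rely on. */
static void filter_cpuid(uint32_t leaf, struct cpuid_regs *r)
{
	switch (leaf) {
	case 1:
		r->ecx &= ~(CPUID1_ECX_MWAIT | CPUID1_ECX_X2APIC);
		break;
	case 6:
		r->ecx &= ~CPUID6_ECX_APERFMPERF;
		break;
	default:
		break;
	}
}
```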

- 09 Nov, 2013: 1 commit

Committed by Paul Gortmaker
Commit 6efa20e4 ("xen: Support 64-bit PV guest receiving NMIs") and commit cd9151e2 ("xen/balloon: set a mapping for ballooned out pages") added new instances of __cpuinit usage. We removed this a couple of versions ago; we now want to remove the compat no-op stubs. Introducing new users is not what we want to see at this point in time, as it will break once the stubs are gone.

Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

- 20 Aug, 2013: 2 commits

Committed by David Vrabel
During early setup, when the reserved regions and MMIO holes are being set up as 1:1 in the p2m, clear any mappings instead of making them 1:1 (except for the ISA region, which is expected to be mapped).

This fixes a regression introduced in 3.5 by 83d51ab4 (xen/setup: update VA mapping when releasing memory during setup), which caused hosts with tboot to fail to boot.

tboot marks a region in the e820 map as unusable, and the dom0 kernel would attempt to map this region, but Xen does not permit unusable regions to be mapped by guests.

    (XEN)  0000000000000000 - 0000000000060000 (usable)
    (XEN)  0000000000060000 - 0000000000068000 (reserved)
    (XEN)  0000000000068000 - 000000000009e000 (usable)
    (XEN)  0000000000100000 - 0000000000800000 (usable)
    (XEN)  0000000000800000 - 0000000000972000 (unusable)
           tboot marked this region as unusable.
    (XEN)  0000000000972000 - 00000000cf200000 (usable)
    (XEN)  00000000cf200000 - 00000000cf38f000 (reserved)
    (XEN)  00000000cf38f000 - 00000000cf3ce000 (ACPI data)
    (XEN)  00000000cf3ce000 - 00000000d0000000 (reserved)
    (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
    (XEN)  00000000fe000000 - 0000000100000000 (reserved)
    (XEN)  0000000100000000 - 0000000630000000 (usable)

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Committed by David Vrabel
If there are UNUSABLE regions in the machine memory map, dom0 will attempt to map them 1:1, which is not permitted by Xen, and the kernel will crash.

There isn't anything interesting in the UNUSABLE region that the dom0 kernel needs access to, so we can avoid making the 1:1 mapping and treat it as RAM.

We only do this for dom0, as that is where the tboot case shows up. A PV domU could have an UNUSABLE region in its pseudo-physical map; that would need to be handled in another patch.

This fixes a boot failure on hosts with tboot.

tboot marks a region in the e820 map as unusable, and the dom0 kernel would attempt to map this region, but Xen does not permit unusable regions to be mapped by guests.

    (XEN)  0000000000000000 - 0000000000060000 (usable)
    (XEN)  0000000000060000 - 0000000000068000 (reserved)
    (XEN)  0000000000068000 - 000000000009e000 (usable)
    (XEN)  0000000000100000 - 0000000000800000 (usable)
    (XEN)  0000000000800000 - 0000000000972000 (unusable)
           tboot marked this region as unusable.
    (XEN)  0000000000972000 - 00000000cf200000 (usable)
    (XEN)  00000000cf200000 - 00000000cf38f000 (reserved)
    (XEN)  00000000cf38f000 - 00000000cf3ce000 (ACPI data)
    (XEN)  00000000cf3ce000 - 00000000d0000000 (reserved)
    (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
    (XEN)  00000000fe000000 - 0000000100000000 (reserved)
    (XEN)  0000000100000000 - 0000000630000000 (usable)

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[v1: Altered the patch and description with domU's with UNUSABLE regions]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
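The "treat UNUSABLE as RAM, but only in dom0" rule can be sketched as a pass over the e820 entries. This is a minimal sketch assuming the conventional e820 type codes (1 = RAM, 2 = reserved, 5 = unusable). The struct layout and the retype_unusable() name are hypothetical, not the kernel's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Conventional e820 type codes (local definitions for this sketch). */
enum { E820_RAM = 1, E820_RESERVED = 2, E820_ACPI = 3, E820_NVS = 4, E820_UNUSABLE = 5 };

struct e820_entry {
	uint64_t addr, size;
	uint32_t type;
};

/*
 * Hypothetical pass: in the initial domain, retype UNUSABLE entries as
 * RAM so that no 1:1 mapping is attempted for them. Returns the number
 * of entries changed; domU maps are left untouched.
 */
static size_t retype_unusable(struct e820_entry *map, size_t n, bool initial_domain)
{
	size_t changed = 0;

	if (!initial_domain)
		return 0;
	for (size_t i = 0; i < n; i++) {
		if (map[i].type == E820_UNUSABLE) {
			map[i].type = E820_RAM;
			changed++;
		}
	}
	return changed;
}
```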

- 09 Aug, 2013: 1 commit

Committed by Konrad Rzeszutek Wilk
This is based on a patch that Zhenzhong Duan had sent, which was missing some of the remaining pieces. The kernel has the logic to handle Xen-type exceptions using the paravirt interface in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME - pv_irq_ops.adjust_exception_frame and INTERRUPT_RETURN - pv_cpu_ops.iret).

That means the NMI handler (and other exception handlers) use the hypervisor iret.

The other changes that would be necessary for this would be to translate the NMI_VECTOR to one of the entries in the ipi_vector and make xen_send_IPI_mask_allbutself use different events.

Fortunately for us, commit 1db01b49 (xen: Clean up apic ipi interface) implemented this, and we piggyback on the cleanup so that the APIC IPI interface will pass the right vector value for NMI.

With this patch we can trigger NMIs within a PV guest (only tested on x86_64).

For this to work with normal PV guests (not the initial domain), we need the domain to be able to use the APIC ops; they are already implemented to use the Xen event channels. For that to be turned on in a PV domU, we need to remove the masking of X86_FEATURE_APIC.

Incidentally, that means kgdb will now also work within a PV guest without using the 'nokgdbroundup' workaround.

Note that the 32-bit version is different, and this patch does not enable that.

CC: Lisa Nguyen <lisa@xenapiadmin.com>
CC: Ben Guthro <benjamin.guthro@citrix.com>
CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: Fixed up per David Vrabel comments]
Reviewed-by: Ben Guthro <benjamin.guthro@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
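The vector translation this commit relies on can be pictured as a switch from APIC-style vectors to per-CPU event channels. In this sketch only NMI_VECTOR (2) is the architectural x86 value. The other vector numbers, the enum and map_vector_to_ipi() are hypothetical stand-ins for the kernel's own definitions.

```c
/* Architectural x86 NMI vector. */
#define NMI_VECTOR		0x02
/* Hypothetical values standing in for the kernel's IPI vectors. */
#define RESCHEDULE_VECTOR	0xfd
#define CALL_FUNCTION_VECTOR	0xfc

/* Hypothetical per-CPU event channels used to deliver each IPI. */
enum xen_ipi {
	XEN_RESCHEDULE_IPI,
	XEN_CALL_FUNCTION_IPI,
	XEN_NMI_IPI,
	XEN_NR_IPIS
};

/* Map a vector to the event channel that delivers it; -1 if unsupported. */
static int map_vector_to_ipi(int vector)
{
	switch (vector) {
	case RESCHEDULE_VECTOR:
		return XEN_RESCHEDULE_IPI;
	case CALL_FUNCTION_VECTOR:
		return XEN_CALL_FUNCTION_IPI;
	case NMI_VECTOR:
		return XEN_NMI_IPI;
	default:
		return -1;
	}
}
```

With such a mapping, a "send NMI to all but self" request becomes "notify the NMI event channel on every other CPU". No real APIC access is needed.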

- 15 Jul, 2013: 1 commit

Committed by Paul Gortmaker
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. The fix in commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch-independent (kernel/cpu.c) and are flagged as __cpuinit, so if we remove the __cpuinit from arch-specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless.

This removes all the arch/x86 uses of the __cpuinit macros from all C files. x86 only had the one __CPUINIT used in assembly files, and it wasn't paired off with a .previous or a __FINIT, so we can delete it directly without any corresponding additional change there.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

- 10 Feb, 2013: 2 commits

Committed by Len Brown
Remove the 32-bit x86 cmdline param "no-hlt", and the cpuinfo_x86.hlt_works_ok flag that it sets.

If a user wants to avoid HLT, then "idle=poll" is much more useful, as it avoids invocation of HLT in idle, while "no-hlt" failed to do so.

Indeed, hlt_works_ok was consulted in only 3 places.

First, in /proc/cpuinfo, where "hlt_bug yes" would be printed if and only if the user booted the system with "no-hlt", as there was no other code to set that flag.

Second, check_hlt() would not invoke halt() if "no-hlt" were on the cmdline.

Third, it was consulted in stop_this_cpu(), which is invoked by native_machine_halt()/reboot_interrupt()/smp_stop_nmi_callback(), all cases where the machine is being shut down/reset. The flag was not consulted in the more frequently invoked play_dead()/hlt_play_dead() used in processor offline and suspend.

Since Linux 3.0 there has been a run-time notice upon "no-hlt" invocations indicating that it would be removed in 2012.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org

Committed by Len Brown
This macro is only invoked by Xen, so make its definition specific to Xen.

> set_pm_idle_to_default()
< xen_set_default_idle()

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: xen-devel@lists.xensource.com

- 24 Sep, 2012: 1 commit

Committed by Konrad Rzeszutek Wilk
The hypervisor is in charge of allocating the proper "NUMA" memory and dealing with the CPU scheduler to keep them bound to the proper NUMA node. The PV guests (and PVHVM) have no inkling of where they run and do not need to know that right now. In the future we will need to inject NUMA configuration data (if a guest spans two or more NUMA nodes) so that the kernel can make the right choices. But those patches are not yet present.

In the meantime, disable the NUMA capability in the PV guest, which also fixes a bootup issue. Andre says:

"we see Dom0 crashes due to the kernel detecting the NUMA topology not by ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).

This will detect the actual NUMA config of the physical machine, but will crash about the mismatch with Dom0's virtual memory. Variation of the theme: Dom0 sees what it's not supposed to see.

This happens with the said config option enabled and on a machine where this scanning is still enabled (K8 and Fam10h, not Bulldozer class).

We have this dump then:

    NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
    Scanning NUMA topology in Northbridge 24
    Number of physical nodes 4
    Node 0 MemBase 0000000000000000 Limit 0000000040000000
    Node 1 MemBase 0000000040000000 Limit 0000000138000000
    Node 2 MemBase 0000000138000000 Limit 00000001f8000000
    Node 3 MemBase 00000001f8000000 Limit 0000000238000000
    Initmem setup node 0 0000000000000000-0000000040000000
      NODE_DATA [000000003ffd9000 - 000000003fffffff]
    Initmem setup node 1 0000000040000000-0000000138000000
      NODE_DATA [0000000137fd9000 - 0000000137ffffff]
    Initmem setup node 2 0000000138000000-00000001f8000000
      NODE_DATA [00000001f095e000 - 00000001f0984fff]
    Initmem setup node 3 00000001f8000000-0000000238000000
    Cannot find 159744 bytes in node 3
    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
    Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
    RIP: e030:[<ffffffff81d220e6>]  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
    .. snip..
      [<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
      [<ffffffff81d23348>] sparse_init+0xe4/0x25a
      [<ffffffff81d16840>] paging_init+0x13/0x22
      [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
      [<ffffffff81683954>] ? printk+0x3c/0x3e
      [<ffffffff81d01a38>] start_kernel+0xe5/0x468
      [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
      [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
      [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
"

so we just disable NUMA scanning by setting numa_off=1.

CC: stable@vger.kernel.org
Reported-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
Acked-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

- 23 Aug, 2012: 2 commits

Committed by Konrad Rzeszutek Wilk
Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas."

This reverts commit 806c312e and commit 59b29440.

It also documents setup.c and why we want to do it that way. We tried to make the memblock_reserve more selective so that it would be clear what region is reserved. Sadly, we ran into the problem wherein, on a 64-bit hypervisor with a 32-bit initial domain, the pt_base has the cr3 value, which is not necessarily where the pagetable starts! As Jan put it:

"Actually, the adjustment turns out to be correct: The page tables for a 32-on-64 dom0 get allocated in the order "first L1", "first L2", "first L3", so the offset to the page table base is indeed 2.

When reading xen/include/public/xen.h's comment very strictly, this is not a violation (since there nothing is said that the first thing in the page table space is pointed to by pt_base; I admit that this seems to be implied though, namely do I think that it is implied that the page table space is the range [pt_base, pt_base + nr_pt_frames), whereas that range here indeed is [pt_base - 2, pt_base - 2 + nr_pt_frames), which - without a priori knowledge - the kernel would have difficulty to figure out)."

So let's just fall back to the easy way and reserve the whole region.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Committed by Konrad Rzeszutek Wilk
When we are finished returning PFNs to the hypervisor, populating them back, and marking the E820 MMIO regions and E820 gaps as IDENTITY_FRAMEs, we then call P2M to set areas that can be used for ballooning. We were off by one, and ended up overwriting a P2M entry that most likely was an IDENTITY_FRAME. For example:

    1-1 mapping on 40000->40200
    1-1 mapping on bc558->bc5ac
    1-1 mapping on bc5b4->bc8c5
    1-1 mapping on bc8c6->bcb7c
    1-1 mapping on bcd00->100000
    Released 614 pages of unused memory
    Set 277889 page(s) to 1-1 mapping
    Populating 40200-40466 pfn range: 614 pages added

=> here we set the P2M tree from 40466 up to bc559 to INVALID_P2M_ENTRY. We should have done it only up to bc558.

The end result is that if anybody tries to construct a PTE for PFN bc558, they end up with ~PAGE_PRESENT.

CC: stable@vger.kernel.org
Reported-by-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
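The off-by-one comes down to treating the end of the range as exclusive. In the log above, the range should stop at bc558 because that PFN starts the next 1:1 region. A minimal sketch follows. It assumes a flat array standing in for the P2M tree. The INVALID_P2M_ENTRY value and the set_p2m_invalid() name are assumptions for illustration.

```c
/* Stand-in for the kernel's "no MFN behind this PFN" marker (assumed value). */
#define INVALID_P2M_ENTRY (~0UL)

/*
 * Mark PFNs in the half-open range [start, end) as invalid. 'end' is the
 * first PFN of the following 1:1 region, so it must keep its identity
 * entry; writing p2m[end] is exactly the bug described above.
 */
static void set_p2m_invalid(unsigned long *p2m, unsigned long start,
			    unsigned long end)
{
	for (unsigned long pfn = start; pfn < end; pfn++)
		p2m[pfn] = INVALID_P2M_ENTRY;
}
```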

- 22 Aug, 2012: 1 commit

Committed by Konrad Rzeszutek Wilk
...instead of a big memblock_reserve. This way we can be more selective in freeing regions (and it also makes it easier to understand what is where).

[v1: Move the auto_translate_physmap to proper line]
[v2: Per Stefano suggestion add more comments]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

- 20 Jul, 2012: 1 commit

Committed by zhenzhong.duan
When populating pages across a memory boundary at bootup, the populated page count isn't correct. This is because memory is populated into a non-memory region and then ignored.

The PFN range is also wrongly aligned when the memory boundary isn't page aligned.

For a dom0 booted with dom0_mem=3368952K (0xcd9ff000-4k), the dmesg diff is:

     [ 0.000000] Freeing 9e-100 pfn range: 98 pages freed
     [ 0.000000] 1-1 mapping on 9e->100
     [ 0.000000] 1-1 mapping on cd9ff->100000
     [ 0.000000] Released 98 pages of unused memory
     [ 0.000000] Set 206435 page(s) to 1-1 mapping
    -[ 0.000000] Populating cd9fe-cda00 pfn range: 1 pages added
    +[ 0.000000] Populating cd9fe-cd9ff pfn range: 1 pages added
    +[ 0.000000] Populating 100000-100061 pfn range: 97 pages added
     [ 0.000000] BIOS-provided physical RAM map:
     [ 0.000000] Xen: 0000000000000000 - 000000000009e000 (usable)
     [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
     [ 0.000000] Xen: 0000000000100000 - 00000000cd9ff000 (usable)
     [ 0.000000] Xen: 00000000cd9ffc00 - 00000000cda53c00 (ACPI NVS)
    ...
     [ 0.000000] Xen: 0000000100000000 - 0000000100061000 (usable)
     [ 0.000000] Xen: 0000000100061000 - 000000012c000000 (unusable)
    ...
     [ 0.000000] MEMBLOCK configuration:
    ...
    -[ 0.000000] reserved[0x4] [0x000000cd9ff000-0x000000cd9ffbff], 0xc00 bytes
    -[ 0.000000] reserved[0x5] [0x00000100000000-0x00000100060fff], 0x61000 bytes

Related Xen memory layout:

    (XEN) Xen-e820 RAM map:
    (XEN)  0000000000000000 - 000000000009ec00 (usable)
    (XEN)  00000000000f0000 - 0000000000100000 (reserved)
    (XEN)  0000000000100000 - 00000000cd9ffc00 (usable)

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
[v2: If xen_do_chunk fails (populate), abort this chunk and any others]
Suggested by David, thanks.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
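The alignment problem is easiest to see with the usual page-rounding helpers. Only whole pages inside a RAM region can be populated, so the start rounds up and the end rounds down. For the region 0x100000-0xcd9ffc00 in the Xen map above, that gives the PFNs 0x100 up to (but not including) 0xcd9ff. This is a minimal sketch: PFN_UP/PFN_DOWN mirror the kernel macros of the same name, while ram_pfns() and struct pfn_range are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1ULL << PAGE_SHIFT)
#define PFN_UP(x)	(((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)

/* Half-open PFN range [start, end). */
struct pfn_range {
	uint64_t start, end;
};

/* Hypothetical: whole pages fully contained in the RAM region [addr, addr + size). */
static struct pfn_range ram_pfns(uint64_t addr, uint64_t size)
{
	struct pfn_range r = { PFN_UP(addr), PFN_DOWN(addr + size) };

	if (r.end < r.start)
		r.end = r.start;	/* region smaller than one page: empty */
	return r;
}
```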

- 30 May, 2012: 1 commit

Committed by Konrad Rzeszutek Wilk
We did not take into account that xen_released_pages would be used outside the initial E820 parsing code. As such, we did not subtract from xen_released_pages the count of pages that we had populated back (instead we just did a simple extra_pages = released - populated).

The balloon driver uses xen_released_pages to set the initial current_pages count. If this is wrong (too low), then when a new (higher) target is set, the balloon driver will request too many pages from Xen.

This fixes errors such as:

    (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (51 of 512)

during bootup, and

    free_memory            : 0

where the free_memory should be 128.

Acked-by: David Vrabel <david.vrabel@citrix.com>
[v1: Per David's review made the git commit better]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

- 08 May, 2012: 4 commits

Committed by David Vrabel
In xen_memory_setup(), if a page that is being released has a VA mapping, this must also be updated. Otherwise, the page will not be released completely: it will still be referenced in Xen and won't be freed until the mapping is removed, and this prevents it from being reallocated at a different PFN.

This was already being done for the ISA memory region in xen_ident_map_ISA(), but on many systems this omitted a few pages, as many systems mark a few pages below the ISA memory region as reserved in the e820 map.

This fixes errors such as:

    (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
    (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Committed by Konrad Rzeszutek Wilk
They use the same set of arguments, so it is just a matter of using the proper hypercall.

Acked-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Committed by Konrad Rzeszutek Wilk
When the Xen hypervisor boots a PV kernel, it hands it two pieces of information: nr_pages and a made-up E820 entry.

The nr_pages value defines the range from zero to nr_pages of PFNs which have a valid Machine Frame Number (MFN) underneath them. The E820 mirrors that (with the VGA hole):

    BIOS-provided physical RAM map:
     Xen: 0000000000000000 - 00000000000a0000 (usable)
     Xen: 00000000000a0000 - 0000000000100000 (reserved)
     Xen: 0000000000100000 - 0000000080800000 (usable)

The fun comes when a PV guest is run with a machine E820 (either the initial domain or a PCI PV guest), where the E820 looks like the normal thing:

    BIOS-provided physical RAM map:
     Xen: 0000000000000000 - 000000000009e000 (usable)
     Xen: 000000000009ec00 - 0000000000100000 (reserved)
     Xen: 0000000000100000 - 0000000020000000 (usable)
     Xen: 0000000020000000 - 0000000020200000 (reserved)
     Xen: 0000000020200000 - 0000000040000000 (usable)
     Xen: 0000000040000000 - 0000000040200000 (reserved)
     Xen: 0000000040200000 - 00000000bad80000 (usable)
     Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS)
    ..

With that, overlaying the nr_pages directly on the E820 does not work, as there are gaps and non-RAM regions that won't be used by the memory allocator. The 'xen_release_chunk' helps with that by punching holes in the P2M (PFN to MFN lookup tree) for those regions, and tells us that:

    Freeing  20000-20200 pfn range: 512 pages freed
    Freeing  40000-40200 pfn range: 512 pages freed
    Freeing  bad80-badf4 pfn range: 116 pages freed
    Freeing  badf6-bae7f pfn range: 137 pages freed
    Freeing  bb000-100000 pfn range: 282624 pages freed
    Released 283999 pages of unused memory

Those 283999 pages are subtracted from the nr_pages and are returned to the hypervisor. The end result is that the initial domain boots with 1GB less memory, as the nr_pages has been reduced by the number of pages residing within the PCI hole. It can balloon up to that if desired using 'xl mem-set 0 8092', but the balloon driver is not always compiled in for the initial domain.

This patch implements the populate hypercall (XENMEM_populate_physmap), which increases the domain by the same number of pages that were released.

The other solution (that did not work) was to transplant the MFN in the P2M tree: the ones that were going to be freed were put in the E820_RAM regions past the nr_pages. But the modifications to the M2P array (the other side of creating PTEs) were not carried out. The hypervisor is the only one capable of modifying that, and the only two hypercalls that would do this are update_va_mapping (which won't work, as during initial bootup only PFNs up to nr_pages are mapped in the guest) and the populate hypercall.

The end result is that the kernel can now boot with the nr_pages without having to subtract the 283999 pages.

On an 8GB machine, with various dom0_mem= parameters, this is what we get:

no dom0_mem
    -Memory: 6485264k/9435136k available (5817k kernel code, 1136060k absent, 1813812k reserved, 2899k data, 696k init)
    +Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init)

dom0_mem=3G
    -Memory: 2616536k/9435136k available (5817k kernel code, 1136060k absent, 5682540k reserved, 2899k data, 696k init)
    +Memory: 2703776k/9435136k available (5817k kernel code, 1136060k absent, 5595300k reserved, 2899k data, 696k init)

dom0_mem=max:3G
    -Memory: 2696732k/4281724k available (5817k kernel code, 1136060k absent, 448932k reserved, 2899k data, 696k init)
    +Memory: 2702204k/4281724k available (5817k kernel code, 1136060k absent, 443460k reserved, 2899k data, 696k init)

And 'xm list' or 'xl list' now reflect what the dom0_mem= argument is.

Acked-by: David Vrabel <david.vrabel@citrix.com>
[v2: Use populate hypercall]
[v3: Remove debug printks]
[v4: Simplify code]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
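The release-then-populate bookkeeping can be sketched with two counts. The first is how many PFNs below nr_pages fall outside E820 RAM and are released. The second is how many of those fit back into E820 RAM above nr_pages and are populated. This is a minimal sketch assuming sorted, non-overlapping RAM ranges already converted to PFNs. Both function names are hypothetical, and the hypercalls themselves are not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open PFN range [start, end). */
struct pfn_range {
	uint64_t start, end;
};

/* Hypothetical: PFNs in [0, nr_pages) not covered by any RAM range. */
static uint64_t pages_to_release(const struct pfn_range *ram, size_t n,
				 uint64_t nr_pages)
{
	uint64_t covered = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t end = ram[i].end < nr_pages ? ram[i].end : nr_pages;

		if (ram[i].start < end)
			covered += end - ram[i].start;
	}
	return nr_pages - covered;
}

/* Hypothetical: how many released pages fit into RAM above nr_pages. */
static uint64_t pages_to_populate(const struct pfn_range *ram, size_t n,
				  uint64_t nr_pages, uint64_t released)
{
	uint64_t room = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t start = ram[i].start > nr_pages ? ram[i].start : nr_pages;

		if (start < ram[i].end)
			room += ram[i].end - start;
	}
	return released < room ? released : room;
}
```

In the commit's 8GB example the E820 has plenty of RAM above nr_pages, so every released page can be populated back.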

Committed by Konrad Rzeszutek Wilk
Otherwise we can get meaningless messages such as:

    Freeing  bad80-badf4 pfn range: 0 pages freed

We can also do this for the summary ones; there is no point in printing "Set 0 page(s) to 1-1 mapping".

Acked-by: David Vrabel <david.vrabel@citrix.com>
[v1: Extended to the summary printks]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

- 21 Mar, 2012: 1 commit

Committed by Konrad Rzeszutek Wilk
By using the functionality provided by "[CPUFREQ]: provide disable_cpuidle() function to disable the API."

Under the Xen hypervisor we do not want the initial domain to exercise the cpufreq scaling drivers. This is because the Xen hypervisor is in charge of doing this as well, and we can end up with both the Linux kernel and the hypervisor trying to change the P-states, leading to weird performance issues.

Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Fix compile error spotted by Benjamin Schweikert <b.schweikert@googlemail.com>]

- 11 Mar, 2012: 1 commit

Committed by Konrad Rzeszutek Wilk
We needed that call in the past to force the kernel to use default_idle (which called safe_halt, which called xen_safe_halt). But set_pm_idle_to_default() now does that, so there is no need to use this boot option operand.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

- 16 Dec, 2011: 1 commit

Committed by Ian Campbell
d312ae87 "xen: use maximum reservation to limit amount of usable RAM" clamped the total amount of RAM to the current maximum reservation. This is correct for dom0 but is not correct for guest domains. In order to boot a guest "pre-ballooned" (e.g. with memory=1G but maxmem=2G) to allow for future memory expansion, the guest must derive max_pfn from the e820 provided by the toolstack and not from the current maximum reservation (which can reflect only the current maximum, not the guest lifetime max). The existing algorithm already behaves correctly if we do not artificially limit the maximum number of pages for the guest case.

For a guest booted with maxmem=512, memory=128 this results in:

     [ 0.000000] BIOS-provided physical RAM map:
     [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable)
     [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
    -[ 0.000000] Xen: 0000000000100000 - 0000000008100000 (usable)
    -[ 0.000000] Xen: 0000000008100000 - 0000000020800000 (unusable)
    +[ 0.000000] Xen: 0000000000100000 - 0000000020800000 (usable)
    ...
     [ 0.000000] NX (Execute Disable) protection: active
     [ 0.000000] DMI not present or invalid.
     [ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
     [ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
    -[ 0.000000] last_pfn = 0x8100 max_arch_pfn = 0x1000000
    +[ 0.000000] last_pfn = 0x20800 max_arch_pfn = 0x1000000
     [ 0.000000] initial memory mapped : 0 - 027ff000
     [ 0.000000] Base memory trampoline at [c009f000] 9f000 size 4096
    -[ 0.000000] init_memory_mapping: 0000000000000000-0000000008100000
    -[ 0.000000]  0000000000 - 0008100000 page 4k
    -[ 0.000000] kernel direct mapping tables up to 8100000 @ 27bb000-27ff000
    +[ 0.000000] init_memory_mapping: 0000000000000000-0000000020800000
    +[ 0.000000]  0000000000 - 0020800000 page 4k
    +[ 0.000000] kernel direct mapping tables up to 20800000 @ 26f8000-27ff000
     [ 0.000000] xen: setting RW the range 27e8000 - 27ff000
     [ 0.000000] 0MB HIGHMEM available.
    -[ 0.000000] 129MB LOWMEM available.
    -[ 0.000000] mapped low ram: 0 - 08100000
    -[ 0.000000] low ram: 0 - 08100000
    +[ 0.000000] 520MB LOWMEM available.
    +[ 0.000000] mapped low ram: 0 - 20800000
    +[ 0.000000] low ram: 0 - 20800000

With this change, "xl mem-set <domain> 512M" will successfully increase the guest RAM (by reducing the balloon).

There is no change for dom0.

Reported-and-Tested-by: George Shuklin <george.shuklin@gmail.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: stable@kernel.org
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
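The policy change amounts to "clamp to the maximum reservation only for dom0". A minimal sketch, using the PFN counts from the log above (0x20800 from the toolstack e820, 0x8100 for the current reservation). The plan_max_pfn() name and its parameters are hypothetical and do not reproduce the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical: decide how many PFNs the kernel plans memory for.
 * dom0 stays clamped to its maximum reservation; a guest trusts the
 * e820 from the toolstack so it can later balloon up to maxmem.
 */
static uint64_t plan_max_pfn(uint64_t e820_max_pfn, uint64_t max_reservation,
			     bool initial_domain)
{
	if (initial_domain && max_reservation < e820_max_pfn)
		return max_reservation;
	return e820_max_pfn;
}
```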

- 04 Dec, 2011: 1 commit

Committed by Konrad Rzeszutek Wilk
The idea behind commit d91ee586 ("cpuidle: replace xen access to x86 pm_idle and default_idle") was to have one call, disable_cpuidle(), which would make pm_idle not be molested by other code. It disallows cpuidle_idle_call to be set to pm_idle (which is excellent).

But in select_idle_routine() and idle_setup(), the pm_idle can still be set to either amd_e400_idle, mwait_idle or default_idle. This depends on some CPU flags (MWAIT) and, in the AMD case, on the type of CPU.

In the case of mwait_idle, we can hit some instances where the hypervisor (Amazon EC2 specifically) sets the MWAIT flag and we get:

    Brought up 2 CPUs
    invalid opcode: 0000 [#1] SMP
    Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1
    RIP: e030:[<ffffffff81015d1d>]  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
    ...
    Call Trace:
     [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8
     [<ffffffff8149ee78>] cpu_bringup_and_idle+0xe/0x10
    RIP  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
     RSP <ffff8801d28ddf10>

In the case of amd_e400_idle we don't get such spectacular crashes, but we do end up making an MSR access which is trapped in the hypervisor, and then follow it up with a yield hypercall. This means we end up going to the hypervisor twice instead of just once.

The behavior before v3.0 was that pm_idle was set to default_idle regardless of select_idle_routine/idle_setup.

We want to do that, but only for one specific case: Xen. This patch does that.

Fixes RH BZ #739499 and Ubuntu #881076

Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

- 29 Sep, 2011: 4 commits

Committed by David Vrabel
In xen_memory_setup() all reserved regions and gaps are set to an identity (1-1) p2m mapping. If an available page has a PFN within one of these 1-1 mappings, it will become inaccessible (as its MFN is lost), so release them before setting up the mapping.

This can make an additional 256 MiB or more of RAM available (depending on the size of the reserved regions in the memory map) if the initial pages overlap with reserved regions.

The 1:1 p2m mappings are also extended to cover partial pages. This fixes an issue with (for example) systems with a BIOS that puts the DMI tables in a reserved region that begins on a non-page boundary.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
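Two calculations are involved here. First, a non-RAM region is rounded outward to whole pages, so a partial page (for example DMI tables starting mid-page) is covered by the 1:1 mapping. Second, any populated PFNs (below nr_pages) inside that range must be released first. A minimal sketch follows. The identity_pfns() and overlap_to_release() names are hypothetical, and PFN_UP/PFN_DOWN mirror the kernel macros. For the 0x9ec00-0x100000 hole this yields 98 pages. That matches the "Freeing 9e-100 pfn range: 98 pages freed" line quoted in a later commit.

```c
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1ULL << PAGE_SHIFT)
#define PFN_UP(x)	(((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)

/* Half-open PFN range [start, end). */
struct pfn_range {
	uint64_t start, end;
};

/* Hypothetical: round a non-RAM region outward so partial pages are covered. */
static struct pfn_range identity_pfns(uint64_t addr, uint64_t size)
{
	struct pfn_range r = { PFN_DOWN(addr), PFN_UP(addr + size) };

	return r;
}

/* Hypothetical: populated PFNs inside the 1:1 range that must be released first. */
static uint64_t overlap_to_release(struct pfn_range r, uint64_t nr_pages)
{
	uint64_t end = r.end < nr_pages ? r.end : nr_pages;

	return r.start < end ? end - r.start : 0;
}
```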
Committed by David Vrabel
Allow the extra memory (used by the balloon driver) to be in multiple regions (typically two regions, one for low memory and one for high memory). This allows the balloon driver to increase the number of available low pages (if the initial number of pages is small).

As a side effect, the algorithm for building the e820 memory map is simpler and more obviously correct, as the map supplied by the hypervisor is (almost) used as is (in particular, all reserved regions and gaps are preserved). Only RAM regions are altered, and RAM regions above max_pfn + extra_pages are marked as unused (the region is split in two if necessary).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
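The "split the region in two if necessary" step can be sketched as follows. This is a simplified model, not the kernel's e820 code: struct region and clip_ram_region() are hypothetical, and limit stands for the byte address corresponding to max_pfn + extra_pages.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified memory-map entry (not the kernel's struct). */
struct region {
    uint64_t addr;
    uint64_t size;
    int      usable;  /* 1 = RAM, 0 = marked unused */
};

/* Clip a RAM region at 'limit': the part below stays RAM, the part above
 * is marked unused. Returns the number of regions written to out (1 or 2). */
static int clip_ram_region(struct region in, uint64_t limit, struct region out[2])
{
    uint64_t end = in.addr + in.size;

    assert(in.usable);
    if (end <= limit) {          /* entirely below the limit: unchanged */
        out[0] = in;
        return 1;
    }
    if (in.addr >= limit) {      /* entirely above the limit: mark unused */
        out[0] = in;
        out[0].usable = 0;
        return 1;
    }
    /* straddles the limit: split in two */
    out[0] = (struct region){ in.addr, limit - in.addr, 1 };
    out[1] = (struct region){ limit, end - limit, 0 };
    return 2;
}
```

Reserved regions and gaps never pass through this function, which mirrors the point that only RAM regions are altered.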
Committed by David Vrabel
Allow the xen balloon driver to populate its list of extra pages from more than one region of memory. This will allow platforms to provide (for example) a region of low memory and a region of high memory.

The maximum possible number of extra regions is 128 (== E820MAX), which is quite large, so xen_extra_mem is placed in __initdata. This is safe as both xen_memory_setup() and balloon_init() are in __init.

The balloon regions themselves are not altered (i.e., there is still only the one region).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
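A sketch of how a fixed-size table of extra regions might be filled, assuming a range that directly follows an existing entry is merged into it. extra_mem[], add_extra_mem() and MAX_EXTRA_REGIONS are illustrative names (the last mirrors the E820MAX bound of 128 from the text), not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_EXTRA_REGIONS 128  /* mirrors E820MAX from the text */

struct extra_region { uint64_t start, size; };  /* hypothetical */

/* size == 0 marks a free slot; slots are filled in order. */
static struct extra_region extra_mem[MAX_EXTRA_REGIONS];

/* Record [start, start + size) as extra (balloon) memory. A range that
 * starts exactly where an existing entry ends extends that entry;
 * otherwise the first free slot is used. Returns 0, or -1 if full. */
static int add_extra_mem(uint64_t start, uint64_t size)
{
    assert(size > 0);
    for (int i = 0; i < MAX_EXTRA_REGIONS; i++) {
        if (extra_mem[i].size == 0) {                        /* free slot */
            extra_mem[i].start = start;
            extra_mem[i].size = size;
            return 0;
        }
        if (extra_mem[i].start + extra_mem[i].size == start) {
            extra_mem[i].size += size;                       /* contiguous */
            return 0;
        }
    }
    return -1;
}
```

Because slots are filled in order, every used entry is checked for contiguity before the first free slot is reached.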
Committed by David Vrabel
In xen_memory_setup() pages that occur in gaps in the memory map are released back to Xen. This reduces the domain's current page count in the hypervisor. The Xen balloon driver does not correctly decrease its initial current_pages count to reflect this. If 'delta' pages are released and the target is adjusted, the resulting reservation is always 'delta' less than the requested target.

This affects dom0 if the initial allocation of pages overlaps the PCI memory region, but won't affect most domU guests, which have been set up with pseudo-physical memory maps that don't have gaps.

Fix this by accounting for the released pages when starting the balloon driver.

If the domain's targets are managed by xapi, the domain may eventually run out of memory and die, because xapi currently gets its target calculations wrong and, whenever it is restarted, it always reduces the target by 'delta'.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
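The 'delta' shortfall is plain arithmetic. In this hypothetical model, reserved is what the hypervisor actually grants the domain and current_pages is the balloon driver's belief about it; on a new target the driver populates target - current_pages. The struct and function names are invented for illustration.

```c
#include <assert.h>

/* Hypothetical arithmetic model of the balloon bookkeeping. */
struct dom_model {
    long reserved;       /* pages the hypervisor actually gives the domain */
    long current_pages;  /* balloon driver's belief about 'reserved' */
};

/* The domain booted with 'initial' pages, of which 'released' were handed
 * back to Xen during memory setup. account_released selects the fixed
 * (nonzero) or the buggy (zero) initialisation of current_pages. */
static struct dom_model balloon_init(long initial, long released, int account_released)
{
    struct dom_model d;
    assert(released >= 0 && released <= initial);
    d.reserved = initial - released;
    d.current_pages = account_released ? initial - released : initial;
    return d;
}

/* Move to 'target' by populating (or releasing) the difference the
 * driver believes is needed. */
static void balloon_set_target(struct dom_model *d, long target)
{
    long delta = target - d->current_pages;
    d->reserved += delta;
    d->current_pages += delta;
}
```

With 1000 initial pages, 50 released and a target of 1200, the unfixed bookkeeping ends at 1150 pages (50 short), while accounting for the released pages reaches 1200.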
- 13 September 2011, 1 commit
Committed by David Vrabel
The patch "xen: use maximum reservation to limit amount of usable RAM" (d312ae87) breaks machines that do not use the 'dom0_mem=' argument with:

  reserve RAM buffer: 000000133f2e2000 - 000000133fffffff
  (XEN) mm.c:4976:d0 Global bit is set to kernel page fffff8117e
  (XEN) domain_crash_sync called from entry.S
  (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
  ...

The reason is that the last E820 entry is created using 'extra_pages' (which is based on how many pages have been freed). The mentioned commit sets the initial value of 'extra_pages' using a hypercall which returns the number of pages (if dom0_mem has been used) or -1 otherwise. In the latter case we return MAX_DOMAIN_PAGES as the basis for the calculation:

  return min(max_pages, MAX_DOMAIN_PAGES);

and use it:

  extra_limit = xen_get_max_pages();
  if (extra_limit >= max_pfn)
          extra_pages = extra_limit - max_pfn;
  else
          extra_pages = 0;

which means we end up with extra_pages = 128GB in PFNs (33554432) - 8GB in PFNs (2097152 on this specific box; it can be larger or smaller), and then we add that value to the E820, making it:

  Xen: 00000000ff000000 - 0000000100000000 (reserved)
  Xen: 0000000100000000 - 000000133f2e2000 (usable)

which is clearly wrong. It should look like this:

  Xen: 00000000ff000000 - 0000000100000000 (reserved)
  Xen: 0000000100000000 - 000000027fbda000 (usable)

Naturally this problem does not present itself if dom0_mem=max:X is used.

CC: stable@kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
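The numbers above can be reproduced with a small model of the quoted logic. Here the hypercall is replaced by a plain parameter (hypercall_ret), and get_max_pages()/compute_extra_pages() are hypothetical stand-ins for the kernel code; MAX_DOMAIN_PAGES is taken as 33554432, the 128GB figure from the text. This illustrates the bug only, not the fix.

```c
#include <assert.h>

#define MAX_DOMAIN_PAGES 33554432UL  /* 128 GiB in 4 KiB pages, per the text */

/* Stand-in for the hypercall-based limit: a positive result is a page
 * count (dom0_mem given); -1 means "no limit", so the cap applies. */
static unsigned long get_max_pages(long hypercall_ret)
{
    unsigned long max_pages = MAX_DOMAIN_PAGES;
    if (hypercall_ret > 0)
        max_pages = (unsigned long)hypercall_ret;
    return max_pages < MAX_DOMAIN_PAGES ? max_pages : MAX_DOMAIN_PAGES;
}

/* The quoted extra_pages computation. */
static unsigned long compute_extra_pages(unsigned long extra_limit,
                                         unsigned long max_pfn)
{
    return extra_limit >= max_pfn ? extra_limit - max_pfn : 0;
}
```

Without dom0_mem and with max_pfn = 2097152, extra_pages comes out as 31457280 pages, i.e. 120 GiB of extra "usable" space appended to the E820 map, far beyond the memory actually present.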
- 01 September 2011, 1 commit
Committed by David Vrabel
Use the domain's maximum reservation to limit the amount of extra RAM for the memory balloon. This reduces the size of the page tables and the amount of reserved low memory (which defaults to about 1/32 of the total RAM).

On a system with 8 GiB of RAM with the domain limited to 1 GiB, the kernel reports:

  Before: Memory: 627792k/4472000k available
  After: Memory: 549740k/11132224k available

An increase of about 76 MiB (~1.5% of the unused 7 GiB). The reserved low memory is also reduced from 253 MiB to 32 MiB. The total additional usable RAM is 329 MiB.

For dom0, this requires a patch to Xen ('x86: use 'dom0_mem' to limit the number of pages for dom0') (c/s 23790).

CC: stable@kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
- 05 August 2011, 2 commits
Committed by Igor Mammedov
The WARN message should not complain

  "Failed to release memory %lx-%lx err=%d\n"

about a range when it fails to release just one page; instead it should say which pfn was not freed.

In addition, the lines:

  printk(KERN_INFO "xen_release_chunk: looking at area pfn %lx-%lx: " ...
  printk(KERN_CONT "%lu pages freed\n", len);

will produce a broken line if the WARN in between them fires. So fix it by using a single printk for this.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Committed by Igor Mammedov
Use the correct format specifier for unsigned long.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
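As an illustration of both of Igor Mammedov's fixes, the hypothetical helper below builds the whole message with one formatted call (so nothing can be interleaved between two partial prints) and uses %lx/%lu, the conversion specifiers that match unsigned long. It uses userspace snprintf() in place of printk(); format_release() is not a kernel function.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format the complete "pages freed" line at once.
 * %lx and %lu both take unsigned long; %d or %x here would be a mismatch. */
static int format_release(char *buf, size_t len, unsigned long start_pfn,
                          unsigned long end_pfn, unsigned long freed)
{
    assert(start_pfn <= end_pfn);
    return snprintf(buf, len, "pfn %lx-%lx: %lu pages freed",
                    start_pfn, end_pfn, freed);
}
```

For example, format_release(buf, sizeof buf, 0xa0, 0x100, 96) writes "pfn a0-100: 96 pages freed".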
- 04 August 2011, 1 commit
Committed by Len Brown
When a Xen Dom0 kernel boots on a hypervisor, it gets access to the raw-hardware ACPI tables. While it parses the idle tables for the hypervisor's benefit, it uses HLT for its own idle.

Rather than have Xen scribble on pm_idle and access default_idle, have it simply call disable_cpuidle() so acpi_idle will not load and the architecture default HLT will be used.

cc: xen-devel@lists.xensource.com
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
- 15 July 2011, 1 commit
Committed by Tejun Heo
Other than a sanity check and a debug message, the x86-specific versions of the memblock reserve/free functions are simple wrappers around the generic versions, memblock_reserve/free().

This patch adds debug messages with caller identification to the generic versions, converts the x86-specific callers to them and removes the x86-specific versions. arch/x86/include/asm/memblock.h and arch/x86/mm/memblock.c are empty after this change and are removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-14-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
- 17 June 2011, 1 commit
Committed by Konrad Rzeszutek Wilk
The earlier attempts (24bdb0b6) at fixing this problem caused other problems to surface: PV guests with no PCI passthrough would have SWIOTLB turned on, which meant 64MB of precious contiguous DMA32 memory being eaten up per guest.

The problem was:

  "on xen we add an extra memory region at the end of the e820, and on this particular machine this extra memory region would start below 4g and cross over the 4g boundary:

    [0xfee01000-0x192655000)

  Unfortunately e820_end_of_low_ram_pfn does not expect an e820 layout like that so it returns 4g, therefore initial_memory_mapping will map [0 - 0x100000000), that is a memory range that includes some reserved memory regions."

The memory range was the IOAPIC regions, and with the 1-1 mapping turned on, it would map them as RAM, not as MMIO regions. This caused the hypervisor to complain. Fortunately this is experienced only under the initial domain, so we guard for it.

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
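The problematic layout is easy to state as a predicate. This hypothetical helper (crosses_4g() is not a kernel function) checks whether a half-open region starts below 4 GiB and ends above it, as the quoted region [0xfee01000-0x192655000) does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FOUR_GIB (1ULL << 32)

/* True if the half-open range [start, end) begins below 4 GiB and ends
 * above it, the layout the text says e820_end_of_low_ram_pfn() does not
 * expect. */
static bool crosses_4g(uint64_t start, uint64_t end)
{
    assert(start <= end);
    return start < FOUR_GIB && end > FOUR_GIB;
}
```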
- 13 May 2011, 3 commits
Committed by Daniel Kiper
Clean up code/data section definitions according to include/linux/init.h.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Committed by Daniel Kiper
Git commit 24bdb0b6 (xen: do not create the extra e820 region at an addr lower than 4G) does not take into account that under CONFIG_X86_32, find_low_pfn_range() is called instead of e820_end_of_low_ram_pfn() (both calls are from arch/x86/kernel/setup.c). find_low_pfn_range() behaves correctly and does not require a change in the xen_extra_mem_start initialization.

Additionally, if xen_extra_mem_start is initialized in the same way as under CONFIG_X86_64, then memory hotplug support for the Xen balloon driver (under development) is broken.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Committed by Konrad Rzeszutek Wilk
When we parse the raw E820, the Xen hypervisor can set "E820_RAM" to "E820_UNUSABLE" if the mem=X argument is used. As such we should _not_ treat E820_UNUSABLE as a 1-1 identity mapping, but instead handle it the same way as E820_RAM.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
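A sketch of the resulting classification, assuming the classic x86 E820 type codes (E820_RAM = 1 through E820_UNUSABLE = 5). needs_identity_map() is a hypothetical helper; the point is only that E820_UNUSABLE now takes the RAM path instead of the 1-1 identity path.

```c
#include <assert.h>
#include <stdbool.h>

/* Classic x86 E820 type codes. */
enum {
    E820_RAM = 1,
    E820_RESERVED = 2,
    E820_ACPI = 3,
    E820_NVS = 4,
    E820_UNUSABLE = 5
};

/* Hypothetical helper: should a region of this type get a 1-1 p2m
 * mapping? E820_UNUSABLE (which Xen may produce from E820_RAM when mem=X
 * is used) is treated like RAM rather than like a hole. */
static bool needs_identity_map(int type)
{
    switch (type) {
    case E820_RAM:
    case E820_UNUSABLE:
        return false;
    default:
        return true;
    }
}
```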
- 20 April 2011, 1 commit
Committed by Stefano Stabellini
Do not add the extra e820 region at a physical address lower than 4G, because it breaks e820_end_of_low_ram_pfn().

It is OK for us to move xen_extra_mem_start up and down because this is the index of the memory that can be ballooned in/out; it is memory not available to the kernel during bootup.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
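A minimal sketch of the adjustment, assuming it simply raises the start of the extra region to 4 GiB whenever it would otherwise fall below that address. extra_mem_start() is a hypothetical helper, not the kernel code; moving the start is harmless here only because, as stated above, the region describes balloonable memory rather than memory the kernel uses at boot.

```c
#include <assert.h>
#include <stdint.h>

#define FOUR_GIB (1ULL << 32)

/* Hypothetical helper: place the extra (balloon) region no lower than
 * 4 GiB, so it can never straddle the 4 GiB boundary. */
static uint64_t extra_mem_start(uint64_t end_of_ram)
{
    return end_of_ram < FOUR_GIB ? FOUR_GIB : end_of_ram;
}
```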