- 20 8月, 2015 4 次提交
-
-
由 Juergen Gross 提交于
Check whether the hypervisor supplied p2m list is placed at a location which is conflicting with the target E820 map. If this is the case relocate it to a new area unused up to now and compliant to the E820 map. As the p2m list might by huge (up to several GB) and is required to be mapped virtually, set up a temporary mapping for the copied list. For pvh domains just delete the p2m related information from start info instead of reserving the p2m memory, as we don't need it at all. For 32 bit kernels adjust the memblock_reserve() parameters in order to cover the page tables only. This requires to memblock_reserve() the start_info page on it's own. Signed-off-by: NJuergen Gross <jgross@suse.com> Acked-by: NKonrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
由 Juergen Gross 提交于
Some special pages containing interfaces to xen are being reserved implicitly only today. The memblock_reserve() call to reserve them is meant to reserve the p2m list supplied by xen. It is just reserving not only the p2m list itself, but some more pages up to the start of the xen built page tables. To be able to move the p2m list to another pfn range, which is needed for support of huge RAM, this memblock_reserve() must be split up to cover all affected reserved pages explicitly. The affected pages are: - start_info page - xenstore ring (might be missing, mfn is 0 in this case) - console ring (not for initial domain) Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
由 Juergen Gross 提交于
Check whether the page tables built by the domain builder are at memory addresses which are in conflict with the target memory map. If this is the case just panic instead of running into problems later. Signed-off-by: NJuergen Gross <jgross@suse.com> Acked-by: NKonrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
由 Juergen Gross 提交于
Direct Xen to place the initial P->M table outside of the initial mapping, as otherwise the 1G (implementation) / 2G (theoretical) restriction on the size of the initial mapping limits the amount of memory a domain can be handed initially. As the initial P->M table is copied rather early during boot to domain private memory and it's initial virtual mapping is dropped, the easiest way to avoid virtual address conflicts with other addresses in the kernel is to use a user address area for the virtual address of the initial P->M table. This allows us to just throw away the page tables of the initial mapping after the copy without having to care about address invalidation. It should be noted that this patch won't enable a pv-domain to USE more than 512 GB of RAM. It just enables it to be started with a P->M table covering more memory. This is especially important for being able to boot a Dom0 on a system with more than 512 GB memory. Signed-off-by: NJuergen Gross <jgross@suse.com> Based-on-patch-by: NJan Beulich <jbeulich@suse.com> Acked-by: NKonrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
- 15 4月, 2015 1 次提交
-
-
由 Kirill A. Shutemov 提交于
We would want to use number of page table level to define mm_struct. Let's expose it as CONFIG_PGTABLE_LEVELS. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Tested-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 3月, 2015 2 次提交
-
-
由 David Vrabel 提交于
Make the IOCTL_PRIVCMD_MMAPBATCH_V2 (and older V1 version) map multiple frames at a time rather than one at a time, despite the pages being non-consecutive GFNs. xen_remap_foreign_mfn_array() is added which maps an array of GFNs (instead of a consecutive range of GFNs). Since per-frame errors are returned in an array, privcmd must set the MMAPBATCH_V1 error bits as part of the "report errors" phase, after all the frames are mapped. Migrate times are significantly improved (when using a PV toolstack domain). For example, for an idle 12 GiB PV guest: Before After real 0m38.179s 0m26.868s user 0m15.096s 0m13.652s sys 0m28.988s 0m18.732s Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com> Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
-
由 David Vrabel 提交于
Auto-translated physmap guests (arm, arm64 and x86 PVHVM/PVH) map and unmap foreign GFNs using the same method (updating the physmap). Unify the two arm and x86 implementations into one commont one. Note that on arm and arm64, the correct error code will be returned (instead of always -EFAULT) and map/unmap failure warnings are no longer printed. These changes are required if the foreign domain is paging (-ENOENT failures are expected and must be propagated up to the caller). Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com> Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
-
- 28 1月, 2015 2 次提交
-
-
由 Juergen Gross 提交于
Remove a nested ifdef. Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
由 Juergen Gross 提交于
The file arch/x86/xen/mmu.c has some functions that can be annotated with "__init". Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
- 11 12月, 2014 1 次提交
-
-
由 Juergen Gross 提交于
With the virtual mapped linear p2m list the post-init mmu operations must be used for setting up the p2m mappings, as in case of CONFIG_FLATMEM the init routines may trigger BUGs. paging_init() sets up all infrastructure needed to switch to the post-init mmu ops done by xen_post_allocator_init(). With the virtual mapped linear p2m list we need some mmu ops during setup of this list, so we have to switch to the correct mmu ops as soon as possible. The p2m list is usable from the beginning, just expansion requires to have established the new linear mapping. So the call of xen_remap_memory() had to be introduced, but this is not due to the mmu ops requiring this. Summing it up: calling xen_post_allocator_init() not directly after paging_init() was conceptually wrong in the beginning, it just didn't matter up to now as no functions used between the two calls needed some critical mmu ops (e.g. alloc_pte). This has changed now, so I corrected it. Reported-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
- 04 12月, 2014 4 次提交
-
-
由 Juergen Gross 提交于
At start of the day the Xen hypervisor presents a contiguous mfn list to a pv-domain. In order to support sparse memory this mfn list is accessed via a three level p2m tree built early in the boot process. Whenever the system needs the mfn associated with a pfn this tree is used to find the mfn. Instead of using a software walked tree for accessing a specific mfn list entry this patch is creating a virtual address area for the entire possible mfn list including memory holes. The holes are covered by mapping a pre-defined page consisting only of "invalid mfn" entries. Access to a mfn entry is possible by just using the virtual base address of the mfn list and the pfn as index into that list. This speeds up the (hot) path of determining the mfn of a pfn. Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0 showed following improvements: Elapsed time: 32:50 -> 32:35 System: 18:07 -> 17:47 User: 104:00 -> 103:30 Tested with following configurations: - 64 bit dom0, 8GB RAM - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without the patch) - 32 bit domU, ballooning up and down - 32 bit domU, save and restore - 32 bit domU with PCI passthrough - 64 bit domU, 8 GB, 2049 MB, 5000 MB - 64 bit domU, ballooning up and down - 64 bit domU, save and restore - 64 bit domU with PCI passthrough Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
由 Juergen Gross 提交于
Today get_phys_to_machine() is always called when the mfn for a pfn is to be obtained. Add a wrapper __pfn_to_mfn() as inline function to be able to avoid calling get_phys_to_machine() when possible as soon as the switch to a linear mapped p2m list has been done. Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
由 Juergen Gross 提交于
Early in the boot process the memory layout of a pv-domain is changed to match the E820 map (either the host one for Dom0 or the Xen one) regarding placement of RAM and PCI holes. This requires removing memory pages initially located at positions not suitable for RAM and adding them later at higher addresses where no restrictions apply. To be able to operate on the hypervisor supported p2m list until a virtual mapped linear p2m list can be constructed, remapping must be delayed until virtual memory management is initialized, as the initial p2m list can't be extended unlimited at physical memory initialization time due to it's fixed structure. A further advantage is the reduction in complexity and code volume as we don't have to be careful regarding memory restrictions during p2m updates. Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
由 Juergen Gross 提交于
In arch/x86/xen/p2m.c three different allocation functions for obtaining a memory page are used: extend_brk(), alloc_bootmem_align() or __get_free_page(). Which of those functions is used depends on the progress of the boot process of the system. Introduce a common allocation routine selecting the to be called allocation routine dynamically based on the boot progress. This allows moving initialization steps without having to care about changing allocation calls. Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
- 16 11月, 2014 1 次提交
-
-
由 Juergen Gross 提交于
With the dynamical mapping between cache modes and pgprot values it is now possible to use all cache modes via the Xen hypervisor PAT settings in a pv domain. All to be done is to read the PAT configuration MSR and set up the translation tables accordingly. Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: ville.syrjala@linux.intel.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-19-git-send-email-jgross@suse.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 04 11月, 2014 1 次提交
-
-
由 Andy Lutomirski 提交于
This adds CONFIG_X86_VSYSCALL_EMULATION, guarded by CONFIG_EXPERT. Turning it off completely disables vsyscall emulation, saving ~3.5k for vsyscall_64.c, 4k for vsyscall_emu_64.S (the fake vsyscall page), some tiny amount of core mm code that supports a gate area, and possibly 4k for a wasted pagetable. The latter is because the vsyscall addresses are misaligned and fit poorly in the fixmap. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Reviewed-by: NJosh Triplett <josh@joshtriplett.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/406db88b8dd5f0cbbf38216d11be34bbb43c7eae.1414618407.git.luto@amacapital.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 23 10月, 2014 1 次提交
-
-
由 Juergen Gross 提交于
The 3 level p2m tree for the Xen tools is constructed very early at boot by calling xen_build_mfn_list_list(). Memory needed for this tree is allocated via extend_brk(). As this tree (other than the kernel internal p2m tree) is only needed for domain save/restore, live migration and crash dump analysis it doesn't matter whether it is constructed very early or just some milliseconds later when memory allocation is possible by other means. This patch moves the call of xen_build_mfn_list_list() just after calling xen_pagetable_p2m_copy() simplifying this function, too, as it doesn't have to bother with two parallel trees now. The same applies for some other internal functions. While simplifying code, make early_can_reuse_p2m_middle() static and drop the unused second parameter. p2m_mid_identity_mfn can be removed as well, it isn't used either. Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
- 23 9月, 2014 1 次提交
-
-
由 David Vrabel 提交于
Since mfn_to_pfn() returns the correct PFN for identity mappings (as used for MMIO regions), the use of _PAGE_IOMAP is not required in pte_mfn_to_pfn(). Do not set the _PAGE_IOMAP flag in pte_pfn_to_mfn() and do not use it in pte_mfn_to_pfn(). This will allow _PAGE_IOMAP to be removed, making it available for future use. Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 10 9月, 2014 1 次提交
-
-
由 Stefan Bader 提交于
When RANDOMIZE_BASE (KASLR) is enabled; or the sum of all loaded modules exceeds 512 MiB, then loading modules fails with a warning (and hence a vmalloc allocation failure) because the PTEs for the newly-allocated vmalloc address space are not zero. WARNING: CPU: 0 PID: 494 at linux/mm/vmalloc.c:128 vmap_page_range_noflush+0x2a1/0x360() This is caused by xen_setup_kernel_pagetables() copying level2_kernel_pgt into level2_fixmap_pgt, overwriting many non-present entries. Without KASLR, the normal kernel image size only covers the first half of level2_kernel_pgt and module space starts after that. L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[ 0..255]->kernel [256..511]->module [511]->level2_fixmap_pgt[ 0..505]->module This allows 512 MiB of of module vmalloc space to be used before having to use the corrupted level2_fixmap_pgt entries. With KASLR enabled, the kernel image uses the full PUD range of 1G and module space starts in the level2_fixmap_pgt. So basically: L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[0..511]->kernel [511]->level2_fixmap_pgt[0..505]->module And now no module vmalloc space can be used without using the corrupt level2_fixmap_pgt entries. Fix this by properly converting the level2_fixmap_pgt entries to MFNs, and setting level1_fixmap_pgt as read-only. A number of comments were also using the the wrong L3 offset for level2_kernel_pgt. These have been corrected. Signed-off-by: NStefan Bader <stefan.bader@canonical.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org
-
- 27 5月, 2014 1 次提交
-
-
由 Mukesh Rathor 提交于
When running as a dom0 in PVH mode, foreign pfns that are accessed must be added to our p2m which is managed by xen. This is done via XENMEM_add_to_physmap_range hypercall. This is needed for toolstack building guests and mapping guest memory, xentrace mapping xen pages, etc. Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
- 15 5月, 2014 1 次提交
-
-
由 David Vrabel 提交于
_PAGE_IOMAP is used in xen_remap_domain_mfn_range() to prevent the pfn_pte() call in remap_area_mfn_pte_fn() from using the p2m to translate the MFN. If mfn_pte() is used instead, the p2m look up is avoided and the use of _PAGE_IOMAP is no longer needed. Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 06 5月, 2014 1 次提交
-
-
由 Andy Lutomirski 提交于
This makes the 64-bit and x32 vdsos use the same mechanism as the 32-bit vdso. Most of the churn is deleting all the old fixmap code. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/8af87023f57f6bb96ec8d17fce3f88018195b49b.1399317206.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 25 3月, 2014 1 次提交
-
-
由 David Vrabel 提交于
This reverts commit a9c8e4be. PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT is set and pseudo-physical addresses is _PAGE_PRESENT is clear. This is because during a domain save/restore (migration) the page table entries are "canonicalised" and uncanonicalised". i.e., MFNs are converted to PFNs during domain save so that on a restore the page table entries may be rewritten with the new MFNs on the destination. This canonicalisation is only done for PTEs that are present. This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or _PAGE_NUMA) was set but _PAGE_PRESENT was clear. These PTEs would be migrated as-is which would result in unexpected behaviour in the destination domain. Either a) the MFN would be translated to the wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE because the MFN is no longer owned by the domain; or c) the present bit would not get set. Symptoms include "Bad page" reports when munmapping after migrating a domain. Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com> Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> [3.12+]
-
- 14 3月, 2014 1 次提交
-
-
由 H. Peter Anvin 提交于
Checkin b0b49f26 x86, vdso: Remove compat vdso support ... removed the VDSO from the fixmap, and thus FIX_VDSO; remove a stray reference in Xen. Found by Fengguang Wu's test robot. Reported-by: NFengguang Wu <fengguang.wu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Link: http://lkml.kernel.org/r/4bb4690899106eb11430b1186d5cc66ca9d1660c.1394751608.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 11 2月, 2014 1 次提交
-
-
由 Mel Gorman 提交于
Steven Noonan forwarded a users report where they had a problem starting vsftpd on a Xen paravirtualized guest, with this in dmesg: BUG: Bad page map in process vsftpd pte:8000000493b88165 pmd:e9cc01067 page:ffffea00124ee200 count:0 mapcount:-1 mapping: (null) index:0x0 page flags: 0x2ffc0000000014(referenced|dirty) addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping: (null) index:7f97eea74 CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1 Call Trace: dump_stack+0x45/0x56 print_bad_pte+0x22e/0x250 unmap_single_vma+0x583/0x890 unmap_vmas+0x65/0x90 exit_mmap+0xc5/0x170 mmput+0x65/0x100 do_exit+0x393/0x9e0 do_group_exit+0xcc/0x140 SyS_exit_group+0x14/0x20 system_call_fastpath+0x1a/0x1f Disabling lock debugging due to kernel taint BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1 BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1 The issue could not be reproduced under an HVM instance with the same kernel, so it appears to be exclusive to paravirtual Xen guests. He bisected the problem to commit 1667918b ("mm: numa: clear numa hinting information on mprotect") that was also included in 3.12-stable. The problem was related to how xen translates ptes because it was not accounting for the _PAGE_NUMA bit. This patch splits pte_present to add a pteval_present helper for use by xen so both bare metal and xen use the same code when checking if a PTE is present. [mgorman@suse.de: wrote changelog, proposed minor modifications] [akpm@linux-foundation.org: fix typo in comment] Reported-by: NSteven Noonan <steven@uplinklabs.net> Tested-by: NSteven Noonan <steven@uplinklabs.net> Signed-off-by: NElena Ufimtseva <ufimtseva@gmail.com> Signed-off-by: NMel Gorman <mgorman@suse.de> Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com> Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 1月, 2014 1 次提交
-
-
由 Andi Kleen 提交于
The paravirt thunks use a hack of using a static reference to a static function to reference that function from the top level statement. This assumes that gcc always generates static function names in a specific format, which is not necessarily true. Simply make these functions global and asmlinkage or __visible. This way the static __used variables are not needed and everything works. Functions with arguments are __visible to keep the register calling convention on 32bit. Changed in paravirt and in all users (Xen and vsmp) v2: Use __visible for functions with arguments Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Ido Yariv <ido@wizery.com> Signed-off-by: NAndi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1382458079-24450-5-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 06 1月, 2014 4 次提交
-
-
由 Mukesh Rathor 提交于
We also optimize one - the TLB flush. The native operation would needlessly IPI offline VCPUs causing extra wakeups. Using the Xen one avoids that and lets the hypervisor determine which VCPU needs the TLB flush. Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Mukesh Rathor 提交于
.. which are surprisingly small compared to the amount for PV code. PVH uses mostly native mmu ops, we leave the generic (native_*) for the majority and just overwrite the baremetal with the ones we need. At startup, we are running with pre-allocated page-tables courtesy of the tool-stack. But we still need to graft them in the Linux initial pagetables. However there is no need to unpin/pin and change them to R/O or R/W. Note that the xen_pagetable_init due to 7836fec9d0994cc9c9150c5a33f0eb0eb08a335a "xen/mmu/p2m: Refactor the xen_pagetable_init code." does not need any changes - we just need to make sure that xen_post_allocator_init does not alter the pvops from the default native one. Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
-
由 Konrad Rzeszutek Wilk 提交于
Stefano noticed that the code runs only under 64-bit so the comments about 32-bit are pointless. Also we change the condition for xen_revector_p2m_tree returning the same value (because it could not allocate a swath of space to put the new P2M in) or it had been called once already. In such we return early from the function. Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
-
由 Konrad Rzeszutek Wilk 提交于
The revectoring and copying of the P2M only happens when !auto-xlat and on 64-bit builds. It is not obvious from the code, so lets have seperate 32 and 64-bit functions. We also invert the check for auto-xlat to make the code flow simpler. Suggested-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 15 11月, 2013 2 次提交
-
-
由 Kirill A. Shutemov 提交于
If split page table lock is in use, we embed the lock into struct page of table's page. We have to disable split lock, if spinlock_t is too big be to be embedded, like when DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC enabled. This patch add support for dynamic allocation of split page table lock if we can't embed it to struct page. page->ptl is unsigned long now and we use it as spinlock_t if sizeof(spinlock_t) <= sizeof(long), otherwise it's pointer to spinlock_t. The spinlock_t allocated in pgtable_page_ctor() for PTE table and in pgtable_pmd_page_ctor() for PMD table. All other helpers converted to support dynamically allocated page->ptl. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
We're going to introduce split page table lock for PMD level. Let's rename existing split ptlock for PTE level to avoid confusion. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: NAlex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 10月, 2013 2 次提交
-
-
由 Stefano Stabellini 提交于
Use xen_alloc_coherent_pages and xen_free_coherent_pages to allocate or free coherent pages. We need to be careful handling the pointer returned by xen_alloc_coherent_pages, because on ARM the pointer is not equal to phys_to_virt(*dma_handle). In fact virt_to_phys only works for kernel direct mapped RAM memory. In ARM case the pointer could be an ioremap address, therefore passing it to virt_to_phys would give you another physical address that doesn't correspond to it. Make xen_create_contiguous_region take a phys_addr_t as start parameter to avoid the virt_to_phys calls which would be incorrect. Changes in v6: - remove extra spaces. Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Stefano Stabellini 提交于
Modify xen_create_contiguous_region to return the dma address of the newly contiguous buffer. Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com> Changes in v4: - use virt_to_machine instead of virt_to_bus.
-
- 27 9月, 2013 1 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
Jan Beulich spotted that the PAT MSR settings in the Xen public document that "the first (PAT6) column was wrong across the board, and the column for PAT7 was missing altogether." This updates it to be in sync. CC: Jan Beulich <jbeulich@suse.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: NJan Beulich <jbeulich@suse.com>
-
- 12 4月, 2013 1 次提交
-
-
由 Kees Cook 提交于
Make a copy of the IDT (as seen via the "sidt" instruction) read-only. This primarily removes the IDT from being a target for arbitrary memory write attacks, and has the added benefit of also not leaking the kernel base offset, if it has been relocated. We already did this on vendor == Intel and family == 5 because of the F0 0F bug -- regardless of if a particular CPU had the F0 0F bug or not. Since the workaround was so cheap, there simply was no reason to be very specific. This patch extends the readonly alias to all CPUs, but does not activate the #PF to #UD conversion code needed to deliver the proper exception in the F0 0F case except on Intel family 5 processors. Signed-off-by: NKees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/20130410192422.GA17344@www.outflux.net Cc: Eric Northup <digitaleric@google.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 11 4月, 2013 1 次提交
-
-
由 Boris Ostrovsky 提交于
Invoking arch_flush_lazy_mmu_mode() results in calls to preempt_enable()/disable() which may have performance impact. Since lazy MMU is not used on bare metal we can patch away arch_flush_lazy_mmu_mode() so that it is never called in such environment. [ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU updates" may cause a minor performance regression on bare metal. This patch resolves that performance regression. It is somewhat unclear to me if this is a good -stable candidate. ] Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.comTested-by: NJosh Boyer <jwboyer@redhat.com> Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> SEE NOTE ABOVE
-
- 03 4月, 2013 1 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
Occassionaly on a DL380 G4 the guest would crash quite early with this: (XEN) d244:v0: unhandled page fault (ec=0003) (XEN) Pagetable walk from ffffffff84dc7000: (XEN) L4[0x1ff] = 00000000c3f18067 0000000000001789 (XEN) L3[0x1fe] = 00000000c3f14067 000000000000178d (XEN) L2[0x026] = 00000000dc8b2067 0000000000004def (XEN) L1[0x1c7] = 00100000dc8da067 0000000000004dc7 (XEN) domain_crash_sync called from entry.S (XEN) Domain 244 (vcpu#0) crashed on cpu#3: (XEN) ----[ Xen-4.1.3OVM x86_64 debug=n Not tainted ]---- (XEN) CPU: 3 (XEN) RIP: e033:[<ffffffff81263f22>] (XEN) RFLAGS: 0000000000000216 EM: 1 CONTEXT: pv guest (XEN) rax: 0000000000000000 rbx: ffffffff81785f88 rcx: 000000000000003f (XEN) rdx: 0000000000000000 rsi: 00000000dc8da063 rdi: ffffffff84dc7000 The offending code shows it to be a loop writting the value zero (%rax) in the %rdi (the L4 provided by Xen) register: 0: 44 00 00 add %r8b,(%rax) 3: 31 c0 xor %eax,%eax 5: b9 40 00 00 00 mov $0x40,%ecx a: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) 11: 00 00 13: ff c9 dec %ecx 15:* 48 89 07 mov %rax,(%rdi) <-- trapping instruction 18: 48 89 47 08 mov %rax,0x8(%rdi) 1c: 48 89 47 10 mov %rax,0x10(%rdi) which fails. xen_setup_kernel_pagetable recycles some of the Xen's page-table entries when it has switched over to its Linux page-tables. Right before try to clear the page, we make a hypercall to change it from _RO to _RW and that works (otherwise we would hit an BUG()). And the _RW flag is set for that page: (XEN) L1[0x1c7] = 001000004885f067 0000000000004dc7 The error code is 3, so PFEC_page_present and PFEC_write_access, so page is present (correct), and we tried to write to the page, but a violation occurred. The one theory is that the the page entries in hardware (which are cached) are not up to date with what we just set. Especially as we have just done an CR3 write and flushed the multicalls. This patch does solve the problem by flusing out the TLB page entry after changing it from _RO to _RW and we don't hit this issue anymore. Fixed-Oracle-Bug: 16243091 [ON OCCASIONS VM START GOES INTO 'CRASH' STATE: CLEAR_PAGE+0X12 ON HP DL380 G4] Reported-and-Tested-by: NSaar Maoz <Saar.Maoz@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 28 3月, 2013 1 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
We move the setting of write_cr3 from the early bootup variant (see git commit 0cc9129d "x86-64, xen, mmu: Provide an early version of write_cr3.") to a more appropiate location. This new location sets all of the other non-early variants of pvops calls - and most importantly is before the alternative_asm mechanism kicks in. Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 23 2月, 2013 1 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
With commit 8170e6be ("x86, 64bit: Use a #PF handler to materialize early mappings on demand") we started hitting an early bootup crash where the Xen hypervisor would inform us that: (XEN) d7:v0: unhandled page fault (ec=0000) (XEN) Pagetable walk from ffffea000005b2d0: (XEN) L4[0x1d4] = 0000000000000000 ffffffffffffffff (XEN) domain_crash_sync called from entry.S (XEN) Domain 7 (vcpu#0) crashed on cpu#3: (XEN) ----[ Xen-4.2.0 x86_64 debug=n Not tainted ]---- .. that Xen was unable to context switch back to dom0. Looking at the calling stack we find: [<ffffffff8103feba>] xen_get_user_pgd+0x5a <-- [<ffffffff8103feba>] xen_get_user_pgd+0x5a [<ffffffff81042d27>] xen_write_cr3+0x77 [<ffffffff81ad2d21>] init_mem_mapping+0x1f9 [<ffffffff81ac293f>] setup_arch+0x742 [<ffffffff81666d71>] printk+0x48 We are trying to figure out whether we need to up-date the user PGD as well. Please keep in mind that under 64-bit PV guests we have a limited amount of rings: 0 for the Hypervisor, and 1 for both the Linux kernel and user-space. As such the Linux pvops'fied version of write_cr3 checks if it has to update the user-space cr3 as well. That clearly is not needed during early bootup. The recent changes (see above git commit) streamline the x86 page table allocation to be much simpler (And also incidentally the #PF handler ends up in spirit being similar to how the Xen toolstack sets up the initial page-tables). The fix is to have an early-bootup version of cr3 that just loads the kernel %cr3. The later version - which also handles user-page modifications will be used after the initial page tables have been setup. [ hpa: removed a redundant #ifdef and made the new function __init. Also note that x86-32 already has such an early xen_write_cr3. ] Tested-by: N"H. Peter Anvin" <hpa@zytor.com> Reported-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1361579812-23709-1-git-send-email-konrad.wilk@oracle.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-