- 11 7月, 2014 3 次提交
-
-
由 Kim Phillips 提交于
A userspace process can map device MMIO memory via VFIO or /dev/mem, e.g., for platform device passthrough support in QEMU. During early development, we found the PAGE_S2 memory type being used for MMIO mappings. This patch corrects that by using the more strongly ordered memory type for device MMIO mappings: PAGE_S2_DEVICE. Signed-off-by: NKim Phillips <kim.phillips@linaro.org> Acked-by: NChristoffer Dall <christoffer.dall@linaro.org> Acked-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Eric Auger 提交于
Currently when a KVM region is deleted or moved after KVM_SET_USER_MEMORY_REGION ioctl, the corresponding intermediate physical memory is not unmapped. This patch corrects this and unmaps the region's IPA range in kvm_arch_commit_memory_region using unmap_stage2_range. Signed-off-by: NEric Auger <eric.auger@linaro.org> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Christoffer Dall 提交于
unmap_range() was utterly broken, to quote Marc, and broke in all sorts of situations. It was also quite complicated to follow and didn't follow the usual scheme of having a separate iterating function for each level of page tables. Address this by refactoring the code and introduce a pgd_clear() function. Reviewed-by: NJungseok Lee <jays.lee@samsung.com> Reviewed-by: NMario Smarduch <m.smarduch@samsung.com> Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
- 28 4月, 2014 1 次提交
-
-
由 Mark Salter 提交于
The kvm/mmu code shared by arm and arm64 uses kalloc() to allocate a bounce page (if hypervisor init code crosses page boundary) and hypervisor PGDs. The problem is that kalloc() does not guarantee the proper alignment. In the case of the bounce page, the page sized buffer allocated may also cross a page boundary negating the purpose and leading to a hang during kvm initialization. Likewise the PGDs allocated may not meet the minimum alignment requirements of the underlying MMU. This patch uses __get_free_page() to guarantee the worst case alignment needs of the bounce page and PGDs on both arm and arm64. Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: NMark Salter <msalter@redhat.com> Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
- 03 3月, 2014 4 次提交
-
-
由 Marc Zyngier 提交于
Compiling with THP enabled leads to the following warning: arch/arm/kvm/mmu.c: In function ‘unmap_range’: arch/arm/kvm/mmu.c:177:39: warning: ‘pte’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (kvm_pmd_huge(*pmd) || page_empty(pte)) { ^ Code inspection reveals that these two cases are mutually exclusive, so GCC is a bit overzealous here. Silence it anyway by initializing pte to NULL and testing it later on. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Marc Zyngier 提交于
When the guest runs with caches disabled (like in an early boot sequence, for example), all the writes are diectly going to RAM, bypassing the caches altogether. Once the MMU and caches are enabled, whatever sits in the cache becomes suddenly visible, which isn't what the guest expects. A way to avoid this potential disaster is to invalidate the cache when the MMU is being turned on. For this, we hook into the SCTLR_EL1 trapping code, and scan the stage-2 page tables, invalidating the pages/sections that have already been mapped in. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Marc Zyngier 提交于
The use of p*d_addr_end with stage-2 translation is slightly dodgy, as the IPA is 40bits, while all the p*d_addr_end helpers are taking an unsigned long (arm64 is fine with that as unligned long is 64bit). The fix is to introduce 64bit clean versions of the same helpers, and use them in the stage-2 page table code. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Marc Zyngier 提交于
In order for the guest with caches off to observe data written contained in a given page, we need to make sure that page is committed to memory, and not just hanging in the cache (as guest accesses are completely bypassing the cache until it decides to enable it). For this purpose, hook into the coherent_icache_guest_page function and flush the region if the guest SCTLR_EL1 register doesn't show the MMU and caches as being enabled. The function also get renamed to coherent_cache_guest_page. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
- 09 1月, 2014 1 次提交
-
-
由 Marc Zyngier 提交于
The THP code in KVM/ARM is a bit restrictive in not allowing a THP to be used if the VMA is not 2MB aligned. Actually, it is not so much the VMA that matters, but the associated memslot: A process can perfectly mmap a region with no particular alignment restriction, and then pass a 2MB aligned address to KVM. In this case, KVM will only use this 2MB aligned region, and will ignore the range between vma->vm_start and memslot->userspace_addr. It can also choose to place this memslot at whatever alignment it wants in the IPA space. In the end, what matters is the relative alignment of the user space and IPA mappings with respect to a 2M page. They absolutely must be the same if you want to use THP. Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
- 12 12月, 2013 1 次提交
-
-
由 Santosh Shilimkar 提交于
KVM initialisation fails on architectures implementing virt_to_idmap() because virt_to_phys() on such architectures won't fetch you the correct idmap page. So update the KVM ARM code to use the virt_to_idmap() to fix the issue. Since the KVM code is shared between arm and arm64, we create kvm_virt_to_phys() and handle the redirection in respective headers. Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
- 17 11月, 2013 1 次提交
-
-
由 Christoffer Dall 提交于
Using virt_to_phys on percpu mappings is horribly wrong as it may be backed by vmalloc. Introduce kvm_kaddr_to_phys which translates both types of valid kernel addresses to the corresponding physical address. At the same time resolves a typing issue where we were storing the physical address as a 32 bit unsigned long (on arm), truncating the physical address for addresses above the 4GB limit. This caused breakage on Keystone. Cc: <stable@vger.kernel.org> [3.10+] Reported-by: NSantosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
- 18 10月, 2013 2 次提交
-
-
由 Christoffer Dall 提交于
Support transparent huge pages in KVM/ARM and KVM/ARM64. The transparent_hugepage_adjust is not very pretty, but this is also how it's solved on x86 and seems to be simply an artifact on how THPs behave. This should eventually be shared across architectures if possible, but that can always be changed down the road. Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Christoffer Dall 提交于
Support huge pages in KVM/ARM and KVM/ARM64. The pud_huge checking on the unmap path may feel a bit silly as the pud_huge check is always defined to false, but the compiler should be smart about this. Note: This deals only with VMAs marked as huge which are allocated by users through hugetlbfs only. Transparent huge pages can only be detected by looking at the underlying pages (or the page tables themselves) and this patch so far simply maps these on a page-by-page level in the Stage-2 page tables. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
- 14 8月, 2013 1 次提交
-
-
由 Christoffer Dall 提交于
THe L_PTE_USER actually has nothing to do with stage 2 mappings and the L_PTE_S2_RDWR value sets the readable bit, which was what L_PTE_USER was used for before proper handling of stage 2 memory defines. Changelog: [v3]: Drop call to kvm_set_s2pte_writable in mmu.c [v2]: Change default mappings to be r/w instead of r/o, as per Marc Zyngier's suggestion. Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
-
- 08 8月, 2013 2 次提交
-
-
由 Marc Zyngier 提交于
When using 64kB pages, we only have two levels of page tables, meaning that PGD, PUD and PMD are fused. In this case, trying to refcount PUDs and PMDs independently is a a complete disaster, as they are the same. We manage to get it right for the allocation (stage2_set_pte uses {pmd,pud}_none), but the unmapping path clears both pud and pmd refcounts, which fails spectacularly with 2-level page tables. The fix is to avoid calling clear_pud_entry when both the pmd and pud pages are empty. For this, and instead of introducing another pud_empty function, consolidate both pte_empty and pmd_empty into page_empty (the code is actually identical) and use that to also test the validity of the pud. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Christoffer Dall 提交于
The unmap_range function did not properly cover the case when the start address was not aligned to PMD_SIZE or PUD_SIZE and an entire pte table or pmd table was cleared, causing us to leak memory when incrementing the addr. The fix is to always move onto the next page table entry boundary instead of adding the full size of the VA range covered by the corresponding table level entry. Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
- 27 6月, 2013 1 次提交
-
-
由 Marc Zyngier 提交于
S2_PGD_SIZE defines the number of pages used by a stage-2 PGD and is unused, except for a VM_BUG_ON check that missuses the define. As the check is very unlikely to ever triggered except in circumstances where KVM is the least of our worries, just kill both the define and the VM_BUG_ON check. Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <cdall@cs.columbia.edu>
-
- 03 6月, 2013 1 次提交
-
-
由 Marc Zyngier 提交于
The KVM/ARM MMU code doesn't take care of invalidating TLBs before freeing a {pte,pmd} table. This could cause problems if the page is reallocated and then speculated into by another CPU. Reported-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NChristoffer Dall <cdall@cs.columbia.edu>
-
- 29 4月, 2013 6 次提交
-
-
由 Marc Zyngier 提交于
Now that we have the necessary infrastructure to boot a hotplugged CPU at any point in time, wire a CPU notifier that will perform the HYP init for the incoming CPU. Note that this depends on the platform code and/or firmware to boot the incoming CPU with HYP mode enabled and return to the kernel by following the normal boot path (HYP stub installed). Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <cdall@cs.columbia.edu>
-
由 Marc Zyngier 提交于
Our HYP init code suffers from two major design issues: - it cannot support CPU hotplug, as we tear down the idmap very early - it cannot perform a TLB invalidation when switching from init to runtime mappings, as pages are manipulated from PL1 exclusively The hotplug problem mandates that we keep two sets of page tables (boot and runtime). The TLB problem mandates that we're able to transition from one PGD to another while in HYP, invalidating the TLBs in the process. To be able to do this, we need to share a page between the two page tables. A page that will have the same VA in both configurations. All we need is a VA that has the following properties: - This VA can't be used to represent a kernel mapping. - This VA will not conflict with the physical address of the kernel text The vectors page seems to satisfy this requirement: - The kernel never maps anything else there - The kernel text being copied at the beginning of the physical memory, it is unlikely to use the last 64kB (I doubt we'll ever support KVM on a system with something like 4MB of RAM, but patches are very welcome). Let's call this VA the trampoline VA. Now, we map our init page at 3 locations: - idmap in the boot pgd - trampoline VA in the boot pgd - trampoline VA in the runtime pgd The init scenario is now the following: - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd, runtime stack, runtime vectors - Enable the MMU with the boot pgd - Jump to a target into the trampoline page (remember, this is the same physical page!) - Now switch to the runtime pgd (same VA, and still the same physical page!) - Invalidate TLBs - Set stack and vectors - Profit! (or eret, if you only care about the code). Note that we keep the boot mapping permanently (it is not strictly an idmap anymore) to allow for CPU hotplug in later patches. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <cdall@cs.columbia.edu>
-
由 Marc Zyngier 提交于
There is no point in freeing HYP page tables differently from Stage-2. They now have the same requirements, and should be dealt with the same way. Promote unmap_stage2_range to be The One True Way, and get rid of a number of nasty bugs in the process (good thing we never actually called free_hyp_pmds before...). Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <cdall@cs.columbia.edu>
-
由 Marc Zyngier 提交于
After the HYP page table rework, it is pretty easy to let the KVM code provide its own idmap, rather than expecting the kernel to provide it. It takes actually less code to do so. Acked-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <cdall@cs.columbia.edu>
-
由 Marc Zyngier 提交于
The current code for creating HYP mapping doesn't like to wrap around zero, which prevents from mapping anything into the last page of the virtual address space. It doesn't take much effort to remove this limitation, making the code more consistent with the rest of the kernel in the process. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <cdall@cs.columbia.edu>
-
由 Marc Zyngier 提交于
The way we populate HYP mappings is a bit convoluted, to say the least. Passing a pointer around to keep track of the current PFN is quite odd, and we end-up having two different PTE accessors for no good reason. Simplify the whole thing by unifying the two PTE accessors, passing a pgprot_t around, and moving the various validity checks to the upper layers. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <cdall@cs.columbia.edu>
-
- 07 3月, 2013 11 次提交
-
-
由 Marc Zyngier 提交于
Instead of trying to free everything from PAGE_OFFSET to the top of memory, use the virt_addr_valid macro to check the upper limit. Also do the same for the vmalloc region where the IO mappings are allocated. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <cdall@cs.columbia.edu>
-
由 Marc Zyngier 提交于
v8 is capable of invalidating Stage-2 by IPA, but v7 is not. Change kvm_tlb_flush_vmid() to take an IPA parameter, which is then ignored by the invalidation code (and nuke the whole TLB as it always did). This allows v8 to implement a more optimized strategy. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Marc Zyngier 提交于
Since the arm64 code doesn't have a global asm/idmap.h file, move the inclusion to asm/kvm_mmu.h. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Marc Zyngier 提交于
The ARM ARM says that HPFAR reports bits [39:12] of the faulting IPA, and we need to complement it with the bottom 12 bits of the faulting VA. This is always 12 bits, irrespective of the page size. Makes it clearer in the code. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Marc Zyngier 提交于
__create_hyp_mappings() performs some kind of address validation before creating the mapping, by verifying that the start address is above PAGE_OFFSET. This check is not completely correct for kernel memory (the upper boundary has to be checked as well so we do not end up with highmem pages), and wrong for IO mappings (the mapping must exist in the vmalloc region). Fix this by using the proper predicates (virt_addr_valid and is_vmalloc_addr), which also work correctly on ARM64 (where the vmalloc region is below PAGE_OFFSET). Also change the BUG_ON() into a less agressive error return. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Marc Zyngier 提交于
arm64 cannot represent the kernel VAs in HYP mode, because of the lack of TTBR1 at EL2. A way to cope with this situation is to have HYP VAs to be an offset from the kernel VAs. Introduce macros to convert a kernel VA to a HYP VA, make the HYP mapping functions use these conversion macros. Also change the documentation to reflect the existence of the offset. On ARM, where we can have an identity mapping between kernel and HYP, the macros are without any effect. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Marc Zyngier 提交于
Move low level MMU-related operations to kvm_mmu.h. This makes the MMU code reusable by the arm64 port. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Marc Zyngier 提交于
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Marc Zyngier 提交于
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Marc Zyngier 提交于
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Marc Zyngier 提交于
Instead of directly accessing the fault registers, use proper accessors so the core code can be shared. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 25 2月, 2013 1 次提交
-
-
由 Marc Zyngier 提交于
Commit 7a905b14 (KVM: Remove user_alloc from struct kvm_memory_slot) broke KVM/ARM by removing the user_alloc field from a public structure. As we only used this field to alert the user that we didn't support this operation mode, there is no harm in discarding this bit of code without any remorse. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
- 24 1月, 2013 4 次提交
-
-
由 Christoffer Dall 提交于
When the guest accesses I/O memory this will create data abort exceptions and they are handled by decoding the HSR information (physical address, read/write, length, register) and forwarding reads and writes to QEMU which performs the device emulation. Certain classes of load/store operations do not support the syndrome information provided in the HSR. We don't support decoding these (patches are available elsewhere), so we report an error to user space in this case. This requires changing the general flow somewhat since new calls to run the VCPU must check if there's a pending MMIO load and perform the write after userspace has made the data available. Reviewed-by: NWill Deacon <will.deacon@arm.com> Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <c.dall@virtualopensystems.com>
-
由 Christoffer Dall 提交于
Handles the guest faults in KVM by mapping in corresponding user pages in the 2nd stage page tables. We invalidate the instruction cache by MVA whenever we map a page to the guest (no, we cannot only do it when we have an iabt because the guest may happily read/write a page before hitting the icache) if the hardware uses VIPT or PIPT. In the latter case, we can invalidate only that physical page. In the first case, all bets are off and we simply must invalidate the whole affair. Not that VIVT icaches are tagged with vmids, and we are out of the woods on that one. Alexander Graf was nice enough to remind us of this massive pain. Reviewed-by: NWill Deacon <will.deacon@arm.com> Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <c.dall@virtualopensystems.com>
-
由 Christoffer Dall 提交于
This commit introduces the framework for guest memory management through the use of 2nd stage translation. Each VM has a pointer to a level-1 table (the pgd field in struct kvm_arch) which is used for the 2nd stage translations. Entries are added when handling guest faults (later patch) and the table itself can be allocated and freed through the following functions implemented in arch/arm/kvm/arm_mmu.c: - kvm_alloc_stage2_pgd(struct kvm *kvm); - kvm_free_stage2_pgd(struct kvm *kvm); Each entry in TLBs and caches are tagged with a VMID identifier in addition to ASIDs. The VMIDs are assigned consecutively to VMs in the order that VMs are executed, and caches and tlbs are invalidated when the VMID space has been used to allow for more than 255 simultaenously running guests. The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is freed in kvm_arch_destroy_vm(). Both functions are called from the main KVM code. We pre-allocate page table memory to be able to synchronize using a spinlock and be called under rcu_read_lock from the MMU notifiers. We steal the mmu_memory_cache implementation from x86 and adapt for our specific usage. We support MMU notifiers (thanks to Marc Zyngier) through kvm_unmap_hva and kvm_set_spte_hva. Finally, define kvm_phys_addr_ioremap() to map a device at a guest IPA, which is used by VGIC support to map the virtual CPU interface registers to the guest. This support is added by Marc Zyngier. Reviewed-by: NWill Deacon <will.deacon@arm.com> Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <c.dall@virtualopensystems.com>
-
由 Christoffer Dall 提交于
Sets up KVM code to handle all exceptions taken to Hyp mode. When the kernel is booted in Hyp mode, calling an hvc instruction with r0 pointing to the new vectors, the HVBAR is changed to the the vector pointers. This allows subsystems (like KVM here) to execute code in Hyp-mode with the MMU disabled. We initialize other Hyp-mode registers and enables the MMU for Hyp-mode from the id-mapped hyp initialization code. Afterwards, the HVBAR is changed to point to KVM Hyp vectors used to catch guest faults and to switch to Hyp mode to perform a world-switch into a KVM guest. Also provides memory mapping code to map required code pages, data structures, and I/O regions accessed in Hyp mode at the same virtual address as the host kernel virtual addresses, but which conforms to the architectural requirements for translations in Hyp mode. This interface is added in arch/arm/kvm/arm_mmu.c and comprises: - create_hyp_mappings(from, to); - create_hyp_io_mappings(from, to, phys_addr); - free_hyp_pmds(); Reviewed-by: NWill Deacon <will.deacon@arm.com> Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <c.dall@virtualopensystems.com>
-