- 15 12月, 2014 1 次提交
-
-
由 Paul Mackerras 提交于
Testing with KSM active in the host showed occasional corruption of guest memory. Typically a page that should have contained zeroes would contain values that look like the contents of a user process stack (values such as 0x0000_3fff_xxxx_xxx). Code inspection in kvmppc_h_protect revealed that there was a race condition with the possibility of granting write access to a page which is read-only in the host page tables. The code attempts to keep the host mapping read-only if the host userspace PTE is read-only, but if that PTE had been temporarily made invalid for any reason, the read-only check would not trigger and the host HPTE could end up read-write. Examination of the guest HPT in the failure situation revealed that there were indeed shared pages which should have been read-only that were mapped read-write. To close this race, we don't let a page go from being read-only to being read-write, as far as the real HPTE mapping the page is concerned (the guest view can go to read-write, but the actual mapping stays read-only). When the guest tries to write to the page, we take an HDSI and let kvmppc_book3s_hv_page_fault take care of providing a writable HPTE for the page. This eliminates the occasional corruption of shared pages that was previously seen with KSM active. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 28 7月, 2014 1 次提交
-
-
由 Alexander Graf 提交于
When running on an LE host all data structures are kept in little endian byte order. However, the HTAB still needs to be maintained in big endian. So every time we access any HTAB we need to make sure we do so in the right byte order. Fix up all accesses to manually byte swap. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 25 6月, 2014 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
With guests supporting Multiple page size per segment (MPSS), hpte_page_size returns the actual page size used. Add a new function to return base page size and use that to compare against the the page size calculated from SLB. Without this patch a hpte lookup can fail since we are comparing wrong page size in kvmppc_hv_find_lock_hpte. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 30 5月, 2014 1 次提交
-
-
由 Paul Mackerras 提交于
The global_invalidates() function contains a check that is intended to tell whether we are currently executing in the context of a hypercall issued by the guest. The reason is that the optimization of using a local TLB invalidate instruction is only valid in that context. The check was testing local_paca->kvm_hstate.kvm_vcore, which gets set when entering the guest but no longer gets cleared when exiting the guest. To fix this, we use the kvm_vcpu field instead, which does get cleared when exiting the guest, by the kvmppc_release_hwthread() calls inside kvmppc_run_core(). The effect of having the check wrong was that when kvmppc_do_h_remove() got called from htab_write() on the destination machine during a migration, it cleared the current cpu's bit in kvm->arch.need_tlb_flush. This meant that when the guest started running in the destination VM, it may miss out on doing a complete TLB flush, and therefore may end up using stale TLB entries from a previous guest that used the same LPID value. This should make migration more reliable. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 28 4月, 2014 1 次提交
-
-
Numa fault is a method which help to achieve auto numa balancing. When such a page fault takes place, the page fault handler will check whether the page is placed correctly. If not, migration should be involved to cut down the distance between the cpu and pages. A pte with _PAGE_NUMA help to implement numa fault. It means not to allow the MMU to access the page directly. So a page fault is triggered and numa fault handler gets the opportunity to run checker. As for the access of MMU, we need special handling for the powernv's guest. When we mark a pte with _PAGE_NUMA, we already call mmu_notifier to invalidate it in guest's htab, but when we tried to re-insert them, we firstly try to map it in real-mode. Only after this fails, we fallback to virt mode, and most of important, we run numa fault handler in virt mode. This patch guards the way of real-mode to ensure that if a pte is marked with _PAGE_NUMA, it will NOT be mapped in real mode, instead, it will be mapped in virt mode and have the opportunity to be checked with placement. Signed-off-by: NLiu Ping Fan <pingfank@linux.vnet.ibm.com> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 29 3月, 2014 1 次提交
-
-
由 Paul Mackerras 提交于
With HV KVM, some high-frequency hypercalls such as H_ENTER are handled in real mode, and need to access the memslots array for the guest. Accessing the memslots array is safe, because we hold the SRCU read lock for the whole time that a guest vcpu is running. However, the checks that kvm_memslots() does when lockdep is enabled are potentially unsafe in real mode, when only the linear mapping is available. Furthermore, kvm_memslots() can be called from a secondary CPU thread, which is an offline CPU from the point of view of the host kernel, and is not running the task which holds the SRCU read lock. To avoid false positives in the checks in kvm_memslots(), and to avoid possible side effects from doing the checks in real mode, this replaces kvm_memslots() with kvm_memslots_raw() in all the places that execute in real mode. kvm_memslots_raw() is a new function that is like kvm_memslots() but uses rcu_dereference_raw_notrace() instead of kvm_dereference_check(). Signed-off-by: NPaul Mackerras <paulus@samba.org> Acked-by: NScott Wood <scottwood@freescale.com>
-
- 09 1月, 2014 1 次提交
-
-
由 Bharat Bhushan 提交于
lookup_linux_pte() is doing more than lookup, updating the pte, so for clarity it is renamed to lookup_linux_pte_and_update() Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com> Reviewed-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 18 12月, 2013 1 次提交
-
-
由 Paul Mackerras 提交于
Commit caaa4c80 ("KVM: PPC: Book3S HV: Fix physical address calculations") unfortunately resulted in some low-order address bits getting dropped in the case where the guest is creating a 4k HPTE and the host page size is 64k. By getting the low-order bits from hva rather than gpa we miss out on bits 12 - 15 in this case, since hva is at page granularity. This puts the missing bits back in. Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 19 11月, 2013 2 次提交
-
-
由 pingfan liu 提交于
Since kvmppc_hv_find_lock_hpte() is called from both virtmode and realmode, so it can trigger the deadlock. Suppose the following scene: Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of vcpus. If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched out, and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will be caught in realmode for a long time. What makes things even worse if the following happens, On cpuM, bitlockX is hold, on cpuN, Y is hold. vcpu_B_2 try to lock Y on cpuM in realmode vcpu_A_2 try to lock X on cpuN in realmode Oops! deadlock happens Signed-off-by: NLiu Ping Fan <pingfank@linux.vnet.ibm.com> Reviewed-by: NPaul Mackerras <paulus@samba.org> CC: stable@vger.kernel.org Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This fixes a bug in kvmppc_do_h_enter() where the physical address for a page can be calculated incorrectly if transparent huge pages (THP) are active. Until THP came along, it was true that if we encountered a large (16M) page in kvmppc_do_h_enter(), then the associated memslot must be 16M aligned for both its guest physical address and the userspace address, and the physical address calculations in kvmppc_do_h_enter() assumed that. With THP, that is no longer true. In the case where we are using MMU notifiers and the page size that we get from the Linux page tables is larger than the page being mapped by the guest, we need to fill in some low-order bits of the physical address. Without THP, these bits would be the same in the guest physical address (gpa) and the host virtual address (hva). With THP, they can be different, and we need to use the bits from hva rather than gpa. In the case where we are not using MMU notifiers, the host physical address we get from the memslot->arch.slot_phys[] array already includes the low-order bits down to the PAGE_SIZE level, even if we are using large pages. Thus we can simplify the calculation in this case to just add in the remaining bits in the case where PAGE_SIZE is 64k and the guest is mapping a 4k page. The same bug exists in kvmppc_book3s_hv_page_fault(). The basic fix is to use psize (the page size from the HPTE) rather than pte_size (the page size from the Linux PTE) when updating the HPTE low word in r. That means that pfn needs to be computed to PAGE_SIZE granularity even if the Linux PTE is a huge page PTE. That can be arranged simply by doing the page_to_pfn() before setting page to the head of the compound page. If psize is less than PAGE_SIZE, then we need to make sure we only update the bits from PAGE_SIZE upwards, in order not to lose any sub-page offset bits in r. On the other hand, if psize is greater than PAGE_SIZE, we need to make sure we don't bring in non-zero low order bits in pfn, hence we mask (pfn << PAGE_SHIFT) with ~(psize - 1). Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 14 8月, 2013 1 次提交
-
-
由 Anton Blanchard 提交于
Our ppc64 spinlocks and rwlocks use a trick where a lock token and the paca index are placed in the lock with a single store. Since we are using two u16s they need adjusting for little endian. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 10 7月, 2013 1 次提交
-
-
由 Paul Mackerras 提交于
This corrects the usage of the tlbie (TLB invalidate entry) instruction in HV KVM. The tlbie instruction changed between PPC970 and POWER7. On the PPC970, the bit to select large vs. small page is in the instruction, not in the RB register value. This changes the code to use the correct form on PPC970. On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower) field of the RB value incorrectly for 64k pages. This fixes it. Since we now have several cases to handle for the tlbie instruction, this factors out the code to do a sequence of tlbies into a new function, do_tlbies(), and calls that from the various places where the code was doing tlbie instructions inline. It also makes kvmppc_h_bulk_remove() use the same global_invalidates() function for determining whether to do local or global TLB invalidations as is used in other places, for consistency, and also to make sure that kvm->arch.need_tlb_flush gets updated properly. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 21 6月, 2013 2 次提交
-
-
由 Aneesh Kumar K.V 提交于
We can find pte that are splitting while walking page tables. Return None pte in that case. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Aneesh Kumar K.V 提交于
Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly document why we don't need to handle transparent hugepages at callsites. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 27 4月, 2013 1 次提交
-
-
由 Paul Mackerras 提交于
At present, the code that determines whether a HPT entry has changed, and thus needs to be sent to userspace when it is copying the HPT, doesn't consider a hardware update to the reference and change bits (R and C) in the HPT entries to constitute a change that needs to be sent to userspace. This adds code to check for changes in R and C when we are scanning the HPT to find changed entries, and adds code to set the changed flag for the HPTE when we update the R and C bits in the guest view of the HPTE. Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification() function into kvm_book3s_64.h. Current Linux guest kernels don't use the hardware updates of R and C in the HPT, so this change won't affect them. Linux (or other) kernels might in future want to use the R and C bits and have them correctly transferred across when a guest is migrated, so it is better to correct this deficiency. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 06 12月, 2012 6 次提交
-
-
由 Paul Mackerras 提交于
When we change or remove a HPT (hashed page table) entry, we can do either a global TLB invalidation (tlbie) that works across the whole machine, or a local invalidation (tlbiel) that only affects this core. Currently we do local invalidations if the VM has only one vcpu or if the guest requests it with the H_LOCAL flag, though the guest Linux kernel currently doesn't ever use H_LOCAL. Then, to cope with the possibility that vcpus moving around to different physical cores might expose stale TLB entries, there is some code in kvmppc_hv_entry to flush the whole TLB of entries for this VM if either this vcpu is now running on a different physical core from where it last ran, or if this physical core last ran a different vcpu. There are a number of problems on POWER7 with this as it stands: - The TLB invalidation is done per thread, whereas it only needs to be done per core, since the TLB is shared between the threads. - With the possibility of the host paging out guest pages, the use of H_LOCAL by an SMP guest is dangerous since the guest could possibly retain and use a stale TLB entry pointing to a page that had been removed from the guest. - The TLB invalidations that we do when a vcpu moves from one physical core to another are unnecessary in the case of an SMP guest that isn't using H_LOCAL. - The optimization of using local invalidations rather than global should apply to guests with one virtual core, not just one vcpu. (None of this applies on PPC970, since there we always have to invalidate the whole TLB when entering and leaving the guest, and we can't support paging out guest memory.) To fix these problems and simplify the code, we now maintain a simple cpumask of which cpus need to flush the TLB on entry to the guest. (This is indexed by cpu, though we only ever use the bits for thread 0 of each core.) Whenever we do a local TLB invalidation, we set the bits for every cpu except the bit for thread 0 of the core that we're currently running on. Whenever we enter a guest, we test and clear the bit for our core, and flush the TLB if it was set. On initial startup of the VM, and when resetting the HPT, we set all the bits in the need_tlb_flush cpumask, since any core could potentially have stale TLB entries from the previous VM to use the same LPID, or the previous contents of the HPT. Then, we maintain a count of the number of online virtual cores, and use that when deciding whether to use a local invalidation rather than the number of online vcpus. The code to make that decision is extracted out into a new function, global_invalidates(). For multi-core guests on POWER7 (i.e. when we are using mmu notifiers), we now never do local invalidations regardless of the H_LOCAL flag. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
Currently, if the guest does an H_PROTECT hcall requesting that the permissions on a HPT entry be changed to allow writing, we make the requested change even if the page is marked read-only in the host Linux page tables. This is a problem since it would for instance allow a guest to modify a page that KSM has decided can be shared between multiple guests. To fix this, if the new permissions for the page allow writing, we need to look up the memslot for the page, work out the host virtual address, and look up the Linux page tables to get the PTE for the page. If that PTE is read-only, we reduce the HPTE permissions to read-only. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This makes a HPTE removal function, kvmppc_do_h_remove(), available outside book3s_hv_rm_mmu.c. This will be used by the HPT writing code. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This uses a bit in our record of the guest view of the HPTE to record when the HPTE gets modified. We use a reserved bit for this, and ensure that this bit is always cleared in HPTE values returned to the guest. The recording of modified HPTEs is only done if other code indicates its interest by setting kvm->arch.hpte_mod_interest to a non-zero value. The reason for this is that when later commits add facilities for userspace to read the HPT, the first pass of reading the HPT will be quicker if there are no (or very few) HPTEs marked as modified, rather than having most HPTEs marked as modified. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This fixes a bug where adding a new guest HPT entry via the H_ENTER hcall would lose the "changed" bit in the reverse map information for the guest physical page being mapped. The result was that the KVM_GET_DIRTY_LOG could return a zero bit for the page even though the page had been modified by the guest. This fixes it by only modifying the index and present bits in the reverse map entry, thus preserving the reference and change bits. We were also unnecessarily setting the reference bit, and this fixes that too. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This restructures the code that creates HPT (hashed page table) entries so that it can be called in situations where we don't have a struct vcpu pointer, only a struct kvm pointer. It also fixes a bug where kvmppc_map_vrma() would corrupt the guest R4 value. Most of the work of kvmppc_virtmode_h_enter is now done by a new function, kvmppc_virtmode_do_h_enter, which itself calls another new function, kvmppc_do_h_enter, which contains most of the old kvmppc_h_enter. The new kvmppc_do_h_enter takes explicit arguments for the place to return the HPTE index, the Linux page tables to use, and whether it is being called in real mode, thus removing the need for it to have the vcpu as an argument. Currently kvmppc_map_vrma creates the VRMA (virtual real mode area) HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily to handle H_ENTER hcalls from the guest that need to pin a page of memory. Since H_ENTER returns the index of the created HPTE in R4, kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4 in the case when it gets called from kvmppc_map_vrma on the first VCPU_RUN ioctl. With this, kvmppc_map_vrma instead calls kvmppc_virtmode_do_h_enter with the address of a dummy word as the place to store the HPTE index, thus avoiding corrupting the guest R4. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 23 10月, 2012 1 次提交
-
-
由 Christoffer Dall 提交于
The mmu_notifier_retry is not specific to any vcpu (and never will be) so only take struct kvm as a parameter. The motivation is the ARM mmu code that needs to call this from somewhere where we long let go of the vcpu pointer. Signed-off-by: NChristoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 06 10月, 2012 2 次提交
-
-
由 Paul Mackerras 提交于
This adds an implementation of kvm_arch_flush_shadow_memslot for Book3S HV, and arranges for kvmppc_core_commit_memory_region to flush the dirty log when modifying an existing slot. With this, we can handle deletion and modification of memory slots. kvm_arch_flush_shadow_memslot calls kvmppc_core_flush_memslot, which on Book3S HV now traverses the reverse map chains to remove any HPT (hashed page table) entries referring to pages in the memslot. This gets called by generic code whenever deleting a memslot or changing the guest physical address for a memslot. We flush the dirty log in kvmppc_core_commit_memory_region for consistency with what x86 does. We only need to flush when an existing memslot is being modified, because for a new memslot the rmap array (which stores the dirty bits) is all zero, meaning that every page is considered clean already, and when deleting a memslot we obviously don't care about the dirty bits any more. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
Now that we have an architecture-specific field in the kvm_memory_slot structure, we can use it to store the array of page physical addresses that we need for Book3S HV KVM on PPC970 processors. This reduces the size of struct kvm_arch for Book3S HV, and also reduces the size of struct kvm_arch_memory_slot for other PPC KVM variants since the fields in it are now only compiled in for Book3S HV. This necessitates making the kvm_arch_create_memslot and kvm_arch_free_memslot operations specific to each PPC KVM variant. That in turn means that we now don't allocate the rmap arrays on Book3S PR and Book E. Since we now unpin pages and free the slot_phys array in kvmppc_core_free_memslot, we no longer need to do it in kvmppc_core_destroy_vm, since the generic code takes care to free all the memslots when destroying a VM. We now need the new memslot to be passed in to kvmppc_core_prepare_memory_region, since we need to initialize its arch.slot_phys member on Book3S HV. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 28 8月, 2012 1 次提交
-
-
由 Gavin Shan 提交于
The build error was caused by that builtin functions are calling the functions implemented in modules. This error was introduced by commit 4d8b81ab ("KVM: introduce readonly memslot"). The patch fixes the build error by moving function __gfn_to_hva_memslot() from kvm_main.c to kvm_host.h and making that "inline" so that the builtin function (kvmppc_h_enter) can use that. Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 06 8月, 2012 1 次提交
-
-
由 Takuya Yoshikawa 提交于
Two reasons: - x86 can integrate rmap and rmap_pde and remove heuristics in __gfn_to_rmap(). - Some architectures do not need rmap. Since rmap is one of the most memory consuming stuff in KVM, ppc'd better restrict the allocation to Book3S HV. Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 30 5月, 2012 1 次提交
-
-
由 Paul Mackerras 提交于
This adds a new ioctl to enable userspace to control the size of the guest hashed page table (HPT) and to clear it out when resetting the guest. The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter a pointer to a u32 containing the desired order of the HPT (log base 2 of the size in bytes), which is updated on successful return to the actual order of the HPT which was allocated. There must be no vcpus running at the time of this ioctl. To enforce this, we now keep a count of the number of vcpus running in kvm->arch.vcpus_running. If the ioctl is called when a HPT has already been allocated, we don't reallocate the HPT but just clear it out. We first clear the kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold the kvm->lock mutex, it will prevent any vcpus from starting to run until we're done, and (b) it means that the first vcpu to run after we're done will re-establish the VRMA if necessary. If userspace doesn't call this ioctl before running the first vcpu, the kernel will allocate a default-sized HPT at that point. We do it then rather than when creating the VM, as the code did previously, so that userspace has a chance to do the ioctl if it wants. When allocating the HPT, we can allocate either from the kernel page allocator, or from the preallocated pool. If userspace is asking for a different size from the preallocated HPTs, we first try to allocate using the kernel page allocator. Then we try to allocate from the preallocated pool, and then if that fails, we try allocating decreasing sizes from the kernel page allocator, down to the minimum size allowed (256kB). Note that the kernel page allocator limits allocations to 1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to 16MB (on 64-bit powerpc, at least). Signed-off-by: NPaul Mackerras <paulus@samba.org> [agraf: fix module compilation] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 16 5月, 2012 1 次提交
-
-
由 Paul Mackerras 提交于
When handling the H_BULK_REMOVE hypercall, we were forgetting to invalidate and unlock the hashed page table entry (HPTE) in the case where the page had been paged out. This fixes it by clearing the first doubleword of the HPTE in that case. This fixes a regression introduced in commit a92bce95 ("KVM: PPC: Book3S HV: Keep HPTE locked when invalidating"). The effect of the regression is that the host kernel will sometimes hang when under memory pressure. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 05 3月, 2012 12 次提交
-
-
由 Paul Mackerras 提交于
This moves __gfn_to_memslot() and search_memslots() from kvm_main.c to kvm_host.h to reduce the code duplication caused by the need for non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c to call gfn_to_memslot() in real mode. Rather than putting gfn_to_memslot() itself in a header, which would lead to increased code size, this puts __gfn_to_memslot() in a header. Then, the non-modular uses of gfn_to_memslot() are changed to call __gfn_to_memslot() instead. This way there is only one place in the source code that needs to be changed should the gfn_to_memslot() implementation need to be modified. On powerpc, the Book3S HV style of KVM has code that is called from real mode which needs to call gfn_to_memslot() and thus needs this. (Module code is allocated in the vmalloc region, which can't be accessed in real mode.) With this, we can remove builtin_gfn_to_memslot() from book3s_hv_rm_mmu.c. Signed-off-by: NPaul Mackerras <paulus@samba.org> Acked-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This uses the host view of the hardware R (referenced) bit to speed up kvm_age_hva() and kvm_test_age_hva(). Instead of removing all the relevant HPTEs in kvm_age_hva(), we now just reset their R bits if set. Also, kvm_test_age_hva() now scans the relevant HPTEs to see if any of them have R set. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This allows both the guest and the host to use the referenced (R) and changed (C) bits in the guest hashed page table. The guest has a view of R and C that is maintained in the guest_rpte field of the revmap entry for the HPTE, and the host has a view that is maintained in the rmap entry for the associated gfn. Both view are updated from the guest HPT. If a bit (R or C) is zero in either view, it will be initially set to zero in the HPTE (or HPTEs), until set to 1 by hardware. When an HPTE is removed for any reason, the R and C bits from the HPTE are ORed into both views. We have to be careful to read the R and C bits from the HPTE after invalidating it, but before unlocking it, in case of any late updates by the hardware. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This reworks the implementations of the H_REMOVE and H_BULK_REMOVE hcalls to make sure that we keep the HPTE locked and in the reverse- mapping chain until we have finished invalidating it. Previously we would remove it from the chain and unlock it before invalidating it, leaving a tiny window when the guest could access the page even though we believe we have removed it from the guest (e.g., kvm_unmap_hva() has been called for the page and has found no HPTEs in the chain). In addition, we'll need this for future patches where we will need to read the R and C bits in the HPTE after invalidating it. Doing this required restructuring kvmppc_h_bulk_remove() substantially. Since we want to batch up the tlbies, we now need to keep several HPTEs locked simultaneously. In order to avoid possible deadlocks, we don't spin on the HPTE bitlock for any except the first HPTE in a batch. If we can't acquire the HPTE bitlock for the second or subsequent HPTE, we terminate the batch at that point, do the tlbies that we have accumulated so far, unlock those HPTEs, and then start a new batch to do the remaining invalidations. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
With this, if a guest does an H_ENTER with a read/write HPTE on a page which is currently read-only, we make the actual HPTE inserted be a read-only version of the HPTE. We now intercept protection faults as well as HPTE not found faults, and for a protection fault we work out whether it should be reflected to the guest (e.g. because the guest HPTE didn't allow write access to usermode) or handled by switching to kernel context and calling kvmppc_book3s_hv_page_fault, which will then request write access to the page and update the actual HPTE. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This adds the infrastructure to enable us to page out pages underneath a Book3S HV guest, on processors that support virtualized partition memory, that is, POWER7. Instead of pinning all the guest's pages, we now look in the host userspace Linux page tables to find the mapping for a given guest page. Then, if the userspace Linux PTE gets invalidated, kvm_unmap_hva() gets called for that address, and we replace all the guest HPTEs that refer to that page with absent HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit set, which will cause an HDSI when the guest tries to access them. Finally, the page fault handler is extended to reinstantiate the guest HPTE when the guest tries to access a page which has been paged out. Since we can't intercept the guest DSI and ISI interrupts on PPC970, we still have to pin all the guest pages on PPC970. We have a new flag, kvm->arch.using_mmu_notifiers, that indicates whether we can page guest pages out. If it is not set, the MMU notifier callbacks do nothing and everything operates as before. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This provides the low-level support for MMIO emulation in Book3S HV guests. When the guest tries to map a page which is not covered by any memslot, that page is taken to be an MMIO emulation page. Instead of inserting a valid HPTE, we insert an HPTE that has the valid bit clear but another hypervisor software-use bit set, which we call HPTE_V_ABSENT, to indicate that this is an absent page. An absent page is treated much like a valid page as far as guest hcalls (H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that an absent HPTE doesn't need to be invalidated with tlbie since it was never valid as far as the hardware is concerned. When the guest accesses a page for which there is an absent HPTE, it will take a hypervisor data storage interrupt (HDSI) since we now set the VPM1 bit in the LPCR. Our HDSI handler for HPTE-not-present faults looks up the hash table and if it finds an absent HPTE mapping the requested virtual address, will switch to kernel mode and handle the fault in kvmppc_book3s_hv_page_fault(), which at present just calls kvmppc_hv_emulate_mmio() to set up the MMIO emulation. This is based on an earlier patch by Benjamin Herrenschmidt, but since heavily reworked. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This expands the reverse mapping array to contain two links for each HPTE which are used to link together HPTEs that correspond to the same guest logical page. Each circular list of HPTEs is pointed to by the rmap array entry for the guest logical page, pointed to by the relevant memslot. Links are 32-bit HPT entry indexes rather than full 64-bit pointers, to save space. We use 3 of the remaining 32 bits in the rmap array entries as a lock bit, a referenced bit and a present bit (the present bit is needed since HPTE index 0 is valid). The bit lock for the rmap chain nests inside the HPTE lock bit. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This provides for the case where userspace maps an I/O device into the address range of a memory slot using a VM_PFNMAP mapping. In that case, we work out the pfn from vma->vm_pgoff, and record the cache enable bits from vma->vm_page_prot in two low-order bits in the slot_phys array entries. Then, in kvmppc_h_enter() we check that the cache bits in the HPTE that the guest wants to insert match the cache bits in the slot_phys array entry. However, we do allow the guest to create what it thinks is a non-cacheable or write-through mapping to memory that is actually cacheable, so that we can use normal system memory as part of an emulated device later on. In that case the actual HPTE we insert is a cacheable HPTE. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This relaxes the requirement that the guest memory be provided as 16MB huge pages, allowing it to be provided as normal memory, i.e. in pages of PAGE_SIZE bytes (4k or 64k). To allow this, we index the kvm->arch.slot_phys[] arrays with a small page index, even if huge pages are being used, and use the low-order 5 bits of each entry to store the order of the enclosing page with respect to normal pages, i.e. log_2(enclosing_page_size / PAGE_SIZE). Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This removes the code from kvmppc_core_prepare_memory_region() that looked up the VMA for the region being added and called hva_to_page to get the pfns for the memory. We have no guarantee that there will be anything mapped there at the time of the KVM_SET_USER_MEMORY_REGION ioctl call; userspace can do that ioctl and then map memory into the region later. Instead we defer looking up the pfn for each memory page until it is needed, which generally means when the guest does an H_ENTER hcall on the page. Since we can't call get_user_pages in real mode, if we don't already have the pfn for the page, kvmppc_h_enter() will return H_TOO_HARD and we then call kvmppc_virtmode_h_enter() once we get back to kernel context. That calls kvmppc_get_guest_page() to get the pfn for the page, and then calls back to kvmppc_h_enter() to redo the HPTE insertion. When the first vcpu starts executing, we need to have the RMO or VRMA region mapped so that the guest's real mode accesses will work. Thus we now have a check in kvmppc_vcpu_run() to see if the RMO/VRMA is set up and if not, call kvmppc_hv_setup_rma(). It checks if the memslot starting at guest physical 0 now has RMO memory mapped there; if so it sets it up for the guest, otherwise on POWER7 it sets up the VRMA. The function that does that, kvmppc_map_vrma, is now a bit simpler, as it calls kvmppc_virtmode_h_enter instead of creating the HPTE itself. Since we are now potentially updating entries in the slot_phys[] arrays from multiple vcpu threads, we now have a spinlock protecting those updates to ensure that we don't lose track of any references to pages. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
At present, our implementation of H_ENTER only makes one try at locking each slot that it looks at, and doesn't even retry the ldarx/stdcx. atomic update sequence that it uses to attempt to lock the slot. Thus it can return the H_PTEG_FULL error unnecessarily, particularly when the H_EXACT flag is set, meaning that the caller wants a specific PTEG slot. This improves the situation by making a second pass when no free HPTE slot is found, where we spin until we succeed in locking each slot in turn and then check whether it is full while we hold the lock. If the second pass fails, then we return H_PTEG_FULL. This also moves lock_hpte to a header file (since later commits in this series will need to use it from other source files) and renames it to try_lock_hpte, which is a somewhat less misleading name. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-