- 17 10月, 2013 5 次提交
-
-
由 Aneesh Kumar K.V 提交于
This patch add a new callback kvmppc_ops. This will help us in enabling both HV and PR KVM together in the same kernel. The actual change to enable them together is done in the later patch in the series. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: squash in booke changes] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
The mark_page_dirty() function, despite what its name might suggest, doesn't actually mark the page as dirty as far as the MM subsystem is concerned. It merely sets a bit in KVM's map of dirty pages, if userspace has requested dirty tracking for the relevant memslot. To tell the MM subsystem that the page is dirty, we have to call kvm_set_pfn_dirty() (or an equivalent such as SetPageDirty()). This adds a call to kvm_set_pfn_dirty(), and while we are here, also adds a call to kvm_set_pfn_accessed() to tell the MM subsystem that the page has been accessed. Since we are now using the pfn in several places, this adds a 'pfn' variable to store it and changes the places that used hpaddr >> PAGE_SHIFT to use pfn instead, which is the same thing. This also changes a use of HPTE_R_PP to PP_RXRX. Both are 3, but PP_RXRX is more informative as being the read-only page permission bit setting. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
When the MM code is invalidating a range of pages, it calls the KVM kvm_mmu_notifier_invalidate_range_start() notifier function, which calls kvm_unmap_hva_range(), which arranges to flush all the existing host HPTEs for guest pages. However, the Linux PTEs for the range being flushed are still valid at that point. We are not supposed to establish any new references to pages in the range until the ...range_end() notifier gets called. The PPC-specific KVM code doesn't get any explicit notification of that; instead, we are supposed to use mmu_notifier_retry() to test whether we are or have been inside a range flush notifier pair while we have been getting a page and instantiating a host HPTE for the page. This therefore adds a call to mmu_notifier_retry inside kvmppc_mmu_map_page(). This call is inside a region locked with kvm->mmu_lock, which is the same lock that is called by the KVM MMU notifier functions, thus ensuring that no new notification can proceed while we are in the locked region. Inside this region we also create the host HPTE and link the corresponding hpte_cache structure into the lists used to find it later. We cannot allocate the hpte_cache structure inside this locked region because that can lead to deadlock, so we allocate it outside the region and free it if we end up not using it. This also moves the updates of vcpu3s->hpte_cache_count inside the regions locked with vcpu3s->mmu_lock, and does the increment in kvmppc_mmu_hpte_cache_map() when the pte is added to the cache rather than when it is allocated, in order that the hpte_cache_count is accurate. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
Currently we request write access to all pages that get mapped into the guest, even if the guest is only loading from the page. This reduces the effectiveness of KSM because it means that we unshare every page we access. Also, we always set the changed (C) bit in the guest HPTE if it allows writing, even for a guest load. This fixes both these problems. We pass an 'iswrite' flag to the mmu.xlate() functions and to kvmppc_mmu_map_page() to indicate whether the access is a load or a store. The mmu.xlate() functions now only set C for stores. kvmppc_gfn_to_pfn() now calls gfn_to_pfn_prot() instead of gfn_to_pfn() so that it can indicate whether we need write access to the page, and get back a 'writable' flag to indicate whether the page is writable or not. If that 'writable' flag is clear, we then make the host HPTE read-only even if the guest HPTE allowed writing. This means that we can get a protection fault when the guest writes to a page that it has mapped read-write but which is read-only on the host side (perhaps due to KSM having merged the page). Thus we now call kvmppc_handle_pagefault() for protection faults as well as HPTE not found faults. In kvmppc_handle_pagefault(), if the access was allowed by the guest HPTE and we thus need to install a new host HPTE, we then need to remove the old host HPTE if there is one. This is done with a new function, kvmppc_mmu_unmap_page(), which uses kvmppc_mmu_pte_vflush() to find and remove the old host HPTE. Since the memslot-related functions require the KVM SRCU read lock to be held, this adds srcu_read_lock/unlock pairs around the calls to kvmppc_handle_pagefault(). Finally, this changes kvmppc_mmu_book3s_32_xlate_pte() to not ignore guest HPTEs that don't permit access, and to return -EPERM for accesses that are not permitted by the page protections. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
Currently, PR KVM uses 4k pages for the host-side mappings of guest memory, regardless of the host page size. When the host page size is 64kB, we might as well use 64k host page mappings for guest mappings of 64kB and larger pages and for guest real-mode mappings. However, the magic page has to remain a 4k page. To implement this, we first add another flag bit to the guest VSID values we use, to indicate that this segment is one where host pages should be mapped using 64k pages. For segments with this bit set we set the bits in the shadow SLB entry to indicate a 64k base page size. When faulting in host HPTEs for this segment, we make them 64k HPTEs instead of 4k. We record the pagesize in struct hpte_cache for use when invalidating the HPTE. For now we restrict the segment containing the magic page (if any) to 4k pages. It should be possible to lift this restriction in future by ensuring that the magic 4k page is appropriately positioned within a host 64k page. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 30 6月, 2013 2 次提交
-
-
由 Paul Mackerras 提交于
With this, the guest can use 1TB segments as well as 256MB segments. Since we now have the situation where a single emulated guest segment could correspond to multiple shadow segments (as the shadow segments are still 256MB segments), this adds a new kvmppc_mmu_flush_segment() to scan for all shadow segments that need to be removed. This restructures the guest HPT (hashed page table) lookup code to use the correct hashing and matching functions for HPTEs within a 1TB segment. We use the standard hpt_hash() function instead of open-coding the hash calculation, and we use HPTE_V_COMPARE() with an AVPN value that has the B (segment size) field included. The calculation of avpn is done a little earlier since it doesn't change in the loop starting at the do_second label. The computation in kvmppc_mmu_book3s_64_esid_to_vsid() changes so that it returns a 256MB VSID even if the guest SLB entry is a 1TB entry. This is because the users of this function are creating 256MB SLB entries. We set a new VSID_1T flag so that entries created from 1T segments don't collide with entries from 256MB segments. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This makes sure the calculation of the proto-VSIDs used by PR KVM is done with 64-bit arithmetic. Since vcpu3s->context_id[] is int, when we do vcpu3s->context_id[0] << ESID_BITS the shift will be done with 32-bit instructions, possibly leading to significant bits getting lost, as the context id can be up to 524283 and ESID_BITS is 18. To fix this we cast the context id to u64 before shifting. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 21 6月, 2013 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
If a hash bucket gets full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". This implies that when we are invalidating or updating pte, even if HPTE entry is not valid we should do a tlb invalidate. With hugepages, we need to pass the correct actual page size value for tlb invalidation. This change update the patch 0608d692 "powerpc/mm: Always invalidate tlb on hpte invalidate and update" to handle transparent hugepages correctly. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 30 4月, 2013 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
We look at both the segment base page size and actual page size and store the pte-lp-encodings in an array per base page size. We also update all relevant functions to take actual page size argument so that we can use the correct PTE LP encoding in HPTE. This should also get the basic Multiple Page Size per Segment (MPSS) support. This is needed to enable THP on ppc64. [Fixed PR KVM build --BenH] Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 17 3月, 2013 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
Now we use ESID_BITS of kernel address to build proto vsid. So rename USER_ESIT_BITS to ESID_BITS Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.8]
-
- 30 10月, 2012 1 次提交
-
-
由 Xiao Guangrong 提交于
This patch filters noslot pfn out from error pfns based on Marcelo comment: noslot pfn is not a error pfn After this patch, - is_noslot_pfn indicates that the gfn is not in slot - is_error_pfn indicates that the gfn is in slot but the error is occurred when translate the gfn to pfn - is_error_noslot_pfn indicates that the pfn either it is error pfns or it is noslot pfn And is_invalid_pfn can be removed, it makes the code more clean Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 06 10月, 2012 1 次提交
-
-
由 Alexander Graf 提交于
Now that we have very simple MMU Notifier support for e500 in place, also add the same simple support to book3s. It gets us one step closer to actual fast support. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 17 9月, 2012 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
This patch convert different functions to take virtual page number instead of virtual address. Virtual page number is virtual address shifted right by VPN_SHIFT (12) bits. This enable us to have an address range of upto 76 bits. Reviewed-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 16 8月, 2012 1 次提交
-
-
由 Alexander Graf 提交于
When we map a page that wasn't icache cleared before, do so when first mapping it in KVM using the same information bits as the Linux mapping logic. That way we are 100% sure that any page we map does not have stale entries in the icache. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 16 5月, 2012 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
The code forgot to scramble the VSIDs the way we normally do and was basically using the "proto VSID" directly with the MMU. This means that in practice, KVM used random VSIDs that could collide with segments used by other user space programs. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> [agraf: simplify ppc32 case] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 05 3月, 2012 1 次提交
-
-
由 Alexander Graf 提交于
When running the 64-bit Book3s PR code without CONFIG_PREEMPT_NONE, we were doing a few things wrong, most notably access to PACA fields without making sure that the pointers stay stable accross the access (preempt_disable()). This patch moves to_svcpu towards a get/put model which allows us to disable preemption while accessing the shadow vcpu fields in the PACA. That way we can run preemptible and everyone's happy! Reported-by: NJörg Sommer <joerg@alea.gnuu.de> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 24 10月, 2010 9 次提交
-
-
由 Alexander Graf 提交于
Up until now we were doing segment mappings wrong on Book3s_32. For Book3s_64 we were using a trick where we know that a single mmu_context gives us 16 bits of context ids. The mm system on Book3s_32 instead uses a clever algorithm to distribute VSIDs across the available range, so a context id really only gives us 16 available VSIDs. To keep at least a few guest processes in the SID shadow, let's map a number of contexts that we can use as VSID pool. This makes the code be actually correct and shouldn't hurt performance too much. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
The define VSID_ALL is unused. Let's remove it. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
It turns out the in-kernel hash function is sub-optimal for our subtle hash inputs where every bit is significant. So let's revert to the original hash functions. This reverts commit 05340ab4f9a6626f7a2e8f9fe5397c61d494f445. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
This patch moves debugging printks for shadow SLB debugging over to tracepoints. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
After a flush the sid map contained lots of entries with 0 for their gvsid and hvsid value. Unfortunately, 0 can be a real value the guest searches for when looking up a vsid so it would incorrectly find the host's 0 hvsid mapping which doesn't belong to our sid space. So let's also check for the valid bit that indicated that the sid we're looking at actually contains useful data. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
This patch moves Book3s MMU debugging over to tracepoints. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Gleb Natapov 提交于
On failure gfn_to_pfn returns bad_page so use correct function to check for that. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Alexander Graf 提交于
We need to override EA as well as PA lookups for the magic page. When the guest tells us to project it, the magic page overrides any guest mappings. In order to reflect that, we need to hook into all the MMU layers of KVM to force map the magic page if necessary. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Alexander Graf 提交于
One of the most obvious registers to share with the guest directly is the MSR. The MSR contains the "interrupts enabled" flag which the guest has to toggle in critical sections. So in order to bring the overhead of interrupt en- and disabling down, let's put msr into the shared page. Keep in mind that even though you can fully read its contents, writing to it doesn't always update all state. There are a few safe fields that don't require hypervisor interaction. See the documentation for a list of MSR bits that are safe to be set from inside the guest. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 01 8月, 2010 3 次提交
-
-
由 Alexander Graf 提交于
We just introduced generic functions to handle shadow pages on PPC. This patch makes the respective backends make use of them, getting rid of a lot of duplicate code along the way. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Alexander Graf 提交于
The linux kernel already provides a hash function. Let's reuse that instead of reinventing the wheel! Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Alexander Graf 提交于
Initially we had to search for pte entries to invalidate them. Since the logic has improved since then, we can just get rid of the search function. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 17 5月, 2010 7 次提交
-
-
由 Alexander Graf 提交于
We have some debug output in Book3S_64. Some of that was invalid though, partially not even compiling because it accessed incorrect variables. So let's fix that up, making debugging more fun again. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Alexander Graf 提交于
We have a condition in the ppc64 host mmu code that should never occur. Unfortunately, it just did happen to me and I was rather puzzled on why, because BUG_ON doesn't tell me anything useful. So let's add some more debug output in case this goes wrong. Also change BUG to WARN, since I don't want to reboot every time I mess something up. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Alexander Graf 提交于
There are some pieces in the code that I overlooked that still use u64s instead of longs. This slows down 32 bit hosts unnecessarily, so let's just move them to ulong. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Alexander Graf 提交于
When we mapped a page as read-only, we can just release it as clean to KVM's page claim mechanisms, because we're pretty sure it hasn't been touched. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Alexander Graf 提交于
The host shadow mmu code needs to get initialized. It needs to fetch a segment it can use to put shadow PTEs into. That initialization code was in generic code, which is icky. Let's move it over to the respective MMU file. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Alexander Graf 提交于
We already have some inline fuctions we use to access vcpu or svcpu structs, depending on whether we're on booke or book3s. Since we just put a few more registers into the svcpu, we also need to make sure the respective callbacks are available and get used. So this patch moves direct use of the now in the svcpu struct fields to inline function calls. While at it, it also moves the definition of those inline function calls to respective header files for booke and book3s, greatly improving readability. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Alexander Graf 提交于
Some HTAB providers (namely the PS3) ignore the SECONDARY flag. They just put an entry in the htab as secondary when they see fit. So we need to check the return value of htab_insert to remember the correct slot id so we can actually invalidate the entry again. Fixes KVM on the PS3. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 25 4月, 2010 1 次提交
-
-
由 Alexander Graf 提交于
We had code to make use of the secondary htab buckets, but kept that disabled because it was unstable when I put it in. I checked again if that's still the case and apparently it was only exposing some instability that was there anyways before. I haven't seen any badness related to usage of secondary htab entries so far. This should speed up guest memory allocations by quite a bit, because we now have more space to put PTEs in. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 05 11月, 2009 1 次提交
-
-
由 Alexander Graf 提交于
We designed the Book3S port of KVM as modular as possible. Most of the code could be easily used on a Book3S_32 host as well. The main difference between 32 and 64 bit cores is the MMU. To keep things well separated, we treat the book3s_64 MMU as one possible compile option. This patch adds all the MMU helpers the rest of the code needs in order to modify the host's MMU, like setting PTEs and segments. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-