- 01 5月, 2016 1 次提交
-
-
由 Michael Ellerman 提交于
We have five locations in 64-bit hash MMU code that do a cmpxchg() of a PTE. Currently doing it inline OK, but in a future patch we will be converting the PTEs to __be64 in some configs. In that case we will need casts at every cmpxchg() site in order to keep sparse happy. So move the logic into a helper, this is a reasonably nice cleanup on its own. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 16 2月, 2016 1 次提交
-
-
由 Alexey Kardashevskiy 提交于
SPAPR_TCE_SHIFT is used in few places only and since IOMMU_PAGE_SHIFT_4K can be easily used instead, remove SPAPR_TCE_SHIFT. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 05 6月, 2015 1 次提交
-
-
由 Paolo Bonzini 提交于
Only two ioctls have to be modified; the address space id is placed in the higher 16 bits of their slot id argument. As of this patch, no architecture defines more than one address space; x86 will be the first. Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 4月, 2015 3 次提交
-
-
由 Paul Mackerras 提交于
This replaces the assembler code for kvmhv_commence_exit() with C code in book3s_hv_builtin.c. It also moves the IPI sending code that was in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq(). Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This creates a debugfs directory for each HV guest (assuming debugfs is enabled in the kernel config), and within that directory, a file by which the contents of the guest's HPT (hashed page table) can be read. The directory is named vmnnnn, where nnnn is the PID of the process that created the guest. The file is named "htab". This is intended to help in debugging problems in the host's management of guest memory. The contents of the file consist of a series of lines like this: 3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196 The first field is the index of the entry in the HPT, the second and third are the HPT entry, so the third entry contains the real page number that is mapped by the entry if the entry's valid bit is set. The fourth field is the guest's view of the second doubleword of the entry, so it contains the guest physical address. (The format of the second through fourth fields are described in the Power ISA and also in arch/powerpc/include/asm/mmu-hash64.h.) Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Aneesh Kumar K.V 提交于
This adds helper routines for locking and unlocking HPTEs, and uses them in the rest of the code. We don't change any locking rules in this patch. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 17 4月, 2015 2 次提交
-
-
由 Aneesh Kumar K.V 提交于
For THP that is marked trans splitting, we return the pte. This require the callers to handle the pmd_trans_splitting scenario, if they care. All the current callers are either looking at pfn or write_ok, hence we don't need to update them. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
pte can get updated from other CPUs as part of multiple activities like THP split, huge page collapse, unmap. We need to make sure we don't reload the pte value again and again for different checks. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 10 4月, 2015 1 次提交
-
-
由 Michael Ellerman 提交于
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Fix the 32-bit code also] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 31 3月, 2015 1 次提交
-
-
由 Joe Perches 提交于
Use the normal return values for bool functions Signed-off-by: NJoe Perches <joe@perches.com> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 17 12月, 2014 1 次提交
-
-
由 Paul Mackerras 提交于
This removes the code that was added to enable HV KVM to work on PPC970 processors. The PPC970 is an old CPU that doesn't support virtualizing guest memory. Removing PPC970 support also lets us remove the code for allocating and managing contiguous real-mode areas, the code for the !kvm->arch.using_mmu_notifiers case, the code for pinning pages of guest memory when first accessed and keeping track of which pages have been pinned, and the code for handling H_ENTER hypercalls in virtual mode. Book3S HV KVM is now supported only on POWER7 and POWER8 processors. The KVM_CAP_PPC_RMA capability now always returns 0. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 15 12月, 2014 1 次提交
-
-
由 Paul Mackerras 提交于
The B (segment size) field in the RB operand for the tlbie instruction is two bits, which we get from the top two bits of the first doubleword of the HPT entry to be invalidated. These bits go in bits 8 and 9 of the RB operand (bits 54 and 55 in IBM bit numbering). The compute_tlbie_rb() function gets these bits as v >> (62 - 8), which is not correct as it will bring in the top 10 bits, not just the top two. These extra bits could corrupt the AP, AVAL and L fields in the RB value. To fix this we shift right 62 bits and then shift left 8 bits, so we only get the two bits of the B field. The first doubleword of the HPT entry is under the control of the guest kernel. In fact, Linux guests will always put zeroes in bits 54 -- 61 (IBM bits 2 -- 9), but we should not rely on guests doing this. Signed-off-by: NPaul Mackerras <paulus@samba.org> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 28 7月, 2014 3 次提交
-
-
由 Aneesh Kumar K.V 提交于
When calculating the lower bits of AVA field, use the shift count based on the base page size. Also add the missing segment size and remove stale comment. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
When running on an LE host all data structures are kept in little endian byte order. However, the HTAB still needs to be maintained in big endian. So every time we access any HTAB we need to make sure we do so in the right byte order. Fix up all accesses to manually byte swap. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
Some compilers complain about uninitialized variables in the compute_tlbie_rb function. When you follow the code path you'll realize that we'll never get to that point, but the compiler isn't all that smart. So just default to 4k page sizes for everything, making the compiler happy and the code slightly easier to read. Signed-off-by: NAlexander Graf <agraf@suse.de> Acked-by: NPaul Mackerras <paulus@samba.org>
-
- 25 6月, 2014 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
With guests supporting Multiple page size per segment (MPSS), hpte_page_size returns the actual page size used. Add a new function to return base page size and use that to compare against the the page size calculated from SLB. Without this patch a hpte lookup can fail since we are comparing wrong page size in kvmppc_hv_find_lock_hpte. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 30 5月, 2014 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
On recent IBM Power CPUs, while the hashed page table is looked up using the page size from the segmentation hardware (i.e. the SLB), it is possible to have the HPT entry indicate a larger page size. Thus for example it is possible to put a 16MB page in a 64kB segment, but since the hash lookup is done using a 64kB page size, it may be necessary to put multiple entries in the HPT for a single 16MB page. This capability is called mixed page-size segment (MPSS). With MPSS, there are two relevant page sizes: the base page size, which is the size used in searching the HPT, and the actual page size, which is the size indicated in the HPT entry. [ Note that the actual page size is always >= base page size ]. We use "ibm,segment-page-sizes" device tree node to advertise the MPSS support to PAPR guest. The penc encoding indicates whether we support a specific combination of base page size and actual page size in the same segment. We also use the penc value in the LP encoding of HPTE entry. This patch exposes MPSS support to KVM guest by advertising the feature via "ibm,segment-page-sizes". It also adds the necessary changes to decode the base page size and the actual page size correctly from the HPTE entry. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 29 3月, 2014 1 次提交
-
-
由 Paul Mackerras 提交于
With HV KVM, some high-frequency hypercalls such as H_ENTER are handled in real mode, and need to access the memslots array for the guest. Accessing the memslots array is safe, because we hold the SRCU read lock for the whole time that a guest vcpu is running. However, the checks that kvm_memslots() does when lockdep is enabled are potentially unsafe in real mode, when only the linear mapping is available. Furthermore, kvm_memslots() can be called from a secondary CPU thread, which is an offline CPU from the point of view of the host kernel, and is not running the task which holds the SRCU read lock. To avoid false positives in the checks in kvm_memslots(), and to avoid possible side effects from doing the checks in real mode, this replaces kvm_memslots() with kvm_memslots_raw() in all the places that execute in real mode. kvm_memslots_raw() is a new function that is like kvm_memslots() but uses rcu_dereference_raw_notrace() instead of kvm_dereference_check(). Signed-off-by: NPaul Mackerras <paulus@samba.org> Acked-by: NScott Wood <scottwood@freescale.com>
-
- 17 10月, 2013 2 次提交
-
-
由 Aneesh Kumar K.V 提交于
This help ups to select the relevant code in the kernel code when we later move HV and PR bits as seperate modules. The patch also makes the config options for PR KVM selectable Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Aneesh Kumar K.V 提交于
With later patches supporting PR kvm as a kernel module, the changes that has to be built into the main kernel binary to enable PR KVM module is now selected via KVM_BOOK3S_PR_POSSIBLE Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 10 7月, 2013 1 次提交
-
-
由 Paul Mackerras 提交于
This corrects the usage of the tlbie (TLB invalidate entry) instruction in HV KVM. The tlbie instruction changed between PPC970 and POWER7. On the PPC970, the bit to select large vs. small page is in the instruction, not in the RB register value. This changes the code to use the correct form on PPC970. On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower) field of the RB value incorrectly for 64k pages. This fixes it. Since we now have several cases to handle for the tlbie instruction, this factors out the code to do a sequence of tlbies into a new function, do_tlbies(), and calls that from the various places where the code was doing tlbie instructions inline. It also makes kvmppc_h_bulk_remove() use the same global_invalidates() function for determining whether to do local or global TLB invalidations as is used in other places, for consistency, and also to make sure that kvm->arch.need_tlb_flush gets updated properly. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 08 7月, 2013 2 次提交
-
-
由 Aneesh Kumar K.V 提交于
Older version of power architecture use Real Mode Offset register and Real Mode Limit Selector for mapping guest Real Mode Area. The guest RMA should be physically contigous since we use the range when address translation is not enabled. This patch switch RMA allocation code to use contigous memory allocator. The patch also remove the the linear allocator which not used any more Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Aneesh Kumar K.V 提交于
Powerpc architecture uses a hash based page table mechanism for mapping virtual addresses to physical address. The architecture require this hash page table to be physically contiguous. With KVM on Powerpc currently we use early reservation mechanism for allocating guest hash page table. This implies that we need to reserve a big memory region to ensure we can create large number of guest simultaneously with KVM on Power. Another disadvantage is that the reserved memory is not available to rest of the subsystems and and that implies we limit the total available memory in the host. This patch series switch the guest hash page table allocation to use contiguous memory allocator. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 21 6月, 2013 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
We can find pte that are splitting while walking page tables. Return None pte in that case. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 27 4月, 2013 1 次提交
-
-
由 Paul Mackerras 提交于
At present, the code that determines whether a HPT entry has changed, and thus needs to be sent to userspace when it is copying the HPT, doesn't consider a hardware update to the reference and change bits (R and C) in the HPT entries to constitute a change that needs to be sent to userspace. This adds code to check for changes in R and C when we are scanning the HPT to find changed entries, and adds code to set the changed flag for the HPTE when we update the R and C bits in the guest view of the HPTE. Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification() function into kvm_book3s_64.h. Current Linux guest kernels don't use the hardware updates of R and C in the HPT, so this change won't affect them. Linux (or other) kernels might in future want to use the R and C bits and have them correctly transferred across when a guest is migrated, so it is better to correct this deficiency. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 06 12月, 2012 2 次提交
-
-
由 Paul Mackerras 提交于
A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor. Reads on this fd return the contents of the HPT (hashed page table), writes create and/or remove entries in the HPT. There is a new capability, KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl. The ioctl takes an argument structure with the index of the first HPT entry to read out and a set of flags. The flags indicate whether the user is intending to read or write the HPT, and whether to return all entries or only the "bolted" entries (those with the bolted bit, 0x10, set in the first doubleword). This is intended for use in implementing qemu's savevm/loadvm and for live migration. Therefore, on reads, the first pass returns information about all HPTEs (or all bolted HPTEs). When the first pass reaches the end of the HPT, it returns from the read. Subsequent reads only return information about HPTEs that have changed since they were last read. A read that finds no changed HPTEs in the HPT following where the last read finished will return 0 bytes. The format of the data provides a simple run-length compression of the invalid entries. Each block of data starts with a header that indicates the index (position in the HPT, which is just an array), the number of valid entries starting at that index (may be zero), and the number of invalid entries following those valid entries. The valid entries, 16 bytes each, follow the header. The invalid entries are not explicitly represented. Signed-off-by: NPaul Mackerras <paulus@samba.org> [agraf: fix documentation] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This uses a bit in our record of the guest view of the HPTE to record when the HPTE gets modified. We use a reserved bit for this, and ensure that this bit is always cleared in HPTE values returned to the guest. The recording of modified HPTEs is only done if other code indicates its interest by setting kvm->arch.hpte_mod_interest to a non-zero value. The reason for this is that when later commits add facilities for userspace to read the HPT, the first pass of reading the HPT will be quicker if there are no (or very few) HPTEs marked as modified, rather than having most HPTEs marked as modified. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 30 10月, 2012 1 次提交
-
-
由 Paul Mackerras 提交于
This fixes an error in the inline asm in try_lock_hpte() where we were erroneously using a register number as an immediate operand. The bug only affects an error path, and in fact the code will still work as long as the compiler chooses some register other than r0 for the "bits" variable. Nevertheless it should still be fixed. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 30 5月, 2012 1 次提交
-
-
由 Paul Mackerras 提交于
This adds a new ioctl to enable userspace to control the size of the guest hashed page table (HPT) and to clear it out when resetting the guest. The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter a pointer to a u32 containing the desired order of the HPT (log base 2 of the size in bytes), which is updated on successful return to the actual order of the HPT which was allocated. There must be no vcpus running at the time of this ioctl. To enforce this, we now keep a count of the number of vcpus running in kvm->arch.vcpus_running. If the ioctl is called when a HPT has already been allocated, we don't reallocate the HPT but just clear it out. We first clear the kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold the kvm->lock mutex, it will prevent any vcpus from starting to run until we're done, and (b) it means that the first vcpu to run after we're done will re-establish the VRMA if necessary. If userspace doesn't call this ioctl before running the first vcpu, the kernel will allocate a default-sized HPT at that point. We do it then rather than when creating the VM, as the code did previously, so that userspace has a chance to do the ioctl if it wants. When allocating the HPT, we can allocate either from the kernel page allocator, or from the preallocated pool. If userspace is asking for a different size from the preallocated HPTs, we first try to allocate using the kernel page allocator. Then we try to allocate from the preallocated pool, and then if that fails, we try allocating decreasing sizes from the kernel page allocator, down to the minimum size allowed (256kB). Note that the kernel page allocator limits allocations to 1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to 16MB (on 64-bit powerpc, at least). Signed-off-by: NPaul Mackerras <paulus@samba.org> [agraf: fix module compilation] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 05 3月, 2012 10 次提交
-
-
由 Paul Mackerras 提交于
With this, if a guest does an H_ENTER with a read/write HPTE on a page which is currently read-only, we make the actual HPTE inserted be a read-only version of the HPTE. We now intercept protection faults as well as HPTE not found faults, and for a protection fault we work out whether it should be reflected to the guest (e.g. because the guest HPTE didn't allow write access to usermode) or handled by switching to kernel context and calling kvmppc_book3s_hv_page_fault, which will then request write access to the page and update the actual HPTE. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This adds the infrastructure to enable us to page out pages underneath a Book3S HV guest, on processors that support virtualized partition memory, that is, POWER7. Instead of pinning all the guest's pages, we now look in the host userspace Linux page tables to find the mapping for a given guest page. Then, if the userspace Linux PTE gets invalidated, kvm_unmap_hva() gets called for that address, and we replace all the guest HPTEs that refer to that page with absent HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit set, which will cause an HDSI when the guest tries to access them. Finally, the page fault handler is extended to reinstantiate the guest HPTE when the guest tries to access a page which has been paged out. Since we can't intercept the guest DSI and ISI interrupts on PPC970, we still have to pin all the guest pages on PPC970. We have a new flag, kvm->arch.using_mmu_notifiers, that indicates whether we can page guest pages out. If it is not set, the MMU notifier callbacks do nothing and everything operates as before. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This provides the low-level support for MMIO emulation in Book3S HV guests. When the guest tries to map a page which is not covered by any memslot, that page is taken to be an MMIO emulation page. Instead of inserting a valid HPTE, we insert an HPTE that has the valid bit clear but another hypervisor software-use bit set, which we call HPTE_V_ABSENT, to indicate that this is an absent page. An absent page is treated much like a valid page as far as guest hcalls (H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that an absent HPTE doesn't need to be invalidated with tlbie since it was never valid as far as the hardware is concerned. When the guest accesses a page for which there is an absent HPTE, it will take a hypervisor data storage interrupt (HDSI) since we now set the VPM1 bit in the LPCR. Our HDSI handler for HPTE-not-present faults looks up the hash table and if it finds an absent HPTE mapping the requested virtual address, will switch to kernel mode and handle the fault in kvmppc_book3s_hv_page_fault(), which at present just calls kvmppc_hv_emulate_mmio() to set up the MMIO emulation. This is based on an earlier patch by Benjamin Herrenschmidt, but since heavily reworked. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This expands the reverse mapping array to contain two links for each HPTE which are used to link together HPTEs that correspond to the same guest logical page. Each circular list of HPTEs is pointed to by the rmap array entry for the guest logical page, pointed to by the relevant memslot. Links are 32-bit HPT entry indexes rather than full 64-bit pointers, to save space. We use 3 of the remaining 32 bits in the rmap array entries as a lock bit, a referenced bit and a present bit (the present bit is needed since HPTE index 0 is valid). The bit lock for the rmap chain nests inside the HPTE lock bit. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This provides for the case where userspace maps an I/O device into the address range of a memory slot using a VM_PFNMAP mapping. In that case, we work out the pfn from vma->vm_pgoff, and record the cache enable bits from vma->vm_page_prot in two low-order bits in the slot_phys array entries. Then, in kvmppc_h_enter() we check that the cache bits in the HPTE that the guest wants to insert match the cache bits in the slot_phys array entry. However, we do allow the guest to create what it thinks is a non-cacheable or write-through mapping to memory that is actually cacheable, so that we can use normal system memory as part of an emulated device later on. In that case the actual HPTE we insert is a cacheable HPTE. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This relaxes the requirement that the guest memory be provided as 16MB huge pages, allowing it to be provided as normal memory, i.e. in pages of PAGE_SIZE bytes (4k or 64k). To allow this, we index the kvm->arch.slot_phys[] arrays with a small page index, even if huge pages are being used, and use the low-order 5 bits of each entry to store the order of the enclosing page with respect to normal pages, i.e. log_2(enclosing_page_size / PAGE_SIZE). Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This removes the code from kvmppc_core_prepare_memory_region() that looked up the VMA for the region being added and called hva_to_page to get the pfns for the memory. We have no guarantee that there will be anything mapped there at the time of the KVM_SET_USER_MEMORY_REGION ioctl call; userspace can do that ioctl and then map memory into the region later. Instead we defer looking up the pfn for each memory page until it is needed, which generally means when the guest does an H_ENTER hcall on the page. Since we can't call get_user_pages in real mode, if we don't already have the pfn for the page, kvmppc_h_enter() will return H_TOO_HARD and we then call kvmppc_virtmode_h_enter() once we get back to kernel context. That calls kvmppc_get_guest_page() to get the pfn for the page, and then calls back to kvmppc_h_enter() to redo the HPTE insertion. When the first vcpu starts executing, we need to have the RMO or VRMA region mapped so that the guest's real mode accesses will work. Thus we now have a check in kvmppc_vcpu_run() to see if the RMO/VRMA is set up and if not, call kvmppc_hv_setup_rma(). It checks if the memslot starting at guest physical 0 now has RMO memory mapped there; if so it sets it up for the guest, otherwise on POWER7 it sets up the VRMA. The function that does that, kvmppc_map_vrma, is now a bit simpler, as it calls kvmppc_virtmode_h_enter instead of creating the HPTE itself. Since we are now potentially updating entries in the slot_phys[] arrays from multiple vcpu threads, we now have a spinlock protecting those updates to ensure that we don't lose track of any references to pages. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
At present, our implementation of H_ENTER only makes one try at locking each slot that it looks at, and doesn't even retry the ldarx/stdcx. atomic update sequence that it uses to attempt to lock the slot. Thus it can return the H_PTEG_FULL error unnecessarily, particularly when the H_EXACT flag is set, meaning that the caller wants a specific PTEG slot. This improves the situation by making a second pass when no free HPTE slot is found, where we spin until we succeed in locking each slot in turn and then check whether it is full while we hold the lock. If the second pass fails, then we return H_PTEG_FULL. This also moves lock_hpte to a header file (since later commits in this series will need to use it from other source files) and renames it to try_lock_hpte, which is a somewhat less misleading name. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Paul Mackerras 提交于
This adds an array that parallels the guest hashed page table (HPT), that is, it has one entry per HPTE, used to store the guest's view of the second doubleword of the corresponding HPTE. The first doubleword in the HPTE is the same as the guest's idea of it, so we don't need to store a copy, but the second doubleword in the HPTE has the real page number rather than the guest's logical page number. This allows us to remove the back_translate() and reverse_xlate() functions. This "reverse mapping" array is vmalloc'd, meaning that to access it in real mode we have to walk the kernel's page tables explicitly. That is done by the new real_vmalloc_addr() function. (In fact this returns an address in the linear mapping, so the result is usable both in real mode and in virtual mode.) There are also some minor cleanups here: moving the definitions of HPT_ORDER etc. to a header file and defining HPT_NPTE for HPT_NPTEG << 3. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Alexander Graf 提交于
When running the 64-bit Book3s PR code without CONFIG_PREEMPT_NONE, we were doing a few things wrong, most notably access to PACA fields without making sure that the pointers stay stable accross the access (preempt_disable()). This patch moves to_svcpu towards a get/put model which allows us to disable preemption while accessing the shadow vcpu fields in the PACA. That way we can run preemptible and everyone's happy! Reported-by: NJörg Sommer <joerg@alea.gnuu.de> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 26 12月, 2011 1 次提交
-
-
由 Andreas Schwab 提交于
compute_tlbie_rb is only used on ppc64 and cannot be compiled on ppc32. Signed-off-by: NAndreas Schwab <schwab@linux-m68k.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-