- 29 8月, 2013 1 次提交
-
-
由 Paul Mackerras 提交于
This reworks kvmppc_mmu_book3s_64_xlate() to make it check the large page bit in the hashed page table entries (HPTEs) it looks at, and to simplify and streamline the code. The checking of the first dword of each HPTE is now done with a single mask and compare operation, and all the code dealing with the matching HPTE, if we find one, is consolidated in one place in the main line of the function flow. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 28 8月, 2013 5 次提交
-
-
由 Paul Mackerras 提交于
It turns out that if we exit the guest due to a hcall instruction (sc 1), and the loading of the instruction in the guest exit path fails for any reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the instruction after the hcall instruction rather than the hcall itself. This in turn means that the instruction doesn't get recognized as an hcall in kvmppc_handle_exit_pr() but gets passed to the guest kernel as a sc instruction. That usually results in the guest kernel getting a return code of 38 (ENOSYS) from an hcall, which often triggers a BUG_ON() or other failure. This fixes the problem by adding a new variant of kvmppc_get_last_inst() called kvmppc_get_last_sc(), which fetches the instruction if necessary from pc - 4 rather than pc. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
Currently the code assumes that once we load up guest FP/VSX or VMX state into the CPU, it stays valid in the CPU registers until we explicitly flush it to the thread_struct. However, on POWER7, copy_page() and memcpy() can use VMX. These functions do flush the VMX state to the thread_struct before using VMX instructions, but if this happens while we have guest state in the VMX registers, and we then re-enter the guest, we don't reload the VMX state from the thread_struct, leading to guest corruption. This has been observed to cause guest processes to segfault. To fix this, we check before re-entering the guest that all of the bits corresponding to facilities owned by the guest, as expressed in vcpu->arch.guest_owned_ext, are set in current->thread.regs->msr. Any bits that have been cleared correspond to facilities that have been used by kernel code and thus flushed to the thread_struct, so for them we reload the state from the thread_struct. We also need to check current->thread.regs->msr before calling giveup_fpu() or giveup_altivec(), since if the relevant bit is clear, the state has already been flushed to the thread_struct and to flush it again would corrupt it. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
Commit 8e44ddc3 ("powerpc/kvm/book3s: Add support for H_IPOLL and H_XIRR_X in XICS emulation") added a call to get_tb() but didn't include the header that defines it, and on some configs this means book3s_xics.c fails to compile: arch/powerpc/kvm/book3s_xics.c: In function ‘kvmppc_xics_hcall’: arch/powerpc/kvm/book3s_xics.c:812:3: error: implicit declaration of function ‘get_tb’ [-Werror=implicit-function-declaration] Cc: stable@vger.kernel.org [v3.10, v3.11] Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
err was overwritten by a previous function call, and checked to be 0. If the following page allocation fails, 0 is going to be returned instead of -ENOMEM. Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Chen Gang 提交于
'rmls' is 'unsigned long', lpcr_rmls() will return negative number when failure occurs, so it need a type cast for comparing. 'lpid' is 'unsigned long', kvmppc_alloc_lpid() return negative number when failure occurs, so it need a type cast for comparing. Signed-off-by: NChen Gang <gang.chen@asianux.com> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 26 8月, 2013 1 次提交
-
-
由 Yann Droneaud 提交于
KVM uses anon_inode_get() to allocate file descriptors as part of some of its ioctls. But those ioctls are lacking a flag argument allowing userspace to choose options for the newly opened file descriptor. In such case it's advised to use O_CLOEXEC by default so that userspace is allowed to choose, without race, if the file descriptor is going to be inherited across exec(). This patch set O_CLOEXEC flag on all file descriptors created with anon_inode_getfd() to not leak file descriptors across exec(). Signed-off-by: NYann Droneaud <ydroneaud@opteya.com> Link: http://lkml.kernel.org/r/cover.1377372576.git.ydroneaud@opteya.comReviewed-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
- 23 8月, 2013 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
Otherwise we would clear the pvr value Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 25 7月, 2013 1 次提交
-
-
由 Paul Mackerras 提交于
Unlike the other general-purpose SPRs, SPRG3 can be read by usermode code, and is used in recent kernels to store the CPU and NUMA node numbers so that they can be read by VDSO functions. Thus we need to load the guest's SPRG3 value into the real SPRG3 register when entering the guest, and restore the host's value when exiting the guest. We don't need to save the guest SPRG3 value when exiting the guest as usermode code can't modify SPRG3. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 18 7月, 2013 1 次提交
-
-
由 Takuya Yoshikawa 提交于
This is called right after the memslots is updated, i.e. when the result of update_memslots() gets installed in install_new_memslots(). Since the memslots needs to be updated twice when we delete or move a memslot, kvm_arch_commit_memory_region() does not correspond to this exactly. In the following patch, x86 will use this new API to check if the mmio generation has reached its maximum value, in which case mmio sptes need to be flushed out. Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Acked-by: NAlexander Graf <agraf@suse.de> Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 7月, 2013 2 次提交
-
-
由 Scott Wood 提交于
kvm_guest_enter() was already called by kvmppc_prepare_to_enter(). Don't call it again. Signed-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Scott Wood 提交于
Currently this is only being done on 64-bit. Rather than just move it out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is consistent with lazy ee state, and so that we don't track more host code as interrupts-enabled than necessary. Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect that this function now has a role on 32-bit as well. Signed-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 10 7月, 2013 2 次提交
-
-
由 Paul Mackerras 提交于
The table of offsets to real-mode hcall handlers in book3s_hv_rmhandlers.S can contain negative values, if some of the handlers end up before the table in the vmlinux binary. Thus we need to use a sign-extending load to read the values in the table rather than a zero-extending load. Without this, the host crashes when the guest does one of the hcalls with negative offsets, due to jumping to a bogus address. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This corrects the usage of the tlbie (TLB invalidate entry) instruction in HV KVM. The tlbie instruction changed between PPC970 and POWER7. On the PPC970, the bit to select large vs. small page is in the instruction, not in the RB register value. This changes the code to use the correct form on PPC970. On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower) field of the RB value incorrectly for 64k pages. This fixes it. Since we now have several cases to handle for the tlbie instruction, this factors out the code to do a sequence of tlbies into a new function, do_tlbies(), and calls that from the various places where the code was doing tlbie instructions inline. It also makes kvmppc_h_bulk_remove() use the same global_invalidates() function for determining whether to do local or global TLB invalidations as is used in other places, for consistency, and also to make sure that kvm->arch.need_tlb_flush gets updated properly. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 08 7月, 2013 4 次提交
-
-
由 Aneesh Kumar K.V 提交于
Both RMA and hash page table request will be a multiple of 256K. We can use a chunk size of 256K to track the free/used 256K chunk in the bitmap. This should help to reduce the bitmap size. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Aneesh Kumar K.V 提交于
Older version of power architecture use Real Mode Offset register and Real Mode Limit Selector for mapping guest Real Mode Area. The guest RMA should be physically contigous since we use the range when address translation is not enabled. This patch switch RMA allocation code to use contigous memory allocator. The patch also remove the the linear allocator which not used any more Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Aneesh Kumar K.V 提交于
Powerpc architecture uses a hash based page table mechanism for mapping virtual addresses to physical address. The architecture require this hash page table to be physically contiguous. With KVM on Powerpc currently we use early reservation mechanism for allocating guest hash page table. This implies that we need to reserve a big memory region to ensure we can create large number of guest simultaneously with KVM on Power. Another disadvantage is that the reserved memory is not available to rest of the subsystems and and that implies we limit the total available memory in the host. This patch series switch the guest hash page table allocation to use contiguous memory allocator. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
We don't emulate breakpoints yet, so just ignore reads and writes to / from DABR. This fixes booting of more recent Linux guest kernels for me. Reported-by: NNello Martuscielli <ppc.addon@gmail.com> Tested-by: NNello Martuscielli <ppc.addon@gmail.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 30 6月, 2013 7 次提交
-
-
由 Alexander Graf 提交于
While technically it's legal to write to PIR and have the identifier changed, we don't implement logic to do so because we simply expose vcpu_id to the guest. So instead, let's ignore writes to PIR. This ensures that we don't inject faults into the guest for something the guest is allowed to do. While at it, we cross our fingers hoping that it also doesn't mind that we broke its PIR read values. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
At present, if the guest creates a valid SLB (segment lookaside buffer) entry with the slbmte instruction, then invalidates it with the slbie instruction, then reads the entry with the slbmfee/slbmfev instructions, the result of the slbmfee will have the valid bit set, even though the entry is not actually considered valid by the host. This is confusing, if not worse. This fixes it by zeroing out the orige and origv fields of the SLB entry structure when the entry is invalidated. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
With this, the guest can use 1TB segments as well as 256MB segments. Since we now have the situation where a single emulated guest segment could correspond to multiple shadow segments (as the shadow segments are still 256MB segments), this adds a new kvmppc_mmu_flush_segment() to scan for all shadow segments that need to be removed. This restructures the guest HPT (hashed page table) lookup code to use the correct hashing and matching functions for HPTEs within a 1TB segment. We use the standard hpt_hash() function instead of open-coding the hash calculation, and we use HPTE_V_COMPARE() with an AVPN value that has the B (segment size) field included. The calculation of avpn is done a little earlier since it doesn't change in the loop starting at the do_second label. The computation in kvmppc_mmu_book3s_64_esid_to_vsid() changes so that it returns a 256MB VSID even if the guest SLB entry is a 1TB entry. This is because the users of this function are creating 256MB SLB entries. We set a new VSID_1T flag so that entries created from 1T segments don't collide with entries from 256MB segments. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
The loop in kvmppc_mmu_book3s_64_xlate() that looks up a translation in the guest hashed page table (HPT) keeps going if it finds an HPTE that matches but doesn't allow access. This is incorrect; it is different from what the hardware does, and there should never be more than one matching HPTE anyway. This fixes it to stop when any matching HPTE is found. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
On entering a PR KVM guest, we invalidate the whole SLB before loading up the guest entries. We do this using an slbia instruction, which invalidates all entries except entry 0, followed by an slbie to invalidate entry 0. However, the slbie turns out to be ineffective in some circumstances (specifically when the host linear mapping uses 64k pages) because of errors in computing the parameter to the slbie. The result is that the guest kernel hangs very early in boot because it takes a DSI the first time it tries to access kernel data using a linear mapping address in real mode. Currently we construct bits 36 - 43 (big-endian numbering) of the slbie parameter by taking bits 56 - 63 of the SLB VSID doubleword. These bits for the tlbie are C (class, 1 bit), B (segment size, 2 bits) and 5 reserved bits. For the SLB VSID doubleword these are C (class, 1 bit), reserved (1 bit), LP (large page size, 2 bits), and 4 reserved bits. Thus we are not setting the B field correctly, and when LP = 01 as it is for 64k pages, we are setting a reserved bit. Rather than add more instructions to calculate the slbie parameter correctly, this takes a simpler approach, which is to set entry 0 to zeroes explicitly. Normally slbmte should not be used to invalidate an entry, since it doesn't invalidate the ERATs, but it is OK to use it to invalidate an entry if it is immediately followed by slbia, which does invalidate the ERATs. (This has been confirmed with the Power architects.) This approach takes fewer instructions and will work whatever the contents of entry 0. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Paul Mackerras 提交于
This makes sure the calculation of the proto-VSIDs used by PR KVM is done with 64-bit arithmetic. Since vcpu3s->context_id[] is int, when we do vcpu3s->context_id[0] << ESID_BITS the shift will be done with 32-bit instructions, possibly leading to significant bits getting lost, as the context id can be up to 524283 and ESID_BITS is 18. To fix this we cast the context id to u64 before shifting. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Tiejun Chen 提交于
Availablity of the doorbell_exception function is guarded by CONFIG_PPC_DOORBELL. Use the same define to guard our caller of it. Signed-off-by: NTiejun Chen <tiejun.chen@windriver.com> [agraf: improve patch description] Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 21 6月, 2013 3 次提交
-
-
由 Aneesh Kumar K.V 提交于
We can find pte that are splitting while walking page tables. Return None pte in that case. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Aneesh Kumar K.V 提交于
Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly document why we don't need to handle transparent hugepages at callsites. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Aneesh Kumar K.V 提交于
If a hash bucket gets full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". This implies that when we are invalidating or updating pte, even if HPTE entry is not valid we should do a tlb invalidate. With hugepages, we need to pass the correct actual page size value for tlb invalidation. This change update the patch 0608d692 "powerpc/mm: Always invalidate tlb on hpte invalidate and update" to handle transparent hugepages correctly. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 19 6月, 2013 1 次提交
-
-
由 Scott Wood 提交于
kwmppc_lazy_ee_enable() should be called as late as possible, or else we get things like WARN_ON(preemptible()) in enable_kernel_fp() in configurations where preemptible() works. Note that book3s_pr already waits until just before __kvmppc_vcpu_run to call kvmppc_lazy_ee_enable(). Signed-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 6月, 2013 4 次提交
-
-
由 Scott Wood 提交于
EE is hard-disabled on entry to kvmppc_handle_exit(), so call hard_irq_disable() so that PACA_IRQ_HARD_DIS is set, and soft_enabled is unset. Without this, we get warnings such as arch/powerpc/kernel/time.c:300, and sometimes host kernel hangs. Signed-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Scott Wood 提交于
KVM core expects arch code to acquire the srcu lock when calling gfn_to_memslot and similar functions. Signed-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Scott Wood 提交于
The previous patch made 64-bit booke KVM build again, but Altivec support is still not complete, and we can't prevent the guest from turning on Altivec (which can corrupt host state until state save/restore is implemented). Disable e6500 on KVM until this is fixed. Signed-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Lai Jiangshan 提交于
At the point of up_out label in kvmppc_hv_setup_htab_rma(), srcu read lock is still held. We have to release it before return. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Alexander Graf <agraf@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: kvm@vger.kernel.org Cc: kvm-ppc@vger.kernel.org Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
-
- 01 6月, 2013 1 次提交
-
-
由 Paul Mackerras 提交于
This adds the remaining two hypercalls defined by PAPR for manipulating the XICS interrupt controller, H_IPOLL and H_XIRR_X. H_IPOLL returns information about the priority and pending interrupts for a virtual cpu, without changing any state. H_XIRR_X is like H_XIRR in that it reads and acknowledges the highest-priority pending interrupt, but it also returns the timestamp (timebase register value) from when the interrupt was first received by the hypervisor. Currently we just return the current time, since we don't do any software queueing of virtual interrupts inside the XICS emulation code. These hcalls are not currently used by Linux guests, but may be in future. Signed-off-by: NPaul Mackerras <paulus@samba.org> Acked-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 19 5月, 2013 1 次提交
-
-
由 Marc Zyngier 提交于
As requested by the KVM maintainers, remove the addprefix used to refer to the main KVM code from the arch code, and replace it with a KVM variable that does the same thing. Tested-by: NChristian Borntraeger <borntraeger@de.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Christoffer Dall <cdall@cs.columbia.edu> Acked-by: NXiantao Zhang <xiantao.zhang@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Alexander Graf <agraf@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
- 02 5月, 2013 5 次提交
-
-
由 Paul Mackerras 提交于
This adds the API for userspace to instantiate an XICS device in a VM and connect VCPUs to it. The API consists of a new device type for the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl, which is used to assert and deassert interrupt inputs of the XICS. The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES. Each attribute within this group corresponds to the state of one interrupt source. The attribute number is the same as the interrupt source number. This does not support irq routing or irqfd yet. Signed-off-by: NPaul Mackerras <paulus@samba.org> Acked-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Wei Yongjun 提交于
Add the missing unlock before return from function set_base_addr() when disables the mapping. Introduced by commit 5df554ad (kvm/ppc/mpic: in-kernel MPIC emulation) Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Scott Wood 提交于
These functions do an srcu_dereference without acquiring the srcu lock themselves. Signed-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Scott Wood 提交于
This is an unused (no pun intended) leftover from when this code did reference counting. Signed-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Scott Wood 提交于
Keeping a linked list of statically defined objects doesn't work very well when we have multiple guests. :-P Switch to an array of constant objects. This fixes a hang when multiple guests are used. Signed-off-by: NScott Wood <scottwood@freescale.com> [agraf: remove struct list_head from mem_reg] Signed-off-by: NAlexander Graf <agraf@suse.de>
-