- 20 6月, 2017 1 次提交
-
-
由 Paul Mackerras 提交于
On a POWER9 system, it is possible for an interrupt to become pending for a VCPU when that VCPU is about to cede (execute a H_CEDE hypercall) and has already disabled interrupts, or in the H_CEDE processing up to the point where the XIVE context is pulled from the hardware. In such a case, the H_CEDE should not sleep, but should return immediately to the guest. However, the conditions tested in kvmppc_vcpu_woken() don't include the condition that a XIVE interrupt is pending, so the VCPU could sleep until the next decrementer interrupt. To fix this, we add a new xive_interrupt_pending() helper which looks in the XIVE context that was pulled from the hardware to see if the priority of any pending interrupt is higher (numerically lower than) the CPU priority. If so then kvmppc_vcpu_woken() will return true. If the XIVE context has never been used, then both the pipr and the cppr fields will be zero and the test will indicate that no interrupt is pending. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 19 6月, 2017 5 次提交
-
-
由 Paul Mackerras 提交于
On POWER9, we no longer have the restriction that we had on POWER8 where all threads in a core have to be in the same partition, so the CPU threads are now independent. However, we still want to be able to run guests with a virtual SMT topology, if only to allow migration of guests from POWER8 systems to POWER9. A guest that has a virtual SMT mode greater than 1 will expect to be able to use the doorbell facility; it will expect the msgsndp and msgclrp instructions to work appropriately and to be able to read sensible values from the TIR (thread identification register) and DPDES (directed privileged doorbell exception status) special-purpose registers. However, since each CPU thread is a separate sub-processor in POWER9, these instructions and registers can only be used within a single CPU thread. In order for these instructions to appear to act correctly according to the guest's virtual SMT mode, we have to trap and emulate them. We cause them to trap by clearing the HFSCR_MSGP bit in the HFSCR register. The emulation is triggered by the hypervisor facility unavailable interrupt that occurs when the guest uses them. To cause a doorbell interrupt to occur within the guest, we set the DPDES register to 1. If the guest has interrupts enabled, the CPU will generate a doorbell interrupt and clear the DPDES register in hardware. The DPDES hardware register for the guest is saved in the vcpu->arch.vcore->dpdes field. Since this gets written by the guest exit code, other VCPUs wishing to cause a doorbell interrupt don't write that field directly, but instead set a vcpu->arch.doorbell_request flag. This is consumed and set to 0 by the guest entry code, which then sets DPDES to 1. Emulating reads of the DPDES register is somewhat involved, because it requires reading the doorbell pending interrupt status of all of the VCPU threads in the virtual core, and if any of those VCPUs are running, their doorbell status is only up-to-date in the hardware DPDES registers of the CPUs where they are running. In order to get a reasonable approximation of the current doorbell status, we send those CPUs an IPI, causing an exit from the guest which will update the vcpu->arch.vcore->dpdes field. We then use that value in constructing the emulated DPDES register value. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
This allows userspace to set the desired virtual SMT (simultaneous multithreading) mode for a VM, that is, the number of VCPUs that get assigned to each virtual core. Previously, the virtual SMT mode was fixed to the number of threads per subcore, and if userspace wanted to have fewer vcpus per vcore, then it would achieve that by using a sparse CPU numbering. This had the disadvantage that the vcpu numbers can get quite large, particularly for SMT1 guests on a POWER8 with 8 threads per core. With this patch, userspace can set its desired virtual SMT mode and then use contiguous vcpu numbering. On POWER8, where the threading mode is "strict", the virtual SMT mode must be less than or equal to the number of threads per subcore. On POWER9, which implements a "loose" threading mode, the virtual SMT mode can be any power of 2 between 1 and 8, even though there is effectively one thread per subcore, since the threads are independent and can all be in different partitions. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
This adds code to allow us to use a different value for the HFSCR (Hypervisor Facilities Status and Control Register) when running the guest from that which applies in the host. The reason for doing this is to allow us to trap the msgsndp instruction and related operations in future so that they can be virtualized. We also save the value of HFSCR when a hypervisor facility unavailable interrupt occurs, because the high byte of HFSCR indicates which facility the guest attempted to access. We save and restore the host value on guest entry/exit because some bits of it affect host userspace execution. We only do all this on POWER9, not on POWER8, because we are not intending to virtualize any of the facilities controlled by HFSCR on POWER8. In particular, the HFSCR bit that controls execution of msgsndp and related operations does not exist on POWER8. The HFSCR doesn't exist at all on POWER7. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
It is possible, through a narrow race condition, for a VCPU to exit the guest with a H_CEDE hypercall while it has a doorbell interrupt pending. In this case, the H_CEDE should return immediately, but in fact it puts the VCPU to sleep until some other interrupt becomes pending or a prod is received (via another VCPU doing H_PROD). This fixes it by checking the DPDES (Directed Privileged Doorbell Exception Status) bit for the thread along with the other interrupt pending bits. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
This allows userspace (e.g. QEMU) to enable large decrementer mode for the guest when running on a POWER9 host, by setting the LPCR_LD bit in the guest LPCR value. With this, the guest exit code saves 64 bits of the guest DEC value on exit. Other places that use the guest DEC value check the LPCR_LD bit in the guest LPCR value, and if it is set, omit the 32-bit sign extension that would otherwise be done. This doesn't change the DEC emulation used by PR KVM because PR KVM is not supported on POWER9 yet. This is partly based on an earlier patch by Oliver O'Halloran. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 16 6月, 2017 1 次提交
-
-
由 Paul Mackerras 提交于
POWER9 DD1 has an erratum where writing to the TBU40 register, which is used to apply an offset to the timebase, can cause the timebase to lose counts. This results in the timebase on some CPUs getting out of sync with other CPUs, which then results in misbehaviour of the timekeeping code. To work around the problem, we make KVM ignore the timebase offset for all guests on POWER9 DD1 machines. This means that live migration cannot be supported on POWER9 DD1 machines. Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 15 6月, 2017 2 次提交
-
-
由 Paul Mackerras 提交于
If userspace attempts to call the KVM_RUN ioctl when it has hardware transactional memory (HTM) enabled, the values that it has put in the HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by guest values. To fix this, we detect this condition and save those SPR values in the thread struct, and disable HTM for the task. If userspace goes to access those SPRs or the HTM facility in future, a TM-unavailable interrupt will occur and the handler will reload those SPRs and re-enable HTM. If userspace has started a transaction and suspended it, we would currently lose the transactional state in the guest entry path and would almost certainly get a "TM Bad Thing" interrupt, which would cause the host to crash. To avoid this, we detect this case and return from the KVM_RUN ioctl with an EINVAL error, with the KVM exit reason set to KVM_EXIT_FAIL_ENTRY. Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08) Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
This restores several special-purpose registers (SPRs) to sane values on guest exit that were missed before. TAR and VRSAVE are readable and writable by userspace, and we need to save and restore them to prevent the guest from potentially affecting userspace execution (not that TAR or VRSAVE are used by any known program that run uses the KVM_RUN ioctl). We save/restore these in kvmppc_vcpu_run_hv() rather than on every guest entry/exit. FSCR affects userspace execution in that it can prohibit access to certain facilities by userspace. We restore it to the normal value for the task on exit from the KVM_RUN ioctl. IAMR is normally 0, and is restored to 0 on guest exit. However, with a radix host on POWER9, it is set to a value that prevents the kernel from executing user-accessible memory. On POWER9, we save IAMR on guest entry and restore it on guest exit to the saved value rather than 0. On POWER8 we continue to set it to 0 on guest exit. PSPB is normally 0. We restore it to 0 on guest exit to prevent userspace taking advantage of the guest having set it non-zero (which would allow userspace to set its SMT priority to high). UAMOR is normally 0. We restore it to 0 on guest exit to prevent the AMR from being used as a covert channel between userspace processes, since the AMR is not context-switched at present. Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08) Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 13 6月, 2017 1 次提交
-
-
由 Paul Mackerras 提交于
This adds code to save the values of three SPRs (special-purpose registers) used by userspace to control event-based branches (EBBs), which are essentially interrupts that get delivered directly to userspace. These registers are loaded up with guest values when entering the guest, and their values are saved when exiting the guest, but we were not saving the host values and restoring them before going back to userspace. On POWER8 this would only affect userspace programs which explicitly request the use of EBBs and also use the KVM_RUN ioctl, since the only source of EBBs on POWER8 is the PMU, and there is an explicit enable bit in the PMU registers (and those PMU registers do get properly context-switched between host and guest). On POWER9 there is provision for externally-generated EBBs, and these are not subject to the control in the PMU registers. Since these registers only affect userspace, we can save them when we first come in from userspace and restore them before returning to userspace, rather than saving/restoring the host values on every guest entry/exit. Similarly, we don't need to worry about their values on offline secondary threads since they execute in the context of the idle task, which never executes in userspace. Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08) Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 28 4月, 2017 1 次提交
-
-
由 Denis Kirjanov 提交于
With CONFIG_DEBUG_PREEMPT, get_paca() produces the following warning in kvmppc_book3s_init_hv() since it calls debug_smp_processor_id(). There is no real issue with the xics_phys field. If paca->kvm_hstate.xics_phys is non-zero on one cpu, it will be non-zero on them all. Therefore this is not fixing any actual problem, just the warning. [ 138.521188] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/5596 [ 138.521308] caller is .kvmppc_book3s_init_hv+0x184/0x350 [kvm_hv] [ 138.521404] CPU: 5 PID: 5596 Comm: modprobe Not tainted 4.11.0-rc3-00022-gc7e790c5 #1 [ 138.521509] Call Trace: [ 138.521563] [c0000007d018b810] [c0000000023eef10] .dump_stack+0xe4/0x150 (unreliable) [ 138.521694] [c0000007d018b8a0] [c000000001f6ec04] .check_preemption_disabled+0x134/0x150 [ 138.521829] [c0000007d018b940] [d00000000a010274] .kvmppc_book3s_init_hv+0x184/0x350 [kvm_hv] [ 138.521963] [c0000007d018ba00] [c00000000191d5cc] .do_one_initcall+0x5c/0x1c0 [ 138.522082] [c0000007d018bad0] [c0000000023e9494] .do_init_module+0x84/0x240 [ 138.522201] [c0000007d018bb70] [c000000001aade18] .load_module+0x1f68/0x2a10 [ 138.522319] [c0000007d018bd20] [c000000001aaeb30] .SyS_finit_module+0xc0/0xf0 [ 138.522439] [c0000007d018be30] [c00000000191baec] system_call+0x38/0xfc Signed-off-by: NDenis Kirjanov <kda@linux-powerpc.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 27 4月, 2017 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively. This has been lightly tested on an actual system, including PCI pass-through with a TG3 device. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 20 4月, 2017 1 次提交
-
-
由 Markus Elfring 提交于
Add a jump target so that a bit of exception handling can be better reused at the end of this function. Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 10 4月, 2017 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
We traditionally have linux/ before asm/ Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 02 3月, 2017 2 次提交
-
-
由 Ingo Molnar 提交于
We are going to split <linux/sched/stat.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/stat.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 31 1月, 2017 10 次提交
-
-
由 David Gibson 提交于
This adds a not yet working outline of the HPT resizing PAPR extension. Specifically it adds the necessary ioctl() functions, their basic steps, the work function which will handle preparation for the resize, and synchronization between these, the guest page fault path and guest HPT update path. The actual guts of the implementation isn't here yet, so for now the calls will always fail. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 David Gibson 提交于
The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page table (HPT) that userspace expects a guest VM to have, and is also used to clear that HPT when necessary (e.g. guest reboot). At present, once the ioctl() is called for the first time, the HPT size can never be changed thereafter - it will be cleared but always sized as from the first call. With upcoming HPT resize implementation, we're going to need to allow userspace to resize the HPT at reset (to change it back to the default size if the guest changed it). So, we need to allow this ioctl() to change the HPT size. This patch also updates Documentation/virtual/kvm/api.txt to reflect the new behaviour. In fact the documentation was already slightly incorrect since 572abd56 "KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctl" Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 David Gibson 提交于
Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT) and sets it up as the active page table for a VM. For the upcoming HPT resize implementation we're going to want to allocate HPTs separately from activating them. So, split the allocation itself out into kvmppc_allocate_hpt() and perform the activation with a new kvmppc_set_hpt() function. Likewise we split kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt() which unsets it as an active HPT, then frees it. We also move the logic to fall back to smaller HPT sizes if the first try fails into the single caller which used that behaviour, kvmppc_hv_setup_htab_rma(). This introduces a slight semantic change, in that previously if the initial attempt at CMA allocation failed, we would fall back to attempting smaller sizes with the page allocator. Now, we try first CMA, then the page allocator at each size. As far as I can tell this change should be harmless. To match, we make kvmppc_free_hpt() just free the actual HPT itself. The call to kvmppc_free_lpid() that was there, we move to the single caller. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 David Gibson 提交于
Currently, the powerpc kvm_arch structure contains a number of variables tracking the state of the guest's hashed page table (HPT) in KVM HV. This patch gathers them all together into a single kvm_hpt_info substructure. This makes life more convenient for the upcoming HPT resizing implementation. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
This adds a few last pieces of the support for radix guests: * Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and KVM_PPC_GET_RMMU_INFO ioctls for radix guests * On POWER9, allow secondary threads to be on/off-lined while guests are running. * Set up LPCR and the partition table entry for radix guests. * Don't allocate the rmap array in the kvm_memory_slot structure on radix. * Don't try to initialize the HPT for radix guests, since they don't have an HPT. * Take out the code that prevents the HV KVM module from initializing on radix hosts. At this stage, we only support radix guests if the host is running in radix mode, and only support HPT guests if the host is running in HPT mode. Thus a guest cannot switch from one mode to the other, which enables some simplifications. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
With radix, the guest can do TLB invalidations itself using the tlbie (global) and tlbiel (local) TLB invalidation instructions. Linux guests use local TLB invalidations for translations that have only ever been accessed on one vcpu. However, that doesn't mean that the translations have only been accessed on one physical cpu (pcpu) since vcpus can move around from one pcpu to another. Thus a tlbiel might leave behind stale TLB entries on a pcpu where the vcpu previously ran, and if that task then moves back to that previous pcpu, it could see those stale TLB entries and thus access memory incorrectly. The usual symptom of this is random segfaults in userspace programs in the guest. To cope with this, we detect when a vcpu is about to start executing on a thread in a core that is a different core from the last time it executed. If that is the case, then we mark the core as needing a TLB flush and then send an interrupt to any thread in the core that is currently running a vcpu from the same guest. This will get those vcpus out of the guest, and the first one to re-enter the guest will do the TLB flush. The reason for interrupting the vcpus executing on the old core is to cope with the following scenario: CPU 0 CPU 1 CPU 4 (core 0) (core 0) (core 1) VCPU 0 runs task X VCPU 1 runs core 0 TLB gets entries from task X VCPU 0 moves to CPU 4 VCPU 0 runs task X Unmap pages of task X tlbiel (still VCPU 1) task X moves to VCPU 1 task X runs task X sees stale TLB entries That is, as soon as the VCPU starts executing on the new core, it could unmap and tlbiel some page table entries, and then the task could migrate to one of the VCPUs running on the old core and potentially see stale TLB entries. Since the TLB is shared between all the threads in a core, we only use the bit of kvm->arch.need_tlb_flush corresponding to the first thread in the core. To ensure that we don't have a window where we can miss a flush, this moves the clearing of the bit from before the actual flush to after it. This way, two threads might both do the flush, but we prevent the situation where one thread can enter the guest before the flush is finished. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
This adds code to keep track of dirty pages when requested (that is, when memslot->dirty_bitmap is non-NULL) for radix guests. We use the dirty bits in the PTEs in the second-level (partition-scoped) page tables, together with a bitmap of pages that were dirty when their PTE was invalidated (e.g., when the page was paged out). This bitmap is stored in the first half of the memslot->dirty_bitmap area, and kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the bitmap that gets returned to userspace. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
This adds the code to construct the second-level ("partition-scoped" in architecturese) page tables for guests using the radix MMU. Apart from the PGD level, which is allocated when the guest is created, the rest of the tree is all constructed in response to hypervisor page faults. As well as hypervisor page faults for missing pages, we also get faults for reference/change (RC) bits needing to be set, as well as various other error conditions. For now, we only set the R or C bit in the guest page table if the same bit is set in the host PTE for the backing page. This code can take advantage of the guest being backed with either transparent or ordinary 2MB huge pages, and insert 2MB page entries into the guest page tables. There is no support for 1GB huge pages yet. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl for HPT guests on POWER9. With this, we can return 1 for the KVM_CAP_PPC_MMU_HASH_V3 capability. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
This adds two capabilities and two ioctls to allow userspace to find out about and configure the POWER9 MMU in a guest. The two capabilities tell userspace whether KVM can support a guest using the radix MMU, or using the hashed page table (HPT) MMU with a process table and segment tables. (Note that the MMUs in the POWER9 processor cores do not use the process and segment tables when in HPT mode, but the nest MMU does). The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify whether a guest will use the radix MMU or the HPT MMU, and to specify the size and location (in guest space) of the process table. The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about the radix MMU. It returns a list of supported radix tree geometries (base page size and number of bits indexed at each level of the radix tree) and the encoding used to specify the various page sizes for the TLB invalidate entry instruction. Initially, both capabilities return 0 and the ioctls return -EINVAL, until the necessary infrastructure for them to operate correctly is added. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 27 1月, 2017 2 次提交
-
-
由 Paul Mackerras 提交于
The H_PROD hypercall is supposed to wake up an idle vcpu. We have an implementation, but because Linux doesn't use it except when doing cpu hotplug, it was never tested properly. AIX does use it, and reported it broken. It turns out we were waking the wrong vcpu (the one doing H_PROD, not the target of the prod) and we weren't handling the case where the target needs an IPI to wake it. Fix it by using the existing kvmppc_fast_vcpu_kick_hv() function, which is intended for this kind of thing, and by using the target vcpu not the current vcpu. We were also not looking at the prodded flag when checking whether a ceded vcpu should wake up, so this adds checks for the prodded flag alongside the checks for pending exceptions. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
If the target vcpu for kvmppc_fast_vcpu_kick_hv() is not running on any CPU, then we will have vcpu->arch.thread_cpu == -1, and as it happens, kvmppc_fast_vcpu_kick_hv will call kvmppc_ipi_thread with -1 as the cpu argument. Although this is not meaningful, in the past, before commit 1704a81c ("KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9", 2016-11-18), it was harmless because CPU -1 is not in the same core as any real CPU thread. On a POWER9, however, we don't do the "same core" check, so we were trying to do a msgsnd to thread -1, which is invalid. To avoid this, we add a check to see that vcpu->arch.thread_cpu is >= 0 before calling kvmppc_ipi_thread() with it. Since vcpu->arch.thread_vcpu can change asynchronously, we use READ_ONCE to ensure that the value we check is the same value that we use as the argument to kvmppc_ipi_thread(). Fixes: 1704a81c ("KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9") Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 26 12月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
ktime_set(S,N) was required for the timespec storage type and is still useful for situations where a Seconds and Nanoseconds part of a time value needs to be converted. For anything where the Seconds argument is 0, this is pointless and can be replaced with a simple assignment. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
-
- 25 12月, 2016 1 次提交
-
-
由 Linus Torvalds 提交于
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 12月, 2016 1 次提交
-
-
由 Anna-Maria Gleixner 提交于
Install the callbacks via the state machine. Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: kvm-ppc@vger.kernel.org Cc: Paul Mackerras <paulus@samba.org> Cc: rt@linutronix.de Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alexander Graf <agraf@suse.com> Link: http://lkml.kernel.org/r/20161126231350.10321-18-bigeasy@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 28 11月, 2016 3 次提交
-
-
由 Suraj Jitindar Singh 提交于
Fix comment block to match kernel comment style. Fix print format from signed to unsigned. Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Suraj Jitindar Singh 提交于
The kvm module parameter halt_poll_ns defines the global maximum halt polling interval and can be dynamically changed by writing to the /sys/module/kvm/parameters/halt_poll_ns sysfs file. However in kvm-hv this module parameter value is only ever checked when we grow the current polling interval for the given vcore. This means that if we decrease the halt_poll_ns value below the current polling interval we won't see any effect unless we try to grow the polling interval above the new max at some point or it happens to be shrunk below the halt_poll_ns value. Update the halt polling code so that we always check for a new module param value of halt_poll_ns and set the current halt polling interval to it if it's currently greater than the new max. This means that it's redundant to also perform this check in the grow_halt_poll_ns() function now. Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Suraj Jitindar Singh 提交于
The previous patch exported the variables which back the module parameters of the generic kvm module. Now use these variables in the kvm-hv module so that any change to the generic module parameters will also have the same effect for the kvm-hv module. This removes the duplication of the kvm module parameters which was redundant and should reduce confusion when tuning them. Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 24 11月, 2016 6 次提交
-
-
由 Suraj Jitindar Singh 提交于
The function kvmppc_set_arch_compat() is used to determine the value of the processor compatibility register (PCR) for a guest running in a given compatibility mode. There is currently no support for v3.00 of the ISA. Add support for v3.00 of the ISA which adds an ISA v2.07 compatilibity mode to the PCR. We also add a check to ensure the processor we are running on is capable of emulating the chosen processor (for example a POWER7 cannot emulate a POWER8, similarly with a POWER8 and a POWER9). Based on work by: Paul Mackerras <paulus@ozlabs.org> [paulus@ozlabs.org - moved dummy PCR_ARCH_300 definition here; set guest_pcr_bit when arch_compat == 0, added comment.] Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
With POWER9, each CPU thread has its own MMU context and can be in the host or a guest independently of the other threads; there is still however a restriction that all threads must use the same type of address translation, either radix tree or hashed page table (HPT). Since we only support HPT guests on a HPT host at this point, we can treat the threads as being independent, and avoid all of the work of coordinating the CPU threads. To make this simpler, we introduce a new threads_per_vcore() function that returns 1 on POWER9 and threads_per_subcore on POWER7/8, and use that instead of threads_per_subcore or threads_per_core in various places. This also changes the value of the KVM_CAP_PPC_SMT capability on POWER9 systems from 4 to 1, so that userspace will not try to create VMs with multiple vcpus per vcore. (If userspace did create a VM that thought it was in an SMT mode, the VM might try to use the msgsndp instruction, which will not work as expected. In future it may be possible to trap and emulate msgsndp in order to allow VMs to think they are in an SMT mode, if only for the purpose of allowing migration from POWER8 systems.) With all this, we can now run guests on POWER9 as long as the host is running with HPT translation. Since userspace currently has no way to request radix tree translation for the guest, the guest has no choice but to use HPT translation. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
The new XIVE interrupt controller on POWER9 can direct external interrupts to the hypervisor or the guest. The interrupts directed to the hypervisor are controlled by an LPCR bit called LPCR_HVICE, and come in as a "hypervisor virtualization interrupt". This sets the LPCR bit so that hypervisor virtualization interrupts can occur while we are in the guest. We then also need to cope with exiting the guest because of a hypervisor virtualization interrupt. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
POWER9 includes a new interrupt controller, called XIVE, which is quite different from the XICS interrupt controller on POWER7 and POWER8 machines. KVM-HV accesses the XICS directly in several places in order to send and clear IPIs and handle interrupts from PCI devices being passed through to the guest. In order to make the transition to XIVE easier, OPAL firmware will include an emulation of XICS on top of XIVE. Access to the emulated XICS is via OPAL calls. The one complication is that the EOI (end-of-interrupt) function can now return a value indicating that another interrupt is pending; in this case, the XIVE will not signal an interrupt in hardware to the CPU, and software is supposed to acknowledge the new interrupt without waiting for another interrupt to be delivered in hardware. This adapts KVM-HV to use the OPAL calls on machines where there is no XICS hardware. When there is no XICS, we look for a device-tree node with "ibm,opal-intc" in its compatible property, which is how OPAL indicates that it provides XICS emulation. In order to handle the EOI return value, kvmppc_read_intr() has become kvmppc_read_one_intr(), with a boolean variable passed by reference which can be set by the EOI functions to indicate that another interrupt is pending. The new kvmppc_read_intr() keeps calling kvmppc_read_one_intr() until there are no more interrupts to process. The return value from kvmppc_read_intr() is the largest non-zero value of the returns from kvmppc_read_one_intr(). Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
On POWER9, the msgsnd instruction is able to send interrupts to other cores, as well as other threads on the local core. Since msgsnd is generally simpler and faster than sending an IPI via the XICS, we use msgsnd for all IPIs sent by KVM on POWER9. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
POWER9 adds new capabilities to the tlbie (TLB invalidate entry) and tlbiel (local tlbie) instructions. Both instructions get a set of new parameters (RIC, PRS and R) which appear as bits in the instruction word. The tlbiel instruction now has a second register operand, which contains a PID and/or LPID value if needed, and should otherwise contain 0. This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9 as well as older processors. Since we only handle HPT guests so far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction word as on previous processors, so we don't need to conditionally execute different instructions depending on the processor. The local flush on first entry to a guest in book3s_hv_rmhandlers.S is a loop which depends on the number of TLB sets. Rather than using feature sections to set the number of iterations based on which CPU we're on, we now work out this number at VM creation time and store it in the kvm_arch struct. That will make it possible to get the number from the device tree in future, which will help with compatibility with future processors. Since mmu_partition_table_set_entry() does a global flush of the whole LPID, we don't need to do the TLB flush on first entry to the guest on each processor. Therefore we don't set all bits in the tlb_need_flush bitmap on VM startup on POWER9. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-