- 09 10月, 2018 9 次提交
-
-
由 Paul Mackerras 提交于
When we are running as a nested hypervisor, we use a hypercall to enter the guest rather than code in book3s_hv_rmhandlers.S. This means that the hypercall handlers listed in hcall_real_table never get called. There are some hypercalls that are handled there and not in kvmppc_pseries_do_hcall(), which therefore won't get processed for a nested guest. To fix this, we add cases to kvmppc_pseries_do_hcall() to handle those hypercalls, with the following exceptions: - The HPT hypercalls (H_ENTER, H_REMOVE, etc.) are not handled because we only support radix mode for nested guests. - H_CEDE has to be handled specially because the cede logic in kvmhv_run_single_vcpu assumes that it has been processed by the time that kvmhv_p9_guest_entry() returns. Therefore we put a special case for H_CEDE in kvmhv_p9_guest_entry(). For the XICS hypercalls, if real-mode processing is enabled, then the virtual-mode handlers assume that they are being called only to finish up the operation. Therefore we turn off the real-mode flag in the XICS code when running as a nested hypervisor. Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
This adds a new hypercall, H_ENTER_NESTED, which is used by a nested hypervisor to enter one of its nested guests. The hypercall supplies register values in two structs. Those values are copied by the level 0 (L0) hypervisor (the one which is running in hypervisor mode) into the vcpu struct of the L1 guest, and then the guest is run until an interrupt or error occurs which needs to be reported to L1 via the hypercall return value. Currently this assumes that the L0 and L1 hypervisors are the same endianness, and the structs passed as arguments are in native endianness. If they are of different endianness, the version number check will fail and the hcall will be rejected. Nested hypervisors do not support indep_threads_mode=N, so this adds code to print a warning message if the administrator has set indep_threads_mode=N, and treat it as Y. Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
When the 'regs' field was added to struct kvm_vcpu_arch, the code was changed to use several of the fields inside regs (e.g., gpr, lr, etc.) but not the ccr field, because the ccr field in struct pt_regs is 64 bits on 64-bit platforms, but the cr field in kvm_vcpu_arch is only 32 bits. This changes the code to use the regs.ccr field instead of cr, and changes the assembly code on 64-bit platforms to use 64-bit loads and stores instead of 32-bit ones. Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
This creates an alternative guest entry/exit path which is used for radix guests on POWER9 systems when we have indep_threads_mode=Y. In these circumstances there is exactly one vcpu per vcore and there is no coordination required between vcpus or vcores; the vcpu can enter the guest without needing to synchronize with anything else. The new fast path is implemented almost entirely in C in book3s_hv.c and runs with the MMU on until the guest is entered. On guest exit we use the existing path until the point where we are committed to exiting the guest (as distinct from handling an interrupt in the low-level code and returning to the guest) and we have pulled the guest context from the XIVE. At that point we check a flag in the stack frame to see whether we came in via the old path and the new path; if we came in via the new path then we go back to C code to do the rest of the process of saving the guest context and restoring the host context. The C code is split into separate functions for handling the OS-accessible state and the hypervisor state, with the idea that the latter can be replaced by a hypercall when we implement nested virtualization. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> [mpe: Fix CONFIG_ALTIVEC=n build] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
This adds a parameter to __kvmppc_save_tm and __kvmppc_restore_tm which allows the caller to indicate whether it wants the nonvolatile register state to be preserved across the call, as required by the C calling conventions. This parameter being non-zero also causes the MSR bits that enable TM, FP, VMX and VSX to be preserved. The condition register and DSCR are now always preserved. With this, kvmppc_save_tm_hv and kvmppc_restore_tm_hv can be called from C code provided the 3rd parameter is non-zero. So that these functions can be called from modules, they now include code to set the TOC pointer (r2) on entry, as they can call other built-in C functions which will assume the TOC to have been set. Also, the fake suspend code in kvmppc_save_tm_hv is modified here to assume that treclaim in fake-suspend state does not modify any registers, which is the case on POWER9. This enables the code to be simplified quite a bit. _kvmppc_save_tm_pr and _kvmppc_restore_tm_pr become much simpler with this change, since they now only need to save and restore TAR and pass 1 for the 3rd argument to __kvmppc_{save,restore}_tm. Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
This streamlines the first part of the code that handles a hypervisor interrupt that occurred in the guest. With this, all of the real-mode handling that occurs is done before the "guest_exit_cont" label; once we get to that label we are committed to exiting to host virtual mode. Thus the machine check and HMI real-mode handling is moved before that label. Also, the code to handle external interrupts is moved out of line, as is the code that calls kvmppc_realmode_hmi_handler(). Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
This pulls out the assembler code that is responsible for saving and restoring the PMU state for the host and guest into separate functions so they can be used from an alternate entry path. The calling convention is made compatible with C. Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Reviewed-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
This is based on a patch by Suraj Jitindar Singh. This moves the code in book3s_hv_rmhandlers.S that generates an external, decrementer or privileged doorbell interrupt just before entering the guest to C code in book3s_hv_builtin.c. This is to make future maintenance and modification easier. The algorithm expressed in the C code is almost identical to the previous algorithm. Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
Currently we use two bits in the vcpu pending_exceptions bitmap to indicate that an external interrupt is pending for the guest, one for "one-shot" interrupts that are cleared when delivered, and one for interrupts that persist until cleared by an explicit action of the OS (e.g. an acknowledge to an interrupt controller). The BOOK3S_IRQPRIO_EXTERNAL bit is used for one-shot interrupt requests and BOOK3S_IRQPRIO_EXTERNAL_LEVEL is used for persisting interrupts. In practice BOOK3S_IRQPRIO_EXTERNAL never gets used, because our Book3S platforms generally, and pseries in particular, expect external interrupt requests to persist until they are acknowledged at the interrupt controller. That combined with the confusion introduced by having two bits for what is essentially the same thing makes it attractive to simplify things by only using one bit. This patch does that. With this patch there is only BOOK3S_IRQPRIO_EXTERNAL, and by default it has the semantics of a persisting interrupt. In order to avoid breaking the ABI, we introduce a new "external_oneshot" flag which preserves the behaviour of the KVM_INTERRUPT ioctl with the KVM_INTERRUPT_SET argument. Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 30 7月, 2018 2 次提交
-
-
由 Christophe Leroy 提交于
files not using feature fixup don't need asm/feature-fixups.h files using feature fixup need asm/feature-fixups.h Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Christophe Leroy 提交于
This patch moves ASM_CONST() and stringify_in_c() into dedicated asm-const.h, then cleans all related inclusions. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> [mpe: asm-compat.h should include asm-const.h] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 16 7月, 2018 1 次提交
-
-
由 Nicholas Piggin 提交于
POWER9 DD1 was never a product. It is no longer supported by upstream firmware, and it is not effectively supported in Linux due to lack of testing. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Reviewed-by: NMichael Ellerman <mpe@ellerman.id.au> [mpe: Remove arch_make_huge_pte() entirely] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 01 6月, 2018 2 次提交
-
-
由 Simon Guo 提交于
Currently __kvmppc_save/restore_tm() APIs can only be invoked from assembly function. This patch adds C function wrappers for them so that they can be safely called from C function. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
HV KVM and PR KVM need different MSR source to indicate whether treclaim. or trecheckpoint. is necessary. This patch add new parameter (guest MSR) for these kvmppc_save_tm/ kvmppc_restore_tm() APIs: - For HV KVM, it is VCPU_MSR - For PR KVM, it is current host MSR or VCPU_SHADOW_SRR1 This enhancement enables these 2 APIs to be reused by PR KVM later. And the patch keeps HV KVM logic unchanged. This patch also reworks kvmppc_save_tm()/kvmppc_restore_tm() to have a clean ABI: r3 for vcpu and r4 for guest_msr. During kvmppc_save_tm/kvmppc_restore_tm(), the R1 need to be saved or restored. Currently the R1 is saved into HSTATE_HOST_R1. In PR KVM, we are going to add a C function wrapper for kvmppc_save_tm/kvmppc_restore_tm() where the R1 will be incremented with added stackframe and save into HSTATE_HOST_R1. There are several places in HV KVM to load HSTATE_HOST_R1 as R1, and we don't want to bring risk or confusion by TM code. This patch will use HSTATE_SCRATCH2 to save/restore R1 in kvmppc_save_tm/kvmppc_restore_tm() to avoid future confusion, since the r1 is actually a temporary/scratch value to be saved/stored. [paulus@ozlabs.org - rebased on top of 7b0e827c ("KVM: PPC: Book3S HV: Factor fake-suspend handling out of kvmppc_save/restore_tm", 2018-05-30)] Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 31 5月, 2018 2 次提交
-
-
由 Simon Guo 提交于
It is a simple patch just for moving kvmppc_save_tm/kvmppc_restore_tm() functionalities to tm.S. There is no logic change. The reconstruct of those APIs will be done in later patches to improve readability. It is for preparation of reusing those APIs on both HV/PR PPC KVM. Some slight change during move the functions includes: - surrounds some HV KVM specific code with CONFIG_KVM_BOOK3S_HV_POSSIBLE for compilation. - use _GLOBAL() to define kvmppc_save_tm/kvmppc_restore_tm() [paulus@ozlabs.org - rebased on top of 7b0e827c ("KVM: PPC: Book3S HV: Factor fake-suspend handling out of kvmppc_save/restore_tm", 2018-05-30)] Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
This splits out the handling of "fake suspend" mode, part of the hypervisor TM assist code for POWER9, and puts almost all of it in new kvmppc_save_tm_hv and kvmppc_restore_tm_hv functions. The new functions branch to kvmppc_save/restore_tm if the CPU does not require hypervisor TM assistance. With this, it will be more straightforward to move kvmppc_save_tm and kvmppc_restore_tm to another file and use them for transactional memory support in PR KVM. Additionally, it also makes the code a bit clearer and reduces the number of feature sections. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 18 5月, 2018 2 次提交
-
-
由 Nicholas Piggin 提交于
When CONFIG_RELOCATABLE=n, the Linux real mode interrupt handlers call into KVM using real address. This needs to be translated to the kernel linear effective address before the MMU is switched on. kvmppc_bad_host_intr misses adding these bits, so when it is used to handle a system reset interrupt (that always gets delivered in real mode), it results in an instruction access fault immediately after the MMU is turned on. Fix this by ensuring the top 2 address bits are set when the MMU is turned on. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Nicholas Piggin 提交于
The radix guest code can has fewer restrictions about what context it can run in, so move this flushing out of assembly and have it use the Linux TLB flush implementations introduced previously. This allows powerpc:tlbie trace events to be used. This changes the tlbiel sequence to only execute RIC=2 flush once on the first set flushed, then RIC=0 for the rest of the sets. The end result of the flush should be unchanged. This matches the local PID flush pattern that was introduced in a5998fcb ("powerpc/mm/radix: Optimise tlbiel flush all case"). Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 17 5月, 2018 2 次提交
-
-
由 Paul Mackerras 提交于
A radix guest can execute tlbie instructions to invalidate TLB entries. After a tlbie or a group of tlbies, it must then do the architected sequence eieio; tlbsync; ptesync to ensure that the TLB invalidation has been processed by all CPUs in the system before it can rely on no CPU using any translation that it just invalidated. In fact it is the ptesync which does the actual synchronization in this sequence, and hardware has a requirement that the ptesync must be executed on the same CPU thread as the tlbies which it is expected to order. Thus, if a vCPU gets moved from one physical CPU to another after it has done some tlbies but before it can get to do the ptesync, the ptesync will not have the desired effect when it is executed on the second physical CPU. To fix this, we do a ptesync in the exit path for radix guests. If there are any pending tlbies, this will wait for them to complete. If there aren't, then ptesync will just do the same as sync. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
Currently, the HV KVM guest entry/exit code adds the timebase offset from the vcore struct to the timebase on guest entry, and subtracts it on guest exit. Which is fine, except that it is possible for userspace to change the offset using the SET_ONE_REG interface while the vcore is running, as there is only one timebase offset per vcore but potentially multiple VCPUs in the vcore. If that were to happen, KVM would subtract a different offset on guest exit from that which it had added on guest entry, leading to the timebase being out of sync between cores in the host, which then leads to bad things happening such as hangs and spurious watchdog timeouts. To fix this, we add a new field 'tb_offset_applied' to the vcore struct which stores the offset that is currently applied to the timebase. This value is set from the vcore tb_offset field on guest entry, and is what is subtracted from the timebase on guest exit. Since it is zero when the timebase offset is not applied, we can simplify the logic in kvmhv_start_timing and kvmhv_accumulate_time. In addition, we had secondary threads reading the timebase while running concurrently with code on the primary thread which would eventually add or subtract the timebase offset from the timebase. This occurred while saving or restoring the DEC register value on the secondary threads. Although no specific incorrect behaviour has been observed, this is a race which should be fixed. To fix it, we move the DEC saving code to just before we call kvmhv_commence_exit, and the DEC restoring code to after the point where we have waited for the primary thread to switch the MMU context and add the timebase offset. That way we are sure that the timebase contains the guest timebase value in both cases. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 03 5月, 2018 1 次提交
-
-
由 Naveen N. Rao 提交于
During guest entry/exit, we switch over to/from the guest MMU context and we cannot take exceptions in the hypervisor code. Since ftrace may be enabled and since it can result in us taking a trap, disable ftrace by setting paca->ftrace_enabled to zero. There are two paths through which we enter/exit a guest: 1. If we are the vcore runner, then we enter the guest via __kvmppc_vcore_entry() and we disable ftrace around this. This is always the case for Power9, and for the primary thread on Power8. 2. If we are a secondary thread in Power8, then we would be in nap due to SMT being disabled. We are woken up by an IPI to enter the guest. In this scenario, we enter the guest through kvm_start_guest(). We disable ftrace at this point. In this scenario, ftrace would only get re-enabled on the secondary thread when SMT is re-enabled (via start_secondary()). Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 31 3月, 2018 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
SLOF checks for 'sc 1' (hypercall) support by issuing a hcall with H_SET_DABR. Since the recent commit e8ebedbf ("KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9") changed H_SET_DABR to return H_UNSUPPORTED on Power9, we see guest boot failures, the symptom is the boot seems to just stop in SLOF, eg: SLOF *************************************************************** QEMU Starting Build Date = Sep 24 2017 12:23:07 FW Version = buildd@ release 20170724 <no further output> SLOF can cope if H_SET_DABR returns H_HARDWARE. So wwitch the return value to H_HARDWARE instead of H_UNSUPPORTED so that we don't break the guest boot. That does mean we return a different error to PowerVM in this case, but that's probably not a big concern. Fixes: e8ebedbf ("KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 30 3月, 2018 1 次提交
-
-
由 Nicholas Piggin 提交于
The "lppaca" is a structure registered with the hypervisor. This is unnecessary when running on non-virtualised platforms. One field from the lppaca (pmcregs_in_use) is also used by the host, so move the host part out into the paca (lppaca field is still updated in guest mode). Signed-off-by: NNicholas Piggin <npiggin@gmail.com> [mpe: Fix non-pseries build with some #ifdefs] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 27 3月, 2018 2 次提交
-
-
由 Michael Neuling 提交于
POWER9 with the DAWR disabled causes problems for partition migration. Either we have to fail the migration (since we lose the DAWR) or we silently drop the DAWR and allow the migration to pass. This patch does the latter and allows the migration to pass (at the cost of silently losing the DAWR). This is not ideal but hopefully the best overall solution. This approach has been acked by Paulus. With this patch kvmppc_set_one_reg() will store the DAWR in the vcpu but won't actually set it on POWER9 hardware. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Neuling 提交于
POWER7 compat mode guests can use h_set_dabr on POWER9. POWER9 should use the DAWR but since it's disabled there we can't. This returns H_UNSUPPORTED on a h_set_dabr() on POWER9 where the DAWR is disabled. Current Linux guests ignore this error, so they will silently not get the DAWR (sigh). The same error code is being used by POWERVM in this case. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 23 3月, 2018 4 次提交
-
-
由 Paul Mackerras 提交于
This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors, where the contents of the TEXASR can get corrupted while a thread is in fake suspend state. The workaround is for the instruction emulation code to use the value saved at the most recent guest exit in real suspend mode. We achieve this by simply not saving the TEXASR into the vcpu struct on an exit in fake suspend state. We also have to take care to set the orig_texasr field only on guest exit in real suspend state. This also means that on guest entry in fake suspend state, TEXASR will be restored to the value it had on the last exit in real suspend state, effectively counteracting any hardware-caused corruption. This works because TEXASR may not be written in suspend state. With this, the guest might see the wrong values in TEXASR if it reads it while in suspend state, but will see the correct value in non-transactional state (e.g. after a treclaim), and treclaim will work correctly. With this workaround, the code will actually run slightly faster, and will operate correctly on systems without the TEXASR bug (since TEXASR may not be written in suspend state, and is only changed by failure recording, which will have already been done before we get into fake suspend state). Therefore these changes are not made subject to a CPU feature bit. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Suraj Jitindar Singh 提交于
This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors, where a treclaim performed in fake suspend mode can cause subsequent reads from the XER register to return inconsistent values for the SO (summary overflow) bit. The inconsistent SO bit state can potentially be observed on any thread in the core. We have to do the treclaim because that is the only way to get the thread out of suspend state (fake or real) and into non-transactional state. The workaround for the bug is to force the core into SMT4 mode before doing the treclaim. This patch adds the code to do that, conditional on the CPU_FTR_P9_TM_XER_SO_BUG feature bit. Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
POWER9 has hardware bugs relating to transactional memory and thread reconfiguration (changes to hardware SMT mode). Specifically, the core does not have enough storage to store a complete checkpoint of all the architected state for all four threads. The DD2.2 version of POWER9 includes hardware modifications designed to allow hypervisor software to implement workarounds for these problems. This patch implements those workarounds in KVM code so that KVM guests see a full, working transactional memory implementation. The problems center around the use of TM suspended state, where the CPU has a checkpointed state but execution is not transactional. The workaround is to implement a "fake suspend" state, which looks to the guest like suspended state but the CPU does not store a checkpoint. In this state, any instruction that would cause a transition to transactional state (rfid, rfebb, mtmsrd, tresume) or would use the checkpointed state (treclaim) causes a "soft patch" interrupt (vector 0x1500) to the hypervisor so that it can be emulated. The trechkpt instruction also causes a soft patch interrupt. On POWER9 DD2.2, we avoid returning to the guest in any state which would require a checkpoint to be present. The trechkpt in the guest entry path which would normally create that checkpoint is replaced by either a transition to fake suspend state, if the guest is in suspend state, or a rollback to the pre-transactional state if the guest is in transactional state. Fake suspend state is indicated by a flag in the PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only and reads back as 0. On exit from the guest, if the guest is in fake suspend state, we still do the treclaim instruction as we would in real suspend state, in order to get into non-transactional state, but we do not save the resulting register state since there was no checkpoint. Emulation of the instructions that cause a softpatch interrupt is handled in two paths. If the guest is in real suspend mode, we call kvmhv_p9_tm_emulation_early() to handle the cases where the guest is transitioning to transactional state. This is called before we do the treclaim in the guest exit path; because we haven't done treclaim, we can get back to the guest with the transaction still active. If the instruction is a case that kvmhv_p9_tm_emulation_early() doesn't handle, or if the guest is in fake suspend state, then we proceed to do the complete guest exit path and subsequently call kvmhv_p9_tm_emulation() in host context with the MMU on. This handles all the cases including the cases that generate program interrupts (illegal instruction or TM Bad Thing) and facility unavailable interrupts. The emulation is reasonably straightforward and is mostly concerned with checking for exception conditions and updating the state of registers such as MSR and CR0. The treclaim emulation takes care to ensure that the TEXASR register gets updated as if it were the guest treclaim instruction that had done failure recording, not the treclaim done in hypervisor state in the guest exit path. With this, the KVM_CAP_PPC_HTM capability returns true (1) even if transactional memory is not available to host userspace. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paul Mackerras 提交于
Since commit 6964e6a4 ("KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded", 2018-01-11), we have been seeing occasional machine check interrupts on POWER8 systems when running KVM guests, due to SLB multihit errors. This turns out to be due to the guest exit code reloading the host SLB entries from the SLB shadow buffer when the SLB was not previously cleared in the guest entry path. This can happen because the path which skips from the guest entry code to the guest exit code without entering the guest now does the skip before the SLB is cleared and loaded with guest values, but the host values are loaded after the point in the guest exit path that we skip to. To fix this, we move the code that reloads the host SLB values up so that it occurs just before the point in the guest exit code (the label guest_bypass:) where we skip to from the guest entry path. Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Fixes: 6964e6a4 ("KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded") Tested-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 14 3月, 2018 1 次提交
-
-
由 Paul Mackerras 提交于
This fixes a bug where the trap number that is returned by __kvmppc_vcore_entry gets corrupted. The effect of the corruption is that IPIs get ignored on POWER9 systems when the IPI is sent via a doorbell interrupt to a CPU which is executing in a KVM guest. The effect of the IPI being ignored is often that another CPU locks up inside smp_call_function_many() (and if that CPU is holding a spinlock, other CPUs then lock up inside raw_spin_lock()). The trap number is currently held in register r12 for most of the assembly-language part of the guest exit path. In that path, we call kvmppc_subcore_exit_guest(), which is a C function, without restoring r12 afterwards. Depending on the kernel config and the compiler, it may modify r12 or it may not, so some config/compiler combinations see the bug and others don't. To fix this, we arrange for the trap number to be stored on the stack from the 'guest_bypass:' label until the end of the function, then the trap number is loaded and returned in r12 as before. Cc: stable@vger.kernel.org # v4.8+ Fixes: fd7bacbc ("KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt") Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 09 2月, 2018 1 次提交
-
-
由 Alexander Graf 提交于
We ended up with code that did a conditional branch inside a feature section to code outside of the feature section. Depending on how the object file gets organized, that might mean we exceed the 14bit relocation limit for conditional branches: arch/powerpc/kvm/built-in.o:arch/powerpc/kvm/book3s_hv_rmhandlers.S:416:(__ftr_alt_97+0x8): relocation truncated to fit: R_PPC64_REL14 against `.text'+1ca4 So instead of doing a conditional branch outside of the feature section, let's just jump at the end of the same, making the branch very short. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 19 1月, 2018 5 次提交
-
-
由 Madhavan Srinivasan 提交于
Rename the paca->soft_enabled to paca->irq_soft_mask as it is no longer used as a flag for interrupt state, but a mask. Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Benjamin Herrenschmidt 提交于
This works on top of the single escalation support. When in single escalation, with this change, we will keep the escalation interrupt disabled unless the VCPU is in H_CEDE (idle). In any other case, we know the VCPU will be rescheduled and thus there is no need to take escalation interrupts in the host whenever a guest interrupt fires. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Benjamin Herrenschmidt 提交于
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Benjamin Herrenschmidt 提交于
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Benjamin Herrenschmidt 提交于
The prodded flag is only cleared at the beginning of H_CEDE, so every time we have an escalation, we will cause the *next* H_CEDE to return immediately. Instead use a dedicated "irq_pending" flag to indicate that a guest interrupt is pending for the VCPU. We don't reuse the existing exception bitmap so as to avoid expensive atomic ops. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 18 1月, 2018 1 次提交
-
-
由 Paul Mackerras 提交于
Hypervisor maintenance interrupts (HMIs) are generated by various causes, signalled by bits in the hypervisor maintenance exception register (HMER). In most cases calling OPAL to handle the interrupt is the correct thing to do, but the "debug trigger" HMIs signalled by PPC bit 17 (bit 46) of HMER are used to invoke software workarounds for hardware bugs, and OPAL does not have any code to handle this cause. The debug trigger HMI is used in POWER9 DD2.0 and DD2.1 chips to work around a hardware bug in executing vector load instructions to cache inhibited memory. In POWER9 DD2.2 chips, it is generated when conditions are detected relating to threads being in TM (transactional memory) suspended mode when the core SMT configuration needs to be reconfigured. The kernel currently has code to detect the vector CI load condition, but only when the HMI occurs in the host, not when it occurs in a guest. If a HMI occurs in the guest, it is always passed to OPAL, and then we always re-sync the timebase, because the HMI cause might have been a timebase error, for which OPAL would re-sync the timebase, thus removing the timebase offset which KVM applied for the guest. Since we don't know what OPAL did, we don't know whether to subtract the timebase offset from the timebase, so instead we re-sync the timebase. This adds code to determine explicitly what the cause of a debug trigger HMI will be. This is based on a new device-tree property under the CPU nodes called ibm,hmi-special-triggers, if it is present, or otherwise based on the PVR (processor version register). The handling of debug trigger HMIs is pulled out into a separate function which can be called from the KVM guest exit code. If this function handles and clears the HMI, and no other HMI causes remain, then we skip calling OPAL and we proceed to subtract the guest timebase offset from the timebase. The overall handling for HMIs that occur in the host (i.e. not in a KVM guest) is largely unchanged, except that we now don't set the flag for the vector CI load workaround on DD2.2 processors. This also removes a BUG_ON in the KVM code. BUG_ON is generally not useful in KVM guest entry/exit code since it is difficult to handle the resulting trap gracefully. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 17 1月, 2018 2 次提交
-
-
由 Paul Mackerras 提交于
This moves the code that loads and unloads the guest SLB values so that it is done while the guest LPCR value is loaded in the LPCR register. The reason for doing this is that on POWER9, the behaviour of the slbmte instruction depends on the LPCR[UPRT] bit. If UPRT is 1, as it is for a radix host (or guest), the SLB index is truncated to 2 bits. This means that for a HPT guest on a radix host, the SLB was not being loaded correctly, causing the guest to crash. The SLB is now loaded much later in the guest entry path, after the LPCR is loaded, which for a secondary thread is after it sees that the primary thread has switched the MMU to the guest. The loop that waits for the primary thread has a branch out to the exit code that is taken if it sees that other threads have commenced exiting the guest. Since we have now not loaded the SLB at this point, we make this path branch to a new label 'guest_bypass' and we move the SLB unload code to before this label. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
This fixes a bug where it is possible to enter a guest on a POWER9 system without having the XIVE (interrupt controller) context loaded. This can happen because we unload the XIVE context from the CPU before doing the real-mode handling for machine checks. After the real-mode handler runs, it is possible that we re-enter the guest via a fast path which does not load the XIVE context. To fix this, we move the unloading of the XIVE context to come after the real-mode machine check handler is called. Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 11 1月, 2018 1 次提交
-
-
由 Alexander Graf 提交于
On Book3S in HV mode, we don't use the vcpu->arch.dec field at all. Instead, all logic is built around vcpu->arch.dec_expires. So let's remove the one remaining piece of code that was setting it. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-