- 01 6月, 2018 16 次提交
-
-
由 Simon Guo 提交于
This patch adds support for "treclaim." emulation when PR KVM guest executes treclaim. and traps to host. We will firstly do treclaim. and save TM checkpoint. Then it is necessary to update vcpu current reg content with checkpointed vals. When rfid into guest again, those vcpu current reg content (now the checkpoint vals) will be loaded into regs. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
Currently kvmppc_handle_fac() will not update NV GPRs and thus it can return with GUEST_RESUME. However PR KVM guest always disables MSR_TM bit in privileged state. If PR privileged-state guest is trying to read TM SPRs, it will trigger TM facility unavailable exception and fall into kvmppc_handle_fac(). Then the emulation will be done by kvmppc_core_emulate_mfspr_pr(). The mfspr instruction can include a RT with NV reg. So it is necessary to restore NV GPRs at this case, to reflect the update to NV RT. This patch make kvmppc_handle_fac() return GUEST_RESUME_NV for TM facility unavailable exceptions in guest privileged state. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Reviewed-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
Currently the kernel doesn't use transaction memory. And there is an issue for privileged state in the guest that: tbegin/tsuspend/tresume/tabort TM instructions can impact MSR TM bits without trapping into the PR host. So following code will lead to a false mfmsr result: tbegin <- MSR bits update to Transaction active. beq <- failover handler branch mfmsr <- still read MSR bits from magic page with transaction inactive. It is not an issue for non-privileged guest state since its mfmsr is not patched with magic page and will always trap into the PR host. This patch will always fail tbegin attempt for privileged state in the guest, so that the above issue is prevented. It is benign since currently (guest) kernel doesn't initiate a transaction. Test case: https://github.com/justdoitqd/publicFiles/blob/master/test_tbegin_pr.cSigned-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
The mfspr/mtspr on TM SPRs(TEXASR/TFIAR/TFHAR) are non-privileged instructions and can be executed by PR KVM guest in problem state without trapping into the host. We only emulate mtspr/mfspr texasr/tfiar/tfhar in guest PR=0 state. When we are emulating mtspr tm sprs in guest PR=0 state, the emulation result needs to be visible to guest PR=1 state. That is, the actual TM SPR val should be loaded into actual registers. We already flush TM SPRs into vcpu when switching out of CPU, and load TM SPRs when switching back. This patch corrects mfspr()/mtspr() emulation for TM SPRs to make the actual source/dest be the actual TM SPRs. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
The math registers will be saved into vcpu->arch.fp/vr and corresponding vcpu->arch.fp_tm/vr_tm area. We flush or giveup the math regs into vcpu->arch.fp/vr before saving transaction. After transaction is restored, the math regs will be loaded back into regs. If there is a FP/VEC/VSX unavailable exception during transaction active state, the math checkpoint content might be incorrect and we need to do treclaim./load the correct checkpoint val/trechkpt. sequence to retry the transaction. That will make our solution complicated. To solve this issue, we always make the hardware guest MSR math bits (shadow_msr) consistent with the MSR val which guest sees (kvmppc_get_msr()) when guest msr is with tm enabled. Then all FP/VEC/VSX unavailable exception can be delivered to guest and guest handles the exception by itself. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
The transaction memory checkpoint area save/restore behavior is triggered when VCPU qemu process is switching out/into CPU, i.e. at kvmppc_core_vcpu_put_pr() and kvmppc_core_vcpu_load_pr(). MSR TM active state is determined by TS bits: active: 10(transactional) or 01 (suspended) inactive: 00 (non-transactional) We don't "fake" TM functionality for guest. We "sync" guest virtual MSR TM active state(10 or 01) with shadow MSR. That is to say, we don't emulate a transactional guest with a TM inactive MSR. TM SPR support(TFIAR/TFAR/TEXASR) has already been supported by commit 9916d57e ("KVM: PPC: Book3S PR: Expose TM registers"). Math register support (FPR/VMX/VSX) will be done at subsequent patch. Whether TM context need to be saved/restored can be determined by kvmppc_get_msr() TM active state: * TM active - save/restore TM context * TM inactive - no need to do so and only save/restore TM SPRs. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Suggested-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
This patch adds 2 new APIs, kvmppc_save_tm_sprs() and kvmppc_restore_tm_sprs(), for the purpose of TEXASR/TFIAR/TFHAR save/restore. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
This patch adds 2 new APIs: kvmppc_copyto_vcpu_tm() and kvmppc_copyfrom_vcpu_tm(). These 2 APIs will be used to copy from/to TM data between VCPU_TM/VCPU area. PR KVM will use these APIs for treclaim. or trechkpt. emulation. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
PR KVM host usually runs with TM enabled in its host MSR value, and with non-transactional TS value. When a guest with TM active traps into PR KVM host, the rfid at the tail of kvmppc_interrupt_pr() will try to switch TS bits from S0 (Suspended & TM disabled) to N1 (Non-transactional & TM enabled). That will leads to TM Bad Thing interrupt. This patch manually sets target TS bits unchanged to avoid this exception. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
According to ISA specification for RFID, in MSR TM disabled and TS suspended state (S0), if the target MSR is TM disabled and TS state is inactive (N0), rfid should suppress this update. This patch makes the RFID emulation of PR KVM consistent with this. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
MSR TS bits can be modified with non-privileged instruction such as tbegin./tend. That means guest can change MSR value "silently" without notifying host. It is necessary to sync the TM bits to host so that host can calculate shadow msr correctly. Note, privileged mode in the guest will always fail transactions so we only take care of problem state mode in the guest. The logic is put into kvmppc_copy_from_svcpu() so that kvmppc_handle_exit_pr() can use correct MSR TM bits even when preemption occurs. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
PowerPC TM functionality needs MSR TM/TS bits support in hardware level. Guest TM functionality can not be emulated with "fake" MSR (msr in magic page) TS bits. This patch syncs TM/TS bits in shadow_msr with the MSR value in magic page, so that the MSR TS value which guest sees is consistent with actual MSR bits running in guest. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
This patch simulates interrupt behavior per Power ISA while injecting interrupt in PR KVM: - When interrupt happens, transactional state should be suspended. kvmppc_mmu_book3s_64_reset_msr() will be invoked when injecting an interrupt. This patch performs this ISA logic in kvmppc_mmu_book3s_64_reset_msr(). Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
Currently __kvmppc_save/restore_tm() APIs can only be invoked from assembly function. This patch adds C function wrappers for them so that they can be safely called from C function. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
kvmppc_save_tm() invokes store_fp_state/store_vr_state(). So it is mandatory to turn on FP/VSX/VMX MSR bits for its execution, just like what kvmppc_restore_tm() did. Previously HV KVM has turned the bits on outside of function kvmppc_save_tm(). Now we include this bit change in kvmppc_save_tm() so that the logic is cleaner. And PR KVM can reuse it later. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Reviewed-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
HV KVM and PR KVM need different MSR source to indicate whether treclaim. or trecheckpoint. is necessary. This patch add new parameter (guest MSR) for these kvmppc_save_tm/ kvmppc_restore_tm() APIs: - For HV KVM, it is VCPU_MSR - For PR KVM, it is current host MSR or VCPU_SHADOW_SRR1 This enhancement enables these 2 APIs to be reused by PR KVM later. And the patch keeps HV KVM logic unchanged. This patch also reworks kvmppc_save_tm()/kvmppc_restore_tm() to have a clean ABI: r3 for vcpu and r4 for guest_msr. During kvmppc_save_tm/kvmppc_restore_tm(), the R1 need to be saved or restored. Currently the R1 is saved into HSTATE_HOST_R1. In PR KVM, we are going to add a C function wrapper for kvmppc_save_tm/kvmppc_restore_tm() where the R1 will be incremented with added stackframe and save into HSTATE_HOST_R1. There are several places in HV KVM to load HSTATE_HOST_R1 as R1, and we don't want to bring risk or confusion by TM code. This patch will use HSTATE_SCRATCH2 to save/restore R1 in kvmppc_save_tm/kvmppc_restore_tm() to avoid future confusion, since the r1 is actually a temporary/scratch value to be saved/stored. [paulus@ozlabs.org - rebased on top of 7b0e827c ("KVM: PPC: Book3S HV: Factor fake-suspend handling out of kvmppc_save/restore_tm", 2018-05-30)] Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 31 5月, 2018 4 次提交
-
-
由 Simon Guo 提交于
It is a simple patch just for moving kvmppc_save_tm/kvmppc_restore_tm() functionalities to tm.S. There is no logic change. The reconstruct of those APIs will be done in later patches to improve readability. It is for preparation of reusing those APIs on both HV/PR PPC KVM. Some slight change during move the functions includes: - surrounds some HV KVM specific code with CONFIG_KVM_BOOK3S_HV_POSSIBLE for compilation. - use _GLOBAL() to define kvmppc_save_tm/kvmppc_restore_tm() [paulus@ozlabs.org - rebased on top of 7b0e827c ("KVM: PPC: Book3S HV: Factor fake-suspend handling out of kvmppc_save/restore_tm", 2018-05-30)] Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
This merges in the ppc-kvm topic branch of the powerpc repository to get some changes on which future patches will depend, in particular some new exports and TEXASR bit definitions. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
This splits out the handling of "fake suspend" mode, part of the hypervisor TM assist code for POWER9, and puts almost all of it in new kvmppc_save_tm_hv and kvmppc_restore_tm_hv functions. The new functions branch to kvmppc_save/restore_tm if the CPU does not require hypervisor TM assistance. With this, it will be more straightforward to move kvmppc_save_tm and kvmppc_restore_tm to another file and use them for transactional memory support in PR KVM. Additionally, it also makes the code a bit clearer and reduces the number of feature sections. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
Currently, PR KVM does not implement the configure_mmu operation, and so the KVM_PPC_CONFIGURE_V3_MMU ioctl always fails with an EINVAL error. This causes recent kernels to fail to boot as a PR KVM guest on POWER9, since recent kernels booted in HPT mode do the H_REGISTER_PROC_TBL hypercall, which causes userspace (QEMU) to do KVM_PPC_CONFIGURE_V3_MMU, which fails. This implements a minimal configure_mmu operation for PR KVM. It succeeds only if the MMU is being configured for HPT mode and no process table is being registered. This is enough to get recent kernels to boot as a PR KVM guest. Reviewed-by: NGreg Kurz <groug@kaod.org> Tested-by: NGreg Kurz <groug@kaod.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 24 5月, 2018 3 次提交
-
-
由 Simon Guo 提交于
This patch exports tm_enable()/tm_disable/tm_abort() APIs, which will be used for PR KVM transactional memory logic. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Reviewed-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Simon Guo 提交于
This patches add some macros for CR0/TEXASR bits so that PR KVM TM logic (tbegin./treclaim./tabort.) can make use of them later. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Reviewed-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Simon Guo 提交于
PR KVM will need to reuse msr_check_and_set(). This patch exports this API for reuse. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Reviewed-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 22 5月, 2018 7 次提交
-
-
由 Simon Guo 提交于
This patch reimplements LOAD_VMX/STORE_VMX MMIO emulation with analyse_instr() input. When emulating the store, the VMX reg will need to be flushed so that the right reg val can be retrieved before writing to IO MEM. This patch also adds support for lvebx/lvehx/lvewx/stvebx/stvehx/stvewx MMIO emulation. To meet the requirement of handling different element sizes, kvmppc_handle_load128_by2x64()/kvmppc_handle_store128_by2x64() were replaced with kvmppc_handle_vmx_load()/kvmppc_handle_vmx_store(). The framework used is similar to VSX instruction MMIO emulation. Suggested-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
VSX MMIO emulation uses mmio_vsx_copy_type to represent VSX emulated element size/type, such as KVMPPC_VSX_COPY_DWORD_LOAD, etc. This patch expands mmio_vsx_copy_type to cover VMX copy type, such as KVMPPC_VMX_COPY_BYTE(stvebx/lvebx), etc. As a result, mmio_vsx_copy_type is also renamed to mmio_copy_type. It is a preparation for reimplementing VMX MMIO emulation. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
This patch reimplements LOAD_VSX/STORE_VSX instruction MMIO emulation with analyse_instr() input. It utilizes VSX_FPCONV/VSX_SPLAT/SIGNEXT exported by analyse_instr() and handle accordingly. When emulating VSX store, the VSX reg will need to be flushed so that the right reg val can be retrieved before writing to IO MEM. [paulus@ozlabs.org - mask the register number to 5 bits.] Suggested-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
This patch reimplements LOAD_FP/STORE_FP instruction MMIO emulation with analyse_instr() input. It utilizes the FPCONV/UPDATE properties exported by analyse_instr() and invokes kvmppc_handle_load(s)/kvmppc_handle_store() accordingly. For FP store MMIO emulation, the FP regs need to be flushed firstly so that the right FP reg vals can be read from vcpu->arch.fpr, which will be stored into MMIO data. Suggested-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
Currently HV will save math regs(FP/VEC/VSX) when trap into host. But PR KVM will only save math regs when qemu task switch out of CPU, or when returning from qemu code. To emulate FP/VEC/VSX mmio load, PR KVM need to make sure that math regs were flushed firstly and then be able to update saved VCPU FPR/VEC/VSX area reasonably. This patch adds giveup_ext() field to KVM ops. Only PR KVM has non-NULL giveup_ext() ops. kvmppc_complete_mmio_load() can invoke that hook (when not NULL) to flush math regs accordingly, before updating saved register vals. Math regs flush is also necessary for STORE, which will be covered in later patch within this patch series. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
This patch reimplements non-SIMD LOAD/STORE instruction MMIO emulation with analyse_instr() input. It utilizes the BYTEREV/UPDATE/SIGNEXT properties exported by analyse_instr() and invokes kvmppc_handle_load(s)/kvmppc_handle_store() accordingly. It also moves CACHEOP type handling into the skeleton. instruction_type within kvm_ppc.h is renamed to avoid conflict with sstep.h. Suggested-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Simon Guo 提交于
Some VSX instructions like lxvwsx will splat word into VSR. This patch adds a new VSX copy type KVMPPC_VSX_COPY_WORD_LOAD_DUMP to support this. Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com> Reviewed-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 18 5月, 2018 10 次提交
-
-
由 Paul Mackerras 提交于
This relaxes the restriction on using PR KVM on POWER9. The existing code does work inside a guest partition running in HPT mode, because hypercalls such as H_ENTER use the old HPTE format, not the new format used by POWER9, and so no change to PR KVM's HPT manipulation code is required. PR KVM will still refuse to run if the kernel is using radix translation or if it is running bare-metal. Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Nicholas Piggin 提交于
It's possible to take a SRESET or MCE in these paths due to a bug in the host code or a NMI IPI, etc. A recent bug attempting to load a virtual address from real mode gave th complete but cryptic error, abridged: Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1] LE SMP NR_CPUS=2048 NUMA PowerNV CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted NIP: c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580 REGS: c000000fff76dd80 TRAP: 0200 Not tainted MSR: 9000000000201003 <SF,HV,ME,RI,LE> CR: 48082222 XER: 00000000 CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3 NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0 LR [c0000000000c2430] do_tlbies+0x230/0x2f0 Sending the NMIs through the Linux handlers gives a nicer output: Severe Machine check interrupt [Not recovered] NIP [c0000000000155ac]: perf_trace_tlbie+0x2c/0x1a0 Initiator: CPU Error type: Real address [Load (bad)] Effective address: d00017fffcc01a28 opal: Machine check interrupt unrecoverable: MSR(RI=0) opal: Hardware platform error: Unrecoverable Machine Check exception CPU: 0 PID: 6700 Comm: qemu-system-ppc Tainted: G M NIP: c0000000000155ac LR: c0000000000c23c0 CTR: c000000000015580 REGS: c000000fff9e9d80 TRAP: 0200 Tainted: G M MSR: 9000000000201001 <SF,HV,ME,LE> CR: 48082222 XER: 00000000 CFAR: 000000010cbc1a30 DAR: d00017fffcc01a28 DSISR: 00000040 SOFTE: 3 NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0 LR [c0000000000c23c0] do_tlbies+0x1c0/0x280 Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Nicholas Piggin 提交于
When CONFIG_RELOCATABLE=n, the Linux real mode interrupt handlers call into KVM using real address. This needs to be translated to the kernel linear effective address before the MMU is switched on. kvmppc_bad_host_intr misses adding these bits, so when it is used to handle a system reset interrupt (that always gets delivered in real mode), it results in an instruction access fault immediately after the MMU is turned on. Fix this by ensuring the top 2 address bits are set when the MMU is turned on. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Nicholas Piggin 提交于
Adding the write bit and RC bits to pte permissions does not require a pte clear and flush. There should not be other bits changed here, because restricting access or changing the PFN must have already invalidated any existing ptes (otherwise the race is already lost). Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Nicholas Piggin 提交于
When the radix fault handler has no page from the process address space (e.g., for IO memory), it looks up the process pte and sets partition table pte using that to get attributes like CI and guarded. If the process table entry is to be writable, set _PAGE_DIRTY as well to avoid an RC update. If not, then ensure _PAGE_DIRTY does not come across. Set _PAGE_ACCESSED as well to avoid RC update. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Nicholas Piggin 提交于
The radix guest code can has fewer restrictions about what context it can run in, so move this flushing out of assembly and have it use the Linux TLB flush implementations introduced previously. This allows powerpc:tlbie trace events to be used. This changes the tlbiel sequence to only execute RIC=2 flush once on the first set flushed, then RIC=0 for the rest of the sets. The end result of the flush should be unchanged. This matches the local PID flush pattern that was introduced in a5998fcb ("powerpc/mm/radix: Optimise tlbiel flush all case"). Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Nicholas Piggin 提交于
This has the advantage of consolidating TLB flush code in fewer places, and it also implements powerpc:tlbie trace events. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Nicholas Piggin 提交于
When partition scope mappings are unmapped with kvm_unmap_radix, the pte is cleared, but the page table structure is left in place. If the next page fault requests a different page table geometry (e.g., due to THP promotion or split), kvmppc_create_pte is responsible for changing the page tables. When a page table entry is to be converted to a large pte, the page table entry is cleared, the PWC flushed, then the page table it points to freed. This will cause pte page tables to leak when a 1GB page is to replace a pud entry points to a pmd table with pte tables under it: The pmd table will be freed, but its pte tables will be missed. Fix this by replacing the simple clear and free code with one that walks down the page tables and frees children. Care must be taken to clear the root entry being unmapped then flushing the PWC before freeing any page tables, as explained in comments. This requires PWC flush to logically become a flush-all-PWC (which it already is in hardware, but the KVM API needs to be changed to avoid confusion). This code also checks that no unexpected pte entries exist in any page table being freed, and unmaps those and emits a WARN. This is an expensive operation for the pte page level, but partition scope changes are rare, so it's unconditional for now to iron out bugs. It can be put under a CONFIG option or removed after some time. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Nicholas Piggin 提交于
Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Nicholas Piggin 提交于
tlbies to an LPAR do not have to be serialised since POWER4/PPC970, after which the MMU_FTR_LOCKLESS_TLBIE feature was introduced to avoid tlbie locking. Since commit c17b98cf ("KVM: PPC: Book3S HV: Remove code for PPC970 processors"), KVM no longer supports processors that do not have this feature, so the tlbie locking can be removed completely. A sanity check for the feature is put in kvmppc_mmu_hv_init. Testing was done on a POWER9 system in HPT mode, with a -smp 32 guest in HPT mode. 32 instances of the powerpc fork benchmark from selftests were run with --fork, and the results measured. Without this patch, total throughput was about 13.5K/sec, and this is the top of the host profile: 74.52% [k] do_tlbies 2.95% [k] kvmppc_book3s_hv_page_fault 1.80% [k] calc_checksum 1.80% [k] kvmppc_vcpu_run_hv 1.49% [k] kvmppc_run_core After this patch, throughput was about 51K/sec, with this profile: 21.28% [k] do_tlbies 5.26% [k] kvmppc_run_core 4.88% [k] kvmppc_book3s_hv_page_fault 3.30% [k] _raw_spin_lock_irqsave 3.25% [k] gup_pgd_range Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-