- 18 6月, 2021 7 次提交
-
-
由 Ilias Stamatis 提交于
In order to implement as much of the nested TSC scaling logic as possible in common code, we need these vendor callbacks for retrieving the TSC offset and the TSC multiplier that L1 has set for L2. Signed-off-by: NIlias Stamatis <ilstam@amazon.com> Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-7-ilstam@amazon.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Ilias Stamatis 提交于
Sometimes kvm_scale_tsc() needs to use the current scaling ratio and other times (like when reading the TSC from user space) it needs to use L1's scaling ratio. Have the caller specify this by passing the ratio as a parameter. Signed-off-by: NIlias Stamatis <ilstam@amazon.com> Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-5-ilstam@amazon.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Ilias Stamatis 提交于
Store L1's scaling ratio in the kvm_vcpu_arch struct like we already do for L1's TSC offset. This allows for easy save/restore when we enter and then exit the nested guest. Signed-off-by: NIlias Stamatis <ilstam@amazon.com> Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-3-ilstam@amazon.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Ben Gardon 提交于
If the TDP MMU is in use, wait to allocate the rmaps until the shadow MMU is actually used. (i.e. a nested VM is launched.) This saves memory equal to 0.2% of guest memory in cases where the TDP MMU is used and there are no nested guests involved. Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20210518173414.450044-8-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Ben Gardon 提交于
Add a field to control whether new memslots should have rmaps allocated for them. As of this change, it's not safe to skip allocating rmaps, so the field is always set to allocate rmaps. Future changes will make it safe to operate without rmaps, using the TDP MMU. Then further changes will allow the rmaps to be allocated lazily when needed for nested oprtation. No functional change expected. Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20210518173414.450044-6-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Siddharth Chandrasekaran 提交于
Now that kvm_hv_flush_tlb() has been patched to support XMM hypercall inputs, we can start advertising this feature to guests. Cc: Alexander Graf <graf@amazon.com> Cc: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: NSiddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <e63fc1c61dd2efecbefef239f4f0a598bd552750.1622019134.git.sidcha@amazon.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Siddharth Chandrasekaran 提交于
Hyper-V supports the use of XMM registers to perform fast hypercalls. This allows guests to take advantage of the improved performance of the fast hypercall interface even though a hypercall may require more than (the current maximum of) two input registers. The XMM fast hypercall interface uses six additional XMM registers (XMM0 to XMM5) to allow the guest to pass an input parameter block of up to 112 bytes. Add framework to read from XMM registers in kvm_hv_hypercall() and use the additional hypercall inputs from XMM registers in kvm_hv_flush_tlb() when possible. Cc: Alexander Graf <graf@amazon.com> Co-developed-by: NEvgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: NEvgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: NSiddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <fc62edad33f1920fe5c74dde47d7d0b4275a9012.1622019134.git.sidcha@amazon.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 27 5月, 2021 1 次提交
-
-
由 Marcelo Tosatti 提交于
Add a start_assignment hook to kvm_x86_ops, which is called when kvm_arch_start_assignment is done. The hook is required to update the wakeup vector of a sleeping vCPU when a device is assigned to the guest. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Message-Id: <20210525134321.254128742@redhat.com> Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 07 5月, 2021 9 次提交
-
-
由 Sean Christopherson 提交于
Disallow loading KVM SVM if 5-level paging is supported. In theory, NPT for L1 should simply work, but there unknowns with respect to how the guest's MAXPHYADDR will be handled by hardware. Nested NPT is more problematic, as running an L1 VMM that is using 2-level page tables requires stacking single-entry PDP and PML4 tables in KVM's NPT for L2, as there are no equivalent entries in L1's NPT to shadow. Barring hardware magic, for 5-level paging, KVM would need stack another layer to handle PML5. Opportunistically rename the lm_root pointer, which is used for the aforementioned stacking when shadowing 2-level L1 NPT, to pml4_root to call out that it's specifically for PML4. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210505204221.1934471-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Chenyi Qiang 提交于
Bus lock debug exception introduces a new bit DR6_BUS_LOCK (bit 11 of DR6) to indicate that bus lock #DB exception is generated. The set/clear of DR6_BUS_LOCK is similar to the DR6_RTM. The processor clears DR6_BUS_LOCK when the exception is generated. For all other #DB, the processor sets this bit to 1. Software #DB handler should set this bit before returning to the interrupted task. In VMM, to avoid breaking the CPUs without bus lock #DB exception support, activate the DR6_BUS_LOCK conditionally in DR6_FIXED_1 bits. When intercepting the #DB exception caused by bus locks, bit 11 of the exit qualification is set to identify it. The VMM should emulate the exception by clearing the bit 11 of the guest DR6. Co-developed-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20210202090433.13441-3-chenyi.qiang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Squish the Intel and AMD emulation of MSR_TSC_AUX together and tie it to the guest CPU model instead of the host CPU behavior. While not strictly necessary to avoid guest breakage, emulating cross-vendor "architecture" will provide consistent behavior for the guest, e.g. WRMSR fault behavior won't change if the vCPU is migrated to a host with divergent behavior. Note, the "new" kvm_is_supported_user_return_msr() checks do not add new functionality on either SVM or VMX. On SVM, the equivalent was "tsc_aux_uret_slot < 0", and on VMX the check was buried in the vmx_find_uret_msr() call at the find_uret_msr label. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-15-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Now that SVM and VMX both probe MSRs before "defining" user return slots for them, consolidate the code for probe+define into common x86 and eliminate the odd behavior of having the vendor code define the slot for a given MSR. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-14-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Split out and export the number of configured user return MSRs so that VMX can iterate over the set of MSRs without having to do its own tracking. Keep the list itself internal to x86 so that vendor code still has to go through the "official" APIs to add/modify entries. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-13-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Drop VMX's global list of user return MSRs now that VMX doesn't resort said list to isolate "active" MSRs, i.e. now that VMX's list and x86's list have the same MSRs in the same order. In addition to eliminating the redundant list, this will also allow moving more of the list management into common x86. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-11-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Disable preemption when probing a user return MSR via RDSMR/WRMSR. If the MSR holds a different value per logical CPU, the WRMSR could corrupt the host's value if KVM is preempted between the RDMSR and WRMSR, and then rescheduled on a different CPU. Opportunistically land the helper in common x86, SVM will use the helper in a future commit. Fixes: 4be53410 ("KVM: VMX: Initialize vmx->guest_msrs[] right after allocation") Cc: stable@vger.kernel.org Cc: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-6-seanjc@google.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
Crash shutdown handler only disables kvmclock and steal time, other PV features remain active so we risk corrupting memory or getting some side-effects in kdump kernel. Move crash handler to kvm.c and unify with CPU offline. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210414123544.1060604-5-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
Currenly, we disable kvmclock from machine_shutdown() hook and this only happens for boot CPU. We need to disable it for all CPUs to guard against memory corruption e.g. on restore from hibernate. Note, writing '0' to kvmclock MSR doesn't clear memory location, it just prevents hypervisor from updating the location so for the short while after write and while CPU is still alive, the clock remains usable and correct so we don't need to switch to some other clocksource. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210414123544.1060604-4-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 06 5月, 2021 3 次提交
-
-
由 Lai Jiangshan 提交于
In VMX, the host NMI handler needs to be invoked after NMI VM-Exit. Before commit 1a5488ef ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn"), this was done by INTn ("int $2"). But INTn microcode is relatively expensive, so the commit reworked NMI VM-Exit handling to invoke the kernel handler by function call. But this missed a detail. The NMI entry point for direct invocation is fetched from the IDT table and called on the kernel stack. But on 64-bit the NMI entry installed in the IDT expects to be invoked on the IST stack. It relies on the "NMI executing" variable on the IST stack to work correctly, which is at a fixed position in the IST stack. When the entry point is unexpectedly called on the kernel stack, the RSP-addressed "NMI executing" variable is obviously also on the kernel stack and is "uninitialized" and can cause the NMI entry code to run in the wrong way. Provide a non-ist entry point for VMX which shares the C-function with the regular NMI entry and invoke the new asm entry point instead. On 32-bit this just maps to the regular NMI entry point as 32-bit has no ISTs and is not affected. [ tglx: Made it independent for backporting, massaged changelog ] Fixes: 1a5488ef ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn") Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NLai Jiangshan <laijs@linux.alibaba.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87r1imi8i1.ffs@nanos.tec.linutronix.de
-
由 Sean Christopherson 提交于
Drop write_tsc() and write_rdtscp_aux(); the former has no users, and the latter has only a single user and is slightly misleading since the only in-kernel consumer of MSR_TSC_AUX is RDPID, not RDTSCP. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210504225632.1532621-3-seanjc@google.com
-
由 Alexey Dobriyan 提交于
Both instructions aren't used by kernel. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/YIHHYNKbiSf5N7+o@localhost.localdomain
-
- 05 5月, 2021 1 次提交
-
-
由 Linus Torvalds 提交于
We used to generate this constant with static jumps, which certainly works, but generates some quite unreadable and horrid code, and extra jumps. It's actually much simpler to just use our alternative_asm() infrastructure to generate a simple alternative constant, making the generated code much more obvious (and straight-line rather than "jump around to load the right constant"). Acked-by: NBorislav Petkov <bp@alien8.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com>
-
- 03 5月, 2021 1 次提交
-
-
由 Maxim Levitsky 提交于
* Define and use an invalid GPA (all ones) for init value of last and current nested vmcb physical addresses. * Reset the current vmcb12 gpa to the invalid value when leaving the nested mode, similar to what is done on nested vmexit. * Reset the last seen vmcb12 address when disabling the nested SVM, as it relies on vmcb02 fields which are freed at that point. Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest") Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210503125446.1353307-3-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 5月, 2021 3 次提交
-
-
由 Nicholas Piggin 提交于
If an architecture doesn't support a particular page table level as a huge vmap page size then allow it to skip defining the support query function. Link: https://lkml.kernel.org/r/20210317062402.533919-11-npiggin@gmail.comSigned-off-by: NNicholas Piggin <npiggin@gmail.com> Suggested-by: NChristoph Hellwig <hch@lst.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nicholas Piggin 提交于
This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Link: https://lkml.kernel.org/r/20210317062402.533919-10-npiggin@gmail.comSigned-off-by: NNicholas Piggin <npiggin@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Russell King <linux@armlinux.org.uk> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nicholas Piggin 提交于
This changes the awkward approach where architectures provide init functions to determine which levels they can provide large mappings for, to one where the arch is queried for each call. This removes code and indirection, and allows constant-folding of dead code for unsupported levels. This also adds a prot argument to the arch query. This is unused currently but could help with some architectures (e.g., some powerpc processors can't map uncacheable memory with large pages). Link: https://lkml.kernel.org/r/20210317062402.533919-7-npiggin@gmail.comSigned-off-by: NNicholas Piggin <npiggin@gmail.com> Reviewed-by: NDing Tianhong <dingtianhong@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Russell King <linux@armlinux.org.uk> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 4月, 2021 1 次提交
-
-
由 Sean Christopherson 提交于
Drop the sev_enabled flag and switch its one user over to sev_active(). sev_enabled was made redundant with the introduction of sev_status in commit b57de6cd ("x86/sev-es: Add SEV-ES Feature Detection"). sev_enabled and sev_active() are guaranteed to be equivalent, as each is true iff 'sev_status & MSR_AMD64_SEV_ENABLED' is true, and are only ever written in tandem (ignoring compressed boot's version of sev_status). Removing sev_enabled avoids confusion over whether it refers to the guest or the host, and will also allow KVM to usurp "sev_enabled" for its own purposes. No functional change intended. Reviewed-by: NTom Lendacky <thomas.lendacky@amd.com> Reviewed-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210422021125.3417167-7-seanjc@google.com> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 4月, 2021 1 次提交
-
-
由 Nathan Tempelman 提交于
Add a capability for userspace to mirror SEV encryption context from one vm to another. On our side, this is intended to support a Migration Helper vCPU, but it can also be used generically to support other in-guest workloads scheduled by the host. The intention is for the primary guest and the mirror to have nearly identical memslots. The primary benefits of this are that: 1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they can't accidentally clobber each other. 2) The VMs can have different memory-views, which is necessary for post-copy migration (the migration vCPUs on the target need to read and write to pages, when the primary guest would VMEXIT). This does not change the threat model for AMD SEV. Any memory involved is still owned by the primary guest and its initial state is still attested to through the normal SEV_LAUNCH_* flows. If userspace wanted to circumvent SEV, they could achieve the same effect by simply attaching a vCPU to the primary VM. This patch deliberately leaves userspace in charge of the memslots for the mirror, as it already has the power to mess with them in the primary guest. This patch does not support SEV-ES (much less SNP), as it does not handle handing off attested VMSAs to the mirror. For additional context, we need a Migration Helper because SEV PSP migration is far too slow for our live migration on its own. Using an in-guest migrator lets us speed this up significantly. Signed-off-by: NNathan Tempelman <natet@google.com> Message-Id: <20210408223214.2582277-1-natet@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 4月, 2021 2 次提交
-
-
由 Joseph Salisbury 提交于
There is not a consistent pattern for checking Hyper-V hypercall status. Existing code uses a number of variants. The variants work, but a consistent pattern would improve the readability of the code, and be more conformant to what the Hyper-V TLFS says about hypercall status. Implemented new helper functions hv_result(), hv_result_success(), and hv_repcomp(). Changed the places where hv_do_hypercall() and related variants are used to use the helper functions. Signed-off-by: NJoseph Salisbury <joseph.salisbury@microsoft.com> Reviewed-by: NMichael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1618620183-9967-2-git-send-email-joseph.salisbury@linux.microsoft.comSigned-off-by: NWei Liu <wei.liu@kernel.org>
-
由 Joseph Salisbury 提交于
This patch makes no functional changes. It simply moves hv_do_rep_hypercall() out of arch/x86/include/asm/mshyperv.h and into asm-generic/mshyperv.h hv_do_rep_hypercall() is architecture independent, so it makes sense that it should be in the architecture independent mshyperv.h, not in the x86-specific mshyperv.h. This is done in preperation for a follow up patch which creates a consistent pattern for checking Hyper-V hypercall status. Signed-off-by: NJoseph Salisbury <joseph.salisbury@microsoft.com> Reviewed-by: NMichael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1618620183-9967-1-git-send-email-joseph.salisbury@linux.microsoft.comSigned-off-by: NWei Liu <wei.liu@kernel.org>
-
- 20 4月, 2021 9 次提交
-
-
由 Colin Ian King 提交于
The variable st is being assigned a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NDenis Efremov <efremov@linux.com> Acked-by: NWilly Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20210415130020.1959951-1-colin.king@canonical.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Sean Christopherson 提交于
Add an ECREATE handler that will be used to intercept ECREATE for the purpose of enforcing and enclave's MISCSELECT, ATTRIBUTES and XFRM, i.e. to allow userspace to restrict SGX features via CPUID. ECREATE will be intercepted when any of the aforementioned masks diverges from hardware in order to enforce the desired CPUID model, i.e. inject #GP if the guest attempts to set a bit that hasn't been enumerated as allowed-1 in CPUID. Note, access to the PROVISIONKEY is not yet supported. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: NKai Huang <kai.huang@intel.com> Signed-off-by: NKai Huang <kai.huang@intel.com> Message-Id: <c3a97684f1b71b4f4626a1fc3879472a95651725.1618196135.git.kai.huang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add support for handling VM-Exits that originate from a guest SGX enclave. In SGX, an "enclave" is a new CPL3-only execution environment, wherein the CPU and memory state is protected by hardware to make the state inaccesible to code running outside of the enclave. When exiting an enclave due to an asynchronous event (from the perspective of the enclave), e.g. exceptions, interrupts, and VM-Exits, the enclave's state is automatically saved and scrubbed (the CPU loads synthetic state), and then reloaded when re-entering the enclave. E.g. after an instruction based VM-Exit from an enclave, vmcs.GUEST_RIP will not contain the RIP of the enclave instruction that trigered VM-Exit, but will instead point to a RIP in the enclave's untrusted runtime (the guest userspace code that coordinates entry/exit to/from the enclave). To help a VMM recognize and handle exits from enclaves, SGX adds bits to existing VMCS fields, VM_EXIT_REASON.VMX_EXIT_REASON_FROM_ENCLAVE and GUEST_INTERRUPTIBILITY_INFO.GUEST_INTR_STATE_ENCLAVE_INTR. Define the new architectural bits, and add a boolean to struct vcpu_vmx to cache VMX_EXIT_REASON_FROM_ENCLAVE. Clear the bit in exit_reason so that checks against exit_reason do not need to account for SGX, e.g. "if (exit_reason == EXIT_REASON_EXCEPTION_NMI)" continues to work. KVM is a largely a passive observer of the new bits, e.g. KVM needs to account for the bits when propagating information to a nested VMM, but otherwise doesn't need to act differently for the majority of VM-Exits from enclaves. The one scenario that is directly impacted is emulation, which is for all intents and purposes impossible[1] since KVM does not have access to the RIP or instruction stream that triggered the VM-Exit. The inability to emulate is a non-issue for KVM, as most instructions that might trigger VM-Exit unconditionally #UD in an enclave (before the VM-Exit check. For the few instruction that conditionally #UD, KVM either never sets the exiting control, e.g. PAUSE_EXITING[2], or sets it if and only if the feature is not exposed to the guest in order to inject a #UD, e.g. RDRAND_EXITING. But, because it is still possible for a guest to trigger emulation, e.g. MMIO, inject a #UD if KVM ever attempts emulation after a VM-Exit from an enclave. This is architecturally accurate for instruction VM-Exits, and for MMIO it's the least bad choice, e.g. it's preferable to killing the VM. In practice, only broken or particularly stupid guests should ever encounter this behavior. Add a WARN in skip_emulated_instruction to detect any attempt to modify the guest's RIP during an SGX enclave VM-Exit as all such flows should either be unreachable or must handle exits from enclaves before getting to skip_emulated_instruction. [1] Impossible for all practical purposes. Not truly impossible since KVM could implement some form of para-virtualization scheme. [2] PAUSE_LOOP_EXITING only affects CPL0 and enclaves exist only at CPL3, so we also don't need to worry about that interaction. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NKai Huang <kai.huang@intel.com> Message-Id: <315f54a8507d09c292463ef29104e1d4c62e9090.1618196135.git.kai.huang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Page faults that are signaled by the SGX Enclave Page Cache Map (EPCM), as opposed to the traditional IA32/EPT page tables, set an SGX bit in the error code to indicate that the #PF was induced by SGX. KVM will need to emulate this behavior as part of its trap-and-execute scheme for virtualizing SGX Launch Control, e.g. to inject SGX-induced #PFs if EINIT faults in the host, and to support live migration. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NKai Huang <kai.huang@intel.com> Message-Id: <e170c5175cb9f35f53218a7512c9e3db972b97a2.1618196135.git.kai.huang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Keqian Zhu 提交于
kvm_mmu_slot_largepage_remove_write_access() is decared but not used, just remove it. Signed-off-by: NKeqian Zhu <zhukeqian1@huawei.com> Message-Id: <20210406063504.17552-1-zhukeqian1@huawei.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
To analyze some performance issues with lock contention and scheduling, it is nice to know when directed yield are successful or failing. Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Message-Id: <1617941911-5338-2-git-send-email-wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Kan Liang 提交于
Some platforms, e.g. Alder Lake, have hybrid architecture. Although most PMU capabilities are the same, there are still some unique PMU capabilities for different hybrid PMUs. Perf should register a dedicated pmu for each hybrid PMU. Add a new struct x86_hybrid_pmu, which saves the dedicated pmu and capabilities for each hybrid PMU. The architecture MSR, MSR_IA32_PERF_CAPABILITIES, only indicates the architecture features which are available on all hybrid PMUs. The architecture features are stored in the global x86_pmu.intel_cap. For Alder Lake, the model-specific features are perf metrics and PEBS-via-PT. The corresponding bits of the global x86_pmu.intel_cap should be 0 for these two features. Perf should not use the global intel_cap to check the features on a hybrid system. Add a dedicated intel_cap in the x86_hybrid_pmu to store the model-specific capabilities. Use the dedicated intel_cap to replace the global intel_cap for thse two features. The dedicated intel_cap will be set in the following "Add Alder Lake Hybrid support" patch. Add is_hybrid() to distinguish a hybrid system. ADL may have an alternative configuration. With that configuration, the X86_FEATURE_HYBRID_CPU is not set. Perf cannot rely on the feature bit. Add a new static_key_false, perf_is_hybrid, to indicate a hybrid system. It will be assigned in the following "Add Alder Lake Hybrid support" patch as well. Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1618237865-33448-5-git-send-email-kan.liang@linux.intel.com
-
由 Ricardo Neri 提交于
On processors with Intel Hybrid Technology (i.e., one having more than one type of CPU in the same package), all CPUs support the same instruction set and enumerate the same features on CPUID. Thus, all software can run on any CPU without restrictions. However, there may be model-specific differences among types of CPUs. For instance, each type of CPU may support a different number of performance counters. Also, machine check error banks may be wired differently. Even though most software will not care about these differences, kernel subsystems dealing with these differences must know. Add and expose a new helper function get_this_hybrid_cpu_type() to query the type of the current hybrid CPU. The function will be used later in the perf subsystem. The Intel Software Developer's Manual defines the CPU type as 8-bit identifier. Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NTony Luck <tony.luck@intel.com> Reviewed-by: NLen Brown <len.brown@intel.com> Acked-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1618237865-33448-3-git-send-email-kan.liang@linux.intel.com
-
由 Ricardo Neri 提交于
Add feature enumeration to identify a processor with Intel Hybrid Technology: one in which CPUs of more than one type are the same package. On a hybrid processor, all CPUs support the same homogeneous (i.e., symmetric) instruction set. All CPUs enumerate the same features in CPUID. Thus, software (user space and kernel) can run and migrate to any CPU in the system as well as utilize any of the enumerated features without any change or special provisions. The main difference among CPUs in a hybrid processor are power and performance properties. Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NTony Luck <tony.luck@intel.com> Reviewed-by: NLen Brown <len.brown@intel.com> Acked-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1618237865-33448-2-git-send-email-kan.liang@linux.intel.com
-
- 19 4月, 2021 1 次提交
-
-
由 Ben Gardon 提交于
Protect the contents of the TDP MMU roots list with RCU in preparation for a future patch which will allow the iterator macro to be used under the MMU lock in read mode. Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-9-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 17 4月, 2021 1 次提交
-
-
由 Sean Christopherson 提交于
Yank out the hva-based MMU notifier APIs now that all architectures that use the notifiers have moved to the gfn-based APIs. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210402005658.3024832-7-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-