- 20 9月, 2022 2 次提交
-
-
由 Chenyi Qiang 提交于
mainline inclusion from mainline-v6.0-rc1 commit ed235117 category: feature feature: Notify VM exit bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5PAJ5 CVE: N/A Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ commit/?id=ed235117 Intel-SIG: commit ed235117 ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault") ------------------------------------- KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault For the triple fault sythesized by KVM, e.g. the RSM path or nested_vmx_abort(), if KVM exits to userspace before the request is serviced, userspace could migrate the VM and lose the triple fault. Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault with a new event KVM_VCPUEVENT_VALID_FAULT_FAULT so that userspace can save and restore the triple fault event. This extension is guarded by a new KVM capability KVM_CAP_TRIPLE_FAULT_EVENT. Note that in the set_vcpu_events path, userspace is able to set/clear the triple fault request through triple_fault.pending field. Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20220524135624.22988-2-chenyi.qiang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NAichun Shi <aichun.shi@intel.com>
-
由 Paolo Bonzini 提交于
mainline inclusion from mainline-v5.19-rc2 commit 6cd88243 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PJ7H CVE: CVE-2022-39189 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6cd88243c7e03845a450795e134b488fc2afb736 ---------------------------------------- If a vCPU is outside guest mode and is scheduled out, it might be in the process of making a memory access. A problem occurs if another vCPU uses the PV TLB flush feature during the period when the vCPU is scheduled out, and a virtual address has already been translated but has not yet been accessed, because this is equivalent to using a stale TLB entry. To avoid this, only report a vCPU as preempted if sure that the guest is at an instruction boundary. A rescheduling request will be delivered to the host physical CPU as an external interrupt, so for simplicity consider any vmexit *not* instruction boundary except for external interrupts. It would in principle be okay to report the vCPU as preempted also if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the vmentry/vmexit overhead unnecessarily, and optimistic spinning is also unlikely to succeed. However, leave it for later because right now kvm_vcpu_check_block() is doing memory accesses. Even though the TLB flush issue only applies to virtual memory address, it's very much preferrable to be conservative. Reported-by: NJann Horn <jannh@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> conflict: arch/x86/kvm/x86.c Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: Nyezengruan <yezengruan@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 17 9月, 2022 1 次提交
-
-
由 Chenyi Qiang 提交于
mainline inclusion from mainline-v5.12-rc1 commit fe6b6bc8 category: feature feature: KVM Bus Lock VM Exit bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RJCB CVE: N/A Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ commit/?id=fe6b6bc8 Intel-SIG: commit fe6b6bc8 ("KVM: VMX: Enable bus lock VM exit") ------------------------------------- KVM: VMX: Enable bus lock VM exit Virtual Machine can exploit bus locks to degrade the performance of system. Bus lock can be caused by split locked access to writeback(WB) memory or by using locks on uncacheable(UC) memory. The bus lock is typically >1000 cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores (which must wait for the bus lock to be released before their memory operations can complete). To address the threat, bus lock VM exit is introduced to notify the VMM when a bus lock was acquired, allowing it to enforce throttling or other policy based mitigations. A VMM can enable VM exit due to bus locks by setting a new "Bus Lock Detection" VM-execution control(bit 30 of Secondary Processor-based VM execution controls). If delivery of this VM exit was preempted by a higher priority VM exit (e.g. EPT misconfiguration, EPT violation, APIC access VM exit, APIC write VM exit, exception bitmap exiting), bit 26 of exit reason in vmcs field is set to 1. In current implementation, the KVM exposes this capability through KVM_CAP_X86_BUS_LOCK_EXIT. The user can get the supported mode bitmap (i.e. off and exit) and enable it explicitly (disabled by default). If bus locks in guest are detected by KVM, exit to user space even when current exit reason is handled by KVM internally. Set a new field KVM_RUN_BUS_LOCK in vcpu->run->flags to inform the user space that there is a bus lock detected in guest. Document for Bus Lock VM exit is now available at the latest "Intel Architecture Instruction Set Extensions Programming Reference". Document Link: https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.htmlCo-developed-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20201106090315.18606-4-chenyi.qiang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NAichun Shi <aichun.shi@intel.com>
-
- 01 9月, 2022 1 次提交
-
-
由 Wei Huang 提交于
mainline inclusion from mainline-5.15 commit cb0f722a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5NGRU CVE: NA ------------------------------------------------- When the 5-level page table CPU flag is set in the host, but the guest has CR4.LA57=0 (including the case of a 32-bit guest), the top level of the shadow NPT page tables will be fixed, consisting of one pointer to a lower-level table and 511 non-present entries. Extend the existing code that creates the fixed PML4 or PDP table, to provide a fixed PML5 table if needed. This is not needed on EPT because the number of layers in the tables is specified in the EPTP instead of depending on the host CR4. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NWei Huang <wei.huang2@amd.com> Message-Id: <20210818165549.3771014-3-wei.huang2@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NXie Haocheng <haocheng.xie@amd.com>
-
- 31 8月, 2022 1 次提交
-
-
由 Sean Christopherson 提交于
mainline inclusion from mainline-v5.13 commit 03ca4589 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5NGRU CVE: NA ------------------------------------------------- Disallow loading KVM SVM if 5-level paging is supported. In theory, NPT for L1 should simply work, but there unknowns with respect to how the guest's MAXPHYADDR will be handled by hardware. Nested NPT is more problematic, as running an L1 VMM that is using 2-level page tables requires stacking single-entry PDP and PML4 tables in KVM's NPT for L2, as there are no equivalent entries in L1's NPT to shadow. Barring hardware magic, for 5-level paging, KVM would need stack another layer to handle PML5. Opportunistically rename the lm_root pointer, which is used for the aforementioned stacking when shadowing 2-level L1 NPT, to pml4_root to call out that it's specifically for PML4. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210505204221.1934471-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NXie Haocheng <haocheng.xie@amd.com>
-
- 29 8月, 2022 2 次提交
-
-
由 Chao Gao 提交于
mainline inclusion from mainline-v6.0-rc1 commit d588bb9b category: feature feature: IPI Virtualization bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5ODSC CVE: N/A Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d588bb9be1da6aa750aa64875fe57369db983d8b Intel-SIG: commit d588bb9b ("KVM: VMX: enable IPI virtualization") ------------------------------------- KVM: VMX: enable IPI virtualization With IPI virtualization enabled, the processor emulates writes to APIC registers that would send IPIs. The processor sets the bit corresponding to the vector in target vCPU's PIR and may send a notification (IPI) specified by NDST and NV fields in target vCPU's Posted-Interrupt Descriptor (PID). It is similar to what IOMMU engine does when dealing with posted interrupt from devices. A PID-pointer table is used by the processor to locate the PID of a vCPU with the vCPU's APIC ID. The table size depends on maximum APIC ID assigned for current VM session from userspace. Allocating memory for PID-pointer table is deferred to vCPU creation, because irqchip mode and VM-scope maximum APIC ID is settled at that point. KVM can skip PID-pointer table allocation if !irqchip_in_kernel(). Like VT-d PI, if a vCPU goes to blocked state, VMM needs to switch its notification vector to wakeup vector. This can ensure that when an IPI for blocked vCPUs arrives, VMM can get control and wake up blocked vCPUs. And if a VCPU is preempted, its posted interrupt notification is suppressed. Note that IPI virtualization can only virualize physical-addressing, flat mode, unicast IPIs. Sending other IPIs would still cause a trap-like APIC-write VM-exit and need to be handled by VMM. Signed-off-by: NChao Gao <chao.gao@intel.com> Signed-off-by: NZeng Guang <guang.zeng@intel.com> Message-Id: <20220419154510.11938-1-guang.zeng@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NJason Zeng <jason.zeng@intel.com>
-
由 Zeng Guang 提交于
mainline inclusion from mainline-v6.0-rc1 commit 35875316 category: feature feature: IPI Virtualization bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5ODSC CVE: N/A Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=35875316384b71d23dc2a45a969732fc8cab16af Intel-SIG: commit 35875316 ("KVM: x86: Allow userspace to set maximum VCPU id for VM") ------------------------------------- KVM: x86: Allow userspace to set maximum VCPU id for VM Introduce new max_vcpu_ids in KVM for x86 architecture. Userspace can assign maximum possible vcpu id for current VM session using KVM_CAP_MAX_VCPU_ID of KVM_ENABLE_CAP ioctl(). This is done for x86 only because the sole use case is to guide memory allocation for PID-pointer table, a structure needed to enable VMX IPI. By default, max_vcpu_ids set as KVM_MAX_VCPU_ID. Suggested-by: NSean Christopherson <seanjc@google.com> Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com> Signed-off-by: NZeng Guang <guang.zeng@intel.com> Message-Id: <20220419154444.11888-1-guang.zeng@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NJason Zeng <jason.zeng@intel.com> Signed-off-by: NJason Zeng <jason.zeng@intel.com>
-
- 04 8月, 2022 1 次提交
-
-
由 Thomas Gleixner 提交于
mainline inclusion from mainline-v5.16-rc1 commit d69c1382 category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC CVE: NA Intel-SIG: commit d69c1382 x86/kvm: Convert FPU handling to a single swap buffer. -------------------------------- For the upcoming AMX support it's necessary to do a proper integration with KVM. Currently KVM allocates two FPU structs which are used for saving the user state of the vCPU thread and restoring the guest state when entering vcpu_run() and doing the reverse operation before leaving vcpu_run(). With the new fpstate mechanism this can be reduced to one extra buffer by swapping the fpstate pointer in current::thread::fpu. This makes the upcoming support for AMX and XFD simpler because then fpstate information (features, sizes, xfd) are always consistent and it does not require any nasty workarounds. Convert the KVM FPU code over to this new scheme. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211022185313.019454292@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
-
- 19 7月, 2022 1 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.112 commit 342454231ee5f2c2782f5510cab2e7a968486fef category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5HL0X Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=342454231ee5f2c2782f5510cab2e7a968486fef -------------------------------- commit 1d0e8480 upstream. Resolve nx_huge_pages to true/false when kvm.ko is loaded, leaving it as -1 is technically undefined behavior when its value is read out by param_get_bool(), as boolean values are supposed to be '0' or '1'. Alternatively, KVM could define a custom getter for the param, but the auto value doesn't depend on the vendor module in any way, and printing "auto" would be unnecessarily unfriendly to the user. In addition to fixing the undefined behavior, resolving the auto value also fixes the scenario where the auto value resolves to N and no vendor module is loaded. Previously, -1 would result in Y being printed even though KVM would ultimately disable the mitigation. Rename the existing MMU module init/exit helpers to clarify that they're invoked with respect to the vendor module, and add comments to document why KVM has two separate "module init" flows. ========================================================================= UBSAN: invalid-load in kernel/params.c:320:33 load of value 255 is not a valid value for type '_Bool' CPU: 6 PID: 892 Comm: tail Not tainted 5.17.0-rc3+ #799 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 ubsan_epilogue+0x5/0x40 __ubsan_handle_load_invalid_value.cold+0x43/0x48 param_get_bool.cold+0xf/0x14 param_attr_show+0x55/0x80 module_attr_show+0x1c/0x30 sysfs_kf_seq_show+0x93/0xc0 seq_read_iter+0x11c/0x450 new_sync_read+0x11b/0x1a0 vfs_read+0xf0/0x190 ksys_read+0x5f/0xe0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> ========================================================================= Fixes: b8e8c830 ("kvm: mmu: ITLB_MULTIHIT mitigation") Cc: stable@vger.kernel.org Reported-by: NBruno Goncalves <bgoncalv@redhat.com> Reported-by: NJan Stancek <jstancek@redhat.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220331221359.3912754-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 08 7月, 2022 2 次提交
-
-
由 Sean Christopherson 提交于
mainline inclusion from mainline-5.13 commit 70210c04 category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5EZEK CVE: NA Intel-SIG: commit 70210c04 KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions. Backport for SGX virtualization support -------------------------------- Add an ECREATE handler that will be used to intercept ECREATE for the purpose of enforcing and enclave's MISCSELECT, ATTRIBUTES and XFRM, i.e. to allow userspace to restrict SGX features via CPUID. ECREATE will be intercepted when any of the aforementioned masks diverges from hardware in order to enforce the desired CPUID model, i.e. inject #GP if the guest attempts to set a bit that hasn't been enumerated as allowed-1 in CPUID. Note, access to the PROVISIONKEY is not yet supported. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: NKai Huang <kai.huang@intel.com> Signed-off-by: NKai Huang <kai.huang@intel.com> Message-Id: <c3a97684f1b71b4f4626a1fc3879472a95651725.1618196135.git.kai.huang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NFan Du <fan.du@intel.com> Signed-off-by: NZhiquan Li <zhiquan1.li@intel.com>
-
由 Sean Christopherson 提交于
mainline inclusion from mainline-5.13 commit 00e7646c category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5EZEK CVE: NA Intel-SIG: commit 00e7646c KVM: x86: Define new #PF SGX error code bit. Backport for SGX virtualization support -------------------------------- Page faults that are signaled by the SGX Enclave Page Cache Map (EPCM), as opposed to the traditional IA32/EPT page tables, set an SGX bit in the error code to indicate that the #PF was induced by SGX. KVM will need to emulate this behavior as part of its trap-and-execute scheme for virtualizing SGX Launch Control, e.g. to inject SGX-induced #PFs if EINIT faults in the host, and to support live migration. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NKai Huang <kai.huang@intel.com> Message-Id: <e170c5175cb9f35f53218a7512c9e3db972b97a2.1618196135.git.kai.huang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NFan Du <fan.du@intel.com> Signed-off-by: NZhiquan Li <zhiquan1.li@intel.com>
-
- 07 6月, 2022 1 次提交
-
-
由 Mingwei Zhang 提交于
mainline inclusion from mainline-v5.18-rc4 commit 683412cc category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I58DFI CVE: CVE-2022-0171 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=683412ccf61294d727ead4a73d97397396e69a6b -------------------------------- Flush the CPU caches when memory is reclaimed from an SEV guest (where reclaim also includes it being unmapped from KVM's memslots). Due to lack of coherency for SEV encrypted memory, failure to flush results in silent data corruption if userspace is malicious/broken and doesn't ensure SEV guest memory is properly pinned and unpinned. Cache coherency is not enforced across the VM boundary in SEV (AMD APM vol.2 Section 15.34.7). Confidential cachelines, generated by confidential VM guests have to be explicitly flushed on the host side. If a memory page containing dirty confidential cachelines was released by VM and reallocated to another user, the cachelines may corrupt the new user at a later time. KVM takes a shortcut by assuming all confidential memory remain pinned until the end of VM lifetime. Therefore, KVM does not flush cache at mmu_notifier invalidation events. Because of this incorrect assumption and the lack of cache flushing, malicous userspace can crash the host kernel: creating a malicious VM and continuously allocates/releases unpinned confidential memory pages when the VM is running. Add cache flush operations to mmu_notifier operations to ensure that any physical memory leaving the guest VM get flushed. In particular, hook mmu_notifier_invalidate_range_start and mmu_notifier_release events and flush cache accordingly. The hook after releasing the mmu lock to avoid contention with other vCPUs. Cc: stable@vger.kernel.org Suggested-by: NSean Christpherson <seanjc@google.com> Reported-by: NMingwei Zhang <mizhang@google.com> Signed-off-by: NMingwei Zhang <mizhang@google.com> Message-Id: <20220421031407.2516575-4-mizhang@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> conflicts: arch/x86/include/asm/kvm-x86-ops.h arch/x86/kvm/x86.c virt/kvm/kvm_main.c Signed-off-by: NChen Jiahao <chenjiahao16@huawei.com> Reviewed-by: NLiao Chang <liaochang1@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 10 5月, 2022 1 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.97 commit 080dbe7e9b86a0392d8dffc00d9971792afc121f bugzilla: https://gitee.com/openeuler/kernel/issues/I55O0O Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=080dbe7e9b86a0392d8dffc00d9971792afc121f -------------------------------- commit f7e57078 upstream. Forcibly leave nested virtualization operation if userspace toggles SMM state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS. If userspace forces the vCPU out of SMM while it's post-VMXON and then injects an SMI, vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both vmxon=false and smm.vmxon=false, but all other nVMX state allocated. Don't attempt to gracefully handle the transition as (a) most transitions are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't sufficient information to handle all transitions, e.g. SVM wants access to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede KVM_SET_NESTED_STATE during state restore as the latter disallows putting the vCPU into L2 if SMM is active, and disallows tagging the vCPU as being post-VMXON in SMM if SMM is not active. Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU in an architecturally impossible state. WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Modules linked in: CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00 Call Trace: <TASK> kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123 kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline] kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460 kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline] kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline] kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250 kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273 __fput+0x286/0x9f0 fs/file_table.c:311 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0xb29/0x2a30 kernel/exit.c:806 do_group_exit+0xd2/0x2f0 kernel/exit.c:935 get_signal+0x4b0/0x28c0 kernel/signal.c:2862 arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> Cc: stable@vger.kernel.org Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220125220358.2091737-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Backported-by: NTadeusz Struk <tadeusz.struk@linaro.org> Signed-off-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYu Liao <liaoyu15@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 19 4月, 2022 1 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.93 commit 413b427f5fff5d658c2605ca889d6b13b88efd0c bugzilla: 186204 https://gitee.com/openeuler/kernel/issues/I5311N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=413b427f5fff5d658c2605ca889d6b13b88efd0c -------------------------------- commit f4b027c5 upstream. Override the Processor Trace (PT) interrupt handler for guest mode if and only if PT is configured for host+guest mode, i.e. is being used independently by both host and guest. If PT is configured for system mode, the host fully controls PT and must handle all events. Fixes: 8479e04e ("KVM: x86: Inject PMI for KVM guest") Reported-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Reported-by: NArtem Kashkanov <artem.kashkanov@intel.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211111020738.2512932-4-seanjc@google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 14 1月, 2022 1 次提交
-
-
由 Vitaly Kuznetsov 提交于
stable inclusion from stable-v5.10.85 commit 06368922f38f5a7a41a097c048f714e265492193 bugzilla: 186032 https://gitee.com/openeuler/kernel/issues/I4QVI4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=06368922f38f5a7a41a097c048f714e265492193 -------------------------------- commit 1ebfaa11 upstream. Prior to commit 0baedd79 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()"), kvm_hv_flush_tlb() was using 'KVM_REQ_TLB_FLUSH | KVM_REQUEST_NO_WAKEUP' when making a request to flush TLBs on other vCPUs and KVM_REQ_TLB_FLUSH is/was defined as: (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) so KVM_REQUEST_WAIT was lost. Hyper-V TLFS, however, requires that "This call guarantees that by the time control returns back to the caller, the observable effects of all flushes on the specified virtual processors have occurred." and without KVM_REQUEST_WAIT there's a small chance that the vCPU making the TLB flush will resume running before all IPIs get delivered to other vCPUs and a stale mapping can get read there. Fix the issue by adding KVM_REQUEST_WAIT flag to KVM_REQ_TLB_FLUSH_GUEST: kvm_hv_flush_tlb() is the sole caller which uses it for kvm_make_all_cpus_request()/kvm_make_vcpus_request_mask() where KVM_REQUEST_WAIT makes a difference. Cc: stable@kernel.org Fixes: 0baedd79 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()") Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211209102937.584397-1-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 13 10月, 2021 1 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-5.10.50 commit b2c5af71ce4b6e21f6af9fa292e72fd4662a2490 bugzilla: 174522 https://gitee.com/openeuler/kernel/issues/I4DNFY Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b2c5af71ce4b6e21f6af9fa292e72fd4662a2490 -------------------------------- [ Upstream commit 07ffaf34 ] Trigger a full TLB flush on behalf of the guest on nested VM-Enter and VM-Exit when VPID is disabled for L2. kvm_mmu_new_pgd() syncs only the current PGD, which can theoretically leave stale, unsync'd entries in a previous guest PGD, which could be consumed if L2 is allowed to load CR3 with PCID_NOFLUSH=1. Rename KVM_REQ_HV_TLB_FLUSH to KVM_REQ_TLB_FLUSH_GUEST so that it can be utilized for its obvious purpose of emulating a guest TLB flush. Note, there is no change the actual TLB flush executed by KVM, even though the fast PGD switch uses KVM_REQ_TLB_FLUSH_CURRENT. When VPID is disabled for L2, vpid02 is guaranteed to be '0', and thus nested_get_vpid02() will return the VPID that is shared by L1 and L2. Generate the request outside of kvm_mmu_new_pgd(), as getting the common helper to correctly identify which requested is needed is quite painful. E.g. using KVM_REQ_TLB_FLUSH_GUEST when nested EPT is in play is wrong as a TLB flush from the L1 kernel's perspective does not invalidate EPT mappings. And, by using KVM_REQ_TLB_FLUSH_GUEST, nVMX can do future simplification by moving the logic into nested_vmx_transition_tlb_flush(). Fixes: 41fab65e ("KVM: nVMX: Skip MMU sync on nested VMX transition when possible") Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 12 10月, 2021 1 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-5.10.48 commit 4dc96804286498f74beabcfb7603bb76d9905ad9 bugzilla: 173268 https://gitee.com/openeuler/kernel/issues/I4DD1K Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4dc96804286498f74beabcfb7603bb76d9905ad9 -------------------------------- commit f71a53d1 upstream. Restore CR4.LA57 to the mmu_role to fix an amusing edge case with nested virtualization. When KVM (L0) is using TDP, CR4.LA57 is not reflected in mmu_role.base.level because that tracks the shadow root level, i.e. TDP level. Normally, this is not an issue because LA57 can't be toggled while long mode is active, i.e. the guest has to first disable paging, then toggle LA57, then re-enable paging, thus ensuring an MMU reinitialization. But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57 without having to bounce through an unpaged section. L1 can also load a new CR3 on exit, i.e. it doesn't even need to play crazy paging games, a single entry PML5 is sufficient. Such shenanigans are only problematic if L0 and L1 use TDP, otherwise L1 and L2 share an MMU that gets reinitialized on nested VM-Enter/VM-Exit due to mmu_role.base.guest_mode. Note, in the L2 case with nested TDP, even though L1 can switch between L2s with different LA57 settings, thus bypassing the paging requirement, in that case KVM's nested_mmu will track LA57 in base.level. This reverts commit 8053f924. Fixes: 8053f924 ("KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-6-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 03 6月, 2021 2 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-5.10.38 commit 31f29749ee970c251b3a7e5b914108425940d089 bugzilla: 51875 CVE: NA -------------------------------- commit 5104d7ff upstream. Disable preemption when probing a user return MSR via RDSMR/WRMSR. If the MSR holds a different value per logical CPU, the WRMSR could corrupt the host's value if KVM is preempted between the RDMSR and WRMSR, and then rescheduled on a different CPU. Opportunistically land the helper in common x86, SVM will use the helper in a future commit. Fixes: 4be53410 ("KVM: VMX: Initialize vmx->guest_msrs[] right after allocation") Cc: stable@vger.kernel.org Cc: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-6-seanjc@google.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Sean Christopherson 提交于
stable inclusion from stable-5.10.38 commit 21f317826e170c1cf03944d7ce7b9142c238fb71 bugzilla: 51875 CVE: NA -------------------------------- commit c5e2184d upstream. Remove the update_pte() shadow paging logic, which was obsoleted by commit 4731d4c7 ("KVM: MMU: out of sync shadow core"), but never removed. As pointed out by Yu, KVM never write protects leaf page tables for the purposes of shadow paging, and instead marks their associated shadow page as unsync so that the guest can write PTEs at will. The update_pte() path, which predates the unsync logic, optimizes COW scenarios by refreshing leaf SPTEs when they are written, as opposed to zapping the SPTE, restarting the guest, and installing the new SPTE on the subsequent fault. Since KVM no longer write-protects leaf page tables, update_pte() is unreachable and can be dropped. Reported-by: NYu Zhang <yu.c.zhang@intel.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210115004051.4099250-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 19 4月, 2021 2 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-5.10.27 commit 771dfb3c531d1ecce209c82161227d66b24d7784 bugzilla: 51493 -------------------------------- [ Upstream commit b318e8de ] Fix a plethora of issues with MSR filtering by installing the resulting filter as an atomic bundle instead of updating the live filter one range at a time. The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as the hardware MSR bitmaps won't be updated until the next VM-Enter, but the relevant software struct is atomically updated, which is what KVM really needs. Similar to the approach used for modifying memslots, make arch.msr_filter a SRCU-protected pointer, do all the work configuring the new filter outside of kvm->lock, and then acquire kvm->lock only when the new filter has been vetted and created. That way vCPU readers either see the old filter or the new filter in their entirety, not some half-baked state. Yuan Yao pointed out a use-after-free in ksm_msr_allowed() due to a TOCTOU bug, but that's just the tip of the iceberg... - Nothing is __rcu annotated, making it nigh impossible to audit the code for correctness. - kvm_add_msr_filter() has an unpaired smp_wmb(). Violation of kernel coding style aside, the lack of a smb_rmb() anywhere casts all code into doubt. - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs count before taking the lock. - kvm_clear_msr_filter() also has memory leak due to the same TOCTOU bug. The entire approach of updating the live filter is also flawed. While installing a new filter is inherently racy if vCPUs are running, fixing the above issues also makes it trivial to ensure certain behavior is deterministic, e.g. KVM can provide deterministic behavior for MSRs with identical settings in the old and new filters. An atomic update of the filter also prevents KVM from getting into a half-baked state, e.g. if installing a filter fails, the existing approach would leave the filter in a half-baked state, having already committed whatever bits of the filter were already processed. [*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering") Cc: stable@vger.kernel.org Cc: Alexander Graf <graf@amazon.com> Reported-by: NYuan Yao <yaoyuan0329os@gmail.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210316184436.2544875-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 chenjiajun 提交于
virt inclusion category: feature bugzilla: 46853 CVE: NA Export EXIT_REASON_PREEMPTION_TIMER kvm exits to vcpu_stat debugfs. Add a new column to vcpu_stat, and provide preemption_timer status to virtualization detection tools. Signed-off-by: Nchenjiajun <chenjiajun8@huawei.com> Reviewed-by: NXiangyou Xie <xiexiangyou@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 12 1月, 2021 2 次提交
-
-
由 chenjiajun 提交于
virt inclusion category: feature bugzilla: 46853 CVE: NA Export vcpu_stat via debugfs for x86, which contains x86 kvm exits items. The path of the vcpu_stat is /sys/kernel/debug/kvm/vcpu_stat, and each line of vcpu_stat is a collection of various kvm exits for a vcpu. And through vcpu_stat, we only need to open one file to tail performance of virtual machine, which is more convenient. Signed-off-by: NFeng Lin <linfeng23@huawei.com> Signed-off-by: Nchenjiajun <chenjiajun8@huawei.com> Reviewed-by: NXiangyou Xie <xiexiangyou@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com>
-
由 chenjiajun 提交于
virt inclusion category: feature bugzilla: 46853 CVE: NA This patch create debugfs entry for vcpu stat. The entry path is /sys/kernel/debug/kvm/vcpu_stat. And vcpu_stat contains partial kvm exits items of vcpu, include: pid, hvc_exit_stat, wfe_exit_stat, wfi_exit_stat, mmio_exit_user, mmio_exit_kernel, exits Currently, The maximum vcpu limit is 1024. From this vcpu_stat, user can get the number of these kvm exits items over a period of time, which is helpful to monitor the virtual machine. Signed-off-by: NZenghui Yu <yuzenghui@huawei.com> Signed-off-by: Nchenjiajun <chenjiajun8@huawei.com> Reviewed-by: NXiangyou Xie <xiexiangyou@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com>
-
- 27 11月, 2020 1 次提交
-
-
由 Paolo Bonzini 提交于
kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are a hodge-podge of conditions, hacked together to get something that more or less works. But what is actually needed is much simpler; in both cases the fundamental question is, do we have a place to stash an interrupt if userspace does KVM_INTERRUPT? In userspace irqchip mode, that is !vcpu->arch.interrupt.injected. Currently kvm_event_needs_reinjection(vcpu) covers it, but it is unnecessarily restrictive. In split irqchip mode it's a bit more complicated, we need to check kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK cycle and thus requires ExtINTs not to be masked) as well as !pending_userspace_extint(vcpu). However, there is no need to check kvm_event_needs_reinjection(vcpu), since split irqchip keeps pending ExtINT state separate from event injection state, and checking kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher priority than APIC interrupts. In fact the latter fixes a bug: when userspace requests an IRQ window vmexit, an interrupt in the local APIC can cause kvm_cpu_has_interrupt() to be true and thus kvm_vcpu_ready_for_interrupt_injection() to return false. When this happens, vcpu_run does not exit to userspace but the interrupt window vmexits keep occurring. The VM loops without any hope of making progress. Once we try to fix these with something like return kvm_arch_interrupt_allowed(vcpu) && - !kvm_cpu_has_interrupt(vcpu) && - !kvm_event_needs_reinjection(vcpu) && - kvm_cpu_accept_dm_intr(vcpu); + (!lapic_in_kernel(vcpu) + ? !vcpu->arch.interrupt.injected + : (kvm_apic_accept_pic_intr(vcpu) + && !pending_userspace_extint(v))); we realize two things. First, thanks to the previous patch the complex conditional can reuse !kvm_cpu_has_extint(vcpu). Second, the interrupt window request in vcpu_enter_guest() bool req_int_win = dm_request_for_irq_injection(vcpu) && kvm_cpu_accept_dm_intr(vcpu); should be kept in sync with kvm_vcpu_ready_for_interrupt_injection(): it is unnecessary to ask the processor for an interrupt window if we would not be able to return to userspace. Therefore, kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu) ANDed with the existing check for masked ExtINT. It all makes sense: - we can accept an interrupt from userspace if there is a place to stash it (and, for irqchip split, ExtINTs are not masked). Interrupts from userspace _can_ be accepted even if right now EFLAGS.IF=0. - in order to tell userspace we will inject its interrupt ("IRQ window open" i.e. kvm_vcpu_ready_for_interrupt_injection), both KVM and the vCPU need to be ready to accept the interrupt. ... and this is what the patch implements. Reported-by: NDavid Woodhouse <dwmw@amazon.co.uk> Analyzed-by: NDavid Woodhouse <dwmw@amazon.co.uk> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NNikos Tsironis <ntsironis@arrikto.com> Reviewed-by: NDavid Woodhouse <dwmw@amazon.co.uk> Tested-by: NDavid Woodhouse <dwmw@amazon.co.uk>
-
- 13 11月, 2020 1 次提交
-
-
由 Babu Moger 提交于
SEV guests fail to boot on a system that supports the PCID feature. While emulating the RSM instruction, KVM reads the guest CR3 and calls kvm_set_cr3(). If the vCPU is in the long mode, kvm_set_cr3() does a sanity check for the CR3 value. In this case, it validates whether the value has any reserved bits set. The reserved bit range is 63:cpuid_maxphysaddr(). When AMD memory encryption is enabled, the memory encryption bit is set in the CR3 value. The memory encryption bit may fall within the KVM reserved bit range, causing the KVM emulation failure. Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will cache the reserved bits in the CR3 value. This will be initialized to rsvd_bits(cpuid_maxphyaddr(vcpu), 63). If the architecture has any special bits(like AMD SEV encryption bit) that needs to be masked from the reserved bits, should be cleared in vendor specific kvm_x86_ops.vcpu_after_set_cpuid handler. Fixes: a780a3ea ("KVM: X86: Fix reserved bits check for MOV to CR3") Signed-off-by: NBabu Moger <babu.moger@amd.com> Message-Id: <160521947657.32054.3264016688005356563.stgit@bmoger-ubuntu> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 23 10月, 2020 1 次提交
-
-
由 Ben Gardon 提交于
Attach struct kvm_mmu_pages to every page in the TDP MMU to track metadata, facilitate NX reclaim, and enable inproved parallelism of MMU operations in future patches. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20201014182700.2888246-12-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 10月, 2020 6 次提交
-
-
由 Ben Gardon 提交于
The TDP MMU must be able to allocate paging structure root pages and track the usage of those pages. Implement a similar, but separate system for root page allocation to that of the x86 shadow paging implementation. When future patches add synchronization model changes to allow for parallel page faults, these pages will need to be handled differently from the x86 shadow paging based MMU's root pages. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Ben Gardon 提交于
The TDP MMU offers an alternative mode of operation to the x86 shadow paging based MMU, optimized for running an L1 guest with TDP. The TDP MMU will require new fields that need to be initialized and torn down. Add hooks into the existing KVM MMU initialization process to do that initialization / cleanup. Currently the initialization and cleanup fucntions do not do very much, however more operations will be added in future patches. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20201014182700.2888246-4-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
This will be used to signal an error to the userspace, in case the vendor code failed during handling of this msr. (e.g -ENOMEM) Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201001112954.6258-4-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
As vcpu->arch.cpuid_entries is now allocated dynamically, the only remaining use for KVM_MAX_CPUID_ENTRIES is to check KVM_SET_CPUID/ KVM_SET_CPUID2 input for sanity. Since it was reported that the current limit (80) is insufficient for some CPUs, bump KVM_MAX_CPUID_ENTRIES and use an arbitrary value '256' as the new limit. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20201001130541.1398392-4-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
The current limit for guest CPUID leaves (KVM_MAX_CPUID_ENTRIES, 80) is reported to be insufficient but before we bump it let's switch to allocating vcpu->arch.cpuid_entries[] array dynamically. Currently, 'struct kvm_cpuid_entry2' is 40 bytes so vcpu->arch.cpuid_entries is 3200 bytes which accounts for 1/4 of the whole 'struct kvm_vcpu_arch' but having it pre-allocated (for all vCPUs which we also pre-allocate) gives us no real benefits. Another plus of the dynamic allocation is that we now do kvm_check_cpuid() check before we assign anything to vcpu->arch.cpuid_nent/cpuid_entries so no changes are made in case the check fails. Opportunistically remove unneeded 'out' labels from kvm_vcpu_ioctl_set_cpuid()/kvm_vcpu_ioctl_set_cpuid2() and return directly whenever possible. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20201001130541.1398392-3-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
-
由 Oliver Upton 提交于
KVM unconditionally provides PV features to the guest, regardless of the configured CPUID. An unwitting guest that doesn't check KVM_CPUID_FEATURES before use could access paravirt features that userspace did not intend to provide. Fix this by checking the guest's CPUID before performing any paravirtual operations. Introduce a capability, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, to gate the aforementioned enforcement. Migrating a VM from a host w/o this patch to a host with this patch could silently change the ABI exposed to the guest, warranting that we default to the old behavior and opt-in for the new one. Reviewed-by: NJim Mattson <jmattson@google.com> Reviewed-by: NPeter Shier <pshier@google.com> Signed-off-by: NOliver Upton <oupton@google.com> Change-Id: I202a0926f65035b872bfe8ad15307c026de59a98 Message-Id: <20200818152429.1923996-4-oupton@google.com> Reviewed-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 28 9月, 2020 7 次提交
-
-
由 Paolo Bonzini 提交于
We are going to use it for SVM too, so use a more generic name. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> -
由 Alexander Graf 提交于
It's not desireable to have all MSRs always handled by KVM kernel space. Some MSRs would be useful to handle in user space to either emulate behavior (like uCode updates) or differentiate whether they are valid based on the CPU model. To allow user space to specify which MSRs it wants to see handled by KVM, this patch introduces a new ioctl to push filter rules with bitmaps into KVM. Based on these bitmaps, KVM can then decide whether to reject MSR access. With the addition of KVM_CAP_X86_USER_SPACE_MSR it can also deflect the denied MSR events to user space to operate on. If no filter is populated, MSR handling stays identical to before. Signed-off-by: NAlexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-8-graf@amazon.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Alexander Graf 提交于
In the following commits we will add pieces of MSR filtering. To ensure that code compiles even with the feature half-merged, let's add a few stubs and struct definitions before the real patches start. Signed-off-by: NAlexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-4-graf@amazon.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Alexander Graf 提交于
MSRs are weird. Some of them are normal control registers, such as EFER. Some however are registers that really are model specific, not very interesting to virtualization workloads, and not performance critical. Others again are really just windows into package configuration. Out of these MSRs, only the first category is necessary to implement in kernel space. Rarely accessed MSRs, MSRs that should be fine tunes against certain CPU models and MSRs that contain information on the package level are much better suited for user space to process. However, over time we have accumulated a lot of MSRs that are not the first category, but still handled by in-kernel KVM code. This patch adds a generic interface to handle WRMSR and RDMSR from user space. With this, any future MSR that is part of the latter categories can be handled in user space. Furthermore, it allows us to replace the existing "ignore_msrs" logic with something that applies per-VM rather than on the full system. That way you can run productive VMs in parallel to experimental ones where you don't care about proper MSR handling. Signed-off-by: NAlexander Graf <graf@amazon.com> Reviewed-by: NJim Mattson <jmattson@google.com> Message-Id: <20200925143422.21718-3-graf@amazon.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Rename the "shared_msrs" mechanism, which is used to defer restoring MSRs that are only consumed when running in userspace, to a more banal but less likely to be confusing "user_return_msrs". The "shared" nomenclature is confusing as it's not obvious who is sharing what, e.g. reasonable interpretations are that the guest value is shared by vCPUs in a VM, or that the MSR value is shared/common to guest and host, both of which are wrong. "shared" is also misleading as the MSR value (in hardware) is not guaranteed to be shared/reused between VMs (if that's indeed the correct interpretation of the name), as the ability to share values between VMs is simply a side effect (albiet a very nice side effect) of deferring restoration of the host value until returning from userspace. "user_return" avoids the above confusion by describing the mechanism itself instead of its effects. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-2-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Extend the kvm_exit tracepoint to align it with kvm_nested_vmexit in terms of what information is captured. On SVM, add interrupt info and error code, while on VMX it add IDT vectoring and error code. This sets the stage for macrofying the kvm_exit tracepoint definition so that it can be reused for kvm_nested_vmexit without loss of information. Opportunistically stuff a zero for VM_EXIT_INTR_INFO if the VM-Enter failed, as the field is guaranteed to be invalid. Note, it'd be possible to further filter the interrupt/exception fields based on the VM-Exit reason, but the helper is intended only for tracepoints, i.e. an extra VMREAD or two is a non-issue, the failed VM-Enter case is just low hanging fruit. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923201349.16097-5-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Replace the existing kvm_x86_ops.need_emulation_on_page_fault() with a more generic is_emulatable(), and unconditionally call the new function in x86_emulate_instruction(). KVM will use the generic hook to support multiple security related technologies that prevent emulation in one way or another. Similar to the existing AMD #NPF case where emulation of the current instruction is not possible due to lack of information, AMD's SEV-ES and Intel's SGX and TDX will introduce scenarios where emulation is impossible due to the guest's register state being inaccessible. And again similar to the existing #NPF case, emulation can be initiated by kvm_mmu_page_fault(), i.e. outside of the control of vendor-specific code. While the cause and architecturally visible behavior of the various cases are different, e.g. SGX will inject a #UD, AMD #NPF is a clean resume or complete shutdown, and SEV-ES and TDX "return" an error, the impact on the common emulation code is identical: KVM must stop emulation immediately and resume the guest. Query is_emulatable() in handle_ud() as well so that the force_emulation_prefix code doesn't incorrectly modify RIP before calling emulate_instruction() in the absurdly unlikely scenario that KVM encounters forced emulation in conjunction with "do not emulate". Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200915232702.15945-1-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 8月, 2020 1 次提交
-
-
由 Will Deacon 提交于
The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block. Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: NWill Deacon <will@kernel.org> Message-Id: <20200811102725.7121-2-will@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-