- 05 12月, 2021 1 次提交
-
-
由 Tom Lendacky 提交于
Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT exit code or exit parameters fails. The VMGEXIT instruction can be issued from userspace, even though userspace (likely) can't update the GHCB. To prevent userspace from being able to kill the guest, return an error through the GHCB when validation fails rather than terminating the guest. For cases where the GHCB can't be updated (e.g. the GHCB can't be mapped, etc.), just return back to the guest. The new error codes are documented in the lasest update to the GHCB specification. Fixes: 291bd20d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Message-Id: <b57280b5562893e2616257ac9c2d4525a9aeeb42.1638471124.git.thomas.lendacky@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 02 12月, 2021 1 次提交
-
-
由 Paolo Bonzini 提交于
kvm_vcpu_apicv_active() returns false if a virtual machine has no in-kernel local APIC, however kvm_apicv_activated might still be true if there are no reasons to disable APICv; in fact it is quite likely that there is none because APICv is inhibited by specific configurations of the local APIC and those configurations cannot be programmed. This triggers a WARN: WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu)); To avoid this, introduce another cause for APICv inhibition, namely the absence of an in-kernel local APIC. This cause is enabled by default, and is dropped by either KVM_CREATE_IRQCHIP or the enabling of KVM_CAP_IRQCHIP_SPLIT. Reported-by: NIgnat Korchagin <ignat@cloudflare.com> Fixes: ee49a893 ("KVM: x86: Move SVM's APICv sanity check to common x86", 2021-10-22) Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Tested-by: NIgnat Korchagin <ignat@cloudflare.com> Message-Id: <20211130123746.293379-1-pbonzini@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 12月, 2021 1 次提交
-
-
由 Tony Luck 提交于
Convention for all the other "lake" CPUs is all one word. So s/RAPTOR_LAKE/RAPTORLAKE/ Fixes: fbdb5e8f ("x86/cpu: Add Raptor Lake to Intel family") Reported-by: NRui Zhang <rui.zhang@intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20211119170832.1034220-1-tony.luck@intel.com
-
- 27 11月, 2021 1 次提交
-
-
由 Joerg Roedel 提交于
The macro is unused after commit 00ecd540 so it can be removed. Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Fixes: 00ecd540 ("iommu/vt-d: Clean up unused PASID updating functions") Signed-off-by: NJoerg Roedel <jroedel@suse.de> Reviewed-by: NLu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20211123105507.7654-2-joro@8bytes.org
-
- 25 11月, 2021 2 次提交
-
-
由 Juergen Gross 提交于
HYPERVISOR_set_debugreg() is being called from noinstr code, so it should be attributed "always_inline". Fixes: 7361fac0 ("x86/xen: Make set_debugreg() noinstr") Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20211125092056.24758-3-jgross@suse.comSigned-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
由 Juergen Gross 提交于
HYPERVISOR_get_debugreg() is being called from noinstr code, so it should be attributed "always_inline". Fixes: f4afb713 ("x86/xen: Make get_debugreg() noinstr") Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20211125092056.24758-2-jgross@suse.comSigned-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
- 20 11月, 2021 1 次提交
-
-
由 Juergen Gross 提交于
The prototype of mem_map_via_hcall() is missing in its header, so add it. Reported-by: Nkernel test robot <lkp@intel.com> Fixes: a43fb7da ("xen/pvh: Move Xen code for getting mem map via hcall out of common file") Signed-off-by: NJuergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20211119153913.21678-1-jgross@suse.comReviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
- 18 11月, 2021 1 次提交
-
-
由 Maxim Levitsky 提交于
Incorporate EFER.LMA into kvm_mmu_extended_role, as it used to compute the guest root level and is not reflected in kvm_mmu_page_role.level when TDP is in use. When simply running the guest, it is impossible for EFER.LMA and kvm_mmu.root_level to get out of sync, as the guest cannot transition from PAE paging to 64-bit paging without toggling CR0.PG, i.e. without first bouncing through a different MMU context. And stuffing guest state via KVM_SET_SREGS{,2} also ensures a full MMU context reset. However, if KVM_SET_SREGS{,2} is followed by KVM_SET_NESTED_STATE, e.g. to set guest state when migrating the VM while L2 is active, the vCPU state will reflect L2, not L1. If L1 is using TDP for L2, then root_mmu will have been configured using L2's state, despite not being used for L2. If L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu will be configured for guest PAE paging, but will match the mmu_role for 64-bit paging and cause KVM to not reconfigure root_mmu on the next nested VM-Exit. Alternatively, the root_mmu's role could be invalidated after a successful KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu, i.e. that switches the active mmu to guest_mmu, but doing so is unnecessarily tricky, and not even needed if L1 and L2 do have the same role (e.g., they are both 64-bit guests and run with the same CR4). Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211115131837.195527-3-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 11月, 2021 1 次提交
-
-
由 Tony Luck 提交于
Add model ID for Raptor Lake. [ dhansen: These get added as soon as possible so that folks doing development can leverage them. ] Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20211112182835.924977-1-tony.luck@intel.com
-
- 11 11月, 2021 10 次提交
-
-
由 Vitaly Kuznetsov 提交于
KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports either num_online_cpus() or num_present_cpus(), s390 has multiple constants depending on hardware features. On x86, KVM reports an arbitrary value of '710' which is supposed to be the maximum tested value but it's possible to test all KVM_MAX_VCPUS even when there are less physical CPUs available. Drop the arbitrary '710' value and return num_online_cpus() on x86 as well. The recommendation will match other architectures and will mean 'no CPU overcommit'. For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning when the requested vCPU number exceeds it. The static limit of '710' is quite weird as smaller systems with just a few physical CPUs should certainly "recommend" less. Suggested-by: NEduardo Habkost <ehabkost@redhat.com> Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211111134733.86601-1-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paul Durrant 提交于
Currently when kvm_update_cpuid_runtime() runs, it assumes that the KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true, however, if Hyper-V support is enabled. In this case the KVM leaves will be offset. This patch introdues as new 'kvm_cpuid_base' field into struct kvm_vcpu_arch to track the location of the KVM leaves and function kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence now target the correct leaf. NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is intoduced into processor.h to avoid having duplicate code for the iteration over possible hypervisor base leaves. Signed-off-by: NPaul Durrant <pdurrant@amazon.com> Message-Id: <20211105095101.5384-3-pdurrant@amazon.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using standard kvm's inject_pending_event, and not via APICv/AVIC. Since this is a debug feature, just inhibit APICv/AVIC while KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU. Fixes: 61e5f69e ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ") Reported-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Tested-by: NSean Christopherson <seanjc@google.com> Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Woodhouse 提交于
In commit b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed") we switched to using a gfn_to_pfn_cache for accessing the guest steal time structure in order to allow for an atomic xchg of the preempted field. This has a couple of problems. Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a guest vCPU using an IOMEM page for its steal time would never have its preempted field set. Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it should have been. There are two stages to the GFN->PFN conversion; first the GFN is converted to a userspace HVA, and then that HVA is looked up in the process page tables to find the underlying host PFN. Correct invalidation of the latter would require being hooked up to the MMU notifiers, but that doesn't happen---so it just keeps mapping and unmapping the *wrong* PFN after the userspace page tables change. In the !IOMEM case at least the stale page *is* pinned all the time it's cached, so it won't be freed and reused by anyone else while still receiving the steal time updates. The map/unmap dance only takes care of the KVM administrivia such as marking the page dirty. Until the gfn_to_pfn cache handles the remapping automatically by integrating with the MMU notifiers, we might as well not get a kernel mapping of it, and use the perfectly serviceable userspace HVA that we already have. We just need to implement the atomic xchg on the userspace address with appropriate exception handling, which is fairly trivial. Cc: stable@vger.kernel.org Fixes: b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed") Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org> [I didn't entirely agree with David's assessment of the usefulness of the gfn_to_pfn cache, and integrated the outcome of the discussion in the above commit message. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Gonda 提交于
For SEV to work with intra host migration, contents of the SEV info struct such as the ASID (used to index the encryption key in the AMD SP) and the list of memory regions need to be transferred to the target VM. This change adds a commands for a target VMM to get a source SEV VM's sev info. Signed-off-by: NPeter Gonda <pgonda@google.com> Suggested-by: NSean Christopherson <seanjc@google.com> Reviewed-by: NMarc Orr <marcorr@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <20211021174303.385706-3-pgonda@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Ashish Kalra 提交于
The guest support for detecting and enabling SEV Live migration feature uses the following logic : - kvm_init_plaform() checks if its booted under the EFI - If not EFI, i) if kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL), issue a wrmsrl() to enable the SEV live migration support - If EFI, i) If kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL), read the UEFI variable which indicates OVMF support for live migration ii) the variable indicates live migration is supported, issue a wrmsrl() to enable the SEV live migration support The EFI live migration check is done using a late_initcall() callback. Also, ensure that _bss_decrypted section is marked as decrypted in the hypervisor's guest page encryption status tracking. Signed-off-by: NAshish Kalra <ashish.kalra@amd.com> Reviewed-by: NSteve Rutherford <srutherford@google.com> Message-Id: <b4453e4c87103ebef12217d2505ea99a1c3e0f0f.1629726117.git.ashish.kalra@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Brijesh Singh 提交于
Invoke a hypercall when a memory region is changed from encrypted -> decrypted and vice versa. Hypervisor needs to know the page encryption status during the guest migration. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: NSteve Rutherford <srutherford@google.com> Reviewed-by: NVenu Busireddy <venu.busireddy@oracle.com> Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NAshish Kalra <ashish.kalra@amd.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Message-Id: <0a237d5bb08793916c7790a3e653a2cbe7485761.1629726117.git.ashish.kalra@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Brijesh Singh 提交于
KVM hypercall framework relies on alternative framework to patch the VMCALL -> VMMCALL on AMD platform. If a hypercall is made before apply_alternative() is called then it defaults to VMCALL. The approach works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor will be able to decode the instruction and do the right things. But when SEV is active, guest memory is encrypted with guest key and hypervisor will not be able to decode the instruction bytes. To highlight the need to provide this interface, capturing the flow of apply_alternatives() : setup_arch() call init_hypervisor_platform() which detects the hypervisor platform the kernel is running under and then the hypervisor specific initialization code can make early hypercalls. For example, KVM specific initialization in case of SEV will try to mark the "__bss_decrypted" section's encryption state via early page encryption status hypercalls. Now, apply_alternatives() is called much later when setup_arch() calls check_bugs(), so we do need some kind of an early, pre-alternatives hypercall interface. Other cases of pre-alternatives hypercalls include marking per-cpu GHCB pages as decrypted on SEV-ES and per-cpu apf_reason, steal_time and kvm_apic_eoi as decrypted for SEV generally. Add SEV specific hypercall3, it unconditionally uses VMMCALL. The hypercall will be used by the SEV guest to notify encrypted pages to the hypervisor. This kvm_sev_hypercall3() function is abstracted and used as follows : All these early hypercalls are made through early_set_memory_XX() interfaces, which in turn invoke pv_ops (paravirt_ops). This early_set_memory_XX() -> pv_ops.mmu.notify_page_enc_status_changed() is a generic interface and can easily have SEV, TDX and any other future platform specific abstractions added to it. Currently, pv_ops.mmu.notify_page_enc_status_changed() callback is setup to invoke kvm_sev_hypercall3() in case of SEV. Similarly, in case of TDX, pv_ops.mmu.notify_page_enc_status_changed() can be setup to a TDX specific callback. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: NSteve Rutherford <srutherford@google.com> Reviewed-by: NVenu Busireddy <venu.busireddy@oracle.com> Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NAshish Kalra <ashish.kalra@amd.com> Message-Id: <6fd25c749205dd0b1eb492c60d41b124760cc6ae.1629726117.git.ashish.kalra@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Boris Ostrovsky 提交于
Commit 66558b73 ("sched: Add cluster scheduler level for x86") introduced cpu_l2c_shared_map mask which is expected to be initialized by smp_op.smp_prepare_cpus(). That commit only updated native_smp_prepare_cpus() version but not xen_pv_smp_prepare_cpus(). As result Xen PV guests crash in set_cpu_sibling_map(). While the new mask can be allocated in xen_pv_smp_prepare_cpus() one can see that both versions of smp_prepare_cpus ops share a number of common operations that can be factored out. So do that instead. Fixes: 66558b73 ("sched: Add cluster scheduler level for x86") Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NJuergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/1635896196-18961-1-git-send-email-boris.ostrovsky@oracle.com
-
由 Peter Zijlstra 提交于
Add a few signature bytes after the static call trampoline and verify those bytes match before patching the trampoline. This avoids patching random other JMPs (such as CFI jump-table entries) instead. These bytes decode as: d: 53 push %rbx e: 43 54 rex.XB push %r12 And happen to spell "SCT". Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20211030074758.GT174703@worktop.programming.kicks-ass.net
-
- 04 11月, 2021 1 次提交
-
-
由 Dave Hansen 提交于
tl;dr: AMX state is ~8k. Signal frames can have space for this ~8k and each signal entry writes out all 8k even if it is zeros. Skip writing zeros for AMX to speed up signal delivery by about 4% overall when AMX is in its init state. This is a user-visible change to the sigframe ABI. == Hardware XSAVE Background == XSAVE state components may be tracked by the processor as being in their initial configuration. Software can detect which features are in this configuration by looking at the XSTATE_BV field in an XSAVE buffer or with the XGETBV(1) instruction. Both the XSAVE and XSAVEOPT instructions enumerate features s being in the initial configuration via the XSTATE_BV field in the XSAVE header, However, XSAVEOPT declines to actually write features in their initial configuration to the buffer. XSAVE writes the feature unconditionally, regardless of whether it is in the initial configuration or not. Basically, XSAVE users never need to inspect XSTATE_BV to determine if the feature has been written to the buffer. XSAVEOPT users *do* need to inspect XSTATE_BV. They might also need to clear out the buffer if they want to make an isolated change to the state, like modifying one register. == Software Signal / XSAVE Background == Signal frames have historically been written with XSAVE itself. Each state is written in its entirety, regardless of being in its initial configuration. In other words, the signal frame ABI uses the XSAVE behavior, not the XSAVEOPT behavior. == Problem == This means that any application which has acquired permission to use AMX via ARCH_REQ_XCOMP_PERM will write 8k of state to the signal frame. This 8k write will occur even when AMX was in its initial configuration and software *knows* this because of XSTATE_BV. This problem also exists to a lesser degree with AVX-512 and its 2k of state. However, AVX-512 use does not require ARCH_REQ_XCOMP_PERM and is more likely to have existing users which would be impacted by any change in behavior. == Solution == Stop writing out AMX xfeatures which are in their initial state to the signal frame. This effectively makes the signal frame XSAVE buffer look as if it were written with a combination of XSAVEOPT and XSAVE behavior. Userspace which handles XSAVEOPT- style buffers should be able to handle this naturally. For now, include only the AMX xfeatures: XTILE and XTILEDATA in this new behavior. These require new ABI to use anyway, which makes their users very unlikely to be broken. This XSAVEOPT-like behavior should be expected for all future dynamic xfeatures. It may also be extended to legacy features like AVX-512 in the future. Only attempt this optimization on systems with dynamic features. Disable dynamic feature support (XFD) if XGETBV1 is unavailable by adding a CPUID dependency. This has been measured to reduce the *overall* cycle cost of signal delivery by about 4%. Fixes: 2308ee57 ("x86/fpu/amx: Enable the AMX feature in 64-bit mode") Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: N"Chang S. Bae" <chang.seok.bae@intel.com> Link: https://lore.kernel.org/r/20211102224750.FA412E26@davehans-spike.ostc.intel.com
-
- 02 11月, 2021 5 次提交
-
-
由 Juergen Gross 提交于
Put the definitions of the hypercalls usable only by pv guests inside CONFIG_XEN_PV sections. On Arm two dummy functions related to pv hypercalls can be removed. While at it remove the no longer supported tmem hypercall definition. Signed-off-by: NJuergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20211028081221.2475-3-jgross@suse.comReviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
由 Juergen Gross 提交于
There are some remaining 32-bit pv-guest support leftovers in the Xen hypercall interface. Remove them. Signed-off-by: NJuergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20211028081221.2475-2-jgross@suse.comReviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
由 Oleksandr Andrushchenko 提交于
Xen-pciback driver was designed to be built for x86 only. But it can also be used by other architectures, e.g. Arm. Currently PCI backend implements multiple functionalities at a time, such as: 1. It is used as a database for assignable PCI devices, e.g. xl pci-assignable-{add|remove|list} manipulates that list. So, whenever the toolstack needs to know which PCI devices can be passed through it reads that from the relevant sysfs entries of the pciback. 2. It is used to hold the unbound PCI devices list, e.g. when passing through a PCI device it needs to be unbound from the relevant device driver and bound to pciback (strictly speaking it is not required that the device is bound to pciback, but pciback is again used as a database of the passed through PCI devices, so we can re-bind the devices back to their original drivers when guest domain shuts down) 3. Device reset for the devices being passed through 4. Para-virtualised use-cases support The para-virtualised part of the driver is not always needed as some architectures, e.g. Arm or x86 PVH Dom0, are not using backend-frontend model for PCI device passthrough. For such use-cases make the very first step in splitting the xen-pciback driver into two parts: Xen PCI stub and PCI PV backend drivers. For that add new configuration options CONFIG_XEN_PCI_STUB and CONFIG_XEN_PCIDEV_STUB, so the driver can be limited in its functionality, e.g. no support for para-virtualised scenario. x86 platform will continue using CONFIG_XEN_PCIDEV_BACKEND for the fully featured backend driver. Signed-off-by: NOleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Signed-off-by: NAnastasiia Lukianenko <anastasiia_lukianenko@epam.com> Reviewed-by: NStefano Stabellini <sstabellini@kernel.org> Reviewed-by: NJuergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20211028143620.144936-1-andr2000@gmail.comSigned-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
由 Juergen Gross 提交于
The initial pvops functions handling irq flags will only ever be called before interrupts are being enabled. So switch them to be dummy functions: - xen_save_fl() can always return 0 - xen_irq_disable() is a nop - xen_irq_enable() can BUG() Add some generic paravirt functions for that purpose. Signed-off-by: NJuergen Gross <jgross@suse.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20211028072748.29862-3-jgross@suse.comSigned-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
由 Juergen Gross 提交于
xen_pvh_init() is lacking a prototype in a header, add it. Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NJuergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20211006061950.9227-1-jgross@suse.comReviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
- 29 10月, 2021 7 次提交
-
-
由 Peter Zijlstra 提交于
Current BPF codegen doesn't respect X86_FEATURE_RETPOLINE* flags and unconditionally emits a thunk call, this is sub-optimal and doesn't match the regular, compiler generated, code. Update the i386 JIT to emit code equal to what the compiler emits for the regular kernel text (IOW. a plain THUNK call). Update the x86_64 JIT to emit code similar to the result of compiler and kernel rewrites as according to X86_FEATURE_RETPOLINE* flags. Inlining RETPOLINE_AMD (lfence; jmp *%reg) and !RETPOLINE (jmp *%reg), while doing a THUNK call for RETPOLINE. This removes the hard-coded retpoline thunks and shrinks the generated code. Leaving a single retpoline thunk definition in the kernel. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NBorislav Petkov <bp@suse.de> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com> Tested-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120310.614772675@infradead.org
-
由 Peter Zijlstra 提交于
Rewrite retpoline thunk call sites to be indirect calls for spectre_v2=off. This ensures spectre_v2=off is as near to a RETPOLINE=n build as possible. This is the replacement for objtool writing alternative entries to ensure the same and achieves feature-parity with the previous approach. One noteworthy feature is that it relies on the thunks to be in machine order to compute the register index. Specifically, this does not yet address the Jcc __x86_indirect_thunk_* calls generated by clang, a future patch will add this. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NBorislav Petkov <bp@suse.de> Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com> Tested-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120310.232495794@infradead.org
-
由 Peter Zijlstra 提交于
Stick all the retpolines in a single symbol and have the individual thunks as inner labels, this should guarantee thunk order and layout. Previously there were 16 (or rather 15 without rsp) separate symbols and a toolchain might reasonably expect it could displace them however it liked, with disregard for their relative position. However, now they're part of a larger symbol. Any change to their relative position would disrupt this larger _array symbol and thus not be sound. This is the same reasoning used for data symbols. On their own there is no guarantee about their relative position wrt to one aonther, but we're still able to do arrays because an array as a whole is a single larger symbol. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NBorislav Petkov <bp@suse.de> Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com> Tested-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120310.169659320@infradead.org
-
由 Peter Zijlstra 提交于
Because it makes no sense to split the retpoline gunk over multiple headers. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NBorislav Petkov <bp@suse.de> Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com> Tested-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120310.106290934@infradead.org
-
由 Peter Zijlstra 提交于
Currently GEN-for-each-reg.h usage leaves GEN defined, relying on any subsequent usage to start with #undef, which is rude. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NBorislav Petkov <bp@suse.de> Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com> Tested-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120310.041792350@infradead.org
-
由 Peter Zijlstra 提交于
Ensure the register order is correct; this allows for easy translation between register number and trampoline and vice-versa. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NBorislav Petkov <bp@suse.de> Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com> Tested-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120309.978573921@infradead.org
-
由 Peter Zijlstra 提交于
Now that objtool no longer creates alternatives, these replacement symbols are no longer needed, remove them. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NBorislav Petkov <bp@suse.de> Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com> Tested-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120309.915051744@infradead.org
-
- 28 10月, 2021 3 次提交
-
-
由 Tianyu Lan 提交于
Hyperv provides GHCB protocol to write Synthetic Interrupt Controller MSR registers in Isolation VM with AMD SEV SNP and these registers are emulated by hypervisor directly. Hyperv requires to write SINTx MSR registers twice. First writes MSR via GHCB page to communicate with hypervisor and then writes wrmsr instruction to talk with paravisor which runs in VMPL0. Guest OS ID MSR also needs to be set via GHCB page. Reviewed-by: NMichael Kelley <mikelley@microsoft.com> Signed-off-by: NTianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-7-ltykernel@gmail.comSigned-off-by: NWei Liu <wei.liu@kernel.org>
-
由 Tianyu Lan 提交于
Add new hvcall guest address host visibility support to mark memory visible to host. Call it inside set_memory_decrypted /encrypted(). Add HYPERVISOR feature check in the hv_is_isolation_supported() to optimize in non-virtualization environment. Acked-by: NDave Hansen <dave.hansen@intel.com> Reviewed-by: NMichael Kelley <mikelley@microsoft.com> Signed-off-by: NTianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-4-ltykernel@gmail.com [ wei: fix conflicts with tip ] Signed-off-by: NWei Liu <wei.liu@kernel.org>
-
由 Tianyu Lan 提交于
Hyperv exposes GHCB page via SEV ES GHCB MSR for SNP guest to communicate with hypervisor. Map GHCB page for all cpus to read/write MSR register and submit hvcall request via ghcb page. Reviewed-by: NMichael Kelley <mikelley@microsoft.com> Signed-off-by: NTianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-2-ltykernel@gmail.comSigned-off-by: NWei Liu <wei.liu@kernel.org>
-
- 26 10月, 2021 5 次提交
-
-
由 Chang S. Bae 提交于
Add the AMX state components in XFEATURE_MASK_USER_SUPPORTED and the TILE_DATA component to the dynamic states and update the permission check table accordingly. This is only effective on 64 bit kernels as for 32bit kernels XFEATURE_MASK_TILE is defined as 0. TILE_DATA is caller-saved state and the only dynamic state. Add build time sanity check to ensure the assumption that every dynamic feature is caller- saved. Make AMX state depend on XFD as it is dynamic feature. Signed-off-by: NChang S. Bae <chang.seok.bae@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-24-chang.seok.bae@intel.com
-
由 Chang S. Bae 提交于
The XSTATE initialization uses check_xstate_against_struct() to sanity check the size of XSTATE-enabled features. AMX is a XSAVE-enabled feature, and its size is not hard-coded but discoverable at run-time via CPUID. The AMX state is composed of state components 17 and 18, which are all user state components. The first component is the XTILECFG state of a 64-byte tile-related control register. The state component 18, called XTILEDATA, contains the actual tile data, and the state size varies on implementations. The architectural maximum, as defined in the CPUID(0x1d, 1): EAX[15:0], is a byte less than 64KB. The first implementation supports 8KB. Check the XTILEDATA state size dynamically. The feature introduces the new tile register, TMM. Define one register struct only and read the number of registers from CPUID. Cross-check the overall size with CPUID again. Signed-off-by: NChang S. Bae <chang.seok.bae@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-21-chang.seok.bae@intel.com
-
由 Chang S. Bae 提交于
The fpstate embedded in struct fpu is the default state for storing the FPU registers. It's sized so that the default supported features can be stored. For dynamically enabled features the register buffer is too small. The #NM handler detects first use of a feature which is disabled in the XFD MSR. After handling permission checks it recalculates the size for kernel space and user space state and invokes fpstate_realloc() which tries to reallocate fpstate and install it. Provide the allocator function which checks whether the current buffer size is sufficient and if not allocates one. If allocation is successful the new fpstate is initialized with the new features and sizes and the now enabled features is removed from the task's XFD mask. realloc_fpstate() uses vzalloc(). If use of this mechanism grows to re-allocate buffers larger than 64KB, a more sophisticated allocation scheme that includes purpose-built reclaim capability might be justified. Signed-off-by: NChang S. Bae <chang.seok.bae@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-19-chang.seok.bae@intel.com
-
由 Chang S. Bae 提交于
If the XFD MSR has feature bits set then #NM will be raised when user space attempts to use an instruction related to one of these features. When the task has no permissions to use that feature, raise SIGILL, which is the same behavior as #UD. If the task has permissions, calculate the new buffer size for the extended feature set and allocate a larger fpstate. In the unlikely case that vzalloc() fails, SIGSEGV is raised. The allocation function will be added in the next step. Provide a stub which fails for now. [ tglx: Updated serialization ] Signed-off-by: NChang S. Bae <chang.seok.bae@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-18-chang.seok.bae@intel.com
-
由 Chang S. Bae 提交于
Add storage for XFD register state to struct fpstate. This will be used to store the XFD MSR state. This will be used for switching the XFD MSR when FPU content is restored. Add a per-CPU variable to cache the current MSR value so the MSR has only to be written when the values are different. Signed-off-by: NChang S. Bae <chang.seok.bae@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NChang S. Bae <chang.seok.bae@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-15-chang.seok.bae@intel.com
-