- 24 6月, 2022 31 次提交
-
-
由 Jue Wang 提交于
This patch enables MCG_CMCI_P by default in kvm_mce_cap_supported. It reuses ioctl KVM_X86_SET_MCE to implement injection of UnCorrectable No Action required (UCNA) errors, signaled via Corrected Machine Check Interrupt (CMCI). Neither of the CMCI and UCNA emulations depends on hardware. Signed-off-by: NJue Wang <juew@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-8-juew@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jue Wang 提交于
This patch adds the emulation of IA32_MCi_CTL2 registers to KVM. A separate mci_ctl2_banks array is used to keep the existing mce_banks register layout intact. In Machine Check Architecture, in addition to MCG_CMCI_P, bit 30 of the per-bank register IA32_MCi_CTL2 controls whether Corrected Machine Check error reporting is enabled. Signed-off-by: NJue Wang <juew@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-7-juew@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jue Wang 提交于
This patch updates the allocation of mce_banks with the array allocation API (kcalloc) as a precedent for the later mci_ctl2_banks to implement per-bank control of Corrected Machine Check Interrupt (CMCI). Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NJue Wang <juew@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-6-juew@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jue Wang 提交于
This patch calculates the number of lvt entries as part of KVM_X86_MCE_SETUP conditioned on the presence of MCG_CMCI_P bit in MCG_CAP and stores result in kvm_lapic. It translats from APIC_LVTx register to index in lapic_lvt_entry enum. It extends the APIC_LVTx macro as well as other lapic write/reset handling etc to support Corrected Machine Check Interrupt. Signed-off-by: NJue Wang <juew@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-5-juew@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jue Wang 提交于
An APIC_LVTx macro is introduced to calcualte the APIC_LVTx register offset based on the index in the lapic_lvt_entry enum. Later patches will extend the APIC_LVTx macro to support the APIC_LVTCMCI register in order to implement Corrected Machine Check Interrupt signaling. Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NJue Wang <juew@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-4-juew@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jue Wang 提交于
This patch defines a lapic_lvt_entry enum used as explicit indices to the apic_lvt_mask array. In later patches a LVT_CMCI will be added to implement the Corrected Machine Check Interrupt signaling. Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NJue Wang <juew@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-3-juew@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jue Wang 提交于
Refactor APIC_VERSION so that the maximum number of LVT entries is inserted at runtime rather than compile time. This will be used in a subsequent commit to expose the LVT CMCI Register to VMs that support Corrected Machine Check error counting/signaling (IA32_MCG_CAP.MCG_CMCI_P=1). Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NJue Wang <juew@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-2-juew@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The TLB flush before installing the newly-populated lower level page table is unnecessary if the lower-level page table maps the huge page identically. KVM knows it is if it did not reuse an existing shadow page table, tell drop_large_spte() to skip the flush in that case. Extracted from a patch by David Matlack. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Add support for Eager Page Splitting pages that are mapped by nested MMUs. Walk through the rmap first splitting all 1GiB pages to 2MiB pages, and then splitting all 2MiB pages to 4KiB pages. Note, Eager Page Splitting is limited to nested MMUs as a policy rather than due to any technical reason (the sp->role.guest_mode check could just be deleted and Eager Page Splitting would work correctly for all shadow MMU pages). There is really no reason to support Eager Page Splitting for tdp_mmu=N, since such support will eventually be phased out, and there is no current use case supporting Eager Page Splitting on hosts where TDP is either disabled or unavailable in hardware. Furthermore, future improvements to nested MMU scalability may diverge the code from the legacy shadow paging implementation. These improvements will be simpler to make if Eager Page Splitting does not have to worry about legacy shadow paging. Splitting huge pages mapped by nested MMUs requires dealing with some extra complexity beyond that of the TDP MMU: (1) The shadow MMU has a limit on the number of shadow pages that are allowed to be allocated. So, as a policy, Eager Page Splitting refuses to split if there are KVM_MIN_FREE_MMU_PAGES or fewer pages available. (2) Splitting a huge page may end up re-using an existing lower level shadow page tables. This is unlike the TDP MMU which always allocates new shadow page tables when splitting. (3) When installing the lower level SPTEs, they must be added to the rmap which may require allocating additional pte_list_desc structs. Case (2) is especially interesting since it may require a TLB flush, unlike the TDP MMU which can fully split huge pages without any TLB flushes. Specifically, an existing lower level page table may point to even lower level page tables that are not fully populated, effectively unmapping a portion of the huge page, which requires a flush. As of this commit, a flush is always done always after dropping the huge page and before installing the lower level page table. This TLB flush could instead be delayed until the MMU lock is about to be dropped, which would batch flushes for multiple splits. However these flushes should be rare in practice (a huge page must be aliased in multiple SPTEs and have been split for NX Huge Pages in only some of them). Flushing immediately is simpler to plumb and also reduces the chances of tripping over a CPU bug (e.g. see iTLB multihit). [ This commit is based off of the original implementation of Eager Page Splitting from Peter in Google's kernel from 2016. ] Suggested-by: NPeter Feiner <pfeiner@google.com> Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-23-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Before allocating a child shadow page table, all callers check whether the parent already points to a huge page and, if so, they drop that SPTE. This is done by drop_large_spte(). However, dropping the large SPTE is really only necessary before the sp is installed. While the sp is returned by kvm_mmu_get_child_sp(), installing it happens later in __link_shadow_page(). Move the call there instead of having it in each and every caller. To ensure that the shadow page is not linked twice if it was present, do _not_ opportunistically make kvm_mmu_get_child_sp() idempotent: instead, return an error value if the shadow page already existed. This is a bit more verbose, but clearer than NULL. Finally, now that the drop_large_spte() name is not taken anymore, remove the two underscores in front of __drop_large_spte(). Reviewed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Currently KVM only zaps collapsible 4KiB SPTEs in the shadow MMU. This is fine for now since KVM never creates intermediate huge pages during dirty logging. In other words, KVM always replaces 1GiB pages directly with 4KiB pages, so there is no reason to look for collapsible 2MiB pages. However, this will stop being true once the shadow MMU participates in eager page splitting. During eager page splitting, each 1GiB is first split into 2MiB pages and then those are split into 4KiB pages. The intermediate 2MiB pages may be left behind if an error condition causes eager page splitting to bail early. No functional change intended. Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-20-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Currently make_huge_page_split_spte() assumes execute permissions can be granted to any 4K SPTE when splitting huge pages. This is true for the TDP MMU but is not necessarily true for the shadow MMU, since KVM may be shadowing a non-executable huge page. To fix this, pass in the role of the child shadow page where the huge page will be split and derive the execution permission from that. This is correct because huge pages are always split with direct shadow page and thus the shadow page role contains the correct access permissions. No functional change intended. Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-19-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Splitting huge pages requires allocating/finding shadow pages to replace the huge page. Shadow pages are keyed, in part, off the guest access permissions they are shadowing. For fully direct MMUs, there is no shadowing so the access bits in the shadow page role are always ACC_ALL. But during shadow paging, the guest can enforce whatever access permissions it wants. In particular, eager page splitting needs to know the permissions to use for the subpages, but KVM cannot retrieve them from the guest page tables because eager page splitting does not have a vCPU. Fortunately, the guest access permissions are easy to cache whenever page faults or FNAME(sync_page) update the shadow page tables; this is an extension of the existing cache of the shadowed GFNs in the gfns array of the shadow page. The access bits only take up 3 bits, which leaves 61 bits left over for gfns, which is more than enough. Now that the gfns array caches more information than just GFNs, rename it to shadowed_translation. While here, preemptively fix up the WARN_ON() that detects gfn mismatches in direct SPs. The WARN_ON() was paired with a pr_err_ratelimited(), which means that users could sometimes see the WARN without the accompanying error message. Fix this by outputting the error message as part of the WARN splat, and opportunistically make them WARN_ONCE() because if these ever fire, they are all but guaranteed to fire a lot and will bring down the kernel. Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-18-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Update the page stats in __rmap_add() rather than at the call site. This will avoid having to manually update page stats when splitting huge pages in a subsequent commit. No functional change intended. Reviewed-by: NBen Gardon <bgardon@google.com> Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-17-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Allow adding new entries to the rmap and linking shadow pages without a struct kvm_vcpu pointer by moving the implementation of rmap_add() and link_shadow_page() into inner helper functions. No functional change intended. Reviewed-by: NBen Gardon <bgardon@google.com> Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-16-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Constify rmap_add()'s @slot parameter; it is simply passed on to gfn_to_rmap(), which takes a const memslot. No functional change intended. Reviewed-by: NBen Gardon <bgardon@google.com> Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-15-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Allow @vcpu to be NULL in kvm_mmu_find_shadow_page() (and its only caller __kvm_mmu_get_shadow_page()). @vcpu is only required to sync indirect shadow pages, so it's safe to pass in NULL when looking up direct shadow pages. This will be used for doing eager page splitting, which allocates direct shadow pages from the context of a VM ioctl without access to a vCPU pointer. Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-14-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Get the kvm pointer from the caller, rather than deriving it from vcpu->kvm, and plumb the kvm pointer all the way from kvm_mmu_get_shadow_page(). With this change in place, the vcpu pointer is only needed to sync indirect shadow pages. In other words, __kvm_mmu_get_shadow_page() can now be used to get *direct* shadow pages without a vcpu pointer. This enables eager page splitting, which needs to allocate direct shadow pages during VM ioctls. No functional change intended. Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-13-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
The vcpu pointer in kvm_mmu_alloc_shadow_page() is only used to get the kvm pointer. So drop the vcpu pointer and just pass in the kvm pointer. No functional change intended. Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-12-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Refactor kvm_mmu_alloc_shadow_page() to receive the caches from which it will allocate the various pieces of memory for shadow pages as a parameter, rather than deriving them from the vcpu pointer. This will be useful in a future commit where shadow pages are allocated during VM ioctls for eager page splitting, and thus will use a different set of caches. Preemptively pull the caches out all the way to kvm_mmu_get_shadow_page() since eager page splitting will not be calling kvm_mmu_alloc_shadow_page() directly. No functional change intended. Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-11-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Move the code that write-protects newly-shadowed guest page tables into account_shadowed(). This avoids a extra gfn-to-memslot lookup and is a more logical place for this code to live. But most importantly, this reduces kvm_mmu_alloc_shadow_page()'s reliance on having a struct kvm_vcpu pointer, which will be necessary when creating new shadow pages during VM ioctls for eager page splitting. Note, it is safe to drop the role.level == PG_LEVEL_4K check since account_shadowed() returns early if role.level > PG_LEVEL_4K. No functional change intended. Reviewed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-10-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Rename 2 functions: kvm_mmu_get_page() -> kvm_mmu_get_shadow_page() kvm_mmu_free_page() -> kvm_mmu_free_shadow_page() This change makes it clear that these functions deal with shadow pages rather than struct pages. It also aligns these functions with the naming scheme for kvm_mmu_find_shadow_page() and kvm_mmu_alloc_shadow_page(). Prefer "shadow_page" over the shorter "sp" since these are core functions and the line lengths aren't terrible. No functional change intended. Reviewed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-9-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Consolidate kvm_mmu_alloc_page() and kvm_mmu_alloc_shadow_page() under the latter so that all shadow page allocation and initialization happens in one place. No functional change intended. Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-8-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Decompose kvm_mmu_get_page() into separate helper functions to increase readability and prepare for allocating shadow pages without a vcpu pointer. Specifically, pull the guts of kvm_mmu_get_page() into 2 helper functions: kvm_mmu_find_shadow_page() - Walks the page hash checking for any existing mmu pages that match the given gfn and role. kvm_mmu_alloc_shadow_page() Allocates and initializes an entirely new kvm_mmu_page. This currently requries a vcpu pointer for allocation and looking up the memslot but that will be removed in a future commit. No functional change intended. Reviewed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-7-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
The quadrant is only used when gptes are 4 bytes, but mmu_alloc_{direct,shadow}_roots() pass in a non-zero quadrant for PAE page directories regardless. Make this less confusing by only passing in a non-zero quadrant when it is actually necessary. Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-6-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Instead of computing the shadow page role from scratch for every new page, derive most of the information from the parent shadow page. This eliminates the dependency on the vCPU root role to allocate shadow page tables, and reduces the number of parameters to kvm_mmu_get_page(). Preemptively split out the role calculation to a separate function for use in a following commit. Note that when calculating the MMU root role, we can take @role.passthrough, @role.direct, and @role.access directly from @vcpu->arch.mmu->root_role. Only @role.level and @role.quadrant still must be overridden for PAE page directories, when shadowing 32-bit guest page tables with PAE page tables. No functional change intended. Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-5-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
The "direct" argument is vcpu->arch.mmu->root_role.direct, because unlike non-root page tables, it's impossible to have a direct root in an indirect MMU. So just use that. Suggested-by: NLai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-4-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
The parameter "direct" can either be true or false, and all of the callers pass in a bool variable or true/false literal, so just use the type bool. No functional change intended. Reviewed-by: NLai Jiangshan <jiangshanlai@gmail.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-3-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Commit fb58a9c3 ("KVM: x86/mmu: Optimize MMU page cache lookup for fully direct MMUs") skipped the unsync checks and write flood clearing for full direct MMUs. We can extend this further to skip the checks for all direct shadow pages. Direct shadow pages in indirect MMUs (i.e. shadow paging) are used when shadowing a guest huge page with smaller pages. Such direct shadow pages, like their counterparts in fully direct MMUs, are never marked unsynced or have a non-zero write-flooding count. Checking sp->role.direct also generates better code than checking direct_map because, due to register pressure, direct_map has to get shoved onto the stack and then pulled back off. No functional change intended. Reviewed-by: NLai Jiangshan <jiangshanlai@gmail.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NDavid Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-2-dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Ben Gardon 提交于
In some cases, the NX hugepage mitigation for iTLB multihit is not needed for all guests on a host. Allow disabling the mitigation on a per-VM basis to avoid the performance hit of NX hugepages on trusted workloads. In order to disable NX hugepages on a VM, ensure that the userspace actor has permission to reboot the system. Since disabling NX hugepages would allow a guest to crash the system, it is similar to reboot permissions. Ideally, KVM would require userspace to prove it has access to KVM's nx_huge_pages module param, e.g. so that userspace can opt out without needing full reboot permissions. But getting access to the module param file info is difficult because it is buried in layers of sysfs and module glue. Requiring CAP_SYS_BOOT is sufficient for all known use cases. Suggested-by: NJim Mattson <jmattson@google.com> Reviewed-by: NDavid Matlack <dmatlack@google.com> Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20220613212523.3436117-9-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Ben Gardon 提交于
The braces around the KVM_CAP_XSAVE2 block also surround the KVM_CAP_PMU_CAPABILITY block, likely the result of a merge issue. Simply move the curly brace back to where it belongs. Fixes: ba7bb663 ("KVM: x86: Provide per VM capability for disabling PMU virtualization") Reviewed-by: NDavid Matlack <dmatlack@google.com> Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20220613212523.3436117-8-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 20 6月, 2022 9 次提交
-
-
由 Sean Christopherson 提交于
Add a quirk for KVM's behavior of emulating intercepted MONITOR/MWAIT instructions a NOPs regardless of whether or not they are supported in guest CPUID. KVM's current behavior was likely motiviated by a certain fruity operating system that expects MONITOR/MWAIT to be supported unconditionally and blindly executes MONITOR/MWAIT without first checking CPUID. And because KVM does NOT advertise MONITOR/MWAIT to userspace, that's effectively the default setup for any VMM that regurgitates KVM_GET_SUPPORTED_CPUID to KVM_SET_CPUID2. Note, this quirk interacts with KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT. The behavior is actually desirable, as userspace VMMs that want to unconditionally hide MONITOR/MWAIT from the guest can leave the MISC_ENABLE quirk enabled. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220608224516.3788274-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Ignore host userspace writes of '0' to F15H_PERF_CTL MSRs KVM reports in the MSR-to-save list, but the MSRs are ultimately unsupported. All MSRs in said list must be writable by userspace, e.g. if userspace sends the list back at KVM without filtering out the MSRs it doesn't need. Note, reads of said MSRs already have the desired behavior. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220611005755.753273-8-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Ignore host userspace reads and writes of '0' to PEBS and BTS MSRs that KVM reports in the MSR-to-save list, but the MSRs are ultimately unsupported. All MSRs in said list must be writable by userspace, e.g. if userspace sends the list back at KVM without filtering out the MSRs it doesn't need. Fixes: 8183a538 ("KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS") Fixes: 902caeb6 ("KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS") Fixes: c59a1f10 ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220611005755.753273-7-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Use vcpu_get_perf_capabilities() when querying MSR_IA32_PERF_CAPABILITIES from the guest's perspective, e.g. to update the vPMU and to determine which MSRs exist. If userspace ignores MSR_IA32_PERF_CAPABILITIES but clear X86_FEATURE_PDCM, the guest should see '0'. Fixes: 902caeb6 ("KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS") Fixes: c59a1f10 ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220611005755.753273-6-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Revert the hack to allow host-initiated accesses to all "PMU" MSRs, as intel_is_valid_msr() returns true for _all_ MSRs, regardless of whether or not it has a snowball's chance in hell of actually being a PMU MSR. That mostly gets papered over by the actual get/set helpers only handling MSRs that they knows about, except there's the minor detail that kvm_pmu_{g,s}et_msr() eat reads and writes when the PMU is disabled. I.e. KVM will happy allow reads and writes to _any_ MSR if the PMU is disabled, either via module param or capability. This reverts commit d1c88a40. Fixes: d1c88a40 ("KVM: x86: always allow host-initiated writes to PMU MSRs") Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220611005755.753273-5-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Eating reads and writes to all "PMU" MSRs when there is no PMU is wildly broken as it results in allowing accesses to _any_ MSR on Intel CPUs as intel_is_valid_msr() returns true for all host_initiated accesses. A revert of commit d1c88a40 ("KVM: x86: always allow host-initiated writes to PMU MSRs") will soon follow. This reverts commit 8e6a58e2. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220611005755.753273-4-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Do not clear manipulate MSR_IA32_PERF_CAPABILITIES in intel_pmu_refresh(), i.e. give userspace full control over capability/read-only MSRs. KVM is not a babysitter, it is userspace's responsiblity to provide a valid and coherent vCPU model. Attempting to "help" the guest by forcing a consistent model creates edge cases, and ironicially leads to inconsistent behavior. Example #1: KVM doesn't do intel_pmu_refresh() when userspace writes the MSR. Example #2: KVM doesn't clear the bits when the PMU is disabled, or when there's no architectural PMU. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220611005755.753273-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Give userspace full control of the read-only bits in MISC_ENABLES, i.e. do not modify bits on PMU refresh and do not preserve existing bits when userspace writes MISC_ENABLES. With a few exceptions where KVM doesn't expose the necessary controls to userspace _and_ there is a clear cut association with CPUID, e.g. reserved CR4 bits, KVM does not own the vCPU and should not manipulate the vCPU model on behalf of "dummy user space". The argument that KVM is doing userspace a favor because "the order of setting vPMU capabilities and MSR_IA32_MISC_ENABLE is not strictly guaranteed" is specious, as attempting to configure MSRs on behalf of userspace inevitably leads to edge cases precisely because KVM does not prescribe a specific order of initialization. Example #1: intel_pmu_refresh() consumes and modifies the vCPU's MSR_IA32_PERF_CAPABILITIES, and so assumes userspace initializes config MSRs before setting the guest CPUID model. If userspace sets CPUID first, then KVM will mark PEBS as available when arch.perf_capabilities is initialized with a non-zero PEBS format, thus creating a bad vCPU model if userspace later disables PEBS by writing PERF_CAPABILITIES. Example #2: intel_pmu_refresh() does not clear PERF_CAP_PEBS_MASK in MSR_IA32_PERF_CAPABILITIES if there is no vPMU, making KVM inconsistent in its desire to be consistent. Example #3: intel_pmu_refresh() does not clear MSR_IA32_MISC_ENABLE_EMON if KVM_SET_CPUID2 is called multiple times, first with a vPMU, then without a vPMU. While slightly contrived, it's plausible a VMM could reflect KVM's default vCPU and then operate on KVM's copy of CPUID to later clear the vPMU settings, e.g. see KVM's selftests. Example #4: Enumerating an Intel vCPU on an AMD host will not call into intel_pmu_refresh() at any point, and so the BTS and PEBS "unavailable" bits will be left clear, without any way for userspace to set them. Keep the "R" behavior of the bit 7, "EMON available", for the guest. Unlike the BTS and PEBS bits, which are fully "RO", the EMON bit can be written with a different value, but that new value is ignored. Cc: Like Xu <likexu@tencent.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Reported-by: Nkernel test robot <oliver.sang@intel.com> Message-Id: <20220611005755.753273-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Dongliang Mu 提交于
kfree can handle NULL pointer as its argument. According to coccinelle isnullfree check, remove NULL check before kfree operation. Signed-off-by: NDongliang Mu <mudongliangabcd@gmail.com> Message-Id: <20220614133458.147314-1-dzm91@hust.edu.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-