- 17 4月, 2021 2 次提交
-
-
由 Maxim Levitsky 提交于
Store the supported bits into KVM_GUESTDBG_VALID_MASK macro, similar to how arm does this. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401135451.1004564-4-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move the prototypes for the MMU notifier callbacks out of arch code and into common code. There is no benefit to having each arch replicate the prototypes since any deviation from the invocation in common code will explode. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-9-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 3月, 2021 8 次提交
-
-
由 Sean Christopherson 提交于
Retrieve the active PCID only when writing a guest CR3 value, i.e. don't get the PCID when using EPT or NPT. The PCID is especially problematic for EPT as the bits have different meaning, and so the PCID and must be manually stripped, which is annoying and unnecessary. And on VMX, getting the active PCID also involves reading the guest's CR3 and CR4.PCIDE, i.e. may add pointless VMREADs. Opportunistically rename the pgd/pgd_level params to root_hpa and root_level to better reflect their new roles. Keep the function names, as "load the guest PGD" is still accurate/correct. Last, and probably least, pass root_hpa as a hpa_t/u64 instead of an unsigned long. The EPTP holds a 64-bit value, even in 32-bit mode, so in theory EPT could support HIGHMEM for 32-bit KVM. Never mind that doing so would require changing the MMU page allocators and reworking the MMU to use kmap(). Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-2-seanjc@google.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Let the MMU deal with the SPTE masks to avoid splitting the logic and knowledge across the MMU and VMX. The SPTE masks that are used for EPT are very, very tightly coupled to the MMU implementation. The use of available bits, the existence of A/D types, the fact that shadow_x_mask even exists, and so on and so forth are all baked into the MMU implementation. Cross referencing the params to the masks is also a nightmare, as pretty much every param is a u64. A future patch will make the location of the MMU_WRITABLE and HOST_WRITABLE bits MMU specific, to free up bit 11 for a MMU_PRESENT bit. Doing that change with the current kvm_mmu_set_mask_ptes() would be an absolute mess. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210225204749.1512652-18-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move the entirety of the accelerated RDPMC emulation to x86.c, and assign the common handler directly to the exit handler array for VMX. SVM has bizarre nrips behavior that prevents it from directly invoking the common handler. The nrips goofiness will be addressed in a future patch. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-8-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move the trivial exit handlers, e.g. for instructions that KVM "emulates" as nops, to common x86 code. Assign the common handlers directly to the exit handler arrays and drop the vendor trampolines. Opportunistically use pr_warn_once() where appropriate. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-7-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move the entirety of XSETBV emulation to x86.c, and assign the function directly to both VMX's and SVM's exit handlers, i.e. drop the unnecessary trampolines. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-6-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Synthesize a nested VM-Exit if L2 triggers an emulated triple fault instead of exiting to userspace, which likely will kill L1. Any flow that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that is _not_intercepted by L1. E.g. if KVM is intercepting #GPs for the VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF will cause KVM to emulate triple fault. Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210302174515.2812275-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Unexport the MMU load and unload helpers now that they are no longer used (incorrectly) in vendor code. Opportunistically move the kvm_mmu_sync_roots() declaration into mmu.h, it should not be exposed to vendor code. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-16-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Dongli Zhang 提交于
The new per-cpu stat 'nested_run' is introduced in order to track if L1 VM is running or used to run L2 VM. An example of the usage of 'nested_run' is to help the host administrator to easily track if any L1 VM is used to run L2 VM. Suppose there is issue that may happen with nested virtualization, the administrator will be able to easily narrow down and confirm if the issue is due to nested virtualization via 'nested_run'. For example, whether the fix like commit 88dddc11 ("KVM: nVMX: do not use dangling shadow VMCS after guest reset") is required. Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: NDongli Zhang <dongli.zhang@oracle.com> Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 3月, 2021 1 次提交
-
-
由 Muhammad Usama Anjum 提交于
This patch adds the annotation to fix the following sparse errors: arch/x86/kvm//x86.c:8147:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//x86.c:8147:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//x86.c:8147:15: struct kvm_apic_map * arch/x86/kvm//x86.c:10628:16: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//x86.c:10628:16: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//x86.c:10628:16: struct kvm_apic_map * arch/x86/kvm//x86.c:10629:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//x86.c:10629:15: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//x86.c:10629:15: struct kvm_pmu_event_filter * arch/x86/kvm//lapic.c:267:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:267:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:267:15: struct kvm_apic_map * arch/x86/kvm//lapic.c:269:9: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:269:9: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:269:9: struct kvm_apic_map * arch/x86/kvm//lapic.c:637:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:637:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:637:15: struct kvm_apic_map * arch/x86/kvm//lapic.c:994:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:994:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:994:15: struct kvm_apic_map * arch/x86/kvm//lapic.c:1036:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:1036:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:1036:15: struct kvm_apic_map * arch/x86/kvm//lapic.c:1173:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:1173:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:1173:15: struct kvm_apic_map * arch/x86/kvm//pmu.c:190:18: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//pmu.c:190:18: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//pmu.c:190:18: struct kvm_pmu_event_filter * arch/x86/kvm//pmu.c:251:18: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//pmu.c:251:18: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//pmu.c:251:18: struct kvm_pmu_event_filter * arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter * arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter * Signed-off-by: NMuhammad Usama Anjum <musamaanjum@gmail.com> Message-Id: <20210305191123.GA497469@LEGION> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 03 3月, 2021 1 次提交
-
-
由 David Woodhouse 提交于
This is how Xen guests do steal time accounting. The hypervisor records the amount of time spent in each of running/runnable/blocked/offline states. In the Xen accounting, a vCPU is still in state RUNSTATE_running while in Xen for a hypercall or I/O trap, etc. Only if Xen explicitly schedules does the state become RUNSTATE_blocked. In KVM this means that even when the vCPU exits the kvm_run loop, the state remains RUNSTATE_running. The VMM can explicitly set the vCPU to RUNSTATE_blocked by using the KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT attribute, and can also use KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST to retrospectively add a given amount of time to the blocked state and subtract it from the running state. The state_entry_time corresponds to get_kvmclock_ns() at the time the vCPU entered the current state, and the total times of all four states should always add up to state_entry_time. Co-developed-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Message-Id: <20210301125309.874953-2-dwmw2@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 26 2月, 2021 1 次提交
-
-
由 Dongli Zhang 提交于
The 'mmu_page_hash' is used as hash table while 'active_mmu_pages' is a list. Remove the misplaced comment as it's mostly stating the obvious anyways. Signed-off-by: NDongli Zhang <dongli.zhang@oracle.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210226061945.1222-1-dongli.zhang@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 19 2月, 2021 5 次提交
-
-
由 Sean Christopherson 提交于
Remove several exports from the MMU that are no longer necessary. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-15-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Stop setting dirty bits for MMU pages when dirty logging is disabled for a memslot, as PML is now completely disabled when there are no memslots with dirty logging enabled. This means that spurious PML entries will be created for memslots with dirty logging disabled if at least one other memslot has dirty logging enabled. However, spurious PML entries are already possible since dirty bits are set only when a dirty logging is turned off, i.e. memslots that are never dirty logged will have dirty bits cleared. In the end, it's faster overall to eat a few spurious PML entries in the window where dirty logging is being disabled across all memslots. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-13-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Makarand Sonare 提交于
Currently, if enable_pml=1 PML remains enabled for the entire lifetime of the VM irrespective of whether dirty logging is enable or disabled. When dirty logging is disabled, all the pages of the VM are manually marked dirty, so that PML is effectively non-operational. Setting the dirty bits is an expensive operation which can cause severe MMU lock contention in a performance sensitive path when dirty logging is disabled after a failed or canceled live migration. Manually setting dirty bits also fails to prevent PML activity if some code path clears dirty bits, which can incur unnecessary VM-Exits. In order to avoid this extra overhead, dynamically enable/disable PML when dirty logging gets turned on/off for the first/last memslot. Signed-off-by: NMakarand Sonare <makarandsonare@google.com> Co-developed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-12-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Drop the facade of KVM's PML logic being vendor specific and move the bits that aren't truly VMX specific into common x86 code. The MMU logic for dealing with PML is tightly coupled to the feature and to VMX's implementation, bouncing through kvm_x86_ops obfuscates the code without providing any meaningful separation of concerns or encapsulation. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-10-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Store the vendor-specific dirty log size in a variable, there's no need to wrap it in a function since the value is constant after hardware_setup() runs. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-9-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 2月, 2021 5 次提交
-
-
由 Vitaly Kuznetsov 提交于
Hyper-V emulation is enabled in KVM unconditionally. This is bad at least from security standpoint as it is an extra attack surface. Ideally, there should be a per-VM capability explicitly enabled by VMM but currently it is not the case and we can't mandate one without breaking backwards compatibility. We can, however, check guest visible CPUIDs and only enable Hyper-V emulation when "Hv#1" interface was exposed in HYPERV_CPUID_INTERFACE. Note, VMMs are free to act in any sequence they like, e.g. they can try to set MSRs first and CPUIDs later so we still need to allow the host to read/write Hyper-V specific MSRs unconditionally. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-14-vkuznets@redhat.com> [Add selftest vcpu_set_hv_cpuid API to avoid breaking xen_vmcall_test. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
Hyper-V context is only needed for guests which use Hyper-V emulation in KVM (e.g. Windows/Hyper-V guests). 'struct kvm_vcpu_hv' is, however, quite big, it accounts for more than 1/4 of the total 'struct kvm_vcpu_arch' which is also quite big already. This all looks like a waste. Allocate 'struct kvm_vcpu_hv' dynamically. This patch does not bring any (intentional) functional change as we still allocate the context unconditionally but it paves the way to doing that only when needed. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-13-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
Current KVM_USER_MEM_SLOTS limits are arch specific (512 on Power, 509 on x86, 32 on s390, 16 on MIPS) but they don't really need to be. Memory slots are allocated dynamically in KVM when added so the only real limitation is 'id_to_index' array which is 'short'. We don't have any other KVM_MEM_SLOTS_NUM/KVM_USER_MEM_SLOTS-sized statically defined structures. Low KVM_USER_MEM_SLOTS can be a limiting factor for some configurations. In particular, when QEMU tries to start a Windows guest with Hyper-V SynIC enabled and e.g. 256 vCPUs the limit is hit as SynIC requires two pages per vCPU and the guest is free to pick any GFN for each of them, this fragments memslots as QEMU wants to have a separate memslot for each of these pages (which are supposed to act as 'overlay' pages). Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210127175731.2020089-3-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
kvm_get_dr and emulator_get_dr except an in-range value for the register number so they cannot fail. Change the return type to void. Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The TDP MMU assumes that it can do atomic accesses to 64-bit PTEs. Rather than just disabling it, compile it out completely so that it is possible to use for example 64-bit xchg. To limit the number of stubs, wrap all accesses to tdp_mmu_enabled or tdp_mmu_page with a function. Calls to all other functions in tdp_mmu.c are eliminated and do not even reach the linker. Reviewed-by: NSean Christopherson <seanjc@google.com> Tested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 04 2月, 2021 14 次提交
-
-
由 Sean Christopherson 提交于
Rename cr3_lm_rsvd_bits to reserved_gpa_bits, and use it for all GPA legality checks. AMD's APM states: If the C-bit is an address bit, this bit is masked from the guest physical address when it is translated through the nested page tables. Thus, any access that can conceivably be run through NPT should ignore the C-bit when checking for validity. For features that KVM emulates in software, e.g. MTRRs, there is no clear direction in the APM for how the C-bit should be handled. For such cases, follow the SME behavior inasmuch as possible, since SEV is is essentially a VM-specific variant of SME. For SME, the APM states: In this case the upper physical address bits are treated as reserved when the feature is enabled except where otherwise indicated. Collecting the various relavant SME snippets in the APM and cross- referencing the omissions with Linux kernel code, this leaves MTTRs and APIC_BASE as the only flows that KVM emulates that should _not_ ignore the C-bit. Note, this means the reserved bit checks in the page tables are technically broken. This will be remedied in a future patch. Although the page table checks are technically broken, in practice, it's all but guaranteed to be irrelevant. NPT is required for SEV, i.e. shadowing page tables isn't needed in the common case. Theoretically, the checks could be in play for nested NPT, but it's extremely unlikely that anyone is running nested VMs on SEV, as doing so would require L1 to expose sensitive data to L0, e.g. the entire VMCB. And if anyone is running nested VMs, L0 can't read the guest's encrypted memory, i.e. L1 would need to put its NPT in shared memory, in which case the C-bit will never be set. Or, L1 could use shadow paging, but again, if L0 needs to read page tables, e.g. to load PDPTRs, the memory can't be encrypted if L1 has any expectation of L0 doing the right thing. Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-8-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Woodhouse 提交于
It turns out that we can't handle event channels *entirely* in userspace by delivering them as ExtINT, because KVM is a bit picky about when it accepts ExtINT interrupts from a legacy PIC. The in-kernel local APIC has to have LVT0 configured in APIC_MODE_EXTINT and unmasked, which isn't necessarily the case for Xen guests especially on secondary CPUs. To cope with this, add kvm_xen_get_interrupt() which checks the evtchn_pending_upcall field in the Xen vcpu_info, and delivers the Xen upcall vector (configured by KVM_XEN_ATTR_TYPE_UPCALL_VECTOR) if it's set regardless of LAPIC LVT0 configuration. This gives us the minimum support we need for completely userspace-based implementation of event channels. This does mean that vcpu_enter_guest() needs to check for the evtchn_pending_upcall flag being set, because it can't rely on someone having set KVM_REQ_EVENT unless we were to add some way for userspace to do so manually. Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
-
由 Joao Martins 提交于
Allow the Xen emulated guest the ability to register secondary vcpu time information. On Xen guests this is used in order to be mapped to userspace and hence allow vdso gettimeofday to work. Signed-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
-
由 Joao Martins 提交于
The vcpu info supersedes the per vcpu area of the shared info page and the guest vcpus will use this instead. Signed-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NAnkur Arora <ankur.a.arora@oracle.com> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
-
由 Joao Martins 提交于
Add KVM_XEN_ATTR_TYPE_SHARED_INFO to allow hypervisor to know where the guest's shared info page is. Signed-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
-
由 David Woodhouse 提交于
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
-
由 Joao Martins 提交于
Add a new exit reason for emulator to handle Xen hypercalls. Since this means KVM owns the ABI, dispense with the facility for the VMM to provide its own copy of the hypercall pages; just fill them in directly using VMCALL/VMMCALL as we do for the Hyper-V hypercall page. This behaviour is enabled by a new INTERCEPT_HCALL flag in the KVM_XEN_HVM_CONFIG ioctl structure, and advertised by the same flag being returned from the KVM_CAP_XEN_HVM check. Rename xen_hvm_config() to kvm_xen_write_hypercall_page() and move it to the nascent xen.c while we're at it, and add a test case. Signed-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
-
由 Ben Gardon 提交于
To prepare for handling page faults in parallel, change the TDP MMU page fault handler to use atomic operations to set SPTEs so that changes are not lost if multiple threads attempt to modify the same SPTE. Reviewed-by: NPeter Feiner <pfeiner@google.com> Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-21-bgardon@google.com> [Document new locking rules. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Ben Gardon 提交于
Add a read / write lock to be used in place of the MMU spinlock on x86. The rwlock will enable the TDP MMU to handle page faults, and other operations in parallel in future commits. Reviewed-by: NPeter Feiner <pfeiner@google.com> Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-19-bgardon@google.com> [Introduce virt/kvm/mmu_lock.h - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jason Baron 提交于
Convert kvm_x86_ops to use static calls. Note that all kvm_x86_ops are covered here except for 'pmu_ops and 'nested ops'. Here are some numbers running cpuid in a loop of 1 million calls averaged over 5 runs, measured in the vm (lower is better). Intel Xeon 3000MHz: |default |mitigations=off ------------------------------------- vanilla |.671s |.486s static call|.573s(-15%)|.458s(-6%) AMD EPYC 2500MHz: |default |mitigations=off ------------------------------------- vanilla |.710s |.609s static call|.664s(-6%) |.609s(0%) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: NJason Baron <jbaron@akamai.com> Message-Id: <e057bf1b8a7ad15652df6eeba3f907ae758d3399.1610680941.git.jbaron@akamai.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jason Baron 提交于
Use static calls to improve kvm_x86_ops performance. Introduce the definitions that will be used by a subsequent patch to actualize the savings. Add a new kvm-x86-ops.h header that can be used for the definition of static calls. This header is also intended to be used to simplify the defition of svm_kvm_ops and vmx_x86_ops. Note that all functions in kvm_x86_ops are covered here except for 'pmu_ops' and 'nested ops'. I think they can be covered by static calls in a simlilar manner, but were omitted from this series to reduce scope and because I don't think they have as large of a performance impact. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NJason Baron <jbaron@akamai.com> Message-Id: <e5cc82ead7ab37b2dceb0837a514f3f8bea4f8d1.1610680941.git.jbaron@akamai.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Chenyi Qiang 提交于
DR6_INIT contains the 1-reserved bits as well as the bit that is cleared to 0 when the condition (e.g. RTM) happens. The value can be used to initialize dr6 and also be the XOR mask between the #DB exit qualification (or payload) and DR6. Concerning that DR6_INIT is used as initial value only once, rename it to DR6_ACTIVE_LOW and apply it in other places, which would make the incoming changes for bus lock debug exception more simple. Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20210202090433.13441-2-chenyi.qiang@intel.com> [Define DR6_FIXED_1 from DR6_ACTIVE_LOW and DR6_VOLATILE. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Chenyi Qiang 提交于
Virtual Machine can exploit bus locks to degrade the performance of system. Bus lock can be caused by split locked access to writeback(WB) memory or by using locks on uncacheable(UC) memory. The bus lock is typically >1000 cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores (which must wait for the bus lock to be released before their memory operations can complete). To address the threat, bus lock VM exit is introduced to notify the VMM when a bus lock was acquired, allowing it to enforce throttling or other policy based mitigations. A VMM can enable VM exit due to bus locks by setting a new "Bus Lock Detection" VM-execution control(bit 30 of Secondary Processor-based VM execution controls). If delivery of this VM exit was preempted by a higher priority VM exit (e.g. EPT misconfiguration, EPT violation, APIC access VM exit, APIC write VM exit, exception bitmap exiting), bit 26 of exit reason in vmcs field is set to 1. In current implementation, the KVM exposes this capability through KVM_CAP_X86_BUS_LOCK_EXIT. The user can get the supported mode bitmap (i.e. off and exit) and enable it explicitly (disabled by default). If bus locks in guest are detected by KVM, exit to user space even when current exit reason is handled by KVM internally. Set a new field KVM_RUN_BUS_LOCK in vcpu->run->flags to inform the user space that there is a bus lock detected in guest. Document for Bus Lock VM exit is now available at the latest "Intel Architecture Instruction Set Extensions Programming Reference". Document Link: https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.htmlCo-developed-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20201106090315.18606-4-chenyi.qiang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Remove the update_pte() shadow paging logic, which was obsoleted by commit 4731d4c7 ("KVM: MMU: out of sync shadow core"), but never removed. As pointed out by Yu, KVM never write protects leaf page tables for the purposes of shadow paging, and instead marks their associated shadow page as unsync so that the guest can write PTEs at will. The update_pte() path, which predates the unsync logic, optimizes COW scenarios by refreshing leaf SPTEs when they are written, as opposed to zapping the SPTE, restarting the guest, and installing the new SPTE on the subsequent fault. Since KVM no longer write-protects leaf page tables, update_pte() is unreachable and can be dropped. Reported-by: NYu Zhang <yu.c.zhang@intel.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210115004051.4099250-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 08 1月, 2021 2 次提交
-
-
由 Tom Lendacky 提交于
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence, where the guest vCPU register state is updated and then the vCPU is VMRUN to begin execution of the AP. For an SEV-ES guest, this won't work because the guest register state is encrypted. Following the GHCB specification, the hypervisor must not alter the guest register state, so KVM must track an AP/vCPU boot. Should the guest want to park the AP, it must use the AP Reset Hold exit event in place of, for example, a HLT loop. First AP boot (first INIT-SIPI-SIPI sequence): Execute the AP (vCPU) as it was initialized and measured by the SEV-ES support. It is up to the guest to transfer control of the AP to the proper location. Subsequent AP boot: KVM will expect to receive an AP Reset Hold exit event indicating that the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to awaken it. When the AP Reset Hold exit event is received, KVM will place the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI sequence, KVM will make the vCPU runnable. It is again up to the guest to then transfer control of the AP to the proper location. To differentiate between an actual HLT and an AP Reset Hold, a new MP state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is placed in upon receiving the AP Reset Hold exit event. Additionally, to communicate the AP Reset Hold exit event up to userspace (if needed), a new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD. A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds a new function that, for non SEV-ES guests, invokes the original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests, implements the logic above. Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Ben Gardon 提交于
The tdp_mmu_roots and tdp_mmu_pages in struct kvm_arch should only contain pages with tdp_mmu_page set to true. tdp_mmu_pages should not contain any pages with a non-zero root_count and tdp_mmu_roots should only contain pages with a positive root_count, unless a thread holds the MMU lock and is in the process of modifying the list. Various functions expect these invariants to be maintained, but they are not explictily documented. Add to the comments on both fields to document the above invariants. Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20210107001935.3732070-2-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 12月, 2020 1 次提交
-
-
由 Tom Lendacky 提交于
The guest FPU state is automatically restored on VMRUN and saved on VMEXIT by the hardware, so there is no reason to do this in KVM. Eliminate the allocation of the guest_fpu save area and key off that to skip operations related to the guest FPU state. Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Message-Id: <173e429b4d0d962c6a443c4553ffdaf31b7665a4.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-