- 13 7月, 2017 1 次提交
-
-
由 Ladi Prosek 提交于
The backwards_tsc_observed global introduced in commit 16a96021 is never reset to false. If a VM happens to be running while the host is suspended (a common source of the TSC jumping backwards), master clock will never be enabled again for any VM. In contrast, if no VM is running while the host is suspended, master clock is unaffected. This is inconsistent and unnecessarily strict. Let's track the backwards_tsc_observed variable separately and let each VM start with a clean slate. Real world impact: My Windows VMs get slower after my laptop undergoes a suspend/resume cycle. The only way to get the perf back is unloading and reloading the kvm module. Signed-off-by: NLadi Prosek <lprosek@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 03 7月, 2017 1 次提交
-
-
由 Peter Feiner 提交于
Adds the plumbing to disable A/D bits in the MMU based on a new role bit, ad_disabled. When A/D is disabled, the MMU operates as though A/D aren't available (i.e., using access tracking faults instead). To avoid SP -> kvm_mmu_page.role.ad_disabled lookups all over the place, A/D disablement is now stored in the SPTE. This state is stored in the SPTE by tweaking the use of SPTE_SPECIAL_MASK for access tracking. Rather than just setting SPTE_SPECIAL_MASK when an access-tracking SPTE is non-present, we now always set SPTE_SPECIAL_MASK for access-tracking SPTEs. Signed-off-by: NPeter Feiner <pfeiner@google.com> [Use role.ad_disabled even for direct (non-shadow) EPT page tables. Add documentation and a few MMU_WARN_ONs. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 04 6月, 2017 1 次提交
-
-
由 Andrew Jones 提交于
Marc Zyngier suggested that we define the arch specific VCPU request base, rather than requiring each arch to remember to start from 8. That suggestion, along with Radim Krcmar's recent VCPU request flag addition, snowballed into defining something of an arch VCPU request defining API. No functional change. (Looks like x86 is running out of arch VCPU request bits. Maybe someday we'll need to extend to 64.) Signed-off-by: NAndrew Jones <drjones@redhat.com> Acked-by: NChristoffer Dall <cdall@linaro.org> Signed-off-by: NChristoffer Dall <cdall@linaro.org>
-
- 17 5月, 2017 1 次提交
-
-
由 Paolo Bonzini 提交于
In some fio benchmarks, halt_poll_ns=400000 caused CPU utilization to increase heavily even in cases where the performance improvement was small. In particular, bandwidth divided by CPU usage was as much as 60% lower. To some extent this is the expected effect of the patch, and the additional CPU utilization is only visible when running the benchmarks. However, halving the threshold also halves the extra CPU utilization (from +30-130% to +20-70%) and has no negative effect on performance. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 09 5月, 2017 1 次提交
-
-
由 Bandan Das 提交于
When KVM updates accessed/dirty bits, this hook can be used to invoke an arch specific function that implements/emulates dirty logging such as PML. Signed-off-by: NBandan Das <bsd@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 02 5月, 2017 1 次提交
-
-
由 David Hildenbrand 提交于
We needed the lock to avoid racing with creation of the irqchip on x86. As kvm_set_irq_routing() calls srcu_synchronize_expedited(), this lock might be held for a longer time. Let's introduce an arch specific callback to check if we can actually add irq routes. For x86, all we have to do is check if we have an irqchip in the kernel. We don't need kvm->lock at that point as the irqchip is marked as inititalized only when actually fully created. Reported-by: NSteve Rutherford <srutherford@google.com> Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> Fixes: 1df6dded ("KVM: x86: race between KVM_SET_GSI_ROUTING and KVM_CREATE_IRQCHIP") Signed-off-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 27 4月, 2017 2 次提交
-
-
由 Paolo Bonzini 提交于
kvm_make_all_requests() provides a synchronization that waits until all kicked VCPUs have acknowledged the kick. This is important for KVM_REQ_MMU_RELOAD as it prevents freeing while lockless paging is underway. This patch adds the synchronization property into all requests that are currently being used with kvm_make_all_requests() in order to preserve the current behavior and only introduce a new framework. Removing it from requests where it is not necessary is left for future patches. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
Some operations must ensure that the guest is not running with stale data, but if the guest is halted, then the update can wait until another event happens. kvm_make_all_requests() currently doesn't wake up, so we can mark all requests used with it. First 8 bits were arbitrarily reserved for request numbers. Most uses of requests have the request type as a constant, so a compiler will optimize the '&'. An alternative would be to have an inline function that would return whether the request needs a wake-up or not, but I like this one better even though it might produce worse assembly. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 4月, 2017 1 次提交
-
-
由 Kyle Huey 提交于
Hardware support for faulting on the cpuid instruction is not required to emulate it, because cpuid triggers a VM exit anyways. KVM handles the relevant MSRs (MSR_PLATFORM_INFO and MSR_MISC_FEATURES_ENABLE) and upon a cpuid-induced VM exit checks the cpuid faulting state and the CPL. kvm_require_cpl is even kind enough to inject the GP fault for us. Signed-off-by: NKyle Huey <khuey@kylehuey.com> Reviewed-by: NDavid Matlack <dmatlack@google.com> [Return "1" from kvm_emulate_cpuid, it's not void. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 4月, 2017 1 次提交
-
-
由 David Hildenbrand 提交于
Let's add a new mode and set it while we create the irqchip via KVM_CREATE_IRQCHIP and KVM_CAP_SPLIT_IRQCHIP. This mode will be used later to test if adding routes (in kvm_set_routing_entry()) is already allowed. Signed-off-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 07 4月, 2017 2 次提交
-
-
由 Paolo Bonzini 提交于
Its value has never changed; we might as well make it part of the ABI instead of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO). Because PPC does not always make MMIO available, the code has to be made dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Paolo Bonzini 提交于
Now use bit 6 of EPTP to optionally enable A/D bits for EPTP. Another thing to change is that, when EPT accessed and dirty bits are not in use, VMX treats accesses to guest paging structures as data reads. When they are in use (bit 6 of EPTP is set), they are treated as writes and the corresponding EPT dirty bit is set. The MMU didn't know this detail, so this patch adds it. We also have to fix up the exit qualification. It may be wrong because KVM sets bit 6 but the guest might not. L1 emulates EPT A/D bits using write permissions, so in principle it may be possible for EPT A/D bits to be used by L1 even though not available in hardware. The problem is that guest page-table walks will be treated as reads rather than writes, so they would not cause an EPT violation. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> [Fixed typo in walk_addr_generic() comment and changed bit clear + conditional-set pattern in handle_ept_violation() to conditional-clear] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 17 2月, 2017 1 次提交
-
-
由 Paolo Bonzini 提交于
The FPU is always active now when running KVM. Reviewed-by: NDavid Matlack <dmatlack@google.com> Reviewed-by: NBandan Das <bsd@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 2月, 2017 1 次提交
-
-
由 Paolo Bonzini 提交于
Calls to apic_find_highest_irr are scanning IRR twice, once in vmx_sync_pir_from_irr and once in apic_search_irr. Change sync_pir_from_irr to get the new maximum IRR from kvm_apic_update_irr; now that it does the computation, it can also do the RVI write. In order to avoid complications in svm.c, make the callback optional. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 1月, 2017 7 次提交
-
-
由 Paolo Bonzini 提交于
This statistic can be useful to estimate the cost of an IRQ injection scenario, by comparing it with irq_injections. For example the stat shows that sti;hlt triggers more KVM_REQ_EVENT than sti;nop. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Tom Lendacky 提交于
When a guest causes a NPF which requires emulation, KVM sometimes walks the guest page tables to translate the GVA to a GPA. This is unnecessary most of the time on AMD hardware since the hardware provides the GPA in EXITINFO2. The only exception cases involve string operations involving rep or operations that use two memory locations. With rep, the GPA will only be the value of the initial NPF and with dual memory locations we won't know which memory address was translated into EXITINFO2. Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
This change implements lockless access tracking for Intel CPUs without EPT A bits. This is achieved by marking the PTEs as not-present (but not completely clearing them) when clear_flush_young() is called after marking the pages as accessed. When an EPT Violation is generated as a result of the VM accessing those pages, the PTEs are restored to their original values. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
MMIO SPTEs currently set both bits 62 and 63 to distinguish them as special PTEs. However, bit 63 is used as the SVE bit in Intel EPT PTEs. The SVE bit is ignored for misconfigured PTEs but not necessarily for not-Present PTEs. Since MMIO SPTEs use an EPT misconfiguration, so using bit 63 for them is acceptable. However, the upcoming fast access tracking feature adds another type of special tracking PTE, which uses not-Present PTEs and hence should not set bit 63. In order to use common bits to distinguish both type of special PTEs, we now use only bit 62 as the special bit. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
When using two-dimensional paging, the mmu_page_hash (which provides lookups for existing kvm_mmu_page structs), becomes imbalanced; with too many collisions in buckets 0 and 512. This has been seen to cause mmu_lock to be held for multiple milliseconds in kvm_mmu_get_page on VMs with a large amount of RAM mapped with 4K pages. The current hash function uses the lower 10 bits of gfn to index into mmu_page_hash. When doing shadow paging, gfn is the address of the guest page table being shadow. These tables are 4K-aligned, which makes the low bits of gfn a good hash. However, with two-dimensional paging, no guest page tables are being shadowed, so gfn is the base address that is mapped by the table. Thus page tables (level=1) have a 2MB aligned gfn, page directories (level=2) have a 1GB aligned gfn, etc. This means hashes will only differ in their 10th bit. hash_64() provides a better hash. For example, on a VM with ~200G (99458 direct=1 kvm_mmu_page structs): hash max_mmu_page_hash_collisions -------------------------------------------- low 10 bits 49847 hash_64 105 perfect 97 While we're changing the hash, increase the table size by 4x to better support large VMs (further reduces number of collisions in 200G VM to 29). Note that hash_64() does not provide a good distribution prior to commit ef703f49 ("Eliminate bad hash multipliers from hash_32() and hash_64()"). Signed-off-by: NDavid Matlack <dmatlack@google.com> Change-Id: I5aa6b13c834722813c6cca46b8b1ed6f53368ade Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
Report the maximum number of mmu_page_hash collisions as a per-VM stat. This will make it easy to identify problems with the mmu_page_hash in the future. Signed-off-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
irqchip_in_kernel() tried to save a bit by reusing pic_irqchip(), but it just complicated the code. Add a separate state for the irqchip mode. Reviewed-by: NDavid Hildenbrand <david@redhat.com> [Used Paolo's version of condition in irqchip_in_kernel().] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 25 12月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
-
- 17 12月, 2016 1 次提交
-
-
由 Paolo Bonzini 提交于
Introduce a new mutex to avoid an AB-BA deadlock between kvm->lock and vcpu->mutex. Protect accesses in kvm_hv_setup_tsc_page too, as suggested by Roman. Reported-by: NDmitry Vyukov <dvyukov@google.com> Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 08 12月, 2016 3 次提交
-
-
由 Ladi Prosek 提交于
Loading CR3 as part of emulating vmentry is different from regular CR3 loads, as implemented in kvm_set_cr3, in several ways. * different rules are followed to check CR3 and it is desirable for the caller to distinguish between the possible failures * PDPTRs are not loaded if PAE paging and nested EPT are both enabled * many MMU operations are not necessary This patch introduces nested_vmx_load_cr3 suitable for CR3 loads as part of nested vmentry and vmexit, and makes use of it on the nested vmentry path. Signed-off-by: NLadi Prosek <lprosek@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Kyle Huey 提交于
kvm_skip_emulated_instruction calls both kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep, skipping the emulated instruction and generating a trap if necessary. Replacing skip_emulated_instruction calls with kvm_skip_emulated_instruction is straightforward, except for: - ICEBP, which is already inside a trap, so avoid triggering another trap. - Instructions that can trigger exits to userspace, such as the IO insns, MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a KVM_GUESTDBG_SINGLESTEP exit, and the handling code for IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will take precedence. The singlestep will be triggered again on the next instruction, which is the current behavior. - Task switch instructions which would require additional handling (e.g. the task switch bit) and are instead left alone. - Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction, which do not trigger singlestep traps as mentioned previously. Signed-off-by: NKyle Huey <khuey@kylehuey.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Kyle Huey 提交于
Once skipping the emulated instruction can potentially trigger an exit to userspace (via KVM_GUESTDBG_SINGLESTEP) kvm_emulate_cpuid will need to propagate a return value. Signed-off-by: NKyle Huey <khuey@kylehuey.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 25 11月, 2016 2 次提交
-
-
由 Tom Lendacky 提交于
Update the I/O interception support to add the kvm_fast_pio_in function to speed up the in instruction similar to the out instruction. Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Tom Lendacky 提交于
AMD hardware adds two additional bits to aid in nested page fault handling. Bit 32 - NPF occurred while translating the guest's final physical address Bit 33 - NPF occurred while translating the guest page tables The guest page tables fault indicator can be used as an aid for nested virtualization. Using V0 for the host, V1 for the first level guest and V2 for the second level guest, when both V1 and V2 are using nested paging there are currently a number of unnecessary instruction emulations. When V2 is launched shadow paging is used in V1 for the nested tables of V2. As a result, KVM marks these pages as RO in the host nested page tables. When V2 exits and we resume V1, these pages are still marked RO. Every nested walk for a guest page table is treated as a user-level write access and this causes a lot of NPFs because the V1 page tables are marked RO in the V0 nested tables. While executing V1, when these NPFs occur KVM sees a write to a read-only page, emulates the V1 instruction and unprotects the page (marking it RW). This patch looks for cases where we get a NPF due to a guest page table walk where the page was marked RO. It immediately unprotects the page and resumes the guest, leading to far fewer instruction emulations when nested virtualization is used. Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 03 11月, 2016 1 次提交
-
-
由 Paolo Bonzini 提交于
Since commit a545ab6a ("kvm: x86: add tsc_offset field to struct kvm_vcpu_arch", 2016-09-07) the offset between host and L1 TSC is cached and need not be fished out of the VMCS or VMCB. This means that we can implement adjust_tsc_offset_guest and read_l1_tsc entirely in generic code. The simplification is particularly significant for VMX code, where vmx->nested.vmcs01_tsc_offset was duplicating what is now in vcpu->arch.tsc_offset. Therefore the vmcs01_tsc_offset can be dropped completely. More importantly, this fixes KVM_GET_CLOCK/KVM_SET_CLOCK which, after commit 108b249c ("KVM: x86: introduce get_kvmclock_ns", 2016-09-01) called read_l1_tsc while the VMCS was not loaded. It thus returned bogus values on Intel CPUs. Fixes: 108b249cReported-by: NRoman Kagan <rkagan@virtuozzo.com> Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 20 9月, 2016 1 次提交
-
-
由 Paolo Bonzini 提交于
Lately tsc page was implemented but filled with empty values. This patch setup tsc page scale and offset based on vcpu tsc, tsc_khz and HV_X64_MSR_TIME_REF_COUNT value. The valid tsc page drops HV_X64_MSR_TIME_REF_COUNT msr reads count to zero which potentially improves performance. Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: NPeter Hornyack <peterhornyack@google.com> Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> [Computation of TSC page parameters rewritten to use the Linux timekeeper parameters. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 16 9月, 2016 2 次提交
-
-
由 Luiz Capitulino 提交于
The TSC offset can now be read directly from struct kvm_arch_vcpu. Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Luiz Capitulino 提交于
A future commit will want to easily read a vCPU's TSC offset, so we store it in struct kvm_arch_vcpu_arch for easy access. Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 08 9月, 2016 3 次提交
-
-
由 Suravee Suthikulpanit 提交于
This patch introduces avic_ga_log_notifier, which will be called by IOMMU driver whenever it handles the Guest vAPIC (GA) log entry. Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suravee Suthikulpanit 提交于
Introduces per-VM AVIC ID and helper functions to manage the IDs. Currently, the ID will be used to implement 32-bit AVIC IOMMU GA tag. The ID is 24-bit one-based indexing value, and is managed via helper functions to get the next ID, or to free an ID once a VM is destroyed. There should be no ID conflict for any active VMs. Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suraj Jitindar Singh 提交于
vms and vcpus have statistics associated with them which can be viewed within the debugfs. Currently it is assumed within the vcpu_stat_get() and vm_stat_get() functions that all of these statistics are represented as u32s, however the next patch adds some u64 vcpu statistics. Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly. Since vcpu statistics are per vcpu, they will only be updated by a single vcpu at a time so this shouldn't present a problem on 32-bit machines which can't atomically increment 64-bit numbers. However vm statistics could potentially be updated by multiple vcpus from that vm at a time. To avoid the overhead of atomics make all vm statistics ulong such that they are 64-bit on 64-bit systems where they can be atomically incremented and are 32-bit on 32-bit systems which may not be able to atomically increment 64-bit numbers. Modify vm_stat_get() to expect ulongs. Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: NDavid Matlack <dmatlack@google.com> Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 14 7月, 2016 5 次提交
-
-
由 Radim Krčmář 提交于
kzalloc was replaced with kvm_kvzalloc to allow non-contiguous areas and rcu had to be modified to cope with it. The practical limit for KVM_MAX_VCPU_ID right now is INT_MAX, but lower value was chosen in case there were bugs. 1023 is sufficient maximum APIC ID for 288 VCPUs. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
288 is in high demand because of Knights Landing CPU. We cannot set the limit to 640k, because that would be wasting space. Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to KVM_CAP_X2APIC_API. The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode. The enableable capability is needed in order to support standard x2APIC and remain backward compatible. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> [Expand kvm_apic_mda comment. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
KVM_CAP_X2APIC_API is a capability for features related to x2APIC enablement. KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to extend APIC ID in get/set ioctl and MSI addresses to 32 bits. Both are needed to support x2APIC. The feature has to be enableable and disabled by default, because get/set ioctl shifted and truncated APIC ID to 8 bits by using a non-standard protocol inspired by xAPIC and the change is not backward-compatible. Changes to MSI addresses follow the format used by interrupt remapping unit. The upper address word, that used to be 0, contains upper 24 bits of the LAPIC address in its upper 24 bits. Lower 8 bits are reserved as 0. Using the upper address word is not backward-compatible either as we didn't check that userspace zeroed the word. Reserved bits are still not explicitly checked, but non-zero data will affect LAPIC addresses, which will cause a bug. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
x2APIC supports up to 2^32-1 LAPICs, but most guest in coming years will probably has fewer VCPUs. Dynamic size saves memory at the cost of turning one constant into a variable. apic_map mutex had to be moved before allocation to avoid races with cpu hotplug. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-