- 20 November 2014, 6 commits
-
-
By Nadav Amit
In __linearize there is a check of whether masking of the linear address is needed. It occurs immediately after a switch statement that evaluates the same condition. Merge them. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
When SS is used with a non-canonical address, an #SS exception is generated on real hardware. The KVM emulator raises #GP instead. Fix it to behave like a real x86 CPU. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
If a branch (e.g., jmp, ret) causes a limit violation because the target IP > limit, the #GP exception occurs before the branch. In other words, the RIP pushed on the stack should be that of the branch and not that of the target. To do so, we can call __linearize with the new EIP, which also saves us the code that performs the canonical address checks. In the case of assigning an EIP >= 2^32 (when switching cs.l), we are also safe, as __linearize will check that the new EIP does not exceed the limit and trigger #GP(0) otherwise. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
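For illustration, a minimal C sketch of the ordering this change implies; the helper name and parameters are illustrative placeholders, not the kernel's actual identifiers:

    #include <stdint.h>

    /* Illustrative sketch: validate the branch target against the CS limit
     * (as __linearize would) before committing it, so a failing branch
     * faults while RIP still points at the branch instruction itself. */
    static int assign_eip(uint64_t new_eip, uint64_t cs_limit, uint64_t *eip)
    {
        if (new_eip > cs_limit)
            return -1;      /* caller injects #GP(0); *eip is left unchanged */
        *eip = new_eip;     /* commit only after the check passes */
        return 0;
    }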
-
By Nadav Amit
When a segment is accessed, real hardware does not perform any privilege level checks. In contrast, the KVM emulator does. This causes some discrepancies from real hardware. For instance, reading from a readable code segment may fail due to incorrect segment checks. In addition, it introduces unnecessary overhead. To quote Intel SDM 5.5 ("Privilege Levels"): "Privilege levels are checked when the segment selector of a segment descriptor is loaded into a segment register." The SDM never mentions privilege level checks during memory access, except for loading far pointers in section 5.10 ("Pointer Validation"). Those are actually segment selector loads and are emulated similarly (i.e., regardless of the __linearize checks). This behavior was also verified using sysexit: a data segment whose DPL=0 was loaded, and after sysexit (CPL=3) it is still accessible. Therefore, all the privilege level checks in __linearize are removed. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
When performing a segmented read/write in the emulator for stack operations, it ignores the stack size and uses ad_bytes as the indication of the pointer size. As a result, a wrong address may be accessed. To fix this behavior, we can remove the masking of the address in __linearize and perform it beforehand. It is already done for the operands (so currently it is inefficiently done twice). It is missing in two cases: 1. When using rip_relative. 2. In fetch_bit_operand, which changes the address. This patch masks the address on these two occasions and removes the masking from __linearize. Note that it does not mask EIP during fetch. In protected/legacy mode, a code fetch when RIP >= 2^32 should result in #GP and not wrap around. Since we perform limit checks within __linearize, this is the expected behavior. Partial revert of commit 518547b3 (KVM: x86: Emulator does not calculate address correctly, 2014-09-30). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
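As a rough illustration of the pre-masking described above, here is a hedged, self-contained C sketch (the helper name mirrors but is not necessarily identical to the emulator's):

    #include <stdint.h>

    /* Illustrative sketch: mask an effective address down to the address
     * size (ad_bytes) before it is linearized, mirroring what the patch
     * does for rip-relative operands and in fetch_bit_operand. */
    static uint64_t address_mask(unsigned ad_bytes, uint64_t ea)
    {
        if (ad_bytes == 8)
            return ea;                               /* 64-bit addressing: no mask */
        return ea & ((1ULL << (ad_bytes * 8)) - 1);  /* 16/32-bit wrap-around */
    }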
-
By Nadav Amit
Commit 10e38fc7cab6 ("KVM: x86: Emulator flag for instruction that only support 16-bit addresses in real mode") introduced NoBigReal for instructions such as MONITOR. Apparently, the Intel SDM description that led to this patch is misleading. Since no instruction is using NoBigReal, it is safe to remove it until we fully understand what the SDM means. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 18 November 2014, 2 commits
-
-
By Tiejun Chen
MMIO_MAX_GEN is the same as MMIO_GEN_MASK. Use only one. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Tiejun Chen
Instead, just use PFERR_{FETCH, PRESENT, WRITE}_MASK inside handle_ept_violation() for slightly better code. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 17 November 2014, 5 commits
-
-
By Nadav Amit
apic_find_highest_irr assumes irr_pending is set if any vector in APIC_IRR is set. If this assumption is broken and apicv is disabled, the injection of interrupts may be deferred until another interrupt is delivered to the guest. Ultimately, if no other interrupt should be injected to that vCPU, the pending interrupt may be lost. Commit 56cc2406 ("KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use") changed the behavior of apic_clear_irr so that irr_pending is cleared after setting the APIC_IRR vector. After this commit, if apic_set_irr and apic_clear_irr run simultaneously, a race may occur, leaving an APIC_IRR vector set while irr_pending is cleared. In the following example, assume a single vector is set in IRR prior to calling apic_clear_irr:
    apic_set_irr                          apic_clear_irr
    ------------                          --------------
    apic->irr_pending = true;
                                          apic_clear_vector(...);
                                          vec = apic_search_irr(apic);
                                          // => vec == -1
    apic_set_vector(...);
                                          apic->irr_pending = (vec != -1);
                                          // => apic->irr_pending == false
Nonetheless, it appears the race might even occur prior to this commit:
    apic_set_irr                          apic_clear_irr
    ------------                          --------------
    apic->irr_pending = true;
                                          apic->irr_pending = false;
                                          apic_clear_vector(...);
                                          if (apic_search_irr(apic) != -1)
                                              apic->irr_pending = true;
                                          // => apic->irr_pending == false
    apic_set_vector(...);
Fix this issue by: 1. Restoring the previous behavior of apic_clear_irr: clear irr_pending, call apic_clear_vector, and then, if APIC_IRR is non-zero, set irr_pending. 2. In apic_set_irr: first call apic_set_vector, then set irr_pending. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
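The fixed ordering can be pictured with a small, self-contained C sketch (simplified single-word IRR, not the actual lapic.c code):

    #include <stdbool.h>
    #include <stdint.h>

    struct apic_irr { uint64_t irr; bool irr_pending; };

    /* Invariant after the fix: irr_pending is never left false while a
     * vector is still set in irr. */
    static void apic_set_irr_fixed(struct apic_irr *a, int vec)
    {
        a->irr |= 1ULL << vec;     /* 1. set the vector first           */
        a->irr_pending = true;     /* 2. only then publish irr_pending  */
    }

    static void apic_clear_irr_fixed(struct apic_irr *a, int vec)
    {
        a->irr_pending = false;    /* 1. clear the hint conservatively  */
        a->irr &= ~(1ULL << vec);  /* 2. drop the vector                */
        if (a->irr)                /* 3. re-set the hint if bits remain */
            a->irr_pending = true;
    }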
-
By Paolo Bonzini
Logical destination mode can be used to send NMI IPIs even when all APICs are software disabled, so even if all APICs are software disabled we should still look at the DFRs. The DFRs should then all be the same, even if some or all APICs are software disabled. However, the SDM does not say this, so tweak the logic as follows: - If one APIC is enabled and has LDR != 0, use that one to build the map. This picks the right DFR in case an OS is only setting it for the software-enabled APICs, or in case an OS is using logical addressing on some APICs while leaving the rest in reset state (using LDR was suggested by Radim). - If all APICs are disabled, pick a random one to build the map. We use the last one with LDR != 0 for simplicity. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
Currently, the APIC logical map does not consider VCPUs whose local APIC is software-disabled. However, NMIs, INIT, etc. should still be delivered to such VCPUs. Therefore, the APIC mode should be determined first, and the map should then be constructed considering all VCPUs. To address this issue, first find the APIC mode, and only then construct the logical map. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Paolo Bonzini
The update_memslots invocation is only needed in one case. Make the code clearer by moving it to __kvm_set_memory_region, and removing the wrapper around insert_memslot. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Paolo Bonzini
The two kmemdup invocations can be unified. I find that the new placement of the comment makes it easier to see what happens. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 14 November 2014, 3 commits
-
-
By Paolo Bonzini
This completes the optimization from the previous patch, by removing the KVM_MEM_SLOTS_NUM-iteration loop from insert_memslot. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Igor Mammedov
memslots is a sorted array. When a slot is changed, heapsort (lib/sort.c) would take O(n log n) time to update it; an optimized insertion sort will only cost O(n) on an array with just one item out of order. Replace sort() with a custom sort that takes advantage of the memslots usage pattern and the known position of the changed slot. Performance of 128 memslot insertions with gradually increasing size (the worst case): heap sort max 249747 cycles, custom sort max 2500 cycles, i.e. the custom sort takes ~98% less time than the original update. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
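For illustration, a self-contained C sketch of such a single-element insertion into an array kept sorted by descending npages (field names are simplified placeholders, not kvm's memslot layout):

    struct slot { unsigned long npages; int id; };

    /* Re-position one changed slot inside an otherwise sorted array:
     * O(n) moves instead of a full O(n log n) sort. */
    static void insert_changed_slot(struct slot *slots, int nslots, int changed)
    {
        struct slot tmp = slots[changed];
        int i = changed;

        while (i > 0 && slots[i - 1].npages < tmp.npages) {
            slots[i] = slots[i - 1];   /* shift the slot toward the front */
            i--;
        }
        while (i < nslots - 1 && slots[i + 1].npages > tmp.npages) {
            slots[i] = slots[i + 1];   /* or toward the back */
            i++;
        }
        slots[i] = tmp;                /* drop it into its final position */
    }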
-
By Igor Mammedov
With the 3 private slots, this gives us 512 slots total. The motivation is to support, in addition to assigned devices, more memory hotplug slots, where 1 slot is used by a hotplugged memory stick. It will allow supporting up to 256 hotplug memory slots and leave 253 slots for assigned devices and other devices that use them. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
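The resulting arithmetic looks roughly like the following sketch (the exact macro names and values in arch/x86/include/asm/kvm_host.h should be checked against the tree; this is only an illustration):

    /* 509 user slots + 3 private slots = 512 total, a power of two. */
    #define KVM_USER_MEM_SLOTS     509
    #define KVM_PRIVATE_MEM_SLOTS    3
    #define KVM_MEM_SLOTS_NUM      (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)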
-
- 13 November 2014, 1 commit
-
-
By Chris J Arges
When running the tsc_adjust kvm-unit-test on an AMD processor with the IA32_TSC_ADJUST feature enabled, the WARN_ON in svm_adjust_tsc_offset can be triggered. This WARN_ON checks for a negative adjustment in case __scale_tsc is called; however, it may trigger unnecessary warnings. This patch moves the WARN_ON so that it triggers only if __scale_tsc will actually be called from svm_adjust_tsc_offset. In addition, make adj in kvm_set_msr_common an s64, since it can hold signed values. Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 12 November 2014, 2 commits
-
-
By Andy Lutomirski
There's nothing to switch if the host and guest values are the same. I am unable to find evidence that this makes any difference whatsoever. Signed-off-by: Andy Lutomirski <luto@amacapital.net> [I could see a difference on Nehalem. From 5 runs (cycles):
    userspace exit, guest!=host    12200 11772 12130 12164 12327
    userspace exit, guest=host     11983 11780 11920 11919 12040
    lightweight exit, guest!=host   3214  3220  3238  3218  3337
    lightweight exit, guest=host    3178  3193  3193  3187  3220
This passes the t-test with 99% confidence for userspace exit, 98.5% confidence for lightweight exit. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Andy Lutomirski
At least on Sandy Bridge, letting the CPU switch IA32_EFER is much faster than switching it manually. I benchmarked this using the vmexit kvm-unit-test (single run, but GOAL multiplied by 5 to do more iterations):
    Test                              Before     After     Change
    cpuid                               2000      1932     -3.40%
    vmcall                              1914      1817     -5.07%
    mov_from_cr8                          13        13      0.00%
    mov_to_cr8                            19        19      0.00%
    inl_from_pmtimer                   19164     10619    -44.59%
    inl_from_qemu                      15662     10302    -34.22%
    inl_from_kernel                     3916      3802     -2.91%
    outl_to_kernel                      2230      2194     -1.61%
    mov_dr                               172       176      2.33%
    ipi                            (skipped) (skipped)
    ipi+halt                       (skipped) (skipped)
    ple-round-robin                       13        13      0.00%
    wr_tsc_adjust_msr                   1920      1845     -3.91%
    rd_tsc_adjust_msr                   1892      1814     -4.12%
    mmio-no-eventfd:pci-mem            16394     11165    -31.90%
    mmio-wildcard-eventfd:pci-mem       4607      4645      0.82%
    mmio-datamatch-eventfd:pci-mem      4601      4610      0.20%
    portio-no-eventfd:pci-io           11507      7942    -30.98%
    portio-wildcard-eventfd:pci-io      2239      2225     -0.63%
    portio-datamatch-eventfd:pci-io     2250      2234     -0.71%
I haven't explicitly computed the significance of these numbers, but this isn't subtle. Signed-off-by: Andy Lutomirski <luto@amacapital.net> [The results were reproducible on all of Nehalem, Sandy Bridge and Ivy Bridge. The slowness of manual switching is because writing to EFER with WRMSR triggers a TLB flush, even if the only bit you're touching is SCE (so the page table format is not affected). Doing the write as part of vmentry/vmexit, instead, does not flush the TLB, probably because all processors that have EPT also have VPID. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 10 November 2014, 1 commit
-
-
By Paolo Bonzini
PCIDs are only supported in 64-bit mode. No need to clear bit 63 of CR3 unless the host is 64-bit. Reported by Fengguang Wu's autobuilder. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 08 November 2014, 7 commits
-
-
By David Matlack
The new trace event records: * the id of the vcpu being updated * the pvclock_vcpu_time_info struct being written to guest memory This is useful for debugging pvclock bugs, such as the bug fixed by "[PATCH] kvm: x86: Fix kvm clock versioning.". Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Owen Hofmann
kvm updates the version number for the guest paravirt clock structure by incrementing the version of its private copy. It does not read the guest version, so it will write version = 2 in the first update for every new VM, including after restoring a saved state. If guest state is saved while the clock is being read, the guest could read and accept struct fields and guest TSC from two different updates. This changes the code to increment the guest version and write it back. Signed-off-by: Owen Hofmann <osh@google.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
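The version handling follows the usual seqcount-style pvclock protocol; here is a hedged, self-contained C sketch of the fixed flow (struct and field names simplified, barriers elided):

    #include <stdint.h>

    struct pvclock_time {
        uint32_t version;        /* odd while an update is in progress */
        uint64_t tsc_timestamp;
        uint64_t system_time;
    };

    /* Base the new version on what the guest currently sees, so a restored
     * VM keeps a monotonically increasing version instead of restarting at 2. */
    static void update_guest_clock(struct pvclock_time *guest,
                                   uint64_t tsc_timestamp, uint64_t system_time)
    {
        uint32_t v = guest->version + 1;

        guest->version = v | 1;          /* odd: update in flight */
        /* write barrier here in real code */
        guest->tsc_timestamp = tsc_timestamp;
        guest->system_time   = system_time;
        /* write barrier here in real code */
        guest->version = (v | 1) + 1;    /* even again: update complete */
    }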
-
By Nadav Amit
Commit 3b32004a ("KVM: x86: movnti minimum op size of 32-bit is not kept") did not fully fix the minimum operand size of MOVNTI emulation. MOVNTI may still be mistakenly performed using a 16-bit opsize. This patch adds a No16 flag to mark instructions that do not support a 16-bit operand size. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Marcelo Tosatti
When the guest writes to the TSC, the masterclock TSC copy must be updated as well, along with the TSC_OFFSET update; otherwise a negative tsc_timestamp is calculated at kvm_guest_time_update. Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to "if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)" becomes redundant, so remove the do_request boolean and collapse everything into a single condition. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
Now that KVM injects #UD on "unhandlable" errors, it makes better sense to return such an error on sysenter instead of directly injecting #UD to the guest. This makes it easier to track the unhandlable cases the emulator does not support. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
APIC base relocation is unsupported by KVM. If anyone uses it, the least we should do is report a warning in the hypervisor. Note that kvm-unit-tests uses this feature for some reason, so running the tests triggers the warning. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
Commit 7fe864dc (KVM: x86: Mark VEX-prefix instructions emulation as unimplemented, 2014-06-02) marked VEX instructions as such in protected mode. VEX-prefix instructions are likewise not supported in real mode and VM86, but should cause #UD instead of being decoded as LES/LDS. Fix this behaviour to be consistent with real hardware. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> [Check for mod == 3, rather than 2 or 3. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
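The decode decision Paolo refers to can be sketched as follows (hedged illustration, not the emulator's actual decode path):

    #include <stdbool.h>
    #include <stdint.h>

    /* For opcodes C4h/C5h outside protected mode: a ModRM mod field of 11b
     * (register form) is invalid for LES/LDS, so real hardware treats the
     * byte as a VEX prefix; the emulator then raises #UD since VEX is
     * unimplemented.  mod values 0-2 keep the legacy LES/LDS decode. */
    static bool c4_c5_is_vex(uint8_t byte_after_opcode)
    {
        return (byte_after_opcode >> 6) == 3;
    }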
-
- 07 November 2014, 13 commits
-
-
By Nadav Amit
Task-switch emulation checks the privilege level prior to performing the task switch. This check is incorrect in the case of task gates, in which the tss.dpl is ignored, and can cause superfluous exceptions. Moreover, this check is unnecessary, since the CPU checks the privilege levels prior to exiting. Intel SDM 25.4.2 says "If CALL or JMP accesses a TSS descriptor directly outside IA-32e mode, privilege levels are checked on the TSS descriptor" prior to exiting. AMD 15.14.1 says "The intercept is checked before the task switch takes place but after the incoming TSS and task gate (if one was involved) have been checked for correctness." This patch removes the CPL checks for CALL and JMP. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
When emulating LTR/LDTR/LGDT/LIDT, #GP should be injected if the base is non-canonical. Otherwise, VM-entry will fail. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
LGDT and LIDT emulation logic is almost identical. Merge the logic into a single point to avoid redundancy. This will be used by the next patch, which will ensure the bases of the loaded GDTR and IDTR are canonical. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
If the emulation ends in a fault, eflags should not be updated. However, several instruction emulations (actually all the fastops) currently update eflags even if the fault was detected afterwards (e.g., #PF during writeback). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
Although the Intel SDM mentions that bit 63 is reserved, MOV to CR3 can have bit 63 set. As the Intel SDM states in section 4.10.4 "Invalidation of TLBs and Paging-Structure Caches": "MOV to CR3. ... If CR4.PCIDE = 1 and bit 63 of the instruction's source operand is 0 ..." In other words, bit 63 is not reserved. The KVM emulator currently considers bit 63 as reserved. Fix it. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
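A hedged sketch of the check this implies (illustration only; it assumes that with CR4.PCIDE=0 bit 63 stays reserved, while with PCIDE=1 it is a no-flush hint that is consumed rather than stored):

    #include <stdbool.h>
    #include <stdint.h>

    #define CR3_PCID_NOFLUSH (1ULL << 63)

    static bool cr3_write_is_valid(uint64_t val, bool pcide, uint64_t *new_cr3)
    {
        if (val & CR3_PCID_NOFLUSH) {
            if (!pcide)
                return false;             /* reserved bit set -> #GP */
            val &= ~CR3_PCID_NOFLUSH;     /* hint only, not kept in CR3 */
        }
        *new_cr3 = val;
        return true;
    }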
-
By Nadav Amit
According to the Intel SDM, a push of a segment selector is done in the following manner: "if the operand size is 32-bits, either a zero-extended value is pushed on the stack or the segment selector is written on the stack using a 16-bit move. For the last case, all recent Core and Atom processors perform a 16-bit move, leaving the upper portion of the stack location unmodified." This patch modifies the behavior to match the Core behavior. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
CMPS and SCAS instructions are evaluated in the wrong order. For reference (for CMPS), see http://www.fermimn.gov.it/linux/quarta/x86/cmps.htm : "Note that the direction of subtraction for CMPS is [SI] - [DI] or [ESI] - [EDI]. The left operand (SI or ESI) is the source and the right operand (DI or EDI) is the destination. This is the reverse of the usual Intel convention in which the left operand is the destination and the right operand is the source." Introduce em_cmp_r for this purpose, which performs the comparison in reverse order using the fastop infrastructure to avoid a wrapper function. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
In 64-bit mode, SYSCALL emulation currently clears eflags according to MSR_SYSCALL_MASK. However, on bare metal, eflags[1], which is fixed to one, cannot be cleared, even if MSR_SYSCALL_MASK masks the bit. This wrong behavior may result in a failed VM-entry, as VT disallows entry with eflags[1] cleared. This patch sets the bit after masking eflags on syscall. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
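In code, the fix amounts to forcing the fixed bit back on after applying the mask; a minimal sketch (constant name assumed, not the kernel's):

    #include <stdint.h>

    #define EFLAGS_FIXED_BIT (1ULL << 1)   /* eflags[1] always reads as 1 */

    static uint64_t syscall_mask_eflags(uint64_t eflags, uint64_t syscall_mask)
    {
        eflags &= ~syscall_mask;        /* SYSCALL clears the masked bits ...  */
        eflags |= EFLAGS_FIXED_BIT;     /* ... but bit 1 can never really be 0 */
        return eflags;
    }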
-
By Nadav Amit
On x86, you can only MOV-sreg to memory with either a 16-bit or 64-bit size. In contrast, KVM may write to 32-bit memory on MOV-sreg. This patch fixes KVM's behavior and sets the destination operand size to two if the destination is memory. When the destination is a register and the operand size is 32 bits, the high 16 bits are filled with zero on modern CPUs. This is handled correctly. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
x86 debug registers hold a linear address. Therefore, breakpoint detection should consider CS.base and check whether the instruction's linear address equals (CS.base + RIP). This patch introduces a function to evaluate the RIP linear address and uses it for breakpoint detection. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
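The linear-RIP computation can be summarized by this self-contained sketch (helper name illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    /* Outside 64-bit mode the debug registers must be compared against
     * CS.base + RIP truncated to 32 bits, not against RIP alone. */
    static uint64_t linear_rip(uint64_t cs_base, uint64_t rip, bool long_mode)
    {
        if (long_mode)
            return rip;                    /* CS.base is treated as 0 */
        return (uint32_t)(cs_base + rip);  /* 32-bit wrap-around      */
    }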
-
By Nadav Amit
DR6[0:3] (previous breakpoint indications) are cleared when #DB is injected during handle_exception, just as real hardware does. Similarly, handle_dr should clear DR6[0:3]. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Nadav Amit
It should clear B0-B3 and set BD. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
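Concretely, the DR6 update on such an intercept looks like this sketch (bit positions per the SDM; not the exact kvm code):

    #include <stdint.h>

    #define DR6_B0_B3  0xfULL         /* breakpoint condition bits B0..B3 */
    #define DR6_BD     (1ULL << 13)   /* debug-register access detected   */

    static uint64_t dr6_on_general_detect(uint64_t dr6)
    {
        dr6 &= ~DR6_B0_B3;   /* clear stale B0-B3 */
        dr6 |= DR6_BD;       /* report the debug-register access */
        return dr6;
    }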
-
By Nadav Amit
Real-mode exceptions do not deliver an error code. As can be seen in Intel SDM volume 2, real-mode exceptions do not have parentheses, which would indicate an error code. To avoid significant changes to the code, the error code is "removed" during exception queueing. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
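A minimal sketch of the queueing-time adjustment (predicate and parameter names assumed, not the kernel's):

    #include <stdbool.h>

    /* An exception destined for a guest running in real mode carries no
     * error code, so drop it at the single queueing point. */
    static void fixup_exception_error_code(bool guest_in_protected_mode,
                                           bool *has_error_code)
    {
        if (!guest_in_protected_mode)
            *has_error_code = false;
    }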
-