- 27 December 2011 (5 commits)
-
Committed by Chris Wright
The host-side pv mmu support was marked for feature removal in January 2011. It is not in use, is slower than shadow or hardware-assisted paging, and is a maintenance burden. It's November 2011; time to remove it.
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
Fast-prefetch sptes for the unsync shadow page on the invlpg path.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
Remove the code duplicated between emulator_pio_in_emulated() and emulator_pio_out_emulated().
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
If the emulation is caused by a #PF and the faulting instruction is not a page-table-writing instruction, the VM-exit was caused by shadow page protection; we can zap the shadow page and retry the instruction directly. The idea is from Avi.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Nadav Har'El
This patch adds a new vcpu->requests bit, KVM_REQ_IMMEDIATE_EXIT. This bit requests that when next entering the guest, we should run it only for as little as possible, and exit again.

We use this new option in nested VMX: when L1 launches L2, but L0 wishes L1 to continue running so it can inject an event into it, we unfortunately cannot just pretend to have run L2 for a little while. We must really launch L2, otherwise certain one-off vmcs12 parameters (namely, L1 injection into L2) will be lost. So the existing code runs L2 in this case. But L2 could potentially run for a long time until it exits, and the injection into L1 will be delayed. The new KVM_REQ_IMMEDIATE_EXIT allows us to request that L2 be entered, as necessary, but exit as soon as possible after entry.

Our implementation of this request uses smp_send_reschedule() to send a self-IPI with interrupts disabled. The interrupts remain disabled until the guest is entered, and then, after the entry is complete (often including processing an injection and jumping to the relevant handler), the physical interrupt is noticed and causes an exit.

On recent Intel processors, we could have achieved the same goal by using MTF instead of a self-IPI. Another technique worth considering in the future is to use VM_EXIT_ACK_INTR_ON_EXIT and a highest-priority-vector IPI, to slightly improve performance by avoiding the useless interrupt handler that ends up being called when smp_send_reschedule() is used.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
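A minimal sketch of the entry-path handling described above (kvm_check_request(), local_irq_disable(), and smp_send_reschedule() are real kernel APIs; the surrounding function is illustrative, not the exact patch):

```c
static void vcpu_enter_guest_sketch(struct kvm_vcpu *vcpu)
{
	bool req_immediate_exit =
		kvm_check_request(KVM_REQ_IMMEDIATE_EXIT, vcpu);

	local_irq_disable();
	if (req_immediate_exit)
		/*
		 * Self-IPI with interrupts disabled: the IPI stays pending
		 * across the VM entry and forces an exit as soon as the
		 * entry (including any event injection into L2) completes.
		 */
		smp_send_reschedule(vcpu->cpu);

	/* ... hardware VM entry/run happens here; the pending IPI
	 * forces an immediate VM exit ... */

	local_irq_enable();
}
```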
-
- 26 December 2011 (1 commit)
-
Committed by Jan Kiszka
Unlike all of the other cpuid bits, the TSC deadline timer bit is set unconditionally, regardless of what userspace wants. This is broken in several ways:

- if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC deadline timer feature, a guest that uses the feature will break
- live migration to older host kernels that don't support the TSC deadline timer will pull the feature from under the guest's feet, breaking it
- guests that are broken wrt the feature will fail

Fix by not enabling the feature automatically; instead report it to userspace. Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee will be called, we expose it via KVM_CAP_TSC_DEADLINE_TIMER and not KVM_GET_SUPPORTED_CPUID.

This fixes the Illumos guest kernel, which uses the TSC deadline timer feature.

[avi: add the KVM_CAP + documentation]

Reported-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
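Since the capability is reported through KVM_CHECK_EXTENSION rather than KVM_GET_SUPPORTED_CPUID, userspace probes it like any other cap. A small example (the ioctl and constant are the real uAPI; error handling is minimal):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDONLY);
	if (kvm < 0)
		return 1;

	/* 1 means the host can emulate the TSC deadline timer;
	 * userspace then sets the CPUID bit itself via KVM_SET_CPUID2. */
	int has = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER);
	printf("TSC deadline timer: %s\n", has == 1 ? "supported" : "absent");
	return 0;
}
```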
-
- 21 October 2011 (1 commit)
-
Committed by Joerg Roedel
With per-bus iommu_ops, the iommu_found() function needs to work on a bus_type too. This patch adds a bus_type parameter to the function and converts all call sites. The function is also renamed to iommu_present(), because it now checks whether an iommu is present for a given bus rather than checking for a global iommu.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
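The converted call pattern, sketched for the PCI bus (iommu_present() and pci_bus_type are the real symbols; the wrapper function is illustrative):

```c
#include <linux/iommu.h>
#include <linux/pci.h>

/* Old code asked the global iommu_found(); callers now ask per bus. */
static bool kvm_can_assign_device(void)
{
	return iommu_present(&pci_bus_type);
}
```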
-
- 05 October 2011 (1 commit)
-
Committed by Liu, Jinsong
This patch emulates the lapic TSC deadline timer for the guest: the TSC deadline timer capability is enumerated by CPUID, the deadline timer mode is enabled through the lapic MMIO interface, and the timer is started by WRMSR.

[jan: use do_div()]
[avi: fix for !irqchip_in_kernel()]
[marcelo: another fix for !irqchip_in_kernel()]

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
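From the guest's point of view, the programming sequence looks roughly like the following sketch (MSR 0x6e0 and the LVT timer register at offset 0x320 are architectural; the mapping of the lapic MMIO page is assumed to exist already):

```c
#define MSR_IA32_TSC_DEADLINE	0x6e0		/* architectural MSR */
#define APIC_LVTT		0x320		/* LVT timer register offset */
#define APIC_TIMER_MODE_TSC	(2u << 17)	/* timer mode 10b = TSC deadline */

static inline void wrmsr64(unsigned int msr, unsigned long long val)
{
	asm volatile("wrmsr" :: "c"(msr),
		     "a"((unsigned int)val), "d"((unsigned int)(val >> 32)));
}

/* Arm a one-shot deadline: the interrupt fires when the TSC passes it. */
static void arm_tsc_deadline(volatile unsigned int *lapic, unsigned int vector,
			     unsigned long long deadline)
{
	lapic[APIC_LVTT / 4] = vector | APIC_TIMER_MODE_TSC;
	wrmsr64(MSR_IA32_TSC_DEADLINE, deadline);
}
```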
-
- 26 September 2011 (11 commits)
-
Committed by Avi Kivity
If simultaneous NMIs happen, we're supposed to queue the second and subsequent ones (collapsing them), but currently we sometimes collapse the second into the first. Fix by using a counter for pending NMIs instead of a bool; since the counter limit depends on whether the processor is currently in an NMI handler, which can only be checked in vcpu context (via the NMI mask), we add a new KVM_REQ_NMI to request recalculation of the counter.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
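The collapsing rule reduces to clamping the counter against what the CPU could actually latch. A hedged sketch (the get_nmi_mask() hook and kvm_make_request() are real; the exact recalculation in the patch may differ):

```c
static void process_nmi_sketch(struct kvm_vcpu *vcpu)
{
	/*
	 * Inside an NMI handler (NMIs masked) the CPU can latch at most
	 * one further NMI; otherwise one can be delivered and one latched.
	 */
	unsigned int limit = kvm_x86_ops->get_nmi_mask(vcpu) ? 1 : 2;

	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
	kvm_make_request(KVM_REQ_EVENT, vcpu);
}
```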
-
Committed by Nadav Har'El
KVM assumed in several places that reading the TSC MSR returns the value for L1. This is incorrect, because when L2 is running, the correct TSC-read exit emulation is to return L2's value. We therefore add a new x86_ops function, read_l1_tsc, for use in places that specifically need to read the L1 TSC, not the TSC of the current level of guest.

Note that one change, of one line in kvm_arch_vcpu_load(), is made redundant by a different patch sent by Zachary Amsden (and not yet applied): kvm_arch_vcpu_load() should not read the guest TSC, and if it didn't, we would of course not have had to change the call of kvm_get_msr() to read_l1_tsc().

[avi: moved callback to kvm_x86_ops tsc block]

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Acked-by: Zachary Amsdem <zamsden@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Marcelo Tosatti
A Windows Server 2008 SP2 checked build with smp > 1 BSODs during boot due to the lack of a microcode update:

*** Assertion failed: The system BIOS on this machine does not properly support the processor. The system BIOS did not load any microcode update. A BIOS containing the latest microcode update is needed for system reliability. (CurrentUpdateRevision != 0)
*** Source File: d:\longhorn\base\hals\update\intelupd\update.c, line 440

Report a non-zero microcode update signature to make it happy.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
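A hedged sketch of the MSR-read side (MSR_IA32_UCODE_REV is the real MSR; the constant is illustrative, since the signature lives in the high 32 bits and merely has to be non-zero):

```c
static int get_msr_ucode_rev(u64 *pdata)
{
	/*
	 * The microcode update signature is read from the high 32 bits
	 * of MSR_IA32_UCODE_REV; any non-zero value satisfies Windows'
	 * (CurrentUpdateRevision != 0) assertion.
	 */
	*pdata = 0x100000000ULL;
	return 0;
}
```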
-
Committed by Takuya Yoshikawa
Return EMULATION_OK/FAILED consistently. Also treat instruction fetch errors, not just X86EMUL_UNHANDLEABLE, as EMULATION_FAILED; although this cannot happen in practice, the current logic would continue the emulation even if the decoder failed to fetch the instruction.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Sasha Levin
Currently the way an IO operation on a bus (PIO/MMIO) is handled is to call the read or write callback of each device registered on the bus until we find a device that handles it. Since the number of devices on a bus can be significant due to ioeventfds and coalesced MMIO zones, this adds a lot of overhead to each IO operation.

Instead of registering devices, we now register ranges that point to a device. Lookup is done with an efficient bsearch instead of a linear search.

A performance test was conducted by comparing the exit count per second with 200 ioeventfds created on one byte while the guest continuously accesses a different byte (triggering usermode exits). Before the patch the guest achieved 259k exits per second; after the patch it does 274k exits per second.

Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
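A sketch of what range-based lookup can look like (bsearch() from <linux/bsearch.h> is real; the struct layout and helper names here are assumptions, not the exact patch):

```c
#include <linux/bsearch.h>

struct kvm_io_range {
	gpa_t addr;
	int len;
	struct kvm_io_device *dev;
};

static int io_range_cmp(const void *key, const void *elt)
{
	const struct kvm_io_range *k = key, *r = elt;

	if (k->addr + k->len <= r->addr)
		return -1;
	if (k->addr >= r->addr + r->len)
		return 1;
	return 0;	/* overlap: this range's device handles the access */
}

/* bus->range is kept sorted at registration time, so lookup is O(log n). */
static struct kvm_io_device *io_bus_find_dev(struct kvm_io_bus *bus,
					     gpa_t addr, int len)
{
	struct kvm_io_range key = { .addr = addr, .len = len };
	struct kvm_io_range *r = bsearch(&key, bus->range, bus->dev_count,
					 sizeof(key), io_range_cmp);

	return r ? r->dev : NULL;
}
```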
-
Committed by Mike Waychison
Commit 0945d4b228 tried to fix the get_msr path for the HV_X64_MSR_APIC_ASSIST_PAGE msr, but was poorly tested. We should return 0 if the read succeeded and pass the value back to the caller via the pdata out argument, not return the value directly.
Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Mike Waychison
"Get" support for the HV_X64_MSR_APIC_ASSIST_PAGE msr was missing, even though it is explicitly enumerated as something the vmm should save in msrs_to_save and reported to userland via the KVM_GET_MSR_INDEX_LIST ioctl.

Add "get" support for HV_X64_MSR_APIC_ASSIST_PAGE. We simply return the guest-visible value of this register, which seems correct, as a set on the register is already validated for us.

Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Sasha Levin
This patch raises the hard limit on the VCPU count to 254. This will allow developers to easily work on scalability and will allow users to test high-VCPU setups without patching the kernel.

To prevent possible issues with current setups, KVM_CAP_NR_VCPUS now returns the recommended VCPU limit (which is still 64); this should be a safe value for everybody, while the new KVM_CAP_MAX_VCPUS returns the hard limit, which is now 254.

Cc: Avi Kivity <avi@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Suggested-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
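The split between the two caps could be reported like this sketch (the constants follow the commit text; the function placement is illustrative):

```c
#define KVM_SOFT_MAX_VCPUS	64	/* recommended, well-tested limit */
#define KVM_MAX_VCPUS		254	/* new hard limit */

static int check_extension_sketch(long ext)
{
	switch (ext) {
	case KVM_CAP_NR_VCPUS:
		return KVM_SOFT_MAX_VCPUS;	/* what users should use */
	case KVM_CAP_MAX_VCPUS:
		return KVM_MAX_VCPUS;		/* what the kernel tolerates */
	default:
		return 0;
	}
}
```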
-
Committed by Xiao Guangrong
Use the abstracted read/write operation to remove the duplicated code.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
The operations of read emulation and write emulation are very similar, so we can abstract them; a later patch uses this abstraction to clean up the duplicated code.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
If the range spans a page boundary, the mmio access can be broken; fix it the same way as write emulation. Also, since we have already obtained the guest physical address, use it to read the guest data directly and avoid walking the guest page table again.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
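The usual fix for boundary-spanning accesses is to split the operation at page granularity. A hedged sketch (the per-chunk helper emulator_rw_one() is hypothetical):

```c
static int emulator_rw_sketch(struct kvm_vcpu *vcpu, unsigned long gva,
			      void *val, unsigned int bytes, bool write)
{
	while (bytes) {
		/* Never let one chunk cross into the next page. */
		unsigned int now = min(bytes,
			(unsigned int)(PAGE_SIZE - (gva & ~PAGE_MASK)));
		int rc = emulator_rw_one(vcpu, gva, val, now, write);

		if (rc != X86EMUL_CONTINUE)
			return rc;
		gva += now;
		val += now;
		bytes -= now;
	}
	return X86EMUL_CONTINUE;
}
```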
-
- 24 July 2011 (5 commits)
-
Committed by Xiao Guangrong
Add tracepoints to trace mmio page faults.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
The idea is from Avi:

| We could cache the result of a miss in an spte by using a reserved bit, and
| checking the page fault error code (or seeing if we get an ept violation or
| ept misconfiguration), so if we get repeated mmio on a page, we don't need to
| search the slot list/tree.
| (https://lkml.org/lkml/2011/2/22/221)

When the page fault is caused by mmio, we cache the info in the shadow page table and also set the reserved bits in the shadow page table, so if the mmio access happens again, we can quickly identify it and emulate it directly.

Searching for an mmio gfn in memslots is heavy since we need to walk all memslots; this feature reduces that cost and also avoids walking the guest page table for the soft mmu.

[jan: fix operator precedence issue]

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
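A hedged sketch of how a "cached mmio" pte can be encoded and recognized (__set_spte() exists in KVM's mmu code; the mask value and helper names are assumptions following the idea above):

```c
static u64 shadow_mmio_mask;	/* reserved-bit pattern picked at init */

static void mark_mmio_spte(u64 *sptep, gfn_t gfn, unsigned int access)
{
	/* Encode gfn + access into an otherwise-invalid (reserved) pte. */
	__set_spte(sptep, shadow_mmio_mask | access | ((u64)gfn << PAGE_SHIFT));
}

static bool is_mmio_spte(u64 spte)
{
	/* A reserved-bit fault whose spte carries the mask is cached mmio. */
	return (spte & shadow_mmio_mask) == shadow_mmio_mask;
}
```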
-
Committed by Xiao Guangrong
The idea is from Avi:

| Maybe it's time to kill off bypass_guest_pf=1. It's not as effective as
| it used to be, since unsync pages always use shadow_trap_nonpresent_pte,
| and since we convert between the two nonpresent_ptes during sync and unsync.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
If the page fault is caused by mmio, we can cache the mmio info; later, we do not need to walk the guest page table again and can quickly recognize it as an mmio fault while we emulate the mmio instruction.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
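A sketch of the per-vcpu cache this implies (the field and helper names mirror the commit series but should be read as assumptions):

```c
static void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
				 gva_t gva, gfn_t gfn, unsigned int access)
{
	/*
	 * Remember the last mmio translation so emulation can skip
	 * the guest page-table walk on the immediately following access.
	 */
	vcpu->arch.mmio_gva = gva & PAGE_MASK;
	vcpu->arch.mmio_access = access;
	vcpu->arch.mmio_gfn = gfn;
}
```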
-
Committed by Xiao Guangrong
Introduce vcpu_mmio_gva_to_gpa() to translate a gva to a gpa; we can use it to clean up the common code between read emulation and write emulation.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 14 July 2011 (1 commit)
-
Committed by Glauber Costa
To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM while the vcpu had meaningful work to do; halt time does not count.

This information is acquired through the run_delay field of the delayacct/schedstats infrastructure, which counts time spent on a runqueue but not running.

Steal time is per-cpu information, so the traditional MSR-based infrastructure is used. A new msr, KVM_MSR_STEAL_TIME, holds the address of the memory area containing the steal time information.

This patch contains the hypervisor part of the steal time infrastructure and can be backported independently of the guest portion.

[avi, yongjie: export delayacct_on, to avoid build failures in some configs]

Signed-off-by: Glauber Costa <glommer@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Rik van Riel <riel@redhat.com>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Yongjie Ren <yongjie.ren@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
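A hedged sketch of the shared record and the host-side update (run_delay is the real schedstats field; the struct layout and the seqcount-style versioning follow the commit's description but should be treated as assumptions):

```c
struct kvm_steal_time {
	u64 steal;	/* ns spent runnable-but-not-running outside the VM */
	u32 version;	/* odd while the host is mid-update */
	u32 flags;
	u32 pad[12];
};

static void record_steal_time_sketch(struct kvm_vcpu *vcpu)
{
	struct kvm_steal_time *st = vcpu->arch.st.shared;	/* mapping assumed */

	st->version += 1;			/* now odd: readers retry */
	smp_wmb();
	st->steal = current->sched_info.run_delay;
	smp_wmb();
	st->version += 1;			/* even again: record stable */
}
```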
-
- 12 July 2011 (15 commits)
-
Committed by Yang, Wei
This patch exposes the ERMS feature to KVM guests. With ERMS, the REP MOVSB/STOSB fast-string operations try to move as much of the data as possible with larger-size loads and stores.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Yang, Wei
This patch exposes the RDWRGSFS bit to KVM guests.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Yang, Wei
This patch adds RDWRGSFS support when setting CR4.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Yang, Wei Y
This patch exposes the DRNG feature to KVM guests. The RDRAND instruction provides software with sequences of random numbers generated from white noise.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Andre Przywara
Commit 123108f1c1aafd51d6a5c79cc04d7999dd88a930 tried to fix KVM's XSAVE valid-feature scanning, but it was wrong: it did not consider the sparse nature of this bitfield, and instead read values from uninitialized members of the entries array. This patch separates subleaf indices from KVM's array indices and fills each entry before querying its value. This fixes AVX support in KVM guests.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
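The shape of the fixed enumeration, sketched as a loop fragment (do_cpuid_1_ent() and KVM_CPUID_FLAG_SIGNIFCANT_INDEX are real KVM names; the body is a reading of the commit message, not the verbatim patch):

```c
/* idx walks the sparse XSAVE subleaf space; i counts emitted entries. */
for (idx = 1, i = 1; *nent < maxnent && idx < 64; ++idx) {
	do_cpuid_1_ent(&entry[i], 0xd, idx);	/* fill before reading */
	if (entry[i].eax == 0)
		continue;	/* subleaf not supported: emit nothing */
	entry[i].flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
	++*nent;
	++i;
}
```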
-
Committed by Yang, Wei Y
This patch masks CPUID leaf 7 ebx against host capability word 9.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
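Sketched against the other patches in this series, the mask could look like the following fragment from the CPUID computation (the F() macro and cpuid_mask() are KVM's; the exact feature list is an assumption):

```c
/* Leaf 7, subleaf 0, EBX features KVM is willing to expose. */
static const u32 kvm_supported_word9_x86_features =
	F(FSGSBASE) | F(SMEP) | F(ERMS);

switch (function) {
case 7:
	if (index == 0) {
		entry->ebx &= kvm_supported_word9_x86_features;
		cpuid_mask(&entry->ebx, 9);	/* clear bits the host lacks */
	} else {
		entry->ebx = 0;
	}
	break;
}
```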
-
Committed by Yang, Wei Y
This patch adds SMEP handling when setting CR4.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity
This saves a lot of pointless casts between x86_emulate_ctxt and decode_cache.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Avi Kivity
The name "eip" conflicts with a field of the same name in x86_emulate_ctxt, into which we plan to fold decode_cache. The name "_eip" is unfortunate, but what's really needed here is refactoring, not a better name.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Takuya Yoshikawa
We should use the local variables ctxt and c when emulate_ctxt and decode appear many times. At the least, we need to be consistent about how we use these within a function.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Nadav Har'El
This patch implements the VMPTRST instruction.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Nadav Har'El
This patch implements the VMCLEAR instruction.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Nadav Har'El
This patch includes a utility function for decoding the pointer operands of VMX instructions issued by L1 (a guest hypervisor).
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Nadav Har'El
This patch allows the guest to enable the VMXE bit in CR4, which is a prerequisite to running VMXON.

Whether to allow setting the VMXE bit now depends on the architecture (svm or vmx), so the check has moved into kvm_x86_ops->set_cr4(). This function now returns an int: if kvm_x86_ops->set_cr4() returns 1, __kvm_set_cr4() also returns 1, which causes kvm_set_cr4() to throw a #GP.

Turning on the VMXE bit is allowed only when the nested VMX feature is enabled, and turning it off is forbidden after a vmxon.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
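A sketch of the vmx-side check described above (nested_vmx_allowed() and the vmxon flag follow the nested-VMX series; treat the exact names as assumptions):

```c
static int vmx_set_cr4_sketch(struct kvm_vcpu *vcpu, unsigned long cr4)
{
	if (cr4 & X86_CR4_VMXE) {
		if (!nested_vmx_allowed(vcpu))
			return 1;	/* __kvm_set_cr4() turns this into a #GP */
	} else if (to_vmx(vcpu)->nested.vmxon) {
		return 1;		/* VMXE cannot be cleared after VMXON */
	}

	/* ... continue with the normal CR4 update ... */
	return 0;
}
```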
-
Committed by Takuya Yoshikawa
While tracing the emulator, we noticed that init_emulate_ctxt() sometimes took a bit longer than we expected. This patch mitigates the problem to some degree.

Looking into the function, we soon notice that it clears the whole decode_cache, whose size is now about 2.5K bytes. Furthermore, most of those bytes are taken by the two read_cache arrays, which are used only by a few instructions. Considering the fact that we do not assume the cache arrays have been cleared when we store actual data, we do not need to clear the arrays: a 2K-byte elimination. In addition, we can avoid clearing the fetch_cache and regs arrays.

This patch changes the initialization not to clear the arrays. On our 64-bit host, init_emulate_ctxt() becomes 0.3 to 0.5 us faster with this patch applied.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
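One common way to implement "clear everything except the big trailing arrays" is an offsetof()-bounded memset. A hedged sketch that assumes the large arrays are grouped at the end of struct decode_cache, with c pointing at the cache:

```c
/* Zero only the scalar head of the structure; read_cache, fetch_cache
 * and regs are left as-is because users never rely on them being zero. */
memset(c, 0, offsetof(struct decode_cache, read_cache));
```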
-