- 27 12月, 2011 15 次提交
-
-
由 Chris Wright 提交于
The host side pv mmu support has been marked for feature removal in January 2011. It's not in use, is slower than shadow or hardware assisted paging, and a maintenance burden. It's November 2011, time to remove it. Signed-off-by: NChris Wright <chrisw@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Jan Kiszka 提交于
The vcpu reference of a kvm_timer can't become NULL while the timer is valid, so drop this redundant test. This also makes it pointless to carry a separate __kvm_timer_fn, fold it into kvm_timer_fn. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Xiao Guangrong 提交于
Detecting write-flooding does not work well, when we handle page written, if the last speculative spte is not accessed, we treat the page is write-flooding, however, we can speculative spte on many path, such as pte prefetch, page synced, that means the last speculative spte may be not point to the written page and the written page can be accessed via other sptes, so depends on the Accessed bit of the last speculative spte is not enough Instead of detected page accessed, we can detect whether the spte is accessed after it is written, if the spte is not accessed but it is written frequently, we treat is not a page table or it not used for a long time Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
Sometimes, we only modify the last one byte of a pte to update status bit, for example, clear_bit is used to clear r/w bit in linux kernel and 'andb' instruction is used in this function, in this case, kvm_mmu_pte_write will treat it as misaligned access, and the shadow page table is zapped Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
kvm_mmu_pte_write is too long, we split it for better readable Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
In kvm_mmu_pte_write, we do not need to alloc shadow page, so calling kvm_mmu_free_some_pages is really unnecessary Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
Fast prefetch spte for the unsync shadow page on invlpg path Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
Directly Use mmu_page_zap_pte to zap spte in FNAME(invlpg), also remove the same code between FNAME(invlpg) and FNAME(sync_page) Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
In current code, the accessed bit is always set when page fault occurred, do not need to set it on pte write path Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
Remove the same code between emulator_pio_in_emulated and emulator_pio_out_emulated Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
If the emulation is caused by #PF and it is non-page_table writing instruction, it means the VM-EXIT is caused by shadow page protected, we can zap the shadow page and retry this instruction directly The idea is from Avi Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
The idea is from Avi: | tag instructions that are typically used to modify the page tables, and | drop shadow if any other instruction is used. | The list would include, I'd guess, and, or, bts, btc, mov, xchg, cmpxchg, | and cmpxchg8b. This patch is used to tag the instructions and in the later path, shadow page is dropped if it is written by other instructions Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xiao Guangrong 提交于
kvm_mmu_pte_write is unsafe since we need to alloc pte_list_desc in the function when spte is prefetched, unfortunately, we can not know how many spte need to be prefetched on this path, that means we can use out of the free pte_list_desc object in the cache, and BUG_ON() is triggered, also some path does not fill the cache, such as INS instruction emulated that does not trigger page fault Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Nadav Har'El 提交于
When L0 wishes to inject an interrupt while L2 is running, it emulates an exit to L1 with EXIT_REASON_EXTERNAL_INTERRUPT. This was explained in the original nVMX patch 23, titled "Correct handling of interrupt injection". Unfortunately, it is possible (though rare) that at this point there is valid idt_vectoring_info in vmcs02. For example, L1 injected some interrupt to L2, and when L2 tried to run this interrupt's handler, it got a page fault - so it returns the original interrupt vector in idt_vectoring_info. The problem is that if this is the case, we cannot exit to L1 with EXTERNAL_INTERRUPT like we wished to, because the VMX spec guarantees that idt_vectoring_info and exit_reason_external_interrupt can never happen together. This is not just specified in the spec - a KVM L1 actually prints a kernel warning "unexpected, valid vectoring info" if we violate this guarantee, and some users noticed these warnings in L1's logs. In order to better emulate a processor, which would never return the external interrupt and the idt-vectoring-info together, we need to separate the two injection steps: First, complete L1's injection into L2 (i.e., enter L2, injecting to it the idt-vectoring-info); Second, after entry into L2 succeeds and it exits back to L0, exit to L1 with the EXIT_REASON_EXTERNAL_INTERRUPT. Most of this is already in the code - the only change we need is to remain in L2 (and not exit to L1) in this case. Note that the previous patch ensures (by using KVM_REQ_IMMEDIATE_EXIT) that although we do enter L2 first, it will exit immediately after processing its injection, allowing us to promptly inject to L1. Note how we test vmcs12->idt_vectoring_info_field; This isn't really the vmcs12 value (we haven't exited to L1 yet, so vmcs12 hasn't been updated), but rather the place we save, at the end of vmx_vcpu_run, the vmcs02 value of this field. This was explained in patch 25 ("Correct handling of idt vectoring info") of the original nVMX patch series. Thanks to Dave Allan and to Federico Simoncelli for reporting this bug, to Abel Gordon for helping me figure out the solution, and to Avi Kivity for helping to improve it. Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Nadav Har'El 提交于
This patch adds a new vcpu->requests bit, KVM_REQ_IMMEDIATE_EXIT. This bit requests that when next entering the guest, we should run it only for as little as possible, and exit again. We use this new option in nested VMX: When L1 launches L2, but L0 wishes L1 to continue running so it can inject an event to it, we unfortunately cannot just pretend to have run L2 for a little while - We must really launch L2, otherwise certain one-off vmcs12 parameters (namely, L1 injection into L2) will be lost. So the existing code runs L2 in this case. But L2 could potentially run for a long time until it exits, and the injection into L1 will be delayed. The new KVM_REQ_IMMEDIATE_EXIT allows us to request that L2 will be entered, as necessary, but will exit as soon as possible after entry. Our implementation of this request uses smp_send_reschedule() to send a self-IPI, with interrupts disabled. The interrupts remain disabled until the guest is entered, and then, after the entry is complete (often including processing an injection and jumping to the relevant handler), the physical interrupt is noticed and causes an exit. On recent Intel processors, we could have achieved the same goal by using MTF instead of a self-IPI. Another technique worth considering in the future is to use VM_EXIT_ACK_INTR_ON_EXIT and a highest-priority vector IPI - to slightly improve performance by avoiding the useless interrupt handler which ends up being called when smp_send_reschedule() is used. Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 26 12月, 2011 1 次提交
-
-
由 Jan Kiszka 提交于
Unlike all of the other cpuid bits, the TSC deadline timer bit is set unconditionally, regardless of what userspace wants. This is broken in several ways: - if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC deadline timer feature, a guest that uses the feature will break - live migration to older host kernels that don't support the TSC deadline timer will cause the feature to be pulled from under the guest's feet; breaking it - guests that are broken wrt the feature will fail. Fix by not enabling the feature automatically; instead report it to userspace. Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not KVM_GET_SUPPORTED_CPUID. Fixes the Illumos guest kernel, which uses the TSC deadline timer feature. [avi: add the KVM_CAP + documentation] Reported-by: NAlexey Zaytsev <alexey.zaytsev@gmail.com> Tested-by: NAlexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 25 12月, 2011 1 次提交
-
-
由 Jan Kiszka 提交于
User space may create the PIT and forgets about setting up the irqchips. In that case, firing PIT IRQs will crash the host: BUG: unable to handle kernel NULL pointer dereference at 0000000000000128 IP: [<ffffffffa10f6280>] kvm_set_irq+0x30/0x170 [kvm] ... Call Trace: [<ffffffffa11228c1>] pit_do_work+0x51/0xd0 [kvm] [<ffffffff81071431>] process_one_work+0x111/0x4d0 [<ffffffff81071bb2>] worker_thread+0x152/0x340 [<ffffffff81075c8e>] kthread+0x7e/0x90 [<ffffffff815a4474>] kernel_thread_helper+0x4/0x10 Prevent this by checking the irqchip mode before starting a timer. We can't deny creating the PIT if the irqchips aren't set up yet as current user land expects this order to work. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 17 11月, 2011 3 次提交
-
-
由 Gleb Natapov 提交于
Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Gleb Natapov 提交于
Support guest/host-only profiling by switch perf msrs on a guest entry if needed. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Gleb Natapov 提交于
Some cpus have special support for switching PERF_GLOBAL_CTRL msr. Add logic to detect if such support exists and works properly and extend msr switching code to use it if available. Also extend number of generic msr switching entries to 8. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 30 10月, 2011 1 次提交
-
-
由 Jan Kiszka 提交于
AMD processors apparently have a bug in the hardware task switching support when NPT is enabled. If the task switch triggers a NPF, we can get wrong EXITINTINFO along with that fault. On resume, spurious exceptions may then be injected into the guest. We were able to reproduce this bug when our guest triggered #SS and the handler were supposed to run over a separate task with not yet touched stack pages. Work around the issue by continuing to emulate task switches even in NPT mode. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 21 10月, 2011 1 次提交
-
-
由 Joerg Roedel 提交于
With per-bus iommu_ops the iommu_found function needs to work on a bus_type too. This patch adds a bus_type parameter to that function and converts all call-places. The function is also renamed to iommu_present because the function now checks if an iommu is present for a given bus and does not check for a global iommu anymore. Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
-
- 05 10月, 2011 1 次提交
-
-
由 Liu, Jinsong 提交于
This patch emulate lapic tsc deadline timer for guest: Enumerate tsc deadline timer capability by CPUID; Enable tsc deadline timer mode by lapic MMIO; Start tsc deadline timer by WRMSR; [jan: use do_div()] [avi: fix for !irqchip_in_kernel()] [marcelo: another fix for !irqchip_in_kernel()] Signed-off-by: NLiu, Jinsong <jinsong.liu@intel.com> Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 26 9月, 2011 17 次提交
-
-
由 Avi Kivity 提交于
If simultaneous NMIs happen, we're supposed to queue the second and next (collapsing them), but currently we sometimes collapse the second into the first. Fix by using a counter for pending NMIs instead of a bool; since the counter limit depends on whether the processor is currently in an NMI handler, which can only be checked in vcpu context (via the NMI mask), we add a new KVM_REQ_NMI to request recalculation of the counter. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
The opcodes push %seg pop %seg l%seg, %mem, %reg (e.g. lds/les/lss/lfs/lgs) all have an segment register encoded in the instruction. To allow reuse, decode the segment number into src2 during the decode stage instead of the execution stage. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
Use the same technique as the other OpMem variants, and goto mem_common. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
OpReg decoding has a hack that inhibits byte registers for movsx and movzx instructions. It should be replaced by something better, but meanwhile, qualify that the hack is only active for the destination operand. Note these instructions only use OpReg for the destination, but better to be explicit about it. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
Similar to SrcImmUByte. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
Op fields are going to grow by a bit, we need two free bits. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
Unifiying the operands means not taking advantage of the fact that some operand types can only go into certain operands (for example, DI can only be used by the destination), so we need more bits to hold the operand type. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
Instead of decoding each operand using its own code, use a generic function. Start with the destination operand. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
Simplifies further generalization of decode. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Jan Kiszka 提交于
Certain guests, specifically RTOSes, request faster periodic timers than what we allow by default. Add a module parameter to adjust the limit for non-standard setups. Also add a rate-limited warning in case the guest requested more. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Jan Kiszka 提交于
The use of printk_ratelimit is discouraged, replace it with pr*_ratelimited or __ratelimit. While at it, convert remaining guest-triggerable printks to rate-limited variants. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Jan Kiszka 提交于
Convert remaining printks that the guest can trigger to apic_printk. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-