- 22 5月, 2014 1 次提交
-
-
由 Paolo Bonzini 提交于
Not needed anymore now that the CPL is computed directly during task switch. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 14 5月, 2014 1 次提交
-
-
由 Marcelo Tosatti 提交于
Updating system_time from the kernel clock once master clock has been enabled can result in time backwards event, in case kernel clock frequency is lower than TSC frequency. Disable master clock in case it is necessary to update it from the resume path. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 5月, 2014 1 次提交
-
-
由 Jan Kiszka 提交于
Regression of 346874c9: PAE is set in long mode, but that does not mean we have valid PDPTRs. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 08 5月, 2014 1 次提交
-
-
由 Michael S. Tsirkin 提交于
It seems that it's easy to implement the EOI assist on top of the PV EOI feature: simply convert the page address to the format expected by PV EOI. Notes: -"No EOI required" is set only if interrupt injected is edge triggered; this is true because level interrupts are going through IOAPIC which disables PV EOI. In any case, if guest triggers EOI the bit will get cleared on exit. -For migration, set of HV_X64_MSR_APIC_ASSIST_PAGE sets KVM_PV_EOI_EN internally, so restoring HV_X64_MSR_APIC_ASSIST_PAGE seems sufficient In any case, bit is cleared on exit so worst case it's never re-enabled -no handling of PV EOI data is performed at HV_X64_MSR_EOI write; HV_X64_MSR_EOI is a separate optimization - it's an X2APIC replacement that lets you do EOI with an MSR and not IO. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 06 5月, 2014 2 次提交
-
-
由 Andi Kleen 提交于
As requested by Linus add explicit __visible to the asmlinkage users. This marks all functions visible to assembler. Tree sweep for arch/x86/* Signed-off-by: NAndi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Ulrich Obergfell 提交于
This patch moves the 'kvm_pio' tracepoint to emulator_pio_in_emulated() and emulator_pio_out_emulated(), and it adds an argument (a pointer to the 'pio_data'). A single 8-bit or 16-bit or 32-bit data item is fetched from 'pio_data' (depending on 'size'), and the value is included in the trace record ('val'). If 'count' is greater than one, this is indicated by the string "(...)" in the trace output. Signed-off-by: NUlrich Obergfell <uobergfe@redhat.com> Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 24 4月, 2014 4 次提交
-
-
由 Xiao Guangrong 提交于
Now we can flush all the TLBs out of the mmu lock without TLB corruption when write-proect the sptes, it is because: - we have marked large sptes readonly instead of dropping them that means we just change the spte from writable to readonly so that we only need to care the case of changing spte from present to present (changing the spte from present to nonpresent will flush all the TLBs immediately), in other words, the only case we need to care is mmu_spte_update() - in mmu_spte_update(), we haved checked SPTE_HOST_WRITEABLE | PTE_MMU_WRITEABLE instead of PT_WRITABLE_MASK, that means it does not depend on PT_WRITABLE_MASK anymore Acked-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Xiao Guangrong 提交于
Currently, kvm zaps the large spte if write-protected is needed, the later read can fault on that spte. Actually, we can make the large spte readonly instead of making them un-present, the page fault caused by read access can be avoided The idea is from Avi: | As I mentioned before, write-protecting a large spte is a good idea, | since it moves some work from protect-time to fault-time, so it reduces | jitter. This removes the need for the return value. This version has fixed the issue reported in 6b73a960, the reason of that issue is that fast_page_fault() directly sets the readonly large spte to writable but only dirty the first page into the dirty-bitmap that means other pages are missed. Fixed it by only the normal sptes (on the PT_PAGE_TABLE_LEVEL level) can be fast fixed Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Nadav Amit 提交于
If EFER.LMA is off, cs.l does not determine execution mode. Currently, the emulation engine assumes differently. Signed-off-by: NNadav Amit <namit@cs.technion.ac.il> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Nadav Amit 提交于
According to Intel specifications, PAE and non-PAE does not have any reserved bits. In long-mode, regardless to PCIDE, only the high bits (above the physical address) are reserved. Signed-off-by: NNadav Amit <namit@cs.technion.ac.il> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 18 4月, 2014 1 次提交
-
-
由 Michael S. Tsirkin 提交于
It is sometimes benefitial to ignore IO size, and only match on address. In hindsight this would have been a better default than matching length when KVM_IOEVENTFD_FLAG_DATAMATCH is not set, In particular, this kind of access can be optimized on VMX: there no need to do page lookups. This can currently be done with many ioeventfds but in a suboptimal way. However we can't change kernel/userspace ABI without risk of breaking some applications. Use len = 0 to mean "ignore length for matching" in a more optimal way. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 15 4月, 2014 2 次提交
-
-
由 Marcelo Tosatti 提交于
Function and callers can be preempted. https://bugzilla.kernel.org/show_bug.cgi?id=73721Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Feng Wu 提交于
This patch adds SMAP handling logic when setting CR4 for guests Thanks a lot to Paolo Bonzini for his suggestion to use the branchless way to detect SMAP violation. Signed-off-by: NFeng Wu <feng.wu@intel.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 27 3月, 2014 1 次提交
-
-
由 Paolo Bonzini 提交于
kvm_x86_ops is still NULL at this point. Since kvm_init_msr_list cannot fail, it is safe to initialize it before the call. Fixes: 93c4adc7Reported-by: NFengguang Wu <fengguang.wu@intel.com> Tested-by: NJet Chen <jet.chen@intel.com> Cc: kvm@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 20 3月, 2014 1 次提交
-
-
由 Srivatsa S. Bhat 提交于
Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the kvm code in x86 by using this latter form of callback registration. Cc: Gleb Natapov <gleb@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 17 3月, 2014 2 次提交
-
-
由 Paolo Bonzini 提交于
When doing nested virtualization, we may be able to read BNDCFGS but still not be allowed to write to GUEST_BNDCFGS in the VMCS. Guard writes to the field with vmx_mpx_supported(), and similarly hide the MSR from userspace if the processor does not support the field. We could work around this with the generic MSR save/load machinery, but there is only a limited number of MSR save/load slots and it is not really worthwhile to waste one for a scenario that should not happen except in the nested virtualization case. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
XSAVE support for KVM is already using host_xcr0 & KVM_SUPPORTED_XCR0 as a "dynamic" version of KVM_SUPPORTED_XCR0. However, this is not enough because the MPX bits should not be presented to the guest unless kvm_x86_ops confirms the support. So, replace all instances of host_xcr0 & KVM_SUPPORTED_XCR0 with a new function kvm_supported_xcr0() that also has this check. Note that here: if (xstate_bv & ~KVM_SUPPORTED_XCR0) return -EINVAL; if (xstate_bv & ~host_cr0) return -EINVAL; the code is equivalent to if ((xstate_bv & ~KVM_SUPPORTED_XCR0) || (xstate_bv & ~host_cr0) return -EINVAL; i.e. "xstate_bv & (~KVM_SUPPORTED_XCR0 | ~host_cr0)" which is in turn equal to "xstate_bv & ~(KVM_SUPPORTED_XCR0 & host_cr0)". So we should also use the new function there. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 3月, 2014 1 次提交
-
-
由 Gabriel L. Somlo 提交于
Both QEMU and KVM have already accumulated a significant number of optimizations based on the hard-coded assumption that ioapic polarity will always use the ActiveHigh convention, where the logical and physical states of level-triggered irq lines always match (i.e., active(asserted) == high == 1, inactive == low == 0). QEMU guests are expected to follow directions given via ACPI and configure the ioapic with polarity 0 (ActiveHigh). However, even when misbehaving guests (e.g. OS X <= 10.9) set the ioapic polarity to 1 (ActiveLow), QEMU will still use the ActiveHigh signaling convention when interfacing with KVM. This patch modifies KVM to completely ignore ioapic polarity as set by the guest OS, enabling misbehaving guests to work alongside those which comply with the ActiveHigh polarity specified by QEMU's ACPI tables. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NGabriel L. Somlo <somlo@cmu.edu> [Move documentation to KVM_IRQ_LINE, add ia64. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 3月, 2014 4 次提交
-
-
由 Paolo Bonzini 提交于
When not running in guest-debug mode, the guest controls the debug registers and having to take an exit for each DR access is a waste of time. If the guest gets into a state where each context switch causes DR to be saved and restored, this can take away as much as 40% of the execution time from the guest. After this patch, VMX- and SVM-specific code can set a flag in switch_db_regs, telling vcpu_enter_guest that on the next exit the debug registers might be dirty and need to be reloaded (syncing will be taken care of by a new callback in kvm_x86_ops). This flag can be set on the first access to a debug registers, so that multiple accesses to the debug registers only cause one vmexit. Note that since the guest will be able to read debug registers and enable breakpoints in DR7, we need to ensure that they are synchronized on entry to the guest---including DR6 that was not synced before. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The next patch will add another bit that we can test with the same "if". Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jan Kiszka 提交于
It's no longer possible to enter enable_irq_window in guest mode when L1 intercepts external interrupts and we are entering L2. This is now caught in vcpu_enter_guest. So we can remove the check from the VMX version of enable_irq_window, thus the need to return an error code from both enable_irq_window and enable_nmi_window. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jan Kiszka 提交于
Move the check for leaving L2 on pending and intercepted IRQs or NMIs from the *_allowed handler into a dedicated callback. Invoke this callback at the relevant points before KVM checks if IRQs/NMIs can be injected. The callback has the task to switch from L2 to L1 if needed and inject the proper vmexit events. The rework fixes L2 wakeups from HLT and provides the foundation for preemption timer emulation. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 04 3月, 2014 2 次提交
-
-
由 Andrew Jones 提交于
commit 0061d53d introduced a mechanism to execute a global clock update for a vm. We can apply this periodically in order to propagate host NTP corrections. Also, if all vcpus of a vm are pinned, then without an additional trigger, no guest NTP corrections can propagate either, as the current trigger is only vcpu cpu migration. Signed-off-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andrew Jones 提交于
When we update a vcpu's local clock it may pick up an NTP correction. We can't wait an indeterminate amount of time for other vcpus to pick up that correction, so commit 0061d53d introduced a global clock update. However, we can't request a global clock update on every vcpu load either (which is what happens if the tsc is marked as unstable). The solution is to rate-limit the global clock updates. Marcelo calculated that we should delay the global clock updates no more than 0.1s as follows: Assume an NTP correction c is applied to one vcpu, but not the other, then in n seconds the delta of the vcpu system_timestamps will be c * n. If we assume a correction of 500ppm (worst-case), then the two vcpus will diverge 50us in 0.1s, which is a considerable amount. Signed-off-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 28 2月, 2014 2 次提交
-
-
由 Andrew Honig 提交于
The problem occurs when the guest performs a pusha with the stack address pointing to an mmio address (or an invalid guest physical address) to start with, but then extending into an ordinary guest physical address. When doing repeated emulated pushes emulator_read_write sets mmio_needed to 1 on the first one. On a later push when the stack points to regular memory, mmio_nr_fragments is set to 0, but mmio_is_needed is not set to 0. As a result, KVM exits to userspace, and then returns to complete_emulated_mmio. In complete_emulated_mmio vcpu->mmio_cur_fragment is incremented. The termination condition of vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved. The code bounces back and fourth to userspace incrementing mmio_cur_fragment past it's buffer. If the guest does nothing else it eventually leads to a a crash on a memcpy from invalid memory address. However if a guest code can cause the vm to be destroyed in another vcpu with excellent timing, then kvm_clear_async_pf_completion_queue can be used by the guest to control the data that's pointed to by the call to cancel_work_item, which can be used to gain execution. Fixes: f78146b0Signed-off-by: NAndrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org (3.5+) Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Takuya Yoshikawa 提交于
No need to scan the entire VCPU array. Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 26 2月, 2014 3 次提交
-
-
由 Marcelo Tosatti 提交于
emulator_cmpxchg_emulated writes to guest memory, therefore it should update the dirty bitmap accordingly. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Liu, Jinsong 提交于
From 44c2abca2c2eadc6f2f752b66de4acc8131880c4 Mon Sep 17 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com> Date: Mon, 24 Feb 2014 18:12:31 +0800 Subject: [PATCH v5 3/3] KVM: x86: Enable Intel MPX for guest This patch enable Intel MPX feature to guest. Signed-off-by: NXudong Hao <xudong.hao@intel.com> Signed-off-by: NLiu Jinsong <jinsong.liu@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Liu, Jinsong 提交于
From 5d5a80cd172ea6fb51786369bcc23356b1e9e956 Mon Sep 17 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com> Date: Mon, 24 Feb 2014 18:11:55 +0800 Subject: [PATCH v5 2/3] KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save Add MSR_IA32_BNDCFGS to msrs_to_save, and corresponding logic to kvm_get/set_msr(). Signed-off-by: NLiu Jinsong <jinsong.liu@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 2月, 2014 1 次提交
-
-
由 Liu, Jinsong 提交于
From 00c920c96127d20d4c3bb790082700ae375c39a0 Mon Sep 17 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com> Date: Fri, 21 Feb 2014 23:47:18 +0800 Subject: [PATCH] KVM: x86: Fix xsave cpuid exposing bug EBX of cpuid(0xD, 0) is dynamic per XCR0 features enable/disable. Bit 63 of XCR0 is reserved for future expansion. Signed-off-by: NLiu Jinsong <jinsong.liu@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 04 2月, 2014 1 次提交
-
-
由 Marcelo Tosatti 提交于
Remove unused last_kernel_ns variable. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 30 1月, 2014 1 次提交
-
-
由 Paolo Bonzini 提交于
Self explanatory. Reported-by: NRadim Krcmar <rkrcmar@redhat.com> Cc: Vadim Rozenfeld <vrozenfe@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 27 1月, 2014 1 次提交
-
-
由 Jan Kiszka 提交于
Check for invalid state transitions on guest-initiated updates of MSR_IA32_APICBASE. This address both enabling of the x2APIC when it is not supported and all invalid transitions as described in SDM section 10.12.5. It also checks that no reserved bit is set in APICBASE by the guest. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> [Use cpuid_maxphyaddr instead of guest_cpuid_get_phys_bits. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 24 1月, 2014 2 次提交
-
-
由 Vadim Rozenfeld 提交于
Signed-off-by: NVadim Rozenfeld <vrozenfe@redhat.com> Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vadim Rozenfeld 提交于
Signed-off-by: NVadim Rozenfeld <vrozenfe@redhat.com> Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 17 1月, 2014 3 次提交
-
-
由 Jan Kiszka 提交于
In contrast to VMX, SVM dose not automatically transfer DR6 into the VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor hook to obtain the current value. And as SVM now picks the DR6 state from its VMCB, we also need a set callback in order to write updates of DR6 back. Fixes a regression of 020df079. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jan Kiszka 提交于
Whenever we change arch.dr7, we also have to call kvm_update_dr7. In case guest debugging is off, this will synchronize the new state into hardware. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vadim Rozenfeld 提交于
Signed-off: Peter Lieven <pl@kamp.de> Signed-off: Gleb Natapov Signed-off: Vadim Rozenfeld <vrozenfe@redhat.com> After some consideration I decided to submit only Hyper-V reference counters support this time. I will submit iTSC support as a separate patch as soon as it is ready. v1 -> v2 1. mark TSC page dirty as suggested by Eric Northup <digitaleric@google.com> and Gleb 2. disable local irq when calling get_kernel_ns, as it was done by Peter Lieven <pl@amp.de> 3. move check for TSC page enable from second patch to this one. v3 -> v4 Get rid of ref counter offset. v4 -> v5 replace __copy_to_user with kvm_write_guest when updateing iTSC page. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 16 1月, 2014 1 次提交
-
-
由 Paolo Bonzini 提交于
After the previous patch from Marcelo, the comment before this write became obsolete. In fact, the write is unnecessary. The calls to kvm_write_tsc ultimately result in a master clock update as soon as all TSCs agree and the master clock is re-enabled. This master clock update will rewrite tsc_timestamp. So, together with the comment, delete the dead write too. Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 1月, 2014 1 次提交
-
-
由 Marcelo Tosatti 提交于
To fix a problem related to different resolution of TSC and system clock, the offset in TSC units is approximated by delta = vcpu->hv_clock.tsc_timestamp - vcpu->last_guest_tsc (Guest TSC value at (Guest TSC value at last VM-exit) the last kvm_guest_time_update call) Delta is then later scaled using mult,shift pair found in hv_clock structure (which is correct against tsc_timestamp in that structure). However, if a frequency change is performed between these two points, this delta is measured using different TSC frequencies, but scaled using mult,shift pair for one frequency only. The end result is an incorrect delta. The bug which this code works around is not the only cause for clock backwards events. The global accumulator is still necessary, so remove the max_kernel_ns fix and rely on the global accumulator for no clock backwards events. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-