- 17 11月, 2016 1 次提交
-
-
由 Jiang Biao 提交于
vmx_arm_hv_timer is only used in vmx.c, and should be static to avoid compiling warning when with -Wmissing-prototypes option. Signed-off-by: NJiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 04 11月, 2016 3 次提交
-
-
由 Jike Song 提交于
Signed-off-by: NJike Song <jike.song@intel.com> Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jike Song 提交于
The user of page_track might needs extra information, so pass the kvm_page_track_notifier_node to callbacks. Signed-off-by: NJike Song <jike.song@intel.com> Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiaoguang Chen 提交于
When a memory slot is being moved or removed users of page track can be notified. So users can drop write-protection for the pages in that memory slot. This notifier type is needed by KVMGT to sync up its shadow page table when memory slot is being moved or removed. Register the notifier type track_flush_slot to receive memslot move and remove event. Reviewed-by: NXiao Guangrong <guangrong.xiao@intel.com> Signed-off-by: NChen Xiaoguang <xiaoguang.chen@intel.com> [Squashed commits to avoid bisection breakage and reworded the subject.] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 03 11月, 2016 20 次提交
-
-
由 Paolo Bonzini 提交于
On some benchmarks (e.g. netperf with ioeventfd disabled), APICv posted interrupts turn out to be slower than interrupt injection via KVM_REQ_EVENT. This patch optimizes a bit the IRR update, avoiding expensive atomic operations in the common case where PI.ON=0 at vmentry or the PIR vector is mostly zero. This saves at least 20 cycles (1%) per vmexit, as measured by kvm-unit-tests' inl_from_qemu test (20 runs): | enable_apicv=1 | enable_apicv=0 | mean stdev | mean stdev ----------|-----------------|------------------ before | 5826 32.65 | 5765 47.09 after | 5809 43.42 | 5777 77.02 Of course, any change in the right column is just placebo effect. :) The savings are bigger if interrupts are frequent. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
These are never used by the host, but they can still be reflected to the guest. Tested-by: NLadi Prosek <lprosek@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The base clock for the LAPIC timer is always CLOCK_MONOTONIC. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
Most windows guests still utilize APIC Timer periodic/oneshot mode instead of tsc-deadline mode, and the APIC Timer periodic/oneshot mode are still emulated by high overhead hrtimer on host. This patch converts the expected expire time of the periodic/oneshot mode to guest deadline tsc in order to leverage VMX preemption timer logic for APIC Timer tsc-deadline mode. After each preemption timer vmexit preemption timer is restarted to emulate LVTT current-count register is automatically reloaded from the initial-count register when the count reaches 0. This patch reduces ~5600 cycles for each APIC Timer periodic mode operation virtualization. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> [Squashed with my fixes that were reviewed-by Paolo.] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
Rename start/cancel_hv_tscdeadline to start/cancel_hv_timer since they will handle both APIC Timer periodic/oneshot mode and tsc-deadline mode. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
Introdce kvm_get_lapic_target_expiration_tsc() to get APIC Timer target deadline tsc. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
Check apic_lvtt_tscdeadline() mode directly instead of apic_lvtt_oneshot() and apic_lvtt_period() to guarantee the timer is in tsc-deadline mode when rdmsr MSR_IA32_TSCDEADLINE. Suggested-by: NRadim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
Extract start_sw_period() to handle periodic/oneshot mode, it will be used by later patch. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Longpeng(Mike) 提交于
Since Paolo has removed irq-enable-operation in vmx_handle_external_intr (KVM: x86: use guest_exit_irqoff), the original comment about the IF bit in rflags is incorrect and stale now, so remove it. Signed-off-by: NLongpeng(Mike) <longpeng2@huawei.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Xiaoguang Chen 提交于
When a memory slot is being moved or removed users of page track can be notified. So users can drop write-protection for the pages in that memory slot. This notifier type is needed by KVMGT to sync up its shadow page table when memory slot is being moved or removed. Register the notifier type track_flush_slot to receive memslot move and remove event. Reviewed-by: NXiao Guangrong <guangrong.xiao@intel.com> Signed-off-by: NChen Xiaoguang <xiaoguang.chen@intel.com> [Squashed commits to avoid bisection breakage and reworded the subject.] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Radim Krčmář 提交于
We've had 10 page-sized bitmaps that were being allocated and freed one by one when we could just use a cycle. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Radim Krčmář 提交于
vmx_disable_intercept_msr_read_x2apic() and vmx_disable_intercept_msr_write_x2apic() differed only in the type. Pass the type to a new function. [Ordered and commented TPR intercept according to Paolo's suggestion.] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Radim Krčmář 提交于
All intercepts are enabled at the beginning, so they can only be used if we disabled an intercept that we wanted to have enabled. This was done for TMCCT to simplify a loop that disables all x2APIC MSR intercepts, but just keeping TMCCT enabled yields better results. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Jim Mattson 提交于
When L0 establishes (or removes) an MSR entry in the VM-entry or VM-exit MSR load lists, the change should affect the dormant VMCS as well as the current VMCS. Moreover, the vmcs02 MSR-load addresses should be initialized. Signed-off-by: NJim Mattson <jmattson@google.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Jim Mattson 提交于
When forwarding a hardware VM-entry failure to L1, fetch the VM_INSTRUCTION_ERROR field from vmcs02 before loading vmcs01. (Note that there is an implicit assumption that the VM-entry failure was on the first VM-entry to vmcs02 after nested_vmx_run; otherwise, L1 is going to be very confused.) Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NPeter Feiner <pfeiner@google.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Peter Feiner 提交于
The MMU notifier sequence number keeps GPA->HPA mappings in sync when GPA->HPA lookups are done outside of the MMU lock (e.g., in tdp_page_fault). Since kvm_age_hva doesn't change GPA->HPA, it's unnecessary to increment the sequence number. Signed-off-by: NPeter Feiner <pfeiner@google.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
Renames x2apic_apicv_inactive msr_bitmaps to x2apic and original x2apic bitmaps to x2apic_apicv. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Owen Hofmann 提交于
Commit 41061cdb ("KVM: emulate: do not initialize memopp") removes a check for non-NULL under incorrect assumptions. An undefined instruction with a ModR/M byte with Mod=0 and R/M-5 (e.g. 0xc7 0x15) will attempt to dereference a null pointer here. Fixes: 41061cdb Message-Id: <1477592752-126650-2-git-send-email-osh@google.com> Signed-off-by: NOwen Hofmann <osh@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jim Mattson 提交于
After a successful VM-entry with the "VMCS shadowing" VM-execution control set, the shadow VMCS referenced by the VMCS link pointer field in the current VMCS becomes active on the logical processor. A VMCS that is made active on more than one logical processor may become corrupted. Therefore, before an active VMCS can be migrated to another logical processor, the first logical processor must execute a VMCLEAR for the active VMCS. VMCLEAR both ensures that all VMCS data are written to memory and makes the VMCS inactive. Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-By: NDavid Matlack <dmatlack@google.com> Message-Id: <1477668579-22555-1-git-send-email-jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Since commit a545ab6a ("kvm: x86: add tsc_offset field to struct kvm_vcpu_arch", 2016-09-07) the offset between host and L1 TSC is cached and need not be fished out of the VMCS or VMCB. This means that we can implement adjust_tsc_offset_guest and read_l1_tsc entirely in generic code. The simplification is particularly significant for VMX code, where vmx->nested.vmcs01_tsc_offset was duplicating what is now in vcpu->arch.tsc_offset. Therefore the vmcs01_tsc_offset can be dropped completely. More importantly, this fixes KVM_GET_CLOCK/KVM_SET_CLOCK which, after commit 108b249c ("KVM: x86: introduce get_kvmclock_ns", 2016-09-01) called read_l1_tsc while the VMCS was not loaded. It thus returned bogus values on Intel CPUs. Fixes: 108b249cReported-by: NRoman Kagan <rkagan@virtuozzo.com> Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 28 10月, 2016 2 次提交
-
-
由 Ido Yariv 提交于
vcpu->arch.wbinvd_dirty_mask may still be used after freeing it, corrupting memory. For example, the following call trace may set a bit in an already freed cpu mask: kvm_arch_vcpu_load vcpu_load vmx_free_vcpu_nested vmx_free_vcpu kvm_arch_vcpu_free Fix this by deferring freeing of wbinvd_dirty_mask. Cc: stable@vger.kernel.org Signed-off-by: NIdo Yariv <ido@wizery.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Borislav Petkov 提交于
Add the "0x" prefix to the error messages format to make it unambiguous about what kind of value we're talking about. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Message-Id: <20161027181445.25319-1-bp@alien8.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 27 10月, 2016 1 次提交
-
-
由 Jim Mattson 提交于
Bitwise shifts by amounts greater than or equal to the width of the left operand are undefined. A malicious guest can exploit this to crash a 32-bit host, due to the BUG_ON(1)'s in handle_{invept,invvpid}. Signed-off-by: NJim Mattson <jmattson@google.com> Message-Id: <1477496318-17681-1-git-send-email-jmattson@google.com> [Change 1UL to 1, to match the range check on the shift count. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 20 10月, 2016 2 次提交
-
-
由 Jiri Slaby 提交于
gcc 7 warns: arch/x86/kvm/ioapic.c: In function 'kvm_ioapic_reset': arch/x86/kvm/ioapic.c:597:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size] And it is right. Memset whole array using sizeof operator. Signed-off-by: NJiri Slaby <jslaby@suse.cz> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> [Added x86 subject tag] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Borislav Petkov 提交于
When CONFIG_CPU_FREQ is not set, int cpu is unused and gcc rightfully warns about it: arch/x86/kvm/x86.c: In function ‘kvm_timer_init’: arch/x86/kvm/x86.c:5697:6: warning: unused variable ‘cpu’ [-Wunused-variable] int cpu; ^~~ But since it is used only in the CONFIG_CPU_FREQ block, simply move it there, thus squashing the warning too. Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 12 10月, 2016 1 次提交
-
-
由 Petr Mladek 提交于
A good practice is to prefix the names of functions by the name of the subsystem. The kthread worker API is a mix of classic kthreads and workqueues. Each worker has a dedicated kthread. It runs a generic function that process queued works. It is implemented as part of the kthread subsystem. This patch renames the existing kthread worker API to use the corresponding name from the workqueues API prefixed by kthread_: __init_kthread_worker() -> __kthread_init_worker() init_kthread_worker() -> kthread_init_worker() init_kthread_work() -> kthread_init_work() insert_kthread_work() -> kthread_insert_work() queue_kthread_work() -> kthread_queue_work() flush_kthread_work() -> kthread_flush_work() flush_kthread_worker() -> kthread_flush_worker() Note that the names of DEFINE_KTHREAD_WORK*() macros stay as they are. It is common that the "DEFINE_" prefix has precedence over the subsystem names. Note that INIT() macros and init() functions use different naming scheme. There is no good solution. There are several reasons for this solution: + "init" in the function names stands for the verb "initialize" aka "initialize worker". While "INIT" in the macro names stands for the noun "INITIALIZER" aka "worker initializer". + INIT() macros are used only in DEFINE() macros + init() functions are used close to the other kthread() functions. It looks much better if all the functions use the same scheme. + There will be also kthread_destroy_worker() that will be used close to kthread_cancel_work(). It is related to the init() function. Again it looks better if all functions use the same naming scheme. + there are several precedents for such init() function names, e.g. amd_iommu_init_device(), free_area_init_node(), jump_label_init_type(), regmap_init_mmio_clk(), + It is not an argument but it was inconsistent even before. [arnd@arndb.de: fix linux-next merge conflict] Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.comSuggested-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NPetr Mladek <pmladek@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 9月, 2016 3 次提交
-
-
由 Wanpeng Li 提交于
Run kvm-unit-tests/eventinj.flat in L1: Sending NMI to self After NMI to self FAIL: NMI This test scenario is to test whether VMM can handle NMI IDT-vectoring info correctly. At the beginning, L2 writes LAPIC to send a self NMI, the EPT page tables on both L1 and L0 are empty so: - The L2 accesses memory can generate EPT violation which can be intercepted by L0. The EPT violation vmexit occurred during delivery of this NMI, and the NMI info is recorded in vmcs02's IDT-vectoring info. - L0 walks L1's EPT12 and L0 sees the mapping is invalid, it injects the EPT violation into L1. The vmcs02's IDT-vectoring info is reflected to vmcs12's IDT-vectoring info since it is a nested vmexit. - L1 receives the EPT violation, then fixes its EPT12. - L1 executes VMRESUME to resume L2 which generates vmexit and causes L1 exits to L0. - L0 emulates VMRESUME which is called from L1, then return to L2. L0 merges the requirement of vmcs12's IDT-vectoring info and injects it to L2 through vmcs02. - The L2 re-executes the fault instruction and cause EPT violation again. - Since the L1's EPT12 is valid, L0 can fix its EPT02 - L0 resume L2 The EPT violation vmexit occurred during delivery of this NMI again, and the NMI info is recorded in vmcs02's IDT-vectoring info. L0 should inject the NMI through vmentry event injection since it is caused by EPT02's EPT violation. However, vmx_inject_nmi() refuses to inject NMI from IDT-vectoring info if vCPU is in guest mode, this patch fix it by permitting to inject NMI from IDT-vectoring if it is the L0's responsibility to inject NMI from IDT-vectoring info to L2. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Bandan Das <bsd@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
I observed that kvmvapic(to optimize flexpriority=N or AMD) is used to boost TPR access when testing kvm-unit-test/eventinj.flat tpr case on my haswell desktop (w/ flexpriority, w/o APICv). Commit (8d14695f x86, apicv: add virtual x2apic support) disable virtual x2apic mode completely if w/o APICv, and the author also told me that windows guest can't enter into x2apic mode when he developed the APICv feature several years ago. However, it is not truth currently, Interrupt Remapping and vIOMMU is added to qemu and the developers from Intel test windows 8 can work in x2apic mode w/ Interrupt Remapping enabled recently. This patch enables TPR shadow for virtual x2apic mode to boost windows guest in x2apic mode even if w/o APICv. Can pass the kvm-unit-test. Suggested-by: NRadim Krčmář <rkrcmar@redhat.com> Suggested-by: NWincy Van <fanwenyi0529@gmail.com> Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wincy Van <fanwenyi0529@gmail.com> Cc: Yang Zhang <yang.zhang.wz@gmail.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
WARNING: CPU: 1 PID: 4230 at kernel/sched/core.c:7564 __might_sleep+0x7e/0x80 do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8d0de7f9>] prepare_to_swait+0x39/0xa0 CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47 Call Trace: dump_stack+0x99/0xd0 __warn+0xd1/0xf0 warn_slowpath_fmt+0x4f/0x60 ? prepare_to_swait+0x39/0xa0 ? prepare_to_swait+0x39/0xa0 __might_sleep+0x7e/0x80 __gfn_to_pfn_memslot+0x156/0x480 [kvm] gfn_to_pfn+0x2a/0x30 [kvm] gfn_to_page+0xe/0x20 [kvm] kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm] nested_vmx_vmexit+0x765/0xca0 [kvm_intel] ? _raw_spin_unlock_irqrestore+0x36/0x80 vmx_check_nested_events+0x49/0x1f0 [kvm_intel] kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm] kvm_vcpu_check_block+0x12/0x60 [kvm] kvm_vcpu_block+0x94/0x4c0 [kvm] kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm] ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm] kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm] =============================== [ INFO: suspicious RCU usage. ] 4.8.0-rc5+ #47 Not tainted ------------------------------- ./include/linux/kvm_host.h:535 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by qemu-system-x86/4230: #0: (&vcpu->mutex){+.+.+.}, at: [<ffffffffc062975c>] vcpu_load+0x1c/0x60 [kvm] stack backtrace: CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47 Call Trace: dump_stack+0x99/0xd0 lockdep_rcu_suspicious+0xe7/0x120 gfn_to_memslot+0x12a/0x140 [kvm] gfn_to_pfn+0x12/0x30 [kvm] gfn_to_page+0xe/0x20 [kvm] kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm] nested_vmx_vmexit+0x765/0xca0 [kvm_intel] ? _raw_spin_unlock_irqrestore+0x36/0x80 vmx_check_nested_events+0x49/0x1f0 [kvm_intel] kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm] kvm_vcpu_check_block+0x12/0x60 [kvm] kvm_vcpu_block+0x94/0x4c0 [kvm] kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm] ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm] kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm] ? __fget+0xfd/0x210 ? __lock_is_held+0x54/0x70 do_vfs_ioctl+0x96/0x6a0 ? __fget+0x11c/0x210 ? __fget+0x5/0x210 SyS_ioctl+0x79/0x90 do_syscall_64+0x81/0x220 entry_SYSCALL64_slow_path+0x25/0x25 These can be triggered by running kvm-unit-test: ./x86-run x86/vmx.flat The nested preemption timer is based on hrtimer which is started on L2 entry, stopped on L2 exit and evaluated via the new check_nested_events hook. The current logic adds vCPU to a simple waitqueue (TASK_INTERRUPTIBLE) if need to yield pCPU and w/o holding srcu read lock when accesses memslots, both can be in nested preemption timer evaluation path which results in the warning above. This patch fix it by leveraging request bit to async reload APIC access page before vmentry in order to avoid to reload directly during the nested preemption timer evaluation, it is safe since the vmcs01 is loaded and current is nested vmexit. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 20 9月, 2016 5 次提交
-
-
由 Colin Ian King 提交于
vm_data->avic_vm_id is a u32, so the check for a error return (less than zero) such as -EAGAIN from avic_get_next_vm_id currently has no effect whatsoever. Fix this by using a temporary int for the comparison and assign vm_data->avic_vm_id to this. I used an explicit u32 cast in the assignment to show why vm_data->avic_vm_id cannot be used in the assign/compare steps. Signed-off-by: NColin Ian King <colin.king@canonical.com> Acked-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Lately tsc page was implemented but filled with empty values. This patch setup tsc page scale and offset based on vcpu tsc, tsc_khz and HV_X64_MSR_TIME_REF_COUNT value. The valid tsc page drops HV_X64_MSR_TIME_REF_COUNT msr reads count to zero which potentially improves performance. Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: NPeter Hornyack <peterhornyack@google.com> Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> [Computation of TSC page parameters rewritten to use the Linux timekeeper parameters. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Introduce a function that reads the exact nanoseconds value that is provided to the guest in kvmclock. This crystallizes the notion of kvmclock as a thin veneer over a stable TSC, that the guest will (hopefully) convert with NTP. In other words, kvmclock is *not* a paravirtualized host-to-guest NTP. Drop the get_kernel_ns() function, that was used both to get the base value of the master clock and to get the current value of kvmclock. The former use is replaced by ktime_get_boot_ns(), the latter is the purpose of get_kernel_ns(). This also allows KVM to provide a Hyper-V time reference counter that is synchronized with the time that is computed from the TSC page. Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Make the guest's kvmclock count up from zero, not from the host boot time. The guest cannot rely on that anyway because it changes on migration, the numbers are easier on the eye and finally it matches the desired semantics of the Hyper-V time reference counter. Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
We will use it in the next patches for KVM_GET_CLOCK and as a basis for the contents of the Hyper-V TSC page. Get the values from the Linux timekeeper even if kvmclock is not enabled. Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 16 9月, 2016 2 次提交
-
-
由 Luiz Capitulino 提交于
This commit exports the following information to user-space via the newly created per-vcpu debugfs directory: - TSC offset (as a signed number) - TSC scaling ratio - TSC scaling ratio fractinal bits The original intention of this commit was to export only the TSC offset, but the TSC scaling information is exported for completeness. We need to retrieve the TSC offset from user-space in order to support the merging of host and guest traces in trace-cmd. Today, we use the kvm_write_tsc_offset tracepoint, but it has a number of problems (mainly, it requires a running VM to be rebooted, ftrace setup, and also tracepoints are not supposed to be ABIs). The merging of host and guest traces is explained in more detail in this thread: [Qemu-devel] [RFC] host and guest kernel trace merging https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html This commit creates the following files in debugfs: /sys/kernel/debug/kvm/66828-10/vcpu0/tsc-offset /sys/kernel/debug/kvm/66828-10/vcpu0/tsc-scaling-ratio /sys/kernel/debug/kvm/66828-10/vcpu0/tsc-scaling-ratio-frac-bits The last two are only created if TSC scaling is supported. Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Luiz Capitulino 提交于
Two stubs are added: o kvm_arch_has_vcpu_debugfs(): must return true if the arch supports creating debugfs entries in the vcpu debugfs dir (which will be implemented by the next commit) o kvm_arch_create_vcpu_debugfs(): code that creates debugfs entries in the vcpu debugfs dir For x86, this commit introduces a new file to avoid growing arch/x86/kvm/x86.c even more. Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-