- 27 6月, 2016 1 次提交
-
-
由 Marcelo Tosatti 提交于
Move the inline function nsec_to_cycles from x86.c to x86.h, as the next patch uses it from lapic.c. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 02 6月, 2016 4 次提交
-
-
由 Paolo Bonzini 提交于
MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to any of bits 63:32. However, this is not detected at KVM_SET_DEBUGREGS time, and the next KVM_RUN oopses: general protection fault: 0000 [#1] SMP CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1 Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012 [...] Call Trace: [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm] [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm] [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480 [<ffffffff812418a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71 Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 RIP [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40 RSP <ffff88005836bd50> Testcase (beautified/reduced from syzkaller output): #include <unistd.h> #include <sys/syscall.h> #include <string.h> #include <stdint.h> #include <linux/kvm.h> #include <fcntl.h> #include <sys/ioctl.h> long r[8]; int main() { struct kvm_debugregs dr = { 0 }; r[2] = open("/dev/kvm", O_RDONLY); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7); memcpy(&dr, "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72" "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8" "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9" "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb", 48); r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr); r[6] = ioctl(r[4], KVM_RUN, 0); } Reported-by: NDmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Paolo Bonzini 提交于
This cannot be returned by KVM_GET_VCPU_EVENTS, so it is okay to return EINVAL. It causes a WARN from exception_type: WARNING: CPU: 3 PID: 16732 at arch/x86/kvm/x86.c:345 exception_type+0x49/0x50 [kvm]() CPU: 3 PID: 16732 Comm: a.out Tainted: G W 4.4.6-300.fc23.x86_64 #1 Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012 0000000000000286 000000006308a48b ffff8800bec7fcf8 ffffffff813b542e 0000000000000000 ffffffffa0966496 ffff8800bec7fd30 ffffffff810a40f2 ffff8800552a8000 0000000000000000 00000000002c267c 0000000000000001 Call Trace: [<ffffffff813b542e>] dump_stack+0x63/0x85 [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0 [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa0924809>] exception_type+0x49/0x50 [kvm] [<ffffffffa0934622>] kvm_arch_vcpu_ioctl_run+0x10a2/0x14e0 [kvm] [<ffffffffa091c04d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm] [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480 [<ffffffff812414a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71 ---[ end trace b1a0391266848f50 ]--- Testcase (beautified/reduced from syzkaller output): #include <unistd.h> #include <sys/syscall.h> #include <string.h> #include <stdint.h> #include <fcntl.h> #include <sys/ioctl.h> #include <linux/kvm.h> long r[31]; int main() { memset(r, -1, sizeof(r)); r[2] = open("/dev/kvm", O_RDONLY); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[7] = ioctl(r[3], KVM_CREATE_VCPU, 0); struct kvm_vcpu_events ve = { .exception.injected = 1, .exception.nr = 0xd4 }; r[27] = ioctl(r[7], KVM_SET_VCPU_EVENTS, &ve); r[30] = ioctl(r[7], KVM_RUN, 0); return 0; } Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Paolo Bonzini 提交于
Found by syzkaller: WARNING: CPU: 3 PID: 15175 at arch/x86/kvm/x86.c:7705 __x86_set_memory_region+0x1dc/0x1f0 [kvm]() CPU: 3 PID: 15175 Comm: a.out Tainted: G W 4.4.6-300.fc23.x86_64 #1 Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012 0000000000000286 00000000950899a7 ffff88011ab3fbf0 ffffffff813b542e 0000000000000000 ffffffffa0966496 ffff88011ab3fc28 ffffffff810a40f2 00000000000001fd 0000000000003000 ffff88014fc50000 0000000000000000 Call Trace: [<ffffffff813b542e>] dump_stack+0x63/0x85 [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0 [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa09251cc>] __x86_set_memory_region+0x1dc/0x1f0 [kvm] [<ffffffffa092521b>] x86_set_memory_region+0x3b/0x60 [kvm] [<ffffffffa09bb61c>] vmx_set_tss_addr+0x3c/0x150 [kvm_intel] [<ffffffffa092f4d4>] kvm_arch_vm_ioctl+0x654/0xbc0 [kvm] [<ffffffffa091d31a>] kvm_vm_ioctl+0x9a/0x6f0 [kvm] [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480 [<ffffffff812414a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71 Testcase: #include <unistd.h> #include <sys/ioctl.h> #include <fcntl.h> #include <string.h> #include <linux/kvm.h> long r[8]; int main() { memset(r, -1, sizeof(r)); r[2] = open("/dev/kvm", O_RDONLY|O_TRUNC); r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul); r[5] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul); r[7] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul); return 0; } Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Dmitry Bilunov 提交于
Intel CPUs having Turbo Boost feature implement an MSR to provide a control interface via rdmsr/wrmsr instructions. One could detect the presence of this feature by issuing one of these instructions and handling the #GP exception which is generated in case the referenced MSR is not implemented by the CPU. KVM's vCPU model behaves exactly as a real CPU in this case by injecting a fault when MSR_IA32_PERF_CTL is called (which KVM does not support). However, some operating systems use this register during an early boot stage in which their kernel is not capable of handling #GP correctly, causing #DP and finally a triple fault effectively resetting the vCPU. This patch implements a dummy handler for MSR_IA32_PERF_CTL to avoid the crashes. Signed-off-by: NDmitry Bilunov <kmeaw@yandex-team.ru> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 19 5月, 2016 2 次提交
-
-
由 Suravee Suthikulpanit 提交于
This patch introduces VMEXIT handlers, avic_incomplete_ipi_interception() and avic_unaccelerated_access_interception() along with two trace points (trace_kvm_avic_incomplete_ipi and trace_kvm_avic_unaccelerated_access). Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suravee Suthikulpanit 提交于
Adding function pointers in struct kvm_x86_ops for processor-specific layer to provide hooks for when KVM initialize and destroy VM. Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 5月, 2016 1 次提交
-
-
由 Christian Borntraeger 提交于
Some wakeups should not be considered a sucessful poll. For example on s390 I/O interrupts are usually floating, which means that _ALL_ CPUs would be considered runnable - letting all vCPUs poll all the time for transactional like workload, even if one vCPU would be enough. This can result in huge CPU usage for large guests. This patch lets architectures provide a way to qualify wakeups if they should be considered a good/bad wakeups in regard to polls. For s390 the implementation will fence of halt polling for anything but known good, single vCPU events. The s390 implementation for floating interrupts does a wakeup for one vCPU, but the interrupt will be delivered by whatever CPU checks first for a pending interrupt. We prefer the woken up CPU by marking the poll of this CPU as "good" poll. This code will also mark several other wakeup reasons like IPI or expired timers as "good". This will of course also mark some events as not sucessful. As KVM on z runs always as a 2nd level hypervisor, we prefer to not poll, unless we are really sure, though. This patch successfully limits the CPU usage for cases like uperf 1byte transactional ping pong workload or wakeup heavy workload like OLTP while still providing a proper speedup. This also introduced a new vcpu stat "halt_poll_no_tuning" that marks wakeups that are considered not good for polling. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version) Cc: David Matlack <dmatlack@google.com> Cc: Wanpeng Li <kernellwp@gmail.com> [Rename config symbol. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 12 5月, 2016 1 次提交
-
-
由 Alex Williamson 提交于
If we don't support a mechanism for bypassing IRQs, don't register as a consumer. This eliminates meaningless dev_info()s when the connect fails between producer and consumer, such as on AMD systems where kvm_x86_ops->update_pi_irte is not implemented Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 03 5月, 2016 1 次提交
-
-
由 Wanpeng Li 提交于
Guest should only trust data to be valid when version haven't changed before and after reads of steal time. Besides not changing, it has to be an even number. Hypervisor may write an odd number to version field to indicate that an update is in progress. kvm_steal_clock() in guest has already done the read side, make write side in hypervisor more robust by following the above rule. Reviewed-by: NWincy Van <fanwenyi0529@gmail.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 20 4月, 2016 1 次提交
-
-
由 Liang Chen 提交于
Since accumulate_steal_time is now only called in record_steal_time, it doesn't quite make sense to put the delta calculation in a separate function. The function could be called thousands of times before guest enables the steal time MSR (though the compiler may optimize out this function call). And after it's enabled, the MSR enable bit is tested twice every time. Removing the accumulate_steal_time function also avoids the necessity of having the accum_steal field. Signed-off-by: NLiang Chen <liangchen.linux@gmail.com> Signed-off-by: NGavin Guo <gavin.guo@canonical.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 4月, 2016 2 次提交
-
-
由 Borislav Petkov 提交于
Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: <kvm@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1459801503-15600-11-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Link: http://lkml.kernel.org/r/1459801503-15600-10-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 11 4月, 2016 1 次提交
-
-
由 David Matlack 提交于
An interrupt handler that uses the fpu can kill a KVM VM, if it runs under the following conditions: - the guest's xcr0 register is loaded on the cpu - the guest's fpu context is not loaded - the host is using eagerfpu Note that the guest's xcr0 register and fpu context are not loaded as part of the atomic world switch into "guest mode". They are loaded by KVM while the cpu is still in "host mode". Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The interrupt handler will look something like this: if (irq_fpu_usable()) { kernel_fpu_begin(); [... code that uses the fpu ...] kernel_fpu_end(); } As long as the guest's fpu is not loaded and the host is using eager fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle() returns true). The interrupt handler proceeds to use the fpu with the guest's xcr0 live. kernel_fpu_begin() saves the current fpu context. If this uses XSAVE[OPT], it may leave the xsave area in an undesirable state. According to the SDM, during XSAVE bit i of XSTATE_BV is not modified if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and xcr0[i] == 0 following an XSAVE. kernel_fpu_end() restores the fpu context. Now if any bit i in XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The fault is trapped and SIGSEGV is delivered to the current process. Only pre-4.2 kernels appear to be vulnerable to this sequence of events. Commit 653f52c3 ("kvm,x86: load guest FPU context more eagerly") from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts. This patch fixes the bug by keeping the host's xcr0 loaded outside of the interrupts-disabled region where KVM switches into guest mode. Cc: stable@vger.kernel.org Suggested-by: NAndy Lutomirski <luto@amacapital.net> Signed-off-by: NDavid Matlack <dmatlack@google.com> [Move load after goto cancel_injection. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 4月, 2016 1 次提交
-
-
由 Yuki Shibuya 提交于
Non maskable interrupts (NMI) are preferred to interrupts in current implementation. If a NMI is pending and NMI is blocked by the result of nmi_allowed(), pending interrupt is not injected and enable_irq_window() is not executed, even if interrupts injection is allowed. In old kernel (e.g. 2.6.32), schedule() is often called in NMI context. In this case, interrupts are needed to execute iret that intends end of NMI. The flag of blocking new NMI is not cleared until the guest execute the iret, and interrupts are blocked by pending NMI. Due to this, iret can't be invoked in the guest, and the guest is starved until block is cleared by some events (e.g. canceling injection). This patch injects pending interrupts, when it's allowed, even if NMI is blocked. And, If an interrupts is pending after executing inject_pending_event(), enable_irq_window() is executed regardless of NMI pending counter. Cc: stable@vger.kernel.org Signed-off-by: NYuki Shibuya <shibuya.yk@ncos.nec.co.jp> Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 3月, 2016 3 次提交
-
-
由 Lan Tianyu 提交于
The barrier also orders the write to mode from any reads to the page tables done and so update the comment. Signed-off-by: NLan Tianyu <tianyu.lan@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Huaitong Han 提交于
X86_FEATURE_PKU is referred to as "PKU" in the hardware documentation: CPUID.7.0.ECX[3]:PKU. X86_FEATURE_OSPKE is software support for pkeys, enumerated with CPUID.7.0.ECX[4]:OSPKE, and it reflects the setting of CR4.PKE(bit 22). This patch disables CPUID:PKU without ept, because pkeys is not yet implemented for shadow paging. Signed-off-by: NHuaitong Han <huaitong.han@intel.com> Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Huaitong Han 提交于
Protection keys define a new 4-bit protection key field (PKEY) in bits 62:59 of leaf entries of the page tables, the PKEY is an index to PKRU register(16 domains), every domain has 2 bits(write disable bit, access disable bit). Static logic has been produced in update_pkru_bitmask, dynamic logic need read pkey from page table entries, get pkru value, and deduce the correct result. [ Huaitong: Xiao helps to modify many sections. ] Signed-off-by: NHuaitong Han <huaitong.han@intel.com> Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 3月, 2016 1 次提交
-
-
由 Paolo Bonzini 提交于
It is now equal to use_eager_fpu(), which simply tests a cpufeature bit. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 04 3月, 2016 4 次提交
-
-
由 Radim Krčmář 提交于
channels has offset 0 and correct size now, but that can change. Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
Discard policy doesn't rely on information from notifiers, so we don't need to register notifiers unconditionally. We kept correct counts in case userspace switched between policies during runtime, but that can be avoided by reseting the state. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
- kvm_create_pit had to lock only because it exposed kvm->arch.vpit very early, but initialization doesn't use kvm->arch.vpit since the last patch, so we can drop locking. - kvm_free_pit is only run after there are no users of KVM and therefore is the sole actor. - Locking in kvm_vm_ioctl_reinject doesn't do anything, because reinject is only protected at that place. - kvm_pit_reset isn't used anywhere and its locking can be dropped if we hide it. Removing useless locking allows to see what actually is being protected by PIT state lock (values accessible from the guest). Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
This patch passes struct kvm_pit into internal PIT functions. Those functions used to get PIT through kvm->arch.vpit, even though most of them never used *kvm for other purposes. Another benefit is that we don't need to set kvm->arch.vpit during initialization. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 03 3月, 2016 4 次提交
-
-
由 Xiao Guangrong 提交于
Register the notifier to receive write track event so that we can update our shadow page table It makes kvm_mmu_pte_write() be the callback of the notifier, no function is changed Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiao Guangrong 提交于
Notifier list is introduced so that any node wants to receive the track event can register to the list Two APIs are introduced here: - kvm_page_track_register_notifier(): register the notifier to receive track event - kvm_page_track_unregister_notifier(): stop receiving track event by unregister the notifier The callback, node->track_write() is called when a write access on the write tracked page happens Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiao Guangrong 提交于
The array, gfn_track[mode][gfn], is introduced in memory slot for every guest page, this is the tracking count for the gust page on different modes. If the page is tracked then the count is increased, the page is not tracked after the count reaches zero We use 'unsigned short' as the tracking count which should be enough as shadow page table only can use 2^14 (2^3 for level, 2^1 for cr4_pae, 2^2 for quadrant, 2^3 for access, 2^1 for nxe, 2^1 for cr0_wp, 2^1 for smep_andnot_wp, 2^1 for smap_andnot_wp, and 2^1 for smm) at most, there is enough room for other trackers Two callbacks, kvm_page_track_create_memslot() and kvm_page_track_free_memslot() are implemented in this patch, they are internally used to initialize and reclaim the memory of the array Currently, only write track mode is supported Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiao Guangrong 提交于
kvm_lpage_info->write_count is used to detect if the large page mapping for the gfn on the specified level is allowed, rename it to disallow_lpage to reflect its purpose, also we rename has_wrprotected_page() to mmu_gfn_lpage_is_disallowed() to make the code more clearer Later we will extend this mechanism for page tracking: if the gfn is tracked then large mapping for that gfn on any level is not allowed. The new name is more straightforward Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 26 2月, 2016 1 次提交
-
-
由 Paolo Bonzini 提交于
Commit 172b2386 ("KVM: x86: fix missed hardware breakpoints", 2016-02-10) worked around a case where the debug registers are not loaded correctly on preemption and on the first entry to KVM_RUN. However, Xiao Guangrong pointed out that the root cause must be that KVM_DEBUGREG_BP_ENABLED is not being set correctly. This can indeed happen due to the lazy debug exit mechanism, which does not call kvm_update_dr7. Fix it by replacing the existing loop (more or less equivalent to kvm_update_dr0123) with calls to all the kvm_update_dr* functions. Cc: stable@vger.kernel.org # 4.1+ Fixes: 172b2386Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 24 2月, 2016 2 次提交
-
-
由 Paolo Bonzini 提交于
Sometimes when setting a breakpoint a process doesn't stop on it. This is because the debug registers are not loaded correctly on VCPU load. The following simple reproducer from Oleg Nesterov tries using debug registers in two threads. To see the bug, run a 2-VCPU guest with "taskset -c 0" and run "./bp 0 1" inside the guest. #include <unistd.h> #include <signal.h> #include <stdlib.h> #include <stdio.h> #include <sys/wait.h> #include <sys/ptrace.h> #include <sys/user.h> #include <asm/debugreg.h> #include <assert.h> #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len) { unsigned long dr7; dr7 = ((len | type) & 0xf) << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE); if (enable) dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)); return dr7; } int write_dr(int pid, int dr, unsigned long val) { return ptrace(PTRACE_POKEUSER, pid, offsetof (struct user, u_debugreg[dr]), val); } void set_bp(pid_t pid, void *addr) { unsigned long dr7; assert(write_dr(pid, 0, (long)addr) == 0); dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1); assert(write_dr(pid, 7, dr7) == 0); } void *get_rip(int pid) { return (void*)ptrace(PTRACE_PEEKUSER, pid, offsetof(struct user, regs.rip), 0); } void test(int nr) { void *bp_addr = &&label + nr, *bp_hit; int pid; printf("test bp %d\n", nr); assert(nr < 16); // see 16 asm nops below pid = fork(); if (!pid) { assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0); kill(getpid(), SIGSTOP); for (;;) { label: asm ( "nop; nop; nop; nop;" "nop; nop; nop; nop;" "nop; nop; nop; nop;" "nop; nop; nop; nop;" ); } } assert(pid == wait(NULL)); set_bp(pid, bp_addr); for (;;) { assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0); assert(pid == wait(NULL)); bp_hit = get_rip(pid); if (bp_hit != bp_addr) fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n", bp_hit - &&label, nr); } } int main(int argc, const char *argv[]) { while (--argc) { int nr = atoi(*++argv); if (!fork()) test(nr); } while (wait(NULL) > 0) ; return 0; } Cc: stable@vger.kernel.org Suggested-by: NNadav Amit <namit@cs.technion.ac.il> Reported-by: NAndrey Wagin <avagin@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Adam Buchbinder 提交于
Signed-off-by: NAdam Buchbinder <adam.buchbinder@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: trivial@kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 17 2月, 2016 4 次提交
-
-
由 Paolo Bonzini 提交于
Prepare for improving the precision in the next patch. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Sometimes when setting a breakpoint a process doesn't stop on it. This is because the debug registers are not loaded correctly on VCPU load. The following simple reproducer from Oleg Nesterov tries using debug registers in both the host and the guest, for example by running "./bp 0 1" on the host and "./bp 14 15" under QEMU. #include <unistd.h> #include <signal.h> #include <stdlib.h> #include <stdio.h> #include <sys/wait.h> #include <sys/ptrace.h> #include <sys/user.h> #include <asm/debugreg.h> #include <assert.h> #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len) { unsigned long dr7; dr7 = ((len | type) & 0xf) << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE); if (enable) dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)); return dr7; } int write_dr(int pid, int dr, unsigned long val) { return ptrace(PTRACE_POKEUSER, pid, offsetof (struct user, u_debugreg[dr]), val); } void set_bp(pid_t pid, void *addr) { unsigned long dr7; assert(write_dr(pid, 0, (long)addr) == 0); dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1); assert(write_dr(pid, 7, dr7) == 0); } void *get_rip(int pid) { return (void*)ptrace(PTRACE_PEEKUSER, pid, offsetof(struct user, regs.rip), 0); } void test(int nr) { void *bp_addr = &&label + nr, *bp_hit; int pid; printf("test bp %d\n", nr); assert(nr < 16); // see 16 asm nops below pid = fork(); if (!pid) { assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0); kill(getpid(), SIGSTOP); for (;;) { label: asm ( "nop; nop; nop; nop;" "nop; nop; nop; nop;" "nop; nop; nop; nop;" "nop; nop; nop; nop;" ); } } assert(pid == wait(NULL)); set_bp(pid, bp_addr); for (;;) { assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0); assert(pid == wait(NULL)); bp_hit = get_rip(pid); if (bp_hit != bp_addr) fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n", bp_hit - &&label, nr); } } int main(int argc, const char *argv[]) { while (--argc) { int nr = atoi(*++argv); if (!fork()) test(nr); } while (wait(NULL) > 0) ; return 0; } Cc: stable@vger.kernel.org Suggested-by: NNadadv Amit <namit@cs.technion.ac.il> Reported-by: NAndrey Wagin <avagin@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This is the same as before: kvm_scale_tsc(tgt_tsc_khz) = tgt_tsc_khz * ratio = tgt_tsc_khz * user_tsc_khz / tsc_khz (see set_tsc_khz) = user_tsc_khz (see kvm_guest_time_update) = vcpu->arch.virtual_tsc_khz (see kvm_set_tsc_khz) However, computing it through kvm_scale_tsc will make it possible to include the NTP correction in tgt_tsc_khz. Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This refers to the desired (scaled) frequency, which is called user_tsc_khz in the rest of the file. Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 2月, 2016 3 次提交
-
-
由 Paolo Bonzini 提交于
Different pieces of code checked for vcpu->arch.apic being (non-)NULL, or used kvm_vcpu_has_lapic (more optimized) or lapic_in_kernel. Replace everything with lapic_in_kernel's name and kvm_vcpu_has_lapic's implementation. Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Feng Wu 提交于
Use vector-hashing to deliver lowest-priority interrupts, As an example, modern Intel CPUs in server platform use this method to handle lowest-priority interrupts. Signed-off-by: NFeng Wu <feng.wu@intel.com> Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This is similar to the existing div_frac function, but it returns the remainder too. Unlike div_frac, it can be used to implement long division, e.g. (a << 64) / b for 32-bit a and b. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 16 1月, 2016 1 次提交
-
-
由 Dan Williams 提交于
To date, we have implemented two I/O usage models for persistent memory, PMEM (a persistent "ram disk") and DAX (mmap persistent memory into userspace). This series adds a third, DAX-GUP, that allows DAX mappings to be the target of direct-i/o. It allows userspace to coordinate DMA/RDMA from/to persistent memory. The implementation leverages the ZONE_DEVICE mm-zone that went into 4.3-rc1 (also discussed at kernel summit) to flag pages that are owned and dynamically mapped by a device driver. The pmem driver, after mapping a persistent memory range into the system memmap via devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus page-backed pmem-pfns via flags in the new pfn_t type. The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the resulting pte(s) inserted into the process page tables with a new _PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys off _PAGE_DEVMAP to pin the device hosting the page range active. Finally, get_page() and put_page() are modified to take references against the device driver established page mapping. Finally, this need for "struct page" for persistent memory requires memory capacity to store the memmap array. Given the memmap array for a large pool of persistent may exhaust available DRAM introduce a mechanism to allocate the memmap from persistent memory. The new "struct vmem_altmap *" parameter to devm_memremap_pages() enables arch_add_memory() to use reserved pmem capacity rather than the page allocator. This patch (of 18): The core has developed a need for a "pfn_t" type [1]. Move the existing pfn_t in KVM to kvm_pfn_t [2]. [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.htmlSigned-off-by: NDan Williams <dan.j.williams@intel.com> Acked-by: NChristoffer Dall <christoffer.dall@linaro.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 1月, 2016 2 次提交
-
-
由 Andrey Smetanin 提交于
Consolidate updating the Hyper-V SynIC timers in a single place: on guest entry in processing KVM_REQ_HV_STIMER request. This simplifies the overall logic, and makes sure the most current state of msrs and guest clock is used for arming the timers (to achieve that, KVM_REQ_HV_STIMER has to be processed after KVM_REQ_CLOCK_UPDATE). Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Since the numbers now overlap, it makes sense to enumerate them in asm/kvm_host.h rather than linux/kvm_host.h. Functions that refer to architecture-specific requests are also moved to arch/. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-