- 03 9月, 2014 3 次提交
-
-
由 David Matlack 提交于
The following events can lead to an incorrect KVM_EXIT_MMIO bubbling up to userspace: (1) Guest accesses gpa X without a memory slot. The gfn is cached in struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets the SPTE write-execute-noread so that future accesses cause EPT_MISCONFIGs. (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION covering the page just accessed. (3) Guest attempts to read or write to gpa X again. On Intel, this generates an EPT_MISCONFIG. The memory slot generation number that was incremented in (2) would normally take care of this but we fast path mmio faults through quickly_check_mmio_pf(), which only checks the per-vcpu mmio cache. Since we hit the cache, KVM passes a KVM_EXIT_MMIO up to userspace. This patch fixes the issue by using the memslot generation number to validate the mmio cache. Cc: stable@vger.kernel.org Signed-off-by: NDavid Matlack <dmatlack@google.com> [xiaoguangrong: adjust the code to make it simpler for stable-tree fix.] Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: NDavid Matlack <dmatlack@google.com> Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Tested-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
vcpu exits and memslot mutations can run concurrently as long as the vcpu does not aquire the slots mutex. Thus it is theoretically possible for memslots to change underneath a vcpu that is handling an exit. If we increment the memslot generation number again after synchronize_srcu_expedited(), vcpus can safely cache memslot generation without maintaining a single rcu_dereference through an entire vm exit. And much of the x86/kvm code does not maintain a single rcu_dereference of the current memslots during each exit. We can prevent the following case: vcpu (CPU 0) | thread (CPU 1) --------------------------------------------+-------------------------- 1 vm exit | 2 srcu_read_unlock(&kvm->srcu) | 3 decide to cache something based on | old memslots | 4 | change memslots | (increments generation) 5 | synchronize_srcu(&kvm->srcu); 6 retrieve generation # from new memslots | 7 tag cache with new memslot generation | 8 srcu_read_unlock(&kvm->srcu) | ... | <action based on cache occurs even | though the caching decision was based | on the old memslots> | ... | <action *continues* to occur until next | memslot generation change, which may | be never> | | By incrementing the generation after synchronizing with kvm->srcu readers, we ensure that the generation retrieved in (6) will become invalid soon after (8). Keeping the existing increment is not strictly necessary, but we do keep it and just move it for consistency from update_memslots to install_new_memslots. It invalidates old cached MMIOs immediately, instead of having to wait for the end of synchronize_srcu_expedited, which makes the code more clearly correct in case CPU 1 is preempted right after synchronize_srcu() returns. To avoid halving the generation space in SPTEs, always presume that the low bit of the generation is zero when reconstructing a generation number out of an SPTE. This effectively disables MMIO caching in SPTEs during the call to synchronize_srcu_expedited. Using the low bit this way is somewhat like a seqcount---where the protected thing is a cache, and instead of retrying we can simply punt if we observe the low bit to be 1. Cc: stable@vger.kernel.org Signed-off-by: NDavid Matlack <dmatlack@google.com> Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The next patch will give a meaning (a la seqcount) to the low bit of the generation number. Ensure that it matches between kvm->memslots->generation and kvm_current_mmio_generation(). Cc: stable@vger.kernel.org Reviewed-by: NDavid Matlack <dmatlack@google.com> Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 30 8月, 2014 1 次提交
-
-
由 Paolo Bonzini 提交于
The check introduced in commit d7a2a246 (KVM: x86: #GP when attempts to write reserved bits of Variable Range MTRRs, 2014-08-19) will break if the guest maxphyaddr is higher than the host's (which sometimes happens depending on your hardware and how QEMU is configured). To fix this, use cpuid_maxphyaddr similar to how the APIC_BASE MSR does already. Reported-by: NJan Kiszka <jan.kiszka@siemens.com> Tested-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 29 8月, 2014 12 次提交
-
-
由 Radim Krčmář 提交于
In the beggining was on_each_cpu(), which required an unused argument to kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten. Remove unnecessary arguments that stem from this. Signed-off-by: NRadim KrÄmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
Using static inline is going to save few bytes and cycles. For example on powerpc, the difference is 700 B after stripping. (5 kB before) This patch also deals with two overlooked empty functions: kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c 2df72e9b KVM: split kvm_arch_flush_shadow and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c. e790d9ef KVM: add kvm_arch_sched_in Signed-off-by: NRadim KrÄmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid "'struct foo' declared inside parameter list" warnings (and consequent breakage due to conflicting types). Move them from individual files to a generic place in linux/kvm_types.h. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
These are not explicitly aligned, and do not require alignment on AVX. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Alex Williamson 提交于
Windows 8.1 guest with NVIDIA driver and GPU fails to boot with an emulation failure. The KVM spew suggests the fault is with lack of movntdq emulation (courtesy of Paolo): Code=02 00 00 b8 08 00 00 00 f3 0f 6f 44 0a f0 f3 0f 6f 4c 0a e0 <66> 0f e7 41 f0 66 0f e7 49 e0 48 83 e9 40 f3 0f 6f 44 0a 10 f3 0f 6f 0c 0a 66 0f e7 41 10 $ as -o a.out .section .text .byte 0x66, 0x0f, 0xe7, 0x41, 0xf0 .byte 0x66, 0x0f, 0xe7, 0x49, 0xe0 $ objdump -d a.out 0: 66 0f e7 41 f0 movntdq %xmm0,-0x10(%rcx) 5: 66 0f e7 49 e0 movntdq %xmm1,-0x20(%rcx) Add the necessary emulation. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Nadav Amit 提交于
Unlike VMCALL, the instructions VMXOFF, VMLAUNCH and VMRESUME should cause a UD exception in real-mode or vm86. However, the emulator considers all these instructions the same for the matter of mode checks, and emulation upon exit due to #UD exception. As a result, the hypervisor behaves incorrectly on vm86 mode. VMXOFF, VMLAUNCH or VMRESUME cause on vm86 exit due to #UD. The hypervisor then emulates these instruction and inject #GP to the guest instead of #UD. This patch creates a new group for these instructions and mark only VMCALL as an instruction which can be emulated. Signed-off-by: NNadav Amit <namit@cs.technion.ac.il> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Sparse reports the following easily fixed warnings: arch/x86/kvm/vmx.c:8795:48: sparse: Using plain integer as NULL pointer arch/x86/kvm/vmx.c:2138:5: sparse: symbol vmx_read_l1_tsc was not declared. Should it be static? arch/x86/kvm/vmx.c:6151:48: sparse: Using plain integer as NULL pointer arch/x86/kvm/vmx.c:8851:6: sparse: symbol vmx_sched_in was not declared. Should it be static? arch/x86/kvm/svm.c:2162:5: sparse: symbol svm_read_l1_tsc was not declared. Should it be static? Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=61411 TPR shadow/threshold feature is important to speed up the Windows guest. Besides, it is a must feature for certain VMM. We map virtual APIC page address and TPR threshold from L1 VMCS. If TPR_BELOW_THRESHOLD VM exit is triggered by L2 guest and L1 interested in, we inject it into L1 VMM for handling. Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com> [Add PAGE_ALIGNED check, do not write useless virtual APIC page address if TPR shadowing is disabled. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
Introduce function nested_get_vmcs12_pages() to check the valid of nested apic access page and virtual apic page earlier. Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoffer Dall 提交于
The idea between capabilities and the KVM_CHECK_EXTENSION ioctl is that userspace can, at run-time, determine if a feature is supported or not. This allows KVM to being supporting a new feature with a new kernel version without any need to update user space. Unfortunately, since the definition of KVM_CAP_USER_NMI was guarded by #ifdef __KVM_HAVE_USER_NMI, such discovery still required a user space update. Therefore, unconditionally export KVM_CAP_USER_NMI and change the the typo in the comment for the IOCTL number definition as well. Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoffer Dall 提交于
The idea between capabilities and the KVM_CHECK_EXTENSION ioctl is that userspace can, at run-time, determine if a feature is supported or not. This allows KVM to being supporting a new feature with a new kernel version without any need to update user space. Unfortunately, since the definition of KVM_CAP_READONLY_MEM was guarded by #ifdef __KVM_HAVE_READONLY_MEM, such discovery still required a user space update. Therefore, unconditionally export KVM_CAP_READONLY_MEM and change the in-kernel conditional to rely on __KVM_HAVE_READONLY_MEM. Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christian Borntraeger 提交于
commit ab3f285f ("KVM: s390/mm: try a cow on read only pages for key ops")' misaligned a code block. Let's fixup the indentation. Reported-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 26 8月, 2014 4 次提交
-
-
由 Paolo Bonzini 提交于
Merge tag 'kvm-s390-next-20140825' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes and features for 3.18 part 1 1. The usual cleanups: get rid of duplicate code, use defines, factor out the sync_reg handling, additional docs for sync_regs, better error handling on interrupt injection 2. We use KVM_REQ_TLB_FLUSH instead of open coding tlb flushes 3. Additional registers for kvm_run sync regs. This is usually not needed in the fast path due to eventfd/irqfd, but kvm stat claims that we reduced the overhead of console output by ~50% on my system 4. A rework of the gmap infrastructure. This is the 2nd step towards host large page support (after getting rid of the storage key dependency). We introduces two radix trees to store the guest-to-host and host-to-guest translations. This gets us rid of most of the page-table walks in the gmap code. Only one in __gmap_link is left, this one is required to link the shadow page table to the process page table. Finally this contains the plumbing to support gmap page tables with less than 5 levels.
-
由 Martin Schwidefsky 提交于
The radix tree rework removed all code that uses the gmap_rmap and gmap_pgtable data structures. Remove these outdated definitions. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 Martin Schwidefsky 提交于
Add an addressing limit to the gmap address spaces and only allocate the page table levels that are needed for the given limit. The limit is fixed and can not be changed after a gmap has been created. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 Martin Schwidefsky 提交于
Store the target address for the gmap segments in a radix tree instead of using invalid segment table entries. gmap_translate becomes a simple radix_tree_lookup, gmap_fault is split into the address translation with gmap_translate and the part that does the linking of the gmap shadow page table with the process page table. A second radix tree is used to keep the pointers to the segment table entries for segments that are mapped in the guest address space. On unmap of a segment the pointer is retrieved from the radix tree and is used to carry out the segment invalidation in the gmap shadow page table. As the radix tree can only store one pointer, each host segment may only be mapped to exactly one guest location. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
- 25 8月, 2014 16 次提交
-
-
由 Paolo Bonzini 提交于
Fix commit 7b46268d, which mistakenly included the new tracepoint under #ifdef CONFIG_X86_64. Reported-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Merge tag 'kvm-s390-20140825' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD Here are two fixes for s390 KVM code that prevent: 1. a malicious user to trigger a kernel BUG 2. a malicious user to change the storage key of read-only pages
-
由 Martin Schwidefsky 提交于
Make the order of arguments for the gmap calls more consistent, if the gmap pointer is passed it is always the first argument. In addition distinguish between guest address and user address by naming the variables gaddr for a guest address and vmaddr for a user address. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 Martin Schwidefsky 提交于
Revert git commit c3a23b9874c1 ("remove unnecessary parameter from gmap_do_ipte_notify"). Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 Martin Schwidefsky 提交于
Revert git commit 1b7fd6952063 ("remove unecessary parameter from pgste_ipte_notify") Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 Jens Freimann 提交于
The kvm lock protects us against vcpus going away, but they only go away when the virtual machine is shut down. We don't need this mutex here, so let's get rid of it. Signed-off-by: NJens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 Jens Freimann 提交于
Currently we just kill the userspace process and exit the thread immediatly without making sure that we don't hold any locks etc. Improve this by making KVM_RUN return -EFAULT if the lowcore is not mapped during interrupt delivery. To achieve this we need to pass the return code of guest memory access routines used in interrupt delivery all the way back to the KVM_RUN ioctl. Signed-off-by: NJens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 David Hildenbrand 提交于
Use the KVM_REQ_TLB_FLUSH request in order to trigger tlb flushes instead of manipulating the SIE control block whenever we need it. Also trigger it for a control register sync directly instead of (ab)using kvm_s390_set_prefix(). Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com> Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 David Hildenbrand 提交于
In order to reduce the number of syscalls when dropping to user space, this patch enables the synchronization of the following "registers" with kvm_run: - ARCH0: CPU timer, clock comparator, TOD programmable register, guest breaking-event register, program parameter - PFAULT: pfault parameters (token, select, compare) The registers are grouped to reduce the overhead when syncing. As this grows the number of sync registers quite a bit, let's move the code synchronizing registers with kvm_run from kvm_arch_vcpu_ioctl_run() into separate helper routines. Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 Christian Borntraeger 提交于
The load PSW handler does not have to inject pending machine checks. This can wait until the CPU runs the generic interrupt injection code. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
-
由 David Hildenbrand 提交于
We should make sure that all kvm_dirty_regs bits are cleared before dropping to user space. Until now, some would remain pending. Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 David Hildenbrand 提交于
This patch clarifies that kvm_dirty_regs are just a hint to the kernel and that the kernel might just ignore some flags and sync the values (like done for acrs and gprs now). Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 Jens Freimann 提交于
Let's make this a reusable function. Signed-off-by: NJens Freimann <jfrei@linux.vnet.ibm.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 Christian Borntraeger 提交于
The PFMF instruction handler blindly wrote the storage key even if the page was mapped R/O in the host. Lets try a COW before continuing and bail out in case of errors. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Cc: stable@vger.kernel.org
-
由 Jens Freimann 提交于
Get rid of open coded values for pfault init. Signed-off-by: NJens Freimann <jfrei@linux.vnet.ibm.com> Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 Christian Borntraeger 提交于
In the early days, we had some special handling for the KVM_EXIT_S390_SIEIC exit, but this was gone in 2009 with commit d7b0b5eb (KVM: s390: Make psw available on all exits, not just a subset). Now this switch statement is just a sanity check for userspace not messing with the kvm_run structure. Unfortunately, this allows userspace to trigger a kernel BUG. Let's just remove this switch statement. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Cc: stable@vger.kernel.org
-
- 22 8月, 2014 4 次提交
-
-
由 Radim Krčmář 提交于
Tracepoint for dynamic PLE window, fired on every potential change. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
Window is increased on every PLE exit and decreased on every sched_in. The idea is that we don't want to PLE exit if there is no preemption going on. We do this with sched_in() because it does not hold rq lock. There are two new kernel parameters for changing the window: ple_window_grow and ple_window_shrink ple_window_grow affects the window on PLE exit and ple_window_shrink does it on sched_in; depending on their value, the window is modifier like this: (ple_window is kvm_intel's global) ple_window_shrink/ | ple_window_grow | PLE exit | sched_in -------------------+--------------------+--------------------- < 1 | = ple_window | = ple_window < ple_window | *= ple_window_grow | /= ple_window_shrink otherwise | += ple_window_grow | -= ple_window_shrink A third new parameter, ple_window_max, controls the maximal ple_window; it is internally rounded down to a closest multiple of ple_window_grow. VCPU's PLE window is never allowed below ple_window. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
Change PLE window into per-VCPU variable, seeded from module parameter, to allow greater flexibility. Brings in a small overhead on every vmentry. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
sched_in preempt notifier is available for x86, allow its use in specific virtualization technlogies as well. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-