- 12 1月, 2021 3 次提交
-
-
由 chenjiajun 提交于
virt inclusion category: feature bugzilla: 46853 CVE: NA Export vcpu_stat via debugfs for x86, which contains x86 kvm exits items. The path of the vcpu_stat is /sys/kernel/debug/kvm/vcpu_stat, and each line of vcpu_stat is a collection of various kvm exits for a vcpu. And through vcpu_stat, we only need to open one file to tail performance of virtual machine, which is more convenient. Signed-off-by: NFeng Lin <linfeng23@huawei.com> Signed-off-by: Nchenjiajun <chenjiajun8@huawei.com> Reviewed-by: NXiangyou Xie <xiexiangyou@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com>
-
由 chenjiajun 提交于
virt inclusion category: feature bugzilla: 46853 CVE: NA This patch export cpu time related items to vcpu_stat. Contain: steal, st_max, utime, stime, gtime The definitions of these items are: steal: cpu time VCPU waits for PCPU while it is servicing another VCPU st_max: max scheduling delay utime: cpu time in userspace stime: cpu time in sys gtime: cpu time in guest Through these items, user can get many cpu usage info of vcpu, such as: CPU Usage of Guest = gtime_delta / delta_cputime CPU Usage of Hyp = (utime_delta - gtime_delta + stime_delta) / delta_cputime CPU Usage of Steal = steal_delta / delta_cputime Max Scheduling Delay = st_max Signed-off-by: Nliangpeng <liangpeng10@huawei.com> Signed-off-by: Nchenjiajun <chenjiajun8@huawei.com> Reviewed-by: NXiangyou Xie <xiexiangyou@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com>
-
由 chenjiajun 提交于
virt inclusion category: feature bugzilla: 46853 CVE: NA This patch create debugfs entry for vcpu stat. The entry path is /sys/kernel/debug/kvm/vcpu_stat. And vcpu_stat contains partial kvm exits items of vcpu, include: pid, hvc_exit_stat, wfe_exit_stat, wfi_exit_stat, mmio_exit_user, mmio_exit_kernel, exits Currently, The maximum vcpu limit is 1024. From this vcpu_stat, user can get the number of these kvm exits items over a period of time, which is helpful to monitor the virtual machine. Signed-off-by: NZenghui Yu <yuzenghui@huawei.com> Signed-off-by: Nchenjiajun <chenjiajun8@huawei.com> Reviewed-by: NXiangyou Xie <xiexiangyou@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com>
-
- 23 10月, 2020 1 次提交
-
-
由 Ben Gardon 提交于
Dirty logging is a key feature of the KVM MMU and must be supported by the TDP MMU. Add support for both the write protection and PML dirty logging modes. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20201014182700.2888246-16-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 10月, 2020 1 次提交
-
-
由 Peter Xu 提交于
Cache the address space ID just like the slot ID. It will be used in order to fill in the dirty ring entries. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPeter Xu <peterx@redhat.com> Message-Id: <20201014182700.2888246-7-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 28 9月, 2020 2 次提交
-
-
由 Rustam Kovhaev 提交于
Make use of the struct_size() helper to avoid any potential type mistakes and protect against potential integer overflows Make use of the flex_array_size() helper to calculate the size of a flexible array member within an enclosing structure Suggested-by: NGustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: NRustam Kovhaev <rkovhaev@gmail.com> Message-Id: <20200918120500.954436-1-rkovhaev@gmail.com> Reviewed-by: NGustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yi Li 提交于
There is no need to calculate wildcard in each iteration since wildcard is not changed. Signed-off-by: NYi Li <yili@winhong.com> Message-Id: <20200911055652.3041762-1-yili@winhong.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 12 9月, 2020 1 次提交
-
-
由 Rustam Kovhaev 提交于
when kmalloc() fails in kvm_io_bus_unregister_dev(), before removing the bus, we should iterate over all other devices linked to it and call kvm_iodevice_destructor() for them Fixes: 90db1043 ("KVM: kvm_io_bus_unregister_dev() should never fail") Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+f196caa45793d6374707@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=f196caa45793d6374707Signed-off-by: NRustam Kovhaev <rkovhaev@gmail.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200907185535.233114-1-rkovhaev@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 8月, 2020 1 次提交
-
-
由 Will Deacon 提交于
The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block. Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: NWill Deacon <will@kernel.org> Message-Id: <20200811102725.7121-2-will@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 8月, 2020 1 次提交
-
-
由 Peter Xu 提交于
After the cleanup of page fault accounting, gup does not need to pass task_struct around any more. Remove that parameter in the whole gup stack. Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com> Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 7月, 2020 1 次提交
-
-
由 Ahmed S. Darwish 提交于
A sequence counter write side critical section must be protected by some form of locking to serialize writers. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_spinlock_t data type, which allows to associate a spinlock with the sequence counter. This enables lockdep to verify that the spinlock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: NAhmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NPaolo Bonzini <pbonzini@redhat.com> Link: https://lkml.kernel.org/r/20200720155530.1173732-24-a.darwish@linutronix.de
-
- 24 7月, 2020 1 次提交
-
-
由 Thomas Gleixner 提交于
Entering a guest is similar to exiting to user space. Pending work like handling signals, rescheduling, task work etc. needs to be handled before that. Provide generic infrastructure to avoid duplication of the same handling code all over the place. The transfer to guest mode handling is different from the exit to usermode handling, e.g. vs. rseq and live patching, so a separate function is used. The initial list of work items handled is: TIF_SIGPENDING, TIF_NEED_RESCHED, TIF_NOTIFY_RESUME Architecture specific TIF flags can be added via defines in the architecture specific include files. The calling convention is also different from the syscall/interrupt entry functions as KVM invokes this from the outer vcpu_run() loop with interrupts and preemption enabled. To prevent missing a pending work item it invokes a check for pending TIF work from interrupt disabled code right before transitioning to guest mode. The lockdep, RCU and tracing state handling is also done directly around the switch to and from guest mode. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200722220519.833296398@linutronix.de
-
- 10 7月, 2020 1 次提交
-
-
由 Sean Christopherson 提交于
Move x86's memory cache helpers to common KVM code so that they can be reused by arm64 and MIPS in future patches. Suggested-by: NChristoffer Dall <christoffer.dall@arm.com> Reviewed-by: NBen Gardon <bgardon@google.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-16-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 7月, 2020 2 次提交
-
-
由 Vitaly Kuznetsov 提交于
OVMF booted guest running on shadow pages crashes on TRIPLE FAULT after enabling paging from SMM. The crash is triggered from mmu_check_root() and is caused by kvm_is_visible_gfn() searching through memslots with as_id = 0 while vCPU may be in a different context (address space). Introduce kvm_vcpu_is_visible_gfn() and use it from mmu_check_root(). Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200708140023.1476020-1-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
Unlike normal 'int' functions returning '0' on success, kvm_setup_async_pf()/ kvm_arch_setup_async_pf() return '1' when a job to handle page fault asynchronously was scheduled and '0' otherwise. To avoid the confusion change return type to 'bool'. No functional change intended. Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200615121334.91300-1-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 02 7月, 2020 1 次提交
-
-
由 Paolo Bonzini 提交于
Sparse complains on a call to get_compat_sigset, fix it. The "if" right above explains that sigmask_arg->sigset is basically a compat_sigset_t. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 12 6月, 2020 2 次提交
-
-
由 Vitaly Kuznetsov 提交于
'Page not present' event may or may not get injected depending on guest's state. If the event wasn't injected, there is no need to inject the corresponding 'page ready' event as the guest may get confused. E.g. Linux thinks that the corresponding 'page not present' event wasn't delivered *yet* and allocates a 'dummy entry' for it. This entry is never freed. Note, 'wakeup all' events have no corresponding 'page not present' event and always get injected. s390 seems to always be able to inject 'page not present', the change is effectively a nop. Suggested-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200610175532.779793-2-vkuznets@redhat.com> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=208081Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
schedule_work() returns 'false' only when the work is already on the queue and this can't happen as kvm_setup_async_pf() always allocates a new one. Also, to avoid potential race, it makes sense to to schedule_work() at the very end after we've added it to the queue. While on it, do some minor cleanup. gfn_to_pfn_async() mentioned in a comment does not currently exist and, moreover, we can check kvm_is_error_hva() at the very beginning, before we try to allocate work so 'retry_sync' label can go away completely. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200610175532.779793-1-vkuznets@redhat.com> Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 10 6月, 2020 2 次提交
-
-
由 Michel Lespinasse 提交于
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: NLaurent Dufour <ldufour@linux.ibm.com> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Rapoport 提交于
Patch series "mm: consolidate definitions of page table accessors", v2. The low level page table accessors (pXY_index(), pXY_offset()) are duplicated across all architectures and sometimes more than once. For instance, we have 31 definition of pgd_offset() for 25 supported architectures. Most of these definitions are actually identical and typically it boils down to, e.g. static inline unsigned long pmd_index(unsigned long address) { return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); } static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address); } These definitions can be shared among 90% of the arches provided XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined. For architectures that really need a custom version there is always possibility to override the generic version with the usual ifdefs magic. These patches introduce include/linux/pgtable.h that replaces include/asm-generic/pgtable.h and add the definitions of the page table accessors to the new header. This patch (of 12): The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the functions involving page table manipulations, e.g. pte_alloc() and pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h> in the files that include <linux/mm.h>. The include statements in such cases are remove with a simple loop: for f in $(git grep -l "include <linux/mm.h>") ; do sed -i -e '/include <asm\/pgtable.h>/ d' $f done Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 6月, 2020 1 次提交
-
-
由 Souptick Joarder 提交于
API __get_user_pages_fast() renamed to get_user_pages_fast_only() to align with pin_user_pages_fast_only(). As part of this we will get rid of write parameter. Instead caller will pass FOLL_WRITE to get_user_pages_fast_only(). This will not change any existing functionality of the API. All the callers are changed to pass FOLL_WRITE. Also introduce get_user_page_fast_only(), and use it in a few places that hard-code nr_pages to 1. Updated the documentation of the API. Signed-off-by: NSouptick Joarder <jrdr.linux@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> [arch/powerpc/kvm] Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Michal Suchanek <msuchanek@suse.de> Link: http://lkml.kernel.org/r/1590396812-31277-1-git-send-email-jrdr.linux@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 6月, 2020 1 次提交
-
-
由 Eiichi Tsukata 提交于
Commit b1394e74 ("KVM: x86: fix APIC page invalidation") tried to fix inappropriate APIC page invalidation by re-introducing arch specific kvm_arch_mmu_notifier_invalidate_range() and calling it from kvm_mmu_notifier_invalidate_range_start. However, the patch left a possible race where the VMCS APIC address cache is updated *before* it is unmapped: (Invalidator) kvm_mmu_notifier_invalidate_range_start() (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD) (KVM VCPU) vcpu_enter_guest() (KVM VCPU) kvm_vcpu_reload_apic_access_page() (Invalidator) actually unmap page Because of the above race, there can be a mismatch between the host physical address stored in the APIC_ACCESS_PAGE VMCS field and the host physical address stored in the EPT entry for the APIC GPA (0xfee0000). When this happens, the processor will not trap APIC accesses, and will instead show the raw contents of the APIC-access page. Because Windows OS periodically checks for unexpected modifications to the LAPIC register, this will show up as a BSOD crash with BugCheck CRITICAL_STRUCTURE_CORRUPTION (109) we are currently seeing in https://bugzilla.redhat.com/show_bug.cgi?id=1751017. The root cause of the issue is that kvm_arch_mmu_notifier_invalidate_range() cannot guarantee that no additional references are taken to the pages in the range before kvm_mmu_notifier_invalidate_range_end(). Fortunately, this case is supported by the MMU notifier API, as documented in include/linux/mmu_notifier.h: * If the subsystem * can't guarantee that no additional references are taken to * the pages in the range, it has to implement the * invalidate_range() notifier to remove any references taken * after invalidate_range_start(). The fix therefore is to reload the APIC-access page field in the VMCS from kvm_mmu_notifier_invalidate_range() instead of ..._range_start(). Cc: stable@vger.kernel.org Fixes: b1394e74 ("KVM: x86: fix APIC page invalidation") Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=197951Signed-off-by: NEiichi Tsukata <eiichi.tsukata@nutanix.com> Message-Id: <20200606042627.61070-1-eiichi.tsukata@nutanix.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 05 6月, 2020 1 次提交
-
-
由 Denis Efremov 提交于
Replace opencoded alloc and copy with vmemdup_user(). Signed-off-by: NDenis Efremov <efremov@linux.com> Message-Id: <20200603101131.2107303-1-efremov@linux.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 04 6月, 2020 1 次提交
-
-
由 Paolo Bonzini 提交于
After commit 63d04348 ("KVM: x86: move kvm_create_vcpu_debugfs after last failure point") we are creating the pre-vCPU debugfs files after the creation of the vCPU file descriptor. This makes it possible for userspace to reach kvm_vcpu_release before kvm_create_vcpu_debugfs has finished. The vcpu->debugfs_dentry then does not have any associated inode anymore, and this causes a NULL-pointer dereference in debugfs_create_file. The solution is simply to avoid removing the files; they are cleaned up when the VM file descriptor is closed (and that must be after KVM_CREATE_VCPU returns). We can stop storing the dentry in struct kvm_vcpu too, because it is not needed anywhere after kvm_create_vcpu_debugfs returns. Reported-by: syzbot+705f4401d5a93a59b87d@syzkaller.appspotmail.com Fixes: 63d04348 ("KVM: x86: move kvm_create_vcpu_debugfs after last failure point") Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 6月, 2020 6 次提交
-
-
由 Paolo Bonzini 提交于
The userspace_addr alignment and range checks are not performed for private memory slots that are prepared by KVM itself. This is unnecessary and makes it questionable to use __*_user functions to access memory later on. We also rely on the userspace address being aligned since we have an entire family of functions to map gfn to pfn. Fortunately skipping the check is completely unnecessary. Only x86 uses private memslots and their userspace_addr is obtained from vm_mmap, therefore it must be below PAGE_OFFSET. In fact, any attempt to pass an address above PAGE_OFFSET would have failed because such an address would return true for kvm_is_error_hva. Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
If two page ready notifications happen back to back the second one is not delivered and the only mechanism we currently have is kvm_check_async_pf_completion() check in vcpu_run() loop. The check will only be performed with the next vmexit when it happens and in some cases it may take a while. With interrupt based page ready notification delivery the situation is even worse: unlike exceptions, interrupts are not handled immediately so we must check if the slot is empty. This is slow and unnecessary. Introduce dedicated MSR_KVM_ASYNC_PF_ACK MSR to communicate the fact that the slot is free and host should check its notification queue. Mandate using it for interrupt based 'page ready' APF event delivery. As kvm_check_async_pf_completion() is going away from vcpu_run() we need a way to communicate the fact that vcpu->async_pf.done queue has transitioned from empty to non-empty state. Introduce kvm_arch_async_page_present_queued() and KVM_REQ_APF_READY to do the job. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200525144125.143875-7-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
We already have kvm_write_guest_offset_cached(), introduce read analogue. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200525144125.143875-5-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
An innocent reader of the following x86 KVM code: bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu) { if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED)) return true; ... may get very confused: if APF mechanism is not enabled, why do we report that we 'can inject async page present'? In reality, upon injection kvm_arch_async_page_present() will check the same condition again and, in case APF is disabled, will just drop the item. This is fine as the guest which deliberately disabled APF doesn't expect to get any APF notifications. Rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present() to make it clear what we are checking: if the item can be dequeued (meaning either injected or just dropped). On s390 kvm_arch_can_inject_async_page_present() always returns 'true' so the rename doesn't matter much. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200525144125.143875-4-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This reverts commit 5b494aea. If unlocked==true then the vma pointer could be invalidated, so the 2nd follow_pfn() is potentially racy: we do need to get out and redo find_vma_intersection(). Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The userspace_addr alignment and range checks are not performed for private memory slots that are prepared by KVM itself. This is unnecessary and makes it questionable to use __*_user functions to access memory later on. We also rely on the userspace address being aligned since we have an entire family of functions to map gfn to pfn. Fortunately skipping the check is completely unnecessary. Only x86 uses private memslots and their userspace_addr is obtained from vm_mmap, therefore it must be below PAGE_OFFSET. In fact, any attempt to pass an address above PAGE_OFFSET would have failed because such an address would return true for kvm_is_error_hva. Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 29 5月, 2020 1 次提交
-
-
由 Marc Zyngier 提交于
With ARMv8.5-GTG, the hardware (or more likely a hypervisor) can advertise the supported Stage-2 page sizes. Let's check this at boot time. Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: NAlexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NWill Deacon <will@kernel.org>
-
- 16 5月, 2020 5 次提交
-
-
由 Fuad Tabba 提交于
Fix spelling and typos (e.g., repeated words) in comments. Signed-off-by: NFuad Tabba <tabba@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200401140310.29701-1-tabba@google.com
-
由 Marc Zyngier 提交于
Now that the 32bit KVM/arm host is a distant memory, let's move the whole of the KVM/arm64 code into the arm64 tree. As they said in the song: Welcome Home (Sanitarium). Signed-off-by: NMarc Zyngier <maz@kernel.org> Acked-by: NWill Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200513104034.74741-1-maz@kernel.org
-
由 David Matlack 提交于
Two new stats for exposing halt-polling cpu usage: halt_poll_success_ns halt_poll_fail_ns Thus sum of these 2 stats is the total cpu time spent polling. "success" means the VCPU polled until a virtual interrupt was delivered. "fail" means the VCPU had to schedule out (either because the maximum poll time was reached or it needed to yield the CPU). To avoid touching every arch's kvm_vcpu_stat struct, only update and export halt-polling cpu usage stats if we're on x86. Exporting cpu usage as a u64 and in nanoseconds means we will overflow at ~500 years, which seems reasonably large. Signed-off-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NJon Cargille <jcargill@google.com> Reviewed-by: NJim Mattson <jmattson@google.com> Message-Id: <20200508182240.68440-1-jcargill@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
While optimizing posted-interrupt delivery especially for the timer fastpath scenario, I measured kvm_x86_ops.deliver_posted_interrupt() to introduce substantial latency because the processor has to perform all vmentry tasks, ack the posted interrupt notification vector, read the posted-interrupt descriptor etc. This is not only slow, it is also unnecessary when delivering an interrupt to the current CPU (as is the case for the LAPIC timer) because PIR->IRR and IRR->RVI synchronization is already performed on vmentry Therefore skip kvm_vcpu_trigger_posted_interrupt in this case, and instead do vmx_sync_pir_to_irr() on the EXIT_FASTPATH_REENTER_GUEST fastpath as well. Tested-by: NHaiwei Li <lihaiwei@tencent.com> Cc: Haiwei Li <lihaiwei@tencent.com> Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Message-Id: <1588055009-12677-6-git-send-email-wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Xu 提交于
hva_to_pfn_remapped() calls fixup_user_fault(), which has already handled the retry gracefully. Even if "unlocked" is set to true, it means that we've got a VM_FAULT_RETRY inside fixup_user_fault(), however the page fault has already retried and we should have the pfn set correctly. No need to do that again. Signed-off-by: NPeter Xu <peterx@redhat.com> Message-Id: <20200416155906.267462-1-peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 14 5月, 2020 2 次提交
-
-
由 Jason Yan 提交于
The '==' expression itself is bool, no need to convert it to bool again. This fixes the following coccicheck warning: virt/kvm/eventfd.c:724:38-43: WARNING: conversion to bool not needed here Signed-off-by: NJason Yan <yanaijie@huawei.com> Message-Id: <20200420123805.4494-1-yanaijie@huawei.com> Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Davidlohr Bueso 提交于
The use of any sort of waitqueue (simple or regular) for wait/waking vcpus has always been an overkill and semantically wrong. Because this is per-vcpu (which is blocked) there is only ever a single waiting vcpu, thus no need for any sort of queue. As such, make use of the rcuwait primitive, with the following considerations: - rcuwait already provides the proper barriers that serialize concurrent waiter and waker. - Task wakeup is done in rcu read critical region, with a stable task pointer. - Because there is no concurrency among waiters, we need not worry about rcuwait_wait_event() calls corrupting the wait->task. As a consequence, this saves the locking done in swait when modifying the queue. This also applies to per-vcore wait for powerpc kvm-hv. The x86 tscdeadline_latency test mentioned in 8577370f ("KVM: Use simple waitqueue for vcpu->wq") shows that, on avg, latency is reduced by around 15-20% with this change. Cc: Paul Mackerras <paulus@ozlabs.org> Cc: kvmarm@lists.cs.columbia.edu Cc: linux-mips@vger.kernel.org Reviewed-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Message-Id: <20200424054837.5138-6-dave@stgolabs.net> [Avoid extra logic changes. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 08 5月, 2020 1 次提交
-
-
由 Suravee Suthikulpanit 提交于
This allows making request to all other vcpus except the one specified in the parameter. Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <1588771076-73790-2-git-send-email-suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 5月, 2020 1 次提交
-
-
由 Marc Zyngier 提交于
In the unlikely event that a 32bit vcpu traps into the hypervisor on an instruction that is located right at the end of the 32bit range, the emulation of that instruction is going to increment PC past the 32bit range. This isn't great, as userspace can then observe this value and get a bit confused. Conversly, userspace can do things like (in the context of a 64bit guest that is capable of 32bit EL0) setting PSTATE to AArch64-EL0, set PC to a 64bit value, change PSTATE to AArch32-USR, and observe that PC hasn't been truncated. More confusion. Fix both by: - truncating PC increments for 32bit guests - sanitizing all 32bit regs every time a core reg is changed by userspace, and that PSTATE indicates a 32bit mode. Cc: stable@vger.kernel.org Acked-by: NWill Deacon <will@kernel.org> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-