- 14 1月, 2022 1 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.84 commit 6a44f200f1f15d041dc381dd13dd840ac36af863 bugzilla: 186030 https://gitee.com/openeuler/kernel/issues/I4QV2F Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6a44f200f1f15d041dc381dd13dd840ac36af863 -------------------------------- commit 6b285a55 upstream. Reject userspace memslots whose size exceeds the storage capacity of an "unsigned long". KVM's uAPI takes the size as u64 to support large slots on 64-bit hosts, but does not account for the size being truncated on 32-bit hosts in various flows. The access_ok() check on the userspace virtual address in particular casts the size to "unsigned long" and will check the wrong number of bytes. KVM doesn't actually support slots whose size doesn't fit in an "unsigned long", e.g. KVM's internal kvm_memory_slot.npages is an "unsigned long", not a "u64", and misc arch specific code follows that behavior. Fixes: fa3d315a ("KVM: Validate userspace_addr of memslot when registered") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Reviewed-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <20211104002531.1176691-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 15 11月, 2021 1 次提交
-
-
由 Sergey Senozhatsky 提交于
stable inclusion from stable-5.10.72 commit 6d0ff920599960a22ea520afda99eed3096c0e12 bugzilla: 182982 https://gitee.com/openeuler/kernel/issues/I4I3L1 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6d0ff920599960a22ea520afda99eed3096c0e12 -------------------------------- [ Upstream commit ae232ea4 ] grow_halt_poll_ns() ignores values between 0 and halt_poll_ns_grow_start (10000 by default). However, when we shrink halt_poll_ns we may fall way below halt_poll_ns_grow_start and endup with halt_poll_ns values that don't make a lot of sense: like 1 or 9, or 19. VCPU1 trace (halt_poll_ns_shrink equals 2): VCPU1 grow 10000 VCPU1 shrink 5000 VCPU1 shrink 2500 VCPU1 shrink 1250 VCPU1 shrink 625 VCPU1 shrink 312 VCPU1 shrink 156 VCPU1 shrink 78 VCPU1 shrink 39 VCPU1 shrink 19 VCPU1 shrink 9 VCPU1 shrink 4 Mirror what grow_halt_poll_ns() does and set halt_poll_ns to 0 as soon as new shrink-ed halt_poll_ns value falls below halt_poll_ns_grow_start. Signed-off-by: NSergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20210902031100.252080-1-senozhatsky@chromium.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 15 10月, 2021 2 次提交
-
-
由 Paolo Bonzini 提交于
stable inclusion from stable-5.10.58 commit 32f55c25ee292dbd1d57a7fcfe2d435fa41c19f5 bugzilla: 176984 https://gitee.com/openeuler/kernel/issues/I4E2P4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=32f55c25ee292dbd1d57a7fcfe2d435fa41c19f5 -------------------------------- commit 85cd39af upstream. KVM creates a debugfs directory for each VM in order to store statistics about the virtual machine. The directory name is built from the process pid and a VM fd. While generally unique, it is possible to keep a file descriptor alive in a way that causes duplicate directories, which manifests as these messages: [ 471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present! Even though this should not happen in practice, it is more or less expected in the case of KVM for testcases that call KVM_CREATE_VM and close the resulting file descriptor repeatedly and in parallel. When this happens, debugfs_create_dir() returns an error but kvm_create_vm_debugfs() goes on to allocate stat data structs which are later leaked. The slow memory leak was spotted by syzkaller, where it caused OOM reports. Since the issue only affects debugfs, do a lookup before calling debugfs_create_dir, so that the message is downgraded and rate-limited. While at it, ensure kvm->debugfs_dentry is NULL rather than an error if it is not created. This fixes kvm_destroy_vm_debugfs, which was not checking IS_ERR_OR_NULL correctly. Cc: stable@vger.kernel.org Fixes: 536a6f88 ("KVM: Create debugfs dir and stat files for each VM") Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Suggested-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Paolo Bonzini 提交于
stable inclusion from stable-5.10.56 commit 52acb6c147b30aef1e178249620c81b41da4ae8b bugzilla: 176004 https://gitee.com/openeuler/kernel/issues/I4DYZ4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=52acb6c147b30aef1e178249620c81b41da4ae8b -------------------------------- commit 8750f9bb upstream. The arguments to the KVM_CLEAR_DIRTY_LOG ioctl include a pointer, therefore it needs a compat ioctl implementation. Otherwise, 32-bit userspace fails to invoke it on 64-bit kernels; for x86 it might work fine by chance if the padding is zero, but not on big-endian architectures. Reported-by: Thomas Sattler Cc: stable@vger.kernel.org Fixes: 2a31b9db ("kvm: introduce manual dirty log reprotect") Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 12 10月, 2021 1 次提交
-
-
由 Nicholas Piggin 提交于
stable inclusion from stable-5.10.47 commit dd8ed6c9bc2224c1ace5292d01089d3feb7ebbc3 bugzilla: 172973 https://gitee.com/openeuler/kernel/issues/I4DAKB Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=dd8ed6c9bc2224c1ace5292d01089d3feb7ebbc3 -------------------------------- commit f8be156b upstream. It's possible to create a region which maps valid but non-refcounted pages (e.g., tail pages of non-compound higher order allocations). These host pages can then be returned by gfn_to_page, gfn_to_pfn, etc., family of APIs, which take a reference to the page, which takes it from 0 to 1. When the reference is dropped, this will free the page incorrectly. Fix this by only taking a reference on valid pages if it was non-zero, which indicates it is participating in normal refcounting (and can be released with put_page). This addresses CVE-2021-22543. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Tested-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 03 6月, 2021 4 次提交
-
-
由 Benjamin Segall 提交于
stable inclusion from stable-5.10.38 commit ce7639252357be0e1f93a77abd6d81f39c800a91 bugzilla: 51875 CVE: NA -------------------------------- commit 262de410 upstream. single_task_running() is usually more general than need_resched() but CFS_BANDWIDTH throttling will use resched_task() when there is just one task to get the task to block. This was causing long-need_resched warnings and was likely allowing VMs to overrun their quota when halt polling. Signed-off-by: NBen Segall <bsegall@google.com> Signed-off-by: NVenkatesh Srinivas <venkateshs@chromium.org> Message-Id: <20210429162233.116849-1-venkateshs@chromium.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 David Matlack 提交于
stable inclusion from stable-5.10.38 commit 21756f878e827784213df136e678fed0ce9f0e30 bugzilla: 51875 CVE: NA -------------------------------- commit 258785ef upstream. When growing halt-polling, there is no check that the poll time exceeds the per-VM limit. It's possible for vcpu->halt_poll_ns to grow past kvm->max_halt_poll_ns and stay there until a halt which takes longer than kvm->halt_poll_ns. Signed-off-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NVenkatesh Srinivas <venkateshs@chromium.org> Message-Id: <20210506152442.4010298-1-venkateshs@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Sean Christopherson 提交于
stable inclusion from stable-5.10.37 commit 2a20592baff59c5351c5200ec667e1a2aa22af85 bugzilla: 51868 CVE: NA -------------------------------- commit 5d3c4c79 upstream. Abort the walk of coalesced MMIO zones if kvm_io_bus_unregister_dev() fails to allocate memory for the new instance of the bus. If it can't instantiate a new bus, unregister_dev() destroys all devices _except_ the target device. But, it doesn't tell the caller that it obliterated the bus and invoked the destructor for all devices that were on the bus. In the coalesced MMIO case, this can result in a deleted list entry dereference due to attempting to continue iterating on coalesced_zones after future entries (in the walk) have been deleted. Opportunistically add curly braces to the for-loop, which encompasses many lines but sneaks by without braces due to the guts being a single if statement. Fixes: f6588660 ("KVM: fix memory leak in kvm_io_bus_unregister_dev()") Cc: stable@vger.kernel.org Reported-by: NHao Sun <sunhao.th@gmail.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210412222050.876100-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Sean Christopherson 提交于
stable inclusion from stable-5.10.37 commit 03c6cccedd3913006744faa252a4da5145299343 bugzilla: 51868 CVE: NA -------------------------------- commit 2ee37574 upstream. If allocating a new instance of an I/O bus fails when unregistering a device, wait to destroy the device until after all readers are guaranteed to see the new null bus. Destroying devices before the bus is nullified could lead to use-after-free since readers expect the devices on their reference of the bus to remain valid. Fixes: f6588660 ("KVM: fix memory leak in kvm_io_bus_unregister_dev()") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210412222050.876100-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 14 4月, 2021 1 次提交
-
-
由 liubo 提交于
euleros inclusion category: feature feature: add memig scan feature patch to openEuler kernel bugzilla: 48246 ------------------------------------------------- reason:This patch is used to add memig scan feature to openEuler system. memig_scan.ko is used to scan the virtual address of the target process and return the address access information to the user mode for grading cold and hot pages. Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: Nyanxiaodan <yanxiaodan@huawei.com> Signed-off-by: NFeilong Lin <linfeilong@huawei.com> Signed-off-by: Ngeruijun <geruijun@huawei.com> Signed-off-by: Nliubo <liubo254@huawei.com> Reviewed-by: NJing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 09 4月, 2021 3 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-5.10.19 commit 3320aa64c322512ccc17406c0d60681177a8cac5 bugzilla: 50607 -------------------------------- commit a9545779 upstream. Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving a so called "remapped" hva/pfn pair. In theory, the hva could resolve to a pfn in high memory on a 32-bit kernel. This bug was inadvertantly exposed by commit bd2fae8d ("KVM: do not assume PTE is writable after follow_pfn"), which added an error PFN value to the mix, causing gcc to comlain about overflowing the unsigned long. arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function ‘hva_to_pfn_remapped’: include/linux/kvm_host.h:89:30: error: conversion from ‘long long unsigned int’ to ‘long unsigned int’ changes value from ‘9218868437227405314’ to ‘2’ [-Werror=overflow] 89 | #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2) | ^ virt/kvm/kvm_main.c:1935:9: note: in expansion of macro ‘KVM_PFN_ERR_RO_FAULT’ Cc: stable@vger.kernel.org Fixes: add6a0cd ("KVM: MMU: try to fix up page faults before giving up") Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210208201940.1258328-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Paolo Bonzini 提交于
stable inclusion from stable-5.10.19 commit a42150f1c965d23ea858c1931f53b591d9e817d4 bugzilla: 50607 -------------------------------- commit 9fd6dad1 upstream. Currently, the follow_pfn function is exported for modules but follow_pte is not. However, follow_pfn is very easy to misuse, because it does not provide protections (so most of its callers assume the page is writable!) and because it returns after having already unlocked the page table lock. Provide instead a simplified version of follow_pte that does not have the pmdpp and range arguments. The older version survives as follow_invalidate_pte() for use by fs/dax.c. Reviewed-by: NJason Gunthorpe <jgg@nvidia.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Paolo Bonzini 提交于
stable inclusion from stable-5.10.19 commit 83d42c25868df563097dd8c17ed15add81dd402c bugzilla: 50607 -------------------------------- commit bd2fae8d upstream. In order to convert an HVA to a PFN, KVM usually tries to use the get_user_pages family of functinso. This however is not possible for VM_IO vmas; in that case, KVM instead uses follow_pfn. In doing this however KVM loses the information on whether the PFN is writable. That is usually not a problem because the main use of VM_IO vmas with KVM is for BARs in PCI device assignment, however it is a bug. To fix it, use follow_pte and check pte_write while under the protection of the PTE lock. The information can be used to fail hva_to_pfn_remapped or passed back to the caller via *writable. Usage of follow_pfn was introduced in commit add6a0cd ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05); however, even older version have the same issue, all the way back to commit 2e2e3738 ("KVM: Handle vma regions with no backing page", 2008-07-20), as they also did not check whether the PFN was writable. Fixes: 2e2e3738 ("KVM: Handle vma regions with no backing page") Reported-by: NDavid Stevens <stevensd@google.com> Cc: 3pvd@google.com Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 09 2月, 2021 1 次提交
-
-
由 Marc Zyngier 提交于
stable inclusion from stable-5.10.13 commit 256a0040c6c9f6d342044897e33f280426a4e315 bugzilla: 47995 -------------------------------- commit 139bc8a6 upstream. The use of a tagged address could be pretty confusing for the whole memslot infrastructure as well as the MMU notifiers. Forbid it altogether, as it never quite worked the first place. Cc: stable@vger.kernel.org Reported-by: NRick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 27 1月, 2021 1 次提交
-
-
由 Lai Jiangshan 提交于
stable inclusion from stable-5.10.7 commit ffee6772c489d8d65d86979d4ccc4286624124b2 bugzilla: 47429 -------------------------------- commit 88bf56d0 upstream. In kvm_mmu_notifier_invalidate_range_start(), tlbs_dirty is used as: need_tlb_flush |= kvm->tlbs_dirty; with need_tlb_flush's type being int and tlbs_dirty's type being long. It means that tlbs_dirty is always used as int and the higher 32 bits is useless. We need to check tlbs_dirty in a correct way and this change checks it directly without propagating it to need_tlb_flush. Note: it's _extremely_ unlikely this neglecting of higher 32 bits can cause problems in practice. It would require encountering tlbs_dirty on a 4 billion count boundary, and KVM would need to be using shadow paging or be running a nested guest. Cc: stable@vger.kernel.org Fixes: a4ee1ca4 ("KVM: MMU: delay flush all tlbs on sync_page path") Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20201217154118.16497-1-jiangshanlai@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 12 1月, 2021 3 次提交
-
-
由 chenjiajun 提交于
virt inclusion category: feature bugzilla: 46853 CVE: NA Export vcpu_stat via debugfs for x86, which contains x86 kvm exits items. The path of the vcpu_stat is /sys/kernel/debug/kvm/vcpu_stat, and each line of vcpu_stat is a collection of various kvm exits for a vcpu. And through vcpu_stat, we only need to open one file to tail performance of virtual machine, which is more convenient. Signed-off-by: NFeng Lin <linfeng23@huawei.com> Signed-off-by: Nchenjiajun <chenjiajun8@huawei.com> Reviewed-by: NXiangyou Xie <xiexiangyou@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com>
-
由 chenjiajun 提交于
virt inclusion category: feature bugzilla: 46853 CVE: NA This patch export cpu time related items to vcpu_stat. Contain: steal, st_max, utime, stime, gtime The definitions of these items are: steal: cpu time VCPU waits for PCPU while it is servicing another VCPU st_max: max scheduling delay utime: cpu time in userspace stime: cpu time in sys gtime: cpu time in guest Through these items, user can get many cpu usage info of vcpu, such as: CPU Usage of Guest = gtime_delta / delta_cputime CPU Usage of Hyp = (utime_delta - gtime_delta + stime_delta) / delta_cputime CPU Usage of Steal = steal_delta / delta_cputime Max Scheduling Delay = st_max Signed-off-by: Nliangpeng <liangpeng10@huawei.com> Signed-off-by: Nchenjiajun <chenjiajun8@huawei.com> Reviewed-by: NXiangyou Xie <xiexiangyou@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com>
-
由 chenjiajun 提交于
virt inclusion category: feature bugzilla: 46853 CVE: NA This patch create debugfs entry for vcpu stat. The entry path is /sys/kernel/debug/kvm/vcpu_stat. And vcpu_stat contains partial kvm exits items of vcpu, include: pid, hvc_exit_stat, wfe_exit_stat, wfi_exit_stat, mmio_exit_user, mmio_exit_kernel, exits Currently, The maximum vcpu limit is 1024. From this vcpu_stat, user can get the number of these kvm exits items over a period of time, which is helpful to monitor the virtual machine. Signed-off-by: NZenghui Yu <yuzenghui@huawei.com> Signed-off-by: Nchenjiajun <chenjiajun8@huawei.com> Reviewed-by: NXiangyou Xie <xiexiangyou@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com>
-
- 23 10月, 2020 1 次提交
-
-
由 Ben Gardon 提交于
Dirty logging is a key feature of the KVM MMU and must be supported by the TDP MMU. Add support for both the write protection and PML dirty logging modes. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20201014182700.2888246-16-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 10月, 2020 1 次提交
-
-
由 Peter Xu 提交于
Cache the address space ID just like the slot ID. It will be used in order to fill in the dirty ring entries. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPeter Xu <peterx@redhat.com> Message-Id: <20201014182700.2888246-7-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 28 9月, 2020 1 次提交
-
-
由 Rustam Kovhaev 提交于
Make use of the struct_size() helper to avoid any potential type mistakes and protect against potential integer overflows Make use of the flex_array_size() helper to calculate the size of a flexible array member within an enclosing structure Suggested-by: NGustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: NRustam Kovhaev <rkovhaev@gmail.com> Message-Id: <20200918120500.954436-1-rkovhaev@gmail.com> Reviewed-by: NGustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 12 9月, 2020 1 次提交
-
-
由 Rustam Kovhaev 提交于
when kmalloc() fails in kvm_io_bus_unregister_dev(), before removing the bus, we should iterate over all other devices linked to it and call kvm_iodevice_destructor() for them Fixes: 90db1043 ("KVM: kvm_io_bus_unregister_dev() should never fail") Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+f196caa45793d6374707@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=f196caa45793d6374707Signed-off-by: NRustam Kovhaev <rkovhaev@gmail.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200907185535.233114-1-rkovhaev@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 8月, 2020 1 次提交
-
-
由 Will Deacon 提交于
The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block. Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: NWill Deacon <will@kernel.org> Message-Id: <20200811102725.7121-2-will@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 8月, 2020 1 次提交
-
-
由 Peter Xu 提交于
After the cleanup of page fault accounting, gup does not need to pass task_struct around any more. Remove that parameter in the whole gup stack. Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com> Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 7月, 2020 1 次提交
-
-
由 Sean Christopherson 提交于
Move x86's memory cache helpers to common KVM code so that they can be reused by arm64 and MIPS in future patches. Suggested-by: NChristoffer Dall <christoffer.dall@arm.com> Reviewed-by: NBen Gardon <bgardon@google.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-16-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 7月, 2020 1 次提交
-
-
由 Vitaly Kuznetsov 提交于
OVMF booted guest running on shadow pages crashes on TRIPLE FAULT after enabling paging from SMM. The crash is triggered from mmu_check_root() and is caused by kvm_is_visible_gfn() searching through memslots with as_id = 0 while vCPU may be in a different context (address space). Introduce kvm_vcpu_is_visible_gfn() and use it from mmu_check_root(). Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200708140023.1476020-1-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 02 7月, 2020 1 次提交
-
-
由 Paolo Bonzini 提交于
Sparse complains on a call to get_compat_sigset, fix it. The "if" right above explains that sigmask_arg->sigset is basically a compat_sigset_t. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 10 6月, 2020 2 次提交
-
-
由 Michel Lespinasse 提交于
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: NLaurent Dufour <ldufour@linux.ibm.com> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Rapoport 提交于
Patch series "mm: consolidate definitions of page table accessors", v2. The low level page table accessors (pXY_index(), pXY_offset()) are duplicated across all architectures and sometimes more than once. For instance, we have 31 definition of pgd_offset() for 25 supported architectures. Most of these definitions are actually identical and typically it boils down to, e.g. static inline unsigned long pmd_index(unsigned long address) { return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); } static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address); } These definitions can be shared among 90% of the arches provided XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined. For architectures that really need a custom version there is always possibility to override the generic version with the usual ifdefs magic. These patches introduce include/linux/pgtable.h that replaces include/asm-generic/pgtable.h and add the definitions of the page table accessors to the new header. This patch (of 12): The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the functions involving page table manipulations, e.g. pte_alloc() and pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h> in the files that include <linux/mm.h>. The include statements in such cases are remove with a simple loop: for f in $(git grep -l "include <linux/mm.h>") ; do sed -i -e '/include <asm\/pgtable.h>/ d' $f done Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 6月, 2020 1 次提交
-
-
由 Souptick Joarder 提交于
API __get_user_pages_fast() renamed to get_user_pages_fast_only() to align with pin_user_pages_fast_only(). As part of this we will get rid of write parameter. Instead caller will pass FOLL_WRITE to get_user_pages_fast_only(). This will not change any existing functionality of the API. All the callers are changed to pass FOLL_WRITE. Also introduce get_user_page_fast_only(), and use it in a few places that hard-code nr_pages to 1. Updated the documentation of the API. Signed-off-by: NSouptick Joarder <jrdr.linux@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> [arch/powerpc/kvm] Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Michal Suchanek <msuchanek@suse.de> Link: http://lkml.kernel.org/r/1590396812-31277-1-git-send-email-jrdr.linux@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 6月, 2020 1 次提交
-
-
由 Eiichi Tsukata 提交于
Commit b1394e74 ("KVM: x86: fix APIC page invalidation") tried to fix inappropriate APIC page invalidation by re-introducing arch specific kvm_arch_mmu_notifier_invalidate_range() and calling it from kvm_mmu_notifier_invalidate_range_start. However, the patch left a possible race where the VMCS APIC address cache is updated *before* it is unmapped: (Invalidator) kvm_mmu_notifier_invalidate_range_start() (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD) (KVM VCPU) vcpu_enter_guest() (KVM VCPU) kvm_vcpu_reload_apic_access_page() (Invalidator) actually unmap page Because of the above race, there can be a mismatch between the host physical address stored in the APIC_ACCESS_PAGE VMCS field and the host physical address stored in the EPT entry for the APIC GPA (0xfee0000). When this happens, the processor will not trap APIC accesses, and will instead show the raw contents of the APIC-access page. Because Windows OS periodically checks for unexpected modifications to the LAPIC register, this will show up as a BSOD crash with BugCheck CRITICAL_STRUCTURE_CORRUPTION (109) we are currently seeing in https://bugzilla.redhat.com/show_bug.cgi?id=1751017. The root cause of the issue is that kvm_arch_mmu_notifier_invalidate_range() cannot guarantee that no additional references are taken to the pages in the range before kvm_mmu_notifier_invalidate_range_end(). Fortunately, this case is supported by the MMU notifier API, as documented in include/linux/mmu_notifier.h: * If the subsystem * can't guarantee that no additional references are taken to * the pages in the range, it has to implement the * invalidate_range() notifier to remove any references taken * after invalidate_range_start(). The fix therefore is to reload the APIC-access page field in the VMCS from kvm_mmu_notifier_invalidate_range() instead of ..._range_start(). Cc: stable@vger.kernel.org Fixes: b1394e74 ("KVM: x86: fix APIC page invalidation") Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=197951Signed-off-by: NEiichi Tsukata <eiichi.tsukata@nutanix.com> Message-Id: <20200606042627.61070-1-eiichi.tsukata@nutanix.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 05 6月, 2020 1 次提交
-
-
由 Denis Efremov 提交于
Replace opencoded alloc and copy with vmemdup_user(). Signed-off-by: NDenis Efremov <efremov@linux.com> Message-Id: <20200603101131.2107303-1-efremov@linux.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 04 6月, 2020 1 次提交
-
-
由 Paolo Bonzini 提交于
After commit 63d04348 ("KVM: x86: move kvm_create_vcpu_debugfs after last failure point") we are creating the pre-vCPU debugfs files after the creation of the vCPU file descriptor. This makes it possible for userspace to reach kvm_vcpu_release before kvm_create_vcpu_debugfs has finished. The vcpu->debugfs_dentry then does not have any associated inode anymore, and this causes a NULL-pointer dereference in debugfs_create_file. The solution is simply to avoid removing the files; they are cleaned up when the VM file descriptor is closed (and that must be after KVM_CREATE_VCPU returns). We can stop storing the dentry in struct kvm_vcpu too, because it is not needed anywhere after kvm_create_vcpu_debugfs returns. Reported-by: syzbot+705f4401d5a93a59b87d@syzkaller.appspotmail.com Fixes: 63d04348 ("KVM: x86: move kvm_create_vcpu_debugfs after last failure point") Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 6月, 2020 4 次提交
-
-
由 Paolo Bonzini 提交于
The userspace_addr alignment and range checks are not performed for private memory slots that are prepared by KVM itself. This is unnecessary and makes it questionable to use __*_user functions to access memory later on. We also rely on the userspace address being aligned since we have an entire family of functions to map gfn to pfn. Fortunately skipping the check is completely unnecessary. Only x86 uses private memslots and their userspace_addr is obtained from vm_mmap, therefore it must be below PAGE_OFFSET. In fact, any attempt to pass an address above PAGE_OFFSET would have failed because such an address would return true for kvm_is_error_hva. Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
We already have kvm_write_guest_offset_cached(), introduce read analogue. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200525144125.143875-5-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This reverts commit 5b494aea. If unlocked==true then the vma pointer could be invalidated, so the 2nd follow_pfn() is potentially racy: we do need to get out and redo find_vma_intersection(). Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The userspace_addr alignment and range checks are not performed for private memory slots that are prepared by KVM itself. This is unnecessary and makes it questionable to use __*_user functions to access memory later on. We also rely on the userspace address being aligned since we have an entire family of functions to map gfn to pfn. Fortunately skipping the check is completely unnecessary. Only x86 uses private memslots and their userspace_addr is obtained from vm_mmap, therefore it must be below PAGE_OFFSET. In fact, any attempt to pass an address above PAGE_OFFSET would have failed because such an address would return true for kvm_is_error_hva. Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 16 5月, 2020 3 次提交
-
-
由 Fuad Tabba 提交于
Fix spelling and typos (e.g., repeated words) in comments. Signed-off-by: NFuad Tabba <tabba@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200401140310.29701-1-tabba@google.com
-
由 David Matlack 提交于
Two new stats for exposing halt-polling cpu usage: halt_poll_success_ns halt_poll_fail_ns Thus sum of these 2 stats is the total cpu time spent polling. "success" means the VCPU polled until a virtual interrupt was delivered. "fail" means the VCPU had to schedule out (either because the maximum poll time was reached or it needed to yield the CPU). To avoid touching every arch's kvm_vcpu_stat struct, only update and export halt-polling cpu usage stats if we're on x86. Exporting cpu usage as a u64 and in nanoseconds means we will overflow at ~500 years, which seems reasonably large. Signed-off-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NJon Cargille <jcargill@google.com> Reviewed-by: NJim Mattson <jmattson@google.com> Message-Id: <20200508182240.68440-1-jcargill@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
While optimizing posted-interrupt delivery especially for the timer fastpath scenario, I measured kvm_x86_ops.deliver_posted_interrupt() to introduce substantial latency because the processor has to perform all vmentry tasks, ack the posted interrupt notification vector, read the posted-interrupt descriptor etc. This is not only slow, it is also unnecessary when delivering an interrupt to the current CPU (as is the case for the LAPIC timer) because PIR->IRR and IRR->RVI synchronization is already performed on vmentry Therefore skip kvm_vcpu_trigger_posted_interrupt in this case, and instead do vmx_sync_pir_to_irr() on the EXIT_FASTPATH_REENTER_GUEST fastpath as well. Tested-by: NHaiwei Li <lihaiwei@tencent.com> Cc: Haiwei Li <lihaiwei@tencent.com> Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Message-Id: <1588055009-12677-6-git-send-email-wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-