- 12 2月, 2020 1 次提交
-
-
由 Marc Zyngier 提交于
Accessing a per-cpu variable only makes sense when preemption is disabled (and the kernel does check this when the right debug options are switched on). For kvm_get_running_vcpu(), it is fine to return the value after re-enabling preemption, as the preempt notifiers will make sure that this is kept consistent across task migration (the comment above the function hints at it, but lacks the crucial preemption management). While we're at it, move the comment from the ARM code, which explains why the whole thing works. Fixes: 7495e22b ("KVM: Move running VCPU from ARM to common code"). Cc: Paolo Bonzini <pbonzini@redhat.com> Reported-by: NZenghui Yu <yuzenghui@huawei.com> Tested-by: NZenghui Yu <yuzenghui@huawei.com> Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/318984f6-bc36-33a3-abc6-bf2295974b06@huawei.com Message-id: <20200207163410.31276-1-maz@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 05 2月, 2020 1 次提交
-
-
由 Zhuang Yanying 提交于
We are testing Virtual Machine with KSM on v5.4-rc2 kernel, and found the zero_page refcount overflow. The cause of refcount overflow is increased in try_async_pf (get_user_page) without being decreased in mmu_set_spte() while handling ept violation. In kvm_release_pfn_clean(), only unreserved page will call put_page. However, zero page is reserved. So, as well as creating and destroy vm, the refcount of zero page will continue to increase until it overflows. step1: echo 10000 > /sys/kernel/pages_to_scan/pages_to_scan echo 1 > /sys/kernel/pages_to_scan/run echo 1 > /sys/kernel/pages_to_scan/use_zero_pages step2: just create several normal qemu kvm vms. And destroy it after 10s. Repeat this action all the time. After a long period of time, all domains hang because of the refcount of zero page overflow. Qemu print error log as follow: … error: kvm run failed Bad address EAX=00006cdc EBX=00000008 ECX=80202001 EDX=078bfbfd ESI=ffffffff EDI=00000000 EBP=00000008 ESP=00006cc4 EIP=000efd75 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA] SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] DS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] FS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] GS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy GDT= 000f7070 00000037 IDT= 000f70ae 00000000 CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 EFER=0000000000000000 Code=00 01 00 00 00 e9 e8 00 00 00 c7 05 4c 55 0f 00 01 00 00 00 <8b> 35 00 00 01 00 8b 3d 04 00 01 00 b8 d8 d3 00 00 c1 e0 08 0c ea a3 00 00 01 00 c7 05 04 … Meanwhile, a kernel warning is departed. [40914.836375] WARNING: CPU: 3 PID: 82067 at ./include/linux/mm.h:987 try_get_page+0x1f/0x30 [40914.836412] CPU: 3 PID: 82067 Comm: CPU 0/KVM Kdump: loaded Tainted: G OE 5.2.0-rc2 #5 [40914.836415] RIP: 0010:try_get_page+0x1f/0x30 [40914.836417] Code: 40 00 c3 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8 01 75 11 8b 47 34 85 c0 7e 10 f0 ff 47 34 b8 01 00 00 00 c3 48 8d 78 ff eb e9 <0f> 0b 31 c0 c3 66 90 66 2e 0f 1f 84 00 0 0 00 00 00 48 8b 47 08 a8 [40914.836418] RSP: 0018:ffffb4144e523988 EFLAGS: 00010286 [40914.836419] RAX: 0000000080000000 RBX: 0000000000000326 RCX: 0000000000000000 [40914.836420] RDX: 0000000000000000 RSI: 00004ffdeba10000 RDI: ffffdf07093f6440 [40914.836421] RBP: ffffdf07093f6440 R08: 800000424fd91225 R09: 0000000000000000 [40914.836421] R10: ffff9eb41bfeebb8 R11: 0000000000000000 R12: ffffdf06bbd1e8a8 [40914.836422] R13: 0000000000000080 R14: 800000424fd91225 R15: ffffdf07093f6440 [40914.836423] FS: 00007fb60ffff700(0000) GS:ffff9eb4802c0000(0000) knlGS:0000000000000000 [40914.836425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [40914.836426] CR2: 0000000000000000 CR3: 0000002f220e6002 CR4: 00000000003626e0 [40914.836427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [40914.836427] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [40914.836428] Call Trace: [40914.836433] follow_page_pte+0x302/0x47b [40914.836437] __get_user_pages+0xf1/0x7d0 [40914.836441] ? irq_work_queue+0x9/0x70 [40914.836443] get_user_pages_unlocked+0x13f/0x1e0 [40914.836469] __gfn_to_pfn_memslot+0x10e/0x400 [kvm] [40914.836486] try_async_pf+0x87/0x240 [kvm] [40914.836503] tdp_page_fault+0x139/0x270 [kvm] [40914.836523] kvm_mmu_page_fault+0x76/0x5e0 [kvm] [40914.836588] vcpu_enter_guest+0xb45/0x1570 [kvm] [40914.836632] kvm_arch_vcpu_ioctl_run+0x35d/0x580 [kvm] [40914.836645] kvm_vcpu_ioctl+0x26e/0x5d0 [kvm] [40914.836650] do_vfs_ioctl+0xa9/0x620 [40914.836653] ksys_ioctl+0x60/0x90 [40914.836654] __x64_sys_ioctl+0x16/0x20 [40914.836658] do_syscall_64+0x5b/0x180 [40914.836664] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [40914.836666] RIP: 0033:0x7fb61cb6bfc7 Signed-off-by: NLinFeng <linfeng23@huawei.com> Signed-off-by: NZhuang Yanying <ann.zhuangyanying@huawei.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 31 1月, 2020 2 次提交
-
-
由 Boris Ostrovsky 提交于
__kvm_map_gfn()'s call to gfn_to_pfn_memslot() is * relatively expensive * in certain cases (such as when done from atomic context) cannot be called Stashing gfn-to-pfn mapping should help with both cases. This is part of CVE-2019-3016. Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: NJoao Martins <joao.m.martins@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Boris Ostrovsky 提交于
kvm_vcpu_(un)map operates on gfns from any current address space. In certain cases we want to make sure we are not mapping SMRAM and for that we can use kvm_(un)map_gfn() that we are introducing in this patch. This is part of CVE-2019-3016. Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: NJoao Martins <joao.m.martins@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 28 1月, 2020 16 次提交
-
-
由 Sean Christopherson 提交于
Avoid the "writable" check in __gfn_to_hva_many(), which will always fail on read-only memslots due to gfn_to_hva() assuming writes. Functionally, this allows x86 to create large mappings for read-only memslots that are backed by HugeTLB mappings. Note, the changelog for commit 05da4558 ("KVM: MMU: large page support") states "If the largepage contains write-protected pages, a large pte is not used.", but "write-protected" refers to pages that are temporarily read-only, e.g. read-only memslots didn't even exist at the time. Fixes: 4d8b81ab ("KVM: introduce readonly memslot") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> [Redone using kvm_vcpu_gfn_to_memslot_prot. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Use kvm_vcpu_gfn_to_hva() when retrieving the host page size so that the correct set of memslots is used when handling x86 page faults in SMM. Fixes: 54bf36aa ("KVM: x86: use vcpu-specific functions to read/write/translate GFNs") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add a helper, is_transparent_hugepage(), to explicitly check whether a compound page is a THP and use it when populating KVM's secondary MMU. The explicit check fixes a bug where a remapped compound page, e.g. for an XDP Rx socket, is mapped into a KVM guest and is mistaken for a THP, which results in KVM incorrectly creating a huge page in its secondary MMU. Fixes: 936a5fe6 ("thp: kvm mmu transparent hugepage support") Reported-by: syzbot+c9d1fb51ac9d0d10c39d@syzkaller.appspotmail.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Check the result of __kvm_gfn_to_hva_cache_init() and return immediately instead of relying on the kvm_is_error_hva() check to detect errors so that it's abundantly clear KVM intends to immediately bail on an error. Note, the hva check is still mandatory to handle errors on subqeuesnt calls with the same generation. Similarly, always return -EFAULT on error so that multiple (bad) calls for a given generation will get the same result, e.g. on an illegal gfn wrap, propagating the return from __kvm_gfn_to_hva_cache_init() would cause the initial call to return -EINVAL and subsequent calls to return -EFAULT. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Barret reported a (technically benign) bug where nr_pages_avail can be accessed without being initialized if gfn_to_hva_many() fails. virt/kvm/kvm_main.c:2193:13: warning: 'nr_pages_avail' may be used uninitialized in this function [-Wmaybe-uninitialized] Rather than simply squashing the warning by initializing nr_pages_avail, fix the underlying issues by reworking __kvm_gfn_to_hva_cache_init() to return immediately instead of continuing on. Now that all callers check the result and/or bail immediately on a bad hva, there's no need to explicitly nullify the memslot on error. Reported-by: NBarret Rhoden <brho@google.com> Fixes: f1b9dd5e ("kvm: Disallow wraparound in kvm_gfn_to_hva_cache_init") Cc: Jim Mattson <jmattson@google.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
When reading/writing using the guest/host cache, check for a bad hva before checking for a NULL memslot, which triggers the slow path for handing cross-page accesses. Because the memslot is nullified on error by __kvm_gfn_to_hva_cache_init(), if the bad hva is encountered after crossing into a new page, then the kvm_{read,write}_guest() slow path could potentially write/access the first chunk prior to detecting the bad hva. Arguably, performing a partial access is semantically correct from an architectural perspective, but that behavior is certainly not intended. In the original implementation, memslot was not explicitly nullified and therefore the partial access behavior varied based on whether the memslot itself was null, or if the hva was simply bad. The current behavior was introduced as a seemingly unintentional side effect in commit f1b9dd5e ("kvm: Disallow wraparound in kvm_gfn_to_hva_cache_init"), which justified the change with "since some callers don't check the return code from this function, it sit seems prudent to clear ghc->memslot in the event of an error". Regardless of intent, the partial access is dependent on _not_ checking the result of the cache initialization, which is arguably a bug in its own right, at best simply weird. Fixes: 8f964525 ("KVM: Allow cross page reads and writes from cached translations.") Cc: Jim Mattson <jmattson@google.com> Cc: Andrew Honig <ahonig@google.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
For ring-based dirty log tracking, it will be more efficient to account writes during schedule-out or schedule-in to the currently running VCPU. We would like to do it even if the write doesn't use the current VCPU's address space, as is the case for cached writes (see commit 4e335d9e, "Revert "KVM: Support vCPU-based gfn->hva cache"", 2017-05-02). Therefore, add a mechanism to track the currently-loaded kvm_vcpu struct. There is already something similar in KVM/ARM; one important difference is that kvm_arch_vcpu_{load,put} have two callers in virt/kvm/kvm_main.c: we have to update both the architecture-independent vcpu_{load,put} and the preempt notifiers. Another change made in the process is to allow using kvm_get_running_vcpu() in preemptible code. This is allowed because preempt notifiers ensure that the value does not change even after the VCPU thread is migrated. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Xu 提交于
It's already going to reach 2400 Bytes (which is over half of page size on 4K page archs), so maybe it's good to have this build-time check in case it overflows when adding new fields. Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Xu 提交于
Remove kvm_read_guest_atomic() because it's not used anywhere. Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Open code the allocation and freeing of the vcpu->run page in kvm_vm_ioctl_create_vcpu() and kvm_vcpu_destroy() respectively. Doing so allows kvm_vcpu_init() to be a pure init function and eliminates kvm_vcpu_uninit() entirely. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move the putting of vcpu->pid to kvm_vcpu_destroy(). vcpu->pid is guaranteed to be NULL when kvm_vcpu_uninit() is called in the error path of kvm_vm_ioctl_create_vcpu(), e.g. it is explicitly nullified by kvm_vcpu_init() and is only changed by KVM_RUN. No functional change intended. Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Remove kvm_arch_vcpu_init() and kvm_arch_vcpu_uninit() now that all arch specific implementations are nops. Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Remove kvm_arch_vcpu_setup() now that all arch specific implementations are nops. Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Initialize the preempt notifier immediately in kvm_vcpu_init() to pave the way for removing kvm_arch_vcpu_setup(), i.e. to allow arch specific code to call vcpu_load() during kvm_arch_vcpu_create(). Back when preemption support was added, the location of the call to init the preempt notifier was perfectly sane. The overall vCPU creation flow featured a single arch specific hook and the preempt notifer was used immediately after its initialization (by vcpu_load()). E.g.: vcpu = kvm_arch_ops->vcpu_create(kvm, n); if (IS_ERR(vcpu)) return PTR_ERR(vcpu); preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops); vcpu_load(vcpu); r = kvm_mmu_setup(vcpu); vcpu_put(vcpu); if (r < 0) goto free_vcpu; Today, the call to preempt_notifier_init() is sandwiched between two arch specific calls, kvm_arch_vcpu_create() and kvm_arch_vcpu_setup(), which needlessly forces x86 (and possibly others?) to split its vCPU creation flow. Init the preempt notifier prior to any arch specific call so that each arch can independently decide how best to organize its creation flow. Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Unexport kvm_vcpu_cache and kvm_vcpu_{un}init() and make them static now that they are referenced only in kvm_main.c. Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Now that all architectures tightly couple vcpu allocation/free with the mandatory calls to kvm_{un}init_vcpu(), move the sequences verbatim to common KVM code. Move both allocation and initialization in a single patch to eliminate thrash in arch specific code. The bisection benefits of moving the two pieces in separate patches is marginal at best, whereas the odds of introducing a transient arch specific bug are non-zero. Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 24 1月, 2020 2 次提交
-
-
由 Sean Christopherson 提交于
Add kvm_vcpu_destroy() and wire up all architectures to call the common function instead of their arch specific implementation. The common destruction function will be used by future patches to move allocation and initialization of vCPUs to common KVM code, i.e. to free resources that are allocated by arch agnostic code. No functional change intended. Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add a pre-allocation arch hook to handle checks that are currently done by arch specific code prior to allocating the vCPU object. This paves the way for moving the allocation to common KVM code. Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 1月, 2020 4 次提交
-
-
由 Milan Pandurov 提交于
We can store reference to kvm_stats_debugfs_item instead of copying its values to kvm_stat_data. This allows us to remove duplicated code and usage of temporary kvm_stat_data inside vm_stat_get et al. Signed-off-by: NMilan Pandurov <milanpa@amazon.de> Reviewed-by: NAlexander Graf <graf@amazon.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Miaohe Lin 提交于
Fix some writing mistakes in the comments. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Miaohe Lin 提交于
Fix some grammar mistakes in the comments. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Miaohe Lin 提交于
Fix some wrong function names in comment. mmu_check_roots is a typo for mmu_check_root, vmcs_read_any should be vmcs12_read_any and so on. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 1月, 2020 1 次提交
-
-
由 Miaohe Lin 提交于
We can get rid of unnecessary var page in kvm_set_pfn_dirty() , thus make code style similar with kvm_set_pfn_accessed(). Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 23 11月, 2019 1 次提交
-
-
由 Miaohe Lin 提交于
The jump label out_free_1 and out_free_2 deal with the same stuff, so git rid of one and rename the label out_free_0a to retain the label name order. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 11月, 2019 2 次提交
-
-
由 Radim Krčmář 提交于
Fetching an index for any vcpu in kvm->vcpus array by traversing the entire array everytime is costly. This patch remembers the position of each vcpu in kvm->vcpus array by storing it in vcpus_idx under kvm_vcpu structure. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NNitesh Narayan Lal <nitesh@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Marc Zyngier 提交于
Add a comment explaining the rational behind having both no_compat open and ioctl callbacks to fend off compat tasks. Signed-off-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 14 11月, 2019 1 次提交
-
-
由 Marc Zyngier 提交于
On a system without KVM_COMPAT, we prevent IOCTLs from being issued by a compat task. Although this prevents most silly things from happening, it can still confuse a 32bit userspace that is able to open the kvm device (the qemu test suite seems to be pretty mad with this behaviour). Take a more radical approach and return a -ENODEV to the compat task. Reported-by: NPeter Maydell <peter.maydell@linaro.org> Signed-off-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 12 11月, 2019 1 次提交
-
-
由 Sean Christopherson 提交于
Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and instead manually handle ZONE_DEVICE on a case-by-case basis. For things like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal pages, e.g. put pages grabbed via gup(). But for flows such as setting A/D bits or shifting refcounts for transparent huge pages, KVM needs to to avoid processing ZONE_DEVICE pages as the flows in question lack the underlying machinery for proper handling of ZONE_DEVICE pages. This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup() when running a KVM guest backed with /dev/dax memory, as KVM straight up doesn't put any references to ZONE_DEVICE pages acquired by gup(). Note, Dan Williams proposed an alternative solution of doing put_page() on ZONE_DEVICE pages immediately after gup() in order to simplify the auditing needed to ensure is_zone_device_page() is called if and only if the backing device is pinned (via gup()). But that approach would break kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til unmap() when accessing guest memory, unlike KVM's secondary MMU, which coordinates with mmu_notifier invalidations to avoid creating stale page references, i.e. doesn't rely on pages being pinned. [*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.plReported-by: NAdam Borowski <kilobyte@angband.pl> Analyzed-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NDan Williams <dan.j.williams@intel.com> Cc: stable@vger.kernel.org Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 11月, 2019 2 次提交
-
-
由 Paolo Bonzini 提交于
Reported by syzkaller: ============================= WARNING: suspicious RCU usage ----------------------------- ./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by repro_11/12688. stack backtrace: Call Trace: dump_stack+0x7d/0xc5 lockdep_rcu_suspicious+0x123/0x170 kvm_dev_ioctl+0x9a9/0x1260 [kvm] do_vfs_ioctl+0x1a1/0xfb0 ksys_ioctl+0x6d/0x80 __x64_sys_ioctl+0x73/0xb0 do_syscall_64+0x108/0xaa0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Commit a97b0e77 (kvm: call kvm_arch_destroy_vm if vm creation fails) sets users_count to 1 before kvm_arch_init_vm(), however, if kvm_arch_init_vm() fails, we need to decrease this count. By moving it earlier, we can push the decrease to out_err_no_arch_destroy_vm without introducing yet another error label. syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=15209b84e00000 Reported-by: syzbot+75475908cd0910f141ee@syzkaller.appspotmail.com Fixes: a97b0e77 ("kvm: call kvm_arch_destroy_vm if vm creation fails") Cc: Jim Mattson <jmattson@google.com> Analyzed-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Reported by syzkaller: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 14727 Comm: syz-executor.3 Not tainted 5.4.0-rc4+ #0 RIP: 0010:kvm_coalesced_mmio_init+0x5d/0x110 arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:121 Call Trace: kvm_dev_ioctl_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:3446 [inline] kvm_dev_ioctl+0x781/0x1490 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3494 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0x196/0x1150 fs/ioctl.c:696 ksys_ioctl+0x62/0x90 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x6e/0xb0 fs/ioctl.c:718 do_syscall_64+0xca/0x5d0 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Commit 9121923c ("kvm: Allocate memslots and buses before calling kvm_arch_init_vm") moves memslots and buses allocations around, however, if kvm->srcu/irq_srcu fails initialization, NULL will be returned instead of error code, NULL will not be intercepted in kvm_dev_ioctl_create_vm() and be dereferenced by kvm_coalesced_mmio_init(), this patch fixes it. Moving the initialization is required anyway to avoid an incorrect synchronize_srcu that was also reported by syzkaller: wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136 __synchronize_srcu+0x197/0x250 kernel/rcu/srcutree.c:921 synchronize_srcu_expedited kernel/rcu/srcutree.c:946 [inline] synchronize_srcu+0x239/0x3e8 kernel/rcu/srcutree.c:997 kvm_page_track_unregister_notifier+0xe7/0x130 arch/x86/kvm/page_track.c:212 kvm_mmu_uninit_vm+0x1e/0x30 arch/x86/kvm/mmu.c:5828 kvm_arch_destroy_vm+0x4a2/0x5f0 arch/x86/kvm/x86.c:9579 kvm_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:702 [inline] so do it. Reported-by: syzbot+89a8060879fa0bd2db4f@syzkaller.appspotmail.com Reported-by: syzbot+e27e7027eb2b80e44225@syzkaller.appspotmail.com Fixes: 9121923c ("kvm: Allocate memslots and buses before calling kvm_arch_init_vm") Cc: Jim Mattson <jmattson@google.com> Cc: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 05 11月, 2019 1 次提交
-
-
由 Junaid Shahid 提交于
The page table pages corresponding to broken down large pages are zapped in FIFO order, so that the large page can potentially be recovered, if it is not longer being used for execution. This removes the performance penalty for walking deeper EPT page tables. By default, one large page will last about one hour once the guest reaches a steady state. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 04 11月, 2019 1 次提交
-
-
由 Junaid Shahid 提交于
Add a function to create a kernel thread associated with a given VM. In particular, it ensures that the worker thread inherits the priority and cgroups of the calling thread. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 31 10月, 2019 1 次提交
-
-
由 Jim Mattson 提交于
In kvm_create_vm(), if we've successfully called kvm_arch_init_vm(), but then fail later in the function, we need to call kvm_arch_destroy_vm() so that it can do any necessary cleanup (like freeing memory). Fixes: 44a95dae ("KVM: x86: Detect and Initialize AVIC support") Signed-off-by: NJohn Sperbeck <jsperbeck@google.com> Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NJunaid Shahid <junaids@google.com> [Remove dependency on "kvm: Don't clear reference count on kvm_create_vm() error path" which was not committed. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 25 10月, 2019 1 次提交
-
-
由 Jim Mattson 提交于
This reorganization will allow us to call kvm_arch_destroy_vm in the event that kvm_create_vm fails after calling kvm_arch_init_vm. Suggested-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 10月, 2019 2 次提交
-
-
由 Sean Christopherson 提交于
Add a new helper, kvm_put_kvm_no_destroy(), to handle putting a borrowed reference[*] to the VM when installing a new file descriptor fails. KVM expects the refcount to remain valid in this case, as the in-progress ioctl() has an explicit reference to the VM. The primary motiviation for the helper is to document that the 'kvm' pointer is still valid after putting the borrowed reference, e.g. to document that doing mutex(&kvm->lock) immediately after putting a ref to kvm isn't broken. [*] When exposing a new object to userspace via a file descriptor, e.g. a new vcpu, KVM grabs a reference to itself (the VM) prior to making the object visible to userspace to avoid prematurely freeing the VM in the scenario where userspace immediately closes file descriptor. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
Don't waste cycles to shrink/grow vCPU halt_poll_ns if host side polling is disabled. Acked-by: NMarcelo Tosatti <mtosatti@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-