- 13 3月, 2021 4 次提交
-
-
由 Wanpeng Li 提交于
Advancing the timer expiration should only be necessary on guest initiated writes. When we cancel the timer and clear .pending during state restore, clear expired_tscdeadline as well. Reviewed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Message-Id: <1614818118-965-1-git-send-email-wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
If mmu_lock is held for write, don't bother setting !PRESENT SPTEs to REMOVED_SPTE when recursively zapping SPTEs as part of shadow page removal. The concurrent write protections provided by REMOVED_SPTE are not needed, there are no backing page side effects to record, and MMIO SPTEs can be left as is since they are protected by the memslot generation, not by ensuring that the MMIO SPTE is unreachable (which is racy with respect to lockless walks regardless of zapping behavior). Skipping !PRESENT drastically reduces the number of updates needed to tear down sparsely populated MMUs, e.g. when tearing down a 6gb VM that didn't touch much memory, 6929/7168 (~96.6%) of SPTEs were '0' and could be skipped. Avoiding the write itself is likely close to a wash, but avoiding __handle_changed_spte() is a clear-cut win as that involves saving and restoring all non-volatile GPRs (it's a subtly big function), as well as several conditional branches before bailing out. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210310003029.1250571-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
# lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 88 On-line CPU(s) list: 0-63 Off-line CPU(s) list: 64-87 # cat /proc/cmdline BOOT_IMAGE=/vmlinuz-5.10.0-rc3-tlinux2-0050+ root=/dev/mapper/cl-root ro rd.lvm.lv=cl/root rhgb quiet console=ttyS0 LANG=en_US .UTF-8 no-kvmclock-vsyscall # echo 1 > /sys/devices/system/cpu/cpu76/online -bash: echo: write error: Cannot allocate memory The per-cpu vsyscall pvclock data pointer assigns either an element of the static array hv_clock_boot (#vCPU <= 64) or dynamically allocated memory hvclock_mem (vCPU > 64), the dynamically memory will not be allocated if kvmclock vsyscall is disabled, this can result in cpu hotpluged fails in kvmclock_setup_percpu() which returns -ENOMEM. It's broken for no-vsyscall and sometimes you end up with vsyscall disabled if the host does something strange. This patch fixes it by allocating this dynamically memory unconditionally even if vsyscall is disabled. Fixes: 6a1cac56 ("x86/kvm: Use __bss_decrypted attribute in shared variables") Reported-by: NZelin Deng <zelin.deng@linux.alibaba.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: stable@vger.kernel.org#v4.19-rc5+ Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Message-Id: <1614130683-24137-1-git-send-email-wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Muhammad Usama Anjum 提交于
This patch adds the annotation to fix the following sparse errors: arch/x86/kvm//x86.c:8147:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//x86.c:8147:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//x86.c:8147:15: struct kvm_apic_map * arch/x86/kvm//x86.c:10628:16: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//x86.c:10628:16: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//x86.c:10628:16: struct kvm_apic_map * arch/x86/kvm//x86.c:10629:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//x86.c:10629:15: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//x86.c:10629:15: struct kvm_pmu_event_filter * arch/x86/kvm//lapic.c:267:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:267:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:267:15: struct kvm_apic_map * arch/x86/kvm//lapic.c:269:9: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:269:9: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:269:9: struct kvm_apic_map * arch/x86/kvm//lapic.c:637:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:637:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:637:15: struct kvm_apic_map * arch/x86/kvm//lapic.c:994:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:994:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:994:15: struct kvm_apic_map * arch/x86/kvm//lapic.c:1036:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:1036:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:1036:15: struct kvm_apic_map * arch/x86/kvm//lapic.c:1173:15: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//lapic.c:1173:15: struct kvm_apic_map [noderef] __rcu * arch/x86/kvm//lapic.c:1173:15: struct kvm_apic_map * arch/x86/kvm//pmu.c:190:18: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//pmu.c:190:18: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//pmu.c:190:18: struct kvm_pmu_event_filter * arch/x86/kvm//pmu.c:251:18: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//pmu.c:251:18: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//pmu.c:251:18: struct kvm_pmu_event_filter * arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter * arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces): arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter [noderef] __rcu * arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter * Signed-off-by: NMuhammad Usama Anjum <musamaanjum@gmail.com> Message-Id: <20210305191123.GA497469@LEGION> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 3月, 2021 2 次提交
-
-
由 Jan Beulich 提交于
It's not helpful if every driver has to cook its own. Generalize xenbus'es INVALID_GRANT_HANDLE and pcifront's INVALID_GRANT_REF (which shouldn't have expanded to zero to begin with). Use the constants in p2m.c and gntdev.c right away, and update field types where necessary so they would match with the constants' types (albeit without touching struct ioctl_gntdev_grant_ref's ref field, as that's part of the public interface of the kernel and would require introducing a dependency on Xen's grant_table.h public header). Signed-off-by: NJan Beulich <jbeulich@suse.com> Reviewed-by: NJuergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/db7c38a5-0d75-d5d1-19de-e5fe9f0b9c48@suse.comSigned-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
由 Jan Beulich 提交于
They're only used internally, and the layering violation they contain (x86) or imply (Arm) of calling HYPERVISOR_grant_table_op() strongly advise against any (uncontrolled) use from a module. The functions also never had users except the ones from drivers/xen/grant-table.c forever since their introduction in 3.15. Signed-off-by: NJan Beulich <jbeulich@suse.com> Reviewed-by: NStefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/746a5cd6-1446-eda4-8b23-03c1cac30b8d@suse.comSigned-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
- 06 3月, 2021 1 次提交
-
-
由 Muhammad Usama Anjum 提交于
Sparse warnings removed: warning: Using plain integer as NULL pointer Signed-off-by: NMuhammad Usama Anjum <musamaanjum@gmail.com> Message-Id: <20210305180816.GA488770@LEGION> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 05 3月, 2021 2 次提交
-
-
由 Sean Christopherson 提交于
Directly connect the 'npt' param to the 'npt_enabled' variable so that runtime adjustments to npt_enabled are reflected in sysfs. Move the !PAE restriction to a runtime check to ensure NPT is forced off if the host is using 2-level paging, and add a comment explicitly stating why NPT requires a 64-bit kernel or a kernel with PAE enabled. Opportunistically switch the param to octal permissions. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305021637.3768573-1-seanjc@google.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
When posting a deadline timer interrupt, open code the checks guarding __kvm_wait_lapic_expire() in order to skip the lapic_timer_int_injected() check in kvm_wait_lapic_expire(). The injection check will always fail since the interrupt has not yet be injected. Moving the call after injection would also be wrong as that wouldn't actually delay delivery of the IRQ if it is indeed sent via posted interrupt. Fixes: 010fd37f ("KVM: LAPIC: Reduce world switch latency caused by timer_advance_ns") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305021808.3769732-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 03 3月, 2021 6 次提交
-
-
由 Juergen Gross 提交于
Since commit 9e2369c0 ("xen: add helpers to allocate unpopulated memory") foreign mappings are using guest physical addresses allocated via ZONE_DEVICE functionality. This will result in problems for the case of no balloon memory hotplug being configured, as the p2m list will only cover the initial memory size of the domain. Any ZONE_DEVICE allocated address will be outside the p2m range and thus a mapping can't be established with that memory address. Fix that by extending the p2m size for that case. At the same time add a check for a to be created mapping to be within the p2m limits in order to detect errors early. While changing a comment, remove some 32-bit leftovers. This is XSA-369. Fixes: 9e2369c0 ("xen: add helpers to allocate unpopulated memory") Cc: <stable@vger.kernel.org> # 5.9 Reported-by: NMarek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NJan Beulich <jbeulich@suse.com> Signed-off-by: NJuergen Gross <jgross@suse.com>
-
由 Jan Beulich 提交于
Bailing immediately from set_foreign_p2m_mapping() upon a p2m updating error leaves the full batch in an ambiguous state as far as the caller is concerned. Instead flags respective slots as bad, unmapping what was mapped there right away. HYPERVISOR_grant_table_op()'s return value and the individual unmap slots' status fields get used only for a one-time - there's not much we can do in case of a failure. Note that there's no GNTST_enomem or alike, so GNTST_general_error gets used. The map ops' handle fields get overwritten just to be on the safe side. This is part of XSA-367. Cc: <stable@vger.kernel.org> Signed-off-by: NJan Beulich <jbeulich@suse.com> Reviewed-by: NJuergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/96cccf5d-e756-5f53-b91a-ea269bfb9be0@suse.comSigned-off-by: NJuergen Gross <jgross@suse.com>
-
由 Babu Moger 提交于
This problem was reported on a SVM guest while executing kexec. Kexec fails to load the new kernel when the PCID feature is enabled. When kexec starts loading the new kernel, it starts the process by resetting the vCPU's and then bringing each vCPU online one by one. The vCPU reset is supposed to reset all the register states before the vCPUs are brought online. However, the CR4 register is not reset during this process. If this register is already setup during the last boot, all the flags can remain intact. The X86_CR4_PCIDE bit can only be enabled in long mode. So, it must be enabled much later in SMP initialization. Having the X86_CR4_PCIDE bit set during SMP boot can cause a boot failures. Fix the issue by resetting the CR4 register in init_vmcb(). Signed-off-by: NBabu Moger <babu.moger@amd.com> Message-Id: <161471109108.30811.6392805173629704166.stgit@bmoger-ubuntu> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Woodhouse 提交于
This is how Xen guests do steal time accounting. The hypervisor records the amount of time spent in each of running/runnable/blocked/offline states. In the Xen accounting, a vCPU is still in state RUNSTATE_running while in Xen for a hypercall or I/O trap, etc. Only if Xen explicitly schedules does the state become RUNSTATE_blocked. In KVM this means that even when the vCPU exits the kvm_run loop, the state remains RUNSTATE_running. The VMM can explicitly set the vCPU to RUNSTATE_blocked by using the KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT attribute, and can also use KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST to retrospectively add a given amount of time to the blocked state and subtract it from the running state. The state_entry_time corresponds to get_kvmclock_ns() at the time the vCPU entered the current state, and the total times of all four states should always add up to state_entry_time. Co-developed-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Message-Id: <20210301125309.874953-2-dwmw2@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Woodhouse 提交于
When clearing the per-vCPU shared regions, set the return value to zero to indicate success. This was causing spurious errors to be returned to userspace on soft reset. Also add a paranoid BUILD_BUG_ON() for compat structure compatibility. Fixes: 0c165b3c ("KVM: x86/xen: Allow reset of Xen attributes") Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Message-Id: <20210301125309.874953-1-dwmw2@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The Xen hypercall interface adds to the attack surface of the hypervisor and will be used quite rarely. Allow compiling it out. Suggested-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 27 2月, 2021 4 次提交
-
-
由 NeilBrown 提交于
The memtype seq_file iterator allocates a buffer in the ->start and ->next functions and frees it in the ->show function. The preferred handling for such resources is to free them in the subsequent ->next or ->stop function call. Since Commit 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code and interface") there is no guarantee that ->show will be called after ->next, so this function can now leak memory. So move the freeing of the buffer to ->next and ->stop. Link: https://lkml.kernel.org/r/161248539022.21478.13874455485854739066.stgit@noble1 Fixes: 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code and interface") Signed-off-by: NNeilBrown <neilb@suse.de> Cc: Xin Long <lucien.xin@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Marco Elver 提交于
Add KFENCE test suite, testing various error detection scenarios. Makes use of KUnit for test organization. Since KFENCE's interface to obtain error reports is via the console, the test verifies that KFENCE outputs expected reports to the console. [elver@google.com: fix typo in test] Link: https://lkml.kernel.org/r/X9lHQExmHGvETxY4@elver.google.com [elver@google.com: show access type in report] Link: https://lkml.kernel.org/r/20210111091544.3287013-2-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-9-elver@google.comSigned-off-by: NAlexander Potapenko <glider@google.com> Signed-off-by: NMarco Elver <elver@google.com> Reviewed-by: NDmitry Vyukov <dvyukov@google.com> Co-developed-by: NAlexander Potapenko <glider@google.com> Reviewed-by: NJann Horn <jannh@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Marco Elver 提交于
Instead of removing the fault handling portion of the stack trace based on the fault handler's name, just use struct pt_regs directly. Change kfence_handle_page_fault() to take a struct pt_regs, and plumb it through to kfence_report_error() for out-of-bounds, use-after-free, or invalid access errors, where pt_regs is used to generate the stack trace. If the kernel is a DEBUG_KERNEL, also show registers for more information. Link: https://lkml.kernel.org/r/20201105092133.2075331-1-elver@google.comSigned-off-by: NMarco Elver <elver@google.com> Suggested-by: NMark Rutland <mark.rutland@arm.com> Acked-by: NMark Rutland <mark.rutland@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Potapenko 提交于
Add architecture specific implementation details for KFENCE and enable KFENCE for the x86 architecture. In particular, this implements the required interface in <asm/kfence.h> for setting up the pool and providing helper functions for protecting and unprotecting pages. For x86, we need to ensure that the pool uses 4K pages, which is done using the set_memory_4k() helper function. [elver@google.com: add missing copyright and description header] Link: https://lkml.kernel.org/r/20210118092159.145934-2-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-3-elver@google.comSigned-off-by: NMarco Elver <elver@google.com> Signed-off-by: NAlexander Potapenko <glider@google.com> Reviewed-by: NDmitry Vyukov <dvyukov@google.com> Co-developed-by: NMarco Elver <elver@google.com> Reviewed-by: NJann Horn <jannh@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 2月, 2021 4 次提交
-
-
由 Paolo Bonzini 提交于
A missing flush would cause the static branch to trigger incorrectly. Cc: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Check that PML is actually enabled before setting the mask to force a SPTE to be write-protected. The bits used for the !AD_ENABLED case are in the upper half of the SPTE. With 64-bit paging and EPT, these bits are ignored, but with 32-bit PAE paging they are reserved. Setting them for L2 SPTEs without checking PML breaks NPT on 32-bit KVM. Fixes: 1f4e5fc8 ("KVM: x86: fix nested guest live migration with PML") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210225204749.1512652-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
Reported by syzkaller: KASAN: null-ptr-deref in range [0x0000000000000140-0x0000000000000147] CPU: 1 PID: 8370 Comm: syz-executor859 Not tainted 5.11.0-syzkaller #0 RIP: 0010:synic_get arch/x86/kvm/hyperv.c:165 [inline] RIP: 0010:kvm_hv_set_sint_gsi arch/x86/kvm/hyperv.c:475 [inline] RIP: 0010:kvm_hv_irq_routing_update+0x230/0x460 arch/x86/kvm/hyperv.c:498 Call Trace: kvm_set_irq_routing+0x69b/0x940 arch/x86/kvm/../../../virt/kvm/irqchip.c:223 kvm_vm_ioctl+0x12d0/0x2800 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3959 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae Hyper-V context is lazily allocated until Hyper-V specific MSRs are accessed or SynIC is enabled. However, the syzkaller testcase sets irq routing table directly w/o enabling SynIC. This results in null-ptr-deref when accessing SynIC Hyper-V context. This patch fixes it. syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=163342ccd00000 Reported-by: syzbot+6987f3b2dbd9eda95f12@syzkaller.appspotmail.com Fixes: 8f014550 ("KVM: x86: hyper-v: Make Hyper-V emulation enablement conditional") Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Message-Id: <1614326399-5762-1-git-send-email-wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Dongli Zhang 提交于
The 'mmu_page_hash' is used as hash table while 'active_mmu_pages' is a list. Remove the misplaced comment as it's mostly stating the obvious anyways. Signed-off-by: NDongli Zhang <dongli.zhang@oracle.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210226061945.1222-1-dongli.zhang@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 25 2月, 2021 1 次提交
-
-
由 Sean Christopherson 提交于
Fix the interpreation of nested_svm_vmexit()'s return value when synthesizing a nested VM-Exit after intercepting an SVM instruction while L2 was running. The helper returns '0' on success, whereas a return value of '0' in the exit handler path means "exit to userspace". The incorrect return value causes KVM to exit to userspace without filling the run state, e.g. QEMU logs "KVM: unknown exit, hardware reason 0". Fixes: 14c2bf81 ("KVM: SVM: Fix #GP handling for doubly-nested virtualization") Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210224005627.657028-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 24 2月, 2021 5 次提交
-
-
由 Sami Tolvanen 提交于
Pass code model and stack alignment to the linker as these are not stored in LLVM bitcode, and allow CONFIG_LTO_CLANG* to be enabled. Signed-off-by: NSami Tolvanen <samitolvanen@google.com> Reviewed-by: NKees Cook <keescook@chromium.org>
-
由 Sami Tolvanen 提交于
Clang incorrectly inlines functions with differing stack protector attributes, which breaks __restore_processor_state() that relies on stack protector being disabled. This change disables LTO for cpu.c to work aroung the bug. Link: https://bugs.llvm.org/show_bug.cgi?id=47479Suggested-by: NNick Desaulniers <ndesaulniers@google.com> Signed-off-by: NSami Tolvanen <samitolvanen@google.com>
-
由 Sami Tolvanen 提交于
Disable LTO for the vDSO. Note that while we could use Clang's LTO for the 64-bit vDSO, it won't add noticeable benefit for the small amount of C code. Signed-off-by: NSami Tolvanen <samitolvanen@google.com> Reviewed-by: NKees Cook <keescook@chromium.org>
-
由 Sami Tolvanen 提交于
Select HAVE_OBJTOOL_MCOUNT if STACK_VALIDATION is selected to use objtool to generate __mcount_loc sections for dynamic ftrace with Clang and gcc <5 (later versions of gcc use -mrecord-mcount). Signed-off-by: NSami Tolvanen <samitolvanen@google.com> Reviewed-by: NKees Cook <keescook@chromium.org>
-
由 Like Xu 提交于
If lbr_desc->event is successfully created, the intel_pmu_create_ guest_lbr_event() will return 0, otherwise it will return -ENOENT, and then jump to LBR msrs dummy handling. Fixes: 1b5ac322 ("KVM: vmx/pmu: Pass-through LBR msrs when the guest LBR event is ACTIVE") Signed-off-by: NLike Xu <like.xu@linux.intel.com> Message-Id: <20210223013958.1280444-1-like.xu@linux.intel.com> [Add "< 0" and PTR_ERR to make the code clearer. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 23 2月, 2021 4 次提交
-
-
由 David Stevens 提交于
Track the range being invalidated by mmu_notifier and skip page fault retries if the fault address is not affected by the in-progress invalidation. Handle concurrent invalidations by finding the minimal range which includes all ranges being invalidated. Although the combined range may include unrelated addresses and cannot be shrunk as individual invalidation operations complete, it is unlikely the marginal gains of proper range tracking are worth the additional complexity. The primary benefit of this change is the reduction in the likelihood of extreme latency when handing a page fault due to another thread having been preempted while modifying host virtual addresses. Signed-off-by: NDavid Stevens <stevensd@chromium.org> Message-Id: <20210222024522.1751719-3-stevensd@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Don't retry a page fault due to an mmu_notifier invalidation when handling a page fault for a GPA that did not resolve to a memslot, i.e. an MMIO page fault. Invalidations from the mmu_notifier signal a change in a host virtual address (HVA) mapping; without a memslot, there is no HVA and thus no possibility that the invalidation is relevant to the page fault being handled. Note, the MMIO vs. memslot generation checks handle the case where a pending memslot will create a memslot overlapping the faulting GPA. The mmu_notifier checks are orthogonal to memslot updates. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210222024522.1751719-2-stevensd@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Right now, enter_svm_guest_mode is calling nested_prepare_vmcb_save and nested_prepare_vmcb_control. This results in is_guest_mode being false until the end of nested_prepare_vmcb_control. This is a problem because nested_prepare_vmcb_save can in turn cause changes to the intercepts and these have to be applied to the "host VMCB" (stored in svm->nested.hsave) and then merged with the VMCB12 intercepts into svm->vmcb. In particular, without this change we forget to set the CR0 read and CR0 write intercepts when running a real mode L2 guest with NPT disabled. The guest is therefore able to see the CR0.PG bit that KVM sets to enable "paged real mode". This patch fixes the svm.flat mode_switch test case with npt=0. There are no other problematic calls in nested_prepare_vmcb_save. Moving is_guest_mode to the end is done since commit 06fc7772 ("KVM: SVM: Activate nested state only when guest state is complete", 2010-04-25). However, back then KVM didn't grab a different VMCB when updating the intercepts, it had already copied/merged L1's stuff to L0's VMCB, and then updated L0's VMCB regardless of is_nested(). Later recalc_intercepts was introduced in commit 384c6368 ("KVM: SVM: Add function to recalculate intercept masks", 2011-01-12). This introduced the bug, because recalc_intercepts now throws away the intercept manipulations that svm_set_cr0 had done in the meanwhile to svm->vmcb. [1] https://lore.kernel.org/kvm/1266493115-28386-1-git-send-email-joerg.roedel@amd.com/Reviewed-by: NSean Christopherson <seanjc@google.com> Tested-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Brendan Jackman 提交于
This code generates a CMPXCHG loop in order to implement atomic_fetch bitwise operations. Because CMPXCHG is hard-coded to use rax (which holds the BPF r0 value), it saves the _real_ r0 value into the internal "ax" temporary register and restores it once the loop is complete. In the middle of the loop, the actual bitwise operation is performed using src_reg. The bug occurs when src_reg is r0: as described above, r0 has been clobbered and the real r0 value is in the ax register. Therefore, perform this operation on the ax register instead, when src_reg is r0. Fixes: 981f94c3 ("bpf: Add bitwise atomic instructions") Signed-off-by: NBrendan Jackman <jackmanb@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NKP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/bpf/20210216125307.1406237-1-jackmanb@google.com
-
- 22 2月, 2021 3 次提交
-
-
由 Jens Axboe 提交于
PF_IO_WORKER are kernel threads too, but they aren't PF_KTHREAD in the sense that we don't assign ->set_child_tid with our own structure. Just ensure that every arch sets up the PF_IO_WORKER threads like kthreads in the arch implementation of copy_thread(). Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Masahiro Yamada 提交于
The 'syscall' variables are not directly used in the commands. Remove the $(srctree)/ prefix because we can rely on VPATH. Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
-
由 Masahiro Yamada 提交于
The rules in these Makefiles cannot detect the command line change because the prerequisite 'FORCE' is missing. Adding 'FORCE' will result in the headers being rebuilt every time because the 'targets' additions are also wrong; the file paths in 'targets' must be relative to the current Makefile. Fix all of them so the if_changed rules work correctly. Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org> Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
-
- 19 2月, 2021 4 次提交
-
-
由 Sean Christopherson 提交于
Remove several exports from the MMU that are no longer necessary. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-15-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Drop kvm_mmu_slot_largepage_remove_write_access() and refactor its sole caller to use kvm_mmu_slot_remove_write_access(). Remove the now-unused slot_handle_large_level() and slot_handle_all_level() helpers. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-14-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Stop setting dirty bits for MMU pages when dirty logging is disabled for a memslot, as PML is now completely disabled when there are no memslots with dirty logging enabled. This means that spurious PML entries will be created for memslots with dirty logging disabled if at least one other memslot has dirty logging enabled. However, spurious PML entries are already possible since dirty bits are set only when a dirty logging is turned off, i.e. memslots that are never dirty logged will have dirty bits cleared. In the end, it's faster overall to eat a few spurious PML entries in the window where dirty logging is being disabled across all memslots. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-13-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Makarand Sonare 提交于
Currently, if enable_pml=1 PML remains enabled for the entire lifetime of the VM irrespective of whether dirty logging is enable or disabled. When dirty logging is disabled, all the pages of the VM are manually marked dirty, so that PML is effectively non-operational. Setting the dirty bits is an expensive operation which can cause severe MMU lock contention in a performance sensitive path when dirty logging is disabled after a failed or canceled live migration. Manually setting dirty bits also fails to prevent PML activity if some code path clears dirty bits, which can incur unnecessary VM-Exits. In order to avoid this extra overhead, dynamically enable/disable PML when dirty logging gets turned on/off for the first/last memslot. Signed-off-by: NMakarand Sonare <makarandsonare@google.com> Co-developed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-12-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-