- 23 1月, 2019 1 次提交
-
-
由 Juergen Gross 提交于
commit 867cefb4cb1012f42cada1c7d1f35ac8dd276071 upstream. Commit f94c8d11 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface") broke Xen guest time handling across migration: [ 187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done. [ 187.251137] OOM killer disabled. [ 187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 187.252299] suspending xenstore... [ 187.266987] xen:grant_table: Grant tables using version 1 layout [18446743811.706476] OOM killer enabled. [18446743811.706478] Restarting tasks ... done. [18446743811.720505] Setting capacity to 16777216 Fix that by setting xen_sched_clock_offset at resume time to ensure a monotonic clock value. [boris: replaced pr_info() with pr_info_once() in xen_callback_vector() to avoid printing with incorrect timestamp during resume (as we haven't re-adjusted the clock yet)] Fixes: f94c8d11 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface") Cc: <stable@vger.kernel.org> # 4.11 Reported-by: NHans van Kranenburg <hans.van.kranenburg@mendix.com> Signed-off-by: NJuergen Gross <jgross@suse.com> Tested-by: NHans van Kranenburg <hans.van.kranenburg@mendix.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 17 1月, 2019 1 次提交
-
-
由 WANG Chao 提交于
commit e4f358916d528d479c3c12bd2fd03f2d5a576380 upstream. Commit 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support") replaced the RETPOLINE define with CONFIG_RETPOLINE checks. Remove the remaining pieces. [ bp: Massage commit message. ] Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support") Signed-off-by: NWANG Chao <chao.wang@ucloud.cn> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NZhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Kees Cook <keescook@chromium.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: linux-kbuild@vger.kernel.org Cc: srinivas.eeda@oracle.com Cc: stable <stable@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20181210163725.95977-1-chao.wang@ucloud.cnSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 13 1月, 2019 2 次提交
-
-
由 Kirill A. Shutemov 提交于
[ Upstream commit 254eb5505ca0ca749d3a491fc6668b6c16647a99 ] The LDT remap placement has been changed. It's now placed before the direct mapping in the kernel virtual address space for both paging modes. Change address markers order accordingly. Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging") Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: bhe@redhat.com Cc: hans.van.kranenburg@mendix.com Cc: linux-mm@kvack.org Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20181130202328.65359-3-kirill.shutemov@linux.intel.comSigned-off-by: NSasha Levin <sashal@kernel.org>
-
由 Kirill A. Shutemov 提交于
[ Upstream commit 16877a5570e0c5f4270d5b17f9bab427bcae9514 ] There is a guard hole at the beginning of the kernel address space, also used by hypervisors. It occupies 16 PGD entries. This reserved range is not defined explicitely, it is calculated relative to other entities: direct mapping and user space ranges. The calculation got broken by recent changes of the kernel memory layout: LDT remap range is now mapped before direct mapping and makes the calculation invalid. The breakage leads to crash on Xen dom0 boot[1]. Define the reserved range explicitely. It's part of kernel ABI (hypervisors expect it to be stable) and must not depend on changes in the rest of kernel memory layout. [1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03313.html Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging") Reported-by: NHans van Kranenburg <hans.van.kranenburg@mendix.com> Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NHans van Kranenburg <hans.van.kranenburg@mendix.com> Reviewed-by: NJuergen Gross <jgross@suse.com> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: bhe@redhat.com Cc: linux-mm@kvack.org Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20181130202328.65359-2-kirill.shutemov@linux.intel.comSigned-off-by: NSasha Levin <sashal@kernel.org>
-
- 10 1月, 2019 4 次提交
-
-
由 Sean Christopherson 提交于
commit 1b3ab5ad1b8ad99bae76ec583809c5f5a31c707c upstream. Fixes: 34a1cd60 ("kvm: x86: vmx: move some vmx setting from vmx_init() to hardware_setup()") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Sean Christopherson 提交于
commit e81434995081fd7efb755fd75576b35dbb0850b1 upstream. ____kvm_handle_fault_on_reboot() provides a generic exception fixup handler that is used to cleanly handle faults on VMX/SVM instructions during reboot (or at least try to). If there isn't a reboot in progress, ____kvm_handle_fault_on_reboot() treats any exception as fatal to KVM and invokes kvm_spurious_fault(), which in turn generates a BUG() to get a stack trace and die. When it was originally added by commit 4ecac3fd ("KVM: Handle virtualization instruction #UD faults during reboot"), the "call" to kvm_spurious_fault() was handcoded as PUSH+JMP, where the PUSH'd value is the RIP of the faulting instructing. The PUSH+JMP trickery is necessary because the exception fixup handler code lies outside of its associated function, e.g. right after the function. An actual CALL from the .fixup code would show a slightly bogus stack trace, e.g. an extra "random" function would be inserted into the trace, as the return RIP on the stack would point to no known function (and the unwinder will likely try to guess who owns the RIP). Unfortunately, the JMP was replaced with a CALL when the macro was reworked to not spin indefinitely during reboot (commit b7c4145b "KVM: Don't spin on virt instruction faults during reboot"). This causes the aforementioned behavior where a bogus function is inserted into the stack trace, e.g. my builds like to blame free_kvm_area(). Revert the CALL back to a JMP. The changelog for commit b7c4145b ("KVM: Don't spin on virt instruction faults during reboot") contains nothing that indicates the switch to CALL was deliberate. This is backed up by the fact that the PUSH <insn RIP> was left intact. Note that an alternative to the PUSH+JMP magic would be to JMP back to the "real" code and CALL from there, but that would require adding a JMP in the non-faulting path to avoid calling kvm_spurious_fault() and would add no value, i.e. the stack trace would be the same. Using CALL: ------------[ cut here ]------------ kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356! invalid opcode: 0000 [#1] SMP CPU: 4 PID: 1057 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm] Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 RSP: 0018:ffffc900004bbcc8 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff888273fd8000 R08: 00000000000003e8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000371fb0 R13: 0000000000000000 R14: 000000026d763cf4 R15: ffff888273fd8000 FS: 00007f3d69691700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f89bc56fe0 CR3: 0000000271a5a001 CR4: 0000000000362ee0 Call Trace: free_kvm_area+0x1044/0x43ea [kvm_intel] ? vmx_vcpu_run+0x156/0x630 [kvm_intel] ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? __set_task_blocked+0x38/0x90 ? __set_current_blocked+0x50/0x60 ? __fpu__restore_sig+0x97/0x490 ? do_vfs_ioctl+0xa1/0x620 ? __x64_sys_futex+0x89/0x180 ? ksys_ioctl+0x66/0x70 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x4f/0x100 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc ---[ end trace 9775b14b123b1713 ]--- Using JMP: ------------[ cut here ]------------ kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356! invalid opcode: 0000 [#1] SMP CPU: 6 PID: 1067 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm] Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 RSP: 0018:ffffc90000497cd0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88827058bd40 R08: 00000000000003e8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000369fb0 R13: 0000000000000000 R14: 00000003c8fc6642 R15: ffff88827058bd40 FS: 00007f3d7219e700(0000) GS:ffff888277900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3d64001000 CR3: 0000000271c6b004 CR4: 0000000000362ee0 Call Trace: vmx_vcpu_run+0x156/0x630 [kvm_intel] ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? __set_task_blocked+0x38/0x90 ? __set_current_blocked+0x50/0x60 ? __fpu__restore_sig+0x97/0x490 ? do_vfs_ioctl+0xa1/0x620 ? __x64_sys_futex+0x89/0x180 ? ksys_ioctl+0x66/0x70 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x4f/0x100 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc ---[ end trace f9daedb85ab3ddba ]--- Fixes: b7c4145b ("KVM: Don't spin on virt instruction faults during reboot") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Dan Williams 提交于
commit ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d upstream. Commit: f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()" addressed a case where __flush_tlb_all() is called without preemption being disabled. It also left a warning to catch other cases where preemption is not disabled. That warning triggers for the memory hotplug path which is also used for persistent memory enabling: WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460 RIP: 0010:__flush_tlb_all+0x1b/0x3a [..] Call Trace: phys_pud_init+0x29c/0x2bb kernel_physical_mapping_init+0xfc/0x219 init_memory_mapping+0x1a5/0x3b0 arch_add_memory+0x2c/0x50 devm_memremap_pages+0x3aa/0x610 pmem_attach_disk+0x585/0x700 [nd_pmem] Andy wondered why a path that can sleep was using __flush_tlb_all() [1] and Dave confirmed the expectation for TLB flush is for modifying / invalidating existing PTE entries, but not initial population [2]. Drop the usage of __flush_tlb_all() in phys_{p4d,pud,pmd}_init() on the expectation that this path is only ever populating empty entries for the linear map. Note, at linear map teardown time there is a call to the all-cpu flush_tlb_all() to invalidate the removed mappings. [1]: https://lkml.kernel.org/r/9DFD717D-857D-493D-A606-B635D72BAC21@amacapital.net [2]: https://lkml.kernel.org/r/749919a4-cdb1-48a3-adb4-adb81a5fa0b5@intel.com [ mingo: Minor readability edits. ] Suggested-by: NDave Hansen <dave.hansen@linux.intel.com> Reported-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Fixes: f77084d96355 ("x86/mm/pat: Disable preemption around __flush_tlb_all()") Link: http://lkml.kernel.org/r/154395944713.32119.15611079023837132638.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Michal Hocko 提交于
commit 5b5e4d623ec8a34689df98e42d038a3b594d2ff9 upstream. Swap storage is restricted to max_swapfile_size (~16TB on x86_64) whenever the system is deemed affected by L1TF vulnerability. Even though the limit is quite high for most deployments it seems to be too restrictive for deployments which are willing to live with the mitigation disabled. We have a customer to deploy 8x 6,4TB PCIe/NVMe SSD swap devices which is clearly out of the limit. Drop the swap restriction when l1tf=off is specified. It also doesn't make much sense to warn about too much memory for the l1tf mitigation when it is forcefully disabled by the administrator. [ tglx: Folded the documentation delta change ] Fixes: 377eeaa8 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2") Signed-off-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NJiri Kosina <jkosina@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: <linux-mm@kvack.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181113184910.26697-1-mhocko@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 29 12月, 2018 7 次提交
-
-
由 Reinette Chatre 提交于
commit 80b71c340f17705ec145911b9a193ea781811b16 upstream. The user triggers the creation of a pseudo-locked region when writing the requested schemata to the schemata resctrl file. The pseudo-locking of a region is required to be done on a CPU that is associated with the cache on which the pseudo-locked region will reside. In order to run the locking code on a specific CPU, the needed CPU has to be selected and ensured to remain online during the entire locking sequence. At this time, the cpu_hotplug_lock is not taken during the pseudo-lock region creation and it is thus possible for a CPU to be selected to run the pseudo-locking code and then that CPU to go offline before the thread is able to run on it. Fix this by ensuring that the cpu_hotplug_lock is taken while the CPU on which code has to run needs to be controlled. Since the cpu_hotplug_lock is always taken before rdtgroup_mutex the lock order is maintained. Fixes: e0bdfe8e ("x86/intel_rdt: Support creation/removal of pseudo-locked region") Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: stable <stable@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/b7b17432a80f95a1fa21a1698ba643014f58ad31.1544476425.git.reinette.chatre@intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Alistair Strachan 提交于
commit cd01544a upstream. Commit 379d98dd ("x86: vdso: Use $LD instead of $CC to link") accidentally broke unwinding from userspace, because ld would strip the .eh_frame sections when linking. Originally, the compiler would implicitly add --eh-frame-hdr when invoking the linker, but when this Makefile was converted from invoking ld via the compiler, to invoking it directly (like vmlinux does), the flag was missed. (The EH_FRAME section is important for the VDSO shared libraries, but not for vmlinux.) Fix the problem by explicitly specifying --eh-frame-hdr, which restores parity with the old method. See relevant bug reports for additional info: https://bugzilla.kernel.org/show_bug.cgi?id=201741 https://bugzilla.redhat.com/show_bug.cgi?id=1659295 Fixes: 379d98dd ("x86: vdso: Use $LD instead of $CC to link") Reported-by: NFlorian Weimer <fweimer@redhat.com> Reported-by: NCarlos O'Donell <carlos@redhat.com> Reported-by: N"H. J. Lu" <hjl.tools@gmail.com> Signed-off-by: NAlistair Strachan <astrachan@google.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Tested-by: NLaura Abbott <labbott@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Carlos O'Donell <carlos@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: kernel-team@android.com Cc: Laura Abbott <labbott@redhat.com> Cc: stable <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: https://lkml.kernel.org/r/20181214223637.35954-1-astrachan@google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Dan Williams 提交于
commit 51c3fbd8 upstream. A decoy address is used by set_mce_nospec() to update the cache attributes for a page that may contain poison (multi-bit ECC error) while attempting to minimize the possibility of triggering a speculative access to that page. When reserve_memtype() is handling a decoy address it needs to convert it to its real physical alias. The conversion, AND'ing with __PHYSICAL_MASK, is broken for a 32-bit physical mask and reserve_memtype() is passed the last physical page. Gert reports triggering the: BUG_ON(start >= end); ...assertion when running a 32-bit non-PAE build on a platform that has a driver resource at the top of physical memory: BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved Given that the decoy address scheme is only targeted at 64-bit builds and assumes that the top of physical address space is free for use as a decoy address range, simply bypass address sanitization in the 32-bit case. Lastly, there was no need to crash the system when this failure occurred, and no need to crash future systems if the assumptions of decoy addresses are ever violated. Change the BUG_ON() to a WARN() with an error return. Fixes: 510ee090 ("x86/mm/pat: Prepare {reserve, free}_memtype() for...") Reported-by: NGert Robben <t2@gert.gr> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NGert Robben <t2@gert.gr> Cc: stable@vger.kernel.org Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: platform-driver-x86@vger.kernel.org Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/154454337985.789277.12133288391664677775.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Colin Ian King 提交于
commit 32043fa0 upstream. Currently the copy_to_user of data in the gentry struct is copying uninitiaized data in field _pad from the stack to userspace. Fix this by explicitly memset'ing gentry to zero, this also will zero any compiler added padding fields that may be in struct (currently there are none). Detected by CoverityScan, CID#200783 ("Uninitialized scalar variable") Fixes: b263b31e ("x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls") Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NTyler Hicks <tyhicks@canonical.com> Cc: security@kernel.org Link: https://lkml.kernel.org/r/20181218172956.1440-1-colin.king@canonical.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Cfir Cohen 提交于
commit c2dd5146 upstream. nested_get_vmcs12_pages() processes the posted_intr address in vmcs12. It caches the kmap()ed page object and pointer, however, it doesn't handle errors correctly: it's possible to cache a valid pointer, then release the page and later dereference the dangling pointer. I was able to reproduce with the following steps: 1. Call vmlaunch with valid posted_intr_desc_addr but an invalid MSR_EFER. This causes nested_get_vmcs12_pages() to cache the kmap()ed pi_desc_page and pi_desc. Later the invalid EFER value fails check_vmentry_postreqs() which fails the first vmlaunch. 2. Call vmlanuch with a valid EFER but an invalid posted_intr_desc_addr (I set it to 2G - 0x80). The second time we call nested_get_vmcs12_pages pi_desc_page is unmapped and released and pi_desc_page is set to NULL (the "shouldn't happen" clause). Due to the invalid posted_intr_desc_addr, kvm_vcpu_gpa_to_page() fails and nested_get_vmcs12_pages() returns. It doesn't return an error value so vmlaunch proceeds. Note that at this time we have a dangling pointer in vmx->nested.pi_desc and POSTED_INTR_DESC_ADDR in L0's vmcs. 3. Issue an IPI in L2 guest code. This triggers a call to vmx_complete_nested_posted_interrupt() and pi_test_and_clear_on() which dereferences the dangling pointer. Vulnerable code requires nested and enable_apicv variables to be set to true. The host CPU must also support posted interrupts. Fixes: 5e2f30b7 "KVM: nVMX: get rid of nested_get_page()" Cc: stable@vger.kernel.org Reviewed-by: NAndy Honig <ahonig@google.com> Signed-off-by: NCfir Cohen <cfir@google.com> Reviewed-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Eduardo Habkost 提交于
commit 0e1b869fff60c81b510c2d00602d778f8f59dd9a upstream. Some guests OSes (including Windows 10) write to MSR 0xc001102c on some cases (possibly while trying to apply a CPU errata). Make KVM ignore reads and writes to that MSR, so the guest won't crash. The MSR is documented as "Execution Unit Configuration (EX_CFG)", at AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h Models 00h-0Fh Processors". Cc: stable@vger.kernel.org Signed-off-by: NEduardo Habkost <ehabkost@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Wanpeng Li 提交于
commit dcbd3e49c2f0b2c2d8a321507ff8f3de4af76d7c upstream. Reported by syzkaller: CPU: 1 PID: 5962 Comm: syz-executor118 Not tainted 4.20.0-rc6+ #374 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline] RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline] RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline] RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline] RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074 Call Trace: kvm_vcpu_ioctl+0x5c8/0x1150 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2596 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT14 msr and triggers scan ioapic logic to load synic vectors into EOI exit bitmap. However, irqchip is not initialized by this simple testcase, ioapic/apic objects should not be accessed. This patch fixes it by also considering whether or not apic is present. Reported-by: syzbot+39810e6c400efadfef71@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 21 12月, 2018 2 次提交
-
-
由 YiFei Zhu 提交于
[ Upstream commit 79c2206d369b87b19ac29cb47601059b6bf5c291 ] An affected screen resolution is 1366 x 768, which width is not divisible by 8, the default font width. On such screens, when longer lines are earlyprintk'ed, overflow-to-next-line can never trigger, due to the left-most x-coordinate of the next character always less than the screen width. Earlyprintk will infinite loop in trying to print the rest of the string but unable to, due to the line being full. This patch makes the trigger consider the right-most x-coordinate, instead of left-most, as the value to compare against the screen width threshold. Signed-off-by: NYiFei Zhu <zhuyifei1999@gmail.com> Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arend van Spriel <arend.vanspriel@broadcom.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Snowberg <eric.snowberg@oracle.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jon Hunter <jonathanh@nvidia.com> Cc: Julien Thierry <julien.thierry@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20181129171230.18699-12-ard.biesheuvel@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Peter Zijlstra 提交于
commit 7aa54be2 upstream. On x86 we cannot do fetch_or() with a single instruction and thus end up using a cmpxchg loop, this reduces determinism. Replace the fetch_or() with a composite operation: tas-pending + load. Using two instructions of course opens a window we previously did not have. Consider the scenario: CPU0 CPU1 CPU2 1) lock trylock -> (0,0,1) 2) lock trylock /* fail */ 3) unlock -> (0,0,0) 4) lock trylock -> (0,0,1) 5) tas-pending -> (0,1,1) load-val <- (0,1,0) from 3 6) clear-pending-set-locked -> (0,0,1) FAIL: _2_ owners where 5) is our new composite operation. When we consider each part of the qspinlock state as a separate variable (as we can when _Q_PENDING_BITS == 8) then the above is entirely possible, because tas-pending will only RmW the pending byte, so the later load is able to observe prior tail and lock state (but not earlier than its own trylock, which operates on the whole word, due to coherence). To avoid this we need 2 things: - the load must come after the tas-pending (obviously, otherwise it can trivially observe prior state). - the tas-pending must be a full word RmW instruction, it cannot be an XCHGB for example, such that we cannot observe other state prior to setting pending. On x86 we can realize this by using "LOCK BTS m32, r32" for tas-pending followed by a regular load. Note that observing later state is not a problem: - if we fail to observe a later unlock, we'll simply spin-wait for that store to become visible. - if we observe a later xchg_tail(), there is no difference from that xchg_tail() having taken place before the tas-pending. Suggested-by: NWill Deacon <will.deacon@arm.com> Reported-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NWill Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: andrea.parri@amarulasolutions.com Cc: longman@redhat.com Fixes: 59fb586b ("locking/qspinlock: Remove unbounded cmpxchg() loop from locking slowpath") Link: https://lkml.kernel.org/r/20181003130957.183726335@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org> [bigeasy: GEN_BINARY_RMWcc macro redo] Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 20 12月, 2018 1 次提交
-
-
由 Masahiro Yamada 提交于
commit 25896d073d8a0403b07e6dec56f58e6c33678207 upstream. It is troublesome to add a diagnostic like this to the Makefile parse stage because the top-level Makefile could be parsed with a stale include/config/auto.conf. Once you are hit by the error about non-retpoline compiler, the compilation still breaks even after disabling CONFIG_RETPOLINE. The easiest fix is to move this check to the "archprepare" like this commit did: 829fe4aa ("x86: Allow generating user-space headers without a compiler") Reported-by: NMeelis Roos <mroos@linux.ee> Tested-by: NMeelis Roos <mroos@linux.ee> Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Acked-by: NZhenzhong Duan <zhenzhong.duan@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support") Link: http://lkml.kernel.org/r/1543991239-18476-1-git-send-email-yamada.masahiro@socionext.com Link: https://lkml.org/lkml/2018/12/4/206Signed-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Cc: Gi-Oh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 17 12月, 2018 4 次提交
-
-
由 Igor Druzhinin 提交于
[ Upstream commit 123664101aa2156d05251704fc63f9bcbf77741a ] This reverts commit b3cf8528. That commit unintentionally broke Xen balloon memory hotplug with "hotplug_unpopulated" set to 1. As long as "System RAM" resource got assigned under a new "Unusable memory" resource in IO/Mem tree any attempt to online this memory would fail due to general kernel restrictions on having "System RAM" resources as 1st level only. The original issue that commit has tried to workaround fa564ad9 ("x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)") also got amended by the following 03a55173 ("x86/PCI: Move and shrink AMD 64-bit window to avoid conflict") which made the original fix to Xen ballooning unnecessary. Signed-off-by: NIgor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Yi Wang 提交于
[ Upstream commit 1e4329ee2c52692ea42cc677fb2133519718b34a ] The inline keyword which is not at the beginning of the function declaration may trigger the following build warnings, so let's fix it: arch/x86/kvm/vmx.c:1309:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] arch/x86/kvm/vmx.c:5947:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] arch/x86/kvm/vmx.c:5985:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] arch/x86/kvm/vmx.c:6023:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] Signed-off-by: NYi Wang <wang.yi59@zte.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Yi Wang 提交于
[ Upstream commit 354cb410d87314e2eda344feea84809e4261570a ] We get the following warnings about empty statements when building with 'W=1': arch/x86/kvm/lapic.c:632:53: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] arch/x86/kvm/lapic.c:1907:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] arch/x86/kvm/lapic.c:1936:65: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] arch/x86/kvm/lapic.c:1975:44: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] Rework the debug helper macro to get rid of these warnings. Signed-off-by: NYi Wang <wang.yi59@zte.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Liran Alon 提交于
[ Upstream commit f48b4711dd6e1cf282f9dfd159c14a305909c97c ] When guest transitions from/to long-mode by modifying MSR_EFER.LMA, the list of shared MSRs to be saved/restored on guest<->host transitions is updated (See vmx_set_efer() call to setup_msrs()). On every entry to guest, vcpu_enter_guest() calls vmx_prepare_switch_to_guest(). This function should also take care of setting the shared MSRs to be saved/restored. However, the function does nothing in case we are already running with loaded guest state (vmx->loaded_cpu_state != NULL). This means that even when guest modifies MSR_EFER.LMA which results in updating the list of shared MSRs, it isn't being taken into account by vmx_prepare_switch_to_guest() because it happens while we are running with loaded guest state. To fix above mentioned issue, add a flag to mark that the list of shared MSRs has been updated and modify vmx_prepare_switch_to_guest() to set shared MSRs when running with host state *OR* list of shared MSRs has been updated. Note that this issue was mistakenly introduced by commit 678e315e ("KVM: vmx: add dedicated utility to access guest's kernel_gs_base") because previously vmx_set_efer() always called vmx_load_host_state() which resulted in vmx_prepare_switch_to_guest() to set shared MSRs. Fixes: 678e315e ("KVM: vmx: add dedicated utility to access guest's kernel_gs_base") Reported-by: NEyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com> Reviewed-by: NLiam Merwick <liam.merwick@oracle.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 13 12月, 2018 3 次提交
-
-
由 Eric Snowberg 提交于
commit b84a64fa upstream. The following commit: d6493401 ("x86/efi: Use efi_exit_boot_services()") introduced a regression on systems with large memory maps causing them to hang on boot. The first "goto get_map" that was removed from exit_boot() ensured there was enough room for the memory map when efi_call_early(exit_boot_services) was called. This happens when (nr_desc > ARRAY_SIZE(params->e820_table). Chain of events: exit_boot() efi_exit_boot_services() efi_get_memory_map <- at this point the mm can't grow over 8 desc priv_func() exit_boot_func() allocate_e820ext() <- new mm grows over 8 desc from e820 alloc efi_call_early(exit_boot_services) <- mm key doesn't match so retry efi_call_early(get_memory_map) <- not enough room for new mm system hangs This patch allocates the e820 buffer before calling efi_exit_boot_services() and fixes the regression. [ mingo: minor cleanliness edits. ] Signed-off-by: NEric Snowberg <eric.snowberg@oracle.com> Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arend van Spriel <arend.vanspriel@broadcom.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jon Hunter <jonathanh@nvidia.com> Cc: Julien Thierry <julien.thierry@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: YiFei Zhu <zhuyifei1999@gmail.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20181129171230.18699-2-ard.biesheuvel@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Masami Hiramatsu 提交于
kprobes/x86: Fix instruction patching corruption when copying more than one RIP-relative instruction commit 43a1b0cb upstream. After copy_optimized_instructions() copies several instructions to the working buffer it tries to fix up the real RIP address, but it adjusts the RIP-relative instruction with an incorrect RIP address for the 2nd and subsequent instructions due to a bug in the logic. This will break the kernel pretty badly (with likely outcomes such as a kernel freeze, a crash, or worse) because probed instructions can refer to the wrong data. For example putting kprobes on cpumask_next() typically hits this bug. cpumask_next() is normally like below if CONFIG_CPUMASK_OFFSTACK=y (in this case nr_cpumask_bits is an alias of nr_cpu_ids): <cpumask_next>: 48 89 f0 mov %rsi,%rax 8b 35 7b fb e2 00 mov 0xe2fb7b(%rip),%esi # ffffffff82db9e64 <nr_cpu_ids> 55 push %rbp ... If we put a kprobe on it and it gets jump-optimized, it gets patched by the kprobes code like this: <cpumask_next>: e9 95 7d 07 1e jmpq 0xffffffffa000207a 7b fb jnp 0xffffffff81f8a2e2 <cpumask_next+2> e2 00 loop 0xffffffff81f8a2e9 <cpumask_next+9> 55 push %rbp This shows that the first two MOV instructions were copied to a trampoline buffer at 0xffffffffa000207a. Here is the disassembled result of the trampoline, skipping the optprobe template instructions: # Dump of assembly code from 0xffffffffa000207a to 0xffffffffa00020ea: 54 push %rsp ... 48 83 c4 08 add $0x8,%rsp 9d popfq 48 89 f0 mov %rsi,%rax 8b 35 82 7d db e2 mov -0x1d24827e(%rip),%esi # 0xffffffff82db9e67 <nr_cpu_ids+3> This dump shows that the second MOV accesses *(nr_cpu_ids+3) instead of the original *nr_cpu_ids. This leads to a kernel freeze because cpumask_next() always returns 0 and for_each_cpu() never ends. Fix this by adding 'len' correctly to the real RIP address while copying. [ mingo: Improved the changelog. ] Reported-by: NMichael Rodin <michael@rodin.online> Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # v4.15+ Fixes: 63fef14f ("kprobes/x86: Make insn buffer always ROX and use text_poke()") Link: http://lkml.kernel.org/r/153504457253.22602.1314289671019919596.stgit@devboxSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Masayoshi Mizuma 提交于
[ Upstream commit 9fd61bc9 ] commit 124049de ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved") breaks movable_node kernel option because it changed the memory gap range to reserved memblock. So, the node is marked as Normal zone even if the SRAT has Hot pluggable affinity. ===================================================================== kernel: BIOS-e820: [mem 0x0000180000000000-0x0000180fffffffff] usable kernel: BIOS-e820: [mem 0x00001c0000000000-0x00001c0fffffffff] usable ... kernel: reserved[0x12]#011[0x0000181000000000-0x00001bffffffffff], 0x000003f000000000 bytes flags: 0x0 ... kernel: ACPI: SRAT: Node 2 PXM 6 [mem 0x180000000000-0x1bffffffffff] hotplug kernel: ACPI: SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug ... kernel: Movable zone start for each node kernel: Node 3: 0x00001c0000000000 kernel: Early memory node ranges ... ===================================================================== The original issue is fixed by the former patches, so let's revert commit 124049de ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"). Link: http://lkml.kernel.org/r/20181002143821.5112-4-msys.mizuma@gmail.comSigned-off-by: NMasayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: NPavel Tatashin <pavel.tatashin@microsoft.com> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 08 12月, 2018 1 次提交
-
-
由 Wei Wang 提交于
commit 30510387 upstream. There is a race condition when accessing kvm->arch.apic_access_page_done. Due to it, x86_set_memory_region will fail when creating the second vcpu for a svm guest. Add a mutex_lock to serialize the accesses to apic_access_page_done. This lock is also used by vmx for the same purpose. Signed-off-by: NWei Wang <wawei@amazon.de> Signed-off-by: NAmadeusz Juskowiak <ajusk@amazon.de> Signed-off-by: NJulian Stecklina <jsteckli@amazon.de> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: NJoerg Roedel <jroedel@suse.de> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 06 12月, 2018 14 次提交
-
-
由 Steven Rostedt (VMware) 提交于
commit 07f7175b43827640d1e69c9eded89aa089a234b4 upstream. The function_graph_enter() function does the work of calling the function graph hook function and the management of the shadow stack, simplifying the work done in the architecture dependent prepare_ftrace_return(). Have x86 use the new code, and remove the shadow stack management as well as having to set up the trace structure. This is needed to prepare for a fix of a design bug on how the curr_ret_stack is used. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: stable@kernel.org Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback") Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Jiri Olsa 提交于
commit 472de49fdc53365c880ab81ae2b5cfdd83db0b06 upstream. Vince reported a crash in the BTS flush code when touching the callchain data, which was supposed to be initialized as an 'early' callchain, but intel_pmu_drain_bts_buffer() does not do that: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 ... Call Trace: <IRQ> intel_pmu_drain_bts_buffer+0x151/0x220 ? intel_get_event_constraints+0x219/0x360 ? perf_assign_events+0xe2/0x2a0 ? select_idle_sibling+0x22/0x3a0 ? __update_load_avg_se+0x1ec/0x270 ? enqueue_task_fair+0x377/0xdd0 ? cpumask_next_and+0x19/0x20 ? load_balance+0x134/0x950 ? check_preempt_curr+0x7a/0x90 ? ttwu_do_wakeup+0x19/0x140 x86_pmu_stop+0x3b/0x90 x86_pmu_del+0x57/0x160 event_sched_out.isra.106+0x81/0x170 group_sched_out.part.108+0x51/0xc0 __perf_event_disable+0x7f/0x160 event_function+0x8c/0xd0 remote_function+0x3c/0x50 flush_smp_call_function_queue+0x35/0xe0 smp_call_function_single_interrupt+0x3a/0xd0 call_function_single_interrupt+0xf/0x20 </IRQ> It was triggered by fuzzer but can be easily reproduced by: # perf record -e cpu/branch-instructions/pu -g -c 1 Peter suggested not to allow branch tracing for precise events: > Now arguably, this is really stupid behaviour. Who in his right mind > wants callchain output on BTS entries. And even if they do, BTS + > precise_ip is nonsensical. > > So in my mind disallowing precise_ip on BTS would be the simplest fix. Suggested-by: NPeter Zijlstra <peterz@infradead.org> Reported-by: NVince Weaver <vincent.weaver@maine.edu> Signed-off-by: NJiri Olsa <jolsa@kernel.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 6cbc304f ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)") Link: http://lkml.kernel.org/r/20181121101612.16272-3-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Jiri Olsa 提交于
commit 67266c1080ad56c31af72b9c18355fde8ccc124a upstream. Currently we check the branch tracing only by checking for the PERF_COUNT_HW_BRANCH_INSTRUCTIONS event of PERF_TYPE_HARDWARE type. But we can define the same event with the PERF_TYPE_RAW type. Changing the intel_pmu_has_bts() code to check on event's final hw config value, so both HW types are covered. Adding unlikely to intel_pmu_has_bts() condition calls, because it was used in the original code in intel_bts_constraints. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20181121101612.16272-2-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Jiri Olsa 提交于
commit ed6101bbf6266ee83e620b19faa7c6ad56bb41ab upstream. Moving branch tracing setup to Intel core object into separate intel_pmu_bts_config function, because it's Intel specific. Suggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NJiri Olsa <jolsa@kernel.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20181121101612.16272-1-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 68239654acafe6aad5a3c1dc7237e60accfebc03 upstream. The sequence fpu->initialized = 1; /* step A */ preempt_disable(); /* step B */ fpu__restore(fpu); preempt_enable(); in __fpu__restore_sig() is racy in regard to a context switch. For 32bit frames, __fpu__restore_sig() prepares the FPU state within fpu->state. To ensure that a context switch (switch_fpu_prepare() in particular) does not modify fpu->state it uses fpu__drop() which sets fpu->initialized to 0. After fpu->initialized is cleared, the CPU's FPU state is not saved to fpu->state during a context switch. The new state is loaded via fpu__restore(). It gets loaded into fpu->state from userland and ensured it is sane. fpu->initialized is then set to 1 in order to avoid fpu__initialize() doing anything (overwrite the new state) which is part of fpu__restore(). A context switch between step A and B above would save CPU's current FPU registers to fpu->state and overwrite the newly prepared state. This looks like a tiny race window but the Kernel Test Robot reported this back in 2016 while we had lazy FPU support. Borislav Petkov made the link between that report and another patch that has been posted. Since the removal of the lazy FPU support, this race goes unnoticed because the warning has been removed. Disable bottom halves around the restore sequence to avoid the race. BH need to be disabled because BH is allowed to run (even with preemption disabled) and might invoke kernel_fpu_begin() by doing IPsec. [ bp: massage commit message a bit. ] Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NIngo Molnar <mingo@kernel.org> Acked-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: kvm ML <kvm@vger.kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: stable@vger.kernel.org Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20181120102635.ddv3fvavxajjlfqk@linutronix.de Link: https://lkml.kernel.org/r/20160226074940.GA28911@pd.tnicSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Borislav Petkov 提交于
commit 60c8144a upstream. Currently, the code sets up the thresholding interrupt vector and only then goes about initializing the thresholding banks. Which is wrong, because an early thresholding interrupt would cause a NULL pointer dereference when accessing those banks and prevent the machine from booting. Therefore, set the thresholding interrupt vector only *after* having initialized the banks successfully. Fixes: 18807ddb ("x86/mce/AMD: Reset Threshold Limit after logging error") Reported-by: NRafał Miłecki <rafal@milecki.pl> Reported-by: NJohn Clemens <clemej@gmail.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Tested-by: NRafał Miłecki <rafal@milecki.pl> Tested-by: NJohn Clemens <john@deater.net> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Cc: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Link: https://lkml.kernel.org/r/20181127101700.2964-1-zajec5@gmail.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=201291Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Luiz Capitulino 提交于
commit a87c99e61236ba8ca962ce97a19fab5ebd588d35 upstream. Apparently, the ple_gap parameter was accidentally removed by commit c8e88717. Add it back. Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com> Cc: stable@vger.kernel.org Fixes: c8e88717Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Wanpeng Li 提交于
commit e97f852fd4561e77721bb9a4e0ea9d98305b1e93 upstream. Reported by syzkaller: BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8 PGD 80000003ec4da067 P4D 80000003ec4da067 PUD 3f7bfa067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 7 PID: 5059 Comm: debug Tainted: G OE 4.19.0-rc5 #16 RIP: 0010:__lock_acquire+0x1a6/0x1990 Call Trace: lock_acquire+0xdb/0x210 _raw_spin_lock+0x38/0x70 kvm_ioapic_scan_entry+0x3e/0x110 [kvm] vcpu_enter_guest+0x167e/0x1910 [kvm] kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm] kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm] do_vfs_ioctl+0xa5/0x690 ksys_ioctl+0x6d/0x80 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x83/0x6e0 entry_SYSCALL_64_after_hwframe+0x49/0xbe The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT6 msr and triggers scan ioapic logic to load synic vectors into EOI exit bitmap. However, irqchip is not initialized by this simple testcase, ioapic/apic objects should not be accessed. This can be triggered by the following program: #define _GNU_SOURCE #include <endian.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/syscall.h> #include <sys/types.h> #include <unistd.h> uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff}; int main(void) { syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0); long res = 0; memcpy((void*)0x20000040, "/dev/kvm", 9); res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0, 0); if (res != -1) r[0] = res; res = syscall(__NR_ioctl, r[0], 0xae01, 0); if (res != -1) r[1] = res; res = syscall(__NR_ioctl, r[1], 0xae41, 0); if (res != -1) r[2] = res; memcpy( (void*)0x20000080, "\x01\x00\x00\x00\x00\x5b\x61\xbb\x96\x00\x00\x40\x00\x00\x00\x00\x01\x00" "\x08\x00\x00\x00\x00\x00\x0b\x77\xd1\x78\x4d\xd8\x3a\xed\xb1\x5c\x2e\x43" "\xaa\x43\x39\xd6\xff\xf5\xf0\xa8\x98\xf2\x3e\x37\x29\x89\xde\x88\xc6\x33" "\xfc\x2a\xdb\xb7\xe1\x4c\xac\x28\x61\x7b\x9c\xa9\xbc\x0d\xa0\x63\xfe\xfe" "\xe8\x75\xde\xdd\x19\x38\xdc\x34\xf5\xec\x05\xfd\xeb\x5d\xed\x2e\xaf\x22" "\xfa\xab\xb7\xe4\x42\x67\xd0\xaf\x06\x1c\x6a\x35\x67\x10\x55\xcb", 106); syscall(__NR_ioctl, r[2], 0x4008ae89, 0x20000080); syscall(__NR_ioctl, r[2], 0xae80, 0); return 0; } This patch fixes it by bailing out scan ioapic if ioapic is not initialized in kernel. Reported-by: NWei Wu <ww9210@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wei Wu <ww9210@gmail.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Wanpeng Li 提交于
commit 38ab012f109caf10f471db1adf284e620dd8d701 upstream. Reported by syzkaller: BUG: unable to handle kernel NULL pointer dereference at 0000000000000014 PGD 800000040410c067 P4D 800000040410c067 PUD 40410d067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 3 PID: 2567 Comm: poc Tainted: G OE 4.19.0-rc5 #16 RIP: 0010:kvm_pv_send_ipi+0x94/0x350 [kvm] Call Trace: kvm_emulate_hypercall+0x3cc/0x700 [kvm] handle_vmcall+0xe/0x10 [kvm_intel] vmx_handle_exit+0xc1/0x11b0 [kvm_intel] vcpu_enter_guest+0x9fb/0x1910 [kvm] kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm] kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm] do_vfs_ioctl+0xa5/0x690 ksys_ioctl+0x6d/0x80 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x83/0x6e0 entry_SYSCALL_64_after_hwframe+0x49/0xbe The reason is that the apic map has not yet been initialized, the testcase triggers pv_send_ipi interface by vmcall which results in kvm->arch.apic_map is dereferenced. This patch fixes it by checking whether or not apic map is NULL and bailing out immediately if that is the case. Fixes: 4180bf1b (KVM: X86: Implement "send IPI" hypercall) Reported-by: NWei Wu <ww9210@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wei Wu <ww9210@gmail.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Liran Alon 提交于
commit bcbfbd8ec21096027f1ee13ce6c185e8175166f6 upstream. kvm_pv_clock_pairing() allocates local var "struct kvm_clock_pairing clock_pairing" on stack and initializes all it's fields besides padding (clock_pairing.pad[]). Because clock_pairing var is written completely (including padding) to guest memory, failure to init struct padding results in kernel info-leak. Fix the issue by making sure to also init the padding with zeroes. Fixes: 55dd00a7 ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall") Reported-by: syzbot+a8ef68d71211ba264f56@syzkaller.appspotmail.com Reviewed-by: NMark Kanda <mark.kanda@oracle.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Leonid Shatz 提交于
commit 326e742533bf0a23f0127d8ea62fb558ba665f08 upstream. Since commit e79f245d ("X86/KVM: Properly update 'tsc_offset' to represent the running guest"), vcpu->arch.tsc_offset meaning was changed to always reflect the tsc_offset value set on active VMCS. Regardless if vCPU is currently running L1 or L2. However, above mentioned commit failed to also change kvm_vcpu_write_tsc_offset() to set vcpu->arch.tsc_offset correctly. This is because vmx_write_tsc_offset() could set the tsc_offset value in active VMCS to given offset parameter *plus vmcs12->tsc_offset*. However, kvm_vcpu_write_tsc_offset() just sets vcpu->arch.tsc_offset to given offset parameter. Without taking into account the possible addition of vmcs12->tsc_offset. (Same is true for SVM case). Fix this issue by changing kvm_x86_ops->write_tsc_offset() to return actually set tsc_offset in active VMCS and modify kvm_vcpu_write_tsc_offset() to set returned value in vcpu->arch.tsc_offset. In addition, rename write_tsc_offset() callback to write_l1_tsc_offset() to make it clear that it is meant to set L1 TSC offset. Fixes: e79f245d ("X86/KVM: Properly update 'tsc_offset' to represent the running guest") Reviewed-by: NLiran Alon <liran.alon@oracle.com> Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com> Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: NLeonid Shatz <leonid.shatz@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Jim Mattson 提交于
commit fd65d3142f734bc4376053c8d75670041903134d upstream. Previously, we only called indirect_branch_prediction_barrier on the logical CPU that freed a vmcb. This function should be called on all logical CPUs that last loaded the vmcb in question. Fixes: 15d45071 ("KVM/x86: Add IBPB support") Reported-by: NNeel Natu <neelnatu@google.com> Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Junaid Shahid 提交于
commit 0e0fee5c539b61fdd098332e0e2cc375d9073706 upstream. When a guest page table is updated via an emulated write, kvm_mmu_pte_write() is called to update the shadow PTE using the just written guest PTE value. But if two emulated guest PTE writes happened concurrently, it is possible that the guest PTE and the shadow PTE end up being out of sync. Emulated writes do not mark the shadow page as unsync-ed, so this inconsistency will not be resolved even by a guest TLB flush (unless the page was marked as unsync-ed at some other point). This is fixed by re-reading the current value of the guest PTE after the MMU lock has been acquired instead of just using the value that was written prior to calling kvm_mmu_pte_write(). Signed-off-by: NJunaid Shahid <junaids@google.com> Reviewed-by: NWanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Thomas Gleixner 提交于
commit 55a97402 upstream Provide the possibility to enable IBPB always in combination with 'prctl' and 'seccomp'. Add the extra command line options and rework the IBPB selection to evaluate the command instead of the mode selected by the STIPB switch case. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NIngo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185006.144047038@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-