- 08 9月, 2020 23 次提交
-
-
由 Joerg Roedel 提交于
Install an exception handler for #VC exception that uses a GHCB. Also add the infrastructure for handling different exit-codes by decoding the instruction that caused the exception and error handling. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-24-joro@8bytes.org
-
由 Joerg Roedel 提交于
The functions are needed to map the GHCB for SEV-ES guests. The GHCB is used for communication with the hypervisor, so its content must not be encrypted. After the GHCB is not needed anymore it must be mapped encrypted again so that the running kernel image can safely re-use the memory. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-23-joro@8bytes.org
-
由 Joerg Roedel 提交于
The function can fail to create an identity mapping, check for that and bail out if it happens. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-22-joro@8bytes.org
-
由 Joerg Roedel 提交于
Call set_sev_encryption_mask() while still on the stage 1 #VC-handler because the stage 2 handler needs the kernel's own page tables to be set up, to which calling set_sev_encryption_mask() is a prerequisite. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-21-joro@8bytes.org
-
由 Joerg Roedel 提交于
Add the first handler for #VC exceptions. At stage 1 there is no GHCB yet because the kernel might still be running on the EFI page table. The stage 1 handler is limited to the MSR-based protocol to talk to the hypervisor and can only support CPUID exit-codes, but that is enough to get to stage 2. [ bp: Zap superfluous newlines after rd/wrmsr instruction mnemonics. ] Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-20-joro@8bytes.org
-
由 Joerg Roedel 提交于
Changing the function to take start and end as parameters instead of start and size simplifies the callers which don't need to calculate the size if they already have start and end. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200907131613.12703-19-joro@8bytes.org
-
由 Joerg Roedel 提交于
With the page-fault handler in place, he identity mapping can be built on-demand. So remove the code which manually creates the mappings and unexport/remove the functions used for it. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200907131613.12703-18-joro@8bytes.org
-
由 Joerg Roedel 提交于
When booted through startup_64(), the kernel keeps running on the EFI page table until the KASLR code sets up its own page table. Without KASLR, the pre-decompression boot code never switches off the EFI page table. Change that by unconditionally switching to a kernel-controlled page table after relocation. This makes sure the kernel can make changes to the mapping when necessary, for example map pages unencrypted in SEV and SEV-ES guests. Also, remove the debug_putstr() calls in initialize_identity_maps() because the function now runs before console_init() is called. [ bp: Massage commit message. ] Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200907131613.12703-17-joro@8bytes.org
-
由 Joerg Roedel 提交于
Install a page-fault handler to add an identity mapping to addresses not yet mapped. Also do some checking whether the error code is sane. This makes non SEV-ES machines use the exception handling infrastructure in the pre-decompressions boot code too, making it less likely to break in the future. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200907131613.12703-16-joro@8bytes.org
-
由 Joerg Roedel 提交于
The file contains only code related to identity-mapped page tables. Rename the file and compile it always in. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200907131613.12703-15-joro@8bytes.org
-
由 Joerg Roedel 提交于
Add code needed to setup an IDT in the early pre-decompression boot-code. The IDT is loaded first in startup_64, which is after EfiExitBootServices() has been called, and later reloaded when the kernel image has been relocated to the end of the decompression area. This allows to setup different IDT handlers before and after the relocation. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-14-joro@8bytes.org
-
由 Joerg Roedel 提交于
The x86-64 ABI defines a red-zone on the stack: The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone. This is not compatible with exception handling, because the IRET frame written by the hardware at the stack pointer and the functions to handle the exception will overwrite the temporary variables of the interrupted function, causing undefined behavior. So disable red-zones for the pre-decompression boot code. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200907131613.12703-13-joro@8bytes.org
-
由 Joerg Roedel 提交于
Add a function to check whether an instruction has a REP prefix. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20200907131613.12703-12-joro@8bytes.org
-
由 Joerg Roedel 提交于
Add a function to the instruction decoder which returns the pt_regs offset of the register specified in the reg field of the modrm byte. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NMasami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20200907131613.12703-11-joro@8bytes.org
-
由 Joerg Roedel 提交于
Factor out the code used to decode an instruction with the correct address and operand sizes to a helper function. No functional changes. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-10-joro@8bytes.org
-
由 Joerg Roedel 提交于
Factor out the code to fetch the instruction from user-space to a helper function. No functional changes. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-9-joro@8bytes.org
-
由 Joerg Roedel 提交于
The inat-tables.c file has some arrays in it that contain pointers to other arrays. These pointers need to be relocated when the kernel image is moved to a different location. The pre-decompression boot-code has no support for applying ELF relocations, so initialize these arrays at runtime in the pre-decompression code to make sure all pointers are correctly initialized. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NMasami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20200907131613.12703-8-joro@8bytes.org
-
由 Joerg Roedel 提交于
Move the definition of the x86 page-fault error code bits to a new header file asm/trap_pf.h. This makes it easier to include them into pre-decompression boot code. No functional changes. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-7-joro@8bytes.org
-
由 Tom Lendacky 提交于
Add CPU feature detection for Secure Encrypted Virtualization with Encrypted State. This feature enhances SEV by also encrypting the guest register state, making it in-accessible to the hypervisor. Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-6-joro@8bytes.org
-
由 Borislav Petkov 提交于
Use the shorthand to make it more readable. No functional changes. Signed-off-by: NBorislav Petkov <bp@alien8.de> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-5-joro@8bytes.org
-
由 Joerg Roedel 提交于
Building a correct GHCB for the hypervisor requires setting valid bits in the GHCB. Simplify that process by providing accessor functions to set values and to update the valid bitmap and to check the valid bitmap in KVM. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-4-joro@8bytes.org
-
由 Tom Lendacky 提交于
Extend the vmcb_safe_area with SEV-ES fields and add a new 'struct ghcb' which will be used for guest-hypervisor communication. Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-3-joro@8bytes.org
-
由 Joerg Roedel 提交于
Do not allocate a vmcb_control_area and a vmcb_save_area on the stack, as these structures will become larger with future extenstions of SVM and thus the svm_set_nested_state() function will become a too large stack frame. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-2-joro@8bytes.org
-
- 04 9月, 2020 5 次提交
-
-
由 Thomas Gleixner 提交于
Andy reported that the syscall treacing for 32bit fast syscall fails: # ./tools/testing/selftests/x86/ptrace_syscall_32 ... [RUN] SYSEMU [FAIL] Initial args are wrong (nr=224, args=10 11 12 13 14 4289172732) ... [RUN] SYSCALL [FAIL] Initial args are wrong (nr=29, args=0 0 0 0 0 4289172732) The eason is that the conversion to generic entry code moved the retrieval of the sixth argument (EBP) after the point where the syscall entry work runs, i.e. ptrace, seccomp, audit... Unbreak it by providing a split up version of syscall_enter_from_user_mode(). - syscall_enter_from_user_mode_prepare() establishes state and enables interrupts - syscall_enter_from_user_mode_work() runs the entry work Replace the call to syscall_enter_from_user_mode() in the 32bit fast syscall C-entry with the split functions and stick the EBP retrieval between them. Fixes: 27d6b4d1 ("x86/entry: Use generic syscall entry function") Reported-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/87k0xdjbtt.fsf@nanos.tec.linutronix.de
-
由 Andy Lutomirski 提交于
Trying to clear DR7 around a #DB from usermode malfunctions if the tasks schedules when delivering SIGTRAP. Rather than trying to define a special no-recursion region, just allow a single level of recursion. The same mechanism is used for NMI, and it hasn't caused any problems yet. Fixes: 9f58fdde ("x86/db: Split out dr6/7 handling") Reported-by: NKyle Huey <me@kylehuey.com> Debugged-by: NJosh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NDaniel Thompson <daniel.thompson@linaro.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/8b9bd05f187231df008d48cf818a6a311cbd5c98.1597882384.git.luto@kernel.org Link: https://lore.kernel.org/r/20200902133200.726584153@infradead.org
-
由 Peter Zijlstra 提交于
The WARN added in commit 3c73b81a ("x86/entry, selftests: Further improve user entry sanity checks") unconditionally triggers on a IVB machine because it does not support SMAP. For !SMAP hardware the CLAC/STAC instructions are patched out and thus if userspace sets AC, it is still have set after entry. Fixes: 3c73b81a ("x86/entry, selftests: Further improve user entry sanity checks") Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NDaniel Thompson <daniel.thompson@linaro.org> Acked-by: NAndy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200902133200.666781610@infradead.org
-
由 Vamshi K Sthambamkadi 提交于
On i386, the order of parameters passed on regs is eax,edx,and ecx (as per regparm(3) calling conventions). Change the mapping in regs_get_kernel_argument(), so that arg1=ax arg2=dx, and arg3=cx. Running the selftests testcase kprobes_args_use.tc shows the result as passed. Fixes: 3c88ee19 ("x86: ptrace: Add function argument access API") Signed-off-by: NVamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NMasami Hiramatsu <mhiramat@kernel.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200828113242.GA1424@cosmos
-
由 Huang Ying 提交于
Commit: cc9aec03 ("x86/numa_emulation: Introduce uniform split capability") uses "-1" as the starting node ID, which causes the strange kernel log as follows, when "numa=fake=32G" is added to the kernel command line: Faking node -1 at [mem 0x0000000000000000-0x0000000893ffffff] (35136MB) Faking node 0 at [mem 0x0000001840000000-0x000000203fffffff] (32768MB) Faking node 1 at [mem 0x0000000894000000-0x000000183fffffff] (64192MB) Faking node 2 at [mem 0x0000002040000000-0x000000283fffffff] (32768MB) Faking node 3 at [mem 0x0000002840000000-0x000000303fffffff] (32768MB) And finally the kernel crashes: BUG: Bad page state in process swapper pfn:00011 page:(____ptrval____) refcount:0 mapcount:1 mapping:(____ptrval____) index:0x55cd7e44b270 pfn:0x11 failed to read mapping contents, not a valid kernel address? flags: 0x5(locked|uptodate) raw: 0000000000000005 000055cd7e44af30 000055cd7e44af50 0000000100000006 raw: 000055cd7e44b270 000055cd7e44b290 0000000000000000 000055cd7e44b510 page dumped because: page still charged to cgroup page->mem_cgroup:000055cd7e44b510 Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.9.0-rc2 #1 Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 Call Trace: dump_stack+0x57/0x80 bad_page.cold+0x63/0x94 __free_pages_ok+0x33f/0x360 memblock_free_all+0x127/0x195 mem_init+0x23/0x1f5 start_kernel+0x219/0x4f5 secondary_startup_64+0xb6/0xc0 Fix this bug via using 0 as the starting node ID. This restores the original behavior before cc9aec03. [ mingo: Massaged the changelog. ] Fixes: cc9aec03 ("x86/numa_emulation: Introduce uniform split capability") Signed-off-by: N"Huang, Ying" <ying.huang@intel.com> Signed-off-by: NIngo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200904061047.612950-1-ying.huang@intel.com
-
- 03 9月, 2020 2 次提交
-
-
由 Joerg Roedel 提交于
One can not simply remove vmalloc faulting on x86-32. Upstream commit: 7f0a002b ("x86/mm: remove vmalloc faulting") removed it on x86 alltogether because previously the arch_sync_kernel_mappings() interface was introduced. This interface added synchronization of vmalloc/ioremap page-table updates to all page-tables in the system at creation time and was thought to make vmalloc faulting obsolete. But that assumption was incredibly naive. It turned out that there is a race window between the time the vmalloc or ioremap code establishes a mapping and the time it synchronizes this change to other page-tables in the system. During this race window another CPU or thread can establish a vmalloc mapping which uses the same intermediate page-table entries (e.g. PMD or PUD) and does no synchronization in the end, because it found all necessary mappings already present in the kernel reference page-table. But when these intermediate page-table entries are not yet synchronized, the other CPU or thread will continue with a vmalloc address that is not yet mapped in the page-table it currently uses, causing an unhandled page fault and oops like below: BUG: unable to handle page fault for address: fe80c000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page *pde = 33183067 *pte = a8648163 Oops: 0002 [#1] SMP CPU: 1 PID: 13514 Comm: cve-2017-17053 Tainted: G ... Call Trace: ldt_dup_context+0x66/0x80 dup_mm+0x2b3/0x480 copy_process+0x133b/0x15c0 _do_fork+0x94/0x3e0 __ia32_sys_clone+0x67/0x80 __do_fast_syscall_32+0x3f/0x70 do_fast_syscall_32+0x29/0x60 do_SYSENTER_32+0x15/0x20 entry_SYSENTER_32+0x9f/0xf2 EIP: 0xb7eef549 So the arch_sync_kernel_mappings() interface is racy, but removing it would mean to re-introduce the vmalloc_sync_all() interface, which is even more awful. Keep arch_sync_kernel_mappings() in place and catch the race condition in the page-fault handler instead. Do a partial revert of above commit to get vmalloc faulting on x86-32 back in place. Fixes: 7f0a002b ("x86/mm: remove vmalloc faulting") Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NIngo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200902155904.17544-1-joro@8bytes.org
-
由 Arvind Sankar 提交于
When CONFIG_RETPOLINE is disabled, Clang uses a jump table for the switch statement in cmdline_find_option (jump tables are disabled when CONFIG_RETPOLINE is enabled). This function is called very early in boot from sme_enable() if CONFIG_AMD_MEM_ENCRYPT is enabled. At this time, the kernel is still executing out of the identity mapping, but the jump table will contain virtual addresses. Fix this by disabling jump tables for cmdline.c when AMD_MEM_ENCRYPT is enabled. Signed-off-by: NArvind Sankar <nivedita@alum.mit.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200903023056.3914690-1-nivedita@alum.mit.edu
-
- 31 8月, 2020 1 次提交
-
-
由 Cathy Zhang 提交于
TSX suspend load tracking instruction is supported by the Intel uarch Sapphire Rapids. It aims to give a way to choose which memory accesses do not need to be tracked in the TSX read set. It's availability is indicated as CPUID.(EAX=7,ECX=0):EDX[bit 16]. Expose TSX Suspend Load Address Tracking feature in KVM CPUID, so KVM could pass this information to guests and they can make use of this feature accordingly. Signed-off-by: NCathy Zhang <cathy.zhang@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NTony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1598316478-23337-3-git-send-email-cathy.zhang@intel.com
-
- 30 8月, 2020 1 次提交
-
-
由 Kyung Min Park 提交于
Intel TSX suspend load tracking instructions aim to give a way to choose which memory accesses do not need to be tracked in the TSX read set. Add TSX suspend load tracking CPUID feature flag TSXLDTRK for enumeration. A processor supports Intel TSX suspend load address tracking if CPUID.0x07.0x0:EDX[16] is present. Two instructions XSUSLDTRK, XRESLDTRK are available when this feature is present. The CPU feature flag is shown as "tsxldtrk" in /proc/cpuinfo. Signed-off-by: NKyung Min Park <kyung.min.park@intel.com> Signed-off-by: NCathy Zhang <cathy.zhang@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NTony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1598316478-23337-2-git-send-email-cathy.zhang@intel.com
-
- 27 8月, 2020 2 次提交
-
-
由 Thomas Gleixner 提交于
Several people reported that 5.8 broke the interrupt affinity setting mechanism. The consolidation of the entry code reused the regular exception entry code for device interrupts and changed the way how the vector number is conveyed from ptregs->orig_ax to a function argument. The low level entry uses the hardware error code slot to push the vector number onto the stack which is retrieved from there into a function argument and the slot on stack is set to -1. The reason for setting it to -1 is that the error code slot is at the position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax indicates that the entry came via a syscall. If it's not set to a negative value then a signal delivery on return to userspace would try to restart a syscall. But there are other places which rely on pt_regs::orig_ax being a valid indicator for syscall entry. But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the interrupt affinity setting mechanism, which was overlooked when this change was made. Moving interrupts on x86 happens in several steps. A new vector on a different CPU is allocated and the relevant interrupt source is reprogrammed to that. But that's racy and there might be an interrupt already in flight to the old vector. So the old vector is preserved until the first interrupt arrives on the new vector and the new target CPU. Once that happens the old vector is cleaned up, but this cleanup still depends on the vector number being stored in pt_regs::orig_ax, which is now -1. That -1 makes the check for cleanup: pt_regs::orig_ax == new_vector always false. As a consequence the interrupt is moved once, but then it cannot be moved anymore because the cleanup of the old vector never happens. There would be several ways to convey the vector information to that place in the guts of the interrupt handling, but on deeper inspection it turned out that this check is pointless and a leftover from the old affinity model of X86 which supported multi-CPU affinities. Under this model it was possible that an interrupt had an old and a new vector on the same CPU, so the vector match was required. Under the new model the effective affinity of an interrupt is always a single CPU from the requested affinity mask. If the affinity mask changes then either the interrupt stays on the CPU and on the same vector when that CPU is still in the new affinity mask or it is moved to a different CPU, but it is never moved to a different vector on the same CPU. Ergo the cleanup check for the matching vector number is not required and can be removed which makes the dependency on pt_regs:orig_ax go away. The remaining check for new_cpu == smp_processsor_id() is completely sufficient. If it matches then the interrupt was successfully migrated and the cleanup can proceed. For paranoia sake add a warning into the vector assignment code to validate that the assumption of never moving to a different vector on the same CPU holds. Fixes: 633260fa ("x86/irq: Convey vector as argument and not in ptregs") Reported-by: NAlex bykov <alex.bykov@scylladb.com> Reported-by: NAvi Kivity <avi@scylladb.com> Reported-by: NAlexander Graf <graf@amazon.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NAlexander Graf <graf@amazon.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87wo1ltaxz.fsf@nanos.tec.linutronix.de
-
由 Ashok Raj 提交于
There is a race when taking a CPU offline. Current code looks like this: native_cpu_disable() { ... apic_soft_disable(); /* * Any existing set bits for pending interrupt to * this CPU are preserved and will be sent via IPI * to another CPU by fixup_irqs(). */ cpu_disable_common(); { .... /* * Race window happens here. Once local APIC has been * disabled any new interrupts from the device to * the old CPU are lost */ fixup_irqs(); // Too late to capture anything in IRR. ... } } The fix is to disable the APIC *after* cpu_disable_common(). Testing was done with a USB NIC that provided a source of frequent interrupts. A script migrated interrupts to a specific CPU and then took that CPU offline. Fixes: 60dcaad5 ("x86/hotplug: Silence APIC and NMI when CPU is dead") Reported-by: NEvan Green <evgreen@chromium.org> Signed-off-by: NAshok Raj <ashok.raj@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NMathias Nyman <mathias.nyman@linux.intel.com> Tested-by: NEvan Green <evgreen@chromium.org> Reviewed-by: NEvan Green <evgreen@chromium.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/ Link: https://lore.kernel.org/r/1598501530-45821-1-git-send-email-ashok.raj@intel.com
-
- 26 8月, 2020 3 次提交
-
-
由 Peter Zijlstra 提交于
Unused remnants Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: NMarco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.487040689@infradead.org
-
由 Peter Zijlstra 提交于
Remove trace_cpu_idle() from the arch_cpu_idle() implementations and put it in the generic code, right before disabling RCU. Gets rid of more trace_*_rcuidle() users. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: NMarco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.428433395@infradead.org
-
由 Peter Zijlstra 提交于
This allows moving the leave_mm() call into generic code before rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: NMarco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org
-
- 24 8月, 2020 1 次提交
-
-
由 Gustavo A. R. Silva 提交于
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-throughSigned-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
-
- 22 8月, 2020 1 次提交
-
-
由 Will Deacon 提交于
The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block. Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: NWill Deacon <will@kernel.org> Message-Id: <20200811102725.7121-2-will@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 8月, 2020 1 次提交
-
-
由 Sean Christopherson 提交于
KVM has an optmization to avoid expensive MRS read/writes on VMENTER/EXIT. It caches the MSR values and restores them either when leaving the run loop, on preemption or when going out to user space. The affected MSRs are not required for kernel context operations. This changed with the recently introduced mechanism to handle FSGSBASE in the paranoid entry code which has to retrieve the kernel GSBASE value by accessing per CPU memory. The mechanism needs to retrieve the CPU number and uses either LSL or RDPID if the processor supports it. Unfortunately RDPID uses MSR_TSC_AUX which is in the list of cached and lazily restored MSRs, which means between the point where the guest value is written and the point of restore, MSR_TSC_AUX contains a random number. If an NMI or any other exception which uses the paranoid entry path happens in such a context, then RDPID returns the random guest MSR_TSC_AUX value. As a consequence this reads from the wrong memory location to retrieve the kernel GSBASE value. Kernel GS is used to for all regular this_cpu_*() operations. If the GSBASE in the exception handler points to the per CPU memory of a different CPU then this has the obvious consequences of data corruption and crashes. As the paranoid entry path is the only place which accesses MSR_TSX_AUX (via RDPID) and the fallback via LSL is not significantly slower, remove the RDPID alternative from the entry path and always use LSL. The alternative would be to write MSR_TSC_AUX on every VMENTER and VMEXIT which would be inflicting massive overhead on that code path. [ tglx: Rewrote changelog ] Fixes: eaad9812 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro") Reported-by: NTom Lendacky <thomas.lendacky@amd.com> Debugged-by: NTom Lendacky <thomas.lendacky@amd.com> Suggested-by: NAndy Lutomirski <luto@kernel.org> Suggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200821105229.18938-1-pbonzini@redhat.com
-