- 18 11月, 2020 2 次提交
-
-
由 Zhenzhong Duan 提交于
"intel_iommu=off" command line is used to disable iommu but iommu is force enabled in a tboot system for security reason. However for better performance on high speed network device, a new option "intel_iommu=tboot_noforce" is introduced to disable the force on. By default kernel should panic if iommu init fail in tboot for security reason, but it's unnecessory if we use "intel_iommu=tboot_noforce,off". Fix the code setting force_on and move intel_iommu_tboot_noforce from tboot code to intel iommu code. Fixes: 7304e8f2 ("iommu/vt-d: Correctly disable Intel IOMMU force on") Signed-off-by: NZhenzhong Duan <zhenzhong.duan@gmail.com> Tested-by: NLukasz Hawrylko <lukasz.hawrylko@linux.intel.com> Acked-by: NLu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20201110071908.3133-1-zhenzhong.duan@gmail.comSigned-off-by: NWill Deacon <will@kernel.org>
-
由 Thomas Gleixner 提交于
sysrq-t ends up invoking show_opcodes() for each task which tries to access the user space code of other processes, which is obviously bogus. It either manages to dump where the foreign task's regs->ip points to in a valid mapping of the current task or triggers a pagefault and prints "Code: Bad RIP value.". Both is just wrong. Add a safeguard in copy_code() and check whether the @regs pointer matches currents pt_regs. If not, do not even try to access it. While at it, add commentary why using copy_from_user_nmi() is safe in copy_code() even if the function name suggests otherwise. Reported-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Acked-by: NOleg Nesterov <oleg@redhat.com> Tested-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201117202753.667274723@linutronix.de
-
- 17 11月, 2020 1 次提交
-
-
由 Chen Yu 提交于
Currently, scan_microcode() leverages microcode_matches() to check if the microcode matches the CPU by comparing the family and model. However, the processor stepping and flags of the microcode signature should also be considered when saving a microcode patch for early update. Use find_matching_signature() in scan_microcode() and get rid of the now-unused microcode_matches() which is a good cleanup in itself. Complete the verification of the patch being saved for early loading in save_microcode_patch() directly. This needs to be done there too because save_mc_for_early() will call save_microcode_patch() too. The second reason why this needs to be done is because the loader still tries to support, at least hypothetically, mixed-steppings systems and thus adds all patches to the cache that belong to the same CPU model albeit with different steppings. For example: microcode: CPU: sig=0x906ec, pf=0x2, rev=0xd6 microcode: mc_saved[0]: sig=0x906e9, pf=0x2a, rev=0xd6, total size=0x19400, date = 2020-04-23 microcode: mc_saved[1]: sig=0x906ea, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27 microcode: mc_saved[2]: sig=0x906eb, pf=0x2, rev=0xd6, total size=0x19400, date = 2020-04-23 microcode: mc_saved[3]: sig=0x906ec, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27 microcode: mc_saved[4]: sig=0x906ed, pf=0x22, rev=0xd6, total size=0x19400, date = 2020-04-23 The patch which is being saved for early loading, however, can only be the one which fits the CPU this runs on so do the signature verification before saving. [ bp: Do signature verification in save_microcode_patch() and rewrite commit message. ] Fixes: ec400dde ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU") Signed-off-by: NChen Yu <yu.c.chen@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=208535 Link: https://lkml.kernel.org/r/20201113015923.13960-1-yu.c.chen@intel.com
-
- 13 11月, 2020 1 次提交
-
-
由 Mike Travis 提交于
A test shows that the output contains a space: # cat /proc/sgi_uv/archtype NSGI4 U/UVX Remove that embedded space by copying the "trimmed" buffer instead of the untrimmed input character list. Use sizeof to remove size dependency on copy out length. Increase output buffer size by one character just in case BIOS sends an 8 character string for archtype. Fixes: 1e61f5a9 ("Add and decode Arch Type in UVsystab") Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20201111010418.82133-1-mike.travis@hpe.com
-
- 10 11月, 2020 1 次提交
-
-
由 Peter Zijlstra 提交于
struct perf_sample_data lives on-stack, we should be careful about it's size. Furthermore, the pt_regs copy in there is only because x86_64 is a trainwreck, solve it differently. Reported-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Tested-by: NSteven Rostedt <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org
-
- 07 11月, 2020 3 次提交
-
-
由 Mike Travis 提交于
Testing shows a problem in that UV5 hubless systems were not being recognized. Add them to the list of OEM IDs checked. Fixes: 6c779442 ("Add UV5 direct references") Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201105222741.157029-4-mike.travis@hpe.com
-
由 Mike Travis 提交于
Testing shows that trailing spaces caused problems with the OEM_ID and the OEM_TABLE_ID. One being that the OEM_ID would not string compare correctly. Another the OEM_ID and OEM_TABLE_ID would be concatenated in the printout. Remove any trailing spaces. Fixes: 1e61f5a9 ("Add and decode Arch Type in UVsystab") Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201105222741.157029-3-mike.travis@hpe.com
-
由 Mike Travis 提交于
Testing shows a problem in that the OEM_TABLE_ID was missing for hubless systems. This is used to determine the APIC type (legacy or extended). Add the OEM_TABLE_ID to the early hubless processing. Fixes: 1e61f5a9 ("Add and decode Arch Type in UVsystab") Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201105222741.157029-2-mike.travis@hpe.com
-
- 06 11月, 2020 1 次提交
-
-
由 Anand K Mistry 提交于
On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON, STIBP is set to on and spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED At the same time, IBPB can be set to conditional. However, this leads to the case where it's impossible to turn on IBPB for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED condition leads to a return before the task flag is set. Similarly, ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to conditional. More generally, the following cases are possible: 1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb 2. STIBP = on && IBPB = conditional for AMD CPUs with X86_FEATURE_AMD_STIBP_ALWAYS_ON The first case functions correctly today, but only because spectre_v2_user_ibpb isn't updated to reflect the IBPB mode. At a high level, this change does one thing. If either STIBP or IBPB is set to conditional, allow the prctl to change the task flag. Also, reflect that capability when querying the state. This isn't perfect since it doesn't take into account if only STIBP or IBPB is unconditionally on. But it allows the conditional feature to work as expected, without affecting the unconditional one. [ bp: Massage commit message and comment; space out statements for better readability. ] Fixes: 21998a35 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") Signed-off-by: NAnand K Mistry <amistry@google.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NTom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid
-
- 30 10月, 2020 3 次提交
-
-
由 Joerg Roedel 提交于
MMIO memory is usually not mapped encrypted, so there is no reason to support emulated MMIO when it is mapped encrypted. Prevent a possible hypervisor attack where a RAM page is mapped as an MMIO page in the nested page-table, so that any guest access to it will trigger a #VC exception and leak the data on that page to the hypervisor via the GHCB (like with valid MMIO). On the read side this attack would allow the HV to inject data into the guest. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NTom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20201028164659.27002-6-joro@8bytes.org
-
由 Joerg Roedel 提交于
When SEV is enabled, the kernel requests the C-bit position again from the hypervisor to build its own page-table. Since the hypervisor is an untrusted source, the C-bit position needs to be verified before the kernel page-table is used. Call sev_verify_cbit() before writing the CR3. [ bp: Massage. ] Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NTom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20201028164659.27002-5-joro@8bytes.org
-
由 Joerg Roedel 提交于
Check whether the hypervisor reported the correct C-bit when running as an SEV guest. Using a wrong C-bit position could be used to leak sensitive data from the guest to the hypervisor. The check function is in a separate file: arch/x86/kernel/sev_verify_cbit.S so that it can be re-used in the running kernel image. [ bp: Massage. ] Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NTom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20201028164659.27002-4-joro@8bytes.org
-
- 29 10月, 2020 1 次提交
-
-
由 Joerg Roedel 提交于
The early #VC handler which doesn't have a GHCB can only handle CPUID exit codes. It is needed by the early boot code to handle #VC exceptions raised in verify_cpu() and to get the position of the C-bit. But the CPUID information comes from the hypervisor which is untrusted and might return results which trick the guest into the no-SEV boot path with no C-bit set in the page-tables. All data written to memory would then be unencrypted and could leak sensitive data to the hypervisor. Add sanity checks to the early #VC handler to make sure the hypervisor can not pretend that SEV is disabled. [ bp: Massage a bit. ] Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NTom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20201028164659.27002-3-joro@8bytes.org
-
- 28 10月, 2020 3 次提交
-
-
由 Peter Zijlstra 提交于
Commit d53d9bc0 ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6") changed the semantics of the variable from random collection of bits, to exactly only those bits that ptrace() needs. Unfortunately this lost DR_STEP for PTRACE_{BLOCK,SINGLE}STEP. Furthermore, it turns out that userspace expects DR_STEP to be unconditionally available, even for manual TF usage outside of PTRACE_{BLOCK,SINGLE}_STEP. Fixes: d53d9bc0 ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6") Reported-by: NKyle Huey <me@kylehuey.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: Kyle Huey <me@kylehuey.com> Link: https://lore.kernel.org/r/20201027183330.GM2628@hirez.programming.kicks-ass.net
-
由 Peter Zijlstra 提交于
The ->virtual_dr6 is the value used by ptrace_{get,set}_debugreg(6). A kernel #DB clearing it could mean spurious malfunction of ptrace() expectations. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: Kyle Huey <me@kylehuey.com> Link: https://lore.kernel.org/r/20201027093608.028952500@infradead.org
-
由 Peter Zijlstra 提交于
The SDM states that #DB clears DEBUGCTLMSR_BTF, this means that when the bit is set for userspace (TIF_BLOCKSTEP) and a kernel #DB happens first, the BTF bit meant for userspace execution is lost. Have the kernel #DB handler restore the BTF bit when it was requested for userspace. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: Kyle Huey <me@kylehuey.com> Link: https://lore.kernel.org/r/20201027093607.956147736@infradead.org
-
- 26 10月, 2020 1 次提交
-
-
由 Joe Perches 提交于
Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.plSigned-off-by: NJoe Perches <joe@perches.com> Reviewed-by: NNick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: NMiguel Ojeda <ojeda@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 10月, 2020 1 次提交
-
-
由 Juergen Gross 提交于
When running in lazy TLB mode the currently active page tables might be the ones of a previous process, e.g. when running a kernel thread. This can be problematic in case kernel code is being modified via text_poke() in a kernel thread, and on another processor exit_mmap() is active for the process which was running on the first cpu before the kernel thread. As text_poke() is using a temporary address space and the former address space (obtained via cpu_tlbstate.loaded_mm) is restored afterwards, there is a race possible in case the cpu on which exit_mmap() is running wants to make sure there are no stale references to that address space on any cpu active (this e.g. is required when running as a Xen PV guest, where this problem has been observed and analyzed). In order to avoid that, drop off TLB lazy mode before switching to the temporary address space. Fixes: cefa929c ("x86/mm: Introduce temporary mm structs") Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201009144225.12019-1-jgross@suse.com
-
- 19 10月, 2020 1 次提交
-
-
由 Arvind Sankar 提交于
On 64-bit, the startup_64_setup_env() function added in 866b556e ("x86/head/64: Install startup GDT") has stack protection enabled because of set_bringup_idt_handler(). This happens when CONFIG_STACKPROTECTOR_STRONG is enabled. It also currently needs CONFIG_AMD_MEM_ENCRYPT enabled because then set_bringup_idt_handler() is not an empty stub but that might change in the future, when the other vendor adds their similar technology. At this point, %gs is not yet initialized, and this doesn't cause a crash only because the #PF handler from the decompressor stub is still installed and handles the page fault. Disable stack protection for the whole file, and do it on 32-bit as well to avoid surprises. [ bp: Extend commit message with the exact explanation how it happens. ] Signed-off-by: NArvind Sankar <nivedita@alum.mit.edu> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NJoerg Roedel <jroedel@suse.de> Link: https://lkml.kernel.org/r/20201008191623.2881677-6-nivedita@alum.mit.edu
-
- 18 10月, 2020 1 次提交
-
-
由 Jens Axboe 提交于
A previous commit changed the notification mode from true/false to an int, allowing notify-no, notify-yes, or signal-notify. This was backwards compatible in the sense that any existing true/false user would translate to either 0 (on notification sent) or 1, the latter which mapped to TWA_RESUME. TWA_SIGNAL was assigned a value of 2. Clean this up properly, and define a proper enum for the notification mode. Now we have: - TWA_NONE. This is 0, same as before the original change, meaning no notification requested. - TWA_RESUME. This is 1, same as before the original change, meaning that we use TIF_NOTIFY_RESUME. - TWA_SIGNAL. This uses TIF_SIGPENDING/JOBCTL_TASK_WORK for the notification. Clean up all the callers, switching their 0/1/false/true to using the appropriate TWA_* mode for notifications. Fixes: e91b4816 ("task_work: teach task_work_add() to do signal_wake_up()") Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 10月, 2020 2 次提交
-
-
由 Michael Kelley 提交于
On ARM64, Hyper-V now specifies the interrupt to be used by VMbus in the ACPI DSDT. This information is not used on x86 because the interrupt vector must be hardcoded. But update the generic VMbus driver to do the parsing and pass the information to the architecture specific code that sets up the Linux IRQ. Update consumers of the interrupt to get it from an architecture specific function. Signed-off-by: NMichael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1597434304-40631-1-git-send-email-mikelley@microsoft.comSigned-off-by: NWei Liu <wei.liu@kernel.org>
-
由 Jiri Slaby 提交于
GCC 10 optimizes the scheduler code differently than its predecessors. When CONFIG_DEBUG_SECTION_MISMATCH=y, the Makefile forces GCC not to inline some functions (-fno-inline-functions-called-once). Before GCC 10, "no-inlined" __schedule() starts with the usual prologue: push %bp mov %sp, %bp So the ORC unwinder simply picks stack pointer from %bp and unwinds from __schedule() just perfectly: $ cat /proc/1/stack [<0>] ep_poll+0x3e9/0x450 [<0>] do_epoll_wait+0xaa/0xc0 [<0>] __x64_sys_epoll_wait+0x1a/0x20 [<0>] do_syscall_64+0x33/0x40 [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 But now, with GCC 10, there is no %bp prologue in __schedule(): $ cat /proc/1/stack <nothing> The ORC entry of the point in __schedule() is: sp:sp+88 bp:last_sp-48 type:call end:0 In this case, nobody subtracts sizeof "struct inactive_task_frame" in __unwind_start(). The struct is put on the stack by __switch_to_asm() and only then __switch_to_asm() stores %sp to task->thread.sp. But we start unwinding from a point in __schedule() (stored in frame->ret_addr by 'call') and not in __switch_to_asm(). So for these example values in __unwind_start(): sp=ffff94b50001fdc8 bp=ffff8e1f41d29340 ip=__schedule+0x1f0 The stack is: ffff94b50001fdc8: ffff8e1f41578000 # struct inactive_task_frame ffff94b50001fdd0: 0000000000000000 ffff94b50001fdd8: ffff8e1f41d29340 ffff94b50001fde0: ffff8e1f41611d40 # ... ffff94b50001fde8: ffffffff93c41920 # bx ffff94b50001fdf0: ffff8e1f41d29340 # bp ffff94b50001fdf8: ffffffff9376cad0 # ret_addr (and end of the struct) 0xffffffff9376cad0 is __schedule+0x1f0 (after the call to __switch_to_asm). Now follow those 88 bytes from the ORC entry (sp+88). The entry is correct, __schedule() really pushes 48 bytes (8*7) + 32 bytes via subq to store some local values (like 4U below). So to unwind, look at the offset 88-sizeof(long) = 0x50 from here: ffff94b50001fe00: ffff8e1f41578618 ffff94b50001fe08: 00000cc000000255 ffff94b50001fe10: 0000000500000004 ffff94b50001fe18: 7793fab6956b2d00 # NOTE (see below) ffff94b50001fe20: ffff8e1f41578000 ffff94b50001fe28: ffff8e1f41578000 ffff94b50001fe30: ffff8e1f41578000 ffff94b50001fe38: ffff8e1f41578000 ffff94b50001fe40: ffff94b50001fed8 ffff94b50001fe48: ffff8e1f41577ff0 ffff94b50001fe50: ffffffff9376cf12 Here ^^^^^^^^^^^^^^^^ is the correct ret addr from __schedule(). It translates to schedule+0x42 (insn after a call to __schedule()). BUT, unwind_next_frame() tries to take the address starting from 0xffff94b50001fdc8. That is exactly from thread.sp+88-sizeof(long) = 0xffff94b50001fdc8+88-8 = 0xffff94b50001fe18, which is garbage marked as NOTE above. So this quits the unwinding as 7793fab6956b2d00 is obviously not a kernel address. There was a fix to skip 'struct inactive_task_frame' in unwind_get_return_address_ptr in the following commit: 187b96db ("x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks") But we need to skip the struct already in the unwinder proper. So subtract the size (increase the stack pointer) of the structure in __unwind_start() directly. This allows for removal of the code added by commit 187b96db completely, as the address is now at '(unsigned long *)state->sp - 1', the same as in the generic case. [ mingo: Cleaned up the changelog a bit, for better readability. ] Fixes: ee9f8fce ("x86/unwind: Add the ORC unwinder") Bug: https://bugzilla.suse.com/show_bug.cgi?id=1176907Signed-off-by: NJiri Slaby <jslaby@suse.cz> Signed-off-by: NIngo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20201014053051.24199-1-jslaby@suse.cz
-
- 14 10月, 2020 5 次提交
-
-
由 Kairui Song 提交于
kexec_file_load() currently reuses the old boot_params.screen_info, but if drivers have change the hardware state, boot_param.screen_info could contain invalid info. For example, the video type might be no longer VGA, or the frame buffer address might be changed. If the kexec kernel keeps using the old screen_info, kexec'ed kernel may attempt to write to an invalid framebuffer memory region. There are two screen_info instances globally available, boot_params.screen_info and screen_info. Later one is a copy, and is updated by drivers. So let kexec_file_load use the updated copy. [ mingo: Tidied up the changelog. ] Signed-off-by: NKairui Song <kasong@redhat.com> Signed-off-by: NIngo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20201014092429.1415040-2-kasong@redhat.com
-
由 Mike Rapoport 提交于
* Replace magic numbers with defines * Replace memblock_find_in_range() + memblock_reserve() with memblock_phys_alloc_range() * Stop checking for low memory size in reserve_crashkernel_low(). The allocation from limited range will anyway fail if there is no enough memory, so there is no need for extra traversal of memblock.memory Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NBaoquan He <bhe@redhat.com> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-15-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Rapoport 提交于
Currently, initrd image is reserved very early during setup and then it might be relocated and re-reserved after the initial physical memory mapping is created. The "late" reservation of memblock verifies that mapped memory size exceeds the size of initrd, then checks whether the relocation required and, if yes, relocates inirtd to a new memory allocated from memblock and frees the old location. The check for memory size is excessive as memblock allocation will anyway fail if there is not enough memory. Besides, there is no point to allocate memory from memblock using memblock_find_in_range() + memblock_reserve() when there exists memblock_phys_alloc_range() with required functionality. Remove the redundant check and simplify memblock allocation. Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NBaoquan He <bhe@redhat.com> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-14-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
In preparation for attaching a platform device per iomem resource teach the efi_fake_mem code to create an e820 entry per instance. Similar to E820_TYPE_PRAM, bypass merging resource when the e820 map is sanitized. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NArd Biesheuvel <ardb@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Link: https://lkml.kernel.org/r/159643096068.4062302.11590041070221681669.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Thomas Gleixner 提交于
The conversion of #DE to the idtentry mechanism introduced a change in the Ooops message which confuses tools which parse crash information in dmesg. Remove the underscore from 'divide_error' to restore previous behaviour. Fixes: 9d06c402 ("x86/entry: Convert Divide Error to IDTENTRY") Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/CACT4Y+bTZFkuZd7+bPArowOv-7Die+WZpfOWnEO_Wgs3U59+oA@mail.gmail.com
-
- 07 10月, 2020 13 次提交
-
-
由 Tony Luck 提交于
All instructions copying data between kernel and user memory are tagged with either _ASM_EXTABLE_UA or _ASM_EXTABLE_CPY entries in the exception table. ex_fault_handler_type() returns EX_HANDLER_UACCESS for both of these. Recovery is only possible when the machine check was triggered on a read from user memory. In this case the same strategy for recovery applies as if the user had made the access in ring3. If the fault was in kernel memory while copying to user there is no current recovery plan. For MOV and MOVZ instructions a full decode of the instruction is done to find the source address. For MOVS instructions the source address is in the %rsi register. The function fault_in_kernel_space() determines whether the source address is kernel or user, upgrade it from "static" so it can be used here. Co-developed-by: NYouquan Song <youquan.song@intel.com> Signed-off-by: NYouquan Song <youquan.song@intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-7-tony.luck@intel.com
-
由 Tony Luck 提交于
Existing kernel code can only recover from a machine check on code that is tagged in the exception table with a fault handling recovery path. Add two new fields in the task structure to pass information from machine check handler to the "task_work" that is queued to run before the task returns to user mode: + mce_vaddr: will be initialized to the user virtual address of the fault in the case where the fault occurred in the kernel copying data from a user address. This is so that kill_me_maybe() can provide that information to the user SIGBUS handler. + mce_kflags: copy of the struct mce.kflags needed by kill_me_maybe() to determine if mce_vaddr is applicable to this error. Add code to recover from a machine check while copying data from user space to the kernel. Action for this case is the same as if the user touched the poison directly; unmap the page and send a SIGBUS to the task. Use a new helper function to share common code between the "fault in user mode" case and the "fault while copying from user" case. New code paths will be activated by the next patch which sets MCE_IN_KERNEL_COPYIN. Suggested-by: NBorislav Petkov <bp@alien8.de> Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-6-tony.luck@intel.com
-
由 Tony Luck 提交于
Avoid a proliferation of ex_has_*_handler() functions by having just one function that returns the type of the handler (if any). Drop the __visible attribute for this function. It is not called from assembler so the attribute is not necessary. Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-3-tony.luck@intel.com
-
由 Youquan Song 提交于
New recovery features require additional information about processor state when a machine check occurred. Pass pt_regs down to the routines that need it. No functional change. Signed-off-by: NYouquan Song <youquan.song@intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-2-tony.luck@intel.com
-
由 Mike Travis 提交于
Add Copyrights to those files that have been updated for UV5 changes. Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201005203929.148656-14-mike.travis@hpe.com
-
由 Mike Travis 提交于
Update check of BIOS TSC sync status to include both possible "invalid" states provided by newer UV5 BIOS. Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-12-mike.travis@hpe.com
-
由 Mike Travis 提交于
The changes in the UV5 arch shrunk the NODE PRESENT table to just 2x64 entries (128 total) so are in to 64 bit MMRs instead of a depth of 64 bits in an array. Adjust references when counting up the nodes present. Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NDimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-11-mike.travis@hpe.com
-
由 Mike Travis 提交于
Make modifications to the GRU mappings to accommodate changes for UV5. Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NDimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-10-mike.travis@hpe.com
-
由 Mike Travis 提交于
Make modifications to the GAM MMR mappings to accommodate changes for UV5. Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NDimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-9-mike.travis@hpe.com
-
由 Mike Travis 提交于
Make modifications to the MMIOH mappings to accommodate changes for UV5. [ Fix W=1 build warnings. ] Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-8-mike.travis@hpe.com
-
由 Mike Travis 提交于
When the UV BIOS starts the kernel it passes the UVsystab info struct to the kernel which contains information elements more specific than ACPI, and generally pertinent only to the MMRs. These are read only fields so information is passed one way only. A new field starting with UV5 is the UV architecture type so the ACPI OEM_ID field can be used for other purposes going forward. The UV Arch Type selects the entirety of the MMRs available, with their addresses and fields defined in uv_mmrs.h. Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NDimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-7-mike.travis@hpe.com
-
由 Mike Travis 提交于
Add new references to UV5 (and UVY class) system MMR addresses and fields primarily caused by the expansion from 46 to 52 bits of physical memory address. Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NDimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-6-mike.travis@hpe.com
-
由 Mike Travis 提交于
Update UV MMRs in uv_mmrs.h for UV5 based on Verilog output from the UV Hub hardware design files. This is the next UV architecture with a new class (UVY) being defined for 52 bit physical address masks. Uses a bitmask for UV arch identification so a single test can cover multiple versions. Includes other adjustments to match the uv_mmrs.h file to keep from encountering compile errors. New UV5 functionality is added in the patches that follow. [ Fix W=1 build warnings. ] Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-5-mike.travis@hpe.com
-