- 01 6月, 2021 1 次提交
-
-
由 Peter Zijlstra 提交于
With the removal of kprobe::handle_fault there is no reason left that kprobe_page_fault() would ever return true on x86, make sure it doesn't happen by accident. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NMasami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210525073213.660594073@infradead.org
-
- 22 3月, 2021 1 次提交
-
-
由 Ingo Molnar 提交于
Fix another ~42 single-word typos in arch/x86/ code comments, missed a few in the first pass, in particular in .S files. Signed-off-by: NIngo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
-
- 18 3月, 2021 1 次提交
-
-
由 Ingo Molnar 提交于
Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: NIngo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
-
- 27 2月, 2021 3 次提交
-
-
由 Marco Elver 提交于
Add KFENCE test suite, testing various error detection scenarios. Makes use of KUnit for test organization. Since KFENCE's interface to obtain error reports is via the console, the test verifies that KFENCE outputs expected reports to the console. [elver@google.com: fix typo in test] Link: https://lkml.kernel.org/r/X9lHQExmHGvETxY4@elver.google.com [elver@google.com: show access type in report] Link: https://lkml.kernel.org/r/20210111091544.3287013-2-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-9-elver@google.comSigned-off-by: NAlexander Potapenko <glider@google.com> Signed-off-by: NMarco Elver <elver@google.com> Reviewed-by: NDmitry Vyukov <dvyukov@google.com> Co-developed-by: NAlexander Potapenko <glider@google.com> Reviewed-by: NJann Horn <jannh@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Marco Elver 提交于
Instead of removing the fault handling portion of the stack trace based on the fault handler's name, just use struct pt_regs directly. Change kfence_handle_page_fault() to take a struct pt_regs, and plumb it through to kfence_report_error() for out-of-bounds, use-after-free, or invalid access errors, where pt_regs is used to generate the stack trace. If the kernel is a DEBUG_KERNEL, also show registers for more information. Link: https://lkml.kernel.org/r/20201105092133.2075331-1-elver@google.comSigned-off-by: NMarco Elver <elver@google.com> Suggested-by: NMark Rutland <mark.rutland@arm.com> Acked-by: NMark Rutland <mark.rutland@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Potapenko 提交于
Add architecture specific implementation details for KFENCE and enable KFENCE for the x86 architecture. In particular, this implements the required interface in <asm/kfence.h> for setting up the pool and providing helper functions for protecting and unprotecting pages. For x86, we need to ensure that the pool uses 4K pages, which is done using the set_memory_4k() helper function. [elver@google.com: add missing copyright and description header] Link: https://lkml.kernel.org/r/20210118092159.145934-2-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-3-elver@google.comSigned-off-by: NMarco Elver <elver@google.com> Signed-off-by: NAlexander Potapenko <glider@google.com> Reviewed-by: NDmitry Vyukov <dvyukov@google.com> Co-developed-by: NMarco Elver <elver@google.com> Reviewed-by: NJann Horn <jannh@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 2月, 2021 1 次提交
-
-
由 Andy Lutomirski 提交于
efi_recover_from_page_fault() doesn't recover -- it does a special EFI mini-oops. Rename it to make it clear that it crashes. While renaming it, I noticed a blatant bug: a page fault oops in a different thread happening concurrently with an EFI runtime service call would be misinterpreted as an EFI page fault. Fix that. This isn't quite exact. The situation could be improved by using a special CS for calls into EFI. [ bp: Massage commit message and simplify in interrupt check. ] Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/f43b1e80830dc78ed60ed8b0826f4f189254570c.1612924255.git.luto@kernel.org
-
- 10 2月, 2021 12 次提交
-
-
由 Andy Lutomirski 提交于
A SMAP-violating kernel access is not a recoverable condition. Imagine kernel code that, outside of a uaccess region, dereferences a pointer to the user range by accident. If SMAP is on, this will reliably generate as an intentional user access. This makes it easy for bugs to be overlooked if code is inadequately tested both with and without SMAP. This was discovered because BPF can generate invalid accesses to user memory, but those warnings only got printed if SMAP was off. Make it so that this type of error will be discovered with SMAP on as well. [ bp: Massage commit message. ] Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/66a02343624b1ff46f02a838c497fc05c1a871b3.1612924255.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
If the kernel gets a SMEP violation or a fault that would have been a SMEP violation if it had SMEP support, it shouldn't run fixups. Just OOPS. [ bp: Massage commit message. ] Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/46160d8babce2abf1d6daa052146002efa24ac56.1612924255.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
The name no_context() has never been very clear. It's only called for faults from kernel mode, so rename it and change the no-longer-useful user_mode(regs) check to a WARN_ON_ONCE. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/c21940efe676024bb4bc721f7d70c29c420e127e.1612924255.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
Drop an indentation level and remove the last user_mode(regs) == true caller of no_context() by directly OOPSing for implicit kernel faults from usermode. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/6e3d1129494a8de1e59d28012286e3a292a2296e.1612924255.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
Not all callers of no_context() want to run exception fixups. Separate the OOPS code out from the fixup code in no_context(). Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/450f8d8eabafb83a5df349108c8e5ea83a2f939d.1612924255.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
Right now, the case of the kernel trying to execute from user memory is treated more or less just like the kernel getting a page fault on a user access. In the failure path, it checks for erratum #93, tries to otherwise fix up the error, and then oopses. If it manages to jump to the user address space, with or without SMEP, it should not try to resolve the page fault. This is an error, pure and simple. Rearrange the code so that this case is caught early, check for erratum #93, and bail out. [ bp: Massage commit message. ] Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/ab8719c7afb8bd501c4eee0e36493150fbbe5f6a.1612924255.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
In general, page fault errors for WRUSS should be just like get_user(), etc. Fix three bugs in this area: There is a comment that says that, if the kernel can't handle a page fault on a user address due to OOM, the OOM-kill-and-retry logic would be skipped. The code checked kernel *privilege*, not kernel mode, so it missed WRUSS. This means that the kernel would malfunction if it got OOM on a WRUSS fault -- this would be a kernel-mode, user-privilege fault, and the OOM killer would be invoked and the handler would retry the faulting instruction. A failed user access from kernel while a fatal signal is pending should fail even if the instruction in question was WRUSS. do_sigbus() should not send SIGBUS for WRUSS -- it should handle it like any other kernel mode failure. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/a7b7bcea730bd4069e6b7e629236bb2cf526c2fb.1612924255.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
If fault_signal_pending() returns true, then the core mm has unlocked the mm for us. Add a comment to help future readers of this code. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/c56de3d103f40e6304437b150aa7b215530d23f7.1612924255.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
bad_area() and its relatives are called from many places in fault.c, and exactly one of them wants the F00F workaround. __bad_area_nosemaphore() no longer contains any kernel fault code, which prepares for further cleanups. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/e9668729a48ce6754022b0a4415631e8ebdd00e7.1612924255.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
mm_fault_error() is logically just the end of do_user_addr_fault(). Combine the functions. This makes the code easier to read. Most of the churn here is from renaming hw_error_code to error_code in do_user_addr_fault(). This makes no difference at all to the generated code (objdump -dr) as compared to changing noinline to __always_inline in the definition of mm_fault_error(). Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/dedc4d9c9b047e51ce38b991bd23971a28af4e7b.1612924255.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
According to the Revision Guide for AMD Athlon™ 64 and AMD Opteron™ Processors, only early revisions of family 0xF are affected. This will avoid unnecessarily fetching instruction bytes before sending SIGSEGV to user programs. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/477173b7784bc28afb3e53d76ae5ef143917e8dd.1612924255.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
The recent rework of probe_kernel_address() and its conversion to get_kernel_nofault() inadvertently broke is_prefetch(). Before this change, probe_kernel_address() was used as a sloppy "read user or kernel memory" helper, but it doesn't do that any more. The new get_kernel_nofault() reads *kernel* memory only, which completely broke is_prefetch() for user access. Adjust the code to the correct accessor based on access mode. The manual address bounds check is no longer necessary, since the accessor helpers (get_user() / get_kernel_nofault()) do the right thing all by themselves. As a bonus, by using the correct accessor, the open-coded address bounds check is not needed anymore. [ bp: Massage commit message. ] Fixes: eab0c608 ("maccess: unify the probe kernel arch hooks") Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/b91f7f92f3367d2d3a88eec3b09c6aab1b2dc8ef.1612924255.git.luto@kernel.org
-
- 22 1月, 2021 1 次提交
-
-
由 Andy Lutomirski 提交于
The implementation was rather buggy. It unconditionally marked PTEs read-only, even for VM_SHARED mappings. I'm not sure whether this is actually a problem, but it certainly seems unwise. More importantly, it released the mmap lock before flushing the TLB, which could allow a racing CoW operation to falsely believe that the underlying memory was not writable. I can't find any users at all of this mechanism, so just remove it. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NStas Sergeev <stsp2@yandex.ru> Link: https://lkml.kernel.org/r/f3086de0babcab36f69949b5780bde851f719bc8.1611078018.git.luto@kernel.org
-
- 19 11月, 2020 2 次提交
-
-
由 Sean Christopherson 提交于
vDSO functions can now leverage an exception fixup mechanism similar to kernel exception fixup. For vDSO exception fixup, the initial user is Intel's Software Guard Extensions (SGX), which will wrap the low-level transitions to/from the enclave, i.e. EENTER and ERESUME instructions, in a vDSO function and leverage fixup to intercept exceptions that would otherwise generate a signal. This allows the vDSO wrapper to return the fault information directly to its caller, obviating the need for SGX applications and libraries to juggle signal handlers. Attempt to fixup vDSO exceptions immediately prior to populating and sending signal information. Except for the delivery mechanism, an exception in a vDSO function should be treated like any other exception in userspace, e.g. any fault that is successfully handled by the kernel should not be directly visible to userspace. Although it's debatable whether or not all exceptions are of interest to enclaves, defer to the vDSO fixup to decide whether to do fixup or generate a signal. Future users of vDSO fixup, if there ever are any, will undoubtedly have different requirements than SGX enclaves, e.g. the fixup vs. signal logic can be made function specific if/when necessary. Suggested-by: NAndy Lutomirski <luto@amacapital.net> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NJarkko Sakkinen <jarkko@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NJethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-19-jarkko@kernel.org
-
由 Sean Christopherson 提交于
vDSO exception fixup is a replacement for signals in limited situations. Signals and vDSO exception fixup need to provide similar information to userspace, including the hardware error code. That hardware error code needs to be sanitized. For instance, if userspace accesses a kernel address, the error code could indicate to userspace whether the address had a Present=1 PTE. That can leak information about the kernel layout to userspace, which is bad. The existing signal code does this sanitization, but fairly late in the signal process. The vDSO exception code runs before the sanitization happens. Move error code sanitization out of the signal code and into a helper. Call the helper in the signal code. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NJarkko Sakkinen <jarkko@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NJethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-18-jarkko@kernel.org
-
- 17 11月, 2020 1 次提交
-
-
由 Sean Christopherson 提交于
The x86 architecture has a set of page fault error codes. These indicate things like whether the fault occurred from a write, or whether it originated in userspace. The SGX hardware architecture has its own per-page memory management metadata (EPCM) [*] and hardware which is separate from the normal x86 MMU. The architecture has a new page fault error code: PF_SGX. This new error code bit is set whenever a page fault occurs as the result of the SGX MMU. These faults occur for a variety of reasons. For instance, an access attempt to enclave memory from outside the enclave causes a PF_SGX fault. PF_SGX would also be set for permission conflicts, such as if a write to an enclave page occurs and the page is marked read-write in the x86 page tables but is read-only in the EPCM. These faults do not always indicate errors, though. SGX pages are encrypted with a key that is destroyed at hardware reset, including suspend. Throwing a SIGSEGV allows user space software to react and recover when these events occur. Include PF_SGX in the PF error codes list and throw SIGSEGV when it is encountered. [*] Intel SDM: 36.5.1 Enclave Page Cache Map (EPCM) [ bp: Add bit 15 to the comment above enum x86_pf_error_code too. ] Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NJarkko Sakkinen <jarkko@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NJethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-7-jarkko@kernel.org
-
- 22 10月, 2020 1 次提交
-
-
由 Vitaly Kuznetsov 提交于
KVM was switched to interrupt-based mechanism for 'page ready' event delivery in Linux-5.8 (see commit 2635b5c4 ("KVM: x86: interrupt based APF 'page ready' event delivery")) and #PF (ab)use for 'page ready' event delivery was removed. Linux guest switched to this new mechanism exclusively in 5.9 (see commit b1d40575 ("KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery")) so it is not possible to get #PF for a 'page ready' event even when the guest is running on top of an older KVM (APF mechanism won't be enabled). Update the comment in exc_page_fault() to reflect the new reality. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20201002154313.1505327-1-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 07 10月, 2020 1 次提交
-
-
由 Tony Luck 提交于
All instructions copying data between kernel and user memory are tagged with either _ASM_EXTABLE_UA or _ASM_EXTABLE_CPY entries in the exception table. ex_fault_handler_type() returns EX_HANDLER_UACCESS for both of these. Recovery is only possible when the machine check was triggered on a read from user memory. In this case the same strategy for recovery applies as if the user had made the access in ring3. If the fault was in kernel memory while copying to user there is no current recovery plan. For MOV and MOVZ instructions a full decode of the instruction is done to find the source address. For MOVS instructions the source address is in the %rsi register. The function fault_in_kernel_space() determines whether the source address is kernel or user, upgrade it from "static" so it can be used here. Co-developed-by: NYouquan Song <youquan.song@intel.com> Signed-off-by: NYouquan Song <youquan.song@intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-7-tony.luck@intel.com
-
- 03 9月, 2020 1 次提交
-
-
由 Joerg Roedel 提交于
One can not simply remove vmalloc faulting on x86-32. Upstream commit: 7f0a002b ("x86/mm: remove vmalloc faulting") removed it on x86 alltogether because previously the arch_sync_kernel_mappings() interface was introduced. This interface added synchronization of vmalloc/ioremap page-table updates to all page-tables in the system at creation time and was thought to make vmalloc faulting obsolete. But that assumption was incredibly naive. It turned out that there is a race window between the time the vmalloc or ioremap code establishes a mapping and the time it synchronizes this change to other page-tables in the system. During this race window another CPU or thread can establish a vmalloc mapping which uses the same intermediate page-table entries (e.g. PMD or PUD) and does no synchronization in the end, because it found all necessary mappings already present in the kernel reference page-table. But when these intermediate page-table entries are not yet synchronized, the other CPU or thread will continue with a vmalloc address that is not yet mapped in the page-table it currently uses, causing an unhandled page fault and oops like below: BUG: unable to handle page fault for address: fe80c000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page *pde = 33183067 *pte = a8648163 Oops: 0002 [#1] SMP CPU: 1 PID: 13514 Comm: cve-2017-17053 Tainted: G ... Call Trace: ldt_dup_context+0x66/0x80 dup_mm+0x2b3/0x480 copy_process+0x133b/0x15c0 _do_fork+0x94/0x3e0 __ia32_sys_clone+0x67/0x80 __do_fast_syscall_32+0x3f/0x70 do_fast_syscall_32+0x29/0x60 do_SYSENTER_32+0x15/0x20 entry_SYSENTER_32+0x9f/0xf2 EIP: 0xb7eef549 So the arch_sync_kernel_mappings() interface is racy, but removing it would mean to re-introduce the vmalloc_sync_all() interface, which is even more awful. Keep arch_sync_kernel_mappings() in place and catch the race condition in the page-fault handler instead. Do a partial revert of above commit to get vmalloc faulting on x86-32 back in place. Fixes: 7f0a002b ("x86/mm: remove vmalloc faulting") Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NIngo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200902155904.17544-1-joro@8bytes.org
-
- 13 8月, 2020 2 次提交
-
-
由 Peter Xu 提交于
Use the general page fault accounting by passing regs into handle_mm_fault(). Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/20200707225021.200906-23-peterx@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Xu 提交于
Patch series "mm: Page fault accounting cleanups", v5. This is v5 of the pf accounting cleanup series. It originates from Gerald Schaefer's report on an issue a week ago regarding to incorrect page fault accountings for retried page fault after commit 4064b982 ("mm: allow VM_FAULT_RETRY for multiple times"): https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/ What this series did: - Correct page fault accounting: we do accounting for a page fault (no matter whether it's from #PF handling, or gup, or anything else) only with the one that completed the fault. For example, page fault retries should not be counted in page fault counters. Same to the perf events. - Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf event is used in an adhoc way across different archs. Case (1): for many archs it's done at the entry of a page fault handler, so that it will also cover e.g. errornous faults. Case (2): for some other archs, it is only accounted when the page fault is resolved successfully. Case (3): there're still quite some archs that have not enabled this perf event. Since this series will touch merely all the archs, we unify this perf event to always follow case (1), which is the one that makes most sense. And since we moved the accounting into handle_mm_fault, the other two MAJ/MIN perf events are well taken care of naturally. - Unify definition of "major faults": the definition of "major fault" is slightly changed when used in accounting (not VM_FAULT_MAJOR). More information in patch 1. - Always account the page fault onto the one that triggered the page fault. This does not matter much for #PF handlings, but mostly for gup. More information on this in patch 25. Patchset layout: Patch 1: Introduced the accounting in handle_mm_fault(), not enabled. Patch 2-23: Enable the new accounting for arch #PF handlers one by one. Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.) Patch 25: Cleanup GUP task_struct pointer since it's not needed any more This patch (of 25): This is a preparation patch to move page fault accountings into the general code in handle_mm_fault(). This includes both the per task flt_maj/flt_min counters, and the major/minor page fault perf events. To do this, the pt_regs pointer is passed into handle_mm_fault(). PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault handlers. So far, all the pt_regs pointer that passed into handle_mm_fault() is NULL, which means this patch should have no intented functional change. Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 8月, 2020 1 次提交
-
-
由 Mike Rapoport 提交于
Patch series "mm: cleanup usage of <asm/pgalloc.h>" Most architectures have very similar versions of pXd_alloc_one() and pXd_free_one() for intermediate levels of page table. These patches add generic versions of these functions in <asm-generic/pgalloc.h> and enable use of the generic functions where appropriate. In addition, functions declared and defined in <asm/pgalloc.h> headers are used mostly by core mm and early mm initialization in arch and there is no actual reason to have the <asm/pgalloc.h> included all over the place. The first patch in this series removes unneeded includes of <asm/pgalloc.h> In the end it didn't work out as neatly as I hoped and moving pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require unnecessary changes to arches that have custom page table allocations, so I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local to mm/. This patch (of 8): In most cases <asm/pgalloc.h> header is required only for allocations of page table memory. Most of the .c files that include that header do not use symbols declared in <asm/pgalloc.h> and do not require that header. As for the other header files that used to include <asm/pgalloc.h>, it is possible to move that include into the .c file that actually uses symbols from <asm/pgalloc.h> and drop the include from the header file. The process was somewhat automated using sed -i -E '/[<"]asm\/pgalloc\.h/d' \ $(grep -L -w -f /tmp/xx \ $(git grep -E -l '[<"]asm/pgalloc\.h')) where /tmp/xx contains all the symbols defined in arch/*/include/asm/pgalloc.h. [rppt@linux.ibm.com: fix powerpc warning] Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NPekka Enberg <penberg@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joerg Roedel <joro@8bytes.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 7月, 2020 1 次提交
-
-
由 Thomas Gleixner 提交于
Remove the temporary defines and fixup all references. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220520.855839271@linutronix.de
-
- 07 7月, 2020 1 次提交
-
-
由 Andy Lutomirski 提交于
They were originally called _cond_rcu because they were special versions with conditional RCU handling. Now they're the standard entry and exit path, so the _cond_rcu part is just confusing. Drop it. Also change the signature to make them more extensible and more foolproof. No functional change -- it's pure refactoring. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/247fc67685263e0b673e1d7f808182d28ff80359.1593795633.git.luto@kernel.org
-
- 19 6月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
Better describe what this helper does, and match the naming of copy_from_kernel_nofault. Also switch the argument order around, so that it acts and looks like get_user(). Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 6月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
Better describe what these functions do. Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 6月, 2020 4 次提交
-
-
由 Thomas Gleixner 提交于
- Move load_current_idt() out of line and replace the hideous comment with a lockdep assert. This allows to make idt_table and idt_descr static. - Mark idt_table read only after the IDT initialization is complete. - Shuffle code around to consolidate the #ifdef sections into one. - Adapt the F00F bug code. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200528145523.084915381@linutronix.de
-
由 Thomas Gleixner 提交于
Convert page fault exceptions to IDTENTRY_RAW: - Implement the C entry point with DEFINE_IDTENTRY_RAW - Add the CR2 read into the exception handler - Add the idtentry_enter/exit_cond_rcu() invocations in in the regular page fault handler and in the async PF part. - Emit the ASM stub with DECLARE_IDTENTRY_RAW - Remove the ASM idtentry in 64-bit - Remove the CR2 read from 64-bit - Remove the open coded ASM entry code in 32-bit - Fix up the XEN/PV code - Remove the old prototypes No functional change. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org> Acked-by: NAndy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202118.238455120@linutronix.de
-
由 Thomas Gleixner 提交于
Convert #UD to IDTENTRY: - Implement the C entry point with DEFINE_IDTENTRY - Emit the ASM stub with DECLARE_IDTENTRY - Remove the ASM idtentry in 64bit - Remove the open coded ASM entry code in 32bit - Fixup the XEN/PV code - Fixup the FOOF bug call in fault.c - Remove the old prototypes No functional change. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NAlexandre Chartre <alexandre.chartre@oracle.com> Acked-by: NAndy Lutomirski <luto@kernel.org> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200505134904.955511913@linutronix.de
-
由 Thomas Gleixner 提交于
Traps enable interrupts conditionally but rely on the ASM return code to disable them again. That results in redundant interrupt disable and trace calls. Make the trap handlers disable interrupts before returning to avoid that, which allows simplification of the ASM entry code in follow up changes. Originally-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NAlexandre Chartre <alexandre.chartre@oracle.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Acked-by: NAndy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200505134903.622702796@linutronix.de
-
- 10 6月, 2020 3 次提交
-
-
由 Michel Lespinasse 提交于
Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
Rename the mmap_sem field to mmap_lock. Any new uses of this lock should now go through the new mmap locking api. The mmap_lock is still implemented as a rwsem, though this could change in the future. [akpm@linux-foundation.org: fix it for mm-gup-might_lock_readmmap_sem-in-get_user_pages_fast.patch] Signed-off-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NDavidlohr Bueso <dbueso@suse.de> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-11-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: NLaurent Dufour <ldufour@linux.ibm.com> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-