- 02 3月, 2022 1 次提交
-
-
由 Kees Cook 提交于
Partially revert commit 5f501d55 ("binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE"), which applied the ET_DYN "total_mapping_size" logic also to ET_EXEC. At least ia64 has ET_EXEC PT_LOAD segments that are not virtual-address contiguous (but _are_ file-offset contiguous). This would result in a giant mapping attempting to cover the entire span, including the virtual address range hole, and well beyond the size of the ELF file itself, causing the kernel to refuse to load it. For example: $ readelf -lW /usr/bin/gcc ... Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz ... ... LOAD 0x000000 0x4000000000000000 0x4000000000000000 0x00b5a0 0x00b5a0 ... LOAD 0x00b5a0 0x600000000000b5a0 0x600000000000b5a0 0x0005ac 0x000710 ... ... ^^^^^^^^ ^^^^^^^^^^^^^^^^^^ ^^^^^^^^ ^^^^^^^^ File offset range : 0x000000-0x00bb4c 0x00bb4c bytes Virtual address range : 0x4000000000000000-0x600000000000bcb0 0x200000000000bcb0 bytes Remove the total_mapping_size logic for ET_EXEC, which reduces the ET_EXEC MAP_FIXED_NOREPLACE coverage to only the first PT_LOAD (better than nothing), and retains it for ET_DYN. Ironically, this is the reverse of the problem that originally caused problems with MAP_FIXED_NOREPLACE: overlapping PT_LOAD segments. Future work could restore full coverage if load_elf_binary() were to perform mappings in a separate phase from the loading (where it could resolve both overlaps and holes). Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Reported-by: Nmatoro <matoro_bugzilla_kernel@matoro.tk> Fixes: 5f501d55 ("binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE") Link: https://lore.kernel.org/r/a3edd529-c42d-3b09-135c-7e98a15b150f@leemhuis.infoTested-by: Nmatoro <matoro_mailinglist_kernel@matoro.tk> Link: https://lore.kernel.org/lkml/ce8af9c13bcea9230c7689f3c1e0e2cd@matoro.tkTested-By: NJohn Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/lkml/49182d0d-708b-4029-da5f-bc18603440a6@physik.fu-berlin.de Cc: stable@vger.kernel.org Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 20 1月, 2022 2 次提交
-
-
由 H.J. Lu 提交于
Extend commit ce81bb25 ("fs/binfmt_elf: use PT_LOAD p_align values for suitable start address") which fixed PIE binaries built with -Wl,-z,max-page-size=0x200000, to cover static PIE binaries. This fixes: https://bugzilla.kernel.org/show_bug.cgi?id=215275 Tested by verifying static PIE binaries with -Wl,-z,max-page-size=0x200000 loading. Link: https://lkml.kernel.org/r/20211209174052.370537-1-hjl.tools@gmail.comSigned-off-by: NH.J. Lu <hjl.tools@gmail.com> Cc: Chris Kennelly <ckennelly@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: David Rientjes <rientjes@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Fangrui Song <maskray@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yafang Shao 提交于
It is better to use get_task_comm() instead of the open coded string copy as we do in other places. struct elf_prpsinfo is used to dump the task information in userspace coredump or kernel vmcore. Below is the verification of vmcore, crash> ps PID PPID CPU TASK ST %MEM VSZ RSS COMM 0 0 0 ffffffff9d21a940 RU 0.0 0 0 [swapper/0] > 0 0 1 ffffa09e40f85e80 RU 0.0 0 0 [swapper/1] > 0 0 2 ffffa09e40f81f80 RU 0.0 0 0 [swapper/2] > 0 0 3 ffffa09e40f83f00 RU 0.0 0 0 [swapper/3] > 0 0 4 ffffa09e40f80000 RU 0.0 0 0 [swapper/4] > 0 0 5 ffffa09e40f89f80 RU 0.0 0 0 [swapper/5] 0 0 6 ffffa09e40f8bf00 RU 0.0 0 0 [swapper/6] > 0 0 7 ffffa09e40f88000 RU 0.0 0 0 [swapper/7] > 0 0 8 ffffa09e40f8de80 RU 0.0 0 0 [swapper/8] > 0 0 9 ffffa09e40f95e80 RU 0.0 0 0 [swapper/9] > 0 0 10 ffffa09e40f91f80 RU 0.0 0 0 [swapper/10] > 0 0 11 ffffa09e40f93f00 RU 0.0 0 0 [swapper/11] > 0 0 12 ffffa09e40f90000 RU 0.0 0 0 [swapper/12] > 0 0 13 ffffa09e40f9bf00 RU 0.0 0 0 [swapper/13] > 0 0 14 ffffa09e40f98000 RU 0.0 0 0 [swapper/14] > 0 0 15 ffffa09e40f9de80 RU 0.0 0 0 [swapper/15] It works well as expected. Some comments are added to explain why we use the hard-coded 16. Link: https://lkml.kernel.org/r/20211120112738.45980-5-laoar.shao@gmail.comSuggested-by: NKees Cook <keescook@chromium.org> Signed-off-by: NYafang Shao <laoar.shao@gmail.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 11月, 2021 2 次提交
-
-
由 Alexey Dobriyan 提交于
"A -= B; A" is equivalent to "A -= B". Link: https://lkml.kernel.org/r/YVmcP256fRMqCwgK@localhost.localdomainSigned-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
Commit b212921b ("elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings") reverted back to using MAP_FIXED to map ELF LOAD segments because it was found that the segments in some binaries overlap and can cause MAP_FIXED_NOREPLACE to fail. The original intent of MAP_FIXED_NOREPLACE in the ELF loader was to prevent the silent clobbering of an existing mapping (e.g. stack) by the ELF image, which could lead to exploitable conditions. Quoting commit 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map"), which originally introduced the use of MAP_FIXED_NOREPLACE in the loader: Both load_elf_interp and load_elf_binary rely on elf_map to map segments [to a specific] address and they use MAP_FIXED to enforce that. This is however [a] dangerous thing prone to silent data corruption which can be even exploitable. ... Let's take CVE-2017-1000253 as an example ... we could end up mapping [the executable] over the existing stack ... The [stack layout] issue has been fixed since then ... So we should be safe and any [similar] attack should be impractical. On the other hand this is just too subtle [an] assumption ... it can break quite easily and [be] hard to spot. ... Address this [weakness] by changing MAP_FIXED to the newly added MAP_FIXED_NOREPLACE. This will mean that mmap will fail if there is an existing mapping clashing with the requested one [instead of silently] clobbering it. Then processing ET_DYN binaries the loader already calculates a total size for the image when the first segment is mapped, maps the entire image, and then unmaps the remainder before the remaining segments are then individually mapped. To avoid the earlier problems (legitimate overlapping LOAD segments specified in the ELF), apply the same logic to ET_EXEC binaries as well. For both ET_EXEC and ET_DYN+INTERP use MAP_FIXED_NOREPLACE for the initial total size mapping and then use MAP_FIXED to build the final (possibly legitimately overlapping) mappings. For ET_DYN w/out INTERP, continue to map at a system-selected address in the mmap region. Link: https://lkml.kernel.org/r/20210916215947.3993776-1-keescook@chromium.org Link: https://lore.kernel.org/lkml/1595869887-23307-2-git-send-email-anthony.yznaga@oracle.comCo-developed-by: NAnthony Yznaga <anthony.yznaga@oracle.com> Signed-off-by: NAnthony Yznaga <anthony.yznaga@oracle.com> Signed-off-by: NKees Cook <keescook@chromium.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Chen Jingwen <chenjingwen6@huawei.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrei Vagin <avagin@openvz.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 10月, 2021 1 次提交
-
-
由 Eric W. Biederman 提交于
Today when a signal is delivered with a handler of SIG_DFL whose default behavior is to generate a core dump not only that process but every process that shares the mm is killed. In the case of vfork this looks like a real world problem. Consider the following well defined sequence. if (vfork() == 0) { execve(...); _exit(EXIT_FAILURE); } If a signal that generates a core dump is received after vfork but before the execve changes the mm the process that called vfork will also be killed (as the mm is shared). Similarly if the execve fails after the point of no return the kernel delivers SIGSEGV which will kill both the exec'ing process and because the mm is shared the process that called vfork as well. As far as I can tell this behavior is a violation of people's reasonable expectations, POSIX, and is unnecessarily fragile when the system is low on memory. Solve this by making a userspace visible change to only kill a single process/thread group. This is possible because Jann Horn recently modified[1] the coredump code so that the mm can safely be modified while the coredump is happening. With LinuxThreads long gone I don't expect anyone to have a notice this behavior change in practice. To accomplish this move the core_state pointer from mm_struct to signal_struct, which allows different thread groups to coredump simultatenously. In zap_threads remove the work to kill anything except for the current thread group. v2: Remove core_state from the VM_BUG_ON_MM print to fix compile failure when CONFIG_DEBUG_VM is enabled. Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> [1] a07279c9 ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot") Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3") History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Link: https://lkml.kernel.org/r/87y27mvnke.fsf@disp2133 Link: https://lkml.kernel.org/r/20211007144701.67592574@canb.auug.org.auReviewed-by: NKees Cook <keescook@chromium.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 04 10月, 2021 1 次提交
-
-
由 Chen Jingwen 提交于
In commit b212921b ("elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings") we still leave MAP_FIXED_NOREPLACE in place for load_elf_interp. Unfortunately, this will cause kernel to fail to start with: 1 (init): Uhuuh, elf segment at 00003ffff7ffd000 requested but the memory is mapped already Failed to execute /init (error -17) The reason is that the elf interpreter (ld.so) has overlapping segments. readelf -l ld-2.31.so Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x000000000002c94c 0x000000000002c94c R E 0x10000 LOAD 0x000000000002dae0 0x000000000003dae0 0x000000000003dae0 0x00000000000021e8 0x0000000000002320 RW 0x10000 LOAD 0x000000000002fe00 0x000000000003fe00 0x000000000003fe00 0x00000000000011ac 0x0000000000001328 RW 0x10000 The reason for this problem is the same as described in commit ad55eac7 ("elf: enforce MAP_FIXED on overlaying elf segments"). Not only executable binaries, elf interpreters (e.g. ld.so) can have overlapping elf segments, so we better drop MAP_FIXED_NOREPLACE and go back to MAP_FIXED in load_elf_interp. Fixes: 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map") Cc: <stable@vger.kernel.org> # v4.19 Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NChen Jingwen <chenjingwen6@huawei.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 9月, 2021 2 次提交
-
-
由 David Hildenbrand 提交于
At exec time when we mmap the new executable via MAP_DENYWRITE we have it opened via do_open_execat() and already deny_write_access()'ed the file successfully. Once exec completes, we allow_write_acces(); however, we set mm->exe_file in begin_new_exec() via set_mm_exe_file() and also deny_write_access() as long as mm->exe_file remains set. We'll effectively deny write access to our executable via mm->exe_file until mm->exe_file is changed -- when the process is removed, on new exec, or via sys_prctl(PR_SET_MM_MAP/EXE_FILE). Let's remove all usage of MAP_DENYWRITE, it's no longer necessary for mm->exe_file. In case of an elf interpreter, we'll now only deny write access to the file during exec. This is somewhat okay, because the interpreter behaves (and sometime is) a shared library; all shared libraries, especially the ones loaded directly in user space like via dlopen() won't ever be mapped via MAP_DENYWRITE, because we ignore that from user space completely; these shared libraries can always be modified while mapped and executed. Let's only special-case the main executable, denying write access while being executed by a process. This can be considered a minor user space visible change. While this is a cleanup, it also fixes part of a problem reported with VM_DENYWRITE on overlayfs, as VM_DENYWRITE is effectively unused with this patch and will be removed next: "Overlayfs did not honor positive i_writecount on realfile for VM_DENYWRITE mappings." [1] [1] https://lore.kernel.org/r/YNHXzBgzRrZu1MrD@miu.piliscsaba.redhat.com/Reported-by: NChengguang Xu <cgxu519@mykernel.net> Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NDavid Hildenbrand <david@redhat.com>
-
由 David Hildenbrand 提交于
uselib() is the legacy systemcall for loading shared libraries. Nowadays, applications use dlopen() to load shared libraries, completely implemented in user space via mmap(). For example, glibc uses MAP_COPY to mmap shared libraries. While this maps to MAP_PRIVATE | MAP_DENYWRITE on Linux, Linux ignores any MAP_DENYWRITE specification from user space in mmap. With this change, all remaining in-tree users of MAP_DENYWRITE use it to map an executable. We will be able to open shared libraries loaded via uselib() writable, just as we already can via dlopen() from user space. This is one step into the direction of removing MAP_DENYWRITE from the kernel. This can be considered a minor user space visible change. Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NDavid Hildenbrand <david@redhat.com>
-
- 30 6月, 2021 1 次提交
-
-
由 David Hildenbrand 提交于
Ever since commit e9714acf ("mm: kill vma flag VM_EXECUTABLE and mm->num_exe_file_vmas"), VM_EXECUTABLE is gone and MAP_EXECUTABLE is essentially completely ignored. Let's remove all usage of MAP_EXECUTABLE. [akpm@linux-foundation.org: fix blooper in fs/binfmt_aout.c. per David] Link: https://lkml.kernel.org/r/20210421093453.6904-3-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com> Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: NKees Cook <keescook@chromium.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kevin Brodsky <Kevin.Brodsky@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 6月, 2021 1 次提交
-
-
由 Peter Zijlstra 提交于
Change the type and name of task_struct::state. Drop the volatile and shrink it to an 'unsigned int'. Rename it in order to find all uses such that we can use READ_ONCE/WRITE_ONCE as appropriate. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDaniel Bristot de Oliveira <bristot@redhat.com> Acked-by: NWill Deacon <will@kernel.org> Acked-by: NDaniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org
-
- 08 3月, 2021 1 次提交
-
-
由 Al Viro 提交于
have dump_skip() just remember how much needs to be skipped, leave actual seeks/writing zeroes to the next dump_emit() or the end of coredump output, whichever comes first. And instead of playing with do_truncate() in the end, just write one NUL at the end of the last gap (if any). Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 16 2月, 2021 1 次提交
-
-
由 Laurent Vivier 提交于
It can be useful to the interpreter to know which flags are in use. For instance, knowing if the preserve-argv[0] is in use would allow to skip the pathname argument. This patch uses an unused auxiliary vector, AT_FLAGS, to add a flag to inform interpreter if the preserve-argv[0] is enabled. Note by Helge Deller: The real-world user of this patch is qemu-user, which needs to know if it has to preserve the argv[0]. See Debian bug #970460. Signed-off-by: NLaurent Vivier <laurent@vivier.eu> Reviewed-by: NYunQiang Su <ysu@wavecomp.com> URL: http://bugs.debian.org/970460Signed-off-by: NHelge Deller <deller@gmx.de>
-
- 06 1月, 2021 1 次提交
-
-
由 Al Viro 提交于
Preparations to doing i386 compat elf_prstatus sanely - rather than duplicating the beginning of compat_elf_prstatus, take these fields into a separate structure (compat_elf_prstatus_common), so that it could be reused. Due to the incestous relationship between binfmt_elf.c and compat_binfmt_elf.c we need the same shape change done to native struct elf_prstatus, gathering the fields prior to pr_reg into a new structure (struct elf_prstatus_common). Fortunately, offset of pr_reg is always a multiple of 16 with no padding right before it, so it's possible to turn all the stuff prior to it into a single member without disturbing the layout. [build fix from Geert Uytterhoeven folded in] Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 1月, 2021 1 次提交
-
-
由 Al Viro 提交于
On 64bit architectures that support 32bit processes there are two possible layouts for NT_PRSTATUS note in ELF coredumps. For one thing, several fields are 64bit for native processes and 32bit for compat ones (pr_sigpend, etc.). For another, the register dump is obviously different - the size and number of registers are not going to be the same for 32bit and 64bit variants of processor. Usually that's handled by having two structures - elf_prstatus for native layout and compat_elf_prstatus for 32bit one. 32bit processes are handled by fs/compat_binfmt_elf.c, which defines a macro called 'elf_prstatus' that expands to compat_elf_prstatus. Then it includes fs/binfmt_elf.c, which makes all references to struct elf_prstatus to be textually replaced with struct compat_elf_prstatus. Ugly and somewhat brittle, but it works. However, amd64 is worse - there are _three_ possible layouts. One for native 64bit processes, another for i386 (32bit) processes and yet another for x32 (32bit address space with full 64bit registers). Both i386 and x32 processes are handled by fs/compat_binfmt_elf.c, with usual compat_binfmt_elf.c trickery. However, the layouts for i386 and x32 are not identical - they have the common beginning, but the register dump part (pr_reg) is bigger on x32. Worse, pr_reg is not the last field - it's followed by int pr_fpvalid, so that field ends up at different offsets for i386 and x32 layouts. Fortunately, there's not much code that cares about any of that - it's all encapsulated in fill_thread_core_info(). Since x32 variant is bigger, we define compat_elf_prstatus to match that layout. That way i386 processes have enough space to fit their layout into. Moreover, since these layouts are identical prior to pr_reg, we don't need to distinguish x32 and i386 cases when we are setting the fields prior to pr_reg. Filling pr_reg itself is done by calling ->get() method of appropriate regset, and that method knows what layout (and size) to use. We do need to distinguish x32 and i386 cases only for two things: setting ->pr_fpvalid (offset differs for x32 and i386) and choosing the right size for our note. The way it's done is Not Nice, for the lack of more accurate printable description. There are two macros (PRSTATUS_SIZE and SET_PR_FPVALID), that default essentially to sizeof(struct elf_prstatus) and (S)->pr_fpvalid = 1. On x86 asm/compat.h provides its own variants. Unfortunately, quite a few things go wrong there: * PRSTATUS_SIZE doesn't use the normal test for process being an x32 one; it compares the size reported by regset with the size of pr_reg. * it hardcodes the sizes of x32 and i386 variants (296 and 144 resp.), so if some change in includes leads to asm/compat.h pulled in by fs/binfmt_elf.c we are in trouble - it will end up using the size of x32 variant for 64bit processes. * it's in the wrong place; asm/compat.h couldn't define the structure for i386 layout, since it lacks quite a few types needed for it. Hardcoded sizes are largely due to that. The proper fix would be to have an explicitly defined i386 variant of structure and have PRSTATUS_SIZE/SET_PR_FPVALID check for TIF_X32 to choose the variant that should be used. Unfortunately, that requires some manipulations of headers; we'll do that later in the series, but for now let's go with the minimal variant - rename PRSTATUS_SIZE in asm/compat.h to COMPAT_PRSTATUS_SIZE, have fs/compat_binfmt_elf.c define PRSTATUS_SIZE to COMPAT_PRSTATUS_SIZE and use the normal TIF_X32 check in that macro. The size of i386 variant is kept hardcoded for now. Similar story for SET_PR_FPVALID. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 11 12月, 2020 1 次提交
-
-
由 Eric W. Biederman 提交于
Oleg Nesterov recently asked[1] why is there an unshare_files in do_coredump. After digging through all of the callers of lookup_fd it turns out that it is arch/powerpc/platforms/cell/spufs/coredump.c:coredump_next_context that needs the unshare_files in do_coredump. Looking at the history[2] this code was also the only piece of coredump code that required the unshare_files when the unshare_files was added. Looking at that code it turns out that cell is also the only architecture that implements elf_coredump_extra_notes_size and elf_coredump_extra_notes_write. I looked at the gdb repo[3] support for cell has been removed[4] in binutils 2.34. Geoff Levand reports he is still getting questions on how to run modern kernels on the PS3, from people using 3rd party firmware so this code is not dead. According to Wikipedia the last PS3 shipped in Japan sometime in 2017. So it will probably be a little while before everyone's hardware dies. Add some comments briefly documenting the coredump code that exists only to support cell spufs to make it easier to understand the coredump code. Eventually the hardware will be dead, or their won't be userspace tools, or the coredump code will be refactored and it will be too difficult to update a dead architecture and these comments make it easy to tell where to pull to remove cell spufs support. [1] https://lkml.kernel.org/r/20201123175052.GA20279@redhat.com [2] 179e037f ("do_coredump(): make sure that descriptor table isn't shared") [3] git://sourceware.org/git/binutils-gdb.git [4] abf516c6931a ("Remove Cell Broadband Engine debugging support"). Link: https://lkml.kernel.org/r/87h7pdnlzv.fsf_-_@x220.int.ebiederm.orgSigned-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 30 10月, 2020 1 次提交
-
-
由 Gustavo A. R. Silva 提交于
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arraysSigned-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
-
- 26 10月, 2020 2 次提交
-
-
由 Gabriel Krisman Bertazi 提交于
Like it is done for SET_PERSONALITY with ARM, which requires the ELF header to select correct personality parameters, x86 requires the headers when selecting which VDSO to load, instead of relying on the going-away TIF_IA32/X32 flags. Add an indirection macro to arch_setup_additional_pages(), that x86 can reimplement to receive the extra parameter just for ELF files. This requires no changes to other architectures, who can continue to use the original arch_setup_additional_pages for ELF and non-ELF binaries. Signed-off-by: NGabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201004032536.1229030-8-krisman@collabora.com
-
由 Gabriel Krisman Bertazi 提交于
Like it is done for SET_PERSONALITY with x86, which requires the ELF header to select correct personality parameters, x86 requires the headers on compat_start_thread() to choose starting CS for ELF32 binaries, instead of relying on the going-away TIF_IA32/X32 flags. Add an indirection macro to ELF invocations of START_THREAD, that x86 can reimplement to receive the extra parameter just for ELF files. This requires no changes to other architectures who don't need the header information, they can continue to use the original start_thread for ELF and non-ELF binaries, and it prevents affecting non-ELF code paths for x86. Signed-off-by: NGabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201004032536.1229030-6-krisman@collabora.com
-
- 19 10月, 2020 1 次提交
-
-
由 Jann Horn 提交于
create_elf_tables() runs after setup_new_exec(), so other tasks can already access our new mm and do things like process_madvise() on it. (At the time I'm writing this commit, process_madvise() is not in mainline yet, but has been in akpm's tree for some time.) While I believe that there are currently no APIs that would actually allow another process to mess up our VMA tree (process_madvise() is limited to MADV_COLD and MADV_PAGEOUT, and uring and userfaultfd cannot reach an mm under which no syscalls have been executed yet), this seems like an accident waiting to happen. Let's make sure that we always take the mmap lock around GUP paths as long as another process might be able to see the mm. (Yes, this diff looks suspicious because we drop the lock before doing anything with `vma`, but that's because we actually don't do anything with it apart from the NULL check.) Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NMichel Lespinasse <walken@google.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://lkml.kernel.org/r/CAG48ez1-PBCdv3y8pn-Ty-b+FmBSLwDuVKFSt8h7wARLy0dF-Q@mail.gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 10月, 2020 4 次提交
-
-
由 Jann Horn 提交于
In both binfmt_elf and binfmt_elf_fdpic, use a new helper dump_vma_snapshot() to take a snapshot of the VMA list (including the gate VMA, if we have one) while protected by the mmap_lock, and then use that snapshot instead of walking the VMA list without locking. An alternative approach would be to keep the mmap_lock held across the entire core dumping operation; however, keeping the mmap_lock locked while we may be blocked for an unbounded amount of time (e.g. because we're dumping to a FUSE filesystem or so) isn't really optimal; the mmap_lock blocks things like the ->release handler of userfaultfd, and we don't really want critical system daemons to grind to a halt just because someone "gifted" them SCM_RIGHTS to an eternally-locked userfaultfd, or something like that. Since both the normal ELF code and the FDPIC ELF code need this functionality (and if any other binfmt wants to add coredump support in the future, they'd probably need it, too), implement this with a common helper in fs/coredump.c. A downside of this approach is that we now need a bigger amount of kernel memory per userspace VMA in the normal ELF case, and that we need O(n) kernel memory in the FDPIC ELF case at all; but 40 bytes per VMA shouldn't be terribly bad. There currently is a data race between stack expansion and anything that reads ->vm_start or ->vm_end under the mmap_lock held in read mode; to mitigate that for core dumping, take the mmap_lock in write mode when taking a snapshot of the VMA hierarchy. (If we only took the mmap_lock in read mode, we could end up with a corrupted core dump if someone does get_user_pages_remote() concurrently. Not really a major problem, but taking the mmap_lock either way works here, so we might as well avoid the issue.) (This doesn't do anything about the existing data races with stack expansion in other mm code.) Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-6-jannh@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jann Horn 提交于
At the moment, the binfmt_elf and binfmt_elf_fdpic code have slightly different code to figure out which VMAs should be dumped, and if so, whether the dump should contain the entire VMA or just its first page. Eliminate duplicate code by reworking the binfmt_elf version into a generic core dumping helper in coredump.c. As part of that, change the heuristic for detecting executable/library header pages to check whether the inode is executable instead of looking at the file mode. This is less problematic in terms of locking because it lets us avoid get_user() under the mmap_sem. (And arguably it looks nicer and makes more sense in generic code.) Adjust a little bit based on the binfmt_elf_fdpic version: ->anon_vma is only meaningful under CONFIG_MMU, otherwise we have to assume that the VMA has been written to. Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-5-jannh@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jann Horn 提交于
Both fs/binfmt_elf.c and fs/binfmt_elf_fdpic.c need to dump ranges of pages into the coredump file. Extract that logic into a common helper. Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-4-jannh@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Chris Kennelly 提交于
Patch series "Selecting Load Addresses According to p_align", v3. The current ELF loading mechancism provides page-aligned mappings. This can lead to the program being loaded in a way unsuitable for file-backed, transparent huge pages when handling PIE executables. While specifying -z,max-page-size=0x200000 to the linker will generate suitably aligned segments for huge pages on x86_64, the executable needs to be loaded at a suitably aligned address as well. This alignment requires the binary's cooperation, as distinct segments need to be appropriately paddded to be eligible for THP. For binaries built with increased alignment, this limits the number of bits usable for ASLR, but provides some randomization over using fixed load addresses/non-PIE binaries. This patch (of 2): The current ELF loading mechancism provides page-aligned mappings. This can lead to the program being loaded in a way unsuitable for file-backed, transparent huge pages when handling PIE executables. For binaries built with increased alignment, this limits the number of bits usable for ASLR, but provides some randomization over using fixed load addresses/non-PIE binaries. Tested by verifying program with -Wl,-z,max-page-size=0x200000 loading. [akpm@linux-foundation.org: fix max() warning] [ckennelly@google.com: augment comment] Link: https://lkml.kernel.org/r/20200821233848.3904680-2-ckennelly@google.comSigned-off-by: NChris Kennelly <ckennelly@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: David Rientjes <rientjes@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Hugh Dickens <hughd@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Fangrui Song <maskray@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Link: https://lkml.kernel.org/r/20200820170541.1132271-1-ckennelly@google.com Link: https://lkml.kernel.org/r/20200820170541.1132271-2-ckennelly@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 7月, 2020 2 次提交
-
-
由 Al Viro 提交于
all uses are conditional upon ELF_CORE_COPY_XFPREGS, which has not been defined on any architecture since 2010 Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Two new helpers: given a process and regset, dump into a buffer. regset_get() takes a buffer and size, regset_get_alloc() takes size and allocates a buffer. Return value in both cases is the amount of data actually dumped in case of success or -E... on error. In both cases the size is capped by regset->n * regset->size, so ->get() is called with offset 0 and size no more than what regset expects. binfmt_elf.c callers of ->get() are switched to using those; the other caller (copy_regset_to_user()) will need some preparations to switch. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 6月, 2020 1 次提交
-
-
由 Anthony Iliopoulos 提交于
The ifndef was added a long time ago to support archs that would define their own mapping function. The last user was the metag arch which was removed from the tree, and as such there are no users left. Let's kill it. Signed-off-by: NAnthony Iliopoulos <ailiop@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200402161543.4119-1-ailiop@suse.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 6月, 2020 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 29 5月, 2020 1 次提交
-
-
由 Alexander Potapenko 提交于
KMSAN reported uninitialized data being written to disk when dumping core. As a result, several kilobytes of kmalloc memory may be written to the core file and then read by a non-privileged user. Reported-by: Nsam <sunhaoyl@outlook.com> Signed-off-by: NAlexander Potapenko <glider@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NKees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200419100848.63472-1-glider@google.com Link: https://github.com/google/kmsan/issues/76Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 5月, 2020 1 次提交
-
-
由 Eric W. Biederman 提交于
Most of the support for passing the file descriptor of an executable to an interpreter already lives in the generic code and in binfmt_elf. Rework the fields in binfmt_elf that deal with executable file descriptor passing to make executable file descriptor passing a first class concept. Move the fd_install from binfmt_misc into begin_new_exec after the new creds have been installed. This means that accessing the file through /proc/<pid>/fd/N is able to see the creds for the new executable before allowing access to the new executables files. Performing the install of the executables file descriptor after the point of no return also means that nothing special needs to be done on error. The exiting of the process will close all of it's open files. Move the would_dump from binfmt_misc into begin_new_exec right after would_dump is called on the bprm->file. This makes it obvious this case exists and that no nesting of bprm->file is currently supported. In binfmt_misc the movement of fd_install into generic code means that it's special error exit path is no longer needed. Link: https://lkml.kernel.org/r/87y2poyd91.fsf_-_@x220.int.ebiederm.orgAcked-by: NLinus Torvalds <torvalds@linux-foundation.org> Reviewed-by: NKees Cook <keescook@chromium.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 08 5月, 2020 2 次提交
-
-
由 Eric W. Biederman 提交于
There is and has been for a very long time been a lot more going on in flush_old_exec than just flushing the old state. After the movement of code from setup_new_exec there is a whole lot more going on than just flushing the old executables state. Rename flush_old_exec to begin_new_exec to more accurately reflect what this function does. Reviewed-by: NKees Cook <keescook@chromium.org> Reviewed-by: NGreg Ungerer <gerg@linux-m68k.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
The two functions are now always called one right after the other so merge them together to make future maintenance easier. Reviewed-by: NKees Cook <keescook@chromium.org> Reviewed-by: NGreg Ungerer <gerg@linux-m68k.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 06 5月, 2020 2 次提交
-
-
由 Christoph Hellwig 提交于
There is no logic in elf_core_dump itself or in the various arch helpers called from it which use uaccess routines on kernel pointers except for the file writes thate are nicely encapsulated by using __kernel_write in dump_emit. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric W. Biederman 提交于
The code in binfmt_elf.c is differnt from the rest of the code that processes siginfo, as it sends siginfo from a kernel buffer to a file rather than from kernel memory to userspace buffers. To remove it's use of set_fs the code needs some different siginfo helpers. Add the helper copy_siginfo_to_external to copy from the kernel's internal siginfo layout to a buffer in the siginfo layout that userspace expects. Modify fill_siginfo_note to use copy_siginfo_to_external instead of set_fs and copy_siginfo_to_user. Update compat_binfmt_elf.c to use the previously added copy_siginfo_to_external32 to handle the compat case. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 08 4月, 2020 4 次提交
-
-
由 Alexey Dobriyan 提交于
Static executables don't need to free NULL pointer. It doesn't matter really because static executable is not common scenario but do it anyway out of pedantry. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200219185330.GA4933@avx2Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
PT_INTERP ELF header can be spared if executable is static. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200219185012.GB4871@avx2Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
"loc" variable became just a wrapper for PT_INTERP ELF header after main ELF header was moved to "bprm->buf". Delete it. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200219184847.GA4871@avx2Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anshuman Khandual 提交于
This replaces all remaining open encodings with is_vm_hugetlb_page(). Signed-off-by: NAnshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Will Deacon <will@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Nick Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Paul Burton <paulburton@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/1582520593-30704-4-git-send-email-anshuman.khandual@arm.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 3月, 2020 2 次提交
-
-
由 Dave Martin 提交于
An arch may want to tweak the mmap prot flags for an ELFexecutable's initial mappings. For example, arm64 is going to need to add PROT_BTI for executable pages in an ELF process whose executable is marked as using Branch Target Identification (an ARMv8.5-A control flow integrity feature). So that this can be done in a generic way, add a hook arch_elf_adjust_prot() to modify the prot flags as desired: arches can select CONFIG_HAVE_ELF_PROT and implement their own backend where necessary. By default, leave the prot flags unchanged. Signed-off-by: NMark Brown <broonie@kernel.org> Signed-off-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com> Reviewed-by: NKees Cook <keescook@chromium.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 Dave Martin 提交于
ELF program properties will be needed for detecting whether to enable optional architecture or ABI features for a new ELF process. For now, there are no generic properties that we care about, so do nothing unless CONFIG_ARCH_USE_GNU_PROPERTY=y. Otherwise, the presence of properties using the PT_PROGRAM_PROPERTY phdrs entry (if any), and notify each property to the arch code. For now, the added code is not used. Signed-off-by: NMark Brown <broonie@kernel.org> Signed-off-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NKees Cook <keescook@chromium.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-