- 24 1月, 2021 4 次提交
-
-
由 Christian Brauner 提交于
When executing a setuid binary the kernel will verify in bprm_fill_uid() that the inode has a mapping in the caller's user namespace before setting the callers uid and gid. Let bprm_fill_uid() handle idmapped mounts. If the inode is accessed through an idmapped mount it is mapped according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-24-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJames Morris <jamorris@linux.microsoft.com> Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
-
由 Christian Brauner 提交于
When determining whether or not to create a coredump the vfs will verify that the caller is privileged over the inode. Make the would_dump() helper handle idmapped mounts by passing down the mount's user namespace of the exec file. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-23-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
-
由 Christian Brauner 提交于
The two helpers inode_permission() and generic_permission() are used by the vfs to perform basic permission checking by verifying that the caller is privileged over an inode. In order to handle idmapped mounts we extend the two helpers with an additional user namespace argument. On idmapped mounts the two helpers will make sure to map the inode according to the mount's user namespace and then peform identical permission checks to inode_permission() and generic_permission(). If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJames Morris <jamorris@linux.microsoft.com> Acked-by: NSerge Hallyn <serge@hallyn.com> Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
-
由 Christian Brauner 提交于
In order to determine whether a caller holds privilege over a given inode the capability framework exposes the two helpers privileged_wrt_inode_uidgid() and capable_wrt_inode_uidgid(). The former verifies that the inode has a mapping in the caller's user namespace and the latter additionally verifies that the caller has the requested capability in their current user namespace. If the inode is accessed through an idmapped mount map it into the mount's user namespace. Afterwards the checks are identical to non-idmapped inodes. If the initial user namespace is passed all operations are a nop so non-idmapped mounts will not see a change in behavior. Link: https://lore.kernel.org/r/20210121131959.646623-5-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJames Morris <jamorris@linux.microsoft.com> Acked-by: NSerge Hallyn <serge@hallyn.com> Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
-
- 11 12月, 2020 5 次提交
-
-
由 Eric W. Biederman 提交于
Recently syzbot reported[0] that there is a deadlock amongst the users of exec_update_mutex. The problematic lock ordering found by lockdep was: perf_event_open (exec_update_mutex -> ovl_i_mutex) chown (ovl_i_mutex -> sb_writes) sendfile (sb_writes -> p->lock) by reading from a proc file and writing to overlayfs proc_pid_syscall (p->lock -> exec_update_mutex) While looking at possible solutions it occured to me that all of the users and possible users involved only wanted to state of the given process to remain the same. They are all readers. The only writer is exec. There is no reason for readers to block on each other. So fix this deadlock by transforming exec_update_mutex into a rw_semaphore named exec_update_lock that only exec takes for writing. Cc: Jann Horn <jannh@google.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Bernd Edlinger <bernd.edlinger@hotmail.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christopher Yeoh <cyeoh@au1.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Fixes: eea96732 ("exec: Add exec_update_mutex to replace cred_guard_mutex") [0] https://lkml.kernel.org/r/00000000000063640c05ade8e3de@google.com Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/87ft4mbqen.fsf@x220.int.ebiederm.orgSigned-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Now that unshare_files happens in begin_new_exec after the point of no return, io_uring_task_cancel can also happen later. Effectively this means io_uring activities for a task are only canceled when exec succeeds. Link: https://lkml.kernel.org/r/878saih2op.fsf@x220.int.ebiederm.orgSigned-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Now that exec no longer needs to return the unshared files to their previous value there is no reason to return displaced. Instead when unshare_fd creates a copy of the file table, call put_files_struct before returning from unshare_files. Acked-by: NChristian Brauner <christian.brauner@ubuntu.com> v1: https://lkml.kernel.org/r/20200817220425.9389-2-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-2-ebiederm@xmission.comSigned-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Many moons ago the binfmts were doing some very questionable things with file descriptors and an unsharing of the file descriptor table was added to make things better[1][2]. The helper steal_lockss was added to avoid breaking the userspace programs[3][4][6]. Unfortunately it turned out that steal_locks did not work for network file systems[5], so it was removed to see if anyone would complain[7][8]. It was thought at the time that NPTL would not be affected as the unshare_files happened after the other threads were killed[8]. Unfortunately because there was an unshare_files in binfmt_elf.c before the threads were killed this analysis was incorrect. This unshare_files in binfmt_elf.c resulted in the unshares_files happening whenever threads were present. Which led to unshare_files being moved to the start of do_execve[9]. Later the problems were rediscovered and the suggested approach was to readd steal_locks under a different name[10]. I happened to be reviewing patches and I noticed that this approach was a step backwards[11]. I proposed simply moving unshare_files[12] and it was pointed out that moving unshare_files without auditing the code was also unsafe[13]. There were then several attempts to solve this[14][15][16] and I even posted this set of changes[17]. Unfortunately because auditing all of execve is time consuming this change did not make it in at the time. Well now that I am cleaning up exec I have made the time to read through all of the binfmts and the only playing with file descriptors is either the security modules closing them in security_bprm_committing_creds or is in the generic code in fs/exec.c. None of it happens before begin_new_exec is called. So move unshare_files into begin_new_exec, after the point of no return. If memory is very very very low and the application calling exec is sharing file descriptor tables between processes we might fail past the point of no return. Which is unfortunate but no different than any of the other places where we allocate memory after the point of no return. This movement allows another process that shares the file table, or another thread of the same process and that closes files or changes their close on exec behavior and races with execve to cause some unexpected things to happen. There is only one time of check to time of use race and it is just there so that execve fails instead of an interpreter failing when it tries to open the file it is supposed to be interpreting. Failing later if userspace is being silly is not a problem. With this change it the following discription from the removal of steal_locks[8] finally becomes true. Apps using NPTL are not affected, since all other threads are killed before execve. Apps using LinuxThreads are only affected if they - have multiple threads during exec (LinuxThreads doesn't kill other threads, the app may do it with pthread_kill_other_threads_np()) - rely on POSIX locks being inherited across exec Both conditions are documented, but not their interaction. Apps using clone() natively are affected if they - use clone(CLONE_FILES) - rely on POSIX locks being inherited across exec I have investigated some paths to make it possible to solve this without moving unshare_files but they all look more complicated[18]. Reported-by: NDaniel P. Berrangé <berrange@redhat.com> Reported-by: NJeff Layton <jlayton@redhat.com> History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git [1] 02cda956de0b ("[PATCH] unshare_files" [2] 04e9bcb4d106 ("[PATCH] use new unshare_files helper") [3] 088f5d7244de ("[PATCH] add steal_locks helper") [4] 02c541ec8ffa ("[PATCH] use new steal_locks helper") [5] https://lkml.kernel.org/r/E1FLIlF-0007zR-00@dorka.pomaz.szeredi.hu [6] https://lkml.kernel.org/r/0060321191605.GB15997@sorel.sous-sol.org [7] https://lkml.kernel.org/r/E1FLwjC-0000kJ-00@dorka.pomaz.szeredi.hu [8] c89681ed ("[PATCH] remove steal_locks()") [9] fd8328be ("[PATCH] sanitize handling of shared descriptor tables in failing execve()") [10] https://lkml.kernel.org/r/20180317142520.30520-1-jlayton@kernel.org [11] https://lkml.kernel.org/r/87r2nwqk73.fsf@xmission.com [12] https://lkml.kernel.org/r/87bmfgvg8w.fsf@xmission.com [13] https://lkml.kernel.org/r/20180322111424.GE30522@ZenIV.linux.org.uk [14] https://lkml.kernel.org/r/20180827174722.3723-1-jlayton@kernel.org [15] https://lkml.kernel.org/r/20180830172423.21964-1-jlayton@kernel.org [16] https://lkml.kernel.org/r/20180914105310.6454-1-jlayton@kernel.org [17] https://lkml.kernel.org/r/87a7ohs5ow.fsf@xmission.com [18] https://lkml.kernel.org/r/87pn8c1uj6.fsf_-_@x220.int.ebiederm.orgAcked-by: NChristian Brauner <christian.brauner@ubuntu.com> v1: https://lkml.kernel.org/r/20200817220425.9389-1-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-1-ebiederm@xmission.comSigned-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Al Viro pointed out that using the phrase "close_on_exec(fd, rcu_dereference_raw(current->files->fdt))" instead of wrapping it in rcu_read_lock(), rcu_read_unlock() is a very questionable optimization[1]. Once wrapped with rcu_read_lock()/rcu_read_unlock() that phrase becomes equivalent the helper function get_close_on_exec so simplify the code and make it more robust by simply using get_close_on_exec. [1] https://lkml.kernel.org/r/20201207222214.GA4115853@ZenIV.linux.org.ukSuggested-by: NAl Viro <viro@ftp.linux.org.uk> Link: https://lkml.kernel.org/r/87k0tqr6zi.fsf_-_@x220.int.ebiederm.orgSigned-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 02 12月, 2020 1 次提交
-
-
由 Gabriel Krisman Bertazi 提交于
Introduce a mechanism to quickly disable/enable syscall handling for a specific process and redirect to userspace via SIGSYS. This is useful for processes with parts that require syscall redirection and parts that don't, but who need to perform this boundary crossing really fast, without paying the cost of a system call to reconfigure syscall handling on each boundary transition. This is particularly important for Windows games running over Wine. The proposed interface looks like this: prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <off>, <length>, [selector]) The range [<offset>,<offset>+<length>) is a part of the process memory map that is allowed to by-pass the redirection code and dispatch syscalls directly, such that in fast paths a process doesn't need to disable the trap nor the kernel has to check the selector. This is essential to return from SIGSYS to a blocked area without triggering another SIGSYS from rt_sigreturn. selector is an optional pointer to a char-sized userspace memory region that has a key switch for the mechanism. This key switch is set to either PR_SYS_DISPATCH_ON, PR_SYS_DISPATCH_OFF to enable and disable the redirection without calling the kernel. The feature is meant to be set per-thread and it is disabled on fork/clone/execv. Internally, this doesn't add overhead to the syscall hot path, and it requires very little per-architecture support. I avoided using seccomp, even though it duplicates some functionality, due to previous feedback that maybe it shouldn't mix with seccomp since it is not a security mechanism. And obviously, this should never be considered a security mechanism, since any part of the program can by-pass it by using the syscall dispatcher. For the sysinfo benchmark, which measures the overhead added to executing a native syscall that doesn't require interception, the overhead using only the direct dispatcher region to issue syscalls is pretty much irrelevant. The overhead of using the selector goes around 40ns for a native (unredirected) syscall in my system, and it is (as expected) dominated by the supervisor-mode user-address access. In fact, with SMAP off, the overhead is consistently less than 5ns on my test box. Signed-off-by: NGabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NAndy Lutomirski <luto@kernel.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NKees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201127193238.821364-4-krisman@collabora.com
-
- 11 11月, 2020 1 次提交
-
-
由 Helge Deller 提交于
On parisc we need to initialize the memory layout for the user stack at process start time to a fixed size, which up until now was limited to the size as given by CONFIG_MAX_STACK_SIZE_MB at compile time. This hard limit was too small and showed problems when compiling ruby2.7, qmlcachegen and some Qt packages. This patch changes two things: a) It increases the default maximum stack size to 100MB. b) Users can modify the stack hard limit size with ulimit and then newly forked processes will use the given stack size which can even be bigger than the default 100MB. Reported-by: NJohn David Anglin <dave.anglin@bell.net> Signed-off-by: NHelge Deller <deller@gmx.de>
-
- 05 10月, 2020 3 次提交
-
-
由 Kees Cook 提交于
These routines are used in places outside of exec(2), so in preparation for refactoring them, move them into a separate source file, fs/kernel_read_file.c. Signed-off-by: NKees Cook <keescook@chromium.org> Reviewed-by: NMimi Zohar <zohar@linux.ibm.com> Reviewed-by: NLuis Chamberlain <mcgrof@kernel.org> Acked-by: NScott Branden <scott.branden@broadcom.com> Link: https://lore.kernel.org/r/20201002173828.2099543-5-keescook@chromium.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Scott Branden 提交于
Move kernel_read_file* out of linux/fs.h to its own linux/kernel_read_file.h include file. That header gets pulled in just about everywhere and doesn't really need functions not related to the general fs interface. Suggested-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NScott Branden <scott.branden@broadcom.com> Signed-off-by: NKees Cook <keescook@chromium.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMimi Zohar <zohar@linux.ibm.com> Reviewed-by: NLuis Chamberlain <mcgrof@kernel.org> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: NJames Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/r/20200706232309.12010-2-scott.branden@broadcom.com Link: https://lore.kernel.org/r/20201002173828.2099543-4-keescook@chromium.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Kees Cook 提交于
FIRMWARE_PREALLOC_BUFFER is a "how", not a "what", and confuses the LSMs that are interested in filtering between types of things. The "how" should be an internal detail made uninteresting to the LSMs. Fixes: a098ecd2 ("firmware: support loading into a pre-allocated buffer") Fixes: fd90bc55 ("ima: based on policy verify firmware signatures (pre-allocated buffer)") Fixes: 4f0496d8 ("ima: based on policy warn about loading firmware (pre-allocated buffer)") Signed-off-by: NKees Cook <keescook@chromium.org> Reviewed-by: NMimi Zohar <zohar@linux.ibm.com> Reviewed-by: NLuis Chamberlain <mcgrof@kernel.org> Acked-by: NScott Branden <scott.branden@broadcom.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201002173828.2099543-2-keescook@chromium.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 01 10月, 2020 1 次提交
-
-
由 Jens Axboe 提交于
Grab actual references to the files_struct. To avoid circular references issues due to this, we add a per-task note that keeps track of what io_uring contexts a task has used. When the tasks execs or exits its assigned files, we cancel requests based on this tracking. With that, we can grab proper references to the files table, and no longer need to rely on stashing away ring_fd and ring_file to check if the ring_fd may have been closed. Cc: stable@vger.kernel.org # v5.5+ Reviewed-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 16 9月, 2020 1 次提交
-
-
由 Nicholas Piggin 提交于
Reading and modifying current->mm and current->active_mm and switching mm should be done with irqs off, to prevent races seeing an intermediate state. This is similar to commit 38cf307c ("mm: fix kthread_use_mm() vs TLB invalidate"). At exec-time when the new mm is activated, the old one should usually be single-threaded and no longer used, unless something else is holding an mm_users reference (which may be possible). Absent other mm_users, there is also a race with preemption and lazy tlb switching. Consider the kernel_execve case where the current thread is using a lazy tlb active mm: call_usermodehelper() kernel_execve() old_mm = current->mm; active_mm = current->active_mm; *** preempt *** --------------------> schedule() prev->active_mm = NULL; mmdrop(prev active_mm); ... <-------------------- schedule() current->mm = mm; current->active_mm = mm; if (!old_mm) mmdrop(active_mm); If we switch back to the kernel thread from a different mm, there is a double free of the old active_mm, and a missing free of the new one. Closing this race only requires interrupts to be disabled while ->mm and ->active_mm are being switched, but the TLB problem requires also holding interrupts off over activate_mm. Unfortunately not all archs can do that yet, e.g., arm defers the switch if irqs are disabled and expects finish_arch_post_lock_switch() to be called to complete the flush; um takes a blocking lock in activate_mm(). So as a first step, disable interrupts across the mm/active_mm updates to close the lazy tlb preempt race, and provide an arch option to extend that to activate_mm which allows architectures doing IPI based TLB shootdowns to close the second race. This is a bit ugly, but in the interest of fixing the bug and backporting before all architectures are converted this is a compromise. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200914045219.3736466-2-npiggin@gmail.com
-
- 13 8月, 2020 5 次提交
-
-
由 Peter Xu 提交于
After the cleanup of page fault accounting, gup does not need to pass task_struct around any more. Remove that parameter in the whole gup stack. Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com> Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
The path_noexec() check, like the regular file check, was happening too late, letting LSMs see impossible execve()s. Check it earlier as well in may_open() and collect the redundant fs/exec.c path_noexec() test under the same robustness comment as the S_ISREG() check. My notes on the call path, and related arguments, checks, etc: do_open_execat() struct open_flags open_exec_flags = { .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, .acc_mode = MAY_EXEC, ... do_filp_open(dfd, filename, open_flags) path_openat(nameidata, open_flags, flags) file = alloc_empty_file(open_flags, current_cred()); do_open(nameidata, file, open_flags) may_open(path, acc_mode, open_flag) /* new location of MAY_EXEC vs path_noexec() test */ inode_permission(inode, MAY_OPEN | acc_mode) security_inode_permission(inode, acc_mode) vfs_open(path, file) do_dentry_open(file, path->dentry->d_inode, open) security_file_open(f) open() /* old location of path_noexec() test */ Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: http://lkml.kernel.org/r/20200605160013.3954297-4-keescook@chromium.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
The execve(2)/uselib(2) syscalls have always rejected non-regular files. Recently, it was noticed that a deadlock was introduced when trying to execute pipes, as the S_ISREG() test was happening too late. This was fixed in commit 73601ea5 ("fs/open.c: allow opening only regular files during execve()"), but it was added after inode_permission() had already run, which meant LSMs could see bogus attempts to execute non-regular files. Move the test into the other inode type checks (which already look for other pathological conditions[1]). Since there is no need to use FMODE_EXEC while we still have access to "acc_mode", also switch the test to MAY_EXEC. Also include a comment with the redundant S_ISREG() checks at the end of execve(2)/uselib(2) to note that they are present to avoid any mistakes. My notes on the call path, and related arguments, checks, etc: do_open_execat() struct open_flags open_exec_flags = { .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, .acc_mode = MAY_EXEC, ... do_filp_open(dfd, filename, open_flags) path_openat(nameidata, open_flags, flags) file = alloc_empty_file(open_flags, current_cred()); do_open(nameidata, file, open_flags) may_open(path, acc_mode, open_flag) /* new location of MAY_EXEC vs S_ISREG() test */ inode_permission(inode, MAY_OPEN | acc_mode) security_inode_permission(inode, acc_mode) vfs_open(path, file) do_dentry_open(file, path->dentry->d_inode, open) /* old location of FMODE_EXEC vs S_ISREG() test */ security_file_open(f) open() [1] https://lore.kernel.org/lkml/202006041910.9EF0C602@keescook/Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: http://lkml.kernel.org/r/20200605160013.3954297-3-keescook@chromium.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
Patch series "Relocate execve() sanity checks", v2. While looking at the code paths for the proposed O_MAYEXEC flag, I saw some things that looked like they should be fixed up. exec: Change uselib(2) IS_SREG() failure to EACCES This just regularizes the return code on uselib(2). exec: Move S_ISREG() check earlier This moves the S_ISREG() check even earlier than it was already. exec: Move path_noexec() check earlier This adds the path_noexec() check to the same place as the S_ISREG() check. This patch (of 3): Change uselib(2)' S_ISREG() error return to EACCES instead of EINVAL so the behavior matches execve(2), and the seemingly documented value. The "not a regular file" failure mode of execve(2) is explicitly documented[1], but it is not mentioned in uselib(2)[2] which does, however, say that open(2) and mmap(2) errors may apply. The documentation for open(2) does not include a "not a regular file" error[3], but mmap(2) does[4], and it is EACCES. [1] http://man7.org/linux/man-pages/man2/execve.2.html#ERRORS [2] http://man7.org/linux/man-pages/man2/uselib.2.html#ERRORS [3] http://man7.org/linux/man-pages/man2/open.2.html#ERRORS [4] http://man7.org/linux/man-pages/man2/mmap.2.html#ERRORSSigned-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NChristian Brauner <christian.brauner@ubuntu.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: http://lkml.kernel.org/r/20200605160013.3954297-1-keescook@chromium.org Link: http://lkml.kernel.org/r/20200605160013.3954297-2-keescook@chromium.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
Both exec and exit want to ensure that the uaccess routines actually do access user pointers. Use the newly added force_uaccess_begin helper instead of an open coded set_fs for that to prepare for kernel builds where set_fs() does not exist. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Link: http://lkml.kernel.org/r/20200710135706.537715-7-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 7月, 2020 6 次提交
-
-
由 Eric W. Biederman 提交于
To allow the kernel not to play games with set_fs to call exec implement kernel_execve. The function kernel_execve takes pointers into kernel memory and copies the values pointed to onto the new userspace stack. The calls with arguments from kernel space of do_execve are replaced with calls to kernel_execve. The calls do_execve and do_execveat are made static as there are now no callers outside of exec. The comments that mention do_execve are updated to refer to kernel_execve or execve depending on the circumstances. In addition to correcting the comments, this makes it easy to grep for do_execve and verify it is not used. Inspired-by: https://lkml.kernel.org/r/20200627072704.2447163-1-hch@lst.deReviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87wo365ikj.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
In preparation for implementiong kernel_execve (which will take kernel pointers not userspace pointers) factor out bprm_stack_limits out of prepare_arg_pages. This separates the counting which depends upon the getting data from userspace from the calculations of the stack limits which is usable in kernel_execve. The remove prepare_args_pages and compute bprm->argc and bprm->envc directly in do_execveat_common, before bprm_stack_limits is called. Reviewed-by: NKees Cook <keescook@chromium.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lkml.kernel.org/r/87365u6x60.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Currently it is necessary for the usermode helper code and the code that launches init to use set_fs so that pages coming from the kernel look like they are coming from userspace. To allow that usage of set_fs to be removed cleanly the argument copying from userspace needs to happen earlier. Factor bprm_execve out of do_execve_common to separate out the copying of arguments to the newe stack, and the rest of exec. In separating bprm_execve from do_execve_common the copying of the arguments onto the new stack happens earlier. As the copying of the arguments does not depend any security hooks, files, the file table, current->in_execve, current->fs->in_exec, bprm->unsafe, or creds this is safe. Likewise the security hook security_creds_for_exec does not depend upon preventing the argument copying from happening. In addition to making it possible to implement kernel_execve that performs the copying differently, this separation of bprm_execve from do_execve_common makes for a nice separation of responsibilities making the exec code easier to navigate. Reviewed-by: NKees Cook <keescook@chromium.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lkml.kernel.org/r/878sfm6x6x.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Currently it is necessary for the usermode helper code and the code that launches init to use set_fs so that pages coming from the kernel look like they are coming from userspace. To allow that usage of set_fs to be removed cleanly the argument copying from userspace needs to happen earlier. Move the allocation and initialization of bprm->mm into alloc_bprm so that the bprm->mm is available early to store the new user stack into. This is a prerequisite for copying argv and envp into the new user stack early before ther rest of exec. To keep the things consistent the cleanup of bprm->mm is moved into free_bprm. So that bprm->mm will be cleaned up whenever bprm->mm is allocated and free_bprm are called. Moving bprm_mm_init earlier is safe as it does not depend on any files, current->in_execve, current->fs->in_exec, bprm->unsafe, or the if the file table is shared. (AKA bprm_mm_init does not depend on any of the code that happens between alloc_bprm and where it was previously called.) This moves bprm->mm cleanup after current->fs->in_exec is set to 0. This is safe because current->fs->in_exec is only used to preventy taking an additional reference on the fs_struct. This moves bprm->mm cleanup after current->in_execve is set to 0. This is safe because current->in_execve is only used by the lsms (apparmor and tomoyou) and always for LSM specific functions, never for anything to do with the mm. This adds bprm->mm cleanup into the successful return path. This is safe because being on the successful return path implies that begin_new_exec succeeded and set brpm->mm to NULL. As bprm->mm is NULL bprm cleanup I am moving into free_bprm will do nothing. Reviewed-by: NKees Cook <keescook@chromium.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lkml.kernel.org/r/87eepe6x7p.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Currently it is necessary for the usermode helper code and the code that launches init to use set_fs so that pages coming from the kernel look like they are coming from userspace. To allow that usage of set_fs to be removed cleanly the argument copying from userspace needs to happen earlier. Move the computation of bprm->filename and possible allocation of a name in the case of execveat into alloc_bprm to make that possible. The exectuable name, the arguments, and the environment are copied into the new usermode stack which is stored in bprm until exec passes the point of no return. As the executable name is copied first onto the usermode stack it needs to be known. As there are no dependencies to computing the executable name, compute it early in alloc_bprm. As an implementation detail if the filename needs to be generated because it embeds a file descriptor store that filename in a new field bprm->fdpath, and free it in free_bprm. Previously this was done in an independent variable pathbuf. I have renamed pathbuf fdpath because fdpath is more suggestive of what kind of path is in the variable. I moved fdpath into struct linux_binprm because it is tightly tied to the other variables in struct linux_binprm, and as such is needed to allow the call alloc_binprm to move. Reviewed-by: NKees Cook <keescook@chromium.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lkml.kernel.org/r/87k0z66x8f.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Currently it is necessary for the usermode helper code and the code that launches init to use set_fs so that pages coming from the kernel look like they are coming from userspace. To allow that usage of set_fs to be removed cleanly the argument copying from userspace needs to happen earlier. Move the allocation of the bprm into it's own function (alloc_bprm) and move the call of alloc_bprm before unshare_files so that bprm can ultimately be allocated, the arguments can be placed on the new stack, and then the bprm can be passed into the core of exec. Neither the allocation of struct binprm nor the unsharing depend upon each other so swapping the order in which they are called is trivially safe. To keep things consistent the order of cleanup at the end of do_execve_common swapped to match the order of initialization. Reviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87pn8y6x9a.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 04 7月, 2020 1 次提交
-
-
由 Eric W. Biederman 提交于
Now that the last callser has been removed remove this code from exec. For anyone thinking of resurrecing do_execve_file please note that the code was buggy in several fundamental ways. - It did not ensure the file it was passed was read-only and that deny_write_access had been called on it. Which subtlely breaks invaniants in exec. - The caller of do_execve_file was expected to hold and put a reference to the file, but an extra reference for use by exec was not taken so that when exec put it's reference to the file an underflow occured on the file reference count. - The point of the interface was so that a pathname did not need to exist. Which breaks pathname based LSMs. Tetsuo Handa originally reported these issues[1]. While it was clear that deny_write_access was missing the fundamental incompatibility with the passed in O_RDWR filehandle was not immediately recognized. All of these issues were fixed by modifying the usermode driver code to have a path, so it did not need this hack. Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/ v1: https://lkml.kernel.org/r/871rm2f0hi.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87lfk54p0m.fsf_-_@x220.int.ebiederm.org Link: https://lkml.kernel.org/r/20200702164140.4468-10-ebiederm@xmission.comReviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: NAlexei Starovoitov <ast@kernel.org> Tested-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 10 6月, 2020 2 次提交
-
-
由 Michel Lespinasse 提交于
Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: NLaurent Dufour <ldufour@linux.ibm.com> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 6月, 2020 2 次提交
-
-
由 Christoph Hellwig 提交于
read_code operates on user addresses. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/20200515143646.3857579-27-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
Only build read_code when binary formats that use it are built into the kernel. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/20200515143646.3857579-26-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 6月, 2020 2 次提交
-
-
由 Christoph Hellwig 提交于
Currently copy_string_kernel is just a wrapper around copy_strings that simplifies the calling conventions and uses set_fs to allow passing a kernel pointer. But due to the fact the we only need to handle a single kernel argument pointer, the logic can be sigificantly simplified while getting rid of the set_fs. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/20200501104105.2621149-3-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
copy_strings_kernel is always used with a single argument, adjust the calling convention to that. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/20200501104105.2621149-2-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 5月, 2020 2 次提交
-
-
由 Eric W. Biederman 提交于
Move the computation of creds from prepare_binfmt into begin_new_exec so that the creds need only be computed once. This is just code reorganization no semantic changes of any kind are made. Moving the computation is safe. I have looked through the kernel and verified none of the binfmts look at bprm->cred directly, and that there are no helpers that look at bprm->cred indirectly. Which means that it is not a problem to compute the bprm->cred later in the execution flow as it is not used until it becomes current->cred. A new function bprm_creds_from_file is added to contain the work that needs to be done. bprm_creds_from_file first computes which file bprm->executable or most likely bprm->file that the bprm->creds will be computed from. The funciton bprm_fill_uid is updated to receive the file instead of accessing bprm->file. The now unnecessary work needed to reset the bprm->cred->euid, and bprm->cred->egid is removed from brpm_fill_uid. A small comment to document that bprm_fill_uid now only deals with the work to handle suid and sgid files. The default case is already heandled by prepare_exec_creds. The function security_bprm_repopulate_creds is renamed security_bprm_creds_from_file and now is explicitly passed the file from which to compute the creds. The documentation of the bprm_creds_from_file security hook is updated to explain when the hook is called and what it needs to do. The file is passed from cap_bprm_creds_from_file into get_file_caps so that the caps are computed for the appropriate file. The now unnecessary work in cap_bprm_creds_from_file to reset the ambient capabilites has been removed. A small comment to document that the work of cap_bprm_creds_from_file is to read capabilities from the files secureity attribute and derive capabilities from the fact the user had uid 0 has been added. Reviewed-by: NKees Cook <keescook@chromium.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
There is a small bug in the code that recomputes parts of bprm->cred for every bprm->file. The code never recomputes the part of clear_dangerous_personality_flags it is responsible for. Which means that in practice if someone creates a sgid script the interpreter will not be able to use any of: READ_IMPLIES_EXEC ADDR_NO_RANDOMIZE ADDR_COMPAT_LAYOUT MMAP_PAGE_ZERO. This accentially clearing of personality flags probably does not matter in practice because no one has complained but it does make the code more difficult to understand. Further remaining bug compatible prevents the recomputation from being removed and replaced by simply computing bprm->cred once from the final bprm->file. Making this change removes the last behavior difference between computing bprm->creds from the final file and recomputing bprm->cred several times. Which allows this behavior change to be justified for it's own reasons, and for any but hunts looking into why the behavior changed to wind up here instead of in the code that will follow that computes bprm->cred from the final bprm->file. This small logic bug appears to have existed since the code started clearing dangerous personality bits. History Tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: 1bb0fa189c6a ("[PATCH] NX: clean up legacy binary support") Reviewed-by: NKees Cook <keescook@chromium.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 21 5月, 2020 4 次提交
-
-
由 Eric W. Biederman 提交于
Recursion in kernel code is generally a bad idea as it can overflow the kernel stack. Recursion in exec also hides that the code is looping and that the loop changes bprm->file. Instead of recursing in search_binary_handler have the methods that would recurse set bprm->interpreter and return 0. Modify exec_binprm to loop when bprm->interpreter is set. Consolidate all of the reassignments of bprm->file in that loop to make it clear what is going on. The structure of the new loop in exec_binprm is that all errors return immediately, while successful completion (ret == 0 && !bprm->interpreter) just breaks out of the loop and runs what exec_bprm has always run upon successful completion. Fail if the an interpreter is being call after execfd has been set. The code has never properly handled an interpreter being called with execfd being set and with reassignments of bprm->file and the assignment of bprm->executable in generic code it has finally become possible to test and fail when if this problematic condition happens. With the reassignments of bprm->file and the assignment of bprm->executable moved into the generic code add a test to see if bprm->executable is being reassigned. In search_binary_handler remove the test for !bprm->file. With all reassignments of bprm->file moved to exec_binprm bprm->file can never be NULL in search_binary_handler. Link: https://lkml.kernel.org/r/87sgfwyd84.fsf_-_@x220.int.ebiederm.orgAcked-by: NLinus Torvalds <torvalds@linux-foundation.org> Reviewed-by: NKees Cook <keescook@chromium.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Most of the support for passing the file descriptor of an executable to an interpreter already lives in the generic code and in binfmt_elf. Rework the fields in binfmt_elf that deal with executable file descriptor passing to make executable file descriptor passing a first class concept. Move the fd_install from binfmt_misc into begin_new_exec after the new creds have been installed. This means that accessing the file through /proc/<pid>/fd/N is able to see the creds for the new executable before allowing access to the new executables files. Performing the install of the executables file descriptor after the point of no return also means that nothing special needs to be done on error. The exiting of the process will close all of it's open files. Move the would_dump from binfmt_misc into begin_new_exec right after would_dump is called on the bprm->file. This makes it obvious this case exists and that no nesting of bprm->file is currently supported. In binfmt_misc the movement of fd_install into generic code means that it's special error exit path is no longer needed. Link: https://lkml.kernel.org/r/87y2poyd91.fsf_-_@x220.int.ebiederm.orgAcked-by: NLinus Torvalds <torvalds@linux-foundation.org> Reviewed-by: NKees Cook <keescook@chromium.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
The code in prepare_binary_handler needs to be run every time search_binary_handler is called so move the call into search_binary_handler itself to make the code simpler and easier to understand. Link: https://lkml.kernel.org/r/87d070zrvx.fsf_-_@x220.int.ebiederm.orgAcked-by: NLinus Torvalds <torvalds@linux-foundation.org> Reviewed-by: NKees Cook <keescook@chromium.org> Reviewed-by: NJames Morris <jamorris@linux.microsoft.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Add a flag preserve_creds that binfmt_misc can set to prevent credentials from being updated. This allows binfmt_misc to always call prepare_binprm. Allowing the credential computation logic to be consolidated. Not replacing the credentials with the interpreters credentials is safe because because an open file descriptor to the executable is passed to the interpreter. As the interpreter does not need to reopen the executable it is guaranteed to see the same file that exec sees. Ref: c407c033de84 ("[PATCH] binfmt_misc: improve calculation of interpreter's credentials") Link: https://lkml.kernel.org/r/87imgszrwo.fsf_-_@x220.int.ebiederm.orgAcked-by: NLinus Torvalds <torvalds@linux-foundation.org> Reviewed-by: NKees Cook <keescook@chromium.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-