1. 21 7月, 2020 5 次提交
    • E
      exec: Factor bprm_stack_limits out of prepare_arg_pages · d8b9cd54
      Eric W. Biederman 提交于
      In preparation for implementiong kernel_execve (which will take kernel
      pointers not userspace pointers) factor out bprm_stack_limits out of
      prepare_arg_pages.  This separates the counting which depends upon the
      getting data from userspace from the calculations of the stack limits
      which is usable in kernel_execve.
      
      The remove prepare_args_pages and compute bprm->argc and bprm->envc
      directly in do_execveat_common, before bprm_stack_limits is called.
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Link: https://lkml.kernel.org/r/87365u6x60.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      d8b9cd54
    • E
      exec: Factor bprm_execve out of do_execve_common · 0c9cdff0
      Eric W. Biederman 提交于
      Currently it is necessary for the usermode helper code and the code
      that launches init to use set_fs so that pages coming from the kernel
      look like they are coming from userspace.
      
      To allow that usage of set_fs to be removed cleanly the argument
      copying from userspace needs to happen earlier.  Factor bprm_execve
      out of do_execve_common to separate out the copying of arguments
      to the newe stack, and the rest of exec.
      
      In separating bprm_execve from do_execve_common the copying
      of the arguments onto the new stack happens earlier.
      
      As the copying of the arguments does not depend any security hooks,
      files, the file table, current->in_execve, current->fs->in_exec,
      bprm->unsafe, or creds this is safe.
      
      Likewise the security hook security_creds_for_exec does not depend upon
      preventing the argument copying from happening.
      
      In addition to making it possible to implement kernel_execve that
      performs the copying differently, this separation of bprm_execve from
      do_execve_common makes for a nice separation of responsibilities making
      the exec code easier to navigate.
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Link: https://lkml.kernel.org/r/878sfm6x6x.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      0c9cdff0
    • E
      exec: Move bprm_mm_init into alloc_bprm · f18ac551
      Eric W. Biederman 提交于
      Currently it is necessary for the usermode helper code and the code that
      launches init to use set_fs so that pages coming from the kernel look like
      they are coming from userspace.
      
      To allow that usage of set_fs to be removed cleanly the argument copying
      from userspace needs to happen earlier.  Move the allocation and
      initialization of bprm->mm into alloc_bprm so that the bprm->mm is
      available early to store the new user stack into.  This is a prerequisite
      for copying argv and envp into the new user stack early before ther rest of
      exec.
      
      To keep the things consistent the cleanup of bprm->mm is moved into
      free_bprm.  So that bprm->mm will be cleaned up whenever bprm->mm is
      allocated and free_bprm are called.
      
      Moving bprm_mm_init earlier is safe as it does not depend on any files,
      current->in_execve, current->fs->in_exec, bprm->unsafe, or the if the file
      table is shared. (AKA bprm_mm_init does not depend on any of the code that
      happens between alloc_bprm and where it was previously called.)
      
      This moves bprm->mm cleanup after current->fs->in_exec is set to 0.  This
      is safe because current->fs->in_exec is only used to preventy taking an
      additional reference on the fs_struct.
      
      This moves bprm->mm cleanup after current->in_execve is set to 0.  This is
      safe because current->in_execve is only used by the lsms (apparmor and
      tomoyou) and always for LSM specific functions, never for anything to do
      with the mm.
      
      This adds bprm->mm cleanup into the successful return path.  This is safe
      because being on the successful return path implies that begin_new_exec
      succeeded and set brpm->mm to NULL.  As bprm->mm is NULL bprm cleanup I am
      moving into free_bprm will do nothing.
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Link: https://lkml.kernel.org/r/87eepe6x7p.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      f18ac551
    • E
      exec: Move initialization of bprm->filename into alloc_bprm · 60d9ad1d
      Eric W. Biederman 提交于
      Currently it is necessary for the usermode helper code and the code
      that launches init to use set_fs so that pages coming from the kernel
      look like they are coming from userspace.
      
      To allow that usage of set_fs to be removed cleanly the argument
      copying from userspace needs to happen earlier.  Move the computation
      of bprm->filename and possible allocation of a name in the case
      of execveat into alloc_bprm to make that possible.
      
      The exectuable name, the arguments, and the environment are
      copied into the new usermode stack which is stored in bprm
      until exec passes the point of no return.
      
      As the executable name is copied first onto the usermode stack
      it needs to be known.  As there are no dependencies to computing
      the executable name, compute it early in alloc_bprm.
      
      As an implementation detail if the filename needs to be generated
      because it embeds a file descriptor store that filename in a new field
      bprm->fdpath, and free it in free_bprm.  Previously this was done in
      an independent variable pathbuf.  I have renamed pathbuf fdpath
      because fdpath is more suggestive of what kind of path is in the
      variable.  I moved fdpath into struct linux_binprm because it is
      tightly tied to the other variables in struct linux_binprm, and as
      such is needed to allow the call alloc_binprm to move.
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Link: https://lkml.kernel.org/r/87k0z66x8f.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      60d9ad1d
    • E
      exec: Factor out alloc_bprm · 0a8f36eb
      Eric W. Biederman 提交于
      Currently it is necessary for the usermode helper code and the code
      that launches init to use set_fs so that pages coming from the kernel
      look like they are coming from userspace.
      
      To allow that usage of set_fs to be removed cleanly the argument
      copying from userspace needs to happen earlier.  Move the allocation
      of the bprm into it's own function (alloc_bprm) and move the call of
      alloc_bprm before unshare_files so that bprm can ultimately be
      allocated, the arguments can be placed on the new stack, and then the
      bprm can be passed into the core of exec.
      
      Neither the allocation of struct binprm nor the unsharing depend upon each
      other so swapping the order in which they are called is trivially safe.
      
      To keep things consistent the order of cleanup at the end of
      do_execve_common swapped to match the order of initialization.
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Link: https://lkml.kernel.org/r/87pn8y6x9a.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      0a8f36eb
  2. 04 7月, 2020 1 次提交
  3. 10 6月, 2020 2 次提交
  4. 09 6月, 2020 2 次提交
  5. 05 6月, 2020 2 次提交
  6. 30 5月, 2020 2 次提交
    • E
      exec: Compute file based creds only once · 56305aa9
      Eric W. Biederman 提交于
      Move the computation of creds from prepare_binfmt into begin_new_exec
      so that the creds need only be computed once.  This is just code
      reorganization no semantic changes of any kind are made.
      
      Moving the computation is safe.  I have looked through the kernel and
      verified none of the binfmts look at bprm->cred directly, and that
      there are no helpers that look at bprm->cred indirectly.  Which means
      that it is not a problem to compute the bprm->cred later in the
      execution flow as it is not used until it becomes current->cred.
      
      A new function bprm_creds_from_file is added to contain the work that
      needs to be done.  bprm_creds_from_file first computes which file
      bprm->executable or most likely bprm->file that the bprm->creds
      will be computed from.
      
      The funciton bprm_fill_uid is updated to receive the file instead of
      accessing bprm->file.  The now unnecessary work needed to reset the
      bprm->cred->euid, and bprm->cred->egid is removed from brpm_fill_uid.
      A small comment to document that bprm_fill_uid now only deals with the
      work to handle suid and sgid files.  The default case is already
      heandled by prepare_exec_creds.
      
      The function security_bprm_repopulate_creds is renamed
      security_bprm_creds_from_file and now is explicitly passed the file
      from which to compute the creds.  The documentation of the
      bprm_creds_from_file security hook is updated to explain when the hook
      is called and what it needs to do.  The file is passed from
      cap_bprm_creds_from_file into get_file_caps so that the caps are
      computed for the appropriate file.  The now unnecessary work in
      cap_bprm_creds_from_file to reset the ambient capabilites has been
      removed.  A small comment to document that the work of
      cap_bprm_creds_from_file is to read capabilities from the files
      secureity attribute and derive capabilities from the fact the
      user had uid 0 has been added.
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      56305aa9
    • E
      exec: Add a per bprm->file version of per_clear · a7868323
      Eric W. Biederman 提交于
      There is a small bug in the code that recomputes parts of bprm->cred
      for every bprm->file.  The code never recomputes the part of
      clear_dangerous_personality_flags it is responsible for.
      
      Which means that in practice if someone creates a sgid script
      the interpreter will not be able to use any of:
      	READ_IMPLIES_EXEC
      	ADDR_NO_RANDOMIZE
      	ADDR_COMPAT_LAYOUT
      	MMAP_PAGE_ZERO.
      
      This accentially clearing of personality flags probably does
      not matter in practice because no one has complained
      but it does make the code more difficult to understand.
      
      Further remaining bug compatible prevents the recomputation from being
      removed and replaced by simply computing bprm->cred once from the
      final bprm->file.
      
      Making this change removes the last behavior difference between
      computing bprm->creds from the final file and recomputing
      bprm->cred several times.  Which allows this behavior change
      to be justified for it's own reasons, and for any but hunts
      looking into why the behavior changed to wind up here instead
      of in the code that will follow that computes bprm->cred
      from the final bprm->file.
      
      This small logic bug appears to have existed since the code
      started clearing dangerous personality bits.
      
      History Tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      Fixes: 1bb0fa189c6a ("[PATCH] NX: clean up legacy binary support")
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      a7868323
  7. 21 5月, 2020 6 次提交
  8. 17 5月, 2020 1 次提交
    • E
      exec: Move would_dump into flush_old_exec · f87d1c95
      Eric W. Biederman 提交于
      I goofed when I added mm->user_ns support to would_dump.  I missed the
      fact that in the case of binfmt_loader, binfmt_em86, binfmt_misc, and
      binfmt_script bprm->file is reassigned.  Which made the move of
      would_dump from setup_new_exec to __do_execve_file before exec_binprm
      incorrect as it can result in would_dump running on the script instead
      of the interpreter of the script.
      
      The net result is that the code stopped making unreadable interpreters
      undumpable.  Which allows them to be ptraced and written to disk
      without special permissions.  Oops.
      
      The move was necessary because the call in set_new_exec was after
      bprm->mm was no longer valid.
      
      To correct this mistake move the misplaced would_dump from
      __do_execve_file into flos_old_exec, before exec_mmap is called.
      
      I tested and confirmed that without this fix I can attach with gdb to
      a script with an unreadable interpreter, and with this fix I can not.
      
      Cc: stable@vger.kernel.org
      Fixes: f84df2a6 ("exec: Ensure mm->user_ns contains the execed files")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      f87d1c95
  9. 12 5月, 2020 3 次提交
  10. 09 5月, 2020 2 次提交
  11. 08 5月, 2020 6 次提交
  12. 29 4月, 2020 2 次提交
    • E
      exec: Remove BUG_ON(has_group_leader_pid) · 610b8188
      Eric W. Biederman 提交于
      With the introduction of exchange_tids thread_group_leader and
      has_group_leader_pid have become equivalent.  Further at this point in the
      code a thread group has exactly two threads, the previous thread_group_leader
      that is waiting to be reaped and tsk.  So we know it is impossible for tsk to
      be the thread_group_leader.
      
      This is also the last user of has_group_leader_pid so removing this check
      will allow has_group_leader_pid to be removed.
      
      So remove the "BUG_ON(has_group_leader_pid)" that will never fire.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      610b8188
    • E
      proc: Ensure we see the exit of each process tid exactly once · 6b03d130
      Eric W. Biederman 提交于
      When the thread group leader changes during exec and the old leaders
      thread is reaped proc_flush_pid will flush the dentries for the entire
      process because the leader still has it's original pid.
      
      Fix this by exchanging the pids in an rcu safe manner,
      and wrapping the code to do that up in a helper exchange_tids.
      
      When I removed switch_exec_pids and introduced this behavior
      in d73d6529 ("[PATCH] pidhash: kill switch_exec_pids") there
      really was nothing that cared as flushing happened with
      the cached dentry and de_thread flushed both of them on exec.
      
      This lack of fully exchanging pids became a problem a few months later
      when I introduced 48e6484d ("[PATCH] proc: Rewrite the proc dentry
      flush on exit optimization").  Which overlooked the de_thread case
      was no longer swapping pids, and I was looking up proc dentries
      by task->pid.
      
      The current behavior isn't properly a bug as everything in proc will
      continue to work correctly just a little bit less efficiently.  Fix
      this just so there are no little surprise corner cases waiting to bite
      people.
      
      -- Oleg points out this could be an issue in next_tgid in proc where
         has_group_leader_pid is called, and reording some of the assignments
         should fix that.
      
      -- Oleg points out this will break the 10 year old hack in __exit_signal.c
      >	/*
      >	 * This can only happen if the caller is de_thread().
      >	 * FIXME: this is the temporary hack, we should teach
      >	 * posix-cpu-timers to handle this case correctly.
      >	 */
      >	if (unlikely(has_group_leader_pid(tsk)))
      >		posix_cpu_timers_exit_group(tsk);
      
      The code in next_tgid has been changed to use PIDTYPE_TGID,
      and the posix cpu timers code has been fixed so it does not
      need the 10 year old hack, so this should be safe to merge
      now.
      
      Link: https://lore.kernel.org/lkml/87h7x3ajll.fsf_-_@x220.int.ebiederm.org/Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Fixes: 48e6484d ("[PATCH] proc: Rewrite the proc dentry flush on exit optimization").
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      6b03d130
  13. 02 4月, 2020 1 次提交
    • E
      signal: Extend exec_id to 64bits · d1e7fd64
      Eric W. Biederman 提交于
      Replace the 32bit exec_id with a 64bit exec_id to make it impossible
      to wrap the exec_id counter.  With care an attacker can cause exec_id
      wrap and send arbitrary signals to a newly exec'd parent.  This
      bypasses the signal sending checks if the parent changes their
      credentials during exec.
      
      The severity of this problem can been seen that in my limited testing
      of a 32bit exec_id it can take as little as 19s to exec 65536 times.
      Which means that it can take as little as 14 days to wrap a 32bit
      exec_id.  Adam Zabrocki has succeeded wrapping the self_exe_id in 7
      days.  Even my slower timing is in the uptime of a typical server.
      Which means self_exec_id is simply a speed bump today, and if exec
      gets noticably faster self_exec_id won't even be a speed bump.
      
      Extending self_exec_id to 64bits introduces a problem on 32bit
      architectures where reading self_exec_id is no longer atomic and can
      take two read instructions.  Which means that is is possible to hit
      a window where the read value of exec_id does not match the written
      value.  So with very lucky timing after this change this still
      remains expoiltable.
      
      I have updated the update of exec_id on exec to use WRITE_ONCE
      and the read of exec_id in do_notify_parent to use READ_ONCE
      to make it clear that there is no locking between these two
      locations.
      
      Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl
      Fixes: 2.3.23pre2
      Cc: stable@vger.kernel.org
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      d1e7fd64
  14. 25 3月, 2020 5 次提交