提交 · d8b9cd549ecf0f3dc8da42ada5f0ce73e8ed5f1e · openeuler / Kernel

21 7月, 2020 5 次提交

exec: Factor bprm_stack_limits out of prepare_arg_pages · d8b9cd54

由 Eric W. Biederman 提交于 7月 12, 2020

In preparation for implementiong kernel_execve (which will take kernel
pointers not userspace pointers) factor out bprm_stack_limits out of
prepare_arg_pages. This separates the counting which depends upon the
getting data from userspace from the calculations of the stack limits
which is usable in kernel_execve.

The remove prepare_args_pages and compute bprm->argc and bprm->envc
directly in do_execveat_common, before bprm_stack_limits is called.
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/87365u6x60.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

d8b9cd54

exec: Factor bprm_execve out of do_execve_common · 0c9cdff0

由 Eric W. Biederman 提交于 7月 12, 2020

Currently it is necessary for the usermode helper code and the code
that launches init to use set_fs so that pages coming from the kernel
look like they are coming from userspace.

To allow that usage of set_fs to be removed cleanly the argument
copying from userspace needs to happen earlier.  Factor bprm_execve
out of do_execve_common to separate out the copying of arguments
to the newe stack, and the rest of exec.

In separating bprm_execve from do_execve_common the copying
of the arguments onto the new stack happens earlier.

As the copying of the arguments does not depend any security hooks,
files, the file table, current->in_execve, current->fs->in_exec,
bprm->unsafe, or creds this is safe.

Likewise the security hook security_creds_for_exec does not depend upon
preventing the argument copying from happening.

In addition to making it possible to implement kernel_execve that
performs the copying differently, this separation of bprm_execve from
do_execve_common makes for a nice separation of responsibilities making
the exec code easier to navigate.
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/878sfm6x6x.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

0c9cdff0

exec: Move bprm_mm_init into alloc_bprm · f18ac551

由 Eric W. Biederman 提交于 7月 10, 2020

Currently it is necessary for the usermode helper code and the code that
launches init to use set_fs so that pages coming from the kernel look like
they are coming from userspace.

To allow that usage of set_fs to be removed cleanly the argument copying
from userspace needs to happen earlier. Move the allocation and
initialization of bprm->mm into alloc_bprm so that the bprm->mm is
available early to store the new user stack into. This is a prerequisite
for copying argv and envp into the new user stack early before ther rest of
exec.

To keep the things consistent the cleanup of bprm->mm is moved into
free_bprm. So that bprm->mm will be cleaned up whenever bprm->mm is
allocated and free_bprm are called.

Moving bprm_mm_init earlier is safe as it does not depend on any files,
current->in_execve, current->fs->in_exec, bprm->unsafe, or the if the file
table is shared. (AKA bprm_mm_init does not depend on any of the code that
happens between alloc_bprm and where it was previously called.)

This moves bprm->mm cleanup after current->fs->in_exec is set to 0. This
is safe because current->fs->in_exec is only used to preventy taking an
additional reference on the fs_struct.

This moves bprm->mm cleanup after current->in_execve is set to 0. This is
safe because current->in_execve is only used by the lsms (apparmor and
tomoyou) and always for LSM specific functions, never for anything to do
with the mm.

This adds bprm->mm cleanup into the successful return path. This is safe
because being on the successful return path implies that begin_new_exec
succeeded and set brpm->mm to NULL. As bprm->mm is NULL bprm cleanup I am
moving into free_bprm will do nothing.
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/87eepe6x7p.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

f18ac551

exec: Move initialization of bprm->filename into alloc_bprm · 60d9ad1d

由 Eric W. Biederman 提交于 7月 11, 2020

Currently it is necessary for the usermode helper code and the code
that launches init to use set_fs so that pages coming from the kernel
look like they are coming from userspace.

To allow that usage of set_fs to be removed cleanly the argument
copying from userspace needs to happen earlier. Move the computation
of bprm->filename and possible allocation of a name in the case
of execveat into alloc_bprm to make that possible.

The exectuable name, the arguments, and the environment are
copied into the new usermode stack which is stored in bprm
until exec passes the point of no return.

As the executable name is copied first onto the usermode stack
it needs to be known. As there are no dependencies to computing
the executable name, compute it early in alloc_bprm.

As an implementation detail if the filename needs to be generated
because it embeds a file descriptor store that filename in a new field
bprm->fdpath, and free it in free_bprm. Previously this was done in
an independent variable pathbuf. I have renamed pathbuf fdpath
because fdpath is more suggestive of what kind of path is in the
variable. I moved fdpath into struct linux_binprm because it is
tightly tied to the other variables in struct linux_binprm, and as
such is needed to allow the call alloc_binprm to move.
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/87k0z66x8f.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

60d9ad1d

exec: Factor out alloc_bprm · 0a8f36eb

由 Eric W. Biederman 提交于 7月 10, 2020

Currently it is necessary for the usermode helper code and the code
that launches init to use set_fs so that pages coming from the kernel
look like they are coming from userspace.

To allow that usage of set_fs to be removed cleanly the argument
copying from userspace needs to happen earlier. Move the allocation
of the bprm into it's own function (alloc_bprm) and move the call of
alloc_bprm before unshare_files so that bprm can ultimately be
allocated, the arguments can be placed on the new stack, and then the
bprm can be passed into the core of exec.

Neither the allocation of struct binprm nor the unsharing depend upon each
other so swapping the order in which they are called is trivially safe.

To keep things consistent the order of cleanup at the end of
do_execve_common swapped to match the order of initialization.
Reviewed-by: NKees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/87pn8y6x9a.fsf@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

0a8f36eb

04 7月, 2020 1 次提交

exec: Remove do_execve_file · 25cf336d

由 Eric W. Biederman 提交于 6月 25, 2020

Now that the last callser has been removed remove this code from exec.

For anyone thinking of resurrecing do_execve_file please note that
the code was buggy in several fundamental ways.

- It did not ensure the file it was passed was read-only and that
deny_write_access had been called on it. Which subtlely breaks
invaniants in exec.

- The caller of do_execve_file was expected to hold and put a
reference to the file, but an extra reference for use by exec was
not taken so that when exec put it's reference to the file an
underflow occured on the file reference count.

- The point of the interface was so that a pathname did not need to
exist. Which breaks pathname based LSMs.

Tetsuo Handa originally reported these issues[1]. While it was clear
that deny_write_access was missing the fundamental incompatibility
with the passed in O_RDWR filehandle was not immediately recognized.

All of these issues were fixed by modifying the usermode driver code
to have a path, so it did not need this hack.
Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
[1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/
v1: https://lkml.kernel.org/r/871rm2f0hi.fsf_-_@x220.int.ebiederm.org
v2: https://lkml.kernel.org/r/87lfk54p0m.fsf_-_@x220.int.ebiederm.org
Link: https://lkml.kernel.org/r/20200702164140.4468-10-ebiederm@xmission.comReviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Tested-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

25cf336d

10 6月, 2020 2 次提交

mmap locking API: convert mmap_sem comments · c1e8d7c6

由 Michel Lespinasse 提交于 6月 08, 2020

Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]
Signed-off-by: NMichel Lespinasse <walken@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NVlastimil Babka <vbabka@suse.cz>
Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c1e8d7c6

mmap locking API: use coccinelle to convert mmap_sem rwsem call sites · d8ed45c5

由 Michel Lespinasse 提交于 6月 08, 2020

This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)
Signed-off-by: NMichel Lespinasse <walken@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: NLaurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d8ed45c5

09 6月, 2020 2 次提交

exec: use flush_icache_user_range in read_code · bce2b68b

由 Christoph Hellwig 提交于 6月 07, 2020

read_code operates on user addresses.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/20200515143646.3857579-27-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bce2b68b

exec: only build read_code when needed · 48304f79

由 Christoph Hellwig 提交于 6月 07, 2020

Only build read_code when binary formats that use it are built into the
kernel.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/20200515143646.3857579-26-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

48304f79

05 6月, 2020 2 次提交

exec: open code copy_string_kernel · 762a3af6

由 Christoph Hellwig 提交于 6月 04, 2020

Currently copy_string_kernel is just a wrapper around copy_strings that
simplifies the calling conventions and uses set_fs to allow passing a
kernel pointer. But due to the fact the we only need to handle a single
kernel argument pointer, the logic can be sigificantly simplified while
getting rid of the set_fs.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/20200501104105.2621149-3-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

762a3af6

exec: simplify the copy_strings_kernel calling convention · 986db2d1

由 Christoph Hellwig 提交于 6月 04, 2020

copy_strings_kernel is always used with a single argument,
adjust the calling convention to that.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/20200501104105.2621149-2-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

986db2d1

30 5月, 2020 2 次提交

exec: Compute file based creds only once · 56305aa9

由 Eric W. Biederman 提交于 5月 29, 2020

Move the computation of creds from prepare_binfmt into begin_new_exec
so that the creds need only be computed once.  This is just code
reorganization no semantic changes of any kind are made.

Moving the computation is safe.  I have looked through the kernel and
verified none of the binfmts look at bprm->cred directly, and that
there are no helpers that look at bprm->cred indirectly.  Which means
that it is not a problem to compute the bprm->cred later in the
execution flow as it is not used until it becomes current->cred.

A new function bprm_creds_from_file is added to contain the work that
needs to be done.  bprm_creds_from_file first computes which file
bprm->executable or most likely bprm->file that the bprm->creds
will be computed from.

The funciton bprm_fill_uid is updated to receive the file instead of
accessing bprm->file.  The now unnecessary work needed to reset the
bprm->cred->euid, and bprm->cred->egid is removed from brpm_fill_uid.
A small comment to document that bprm_fill_uid now only deals with the
work to handle suid and sgid files.  The default case is already
heandled by prepare_exec_creds.

The function security_bprm_repopulate_creds is renamed
security_bprm_creds_from_file and now is explicitly passed the file
from which to compute the creds.  The documentation of the
bprm_creds_from_file security hook is updated to explain when the hook
is called and what it needs to do.  The file is passed from
cap_bprm_creds_from_file into get_file_caps so that the caps are
computed for the appropriate file.  The now unnecessary work in
cap_bprm_creds_from_file to reset the ambient capabilites has been
removed.  A small comment to document that the work of
cap_bprm_creds_from_file is to read capabilities from the files
secureity attribute and derive capabilities from the fact the
user had uid 0 has been added.
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

56305aa9

exec: Add a per bprm->file version of per_clear · a7868323

由 Eric W. Biederman 提交于 5月 29, 2020

There is a small bug in the code that recomputes parts of bprm->cred
for every bprm->file.  The code never recomputes the part of
clear_dangerous_personality_flags it is responsible for.

Which means that in practice if someone creates a sgid script
the interpreter will not be able to use any of:
	READ_IMPLIES_EXEC
	ADDR_NO_RANDOMIZE
	ADDR_COMPAT_LAYOUT
	MMAP_PAGE_ZERO.

This accentially clearing of personality flags probably does
not matter in practice because no one has complained
but it does make the code more difficult to understand.

Further remaining bug compatible prevents the recomputation from being
removed and replaced by simply computing bprm->cred once from the
final bprm->file.

Making this change removes the last behavior difference between
computing bprm->creds from the final file and recomputing
bprm->cred several times.  Which allows this behavior change
to be justified for it's own reasons, and for any but hunts
looking into why the behavior changed to wind up here instead
of in the code that will follow that computes bprm->cred
from the final bprm->file.

This small logic bug appears to have existed since the code
started clearing dangerous personality bits.

History Tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: 1bb0fa189c6a ("[PATCH] NX: clean up legacy binary support")
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

a7868323

21 5月, 2020 6 次提交

exec: Remove recursion from search_binary_handler · bc2bf338

由 Eric W. Biederman 提交于 5月 18, 2020

Recursion in kernel code is generally a bad idea as it can overflow
the kernel stack.  Recursion in exec also hides that the code is
looping and that the loop changes bprm->file.

Instead of recursing in search_binary_handler have the methods that
would recurse set bprm->interpreter and return 0.  Modify exec_binprm
to loop when bprm->interpreter is set.  Consolidate all of the
reassignments of bprm->file in that loop to make it clear what is
going on.

The structure of the new loop in exec_binprm is that all errors return
immediately, while successful completion (ret == 0 &&
!bprm->interpreter) just breaks out of the loop and runs what
exec_bprm has always run upon successful completion.

Fail if the an interpreter is being call after execfd has been set.
The code has never properly handled an interpreter being called with
execfd being set and with reassignments of bprm->file and the
assignment of bprm->executable in generic code it has finally become
possible to test and fail when if this problematic condition happens.

With the reassignments of bprm->file and the assignment of
bprm->executable moved into the generic code add a test to see if
bprm->executable is being reassigned.

In search_binary_handler remove the test for !bprm->file.  With all
reassignments of bprm->file moved to exec_binprm bprm->file can never
be NULL in search_binary_handler.

Link: https://lkml.kernel.org/r/87sgfwyd84.fsf_-_@x220.int.ebiederm.orgAcked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

bc2bf338

exec: Generic execfd support · b8a61c9e

由 Eric W. Biederman 提交于 5月 14, 2020

Most of the support for passing the file descriptor of an executable
to an interpreter already lives in the generic code and in binfmt_elf.
Rework the fields in binfmt_elf that deal with executable file
descriptor passing to make executable file descriptor passing a first
class concept.

Move the fd_install from binfmt_misc into begin_new_exec after the new
creds have been installed.  This means that accessing the file through
/proc/<pid>/fd/N is able to see the creds for the new executable
before allowing access to the new executables files.

Performing the install of the executables file descriptor after
the point of no return also means that nothing special needs to
be done on error.  The exiting of the process will close all
of it's open files.

Move the would_dump from binfmt_misc into begin_new_exec right
after would_dump is called on the bprm->file.  This makes it
obvious this case exists and that no nesting of bprm->file is
currently supported.

In binfmt_misc the movement of fd_install into generic code means
that it's special error exit path is no longer needed.

Link: https://lkml.kernel.org/r/87y2poyd91.fsf_-_@x220.int.ebiederm.orgAcked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

b8a61c9e

exec: Move the call of prepare_binprm into search_binary_handler · 8b72ca90

由 Eric W. Biederman 提交于 5月 13, 2020

The code in prepare_binary_handler needs to be run every time
search_binary_handler is called so move the call into search_binary_handler
itself to make the code simpler and easier to understand.

Link: https://lkml.kernel.org/r/87d070zrvx.fsf_-_@x220.int.ebiederm.orgAcked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NJames Morris <jamorris@linux.microsoft.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

8b72ca90

exec: Allow load_misc_binary to call prepare_binprm unconditionally · a16b3357

由 Eric W. Biederman 提交于 5月 16, 2020

Add a flag preserve_creds that binfmt_misc can set to prevent
credentials from being updated. This allows binfmt_misc to always
call prepare_binprm. Allowing the credential computation logic to be
consolidated.

Not replacing the credentials with the interpreters credentials is
safe because because an open file descriptor to the executable is
passed to the interpreter. As the interpreter does not need to
reopen the executable it is guaranteed to see the same file that
exec sees.

Ref: c407c033de84 ("[PATCH] binfmt_misc: improve calculation of interpreter's credentials")
Link: https://lkml.kernel.org/r/87imgszrwo.fsf_-_@x220.int.ebiederm.orgAcked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

a16b3357

exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds · 112b7147

由 Eric W. Biederman 提交于 5月 14, 2020

Rename bprm->cap_elevated to bprm->active_secureexec and initialize it
in prepare_binprm instead of in cap_bprm_set_creds.  Initializing
bprm->active_secureexec in prepare_binprm allows multiple
implementations of security_bprm_repopulate_creds to play nicely with
each other.

Rename security_bprm_set_creds to security_bprm_reopulate_creds to
emphasize that this path recomputes part of bprm->cred.  This
recomputation avoids the time of check vs time of use problems that
are inherent in unix #! interpreters.

In short two renames and a move in the location of initializing
bprm->active_secureexec.

Link: https://lkml.kernel.org/r/87o8qkzrxp.fsf_-_@x220.int.ebiederm.orgAcked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

112b7147

exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds · b8bff599

由 Eric W. Biederman 提交于 3月 22, 2020

Today security_bprm_set_creds has several implementations:
apparmor_bprm_set_creds, cap_bprm_set_creds, selinux_bprm_set_creds,
smack_bprm_set_creds, and tomoyo_bprm_set_creds.

Except for cap_bprm_set_creds they all test bprm->called_set_creds and
return immediately if it is true.  The function cap_bprm_set_creds
ignores bprm->calld_sed_creds entirely.

Create a new LSM hook security_bprm_creds_for_exec that is called just
before prepare_binprm in __do_execve_file, resulting in a LSM hook
that is called exactly once for the entire of exec.  Modify the bits
of security_bprm_set_creds that only want to be called once per exec
into security_bprm_creds_for_exec, leaving only cap_bprm_set_creds
behind.

Remove bprm->called_set_creds all of it's former users have been moved
to security_bprm_creds_for_exec.

Add or upate comments a appropriate to bring them up to date and
to reflect this change.

Link: https://lkml.kernel.org/r/87v9kszrzh.fsf_-_@x220.int.ebiederm.orgAcked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com> # For the LSM and Smack bits
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

b8bff599

17 5月, 2020 1 次提交

exec: Move would_dump into flush_old_exec · f87d1c95

由 Eric W. Biederman 提交于 5月 16, 2020

I goofed when I added mm->user_ns support to would_dump.  I missed the
fact that in the case of binfmt_loader, binfmt_em86, binfmt_misc, and
binfmt_script bprm->file is reassigned.  Which made the move of
would_dump from setup_new_exec to __do_execve_file before exec_binprm
incorrect as it can result in would_dump running on the script instead
of the interpreter of the script.

The net result is that the code stopped making unreadable interpreters
undumpable.  Which allows them to be ptraced and written to disk
without special permissions.  Oops.

The move was necessary because the call in set_new_exec was after
bprm->mm was no longer valid.

To correct this mistake move the misplaced would_dump from
__do_execve_file into flos_old_exec, before exec_mmap is called.

I tested and confirmed that without this fix I can attach with gdb to
a script with an unreadable interpreter, and with this fix I can not.

Cc: stable@vger.kernel.org
Fixes: f84df2a6 ("exec: Ensure mm->user_ns contains the execed files")
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

f87d1c95

12 5月, 2020 3 次提交

exec: Set the point of no return sooner · 6834e0bb

由 Eric W. Biederman 提交于 4月 04, 2020

Make the code more robust by marking the point of no return sooner.
This ensures that future code changes don't need to worry about how
they return errors if they are past this point.

This results in no actual change in behavior as __do_execve_file does
not force SIGSEGV when there is a pending fatal signal pending past
the point of no return. Further the only error returns from de_thread
and exec_mmap that can occur result in fatal signals being pending.
Reviewed-by: NKees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/87sgga5klu.fsf_-_@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

6834e0bb

exec: Move handling of the point of no return to the top level · 8890b293

由 Eric W. Biederman 提交于 4月 04, 2020

Move the handing of the point of no return from search_binary_handler
into __do_execve_file so that it is easier to find, and to keep
things robust in the face of change.

Make it clear that an existing fatal signal will take precedence over
a forced SIGSEGV by not forcing SIGSEGV if a fatal signal is already
pending.  This does not change the behavior but it saves a reader
of the code the tedium of reading and understanding force_sig
and the signal delivery code.

Update the comment in begin_new_exec about where SIGSEGV is forced.

Keep point_of_no_return from being a mystery by documenting
what the code is doing where it forces SIGSEGV if the
code is past the point of no return.
Reviewed-by: NKees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/87y2q25knl.fsf_-_@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

8890b293

exec: Run sync_mm_rss before taking exec_update_mutex · a28bf136

由 Eric W. Biederman 提交于 3月 30, 2020

Like exec_mm_release sync_mm_rss is about flushing out the state of
the old_mm, which does not need to happen under exec_update_mutex.

Make this explicit by moving sync_mm_rss outside of exec_update_mutex.
Reviewed-by: NKees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/875zd66za3.fsf_-_@x220.int.ebiederm.orgSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

a28bf136

09 5月, 2020 2 次提交

exec: Fix spelling of search_binary_handler in a comment · 13c432b5

由 Eric W. Biederman 提交于 3月 19, 2020

Link: https://lkml.kernel.org/r/87h7wq6zc1.fsf_-_@x220.int.ebiederm.orgReviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

13c432b5

exec: Move the comment from above de_thread to above unshare_sighand · 7a60ef48

由 Eric W. Biederman 提交于 3月 08, 2020

The comment describes work that now happens in unshare_sighand so
move the comment where it makes sense.

Link: https://lkml.kernel.org/r/87mu6i6zcs.fsf_-_@x220.int.ebiederm.orgReviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

7a60ef48

08 5月, 2020 6 次提交

exec: Rename flush_old_exec begin_new_exec · 2388777a

由 Eric W. Biederman 提交于 5月 03, 2020

There is and has been for a very long time been a lot more going on in
flush_old_exec than just flushing the old state.  After the movement
of code from setup_new_exec there is a whole lot more going on than
just flushing the old executables state.

Rename flush_old_exec to begin_new_exec to more accurately reflect
what this function does.
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NGreg Ungerer <gerg@linux-m68k.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

2388777a

exec: Move most of setup_new_exec into flush_old_exec · df9e4d2c

由 Eric W. Biederman 提交于 5月 03, 2020

The current idiom for the callers is:

flush_old_exec(bprm);
set_personality(...);
setup_new_exec(bprm);

In 2010 Linus split flush_old_exec into flush_old_exec and
setup_new_exec.  With the intention that setup_new_exec be what is
called after the processes new personality is set.

Move the code that doesn't depend upon the personality from
setup_new_exec into flush_old_exec.  This is to facilitate future
changes by having as much code together in one function as possible.

To see why it is safe to move this code please note that effectively
this change moves the personality setting in the binfmt and the following
three lines of code after everything except unlocking the mutexes:
	arch_pick_mmap_layout
	arch_setup_new_exec
	mm->task_size = TASK_SIZE

The function arch_pick_mmap_layout at most sets:
	mm->get_unmapped_area
	mm->mmap_base
	mm->mmap_legacy_base
	mm->mmap_compat_base
	mm->mmap_compat_legacy_base
which nothing in flush_old_exec or setup_new_exec depends on.

The function arch_setup_new_exec only sets architecture specific
state and the rest of the functions only deal in state that applies
to all architectures.

The last line just sets mm->task_size and again nothing in flush_old_exec
or setup_new_exec depend on task_size.

Ref: 221af7f8 ("Split 'flush_old_exec' into two functions")
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NGreg Ungerer <gerg@linux-m68k.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

df9e4d2c

exec: In setup_new_exec cache current in the local variable me · 7d503feb

由 Eric W. Biederman 提交于 4月 02, 2020

At least gcc 8.3 when generating code for x86_64 has a hard time
consolidating multiple calls to current aka get_current(), and winds
up unnecessarily rereading %gs:current_task several times in
setup_new_exec.

Caching the value of current in the local variable of me generates
slightly better and shorter assembly.
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NGreg Ungerer <gerg@linux-m68k.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

7d503feb

exec: Merge install_exec_creds into setup_new_exec · 96ecee29

由 Eric W. Biederman 提交于 5月 03, 2020

The two functions are now always called one right after the
other so merge them together to make future maintenance easier.
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NGreg Ungerer <gerg@linux-m68k.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

96ecee29

exec: Rename the flag called_exec_mmap point_of_no_return · 1507b7a3

由 Eric W. Biederman 提交于 4月 02, 2020

Update the comments and make the code easier to understand by
renaming this flag.
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NGreg Ungerer <gerg@linux-m68k.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

1507b7a3

exec: Make unlocking exec_update_mutex explict · 89826cce

由 Eric W. Biederman 提交于 4月 02, 2020

With install_exec_creds updated to follow immediately after
setup_new_exec, the failure of unshare_sighand is the only
code path where exec_update_mutex is held but not explicitly
unlocked.

Update that code path to explicitly unlock exec_update_mutex.

Remove the unlocking of exec_update_mutex from free_bprm.
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NGreg Ungerer <gerg@linux-m68k.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

89826cce

29 4月, 2020 2 次提交

exec: Remove BUG_ON(has_group_leader_pid) · 610b8188

由 Eric W. Biederman 提交于 4月 26, 2020

With the introduction of exchange_tids thread_group_leader and
has_group_leader_pid have become equivalent. Further at this point in the
code a thread group has exactly two threads, the previous thread_group_leader
that is waiting to be reaped and tsk. So we know it is impossible for tsk to
be the thread_group_leader.

This is also the last user of has_group_leader_pid so removing this check
will allow has_group_leader_pid to be removed.

So remove the "BUG_ON(has_group_leader_pid)" that will never fire.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

610b8188

proc: Ensure we see the exit of each process tid exactly once · 6b03d130

由 Eric W. Biederman 提交于 4月 19, 2020

When the thread group leader changes during exec and the old leaders
thread is reaped proc_flush_pid will flush the dentries for the entire
process because the leader still has it's original pid.

Fix this by exchanging the pids in an rcu safe manner,
and wrapping the code to do that up in a helper exchange_tids.

When I removed switch_exec_pids and introduced this behavior
in d73d6529 ("[PATCH] pidhash: kill switch_exec_pids") there
really was nothing that cared as flushing happened with
the cached dentry and de_thread flushed both of them on exec.

This lack of fully exchanging pids became a problem a few months later
when I introduced 48e6484d ("[PATCH] proc: Rewrite the proc dentry
flush on exit optimization").  Which overlooked the de_thread case
was no longer swapping pids, and I was looking up proc dentries
by task->pid.

The current behavior isn't properly a bug as everything in proc will
continue to work correctly just a little bit less efficiently.  Fix
this just so there are no little surprise corner cases waiting to bite
people.

-- Oleg points out this could be an issue in next_tgid in proc where
   has_group_leader_pid is called, and reording some of the assignments
   should fix that.

-- Oleg points out this will break the 10 year old hack in __exit_signal.c
>	/*
>	 * This can only happen if the caller is de_thread().
>	 * FIXME: this is the temporary hack, we should teach
>	 * posix-cpu-timers to handle this case correctly.
>	 */
>	if (unlikely(has_group_leader_pid(tsk)))
>		posix_cpu_timers_exit_group(tsk);

The code in next_tgid has been changed to use PIDTYPE_TGID,
and the posix cpu timers code has been fixed so it does not
need the 10 year old hack, so this should be safe to merge
now.

Link: https://lore.kernel.org/lkml/87h7x3ajll.fsf_-_@x220.int.ebiederm.org/Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Fixes: 48e6484d ("[PATCH] proc: Rewrite the proc dentry flush on exit optimization").
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

6b03d130

02 4月, 2020 1 次提交

signal: Extend exec_id to 64bits · d1e7fd64

由 Eric W. Biederman 提交于 3月 30, 2020

Replace the 32bit exec_id with a 64bit exec_id to make it impossible
to wrap the exec_id counter.  With care an attacker can cause exec_id
wrap and send arbitrary signals to a newly exec'd parent.  This
bypasses the signal sending checks if the parent changes their
credentials during exec.

The severity of this problem can been seen that in my limited testing
of a 32bit exec_id it can take as little as 19s to exec 65536 times.
Which means that it can take as little as 14 days to wrap a 32bit
exec_id.  Adam Zabrocki has succeeded wrapping the self_exe_id in 7
days.  Even my slower timing is in the uptime of a typical server.
Which means self_exec_id is simply a speed bump today, and if exec
gets noticably faster self_exec_id won't even be a speed bump.

Extending self_exec_id to 64bits introduces a problem on 32bit
architectures where reading self_exec_id is no longer atomic and can
take two read instructions.  Which means that is is possible to hit
a window where the read value of exec_id does not match the written
value.  So with very lucky timing after this change this still
remains expoiltable.

I have updated the update of exec_id on exec to use WRITE_ONCE
and the read of exec_id in do_notify_parent to use READ_ONCE
to make it clear that there is no locking between these two
locations.

Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl
Fixes: 2.3.23pre2
Cc: stable@vger.kernel.org
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

d1e7fd64

25 3月, 2020 5 次提交

exec: Add exec_update_mutex to replace cred_guard_mutex · eea96732

由 Eric W. Biederman 提交于 3月 25, 2020

The cred_guard_mutex is problematic as it is held over possibly
indefinite waits for userspace. The possible indefinite waits for
userspace that I have identified are: The cred_guard_mutex is held in
PTRACE_EVENT_EXIT waiting for the tracer. The cred_guard_mutex is
held over "put_user(0, tsk->clear_child_tid)" in exit_mm(). The
cred_guard_mutex is held over "get_user(futex_offset, ...") in
exit_robust_list. The cred_guard_mutex held over copy_strings.

The functions get_user and put_user can trigger a page fault which can
potentially wait indefinitely in the case of userfaultfd or if
userspace implements part of the page fault path.

In any of those cases the userspace process that the kernel is waiting
for might make a different system call that winds up taking the
cred_guard_mutex and result in deadlock.

Holding a mutex over any of those possibly indefinite waits for
userspace does not appear necessary. Add exec_update_mutex that will
just cover updating the process during exec where the permissions and
the objects pointed to by the task struct may be out of sync.

The plan is to switch the users of cred_guard_mutex to
exec_update_mutex one by one. This lets us move forward while still
being careful and not introducing any regressions.

Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/
Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/
Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/
Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/
Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/
Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.")
Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2")
Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NBernd Edlinger <bernd.edlinger@hotmail.de>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

eea96732

exec: Move exec_mmap right after de_thread in flush_old_exec · ccf0fa6b

由 Eric W. Biederman 提交于 3月 25, 2020

I have read through the code in exec_mmap and I do not see anything
that depends on sighand or the sighand lock, or on signals in anyway
so this should be safe.

This rearrangement of code has two significant benefits.  It makes
the determination of passing the point of no return by testing bprm->mm
accurate.  All failures prior to that point in flush_old_exec are
either truly recoverable or they are fatal.

Further this consolidates all of the possible indefinite waits for
userspace together at the top of flush_old_exec.  The possible wait
for a ptracer on PTRACE_EVENT_EXIT, the possible wait for a page fault
to be resolved in clear_child_tid, and the possible wait for a page
fault in exit_robust_list.

This consolidation allows the creation of a mutex to replace
cred_guard_mutex that is not held over possible indefinite userspace
waits.  Which will allow removing deadlock scenarios from the kernel.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: NBernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NBernd Edlinger <bernd.edlinger@hotmail.de>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

ccf0fa6b

exec: Move cleanup of posix timers on exec out of de_thread · 153ffb6b

由 Eric W. Biederman 提交于 3月 25, 2020

These functions have very little to do with de_thread move them out
of de_thread an into flush_old_exec proper so it can be more clearly
seen what flush_old_exec is doing.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: NBernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: NKees Cook <keescook@chromium.org>
Acked-by: NChristian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NBernd Edlinger <bernd.edlinger@hotmail.de>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

153ffb6b

exec: Factor unshare_sighand out of de_thread and call it separately · 02169155

由 Eric W. Biederman 提交于 3月 25, 2020

This makes the code clearer and makes it easier to implement a mutex
that is not taken over any locations that may block indefinitely waiting
for userspace.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: NBernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: NKees Cook <keescook@chromium.org>
Acked-by: NChristian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NBernd Edlinger <bernd.edlinger@hotmail.de>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

02169155

exec: Only compute current once in flush_old_exec · 2ca7be7d

由 Eric W. Biederman 提交于 3月 25, 2020

Make it clear that current only needs to be computed once in
flush_old_exec.  This may have some efficiency improvements and it
makes the code easier to change.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: NBernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: NKees Cook <keescook@chromium.org>
Acked-by: NChristian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NBernd Edlinger <bernd.edlinger@hotmail.de>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

2ca7be7d

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功