1. 15 Sep 2020 (1 commit)
    • io_uring: drop 'ctx' ref on task work cancelation · 87ceb6a6
      By Jens Axboe
      If task_work ends up being marked for cancelation, we go through a
      cancelation helper instead of the queue path. In converting task_work to
      always hold a ctx reference, this path was missed. Make sure that
      io_req_task_cancel() puts the reference that is being held against the
      ctx.
      
      Fixes: 6d816e08 ("io_uring: hold 'ctx' reference around task_work queue + execute")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
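      A minimal sketch of the fix; this matches the shape of the
      era's fs/io_uring.c, though details may differ slightly:

        static void io_req_task_cancel(struct callback_head *cb)
        {
                struct io_kiocb *req = container_of(cb, struct io_kiocb,
                                                    task_work);
                struct io_ring_ctx *ctx = req->ctx;

                __io_req_task_cancel(req, -ECANCELED);
                /* the queue path took a ctx ref; the cancel path must
                 * drop it, just like the normal execute path does */
                percpu_ref_put(&ctx->refs);
        }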
  2. 14 Sep 2020 (1 commit)
    • io_uring: grab any needed state during defer prep · 202700e1
      By Jens Axboe
      Always grab the work environment for deferred links. The
      assumption that we will always be running it from the task in
      question is false, as an exiting task may mean that we're
      deferring this one to a thread helper. And at that point it's
      too late to grab the work environment.
      
      Fixes: debb85f4 ("io_uring: factor out grab_env() from defer_prep()")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
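      A minimal sketch of the idea, assuming a grab_env()-style helper
      as referenced in the Fixes tag; both helper names here are
      approximations and may not match fs/io_uring.c exactly:

        static int io_req_defer_prep(struct io_kiocb *req,
                                     const struct io_uring_sqe *sqe)
        {
                int ret;

                ret = io_req_prep(req, sqe);    /* opcode prep; name approximate */
                if (ret)
                        return ret;

                /*
                 * Grab creds/fs/mm state now, while still running in
                 * the submitting task. If that task exits, the
                 * deferred link runs from a helper thread, and by then
                 * it is too late to grab the environment.
                 */
                io_req_work_grab_env(req);      /* name approximate */
                return 0;
        }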
  3. 06 Sep 2020 (2 commits)
  4. 05 Sep 2020 (1 commit)
  5. 03 Sep 2020 (1 commit)
  6. 02 Sep 2020 (1 commit)
  7. 01 Sep 2020 (1 commit)
  8. 28 Aug 2020 (2 commits)
  9. 27 Aug 2020 (2 commits)
  10. 26 Aug 2020 (3 commits)
  11. 25 Aug 2020 (1 commit)
  12. 24 Aug 2020 (2 commits)
  13. 20 Aug 2020 (4 commits)
    • io_uring: kill extra iovec=NULL in import_iovec() · 867a23ea
      By Pavel Begunkov
      If io_import_iovec() returns an error, the returned iovec is
      undefined and must not be used, so don't set it to NULL when
      failing.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
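      A sketch of the calling convention this enforces, simplified
      from the io_read()/io_write() usage (argument list assumed from
      the 5.9-era code):

        struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
        struct iov_iter iter;
        ssize_t ret;

        ret = io_import_iovec(READ, req, &iovec, &iter, false);
        if (ret < 0)
                return ret;     /* iovec is undefined here: neither
                                 * use it nor bother NULLing it */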
    • io_uring: comment on kfree(iovec) checks · f261c168
      By Pavel Begunkov
      kfree() handles NULL pointers fine, but io_{read,write}() check
      the pointer for performance reasons. Leave a comment there for
      anyone tempted to patch the check out.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
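      The pattern in question, roughly as it appears in io_read() and
      io_write():

        out_free:
                /* it's reportedly faster to check for NULL ourselves
                 * than to delegate the check to kfree() on this hot
                 * path */
                if (iovec)
                        kfree(iovec);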
    • io_uring: fix racy req->flags modification · bb175342
      By Pavel Begunkov
      Setting and clearing REQ_F_OVERFLOW in io_uring_cancel_files() and
      io_cqring_overflow_flush() are racy, because they might be called
      asynchronously.
      
      The REQ_F_OVERFLOW flag is only needed for files cancellation,
      so if we can guarantee that requests _currently_ marked inflight
      can't be overflown, the problem is solved by removing the flag
      altogether.
      
      That's what the patch does: it removes a request's inflight
      status in io_cqring_fill_event() whenever the request is about
      to be thrown onto the CQ-overflow list. That's OK to do, because
      no opcode-specific handling can be done after
      io_cqring_fill_event(), the same assumption as with the "struct
      io_completion" patches.
      There is already a good place for such cleanups, io_clean_op().
      A nice side effect of this is removing the inflight check from
      the hot path.
      
      Note on synchronisation: __io_cqring_fill_event() may now take
      two spinlocks simultaneously, completion_lock and inflight_lock.
      That's fine, because we never take them in the reverse order,
      and CQ-overflow of inflight requests shouldn't happen often.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
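      A sketch of the nesting order that note describes; keeping one
      fixed order avoids an ABBA deadlock (shape assumed, simplified):

        spin_lock_irqsave(&ctx->completion_lock, flags);
        if (req->flags & REQ_F_INFLIGHT) {
                /* inflight_lock always nests inside completion_lock,
                 * never the other way around */
                spin_lock(&ctx->inflight_lock);
                list_del(&req->inflight_entry);
                spin_unlock(&ctx->inflight_lock);
        }
        /* ... move the request to the CQ-overflow list ... */
        spin_unlock_irqrestore(&ctx->completion_lock, flags);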
    • io_uring: use system_unbound_wq for ring exit work · fc666777
      By Jens Axboe
      We currently use system_wq, which is unbounded in terms of number of
      workers. This means that if we're exiting tons of rings at the same
      time, then we'll briefly spawn tons of event kworkers just for a very
      short blocking time as the rings exit.
      
      Use system_unbound_wq instead, which has a sane cap on the concurrency
      level.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
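      The change itself is essentially a one-liner; roughly:

        /* exit_work blocks only briefly, and system_unbound_wq has a
         * sane cap on concurrency, unlike spawning a kworker per
         * exiting ring */
        INIT_WORK(&ctx->exit_work, io_ring_exit_work);
        queue_work(system_unbound_wq, &ctx->exit_work);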
  14. 19 Aug 2020 (1 commit)
    • io_uring: cleanup io_import_iovec() of pre-mapped request · 8452fd0c
      By Jens Axboe
      io_rw_prep_async() goes through a dance of clearing req->io, calling
      the iovec import, then re-setting req->io. Provide an internal helper
      that does the right thing without needing state tweaked to get there.
      
      This enables further cleanups in io_read, io_write, and
      io_resubmit_prep(), but that's left for another time.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
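      A sketch of the helper split (simplified: the real functions
      also handle fixed and provided buffers, and io_iov_from_async()
      below is a hypothetical stand-in for reusing pre-mapped state):

        /* inner helper: import straight from the SQE, ignore req->io */
        static ssize_t __io_import_iovec(int rw, struct io_kiocb *req,
                                         struct iovec **iovec,
                                         struct iov_iter *iter)
        {
                void __user *buf = u64_to_user_ptr(req->rw.addr);

                return import_iovec(rw, buf, req->rw.len, UIO_FASTIOV,
                                    iovec, iter);
        }

        static ssize_t io_import_iovec(int rw, struct io_kiocb *req,
                                       struct iovec **iovec,
                                       struct iov_iter *iter)
        {
                if (req->io)    /* pre-mapped async state: reuse it */
                        return io_iov_from_async(req, iovec, iter);
                return __io_import_iovec(rw, req, iovec, iter);
        }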
  15. 17 Aug 2020 (2 commits)
    • io_uring: get rid of kiocb_wait_page_queue_init() · 3b2a4439
      By Jens Axboe
      The 5.9 merge moved this function into io_uring, which means we
      no longer need to retain its generic nature. Clean up this part
      by removing redundant checks, and just inline the small
      remainder in io_rw_should_retry().
      
      No functional changes in this patch.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
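      What the inlined remainder looks like, roughly, inside
      io_rw_should_retry() (field names assumed from the 5.9-era
      code):

        struct wait_page_queue *wait = &req->io->rw.wpq;
        struct kiocb *kiocb = &req->rw.kiocb;

        /* the generic helper boiled down to this once io_uring became
         * its only caller */
        wait->wait.func = io_async_buf_func;
        wait->wait.private = req;
        wait->wait.flags = 0;
        INIT_LIST_HEAD(&wait->wait.entry);
        kiocb->ki_flags |= IOCB_WAITQ;
        kiocb->ki_waitq = wait;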
    • io_uring: find and cancel head link async work on files exit · b711d4ea
      By Jens Axboe
      Commit f254ac04 ("io_uring: enable lookup of links holding inflight files")
      only handled two of the three head link cases we have; we also
      need to look up and cancel work that is blocked in io-wq, if
      that work has a link that's holding a reference to the files
      structure.
      
      Put the "cancel head links that hold this request pending" logic
      into io_attempt_cancel(), which will go through the motions of
      finding and canceling head links that hold the current
      inflight-files request pending.
      
      Cc: stable@vger.kernel.org
      Reported-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
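      A sketch of the escalation order this implies (simplified, and
      the poll/timeout helper names are approximations):

        static void io_attempt_cancel(struct io_ring_ctx *ctx,
                                      struct io_kiocb *req)
        {
                enum io_wq_cancel cret;

                /* cancel this particular work, if it's running */
                cret = io_wq_cancel_work(ctx->io_wq, &req->work);
                if (cret != IO_WQ_CANCEL_NOTFOUND)
                        return;

                /* find io-wq heads whose link chain holds req pending */
                cret = io_wq_cancel_cb(ctx->io_wq, io_cancel_link_cb,
                                       req, true);
                if (cret != IO_WQ_CANCEL_NOTFOUND)
                        return;

                /* lastly, a poll or timeout head may be holding it */
                if (io_poll_remove_link(ctx, req))
                        return;
                io_timeout_remove_link(ctx, req);
        }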
  16. 16 Aug 2020 (2 commits)
    • io_uring: short circuit -EAGAIN for blocking read attempt · f91daf56
      By Jens Axboe
      One case was missed in the short IO retry handling: hitting
      -EAGAIN on a blocking read attempt (e.g. from io-wq context).
      This is a problem on sockets that are marked non-blocking when
      created; they don't carry any REQ_F_NOWAIT information to help
      us terminate them instead of retrying perpetually.
      
      Fixes: 227c0c96 ("io_uring: internally retry short reads")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
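      The gist of the fix in io_read(), with the control flow
      simplified:

        ret = io_iter_do_read(req, iter);
        if (ret == -EAGAIN) {
                /*
                 * A blocking attempt (io-wq, force_nonblock == false)
                 * seeing -EAGAIN means the socket itself is
                 * non-blocking: complete with -EAGAIN rather than
                 * retrying forever.
                 */
                if (!force_nonblock)
                        goto done;
                /* else fall through to the async retry setup */
        }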
    • io_uring: sanitize double poll handling · d4e7cd36
      By Jens Axboe
      There's a bit of confusion around the matching pairs of poll vs
      double poll, depending on whether the request is a pure poll
      (IORING_OP_POLL_ADD) or a poll-driven retry.
      
      Add io_poll_get_double() that returns the double poll waitqueue, if any,
      and io_poll_get_single() that returns the original poll waitqueue. With
      that, remove the argument to io_poll_remove_double().
      
      Finally ensure that wait->private is cleared once the double poll handler
      has run, so that remove knows it's already been seen.
      
      Cc: stable@vger.kernel.org # v5.8
      Reported-by: syzbot+7f617d4a9369028b8a2c@syzkaller.appspotmail.com
      Fixes: 18bceab1 ("io_uring: allow POLL_ADD with double poll_wait() users")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
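      The helpers, roughly as described (pure poll stashes the second
      entry in ->io, poll-driven retry keeps it in ->apoll):

        static struct io_poll_iocb *io_poll_get_double(struct io_kiocb *req)
        {
                /* pure poll stashes this in ->io, poll driven retry
                 * keeps it in ->apoll->double_poll */
                if (req->opcode == IORING_OP_POLL_ADD)
                        return (struct io_poll_iocb *) req->io;
                return req->apoll->double_poll;
        }

        static struct io_poll_iocb *io_poll_get_single(struct io_kiocb *req)
        {
                if (req->opcode == IORING_OP_POLL_ADD)
                        return &req->poll;
                return &req->apoll->poll;
        }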
  17. 15 Aug 2020 (2 commits)
  18. 14 Aug 2020 (3 commits)
    • SMB3: Fix mkdir when idsfromsid configured on mount · c8c412f9
      By Steve French
      mkdir uses a compounded create operation, which was not setting
      the security descriptor when creating a directory. Fix it so
      mkdir now sets the mode and owner info properly when idsfromsid
      and modefromsid are configured on the mount.
      Signed-off-by: Steve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org> # v5.8
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
    • io_uring: internally retry short reads · 227c0c96
      By Jens Axboe
      We've had a few application cases of not handling short reads properly,
      and it is understandable as short reads aren't really expected if the
      application isn't doing non-blocking IO.
      
      Now that we retain the iov_iter over retries, we can implement internal
      retry pretty trivially. This ensures that we don't return a short read,
      even for buffered reads on page cache conflicts.
      
      Clean up the deep nesting and hard-to-read nature of io_read()
      as well; it's now much more straightforward to read and
      understand. A few comments were added to explain the logic.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
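      An illustration of the invariant only: the real code arms an
      async retry rather than spinning in a loop, but either way the
      iov_iter keeps its state, so a retry resumes exactly where the
      short read stopped:

        ssize_t done = 0;

        while (iov_iter_count(iter)) {
                ssize_t ret = io_iter_do_read(req, iter); /* advances iter */
                if (ret <= 0)
                        return done ? done : ret; /* EOF or real error */
                done += ret;    /* short read: issue the remainder */
        }
        return done;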
    • io_uring: retain iov_iter state over io_read/io_write calls · ff6165b2
      By Jens Axboe
      Instead of maintaining (and setting/remembering) iov_iter size and
      segment counts, just put the iov_iter in the async part of the IO
      structure.
      
      This is mostly a preparation patch for doing appropriate internal retries
      for short reads, but it also cleans up the state handling nicely and
      simplifies it quite a bit.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
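      The shape of the change, approximately (the struct also grew a
      bytes_done counter in the follow-up short-read retry patch):

        struct io_async_rw {
                struct iovec            fast_iov[UIO_FASTIOV];
                const struct iovec      *free_iovec;
                /* full iter state lives here now, replacing the saved
                 * size/segment counts restored around each retry */
                struct iov_iter         iter;
                struct wait_page_queue  wpq;
        };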
  19. 13 Aug 2020 (8 commits)