- 15 Apr, 2021 40 commits
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.6-rc1
commit e94f141b
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

For busy IORING_OP_POLL_ADD workloads, we can have enough contention on the completion lock that we fail the inline completion path quite often as we fail the trylock on that lock. Add a list for deferred completions that we can use in that case. This helps reduce the number of async offloads we have to do, as if we get multiple completions in a row, we'll piggyback on the poll_llist instead of having to queue our own offload.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
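A minimal user-space sketch of that fallback pattern, assuming only POSIX threads and C11 atomics; the names and the post_completion_locked() helper are illustrative, not the kernel's:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stddef.h>

    struct completion {
        struct completion *next;
        long result;
    };

    static pthread_mutex_t comp_lock = PTHREAD_MUTEX_INITIALIZER;
    static _Atomic(struct completion *) deferred_list;
    static long completed;                  /* protected by comp_lock */

    /* Stand-in for posting a CQE; caller holds comp_lock. */
    static void post_completion_locked(struct completion *c)
    {
        (void)c->result;                    /* a real ring would post a CQE */
        completed++;
    }

    static void drain_deferred(void)
    {
        /* Atomically take the whole deferred list, then post each entry. */
        struct completion *c = atomic_exchange(&deferred_list, NULL);

        while (c) {
            struct completion *next = c->next;

            post_completion_locked(c);
            c = next;
        }
    }

    void complete_request(struct completion *c)
    {
        if (pthread_mutex_trylock(&comp_lock) == 0) {
            drain_deferred();               /* piggyback deferred completions */
            post_completion_locked(c);
            pthread_mutex_unlock(&comp_lock);
        } else {
            /* Contended: push to the lock-free list instead of offloading. */
            c->next = atomic_load(&deferred_list);
            while (!atomic_compare_exchange_weak(&deferred_list, &c->next, c))
                ;                           /* failed CAS refreshed c->next */
        }
    }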
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.6-rc1
commit ad3eb2c8
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

We currently check ->cq_overflow_list from both SQ and CQ context, which causes some bouncing of that cache line. Add separate bits of state for this instead, so that the SQ side can check using its own state, and likewise for the CQ side.

This adds ->sq_check_overflow with the SQ state, and ->cq_check_overflow with the CQ state. If we hit an overflow condition, both of these bits are set. Likewise, when the overflow condition is flushed and cleared, we clear both bits. For the fast path of just checking whether there's an overflow condition on either the SQ or CQ side, each can use its own private bit for this.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.6-rc1
commit d3656344
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

We currently have various switch statements that check if an opcode needs a file, mm, etc. These are hard to keep in sync as opcodes are added. Add a struct io_op_def that holds all of this information, so we have just one spot to update when opcodes are added.

This also enables us to NOT allocate req->io if a deferred command doesn't need it, and corrects some mistakes we had in terms of what commands need mm context.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
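A sketch of that table-driven shape, assuming the uapi opcode constants; the field list is abridged to what the commit text names (the real struct io_op_def in fs/io_uring.c carries more flags):

    #include <linux/io_uring.h>           /* IORING_OP_* opcode numbers */

    struct io_op_def {
        unsigned needs_mm : 1;            /* needs current->mm to copy user data */
        unsigned needs_file : 1;          /* needs req->file assigned before issue */
        unsigned async_ctx : 1;           /* needs req->io allocated when deferred */
    };

    static const struct io_op_def io_op_defs[] = {
        [IORING_OP_NOP]    = {},
        [IORING_OP_READV]  = { .needs_mm = 1, .needs_file = 1, .async_ctx = 1 },
        [IORING_OP_WRITEV] = { .needs_mm = 1, .needs_file = 1, .async_ctx = 1 },
        [IORING_OP_FSYNC]  = { .needs_file = 1 },
    };

Checks then collapse to an indexed lookup such as io_op_defs[req->opcode].needs_file, with one table entry to update per new opcode instead of several switch statements.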
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.6-rc1
commit add7b6b8
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

__io_free_req() and io_double_put_req() aren't used before they are defined, so we can kill these two forward declarations.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Pavel Begunkov

mainline inclusion
from mainline-v5.6-rc1
commit 32fe525b
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

Move io_queue_link_head() to the link handling code in io_submit_sqe(), so it doesn't need extra checks and gets better data locality.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Pavel Begunkov

mainline inclusion
from mainline-5.6-rc1
commit 9d76377f
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

Calling "prev" a head of a link is a bit misleading. Rename it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.6-rc1
commit ce35a47a
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

io_uring defaults to always doing inline submissions, if at all possible. But for larger copies, even if the data is fully cached, that can take a long time. Add an IOSQE_ASYNC flag that the application can set on the SQE - if set, it'll ensure that we always go async for those kinds of requests.

Use the io-wq IO_WQ_WORK_CONCURRENT flag to ensure we get the concurrency we desire for this case.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
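For context, a user-space usage sketch with liburing (submit_async_read() is a made-up helper, not from this commit): setting IOSQE_ASYNC forces the request to an async worker even when it could have completed inline.

    #include <liburing.h>

    int submit_async_read(struct io_uring *ring, int fd, void *buf, unsigned len)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -1;
        io_uring_prep_read(sqe, fd, buf, len, 0);
        sqe->flags |= IOSQE_ASYNC;        /* always punt to async context */
        return io_uring_submit(ring);
    }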
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.6-rc1
commit eddc7ef5
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

This provides support for async statx(2) through io_uring.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
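A user-space usage sketch with liburing (illustrative helper; assumes a liburing build with statx support): the path string and statx buffer must stay valid until the CQE is reaped.

    #define _GNU_SOURCE
    #include <liburing.h>
    #include <sys/stat.h>
    #include <fcntl.h>

    int submit_statx(struct io_uring *ring, const char *path, struct statx *stx)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -1;
        io_uring_prep_statx(sqe, AT_FDCWD, path, 0, STATX_BASIC_STATS, stx);
        return io_uring_submit(ring);
    }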
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.6-rc1
commit 05f3fb3c
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

We currently fully quiesce the ring before an unregister or update of the fixed fileset. This is very expensive, and we can be a bit smarter about this.

Add a percpu refcount for the file tables as a whole. Grab a percpu ref when we use a registered file, and put it on completion. This is cheap to do. Upon removal of a file from a set, switch the ref count to atomic mode. When we hit zero ref on the completion side, then we know we can drop the previously registered files. When the old files have been dropped, switch the ref back to percpu mode for normal operation.

Since there's a period between doing the update and the kernel being done with it, add an IORING_OP_FILES_UPDATE opcode that can perform the same action. The application knows the update has completed when it gets the CQE for it. Between doing the update and receiving this completion, the application must continue to use the unregistered fd if submitting IO on this particular file.

This takes the runtime of test/file-register from liburing from 14s to about 0.7s.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
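A user-space usage sketch with liburing (update_slot0() is a made-up helper; it assumes no other requests are in flight, so the first CQE reaped belongs to the update): the CQE is the signal that the update has completed.

    #include <liburing.h>

    int update_slot0(struct io_uring *ring, int new_fd)
    {
        int fds[1] = { new_fd };
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        struct io_uring_cqe *cqe;
        int ret;

        if (!sqe)
            return -1;
        io_uring_prep_files_update(sqe, fds, 1, 0);   /* 1 fd, table offset 0 */
        io_uring_submit(ring);
        ret = io_uring_wait_cqe(ring, &cqe);          /* update done when this posts */
        if (ret == 0)
            io_uring_cqe_seen(ring, cqe);
        return ret;
    }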
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.6-rc1
commit b5dba59e
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

This works just like close(2), unsurprisingly. We remove the file descriptor and post the completion inline, then offload the actual (potential) last file put to async context.

Mark the async part of this work as uncancellable, as we really must guarantee that the latter part of the close is run.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
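Usage sketch with liburing (illustrative helper): from the application's point of view the close completes like any other request; the potentially expensive final file put happens in async context on its behalf.

    #include <liburing.h>

    int submit_close(struct io_uring *ring, int fd)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -1;
        io_uring_prep_close(sqe, fd);
        return io_uring_submit(ring);
    }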
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.6-rc1
commit 0c9d5ccd
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

Not all work can be cancelled; some of it we may need to guarantee runs to completion. Allow the caller to set IO_WQ_WORK_NO_CANCEL on work that must not be cancelled.

Note that the caller work function must also check for IO_WQ_WORK_NO_CANCEL on work that is marked IO_WQ_WORK_CANCEL.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
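A simplified sketch of the check the commit asks work handlers to perform; this is the shape, not the exact code, of the io_uring work handler:

    static void io_wq_submit_work(struct io_wq_work **workptr)
    {
        struct io_wq_work *work = *workptr;

        if ((work->flags & IO_WQ_WORK_CANCEL) &&
            !(work->flags & IO_WQ_WORK_NO_CANCEL)) {
            /* Cancellation requested and allowed: fail with -ECANCELED. */
            return;
        }

        /*
         * Either not cancelled, or marked IO_WQ_WORK_NO_CANCEL: the work
         * (e.g. the final file put of IORING_OP_CLOSE) must run anyway.
         */
    }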
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.6-rc1
commit 15b71abe
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

This works just like openat(2), except it can be performed async. For the normal case of a non-blocking path lookup this will complete inline. If we have to do IO to perform the open, it'll be done from async context.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
    fs/io_uring.c
    include/linux/socket.h
[Does not include fddb5d43 ("open: introduce openat2(2) syscall") or 7999096f ("iov_iter: Move unnecessary inclusion of crypto/hash.h"), but they are not needed.]

Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
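Usage sketch with liburing (open_via_ring() is a made-up helper; assumes no other requests in flight): the opened descriptor, or a negative errno, comes back in the CQE's res field.

    #include <liburing.h>
    #include <fcntl.h>

    int open_via_ring(struct io_uring *ring, const char *path)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        struct io_uring_cqe *cqe;
        int fd;

        if (!sqe)
            return -1;
        io_uring_prep_openat(sqe, AT_FDCWD, path, O_RDONLY, 0);
        io_uring_submit(ring);
        if (io_uring_wait_cqe(ring, &cqe))
            return -1;
        fd = cqe->res;                    /* opened fd, or -errno */
        io_uring_cqe_seen(ring, cqe);
        return fd;
    }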
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.6-rc1
commit d63d1b5e
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

This exposes fallocate(2) through io_uring.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
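Usage sketch with liburing (illustrative helper): preallocate 1 MiB at the start of a file; the CQE result mirrors fallocate(2)'s return value.

    #include <liburing.h>

    int prealloc_1m(struct io_uring *ring, int fd)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -1;
        io_uring_prep_fallocate(sqe, fd, 0, 0, 1 << 20);  /* mode 0, offset 0 */
        return io_uring_submit(ring);
    }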
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5
commit ebe10026
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

If we're sharing the ring across forks, then one process exiting means that we cancel ALL work and prevent future work. This is overly restrictive. As long as we cancel the work associated with the files from the current task, it's safe to let others persist. Normal fd close on exit will still wait for (and cancel) pending work.

Fixes: fcb323cc ("io_uring: io_uring: add support for async work inheriting files")
Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5
commit 73e08e71
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

This ends up being too restrictive for tasks that willingly fork and share the ring between forks. Andres reports that this breaks his postgresql work. Since we're close to the 5.5 release, revert this change for now.

Cc: stable@vger.kernel.org
Fixes: 44d28279 ("io_uring: only allow submit from owning task")
Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Eugene Syromiatnikov

mainline inclusion
from mainline-5.5
commit 1292e972
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

The fds field of struct io_uring_files_update is problematic with regard to compat user space, as pointer size is different in 32-bit, 32-on-64-bit, and 64-bit user space. In order to avoid custom handling of compat in the syscall implementation, make fds __u64 and use u64_to_user_ptr in order to retrieve it. Also, align the field naturally and check that no garbage is passed there.

Fixes: c3a31e60 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
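The resulting layout, simplified from the uapi header, plus the kernel-side conversion the commit describes:

    struct io_uring_files_update {
        __u32 offset;
        __u32 resv;          /* reserved: the kernel rejects non-zero values */
        __aligned_u64 fds;   /* user pointer carried as a u64, naturally aligned */
    };

On the kernel side the pointer is recovered with u64_to_user_ptr(), e.g. __s32 __user *fds = u64_to_user_ptr(up->fds), which works identically for 32-bit, 32-on-64-bit, and 64-bit user space.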
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc7
commit 44d28279
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

If the credentials or the mm doesn't match, don't allow the task to submit anything on behalf of this ring. The task that owns the ring can pass the file descriptor to another task, but we don't want to allow that task to submit an SQE that then assumes the ring mm and creds if it needs to go async.

Cc: stable@vger.kernel.org
Suggested-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc7
commit 11ba820b
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

A previous commit moved the locking for the async sqthread, but didn't take into account that the io-wq workers still need it. We can't use req->in_async for this anymore, as both the sqthread and io-wq workers set it, so gate the need for locking on io_wq_current_is_worker() instead.

Fixes: 8a4955ff ("io_uring: sqthread should grab ctx->uring_lock for submissions")
Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Bijan Mottahedeh

mainline inclusion
from mainline-5.5-rc7
commit 797f3f53
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

req->result is cleared when io_issue_sqe() calls io_read/write_pre() routines. Those routines however are not called when the sqe argument is NULL, which is the case when io_issue_sqe() is called from io_wq_submit_work(). io_issue_sqe() may then examine a stale result if a polled request had previously failed with -EAGAIN:

        if (ctx->flags & IORING_SETUP_IOPOLL) {
                if (req->result == -EAGAIN)
                        return -EAGAIN;
                io_iopoll_req_issued(req);
        }

and in turn cause a subsequently completed request to be re-issued in io_wq_submit_work().

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc7
commit 78912934
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

If we pass back dependent work in case of links, we need to always ensure that we call the link setup and work prep handler. If not, we might be missing some setup for the next work item.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc7
commit 74566df3
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

We don't need it, and if we have it, then the retry handler will attempt to copy the non-existent iovec with the inline iovec, with a segment count that doesn't make sense.

Fixes: f67676d1 ("io_uring: ensure async punted read/write requests copy iovec")
Reported-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc6
commit eacc6dfa
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

We currently punt any short read on a regular file to async context, but this fails if the short read is due to running into EOF. This is especially problematic since we only do the single prep for commands now, as we don't reset kiocb->ki_pos. This can result in a 4k read on a 1k file returning zero, as we detect the short read and then retry from async context. At the time of retry, the position is now 1k, and we end up reading nothing, and hence return 0.

Instead of trying to patch around the fact that short reads can be legitimate and won't succeed in case of retry, remove the logic to punt a short read to async context. Simply return it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc4
commit 3529d8c2
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

This moves the prep handlers outside of the opcode handlers, and allows us to pass in the sqe directly. If the sqe is non-NULL, it means that the request should be prepared for the first time.

With the opcode handlers not having access to the sqe at all, we are guaranteed that the prep handler has set up the request fully by the time we get there. As before, for opcodes that need to copy in more data than the io_kiocb allows for, the io_async_ctx holds that info. If a prep handler is invoked with req->io set, it must use that to retain information for later.

Finally, we can remove io_kiocb->sqe as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
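An illustrative shape of the split, with names simplified from fs/io_uring.c (do_fsync_work() is a stand-in): the prep handler is the only code that reads the sqe, and it runs exactly once.

    static int io_fsync_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
    {
        /* Read the sqe exactly once; afterwards it may be reused. */
        req->sync.flags = READ_ONCE(sqe->fsync_flags);
        return 0;
    }

    static int io_fsync(struct io_kiocb *req, bool force_nonblock)
    {
        /* Issue path: consumes only req->sync state, never the sqe. */
        return do_fsync_work(req, force_nonblock);
    }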
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc4
commit 06b76d44
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

We currently have a mix of use cases. Most of the newer ones are pretty uniform, but we have some older ones that use different calling conventions. This is confusing.

For the opcodes that currently rely on the req->io->sqe copy saving them from reuse, add a request type struct in the io_kiocb command union to store the data they need.

Prepare for all opcodes having a standard prep method, so we can call it in a uniform fashion and outside of the opcode handler. This is in preparation for passing in the 'sqe' pointer, rather than storing it in the io_kiocb. Once we have uniform prep handlers, we can leave all the prep work to that part, and not even pass in the sqe to the opcode handler. This ensures that we don't reuse sqe data inadvertently.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
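The shape this converges on, with the member list abridged: one small per-command struct per opcode, overlaid in a union inside io_kiocb so the request only ever pays for the largest member.

    struct io_kiocb {
        union {
            struct file         *file;     /* opcodes that only need the file */
            struct io_rw         rw;
            struct io_poll_iocb  poll;
            struct io_timeout    timeout;
            struct io_sr_msg     sr_msg;
            /* one entry per opcode that decodes sqe data */
        };
        /* remaining generic request state */
    };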
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc4
commit 26a61679
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

Add the count field to struct io_timeout, and ensure the prep handler has read it. Timeout also needs an async context always, so set it up in the prep handler if we don't have one.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc4
commit e47293fd
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

Add struct io_sr_msg to our io_kiocb per-command union, and ensure that the send/recvmsg prep handlers have grabbed what they need from the SQE by the time prep is done.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc4
commit 3fbb51c1
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

Add struct io_connect to our io_kiocb per-command union, and ensure that io_connect_prep() has grabbed what it needs from the SQE.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc4
commit 9adbd45d
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

Put the kiocb in struct io_rw, and add the addr/len for the request as well. Use the kiocb->private field for the buffer index for fixed reads and writes.

Any use of kiocb->ki_filp is flipped to req->file. It's the same thing, and less confusing.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc4
commit d55e5f5b
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

We use it in some spots, but not consistently. Convert the rest over; this makes it easier to read as well. No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc3
commit fd6c2e4c
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

I've been chasing a weird and obscure crash that was userspace stack corruption, and finally narrowed it down to a bit flip that made a stack address invalid. io_wq_submit_work() unconditionally flips the req->rw.ki_flags IOCB_NOWAIT bit, but since it's a generic work handler, this isn't valid. Normal read/write operations own that part of the request, but on other types it could be something else.

Move the IOCB_NOWAIT clear to the read/write handlers where it belongs.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Pavel Begunkov

mainline inclusion
from mainline-5.5-rc3
commit 7c504e65
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

There is no reliable way to submit and wait in a single syscall, as io_submit_sqes() may under-consume sqes (in case of an early error). Then it will wait for not-yet-submitted requests, deadlocking the user in most cases.

Don't wait/poll if we can't submit all sqes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
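A user-space view of the same rule, sketched with liburing (submit_and_wait_all() is a made-up wrapper): a short submit count means the missing CQEs can never arrive, so don't block waiting for the full batch.

    #include <errno.h>
    #include <liburing.h>

    int submit_and_wait_all(struct io_uring *ring, unsigned n)
    {
        int ret = io_uring_submit_and_wait(ring, n);

        if (ret < 0)
            return ret;                   /* submission failed outright */
        if ((unsigned)ret < n)
            return -EAGAIN;               /* under-consumed: fewer than n queued */
        return 0;
    }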
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc3
commit e781573e
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

Now that we have all the opcodes handled in terms of command prep and SQE reuse, add a printk_once() to warn about any potentially new and unhandled ones.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc3
commit d625c6ee
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

If we defer a request, we can't be reading the opcode again. Ensure that the user_data and opcode fields are stable. For the user_data we already have a place for it; for the opcode we can fill a one-byte hole and store that as well. For both of them, assign them when we originally read the SQE in io_get_sqring(). Any code that uses sqe->opcode or sqe->user_data is switched to req->opcode and req->user_data.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc3
commit b29472ee
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

If we defer this command as part of a link, we have to make sure that the SQE data has been read upfront. Integrate the timeout remove op into the prep handling to make it safe for SQE reuse.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc3
commit fbf23849
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

If we defer this command as part of a link, we have to make sure that the SQE data has been read upfront. Integrate the async cancel op into the prep handling to make it safe for SQE reuse.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc3
commit 0969e783
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

If we defer these commands as part of a link, we have to make sure that the SQE data has been read upfront. Integrate the poll add/remove into the prep handling to make it safe for SQE reuse.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Pavel Begunkov

mainline inclusion
from mainline-5.5-rc3
commit ffbb8d6b
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

The rules are as follows: if IOSQE_IO_HARDLINK is specified, then it's a link and there is no need to set IOSQE_IO_LINK separately, though it could be there. Add a proper check and ensure that IOSQE_IO_HARDLINK implies IOSQE_IO_LINK.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
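Read literally, the rule is a one-line normalization on the sqe flags; a sketch of the idea (the kernel folds this into its submission-path checks):

    /* A hard link is still a link: don't require userspace to set both. */
    if (sqe_flags & IOSQE_IO_HARDLINK)
        sqe_flags |= IOSQE_IO_LINK;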
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc3
commit 8ed8d3c3
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

We're currently not retaining sqe data for accept, fsync, and sync_file_range. None of these commands need data outside of what is directly provided, hence it can't go stale when the request is deferred. However, it can get reused if an application reuses SQE entries.

Ensure that we retain the information we need and only read the sqe contents once, off the submission path. Most of this is just moving code into a prep and finish function.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc3
commit fc4df999
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

We pass in req->sqe for all of them; there is no need to pass it separately, as the request is always passed in. This is a necessary prep patch to be able to clean up and fix the request prep path.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Committed by Jens Axboe

mainline inclusion
from mainline-5.5-rc3
commit b7bb4f7d
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
CVE: NA

---------------------------

Some of these code paths assume that any force_nonblock == true issue is not prepped, but that's not true if we did prep as part of link setup earlier. Check if we already have an async context allocated before setting up a new one.

Clean up the async context setup in general; we have a lot of duplicated code there.

Fixes: 03b1230c ("io_uring: ensure async punted sendmsg/recvmsg requests copy data")
Fixes: f67676d1 ("io_uring: ensure async punted read/write requests copy iovec")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-