Commits · 6ea60de69562d274b176a2a0c4f1f3722a0b06e8 · openanolis / cloud-kernel

May 27, 2020 · 40 commits

io_uring: move all prep state for IORING_OP_CONNECT to prep handler · 6ea60de6

Committed by Jens Axboe on Dec 20, 2019

to #26323578

commit 3fbb51c18f5c15a23db74c4da79d3d035176c480 upstream.

Add struct io_connect in our io_kiocb per-command union, and ensure
that io_connect_prep() has grabbed what it needs from the SQE.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: add and use struct io_rw for read/writes · c5d6bd8a

Committed by Jens Axboe on Dec 20, 2019

to #26323578

commit 9adbd45d6d32ffc1a03f3c51d72cfc69ebfc2ddb upstream.

Put the kiocb in struct io_rw, and add the addr/len for the request as
well. Use the kiocb->private field for the buffer index for fixed reads
and writes.

Any use of kiocb->ki_filp is flipped to req->file. It's the same thing,
and less confusing.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: use u64_to_user_ptr() consistently · 08ba131e

Committed by Jens Axboe on Dec 11, 2019

to #26323578

commit d55e5f5b70dd6214ef81fb2313121b72a7dd2200 upstream.

We use it in some spots, but not consistently. Convert the rest over,
which makes the code easier to read as well.

No functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
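As a rough user-space illustration of what such a helper does, the sketch below converts a 64-bit field into a pointer by going through `uintptr_t`. It is not the kernel macro: `demo_u64_to_ptr` and `demo_roundtrip_ok` are hypothetical names, and the kernel's u64_to_user_ptr() additionally type-checks its argument and carries the `__user` annotation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space analogue of a u64-to-pointer helper. */
static inline void *demo_u64_to_ptr(uint64_t value)
{
    /* Narrow to the native pointer width first, then convert. */
    return (void *)(uintptr_t)value;
}

/* Round-trips a pointer through a 64-bit field, as an SQE address would. */
int demo_roundtrip_ok(void *p)
{
    uint64_t raw = (uint64_t)(uintptr_t)p;

    /* The round trip only works if a pointer fits in 64 bits. */
    assert(sizeof(uintptr_t) <= sizeof(uint64_t));
    return demo_u64_to_ptr(raw) == p;
}
```

Using one helper everywhere keeps the double cast in a single place instead of repeating it at each call site.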

io_uring: io_wq_submit_work() should not touch req->rw · 531bd675

Committed by Jens Axboe on Dec 18, 2019

to #26323578

commit fd6c2e4c063d64511657ad0031a1677b6a914859 upstream.

I've been chasing a weird and obscure crash that was userspace stack
corruption, and finally narrowed it down to a bit flip that made a
stack address invalid. io_wq_submit_work() unconditionally flips
the req->rw.ki_flags IOCB_NOWAIT bit, but since it's a generic work
handler, this isn't valid. Normal read/write operations own that
part of the request, on other types it could be something else.

Move the IOCB_NOWAIT clear to the read/write handlers where it belongs.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
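The failure mode described in this commit can be reproduced in a few lines of user-space C. This is a minimal sketch, not the kernel code: the struct layouts, the opcode names, and the value of `DEMO_IOCB_NOWAIT` are hypothetical stand-ins, chosen only to show how clearing a flag through the wrong union member corrupts another command's data.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real kernel structures differ. */
#define DEMO_IOCB_NOWAIT (1u << 3)

enum demo_op { DEMO_OP_READ, DEMO_OP_OTHER };

struct demo_rw    { uint32_t ki_flags; };  /* read/write view of the storage */
struct demo_other { uint32_t addr_low; };  /* another command's view of it   */

struct demo_req {
    enum demo_op opcode;
    union {                                /* per-command data shares memory */
        struct demo_rw    rw;
        struct demo_other other;
    };
};

/* Buggy generic handler: clears the bit no matter which member is live. */
void demo_generic_work_buggy(struct demo_req *req)
{
    req->rw.ki_flags &= ~DEMO_IOCB_NOWAIT;
}

/* Fixed shape: only read/write requests have their rw flags touched. */
void demo_generic_work_fixed(struct demo_req *req)
{
    if (req->opcode == DEMO_OP_READ)
        req->rw.ki_flags &= ~DEMO_IOCB_NOWAIT;
}

/* Runs a handler on a non-read/write request and returns its payload. */
uint32_t demo_run(void (*handler)(struct demo_req *))
{
    struct demo_req req = { .opcode = DEMO_OP_OTHER };

    assert(handler);
    req.other.addr_low = 0xdeadbeefu;
    handler(&req);
    return req.other.addr_low;
}
```

With the buggy handler, `demo_run()` returns 0xdeadbee7 instead of 0xdeadbeef: a single flipped bit in what the other command treats as an address.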

io_uring: don't wait when under-submitting · 13bd4808

Committed by Pavel Begunkov on Dec 18, 2019

to #26323578

commit 7c504e65206a4379ff38fe41d21b32b6c2c3e53e upstream.

There is no reliable way to submit and wait in a single syscall, as
io_submit_sqes() may under-consume sqes (in case of an early error).
Then it will wait for not-yet-submitted requests, deadlocking the user
in most cases.

Don't wait/poll if we can't submit all sqes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
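The rule this commit introduces can be written as a small predicate. The sketch below is an assumption-laden model, not the kernel's enter path: `demo_may_wait` and its parameters are hypothetical names for the requested submit count, the count actually consumed, and the number of completions the caller asked to wait for.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical decision helper: may the enter path go on to wait for
 * completions after trying to submit `to_submit` SQEs?
 */
bool demo_may_wait(unsigned int to_submit, unsigned int submitted,
                   unsigned int min_complete)
{
    /* The submit path never consumes more than it was asked to. */
    assert(submitted <= to_submit);

    if (submitted != to_submit)
        return false;          /* under-submitted: return to the caller */
    return min_complete > 0;   /* everything was consumed: waiting is safe */
}
```

Returning early on a short submit lets the application see the error and resubmit, rather than sleeping on completions for requests that were never queued.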

io_uring: warn about unhandled opcode · 861c7c78

Committed by Jens Axboe on Dec 17, 2019

to #26323578

commit e781573e2fb1b75acdba61dcb9bcbfc16f288442 upstream.

Now that we have all the opcodes handled in terms of command prep and
SQE reuse, add a printk_once() to warn about any potentially new and
unhandled ones.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: read opcode and user_data from SQE exactly once · 96baaf97

Committed by Jens Axboe on Dec 17, 2019

to #26323578

commit d625c6ee4975000140c57da7e1ff244efefde274 upstream.

If we defer a request, we can't be reading the opcode again. Ensure that
the user_data and opcode fields are stable. For the user_data we already
have a place for it, for the opcode we can fill a one byte hole and store
that as well. For both of them, assign them when we originally read the
SQE in io_get_sqring(). Any code that uses sqe->opcode or sqe->user_data
is switched to req->opcode and req->user_data.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
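The "read exactly once" idea can be modelled in user space as a snapshot taken when the entry is first fetched. The sketch below is illustrative only: the `demo_*` structs and functions are hypothetical, with just two fields each, and only mimic the shape of copying from the shared SQE into the private request.

```c
#include <assert.h>
#include <stdint.h>

/* Shared with the application; it may rewrite the slot at any time. */
struct demo_sqe { uint8_t opcode; uint64_t user_data; };

/* Private to the submission side; stable once filled in. */
struct demo_req { uint8_t opcode; uint64_t user_data; };

/* Snapshot the fields once, when the SQE is first fetched. */
void demo_get_sqring(struct demo_req *req, const volatile struct demo_sqe *sqe)
{
    assert(req && sqe);
    req->opcode = sqe->opcode;
    req->user_data = sqe->user_data;
}

/* Deferred issue: consults only the private copy, never the SQE. */
uint8_t demo_issue_deferred(const struct demo_req *req)
{
    assert(req);
    return req->opcode;
}
```

If the application reuses the SQE slot between the snapshot and the deferred issue, the request still dispatches on the opcode it was submitted with.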

io_uring: make IORING_OP_TIMEOUT_REMOVE deferrable · baf2a6a7

Committed by Jens Axboe on Dec 17, 2019

to #26323578

commit b29472ee7b53784f44011069fad15e539fd25bcf upstream.

If we defer this command as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the timeout remove op into
the prep handling to make it safe for SQE reuse.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: make IORING_OP_CANCEL_ASYNC deferrable · bb92d9dd

Committed by Jens Axboe on Dec 17, 2019

to #26323578

commit fbf23849b1724d3ea362e346d0877a8d87978fe6 upstream.

If we defer this command as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the async cancel op into
the prep handling to make it safe for SQE reuse.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: make IORING_POLL_ADD and IORING_POLL_REMOVE deferrable · 5027d877

Committed by Jens Axboe on Dec 17, 2019

to #26323578

commit 0969e783e3a8913f79df27286501a6c21e961524 upstream.

If we defer these commands as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the poll add/remove into
the prep handling to make it safe for SQE reuse.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: make HARDLINK imply LINK · 384d1eb0

Committed by Pavel Begunkov on Dec 17, 2019

to #26323578

commit ffbb8d6b76910d4f3a2bafeaf68c419011e98d05 upstream.

The rule is as follows: if IOSQE_IO_HARDLINK is specified, then the
request is a link and there is no need to set IOSQE_IO_LINK separately,
though it may also be set. Add a proper check and ensure that
IOSQE_IO_HARDLINK implies IOSQE_IO_LINK.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
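The implication can be expressed as a single mask test. The following is a sketch under assumptions: the `DEMO_` constants are believed to mirror the bit positions in the uapi header but should be checked against your own `linux/io_uring.h`, and `demo_sqe_is_link` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the uapi bit positions; verify against your headers. */
#define DEMO_IOSQE_IO_LINK     (1u << 2)
#define DEMO_IOSQE_IO_HARDLINK (1u << 3)

static_assert(DEMO_IOSQE_IO_LINK != DEMO_IOSQE_IO_HARDLINK,
              "link flags must be distinct bits");

/* A request starts or continues a link if either flag is set. */
bool demo_sqe_is_link(uint8_t flags)
{
    return (flags & (DEMO_IOSQE_IO_LINK | DEMO_IOSQE_IO_HARDLINK)) != 0;
}
```

Testing both bits in one place means callers never need to set IOSQE_IO_LINK alongside IOSQE_IO_HARDLINK, while setting both remains harmless.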

io_uring: any deferred command must have stable sqe data · 1d0e0743

Committed by Jens Axboe on Dec 16, 2019

to #26323578

commit 8ed8d3c3bc32bf5b442c9f54013b4a47d5cae740 upstream.

We're currently not retaining sqe data for accept, fsync, and
sync_file_range. None of these commands need data outside of what
is directly provided, hence it can't go stale when the request is
deferred. However, it can get reused, if an application reuses
SQE entries.

Ensure that we retain the information we need and only read the sqe
contents once, off the submission path. Most of this is just moving
code into a prep and finish function.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: remove 'sqe' parameter to the OP helpers that take it · 16b249b5

Committed by Jens Axboe on Dec 10, 2019

to #26323578

commit fc4df999e24fc3006441acd4ce6250e6a76ac851 upstream.

We pass in req->sqe for all of them, no need to pass it in as the
request is always passed in. This is a necessary prep patch to be
able to cleanup/fix the request prep path.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: fix pre-prepped issue with force_nonblock == true · c16f35ff

Committed by Jens Axboe on Dec 15, 2019

to #26323578

commit b7bb4f7da0a1a92f142697f1c9ce335e7a44f4b1 upstream.

Some of these code paths assume that any force_nonblock == true issue
is not prepped, but that's not true if we did prep as part of link setup
earlier. Check if we already have an async context allocated before
setting up a new one.

Cleanup the async context setup in general, we have a lot of duplicated
code there.

Fixes: 03b1230ca12a ("io_uring: ensure async punted sendmsg/recvmsg requests copy data")
Fixes: f67676d160c6 ("io_uring: ensure async punted read/write requests copy iovec")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: fix sporadic -EFAULT from IORING_OP_RECVMSG · c5eb3fd3

Committed by Jens Axboe on Dec 15, 2019

to #26323578

commit 0b416c3e1345fd696db4c422643468d844410877 upstream.

If we have to punt the recvmsg to async context, we copy all the
context.  But since the iovec used can be either on-stack (if small) or
dynamically allocated, if it's on-stack, then we need to ensure we reset
the iov pointer. If we don't, then we're reusing old stack data, and
that can lead to -EFAULTs if things get overwritten.

Ensure we retain the right pointers for the iov, and free it as well if
we end up having to go beyond UIO_FASTIOV number of vectors.

Fixes: 03b1230ca12a ("io_uring: ensure async punted sendmsg/recvmsg requests copy data")
Reported-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
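The pointer problem in this commit is the classic hazard of copying a struct that points into its own inline storage. The sketch below shows the shape of the fix in user space; `demo_msg_ctx`, `demo_ctx_copy`, and `DEMO_FASTIOV` are hypothetical stand-ins and do not reproduce the kernel's async context layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

#define DEMO_FASTIOV 8   /* stand-in for UIO_FASTIOV */

struct demo_msg_ctx {
    struct iovec  fast_iov[DEMO_FASTIOV]; /* inline storage for small vectors */
    struct iovec *iov;                    /* fast_iov, or a separately allocated array */
};

/*
 * Copy the context for async use. If the source pointed at its own inline
 * array, re-point the copy at the copy's inline array; otherwise keep the
 * separately allocated pointer as is.
 */
void demo_ctx_copy(struct demo_msg_ctx *dst, const struct demo_msg_ctx *src)
{
    assert(dst != NULL && src != NULL);

    memcpy(dst, src, sizeof(*dst));
    if (src->iov == src->fast_iov)
        dst->iov = dst->fast_iov;  /* otherwise dst->iov aims at src's storage */
}
```

Without the fix-up, the copy keeps referring to the original (possibly on-stack) array, which is exactly the stale data that can later produce -EFAULT.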

io_uring: fix stale comment and a few typos · 29e01b6a

Committed by Brian Gianforcaro on Dec 13, 2019

to #26323578

commit d195a66e367b3d24fdd3c3565f37ab7c6882b9d2 upstream.

- Fix a few typos found while reading the code.

- Fix stale io_get_sqring comment referencing s->sqe, the 's' parameter
  was renamed to 'req', but the comment still holds.
Signed-off-by: Brian Gianforcaro <b.gianfo@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: ensure we return -EINVAL on unknown opcode · 569f4461

Committed by Jens Axboe on Dec 11, 2019

to #26323578

commit 9e3aa61ae3e01ce1ce6361a41ef725e1f4d1d2bf upstream.

If we submit an unknown opcode and have fd == -1, io_op_needs_file()
will return true as we default to needing a file. Then when we go and
assign the file, we find the 'fd' invalid and return -EBADF. We really
should be returning -EINVAL for that case, as we normally do for
unsupported opcodes.

Change io_op_needs_file() to have the following return values:

0   - does not need a file
1   - does need a file
< 0 - error value

and use this to pass back the right value for this invalid case.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
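The three-way return convention listed in this commit can be sketched as follows. The opcode enum and both function names are hypothetical and cover only two opcodes; the point is the ordering of the checks, so that an unknown opcode yields -EINVAL before the file descriptor is ever examined.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical opcodes for illustration only. */
enum demo_op { DEMO_OP_NOP, DEMO_OP_READV, DEMO_OP_LAST };

/* 0: does not need a file, 1: does need a file, < 0: error value. */
int demo_op_needs_file(int opcode)
{
    switch (opcode) {
    case DEMO_OP_NOP:
        return 0;
    case DEMO_OP_READV:
        return 1;
    default:
        return -EINVAL;   /* unknown opcode */
    }
}

/* Resolve the file for a request; fd validity is checked only if needed. */
int demo_req_set_file(int opcode, int fd)
{
    int ret = demo_op_needs_file(opcode);

    if (ret <= 0)
        return ret;       /* nothing to do, or propagate the error */
    assert(ret == 1);     /* the only remaining value in the contract */
    if (fd < 0)
        return -EBADF;
    return 0;
}
```

With a boolean return, the unknown-opcode case would fall through to the fd check and report -EBADF for `fd == -1`, which is the behaviour the commit corrects.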

io_uring: add sockets to list of files that support non-blocking issue · d935af1c

Committed by Jens Axboe on Dec 9, 2019

to #26323578

commit 10d59345578a116042c1a5d737a18234aaf3e0e6 upstream.

In chasing a performance issue between using IORING_OP_RECVMSG and
IORING_OP_READV on sockets, tracing showed that we always punt the
socket reads to async offload. This is due to io_file_supports_async()
not checking for S_ISSOCK on the inode. Since sockets support the
O_NONBLOCK (or MSG_DONTWAIT) flag just fine, add sockets to the list
of file types that we can do a non-blocking issue to.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
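A file-type check of this kind can be tried out in user space with the `S_IS*` macros from `<sys/stat.h>`. The sketch below is an approximation, not the kernel function: `demo_file_supports_async` and `demo_fd_supports_async` are hypothetical names, and the exact set of types the kernel accepts (and any extra conditions it applies to regular files) is an assumption here.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Assumed policy: these file types can be attempted non-blocking first. */
bool demo_file_supports_async(mode_t mode)
{
    return S_ISREG(mode) || S_ISBLK(mode) || S_ISCHR(mode) ||
           S_ISSOCK(mode);   /* the case this commit adds */
}

/* Convenience wrapper: classify the file behind an open descriptor. */
bool demo_fd_supports_async(int fd)
{
    struct stat st;

    assert(fd >= 0);
    if (fstat(fd, &st) != 0)
        return false;
    return demo_file_supports_async(st.st_mode);
}
```

Any type that fails the check is punted straight to async offload, which is why omitting sockets cost performance on socket reads.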

io_uring: only hash regular files for async work execution · 1efbc3bd

Committed by Jens Axboe on Dec 9, 2019

to #26323578

commit 53108d476a105ab2597d7a4e6040b127829391b5 upstream.

We hash regular files to avoid having multiple threads hammer on the
inode mutex, but it should not be needed on other types of files
(like sockets).
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: run next sqe inline if possible · 4319a66f

Committed by Jens Axboe on Dec 9, 2019

to #26323578

commit 4a0a7a187453e65bdd24b9ede045b4c36b958868 upstream.

One major use case of linked commands is the ability to run the next
link inline, if at all possible. This is done correctly for async
offload, but somewhere along the line we lost the ability to do so when
we were able to complete a request without having to punt it. Ensure
that we do so correctly.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: don't dynamically allocate poll data · c1f967d5

Committed by Jens Axboe on Dec 9, 2019

to #26323578

commit 392edb45b24337eaa0bc1ecd4e3cf897e662ec61 upstream.

This essentially reverts commit e944475e6984. For high poll ops
workloads, like TAO, the dynamic allocation of the wait_queue
entry for IORING_OP_POLL_ADD adds considerable extra overhead.
Go back to embedding the wait_queue_entry, but keep the usage of
wait->private for the pointer stashing.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: deferred send/recvmsg should assign iov · 284c1391

Committed by Jens Axboe on Dec 9, 2019

to #26323578

commit d96885658d9971fc2c752b8699f17a42ef745db6 upstream.

Don't just assign it from the main call path, that can miss the case
when we're called from issue deferral.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: sqthread should grab ctx->uring_lock for submissions · e5b3fb54

Committed by Jens Axboe on Dec 9, 2019

to #26323578

commit 8a4955ff1cca7d4da480774034a16e7c28bafec8 upstream.

We use the mutex to guard against registered file updates, for instance.
Ensure we're safe in accessing that state against concurrent updates.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io-wq: briefly spin for new work after finishing work · 52bb67cc

Committed by Jens Axboe on Dec 7, 2019

to #26323578

commit e995d5123ed433e37a8d63ac528737c912592e3d upstream.

To avoid going to sleep only to get woken shortly thereafter, spin
briefly for new work upon completion of work.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io-wq: remove worker->wait waitqueue · 992c567e

Committed by Jens Axboe on Dec 7, 2019

to #26323578

commit 506d95ff5d6aa0a099a116c49d3884e29801d843 upstream.

We only have one case of using the waitqueue to wake the worker, the
rest are using wake_up_process(). Since we can save some cycles by not
fiddling with the waitqueue in io_wqe_worker(), switch the work activation
to task wakeup and get rid of the now unused wait_queue_head_t in
struct io_worker.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: allow unbreakable links · e10c16e1

Committed by Jens Axboe on Dec 7, 2019

to #26323578

commit 4e88d6e7793f2f445f43bd608828541d7f43b608 upstream.

Some commands will invariably end in a failure in the sense that the
completion result will be less than zero. One such example is timeouts
that don't have a completion count set, they will always complete with
-ETIME unless cancelled.

For linked commands, we sever links and fail the rest of the chain if
the result is less than zero. Since we have commands where we know that
will happen, add IOSQE_IO_HARDLINK as a stronger link that doesn't sever
regardless of the completion result. Note that the link will still sever
if we fail submitting the parent request, hard links are only resilient
in the presence of completion results for requests that did submit
correctly.

Cc: stable@vger.kernel.org # v5.4
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reported-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
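The difference between a normal link and a hard link can be modelled with a toy chain walker. This is a simplified sketch under stated assumptions: each step carries a pre-determined completion result, the whole array is treated as one chain, submission failures are not modelled, and the `DEMO_` flag values are assumed to match the uapi bit positions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror the uapi bit positions; verify against your headers. */
#define DEMO_IOSQE_IO_LINK     (1u << 2)
#define DEMO_IOSQE_IO_HARDLINK (1u << 3)

struct demo_step {
    unsigned int flags;  /* link flags on this request */
    int res;             /* completion result it will produce */
};

/* Returns how many requests in the chain get to run. */
size_t demo_run_chain(const struct demo_step *chain, size_t n)
{
    size_t ran = 0;

    assert(chain != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        bool hard = (chain[i].flags & DEMO_IOSQE_IO_HARDLINK) != 0;

        ran++;
        if (chain[i].res < 0 && !hard)
            break;   /* normal link: a negative result severs the chain */
    }
    return ran;
}
```

A timeout that completes with -ETIME stops the chain after one request when linked with IOSQE_IO_LINK, but lets the next request run when linked with IOSQE_IO_HARDLINK.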

io_uring: fix a typo in a comment · c81a55f4

Committed by LimingWu on Dec 5, 2019

to #26323578

commit 0b4295b5e2b9b42f3f3096496fe4775b656c9ba6 upstream.

thatn -> than.
Signed-off-by: Liming Wu <19092205@suning.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: hook all linked requests via link_list · 9e6624b6

Committed by Pavel Begunkov on Dec 5, 2019

to #26323578

commit 4493233edcfc0ad0a7f76f1c83f95b1bcf280547 upstream.

Links are created by chaining requests through req->list with an
exception that head uses req->link_list. (e.g. link_list->list->list)
Because of that, io_req_link_next() needs complex splicing to advance.

Link them all through link_list. Also, it seems to be simpler and more
consistent IMHO.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: fix error handling in io_queue_link_head · 4e09c502

Committed by Pavel Begunkov on Dec 5, 2019

to #26323578

commit 2e6e1fde32d7d41cf076c21060c329d3fdbce25c upstream.

In case of an error, io_submit_sqe() drops a request and continues
without it, even if the request was part of a link. Not only does it
fail to cancel the links, it may also execute the wrong sequence of actions.

Stop consuming sqes, and let the user handle errors.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: use hash table for poll command lookups · fc77bd62

Committed by Jens Axboe on Dec 4, 2019

to #26323578

commit 78076bb64aa8ba5b7207c38b2660a9e10ffa8cc7 upstream.

We recently changed this from a single list to an rbtree, but for some
real life workloads, the rbtree slows down the submission/insertion
case enough so that it's the top cycle consumer on the io_uring side.
In testing, using a hash table is a more well rounded compromise. It
is fast for insertion, and as long as it's sized appropriately, it
works well for the cancellation case as well. Running TAO with a lot
of network sockets, this eliminates the 2% of CPU cycles that were
spent in io_poll_req_insert().
Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
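The trade-off described in this commit (cheap insertion, acceptable lookup for cancellation) is what a chained hash table keyed by `user_data` provides. The sketch below is a user-space model only: the bucket count, the multiplicative hash, and every `demo_*` name are assumptions, and the kernel uses its own list and hash primitives instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BUCKETS 16u   /* hypothetical size; must stay a power of two */

struct demo_poll_req {
    uint64_t user_data;
    struct demo_poll_req *next;   /* chain within one bucket */
};

struct demo_poll_table {
    struct demo_poll_req *bucket[DEMO_BUCKETS];
};

/* Multiplicative hash; the top 4 bits select one of the 16 buckets. */
static unsigned int demo_hash(uint64_t user_data)
{
    return (unsigned int)((user_data * 0x9E3779B97F4A7C15ull) >> 60);
}

/* Constant-time insertion at the head of the bucket. */
void demo_poll_insert(struct demo_poll_table *t, struct demo_poll_req *req)
{
    unsigned int i;

    assert(t != NULL && req != NULL);
    i = demo_hash(req->user_data);
    req->next = t->bucket[i];
    t->bucket[i] = req;
}

/* Find and unlink the request with matching user_data, or return NULL. */
struct demo_poll_req *demo_poll_remove(struct demo_poll_table *t,
                                       uint64_t user_data)
{
    struct demo_poll_req **pp;

    assert(t != NULL);
    for (pp = &t->bucket[demo_hash(user_data)]; *pp; pp = &(*pp)->next) {
        if ((*pp)->user_data == user_data) {
            struct demo_poll_req *found = *pp;

            *pp = found->next;
            found->next = NULL;
            return found;
        }
    }
    return NULL;
}
```

Insertion never walks a chain, so the submission path stays cheap; removal only walks the one bucket the key hashes to, which stays short as long as the table is sized for the workload.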

io_uring: ensure deferred timeouts copy necessary data · bcad288f

Committed by Jens Axboe on Dec 4, 2019

to #26323578

commit 2d28390aff879238f00e209e38c2a0b78717360e upstream.

If we defer a timeout, we should ensure that we copy the timespec
when we have consumed the sqe. This is similar to commit f67676d160c6
for read/write requests. We already did this correctly for timeouts
deferred as links, but do it generally and use the infrastructure added
by commit 1a6b74fc8702 instead of having the timeout deferral use its
own.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: allow IO_SQE_* flags on IORING_OP_TIMEOUT · f629be21

Committed by Jens Axboe on Dec 4, 2019

to #26323578

commit 901e59bba9ddad4bc6994ecb8598ea60a993da4c upstream.

There's really no reason why we forbid things like link/drain etc on
regular timeout commands. Enable the usual SQE flags on timeouts.
Reported-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: handle connect -EINPROGRESS like -EAGAIN · 11360117

Committed by Jens Axboe on Dec 3, 2019

to #26323578

commit 87f80d623c6c93c721b2aaead8a45e848bc8ffbf upstream.

Right now we return it to userspace, which means the application has
to poll for the socket to be writeable. Let's just treat it like
-EAGAIN and have io_uring handle it internally, this makes it much
easier to use.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
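The mapping this commit describes is small enough to state as code. The sketch below is a model, not the kernel handler: `demo_connect_result` is a hypothetical function that takes the raw result of a connect attempt and a flag saying whether the attempt was made in non-blocking mode.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical post-processing of a connect result. Returning -EAGAIN
 * stands for "retry from a context that is allowed to block".
 */
int demo_connect_result(int ret, bool force_nonblock)
{
    assert(ret <= 0);   /* kernel-style result: 0 or a negative errno */

    if (force_nonblock && (ret == -EAGAIN || ret == -EINPROGRESS))
        return -EAGAIN;
    return ret;
}
```

Folding -EINPROGRESS into the retry path means the application sees only the final outcome of the connect and no longer has to poll the socket for writability itself.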

io_uring: remove parameter ctx of io_submit_state_start · 708b8cda

Committed by Jackie Liu on Dec 2, 2019

to #26323578

commit 22efde5998657f6d1f31592c659aa3a9c7ad65f1 upstream.

The ctx parameter has never been used; clean it up.
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: mark us with IORING_FEAT_SUBMIT_STABLE · b3b5cb38

Committed by Jens Axboe on Dec 2, 2019

to #26323578

commit da8c96906990f1108cb626ee7865e69267a3263b upstream.

If this flag is set, applications can be certain that any data for
async offload has been consumed when the kernel has consumed the
SQE.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
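An application can test for this guarantee by checking the feature bits the kernel reports at ring setup (the `features` field of the setup parameters). The sketch below only shows the bit test: `DEMO_IORING_FEAT_SUBMIT_STABLE` is assumed to match the uapi value of the flag and should be replaced by the constant from your own `linux/io_uring.h`, and `demo_submit_is_stable` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the uapi value; prefer the constant from the header. */
#define DEMO_IORING_FEAT_SUBMIT_STABLE (1u << 2)

static_assert((DEMO_IORING_FEAT_SUBMIT_STABLE &
               (DEMO_IORING_FEAT_SUBMIT_STABLE - 1)) == 0,
              "feature flag must be a single bit");

/* True if submission-time data may be released once the SQE is consumed. */
bool demo_submit_is_stable(uint32_t features)
{
    return (features & DEMO_IORING_FEAT_SUBMIT_STABLE) != 0;
}
```

When the bit is absent, the conservative choice is to keep iovecs, msghdrs, and similar submission data alive until the matching completion arrives.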

io_uring: ensure async punted connect requests copy data · 62753e93

Committed by Jens Axboe on Dec 2, 2019

to #26323578

commit f499a021ea8c9f70321fce3d674d8eca5bbeee2c upstream.

Just like commit f67676d160c6 for read/write requests, this one ensures
that the sockaddr data has been copied for IORING_OP_CONNECT if we need
to punt the request to async context.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: ensure async punted sendmsg/recvmsg requests copy data · 43b411d0

Committed by Jens Axboe on Dec 2, 2019

to #26323578

commit 03b1230ca12a12e045d83b0357792075bf94a1e0 upstream.

Just like commit f67676d160c6 for read/write requests, this one ensures
that the msghdr data is fully copied if we need to punt a recvmsg or
sendmsg system call to async context.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

net: disallow ancillary data for __sys_{send,recv}msg_file() · dbc2b5b9

Committed by Jens Axboe on Nov 25, 2019

to #26323578

commit d69e07793f891524c6bbf1e75b9ae69db4450953 upstream.

Only io_uring uses (and added) these, and we want to disallow the
use of sendmsg/recvmsg for anything but regular data transfers.
Use the newly added prep helper to split the msghdr copy out from
the core function, to check for msg_control and msg_controllen
settings. If either is set, we return -EINVAL.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
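The validation step described here amounts to rejecting any message header that carries control data. The sketch below applies that rule to the user-space `struct msghdr` from `<sys/socket.h>`; `demo_msghdr_check` is a hypothetical name, and the kernel performs the check on its own copy of the header inside the prep helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Returns 0 for a plain data transfer, -EINVAL if ancillary data is set. */
int demo_msghdr_check(const struct msghdr *msg)
{
    assert(msg != NULL);

    if (msg->msg_control != NULL || msg->msg_controllen != 0)
        return -EINVAL;
    return 0;
}
```

Checking both fields matches the "if either is set" wording: a non-NULL control pointer with zero length, or the reverse, is rejected as well.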

net: separate out the msghdr copy from ___sys_{send,recv}msg() · 15ec0cd5

Committed by Jens Axboe on Nov 25, 2019

to #26323578

commit 4257c8ca13b084550574b8c9a667d9c90ff746eb upstream.

This is in preparation for enabling the io_uring helpers for sendmsg
and recvmsg to first copy the header for validation before continuing
with the operation.

There should be no functional changes in this patch.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: ensure async punted read/write requests copy iovec · 99130197

Committed by Jens Axboe on Dec 2, 2019

to #26323578

commit f67676d160c6ee2ed82917fadfed6d29cab8237c upstream.

Currently we don't copy the iovecs when we punt to async context. This
can be problematic for applications that store the iovec on the stack,
as they often assume that it's safe to let the iovec go out of scope
as soon as IO submission has been called. This isn't always safe, as we
will re-copy the iovec once we're in async context.

Make this 100% safe by copying the iovec just once. With this change,
applications may safely store the iovec on the stack for all cases.
Reported-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

openanolis / cloud-kernel · last synced successfully more than a year ago