- 28 May 2020 (33 commits)
-
-
Committed by Jens Axboe
to #26323588 commit f8748881b17dc56b3faa1d30c823f071c56593e5 upstream. We only use it internally in the prep functions for both statx and openat, so we don't need it to be persistent across the request.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
to #26323588 commit c12cedf24e786509de031a832e6b0e5f8b3ca37b upstream. We'll need this for openat2(2) support; remove flags and mode from the existing io_open struct.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
to #26323588 commit f2842ab5b72d7ee5f7f8385c2d4f32c133f5837b upstream. If an application is using eventfd notifications with poll to know when new SQEs can be issued, it expects the following read/writes to complete inline. And with that, it knows there are events available, and doesn't want spurious wakeups on the eventfd for those requests. This adds IORING_REGISTER_EVENTFD_ASYNC, which works just like IORING_REGISTER_EVENTFD, except it only triggers notifications for events that happen from async completions (IRQ, or io-wq worker completions). Any completions inline from the submission itself will not trigger notifications.
Suggested-by: Mark Papadakis <markuspapadakis@icloud.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
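A minimal sketch of using the new registration from user space, assuming a liburing recent enough to provide io_uring_register_eventfd_async():

    #include <liburing.h>
    #include <sys/eventfd.h>

    /* Only async completions (IRQ or io-wq worker) will signal this
     * eventfd; completions that happen inline at submit time won't. */
    int setup_async_eventfd(struct io_uring *ring)
    {
        int efd = eventfd(0, EFD_CLOEXEC);

        if (efd < 0)
            return -1;
        /* issues IORING_REGISTER_EVENTFD_ASYNC under the hood */
        if (io_uring_register_eventfd_async(ring, efd) < 0)
            return -1;
        return efd;
    }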
-
Committed by Jens Axboe
to #26323588 commit 69b3e546139a21b3046b6bf0cb79d5e8c9a3fa75 upstream. In preparation for adding another flag, which would make us spill into another long (and hence bump the size of the ctx), change the existing bool fields to bit fields.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
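The shape of the change, as a sketch (field names are illustrative, not the exact io_ring_ctx layout):

    /* before: each flag is a full bool, so one more flag can spill
     * the state into another long */
    struct ctx_state_before {
        bool compat;
        bool account_mem;
    };

    /* after: 1-bit fields share a single unsigned int, so adding a
     * flag is free until the word fills up */
    struct ctx_state_after {
        unsigned int compat      : 1;
        unsigned int account_mem : 1;
    };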
-
Committed by Jens Axboe
to #26323588 commit c150368b496837cb207712e78f903ccfd7633b93 upstream. If an application attempts to register a set with unbounded requests pending, we can be stuck here forever if they don't complete. We can make this wait interruptible, and just abort if we get signaled.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by YueHaibing
to #26323588 commit 96fd84d83a778450ffae737d9efa546ac3983b1f upstream. A NULL check before kfree() is redundant, so remove it. This was detected by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
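The pattern this cleanup removes; kfree(NULL) is defined to be a no-op, so the guard is dead code:

    /* before */
    if (ptr)
        kfree(ptr);

    /* after: equivalent, since kfree(NULL) does nothing */
    kfree(ptr);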
-
Committed by Jens Axboe
to #26323588 commit fddafacee287b3140212c92464077e971401f860 upstream. This adds IORING_OP_SEND for send(2) support, and IORING_OP_RECV for recv(2) support.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
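A minimal liburing sketch of the new opcodes, assuming an already-connected socket (io_uring_prep_send()/io_uring_prep_recv() map directly to IORING_OP_SEND/IORING_OP_RECV):

    #include <liburing.h>

    /* Queue one send and one recv on a connected socket. */
    int queue_send_recv(struct io_uring *ring, int sock,
                        char *rxbuf, size_t rxlen)
    {
        struct io_uring_sqe *sqe;
        static const char msg[] = "ping";

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_send(sqe, sock, msg, sizeof(msg) - 1, 0);

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_recv(sqe, sock, rxbuf, rxlen, 0);

        return io_uring_submit(ring);
    }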
-
Committed by Pavel Begunkov
to #26323588 commit 2550878f8421f7912fdd56b38c630b797f95c749 upstream. io_wq workers use io_issue_sqe() to forward sqes and never io_queue_sqe(). Remove the extra check for io_wq_current_is_worker().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov
to #26323588 commit caf582c652feccd42c50923f0467c4f2dcef279e upstream. It should be pretty rare to not submit anything when there is something in the ring. No need to keep heuristics for this case.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov
to #26323588 commit ee7d46d9db19ded7b7222af95add63606318a480 upstream. A user may ask to submit more than there is in the ring, and then io_uring will submit as much as it can. However, in the last iteration it will allocate an io_kiocb and immediately free it. It could do better and adjust @to_submit to what is in the ring. And since the ring's head is already checked here, there is no need to do it in the loop, which only spams smp_load_acquire() barriers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov
to #26323588 commit 9ef4f124894b7b9241a3cf5f9b40db0812783d66 upstream. Make io_submit_sqes() clamp @to_submit itself. This removes duplicated code and prepares for the following changes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
to #26323588 commit 8110c1a6212e430a84edd2b83fe9043def8b743e upstream. Some applications like to start small in terms of ring size, and then ramp up as needed. This is a bit tricky to do currently, since we don't advertise the max ring size. This adds IORING_SETUP_CLAMP. If set, and the values for SQ or CQ ring size exceed what we support, then clamp them at the max values instead of returning -EINVAL. Since we return the chosen ring sizes after setup, no further changes are needed on the application side. io_uring already changes the ring sizes if the application doesn't ask for power-of-two sizes, for example.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
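How an application would use the flag, as a liburing sketch (the clamped sizes come back in the params struct):

    #include <liburing.h>

    /* Ask for an oversized ring; with IORING_SETUP_CLAMP the kernel
     * caps SQ/CQ sizes at its maximum instead of failing -EINVAL. */
    int init_clamped(struct io_uring *ring)
    {
        struct io_uring_params p = { .flags = IORING_SETUP_CLAMP };
        int ret = io_uring_queue_init_params(1 << 20, ring, &p);

        if (ret < 0)
            return ret;
        /* p.sq_entries / p.cq_entries hold the sizes actually used */
        return 0;
    }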
-
Committed by Jens Axboe
to #26323588 commit c6ca97b30c47c7ad36107d3764bb4dc37026d171 upstream. Currently we only batch free if fixed files are used, no links, no aux data, etc. This extends the batch freeing to only exclude the linked case and fallback case, and makes io_free_req_many() handle the other cases just fine.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
to #26323588 commit 8237e045983d82ba78eaab5f60b9300927fc6796 upstream. This cleans up the code a bit, and it allows us to build on top of the multi-req freeing.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov
to #26323588 commit 2b85edfc0c90efc68dea3d665bb4111bf0694e05 upstream. percpu_ref_tryget() has its own overhead. Instead of getting a reference for each request, grab a bunch once per io_submit_sqes(). ~5% throughput boost for a "submit and wait 128 nops" benchmark.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
__io_req_free_empty() -> __io_req_do_free()
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
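The batching pattern, sketched with the kernel's percpu_ref API (illustrative helper names, not the exact io_uring code):

    #include <linux/errno.h>
    #include <linux/percpu-refcount.h>

    /* Grab refs for a whole submission batch in one operation
     * instead of one percpu_ref_tryget() per request. */
    static int get_batch_refs(struct percpu_ref *refs, unsigned long nr)
    {
        return percpu_ref_tryget_many(refs, nr) ? 0 : -EAGAIN;
    }

    /* Return whatever part of the batch went unused. */
    static void put_unused_refs(struct percpu_ref *refs, unsigned long nr)
    {
        if (nr)
            percpu_ref_put_many(refs, nr);
    }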
-
Committed by Jens Axboe
to #26323588 commit c1ca757bd6f4632c510714631ddcc2d13030fe1e upstream. This adds support for doing madvise(2) through io_uring. We assume that any operation can block, and hence punt everything async. This could be improved, but it's hard to make bulletproof. The async punt ensures it's safe.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
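A liburing sketch (io_uring_prep_madvise() wraps IORING_OP_MADVISE; addr/len name a mapping the caller already owns):

    #include <liburing.h>
    #include <sys/mman.h>

    /* Hint that a mapped region won't be needed soon, without a
     * synchronous madvise(2) call. */
    int queue_madvise_dontneed(struct io_uring *ring, void *addr, size_t len)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -1;
        io_uring_prep_madvise(sqe, addr, len, MADV_DONTNEED);
        return io_uring_submit(ring);
    }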
-
Committed by Jens Axboe
to #26323588 commit 4840e418c2fc533d55ff6caa5b9313eed1d26cfd upstream. This adds support for doing fadvise through io_uring. We assume that WILLNEED doesn't block, but that DONTNEED may block.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
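The fadvise counterpart in liburing terms (POSIX_FADV_WILLNEED is the non-blocking case called out above):

    #include <liburing.h>
    #include <fcntl.h>

    /* Queue a readahead hint for a file range. */
    int queue_fadvise_willneed(struct io_uring *ring, int fd,
                               off_t off, off_t len)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -1;
        io_uring_prep_fadvise(sqe, fd, off, len, POSIX_FADV_WILLNEED);
        return io_uring_submit(ring);
    }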
-
Committed by Jens Axboe
to #26323588 commit ba04291eb66ed895f194ae5abd3748d72bf8aaea upstream. This behaves like preadv2/pwritev2 with offset == -1: it'll use (and update) the current file position. This obviously comes with the caveat that if the application has multiple reads/writes in flight, then the end result will not be as expected. This is similar to threads sharing a file descriptor and doing IO using the current file position. Since this feature isn't easily detectable by doing a read or write, add a feature flag, IORING_FEAT_RW_CUR_POS, to allow applications to detect presence of this feature.
Reported-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
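A sketch of the semantics: an offset of -1 selects "use and update the file position", like read(2). Applications can first check p.features (from io_uring_queue_init_params()) for IORING_FEAT_RW_CUR_POS before relying on it:

    #include <liburing.h>

    /* Read at the current file position, advancing it on completion. */
    int queue_read_curpos(struct io_uring *ring, int fd,
                          void *buf, unsigned len)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -1;
        io_uring_prep_read(sqe, fd, buf, len, (__u64)-1);
        return io_uring_submit(ring);
    }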
-
Committed by Jens Axboe
to #26323588 commit 3a6820f2bb8a079975109c25a5d1f29f46bce5d2 upstream. For use cases that don't already naturally have an iovec, it's easier (or more convenient) to just use a buffer address + length. This is particularly true if the use case is from languages that want to create a memory-safe abstraction on top of io_uring, and where introducing the need for an iovec may impose an ownership issue. For those cases, they currently need an indirection buffer, which means allocating data just for this purpose. Add basic read/write variants that don't require the iovec.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
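The non-vectored variants as exposed by liburing (io_uring_prep_read()/io_uring_prep_write() issue IORING_OP_READ/IORING_OP_WRITE with a plain buffer + length):

    #include <liburing.h>

    /* Write a buffer at offset 0, then read it back: no iovec. */
    int queue_plain_rw(struct io_uring *ring, int fd,
                       char *buf, unsigned len)
    {
        struct io_uring_sqe *sqe;

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_write(sqe, fd, buf, len, 0);
        /* link so the read only runs after the write completes */
        io_uring_sqe_set_flags(sqe, IOSQE_IO_LINK);

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_read(sqe, fd, buf, len, 0);

        return io_uring_submit(ring);
    }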
-
Committed by Jens Axboe
to #26323588 commit e94f141bd248ebdadcb7351f1e70b31cee5add53 upstream. For busy IORING_OP_POLL_ADD workloads, we can have enough contention on the completion lock that we fail the inline completion path quite often as we fail the trylock on that lock. Add a list for deferred completions that we can use in that case. This helps reduce the number of async offloads we have to do, as if we get multiple completions in a row, we'll piggyback on the poll_llist instead of having to queue our own offload.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
to #26323588 commit ad3eb2c89fb24d14ac81f43eff8e85fece2c934d upstream. We currently check ->cq_overflow_list from both SQ and CQ context, which causes some bouncing of that cache line. Add separate bits of state for this instead, so that the SQ side can check using its own state, and likewise for the CQ side. This adds ->sq_check_overflow with the SQ state, and ->cq_check_overflow with the CQ state. If we hit an overflow condition, both of these bits are set. Likewise for overflow flush clear, we clear both bits. For the fast path of just checking if there's an overflow condition on either the SQ or CQ side, we can use our own private bit for this.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
to #26323588 commit d3656344fea0339fb0365c8df4d2beba4e0089cd upstream. We currently have various switch statements that check if an opcode needs a file, mm, etc. These are hard to keep in sync as opcodes are added. Add a struct io_op_def that holds all of this information, so we have just one spot to update when opcodes are added. This also enables us to NOT allocate req->io if a deferred command doesn't need it, and corrects some mistakes we had in terms of what commands need mm context.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
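The shape of the table, as a hedged sketch (a few representative fields and one entry; the real struct io_op_def in fs/io_uring.c carries more bits):

    /* One definition per opcode: adding an opcode means filling in a
     * table entry instead of updating several switch statements. */
    struct io_op_def {
        unsigned needs_mm   : 1;  /* needs current->mm for async punt */
        unsigned needs_file : 1;  /* needs req->file assigned */
        unsigned async_ctx  : 1;  /* needs req->io when deferred */
    };

    static const struct io_op_def io_op_defs[] = {
        [IORING_OP_READV] = {
            .async_ctx  = 1,
            .needs_mm   = 1,
            .needs_file = 1,
        },
        /* ... one entry per IORING_OP_* ... */
    };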
-
Committed by Jens Axboe
to #26323588 commit add7b6b85a4dfa89283834d181e87ea2144b9028 upstream. __io_free_req() and io_double_put_req() aren't used before they are defined, so we can kill these two forward declarations.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov
to #26323588 commit 32fe525b6d10fec956cfe68f0db76839cd7f0ea5 upstream. Move io_queue_link_head() to the links handling code in io_submit_sqe(), so it wouldn't need extra checks and would have better data locality.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov
to #26323588 commit 9d76377f7e13c19441fdd066033345289f89b5fe upstream. Calling "prev" the head of a link is a bit misleading. Rename it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
to #26323588 commit ce35a47a3a0208a77b4d31b7f2e8ed57d624093d upstream. io_uring defaults to always doing inline submissions, if at all possible. But for larger copies, even if the data is fully cached, that can take a long time. Add an IOSQE_ASYNC flag that the application can set on the SQE - if set, it'll ensure that we always go async for those kinds of requests. Use the io-wq IO_WQ_WORK_CONCURRENT flag to ensure we get the concurrency we desire for this case.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
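Setting the flag from user space, sketched with liburing (io_uring_sqe_set_flags() just ORs into sqe->flags):

    #include <liburing.h>

    /* Force async punt even when the request could complete inline,
     * e.g. a large fully-cached read. */
    int queue_forced_async_read(struct io_uring *ring, int fd, void *buf,
                                unsigned len, __u64 off)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -1;
        io_uring_prep_read(sqe, fd, buf, len, off);
        io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);
        return io_uring_submit(ring);
    }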
-
Committed by Jens Axboe
to #26323588 commit eddc7ef52a6b37b7ba3d1c8a8fbb63d5d9914f8a upstream. This provides support for async statx(2) through io_uring.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
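A liburing sketch (io_uring_prep_statx() mirrors the statx(2) argument order; struct statx comes from <sys/stat.h> on glibc >= 2.28):

    #include <liburing.h>
    #include <sys/stat.h>
    #include <fcntl.h>

    /* Async statx(2): *stx is filled in by completion time,
     * cqe->res is 0 on success. */
    int queue_statx(struct io_uring *ring, const char *path,
                    struct statx *stx)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -1;
        io_uring_prep_statx(sqe, AT_FDCWD, path, 0,
                            STATX_BASIC_STATS, stx);
        return io_uring_submit(ring);
    }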
-
Committed by Jens Axboe
to #26323588 We currently fully quiesce the ring before an unregister or update of the fixed fileset. This is very expensive, and we can be a bit smarter about this. Add a percpu refcount for the file tables as a whole. Grab a percpu ref when we use a registered file, and put it on completion. This is cheap to do. Upon removal of a file from a set, switch the ref count to atomic mode. When we hit zero ref on the completion side, then we know we can drop the previously registered files. When the old files have been dropped, switch the ref back to percpu mode for normal operation. Since there's a period between doing the update and the kernel being done with it, add an IORING_OP_FILES_UPDATE opcode that can perform the same action. The application knows the update has completed when it gets the CQE for it. Between doing the update and receiving this completion, the application must continue to use the unregistered fd if submitting IO on this particular file. This takes the runtime of test/file-register from liburing from 14s to about 0.7s.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
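The update path from the application side, sketched with liburing (io_uring_register_files_update() performs the same action as the IORING_OP_FILES_UPDATE opcode; an fd of -1 clears a slot):

    #include <liburing.h>

    /* Swap the registered file in slot `off` without quiescing the
     * ring; returns the number of files updated, or -errno. */
    int swap_registered_file(struct io_uring *ring, unsigned off, int newfd)
    {
        int fds[1] = { newfd };

        return io_uring_register_files_update(ring, off, fds, 1);
    }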
-
Committed by Roman Gushchin
to #26323588 commit 214828962dead0c698f92b60ef97ce3c5fc2c8fe upstream. Percpu reference counters should now be initialized with the PERCPU_REF_ALLOW_REINIT flag in order to allow switching them back to percpu mode from atomic mode. This is exactly what percpu_ref_reinit() called from __io_uring_register() is supposed to do. So let's initialize the percpu refcounters with the PERCPU_REF_ALLOW_REINIT flag.
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
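The init-time change against the kernel percpu_ref API (the helper and release-callback names here are illustrative):

    #include <linux/percpu-refcount.h>

    /* Without PERCPU_REF_ALLOW_REINIT, a ref switched to atomic mode
     * cannot later be returned to percpu mode by percpu_ref_reinit(). */
    static int init_file_refs(struct percpu_ref *refs,
                              percpu_ref_func_t *release)
    {
        return percpu_ref_init(refs, release,
                               PERCPU_REF_ALLOW_REINIT, GFP_KERNEL);
    }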
-
Committed by Jens Axboe
to #26323588 commit b5dba59e0cf7e2cc4d3b3b1ac5fe81ddf21959eb upstream. This works just like close(2), unsurprisingly. We remove the file descriptor and post the completion inline, then offload the actual (potential) last file put to async context. Mark the async part of this work as uncancellable, as we really must guarantee that the latter part of the close is run.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
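A liburing sketch (io_uring_prep_close() issues IORING_OP_CLOSE; cqe->res carries what close(2) would have returned):

    #include <liburing.h>

    /* Async close(2): the fd is removed right away, and any final
     * file put runs from async context. */
    int queue_close(struct io_uring *ring, int fd)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -1;
        io_uring_prep_close(sqe, fd);
        return io_uring_submit(ring);
    }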
-
Committed by Jens Axboe
to #26323588 commit 0c9d5ccd26a004f59333c06fbbb98f9cb1eed93d upstream. Not all work can be cancelled; some of it we may need to guarantee runs to completion. Allow the caller to set IO_WQ_WORK_NO_CANCEL on work that must not be cancelled. Note that the caller work function must also check for IO_WQ_WORK_NO_CANCEL on work that is marked IO_WQ_WORK_CANCEL.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
to #26323588 commit 15b71abe7b52df214785dde0de9f581cc0216d17 upstream. This works just like openat(2), except it can be performed async. For the normal case of a non-blocking path lookup this will complete inline. If we have to do IO to perform the open, it'll be done from async context.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
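A liburing sketch (io_uring_prep_openat() mirrors openat(2); on success cqe->res is the new file descriptor):

    #include <liburing.h>
    #include <fcntl.h>

    /* Async openat(2): completes inline for a non-blocking path
     * lookup, otherwise from async context. */
    int queue_openat(struct io_uring *ring, const char *path)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -1;
        io_uring_prep_openat(sqe, AT_FDCWD, path, O_RDONLY, 0);
        return io_uring_submit(ring);
    }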
-
Committed by Jens Axboe
to #26323588 commit d63d1b5edb7b832210bfde587ba9e7549fa064eb upstream. This exposes fallocate(2) through io_uring.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
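A liburing sketch (io_uring_prep_fallocate() takes the same mode/offset/len arguments as fallocate(2)):

    #include <liburing.h>

    /* Async fallocate(2): preallocate len bytes at off; mode 0 is
     * the default allocation mode. */
    int queue_fallocate(struct io_uring *ring, int fd, off_t off, off_t len)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -1;
        io_uring_prep_fallocate(sqe, fd, 0, off, len);
        return io_uring_submit(ring);
    }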
-
- 27 May 2020 (7 commits)
-
-
Committed by Jens Axboe
to #26323578 commit ebe10026210f9ea740b9a050ee84a166690fddde upstream. If we're sharing the ring across forks, then one process exiting means that we cancel ALL work and prevent future work. This is overly restrictive. As long as we cancel the work associated with the files from the current task, it's safe to let others persist. Normal fd close on exit will still wait on (and cancel) pending work.
Fixes: fcb323cc53e2 ("io_uring: add support for async work inheriting files")
Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Eugene Syromiatnikov
to #26323578 commit 1292e972fff2b2d81e139e0c2fe5f50249e78c58 upstream. The fds field of struct io_uring_files_update is problematic with regard to compat user space, as pointer size differs between 32-bit, 32-on-64-bit, and 64-bit user space. In order to avoid custom compat handling in the syscall implementation, make fds a __u64 and use u64_to_user_ptr in order to retrieve it. Also, align the field naturally and check that no garbage is passed there.
Fixes: c3a31e605620c279 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
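The resulting uapi shape and kernel-side access, sketched (u64_to_user_ptr() is the standard kernel helper for pointers carried as __u64; the accessor function here is illustrative):

    #include <linux/kernel.h>   /* u64_to_user_ptr() */
    #include <linux/types.h>

    /* Pointer carried as a naturally aligned __u64, so 32-bit,
     * compat, and 64-bit user space all pass the same 8 bytes. */
    struct io_uring_files_update {
        __u32 offset;
        __u32 resv;          /* must be zero; checked for garbage */
        __aligned_u64 fds;   /* really a __s32 __user * */
    };

    /* Kernel side: recover the user pointer, no compat layer needed. */
    static __s32 __user *files_ptr(const struct io_uring_files_update *up)
    {
        return u64_to_user_ptr(up->fds);
    }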
-
Committed by Jens Axboe
to #26323578 commit 11ba820bf163e224bf5dd44e545a66a44a5b1d7a upstream. A previous commit moved the locking for the async sqthread, but didn't take into account that the io-wq workers still need it. We can't use req->in_async for this anymore, as both the sqthread and io-wq workers set it; gate the need for locking on io_wq_current_is_worker() instead.
Fixes: 8a4955ff1cca ("io_uring: sqthread should grab ctx->uring_lock for submissions")
Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Bijan Mottahedeh
to #26323578 commit 797f3f535d59f05ad12c629338beef6cb801d19e upstream. req->result is cleared when io_issue_sqe() calls the io_read/write_pre() routines. Those routines, however, are not called when the sqe argument is NULL, which is the case when io_issue_sqe() is called from io_wq_submit_work(). io_issue_sqe() may then examine a stale result if a polled request had previously failed with -EAGAIN:

    if (ctx->flags & IORING_SETUP_IOPOLL) {
        if (req->result == -EAGAIN)
            return -EAGAIN;
        io_iopoll_req_issued(req);
    }

and in turn cause a subsequently completed request to be re-issued in io_wq_submit_work().
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
to #26323578 commit 78912934f4f7dd7a424159c69bf9bdd46e823781 upstream. If we pass back dependent work in the case of links, we need to always ensure that we call the link setup and work prep handler. If not, we might be missing some setup for the next work item.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
to #26323578 commit 74566df3a71c1b92da608868cca787557d8be7b2 upstream. We don't need it, and if we have it, then the retry handler will attempt to copy the non-existent iovec with the inline iovec, with a segment count that doesn't make sense.
Fixes: f67676d160c6 ("io_uring: ensure async punted read/write requests copy iovec")
Reported-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
to #26323578 commit eacc6dfaea963ef61540abb31ad7829be5eff284 upstream. We currently punt any short read on a regular file to async context, but this fails if the short read is due to running into EOF. This is especially problematic since we only do the single prep for commands now, as we don't reset kiocb->ki_pos. This can result in a 4k read on a 1k file returning zero, as we detect the short read and then retry from async context. At the time of retry, the position is now 1k, and we end up reading nothing, and hence return 0. Instead of trying to patch around the fact that short reads can be legitimate and won't succeed in case of retry, remove the logic to punt a short read to async context. Simply return it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-