- 28 Jan 2020 (2 commits)
-
-
Committed by Pavel Begunkov
Draining the middle of a link is tricky, so leave a comment there. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
For the non-vectored variant of READV/WRITEV, we don't need to set up an async io context, and we flag that appropriately in the io_op_defs array. However, when this was fixed for the 5.5 kernel in commit 74566df3, these opcodes didn't exist yet, so the check there was added just for the READ_FIXED and WRITE_FIXED opcodes. Replace that check with a single check for needing async context, which covers all four of these read/write variants that don't use an iovec. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 23 Jan 2020 (2 commits)
-
-
Committed by Pavel Begunkov
REQ_F_FORCE_ASYNC is checked only for the head of a link. Fix it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
Whenever IOSQE_ASYNC is set, requests are punted to async without going through io_issue_req() and without proper preparation (e.g. io_req_defer_prep()), so they are left uninitialised. Prepare them before punting. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 Jan 2020 (36 commits)
-
-
Committed by Pavel Begunkov
Don't rely on the implicit ordering of IORING_OP_ values; explicitly place each entry at the right spot in io_op_defs. The former comments are now part of the code and can never go stale. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
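As a minimal illustration of the pattern this commit adopts (the opcode and field names below are simplified, not the kernel's actual ones), C designated array initializers pin each table entry to its opcode, so the table can't silently fall out of sync with the enum:

```c
#include <stdio.h>

enum { OP_NOP, OP_READV, OP_WRITEV, OP_LAST };

struct op_def {
    unsigned needs_file : 1;
    unsigned needs_mm   : 1;
};

/* Designated initializers: each entry names its index explicitly,
 * so reordering the enum cannot misalign the table. */
static const struct op_def op_defs[] = {
    [OP_NOP]    = { .needs_file = 0, .needs_mm = 0 },
    [OP_READV]  = { .needs_file = 1, .needs_mm = 1 },
    [OP_WRITEV] = { .needs_file = 1, .needs_mm = 1 },
};

int main(void)
{
    printf("READV needs_file=%d\n", op_defs[OP_READV].needs_file);
    return 0;
}
```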
-
Committed by Pavel Begunkov
For each IOSQE_* flag there is a corresponding REQ_F_* flag, and there is a repetitive pattern when translating between them, e.g.: if (sqe->flags & SQE_FLAG*) req->flags |= REQ_F_FLAG*. Use the same numeric values/bits for both and copy the flags directly instead of handling them one by one. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
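A standalone sketch of the trick, with made-up flag values rather than the kernel's actual bit assignments: give the user-visible and internal flags identical bit positions, guard that invariant at compile time, and replace the per-flag translation with one masked copy.

```c
#include <assert.h>
#include <stdint.h>

/* User-visible flags (illustrative values). */
#define SQE_FIXED_FILE  (1U << 0)
#define SQE_IO_DRAIN    (1U << 1)
#define SQE_IO_LINK     (1U << 2)

/* Internal flags deliberately use the same bit positions. */
#define REQ_F_FIXED_FILE (1U << 0)
#define REQ_F_IO_DRAIN   (1U << 1)
#define REQ_F_IO_LINK    (1U << 2)

/* Compile-time guard: if the values ever diverge, the build breaks. */
static_assert(SQE_FIXED_FILE == REQ_F_FIXED_FILE, "flag bits must match");
static_assert(SQE_IO_DRAIN == REQ_F_IO_DRAIN, "flag bits must match");
static_assert(SQE_IO_LINK == REQ_F_IO_LINK, "flag bits must match");

#define SQE_VALID_FLAGS (SQE_FIXED_FILE | SQE_IO_DRAIN | SQE_IO_LINK)

static inline uint32_t copy_flags(uint32_t sqe_flags)
{
    /* One masked copy instead of a chain of if/then translations. */
    return sqe_flags & SQE_VALID_FLAGS;
}
```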
-
Committed by Pavel Begunkov
A request can get into the defer list only once, so there is no need to mark it as drained; remove the marking. This was probably left over after extracting __need_defer() for use in timeouts. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
We currently flush early, but if we have something in progress and a new switch is scheduled, we need to ensure that we also flush after our teardown. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
req->ring_fd and req->ring_file are used only during the prep stage of submission, which is protected by a mutex. There is no need to store them per-request; place them in the ctx instead. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
__io_commit_cqring() is almost always called when there is a change in the rings, so the check is rather pessimising. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
Move the setting of ctx->drain_next to the only place it can be set: when a link gets non-head requests. Do the same for checking it, since it only matters for the head of a link or for a non-linked request. No functional changes here. This removes some code from the common path and also removes the REQ_F_DRAIN_LINK flag, which is no longer needed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
The application currently has no way of knowing whether a given opcode is supported short of trying to issue one and seeing if it gets -EINVAL. Even that approach is fraught with peril: maybe the -EINVAL is due to some fields being missing, or maybe that particular command just isn't easy to issue without some other setup legwork first. This adds IORING_REGISTER_PROBE, which fills in a structure with info on what is supported and what is not. It works even with sparse opcode fields, which may happen in the future, or even today if someone backports specific features to older kernels. Signed-off-by: Jens Axboe <axboe@kernel.dk>
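A userspace sketch of how an application might consume this, assuming a recent liburing that wraps IORING_REGISTER_PROBE with io_uring_get_probe() and friends (liburing itself is not part of this commit):

```c
#include <stdio.h>
#include <liburing.h>

/* Probe the kernel for supported opcodes before relying on them,
 * instead of issuing a request and guessing what -EINVAL meant. */
int main(void)
{
    struct io_uring_probe *probe = io_uring_get_probe();
    if (!probe) {
        fprintf(stderr, "probe not supported on this kernel\n");
        return 1;
    }

    if (io_uring_opcode_supported(probe, IORING_OP_OPENAT2))
        printf("IORING_OP_OPENAT2 is supported\n");
    else
        printf("IORING_OP_OPENAT2 is not supported\n");

    io_uring_free_probe(probe);
    return 0;
}
```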
-
Committed by Jens Axboe
We can't assume that the whole batch has fixed files in it. If it's a mix, or none at all, then we can end up doing a ref put that either messes up accounting or causes an oops if we have no fixed files at all. Also ensure we free requests properly for both inflight-accounted and normal requests. Fixes: 82c721577011 ("io_uring: extend batch freeing to cover more cases") Reported-by: Dmitrii Dolgov <9erthalion6@gmail.com> Reported-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Dmitrii Dolgov <9erthalion6@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
For some test apps, at least, user_data is all zeroes, so it's not a good way to tell what the command actually is. Add the opcode to the issue trace point. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
Add support for the new openat2(2) system call. It's trivial to do, as openat(2) can simply be wrapped around it. Suggested-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
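A hedged userspace sketch using liburing's io_uring_prep_openat2() wrapper (not part of this commit) to pass openat2-style resolve restrictions through the ring:

```c
#include <fcntl.h>
#include <string.h>
#include <liburing.h>
#include <linux/openat2.h>

/* Open a file through io_uring with openat2 semantics:
 * struct open_how carries flags, mode, and resolve restrictions. */
int open_beneath(struct io_uring *ring, int dirfd, const char *path)
{
    struct open_how how;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    int ret;

    memset(&how, 0, sizeof(how));
    how.flags = O_RDONLY;
    how.resolve = RESOLVE_BENEATH;   /* refuse to escape dirfd */

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_openat2(sqe, dirfd, path, &how);
    io_uring_submit(ring);

    ret = io_uring_wait_cqe(ring, &cqe);
    if (ret < 0)
        return ret;
    ret = cqe->res;                  /* fd on success, -errno on failure */
    io_uring_cqe_seen(ring, cqe);
    return ret;
}
```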
-
Committed by Jens Axboe
We only use it internally in the prep functions for both statx and openat, so we don't need it to persist across the request. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
We'll need this for openat2(2) support; remove flags and mode from the existing io_open struct. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
If an application is using eventfd notifications with poll to know when new SQEs can be issued, it expects the following reads/writes to complete inline. With that, it knows events are available and doesn't want spurious wakeups on the eventfd for those requests. This adds IORING_REGISTER_EVENTFD_ASYNC, which works just like IORING_REGISTER_EVENTFD except that it only triggers notifications for events that happen from async completions (IRQ, or io-wq worker completions). Completions that happen inline from the submission itself will not trigger notifications. Suggested-by: Mark Papadakis <markuspapadakis@icloud.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
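A small sketch of the intended usage, via liburing's wrapper for the new registration opcode (the helper name assumes a liburing version that has it):

```c
#include <unistd.h>
#include <sys/eventfd.h>
#include <liburing.h>

/* Register an eventfd that only fires for async completions, so
 * completions that happen inline at submission don't wake the poller. */
int setup_async_notifications(struct io_uring *ring)
{
    int efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    if (efd < 0)
        return -1;

    /* liburing wrapper around IORING_REGISTER_EVENTFD_ASYNC */
    if (io_uring_register_eventfd_async(ring, efd) < 0) {
        close(efd);
        return -1;
    }
    return efd;  /* poll/epoll this fd for async completion events */
}
```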
-
Committed by Jens Axboe
In preparation for adding another one, which would make us spill into another long (and hence bump the size of the ctx), change them to bit fields. Signed-off-by: Jens Axboe <axboe@kernel.dk>
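A generic illustration of the saving (the member names are made up, not the actual ctx fields): packing booleans into 1-bit fields keeps them in a single word instead of one int each.

```c
#include <stdio.h>

struct flags_wide {            /* one int per boolean */
    unsigned drain_next;
    unsigned eventfd_async;
    unsigned compat;
    unsigned account_mem;
};

struct flags_packed {          /* four 1-bit fields share one word */
    unsigned drain_next    : 1;
    unsigned eventfd_async : 1;
    unsigned compat        : 1;
    unsigned account_mem   : 1;
};

int main(void)
{
    printf("wide: %zu bytes, packed: %zu bytes\n",
           sizeof(struct flags_wide), sizeof(struct flags_packed));
    return 0;
}
```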
-
Committed by Jens Axboe
If an application attempts to register a set while unbounded requests are pending, we can be stuck here forever if they don't complete. Make this wait interruptible, and just abort if we get signaled. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by YueHaibing
A NULL check before kfree() is redundant, so remove it. This was detected by coccinelle. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
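The pattern in question: kfree(NULL) is defined to be a no-op, so the guard adds nothing.

```c
/* Redundant: kfree() already tolerates NULL. */
if (ptr)
    kfree(ptr);

/* Equivalent and simpler: */
kfree(ptr);
```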
-
Committed by Jens Axboe
This adds IORING_OP_SEND for send(2) support and IORING_OP_RECV for recv(2) support. Signed-off-by: Jens Axboe <axboe@kernel.dk>
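A minimal sketch using liburing's wrappers for the new opcodes (liburing is not part of this commit); unlike SENDMSG/RECVMSG, these take a plain buffer and flags, no msghdr:

```c
#include <liburing.h>

/* Queue a send and a recv on a connected socket in one submission. */
int queue_echo(struct io_uring *ring, int sockfd,
               const char *out, size_t outlen, char *in, size_t inlen)
{
    struct io_uring_sqe *sqe;

    sqe = io_uring_get_sqe(ring);
    if (!sqe)
        return -1;
    io_uring_prep_send(sqe, sockfd, out, outlen, 0);

    sqe = io_uring_get_sqe(ring);
    if (!sqe)
        return -1;
    io_uring_prep_recv(sqe, sockfd, in, inlen, 0);

    return io_uring_submit(ring);
}
```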
-
Committed by Pavel Begunkov
io_wq workers use io_issue_sqe() to forward sqes, never io_queue_sqe(). Remove the extra io_wq_current_is_worker() check. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
It should be pretty rare not to submit anything when there is something in the ring, so there is no need to keep a heuristic for this case. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
A user may ask to submit more than what is in the ring, in which case io_uring submits as much as it can. However, in the last iteration it will allocate an io_kiocb and immediately free it. It can do better: adjust @to_submit to what is actually in the ring. And since the ring's head is already checked here, there is no need to check it again in the loop, spamming smp_load_acquire() barriers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
Make io_submit_sqes() clamp @to_submit itself. This removes duplicated code and prepares for the following changes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
Some applications like to start small in terms of ring size and then ramp up as needed. This is a bit tricky to do currently, since we don't advertise the max ring size. This adds IORING_SETUP_CLAMP. If set, and the requested SQ or CQ ring size exceeds what we support, we clamp it at the max value instead of returning -EINVAL. Since we return the chosen ring sizes after setup, no further changes are needed on the application side; io_uring already changes the ring sizes if the application doesn't ask for power-of-two sizes, for example. Signed-off-by: Jens Axboe <axboe@kernel.dk>
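A userspace sketch of the flag in action, using liburing's io_uring_queue_init_params() (the oversized entry count is arbitrary):

```c
#include <stdio.h>
#include <string.h>
#include <liburing.h>

/* Ask for an oversized ring; with IORING_SETUP_CLAMP the kernel
 * clamps to its maximum instead of failing with -EINVAL, and reports
 * the sizes it actually chose back in the params struct. */
int main(void)
{
    struct io_uring ring;
    struct io_uring_params p;

    memset(&p, 0, sizeof(p));
    p.flags = IORING_SETUP_CLAMP;

    if (io_uring_queue_init_params(1 << 20, &ring, &p) < 0)
        return 1;

    printf("got sq_entries=%u cq_entries=%u\n",
           p.sq_entries, p.cq_entries);

    io_uring_queue_exit(&ring);
    return 0;
}
```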
-
Committed by Jens Axboe
Currently we only batch-free if fixed files are used and there are no links, no aux data, etc. This extends the batch freeing to exclude only the linked case and the fallback case, and makes io_free_req_many() handle the other cases just fine. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
This cleans up the code a bit, and allows us to build on top of the multi-request freeing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
percpu_ref_tryget() has its own overhead. Instead of getting a reference for each request, grab a bunch once per io_submit_sqes(). This gives a ~5% throughput boost on a "submit and wait 128 nops" benchmark. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> [__io_req_free_empty() -> __io_req_do_free()] Signed-off-by: Jens Axboe <axboe@kernel.dk>
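A loose kernel-style sketch of the batching pattern; submit_one() is hypothetical and stands in for the real per-request submission loop:

```c
#include <linux/percpu-refcount.h>

/* Hypothetical per-request submit helper, standing in for the body
 * of the real submission loop. */
extern bool submit_one(void);

/* Take refs for the whole batch up front with one
 * percpu_ref_tryget_many(), then hand back whatever went unused,
 * instead of doing one percpu_ref_tryget() per request. */
static int submit_batch(struct percpu_ref *refs, unsigned int nr)
{
    unsigned int done = 0;

    if (!percpu_ref_tryget_many(refs, nr))
        return -EAGAIN;

    while (done < nr && submit_one())
        done++;

    /* Return the refs we took but didn't use. */
    if (done != nr)
        percpu_ref_put_many(refs, nr - done);

    return done;
}
```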
-
Committed by Pavel Begunkov
Add percpu_ref_tryget_many(), which works the same way as percpu_ref_tryget() but grabs a specified number of refs. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Dennis Zhou <dennis@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
This adds support for doing madvise(2) through io_uring. We assume that any operation can block, and hence punt everything async. This could be improved, but it's hard to make bulletproof; the async punt ensures it's safe. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
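A hedged userspace sketch via liburing's io_uring_prep_madvise() wrapper (not part of this commit):

```c
#include <sys/mman.h>
#include <liburing.h>

/* Queue an madvise(2) through io_uring instead of calling it
 * synchronously; the kernel may punt it to an async worker. */
int queue_dontneed(struct io_uring *ring, void *addr, size_t len)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    if (!sqe)
        return -1;
    io_uring_prep_madvise(sqe, addr, len, MADV_DONTNEED);
    return io_uring_submit(ring);
}
```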
-
Committed by Jens Axboe
This is in preparation for enabling this functionality through io_uring. Add a helper that just exports what sys_madvise() does, and have the system call use it. No functional changes in this patch. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
This adds support for doing fadvise through io_uring. We assume that WILLNEED doesn't block, but that DONTNEED may block. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
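The matching fadvise sketch, again via liburing's wrapper (not part of this commit); per the commit message, POSIX_FADV_WILLNEED is expected to complete inline:

```c
#include <fcntl.h>
#include <liburing.h>

/* Hint readahead for a file region through the ring. */
int queue_willneed(struct io_uring *ring, int fd, __u64 off, off_t len)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    if (!sqe)
        return -1;
    io_uring_prep_fadvise(sqe, fd, off, len, POSIX_FADV_WILLNEED);
    return io_uring_submit(ring);
}
```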
-
Committed by Jens Axboe
This behaves like preadv2/pwritev2 with offset == -1: it will use (and update) the current file position. This obviously comes with the caveat that if the application has multiple reads/writes in flight, the end result will not be as expected; this is similar to threads sharing a file descriptor and doing IO using the current file position. Since this feature isn't easily detectable by doing a read or write, add a feature flag, IORING_FEAT_RW_CUR_POS, to allow applications to detect its presence. Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
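A sketch of detecting and using the feature; p must be the io_uring_params struct the kernel filled in at setup time:

```c
#include <liburing.h>

/* Read at the current file position (offset -1), but only after
 * confirming the kernel advertises IORING_FEAT_RW_CUR_POS. */
int read_at_cur_pos(struct io_uring *ring, struct io_uring_params *p,
                    int fd, void *buf, unsigned len)
{
    struct io_uring_sqe *sqe;

    if (!(p->features & IORING_FEAT_RW_CUR_POS))
        return -1;   /* kernel too old for offset == -1 */

    sqe = io_uring_get_sqe(ring);
    if (!sqe)
        return -1;
    io_uring_prep_read(sqe, fd, buf, len, (__u64) -1);
    return io_uring_submit(ring);
}
```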
-
Committed by Jens Axboe
For use cases that don't already naturally have an iovec, it's easier (or more convenient) to just use a buffer address + length. This is particularly true for languages that want to create a memory-safe abstraction on top of io_uring, where introducing the iovec may impose an ownership issue. Those cases currently need an indirection buffer, which means allocating data just for this purpose. Add basic read/write variants that don't require the iovec. Signed-off-by: Jens Axboe <axboe@kernel.dk>
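A side-by-side sketch of the two submission styles using liburing helpers; the non-vectored variant needs no iovec to allocate or keep alive:

```c
#include <sys/uio.h>
#include <liburing.h>

/* Vectored vs non-vectored read of the same buffer. */
void prep_both(struct io_uring_sqe *sqe_v, struct io_uring_sqe *sqe_b,
               int fd, char *buf, unsigned len, struct iovec *iov)
{
    /* IORING_OP_READV: the iovec must stay valid until completion. */
    iov->iov_base = buf;
    iov->iov_len = len;
    io_uring_prep_readv(sqe_v, fd, iov, 1, 0);

    /* IORING_OP_READ: buffer address + length, no intermediate iovec. */
    io_uring_prep_read(sqe_b, fd, buf, len, 0);
}
```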
-
Committed by Jens Axboe
For busy IORING_OP_POLL_ADD workloads, we can have enough contention on the completion lock that the inline completion path fails quite often, because the trylock on that lock fails. Add a list for deferred completions that we can use in that case. This helps reduce the number of async offloads we have to do: if we get multiple completions in a row, we piggyback onto the poll_llist instead of having to queue our own offload. Signed-off-by: Jens Axboe <axboe@kernel.dk>
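A simplified sketch of the trylock-or-defer pattern using the kernel's lockless llist (names are illustrative; the real code must also re-check the list after dropping the lock to avoid stranding late arrivals):

```c
#include <linux/llist.h>
#include <linux/spinlock.h>

/* Hypothetical handler for our own completion plus any deferred ones. */
extern void process_completions(struct llist_node *own,
                                struct llist_node *deferred);

/* If the completion lock is contended, park the completion on a
 * lockless list; whoever does get the lock splices the list and
 * processes everything in one go. */
static void complete_or_defer(spinlock_t *lock,
                              struct llist_head *deferred,
                              struct llist_node *node)
{
    if (spin_trylock(lock)) {
        /* Lock owner: grab everything parked so far as well. */
        struct llist_node *list = llist_del_all(deferred);

        process_completions(node, list);
        spin_unlock(lock);
    } else {
        /* Contended: defer to the current lock holder. */
        llist_add(node, deferred);
    }
}
```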
-
Committed by Jens Axboe
We currently check ->cq_overflow_list from both SQ and CQ context, which causes some bouncing of that cache line. Add separate bits of state for this instead, so that the SQ side can check using its own state, and likewise for the CQ side. This adds ->sq_check_overflow to the SQ state and ->cq_check_overflow to the CQ state. If we hit an overflow condition, both of these bits are set; likewise, on overflow flush clear, we clear both bits. For the fast path of just checking whether there's an overflow condition on either the SQ or CQ side, each can use its own private bit. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
We currently have various switch statements that check whether an opcode needs a file, mm, etc. These are hard to keep in sync as opcodes are added. Add a struct io_op_def that holds all of this information, so we have just one spot to update when opcodes are added. This also enables us to NOT allocate req->io if a deferred command doesn't need it, and corrects some mistakes we had in terms of which commands need mm context. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
__io_free_req() and io_double_put_req() aren't used before they are defined, so we can kill these two forward declarations. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-