Commits · b1d8f47246e510a602adee80358319695e68bee2 · openanolis / cloud-kernel

27 May, 2020 · 40 commits

io_uring: add mapping support for NOMMU archs · b1d8f472

Authored by Roman Penyaev on Nov 28, 2019

to #26323578

commit 6c5c240e412682f97aecd233c1e706822704aa28 upstream.

This is a somewhat unusual scenario, but I find it interesting to run fio
loads using LKL Linux, where the MMU is disabled.  Other real archs that
run uClinux can probably also benefit from this patch.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: make poll->wait dynamically allocated · bf7088f1

Authored by Jens Axboe on Nov 26, 2019

to #26323578

commit e944475e69849273ca8f1fe04a3ce81b5901d165 upstream.

In the quest to bring io_kiocb down to 3 cachelines, this one does
the trick. Make the wait_queue_entry for the poll command come out
of kmalloc instead of embedding it in struct io_poll_iocb, as the
latter is the largest member of io_kiocb. Once we trim this down a
bit, we're back at a healthy 192 bytes for struct io_kiocb.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io-wq: shrink io_wq_work a bit · e7680975

Authored by Jens Axboe on Nov 26, 2019

to #26323578

commit 6206f0e180d4eddc0a178f57120ab1b913701f6e upstream.

Currently we're using 40 bytes for the io_wq_work structure, and 16 of
those are the doubly linked list node. We don't need doubly linked lists,
we always add to tail to keep things ordered, and any other use case
is list traversal with deletion. For the deletion case, we can easily
support any node deletion by keeping track of the previous entry.

This shrinks io_wq_work to 32 bytes, and subsequently io_kiocb in
io_uring from 216 to 208 bytes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
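
The list scheme described in this commit can be illustrated with a small userspace model. This is only a sketch in Python, not the kernel code: the class and method names (`WorkNode`, `WorkList`, `add_tail`, `remove_if`) are hypothetical. It shows why a single `next` pointer is enough when items are only appended at the tail and deletion happens during a traversal that remembers the previous node.

```python
class WorkNode:
    """List node with only a 'next' link (no back pointer)."""

    def __init__(self, name):
        self.name = name
        self.next = None


class WorkList:
    """Singly linked list that tracks head and tail (hypothetical model)."""

    def __init__(self):
        self.head = None
        self.tail = None

    def add_tail(self, node):
        # Appending at the tail keeps items ordered without a 'prev' link.
        node.next = None
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def remove_if(self, match):
        # Walk the list while remembering the previous node, so that any
        # matching node can be unlinked without a back pointer.
        prev, node = None, self.head
        while node is not None:
            if match(node):
                if prev is None:
                    self.head = node.next
                else:
                    prev.next = node.next
                if node is self.tail:
                    self.tail = prev
                node.next = None
                return node
            prev, node = node, node.next
        return None

    def names(self):
        # Collect node names in list order, for inspection.
        out, node = [], self.head
        while node is not None:
            out.append(node.name)
            node = node.next
        return out
```

On a 64-bit system, dropping one of the two 8-byte pointers is consistent with the sizes quoted in the commit message: 40 - 16 + 8 = 32 bytes.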

io-wq: fix handling of NUMA node IDs · 4e2bc1f5

Authored by Jann Horn on Nov 26, 2019

to #26323578

commit 3fc50ab559f5ae400aa33bd0836b3602da7fa51b upstream.

There are several things that can go wrong in the current code on NUMA
systems, especially if not all nodes are online all the time:

 - If the identifiers of the online nodes do not form a single contiguous
   block starting at zero, wq->wqes will be too small, and OOB memory
   accesses will occur e.g. in the loop in io_wq_create().
 - If a node comes online between the call to num_online_nodes() and the
   for_each_node() loop in io_wq_create(), an OOB write will occur.
 - If a node comes online between io_wq_create() and io_wq_enqueue(), a
   lookup is performed for an element that doesn't exist, and an OOB read
   will probably occur.

Fix it by:

 - using nr_node_ids instead of num_online_nodes() for the allocation size;
   nr_node_ids is calculated by setup_nr_node_ids() to be bigger than the
   highest node ID that could possibly come online at some point, even if
   those nodes' identifiers are not a contiguous block
 - creating workers for all possible CPUs, not just all online ones

This is basically what the normal workqueue code also does, as far as I can
tell.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
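
The sizing problem in this commit comes down to indexing an array by node ID when the array was sized by a node count. A minimal Python model, assuming a hypothetical machine where nodes 0, 2 and 3 are possible but only 0 and 2 are online, makes the arithmetic concrete. The function names are illustrative and only mimic what `num_online_nodes()` and `nr_node_ids` represent.

```python
def alloc_size_online(online_nodes):
    # Old sizing: the number of online nodes (models num_online_nodes()).
    return len(online_nodes)


def alloc_size_possible(possible_nodes):
    # New sizing: one more than the highest possible node ID
    # (models nr_node_ids).
    return max(possible_nodes) + 1


def out_of_bounds_ids(size, node_ids):
    # Node IDs that would index past the end of an array of 'size' slots.
    return sorted(n for n in node_ids if n >= size)
```

With the old sizing the array has 2 slots, so node ID 2 is already out of bounds; with the new sizing it has 4 slots and every possible ID fits.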

io_uring: use kzalloc instead of kcalloc for single-element allocations · b6fcf21d

Authored by Jann Horn on Nov 26, 2019

to #26323578

commit ad6e005ca68de7af76f9ed3e4c9b6f0aa2f842e3 upstream.

These allocations are single-element allocations, so don't use the array
allocation wrapper for them.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: cleanup io_import_fixed() · e7436342

Authored by Pavel Begunkov on Nov 25, 2019

to #26323578

commit 7d009165550adc64e3561c65ecce564125052e00 upstream.

Clean io_import_fixed() call site and make it return proper type.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: inline struct sqe_submit · 505210f7

Authored by Pavel Begunkov on Nov 25, 2019

to #26323578

commit cf6fd4bd559ee61a4454b161863c8de6f30f8dca upstream.

There is no point left in keeping struct sqe_submit. Inline it
into struct io_kiocb, so any req->submit.field is now just req->field

- moves initialisation of ring_file into io_get_req()
- removes duplicated req->sequence.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: store timeout's sqe->off in proper place · d18da430

Authored by Pavel Begunkov on Nov 25, 2019

to #26323578

commit cc42e0ac17d3664a70e020dfe7897f14e7aa7453 upstream.

Timeouts' sequence offset (i.e. sqe->off) is stored in
req->submit.sequence under a false name. Keep it in timeout.data
instead. The unused space for sequence will be reclaimed in the
following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: remove superfluous check for sqe->off in io_accept() · e8da7bb3

Authored by Hrvoje Zeba on Nov 25, 2019

to #26323578

commit 8042d6ce8c40df0abb0d91662a754d074a3d3f16 upstream.

This field contains a pointer to addrlen and checking to see if it's set
returns -EINVAL if the caller sets addr & addrlen pointers.

Fixes: 17f2fe35d080 ("io_uring: add support for IORING_OP_ACCEPT")
Signed-off-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: fix dead-hung for non-iter fixed rw · aa572ac1

Authored by Pavel Begunkov on Nov 24, 2019

to #26323578

commit 311ae9e159d81a1ec1cf645daf40b39ae5a0bd84 upstream.

Read/write requests using fixed buffers to devices that do not implement
read/write_iter can cause a general protection fault, which completely
hangs the machine.

io_import_fixed() initialises iov_iter with bvec, but loop_rw_iter()
accesses it as iovec, dereferencing a random address.

Fix this by kmap()'ing page by page in this case.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: add support for IORING_OP_CONNECT · 78e9fdaa

Authored by Jens Axboe on Nov 23, 2019

to #26323578

commit f8e85cf255ad57d65eeb9a9d0e59e3dec55bdd9e upstream.

This allows an application to call connect() in an async fashion. Like
other opcodes, we first try a non-blocking connect, then punt to async
context if we have to.

Note that we can still return -EINPROGRESS, and in that case the caller
should use IORING_OP_POLL_ADD to do an async wait for completion of the
connect request (just like for regular connect(2), except we can do it
async here too).

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
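
The commit message compares the -EINPROGRESS case to regular connect(2). As a point of reference, the classic non-blocking pattern (start the connect, wait for writability, then read `SO_ERROR`) can be sketched in Python with the standard `socket` and `select` modules. This shows only the ordinary socket API, not io_uring itself; the helper names `start_listener` and `nonblocking_connect` are made up for the example, and the loopback listener exists purely so the sketch has something to connect to.

```python
import errno
import select
import socket


def start_listener():
    # Loopback listener on an ephemeral port, only for the demonstration.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    return srv


def nonblocking_connect(addr, timeout=5.0):
    """Start a non-blocking connect and wait for its final status.

    Returns (sock, err), where err is 0 on success or an errno value.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    err = sock.connect_ex(addr)
    if err == errno.EINPROGRESS:
        # The connect is still in flight: wait until the socket becomes
        # writable, then read the outcome from SO_ERROR.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return sock, errno.ETIMEDOUT
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    return sock, err
```

With io_uring, the waiting step would be expressed as an IORING_OP_POLL_ADD request instead of a `select()` call, as the commit message notes.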

net: add __sys_connect_file() helper · 351e775a

Authored by Jens Axboe on Nov 23, 2019

to #26323578

commit bd3ded3146daa2cbb57ed353749ef99cf75371b0 upstream.

This is identical to __sys_connect(), except it takes a struct file
instead of an fd, and it also allows passing in extra file->f_flags
flags. The latter is done to support masking in O_NONBLOCK without
manipulating the original file flags.

No functional changes in this patch.

Cc: netdev@vger.kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: only !null ptr to io_issue_sqe() · 420ea251

Authored by Pavel Begunkov on Nov 21, 2019

to #26323578

commit f9bd67f69af56d712bfd498f5ad9cf7bb177d600 upstream.

Pass only non-null @nxt to io_issue_sqe() and handle it at the caller's
side. And propagate it.

- kiocb_done() is only called from io_read() and io_write(), which are
only called from io_issue_sqe(), so it's @nxt != NULL

- io_put_req_find_next() is called either with explicitly non-null local
nxt, or from one of the functions in io_issue_sqe() switch (or their
callees).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: simplify io_req_link_next() · 543fc5fe

Authored by Pavel Begunkov on Nov 21, 2019

to #26323578

commit b18fdf71e01fba29a804d63f8c1e2ed61011170d upstream.

"if (nxt)" is always true, as it was checked in the while's condition.
io_wq_current_is_worker() is unnecessary, as non-async callers don't
pass nxt, so io_queue_async_work() will be called for them anyway.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: pass only !null to io_req_find_next() · 5ecc65f5

Authored by Pavel Begunkov on Nov 21, 2019

to #26323578

commit 944e58bfeda0e9b97cd611adafc823c78e0bc464 upstream.

Make io_req_find_next() and io_req_link_next() accept only non-null
nxt, and handle it in callers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: remove io_free_req_find_next() · b0a3bc34

Authored by Pavel Begunkov on Nov 21, 2019

to #26323578

commit 70cf9f3270a5c5148e93a526dc1e51965259e70c upstream.

There is only one one-liner user of io_free_req_find_next(). Inline it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: add likely/unlikely in io_get_sqring() · e492d6d5

Authored by Pavel Begunkov on Nov 21, 2019

to #26323578

commit 9835d6fafba58e6d9386a6d5af800789bdb52e5b upstream.

The number of SQEs to submit is specified by a user, so io_get_sqring()
in most of the cases succeeds. Hint compilers about that.

Checking ASM generated by gcc 9.2.0 for x64, there is one branch
misprediction.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: rename __io_submit_sqe() · 9811e2ae

Authored by Pavel Begunkov on Nov 21, 2019

to #26323578

commit d732447fed7d6b4c22907f630cd25d574bae5276 upstream.

__io_submit_sqe() is issuing requests, so call it as
such. Moreover, it ends by calling io_iopoll_req_issued().

Rename it and make terminology clearer.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: improve trace_io_uring_defer() trace point · 55ce4ccf

Authored by Jens Axboe on Nov 21, 2019

to #26323578

commit 915967f69c591b34c5a18d6618af021a81ffd700 upstream.

We don't have shadow requests anymore, so get rid of the shadow
argument. Add the user_data argument, as that's often useful to easily
match up requests, instead of having to look at request pointers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: drain next sqe instead of shadowing · 007d7c0a

Authored by Pavel Begunkov on Nov 21, 2019

to #26323578

commit 1b4a51b6d03d21f55effbcf609ba5526d87d9e9d upstream.

There's an issue with the shadow drain logic in that we drop the
completion lock after deciding to defer a request, then re-grab it later
and assume that the state is still the same. In the meantime, someone
else completing a request could have found and issued it. This can cause
a stall in the queue, by having a shadow request inserted that nobody is
going to drain.

Additionally, if we fail allocating the shadow request, we simply ignore
the drain.

Instead of using a shadow request, defer the next request/link instead.
This also has the following advantages:

- removes semi-duplicated code
- doesn't allocate memory for shadows
- works better if only the head is marked for drain
- doesn't need complex synchronisation

On the flip side, it removes the shadow->seq ==
last_drain_in_in_link->seq optimization. That shouldn't be a common
case, and can always be added back, if needed.

Fixes: 4fe2c963154c ("io_uring: add support for link with drain")
Cc: Jackie Liu <liuyun01@kylinos.cn>
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: close lookup gap for dependent next work · f4e3d2b8

Authored by Jens Axboe on Nov 20, 2019

to #26323578

commit b76da70fc3759df13e0991706451f1a2e06ba19e upstream.

When we find new work to process within the work handler, we queue the
linked timeout before we have issued the new work. This can be
problematic for very short timeouts, as we have a window where the new
work isn't visible.

Allow the work handler to store a callback function for this in the work
item, and flag it with IO_WQ_WORK_CB if the caller has done so. If that
is set, then io-wq will call the callback when it has setup the new work
item.

Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: allow finding next link independent of req reference count · d98ccaf3

Authored by Jens Axboe on Nov 20, 2019

to #26323578

commit 4d7dd462971405c65bfb3821dbb6b9ce13b5e8d6 upstream.

We currently try and start the next link when we put the request, and
only if we were going to free it. This means that the optimization to
continue executing requests from the same context often fails, as we're
not putting the final reference.

Add REQ_F_LINK_NEXT to keep track of this, and allow io_uring to find the
next request more efficiently.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: io_allocate_scq_urings() should return a sane state · 4e56986f

Authored by Jens Axboe on Nov 20, 2019

to #26323578

commit eb065d301e8c83643367bdb0898becc364046bda upstream.

We currently rely on ring destruction to clean things up in case of
failure, but io_allocate_scq_urings() can leave things half initialized
if only part of it fails.

Be nice and either return success with everything set up, or return an
error with things nicely cleaned up.

Reported-by: syzbot+0d818c0d39399188f393@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
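
The "everything or nothing" contract described in this commit is a common setup pattern. The following Python sketch models it under stated assumptions: `ctx` is a plain dict, `alloc` and `free` are caller-supplied stand-ins for the allocator, and the slot names `rings` and `sq_sqes` are used only for illustration. It is not the kernel function.

```python
import errno


def allocate_rings(ctx, alloc, free):
    """Set up both resources or neither (hypothetical model).

    'alloc(name)' returns a resource or None on failure, mirroring an
    allocator that returns NULL; 'free(res)' releases one resource.
    Returns 0 on success or a negative errno value on failure.
    """
    rings = alloc("rings")
    if rings is None:
        return -errno.ENOMEM

    sqes = alloc("sq_sqes")
    if sqes is None:
        # The second step failed: undo the first one so the caller never
        # sees a half-initialized context.
        free(rings)
        return -errno.ENOMEM

    # Publish into ctx only once every step has succeeded.
    ctx["rings"] = rings
    ctx["sq_sqes"] = sqes
    return 0
```

Publishing into `ctx` only after every allocation has succeeded is one simple way to guarantee the caller sees either a fully initialized context or an untouched one.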

io_uring: Always REQ_F_FREE_SQE for allocated sqe · f9d30ea7

Authored by Pavel Begunkov on Nov 19, 2019

to #26323578

commit bbad27b2f622fa26d107f8a72c0cd5cc102dc56e upstream.

Always mark requests with allocated sqe and deallocate it in
__io_free_req(). It's easier to follow and doesn't add edge cases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: io_fail_links() should only consider first linked timeout · 5234d231

Authored by Jens Axboe on Nov 19, 2019

to #26323578

commit 5d960724b0cb0d12469d1c62912e4a8c09c9fd92 upstream.

We currently clear the linked timeout field if we cancel such a timeout,
but we should only attempt to cancel if it's the first one we see.
Others should simply be freed like other requests, as they haven't
been started yet.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: Fix leaking linked timeouts · 76e1b540

Authored by Pavel Begunkov on Nov 19, 2019

to #26323578

commit 09fbb0a83ec6ab5a4037766261c031151985fff6 upstream.

Let's have a dependent link: REQ -> LINK_TIMEOUT -> LINK_TIMEOUT

1. submission stage: submission references for REQ and LINK_TIMEOUT
are dropped. So, references respectively (1,1,2)

2. io_put(REQ) + FAIL_LINKS stage: calls io_fail_links(), which for all
linked timeouts will call cancel_timeout() and drop 1 reference.
So, references after: (0,0,1). That's a leak.

Make it treat only the first linked timeout as such, and pass others
through __io_double_put_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
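
The reference arithmetic in this commit message can be replayed with plain integers. The sketch below is a deliberately simplified Python model (the function names are hypothetical, and real requests are not just counters): it starts from the (1,1,2) state given in the message and applies the old and the fixed behaviour of the link-failure step.

```python
def fail_links_old(refs):
    """Old behaviour: every linked timeout is treated as a started timeout
    and has exactly one reference dropped."""
    req, *timeouts = refs
    return [req - 1] + [t - 1 for t in timeouts]


def fail_links_fixed(refs):
    """Fixed behaviour: only the first linked timeout is cancelled (one
    reference dropped); later ones go through a double put (two dropped)."""
    req, first, *rest = refs
    return [req - 1, first - 1] + [t - 2 for t in rest]
```

Starting from `[1, 1, 2]`, the old behaviour ends at `[0, 0, 1]`, which is the leak described above, while the fixed behaviour ends at `[0, 0, 0]`.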

io_uring: remove redundant check · 965269a5

Authored by Pavel Begunkov on Nov 19, 2019

to #26323578

commit f70193d6d8cad4cc614223fef349e6ea9d48c61f upstream.

Pass any IORING_OP_LINK_TIMEOUT request further, where it will
eventually fail in io_issue_sqe().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: break links for failed defer · a162e056

Authored by Pavel Begunkov on Nov 19, 2019

to #26323578

commit d3b35796b1e3f118017491d621f624e0de7ff9fb upstream.

If io_req_defer() fails, it needs to cancel a dependent link.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io-wq: remove extra space characters · 76333268

Authored by Dan Carpenter on Nov 19, 2019

to #26323578

commit b2e9c7d64b7ecacc1d0f15a6af88a73cab7d8db9 upstream.

These lines are indented an extra space character.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: request cancellations should break links · 653a3c13

Authored by Jens Axboe on Nov 18, 2019

to #26323578

commit fba38c272a0385148935d6443cb9dc68cf1f37a7 upstream.

We currently don't explicitly break links if a request is cancelled, but
we should. Add explicit link breakage for all types of request
cancellations that we support.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: correct poll cancel and linked timeout expiration completion · 95cecacf

Authored by Jens Axboe on Nov 18, 2019

to #26323578

commit b0dd8a412699afe3420a08f841333f3474ad45c5 upstream.

Currently a poll request fills a completion entry of 0, even if it got
cancelled. This is odd, and it makes it harder to support with chains.
Ensure that it returns -ECANCELED in the completion event if it got
cancelled, and furthermore ensure that the linked timeout that triggered
it completes with -ETIME if we did indeed trigger the completion
through a timeout.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
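
From an application's point of view, the change means the completion result now distinguishes the two outcomes. A minimal sketch of how a consumer might classify a completion's result value is shown below, in Python and using the standard `errno` constants. The function name `describe_cqe_res` is hypothetical, and treating every non-negative value as a normal completion is an assumption of this sketch.

```python
import errno


def describe_cqe_res(res):
    """Classify a completion result value (hypothetical helper)."""
    if res == -errno.ECANCELED:
        # The request was cancelled, for example by a linked timeout.
        return "cancelled"
    if res == -errno.ETIME:
        # A timeout request that actually expired.
        return "timer expired"
    if res < 0:
        # Any other negative value is reported by its errno name.
        return "error " + errno.errorcode.get(-res, str(-res))
    return "completed"
```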

io_uring: remove dead REQ_F_SEQ_PREV flag · 468927d3

Authored by Jens Axboe on Nov 15, 2019

to #26323578

commit e0e328c4b330712e45ba799dc589bda751323110 upstream.

With the conversion to io-wq, we no longer use that flag. Kill it.

Fixes: 561fb04a6a22 ("io_uring: replace workqueue usage with io-wq")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: fix sequencing issues with linked timeouts · 9f421418

Authored by Jens Axboe on Nov 14, 2019

to #26323578

commit 94ae5e77a9150a8c6c57432e2db290c6868ddfad upstream.

We have an issue with timeout links that are deeper in the submit chain,
because we only handle it upfront, not from later submissions. Move the
prep + issue of the timeout link to the async work prep handler, and do
it normally for non-async queue. If we validate and prepare the timeout
links upfront when we first see them, there's nothing stopping us from
supporting any sort of nesting.

Fixes: 2665abfd757f ("io_uring: add support for linked SQE timeouts")
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: make req->timeout be dynamically allocated · 2daf4b5c

Authored by Jens Axboe on Nov 15, 2019

to #26323578

commit ad8a48acc23cb13cbf4332ebabb867b1baa81842 upstream.

There are a few reasons for this:

- As a prep to improving the linked timeout logic
- io_timeout is the biggest member in the io_kiocb opcode union

This also enables a few cleanups, like unifying the timer setup between
IORING_OP_TIMEOUT and IORING_OP_LINK_TIMEOUT, and not needing multiple
arguments to the link/prep helpers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: make io_double_put_req() use normal completion path · a9a99776

Authored by Jens Axboe on Nov 14, 2019

to #26323578

commit 978db57e2c329fc612ff669cab9bf0007efd3ca3 upstream.

If we don't use the normal completion path, we may skip killing links
that should be errored and freed. Add __io_double_put_req() for use
within the completion path itself, other calls should just use
io_double_put_req().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: cleanup return values from the queueing functions · 94453214

Authored by Jens Axboe on Nov 14, 2019

to #26323578

commit 0e0702dac26b282603261f04a62711a2d9aac17b upstream.

__io_queue_sqe(), io_queue_sqe(), io_queue_link_head() all return 0/err,
but the caller doesn't care since the errors are handled inline. Clean
these up and just make them void.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: io_async_cancel() should pass in 'nxt' request pointer · a5a701a4

Authored by Jens Axboe on Nov 14, 2019

to #26323578

commit 95a5bbae05ef1ec1cceb8c1b04a482aa0b7c177c upstream.

If we have a linked request, this enables us to pass it back directly
without having to go through async context.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io_uring: make POLL_ADD/POLL_REMOVE scale better · f7860a3c

Authored by Jens Axboe on Nov 14, 2019

to #26323578

commit eac406c61cd0ec8fe7970ca46ddf23e40a86b579 upstream.

One of the obvious use cases for these commands is networking, where
it's not uncommon to have tons of sockets open and polled for. The
current implementation uses a list for insertion and lookup, which works
fine for file based use cases where the count is usually low, but it
breaks down somewhat for a higher number of files / sockets. A test case
with 30k sockets being polled for and cancelled takes:

real    0m6.968s
user    0m0.002s
sys     0m6.936s

with the patch it takes:

real    0m0.233s
user    0m0.010s
sys     0m0.176s

If you go to 50k sockets, it gets even more abysmal with the current
code:

real    0m40.602s
user    0m0.010s
sys     0m40.555s

with the patch it takes:

real    0m0.398s
user    0m0.000s
sys     0m0.341s

The change is pretty straightforward: just replace the cancel_list with
a red/black tree.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
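
The scaling difference reported in this commit follows from the lookup cost: a list is scanned entry by entry, while an ordered structure is searched in logarithmic steps. The Python sketch below counts lookup steps instead of measuring time. Note the substitution: Python has no built-in red/black tree, so a sorted array with binary search stands in for the tree; the class names are hypothetical.

```python
import bisect


class ListIndex:
    """Linear list: a lookup scans the entries one by one."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        self.keys.append(key)

    def lookup_steps(self, key):
        # Number of entries examined before the key is found.
        for steps, k in enumerate(self.keys, start=1):
            if k == key:
                return steps
        return None


class SortedIndex:
    """Sorted array with binary search, standing in for a balanced tree."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        bisect.insort(self.keys, key)

    def lookup_steps(self, key):
        # Number of probes a binary search needs to find the key.
        lo, hi, steps = 0, len(self.keys), 0
        while lo < hi:
            steps += 1
            mid = (lo + hi) // 2
            if self.keys[mid] == key:
                return steps
            if self.keys[mid] < key:
                lo = mid + 1
            else:
                hi = mid
        return None
```

With 30,000 keys, finding the last-inserted key takes 30,000 steps in the list model and at most 15 probes in the sorted model.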

io-wq: remove now redundant struct io_wq_nulls_list · f24ee8ad

Authored by Jens Axboe on Nov 14, 2019

to #26323578

commit 021d1cdda3875bf35edac9133335f622d7910abc upstream.

Since we don't iterate these lists anymore after commit:

e61df66c69b1 ("io-wq: ensure free/busy list browsing see all items")

we don't need to retain the nulls value we use for them. That means it's
pretty pointless to wrap the hlist_nulls_head in a structure, so get rid
of it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

io-wq: ensure free/busy list browsing see all items · ed0788d3

Authored by Jens Axboe on Nov 13, 2019

to #26323578

commit e61df66c69b11bc050d233dc95714a6339192c28 upstream.

We have two lists for workers in io-wq, a busy and a free list. For
certain operations we want to browse all workers, and we currently do
that by browsing the two separate lists. But since these lists are RCU
protected, we can potentially miss workers if they move between the two
lists while we're browsing them.

Add a third list, all_list, that simply holds all workers. A worker is
added to that list when it starts, and removed when it exits. This makes
the worker iteration cleaner, too.

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
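
The missed-worker problem in this commit can be reproduced deterministically in a toy model. The Python sketch below uses hypothetical function names and plain lists with no RCU. It scans a free list and then a busy list, and lets one worker change lists between the two scans; a single list of all workers does not have this gap.

```python
def browse_two_lists(free, busy, move_between_scans=None):
    """Scan the free list, then the busy list (the old scheme).

    'move_between_scans' names a worker that goes from busy to free after
    the free list has already been scanned.
    """
    seen = list(free)
    if move_between_scans in busy:
        busy.remove(move_between_scans)
        free.append(move_between_scans)
    seen += list(busy)
    return seen


def browse_all_list(all_workers):
    """Scan one list that holds every worker for its whole lifetime."""
    return list(all_workers)
```

In the two-list scan, a worker that moves from busy to free at the wrong moment is never visited, which is the situation the added all_list avoids.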

openanolis / cloud-kernel · synced successfully more than 1 year ago
