Commits · 3aa5fa030558e2b0da284fd069aeb7178543c987 · openeuler / Kernel

06 Nov, 2019 3 commits

io_uring: kill dead REQ_F_LINK_DONE flag · 3aa5fa03

Committed by Jens Axboe on Nov 05, 2019

We had no more use for this flag after the conversion to io-wq, kill it
off.

Fixes: 561fb04a ("io_uring: replace workqueue usage with io-wq")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fixup a few spots where link failure isn't flagged · f1f40853

Committed by Jens Axboe on Nov 05, 2019

If a request fails, we need to ensure we set REQ_F_FAIL_LINK on it if
REQ_F_LINK is set. Any failure in the chain should break the chain.

We were missing a few spots where this should be done. It might be nice
to generalize this somewhat at some point, as long as we factor in the
fact that failure looks different for each request type.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

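As a rough illustration of the rule described in this commit message, the userspace sketch below marks a failed request as having broken its chain. The flag bit values, the `struct req` layout, the helper names, and the "negative result means failure" test are all assumptions made for this example; the commit message itself notes that failure looks different for each request type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real kernel values may differ. */
#define REQ_F_LINK      (1U << 0)  /* request is part of a link chain */
#define REQ_F_FAIL_LINK (1U << 1)  /* a failure must break the chain */

/* Hypothetical, minimal stand-in for a request. */
struct req {
	unsigned int flags;
	int result;
};

/*
 * Record the result of a request. If it failed (assumed here to mean
 * a negative result) and it is part of a chain, flag the chain as broken.
 */
static void req_complete(struct req *r, int result)
{
	assert(r);
	r->result = result;
	if (result < 0 && (r->flags & REQ_F_LINK))
		r->flags |= REQ_F_FAIL_LINK;
}

/* True if this request has been marked as breaking its chain. */
static bool req_chain_broken(const struct req *r)
{
	return (r->flags & REQ_F_FAIL_LINK) != 0;
}
```

A single helper like this is one way to read the "generalize this somewhat" remark: every completion path calls it instead of open-coding the flag check.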

io_uring: enable optimized link handling for IORING_OP_POLL_ADD · 89723d0b

Committed by Jens Axboe on Nov 05, 2019

As introduced by commit:

ba816ad6 ("io_uring: run dependent links inline if possible")

enable inline dependent link running for poll commands.
io_poll_complete_work() is the most important change, as it allows a
linked sequence of { POLL, READ } (for example) to proceed inline
instead of needing to get punted to another async context. The
submission side only potentially matters for sqthread, but may as well
include that bit.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

04 Nov, 2019 1 commit

io_uring: add completion trace event · 51c3ff62

Committed by Jens Axboe on Nov 03, 2019

We currently don't have a completion event trace, add one of those. And
to better be able to match up submissions and completions, add user_data
to the submission trace as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

01 Nov, 2019 2 commits

io_uring: set -EINTR directly when a signal wakes up in io_cqring_wait · e9ffa5c2

Committed by Jackie Liu on Oct 29, 2019

We don't use -ERESTARTSYS to tell the application layer to restart the
system call; we return -EINTR instead. We can therefore set -EINTR
directly when woken up by a signal, which saves an assignment and a
comparison.

Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: support for generic async request cancel · 62755e35

Committed by Jens Axboe on Oct 28, 2019

This adds support for IORING_OP_ASYNC_CANCEL, which will attempt to
cancel requests that have been punted to async context and are now
in-flight. This works for regular read/write requests to files, as
long as they haven't been started yet. For socket based IO (or things
like accept4(2)), we can cancel work that is already running as well.

To cancel a request, the sqe must have ->addr set to the user_data of
the request it wishes to cancel. If the request is cancelled
successfully, the original request is completed with -ECANCELED
and the cancel request is completed with a result of 0. If the
request was already running, the original may or may not complete
in error. The cancel request will complete with -EALREADY for that
case. And finally, if the request to cancel wasn't found, the cancel
request is completed with -ENOENT.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

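The three result codes described above can be summarised in a small userspace sketch. The `enum cancel_outcome` values and the helper name `cancel_cqe_res()` are hypothetical; only the result codes (0, -EALREADY, -ENOENT) come from the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcomes of looking up the request to cancel. */
enum cancel_outcome {
	CANCEL_DONE,       /* found and cancelled before it ran */
	CANCEL_RUNNING,    /* found, but it was already running */
	CANCEL_NOT_FOUND,  /* no request with the given user_data */
};

/* Result reported in the completion of the cancel request itself. */
static int cancel_cqe_res(enum cancel_outcome outcome)
{
	switch (outcome) {
	case CANCEL_DONE:
		return 0;
	case CANCEL_RUNNING:
		return -EALREADY;
	case CANCEL_NOT_FOUND:
		return -ENOENT;
	}
	assert(0);  /* not reached for valid outcomes */
	return -EINVAL;
}
```

In the `CANCEL_DONE` case the original request completes with -ECANCELED; in the `CANCEL_RUNNING` case its result is not determined by the cancel request.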

30 Oct, 2019 17 commits

io_uring: io_wq_create() returns an error pointer, not NULL · 975c99a5

Committed by Jens Axboe on Oct 30, 2019

syzbot reported an issue where we crash at setup time if failslab is
used. The issue is that io_wq_create() returns an error pointer on
failure, not NULL. Hence io_uring thought the io-wq was setup just
fine, but in reality it's a garbage error pointer.

Use IS_ERR() instead of a NULL check, and assign ret appropriately.

Reported-by: syzbot+221cc24572a2fed23b6b@syzkaller.appspotmail.com
Fixes: 561fb04a ("io_uring: replace workqueue usage with io-wq")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

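To show why a NULL check misses this failure, here is a userspace re-creation of the error-pointer idiom. The `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` helpers below are simplified stand-ins written for this example, and `wq_create()` and `ring_setup()` are hypothetical functions, not the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (userspace stand-in). */
static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO addresses. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct wq { int unused; };
static struct wq the_wq;

/* Hypothetical constructor: returns an error pointer on failure, never NULL. */
static struct wq *wq_create(bool simulate_failure)
{
	if (simulate_failure)
		return ERR_PTR(-ENOMEM);
	return &the_wq;
}

/* Caller checks with IS_ERR() and propagates the encoded error. */
static int ring_setup(bool simulate_failure)
{
	struct wq *wq = wq_create(simulate_failure);

	if (IS_ERR(wq))
		return (int)PTR_ERR(wq);
	return 0;
}
```

Note that the failing `wq_create(true)` returns a non-NULL value, which is exactly why a plain `if (!wq)` test would treat it as success.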

io_uring: fix race with canceling timeouts · 842f9612

Committed by Jens Axboe on Oct 29, 2019

If we get -1 from hrtimer_try_to_cancel(), we know that the timer
is running. Hence leave all completion to the timeout handler. If
we don't, we can corrupt the list and miss a completion.

Fixes: 11365043 ("io_uring: add support for canceling timeout requests")
Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Tested-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: support for larger fixed file sets · 65e19f54

Committed by Jens Axboe on Oct 26, 2019

There have been a few requests for supporting more than 1024 fixed files.
This isn't really tricky to do, we just need to split up the file table
into multiple tables and index appropriately. As we do so, reduce the
max single file table to 512. This enables us to do single page allocs
always for the tables, which is an improvement over the situation prior.

This patch adds support for up to 64K files, which should be enough for
everyone.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

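The "split up the file table and index appropriately" step can be sketched as a two-level lookup. The macro and function names below are invented for this example; only the 512-entry table size and the 64K limit come from the commit message. Assuming 8-byte pointers and 4 KiB pages, 512 pointers fill exactly one page, which is consistent with the single-page allocation remark.

```c
#include <assert.h>

#define FILES_PER_TABLE 512U             /* max entries in a single table */
#define MAX_FIXED_FILES (64U * 1024U)    /* 64K files in total */
#define NR_TABLES (MAX_FIXED_FILES / FILES_PER_TABLE)

/* Which table a fixed file index falls into. */
static unsigned int table_of(unsigned int index)
{
	assert(index < MAX_FIXED_FILES);
	return index / FILES_PER_TABLE;
}

/* Position of the index inside its table. */
static unsigned int slot_of(unsigned int index)
{
	assert(index < MAX_FIXED_FILES);
	return index % FILES_PER_TABLE;
}
```

Because 512 is a power of two, a compiler can turn the division and modulo into a shift and a mask.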

io_uring: protect fixed file indexing with array_index_nospec() · b7620121

Committed by Jens Axboe on Oct 26, 2019

We index the file tables with a user given value. After we check
it's within our limits, use array_index_nospec() to prevent any
Spectre attacks here.

Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

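The idea behind clamping an index after the bounds check can be illustrated with a branch-free mask. This is only a userspace sketch of the masking arithmetic, with invented helper names; it is not the kernel's array_index_nospec() implementation, and a userspace compiler gives no guarantee that the result stays branch-free.

```c
#include <assert.h>
#include <limits.h>

/*
 * All ones if index < size, zero otherwise, computed without a branch.
 * Assumes size <= LONG_MAX so that the top bit can act as a sign bit.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	unsigned long in_bounds =
		~(index | (size - 1UL - index)) >>
		(sizeof(unsigned long) * CHAR_BIT - 1);

	return 0UL - in_bounds;  /* 1 becomes all ones, 0 stays zero */
}

/* Out-of-bounds indices collapse to 0 instead of reaching the array. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
	return index & index_mask(index, size);
}
```

The clamp is applied after the ordinary range check, so a mispredicted branch can at worst read slot 0 rather than an attacker-chosen address.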

io_uring: add support for IORING_OP_ACCEPT · 17f2fe35

Committed by Jens Axboe on Oct 17, 2019

This allows an application to call accept4() in an async fashion. Like
other opcodes, we first try a non-blocking accept, then punt to async
context if we have to.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: io_uring: add support for async work inheriting files · fcb323cc

Committed by Jens Axboe on Oct 24, 2019

This is in preparation for adding opcodes that need to add new files
in a process file table, system calls like open(2) or accept4(2).

If an opcode needs this, it must set IO_WQ_WORK_NEEDS_FILES in the work
item. If work that needs to get punted to async context has this
set, the async worker will assume the original task file table before
executing the work.

Note that opcodes that need access to the current files of an
application cannot be done through IORING_SETUP_SQPOLL.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: replace workqueue usage with io-wq · 561fb04a

Committed by Jens Axboe on Oct 24, 2019

Drop various work-arounds we have for workqueues:

- We no longer need the async_list for tracking sequential IO.

- We don't have to maintain our own mm tracking/setting.

- We don't need a separate workqueue for buffered writes. This didn't
  even work that well to begin with, as it was suboptimal for multiple
  buffered writers on multiple files.

- We can properly cancel pending interruptible work. This fixes
  deadlocks with particularly socket IO, where we cannot cancel them
  when the io_uring is closed. Hence the ring will wait forever for
  these requests to complete, which may never happen. This is different
  from disk IO where we know requests will complete in a finite amount
  of time.

- Due to being able to cancel interruptible work that is already
  running, we can implement file table support for work. We need that
  for supporting system calls that add to a process file table.

- It gets us one step closer to adding async support for any system
  call.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: Fix mm_fault with READ/WRITE_FIXED · 95a1b3ff

Committed by Pavel Begunkov on Oct 27, 2019

Commit fb5ccc98 ("io_uring: Fix broken links with offloading")
introduced a potential performance regression with unconditionally
taking mm even for READ/WRITE_FIXED operations.

Restore the logic that handled this. mm-faulted requests will go through
the generic submission path, thus honoring links and drains, but will
fail later on the req->has_user check.

Fixes: fb5ccc98 ("io_uring: Fix broken links with offloading")
Cc: stable@vger.kernel.org # v5.4
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove index from sqe_submit · fa456228

Committed by Pavel Begunkov on Oct 27, 2019

submit->index is used only for the bounds check in the submission path
(i.e. head < ctx->sq_entries). However, that check is always true, because
1. it's already validated by io_get_sqring()
2. ctx->sq_entries can't be changed in between, because ctx->uring_lock
and ctx->refs are held.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add set of tracing events · c826bd7a

Committed by Dmitrii Dolgov on Oct 15, 2019

To trace io_uring activity one can get information from workqueue and
io trace events, but it looks like some parts could be hard to identify
via this approach. Making what happens inside io_uring more transparent
is important to be able to reason about many aspects of it, hence
introduce a set of tracing events.

All such events could be roughly divided into two categories:

* those, that are helping to understand correctness (from both kernel
  and an application point of view). E.g. a ring creation, file
  registration, or waiting for available CQE. Proposed approach is to
  get a pointer to an original structure of interest (ring context, or
  request), and then find relevant events. io_uring_queue_async_work
  also exposes a pointer to work_struct, to be able to track down
  corresponding workqueue events.

* those, that provide performance related information. Mostly it's about
  events that change the flow of requests, e.g. whether an async work
  was queued, or delayed due to some dependencies. Another important
  case is how io_uring optimizations (e.g. registered files) are
  utilized.

Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add support for canceling timeout requests · 11365043

Committed by Jens Axboe on Oct 16, 2019

We might have cases where the need for a specific timeout is gone, add
support for canceling an existing timeout operation. This works like the
POLL_REMOVE command, where the application passes in the user_data of
the timeout it wishes to cancel in the sqe->addr field.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add support for absolute timeouts · a41525ab

Committed by Jens Axboe on Oct 15, 2019

This is a pretty trivial addition on top of the relative timeouts
we have now, but it's handy for ensuring tighter timing for those
that are building scheduling primitives on top of io_uring.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: replace s->needs_lock with s->in_async · ba5290cc

Committed by Jackie Liu on Oct 09, 2019

There is no functional change. This just cleans up the code, using
s->in_async so that the code knows which context it is running in.

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: allow application controlled CQ ring size · 33a107f0

Committed by Jens Axboe on Oct 04, 2019

We currently size the CQ ring as twice the SQ ring, to allow some
flexibility in not overflowing the CQ ring. This is done because the
SQE life time is different than that of the IO request itself, the SQE
is consumed as soon as the kernel has seen the entry.

Certain applications don't need a huge SQ ring size, since they just
submit IO in batches. But they may have a lot of requests pending, and
hence need a big CQ ring to hold them all. By allowing the application
to control the CQ ring size multiplier, we can cater to those
applications more efficiently.

If an application wants to define its own CQ ring size, it must set
IORING_SETUP_CQSIZE in the setup flags, and fill out
io_uring_params->cq_entries. The value must be a power of two.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

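The sizing rule described above can be sketched in userspace as follows. The `struct sketch_params` type, the `SKETCH_SETUP_CQSIZE` flag value and the function names are hypothetical stand-ins for io_uring_params and IORING_SETUP_CQSIZE, and returning 0 for a rejected size is a convention of this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SETUP_CQSIZE (1U << 0)  /* hypothetical flag value */

/* Hypothetical subset of the setup parameters. */
struct sketch_params {
	uint32_t flags;
	uint32_t cq_entries;
};

static bool is_pow2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Pick the CQ ring size: twice the SQ ring by default, or the
 * application's value if it asked for one. Returns 0 if the
 * requested value is not a power of two.
 */
static uint32_t choose_cq_entries(uint32_t sq_entries,
				  const struct sketch_params *p)
{
	assert(p);
	if (!(p->flags & SKETCH_SETUP_CQSIZE))
		return 2 * sq_entries;
	return is_pow2(p->cq_entries) ? p->cq_entries : 0;
}
```

An application that submits in small batches but keeps many requests in flight would pass a small `sq_entries` together with a large power-of-two `cq_entries`.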

io_uring: add support for IORING_REGISTER_FILES_UPDATE · c3a31e60

Committed by Jens Axboe on Oct 03, 2019

Allows the application to remove/replace/add files to/from a file set.
Passes in a struct:

struct io_uring_files_update {
	__u32 offset;
	__s32 *fds;
};

that holds an array of fds, size of array passed in through the usual
nr_args part of the io_uring_register() system call. The logic is as
follows:

1) If ->fds[i] is -1, the existing file at i + ->offset is removed from
   the set.
2) If ->fds[i] is a valid fd, the existing file at i + ->offset is
   replaced with ->fds[i].

For case #2, if the existing file slot is currently empty (fd == -1), the
new fd is simply added to the array.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

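The update logic above can be simulated on a plain array of descriptors. This is a userspace sketch under stated assumptions: the registered set is modelled as an `int32_t` array where -1 means "empty", and the helper name `files_update()` and its return value (number of entries processed) are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Apply an update to a simulated fixed file set.
 * fds[i] == -1 removes the file at offset + i; any other value
 * replaces that slot, or fills it if it was empty.
 * Stops at the end of the set and returns the number of entries handled.
 */
static int files_update(int32_t *set, uint32_t set_size, uint32_t offset,
			const int32_t *fds, uint32_t nr_args)
{
	uint32_t i;

	assert(set && fds);
	for (i = 0; i < nr_args; i++) {
		uint64_t slot = (uint64_t)offset + i;

		if (slot >= set_size)
			break;
		if (fds[i] == -1)
			set[slot] = -1;      /* case 1: remove */
		else
			set[slot] = fds[i];  /* case 2: replace or add */
	}
	return (int)i;
}
```

For example, applying `fds = { -1, 9, 7 }` at offset 1 to the set `{ 3, 4, 5, -1 }` removes the second file, replaces the third, and fills the empty fourth slot.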

io_uring: allow sparse fixed file sets · 08a45173

Committed by Jens Axboe on Oct 03, 2019

This is in preparation for allowing updates to fixed file sets without
requiring a full unregister+register.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: run dependent links inline if possible · ba816ad6

Committed by Jens Axboe on Sep 28, 2019

Currently any dependent link is executed from a new workqueue context,
which means that we'll be doing a context switch per link in the chain.
If we are running the completion of the current request from our async
workqueue and find that the next request is a link, then run it directly
from the workqueue context instead of forcing another switch.

This improves the performance of linked SQEs, and reduces the CPU
overhead.

Reviewed-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

28 Oct, 2019 2 commits

io_uring: don't touch ctx in setup after ring fd install · 044c1ab3

Committed by Jens Axboe on Oct 28, 2019

syzkaller reported an issue where it looks like a malicious app can
trigger a use-after-free of reading the ctx ->sq_array and ->rings
value right after having installed the ring fd in the process file
table.

Defer ring fd installation until after we're done reading those
values.

Fixes: 75b28aff ("io_uring: allocate the two rings together")
Reported-by: syzbot+6f03d895a6cd0d06187f@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: Fix leaked shadow_req · 7b20238d

Committed by Pavel Begunkov on Oct 27, 2019

io_queue_link_head() owns shadow_req after taking it as an argument.
By not freeing it in case of an error, it can leak the request along
with taken ctx->refs.

Reviewed-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

26 Oct, 2019 2 commits

io_uring: fix bad inflight accounting for SETUP_IOPOLL|SETUP_SQTHREAD · 2b2ed975

Committed by Jens Axboe on Oct 25, 2019

We currently assume that submissions from the sqthread are successful,
and if IO polling is enabled, we use that value for knowing how many
completions to look for. But if we overflowed the CQ ring or some
requests simply got errored and already completed, they won't be
available for polling.

For the case of IO polling and SQTHREAD usage, look at the pending
poll list. If it ever hits empty then we know that we don't have
any more pollable requests inflight. For that case, simply reset
the inflight count to zero.

Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: used cached copies of sq->dropped and cq->overflow · 498ccd9e

Committed by Jens Axboe on Oct 25, 2019

We currently use the ring values directly, but that can lead to issues
if the application is malicious and changes these values on our behalf.
Create in-kernel cached versions of them, and just overwrite the user
side when we update them. This is similar to how we treat the sq/cq
ring tail/head updates.

Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

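The cached-counter pattern described here can be sketched as follows. The structure and function names are invented for this example: `struct shared_ring` stands in for memory the application can write to, and `struct ring_ctx` for kernel-private state.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for ring memory that the application can modify. */
struct shared_ring {
	uint32_t cq_overflow;
};

/* Stand-in for private state that the application cannot touch. */
struct ring_ctx {
	uint32_t cached_cq_overflow;
	struct shared_ring *rings;
};

/*
 * Count one overflow. The private copy is the source of truth; the
 * shared value is only ever overwritten, never read back.
 */
static void note_cq_overflow(struct ring_ctx *ctx)
{
	assert(ctx && ctx->rings);
	ctx->cached_cq_overflow++;
	ctx->rings->cq_overflow = ctx->cached_cq_overflow;
}
```

If the application scribbles over the shared counter, the next update simply replaces it with the value derived from the private copy.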

25 Oct, 2019 3 commits

io_uring: Fix race for sqes with userspace · 935d1e45

Committed by Pavel Begunkov on Oct 25, 2019

io_ring_submit() finalises with
1. io_commit_sqring(), which releases sqes to the userspace
2. Then calls to io_queue_link_head(), accessing released head's sqe

Reorder them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: Fix broken links with offloading · fb5ccc98

Committed by Pavel Begunkov on Oct 25, 2019

io_sq_thread() processes sqes in batches of 8 without considering links.
As a result, links will be randomly subdivided.

The easiest way to fix it is to call io_get_sqring() inside
io_submit_sqes(), as io_ring_submit() does.

Downsides:
1. This removes the optimisation of not grabbing mm_struct for fixed files
2. It submits all sqes in one go, without finer-grained scheduling
with cq processing.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: Fix corrupted user_data · 84d55dc5

Committed by Pavel Begunkov on Oct 25, 2019

There is a bug where failed linked requests are returned not with the
specified @user_data, but with garbage from the kernel stack.

The reason is that io_fail_links() uses req->user_data, which is
uninitialised when called from io_queue_sqe() on the failure path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

24 Oct, 2019 3 commits

io_uring: correct timeout req sequence when inserting a new entry · a1f58ba4

Committed by zhangyi (F) on Oct 23, 2019

The sequence number of a timeout req (req->sequence) indicates the
expected completion request. Because each timeout req consumes a
sequence number, the sequence numbers of the timeout reqs on the
timeout list shouldn't be the same. But currently we may get the same
number (which is also incorrect) if we insert a new entry before the
last one, for example when submitting the following two timeout reqs
on a new ring instance:

                    req->sequence
 req_1 (count = 2):       2
 req_2 (count = 1):       2

Then, if we submit a nop req, req_2 will still time out even though the
nop req has finished. This patch fixes the problem by adjusting the
sequence number of each reordered req when inserting a new entry.

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring : correct timeout req sequence when waiting timeout · ef03681a

Committed by zhangyi (F) on Oct 23, 2019

The sequence number of reqs on the timeout_list before the timeout req
should be adjusted in io_timeout_fn(), because the current timeout req
will consume a slot in the cq_ring and the cq_tail pointer will be
increased. Otherwise other timeout reqs may return early without
waiting for enough wait_nr.

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: revert "io_uring: optimize submit_and_wait API" · bc808bce

Committed by Jens Axboe on Oct 22, 2019

There are cases where it isn't always safe to block for submission,
even if the caller asked to wait for events as well. Revert the
previous optimization of doing that.

This reverts two commits:

bf7ec93c
c5766668

Fixes: c5766668 ("io_uring: optimize submit_and_wait API")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

18 Oct, 2019 2 commits

io_uring: fix logic error in io_timeout · 8b07a65a

Committed by yangerkun on Oct 17, 2019

If ctx->cached_sq_head < nxt_sq_head, we should add UINT_MAX to tmp, not
tmp_nxt.

Fixes: 5da0fb1a ("io_uring: consider the overflow of sequence for timeout req")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fix up O_NONBLOCK handling for sockets · 491381ce

Committed by Jens Axboe on Oct 17, 2019

We've got two issues with the non-regular file handling for non-blocking
IO:

1) We don't want to re-do a short read in full for a non-regular file,
   as we can't just read the data again.
2) For non-regular files that don't support non-blocking IO attempts,
   we need to punt to async context even if the file is opened as
   non-blocking. Otherwise the caller always gets -EAGAIN.

Add two new request flags to handle these cases. One is just a cache
of the inode S_ISREG() status, the other tells io_uring that we always
need to punt this request to async context, even if REQ_F_NOWAIT is set.

Cc: stable@vger.kernel.org
Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Tested-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

15 Oct, 2019 1 commit

io_uring: consider the overflow of sequence for timeout req · 5da0fb1a

Committed by yangerkun on Oct 15, 2019

Now we recalculate the sequence of a timeout with 'req->sequence =
ctx->cached_sq_head + count - 1', and determine the right place to
insert it into timeout_list by comparing the number of requests we
still expect to complete. But we have not considered overflow:

1. ctx->cached_sq_head + count - 1 may overflow, so a bigger count for
the new timeout req can yield a smaller req->sequence.

2. The current cached_sq_head may have overflowed compared with an
earlier req, which leads to a timeout req with a small req->sequence.

This overflow misorders timeout_list, which can lead to the wrong
completion order for the timeouts. Fix it by reusing
req->submit.sequence to store the count, and change the insertion sort
logic in io_timeout.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

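The first overflow case can be reproduced with plain unsigned arithmetic. This is a minimal sketch assuming a 32-bit sequence counter; the function name `timeout_sequence()` is made up for the example, and only the formula comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * The formula quoted in the commit message, evaluated with 32-bit
 * wraparound (assumed width of the sequence counter).
 */
static uint32_t timeout_sequence(uint32_t cached_sq_head, uint32_t count)
{
	assert(count > 0);
	return cached_sq_head + count - 1;
}
```

With `cached_sq_head` at `UINT32_MAX - 1`, a count of 1 gives a sequence of `UINT32_MAX - 1`, while a count of 5 wraps around to 2. The request that waits for more completions ends up with the smaller sequence number, so a list sorted by raw sequence value is in the wrong order.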

11 Oct, 2019 1 commit

io_uring: fix sequence logic for timeout requests · 7adf4eaf

Committed by Jens Axboe on Oct 10, 2019

We have two ways a request can be deferred:

1) It's a regular request that depends on another one
2) It's a timeout that tracks completions

We have a shared helper to determine whether to defer, and that
attempts to make the right decision based on the request. But we
only have some of this information in the caller. Un-share the
two timeout/defer helpers so the caller can use the right one.

Fixes: 5262f567 ("io_uring: IORING_OP_TIMEOUT support")
Reported-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

10 Oct, 2019 1 commit

io_uring: only flush workqueues on fileset removal · 8a997340

Committed by Jens Axboe on Oct 09, 2019

We should not remove the workqueue, we just need to ensure that the
workqueues are synced. The workqueues are torn down on ctx removal.

Cc: stable@vger.kernel.org
Fixes: 6b06314c ("io_uring: add file set registration")
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

08 Oct, 2019 1 commit

io_uring: remove wait loop spurious wakeups · 6805b32e

Committed by Pavel Begunkov on Oct 08, 2019

Any changes interesting to tasks waiting in io_cqring_wait() are
committed with io_cqring_ev_posted(). However, io_ring_drop_ctx_refs()
also tries to do that for no reason, which means spurious wakeups on
every io_free_req() and io_uring_enter().

Just use percpu_ref_put() instead.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

04 Oct, 2019 1 commit

io_uring: fix reversed nonblock flag for link submission · bf7ec93c

Committed by Pavel Begunkov on Oct 04, 2019

io_queue_link_head() accepts a @force_nonblock flag, but io_ring_submit()
passes the opposite value.

Fixes: c5766668 ("io_uring: optimize submit_and_wait API")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
