- 06 11月, 2019 3 次提交
-
-
由 Jens Axboe 提交于
We had no more use for this flag after the conversion to io-wq, kill it off. Fixes: 561fb04a ("io_uring: replace workqueue usage with io-wq") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If a request fails, we need to ensure we set REQ_F_FAIL_LINK on it if REQ_F_LINK is set. Any failure in the chain should break the chain. We were missing a few spots where this should be done. It might be nice to generalize this somewhat at some point, as long as we factor in the fact that failure looks different for each request type. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
As introduced by commit: ba816ad6 ("io_uring: run dependent links inline if possible") enable inline dependent link running for poll commands. io_poll_complete_work() is the most important change, as it allows a linked sequence of { POLL, READ } (for example) to proceed inline instead of needing to get punted to another async context. The submission side only potentially matters for sqthread, but may as well include that bit. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 04 11月, 2019 1 次提交
-
-
由 Jens Axboe 提交于
We currently don't have a completion event trace, add one of those. And to better be able to match up submissions and completions, add user_data to the submission trace as well. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 01 11月, 2019 2 次提交
-
-
由 Jackie Liu 提交于
We didn't use -ERESTARTSYS to tell the application layer to restart the system call, but instead return -EINTR. we can set -EINTR directly when wakeup by the signal, which can help us save an assignment operation and comparison operation. Reviewed-by: NBob Liu <bob.liu@oracle.com> Signed-off-by: NJackie Liu <liuyun01@kylinos.cn> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
This adds support for IORING_OP_ASYNC_CANCEL, which will attempt to cancel requests that have been punted to async context and are now in-flight. This works for regular read/write requests to files, as long as they haven't been started yet. For socket based IO (or things like accept4(2)), we can cancel work that is already running as well. To cancel a request, the sqe must have ->addr set to the user_data of the request it wishes to cancel. If the request is cancelled successfully, the original request is completed with -ECANCELED and the cancel request is completed with a result of 0. If the request was already running, the original may or may not complete in error. The cancel request will complete with -EALREADY for that case. And finally, if the request to cancel wasn't found, the cancel request is completed with -ENOENT. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 30 10月, 2019 17 次提交
-
-
由 Jens Axboe 提交于
syzbot reported an issue where we crash at setup time if failslab is used. The issue is that io_wq_create() returns an error pointer on failure, not NULL. Hence io_uring thought the io-wq was setup just fine, but in reality it's a garbage error pointer. Use IS_ERR() instead of a NULL check, and assign ret appropriately. Reported-by: syzbot+221cc24572a2fed23b6b@syzkaller.appspotmail.com Fixes: 561fb04a ("io_uring: replace workqueue usage with io-wq") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If we get -1 from hrtimer_try_to_cancel(), we know that the timer is running. Hence leave all completion to the timeout handler. If we don't, we can corrupt the list and miss a completion. Fixes: 11365043 ("io_uring: add support for canceling timeout requests") Reported-by: NHrvoje Zeba <zeba.hrvoje@gmail.com> Tested-by: NHrvoje Zeba <zeba.hrvoje@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
There's been a few requests for supporting more fixed files than 1024. This isn't really tricky to do, we just need to split up the file table into multiple tables and index appropriately. As we do so, reduce the max single file table to 512. This enables us to do single page allocs always for the tables, which is an improvement over the situation prior. This patch adds support for up to 64K files, which should be enough for everyone. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We index the file tables with a user given value. After we check it's within our limits, use array_index_nospec() to prevent any spectre attacks here. Suggested-by: NJann Horn <jannh@google.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
This allows an application to call accept4() in an async fashion. Like other opcodes, we first try a non-blocking accept, then punt to async context if we have to. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
This is in preparation for adding opcodes that need to add new files in a process file table, system calls like open(2) or accept4(2). If an opcode needs this, it must set IO_WQ_WORK_NEEDS_FILES in the work item. If work that needs to get punted to async context have this set, the async worker will assume the original task file table before executing the work. Note that opcodes that need access to the current files of an application cannot be done through IORING_SETUP_SQPOLL. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Drop various work-arounds we have for workqueues: - We no longer need the async_list for tracking sequential IO. - We don't have to maintain our own mm tracking/setting. - We don't need a separate workqueue for buffered writes. This didn't even work that well to begin with, as it was suboptimal for multiple buffered writers on multiple files. - We can properly cancel pending interruptible work. This fixes deadlocks with particularly socket IO, where we cannot cancel them when the io_uring is closed. Hence the ring will wait forever for these requests to complete, which may never happen. This is different from disk IO where we know requests will complete in a finite amount of time. - Due to being able to cancel work interruptible work that is already running, we can implement file table support for work. We need that for supporting system calls that add to a process file table. - It gets us one step closer to adding async support for any system call. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Commit fb5ccc98 ("io_uring: Fix broken links with offloading") introduced a potential performance regression with unconditionally taking mm even for READ/WRITE_FIXED operations. Return the logic handling it back. mm-faulted requests will go through the generic submission path, so honoring links and drains, but will fail further on req->has_user check. Fixes: fb5ccc98 ("io_uring: Fix broken links with offloading") Cc: stable@vger.kernel.org # v5.4 Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
submit->index is used only for inbound check in submission path (i.e. head < ctx->sq_entries). However, it always will be true, as 1. it's already validated by io_get_sqring() 2. ctx->sq_entries can't be changedd in between, because of held ctx->uring_lock and ctx->refs. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Dmitrii Dolgov 提交于
To trace io_uring activity one can get an information from workqueue and io trace events, but looks like some parts could be hard to identify via this approach. Making what happens inside io_uring more transparent is important to be able to reason about many aspects of it, hence introduce the set of tracing events. All such events could be roughly divided into two categories: * those, that are helping to understand correctness (from both kernel and an application point of view). E.g. a ring creation, file registration, or waiting for available CQE. Proposed approach is to get a pointer to an original structure of interest (ring context, or request), and then find relevant events. io_uring_queue_async_work also exposes a pointer to work_struct, to be able to track down corresponding workqueue events. * those, that provide performance related information. Mostly it's about events that change the flow of requests, e.g. whether an async work was queued, or delayed due to some dependencies. Another important case is how io_uring optimizations (e.g. registered files) are utilized. Signed-off-by: NDmitrii Dolgov <9erthalion6@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We might have cases where the need for a specific timeout is gone, add support for canceling an existing timeout operation. This works like the POLL_REMOVE command, where the application passes in the user_data of the timeout it wishes to cancel in the sqe->addr field. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
This is a pretty trivial addition on top of the relative timeouts we have now, but it's handy for ensuring tighter timing for those that are building scheduling primitives on top of io_uring. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jackie Liu 提交于
There is no function change, just to clean up the code, use s->in_async to make the code know where it is. Signed-off-by: NJackie Liu <liuyun01@kylinos.cn> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We currently size the CQ ring as twice the SQ ring, to allow some flexibility in not overflowing the CQ ring. This is done because the SQE life time is different than that of the IO request itself, the SQE is consumed as soon as the kernel has seen the entry. Certain application don't need a huge SQ ring size, since they just submit IO in batches. But they may have a lot of requests pending, and hence need a big CQ ring to hold them all. By allowing the application to control the CQ ring size multiplier, we can cater to those applications more efficiently. If an application wants to define its own CQ ring size, it must set IORING_SETUP_CQSIZE in the setup flags, and fill out io_uring_params->cq_entries. The value must be a power of two. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Allows the application to remove/replace/add files to/from a file set. Passes in a struct: struct io_uring_files_update { __u32 offset; __s32 *fds; }; that holds an array of fds, size of array passed in through the usual nr_args part of the io_uring_register() system call. The logic is as follows: 1) If ->fds[i] is -1, the existing file at i + ->offset is removed from the set. 2) If ->fds[i] is a valid fd, the existing file at i + ->offset is replaced with ->fds[i]. For case #2, is the existing file is currently empty (fd == -1), the new fd is simply added to the array. Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
This is in preparation for allowing updates to fixed file sets without requiring a full unregister+register. Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Currently any dependent link is executed from a new workqueue context, which means that we'll be doing a context switch per link in the chain. If we are running the completion of the current request from our async workqueue and find that the next request is a link, then run it directly from the workqueue context instead of forcing another switch. This improves the performance of linked SQEs, and reduces the CPU overhead. Reviewed-by: NJackie Liu <liuyun01@kylinos.cn> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 10月, 2019 2 次提交
-
-
由 Jens Axboe 提交于
syzkaller reported an issue where it looks like a malicious app can trigger a use-after-free of reading the ctx ->sq_array and ->rings value right after having installed the ring fd in the process file table. Defer ring fd installation until after we're done reading those values. Fixes: 75b28aff ("io_uring: allocate the two rings together") Reported-by: syzbot+6f03d895a6cd0d06187f@syzkaller.appspotmail.com Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
io_queue_link_head() owns shadow_req after taking it as an argument. By not freeing it in case of an error, it can leak the request along with taken ctx->refs. Reviewed-by: NJackie Liu <liuyun01@kylinos.cn> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 26 10月, 2019 2 次提交
-
-
由 Jens Axboe 提交于
We currently assume that submissions from the sqthread are successful, and if IO polling is enabled, we use that value for knowing how many completions to look for. But if we overflowed the CQ ring or some requests simply got errored and already completed, they won't be available for polling. For the case of IO polling and SQTHREAD usage, look at the pending poll list. If it ever hits empty then we know that we don't have anymore pollable requests inflight. For that case, simply reset the inflight count to zero. Reported-by: NPavel Begunkov <asml.silence@gmail.com> Reviewed-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We currently use the ring values directly, but that can lead to issues if the application is malicious and changes these values on our behalf. Created in-kernel cached versions of them, and just overwrite the user side when we update them. This is similar to how we treat the sq/cq ring tail/head updates. Reported-by: NPavel Begunkov <asml.silence@gmail.com> Reviewed-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 25 10月, 2019 3 次提交
-
-
由 Pavel Begunkov 提交于
io_ring_submit() finalises with 1. io_commit_sqring(), which releases sqes to the userspace 2. Then calls to io_queue_link_head(), accessing released head's sqe Reorder them. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
io_sq_thread() processes sqes by 8 without considering links. As a result, links will be randomely subdivided. The easiest way to fix it is to call io_get_sqring() inside io_submit_sqes() as do io_ring_submit(). Downsides: 1. This removes optimisation of not grabbing mm_struct for fixed files 2. It submitting all sqes in one go, without finer-grained sheduling with cq processing. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
There is a bug, where failed linked requests are returned not with specified @user_data, but with garbage from a kernel stack. The reason is that io_fail_links() uses req->user_data, which is uninitialised when called from io_queue_sqe() on fail path. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 24 10月, 2019 3 次提交
-
-
由 zhangyi (F) 提交于
The sequence number of the timeout req (req->sequence) indicate the expected completion request. Because of each timeout req consume a sequence number, so the sequence of each timeout req on the timeout list shouldn't be the same. But now, we may get the same number (also incorrect) if we insert a new entry before the last one, such as submit such two timeout reqs on a new ring instance below. req->sequence req_1 (count = 2): 2 req_2 (count = 1): 2 Then, if we submit a nop req, req_2 will still timeout even the nop req finished. This patch fix this problem by adjust the sequence number of each reordered reqs when inserting a new entry. Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 zhangyi (F) 提交于
The sequence number of reqs on the timeout_list before the timeout req should be adjusted in io_timeout_fn(), because the current timeout req will consumes a slot in the cq_ring and cq_tail pointer will be increased, otherwise other timeout reqs may return in advance without waiting for enough wait_nr. Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
There are cases where it isn't always safe to block for submission, even if the caller asked to wait for events as well. Revert the previous optimization of doing that. This reverts two commits: bf7ec93c c5766668 Fixes: c5766668 ("io_uring: optimize submit_and_wait API") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 10月, 2019 2 次提交
-
-
由 yangerkun 提交于
If ctx->cached_sq_head < nxt_sq_head, we should add UINT_MAX to tmp, not tmp_nxt. Fixes: 5da0fb1a ("io_uring: consider the overflow of sequence for timeout req") Signed-off-by: Nyangerkun <yangerkun@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We've got two issues with the non-regular file handling for non-blocking IO: 1) We don't want to re-do a short read in full for a non-regular file, as we can't just read the data again. 2) For non-regular files that don't support non-blocking IO attempts, we need to punt to async context even if the file is opened as non-blocking. Otherwise the caller always gets -EAGAIN. Add two new request flags to handle these cases. One is just a cache of the inode S_ISREG() status, the other tells io_uring that we always need to punt this request to async context, even if REQ_F_NOWAIT is set. Cc: stable@vger.kernel.org Reported-by: NHrvoje Zeba <zeba.hrvoje@gmail.com> Tested-by: NHrvoje Zeba <zeba.hrvoje@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 10月, 2019 1 次提交
-
-
由 yangerkun 提交于
Now we recalculate the sequence of timeout with 'req->sequence = ctx->cached_sq_head + count - 1', judge the right place to insert for timeout_list by compare the number of request we still expected for completion. But we have not consider about the situation of overflow: 1. ctx->cached_sq_head + count - 1 may overflow. And a bigger count for the new timeout req can have a small req->sequence. 2. cached_sq_head of now may overflow compare with before req. And it will lead the timeout req with small req->sequence. This overflow will lead to the misorder of timeout_list, which can lead to the wrong order of the completion of timeout_list. Fix it by reuse req->submit.sequence to store the count, and change the logic of inserting sort in io_timeout. Signed-off-by: Nyangerkun <yangerkun@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 11 10月, 2019 1 次提交
-
-
由 Jens Axboe 提交于
We have two ways a request can be deferred: 1) It's a regular request that depends on another one 2) It's a timeout that tracks completions We have a shared helper to determine whether to defer, and that attempts to make the right decision based on the request. But we only have some of this information in the caller. Un-share the two timeout/defer helpers so the caller can use the right one. Fixes: 5262f567 ("io_uring: IORING_OP_TIMEOUT support") Reported-by: Nyangerkun <yangerkun@huawei.com> Reviewed-by: NJackie Liu <liuyun01@kylinos.cn> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 10 10月, 2019 1 次提交
-
-
由 Jens Axboe 提交于
We should not remove the workqueue, we just need to ensure that the workqueues are synced. The workqueues are torn down on ctx removal. Cc: stable@vger.kernel.org Fixes: 6b06314c ("io_uring: add file set registration") Reported-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 08 10月, 2019 1 次提交
-
-
由 Pavel Begunkov 提交于
Any changes interesting to tasks waiting in io_cqring_wait() are commited with io_cqring_ev_posted(). However, io_ring_drop_ctx_refs() also tries to do that but with no reason, that means spurious wakeups every io_free_req() and io_uring_enter(). Just use percpu_ref_put() instead. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 04 10月, 2019 1 次提交
-
-
由 Pavel Begunkov 提交于
io_queue_link_head() accepts @force_nonblock flag, but io_ring_submit() passes something opposite. Fixes: c5766668 ("io_uring: optimize submit_and_wait API") Reported-by: Nkbuild test robot <lkp@intel.com> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-