- 03 12月, 2019 3 次提交
-
-
由 Jens Axboe 提交于
Currently we don't copy the iovecs when we punt to async context. This can be problematic for applications that store the iovec on the stack, as they often assume that it's safe to let the iovec go out of scope as soon as IO submission has been called. This isn't always safe, as we will re-copy the iovec once we're in async context. Make this 100% safe by copying the iovec just once. With this change, applications may safely store the iovec on the stack for all cases. Reported-by: N李通洲 <carter.li@eoitek.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Right now we just copy the sqe for async offload, but we want to store more context across an async punt. In preparation for doing so, put the sqe copy inside a structure that we can expand. With this pointer added, we can get rid of REQ_F_FREE_SQE, as that is now indicated by whether req->io is NULL or not. No functional changes in this patch. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We should never return -ERESTARTSYS to userspace, transform it into -EINTR. Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 02 12月, 2019 1 次提交
-
-
由 Jens Axboe 提交于
syzbot reports: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 9217 Comm: io_uring-sq Not tainted 5.4.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline] RIP: 0010:__validate_creds include/linux/cred.h:187 [inline] RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550 Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c 24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318 RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010 RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849 R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000 R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: io_sq_thread+0x1c7/0xa20 fs/io_uring.c:3274 kthread+0x361/0x430 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Modules linked in: ---[ end trace f2e1a4307fbe2245 ]--- RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline] RIP: 0010:__validate_creds include/linux/cred.h:187 [inline] RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550 Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c 24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318 RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010 RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849 R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000 R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 which is caused by slab fault injection triggering a failure in prepare_creds(). We don't actually need to create a copy of the creds as we're not modifying it, we just need a reference on the current task creds. This avoids the failure case as well, and propagates the const throughout the stack. Fixes: 181e448d ("io_uring: async workers should inherit the user creds") Reported-by: syzbot+5320383e16029ba057ff@syzkaller.appspotmail.com Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 30 11月, 2019 1 次提交
-
-
由 Jens Axboe 提交于
Christophe reports that current master fails building on powerpc with this error: CC fs/io_uring.o fs/io_uring.c: In function ‘loop_rw_iter’: fs/io_uring.c:1628:21: error: implicit declaration of function ‘kmap’ [-Werror=implicit-function-declaration] iovec.iov_base = kmap(iter->bvec->bv_page) ^ fs/io_uring.c:1628:19: warning: assignment makes pointer from integer without a cast [-Wint-conversion] iovec.iov_base = kmap(iter->bvec->bv_page) ^ fs/io_uring.c:1643:4: error: implicit declaration of function ‘kunmap’ [-Werror=implicit-function-declaration] kunmap(iter->bvec->bv_page); ^ which is caused by a missing highmem.h include. Fix it by including it. Fixes: 311ae9e1 ("io_uring: fix dead-hung for non-iter fixed rw") Reported-by: NChristophe Leroy <christophe.leroy@c-s.fr> Tested-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 29 11月, 2019 1 次提交
-
-
由 Roman Penyaev 提交于
That is a bit weird scenario but I find it interesting to run fio loads using LKL linux, where MMU is disabled. Probably other real archs which run uClinux can also benefit from this patch. Signed-off-by: NRoman Penyaev <rpenyaev@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 27 11月, 2019 4 次提交
-
-
由 Jens Axboe 提交于
In the quest to bring io_kiocb down to 3 cachelines, this one does the trick. Make the wait_queue_entry for the poll command come out of kmalloc instead of embedding it in struct io_poll_iocb, as the latter is the largest member of io_kiocb. Once we trim this down a bit, we're back at a healthy 192 bytes for struct io_kiocb. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Clean io_import_fixed() call site and make it return proper type. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
There is no point left in keeping struct sqe_submit. Inline it into struct io_kiocb, so any req->submit.field is now just req->field - moves initialisation of ring_file into io_get_req() - removes duplicated req->sequence. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Timeouts' sequence offset (i.e. sqe->off) is stored in req->submit.sequence under a false name. Keep it in timeout.data instead. The unused space for sequence will be reclaimed in the following patches. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 26 11月, 2019 30 次提交
-
-
由 Hrvoje Zeba 提交于
This field contains a pointer to addrlen and checking to see if it's set returns -EINVAL if the caller sets addr & addrlen pointers. Fixes: 17f2fe35 ("io_uring: add support for IORING_OP_ACCEPT") Signed-off-by: NHrvoje Zeba <zeba.hrvoje@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If we don't inherit the original task creds, then we can confuse users like fuse that pass creds in the request header. See link below on identical aio issue. Link: https://lore.kernel.org/linux-fsdevel/26f0d78e-99ca-2f1b-78b9-433088053a61@scylladb.com/T/#uSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We currently pass in 4 arguments outside of the bounded size. In preparation for adding one more argument, let's bundle them up in a struct to make it more readable. No functional changes in this patch. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Read/write requests to devices without implemented read/write_iter using fixed buffers can cause general protection fault, which totally hangs a machine. io_import_fixed() initialises iov_iter with bvec, but loop_rw_iter() accesses it as iovec, dereferencing random address. kmap() page by page in this case Cc: stable@vger.kernel.org Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
This allows an application to call connect() in an async fashion. Like other opcodes, we first try a non-blocking connect, then punt to async context if we have to. Note that we can still return -EINPROGRESS, and in that case the caller should use IORING_OP_POLL_ADD to do an async wait for completion of the connect request (just like for regular connect(2), except we can do it async here too). Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We return -EBUSY on submit when we have a CQ ring overflow backlog, but that can be a bit problematic if the application is using pure userspace poll of the CQ ring. For that case, if the ring briefly overflowed and we have pending entries in the backlog, the submit flushes the backlog successfully but still returns -EBUSY. If we're able to fully flush the CQ ring backlog, let the submission proceed. Reported-by: NDan Melnic <dmm@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Pass only non-null @nxt to io_issue_sqe() and handle it at the caller's side. And propagate it. - kiocb_done() is only called from io_read() and io_write(), which are only called from io_issue_sqe(), so it's @nxt != NULL - io_put_req_find_next() is called either with explicitly non-null local nxt, or from one of the functions in io_issue_sqe() switch (or their callees). Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
"if (nxt)" is always true, as it was checked in the while's condition. io_wq_current_is_worker() is unnecessary, as non-async callers don't pass nxt, so io_queue_async_work() will be called for them anyway. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Make io_req_find_next() and io_req_link_next() to accept only non-null nxt, and handle it in callers. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
There is only one one-liner user of io_free_req_find_next(). Inline it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
The number of SQEs to submit is specified by a user, so io_get_sqring() in most of the cases succeeds. Hint compilers about that. Checking ASM genereted by gcc 9.2.0 for x64, there is one branch misprediction. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
__io_submit_sqe() is issuing requests, so call it as such. Moreover, it ends by calling io_iopoll_req_issued(). Rename it and make terminology clearer. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We don't have shadow requests anymore, so get rid of the shadow argument. Add the user_data argument, as that's often useful to easily match up requests, instead of having to look at request pointers. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
There's an issue with the shadow drain logic in that we drop the completion lock after deciding to defer a request, then re-grab it later and assume that the state is still the same. In the mean time, someone else completing a request could have found and issued it. This can cause a stall in the queue, by having a shadow request inserted that nobody is going to drain. Additionally, if we fail allocating the shadow request, we simply ignore the drain. Instead of using a shadow request, defer the next request/link instead. This also has the following advantages: - removes semi-duplicated code - doesn't allocate memory for shadows - works better if only the head marked for drain - doesn't need complex synchronisation On the flip side, it removes the shadow->seq == last_drain_in_in_link->seq optimization. That shouldn't be a common case, and can always be added back, if needed. Fixes: 4fe2c963 ("io_uring: add support for link with drain") Cc: Jackie Liu <liuyun01@kylinos.cn> Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
When we find new work to process within the work handler, we queue the linked timeout before we have issued the new work. This can be problematic for very short timeouts, as we have a window where the new work isn't visible. Allow the work handler to store a callback function for this in the work item, and flag it with IO_WQ_WORK_CB if the caller has done so. If that is set, then io-wq will call the callback when it has setup the new work item. Reported-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We currently try and start the next link when we put the request, and only if we were going to free it. This means that the optimization to continue executing requests from the same context often fails, as we're not putting the final reference. Add REQ_F_LINK_NEXT to keep track of this, and allow io_uring to find the next request more efficiently. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We currently rely on the ring destroy on cleaning things up in case of failure, but io_allocate_scq_urings() can leave things half initialized if only parts of it fails. Be nice and return with either everything setup in success, or return an error with things nicely cleaned up. Reported-by: syzbot+0d818c0d39399188f393@syzkaller.appspotmail.com Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Always mark requests with allocated sqe and deallocate it in __io_free_req(). It's easier to follow and doesn't add edge cases. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We currently clear the linked timeout field if we cancel such a timeout, but we should only attempt to cancel if it's the first one we see. Others should simply be freed like other requests, as they haven't been started yet. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
let have a dependant link: REQ -> LINK_TIMEOUT -> LINK_TIMEOUT 1. submission stage: submission references for REQ and LINK_TIMEOUT are dropped. So, references respectively (1,1,2) 2. io_put(REQ) + FAIL_LINKS stage: calls io_fail_links(), which for all linked timeouts will call cancel_timeout() and drop 1 reference. So, references after: (0,0,1). That's a leak. Make it treat only the first linked timeout as such, and pass others through __io_double_put_req(). Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Pass any IORING_OP_LINK_TIMEOUT request further, where it will eventually fail in io_issue_sqe(). Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
If io_req_defer() failed, it needs to cancel a dependant link. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We currently don't explicitly break links if a request is cancelled, but we should. Add explicitly link breakage for all types of request cancellations that we support. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Currently a poll request fills a completion entry of 0, even if it got cancelled. This is odd, and it makes it harder to support with chains. Ensure that it returns -ECANCELED in the completions events if it got cancelled, and furthermore ensure that the linked timeout that triggered it completes with -ETIME if we did indeed trigger the completions through a timeout. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
With the conversion to io-wq, we no longer use that flag. Kill it. Fixes: 561fb04a ("io_uring: replace workqueue usage with io-wq") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We have an issue with timeout links that are deeper in the submit chain, because we only handle it upfront, not from later submissions. Move the prep + issue of the timeout link to the async work prep handler, and do it normally for non-async queue. If we validate and prepare the timeout links upfront when we first see them, there's nothing stopping us from supporting any sort of nesting. Fixes: 2665abfd ("io_uring: add support for linked SQE timeouts") Reported-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
There are a few reasons for this: - As a prep to improving the linked timeout logic - io_timeout is the biggest member in the io_kiocb opcode union This also enables a few cleanups, like unifying the timer setup between IORING_OP_TIMEOUT and IORING_OP_LINK_TIMEOUT, and not needing multiple arguments to the link/prep helpers. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If we don't use the normal completion path, we may skip killing links that should be errored and freed. Add __io_double_put_req() for use within the completion path itself, other calls should just use io_double_put_req(). Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
__io_queue_sqe(), io_queue_sqe(), io_queue_link_head() all return 0/err, but the caller doesn't care since the errors are handled inline. Clean these up and just make them void. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If we have a linked request, this enables us to pass it back directly without having to go through async context. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-