- 04 3月, 2021 8 次提交
-
-
由 Jens Axboe 提交于
We prune the full cache regardless, get rid of the dead argument. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Destroy current's io-wq backend and tctx on __io_uring_task_cancel(), aka exec(). Looks it's not strictly necessary, because it will be done at some point when the task dies and changes of creds/files/etc. are handled, but better to do that earlier to free io-wq and not potentially lock previous mm and other resources for the time being. It's safe to do because we wait for all requests of the current task to complete, so no request will use tctx afterwards. Note, that io_uring_files_cancel() may leave some requests for later reaping, so it leaves tctx intact, that's ok as the task is dying anyway. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Make sure that we killed an io-wq by the time a task is dead. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We clear the bit marking the ctx task_work as active after having run the queued work, but we really should be clearing it before. Otherwise we can hit a tiny race ala: CPU0 CPU1 io_task_work_add() tctx_task_work() run_work add_to_list test_and_set_bit clear_bit already set and CPU0 will return thinking the task_work is queued, while in reality it's already being run. If we hit the condition after __tctx_task_work() found no more work, but before we've cleared the bit, then we'll end up thinking it's queued and will be run. In reality it is queued, but we didn't queue the ctx task_work to ensure that it gets run. Fixes: 7cbf1722 ("io_uring: provide FIFO ordering for task_work") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If we put the io-wq from io_uring, we really want it to exit. Provide a helper that does that for us. Couple that with not having the manager hold a reference to the 'wq' and the normal SQPOLL exit will tear down the io-wq context appropriate. On the io-wq side, our wq context is per task, so only the task itself is manipulating ->manager and hence it's safe to check and clear without any extra locking. We just need to ensure that the manager task stays around, in case it exits. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We want to reuse this completion, and a single complete should do just fine. Ensure that we park ourselves first if requested, as that is what lead to the initial deadlock in this area. If we've got someone attempting to park us, then we can't proceed without having them finish first. Fixes: 37d1e2e3 ("io_uring: move SQPOLL thread io-wq forked worker") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
io_uring_try_cancel_requests() matches not only current's requests, but also of other exiting tasks, so we need to actively cancel them and not just wait, especially since the function can be called on flush during do_exit() -> exit_files(). Even if it's not a problem for now, it's much nicer to know that the function tries to cancel everything it can. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If we fail to fork an SQPOLL worker, we can hit cancel, and hence attempted thread stop, with the thread already being stopped. Ensure we check for that. Also guard thread stop fully by the sqd mutex, just like we do for park. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 26 2月, 2021 4 次提交
-
-
由 Jens Axboe 提交于
Just like the changes for io-wq, ensure that we re-fork the SQPOLL thread if the owner execs. Mark the ctx sq thread as sqo_exec if it dies, and the ring as needing a wakeup which will force the task to enter the kernel. When it does, setup the new thread and proceed as usual. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
exec will cancel any threads, including the ones that io-wq is using. This isn't a problem, in fact we'd prefer it to be that way since it means we know that any async work cancels naturally without having to handle it proactively. But it does mean that we need to setup a new manager, as the manager and workers are gone. Handle this at queue time, and cancel work if we fail. Since the manager can go away without us noticing, ensure that the manager itself holds a reference to the 'wq' as well. Rename io_wq_destroy() to io_wq_put() to reflect that. In the future we can now simplify exec cancelation handling, for now just leave it the same. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
syzbot reports the following hang: INFO: task syz-executor.0:12538 can't die for more than 143 seconds. task:syz-executor.0 state:D stack:28352 pid:12538 ppid: 8423 flags:0x00004004 Call Trace: context_switch kernel/sched/core.c:4324 [inline] __schedule+0x90c/0x21a0 kernel/sched/core.c:5075 schedule+0xcf/0x270 kernel/sched/core.c:5154 schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868 do_wait_for_common kernel/sched/completion.c:85 [inline] __wait_for_common kernel/sched/completion.c:106 [inline] wait_for_common kernel/sched/completion.c:117 [inline] wait_for_completion+0x168/0x270 kernel/sched/completion.c:138 io_sq_thread_finish+0x96/0x580 fs/io_uring.c:7152 io_sq_offload_create fs/io_uring.c:7929 [inline] io_uring_create fs/io_uring.c:9465 [inline] io_uring_setup+0x1fb2/0x2c20 fs/io_uring.c:9550 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae which is due to exiting after the SQPOLL thread has been created, but hasn't been started yet. Ensure that we always complete the startup side when waiting for it to exit. Reported-by: syzbot+c927c937cba8ef66dd4a@syzkaller.appspotmail.com Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Before the io-wq thread change, we maintained a hash work map and lock per-node per-ring. That wasn't ideal, as we really wanted it to be per ring. But now that we have per-task workers, the hash map ends up being just per-task. That'll work just fine for the normal case of having one task use a ring, but if you share the ring between tasks, then it's considerably worse than it was before. Make the hash map per ctx instead, which provides full per-ctx buffered write serialization on hashed writes. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 25 2月, 2021 1 次提交
-
-
由 Jens Axboe 提交于
This reverts commit 88f171ab. I ran into a case where the ref resurrect now spins, so revert this change for now until we can further investigate why it's broken. The bug seems to indicate spinning on the lock itself, likely there's some ABBA deadlock involved: [<0>] __percpu_ref_switch_mode+0x45/0x180 [<0>] percpu_ref_resurrect+0x46/0x70 [<0>] io_refs_resurrect+0x25/0xa0 [<0>] __io_uring_register+0x135/0x10c0 [<0>] __x64_sys_io_uring_register+0xc2/0x1a0 [<0>] do_syscall_64+0x42/0x110 [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 24 2月, 2021 7 次提交
-
-
由 Jens Axboe 提交于
If the task ends up doing no IO, the context list is empty and we don't call into __io_uring_files_cancel() when the task exits. This can cause a leak of the io-wq structures. Ensure we always call __io_uring_files_cancel(), even if the task context list is empty. Fixes: 5aa75ed5 ("io_uring: tie async worker side to the task context") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
At this point we're only using it for memory accounting, so there's no need to have an extra ->limit_mem - we can just set ->user if we do the accounting, or leave it at NULL if we don't. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We're now just using fork like we would from userspace, so there's no need to try and impose extra restrictions or accounting on the user side of things. That's already being done for us. That also means we don't have to pass in the user_struct anymore, that's correctly inherited through ->creds on fork. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
A few reasons to do this: - The naming of the manager and worker have changed. That's a user visible change, so makes sense to flag it. - Opening certain files that use ->signal (like /proc/self or /dev/tty) now works, and the flag tells the application upfront that this is the case. - Related to the above, using signalfd will now work as well. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Don't forget to zero locked_free_nr, it's not a disaster but makes it attempting to flush it with extra locking when there is nothing in the list. Also, don't traverse a potentially long list freeing requests under spinlock, splice the list and do it afterwards. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If we're exiting the ring, just let the IO fail with -EAGAIN as nobody will care anyway. It's not the right context to reissue from. Cc: stable@vger.kernel.org Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Don't use a kthread for SQPOLL, use a forked worker just like the io-wq workers. With that done, we can drop the various context grabbing we do for SQPOLL, it already has everything it needs. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 22 2月, 2021 8 次提交
-
-
由 Pavel Begunkov 提交于
BUG: KASAN: double-free or invalid-free in io_req_caches_free.constprop.0+0x3ce/0x530 fs/io_uring.c:8709 Workqueue: events_unbound io_ring_exit_work Call Trace: [...] __cache_free mm/slab.c:3424 [inline] kmem_cache_free_bulk+0x4b/0x1b0 mm/slab.c:3744 io_req_caches_free.constprop.0+0x3ce/0x530 fs/io_uring.c:8709 io_ring_ctx_free fs/io_uring.c:8764 [inline] io_ring_exit_work+0x518/0x6b0 fs/io_uring.c:8846 process_one_work+0x98d/0x1600 kernel/workqueue.c:2275 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421 kthread+0x3b1/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Freed by task 11900: [...] kmem_cache_free_bulk+0x4b/0x1b0 mm/slab.c:3744 io_req_caches_free.constprop.0+0x3ce/0x530 fs/io_uring.c:8709 io_uring_flush+0x483/0x6e0 fs/io_uring.c:9237 filp_close+0xb4/0x170 fs/open.c:1286 close_files fs/file.c:403 [inline] put_files_struct fs/file.c:418 [inline] put_files_struct+0x1d0/0x350 fs/file.c:415 exit_files+0x7e/0xa0 fs/file.c:435 do_exit+0xc27/0x2ae0 kernel/exit.c:820 do_group_exit+0x125/0x310 kernel/exit.c:922 [...] io_req_caches_free() doesn't zero submit_state->free_reqs, so io_uring considers just freed requests to be good and sound and will reuse or double free them. Zero the counter. Reported-by: syzbot+30b4936dcdb3aafa4fb4@syzkaller.appspotmail.com Fixes: 41be53e9 ("io_uring: kill cached requests from exiting task closing the ring") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We are no longer grabbing state, so no need to maintain an IO identity that we COW if there are changes. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
The async workers are siblings of the task itself, so by definition we have all the state that we need. Remove any of the state grabbing that we have, and requests flagging what they need. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Instead of using regular kthread kernel threads, create kernel threads that are like a real thread that the task would create. This ensures that we get all the context that we need, without having to carry that state around. This greatly reduces the code complexity, and the risk of missing state for a given request type. With the move away from kthread, we can also dump everything related to assigned state to the new threads. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Move it outside of the io_ring_ctx, and tie it to the io_uring task context. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Moving towards making the io_wq per ring per task, so we can't really share it between rings. Which is fine, since we've now dropped some of that fat from it. Retain compatibility with how attaching works, so that any attempt to attach to an fd that doesn't exist, or isn't an io_uring fd, will fail like it did before. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We hit this case when the task is exiting, and we need somewhere to do background cleanup of requests. Instead of relying on the io-wq task manager to do this work for us, just stuff it somewhere where we can safely run it ourselves directly. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Do run task_work before io_uring_register(), that might make a first quiesce round much nicer. We generally do that for any syscall invocation to avoid spurious -EINTR/-ERESTARTSYS, for task_work that we generate. This patch brings io_uring_register() inline with the two other io_uring syscalls. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 2月, 2021 7 次提交
-
-
由 Pavel Begunkov 提交于
sqe->flags are subset of req flags, so incorrectly copied may span into in-kernel flags and wreck havoc, e.g. by setting REQ_F_INFLIGHT. Fixes: 5be9ad1e ("io_uring: optimise io_init_req() flags setting") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
There is a short window where percpu_refs are already turned zero, but we try to do resurrect(). Play nicer and wait for ->release() to happen in this case and proceed as everything is ok. One downside for ctx refs is that we can ignore signal_pending() on a rare occasion, but someone else should check for it later if needed. Cc: <stable@vger.kernel.org> # 5.5+ Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
io_rsrc_ref_quiesce() is a generic resource function, though now it was wired to allocate and initialise ref nodes with file-specific callbacks/etc. Keep it sane by passing in as a parameters everything we need for initialisations, otherwise it will hurt us badly one day. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
After a rsrc/files reference node's refs are killed, it must never be used. And that's how it works, it either assigns a new node or kills the whole data table. Let's explicitly NULL it, that shouldn't be necessary, but if something would go wrong I'd rather catch a NULL dereference to using a dangling pointer. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
With the prep and prep async split, we now have potentially 3 helpers that need to be defined for !CONFIG_NET. Add some helpers to do just that. Fixes the following compile error on !CONFIG_NET: fs/io_uring.c:6171:10: error: implicit declaration of function 'io_sendmsg_prep_async'; did you mean 'io_req_prep_async'? [-Werror=implicit-function-declaration] return io_sendmsg_prep_async(req); ^~~~~~~~~~~~~~~~~~~~~ io_req_prep_async Fixes: 93642ef8 ("io_uring: split sqe-prep and async setup") Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Hao Xu 提交于
Abaci reported the below issue: [ 141.400455] hrtimer: interrupt took 205853 ns [ 189.869316] process 'usr/local/ilogtail/ilogtail_0.16.26' started with executable stack [ 250.188042] [ 250.188327] ============================================ [ 250.189015] WARNING: possible recursive locking detected [ 250.189732] 5.11.0-rc4 #1 Not tainted [ 250.190267] -------------------------------------------- [ 250.190917] a.out/7363 is trying to acquire lock: [ 250.191506] ffff888114dbcbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __io_req_task_submit+0x29/0xa0 [ 250.192599] [ 250.192599] but task is already holding lock: [ 250.193309] ffff888114dbfbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_register+0xad/0x210 [ 250.194426] [ 250.194426] other info that might help us debug this: [ 250.195238] Possible unsafe locking scenario: [ 250.195238] [ 250.196019] CPU0 [ 250.196411] ---- [ 250.196803] lock(&ctx->uring_lock); [ 250.197420] lock(&ctx->uring_lock); [ 250.197966] [ 250.197966] *** DEADLOCK *** [ 250.197966] [ 250.198837] May be due to missing lock nesting notation [ 250.198837] [ 250.199780] 1 lock held by a.out/7363: [ 250.200373] #0: ffff888114dbfbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_register+0xad/0x210 [ 250.201645] [ 250.201645] stack backtrace: [ 250.202298] CPU: 0 PID: 7363 Comm: a.out Not tainted 5.11.0-rc4 #1 [ 250.203144] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 250.203887] Call Trace: [ 250.204302] dump_stack+0xac/0xe3 [ 250.204804] __lock_acquire+0xab6/0x13a0 [ 250.205392] lock_acquire+0x2c3/0x390 [ 250.205928] ? __io_req_task_submit+0x29/0xa0 [ 250.206541] __mutex_lock+0xae/0x9f0 [ 250.207071] ? __io_req_task_submit+0x29/0xa0 [ 250.207745] ? 0xffffffffa0006083 [ 250.208248] ? __io_req_task_submit+0x29/0xa0 [ 250.208845] ? __io_req_task_submit+0x29/0xa0 [ 250.209452] ? __io_req_task_submit+0x5/0xa0 [ 250.210083] __io_req_task_submit+0x29/0xa0 [ 250.210687] io_async_task_func+0x23d/0x4c0 [ 250.211278] task_work_run+0x89/0xd0 [ 250.211884] io_run_task_work_sig+0x50/0xc0 [ 250.212464] io_sqe_files_unregister+0xb2/0x1f0 [ 250.213109] __io_uring_register+0x115a/0x1750 [ 250.213718] ? __x64_sys_io_uring_register+0xad/0x210 [ 250.214395] ? __fget_files+0x15a/0x260 [ 250.214956] __x64_sys_io_uring_register+0xbe/0x210 [ 250.215620] ? trace_hardirqs_on+0x46/0x110 [ 250.216205] do_syscall_64+0x2d/0x40 [ 250.216731] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 250.217455] RIP: 0033:0x7f0fa17e5239 [ 250.218034] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48 [ 250.220343] RSP: 002b:00007f0fa1eeac48 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab [ 250.221360] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0fa17e5239 [ 250.222272] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000008 [ 250.223185] RBP: 00007f0fa1eeae20 R08: 0000000000000000 R09: 0000000000000000 [ 250.224091] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 250.224999] R13: 0000000000021000 R14: 0000000000000000 R15: 00007f0fa1eeb700 This is caused by calling io_run_task_work_sig() to do work under uring_lock while the caller io_sqe_files_unregister() already held uring_lock. To fix this issue, briefly drop uring_lock when calling io_run_task_work_sig(), and there are two things to concern: - hold uring_lock in io_ring_ctx_free() around io_sqe_files_unregister() this is for consistency of lock/unlock. - add new fixed rsrc ref node before dropping uring_lock it's not safe to do io_uring_enter-->percpu_ref_get() with a dying one. - check if rsrc_data->refs is dying to avoid parallel io_sqe_files_unregister Reported-by: NAbaci <abaci@linux.alibaba.com> Fixes: 1ffc5422 ("io_uring: fix io_sqe_files_unregister() hangs") Suggested-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NHao Xu <haoxu@linux.alibaba.com> [axboe: fixes from Pavel folded in] Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
In case of failure io_wq_submit_work() needs to post an CQE and so potentially take uring_lock. The safest way to deal with it is to do that from under task_work where we can safely take the lock. Also, as io_iopoll_check() holds the lock tight and releases it reluctantly, it will play nicer in the furuter with notifying an iopolling task about new such pending failed requests. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 19 2月, 2021 5 次提交
-
-
由 Pavel Begunkov 提交于
[ 97.866748] a.out/2890 is trying to acquire lock: [ 97.867829] ffff8881046763e8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_wq_submit_work+0x155/0x240 [ 97.869735] [ 97.869735] but task is already holding lock: [ 97.871033] ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0 [ 97.873074] [ 97.873074] other info that might help us debug this: [ 97.874520] Possible unsafe locking scenario: [ 97.874520] [ 97.875845] CPU0 [ 97.876440] ---- [ 97.877048] lock(&ctx->uring_lock); [ 97.877961] lock(&ctx->uring_lock); [ 97.878881] [ 97.878881] *** DEADLOCK *** [ 97.878881] [ 97.880341] May be due to missing lock nesting notation [ 97.880341] [ 97.881952] 1 lock held by a.out/2890: [ 97.882873] #0: ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0 [ 97.885108] [ 97.885108] stack backtrace: [ 97.890457] Call Trace: [ 97.891121] dump_stack+0xac/0xe3 [ 97.891972] __lock_acquire+0xab6/0x13a0 [ 97.892940] lock_acquire+0x2c3/0x390 [ 97.894894] __mutex_lock+0xae/0x9f0 [ 97.901101] io_wq_submit_work+0x155/0x240 [ 97.902112] io_wq_cancel_cb+0x162/0x490 [ 97.904126] io_async_find_and_cancel+0x3b/0x140 [ 97.905247] io_issue_sqe+0x86d/0x13e0 [ 97.909122] __io_queue_sqe+0x10b/0x550 [ 97.913971] io_queue_sqe+0x235/0x470 [ 97.914894] io_submit_sqes+0xcce/0xf10 [ 97.917872] __x64_sys_io_uring_enter+0x3fb/0x5b0 [ 97.921424] do_syscall_64+0x2d/0x40 [ 97.922329] entry_SYSCALL_64_after_hwframe+0x44/0xa9 While holding uring_lock, e.g. from inline execution, async cancel request may attempt cancellations through io_wq_submit_work, which may try to grab a lock. Delay it to task_work, so we do it from a clean context and don't have to worry about locking. Cc: <stable@vger.kernel.org> # 5.5+ Fixes: c07e6719 ("io_uring: hold uring_lock while completing failed polled io in io_wq_submit_work()") Reported-by: NAbaci <abaci@linux.alibaba.com> Reported-by: NHao Xu <haoxu@linux.alibaba.com> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Instead of marking a link with REQ_F_FAIL_LINK on an error and delaying its failing to the caller, do it eagerly right when after getting an error in io_submit_sqe(). This renders FAIL_LINK checks in io_queue_link_head() useless and we can skip it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Now, as we can do async setup without holding an SQE, we can skip doing io_req_defer_prep() for link heads, it will be tried to be executed inline and follows all the rules of the non-linked requests. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Now as preparations are split from async setup, we can do the first one pretty early not spilling it across multiple call sites. And after it's done SQE is not needed anymore and we can save on passing it deeply into the submission stack. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
There are two kinds of opcode-specific preparations we do. The first is just initialising req with what is always needed for an opcode and reading all non-generic SQE fields. And the second is copying some of the stuff like iovec preparing to punt a request to somewhere async, e.g. to io-wq or for draining. For requests that have tried an inline execution but still needing to be punted, the second prep type is done by the opcode handler itself. Currently, we don't explicitly split those preparation steps, but combining both of them into io_*_prep(), altering the behaviour by allocating ->async_data. That's pretty messy and hard to follow and also gets in the way of some optimisations. Split the steps, leave the first type as where it is now, and put the second into a new io_req_prep_async() helper. It may make us to do opcode switch twice, but it's worth it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-