- 26 2月, 2021 3 次提交
-
-
由 Jens Axboe 提交于
exec will cancel any threads, including the ones that io-wq is using. This isn't a problem, in fact we'd prefer it to be that way since it means we know that any async work cancels naturally without having to handle it proactively. But it does mean that we need to setup a new manager, as the manager and workers are gone. Handle this at queue time, and cancel work if we fail. Since the manager can go away without us noticing, ensure that the manager itself holds a reference to the 'wq' as well. Rename io_wq_destroy() to io_wq_put() to reflect that. In the future we can now simplify exec cancelation handling, for now just leave it the same. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
syzbot reports the following hang: INFO: task syz-executor.0:12538 can't die for more than 143 seconds. task:syz-executor.0 state:D stack:28352 pid:12538 ppid: 8423 flags:0x00004004 Call Trace: context_switch kernel/sched/core.c:4324 [inline] __schedule+0x90c/0x21a0 kernel/sched/core.c:5075 schedule+0xcf/0x270 kernel/sched/core.c:5154 schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868 do_wait_for_common kernel/sched/completion.c:85 [inline] __wait_for_common kernel/sched/completion.c:106 [inline] wait_for_common kernel/sched/completion.c:117 [inline] wait_for_completion+0x168/0x270 kernel/sched/completion.c:138 io_sq_thread_finish+0x96/0x580 fs/io_uring.c:7152 io_sq_offload_create fs/io_uring.c:7929 [inline] io_uring_create fs/io_uring.c:9465 [inline] io_uring_setup+0x1fb2/0x2c20 fs/io_uring.c:9550 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae which is due to exiting after the SQPOLL thread has been created, but hasn't been started yet. Ensure that we always complete the startup side when waiting for it to exit. Reported-by: syzbot+c927c937cba8ef66dd4a@syzkaller.appspotmail.com Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Before the io-wq thread change, we maintained a hash work map and lock per-node per-ring. That wasn't ideal, as we really wanted it to be per ring. But now that we have per-task workers, the hash map ends up being just per-task. That'll work just fine for the normal case of having one task use a ring, but if you share the ring between tasks, then it's considerably worse than it was before. Make the hash map per ctx instead, which provides full per-ctx buffered write serialization on hashed writes. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 24 2月, 2021 5 次提交
-
-
由 Jens Axboe 提交于
If the task ends up doing no IO, the context list is empty and we don't call into __io_uring_files_cancel() when the task exits. This can cause a leak of the io-wq structures. Ensure we always call __io_uring_files_cancel(), even if the task context list is empty. Fixes: 5aa75ed5 ("io_uring: tie async worker side to the task context") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
At this point we're only using it for memory accounting, so there's no need to have an extra ->limit_mem - we can just set ->user if we do the accounting, or leave it at NULL if we don't. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We're now just using fork like we would from userspace, so there's no need to try and impose extra restrictions or accounting on the user side of things. That's already being done for us. That also means we don't have to pass in the user_struct anymore, that's correctly inherited through ->creds on fork. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
A few reasons to do this: - The naming of the manager and worker have changed. That's a user visible change, so makes sense to flag it. - Opening certain files that use ->signal (like /proc/self or /dev/tty) now works, and the flag tells the application upfront that this is the case. - Related to the above, using signalfd will now work as well. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Don't use a kthread for SQPOLL, use a forked worker just like the io-wq workers. With that done, we can drop the various context grabbing we do for SQPOLL, it already has everything it needs. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 22 2月, 2021 7 次提交
-
-
由 Jens Axboe 提交于
We are no longer grabbing state, so no need to maintain an IO identity that we COW if there are changes. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
The async workers are siblings of the task itself, so by definition we have all the state that we need. Remove any of the state grabbing that we have, and requests flagging what they need. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Instead of using regular kthread kernel threads, create kernel threads that are like a real thread that the task would create. This ensures that we get all the context that we need, without having to carry that state around. This greatly reduces the code complexity, and the risk of missing state for a given request type. With the move away from kthread, we can also dump everything related to assigned state to the new threads. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Move it outside of the io_ring_ctx, and tie it to the io_uring task context. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Moving towards making the io_wq per ring per task, so we can't really share it between rings. Which is fine, since we've now dropped some of that fat from it. Retain compatibility with how attaching works, so that any attempt to attach to an fd that doesn't exist, or isn't an io_uring fd, will fail like it did before. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We hit this case when the task is exiting, and we need somewhere to do background cleanup of requests. Instead of relying on the io-wq task manager to do this work for us, just stuff it somewhere where we can safely run it ourselves directly. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Do run task_work before io_uring_register(), that might make a first quiesce round much nicer. We generally do that for any syscall invocation to avoid spurious -EINTR/-ERESTARTSYS, for task_work that we generate. This patch brings io_uring_register() inline with the two other io_uring syscalls. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 2月, 2021 7 次提交
-
-
由 Pavel Begunkov 提交于
sqe->flags are subset of req flags, so incorrectly copied may span into in-kernel flags and wreck havoc, e.g. by setting REQ_F_INFLIGHT. Fixes: 5be9ad1e ("io_uring: optimise io_init_req() flags setting") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
There is a short window where percpu_refs are already turned zero, but we try to do resurrect(). Play nicer and wait for ->release() to happen in this case and proceed as everything is ok. One downside for ctx refs is that we can ignore signal_pending() on a rare occasion, but someone else should check for it later if needed. Cc: <stable@vger.kernel.org> # 5.5+ Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
io_rsrc_ref_quiesce() is a generic resource function, though now it was wired to allocate and initialise ref nodes with file-specific callbacks/etc. Keep it sane by passing in as a parameters everything we need for initialisations, otherwise it will hurt us badly one day. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
After a rsrc/files reference node's refs are killed, it must never be used. And that's how it works, it either assigns a new node or kills the whole data table. Let's explicitly NULL it, that shouldn't be necessary, but if something would go wrong I'd rather catch a NULL dereference to using a dangling pointer. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
With the prep and prep async split, we now have potentially 3 helpers that need to be defined for !CONFIG_NET. Add some helpers to do just that. Fixes the following compile error on !CONFIG_NET: fs/io_uring.c:6171:10: error: implicit declaration of function 'io_sendmsg_prep_async'; did you mean 'io_req_prep_async'? [-Werror=implicit-function-declaration] return io_sendmsg_prep_async(req); ^~~~~~~~~~~~~~~~~~~~~ io_req_prep_async Fixes: 93642ef8 ("io_uring: split sqe-prep and async setup") Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Hao Xu 提交于
Abaci reported the below issue: [ 141.400455] hrtimer: interrupt took 205853 ns [ 189.869316] process 'usr/local/ilogtail/ilogtail_0.16.26' started with executable stack [ 250.188042] [ 250.188327] ============================================ [ 250.189015] WARNING: possible recursive locking detected [ 250.189732] 5.11.0-rc4 #1 Not tainted [ 250.190267] -------------------------------------------- [ 250.190917] a.out/7363 is trying to acquire lock: [ 250.191506] ffff888114dbcbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __io_req_task_submit+0x29/0xa0 [ 250.192599] [ 250.192599] but task is already holding lock: [ 250.193309] ffff888114dbfbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_register+0xad/0x210 [ 250.194426] [ 250.194426] other info that might help us debug this: [ 250.195238] Possible unsafe locking scenario: [ 250.195238] [ 250.196019] CPU0 [ 250.196411] ---- [ 250.196803] lock(&ctx->uring_lock); [ 250.197420] lock(&ctx->uring_lock); [ 250.197966] [ 250.197966] *** DEADLOCK *** [ 250.197966] [ 250.198837] May be due to missing lock nesting notation [ 250.198837] [ 250.199780] 1 lock held by a.out/7363: [ 250.200373] #0: ffff888114dbfbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_register+0xad/0x210 [ 250.201645] [ 250.201645] stack backtrace: [ 250.202298] CPU: 0 PID: 7363 Comm: a.out Not tainted 5.11.0-rc4 #1 [ 250.203144] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 250.203887] Call Trace: [ 250.204302] dump_stack+0xac/0xe3 [ 250.204804] __lock_acquire+0xab6/0x13a0 [ 250.205392] lock_acquire+0x2c3/0x390 [ 250.205928] ? __io_req_task_submit+0x29/0xa0 [ 250.206541] __mutex_lock+0xae/0x9f0 [ 250.207071] ? __io_req_task_submit+0x29/0xa0 [ 250.207745] ? 0xffffffffa0006083 [ 250.208248] ? __io_req_task_submit+0x29/0xa0 [ 250.208845] ? __io_req_task_submit+0x29/0xa0 [ 250.209452] ? __io_req_task_submit+0x5/0xa0 [ 250.210083] __io_req_task_submit+0x29/0xa0 [ 250.210687] io_async_task_func+0x23d/0x4c0 [ 250.211278] task_work_run+0x89/0xd0 [ 250.211884] io_run_task_work_sig+0x50/0xc0 [ 250.212464] io_sqe_files_unregister+0xb2/0x1f0 [ 250.213109] __io_uring_register+0x115a/0x1750 [ 250.213718] ? __x64_sys_io_uring_register+0xad/0x210 [ 250.214395] ? __fget_files+0x15a/0x260 [ 250.214956] __x64_sys_io_uring_register+0xbe/0x210 [ 250.215620] ? trace_hardirqs_on+0x46/0x110 [ 250.216205] do_syscall_64+0x2d/0x40 [ 250.216731] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 250.217455] RIP: 0033:0x7f0fa17e5239 [ 250.218034] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48 [ 250.220343] RSP: 002b:00007f0fa1eeac48 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab [ 250.221360] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0fa17e5239 [ 250.222272] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000008 [ 250.223185] RBP: 00007f0fa1eeae20 R08: 0000000000000000 R09: 0000000000000000 [ 250.224091] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 250.224999] R13: 0000000000021000 R14: 0000000000000000 R15: 00007f0fa1eeb700 This is caused by calling io_run_task_work_sig() to do work under uring_lock while the caller io_sqe_files_unregister() already held uring_lock. To fix this issue, briefly drop uring_lock when calling io_run_task_work_sig(), and there are two things to concern: - hold uring_lock in io_ring_ctx_free() around io_sqe_files_unregister() this is for consistency of lock/unlock. - add new fixed rsrc ref node before dropping uring_lock it's not safe to do io_uring_enter-->percpu_ref_get() with a dying one. - check if rsrc_data->refs is dying to avoid parallel io_sqe_files_unregister Reported-by: NAbaci <abaci@linux.alibaba.com> Fixes: 1ffc5422 ("io_uring: fix io_sqe_files_unregister() hangs") Suggested-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NHao Xu <haoxu@linux.alibaba.com> [axboe: fixes from Pavel folded in] Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
In case of failure io_wq_submit_work() needs to post an CQE and so potentially take uring_lock. The safest way to deal with it is to do that from under task_work where we can safely take the lock. Also, as io_iopoll_check() holds the lock tight and releases it reluctantly, it will play nicer in the furuter with notifying an iopolling task about new such pending failed requests. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 19 2月, 2021 12 次提交
-
-
由 Pavel Begunkov 提交于
[ 97.866748] a.out/2890 is trying to acquire lock: [ 97.867829] ffff8881046763e8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_wq_submit_work+0x155/0x240 [ 97.869735] [ 97.869735] but task is already holding lock: [ 97.871033] ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0 [ 97.873074] [ 97.873074] other info that might help us debug this: [ 97.874520] Possible unsafe locking scenario: [ 97.874520] [ 97.875845] CPU0 [ 97.876440] ---- [ 97.877048] lock(&ctx->uring_lock); [ 97.877961] lock(&ctx->uring_lock); [ 97.878881] [ 97.878881] *** DEADLOCK *** [ 97.878881] [ 97.880341] May be due to missing lock nesting notation [ 97.880341] [ 97.881952] 1 lock held by a.out/2890: [ 97.882873] #0: ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0 [ 97.885108] [ 97.885108] stack backtrace: [ 97.890457] Call Trace: [ 97.891121] dump_stack+0xac/0xe3 [ 97.891972] __lock_acquire+0xab6/0x13a0 [ 97.892940] lock_acquire+0x2c3/0x390 [ 97.894894] __mutex_lock+0xae/0x9f0 [ 97.901101] io_wq_submit_work+0x155/0x240 [ 97.902112] io_wq_cancel_cb+0x162/0x490 [ 97.904126] io_async_find_and_cancel+0x3b/0x140 [ 97.905247] io_issue_sqe+0x86d/0x13e0 [ 97.909122] __io_queue_sqe+0x10b/0x550 [ 97.913971] io_queue_sqe+0x235/0x470 [ 97.914894] io_submit_sqes+0xcce/0xf10 [ 97.917872] __x64_sys_io_uring_enter+0x3fb/0x5b0 [ 97.921424] do_syscall_64+0x2d/0x40 [ 97.922329] entry_SYSCALL_64_after_hwframe+0x44/0xa9 While holding uring_lock, e.g. from inline execution, async cancel request may attempt cancellations through io_wq_submit_work, which may try to grab a lock. Delay it to task_work, so we do it from a clean context and don't have to worry about locking. Cc: <stable@vger.kernel.org> # 5.5+ Fixes: c07e6719 ("io_uring: hold uring_lock while completing failed polled io in io_wq_submit_work()") Reported-by: NAbaci <abaci@linux.alibaba.com> Reported-by: NHao Xu <haoxu@linux.alibaba.com> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Instead of marking a link with REQ_F_FAIL_LINK on an error and delaying its failing to the caller, do it eagerly right when after getting an error in io_submit_sqe(). This renders FAIL_LINK checks in io_queue_link_head() useless and we can skip it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Now, as we can do async setup without holding an SQE, we can skip doing io_req_defer_prep() for link heads, it will be tried to be executed inline and follows all the rules of the non-linked requests. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Now as preparations are split from async setup, we can do the first one pretty early not spilling it across multiple call sites. And after it's done SQE is not needed anymore and we can save on passing it deeply into the submission stack. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
There are two kinds of opcode-specific preparations we do. The first is just initialising req with what is always needed for an opcode and reading all non-generic SQE fields. And the second is copying some of the stuff like iovec preparing to punt a request to somewhere async, e.g. to io-wq or for draining. For requests that have tried an inline execution but still needing to be punted, the second prep type is done by the opcode handler itself. Currently, we don't explicitly split those preparation steps, but combining both of them into io_*_prep(), altering the behaviour by allocating ->async_data. That's pretty messy and hard to follow and also gets in the way of some optimisations. Split the steps, leave the first type as where it is now, and put the second into a new io_req_prep_async() helper. It may make us to do opcode switch twice, but it's worth it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
If we get an error in io_init_req() for a request that would have been linked, we break the submission but still issue a partially composed link, that's nasty, fail it instead. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Move struct io_submit_link into submit_state, which is a part of a submission state and so belongs to it. It saves us from explicitly passing it, and init/deinit is now nicely hidden in io_submit_state_[start,end]. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Behaves identically, just move io_init_req() call into the beginning of io_submit_sqes(). That looks better unloads io_submit_sqes(). Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
A preparation patch, symbol to symbol move io_init_req() + io_check_restriction() a bit up. The submission path is pretty settled down, so don't worry about backports and move the functions instead of relying on forward declarations in the future. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
IORING_OP_SYNC_FILE_RANGE is marked as .needs_file, so the common path will take care of assigning and validating req->file, no need to duplicate it in io_sfr_prep(). Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Follow io_*_prep() naming pattern, there are only fsync and sfr that don't do that. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
@i and @Submitted are very much coupled together, and there is no need to keep them both. Remove @i, it doesn't change generated binary but helps to keep a single source of truth. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 2月, 2021 1 次提交
-
-
由 Pavel Begunkov 提交于
Don't forget to free iovec read inline completion and bunch of other cases that do "goto done" before setting up an async context. Fixes: 5ea5dd45 ("io_uring: inline io_read()'s iovec freeing") Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 17 2月, 2021 1 次提交
-
-
由 Jens Axboe 提交于
We add task_work from any context, hence we need to ensure that we can tolerate it being from IRQ context as well. Fixes: 7cbf1722 ("io_uring: provide FIFO ordering for task_work") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 14 2月, 2021 3 次提交
-
-
由 Jens Axboe 提交于
Be nice and prune these upfront, in case the ring is being shared and one of the tasks is going away. This is a bit more important now that we account the allocations. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We have three different ones, put it in a helper for easy calling. This is in preparation for doing it outside of ring freeing as well. With that in mind, also ensure that we do the proper locking for safe calling from a context where the ring it still live. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
No changes in this patch, just allows a caller to pass in a targeted task that we must match for freeing requests in the cache. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 13 2月, 2021 1 次提交
-
-
由 Pavel Begunkov 提交于
Invalid req->flags are tolerated by free/put well, avoid this dancing needlessly presetting it to zero, and then not even resetting but modifying it, i.e. "|=". Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-