- 24 8月, 2021 32 次提交
-
-
由 Jens Axboe 提交于
This is in preparation to making the completion lock work outside of hard/soft IRQ context. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
This is in preparation to making the completion lock work outside of hard/soft IRQ context. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
This is in preparation to making the completion lock work outside of hard/soft IRQ context. Add a timeout_lock to handle the ordering of timeout completions or cancelations with the timeouts actually triggering. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
For requests with non-fixed files, instead of grabbing just one reference, we get by the number of left requests, so the following requests using the same file can take it without atomics. However, it's not all win. If there is one request in the middle not using files or having a fixed file, we'll need to put back the left references. Even worse if an application submits requests dealing with different files, it will do a put for each new request, so doubling the number of atomics needed. Also, even if not used, it's still takes some cycles in the submission path. If a file used many times, it rather makes sense to pre-register it, if not, we may fall in the described pitfall. So, this optimisation is a matter of use case. Go with the simpliest code-wise way, remove it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
After recent fixes, tctx_task_work() always does proper spinlocking before looking into ->task_list, so now we don't need atomics for ->task_state, replace it with non-atomic task_running using the critical section. Tide it up, combine two separate block with spinlocking, and always try to splice in there, so we do less locking when new requests are arriving during the function execution. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> [axboe: fix missing ->task_running reset on task_work_add() failure] Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Inline io_poll_remove_waitqs() into its only user and clean it up. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2f1a91a19ffcd591531dc4c61e2f11c64a2d6a6d.1628536684.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Unlike __io_cqring_overflow_flush(), nobody does forced flushing with io_cqring_overflow_flush(), so removed the argument from it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7594f869ca41b7cfb5a35a3c7c2d402242834e9e.1628536684.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Inline struct io_comp_state into struct io_submit_state. They are already coupled tightly, together with mixed responsibilities it only brings confusion having them separately. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e55bba77426b399e3a2e54e3c6c267c6a0fc4b57.1628536684.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
req->compl.list is used to cache freed requests, and so can't overlap in time with req->inflight_entry. So, use inflight_entry to link requests and remove compl.list. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e430e79d22d70a190d718831bda7bfed1daf8976.1628536684.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
We don't use @tsk argument of io_req_cache_free(), remove it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6a28b4a58ee0aaf0db98e2179b9c9f06f9b0cca1.1628536684.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Don't kfree requests in __io_free_req() but put them back into the internal request cache. That makes allocations more sustainable and will be used for refcounting optimisations. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9f4950fbe7771c8d41799366d0a3a08ac3040236.1628536684.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Move io_fallback_req_func() to kill yet another forward declaration. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d0a8f9d9a0057ed761d6237167d51c9378798d2d.1628536684.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
We cache all the reference to task + tctx, so if io_put_task() is called by the corresponding task itself, we can save on atomics and return the refs right back into the cache. It's beneficial for all inline completions, and also iopolling, when polling and submissions are done by the same task, including SQPOLL|IOPOLL. Note: io_uring_cancel_generic() can return refs to the cache as well, so those should be flushed in the loop for tctx_inflight() to work right. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6fe9646b3cb70e46aca1f58426776e368c8926b3.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
In case of on-exec io_uring cancellations, tasks already wait for all submitted requests to get completed/cancelled, so we don't need to check for ->in_execve separately. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/be8707049f10df9d20ca03dc4ca3316239b5e8e0.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
IO_IOPOLL_BATCH is not used, delete it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b2bdf19dbee2c9fc8865bbab9412135a14e24a64.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
If io_ring_exit_work() can't get it done in 5 minutes, something is going very wrong, don't keep spinning at HZ / 20 rate, it doesn't help and it may take much of CPU time if there is a lot of workers stuck as such. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9e2d1ca81d569f6bc628af1a42ff6663bff7ce9c.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Move IORING_SETUP_IOPOLL check into __io_openat_prep(), so both openat and openat2 reuse it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9a73ce83e4ee60d011180ef177eecef8e87ff2a2.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Inline io_free_req_deferred(), there is no reason to keep it separated. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ce04b7180d4eac0d69dd00677b227eefe80c2cc5.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Move the function together with io_rsrc_node_ref_zero() in the source file as it is to get rid of forward declarations. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4d81f6f833e7d017860b24463a9a68b14a8a5ed2.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Move the function in the source file as it is to get rid of forward declarations. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/33d917d69e4206557c75a5b98fe22bcdf77ce47d.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Refactor __io_uring_register() by extracting a helper responsible for ctx queisce. Looks better and will make it easier to add more optimisations. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0339e0027504176be09237eefa7945bf9a6f153d.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Turns out we always init struct io_wait_queue in io_cqring_wait(), even if it's not used after, i.e. there are already enough of CQEs. And often it's exactly what happens, for instance, requests may have been completed inline, or in case of io_uring_enter(submit=N, wait=1). It shows up in my profiler, so optimise it by delaying the struct init. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6f1b81c60b947d165583dc333947869c3d85d037.1628471125.git.asml.silence@gmail.com [axboe: fixed up for new cqring wait] Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Add more annotations for submission path functions holding ->uring_lock. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/128ec4185e26fbd661dd3a424aa66108ee8ff951.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
IOPOLL users should care more about getting completions for requests they submitted, but not in "device did/completed something". Currently, io_do_iopoll() may return a positive number, which will instruct io_iopoll_check() to break the loop and end the syscall, even if there is not enough CQEs or none at all. Don't return positive numbers, so io_iopoll_check() exits only when it gets an actual error, need reschedule or got enough CQEs. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/641a88f751623b6758303b3171f0a4141f06726e.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Replace the main if of io_flush_cached_reqs() with inverted condition + goto, so all the cases are handled in the same way. And also extract io_preinit_req() to make it cleaner and easier to refer to. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1abcba1f7b55dc53bf1dbe95036e345ffb1d5b01.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
We prefer nornal task_works even if it would fail requests inside. Kill a PF_EXITING check in io_req_task_work_add(), task_work_add() handles well dying tasks, i.e. return error when can't enqueue due to late stages of do_exit(). Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fc14297e8441cd8f5d1743a2488cf0df09bf48ac.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Move io-wq callbacks closer to each other, so it's easier to work with them, and rename io_free_work() into io_wq_free_work() for consistency. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/851bbc7f0f86f206d8c1333efee8bcb9c26e419f.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
If we use fixed files, we can be sure (almost) that REQ_F_ISREG is set. However, for non-reg files io_prep_rw() still will look into inode to double check, and that's expensive and can be avoided. The only caveat is that it only currently works with 64+ bit architectures, see FFS_ISREG, so we should consider that. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0a62780c491ca2522cd52db4ae3f16e03aafed0f.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
io_file_supports_async() checks whether a file supports nowait operations, so "async" in the name is misleading. Rename it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/33d55b5ce43aa1884c637c1957f1e30d30dc3bec.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Optimise io_file_get() with registered files, which is in a hot path, by inlining parts of the function. Saves a function call, and inefficiencies of passing arguments, e.g. evaluating (sqe_flags & IOSQE_FIXED_FILE). It couldn't have been done before as compilers were refusing to inline it because of the function size. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/52115cd6ce28f33bd0923149c0e6cb611084a0b1.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Instead of hand-coded two-level tables for registered files, allocate them with kvmalloc(). In many cases small enough tables are enough, and so can be kmalloc()'ed removing an extra memory load and a bunch of bit logic instructions from the hot path. If the table is larger, we trade off all the pros with a TLB-assisted memory lookup. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/280421d3b48775dabab773006bb5588c7b2dabc0.1628471125.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Currently we only wake the first waiter, even if we have enough entries posted to satisfy multiple waiters. Improve that situation so that every waiter knows how much the CQ tail has to advance before they can be safely woken up. With this change, if we have N waiters each asking for 1 event and we get 4 completions, then we wake up 4 waiters. If we have N waiters asking for 2 completions and we get 4 completions, then we wake up the first two. Previously, only the first waiter would've been woken up. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 8月, 2021 1 次提交
-
-
由 Jens Axboe 提交于
We currently check for ret != 0 to indicate error, but '1' is a valid return and just indicates that the allocation succeeded with a wrap. Correct the check to be for < 0, like it was before the xarray conversion. Cc: stable@vger.kernel.org Fixes: 61cf9370 ("io_uring: Convert personality_idr to XArray") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 8月, 2021 1 次提交
-
-
由 Pavel Begunkov 提交于
Pin ring in io_fallback_req_func() by briefly elevating ctx->refs in case any task_work handler touches ctx after releasing a request. Fixes: 9011bf9a ("io_uring: fix stuck fallback reqs") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/833a494713d235ec144284a9bbfe418df4f6b61c.1629235576.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 8月, 2021 1 次提交
-
-
由 Jens Axboe 提交于
If an SQPOLL based ring is newly created and an application issues an io_uring_enter(2) system call on it, then we can return a spurious -EOWNERDEAD error. This happens because there's nothing to submit, and if the caller doesn't specify any other action, the initial error assignment of -EOWNERDEAD never gets overwritten. This causes us to return it directly, even if it isn't valid. Move the error assignment into the actual failure case instead. Cc: stable@vger.kernel.org Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death") Reported-by: Sherlock Holo sherlockya@gmail.com Link: https://github.com/axboe/liburing/issues/413Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 10 8月, 2021 3 次提交
-
-
由 Pavel Begunkov 提交于
__io_rsrc_put_work() might need ->uring_lock, so nobody should wait for rsrc nodes holding the mutex. However, that's exactly what io_ring_ctx_free() does with io_wait_rsrc_data(). Split it into rsrc wait + dealloc, and move the first one out of the lock. Cc: stable@vger.kernel.org Fixes: b60c8dce ("io_uring: preparation for rsrc tagging") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0130c5c2693468173ec1afab714e0885d2c9c363.1628559783.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Ammar reports that he's seeing a lockdep splat on running test/rsrc_tags from the regression suite: ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc3-bluetea-test-00249-gc7d10223 #5 Tainted: G OE ------------------------------------------------------ kworker/2:4/2684 is trying to acquire lock: ffff88814bb1c0a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_rsrc_put_work+0x13d/0x1a0 but task is already holding lock: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}: __flush_work+0x31b/0x490 io_rsrc_ref_quiesce.part.0.constprop.0+0x35/0xb0 __do_sys_io_uring_register+0x45b/0x1060 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&ctx->uring_lock){+.+.}-{3:3}: __lock_acquire+0x119a/0x1e10 lock_acquire+0xc8/0x2f0 __mutex_lock+0x86/0x740 io_rsrc_put_work+0x13d/0x1a0 process_one_work+0x236/0x530 worker_thread+0x52/0x3b0 kthread+0x135/0x160 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&(&ctx->rsrc_put_work)->work)); lock(&ctx->uring_lock); lock((work_completion)(&(&ctx->rsrc_put_work)->work)); lock(&ctx->uring_lock); *** DEADLOCK *** 2 locks held by kworker/2:4/2684: #0: ffff88810004d938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 #1: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 stack backtrace: CPU: 2 PID: 2684 Comm: kworker/2:4 Tainted: G OE 5.14.0-rc3-bluetea-test-00249-gc7d10223 #5 Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015 Workqueue: events io_rsrc_put_work Call Trace: dump_stack_lvl+0x6a/0x9a check_noncircular+0xfe/0x110 __lock_acquire+0x119a/0x1e10 lock_acquire+0xc8/0x2f0 ? io_rsrc_put_work+0x13d/0x1a0 __mutex_lock+0x86/0x740 ? io_rsrc_put_work+0x13d/0x1a0 ? io_rsrc_put_work+0x13d/0x1a0 ? io_rsrc_put_work+0x13d/0x1a0 ? process_one_work+0x1ce/0x530 io_rsrc_put_work+0x13d/0x1a0 process_one_work+0x236/0x530 worker_thread+0x52/0x3b0 ? process_one_work+0x530/0x530 kthread+0x135/0x160 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 which is due to holding the ctx->uring_lock when flushing existing pending work, while the pending work flushing may need to grab the uring lock if we're using IOPOLL. Fix this by dropping the uring_lock a bit earlier as part of the flush. Cc: stable@vger.kernel.org Link: https://github.com/axboe/liburing/issues/404Tested-by: NAmmar Faizi <ammarfaizi2@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Nadav reports running into the below splat on re-enabling softirqs: WARNING: CPU: 2 PID: 1777 at kernel/softirq.c:364 __local_bh_enable_ip+0xaa/0xe0 Modules linked in: CPU: 2 PID: 1777 Comm: umem Not tainted 5.13.1+ #161 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020 RIP: 0010:__local_bh_enable_ip+0xaa/0xe0 Code: a9 00 ff ff 00 74 38 65 ff 0d a2 21 8c 7a e8 ed 1a 20 00 fb 66 0f 1f 44 00 00 5b 41 5c 5d c3 65 8b 05 e6 2d 8c 7a 85 c0 75 9a <0f> 0b eb 96 e8 2d 1f 20 00 eb a5 4c 89 e7 e8 73 4f 0c 00 eb ae 65 RSP: 0018:ffff88812e58fcc8 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000201 RCX: dffffc0000000000 RDX: 0000000000000007 RSI: 0000000000000201 RDI: ffffffff8898c5ac RBP: ffff88812e58fcd8 R08: ffffffff8575dbbf R09: ffffed1028ef14f9 R10: ffff88814778a7c3 R11: ffffed1028ef14f8 R12: ffffffff85c9e9ae R13: ffff88814778a000 R14: ffff88814778a7b0 R15: ffff8881086db890 FS: 00007fbcfee17700(0000) GS:ffff8881e0300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c0402a5008 CR3: 000000011c1ac003 CR4: 00000000003706e0 Call Trace: _raw_spin_unlock_bh+0x31/0x40 io_rsrc_node_ref_zero+0x13e/0x190 io_dismantle_req+0x215/0x220 io_req_complete_post+0x1b8/0x720 __io_complete_rw.isra.0+0x16b/0x1f0 io_complete_rw+0x10/0x20 where it's clear we end up calling the percpu count release directly from the completion path, as it's in atomic mode and we drop the last ref. For file/block IO, this can be from IRQ context already, and the softirq locking for rsrc isn't enough. Just make the lock fully IRQ safe, and ensure we correctly safe state from the release path as we don't know the full context there. Reported-by: NNadav Amit <nadav.amit@gmail.com> Tested-by: NNadav Amit <nadav.amit@gmail.com> Link: https://lore.kernel.org/io-uring/C187C836-E78B-4A31-B24C-D16919ACA093@gmail.com/Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 8月, 2021 2 次提交
-
-
由 Nadav Amit 提交于
The compiler should be forbidden from any strange optimization for async writes to user visible data-structures. Without proper protection, the compiler can cause write-tearing or invent writes that would confuse the userspace. However, there are writes to sq_flags which are not protected by WRITE_ONCE(). Use WRITE_ONCE() for these writes. This is purely a theoretical issue. Presumably, any compiler is very unlikely to do such optimizations. Fixes: 75b28aff ("io_uring: allocate the two rings together") Cc: Jens Axboe <axboe@kernel.dk> Cc: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: NNadav Amit <namit@vmware.com> Link: https://lore.kernel.org/r/20210808001342.964634-3-namit@vmware.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Nadav Amit 提交于
When using SQPOLL, the submission queue polling thread calls task_work_run() to run queued work. However, when work is added with TWA_SIGNAL - as done by io_uring itself - the TIF_NOTIFY_SIGNAL remains set afterwards and is never cleared. Consequently, when the submission queue polling thread checks whether signal_pending(), it may always find a pending signal, if task_work_add() was ever called before. The impact of this bug might be different on different kernel versions. It appears that on 5.14 it would only cause unnecessary calculation and prevent the polling thread from sleeping. On 5.13, where the bug was found, it stops the polling thread from finding newly submitted work. Instead of task_work_run(), use tracehook_notify_signal() that clears TIF_NOTIFY_SIGNAL. Test for TIF_NOTIFY_SIGNAL in addition to current->task_works to avoid a race in which task_works is cleared but the TIF_NOTIFY_SIGNAL is set. Fixes: 685fe7fe ("io-wq: eliminate the need for a manager thread") Cc: Jens Axboe <axboe@kernel.dk> Cc: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: NNadav Amit <namit@vmware.com> Link: https://lore.kernel.org/r/20210808001342.964634-2-namit@vmware.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-