- 07 Mar 2021, 1 commit
-
-
Committed by Jens Axboe
Ran into a use-after-free on the main io-wq struct, wq. It has a worker ref and completion event, but the manager itself isn't holding a reference. This can lead to a race where the manager thinks there are no workers and exits, but a worker is being added. That leads to the following trace:

BUG: KASAN: use-after-free in io_wqe_worker+0x4c0/0x5e0
Read of size 8 at addr ffff888108baa8a0 by task iou-wrk-3080422/3080425

CPU: 5 PID: 3080425 Comm: iou-wrk-3080422 Not tainted 5.12.0-rc1+ #110
Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO 10G (MS-7C60), BIOS 1.60 05/13/2020
Call Trace:
 dump_stack+0x90/0xbe
 print_address_description.constprop.0+0x67/0x28d
 ? io_wqe_worker+0x4c0/0x5e0
 kasan_report.cold+0x7b/0xd4
 ? io_wqe_worker+0x4c0/0x5e0
 __asan_load8+0x6d/0xa0
 io_wqe_worker+0x4c0/0x5e0
 ? io_worker_handle_work+0xc00/0xc00
 ? recalc_sigpending+0xe5/0x120
 ? io_worker_handle_work+0xc00/0xc00
 ? io_worker_handle_work+0xc00/0xc00
 ret_from_fork+0x1f/0x30

Allocated by task 3080422:
 kasan_save_stack+0x23/0x60
 __kasan_kmalloc+0x80/0xa0
 kmem_cache_alloc_node_trace+0xa0/0x480
 io_wq_create+0x3b5/0x600
 io_uring_alloc_task_context+0x13c/0x380
 io_uring_add_task_file+0x109/0x140
 __x64_sys_io_uring_enter+0x45f/0x660
 do_syscall_64+0x32/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 3080422:
 kasan_save_stack+0x23/0x60
 kasan_set_track+0x20/0x40
 kasan_set_free_info+0x24/0x40
 __kasan_slab_free+0xe8/0x120
 kfree+0xa8/0x400
 io_wq_put+0x14a/0x220
 io_wq_put_and_exit+0x9a/0xc0
 io_uring_clean_tctx+0x101/0x140
 __io_uring_files_cancel+0x36e/0x3c0
 do_exit+0x169/0x1340
 __x64_sys_exit+0x34/0x40
 do_syscall_64+0x32/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Have the manager itself hold a reference as well; both drop points now drop the ref and complete if the count hits zero, and the manager can unconditionally do a wait_for_completion() instead of racing between reading the ref count and only waiting if it was non-zero.
Fixes: fb3a1f6c ("io-wq: have manager wait for all workers to exit")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
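The fix boils down to a standard refcount-plus-completion pattern. A minimal sketch of it, using the `worker_refs`/`worker_done` names from the commit text (helper names and placement are illustrative, not the exact patch):

```c
#include <linux/atomic.h>
#include <linux/completion.h>

struct io_wq {
	atomic_t worker_refs;		/* set to 1 at create time: the manager's own ref */
	struct completion worker_done;	/* completed when worker_refs drops to zero */
	/* ... */
};

/* every forked worker does atomic_inc(&wq->worker_refs) and drops it on exit */
static void io_worker_ref_put(struct io_wq *wq)
{
	/* every drop point completes on zero, so a waiter can never miss it */
	if (atomic_dec_and_test(&wq->worker_refs))
		complete(&wq->worker_done);
}

static void io_wq_manager_exit(struct io_wq *wq)
{
	/*
	 * Because the manager holds its own reference, the count can't hit
	 * zero behind its back while a worker is still being added; drop
	 * the manager's ref and wait unconditionally instead of peeking at
	 * the counter first.
	 */
	io_worker_ref_put(wq);
	wait_for_completion(&wq->worker_done);
}
```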
-
- 05 Mar 2021, 2 commits
-
-
Committed by Jens Axboe
If we race with shutting down the io-wq context and someone queueing a hashed entry, then we can exit the manager with it armed. If it then triggers after the manager has exited, we can have a use-after-free where io_wqe_hash_wake() attempts to wake a now gone manager process. Move the killing of the hashed write queue into the manager itself, so that we know we've killed it before the task exits.
Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
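A minimal sketch of the idea, assuming the io-wq keeps a wait_queue_entry armed on the shared hash's waitqueue with io_wqe_hash_wake() as its wake function (field and helper names are illustrative):

```c
#include <linux/wait.h>

struct io_wq_hash {
	struct wait_queue_head wait;	/* shared per-ctx hash waitqueue */
	/* ... */
};

struct io_wq {
	struct wait_queue_entry wait;	/* our entry on hash->wait; wakes the manager */
	/* ... */
};

/*
 * Called from the manager itself before it exits: once the entry is off
 * the list, io_wqe_hash_wake() can no longer be invoked against a
 * manager task that is already gone.
 */
static void io_wq_disarm_hash_wait(struct io_wq *wq, struct io_wq_hash *hash)
{
	spin_lock_irq(&hash->wait.lock);
	list_del_init(&wq->wait.entry);	/* harmless if it was never armed */
	spin_unlock_irq(&hash->wait.lock);
}
```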
-
Committed by Jens Axboe
This allows us to do task creation and setup without needing to use completions to try and synchronize with the starting thread. Get rid of the old io_wq_fork_thread() wrapper, and the 'wq' and 'worker' startup completion events - we can now do setup before the task is running.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
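A rough sketch of what the new flow enables, assuming a create_io_thread()-style helper that returns the new task before it has ever been scheduled (the surrounding names are illustrative):

```c
static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe)
{
	struct io_worker *worker;
	struct task_struct *tsk;

	worker = kzalloc(sizeof(*worker), GFP_KERNEL);
	if (!worker)
		return false;

	/* returns a task that exists but has not started running yet */
	tsk = create_io_thread(io_wqe_worker, worker, wqe->node);
	if (IS_ERR(tsk)) {
		kfree(worker);
		return false;
	}

	/* all setup happens here, with no completion handshake needed */
	worker->task = tsk;
	worker->wqe = wqe;
	/* ... link the worker into the wqe lists under the wqe lock ... */

	wake_up_new_task(tsk);	/* only now does io_wqe_worker() start */
	return true;
}
```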
-
- 04 Mar 2021, 10 commits
-
-
Committed by Jens Axboe
If we race on shutting down the io-wq, then we should ensure that any work that was queued after the workers were shut down is canceled. Harden the add-work check a bit too, checking for IO_WQ_BIT_EXIT and canceling if it's set. Add a WARN_ON() for having any work before we kill the io-wq context.
Reported-by: syzbot+91b4b56ead187d35c9d3@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
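Sketched out, the hardening looks roughly like this, assuming the existing IO_WQ_BIT_EXIT state bit and an internal io_run_cancel()-style cancel path (details are illustrative):

```c
static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work)
{
	/*
	 * If this io-wq is already being torn down, don't queue: run the
	 * cancel path instead, so the submitter sees the work completed
	 * with -ECANCELED rather than it being silently lost.
	 */
	if (test_bit(IO_WQ_BIT_EXIT, &wqe->wq->state)) {
		io_run_cancel(work, wqe);
		return;
	}

	/* ... normal insertion into the work list + worker wakeup ... */
}

static void io_wq_destroy(struct io_wq *wq)
{
	int node;

	set_bit(IO_WQ_BIT_EXIT, &wq->state);
	/* ... wait for the manager and workers to exit ... */

	/* by this point nothing may be left queued on any node */
	for_each_node(node)
		WARN_ON_ONCE(!wq_list_empty(&wq->wqes[node]->work_list));
}
```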
-
Committed by Jens Axboe
Alex reports that his system fails to suspend using 5.12-rc1, with the following dump:

[ 240.650300] PM: suspend entry (deep)
[ 240.650748] Filesystems sync: 0.000 seconds
[ 240.725605] Freezing user space processes ...
[ 260.739483] Freezing of tasks failed after 20.013 seconds (3 tasks refusing to freeze, wq_busy=0):
[ 260.739497] task:iou-mgr-446 state:S stack: 0 pid: 516 ppid: 439 flags:0x00004224
[ 260.739504] Call Trace:
[ 260.739507] ? sysvec_apic_timer_interrupt+0xb/0x81
[ 260.739515] ? pick_next_task_fair+0x197/0x1cde
[ 260.739519] ? sysvec_reschedule_ipi+0x2f/0x6a
[ 260.739522] ? asm_sysvec_reschedule_ipi+0x12/0x20
[ 260.739525] ? __schedule+0x57/0x6d6
[ 260.739529] ? del_timer_sync+0xb9/0x115
[ 260.739533] ? schedule+0x63/0xd5
[ 260.739536] ? schedule_timeout+0x219/0x356
[ 260.739540] ? __next_timer_interrupt+0xf1/0xf1
[ 260.739544] ? io_wq_manager+0x73/0xb1
[ 260.739549] ? io_wq_create+0x262/0x262
[ 260.739553] ? ret_from_fork+0x22/0x30
[ 260.739557] task:iou-mgr-517 state:S stack: 0 pid: 522 ppid: 439 flags:0x00004224
[ 260.739561] Call Trace:
[ 260.739563] ? sysvec_apic_timer_interrupt+0xb/0x81
[ 260.739566] ? pick_next_task_fair+0x16f/0x1cde
[ 260.739569] ? sysvec_apic_timer_interrupt+0xb/0x81
[ 260.739571] ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 260.739574] ? __schedule+0x5b7/0x6d6
[ 260.739578] ? del_timer_sync+0x70/0x115
[ 260.739581] ? schedule_timeout+0x211/0x356
[ 260.739585] ? __next_timer_interrupt+0xf1/0xf1
[ 260.739588] ? io_wq_check_workers+0x15/0x11f
[ 260.739592] ? io_wq_manager+0x69/0xb1
[ 260.739596] ? io_wq_create+0x262/0x262
[ 260.739600] ? ret_from_fork+0x22/0x30
[ 260.739603] task:iou-wrk-517 state:S stack: 0 pid: 523 ppid: 439 flags:0x00004224
[ 260.739607] Call Trace:
[ 260.739609] ? __schedule+0x5b7/0x6d6
[ 260.739614] ? schedule+0x63/0xd5
[ 260.739617] ? schedule_timeout+0x219/0x356
[ 260.739621] ? __next_timer_interrupt+0xf1/0xf1
[ 260.739624] ? task_thread.isra.0+0x148/0x3af
[ 260.739628] ? task_thread_unbound+0xa/0xa
[ 260.739632] ? task_thread_bound+0x7/0x7
[ 260.739636] ? ret_from_fork+0x22/0x30
[ 260.739647] OOM killer enabled.
[ 260.739648] Restarting tasks ... done.
[ 260.740077] PM: suspend exit

Play nice and ensure that any thread we create will call try_to_freeze() at an opportune time so that memory suspend can proceed. For the io-wq worker threads, mark them as PF_NOFREEZE. They could potentially be blocked for a long time.
Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
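The freezer interaction reduces to two small pieces, sketched below with an illustrative loop: the manager calls try_to_freeze() in its sleep loop, while workers opt out with PF_NOFREEZE since they may legitimately block on IO for a long time.

```c
#include <linux/freezer.h>
#include <linux/sched.h>

static int io_wq_manager(void *data)
{
	struct io_wq *wq = data;

	do {
		set_current_state(TASK_INTERRUPTIBLE);
		/* ... create workers if any node needs one ... */
		schedule_timeout(HZ);
		try_to_freeze();	/* park here while a suspend is in progress */
	} while (!test_bit(IO_WQ_BIT_EXIT, &wq->state));

	return 0;
}

static int io_wqe_worker(void *data)
{
	/* workers can block on IO for a long time; the freezer skips them */
	current->flags |= PF_NOFREEZE;

	/* ... normal work loop ... */
	return 0;
}
```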
-
Committed by Jens Axboe
The 'err' path should include the hash put; we have already grabbed a reference by the time we get that far.
Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
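In other words, once creation has taken its reference on the shared hash, the unwind path must drop it again. A minimal sketch of that shape (the signature and the io_wq_setup() step are illustrative placeholders):

```c
struct io_wq *io_wq_create(struct io_wq_data *data)
{
	struct io_wq *wq;
	int ret;

	wq = kzalloc(sizeof(*wq), GFP_KERNEL);
	if (!wq)
		return ERR_PTR(-ENOMEM);

	refcount_inc(&data->hash->refs);	/* hash reference taken early ... */
	wq->hash = data->hash;

	ret = io_wq_setup(wq, data);		/* hypothetical: nodes, manager, ... */
	if (ret)
		goto err;

	return wq;
err:
	io_wq_put_hash(data->hash);		/* ... so the error path must put it */
	kfree(wq);
	return ERR_PTR(ret);
}
```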
-
Committed by Jens Axboe
If we move it in there, then we no longer have to care about it in io-wq. This means we can drop the cred handling in io-wq, and we can drop the REQ_F_WORK_INITIALIZED flag and async init functions, as that was the last user of it since we moved to the new workers. Then we can also drop io_wq_work->creds, and just hold the personality u16 in there instead.
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
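After the change, the per-work struct only needs to carry the personality id; roughly, per the commit text (exact layout not shown here):

```c
struct io_wq_work {
	struct io_wq_work_node list;
	unsigned flags;
	unsigned short personality;	/* the u16 that replaces io_wq_work->creds */
};
```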
-
Committed by Jens Axboe
If we put the io-wq from io_uring, we really want it to exit. Provide a helper that does that for us. Couple that with not having the manager hold a reference to the 'wq', and the normal SQPOLL exit will tear down the io-wq context appropriately. On the io-wq side, our wq context is per task, so only the task itself is manipulating ->manager and hence it's safe to check and clear without any extra locking. We just need to ensure that the manager task stays around, in case it exits.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
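A sketch of what such a helper can look like under those rules; the get/put of the manager task is the "stays around" part (exact code in the patch differs):

```c
void io_wq_put_and_exit(struct io_wq *wq)
{
	struct task_struct *tsk = wq->manager;	/* only our own task writes this */

	if (tsk) {
		get_task_struct(tsk);		/* keep it alive across the wakeup */
		set_bit(IO_WQ_BIT_EXIT, &wq->state);
		wake_up_process(tsk);
		put_task_struct(tsk);
	}

	io_wq_put(wq);				/* final ref drop tears the context down */
}
```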
-
Committed by Jens Axboe
We are already freeing the wq struct in both spots, so don't put it and get it freed twice.
Reported-by: syzbot+7bf785eedca35ca05501@syzkaller.appspotmail.com
Fixes: 4fb6ac32 ("io-wq: improve manager/worker handling over exec")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
The manager waits for the workers, hence the manager is always valid if workers are running. Now also have wq destroy wait for the manager on exit, so we know everything is gone.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
This is a leftover from a different use case; it's used to wait for the manager to start up. Rename it as such.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
If we're in the process of shutting down the async context, then don't create new workers if we already have at least the fixed one.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
Instead of having to wait separately on workers and manager, just have the manager wait on the workers. We use an atomic_t for the reference here, as we need to start at 0 and allow increment from that. Since the number of workers is naturally capped by the allowed number of processes, and that uses an int, there is no risk of overflow.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 02 Mar 2021, 1 commit
-
-
Committed by Jens Axboe
We need to have our worker count updated before continuing, to avoid cases where we repeatedly think we need a new worker when a fork is already in progress.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
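One way to realize the described ordering, sketched with illustrative field names such as nr_workers: claim the accounting slot before the fork so concurrent checks see a worker already in flight, and roll it back if the fork fails.

```c
static void io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct)
{
	/* claim the slot first, so concurrent need-worker checks back off */
	raw_spin_lock_irq(&wqe->lock);
	acct->nr_workers++;
	raw_spin_unlock_irq(&wqe->lock);

	if (!create_io_worker(wqe->wq, wqe)) {
		/* fork failed: give the slot back */
		raw_spin_lock_irq(&wqe->lock);
		acct->nr_workers--;
		raw_spin_unlock_irq(&wqe->lock);
	}
}
```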
-
- 26 Feb 2021, 3 commits
-
-
Committed by Jens Axboe
This flag is now dead, remove it.
Fixes: 1cbd9c2b ("io-wq: don't create any IO workers upfront")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
exec will cancel any threads, including the ones that io-wq is using. This isn't a problem; in fact we'd prefer it to be that way, since it means we know that any async work cancels naturally without having to handle it proactively. But it does mean that we need to set up a new manager, as the manager and workers are gone. Handle this at queue time, and cancel work if we fail. Since the manager can go away without us noticing, ensure that the manager itself holds a reference to the 'wq' as well. Rename io_wq_destroy() to io_wq_put() to reflect that. In the future we can now simplify exec cancelation handling; for now, just leave it the same.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
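The lifetime rules this sets up can be sketched as follows (helper names follow the commit text where given, otherwise they are illustrative): the wq becomes plainly refcounted, and queueing work re-creates the manager if exec reaped it, cancelling the work if that fails.

```c
static void io_wq_get(struct io_wq *wq)
{
	refcount_inc(&wq->refs);	/* the manager and io_uring each hold one */
}

void io_wq_put(struct io_wq *wq)
{
	if (refcount_dec_and_test(&wq->refs))
		io_wq_destroy(wq);
}

void io_wq_enqueue(struct io_wqe *wqe, struct io_wq_work *work)
{
	/* exec kills all threads, manager included: bring it back on demand */
	if (!wqe->wq->manager && io_wq_fork_manager(wqe->wq)) {
		io_run_cancel(work, wqe);	/* couldn't restart: cancel the work */
		return;
	}

	/* ... queue as usual and wake a worker ... */
}
```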
-
Committed by Jens Axboe
Before the io-wq thread change, we maintained a hash work map and lock per-node per-ring. That wasn't ideal, as we really wanted it to be per ring. But now that we have per-task workers, the hash map ends up being just per-task. That'll work just fine for the normal case of having one task use a ring, but if you share the ring between tasks, then it's considerably worse than it was before. Make the hash map per ctx instead, which provides full per-ctx buffered write serialization on hashed writes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
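Shape-wise, the shared state becomes one small refcounted object owned by the ring and handed to each io-wq that serves it, roughly as below (an illustrative sketch consistent with the commit text):

```c
#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/wait.h>

struct io_wq_hash {
	refcount_t refs;		/* one ref per io-wq using this ring's hash */
	unsigned long map;		/* bit set while a hash bucket is executing */
	struct wait_queue_head wait;	/* io-wqs waiting for a busy bucket */
};

static inline void io_wq_put_hash(struct io_wq_hash *hash)
{
	if (refcount_dec_and_test(&hash->refs))
		kfree(hash);
}
```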
-
- 24 Feb 2021, 3 commits
-
-
Committed by Jens Axboe
There's a small window between lookup dropping the reference to the worker and calling wake_up_process() on the worker task, where the worker itself could have exited. We ensure that the worker struct itself is valid, but worker->task may very well be gone by the time we issue the wakeup. Fix the race by using a completion triggered by the reference going to zero, and having exit wait for that completion before proceeding.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
These races have always been there; they are just more apparent now that we do early cancel of io-wq when the task exits. Ensure that the io-wq manager sets task state correctly so as not to miss wakeups for task creation. This is important if we get a wakeup after having marked ourselves as TASK_INTERRUPTIBLE. If we do end up creating workers, then we flip the state back to running, making the subsequent schedule() a no-op. Also increment the wq ref count before forking the thread, to avoid a use-after-free.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
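The wakeup-race half of this is the classic prepare-to-sleep pattern, sketched here with illustrative helper names: set the task state before checking, flip back to running when creating workers, and take a wq reference before the fork.

```c
static void io_wq_check_workers(struct io_wq *wq)
{
	if (!io_wq_need_worker(wq))		/* illustrative check */
		return;

	/*
	 * The caller set us to TASK_INTERRUPTIBLE before calling in here;
	 * since we're about to do work, flip back so the fork doesn't run
	 * in a sleeping state and a pending wakeup isn't lost.
	 */
	__set_current_state(TASK_RUNNING);
	refcount_inc(&wq->refs);		/* pin the wq across the fork */
	create_io_worker(wq);
}

static int io_wq_manager(void *data)
{
	struct io_wq *wq = data;

	do {
		set_current_state(TASK_INTERRUPTIBLE);	/* before the check */
		io_wq_check_workers(wq);
		schedule_timeout(HZ);	/* effectively a no-op if flipped back to running */
	} while (!test_bit(IO_WQ_BIT_EXIT, &wq->state));

	return 0;
}
```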
-
Committed by Jens Axboe
We're now just using fork like we would from userspace, so there's no need to try and impose extra restrictions or accounting on the user side of things. That's already being done for us. That also means we don't have to pass in the user_struct anymore; that's correctly inherited through ->creds on fork.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 Feb 2021, 9 commits
-
-
Committed by Jens Axboe
We want to use this in io_uring proper as well, for the SQPOLL thread. Rename it from fork_thread() to io_wq_fork_thread(), and make it available through the io-wq.h header.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
If the worker isn't on the free_list, don't attempt to delete it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
We are no longer grabbing state, so there is no need to maintain an IO identity that we COW if there are changes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
Remove the bool return, and the checking for it in the caller.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
Instead of using regular kthread kernel threads, create kernel threads that are like a real thread that the task would create. This ensures that we get all the context that we need, without having to carry that state around. This greatly reduces the code complexity, and the risk of missing state for a given request type. With the move away from kthreads, we can also dump everything related to assigned state to the new threads.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
Just grab it from the worker itself, which we're already passing in.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
We don't support attach anymore, so it doesn't make sense to carry the use_refs reference count. Get rid of it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
When the manager thread starts up, it creates a worker per node for the given context. Just let these get created dynamically, like we do for adding further workers.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
We hit this case when the task is exiting, and we need somewhere to do background cleanup of requests. Instead of relying on the io-wq task manager to do this work for us, just stuff it somewhere where we can safely run it ourselves directly.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 13 Feb 2021, 1 commit
-
-
Committed by Jens Axboe
By default, kernel threads have init_fs and init_files assigned. In the past, this has triggered security problems, as commands that don't ask for (and hence don't get assigned) fs/files from the originating task can then attempt path resolution etc. with access to parts of the system they should not be able to reach. Rather than add checks in the fs code for misuse, just set these to NULL. If we do attempt to use them, then the resulting code will oops rather than provide access to something that it should not permit.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
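Concretely, the hardening is as simple as the commit describes; a sketch, assuming it runs early in each io thread's setup (the placement and helper name are illustrative):

```c
static void io_thread_clear_fs_files(void)
{
	/*
	 * Kernel threads inherit init_fs/init_files. These io threads never
	 * asked for the originating task's fs/files, so rather than risk
	 * path resolution against init's context, make any such use oops.
	 */
	current->fs = NULL;
	current->files = NULL;
}
```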
-
- 04 Feb 2021, 1 commit
-
-
Committed by Pavel Begunkov
Saving one lock/unlock for io-wq is not super important, but it adds some ugliness to the code. More importantly, atomic decrements that don't bring the count to zero won't give the right ordering/barriers on some archs, so io_steal_work() may pretty easily get subtly and completely broken. Return back to the 2-step io-wq work exchange and clean it up.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 02 Feb 2021, 1 commit
-
-
Committed by Jens Axboe
It's no longer used, as IORING_OP_CLOSE got rid of the need to flag it as uncancelable; kill it off.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 Dec 2020, 1 commit
-
-
Committed by Jens Axboe
io_uring no longer issues full cancelations on the io-wq, so remove any remnants of this code and the IO_WQ_BIT_CANCEL flag.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 10 Dec 2020, 1 commit
-
-
Committed by Pavel Begunkov
Instead of iterating over each request and cancelling it individually in io_uring_cancel_files(), try to cancel all matching requests and use ->inflight_list only to check if there is anything left. In many cases it should be faster, and we can reuse a lot of code from task cancellation.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 05 Nov 2020, 1 commit
-
-
Committed by Jens Axboe
This can't currently happen, but will be possible shortly. Handle missing files just like we handle not being able to grab a needed mm, and mark the request as needing cancelation.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 Oct 2020, 1 commit
-
-
Committed by Jens Axboe
We correctly set io-wq NUMA node affinities when the io-wq context is set up, but if an entire node's CPU set is offlined and then brought back online, the per-node affinities are broken. Ensure that we set them again whenever a CPU comes online. This ensures that we always track the right node affinity. The usual cpuhp notifiers are used to drive it.
Reported-by: Zhang Qiang <qiang.zhang@windriver.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
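The cpuhp wiring looks roughly like this (the per-worker iteration helper is illustrative; the cpuhp calls are the standard multi-instance hotplug API). Each io-wq instance would then register itself with cpuhp_state_add_instance_nocalls() at creation time so the callback can find it.

```c
#include <linux/cpuhotplug.h>
#include <linux/cpumask.h>

static enum cpuhp_state io_wq_online;

/* per-worker callback: re-apply the worker's NUMA node mask */
static bool io_worker_affinity(struct io_worker *worker, void *unused)
{
	set_cpus_allowed_ptr(worker->task, cpumask_of_node(worker->wqe->node));
	return false;	/* keep iterating over all workers */
}

static int io_wq_cpu_online(unsigned int cpu, struct hlist_node *node)
{
	struct io_wq *wq = hlist_entry_safe(node, struct io_wq, cpuhp_node);

	io_wq_for_each_worker(wq, io_worker_affinity, NULL);	/* illustrative walk */
	return 0;
}

static int __init io_wq_init(void)
{
	int ret;

	/* one dynamic multi-instance state; each io_wq adds itself as an instance */
	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "io-wq/online",
				      io_wq_cpu_online, NULL);
	if (ret < 0)
		return ret;
	io_wq_online = ret;
	return 0;
}
subsys_initcall(io_wq_init);
```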
-
- 21 Oct 2020, 1 commit
-
-
Committed by Jens Axboe
This one was missed in the earlier conversion; it should be included like any of the other IO identity flags. Make sure we restore to RLIM_INFINITY when dropping the personality again.
Fixes: 98447d65 ("io_uring: move io identity items into separate struct")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 17 Oct 2020, 3 commits
-
-
Committed by Jens Axboe
Make sure the async io-wq workers inherit the loginuid and sessionid from the original task, and restore them to unset once we're done with the async work item. While at it, disable the ability for kernel threads to write to their own loginuid.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
io-wq contains a pointer to the identity, which we just hold in io_kiocb for now. This is in preparation for putting this outside io_kiocb. The only exception is struct files_struct, which we'll need different rules for to avoid a circular dependency. No functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
We solely rely on work->work_flags now, so use that for proper checking and clearing/dropping of various identity items.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-