1. 13 3月, 2021 3 次提交
    • P
      io_uring: fix OP_ASYNC_CANCEL across tasks · 58f99373
      Pavel Begunkov 提交于
      IORING_OP_ASYNC_CANCEL tries io-wq cancellation only for current task.
      If it fails go over tctx_list and try it out for every single tctx.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      58f99373
    • P
      io_uring: cancel sqpoll via task_work · 521d6a73
      Pavel Begunkov 提交于
      1) The first problem is io_uring_cancel_sqpoll() ->
      io_uring_cancel_task_requests() basically doing park(); park(); and so
      hanging.
      
      2) Another one is more subtle, when the master task is doing cancellations,
      but SQPOLL task submits in-between the end of the cancellation but
      before finish() requests taking a ref to the ctx, and so eternally
      locking it up.
      
      3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll() and
      same io_uring_cancel_sqpoll() from the owner task, they race for
      tctx->wait events. And there probably more of them.
      
      Instead do SQPOLL cancellations from within SQPOLL task context via
      task_work, see io_sqpoll_cancel_sync(). With that we don't need temporal
      park()/unpark() during cancellation, which is ugly, subtle and anyway
      doesn't allow to do io_run_task_work() properly.
      
      io_uring_cancel_sqpoll() is called only from SQPOLL task context and
      under sqd locking, so all parking is removed from there. And so,
      io_sq_thread_[un]park() and io_sq_thread_stop() are not used now by
      SQPOLL task, and that spare us from some headache.
      
      Also remove ctx->sqd_list early to avoid 2). And kill tctx->sqpoll,
      which is not used anymore.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      521d6a73
    • P
      io_uring: prevent racy sqd->thread checks · 26984fbf
      Pavel Begunkov 提交于
      SQPOLL thread to which we're trying to attach may be going away, it's
      not nice but a more serious problem is if io_sq_offload_create() sees
      sqd->thread==NULL, and tries to init it with a new thread. There are
      tons of ways it can be exploited or fail.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      26984fbf
  2. 12 3月, 2021 4 次提交
  3. 10 3月, 2021 15 次提交
  4. 08 3月, 2021 10 次提交
  5. 07 3月, 2021 2 次提交
    • J
      io-wq: always track creds for async issue · 003e8dcc
      Jens Axboe 提交于
      If we go async with a request, grab the creds that the task currently has
      assigned and make sure that the async side switches to them. This is
      handled in the same way that we do for registered personalities.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      003e8dcc
    • J
      io-wq: fix race in freeing 'wq' and worker access · 886d0137
      Jens Axboe 提交于
      Ran into a use-after-free on the main io-wq struct, wq. It has a worker
      ref and completion event, but the manager itself isn't holding a
      reference. This can lead to a race where the manager thinks there are
      no workers and exits, but a worker is being added. That leads to the
      following trace:
      
      BUG: KASAN: use-after-free in io_wqe_worker+0x4c0/0x5e0
      Read of size 8 at addr ffff888108baa8a0 by task iou-wrk-3080422/3080425
      
      CPU: 5 PID: 3080425 Comm: iou-wrk-3080422 Not tainted 5.12.0-rc1+ #110
      Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO 10G (MS-7C60), BIOS 1.60 05/13/2020
      Call Trace:
       dump_stack+0x90/0xbe
       print_address_description.constprop.0+0x67/0x28d
       ? io_wqe_worker+0x4c0/0x5e0
       kasan_report.cold+0x7b/0xd4
       ? io_wqe_worker+0x4c0/0x5e0
       __asan_load8+0x6d/0xa0
       io_wqe_worker+0x4c0/0x5e0
       ? io_worker_handle_work+0xc00/0xc00
       ? recalc_sigpending+0xe5/0x120
       ? io_worker_handle_work+0xc00/0xc00
       ? io_worker_handle_work+0xc00/0xc00
       ret_from_fork+0x1f/0x30
      
      Allocated by task 3080422:
       kasan_save_stack+0x23/0x60
       __kasan_kmalloc+0x80/0xa0
       kmem_cache_alloc_node_trace+0xa0/0x480
       io_wq_create+0x3b5/0x600
       io_uring_alloc_task_context+0x13c/0x380
       io_uring_add_task_file+0x109/0x140
       __x64_sys_io_uring_enter+0x45f/0x660
       do_syscall_64+0x32/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 3080422:
       kasan_save_stack+0x23/0x60
       kasan_set_track+0x20/0x40
       kasan_set_free_info+0x24/0x40
       __kasan_slab_free+0xe8/0x120
       kfree+0xa8/0x400
       io_wq_put+0x14a/0x220
       io_wq_put_and_exit+0x9a/0xc0
       io_uring_clean_tctx+0x101/0x140
       __io_uring_files_cancel+0x36e/0x3c0
       do_exit+0x169/0x1340
       __x64_sys_exit+0x34/0x40
       do_syscall_64+0x32/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Have the manager itself hold a reference, and now both drop points drop
      and complete if we hit zero, and the manager can unconditionally do a
      wait_for_completion() instead of having a race between reading the ref
      count and waiting if it was non-zero.
      
      Fixes: fb3a1f6c ("io-wq: have manager wait for all workers to exit")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      886d0137
  6. 06 3月, 2021 1 次提交
  7. 05 3月, 2021 5 次提交
    • J
      io_uring: make SQPOLL thread parking saner · 86e0d676
      Jens Axboe 提交于
      We have this weird true/false return from parking, and then some of the
      callers decide to look at that. It can lead to unbalanced parks and
      sqd locking. Have the callers check the thread status once it's parked.
      We know we have the lock at that point, so it's either valid or it's NULL.
      
      Fix race with parking on thread exit. We need to be careful here with
      ordering of the sdq->lock and the IO_SQ_THREAD_SHOULD_PARK bit.
      
      Rename sqd->completion to sqd->parked to reflect that this is the only
      thing this completion event doesn.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      86e0d676
    • J
      io-wq: kill hashed waitqueue before manager exits · 09ca6c40
      Jens Axboe 提交于
      If we race with shutting down the io-wq context and someone queueing
      a hashed entry, then we can exit the manager with it armed. If it then
      triggers after the manager has exited, we can have a use-after-free where
      io_wqe_hash_wake() attempts to wake a now gone manager process.
      
      Move the killing of the hashed write queue into the manager itself, so
      that we know we've killed it before the task exits.
      
      Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      09ca6c40
    • J
      io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return · b5b0ecb7
      Jens Axboe 提交于
      The callback can only be armed, if we get -EIOCBQUEUED returned. It's
      important that we clear the WAITQ bit for other cases, otherwise we can
      queue for async retry and filemap will assume that we're armed and
      return -EAGAIN instead of just blocking for the IO.
      
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b5b0ecb7
    • J
      io_uring: don't keep looping for more events if we can't flush overflow · ca0a2651
      Jens Axboe 提交于
      It doesn't make sense to wait for more events to come in, if we can't
      even flush the overflow we already have to the ring. Return -EBUSY for
      that condition, just like we do for attempts to submit with overflow
      pending.
      
      Cc: stable@vger.kernel.org # 5.11
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ca0a2651
    • J
      io_uring: move to using create_io_thread() · 46fe18b1
      Jens Axboe 提交于
      This allows us to do task creation and setup without needing to use
      completions to try and synchronize with the starting thread. Get rid of
      the old io_wq_fork_thread() wrapper, and the 'wq' and 'worker' startup
      completion events - we can now do setup before the task is running.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      46fe18b1