1. 12 3月, 2021 2 次提交
    • J
      io_uring: perform IOPOLL reaping if canceler is thread itself · d052d1d6
      Jens Axboe 提交于
      We bypass IOPOLL completion polling (and reaping) for the SQPOLL thread,
      but if it's the thread itself invoking cancelations, then we still need
      to perform it or no one will.
      
      Fixes: 9936c7c2 ("io_uring: deduplicate core cancellations sequence")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d052d1d6
    • J
      io_uring: force creation of separate context for ATTACH_WQ and non-threads · 5c2469e0
      Jens Axboe 提交于
      Earlier kernels had SQPOLL threads that could share across anything, as
      we grabbed the context we needed on a per-ring basis. This is no longer
      the case, so only allow attaching directly if we're in the same thread
      group. That is the common use case. For non-group tasks, just setup a
      new context and thread as we would've done if sharing wasn't set. This
      isn't 100% ideal in terms of CPU utilization for the forked and share
      case, but hopefully that isn't much of a concern. If it is, there are
      plans in motion for how to improve that. Most importantly, we want to
      avoid app side regressions where sharing worked before and now doesn't.
      With this patch, functionality is equivalent to previous kernels that
      supported IORING_SETUP_ATTACH_WQ with SQPOLL.
      Reported-by: NStefan Metzmacher <metze@samba.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5c2469e0
  2. 10 3月, 2021 15 次提交
  3. 08 3月, 2021 10 次提交
  4. 07 3月, 2021 2 次提交
    • J
      io-wq: always track creds for async issue · 003e8dcc
      Jens Axboe 提交于
      If we go async with a request, grab the creds that the task currently has
      assigned and make sure that the async side switches to them. This is
      handled in the same way that we do for registered personalities.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      003e8dcc
    • J
      io-wq: fix race in freeing 'wq' and worker access · 886d0137
      Jens Axboe 提交于
      Ran into a use-after-free on the main io-wq struct, wq. It has a worker
      ref and completion event, but the manager itself isn't holding a
      reference. This can lead to a race where the manager thinks there are
      no workers and exits, but a worker is being added. That leads to the
      following trace:
      
      BUG: KASAN: use-after-free in io_wqe_worker+0x4c0/0x5e0
      Read of size 8 at addr ffff888108baa8a0 by task iou-wrk-3080422/3080425
      
      CPU: 5 PID: 3080425 Comm: iou-wrk-3080422 Not tainted 5.12.0-rc1+ #110
      Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO 10G (MS-7C60), BIOS 1.60 05/13/2020
      Call Trace:
       dump_stack+0x90/0xbe
       print_address_description.constprop.0+0x67/0x28d
       ? io_wqe_worker+0x4c0/0x5e0
       kasan_report.cold+0x7b/0xd4
       ? io_wqe_worker+0x4c0/0x5e0
       __asan_load8+0x6d/0xa0
       io_wqe_worker+0x4c0/0x5e0
       ? io_worker_handle_work+0xc00/0xc00
       ? recalc_sigpending+0xe5/0x120
       ? io_worker_handle_work+0xc00/0xc00
       ? io_worker_handle_work+0xc00/0xc00
       ret_from_fork+0x1f/0x30
      
      Allocated by task 3080422:
       kasan_save_stack+0x23/0x60
       __kasan_kmalloc+0x80/0xa0
       kmem_cache_alloc_node_trace+0xa0/0x480
       io_wq_create+0x3b5/0x600
       io_uring_alloc_task_context+0x13c/0x380
       io_uring_add_task_file+0x109/0x140
       __x64_sys_io_uring_enter+0x45f/0x660
       do_syscall_64+0x32/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 3080422:
       kasan_save_stack+0x23/0x60
       kasan_set_track+0x20/0x40
       kasan_set_free_info+0x24/0x40
       __kasan_slab_free+0xe8/0x120
       kfree+0xa8/0x400
       io_wq_put+0x14a/0x220
       io_wq_put_and_exit+0x9a/0xc0
       io_uring_clean_tctx+0x101/0x140
       __io_uring_files_cancel+0x36e/0x3c0
       do_exit+0x169/0x1340
       __x64_sys_exit+0x34/0x40
       do_syscall_64+0x32/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Have the manager itself hold a reference, and now both drop points drop
      and complete if we hit zero, and the manager can unconditionally do a
      wait_for_completion() instead of having a race between reading the ref
      count and waiting if it was non-zero.
      
      Fixes: fb3a1f6c ("io-wq: have manager wait for all workers to exit")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      886d0137
  5. 06 3月, 2021 1 次提交
  6. 05 3月, 2021 7 次提交
    • J
      io_uring: make SQPOLL thread parking saner · 86e0d676
      Jens Axboe 提交于
      We have this weird true/false return from parking, and then some of the
      callers decide to look at that. It can lead to unbalanced parks and
      sqd locking. Have the callers check the thread status once it's parked.
      We know we have the lock at that point, so it's either valid or it's NULL.
      
      Fix race with parking on thread exit. We need to be careful here with
      ordering of the sdq->lock and the IO_SQ_THREAD_SHOULD_PARK bit.
      
      Rename sqd->completion to sqd->parked to reflect that this is the only
      thing this completion event doesn.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      86e0d676
    • J
      io-wq: kill hashed waitqueue before manager exits · 09ca6c40
      Jens Axboe 提交于
      If we race with shutting down the io-wq context and someone queueing
      a hashed entry, then we can exit the manager with it armed. If it then
      triggers after the manager has exited, we can have a use-after-free where
      io_wqe_hash_wake() attempts to wake a now gone manager process.
      
      Move the killing of the hashed write queue into the manager itself, so
      that we know we've killed it before the task exits.
      
      Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      09ca6c40
    • J
      io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return · b5b0ecb7
      Jens Axboe 提交于
      The callback can only be armed, if we get -EIOCBQUEUED returned. It's
      important that we clear the WAITQ bit for other cases, otherwise we can
      queue for async retry and filemap will assume that we're armed and
      return -EAGAIN instead of just blocking for the IO.
      
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b5b0ecb7
    • J
      io_uring: don't keep looping for more events if we can't flush overflow · ca0a2651
      Jens Axboe 提交于
      It doesn't make sense to wait for more events to come in, if we can't
      even flush the overflow we already have to the ring. Return -EBUSY for
      that condition, just like we do for attempts to submit with overflow
      pending.
      
      Cc: stable@vger.kernel.org # 5.11
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ca0a2651
    • J
      io_uring: move to using create_io_thread() · 46fe18b1
      Jens Axboe 提交于
      This allows us to do task creation and setup without needing to use
      completions to try and synchronize with the starting thread. Get rid of
      the old io_wq_fork_thread() wrapper, and the 'wq' and 'worker' startup
      completion events - we can now do setup before the task is running.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      46fe18b1
    • P
      io_uring: reliably cancel linked timeouts · dd59a3d5
      Pavel Begunkov 提交于
      Linked timeouts are fired asynchronously (i.e. soft-irq), and use
      generic cancellation paths to do its stuff, including poking into io-wq.
      The problem is that it's racy to access tctx->io_wq, as
      io_uring_task_cancel() and others may be happening at this exact moment.
      Mark linked timeouts with REQ_F_INLIFGHT for now, making sure there are
      no timeouts before io-wq destraction.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      dd59a3d5
    • P
      io_uring: cancel-match based on flags · b05a1bcd
      Pavel Begunkov 提交于
      Instead of going into request internals, like checking req->file->f_op,
      do match them based on REQ_F_INFLIGHT, it's set only when we want it to
      be reliably cancelled.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b05a1bcd
  7. 04 3月, 2021 3 次提交