1. 26 May 2021, 1 commit
    • io_uring/io-wq: close io-wq full-stop gap · 17a91051
      Committed by Pavel Begunkov
      There is an old problem with io-wq cancellation where requests that should
      be killed are in io-wq but are not discoverable, e.g. sitting in the
      @next_hashed or @linked vars of io_worker_handle_work(). This adds some
      unreliability to individual request cancellation, but may also get
      __io_uring_cancel() stuck. For instance:
      
      1) A cancellation round in __io_uring_cancel() has not found any
         requests, but some exist as described above.
      2) __io_uring_cancel() goes to sleep
      3) Then workers wake up and try to execute those hidden requests
         that happen to be unbound.
      
      As we already cancel all io-wq requests there, set IO_WQ_BIT_EXIT in
      advance, thus preventing 3) from executing unbound requests. The workers
      will initially break out of their loops on receiving a signal, as they
      are threads of the dying/exec()'ing user task.
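
      A minimal sketch of the ordering described above; apart from
      IO_WQ_BIT_EXIT and the wq state field, the helper names here are
      illustrative, not the exact io_uring code:

        /* Hedged sketch: flag the wq as exiting *before* running the
         * cancellation round, so workers holding "hidden" requests in
         * @next_hashed/@linked cannot go on to execute unbound work
         * that the round missed.
         */
        static void cancel_task_io_wq(struct io_uring_task *tctx)
        {
            struct io_wq *wq = tctx->io_wq;

            if (!wq)
                return;
            set_bit(IO_WQ_BIT_EXIT, &wq->state);
            cancel_all_io_wq_work(wq, tctx);   /* illustrative cancel helper */
        }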
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/abfcf8c54cb9e8f7bfbad7e9a0cc5433cc70bdc2.1621781238.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      17a91051
  2. 12 April 2021, 1 commit
    • io-wq: eliminate the need for a manager thread · 685fe7fe
      Committed by Jens Axboe
      io-wq relies on a manager thread to create/fork new workers, as needed.
      But there's really no strong need for it anymore. We have the following
      cases that fork a new worker:
      
      1) Work queue. This is done from the task itself always, and it's trivial
         to create a worker off that path, if needed.
      
      2) All workers have gone to sleep, and we have more work. This is called
         off the sched-out path. For this case, use a task_work item to queue
         a fork-worker operation (see the sketch after this list).
      
      3) Hashed work completion. We don't think anything needs to be done for
         this case. If need be, it could just use approach 2 as well.
      
      Part of this change is incrementing the running worker count before the
      fork, to avoid races where we observe that we need a worker, queue
      creation of one, and then fork yet another one when new work comes in.
      That second queue operation should have waited for the previous worker
      to come up; quite possibly we don't even need it. Hence, account the
      worker as running before we fork it off, to handle that case more
      efficiently.
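
      A hedged sketch of case 2 above, assuming a simple task_work-based
      creation path; the struct and helper names here are illustrative, not
      the exact io-wq code:

        /* From the sched-out path we cannot fork directly, so queue a
         * task_work item on the original task to do the actual fork.
         */
        struct create_worker_req {
            struct callback_head work;
            struct io_wqe *wqe;
        };

        static void create_worker_fn(struct callback_head *cb)
        {
            struct create_worker_req *req =
                container_of(cb, struct create_worker_req, work);

            create_io_worker(req->wqe);    /* illustrative fork helper */
            kfree(req);
        }

        static void queue_create_worker(struct io_wqe *wqe, struct task_struct *task)
        {
            struct create_worker_req *req = kmalloc(sizeof(*req), GFP_ATOMIC);

            if (!req)
                return;
            req->wqe = wqe;
            init_task_work(&req->work, create_worker_fn);
            /* TWA_SIGNAL ensures the task notices and runs this promptly */
            task_work_add(task, &req->work, TWA_SIGNAL);
        }
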
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      685fe7fe
  3. 18 March 2021, 1 commit
  4. 07 March 2021, 1 commit
  5. 05 March 2021, 1 commit
    • io_uring: move to using create_io_thread() · 46fe18b1
      Committed by Jens Axboe
      This allows us to do task creation and setup without needing to use
      completions to try and synchronize with the starting thread. Get rid of
      the old io_wq_fork_thread() wrapper, and the 'wq' and 'worker' startup
      completion events - we can now do setup before the task is running.
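
      A minimal sketch of the flow create_io_thread() enables: the new task
      comes back not yet running, so setup can happen before it executes and
      the old completion handshake is unnecessary (the wrapper name and worker
      fields here are illustrative):

        static int io_wq_fork_worker(struct io_wq *wq, struct io_worker *worker,
                                     int node)
        {
            struct task_struct *tsk;

            /* create the thread; it will not run until woken up */
            tsk = create_io_thread(io_wqe_worker, worker, node);
            if (IS_ERR(tsk))
                return PTR_ERR(tsk);

            /* safe to finish setup here, the task is not running yet */
            worker->task = tsk;
            wake_up_new_task(tsk);
            return 0;
        }
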
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      46fe18b1
  6. 04 March 2021, 2 commits
    • io_uring: move cred assignment into io_issue_sqe() · 5730b27e
      Committed by Jens Axboe
      If we move it in there, then we no longer have to care about it in io-wq.
      This means we can drop the cred handling in io-wq, and we can drop the
      REQ_F_WORK_INITIALIZED flag and the async init functions, as the cred
      handling was their last user since we moved to the new workers. Then we
      can also drop io_wq_work->creds, and just hold the personality u16 in
      there instead.
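
      A hedged sketch of the idea: resolve the creds at issue time from the
      stored personality id instead of carrying a cred pointer in io_wq_work.
      override_creds()/revert_creds() are real kernel APIs; the lookup and
      issue helpers below are illustrative:

        static int io_issue_sqe_with_creds(struct io_kiocb *req)
        {
            const struct cred *old_creds = NULL;
            const struct cred *creds = NULL;
            int ret;

            if (req->work.personality)    /* u16 kept in io_wq_work */
                creds = lookup_personality(req, req->work.personality);
            if (creds)
                old_creds = override_creds(creds);

            ret = do_issue(req);          /* illustrative issue path */

            if (creds)
                revert_creds(old_creds);
            return ret;
        }
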
      Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5730b27e
    • io-wq: provide an io_wq_put_and_exit() helper · afcc4015
      Committed by Jens Axboe
      If we put the io-wq from io_uring, we really want it to exit. Provide
      a helper that does that for us. Couple that with not having the manager
      hold a reference to the 'wq', and the normal SQPOLL exit will tear down
      the io-wq context appropriately.
      
      On the io-wq side, our wq context is per task, so only the task itself
      is manipulating ->manager and hence it's safe to check and clear without
      any extra locking. We just need to ensure that the manager task stays
      around, in case it exits.
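
      A hedged sketch of the helper, as I read the change (details may differ
      from the upstream code):

        void io_wq_put_and_exit(struct io_wq *wq)
        {
            /* make sure the wq stops picking up and executing new work ... */
            set_bit(IO_WQ_BIT_EXIT, &wq->state);
            /* ... then drop our reference, letting teardown proceed */
            io_wq_put(wq);
        }
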
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      afcc4015
  7. 26 February 2021, 2 commits
    • io-wq: improve manager/worker handling over exec · 4fb6ac32
      Committed by Jens Axboe
      exec will cancel any threads, including the ones that io-wq is using. This
      isn't a problem, in fact we'd prefer it to be that way since it means we
      know that any async work cancels naturally without having to handle it
      proactively.
      
      But it does mean that we need to set up a new manager, as the manager and
      workers are gone. Handle this at queue time, and cancel work if we fail.
      Since the manager can go away without us noticing, ensure that the manager
      itself holds a reference to the 'wq' as well. Rename io_wq_destroy() to
      io_wq_put() to reflect that.
      
      In the future we can simplify exec cancellation handling; for now, just
      leave it the same.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      4fb6ac32
    • io-wq: make buffered file write hashed work map per-ctx · e941894e
      Committed by Jens Axboe
      Before the io-wq thread change, we maintained a hash work map and lock
      per-node per-ring. That wasn't ideal, as we really wanted it to be per
      ring. But now that we have per-task workers, the hash map ends up being
      just per-task. That'll work just fine for the normal case of having
      one task use a ring, but if you share the ring between tasks, then it's
      considerably worse than it was before.
      
      Make the hash map per ctx instead, which provides full per-ctx buffered
      write serialization on hashed writes.
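
      A minimal sketch of a per-ctx shared hash state; the field names reflect
      my reading of the change and may not match the kernel exactly:

        struct io_wq_hash {
            refcount_t refs;              /* shared by every task using this ring */
            unsigned long map;            /* one bit per hash bucket in flight */
            struct wait_queue_head wait;  /* workers wait for a bucket to free up */
        };
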
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e941894e
  8. 24 February 2021, 1 commit
    • io-wq: remove nr_process accounting · 728f13e7
      Committed by Jens Axboe
      We're now just using fork like we would from userspace, so there's no
      need to try and impose extra restrictions or accounting on the user
      side of things. That's already being done for us. That also means we
      don't have to pass in the user_struct anymore, that's correctly inherited
      through ->creds on fork.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      728f13e7
  9. 22 February 2021, 6 commits
  10. 10 February 2021, 1 commit
  11. 04 February 2021, 1 commit
  12. 02 February 2021, 1 commit
  13. 21 December 2020, 1 commit
  14. 18 December 2020, 1 commit
    • io_uring: fix io_wqe->work_list corruption · 0020ef04
      Committed by Xiaoguang Wang
      The first time a req is punted to io-wq, we initialize io_wq_work's list
      to NULL, then insert the req into io_wqe->work_list. If this req is not
      inserted at the tail of io_wqe->work_list, its io_wq_work list will
      point to another req's io_wq_work. In the split bio case, this req may be
      inserted into io_wqe->work_list repeatedly; once we insert it at the tail
      of io_wqe->work_list for the second time, io_wq_work->list->next will be
      a stale pointer, which then results in many strange errors: panics, kernel
      soft-lockups, RCU stalls, etc.
      
      In my VM, the kernel does not have commit cc29e1bf ("block: disable
      iopoll for split bio"), and the fio job below can reproduce this bug
      reliably:
      [global]
      name=iouring-sqpoll-iopoll-1
      ioengine=io_uring
      iodepth=128
      numjobs=1
      thread
      rw=randread
      direct=1
      registerfiles=1
      hipri=1
      bs=4m
      size=100M
      runtime=120
      time_based
      group_reporting
      randrepeat=0
      
      [device]
      directory=/home/feiman.wxg/mntpoint/  # an ext4 mount point
      
      With commit cc29e1bf ("block: disable iopoll for split bio"), there is
      no split bio case for polled io, but I think we still need to fix this
      list corruption; it should probably also go to the stable branches.
      
      To fix this corruption, when a req is inserted at the tail of
      io_wqe->work_list, initialize req->io_wq_work->list->next to NULL.
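
      A hedged sketch of the fix in the tail-insert helper, close to my reading
      of wq_list_add_tail() in io-wq.h but not necessarily the exact upstream
      code:

        static inline void wq_list_add_tail(struct io_wq_work_node *node,
                                            struct io_wq_work_list *list)
        {
            /* always clear ->next, so re-inserting a node whose ->next still
             * points at old memory cannot corrupt the list */
            node->next = NULL;
            if (!list->first) {
                list->last = node;
                WRITE_ONCE(list->first, node);
            } else {
                list->last->next = node;
                list->last = node;
            }
        }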
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0020ef04
  15. 10 December 2020, 1 commit
  16. 21 October 2020, 1 commit
  17. 17 October 2020, 2 commits
    • io_uring: move io identity items into separate struct · 98447d65
      Committed by Jens Axboe
      io-wq contains a pointer to the identity, which we just hold in io_kiocb
      for now. This is in preparation for putting this outside io_kiocb. The
      only exception is struct files_struct, which we'll need different rules
      for to avoid a circular dependency.
      
      No functional changes in this patch.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      98447d65
    • io_uring: pass required context in as flags · 0f203765
      Committed by Jens Axboe
      We have a number of bits that decide what context to inherit. Set up
      io-wq flags for these instead. This is in preparation for always having
      the various members set, but not always needing them for all requests.
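
      A hedged illustration of the kind of flags meant here; the exact names
      and values in io-wq.h may differ, so treat these as assumptions:

        enum {
            IO_WQ_WORK_FILES    = 1,   /* request needs ->files */
            IO_WQ_WORK_FS       = 2,   /* request needs ->fs */
            IO_WQ_WORK_MM       = 4,   /* request needs ->mm */
            IO_WQ_WORK_CREDS    = 8,   /* request needs creds */
            IO_WQ_WORK_BLKCG    = 16,  /* request needs blkcg context */
        };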
      
      No intended functional changes in this patch.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0f203765
  18. 01 October 2020, 2 commits
  19. 25 July 2020, 1 commit
  20. 27 June 2020, 2 commits
  21. 15 June 2020, 3 commits
  22. 11 June 2020, 1 commit
    • io_uring: avoid whole io_wq_work copy for requests completed inline · 7cdaf587
      Committed by Xiaoguang Wang
      If requests can be submitted and completed inline, we don't need to
      initialize the whole io_wq_work in io_init_req(), which is an expensive
      operation. Add a new 'REQ_F_WORK_INITIALIZED' flag to track whether
      io_wq_work has been initialized, and add a helper io_req_init_async();
      users must call io_req_init_async() before touching any members of
      io_wq_work for the first time.
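
      A hedged sketch of the helper as described above; it follows the patch's
      intent, though the upstream code may differ in detail:

        static inline void io_req_init_async(struct io_kiocb *req)
        {
            if (req->flags & REQ_F_WORK_INITIALIZED)
                return;

            memset(&req->work, 0, sizeof(req->work));
            req->flags |= REQ_F_WORK_INITIALIZED;
        }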
      
      I used /dev/nullb0 to evaluate the performance improvement on my physical
      machine:
        modprobe null_blk nr_devices=1 completion_nsec=0
        sudo taskset -c 60 fio  -name=fiotest -filename=/dev/nullb0 -iodepth=128
        -thread -rw=read -ioengine=io_uring -direct=1 -bs=4k -size=100G -numjobs=1
        -time_based -runtime=120
      
      before this patch:
      Run status group 0 (all jobs):
         READ: bw=724MiB/s (759MB/s), 724MiB/s-724MiB/s (759MB/s-759MB/s),
         io=84.8GiB (91.1GB), run=120001-120001msec
      
      With this patch:
      Run status group 0 (all jobs):
         READ: bw=761MiB/s (798MB/s), 761MiB/s-761MiB/s (798MB/s-798MB/s),
         io=89.2GiB (95.8GB), run=120001-120001msec
      
      About 5% improvement.
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7cdaf587
  23. 09 June 2020, 1 commit
  24. 04 April 2020, 1 commit
  25. 24 March 2020, 1 commit
    • io-wq: handle hashed writes in chains · 86f3cd1b
      Committed by Pavel Begunkov
      We always punt async buffered writes to an io-wq helper, as the core
      kernel does not have IOCB_NOWAIT support for that. Most buffered async
      writes complete very quickly, as it's just a copy operation. This means
      that doing multiple locking roundtrips on the shared wqe lock for each
      buffered write is wasteful. Additionally, buffered writes are hashed
      work items, which means that any buffered write to a given file is
      serialized.
      
      Keep identically hashed work items contiguous in @wqe->work_list, and
      track a tail for each hash bucket. On dequeue of a hashed item, splice
      all of the same hash in one go using the tracked tail. Until the batch
      is done, the caller doesn't have to synchronize with the wqe or worker
      locks again.
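
      A hedged, simplified sketch of the dequeue-time batching; the real io-wq
      code differs in detail, and the helper and field names here are
      illustrative:

        static struct io_wq_work *grab_hashed_batch(struct io_wqe *wqe,
                                                    struct io_wq_work *first,
                                                    unsigned int hash)
        {
            struct io_wq_work *tail = wqe->hash_tail[hash];

            /* everything from 'first' to 'tail' shares the same hash and is
             * contiguous in @wqe->work_list; detach the run as one batch */
            detach_work_range(&wqe->work_list, first, tail);  /* illustrative */
            wqe->hash_tail[hash] = NULL;

            /* the worker now walks this chain without retaking wqe->lock */
            return first;
        }
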
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      86f3cd1b
  26. 23 March 2020, 1 commit
  27. 15 March 2020, 1 commit
  28. 05 March 2020, 1 commit
    • io_uring/io-wq: forward submission ref to async · e9fd9396
      Committed by Pavel Begunkov
      First, it changes the io-wq interfaces. It replaces {get,put}_work() with
      free_work(), which is guaranteed to be called exactly once. It also
      enforces that the free_work() callback is non-NULL.

      io_uring follows the changes: instead of putting a submission reference
      in io_put_req_async_completion(), it is now done in io_free_work(). As
      this also removes io_get_work() with its corresponding refcount_inc(),
      the ref balance is maintained.
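
      A hedged sketch of the reshaped creation interface, per my reading of the
      patch (field and typedef names are assumptions, not verified upstream
      definitions):

        typedef void (free_work_fn)(struct io_wq_work *work);

        struct io_wq_data {
            struct user_struct *user;
            free_work_fn *free_work;  /* must be non-NULL; io-wq calls it
                                       * exactly once per work item */
        };

      With a single mandatory callback, io-wq can reject a NULL free_work at
      creation time instead of checking optional hooks on every completion.
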
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e9fd9396