1. 21 Jan, 2020: 12 commits
    • io_uring: add lookup table for various opcode needs · d3656344
      Committed by Jens Axboe
      We currently have various switch statements that check if an opcode needs
      a file, mm, etc. These are hard to keep in sync as opcodes are added. Add
      a struct io_op_def that holds all of this information, so we have just
      one spot to update when opcodes are added.
      
      This also enables us to NOT allocate req->io if a deferred command
      doesn't need it, and corrects some mistakes we had in terms of what
      commands need mm context.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
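Below is a minimal sketch of the per-opcode table idea, written in plain user-space C. The opcode names, the `sketch_*` identifiers and the three flags are hypothetical simplifications. This is not the kernel's actual `struct io_op_def`. It only shows how one indexed table can replace scattered `switch` statements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode numbering, for this sketch only. */
enum sketch_op {
	SKETCH_OP_NOP,
	SKETCH_OP_READV,
	SKETCH_OP_FSYNC,
	SKETCH_OP_TIMEOUT,
	SKETCH_OP_LAST,
};

/* One entry per opcode: what a request needs before it can be issued. */
struct sketch_op_def {
	bool needs_file; /* request operates on an fd */
	bool needs_mm;   /* request touches user memory */
	bool async_ctx;  /* a deferred request needs separate async state */
};

/* Designated initializers keep each opcode's needs in a single place. */
static const struct sketch_op_def sketch_op_defs[SKETCH_OP_LAST] = {
	[SKETCH_OP_NOP]     = { .needs_file = false, .needs_mm = false, .async_ctx = false },
	[SKETCH_OP_READV]   = { .needs_file = true,  .needs_mm = true,  .async_ctx = true  },
	[SKETCH_OP_FSYNC]   = { .needs_file = true,  .needs_mm = false, .async_ctx = false },
	[SKETCH_OP_TIMEOUT] = { .needs_file = false, .needs_mm = false, .async_ctx = true  },
};

/* Returns NULL for out-of-range opcodes instead of indexing past the table. */
static const struct sketch_op_def *sketch_op_lookup(unsigned int opcode)
{
	if (opcode >= SKETCH_OP_LAST)
		return NULL;
	return &sketch_op_defs[opcode];
}

/* Only allocate async state for a deferred request whose opcode needs it. */
static bool sketch_defer_needs_alloc(unsigned int opcode)
{
	const struct sketch_op_def *def = sketch_op_lookup(opcode);

	return def != NULL && def->async_ctx;
}
```

With this shape, adding an opcode means adding one table row instead of updating several `switch` statements. The `async_ctx` bit is what lets a deferred request skip allocating state it doesn't need.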
    • io_uring: remove two unnecessary function declarations · add7b6b8
      Committed by Jens Axboe
      __io_free_req() and io_double_put_req() aren't used before they are
      defined, so we can kill these two forward declarations.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: move *queue_link_head() from common path · 32fe525b
      Committed by Pavel Begunkov
      Move io_queue_link_head() into the link-handling code in io_submit_sqe(),
      so it doesn't need extra checks and has better data locality.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: rename prev to head · 9d76377f
      Committed by Pavel Begunkov
      Calling the head of a link "prev" is a bit misleading. Rename it to "head".
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: add IOSQE_ASYNC · ce35a47a
      Committed by Jens Axboe
      io_uring defaults to always doing inline submissions, if at all
      possible. But for larger copies, even if the data is fully cached, that
      can take a long time. Add an IOSQE_ASYNC flag that the application can
      set on the SQE - if set, it'll ensure that we always go async for those
      kinds of requests. Use the io-wq IO_WQ_WORK_CONCURRENT flag to ensure we
      get the concurrency we desire for this case.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
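The issue decision described above can be modelled as a small pure function. This sketch rests on two assumptions. First, `SKETCH_IOSQE_ASYNC` is a local stand-in for the real `IOSQE_ASYNC` constant. Second, treating `-EAGAIN` from an inline attempt as the "would block" signal is a simplification of how io_uring decides to punt.

```c
#include <assert.h>
#include <errno.h>

/* Local stand-in for IOSQE_ASYNC; real code uses <linux/io_uring.h>. */
#define SKETCH_IOSQE_ASYNC (1U << 4)

enum sketch_issue { SKETCH_INLINE, SKETCH_PUNT_ASYNC };

/*
 * Decide how a request is issued. inline_result is what a non-blocking
 * inline attempt would return if it were made (-EAGAIN: would block).
 */
static enum sketch_issue sketch_issue_path(unsigned int sqe_flags, int inline_result)
{
	/* The application asked for async: skip the inline attempt entirely. */
	if (sqe_flags & SKETCH_IOSQE_ASYNC)
		return SKETCH_PUNT_ASYNC;
	/* Default: issue inline, punt only if that attempt would block. */
	if (inline_result == -EAGAIN)
		return SKETCH_PUNT_ASYNC;
	return SKETCH_INLINE;
}
```

In an application using liburing, the flag is normally set on a prepared SQE with `io_uring_sqe_set_flags(sqe, IOSQE_ASYNC)`.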
    • io_uring: add support for IORING_OP_STATX · eddc7ef5
      Committed by Jens Axboe
      This provides support for async statx(2) through io_uring.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: avoid ring quiesce for fixed file set unregister and update · 05f3fb3c
      Committed by Jens Axboe
      We currently fully quiesce the ring before an unregister or update of
      the fixed fileset. This is very expensive, and we can be a bit smarter
      about this.
      
      Add a percpu refcount for the file tables as a whole. Grab a percpu ref
      when we use a registered file, and put it on completion. This is cheap
      to do. Upon removal of a file from a set, switch the ref count to atomic
      mode. When the ref count hits zero on the completion side, we know we can
      drop the previously registered files. When the old files have been
      dropped, switch the ref back to percpu mode for normal operation.
      
      Since there's a period between doing the update and the kernel being
      done with it, add an IORING_OP_FILES_UPDATE opcode that can perform the
      same action. The application knows the update has completed when it gets
      the CQE for it. Between doing the update and receiving this completion,
      the application must continue to use the unregistered fd if submitting
      IO on this particular file.
      
      This reduces the runtime of liburing's test/file-register from 14s to
      about 0.7s.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
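The retire-on-last-put scheme can be sketched as a single-threaded model. The model deliberately leaves out percpu/atomic mode switching. In the kernel, that switching keeps get/put cheap across CPUs while still letting zero be detected once an update starts. `struct sketch_ftable` and its functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one generation of the registered file table. */
struct sketch_ftable {
	int refs;      /* in-flight requests using this generation */
	bool dying;    /* an update has replaced this generation */
	bool released; /* the old files have been dropped */
};

/* Drop the old files once the generation is dying and unused. */
static void sketch_ftable_maybe_release(struct sketch_ftable *t)
{
	if (t->dying && t->refs == 0)
		t->released = true;
}

/* A request starts using a registered file: take a reference. */
static void sketch_ftable_get(struct sketch_ftable *t)
{
	t->refs++;
}

/* Request completion: put the reference, possibly releasing the files. */
static void sketch_ftable_put(struct sketch_ftable *t)
{
	assert(t->refs > 0);
	t->refs--;
	sketch_ftable_maybe_release(t);
}

/* Update/unregister: no ring quiesce, just retire this generation. */
static void sketch_ftable_retire(struct sketch_ftable *t)
{
	t->dying = true;
	sketch_ftable_maybe_release(t); /* releases at once if nothing is in flight */
}
```

The point of the design is that an update never waits for the whole ring to go idle. Only requests that still hold the old generation delay its release.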
    • io_uring: add support for IORING_OP_CLOSE · b5dba59e
      Committed by Jens Axboe
      This works just like close(2), unsurprisingly. We remove the file
      descriptor and post the completion inline, then offload the actual
      (potential) last file put to async context.
      
      Mark the async part of this work as uncancellable, as we really must
      guarantee that the latter part of the close is run.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io-wq: add support for uncancellable work · 0c9d5ccd
      Committed by Jens Axboe
      Not all work can be cancelled; for some of it, we may need to guarantee
      that it runs to completion. Allow the caller to set IO_WQ_WORK_NO_CANCEL
      on work that must not be cancelled. Note that the caller work function
      must also check for IO_WQ_WORK_NO_CANCEL on work that is marked
      IO_WQ_WORK_CANCEL.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
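The contract between the cancel path and the work function can be sketched as follows. The flag names and values are local stand-ins for `IO_WQ_WORK_CANCEL` and `IO_WQ_WORK_NO_CANCEL`, and `struct sketch_work` is hypothetical. The commit's last sentence implies that a cancel or teardown path may set the cancel flag on any queued item. The sketch assumes this, so the work function itself must honour the opt-out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local stand-ins for the io-wq work flags named in the commit. */
#define SKETCH_WORK_CANCEL    (1U << 0)
#define SKETCH_WORK_NO_CANCEL (1U << 1)

/* Hypothetical work item. */
struct sketch_work {
	unsigned int flags;
	bool ran; /* set once the body of the work has executed */
};

/* Cancel/teardown path: flags the item whether or not it is cancellable. */
static void sketch_mark_cancel(struct sketch_work *w)
{
	w->flags |= SKETCH_WORK_CANCEL;
}

/* Work function: honour CANCEL only when the work did not opt out. */
static int sketch_work_fn(struct sketch_work *w)
{
	if ((w->flags & SKETCH_WORK_CANCEL) && !(w->flags & SKETCH_WORK_NO_CANCEL))
		return -ECANCELED;
	w->ran = true; /* e.g. the deferred final file put of a close */
	return 0;
}
```

IORING_OP_CLOSE relies on this property: its deferred final file put still runs even after cancellation has flagged it.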
    • io_uring: add support for IORING_OP_OPENAT · 15b71abe
      Committed by Jens Axboe
      This works just like openat(2), except it can be performed async. For
      the normal case of a non-blocking path lookup this will complete
      inline. If we have to do IO to perform the open, it'll be done from
      async context.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
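The pattern of completing inline first and going async only on demand can be sketched with a fake lookup. Everything here is hypothetical. `fully_cached` stands in for "the path lookup needs no IO", and the returned fd `3` is a placeholder. The sketch does not reproduce how the kernel actually performs a non-blocking lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical lookup state: can the path be resolved without IO? */
struct sketch_lookup {
	bool fully_cached;
};

/* Non-blocking attempt: succeeds only if no IO is required. */
static int sketch_open_nonblock(const struct sketch_lookup *l)
{
	return l->fully_cached ? 3 /* placeholder fd */ : -EAGAIN;
}

/* Blocking attempt, as if run from async context: may do IO. */
static int sketch_open_blocking(const struct sketch_lookup *l)
{
	(void)l;
	return 3; /* placeholder fd */
}

/* Try inline first; punt to async context only when the lookup would block. */
static int sketch_openat(const struct sketch_lookup *l, bool *went_async)
{
	int ret = sketch_open_nonblock(l);

	*went_async = false;
	if (ret == -EAGAIN) {
		*went_async = true;
		ret = sketch_open_blocking(l);
	}
	return ret;
}
```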
    • io_uring: add support for fallocate() · d63d1b5e
      Committed by Jens Axboe
      This exposes fallocate(2) through io_uring.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: fix compat for IORING_REGISTER_FILES_UPDATE · 1292e972
      Committed by Eugene Syromiatnikov
      The fds field of struct io_uring_files_update is problematic with regard
      to compat user space, as pointer size is different in 32-bit, 32-on-64-bit,
      and 64-bit user space. In order to avoid custom handling of compat in
      the syscall implementation, make fds __u64 and use u64_to_user_ptr in
      order to retrieve it. Also, align the field naturally and check that
      no garbage is passed there.
      
      Fixes: c3a31e60 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE")
      Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
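A user-space mirror of the layout change makes the compat argument concrete. The struct below assumes the shape the commit describes: a 32-bit offset, a reserved word that must be zero, and a naturally aligned 64-bit `fds`. `sketch_u64_to_ptr()` only imitates what the kernel's `u64_to_user_ptr()` does with the value. All `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the fixed layout: the fd array pointer travels as a 64-bit integer. */
struct sketch_files_update {
	uint32_t offset;
	uint32_t resv;            /* reserved, must be zero */
	_Alignas(8) uint64_t fds; /* really an int32_t pointer */
};

_Static_assert(sizeof(struct sketch_files_update) == 16, "same size on all ABIs");
_Static_assert(offsetof(struct sketch_files_update, fds) == 8, "fds at offset 8");

/* Caller side: widen the pointer through uintptr_t. */
static uint64_t sketch_ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

/* Kernel side, in the spirit of u64_to_user_ptr(): narrow back through uintptr_t. */
static const int32_t *sketch_u64_to_ptr(uint64_t v)
{
	return (const int32_t *)(uintptr_t)v;
}

/* Reject garbage in the reserved field. */
static int sketch_files_update_valid(const struct sketch_files_update *u)
{
	return u->resv == 0;
}
```

Because `fds` is always 64 bits at offset 8, the struct is 16 bytes for 32-bit, 32-on-64-bit and 64-bit callers alike. The syscall therefore needs no compat-specific copy routine.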
  2. 17 Jan, 2020: 1 commit
  3. 16 Jan, 2020: 2 commits
  4. 15 Jan, 2020: 1 commit
  5. 14 Jan, 2020: 1 commit
  6. 08 Jan, 2020: 1 commit
    • io_uring: remove punt of short reads to async context · eacc6dfa
      Committed by Jens Axboe
      We currently punt any short read on a regular file to async context,
      but this fails if the short read is due to running into EOF. This is
      especially problematic since we only do the single prep for commands
      now, as we don't reset kiocb->ki_pos. This can result in a 4k read on
      a 1k file returning zero, as we detect the short read and then retry
      from async context. At the time of retry, the position is now 1k, and
      we end up reading nothing, and hence return 0.
      
      Instead of trying to patch around the fact that short reads can be
      legitimate and won't succeed in case of retry, remove the logic to punt
      a short read to async context. Simply return it.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
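The failure mode in the commit message can be reproduced with a toy file. `sketch_read()` is a hypothetical simplified read that advances the position the way `ki_pos` does. The two callers model the old behaviour (punt and retry a short read) and the fixed behaviour (return the short read) for a 4k read on a 1k file.

```c
#include <assert.h>

/* Hypothetical stand-in for a regular file of a given size. */
struct sketch_file {
	long size;
};

/* Simplified read: copies up to len bytes from *pos and advances *pos. */
static long sketch_read(const struct sketch_file *f, long *pos, long len)
{
	long avail = f->size - *pos;
	long n = avail < len ? avail : len;

	if (n < 0)
		n = 0;
	*pos += n;
	return n;
}

/* Old behaviour: retry a short read "from async" without resetting the position. */
static long sketch_read_with_punt(const struct sketch_file *f, long len)
{
	long pos = 0;
	long ret = sketch_read(f, &pos, len);

	if (ret > 0 && ret < len)
		ret = sketch_read(f, &pos, len); /* pos is already at EOF here */
	return ret;
}

/* Fixed behaviour: a short read is legitimate, so return it as-is. */
static long sketch_read_fixed(const struct sketch_file *f, long len)
{
	long pos = 0;

	return sketch_read(f, &pos, len);
}
```

For a 1024-byte file and a 4096-byte read, the first variant returns 0 while the fixed one returns 1024, matching the scenario in the commit message.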
  7. 21 Dec, 2019: 6 commits
    • io_uring: pass in 'sqe' to the prep handlers · 3529d8c2
      Committed by Jens Axboe
      This moves the prep handlers outside of the opcode handlers, and allows
      us to pass in the sqe directly. If the sqe is non-NULL, it means that
      the request should be prepared for the first time.
      
      With the opcode handlers not having access to the sqe at all, we are
      guaranteed that the prep handler has set up the request fully by the
      time we get there. As before, for opcodes that need to copy in more
      data than the io_kiocb allows for, the io_async_ctx holds that info. If
      a prep handler is invoked with req->io set, it must use that to retain
      information for later.
      
      Finally, we can remove io_kiocb->sqe as well.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
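The prep/issue split can be sketched as two functions, where only prep ever sees the SQE. The struct layouts and the field subset are hypothetical. The point is that once prep has copied what it needs, the application may reuse the submission queue slot without affecting the request.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of SQE fields. */
struct sketch_sqe {
	uint8_t opcode;
	int32_t fd;
	uint64_t addr;
	uint32_t len;
};

/* Per-request state: everything the issue path needs, copied from the SQE. */
struct sketch_req {
	uint8_t opcode;
	int32_t fd;
	uint64_t addr;
	uint32_t len;
};

/* Prep: the only function that reads the SQE. */
static void sketch_prep(struct sketch_req *req, const struct sketch_sqe *sqe)
{
	req->opcode = sqe->opcode;
	req->fd = sqe->fd;
	req->addr = sqe->addr;
	req->len = sqe->len;
}

/* Issue: takes no SQE, so it cannot read a slot the application has reused. */
static uint32_t sketch_issue(const struct sketch_req *req)
{
	return req->len; /* stands in for performing the operation */
}
```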
    • io_uring: standardize the prep methods · 06b76d44
      Committed by Jens Axboe
      We currently have a mix of use cases. Most of the newer ones are pretty
      uniform, but we have some older ones that use different calling
      conventions. This is confusing.
      
      For the opcodes that currently rely on the req->io->sqe copy saving
      them from reuse, add a request type struct in the io_kiocb command
      union to store the data they need.
      
      Prepare for all opcodes having a standard prep method, so we can call
      it in a uniform fashion and outside of the opcode handler. This is in
      preparation for passing in the 'sqe' pointer, rather than storing it
      in the io_kiocb. Once we have uniform prep handlers, we can leave all
      the prep work to that part, and not even pass in the sqe to the opcode
      handler. This ensures that we don't reuse sqe data inadvertently.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: read 'count' for IORING_OP_TIMEOUT in prep handler · 26a61679
      Committed by Jens Axboe
      Add the count field to struct io_timeout, and ensure the prep handler
      has read it. Timeout also needs an async context always, set it up
      in the prep handler if we don't have one.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: move all prep state for IORING_OP_{SEND,RECV}_MGS to prep handler · e47293fd
      Committed by Jens Axboe
      Add struct io_sr_msg in our io_kiocb per-command union, and ensure that
      the send/recvmsg prep handlers have grabbed what they need from the SQE
      by the time prep is done.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: move all prep state for IORING_OP_CONNECT to prep handler · 3fbb51c1
      Committed by Jens Axboe
      Add struct io_connect in our io_kiocb per-command union, and ensure
      that io_connect_prep() has grabbed what it needs from the SQE.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: add and use struct io_rw for read/writes · 9adbd45d
      Committed by Jens Axboe
      Put the kiocb in struct io_rw, and add the addr/len for the request as
      well. Use the kiocb->private field for the buffer index for fixed reads
      and writes.
      
      Any use of kiocb->ki_filp is flipped to req->file. It's the same thing,
      and less confusing.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 20 Dec, 2019: 1 commit
  9. 19 Dec, 2019: 2 commits
    • io_uring: io_wq_submit_work() should not touch req->rw · fd6c2e4c
      Committed by Jens Axboe
      I've been chasing a weird and obscure crash that was userspace stack
      corruption, and finally narrowed it down to a bit flip that made a
      stack address invalid. io_wq_submit_work() unconditionally flips
      the req->rw.ki_flags IOCB_NOWAIT bit, but since it's a generic work
      handler, this isn't valid. Normal read/write operations own that
      part of the request; on other types it could be something else.
      
      Move the IOCB_NOWAIT clear to the read/write handlers where it belongs.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
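The kind of corruption the commit describes is easy to reproduce in miniature with a per-command union. The names and the bit used for `SKETCH_NOWAIT` are hypothetical stand-ins, not the kernel's `IOCB_NOWAIT` value. The sketch only shows why a generic handler must not write into one opcode's union member.

```c
#include <assert.h>

/* Hypothetical stand-in for the IOCB_NOWAIT bit in ki_flags. */
#define SKETCH_NOWAIT (1UL << 7)

enum sketch_kind { SKETCH_RW, SKETCH_OTHER };

/* Per-command union: the active opcode owns this storage. */
struct sketch_req {
	enum sketch_kind kind;
	union {
		struct { unsigned long ki_flags; } rw;
		struct { unsigned long user_addr; } other; /* e.g. a user pointer */
	} u;
};

/* Buggy generic handler: clears an rw bit regardless of request type. */
static void sketch_submit_work_buggy(struct sketch_req *req)
{
	req->u.rw.ki_flags &= ~SKETCH_NOWAIT;
}

/* Fixed: only read/write requests touch their own ki_flags. */
static void sketch_submit_work_fixed(struct sketch_req *req)
{
	if (req->kind == SKETCH_RW)
		req->u.rw.ki_flags &= ~SKETCH_NOWAIT;
}
```

The two members have the same type and offset, so the buggy handler does not fault. It silently flips a bit in another opcode's data, which is how a valid address can become an invalid one.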
    • io_uring: don't wait when under-submitting · 7c504e65
      Committed by Pavel Begunkov
      There is no reliable way to submit and wait in a single syscall, as
      io_submit_sqes() may under-consume sqes (in case of an early error).
      Then it will wait for not-yet-submitted requests, deadlocking the user
      in most cases.
      
      Don't wait/poll if we can't submit all sqes.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
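The rule this commit adds can be sketched as a predicate evaluated after submission. `sketch_should_wait()` and its parameters are hypothetical names. The logic models only what the commit message states: if fewer sqes were consumed than requested, don't go on to wait or poll.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Should the enter call go on to wait for min_complete completions?
 * Only if every requested sqe was actually submitted; otherwise the wait
 * could include requests that were never submitted and so never complete.
 */
static bool sketch_should_wait(int to_submit, int submitted, unsigned int min_complete)
{
	if (min_complete == 0)
		return false; /* nothing to wait for */
	return submitted == to_submit;
}
```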
  10. 18 Dec, 2019: 9 commits
  11. 16 Dec, 2019: 2 commits
  12. 12 Dec, 2019: 1 commit
    • io_uring: ensure we return -EINVAL on unknown opcode · 9e3aa61a
      Committed by Jens Axboe
      If we submit an unknown opcode and have fd == -1, io_op_needs_file()
      will return true as we default to needing a file. Then when we go and
      assign the file, we find the 'fd' invalid and return -EBADF. We really
      should be returning -EINVAL for that case, as we normally do for
      unsupported opcodes.
      
      Change io_op_needs_file() to have the following return values:
      
      0   - does not need a file
      1   - does need a file
      < 0 - error value
      
      and use this to pass back the right value for this invalid case.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
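The tri-state return convention can be sketched as follows. The opcode set and helper names are hypothetical. The sketch shows the ordering that matters: classify the opcode first, and validate the fd only when the opcode is known to need one. An unknown opcode with `fd == -1` then reports `-EINVAL` rather than `-EBADF`.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical opcode set for this sketch. */
enum { SKETCH_OP_NOP, SKETCH_OP_READV, SKETCH_OP_TIMEOUT, SKETCH_OP_LAST };

/* Returns 0 (no file needed), 1 (file needed) or a negative errno. */
static int sketch_op_needs_file(unsigned int opcode)
{
	switch (opcode) {
	case SKETCH_OP_NOP:
	case SKETCH_OP_TIMEOUT:
		return 0;
	case SKETCH_OP_READV:
		return 1;
	default:
		return -EINVAL; /* unknown opcode */
	}
}

/* Caller: only look at the fd once the opcode says a file is needed. */
static int sketch_req_set_file(unsigned int opcode, int fd)
{
	int ret = sketch_op_needs_file(opcode);

	if (ret <= 0)
		return ret; /* 0: no file needed; < 0: propagate the error */
	if (fd < 0)
		return -EBADF;
	return 0;
}
```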
  13. 11 Dec, 2019: 1 commit
    • io_uring: add sockets to list of files that support non-blocking issue · 10d59345
      Committed by Jens Axboe
      In chasing a performance issue between using IORING_OP_RECVMSG and
      IORING_OP_READV on sockets, tracing showed that we always punt the
      socket reads to async offload. This is due to io_file_supports_async()
      not checking for S_ISSOCK on the inode. Since sockets support the
      O_NONBLOCK (or MSG_DONTWAIT) flag just fine, add sockets to the list
      of file types that we can do a non-blocking issue to.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
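A simplified version of the file-type check can be written against `mode_t` using the standard `S_IS*()` macros. Only the `S_ISSOCK()` case comes from the commit. Treating regular files and block and character devices as capable of non-blocking issue is an assumption for illustration, and the kernel's real helper may apply further conditions.

```c
#define _XOPEN_SOURCE 700 /* exposes S_IFSOCK / S_ISSOCK on glibc */
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/*
 * May a request on this file be tried non-blocking inline first,
 * or must it always be punted to async context? Simplified model.
 */
static bool sketch_file_supports_async(mode_t mode)
{
	if (S_ISSOCK(mode))
		return true; /* from the commit: sockets honour O_NONBLOCK / MSG_DONTWAIT */
	if (S_ISREG(mode) || S_ISBLK(mode) || S_ISCHR(mode))
		return true; /* assumption for this sketch */
	return false;    /* everything else is always punted */
}
```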