1. 30 Oct 2019 (10 commits)
    •
      io_uring: Fix mm_fault with READ/WRITE_FIXED · 95a1b3ff
      Committed by Pavel Begunkov
      Commit fb5ccc98 ("io_uring: Fix broken links with offloading")
      introduced a potential performance regression by unconditionally
      taking mm even for READ/WRITE_FIXED operations.
      
      Bring back the logic handling it. mm-faulted requests will go through
      the generic submission path, thus honoring links and drains, but will
      fail further on the req->has_user check.
      
      Fixes: fb5ccc98 ("io_uring: Fix broken links with offloading")
      Cc: stable@vger.kernel.org # v5.4
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      95a1b3ff
    •
      io_uring: remove index from sqe_submit · fa456228
      Committed by Pavel Begunkov
      submit->index is used only for an in-bounds check in the submission
      path (i.e. head < ctx->sq_entries). However, it will always be true, as
      1. it's already validated by io_get_sqring()
      2. ctx->sq_entries can't be changed in between, because ctx->uring_lock
      and ctx->refs are held.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      fa456228
    •
      io_uring: add set of tracing events · c826bd7a
      Committed by Dmitrii Dolgov
      To trace io_uring activity one can get information from the workqueue
      and io trace events, but some parts are hard to identify via this
      approach. Making what happens inside io_uring more transparent is
      important for reasoning about many aspects of it, hence introduce a
      set of tracing events.
      
      All such events could be roughly divided into two categories:
      
      * those that help to understand correctness (from both the kernel
        and an application point of view). E.g. ring creation, file
        registration, or waiting for an available CQE. The proposed approach
        is to get a pointer to an original structure of interest (ring
        context, or request), and then find relevant events.
        io_uring_queue_async_work also exposes a pointer to the work_struct,
        to be able to track down corresponding workqueue events.
      
      * those that provide performance-related information. Mostly these are
        events that change the flow of requests, e.g. whether an async work
        item was queued or delayed due to dependencies. Another important
        case is how io_uring optimizations (e.g. registered files) are
        utilized.
      Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c826bd7a
    •
      io_uring: add support for canceling timeout requests · 11365043
      Committed by Jens Axboe
      We might have cases where the need for a specific timeout is gone; add
      support for canceling an existing timeout operation. This works like the
      POLL_REMOVE command, where the application passes in the user_data of
      the timeout it wishes to cancel in the sqe->addr field.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      11365043
    •
      io_uring: add support for absolute timeouts · a41525ab
      Committed by Jens Axboe
      This is a pretty trivial addition on top of the relative timeouts
      we have now, but it's handy for ensuring tighter timing for those
      that are building scheduling primitives on top of io_uring.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a41525ab
    •
      io_uring: replace s->needs_lock with s->in_async · ba5290cc
      Committed by Jackie Liu
      No functional change; just a cleanup that uses s->in_async to make it
      clear where the code is executing.
      Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ba5290cc
    •
      io_uring: allow application controlled CQ ring size · 33a107f0
      Committed by Jens Axboe
      We currently size the CQ ring as twice the SQ ring, to allow some
      flexibility in not overflowing the CQ ring. This is done because the
      SQE lifetime is different from that of the IO request itself; the SQE
      is consumed as soon as the kernel has seen the entry.
      
      Certain applications don't need a huge SQ ring size, since they just
      submit IO in batches. But they may have a lot of requests pending, and
      hence need a big CQ ring to hold them all. By allowing the application
      to control the CQ ring size multiplier, we can cater to those
      applications more efficiently.
      
      If an application wants to define its own CQ ring size, it must set
      IORING_SETUP_CQSIZE in the setup flags, and fill out
      io_uring_params->cq_entries. The value must be a power of two.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      33a107f0
    •
      io_uring: add support for IORING_REGISTER_FILES_UPDATE · c3a31e60
      Committed by Jens Axboe
      Allows the application to remove/replace/add files to/from a file set.
      Passes in a struct:
      
      struct io_uring_files_update {
      	__u32 offset;
      	__s32 *fds;
      };
      
      that holds an array of fds, size of array passed in through the usual
      nr_args part of the io_uring_register() system call. The logic is as
      follows:
      
      1) If ->fds[i] is -1, the existing file at i + ->offset is removed from
         the set.
      2) If ->fds[i] is a valid fd, the existing file at i + ->offset is
         replaced with ->fds[i].
      
      For case #2, if the existing file slot is currently empty (fd == -1),
      the new fd is simply added to the array.
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c3a31e60
    •
      io_uring: allow sparse fixed file sets · 08a45173
      Committed by Jens Axboe
      This is in preparation for allowing updates to fixed file sets without
      requiring a full unregister+register.
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      08a45173
    •
      io_uring: run dependent links inline if possible · ba816ad6
      Committed by Jens Axboe
      Currently any dependent link is executed from a new workqueue context,
      which means that we'll be doing a context switch per link in the chain.
      If we are running the completion of the current request from our async
      workqueue and find that the next request is a link, then run it directly
      from the workqueue context instead of forcing another switch.
      
      This improves the performance of linked SQEs, and reduces the CPU
      overhead.
      Reviewed-by: Jackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ba816ad6
  2. 28 Oct 2019 (2 commits)
  3. 26 Oct 2019 (2 commits)
  4. 25 Oct 2019 (3 commits)
  5. 24 Oct 2019 (3 commits)
  6. 18 Oct 2019 (2 commits)
  7. 15 Oct 2019 (1 commit)
    •
      io_uring: consider the overflow of sequence for timeout req · 5da0fb1a
      Committed by yangerkun
      Now we recalculate the sequence of a timeout with 'req->sequence =
      ctx->cached_sq_head + count - 1', and judge the right place to insert
      into the timeout_list by comparing the number of requests still
      expected to complete. But this does not consider overflow:
      
      1. ctx->cached_sq_head + count - 1 may overflow, so a bigger count for
      the new timeout req can yield a smaller req->sequence.
      
      2. The current cached_sq_head may have overflowed relative to an
      earlier req, which leaves the timeout req with a smaller req->sequence.
      
      This overflow misorders the timeout_list, which can lead to the wrong
      completion order of timeout requests. Fix it by reusing
      req->submit.sequence to store the count, and changing the insertion
      sort logic in io_timeout.
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5da0fb1a
  8. 11 Oct 2019 (1 commit)
  9. 10 Oct 2019 (1 commit)
  10. 08 Oct 2019 (1 commit)
  11. 04 Oct 2019 (1 commit)
  12. 01 Oct 2019 (1 commit)
    •
      io_uring: use __kernel_timespec in timeout ABI · bdf20073
      Committed by Arnd Bergmann
      All system calls use struct __kernel_timespec instead of the old struct
      timespec, but this one was just added with the old-style ABI. Change it
      now to enforce the use of __kernel_timespec, avoiding ABI confusion and
      the need for compat handlers on 32-bit architectures.
      
      Any user space caller will have to use __kernel_timespec now, but this
      is unambiguous and works for any C library regardless of the time_t
      definition. A nicer way to specify the timeout would have been a less
      ambiguous 64-bit nanosecond value, but I suppose it's too late now to
      change that as this would impact both 32-bit and 64-bit users.
      
      Fixes: 5262f567 ("io_uring: IORING_OP_TIMEOUT support")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      bdf20073
  13. 26 Sep 2019 (1 commit)
  14. 25 Sep 2019 (1 commit)
  15. 24 Sep 2019 (2 commits)
  16. 19 Sep 2019 (6 commits)
    •
      io_uring: IORING_OP_TIMEOUT support · 5262f567
      Committed by Jens Axboe
      There's been a few requests for functionality similar to io_getevents()
      and epoll_wait(), where the user can specify a timeout for waiting on
      events. I deliberately did not add support for this through the system
      call initially to avoid overloading the args, but I can see that the use
      cases for this are valid.
      
      This adds support for IORING_OP_TIMEOUT. If a user wants to get woken
      when waiting for events, simply submit one of these timeout commands
      with your wait call (or before). This ensures that the application
      sleeping on the CQ ring waiting for events will get woken. The timeout
      command is passed in as a pointer to a struct timespec. Timeouts are
      relative. The timeout command also includes a way to auto-cancel after
      N events have passed.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5262f567
    •
      io_uring: use cond_resched() in sqthread · 9831a90c
      Committed by Jens Axboe
      If preempt isn't enabled in the kernel, we can run into hang issues with
      sqthread submissions. Use cond_resched() to play nice instead of
      cpu_relax(), if we end up starting the loop and not having any events
      pending for submissions.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9831a90c
    •
      io_uring: fix potential crash issue due to io_get_req failure · a1041c27
      Committed by Jackie Liu
      Sometimes io_get_req will return NULL, and then we need to do the
      correct error handling; otherwise it will cause a kernel NULL
      pointer dereference.
      
      Fixes: 4fe2c963 ("io_uring: add support for link with drain")
      Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a1041c27
    •
      io_uring: ensure poll commands clear ->sqe · 6cc47d1d
      Committed by Jens Axboe
      If we end up getting woken in poll (due to a signal), then we may need
      to punt the poll request to an async worker. When we do that, we look
      up the list to queue at, dereferencing req->submit.sqe; however, that
      is only set for requests we initially decided to queue async.
      
      This fixes a crash with poll command usage and wakeups that need to punt
      to async context.
      
      Fixes: 54a91f3b ("io_uring: limit parallelism of buffered writes")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6cc47d1d
    •
      io_uring: fix use-after-free of shadow_req · 5f5ad9ce
      Committed by Jackie Liu
      There is a potential dangling pointer problem: we never clear
      shadow_req. If there are multiple link lists in this series of sqes,
      shadow_req is not reallocated and continues to point at the last one,
      whose memory has already been released, thus leaving a dangling
      pointer. Clean it up and make sure that every new link list gets a
      freshly allocated shadow_req.
      
      Fixes: 4fe2c963 ("io_uring: add support for link with drain")
      Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5f5ad9ce
    •
      io_uring: use kmemdup instead of kmalloc and memcpy · 954dab19
      Committed by Jackie Liu
      Just a code cleanup; no functional change.
      Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      954dab19
  17. 15 Sep 2019 (1 commit)
  18. 13 Sep 2019 (1 commit)