1. 06 July 2020, 3 commits
• io_uring: fix mis-refcounting linked timeouts · 6df1db6b
Committed by Pavel Begunkov
io_prep_linked_timeout() sets REQ_F_LINK_TIMEOUT, altering the refcounting of
the following linked request. After that, someone must call
io_queue_linked_timeout(), otherwise the submission reference of the linked
timeout will never be dropped.
      
That's what happens in io_steal_work() if io-wq decides to postpone the linked
request with io_wqe_enqueue(). io_queue_linked_timeout() can also potentially
be called twice without synchronisation during re-submission, e.g. in
io_rw_resubmit().
      
The rule is that whoever called io_prep_linked_timeout() must also call
io_queue_linked_timeout(). To avoid doing it twice, io_prep_linked_timeout()
returns non-NULL only for the first call; that's controlled by the
REQ_F_LINK_TIMEOUT flag.
      
      Also kill REQ_F_QUEUE_TIMEOUT.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
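A minimal sketch of that pairing rule, using simplified stand-in types and hypothetical helper bodies rather than the actual fs/io_uring.c code:

#define REQ_F_LINK_TIMEOUT	(1U << 0)

struct io_kiocb {
	unsigned int	flags;
	struct io_kiocb	*link;		/* next request in the link chain */
	int		is_link_timeout;/* stand-in for the opcode check */
};

static void io_queue_linked_timeout(struct io_kiocb *t)
{
	/* arm the timer for the linked timeout; elided in this sketch */
}

static struct io_kiocb *io_prep_linked_timeout(struct io_kiocb *req)
{
	struct io_kiocb *nxt = req->link;

	/* Only the first caller gets the timeout back; later callers see
	 * REQ_F_LINK_TIMEOUT already set and get NULL, so the timeout
	 * cannot be queued twice. */
	if (!nxt || !nxt->is_link_timeout || (req->flags & REQ_F_LINK_TIMEOUT))
		return NULL;

	req->flags |= REQ_F_LINK_TIMEOUT;
	return nxt;
}

/* Caller pattern: whoever prepped the timeout must also queue it. */
static void io_issue_one(struct io_kiocb *req)
{
	struct io_kiocb *timeout = io_prep_linked_timeout(req);

	/* ... submit req here ... */
	if (timeout)
		io_queue_linked_timeout(timeout);
}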
• io_uring: use new io_req_task_work_add() helper throughout · c2c4c83c
Committed by Jens Axboe
Since we now have this helper in the 5.9 branch, convert the existing users of
task_work_add() to use it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
• io_uring: abstract out task work running · 4c6e277c
Committed by Jens Axboe
Provide a helper to run task_work instead of checking and running it
manually in a bunch of different spots. While doing so, also move the
task run-state setting to where we actually run the task work, so it can
be moved out of the callback helpers. This also helps ensure we only do
this once per task_work list run, not once per task_work item.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
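A minimal sketch of such a helper (simplified; the name and exact checks are assumptions, the point is the single state reset per list run):

#include <linux/sched.h>
#include <linux/task_work.h>

static inline bool io_run_task_work(void)
{
	if (current->task_works) {
		/* set the run state once per list run, not once per item */
		__set_current_state(TASK_RUNNING);
		task_work_run();
		return true;
	}
	return false;
}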
2. 05 July 2020, 1 commit
• io_uring: fix regression with always ignoring signals in io_cqring_wait() · b7db41c9
Committed by Jens Axboe
When switching to TWA_SIGNAL for task_work notifications, we also made
any signal-based condition in io_cqring_wait() return -ERESTARTSYS.
This breaks applications that rely on using signals to abort a task
waiting for events.
      
      Check if we have a signal pending because of queued task_work, and
      repeat the signal check once we've run the task_work. This provides a
      reliable way of telling the two apart.
      
      Additionally, only use TWA_SIGNAL if we are using an eventfd. If not,
      we don't have the dependency situation described in the original commit,
      and we can get by with just using TWA_RESUME like we previously did.
      
      Fixes: ce593a6c ("io_uring: use signal based task_work running")
      Cc: stable@vger.kernel.org # v5.7
Reported-by: Andres Freund <andres@anarazel.de>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
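A minimal sketch of the wait-side logic described above (hypothetical helper name, simplified control flow; io_run_task_work() is the helper sketched under the earlier commit):

#include <linux/errno.h>
#include <linux/sched/signal.h>

static int io_wait_check_signal(void)
{
	do {
		if (!signal_pending(current))
			return 0;		/* nothing pending, keep waiting */
		/* A pending "signal" may just be TWA_SIGNAL task_work; run it
		 * and check again. Only a signal still pending after the
		 * task_work has run is treated as a real interruption. */
		if (!io_run_task_work())
			return -ERESTARTSYS;
	} while (1);
}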
3. 01 July 2020, 1 commit
• io_uring: use signal based task_work running · ce593a6c
Committed by Jens Axboe
      Since 5.7, we've been using task_work to trigger async running of
      requests in the context of the original task. This generally works
      great, but there's a case where if the task is currently blocked
      in the kernel waiting on a condition to become true, it won't process
      task_work. Even though the task is woken, it just checks whatever
      condition it's waiting on, and goes back to sleep if it's still false.
      
This is a problem if that very condition only becomes true when that
task_work is run. An example of that is the task registering an eventfd
with io_uring, and it's now blocked waiting on an eventfd read. That
read could depend on a completion event, and that completion event
won't get triggered until task_work has been run.
      
      Use the TWA_SIGNAL notification for task_work, so that we ensure that
      the task always runs the work when queued.
      
      Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Jens Axboe <axboe@kernel.dk>
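A minimal sketch of the difference (the TWA_RESUME and TWA_SIGNAL notify modes are real; the wrapper itself is a hypothetical simplification):

#include <linux/sched.h>
#include <linux/task_work.h>

/*
 * TWA_RESUME only marks the work for the return-to-userspace path, so a
 * task blocked in the kernel may never run it. TWA_SIGNAL makes the
 * pending work look like a signal, kicking the task out of interruptible
 * waits (e.g. an eventfd read) so the queued work actually runs.
 */
static int io_task_work_notify(struct task_struct *tsk,
			       struct callback_head *cb, bool use_signal)
{
	return task_work_add(tsk, cb, use_signal ? TWA_SIGNAL : TWA_RESUME);
}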
4. 30 June 2020, 15 commits
5. 29 June 2020, 1 commit
6. 28 June 2020, 14 commits
7. 27 June 2020, 3 commits
• io_uring: fix function args for !CONFIG_NET · 1e16c2f9
Committed by Randy Dunlap
      Fix build errors when CONFIG_NET is not set/enabled:
      
      ../fs/io_uring.c:5472:10: error: too many arguments to function ‘io_sendmsg’
      ../fs/io_uring.c:5474:10: error: too many arguments to function ‘io_send’
      ../fs/io_uring.c:5484:10: error: too many arguments to function ‘io_recvmsg’
      ../fs/io_uring.c:5486:10: error: too many arguments to function ‘io_recv’
      ../fs/io_uring.c:5510:9: error: too many arguments to function ‘io_accept’
      ../fs/io_uring.c:5518:9: error: too many arguments to function ‘io_connect’
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: io-uring@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
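A minimal sketch of the pattern being fixed, showing one opcode (signatures simplified; struct io_kiocb and struct io_comp_state are the fs/io_uring.c types): the !CONFIG_NET stubs must keep the same argument list as the real handlers, otherwise the call sites fail to compile exactly as in the errors above.

#include <linux/errno.h>

#if defined(CONFIG_NET)
static int io_sendmsg(struct io_kiocb *req, bool force_nonblock,
		      struct io_comp_state *cs)
{
	/* real implementation */
	return 0;
}
#else /* !CONFIG_NET */
static int io_sendmsg(struct io_kiocb *req, bool force_nonblock,
		      struct io_comp_state *cs)
{
	/* same signature as the real handler, just reject the request */
	return -EOPNOTSUPP;
}
#endif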
• io-wq: return next work from ->do_work() directly · f4db7182
Committed by Pavel Begunkov
      It's easier to return next work from ->do_work() than
      having an in-out argument. Looks nicer and easier to compile.
      Also, merge io_wq_assign_next() into its only user.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
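A minimal sketch of the interface change (hypothetical typedef and function names, simplified):

struct io_wq_work;

/* before: the next work item was passed back through an in-out argument */
typedef void (*io_wq_do_work_old)(struct io_wq_work **workptr);

/* after: ->do_work() just returns the next work item (or NULL) */
typedef struct io_wq_work *(*io_wq_do_work_new)(struct io_wq_work *work);

static void io_worker_handle_work(struct io_wq_work *work,
				  io_wq_do_work_new do_work)
{
	/* the worker loop can now simply chain on the return value */
	while (work)
		work = do_work(work);
}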
• io_uring: use task_work for links if possible · c40f6379
Committed by Jens Axboe
Currently links are always done in an async fashion, unless we catch them
inline after we successfully complete a request without having to resort
to blocking. This isn't necessarily the most efficient approach; it would
be better if we could just use the task_work handling for this.
      
      Outside of saving an async jump, we can also do less prep work for these
      kinds of requests.
      
Running dependent links from the task_work handler yields some nice
performance benefits. As an example, examples/link-cp from the liburing
repository uses read+write links to implement a copy operation. Without
this patch, a cache-cold read of a 4G file from a VM runs in about 3
seconds:
      
      $ time examples/link-cp /data/file /dev/null
      
      real	0m2.986s
      user	0m0.051s
      sys	0m2.843s
      
      and a subsequent cache hot run looks like this:
      
      $ time examples/link-cp /data/file /dev/null
      
      real	0m0.898s
      user	0m0.069s
      sys	0m0.797s
      
      With this patch in place, the cold case takes about 2.4 seconds:
      
      $ time examples/link-cp /data/file /dev/null
      
      real	0m2.400s
      user	0m0.020s
      sys	0m2.366s
      
      and the cache hot case looks like this:
      
      $ time examples/link-cp /data/file /dev/null
      
      real	0m0.676s
      user	0m0.010s
      sys	0m0.665s
      
      As expected, the (mostly) cache hot case yields the biggest improvement,
      running about 25% faster with this change, while the cache cold case
      yields about a 20% increase in performance. Outside of the performance
      increase, we're using less CPU as well, as we're not using the async
      offload threads at all for this anymore.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
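A minimal sketch of the mechanism (stand-in struct fields and hypothetical helpers; the real code has more fallback and refcount handling):

#include <linux/kernel.h>
#include <linux/task_work.h>

struct io_kiocb {
	struct callback_head	task_work;
	struct task_struct	*task;	/* task that submitted the request */
};

static void io_issue_link(struct io_kiocb *req)
{
	/* re-issue the next request in the link; elided in this sketch */
}

static void io_queue_async_work(struct io_kiocb *req)
{
	/* punt to the io-wq offload threads; elided in this sketch */
}

static void io_req_task_submit(struct callback_head *cb)
{
	struct io_kiocb *req = container_of(cb, struct io_kiocb, task_work);

	/* runs in the context of the original task, no io-wq thread needed */
	io_issue_link(req);
}

/* On completion of one link member, hand the next one to task_work
 * instead of always punting it to the async offload threads. */
static void io_queue_next_link(struct io_kiocb *nxt)
{
	init_task_work(&nxt->task_work, io_req_task_submit);
	if (task_work_add(nxt->task, &nxt->task_work, TWA_SIGNAL))
		io_queue_async_work(nxt);	/* task is exiting, fall back */
}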
8. 25 June 2020, 2 commits
• io_uring: enable READ/WRITE to use deferred completions · a1d7c393
Committed by Jens Axboe
A bit more surgery is required here, as completions are generally done
through the kiocb->ki_complete() callback, even if they complete inline.
This enables the regular read/write path to use the io_comp_state
logic to batch inline completions.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
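A minimal sketch of that routing (hypothetical helper names; the real path goes through the ->ki_complete() callback and the fs/io_uring.c types):

static void kiocb_done(struct io_kiocb *req, long ret,
		       struct io_comp_state *cs)
{
	if (cs)
		io_req_complete_state(req, ret, cs);	/* defer into the batch */
	else
		io_req_complete(req, ret);		/* post the CQE right away */
}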
• io_uring: pass in completion state to appropriate issue side handlers · 229a7b63
Committed by Jens Axboe
      Provide the completion state to the handlers that we know can complete
      inline, so they can utilize this for batching completions.
      
      Cap the max batch count at 32. This should be enough to provide a good
      amortization of the cost of the lock+commit dance for completions, while
      still being low enough not to cause any real latency issues for SQPOLL
      applications.
      
      Xuan Zhuo <xuanzhuo@linux.alibaba.com> reports that this changes his
      profile from:
      
      17.97% [kernel] [k] copy_user_generic_unrolled
      13.92% [kernel] [k] io_commit_cqring
      11.04% [kernel] [k] __io_cqring_fill_event
      10.33% [kernel] [k] udp_recvmsg
       5.94% [kernel] [k] skb_release_data
       4.31% [kernel] [k] udp_rmem_release
       2.68% [kernel] [k] __check_object_size
       2.24% [kernel] [k] __slab_free
       2.22% [kernel] [k] _raw_spin_lock_bh
       2.21% [kernel] [k] kmem_cache_free
       2.13% [kernel] [k] free_pcppages_bulk
       1.83% [kernel] [k] io_submit_sqes
       1.38% [kernel] [k] page_frag_free
       1.31% [kernel] [k] inet_recvmsg
      
      to
      
      19.99% [kernel] [k] copy_user_generic_unrolled
      11.63% [kernel] [k] skb_release_data
       9.36% [kernel] [k] udp_rmem_release
       8.64% [kernel] [k] udp_recvmsg
       6.21% [kernel] [k] __slab_free
       4.39% [kernel] [k] __check_object_size
       3.64% [kernel] [k] free_pcppages_bulk
       2.41% [kernel] [k] kmem_cache_free
       2.00% [kernel] [k] io_submit_sqes
       1.95% [kernel] [k] page_frag_free
       1.54% [kernel] [k] io_put_req
      [...]
       0.07% [kernel] [k] io_commit_cqring
       0.44% [kernel] [k] __io_cqring_fill_event
Signed-off-by: Jens Axboe <axboe@kernel.dk>
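A minimal sketch of the batching itself (stand-in lock and stub helpers in place of the fs/io_uring.c internals; the 32 cap is from the commit message):

#include <linux/spinlock.h>

#define IO_COMP_BATCH	32

struct io_kiocb;			/* stand-in, opaque here */

struct io_comp_state {
	unsigned int	nr;
	struct io_kiocb	*reqs[IO_COMP_BATCH];
	long		res[IO_COMP_BATCH];
};

static DEFINE_SPINLOCK(completion_lock);	/* stand-in for ctx->completion_lock */

static void fill_cqe(struct io_kiocb *req, long res)
{
	/* stand-in for __io_cqring_fill_event(); writes one CQE */
}

static void commit_cqring(void)
{
	/* stand-in for io_commit_cqring(); publishes the CQ tail once */
}

static void io_flush_completions(struct io_comp_state *cs)
{
	unsigned int i;

	/* one lock + commit dance for up to 32 completions */
	spin_lock_irq(&completion_lock);
	for (i = 0; i < cs->nr; i++)
		fill_cqe(cs->reqs[i], cs->res[i]);
	commit_cqring();
	spin_unlock_irq(&completion_lock);
	cs->nr = 0;
}

static void io_req_complete_state(struct io_kiocb *req, long res,
				  struct io_comp_state *cs)
{
	cs->res[cs->nr] = res;
	cs->reqs[cs->nr++] = req;
	if (cs->nr == IO_COMP_BATCH)
		io_flush_completions(cs);
}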