Commits · 1cce17aca621c38c657410dc278a48cda982dd2e · openeuler / Kernel

19 Oct, 2021 · 25 commits

io_uring: don't pass tail into io_free_batch_list · 1cce17ac

Committed by Pavel Begunkov on Sep 24, 2021

io_free_batch_list() iterates over all requests in the passed-in list,
so we don't really need to know the tail and can keep iterating until
we meet NULL. Just pass the first node into it; that is enough.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4a12c84b6d887d980e05f417ba4172d04c64acae.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
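
The idea of walking a NULL-terminated list without a tail pointer can be sketched in userspace C. This is only an illustration under stated assumptions: `struct list_node`, `struct request`, `make_req()` and `free_batch_list()` are hypothetical stand-ins, the reference count is a plain `int`, and none of it is the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
struct list_node {
    struct list_node *next;
};

struct request {
    int refs;                    /* simplified, non-atomic reference count */
    struct list_node comp_list;  /* link into the completion batch */
};

/* Allocate a request with `refs` references, linked in front of `next`. */
static struct request *make_req(int refs, struct list_node *next)
{
    struct request *req = malloc(sizeof(*req));

    assert(req != NULL);
    req->refs = refs;
    req->comp_list.next = next;
    return req;
}

/*
 * Drop one reference on every request reachable from `node`.
 * The walk ends at the NULL terminator, so no tail pointer is passed in.
 * Returns how many requests were actually freed.
 */
static int free_batch_list(struct list_node *node)
{
    int freed = 0;

    while (node) {
        struct request *req = (struct request *)
            ((char *)node - offsetof(struct request, comp_list));

        /* Advance first: req may be freed below. */
        node = node->next;
        if (--req->refs == 0) {
            free(req);
            freed++;
        }
    }
    return freed;
}
```

A caller would build a chain with `make_req()` and hand only the first node to `free_batch_list()`; requests that still hold extra references survive the walk.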

io_uring: inline completion batching helpers · d4b7a5ef

Committed by Pavel Begunkov on Sep 24, 2021

We now have a single function for batched put of requests, just inline
struct req_batch and all related helpers into it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/595a2917f80dd94288cd7203052c7934f5446580.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: optimise batch completion · f5ed3bcd

Committed by Pavel Begunkov on Sep 24, 2021

First, convert the rest of the iopoll bits to singly linked lists, and also
replace per-request list_add_tail() with splicing a part of the slist.

With that, use io_free_batch_list() to put/free requests. The main
advantage is that it's now the only user of struct req_batch and
friends, and so they can be inlined. The main overhead there was the
per-request call to the non-inlined io_req_free_batch(), which is
expensive enough.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b37fc6d5954b241e025eead7ab92c6f44a42f229.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: convert iopoll_completed to store_release · b3fa03fd

Committed by Pavel Begunkov on Sep 24, 2021

Convert the explicit barrier around iopoll_completed to smp_load_acquire()
and smp_store_release(). Similar on the callback side, but this replaces a
single smp_rmb() with a per-request smp_load_acquire(); neither implies any
extra CPU ordering on x86. Use READ_ONCE as usual where it doesn't
matter.

Use it to move filling CQEs by iopoll earlier, which will be necessary
to avoid traversing the list one extra time in the future.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8bd663cb15efdc72d6247c38ee810964e744a450.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
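
The release/acquire pairing described above has a close userspace analogue in C11 atomics. The sketch below is an assumption-laden illustration, not kernel code: it uses `atomic_store_explicit()` and `atomic_load_explicit()` from `<stdatomic.h>` in place of smp_store_release() and smp_load_acquire(), and `struct polled_req`, `complete_req()` and `reap_req()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request; field names are illustrative only. */
struct polled_req {
    int result;             /* plain data, written before publication */
    atomic_bool completed;  /* publication flag */
};

/* Completion side: write the result, then publish it with release ordering. */
static void complete_req(struct polled_req *req, int res)
{
    assert(req != NULL);
    req->result = res;
    atomic_store_explicit(&req->completed, true, memory_order_release);
}

/*
 * Reaping side: the acquire load pairs with the release store above, so
 * once `completed` is seen as true the result is guaranteed to be visible.
 * Returns true and fills *res when the request has completed.
 */
static bool reap_req(struct polled_req *req, int *res)
{
    if (!atomic_load_explicit(&req->completed, memory_order_acquire))
        return false;
    *res = req->result;
    return true;
}
```

The acquire load is done per request, which mirrors the trade described in the message: one ordered load per request instead of a single barrier for the whole batch.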

io_uring: add a helper for batch free · 3aa83bfb

Committed by Pavel Begunkov on Sep 24, 2021

Add a helper io_free_batch_list(), which takes a singly linked list and
puts/frees all requests from it in an efficient manner. Will be reused
later.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4fc8306b542c6b1dd1d08e8021ef3bdb0ad15010.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: use single linked list for iopoll · 5eef4e87

Committed by Pavel Begunkov on Sep 24, 2021

Use singly linked lists for keeping iopoll requests. They take less space
and may be faster, but mostly this will benefit further patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/314033676b100cd485518c3bc55e1b95a0dcd71f.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: split iopoll loop · e3f721e6

Committed by Pavel Begunkov on Sep 24, 2021

The main loop of io_do_iopoll() iterates and does ->iopoll() until it
meets the first completed request, then it continues from that position
and splices requests to pass them through io_iopoll_complete().

Split the loop in two for clarity: iopolling, and reaping completed
requests from the list.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a7f6fd27a94845e5dc925a47a4a9765a92e514fb.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: replace list with stack for req caches · c2b6c6bc

Committed by Pavel Begunkov on Sep 24, 2021

Replace struct list_head free_list, used for caching requests, with a
singly linked stack, which is faster.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1bc942b82422fb2624b8353bd93aca183a022846.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
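
A singly linked stack needs only one pointer write per push or pop, which is one plausible reason it beats a doubly linked list for a free-object cache. The following is a minimal userspace sketch; `struct cache_node`, `struct req_cache`, `cache_push()` and `cache_pop()` are hypothetical names, not the kernel's helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive node and cache head; names are illustrative. */
struct cache_node {
    struct cache_node *next;
};

struct req_cache {
    struct cache_node *top;  /* NULL when the cache is empty */
};

/* Push a free object onto the cache: two pointer stores, no prev links. */
static void cache_push(struct req_cache *cache, struct cache_node *node)
{
    assert(node != NULL);
    node->next = cache->top;
    cache->top = node;
}

/* Pop the most recently freed object, or NULL if the cache is empty. */
static struct cache_node *cache_pop(struct req_cache *cache)
{
    struct cache_node *node = cache->top;

    if (node)
        cache->top = node->next;
    return node;
}
```

Objects come back in last-in, first-out order, so the most recently freed (and likely still cache-hot) object is reused first.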

io_uring: remove allocation cache array · 3ab665b7

Committed by Pavel Begunkov on Sep 24, 2021

We have several request allocation layers. Remove the last one, which
is the submit->reqs array, and always use submit->free_reqs instead.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8547095c35f7a87bab14f6447ecd30a273ed7500.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: use slist for completion batching · 6f33b0bc

Committed by Pavel Begunkov on Sep 24, 2021

Currently we collect requests for completion batching in an array.
Replace it with a singly linked list. It's as fast as arrays but
doesn't take so much space in ctx, and will be used in future patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a666826f2854d17e9fb9417fb302edfeb750f425.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: make io_do_iopoll return number of reqs · 5ba3c874

Committed by Pavel Begunkov on Sep 24, 2021

Don't pass the nr_events pointer around but return the count directly;
it's less expensive than pointer increments.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f771a8153a86f16f12ff4272524e9e549c5de40b.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: force_nonspin · 87a115fb

Committed by Pavel Begunkov on Sep 24, 2021

We don't really need to pass the number of requests to complete into
io_do_iopoll(); a flag saying whether to enforce non-spin mode is enough.

Should be straightforward, except maybe for io_iopoll_check(). We pass !min
there, because we never enter with the number of already reaped
requests larger than the specified @min, apart from the first
iteration, where nr_events is 0, and so the final check should be
identical.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/782b39d1d8ec584eae15bca0a1feb6f0571fe5b8.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: mark having different creds unlikely · 6878b40e

Committed by Pavel Begunkov on Sep 24, 2021

Hint to the compiler that a request is unlikely to have creds different
from current attached to it. The current code generation is far from
ideal; hopefully this can help some compilers remove duplicated jump
tables and the like.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e7815251ac4bf5a4a23d298c752f029ae19f3837.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
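
As a rough userspace illustration of such a branch hint, the sketch below defines `unlikely()` on top of the GCC/Clang builtin `__builtin_expect()`. The `struct cred` layout and the `pick_creds()` helper are assumptions made for the example and do not reflect the kernel's credential handling.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hint as commonly defined for GCC and Clang. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical, heavily simplified credentials. */
struct cred {
    int uid;
};

/* Return the credentials a request should run with. */
static const struct cred *pick_creds(const struct cred *req_creds,
                                     const struct cred *current_creds)
{
    assert(current_creds != NULL);

    /* Rare path: the request carries its own, different credentials. */
    if (unlikely(req_creds != NULL && req_creds != current_creds))
        return req_creds;

    /* Common path: run with the submitting task's credentials. */
    return current_creds;
}
```

The hint changes only code layout and static branch prediction, never the result, so the function behaves identically with or without it.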

io_uring: return boolean value for io_alloc_async_data · 8d4af685

Committed by Hao Xu on Sep 22, 2021

A boolean return value is good enough for io_alloc_async_data().

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101522.9179-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: optimise io_req_init() sqe flags checks · 68fe256a

Committed by Pavel Begunkov on Sep 15, 2021

IOSQE_IO_DRAIN is quite marginal and we don't care too much about
IOSQE_BUFFER_SELECT. Save two ifs and hide both of them under the
SQE_VALID_FLAGS check. Now we first check whether it uses a "safe"
subset, i.e. without DRAIN and BUFFER_SELECT, and only if that's not
true do we test the rest of the flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dccfb9ab2ab0969a2d8dc59af88fa0ce44eeb1d5.1631703764.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
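
The "test the safe subset first" pattern can be sketched as follows. The flag values, the `SAFE_FLAGS`/`VALID_FLAGS` masks and `check_flags()` are hypothetical and chosen only for the example; the real IOSQE_* constants and the error handling in io_req_init() may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the real IOSQE_* constants differ. */
#define FLAG_FIXED_FILE    (1u << 0)
#define FLAG_DRAIN         (1u << 1)
#define FLAG_LINK          (1u << 2)
#define FLAG_BUFFER_SELECT (1u << 3)

#define SAFE_FLAGS  (FLAG_FIXED_FILE | FLAG_LINK)
#define VALID_FLAGS (SAFE_FLAGS | FLAG_DRAIN | FLAG_BUFFER_SELECT)

/*
 * Validate submission flags. Returns 0 on success or a negative errno,
 * and reports through *drain whether the drain flag was requested.
 */
static int check_flags(unsigned int flags, bool op_has_buffer_select,
                       bool *drain)
{
    assert(drain != NULL);
    *drain = false;

    /* Fast path: only "safe" flags are set, so one test is enough. */
    if ((flags & ~SAFE_FLAGS) == 0)
        return 0;

    /* Slow path: the rare flags are only examined here. */
    if (flags & ~VALID_FLAGS)
        return -EINVAL;
    if ((flags & FLAG_BUFFER_SELECT) && !op_has_buffer_select)
        return -EOPNOTSUPP;
    if (flags & FLAG_DRAIN)
        *drain = true;
    return 0;
}
```

Requests that use only common flags pay for a single mask test; the invalid-flag, drain and buffer-select checks all move behind it.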

io_uring: remove ctx referencing from complete_post · a3f34907

Committed by Pavel Begunkov on Sep 15, 2021

Now completions are done from task context, which means it's either
the task itself, task_work or an io-wq worker. In all those cases the ctx
will stay alive through mutexing, explicit referencing or req references
held by iowq. Remove the extra ctx pinning from io_req_complete_post().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/60a0e96434c16ab4fe587651448290d61ec9a113.1631703756.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add more uring info to fdinfo for debug · 83f84356

Committed by Hao Xu on Sep 13, 2021

Developers may need some uring info to help themselves debug and address
issues in production. This includes sqring/cqring head/tail and the
detailed sqe/cqe info, which is very useful when an application is hung
on a ring.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210913130854.38542-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: kill extra wake_up_process in tw add · d97ec623

Committed by Pavel Begunkov on Sep 08, 2021

TWA_SIGNAL already wakes the thread, so there is no need for
wake_up_process() after it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7e90cf643f633e857443e0c9e72471b221735c50.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: dedup CQE flushing non-empty checks · c450178d

Committed by Pavel Begunkov on Sep 08, 2021

We don't do io_submit_flush_completions() when there are no requests
enqueued, and every single caller checks for it. Hide that check inside
the function, not forgetting about inlining. That will make it much
easier to change the empty check condition in the future.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d7ff8cef5da1b38e8ea648f5aad9a315ddfc7b57.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: inline linked part of io_req_find_next · d81499bf

Committed by Pavel Begunkov on Sep 08, 2021

Inline part of __io_req_find_next() that returns a request but doesn't
need io_disarm_next(). It's just two places, but makes links a bit
faster.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4126d13f23d0e91b39b3558e16bd86cafa7fcef2.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: inline io_dismantle_req · 6b639522

Committed by Pavel Begunkov on Sep 08, 2021

io_dismantle_req() is hot, and not _too_ huge. Inline it, there are 3
call sites, which hopefully will turn into 2 in the future.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bdd2dc30716cac270c2403e99bccd6286e4ae201.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: kill off ios_left · 4b628aeb

Committed by Pavel Begunkov on Sep 08, 2021

->ios_left is only used to decide whether to plug or not. Kill it to
avoid this extra accounting and just use the initial submission number.
There is not much difference with regard to enabling plugging: this
one does it in a few more cases, but all major ones should be covered
well.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f13993bcf5b477f9a7d52881fc49f9457ea9870a.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: dump sqe contents if issue fails · a87acfde

Committed by Jens Axboe on Sep 11, 2021

I recently had to look at a production problem where a request ended
up getting the dreaded -EINVAL error on submit. The most used and
hence useless of error codes, as it just tells you that something
was wrong with your request, but not more than that.

Let's dump the full sqe contents if we run into an issue failure,
that'll allow easier diagnosing of a wide variety of issues.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: utilize the io batching infrastructure for more efficient polled IO · b688f11e

Committed by Jens Axboe on Oct 12, 2021

Wire up using an io_comp_batch for f_op->iopoll(). If the lower stack
supports it, we can handle high rates of polled IO more efficiently.

This raises the single core efficiency on my system from ~6.1M IOPS to
~6.6M IOPS running a random read workload at depth 128 on two gen2
Optane drives.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: add a struct io_comp_batch argument to fops->iopoll() · 5a72e899

Committed by Jens Axboe on Oct 12, 2021

struct io_comp_batch contains a list head and a completion handler, which
will allow batches of IO to be completed more efficiently.

For now, no functional changes in this patch, we just define the
io_comp_batch structure and add the argument to the file_operations iopoll
handler.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

18 Oct, 2021 · 3 commits

io_uring: don't sleep when polling for I/O · d729cf9a

Committed by Christoph Hellwig on Oct 12, 2021

There is no point in sleeping for the expected I/O completion timeout
in the io_uring async polling model as we never poll for a specific
I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: replace the spin argument to blk_iopoll with a flags argument · ef99b2d3

Committed by Christoph Hellwig on Oct 12, 2021

Switch the boolean spin argument to blk_poll to passing a set of flags
instead. This will allow controlling polling behavior in a more
fine-grained way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de
[axboe: adapt to changed io_uring iopoll]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
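
Replacing a boolean parameter with a flags word can be illustrated with a toy polling loop. Everything here is an assumption made for the example: `POLL_ONESHOT`, `POLL_NOSLEEP`, `struct fake_queue`, `poll_once()` and `poll_queue()` are hypothetical and only mimic the shape of such an interface, not the block layer's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; a flags word leaves room for more behaviors. */
#define POLL_ONESHOT (1u << 0)  /* poll once, do not spin */
#define POLL_NOSLEEP (1u << 1)  /* unused here, shows how options are added */

/* Toy queue: a completion shows up after a fixed number of polls. */
struct fake_queue {
    int polls_until_done;
};

/* One polling pass: returns 1 if a completion was found, else 0. */
static int poll_once(struct fake_queue *q)
{
    if (q->polls_until_done > 0) {
        q->polls_until_done--;
        return 0;
    }
    return 1;
}

/* Poll the queue; spin until a completion unless POLL_ONESHOT is set. */
static int poll_queue(struct fake_queue *q, unsigned int flags)
{
    assert(q != NULL);
    for (;;) {
        int found = poll_once(q);

        if (found || (flags & POLL_ONESHOT))
            return found;
    }
}
```

With a boolean, adding a second behavior would have meant another parameter at every call site; with flags, callers simply OR in a new bit.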

io_uring: fix a layering violation in io_iopoll_req_issued · 30da1b45

Committed by Christoph Hellwig on Oct 12, 2021

syscall-level code can't just poke into the details of the poll cookie,
which is private information of the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012111226.760968-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

14 Oct, 2021 · 1 commit

io_uring: fix wrong condition to grab uring lock · 14cfbb7a

Committed by Hao Xu on Oct 14, 2021

Grab the uring lock when we are in io-worker rather than in the original
or system-wq context, since we already hold it in these two situations.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Fixes: b66ceaf3 ("io_uring: move iopoll reissue into regular IO path")
Link: https://lore.kernel.org/r/20211014140400.50235-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

02 Oct, 2021 · 1 commit

io_uring: kill fasync · 3f008385

Committed by Pavel Begunkov on Oct 01, 2021

We have never supported fasync properly: it would only fire when there
is something polling io_uring, making it useless. The original support came
in through the initial io_uring merge for 5.1. Since it's broken and
nobody has reported it, get rid of the fasync bits.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2f7ca3d344d406d34fa6713824198915c41cea86.1633080236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

25 Sep, 2021 · 8 commits

io_uring: make OP_CLOSE consistent with direct open · 7df778be

Committed by Pavel Begunkov on Sep 24, 2021

Open/accept have recently become able to manipulate the fixed file table,
but it's inconsistent that close can't. Close the gap and keep the API the
same as with open/accept, i.e. via sqe->file_slot.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: kill extra checks in io_write() · 9f3a2cb2

Committed by Pavel Begunkov on Sep 24, 2021

We don't retry short writes and so we would never get to async setup in
io_write() in that case. Thus ret2 > 0 is always false and
iov_iter_advance() is never used. Apparently, the same was found by
Coverity, which complains about the code.

Fixes: cd658695 ("io_uring: use iov_iter state save/restore helpers")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5b33e61034748ef1022766efc0fb8854cfcf749c.1632500058.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: don't punt files update to io-wq unconditionally · cdb31c29

Committed by Jens Axboe on Sep 24, 2021

There's no reason to punt it unconditionally, we just need to ensure that
the submit lock grabbing is conditional.

Fixes: 05f3fb3c ("io_uring: avoid ring quiesce for fixed file set unregister and update")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: put provided buffer meta data under memcg accounting · 9990da93

Committed by Jens Axboe on Sep 24, 2021

For each provided buffer, we allocate a struct io_buffer to hold the
data associated with it. As a large number of buffers can be provided,
account that data with memcg.

Fixes: ddf0322d ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: allow conditional reschedule for intensive iterators · 8bab4c09

Committed by Jens Axboe on Sep 24, 2021

If we have a lot of threads and rings, the tctx list can get quite big.
This is especially true if we keep creating new threads and rings.
Likewise for the provided buffers list. Be nice and insert a conditional
reschedule point while iterating the nodes for deletion.

Link: https://lore.kernel.org/io-uring/00000000000064b6b405ccb41113@google.com/
Reported-by: syzbot+111d2a03f51f5ae73775@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fix potential req refcount underflow · 5b7aa38d

Committed by Hao Xu on Sep 22, 2021

For multishot mode, there may be cases like:

iowq                                 original context
io_poll_add
  _arm_poll()
  mask = vfs_poll() is not 0
  if mask
(2)  io_poll_complete()
  compl_unlock
   (interruption happens
    tw queued to original
    context)
                                     io_poll_task_func()
                                     compl_lock
                                 (3) done = io_poll_complete() is true
                                     compl_unlock
                                     put req ref
(1) if (poll->flags & EPOLLONESHOT)
      put req ref

The EPOLLONESHOT flag in (1) may come from (2) or (3), so there are multiple
combinations that can cause a ref underflow.
Let's address it by:
- checking the return value in (2) as done
- changing (1) to if (done)
    this way, we only do the ref put in (1) if the 'oneshot flag' is from
    (2)
- doing a poll.done check in io_poll_task_func(), so that we won't put the
  ref a second time.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-4-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fix missing set of EPOLLONESHOT for CQ ring overflow · a62682f9

Committed by Hao Xu on Sep 22, 2021

We should set EPOLLONESHOT if cqring_fill_event() returns false, since
io_poll_add() uses it to decide whether to put the req or not.

Fixes: 5082620f ("io_uring: terminate multishot poll for CQ ring overflow")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-3-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fix race between poll completion and cancel_hash insertion · bd99c71b

Committed by Hao Xu on Sep 22, 2021

If poll arming and poll completion run in parallel, there may be races.
For instance, run io_poll_add in iowq and io_poll_task_func in the original
context, then:

  iowq                                      original context
  io_poll_add
    vfs_poll
     (interruption happens
      tw queued to original
      context)                              io_poll_task_func
                                              generate cqe
                                              del from cancel_hash[]
    if !poll.done
      insert to cancel_hash[]

The entry is left in cancel_hash[]; the case is similar for fast poll.
Fix it by setting poll.done = true when deleting from cancel_hash[].

Fixes: 5082620f ("io_uring: terminate multishot poll for CQ ring overflow")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-2-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

15 Sep, 2021 · 2 commits

io_uring: move iopoll reissue into regular IO path · b66ceaf3

Committed by Pavel Begunkov on Sep 15, 2021

230d50d4 ("io_uring: move reissue into regular IO path")
made non-IOPOLL I/O not retry from the ki_complete handler. Follow its
steps and do the same for IOPOLL. Same problems, same implementation,
same -EAGAIN assumptions.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f80dfee2d5fa7678f0052a8ab3cfca9496a112ca.1631699928.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: use iov_iter state save/restore helpers · cd658695

Committed by Jens Axboe on Sep 10, 2021

Get rid of the need to do re-expand and revert on an iterator when we
encounter a short IO, or failure that warrants a retry. Use the new
state save/restore helpers instead.

We keep the iov_iter_state persistent across retries, if we need to
restart the read or write operation. If there's a pending retry, the
operation will always exit with the state correctly saved.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

openeuler / Kernel · synced successfully more than 1 year ago
