Commits · c86d18f4aa93e0e66cda0e55827cd03eea6bc5f8 · openeuler / Kernel

Mar 26, 2022 · 1 commit

io_uring: fix memory leak of uid in files registration · c86d18f4

Committed by Pavel Begunkov on Mar 25, 2022

When there are no files for __io_sqe_files_scm() to process in the
range, it'll free everything and return. However, it forgets to put the uid.

Fixes: 08a45173 ("io_uring: allow sparse fixed file sets")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/accee442376f33ce8aaebb099d04967533efde92.1648226048.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Mar 25, 2022 · 6 commits

io_uring: fix put_kbuf without proper locking · 8197b053

Committed by Pavel Begunkov on Mar 25, 2022

io_put_kbuf_comp() should only be called while holding
->completion_lock; however, io_clean_op() makes no such assumption,
and thus it can corrupt ->io_buffer_comp. Take the lock there, and
work around the only user of io_clean_op() that calls it with locks
held. Not the prettiest solution, but it's easier to refactor it for-next.

Fixes: cc3cec83 ("io_uring: speedup provided buffer handling")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/743e2130b73ec6d48c4c5dd15db896c433431e6d.1648212967.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fix invalid flags for io_put_kbuf() · ab0ac095

Committed by Pavel Begunkov on Mar 25, 2022

io_req_complete_failed() doesn't require callers to hold ->uring_lock,
use IO_URING_F_UNLOCKED version of io_put_kbuf(). The only affected
place is the fail path of io_apoll_task_func(). Also add a lockdep
annotation to catch such bugs in the future.

Fixes: 3b2b78a8 ("io_uring: extend provided buf return to fails")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ccf602dbf8df3b6a8552a262d8ee0a13a086fbc7.1648212967.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: improve req fields comments · 41cdcc22

Committed by Pavel Begunkov on Mar 25, 2022

Move a misplaced comment about req->creds and add a line with
assumptions about req->link.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1e51d1e6b1f3708c2d4127b4e371f9daa4c5f859.1648209006.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: enable EPOLLEXCLUSIVE for accept poll · 52dd8640

Committed by Dylan Yudaken on Mar 25, 2022

When polling sockets for accept, use EPOLLEXCLUSIVE. This is helpful
when multiple accept SQEs are submitted.

For O_NONBLOCK sockets multiple queued SQEs would previously have all
completed at once, but most with -EAGAIN as the result. Now only one
wakes up and completes.

For sockets without O_NONBLOCK there is no user facing change, but
internally the extra requests would previously be queued onto a worker
thread as they would wake up with no connection waiting, and be
punted. Now they do not wake up unnecessarily.

Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220325093755.4123343-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: improve task work cache utilization · 34d2bfe7

Committed by Jens Axboe on Mar 24, 2022

While profiling task_work intensive workloads, I noticed that most of
the time in tctx_task_work() is spent stalled on loading 'req'. This
is one of the unfortunate side effects of using linked lists,
particularly when they end up being passed around.

Prefetch the next request, if there is one. There's a sufficient amount
of work in between that this makes it available for the next loop.

While fiddling with the cache layout, move the link outside of the
hot completion cacheline. It's rarely used in hot workloads, so better
to bring in kbuf which is used for networked loads with provided buffers.

This reduces tctx_task_work() overhead from ~3% to 1-1.5% in my testing.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fix async accept on O_NONBLOCK sockets · a73825ba

Committed by Dylan Yudaken on Mar 24, 2022

Do not set REQ_F_NOWAIT if the socket is non-blocking. When set, this
causes the accept to immediately post a CQE with EAGAIN, which means you
cannot perform an accept SQE on a non-blocking socket asynchronously.

With the flag removed, if there is no pending accept, poll is armed as
usual and the CQE is posted when a connection comes in.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220324143435.2875844-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Mar 24, 2022 · 3 commits

io_uring: remove IORING_CQE_F_MSG · 7ef66d18

Committed by Jens Axboe on Mar 24, 2022

This was introduced with the message ring opcode, but isn't strictly
required for the request itself. The sender can encode what is needed
in user_data, which is passed to the receiver. It's unclear if having
a separate flag that essentially says "This CQE did not originate from
an SQE on this ring" provides any real utility to applications. While
we can always re-introduce a flag to provide this information, we cannot
take it away at a later point in time.

Remove the flag while we still can, before it's in a released kernel.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add flag for disabling provided buffer recycling · 8a3e8ee5

Committed by Jens Axboe on Mar 23, 2022

If we need to continue doing this IO, then we don't want a potentially
selected buffer recycled. Add a flag for that.

Set this for recv/recvmsg if they do partial IO.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly · 7ba89d2a

Committed by Jens Axboe on Mar 23, 2022

Even if MSG_WAITALL is set, we currently don't attempt to get the full
requested length after a partial receive. If we do see a partial
receive, just note how many bytes we received and return -EAGAIN to
get it retried.

The iov is advanced appropriately for the vector based case, and we
manually bump the buffer and remainder for the non-vector case.

Cc: stable@vger.kernel.org
Reported-by: Constantine Gavrilov <constantine.gavrilov@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
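
The retry scheme above can be sketched for the non-vectored case as follows. This is a minimal user-space model, assuming a hypothetical per-request `struct recv_state` and a pluggable receive function; `MSG_WAITALL` is the real flag from `<sys/socket.h>`, but every other name is illustrative and this is not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h> /* MSG_WAITALL */
#include <sys/types.h>  /* ssize_t */

/* Hypothetical per-request state for a non-vectored receive. */
struct recv_state {
    char  *buf;     /* where the next byte lands */
    size_t len;     /* bytes still wanted */
    size_t done_io; /* bytes received by earlier attempts */
};

/* Hypothetical transport hook: returns bytes copied, possibly fewer than len. */
typedef ssize_t (*recv_fn)(char *buf, size_t len, void *ctx);

/*
 * One issue attempt. On a short read with MSG_WAITALL, bank the bytes,
 * advance the buffer and remainder, and ask for a retry with -EAGAIN.
 * Otherwise report the total across all attempts.
 */
static ssize_t recv_attempt(struct recv_state *st, int flags, recv_fn fn, void *ctx)
{
    ssize_t ret = fn(st->buf, st->len, ctx);

    if (ret > 0 && (size_t)ret < st->len && (flags & MSG_WAITALL)) {
        st->buf += ret;
        st->len -= (size_t)ret;
        st->done_io += (size_t)ret;
        return -EAGAIN;
    }
    if (ret >= 0)
        return ret + (ssize_t)st->done_io;
    /* Error after partial progress: report the bytes we did get. */
    return st->done_io ? (ssize_t)st->done_io : ret;
}

/* Example transport for illustration: delivers at most `chunk` bytes per call. */
struct chunked_src {
    const char *data;
    size_t left;
    size_t chunk;
};

static ssize_t chunked_recv(char *buf, size_t len, void *ctx)
{
    struct chunked_src *s = ctx;
    size_t n = len < s->chunk ? len : s->chunk;

    if (n > s->left)
        n = s->left;
    memcpy(buf, s->data, n);
    s->data += n;
    s->left -= n;
    return (ssize_t)n;
}
```

Reading 8 bytes through a source that yields 3 at a time takes two -EAGAIN retries before the third attempt returns the full 8; without MSG_WAITALL the first short read is returned as-is.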

Mar 23, 2022 · 3 commits

io_uring: don't recycle provided buffer if punted to async worker · 4d55f238

Committed by Jens Axboe on Mar 22, 2022

We only really need to recycle the buffer when going async for a file
type that has an indefinite response time (e.g. non-file/bdev). And for
files that do arm poll, the async worker will arm poll anyway and the
buffer will get recycled there.

In that latter case, we're not holding ctx->uring_lock. Ensure we take
the issue_flags into account and acquire it if we need to.

Fixes: b1c62645 ("io_uring: recycle provided buffers if request goes async")
Reported-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fix assuming triggered poll waitqueue is the single poll · d89a4fac

Committed by Jens Axboe on Mar 22, 2022

syzbot reports a recent regression:

BUG: KASAN: use-after-free in __wake_up_common+0x637/0x650 kernel/sched/wait.c:101
Read of size 8 at addr ffff888011e8a130 by task syz-executor413/3618

CPU: 0 PID: 3618 Comm: syz-executor413 Tainted: G        W         5.17.0-syzkaller-01402-g8565d644 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_address_description.constprop.0.cold+0x8d/0x303 mm/kasan/report.c:255
 __kasan_report mm/kasan/report.c:442 [inline]
 kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
 __wake_up_common+0x637/0x650 kernel/sched/wait.c:101
 __wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:138
 tty_release+0x657/0x1200 drivers/tty/tty_io.c:1781
 __fput+0x286/0x9f0 fs/file_table.c:317
 task_work_run+0xdd/0x1a0 kernel/task_work.c:164
 exit_task_work include/linux/task_work.h:32 [inline]
 do_exit+0xaff/0x29d0 kernel/exit.c:806
 do_group_exit+0xd2/0x2f0 kernel/exit.c:936
 __do_sys_exit_group kernel/exit.c:947 [inline]
 __se_sys_exit_group kernel/exit.c:945 [inline]
 __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:945
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f439a1fac69

which is due to leaving the request on the waitqueue mistakenly. The
reproducer is using a tty device, which means we end up arming the same
poll queue twice (it uses the same poll waitqueue for both), but in
io_poll_wake() we always just clear REQ_F_SINGLE_POLL regardless of which
entry triggered. This leaves one waitqueue potentially armed after we're
done, which then blows up in tty when it attempts to remove the waitqueue.

We have no room to store this information, so simply encode it in the
wait_queue_entry->private where we store the io_kiocb request pointer.

Fixes: 91eac1c6 ("io_uring: cache poll/double-poll state with a request flag")
Reported-by: syzbot+09ad4050dd3a120bfccd@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
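
Encoding the flag in `wait_queue_entry->private` relies on the `io_kiocb` pointer being at least 2-byte aligned, which leaves bit 0 free to carry a tag. A user-space sketch of that pointer-tagging idea; `struct request`, `WQE_F_DOUBLE` and the helper names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag bit: set when the entry is the second ("double") poll entry. */
#define WQE_F_DOUBLE ((uintptr_t)1)

/* Stand-in for struct io_kiocb; int alignment keeps bit 0 of its address free. */
struct request {
    int id;
};

/* Pack the request pointer and the "double" flag into one pointer-sized value. */
static void *wqe_pack(struct request *req, bool is_double)
{
    uintptr_t v = (uintptr_t)req;

    assert((v & WQE_F_DOUBLE) == 0); /* relies on alignment >= 2 */
    return (void *)(v | (is_double ? WQE_F_DOUBLE : 0));
}

/* Recover the request pointer by masking off the tag bit. */
static struct request *wqe_to_req(void *priv)
{
    return (struct request *)((uintptr_t)priv & ~WQE_F_DOUBLE);
}

/* Tell which of the two entries fired, so only that entry's state is cleared. */
static bool wqe_is_double(void *priv)
{
    return ((uintptr_t)priv & WQE_F_DOUBLE) != 0;
}
```

On wakeup, a handler built this way can clear only the state belonging to the entry that actually fired, instead of unconditionally clearing the single-poll flag.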

io_uring: bump poll refs to full 31-bits · e2c0cb7c

Committed by Jens Axboe on Mar 22, 2022

The previous commit:

61bc84c40088 ("io_uring: remove poll entry from list when canceling all")

removed a potential overflow condition for the poll references. They
are currently limited to 20 bits, even though we have 31 bits available.
The upper bit is used to mark for cancelation.

Bump the poll ref space to 31-bits, making that kind of situation much
harder to trigger in general. We'll separately add overflow checking
and handling.

Fixes: aa43477b ("io_uring: poll rework")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
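
The resulting layout, with bit 31 for cancelation and bits 0-30 for the reference count, can be sketched with C11 atomics. This assumes ownership goes to whoever bumps the count from zero; the macro and function names are illustrative rather than copied from the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 marks the request for cancelation; bits 0-30 count references. */
#define POLL_CANCEL_FLAG (UINT32_C(1) << 31)
#define POLL_REF_MASK    (POLL_CANCEL_FLAG - 1) /* 0x7fffffff: 31 bits */

/*
 * Take a reference. The caller becomes the owner only if the count was
 * zero before the increment; anyone else just leaves a reference behind
 * for the current owner to notice.
 */
static bool poll_get_ownership(_Atomic uint32_t *refs)
{
    return (atomic_fetch_add(refs, 1) & POLL_REF_MASK) == 0;
}

/* Set the cancel bit without touching the reference count. */
static void poll_mark_cancelled(_Atomic uint32_t *refs)
{
    atomic_fetch_or(refs, POLL_CANCEL_FLAG);
}

/* Current number of outstanding references, ignoring the cancel bit. */
static uint32_t poll_ref_count(_Atomic uint32_t *refs)
{
    return atomic_load(refs) & POLL_REF_MASK;
}
```

Going from 20 to 31 bits raises the headroom before the count could spill into the flag bit from roughly one million references to roughly two billion.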

Mar 22, 2022 · 1 commit

io_uring: remove poll entry from list when canceling all · 61bc84c4

Committed by Jens Axboe on Mar 21, 2022

When the ring is exiting, as part of the shutdown, poll requests are
removed. But io_poll_remove_all() does not remove entries when finding
them, and since completions are done out-of-band, we can find and remove
the same entry multiple times.

We do guard the poll execution by poll ownership, but that does not
exclude us from reissuing a new one once the previous removal ownership
goes away.

This can race with poll execution as well, where we then end up seeing
req->apoll be NULL because a previous task_work requeue finished the
request.

Remove the poll entry when we find it and get ownership of it. This
prevents multiple invocations from finding it.

Fixes: aa43477b ("io_uring: poll rework")
Reported-by: Dylan Yudaken <dylany@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Mar 21, 2022 · 2 commits

io_uring: fix memory ordering when SQPOLL thread goes to sleep · 649bb75d

Committed by Almog Khaikin on Mar 21, 2022

Without a full memory barrier between the store to the flags and the
load of the SQ tail the two operations can be reordered and this can
lead to a situation where the SQPOLL thread goes to sleep while the
application writes to the SQ tail and doesn't see the wakeup flag.
This memory barrier pairs with a full memory barrier in the application
between its store to the SQ tail and its load of the flags.

Signed-off-by: Almog Khaikin <almogkh@gmail.com>
Link: https://lore.kernel.org/r/20220321090059.46313-1-almogkh@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
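
The pairing described in this commit is the classic store-buffering pattern: each side stores its own variable, issues a full barrier, then loads the other side's variable. A minimal user-space sketch with C11 atomics, assuming a simplified shared ring; `struct ring`, `NEED_WAKEUP` and both helpers are hypothetical stand-ins for the real SQ ring fields, with `atomic_thread_fence(memory_order_seq_cst)` playing the role of the kernel's full barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical "wake me up" bit in the shared SQ flags word. */
#define NEED_WAKEUP 1u

/* Simplified shared ring state; field names are illustrative. */
struct ring {
    atomic_uint sq_flags; /* written by the SQPOLL thread, read by the app */
    atomic_uint sq_tail;  /* written by the app, read by the SQPOLL thread */
};

/*
 * SQPOLL side: announce the intent to sleep, then re-check the tail.
 * The full fence stops the flag store from being reordered after the
 * tail load. Returns true only if no new SQEs appeared since seen_tail.
 */
static bool sqpoll_may_sleep(struct ring *r, unsigned seen_tail)
{
    atomic_fetch_or_explicit(&r->sq_flags, NEED_WAKEUP, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&r->sq_tail, memory_order_relaxed) != seen_tail) {
        /* New work arrived: withdraw the flag and keep polling. */
        atomic_fetch_and_explicit(&r->sq_flags, ~NEED_WAKEUP, memory_order_relaxed);
        return false;
    }
    return true;
}

/*
 * Application side: publish the new tail, then decide whether to wake the
 * thread. This fence pairs with the one in sqpoll_may_sleep(), so at least
 * one side is guaranteed to observe the other's store.
 */
static bool app_submit_needs_wakeup(struct ring *r, unsigned new_tail)
{
    atomic_store_explicit(&r->sq_tail, new_tail, memory_order_release);
    atomic_thread_fence(memory_order_seq_cst);
    return (atomic_load_explicit(&r->sq_flags, memory_order_relaxed) & NEED_WAKEUP) != 0;
}
```

With both fences in place, the application cannot miss the flag while the SQPOLL thread also misses the new tail, which is the lost-wakeup window the commit closes.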

io_uring: ensure that fsnotify is always called · f63cf519

Committed by Jens Axboe on Mar 20, 2022

Ensure that we call fsnotify_modify() if we write a file, and that we
do fsnotify_access() if we read it. This enables anyone using inotify
on the file to get notified.

Ditto for fallocate, ensure that fsnotify_modify() is called.

Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Mar 20, 2022 · 1 commit

io_uring: recycle provided before arming poll · abdad709

Committed by Jens Axboe on Mar 19, 2022

We currently have a race where we recycle the selected buffer if poll
returns IO_APOLL_OK. But that's too late, as the poll could already be
triggering or have triggered. If that race happens, then we're putting a
buffer that's already being used.

Fix this by recycling before we arm poll. This does mean that we'll
sometimes almost instantly re-select the buffer, but it's rare enough in
testing that it should not pose a performance issue.

Fixes: b1c62645 ("io_uring: recycle provided buffers if request goes async")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Mar 19, 2022 · 2 commits

io_uring: terminate manual loop iterator loop correctly for non-vecs · 5e929367

Committed by Jens Axboe on Mar 18, 2022

The fix for not advancing the iterator if we're using fixed buffers is
broken in that it can hit a condition where we don't terminate the loop.
This results in io-wq looping forever, asking to read (or write) 0 bytes
for every subsequent loop.

Reported-by: Joel Jaeschke <joel.jaeschke@gmail.com>
Link: https://github.com/axboe/liburing/issues/549
Fixes: 16c8d2df ("io_uring: ensure symmetry in handling iter types in loop_rw_iter()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: don't check unrelated req->open.how in accept request · adf3a9e9

Committed by Jens Axboe on Mar 14, 2022

Looks like a victim of too much copy/paste; we should not be looking
at req->open.how in accept. The point is to check for CLOEXEC and error
out, as we don't invalidate direct descriptors on exec. Hence any
attempt to get a direct descriptor with CLOEXEC is invalid.

No harm is done here, as req->open.how.flags overlaps with
req->accept.flags, but it's very confusing and might change if either of
those command structs are modified.

Fixes: aaa4db12 ("io_uring: accept directly into fixed file table")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Mar 18, 2022 · 1 commit

io_uring: manage provided buffers strictly ordered · dbc7d452

Committed by Jens Axboe on Mar 17, 2022

Workloads using provided buffers benefit from using and returning buffers
in the right order, and so do TLBs for that matter. Manage the internal
buffer list in a straight list, rather than use the head buffer as the
insertion node. Use a hashed list for the buffer group IDs instead of
xarray; the overhead is much lower this way. xarray provides internal
locking and other trickery that is handy for some use cases, but
io_uring already locks internally for the buffer manipulation and needs
none of that.

This is good for about a 2% reduction in overhead, a combination of the
improved management and the fact that the workload has an easier time
bundling back provided buffers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
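
A minimal sketch of the two data-structure changes, assuming a fixed-size hash table keyed by buffer group ID and a FIFO list per group. The type and function names are hypothetical, and locking is left to the caller, since the commit notes io_uring already provides it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BGID_HASH_BITS 5
#define BGID_HASH_SIZE (1u << BGID_HASH_BITS)

struct buffer {
    struct buffer *next;
    uint16_t bid; /* buffer ID handed back to userspace */
};

struct buffer_group {
    struct buffer_group *hash_next; /* chain within one hash bucket */
    uint16_t bgid;
    struct buffer *head, *tail;     /* FIFO: take from head, add at tail */
};

struct buffer_table {
    struct buffer_group *buckets[BGID_HASH_SIZE];
};

static unsigned bgid_hash(uint16_t bgid)
{
    return bgid & (BGID_HASH_SIZE - 1);
}

/* Find a group by ID, creating it if asked. Caller holds the table lock. */
static struct buffer_group *group_lookup(struct buffer_table *t, uint16_t bgid, int create)
{
    struct buffer_group **slot = &t->buckets[bgid_hash(bgid)];

    for (struct buffer_group *grp = *slot; grp; grp = grp->hash_next)
        if (grp->bgid == bgid)
            return grp;
    if (!create)
        return NULL;

    struct buffer_group *g = calloc(1, sizeof(*g));
    if (!g)
        return NULL;
    g->bgid = bgid;
    g->hash_next = *slot;
    *slot = g;
    return g;
}

/* Provide a buffer: append, so buffers are handed out in the order given. */
static void group_add(struct buffer_group *g, struct buffer *b)
{
    b->next = NULL;
    if (g->tail)
        g->tail->next = b;
    else
        g->head = b;
    g->tail = b;
}

/* Select a buffer: pop from the head, or NULL if the group is empty. */
static struct buffer *group_select(struct buffer_group *g)
{
    struct buffer *b = g->head;

    if (b) {
        g->head = b->next;
        if (!g->head)
            g->tail = NULL;
    }
    return b;
}
```

Because buffers leave the group in exactly the order they were provided, an application that provides buffers 10, 11, 12 sees them selected in that order.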

Mar 17, 2022 · 11 commits

io_uring: fold evfd signalling under a slower path · 9aa8dfde

Committed by Pavel Begunkov on Mar 17, 2022

Add ->has_evfd flag, which is true IFF there is an eventfd attached, and
use it to hide io_eventfd_signal() into __io_commit_cqring_flush() and
combine fast checks in a single if. Also, gcc 11.2 wasn't inlining
io_cqring_ev_posted() without this change, so it helps with that as well.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f6168471997decded475a063f92915787975a30b.1647481208.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: thin down io_commit_cqring() · 9333f6b4

Committed by Pavel Begunkov on Mar 17, 2022

io_commit_cqring() is currently always called under a spinlock, so it's
always better to keep it as slim as possible. Move
__io_commit_cqring_flush() out of it into ev_posted*(). If fast checks
do fail and this post-processing is required, we'll reacquire
->completion_lock, which is fine as we don't care about performance of
draining and offset timeouts.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ec4e81fd720d3bc7bca8cb9152e080dad1a052f1.1647481208.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: shuffle io_eventfd_signal() bits around · 66fc25ca

Committed by Pavel Begunkov on Mar 17, 2022

A preparation patch, which moves a fast ->io_ev_fd check out of
io_eventfd_signal() into ev_posted*(). Compilers are smart enough for it
not to change anything, but we will need it later.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ec4091ac76d43912b73917e8db651c2dac4b7b01.1647481208.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove extra barrier for non-sqpoll iopoll · 0f847471

Committed by Pavel Begunkov on Mar 17, 2022

smp_mb() in io_cqring_ev_posted_iopoll() is only there because of
waitqueue_active(). However, non-SQPOLL IOPOLL ring doesn't wake the CQ
and so the barrier there is useless. Kill it, it's usually pretty
expensive.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d72e8ef6f7a3f6a72e18fad8409f7d47afc8da7d.1647481208.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fix provided buffer return on failure for kiocb_done() · b91ef187

Committed by Pavel Begunkov on Mar 16, 2022

Use io_req_complete_failed() in kiocb_done(). This cleans up the code,
but also ensures that a provided buffer is correctly freed on failure.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a4880106fcf199d5810707fe2d17126fcdf18bc4.1647481208.git.asml.silence@gmail.com
[axboe: split from previous patch]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: extend provided buf return to fails · 3b2b78a8

Committed by Pavel Begunkov on Mar 17, 2022

It's never a good idea to put provided buffers without notifying
userspace, as it'll lead to userspace leaks, so add io_put_kbuf() in
io_req_complete_failed(). The fail helper is called by all sorts of
requests, but it's still safe to do so, as io_put_kbuf() will return 0
for all requests that don't support, and so don't expect, provided buffers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a4880106fcf199d5810707fe2d17126fcdf18bc4.1647481208.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: refactor timeout cancellation cqe posting · 6695490d

Committed by Pavel Begunkov on Mar 17, 2022

io_fill_cqe*() is not always the best way to post CQEs, just because
there is enough infrastructure on top. Replace a raw call to a
variant of it inside io_timeout_cancel(), which also saves us some
bloat and might help with batching later.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/46113ec4345764b4aef3b384ce38cceabaeedcbb.1647481208.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: normalise naming for fill_cqe* · ae4da189

Committed by Pavel Begunkov on Mar 17, 2022

Restore consistency in __io_fill_cqe*-like helpers, always honouring
the "io_" prefix and adding "req" when we're passing in a request.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bd016ff5c1a4f74687828069d2619d8a65e0c6d7.1647481208.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: cache poll/double-poll state with a request flag · 91eac1c6

Committed by Jens Axboe on Mar 16, 2022

With commit "io_uring: cache req->apoll->events in req->cflags" applied,
we now have just io_poll_remove_entries() dipping into req->apoll when
it isn't strictly necessary.

Mark poll and double-poll with a flag, so we know if we need to look
at apoll->double_poll. This avoids pulling in those cachelines if we
don't need them. The common case is that the poll wake handler already
removed these entries while hot off the completion path.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: cache req->apoll->events in req->cflags · 81459350

Committed by Jens Axboe on Mar 16, 2022

When we arm poll on behalf of a different type of request, like a network
receive, then we allocate req->apoll as our poll entry. Running network
workloads shows io_poll_check_events() as the most expensive part of
io_uring, and it's all due to having to pull in req->apoll instead of
just the request which we have hot already.

Cache poll->events in req->cflags, which isn't used until the request
completes anyway. This isn't strictly needed for regular poll, where
req->poll.events is used and thus already hot, but for the sake of
unification we do it all around.

This saves 3-4% of overhead in certain request workloads.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: move req->poll_refs into previous struct hole · 521d61fc

Committed by Jens Axboe on Mar 16, 2022

This serves two purposes:

- We now have the last cacheline mostly unused for generic workloads,
  instead of having to pull in the poll refs explicitly for workloads
  that rely on poll arming.

- It shrinks the io_kiocb from 232 to 224 bytes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Mar 16, 2022 · 1 commit

io_uring: recycle apoll_poll entries · 4d9237e3

Committed by Jens Axboe on Mar 15, 2022

Particularly for networked workloads, io_uring intensively uses its
poll based backend to get a notification when data/space is available.
Profiling workloads, we see 3-4% of alloc+free that is directly attributed
to just the apoll allocation and free (and the rest being skb alloc+free).

For the fast path, we have ctx->uring_lock held already for both issue
and the inline completions, and we can utilize that to avoid any extra
locking needed to have a basic recycling cache for the apoll entries on
both the alloc and free side.

Double poll still requires an allocation. But those are rare and not
a fast path item.

With the simple cache in place, we see a 3-4% reduction in overhead for
the workload.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
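
The recycling cache amounts to a free list that is only touched while the ring lock is already held, so it needs no locking of its own. A sketch under that assumption, with hypothetical names and `malloc()`/`free()` standing in for the kernel allocator:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the async poll entry; only the cache link matters here. */
struct apoll {
    struct apoll *cache_next;
    int events;
};

/*
 * Hypothetical per-ring cache. Every access is assumed to happen with the
 * ring's submission lock already held, so the list itself is unlocked.
 */
struct apoll_cache {
    struct apoll *free_list;
};

static struct apoll *apoll_get(struct apoll_cache *c)
{
    struct apoll *a = c->free_list;

    if (a) {
        c->free_list = a->cache_next; /* fast path: reuse a cached entry */
        return a;
    }
    return malloc(sizeof(*a));        /* slow path: allocate */
}

static void apoll_put(struct apoll_cache *c, struct apoll *a)
{
    a->cache_next = c->free_list;     /* recycle instead of free() */
    c->free_list = a;
}

/* Release everything still cached, e.g. when the ring is torn down. */
static void apoll_cache_drain(struct apoll_cache *c)
{
    while (c->free_list) {
        struct apoll *a = c->free_list;

        c->free_list = a->cache_next;
        free(a);
    }
}
```

In steady state each completion returns its entry to the list and the next poll arm takes it back, so the allocator is only hit when the cache runs dry.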

Mar 12, 2022 · 1 commit

io_uring: remove duplicated member check for io_msg_ring_prep() · f3b6a41e

Committed by Jens Axboe on Mar 12, 2022

Julia and the kernel test robot report that the prep handling for this
command inadvertently checks one field twice:

fs/io_uring.c:4338:42-56: duplicated argument to && or ||

Get rid of it.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Fixes: 4f57f06c ("io_uring: add support for IORING_OP_MSG_RING command")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Mar 11, 2022 · 7 commits

io_uring: allow submissions to continue on error · bcbb7bf6

Committed by Jens Axboe on Mar 10, 2022

By default, io_uring will stop submitting a batch of requests if we run
into an error submitting a request. This isn't strictly necessary, as
the error result is passed out-of-band via a CQE anyway. And it can be
a bit confusing for some applications.

Provide a way to set up a ring that will continue submitting on error,
when the error CQE has been posted.

There's still one case that will break out of submission. If we fail
allocating a request, then we'll still return -ENOMEM. We could in theory
post a CQE for that condition too even if we never got a request. Leave
that for a potential followup.

Reported-by: Dylan Yudaken <dylany@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
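
The behavioral difference can be modeled as a submission loop that posts an error CQE for a failed request and then either stops or keeps going. In mainline this mode is selected with the `IORING_SETUP_SUBMIT_ALL` setup flag; the sketch below uses hypothetical stand-in types and simplifies by completing every request immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: each SQE carries the result it will complete with. */
struct sqe { int result; };
struct cqe { int res; };

/*
 * Submit a batch. A failed request still gets a CQE carrying its error.
 * Without continue_on_error, submission stops right after the first
 * failure; with it, the rest of the batch is submitted as well.
 * Returns the number of SQEs consumed.
 */
static size_t submit_batch(const struct sqe *sqes, size_t n, bool continue_on_error,
                           struct cqe *cqes, size_t *ncqes)
{
    size_t consumed = 0;

    *ncqes = 0;
    for (size_t i = 0; i < n; i++) {
        consumed++;
        cqes[(*ncqes)++].res = sqes[i].result; /* the error travels in the CQE */
        if (sqes[i].result < 0 && !continue_on_error)
            break;
    }
    return consumed;
}
```

For a batch whose second request fails, the default mode consumes two SQEs and leaves the third unsubmitted, while the continue-on-error mode consumes all three and the application learns about the failure from the CQE alone.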

io_uring: recycle provided buffers if request goes async · b1c62645

Committed by Jens Axboe on Mar 9, 2022

If we are using provided buffers, it's less than useful to have a buffer
selected and pinned if a request needs to go async or arms poll to get
notified when it can be processed.

Recycle the buffer in those events, so we don't pin it for the duration
of the request.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: ensure reads re-import for selected buffers · 2be2eb02

Committed by Jens Axboe on Mar 10, 2022

If we drop buffers before scheduling a retry, then we need to re-import
them when we start the request again.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: retry early for reads if we can poll · 9af177ee

Committed by Jens Axboe on Mar 9, 2022

Most of the logic in io_read() deals with regular files, and in some ways
it would make sense to split the handling into S_IFREG and others. But
at least for retry, we don't need to bother setting up a bunch of state
just to abort in the loop later. In particular, don't bother forcing
setup of async data for a normal non-vectored read when we don't need it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: Add support for napi_busy_poll · adc8682e

Committed by Olivier Langlois on Mar 8, 2022

The sqpoll thread can be used for performing the napi busy poll in a
similar way to how it does I/O polling for file systems supporting direct
access that bypasses the page cache.

The other way that io_uring can be used for napi busy poll is by
calling io_uring_enter() to get events.

If the user specifies a timeout value, it is distributed between polling
and sleeping by using the system-wide setting
/proc/sys/net/core/busy_poll.

The changes have been tested with this program:
https://github.com/lano1106/io_uring_udp_ping

and the results are:

Without sqpoll:
  NAPI busy loop disabled: rtt min/avg/max/mdev = 40.631/42.050/58.667/1.547 us
  NAPI busy loop enabled:  rtt min/avg/max/mdev = 30.619/31.753/61.433/1.456 us

With sqpoll:
  NAPI busy loop disabled: rtt min/avg/max/mdev = 42.087/44.438/59.508/1.533 us
  NAPI busy loop enabled:  rtt min/avg/max/mdev = 35.779/37.347/52.201/0.924 us

Co-developed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Link: https://lore.kernel.org/r/810bd9408ffc510ff08269e78dca9df4af0b9e4e.1646777484.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
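
One reading of "distributed between polling and sleeping" is: busy poll for at most the sysctl value and sleep for whatever remains of the timeout. A minimal sketch of that split, assuming both values are in microseconds (the unit of `net.core.busy_poll`); the struct and function names are hypothetical and this is not the patch's code.

```c
#include <assert.h>
#include <stdint.h>

/* Result of dividing a wait timeout between busy polling and sleeping. */
struct wait_split {
    uint64_t busy_poll_us; /* time spent busy polling the NAPI context */
    uint64_t sleep_us;     /* remaining time spent sleeping */
};

/*
 * busy_poll_sysctl_us stands for the value of /proc/sys/net/core/busy_poll.
 * Poll for at most that long; sleep for the rest of the timeout.
 */
static struct wait_split split_timeout(uint64_t timeout_us, uint64_t busy_poll_sysctl_us)
{
    struct wait_split s;

    if (timeout_us > busy_poll_sysctl_us) {
        s.busy_poll_us = busy_poll_sysctl_us;
        s.sleep_us = timeout_us - busy_poll_sysctl_us;
    } else {
        s.busy_poll_us = timeout_us; /* the whole timeout fits in the poll budget */
        s.sleep_us = 0;
    }
    return s;
}
```

With `busy_poll` set to 50, a 1000 us timeout becomes 50 us of polling followed by up to 950 us of sleep, while a 30 us timeout is spent entirely polling.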

io_uring: minor io_cqring_wait() optimization · 950e79dd

Committed by Olivier Langlois on Mar 8, 2022

Move up the block manipulating the sig variable, so that code that may
encounter an error and exit runs first, before the rest of the function
executes, avoiding useless computations.

Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Link: https://lore.kernel.org/r/84513f7cc1b1fb31d8f4cb910aee033391d036b4.1646777484.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add support for IORING_OP_MSG_RING command · 4f57f06c

Committed by Jens Axboe on Mar 10, 2022

This adds support for IORING_OP_MSG_RING, which allows an SQE to signal
another ring. That allows either waking up someone waiting on the ring,
or even passing a 64-bit value via the user_data field in the CQE.

sqe->fd must contain the fd of a ring that should receive the CQE.
sqe->off will be propagated to the cqe->user_data on the target ring,
and sqe->len will be propagated to cqe->res. The resulting CQE will have
IORING_CQE_F_MSG set in its flags, to indicate that this CQE was generated
from a messaging request rather than an SQE issued locally on that ring.
This effectively allows passing a 64-bit and a 32-bit quantity between
the two rings.

This request type has the following request specific error cases:

- -EBADFD. Set if the sqe->fd doesn't point to a file descriptor that is
  of the io_uring type.
- -EOVERFLOW. Set if we were not able to deliver a request to the target
  ring.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
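
The field mapping and error cases described above can be modeled with a toy target ring. The types, the fixed-size CQ without a head pointer, and `msg_ring()` are hypothetical simplifications; the sketch also leaves out the IORING_CQE_F_MSG flag, since 7ef66d18 in this log removed it before release.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define CQ_ENTRIES 4

struct cqe {
    uint64_t user_data;
    int32_t  res;
};

/* Hypothetical target: a file that may or may not be an io_uring instance. */
struct target_file {
    bool is_io_uring;
    struct cqe cq[CQ_ENTRIES];
    unsigned cq_tail;
};

/*
 * Deliver a message: sqe->off becomes cqe->user_data and sqe->len becomes
 * cqe->res on the target ring, i.e. one 64-bit and one 32-bit value.
 */
static int msg_ring(struct target_file *target, uint64_t sqe_off, uint32_t sqe_len)
{
    if (!target->is_io_uring)
        return -EBADFD;    /* sqe->fd is not an io_uring file */
    if (target->cq_tail == CQ_ENTRIES)
        return -EOVERFLOW; /* could not post to the target CQ */

    struct cqe *c = &target->cq[target->cq_tail++];
    c->user_data = sqe_off;
    c->res = (int32_t)sqe_len;
    return 0;
}
```

A sender can thus hand a 64-bit token (for example a pointer-sized cookie) plus a 32-bit length to another ring's completion queue without any shared memory beyond the rings themselves.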
