提交 · 8d4ad41e3e8e4b907f088f25aee4a92f3f864027 · openeuler / Kernel

03 9月, 2021 3 次提交

io_uring: prolong tctx_task_work() with flushing · 8d4ad41e

由 Pavel Begunkov 提交于 9月 02, 2021

io_submit_flush_completions() may enqueue linked requests for task_work
execution, so don't leave tctx_task_work() right after the tw list is
exhausted, but try to flush and then retry.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0755d4c2c36301447c63bdd4146c10477cea4249.1630539342.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

8d4ad41e

io_uring: don't disable kiocb_done() CQE batching · 63637853

由 Pavel Begunkov 提交于 9月 02, 2021

Not passing issue_flags from kiocb_done() into __io_complete_rw() means
that completion batching for this case is disabled, e.g. for most of
buffered reads.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b2689462835c3ee28a5999ef4f9a581e24be04a2.1630539342.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

63637853

io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL · fa84693b

由 Jens Axboe 提交于 9月 01, 2021

SQPOLL has a different thread doing submissions, we need to check for
that and use the right task context when updating the worker values.
Just hold the sqd->lock across the operation, this ensures that the
thread cannot go away while we poke at ->io_uring.

Link: https://github.com/axboe/liburing/issues/420
Fixes: 2e480058 ("io-wq: provide a way to limit max number of workers")
Reported-by: NJohannes Lundberg <johalun0@gmail.com>
Tested-by: NJohannes Lundberg <johalun0@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fa84693b

01 9月, 2021 4 次提交

io_uring: don't submit half-prepared drain request · b8ce1b9d

由 Pavel Begunkov 提交于 8月 31, 2021

[ 3784.910888] BUG: kernel NULL pointer dereference, address: 0000000000000020
[ 3784.910904] RIP: 0010:__io_file_supports_nowait+0x5/0xc0
[ 3784.910926] Call Trace:
[ 3784.910928]  ? io_read+0x17c/0x480
[ 3784.910945]  io_issue_sqe+0xcb/0x1840
[ 3784.910953]  __io_queue_sqe+0x44/0x300
[ 3784.910959]  io_req_task_submit+0x27/0x70
[ 3784.910962]  tctx_task_work+0xeb/0x1d0
[ 3784.910966]  task_work_run+0x61/0xa0
[ 3784.910968]  io_run_task_work_sig+0x53/0xa0
[ 3784.910975]  __x64_sys_io_uring_enter+0x22/0x30
[ 3784.910977]  do_syscall_64+0x3d/0x90
[ 3784.910981]  entry_SYSCALL_64_after_hwframe+0x44/0xae

io_drain_req() goes before checks for REQ_F_FAIL, which protect us from
submitting under-prepared request (e.g. failed in io_init_req(). Fail
such drained requests as well.

Fixes: a8295b98 ("io_uring: fix failed linkchain code logic")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e411eb9924d47a131b1e200b26b675df0c2b7627.1630415423.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

b8ce1b9d

io_uring: fix queueing half-created requests · c6d3d9cb

由 Pavel Begunkov 提交于 8月 31, 2021

[   27.259845] general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN PTI
[   27.261043] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[   27.263730] RIP: 0010:sock_from_file+0x20/0x90
[   27.272444] Call Trace:
[   27.272736]  io_sendmsg+0x98/0x600
[   27.279216]  io_issue_sqe+0x498/0x68d0
[   27.281142]  __io_queue_sqe+0xab/0xb50
[   27.285830]  io_req_task_submit+0xbf/0x1b0
[   27.286306]  tctx_task_work+0x178/0xad0
[   27.288211]  task_work_run+0xe2/0x190
[   27.288571]  exit_to_user_mode_prepare+0x1a1/0x1b0
[   27.289041]  syscall_exit_to_user_mode+0x19/0x50
[   27.289521]  do_syscall_64+0x48/0x90
[   27.289871]  entry_SYSCALL_64_after_hwframe+0x44/0xae

io_req_complete_failed() -> io_req_complete_post() ->
io_req_task_queue() still would try to enqueue hard linked request,
which can be half prepared (e.g. failed init), so we can't allow
that to happen.

Fixes: a8295b98 ("io_uring: fix failed linkchain code logic")
Reported-by: syzbot+f9704d1878e290eddf73@syzkaller.appspotmail.com
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/70b513848c1000f88bd75965504649c6bb1415c0.1630415423.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

c6d3d9cb

io_uring: retry in case of short read on block device · 7db30437

由 Ming Lei 提交于 8月 21, 2021

In case of buffered reading from block device, when short read happens,
we should retry to read more, otherwise the IO will be completed
partially, for example, the following fio expects to read 2MB, but it
can only read 1M or less bytes:

    fio --name=onessd --filename=/dev/nvme0n1 --filesize=2M \
	--rw=randread --bs=2M --direct=0 --overwrite=0 --numjobs=1 \
	--iodepth=1 --time_based=0 --runtime=2 --ioengine=io_uring \
	--registerfiles --fixedbufs --gtod_reduce=1 --group_reporting

Fix the issue by allowing short read retry for block device, which sets
FMODE_BUF_RASYNC really.

Fixes: 9a173346 ("io_uring: fix short read retries for non-reg files")
Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20210821150751.1290434-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

7db30437

io_uring: IORING_OP_WRITE needs hash_reg_file set · 7b3188e7

由 Jens Axboe 提交于 8月 30, 2021

During some testing, it became evident that using IORING_OP_WRITE doesn't
hash buffered writes like the other writes commands do. That's simply
an oversight, and can cause performance regressions when doing buffered
writes with this command.

Correct that and add the flag, so that buffered writes are correctly
hashed when using the non-iovec based write command.

Cc: stable@vger.kernel.org
Fixes: 3a6820f2 ("io_uring: add non-vectored read/write commands")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7b3188e7

30 8月, 2021 2 次提交

io_uring: allow updating linked timeouts · f1042b6c

由 Pavel Begunkov 提交于 8月 28, 2021

We allow updating normal timeouts, add support for adjusting timings of
linked timeouts as well.
Reported-by: NVictor Stewart <v@nametag.social>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f1042b6c

io_uring: keep ltimeouts in a list · ef9dd637

由 Pavel Begunkov 提交于 8月 28, 2021

A preparation patch. Keep all queued linked timeout in a list, so they
may be found and updated.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ef9dd637

29 8月, 2021 2 次提交

io_uring: support CLOCK_BOOTTIME/REALTIME for timeouts · 50c1df2b

由 Jens Axboe 提交于 8月 27, 2021

Certain use cases want to use CLOCK_BOOTTIME or CLOCK_REALTIME rather than
CLOCK_MONOTONIC, instead of the default CLOCK_MONOTONIC.

Add an IORING_TIMEOUT_BOOTTIME and IORING_TIMEOUT_REALTIME flag that
allows timeouts and linked timeouts to use the selected clock source.

Only one clock source may be selected, and we -EINVAL the request if more
than one is given. If neither BOOTIME nor REALTIME are selected, the
previous default of MONOTONIC is used.

Link: https://github.com/axboe/liburing/issues/369Signed-off-by: NJens Axboe <axboe@kernel.dk>

50c1df2b

io-wq: provide a way to limit max number of workers · 2e480058

由 Jens Axboe 提交于 8月 27, 2021

io-wq divides work into two categories:

1) Work that completes in a bounded time, like reading from a regular file
or a block device. This type of work is limited based on the size of
the SQ ring.

2) Work that may never complete, we call this unbounded work. The amount
of workers here is just limited by RLIMIT_NPROC.

For various uses cases, it's handy to have the kernel limit the maximum
amount of pending workers for both categories. Provide a way to do with
with a new IORING_REGISTER_IOWQ_MAX_WORKERS operation.

IORING_REGISTER_IOWQ_MAX_WORKERS takes an array of two integers and sets
the max worker count to what is being passed in for each category. The
old values are returned into that same array. If 0 is being passed in for
either category, it simply returns the current value.

The value is capped at RLIMIT_NPROC. This actually isn't that important
as it's more of a hint, if we're exceeding the value then our attempt
to fork a new worker will fail. This happens naturally already if more
than one node is in the system, as these values are per-node internally
for io-wq.
Reported-by: NJohannes Lundberg <johalun0@gmail.com>
Link: https://github.com/axboe/liburing/issues/420Signed-off-by: NJens Axboe <axboe@kernel.dk>

2e480058

27 8月, 2021 5 次提交

io_uring: add build check for buf_index overflows · 90499ad0

由 Pavel Begunkov 提交于 8月 25, 2021

req->buf_index is u16 and so we rely on registered buffers indexes
fitting into it. Add a build check, so when the upper limit for the
number of buffers is lifted we get a compliation fail but not lurking
problems.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/787e8e1a17cea51ca6301426b1c4c4887b8bd676.1629920396.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

90499ad0

io_uring: clarify io_req_task_cancel() locking · b18a1a45

由 Pavel Begunkov 提交于 8月 25, 2021

It's too easy to forget and misjudge about synchronisation in
io_req_task_cancel(), add a comment clarifying it.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/71099083835f983a1fd73d5a3da6391924da8300.1629920396.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

b18a1a45

io_uring: add task-refs-get helper · 9a10867a

由 Pavel Begunkov 提交于 8月 27, 2021

As we have a more complicated task referencing, which apart from normal
task references includes taking tctx->inflight and caching all that, it
would be a good idea to have all that isolated in helpers.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d9114d037f1c195897aa13f38a496078eca2afdb.1630023531.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

9a10867a

io_uring: fix failed linkchain code logic · a8295b98

由 Hao Xu 提交于 8月 27, 2021

Given a linkchain like this:
req0(link_flag)-->req1(link_flag)-->...-->reqn(no link_flag)

There is a problem:
 - if some intermediate linked req like req1 's submittion fails, reqs
   after it won't be cancelled.

   - sqpoll disabled: maybe it's ok since users can get the error info
     of req1 and stop submitting the following sqes.

   - sqpoll enabled: definitely a problem, the following sqes will be
     submitted in the next round.

The solution is to refactor the code logic to:
 - if a linked req's submittion fails, just mark it and the head(if it
   exists) as REQ_F_FAIL. Leverage req->result to indicate whether it
   is failed or cancelled.
 - submit or fail the whole chain when we come to the end of it.
Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20210827094609.36052-3-haoxu@linux.alibaba.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

a8295b98

io_uring: remove redundant req_set_fail() · 14afdd6e

由 Hao Xu 提交于 8月 27, 2021

req_set_fail() in io_submit_sqe() is redundant, remove it.
Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20210827094609.36052-2-haoxu@linux.alibaba.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

14afdd6e

26 8月, 2021 1 次提交

io_uring: don't free request to slab · 0c6e1d7f

由 Hao Xu 提交于 8月 26, 2021

It's not necessary to free the request back to slab when we fail to
get sqe, just move it to state->free_list.
Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20210825175856.194299-1-haoxu@linux.alibaba.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

0c6e1d7f

25 8月, 2021 3 次提交

io_uring: accept directly into fixed file table · aaa4db12

由 Pavel Begunkov 提交于 8月 25, 2021

As done with open opcodes, allow accept to skip installing fd into
processes' file tables and put it directly into io_uring's fixed file
table. Same restrictions and design as for open.
Suggested-by: NJosh Triplett <josh@joshtriplett.org>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/6d16163f376fac7ac26a656de6b42199143e9721.1629888991.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

aaa4db12

io_uring: hand code io_accept() fd installing · a7083ad5

由 Pavel Begunkov 提交于 8月 25, 2021

Make io_accept() to handle file descriptor allocations and installation.
A preparation patch for bypassing file tables.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/5b73d204caa0ce979ccb98136695b60f52a3d98c.1629888991.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

a7083ad5

io_uring: openat directly into fixed fd table · b9445598

由 Pavel Begunkov 提交于 8月 25, 2021

Instead of opening a file into a process's file table as usual and then
registering the fd within io_uring, some users may want to skip the
first step and place it directly into io_uring's fixed file table.
This patch adds such a capability for IORING_OP_OPENAT and
IORING_OP_OPENAT2.

The behaviour is controlled by setting sqe->file_index, where 0 implies
the old behaviour using normal file tables. If non-zero value is
specified, then it will behave as described and place the file into a
fixed file slot sqe->file_index - 1. A file table should be already
created, the slot should be valid and empty, otherwise the operation
will fail.

Keep the error codes consistent with IORING_OP_FILES_UPDATE, ENXIO and
EINVAL on inappropriate fixed tables, and return EBADF on collision with
already registered file.

Note: IOSQE_FIXED_FILE can't be used to switch between modes, because
accept takes a file, and it already uses the flag with a different
meaning.
Suggested-by: NJosh Triplett <josh@joshtriplett.org>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/e9b33d1163286f51ea707f87d95bd596dada1e65.1629888991.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

b9445598

24 8月, 2021 20 次提交

io_uring: fix io_try_cancel_userdata race for iowq · dadebc35

由 Pavel Begunkov 提交于 8月 23, 2021

WARNING: CPU: 1 PID: 5870 at fs/io_uring.c:5975 io_try_cancel_userdata+0x30f/0x540 fs/io_uring.c:5975
CPU: 0 PID: 5870 Comm: iou-wrk-5860 Not tainted 5.14.0-rc6-next-20210820-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:io_try_cancel_userdata+0x30f/0x540 fs/io_uring.c:5975
Call Trace:
 io_async_cancel fs/io_uring.c:6014 [inline]
 io_issue_sqe+0x22d5/0x65a0 fs/io_uring.c:6407
 io_wq_submit_work+0x1dc/0x300 fs/io_uring.c:6511
 io_worker_handle_work+0xa45/0x1840 fs/io-wq.c:533
 io_wqe_worker+0x2cc/0xbb0 fs/io-wq.c:582
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

io_try_cancel_userdata() can be called from io_async_cancel() executing
in the io-wq context, so the warning fires, which is there to alert
anyone accessing task->io_uring->io_wq in a racy way. However,
io_wq_put_and_exit() always first waits for all threads to complete,
so the only detail left is to zero tctx->io_wq after the context is
removed.

note: one little assumption is that when IO_WQ_WORK_CANCEL, the executor
won't touch ->io_wq, because io_wq_destroy() might cancel left pending
requests in such a way.

Cc: stable@vger.kernel.org
Reported-by: syzbot+b0c9d1588ae92866515f@syzkaller.appspotmail.com
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dfdd37a80cfa9ffd3e59538929c99cdd55d8699e.1629721757.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

dadebc35

io_uring: IRQ rw completion batching · 126180b9

由 Pavel Begunkov 提交于 8月 18, 2021

Employ inline completion logic for read/write completions done via
io_req_task_complete(). If ->uring_lock is contended, just do normal
request completion, but if not, make tctx_task_work() to grab the lock
and do batched inline completions in io_req_task_complete().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/94589c3ce69eaed86a21bb1ec696407a54fab1aa.1629286357.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

126180b9

io_uring: batch task work locking · f237c30a

由 Pavel Begunkov 提交于 8月 18, 2021

Many task_work handlers either grab ->uring_lock, or may benefit from
having it. Move locking logic out of individual handlers to a lazy
approach controlled by tctx_task_work(), so we don't keep doing
tons of mutex lock/unlock.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d6a34e147f2507a2f3e2fa1e38a9c541dcad3929.1629286357.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

f237c30a

io_uring: flush completions for fallbacks · 5636c00d

由 Pavel Begunkov 提交于 8月 18, 2021

io_fallback_req_func() doesn't expect anyone creating inline
completions, and no one currently does that. Teach the function to flush
completions preparing for further changes.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8b941516921f72e1a64d58932d671736892d7fff.1629286357.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

5636c00d

io_uring: add ->splice_fd_in checks · 26578cda

由 Pavel Begunkov 提交于 8月 20, 2021

->splice_fd_in is used only by splice/tee, but no other request checks
it for validity. Add the check for most of request types excluding
reads/writes/sends/recvs, we don't want overhead for them and can leave
them be as is until the field is actually used.

Cc: stable@vger.kernel.org
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f44bc2acd6777d932de3d71a5692235b5b2b7397.1629451684.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

26578cda

io_uring: add clarifying comment for io_cqring_ev_posted() · 2c5d763c

由 Jens Axboe 提交于 8月 21, 2021

We've previously had an issue where overflow flush unconditionally calls
io_cqring_ev_posted() even if it didn't flush any events to the ring,
causing wake and eventfd increment where no new events are available.
Some applications don't like that, see commit b18032bb for details.

This came up in discussion for another patch recently, hence add a
comment detailing what the relationship between calling the events
posted helper and CQ ring entries is.

Link: https://lore.kernel.org/io-uring/77a44fce-c831-16a6-8e80-9aee77f496a2@kernel.dk/Signed-off-by: NJens Axboe <axboe@kernel.dk>

2c5d763c

io_uring: place fixed tables under memcg limits · 0bea96f5

由 Pavel Begunkov 提交于 8月 20, 2021

Fixed tables may be large enough, place all of them together with
allocated tags under memcg limits.

Cc: stable@vger.kernel.org
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b3ac9f5da9821bb59837b5fe25e8ef4be982218c.1629451684.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

0bea96f5

io_uring: limit fixed table size by RLIMIT_NOFILE · 3a1b8a4e

由 Pavel Begunkov 提交于 8月 20, 2021

Limit the number of files in io_uring fixed tables by RLIMIT_NOFILE,
that's the first and the simpliest restriction that we should impose.

Cc: stable@vger.kernel.org
Suggested-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b2756c340aed7d6c0b302c26dab50c6c5907f4ce.1629451684.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

3a1b8a4e

io_uring: fix lack of protection for compl_nr · 99c8bc52

由 Hao Xu 提交于 8月 21, 2021

coml_nr in ctx_flush_and_put() is not protected by uring_lock, this
may cause problems when accessing in parallel:

say coml_nr > 0

  ctx_flush_and put                  other context
   if (compl_nr)                      get mutex
                                      coml_nr > 0
                                      do flush
                                          coml_nr = 0
                                      release mutex
        get mutex
           do flush (*)
        release mutex

in (*) place, we call io_cqring_ev_posted() and users likely get
no events there. To avoid spurious events, re-check the value when
under the lock.

Fixes: 2c32395d ("io_uring: fix __tctx_task_work() ctx race")
Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210820221954.61815-1-haoxu@linux.alibaba.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

99c8bc52

io_uring: Add register support for non-4k PAGE_SIZE · 187f08c1

由 wangyangbo 提交于 8月 19, 2021

Now allocated rsrc table uses PAGE_SIZE as the size of 2nd-level, and
accessing this table relies on each level index from fixed TABLE_SHIFT
(12 - 3) in 4k page case. In order to correctly work in non-4k page,
define TABLE_SHIFT as non-fixed (PAGE_SHIFT - shift of data) for
2nd-level table entry number.
Signed-off-by: Nwangyangbo <wangyangbo@uniontech.com>
Link: https://lore.kernel.org/r/20210819055657.27327-1-wangyangbo@uniontech.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

187f08c1

io_uring: extend task put optimisations · e98e49b2

由 Pavel Begunkov 提交于 8月 18, 2021

Now with IRQ completions done via IRQ, almost all requests freeing
are done from the context of submitter task, so it makes sense to
extend task_put optimisation from io_req_free_batch_finish() to cover
all the cases including task_work by moving it into io_put_task().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/824a7cbd745ddeee4a0f3ff85c558a24fd005872.1629302453.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

e98e49b2

io_uring: add comments on why PF_EXITING checking is safe · 316319e8

由 Jens Axboe 提交于 8月 19, 2021

We have two checks of task->flags & PF_EXITING left:

1) In io_req_task_submit(), which is called in task_work and hence always
   in the context of the original task. That means that
   req->task == current, and hence checking ->flags is totally fine.

2) In io_poll_rewait(), where we need to stop re-arming poll to prevent
   it interfering with cancelation. This is only run from task_work as
   well, and hence for this case too req->task == current.

Add a comment to both spots detailing that.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

316319e8

io_uring: fix io_timeout_remove locking · ec3c3d0f

由 Pavel Begunkov 提交于 8月 18, 2021

io_timeout_cancel() posts CQEs so needs ->completion_lock to be held,
so grab it in io_timeout_remove().

Fixes: 48ecb6369f1f2 ("io_uring: run timeouts from task_work")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d6f03d653a4d7bf693ef6f39b6a426b6d97fd96f.1629280204.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

ec3c3d0f

io_uring: improve same wq polling · 23a65db8

由 Pavel Begunkov 提交于 8月 17, 2021

Move earlier the check for whether __io_queue_proc() tries to poll
already polled waitqueue, and do the same for the second poll entry, if
any. Shouldn't really matter, but at least it would have a more
predictable behaviour.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8cb428cfe8ade0fd055859fabb878db8777d4c2f.1629228203.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

23a65db8

io_uring: reuse io_req_complete_post() · 505657bc

由 Pavel Begunkov 提交于 8月 17, 2021

We have io_req_complete_post() to post a CQE and put the request. It
takes care of all synchronisation and is more concise and efficent, so
replace all hancoded occurrences of
"lock; post CQE; unlock; + put_req()" with io_req_complete_post().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2c83463458a613f9d870e5147eb134da2aa70779.1629228203.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

505657bc

io_uring: better encapsulate buffer select for rw · ae421d93

由 Pavel Begunkov 提交于 8月 17, 2021

Make io_put_rw_kbuf() to do the REQ_F_BUFFER_SELECTED check, so all the
callers don't need to hand code it. The number of places where we call
io_put_rw_kbuf() is growing, so saves some pain.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3df3919e5e7efe03420c44ab4d9317a81a9cf398.1629228203.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

ae421d93

io_uring: optimise io_prep_linked_timeout() · 906c6caa

由 Pavel Begunkov 提交于 8月 15, 2021

Linked timeout handling during issuing is heavy, it adds extra
instructions and forces to save the next linked timeout before
io_issue_sqe().

Follwing the same reasoning as in refcounting patches, a request can't
be freed by the time it returns from io_issue_sqe(), so now we don't
need to do io_prep_linked_timeout() in advance, and it can be delayed to
colder paths optimising the generic path.

Also, it should also save quite a lot for requests with linked timeouts
and completed inline on timeout spinlocking + hrtimer_start() +
hrtimer_try_to_cancel() and so on.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/19bfc9a0d26c5c5f1e359f7650afe807ca8ef879.1628981736.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

906c6caa