提交 · 4e0377a1c5c633852f443a562ec55f7dfea65350 · openeuler / Kernel

02 2月, 2021 32 次提交

io_uring: Add skip option for __io_sqe_files_update · 4e0377a1

由 noah 提交于 1月 26, 2021

This patch adds support for skipping a file descriptor when using
IORING_REGISTER_FILES_UPDATE.  __io_sqe_files_update will skip fds set
to IORING_REGISTER_FILES_SKIP. IORING_REGISTER_FILES_SKIP is inturn
added as a #define in io_uring.h
Signed-off-by: Nnoah <goldstein.w.n@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4e0377a1

io_uring: cleanup files_update looping · 67973b93

由 Pavel Begunkov 提交于 1月 26, 2021

Replace a while with a simple for loop, that looks way more natural, and
enables us to use "continue" as indexes are no more updated by hand in
the end of the loop.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

67973b93

io_uring: consolidate putting reqs task · 7c660731

由 Pavel Begunkov 提交于 1月 25, 2021

We grab a task for each request and while putting it it also have to do
extra work like inflight accounting and waking up that task. This
sequence is duplicated several time, it's good time to add a helper.
More to that, the helper generates better code due to better locality
and so not failing alias analysis.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7c660731

io_uring: ensure only sqo_task has file notes · ecfc8492

由 Pavel Begunkov 提交于 1月 25, 2021

For SQPOLL io_uring we want to have only one file note held by
sqo_task. Add a warning to make sure it holds. It's deep in
io_uring_add_task_file() out of hot path, so shouldn't hurt.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ecfc8492

io_uring: simplify io_remove_personalities() · 0bead8cd

由 Yejune Deng 提交于 12月 24, 2020

The function io_remove_personalities() is very similar to
io_unregister_personality(),so implement io_remove_personalities()
calling io_unregister_personality().
Signed-off-by: NYejune Deng <yejune.deng@gmail.com>
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0bead8cd

io_uring/io-wq: kill off now unused IO_WQ_WORK_NO_CANCEL · 4014d943

由 Jens Axboe 提交于 1月 19, 2021

It's no longer used as IORING_OP_CLOSE got rid for the need of flagging
it as uncancelable, kill it of.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4014d943

io_uring: get rid of intermediate IORING_OP_CLOSE stage · 9eac1904

由 Jens Axboe 提交于 1月 19, 2021

We currently split the close into two, in case we have a ->flush op
that we can't safely handle from non-blocking context. This requires
us to flag the op as uncancelable if we do need to punt it async, and
that means special handling for just this op type.

Use __close_fd_get_file() and grab the files lock so we can get the file
and check if we need to go async in one atomic operation. That gets rid
of the need for splitting this into two steps, and hence the need for
IO_WQ_WORK_NO_CANCEL.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9eac1904

io_uring: save atomic dec for inline executed reqs · e342c807

由 Pavel Begunkov 提交于 1月 19, 2021

When a request is completed with comp_state, its completion reference
put is deferred to io_submit_flush_completions(), but the submission
is put not far from there, so do it together to save one atomic dec per
request. That targets requests that complete inline, e.g. buffered rw,
send/recv.

Proper benchmarking haven't been conducted but for nops(batch=32) it was
around 7901 vs 8117 KIOPS (~2.7%), or ~4% per perf profiling.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e342c807

io_uring: don't flush CQEs deep down the stack · 9affd664

由 Pavel Begunkov 提交于 1月 19, 2021

io_submit_flush_completions() is called down the stack in the _state
version of io_req_complete(), that's ok because is only called by
io_uring opcode handler functions directly. Move it up to
__io_queue_sqe() as preparation.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9affd664

io_uring: help inlining of io_req_complete() · a38d68db

由 Pavel Begunkov 提交于 1月 19, 2021

__io_req_complete() inlining is a bit weird, some compilers don't
optimise out the non-NULL branch of it even when called as
io_req_complete(). Help it a bit by extracting state and stateless
helpers out of __io_req_complete().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a38d68db

io_uring: add a helper timeout mode calculation · 8662daec

由 Pavel Begunkov 提交于 1月 19, 2021

Deduplicates translation of timeout flags into hrtimer_mode.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8662daec

io_uring: deduplicate failing task_work_add · eab30c4d

由 Pavel Begunkov 提交于 1月 19, 2021

When io_req_task_work_add() fails, the request will be cancelled by
enqueueing via task_works of io-wq. Extract a function for that.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

eab30c4d

io_uring: remove __io_state_file_put · 02b23a9a

由 Pavel Begunkov 提交于 1月 19, 2021

The check in io_state_file_put() is optimised pretty well when called
from __io_file_get(). Don't pollute the code with all these variants.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

02b23a9a

io_uring: simplify io_alloc_req() · 85bcb6c6

由 Pavel Begunkov 提交于 1月 19, 2021

Get rid of a label in io_alloc_req(), it's cleaner to do return
directly.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

85bcb6c6

io_uring: further deduplicate #CQ events calc · 888aae2e

由 Pavel Begunkov 提交于 1月 19, 2021

Apparently, there is one more place hand coded calculation of number of
CQ events in the ring. Use __io_cqring_events() helper in
io_get_cqring() as well. Naturally, assembly stays identical.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

888aae2e

io_uring: inline __io_commit_cqring() · ec30e04b

由 Pavel Begunkov 提交于 1月 19, 2021

Inline it in its only user, that's cleaner
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ec30e04b

io_uring: inline io_async_submit() · 2d7e9358

由 Pavel Begunkov 提交于 1月 19, 2021

The name is confusing and it's used only in one place.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2d7e9358

io_uring: cleanup personalities under uring_lock · 5c766a90

由 Pavel Begunkov 提交于 1月 19, 2021

personality_idr is usually synchronised by uring_lock, the exception
would be removing personalities in io_ring_ctx_wait_and_kill(), which
is legit as refs are killed by that point but still would be more
resilient to do it under the lock.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5c766a90

io_uring: refactor io_resubmit_prep() · dc2a6e9a

由 Pavel Begunkov 提交于 1月 19, 2021

It's awkward to pass return a value into a function for it to return it
back. Check it at the caller site and clean up io_resubmit_prep() a bit.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

dc2a6e9a

io_uring: optimise io_rw_reissue() · bf6182b6

由 Pavel Begunkov 提交于 1月 19, 2021

The hot path is IO completing on the first try. Reshuffle io_rw_reissue() so
it's checked first.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

bf6182b6

io_uring: make percpu_ref_release names consistent · 00835dce

由 Bijan Mottahedeh 提交于 1月 15, 2021

Make the percpu ref release function names consistent between rsrc data
and nodes.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

00835dce

io_uring: create common fixed_rsrc_data allocation routines · 1ad555c6

由 Bijan Mottahedeh 提交于 1月 15, 2021

Create common alloc/free fixed_rsrc_data routines for both files and
buffers.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
[remove buffer part]
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

1ad555c6

io_uring: create common fixed_rsrc_ref_node handling routines · d7954b2b

由 Bijan Mottahedeh 提交于 1月 15, 2021

Create common routines to be used for both files/buffers registration.

[remove io_sqe_rsrc_set_node substitution]
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
[merge, quiesce only for files]
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d7954b2b

io_uring: split ref_node alloc and init · bc9744cd

由 Pavel Begunkov 提交于 1月 15, 2021

A simple prep patch allowing to set refnode callbacks after it was
allocated. This needed to 1) keep ourself off of hi-level functions
where it's not pretty and they are not necessary 2) amortise ref_node
allocation in the future, e.g. for updates.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

bc9744cd

io_uring: split alloc_fixed_file_ref_node · 6802535d

由 Bijan Mottahedeh 提交于 1月 15, 2021

Split alloc_fixed_file_ref_node into resource generic/specific parts,
to be leveraged for fixed buffers.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6802535d

io_uring: add rsrc_ref locking routines · 2a63b2d9

由 Bijan Mottahedeh 提交于 1月 15, 2021

Encapsulate resource reference locking into separate routines.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2a63b2d9

io_uring: separate ref_list from fixed_rsrc_data · d67d2263

由 Bijan Mottahedeh 提交于 1月 15, 2021

Uplevel ref_list and make it common to all resources.  This is to
allow one common ref_list to be used for both files, and buffers
in upcoming patches.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d67d2263

io_uring: generalize io_queue_rsrc_removal · 50238531

由 Bijan Mottahedeh 提交于 1月 15, 2021

Generalize io_queue_rsrc_removal to handle both files and buffers.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
[remove io_mapped_ubuf from rsrc tables/etc. for now]
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

50238531

io_uring: rename file related variables to rsrc · 269bbe5f

由 Bijan Mottahedeh 提交于 1月 15, 2021

This is a prep rename patch for subsequent patches to generalize file
registration.

[io_uring_rsrc_update:: rename fds -> data]
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
[leave io_uring_files_update as struct]
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

269bbe5f

io_uring: modularize io_sqe_buffers_register · 2b358604

由 Bijan Mottahedeh 提交于 1月 06, 2021

Move allocation of buffer management structures, and validation of
buffers into separate routines.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2b358604

io_uring: modularize io_sqe_buffer_register · 0a96bbe4

由 Bijan Mottahedeh 提交于 1月 06, 2021

Split io_sqe_buffer_register into two routines:

- io_sqe_buffer_register() registers a single buffer
- io_sqe_buffers_register iterates over all user specified buffers
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0a96bbe4

io_uring: enable LOOKUP_CACHED path resolution for filename lookups · 3a81fd02

由 Jens Axboe 提交于 12月 10, 2020

Instead of being pessimistic and assume that path lookup will block, use
LOOKUP_CACHED to attempt just a cached lookup. This ensures that the
fast path is always done inline, and we only punt to async context if
IO is needed to satisfy the lookup.

For forced nonblock open attempts, mark the file O_NONBLOCK over the
actual ->open() call as well. We can safely clear this again before
doing fd_install(), so it'll never be user visible that we fiddled with
it.

This greatly improves the performance of file open where the dentry is
already cached:

ached		5.10-git	5.10-git+LOOKUP_CACHED	Speedup
---------------------------------------------------------------
33%		1,014,975	900,474			1.1x
89%		 545,466	292,937			1.9x
100%		 435,636	151,475			2.9x

The more cache hot we are, the faster the inline LOOKUP_CACHED
optimization helps. This is unsurprising and expected, as a thread
offload becomes a more dominant part of the total overhead. If we look
at io_uring tracing, doing an IORING_OP_OPENAT on a file that isn't in
the dentry cache will yield:

275.550481: io_uring_create: ring 00000000ddda6278, fd 3 sq size 8, cq size 16, flags 0
275.550491: io_uring_submit_sqe: ring 00000000ddda6278, op 18, data 0x0, non block 1, sq_thread 0
275.550498: io_uring_queue_async_work: ring 00000000ddda6278, request 00000000c0267d17, flags 69760, normal queue, work 000000003d683991
275.550502: io_uring_cqring_wait: ring 00000000ddda6278, min_events 1
275.550556: io_uring_complete: ring 00000000ddda6278, user_data 0x0, result 4

which shows a failed nonblock lookup, then punt to worker, and then we
complete with fd == 4. This takes 65 usec in total. Re-running the same
test case again:

281.253956: io_uring_create: ring 0000000008207252, fd 3 sq size 8, cq size 16, flags 0
281.253967: io_uring_submit_sqe: ring 0000000008207252, op 18, data 0x0, non block 1, sq_thread 0
281.253973: io_uring_complete: ring 0000000008207252, user_data 0x0, result 4

shows the same request completing inline, also returning fd == 4. This
takes 6 usec.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3a81fd02

29 1月, 2021 3 次提交

io_uring: reinforce cancel on flush during exit · 3a7efd1a

由 Pavel Begunkov 提交于 1月 28, 2021

What 84965ff8 ("io_uring: if we see flush on exit, cancel related tasks")
really wants is to cancel all relevant REQ_F_INFLIGHT requests reliably.
That can be achieved by io_uring_cancel_files(), but we'll miss it
calling io_uring_cancel_task_requests(files=NULL) from io_uring_flush(),
because it will go through __io_uring_cancel_task_requests().

Just always call io_uring_cancel_files() during cancel, it's good enough
for now.

Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3a7efd1a

io_uring: fix sqo ownership false positive warning · 70b2c60d

由 Pavel Begunkov 提交于 1月 28, 2021

WARNING: CPU: 0 PID: 21359 at fs/io_uring.c:9042
    io_uring_cancel_task_requests+0xe55/0x10c0 fs/io_uring.c:9042
Call Trace:
 io_uring_flush+0x47b/0x6e0 fs/io_uring.c:9227
 filp_close+0xb4/0x170 fs/open.c:1295
 close_files fs/file.c:403 [inline]
 put_files_struct fs/file.c:418 [inline]
 put_files_struct+0x1cc/0x350 fs/file.c:415
 exit_files+0x7e/0xa0 fs/file.c:435
 do_exit+0xc22/0x2ae0 kernel/exit.c:820
 do_group_exit+0x125/0x310 kernel/exit.c:922
 get_signal+0x427/0x20f0 kernel/signal.c:2773
 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
 handle_signal_work kernel/entry/common.c:147 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
 syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Now io_uring_cancel_task_requests() can be called not through file
notes but directly, remove a WARN_ONCE() there that give us false
positives. That check is not very important and we catch it in other
places.

Fixes: 84965ff8 ("io_uring: if we see flush on exit, cancel related tasks")
Cc: stable@vger.kernel.org # 5.9+
Reported-by: syzbot+3e3d9bd0c6ce9efbc3ef@syzkaller.appspotmail.com
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

70b2c60d

io_uring: fix list corruption for splice file_get · f609cbb8

由 Pavel Begunkov 提交于 1月 28, 2021

kernel BUG at lib/list_debug.c:29!
Call Trace:
 __list_add include/linux/list.h:67 [inline]
 list_add include/linux/list.h:86 [inline]
 io_file_get+0x8cc/0xdb0 fs/io_uring.c:6466
 __io_splice_prep+0x1bc/0x530 fs/io_uring.c:3866
 io_splice_prep fs/io_uring.c:3920 [inline]
 io_req_prep+0x3546/0x4e80 fs/io_uring.c:6081
 io_queue_sqe+0x609/0x10d0 fs/io_uring.c:6628
 io_submit_sqe fs/io_uring.c:6705 [inline]
 io_submit_sqes+0x1495/0x2720 fs/io_uring.c:6953
 __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9353
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

io_file_get() may be called from splice, and so REQ_F_INFLIGHT may
already be set.

Fixes: 02a13674 ("io_uring: account io_uring internal files as REQ_F_INFLIGHT")
Cc: stable@vger.kernel.org # 5.9+
Reported-by: syzbot+6879187cf57845801267@syzkaller.appspotmail.com
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f609cbb8

28 1月, 2021 1 次提交

io_uring: fix flush cqring overflow list while TASK_INTERRUPTIBLE · 6195ba09

由 Hao Xu 提交于 1月 27, 2021

Abaci reported the follow warning:

[   27.073425] do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait_exclusive+0x3a/0xc0
[   27.075805] WARNING: CPU: 0 PID: 951 at kernel/sched/core.c:7853 __might_sleep+0x80/0xa0
[   27.077604] Modules linked in:
[   27.078379] CPU: 0 PID: 951 Comm: a.out Not tainted 5.11.0-rc3+ #1
[   27.079637] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   27.080852] RIP: 0010:__might_sleep+0x80/0xa0
[   27.081835] Code: 65 48 8b 04 25 80 71 01 00 48 8b 90 c0 15 00 00 48 8b 70 18 48 c7 c7 08 39 95 82 c6 05 f9 5f de 08 01 48 89 d1 e8 00 c6 fa ff  0b eb bf 41 0f b6 f5 48 c7 c7 40 23 c9 82 e8 f3 48 ec 00 eb a7
[   27.084521] RSP: 0018:ffffc90000fe3ce8 EFLAGS: 00010286
[   27.085350] RAX: 0000000000000000 RBX: ffffffff82956083 RCX: 0000000000000000
[   27.086348] RDX: ffff8881057a0000 RSI: ffffffff8118cc9e RDI: ffff88813bc28570
[   27.087598] RBP: 00000000000003a7 R08: 0000000000000001 R09: 0000000000000001
[   27.088819] R10: ffffc90000fe3e00 R11: 00000000fffef9f0 R12: 0000000000000000
[   27.089819] R13: 0000000000000000 R14: ffff88810576eb80 R15: ffff88810576e800
[   27.091058] FS:  00007f7b144cf740(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
[   27.092775] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   27.093796] CR2: 00000000022da7b8 CR3: 000000010b928002 CR4: 00000000003706f0
[   27.094778] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   27.095780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   27.097011] Call Trace:
[   27.097685]  __mutex_lock+0x5d/0xa30
[   27.098565]  ? prepare_to_wait_exclusive+0x71/0xc0
[   27.099412]  ? io_cqring_overflow_flush.part.101+0x6d/0x70
[   27.100441]  ? lockdep_hardirqs_on_prepare+0xe9/0x1c0
[   27.101537]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
[   27.102656]  ? trace_hardirqs_on+0x46/0x110
[   27.103459]  ? io_cqring_overflow_flush.part.101+0x6d/0x70
[   27.104317]  io_cqring_overflow_flush.part.101+0x6d/0x70
[   27.105113]  io_cqring_wait+0x36e/0x4d0
[   27.105770]  ? find_held_lock+0x28/0xb0
[   27.106370]  ? io_uring_remove_task_files+0xa0/0xa0
[   27.107076]  __x64_sys_io_uring_enter+0x4fb/0x640
[   27.107801]  ? rcu_read_lock_sched_held+0x59/0xa0
[   27.108562]  ? lockdep_hardirqs_on_prepare+0xe9/0x1c0
[   27.109684]  ? syscall_enter_from_user_mode+0x26/0x70
[   27.110731]  do_syscall_64+0x2d/0x40
[   27.111296]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   27.112056] RIP: 0033:0x7f7b13dc8239
[   27.112663] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05  3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48
[   27.115113] RSP: 002b:00007ffd6d7f5c88 EFLAGS: 00000286 ORIG_RAX: 00000000000001aa
[   27.116562] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b13dc8239
[   27.117961] RDX: 000000000000478e RSI: 0000000000000000 RDI: 0000000000000003
[   27.118925] RBP: 00007ffd6d7f5cb0 R08: 0000000020000040 R09: 0000000000000008
[   27.119773] R10: 0000000000000001 R11: 0000000000000286 R12: 0000000000400480
[   27.120614] R13: 00007ffd6d7f5d90 R14: 0000000000000000 R15: 0000000000000000
[   27.121490] irq event stamp: 5635
[   27.121946] hardirqs last  enabled at (5643): [] console_unlock+0x5c4/0x740
[   27.123476] hardirqs last disabled at (5652): [] console_unlock+0x4e7/0x740
[   27.125192] softirqs last  enabled at (5272): [] __do_softirq+0x3c5/0x5aa
[   27.126430] softirqs last disabled at (5267): [] asm_call_irq_on_stack+0xf/0x20
[   27.127634] ---[ end trace 289d7e28fa60f928 ]---

This is caused by calling io_cqring_overflow_flush() which may sleep
after calling prepare_to_wait_exclusive() which set task state to
TASK_INTERRUPTIBLE
Reported-by: NAbaci <abaci@linux.alibaba.com>
Fixes: 6c503150 ("io_uring: patch up IOPOLL overflow_flush sync")
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6195ba09

27 1月, 2021 2 次提交

io_uring: fix wqe->lock/completion_lock deadlock · 907d1df3

由 Pavel Begunkov 提交于 1月 26, 2021

Joseph reports following deadlock:

CPU0:
...
io_kill_linked_timeout  // &ctx->completion_lock
io_commit_cqring
__io_queue_deferred
__io_queue_async_work
io_wq_enqueue
io_wqe_enqueue  // &wqe->lock

CPU1:
...
__io_uring_files_cancel
io_wq_cancel_cb
io_wqe_cancel_pending_work  // &wqe->lock
io_cancel_task_cb  // &ctx->completion_lock

Only __io_queue_deferred() calls queue_async_work() while holding
ctx->completion_lock, enqueue drained requests via io_req_task_queue()
instead.

Cc: stable@vger.kernel.org # 5.9+
Reported-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Tested-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

907d1df3

io_uring: fix cancellation taking mutex while TASK_UNINTERRUPTIBLE · ca70f00b

由 Pavel Begunkov 提交于 1月 26, 2021

do not call blocking ops when !TASK_RUNNING; state=2 set at
	[<00000000ced9dbfc>] prepare_to_wait+0x1f4/0x3b0
	kernel/sched/wait.c:262
WARNING: CPU: 1 PID: 19888 at kernel/sched/core.c:7853
	__might_sleep+0xed/0x100 kernel/sched/core.c:7848
RIP: 0010:__might_sleep+0xed/0x100 kernel/sched/core.c:7848
Call Trace:
 __mutex_lock_common+0xc4/0x2ef0 kernel/locking/mutex.c:935
 __mutex_lock kernel/locking/mutex.c:1103 [inline]
 mutex_lock_nested+0x1a/0x20 kernel/locking/mutex.c:1118
 io_wq_submit_work+0x39a/0x720 fs/io_uring.c:6411
 io_run_cancel fs/io-wq.c:856 [inline]
 io_wqe_cancel_pending_work fs/io-wq.c:990 [inline]
 io_wq_cancel_cb+0x614/0xcb0 fs/io-wq.c:1027
 io_uring_cancel_files fs/io_uring.c:8874 [inline]
 io_uring_cancel_task_requests fs/io_uring.c:8952 [inline]
 __io_uring_files_cancel+0x115d/0x19e0 fs/io_uring.c:9038
 io_uring_files_cancel include/linux/io_uring.h:51 [inline]
 do_exit+0x2e6/0x2490 kernel/exit.c:780
 do_group_exit+0x168/0x2d0 kernel/exit.c:922
 get_signal+0x16b5/0x2030 kernel/signal.c:2770
 arch_do_signal_or_restart+0x8e/0x6a0 arch/x86/kernel/signal.c:811
 handle_signal_work kernel/entry/common.c:147 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
 syscall_exit_to_user_mode+0x48/0x190 kernel/entry/common.c:302
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Rewrite io_uring_cancel_files() to mimic __io_uring_task_cancel()'s
counting scheme, so it does all the heavy work before setting
TASK_UNINTERRUPTIBLE.

Cc: stable@vger.kernel.org # 5.9+
Reported-by: syzbot+f655445043a26a7cfab8@syzkaller.appspotmail.com
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
[axboe: fix inverted task check]
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ca70f00b

26 1月, 2021 1 次提交

io_uring: fix __io_uring_files_cancel() with TASK_UNINTERRUPTIBLE · a1bb3cd5

由 Pavel Begunkov 提交于 1月 26, 2021

If the tctx inflight number haven't changed because of cancellation,
__io_uring_task_cancel() will continue leaving the task in
TASK_UNINTERRUPTIBLE state, that's not expected by
__io_uring_files_cancel(). Ensure we always call finish_wait() before
retrying.

Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a1bb3cd5

25 1月, 2021 1 次提交

io_uring: only call io_cqring_ev_posted() if events were posted · b18032bb

由 Jens Axboe 提交于 1月 24, 2021

This normally doesn't cause any extra harm, but it does mean that we'll
increment the eventfd notification count, if one has been registered
with the ring. This can confuse applications, when they see more
notifications on the eventfd side than are available in the ring.

Do the nice thing and only increment this count, if we actually posted
(or even overflowed) events.
Reported-and-tested-by: NDan Melnic <dmm@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b18032bb

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功