1. 24 Aug 2021 (17 commits)
  2. 21 Aug 2021 (1 commit)
  3. 18 Aug 2021 (1 commit)
  4. 15 Aug 2021 (1 commit)
    • io_uring: only assign io_uring_enter() SQPOLL error in actual error case · 21f96522
      Jens Axboe authored
      If an SQPOLL based ring is newly created and an application issues an
      io_uring_enter(2) system call on it, then we can return a spurious
      -EOWNERDEAD error. This happens because there's nothing to submit, and
      if the caller doesn't specify any other action, the initial error
      assignment of -EOWNERDEAD never gets overwritten. This causes us to
      return it directly, even if it isn't valid.
      
      Move the error assignment into the actual failure case instead.
      
      Cc: stable@vger.kernel.org
      Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
      Reported-by: Sherlock Holo <sherlockya@gmail.com>
      Link: https://github.com/axboe/liburing/issues/413
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      21f96522
  5. 10 Aug 2021 (3 commits)
    • io_uring: fix ctx-exit io_rsrc_put_work() deadlock · 43597aac
      Pavel Begunkov authored
      __io_rsrc_put_work() might need ->uring_lock, so nobody should wait for
      rsrc nodes holding the mutex. However, that's exactly what
      io_ring_ctx_free() does with io_wait_rsrc_data().
      
      Split it into rsrc wait + dealloc, and move the first one out of the
      lock.
      
      Cc: stable@vger.kernel.org
      Fixes: b60c8dce ("io_uring: preparation for rsrc tagging")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/0130c5c2693468173ec1afab714e0885d2c9c363.1628559783.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      43597aac
    • io_uring: drop ctx->uring_lock before flushing work item · c018db4a
      Jens Axboe authored
      Ammar reports that he's seeing a lockdep splat on running test/rsrc_tags
      from the regression suite:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      5.14.0-rc3-bluetea-test-00249-gc7d10223 #5 Tainted: G           OE
      ------------------------------------------------------
      kworker/2:4/2684 is trying to acquire lock:
      ffff88814bb1c0a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_rsrc_put_work+0x13d/0x1a0
      
      but task is already holding lock:
      ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}:
             __flush_work+0x31b/0x490
             io_rsrc_ref_quiesce.part.0.constprop.0+0x35/0xb0
             __do_sys_io_uring_register+0x45b/0x1060
             do_syscall_64+0x35/0xb0
             entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      -> #0 (&ctx->uring_lock){+.+.}-{3:3}:
             __lock_acquire+0x119a/0x1e10
             lock_acquire+0xc8/0x2f0
             __mutex_lock+0x86/0x740
             io_rsrc_put_work+0x13d/0x1a0
             process_one_work+0x236/0x530
             worker_thread+0x52/0x3b0
             kthread+0x135/0x160
             ret_from_fork+0x1f/0x30
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock((work_completion)(&(&ctx->rsrc_put_work)->work));
                                     lock(&ctx->uring_lock);
                                     lock((work_completion)(&(&ctx->rsrc_put_work)->work));
        lock(&ctx->uring_lock);
      
       *** DEADLOCK ***
      
      2 locks held by kworker/2:4/2684:
       #0: ffff88810004d938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
       #1: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
      
      stack backtrace:
      CPU: 2 PID: 2684 Comm: kworker/2:4 Tainted: G           OE     5.14.0-rc3-bluetea-test-00249-gc7d10223 #5
      Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015
      Workqueue: events io_rsrc_put_work
      Call Trace:
       dump_stack_lvl+0x6a/0x9a
       check_noncircular+0xfe/0x110
       __lock_acquire+0x119a/0x1e10
       lock_acquire+0xc8/0x2f0
       ? io_rsrc_put_work+0x13d/0x1a0
       __mutex_lock+0x86/0x740
       ? io_rsrc_put_work+0x13d/0x1a0
       ? io_rsrc_put_work+0x13d/0x1a0
       ? io_rsrc_put_work+0x13d/0x1a0
       ? process_one_work+0x1ce/0x530
       io_rsrc_put_work+0x13d/0x1a0
       process_one_work+0x236/0x530
       worker_thread+0x52/0x3b0
       ? process_one_work+0x530/0x530
       kthread+0x135/0x160
       ? set_kthread_struct+0x40/0x40
       ret_from_fork+0x1f/0x30
      
      which is due to holding the ctx->uring_lock when flushing existing
      pending work, while the pending work flushing may need to grab the uring
      lock if we're using IOPOLL.
      
      Fix this by dropping the uring_lock a bit earlier as part of the flush.
      
      Cc: stable@vger.kernel.org
      Link: https://github.com/axboe/liburing/issues/404
      Tested-by: Ammar Faizi <ammarfaizi2@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c018db4a
    • io_uring: rsrc ref lock needs to be IRQ safe · 4956b9ea
      Jens Axboe authored
      Nadav reports running into the below splat on re-enabling softirqs:
      
      WARNING: CPU: 2 PID: 1777 at kernel/softirq.c:364 __local_bh_enable_ip+0xaa/0xe0
      Modules linked in:
      CPU: 2 PID: 1777 Comm: umem Not tainted 5.13.1+ #161
      Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
      RIP: 0010:__local_bh_enable_ip+0xaa/0xe0
      Code: a9 00 ff ff 00 74 38 65 ff 0d a2 21 8c 7a e8 ed 1a 20 00 fb 66 0f 1f 44 00 00 5b 41 5c 5d c3 65 8b 05 e6 2d 8c 7a 85 c0 75 9a <0f> 0b eb 96 e8 2d 1f 20 00 eb a5 4c 89 e7 e8 73 4f 0c 00 eb ae 65
      RSP: 0018:ffff88812e58fcc8 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000201 RCX: dffffc0000000000
      RDX: 0000000000000007 RSI: 0000000000000201 RDI: ffffffff8898c5ac
      RBP: ffff88812e58fcd8 R08: ffffffff8575dbbf R09: ffffed1028ef14f9
      R10: ffff88814778a7c3 R11: ffffed1028ef14f8 R12: ffffffff85c9e9ae
      R13: ffff88814778a000 R14: ffff88814778a7b0 R15: ffff8881086db890
      FS:  00007fbcfee17700(0000) GS:ffff8881e0300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000c0402a5008 CR3: 000000011c1ac003 CR4: 00000000003706e0
      Call Trace:
       _raw_spin_unlock_bh+0x31/0x40
       io_rsrc_node_ref_zero+0x13e/0x190
       io_dismantle_req+0x215/0x220
       io_req_complete_post+0x1b8/0x720
       __io_complete_rw.isra.0+0x16b/0x1f0
       io_complete_rw+0x10/0x20
      
      where it's clear we end up calling the percpu count release directly
      from the completion path, as it's in atomic mode and we drop the last
      ref. For file/block IO, this can be from IRQ context already, and the
      softirq locking for rsrc isn't enough.
      
      Just make the lock fully IRQ safe, and ensure we correctly save state
      from the release path, as we don't know the full context there.
      Reported-by: Nadav Amit <nadav.amit@gmail.com>
      Tested-by: Nadav Amit <nadav.amit@gmail.com>
      Link: https://lore.kernel.org/io-uring/C187C836-E78B-4A31-B24C-D16919ACA093@gmail.com/
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      4956b9ea
  6. 09 Aug 2021 (2 commits)
    • io_uring: Use WRITE_ONCE() when writing to sq_flags · 20c0b380
      Nadav Amit authored
      The compiler should be forbidden from any strange optimization for async
      writes to user visible data-structures. Without proper protection, the
      compiler can cause write-tearing or invent writes that would confuse
      userspace.
      
      However, there are writes to sq_flags which are not protected by
      WRITE_ONCE(). Use WRITE_ONCE() for these writes.
      
      This is purely a theoretical issue. Presumably, any compiler is very
      unlikely to do such optimizations.
      
      Fixes: 75b28aff ("io_uring: allocate the two rings together")
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Link: https://lore.kernel.org/r/20210808001342.964634-3-namit@vmware.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      20c0b380
    • io_uring: clear TIF_NOTIFY_SIGNAL when running task work · ef98eb04
      Nadav Amit authored
      When using SQPOLL, the submission queue polling thread calls
      task_work_run() to run queued work. However, when work is added with
      TWA_SIGNAL - as done by io_uring itself - the TIF_NOTIFY_SIGNAL remains
      set afterwards and is never cleared.
      
      Consequently, when the submission queue polling thread checks whether
      signal_pending(), it may always find a pending signal, if
      task_work_add() was ever called before.
      
      The impact of this bug might be different on different kernel versions.
      It appears that on 5.14 it would only cause unnecessary calculation and
      prevent the polling thread from sleeping. On 5.13, where the bug was
      found, it stops the polling thread from finding newly submitted work.
      
      Instead of task_work_run(), use tracehook_notify_signal() that clears
      TIF_NOTIFY_SIGNAL. Test for TIF_NOTIFY_SIGNAL in addition to
      current->task_works to avoid a race in which task_works is cleared but
      the TIF_NOTIFY_SIGNAL is set.
      
      Fixes: 685fe7fe ("io-wq: eliminate the need for a manager thread")
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Link: https://lore.kernel.org/r/20210808001342.964634-2-namit@vmware.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ef98eb04
  7. 28 Jul 2021 (3 commits)
  8. 27 Jul 2021 (1 commit)
  9. 26 Jul 2021 (1 commit)
  10. 24 Jul 2021 (2 commits)
  11. 23 Jul 2021 (1 commit)
  12. 20 Jul 2021 (3 commits)
    • io_uring: fix memleak in io_init_wq_offload() · 362a9e65
      Yang Yingliang authored
      I got memory leak report when doing fuzz test:
      
      BUG: memory leak
      unreferenced object 0xffff888107310a80 (size 96):
      comm "syz-executor.6", pid 4610, jiffies 4295140240 (age 20.135s)
      hex dump (first 32 bytes):
      01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
      backtrace:
      [<000000001974933b>] kmalloc include/linux/slab.h:591 [inline]
      [<000000001974933b>] kzalloc include/linux/slab.h:721 [inline]
      [<000000001974933b>] io_init_wq_offload fs/io_uring.c:7920 [inline]
      [<000000001974933b>] io_uring_alloc_task_context+0x466/0x640 fs/io_uring.c:7955
      [<0000000039d0800d>] __io_uring_add_tctx_node+0x256/0x360 fs/io_uring.c:9016
      [<000000008482e78c>] io_uring_add_tctx_node fs/io_uring.c:9052 [inline]
      [<000000008482e78c>] __do_sys_io_uring_enter fs/io_uring.c:9354 [inline]
      [<000000008482e78c>] __se_sys_io_uring_enter fs/io_uring.c:9301 [inline]
      [<000000008482e78c>] __x64_sys_io_uring_enter+0xabc/0xc20 fs/io_uring.c:9301
      [<00000000b875f18f>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      [<00000000b875f18f>] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
      [<000000006b0a8484>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      CPU0                          CPU1
      io_uring_enter                io_uring_enter
      io_uring_add_tctx_node        io_uring_add_tctx_node
      __io_uring_add_tctx_node      __io_uring_add_tctx_node
      io_uring_alloc_task_context   io_uring_alloc_task_context
      io_init_wq_offload            io_init_wq_offload
      hash = kzalloc                hash = kzalloc
      ctx->hash_map = hash          ctx->hash_map = hash <- one of the hash is leaked
      
      When io_uring_enter() is called in parallel, one of the 'hash_map'
      allocations is leaked; take uring_lock to protect 'hash_map'.
      
      Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/20210720083805.3030730-1-yangyingliang@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      362a9e65
    • io_uring: remove double poll entry on arm failure · 46fee9ab
      Pavel Begunkov authored
      __io_queue_proc() can enqueue both poll entries and still fail
      afterwards, so the callers trying to cancel it should also try to remove
      the second poll entry (if any).
      
      For example, it may leave the request alive referencing an io_uring
      context but not accessible for cancellation:
      
      [  282.599913][ T1620] task:iou-sqp-23145   state:D stack:28720 pid:23155 ppid:  8844 flags:0x00004004
      [  282.609927][ T1620] Call Trace:
      [  282.613711][ T1620]  __schedule+0x93a/0x26f0
      [  282.634647][ T1620]  schedule+0xd3/0x270
      [  282.638874][ T1620]  io_uring_cancel_generic+0x54d/0x890
      [  282.660346][ T1620]  io_sq_thread+0xaac/0x1250
      [  282.696394][ T1620]  ret_from_fork+0x1f/0x30
      
      Cc: stable@vger.kernel.org
      Fixes: 18bceab1 ("io_uring: allow POLL_ADD with double poll_wait() users")
      Reported-and-tested-by: syzbot+ac957324022b7132accf@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/0ec1228fc5eda4cb524eeda857da8efdc43c331c.1626774457.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      46fee9ab
    • io_uring: explicitly count entries for poll reqs · 68b11e8b
      Pavel Begunkov authored
      If __io_queue_proc() fails to add a second poll entry, e.g. kmalloc()
      failed, but it goes on with a third waitqueue, it may succeed and
      overwrite the error status. Count the number of poll entries we added,
      so we can set pt->error to zero at the beginning and find out when the
      mentioned scenario happens.
      
      Cc: stable@vger.kernel.org
      Fixes: 18bceab1 ("io_uring: allow POLL_ADD with double poll_wait() users")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/9d6b9e561f88bcc0163623b74a76c39f712151c3.1626774457.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      68b11e8b
  13. 12 Jul 2021 (2 commits)
  14. 09 Jul 2021 (2 commits)