提交 · eef51daa72f745b6e771d18f6f37c7e5cd4ccdf1 · openeuler / Kernel

14 6月, 2021 2 次提交

io-wq: remove unused io-wq refcounting · 382cb030

由 Pavel Begunkov 提交于 6月 14, 2021

iowq->refs is initialised to one and killed on exit, so it's not used
and we can kill it.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/401007393528ea7c102360e69a29b64498e15db2.1623634181.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

382cb030

io-wq: embed wqe ptr array into struct io_wq · c7f405d6

由 Pavel Begunkov 提交于 6月 14, 2021

io-wq keeps an array of pointers to struct io_wqe, allocate this array
as a part of struct io-wq, it's easier to code and saves an extra
indirection for nearly each io-wq call.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1482c6a001923bbed662dc38a8a580fb08b1ed8c.1623634181.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

c7f405d6

26 5月, 2021 2 次提交

io-wq: Fix UAF when wakeup wqe in hash waitqueue · 3743c172

由 Zqiang 提交于 5月 26, 2021

BUG: KASAN: use-after-free in __wake_up_common+0x637/0x650
Read of size 8 at addr ffff8880304250d8 by task iou-wrk-28796/28802

Call Trace:
 __dump_stack [inline]
 dump_stack+0x141/0x1d7
 print_address_description.constprop.0.cold+0x5b/0x2c6
 __kasan_report [inline]
 kasan_report.cold+0x7c/0xd8
 __wake_up_common+0x637/0x650
 __wake_up_common_lock+0xd0/0x130
 io_worker_handle_work+0x9dd/0x1790
 io_wqe_worker+0xb2a/0xd40
 ret_from_fork+0x1f/0x30

Allocated by task 28798:
 kzalloc_node [inline]
 io_wq_create+0x3c4/0xdd0
 io_init_wq_offload [inline]
 io_uring_alloc_task_context+0x1bf/0x6b0
 __io_uring_add_task_file+0x29a/0x3c0
 io_uring_add_task_file [inline]
 io_uring_install_fd [inline]
 io_uring_create [inline]
 io_uring_setup+0x209a/0x2bd0
 do_syscall_64+0x3a/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 28798:
 kfree+0x106/0x2c0
 io_wq_destroy+0x182/0x380
 io_wq_put [inline]
 io_wq_put_and_exit+0x7a/0xa0
 io_uring_clean_tctx [inline]
 __io_uring_cancel+0x428/0x530
 io_uring_files_cancel
 do_exit+0x299/0x2a60
 do_group_exit+0x125/0x310
 get_signal+0x47f/0x2150
 arch_do_signal_or_restart+0x2a8/0x1eb0
 handle_signal_work[inline]
 exit_to_user_mode_loop [inline]
 exit_to_user_mode_prepare+0x171/0x280
 __syscall_exit_to_user_mode_work [inline]
 syscall_exit_to_user_mode+0x19/0x60
 do_syscall_64+0x47/0xb0
 entry_SYSCALL_64_after_hwframe

There are the following scenarios, hash waitqueue is shared by
io-wq1 and io-wq2. (note: wqe is worker)

io-wq1:worker2     | locks bit1
io-wq2:worker1     | waits bit1
io-wq1:worker3     | waits bit1

io-wq1:worker2     | completes all wqe bit1 work items
io-wq1:worker2     | drop bit1, exit

io-wq2:worker1     | locks bit1
io-wq1:worker3     | can not locks bit1, waits bit1 and exit
io-wq1             | exit and free io-wq1
io-wq2:worker1     | drops bit1
io-wq1:worker3     | be waked up, even though wqe is freed

After all iou-wrk belonging to io-wq1 have exited, remove wqe
form hash waitqueue, it is guaranteed that there will be no more
wqe belonging to io-wq1 in the hash waitqueue.

Reported-by: syzbot+6cb11ade52aa17095297@syzkaller.appspotmail.com
Signed-off-by: NZqiang <qiang.zhang@windriver.com>
Link: https://lore.kernel.org/r/20210526050826.30500-1-qiang.zhang@windriver.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

3743c172

io_uring/io-wq: close io-wq full-stop gap · 17a91051

由 Pavel Begunkov 提交于 5月 23, 2021

There is an old problem with io-wq cancellation where requests should be
killed and are in io-wq but are not discoverable, e.g. in @next_hashed
or @linked vars of io_worker_handle_work(). It adds some unreliability
to individual request canellation, but also may potentially get
__io_uring_cancel() stuck. For instance:

1) An __io_uring_cancel()'s cancellation round have not found any
   request but there are some as desribed.
2) __io_uring_cancel() goes to sleep
3) Then workers wake up and try to execute those hidden requests
   that happen to be unbound.

As we already cancel all requests of io-wq there, set IO_WQ_BIT_EXIT
in advance, so preventing 3) from executing unbound requests. The
workers will initially break looping because of getting a signal as they
are threads of the dying/exec()'ing user task.

Cc: stable@vger.kernel.org
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/abfcf8c54cb9e8f7bfbad7e9a0cc5433cc70bdc2.1621781238.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

17a91051

21 4月, 2021 1 次提交

io-wq: remove unused io_wqe_need_worker() function · eb372672

由 Jens Axboe 提交于 4月 20, 2021

A previous commit removed the need for this, but overlooked that we no
longer use it at all. Get rid of it.

Fixes: 685fe7fe ("io-wq: eliminate the need for a manager thread")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

eb372672

12 4月, 2021 5 次提交

io-wq: Fix io_wq_worker_affinity() · e0051d7d

由 Peter Zijlstra 提交于 4月 08, 2021

Do not include private headers and do not frob in internals.

On top of that, while the previous code restores the affinity, it
doesn't ensure the task actually moves there if it was running,
leading to the fun situation that it can be observed running outside
of its allowed mask for potentially significant time.

Use the proper API instead.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/YG7QkiUzlEbW85TU@hirez.programming.kicks-ass.netSigned-off-by: NJens Axboe <axboe@kernel.dk>

e0051d7d

io-wq: simplify code in __io_worker_busy() · 417b5052

由 Hao Xu 提交于 4月 06, 2021

Leverage XOR to simplify the code in __io_worker_busy.
Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/1617678525-3129-1-git-send-email-haoxu@linux.alibaba.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

417b5052

io-wq: cancel task_work on exit only targeting the current 'wq' · c80ca470

由 Jens Axboe 提交于 4月 01, 2021

With using task_work_cancel(), we're potentially canceling task_work
that isn't related to this specific io_wq. Use the newly added
task_work_cancel_match() to ensure that we only remove and cancel work
items that are specific to this io_wq.

Fixes: 685fe7fe ("io-wq: eliminate the need for a manager thread")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c80ca470

io-wq: eliminate the need for a manager thread · 685fe7fe

由 Jens Axboe 提交于 3月 08, 2021

io-wq relies on a manager thread to create/fork new workers, as needed.
But there's really no strong need for it anymore. We have the following
cases that fork a new worker:

1) Work queue. This is done from the task itself always, and it's trivial
to create a worker off that path, if needed.

2) All workers have gone to sleep, and we have more work. This is called
off the sched out path. For this case, use a task_work items to queue
a fork-worker operation.

3) Hashed work completion. Don't think we need to do anything off this
case. If need be, it could just use approach 2 as well.

Part of this change is incrementing the running worker count before the
fork, to avoid cases where we observe we need a worker and then queue
creation of one. Then new work comes in, we fork a new one. That last
queue operation should have waited for the previous worker to come up,
it's quite possible we don't even need it. Hence move the worker running
from before we fork it off to more efficiently handle that case.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

685fe7fe

io-wq: refactor *_get_acct() · 8418f22a

由 Pavel Begunkov 提交于 3月 22, 2021

Extract a helper for io_work_get_acct() and io_wqe_get_acct() to avoid
duplication.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8418f22a

09 4月, 2021 1 次提交

io-wq: cancel unbounded works on io-wq destroy · c60eb049

由 Pavel Begunkov 提交于 4月 08, 2021

WARNING: CPU: 5 PID: 227 at fs/io_uring.c:8578 io_ring_exit_work+0xe6/0x470
RIP: 0010:io_ring_exit_work+0xe6/0x470
Call Trace:
 process_one_work+0x206/0x400
 worker_thread+0x4a/0x3d0
 kthread+0x129/0x170
 ret_from_fork+0x22/0x30

INFO: task lfs-openat:2359 blocked for more than 245 seconds.
task:lfs-openat      state:D stack:    0 pid: 2359 ppid:     1 flags:0x00000004
Call Trace:
 ...
 wait_for_completion+0x8b/0xf0
 io_wq_destroy_manager+0x24/0x60
 io_wq_put_and_exit+0x18/0x30
 io_uring_clean_tctx+0x76/0xa0
 __io_uring_files_cancel+0x1b9/0x2e0
 do_exit+0xc0/0xb40
 ...

Even after io-wq destroy has been issued io-wq worker threads will
continue executing all left work items as usual, and may hang waiting
for I/O that won't ever complete (aka unbounded).

[<0>] pipe_read+0x306/0x450
[<0>] io_iter_do_read+0x1e/0x40
[<0>] io_read+0xd5/0x330
[<0>] io_issue_sqe+0xd21/0x18a0
[<0>] io_wq_submit_work+0x6c/0x140
[<0>] io_worker_handle_work+0x17d/0x400
[<0>] io_wqe_worker+0x2c0/0x330
[<0>] ret_from_fork+0x22/0x30

Cancel all unbounded I/O instead of executing them. This changes the
user visible behaviour, but that's inevitable as io-wq is not per task.
Suggested-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cd4b543154154cba055cf86f351441c2174d7f71.1617842918.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

c60eb049

01 4月, 2021 1 次提交

io_uring/io-wq: protect against sprintf overflow · 696ee88a

由 Pavel Begunkov 提交于 4月 01, 2021

task_pid may be large enough to not fit into the left space of
TASK_COMM_LEN-sized buffers and overflow in sprintf. We not so care
about uniqueness, so replace it with safer snprintf().
Reported-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1702c6145d7e1c46fbc382f28334c02e1a3d3994.1617267273.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

696ee88a

28 3月, 2021 1 次提交

io_uring: handle signals for IO threads like a normal thread · dbe1bdbb

由 Jens Axboe 提交于 3月 25, 2021

We go through various hoops to disallow signals for the IO threads, but
there's really no reason why we cannot just allow them. The IO threads
never return to userspace like a normal thread, and hence don't go through
normal signal processing. Instead, just check for a pending signal as part
of the work loop, and call get_signal() to handle it for us if anything
is pending.

With that, we can support receiving signals, including special ones like
SIGSTOP.
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

dbe1bdbb

26 3月, 2021 1 次提交

io-wq: fix race around pending work on teardown · f5d2d23b

由 Jens Axboe 提交于 3月 25, 2021

syzbot reports that it's triggering the warning condition on having
pending work on shutdown:

WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_destroy fs/io-wq.c:1061 [inline]
WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_put+0x153/0x260 fs/io-wq.c:1072
Modules linked in:
CPU: 1 PID: 12346 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:io_wq_destroy fs/io-wq.c:1061 [inline]
RIP: 0010:io_wq_put+0x153/0x260 fs/io-wq.c:1072
Code: 8d e8 71 90 ea 01 49 89 c4 41 83 fc 40 7d 4f e8 33 4d 97 ff 42 80 7c 2d 00 00 0f 85 77 ff ff ff e9 7a ff ff ff e8 1d 4d 97 ff <0f> 0b eb b9 8d 6b ff 89 ee 09 de bf ff ff ff ff e8 18 51 97 ff 09
RSP: 0018:ffffc90001ebfb08 EFLAGS: 00010293
RAX: ffffffff81e16083 RBX: ffff888019038040 RCX: ffff88801e86b780
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000040
RBP: 1ffff1100b2f8a80 R08: ffffffff81e15fce R09: ffffed100b2f8a82
R10: ffffed100b2f8a82 R11: 0000000000000000 R12: 0000000000000000
R13: dffffc0000000000 R14: ffff8880597c5400 R15: ffff888019038000
FS:  00007f8dcd89c700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055e9a054e160 CR3: 000000001dfb8000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 io_uring_clean_tctx+0x1b7/0x210 fs/io_uring.c:8802
 __io_uring_files_cancel+0x13c/0x170 fs/io_uring.c:8820
 io_uring_files_cancel include/linux/io_uring.h:47 [inline]
 do_exit+0x258/0x2340 kernel/exit.c:780
 do_group_exit+0x168/0x2d0 kernel/exit.c:922
 get_signal+0x1734/0x1ef0 kernel/signal.c:2773
 arch_do_signal_or_restart+0x3c/0x610 arch/x86/kernel/signal.c:811
 handle_signal_work kernel/entry/common.c:147 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:208
 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
 syscall_exit_to_user_mode+0x48/0x180 kernel/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x465f69

which shouldn't happen, but seems to be possible due to a race on whether
or not the io-wq manager sees a fatal signal first, or whether the io-wq
workers do. If we race with queueing work and then send a fatal signal to
the owning task, and the io-wq worker sees that before the manager sets
IO_WQ_BIT_EXIT, then it's possible to have the worker exit and leave work
behind.

Just turn the WARN_ON_ONCE() into a cancelation condition instead.

Reported-by: syzbot+77a738a6bc947bf639ca@syzkaller.appspotmail.com
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f5d2d23b

22 3月, 2021 1 次提交

io_uring: don't use {test,clear}_tsk_thread_flag() for current · 0b8cfa97

由 Jens Axboe 提交于 3月 21, 2021

Linus correctly points out that this is both unnecessary and generates
much worse code on some archs as going from current to thread_info is
actually backwards - and obviously just wasteful, since the thread_info
is what we care about.

Since io_uring only operates on current for these operations, just use
test_thread_flag() instead. For io-wq, we can further simplify and use
tracehook_notify_signal() to handle the TIF_NOTIFY_SIGNAL work and clear
the flag. The latter isn't an actual bug right now, but it may very well
be in the future if we place other work items under TIF_NOTIFY_SIGNAL.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/io-uring/CAHk-=wgYhNck33YHKZ14mFB5MzTTk8gqXHcfj=RWTAXKwgQJgg@mail.gmail.com/Signed-off-by: NJens Axboe <axboe@kernel.dk>

0b8cfa97

21 3月, 2021 1 次提交

io-wq: ensure task is running before processing task_work · 00ddff43

由 Jens Axboe 提交于 3月 21, 2021

Mark the current task as running if we need to run task_work from the
io-wq threads as part of work handling. If that is the case, then return
as such so that the caller can appropriately loop back and reset if it
was part of a going-to-sleep flush.

Fixes: 3bfe6106 ("io-wq: fork worker threads from original task")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

00ddff43

13 3月, 2021 1 次提交

io_uring: allow IO worker threads to be frozen · 16efa4fc

由 Jens Axboe 提交于 3月 12, 2021

With the freezer using the proper signaling to notify us of when it's
time to freeze a thread, we can re-enable normal freezer usage for the
IO threads. Ensure that SQPOLL, io-wq, and the io-wq manager call
try_to_freeze() appropriately, and remove the default setting of
PF_NOFREEZE from create_io_thread().
Signed-off-by: NJens Axboe <axboe@kernel.dk>

16efa4fc

10 3月, 2021 3 次提交

kernel: make IO threads unfreezable by default · e22bc9b4

由 Jens Axboe 提交于 3月 09, 2021

The io-wq threads were already marked as no-freeze, but the manager was
not. On resume, we perpetually have signal_pending() being true, and
hence the manager will loop and spin 100% of the time.

Just mark the tasks created by create_io_thread() as PF_NOFREEZE by
default, and remove any knowledge of it in io-wq and io_uring.
Reported-by: NKevin Locke <kevin@kevinlocke.name>
Tested-by: NKevin Locke <kevin@kevinlocke.name>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e22bc9b4

io-wq: fix ref leak for req in case of exit cancelations · 70e35125

由 yangerkun 提交于 3月 09, 2021

do_work such as io_wq_submit_work that cancel the work may leave a ref of
req as 1 if we have links. Fix it by call io_run_cancel.

Fixes: 4fb6ac32 ("io-wq: improve manager/worker handling over exec")
Signed-off-by: Nyangerkun <yangerkun@huawei.com>
Link: https://lore.kernel.org/r/20210309030410.3294078-1-yangerkun@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

70e35125

io-wq: remove unused 'user' member of io_wq · cc20e3fe

由 Jens Axboe 提交于 3月 08, 2021

Previous patches killed the last user of this, now it's just a dead member
in the struct. Get rid of it.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cc20e3fe

08 3月, 2021 1 次提交

io-wq: warn on creating manager while exiting · 678eeba4

由 Pavel Begunkov 提交于 3月 06, 2021

Add a simple warning making sure that nobody tries to create a new
manager while we're under IO_WQ_BIT_EXIT. That can potentially happen
due to racy work submission after final put.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

678eeba4

07 3月, 2021 1 次提交

io-wq: fix race in freeing 'wq' and worker access · 886d0137

由 Jens Axboe 提交于 3月 05, 2021

Ran into a use-after-free on the main io-wq struct, wq. It has a worker
ref and completion event, but the manager itself isn't holding a
reference. This can lead to a race where the manager thinks there are
no workers and exits, but a worker is being added. That leads to the
following trace:

BUG: KASAN: use-after-free in io_wqe_worker+0x4c0/0x5e0
Read of size 8 at addr ffff888108baa8a0 by task iou-wrk-3080422/3080425

CPU: 5 PID: 3080425 Comm: iou-wrk-3080422 Not tainted 5.12.0-rc1+ #110
Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO 10G (MS-7C60), BIOS 1.60 05/13/2020
Call Trace:
 dump_stack+0x90/0xbe
 print_address_description.constprop.0+0x67/0x28d
 ? io_wqe_worker+0x4c0/0x5e0
 kasan_report.cold+0x7b/0xd4
 ? io_wqe_worker+0x4c0/0x5e0
 __asan_load8+0x6d/0xa0
 io_wqe_worker+0x4c0/0x5e0
 ? io_worker_handle_work+0xc00/0xc00
 ? recalc_sigpending+0xe5/0x120
 ? io_worker_handle_work+0xc00/0xc00
 ? io_worker_handle_work+0xc00/0xc00
 ret_from_fork+0x1f/0x30

Allocated by task 3080422:
 kasan_save_stack+0x23/0x60
 __kasan_kmalloc+0x80/0xa0
 kmem_cache_alloc_node_trace+0xa0/0x480
 io_wq_create+0x3b5/0x600
 io_uring_alloc_task_context+0x13c/0x380
 io_uring_add_task_file+0x109/0x140
 __x64_sys_io_uring_enter+0x45f/0x660
 do_syscall_64+0x32/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 3080422:
 kasan_save_stack+0x23/0x60
 kasan_set_track+0x20/0x40
 kasan_set_free_info+0x24/0x40
 __kasan_slab_free+0xe8/0x120
 kfree+0xa8/0x400
 io_wq_put+0x14a/0x220
 io_wq_put_and_exit+0x9a/0xc0
 io_uring_clean_tctx+0x101/0x140
 __io_uring_files_cancel+0x36e/0x3c0
 do_exit+0x169/0x1340
 __x64_sys_exit+0x34/0x40
 do_syscall_64+0x32/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Have the manager itself hold a reference, and now both drop points drop
and complete if we hit zero, and the manager can unconditionally do a
wait_for_completion() instead of having a race between reading the ref
count and waiting if it was non-zero.

Fixes: fb3a1f6c ("io-wq: have manager wait for all workers to exit")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

886d0137

05 3月, 2021 2 次提交

io-wq: kill hashed waitqueue before manager exits · 09ca6c40

由 Jens Axboe 提交于 3月 05, 2021

If we race with shutting down the io-wq context and someone queueing
a hashed entry, then we can exit the manager with it armed. If it then
triggers after the manager has exited, we can have a use-after-free where
io_wqe_hash_wake() attempts to wake a now gone manager process.

Move the killing of the hashed write queue into the manager itself, so
that we know we've killed it before the task exits.

Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

09ca6c40

io_uring: move to using create_io_thread() · 46fe18b1

由 Jens Axboe 提交于 3月 04, 2021

This allows us to do task creation and setup without needing to use
completions to try and synchronize with the starting thread. Get rid of
the old io_wq_fork_thread() wrapper, and the 'wq' and 'worker' startup
completion events - we can now do setup before the task is running.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

46fe18b1

04 3月, 2021 10 次提交

io-wq: ensure all pending work is canceled on exit · f0127254

由 Jens Axboe 提交于 3月 03, 2021

If we race on shutting down the io-wq, then we should ensure that any
work that was queued after workers shutdown is canceled. Harden the
add work check a bit too, checking for IO_WQ_BIT_EXIT and cancel if
it's set.

Add a WARN_ON() for having any work before we kill the io-wq context.

Reported-by: syzbot+91b4b56ead187d35c9d3@syzkaller.appspotmail.com
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f0127254

io_uring: ensure that threads freeze on suspend · e4b4a13f

由 Jens Axboe 提交于 3月 01, 2021

Alex reports that his system fails to suspend using 5.12-rc1, with the
following dump:

[  240.650300] PM: suspend entry (deep)
[  240.650748] Filesystems sync: 0.000 seconds
[  240.725605] Freezing user space processes ...
[  260.739483] Freezing of tasks failed after 20.013 seconds (3 tasks refusing to freeze, wq_busy=0):
[  260.739497] task:iou-mgr-446     state:S stack:    0 pid:  516 ppid:   439 flags:0x00004224
[  260.739504] Call Trace:
[  260.739507]  ? sysvec_apic_timer_interrupt+0xb/0x81
[  260.739515]  ? pick_next_task_fair+0x197/0x1cde
[  260.739519]  ? sysvec_reschedule_ipi+0x2f/0x6a
[  260.739522]  ? asm_sysvec_reschedule_ipi+0x12/0x20
[  260.739525]  ? __schedule+0x57/0x6d6
[  260.739529]  ? del_timer_sync+0xb9/0x115
[  260.739533]  ? schedule+0x63/0xd5
[  260.739536]  ? schedule_timeout+0x219/0x356
[  260.739540]  ? __next_timer_interrupt+0xf1/0xf1
[  260.739544]  ? io_wq_manager+0x73/0xb1
[  260.739549]  ? io_wq_create+0x262/0x262
[  260.739553]  ? ret_from_fork+0x22/0x30
[  260.739557] task:iou-mgr-517     state:S stack:    0 pid:  522 ppid:   439 flags:0x00004224
[  260.739561] Call Trace:
[  260.739563]  ? sysvec_apic_timer_interrupt+0xb/0x81
[  260.739566]  ? pick_next_task_fair+0x16f/0x1cde
[  260.739569]  ? sysvec_apic_timer_interrupt+0xb/0x81
[  260.739571]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[  260.739574]  ? __schedule+0x5b7/0x6d6
[  260.739578]  ? del_timer_sync+0x70/0x115
[  260.739581]  ? schedule_timeout+0x211/0x356
[  260.739585]  ? __next_timer_interrupt+0xf1/0xf1
[  260.739588]  ? io_wq_check_workers+0x15/0x11f
[  260.739592]  ? io_wq_manager+0x69/0xb1
[  260.739596]  ? io_wq_create+0x262/0x262
[  260.739600]  ? ret_from_fork+0x22/0x30
[  260.739603] task:iou-wrk-517     state:S stack:    0 pid:  523 ppid:   439 flags:0x00004224
[  260.739607] Call Trace:
[  260.739609]  ? __schedule+0x5b7/0x6d6
[  260.739614]  ? schedule+0x63/0xd5
[  260.739617]  ? schedule_timeout+0x219/0x356
[  260.739621]  ? __next_timer_interrupt+0xf1/0xf1
[  260.739624]  ? task_thread.isra.0+0x148/0x3af
[  260.739628]  ? task_thread_unbound+0xa/0xa
[  260.739632]  ? task_thread_bound+0x7/0x7
[  260.739636]  ? ret_from_fork+0x22/0x30
[  260.739647] OOM killer enabled.
[  260.739648] Restarting tasks ... done.
[  260.740077] PM: suspend exit

Play nice and ensure that any thread we create will call try_to_freeze()
at an opportune time so that memory suspend can proceed. For the io-wq
worker threads, mark them as PF_NOFREEZE. They could potentially be
blocked for a long time.
Reported-by: NAlex Xu (Hello71) <alex_y_xu@yahoo.ca>
Tested-by: NAlex Xu (Hello71) <alex_y_xu@yahoo.ca>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e4b4a13f

io-wq: fix error path leak of buffered write hash map · dc7bbc9e

由 Jens Axboe 提交于 3月 01, 2021

The 'err' path should include the hash put, we already grabbed a reference
once we get that far.

Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

dc7bbc9e

io_uring: move cred assignment into io_issue_sqe() · 5730b27e

由 Jens Axboe 提交于 2月 27, 2021

If we move it in there, then we no longer have to care about it in io-wq.
This means we can drop the cred handling in io-wq, and we can drop the
REQ_F_WORK_INITIALIZED flag and async init functions as that was the last
user of it since we moved to the new workers. Then we can also drop
io_wq_work->creds, and just hold the personality u16 in there instead.
Suggested-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5730b27e

io-wq: provide an io_wq_put_and_exit() helper · afcc4015

由 Jens Axboe 提交于 2月 26, 2021

If we put the io-wq from io_uring, we really want it to exit. Provide
a helper that does that for us. Couple that with not having the manager
hold a reference to the 'wq' and the normal SQPOLL exit will tear down
the io-wq context appropriate.

On the io-wq side, our wq context is per task, so only the task itself
is manipulating ->manager and hence it's safe to check and clear without
any extra locking. We just need to ensure that the manager task stays
around, in case it exits.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

afcc4015

io-wq: fix double put of 'wq' in error path · 470ec4ed

由 Jens Axboe 提交于 2月 26, 2021

We are already freeing the wq struct in both spots, so don't put it and
get it freed twice.

Reported-by: syzbot+7bf785eedca35ca05501@syzkaller.appspotmail.com
Fixes: 4fb6ac32 ("io-wq: improve manager/worker handling over exec")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

470ec4ed

io-wq: wait for manager exit on wq destroy · d364d9e5

由 Jens Axboe 提交于 2月 26, 2021

The manager waits for the workers, hence the manager is always valid if
workers are running. Now also have wq destroy wait for the manager on
exit, so we now everything is gone.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d364d9e5

io-wq: rename wq->done completion to wq->started · dbf99620

由 Jens Axboe 提交于 2月 26, 2021

This is a leftover from a different use cases, it's used to wait for
the manager to startup. Rename it as such.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

dbf99620

io-wq: don't ask for a new worker if we're exiting · 613eeb60

由 Jens Axboe 提交于 2月 26, 2021

If we're in the process of shutting down the async context, then don't
create new workers if we already have at least the fixed one.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

613eeb60

io-wq: have manager wait for all workers to exit · fb3a1f6c

由 Jens Axboe 提交于 2月 26, 2021

Instead of having to wait separately on workers and manager, just have
the manager wait on the workers. We use an atomic_t for the reference
here, as we need to start at 0 and allow increment from that. Since the
number of workers is naturally capped by the allowed nr of processes,
and that uses an int, there is no risk of overflow.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fb3a1f6c

02 3月, 2021 1 次提交

io-wq: wait for worker startup when forking a new one · 65d43023

由 Jens Axboe 提交于 2月 26, 2021

We need to have our worker count updated before continuing, to avoid
cases where we repeatedly think we need a new worker, but a fork is
already in progress.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

65d43023

26 2月, 2021 3 次提交

io-wq: remove now unused IO_WQ_BIT_ERROR · d6ce7f67

由 Jens Axboe 提交于 2月 25, 2021

This flag is now dead, remove it.

Fixes: 1cbd9c2b ("io-wq: don't create any IO workers upfront")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d6ce7f67

io-wq: improve manager/worker handling over exec · 4fb6ac32

由 Jens Axboe 提交于 2月 25, 2021

exec will cancel any threads, including the ones that io-wq is using. This
isn't a problem, in fact we'd prefer it to be that way since it means we
know that any async work cancels naturally without having to handle it
proactively.

But it does mean that we need to setup a new manager, as the manager and
workers are gone. Handle this at queue time, and cancel work if we fail.
Since the manager can go away without us noticing, ensure that the manager
itself holds a reference to the 'wq' as well. Rename io_wq_destroy() to
io_wq_put() to reflect that.

In the future we can now simplify exec cancelation handling, for now just
leave it the same.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4fb6ac32

io-wq: make buffered file write hashed work map per-ctx · e941894e

由 Jens Axboe 提交于 2月 19, 2021

Before the io-wq thread change, we maintained a hash work map and lock
per-node per-ring. That wasn't ideal, as we really wanted it to be per
ring. But now that we have per-task workers, the hash map ends up being
just per-task. That'll work just fine for the normal case of having
one task use a ring, but if you share the ring between tasks, then it's
considerably worse than it was before.

Make the hash map per ctx instead, which provides full per-ctx buffered
write serialization on hashed writes.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e941894e

24 2月, 2021 2 次提交

io-wq: fix race around io_worker grabbing · eb2de941

由 Jens Axboe 提交于 2月 23, 2021

There's a small window between lookup dropping the reference to the
worker and calling wake_up_process() on the worker task, where the worker
itself could have exited. We ensure that the worker struct itself is
valid, but worker->task may very well be gone by the time we issue the
wakeup.

Fix the race by using a completion triggered by the reference going to
zero, and having exit wait for that completion before proceeding.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

eb2de941

io-wq: fix races around manager/worker creation and task exit · 8b3e78b5

由 Jens Axboe 提交于 2月 23, 2021

These races have always been there, they are just more apparent now that
we do early cancel of io-wq when the task exits.

Ensure that the io-wq manager sets task state correctly to not miss
wakeups for task creation. This is important if we get a wakeup after
having marked ourselves as TASK_INTERRUPTIBLE. If we do end up creating
workers, then we flip the state back to running, making the subsequent
schedule() a no-op. Also increment the wq ref count before forking the
thread, to avoid a use-after-free.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8b3e78b5

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功