Commits · 1b00764f09b6912d25e188d972a7764a457926ba · openeuler / Kernel

07 Mar, 2021 1 commit

io-wq: fix race in freeing 'wq' and worker access · 886d0137

Committed by Jens Axboe on Mar 05, 2021

Ran into a use-after-free on the main io-wq struct, wq. It has a worker
ref and completion event, but the manager itself isn't holding a
reference. This can lead to a race where the manager thinks there are
no workers and exits, but a worker is being added. That leads to the
following trace:

BUG: KASAN: use-after-free in io_wqe_worker+0x4c0/0x5e0
Read of size 8 at addr ffff888108baa8a0 by task iou-wrk-3080422/3080425

CPU: 5 PID: 3080425 Comm: iou-wrk-3080422 Not tainted 5.12.0-rc1+ #110
Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO 10G (MS-7C60), BIOS 1.60 05/13/2020
Call Trace:
 dump_stack+0x90/0xbe
 print_address_description.constprop.0+0x67/0x28d
 ? io_wqe_worker+0x4c0/0x5e0
 kasan_report.cold+0x7b/0xd4
 ? io_wqe_worker+0x4c0/0x5e0
 __asan_load8+0x6d/0xa0
 io_wqe_worker+0x4c0/0x5e0
 ? io_worker_handle_work+0xc00/0xc00
 ? recalc_sigpending+0xe5/0x120
 ? io_worker_handle_work+0xc00/0xc00
 ? io_worker_handle_work+0xc00/0xc00
 ret_from_fork+0x1f/0x30

Allocated by task 3080422:
 kasan_save_stack+0x23/0x60
 __kasan_kmalloc+0x80/0xa0
 kmem_cache_alloc_node_trace+0xa0/0x480
 io_wq_create+0x3b5/0x600
 io_uring_alloc_task_context+0x13c/0x380
 io_uring_add_task_file+0x109/0x140
 __x64_sys_io_uring_enter+0x45f/0x660
 do_syscall_64+0x32/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 3080422:
 kasan_save_stack+0x23/0x60
 kasan_set_track+0x20/0x40
 kasan_set_free_info+0x24/0x40
 __kasan_slab_free+0xe8/0x120
 kfree+0xa8/0x400
 io_wq_put+0x14a/0x220
 io_wq_put_and_exit+0x9a/0xc0
 io_uring_clean_tctx+0x101/0x140
 __io_uring_files_cancel+0x36e/0x3c0
 do_exit+0x169/0x1340
 __x64_sys_exit+0x34/0x40
 do_syscall_64+0x32/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Have the manager itself hold a reference, and now both drop points drop
and complete if we hit zero, and the manager can unconditionally do a
wait_for_completion() instead of having a race between reading the ref
count and waiting if it was non-zero.

Fixes: fb3a1f6c ("io-wq: have manager wait for all workers to exit")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

886d0137
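The reference scheme described in the 886d0137 message can be modeled in userspace roughly as follows. This is a minimal sketch under stated assumptions, not the kernel code: `struct wq_model`, its fields, and the `wq_model_*` helpers are hypothetical names, C11 atomics stand in for the kernel's atomic_t, and a boolean flag stands in for the completion that the manager would block on with wait_for_completion(). Taking the manager's reference at init time is also an assumption of the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the lifetime state of 'wq'. */
struct wq_model {
    atomic_int worker_refs;   /* one ref per worker, plus one for the manager */
    atomic_bool worker_done;  /* stands in for the kernel completion */
};

static void wq_model_init(struct wq_model *wq)
{
    /* Assumption: the manager's own reference is taken up front. */
    atomic_init(&wq->worker_refs, 1);
    atomic_init(&wq->worker_done, false);
}

/* Called when a worker is added. */
static void wq_model_get(struct wq_model *wq)
{
    atomic_fetch_add(&wq->worker_refs, 1);
}

/*
 * Single drop point shared by workers and the manager: whoever drops
 * the last reference signals completion.
 */
static void wq_model_put(struct wq_model *wq)
{
    int old = atomic_fetch_sub(&wq->worker_refs, 1);

    assert(old > 0);
    if (old == 1)
        atomic_store(&wq->worker_done, true);
}

/* The waiter checks this flag instead of reading the ref count itself. */
static bool wq_model_done(struct wq_model *wq)
{
    return atomic_load(&wq->worker_done);
}
```

Because the manager holds a reference of its own, the count cannot reach zero while the manager is still running, so a worker being added concurrently never observes a freed structure in this model.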

05 Mar, 2021 2 commits

io-wq: kill hashed waitqueue before manager exits · 09ca6c40

Committed by Jens Axboe on Mar 05, 2021

If we race with shutting down the io-wq context and someone queueing
a hashed entry, then we can exit the manager with it armed. If it then
triggers after the manager has exited, we can have a use-after-free where
io_wqe_hash_wake() attempts to wake a now gone manager process.

Move the killing of the hashed waitqueue into the manager itself, so
that we know we've killed it before the task exits.

Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

09ca6c40

io_uring: move to using create_io_thread() · 46fe18b1

Committed by Jens Axboe on Mar 04, 2021

This allows us to do task creation and setup without needing to use
completions to try and synchronize with the starting thread. Get rid of
the old io_wq_fork_thread() wrapper, and the 'wq' and 'worker' startup
completion events - we can now do setup before the task is running.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

46fe18b1

04 Mar, 2021 10 commits

io-wq: ensure all pending work is canceled on exit · f0127254

Committed by Jens Axboe on Mar 03, 2021

If we race on shutting down the io-wq, then we should ensure that any
work that was queued after workers shutdown is canceled. Harden the
add work check a bit too, checking for IO_WQ_BIT_EXIT and cancel if
it's set.

Add a WARN_ON() for having any work before we kill the io-wq context.

Reported-by: syzbot+91b4b56ead187d35c9d3@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f0127254
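The hardened add-work check from f0127254 can be illustrated with a small userspace model. This is a sketch only: `WQ_MODEL_BIT_EXIT`, `struct work_model`, `struct queue_model`, and both functions are hypothetical names, and a plain singly linked list stands in for the real io-wq work list (locking is omitted).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the IO_WQ_BIT_EXIT state bit. */
enum { WQ_MODEL_BIT_EXIT = 1 };

struct work_model {
    struct work_model *next;
    bool cancelled;
};

struct queue_model {
    unsigned int flags;
    struct work_model *head;
};

/* Refuse and cancel new work once the exit bit is set. */
static bool queue_model_enqueue(struct queue_model *q, struct work_model *w)
{
    if (q->flags & WQ_MODEL_BIT_EXIT) {
        w->cancelled = true;
        return false;
    }
    w->next = q->head;
    q->head = w;
    return true;
}

/* Set the exit bit, then cancel whatever is still queued. */
static size_t queue_model_shutdown(struct queue_model *q)
{
    size_t cancelled = 0;

    q->flags |= WQ_MODEL_BIT_EXIT;
    while (q->head) {
        struct work_model *w = q->head;

        q->head = w->next;
        w->next = NULL;
        w->cancelled = true;
        cancelled++;
    }
    return cancelled;
}
```

After shutdown the list is empty, which corresponds to the condition the commit's WARN_ON() is described as checking before the context is killed.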

io_uring: ensure that threads freeze on suspend · e4b4a13f

Committed by Jens Axboe on Mar 01, 2021

Alex reports that his system fails to suspend using 5.12-rc1, with the
following dump:

[  240.650300] PM: suspend entry (deep)
[  240.650748] Filesystems sync: 0.000 seconds
[  240.725605] Freezing user space processes ...
[  260.739483] Freezing of tasks failed after 20.013 seconds (3 tasks refusing to freeze, wq_busy=0):
[  260.739497] task:iou-mgr-446     state:S stack:    0 pid:  516 ppid:   439 flags:0x00004224
[  260.739504] Call Trace:
[  260.739507]  ? sysvec_apic_timer_interrupt+0xb/0x81
[  260.739515]  ? pick_next_task_fair+0x197/0x1cde
[  260.739519]  ? sysvec_reschedule_ipi+0x2f/0x6a
[  260.739522]  ? asm_sysvec_reschedule_ipi+0x12/0x20
[  260.739525]  ? __schedule+0x57/0x6d6
[  260.739529]  ? del_timer_sync+0xb9/0x115
[  260.739533]  ? schedule+0x63/0xd5
[  260.739536]  ? schedule_timeout+0x219/0x356
[  260.739540]  ? __next_timer_interrupt+0xf1/0xf1
[  260.739544]  ? io_wq_manager+0x73/0xb1
[  260.739549]  ? io_wq_create+0x262/0x262
[  260.739553]  ? ret_from_fork+0x22/0x30
[  260.739557] task:iou-mgr-517     state:S stack:    0 pid:  522 ppid:   439 flags:0x00004224
[  260.739561] Call Trace:
[  260.739563]  ? sysvec_apic_timer_interrupt+0xb/0x81
[  260.739566]  ? pick_next_task_fair+0x16f/0x1cde
[  260.739569]  ? sysvec_apic_timer_interrupt+0xb/0x81
[  260.739571]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[  260.739574]  ? __schedule+0x5b7/0x6d6
[  260.739578]  ? del_timer_sync+0x70/0x115
[  260.739581]  ? schedule_timeout+0x211/0x356
[  260.739585]  ? __next_timer_interrupt+0xf1/0xf1
[  260.739588]  ? io_wq_check_workers+0x15/0x11f
[  260.739592]  ? io_wq_manager+0x69/0xb1
[  260.739596]  ? io_wq_create+0x262/0x262
[  260.739600]  ? ret_from_fork+0x22/0x30
[  260.739603] task:iou-wrk-517     state:S stack:    0 pid:  523 ppid:   439 flags:0x00004224
[  260.739607] Call Trace:
[  260.739609]  ? __schedule+0x5b7/0x6d6
[  260.739614]  ? schedule+0x63/0xd5
[  260.739617]  ? schedule_timeout+0x219/0x356
[  260.739621]  ? __next_timer_interrupt+0xf1/0xf1
[  260.739624]  ? task_thread.isra.0+0x148/0x3af
[  260.739628]  ? task_thread_unbound+0xa/0xa
[  260.739632]  ? task_thread_bound+0x7/0x7
[  260.739636]  ? ret_from_fork+0x22/0x30
[  260.739647] OOM killer enabled.
[  260.739648] Restarting tasks ... done.
[  260.740077] PM: suspend exit

Play nice and ensure that any thread we create will call try_to_freeze()
at an opportune time so that memory suspend can proceed. For the io-wq
worker threads, mark them as PF_NOFREEZE. They could potentially be
blocked for a long time.
Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e4b4a13f

io-wq: fix error path leak of buffered write hash map · dc7bbc9e

Committed by Jens Axboe on Mar 01, 2021

The 'err' path should include the hash put, we already grabbed a reference
once we get that far.

Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dc7bbc9e

io_uring: move cred assignment into io_issue_sqe() · 5730b27e

Committed by Jens Axboe on Feb 27, 2021

If we move it in there, then we no longer have to care about it in io-wq.
This means we can drop the cred handling in io-wq, and we can drop the
REQ_F_WORK_INITIALIZED flag and async init functions as that was the last
user of it since we moved to the new workers. Then we can also drop
io_wq_work->creds, and just hold the personality u16 in there instead.
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5730b27e

io-wq: provide an io_wq_put_and_exit() helper · afcc4015

Committed by Jens Axboe on Feb 26, 2021

If we put the io-wq from io_uring, we really want it to exit. Provide
a helper that does that for us. Couple that with not having the manager
hold a reference to the 'wq' and the normal SQPOLL exit will tear down
the io-wq context appropriately.

On the io-wq side, our wq context is per task, so only the task itself
is manipulating ->manager and hence it's safe to check and clear without
any extra locking. We just need to ensure that the manager task stays
around, in case it exits.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

afcc4015

io-wq: fix double put of 'wq' in error path · 470ec4ed

Committed by Jens Axboe on Feb 26, 2021

We are already freeing the wq struct in both spots, so don't put it and
get it freed twice.

Reported-by: syzbot+7bf785eedca35ca05501@syzkaller.appspotmail.com
Fixes: 4fb6ac32 ("io-wq: improve manager/worker handling over exec")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

470ec4ed

io-wq: wait for manager exit on wq destroy · d364d9e5

Committed by Jens Axboe on Feb 26, 2021

The manager waits for the workers, hence the manager is always valid if
workers are running. Now also have wq destroy wait for the manager on
exit, so we know everything is gone.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d364d9e5

io-wq: rename wq->done completion to wq->started · dbf99620

Committed by Jens Axboe on Feb 26, 2021

This is a leftover from a different use case, it's used to wait for
the manager to startup. Rename it as such.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dbf99620

io-wq: don't ask for a new worker if we're exiting · 613eeb60

Committed by Jens Axboe on Feb 26, 2021

If we're in the process of shutting down the async context, then don't
create new workers if we already have at least the fixed one.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

613eeb60

io-wq: have manager wait for all workers to exit · fb3a1f6c

Committed by Jens Axboe on Feb 26, 2021

Instead of having to wait separately on workers and manager, just have
the manager wait on the workers. We use an atomic_t for the reference
here, as we need to start at 0 and allow increment from that. Since the
number of workers is naturally capped by the allowed nr of processes,
and that uses an int, there is no risk of overflow.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fb3a1f6c

02 Mar, 2021 1 commit

io-wq: wait for worker startup when forking a new one · 65d43023

Committed by Jens Axboe on Feb 26, 2021

We need to have our worker count updated before continuing, to avoid
cases where we repeatedly think we need a new worker, but a fork is
already in progress.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

65d43023

26 Feb, 2021 3 commits

io-wq: remove now unused IO_WQ_BIT_ERROR · d6ce7f67

Committed by Jens Axboe on Feb 25, 2021

This flag is now dead, remove it.

Fixes: 1cbd9c2b ("io-wq: don't create any IO workers upfront")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d6ce7f67

io-wq: improve manager/worker handling over exec · 4fb6ac32

Committed by Jens Axboe on Feb 25, 2021

exec will cancel any threads, including the ones that io-wq is using. This
isn't a problem, in fact we'd prefer it to be that way since it means we
know that any async work cancels naturally without having to handle it
proactively.

But it does mean that we need to setup a new manager, as the manager and
workers are gone. Handle this at queue time, and cancel work if we fail.
Since the manager can go away without us noticing, ensure that the manager
itself holds a reference to the 'wq' as well. Rename io_wq_destroy() to
io_wq_put() to reflect that.

In the future we can now simplify exec cancelation handling, for now just
leave it the same.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4fb6ac32

io-wq: make buffered file write hashed work map per-ctx · e941894e

Committed by Jens Axboe on Feb 19, 2021

Before the io-wq thread change, we maintained a hash work map and lock
per-node per-ring. That wasn't ideal, as we really wanted it to be per
ring. But now that we have per-task workers, the hash map ends up being
just per-task. That'll work just fine for the normal case of having
one task use a ring, but if you share the ring between tasks, then it's
considerably worse than it was before.

Make the hash map per ctx instead, which provides full per-ctx buffered
write serialization on hashed writes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e941894e

24 Feb, 2021 3 commits

io-wq: fix race around io_worker grabbing · eb2de941

Committed by Jens Axboe on Feb 23, 2021

There's a small window between lookup dropping the reference to the
worker and calling wake_up_process() on the worker task, where the worker
itself could have exited. We ensure that the worker struct itself is
valid, but worker->task may very well be gone by the time we issue the
wakeup.

Fix the race by using a completion triggered by the reference going to
zero, and having exit wait for that completion before proceeding.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

eb2de941

io-wq: fix races around manager/worker creation and task exit · 8b3e78b5

Committed by Jens Axboe on Feb 23, 2021

These races have always been there, they are just more apparent now that
we do early cancel of io-wq when the task exits.

Ensure that the io-wq manager sets task state correctly to not miss
wakeups for task creation. This is important if we get a wakeup after
having marked ourselves as TASK_INTERRUPTIBLE. If we do end up creating
workers, then we flip the state back to running, making the subsequent
schedule() a no-op. Also increment the wq ref count before forking the
thread, to avoid a use-after-free.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8b3e78b5

io-wq: remove nr_process accounting · 728f13e7

Committed by Jens Axboe on Feb 21, 2021

We're now just using fork like we would from userspace, so there's no
need to try and impose extra restrictions or accounting on the user
side of things. That's already being done for us. That also means we
don't have to pass in the user_struct anymore, that's correctly inherited
through ->creds on fork.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

728f13e7

22 Feb, 2021 9 commits

io-wq: make io_wq_fork_thread() available to other users · 843bbfd4

Committed by Jens Axboe on Feb 17, 2021

We want to use this in io_uring proper as well, for the SQPOLL thread.
Rename it from fork_thread() to io_wq_fork_thread(), and make it
available through the io-wq.h header.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

843bbfd4

io-wq: only remove worker from free_list, if it was there · bf1daa4b

Committed by Jens Axboe on Feb 16, 2021

If the worker isn't on the free_list, don't attempt to delete it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bf1daa4b
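The guard described in bf1daa4b, deleting a node only when it is actually linked, can be sketched with a self-linked circular list. This is an illustrative userspace model: `struct node_model` and the `node_model_*` helpers are hypothetical and are not the list primitives io-wq uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Circular doubly linked node; a node linked to itself is "not on a list". */
struct node_model {
    struct node_model *prev;
    struct node_model *next;
};

static void node_model_init(struct node_model *n)
{
    n->prev = n;
    n->next = n;
}

static bool node_model_on_list(const struct node_model *n)
{
    return n->next != n;
}

/* Insert n right after head. */
static void node_model_add(struct node_model *head, struct node_model *n)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink n only if it is linked; report whether anything was removed. */
static bool node_model_del_if_present(struct node_model *n)
{
    if (!node_model_on_list(n))
        return false;
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_model_init(n);
    return true;
}
```

Re-initializing the node after unlinking makes a second delete a harmless no-op, which is the property the guard is meant to provide.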

io_uring: remove io_identity · 4379bf8b

Committed by Jens Axboe on Feb 15, 2021

We are no longer grabbing state, so no need to maintain an IO identity
that we COW if there are changes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4379bf8b

io-wq: worker idling always returns false · c6d77d92

Committed by Jens Axboe on Feb 15, 2021

Remove the bool return, and the checking for it in the caller.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c6d77d92

io-wq: fork worker threads from original task · 3bfe6106

Committed by Jens Axboe on Feb 16, 2021

Instead of using regular kthread kernel threads, create kernel threads
that are like a real thread that the task would create. This ensures that
we get all the context that we need, without having to carry that state
around. This greatly reduces the code complexity, and the risk of missing
state for a given request type.

With the move away from kthread, we can also dump everything related to
assigned state to the new threads.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3bfe6106

io-wq: don't pass 'wqe' needlessly around · 958234d5

Committed by Jens Axboe on Feb 17, 2021

Just grab it from the worker itself, which we're already passing in.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

958234d5

io-wq: get rid of wq->use_refs · 3b094e72

Committed by Jens Axboe on Feb 16, 2021

We don't support attach anymore, so it doesn't make sense to carry the
use_refs reference count. Get rid of it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3b094e72

io-wq: don't create any IO workers upfront · 1cbd9c2b

Committed by Jens Axboe on Feb 16, 2021

When the manager thread starts up, it creates a worker per node for
the given context. Just let these get created dynamically, like we do
for adding further workers.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1cbd9c2b

io_uring: remove the need for relying on an io-wq fallback worker · 7c25c0d1

Committed by Jens Axboe on Feb 16, 2021

We hit this case when the task is exiting, and we need somewhere to
do background cleanup of requests. Instead of relying on the io-wq
task manager to do this work for us, just stuff it somewhere where
we can safely run it ourselves directly.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7c25c0d1

13 Feb, 2021 1 commit

io-wq: clear out worker ->fs and ->files · e06aa2e9

Committed by Jens Axboe on Feb 12, 2021

By default, kernel threads have init_fs and init_files assigned. In the
past, this has triggered security problems, as commands that don't ask
for (and hence don't get assigned) fs/files from the originating task
can then attempt path resolution etc with access to parts of the system
they should not be able to.

Rather than add checks in the fs code for misuse, just set these to
NULL. If we do attempt to use them, then the resulting code will oops
rather than provide access to something that it should not permit.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e06aa2e9

04 Feb, 2021 1 commit

io_uring/io-wq: return 2-step work swap scheme · 5280f7e5

Committed by Pavel Begunkov on Feb 04, 2021

Saving one lock/unlock for io-wq is not super important, but adds some
ugliness in the code. More important, atomic decs not turning it to zero
for some archs won't give the right ordering/barriers so the
io_steal_work() may pretty easily get subtly and completely broken.

Return back 2-step io-wq work exchange and clean it up.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5280f7e5

02 Feb, 2021 1 commit

io_uring/io-wq: kill off now unused IO_WQ_WORK_NO_CANCEL · 4014d943

Committed by Jens Axboe on Jan 19, 2021

It's no longer used, as IORING_OP_CLOSE got rid of the need to flag
it as uncancelable, so kill it off.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4014d943

21 Dec, 2020 1 commit

io-wq: kill now unused io_wq_cancel_all() · 446bc1c2

Committed by Jens Axboe on Dec 20, 2020

io_uring no longer issues full cancelations on the io-wq, so remove any
remnants of this code and the IO_WQ_BIT_CANCEL flag.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

446bc1c2

10 Dec, 2020 1 commit

io_uring: always batch cancel in *cancel_files() · f6edbabb

Committed by Pavel Begunkov on Nov 06, 2020

Instead of iterating over each request and cancelling it individually in
io_uring_cancel_files(), try to cancel all matching requests and use
->inflight_list only to check if there is anything left.

In many cases it should be faster, and we can reuse a lot of code from
task cancellation.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f6edbabb
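The batching idea in f6edbabb, cancelling every matching request in one pass and consulting the inflight set only to see whether anything remains, might look like this in a simplified userspace model. All names here (`struct req_model`, `files_id`, the `req_model_*` functions) are hypothetical, an array stands in for ->inflight_list, and cancellation is synchronous, which the real code does not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct req_model {
    int files_id;   /* hypothetical key for the files the request uses */
    bool inflight;
};

/* One batch pass: cancel every inflight request that matches. */
static size_t req_model_cancel_matching(struct req_model *reqs, size_t n,
                                        int files_id)
{
    size_t cancelled = 0;

    for (size_t i = 0; i < n; i++) {
        if (reqs[i].inflight && reqs[i].files_id == files_id) {
            reqs[i].inflight = false;
            cancelled++;
        }
    }
    return cancelled;
}

/* The inflight set is only consulted to see if anything is left. */
static bool req_model_any_inflight(const struct req_model *reqs, size_t n,
                                   int files_id)
{
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].inflight && reqs[i].files_id == files_id)
            return true;
    }
    return false;
}

/* Repeat batch passes until no matching request remains. */
static size_t req_model_cancel_files(struct req_model *reqs, size_t n,
                                     int files_id)
{
    size_t total = 0;

    while (req_model_any_inflight(reqs, n, files_id))
        total += req_model_cancel_matching(reqs, n, files_id);
    return total;
}
```

In this model one pass always suffices; the outer loop is kept only to mirror the "check if anything is left" step from the message.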

05 Nov, 2020 1 commit

io-wq: cancel request if it's asking for files and we don't have them · 3dd1680d

Committed by Jens Axboe on Oct 30, 2020

This can't currently happen, but will be possible shortly. Handle missing
files just like we handle not being able to grab a needed mm, and mark the
request as needing cancelation.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3dd1680d

22 Oct, 2020 1 commit

io-wq: re-set NUMA node affinities if CPUs come online · 43c01fbe

Committed by Jens Axboe on Oct 22, 2020

We correctly set io-wq NUMA node affinities when the io-wq context is
setup, but if an entire node CPU set is offlined and then brought back
online, the per node affinities are broken. Ensure that we set them
again whenever a CPU comes online. This ensures that we always track
the right node affinity. The usual cpuhp notifiers are used to drive it.
Reported-by: Zhang Qiang <qiang.zhang@windriver.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

43c01fbe

21 Oct, 2020 1 commit

io_uring: unify fsize with def->work_flags · 69228338

Committed by Jens Axboe on Oct 20, 2020

This one was missed in the earlier conversion, should be included like
any of the other IO identity flags. Make sure we restore to RLIM_INFINITY
when dropping the personality again.

Fixes: 98447d65 ("io_uring: move io identity items into separate struct")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

69228338

17 Oct, 2020 3 commits

io-wq: inherit audit loginuid and sessionid · 4ea33a97

Committed by Jens Axboe on Oct 15, 2020

Make sure the async io-wq workers inherit the loginuid and sessionid from
the original task, and restore them to unset once we're done with the
async work item.

While at it, disable the ability for kernel threads to write to their own
loginuid.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4ea33a97

io_uring: move io identity items into separate struct · 98447d65

Committed by Jens Axboe on Oct 14, 2020

io-wq contains a pointer to the identity, which we just hold in io_kiocb
for now. This is in preparation for putting this outside io_kiocb. The
only exception is struct files_struct, which we'll need different rules
for to avoid a circular dependency.

No functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

98447d65

io_uring: rely solely on work flags to determine personality. · dfead8a8

Committed by Jens Axboe on Oct 14, 2020

We solely rely on work->work_flags now, so use that for proper checking
and clearing/dropping of various identity items.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dfead8a8

openeuler / Kernel synced successfully more than 1 year ago
