提交 · a185f1db59f13de73aa470559030e90e50b34d93 · openeuler / Kernel

24 3月, 2021 1 次提交

io_uring: do ctx sqd ejection in a clear context · a185f1db

由 Pavel Begunkov 提交于 3月 23, 2021

WARNING: CPU: 1 PID: 27907 at fs/io_uring.c:7147 io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147
CPU: 1 PID: 27907 Comm: iou-sqp-27905 Not tainted 5.12.0-rc4-syzkaller #0
RIP: 0010:io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147
Call Trace:
 io_ring_ctx_wait_and_kill+0x214/0x700 fs/io_uring.c:8619
 io_uring_release+0x3e/0x50 fs/io_uring.c:8646
 __fput+0x288/0x920 fs/file_table.c:280
 task_work_run+0xdd/0x1a0 kernel/task_work.c:140
 io_run_task_work fs/io_uring.c:2238 [inline]
 io_run_task_work fs/io_uring.c:2228 [inline]
 io_uring_try_cancel_requests+0x8ec/0xc60 fs/io_uring.c:8770
 io_uring_cancel_sqpoll+0x1cf/0x290 fs/io_uring.c:8974
 io_sqpoll_cancel_cb+0x87/0xb0 fs/io_uring.c:8907
 io_run_task_work_head+0x58/0xb0 fs/io_uring.c:1961
 io_sq_thread+0x3e2/0x18d0 fs/io_uring.c:6763
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

May happen that last ctx ref is killed in io_uring_cancel_sqpoll(), so
fput callback (i.e. io_uring_release()) is enqueued through task_work,
and run by same cancellation. As it's deeply nested we can't do parking
or taking sqd->lock there, because its state is unclear. So avoid
ctx ejection from sqd list from io_ring_ctx_wait_and_kill() and do it
in a clear context in io_ring_exit_work().

Fixes: f6d54255 ("io_uring: halt SQO submission on ctx exit")
Reported-by: syzbot+e3a3f84f5cecf61f0583@syzkaller.appspotmail.com
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e90df88b8ff2cabb14a7534601d35d62ab4cb8c7.1616496707.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

a185f1db

22 3月, 2021 4 次提交

io_uring: fix provide_buffers sign extension · d81269fe

由 Pavel Begunkov 提交于 3月 19, 2021

io_provide_buffers_prep()'s "p->len * p->nbufs" to sign extension
problems. Not a huge problem as it's only used for access_ok() and
increases the checked length, but better to keep typing right.
Reported-by: NColin Ian King <colin.king@canonical.com>
Fixes: efe68c1c ("io_uring: validate the full range of provided buffers for access")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Reviewed-by: NColin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/562376a39509e260d8532186a06226e56eb1f594.1616149233.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

d81269fe

io_uring: don't skip file_end_write() on reissue · b65c128f

由 Pavel Begunkov 提交于 3月 22, 2021

Don't miss to call kiocb_end_write() from __io_complete_rw() on reissue.
Shouldn't be much of a problem as the function actually does some work
only for ISREG, and NONBLOCK won't be reissued.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/32af9b77c5b874e1bee1a3c46396094bd969e577.1616366969.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

b65c128f

io_uring: correct io_queue_async_work() traces · d07f1e8a

由 Pavel Begunkov 提交于 3月 22, 2021

Request's io-wq work is hashed in io_prep_async_link(), so
as trace_io_uring_queue_async_work() looks at it should follow after
prep has been done.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/709c9f872f4d2e198c7aed9c49019ca7095dd24d.1616366969.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

d07f1e8a

io_uring: don't use {test,clear}_tsk_thread_flag() for current · 0b8cfa97

由 Jens Axboe 提交于 3月 21, 2021

Linus correctly points out that this is both unnecessary and generates
much worse code on some archs as going from current to thread_info is
actually backwards - and obviously just wasteful, since the thread_info
is what we care about.

Since io_uring only operates on current for these operations, just use
test_thread_flag() instead. For io-wq, we can further simplify and use
tracehook_notify_signal() to handle the TIF_NOTIFY_SIGNAL work and clear
the flag. The latter isn't an actual bug right now, but it may very well
be in the future if we place other work items under TIF_NOTIFY_SIGNAL.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/io-uring/CAHk-=wgYhNck33YHKZ14mFB5MzTTk8gqXHcfj=RWTAXKwgQJgg@mail.gmail.com/Signed-off-by: NJens Axboe <axboe@kernel.dk>

0b8cfa97

21 3月, 2021 1 次提交

io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL · 0031275d

由 Stefan Metzmacher 提交于 3月 20, 2021

Without that it's not safe to use them in a linked combination with
others.

Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
should be possible.

We already handle short reads and writes for the following opcodes:

- IORING_OP_READV
- IORING_OP_READ_FIXED
- IORING_OP_READ
- IORING_OP_WRITEV
- IORING_OP_WRITE_FIXED
- IORING_OP_WRITE
- IORING_OP_SPLICE
- IORING_OP_TEE

Now we have it for these as well:

- IORING_OP_SENDMSG
- IORING_OP_SEND
- IORING_OP_RECVMSG
- IORING_OP_RECV

For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
flags in order to call req_set_fail_links().

There might be applications arround depending on the behavior
that even short send[msg]()/recv[msg]() retuns continue an
IOSQE_IO_LINK chain.

It's very unlikely that such applications pass in MSG_WAITALL,
which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.

It's expected that the low level sock_sendmsg() call just ignores
MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set
SO_ZEROCOPY.

We also expect the caller to know about the implicit truncation to
MAX_RW_COUNT, which we don't detect.

cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.orgSigned-off-by: NStefan Metzmacher <metze@samba.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0031275d

18 3月, 2021 4 次提交

io_uring: don't leak creds on SQO attach error · de75a3d3

由 Pavel Begunkov 提交于 3月 18, 2021

Attaching to already dead/dying SQPOLL task is disallowed in
io_sq_offload_create(), but cleanup is hand coded by calling
io_put_sq_data()/etc., that miss to put ctx->sq_creds.

Defer everything to error-path io_sq_thread_finish(), adding
ctx->sqd_list in the error case as well as finish will handle it.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

de75a3d3

io_uring: use typesafe pointers in io_uring_task · ee53fb2b

由 Stefan Metzmacher 提交于 3月 15, 2021

Signed-off-by: NStefan Metzmacher <metze@samba.org>
Link: https://lore.kernel.org/r/ce2a598e66e48347bb04afbaf2acc67c0cc7971a.1615809009.git.metze@samba.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

ee53fb2b

io_uring: remove structures from include/linux/io_uring.h · 53e043b2

由 Stefan Metzmacher 提交于 3月 15, 2021

Link: https://lore.kernel.org/r/8c1d14f3748105f4caeda01716d47af2fa41d11c.1615809009.git.metze@samba.orgSigned-off-by: NStefan Metzmacher <metze@samba.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

53e043b2

io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls · 76cd979f

由 Stefan Metzmacher 提交于 3月 16, 2021

We never want to generate any SIGPIPE, -EPIPE only is much better.
Signed-off-by: NStefan Metzmacher <metze@samba.org>
Link: https://lore.kernel.org/r/38961085c3ec49fd21550c7788f214d1ff02d2d4.1615908477.git.metze@samba.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

76cd979f

15 3月, 2021 7 次提交

io_uring: fix sqpoll cancellation via task_work · b7f5a0bf

由 Pavel Begunkov 提交于 3月 15, 2021

Running sqpoll cancellations via task_work_run() is a bad idea because
it depends on other task works to be run, but those may be locked in
currently running task_work_run() because of how it's (splicing the list
in batches).

Enqueue and run them through a separate callback head, namely
struct io_sq_data::park_task_work. As a nice bonus we now precisely
control where it's run, that's much safer than guessing where it can
happen as it was before.
Reported-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b7f5a0bf

io_uring: add generic callback_head helpers · 9b465711

由 Pavel Begunkov 提交于 3月 15, 2021

We already have helpers to run/add callback_head but taking ctx and
working with ctx->exit_task_work. Extract generic versions of them
implemented in terms of struct callback_head, it will be used later.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9b465711

io_uring: fix concurrent parking · 9e138a48

由 Pavel Begunkov 提交于 3月 14, 2021

If io_sq_thread_park() of one task got rescheduled right after
set_bit(), before it gets back to mutex_lock() there can happen
park()/unpark() by another task with SQPOLL locking again and
continuing running never seeing that first set_bit(SHOULD_PARK),
so won't even try to put the mutex down for parking.

It will get parked eventually when SQPOLL drops the lock for reschedule,
but may be problematic and will get in the way of further fixes.

Account number of tasks waiting for parking with a new atomic variable
park_pending and adjust SHOULD_PARK accordingly. It doesn't entirely
replaces SHOULD_PARK bit with this atomic var because it's convenient
to have it as a bit in the state and will help to do optimisations
later.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9e138a48

io_uring: halt SQO submission on ctx exit · f6d54255

由 Pavel Begunkov 提交于 3月 14, 2021

io_sq_thread_finish() is called in io_ring_ctx_free(), so SQPOLL task is
potentially running submitting new requests. It's not a disaster because
of using a "try" variant of percpu_ref_get, but is far from nice.

Remove ctx from the sqd ctx list earlier, before cancellation loop, so
SQPOLL can't find it and so won't submit new requests.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f6d54255

io_uring: replace sqd rw_semaphore with mutex · 09a6f4ef

由 Pavel Begunkov 提交于 3月 14, 2021

The only user of read-locking of sqd->rw_lock is sq_thread itself, which
is by definition alone, so we don't really need rw_semaphore, but mutex
will do. Replace it with a mutex, and kill read-to-write upgrading and
extra task_work handling in io_sq_thread().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

09a6f4ef

io_uring: fix complete_post use ctx after free · 180f829f

由 Pavel Begunkov 提交于 3月 14, 2021

If io_req_complete_post() put not a final ref, we can't rely on the
request's ctx ref, and so ctx may potentially be freed while
complete_post() is in io_cqring_ev_posted()/etc.

In that case get an additional ctx reference, and put it in the end, so
protecting following io_cqring_ev_posted(). And also prolong ctx
lifetime until spin_unlock happens, as we do with mutexes, so added
percpu_ref_get() doesn't race with ctx free.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

180f829f

io_uring: fix ->flags races by linked timeouts · efe814a4

由 Pavel Begunkov 提交于 3月 14, 2021

It's racy to modify req->flags from a not owning context, e.g. linked
timeout calling req_set_fail_links() for the master request might race
with that request setting/clearing flags while being executed
concurrently. Just remove req_set_fail_links(prev) from
io_link_timeout_fn(), io_async_find_and_cancel() and functions down the
line take care of setting the fail bit.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

efe814a4

14 3月, 2021 1 次提交

io_uring: convert io_buffer_idr to XArray · 9e15c3a0

由 Jens Axboe 提交于 3月 13, 2021

Like we did for the personality idr, convert the IO buffer idr to use
XArray. This avoids a use-after-free on removal of entries, since idr
doesn't like doing so from inside an iterator, and it nicely reduces
the amount of code we need to support this feature.

Fixes: 5a2e745d ("io_uring: buffer registration infrastructure")
Cc: stable@vger.kernel.org
Cc: Matthew Wilcox <willy@infradead.org>
Cc: yangerkun <yangerkun@huawei.com>
Reported-by: NHulk Robot <hulkci@huawei.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9e15c3a0

13 3月, 2021 4 次提交

io_uring: allow IO worker threads to be frozen · 16efa4fc

由 Jens Axboe 提交于 3月 12, 2021

With the freezer using the proper signaling to notify us of when it's
time to freeze a thread, we can re-enable normal freezer usage for the
IO threads. Ensure that SQPOLL, io-wq, and the io-wq manager call
try_to_freeze() appropriately, and remove the default setting of
PF_NOFREEZE from create_io_thread().
Signed-off-by: NJens Axboe <axboe@kernel.dk>

16efa4fc

io_uring: fix OP_ASYNC_CANCEL across tasks · 58f99373

由 Pavel Begunkov 提交于 3月 12, 2021

IORING_OP_ASYNC_CANCEL tries io-wq cancellation only for current task.
If it fails go over tctx_list and try it out for every single tctx.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

58f99373

io_uring: cancel sqpoll via task_work · 521d6a73

由 Pavel Begunkov 提交于 3月 11, 2021

1) The first problem is io_uring_cancel_sqpoll() ->
io_uring_cancel_task_requests() basically doing park(); park(); and so
hanging.

2) Another one is more subtle, when the master task is doing cancellations,
but SQPOLL task submits in-between the end of the cancellation but
before finish() requests taking a ref to the ctx, and so eternally
locking it up.

3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll() and
same io_uring_cancel_sqpoll() from the owner task, they race for
tctx->wait events. And there probably more of them.

Instead do SQPOLL cancellations from within SQPOLL task context via
task_work, see io_sqpoll_cancel_sync(). With that we don't need temporal
park()/unpark() during cancellation, which is ugly, subtle and anyway
doesn't allow to do io_run_task_work() properly.

io_uring_cancel_sqpoll() is called only from SQPOLL task context and
under sqd locking, so all parking is removed from there. And so,
io_sq_thread_[un]park() and io_sq_thread_stop() are not used now by
SQPOLL task, and that spare us from some headache.

Also remove ctx->sqd_list early to avoid 2). And kill tctx->sqpoll,
which is not used anymore.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

521d6a73

io_uring: prevent racy sqd->thread checks · 26984fbf

由 Pavel Begunkov 提交于 3月 11, 2021

SQPOLL thread to which we're trying to attach may be going away, it's
not nice but a more serious problem is if io_sq_offload_create() sees
sqd->thread==NULL, and tries to init it with a new thread. There are
tons of ways it can be exploited or fail.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

26984fbf

12 3月, 2021 4 次提交

io_uring: remove useless ->startup completion · 0df8ea60

由 Pavel Begunkov 提交于 3月 11, 2021

We always do complete(&sqd->startup) almost right after sqd->thread
creation, either in the success path or in io_sq_thread_finish(). It's
specifically created not started for us to be able to set some stuff
like sqd->thread and io_uring_alloc_task_context() before following
right after wake_up_new_task().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0df8ea60

io_uring: cancel deferred requests in try_cancel · e1915f76

由 Pavel Begunkov 提交于 3月 11, 2021

As io_uring_cancel_files() and others let SQO to run between
io_uring_try_cancel_requests(), SQO may generate new deferred requests,
so it's safer to try to cancel them in it.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e1915f76

io_uring: perform IOPOLL reaping if canceler is thread itself · d052d1d6

由 Jens Axboe 提交于 3月 11, 2021

We bypass IOPOLL completion polling (and reaping) for the SQPOLL thread,
but if it's the thread itself invoking cancelations, then we still need
to perform it or no one will.

Fixes: 9936c7c2 ("io_uring: deduplicate core cancellations sequence")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d052d1d6

io_uring: force creation of separate context for ATTACH_WQ and non-threads · 5c2469e0

由 Jens Axboe 提交于 3月 11, 2021

Earlier kernels had SQPOLL threads that could share across anything, as
we grabbed the context we needed on a per-ring basis. This is no longer
the case, so only allow attaching directly if we're in the same thread
group. That is the common use case. For non-group tasks, just setup a
new context and thread as we would've done if sharing wasn't set. This
isn't 100% ideal in terms of CPU utilization for the forked and share
case, but hopefully that isn't much of a concern. If it is, there are
plans in motion for how to improve that. Most importantly, we want to
avoid app side regressions where sharing worked before and now doesn't.
With this patch, functionality is equivalent to previous kernels that
supported IORING_SETUP_ATTACH_WQ with SQPOLL.
Reported-by: NStefan Metzmacher <metze@samba.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5c2469e0

10 3月, 2021 13 次提交

io_uring: remove indirect ctx into sqo injection · 7d41e854

由 Pavel Begunkov 提交于 3月 10, 2021

We use ->ctx_new_list to notify sqo about new ctx pending, then sqo
should stop and splice it to its sqd->ctx_list, paired with
->sq_thread_comp.

The last one is broken because nobody reinitialises it, and trying to
fix it would only add more complexity and bugs. And the first isn't
really needed as is done under park(), that protects from races well.
Add ctx into sqd->ctx_list directly (under park()), it's much simpler
and allows to kill both, ctx_new_list and sq_thread_comp.

note: apparently there is no real problem at the moment, because
sq_thread_comp is used only by io_sq_thread_finish() followed by
parking, where list_del(&ctx->sqd_list) removes it well regardless
whether it's in the new or the active list.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7d41e854

io_uring: fix invalid ctx->sq_thread_idle · 78d7f6ba

由 Pavel Begunkov 提交于 3月 10, 2021

We have to set ctx->sq_thread_idle before adding a ring to an SQ task,
otherwise sqd races for seeing zero and accounting it as such.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

78d7f6ba

kernel: make IO threads unfreezable by default · e22bc9b4

由 Jens Axboe 提交于 3月 09, 2021

The io-wq threads were already marked as no-freeze, but the manager was
not. On resume, we perpetually have signal_pending() being true, and
hence the manager will loop and spin 100% of the time.

Just mark the tasks created by create_io_thread() as PF_NOFREEZE by
default, and remove any knowledge of it in io-wq and io_uring.
Reported-by: NKevin Locke <kevin@kevinlocke.name>
Tested-by: NKevin Locke <kevin@kevinlocke.name>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e22bc9b4

io_uring: always wait for sqd exited when stopping SQPOLL thread · e8f98f24

由 Jens Axboe 提交于 3月 09, 2021

We have a tiny race where io_put_sq_data() calls io_sq_thead_stop()
and finds the thread gone, but the thread has indeed not fully
exited or called complete() yet. Close it up by always having
io_sq_thread_stop() wait on completion of the exit event.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e8f98f24

io_uring: remove unneeded variable 'ret' · 5199328a

由 Yang Li 提交于 3月 09, 2021

Fix the following coccicheck warning:
./fs/io_uring.c:8984:5-8: Unneeded variable: "ret". Return "0" on line
8998
Reported-by: NAbaci Robot <abaci@linux.alibaba.com>
Signed-off-by: NYang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/1615271441-33649-1-git-send-email-yang.lee@linux.alibaba.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

5199328a

io_uring: move all io_kiocb init early in io_init_req() · 93e68e03

由 Jens Axboe 提交于 3月 09, 2021

If we hit an error path in the function, make sure that the io_kiocb is
fully initialized at that point so that freeing the request always sees
a valid state.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

93e68e03

io_uring: fix complete_post races for linked req · 7a612350

由 Pavel Begunkov 提交于 3月 09, 2021

Calling io_queue_next() after spin_unlock in io_req_complete_post()
races with the other side extracting and reusing this request. Hand
coded parts of io_req_find_next() considering that io_disarm_next()
and io_req_task_queue() have (and safe) to be called with
completion_lock held.

It already does io_commit_cqring() and io_cqring_ev_posted(), so just
reuse it for post io_disarm_next().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5672a62f3150ee7c55849f40c0037655c4f2840f.1615250156.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

7a612350

io_uring: add io_disarm_next() helper · 33cc89a9

由 Pavel Begunkov 提交于 3月 09, 2021

A preparation patch placing all preparations before extracting a next
request into a separate helper io_disarm_next().

Also, don't spuriously do ev_posted in a rare case where REQ_F_FAIL_LINK
is set but there are no requests linked (i.e. after cancelling a linked
timeout or setting IOSQE_IO_LINK on a last request of a submission
batch).
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/44ecff68d6b47e1c4e6b891bdde1ddc08cfc3590.1615250156.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

33cc89a9

io_uring: fix io_sq_offload_create error handling · 97a73a0f

由 Pavel Begunkov 提交于 3月 08, 2021

Don't set IO_SQ_THREAD_SHOULD_STOP when io_sq_offload_create() has
failed on io_uring_alloc_task_context() but leave everything to
io_sq_thread_finish(), because currently io_sq_thread_finish()
hangs on trying to park it. That's great it stalls there, because
otherwise the following io_sq_thread_stop() would be skipped on
IO_SQ_THREAD_SHOULD_STOP check and the sqo would race for sqd with
freeing ctx.

A simple error injection gives something like this.

[  245.463955] INFO: task sqpoll-test-hang:523 blocked for more than 122 seconds.
[  245.463983] Call Trace:
[  245.463990]  __schedule+0x36b/0x950
[  245.464005]  schedule+0x68/0xe0
[  245.464013]  schedule_timeout+0x209/0x2a0
[  245.464032]  wait_for_completion+0x8b/0xf0
[  245.464043]  io_sq_thread_finish+0x44/0x1a0
[  245.464049]  io_uring_setup+0x9ea/0xc80
[  245.464058]  __x64_sys_io_uring_setup+0x16/0x20
[  245.464064]  do_syscall_64+0x38/0x50
[  245.464073]  entry_SYSCALL_64_after_hwframe+0x44/0xae
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

97a73a0f

io_uring: Convert personality_idr to XArray · 61cf9370

由 Matthew Wilcox (Oracle) 提交于 3月 08, 2021

You can't call idr_remove() from within a idr_for_each() callback,
but you can call xa_erase() from an xa_for_each() loop, so switch the
entire personality_idr from the IDR to the XArray.  This manifests as a
use-after-free as idr_for_each() attempts to walk the rest of the node
after removing the last entry from it.

Fixes: 071698e1 ("io_uring: allow registering credentials")
Cc: stable@vger.kernel.org # 5.6+
Reported-by: Nyangerkun <yangerkun@huawei.com>
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
[Pavel: rebased (creds load was moved into io_init_req())]
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7ccff36e1375f2b0ebf73d957f037b43becc0dde.1615212806.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

61cf9370

io_uring: clean R_DISABLED startup mess · 0298ef96

由 Pavel Begunkov 提交于 3月 08, 2021

There are enough of problems with IORING_SETUP_R_DISABLED, including the
burden of checking and kicking off the SQO task all over the codebase --
for exit/cancel/etc.

Rework it, always start the thread but don't do submit unless the flag
is gone, that's much easier.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0298ef96

io_uring: fix unrelated ctx reqs cancellation · f458dd84

由 Pavel Begunkov 提交于 3月 08, 2021

io-wq now is per-task, so cancellations now should match against
request's ctx.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f458dd84

io_uring: SQPOLL parking fixes · 05962f95

由 Jens Axboe 提交于 3月 06, 2021

We keep running into weird dependency issues between the sqd lock and
the parking state. Disentangle the SQPOLL thread from the last bits of
the kthread parking inheritance, and just replace the parking state,
and two associated locks, with a single rw mutex. The SQPOLL thread
keeps the mutex for read all the time, except if someone has marked us
needing to park. Then we drop/re-acquire and try again.

This greatly simplifies the parking state machine (by just getting rid
of it), and makes it a lot more obvious how it works - if you need to
modify the ctx list, then you simply park the thread which will grab
the lock for writing.

Fold in fix from Hillf Danton on not setting STOP on a fatal signal.

Fixes: e54945ae ("io_uring: SQPOLL stop error handling fixes")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

05962f95

08 3月, 2021 1 次提交

io_uring: kill io_sq_thread_fork() and return -EOWNERDEAD if the sq_thread is gone · 04147488

由 Stefan Metzmacher 提交于 3月 07, 2021

This brings the behavior back in line with what 5.11 and earlier did,
and this is no longer needed with the improved handling of creds
not needing to do unshare().
Signed-off-by: NStefan Metzmacher <metze@samba.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

04147488

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功