提交 · 87c4311fd2c28e83545cdfa4702b57db15ed1d9b · openeuler / Kernel

01 10月, 2020 38 次提交

J
io_uring: kill callback_head argument for io_req_task_work_add() · 87c4311f
由 Jens Axboe 提交于 9月 30, 2020
```
We always use &req->task_work anyway, no point in passing it in.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
```
87c4311f

io_uring: move req preps out of io_issue_sqe() · c1379e24

由 Pavel Begunkov 提交于 9月 30, 2020

All request preparations are done only during submission, reflect it in
the code by moving io_req_prep() much earlier into io_queue_sqe().

That's much cleaner, because it doen't expose bits to async code which
it won't ever use. Also it makes the interface harder to misuse, and
there are potential places for bugs.

For instance, __io_queue() doesn't clear @sqe before proceeding to a
next linked request, that could have been disastrous, but hopefully
there are linked requests IFF sqe==NULL, so not actually a bug.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c1379e24

io_uring: decouple issuing and req preparation · bfe76559

由 Pavel Begunkov 提交于 9月 30, 2020

io_issue_sqe() does two things at once, trying to prepare request and
issuing them. Split it in two and deduplicate with io_defer_prep().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

bfe76559

io_uring: remove nonblock arg from io_{rw}_prep() · 73debe68

由 Pavel Begunkov 提交于 9月 30, 2020

All io_*_prep() functions including io_{read,write}_prep() are called
only during submission where @force_nonblock is always true. Don't keep
propagating it and instead remove the @force_nonblock argument
from prep() altogether.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

73debe68

io_uring: set/clear IOCB_NOWAIT into io_read/write · a88fc400

由 Pavel Begunkov 提交于 9月 30, 2020

Move setting IOCB_NOWAIT from io_prep_rw() into io_read()/io_write(), so
it's set/cleared in a single place. Also remove @force_nonblock
parameter from io_prep_rw().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a88fc400

io_uring: remove F_NEED_CLEANUP check in *prep() · 2d199895

由 Pavel Begunkov 提交于 9月 30, 2020

REQ_F_NEED_CLEANUP is set only by io_*_prep() and they're guaranteed to
be called only once, so there is no one who may have set the flag
before. Kill REQ_F_NEED_CLEANUP check in these *prep() handlers.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2d199895

io_uring: io_kiocb_ppos() style change · 5b09e37e

由 Pavel Begunkov 提交于 9月 30, 2020

Put brackets around bitwise ops in a complex expression
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5b09e37e

io_uring: simplify io_alloc_req() · 291b2821

由 Pavel Begunkov 提交于 9月 30, 2020

Extract common code from if/else branches. That is cleaner and optimised
even better.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

291b2821

io_uring: show sqthread pid and cpu in fdinfo · dbbe9c64

由 Joseph Qi 提交于 9月 29, 2020

In most cases we'll specify IORING_SETUP_SQPOLL and run multiple
io_uring instances in a host. Since all sqthreads are named
"io_uring-sq", it's hard to distinguish the relations between
application process and its io_uring sqthread.
With this patch, application can get its corresponding sqthread pid
and cpu through show_fdinfo.
Steps:
1. Get io_uring fd first.
$ ls -l /proc/<pid>/fd | grep -w io_uring
2. Then get io_uring instance related info, including corresponding
sqthread pid and cpu.
$ cat /proc/<pid>/fdinfo/<io_uring_fd>

pos:	0
flags:	02000002
mnt_id:	13
SqThread:	6929
SqThreadCpu:	2
UserFiles:	1
    0: testfile
UserBufs:	0
PollList:
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: NStefano Garzarella <sgarzare@redhat.com>
[axboe: fixed for new shared SQPOLL]
Signed-off-by: NJens Axboe <axboe@kernel.dk>

dbbe9c64

io_uring: process task work in io_uring_register() · af9c1a44

由 Jens Axboe 提交于 9月 24, 2020

We do this for CQ ring wait, in case task_work completions come in. We
should do the same in io_uring_register(), to avoid spurious -EINTR
if the ring quiescing ends up having to process task_work to complete
the operation
Reported-by: NDan Melnic <dmm@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

af9c1a44

io_uring: add blkcg accounting to offloaded operations · 91d8f519

由 Dennis Zhou 提交于 9月 16, 2020

There are a few operations that are offloaded to the worker threads. In
this case, we lose process context and end up in kthread context. This
results in ios to be not accounted to the issuing cgroup and
consequently end up as issued by root. Just like others, adopt the
personality of the blkcg too when issuing via the workqueues.

For the SQPOLL thread, it will live and attach in the inited cgroup's
context.
Signed-off-by: NDennis Zhou <dennis@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

91d8f519

io_uring: improve registered buffer accounting for huge pages · de293938

由 Jens Axboe 提交于 9月 17, 2020

io_uring does account any registered buffer as pinned/locked memory, and
checks limit and fails if the given user doesn't have a big enough limit
to register the ranges specified. However, if huge pages are used, we
are potentially under-accounting the memory in terms of what gets pinned
on the vm side.

This patch rectifies that, by ensuring that we account the full size of
a compound page, regardless of how much of it is being registered. Huge
pages are not accounted mulitple times - if multiple sections of a huge
page is registered, then the page is only accounted once.
Reported-by: NAndrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

de293938

io_uring: remove unneeded semicolon · 14db8411

由 Zheng Bin 提交于 9月 09, 2020

Fixes coccicheck warning:

fs/io_uring.c:4242:13-14: Unneeded semicolon
Signed-off-by: NZheng Bin <zhengbin13@huawei.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

14db8411

io_uring: cap SQ submit size for SQPOLL with multiple rings · e95eee2d

由 Jens Axboe 提交于 9月 08, 2020

In the spirit of fairness, cap the max number of SQ entries we'll submit
for SQPOLL if we have multiple rings. If we don't do that, we could be
submitting tons of entries for one ring, while others are waiting to get
service.

The value of 8 is somewhat arbitrarily chosen as something that allows
a fair bit of batching, without using an excessive time per ring.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e95eee2d

io_uring: get rid of req->io/io_async_ctx union · e8c2bc1f

由 Jens Axboe 提交于 8月 15, 2020

There's really no point in having this union, it just means that we're
always allocating enough room to cater to any command. But that's
pointless, as the ->io field is request type private anyway.

This gets rid of the io_async_ctx structure, and fills in the required
size in the io_op_defs[] instead.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e8c2bc1f

io_uring: kill extra user_bufs check · 4be1c615

由 Pavel Begunkov 提交于 9月 06, 2020

Testing ctx->user_bufs for NULL in io_import_fixed() is not neccessary,
because in that case ctx->nr_user_bufs would be zero, and the following
check would fail.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4be1c615

io_uring: fix overlapped memcpy in io_req_map_rw() · ab0b196c

由 Pavel Begunkov 提交于 9月 06, 2020

When io_req_map_rw() is called from io_rw_prep_async(), it memcpy()
iorw->iter into itself. Even though it doesn't lead to an error, such a
memcpy()'s aliasing rules violation is considered to be a bad practise.

Inline io_req_map_rw() into io_rw_prep_async(). We don't really need any
remapping there, so it's much simpler than the generic implementation.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ab0b196c

io_uring: refactor io_req_map_rw() · afb87658

由 Pavel Begunkov 提交于 9月 06, 2020

Set rw->free_iovec to @iovec, that gives an identical result and stresses
that @iovec param rw->free_iovec play the same role.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

afb87658

io_uring: simplify io_rw_prep_async() · f4bff104

由 Pavel Begunkov 提交于 9月 06, 2020

Don't touch iter->iov and iov in between __io_import_iovec() and
io_req_map_rw(), the former function aleady sets it correctly, because it
creates one more case with NULL'ed iov to consider in io_req_map_rw().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f4bff104

io_uring: provide IORING_ENTER_SQ_WAIT for SQPOLL SQ ring waits · 90554200

由 Jens Axboe 提交于 9月 03, 2020

When using SQPOLL, applications can run into the issue of running out of
SQ ring entries because the thread hasn't consumed them yet. The only
option for dealing with that is checking later, or busy checking for the
condition.

Provide IORING_ENTER_SQ_WAIT if applications want to wait on this
condition.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

90554200

J
io_uring: mark io_uring_fops/io_op_defs as __read_mostly · 738277ad
由 Jens Axboe 提交于 9月 03, 2020
```
These structures are never written, move them appropriately.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
```
738277ad

io_uring: enable IORING_SETUP_ATTACH_WQ to attach to SQPOLL thread too · aa06165d

由 Jens Axboe 提交于 9月 02, 2020

We support using IORING_SETUP_ATTACH_WQ to share async backends between
rings created by the same process, this now also allows the same to
happen with SQPOLL. The setup procedure remains the same, the caller
sets io_uring_params->wq_fd to the 'parent' context, and then the newly
created ring will attach to that async backend.

This means that multiple rings can share the same SQPOLL thread, saving
resources.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

aa06165d

io_uring: base SQPOLL handling off io_sq_data · 69fb2131

由 Jens Axboe 提交于 9月 14, 2020

Remove the SQPOLL thread from the ctx, and use the io_sq_data as the
data structure we pass in. io_sq_data has a list of ctx's that we can
then iterate over and handle.

As of now we're ready to handle multiple ctx's, though we're still just
handling a single one after this patch.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

69fb2131

io_uring: split SQPOLL data into separate structure · 534ca6d6

由 Jens Axboe 提交于 9月 02, 2020

Move all the necessary state out of io_ring_ctx, and into a new
structure, io_sq_data. The latter now deals with any state or
variables associated with the SQPOLL thread itself.

In preparation for supporting more than one io_ring_ctx per SQPOLL
thread.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

534ca6d6

io_uring: split work handling part of SQPOLL into helper · c8d1ba58

由 Jens Axboe 提交于 9月 14, 2020

This is done in preparation for handling more than one ctx, but it also
cleans up the code a bit since io_sq_thread() was a bit too unwieldy to
get a get overview on.

__io_sq_thread() is now the main handler, and it returns an enum sq_ret
that tells io_sq_thread() what it ended up doing. The parent then makes
a decision on idle, spinning, or work handling based on that.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c8d1ba58

io_uring: move SQPOLL post-wakeup ring need wakeup flag into wake handler · 3f0e64d0

由 Jens Axboe 提交于 9月 02, 2020

We need to decouple the clearing on wakeup from the the inline schedule,
as that is going to be required for handling multiple rings in one
thread.

Wrap our wakeup handler so we can clear it when we get the wakeup, by
definition that is when we no longer need the flag set.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3f0e64d0

io_uring: use private ctx wait queue entries for SQPOLL · 6a779382

由 Jens Axboe 提交于 9月 02, 2020

This is in preparation to sharing the poller thread between rings. For
that we need per-ring wait_queue_entry storage, and we can't easily put
that on the stack if one thread is managing multiple rings.

We'll also be sharing the wait_queue_head across rings for the purposes
of wakeups, provide the usual private ring wait_queue_head for now but
make it a pointer so we can easily override it when sharing.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6a779382

io_uring: io_sq_thread() doesn't need to flush signals · e35afcf9

由 Jens Axboe 提交于 9月 02, 2020

We're not handling signals by default in kernel threads, and we never
use TWA_SIGNAL for the SQPOLL thread internally. Hence we can never
have a signal pending, and we don't need to check for it (nor flush it).
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e35afcf9

io_uring: allow disabling rings during the creation · 7e84e1c7

由 Stefano Garzarella 提交于 8月 27, 2020

This patch adds a new IORING_SETUP_R_DISABLED flag to start the
rings disabled, allowing the user to register restrictions,
buffers, files, before to start processing SQEs.

When IORING_SETUP_R_DISABLED is set, SQE are not processed and
SQPOLL kthread is not started.

The restrictions registration are allowed only when the rings
are disable to prevent concurrency issue while processing SQEs.

The rings can be enabled using IORING_REGISTER_ENABLE_RINGS
opcode with io_uring_register(2).
Suggested-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NStefano Garzarella <sgarzare@redhat.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7e84e1c7

io_uring: add IOURING_REGISTER_RESTRICTIONS opcode · 21b55dbc

由 Stefano Garzarella 提交于 8月 27, 2020

The new io_uring_register(2) IOURING_REGISTER_RESTRICTIONS opcode
permanently installs a feature allowlist on an io_ring_ctx.
The io_ring_ctx can then be passed to untrusted code with the
knowledge that only operations present in the allowlist can be
executed.

The allowlist approach ensures that new features added to io_uring
do not accidentally become available when an existing application
is launched on a newer kernel version.

Currently is it possible to restrict sqe opcodes, sqe flags, and
register opcodes.

IOURING_REGISTER_RESTRICTIONS can only be made once. Afterwards
it is not possible to change restrictions anymore.
This prevents untrusted code from removing restrictions.
Suggested-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NStefano Garzarella <sgarzare@redhat.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

21b55dbc

io_uring: reference ->nsproxy for file table commands · 9b828492

由 Jens Axboe 提交于 9月 18, 2020

If we don't get and assign the namespace for the async work, then certain
paths just don't work properly (like /dev/stdin, /proc/mounts, etc).
Anything that references the current namespace of the given task should
be assigned for async work on behalf of that task.

Cc: stable@vger.kernel.org # v5.5+
Reported-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9b828492

io_uring: don't rely on weak ->files references · 0f212204

由 Jens Axboe 提交于 9月 13, 2020

Grab actual references to the files_struct. To avoid circular references
issues due to this, we add a per-task note that keeps track of what
io_uring contexts a task has used. When the tasks execs or exits its
assigned files, we cancel requests based on this tracking.

With that, we can grab proper references to the files table, and no
longer need to rely on stashing away ring_fd and ring_file to check
if the ring_fd may have been closed.

Cc: stable@vger.kernel.org # v5.5+
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0f212204

io_uring: enable task/files specific overflow flushing · e6c8aa9a

由 Jens Axboe 提交于 9月 28, 2020

This allows us to selectively flush out pending overflows, depending on
the task and/or files_struct being passed in.

No intended functional changes in this patch.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e6c8aa9a

io_uring: return cancelation status from poll/timeout/files handlers · 76e1b642

由 Jens Axboe 提交于 9月 26, 2020

Return whether we found and canceled requests or not. This is in
preparation for using this information, no functional changes in this
patch.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

76e1b642

io_uring: unconditionally grab req->task · e3bc8e9d

由 Jens Axboe 提交于 9月 24, 2020

Sometimes we assign a weak reference to it, sometimes we grab a
reference to it. Clean this up and make it unconditional, and drop the
flag related to tracking this state.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e3bc8e9d

io_uring: stash ctx task reference for SQPOLL · 2aede0e4

由 Jens Axboe 提交于 9月 14, 2020

We can grab a reference to the task instead of stashing away the task
files_struct. This is doable without creating a circular reference
between the ring fd and the task itself.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2aede0e4

io_uring: move dropping of files into separate helper · f573d384

由 Jens Axboe 提交于 9月 22, 2020

No functional changes in this patch, prep patch for grabbing references
to the files_struct.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f573d384

io_uring: allow timeout/poll/files killing to take task into account · f3606e3a

由 Jens Axboe 提交于 9月 22, 2020

We currently cancel these when the ring exits, and we cancel all of
them. This is in preparation for killing only the ones associated
with a given task.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f3606e3a

29 9月, 2020 1 次提交

io_uring: fix async buffered reads when readahead is disabled · c8d317aa

由 Hao Xu 提交于 9月 29, 2020

The async buffered reads feature is not working when readahead is
turned off. There are two things to concern:

- when doing retry in io_read, not only the IOCB_WAITQ flag but also
  the IOCB_NOWAIT flag is still set, which makes it goes to would_block
  phase in generic_file_buffered_read() and then return -EAGAIN. After
  that, the io-wq thread work is queued, and later doing the async
  reads in the old way.

- even if we remove IOCB_NOWAIT when doing retry, the feature is still
  not running properly, since in generic_file_buffered_read() it goes to
  lock_page_killable() after calling mapping->a_ops->readpage() to do
  IO, and thus causing process to sleep.

Fixes: 1a0a7853 ("mm: support async buffered reads in generic_file_buffered_read()")
Fixes: 3b2a4439 ("io_uring: get rid of kiocb_wait_page_queue_init()")
Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c8d317aa

28 9月, 2020 1 次提交

io_uring: fix potential ABBA deadlock in ->show_fdinfo() · fad8e0de

由 Jens Axboe 提交于 9月 28, 2020

syzbot reports a potential lock deadlock between the normal IO path and
->show_fdinfo():

======================================================
WARNING: possible circular locking dependency detected
5.9.0-rc6-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor.2/19710 is trying to acquire lock:
ffff888098ddc450 (sb_writers#4){.+.+}-{0:0}, at: io_write+0x6b5/0xb30 fs/io_uring.c:3296

but task is already holding lock:
ffff8880a11b8428 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0xe9a/0x1bd0 fs/io_uring.c:8348

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&ctx->uring_lock){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:956 [inline]
       __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
       __io_uring_show_fdinfo fs/io_uring.c:8417 [inline]
       io_uring_show_fdinfo+0x194/0xc70 fs/io_uring.c:8460
       seq_show+0x4a8/0x700 fs/proc/fd.c:65
       seq_read+0x432/0x1070 fs/seq_file.c:208
       do_loop_readv_writev fs/read_write.c:734 [inline]
       do_loop_readv_writev fs/read_write.c:721 [inline]
       do_iter_read+0x48e/0x6e0 fs/read_write.c:955
       vfs_readv+0xe5/0x150 fs/read_write.c:1073
       kernel_readv fs/splice.c:355 [inline]
       default_file_splice_read.constprop.0+0x4e6/0x9e0 fs/splice.c:412
       do_splice_to+0x137/0x170 fs/splice.c:871
       splice_direct_to_actor+0x307/0x980 fs/splice.c:950
       do_splice_direct+0x1b3/0x280 fs/splice.c:1059
       do_sendfile+0x55f/0xd40 fs/read_write.c:1540
       __do_sys_sendfile64 fs/read_write.c:1601 [inline]
       __se_sys_sendfile64 fs/read_write.c:1587 [inline]
       __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1587
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #1 (&p->lock){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:956 [inline]
       __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
       seq_read+0x61/0x1070 fs/seq_file.c:155
       pde_read fs/proc/inode.c:306 [inline]
       proc_reg_read+0x221/0x300 fs/proc/inode.c:318
       do_loop_readv_writev fs/read_write.c:734 [inline]
       do_loop_readv_writev fs/read_write.c:721 [inline]
       do_iter_read+0x48e/0x6e0 fs/read_write.c:955
       vfs_readv+0xe5/0x150 fs/read_write.c:1073
       kernel_readv fs/splice.c:355 [inline]
       default_file_splice_read.constprop.0+0x4e6/0x9e0 fs/splice.c:412
       do_splice_to+0x137/0x170 fs/splice.c:871
       splice_direct_to_actor+0x307/0x980 fs/splice.c:950
       do_splice_direct+0x1b3/0x280 fs/splice.c:1059
       do_sendfile+0x55f/0xd40 fs/read_write.c:1540
       __do_sys_sendfile64 fs/read_write.c:1601 [inline]
       __se_sys_sendfile64 fs/read_write.c:1587 [inline]
       __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1587
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (sb_writers#4){.+.+}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:2496 [inline]
       check_prevs_add kernel/locking/lockdep.c:2601 [inline]
       validate_chain kernel/locking/lockdep.c:3218 [inline]
       __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4441
       lock_acquire+0x1f3/0xaf0 kernel/locking/lockdep.c:5029
       percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
       __sb_start_write+0x228/0x450 fs/super.c:1672
       io_write+0x6b5/0xb30 fs/io_uring.c:3296
       io_issue_sqe+0x18f/0x5c50 fs/io_uring.c:5719
       __io_queue_sqe+0x280/0x1160 fs/io_uring.c:6175
       io_queue_sqe+0x692/0xfa0 fs/io_uring.c:6254
       io_submit_sqe fs/io_uring.c:6324 [inline]
       io_submit_sqes+0x1761/0x2400 fs/io_uring.c:6521
       __do_sys_io_uring_enter+0xeac/0x1bd0 fs/io_uring.c:8349
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

Chain exists of:
  sb_writers#4 --> &p->lock --> &ctx->uring_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ctx->uring_lock);
                               lock(&p->lock);
                               lock(&ctx->uring_lock);
  lock(sb_writers#4);

 *** DEADLOCK ***

1 lock held by syz-executor.2/19710:
 #0: ffff8880a11b8428 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0xe9a/0x1bd0 fs/io_uring.c:8348

stack backtrace:
CPU: 0 PID: 19710 Comm: syz-executor.2 Not tainted 5.9.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x198/0x1fd lib/dump_stack.c:118
 check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827
 check_prev_add kernel/locking/lockdep.c:2496 [inline]
 check_prevs_add kernel/locking/lockdep.c:2601 [inline]
 validate_chain kernel/locking/lockdep.c:3218 [inline]
 __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4441
 lock_acquire+0x1f3/0xaf0 kernel/locking/lockdep.c:5029
 percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
 __sb_start_write+0x228/0x450 fs/super.c:1672
 io_write+0x6b5/0xb30 fs/io_uring.c:3296
 io_issue_sqe+0x18f/0x5c50 fs/io_uring.c:5719
 __io_queue_sqe+0x280/0x1160 fs/io_uring.c:6175
 io_queue_sqe+0x692/0xfa0 fs/io_uring.c:6254
 io_submit_sqe fs/io_uring.c:6324 [inline]
 io_submit_sqes+0x1761/0x2400 fs/io_uring.c:6521
 __do_sys_io_uring_enter+0xeac/0x1bd0 fs/io_uring.c:8349
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45e179
Code: 3d b2 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 0b b2 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f1194e74c78 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
RAX: ffffffffffffffda RBX: 00000000000082c0 RCX: 000000000045e179
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000004
RBP: 000000000118cf98 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118cf4c
R13: 00007ffd1aa5756f R14: 00007f1194e759c0 R15: 000000000118cf4c

Fix this by just not diving into details if we fail to trylock the
io_uring mutex. We know the ctx isn't going away during this operation,
but we cannot safely iterate buffers/files/personalities if we don't
hold the io_uring mutex.

Reported-by: syzbot+2f8fa4e860edc3066aba@syzkaller.appspotmail.com
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fad8e0de

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功