Commits · 900fad45dc75c8af6015bc514cc11aa3d265426a · openeuler / Kernel

20 Oct, 2020 8 commits

io_uring: fix racy REQ_F_LINK_TIMEOUT clearing · 900fad45

Committed by Pavel Begunkov on Oct 19, 2020

io_link_timeout_fn() removes REQ_F_LINK_TIMEOUT from the link head's
flags; this is not atomic and may race with what the head is doing.

If io_link_timeout_fn() doesn't clear the flag, as forced by this patch,
then it may happen that for "req -> link_timeout1 -> link_timeout2",
__io_kill_linked_timeout() would find link_timeout2 and try to cancel
it, thus miscounting references. Teach it to ignore such double timeouts
by marking the active one with a new flag in io_prep_linked_timeout().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: do poll's hash_node init in common code · 4d52f338

Committed by Pavel Begunkov on Oct 18, 2020

Move INIT_HLIST_NODE(&req->hash_node) into __io_arm_poll_handler(), so
that it isn't duplicated and the common poll code is responsible for
it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: inline io_poll_task_handler() · dd221f46

Committed by Pavel Begunkov on Oct 18, 2020

io_poll_task_handler() doesn't add clarity, inline it in its only user.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove extra ->file check in poll prep · 069b8938

Committed by Pavel Begunkov on Oct 18, 2020

io_poll_add_prep() doesn't need to verify ->file because it's already
done in io_init_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: make cached_cq_overflow non atomic_t · 2c3bac6d

Committed by Pavel Begunkov on Oct 18, 2020

ctx->cached_cq_overflow is changed only under completion_lock. Convert
it from atomic_t to just int, and mark all places where it's read without
the lock with READ_ONCE, which guarantees atomicity (relaxed ordering).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: inline io_fail_links() · d148ca4b

Committed by Pavel Begunkov on Oct 18, 2020

Inline io_fail_links() and kill extra io_cqring_ev_posted().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: kill ref get/drop in personality init · ec99ca6c

Committed by Pavel Begunkov on Oct 18, 2020

Don't take an identity on personality/creds init only to drop it a few
lines after. Extract a function which prepares req->work but leaves it
without identity.

Note: it's safe to not check REQ_F_WORK_INITIALIZED there because
nobody had a chance to init it before io_init_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: flags-based creds init in queue · 2e5aa6cb

Committed by Pavel Begunkov on Oct 18, 2020

Use IO_WQ_WORK_CREDS to figure out if req has creds to be used.
As of recent changes, it should rely only on the flags, not on the
value of work.creds.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

19 Oct, 2020 1 commit

io_uring: use blk_queue_nowait() to check if NOWAIT supported · 9ba0d0c8

Committed by Jeffle Xu on Oct 19, 2020

commit 021a2446 ("block: add QUEUE_FLAG_NOWAIT") adds a new helper
function blk_queue_nowait() to check if the bdev supports handling of
REQ_NOWAIT or not. Since then bio-based dm devices can also support
REQ_NOWAIT, and currently only dm-linear supports that since
commit 6abc4946 ("dm: add support for REQ_NOWAIT and enable it for
linear target").

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

17 Oct, 2020 17 commits

io_uring: fix double poll mask init · 58852d4d

Committed by Pavel Begunkov on Oct 16, 2020

__io_queue_proc() is used by both poll reqs and apoll. Don't use
req->poll.events to copy poll mask because for apoll it aliases with
private data of the request.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io-wq: inherit audit loginuid and sessionid · 4ea33a97

Committed by Jens Axboe on Oct 15, 2020

Make sure the async io-wq workers inherit the loginuid and sessionid from
the original task, and restore them to unset once we're done with the
async work item.

While at it, disable the ability for kernel threads to write to their own
loginuid.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: use percpu counters to track inflight requests · d8a6df10

Committed by Jens Axboe on Oct 15, 2020

Even though we place the req_issued and req_complete in separate
cachelines, there's considerable overhead in doing the atomics,
particularly on the completion side.

Get rid of having the two counters, and just use a percpu_counter for
this. That's what it was made for, after all. This considerably
reduces the overhead in __io_free_req().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
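
The idea of replacing a pair of shared atomic counters with a single sharded counter can be sketched in userspace C. This is only an illustration under stated assumptions: `sharded_counter`, `counter_add()`, `counter_sum()` and `NR_SHARDS` are hypothetical names, the sketch is single-threaded, and it is not the kernel's percpu_counter implementation. Each updater touches only its own slot, and the total is computed only when somebody asks for it.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SHARDS 4   /* hypothetical stand-in for the number of CPUs */

/* One slot per shard; an updater only ever touches its own slot. */
struct sharded_counter {
    long shard[NR_SHARDS];
};

/* Fast path: add n (may be negative) to the slot of the given shard. */
static void counter_add(struct sharded_counter *c, unsigned int cpu, long n)
{
    assert(c != NULL);
    c->shard[cpu % NR_SHARDS] += n;
}

/* Slow path: fold all slots into one total. */
static long counter_sum(const struct sharded_counter *c)
{
    long sum = 0;

    assert(c != NULL);
    for (int i = 0; i < NR_SHARDS; i++)
        sum += c->shard[i];
    return sum;
}
```

A request issued on one shard and completed on another leaves an individual slot negative, yet the sum still gives the number of requests in flight, which is why a single counter can stand in for an issued/completed pair.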

io_uring: assign new io_identity for task if members have changed · 500a373d

Committed by Jens Axboe on Oct 15, 2020

This avoids doing a copy for each new async IO, if some parts of the
io_identity have changed. We avoid reference counting for the normal
fast path of nothing ever changing.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: store io_identity in io_uring_task · 5c3462cf

Committed by Jens Axboe on Oct 15, 2020

This is, by definition, a per-task structure. So store it in the
task context, instead of carrying it in each io_kiocb. We're being
a bit inefficient if members have changed, as that requires an alloc and
copy of a new io_identity struct. The next patch will fix that up.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: COW io_identity on mismatch · 1e6fa521

Committed by Jens Axboe on Oct 15, 2020

If the io_identity doesn't completely match the task, then create a
copy of it and use that. The existing copy remains valid until the last
user of it has gone away.

This also changes the personality lookup to be indexed by io_identity,
instead of creds directly.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
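
The copy-on-mismatch scheme described above can be illustrated with a small reference-counted sketch. Everything here is an assumption made for illustration: `struct identity`, its `uid`/`fsid` fields, and the `identity_*()` helpers are hypothetical and do not mirror the kernel's io_identity layout, and the plain `int` refcount is only safe because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical identity record; the fields are illustrative only. */
struct identity {
    int refs;
    int uid;
    int fsid;
};

static struct identity *identity_new(int uid, int fsid)
{
    struct identity *id = malloc(sizeof(*id));

    if (id) {
        id->refs = 1;   /* reference held by the owner (the "task") */
        id->uid = uid;
        id->fsid = fsid;
    }
    return id;
}

/* Drop one reference; the last user frees the record. */
static void identity_put(struct identity *id)
{
    if (id && --id->refs == 0)
        free(id);
}

/*
 * Return an identity matching (uid, fsid) with one extra reference for
 * the caller. If *cur matches, it is reused. Otherwise a new copy
 * replaces *cur, and the old record stays valid until its remaining
 * users have dropped their references.
 */
static struct identity *identity_get(struct identity **cur, int uid, int fsid)
{
    struct identity *id = *cur;

    assert(id != NULL);
    if (id->uid != uid || id->fsid != fsid) {
        struct identity *copy = identity_new(uid, fsid);

        if (!copy)
            return NULL;
        identity_put(id);   /* owner lets go of the stale record */
        *cur = id = copy;
    }
    id->refs++;             /* reference handed to the caller */
    return id;
}
```

In the common case nothing changes and the same record is handed out repeatedly; only a mismatch pays for an allocation.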

io_uring: move io identity items into separate struct · 98447d65

Committed by Jens Axboe on Oct 14, 2020

io-wq contains a pointer to the identity, which we just hold in io_kiocb
for now. This is in preparation for putting this outside io_kiocb. The
only exception is struct files_struct, which we'll need different rules
for to avoid a circular dependency.

No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: rely solely on work flags to determine personality. · dfead8a8

Committed by Jens Axboe on Oct 14, 2020

We solely rely on work->work_flags now, so use that for proper checking
and clearing/dropping of various identity items.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: pass required context in as flags · 0f203765

Committed by Jens Axboe on Oct 14, 2020

We have a number of bits that decide what context to inherit. Set up
io-wq flags for these instead. This is in preparation for always having
the various members set, but not always needing them for all requests.

No intended functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fix error path cleanup in io_sqe_files_register() · 55cbc256

Committed by Jens Axboe on Oct 14, 2020

syzbot reports the following crash:

general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 PID: 8927 Comm: syz-executor.3 Not tainted 5.9.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:io_file_from_index fs/io_uring.c:5963 [inline]
RIP: 0010:io_sqe_files_register fs/io_uring.c:7369 [inline]
RIP: 0010:__io_uring_register fs/io_uring.c:9463 [inline]
RIP: 0010:__do_sys_io_uring_register+0x2fd2/0x3ee0 fs/io_uring.c:9553
Code: ec 03 49 c1 ee 03 49 01 ec 49 01 ee e8 57 61 9c ff 41 80 3c 24 00 0f 85 9b 09 00 00 4d 8b af b8 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 28 00 0f 85 76 09 00 00 49 8b 55 00 89 d8 c1 f8 09 48 98 4c
RSP: 0018:ffffc90009137d68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9000ef2a000
RDX: 0000000000040000 RSI: ffffffff81d81dd9 RDI: 0000000000000005
RBP: dffffc0000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffed1012882a37
R13: 0000000000000000 R14: ffffed1012882a38 R15: ffff888094415000
FS: 00007f4266f3c700(0000) GS:ffff8880ae500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000118c000 CR3: 000000008e57d000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45de59
Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f4266f3bc78 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
RAX: ffffffffffffffda RBX: 00000000000083c0 RCX: 000000000045de59
RDX: 0000000020000280 RSI: 0000000000000002 RDI: 0000000000000005
RBP: 000000000118bf68 R08: 0000000000000000 R09: 0000000000000000
R10: 40000000000000a1 R11: 0000000000000246 R12: 000000000118bf2c
R13: 00007fff2fa4f12f R14: 00007f4266f3c9c0 R15: 000000000118bf2c
Modules linked in:
---[ end trace 2a40a195e2d5e6e6 ]---
RIP: 0010:io_file_from_index fs/io_uring.c:5963 [inline]
RIP: 0010:io_sqe_files_register fs/io_uring.c:7369 [inline]
RIP: 0010:__io_uring_register fs/io_uring.c:9463 [inline]
RIP: 0010:__do_sys_io_uring_register+0x2fd2/0x3ee0 fs/io_uring.c:9553
Code: ec 03 49 c1 ee 03 49 01 ec 49 01 ee e8 57 61 9c ff 41 80 3c 24 00 0f 85 9b 09 00 00 4d 8b af b8 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 28 00 0f 85 76 09 00 00 49 8b 55 00 89 d8 c1 f8 09 48 98 4c
RSP: 0018:ffffc90009137d68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9000ef2a000
RDX: 0000000000040000 RSI: ffffffff81d81dd9 RDI: 0000000000000005
RBP: dffffc0000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffed1012882a37
R13: 0000000000000000 R14: ffffed1012882a38 R15: ffff888094415000
FS: 00007f4266f3c700(0000) GS:ffff8880ae400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000074a918 CR3: 000000008e57d000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

which is a copy of fget failure condition jumping to cleanup, but the
cleanup requires ctx->file_data to be assigned. Assign it during setup,
and ensure that we clear it again for the error path exit.

Fixes: 5398ae69 ("io_uring: clean file_data access in files_register")
Reported-by: syzbot+f4ebcc98223dafd8991e@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
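
The shape of this fix can be sketched in plain C: a cleanup helper reaches the table through the context, so the context pointer has to be assigned before any step that can jump to cleanup, and reset again when setup fails. The names `ring_ctx`, `release_slots()` and `register_files()` and the `fail` parameter are hypothetical and only loosely modelled on the description above; this is not the kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the structures named above. */
struct file_data {
    int *slots;
    unsigned int nr;
};

struct ring_ctx {
    struct file_data *file_data;
};

/* Cleanup helper that looks the table up through ctx. */
static void release_slots(struct ring_ctx *ctx)
{
    assert(ctx->file_data != NULL);   /* the crash condition if unset */
    free(ctx->file_data->slots);
    ctx->file_data->slots = NULL;
}

/* fail != 0 simulates a failure (e.g. a bad fd) after allocation. */
static int register_files(struct ring_ctx *ctx, unsigned int nr, int fail)
{
    struct file_data *data = calloc(1, sizeof(*data));

    if (!data)
        return -1;
    data->slots = calloc(nr, sizeof(*data->slots));
    if (!data->slots) {
        free(data);
        return -1;
    }
    data->nr = nr;

    /* Assign before any step whose error path goes through ctx. */
    ctx->file_data = data;

    if (fail)
        goto out_cleanup;
    return 0;

out_cleanup:
    release_slots(ctx);
    free(data);
    ctx->file_data = NULL;   /* do not leave a dangling pointer behind */
    return -1;
}
```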

Revert "io_uring: mark io_uring_fops/io_op_defs as __read_mostly" · 0918682b

Committed by Jens Axboe on Oct 13, 2020

This reverts commit 738277ad.

This change didn't make a lot of sense, and as Linus reports, it actually
fails on clang:

   /tmp/io_uring-dd40c4.s:26476: Warning: ignoring changed section
   attributes for .data..read_mostly

The arrays are already marked const so, by definition, they are not
just read-mostly, they are read-only.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fix REQ_F_COMP_LOCKED by killing it · 216578e5

Committed by Pavel Begunkov on Oct 13, 2020

REQ_F_COMP_LOCKED is used and implemented in a buggy way. The problem is
that the flag is set before io_put_req() but not cleared after, and if
that wasn't the final reference, the request will be freed with the flag
set from some other context, which may not hold a spinlock. That means
possible races with removing linked timeouts and unsynchronised
completion (e.g. access to CQ).

Instead of fixing REQ_F_COMP_LOCKED, kill the flag and use
task_work_add() to move such requests to a fresh context to free from
it, as was done with __io_free_req_finish().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: dig out COMP_LOCK from deep call chain · 4edf20f9

Committed by Pavel Begunkov on Oct 13, 2020

io_req_clean_work() checks REQ_F_COMP_LOCKED to pass this two layers up.
Move the check up into __io_free_req(), so at least it doesn't look so
ugly and would facilitate further changes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: don't put a poll req under spinlock · 6a0af224

Committed by Pavel Begunkov on Oct 13, 2020

Move io_put_req() in io_poll_task_handler() from under spinlock. This
eliminates the need to use REQ_F_COMP_LOCKED, at the expense of
potentially having to grab the lock again. That's still a better trade
off than relying on the locked flag.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: don't unnecessarily clear F_LINK_TIMEOUT · b1b74cfc

Committed by Pavel Begunkov on Oct 13, 2020

If a request had REQ_F_LINK_TIMEOUT it would've been cleared in
__io_kill_linked_timeout() by the time of __io_fail_links(), so no need
to care about it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: don't set COMP_LOCKED if won't put · 368c5481

Committed by Pavel Begunkov on Oct 13, 2020

__io_kill_linked_timeout() sets REQ_F_COMP_LOCKED for a linked timeout
even if it can't cancel it, e.g. it's already running. It not only races
with io_link_timeout_fn() for ->flags field, but also leaves the flag
set and so io_link_timeout_fn() may find it and decide that it holds the
lock. Hopefully, the second problem is only a potential one.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: Fix sizeof() mismatch · 035fbafc

Committed by Colin Ian King on Oct 12, 2020

An incorrect sizeof() is being used: sizeof(file_data->table) is not
correct, it should be sizeof(*file_data->table).

Fixes: 5398ae69 ("io_uring: clean file_data access in files_register")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
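
The difference between the two sizeof() forms is easy to demonstrate outside the kernel. In the sketch below, `struct file_table`, `struct file_data` and the helper functions are hypothetical stand-ins chosen for illustration, not the kernel's definitions: sizeof applied to the pointer member yields the size of a pointer, while sizeof applied to the dereferenced member yields the size of one element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct file_table {
    void *files[64];
};

struct file_data {
    struct file_table *table;   /* points to an array of tables */
};

/* Mismatched form: this is the size of the pointer itself. */
static size_t wrong_elem_size(const struct file_data *fd)
{
    return sizeof(fd->table);
}

/* Intended form: this is the size of one pointed-to element. */
static size_t right_elem_size(const struct file_data *fd)
{
    return sizeof(*fd->table);
}

/* Allocate nr zeroed tables, sizing each element from the pointee. */
static int alloc_tables(struct file_data *fd, size_t nr)
{
    assert(fd != NULL);
    fd->table = calloc(nr, sizeof(*fd->table));
    return fd->table ? 0 : -1;
}
```

Sizing the allocation from `*fd->table` also keeps it correct if the element type is changed later.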

11 Oct, 2020 12 commits

io_uring: keep a pointer ref_node in file_data · b2e96852

Committed by Pavel Begunkov on Oct 10, 2020

->cur_refs of struct fixed_file_data always points to percpu_ref
embedded into struct fixed_file_ref_node. Don't overuse container_of()
and offsetting, and point directly to fixed_file_ref_node.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: refactor *files_register()'s error paths · 600cf3f8

Committed by Pavel Begunkov on Oct 10, 2020

Don't keep repeating cleaning sequences in error paths, write it once
at the end and use labels. It's less error prone and looks cleaner.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
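
The label-based unwinding pattern mentioned above looks roughly like the following in plain C. This is a generic sketch, not the code from the patch: `struct ctx`, `ctx_setup()`, `ctx_teardown()` and the `fail_at` test hook are hypothetical. Each failure jumps to the label that undoes exactly what has been acquired so far, so the cleanup sequence is written once.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resource holder, for illustration only. */
struct ctx {
    int *a;
    int *b;
    int *c;
};

/*
 * Acquire three resources in order. fail_at (1..3) forces the matching
 * step to fail so the unwind path can be exercised; 0 forces nothing.
 */
static int ctx_setup(struct ctx *ctx, int fail_at)
{
    assert(ctx != NULL);

    ctx->a = (fail_at == 1) ? NULL : malloc(sizeof(*ctx->a));
    if (!ctx->a)
        goto out;

    ctx->b = (fail_at == 2) ? NULL : malloc(sizeof(*ctx->b));
    if (!ctx->b)
        goto out_free_a;

    ctx->c = (fail_at == 3) ? NULL : malloc(sizeof(*ctx->c));
    if (!ctx->c)
        goto out_free_b;

    return 0;

    /* Labels unwind in reverse order of acquisition. */
out_free_b:
    free(ctx->b);
    ctx->b = NULL;
out_free_a:
    free(ctx->a);
    ctx->a = NULL;
out:
    return -1;
}

/* Release everything after a successful setup. */
static void ctx_teardown(struct ctx *ctx)
{
    free(ctx->c);
    free(ctx->b);
    free(ctx->a);
    ctx->a = ctx->b = ctx->c = NULL;
}
```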

io_uring: clean file_data access in files_register · 5398ae69

Committed by Pavel Begunkov on Oct 10, 2020

Keep file_data in a local var and use it to replace complex references
such as ctx->file_data.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: don't delay io_init_req() error check · 692d8363

Committed by Pavel Begunkov on Oct 10, 2020

Don't postpone io_init_req() error checks and do that right after
calling it. There are no control-flow statements or dependencies with
sqe/submitted accounting, so do those earlier, that makes the code flow
a bit more natural.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: clean leftovers after splitting issue · 062d04d7

Committed by Pavel Begunkov on Oct 10, 2020

Kill extra if in io_issue_sqe() and place send/recv[msg] calls
appropriately under switch's cases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove timeout.list after hrtimer cancel · a71976f3

Committed by Pavel Begunkov on Oct 10, 2020

Remove timeouts from ctx->timeout_list after hrtimer_try_to_cancel()
successfully cancels it. With this we don't need to care whether there
was a race and it was removed in io_timeout_fn(), and that will be handy
for following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: use a separate struct for timeout_remove · 0bdf7a2d

Committed by Pavel Begunkov on Oct 10, 2020

Don't use struct io_timeout for both IORING_OP_TIMEOUT and
IORING_OP_TIMEOUT_REMOVE, they're quite different. Split them in two,
which allows removing an unused field in struct io_timeout, and btw kill
->flags not used by either. This is also easier to follow, especially for
timeout remove.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: improve submit_state.ios_left accounting · 71b547c0

Committed by Pavel Begunkov on Oct 10, 2020

state->ios_left isn't decremented for requests that don't need a file,
so it might be larger than the number of SQEs left. In some
circumstances that makes us grab more files than needed, imposing an
extra put.
Deaccount one ios_left for each request.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: simplify io_file_get() · 8371adf5

Committed by Pavel Begunkov on Oct 10, 2020

Keep ->needs_file_no_error check out of io_file_get(), and let callers
handle it. It makes it more straightforward. Also, as the only error it
can hand back is -EBADF, make it return a file or NULL.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: kill extra check in fixed io_file_get() · 479f517b

Committed by Pavel Begunkov on Oct 10, 2020

ctx->nr_user_files == 0 IFF ctx->file_data == NULL, and then fixed files
are not used. Hence, verifying fds only against ctx->nr_user_files is
enough. Remove the other check from hot path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: clean up ->files grabbing · 23329513

Committed by Pavel Begunkov on Oct 10, 2020

Move work.files grabbing into io_prep_async_work(), alongside all other
work resources initialisation. We don't need to keep it separately now, as
->ring_fd/file are gone. It also allows not grabbing it when a request
is not going to io-wq.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: don't io_prep_async_work() linked reqs · 5bf5e464

Committed by Pavel Begunkov on Oct 10, 2020

There is no real reason left for preparing io-wq work context for linked
requests in advance, remove it as this might become a bottleneck in some
cases.

Reported-by: Roman Gershman <romger@amazon.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

09 Oct, 2020 2 commits

io_uring: Convert advanced XArray uses to the normal API · 5e2ed8c4

Committed by Matthew Wilcox (Oracle) on Oct 09, 2020

There are no bugs here that I've spotted, it's just easier to use the
normal API and there are no performance advantages to using the more
verbose advanced API.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: Fix XArray usage in io_uring_add_task_file · 236434c3

Committed by Matthew Wilcox (Oracle) on Oct 09, 2020

The xas_store() wasn't paired with an xas_nomem() loop, so if it couldn't
allocate memory using GFP_NOWAIT, it would leak the reference to the file
descriptor.  Also the node pointed to by the xas could be freed between
the call to xas_load() under the rcu_read_lock() and the acquisition of
the xa_lock.

It's easier to just use the normal xa_load/xa_store interface here.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[axboe: fix missing assign after alloc, cur_uring -> tctx rename]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

openeuler / Kernel synced successfully more than 1 year ago
