提交 · b2e9685283127f30e7f2b466af0046ff9bd27a86 · openeuler / Kernel

11 10月, 2020 12 次提交

io_uring: keep a pointer ref_node in file_data · b2e96852

由 Pavel Begunkov 提交于 10月 10, 2020

->cur_refs of struct fixed_file_data always points to percpu_ref
embedded into struct fixed_file_ref_node. Don't overuse container_of()
and offsetting, and point directly to fixed_file_ref_node.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b2e96852

io_uring: refactor *files_register()'s error paths · 600cf3f8

由 Pavel Begunkov 提交于 10月 10, 2020

Don't keep repeating cleaning sequences in error paths, write it once
in the and use labels. It's less error prone and looks cleaner.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

600cf3f8

io_uring: clean file_data access in files_register · 5398ae69

由 Pavel Begunkov 提交于 10月 10, 2020

Keep file_data in a local var and replace with it complex references
such as ctx->file_data.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5398ae69

io_uring: don't delay io_init_req() error check · 692d8363

由 Pavel Begunkov 提交于 10月 10, 2020

Don't postpone io_init_req() error checks and do that right after
calling it. There is no control-flow statements or dependencies with
sqe/submitted accounting, so do those earlier, that makes the code flow
a bit more natural.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

692d8363

io_uring: clean leftovers after splitting issue · 062d04d7

由 Pavel Begunkov 提交于 10月 10, 2020

Kill extra if in io_issue_sqe() and place send/recv[msg] calls
appropriately under switch's cases.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

062d04d7

io_uring: remove timeout.list after hrtimer cancel · a71976f3

由 Pavel Begunkov 提交于 10月 10, 2020

Remove timeouts from ctx->timeout_list after hrtimer_try_to_cancel()
successfully cancels it. With this we don't need to care whether there
was a race and it was removed in io_timeout_fn(), and that will be handy
for following patches.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a71976f3

io_uring: use a separate struct for timeout_remove · 0bdf7a2d

由 Pavel Begunkov 提交于 10月 10, 2020

Don't use struct io_timeout for both IORING_OP_TIMEOUT and
IORING_OP_TIMEOUT_REMOVE, they're quite different. Split them in two,
that allows to remove an unused field in struct io_timeout, and btw kill
->flags not used by either. This also easier to follow, especially for
timeout remove.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0bdf7a2d

io_uring: improve submit_state.ios_left accounting · 71b547c0

由 Pavel Begunkov 提交于 10月 10, 2020

state->ios_left isn't decremented for requests that don't need a file,
so it might be larger than number of SQEs left. That in some
circumstances makes us to grab more files that is needed so imposing
extra put.
Deaccount one ios_left for each request.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

71b547c0

io_uring: simplify io_file_get() · 8371adf5

由 Pavel Begunkov 提交于 10月 10, 2020

Keep ->needs_file_no_error check out of io_file_get(), and let callers
handle it. It makes it more straightforward. Also, as the only error it
can hand back -EBADF, make it return a file or NULL.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8371adf5

io_uring: kill extra check in fixed io_file_get() · 479f517b

由 Pavel Begunkov 提交于 10月 10, 2020

ctx->nr_user_files == 0 IFF ctx->file_data == NULL and there fixed files
are not used. Hence, verifying fds only against ctx->nr_user_files is
enough. Remove the other check from hot path.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

479f517b

io_uring: clean up ->files grabbing · 23329513

由 Pavel Begunkov 提交于 10月 10, 2020

Move work.files grabbing into io_prep_async_work() to all other work
resources initialisation. We don't need to keep it separately now, as
->ring_fd/file are gone. It also allows to not grab it when a request
is not going to io-wq.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

23329513

io_uring: don't io_prep_async_work() linked reqs · 5bf5e464

由 Pavel Begunkov 提交于 10月 10, 2020

There is no real reason left for preparing io-wq work context for linked
requests in advance, remove it as this might become a bottleneck in some
cases.
Reported-by: NRoman Gershman <romger@amazon.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5bf5e464

09 10月, 2020 4 次提交

io_uring: Convert advanced XArray uses to the normal API · 5e2ed8c4

由 Matthew Wilcox (Oracle) 提交于 10月 09, 2020

There are no bugs here that I've spotted, it's just easier to use the
normal API and there are no performance advantages to using the more
verbose advanced API.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5e2ed8c4

io_uring: Fix XArray usage in io_uring_add_task_file · 236434c3

由 Matthew Wilcox (Oracle) 提交于 10月 09, 2020

The xas_store() wasn't paired with an xas_nomem() loop, so if it couldn't
allocate memory using GFP_NOWAIT, it would leak the reference to the file
descriptor.  Also the node pointed to by the xas could be freed between
the call to xas_load() under the rcu_read_lock() and the acquisition of
the xa_lock.

It's easier to just use the normal xa_load/xa_store interface here.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
[axboe: fix missing assign after alloc, cur_uring -> tctx rename]
Signed-off-by: NJens Axboe <axboe@kernel.dk>

236434c3

io_uring: Fix use of XArray in __io_uring_files_cancel · ce765372

由 Matthew Wilcox (Oracle) 提交于 10月 09, 2020

We have to drop the lock during each iteration, so there's no advantage
to using the advanced API. Convert this to a standard xa_for_each() loop.

Reported-by: syzbot+27c12725d8ff0bfe1a13@syzkaller.appspotmail.com
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ce765372

io_uring: fix break condition for __io_uring_register() waiting · ed6930c9

由 Jens Axboe 提交于 10月 08, 2020

Colin reports that there's unreachable code, since we only ever break
if ret == 0. This is correct, and is due to a reversed logic condition
in when to break or not.

Break out of the loop if we don't process any task work, in that case
we do want to return -EINTR.

Fixes: af9c1a44 ("io_uring: process task work in io_uring_register()")
Reported-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ed6930c9

08 10月, 2020 2 次提交

io_uring: no need to call xa_destroy() on empty xarray · ca6484cd

由 Jens Axboe 提交于 10月 08, 2020

The kernel test robot reports this lockdep issue:

[child1:659] mbind (274) returned ENOSYS, marking as inactive.
[child1:659] mq_timedsend (279) returned ENOSYS, marking as inactive.
[main] 10175 iterations. [F:7781 S:2344 HI:2397]
[   24.610601]
[   24.610743] ================================
[   24.611083] WARNING: inconsistent lock state
[   24.611437] 5.9.0-rc7-00017-g0f212204 #5 Not tainted
[   24.611861] --------------------------------
[   24.612193] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[   24.612660] ksoftirqd/0/7 [HC0[0]:SC1[3]:HE0:SE0] takes:
[   24.613086] f00ed998 (&xa->xa_lock#4){+.?.}-{2:2}, at: xa_destroy+0x43/0xc1
[   24.613642] {SOFTIRQ-ON-W} state was registered at:
[   24.614024]   lock_acquire+0x20c/0x29b
[   24.614341]   _raw_spin_lock+0x21/0x30
[   24.614636]   io_uring_add_task_file+0xe8/0x13a
[   24.614987]   io_uring_create+0x535/0x6bd
[   24.615297]   io_uring_setup+0x11d/0x136
[   24.615606]   __ia32_sys_io_uring_setup+0xd/0xf
[   24.615977]   do_int80_syscall_32+0x53/0x6c
[   24.616306]   restore_all_switch_stack+0x0/0xb1
[   24.616677] irq event stamp: 939881
[   24.616968] hardirqs last  enabled at (939880): [<8105592d>] __local_bh_enable_ip+0x13c/0x145
[   24.617642] hardirqs last disabled at (939881): [<81b6ace3>] _raw_spin_lock_irqsave+0x1b/0x4e
[   24.618321] softirqs last  enabled at (939738): [<81b6c7c8>] __do_softirq+0x3f0/0x45a
[   24.618924] softirqs last disabled at (939743): [<81055741>] run_ksoftirqd+0x35/0x61
[   24.619521]
[   24.619521] other info that might help us debug this:
[   24.620028]  Possible unsafe locking scenario:
[   24.620028]
[   24.620492]        CPU0
[   24.620685]        ----
[   24.620894]   lock(&xa->xa_lock#4);
[   24.621168]   <Interrupt>
[   24.621381]     lock(&xa->xa_lock#4);
[   24.621695]
[   24.621695]  *** DEADLOCK ***
[   24.621695]
[   24.622154] 1 lock held by ksoftirqd/0/7:
[   24.622468]  #0: 823bfb94 (rcu_callback){....}-{0:0}, at: rcu_process_callbacks+0xc0/0x155
[   24.623106]
[   24.623106] stack backtrace:
[   24.623454] CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 5.9.0-rc7-00017-g0f212204 #5
[   24.624090] Call Trace:
[   24.624284]  ? show_stack+0x40/0x46
[   24.624551]  dump_stack+0x1b/0x1d
[   24.624809]  print_usage_bug+0x17a/0x185
[   24.625142]  mark_lock+0x11d/0x1db
[   24.625474]  ? print_shortest_lock_dependencies+0x121/0x121
[   24.625905]  __lock_acquire+0x41e/0x7bf
[   24.626206]  lock_acquire+0x20c/0x29b
[   24.626517]  ? xa_destroy+0x43/0xc1
[   24.626810]  ? lock_acquire+0x20c/0x29b
[   24.627110]  _raw_spin_lock_irqsave+0x3e/0x4e
[   24.627450]  ? xa_destroy+0x43/0xc1
[   24.627725]  xa_destroy+0x43/0xc1
[   24.627989]  __io_uring_free+0x57/0x71
[   24.628286]  ? get_pid+0x22/0x22
[   24.628544]  __put_task_struct+0xf2/0x163
[   24.628865]  put_task_struct+0x1f/0x2a
[   24.629161]  delayed_put_task_struct+0xe2/0xe9
[   24.629509]  rcu_process_callbacks+0x128/0x155
[   24.629860]  __do_softirq+0x1a3/0x45a
[   24.630151]  run_ksoftirqd+0x35/0x61
[   24.630443]  smpboot_thread_fn+0x304/0x31a
[   24.630763]  kthread+0x124/0x139
[   24.631016]  ? sort_range+0x18/0x18
[   24.631290]  ? kthread_create_worker_on_cpu+0x17/0x17
[   24.631682]  ret_from_fork+0x1c/0x28

which is complaining about xa_destroy() grabbing the xa lock in an
IRQ disabling fashion, whereas the io_uring uses cases aren't interrupt
safe. This is really an xarray issue, since it should not assume the
lock type. But for our use case, since we know the xarray is empty at
this point, there's no need to actually call xa_destroy(). So just get
rid of it.

Fixes: 0f212204 ("io_uring: don't rely on weak ->files references")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ca6484cd

io_uring: batch account ->req_issue and task struct references · faf7b51c

由 Jens Axboe 提交于 10月 07, 2020

Identical to how we handle the ctx reference counts, increase by the
batch we're expecting to submit, and handle any slow path residual,
if any. The request alloc-and-issue path is very hot, and this makes
a noticeable difference by avoiding an two atomic incs for each
individual request.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

faf7b51c

01 10月, 2020 22 次提交

J
io_uring: kill callback_head argument for io_req_task_work_add() · 87c4311f
由 Jens Axboe 提交于 9月 30, 2020
```
We always use &req->task_work anyway, no point in passing it in.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
```
87c4311f

io_uring: move req preps out of io_issue_sqe() · c1379e24

由 Pavel Begunkov 提交于 9月 30, 2020

All request preparations are done only during submission, reflect it in
the code by moving io_req_prep() much earlier into io_queue_sqe().

That's much cleaner, because it doen't expose bits to async code which
it won't ever use. Also it makes the interface harder to misuse, and
there are potential places for bugs.

For instance, __io_queue() doesn't clear @sqe before proceeding to a
next linked request, that could have been disastrous, but hopefully
there are linked requests IFF sqe==NULL, so not actually a bug.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c1379e24

io_uring: decouple issuing and req preparation · bfe76559

由 Pavel Begunkov 提交于 9月 30, 2020

io_issue_sqe() does two things at once, trying to prepare request and
issuing them. Split it in two and deduplicate with io_defer_prep().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

bfe76559

io_uring: remove nonblock arg from io_{rw}_prep() · 73debe68

由 Pavel Begunkov 提交于 9月 30, 2020

All io_*_prep() functions including io_{read,write}_prep() are called
only during submission where @force_nonblock is always true. Don't keep
propagating it and instead remove the @force_nonblock argument
from prep() altogether.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

73debe68

io_uring: set/clear IOCB_NOWAIT into io_read/write · a88fc400

由 Pavel Begunkov 提交于 9月 30, 2020

Move setting IOCB_NOWAIT from io_prep_rw() into io_read()/io_write(), so
it's set/cleared in a single place. Also remove @force_nonblock
parameter from io_prep_rw().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a88fc400

io_uring: remove F_NEED_CLEANUP check in *prep() · 2d199895

由 Pavel Begunkov 提交于 9月 30, 2020

REQ_F_NEED_CLEANUP is set only by io_*_prep() and they're guaranteed to
be called only once, so there is no one who may have set the flag
before. Kill REQ_F_NEED_CLEANUP check in these *prep() handlers.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2d199895

io_uring: io_kiocb_ppos() style change · 5b09e37e

由 Pavel Begunkov 提交于 9月 30, 2020

Put brackets around bitwise ops in a complex expression
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5b09e37e

io_uring: simplify io_alloc_req() · 291b2821

由 Pavel Begunkov 提交于 9月 30, 2020

Extract common code from if/else branches. That is cleaner and optimised
even better.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

291b2821

io_uring: show sqthread pid and cpu in fdinfo · dbbe9c64

由 Joseph Qi 提交于 9月 29, 2020

In most cases we'll specify IORING_SETUP_SQPOLL and run multiple
io_uring instances in a host. Since all sqthreads are named
"io_uring-sq", it's hard to distinguish the relations between
application process and its io_uring sqthread.
With this patch, application can get its corresponding sqthread pid
and cpu through show_fdinfo.
Steps:
1. Get io_uring fd first.
$ ls -l /proc/<pid>/fd | grep -w io_uring
2. Then get io_uring instance related info, including corresponding
sqthread pid and cpu.
$ cat /proc/<pid>/fdinfo/<io_uring_fd>

pos:	0
flags:	02000002
mnt_id:	13
SqThread:	6929
SqThreadCpu:	2
UserFiles:	1
    0: testfile
UserBufs:	0
PollList:
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: NStefano Garzarella <sgarzare@redhat.com>
[axboe: fixed for new shared SQPOLL]
Signed-off-by: NJens Axboe <axboe@kernel.dk>

dbbe9c64

io_uring: process task work in io_uring_register() · af9c1a44

由 Jens Axboe 提交于 9月 24, 2020

We do this for CQ ring wait, in case task_work completions come in. We
should do the same in io_uring_register(), to avoid spurious -EINTR
if the ring quiescing ends up having to process task_work to complete
the operation
Reported-by: NDan Melnic <dmm@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

af9c1a44

io_uring: add blkcg accounting to offloaded operations · 91d8f519

由 Dennis Zhou 提交于 9月 16, 2020

There are a few operations that are offloaded to the worker threads. In
this case, we lose process context and end up in kthread context. This
results in ios to be not accounted to the issuing cgroup and
consequently end up as issued by root. Just like others, adopt the
personality of the blkcg too when issuing via the workqueues.

For the SQPOLL thread, it will live and attach in the inited cgroup's
context.
Signed-off-by: NDennis Zhou <dennis@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

91d8f519

io_uring: improve registered buffer accounting for huge pages · de293938

由 Jens Axboe 提交于 9月 17, 2020

io_uring does account any registered buffer as pinned/locked memory, and
checks limit and fails if the given user doesn't have a big enough limit
to register the ranges specified. However, if huge pages are used, we
are potentially under-accounting the memory in terms of what gets pinned
on the vm side.

This patch rectifies that, by ensuring that we account the full size of
a compound page, regardless of how much of it is being registered. Huge
pages are not accounted mulitple times - if multiple sections of a huge
page is registered, then the page is only accounted once.
Reported-by: NAndrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

de293938

io_uring: remove unneeded semicolon · 14db8411

由 Zheng Bin 提交于 9月 09, 2020

Fixes coccicheck warning:

fs/io_uring.c:4242:13-14: Unneeded semicolon
Signed-off-by: NZheng Bin <zhengbin13@huawei.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

14db8411

io_uring: cap SQ submit size for SQPOLL with multiple rings · e95eee2d

由 Jens Axboe 提交于 9月 08, 2020

In the spirit of fairness, cap the max number of SQ entries we'll submit
for SQPOLL if we have multiple rings. If we don't do that, we could be
submitting tons of entries for one ring, while others are waiting to get
service.

The value of 8 is somewhat arbitrarily chosen as something that allows
a fair bit of batching, without using an excessive time per ring.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e95eee2d

io_uring: get rid of req->io/io_async_ctx union · e8c2bc1f

由 Jens Axboe 提交于 8月 15, 2020

There's really no point in having this union, it just means that we're
always allocating enough room to cater to any command. But that's
pointless, as the ->io field is request type private anyway.

This gets rid of the io_async_ctx structure, and fills in the required
size in the io_op_defs[] instead.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e8c2bc1f

io_uring: kill extra user_bufs check · 4be1c615

由 Pavel Begunkov 提交于 9月 06, 2020

Testing ctx->user_bufs for NULL in io_import_fixed() is not neccessary,
because in that case ctx->nr_user_bufs would be zero, and the following
check would fail.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4be1c615

io_uring: fix overlapped memcpy in io_req_map_rw() · ab0b196c

由 Pavel Begunkov 提交于 9月 06, 2020

When io_req_map_rw() is called from io_rw_prep_async(), it memcpy()
iorw->iter into itself. Even though it doesn't lead to an error, such a
memcpy()'s aliasing rules violation is considered to be a bad practise.

Inline io_req_map_rw() into io_rw_prep_async(). We don't really need any
remapping there, so it's much simpler than the generic implementation.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ab0b196c

io_uring: refactor io_req_map_rw() · afb87658

由 Pavel Begunkov 提交于 9月 06, 2020

Set rw->free_iovec to @iovec, that gives an identical result and stresses
that @iovec param rw->free_iovec play the same role.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

afb87658

io_uring: simplify io_rw_prep_async() · f4bff104

由 Pavel Begunkov 提交于 9月 06, 2020

Don't touch iter->iov and iov in between __io_import_iovec() and
io_req_map_rw(), the former function aleady sets it correctly, because it
creates one more case with NULL'ed iov to consider in io_req_map_rw().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f4bff104

io_uring: provide IORING_ENTER_SQ_WAIT for SQPOLL SQ ring waits · 90554200

由 Jens Axboe 提交于 9月 03, 2020

When using SQPOLL, applications can run into the issue of running out of
SQ ring entries because the thread hasn't consumed them yet. The only
option for dealing with that is checking later, or busy checking for the
condition.

Provide IORING_ENTER_SQ_WAIT if applications want to wait on this
condition.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

90554200

J
io_uring: mark io_uring_fops/io_op_defs as __read_mostly · 738277ad
由 Jens Axboe 提交于 9月 03, 2020
```
These structures are never written, move them appropriately.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
```
738277ad

io_uring: enable IORING_SETUP_ATTACH_WQ to attach to SQPOLL thread too · aa06165d

由 Jens Axboe 提交于 9月 02, 2020

We support using IORING_SETUP_ATTACH_WQ to share async backends between
rings created by the same process, this now also allows the same to
happen with SQPOLL. The setup procedure remains the same, the caller
sets io_uring_params->wq_fd to the 'parent' context, and then the newly
created ring will attach to that async backend.

This means that multiple rings can share the same SQPOLL thread, saving
resources.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

aa06165d

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功