- 02 Feb 2021, 33 commits
-
-
Submitted by noah
This patch adds support for skipping a file descriptor when using IORING_REGISTER_FILES_UPDATE. __io_sqe_files_update() will skip fds set to IORING_REGISTER_FILES_SKIP, which is in turn added as a #define in io_uring.h. Signed-off-by: noah <goldstein.w.n@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
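A minimal userspace sketch of how the skip value could be used, assuming the uapi header exports IORING_REGISTER_FILES_SKIP and that ring_fd already has three files registered (error handling omitted, names illustrative):

    #include <linux/io_uring.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Replace slots 0 and 2 of the registered file table; slot 1 is left
     * untouched by marking it with IORING_REGISTER_FILES_SKIP. */
    static int update_two_of_three(int ring_fd, int new_fd0, int new_fd2)
    {
            __s32 fds[3] = { new_fd0, IORING_REGISTER_FILES_SKIP, new_fd2 };
            struct io_uring_files_update up = {
                    .offset = 0,
                    .fds    = (__u64)(unsigned long)fds,
            };

            return syscall(__NR_io_uring_register, ring_fd,
                           IORING_REGISTER_FILES_UPDATE, &up, 3);
    }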
-
Submitted by Pavel Begunkov
Replace a while loop with a simple for loop; that looks way more natural and enables us to use "continue", as the indexes are no longer updated by hand at the end of the loop. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
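A generic illustration of the "continue" point (not the actual io_uring loop; skip_slot() and do_update() are placeholders):

    /* With a while loop, forgetting the manual increment before "continue"
     * hangs the loop. */
    static void update_all_while(int nr)
    {
            int i = 0;

            while (i < nr) {
                    if (skip_slot(i)) {
                            i++;            /* easy to forget */
                            continue;
                    }
                    do_update(i);
                    i++;
            }
    }

    /* With a for loop the increment always runs, so "continue" is safe. */
    static void update_all_for(int nr)
    {
            int i;

            for (i = 0; i < nr; i++) {
                    if (skip_slot(i))
                            continue;
                    do_update(i);
            }
    }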
-
Submitted by Pavel Begunkov
We grab a task reference for each request, and putting it also has to do extra work like inflight accounting and waking up that task. This sequence is duplicated several times, so it's a good time to add a helper. On top of that, the helper generates better code due to better locality, so it doesn't defeat alias analysis. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
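A rough sketch of what such a helper might look like; the name and the exact fields touched are assumptions based on this description, not the actual patch:

    /* Assumed shape only: drop @nr request references on @task, doing the
     * inflight accounting and the idle wakeup in one place. */
    static void io_put_task(struct task_struct *task, int nr)
    {
            struct io_uring_task *tctx = task->io_uring;

            percpu_counter_sub(&tctx->inflight, nr);
            if (unlikely(atomic_read(&tctx->in_idle)))
                    wake_up(&tctx->wait);
            put_task_struct_many(task, nr);
    }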
-
Submitted by Pavel Begunkov
For SQPOLL io_uring we want to have only one file note, held by sqo_task. Add a warning to make sure this holds. It's deep in io_uring_add_task_file(), out of the hot path, so it shouldn't hurt. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Yejune Deng
The function io_remove_personalities() is very similar to io_unregister_personality(), so implement io_remove_personalities() by calling io_unregister_personality(). Signed-off-by: Yejune Deng <yejune.deng@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jens Axboe
It's no longer used, as IORING_OP_CLOSE got rid of the need to flag it as uncancelable; kill it off. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jens Axboe
We currently split the close into two steps, in case we have a ->flush op that we can't safely handle from non-blocking context. This requires us to flag the op as uncancelable if we do need to punt it async, and that means special handling for just this op type. Use __close_fd_get_file() and grab the files lock so we can get the file and check whether we need to go async in one atomic operation. That gets rid of the need to split this into two steps, and hence of the need for IO_WQ_WORK_NO_CANCEL. Signed-off-by: Jens Axboe <axboe@kernel.dk>
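A loose sketch of the flow being described; function and field names are partly assumptions, and validation/cleanup is trimmed:

    /* Assumed shape only: with files->file_lock held we can inspect the file
     * and decide whether to go async before the fd is actually removed. */
    static int io_close(struct io_kiocb *req, bool force_nonblock)
    {
            struct files_struct *files = current->files;
            unsigned int fd = req->close.fd;
            struct file *file = NULL;
            struct fdtable *fdt;
            int ret;

            spin_lock(&files->file_lock);
            fdt = files_fdtable(files);
            file = (fd < fdt->max_fds) ? fdt->fd[fd] : NULL;
            if (file && file->f_op->flush && force_nonblock) {
                    /* ->flush() may block: drop the lock and retry from a worker */
                    spin_unlock(&files->file_lock);
                    return -EAGAIN;
            }
            ret = __close_fd_get_file(fd, &file);   /* assumes file_lock held */
            spin_unlock(&files->file_lock);

            if (!ret && file) {
                    ret = filp_close(file, files);  /* drops the fd table's reference */
                    fput(file);                     /* drops the reference we got back */
            }
            return ret;
    }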
-
Submitted by Jens Axboe
This assumes current->files->file_lock is already held on invocation, and helps the caller check the file before removing the fd, if it needs to. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Pavel Begunkov
When a request is completed with comp_state, its completion reference put is deferred to io_submit_flush_completions(), but the submission reference is put not far from there, so do both together to save one atomic dec per request. That targets requests that complete inline, e.g. buffered rw, send/recv. Proper benchmarking hasn't been conducted, but for nops (batch=32) it was around 7901 vs 8117 KIOPS (~2.7%), or ~4% per perf profiling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Pavel Begunkov
io_submit_flush_completions() is called down the stack in the _state version of io_req_complete(); that's OK because it is only called directly by io_uring opcode handler functions. Move it up to __io_queue_sqe() as preparation. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Pavel Begunkov
__io_req_complete() inlining is a bit weird: some compilers don't optimise out the non-NULL branch of it even when it is called as io_req_complete(). Help it a bit by extracting state and stateless helpers out of __io_req_complete(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Pavel Begunkov
Deduplicate the translation of timeout flags into an hrtimer mode. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
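The kind of helper this describes would look roughly like the following sketch (the name is an assumption; only the flag-to-mode mapping is implied by the message):

    static inline enum hrtimer_mode io_translate_timeout_mode(unsigned int flags)
    {
            return (flags & IORING_TIMEOUT_ABS) ? HRTIMER_MODE_ABS
                                                : HRTIMER_MODE_REL;
    }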
-
Submitted by Pavel Begunkov
When io_req_task_work_add() fails, the request will be cancelled by enqueueing it via the task_works of io-wq. Extract a function for that. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Pavel Begunkov
The check in io_state_file_put() is optimised pretty well when called from __io_file_get(). Don't pollute the code with all these variants. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Pavel Begunkov
Get rid of a label in io_alloc_req(); it's cleaner to return directly. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Pavel Begunkov
Apparently, there is one more place that hand-codes the calculation of the number of CQ events in the ring. Use the __io_cqring_events() helper in io_get_cqring() as well. Naturally, the assembly stays identical. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
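The calculation being deduplicated is just the tail/head difference; a sketch of the helper, assuming the usual ring field names:

    static inline unsigned int __io_cqring_events(struct io_ring_ctx *ctx)
    {
            /* CQEs posted (cached tail) minus CQEs the application has consumed */
            return ctx->cached_cq_tail - READ_ONCE(ctx->rings->cq.head);
    }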
-
Submitted by Pavel Begunkov
Inline it into its only user; that's cleaner. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Pavel Begunkov
The name is confusing and it's used only in one place. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Pavel Begunkov
personality_idr is usually synchronised by uring_lock; the exception would be removing personalities in io_ring_ctx_wait_and_kill(), which is legit as refs are killed by that point, but it would still be more resilient to do it under the lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Pavel Begunkov
It's awkward to pass a return value into a function just for it to return it back. Check it at the call site and clean up io_resubmit_prep() a bit. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Pavel Begunkov
The hot path is IO completing on the first try. Reshuffle io_rw_reissue() so that case is checked first. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Bijan Mottahedeh
Make the percpu ref release function names consistent between rsrc data and nodes. Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Bijan Mottahedeh
Create common alloc/free fixed_rsrc_data routines for both files and buffers. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> [remove buffer part] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Bijan Mottahedeh
Create common routines to be used for both files/buffers registration. [remove io_sqe_rsrc_set_node substitution] Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> [merge, quiesce only for files] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Pavel Begunkov
A simple prep patch allowing refnode callbacks to be set after the node is allocated. This is needed to 1) keep ourselves out of high-level functions where it's not pretty and they are not necessary, and 2) amortise ref_node allocation in the future, e.g. for updates. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Bijan Mottahedeh
Split alloc_fixed_file_ref_node into resource-generic and resource-specific parts, to be leveraged for fixed buffers. Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Bijan Mottahedeh
Encapsulate resource reference locking into separate routines. Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Bijan Mottahedeh
Uplevel ref_list and make it common to all resources. This is to allow one common ref_list to be used for both files and buffers in upcoming patches. Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Bijan Mottahedeh
Generalize io_queue_rsrc_removal to handle both files and buffers. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> [remove io_mapped_ubuf from rsrc tables/etc. for now] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Bijan Mottahedeh
This is a prep rename patch for subsequent patches that generalize file registration. [io_uring_rsrc_update: rename fds -> data] Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> [leave io_uring_files_update as struct] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Bijan Mottahedeh
Move the allocation of buffer management structures and the validation of buffers into separate routines. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Bijan Mottahedeh
Split io_sqe_buffer_register() into two routines: io_sqe_buffer_register(), which registers a single buffer, and io_sqe_buffers_register(), which iterates over all user-specified buffers. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
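A rough sketch of the split (the signatures are assumptions; validation, accounting and cleanup are omitted):

    /* Assumed shape only: the plural routine walks the user iovecs and hands
     * each one to the single-buffer routine. */
    static int io_sqe_buffers_register(struct io_ring_ctx *ctx,
                                       const struct iovec __user *arg,
                                       unsigned int nr_args)
    {
            unsigned int i;
            struct iovec iov;
            int ret = 0;

            for (i = 0; i < nr_args; i++) {
                    if (copy_from_user(&iov, &arg[i], sizeof(iov)))
                            return -EFAULT;
                    ret = io_sqe_buffer_register(ctx, &iov, &ctx->user_bufs[i]);
                    if (ret)
                            break;
            }
            return ret;
    }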
-
Submitted by Jens Axboe
Instead of being pessimistic and assuming that path lookup will block, use LOOKUP_CACHED to attempt just a cached lookup. This ensures that the fast path is always done inline, and we only punt to async context if IO is needed to satisfy the lookup. For forced nonblock open attempts, mark the file O_NONBLOCK over the actual ->open() call as well. We can safely clear this again before doing fd_install(), so it'll never be user visible that we fiddled with it.

This greatly improves the performance of file open where the dentry is already cached:

    Cached   5.10-git    5.10-git+LOOKUP_CACHED   Speedup
    ------------------------------------------------------
    33%      1,014,975   900,474                  1.1x
    89%      545,466     292,937                  1.9x
    100%     435,636     151,475                  2.9x

The more cache hot we are, the more the inline LOOKUP_CACHED optimization helps. This is unsurprising and expected, as the thread offload becomes a more dominant part of the total overhead. If we look at io_uring tracing, doing an IORING_OP_OPENAT on a file that isn't in the dentry cache will yield:

    275.550481: io_uring_create: ring 00000000ddda6278, fd 3 sq size 8, cq size 16, flags 0
    275.550491: io_uring_submit_sqe: ring 00000000ddda6278, op 18, data 0x0, non block 1, sq_thread 0
    275.550498: io_uring_queue_async_work: ring 00000000ddda6278, request 00000000c0267d17, flags 69760, normal queue, work 000000003d683991
    275.550502: io_uring_cqring_wait: ring 00000000ddda6278, min_events 1
    275.550556: io_uring_complete: ring 00000000ddda6278, user_data 0x0, result 4

which shows a failed nonblock lookup, then a punt to a worker, and then completion with fd == 4. This takes 65 usec in total. Re-running the same test case again:

    281.253956: io_uring_create: ring 0000000008207252, fd 3 sq size 8, cq size 16, flags 0
    281.253967: io_uring_submit_sqe: ring 0000000008207252, op 18, data 0x0, non block 1, sq_thread 0
    281.253973: io_uring_complete: ring 0000000008207252, user_data 0x0, result 4

shows the same request completing inline, also returning fd == 4. This takes 6 usec.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
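For reference, a minimal liburing sketch of the operation being measured here, an IORING_OP_OPENAT issued through the ring (assumes liburing is available; the path is arbitrary and error handling is omitted):

    #include <liburing.h>
    #include <fcntl.h>
    #include <stdio.h>

    int main(void)
    {
            struct io_uring ring;
            struct io_uring_sqe *sqe;
            struct io_uring_cqe *cqe;

            io_uring_queue_init(8, &ring, 0);

            sqe = io_uring_get_sqe(&ring);
            io_uring_prep_openat(sqe, AT_FDCWD, "/etc/hostname", O_RDONLY, 0);
            io_uring_submit(&ring);

            io_uring_wait_cqe(&ring, &cqe);
            printf("openat result: %d\n", cqe->res);   /* fd on success, -errno on failure */
            io_uring_cqe_seen(&ring, cqe);

            io_uring_queue_exit(&ring);
            return 0;
    }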
-
- 29 Jan 2021, 6 commits
-
-
Submitted by Ronnie Sahlberg
The new mount API requires additional changes to how DFS is handled. Additional testing of DFS uncovered problems with domain-based DFS referrals (a follow-on patch addresses DFS links), which this patch addresses. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
-
Submitted by Pavel Begunkov
What 84965ff8 ("io_uring: if we see flush on exit, cancel related tasks") really wants is to cancel all relevant REQ_F_INFLIGHT requests reliably. That can be achieved by io_uring_cancel_files(), but we'll miss it when calling io_uring_cancel_task_requests(files=NULL) from io_uring_flush(), because that goes through __io_uring_cancel_task_requests(). Just always call io_uring_cancel_files() during cancel; it's good enough for now. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Steve French
During additional testing of the updated cifs.ko with the new mount API support, we found a few additional cases where we were logging errors but not returning them to the user. For example: a) invalid security mechanisms, b) invalid cache options, c) unsupported rdma, d) an invalid smb dialect requested. Fixes: 24e0a1ef ("cifs: switch to new mount api") Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
-
Submitted by Pavel Begunkov
WARNING: CPU: 0 PID: 21359 at fs/io_uring.c:9042 io_uring_cancel_task_requests+0xe55/0x10c0 fs/io_uring.c:9042
Call Trace:
 io_uring_flush+0x47b/0x6e0 fs/io_uring.c:9227
 filp_close+0xb4/0x170 fs/open.c:1295
 close_files fs/file.c:403 [inline]
 put_files_struct fs/file.c:418 [inline]
 put_files_struct+0x1cc/0x350 fs/file.c:415
 exit_files+0x7e/0xa0 fs/file.c:435
 do_exit+0xc22/0x2ae0 kernel/exit.c:820
 do_group_exit+0x125/0x310 kernel/exit.c:922
 get_signal+0x427/0x20f0 kernel/signal.c:2773
 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
 handle_signal_work kernel/entry/common.c:147 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
 syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Now that io_uring_cancel_task_requests() can be called not through file notes but directly, remove a WARN_ONCE() there that gives us false positives. That check is not very important, and we catch it in other places.

Fixes: 84965ff8 ("io_uring: if we see flush on exit, cancel related tasks")
Cc: stable@vger.kernel.org # 5.9+
Reported-by: syzbot+3e3d9bd0c6ce9efbc3ef@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Pavel Begunkov
kernel BUG at lib/list_debug.c:29!
Call Trace:
 __list_add include/linux/list.h:67 [inline]
 list_add include/linux/list.h:86 [inline]
 io_file_get+0x8cc/0xdb0 fs/io_uring.c:6466
 __io_splice_prep+0x1bc/0x530 fs/io_uring.c:3866
 io_splice_prep fs/io_uring.c:3920 [inline]
 io_req_prep+0x3546/0x4e80 fs/io_uring.c:6081
 io_queue_sqe+0x609/0x10d0 fs/io_uring.c:6628
 io_submit_sqe fs/io_uring.c:6705 [inline]
 io_submit_sqes+0x1495/0x2720 fs/io_uring.c:6953
 __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9353
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

io_file_get() may be called from splice, and so REQ_F_INFLIGHT may already be set.

Fixes: 02a13674 ("io_uring: account io_uring internal files as REQ_F_INFLIGHT")
Cc: stable@vger.kernel.org # 5.9+
Reported-by: syzbot+6879187cf57845801267@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Steve French
The "prefixpath" mount option needs to be ignored, which was missed in the recent conversion to the new mount API (prefixpath would be set by the mount helper when mounting a subdirectory of the root of a share, e.g. //server/share/subdir). Fixes: 24e0a1ef ("cifs: switch to new mount api") Suggested-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
-
- 28 Jan 2021, 1 commit
-
-
Submitted by Adam Harvey
In 24e0a1ef, the noauto and auto options were missed when migrating to the new mount API. As a result, users with noauto in their fstab mount options are now unable to mount cifs filesystems, as they'll receive an "Unknown parameter" error. This restores the old behaviour of ignoring noauto and auto if they're given. Fixes: 24e0a1ef ("cifs: switch to new mount api") Signed-off-by: Adam Harvey <adam@adamharvey.name> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
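A hedged sketch of the kind of change this implies (not necessarily the exact cifs patch; the Opt_ignore name is assumed): with the new mount API, an option is only accepted if it has an entry in the filesystem's fs_parameter_spec table, so options that should be accepted-and-ignored need explicit entries plus a no-op parse case.

    /* Entries added to the filesystem's fs_parameter_spec table ... */
    fsparam_flag("auto",   Opt_ignore),
    fsparam_flag("noauto", Opt_ignore),

    /* ... and the corresponding no-op in the parse callback's switch */
    case Opt_ignore:
            break;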
-