Commits · 4e9067025259d1227c7f4f18a02c166c93e49290 · openeuler / Kernel

09 May, 2022 4 commits

io_uring: always use req->buf_index for the provided buffer group · 4e906702

Authored by Jens Axboe on Apr 28, 2022

The read/write opcodes use it already, but recv/recvmsg do not. If
we switch them over and read and validate this at init time while we're
checking if the opcode supports it anyway, then we can do it in one spot
and we don't have to pass in a separate group ID for io_buffer_select().
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4e906702

io_uring: ignore ->buf_index if REQ_F_BUFFER_SELECT isn't set · bb68d504

Authored by Jens Axboe on Apr 29, 2022

There's no point in validity checking buf_index if the request doesn't
have REQ_F_BUFFER_SELECT set, as we will never use it for that case.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bb68d504

io_uring: kill io_rw_buffer_select() wrapper · e5b00349

Authored by Jens Axboe on Apr 28, 2022

After the recent changes, this is a direct call to io_buffer_select()
anyway. With this change, there are no wrappers left for provided
buffer selection.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e5b00349

io_uring: make io_buffer_select() return the user address directly · c54d52c2

Authored by Jens Axboe on Apr 28, 2022

There's no point in having callers provide a kbuf, we're just returning
the address anyway.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c54d52c2

06 May, 2022 4 commits

io_uring: kill io_recv_buffer_select() wrapper · 9396ed85

Authored by Jens Axboe on Apr 28, 2022

It's just a thin wrapper around io_buffer_select(), get rid of it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9396ed85

io_uring: use 'sr' vs 'req->sr_msg' consistently · 0a352aaa

Authored by Jens Axboe on Apr 30, 2022

For all of send/sendmsg and recv/recvmsg we have the local 'sr' variable,
yet some cases still use req->sr_msg, which sr points to. Use 'sr'
consistently.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0a352aaa

io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg · 0455d4cc

Authored by Jens Axboe on Apr 26, 2022

If IORING_RECVSEND_POLL_FIRST is set for recv/recvmsg or send/sendmsg,
then we arm poll first rather than attempt a receive or send upfront.
This can be useful if we expect there to be no data (or space) available
for the request, as we can then avoid wasting time on the initial
issue attempt.
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0455d4cc
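
The poll-first decision described in this entry can be modelled in user space roughly as follows. This is only a sketch and not the kernel code: the `sketch_*` names, the flag values, and the `data_ready` field are hypothetical stand-ins introduced for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, chosen for this sketch only. */
#define SKETCH_POLL_FIRST (1u << 0) /* caller asked to arm poll before trying */
#define SKETCH_POLLED     (1u << 1) /* poll has already fired for this request */

struct sketch_req {
    unsigned int flags;
    bool data_ready; /* stands in for "the socket has data (or space)" */
};

/*
 * Returns -EAGAIN when the caller should arm poll and retry later,
 * 0 when the transfer was attempted and succeeded.
 */
static int sketch_issue(struct sketch_req *req)
{
    assert(req != NULL);

    /* POLL_FIRST: skip the initial attempt unless poll already triggered. */
    if ((req->flags & SKETCH_POLL_FIRST) && !(req->flags & SKETCH_POLLED))
        return -EAGAIN;

    /* Normal path: try the transfer, fall back to poll if nothing is ready. */
    if (!req->data_ready)
        return -EAGAIN;
    return 0;
}

/* Called by the modelled poll machinery once the socket becomes ready. */
static void sketch_poll_fired(struct sketch_req *req)
{
    assert(req != NULL);
    req->flags |= SKETCH_POLLED;
    req->data_ready = true;
}
```

In this model a request with `SKETCH_POLL_FIRST` set always gets `-EAGAIN` on its first issue, even if data happens to be ready, and only proceeds after `sketch_poll_fired()` has run.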

io_uring: check IOPOLL/ioprio support upfront · 73911426

Authored by Jens Axboe on Apr 26, 2022

Don't punt this check to the op prep handlers, add the support to
io_op_defs and we can check them while setting up the request.

This reduces the text size by 500 bytes on aarch64, and makes this less
fragile by having the check in one spot and needing opcodes to opt in
to IOPOLL or ioprio support.
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

73911426
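
A table-driven opt-in check of the kind this entry describes might look like the sketch below. The opcodes, the table contents, and the `sketch_init_req()` helper are hypothetical; the real io_op_defs table carries many more fields, and the per-opcode settings here are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical opcodes for this sketch. */
enum sketch_op { SKETCH_OP_NOP, SKETCH_OP_READ, SKETCH_OP_RECV, SKETCH_OP_LAST };

/* In this sketch the opcode is assumed to travel in a single byte. */
static_assert(SKETCH_OP_LAST <= 256, "opcode must fit in one byte");

struct sketch_op_def {
    unsigned int iopoll : 1; /* opcode may run on an IOPOLL ring */
    unsigned int ioprio : 1; /* opcode accepts a non-zero ioprio */
};

/* Opcodes opt in; anything not listed defaults to "unsupported". */
static const struct sketch_op_def sketch_op_defs[SKETCH_OP_LAST] = {
    [SKETCH_OP_NOP]  = { .iopoll = 1 },
    [SKETCH_OP_READ] = { .iopoll = 1, .ioprio = 1 },
    [SKETCH_OP_RECV] = { .ioprio = 1 },
};

/* One central check at request setup instead of one per prep handler. */
static int sketch_init_req(enum sketch_op op, int ring_is_iopoll,
                           unsigned short ioprio)
{
    const struct sketch_op_def *def;

    if ((unsigned int)op >= SKETCH_OP_LAST)
        return -EINVAL;
    def = &sketch_op_defs[op];

    if (ring_is_iopoll && !def->iopoll)
        return -EINVAL;
    if (ioprio && !def->ioprio)
        return -EINVAL;
    return 0;
}
```

Because every field defaults to zero, a newly added opcode is rejected for IOPOLL rings and non-zero ioprio until its table entry says otherwise, which is the opt-in property the commit message refers to.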

30 Apr, 2022 7 commits

io_uring: replace smp_mb() with smp_mb__after_atomic() in io_sq_thread() · f2e030dd

Authored by Almog Khaikin on Apr 26, 2022

The IORING_SQ_NEED_WAKEUP flag is now set using atomic_or(), which
implies a full barrier on some architectures but is not required to
do so. Use the more appropriate smp_mb__after_atomic(), which avoids the
extra barrier on those architectures.
Signed-off-by: Almog Khaikin <almogkh@gmail.com>
Link: https://lore.kernel.org/r/20220426163403.112692-1-almogkh@gmail.com
Fixes: 8018823e6987 ("io_uring: serialize ctx->rings->sq_flags with atomic_or/and")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f2e030dd

io_uring: add IORING_SETUP_TASKRUN_FLAG · ef060ea9

Authored by Jens Axboe on Apr 25, 2022

If IORING_SETUP_COOP_TASKRUN is set to use cooperative scheduling for
running task_work, then IORING_SETUP_TASKRUN_FLAG can be set so the
application can tell if task_work is pending in the kernel for this
ring. This allows use cases like io_uring_peek_cqe() to still function
appropriately, or for the task to know when it would be useful to
call io_uring_wait_cqe() to run pending events.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220426014904.60384-7-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ef060ea9

io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_COOP_TASKRUN is used · e1169f06

Authored by Jens Axboe on Apr 25, 2022

If this is set, io_uring will never use an IPI to deliver a task_work
notification. This can be used in the common case where a single task or
thread communicates with the ring, and doesn't rely on
io_uring_peek_cqe().

This provides a noticeable win in performance, both from eliminating
the IPI itself and from avoiding interrupting the submitting
task unnecessarily.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220426014904.60384-6-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e1169f06

io_uring: set task_work notify method at init time · 9f010507

Authored by Jens Axboe on Apr 25, 2022

While doing so, switch SQPOLL to TWA_SIGNAL_NO_IPI as well, as that
just does a task wakeup and then we can remove the special wakeup we
have in task_work_add.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220426014904.60384-5-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9f010507

io-wq: use __set_notify_signal() to wake workers · 6cf5862e

Authored by Jens Axboe on Apr 25, 2022

The only difference between set_notify_signal() and __set_notify_signal()
is that the former checks if it needs to deliver an IPI to force a
reschedule. As the io-wq workers never leave the kernel, an IPI is never
needed; they simply need a wakeup.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220426014904.60384-4-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6cf5862e

io_uring: serialize ctx->rings->sq_flags with atomic_or/and · 3a4b89a2

Authored by Jens Axboe on Apr 25, 2022

Rather than require ctx->completion_lock for ensuring that we don't
clobber the flags, use the atomic bitop helpers instead. This removes
the need to grab the completion_lock, in preparation for needing to set
or clear sq_flags when we don't know the status of this lock.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220426014904.60384-3-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3a4b89a2
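
As a user-space analogy for the lock-free flag updates described in this entry, the sketch below uses C11 `<stdatomic.h>` rather than the kernel's atomic_or()/atomic_and() helpers. The struct, the helper names, and the bit names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit names for this sketch. */
#define SKETCH_SQ_NEED_WAKEUP (1u << 0)
#define SKETCH_SQ_CQ_OVERFLOW (1u << 1)
#define SKETCH_SQ_TASKRUN     (1u << 2)

struct sketch_rings {
    atomic_uint sq_flags;
};

/* Set one flag bit without taking a lock; other bits are left untouched. */
static void sketch_set_flag(struct sketch_rings *r, unsigned int bit)
{
    assert(bit != 0 && (bit & (bit - 1)) == 0); /* exactly one bit */
    atomic_fetch_or(&r->sq_flags, bit);
}

/* Clear one flag bit without taking a lock. */
static void sketch_clear_flag(struct sketch_rings *r, unsigned int bit)
{
    assert(bit != 0 && (bit & (bit - 1)) == 0); /* exactly one bit */
    atomic_fetch_and(&r->sq_flags, ~bit);
}

static unsigned int sketch_read_flags(struct sketch_rings *r)
{
    return atomic_load(&r->sq_flags);
}
```

Each update is a single atomic read-modify-write, so two contexts setting or clearing different bits cannot overwrite each other's changes, which is the property a plain load, modify, store sequence would need a lock to provide.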

task_work: allow TWA_SIGNAL without a rescheduling IPI · e788be95

Authored by Jens Axboe on Apr 28, 2022

Some use cases don't always need an IPI when sending a TWA_SIGNAL
notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL, except
it doesn't send an IPI to the target task. It merely sets
TIF_NOTIFY_SIGNAL and wakes up the task.

This can be useful in avoiding a forceful transition to the kernel if the
task is running in userspace. Depending on the task_work in question, it
may be quite fine waiting for the next reschedule or kernel enter anyway,
or the use case may even have other mechanisms for hinting to the task
that a transition may be useful. This can drive more cooperative
scheduling of task_work.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/821f42b6-7d91-8074-8212-d34998097de4@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e788be95

26 Apr, 2022 1 commit

io_uring: fix compile warning for 32-bit builds · 69cc1b6f

Authored by Jens Axboe on Apr 25, 2022

If IO_URING_SCM_ALL isn't set, as it would not be on 32-bit builds,
then we trigger a warning:

fs/io_uring.c: In function '__io_sqe_files_unregister':
fs/io_uring.c:8992:13: warning: unused variable 'i' [-Wunused-variable]
 8992 |         int i;
      |             ^

Move the ifdef up to include the 'i' variable declaration.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 5e45690a ("io_uring: store SCM state in io_fixed_file->file_ptr")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

69cc1b6f

25 Apr, 2022 24 commits

io_uring: return an error when cqe is dropped · 155bc950

Authored by Dylan Yudaken on Apr 21, 2022

Right now io_uring will not actively inform userspace if a CQE is
dropped. This is extremely rare, requiring a CQ ring overflow as well as
a GFP_ATOMIC kmalloc failure. However, the consequences could, for
example, leave applications in an undefined state, possibly waiting for a
CQE that never arrives.

Return an error code (EBADR) in these cases. Since this is expected to be
incredibly rare, avoid affecting the hot code paths as much as possible:
the error is only returned lazily, when there are no other available
CQEs.

Once the error is returned, reset the error condition, assuming the user is
either ok with it or will clean up appropriately.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-6-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

155bc950
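
The lazy error reporting described in this entry can be sketched as a small user-space model. Everything here is an assumption made for illustration: the `sketch_*` names, the bit layout, and the simplification of "waiting" to a single function call. The fallback definition of `EBADR` uses the Linux value for platforms whose `<errno.h>` lacks it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EBADR
#define EBADR 53 /* Linux value, used only if <errno.h> does not define it */
#endif

#define SKETCH_CQ_OVERFLOW_BIT (1u << 0) /* overflowed CQEs are queued */
#define SKETCH_CQ_DROPPED_BIT  (1u << 1) /* at least one CQE was lost */

struct sketch_ctx {
    unsigned int check_cq; /* bitfield of the flags above */
    unsigned int cq_ready; /* CQEs currently available to reap */
};

/* Record that a CQE could not be stored anywhere. */
static void sketch_drop_cqe(struct sketch_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->check_cq |= SKETCH_CQ_DROPPED_BIT;
}

/*
 * Returns the number of available CQEs. Only when none are available and
 * a drop was recorded does it return -EBADR, clearing the condition so
 * the error is reported once.
 */
static int sketch_wait(struct sketch_ctx *ctx)
{
    assert(ctx != NULL);

    if (ctx->cq_ready)
        return (int)ctx->cq_ready;

    if (ctx->check_cq & SKETCH_CQ_DROPPED_BIT) {
        ctx->check_cq &= ~SKETCH_CQ_DROPPED_BIT;
        return -EBADR;
    }
    return 0;
}
```

The dropped bit is only consulted on the slow path where nothing is ready, so the common case of reaping available CQEs pays nothing for the extra check.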

io_uring: use constants for cq_overflow bitfield · 10988a0a

Authored by Dylan Yudaken on Apr 21, 2022

Prepare to use this bitfield for more flags by using constants instead of
the magic value 0.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-5-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

10988a0a

io_uring: rework io_uring_enter to simplify return value · 3e813c90

Authored by Dylan Yudaken on Apr 21, 2022

io_uring_enter returns the count submitted preferably over an error
code. In some code paths this check is not required, so reorganise the
code so that the check is only done as needed.
This is also a prep for returning error codes only in waiting scenarios.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-4-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3e813c90

io_uring: trace cqe overflows · 08dcd028

Authored by Dylan Yudaken on Apr 21, 2022

Trace cqe overflows in io_uring. Print ocqe before the check, so if it is
NULL it indicates that it has been dropped.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-3-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

08dcd028

io_uring: add trace support for CQE overflow · 47894438

Authored by Dylan Yudaken on Apr 21, 2022

Add a trace function for overflowing the CQ ring.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

47894438

io_uring: allow re-poll if we made progress · 10c87333

Authored by Jens Axboe on Apr 20, 2022

We currently check REQ_F_POLLED before arming async poll for a
notification to retry. If it's set, then we don't allow poll and will
punt to io-wq instead. This is done to prevent a situation where a buggy
driver will repeatedly return that there's space/data available yet we
get -EAGAIN.

However, if we already transferred data, then it should be safe to rely
on poll again. Gate the check on whether or not REQ_F_PARTIAL_IO is
also set.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

10c87333
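
The gating described in this entry reduces to a two-flag test. The sketch below is a simplified model with hypothetical `sketch_*` names and flag values, not the kernel's request structures.

```c
#include <assert.h>

/* Hypothetical flag values for this sketch. */
#define SKETCH_REQ_POLLED     (1u << 0) /* request was already poll-armed once */
#define SKETCH_REQ_PARTIAL_IO (1u << 1) /* some data was already transferred */

enum sketch_action { SKETCH_ARM_POLL, SKETCH_PUNT_TO_WQ };

/* Decide how to retry a request that just returned -EAGAIN. */
static enum sketch_action sketch_retry_action(unsigned int flags)
{
    /*
     * Already polled and still no progress: do not trust poll again.
     * If partial I/O happened, the earlier poll wakeup was genuine, so
     * arming poll a second time is considered safe.
     */
    if ((flags & SKETCH_REQ_POLLED) && !(flags & SKETCH_REQ_PARTIAL_IO))
        return SKETCH_PUNT_TO_WQ;
    return SKETCH_ARM_POLL;
}
```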

io_uring: support MSG_WAITALL for IORING_OP_SEND(MSG) · 4c3c0943

Authored by Jens Axboe on Apr 20, 2022

Like commit 7ba89d2a for recv/recvmsg, support MSG_WAITALL for the
send side. If this flag is set and we do a short send, retry for a
stream or seqpacket socket.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4c3c0943

io_uring: add support for IORING_ASYNC_CANCEL_ANY · 970f256e

Authored by Jens Axboe on Apr 18, 2022

Rather than match on a specific key, be it user_data or file, allow
canceling any request that we can look up. Works like
IORING_ASYNC_CANCEL_ALL in that it cancels multiple requests, but it
doesn't key off user_data or the file.

Can't be set with IORING_ASYNC_CANCEL_FD, as that's a key selector.
Only one may be used at a time.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220418164402.75259-6-axboe@kernel.dk

970f256e
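
The three cancel flags in this series (ALL, FD, ANY) interact as a key selector plus a match-count modifier. The sketch below models that over a plain array; the `sketch_*` names and flag values are hypothetical. The specific error codes, and the single-match mode returning 0 on success, are assumptions of this sketch rather than statements from the commit messages.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch. */
#define SKETCH_CANCEL_ALL (1u << 0) /* cancel every match, not just the first */
#define SKETCH_CANCEL_FD  (1u << 1) /* key off the fd instead of user_data */
#define SKETCH_CANCEL_ANY (1u << 2) /* match every request, no key at all */

struct sketch_pending {
    uint64_t user_data;
    int fd;
    int canceled;
};

struct sketch_cancel_data {
    unsigned int flags;
    uint64_t user_data; /* key when neither FD nor ANY is set */
    int fd;             /* key when FD is set */
};

static int sketch_match(const struct sketch_pending *p,
                        const struct sketch_cancel_data *cd)
{
    if (p->canceled)
        return 0;
    if (cd->flags & SKETCH_CANCEL_ANY)
        return 1;
    if (cd->flags & SKETCH_CANCEL_FD)
        return p->fd == cd->fd;
    return p->user_data == cd->user_data;
}

/*
 * Multi-match modes (ALL or ANY) return the number of requests canceled.
 * Single-match mode returns 0 on success or -ENOENT if nothing matched.
 */
static int sketch_cancel_reqs(struct sketch_pending *reqs, size_t nr,
                              const struct sketch_cancel_data *cd)
{
    int multi = (cd->flags & (SKETCH_CANCEL_ALL | SKETCH_CANCEL_ANY)) != 0;
    int found = 0;
    size_t i;

    /* FD selects a key and ANY means "no key", so the two conflict. */
    if ((cd->flags & SKETCH_CANCEL_ANY) && (cd->flags & SKETCH_CANCEL_FD))
        return -EINVAL;

    for (i = 0; i < nr; i++) {
        if (!sketch_match(&reqs[i], cd))
            continue;
        reqs[i].canceled = 1;
        found++;
        if (!multi)
            return 0;
    }
    return multi ? found : -ENOENT;
}
```

For example, with three pending requests of which two share fd 5, passing `SKETCH_CANCEL_ALL | SKETCH_CANCEL_FD` with `fd = 5` cancels both and returns 2, while `SKETCH_CANCEL_ANY` afterwards cancels whatever remains.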

io_uring: allow IORING_OP_ASYNC_CANCEL with 'fd' key · 4bf94615

Authored by Jens Axboe on Apr 18, 2022

Currently sqe->addr must contain the user_data of the request being
canceled. Introduce the IORING_ASYNC_CANCEL_FD flag, which tells the
kernel that we're keying off the file fd instead for cancelation. This
allows canceling any request that a) uses a file, and b) was assigned the
file based on the value being passed in.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220418164402.75259-5-axboe@kernel.dk

4bf94615

io_uring: add support for IORING_ASYNC_CANCEL_ALL · 8e29da69

Authored by Jens Axboe on Apr 18, 2022

The current cancelation will look up and cancel the first request it
finds based on the key passed in. Add a flag that allows canceling any
request that matches the key. It completes with the number of requests
found and canceled, or res < 0 if an error occurred.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220418164402.75259-4-axboe@kernel.dk

8e29da69

io_uring: pass in struct io_cancel_data consistently · b21432b4

Authored by Jens Axboe on Apr 18, 2022

In preparation for being able to not only key cancel off the user_data,
pass in the io_cancel_data struct for the various functions that deal
with request cancelation.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220418164402.75259-3-axboe@kernel.dk

b21432b4

io_uring: remove dead 'poll_only' argument to io_poll_cancel() · 98d3dcc8

Authored by Jens Axboe on Apr 18, 2022

It's only called from one location, and it always passes in 'false'.
Kill the argument, and just pass in 'false' to io_poll_find().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220418164402.75259-2-axboe@kernel.dk

98d3dcc8

io_uring: refactor io_disarm_next() locking · 81ec803b

Authored by Pavel Begunkov on Apr 20, 2022

Split timeout handling into removal + failing, so we can reduce
spinlocking time and remove another instance of triple nested locking.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0f00d115f9d4c5749028f19623708ad3695512d6.1650458197.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

81ec803b

io_uring: move timeout locking in io_timeout_cancel() · 3645c200

Authored by Pavel Begunkov on Apr 20, 2022

Move ->timeout_lock grabbing inside of io_timeout_cancel(), so
we can do io_req_task_queue_fail() outside of the lock. It's much nicer
than relying on triple nested locking.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cde758c2897930d31e205ed8f476d4ec879a8849.1650458197.git.asml.silence@gmail.com
[axboe: drop now wrong timeout_lock annotation]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3645c200

io_uring: store SCM state in io_fixed_file->file_ptr · 5e45690a

Authored by Jens Axboe on Apr 20, 2022

A previous commit removed SCM accounting for non-unix sockets, as those
are the only ones that can cause a fixed file reference. While that is
true, it also means we're now dereferencing the file as part of the
workqueue driven __io_sqe_files_unregister() after the process has
exited. This isn't safe for SCM files, as unix gc may have already
reaped them when the process exited. KASAN complains about this:

[   12.307040] Freed by task 0:
[   12.307592]  kasan_save_stack+0x28/0x4c
[   12.308318]  kasan_set_track+0x28/0x38
[   12.309049]  kasan_set_free_info+0x24/0x44
[   12.309890]  ____kasan_slab_free+0x108/0x11c
[   12.310739]  __kasan_slab_free+0x14/0x1c
[   12.311482]  slab_free_freelist_hook+0xd4/0x164
[   12.312382]  kmem_cache_free+0x100/0x1dc
[   12.313178]  file_free_rcu+0x58/0x74
[   12.313864]  rcu_core+0x59c/0x7c0
[   12.314675]  rcu_core_si+0xc/0x14
[   12.315496]  _stext+0x30c/0x414
[   12.316287]
[   12.316687] Last potentially related work creation:
[   12.317885]  kasan_save_stack+0x28/0x4c
[   12.318845]  __kasan_record_aux_stack+0x9c/0xb0
[   12.319976]  kasan_record_aux_stack_noalloc+0x10/0x18
[   12.321268]  call_rcu+0x50/0x35c
[   12.322082]  __fput+0x2fc/0x324
[   12.322873]  ____fput+0xc/0x14
[   12.323644]  task_work_run+0xac/0x10c
[   12.324561]  do_notify_resume+0x37c/0xe74
[   12.325420]  el0_svc+0x5c/0x68
[   12.326050]  el0t_64_sync_handler+0xb0/0x12c
[   12.326918]  el0t_64_sync+0x164/0x168
[   12.327657]
[   12.327976] Second to last potentially related work creation:
[   12.329134]  kasan_save_stack+0x28/0x4c
[   12.329864]  __kasan_record_aux_stack+0x9c/0xb0
[   12.330735]  kasan_record_aux_stack+0x10/0x18
[   12.331576]  task_work_add+0x34/0xf0
[   12.332284]  fput_many+0x11c/0x134
[   12.332960]  fput+0x10/0x94
[   12.333524]  __scm_destroy+0x80/0x84
[   12.334213]  unix_destruct_scm+0xc4/0x144
[   12.334948]  skb_release_head_state+0x5c/0x6c
[   12.335696]  skb_release_all+0x14/0x38
[   12.336339]  __kfree_skb+0x14/0x28
[   12.336928]  kfree_skb_reason+0xf4/0x108
[   12.337604]  unix_gc+0x1e8/0x42c
[   12.338154]  unix_release_sock+0x25c/0x2dc
[   12.338895]  unix_release+0x58/0x78
[   12.339531]  __sock_release+0x68/0xec
[   12.340170]  sock_close+0x14/0x20
[   12.340729]  __fput+0x18c/0x324
[   12.341254]  ____fput+0xc/0x14
[   12.341763]  task_work_run+0xac/0x10c
[   12.342367]  do_notify_resume+0x37c/0xe74
[   12.343086]  el0_svc+0x5c/0x68
[   12.343510]  el0t_64_sync_handler+0xb0/0x12c
[   12.344086]  el0t_64_sync+0x164/0x168

We have an extra bit we can use in file_ptr on 64-bit, use that to store
whether this file is SCM'ed or not, avoiding the need to look at the
file contents itself. This does mean that 32-bit will be stuck with SCM
for all registered files, just like 64-bit did before the referenced
commit.

Fixes: 1f59bc0f ("io_uring: don't scm-account for non af_unix sockets")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5e45690a
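
Storing a flag in a spare low bit of an aligned pointer, as this entry does for the SCM state, can be sketched in user space as below. The struct, the bit position, and the helper names are hypothetical; the sketch assumes 8-byte alignment so that the three low bits of a valid pointer are always zero.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical stand-in for a file object; aligned so low bits are free. */
struct sketch_file {
    alignas(8) int dummy;
};

#define SKETCH_FFS_SCM  ((uintptr_t)1 << 2) /* "this file is SCM'ed" */
#define SKETCH_FFS_MASK ((uintptr_t)7)      /* all low bits usable as flags */

/* Pack the pointer and the SCM flag into one word. */
static uintptr_t sketch_pack(struct sketch_file *f, int is_scm)
{
    uintptr_t v = (uintptr_t)f;

    assert((v & SKETCH_FFS_MASK) == 0); /* alignment guarantees free bits */
    return is_scm ? (v | SKETCH_FFS_SCM) : v;
}

/* Recover the pointer by masking the flag bits off. */
static struct sketch_file *sketch_file_of(uintptr_t file_ptr)
{
    return (struct sketch_file *)(file_ptr & ~SKETCH_FFS_MASK);
}

/* Query the flag without ever dereferencing the file itself. */
static int sketch_is_scm(uintptr_t file_ptr)
{
    return (file_ptr & SKETCH_FFS_SCM) != 0;
}
```

The point of `sketch_is_scm()` is that it reads only the packed word. That mirrors the fix described above, where the unregister path must not look inside a file that may already have been freed.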

io_uring: kill ctx arg from io_req_put_rsrc · 7ac1edc4

Authored by Pavel Begunkov on Apr 18, 2022

The ctx argument of io_req_put_rsrc() is not used, kill it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bb51bf3ff02775b03e6ea21bc79c25d7870d1644.1650311386.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7ac1edc4

io_uring: add a helper for putting rsrc nodes · 25a15d3c

Authored by Pavel Begunkov on Apr 18, 2022

Add a simple helper to encapsulate dropping rsrc node references.
It's cleaner and will help if we change rsrc refcounting or play with
percpu_ref_put() [no]inlining.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/63fdd953ac75898734cd50e8f69e95e6664f46fe.1650311386.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

25a15d3c

io_uring: store rsrc node in req instead of refs · c1bdf8ed

Authored by Pavel Begunkov on Apr 18, 2022

req->fixed_rsrc_refs keeps a pointer to the rsrc node's pcpu references, but
it's more natural to just store the rsrc node directly. There were some
reasons for that in the past, but not anymore.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cee1c86ec9023f3e4f6ce8940d58c017ef8782f4.1650311386.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c1bdf8ed

io_uring: refactor io_assign_file error path · 772f5e00

Authored by Pavel Begunkov on Apr 18, 2022

All io_assign_file() callers do error handling themselves, so
req_set_fail() in io_assign_file()'s fail path needlessly bloats the
kernel and is not the best abstraction to have. Simplify the error path.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/eff77fb1eac2b6a90cca5223813e6a396ffedec0.1650311386.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

772f5e00

io_uring: use right helpers for file assign locking · 93f052cb

Authored by Pavel Begunkov on Apr 18, 2022

We have io_ring_submit_[un]lock() functions helping us with conditional
->uring_lock locking, use them in io_file_get_fixed() instead of hand
coding.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c9c9ff1e046f6eb68da0a251962a697f8a2275fa.1650311386.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

93f052cb