提交 · 343ae23049ddc7712d4bdd72d6a5afb8c653c1e2 · openanolis / cloud-kernel

28 5月, 2020 40 次提交

io_uring: fix deferred req iovec leak · 343ae230

由 Pavel Begunkov 提交于 2月 06, 2020

to #26323588

commit 1e95081cb5b4cf77065d37866f57cf3c90a3df78 upstream.

After defer, a request will be prepared, that includes allocating iovec
if needed, and then submitted through io_wq_submit_work() but not custom
handler (e.g. io_rw_async()/io_sendrecv_async()). However, it'll leak
iovec, as it's in io-wq and the code goes as follows:

io_read() {
	if (!io_wq_current_is_worker())
		kfree(iovec);
}

Put all deallocation logic in io_{read,write,send,recv}(), which will
leave the memory, if going async with -EAGAIN.

It also fixes a leak after failed io_alloc_async_ctx() in
io_{recv,send}_msg().

Cc: stable@vger.kernel.org # 5.5
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

343ae230

io_uring: fix 1-bit bitfields to be unsigned · ced6d31b

由 Randy Dunlap 提交于 2月 05, 2020

to #26323588

commit e1d85334d62386e9503e4a0d5d022e2d8e0011a0 upstream.

Make bitfields of size 1 bit be unsigned (since there is no room
for the sign bit).
This clears up the sparse warnings:

  CHECK   ../fs/io_uring.c
../fs/io_uring.c:207:50: error: dubious one-bit signed bitfield
../fs/io_uring.c:208:55: error: dubious one-bit signed bitfield
../fs/io_uring.c:209:63: error: dubious one-bit signed bitfield
../fs/io_uring.c:210:54: error: dubious one-bit signed bitfield
../fs/io_uring.c:211:57: error: dubious one-bit signed bitfield

Found by sight and then verified with sparse.

Fixes: 69b3e546139a ("io_uring: change io_ring_ctx bool fields into bit fields")
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

ced6d31b

io_uring: get rid of delayed mm check · e181f2f8

由 Pavel Begunkov 提交于 2月 06, 2020

to #26323588

commit 1cb1edb2f5ba8a3e8d47ded391007c6fe3ac0ad7 upstream.

Fail fast if can't grab mm, so past that requests always have an mm
when required. This allows us to remove req->user altogether.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

e181f2f8

io_uring: cleanup fixed file data table references · 93ba78fd

由 Jens Axboe 提交于 2月 04, 2020

to #26323588

commit 2faf852d1be8a4960d328492298da6448cca0279 upstream.

syzbot reports a use-after-free in io_ring_file_ref_switch() when it
tries to switch back to percpu mode. When we put the final reference to
the table by calling percpu_ref_kill_and_confirm(), we don't want the
zero reference to queue async work for flushing the potentially queued
up items. We currently do a few flush_work(), but they merely paper
around the issue, since the work item may not have been queued yet
depending on the when the percpu-ref callback gets run.

Coming into the file unregister, we know we have the ring quiesced.
io_ring_file_ref_switch() can check for whether or not the ref is dying
or not, and not queue anything async at that point. Once the ref has
been confirmed killed, flush any potential items manually.

Reported-by: syzbot+7caeaea49c2c8a591e3d@syzkaller.appspotmail.com
Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update")
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

93ba78fd

io_uring: spin for sq thread to idle on shutdown · 74b46d93

由 Jens Axboe 提交于 2月 04, 2020

to #26323588

commit df069d80c8e38c19531c392322e9a16617475c44 upstream.

As part of io_uring shutdown, we cancel work that is pending and won't
necessarily complete on its own. That includes requests like poll
commands and timeouts.

If we're using SQPOLL for kernel side submission and we shutdown the
ring immediately after queueing such work, we can race with the sqthread
doing the submission. This means we may miss cancelling some work, which
results in the io_uring shutdown hanging forever.

Cc: stable@vger.kernel.org
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

74b46d93

io_uring: put the flag changing code in the same spot · a7c70de9

由 Pavel Begunkov 提交于 2月 01, 2020

to #26323588

commit 3e577dcd73a1fdc641bf45e5ea4a37869de221b5 upstream.

Both iocb_flags() and kiocb_set_rw_flags() are inline and modify
kiocb->ki_flags. Place them close, so they can be potentially better
optimised.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

a7c70de9

io_uring: iterate req cache backwards · f80ed44f

由 Pavel Begunkov 提交于 2月 01, 2020

to #26323588

commit 6c8a31346925cbb373f84a18428ab3df59d3950e upstream.

Grab requests from cache-array from the end, so can get by only
free_reqs.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

f80ed44f

io_uring: punt even fadvise() WILLNEED to async context · a8af11c5

由 Jens Axboe 提交于 2月 01, 2020

to #26323588

commit 3e69426da2599677ebbe76e2d97a606c4797bd74 upstream.

Andres correctly points out that read-ahead can block, if it needs to
read in meta data (or even just through the page cache page allocations).
Play it safe for now and just ensure WILLNEED is also punted to async
context.

While in there, allow the file settings hints from non-blocking
context. They don't need to start/do IO, and we can safely do them
inline.

Fixes: 4840e418c2fc ("io_uring: add IORING_OP_FADVISE")
Reported-by: NAndres Freund <andres@anarazel.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

a8af11c5

io_uring: fix sporadic double CQE entry for close · f809fef1

由 Jens Axboe 提交于 1月 31, 2020

to #26323588

commit 1a417f4e618e05fba29ba222f1e8555c302376ce upstream.

We punt close to async for the final fput(), but we log the completion
even before that even in that case. We rely on the request not having
a files table assigned to detect what the final async close should do.
However, if we punt the async queue to __io_queue_sqe(), we'll get
->files assigned and this makes io_close_finish() think it should both
close the filp again (which does no harm) AND log a new CQE event for
this request. This causes duplicate CQEs.

Queue the request up for async manually so we don't grab files
needlessly and trigger this condition.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

f809fef1

io_uring: remove extra ->file check · e0280e19

由 Pavel Begunkov 提交于 2月 01, 2020

to #26323588

commit 9250f9ee194dc3dcee28a42a1533fa2cc0edd215 upstream.

It won't ever get into io_prep_rw() when req->file haven't been set in
io_req_set_file(), hence remove the check.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

e0280e19

io_uring: don't map read/write iovec potentially twice · 4bde6717

由 Jens Axboe 提交于 1月 31, 2020

to #26323588

commit 5d204bcfa09330972ad3428a8f81c23f371d3e6d upstream.

If we have a read/write that is deferred, we already setup the async IO
context for that request, and mapped it. When we later try and execute
the request and we get -EAGAIN, we don't want to attempt to re-map it.
If we do, we end up with garbage in the iovec, which typically leads
to an -EFAULT or -EINVAL completion.

Cc: stable@vger.kernel.org # 5.5
Reported-by: NDan Melnic <dmm@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

4bde6717

io_uring: use the proper helpers for io_send/recv · a44f9077

由 Jens Axboe 提交于 1月 31, 2020

to #26323588

commit 0b7b21e42ba2d6ac9595a4358a9354249605a3af upstream.

Don't use the recvmsg/sendmsg helpers, use the same helpers that the
recv(2) and send(2) system calls use.
Reported-by: N李通洲 <carter.li@eoitek.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

a44f9077

io_uring: prevent potential eventfd recursion on poll · 0c6f7422

由 Jens Axboe 提交于 2月 01, 2020

to #26323588

commit f0b493e6b9a8959356983f57112229e69c2f7b8c upstream.

If we have nested or circular eventfd wakeups, then we can deadlock if
we run them inline from our poll waitqueue wakeup handler. It's also
possible to have very long chains of notifications, to the extent where
we could risk blowing the stack.

Check the eventfd recursion count before calling eventfd_signal(). If
it's non-zero, then punt the signaling to async context. This is always
safe, as it takes us out-of-line in terms of stack and locking context.

Cc: stable@vger.kernel.org # 5.1+
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

0c6f7422

io_uring: add BUILD_BUG_ON() to assert the layout of struct io_uring_sqe · 955781d9

由 Stefan Metzmacher 提交于 1月 29, 2020

to #26323588

commit d7f62e825fd19202a0749d10fb439714c51f67d2 upstream.

With nesting of anonymous unions and structs it's hard to
review layout changes. It's better to ask the compiler
for these things.
Signed-off-by: NStefan Metzmacher <metze@samba.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

955781d9

io_uring: add ->show_fdinfo() for the io_uring file descriptor · 80f00261

由 Jens Axboe 提交于 1月 30, 2020

to #26323588

commit 87ce955b24c9940cb2ca7e5173fcf175578d9fe9 upstream.

It can be hard to know exactly what is registered with the ring.
Especially for credentials, it'd be handy to be able to see which
ones are registered, what personalities they have, and what the ID
of each of them is.

This adds support for showing information registered in the ring from
the fdinfo of the io_uring fd. Here's an example from a test case that
registers 4 files (two of them sparse), 4 buffers, and 2 personalities:

pos:	0
flags:	02000002
mnt_id:	14
UserFiles:	4
    0: file-no-1
    1: file-no-2
    2: <none>
    3: <none>
UserBufs:	4
    0: 0x563817c46000/128
    1: 0x563817c47000/256
    2: 0x563817c48000/512
    3: 0x563817c49000/1024
Personalities:
    1
	Uid:	0		0		0		0
	Gid:	0		0		0		0
	Groups:	0
	CapEff:	0000003fffffffff
    2
	Uid:	0		0		0		0
	Gid:	0		0		0		0
	Groups:	0
	CapEff:	0000003fffffffff
Suggested-by: NJann Horn <jannh@google.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

80f00261

io_uring: add support for epoll_ctl(2) · c9166992

由 Jens Axboe 提交于 1月 08, 2020

to #26323588

commit 3e4827b05d2ac2d377ed136a52829ec46787bf4b upstream.

This adds IORING_OP_EPOLL_CTL, which can perform the same work as the
epoll_ctl(2) system call.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

c9166992

io_uring: fix linked command file table usage · c3c1561c

由 Jens Axboe 提交于 1月 29, 2020

to #26323588

commit f86cd20c9454847a524ddbdcdec32c0380ed7c9b upstream.

We're not consistent in how the file table is grabbed and assigned if we
have a command linked that requires the use of it.

Add ->file_table to the io_op_defs[] array, and use that to determine
when to grab the table instead of having the handlers set it if they
need to defer. This also means we can kill the IO_WQ_WORK_NEEDS_FILES
flag. We always initialize work->files, so io-wq can just check for
that.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

c3c1561c

io_uring: support using a registered personality for commands · 9f4ebfa2

由 Jens Axboe 提交于 1月 28, 2020

to #26323588

commit 75c6a03904e0dd414a4d99a3072075cb5117e5bc upstream.

For personalities previously registered via IORING_REGISTER_PERSONALITY,
allow any command to select them. This is done through setting
sqe->personality to the id returned from registration, and then flagging
sqe->flags with IOSQE_PERSONALITY.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

9f4ebfa2

io_uring: allow registering credentials · 28628ef6

由 Jens Axboe 提交于 1月 28, 2020

to #26323588

commit 071698e13ac6ba786dfa22349a7b62deb5a9464d upstream.

If an application wants to use a ring with different kinds of
credentials, it can register them upfront. We don't lookup credentials,
the credentials of the task calling IORING_REGISTER_PERSONALITY is used.

An 'id' is returned for the application to use in subsequent personality
support.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

28628ef6

io_uring: add io-wq workqueue sharing · b124ec72

由 Pavel Begunkov 提交于 1月 28, 2020

to #26323588

commit 24369c2e3bb06d8c4e71fd6ceaf4f8a01ae79b7c upstream.

If IORING_SETUP_ATTACH_WQ is set, it expects wq_fd in io_uring_params to
be a valid io_uring fd io-wq of which will be shared with the newly
created io_uring instance. If the flag is set but it can't share io-wq,
it fails.

This allows creation of "sibling" io_urings, where we prefer to keep the
SQ/CQ private, but want to share the async backend to minimize the amount
of overhead associated with having multiple rings that belong to the same
backend.
Reported-by: NJens Axboe <axboe@kernel.dk>
Reported-by: NDaurnimator <quae@daurnimator.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

b124ec72

io_uring/io-wq: don't use static creds/mm assignments · 8b7feaf6

由 Jens Axboe 提交于 1月 27, 2020

to #26323588

commit cccf0ee834559ae0b327b40290e14f6a2a017177 upstream.

We currently setup the io_wq with a static set of mm and creds. Even for
a single-use io-wq per io_uring, this is suboptimal as we have may have
multiple enters of the ring. For sharing the io-wq backend, it doesn't
work at all.

Switch to passing in the creds and mm when the work item is setup. This
means that async work is no longer deferred to the io_uring mm and creds,
it is done with the current mm and creds.

Flag this behavior with IORING_FEAT_CUR_PERSONALITY, so applications know
they can rely on the current personality (mm and creds) being the same
for direct issue and async issue.
Reviewed-by: NStefan Metzmacher <metze@samba.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

8b7feaf6

io_uring: fix refcounting with batched allocations at OOM · c1f480a9

由 Pavel Begunkov 提交于 1月 25, 2020

to #26323588

commit 9466f43741bc08edd7b1bee642dd6f5561091634 upstream.

In case of out of memory the second argument of percpu_ref_put_many() in
io_submit_sqes() may evaluate into "nr - (-EAGAIN)", that is clearly
wrong.

Fixes: 2b85edfc0c90 ("io_uring: batch getting pcpu references")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

c1f480a9

io_uring: add comment for drain_next · 9cdf862f

由 Pavel Begunkov 提交于 1月 25, 2020

to #26323588

commit 8cdf2193a3335b4cfb6e023b41ac293d0843d287 upstream.

Draining the middle of a link is tricky, so leave a comment there
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

9cdf862f

io_uring: don't attempt to copy iovec for READ/WRITE · fe5b6bc4

由 Jens Axboe 提交于 1月 24, 2020

to #26323588

commit 980ad26304abf11e78caaa68023411b9c088b848 upstream.

For the non-vectored variant of READV/WRITEV, we don't need to setup an
async io context, and we flag that appropriately in the io_op_defs
array. However, in fixing this for the 5.5 kernel in commit 74566df3a71c
we didn't have these opcodes, so the check there was added just for the
READ_FIXED and WRITE_FIXED opcodes. Replace that check with just a
single check for needing async context, that covers all four of these
read/write variants that don't use an iovec.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

fe5b6bc4

io_uring: honor IOSQE_ASYNC for linked reqs · 0dc26a41

由 Pavel Begunkov 提交于 1月 22, 2020

to #26323588

commit 86a761f81ec87a96572214f5db606f60d36aaf08 upstream.

REQ_F_FORCE_ASYNC is checked only for the head of a link. Fix it.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

0dc26a41

io_uring: prep req when do IOSQE_ASYNC · c72e84a2

由 Pavel Begunkov 提交于 1月 22, 2020

to #26323588

commit 1118591ab883f46df4ab614cc976bc4c8e04a464 upstream.

Whenever IOSQE_ASYNC is set, requests will be punted to async without
getting into io_issue_req() and without proper preparation done (e.g.
io_req_defer_prep()). Hence they will be left uninitialised.

Prepare them before punting.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

c72e84a2

io_uring: use labeled array init in io_op_defs · bf135e22

由 Pavel Begunkov 提交于 1月 18, 2020

to #26323588

commit 0463b6c58e557118d602b2f225fa3bbe9b6f3560 upstream.

Don't rely on implicit ordering of IORING_OP_ and explicitly place them
at a right place in io_op_defs. Now former comments are now a part of
the code and won't ever outdate.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

bf135e22

io_uring: optimise sqe-to-req flags translation · 17147236

由 Pavel Begunkov 提交于 1月 18, 2020

to #26323588

commit 6b47ee6ecab142f938a40bf3b297abac74218ee2 upstream.

For each IOSQE_* flag there is a corresponding REQ_F_* flag. And there
is a repetitive pattern of their translation:
e.g. if (sqe->flags & SQE_FLAG*) req->flags |= REQ_F_FLAG*

Use same numeric values/bits for them and copy instead of manual
handling.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

17147236

io_uring: remove REQ_F_IO_DRAINED · 2f79ef57

由 Pavel Begunkov 提交于 1月 18, 2020

to #26323588

commit 87987898a1dbc69b1138f7c10eb9abd655c03396 upstream.

A request can get into the defer list only once, there is no need for
marking it as drained, so remove it. This probably was left after
extracting __need_defer() for use in timeouts.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

2f79ef57

io_uring: file switch work needs to get flushed on exit · a6394883

由 Jens Axboe 提交于 1月 17, 2020

to #26323588

commit e46a7950d362231a4d0b078af5f4c109b8e5ac9e upstream.

We currently flush early, but if we have something in progress and a
new switch is scheduled, we need to ensure to flush after our teardown
as well.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

a6394883

io_uring: hide uring_fd in ctx · 1dd60f09

由 Pavel Begunkov 提交于 1月 17, 2020

to #26323588

commit b14cca0c84c760fbd39ad6bb7e1181e2df103d25 upstream.

req->ring_fd and req->ring_file are used only during the prep stage
during submission, which is is protected by mutex. There is no need
to store them per-request, place them in ctx.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

1dd60f09

io_uring: remove extra check in __io_commit_cqring · bdff4da9

由 Pavel Begunkov 提交于 1月 17, 2020

to #26323588

commit 0791015837f1520dd72918355dcb1f1e79175255 upstream.

__io_commit_cqring() is almost always called when there is a change in
the rings, so the check is rather pessimising.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

bdff4da9

io_uring: optimise use of ctx->drain_next · e5d150e9

由 Pavel Begunkov 提交于 1月 17, 2020

to #26323588

commit 711be0312df4d350fb5bf1671c132cccae5aaf9a upstream.

Move setting ctx->drain_next to the only place it could be set, when it
got linked non-head requests. The same for checking it, it's interesting
only for a head of a link or a non-linked request.

No functional changes here. This removes some code from the common path
and also removes REQ_F_DRAIN_LINK flag, as it doesn't need it anymore.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

e5d150e9

io_uring: add support for probing opcodes · b7e0b05d

由 Jens Axboe 提交于 1月 16, 2020

to #26323588

commit 66f4af93da5761d2fa05c0dc673a47003cdb9cfe upstream.

The application currently has no way of knowing if a given opcode is
supported or not without having to try and issue one and see if we get
-EINVAL or not. And even this approach is fraught with peril, as maybe
we're getting -EINVAL due to some fields being missing, or maybe it's
just not that easy to issue that particular command without doing some
other leg work in terms of setup first.

This adds IORING_REGISTER_PROBE, which fills in a structure with info
on what it supported or not. This will work even with sparse opcode
fields, which may happen in the future or even today if someone
backports specific features to older kernels.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

b7e0b05d

io_uring: account fixed file references correctly in batch · 2c00d9a5

由 Jens Axboe 提交于 1月 09, 2020

to #26323588

commit 10fef4bebf979bb705feed087611293d5864adfe upstream.

We can't assume that the whole batch has fixed files in it. If it's a
mix, or none at all, then we can end up doing a ref put that either
messes up accounting, or causes an oops if we have no fixed files at
all.

Also ensure we free requests properly between inflight accounted and
normal requests.

Fixes: 82c721577011 ("io_uring: extend batch freeing to cover more cases")
Reported-by: NDmitrii Dolgov <9erthalion6@gmail.com>
Reported-by: NPavel Begunkov <asml.silence@gmail.com>
Tested-by: NDmitrii Dolgov <9erthalion6@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

2c00d9a5

io_uring: add opcode to issue trace event · d6849a85

由 Jens Axboe 提交于 1月 08, 2020

to #26323588

commit 354420f705ccd0aa2d41249f3bb55b4afbed1873 upstream.

For some test apps at least, user_data is just zeroes. So it's not a
good way to tell what the command actually is. Add the opcode to the
issue trace point.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

d6849a85

io_uring: add support for IORING_OP_OPENAT2 · c4270ce5

由 Jens Axboe 提交于 1月 08, 2020

to #26323588

commit cebdb98617ae3e842c81c73758a185248b37cfd6 upstream.

Add support for the new openat2(2) system call. It's trivial to do, as
we can have openat(2) just be wrapped around it.
Suggested-by: NStefan Metzmacher <metze@samba.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

c4270ce5

io_uring: remove 'fname' from io_open structure · 6d31fb7e

由 Jens Axboe 提交于 1月 08, 2020

to #26323588

commit f8748881b17dc56b3faa1d30c823f071c56593e5 upstream.

We only use it internally in the prep functions for both statx and
openat, so we don't need it to be persistent across the request.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

6d31fb7e

io_uring: add 'struct open_how' to the openat request context · ab1ea5a0

由 Jens Axboe 提交于 1月 08, 2020

to #26323588

commit c12cedf24e786509de031a832e6b0e5f8b3ca37b upstream.

We'll need this for openat2(2) support, remove flags and mode from
the existing io_open struct.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

ab1ea5a0

io_uring: enable option to only trigger eventfd for async completions · f6ec8326

由 Jens Axboe 提交于 1月 08, 2020

to #26323588

commit f2842ab5b72d7ee5f7f8385c2d4f32c133f5837b upstream.

If an application is using eventfd notifications with poll to know when
new SQEs can be issued, it's expecting the following read/writes to
complete inline. And with that, it knows that there are events available,
and don't want spurious wakeups on the eventfd for those requests.

This adds IORING_REGISTER_EVENTFD_ASYNC, which works just like
IORING_REGISTER_EVENTFD, except it only triggers notifications for events
that happen from async completions (IRQ, or io-wq worker completions).
Any completions inline from the submission itself will not trigger
notifications.
Suggested-by: NMark Papadakis <markuspapadakis@icloud.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

f6ec8326

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功