提交 · e9fd2309669ea954c775320aa5dd065b1c50dca7 · openanolis / cloud-kernel

28 5月, 2020 40 次提交

io_uring: cancel pending async work if task exits · e9fd2309

由 Jens Axboe 提交于 2月 08, 2020

to #26323588

commit 6ab231448fdc5e37c15a94a4700fca11e80007f7 upstream.

Normally we cancel all work we track, but for untracked work we could
leave the async worker behind until that work completes. This is totally
fine, but does leave resources pending after the task is gone until that
work completes.

Cancel work that this task queued up when it goes away.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

e9fd2309

io-wq: add io_wq_cancel_pid() to cancel based on a specific pid · 40270319

由 Jens Axboe 提交于 2月 08, 2020

to #26323588

commit 36282881a795cbf717aca79392ae9cdf0fef59c9 upstream.

Add a helper that allows the caller to cancel work based on what mm
it belongs to. This allows io_uring to cancel work from a given
task or thread when it exits.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

40270319

io-wq: make io_wqe_cancel_work() take a match handler · 89768418

由 Jens Axboe 提交于 2月 08, 2020

to #26323588

commit 00bcda13dcbf6bf7fa6f2a5886dd555362de8cfa upstream.

We want to use the cancel functionality for canceling based on not
just the work itself. Instead of matching on the work address
manually, allow a match handler to tell us if we found the right work
item or not.

No functional changes in this patch.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

89768418

io_uring: fix openat/statx's filename leak · 97f6378d

由 Pavel Begunkov 提交于 2月 08, 2020

to #26323588

commit 0bdbdd08a8f991bdaee54465a168c0795ea5d28b upstream.

As in the previous patch, make openat*_prep() and statx_prep() handle
double preparation to avoid resource leakage.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

97f6378d

io_uring: fix double prep iovec leak · 8b3d1de0

由 Pavel Begunkov 提交于 2月 08, 2020

to #26323588

commit 5f798beaf35d79355cbf18019c1993a84475a2c3 upstream.

Requests may be prepared multiple times with ->io allocated (i.e. async
prepared). Preparation functions don't handle it and forget about
previously allocated resources. This may happen in case of:
- spurious defer_check
- non-head (i.e. async prepared) request executed in sync (via nxt).

Make the handlers check, whether they already allocated resources, which
is true IFF REQ_F_NEED_CLEANUP is set.

Cc: stable@vger.kernel.org # 5.5
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

8b3d1de0

io_uring: fix async close() with f_op->flush() · 4a6b66dd

由 Pavel Begunkov 提交于 2月 08, 2020

to #26323588

commit a93b33312f63ef6d5997f45d6fdf4de84c5396cc upstream.

First, io_close() misses filp_close() and io_cqring_add_event(), when
f_op->flush is defined. That's because in this case it will
io_queue_async_work() itself not grabbing files, so the corresponding
chunk in io_close_finish() won't be executed.

Second, when submitted through io_wq_submit_work(), it will do
filp_close() and *_add_event() twice: first inline in io_close(),
and the second one in call to io_close_finish() from io_close().
The second one will also fire, because it was submitted async through
generic path, and so have grabbed files.

And the last nice thing is to remove this weird pilgrimage with checking
work/old_work and casting it to nxt. Just use a helper instead.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

4a6b66dd

io_uring: allow AT_FDCWD for non-file openat/openat2/statx · 083d6337

由 Jens Axboe 提交于 2月 06, 2020

to #26323588

commit 0b5faf6ba7fb78bb1fe7336d23ea1978386a6c3a upstream.

Don't just check for dirfd == -1, we should allow AT_FDCWD as well for
relative lookups.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

083d6337

io_uring: grab ->fs as part of async preparation · ddf405a0

由 Jens Axboe 提交于 2月 07, 2020

to #26323588

commit ff002b30181d30cdfbca316dadd099c3ca0d739c upstream.

This passes it in to io-wq, so it assumes the right fs_struct when
executing async work that may need to do lookups.

Cc: stable@vger.kernel.org # 5.3+
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

ddf405a0

io-wq: add support for inheriting ->fs · 357f8607

由 Jens Axboe 提交于 2月 06, 2020

to #26323588

commit 9392a27d88b9707145d713654eb26f0c29789e50 upstream.

Some work items need this for relative path lookup, make it available
like the other inherited credentials/mm/etc.

Cc: stable@vger.kernel.org # 5.3+
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

357f8607

io_uring: retry raw bdev writes if we hit -EOPNOTSUPP · 50df07cb

由 Jens Axboe 提交于 2月 07, 2020

to #26323588

commit faac996ccd5da95bc56b91aa80f2643c2d0a1c56 upstream.

For non-blocking issue, we set IOCB_NOWAIT in the kiocb. However, on a
raw block device, this yields an -EOPNOTSUPP return, as non-blocking
writes aren't supported. Turn this -EOPNOTSUPP into -EAGAIN, so we retry
from blocking context with IOCB_NOWAIT cleared.

Cc: stable@vger.kernel.org # 5.5
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

50df07cb

io_uring: add cleanup for openat()/statx() · 8c3e80dd

由 Pavel Begunkov 提交于 2月 07, 2020

to #26323588

commit 8fef80bf56a49c60b457dedb99fd6c5279a5dbe1 upstream.

openat() and statx() may have allocated ->open.filename, which should be
be put. Add cleanup handlers for them.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

8c3e80dd

io_uring: fix iovec leaks · 36d6c6ed

由 Pavel Begunkov 提交于 2月 07, 2020

to #26323588

commit 99bc4c38537d774e667d043c520914082da19abf upstream.

Allocated iovec is freed only in io_{read,write,send,recv)(), and just
leaves it if an error occured. There are plenty of such cases:
- cancellation of non-head requests
- fail grabbing files in __io_queue_sqe()
- set REQ_F_NOWAIT and returning in __io_queue_sqe()

Add REQ_F_NEED_CLEANUP, which will force such requests with custom
allocated resourses go through cleanup handlers on put.

Cc: stable@vger.kernel.org # 5.5
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

36d6c6ed

io_uring: remove unused struct io_async_open · 5d043bb1

由 Pavel Begunkov 提交于 2月 07, 2020

to #26323588

commit e96e977992d0ea40b6e70cb63dede85c9078e744 upstream.

struct io_async_open is unused, remove it.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

5d043bb1

io_uring: flush overflowed CQ events in the io_uring_poll() · 644813a9

由 Stefano Garzarella 提交于 2月 07, 2020

to #26323588

commit 63e5d81f72af1bf370bf8a6745b0a8d71a7bb37d upstream.

In io_uring_poll() we must flush overflowed CQ events before to
check if there are CQ events available, to avoid missing events.

We call the io_cqring_events() that checks and flushes any overflow
and returns the number of CQ events available.
Signed-off-by: NStefano Garzarella <sgarzare@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

644813a9

io_uring: statx/openat/openat2 don't support fixed files · c51ce185

由 Jens Axboe 提交于 2月 06, 2020

to #26323588

commit cf3040ca55f2085b0a372a620ee2cb93ae19b686 upstream.

All of these opcodes take a directory file descriptor. We can't easily
support fixed files for these operations, and the use case for that
probably isn't all that clear (or sensible) anyway.

Disable IOSQE_FIXED_FILE for these operations.
Reported-by: NStefan Metzmacher <metze@samba.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

c51ce185

io_uring: fix deferred req iovec leak · 343ae230

由 Pavel Begunkov 提交于 2月 06, 2020

to #26323588

commit 1e95081cb5b4cf77065d37866f57cf3c90a3df78 upstream.

After defer, a request will be prepared, that includes allocating iovec
if needed, and then submitted through io_wq_submit_work() but not custom
handler (e.g. io_rw_async()/io_sendrecv_async()). However, it'll leak
iovec, as it's in io-wq and the code goes as follows:

io_read() {
	if (!io_wq_current_is_worker())
		kfree(iovec);
}

Put all deallocation logic in io_{read,write,send,recv}(), which will
leave the memory, if going async with -EAGAIN.

It also fixes a leak after failed io_alloc_async_ctx() in
io_{recv,send}_msg().

Cc: stable@vger.kernel.org # 5.5
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

343ae230

io_uring: fix 1-bit bitfields to be unsigned · ced6d31b

由 Randy Dunlap 提交于 2月 05, 2020

to #26323588

commit e1d85334d62386e9503e4a0d5d022e2d8e0011a0 upstream.

Make bitfields of size 1 bit be unsigned (since there is no room
for the sign bit).
This clears up the sparse warnings:

  CHECK   ../fs/io_uring.c
../fs/io_uring.c:207:50: error: dubious one-bit signed bitfield
../fs/io_uring.c:208:55: error: dubious one-bit signed bitfield
../fs/io_uring.c:209:63: error: dubious one-bit signed bitfield
../fs/io_uring.c:210:54: error: dubious one-bit signed bitfield
../fs/io_uring.c:211:57: error: dubious one-bit signed bitfield

Found by sight and then verified with sparse.

Fixes: 69b3e546139a ("io_uring: change io_ring_ctx bool fields into bit fields")
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

ced6d31b

io_uring: get rid of delayed mm check · e181f2f8

由 Pavel Begunkov 提交于 2月 06, 2020

to #26323588

commit 1cb1edb2f5ba8a3e8d47ded391007c6fe3ac0ad7 upstream.

Fail fast if can't grab mm, so past that requests always have an mm
when required. This allows us to remove req->user altogether.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

e181f2f8

io_uring: cleanup fixed file data table references · 93ba78fd

由 Jens Axboe 提交于 2月 04, 2020

to #26323588

commit 2faf852d1be8a4960d328492298da6448cca0279 upstream.

syzbot reports a use-after-free in io_ring_file_ref_switch() when it
tries to switch back to percpu mode. When we put the final reference to
the table by calling percpu_ref_kill_and_confirm(), we don't want the
zero reference to queue async work for flushing the potentially queued
up items. We currently do a few flush_work(), but they merely paper
around the issue, since the work item may not have been queued yet
depending on the when the percpu-ref callback gets run.

Coming into the file unregister, we know we have the ring quiesced.
io_ring_file_ref_switch() can check for whether or not the ref is dying
or not, and not queue anything async at that point. Once the ref has
been confirmed killed, flush any potential items manually.

Reported-by: syzbot+7caeaea49c2c8a591e3d@syzkaller.appspotmail.com
Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update")
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

93ba78fd

io_uring: spin for sq thread to idle on shutdown · 74b46d93

由 Jens Axboe 提交于 2月 04, 2020

to #26323588

commit df069d80c8e38c19531c392322e9a16617475c44 upstream.

As part of io_uring shutdown, we cancel work that is pending and won't
necessarily complete on its own. That includes requests like poll
commands and timeouts.

If we're using SQPOLL for kernel side submission and we shutdown the
ring immediately after queueing such work, we can race with the sqthread
doing the submission. This means we may miss cancelling some work, which
results in the io_uring shutdown hanging forever.

Cc: stable@vger.kernel.org
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

74b46d93

io_uring: put the flag changing code in the same spot · a7c70de9

由 Pavel Begunkov 提交于 2月 01, 2020

to #26323588

commit 3e577dcd73a1fdc641bf45e5ea4a37869de221b5 upstream.

Both iocb_flags() and kiocb_set_rw_flags() are inline and modify
kiocb->ki_flags. Place them close, so they can be potentially better
optimised.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

a7c70de9

io_uring: iterate req cache backwards · f80ed44f

由 Pavel Begunkov 提交于 2月 01, 2020

to #26323588

commit 6c8a31346925cbb373f84a18428ab3df59d3950e upstream.

Grab requests from cache-array from the end, so can get by only
free_reqs.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

f80ed44f

io_uring: punt even fadvise() WILLNEED to async context · a8af11c5

由 Jens Axboe 提交于 2月 01, 2020

to #26323588

commit 3e69426da2599677ebbe76e2d97a606c4797bd74 upstream.

Andres correctly points out that read-ahead can block, if it needs to
read in meta data (or even just through the page cache page allocations).
Play it safe for now and just ensure WILLNEED is also punted to async
context.

While in there, allow the file settings hints from non-blocking
context. They don't need to start/do IO, and we can safely do them
inline.

Fixes: 4840e418c2fc ("io_uring: add IORING_OP_FADVISE")
Reported-by: NAndres Freund <andres@anarazel.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

a8af11c5

io_uring: fix sporadic double CQE entry for close · f809fef1

由 Jens Axboe 提交于 1月 31, 2020

to #26323588

commit 1a417f4e618e05fba29ba222f1e8555c302376ce upstream.

We punt close to async for the final fput(), but we log the completion
even before that even in that case. We rely on the request not having
a files table assigned to detect what the final async close should do.
However, if we punt the async queue to __io_queue_sqe(), we'll get
->files assigned and this makes io_close_finish() think it should both
close the filp again (which does no harm) AND log a new CQE event for
this request. This causes duplicate CQEs.

Queue the request up for async manually so we don't grab files
needlessly and trigger this condition.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

f809fef1

io_uring: remove extra ->file check · e0280e19

由 Pavel Begunkov 提交于 2月 01, 2020

to #26323588

commit 9250f9ee194dc3dcee28a42a1533fa2cc0edd215 upstream.

It won't ever get into io_prep_rw() when req->file haven't been set in
io_req_set_file(), hence remove the check.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

e0280e19

io_uring: don't map read/write iovec potentially twice · 4bde6717

由 Jens Axboe 提交于 1月 31, 2020

to #26323588

commit 5d204bcfa09330972ad3428a8f81c23f371d3e6d upstream.

If we have a read/write that is deferred, we already setup the async IO
context for that request, and mapped it. When we later try and execute
the request and we get -EAGAIN, we don't want to attempt to re-map it.
If we do, we end up with garbage in the iovec, which typically leads
to an -EFAULT or -EINVAL completion.

Cc: stable@vger.kernel.org # 5.5
Reported-by: NDan Melnic <dmm@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

4bde6717

io_uring: use the proper helpers for io_send/recv · a44f9077

由 Jens Axboe 提交于 1月 31, 2020

to #26323588

commit 0b7b21e42ba2d6ac9595a4358a9354249605a3af upstream.

Don't use the recvmsg/sendmsg helpers, use the same helpers that the
recv(2) and send(2) system calls use.
Reported-by: N李通洲 <carter.li@eoitek.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

a44f9077

io_uring: prevent potential eventfd recursion on poll · 0c6f7422

由 Jens Axboe 提交于 2月 01, 2020

to #26323588

commit f0b493e6b9a8959356983f57112229e69c2f7b8c upstream.

If we have nested or circular eventfd wakeups, then we can deadlock if
we run them inline from our poll waitqueue wakeup handler. It's also
possible to have very long chains of notifications, to the extent where
we could risk blowing the stack.

Check the eventfd recursion count before calling eventfd_signal(). If
it's non-zero, then punt the signaling to async context. This is always
safe, as it takes us out-of-line in terms of stack and locking context.

Cc: stable@vger.kernel.org # 5.1+
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

0c6f7422

eventfd: track eventfd_signal() recursion depth · 0c570e16

由 Jens Axboe 提交于 2月 02, 2020

to #26323588

commit b5e683d5cab8cd433b06ae178621f083cabd4f63 upstream.

eventfd use cases from aio and io_uring can deadlock due to circular
or resursive calling, when eventfd_signal() tries to grab the waitqueue
lock. On top of that, it's also possible to construct notification
chains that are deep enough that we could blow the stack.

Add a percpu counter that tracks the percpu recursion depth, warn if we
exceed it. The counter is also exposed so that users of eventfd_signal()
can do the right thing if it's non-zero in the context where it is
called.

Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

0c570e16

io_uring: add BUILD_BUG_ON() to assert the layout of struct io_uring_sqe · 955781d9

由 Stefan Metzmacher 提交于 1月 29, 2020

to #26323588

commit d7f62e825fd19202a0749d10fb439714c51f67d2 upstream.

With nesting of anonymous unions and structs it's hard to
review layout changes. It's better to ask the compiler
for these things.
Signed-off-by: NStefan Metzmacher <metze@samba.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

955781d9

io_uring: add ->show_fdinfo() for the io_uring file descriptor · 80f00261

由 Jens Axboe 提交于 1月 30, 2020

to #26323588

commit 87ce955b24c9940cb2ca7e5173fcf175578d9fe9 upstream.

It can be hard to know exactly what is registered with the ring.
Especially for credentials, it'd be handy to be able to see which
ones are registered, what personalities they have, and what the ID
of each of them is.

This adds support for showing information registered in the ring from
the fdinfo of the io_uring fd. Here's an example from a test case that
registers 4 files (two of them sparse), 4 buffers, and 2 personalities:

pos:	0
flags:	02000002
mnt_id:	14
UserFiles:	4
    0: file-no-1
    1: file-no-2
    2: <none>
    3: <none>
UserBufs:	4
    0: 0x563817c46000/128
    1: 0x563817c47000/256
    2: 0x563817c48000/512
    3: 0x563817c49000/1024
Personalities:
    1
	Uid:	0		0		0		0
	Gid:	0		0		0		0
	Groups:	0
	CapEff:	0000003fffffffff
    2
	Uid:	0		0		0		0
	Gid:	0		0		0		0
	Groups:	0
	CapEff:	0000003fffffffff
Suggested-by: NJann Horn <jannh@google.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

80f00261

io_uring: add support for epoll_ctl(2) · c9166992

由 Jens Axboe 提交于 1月 08, 2020

to #26323588

commit 3e4827b05d2ac2d377ed136a52829ec46787bf4b upstream.

This adds IORING_OP_EPOLL_CTL, which can perform the same work as the
epoll_ctl(2) system call.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

c9166992

eventpoll: support non-blocking do_epoll_ctl() calls · 73681652

由 Jens Axboe 提交于 1月 08, 2020

to #26323588

commit 39220e8d4a2aaab045ea03cc16d737e85d0817bf upstream.

Also make it available outside of epoll, along with the helper that
decides if we need to copy the passed in epoll_event.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

73681652

eventpoll: abstract out epoll_ctl() handler · 10702bb9

由 Jens Axboe 提交于 1月 08, 2020

to #26323588

commit 58e41a44c488f3e9601fd8150f58377ef8f44889 upstream.

No functional changes in this patch.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

10702bb9

io_uring: fix linked command file table usage · c3c1561c

由 Jens Axboe 提交于 1月 29, 2020

to #26323588

commit f86cd20c9454847a524ddbdcdec32c0380ed7c9b upstream.

We're not consistent in how the file table is grabbed and assigned if we
have a command linked that requires the use of it.

Add ->file_table to the io_op_defs[] array, and use that to determine
when to grab the table instead of having the handlers set it if they
need to defer. This also means we can kill the IO_WQ_WORK_NEEDS_FILES
flag. We always initialize work->files, so io-wq can just check for
that.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

c3c1561c

io_uring: support using a registered personality for commands · 9f4ebfa2

由 Jens Axboe 提交于 1月 28, 2020

to #26323588

commit 75c6a03904e0dd414a4d99a3072075cb5117e5bc upstream.

For personalities previously registered via IORING_REGISTER_PERSONALITY,
allow any command to select them. This is done through setting
sqe->personality to the id returned from registration, and then flagging
sqe->flags with IOSQE_PERSONALITY.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

9f4ebfa2

io_uring: allow registering credentials · 28628ef6

由 Jens Axboe 提交于 1月 28, 2020

to #26323588

commit 071698e13ac6ba786dfa22349a7b62deb5a9464d upstream.

If an application wants to use a ring with different kinds of
credentials, it can register them upfront. We don't lookup credentials,
the credentials of the task calling IORING_REGISTER_PERSONALITY is used.

An 'id' is returned for the application to use in subsequent personality
support.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

28628ef6

io_uring: add io-wq workqueue sharing · b124ec72

由 Pavel Begunkov 提交于 1月 28, 2020

to #26323588

commit 24369c2e3bb06d8c4e71fd6ceaf4f8a01ae79b7c upstream.

If IORING_SETUP_ATTACH_WQ is set, it expects wq_fd in io_uring_params to
be a valid io_uring fd io-wq of which will be shared with the newly
created io_uring instance. If the flag is set but it can't share io-wq,
it fails.

This allows creation of "sibling" io_urings, where we prefer to keep the
SQ/CQ private, but want to share the async backend to minimize the amount
of overhead associated with having multiple rings that belong to the same
backend.
Reported-by: NJens Axboe <axboe@kernel.dk>
Reported-by: NDaurnimator <quae@daurnimator.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

b124ec72

io-wq: allow grabbing existing io-wq · e45f2f87

由 Pavel Begunkov 提交于 1月 28, 2020

to #26323588

commit eba6f5a330cf042bb0001f0b5e8cbf21be1b25d6 upstream.

Export a helper to attach to an existing io-wq, rather than setting up
a new one. This is doable now that we have reference counted io_wq's.
Reported-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

e45f2f87

io_uring/io-wq: don't use static creds/mm assignments · 8b7feaf6

由 Jens Axboe 提交于 1月 27, 2020

to #26323588

commit cccf0ee834559ae0b327b40290e14f6a2a017177 upstream.

We currently setup the io_wq with a static set of mm and creds. Even for
a single-use io-wq per io_uring, this is suboptimal as we have may have
multiple enters of the ring. For sharing the io-wq backend, it doesn't
work at all.

Switch to passing in the creds and mm when the work item is setup. This
means that async work is no longer deferred to the io_uring mm and creds,
it is done with the current mm and creds.

Flag this behavior with IORING_FEAT_CUR_PERSONALITY, so applications know
they can rely on the current personality (mm and creds) being the same
for direct issue and async issue.
Reviewed-by: NStefan Metzmacher <metze@samba.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

8b7feaf6

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功