Commits · d3b879463ffaaa1572b54adcca23710c6d65208b · openanolis / cloud-kernel

04 Jun, 2020 40 commits

io_uring: remove redundant variable pointer nxt and io_wq_assign_next call · d3b87946

Committed by Colin Ian King on Apr 06, 2020

to #28170604

commit 211fea18a7bb9b8d51cb5d2b9cbe5583af256609 upstream

An earlier commit "io_uring: remove @nxt from handlers" removed the
setting of pointer nxt and now it is always null, hence the non-null
check and call to io_wq_assign_next is redundant and can be removed.

Addresses-Coverity: ("'Constant' variable guard")
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

d3b87946

io_uring: fix ctx refcounting in io_submit_sqes() · a4f950b2

Committed by Pavel Begunkov on Apr 06, 2020

to #28170604

commit 48bdd849e967f1c573d2b2bc24308e24a83f39c2 upstream

If io_get_req() fails, it drops a ref. Then, while keeping @submitted
unmodified, io_submit_sqes() breaks the loop and puts @nr - @submitted
refs. For each submitted req a ref is dropped in io_put_req() and
friends. So, for @nr taken refs there will be
(@nr - @submitted + @submitted + 1) dropped.

Remove ctx refcounting from io_get_req(), that at the same time makes
it clearer.

Fixes: 2b85edfc0c90 ("io_uring: batch getting pcpu references")
Cc: stable@vger.kernel.org # v5.6
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

a4f950b2
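The reference arithmetic in the commit message for a4f950b2 can be checked with a small userspace model. This is only a sketch: `struct ref_model` and `submit_model()` are hypothetical names, and the loop merely counts refs the way the message describes rather than reproducing the kernel code.

```c
#include <stdbool.h>

/*
 * Userspace model of the accounting described in the commit message
 * (hypothetical names, not kernel code). @nr refs are taken up front and
 * the request at index @fail_at fails allocation (-1 means no failure).
 */
struct ref_model {
	long taken;
	long dropped;
};

static struct ref_model submit_model(long nr, long fail_at,
				     bool get_req_drops_ref)
{
	struct ref_model m = { .taken = nr, .dropped = 0 };
	long submitted = 0;

	for (long i = 0; i < nr; i++) {
		if (i == fail_at) {
			/* Old behaviour: the failing io_get_req() drops a ref. */
			if (get_req_drops_ref)
				m.dropped += 1;
			break;
		}
		submitted++;
	}

	/* The submit loop puts the refs of requests never submitted. */
	m.dropped += nr - submitted;
	/* Each submitted request drops its own ref when it is put. */
	m.dropped += submitted;
	return m;
}
```

With `nr = 8` and a failure at index 3, the old behaviour drops 9 refs for 8 taken, while the model without the extra drop in the allocation path stays balanced at 8.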

io_uring: process requests completed with -EAGAIN on poll list · ee110291

Committed by Bijan Mottahedeh on Apr 03, 2020

to #28170604

commit 581f981034890dfd27be7e98946e8f0461f3967a upstream

A request that completes with an -EAGAIN result after it has been added
to the poll list, will not be removed from that list in io_do_iopoll()
because the f_op->iopoll() will not succeed for that request.

Maintain a retryable local list similar to the done list, and explicitly
reissue requests with an -EAGAIN result.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

ee110291

io_uring: remove bogus RLIMIT_NOFILE check in file registration · 38e2dd3c

Committed by Jens Axboe on Apr 03, 2020

to #28170604

commit c336e992cb1cb1db9ee608dfb30342ae781057ab upstream

We already checked this limit when the file was opened, and we keep it
open in the file table. Hence when we added unix_inflight to the count
we want to register, we're doubly accounting these files. This results
in -EMFILE for file registration, if we're at half the limit.

Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

38e2dd3c

io_uring: use io-wq manager as backup task if task is exiting · 38954954

Committed by Jens Axboe on Apr 03, 2020

to #28170604

commit aa96bf8a9ee33457b7e3ea43e97dfa1e3a15ab20 upstream

If the original task is (or has) exited, then the task work will not get
queued properly. Allow for using the io-wq manager task to queue this
work for execution, and ensure that the io-wq manager notices and runs
this work if woken up (or exiting).

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

38954954

io_uring: grab task reference for poll requests · e32c3252

Committed by Jens Axboe on Apr 03, 2020

to #28170604

commit 3537b6a7c65434d0d2cc0c9862e69be11c367fdc upstream

We can have a task exit if it's not the owner of the ring. Be safe and
grab an actual reference to it, to avoid a potential use-after-free.

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

e32c3252

io_uring: retry poll if we got woken with non-matching mask · 04e4ec37

Committed by Jens Axboe on Apr 03, 2020

to #28170604

commit a6ba632d2c249a4390289727c07b8b55eb02a41d upstream

If we get woken and the poll doesn't match our mask, re-add the task
to the poll waitqueue and try again instead of completing the request
with a mask of 0.

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

04e4ec37

io_uring: add missing finish_wait() in io_sq_thread() · c7c2f54b

Committed by Hillf Danton on Apr 01, 2020

to #28170604

commit 10bea96dcc13ad841d53bdcc9d8e731e9e0ad58f upstream

Add it to pair with prepare_to_wait() in an attempt to avoid
anything weird in the field.

Fixes: b41e98524e42 ("io_uring: add per-task callback handler")
Reported-by: syzbot+0c3370f235b74b3cfd97@syzkaller.appspotmail.com
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

c7c2f54b

io_uring: refactor file register/unregister/update handling · 6663e7bf

Committed by Xiaoguang Wang on Mar 31, 2020

to #28170604

commit 0558955373023b08f638c9ede36741b0e4200f58 upstream

While diving into the io_uring fileset register/unregister/update code, we
found one bug in the fileset update handling. io_uring fileset update
uses a percpu_ref variable to check whether we can put the previously
registered files; only when the refcount of the percpu_ref variable
reaches zero can we safely put these files. But this doesn't work so
well. If applications issue requests continually, this percpu_ref will
never have a chance to reach zero, and it will always be in atomic mode,
which also defeats the gains introduced by the fileset
register/unregister/update feature, which is used to reduce the atomic
operation overhead of fput/fget.

To fix this issue, when applications do IORING_REGISTER_FILES or
IORING_REGISTER_FILES_UPDATE operations, we allocate a new percpu_ref
and kill the old percpu_ref; new requests will use the new percpu_ref.
Once all previous old requests complete, old percpu_refs will be dropped
and registered files will be put safely.

Link: https://lore.kernel.org/io-uring/5a8dac33-4ca2-4847-b091-f7dcd3ad0ff3@linux.alibaba.com/T/#t
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

6663e7bf
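The scheme described for 6663e7bf (swap in a fresh reference on every file set update, and put the old files once the old reference drains) can be illustrated with a single-threaded model. This is a sketch under stated assumptions: a plain counter stands in for percpu_ref, and `struct ref_node`, `struct ring` and the helper names are hypothetical, not the kernel's.

```c
#include <stdbool.h>

/* Plain-counter stand-in for a percpu_ref plus the files it protects. */
struct ref_node {
	long refs;       /* 1 initial ref + 1 per in-flight request */
	bool files_put;  /* set once the retired files may be released */
};

struct ring {
	struct ref_node *cur;  /* node that new requests take refs on */
};

static void node_init(struct ref_node *n)
{
	n->refs = 1;
	n->files_put = false;
}

static void node_put(struct ref_node *n)
{
	if (--n->refs == 0)
		n->files_put = true;  /* last user gone: safe to put files */
}

/* A new request pins whichever node is current at submission time. */
static struct ref_node *request_start(struct ring *r)
{
	r->cur->refs++;
	return r->cur;
}

static void request_done(struct ref_node *n)
{
	node_put(n);
}

/* File set update: route new requests to @fresh, then kill the old node. */
static void files_update(struct ring *r, struct ref_node *fresh)
{
	struct ref_node *old = r->cur;

	node_init(fresh);
	r->cur = fresh;
	node_put(old);  /* drop the initial ref, as a kill would */
}
```

In this model the old node no longer depends on the ring going idle: requests submitted after `files_update()` pin only the new node, so the old node drains as soon as the requests that were already in flight complete.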

io_uring: cleanup io_alloc_async_ctx() · 23f4e892

Committed by Xiaoguang Wang on Mar 27, 2020

to #28170604

commit 3d9932a8b240c9019f48358e8a6928c53c2c7f6b upstream

Cleanup io_alloc_async_ctx() a bit, add a new __io_alloc_async_ctx(),
so io_setup_async_rw() won't need to check whether async_ctx is true
or false again.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

23f4e892

io_uring: fix missing 'return' in comment · cd99ef9b

Committed by Chucheng Luo on Mar 25, 2020

to #28170604

commit bff6035d0c40fa1dd195aa41f61814d622883420 upstream

The missing 'return' word may make it hard for other developers to
understand it.

Signed-off-by: Chucheng Luo <luochucheng@vivo.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

cd99ef9b

io-wq: handle hashed writes in chains · 56986bc4

Committed by Pavel Begunkov on Mar 23, 2020

to #28170604

commit 86f3cd1b589a10dbdca98c52cc0cd0f56523c9b3 upstream

We always punt async buffered writes to an io-wq helper, as the core
kernel does not have IOCB_NOWAIT support for that. Most buffered async
writes complete very quickly, as it's just a copy operation. This means
that doing multiple locking roundtrips on the shared wqe lock for each
buffered write is wasteful. Additionally, buffered writes are hashed
work items, which means that any buffered write to a given file is
serialized.

Keep identically hashed work items contiguously in @wqe->work_list, and
track a tail for each hash bucket. On dequeue of a hashed item, splice
all of the same hash in one go using the tracked tail. Until the batch
is done, the caller doesn't have to synchronize with the wqe or worker
locks again.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

56986bc4
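The list layout described for 56986bc4 (same-hash items kept contiguous, one tracked tail per hash bucket, whole chain spliced off on dequeue) can be sketched in userspace as follows. All names here (`struct work`, `struct work_queue`, `wq_insert()`, `wq_get_chain()`, `NR_HASH`) are hypothetical, and the sketch leaves out locking and the check for a hash that is already being processed.

```c
#include <stddef.h>

#define NR_HASH 4
#define NO_HASH (-1)

struct work {
	struct work *next;
	int hash;  /* NO_HASH, or a bucket index in 0..NR_HASH-1 */
	int id;
};

struct work_queue {
	struct work *head;
	struct work *tail;
	struct work *hash_tail[NR_HASH];  /* last queued item per bucket */
};

/* Queue @w, keeping items with the same hash next to each other. */
static void wq_insert(struct work_queue *q, struct work *w)
{
	struct work *after = q->tail;

	if (w->hash != NO_HASH) {
		if (q->hash_tail[w->hash])
			after = q->hash_tail[w->hash];
		q->hash_tail[w->hash] = w;
	}

	if (!after) {  /* empty queue */
		w->next = NULL;
		q->head = q->tail = w;
		return;
	}
	w->next = after->next;
	after->next = w;
	if (after == q->tail)
		q->tail = w;
}

/*
 * Detach the head item and, if it is hashed, every item of the same hash
 * in one go. Returns a NULL-terminated chain, or NULL if the queue is empty.
 */
static struct work *wq_get_chain(struct work_queue *q)
{
	struct work *first = q->head;
	struct work *last = first;

	if (!first)
		return NULL;

	if (first->hash != NO_HASH) {
		last = q->hash_tail[first->hash];
		q->hash_tail[first->hash] = NULL;
	}

	q->head = last->next;
	if (!q->head)
		q->tail = NULL;
	last->next = NULL;
	return first;
}
```

The point of the design is that the caller takes the queue lock once per chain rather than once per item; the returned chain can then be walked without touching the queue again.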

io-uring: drop 'free_pfile' in struct io_file_put · 34474b3c

Committed by Hillf Danton on Mar 23, 2020

to #28170604

commit a5318d3cdffbecf075928363d7e4becfeddabfcb upstream

Sync removal of file is only used in case of a GFP_KERNEL kmalloc
failure at the cost of io_file_put::done and work flush, while a
glitch like it can be handled at the call site without too much pain.

That said, what is proposed is to drop sync removing of file, and
the kink in neck as well.

Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

34474b3c

io-uring: drop completion when removing file · 8658724a

Committed by Hillf Danton on Mar 23, 2020

to #28170604

commit 4afdb733b1606c6cb86e7833f9335f4870cf7ddd upstream

A case of task hung was reported by syzbot,

INFO: task syz-executor975:9880 blocked for more than 143 seconds.
      Not tainted 5.6.0-rc6-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor975 D27576  9880   9878 0x80004000
Call Trace:
 schedule+0xd0/0x2a0 kernel/sched/core.c:4154
 schedule_timeout+0x6db/0xba0 kernel/time/timer.c:1871
 do_wait_for_common kernel/sched/completion.c:83 [inline]
 __wait_for_common kernel/sched/completion.c:104 [inline]
 wait_for_common kernel/sched/completion.c:115 [inline]
 wait_for_completion+0x26a/0x3c0 kernel/sched/completion.c:136
 io_queue_file_removal+0x1af/0x1e0 fs/io_uring.c:5826
 __io_sqe_files_update.isra.0+0x3a1/0xb00 fs/io_uring.c:5867
 io_sqe_files_update fs/io_uring.c:5918 [inline]
 __io_uring_register+0x377/0x2c00 fs/io_uring.c:7131
 __do_sys_io_uring_register fs/io_uring.c:7202 [inline]
 __se_sys_io_uring_register fs/io_uring.c:7184 [inline]
 __x64_sys_io_uring_register+0x192/0x560 fs/io_uring.c:7184
 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

and bisect pointed to 05f3fb3c5397 ("io_uring: avoid ring quiesce for
fixed file set unregister and update").

It is down to the order that we wait for work done before flushing it
while nobody is likely going to wake us up.

We can drop that completion on stack as flushing work itself is a sync
operation we need and no more is left behind it.

To that end, io_file_put::done is re-used for indicating if it can be
freed in the workqueue worker context.

Reported-and-Inspired-by: syzbot <syzbot+538d1957ce178382a394@syzkaller.appspotmail.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>

Rename ->done to ->free_pfile
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

8658724a

io_uring: Fix ->data corruption on re-enqueue · fd7ce40a

Committed by Pavel Begunkov on Mar 23, 2020

to #28170604

commit 18a542ff19ad149fac9e5a36a4012e3cac7b3b3b upstream

work->data and work->list share a union. io_wq_assign_next() sets
->data if a req has a linked_timeout, but then io-wq may want to use
work->list, e.g. to re-enqueue a request, thereby corrupting ->data.

->data is not necessary, just remove it and extract linked_timeout
through @link_list.

Fixes: 60cf46ae6054 ("io-wq: hash dependent work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

fd7ce40a
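The hazard described for fd7ce40a, two fields sharing storage in a union, is easy to reproduce outside the kernel. The following is a minimal sketch with hypothetical stand-in types (`struct list_node`, `struct work_item`); only the union layout mirrors the commit message, and it assumes a typical platform where all object pointers share one representation.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the io-wq types; only the union matters. */
struct list_node {
	struct list_node *next;
};

struct work_item {
	union {
		struct list_node list;  /* used while the item is queued */
		void *data;             /* used to stash per-request data */
	};
};

/* Stash a pointer in ->data. */
static void stash_data(struct work_item *w, void *p)
{
	w->data = p;
}

/* (Re-)enqueue the item: this writes ->list, which overlays ->data. */
static void enqueue(struct work_item *w, struct list_node *head)
{
	w->list.next = head->next;
	head->next = &w->list;
}

/* Returns nonzero if ->data still holds @p after a re-enqueue. */
static int data_survives_enqueue(void *p)
{
	struct list_node head = { .next = NULL };
	struct work_item w;

	stash_data(&w, p);
	enqueue(&w, &head);
	return w.data == p;
}
```

Calling `data_survives_enqueue()` with any non-NULL pointer returns 0 here, because linking the item overwrites the stashed pointer. Dropping ->data, as the commit does, removes the overlap altogether.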

io-wq: close cancel gap for hashed linked work · 4aa92083

Committed by Pavel Begunkov on Mar 22, 2020

to #28170604

commit f2cf11492b8b30d89b2fbf525c9ea5e8c4ccc842 upstream

After io_assign_current_work() of a linked work, it can be decided to
offload it to another thread, i.e. to do io_wqe_enqueue(). However, until
the next io_assign_current_work() it can be cancelled, and that isn't
handled.

Don't assign it, if it's not going to be executed.

Fixes: 60cf46ae6054 ("io-wq: hash dependent work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

4aa92083

io_uring: make spdxcheck.py happy · 8b3e2b70

Committed by Lukas Bulwahn on Mar 21, 2020

to #28170604

commit 9f5834c868e901b00f1bfe4d0052b5906b4a2b7f upstream

Commit bbbdeb4720a0 ("io_uring: dual license io_uring.h uapi header")
uses a nested SPDX-License-Identifier to dual license the header.

Since then, ./scripts/spdxcheck.py complains:

  include/uapi/linux/io_uring.h: 1:60 Missing parentheses: OR

Add parentheses to make spdxcheck.py happy.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

8b3e2b70

io_uring: honor original task RLIMIT_FSIZE · e335da1a

Committed by Jens Axboe on Mar 20, 2020

to #28170604

commit 4ed734b0d0913e566a9d871e15d24eb240f269f7 upstream

With the previous fixes for number of files open checking, I added some
debug code to see if we had other spots where we're checking rlimit()
against the async io-wq workers. The only one I found was file size
checking, which we should also honor.

During write and fallocate prep, store the max file size and override
that for the current task if we're in io-wq worker context.

Cc: stable@vger.kernel.org # 5.1+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

e335da1a

io-wq: hash dependent work · f749f220

Committed by Pavel Begunkov on Mar 14, 2020

to #28170604

commit 60cf46ae605446feb0c43c472c0fd1af4cd96231 upstream

Enable io-wq hashing stuff for dependent works simply by re-enqueueing
such requests.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

f749f220

io-wq: split hashing and enqueueing · 9fc5877e

Committed by Pavel Begunkov on Mar 14, 2020

to #28170604

commit 8766dd516c535abf04491dca674d0ef6c95d814f upstream

It's a preparation patch removing io_wq_enqueue_hashed(), which
now should be done by io_wq_hash_work() + io_wq_enqueue().

Also, set the hash value for dependent works, and do it as late as
possible, because req->file can be unavailable before. This hash will be
ignored by io-wq.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

9fc5877e

io-wq: don't resched if there is no work · 5a8fe193

Committed by Pavel Begunkov on Mar 14, 2020

to #28170604

commit d78298e73a3443a3c1766fa89f5370f52a4efd94 upstream

This little tweak restores the behaviour that was before the recent
io_worker_handle_work() optimisation patches. It makes the function do
cond_resched() and flush_signals() only if there is actual work to
execute.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

5a8fe193

io-wq: remove duplicated cancel code · 2c38ddc5

Committed by Pavel Begunkov on Mar 07, 2020

to #28170604

commit 2293b4195800f88de2c454a24b25874be56d87f3 upstream

Deduplicate cancellation parts, as many of them look the same, e.g.
- io_wqe_cancel_cb_work() and io_wqe_cancel_work()
- io_wq_worker_cancel() and io_work_cancel()

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

2c38ddc5

io_uring: fix truncated async read/readv and write/writev retry · 23a7ba54

Committed by Jens Axboe on Mar 11, 2020

to #28170604

commit 3f9d64415fdaa73017fcb168930006648617b488 upstream

Ensure we keep the truncated value, if we did truncate it. If not, we
might read/write more than the registered buffer size.

Also for retry, ensure that we return the truncated mapped value for
the vectorized versions of the read/write commands.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

23a7ba54

io_uring: dual license io_uring.h uapi header · 6032d618

Committed by Jens Axboe on Mar 11, 2020

to #28170604

commit bbbdeb4720a0759ec90e3bcb20ad28d19e531346 upstream

This just syncs the header with the liburing version, so there's no
confusion on the license of the header parts.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

6032d618

io_uring: Fix unused function warnings · d3e7d856

Committed by YueHaibing on Mar 04, 2020

to #28170604

commit 469956e853ccdba72bb82ad2eea6e8ab6b15791f upstream

If CONFIG_NET is not set, gcc warns:

fs/io_uring.c:3110:12: warning: io_setup_async_msg defined but not used [-Wunused-function]
 static int io_setup_async_msg(struct io_kiocb *req,
            ^~~~~~~~~~~~~~~~~~

There are many functions wrapped by CONFIG_NET, move them
together to simplify code, also fix this warning.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Minor tweaks.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

d3e7d856

io_uring: add end-of-bits marker and build time verify it · 7fcbf2e6

Committed by Jens Axboe on Mar 03, 2020

to #28170604

commit 84557871f2ff332edd445d70349c8724c313c683 upstream

Not easy to tell if we're going over the size of bits we can shove
in req->flags, so add an end-of-bits marker and a BUILD_BUG_ON()
check for it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

7fcbf2e6
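The technique named in 7fcbf2e6, an end-of-bits marker checked at build time, can be sketched in portable C11. The flag names below are hypothetical, and `_Static_assert` is used as a userspace stand-in for the kernel's BUILD_BUG_ON(); the exact condition the kernel checks may differ.

```c
#include <limits.h>

/* Bit positions for a flags word (hypothetical flag names). */
enum {
	EXAMPLE_REQ_F_FIXED_FILE_BIT,
	EXAMPLE_REQ_F_LINK_BIT,
	EXAMPLE_REQ_F_ASYNC_BIT,

	/* Not a real flag: marks one past the last bit in use. */
	EXAMPLE_REQ_F_LAST_BIT,
};

struct example_req {
	unsigned int flags;
};

#define EXAMPLE_FLAG(bit) (1U << (bit))

/* Fails the build if the flag bits no longer fit in ->flags. */
_Static_assert(EXAMPLE_REQ_F_LAST_BIT <=
	       sizeof(((struct example_req *)0)->flags) * CHAR_BIT,
	       "too many request flag bits for the flags field");
```

Because the marker is the last enumerator, adding a new bit above it automatically raises the value that the assertion checks, so nobody has to remember to update a separate count.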

io_uring: provide means of removing buffers · bdc3a9cd

Committed by Jens Axboe on Mar 02, 2020

to #28170604

commit 067524e914cb23e20d59480b318fe2625eaee7c8 upstream

We have IORING_OP_PROVIDE_BUFFERS, but the only way to remove buffers
is to trigger IO on them. The usual case of shrinking a buffer pool
would be to just not replenish the buffers when IO completes, and
instead just free it. But it may be nice to have a way to manually
remove a number of buffers from a given group, and
IORING_OP_REMOVE_BUFFERS provides that functionality.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

bdc3a9cd

io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_RECVMSG · e6e2f869

Committed by Jens Axboe on Feb 27, 2020

to #28170604

commit 52de1fe122408d7a62b6cff9ed3895ebb882d71f upstream

Like IORING_OP_READV, this is limited to supporting just a single
segment in the iovec passed in.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

Notes: use VERIFY_WRITE for access_ok()
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

e6e2f869

net: abstract out normal and compat msghdr import · 992cfc11

Committed by Jens Axboe on Feb 27, 2020

to #28170604

commit 0a384abfae66651b28e4bbe16883b1ff046ba3b3 upstream

This splits it into two parts, one that imports the message, and one
that imports the iovec. This allows a caller to only do the first part,
and import the iovec manually afterwards.

No functional changes in this patch.

Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

992cfc11

io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_READV · 89673d36

Committed by Jens Axboe on Feb 27, 2020

to #28170604

commit 4d954c258a0c365a85a2d1b1cccf63aec38fca4c upstream

This adds support for the vectored read. This is limited to supporting
just 1 segment in the iov, and is provided just for convenience for
applications that use IORING_OP_READV already.

The iov helpers will be used for IORING_OP_RECVMSG as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

Notes: use VERIFY_WRITE for access_ok()
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

89673d36

io_uring: support buffer selection for OP_READ and OP_RECV · 1d68f9f6

Committed by Jens Axboe on Feb 23, 2020

to #28170604

commit bcda7baaa3f15c7a95db3c024bb046d6e298f76b upstream

If a server process has tons of pending socket connections, generally
it uses epoll to wait for activity. When the socket is ready for reading
(or writing), the task can select a buffer and issue a recv/send on the
given fd.

Now that we have fast (non-async thread) support, a task can have tons
of pending reads or writes. But that means they need buffers to
back that data, and if the number of connections is high enough, having
them preallocated for all possible connections is unfeasible.

With IORING_OP_PROVIDE_BUFFERS, an application can register buffers to
use for any request. The request then sets IOSQE_BUFFER_SELECT in the
sqe, and a given group ID in sqe->buf_group. When the fd becomes ready,
a free buffer from the specified group is selected. If none are
available, the request is terminated with -ENOBUFS. If successful, the
CQE on completion will contain the buffer ID chosen in the cqe->flags
member, encoded as:

	(buffer_id << IORING_CQE_BUFFER_SHIFT) | IORING_CQE_F_BUFFER;

Once a buffer has been consumed by a request, it is no longer available
and must be registered again with IORING_OP_PROVIDE_BUFFERS.

Requests need to support this feature. For now, IORING_OP_READ and
IORING_OP_RECV support it. This is checked on SQE submission, a CQE with
res == -EOPNOTSUPP will be posted if attempted on unsupported requests.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

1d68f9f6
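The cqe->flags encoding quoted in the commit message for 1d68f9f6 can be exercised with two small helpers. This is a sketch: `cqe_flags_encode()` and `cqe_flags_decode()` are hypothetical helper names, and the two constants are repeated here with the values believed to match the io_uring uapi header so that the block builds without kernel headers; check them against your own headers.

```c
#include <stdint.h>

/* Assumed to match include/uapi/linux/io_uring.h; verify locally. */
#define IORING_CQE_F_BUFFER     (1U << 0)
#define IORING_CQE_BUFFER_SHIFT 16

/* Build the flags word a CQE would carry for a selected buffer. */
static uint32_t cqe_flags_encode(uint16_t buffer_id)
{
	return ((uint32_t)buffer_id << IORING_CQE_BUFFER_SHIFT) |
	       IORING_CQE_F_BUFFER;
}

/*
 * Returns 1 and stores the buffer ID if the flags say a buffer was
 * selected; returns 0 and leaves @buffer_id untouched otherwise.
 */
static int cqe_flags_decode(uint32_t flags, uint16_t *buffer_id)
{
	if (!(flags & IORING_CQE_F_BUFFER))
		return 0;
	*buffer_id = (uint16_t)(flags >> IORING_CQE_BUFFER_SHIFT);
	return 1;
}
```

An application would apply the decode step to each completion of a request submitted with IOSQE_BUFFER_SELECT, then hand the identified buffer back with IORING_OP_PROVIDE_BUFFERS once it has consumed the data.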

io_uring: add IORING_OP_PROVIDE_BUFFERS · 72e1286a

Committed by Jens Axboe on Feb 23, 2020

to #28170604

commit ddf0322db79c5984dc1a1db890f946dd19b7d6d9 upstream

IORING_OP_PROVIDE_BUFFERS uses the buffer registration infrastructure to
support passing in an addr/len that is associated with a buffer ID and
buffer group ID. The group ID is used to index and lookup the buffers,
while the buffer ID can be used to notify the application which buffer
in the group was used. The addr passed in is the starting buffer address,
and length is the length of each buffer. A number of buffers to add can be
specified, in which case addr is incremented by length for each addition,
and each buffer increments the buffer ID specified.

No validation is done of the buffer ID. If the application provides
buffers within the same group with identical buffer IDs, then it'll have
a hard time telling which buffer ID was used. The only restriction is
that the buffer ID can be a max of 16-bits in size, so USHRT_MAX is the
maximum ID that can be used.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

Notes: use VERIFY_WRITE for access_ok()
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

72e1286a
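The addressing rule described for 72e1286a (addr advances by the buffer length for each addition, and the buffer ID advances by one) can be written out as a small table-filling helper. This is a sketch: `struct example_buf` and `example_provide_buffers()` are hypothetical, and rejecting a request whose IDs would run past USHRT_MAX is an assumption of this sketch, not a statement about what the kernel does.

```c
#include <limits.h>
#include <stdint.h>

struct example_buf {
	uint64_t addr;
	uint32_t len;
	uint16_t bid;
};

/*
 * Fill @out with @nbufs buffers carved from one region: each buffer is
 * @len bytes, starts @len bytes after the previous one, and takes the
 * next buffer ID. Returns the number of entries written, or -1 if
 * @nbufs is zero or an ID would exceed USHRT_MAX (sketch-only policy).
 */
static int example_provide_buffers(struct example_buf *out, uint64_t addr,
				   uint32_t len, unsigned int nbufs,
				   unsigned int first_bid)
{
	if (nbufs == 0 || (uint64_t)first_bid + nbufs - 1 > USHRT_MAX)
		return -1;

	for (unsigned int i = 0; i < nbufs; i++) {
		out[i].addr = addr + (uint64_t)i * len;
		out[i].len = len;
		out[i].bid = (uint16_t)(first_bid + i);
	}
	return (int)nbufs;
}
```

For example, providing 4 buffers of 256 bytes starting at address 0x1000 with a first ID of 10 yields IDs 10 through 13 at addresses 0x1000, 0x1100, 0x1200 and 0x1300.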

io_uring: buffer registration infrastructure · 96e1429b

Committed by Jens Axboe on Feb 23, 2020

to #28170604

commit 5a2e745d4d430c4dbeeeb448c3d5c0c3109e511e upstream

This just prepares the ring for having lists of buffers associated with
it, that the application can provide for SQEs to consume instead of
providing their own.

The buffers are organized by group ID.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

96e1429b

io_uring/io-wq: forward submission ref to async · ab800df8

Committed by Pavel Begunkov on Mar 04, 2020

to #28170604

commit e9fd939654f17651ff65e7e55aa6934d29eb4335 upstream

First it changes io-wq interfaces. It replaces {get,put}_work() with
free_work(), which is guaranteed to be called exactly once. It also
requires the free_work() callback to be non-NULL.

io_uring follows the changes and instead of putting a submission reference
in io_put_req_async_completion(), it will be done in io_free_work(). As
this removes io_get_work() with the corresponding refcount_inc(), the ref
balance is maintained.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

ab800df8

io-wq: optimise out *next_work() double lock · 06cd37d5

Committed by Pavel Begunkov on Mar 04, 2020

to #28170604

commit f462fd36fc43662eeb42c95a9b8da8659af6d75e upstream

When executing non-linked hashed work, io_worker_handle_work()
will lock-unlock wqe->lock to update hash, and then immediately
lock-unlock to get next work. Optimise this case and do
lock/unlock only once.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

06cd37d5

io-wq: optimise locking in io_worker_handle_work() · c4b60d58

Committed by Pavel Begunkov on Mar 04, 2020

to #28170604

commit 58e3931987377d3f4ec7bbc13e4ea0aab52dc6b0 upstream

There are 2 optimisations:
- Now, io_worker_handle_work() does io_assign_current_work() twice per
request, and each one adds a lock/unlock(worker->lock) pair. The first is
to reset worker->cur_work to NULL, and the second to set a real work
shortly after. If there is a dependent work, set it immediately, which
effectively removes the extra NULL'ing.

- And there is no use in taking wqe->lock for linked works, as they are
not hashed now. Optimise it out.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

c4b60d58

io-wq: shuffle io_worker_handle_work() code · 128bda4c

Committed by Pavel Begunkov on Mar 04, 2020

to #28170604

commit dc026a73c7221b4d9d146ed0bde69ff578ebe8dc upstream

This is a preparation patch, it adds some helpers and makes
the next patches cleaner.

- extract io_impersonate_work() and io_assign_current_work()
- replace @next label with nested do-while
- move put_work() right after NULL'ing cur_work.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

128bda4c

io_uring: get next work with submission ref drop · 5f289bb9

Committed by Pavel Begunkov on Mar 03, 2020

to #28170604

commit 7a743e225b2a9da772b28a50031e1ccd8a8ce404 upstream

If after dropping the submission reference req->refs == 1, the request
is done, because this one is for io_put_work() and will be dropped
synchronously shortly after. In this case it's safe to steal a next
work from the request.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

5f289bb9

io_uring: remove @nxt from handlers · 76b0f792

Committed by Pavel Begunkov on Mar 03, 2020

to #28170604

commit 014db0073cc6a12e1f421b9231d6f3aa35735823 upstream

There will be no use for @nxt in the handlers, and it doesn't work
anyway, so purge it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

76b0f792

io_uring: make submission ref putting consistent · 8a438665

Committed by Pavel Begunkov on Mar 03, 2020

to #28170604

commit 594506fec5faec2b1ec82ad6fb0c8132512fc459 upstream

The rule is simple, any async handler gets a submission ref and should
put it at the end. Make them all follow it, and so more consistent.

This is a preparation patch, and as io_wq_assign_next() currently won't
ever work, this doesn't care to use io_put_req_find_next() instead of
io_put_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>

refcount_inc_not_zero() -> refcount_inc() fix.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

8a438665

openanolis / cloud-kernel synced successfully more than 1 year ago
