1. 29 Jun 2020 (29 commits)
  2. 12 Jun 2020 (1 commit)
  3. 04 Jun 2020 (10 commits)
    • io_uring: reset -EBUSY error when io sq thread is waken up · 2abbba20
      Xiaoguang Wang committed
      to #28170604
      
      commit d4ae271dfaae2a5f41c015f2f20d62a1deeec734 upstream
      
      In io_sq_thread(), currently if we get an -EBUSY error and go to sleep,
      we will won't clear it again, which will result in io_sq_thread() will
      never have a chance to submit sqes again. Below test program test.c
      can reveal this bug:
      
      #define _GNU_SOURCE         /* for O_DIRECT */
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <liburing.h>

      int main(int argc, char *argv[])
      {
              struct io_uring ring;
              int i, fd, ret;
              struct io_uring_sqe *sqe;
              struct io_uring_cqe *cqe;
              struct iovec *iovecs;
              void *buf;
              struct io_uring_params p;
      
              if (argc < 2) {
                      printf("%s: file\n", argv[0]);
                      return 1;
              }
      
              memset(&p, 0, sizeof(p));
              p.flags = IORING_SETUP_SQPOLL;
              ret = io_uring_queue_init_params(4, &ring, &p);
              if (ret < 0) {
                      fprintf(stderr, "queue_init: %s\n", strerror(-ret));
                      return 1;
              }
      
              fd = open(argv[1], O_RDONLY | O_DIRECT);
              if (fd < 0) {
                      perror("open");
                      return 1;
              }
      
              iovecs = calloc(10, sizeof(struct iovec));
              for (i = 0; i < 10; i++) {
                      if (posix_memalign(&buf, 4096, 4096))
                              return 1;
                      iovecs[i].iov_base = buf;
                      iovecs[i].iov_len = 4096;
              }
      
              ret = io_uring_register_files(&ring, &fd, 1);
              if (ret < 0) {
                      fprintf(stderr, "%s: register %d\n", __FUNCTION__, ret);
                      return ret;
              }
      
              for (i = 0; i < 10; i++) {
                      sqe = io_uring_get_sqe(&ring);
                      if (!sqe)
                              break;
      
                      io_uring_prep_readv(sqe, 0, &iovecs[i], 1, 0);
                      sqe->flags |= IOSQE_FIXED_FILE;
      
                      ret = io_uring_submit(&ring);
                      sleep(1);
                      printf("submit %d\n", i);
              }
      
              for (i = 0; i < 10; i++) {
                      io_uring_wait_cqe(&ring, &cqe);
                      printf("receive: %d\n", i);
                      if (cqe->res != 4096) {
                              fprintf(stderr, "ret=%d, wanted 4096\n", cqe->res);
                              ret = 1;
                      }
                      io_uring_cqe_seen(&ring, cqe);
              }
      
              close(fd);
              io_uring_queue_exit(&ring);
              return 0;
      }
      sudo ./test testfile
      The above command will hang on the tenth request. To fix this bug, reset the
      variable 'ret' to zero when the io sq_thread is woken up.
      Suggested-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      2abbba20
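
      The fix described above boils down to clearing the sticky -EBUSY after the
      sq thread wakes up. A minimal, self-contained C model of that pattern is
      sketched below; submit_pending() and wait_for_wakeup() are hypothetical
      stand-ins, not the kernel's io_sq_thread() internals.

      #include <errno.h>
      #include <stdio.h>

      /* Hypothetical submission step: pretend every odd round hits -EBUSY. */
      static int submit_pending(int round)
      {
              return (round % 2) ? -EBUSY : 1;
      }

      /* Stand-in for sleeping until the sq thread is woken. */
      static void wait_for_wakeup(void)
      {
      }

      int main(void)
      {
              int ret = 0;

              for (int round = 0; round < 4; round++) {
                      /* A sticky -EBUSY would skip submission forever... */
                      if (ret != -EBUSY)
                              ret = submit_pending(round);
                      if (ret == -EBUSY) {
                              wait_for_wakeup();
                              /* ...so reset it to zero after waking up. */
                              ret = 0;
                      }
                      printf("round %d, ret %d\n", round, ret);
              }
              return 0;
      }
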
    • io_uring: don't add non-IO requests to iopoll pending list · 4d7ac019
      Jens Axboe committed
      to #28170604
      
      commit b532576ed39efe3b351ae8320b2ab67a4c4c3719 upstream
      
      We normally disable any commands that aren't specifically poll commands
      for a ring that is set up for polling, but we do allow buffer provide and
      remove commands to support buffer selection for polled IO. Once a
      request is issued, we add it to the poll list to poll for completion. But
      we should not do that for non-IO commands, as those requests complete
      inline immediately and aren't pollable. If we do, we can leave requests
      on the iopoll list after they are freed.
      
      Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      4d7ac019
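
      As an illustration of the rule stated above (only track requests that
      actually started pollable IO), here is a small self-contained sketch;
      struct request and maybe_track_for_iopoll() are made-up names, not the
      io_uring data structures.

      #include <stdbool.h>
      #include <stdio.h>

      /* Simplified request model; field names are illustrative only. */
      struct request {
              const char *op;
              bool is_io;             /* true for read/write, false for buffer add/remove */
              bool completed_inline;
      };

      /* Only pollable IO belongs on the iopoll pending list; inline-completing
       * non-IO commands must not be added, or freed requests linger on it. */
      static void maybe_track_for_iopoll(struct request *req)
      {
              if (!req->is_io || req->completed_inline) {
                      printf("%s: not tracked for iopoll\n", req->op);
                      return;
              }
              printf("%s: added to iopoll pending list\n", req->op);
      }

      int main(void)
      {
              struct request readv = { "readv", true, false };
              struct request provide = { "provide_buffers", false, true };

              maybe_track_for_iopoll(&readv);
              maybe_track_for_iopoll(&provide);
              return 0;
      }
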
    • io_uring: don't use kiocb.private to store buf_index · d90653af
      Bijan Mottahedeh committed
      to #28170604
      
      commit 4f4eeba87cc731b200bff9372d14a80f5996b277 upstream
      
      kiocb.private is used in iomap_dio_rw() so store buf_index separately.
      Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
      
      Move 'buf_index' to a hole in io_kiocb.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      d90653af
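
      A rough sketch of the idea, using made-up struct names rather than the
      real struct kiocb / struct io_kiocb layouts: the lower layer owns the
      private pointer, so the index lives in its own small field.

      #include <stdint.h>
      #include <stdio.h>

      /* Illustrative layouts only; the real structs are far larger. */
      struct kiocb_model {
              void *private;          /* claimed by the lower layer, e.g. iomap_dio_rw() */
      };

      struct io_request_model {
              struct kiocb_model rw;
              uint16_t buf_index;     /* kept in its own small field instead of
                                       * being smuggled through rw.private */
      };

      int main(void)
      {
              struct io_request_model req = { .buf_index = 3 };

              printf("buf_index=%u\n", (unsigned int)req.buf_index);
              return 0;
      }
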
    • io_uring: cancel work if task_work_add() fails · 44e75116
      Jens Axboe committed
      to #28170604
      
      commit e3aabf9554fd04eb14cd44ae7583fc9d40edd250 upstream
      
      We currently move it to the io_wqe_manager for execution, but we cannot
      safely do so as we may lack some of the state to execute it out of
      context. As we cancel work anyway when the ring/task exits, just mark
      this request as canceled and io_async_task_func() will do the right
      thing.
      
      Fixes: aa96bf8a9ee3 ("io_uring: use io-wq manager as backup task if task is exiting")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      44e75116
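
      The pattern described above (if handing work back to the owning task
      fails, mark the request cancelled rather than running it elsewhere) can
      be sketched with a tiny self-contained model; queue_task_work() and
      struct work are hypothetical, not the kernel's task_work API.

      #include <stdbool.h>
      #include <stdio.h>

      struct work {
              bool cancelled;
      };

      /* Stand-in for task_work_add(): fails when the target task is exiting. */
      static int queue_task_work(struct work *w, bool task_exiting)
      {
              (void)w;
              return task_exiting ? -1 : 0;
      }

      static void run_or_cancel(struct work *w, bool task_exiting)
      {
              if (queue_task_work(w, task_exiting) != 0) {
                      /* Can't safely run it out of context; let the completion
                       * handler see the cancel flag and tear the request down. */
                      w->cancelled = true;
              }
              printf("cancelled=%d\n", w->cancelled);
      }

      int main(void)
      {
              struct work w = { false };

              run_or_cancel(&w, true);
              return 0;
      }
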
    • io_uring: remove dead check in io_splice() · f1645783
      Jens Axboe committed
      to #28170604
      
      commit 948a7749454b1712f1b2f2429f9493eb3e4a89b0 upstream
      
      We checked for 'force_nonblock' higher up, so it's definitely false
      at this point. Kill the check, it's a remnant of when we tried to do
      inline splice without always punting to async context.
      
      Fixes: 2fb3e82284fc ("io_uring: punt splice async because of inode mutex")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      f1645783
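
      For illustration only, a tiny sketch of the kind of remnant being removed:
      once the caller guarantees force_nonblock is false, re-checking it in the
      callee is dead code. The function name below is invented.

      #include <stdbool.h>
      #include <stdio.h>

      static int do_splice_like_op(bool force_nonblock)
      {
              /* Dead remnant of the kind the patch removes:
               *   if (force_nonblock)
               *           return -EAGAIN;
               * The caller already punted to async context, so this never fires. */
              (void)force_nonblock;
              return 0;
      }

      int main(void)
      {
              printf("%d\n", do_splice_like_op(false));
              return 0;
      }
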
    • io_uring: fix FORCE_ASYNC req preparation · c8fc1875
      Pavel Begunkov committed
      to #28170604
      
      commit bd2ab18a1d6267446eae1b47dd839050452bdf7f upstream
      
      As with other non-inlined requests, allocate req->io for FORCE_ASYNC reqs,
      so they can be prepared properly.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      c8fc1875
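
      A minimal model of the described behaviour, with invented names
      (struct req, prep_async, queue_req) standing in for the io_uring request
      path: allocate the async preparation state before a forced-async request
      is punted.

      #include <stdbool.h>
      #include <stdio.h>
      #include <stdlib.h>

      struct req {
              bool force_async;
              void *async_state;      /* stands in for req->io */
      };

      static int prep_async(struct req *r)
      {
              if (!r->async_state) {
                      r->async_state = malloc(64);
                      if (!r->async_state)
                              return -1;
              }
              return 0;
      }

      static int queue_req(struct req *r)
      {
              /* Prepare up front rather than punting an unprepared request. */
              if (r->force_async && prep_async(r) != 0)
                      return -1;
              printf("queued, async_state=%p\n", r->async_state);
              return 0;
      }

      int main(void)
      {
              struct req r = { .force_async = true };
              int ret = queue_req(&r);

              free(r.async_state);
              return ret ? 1 : 0;
      }
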
    • io_uring: don't prepare DRAIN reqs twice · 3c2eefb7
      Pavel Begunkov committed
      to #28170604
      
      commit 650b548129b60b0d23508351800108196f4aa89f upstream
      
      If req->io is not NULL, it's already prepared. Don't do it again,
      it's dangerous.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      3c2eefb7
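
      The guard described above can be sketched as follows; struct req and
      defer_prep() are made-up names, with async_state standing in for req->io.
      Preparation only runs when the state has not been allocated yet.

      #include <stdio.h>
      #include <stdlib.h>

      struct req {
              void *async_state;      /* stands in for req->io */
              int prep_count;
      };

      static int defer_prep(struct req *r)
      {
              /* Already prepared? Then don't do it again. */
              if (r->async_state)
                      return 0;

              r->async_state = malloc(64);
              if (!r->async_state)
                      return -1;
              r->prep_count++;
              return 0;
      }

      int main(void)
      {
              struct req r = { NULL, 0 };

              defer_prep(&r);
              defer_prep(&r);         /* second call is a no-op */
              printf("prepared %d time(s)\n", r.prep_count);
              free(r.async_state);
              return 0;
      }
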
    • io_uring: initialize ctx->sqo_wait earlier · c73bbba3
      Jens Axboe committed
      to #28170604
      
      commit 583863ed918136412ddf14de2e12534f17cfdc6f upstream
      
      Ensure that ctx->sqo_wait is initialized as soon as the ctx is allocated,
      instead of deferring it to the offload setup. This fixes a syzbot-reported
      lockdep complaint, which is really due to trying to wake_up
      on an uninitialized wait queue:
      
      RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441319
      RDX: 0000000000000001 RSI: 0000000020000140 RDI: 000000000000047b
      RBP: 0000000000010475 R08: 0000000000000001 R09: 00000000004002c8
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402260
      R13: 00000000004022f0 R14: 0000000000000000 R15: 0000000000000000
      INFO: trying to register non-static key.
      the code is fine but needs lockdep annotation.
      turning off the locking correctness validator.
      CPU: 1 PID: 7090 Comm: syz-executor222 Not tainted 5.7.0-rc1-next-20200415-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       assign_lock_key kernel/locking/lockdep.c:913 [inline]
       register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225
       __lock_acquire+0x104/0x4c50 kernel/locking/lockdep.c:4234
       lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4934
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159
       __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:122
       io_cqring_ev_posted+0xa5/0x1e0 fs/io_uring.c:1160
       io_poll_remove_all fs/io_uring.c:4357 [inline]
       io_ring_ctx_wait_and_kill+0x2bc/0x5a0 fs/io_uring.c:7305
       io_uring_create fs/io_uring.c:7843 [inline]
       io_uring_setup+0x115e/0x22b0 fs/io_uring.c:7870
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      RIP: 0033:0x441319
      Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
      
      Reported-by: syzbot+8c91f5d054e998721c57@syzkaller.appspotmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      c73bbba3
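
      The principle behind the fix (initialise state at allocation time if other
      paths may touch it before the optional setup runs) in a small
      self-contained sketch; struct ctx, waitq_init() and wake_all() are
      invented names, not the kernel wait-queue API.

      #include <stdio.h>
      #include <stdlib.h>

      struct waitq {
              int initialised;
      };

      struct ctx {
              struct waitq sqo_wait;
      };

      static void waitq_init(struct waitq *w)
      {
              w->initialised = 1;
      }

      static struct ctx *ctx_alloc(void)
      {
              struct ctx *c = calloc(1, sizeof(*c));

              if (c)
                      waitq_init(&c->sqo_wait);       /* was previously deferred to
                                                       * the SQPOLL offload setup */
              return c;
      }

      static void wake_all(struct waitq *w)
      {
              /* Waking an uninitialised wait queue is what triggered the
               * lockdep splat quoted above. */
              printf("wake, initialised=%d\n", w->initialised);
      }

      int main(void)
      {
              struct ctx *c = ctx_alloc();

              if (!c)
                      return 1;
              wake_all(&c->sqo_wait);                 /* safe even if setup never ran */
              free(c);
              return 0;
      }
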
    • io_uring: polled fixed file must go through free iteration · b850f336
      Jens Axboe committed
      to #28170604
      
      commit 9d9e88a24c1f20ebfc2f28b1762ce78c0b9e1cb3 upstream
      
      When we changed the file registration handling, it became important to
      iterate the bulk request freeing list for fixed files as well, or we
      miss dropping the fixed file reference. If not, we're leaking references,
      and we'll get a kworker stuck waiting for file references to disappear.
      
      This also means we can remove the special casing of fixed vs non-fixed
      files; we need to iterate for both, and we can just rely on
      __io_req_aux_free() doing io_put_file() instead of doing it manually.
      
      Fixes: 055895537302 ("io_uring: refactor file register/unregister/update handling")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      b850f336
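
      A simplified model of the unified free pass described above; the names
      are invented, with put_file() standing in for io_put_file(). Every request
      on the bulk free list drops its file reference, with no fixed-file special
      case.

      #include <stdbool.h>
      #include <stdio.h>

      struct req {
              const char *name;
              bool fixed_file;
              int file_refs;
      };

      static void put_file(struct req *r)
      {
              if (r->file_refs > 0)
                      r->file_refs--;
      }

      static void bulk_free(struct req *reqs, int nr)
      {
              for (int i = 0; i < nr; i++) {
                      put_file(&reqs[i]);     /* no fixed vs non-fixed branch */
                      printf("%s: refs now %d\n", reqs[i].name, reqs[i].file_refs);
              }
      }

      int main(void)
      {
              struct req reqs[] = {
                      { "fixed",     true,  1 },
                      { "non-fixed", false, 1 },
              };

              bulk_free(reqs, 2);
              return 0;
      }
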
    • io_uring: fix zero len do_splice() · a49e9093
      Pavel Begunkov committed
      to #28170604
      
      commit c96874265cd04b4bd4a8e114ac9af039a6d83cfe upstream
      
      do_splice() doesn't expect len to be 0. Just always return 0 in this
      case as splice(2) does.
      
      Fixes: 7d67af2c0134 ("io_uring: add splice(2) support")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      a49e9093
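
      A minimal sketch of the zero-length short-circuit, with splice_op() as an
      invented stand-in for the io_uring splice handler calling into a
      do_splice()-style helper.

      #include <stdio.h>

      static long splice_op(long len)
      {
              if (len == 0)
                      return 0;       /* the fix: short-circuit, as splice(2) does */
              /* ... would otherwise call the do_splice()-style helper ... */
              return len;
      }

      int main(void)
      {
              printf("len 0    -> %ld\n", splice_op(0));
              printf("len 4096 -> %ld\n", splice_op(4096));
              return 0;
      }
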