- 04 June 2020, 40 commits
-
Committed by Jens Axboe

to #28170604 commit 2bae047ec9576da72d5003487de0bb93e747fff7 upstream

If the request has been marked as canceled, don't try and issue it. Instead just fill a canceled event and finish the request.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
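A minimal sketch of the idea; the handler shape and the poll.canceled field follow the io_uring code of that era, but treat the names as assumptions rather than the literal patch:

    static void io_async_task_func(struct callback_head *cb)
    {
        struct io_kiocb *req = container_of(cb, struct io_kiocb, task_work);
        struct async_poll *apoll = req->apoll;

        /* canceled: fill a -ECANCELED CQE and finish, don't issue */
        if (READ_ONCE(apoll->poll.canceled)) {
            io_cqring_add_event(req, -ECANCELED);
            io_double_put_req(req);
            return;
        }
        __io_queue_sqe(req, NULL);
    }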
-
Committed by Jens Axboe

to #28170604 commit 74ce6ce43d4fc6ce15efb21378d9ef26125c298b upstream

We added this for just the regular poll requests in commit a6ba632d2c24 ("io_uring: retry poll if we got woken with non-matching mask"); we should do the same for the poll handler used for pollable async requests. Move the re-wait check and arm into a helper, and call it from io_async_task_func() as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe

to #28170604 commit 88357580854aab29d27e1a443575caaedd081612 upstream

The splice file punt check uses file->f_mode to check for O_NONBLOCK, but it should be checking file->f_flags. This leads to punting even for files that have O_NONBLOCK set, which isn't necessary. This equates to checking for FMODE_PATH, which will never be set on the fd in question.

Fixes: 7d67af2c0134 ("io_uring: add splice(2) support")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
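The essence of the fix, sketched against the splice punt helper of that kernel (helper and callee names assumed from context):

    static bool io_splice_punt(struct file *file)
    {
        if (get_pipe_info(file))
            return false;
        if (!io_file_supports_async(file))
            return true;
        /* O_NONBLOCK lives in f_flags; f_mode holds FMODE_* bits, so the
         * old f_mode test could only ever have matched FMODE_PATH */
        return !(file->f_flags & O_NONBLOCK);
    }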
-
Committed by Xiaoguang Wang

to #28170604 commit b1f573bd15fda2e19ea66a4d26fae8be1b12791d upstream

When running liburing test case 'accept', I got below warning:

    CRED: Invalid credentials
    CRED: At include/linux/cred.h:285
    CRED: Specified credentials: 00000000d02474a0
    CRED: ->magic=4b, put_addr=000000005b4f46e9
    CRED: ->usage=-1699227648, subscr=-25693
    CRED: ->*uid = { 256,-25693,-25693,65534 }
    CRED: ->*gid = { 0,-1925859360,-1789740800,-1827028688 }
    CRED: ->security is 00000000258c136e
    general protection fault, probably for non-canonical address 0xdead4ead00000000: 0000 [#1] SMP PTI
    CPU: 21 PID: 2037 Comm: accept Not tainted 5.6.0+ #318
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
    RIP: 0010:dump_invalid_creds+0x16f/0x184
    Code: 48 8b 83 88 00 00 00 48 3d ff 0f 00 00 76 29 48 89 c2 81 e2 00 ff ff ff 48 81 fa 00 6b 6b 6b 74 17 5b 48 c7 c7 4b b1 10 8e 5d <8b> 50 04 41 5c 8b 30 41 5d e9 67 e3 04 00 5b 5d 41 5c 41 5d c3 0f
    RSP: 0018:ffffacc1039dfb38 EFLAGS: 00010087
    RAX: dead4ead00000000 RBX: ffff9ba39319c100 RCX: 0000000000000007
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8e10b14b
    RBP: ffffffff8e108476 R08: 0000000000000000 R09: 0000000000000001
    R10: 0000000000000000 R11: ffffacc1039df9e5 R12: 000000009552b900
    R13: 000000009319c130 R14: ffff9ba39319c100 R15: 0000000000000246
    FS:  00007f96b2bfc4c0(0000) GS:ffff9ba39f340000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000401870 CR3: 00000007db7a4000 CR4: 00000000000006e0
    Call Trace:
     __invalid_creds+0x48/0x4a
     __io_req_aux_free+0x2e8/0x3b0
     ? io_poll_remove_one+0x2a/0x1d0
     __io_free_req+0x18/0x200
     io_free_req+0x31/0x350
     io_poll_remove_one+0x17f/0x1d0
     io_poll_cancel.isra.80+0x6c/0x80
     io_async_find_and_cancel+0x111/0x120
     io_issue_sqe+0x181/0x10e0
     ? __lock_acquire+0x552/0xae0
     ? lock_acquire+0x8e/0x310
     ? fs_reclaim_acquire.part.97+0x5/0x30
     __io_queue_sqe.part.100+0xc4/0x580
     ? io_submit_sqes+0x751/0xbd0
     ? rcu_read_lock_sched_held+0x32/0x40
     io_submit_sqes+0x9ba/0xbd0
     ? __x64_sys_io_uring_enter+0x2b2/0x460
     ? __x64_sys_io_uring_enter+0xaf/0x460
     ? find_held_lock+0x2d/0x90
     ? __x64_sys_io_uring_enter+0x111/0x460
     __x64_sys_io_uring_enter+0x2d7/0x460
     do_syscall_64+0x5a/0x230
     entry_SYSCALL_64_after_hwframe+0x49/0xb3

After looking into the code, it turns out that this issue is because we didn't restore req->work, which is changed in io_arm_poll_handler(); req->work is a union with the struct below:

    struct {
        struct callback_head    task_work;
        struct hlist_node       hash_node;
        struct async_poll       *apoll;
    };

If we forget to restore it, the members in struct io_wq_work will be invalid. Restore req->work to fix this issue.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

Get rid of the not-needed 'need_restore' variable.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
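A sketch of the save/restore, assuming the saved copy is parked in struct async_poll next to the poll data (field placement assumed, not copied from the patch):

    struct async_poll {
        struct io_poll_iocb poll;
        struct io_wq_work   work;   /* saved copy of req->work */
    };

    /* arming: the union members are about to clobber req->work */
    memcpy(&apoll->work, &req->work, sizeof(apoll->work));

    /* cancel/teardown: put the saved copy back before freeing the request */
    memcpy(&req->work, &apoll->work, sizeof(req->work));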
-
Committed by Pavel Begunkov

to #28170604 commit ef4ff581102a917a69877feca2e5347e2f3e458c upstream

Request initialisation is scattered across several functions, namely io_init_req(), io_submit_sqes(), io_submit_sqe(). Put it in io_init_req() for better data locality and code clarity.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit dea3b49c7fb09b4f6b6a574c0485ffeb9df7b69c upstream

It's a good idea to not read sqe->flags twice, as it's prone to security bugs. Instead of passing it around, embed them in req->flags. It's already so except for IOSQE_IO_LINK:

1. rename former REQ_F_LINK -> REQ_F_LINK_HEAD
2. introduce and copy REQ_F_LINK, which mimics IOSQE_IO_LINK

And leave req_set_fail_links() using the new REQ_F_LINK, because it's more sensible.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
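Sketched, with SQE_VALID_FLAGS standing in for the validity mask the code uses:

    unsigned int sqe_flags;

    /* read the userspace-shared flags exactly once ... */
    sqe_flags = READ_ONCE(sqe->flags);
    if (unlikely(sqe_flags & ~SQE_VALID_FLAGS))
        return -EINVAL;
    /* ... and embed them; REQ_F_LINK now mirrors IOSQE_IO_LINK, while the
     * head of a link chain carries the renamed REQ_F_LINK_HEAD */
    req->flags |= sqe_flags;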
-
Committed by Pavel Begunkov

to #28170604 commit 1d4240cc9e7bb101dac58f30283fa24a809f5606 upstream

Having only one place for cleaning up a request after a link assembly/submission failure will come in handy in the future. At least it allows us to remove the duplicated cleanup sequence.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit bf9c2f1cdcc718b6d2d41172f6ca005fe22cc7ff upstream

As a preparation for extracting request init bits, remove self-coded mm tracking from io_submit_sqes(), but rely on current->mm. It's more convenient than passing this piece of state around in other functions.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit dccc587f6c07ccc734588226fdf62f685558e89f upstream

If io_submit_sqes() can't grab an mm, it fails and exits right away. There is no need to track the fact of the failure. Remove @mm_fault.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe

to #28170604 commit 85faa7b8346ebef0606d2d0df6d3f8c76acb3654 upstream

We can't reliably wait in io_ring_ctx_wait_and_kill(), since the task_works list isn't ordered (in fact it's LIFO ordered). We could either fix this with a separate task_works list for io_uring work, or just punt the wait-and-free to async context. This ensures that task_work that comes in while we're shutting down is processed correctly. If we don't go async, we could have work past the fput() work for the ring that depends on work that won't be executed until after we're done with the wait-and-free. But as this operation is blocking, it'll never get a chance to run.

This was reproduced with hundreds of thousands of sockets running memcached; we haven't been able to reproduce it synthetically.

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit c398ecb3d611925e4a5411afdf7489914a5c0460 upstream

If completion queue overflow occurs, __io_cqring_fill_event() will update req->cflags, which is in a union with req->work and happens to be aliased to req->work.fs. The following io_free_req() -> io_req_work_drop_env() may then hit a bunch of different problems (miscounted fs->users, segfault, etc) on cleaning @fs.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit 9c280f9087118099f50566e906b9d9d5a0fb4529 upstream

Don't re-read userspace-shared sqe->flags; it can be exploited. sqe->flags are copied into req->flags in io_submit_sqe(), so check them there instead.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit 0553b8bda8709c47863eab3fff7ac32ad04ca52b upstream

io_get_req() does two different things: io_kiocb allocation and initialisation. Move the init part out of it and rename it to io_alloc_req(). It's simpler this way and also has better data locality.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit b1e50e549b1372d9742509230dc4af7dd521d984 upstream

As io_get_sqe() is split into a two-stage get/consume, get an sqe before allocating the io_kiocb, so that no free_req*() is needed for the failure case, and inline __io_req_do_free() back into its only user.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit 709b302faddfac757d87df2080f900eccb1dc9e2 upstream

Make io_get_sqring() care only about sqes themselves, not initialising the io_kiocb. Also, split it into get + consume; that will be helpful in the future.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
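Roughly, the submission loop after the split (a hedged sketch, not the verbatim patch):

    for (i = 0; i < nr; i++) {
        const struct io_uring_sqe *sqe;
        struct io_kiocb *req;

        sqe = io_get_sqe(ctx);          /* peek: cached sq head not advanced */
        if (!sqe)
            break;
        req = io_alloc_req(ctx, &state);
        if (unlikely(!req))
            break;                      /* nothing to undo: sqe not consumed */
        io_consume_sqe(ctx);            /* now commit the cached head */
        /* init (io_init_req()) and submit follow, using only req + sqe */
    }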
-
Committed by Xiaoguang Wang

to #28170604 commit 45097daea2f4e89bdb1c98359f78d0d6feb8e5c8 upstream

In io_read_prep() or io_write_prep(), io_req_map_rw() takes struct io_async_rw's fast_iov as an argument to call io_import_iovec(), and if io_import_iovec() uses struct io_async_rw's fast_iov as the valid iovec array, io_req_map_rw() indeed does not need to do the later memcpy operation, because they are the same pointer.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe

to #28170604 commit 08a1d26eb894a9dcf79f674558a284ad1ffef517 upstream

OPENAT2 correctly sets O_LARGEFILE if it has to, but that escaped the OPENAT opcode. Dmitry reports that his test case that compares openat() and IORING_OP_OPENAT sees failures on large files:

    *** sync openat
    openat succeeded
    sync write at offset 0
    write succeeded
    sync write at offset 4294967296
    write succeeded

    *** sync openat
    openat succeeded
    io_uring write at offset 0
    write succeeded
    io_uring write at offset 4294967296
    write succeeded

    *** io_uring openat
    openat succeeded
    sync write at offset 0
    write succeeded
    sync write at offset 4294967296
    write failed: File too large

    *** io_uring openat
    openat succeeded
    io_uring write at offset 0
    write succeeded
    io_uring write at offset 4294967296
    write failed: File too large

Ensure we set O_LARGEFILE, if force_o_largefile() is true.

Cc: stable@vger.kernel.org # v5.6
Fixes: 15b71abe7b52 ("io_uring: add support for IORING_OP_OPENAT")
Reported-by: Dmitry Kadashev <dkadashev@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
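The fix itself is small; in the OPENAT prep path it amounts to roughly:

    req->open.how.mode = READ_ONCE(sqe->len);
    req->open.how.flags = READ_ONCE(sqe->open_flags);
    /* mirror what build_open_how() does for the openat(2) syscall */
    if (force_o_largefile())
        req->open.how.flags |= O_LARGEFILE;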
-
Committed by Xiaoguang Wang

to #28170604 commit f7fe9346869a12efe3af3cc9be2e45a1b6ff8761 upstream

syzbot reports below warning:

    INFO: trying to register non-static key.
    the code is fine but needs lockdep annotation.
    turning off the locking correctness validator.
    CPU: 1 PID: 7099 Comm: syz-executor897 Not tainted 5.6.0-next-20200406-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
     __dump_stack lib/dump_stack.c:77 [inline]
     dump_stack+0x188/0x20d lib/dump_stack.c:118
     assign_lock_key kernel/locking/lockdep.c:913 [inline]
     register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225
     __lock_acquire+0x104/0x4e00 kernel/locking/lockdep.c:4223
     lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4923
     __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
     _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159
     io_sqe_files_register fs/io_uring.c:6599 [inline]
     __io_uring_register+0x1fe8/0x2f00 fs/io_uring.c:8001
     __do_sys_io_uring_register fs/io_uring.c:8081 [inline]
     __se_sys_io_uring_register fs/io_uring.c:8063 [inline]
     __x64_sys_io_uring_register+0x192/0x560 fs/io_uring.c:8063
     do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
     entry_SYSCALL_64_after_hwframe+0x49/0xb3
    RIP: 0033:0x440289
    Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007ffff1bbf558 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440289
    RDX: 0000000020000280 RSI: 0000000000000002 RDI: 0000000000000003
    RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
    R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000401b10
    R13: 0000000000401ba0 R14: 0000000000000000 R15: 0000000000000000

Initialize struct fixed_file_data's lock to fix this issue.

Reported-by: syzbot+e6eeca4a035da76b3065@syzkaller.appspotmail.com
Fixes: 055895537302 ("io_uring: refactor file register/unregister/update handling")
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
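The one-line fix lands where the fixed_file_data is set up in io_sqe_files_register(); the surrounding lines are sketched from context:

    file_data->ctx = ctx;
    init_completion(&file_data->done);
    spin_lock_init(&file_data->lock);   /* the missing init behind the splat */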
-
Committed by Colin Ian King

to #28170604 commit 211fea18a7bb9b8d51cb5d2b9cbe5583af256609 upstream

An earlier commit "io_uring: remove @nxt from handlers" removed the setting of pointer nxt, and now it is always null; hence the non-null check and the call to io_wq_assign_next are redundant and can be removed.

Addresses-Coverity: ("'Constant' variable guard")
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit 48bdd849e967f1c573d2b2bc24308e24a83f39c2 upstream

If io_get_req() fails, it drops a ref. Then, while keeping @submitted unmodified, io_submit_sqes() breaks the loop and puts @nr - @submitted refs. For each submitted req a ref is dropped in io_put_req() and friends. So, for @nr taken refs there will be (@nr - @submitted + @submitted + 1) dropped.

Remove ctx refcounting from io_get_req(), which at the same time makes it clearer.

Fixes: 2b85edfc0c90 ("io_uring: batch getting pcpu references")
Cc: stable@vger.kernel.org # v5.6
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
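With the fix, the ref accounting balances out like this (hedged sketch):

    /* batch: take one ctx ref per sqe up front */
    if (!percpu_ref_tryget_many(&ctx->refs, nr))
        return -EAGAIN;

    /* ... submission loop; io_alloc_req() no longer puts a ref on failure ... */

    /* unused refs are returned in exactly one place */
    if (unlikely(submitted != nr))
        percpu_ref_put_many(&ctx->refs, nr - submitted);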
-
Committed by Bijan Mottahedeh

to #28170604 commit 581f981034890dfd27be7e98946e8f0461f3967a upstream

A request that completes with an -EAGAIN result after it has been added to the poll list will not be removed from that list in io_do_iopoll(), because the f_op->iopoll() will not succeed for that request. Maintain a retryable local list similar to the done list, and explicitly reissue requests with an -EAGAIN result.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe

to #28170604 commit c336e992cb1cb1db9ee608dfb30342ae781057ab upstream

We already checked this limit when the file was opened, and we keep it open in the file table. Hence when we added unix_inflight to the count we want to register, we're doubly accounting these files. This results in -EMFILE for file registration, if we're at half the limit.

Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe

to #28170604 commit aa96bf8a9ee33457b7e3ea43e97dfa1e3a15ab20 upstream

If the original task is (or has) exited, then the task work will not get queued properly. Allow for using the io-wq manager task to queue this work for execution, and ensure that the io-wq manager notices and runs this work if woken up (or exiting).

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
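Sketch of the fallback (io_wq_get_task() was added for this; the notify argument is simplified to the bool it was in that kernel):

    ret = task_work_add(tsk, &req->task_work, true);
    if (unlikely(ret)) {
        /* original task is exiting: queue on the io-wq manager task
         * instead, and make sure it wakes up to notice */
        tsk = io_wq_get_task(req->ctx->io_wq);
        task_work_add(tsk, &req->task_work, false);
        wake_up_process(tsk);
    }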
-
Committed by Jens Axboe

to #28170604 commit 3537b6a7c65434d0d2cc0c9862e69be11c367fdc upstream

We can have a task exit if it's not the owner of the ring. Be safe and grab an actual reference to it, to avoid a potential use-after-free.

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe

to #28170604 commit a6ba632d2c249a4390289727c07b8b55eb02a41d upstream

If we get woken and the poll doesn't match our mask, re-add the task to the poll waitqueue and try again instead of completing the request with a mask of 0.

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
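In the task_work handler the retry is essentially (sketch):

    /* re-check readiness before completing */
    if (!req->result && !READ_ONCE(poll->canceled)) {
        struct poll_table_struct pt = { ._key = poll->events };

        req->result = vfs_poll(req->file, &pt) & poll->events;
    }

    /* still not ready and not canceled: back onto the waitqueue */
    if (!req->result && !READ_ONCE(poll->canceled)) {
        add_wait_queue(poll->head, &poll->wait);
        return;
    }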
-
Committed by Hillf Danton

to #28170604 commit 10bea96dcc13ad841d53bdcc9d8e731e9e0ad58f upstream

Add it to pair with prepare_to_wait() in an attempt to avoid anything weird in the field.

Fixes: b41e98524e42 ("io_uring: add per-task callback handler")
Reported-by: syzbot+0c3370f235b74b3cfd97@syzkaller.appspotmail.com
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
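The canonical pairing this restores, in its generic wait-loop shape (not the literal io_uring code):

    DEFINE_WAIT(wait);

    while (!done) {
        prepare_to_wait(&wq, &wait, TASK_UNINTERRUPTIBLE);
        if (done)
            break;
        schedule();
    }
    finish_wait(&wq, &wait);    /* the call that was missing */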
-
Committed by Xiaoguang Wang

to #28170604 commit 0558955373023b08f638c9ede36741b0e4200f58 upstream

While diving into the io_uring fileset register/unregister/update code, we found one bug in the fileset update handling. io_uring fileset update uses a percpu_ref variable to check whether we can put the previously registered file; only when the refcnt of the percpu_ref variable reaches zero can we safely put these files. But this doesn't work so well. If applications always issue requests continually, this percpu_ref will never have a chance to reach zero, and it'll always be in atomic mode. This also defeats the gains introduced by the fileset register/unregister/update feature, which are meant to reduce the atomic operation overhead of fput/fget.

To fix this issue, while applications do IORING_REGISTER_FILES or IORING_REGISTER_FILES_UPDATE operations, we allocate a new percpu_ref and kill the old percpu_ref; new requests will use the new percpu_ref. Once all previous old requests complete, old percpu_refs will be dropped and the registered files will be put safely.

Link: https://lore.kernel.org/io-uring/5a8dac33-4ca2-4847-b091-f7dcd3ad0ff3@linux.alibaba.com/T/#t
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
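The shape of the switch-over, using the fixed_file_ref_node machinery this commit introduces (field names assumed; sketched):

    struct fixed_file_ref_node *ref_node;

    ref_node = alloc_fixed_file_ref_node(ctx);  /* fresh percpu_ref */
    if (IS_ERR(ref_node))
        return PTR_ERR(ref_node);

    /* in-flight requests drain the old ref; when it reaches zero, the
     * files it guarded are finally put and the node is freed */
    percpu_ref_kill(data->cur_refs);
    data->cur_refs = &ref_node->refs;   /* new requests grab the new ref */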
-
Committed by Xiaoguang Wang

to #28170604 commit 3d9932a8b240c9019f48358e8a6928c53c2c7f6b upstream

Clean up io_alloc_async_ctx() a bit and add a new __io_alloc_async_ctx(), so io_setup_async_rw() won't need to check whether async_ctx is true or false again.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Chucheng Luo

to #28170604 commit bff6035d0c40fa1dd195aa41f61814d622883420 upstream

The missing 'return' may make it hard for other developers to understand it.

Signed-off-by: Chucheng Luo <luochucheng@vivo.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit 86f3cd1b589a10dbdca98c52cc0cd0f56523c9b3 upstream

We always punt async buffered writes to an io-wq helper, as the core kernel does not have IOCB_NOWAIT support for that. Most buffered async writes complete very quickly, as it's just a copy operation. This means that doing multiple locking roundtrips on the shared wqe lock for each buffered write is wasteful. Additionally, buffered writes are hashed work items, which means that any buffered write to a given file is serialized.

Keep identically hashed work items contiguous in @wqe->work_list, and track a tail for each hash bucket. On dequeue of a hashed item, splice all of the same hash in one go using the tracked tail. Until the batch is done, the caller doesn't have to synchronize with the wqe or worker locks again.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
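On the enqueue side, same-hash work chains behind a per-bucket tail, roughly:

    static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work)
    {
        const int bit = io_get_work_hash(work);
        struct io_wq_work *tail;

        if (!io_wq_is_hashed(work)) {
            wq_list_add_tail(&work->list, &wqe->work_list);
            return;
        }

        tail = wqe->hash_tail[bit];
        wqe->hash_tail[bit] = work;
        if (!tail) {
            wq_list_add_tail(&work->list, &wqe->work_list);
            return;
        }
        /* dequeue can later splice the whole same-hash run in one go */
        wq_list_add_after(&work->list, &tail->list, &wqe->work_list);
    }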
-
Committed by Hillf Danton

to #28170604 commit a5318d3cdffbecf075928363d7e4becfeddabfcb upstream

Sync removal of a file is only used in case of a GFP_KERNEL kmalloc failure, at the cost of io_file_put::done and a work flush, while a glitch like that can be handled at the call site without too much pain. That said, what is proposed is to drop sync removal of the file, and the kink in the neck as well.

Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Hillf Danton

to #28170604 commit 4afdb733b1606c6cb86e7833f9335f4870cf7ddd upstream

A case of task hung was reported by syzbot:

    INFO: task syz-executor975:9880 blocked for more than 143 seconds.
      Not tainted 5.6.0-rc6-syzkaller #0
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syz-executor975 D27576  9880   9878 0x80004000
    Call Trace:
     schedule+0xd0/0x2a0 kernel/sched/core.c:4154
     schedule_timeout+0x6db/0xba0 kernel/time/timer.c:1871
     do_wait_for_common kernel/sched/completion.c:83 [inline]
     __wait_for_common kernel/sched/completion.c:104 [inline]
     wait_for_common kernel/sched/completion.c:115 [inline]
     wait_for_completion+0x26a/0x3c0 kernel/sched/completion.c:136
     io_queue_file_removal+0x1af/0x1e0 fs/io_uring.c:5826
     __io_sqe_files_update.isra.0+0x3a1/0xb00 fs/io_uring.c:5867
     io_sqe_files_update fs/io_uring.c:5918 [inline]
     __io_uring_register+0x377/0x2c00 fs/io_uring.c:7131
     __do_sys_io_uring_register fs/io_uring.c:7202 [inline]
     __se_sys_io_uring_register fs/io_uring.c:7184 [inline]
     __x64_sys_io_uring_register+0x192/0x560 fs/io_uring.c:7184
     do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

and bisect pointed to 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update").

It is down to the order that we wait for work done before flushing it while nobody is likely going to wake us up.

We can drop that completion on stack as flushing work itself is a sync operation we need and no more is left behind it. To that end, io_file_put::done is re-used for indicating if it can be freed in the workqueue worker context.

Reported-and-Inspired-by: syzbot <syzbot+538d1957ce178382a394@syzkaller.appspotmail.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>

Rename ->done to ->free_pfile

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit 18a542ff19ad149fac9e5a36a4012e3cac7b3b3b upstream

work->data and work->list are shared in a union. io_wq_assign_next() sets ->data if a req has a linked_timeout, but then io-wq may want to use work->list, e.g. to re-enqueue a request, corrupting ->data. ->data is not necessary; just remove it and extract the linked_timeout through @link_list.

Fixes: 60cf46ae6054 ("io-wq: hash dependent work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit f2cf11492b8b30d89b2fbf525c9ea5e8c4ccc842 upstream

After io_assign_current_work() of a linked work, it can be decided to offload it to another thread, doing io_wqe_enqueue(). However, until the next io_assign_current_work() it can be cancelled, which isn't handled. Don't assign it if it's not going to be executed.

Fixes: 60cf46ae6054 ("io-wq: hash dependent work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Lukas Bulwahn

to #28170604 commit 9f5834c868e901b00f1bfe4d0052b5906b4a2b7f upstream

Commit bbbdeb4720a0 ("io_uring: dual license io_uring.h uapi header") uses a nested SPDX-License-Identifier to dual license the header. Since then, ./scripts/spdxcheck.py complains:

    include/uapi/linux/io_uring.h: 1:60 Missing parentheses: OR

Add parentheses to make spdxcheck.py happy.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
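The resulting first line of include/uapi/linux/io_uring.h:

    /* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR MIT */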
-
Committed by Jens Axboe

to #28170604 commit 4ed734b0d0913e566a9d871e15d24eb240f269f7 upstream

With the previous fixes for number of files open checking, I added some debug code to see if we had other spots where we're checking rlimit() against the async io-wq workers. The only one I found was file size checking, which we should also honor.

During write and fallocate prep, store the max file size and override that for the current ask if we're in io-wq worker context.

Cc: stable@vger.kernel.org # 5.1+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
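Sketch of the two halves: capture at prep time in the submitting task, override around the issue in the worker:

    /* prep, still in the submitting task */
    req->fsize = rlimit(RLIMIT_FSIZE);

    /* io-wq worker, bracketing the actual write/fallocate */
    if (io_wq_current_is_worker())
        current->signal->rlim[RLIMIT_FSIZE].rlim_cur = req->fsize;
    /* ... vfs_write() / vfs_fallocate() ... */
    if (io_wq_current_is_worker())
        current->signal->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY;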
-
Committed by Pavel Begunkov

to #28170604 commit 60cf46ae605446feb0c43c472c0fd1af4cd96231 upstream

Enable io-wq hashing stuff for dependent works simply by re-enqueueing such requests.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit 8766dd516c535abf04491dca674d0ef6c95d814f upstream

It's a preparation patch removing io_wq_enqueue_hashed(), which now should be done by io_wq_hash_work() + io_wq_enqueue(). Also, set the hash value for dependent works, and do it as late as possible, because req->file can be unavailable before then. This hash will be ignored by io-wq.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
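The replacement helper just marks the work; io-wq ignores the hash unless the flag is set. Roughly:

    static inline void io_wq_hash_work(struct io_wq_work *work, void *val)
    {
        unsigned int bit;

        bit = hash_ptr(val, IO_WQ_HASH_SHIFT);
        work->flags |= (IO_WQ_WORK_HASHED | (bit << IO_WQ_HASH_SHIFT));
    }

    /* callers: io_wq_hash_work(&req->work, file_inode(req->file));
     * then a plain io_wq_enqueue() */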
-
Committed by Pavel Begunkov

to #28170604 commit d78298e73a3443a3c1766fa89f5370f52a4efd94 upstream

This little tweak restores the behaviour that was before the recent io_worker_handle_work() optimisation patches. It makes the function do cond_resched() and flush_signals() only if there is actual work to execute.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit 2293b4195800f88de2c454a24b25874be56d87f3 upstream

Deduplicate the cancellation parts, as many of them look the same, e.g.:

- io_wqe_cancel_cb_work() and io_wqe_cancel_work()
- io_wq_worker_cancel() and io_work_cancel()

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-