- 29 Jun, 2020 (13 commits)
-
-
Committed by Bijan Mottahedeh
to #28736503 commit 1d9e1288039a47dc1189c3c1fed5cf3c215e94b7 upstream

Separate statx data from open in io_kiocb. No functional changes.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
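For context: io_kiocb keeps a per-opcode payload union, and this change gives statx its own member instead of reusing the open payload. A hedged sketch of the struct's shape (field list reconstructed from the commit's intent, not copied from the diff):

    struct io_statx {
            struct file             *file;
            int                     dfd;        /* directory fd, as in statx(2) */
            unsigned int            mask;       /* STATX_* request mask */
            unsigned int            flags;      /* AT_* lookup flags */
            const char __user       *filename;
            struct statx __user     *buffer;    /* user result buffer */
    };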
-
Committed by Pavel Begunkov
to #28736503 commit 0bf0eefdab52d9f9f3a1eeda32a4fc7afe4e9219 upstream

io_close() was punting async manually to skip grabbing files. Use REQ_F_NO_FILE_TABLE instead, and pass it through the generic path with -EAGAIN.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
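A hedged sketch of the described pattern (reconstructed from the commit message; the surrounding io_close() details are assumptions):

    /* close must punt if the file has a ->flush() we can't run inline */
    if (close->put_file->f_op->flush && force_nonblock) {
            /* the async worker doesn't need current->files for close */
            req->flags |= REQ_F_NO_FILE_TABLE;
            return -EAGAIN;    /* generic path requeues this to io-wq */
    }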
-
Committed by Pavel Begunkov
to #28736503 commit 0451894522108d6c72934aff6ef89023743a9ed4 upstream

io_commit_cqring() assembly doesn't look good with extra code handling drained requests. IOSQE_IO_DRAIN is slow and discouraged to be used in a hot path, so try to minimise its impact by putting it into a helper and doing a fast check.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
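The shape of the resulting fast path, as a hedged sketch (helper names assumed from the description, not taken from the diff):

    static inline void io_commit_cqring(struct io_ring_ctx *ctx)
    {
            __io_commit_cqring(ctx);

            /* IOSQE_IO_DRAIN is rare: the hot path pays one cheap test,
             * and the drain processing itself stays out of line */
            if (unlikely(!list_empty(&ctx->defer_list)))
                    __io_queue_deferred(ctx);
    }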
-
Committed by Pavel Begunkov

to #28736503 commit 56080b02ed6e71fbc0add2d05a32ed7361dd736a upstream

SQEs are user writable; don't read sqe->off twice in io_timeout_prep().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
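This is the classic double-fetch hazard: the SQE sits in memory the application may modify concurrently, so a second load of sqe->off can observe a different value than the one that was validated. A minimal sketch of the safe pattern (field and variable names are assumptions):

    /* snapshot the user-writable field exactly once ... */
    u32 off = READ_ONCE(sqe->off);

    /* ... then validate and use only the local copy */
    if (off && is_timeout_link)
            return -EINVAL;
    req->timeout.count = off;    /* never re-read sqe->off after this */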
-
Committed by Pavel Begunkov

to #28736503 commit 733f5c95e6fdabd05b8dfc15e04512809c9652c2 upstream

Move spin_lock_irq() earlier to have only one call site of it in io_timeout(). It makes the flow easier to follow.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28736503 commit 4518a3cc273cf82efdd36522fb1f13baad173c70 upstream

In io_uring_cancel_files(), after refcount_sub_and_test() leaves req->refs at 0, it calls io_put_req(), which would put yet another ref. Call io_free_req() instead.

Cc: stable@vger.kernel.org
Fixes: 2ca10259b418 ("io_uring: prune request from overflow list on flush")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
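In refcount terms the bug is a double drop; a hedged sketch of the fixed flow (variable names assumed):

    /* drop both refs we hold on the request being cancelled */
    if (refcount_sub_and_test(2, &cancel_req->refs))
            /* refs already reached zero: free it directly ... */
            io_free_req(cancel_req);
    /* ... whereas io_put_req() here would subtract yet another ref
     * from an already-zero counter, i.e. a refcount underflow */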
-
Committed by Xiaoguang Wang

to #28736503 commit 6b668c9b7fc6fc0c313cdaee8b75d17f4d954ab5 upstream

When IORING_SETUP_SQPOLL is enabled, io_ring_ctx_wait_and_kill() will wait for the sq thread to idle by busy looping:

    while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait))
            cond_resched();

The above loop isn't very CPU friendly; it may introduce a short cpu burst on the current cpu. If ctx->refs is dying, we forbid the sq thread from submitting any further SQEs. Instead they just get discarded when we exit.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
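A hedged sketch of the replacement check in io_sq_thread() (the exact io_submit_sqes() signature varies between kernel versions):

    to_submit = io_sqring_entries(ctx);
    /* once ctx->refs is dying, stop consuming SQEs altogether;
     * whatever is left in the SQ ring is discarded at teardown,
     * so nobody needs to busy-wait for this thread to idle */
    if (likely(!percpu_ref_is_dying(&ctx->refs)))
            ret = io_submit_sqes(ctx, to_submit);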
-
Committed by Pavel Begunkov

to #28736503 commit 0cdaf760f42eb8e8a714c1cc017423e5da6d4936 upstream

A submission is "async" iff it's done by the SQPOLL thread. Instead of passing an @async flag into io_submit_sqes(), deduce it from ctx->flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Pavel Begunkov
to #28736503 commit f2a8d5c7a218b9c24befb756c4eb30aa550ce822 upstream

Add IORING_OP_TEE implementing tee(2) support. Almost identical to the splice bits, but without offsets.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
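A runnable userspace demo of the new opcode via liburing (assumes a liburing new enough to provide io_uring_prep_tee(), i.e. liburing >= 0.7 on a kernel with this patch):

    #include <liburing.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            struct io_uring ring;
            struct io_uring_sqe *sqe;
            struct io_uring_cqe *cqe;
            char buf[8] = {0};
            int in[2], out[2];

            if (pipe(in) || pipe(out) || io_uring_queue_init(4, &ring, 0))
                    return 1;
            write(in[1], "hello", 5);

            sqe = io_uring_get_sqe(&ring);
            /* like splice(2) but without offsets: duplicate, don't consume */
            io_uring_prep_tee(sqe, in[0], out[1], 5, 0);
            io_uring_submit(&ring);
            io_uring_wait_cqe(&ring, &cqe);
            printf("tee res=%d\n", cqe->res);    /* expect 5 */
            io_uring_cqe_seen(&ring, cqe);

            read(out[0], buf, 5);                /* "hello" landed here ... */
            read(in[0], buf, 5);                 /* ... and is still here too */
            io_uring_queue_exit(&ring);
            return 0;
    }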
-
Committed by Pavel Begunkov

to #28736503 commit c11368a57be460de889696f6ff8815fbcacf4db2 upstream

req->flags stores all sqe->flags. After checking that sqe->flags is a valid set of IOSQE* flags, there's no need to double check; just forward them all.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28736503 commit 9f13c35b33fddb186beab9ef21c555a01e45f4d7 upstream

io_file_put() deals with flushing state's file refs; adding "state" to its name makes it a bit clearer. Also, avoid a double check of state->file in __io_file_get() in some cases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Jens Axboe

to #28736503 commit 3bfa5bcb26f0b52d7ae8416aa0618fff21aceaaf upstream

We only need apoll in the one section; do the juggling with the work restoration there. This removes a special case further down as well. No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Xiaoguang Wang

to #28736503 commit 7d01bd745a8f52ff2883f661235139ab6e7d23e6 upstream

The "struct io_submit_state *state" parameter is not used; remove it.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
- 12 Jun, 2020 (1 commit)
-
-
Committed by Jiufei Xue
task #27774850 commit e697deed834de15d2322d0619d51893022c90ea2 upstream.

If the socket is O_NONBLOCK, we should complete the accept request with -EAGAIN when data is not ready.

Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
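A hedged sketch of the check (reconstructed from the description, not the exact diff): flag the request as nowait when the listening socket itself is nonblocking, so -EAGAIN is completed to userspace instead of being punted to a worker that would block:

    if (req->file->f_flags & O_NONBLOCK)
            req->flags |= REQ_F_NOWAIT;    /* complete -EAGAIN, don't punt */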
-
- 04 Jun, 2020 (26 commits)
-
-
Committed by Xiaoguang Wang

to #28170604 commit d4ae271dfaae2a5f41c015f2f20d62a1deeec734 upstream

In io_sq_thread(), currently if we get an -EBUSY error and go to sleep, we won't clear it again, so io_sq_thread() will never have a chance to submit sqes again. The test program test.c below can reveal this bug:

    int main(int argc, char *argv[])
    {
            struct io_uring ring;
            int i, fd, ret;
            struct io_uring_sqe *sqe;
            struct io_uring_cqe *cqe;
            struct iovec *iovecs;
            void *buf;
            struct io_uring_params p;

            if (argc < 2) {
                    printf("%s: file\n", argv[0]);
                    return 1;
            }

            memset(&p, 0, sizeof(p));
            p.flags = IORING_SETUP_SQPOLL;
            ret = io_uring_queue_init_params(4, &ring, &p);
            if (ret < 0) {
                    fprintf(stderr, "queue_init: %s\n", strerror(-ret));
                    return 1;
            }

            fd = open(argv[1], O_RDONLY | O_DIRECT);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            iovecs = calloc(10, sizeof(struct iovec));
            for (i = 0; i < 10; i++) {
                    if (posix_memalign(&buf, 4096, 4096))
                            return 1;
                    iovecs[i].iov_base = buf;
                    iovecs[i].iov_len = 4096;
            }

            ret = io_uring_register_files(&ring, &fd, 1);
            if (ret < 0) {
                    fprintf(stderr, "%s: register %d\n", __FUNCTION__, ret);
                    return ret;
            }

            for (i = 0; i < 10; i++) {
                    sqe = io_uring_get_sqe(&ring);
                    if (!sqe)
                            break;
                    io_uring_prep_readv(sqe, 0, &iovecs[i], 1, 0);
                    sqe->flags |= IOSQE_FIXED_FILE;

                    ret = io_uring_submit(&ring);
                    sleep(1);
                    printf("submit %d\n", i);
            }

            for (i = 0; i < 10; i++) {
                    io_uring_wait_cqe(&ring, &cqe);
                    printf("receive: %d\n", i);
                    if (cqe->res != 4096) {
                            fprintf(stderr, "ret=%d, wanted 4096\n", cqe->res);
                            ret = 1;
                    }
                    io_uring_cqe_seen(&ring, cqe);
            }

            close(fd);
            io_uring_queue_exit(&ring);
            return 0;
    }

    sudo ./test testfile

The above command will hang on the tenth request. To fix this bug, reset the variable 'ret' to zero when the io sq_thread is woken up.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe

to #28170604 commit b532576ed39efe3b351ae8320b2ab67a4c4c3719 upstream

We normally disable any commands that aren't specifically poll commands for a ring that is setup for polling, but we do allow buffer provide and remove commands to support buffer selection for polled IO. Once a request is issued, we add it to the poll list to poll for completion. But we should not do that for non-IO commands, as those requests complete inline immediately and aren't pollable. If we do, we can leave requests on the iopoll list after they are freed.

Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Bijan Mottahedeh

to #28170604 commit 4f4eeba87cc731b200bff9372d14a80f5996b277 upstream

kiocb.private is used in iomap_dio_rw(), so store buf_index separately.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>

Move 'buf_index' to a hole in io_kiocb.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
to #28170604 commit e3aabf9554fd04eb14cd44ae7583fc9d40edd250 upstream

We currently move it to the io_wqe_manager for execution, but we cannot safely do so as we may lack some of the state to execute it out of context. As we cancel work anyway when the ring/task exits, just mark this request as canceled and io_async_task_func() will do the right thing.

Fixes: aa96bf8a9ee3 ("io_uring: use io-wq manager as backup task if task is exiting")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe

to #28170604 commit 948a7749454b1712f1b2f2429f9493eb3e4a89b0 upstream

We checked for 'force_nonblock' higher up, so it's definitely false at this point. Kill the check; it's a remnant of when we tried to do inline splice without always punting to async context.

Fixes: 2fb3e82284fc ("io_uring: punt splice async because of inode mutex")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit bd2ab18a1d6267446eae1b47dd839050452bdf7f upstream

As with other non-inlined requests, allocate req->io for FORCE_ASYNC reqs, so they can be prepared properly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit 650b548129b60b0d23508351800108196f4aa89f upstream

If req->io is not NULL, it's already prepared. Don't do it again; it's dangerous.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
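The guard is a single early return; a sketch:

    /* a non-NULL req->io means the request was already prepared;
     * preparing it again would clobber the saved async state */
    if (req->io)
            return 0;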
-
Committed by Jens Axboe
to #28170604 commit 583863ed918136412ddf14de2e12534f17cfdc6f upstream

Ensure that ctx->sqo_wait is initialized as soon as the ctx is allocated, instead of deferring it to the offload setup. This fixes a syzbot reported lockdep complaint, which is really due to trying to wake_up on an uninitialized wait queue:

    RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441319
    RDX: 0000000000000001 RSI: 0000000020000140 RDI: 000000000000047b
    RBP: 0000000000010475 R08: 0000000000000001 R09: 00000000004002c8
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402260
    R13: 00000000004022f0 R14: 0000000000000000 R15: 0000000000000000
    INFO: trying to register non-static key.
    the code is fine but needs lockdep annotation.
    turning off the locking correctness validator.
    CPU: 1 PID: 7090 Comm: syz-executor222 Not tainted 5.7.0-rc1-next-20200415-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
     __dump_stack lib/dump_stack.c:77 [inline]
     dump_stack+0x188/0x20d lib/dump_stack.c:118
     assign_lock_key kernel/locking/lockdep.c:913 [inline]
     register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225
     __lock_acquire+0x104/0x4c50 kernel/locking/lockdep.c:4234
     lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4934
     __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
     _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159
     __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:122
     io_cqring_ev_posted+0xa5/0x1e0 fs/io_uring.c:1160
     io_poll_remove_all fs/io_uring.c:4357 [inline]
     io_ring_ctx_wait_and_kill+0x2bc/0x5a0 fs/io_uring.c:7305
     io_uring_create fs/io_uring.c:7843 [inline]
     io_uring_setup+0x115e/0x22b0 fs/io_uring.c:7870
     do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
     entry_SYSCALL_64_after_hwframe+0x49/0xb3
    RIP: 0033:0x441319
    Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9

Reported-by: syzbot+8c91f5d054e998721c57@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
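The fix itself is a one-liner in the right place; a sketch:

    /* in io_ring_ctx_alloc(): the wait queue is valid from birth, so
     * teardown can safely wake_up(&ctx->sqo_wait) even when SQPOLL
     * offload setup never ran (e.g. on an early setup failure) */
    init_waitqueue_head(&ctx->sqo_wait);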
-
Committed by Jens Axboe

to #28170604 commit 9d9e88a24c1f20ebfc2f28b1762ce78c0b9e1cb3 upstream

When we changed the file registration handling, it became important to iterate the bulk request freeing list for fixed files as well, or we miss dropping the fixed file reference. If not, we're leaking references, and we'll get a kworker stuck waiting for file references to disappear. This also means we can remove the special casing of fixed vs non-fixed files; we need to iterate for both, and we can just rely on __io_req_aux_free() doing io_put_file() instead of doing it manually.

Fixes: 055895537302 ("io_uring: refactor file register/unregister/update handling")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit c96874265cd04b4bd4a8e114ac9af039a6d83cfe upstream

do_splice() doesn't expect len to be 0. Just always return 0 in this case, as splice(2) does.

Fixes: 7d67af2c0134 ("io_uring: add splice(2) support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
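A sketch of the zero-length short-circuit, mirroring splice(2) semantics (variable names assumed):

    long ret = 0;

    /* splice(2) returns 0 for len == 0; don't call do_splice() at all */
    if (sp->len)
            ret = do_splice(in, poff_in, out, poff_out, sp->len, flags);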
-
Committed by Jens Axboe

to #28170604 commit 63ff822358b276137059520cf16e587e8073e80f upstream

We currently make some guesses as to when to open this fd, but in reality we have no business (or need) to do so at all. In fact, it makes certain things fail, like O_PATH. Remove the fd lookup from these opcodes; we're just passing the 'fd' to generic helpers anyway. With that, we can also remove the special casing of fd values in io_req_needs_file(), and the 'fd_non_neg' check that we have. And we can ensure that we only read sqe->fd once. This fixes O_PATH usage with openat/openat2, and ditto statx path side oddities.

Cc: stable@vger.kernel.org # v5.6
Reported-by: Max Kellermann <mk@cm4all.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Xiaoguang Wang

to #28170604 commit 7f13657d141346125f4d0bb93eab4777f40c406e upstream

If copy_to_user() in io_uring_setup() failed, we'll leak many kernel resources, which will only be recycled when the process terminates. This bug can be reproduced by using mprotect to set params to PROT_READ. To fix this issue, refactor io_uring_create() a bit: add a new 'struct io_uring_params __user *params' parameter and move the copy_to_user() from io_uring_setup() to io_uring_create(); if copy_to_user() fails, we can free the kernel resources properly.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
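A hedged sketch of the refactored flow (bodies elided, error-handling details assumed):

    static int io_uring_create(unsigned entries, struct io_uring_params *p,
                               struct io_uring_params __user *params)
    {
            struct io_ring_ctx *ctx;
            int ret;

            /* ... allocate ctx, rings, sq thread, register fd, etc. ... */

            /* copy back only while we can still unwind through the
             * normal teardown path on failure */
            if (copy_to_user(params, p, sizeof(*p))) {
                    ret = -EFAULT;
                    goto err;
            }
            return ret;
    err:
            io_ring_ctx_wait_and_kill(ctx);    /* frees rings, ctx, files */
            return ret;
    }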
-
Committed by Xiaoguang Wang

to #28170604 commit d8f1b9716cfd1a1f74c0fedad40c5f65a25aa208 upstream

The prepare_to_wait() and finish_wait() calls in io_uring_cancel_files() are mismatched. Currently I don't see any issues related to this; I just found it by reading the code.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit 2fb3e82284fca40afbde5351907f0a5b3be717f9 upstream

Nonblocking do_splice() may still wait for some time on an inode mutex. Let's play safe and always punt it async.

Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit 4ee3631451c9a62e6b6bc7ee51fb9a5b34e33509 upstream

io_req_defer() does double-checked locking. Use the proper helpers for that, i.e. list_empty_careful().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
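Double-checked locking here means an unlocked peek with the careful helper first, then a re-check under the lock. A hedged sketch of the shape (helper names assumed):

    if (!req_need_defer(req) && list_empty_careful(&ctx->defer_list))
            return 0;                       /* unlocked fast path */

    spin_lock_irq(&ctx->completion_lock);
    if (!req_need_defer(req) && list_empty(&ctx->defer_list)) {
            spin_unlock_irq(&ctx->completion_lock);
            return 0;                       /* confirmed under the lock */
    }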
-
Committed by Pavel Begunkov

to #28170604 commit 7759a0bfadceef3910d0e50f86d63b6ed58b4e70 upstream

    [   40.179474] refcount_t: underflow; use-after-free.
    [   40.179499] WARNING: CPU: 6 PID: 1848 at lib/refcount.c:28 refcount_warn_saturate+0xae/0xf0
    ...
    [   40.179612] RIP: 0010:refcount_warn_saturate+0xae/0xf0
    [   40.179617] Code: 28 44 0a 01 01 e8 d7 01 c2 ff 0f 0b 5d c3 80 3d 15 44 0a 01 00 75 91 48 c7 c7 b8 f5 75 be c6 05 05 44 0a 01 01 e8 b7 01 c2 ff <0f> 0b 5d c3 80 3d f3 43 0a 01 00 0f 85 6d ff ff ff 48 c7 c7 10 f6
    [   40.179619] RSP: 0018:ffffb252423ebe18 EFLAGS: 00010286
    [   40.179623] RAX: 0000000000000000 RBX: ffff98d65e929400 RCX: 0000000000000000
    [   40.179625] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 00000000ffffffff
    [   40.179627] RBP: ffffb252423ebe18 R08: 0000000000000001 R09: 000000000000055d
    [   40.179629] R10: 0000000000000c8c R11: 0000000000000001 R12: 0000000000000000
    [   40.179631] R13: ffff98d68c434400 R14: ffff98d6a9cbaa20 R15: ffff98d6a609ccb8
    [   40.179634] FS:  0000000000000000(0000) GS:ffff98d6af580000(0000) knlGS:0000000000000000
    [   40.179636] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   40.179638] CR2: 00000000033e3194 CR3: 000000006480a003 CR4: 00000000003606e0
    [   40.179641] Call Trace:
    [   40.179652]  io_put_req+0x36/0x40
    [   40.179657]  io_free_work+0x15/0x20
    [   40.179661]  io_worker_handle_work+0x2f5/0x480
    [   40.179667]  io_wqe_worker+0x2a9/0x360
    [   40.179674]  ? _raw_spin_unlock_irqrestore+0x24/0x40
    [   40.179681]  kthread+0x12c/0x170
    [   40.179685]  ? io_worker_handle_work+0x480/0x480
    [   40.179690]  ? kthread_park+0x90/0x90
    [   40.179695]  ret_from_fork+0x35/0x40
    [   40.179702] ---[ end trace 85027405f00110aa ]---

An opcode handler must never put the submission ref, but that's what io_sync_file_range_finish() does. Use io_steal_work() there.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Xiaoguang Wang

to #28170604 commit 3fd44c86711f71156b586c22b0495c58f69358bb upstream

While working on making io_uring sqpoll mode support syscalls that need struct files_struct, I got a cpu soft lockup in io_ring_ctx_wait_and_kill():

    while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait))
            cpu_relax();

The above loop never has a chance to exit: preempt isn't enabled in the kernel, and the context calling io_ring_ctx_wait_and_kill() and io_sq_thread() run on the same cpu. If io_sq_thread() calls cond_resched() to yield the cpu and another context enters the above loop, then io_sq_thread() will stay in the runqueue and never exit. Using cond_resched() in the loop instead fixes this issue.

Reported-by: syzbot+66243bb7126c410cefe6@syzkaller.appspotmail.com
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Bijan Mottahedeh
to #28170604 commit dd461af65946de060bff2dab08a63676d2731afe upstream

Use ctx->fallback_req address for test_and_set_bit_lock() and clear_bit_unlock().

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe

to #28170604 commit 490e89676a523c688343d6cb8ca5f5dc476414df upstream

We do blocking retry from our poll handler if the file supports polled notifications. Only mark the request as needing an async worker if we can't poll for it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe

to #28170604 commit af197f50ac53fff1241598c73ca606754a3bb808 upstream

We can have files like eventfd where it's perfectly fine to do poll based retry on them; right now io_file_supports_async() doesn't take that into account. Pass in the data direction and check the f_op instead of just always needing an async worker.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
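A hedged sketch of the direction-aware check (roughly the post-patch shape; details reconstructed from the description, not copied from the diff):

    static bool io_file_supports_async(struct file *file, int rw)
    {
            umode_t mode = file_inode(file)->i_mode;

            if (S_ISBLK(mode) || S_ISCHR(mode) || S_ISSOCK(mode))
                    return true;
            if (S_ISREG(mode) && file->f_op != &io_uring_fops)
                    return true;

            /* e.g. eventfd: pollable and fine for poll-based retry, as
             * long as the f_op can do nowait I/O in this direction */
            if (!(file->f_mode & FMODE_NOWAIT))
                    return false;
            if (rw == READ)
                    return file->f_op->read_iter != NULL;
            return file->f_op->write_iter != NULL;
    }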
-
Committed by Jens Axboe

to #28170604 commit 5b0bbee4732cbd58aa98213d4c11a366356bba3d upstream

Clay reports that OP_STATX fails for a test case with a valid fd and empty path:

    -- Test 0: statx:fd 3: SUCCEED, file mode 100755
    -- Test 1: statx:path ./uring_statx: SUCCEED, file mode 100755
    -- Test 2: io_uring_statx:fd 3: FAIL, errno 9: Bad file descriptor
    -- Test 3: io_uring_statx:path ./uring_statx: SUCCEED, file mode 100755

This is due to statx not grabbing the process file table, hence we can't look up the fd in async context. If the fd is valid, ensure that we grab the file table so we can grab the file from async context.

Cc: stable@vger.kernel.org # v5.6
Reported-by: Clay Harris <bugs@claycon.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
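A runnable userspace reproducer for Test 2's pattern, via liburing's io_uring_prep_statx() (assumes glibc >= 2.28 for struct statx); on pre-fix kernels the CQE carries -EBADF:

    #define _GNU_SOURCE
    #include <fcntl.h>        /* AT_EMPTY_PATH, O_RDONLY */
    #include <stdio.h>
    #include <sys/stat.h>     /* struct statx, STATX_BASIC_STATS */
    #include <liburing.h>

    int main(void)
    {
            struct io_uring ring;
            struct io_uring_sqe *sqe;
            struct io_uring_cqe *cqe;
            struct statx stx;
            int fd = open(".", O_RDONLY);

            if (fd < 0 || io_uring_queue_init(2, &ring, 0))
                    return 1;
            sqe = io_uring_get_sqe(&ring);
            /* valid fd + empty path: the async context must grab the
             * process file table to resolve fd, or this fails -EBADF */
            io_uring_prep_statx(sqe, fd, "", AT_EMPTY_PATH,
                                STATX_BASIC_STATS, &stx);
            io_uring_submit(&ring);
            io_uring_wait_cqe(&ring, &cqe);
            printf("statx res=%d mode=%o\n", cqe->res, stx.stx_mode);
            io_uring_cqe_seen(&ring, cqe);
            io_uring_queue_exit(&ring);
            return 0;
    }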
-
Committed by Xiaoguang Wang

to #28170604 commit 44575a67314b3768d4415252271e8f60c5c70118 upstream

When testing the io_uring IORING_FEAT_FAST_POLL feature, I got the below panic:

    BUG: kernel NULL pointer dereference, address: 0000000000000030
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP PTI
    CPU: 5 PID: 2154 Comm: io_uring_echo_s Not tainted 5.6.0+ #359
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
    RIP: 0010:io_wq_submit_work+0xf/0xa0
    Code: ff ff ff be 02 00 00 00 e8 ae c9 19 00 e9 58 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 8b 2f <8b> 45 30 48 8d 9d 48 ff ff ff 25 01 01 00 00 83 f8 01 75 07 eb 2a
    RSP: 0018:ffffbef543e93d58 EFLAGS: 00010286
    RAX: ffffffff84364f50 RBX: ffffa3eb50f046b8 RCX: 0000000000000000
    RDX: ffffa3eb0efc1840 RSI: 0000000000000006 RDI: ffffa3eb50f046b8
    RBP: 0000000000000000 R08: 00000000fffd070d R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffffa3eb50f046b8
    R13: ffffa3eb0efc2088 R14: ffffffff85b69be0 R15: ffffa3eb0effa4b8
    FS:  00007fe9f69cc4c0(0000) GS:ffffa3eb5ef40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000030 CR3: 0000000020410000 CR4: 00000000000006e0
    Call Trace:
     task_work_run+0x6d/0xa0
     do_exit+0x39a/0xb80
     ? get_signal+0xfe/0xbc0
     do_group_exit+0x47/0xb0
     get_signal+0x14b/0xbc0
     ? __x64_sys_io_uring_enter+0x1b7/0x450
     do_signal+0x2c/0x260
     ? __x64_sys_io_uring_enter+0x228/0x450
     exit_to_usermode_loop+0x87/0xf0
     do_syscall_64+0x209/0x230
     entry_SYSCALL_64_after_hwframe+0x49/0xb3
    RIP: 0033:0x7fe9f64f8df9
    Code: Bad RIP value.

task_work_run calls io_wq_submit_work unexpectedly; it's obvious that struct callback_head's func member has been changed. After looking into the code, I found this issue is still due to the union definition:

    union {
            /*
             * Only commands that never go async can use the below fields,
             * obviously. Right now only IORING_OP_POLL_ADD uses them, and
             * async armed poll handlers for regular commands. The latter
             * restore the work, if needed.
             */
            struct {
                    struct callback_head    task_work;
                    struct hlist_node       hash_node;
                    struct async_poll       *apoll;
            };
            struct io_wq_work       work;
    };

When task_work_run has multiple works to execute, the work that calls io_poll_remove_all() will always do the req->work restore for non-poll requests. But indeed, if a non-poll request has been added to a new callback_head, a subsequent callback will call io_async_task_func() to handle this request; that means we should not do the restore work for such non-poll requests. Meanwhile in io_async_task_func(), we should drop the submit ref when the req has been canceled. Fix both issues.

Fixes: b1f573bd15fd ("io_uring: restore req->work when canceling poll request")
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

Use io_double_put_req()

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit 31af27c7cc9f675d93a135dca99e6413f9096f1d upstream

When checking for draining with __req_need_defer(), it tries to match how many requests were sent before the current one with the number already completed. Dropped SQEs are included in req->sequence, but they won't ever appear in the CQ. To compensate for that, __req_need_defer() subtracts ctx->cached_sq_dropped. However, what it should really use is the number of SQEs dropped __before__ the current one. In other words, any submitted request shouldn't affect dequeueing from the drain queue of previously submitted ones. Instead of saving the proper ctx->cached_sq_dropped in each request, subtract it from req->sequence at initialisation, so that it includes the number of properly submitted requests.

note: it also changes the behaviour of timeouts, but
1. it already diverges from the description because of using the SQ
2. the description is ambiguous regarding dropped SQEs

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit b55ce732004989c85bf9d858c03e6d477cf9023b upstream

req->timeout.count and req->io->timeout.seq_offset store the same value, which is sqe->off. Kill the second one.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Pavel Begunkov

to #28170604 commit 22cad1585c6bc6caf2688701004cf2af6865cbe0 upstream

io_timeout() can be executed asynchronously by a worker and without holding ctx->uring_lock:

1. using ctx->cached_sq_head there is racy
2. it should count events from the moment of the timeout's submission, not its execution

Use req->sequence.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe

to #28170604 commit 8e2e1faf28b3e66430f55f0b0ee83370ecc150af upstream

syzbot reports this crash:

    BUG: unable to handle page fault for address: ffffffffffffffe8
    PGD f96e17067 P4D f96e17067 PUD f96e19067 PMD 0
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    CPU: 55 PID: 211750 Comm: trinity-c127 Tainted: G    B   L    5.7.0-rc1-next-20200413 #4
    Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 04/12/2017
    RIP: 0010:__wake_up_common+0x98/0x290 kernel/sched/wait.c:87
    Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
    RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
    RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
    RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
    RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
    R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
    R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
    FS:  00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
    Call Trace:
     __wake_up_common_lock+0xea/0x150 (__wake_up_common_lock at kernel/sched/wait.c:124)
     ? __wake_up_common+0x290/0x290
     ? lockdep_hardirqs_on+0x16/0x2c0
     __wake_up+0x13/0x20
     io_cqring_ev_posted+0x75/0xe0 (io_cqring_ev_posted at fs/io_uring.c:1160)
     io_ring_ctx_wait_and_kill+0x1c0/0x2f0 (io_ring_ctx_wait_and_kill at fs/io_uring.c:7305)
     io_uring_create+0xa8d/0x13b0
     ? io_req_defer_prep+0x990/0x990
     ? __kasan_check_write+0x14/0x20
     io_uring_setup+0xb8/0x130
     ? io_uring_create+0x13b0/0x13b0
     ? check_flags.part.28+0x220/0x220
     ? lockdep_hardirqs_on+0x16/0x2c0
     __x64_sys_io_uring_setup+0x31/0x40
     do_syscall_64+0xcc/0xaf0
     ? syscall_return_slowpath+0x580/0x580
     ? lockdep_hardirqs_off+0x1f/0x140
     ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3
     ? trace_hardirqs_off_caller+0x3a/0x150
     ? trace_hardirqs_off_thunk+0x1a/0x1c
     entry_SYSCALL_64_after_hwframe+0x49/0xb3
    RIP: 0033:0x7fdcb9dd76ed
    Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48
    RSP: 002b:00007ffe7fd4e4f8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
    RAX: ffffffffffffffda RBX: 00000000000001a9 RCX: 00007fdcb9dd76ed
    RDX: fffffffffffffffc RSI: 0000000000000000 RDI: 0000000000005d54
    RBP: 00000000000001a9 R08: 0000000e31d3caa7 R09: 0082400004004000
    R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000002
    R13: 00007fdcb842e058 R14: 00007fdcba4c46c0 R15: 00007fdcb842e000
    Modules linked in: bridge stp llc nfnetlink cn brd vfat fat ext4 crc16 mbcache jbd2 loop kvm_intel kvm irqbypass intel_cstate intel_uncore dax_pmem intel_rapl_perf dax_pmem_core ip_tables x_tables xfs sd_mod tg3 firmware_class libphy hpsa scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: binfmt_misc]
    CR2: ffffffffffffffe8
    ---[ end trace f9502383d57e0e22 ]---
    RIP: 0010:__wake_up_common+0x98/0x290
    Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
    RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
    RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
    RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
    RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
    R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
    R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
    FS:  00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
    Kernel panic - not syncing: Fatal exception
    Kernel Offset: 0x29800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    ---[ end Kernel panic - not syncing: Fatal exception ]---

which is due to error injection (or allocation failure) preventing the rings from being setup. On shutdown, we attempt to remove any pending requests, and for poll requests, we call io_cqring_ev_posted() when we've killed poll requests. However, since the rings aren't setup, we won't find any poll requests. Make the calling of io_cqring_ev_posted() dependent on actually having completed requests. This fixes this setup corner case, and removes spurious calls if we remove poll requests and don't find any.

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-