- 05 6月, 2020 2 次提交
-
-
由 Pavel Begunkov 提交于
build_open_how() is just adjusting open_flags/mode. Do it once during prep. It looks better than storing raw values for the future. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
IORING_SETUP_IOPOLL is defined only for read/write, other opcodes should be disallowed, otherwise it'll get an error as below. Also refuse open/close with SQPOLL, as the polling thread wouldn't know which file table to use. RIP: 0010:io_iopoll_getevents+0x111/0x5a0 Call Trace: ? _raw_spin_unlock_irqrestore+0x24/0x40 ? do_send_sig_info+0x64/0x90 io_iopoll_reap_events.part.0+0x5e/0xa0 io_ring_ctx_wait_and_kill+0x132/0x1c0 io_uring_release+0x20/0x30 __fput+0xcd/0x230 ____fput+0xe/0x10 task_work_run+0x67/0xa0 do_exit+0x353/0xb10 ? handle_mm_fault+0xd4/0x200 ? syscall_trace_enter+0x18c/0x2c0 do_group_exit+0x43/0xa0 __x64_sys_exit_group+0x18/0x20 do_syscall_64+0x60/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> [axboe: allow provide/remove buffers and files update] Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 03 6月, 2020 1 次提交
-
-
由 Jens Axboe 提交于
A previous commit enabled this functionality, which also enabled O_PATH to work correctly with io_uring. But we can't safely close the ring itself, as the file handle isn't reference counted inside io_uring_enter(). Instead of jumping through hoops to enable ring closure, add a "soft" ->needs_file option, ->needs_file_no_error. This enables O_PATH file descriptors to work, but still catches the case of trying to close the ring itself. Reported-by: NJann Horn <jannh@google.com> Fixes: 904fbcb1 ("io_uring: remove 'fd is io_uring' from close path") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 30 5月, 2020 3 次提交
-
-
由 Pavel Begunkov 提交于
Overflowed requests in io_uring_cancel_files() should be shed only of inflight and overflowed refs. All other left references are owned by someone else. If refcount_sub_and_test() fails, it will go further and put put extra ref, don't do that. Also, don't need to do io_wq_cancel_work() for overflowed reqs, they will be let go shortly anyway. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Offset timeouts wait not for sqe->off non-timeout CQEs, but rather sqe->off + number of prior inflight requests. Wait exactly for sqe->off non-timeout completions Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Separate flushing offset timeouts io_commit_cqring() by moving it into a helper. Just a preparation, makes following patches clearer. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 27 5月, 2020 7 次提交
-
-
由 Bijan Mottahedeh 提交于
Calling statx directly both simplifies the interface and avoids potential incompatibilities between sync and async invokations. Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bijan Mottahedeh 提交于
Separate statx data from open in io_kiocb. No functional changes. Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
io_close() was punting async manually to skip grabbing files. Use REQ_F_NO_FILE_TABLE instead, and pass it through the generic path with -EAGAIN. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
io_commit_cqring() assembly doesn't look good with extra code handling drained requests. IOSQE_IO_DRAIN is slow and discouraged to be used in a hot path, so try to minimise its impact by putting it into a helper and doing a fast check. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
SQEs are user writable, don't read sqe->off twice in io_timeout_prep() Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Move spin_lock_irq() earlier to have only 1 call site of it in io_timeout(). It makes the flow easier. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
In io_uring_cancel_files(), after refcount_sub_and_test() leaves 0 req->refs, it calls io_put_req(), which would also put a ref. Call io_free_req() instead. Cc: stable@vger.kernel.org Fixes: 2ca10259 ("io_uring: prune request from overflow list on flush") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 20 5月, 2020 4 次提交
-
-
由 Xiaoguang Wang 提交于
When IORING_SETUP_SQPOLL is enabled, io_ring_ctx_wait_and_kill() will wait for sq thread to idle by busy loop: while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait)) cond_resched(); Above loop isn't very CPU friendly, it may introduce a short cpu burst on the current cpu. If ctx->refs is dying, we forbid sq_thread from submitting any further SQEs. Instead they just get discarded when we exit. Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Xiaoguang Wang 提交于
In io_sq_thread(), currently if we get an -EBUSY error and go to sleep, we will won't clear it again, which will result in io_sq_thread() will never have a chance to submit sqes again. Below test program test.c can reveal this bug: int main(int argc, char *argv[]) { struct io_uring ring; int i, fd, ret; struct io_uring_sqe *sqe; struct io_uring_cqe *cqe; struct iovec *iovecs; void *buf; struct io_uring_params p; if (argc < 2) { printf("%s: file\n", argv[0]); return 1; } memset(&p, 0, sizeof(p)); p.flags = IORING_SETUP_SQPOLL; ret = io_uring_queue_init_params(4, &ring, &p); if (ret < 0) { fprintf(stderr, "queue_init: %s\n", strerror(-ret)); return 1; } fd = open(argv[1], O_RDONLY | O_DIRECT); if (fd < 0) { perror("open"); return 1; } iovecs = calloc(10, sizeof(struct iovec)); for (i = 0; i < 10; i++) { if (posix_memalign(&buf, 4096, 4096)) return 1; iovecs[i].iov_base = buf; iovecs[i].iov_len = 4096; } ret = io_uring_register_files(&ring, &fd, 1); if (ret < 0) { fprintf(stderr, "%s: register %d\n", __FUNCTION__, ret); return ret; } for (i = 0; i < 10; i++) { sqe = io_uring_get_sqe(&ring); if (!sqe) break; io_uring_prep_readv(sqe, 0, &iovecs[i], 1, 0); sqe->flags |= IOSQE_FIXED_FILE; ret = io_uring_submit(&ring); sleep(1); printf("submit %d\n", i); } for (i = 0; i < 10; i++) { io_uring_wait_cqe(&ring, &cqe); printf("receive: %d\n", i); if (cqe->res != 4096) { fprintf(stderr, "ret=%d, wanted 4096\n", cqe->res); ret = 1; } io_uring_cqe_seen(&ring, cqe); } close(fd); io_uring_queue_exit(&ring); return 0; } sudo ./test testfile above command will hang on the tenth request, to fix this bug, when io sq_thread is waken up, we reset the variable 'ret' to be zero. Suggested-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We normally disable any commands that aren't specifically poll commands for a ring that is setup for polling, but we do allow buffer provide and remove commands to support buffer selection for polled IO. Once a request is issued, we add it to the poll list to poll for completion. But we should not do that for non-IO commands, as those request complete inline immediately and aren't pollable. If we do, we can leave requests on the iopoll list after they are freed. Fixes: ddf0322d ("io_uring: add IORING_OP_PROVIDE_BUFFERS") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bijan Mottahedeh 提交于
kiocb.private is used in iomap_dio_rw() so store buf_index separately. Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com> Move 'buf_index' to a hole in io_kiocb. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 19 5月, 2020 1 次提交
-
-
由 Jens Axboe 提交于
We currently move it to the io_wqe_manager for execution, but we cannot safely do so as we may lack some of the state to execute it out of context. As we cancel work anyway when the ring/task exits, just mark this request as canceled and io_async_task_func() will do the right thing. Fixes: aa96bf8a ("io_uring: use io-wq manager as backup task if task is exiting") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 5月, 2020 7 次提交
-
-
由 Jens Axboe 提交于
If the request is still hashed in io_async_task_func(), then it cannot have been canceled and it's pointless to check. So save that check. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We checked for 'force_nonblock' higher up, so it's definitely false at this point. Kill the check, it's a remnant of when we tried to do inline splice without always punting to async context. Fixes: 2fb3e822 ("io_uring: punt splice async because of inode mutex") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Add IORING_OP_TEE implementing tee(2) support. Almost identical to splice bits, but without offsets. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
req->flags stores all sqe->flags. After checking that sqe->flags are valid set if IOSQE* flags, no need to double check it, just forward them all. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
io_file_put() deals with flushing state's file refs, adding "state" to its name makes it a bit clearer. Also, avoid double check of state->file in __io_file_get() in some cases. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
A submission is "async" IIF it's done by SQPOLL thread. Instead of passing @async flag into io_submit_sqes(), deduce it from ctx->flags. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We only need apoll in the one section, do the juggling with the work restoration there. This removes a special case further down as well. No functional changes in this patch. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 17 5月, 2020 3 次提交
-
-
由 Pavel Begunkov 提交于
As for other not inlined requests, alloc req->io for FORCE_ASYNC reqs, so they can be prepared properly. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
If req->io is not NULL, it's already prepared. Don't do it again, it's dangerous. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Ensure that ctx->sqo_wait is initialized as soon as the ctx is allocated, instead of deferring it to the offload setup. This fixes a syzbot reported lockdep complaint, which is really due to trying to wake_up on an uninitialized wait queue: RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441319 RDX: 0000000000000001 RSI: 0000000020000140 RDI: 000000000000047b RBP: 0000000000010475 R08: 0000000000000001 R09: 00000000004002c8 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402260 R13: 00000000004022f0 R14: 0000000000000000 R15: 0000000000000000 INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 1 PID: 7090 Comm: syz-executor222 Not tainted 5.7.0-rc1-next-20200415-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 assign_lock_key kernel/locking/lockdep.c:913 [inline] register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225 __lock_acquire+0x104/0x4c50 kernel/locking/lockdep.c:4234 lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4934 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159 __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:122 io_cqring_ev_posted+0xa5/0x1e0 fs/io_uring.c:1160 io_poll_remove_all fs/io_uring.c:4357 [inline] io_ring_ctx_wait_and_kill+0x2bc/0x5a0 fs/io_uring.c:7305 io_uring_create fs/io_uring.c:7843 [inline] io_uring_setup+0x115e/0x22b0 fs/io_uring.c:7870 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x441319 Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9 Reported-by: syzbot+8c91f5d054e998721c57@syzkaller.appspotmail.com Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 16 5月, 2020 5 次提交
-
-
由 Jens Axboe 提交于
There's no point in using list_del_init() on entries that are going away, and the associated lock is always used in process context so let's not use the IRQ disabling+saving variant of the spinlock. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Stefano Garzarella 提交于
This new flag should be set/clear from the application to disable/enable eventfd notifications when a request is completed and queued to the CQ ring. Before this patch, notifications were always sent if an eventfd is registered, so IORING_CQ_EVENTFD_DISABLED is not set during the initialization. It will be up to the application to set the flag after initialization if no notifications are required at the beginning. Signed-off-by: NStefano Garzarella <sgarzare@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Stefano Garzarella 提交于
This patch adds the new 'cq_flags' field that should be written by the application and read by the kernel. This new field is available to the userspace application through 'cq_off.flags'. We are using 4-bytes previously reserved and set to zero. This means that if the application finds this field to zero, then the new functionality is not supported. In the next patch we will introduce the first flag available. Signed-off-by: NStefano Garzarella <sgarzare@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Some file descriptors use separate waitqueues for their f_ops->poll() handler, most commonly one for read and one for write. The io_uring poll implementation doesn't work with that, as the 2nd poll_wait() call will cause the io_uring poll request to -EINVAL. This affects (at least) tty devices and /dev/random as well. This is a big problem for event loops where some file descriptors work, and others don't. With this fix, io_uring handles multiple waitqueues. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We currently embed and queue a work item per fixed_file_ref_node that we update, but if the workload does a lot of these, then the associated kworker-events overhead can become quite noticeable. Since we rarely need to wait on these, batch them at 1 second intervals instead. If we do need to wait for them, we just flush the pending delayed work. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 5月, 2020 1 次提交
-
-
由 Jens Axboe 提交于
We used to have three completions, now we just have two. With the two, let's not allocate them dynamically, just embed then in the ctx and name them appropriately. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 14 5月, 2020 1 次提交
-
-
由 Jens Axboe 提交于
When we changed the file registration handling, it became important to iterate the bulk request freeing list for fixed files as well, or we miss dropping the fixed file reference. If not, we're leaking references, and we'll get a kworker stuck waiting for file references to disappear. This also means we can remove the special casing of fixed vs non-fixed files, we need to iterate for both and we can just rely on __io_req_aux_free() doing io_put_file() instead of doing it manually. Fixes: 05589553 ("io_uring: refactor file register/unregister/update handling") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 11 5月, 2020 1 次提交
-
-
由 Xiaoming Ni 提交于
Remove duplicate semicolon at the end of line in io_file_from_index() Signed-off-by: NXiaoming Ni <nixiaoming@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 10 5月, 2020 1 次提交
-
-
由 Pavel Begunkov 提交于
do_splice() doesn't expect len to be 0. Just always return 0 in this case as splice(2) does. Fixes: 7d67af2c ("io_uring: add splice(2) support") Reported-by: NJann Horn <jannh@google.com> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 5月, 2020 2 次提交
-
-
由 Xiaoguang Wang 提交于
The "struct io_submit_state *state" parameter is not used, remove it. Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
The attempt protecting us from closing the ring itself wasn't really complete, and we actually don't need it. The referencing of requests themselve, and the references they hold on the ring, ensures that the life time of the ring is sane. With the check removed, we can also remove the need to have the close operation fget() the file. Reported-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 08 5月, 2020 1 次提交
-
-
由 Jens Axboe 提交于
We currently make some guesses as when to open this fd, but in reality we have no business (or need) to do so at all. In fact, it makes certain things fail, like O_PATH. Remove the fd lookup from these opcodes, we're just passing the 'fd' to generic helpers anyway. With that, we can also remove the special casing of fd values in io_req_needs_file(), and the 'fd_non_neg' check that we have. And we can ensure that we only read sqe->fd once. This fixes O_PATH usage with openat/openat2, and ditto statx path side oddities. Cc: stable@vger.kernel.org: # v5.6 Reported-by: NMax Kellermann <mk@cm4all.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-