- 30 5月, 2020 3 次提交
-
-
由 Pavel Begunkov 提交于
Overflowed requests in io_uring_cancel_files() should be shed only of inflight and overflowed refs. All other left references are owned by someone else. If refcount_sub_and_test() fails, it will go further and put put extra ref, don't do that. Also, don't need to do io_wq_cancel_work() for overflowed reqs, they will be let go shortly anyway. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Offset timeouts wait not for sqe->off non-timeout CQEs, but rather sqe->off + number of prior inflight requests. Wait exactly for sqe->off non-timeout completions Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Separate flushing offset timeouts io_commit_cqring() by moving it into a helper. Just a preparation, makes following patches clearer. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 27 5月, 2020 7 次提交
-
-
由 Bijan Mottahedeh 提交于
Calling statx directly both simplifies the interface and avoids potential incompatibilities between sync and async invokations. Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bijan Mottahedeh 提交于
Separate statx data from open in io_kiocb. No functional changes. Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
io_close() was punting async manually to skip grabbing files. Use REQ_F_NO_FILE_TABLE instead, and pass it through the generic path with -EAGAIN. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
io_commit_cqring() assembly doesn't look good with extra code handling drained requests. IOSQE_IO_DRAIN is slow and discouraged to be used in a hot path, so try to minimise its impact by putting it into a helper and doing a fast check. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
SQEs are user writable, don't read sqe->off twice in io_timeout_prep() Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Move spin_lock_irq() earlier to have only 1 call site of it in io_timeout(). It makes the flow easier. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
In io_uring_cancel_files(), after refcount_sub_and_test() leaves 0 req->refs, it calls io_put_req(), which would also put a ref. Call io_free_req() instead. Cc: stable@vger.kernel.org Fixes: 2ca10259 ("io_uring: prune request from overflow list on flush") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 20 5月, 2020 1 次提交
-
-
由 Xiaoguang Wang 提交于
When IORING_SETUP_SQPOLL is enabled, io_ring_ctx_wait_and_kill() will wait for sq thread to idle by busy loop: while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait)) cond_resched(); Above loop isn't very CPU friendly, it may introduce a short cpu burst on the current cpu. If ctx->refs is dying, we forbid sq_thread from submitting any further SQEs. Instead they just get discarded when we exit. Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 5月, 2020 6 次提交
-
-
由 Jens Axboe 提交于
If the request is still hashed in io_async_task_func(), then it cannot have been canceled and it's pointless to check. So save that check. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Add IORING_OP_TEE implementing tee(2) support. Almost identical to splice bits, but without offsets. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
req->flags stores all sqe->flags. After checking that sqe->flags are valid set if IOSQE* flags, no need to double check it, just forward them all. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
io_file_put() deals with flushing state's file refs, adding "state" to its name makes it a bit clearer. Also, avoid double check of state->file in __io_file_get() in some cases. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
A submission is "async" IIF it's done by SQPOLL thread. Instead of passing @async flag into io_submit_sqes(), deduce it from ctx->flags. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We only need apoll in the one section, do the juggling with the work restoration there. This removes a special case further down as well. No functional changes in this patch. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 16 5月, 2020 5 次提交
-
-
由 Jens Axboe 提交于
There's no point in using list_del_init() on entries that are going away, and the associated lock is always used in process context so let's not use the IRQ disabling+saving variant of the spinlock. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Stefano Garzarella 提交于
This new flag should be set/clear from the application to disable/enable eventfd notifications when a request is completed and queued to the CQ ring. Before this patch, notifications were always sent if an eventfd is registered, so IORING_CQ_EVENTFD_DISABLED is not set during the initialization. It will be up to the application to set the flag after initialization if no notifications are required at the beginning. Signed-off-by: NStefano Garzarella <sgarzare@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Stefano Garzarella 提交于
This patch adds the new 'cq_flags' field that should be written by the application and read by the kernel. This new field is available to the userspace application through 'cq_off.flags'. We are using 4-bytes previously reserved and set to zero. This means that if the application finds this field to zero, then the new functionality is not supported. In the next patch we will introduce the first flag available. Signed-off-by: NStefano Garzarella <sgarzare@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Some file descriptors use separate waitqueues for their f_ops->poll() handler, most commonly one for read and one for write. The io_uring poll implementation doesn't work with that, as the 2nd poll_wait() call will cause the io_uring poll request to -EINVAL. This affects (at least) tty devices and /dev/random as well. This is a big problem for event loops where some file descriptors work, and others don't. With this fix, io_uring handles multiple waitqueues. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We currently embed and queue a work item per fixed_file_ref_node that we update, but if the workload does a lot of these, then the associated kworker-events overhead can become quite noticeable. Since we rarely need to wait on these, batch them at 1 second intervals instead. If we do need to wait for them, we just flush the pending delayed work. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 5月, 2020 1 次提交
-
-
由 Jens Axboe 提交于
We used to have three completions, now we just have two. With the two, let's not allocate them dynamically, just embed then in the ctx and name them appropriately. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 11 5月, 2020 1 次提交
-
-
由 Xiaoming Ni 提交于
Remove duplicate semicolon at the end of line in io_file_from_index() Signed-off-by: NXiaoming Ni <nixiaoming@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 5月, 2020 2 次提交
-
-
由 Xiaoguang Wang 提交于
The "struct io_submit_state *state" parameter is not used, remove it. Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
The attempt protecting us from closing the ring itself wasn't really complete, and we actually don't need it. The referencing of requests themselve, and the references they hold on the ring, ensures that the life time of the ring is sane. With the check removed, we can also remove the need to have the close operation fget() the file. Reported-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 08 5月, 2020 1 次提交
-
-
由 Jens Axboe 提交于
We currently make some guesses as when to open this fd, but in reality we have no business (or need) to do so at all. In fact, it makes certain things fail, like O_PATH. Remove the fd lookup from these opcodes, we're just passing the 'fd' to generic helpers anyway. With that, we can also remove the special casing of fd values in io_req_needs_file(), and the 'fd_non_neg' check that we have. And we can ensure that we only read sqe->fd once. This fixes O_PATH usage with openat/openat2, and ditto statx path side oddities. Cc: stable@vger.kernel.org: # v5.6 Reported-by: NMax Kellermann <mk@cm4all.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 06 5月, 2020 1 次提交
-
-
由 Xiaoguang Wang 提交于
If copy_to_user() in io_uring_setup() failed, we'll leak many kernel resources, which will be recycled until process terminates. This bug can be reproduced by using mprotect to set params to PROT_READ. To fix this issue, refactor io_uring_create() a bit to add a new 'struct io_uring_params __user *params' parameter and move the copy_to_user() in io_uring_setup() to io_uring_setup(), if copy_to_user() failed, we can free kernel resource properly. Suggested-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 04 5月, 2020 1 次提交
-
-
由 Xiaoguang Wang 提交于
The prepare_to_wait() and finish_wait() calls in io_uring_cancel_files() are mismatched. Currently I don't see any issues related this bug, just find it by learning codes. Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 01 5月, 2020 7 次提交
-
-
由 Pavel Begunkov 提交于
Nonblocking do_splice() still may wait for some time on an inode mutex. Let's play safe and always punt it async. Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
io_req_defer() do double-checked locking. Use proper helpers for that, i.e. list_empty_careful(). Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
[ 40.179474] refcount_t: underflow; use-after-free. [ 40.179499] WARNING: CPU: 6 PID: 1848 at lib/refcount.c:28 refcount_warn_saturate+0xae/0xf0 ... [ 40.179612] RIP: 0010:refcount_warn_saturate+0xae/0xf0 [ 40.179617] Code: 28 44 0a 01 01 e8 d7 01 c2 ff 0f 0b 5d c3 80 3d 15 44 0a 01 00 75 91 48 c7 c7 b8 f5 75 be c6 05 05 44 0a 01 01 e8 b7 01 c2 ff <0f> 0b 5d c3 80 3d f3 43 0a 01 00 0f 85 6d ff ff ff 48 c7 c7 10 f6 [ 40.179619] RSP: 0018:ffffb252423ebe18 EFLAGS: 00010286 [ 40.179623] RAX: 0000000000000000 RBX: ffff98d65e929400 RCX: 0000000000000000 [ 40.179625] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 00000000ffffffff [ 40.179627] RBP: ffffb252423ebe18 R08: 0000000000000001 R09: 000000000000055d [ 40.179629] R10: 0000000000000c8c R11: 0000000000000001 R12: 0000000000000000 [ 40.179631] R13: ffff98d68c434400 R14: ffff98d6a9cbaa20 R15: ffff98d6a609ccb8 [ 40.179634] FS: 0000000000000000(0000) GS:ffff98d6af580000(0000) knlGS:0000000000000000 [ 40.179636] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 40.179638] CR2: 00000000033e3194 CR3: 000000006480a003 CR4: 00000000003606e0 [ 40.179641] Call Trace: [ 40.179652] io_put_req+0x36/0x40 [ 40.179657] io_free_work+0x15/0x20 [ 40.179661] io_worker_handle_work+0x2f5/0x480 [ 40.179667] io_wqe_worker+0x2a9/0x360 [ 40.179674] ? _raw_spin_unlock_irqrestore+0x24/0x40 [ 40.179681] kthread+0x12c/0x170 [ 40.179685] ? io_worker_handle_work+0x480/0x480 [ 40.179690] ? kthread_park+0x90/0x90 [ 40.179695] ret_from_fork+0x35/0x40 [ 40.179702] ---[ end trace 85027405f00110aa ]--- Opcode handler must never put submission ref, but that's what io_sync_file_range_finish() do. use io_steal_work() there. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Xiaoguang Wang 提交于
While working on to make io_uring sqpoll mode support syscalls that need struct files_struct, I got cpu soft lockup in io_ring_ctx_wait_and_kill(), while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait)) cpu_relax(); above loop never has an chance to exit, it's because preempt isn't enabled in the kernel, and the context calling io_ring_ctx_wait_and_kill() and io_sq_thread() run in the same cpu, if io_sq_thread calls a cond_resched() yield cpu and another context enters above loop, then io_sq_thread() will always in runqueue and never exit. Use cond_resched() can fix this issue. Reported-by: syzbot+66243bb7126c410cefe6@syzkaller.appspotmail.com Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bijan Mottahedeh 提交于
Use ctx->fallback_req address for test_and_set_bit_lock() and clear_bit_unlock(). Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We do blocking retry from our poll handler, if the file supports polled notifications. Only mark the request as needing an async worker if we can't poll for it. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We can have files like eventfd where it's perfectly fine to do poll based retry on them, right now io_file_supports_async() doesn't take that into account. Pass in data direction and check the f_op instead of just always needing an async worker. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 4月, 2020 1 次提交
-
-
由 Jens Axboe 提交于
Clay reports that OP_STATX fails for a test case with a valid fd and empty path: -- Test 0: statx:fd 3: SUCCEED, file mode 100755 -- Test 1: statx:path ./uring_statx: SUCCEED, file mode 100755 -- Test 2: io_uring_statx:fd 3: FAIL, errno 9: Bad file descriptor -- Test 3: io_uring_statx:path ./uring_statx: SUCCEED, file mode 100755 This is due to statx not grabbing the process file table, hence we can't lookup the fd in async context. If the fd is valid, ensure that we grab the file table so we can grab the file from async context. Cc: stable@vger.kernel.org # v5.6 Reported-by: NClay Harris <bugs@claycon.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 20 4月, 2020 1 次提交
-
-
由 Xiaoguang Wang 提交于
When testing io_uring IORING_FEAT_FAST_POLL feature, I got below panic: BUG: kernel NULL pointer dereference, address: 0000000000000030 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 5 PID: 2154 Comm: io_uring_echo_s Not tainted 5.6.0+ #359 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:io_wq_submit_work+0xf/0xa0 Code: ff ff ff be 02 00 00 00 e8 ae c9 19 00 e9 58 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 8b 2f <8b> 45 30 48 8d 9d 48 ff ff ff 25 01 01 00 00 83 f8 01 75 07 eb 2a RSP: 0018:ffffbef543e93d58 EFLAGS: 00010286 RAX: ffffffff84364f50 RBX: ffffa3eb50f046b8 RCX: 0000000000000000 RDX: ffffa3eb0efc1840 RSI: 0000000000000006 RDI: ffffa3eb50f046b8 RBP: 0000000000000000 R08: 00000000fffd070d R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffa3eb50f046b8 R13: ffffa3eb0efc2088 R14: ffffffff85b69be0 R15: ffffa3eb0effa4b8 FS: 00007fe9f69cc4c0(0000) GS:ffffa3eb5ef40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000030 CR3: 0000000020410000 CR4: 00000000000006e0 Call Trace: task_work_run+0x6d/0xa0 do_exit+0x39a/0xb80 ? get_signal+0xfe/0xbc0 do_group_exit+0x47/0xb0 get_signal+0x14b/0xbc0 ? __x64_sys_io_uring_enter+0x1b7/0x450 do_signal+0x2c/0x260 ? __x64_sys_io_uring_enter+0x228/0x450 exit_to_usermode_loop+0x87/0xf0 do_syscall_64+0x209/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x7fe9f64f8df9 Code: Bad RIP value. task_work_run calls io_wq_submit_work unexpectedly, it's obvious that struct callback_head's func member has been changed. After looking into codes, I found this issue is still due to the union definition: union { /* * Only commands that never go async can use the below fields, * obviously. Right now only IORING_OP_POLL_ADD uses them, and * async armed poll handlers for regular commands. The latter * restore the work, if needed. */ struct { struct callback_head task_work; struct hlist_node hash_node; struct async_poll *apoll; }; struct io_wq_work work; }; When task_work_run has multiple work to execute, the work that calls io_poll_remove_all() will do req->work restore for non-poll request always, but indeed if a non-poll request has been added to a new callback_head, subsequent callback will call io_async_task_func() to handle this request, that means we should not do the restore work for such non-poll request. Meanwhile in io_async_task_func(), we should drop submit ref when req has been canceled. Fix both issues. Fixes: b1f573bd ("io_uring: restore req->work when canceling poll request") Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Use io_double_put_req() Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 4月, 2020 2 次提交
-
-
由 Pavel Begunkov 提交于
When checking for draining with __req_need_defer(), it tries to match how many requests were sent before a current one with number of already completed. Dropped SQEs are included in req->sequence, and they won't ever appear in CQ. To compensate for that, __req_need_defer() substracts ctx->cached_sq_dropped. However, what it should really use is number of SQEs dropped __before__ the current one. In other words, any submitted request shouldn't shouldn't affect dequeueing from the drain queue of previously submitted ones. Instead of saving proper ctx->cached_sq_dropped in each request, substract from req->sequence it at initialisation, so it includes number of properly submitted requests. note: it also changes behaviour of timeouts, but 1. it's already diverge from the description because of using SQ 2. the description is ambiguous regarding dropped SQEs Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
req->timeout.count and req->io->timeout.seq_offset store the same value, which is sqe->off. Kill the second one Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-