- 22 2月, 2020 2 次提交
-
-
由 Xiaoguang Wang 提交于
Since commit a3a0e43f ("io_uring: don't enter poll loop if we have CQEs pending"), if we already events pending, we won't enter poll loop. In case SETUP_IOPOLL and SETUP_SQPOLL are both enabled, if app has been terminated and don't reap pending events which are already in cq ring, and there are some reqs in poll_list, io_sq_thread will enter __io_iopoll_check(), and find pending events, then return, this loop will never have a chance to exit. I have seen this issue in fio stress tests, to fix this issue, let io_sq_thread call io_iopoll_getevents() with argument 'min' being zero, and remove __io_iopoll_check(). Fixes: a3a0e43f ("io_uring: don't enter poll loop if we have CQEs pending") Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Stefano Garzarella 提交于
This patch drops 'cur_mm' before calling cond_resched(), to prevent the sq_thread from spinning even when the user process is finished. Before this patch, if the user process ended without closing the io_uring fd, the sq_thread continues to spin until the 'sq_thread_idle' timeout ends. In the worst case where the 'sq_thread_idle' parameter is bigger than INT_MAX, the sq_thread will spin forever. Fixes: 6c271ce2 ("io_uring: add submission polling") Signed-off-by: NStefano Garzarella <sgarzare@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 19 2月, 2020 2 次提交
-
-
由 Pavel Begunkov 提交于
io_cleanup_req() should be called before req->io is freed, and so shouldn't be after __io_free_req() -> __io_req_aux_free(). Also, it will be ignored for in io_free_req_many(), which use __io_req_aux_free(). Place cleanup_req() into __io_req_aux_free(). Fixes: 99bc4c38 ("io_uring: fix iovec leaks") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Dan Carpenter 提交于
The "kmsg" pointer can't be NULL and we have already dereferenced it so a check here would be useless. Reviewed-by: NStefano Garzarella <sgarzare@redhat.com> Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 17 2月, 2020 1 次提交
-
-
由 Pavel Begunkov 提交于
fallocate_finish() is missing cancellation check. Add it. It's safe to do that, as only flags setup and sqe fields copy are done before it gets into __io_fallocate(). Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 14 2月, 2020 1 次提交
-
-
由 Jens Axboe 提交于
Carter reported an issue where he could produce a stall on ring exit, when we're cleaning up requests that match the given file table. For this particular test case, a combination of a few things caused the issue: - The cq ring was overflown - The request being canceled was in the overflow list The combination of the above means that the cq overflow list holds a reference to the request. The request is canceled correctly, but since the overflow list holds a reference to it, the final put won't happen. Since the final put doesn't happen, the request remains in the inflight. Hence we never finish the cancelation flush. Fix this by removing requests from the overflow list if we're canceling them. Cc: stable@vger.kernel.org # 5.5 Reported-by: NCarter Li 李通洲 <carter.li@eoitek.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 10 2月, 2020 2 次提交
-
-
由 Jens Axboe 提交于
Jonas reports that he sometimes sees -97/-22 error returns from sendmsg, if it gets punted async. This is due to not retaining the sockaddr_storage between calls. Include that in the state we copy when going async. Cc: stable@vger.kernel.org # 5.3+ Reported-by: NJonas Bonn <jonas@norrbonn.se> Tested-by: NJonas Bonn <jonas@norrbonn.se> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Normally we cancel all work we track, but for untracked work we could leave the async worker behind until that work completes. This is totally fine, but does leave resources pending after the task is gone until that work completes. Cancel work that this task queued up when it goes away. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 2月, 2020 11 次提交
-
-
由 Pavel Begunkov 提交于
As in the previous patch, make openat*_prep() and statx_prep() handle double preparation to avoid resource leakage. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Requests may be prepared multiple times with ->io allocated (i.e. async prepared). Preparation functions don't handle it and forget about previously allocated resources. This may happen in case of: - spurious defer_check - non-head (i.e. async prepared) request executed in sync (via nxt). Make the handlers check, whether they already allocated resources, which is true IFF REQ_F_NEED_CLEANUP is set. Cc: stable@vger.kernel.org # 5.5 Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
First, io_close() misses filp_close() and io_cqring_add_event(), when f_op->flush is defined. That's because in this case it will io_queue_async_work() itself not grabbing files, so the corresponding chunk in io_close_finish() won't be executed. Second, when submitted through io_wq_submit_work(), it will do filp_close() and *_add_event() twice: first inline in io_close(), and the second one in call to io_close_finish() from io_close(). The second one will also fire, because it was submitted async through generic path, and so have grabbed files. And the last nice thing is to remove this weird pilgrimage with checking work/old_work and casting it to nxt. Just use a helper instead. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Don't just check for dirfd == -1, we should allow AT_FDCWD as well for relative lookups. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
This passes it in to io-wq, so it assumes the right fs_struct when executing async work that may need to do lookups. Cc: stable@vger.kernel.org # 5.3+ Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
For non-blocking issue, we set IOCB_NOWAIT in the kiocb. However, on a raw block device, this yields an -EOPNOTSUPP return, as non-blocking writes aren't supported. Turn this -EOPNOTSUPP into -EAGAIN, so we retry from blocking context with IOCB_NOWAIT cleared. Cc: stable@vger.kernel.org # 5.5 Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
openat() and statx() may have allocated ->open.filename, which should be be put. Add cleanup handlers for them. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Allocated iovec is freed only in io_{read,write,send,recv)(), and just leaves it if an error occured. There are plenty of such cases: - cancellation of non-head requests - fail grabbing files in __io_queue_sqe() - set REQ_F_NOWAIT and returning in __io_queue_sqe() Add REQ_F_NEED_CLEANUP, which will force such requests with custom allocated resourses go through cleanup handlers on put. Cc: stable@vger.kernel.org # 5.5 Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
struct io_async_open is unused, remove it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Stefano Garzarella 提交于
In io_uring_poll() we must flush overflowed CQ events before to check if there are CQ events available, to avoid missing events. We call the io_cqring_events() that checks and flushes any overflow and returns the number of CQ events available. Signed-off-by: NStefano Garzarella <sgarzare@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
All of these opcodes take a directory file descriptor. We can't easily support fixed files for these operations, and the use case for that probably isn't all that clear (or sensible) anyway. Disable IOSQE_FIXED_FILE for these operations. Reported-by: NStefan Metzmacher <metze@samba.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 07 2月, 2020 3 次提交
-
-
由 Pavel Begunkov 提交于
After defer, a request will be prepared, that includes allocating iovec if needed, and then submitted through io_wq_submit_work() but not custom handler (e.g. io_rw_async()/io_sendrecv_async()). However, it'll leak iovec, as it's in io-wq and the code goes as follows: io_read() { if (!io_wq_current_is_worker()) kfree(iovec); } Put all deallocation logic in io_{read,write,send,recv}(), which will leave the memory, if going async with -EAGAIN. It also fixes a leak after failed io_alloc_async_ctx() in io_{recv,send}_msg(). Cc: stable@vger.kernel.org # 5.5 Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Randy Dunlap 提交于
Make bitfields of size 1 bit be unsigned (since there is no room for the sign bit). This clears up the sparse warnings: CHECK ../fs/io_uring.c ../fs/io_uring.c:207:50: error: dubious one-bit signed bitfield ../fs/io_uring.c:208:55: error: dubious one-bit signed bitfield ../fs/io_uring.c:209:63: error: dubious one-bit signed bitfield ../fs/io_uring.c:210:54: error: dubious one-bit signed bitfield ../fs/io_uring.c:211:57: error: dubious one-bit signed bitfield Found by sight and then verified with sparse. Fixes: 69b3e546 ("io_uring: change io_ring_ctx bool fields into bit fields") Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: io-uring@vger.kernel.org Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Fail fast if can't grab mm, so past that requests always have an mm when required. This allows us to remove req->user altogether. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 05 2月, 2020 2 次提交
-
-
由 Jens Axboe 提交于
syzbot reports a use-after-free in io_ring_file_ref_switch() when it tries to switch back to percpu mode. When we put the final reference to the table by calling percpu_ref_kill_and_confirm(), we don't want the zero reference to queue async work for flushing the potentially queued up items. We currently do a few flush_work(), but they merely paper around the issue, since the work item may not have been queued yet depending on the when the percpu-ref callback gets run. Coming into the file unregister, we know we have the ring quiesced. io_ring_file_ref_switch() can check for whether or not the ref is dying or not, and not queue anything async at that point. Once the ref has been confirmed killed, flush any potential items manually. Reported-by: syzbot+7caeaea49c2c8a591e3d@syzkaller.appspotmail.com Fixes: 05f3fb3c ("io_uring: avoid ring quiesce for fixed file set unregister and update") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
As part of io_uring shutdown, we cancel work that is pending and won't necessarily complete on its own. That includes requests like poll commands and timeouts. If we're using SQPOLL for kernel side submission and we shutdown the ring immediately after queueing such work, we can race with the sqthread doing the submission. This means we may miss cancelling some work, which results in the io_uring shutdown hanging forever. Cc: stable@vger.kernel.org Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 04 2月, 2020 8 次提交
-
-
由 Pavel Begunkov 提交于
Both iocb_flags() and kiocb_set_rw_flags() are inline and modify kiocb->ki_flags. Place them close, so they can be potentially better optimised. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Grab requests from cache-array from the end, so can get by only free_reqs. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Andres correctly points out that read-ahead can block, if it needs to read in meta data (or even just through the page cache page allocations). Play it safe for now and just ensure WILLNEED is also punted to async context. While in there, allow the file settings hints from non-blocking context. They don't need to start/do IO, and we can safely do them inline. Fixes: 4840e418 ("io_uring: add IORING_OP_FADVISE") Reported-by: NAndres Freund <andres@anarazel.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We punt close to async for the final fput(), but we log the completion even before that even in that case. We rely on the request not having a files table assigned to detect what the final async close should do. However, if we punt the async queue to __io_queue_sqe(), we'll get ->files assigned and this makes io_close_finish() think it should both close the filp again (which does no harm) AND log a new CQE event for this request. This causes duplicate CQEs. Queue the request up for async manually so we don't grab files needlessly and trigger this condition. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
It won't ever get into io_prep_rw() when req->file haven't been set in io_req_set_file(), hence remove the check. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If we have a read/write that is deferred, we already setup the async IO context for that request, and mapped it. When we later try and execute the request and we get -EAGAIN, we don't want to attempt to re-map it. If we do, we end up with garbage in the iovec, which typically leads to an -EFAULT or -EINVAL completion. Cc: stable@vger.kernel.org # 5.5 Reported-by: NDan Melnic <dmm@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Don't use the recvmsg/sendmsg helpers, use the same helpers that the recv(2) and send(2) system calls use. Reported-by: N李通洲 <carter.li@eoitek.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If we have nested or circular eventfd wakeups, then we can deadlock if we run them inline from our poll waitqueue wakeup handler. It's also possible to have very long chains of notifications, to the extent where we could risk blowing the stack. Check the eventfd recursion count before calling eventfd_signal(). If it's non-zero, then punt the signaling to async context. This is always safe, as it takes us out-of-line in terms of stack and locking context. Cc: stable@vger.kernel.org # 5.1+ Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 01 2月, 2020 2 次提交
-
-
由 John Hubbard 提交于
In order to provide a clearer, more symmetric API for pinning and unpinning DMA pages. This way, pin_user_pages*() calls match up with unpin_user_pages*() calls, and the API is a lot closer to being self-explanatory. Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.comSigned-off-by: NJohn Hubbard <jhubbard@nvidia.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 John Hubbard 提交于
Convert fs/io_uring to use the new pin_user_pages() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages, and therefore for any code that calls put_user_page(). In partial anticipation of this work, the io_uring code was already calling put_user_page() instead of put_page(). Therefore, in order to convert from the get_user_pages()/put_page() model, to the pin_user_pages()/put_user_page() model, the only change required here is to change get_user_pages() to pin_user_pages(). Link: http://lkml.kernel.org/r/20200107224558.2362728-17-jhubbard@nvidia.comSigned-off-by: NJohn Hubbard <jhubbard@nvidia.com> Reviewed-by: NJens Axboe <axboe@kernel.dk> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 31 1月, 2020 2 次提交
-
-
由 Stefan Metzmacher 提交于
With nesting of anonymous unions and structs it's hard to review layout changes. It's better to ask the compiler for these things. Signed-off-by: NStefan Metzmacher <metze@samba.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
It can be hard to know exactly what is registered with the ring. Especially for credentials, it'd be handy to be able to see which ones are registered, what personalities they have, and what the ID of each of them is. This adds support for showing information registered in the ring from the fdinfo of the io_uring fd. Here's an example from a test case that registers 4 files (two of them sparse), 4 buffers, and 2 personalities: pos: 0 flags: 02000002 mnt_id: 14 UserFiles: 4 0: file-no-1 1: file-no-2 2: <none> 3: <none> UserBufs: 4 0: 0x563817c46000/128 1: 0x563817c47000/256 2: 0x563817c48000/512 3: 0x563817c49000/1024 Personalities: 1 Uid: 0 0 0 0 Gid: 0 0 0 0 Groups: 0 CapEff: 0000003fffffffff 2 Uid: 0 0 0 0 Gid: 0 0 0 0 Groups: 0 CapEff: 0000003fffffffff Suggested-by: NJann Horn <jannh@google.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 30 1月, 2020 2 次提交
-
-
由 Jens Axboe 提交于
This adds IORING_OP_EPOLL_CTL, which can perform the same work as the epoll_ctl(2) system call. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We're not consistent in how the file table is grabbed and assigned if we have a command linked that requires the use of it. Add ->file_table to the io_op_defs[] array, and use that to determine when to grab the table instead of having the handlers set it if they need to defer. This also means we can kill the IO_WQ_WORK_NEEDS_FILES flag. We always initialize work->files, so io-wq can just check for that. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 29 1月, 2020 2 次提交
-
-
由 Jens Axboe 提交于
For personalities previously registered via IORING_REGISTER_PERSONALITY, allow any command to select them. This is done through setting sqe->personality to the id returned from registration, and then flagging sqe->flags with IOSQE_PERSONALITY. Reviewed-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If an application wants to use a ring with different kinds of credentials, it can register them upfront. We don't lookup credentials, the credentials of the task calling IORING_REGISTER_PERSONALITY is used. An 'id' is returned for the application to use in subsequent personality support. Reviewed-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-