- 18 3月, 2020 40 次提交
-
-
由 YueHaibing 提交于
commit 364b05fd06e87e53dc03396f73afeac48d8e0998 upstream. The callback function of call_rcu() just calls kfree(), so we can use kfree_rcu() instead of call_rcu() + callback function. Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 1d7bb1d50fb4dc141c7431cc21fdd24ffcc83c76 upstream. Currently we drop completion events, if the CQ ring is full. That's fine for requests with bounded completion times, but it may make it harder or impossible to use io_uring with networked IO where request completion times are generally unbounded. Or with POLL, for example, which is also unbounded. After this patch, we never overflow the ring, we simply store requests in a backlog for later flushing. This flushing is done automatically by the kernel. To prevent the backlog from growing indefinitely, if the backlog is non-empty, we apply back pressure on IO submissions. Any attempt to submit new IO with a non-empty backlog will get an -EBUSY return from the kernel. This is a signal to the application that it has backlogged CQ events, and that it must reap those before being allowed to submit more IO. Note that if we do return -EBUSY, we will have filled whatever backlogged events into the CQ ring first, if there's room. This means the application can safely reap events WITHOUT entering the kernel and waiting for them, they are already available in the CQ ring. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 78e19bbef38362cebff38aa1ca12e2c82bb72eb8 upstream. This is in preparation for handling CQ ring overflow a bit smarter. We should not have any functional changes in this patch. Most of the changes are fairly straight forward, the only ones that stick out a bit are the ones that change __io_free_req() to take the reference count into account. If the request hasn't been submitted yet, we know it's safe to simply ignore references and free it. But let's clean these up too, as later patches will depend on the caller doing the right thing if the completion logging grabs a reference to the request. Reviewed-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 84f97dc2333c626979bb547fce343a1003544dcc upstream. The rings can be derived from the ctx, and we need the ctx there for a future change. No functional changes in this patch. Reviewed-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 2665abfd757fb35a241c6f0b1ebf620e3ffb36fb upstream. While we have support for generic timeouts, we don't have a way to tie a timeout to a specific SQE. The generic timeouts simply trigger wakeups on the CQ ring. This adds support for IORING_OP_LINK_TIMEOUT. This command is only valid as a link to a previous command. The timeout specific can be either relative or absolute, following the same rules as IORING_OP_TIMEOUT. If the timeout triggers before the dependent command completes, it will attempt to cancel that command. Likewise, if the dependent command completes before the timeout triggers, it will cancel the timeout. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit e977d6d34f0c08e3c3b132c9e73b98d0db50abc1 upstream. We're going to need this helper in a future patch, so move it out of io_async_cancel() and into its own separate function. No functional changes in this patch. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
commit 267bc90442aa47002e2991f7d9dd141e168b466b upstream. req->submit is always up-to-date, use it directly Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
commit 50585b9a07367b92382c1e975265344daeba78cd upstream. Stack allocated struct sqe_submit is passed down to the submission path along with a request (a.k.a. struct io_kiocb), and will be copied into req->submit for async requests. As space for it is already allocated, fill req->submit in the first place instead of using on-stack one. As a result: 1. sqe->submit is the only place for sqe_submit and is always valid, so we don't need to track which one to use. 2. don't need to copy in case of async 3. allows to simplify the code by not carrying it as an argument all the way down 4. allows to reduce number of function arguments / potentially improve spilling The downside is that stack is most probably be cached, that's not true for just allocated memory for a request. Another concern is cache pollution. Though, a request would be touched and fetched along with req->submit at some point anyway, so shouldn't be a problem. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
commit 196be95cd5572078be9deb81cbea145fab246029 upstream. Let io_submit_sqes() to allocate io_kiocb before fetching an sqe. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
commit e5eb6366ac2d1df8ad5b010718ac1997ceae45be upstream. After a call to io_submit_sqe(), it's already known whether it needs to queue a link or not. Do it there, as it's simplier and doesn't keep an extra variable across the loop. Reviewed-by:Bob Liu <bob.liu@oracle.com> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
commit ae9428ca61271b6b7f52ebbc359676c9fdfde523 upstream. io_submit_sqes() and io_ring_submit() are doing the same stuff with a little difference. Deduplicate them. Reviewed-by:Bob Liu <bob.liu@oracle.com> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 3aa5fa030558e2b0da284fd069aeb7178543c987 upstream. We had no more use for this flag after the conversion to io-wq, kill it off. Fixes: 561fb04a6a22 ("io_uring: replace workqueue usage with io-wq") Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit f1f40853c01b5ccd0a1a29ce0b515c6f5405a798 upstream. If a request fails, we need to ensure we set REQ_F_FAIL_LINK on it if REQ_F_LINK is set. Any failure in the chain should break the chain. We were missing a few spots where this should be done. It might be nice to generalize this somewhat at some point, as long as we factor in the fact that failure looks different for each request type. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 89723d0bd6c77540c01ce7db2cd6f8c3be2fd958 upstream. As introduced by commit: ba816ad61fdf ("io_uring: run dependent links inline if possible") enable inline dependent link running for poll commands. io_poll_complete_work() is the most important change, as it allows a linked sequence of { POLL, READ } (for example) to proceed inline instead of needing to get punted to another async context. The submission side only potentially matters for sqthread, but may as well include that bit. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 1056ef940380c4e32349ccb6d956858edf70520c upstream. We now have a list that's appropriate for both kernel and userspace discussions on io_uring usage and development, add that to the MAINTAINERS entry. Also add the io-wq files. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 51c3ff62cac635ae9d75f875ce5b7bdafc97abd5 upstream. We currently don't have a completion event trace, add one of those. And to better be able to match up submissions and completions, add user_data to the submission trace as well. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 0069fc6b1cf28de3a3890ed7c87a5b8ab79ca528 upstream. This internal logic was killed with the conversion to io-wq, so we no longer have a need for this particular trace. Kill it. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jackie Liu 提交于
commit e9ffa5c2b77edf2689f876b640318b16fc3ea2a7 upstream. We didn't use -ERESTARTSYS to tell the application layer to restart the system call, but instead return -EINTR. we can set -EINTR directly when wakeup by the signal, which can help us save an assignment operation and comparison operation. Reviewed-by: NBob Liu <bob.liu@oracle.com> Signed-off-by: NJackie Liu <liuyun01@kylinos.cn> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 62755e35dfb2b113c52b81cd96d01c20971c8e02 upstream. This adds support for IORING_OP_ASYNC_CANCEL, which will attempt to cancel requests that have been punted to async context and are now in-flight. This works for regular read/write requests to files, as long as they haven't been started yet. For socket based IO (or things like accept4(2)), we can cancel work that is already running as well. To cancel a request, the sqe must have ->addr set to the user_data of the request it wishes to cancel. If the request is cancelled successfully, the original request is completed with -ECANCELED and the cancel request is completed with a result of 0. If the request was already running, the original may or may not complete in error. The cancel request will complete with -EALREADY for that case. And finally, if the request to cancel wasn't found, the cancel request is completed with -ENOENT. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 6873e0bd6a9cb14ecfadd89d9ed9698ff1761902 upstream. We use io_kiocb->result == -EAGAIN as a way to know if we need to re-submit a polled request, as -EAGAIN reporting happens out-of-line for IO submission failures. This field is cleared when we originally allocate the request, but it isn't reset when we retry the submission from async context. This can cause issues where we think something needs a re-issue, but we're really just reading stale data. Reset ->result whenever we re-prep a request for polled submission. Cc: stable@vger.kernel.org Fixes: 9e645e1105ca ("io_uring: add support for sqe links") Reported-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 975c99a570967dd48e917dd7853867fee3febabd upstream. syzbot reported an issue where we crash at setup time if failslab is used. The issue is that io_wq_create() returns an error pointer on failure, not NULL. Hence io_uring thought the io-wq was setup just fine, but in reality it's a garbage error pointer. Use IS_ERR() instead of a NULL check, and assign ret appropriately. Reported-by: syzbot+221cc24572a2fed23b6b@syzkaller.appspotmail.com Fixes: 561fb04a6a22 ("io_uring: replace workqueue usage with io-wq") Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 842f96124c5617b060cc0f071dcfb6ab24bdd042 upstream. If we get -1 from hrtimer_try_to_cancel(), we know that the timer is running. Hence leave all completion to the timeout handler. If we don't, we can corrupt the list and miss a completion. Fixes: 11365043e527 ("io_uring: add support for canceling timeout requests") Reported-by: NHrvoje Zeba <zeba.hrvoje@gmail.com> Tested-by: NHrvoje Zeba <zeba.hrvoje@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 65e19f54d29cd8559ce60cfd0d751bef7afbdc5c upstream. There's been a few requests for supporting more fixed files than 1024. This isn't really tricky to do, we just need to split up the file table into multiple tables and index appropriately. As we do so, reduce the max single file table to 512. This enables us to do single page allocs always for the tables, which is an improvement over the situation prior. This patch adds support for up to 64K files, which should be enough for everyone. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit b7620121dc04e44ce654297050f9eaf39d414a34 upstream. We index the file tables with a user given value. After we check it's within our limits, use array_index_nospec() to prevent any spectre attacks here. Suggested-by: NJann Horn <jannh@google.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 17f2fe35d080d8f64e86a60cdcd3a97edcbc213b upstream. This allows an application to call accept4() in an async fashion. Like other opcodes, we first try a non-blocking accept, then punt to async context if we have to. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit de2ea4b64b75a79ed9cdf9bf30e0e197901084e4 upstream. This is identical to __sys_accept4(), except it takes a struct file instead of an fd, and it also allows passing in extra file->f_flags flags. The latter is done to support masking in O_NONBLOCK without manipulating the original file flags. No functional changes in this patch. Cc: netdev@vger.kernel.org Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit fcb323cc53e29d9cc696d606bb42736b32dd9825 upstream. This is in preparation for adding opcodes that need to add new files in a process file table, system calls like open(2) or accept4(2). If an opcode needs this, it must set IO_WQ_WORK_NEEDS_FILES in the work item. If work that needs to get punted to async context have this set, the async worker will assume the original task file table before executing the work. Note that opcodes that need access to the current files of an application cannot be done through IORING_SETUP_SQPOLL. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 561fb04a6a2257716738dac2ed812f377c2634c2 upstream. Drop various work-arounds we have for workqueues: - We no longer need the async_list for tracking sequential IO. - We don't have to maintain our own mm tracking/setting. - We don't need a separate workqueue for buffered writes. This didn't even work that well to begin with, as it was suboptimal for multiple buffered writers on multiple files. - We can properly cancel pending interruptible work. This fixes deadlocks with particularly socket IO, where we cannot cancel them when the io_uring is closed. Hence the ring will wait forever for these requests to complete, which may never happen. This is different from disk IO where we know requests will complete in a finite amount of time. - Due to being able to cancel work interruptible work that is already running, we can implement file table support for work. We need that for supporting system calls that add to a process file table. - It gets us one step closer to adding async support for any system call. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 771b53d033e8663abdf59704806aa856b236dcdb upstream. This adds support for io-wq, a smaller and specialized thread pool implementation. This is meant to replace workqueues for io_uring. Among the reasons for this addition are: - We can assign memory context smarter and more persistently if we manage the life time of threads. - We can drop various work-arounds we have in io_uring, like the async_list. - We can implement hashed work insertion, to manage concurrency of buffered writes without needing a) an extra workqueue, or b) needlessly making the concurrency of said workqueue very low which hurts performance of multiple buffered file writers. - We can implement cancel through signals, for cancelling interruptible work like read/write (or send/recv) to/from sockets. - We need the above cancel for being able to assign and use file tables from a process. - We can implement a more thorough cancel operation in general. - We need it to move towards a syslet/threadlet model for even faster async execution. For that we need to take ownership of the used threads. This list is just off the top of my head. Performance should be the same, or better, at least that's what I've seen in my testing. io-wq supports basic NUMA functionality, setting up a pool per node. io-wq hooks up to the scheduler schedule in/out just like workqueue and uses that to drive the need for more/less workers. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NJens Axboe <axboe@kernel.dk> [Joseph: Cherry-pick allow_kernel_signal() from upstream commit 33da8e7c814f] Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Thomas Gleixner 提交于
commit 6d25be5782e482eb93e3de0c94d0a517879377d0 upstream. The worker accounting for CPU bound workers is plugged into the core scheduler code and the wakeup code. This is not a hard requirement and can be avoided by keeping track of the state in the workqueue code itself. Keep track of the sleeping state in the worker itself and call the notifier before entering the core scheduler. There might be false positives when the task is woken between that call and actually scheduling, but that's not really different from scheduling and being woken immediately after switching away. When nr_running is updated when the task is retunrning from schedule() then it is later compared when it is done from ttwu(). [ bigeasy: preempt_disable() around wq_worker_sleeping() by Daniel Bristot de Oliveira ] Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NTejun Heo <tj@kernel.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/ad2b29b5715f970bffc1a7026cabd6ff0b24076a.1532952814.git.bristot@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Thomas Gleixner 提交于
commit 15917dc02841862840efcbfe1da0830f88078b5c upstream. The RTMUTEX tester was removed long ago but the PF bit stayed around. Remove it and free up the space. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
commit 95a1b3ff9a3e4ea2f26c4e802067d58831f415db upstream. Commit fb5ccc98782f ("io_uring: Fix broken links with offloading") introduced a potential performance regression with unconditionally taking mm even for READ/WRITE_FIXED operations. Return the logic handling it back. mm-faulted requests will go through the generic submission path, so honoring links and drains, but will fail further on req->has_user check. Fixes: fb5ccc98782f ("io_uring: Fix broken links with offloading") Cc: stable@vger.kernel.org # v5.4 Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
commit fa4562280889ad372dfb1413833a8b8675721b17 upstream. submit->index is used only for inbound check in submission path (i.e. head < ctx->sq_entries). However, it always will be true, as 1. it's already validated by io_get_sqring() 2. ctx->sq_entries can't be changedd in between, because of held ctx->uring_lock and ctx->refs. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Dmitrii Dolgov 提交于
commit c826bd7a743f275e2b68c16d595534063b400deb upstream. To trace io_uring activity one can get an information from workqueue and io trace events, but looks like some parts could be hard to identify via this approach. Making what happens inside io_uring more transparent is important to be able to reason about many aspects of it, hence introduce the set of tracing events. All such events could be roughly divided into two categories: * those, that are helping to understand correctness (from both kernel and an application point of view). E.g. a ring creation, file registration, or waiting for available CQE. Proposed approach is to get a pointer to an original structure of interest (ring context, or request), and then find relevant events. io_uring_queue_async_work also exposes a pointer to work_struct, to be able to track down corresponding workqueue events. * those, that provide performance related information. Mostly it's about events that change the flow of requests, e.g. whether an async work was queued, or delayed due to some dependencies. Another important case is how io_uring optimizations (e.g. registered files) are utilized. Signed-off-by: NDmitrii Dolgov <9erthalion6@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 11365043e5271fea4c92189a976833da477a3a44 upstream. We might have cases where the need for a specific timeout is gone, add support for canceling an existing timeout operation. This works like the POLL_REMOVE command, where the application passes in the user_data of the timeout it wishes to cancel in the sqe->addr field. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit a41525ab2e75987e809926352ebc6f1397da900e upstream. This is a pretty trivial addition on top of the relative timeouts we have now, but it's handy for ensuring tighter timing for those that are building scheduling primitives on top of io_uring. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jackie Liu 提交于
commit ba5290ccb6b57fc5e274ae46d051fba1f0ece262 upstream. There is no function change, just to clean up the code, use s->in_async to make the code know where it is. Signed-off-by: NJackie Liu <liuyun01@kylinos.cn> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 33a107f0a1b8df0ad925e39d8afc97bb78e0cec1 upstream. We currently size the CQ ring as twice the SQ ring, to allow some flexibility in not overflowing the CQ ring. This is done because the SQE life time is different than that of the IO request itself, the SQE is consumed as soon as the kernel has seen the entry. Certain application don't need a huge SQ ring size, since they just submit IO in batches. But they may have a lot of requests pending, and hence need a big CQ ring to hold them all. By allowing the application to control the CQ ring size multiplier, we can cater to those applications more efficiently. If an application wants to define its own CQ ring size, it must set IORING_SETUP_CQSIZE in the setup flags, and fill out io_uring_params->cq_entries. The value must be a power of two. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit c3a31e605620c279163c14068a60869ea3fda203 upstream. Allows the application to remove/replace/add files to/from a file set. Passes in a struct: struct io_uring_files_update { __u32 offset; __s32 *fds; }; that holds an array of fds, size of array passed in through the usual nr_args part of the io_uring_register() system call. The logic is as follows: 1) If ->fds[i] is -1, the existing file at i + ->offset is removed from the set. 2) If ->fds[i] is a valid fd, the existing file at i + ->offset is replaced with ->fds[i]. For case #2, is the existing file is currently empty (fd == -1), the new fd is simply added to the array. Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
由 Jens Axboe 提交于
commit 08a451739a9b5783f67de51e84cb6d9559bb9dc4 upstream. This is in preparation for allowing updates to fixed file sets without requiring a full unregister+register. Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-