- 19 October 2021, 24 commits
-
-
Committed by Hao Xu
A boolean value is good enough for io_alloc_async_data(). Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20210922101522.9179-1-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
IOSQE_IO_DRAIN is quite marginal and we don't care too much about IOSQE_BUFFER_SELECT. Save two ifs and hide both of them behind the SQE_VALID_FLAGS check. Now we first check whether the request uses a "safe" subset, i.e. without DRAIN and BUFFER_SELECT, and only if that is not the case do we test the rest of the flags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dccfb9ab2ab0969a2d8dc59af88fa0ce44eeb1d5.1631703764.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
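As a rough illustration of that fast path, the userspace sketch below accepts the common "safe" flag subset with a single mask test and only falls back to the full validation for the rarer DRAIN/BUFFER_SELECT bits. The mask values and helper names are assumptions for the sketch, not the io_uring implementation.

```c
/*
 * Minimal userspace sketch of the fast-path flag check described above,
 * not the io_uring code. Flag values and names are illustrative.
 */
#include <stdbool.h>
#include <stdio.h>

#define IOSQE_IO_DRAIN       (1U << 1)
#define IOSQE_BUFFER_SELECT  (1U << 5)
#define SQE_VALID_FLAGS      0x3fU   /* hypothetical full set of valid bits */
/* the "safe" subset: everything valid except DRAIN and BUFFER_SELECT */
#define SQE_COMMON_FLAGS     (SQE_VALID_FLAGS & ~(IOSQE_IO_DRAIN | IOSQE_BUFFER_SELECT))

/* rarely taken path: full validation, DRAIN/BUFFER_SELECT handling lives here */
static bool check_flags_slow(unsigned int flags)
{
	return (flags & ~SQE_VALID_FLAGS) == 0;
}

static bool check_flags(unsigned int flags)
{
	if (!(flags & ~SQE_COMMON_FLAGS))   /* fast path: only "safe" bits set */
		return true;
	return check_flags_slow(flags);
}

int main(void)
{
	printf("%d %d %d\n", check_flags(0x1), check_flags(IOSQE_BUFFER_SELECT),
	       check_flags(0x40));          /* expect: 1 1 0 */
	return 0;
}
```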
-
Committed by Pavel Begunkov
Now completions are done from task context, which means it's either the task itself, task_work, or an io-wq worker. In all those cases the ctx is kept alive by mutexing, explicit referencing, or the request references held by io-wq. Remove the extra ctx pinning from io_req_complete_post(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/60a0e96434c16ab4fe587651448290d61ec9a113.1631703756.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Hao Xu
Developers may need some io_uring info to help them debug and address issues in production. This includes the sqring/cqring head/tail and detailed sqe/cqe info, which is very useful when an application is hung on a ring. Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20210913130854.38542-1-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
TWA_SIGNAL already wakes the thread, so there is no need for wake_up_process() after it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7e90cf643f633e857443e0c9e72471b221735c50.1631115443.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
We don't call io_submit_flush_completions() when there are no requests enqueued, and every single caller checks for that. Hide the check inside the function, without losing the inlining. That will make it much easier to change the empty-check condition in the future. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d7ff8cef5da1b38e8ea648f5aad9a315ddfc7b57.1631115443.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
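The wrapper pattern looks roughly like the sketch below; this is not the io_uring code, and the state structure and names are assumptions. The cheap empty check lives in an inline helper, so callers stop open-coding it and only the non-empty case pays for the out-of-line call.

```c
/*
 * Sketch of the wrapper pattern, not the io_uring implementation.
 * Names and the state structure are illustrative assumptions.
 */
#include <stddef.h>

struct submit_state {
	size_t nr_completions;		/* queued completions waiting to be flushed */
};

/* out-of-line slow path: does the real flushing work */
void flush_completions_slow(struct submit_state *state)
{
	/* ... post completions, free requests ... */
	state->nr_completions = 0;
}

/* what every caller uses; the empty check is hidden here */
static inline void flush_completions(struct submit_state *state)
{
	if (state->nr_completions)
		flush_completions_slow(state);
}
```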
-
Committed by Pavel Begunkov
Inline the part of __io_req_find_next() that returns a request but doesn't need io_disarm_next(). It's just two call sites, but it makes links a bit faster. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4126d13f23d0e91b39b3558e16bd86cafa7fcef2.1631115443.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
io_dismantle_req() is hot and not _too_ huge. Inline it; there are 3 call sites, which will hopefully turn into 2 in the future. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bdd2dc30716cac270c2403e99bccd6286e4ae201.1631115443.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
->ios_left is only used to decide whether to plug or not; kill it to avoid the extra accounting and just use the initial submission number. There is not much difference in when plugging gets enabled: this version does it in a few more cases, but all the major ones should be covered well. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f13993bcf5b477f9a7d52881fc49f9457ea9870a.1631115443.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Bixuan Cui
When task_work_add() in io_workqueue_create() returns an error, duplicate cleanup code is executed: -> clear_bit_unlock(0, &worker->create_state); -> io_worker_release(worker); -> atomic_dec(&acct->nr_running); -> io_worker_ref_put(wq); -> return false; and then, back in io_workqueue_create(): -> clear_bit_unlock(0, &worker->create_state); -> io_worker_release(worker); -> kfree(worker); io_worker_release() and clear_bit_unlock() therefore end up being executed twice. Fixes: 3146cba9 ("io-wq: make worker creation resilient against signals") Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Link: https://lore.kernel.org/r/20210911085847.34849-1-cuibixuan@huawei.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
I recently had to look at a production problem where a request ended up getting the dreaded -EINVAL error on submit. It is the most used, and hence most useless, of error codes, as it just tells you that something was wrong with your request, but nothing more. Let's dump the full sqe contents if we run into such a failure; that will allow easier diagnosing of a wide variety of issues. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
We added RQF_ELV to tell whether an IO scheduler is attached, and RQF_ELVPRIV tells us whether there is an IO scheduler with private data attached. Don't check RQF_ELV in blk_mq_free_request(); what we care about here is just whether we have scheduler private data attached. This fixes a boot crash. Fixes: 2ff0682d ("block: store elevator state in request") Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: syzbot+eb8104072aeab6cc1195@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
Trivial to do now; we just need our own io_comp_batch on the stack and to pass it in to the usual command completion handling. I pondered making this dependent on how many entries we had to process, but even for a single entry there's no discernible difference in performance or latency. Running a sync workload over io_uring: t/io_uring -b512 -d1 -s1 -c1 -p0 -F1 -B1 -n2 /dev/nvme1n1 /dev/nvme2n1 yields the below performance before the patch: IOPS=254820, BW=124MiB/s, IOS/call=1/1, inflight=(1 1) IOPS=251174, BW=122MiB/s, IOS/call=1/1, inflight=(1 1) IOPS=250806, BW=122MiB/s, IOS/call=1/1, inflight=(1 1) and the following after: IOPS=255972, BW=124MiB/s, IOS/call=1/1, inflight=(1 1) IOPS=251920, BW=123MiB/s, IOS/call=1/1, inflight=(1 1) IOPS=251794, BW=122MiB/s, IOS/call=1/1, inflight=(1 1) which definitely isn't slower, and is about the same once you factor in a bit of variance. For peak performance workloads, benchmarking shows a 2% improvement. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
Wire up using an io_comp_batch for f_op->iopoll(). If the lower stack supports it, we can handle high rates of polled IO more efficiently. This raises the single-core efficiency on my system from ~6.1M IOPS to ~6.6M IOPS running a random read workload at depth 128 on two gen2 Optane drives. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
Take advantage of struct io_comp_batch if one is passed in to the nvme poll handler. If it is set, rather than completing each request individually inline, store them in the io_comp_batch list. We only do so for requests that will complete successfully; anything else is completed inline as before. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
Instead of calling blk_mq_end_request() on a single request, add a helper that takes the new struct io_comp_batch and completes any requests stored in there. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
sbitmap currently only supports clearing tags one by one; add a helper that allows the caller to pass in an array of tags to clear. Signed-off-by: Jens Axboe <axboe@kernel.dk>
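A toy sketch of such a batched-clear interface is shown below. It is not the sbitmap API (a real implementation would batch the atomic updates per word); all names and values here are illustrative assumptions.

```c
/*
 * Toy sketch of a batched tag-clear helper; not the sbitmap API.
 * A real implementation would batch atomic updates per word.
 */
#include <stdint.h>
#include <stdio.h>

#define BITS_PER_WORD 64

static void clear_tag(uint64_t *map, unsigned int tag)
{
	map[tag / BITS_PER_WORD] &= ~(1ULL << (tag % BITS_PER_WORD));
}

/* clear a whole array of tags in one call instead of one call per tag */
static void clear_tags_batch(uint64_t *map, const unsigned int *tags, int nr)
{
	for (int i = 0; i < nr; i++)
		clear_tag(map, tags[i]);
}

int main(void)
{
	uint64_t map[2] = { ~0ULL, ~0ULL };
	const unsigned int done[] = { 3, 5, 70 };

	clear_tags_batch(map, done, 3);
	printf("%#llx %#llx\n", (unsigned long long)map[0],
	       (unsigned long long)map[1]);
	return 0;
}
```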
-
Committed by Jens Axboe
struct io_comp_batch contains a list head and a completion handler, which will allow batches of IO to be completed more efficiently. For now there are no functional changes in this patch; we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
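The general shape is roughly what the sketch below shows: a batch object carrying a list of completed requests plus a callback that knows how to finish them all at once, threaded through the poll hook as an optional argument. The types and names here are stand-ins, not the kernel definitions.

```c
/*
 * Rough sketch of the shape described above, not the kernel definition.
 * Types and names are illustrative assumptions.
 */
struct request;

struct comp_batch {
	struct request *req_list;		/* completed requests gathered here */
	void (*complete)(struct comp_batch *);	/* drains req_list in one go */
};

/* an iopoll-style hook now takes an optional batch to fill in */
typedef int (*iopoll_fn)(void *kiocb, struct comp_batch *batch, unsigned int flags);
```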
-
Committed by Jens Axboe
Instead of open-coding the list additions, traversal, and removal, provide a basic set of helpers. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
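A userspace sketch of such helpers for a singly linked request list is below; the structure layout and helper names are assumptions for illustration, not the block layer's own macros.

```c
/*
 * Userspace sketch of singly linked request-list helpers; not the block
 * layer's implementation. Structure and names are assumptions.
 */
#include <stdio.h>
#include <stddef.h>

struct request {
	int tag;
	struct request *next;
};

static void rq_list_add(struct request **list, struct request *rq)
{
	rq->next = *list;
	*list = rq;
}

static struct request *rq_list_pop(struct request **list)
{
	struct request *rq = *list;

	if (rq)
		*list = rq->next;
	return rq;
}

#define rq_list_for_each(list, pos) \
	for (pos = *(list); pos; pos = pos->next)

int main(void)
{
	struct request a = { .tag = 1 }, b = { .tag = 2 };
	struct request *list = NULL, *rq;

	rq_list_add(&list, &a);
	rq_list_add(&list, &b);
	rq_list_for_each(&list, rq)
		printf("tag %d\n", rq->tag);
	while ((rq = rq_list_pop(&list)))
		;	/* drained */
	return 0;
}
```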
-
Committed by Jens Axboe
Just like their blk_mq_ctx counterparts, we've got a bunch of counters in here that are only used for debugfs and are of questionable value. They are: dispatched, an index of how many requests were dispatched in one go, and poll_{considered,invoked,success}, which track poll success rates. We're confident in the iopoll implementation at this point, so don't bother tracking these. As a bonus, this shrinks each hardware queue from 576 bytes to 512 bytes, dropping a whole cacheline. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
These were added as part of early-days debugging for blk-mq, and they are not really useful anymore. Rather than spend cycles updating them, just get rid of them. As a bonus, this shrinks the per-cpu software queue size from 256 bytes to 192 bytes. That's a whole cacheline less. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
Add a local variable for rq_flags; it helps the compiler avoid some rq_flags reloads. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
We should have enough registers in blk_mq_rq_ctx_init(); store the values in local variables so we don't keep reloading them. Note: keeping q->elevator may look unnecessary, but it is also used inside the inlined blk_mq_tags_from_data(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
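The sketch below shows the general pattern with assumed struct layouts rather than the real blk-mq types: hot values are read into locals once so the compiler can keep them in registers instead of reloading them around every store.

```c
/*
 * Sketch of the locals-caching pattern only; not blk_mq_rq_ctx_init().
 * Struct layouts, field names, and flag bits are assumptions.
 */
struct queue {
	void *elevator;		/* NULL when no IO scheduler is attached */
	unsigned int flags;
};

struct request {
	struct queue *q;
	unsigned int rq_flags;
	unsigned int tag;
};

void init_request(struct request *rq, struct queue *q, unsigned int tag)
{
	void *elevator = q->elevator;	/* cached once, reused below */
	unsigned int rq_flags = 0;	/* built up in a register */

	rq->q = q;
	rq->tag = tag;
	if (elevator)
		rq_flags |= 1U << 0;	/* assumed "has elevator" bit */
	if (q->flags & (1U << 3))	/* assumed queue property bit */
		rq_flags |= 1U << 1;
	rq->rq_flags = rq_flags;	/* single store at the end */
}
```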
-
Committed by Pavel Begunkov
Don't init rq->hash and rq->rb_node in blk_mq_rq_ctx_init() if there is no elevator. Also, move some other initialisers that imply barriers to the end, so the compiler is free to rearrange and optimise the rest of them. Note: fold in a change from Jens leaving queue_list unconditional, as it might lead to problems otherwise. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 18 October 2021, 16 commits
-
-
Committed by Jens Axboe
Add a request-private RQF_ELV flag, which tells the block layer that this request was initialized on a queue that has an IO scheduler attached. This allows for faster checking in the fast path, rather than having to dereference rq->q later on. Elevator switching does a full quiesce of the queue before detaching an IO scheduler, so it's safe to cache this in the request itself. Signed-off-by: Jens Axboe <axboe@kernel.dk>
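As a sketch of the idea with simplified stand-in types (not the real blk-mq structures): the elevator state is latched into a per-request flag at init time, so later fast-path checks test one bit instead of chasing rq->q->elevator.

```c
/*
 * Sketch of flag caching with stand-in types; not the blk-mq code.
 * The flag name and structures are illustrative assumptions.
 */
#include <stdbool.h>

#define RQF_HAS_ELEVATOR  (1U << 0)	/* stand-in for the RQF_ELV idea */

struct queue {
	void *elevator;			/* NULL when no IO scheduler is attached */
};

struct request {
	struct queue *q;
	unsigned int rq_flags;
};

void rq_init(struct request *rq, struct queue *q)
{
	rq->q = q;
	/* safe to cache: the scheduler cannot detach while requests are live */
	rq->rq_flags = q->elevator ? RQF_HAS_ELEVATOR : 0;
}

/* fast-path check: no pointer chasing through rq->q */
bool rq_uses_elevator(const struct request *rq)
{
	return rq->rq_flags & RQF_HAS_ELEVATOR;
}
```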
-
Committed by Jens Axboe
We set BIO_TRACKED unconditionally when rq_qos_throttle() is called, even though we may not even have an rq_qos handler. Only mark it as TRACKED if it really is potentially tracked. This saves considerable time for the case where the bio isn't tracked: 2.64% -1.65% [kernel.vmlinux] [k] bio_endio Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
It's been a while since this was analyzed; move some members around to better match the flow of use. Initial state up top, and queued state after that. This improves my peak case by about 1.5%, from 7750K to 7900K IOPS. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
For some reason we still have them in blk-core, while the rest of the request completion code is in blk-mq. That causes an out-of-line call for each completion. Move them into blk-mq.c instead, where they belong. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
We have exactly one caller of this; just get rid of adding the useless function name to the output. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
If we're completing nbytes and nbytes is the size of the bio, don't bother calling into the iterator increment helpers. Just clear the bio size and we're done. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
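A much-simplified sketch of that fast path follows; it is not the actual bio iterator code, and the field names and the slow-path body are assumptions made for the sketch.

```c
/*
 * Simplified sketch of the full-completion fast path; not the bio code.
 * Field names and the slow path are illustrative assumptions.
 */
struct io_iter {
	unsigned int size;	/* bytes remaining in this IO */
	unsigned int idx;	/* current segment index */
	unsigned int done;	/* bytes consumed in the current segment */
};

/* slow path: the real code would walk segment boundaries here */
static void iter_advance_slow(struct io_iter *it, unsigned int nbytes)
{
	it->size -= nbytes;
	it->done += nbytes;
}

static void io_advance(struct io_iter *it, unsigned int nbytes)
{
	if (nbytes == it->size) {
		/* everything completed: skip the per-segment walk entirely */
		it->size = 0;
		return;
	}
	iter_advance_slow(it, nbytes);
}
```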
-
Committed by Pavel Begunkov
Convert bdev->bd_disk->queue to bdev_get_queue(); it uses a cached queue pointer and so is faster. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/addf6ea988c04213697ba3684c853e4ed7642a39.1634219547.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
Convert bdev->bd_disk->queue to bdev_get_queue(); it uses a cached queue pointer and so is faster. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/efc41f880262517c8dc32f932f1b23112f21b255.1634219547.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
Convert bdev->bd_disk->queue to bdev_get_queue(); it uses a cached queue pointer and so is faster. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/85c36ea784d285a5075baa10049e6b59e15fb484.1634219547.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
Convert bdev->bd_disk->queue to bdev_get_queue(); it uses a cached queue pointer and so is faster. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a352936ce5d9ac719645b1e29b173d931ebcdc02.1634219547.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
There are tons of places where we need to get a request_queue while only having a bdev, which turns into bdev->bd_disk->queue. There are probably a hundred such places once you count inline helpers, and enough of them are in hot paths. Cache the queue pointer in struct block_device and make use of it in bdev_get_queue(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a3bfaecdd28956f03629d0ca5c63ebc096e1c809.1634219547.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
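A simplified sketch of that caching is below, using stand-in structs that only mimic the shape of the real ones: the queue pointer is copied into the block_device once at setup time, so the lookup helper does a single load instead of two dependent loads through the gendisk.

```c
/*
 * Sketch of the cached-pointer lookup with stand-in structs; not the real
 * block layer definitions. Names are illustrative assumptions.
 */
struct request_queue { int dummy; };

struct gendisk {
	struct request_queue *queue;
};

struct block_device {
	struct gendisk *bd_disk;
	struct request_queue *bd_queue;	/* cached copy of bd_disk->queue */
};

/* done once when the block device is set up */
void bdev_cache_queue(struct block_device *bdev)
{
	bdev->bd_queue = bdev->bd_disk->queue;
}

/* hot-path helper: one load instead of bdev->bd_disk->queue */
struct request_queue *bdev_get_queue_cached(struct block_device *bdev)
{
	return bdev->bd_queue;
}
```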
-
Committed by Jens Axboe
The fast path is the one where no splitting is needed. Separate the handling into a check part that we can inline, and an out-of-line handling path for when we do need to split. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
This generates a lot better code for me, and bumps performance from 7650K IOPS to 7750K IOPS. Looking at profiles for the run and running perf diff confirms that we're now spending a lot less time there: 6.38% -2.80% [kernel.vmlinux] [k] blkdev_direct_IO That takes it from the 2nd biggest cycle consumer to only the 9th, at 3.35% of the CPU time. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
bdev = &BDEV_I(file->f_mapping->host)->bdev Getting a struct block_device from a file requires 2 memory dereferences, as illustrated above, which takes a toll on performance, so cache it in the as-yet-unused file->private_data. That gives a noticeable peak performance improvement. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/8415f9fe12e544b9da89593dfbca8de2b52efe03.1634115360.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
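The pattern looks roughly like the userspace sketch below; the structures are stand-ins for the real file/mapping/inode types and are assumptions, not the kernel definitions. The pointer that would otherwise be recomputed through two dereferences on every IO is resolved once at open time and stashed in a per-file private field.

```c
/*
 * Userspace sketch of caching a derived pointer in per-file private data;
 * not the kernel code. All structures and names are illustrative.
 */
#include <stddef.h>

struct block_device { int id; };

struct inode_like {
	struct block_device *bdev;	/* second dereference in the old path */
};

struct mapping_like {
	struct inode_like *host;	/* first dereference in the old path */
};

struct file_like {
	struct mapping_like *f_mapping;
	void *private_data;		/* cache lives here */
};

/* open path: resolve the pointer once */
void file_open(struct file_like *f)
{
	f->private_data = f->f_mapping->host->bdev;
}

/* hot IO path: single load instead of walking mapping->host->bdev */
struct block_device *file_bdev(struct file_like *f)
{
	return (struct block_device *)f->private_data;
}
```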
-
Committed by Christoph Hellwig
Set the poll queue flag to enable polling, given that the multipath node just dispatches the bios to a lower queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
The poll attribute is a historic artefact from before we had explicit poll queues, which require driver-specific configuration. Just print a warning when writing to the attribute. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-