- 23 12月, 2021 3 次提交
-
-
由 Lukas Bulwahn 提交于
Commit 5fc11eeb ("block: open code create_task_io_context in set_task_ioprio") introduces a needless assignment 'ioc = task->io_context', as the local variable ioc is not further used before returning. Even after the further fix, commit a957b612 ("block: fix error in handling dead task for ioprio setting"), the assignment still remains needless. Drop this needless assignment in set_task_ioprio(). This code smell was identified with 'make clang-analyzer'. Fixes: 5fc11eeb ("block: open code create_task_io_context in set_task_ioprio") Signed-off-by: NLukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211223125300.20691-1-lukas.bulwahn@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Keith Busch 提交于
While harmless, the blank line is certainly not intended to be part of the rq_list_for_each() macro. Remove it. Signed-off-by: NKeith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211222215239.1768164-1-kbusch@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Randy Dunlap 提交于
Fix all kernel-doc warnings in <linux/bio.h>: include/linux/bio.h:136: warning: Function parameter or member 'nbytes' not described in 'bio_advance' include/linux/bio.h:136: warning: Excess function parameter 'bytes' description in 'bio_advance' include/linux/bio.h:391: warning: No description found for return value of 'bio_next_split' Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Link: https://lore.kernel.org/r/20211222211532.24060-1-rdunlap@infradead.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 22 12月, 2021 3 次提交
-
-
由 Tetsuo Handa 提交于
ioctl(fd, LOOP_CTL_ADD, 1048576) causes sysfs: cannot create duplicate filename '/dev/block/7:0' message because such request is treated as if ioctl(fd, LOOP_CTL_ADD, 0) due to MINORMASK == 1048575. Verify that all minor numbers for that device fit in the minor range. Reported-by: Nwangyangbo <wangyangbo@uniontech.com> Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/b1b19379-23ee-5379-0eb5-94bf5f79f1b4@i-love.sakura.ne.jpSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tetsuo Handa 提交于
Since lo_simple_ioctl(LOOP_SET_BLOCK_SIZE) and ioctl(NBD_SET_BLKSIZE) pass user-controlled "unsigned long arg" to blk_validate_block_size(), "unsigned long" should be used for validation. Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/9ecbf057-4375-c2db-ab53-e4cc0dff953d@i-love.sakura.ne.jpSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
One device_add is called disk->ev will be freed by disk_release, so we should free it twice. Fix this by allocating disk->ev after device_add so that the extra local unwinding can be removed entirely. Based on an earlier patch from Tetsuo Handa. Reported-by: Nsyzbot <syzbot+28a66a9fbc621c939000@syzkaller.appspotmail.com> Tested-by: Nsyzbot <syzbot+28a66a9fbc621c939000@syzkaller.appspotmail.com> Fixes: 83cbce95 ("block: add error handling for device_add_disk / add_disk") Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211221161851.788424-1-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 12月, 2021 4 次提交
-
-
由 Ming Lei 提交于
blk_stat_disable_accounting() is added in commit 68497092 ("block: make queue stat accounting a reference"), and called in kyber_exit_sched(). So we have to free q->stats after elevator is unloaded from blk_exit_queue() in blk_release_queue(). Otherwise kernel panic is caused. Fixes: 68497092 ("block: make queue stat accounting a reference") Signed-off-by: NMing Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211221040436.1333880-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Don't combine the task exiting and "already have io_context" case, we need to just abort if the task is marked as dead. Return -ESRCH, which is the documented value for ioprio_set() if the specified task could not be found. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Reported-by: syzbot+8836466a79f4175961b0@syzkaller.appspotmail.com Fixes: 5fc11eeb ("block: open code create_task_io_context in set_task_ioprio") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Keith Busch 提交于
The low level drivers don't expect to see new requests after a successful quiesce completes. Check the queue quiesce state within the rcu protected area prior to calling the driver's queue_rqs(). Fixes: 3c67d44d ("block: add mq_ops->queue_rqs hook") Signed-off-by: NKeith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211220205919.180191-1-kbusch@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Wander Lairson Costa 提交于
The running_trace_lock protects running_trace_list and is acquired within the tracepoint which implies disabled preemption. The spinlock_t typed lock can not be acquired with disabled preemption on PREEMPT_RT because it becomes a sleeping lock. The runtime of the tracepoint depends on the number of entries in running_trace_list and has no limit. The blk-tracer is considered debug code and higher latencies here are okay. Make running_trace_lock a raw_spinlock_t. Signed-off-by: NWander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20211220192827.38297-1-wander@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 17 12月, 2021 14 次提交
-
-
由 Christoph Hellwig 提交于
Only bfq needs to code to track icq, so make it conditional. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-12-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Fold create_task_io_context into the only remaining caller. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-11-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The flow in set_task_ioprio can be simplified by simply open coding create_task_io_context, which removes a refcount roundtrip on the I/O context. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-10-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Fold get_task_io_context into its only caller, and simplify the code as no reference to the I/O context is required to just set the ioprio field. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-9-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Keep set_task_ioprio with the other low-level code that accesses the io_context structure. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-8-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Fold __ioc_clear_queue into ioc_clear_queue and switch to always use plain _irq locking instead of the more expensive _irqsave that is not needed here. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-7-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Move the code to delay freeing the icqs into a separate helper. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-6-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
No caller passes in a NULL pointer, so remove the check. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Factor out a ioc_exit_icqs helper to tear down the icqs and the fold the rest of put_iocontext_active into exit_io_context. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Don't hold a reference to ->refcount for each active reference, but just one for all active references. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Nothing ever looks at ->nr_tasks, so remove it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
This enables the block layer to send us a full plug list of requests that need submitting. The block layer guarantees that they all belong to the same queue, but we do have to check the hardware queue mapping for each request. If errors are encountered, leave them in the passed in list. Then the block layer will handle them individually. This is good for about a 4% improvement in peak performance, taking us from 9.6M to 10M IOPS/core. Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Add a nvme_prep_rq() helper to setup a command, and nvme_queue_rq() is adapted to use this helper. Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We'll need it for batched submit as well. Since we now have a copy helper, get rid of the nvme_submit_cmd() wrapper. Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NMax Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 16 12月, 2021 3 次提交
-
-
由 Jens Axboe 提交于
If we have a list of requests in our plug list, send it to the driver in one go, if possible. The driver must set mq_ops->queue_rqs() to support this, if not the usual one-by-one path is used. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Pointless to maintain a head/tail for the list, as we never need to access the tail. Entries are always LIFO for cache hotness reasons. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
The batched completions only deal with non-partial requests anyway, and it doesn't deal with any requests that have errors. Add a completion handler that assumes it's a full request and that it's all being ended successfully. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 12月, 2021 1 次提交
-
-
由 Jens Axboe 提交于
kyber turns on IO statistics when it is loaded on a queue, which means that even if kyber is then later unloaded, we're still stuck with stats enabled on the queue. Change the account enabled from a bool to an int, and pair the enable call with the equivalent disable call. This ensures that stats gets turned off again appropriately. Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 14 12月, 2021 1 次提交
-
-
由 Matthew Wilcox (Oracle) 提交于
Add a Context section and rewrite the rest to be clearer. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Link: https://lore.kernel.org/r/20211213171113.3097631-1-willy@infradead.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 13 12月, 2021 1 次提交
-
-
由 Christoph Hellwig 提交于
mtdblock / mtdblock_ro set part_bits to 0 and thus nevever scanned partitions. Restore that behavior by setting the GENHD_FL_NO_PART flag. Fixes: 1ebe2e5f ("block: remove GENHD_FL_EXT_DEVT") Reported-by: NGeert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: NGeert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20211206070409.2836165-1-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 07 12月, 2021 5 次提交
-
-
由 John Garry 提交于
Kashyap reports high CPU usage in blk_mq_queue_tag_busy_iter() and callees using megaraid SAS RAID card since moving to shared tags [0]. Previously, when shared tags was shared sbitmap, this function was less than optimum since we would iter through all tags for all hctx's, yet only ever match upto tagset depth number of rqs. Since the change to shared tags, things are even less efficient if we have parallel callers of blk_mq_queue_tag_busy_iter(). This is because in bt_iter() -> blk_mq_find_and_get_req() there would be more contention on accessing each request ref and tags->lock since they are now shared among all HW queues. Optimise by having separate calls to bt_for_each() for when we're using shared tags. In this case no longer pass a hctx, as it is no longer relevant, and teach bt_iter() about this. Ming suggested something along the lines of this change, apart from a different implementation. [0] https://lore.kernel.org/linux-block/e4e92abbe9d52bcba6b8cc6c91c442cc@mail.gmail.com/Signed-off-by: NJohn Garry <john.garry@huawei.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NMing Lei <ming.lei@redhat.com> Reported-and-tested-by: NKashyap Desai <kashyap.desai@broadcom.com> Fixes: e155b0c2 ("blk-mq: Use shared tags for shared sbitmap support") Link: https://lore.kernel.org/r/1638794990-137490-4-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 John Garry 提交于
Typedefs busy_iter_fn and busy_tag_iter_fn are now identical, so delete busy_iter_fn to reduce duplication. It would be nicer to delete busy_tag_iter_fn, as the name busy_iter_fn is less specific. However busy_tag_iter_fn is used in many different parts of the tree, unlike busy_iter_fn which is just use in block/, so just take the straightforward path now, so that we could rename later treewide. Signed-off-by: NJohn Garry <john.garry@huawei.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Tested-by: NKashyap Desai <kashyap.desai@broadcom.com> Link: https://lore.kernel.org/r/1638794990-137490-3-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 John Garry 提交于
The only user of blk_mq_hw_ctx blk_mq_hw_ctx argument is blk_mq_rq_inflight(). Function blk_mq_rq_inflight() uses the hctx to find the associated request queue to match against the request. However this same check is already done in caller bt_iter(), so drop this check. With that change there are no more users of busy_iter_fn blk_mq_hw_ctx argument, so drop the argument. Reviewed-by Hannes Reinecke <hare@suse.de> Signed-off-by: NJohn Garry <john.garry@huawei.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Tested-by: NKashyap Desai <kashyap.desai@broadcom.com> Link: https://lore.kernel.org/r/1638794990-137490-2-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
blk_mq_run_dispatch_ops() is defined as one macro, and plug->mq_list will be changed when running 'dispatch_ops', so add one local variable for holding request queue. Reported-and-tested-by: NYi Zhang <yi.zhang@redhat.com> Fixes: 4cafe86c ("blk-mq: run dispatch lock once in case of issuing from list") Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
The operation protected via blk_mq_run_dispatch_ops() in blk_mq_run_hw_queue won't sleep, so don't run might_sleep() for it. Reported-and-tested-by: NMarek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 04 12月, 2021 5 次提交
-
-
由 Ming Lei 提交于
It isn't necessary to call blk_mq_run_dispatch_ops() once for issuing single request directly, and enough to do it one time when issuing from whole list. Signed-off-by: NMing Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203131534.3668411-5-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
We have switched to allocate srcu into request queue, so it is fine to pass request queue to blk_mq_run_dispatch_ops(). Signed-off-by: NMing Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203131534.3668411-4-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
In case of BLK_MQ_F_BLOCKING, per-hctx srcu is used to protect dispatch critical area. However, this srcu instance stays at the end of hctx, and it often takes standalone cacheline, often cold. Inside srcu_read_lock() and srcu_read_unlock(), WRITE is always done on the indirect percpu variable which is allocated from heap instead of being embedded, srcu->srcu_idx is read only in srcu_read_lock(). It doesn't matter if srcu structure stays in hctx or request queue. So switch to per-request-queue srcu for protecting dispatch, and this way simplifies quiesce a lot, not mention quiesce is always done on the request queue wide. Signed-off-by: NMing Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203131534.3668411-3-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
Remove hctx_lock and hctx_unlock, and add one helper of blk_mq_run_dispatch_ops() to run code block defined in dispatch_ops with rcu/srcu read held. Compared with hctx_lock()/hctx_unlock(): 1) remove 2 branch to 1, so we just need to check (hctx->flags & BLK_MQ_F_BLOCKING) once when running one dispatch_ops 2) srcu_idx needn't to be touched in case of non-blocking 3) might_sleep_if() can be moved to the blocking branch Also put the added blk_mq_run_dispatch_ops() in private header, so that the following patch can use it out of blk-mq.c. Signed-off-by: NMing Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203131534.3668411-2-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
refcount_t is not as expensive as it used to be, but it's still more expensive than the io_uring method of using atomic_t and just checking for potential over/underflow. This borrows that same implementation, which in turn is based on the mm implementation from Linus. Reviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-