- 07 Dec 2021: 3 commits
Committed by John Garry
The only user of the busy_iter_fn blk_mq_hw_ctx argument is blk_mq_rq_inflight(), which uses the hctx to find the associated request queue to match against the request. However, the caller bt_iter() already performs this same check, so drop it from blk_mq_rq_inflight(). With that change there are no more users of the busy_iter_fn blk_mq_hw_ctx argument, so drop the argument. Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Link: https://lore.kernel.org/r/1638794990-137490-2-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
blk_mq_run_dispatch_ops() is defined as a macro, and plug->mq_list is changed while 'dispatch_ops' runs, so a queue argument read from plug->mq_list would be evaluated again after the list has changed. Add a local variable to hold the request queue. Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com> Fixes: 4cafe86c ("blk-mq: run dispatch lock once in case of issuing from list") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
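The following minimal userspace sketch in C shows why a local variable is needed. RUN_DISPATCH_OPS, struct queue, struct req_list, list_first() and pop_first() are hypothetical stand-ins, not the kernel's definitions, and the lock is modeled as a simple counter. A macro expands its argument textually, so a queue expression read from the list is evaluated a second time after the code block has modified the list.

```c
/* Hypothetical stand-ins for a request queue and a request; not kernel types. */
struct queue { int lock_depth; };
struct req { struct queue *q; };
struct req_list { struct req *reqs[8]; int head; int n; };

/* Hypothetical "run this block under the queue lock" macro.
 * The q argument is expanded twice: once before ops and once after. */
#define RUN_DISPATCH_OPS(q, ops)   \
    do {                           \
        (q)->lock_depth++;         \
        ops;                       \
        (q)->lock_depth--;         \
    } while (0)

static struct req *list_first(struct req_list *l) { return l->reqs[l->head]; }

/* Issuing a request removes it from the list. */
static void pop_first(struct req_list *l) { l->head++; l->n--; }

/* Buggy: list_first(l)->q is re-evaluated after pop_first() changed the
 * list, so the "unlock" can hit a different queue (or a NULL entry). */
static void issue_one_buggy(struct req_list *l)
{
    RUN_DISPATCH_OPS(list_first(l)->q, pop_first(l));
}

/* Fixed: capture the queue in a local variable before running ops. */
static void issue_one_fixed(struct req_list *l)
{
    struct queue *q = list_first(l)->q;
    RUN_DISPATCH_OPS(q, pop_first(l));
}
```

In the buggy variant the increment and decrement land on different queues. Capturing the queue first keeps them balanced.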
-
Committed by Ming Lei
The operation protected by blk_mq_run_dispatch_ops() in blk_mq_run_hw_queue() won't sleep, so don't call might_sleep() for it. Reported-and-tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 04 Dec 2021: 7 commits
Committed by Ming Lei
It isn't necessary to call blk_mq_run_dispatch_ops() for each request issued directly. Calling it once when issuing the whole list is enough. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203131534.3668411-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
Now that the srcu is allocated in the request queue, it is fine to pass the request queue to blk_mq_run_dispatch_ops(). Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203131534.3668411-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
With BLK_MQ_F_BLOCKING, a per-hctx srcu is used to protect the dispatch critical area. However, this srcu instance sits at the end of the hctx and often occupies a standalone cacheline that is usually cold. Inside srcu_read_lock() and srcu_read_unlock(), writes always go to the indirect percpu variable, which is allocated from the heap instead of being embedded, and srcu->srcu_idx is only read in srcu_read_lock(). So it doesn't matter whether the srcu structure lives in the hctx or in the request queue. Switch to a per-request-queue srcu for protecting dispatch. This simplifies quiesce a lot, not to mention that quiesce is always done request-queue wide. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203131534.3668411-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
Remove hctx_lock() and hctx_unlock(), and add a helper, blk_mq_run_dispatch_ops(), to run the code block defined in dispatch_ops with the rcu/srcu read lock held. Compared with hctx_lock()/hctx_unlock(): 1) the two branches become one, so (hctx->flags & BLK_MQ_F_BLOCKING) is checked only once per dispatch_ops run; 2) srcu_idx needn't be touched in the non-blocking case; 3) might_sleep_if() can be moved to the blocking branch. Also put the new blk_mq_run_dispatch_ops() in a private header so that the following patch can use it outside blk-mq.c. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203131534.3668411-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
refcount_t is not as expensive as it used to be, but it's still more expensive than the io_uring method of using atomic_t and just checking for potential over/underflow. This borrows that same implementation, which in turn is based on the mm implementation from Linus. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
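Below is a minimal C11 sketch of the atomic_t-plus-sanity-check idea, written as a userspace stand-in. struct obj, obj_get(), obj_put_and_test() and ref_zero_or_close_to_overflow() are hypothetical names, and the warning is a plain fprintf() rather than a kernel WARN. The check reinterprets the counter as unsigned and adds 127. A single compare then flags a count of 0 as well as small negative values (-1 through -127), which indicate an underflow or a wrapped overflow.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical refcounted object; not a kernel type. */
struct obj { atomic_int ref; };

/* True if the count is 0 or has wrapped into -1..-127. */
static bool ref_zero_or_close_to_overflow(struct obj *o)
{
    return (unsigned int)atomic_load(&o->ref) + 127u <= 127u;
}

static void obj_get(struct obj *o)
{
    if (ref_zero_or_close_to_overflow(o))
        fprintf(stderr, "warning: get on dead or overflowing object\n");
    atomic_fetch_add(&o->ref, 1);
}

/* Returns true when the caller dropped the last reference. */
static bool obj_put_and_test(struct obj *o)
{
    if (ref_zero_or_close_to_overflow(o))
        fprintf(stderr, "warning: put on dead object\n");
    /* atomic_fetch_sub returns the old value; 1 means it is now 0. */
    return atomic_fetch_sub(&o->ref, 1) == 1;
}
```

Compared with a fully saturating counter, this keeps the fast path to one plain atomic operation plus one cheap compare.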
-
Committed by Jens Axboe
Don't call into generic_file_read_iter() if we know it's O_DIRECT; just set it up ourselves and call our own handler. This avoids an indirect call for O_DIRECT. Fall back to filemap_read() if we fail. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
No functional changes in this patch, just in preparation for efficiently calling this light function from the block O_DIRECT handling. Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 03 Dec 2021: 2 commits
Committed by Jens Axboe
When we attempt to merge off the cached request path, we return NULL if successful. This makes the caller believe that it should allocate a new request, and hence we end up with the bio both merged and associated with a new request. This, predictably, leads to all sorts of crashes. Pass in a pointer to the bio pointer, and clear it for the merge case. Then the caller knows that the bio is already queued, and no new request needs to be allocated. Fixes: 5b13bc8a ("blk-mq: cleanup request allocation") Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
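The pointer-to-pointer pattern can be sketched in C as follows. The types and helpers (struct bio, struct request, try_merge(), get_cached_request(), submit()) are simplified hypothetical stand-ins, not the blk-mq functions, and the merge rule is invented purely for the example. Because the bio pointer is cleared on a merge, the caller can tell "merged" apart from "no cached request available", even though both cases return NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified types; not the kernel's definitions. */
struct bio { int sector; };
struct request { struct bio *bio; };

/* Hypothetical merge rule, for illustration only. */
static bool try_merge(struct bio *bio) { return bio->sector % 2 == 0; }

/* Returns a cached request or NULL. On a successful merge, *biop is set
 * to NULL so the caller knows the bio is already queued. */
static struct request *get_cached_request(struct request *cached,
                                          struct bio **biop)
{
    if (try_merge(*biop)) {
        *biop = NULL;
        return NULL;
    }
    if (cached)
        cached->bio = *biop;
    return cached;
}

enum outcome { MERGED, USED_CACHED, ALLOCATED_NEW };

static enum outcome submit(struct request *cached, struct bio *bio)
{
    struct request *rq = get_cached_request(cached, &bio);

    if (!bio)
        return MERGED;        /* already queued: do not allocate */
    if (rq)
        return USED_CACHED;
    return ALLOCATED_NEW;     /* real allocation elided in this sketch */
}
```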
-
Committed by Jens Axboe
The expected case is returning a request, so just check for success and return the request rather than having an error label. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 02 Dec 2021: 1 commit
Committed by Ming Lei
Without a check of q->poll_stat in queue_poll_stat_show(), a kernel panic may occur if q->poll_stat isn't allocated. Fixes: 48b5c1fb ("block: only allocate poll_stats if there's a user of them") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211202090716.3292244-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
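Here is a minimal sketch of the missing check, assuming a statistics buffer that is allocated lazily and may be NULL. struct queue, struct stat_bucket and poll_stat_show() are hypothetical simplifications, not the block layer's definitions.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical simplified types; not the kernel's definitions. */
struct stat_bucket { unsigned long nr_samples; unsigned long mean; };
struct queue { struct stat_bucket *poll_stat; /* may be NULL */ };

/* Writes a summary into buf and returns the number of characters. */
static int poll_stat_show(const struct queue *q, char *buf, size_t len)
{
    /* Stats are only allocated when something uses them. */
    if (!q->poll_stat)
        return 0;
    return snprintf(buf, len, "samples=%lu mean=%lu\n",
                    q->poll_stat->nr_samples, q->poll_stat->mean);
}
```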
-
- 29 Nov 2021: 27 commits
Committed by Ye Bin
We tested with error fault injection based on v4.19, and after testing for some time we found that sync /dev/sda always failed:
    [root@localhost] sync /dev/sda
    sync: error syncing '/dev/sda': Input/output error
scsi log as follows:
    [19069.812296] sd 0:0:0:0: [sda] tag#64 Send: scmd 0x00000000d03a0b6b
    [19069.812302] sd 0:0:0:0: [sda] tag#64 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
    [19069.812533] sd 0:0:0:0: [sda] tag#64 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
    [19069.812536] sd 0:0:0:0: [sda] tag#64 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
    [19069.812539] sd 0:0:0:0: [sda] tag#64 scsi host busy 1 failed 0
    [19069.812542] sd 0:0:0:0: Notifying upper driver of completion (result 0)
    [19069.812546] sd 0:0:0:0: [sda] tag#64 sd_done: completed 0 of 0 bytes
    [19069.812549] sd 0:0:0:0: [sda] tag#64 0 sectors total, 0 bytes done.
    [19069.812564] print_req_error: I/O error, dev sda, sector 0
ftrace log as follows:
    rep-306069 [007] .... 19654.923315: block_bio_queue: 8,0 FWS 0 + 0 [rep]
    rep-306069 [007] .... 19654.923333: block_getrq: 8,0 FWS 0 + 0 [rep]
    kworker/7:1H-250 [007] .... 19654.923352: block_rq_issue: 8,0 FF 0 () 0 + 0 [kworker/7:1H]
    <idle>-0 [007] ..s. 19654.923562: block_rq_complete: 8,0 FF () 18446744073709551615 + 0 [0]
    <idle>-0 [007] d.s. 19654.923576: block_rq_complete: 8,0 WS () 0 + 0 [-5]
Commit 8d699663 introduced 'fq->rq_status', which is only updated when the 'flush_rq' reference count isn't zero. If a flush request fails once, its error code is recorded in 'fq->rq_status'. If 'fq->rq_status' then has no chance to be updated again, every subsequent fsync fails. To address this issue, reset 'fq->rq_status' after returning the error code to the upper layer. Fixes: 8d699663 ("block: fix null pointer dereference in blk_mq_rq_timed_out()") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211129012659.1553733-1-yebin10@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
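The difference between a sticky error status and one that is reset after being reported can be sketched in C. struct flush_state, record_flush_error(), take_status_sticky() and take_status_reset() are hypothetical. The real fix operates on 'fq->rq_status' inside the flush completion path.

```c
/* Hypothetical flush state: last flush error (negative errno), 0 if none. */
struct flush_state { int rq_status; };

/* Records the error of a failed flush. */
static void record_flush_error(struct flush_state *fs, int error)
{
    if (error)
        fs->rq_status = error;
}

/* Pre-fix behaviour: the recorded error is reported but never cleared,
 * so every later completion reports it again. */
static int take_status_sticky(struct flush_state *fs, int error)
{
    if (fs->rq_status)
        error = fs->rq_status;
    return error;
}

/* Post-fix behaviour: report the recorded error once, then reset it. */
static int take_status_reset(struct flush_state *fs, int error)
{
    if (fs->rq_status) {
        error = fs->rq_status;
        fs->rq_status = 0;
    }
    return error;
}
```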
-
Committed by Christoph Hellwig
Now that blk_execute_rq does not take a gendisk argument there is no need to pass it through the scsi_ioctl callchain either. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Remove the gendisk argument to blk_execute_rq and blk_execute_rq_nowait given that it is unused now. Also convert the boolean at_head parameter to actually use the bool type while touching the prototype. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Just use the disk attached to the request_queue instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
There is a 1:1 relationship between request_queues and gendisks now, so no need for these extra checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
The block layer already performs this check, no need to duplicate it in the driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Colin Ian King
The variable ret is initialized with a value that is never read, because it is overwritten later on. The assignment is redundant and can be removed. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20211126230652.1175636-1-colin.i.king@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Remove the ioc argument as it always points to current->io_context. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Remove the ioc and gfp_mask arguments, which are hard-coded by the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Grab a reference to the newly allocated or existing io_context in create_task_io_context and return it. This simplifies the callers and removes the need for double lookups. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
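The "get or create, and return with a reference held" shape can be sketched as follows. This assumes a single-threaded userspace model with a plain int refcount. struct task, struct io_ctx and get_task_ioc() are hypothetical names, and the locking the kernel needs is omitted. Returning the referenced object directly is what spares the caller a second lookup.

```c
#include <stdlib.h>

/* Hypothetical simplified types; not the kernel's io_context. */
struct io_ctx { int refcount; };
struct task { struct io_ctx *ioc; };

/* Returns the task's context with an extra reference for the caller,
 * creating it if absent. Returns NULL on allocation failure. */
static struct io_ctx *get_task_ioc(struct task *t)
{
    if (!t->ioc) {
        struct io_ctx *ioc = malloc(sizeof(*ioc));

        if (!ioc)
            return NULL;
        ioc->refcount = 1;    /* the task's own reference */
        t->ioc = ioc;
    }
    t->ioc->refcount++;       /* the caller's reference */
    return t->ioc;
}
```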
-
Committed by Christoph Hellwig
In __copy_io() we know that the newly allocated task_struct does not have an I/O context yet and is not exiting. So just allocate the I/O context struct and install it directly. There is no need to lock the task either, as it is just being created. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Factor out a helper that just allocates an I/O context. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Fold it into its only caller, and remove a lot of the debug checks that are not needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
After the prepare side has been moved to the only I/O scheduler that cares, do the same for the cleanup and the NULL initialization. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Move blk_mq_sched_assign_ioc so that many interfaces from the file can be marked static. Rename the function to ioc_find_get_icq as well and return the icq to simplify the interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
This reverts commit 4896c4e64ba5d5d5acdbcf68c5910dd4f6d8fa62. The helper is not needed any more. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
No need to create a new I/O context if there is none present yet in ->limit_depth. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Remove the unused bfqd argument, and hardcode ioc to current->io_context. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Move the copying of the I/O context to the block layer as that is where we can use the proper low-level interfaces. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Add the proper module prefix to avoid conflicts with a function in the scheduler. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
bio->bi_opf isn't finalized before the bio is checked, so only use it after submit_bio_checks() returns. Fixes: 5b13bc8a ("blk-mq: cleanup request allocation") Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jan Kara
Commit 7cc4ffc5 ("block, bfq: put reqs of waker and woken in dispatch list") added a condition to bfq_insert_request() which added the waker's requests directly to the dispatch list. The rationale was that completing the waker's IO is needed to get more IO for the current queue. Although this rationale is valid, there is a hole in it. The waker does not necessarily serve IO only for the current queue, and its current IO may not be needed for the current queue to make progress. Furthermore, injecting IO like this completely bypasses any service accounting within bfq, so we do not properly track how much service the waker's queue is getting, or even that the waker is doing any IO. Depending on the conditions, this can result in the waker getting too much or too little service. Consider for example the following job file:
    [global]
    directory=/mnt/repro/
    rw=write
    size=8g
    time_based
    runtime=30
    ramp_time=10
    blocksize=1m
    direct=0
    ioengine=sync
    [slowwriter]
    numjobs=1
    prioclass=2
    prio=7
    fsync=200
    [fastwriter]
    numjobs=1
    prioclass=2
    prio=0
    fsync=200
Although the processes have very different IO priorities, they get the same amount of service. The reason is that bfq identifies these processes as having a waker-wakee relationship, and once that happens, IO from fastwriter gets injected during slowwriter's time slice. As a result bfq is not aware that fastwriter has any IO to do and constantly schedules only slowwriter's queue. Thus fastwriter is forced to compete with slowwriter's IO all the time instead of getting its share of time based on IO priority. Drop the special injection condition from bfq_insert_request(). As a result, requests will be tracked and queued in the normal way, and on the next dispatch bfq_select_queue() can decide whether the waker's inserted requests should be injected during the current queue's timeslice or not.
Fixes: 7cc4ffc5 ("block, bfq: put reqs of waker and woken in dispatch list") Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-8-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jan Kara
Waker-wakee relationships are important in deciding whether one queue can preempt the other one. Print information about detected waker-wakee relationships so that scheduling decisions can be better understood from block traces. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-7-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jan Kara
Instead of having a helper that formats the bfqq pid, provide a helper to generate the full bfqq name as used in the traces. It saves some code duplication and will save more in the coming tracepoints. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-6-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jan Kara
Currently, when process A starts issuing requests shortly after process B has completed some IO three times in a row, we decide that B is a "waker" of A, meaning that completing B's IO is needed for A to make progress, and we generally stop separating A's and B's IO much. This logic is useful to avoid unnecessary idling, and thus throughput loss, in cases where the workload needs to switch, e.g., between the process and the journaling thread doing IO. However, the detection heuristic frequently gives false positives when A and B are fighting for IO bandwidth and other processes aren't doing much IO, because we are basically bound to eventually accumulate three occurrences of one process starting to issue requests after the other has completed some IO. To reduce these false positives, also cancel the waker detection if we didn't accumulate three detected wakeups within a given timeout. The rationale is that if wakeups are really rare, the pointless idling doesn't hurt throughput that much anyway. This significantly reduces false waker detection for a workload like:
    [global]
    directory=/mnt/repro/
    rw=write
    size=8g
    time_based
    runtime=30
    ramp_time=10
    blocksize=1m
    direct=0
    ioengine=sync
    [slowwriter]
    numjobs=1
    fsync=200
    [fastwriter]
    numjobs=1
    fsync=200
Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-5-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
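The timeout-based cancellation can be sketched as a small state machine. struct waker_state, note_possible_wakeup() and the timeout_ns parameter are hypothetical. The sketch models only the rule "three detections of the same candidate within a time window" and does not reproduce bfq's actual bookkeeping.

```c
#include <stdint.h>

#define WAKER_DETECTIONS_NEEDED 3

/* Hypothetical per-queue detection state. */
struct waker_state {
    int cand_waker;        /* candidate waker id, -1 if none */
    int detections;        /* wakeups seen for cand_waker */
    uint64_t started_ns;   /* when detection for cand_waker began */
    int waker;             /* confirmed waker id, -1 if none */
};

/* Called when this queue starts issuing IO shortly after queue `other`
 * completed some IO. */
static void note_possible_wakeup(struct waker_state *s, int other,
                                 uint64_t now_ns, uint64_t timeout_ns)
{
    /* New candidate, or the old one took too long: restart detection. */
    if (other != s->cand_waker || now_ns > s->started_ns + timeout_ns) {
        s->cand_waker = other;
        s->detections = 1;
        s->started_ns = now_ns;
        return;
    }
    if (++s->detections >= WAKER_DETECTIONS_NEEDED)
        s->waker = other;
}
```

With a 100 ns window, three wakeups at 0, 50 and 90 confirm the waker. A third wakeup at 500 instead restarts detection from scratch.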
-
Committed by Jan Kara
When cgroup IO scheduling is used with BFQ, it does not really provide service differentiation if the cgroup drives a big IO depth. That happens, for example, with writeback, which asynchronously submits lots of IO, but it can happen with AIO as well. The problem is that if we have two cgroups that submit IO with different weights, the cgroup with the higher weight properly gets more IO time and is able to dispatch more IO. However, this causes the lower-weight cgroup to accumulate more requests inside BFQ, and eventually the lower-weight cgroup consumes most of the IO scheduler tags. At that point the higher-weight cgroup stops getting better service, because it is mostly blocked waiting for a scheduler tag while its queues inside BFQ are empty, and thus the lower-weight cgroup gets served. Check in bfq_limit_depth() how many requests the submitting cgroup has allocated, and if it consumes more requests than would correspond to its weight, limit its available depth to 1 so that the cgroup cannot consume many more requests. With this limitation the higher-weight cgroup gets proper service even with writeback. Reviewed-by: Michal Koutný <mkoutny@suse.com> Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-4-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
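Here is a simplified, single-level sketch of the depth-limiting rule. If a group holds more scheduler tags than its weight-proportional share of the queue depth, its available depth drops to 1. struct group, over_weight_share() and limit_depth() are hypothetical names, and rounding to nearest is an assumption. BFQ's real check also accounts for the cgroup hierarchy.

```c
#include <stdbool.h>

/* Hypothetical per-cgroup accounting; not BFQ's structures. */
struct group {
    unsigned int weight;     /* cgroup IO weight */
    unsigned int allocated;  /* scheduler tags currently held */
};

/* True if g holds more tags than its weight share of `depth`.
 * wsum is the sum of the weights of all active groups. */
static bool over_weight_share(const struct group *g, unsigned int wsum,
                              unsigned int depth)
{
    /* Weight-proportional share, rounded to nearest, at least 1. */
    unsigned int share = (depth * g->weight + wsum / 2) / wsum;

    if (share == 0)
        share = 1;
    return g->allocated > share;
}

/* Depth the group may use for its next allocation. */
static unsigned int limit_depth(const struct group *g, unsigned int wsum,
                                unsigned int depth)
{
    return over_weight_share(g, wsum, depth) ? 1 : depth;
}
```

Take weights 100 and 300 with a depth of 64. The shares come out as 16 and 48 tags, so a low-weight group holding 40 tags is throttled to a depth of 1, while the high-weight group keeps the full depth.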
-