- 02 June 2015, 1 commit
-
Submitted by Keith Busch

Storage controllers may expose multiple block devices that share hardware resources managed by blk-mq. This patch enhances the shared tags so a low-level driver can access the shared resources not tied to the unshared h/w contexts. This way the LLD can dynamically add and delete disks and request queues without having to track all the request_queue hctx's in order to iterate outstanding tags.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
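A minimal sketch of what this enables for an LLD, assuming the iterator helper introduced here is blk_mq_all_tag_busy_iter() with a (request, data, reserved) callback; treat the exact signatures, and the lld_* names, as assumptions:

    static void lld_cancel_rq(struct request *req, void *data, bool reserved)
    {
            /* fail each outstanding request, e.g. on controller removal */
            blk_mq_end_request(req, -EIO);
    }

    static void lld_abort_all(struct blk_mq_tag_set *set)
    {
            int i;

            /* walk the shared tag sets directly -- no need to track
             * every request_queue's hctx's */
            for (i = 0; i < set->nr_hw_queues; i++)
                    blk_mq_all_tag_busy_iter(set->tags[i],
                                             lld_cancel_rq, NULL);
    }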
-
- 19 May 2015, 1 commit
-
Submitted by Christoph Hellwig

lockdep gets unhappy about us not disabling irqs when using the queue_lock around it. Instead of trying to fix that up, just switch to an atomic_t and get rid of the lock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
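A hedged sketch of the shape of this conversion; the counter here (mq_freeze_depth) and the surrounding freeze logic are my reading of this era's code, not quoted from the patch:

    void blk_mq_freeze_queue_start(struct request_queue *q)
    {
            /* first freezer (0 -> 1) kills the percpu usage counter;
             * no queue_lock taken, so nothing for lockdep to flag */
            if (atomic_inc_return(&q->mq_freeze_depth) == 1) {
                    percpu_ref_kill(&q->mq_usage_counter);
                    blk_mq_run_hw_queues(q, false);
            }
    }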
-
- 09 May 2015, 4 commits
-
Submitted by Shaohua Li

The last patch made plugging work for the multiple-queue case. However, it only works for a single disk, because it assumes there is only one request in the plug list. If a task is accessing multiple disks, e.g. MD/DM, the assumption is wrong. Let blk_attempt_plug_merge() record the request from the same queue.

V2: use a NULL parameter in the !mq case. Fix a bug. Add comments in blk_attempt_plug_merge to make it (hopefully) less confusing.

Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Shaohua Li

Plugging is still helpful for workloads with IO merging, but it can be harmful otherwise, especially with multiple hardware queues, as there is (supposedly) no lock contention in this case and plugging can introduce latency. For multiple queues, we do a limited plug, e.g. plug only if there is a request merge. If a request doesn't merge with the following request, the request will be dispatched immediately.

V2: check blk_queue_nomerges() as suggested by Jeff.

Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
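An illustrative sketch of the limited-plug idea, not the literal patch; mq_limited_plug() is a hypothetical helper, and the merge test shown is a stand-in for the real plug-merge path:

    static void mq_limited_plug(struct blk_plug *plug, struct request *rq)
    {
            struct request *last = NULL;

            if (!list_empty(&plug->mq_list))
                    last = list_entry(plug->mq_list.prev,
                                      struct request, queuelist);

            /* the new request did not merge with the last plugged one:
             * dispatch the plugged IO now rather than add latency */
            if (last && !blk_rq_merge_ok(last, rq->bio))
                    blk_flush_plug_list(plug, false);

            list_add_tail(&rq->queuelist, &plug->mq_list);
    }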
-
Submitted by Shaohua Li

If we directly issue a request and it fails, we use blk_mq_merge_queue_io(). But we have already assigned the bio to a request in blk_mq_bio_to_request(), so blk_mq_merge_queue_io() shouldn't run blk_mq_bio_to_request() again.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jeff Moyer

The following appears in blk_sq_make_request:

    /*
     * If we have multiple hardware queues, just go directly to
     * one of those for sync IO.
     */

We clearly don't have multiple hardware queues here! This comment was introduced with commit 07068d5b (blk-mq: split make request handler for multi and single queue):

    We want slightly different behavior from them:
    - On single queue devices, we currently use the per-process plug
      for deferred IO and for merging.
    - On multi queue devices, we don't use the per-process plug, but
      we want to go straight to hardware for SYNC IO.

The old code had this:

    use_plug = !is_flush_fua && ((q->nr_hw_queues == 1) || !is_sync);

and that was converted to:

    use_plug = !is_flush_fua && !is_sync;

which is not equivalent. For the single queue case, the second half of the && expression is always true. So what I think was actually intended follows (and this more closely matches what is done in blk_queue_bio).

V2: delete the 'likely', which should not be a big deal

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
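A hedged reconstruction of the intended condition (implied by the message, not quoted from the patch): in blk_sq_make_request nr_hw_queues is 1, so the old expression reduces to plugging everything except flush/FUA, regardless of is_sync:

    /* single hw queue: plug all non-flush/FUA IO, sync or not */
    use_plug = !is_flush_fua;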
-
- 17 April 2015, 1 commit
-
Submitted by Jens Axboe

Commit 889fa31f was a bit too eager in reducing the loop count, so we ended up missing queues in some configurations. Ensure that our division rounds up, so that's not the case.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 889fa31f ("blk-mq: reduce unnecessary software queue looping")
Signed-off-by: Jens Axboe <axboe@fb.com>
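The arithmetic at issue, as a minimal sketch (the map_words() name and its parameters are assumptions): plain integer division truncates, silently dropping a partial bitmap word, while DIV_ROUND_UP keeps it:

    #include <linux/kernel.h>

    static unsigned int map_words(unsigned int nr_ctx,
                                  unsigned int bits_per_word)
    {
            /* 7 / 8 == 0 would skip the queues in the partial word;
             * DIV_ROUND_UP(7, 8) == (7 + 8 - 1) / 8 == 1 keeps them */
            return DIV_ROUND_UP(nr_ctx, bits_per_word);
    }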
-
- 16 April 2015, 1 commit
-
Submitted by Chong Yuan

In flush_busy_ctxs() and blk_mq_hctx_has_pending(), regardless of how many ctxs are assigned to one hctx, they will all loop hctx->ctx_map.map_size times. Here hctx->ctx_map.map_size is a constant, ALIGN(nr_cpu_ids, 8) / 8. flush_busy_ctxs() in particular is in a hot code path, and the extra iterations are unnecessary. Change ->map_size to contain the number of actually mapped software queues, so we only loop for as many iterations as we have to.

Also remove the cpumask setting and nr_ctx count in blk_mq_init_cpu_queues(), since they are all redone in blk_mq_map_swqueue().

Signed-off-by: Chong Yuan <chong.yuan@memblaze.com>
Reviewed-by: Wenbo Wang <wenbo.wang@memblaze.com>

Updated by me for formatting and commenting.

Signed-off-by: Jens Axboe <axboe@fb.com>
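A hedged sketch of the loop this tightens; the ctx_map field layout is my recollection of the 2015 blk-mq code, so treat it as an assumption:

    static bool hctx_has_pending(struct blk_mq_hw_ctx *hctx)
    {
            unsigned int i;

            /* map_size now counts only the words that hold mapped
             * software queues, not the ALIGN(nr_cpu_ids, 8) / 8 worst
             * case, so idle words are never scanned */
            for (i = 0; i < hctx->ctx_map.map_size; i++)
                    if (hctx->ctx_map.map[i].word)
                            return true;
            return false;
    }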
-
- 12 April 2015, 1 commit
-
Submitted by Linus Torvalds

Jan Engelhardt reports a strange oops with an invalid ->sense_buffer pointer in scsi_init_cmd_errh() with the blk-mq code. The sense_buffer pointer should have been initialized by the call to scsi_init_request() from blk_mq_init_rq_map(), but there seems to be some non-repeatable memory corruption.

This patch makes sure we initialize the whole struct request allocation (and the associated 'struct scsi_cmnd' for the SCSI case) to zero, by using __GFP_ZERO in the allocation. The old code initialized a couple of individual fields, leaving the rest undefined (although many of them are then initialized in later phases, like blk_mq_rq_ctx_init() etc.).

It's not entirely clear why this matters, but it's the right thing to do regardless, and with 4.0 imminent this is the defensive "let's just make sure everything is initialized properly" patch.

Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
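The defensive change in miniature, as a hedged sketch; the real patch touches the rq_map page allocation, and the alloc_rq_pages() name here is an assumption:

    static void *alloc_rq_pages(size_t size)
    {
            /* __GFP_ZERO guarantees every struct request -- and the
             * scsi_cmnd that trails it in the SCSI case -- starts
             * fully zeroed, instead of relying on later
             * field-by-field initialization */
            return (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
                                            get_order(size));
    }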
-
- 30 March 2015, 2 commits
-
Submitted by Wei Fang

Don't assign ->rq_timeout twice.

Signed-off-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Xiaoguang Wang

At the beginning of blk_mq_alloc_tag_set(), we have already checked whether 'set->nr_hw_queues' is zero, so remove the redundant check here.

Signed-off-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 13 March 2015, 4 commits
-
Submitted by Keith Busch

Return -EBUSY if we're unable to enter a queue immediately when allocating a blk-mq request without __GFP_WAIT.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
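A caller-side sketch under this era's API, where blk_mq_alloc_request() took a gfp mask; the error handling shown is illustrative:

    struct request *rq;

    /* no __GFP_WAIT: a frozen or dying queue now yields -EBUSY
     * instead of blocking the caller */
    rq = blk_mq_alloc_request(q, WRITE, GFP_ATOMIC, false);
    if (IS_ERR(rq))
            return PTR_ERR(rq);    /* may be -EBUSY: back off, retry */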
-
Submitted by Mike Snitzer

Rename blk_mq_run_queues to blk_mq_run_hw_queues, add an async argument, and export it. DM's suspend support must be able to run the queue without starting stopped hw queues.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
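The resulting export as I understand it, with a hedged usage line; md->queue stands in for a driver's queue pointer:

    void blk_mq_run_hw_queues(struct request_queue *q, bool async);

    /* e.g. DM kicking the queues asynchronously, leaving explicitly
     * stopped hw queues untouched: */
    blk_mq_run_hw_queues(md->queue, true);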
-
Submitted by Mike Snitzer

Add a variant of blk_mq_init_queue that allows a previously allocated queue to be initialized. blk_mq_init_allocated_queue models blk_init_allocated_queue -- which was also created for DM's use.

DM's approach to device creation requires a placeholder request_queue to be allocated for use with alloc_dev(), but the decision about what type of request_queue will ultimately be created is deferred until all component devices referenced in the DM table are processed to determine the table type (request-based, blk-mq request-based, or bio-based).

Also, because of DM's late finalization of the request_queue type, the call to blk_mq_register_disk() doesn't happen during alloc_dev(). We must export blk_mq_register_disk() so that DM can backfill the 'mq' dir once the blk-mq queue is fully allocated.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
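A hedged sketch of the two-step bring-up this enables for DM; md is a hypothetical driver context, and the exact signatures are my reading of this era's API:

    /* at alloc_dev() time: placeholder queue, type still unknown */
    q = blk_alloc_queue_node(GFP_KERNEL, numa_node);

    /* later, once the table turns out to be blk-mq request-based: */
    q = blk_mq_init_allocated_queue(&md->tag_set, q);
    if (IS_ERR(q))
            goto bad;
    blk_mq_register_disk(md->disk);    /* backfill the sysfs 'mq' dir */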
-
Submitted by Mike Snitzer

If percpu_ref_init() fails, the allocated q and hctxs must get cleaned up; jumping to 'err_map' doesn't allow that to happen.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 11 February 2015, 1 commit
-
Submitted by Jens Axboe

We no longer use it outside of blk-mq.c, so we can make it static and stop exporting it. Additionally, kill the 'async' argument, as there's only one user of it.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 30 January 2015, 2 commits
-
Submitted by Ming Lei

The kobject memory inside blk-mq hctx/ctx shouldn't have been freed before the kobject is released, because the driver core can access it freely before its release. We can't do the freeing in the ctx/hctx/mq_kobj release handlers, because they can run before blk_cleanup_queue(). Given mq_kobj shouldn't have been introduced in the first place, this patch simply moves mq's release into blk_release_queue().

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Ming Lei

This reverts commit 76d697d1. Commit 76d697d1 causes a general protection fault, reported by Bart Van Assche:

    https://lkml.org/lkml/2015/1/28/334

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 24 January 2015, 1 commit
-
Submitted by Shaohua Li

This is the blk-mq part of support for tag allocation policy. The default allocation policy isn't changed (though it's not a strict FIFO). The new policy, round-robin, is for libata. It is a best-effort implementation, though: if multiple tasks are competing, the tags returned will be mixed (which is unavoidable even in the !mq case, as requests from different tasks can be mixed in the queue).

Cc: Jens Axboe <axboe@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
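A self-contained sketch of best-effort round-robin tag allocation; the hypothetical rr_alloc_tag() is illustrative, not the kernel's implementation. Each search starts just past the last tag handed out and wraps around:

    static unsigned int last_tag;

    static int rr_alloc_tag(unsigned long *bitmap, unsigned int depth)
    {
            unsigned int i, tag;

            for (i = 0; i < depth; i++) {
                    tag = (last_tag + 1 + i) % depth;
                    if (!test_and_set_bit(tag, bitmap)) {
                            last_tag = tag;
                            return tag;
                    }
            }
            /* competing tasks may still interleave their tags */
            return -1;
    }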
-
- 21 January 2015, 1 commit
-
Submitted by Ming Lei

The kobject memory shouldn't have been freed before the kobject is released, because the driver core can access it freely before its release. This patch frees hctx in its release callback. For ctx, they share one single per-cpu variable which is associated with the request queue, so free ctx in q->mq_kobj's release handler.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com> (fix ctx kobjects)
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
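The pattern being enforced, as a hedged sketch: an object embedding a kobject is freed only from the kobject's release callback, never directly, because the driver core may still hold references. Structure names follow blk-mq; the details are assumptions:

    static void blk_mq_hw_sysfs_release(struct kobject *kobj)
    {
            struct blk_mq_hw_ctx *hctx;

            hctx = container_of(kobj, struct blk_mq_hw_ctx, kobj);
            kfree(hctx);    /* safe here: the last reference is gone */
    }

    static struct kobj_type blk_mq_hw_ktype = {
            .release = blk_mq_hw_sysfs_release,
    };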
-
- 08 January 2015, 7 commits
-
Submitted by Keith Busch

Requests that haven't been started prior to a queue dying can be ended in error without waiting for them to start and time out.

Signed-off-by: Keith Busch <keith.busch@intel.com>

Added a code comment to explain why this is done.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Keith Busch

Some types of requests may be started that are not guaranteed to ever complete. This adds a request flag that a driver can use to mark a request as such.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jens Axboe

Add a helper function a driver can use to abort requeued requests, in case any are pending when h/w queues are being removed.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Keith Busch

Kicking requeued requests will start h/w queues in a work_queue, which may override the driver's request to keep them temporarily stopped. This patch exports a method to cancel the q->requeue_work so a driver can be assured stopped h/w queues won't be started up before it is ready.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Keith Busch

Drivers can iterate over all allocated request tags, but their callback needs a way to know if the driver started the request in the first place.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Keith Busch

When the queue is set to dying, wake up tasks that are waiting on the frozen queue so they realize it is dying and abandon their requests.

Signed-off-by: Keith Busch <keith.busch@intel.com>

Modified by me to add a code comment on the need for the wakeup.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jens Axboe

We store it in the tag set, so we don't need it in the hardware queue. While removing cmd_size, place ->queue_num further down to avoid a hole on 64-bit archs. It's not used in any fast paths, so we can safely move it.

Signed-off-by: Jens Axboe <axboe@fb.com>
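Why the field placement matters, in a standalone illustration; the struct and field names here are made up purely to show the 64-bit padding effect:

    struct bad {
            void *a;          /* 8 bytes                   */
            int queue_num;    /* 4 bytes + 4-byte hole     */
            void *b;          /* 8 bytes                   */
            int flags;        /* 4 bytes + 4-byte tail pad */
    };                        /* sizeof == 32              */

    struct good {
            void *a;          /* 8 bytes                   */
            void *b;          /* 8 bytes                   */
            int queue_num;    /* 4 bytes                   */
            int flags;        /* 4 bytes, no holes         */
    };                        /* sizeof == 24              */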
-
- 03 January 2015, 1 commit
-
Submitted by Jens Axboe

Commit b4c6a028 exported the start and unfreeze helpers, but we need the regular blk_mq_freeze_queue() for the loop conversion.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 01 January 2015, 1 commit
-
Submitted by Jens Axboe

If the queue is dying, we can't expect new requests to complete and come in and wake up other tasks waiting for requests. So after we have marked it as dying, wake up everybody currently waiting for a request. Once they wake, they will retry their allocation and fail appropriately due to the state of the queue.

Tested-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 21 December 2014, 2 commits
-
Submitted by Keith Busch

Let drivers prevent entering a queue that isn't available.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Keith Busch

Fix the usage counter when a request could not be allocated.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 09 December 2014, 1 commit
-
Submitted by Ming Lei

When a hardware queue has no mapped software queues, it shouldn't be scheduled; otherwise a WARNING or OOPS can be triggered. The blk_mq_hw_queue_mapped() helper is introduced to fix the problem.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
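A plausible shape for the guard; the helper body is my recollection of the code, not quoted from the patch:

    static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
    {
            return hctx->nr_ctx && hctx->tags;
    }

    /* run/delay paths then bail out early: */
    if (unlikely(!blk_mq_hw_queue_mapped(hctx)))
            return;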
-
- 01 December 2014, 1 commit
-
Submitted by Shaohua Li

We call blk_mq_alloc_tag_set() first and then blk_mq_init_queue(). The requests are allocated in the former function, so the kdump check should be moved there to really save memory.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
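A hedged sketch of the relocated check inside blk_mq_alloc_tag_set(); the exact clamp values are my reading of this era's code, not quoted from the patch:

    #include <linux/crash_dump.h>

    /* under kdump the crash kernel has very little memory: shrink
     * the tag set before any requests are allocated */
    if (is_kdump_kernel()) {
            set->nr_hw_queues = 1;
            set->queue_depth = min(64U, set->queue_depth);
    }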
-
- 24 November 2014, 1 commit
-
Submitted by Christoph Hellwig

Don't duplicate the code that handles the non-CPU-bound case in the caller; do it inside blk_mq_hctx_next_cpu instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 18 November 2014, 2 commits
-
Submitted by Jens Axboe

It's silly to use blk_mq_free_request(), which in turn maps the request to the hardware queue, in places where we already know what the hardware queue is. This saves us an extra mapping of a hardware queue on request completion, if the caller knows this information already.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jens Axboe

Drivers that know they are blk-mq should just use this function instead of calling through blk_put_request().

Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 12 November 2014, 3 commits
-
Submitted by Bart Van Assche

The queuecommand() callback functions in SCSI low-level drivers need to know which hardware context has been selected by the block layer. Since this information is not available in the request structure, and since passing the hctx pointer directly to the queuecommand callback function would require modification of all SCSI LLDs, add a function to the block layer that allows them to query the hardware context index.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
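A hedged LLD-side sketch, assuming the query is the blk_mq_unique_tag() family that encodes the hwq index in the tag's upper bits (my reading of this change); lld_queuecommand() and lld_submit() are hypothetical:

    static int lld_queuecommand(struct Scsi_Host *host,
                                struct scsi_cmnd *cmd)
    {
            u32 unique = blk_mq_unique_tag(cmd->request);
            u16 hwq = blk_mq_unique_tag_to_hwq(unique);
            u16 tag = blk_mq_unique_tag_to_tag(unique);

            /* submit on the hardware ring matching 'hwq', using 'tag'
             * as the per-ring command identifier */
            return lld_submit(host, cmd, hwq, tag);
    }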
-
Submitted by Paolo Bonzini

blk-mq is using preempt_disable/enable in order to ensure that the queue runners are placed on the right CPU. This does not work with the RT patches, because __blk_mq_run_hw_queue takes a non-raw spinlock within the preemption-disabled region. If there is contention on the lock, this violates the rules for preemption-disabled regions.

While this should be easily fixable within the RT patches just by doing migrate_disable/enable, we can do better and document _why_ this particular region runs with disabled preemption. After the previous patch, it is trivial to switch it to get/put_cpu; the RT patches can then change it to get_cpu_light, which lets virtio-blk run under RT kernels.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Clark Williams <williams@redhat.com>
Tested-by: Clark Williams <williams@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
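The resulting pattern, as a hedged sketch; the field and helper names follow this era's blk-mq, but treat the details as assumptions:

    int cpu = get_cpu();    /* disables preemption, returns this CPU */

    if (cpu == hctx->next_cpu)
            __blk_mq_run_hw_queue(hctx);    /* run locally */
    else
            kblockd_schedule_delayed_work_on(hctx->next_cpu,
                                             &hctx->run_work, 0);
    put_cpu();              /* re-enables preemption */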
-
Submitted by Paolo Bonzini

preempt_disable/enable surrounds every call to blk_mq_run_hw_queue, except the one in blk-flush.c. In fact that one is always asynchronous, and it does not need smp_processor_id(). We can do the same for all other calls, avoiding preempt_disable when async is true. This avoids peppering blk-mq.c with preemption-disabled regions.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Clark Williams <williams@redhat.com>
Tested-by: Clark Williams <williams@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 05 November 2014, 1 commit
-
Submitted by Tejun Heo

q->mq_usage_counter is a percpu_ref which is killed and drained when the queue is frozen. On a CPU hotplug event, blk_mq_queue_reinit(), which involves freezing the queue, is invoked on all existing queues. Because percpu_ref killing and draining involve an RCU grace period, doing the above on one queue after another may take a long time if there are many queues on the system.

This patch splits out initiation of freezing and waiting for its completion, and updates blk_mq_queue_reinit_notify() so that the queues are frozen in parallel instead of one after another. Note that freezing and unfreezing are moved from blk_mq_queue_reinit() to blk_mq_queue_reinit_notify().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
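The split in miniature, as a hedged sketch (helper and list names follow this description; treat them as assumptions): initiate every freeze first so the RCU grace periods overlap, then wait for each in turn:

    struct request_queue *q;

    list_for_each_entry(q, &all_q_list, all_q_node)
            blk_mq_freeze_queue_start(q);    /* kill percpu refs */
    list_for_each_entry(q, &all_q_list, all_q_node)
            blk_mq_freeze_queue_wait(q);     /* then drain each */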
-