- 28 Jan 2017, 1 commit

Committed by Jens Axboe
Instead of letting the caller check this and handle the details of inserting a flush request, put the logic in the scheduler insertion function. This fixes direct flush insertion outside of the usual make_request_fn calls, like from dm via blk_insert_cloned_request().

Signed-off-by: Jens Axboe <axboe@fb.com>
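
A hedged sketch of the insert-side check; helper and flag names follow the blk-mq code of this period, not the verbatim patch:

    void blk_mq_sched_insert_request(struct request *rq, bool at_head,
                                     bool run_queue, bool async)
    {
            struct request_queue *q = rq->q;
            struct blk_mq_ctx *ctx = rq->mq_ctx;
            struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);

            /* flush/FUA requests go through the flush state machine
             * here, so blk_insert_cloned_request() and friends need
             * no special casing of their own */
            if (rq->cmd_flags & (REQ_PREFLUSH | REQ_FUA)) {
                    blk_insert_flush(rq);
                    blk_mq_run_hw_queue(hctx, true);
                    return;
            }

            /* ... otherwise hand the request to the elevator as usual ... */
    }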
-
- 27 Jan 2017, 2 commits

Committed by Jens Axboe
If we have both multiple hardware queues and a shared tag map between devices, we need to ensure that we propagate the hardware queue restart bit higher up. This is because we can get into a situation where we don't have any IO pending on a hardware queue, yet we fail getting a tag to start new IO. If that happens, it's not enough to mark the hardware queue as needing a restart; we need to bubble that up to the higher-level queue as well.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>
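
A hedged sketch of the propagation, following the BLK_MQ_S_SCHED_RESTART / QUEUE_FLAG_RESTART scheme this patch introduces (details illustrative):

    static void blk_mq_sched_mark_restart(struct blk_mq_hw_ctx *hctx)
    {
            if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) {
                    set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
                    if (hctx->flags & BLK_MQ_F_TAG_SHARED) {
                            struct request_queue *q = hctx->queue;

                            /* shared tags: a freed tag may have to restart
                             * a different hardware queue, so flag the whole
                             * queue, not just this hctx */
                            if (!test_bit(QUEUE_FLAG_RESTART, &q->queue_flags))
                                    set_bit(QUEUE_FLAG_RESTART, &q->queue_flags);
                    }
            }
    }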
-
Committed by Omar Sandoval
In preparation for putting blk-mq debugging information in debugfs, create a directory tree mirroring the one in sysfs:

    # tree -d /sys/kernel/debug/block
    /sys/kernel/debug/block
    |-- nvme0n1
    |   `-- mq
    |       |-- 0
    |       |   `-- cpu0
    |       |-- 1
    |       |   `-- cpu1
    |       |-- 2
    |       |   `-- cpu2
    |       `-- 3
    |           `-- cpu3
    `-- vda
        `-- mq
            `-- 0
                |-- cpu0
                |-- cpu1
                |-- cpu2
                `-- cpu3

Also add the scaffolding for the actual files that will go in here, either under the hardware queue or software queue directories.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 18 Jan 2017, 4 commits

Committed by Jens Axboe
This adds a set of hooks that intercepts the blk-mq path of allocating/inserting/issuing/completing requests, allowing us to develop a scheduler within that framework. We reuse the existing elevator scheduler API on the registration side, but augment that with scheduler flagging support for the blk-mq interface, and with a separate set of ops hooks for MQ devices. We split driver and scheduler tags, so we can run the scheduling independently of device queue depth.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
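
The MQ-side ops table looks roughly like this; a trimmed sketch of the 4.11-era struct elevator_mq_ops, with the field set and signatures treated as illustrative:

    struct elevator_mq_ops {
            int (*init_sched)(struct request_queue *, struct elevator_type *);
            void (*exit_sched)(struct elevator_queue *);

            bool (*allow_merge)(struct request_queue *, struct request *,
                                struct bio *);
            void (*insert_requests)(struct blk_mq_hw_ctx *,
                                    struct list_head *, bool at_head);
            struct request *(*dispatch_request)(struct blk_mq_hw_ctx *);
            bool (*has_work)(struct blk_mq_hw_ctx *);
            void (*completed_request)(struct blk_mq_hw_ctx *,
                                      struct request *);
            void (*started_request)(struct request *);
            void (*requeue_request)(struct request *);
    };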
-
Committed by Jens Axboe
Prep patch for adding an extra tag map for scheduler requests.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Committed by Jens Axboe
This is in preparation for having another tag set available. Cleanup the parameters, and allow passing in of tags for blk_mq_put_tag().

Signed-off-by: Jens Axboe <axboe@fb.com>
[hch: even more cleanups]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
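
After the cleanup, the tag release takes the tag map explicitly instead of deriving it from the hctx, so a second (scheduler) map can be passed later. A sketch of the resulting function, close to the 4.10 code but hedged:

    void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
                        struct blk_mq_ctx *ctx, unsigned int tag)
    {
            if (tag >= tags->nr_reserved_tags) {
                    const int real_tag = tag - tags->nr_reserved_tags;

                    sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
            } else {
                    sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
            }
    }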
-
Committed by Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
- 10 Dec 2016, 1 commit

Committed by Jens Axboe
Takes a list of requests, and dispatches it. Moves any residual requests to the dispatch list.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
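
Callers collect requests on a local list and hand it over; a hedged usage sketch (the ctx-draining helper was still file-local at this point):

    LIST_HEAD(rq_list);

    /* drain the software queues into rq_list ... */
    blk_mq_flush_busy_ctxs(hctx, &rq_list);

    /* dispatch as much as the driver accepts; whatever the driver
     * couldn't take is moved to hctx->dispatch for a later retry */
    blk_mq_dispatch_rq_list(hctx, &rq_list);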
-
- 11 Nov 2016, 1 commit

Committed by Jens Axboe
For legacy block, we simply track them in the request queue. For blk-mq, we track them on a per-sw queue basis, which we can then sum up through the hardware queues and finally to a per device state. The stats are tracked in, roughly, 0.1s interval windows. Add sysfs files to display the stats. The feature is off by default, to avoid any extra overhead. In-kernel users of it can turn it on by setting QUEUE_FLAG_STATS in the queue flags. We currently don't turn it on if someone just reads any of the stats files, that is something we could add as well.

Signed-off-by: Jens Axboe <axboe@fb.com>
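
In-kernel users opt in by setting the queue flag; a minimal sketch for this era, where queue flags were modified under the queue lock:

    spin_lock_irq(q->queue_lock);
    queue_flag_set(QUEUE_FLAG_STATS, q);
    spin_unlock_irq(q->queue_lock);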
-
- 09 Nov 2016, 1 commit

Committed by Christoph Hellwig
This will allow SCSI to have a single blk_mq_ops structure that either lets the LLDD map the queues to PCIe MSI-X vectors or uses the default.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
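
With map_queues as an op, SCSI can keep one ops table and decide at runtime; a hedged sketch with assumed driver entry-point names:

    static struct blk_mq_ops scsi_mq_ops = {
            .queue_rq       = scsi_queue_rq,        /* assumed LLDD callbacks */
            .complete       = scsi_softirq_done,
            /* NULL means "use the default blk_mq_map_queues() spread";
             * an LLDD with MSI-X knowledge supplies its own callback */
            .map_queues     = scsi_map_queues,
    };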
-
- 03 Nov 2016, 1 commit

Committed by Bart Van Assche
Multiple functions test the BLK_MQ_S_STOPPED bit, so introduce a helper function that performs this test.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
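
The helper itself is a one-liner:

    static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
    {
            return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
    }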
-
- 22 Sep 2016, 1 commit

Committed by Thomas Gleixner
Replace the blk-mq notifier list management with the multi-instance facility in the CPU hotplug state machine.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-block@vger.kernel.org
Cc: rt@linutronix.de
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
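
With the multi-instance facility, each hardware queue hangs an instance off one shared hotplug state instead of registering its own notifier; roughly (names per the hotplug API, exact callbacks hedged):

    /* once, at init time: register the state and its callback */
    cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
                            blk_mq_hctx_notify_dead);

    /* per hardware queue: add an instance to the state */
    cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead);

    /* and remove it again on teardown */
    cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead);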
-
- 17 Sep 2016, 2 commits

Committed by Omar Sandoval
Allocating your own per-cpu allocation hint separately makes for an awkward API. Instead, allocate the per-cpu hint as part of the struct sbitmap_queue. There's no point in a struct sbitmap_queue without the cache, but you can still use a bare struct sbitmap.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Omar Sandoval
This is a generally useful data structure, so make it available to anyone else who might want to use it. It's also a nice cleanup separating the allocation logic from the rest of the tag handling logic. The code is behind a new Kconfig option, CONFIG_SBITMAP, which is only selected by CONFIG_BLOCK for now. This should be a complete noop functionality-wise.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
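
Basic usage of the split-out library, hedged against the sbitmap API as it stood shortly after these two patches:

    #include <linux/sbitmap.h>

    struct sbitmap_queue sbq;
    int tag;

    /* 128 bits, default shift (-1), no round-robin allocation */
    if (sbitmap_queue_init_node(&sbq, 128, -1, false, GFP_KERNEL,
                                NUMA_NO_NODE))
            return -ENOMEM;

    tag = __sbitmap_queue_get(&sbq);        /* -1 when exhausted */
    if (tag >= 0)
            sbitmap_queue_clear(&sbq, tag, raw_smp_processor_id());

    sbitmap_queue_free(&sbq);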
-
- 15 Sep 2016, 2 commits

Committed by Christoph Hellwig
This allows drivers to specify their own queue mapping by overriding the setup-time function that builds the mq_map. This can be used, for example, to build the map based on the MSI-X vector mapping provided by the core interrupt layer for PCI devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
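
For PCI devices this pairs with a helper that copies the interrupt core's MSI-X affinity into mq_map; a hedged driver-side sketch with nvme names assumed:

    static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
    {
            struct nvme_dev *dev = set->driver_data;

            /* let the PCI/IRQ core's vector-to-CPU affinity drive the map */
            return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev));
    }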
-
Committed by Christoph Hellwig
All drivers use the default, so provide an inline version of it. If we ever need another queue mapping we can add an optional method back, although supporting it will also require major changes to the queue setup code. This provides better code generation, and better debuggability as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
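
The inline default essentially just indexes the precomputed map:

    static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
                                                         int cpu)
    {
            return q->queue_hw_ctx[q->mq_map[cpu]];
    }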
-
- 10 Feb 2016, 1 commit

Committed by Keith Busch
The hardware's provided queue count may change at runtime with resource provisioning. This patch allows a block driver to alter the number of h/w queues available when its resource count changes.

The main part is a new blk-mq API to request a new number of h/w queues for a given live tag set. The new API freezes all queues using that set, then adjusts the allocated count prior to remapping these to CPUs. The bulk of the rest just shifts where h/w contexts and all their artifacts are allocated and freed.

The number of max h/w contexts is capped to the number of possible CPUs since there is no use for more than that. As such, all pre-allocated memory for pointers needs to account for the max possible rather than the initial number of queues. A side effect of this is that blk-mq will proceed successfully as long as it can allocate at least one h/w context. Previously it would fail request queue initialization if less than the requested number was allocated.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
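
A driver calls the new API when its resource count changes; a hedged sketch ("mydrv" names assumed):

    static void mydrv_resize_queues(struct mydrv_dev *dev, int nr_io_queues)
    {
            /* freezes all queues on the set, adjusts nr_hw_queues,
             * then remaps the hardware contexts to CPUs */
            blk_mq_update_nr_hw_queues(&dev->tag_set, nr_io_queues);
    }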
-
- 02 Dec 2015, 1 commit

Committed by Christoph Hellwig
We already have the reserved flag, and a nowait flag awkwardly encoded as a gfp_t. Add a real flags argument to make the scheme more extensible and allow for a nicer calling convention.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
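
Callers now pass explicit BLK_MQ_REQ_* flags instead of abusing gfp flags; a sketch per the API of this era:

    struct request *rq;

    /* fail fast instead of sleeping for a tag; reserved tags are
     * requested with BLK_MQ_REQ_RESERVED */
    rq = blk_mq_alloc_request(q, WRITE, BLK_MQ_REQ_NOWAIT);
    if (IS_ERR(rq))
            return PTR_ERR(rq);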
-
- 12 Nov 2015, 1 commit

Committed by Jens Axboe
It's no longer used outside of blk-mq core.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 10 Oct 2015, 1 commit

Committed by Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 30 Sep 2015, 1 commit

Committed by Akinobu Mita
Notifier callbacks for the CPU_ONLINE action can be run on a CPU other than the one that was just onlined. So it is possible for the process running on the just-onlined CPU to insert a request and run the hw queue before the new mapping is established by blk_mq_queue_reinit_notify().

This can cause a problem when the CPU has just been onlined for the first time since the request queue was initialized. At this point, ctx->index_hw for the CPU, which is the index in hctx->ctxs[] for this ctx, is still zero before blk_mq_queue_reinit_notify() is called by the notifier callbacks for the CPU_ONLINE action.

For example, there is a single hw queue (hctx) and two CPU queues (ctx0 for CPU0, and ctx1 for CPU1). Now CPU1 is just onlined and a request is inserted into ctx1->rq_list, and bit0 is set in the pending bitmap, as ctx1->index_hw is still zero. Then, while running the hw queue, flush_busy_ctxs() finds bit0 set in the pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list. But hctx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is ignored.

Fix it by ensuring that the new mapping is established before the onlined CPU starts running.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 30 Jan 2015, 1 commit

Committed by Ming Lei
The kobject memory inside blk-mq hctx/ctx shouldn't have been freed before the kobject is released, because the driver core can access it freely before its release. We can't do that in the ctx/hctx/mq_kobj release handlers because they can be run before blk_cleanup_queue(). Given mq_kobj shouldn't have been introduced, this patch simply moves mq's release into blk_release_queue().

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 01 Jan 2015, 1 commit

Committed by Jens Axboe
If the queue is dying, we can't expect new requests to complete and wake up other tasks waiting for requests. So after we have marked it as dying, wake up everybody currently waiting for a request. Once they wake, they will retry their allocation and fail appropriately due to the state of the queue.

Tested-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
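
The wake-up pass ends up as a walk over all mapped hardware queues; a hedged sketch close to the eventual blk_mq_wake_waiters():

    void blk_mq_wake_waiters(struct request_queue *q)
    {
            struct blk_mq_hw_ctx *hctx;
            unsigned int i;

            queue_for_each_hw_ctx(q, hctx, i)
                    if (blk_mq_hw_queue_mapped(hctx))
                            blk_mq_tag_wakeup_all(hctx->tags);
    }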
-
- 09 Dec 2014, 1 commit

Committed by Ming Lei
When one hardware queue has no mapped software queues, it shouldn't have been scheduled. Otherwise a WARNING or OOPS can be triggered. The blk_mq_hw_queue_mapped() helper is introduced to fix the problem.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
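
The helper boils down to checking that the hctx has software queues and tags:

    static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
    {
            return hctx->nr_ctx && hctx->tags;
    }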
-
- 26 Sep 2014, 2 commits

Committed by Ming Lei
These two temporary functions are introduced to hold flush initialization and de-initialization, so that we can introduce the 'flush queue' more easily in the following patch. Once the 'flush queue' and its allocation/free functions are ready, they will be removed for the sake of code readability.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Ming Lei
It is reasonable to allocate flush req in blk_mq_init_flush().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 23 Sep 2014, 1 commit

Committed by Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>

Moved blk_mq_rq_timed_out() definition to the private blk-mq.h header.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 02 Jul 2014, 1 commit

Committed by Tejun Heo
blk_mq freezing is entangled with generic bypassing, which bypasses blkcg and the io scheduler and lets IO requests fall through the block layer to the drivers in FIFO order. This allows forward progress on IOs with the advanced features disabled so that those features can be configured or altered without worrying about stalling IO, which may lead to deadlock through memory allocation.

However, generic bypassing doesn't quite fit blk-mq. blk-mq currently doesn't make use of blkcg or ioscheds, and it maps bypassing to freezing, which blocks request processing and drains all the in-flight ones. This causes problems as bypassing assumes that request processing is online. blk-mq works around this by conditionally allowing request processing for the problem case - during queue initialization.

Another oddity is that, except for during queue cleanup, bypassing started on the generic side prevents blk-mq from processing new requests but doesn't drain the in-flight ones. This shouldn't break anything but again highlights that something isn't quite right here.

The root cause is conflating blk-mq freezing and generic bypassing, which are two different mechanisms. The only intersecting purpose that they serve is during queue cleanup. Let's properly separate blk-mq freezing from generic bypassing and simply use it where necessary.

* request_queue->mq_freeze_depth is added and blk_mq_[un]freeze_queue() now operate on this counter instead of ->bypass_depth. The replacement for QUEUE_FLAG_BYPASS isn't added but the counter is tested directly. This will be further updated by later changes.

* blk_mq_drain_queue() is dropped and the "__" prefix is dropped from blk_mq_freeze_queue(). The queue cleanup path now calls blk_mq_freeze_queue() directly.

* blk_queue_enter()'s fast path condition is simplified to simply check @q->mq_freeze_depth. Previously, the condition was

    !blk_queue_dying(q) && (!blk_queue_bypass(q) || !blk_queue_init_done(q))

  mq_freeze_depth is incremented right after dying is set, and the blk_queue_init_done() exception isn't necessary as blk-mq doesn't start frozen, which only leaves the blk_queue_bypass() test, which can be replaced by the @q->mq_freeze_depth test.

This change simplifies the code and reduces confusion in the area.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 04 Jun 2014, 2 commits

Committed by Ming Lei
blk_mq_put_ctx() has to be called before io_schedule() in bt_get(). This patch fixes the problem by taking a similar approach to the one used for percpu_ida allocation in this situation.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
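
The fix follows the percpu_ida pattern: drop the per-CPU context before sleeping, then reacquire and retry after waking. A hedged sketch of the relevant lines inside bt_get():

    /* no tag available: don't sleep holding the sw queue context */
    blk_mq_put_ctx(ctx);

    io_schedule();

    /* we may wake up on a different CPU, so remap ctx and hctx */
    ctx = blk_mq_get_ctx(q);
    hctx = q->mq_ops->map_queue(q, ctx->cpu);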
-
Committed by Ming Lei
The blk-mq tag code needs these helpers.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 30 May 2014, 1 commit

Committed by Jens Axboe
Currently blk-mq registers all the hardware queues in sysfs, regardless of whether it uses them (e.g. they have CPU mappings) or not. The unused hardware queues lack the cpuX/ directories, and the other sysfs entries (like active, pending, etc) are all zeroes. Change this so that sysfs correctly reflects the current mappings of the hardware queues.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 28 May 2014, 1 commit

Committed by Jens Axboe
Drivers currently have to figure this out on their own, and they are missing the information to do it properly. The ones that did attempt to do it did it wrong. So just pass in the suggested node directly to the alloc function.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 22 May 2014, 1 commit

Committed by Jens Axboe
Prepare this for the next patch which adds more smarts in the plugging logic, so that we can save some memory.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 21 May 2014, 1 commit

Committed by Jens Axboe
For request_fn based devices, the block layer exports a 'nr_requests' file through sysfs to allow adjusting the queue depth on the fly. Currently this returns -EINVAL for blk-mq, since it's not wired up. Wire this up for blk-mq, so that it now also allows dynamic adjustment of the allowed queue depth for any given block device managed by blk-mq.

Signed-off-by: Jens Axboe <axboe@fb.com>
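
Writes to /sys/block/<dev>/queue/nr_requests now land in a blk-mq resize helper along these lines (a hedged sketch, not the verbatim patch):

    int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
    {
            struct blk_mq_hw_ctx *hctx;
            int i, ret = 0;

            if (!q->mq_ops || !nr)
                    return -EINVAL;

            queue_for_each_hw_ctx(q, hctx, i) {
                    /* shrink or grow the tag depth of each hardware queue */
                    ret = blk_mq_tag_update_depth(hctx->tags, nr);
                    if (ret)
                            break;
            }

            if (!ret)
                    q->nr_requests = nr;
            return ret;
    }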
-
- 20 May 2014, 1 commit

Committed by Jens Axboe
We will use it for the pending list in blk-mq core as well.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 09 May 2014, 1 commit

Committed by Jens Axboe
blk-mq currently uses percpu_ida for tag allocation. But that only works well if the ratio between tag space and number of CPUs is sufficiently high. For most devices and systems, that is not the case. The end result is that we either only utilize the tag space partially, or we end up attempting to fully exhaust it and run into lots of lock contention with stealing between CPUs. This is not optimal.

This new tagging scheme is a hybrid bitmap allocator. It uses two tricks to both be SMP friendly and allow full exhaustion of the space:

1) We cache the last allocated (or freed) tag on a per blk-mq software context basis. This allows us to limit the space we have to search. The key element here is not caching it in the shared tag structure, otherwise we end up dirtying more shared cache lines on each allocate/free operation.

2) The tag space is split into cache line sized groups, and each context will start off randomly in that space. Even up to full utilization of the space, this divides the tag users efficiently into cache line groups, avoiding dirtying the same one both between allocators and between allocator and freer.

This scheme shows drastically better behaviour, both on small tag spaces and on large ones. It has been tested extensively to show better performance for all the cases blk-mq cares about.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 25 Apr 2014, 1 commit

Committed by Christoph Hellwig
The blk-mq code is using its own version of the I/O completion affinity tunables, which causes a few issues:

- the rq_affinity sysfs file doesn't work for blk-mq devices, even if it still is present, thus breaking existing tuning setups.

- the rq_affinity = 1 mode, which is the default for legacy request based drivers, isn't implemented at all.

- blk-mq drivers don't implement any completion affinity with the default flag settings.

This patch removes the blk-mq ipi_redirect flag and sysfs file, as well as the internal BLK_MQ_F_SHOULD_IPI flag, and replaces it with code that respects the queue-wide rq_affinity flags and also implements the rq_affinity = 1 mode. This means I/O completion affinity can now only be tuned block-queue wide instead of per context, which seems more sensible to me anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 24 Apr 2014, 1 commit

Committed by Jens Axboe
If a requeue event races with a timeout, we can get into a situation where we attempt to complete a request from the timeout handler when it's no longer started. This causes a crash. So have the timeout handler check that REQ_ATOM_STARTED is still set on the request - if not, we ignore the event, since the request has by then been marked as complete. As a consequence, we need to ensure to clear REQ_ATOM_COMPLETE in blk_mq_start_request(), to maintain proper request state.

Signed-off-by: Jens Axboe <axboe@fb.com>
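
The two halves of the fix, sketched (hedged, not the verbatim patch):

    /* in the timeout handler: bail on requests that raced with requeue */
    if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
            return;         /* not started anymore, ignore the event */

    /* in blk_mq_start_request(): restore consistent request state */
    set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
    clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);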
-
- 16 Apr 2014, 2 commits

Committed by Christoph Hellwig
Add a new blk_mq_tag_set structure that gets set up before we initialize the queue. A single blk_mq_tag_set structure can be shared by multiple queues.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Modular export of blk_mq_{alloc,free}_tagset added by me.

Signed-off-by: Jens Axboe <axboe@fb.com>
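
Typical driver setup after this change, hedged ("mydrv" names assumed):

    struct blk_mq_tag_set *set = &dev->tag_set;
    struct request_queue *q;

    memset(set, 0, sizeof(*set));
    set->ops          = &mydrv_mq_ops;
    set->nr_hw_queues = 4;
    set->queue_depth  = 64;
    set->numa_node    = NUMA_NO_NODE;
    set->cmd_size     = sizeof(struct mydrv_cmd);  /* per-request payload */

    if (blk_mq_alloc_tag_set(set))
            return -ENOMEM;

    /* several request queues may share the same set */
    q = blk_mq_init_queue(set);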
-
Committed by Christoph Hellwig
Drivers shouldn't have to care about the block layer setting aside a request to implement the flush state machine. We already override the mq context and tag to make it more transparent, but so far haven't dealt with the driver private data in the request. Make sure to override this as well, and while we're at it, add a proper helper sitting in blk-mq.c that implements the full impersonation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-