- 02 June 2015, 1 commit
-
Submitted by Keith Busch

Storage controllers may expose multiple block devices that share hardware resources managed by blk-mq. This patch enhances the shared tags so a low-level driver can access the shared resources not tied to the unshared h/w contexts. This way the LLD can dynamically add and delete disks and request queues without having to track all the request_queue hctx's in order to iterate outstanding tags.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
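A minimal sketch of what this enables for an LLD, assuming the iterator helper introduced here is blk_mq_all_tag_busy_iter() with a (request, data, reserved) callback; treat the exact signatures, and the lld_* names, as assumptions:

    static void lld_cancel_rq(struct request *req, void *data, bool reserved)
    {
            /* fail each outstanding request, e.g. on controller removal */
            blk_mq_end_request(req, -EIO);
    }

    static void lld_abort_all(struct blk_mq_tag_set *set)
    {
            int i;

            /* walk the shared tag sets directly -- no need to track
             * every request_queue's hctx's */
            for (i = 0; i < set->nr_hw_queues; i++)
                    blk_mq_all_tag_busy_iter(set->tags[i],
                                             lld_cancel_rq, NULL);
    }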
-
- 19 May 2015, 1 commit
-
Submitted by Christoph Hellwig

lockdep gets unhappy about us not disabling irqs when using the queue_lock around it. Instead of trying to fix that up, just switch to an atomic_t and get rid of the lock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
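A hedged sketch of the shape of this conversion; the counter here (mq_freeze_depth) and the surrounding freeze logic are my reading of this era's code, not quoted from the patch:

    void blk_mq_freeze_queue_start(struct request_queue *q)
    {
            /* first freezer (0 -> 1) kills the percpu usage counter;
             * no queue_lock taken, so nothing for lockdep to flag */
            if (atomic_inc_return(&q->mq_freeze_depth) == 1) {
                    percpu_ref_kill(&q->mq_usage_counter);
                    blk_mq_run_hw_queues(q, false);
            }
    }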
-
- 09 May 2015, 4 commits
-
Submitted by Shaohua Li

The last patch made plugging work for the multiple-queue case. However, it only works for a single disk, because it assumes there is only one request in the plug list. If a task is accessing multiple disks, e.g. MD/DM, the assumption is wrong. Let blk_attempt_plug_merge() record the request from the same queue.

V2: use a NULL parameter in the !mq case. Fix a bug. Add comments in blk_attempt_plug_merge to make it (hopefully) less confusing.

Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Shaohua Li

Plugging is still helpful for workloads with IO merging, but it can be harmful otherwise, especially with multiple hardware queues, as there is (supposedly) no lock contention in this case and plugging can introduce latency. For multiple queues, we do a limited plug, e.g. plug only if there is a request merge. If a request doesn't merge with the following request, the request will be dispatched immediately.

V2: check blk_queue_nomerges() as suggested by Jeff.

Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
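An illustrative sketch of the limited-plug idea, not the literal patch; mq_limited_plug() is a hypothetical helper, and the merge test shown is a stand-in for the real plug-merge path:

    static void mq_limited_plug(struct blk_plug *plug, struct request *rq)
    {
            struct request *last = NULL;

            if (!list_empty(&plug->mq_list))
                    last = list_entry(plug->mq_list.prev,
                                      struct request, queuelist);

            /* the new request did not merge with the last plugged one:
             * dispatch the plugged IO now rather than add latency */
            if (last && !blk_rq_merge_ok(last, rq->bio))
                    blk_flush_plug_list(plug, false);

            list_add_tail(&rq->queuelist, &plug->mq_list);
    }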
-
Submitted by Shaohua Li

If we directly issue a request and it fails, we use blk_mq_merge_queue_io(). But we have already assigned the bio to a request in blk_mq_bio_to_request(), so blk_mq_merge_queue_io() shouldn't run blk_mq_bio_to_request() again.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jeff Moyer

The following appears in blk_sq_make_request:

    /*
     * If we have multiple hardware queues, just go directly to
     * one of those for sync IO.
     */

We clearly don't have multiple hardware queues here! This comment was introduced with commit 07068d5b (blk-mq: split make request handler for multi and single queue):

    We want slightly different behavior from them:
    - On single queue devices, we currently use the per-process plug
      for deferred IO and for merging.
    - On multi queue devices, we don't use the per-process plug, but
      we want to go straight to hardware for SYNC IO.

The old code had this:

    use_plug = !is_flush_fua && ((q->nr_hw_queues == 1) || !is_sync);

and that was converted to:

    use_plug = !is_flush_fua && !is_sync;

which is not equivalent. For the single queue case, the second half of the && expression is always true. So what I think was actually intended follows (and this more closely matches what is done in blk_queue_bio).

V2: delete the 'likely', which should not be a big deal

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
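A hedged reconstruction of the intended condition (implied by the message, not quoted from the patch): in blk_sq_make_request nr_hw_queues is 1, so the old expression reduces to plugging everything except flush/FUA, regardless of is_sync:

    /* single hw queue: plug all non-flush/FUA IO, sync or not */
    use_plug = !is_flush_fua;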
-
- 17 April 2015, 1 commit
-
Submitted by Jens Axboe

Commit 889fa31f was a bit too eager in reducing the loop count, so we ended up missing queues in some configurations. Ensure that our division rounds up, so that's not the case.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 889fa31f ("blk-mq: reduce unnecessary software queue looping")
Signed-off-by: Jens Axboe <axboe@fb.com>
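The arithmetic at issue, as a minimal sketch (the map_words() name and its parameters are assumptions): plain integer division truncates, silently dropping a partial bitmap word, while DIV_ROUND_UP keeps it:

    #include <linux/kernel.h>

    static unsigned int map_words(unsigned int nr_ctx,
                                  unsigned int bits_per_word)
    {
            /* 7 / 8 == 0 would skip the queues in the partial word;
             * DIV_ROUND_UP(7, 8) == (7 + 8 - 1) / 8 == 1 keeps them */
            return DIV_ROUND_UP(nr_ctx, bits_per_word);
    }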
-
- 16 April 2015, 1 commit
-
Submitted by Chong Yuan

In flush_busy_ctxs() and blk_mq_hctx_has_pending(), regardless of how many ctxs are assigned to one hctx, they will all loop hctx->ctx_map.map_size times. Here hctx->ctx_map.map_size is a constant, ALIGN(nr_cpu_ids, 8) / 8. flush_busy_ctxs() in particular is in a hot code path, and the extra iterations are unnecessary. Change ->map_size to contain the number of actually mapped software queues, so we only loop for as many iterations as we have to.

Also remove the cpumask setting and nr_ctx count in blk_mq_init_cpu_queues(), since they are all redone in blk_mq_map_swqueue().

Signed-off-by: Chong Yuan <chong.yuan@memblaze.com>
Reviewed-by: Wenbo Wang <wenbo.wang@memblaze.com>

Updated by me for formatting and commenting.

Signed-off-by: Jens Axboe <axboe@fb.com>
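A hedged sketch of the loop this tightens; the ctx_map field layout is my recollection of the 2015 blk-mq code, so treat it as an assumption:

    static bool hctx_has_pending(struct blk_mq_hw_ctx *hctx)
    {
            unsigned int i;

            /* map_size now counts only the words that hold mapped
             * software queues, not the ALIGN(nr_cpu_ids, 8) / 8 worst
             * case, so idle words are never scanned */
            for (i = 0; i < hctx->ctx_map.map_size; i++)
                    if (hctx->ctx_map.map[i].word)
                            return true;
            return false;
    }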
-
- 12 April 2015, 1 commit
-
Submitted by Linus Torvalds

Jan Engelhardt reports a strange oops with an invalid ->sense_buffer pointer in scsi_init_cmd_errh() with the blk-mq code. The sense_buffer pointer should have been initialized by the call to scsi_init_request() from blk_mq_init_rq_map(), but there seems to be some non-repeatable memory corruption.

This patch makes sure we initialize the whole struct request allocation (and the associated 'struct scsi_cmnd' for the SCSI case) to zero, by using __GFP_ZERO in the allocation. The old code initialized a couple of individual fields, leaving the rest undefined (although many of them are then initialized in later phases, like blk_mq_rq_ctx_init() etc.).

It's not entirely clear why this matters, but it's the right thing to do regardless, and with 4.0 imminent this is the defensive "let's just make sure everything is initialized properly" patch.

Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
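The defensive change in miniature, as a hedged sketch; the real patch touches the rq_map page allocation, and the alloc_rq_pages() name here is an assumption:

    static void *alloc_rq_pages(size_t size)
    {
            /* __GFP_ZERO guarantees every struct request -- and the
             * scsi_cmnd that trails it in the SCSI case -- starts
             * fully zeroed, instead of relying on later
             * field-by-field initialization */
            return (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
                                            get_order(size));
    }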
-
- 30 March 2015, 2 commits
-
Submitted by Wei Fang

Don't assign ->rq_timeout twice.

Signed-off-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Xiaoguang Wang

At the beginning of blk_mq_alloc_tag_set(), we have already checked whether 'set->nr_hw_queues' is zero, so remove the redundant check here.

Signed-off-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 13 March 2015, 4 commits
-
Submitted by Keith Busch

Return -EBUSY if we're unable to enter a queue immediately when allocating a blk-mq request without __GFP_WAIT.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
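A caller-side sketch under this era's API, where blk_mq_alloc_request() took a gfp mask; the error handling shown is illustrative:

    struct request *rq;

    /* no __GFP_WAIT: a frozen or dying queue now yields -EBUSY
     * instead of blocking the caller */
    rq = blk_mq_alloc_request(q, WRITE, GFP_ATOMIC, false);
    if (IS_ERR(rq))
            return PTR_ERR(rq);    /* may be -EBUSY: back off, retry */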
-
Submitted by Mike Snitzer

Rename blk_mq_run_queues to blk_mq_run_hw_queues, add an async argument, and export it. DM's suspend support must be able to run the queue without starting stopped hw queues.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
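The resulting export as I understand it, with a hedged usage line; md->queue stands in for a driver's queue pointer:

    void blk_mq_run_hw_queues(struct request_queue *q, bool async);

    /* e.g. DM kicking the queues asynchronously, leaving explicitly
     * stopped hw queues untouched: */
    blk_mq_run_hw_queues(md->queue, true);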
-
Submitted by Mike Snitzer

Add a variant of blk_mq_init_queue that allows a previously allocated queue to be initialized. blk_mq_init_allocated_queue models blk_init_allocated_queue -- which was also created for DM's use.

DM's approach to device creation requires a placeholder request_queue to be allocated for use with alloc_dev(), but the decision about what type of request_queue will ultimately be created is deferred until all component devices referenced in the DM table are processed to determine the table type (request-based, blk-mq request-based, or bio-based).

Also, because of DM's late finalization of the request_queue type, the call to blk_mq_register_disk() doesn't happen during alloc_dev(). We must export blk_mq_register_disk() so that DM can backfill the 'mq' dir once the blk-mq queue is fully allocated.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
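A hedged sketch of the two-step bring-up this enables for DM; md is a hypothetical driver context, and the exact signatures are my reading of this era's API:

    /* at alloc_dev() time: placeholder queue, type still unknown */
    q = blk_alloc_queue_node(GFP_KERNEL, numa_node);

    /* later, once the table turns out to be blk-mq request-based: */
    q = blk_mq_init_allocated_queue(&md->tag_set, q);
    if (IS_ERR(q))
            goto bad;
    blk_mq_register_disk(md->disk);    /* backfill the sysfs 'mq' dir */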
-
Submitted by Mike Snitzer

If percpu_ref_init() fails, the allocated q and hctxs must get cleaned up; jumping to 'err_map' doesn't allow that to happen.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 11 February 2015, 1 commit
-
Submitted by Jens Axboe

We no longer use it outside of blk-mq.c, so we can make it static and stop exporting it. Additionally, kill the 'async' argument, as there's only one user of it.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 30 January 2015, 2 commits
-
Submitted by Ming Lei

The kobject memory inside blk-mq hctx/ctx shouldn't have been freed before the kobject is released, because the driver core can access it freely before its release. We can't do the freeing in the ctx/hctx/mq_kobj release handlers, because they can run before blk_cleanup_queue(). Given mq_kobj shouldn't have been introduced in the first place, this patch simply moves mq's release into blk_release_queue().

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Ming Lei

This reverts commit 76d697d1. Commit 76d697d1 causes a general protection fault, reported by Bart Van Assche:

    https://lkml.org/lkml/2015/1/28/334

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 24 January 2015, 1 commit
-
Submitted by Shaohua Li

This is the blk-mq part of support for tag allocation policy. The default allocation policy isn't changed (though it's not a strict FIFO). The new policy, round-robin, is for libata. It is a best-effort implementation, though: if multiple tasks are competing, the tags returned will be mixed (which is unavoidable even in the !mq case, as requests from different tasks can be mixed in the queue).

Cc: Jens Axboe <axboe@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
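A self-contained sketch of best-effort round-robin tag allocation; the hypothetical rr_alloc_tag() is illustrative, not the kernel's implementation. Each search starts just past the last tag handed out and wraps around:

    static unsigned int last_tag;

    static int rr_alloc_tag(unsigned long *bitmap, unsigned int depth)
    {
            unsigned int i, tag;

            for (i = 0; i < depth; i++) {
                    tag = (last_tag + 1 + i) % depth;
                    if (!test_and_set_bit(tag, bitmap)) {
                            last_tag = tag;
                            return tag;
                    }
            }
            /* competing tasks may still interleave their tags */
            return -1;
    }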
-
- 21 January 2015, 1 commit
-
Submitted by Ming Lei

The kobject memory shouldn't have been freed before the kobject is released, because the driver core can access it freely before its release. This patch frees hctx in its release callback. For ctx, they share one single per-cpu variable which is associated with the request queue, so free ctx in q->mq_kobj's release handler.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com> (fix ctx kobjects)
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
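The pattern being enforced, as a hedged sketch: an object embedding a kobject is freed only from the kobject's release callback, never directly, because the driver core may still hold references. Structure names follow blk-mq; the details are assumptions:

    static void blk_mq_hw_sysfs_release(struct kobject *kobj)
    {
            struct blk_mq_hw_ctx *hctx;

            hctx = container_of(kobj, struct blk_mq_hw_ctx, kobj);
            kfree(hctx);    /* safe here: the last reference is gone */
    }

    static struct kobj_type blk_mq_hw_ktype = {
            .release = blk_mq_hw_sysfs_release,
    };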
-
- 08 January 2015, 7 commits
-
Submitted by Keith Busch

Requests that haven't been started prior to a queue dying can be ended in error without waiting for them to start and time out.

Signed-off-by: Keith Busch <keith.busch@intel.com>

Added a code comment to explain why this is done.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Keith Busch

Some types of requests may be started that are not guaranteed to ever complete. This adds a request flag that a driver can use to mark a request as such.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jens Axboe

Add a helper function a driver can use to abort requeued requests, in case any are pending when h/w queues are being removed.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Keith Busch

Kicking requeued requests will start h/w queues in a work_queue, which may override the driver's request to keep them temporarily stopped. This patch exports a method to cancel the q->requeue_work so a driver can be assured stopped h/w queues won't be started up before it is ready.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Keith Busch

Drivers can iterate over all allocated request tags, but their callback needs a way to know if the driver started the request in the first place.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Keith Busch

When the queue is set to dying, wake up tasks that are waiting on the frozen queue so they realize it is dying and abandon their requests.

Signed-off-by: Keith Busch <keith.busch@intel.com>

Modified by me to add a code comment on the need for the wakeup.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jens Axboe

We store it in the tag set, so we don't need it in the hardware queue. While removing cmd_size, place ->queue_num further down to avoid a hole on 64-bit archs. It's not used in any fast paths, so we can safely move it.

Signed-off-by: Jens Axboe <axboe@fb.com>
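Why the field placement matters, in a standalone illustration; the struct and field names here are made up purely to show the 64-bit padding effect:

    struct bad {
            void *a;          /* 8 bytes                   */
            int queue_num;    /* 4 bytes + 4-byte hole     */
            void *b;          /* 8 bytes                   */
            int flags;        /* 4 bytes + 4-byte tail pad */
    };                        /* sizeof == 32              */

    struct good {
            void *a;          /* 8 bytes                   */
            void *b;          /* 8 bytes                   */
            int queue_num;    /* 4 bytes                   */
            int flags;        /* 4 bytes, no holes         */
    };                        /* sizeof == 24              */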
-
- 03 January 2015, 1 commit
-
Submitted by Jens Axboe

Commit b4c6a028 exported the start and unfreeze helpers, but we need the regular blk_mq_freeze_queue() for the loop conversion.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 01 January 2015, 1 commit
-
Submitted by Jens Axboe

If the queue is dying, we can't expect new requests to complete and come in and wake up other tasks waiting for requests. So after we have marked it as dying, wake up everybody currently waiting for a request. Once they wake, they will retry their allocation and fail appropriately due to the state of the queue.

Tested-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 21 December 2014, 2 commits
-
Submitted by Keith Busch

Let drivers prevent entering a queue that isn't available.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Keith Busch

Fix the usage counter when a request could not be allocated.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 09 December 2014, 1 commit
-
Submitted by Ming Lei

When a hardware queue has no mapped software queues, it shouldn't be scheduled; otherwise a WARNING or OOPS can be triggered. The blk_mq_hw_queue_mapped() helper is introduced to fix the problem.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
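A plausible shape for the guard; the helper body is my recollection of the code, not quoted from the patch:

    static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
    {
            return hctx->nr_ctx && hctx->tags;
    }

    /* run/delay paths then bail out early: */
    if (unlikely(!blk_mq_hw_queue_mapped(hctx)))
            return;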
-
- 01 December 2014, 1 commit
-
Submitted by Shaohua Li

We call blk_mq_alloc_tag_set() first and then blk_mq_init_queue(). The requests are allocated in the former function, so the kdump check should be moved there to really save memory.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
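A hedged sketch of the relocated check inside blk_mq_alloc_tag_set(); the exact clamp values are my reading of this era's code, not quoted from the patch:

    #include <linux/crash_dump.h>

    /* under kdump the crash kernel has very little memory: shrink
     * the tag set before any requests are allocated */
    if (is_kdump_kernel()) {
            set->nr_hw_queues = 1;
            set->queue_depth = min(64U, set->queue_depth);
    }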
-
- 24 November 2014, 1 commit
-
Submitted by Christoph Hellwig

Don't duplicate the code that handles the non-CPU-bound case in the caller; do it inside blk_mq_hctx_next_cpu instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 18 November 2014, 2 commits
-
Submitted by Jens Axboe

It's silly to use blk_mq_free_request(), which in turn maps the request to the hardware queue, in places where we already know what the hardware queue is. This saves us an extra mapping of a hardware queue on request completion, if the caller knows this information already.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jens Axboe

Drivers that know they are blk-mq should just use this function instead of calling through blk_put_request().

Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 12 November 2014, 3 commits
-
Submitted by Bart Van Assche

The queuecommand() callback functions in SCSI low-level drivers need to know which hardware context has been selected by the block layer. Since this information is not available in the request structure, and since passing the hctx pointer directly to the queuecommand callback function would require modification of all SCSI LLDs, add a function to the block layer that allows them to query the hardware context index.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
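A hedged LLD-side sketch, assuming the query is the blk_mq_unique_tag() family that encodes the hwq index in the tag's upper bits (my reading of this change); lld_queuecommand() and lld_submit() are hypothetical:

    static int lld_queuecommand(struct Scsi_Host *host,
                                struct scsi_cmnd *cmd)
    {
            u32 unique = blk_mq_unique_tag(cmd->request);
            u16 hwq = blk_mq_unique_tag_to_hwq(unique);
            u16 tag = blk_mq_unique_tag_to_tag(unique);

            /* submit on the hardware ring matching 'hwq', using 'tag'
             * as the per-ring command identifier */
            return lld_submit(host, cmd, hwq, tag);
    }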
-
Submitted by Paolo Bonzini

blk-mq is using preempt_disable/enable in order to ensure that the queue runners are placed on the right CPU. This does not work with the RT patches, because __blk_mq_run_hw_queue takes a non-raw spinlock within the preemption-disabled region. If there is contention on the lock, this violates the rules for preemption-disabled regions.

While this should be easily fixable within the RT patches just by doing migrate_disable/enable, we can do better and document _why_ this particular region runs with disabled preemption. After the previous patch, it is trivial to switch it to get/put_cpu; the RT patches can then change it to get_cpu_light, which lets virtio-blk run under RT kernels.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Clark Williams <williams@redhat.com>
Tested-by: Clark Williams <williams@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
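The resulting pattern, as a hedged sketch; the field and helper names follow this era's blk-mq, but treat the details as assumptions:

    int cpu = get_cpu();    /* disables preemption, returns this CPU */

    if (cpu == hctx->next_cpu)
            __blk_mq_run_hw_queue(hctx);    /* run locally */
    else
            kblockd_schedule_delayed_work_on(hctx->next_cpu,
                                             &hctx->run_work, 0);
    put_cpu();              /* re-enables preemption */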
-
Submitted by Paolo Bonzini

preempt_disable/enable surrounds every call to blk_mq_run_hw_queue, except the one in blk-flush.c. In fact that one is always asynchronous, and it does not need smp_processor_id(). We can do the same for all other calls, avoiding preempt_disable when async is true. This avoids peppering blk-mq.c with preemption-disabled regions.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Clark Williams <williams@redhat.com>
Tested-by: Clark Williams <williams@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 05 November 2014, 1 commit
-
Submitted by Tejun Heo

q->mq_usage_counter is a percpu_ref which is killed and drained when the queue is frozen. On a CPU hotplug event, blk_mq_queue_reinit(), which involves freezing the queue, is invoked on all existing queues. Because percpu_ref killing and draining involve an RCU grace period, doing the above on one queue after another may take a long time if there are many queues on the system.

This patch splits out initiation of freezing and waiting for its completion, and updates blk_mq_queue_reinit_notify() so that the queues are frozen in parallel instead of one after another. Note that freezing and unfreezing are moved from blk_mq_queue_reinit() to blk_mq_queue_reinit_notify().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
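The split in miniature, as a hedged sketch (helper and list names follow this description; treat them as assumptions): initiate every freeze first so the RCU grace periods overlap, then wait for each in turn:

    struct request_queue *q;

    list_for_each_entry(q, &all_q_list, all_q_node)
            blk_mq_freeze_queue_start(q);    /* kill percpu refs */
    list_for_each_entry(q, &all_q_list, all_q_node)
            blk_mq_freeze_queue_wait(q);     /* then drain each */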
-