1. 06 Feb 2015: 3 commits
  2. 29 Jan 2015: 3 commits
  3. 24 Jan 2015: 2 commits
    • blk-mq: add tag allocation policy · 24391c0d
      Shaohua Li authored
      This is the blk-mq part of the tag allocation policy support. The default
      allocation policy is unchanged (though it is not a strict FIFO). The new
      policy is round-robin for libata, but it is a best-effort implementation:
      if multiple tasks are competing, the tags returned will be interleaved
      (which is unavoidable even with !mq, as requests from different tasks can
      be mixed in the queue).
      
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      24391c0d
    • block: support different tag allocation policy · ee1b6f7a
      Shaohua Li authored
      The libata tag allocation uses a round-robin policy. The next patch will
      make libata use the block layer's generic tag allocation, so add a policy
      argument to tag allocation.
      
      Currently there are two policies: FIFO (the default) and round-robin.
      
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      ee1b6f7a
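      
      For illustration, here is a minimal, self-contained sketch of the
      difference between the two policies added by the two tag-allocation
      commits above. The struct and helpers are hypothetical stand-ins, not the
      kernel's blk_queue_tag or blk-mq code; the only point is where the
      free-tag scan starts.
      
        #include <stdbool.h>
      
        /* Illustrative only: 'used[]' marks allocated tags. */
        struct tag_map {
                unsigned int depth;      /* number of tags */
                unsigned int next_tag;   /* round-robin cursor */
                bool used[64];           /* true = tag in use */
        };
      
        /* FIFO-like policy: always scan from the lowest tag. */
        static int alloc_tag_fifo(struct tag_map *m)
        {
                for (unsigned int t = 0; t < m->depth; t++) {
                        if (!m->used[t]) {
                                m->used[t] = true;
                                return t;
                        }
                }
                return -1;
        }
      
        /* Round-robin policy: resume the scan after the last issued tag. */
        static int alloc_tag_rr(struct tag_map *m)
        {
                for (unsigned int i = 0; i < m->depth; i++) {
                        unsigned int t = (m->next_tag + i) % m->depth;
                        if (!m->used[t]) {
                                m->used[t] = true;
                                m->next_tag = t + 1;
                                return t;
                        }
                }
                return -1;
        }
      
      A FIFO-style scan tends to reuse low tag numbers quickly, while the
      round-robin scan cycles through the whole tag space, which is the
      behaviour libata wants.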
  4. 22 Jan 2015: 3 commits
    • block: Remove annoying "unknown partition table" message · bb5c3cdd
      Boaz Harrosh authored
      As Christoph put it:
        Can we just get rid of the warnings?  It's fairly annoying as devices
        without partitions are perfectly fine and very useful.
      
      I too have been seeing this message on every VM boot, for ages, on all my
      devices, and would love to just remove it. For me a partition table is
      only needed by a booting BIOS, grub, and the like.
      
      CC: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      bb5c3cdd
    • block: Add discard flag to blkdev_issue_zeroout() function · d93ba7a5
      Martin K. Petersen authored
      blkdev_issue_zeroout() will zero a given block range. This is done by
      way of explicit writing, thus provisioning or allocating the blocks on
      disk.
      
      There are use cases where the desired behavior is to zero the blocks but
      unprovision them if possible. The blocks must deterministically contain
      zeroes when they are subsequently read back.
      
      This patch adds a flag to blkdev_issue_zeroout() that provides this
      variant. If the discard flag is set and a block device guarantees
      discard_zeroes_data, we will use REQ_DISCARD to clear the block range.
      If the device does not support discard_zeroes_data, or if the discard
      request fails, we fall back first to REQ_WRITE_SAME and then to a
      regular REQ_WRITE.
      
      Also update the callers of blkdev_issue_zeroout() to reflect the new flag
      and make sb_issue_zeroout() prefer the discard approach.
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      d93ba7a5
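      
      The fallback order described above can be illustrated with a short
      sketch. The helper functions are hypothetical placeholders declared here
      only so the example is self-contained (the real code builds REQ_DISCARD,
      REQ_WRITE_SAME and REQ_WRITE bios); only the decision order mirrors the
      commit message.
      
        #include <stdbool.h>
      
        /* Hypothetical stand-ins for the real request paths. */
        extern bool discard_zeroes_data(void *bdev);
        extern int issue_discard(void *bdev, unsigned long long sector,
                                 unsigned long long nr_sects);
        extern int issue_write_same_zero(void *bdev, unsigned long long sector,
                                         unsigned long long nr_sects);
        extern int issue_write_zeroes(void *bdev, unsigned long long sector,
                                      unsigned long long nr_sects);
      
        static int zeroout_sketch(void *bdev, unsigned long long sector,
                                  unsigned long long nr_sects, bool discard)
        {
                /* Unprovision via discard only if zeroing is guaranteed. */
                if (discard && discard_zeroes_data(bdev) &&
                    issue_discard(bdev, sector, nr_sects) == 0)
                        return 0;
      
                /* Next best: a single WRITE SAME command with a zero page. */
                if (issue_write_same_zero(bdev, sector, nr_sects) == 0)
                        return 0;
      
                /* Last resort: plain writes of zero-filled pages. */
                return issue_write_zeroes(bdev, sector, nr_sects);
        }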
    • cfq-iosched: fix incorrect filing of rt async cfqq · c6ce1943
      Jeff Moyer authored
      Hi,
      
      If you can manage to submit an async write as the first async I/O from
      the context of a process with realtime scheduling priority, then a
      cfq_queue is allocated, but filed into the wrong async_cfqq bucket.  It
      ends up in the best effort array, but actually has realtime I/O
      scheduling priority set in cfqq->ioprio.
      
      The reason is that cfq_get_queue assumes the default scheduling class and
      priority when there is no information present (i.e. when the async cfqq
      is created):
      
      static struct cfq_queue *
      cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
      	      struct bio *bio, gfp_t gfp_mask)
      {
      	const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
      	const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
      
      cic->ioprio starts out as 0, which is "invalid".  So, class of 0
      (IOPRIO_CLASS_NONE) is passed to cfq_async_queue_prio like so:
      
      		async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);
      
      static struct cfq_queue **
      cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio)
      {
              switch (ioprio_class) {
              case IOPRIO_CLASS_RT:
                      return &cfqd->async_cfqq[0][ioprio];
              case IOPRIO_CLASS_NONE:
                      ioprio = IOPRIO_NORM;
                      /* fall through */
              case IOPRIO_CLASS_BE:
                      return &cfqd->async_cfqq[1][ioprio];
              case IOPRIO_CLASS_IDLE:
                      return &cfqd->async_idle_cfqq;
              default:
                      BUG();
              }
      }
      
      Here, instead of returning a class mapped from the process' scheduling
      priority, we get back the bucket associated with IOPRIO_CLASS_BE.
      
      Now, there is no queue allocated there yet, so we create it:
      
      		cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio, gfp_mask);
      
      That function ends up doing this:
      
      			cfq_init_cfqq(cfqd, cfqq, current->pid, is_sync);
      			cfq_init_prio_data(cfqq, cic);
      
      cfq_init_cfqq() marks the priority as having changed.  Then,
      cfq_init_prio_data() does this:
      
      	ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
      	switch (ioprio_class) {
      	default:
      		printk(KERN_ERR "cfq: bad prio %x\n", ioprio_class);
      	case IOPRIO_CLASS_NONE:
      		/*
      		 * no prio set, inherit CPU scheduling settings
      		 */
      		cfqq->ioprio = task_nice_ioprio(tsk);
      		cfqq->ioprio_class = task_nice_ioclass(tsk);
      		break;
      
      So we basically have two code paths that treat IOPRIO_CLASS_NONE
      differently, which results in an RT async cfqq filed into a best effort
      bucket.
      
      Attached is a patch which fixes the problem.  I'm not sure how to make
      it cleaner.  Suggestions would be welcome.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Tested-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
      c6ce1943
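      
      One possible shape of the fix, sketched against the cfq_get_queue()
      excerpt quoted above (an illustration, not necessarily the committed
      patch): resolve IOPRIO_CLASS_NONE to the task's CPU-scheduling derived
      class before choosing the async bucket, so that cfq_async_queue_prio()
      and cfq_init_prio_data() agree.
      
        int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
        int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
      
        if (!is_sync && ioprio_class == IOPRIO_CLASS_NONE) {
                /*
                 * No explicit I/O priority set yet: derive class and level
                 * from the task's CPU scheduling settings, the same way
                 * cfq_init_prio_data() does, so an RT task lands in the RT
                 * async bucket.
                 */
                ioprio_class = task_nice_ioclass(current);
                ioprio = task_nice_ioprio(current);
        }
      
        async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);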
  5. 14 Jan 2015: 1 commit
    • blk-mq: fix false negative out-of-tags condition · 0bf36498
      Jens Axboe authored
      The blk-mq tagging tries to maintain some locality between CPUs and
      the tags issued. The tags are split into groups of words, and the
      words may not be fully populated. When searching for a new free tag,
      blk-mq may look at partial words, hence it passes an offset/size
      to find_next_zero_bit(). However, it does that incorrectly: the size must
      always be the full number of tags in that word, otherwise we will
      potentially miss some near the end.
      
      Another issue is when __bt_get() goes from one word set to the next.
      It bumps the index, but not the last_tag associated with the
      previous index. Bump that to be in the range of the new word.
      
      Finally, clean up __bt_get() and __bt_get_word() a bit and get
      rid of the goto in there, and the unnecessary 'wrap' variable.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      0bf36498
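      
      A simplified sketch of the corrected per-word scan (not the actual
      __bt_get_word() code; find_next_zero_bit() and test_and_set_bit() are the
      usual kernel bitops): the offset tells find_next_zero_bit() where to
      start looking, but the size argument must remain the full depth of the
      word.
      
        static int get_tag_from_word(unsigned long *word, unsigned int depth,
                                     unsigned int last_tag)
        {
                unsigned int tag;
      
                do {
                        /*
                         * Start at last_tag, but always pass the full word
                         * depth as the size; passing a smaller size would
                         * hide free tags near the end of the word.
                         */
                        tag = find_next_zero_bit(word, depth, last_tag);
                        if (tag >= depth)
                                return -1;
                } while (test_and_set_bit(tag, word));
      
                return tag;
        }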
  6. 03 Jan 2015: 1 commit
  7. 01 Jan 2015: 1 commit
  8. 21 Dec 2014: 2 commits
  9. 15 Dec 2014: 1 commit
  10. 12 Dec 2014: 1 commit
    • bio: modify __bio_add_page() to accept pages that don't start a new segment · fcbf6a08
      Maurizio Lombardi authored
      The original behaviour is to refuse to add a new page if the maximum
      number of segments has been reached, regardless of the fact the page we
      are going to add can be merged into the last segment or not.
      
      Unfortunately, when the system runs under heavy memory fragmentation
      conditions, a driver may try to add multiple pages to the last segment.
      The original code won't accept them and EBUSY will be reported to
      userspace.
      
      This patch modifies the function so it refuses to add a page only in case
      the latter starts a new segment and the maximum number of segments has
      already been reached.
      
      The bug can be easily reproduced with the st driver:
      
      1) set CONFIG_SCSI_MPT2SAS_MAX_SGE or CONFIG_SCSI_MPT3SAS_MAX_SGE to 16
      2) modprobe st buffer_kbs=1024
      3) # dd if=/dev/zero of=/dev/st0 bs=1M count=10
         dd: error writing `/dev/st0': Device or resource busy
      Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Cc: Jet Chen <jet.chen@intel.com>
      Cc: Tomas Henzl <thenzl@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      fcbf6a08
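      
      The relaxed check can be sketched as follows. The types and helper below
      are simplified stand-ins, not the kernel's struct bio/bio_vec, and the
      real __bio_add_page() also has to honour queue limits such as the maximum
      segment size; the point is only that a page is rejected solely when it
      would start a new segment and the segment budget is already exhausted.
      
        #include <stdbool.h>
      
        struct seg { unsigned long phys; unsigned int len; };
        struct sbio {
                struct seg vecs[256];
                unsigned int vcnt;      /* pages added so far */
                unsigned int segs;      /* physical segments so far */
                unsigned int max_segs;  /* hardware segment limit */
        };
      
        /* Is the new page physically contiguous with the last one added? */
        static bool merges_into_last_segment(const struct sbio *b,
                                             unsigned long phys)
        {
                const struct seg *prev;
      
                if (!b->vcnt)
                        return false;
                prev = &b->vecs[b->vcnt - 1];
                return prev->phys + prev->len == phys;
        }
      
        static int add_page(struct sbio *b, unsigned long phys, unsigned int len)
        {
                bool merge = merges_into_last_segment(b, phys);
      
                /* Only refuse when a *new* segment would exceed the limit. */
                if (!merge && b->segs >= b->max_segs)
                        return -1;
      
                b->vecs[b->vcnt].phys = phys;
                b->vecs[b->vcnt].len = len;
                b->vcnt++;
                if (!merge)
                        b->segs++;
                return 0;
        }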
  11. 10 Dec 2014: 6 commits
    • blk-mq: Fix uninitialized kobject at CPU hotplugging · 06a41a99
      Takashi Iwai authored
      When a CPU is hotplugged, the current blk-mq spews a warning like:
      
        kobject '(null)' (ffffe8ffffc8b5d8): tried to add an uninitialized object, something is seriously wrong.
        CPU: 1 PID: 1386 Comm: systemd-udevd Not tainted 3.18.0-rc7-2.g088d59b-default #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_171129-lamiak 04/01/2014
         0000000000000000 0000000000000002 ffffffff81605f07 ffffe8ffffc8b5d8
         ffffffff8132c7a0 ffff88023341d370 0000000000000020 ffff8800bb05bd58
         ffff8800bb05bd08 000000000000a0a0 000000003f441940 0000000000000007
        Call Trace:
         [<ffffffff81005306>] dump_trace+0x86/0x330
         [<ffffffff81005644>] show_stack_log_lvl+0x94/0x170
         [<ffffffff81006d21>] show_stack+0x21/0x50
         [<ffffffff81605f07>] dump_stack+0x41/0x51
         [<ffffffff8132c7a0>] kobject_add+0xa0/0xb0
         [<ffffffff8130aee1>] blk_mq_register_hctx+0x91/0xb0
         [<ffffffff8130b82e>] blk_mq_sysfs_register+0x3e/0x60
         [<ffffffff81309298>] blk_mq_queue_reinit_notify+0xf8/0x190
         [<ffffffff8107cfdc>] notifier_call_chain+0x4c/0x70
         [<ffffffff8105fd23>] cpu_notify+0x23/0x50
         [<ffffffff81060037>] _cpu_up+0x157/0x170
         [<ffffffff810600d9>] cpu_up+0x89/0xb0
         [<ffffffff815fa5b5>] cpu_subsys_online+0x35/0x80
         [<ffffffff814323cd>] device_online+0x5d/0xa0
         [<ffffffff81432485>] online_store+0x75/0x80
         [<ffffffff81236a5a>] kernfs_fop_write+0xda/0x150
         [<ffffffff811c5532>] vfs_write+0xb2/0x1f0
         [<ffffffff811c5f42>] SyS_write+0x42/0xb0
         [<ffffffff8160c4ed>] system_call_fastpath+0x16/0x1b
         [<00007f0132fb24e0>] 0x7f0132fb24e0
      
      This is indeed because of an uninitialized kobject for blk_mq_ctx.
      The blk_mq_ctx kobjects are initialized in blk_mq_sysfs_init(), but it
      loops over hctx_for_each_ctx(), i.e. it initializes them only for
      online CPUs.  Thus, when a CPU is hotplugged, the ctx for the newly
      onlined CPU is registered without initialization.
      
      This patch fixes the issue by initializing all the ctx kobjects
      belonging to each queue.
      
      Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=908794
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      06a41a99
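      
      Roughly, the fix initializes the ctx kobjects for every software queue of
      the request queue up front instead of only those reachable through the
      online hardware-queue map. A sketch, assuming the queue_for_each_ctx()
      iterator and kobject type name of that era (not the exact committed
      code):
      
        static void blk_mq_sysfs_init_sketch(struct request_queue *q)
        {
                struct blk_mq_ctx *ctx;
                int i;
      
                /* Init every per-CPU ctx kobject, including offline CPUs. */
                queue_for_each_ctx(q, ctx, i)
                        kobject_init(&ctx->kobj, &blk_mq_ctx_ktype);
        }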
    • blk-mq: Use all available hardware queues · 959f5f5b
      Bart Van Assche authored
      Suppose that a system has two CPU sockets, three cores per socket,
      that it does not support hyperthreading and that four hardware
      queues are provided by a block driver. With the current algorithm
      this will lead to the following assignment of CPU cores to hardware
      queues:
      
        HWQ 0: 0 1
        HWQ 1: 2 3
        HWQ 2: 4 5
        HWQ 3: (none)
      
      This patch changes the queue assignment into:
      
        HWQ 0: 0 1
        HWQ 1: 2
        HWQ 2: 3 4
        HWQ 3: 5
      
      In other words, this patch has the following three effects:
      - All four hardware queues are used instead of only three.
      - CPU cores are spread more evenly over hardware queues. For the
        above example the range of the number of CPU cores associated
        with a single HWQ is reduced from [0..2] to [1..2].
      - If the number of HWQs is a multiple of the number of CPU sockets
        it is now guaranteed that all CPU cores associated with a single
        HWQ reside on the same CPU socket.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Alexander Gordeev <agordeev@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      959f5f5b
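      
      The stated properties can be reproduced with a small self-contained
      computation (illustrative arithmetic only, not the kernel's actual CPU
      map code; it may differ from the example above in which queues receive
      the extra CPU, but every queue is used and the per-queue counts differ by
      at most one):
      
        #include <stdio.h>
      
        int main(void)
        {
                unsigned int n_cpus = 6, n_queues = 4;
      
                for (unsigned int q = 0; q < n_queues; q++) {
                        unsigned int first = q * n_cpus / n_queues;
                        unsigned int last = (q + 1) * n_cpus / n_queues;
      
                        printf("HWQ %u:", q);
                        for (unsigned int cpu = first; cpu < last; cpu++)
                                printf(" %u", cpu);
                        printf("\n");
                }
                return 0;
        }
      
      For the six-core, four-queue example this prints one or two CPUs per
      hardware queue and leaves no queue empty.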
    • blk-mq: Micro-optimize bt_get() · 52f7eb94
      Bart Van Assche authored
      Remove a superfluous finish_wait() call. Convert the two bt_wait_ptr()
      calls into a single call.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Robert Elliott <elliott@hp.com>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Alexander Gordeev <agordeev@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      52f7eb94
    • blk-mq: Fix a race between bt_clear_tag() and bt_get() · c38d185d
      Bart Van Assche authored
      What we need is the following two guarantees:
      * Any thread that observes the effect of the test_and_set_bit() by
        __bt_get_word() also observes the preceding addition of 'current'
        to the appropriate wait list. This is guaranteed by the semantics
        of the spin_unlock() operation performed by prepare_to_wait().
        Hence the conversion of test_and_set_bit_lock() into
        test_and_set_bit().
      * The wait lists are examined by bt_clear() after the tag bit has
        been cleared. clear_bit_unlock() guarantees that any thread that
        observes that the bit has been cleared also observes the store
        operations preceding clear_bit_unlock(). However,
        clear_bit_unlock() does not prevent the wait lists from being examined
        before the tag bit is cleared. Hence the addition of a memory barrier
        between clear_bit() and the wait list examination.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Robert Elliott <elliott@hp.com>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Alexander Gordeev <agordeev@redhat.com>
      Cc: <stable@vger.kernel.org> # v3.13+
      Signed-off-by: Jens Axboe <axboe@fb.com>
      c38d185d
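      
      The release side described in the second bullet can be sketched as
      follows (simplified; the wait-queue selection in the real tag-clear path
      is more involved):
      
        static void clear_tag_sketch(unsigned long *word, unsigned int tag,
                                     wait_queue_head_t *wq)
        {
                clear_bit(tag, word);
      
                /*
                 * Make the cleared bit visible before examining the wait
                 * list; this pairs with the ordering the allocation side
                 * gets from prepare_to_wait(). Without it, a waiter that
                 * just added itself could be missed.
                 */
                smp_mb__after_atomic();
      
                if (waitqueue_active(wq))
                        wake_up(wq);
        }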
    • blk-mq: Avoid that __bt_get_word() wraps multiple times · 9e98e9d7
      Bart Van Assche authored
      If __bt_get_word() is called with last_tag != 0, the following sequence
      causes a second wrap-around: the first find_next_zero_bit() fails; after
      the wrap-around, the test_and_set_bit() call fails but
      find_next_zero_bit() succeeds; the next test_and_set_bit() call fails and
      the subsequent find_next_zero_bit() does not find a zero bit. Avoid this
      by introducing an additional local variable.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Robert Elliott <elliott@hp.com>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Alexander Gordeev <agordeev@redhat.com>
      Cc: <stable@vger.kernel.org> # v3.13+
      Signed-off-by: Jens Axboe <axboe@fb.com>
      9e98e9d7
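      
      The single-wrap behaviour can be sketched like this (a simplified
      stand-in for __bt_get_word(), using an extra local variable as the commit
      describes):
      
        static int get_tag_wrap_once(unsigned long *word, unsigned int depth,
                                     unsigned int last_tag)
        {
                unsigned int org_last_tag = last_tag;
                unsigned int tag;
      
                for (;;) {
                        tag = find_next_zero_bit(word, depth, last_tag);
                        if (tag >= depth) {
                                /* Nothing free at or after last_tag. */
                                if (!org_last_tag)
                                        return -1;
                                /* Wrap to the start of the word, once only. */
                                org_last_tag = 0;
                                last_tag = 0;
                                continue;
                        }
                        if (!test_and_set_bit(tag, word))
                                return tag;
                        last_tag = tag + 1;
                }
        }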
    • blk-mq: Fix a use-after-free · 45a9c9d9
      Bart Van Assche authored
      blk-mq users are allowed to free the memory request_queue.tag_set
      points at after blk_cleanup_queue() has finished but before
      blk_release_queue() has started. This can happen, e.g., in the SCSI
      core: the SCSI core embeds the tag_set structure in a SCSI host
      structure, and the SCSI host structure is freed by
      scsi_host_dev_release(). That function is called after
      blk_cleanup_queue() has finished but can be called before
      blk_release_queue().
      
      This means that it is not safe to access request_queue.tag_set from
      inside blk_release_queue(). Hence remove the blk_sync_queue() call
      from blk_release_queue(). This call is not necessary - outstanding
      requests must have finished before blk_release_queue() is
      called. Additionally, move the blk_mq_free_queue() call from
      blk_release_queue() to blk_cleanup_queue() to avoid struct
      request_queue.tag_set being accessed after it has been freed.
      
      This patch avoids that the following kernel oops can be triggered
      when deleting a SCSI host for which scsi-mq was enabled:
      
      Call Trace:
       [<ffffffff8109a7c4>] lock_acquire+0xc4/0x270
       [<ffffffff814ce111>] mutex_lock_nested+0x61/0x380
       [<ffffffff812575f0>] blk_mq_free_queue+0x30/0x180
       [<ffffffff8124d654>] blk_release_queue+0x84/0xd0
       [<ffffffff8126c29b>] kobject_cleanup+0x7b/0x1a0
       [<ffffffff8126c140>] kobject_put+0x30/0x70
       [<ffffffff81245895>] blk_put_queue+0x15/0x20
       [<ffffffff8125c409>] disk_release+0x99/0xd0
       [<ffffffff8133d056>] device_release+0x36/0xb0
       [<ffffffff8126c29b>] kobject_cleanup+0x7b/0x1a0
       [<ffffffff8126c140>] kobject_put+0x30/0x70
       [<ffffffff8125a78a>] put_disk+0x1a/0x20
       [<ffffffff811d4cb5>] __blkdev_put+0x135/0x1b0
       [<ffffffff811d56a0>] blkdev_put+0x50/0x160
       [<ffffffff81199eb4>] kill_block_super+0x44/0x70
       [<ffffffff8119a2a4>] deactivate_locked_super+0x44/0x60
       [<ffffffff8119a87e>] deactivate_super+0x4e/0x70
       [<ffffffff811b9833>] cleanup_mnt+0x43/0x90
       [<ffffffff811b98d2>] __cleanup_mnt+0x12/0x20
       [<ffffffff8107252c>] task_work_run+0xac/0xe0
       [<ffffffff81002c01>] do_notify_resume+0x61/0xa0
       [<ffffffff814d2c58>] int_signal+0x12/0x17
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Robert Elliott <elliott@hp.com>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Alexander Gordeev <agordeev@redhat.com>
      Cc: <stable@vger.kernel.org> # v3.13+
      Signed-off-by: Jens Axboe <axboe@fb.com>
      45a9c9d9
  12. 09 Dec 2014: 1 commit
  13. 08 Dec 2014: 2 commits
  14. 04 Dec 2014: 1 commit
  15. 02 Dec 2014: 1 commit
    • block: fix regression where bio_integrity_process uses wrong bio_vec iterator · 594416a7
      Darrick J. Wong authored
      bio integrity handling is broken on a system with LVM layered atop a
      DIF/DIX SCSI drive because device mapper clones the bio, modifies the
      clone, and sends the clone to the lower layers for processing.
      However, the clone bio has bi_vcnt == 0, which means that when the sd
      driver calls bio_integrity_process to attach DIX data, the
      for_each_segment_all() call (which uses bi_vcnt) returns immediately
      and random garbage is sent to the disk on a disk write.  The disk of
      course returns an error.
      
      Therefore, teach bio_integrity_process() to use bio_for_each_segment()
      to iterate the bio_vecs, since the per-bio iterator tracks which
      bio_vecs are associated with that particular bio.  The integrity
      handling code is effectively part of the "driver" (it's not the bio
      owner), so it must use the correct iterator function.
      
      v2: Fix a compiler warning about abandoned local variables.  This
      patch supersedes "block: bio_integrity_process uses wrong bio_vec
      iterator".  Patch applies against 3.18-rc6.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      594416a7
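      
      The difference between the two iterators can be sketched as follows,
      assuming the 3.18-era bio accessors; process_segment() is a hypothetical
      placeholder for whatever the caller does with each segment:
      
        /* Walk only the bio_vecs covered by this bio's own iterator. */
        static void integrity_walk_sketch(struct bio *bio)
        {
                struct bio_vec bv;
                struct bvec_iter iter;
      
                /*
                 * bio_for_each_segment() honours bio->bi_iter, so a cloned
                 * bio with bi_vcnt == 0 is still walked correctly.
                 * bio_for_each_segment_all() walks bi_io_vec[0..bi_vcnt) and
                 * is only valid for the bio's owner.
                 */
                bio_for_each_segment(bv, bio, iter)
                        process_segment(bv.bv_page, bv.bv_offset, bv.bv_len);
        }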
  16. 01 Dec 2014: 1 commit
  17. 25 Nov 2014: 3 commits
  18. 24 Nov 2014: 2 commits
  19. 20 Nov 2014: 1 commit
  20. 18 Nov 2014: 2 commits
  21. 12 Nov 2014: 2 commits