Commits · 06a41a99d13d8e919e9a00a4849e6b85ae492592 · openeuler / raspberrypi-kernel

10 Dec, 2014 6 commits

blk-mq: Fix uninitialized kobject at CPU hotplugging · 06a41a99

Authored by Takashi Iwai on Dec 10, 2014

When a CPU is hotplugged, the current blk-mq spews a warning like:

  kobject '(null)' (ffffe8ffffc8b5d8): tried to add an uninitialized object, something is seriously wrong.
  CPU: 1 PID: 1386 Comm: systemd-udevd Not tainted 3.18.0-rc7-2.g088d59b-default #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_171129-lamiak 04/01/2014
   0000000000000000 0000000000000002 ffffffff81605f07 ffffe8ffffc8b5d8
   ffffffff8132c7a0 ffff88023341d370 0000000000000020 ffff8800bb05bd58
   ffff8800bb05bd08 000000000000a0a0 000000003f441940 0000000000000007
  Call Trace:
   [<ffffffff81005306>] dump_trace+0x86/0x330
   [<ffffffff81005644>] show_stack_log_lvl+0x94/0x170
   [<ffffffff81006d21>] show_stack+0x21/0x50
   [<ffffffff81605f07>] dump_stack+0x41/0x51
   [<ffffffff8132c7a0>] kobject_add+0xa0/0xb0
   [<ffffffff8130aee1>] blk_mq_register_hctx+0x91/0xb0
   [<ffffffff8130b82e>] blk_mq_sysfs_register+0x3e/0x60
   [<ffffffff81309298>] blk_mq_queue_reinit_notify+0xf8/0x190
   [<ffffffff8107cfdc>] notifier_call_chain+0x4c/0x70
   [<ffffffff8105fd23>] cpu_notify+0x23/0x50
   [<ffffffff81060037>] _cpu_up+0x157/0x170
   [<ffffffff810600d9>] cpu_up+0x89/0xb0
   [<ffffffff815fa5b5>] cpu_subsys_online+0x35/0x80
   [<ffffffff814323cd>] device_online+0x5d/0xa0
   [<ffffffff81432485>] online_store+0x75/0x80
   [<ffffffff81236a5a>] kernfs_fop_write+0xda/0x150
   [<ffffffff811c5532>] vfs_write+0xb2/0x1f0
   [<ffffffff811c5f42>] SyS_write+0x42/0xb0
   [<ffffffff8160c4ed>] system_call_fastpath+0x16/0x1b
   [<00007f0132fb24e0>] 0x7f0132fb24e0

This is indeed because of an uninitialized kobject for blk_mq_ctx.
The blk_mq_ctx kobjects are initialized in blk_mq_sysfs_init(), but that
function loops over hctx_for_each_ctx(), i.e. it initializes only the
ctxs of online CPUs.  Thus, when a CPU is hotplugged, the ctx for the
newly onlined CPU is registered without initialization.

This patch fixes the issue by initializing all ctx kobjects
belonging to each queue.

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=908794
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
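The failure mode described in this commit can be illustrated outside the kernel. The following is a minimal user-space sketch, not the blk-mq code: `struct ctx`, `init_online_only()`, `init_all()` and `hotplug_register()` are hypothetical stand-ins for the per-CPU contexts and their kobject setup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU context; 'initialized' models kobject setup. */
struct ctx {
    bool online;
    bool initialized;
};

/* Models the behaviour described as buggy: only online CPUs are set up. */
static void init_online_only(struct ctx *ctxs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ctxs[i].online)
            ctxs[i].initialized = true;
}

/* Models the fix: every context is set up, online or not. */
static void init_all(struct ctx *ctxs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        ctxs[i].initialized = true;
}

/* Brings a CPU online; returns false if its context was never set up. */
static bool hotplug_register(struct ctx *c)
{
    assert(c != NULL);
    c->online = true;
    return c->initialized;
}
```

With `init_online_only()`, registering a context whose CPU was offline at setup time reports failure, which corresponds to the warning in the trace; with `init_all()` the same registration succeeds.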

blk-mq: Use all available hardware queues · 959f5f5b

Authored by Bart Van Assche on Dec 09, 2014

Suppose that a system has two CPU sockets, three cores per socket,
that it does not support hyperthreading and that four hardware
queues are provided by a block driver. With the current algorithm
this will lead to the following assignment of CPU cores to hardware
queues:

  HWQ 0: 0 1
  HWQ 1: 2 3
  HWQ 2: 4 5
  HWQ 3: (none)

This patch changes the queue assignment into:

  HWQ 0: 0 1
  HWQ 1: 2
  HWQ 2: 3 4
  HWQ 3: 5

In other words, this patch has the following three effects:
- All four hardware queues are used instead of only three.
- CPU cores are spread more evenly over hardware queues. For the
  above example the range of the number of CPU cores associated
  with a single HWQ is reduced from [0..2] to [1..2].
- If the number of HWQs is a multiple of the number of CPU sockets,
  it is now guaranteed that all CPU cores associated with a single
  HWQ reside on the same CPU socket.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
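The two assignment tables in this commit message can be reproduced with simple integer arithmetic. The sketch below is an assumption chosen to match those tables for 6 CPUs and 4 queues; the function names are hypothetical and it ignores everything beyond the plain arithmetic, so it should not be read as the kernel's mapping code.

```c
#include <assert.h>

/* Fixed-size groups of CPUs: reproduces the first table (HWQ 3 unused). */
static unsigned int old_cpu_to_queue(unsigned int nr_cpus,
                                     unsigned int nr_queues,
                                     unsigned int cpu)
{
    assert(nr_cpus > 0 && nr_queues > 0 && cpu < nr_cpus);
    /* Group size is nr_cpus / nr_queues rounded up. */
    return cpu / ((nr_cpus + nr_queues - 1) / nr_queues);
}

/* Proportional mapping: reproduces the second table (all queues used). */
static unsigned int new_cpu_to_queue(unsigned int nr_cpus,
                                     unsigned int nr_queues,
                                     unsigned int cpu)
{
    assert(nr_cpus > 0 && nr_queues > 0 && cpu < nr_cpus);
    return cpu * nr_queues / nr_cpus;
}
```

For 6 CPUs and 4 queues the first function yields queues 0, 0, 1, 1, 2, 2 and the second yields 0, 0, 1, 2, 2, 3, matching the listings.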

blk-mq: Micro-optimize bt_get() · 52f7eb94

Authored by Bart Van Assche on Dec 09, 2014

Remove a superfluous finish_wait() call. Convert the two bt_wait_ptr()
calls into a single call.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robert Elliott <elliott@hp.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: Fix a race between bt_clear_tag() and bt_get() · c38d185d

Authored by Bart Van Assche on Dec 09, 2014

What we need are the following two guarantees:
* Any thread that observes the effect of the test_and_set_bit() by
  __bt_get_word() also observes the preceding addition of 'current'
  to the appropriate wait list. This is guaranteed by the semantics
  of the spin_unlock() operation performed by prepare_to_wait().
  Hence the conversion of test_and_set_bit_lock() into
  test_and_set_bit().
* The wait lists are examined by bt_clear_tag() after the tag bit has
  been cleared. clear_bit_unlock() guarantees that any thread that
  observes that the bit has been cleared also observes the store
  operations preceding clear_bit_unlock(). However,
  clear_bit_unlock() does not prevent the wait lists from being
  examined before the tag bit is cleared. Hence the addition of a
  memory barrier between clear_bit() and the wait list examination.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robert Elliott <elliott@hp.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: <stable@vger.kernel.org> # v3.13+
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: Avoid that __bt_get_word() wraps multiple times · 9e98e9d7

Authored by Bart Van Assche on Dec 09, 2014

A second wrap-around occurs in __bt_get_word() under the following
sequence of events:
* __bt_get_word() is called with last_tag != 0.
* The first find_next_zero_bit() fails.
* After wrap-around, the test_and_set_bit() call fails and
  find_next_zero_bit() succeeds.
* The next test_and_set_bit() call fails and subsequently
  find_next_zero_bit() does not find a zero bit.
Avoid this by introducing an additional local variable.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robert Elliott <elliott@hp.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: <stable@vger.kernel.org> # v3.13+
Signed-off-by: Jens Axboe <axboe@fb.com>
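The idea of bounding the search to a single wrap-around with a local variable can be sketched in user space. This is a single-threaded illustration under stated assumptions, not the kernel function: `find_zero()` and `get_tag()` are hypothetical names, and because nothing runs concurrently, the failing test_and_set_bit() from the commit message is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: first clear bit in [from, to), or 'to' if none. */
static unsigned int find_zero(unsigned long word, unsigned int from,
                              unsigned int to)
{
    for (unsigned int i = from; i < to; i++)
        if (!(word & (1UL << i)))
            return i;
    return to;
}

/*
 * Claims a clear bit in *word, searching from 'start' and wrapping to
 * bit 0 at most once. Returns the bit index, or -1 if the word is full.
 */
static int get_tag(unsigned long *word, unsigned int depth,
                   unsigned int start)
{
    /* Local flag: starting at 0 leaves nothing to wrap around to. */
    bool wrapped = (start == 0);
    unsigned int pos = start;

    assert(depth > 0 && depth <= 8 * sizeof(*word) && start < depth);

    for (;;) {
        unsigned int bit = find_zero(*word, pos, depth);

        if (bit < depth) {
            *word |= 1UL << bit;
            return (int)bit;
        }
        if (wrapped)
            return -1;      /* already wrapped once: give up */
        wrapped = true;
        pos = 0;
    }
}
```

The `wrapped` flag is what guarantees termination after one extra pass, regardless of how often the inner search comes back empty.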

blk-mq: Fix a use-after-free · 45a9c9d9

Authored by Bart Van Assche on Dec 09, 2014

blk-mq users are allowed to free the memory request_queue.tag_set
points at after blk_cleanup_queue() has finished but before
blk_release_queue() has started. This can happen e.g. in the SCSI
core, which embeds the tag_set structure in a SCSI host structure.
The SCSI host structure is freed by scsi_host_dev_release(). This
function is called after blk_cleanup_queue() has finished but can be
called before blk_release_queue().

This means that it is not safe to access request_queue.tag_set from
inside blk_release_queue(). Hence remove the blk_sync_queue() call
from blk_release_queue(). This call is not necessary - outstanding
requests must have finished before blk_release_queue() is
called. Additionally, move the blk_mq_free_queue() call from
blk_release_queue() to blk_cleanup_queue() so that
request_queue.tag_set is not accessed after it has been freed.

This patch prevents the following kernel oops from being triggered
when deleting a SCSI host for which scsi-mq was enabled:

Call Trace:
 [<ffffffff8109a7c4>] lock_acquire+0xc4/0x270
 [<ffffffff814ce111>] mutex_lock_nested+0x61/0x380
 [<ffffffff812575f0>] blk_mq_free_queue+0x30/0x180
 [<ffffffff8124d654>] blk_release_queue+0x84/0xd0
 [<ffffffff8126c29b>] kobject_cleanup+0x7b/0x1a0
 [<ffffffff8126c140>] kobject_put+0x30/0x70
 [<ffffffff81245895>] blk_put_queue+0x15/0x20
 [<ffffffff8125c409>] disk_release+0x99/0xd0
 [<ffffffff8133d056>] device_release+0x36/0xb0
 [<ffffffff8126c29b>] kobject_cleanup+0x7b/0x1a0
 [<ffffffff8126c140>] kobject_put+0x30/0x70
 [<ffffffff8125a78a>] put_disk+0x1a/0x20
 [<ffffffff811d4cb5>] __blkdev_put+0x135/0x1b0
 [<ffffffff811d56a0>] blkdev_put+0x50/0x160
 [<ffffffff81199eb4>] kill_block_super+0x44/0x70
 [<ffffffff8119a2a4>] deactivate_locked_super+0x44/0x60
 [<ffffffff8119a87e>] deactivate_super+0x4e/0x70
 [<ffffffff811b9833>] cleanup_mnt+0x43/0x90
 [<ffffffff811b98d2>] __cleanup_mnt+0x12/0x20
 [<ffffffff8107252c>] task_work_run+0xac/0xe0
 [<ffffffff81002c01>] do_notify_resume+0x61/0xa0
 [<ffffffff814d2c58>] int_signal+0x12/0x17
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robert Elliott <elliott@hp.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: <stable@vger.kernel.org> # v3.13+
Signed-off-by: Jens Axboe <axboe@fb.com>

09 Dec, 2014 1 commit

blk-mq: prevent unmapped hw queue from being scheduled · 19c66e59

Authored by Ming Lei on Dec 03, 2014

When a hardware queue has no mapped software queues, it
shouldn't be scheduled. Otherwise a WARNING or OOPS
can be triggered.

The blk_mq_hw_queue_mapped() helper is introduced to fix
the problem.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
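The guard this commit describes can be pictured with a small user-space sketch. Everything here is an assumption made for illustration: `struct hw_queue`, its `nr_ctx` field, `hw_queue_mapped()` and `run_hw_queue()` are hypothetical stand-ins, and the body of the real blk_mq_hw_queue_mapped() helper is not shown in this listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hardware queue context. */
struct hw_queue {
    unsigned int nr_ctx;    /* number of software queues mapped to it */
};

/* True when at least one software queue feeds this hardware queue. */
static bool hw_queue_mapped(const struct hw_queue *hq)
{
    assert(hq != NULL);
    return hq->nr_ctx > 0;
}

/* Runs the queue only when it is mapped; returns whether it ran. */
static bool run_hw_queue(const struct hw_queue *hq)
{
    if (!hw_queue_mapped(hq))
        return false;       /* nothing can be queued here: skip it */
    /* ... dispatch pending requests here ... */
    return true;
}
```

Checking at the top of the run path keeps every caller from having to know whether a given hardware queue is in use.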

08 Dec, 2014 2 commits

blk-mq: re-check for available tags after running the hardware queue · 080ff351

Authored by Jens Axboe on Dec 08, 2014

If we run out of tags and have to sleep, we run the hardware queue
to kick pending IO into gear. During that run, we may have completed
requests, so re-check if we have free tags before going to sleep.
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: fix hang in bt_get() · b3223207

Authored by Bart Van Assche on Dec 08, 2014

Prevent bt_get() from hanging when there are fewer hardware queues than
CPU threads. The symptoms of the hang were as follows:

* All tags allocated for a particular hardware queue.
* (nr_tags) pending commands for that hardware queue.
* No pending commands for the software queues associated with that
  hardware queue.
Signed-off-by: Jens Axboe <axboe@fb.com>

01 Dec, 2014 1 commit

blk-mq: move the kdump check to blk_mq_alloc_tag_set · 6637fadf

Authored by Shaohua Li on Nov 30, 2014

We call blk_mq_alloc_tag_set() first and then blk_mq_init_queue(). The requests are
allocated in the former function, so the kdump check should be moved there
to really save memory.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

25 Nov, 2014 2 commits

blk-mq: cleanup tag free handling · 70114c39

Authored by Jens Axboe on Nov 24, 2014

We only call __blk_mq_put_tag() and __blk_mq_put_reserved_tag()
from blk_mq_put_tag(), so just inline the two calls instead of
having them as separate functions.
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: use 'nr_cpu_ids' as highest CPU ID count for hwq <-> cpu map · a33c1ba2

Authored by Jens Axboe on Nov 24, 2014

We currently use num_possible_cpus(), but that breaks on sparc64 where
the CPU ID space is discontiguous. Use nr_cpu_ids as the highest CPU ID
instead, so we don't end up reading from invalid memory.

Cc: stable@kernel.org # 3.13+
Signed-off-by: Jens Axboe <axboe@fb.com>
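Why a CPU count is the wrong size for a table indexed by CPU ID can be shown with a few lines of user-space C. This is only a sketch of the arithmetic: `id_space_size()` is a hypothetical helper and the ID set {0, 1, 4, 5} is an invented example of a discontiguous ID space.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest array length that makes every listed CPU ID a valid index. */
static unsigned int id_space_size(const unsigned int *cpu_ids, size_t count)
{
    unsigned int size = 0;

    assert(cpu_ids != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (cpu_ids[i] + 1 > size)
            size = cpu_ids[i] + 1;
    return size;
}
```

With the four IDs 0, 1, 4 and 5 the count is 4 but the required length is 6, so a table sized by the count would be indexed out of bounds for CPUs 4 and 5.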

24 Nov, 2014 2 commits

blk: introduce generic io stat accounting help function · 394ffa50

Authored by Gu Zheng on Nov 24, 2014

Many block drivers (e.g. NVMe) account I/O statistics per bio, so
blk_account_io_start/end(), which is based on requests, does not
make sense for them. Introduce similar helper functions named
generic_start/end_io_acct, based on raw sectors, which can simplify
the open-coded I/O accounting in some drivers.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: handle the single queue case in blk_mq_hctx_next_cpu · b657d7e6

Authored by Christoph Hellwig on Nov 24, 2014

Don't duplicate the code to handle the no-CPU-bounce case in the
caller; do it inside blk_mq_hctx_next_cpu instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

20 Nov, 2014 1 commit

genhd: check for int overflow in disk_expand_part_tbl() · 5fabcb4c

Authored by Jens Axboe on Nov 19, 2014

We can get here from blkdev_ioctl() -> blkpg_ioctl() -> add_partition()
with a user passed in partno value. If we pass in 0x7fffffff, the
new target in disk_expand_part_tbl() overflows the 'int' and we
access beyond the end of ptbl->part[] and even write to it when we
do the rcu_assign_pointer() to assign the new partition.
Reported-by: David Ramos <daramos@stanford.edu>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
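The overflow described in this commit arises when 1 is added to a partition number of 0x7fffffff. The sketch below shows one way to guard such a computation in portable C; `part_tbl_target()` is a hypothetical helper, and it checks before adding because signed overflow is undefined in standard C. The check the kernel patch actually uses may be written differently.

```c
#include <assert.h>
#include <limits.h>

/*
 * Returns the table size needed to hold partition 'partno'
 * (partno + 1), or -1 if the input is negative or the result
 * cannot be represented in an int.
 */
static int part_tbl_target(int partno)
{
    if (partno < 0 || partno == INT_MAX)
        return -1;
    return partno + 1;
}
```

A caller would treat the -1 result as an invalid request and refuse to grow the table.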

18 Nov, 2014 2 commits

blk-mq: add blk_mq_free_hctx_request() · 7c7f2f2b

Authored by Jens Axboe on Nov 17, 2014

It's silly to use blk_mq_free_request() which in turn maps the
request to the hardware queue, for places where we already know
what the hardware queue is. This saves us an extra mapping of a
hardware queue on request completion, if the caller knows this
information already.
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: export blk_mq_free_request() · 1a3b595a

Authored by Jens Axboe on Nov 17, 2014

Drivers that know they are blk-mq should just use this function
instead of calling through blk_put_request().
Signed-off-by: Jens Axboe <axboe@fb.com>

12 Nov, 2014 2 commits

blk-mq: use get_cpu/put_cpu instead of preempt_disable/preempt_enable · 2a90d4aa

Authored by Paolo Bonzini on Nov 07, 2014

blk-mq is using preempt_disable/enable in order to ensure that the
queue runners are placed on the right CPU.  This does not work with
the RT patches, because __blk_mq_run_hw_queue takes a non-raw
spinlock within the preemption-disabled region.  If there is contention
on the lock, this violates the rules for preemption-disabled regions.

While this should be easily fixable within the RT patches just by doing
migrate_disable/enable, we can do better and document _why_ this
particular region runs with disabled preemption.  After the previous
patch, it is trivial to switch it to get/put_cpu; the RT patches then
can change it to get_cpu_light, which lets virtio-blk run under RT
kernels.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Clark Williams <williams@redhat.com>
Tested-by: Clark Williams <williams@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk_mq: call preempt_disable/enable in blk_mq_run_hw_queue, and only if needed · 398205b8

Authored by Paolo Bonzini on Nov 07, 2014

preempt_disable/enable surrounds every call to blk_mq_run_hw_queue,
except the one in blk-flush.c.  In fact that one is always asynchronous,
and it does not need smp_processor_id().

We can do the same for all other calls, avoiding preempt_disable when
async is true.  This avoids peppering blk-mq.c with preemption-disabled
regions.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Clark Williams <williams@redhat.com>
Tested-by: Clark Williams <williams@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

30 Oct, 2014 2 commits

blk-mq: add BLK_MQ_F_DEFER_ISSUE support flag · e167dfb5

Authored by Jens Axboe on Oct 29, 2014

Drivers can now tell blk-mq if they take advantage of the deferred
issue through 'last' or not. If they do, don't do queue-direct
for sync IO. This is a preparation patch for the nvme conversion.
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: add a 'list' parameter to ->queue_rq() · 74c45052

Authored by Jens Axboe on Oct 29, 2014

Since we have the notion of a 'last' request in a chain, we can use
this to have the hardware optimize the issuing of requests. Add
a list_head parameter to queue_rq that the driver can use to
temporarily store hw commands for issue when 'last' is true. If we
are doing a chain of requests, pass in a NULL list for the first
request to force issue of that immediately, then batch the remainder
for deferred issue until the last request has been sent.

Instead of adding yet another argument to the hot ->queue_rq path,
encapsulate the passed arguments in a blk_mq_queue_data structure.
This is passed as a constant, and has been tested as faster than
passing 4 (or even 3) args through ->queue_rq. Update drivers for
the new ->queue_rq() prototype. There are no functional changes
in this patch for drivers - if they don't use the passed in list,
then they will just queue requests individually like before.
Signed-off-by: Jens Axboe <axboe@fb.com>
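The pattern of bundling the ->queue_rq() arguments into one structure, and the issue rule stated in this commit message (a NULL list or 'last' forces immediate issue), can be sketched in user space. The types below are simplified stand-ins, and `struct queue_data` and `must_issue_now()` are hypothetical names for illustration rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types of the same names. */
struct request;
struct list_head {
    struct list_head *next, *prev;
};

/* One argument bundle instead of three separate parameters. */
struct queue_data {
    struct request *rq;         /* request being queued */
    struct list_head *list;     /* where a driver may park deferred commands */
    bool last;                  /* true for the final request of a chain */
};

/*
 * A driver that batches would issue immediately when it has no list
 * to defer onto, or when this is the last request of the chain.
 */
static bool must_issue_now(const struct queue_data *bd)
{
    assert(bd != NULL);
    return bd->list == NULL || bd->last;
}
```

Passing a pointer to a constant bundle keeps the callback signature stable if further fields are added later.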

22 Oct, 2014 1 commit

block: remove artificial max_hw_sectors cap · 34b48db6

Authored by Christoph Hellwig on Sep 06, 2014

Set max_sectors to the value the driver provides as hardware limit by
default.  Linux has had proper I/O throttling for a long time and doesn't
rely on an artificially small maximum I/O size anymore.  By not limiting
the I/O size by default we remove an annoying tuning step required for
most Linux installations.

Note that both the user, and if absolutely required the driver can still
impose a limit for FS requests below max_hw_sectors_kb.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

14 Oct, 2014 2 commits

blk-mq: allocate cpumask on the home node · a86073e4

Authored by Jens Axboe on Oct 13, 2014

All other allocs are done on the specific node, somehow the
cpumask for hw queue runs was missed. Fix that by using
zalloc_cpumask_var_node() in blk_mq_init_queue().
Signed-off-by: Jens Axboe <axboe@fb.com>

bio-integrity: remove the needless fail handle of bip_slab creating · b65c7491

Authored by Gu Zheng on Oct 13, 2014

bip_slab is created with SLAB_PANIC, so the fail handler is unneeded.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

13 Oct, 2014 2 commits

block: include func name in __get_request prints · 7b2b10e0

Authored by Robert Elliott on Aug 27, 2014

In __get_request calls to printk_ratelimited, include the function name so
the callbacks suppressed message matches the messages that are printed,
and add "dev" before the device name so it matches other block layer
messages.
Signed-off-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: make blk_update_request print prefix match ratelimited prefix · ef3ecb66

Authored by Robert Elliott on Aug 27, 2014

In blk_update_request, change the printk_ratelimited
prefix from end_request to blk_update_request so it
matches the name printed if rate limiting occurs.

Old:
[10234.933106] blk_update_request: 174 callbacks suppressed
[10234.934940] end_request: critical target error, dev sdr, sector 16
[10234.949788] end_request: critical target error, dev sdr, sector 16

New:
[16863.445173] blk_update_request: 398 callbacks suppressed
[16863.447029] blk_update_request: critical target error, dev sdr, sector 1442066176
[16863.449383] blk_update_request: critical target error, dev sdr, sector 802802888
[16863.451680] blk_update_request: critical target error, dev sdr, sector 1609535456
Signed-off-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

10 Oct, 2014 1 commit

blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio · 764f612c

Authored by Ming Lei on Oct 09, 2014

It isn't correct to figure out req->bi_phys_segments from bio->bi_vcnt
if the bio is cloned.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

09 Oct, 2014 1 commit

block: fix alignment_offset math that assumes io_min is a power-of-2 · b8839b8c

Authored by Mike Snitzer on Oct 08, 2014

The math in both blk_stack_limits() and queue_limit_alignment_offset()
assumes that a block device's io_min (aka minimum_io_size) is always a
power-of-2.  Fix the math such that it works for non-power-of-2 io_min.

This issue (of alignment_offset != 0) became apparent when testing
dm-thinp with a thinp blocksize that matches a RAID6 stripesize of
1280K.  Commit fdfb4c8c ("dm thin: set minimum_io_size to pool's data
block size") unlocked the potential for alignment_offset != 0 due to
the dm-thin-pool's io_min possibly being a non-power-of-2.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
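The reason a power-of-2 assumption breaks for a 1280K granularity can be checked with plain arithmetic. The sketch below contrasts the bit-mask shortcut with a true remainder; `offset_by_mask()` and `offset_by_mod()` are hypothetical helpers for illustration and do not reproduce the kernel's limit-stacking code.

```c
#include <assert.h>

/* Remainder via bit mask: only correct when 'granularity' is a power of 2. */
static unsigned int offset_by_mask(unsigned int value, unsigned int granularity)
{
    assert(granularity > 0);
    return value & (granularity - 1);
}

/* Remainder via modulo: correct for any non-zero granularity. */
static unsigned int offset_by_mod(unsigned int value, unsigned int granularity)
{
    assert(granularity > 0);
    return value % granularity;
}
```

For a granularity of 1280K (1310720 bytes) and a value of exactly three stripes (3932160 bytes), the modulo gives 0 while the mask gives 1048576, a spurious non-zero offset. For a power of 2 such as 4096 the two agree.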

07 Oct, 2014 2 commits

blk-mq: Make bt_clear_tag() easier to read · 9d8f0bcc

Authored by Bart Van Assche on Oct 07, 2014

Eliminate a backwards goto statement from bt_clear_tag().
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: fix potential hang if rolling wakeup depth is too high · abab13b5

Authored by Jens Axboe on Oct 07, 2014

We currently divide the queue depth by 4 as our batch wakeup
count, but we split the wakeups over BT_WAIT_QUEUES number of
wait queues. This defaults to 8. If the product of the resulting
batch wake count and BT_WAIT_QUEUES is higher than the device
queue depth, we can get into a situation where a task goes to
sleep waiting for a request, but never gets woken up.
Reported-by: Bart Van Assche <bvanassche@acm.org>
Fixes: 4bb659b1
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
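The condition described in this commit (batch count times the number of wait queues exceeding the queue depth) is easy to check numerically. The sketch below is an illustration under assumptions: `WAIT_QUEUES` mirrors the default of 8 given in the text, `max_batch` is a hypothetical upper cap that the text does not mention, and dividing by the number of wait queues is just one way to satisfy the constraint, not necessarily what the patch does.

```c
#include <assert.h>
#include <stdbool.h>

#define WAIT_QUEUES 8u  /* default of BT_WAIT_QUEUES per the commit text */

/* Batch wakeup count: depth / divisor, capped at max_batch, at least 1. */
static unsigned int wake_batch(unsigned int depth, unsigned int divisor,
                               unsigned int max_batch)
{
    unsigned int batch;

    assert(divisor > 0);
    batch = depth / divisor;
    if (batch > max_batch)
        batch = max_batch;
    return batch ? batch : 1;
}

/* True when waking every wait queue once would need more than 'depth' frees. */
static bool overcommits(unsigned int depth, unsigned int batch)
{
    return batch * WAIT_QUEUES > depth;
}
```

For a depth of 16, dividing by 4 gives a batch of 4 and a product of 32, which exceeds the depth; dividing by the number of wait queues gives a batch of 2 and a product of 16, which does not.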

04 Oct, 2014 2 commits

block: add bioset_create_nobvec() · d8f429e1

Authored by Junichi Nomura on Oct 03, 2014

Users of bio_clone_fast() do not want bios with their own bvecs.
Allocating a bvec mempool as part of the bioset intended for such users
is a waste of memory.

bioset_create_nobvec() creates a bioset that doesn't have the bvec
mempool.
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: use bio_clone_fast() in blk_rq_prep_clone() · 11dfce50

Authored by Junichi Nomura on Oct 03, 2014

Request cloning clones bios in the request to track the completion
of each bio.
For that purpose, we can use bio_clone_fast() instead of bio_clone()
to avoid unnecessary allocation and copy of bvecs.

This patch reduces memory footprint of request-based device-mapper
(about 1-4KB for each request) and is a preparation for further
reduction of memory usage by removing unused bvec mempool.
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

01 Oct, 2014 1 commit

block: misplaced rq_complete tracepoint · 4a0efdc9

Authored by Hannes Reinecke on Oct 01, 2014

The rq_complete tracepoint was never issued for empty requests,
causing the resulting blktrace information to never show any
completion for those requests.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

28 Sep, 2014 1 commit

block: Replace strnicmp with strncasecmp · 58294050

Authored by Rasmus Villemoes on Sep 16, 2014

The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics
and a slightly buggy strncasecmp. The latter is the POSIX name, so
strnicmp was renamed to strncasecmp, and strnicmp made into a wrapper
for the new strncasecmp to avoid breaking existing users.

To allow the compat wrapper strnicmp to be removed at some point in
the future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
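For readers unfamiliar with the POSIX name, the following user-space sketch shows a length-delimited, case-insensitive comparison using strncasecmp() from <strings.h>. The wrapper `has_prefix_nocase()` is a hypothetical helper for illustration; it uses the C library function, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>    /* POSIX home of strncasecmp() */

/* True when 's' starts with 'prefix', ignoring ASCII case. */
static bool has_prefix_nocase(const char *s, const char *prefix)
{
    assert(s != NULL && prefix != NULL);
    return strncasecmp(s, prefix, strlen(prefix)) == 0;
}
```

Limiting the comparison to the prefix length is what makes the length-delimited variant useful for parsing option strings.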

27 Sep, 2014 6 commits

block: Add T10 Protection Information functions · 2341c2f8

Authored by Martin K. Petersen on Sep 26, 2014

The T10 Protection Information format is also used by some devices that
do not go through the SCSI layer (virtual block devices, NVMe). Relocate
the relevant functions to a block layer library that can be used without
involving SCSI.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: Don't merge requests if integrity flags differ · 4eaf99be

Authored by Martin K. Petersen on Sep 26, 2014

We'd occasionally merge requests with conflicting integrity flags.
Introduce a merge helper which checks that the requests have compatible
integrity payloads.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: Integrity checksum flag · aae7df50

Authored by Martin K. Petersen on Sep 26, 2014

Make the choice of checksum a per-I/O property by introducing a flag
that can be inspected by the SCSI layer. There are several reasons for
this:

 1. It allows us to switch choice of checksum without unloading and
    reloading the HBA driver.

 2. During error recovery we need to be able to tell the HBA that
    checksums read from disk should not be verified and converted to IP
    checksums.

 3. For error injection purposes we need to be able to write a bad guard
    tag to storage. Since the storage device only supports T10 CRC we
    need to be able to disable IP checksum conversion on the HBA.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: Relocate bio integrity flags · b1f01388

Authored by Martin K. Petersen on Sep 26, 2014

Move flags affecting the integrity code out of the bio bi_flags and into
the block integrity payload.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: Add a disk flag to block integrity profile · 3aec2f41

Authored by Martin K. Petersen on Sep 26, 2014

So far we have relied on the app tag size to determine whether a disk
has been formatted with T10 protection information or not. However, not
all target devices provide application tag storage.

Add a flag to the block integrity profile that indicates whether the
disk has been formatted with protection information.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: Add prefix to block integrity profile flags · 8288f496

Authored by Martin K. Petersen on Sep 26, 2014

Add a BLK_ prefix to the integrity profile flags. Also rename the flags
to be more consistent with the generate/verify terminology in the rest
of the integrity code.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
